How to Build a High-Performance Blockchain

Source: Aptos Labs
Since the advent of computing, engineers and researchers have sought to push computing resources to their performance limits, maximizing efficiency while minimizing the latency of computing tasks. The twin goals of high performance and low latency have always shaped computer science, influencing fields ranging from CPUs, FPGAs, and database systems to, more recently, artificial intelligence infrastructure and blockchain systems. In the pursuit of high performance, pipelining has become an indispensable tool. Since its introduction in the IBM System/360 in 1964 [1], pipelining has been at the core of high-performance system design, driving key discussions and innovations in the field.
Pipelining is not limited to hardware; it is also widely used in databases. For example, David DeWitt and Jim Gray describe pipelined parallelism in "Parallel Database Systems: The Future of High Performance Database Systems" [2]. This approach breaks a complex database query into multiple stages that run simultaneously, improving efficiency and performance. Pipelining is equally vital in artificial intelligence, especially in widely used deep learning frameworks such as TensorFlow, which uses data pipeline parallelism for data preprocessing and loading, ensuring a smooth flow of data for training and inference and making AI workflows faster and more efficient [3].
Blockchain is no exception. Its core function resembles that of a database: it processes transactions and updates state, but with the added challenge of Byzantine fault-tolerant consensus. The key to improving blockchain throughput (transactions per second) and reducing latency (time to finality) lies in optimizing how the different stages (ordering, execution, submission, and transaction synchronization) interact under high load. This challenge is especially acute in high-throughput scenarios, where traditional designs struggle to maintain low latency.
To explore these concepts, let's consider a familiar analogy: the automobile factory. Understanding how the assembly line has revolutionized manufacturing can help us grasp the evolution of the blockchain pipeline—and why next-generation designs like Zaptos [8] are pushing blockchain performance to new heights.
From Automobile Factory to Blockchain
Imagine you are the owner of an automobile factory with two main goals:
· Maximize throughput: Assemble as many cars as possible every day.
· Minimize latency: Reduce the build time of each car.
Now, consider three types of factories:
Simple Factory
In a simple factory, a group of versatile workers systematically assembles a car. One worker assembles the engine, the next worker installs the wheels, and so on—producing only one car at a time.
The issue? Some workers often wait idle, leading to an overall low production efficiency because no one is working on different parts of the same car simultaneously.
Ford Factory
Enter the Ford assembly line[4]! Here, each worker focuses on a single task. The car moves along a conveyor belt, and as each car passes through, a dedicated worker adds their part.
The result? Multiple cars are at different assembly stages simultaneously, and all workers are busy. Throughput increases significantly—but each car still needs to go through each worker sequentially, meaning the delay per car remains the same.
Magic Factory
Imagine a magic factory where all workers can work on a single car simultaneously! No longer needing to move the car from one station to the next, each part of the car is built simultaneously.
The outcome? The car is assembled at a record speed, with every step happening in sync. This is the ideal scenario to address throughput and latency issues.
Alright, enough about car factories—what about blockchain? As it turns out, designing a high-performance blockchain is not so different from optimizing an assembly line.
Blockchain as a Car Factory
In blockchain, processing a block is akin to assembling a car. The analogy goes as follows:
· Worker = Validator Resource
· Car = One Block
· Assembly Task = Consensus, Execution, and Submission stages
Just as in a simple factory where only one car is processed at a time, if a blockchain were to handle only one block at a time, it would result in underutilization of resources. In contrast, modern blockchain designs aim to emulate the Ford assembly line—processing multiple blocks in different stages simultaneously. This is where pipeline technology shines.
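The three factories can be compared with a back-of-the-envelope model. The sketch below is illustrative only: the function names and the example stage durations are assumptions introduced here, and queueing between stations is ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    latency: float     # time to finish one car (block)
    throughput: float  # cars (blocks) finished per unit time in steady state


def simple_factory(stage_times: Sequence[float]) -> Metrics:
    """One car at a time: the next car starts only when the previous one is done."""
    total = sum(stage_times)
    return Metrics(latency=total, throughput=1 / total)


def ford_factory(stage_times: Sequence[float]) -> Metrics:
    """Assembly line: stations work on different cars, so the slowest station sets the pace."""
    return Metrics(latency=sum(stage_times), throughput=1 / max(stage_times))


def magic_factory(stage_times: Sequence[float]) -> Metrics:
    """Idealized case: all stages of the same car overlap, so only the slowest stage shows."""
    slowest = max(stage_times)
    return Metrics(latency=slowest, throughput=1 / slowest)


if __name__ == "__main__":
    # Hypothetical durations for three stages, in arbitrary time units.
    stages = [2.0, 1.0, 1.0]
    for factory in (simple_factory, ford_factory, magic_factory):
        print(factory.__name__, factory(stages))
```

In this model the Ford factory improves throughput but leaves latency unchanged, while the magic factory improves both, which is the gap the rest of the article is concerned with.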
Evolution of Blockchain Pipelines
Traditional Architecture: Sequential Blockchain
Imagine a blockchain that processes blocks sequentially. Validators need to:
1. Receive block proposals.
2. Execute blocks to update the blockchain state.
3. Proceed with achieving consensus on that state.
4. Persist the state to the database.
5. Initiate the consensus for the next block.
Where is the problem?
· Execution and submission are in the critical path of the consensus process.
· Each consensus instance needs to wait for the previous one to complete before starting.
This setup is akin to factories of the pre-Ford era: workers (resources) often idle as they focus on only one block (car) at a time. Unfortunately, many existing blockchains still fall into this category, leading to low throughput and high latency.
Aptos: Parallelizing Performance
Diem introduced a pipeline architecture that decouples execution and submission from the consensus phase, with the consensus phase itself also adopting a pipeline design.
· Asynchronous Execution and Submission [5]: Validators first agree on a block, then execute it on top of the parent block's state. Once the resulting state is validated by a quorum of validators, it is persisted to storage.
· Pipeline Consensus (Jolteon[6]): New consensus instances can start before the previous one completes, akin to a moving assembly line.
This enhancement allows different blocks to be in different stages simultaneously, increasing throughput and significantly reducing block times to just 2 message delays. However, Jolteon's leader-based design may lead to bottlenecks as the leader can become overloaded during transaction dissemination.
Aptos further optimizes the pipeline through Quorum Store[7], a mechanism that decouples data dissemination from consensus. Instead of relying on a single leader to broadcast large blocks of data within the consensus protocol, Quorum Store separates data dissemination from metadata ordering, allowing validators to disseminate data asynchronously and concurrently. This design leverages the total bandwidth of all validators, effectively eliminating the leader bottleneck in consensus.

Visualization: How Quorum Store balances resource utilization in leader-based consensus protocols.
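A rough way to see why spreading dissemination helps is to count upload bytes per node. The sketch below is a simplified model, not the Quorum Store protocol itself: it assumes the payload is split evenly across validators, that every node sends its data directly to every peer, and it ignores signature and proof traffic. The function names and the `metadata_bytes` parameter are hypothetical.

```python
from typing import Tuple


def leader_broadcast_upload(payload_bytes: int, num_validators: int) -> int:
    """Bytes the leader uploads when it alone sends the full payload to every peer."""
    return payload_bytes * (num_validators - 1)


def quorum_store_upload(
    payload_bytes: int, num_validators: int, metadata_bytes: int = 0
) -> Tuple[float, float]:
    """Return (upload per validator, upload for the leader) when dissemination is shared.

    Assumption: each validator disseminates an equal share of the payload to
    all peers, and the leader additionally sends small ordering metadata.
    """
    share = payload_bytes / num_validators
    per_validator = share * (num_validators - 1)
    leader = per_validator + metadata_bytes * (num_validators - 1)
    return per_validator, leader


if __name__ == "__main__":
    # Hypothetical numbers: 1 MB of transactions, 100 validators, 1 KB of metadata.
    print(leader_broadcast_upload(1_000_000, 100))
    print(quorum_store_upload(1_000_000, 100, metadata_bytes=1_000))
```

Under these assumptions the leader's upload drops by roughly a factor equal to the number of validators, and the remaining load is spread evenly.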
Thus far, the Aptos blockchain has built the "Ford Factory" of blockchains. Just as Ford's assembly line revolutionized car manufacturing by keeping different cars in different stages at the same time, Aptos processes different blocks in different stages concurrently. Each validator's resources stay busy rather than sitting idle between stages. The result is a high-throughput system, making Aptos a robust platform for handling blockchain transactions efficiently and at scale.

Illustration: Pipelined Processing of Sequential Blocks in the Aptos Blockchain. Validators can pipeline process different stages of sequential blocks to maximize resource utilization and increase throughput.
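The effect of pipelining on a run of blocks can be sketched with a small scheduling model. This is an illustration under simplifying assumptions, not Aptos code: each stage handles one block at a time, all blocks are available at time zero, and the stage durations are hypothetical.

```python
from typing import List, Sequence


def sequential_finish_times(num_blocks: int, stage_times: Sequence[float]) -> List[float]:
    """Each block runs through every stage before the next block starts."""
    per_block = sum(stage_times)
    return [per_block * (i + 1) for i in range(num_blocks)]


def pipelined_finish_times(num_blocks: int, stage_times: Sequence[float]) -> List[float]:
    """A block enters a stage once it left the previous stage and the stage is free."""
    stage_free = [0.0] * len(stage_times)  # when each stage finished its last block
    finish = []
    for _ in range(num_blocks):
        t = 0.0  # time at which this block is ready for its next stage
        for s, duration in enumerate(stage_times):
            start = max(t, stage_free[s])
            t = start + duration
            stage_free[s] = t
        finish.append(t)
    return finish


if __name__ == "__main__":
    # Hypothetical durations for consensus, execution, and submission.
    stages = [1.0, 1.0, 1.0]
    print(sequential_finish_times(3, stages))
    print(pipelined_finish_times(3, stages))
```

With equal stage durations, the pipelined schedule completes one block per stage duration after the first block, while each individual block still takes the full sum of the stages.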
While throughput is crucial, end-to-end latency, the time from when a client submits a transaction until its final confirmation, is equally important. For applications such as payments, decentralized finance (DeFi), and gaming, every millisecond counts. Many users have experienced delays during high-traffic events because each transaction must pass sequentially through a series of stages: client-full node-validator communication, consensus, execution, state validation, submission (persisting the state to storage), and full node synchronization. Under high load, stages like execution and full node synchronization introduce additional latency.

Illustration: Pipeline Architecture of the Aptos Blockchain. The diagram shows client Ci, full node Fi, and validator Vi. Each box represents a stage a transaction block in the blockchain must go through from left to right. The pipeline consists of five stages: consensus (including dissemination and ordering), execution, validation, submission, and full node synchronization.
It's like a Ford factory: while the assembly line maximizes overall throughput, each car still needs to pass through each worker sequentially, resulting in longer completion times. To truly push blockchain performance to the limit, we need to build a "magic factory" where these stages run in parallel.
Zaptos: Towards Optimal Blockchain Latency
Zaptos[8] further reduces latency through three key optimizations without sacrificing throughput.
· Optimistic Execution: Reducing pipeline latency by starting execution immediately upon receiving a block proposal. Validators promptly add the block to the pipeline and speculatively execute after the parent block completes. Full nodes, upon receiving the proposal from the validator, also perform optimistic execution to validate the state proof.
· Optimistic Submission: Writing state to storage immediately after block execution, even before state validation. When validators eventually validate the state, only minimal updates are needed to complete the submission. If a block is ultimately not ordered by consensus, its optimistically submitted state is rolled back to preserve consistency.
· Fast Verification: Validators expedite state validation by sending validation messages concurrently with the final consensus round, so validation of the executed block's state begins without waiting for consensus to complete. In common scenarios, this optimization shortens the pipeline by one round.
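The write-then-roll-back idea behind optimistic submission can be sketched with a toy key-value store that keeps an undo log per block. This is a minimal illustration, not the Aptos storage layer: the class and method names are hypothetical, and it assumes a block is rolled back before any later block has been written on top of it.

```python
_MISSING = object()  # marks keys that did not exist before the block wrote them


class OptimisticStore:
    def __init__(self):
        self.state = {}
        self.undo_logs = {}  # block_id -> {key: previous value or _MISSING}

    def optimistic_submit(self, block_id, writes):
        """Apply a block's writes right after execution, remembering the old values."""
        undo = {}
        for key, value in writes.items():
            undo[key] = self.state.get(key, _MISSING)
            self.state[key] = value
        self.undo_logs[block_id] = undo

    def confirm(self, block_id):
        """The block was ordered and validated: only the undo log needs discarding."""
        self.undo_logs.pop(block_id)

    def roll_back(self, block_id):
        """The block was not ordered: restore the values it overwrote."""
        for key, old in self.undo_logs.pop(block_id).items():
            if old is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = old


if __name__ == "__main__":
    store = OptimisticStore()
    store.optimistic_submit("b1", {"alice": 10})
    store.confirm("b1")
    store.optimistic_submit("b2", {"alice": 7, "bob": 3})
    store.roll_back("b2")
    print(store.state)
```

Confirmation is cheap because the data is already written, which mirrors the point above that only minimal updates remain once the state is validated.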

Illustration: Parallel Pipeline Architecture of Zaptos. Stages other than consensus are effectively hidden within the consensus stage, reducing end-to-end latency.
Through these optimizations, Zaptos effectively hides the latency of other pipeline stages within the consensus stage. Thus, if a blockchain adopts an optimal latency consensus protocol, the overall blockchain latency can also reach an optimum!
Talk is Cheap, Show Me the Data
We evaluated Zaptos' end-to-end performance through geographically distributed experiments, with Aptos as the high-performance baseline. For more details, refer to the paper [8].
On Google Cloud, we simulated a globally decentralized network of 100 validators and 30 full nodes distributed across 10 regions, using commercial-grade machines similar to those used in Aptos deployments.
Throughput-Latency

Figure: Common performance characteristics of Zaptos and Aptos blockchains.
The figure above compares end-to-end latency against throughput for the two systems. Both show latency rising gradually as load increases, with a sharp spike at maximum capacity. However, Zaptos consistently delivers lower, more stable latency before reaching peak throughput, reducing latency by 160 milliseconds under low load and by more than 500 milliseconds under high load.
Impressively, Zaptos achieves sub-second latency at 20k TPS in a production-grade, mainnet-like environment. This breakthrough brings real-world applications that require both speed and scalability within reach.
Latency Breakdown

Figure: Latency breakdown of the Aptos blockchain.

Figure: Latency breakdown of Zaptos.
The latency breakdown charts detail the duration of each stage for validators and full nodes in the pipeline. Key insights include:
· Up to 10k TPS: Zaptos' overall latency is nearly equal to its consensus latency, as the optimistic execution, validation, and optimistic submission stages are effectively "hidden" within the consensus stage.
· Above 10k TPS: Due to increased optimistic execution and full node synchronization time, non-consensus stages become more significant. Nevertheless, Zaptos significantly reduces overall latency by overlapping most stages. For example, at 20k TPS, the baseline total latency is 1.32 seconds (consensus 0.68 seconds, other stages 0.64 seconds), while Zaptos is 0.78 seconds (consensus 0.67 seconds, other stages 0.11 seconds).
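The 20k TPS figures quoted above can be checked with a few lines of arithmetic. The snippet uses only the numbers reported in this section, converted to milliseconds; the variable and function names are illustrative.

```python
# Latency breakdown at 20k TPS, in milliseconds, as reported above.
BASELINE = {"consensus_ms": 680, "other_ms": 640}
ZAPTOS = {"consensus_ms": 670, "other_ms": 110}


def total_ms(breakdown: dict) -> int:
    """End-to-end latency as the sum of consensus and non-consensus time."""
    return breakdown["consensus_ms"] + breakdown["other_ms"]


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    saved = total_ms(BASELINE) - total_ms(ZAPTOS)
    print("total saved (ms):", saved)
    print("total reduction (%):", percent_reduction(total_ms(BASELINE), total_ms(ZAPTOS)))
    print(
        "non-consensus reduction (%):",
        percent_reduction(BASELINE["other_ms"], ZAPTOS["other_ms"]),
    )
```

The totals differ by 540 milliseconds, about a 41% reduction, and nearly all of it comes from the non-consensus stages, which shrink by about 83% while consensus latency is essentially unchanged.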
Conclusion
The evolution of blockchain architecture parallels the transformation in manufacturing—from simple sequential workflows to highly parallelized assembly lines. Aptos's assembly line approach has significantly increased throughput, while Zaptos goes further, reducing latency to sub-second levels, all while maintaining high TPS. Just as modern computing architectures leverage parallelism to maximize efficiency, blockchain must continuously optimize its design to eliminate unnecessary delays. By comprehensively optimizing the blockchain pipeline to achieve minimal latency, Zaptos paves the way for real-world blockchain applications that require speed and scalability.
References
[1] Gene M. Amdahl, Gerrit A. Blaauw, and Frederick P. Brooks. 1964. "Architecture of the IBM System/360." IBM Journal of Research and Development. https://doi.org/10.1147/rd.82.0087
[2] David DeWitt and Jim Gray. 1992. "Parallel Database Systems: The Future of High Performance Database Systems." Communications of the ACM. https://doi.org/10.1145/129888.129894
[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin et al. 2016. "TensorFlow: a System for Large-Scale Machine Learning." In 12th USENIX symposium on operating systems design and implementation (OSDI). https://arxiv.org/abs/1605.08695
[4] The Moving Assembly Line and the Five-Dollar Workday. https://corporate.ford.com/articles/history/moving-assembly-line.html
[5] Zekun Li and Yu Xia. 2021. DIP-213 - Decoupled Execution. https://github.com/diem/dip/blob/7dc44ee57bb7efe76559f05dcc6851d97e2d3149/dips/dip-213.md
[6] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. 2022. "Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback." In International conference on financial cryptography and data security (FC). https://arxiv.org/abs/2106.10362
[7] Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain. https://medium.com/aptoslabs/quorum-store-how-consensus-horizontally-scales-on-the-aptos-blockchain-988866f6d5b0
[8] Zhuolun Xiang, Zekun Li, Balaji Arun, Teng Zhang, and Alexander Spiegelman. 2025. "Zaptos: Towards Optimal Blockchain Latency." arXiv preprint arXiv:2501.10612. https://arxiv.org/abs/2501.10612
This article is from a submission and does not represent the views of BlockBeats.