Erigon.tech: Beyond Software Frontiers (https://erigon.tech)

Erigon v3.3: Introducing the Historical Proofs Data Model
https://erigon.tech/erigon-v3-3-introducing-the-historical-proofs-data-model/
Fri, 28 Nov 2025

With the release of Erigon v3.3, we are announcing the addition of efficient indexing for historical MPT proofs. Users can now resync with --prune.include-commitment-history to store all historical proofs. This article is divided into two parts:

  • Part 1: Why historical proofs matter and why solving this problem is worthwhile (use cases, comparisons with other clients, and practical implications).
  • Part 2: A technical explanation of how the system works and the design decisions that made it possible.

First, we look at why historical proofs are important for both the Ethereum roadmap and the broader application layer.


1. Why Historical Proofs Matter

Historical proofs allow any party to verify past Ethereum state without trusting a full node or replaying the entire chain. A historical proof provides the minimal data needed to reconstruct and validate a state value at a specific block, either an account or a contract storage slot, supported by the corresponding MPT branch.

This capability matters for several reasons:

  1. Ethereum’s long-term direction prioritizes zkVMs. In a ZK world, block verifiers rely on compact state witnesses instead of carrying the full state. To reach that future, clients must be able to store, index, and serve historical proofs efficiently. Without efficient proof handling, generating proofs and syncing new nodes becomes slow and tedious.
  2. Trustless access to historical state for applications. Without efficiently indexed proofs, users either rely on centralized RPCs or run heavy infrastructure. Providing direct, verifiable access without replaying blocks reduces operational cost and strengthens trust guarantees. The applications that benefit most are Layer 2 systems reconstructing L1 provenance and creating fraud-proof challenges (see Nitro), and zero-knowledge systems needing inclusion proofs for older data.
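
In practice, historical account proofs are fetched through the standard eth_getProof JSON-RPC method with a historical block parameter. Below is a minimal sketch of building such a request; the endpoint URL and address are placeholders, not Erigon-specific values:

```python
import json

# Placeholder endpoint: substitute your own node's RPC URL.
RPC_URL = "http://localhost:8545"

def make_proof_request(address, storage_keys, block_number, request_id=1):
    """Build a JSON-RPC eth_getProof request for a historical block.

    eth_getProof returns the account's state plus the MPT branches
    (accountProof / storageProof) needed to verify it against the
    block's state root.
    """
    return {
        "jsonrpc": "2.0",
        "method": "eth_getProof",
        "params": [address, storage_keys, hex(block_number)],
        "id": request_id,
    }

req = make_proof_request(
    "0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe",  # example address
    [],               # no storage slots: account proof only
    15_000_000,       # a block far behind the chain tip
)
print(json.dumps(req))
```

POSTed to an Erigon v3.3 node synced with --prune.include-commitment-history, a request like this should succeed for blocks arbitrarily far in the past.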

Performance Compared to Other Clients

Before v3.3, obtaining historical proofs required one of the following:

  • Running a full archive Geth or Nethermind node, each more than 20 TB in size (Reth only supports proofs for the last 10k blocks).
  • Querying a centralized RPC node that performs expensive on-demand proof reconstruction.

These approaches are either resource-heavy or lead to centralization. With v3.3, Erigon can serve proofs directly from indexed structures, delivering:

  • Consistent low-latency retrieval
  • Predictable resource use

Comparison against other clients

Client | Stores Historical Proofs | Trie Nodes Size (approx.) | p50 Latency (s) | p75 Latency (s) | p90 Latency (s) | p99 Latency (s)
Reth | No | N/A | N/A | N/A | N/A | N/A
Geth | Yes (full archive) | >20 TB | 0.015 | 0.026 | 0.035 | 0.060
Nethermind | Yes (full archive) | >20 TB | 0.015 | 0.022 | 0.030 | 0.058
Erigon v3.3 | Yes (indexed proofs) | 4.1 TB | 0.003 | 0.006 | 0.007 | 0.012

NOTE: Geth is working on a solution to this problem, so the >20 TB figure is not their long-term goal; they are developing an approach expected to produce results comparable to Erigon’s.

Erigon is consistently about 5x faster at serving proofs than both Geth and Nethermind.

2. A gentle technical explainer on how the proof indexing works (tech people only)

Before we talk about the actual indexing, we need to talk about Haystack.

Haystack is Meta’s cold-storage system for photos. The core idea is to separate large, immutable binary objects (the photos) from the small metadata required to locate them. This enables extremely cost-effective sequential disk access for bulk data, while ensuring the indexing layer remains compact and fast. We could try separating these two types of data into two different kinds of file.

Segment Files (Data Files)

These contain the raw objects (photos), written almost entirely sequentially according to some heuristic.
Key properties:

  • Large, append-only, immutable.
  • Optimized for throughput, not random seeks.
  • No metadata inside besides minimal framing.
  • You do not search inside them; you jump to an offset and read.
  • The heuristic for the ordering is up to the implementer (the design is flexible).

Segment files are the “haystack.”

Index Files

Index files store the data structure needed to locate an object inside the seg files:

  • Object key → (seg file ID, offset, length)
  • Small, random-access friendly, designed for memory mapping or SSD.

Index files tell you where to read, seg files contain what to read.

This separation is the key architectural trick, and we build on it by mapping it onto the historical Merkle proof problem.
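
The two-file design above can be sketched in a few lines. This is a toy in-memory model of the idea, not Haystack’s or Erigon’s actual format:

```python
class HaystackStore:
    """Toy sketch of the Haystack idea: big append-only segment files
    for the data, plus a small index mapping key -> (segment, offset, length)."""

    def __init__(self):
        self.segments = [bytearray()]   # one open segment, for simplicity
        self.index = {}                 # key -> (seg_id, offset, length)

    def put(self, key, blob):
        seg_id = len(self.segments) - 1
        seg = self.segments[seg_id]
        offset = len(seg)
        seg.extend(blob)                # sequential append, no seeking
        self.index[key] = (seg_id, offset, len(blob))

    def get(self, key):
        # Small index lookup tells us where to read...
        seg_id, offset, length = self.index[key]
        # ...then one jump into the segment gives us what to read.
        return bytes(self.segments[seg_id][offset:offset + length])
```

Note that the segment never needs searching: every `get` is one index lookup followed by one offset read.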

Mapping This to Erigon’s Proof Indexing

Erigon adopts the same principle: large, immutable data files for trie nodes plus small, query-optimized index files. This is important because files holding very similar-looking data can be kept in separate contiguous regions without mixing them. That allows clever storage optimizations: we can apply any compression we want and tune the heuristic that decides the order in which individual trie nodes are stored in the file (a sorting heuristic).

Index Files

The index in and of itself is not the interesting part. It maps (note that this is a very high-level description; in reality there is a lot of compression at this layer through Elias-Fano encoding and roaring bitmaps, so take a leap of faith and consider the indexing storage overhead minimal):

(block) -> (trie nodes at that state)

Meaning:

  • The index answers which segment contains the specific proof branch given a block number.
  • Actual node retrieval is a single quick offset jump into the corresponding side-node data file.
  • No reconstruction, no re-hashing, no walking the historical trie; just a single lookup in a perfect hash table, which is O(1) ignoring hardware overhead.
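
For intuition, here is a toy version of the Elias-Fano idea mentioned above: low bits stored verbatim, high parts unary-coded as gaps in a bit vector. The production encoding adds rank/select support structures (and roaring bitmaps) on top for fast access, which this sketch omits:

```python
def elias_fano_encode(values, universe):
    """Encode a sorted list of integers (toy Elias-Fano sketch).

    Each value is split into `low_width` low bits, stored verbatim,
    and a high part, unary-coded as the gap to the previous high part.
    """
    n = len(values)
    low_width = max(0, (universe // n).bit_length() - 1)
    low_mask = (1 << low_width) - 1
    lows = [v & low_mask for v in values]
    highs, prev = [], 0
    for v in values:
        h = v >> low_width
        highs.extend([0] * (h - prev))  # unary gap between high parts
        highs.append(1)                 # terminator: one value emitted
        prev = h
    return low_width, lows, highs

def elias_fano_decode(low_width, lows, highs):
    """Walk the high-bit vector, recombining highs with the stored lows."""
    values, h, i = [], 0, 0
    for bit in highs:
        if bit == 0:
            h += 1
        else:
            values.append((h << low_width) | lows[i])
            i += 1
    return values
```

The payoff is that for dense sorted sequences, the unary-coded high parts cost roughly two bits per value regardless of the magnitude of the numbers.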

This is what keeps latency low, but it does not reduce size. The storage savings come from how the data files are laid out.

The Data File (The Storage Optimization)

The data files store all historical trie nodes. Each file contains many nodes, but they are not written in the order the chain encounters them. Instead, Erigon reorganizes them to maximize compression and locality.

The chain is split into fixed-size chunks of transactions:

\text{ChunkSize} = 1{,}500{,}000\ \text{tx}

For each chunk (C_i), Erigon collects all trie nodes touched during execution:

N_i = \{ n_1, n_2, \ldots, n_k \}

These nodes are the input set for the layout and compression pipeline.

Similarity Metric: Hamming Distance on Compressed Nodes

Before sorting, each node is serialized and transformed into a compact encoding:

c(n) = \text{CompressedEncoding}(n)

Similarity is measured on this compressed representation, not on the raw node bytes. For two nodes (n_a) and (n_b):

d_H(c(n_a), c(n_b)) = \sum_{j=1}^{L} \mathbf{1}\big(c(n_a)_j \neq c(n_b)_j\big)

Erigon then orders the nodes in (N_i) so that consecutive nodes are as similar as possible:

\min \sum_{j=1}^{k-1} d_H\big(c(n_j), c(n_{j+1})\big)

A neat property here is that, in practice, similarity behaves almost transitively on these compressed encodings. If

d_H(c(n_j), c(n_{j+1})) \ \text{is small} \quad\text{and}\quad d_H(c(n_{j+1}), c(n_{j+2})) \ \text{is small},

then typically

d_H(c(n_j), c(n_{j+2}))

is also small. Once similar nodes start clustering, that similarity naturally propagates along the sorted sequence, which is exactly what we want for compression.

Importantly, this means:

  • Each data file contains many trie nodes.
  • Nodes are not stored in chain order or “first-seen” order.
  • They are stored in an order that minimizes Hamming distance between neighbors in their compressed form.
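
A minimal sketch of such an ordering follows, assuming a simple greedy nearest-neighbour heuristic. The article does not specify Erigon’s exact sorting algorithm, and finding the true minimum-cost ordering is a travelling-salesman-style problem, so a cheap heuristic is the realistic choice:

```python
def hamming(a: bytes, b: bytes) -> int:
    """Bitwise Hamming distance between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def greedy_similarity_order(nodes):
    """Greedy nearest-neighbour chain: start with the first node, then
    repeatedly append the remaining node closest (in Hamming distance)
    to the last one, so neighbours in the output are highly similar."""
    remaining = list(nodes)
    chain = [remaining.pop(0)]
    while remaining:
        best = min(remaining, key=lambda n: hamming(chain[-1], n))
        remaining.remove(best)
        chain.append(best)
    return chain
```

The near-transitivity observed above is what makes a greedy chain good enough: once the walk enters a cluster of similar nodes, it tends to stay there until the cluster is exhausted.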

Batch Compression

After sorting, the sequence is split into fixed-size batches of 64 nodes:

B_m = \{ n_{64m}, n_{64m+1}, \ldots, n_{64m+63} \}

Compression is applied after batching:

\text{CompressedBlock}(B_m) = \mathcal{C}(B_m)

where \mathcal{C} is the chosen compression function. Because each batch contains nodes that are all close to each other under d_H on their compressed encodings, the entropy within a batch is low, and the compressor achieves very high ratios. Indexing is likewise performed over the compressed representation. Our compression function of choice is ZSTD.
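
The sort-then-batch pipeline can be sketched as follows, with zlib standing in for ZSTD since it ships with Python’s standard library:

```python
import zlib

BATCH_SIZE = 64

def compress_batches(sorted_nodes, level=9):
    """Cut a similarity-sorted list of serialized nodes into fixed
    batches of 64 and compress each batch independently."""
    blocks = []
    for m in range(0, len(sorted_nodes), BATCH_SIZE):
        batch = sorted_nodes[m:m + BATCH_SIZE]
        blocks.append(zlib.compress(b"".join(batch), level))
    return blocks
```

One motivation for fixed, independently compressed batches is random access: reading a single node only requires inflating its own 64-node batch, never a whole file.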

Resulting Layout

For each 1.5M-tx chunk:

  • Every trie node seen in that chunk is stored.
  • Nodes are reordered by similarity on their compressed representation.
  • The reordered list is cut into groups of 64 and compressed batch-by-batch.

The index layer points into these data files by offset, but the files themselves are optimized purely for density and locality. This layout is what allows Erigon to store full historical proofs in about 4.1 TB for trie nodes, instead of the 20+ TB required by archive-style layouts that keep nodes in chronological or naive trie order.

Analysis of the Entropy of Historical Trie Nodes

This section is for the nerdiest readers who want to understand how this approach came to be. The way I figured it out was by looking at the optimal entropy.

What is Entropy?

Entropy measures the average uncertainty or randomness in a dataset. Coined by Claude Shannon in 1948, it quantifies how much information each unit of data carries on average. The higher the entropy, the more “surprised” we are by each new piece of data, and the harder it is to compress.

The Mathematical Definition

Shannon entropy H(X) for a discrete random variable X with possible values {x₁, x₂, …, xₙ} is:

H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

Where:

  • p(xᵢ) is the probability of outcome xᵢ
  • log₂ gives us the result in bits
  • The negative sign ensures entropy is non-negative

Worked examples:

  1. Fair coin (50% heads, 50% tails):

    • H = -[0.5 × log₂(0.5) + 0.5 × log₂(0.5)] = 1 bit

    • Maximum entropy for a binary outcome

  2. Biased coin (90% heads, 10% tails):

    • H = -[0.9 × log₂(0.9) + 0.1 × log₂(0.1)] ≈ 0.47 bits
    • Lower entropy because outcomes are more predictable
  3. Two-headed coin (100% heads):

    • H = -[1.0 × log₂(1.0)] = 0 bits
    • Zero entropy—no uncertainty!

Examples:

  • Random data: ~8 bits per byte (maximum)
  • English text: ~4.5 bits per byte
  • Repetitive logs: ~2-3 bits per byte
  • Cryptographic hashes: ~7.9 bits per byte
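
These figures are easy to reproduce with a one-line implementation of the definition above:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping p == 0 terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin
print(entropy([0.9, 0.1]))   # biased coin
print(entropy([1.0]))        # two-headed coin
```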

Compression Ratio as a way to measure entropy

The most practical way to measure entropy is to compress the data and see how small it gets:

\text{Compression Ratio} = \frac{\text{Compressed Size}}{\text{Original Size}}

\text{Estimated Entropy} \approx 8 \times \text{Compression Ratio}\ \text{bits/byte}

This works because optimal compression approaches the theoretical entropy limit. Real compressors like ZSTD, gzip, or bzip2 give us a practical approximation of the true entropy.

Examples:

Data Type | Compression Ratio | Estimated Entropy
Random bytes | ~100% (no compression) | ~8 bits/byte
English text | ~30% | ~2.4 bits/byte
JSON logs | ~15% | ~1.2 bits/byte
Repeated pattern | ~5% | ~0.4 bits/byte

Why This Works

Compression algorithms exploit redundancy, so the compression ratio directly reflects how much redundancy (the inverse of entropy) exists in your data. If ZSTD can compress your file to 10% of its original size, then 90% of the original was redundant; the true information content was only 10%. Additionally, entropy isn’t just about the data itself but about how the data is represented. The same information can have different entropy depending on how it’s organized.

The trivial way to measure the entropy in our case is to take a 100% unoptimized historical trie layout and see how well ZSTD compresses it under different orderings.
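
This effect is easy to demonstrate on synthetic data. In the sketch below, zlib stands in for ZSTD and pairs of near-duplicate 32-byte blobs stand in for trie nodes; the same records are compressed in a shuffled “first-seen” order and in a similarity-sorted order. Only the representation changes, yet the sorted layout compresses markedly better:

```python
import random
import zlib

random.seed(0)

# Build 4,096 pairs of near-duplicate 32-byte "trie nodes":
# each pair differs only in the final byte.
records = []
for _ in range(4096):
    node = bytes(random.getrandbits(8) for _ in range(32))
    records.append(node)
    records.append(node[:-1] + bytes([node[-1] ^ 1]))

shuffled = records[:]
random.shuffle(shuffled)           # chain-order-like layout
similar_first = sorted(records)    # near-duplicates become neighbours

raw = len(records) * 32            # 8,192 records x 32 bytes = 256 KiB
shuffled_size = len(zlib.compress(b"".join(shuffled), 9))
sorted_size = len(zlib.compress(b"".join(similar_first), 9))

print(f"shuffled: {shuffled_size / raw:.2f} of original "
      f"(~{8 * shuffled_size / raw:.1f} bits/byte)")
print(f"sorted:   {sorted_size / raw:.2f} of original "
      f"(~{8 * sorted_size / raw:.1f} bits/byte)")
```

In the shuffled layout, a record’s near-duplicate twin usually sits outside the compressor’s match window, so most of the data looks random; in the sorted layout the twins are adjacent and each second record collapses into a short back-reference.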

Erigon v3.3: Historical Trie Node Compression Analysis

Introducing Zilkworm, Erigon’s C++ Path to Proving Ethereum
https://erigon.tech/introducing-zilkworm-erigons-c-path-to-proving-ethereum/
Thu, 13 Nov 2025

Scaling Ethereum L1, Proving for L2s

In Ethereum’s proof-of-stake system, finality is achieved when a block is confirmed by at least two-thirds of validators over two consecutive epochs, taking about 12 minutes under current settings. This duration balances decentralization, processing power, and security, though our developers are exploring ways to reduce it to improve efficiency and security for decentralized applications and exchanges. 

Erigon is researching the potential of Zilkworm, a C++ based ZK core for prover backends: a project in the realm of the Ethereum Foundation’s increased focus for scalability, privacy, and decentralization.

// Create block proofs in one command
z6m_prover prove --block-number 23456789 --out proof.json
z6m_prover verify proof.json

As of today, proof generation takes about 100 seconds on a decent GPU (roughly one Nvidia RTX 4090), which is acceptable given Ethereum’s 12-minute finality. The Zilkworm team has established a robust, debuggable foundation, including a functional RISC-V debugger, and is now aggressively driving technical innovations aimed at a 150-200% performance gain.
This new validator model enables consumer-grade hardware to handle increased block sizes by downloading only block headers and proofs, which significantly reduces data processing. In other words, this is a big step towards Ethereum’s goal of 1 Gigagas per second.

This approach also facilitates faster “rollups” with dramatically improved transaction inclusion latency and reduced processing overhead. Furthermore, a new type-1 EVM chain can be easily spun off re-using the same software stack.

Zilkworm marks Erigon’s entry into the space of STARKifying Layer 2 solutions, representing a significant step for the company.

Full ETH-Proofs on ethproofs

Zilkworm is now live on https://ethproofs.org/clusters, proving itself to be one of the fastest proving clients while running on a cheaper machine architecture, with a single 5090 GPU and a modest consumer desktop chip. At the time of writing it is 2x faster than the competition in terms of cycle count!


The C++ Advantage in a Rust-Dominated World

Zilkworm is loosely based on Silkworm, a former Erigon project that made heavy use of battle-tested C++ constructs and libraries to achieve big performance leaps. Its successor continues the journey and aims to scale Ethereum using Zero-Knowledge Proofs (ZKPs). At the moment it is a one-of-a-kind “guest program” running on top of solutions like Succinct’s SP1 Turbo.

Needless to say, the current domain of provers is largely dominated by Rust, and this standout C++ implementation offers much-needed client diversity, a crucial factor for ecosystem resilience.

Furthermore, when it comes to small embedded RISC-V targets (e.g., rv32im), C++’s proximity to the machine hardware creates many opportunities for superior optimizations.

While Rust is often favored for new implementations, Zilkworm’s existing C++ codebase provides a competitive edge in performance, a strong marketing point in the race for efficient ZK solutions.

Strategic Integration and Decentralization with L2s

Zilkworm could potentially integrate with any Ethereum L2 and by eliminating the challenge period, it could provide them with true L1-level security. While decentralizing sequencers presents a revenue challenge for most L2s, the move could attract more Total Value Locked (TVL) and prevent customer loss to competitors. This strategic shift would enhance decentralization, addressing concerns about censorship and centralized control, especially for international entities.

Privacy for Enterprise and Beyond

For enterprises, we are looking at applications with Zilkworm as an offering for privacy-preserving blockchain. Businesses can maintain internal accounting and transactions privately, submitting only proofs of state transitions to Ethereum and vice-versa: call it a private L2. This enhances public trust, enables interoperability and integrates with the Ethereum smart contracts’ existing ecosystem for assets.

Broader Applicability and Future Outlook

Zilkworm’s applicability extends beyond Ethereum to other EVM blockchains. Zilkworm can thrive in partnership with various RISC-V prover SDKs and solutions, offering a superior alternative to existing solutions like Revm (and others in that line). With mainnet proofs expected to reach beta or production release soon, Zilkworm is poised to become a market leader in this domain.

Join us in our mission to make ZK more accessible for the Ethereum protocol!

Visit https://zilkworm.erigon.tech/ for more information.

 

Partnership for Progress
https://erigon.tech/partnership-for-progress/
Mon, 27 Oct 2025

Erigon is at the forefront of Ethereum client optimization, uniquely committed to providing a unified staking experience as the only EVM client to integrate both the Execution Layer and a Consensus Layer (CL), Caplin. This unification eliminates the need for stakers to run and manage separate CL software, streamlining operations and significantly enhancing performance. This cohesive architecture is key to Erigon’s promise of superior network robustness and security.

To rigorously test this innovative, comprehensive solution, Erigon has established a critical partnership with Gateway.fm.

The Imperative of Early Testing

An innovative solution of this caliber requires extensive testing under real-world, high-stakes conditions to ensure that new features are battle-tested before general release.

Gateway.fm has been a key test partner since the early beta and release-candidate stages, pioneering the use of Erigon + Caplin for the Ethereum Foundation’s Client Incentive Program validators since early spring and for Lido Curated Module (CM) validators since May.

By meticulously monitoring validator health and performance metrics on these early release candidates, Gateway.fm contributes directly to the stability and resilience of the Erigon + Caplin client and the broader Ethereum network.

With Erigon + Caplin, Gateway.fm now secures more than 4,000 ETH, representing one-third of its Ethereum staking portfolio.

Commitment to Robustness and Security

The unique architectural benefits of the Erigon and Caplin pairing, such as optimized performance, faster block propagation, and superior throughput, are detailed further in our recent article Benefits of Caplin and Erigon for Staking.

This partnership signifies Erigon and Gateway.fm’s shared commitment to operational excellence. By focusing on stringent testing protocols, particularly for new client versions and integrated CL features, this alliance provides crucial feedback loops that enhance the reliability of the global Ethereum staking infrastructure. This partnership exemplifies how specialized client development and expert operational insight can secure and decentralize the network.

 

An Update on Our Support for Polygon
https://erigon.tech/an-update-on-our-support-for-polygon/
Fri, 03 Oct 2025

We are writing to share an important update regarding Erigon’s support for the Polygon network.

We have made the decision to formally discontinue official support for Polygon. This means that moving forward, we will no longer be addressing issues or implementing fixes specifically related to the Polygon network.

The last Erigon release supporting Polygon is v3.1.0 Pebble Paws.

Why the Change?

This wasn’t a decision we took lightly. Our top-tier support for the Polygon network was previously funded by a grant. However, following Polygon’s strategic decision not to renew our collaboration, we are now focusing our resources and efforts on delivering a superior and stable experience for the other chains and projects we support.

What This Means for Erigon’s Users

  • Snapshots: We will continue to take snapshots of Polygon data until October 31, 2025, to facilitate a necessary transition period for data finalization.
  • Existing Issues: If you encounter any issues on the Polygon network after this announcement, we strongly encourage you to direct your inquiries and requests to the Polygon support channels. They are the authoritative source for resolving any operational problems on their network.
  • Users of Polygon who were utilizing Erigon to run their nodes are advised to migrate to Polygon’s Erigon fork, available at https://github.com/0xPolygon/erigon/releases.

We appreciate your understanding as we make this necessary shift to ensure the continued quality and reliability of Erigon. We remain committed to providing robust support for our officially supported networks and look forward to continuing to build and innovate with our partners.

Introducing Erigon Nitro for Arbitrum Sepolia
https://erigon.tech/introducing-erigon-nitro-for-arbitrum-sepolia/
Mon, 22 Sep 2025

In mid-May of this year, Erigon and the Arbitrum Foundation announced a partnership aimed at providing the ecosystem with an alternative client implementation for cost benefits, client diversity, and operational improvements. The goal is to make it easier and more affordable for developers and infrastructure providers to run and scale nodes, benefiting the entire Arbitrum community. We are happy to share the first details on our progress.

How Erigon 3 Enhances Scalability on Arbitrum

Erigon 3 is built with a transaction-first architecture, processing the chain at the level of individual transactions rather than whole blocks. This makes it a natural fit for Arbitrum, where the sequencer streams transactions rather than pre-assembled blocks. After execution, a transaction can emit additional transactions that must be executed or scheduled before the next “block start” transaction arrives. 

Looking ahead, this approach could pave the way for new scaling techniques, smarter scheduling algorithms, and finer-grained parallelism, further enhancing the network’s efficiency and performance.

Archive Node made smaller than state growth

At ErigonTech, we are committed to delivering robust results. Our expertise lies in compressing extensive data into compact, laptop-optimized workloads, making us a leader in the field. The Arbitrum integration is built on top of the Erigon client, inheriting its data model, the improved staged-sync execution pipeline at its core, and the snapshot delivery mechanism known as OtterSync. The integration process started with the Arbitrum Sepolia testnet.

We estimate that state growth for Nitro Sepolia rollup snapshots is approximately 700 GB per month (source), while Arbitrum One’s is about 850GB per month (source).

Recent execution results show that our prototype Erigon Nitro running an Arbitrum Sepolia archive node shrank from 12TB (source) to 732GB, a 94% reduction, making its size smaller than the actual monthly state growth. A minimal Erigon Nitro Sepolia node with no history (set using the --prune.mode=minimal option) requires 208 GB of storage. Arbitrum One development is currently underway, and we anticipate comparable results on mainnet.

These small snapshots and the embedded OtterSync feature should allow node operators to bootstrap an Arbitrum Sepolia archive node in a matter of hours, depending on network throughput. In case of node failure, we’ve got your back: it is no longer necessary to perform a full resync. Using Erigon tooling, administrators can reset the most recent execution progress and start over from a recent block.

Our Roadmap to Full Arbitrum Support

We are working hard to deliver full Arbitrum support:

  • 2025 Q4: Stable Sepolia-rollup RPC and execution client.
  • 2026 Q1: Basic support for Arbitrum One.
  • 2026 Q3: Full prover support, completing our Arbitrum integration.

While this is a v0.1 alpha release, we encourage you to test it on the Sepolia testnet and provide feedback on our Discord. Your input is crucial for us to refine our solution and ensure a seamless experience. Visit our HackMD for technical instructions on how to run an Erigon Nitro node for Arbitrum Sepolia.

How I reduced a Polygon archive node size by +900GB
https://erigon.tech/how-i-reduced-a-polygon-archive-node-size-by-900gb/
Fri, 19 Sep 2025

Introduction

This article describes an optimization made in Erigon 3.1 which significantly reduced the size of internal indexes. But before diving into our recent findings, let’s first give the reader some context on how indexing in Erigon has evolved over time.

From block to transaction granularity

In Erigon 2 the indexes had block granularity. That means that when indexing, for example, the occurrences of a log topic, the index stores the block numbers where the event happened.

In that case, if the user asks the node for occurrences of that particular log topic, the database answers with a list of block numbers, and the code then needs to traverse all the individual transactions inside those blocks to finally find the matches.

The indexes are stored in a data structure called a roaring bitmap, which offers a very compact way to store runs of consecutive numbers. Block-granularity indexes benefit directly from that property, as popular contracts appear in nearly every block.

However, block granularity doesn’t scale well, as block size varies among chains. Chains with short intervals between blocks have smaller blocks, while chains like Ethereum mainnet, with its 12-second block time, have more transactions per block.

Enter Erigon 3, where all indexes have transaction granularity. That means every index pinpoints exactly the transaction where the event occurred.

In Erigon 3 the roaring bitmaps were replaced by Elias-Fano, another encoding format for number sequences. It performed so well that even with finer index granularity (hence more data being indexed), Erigon 3’s total disk size is smaller than Erigon 2’s.

That was the result of +3 years of research done by the Erigon team.

💡 Note: there are other changes made in Erigon 3 that have also contributed to the smaller disk size.

In order to establish a baseline, here are the disk space requirements of Erigon 3 up to chain tip (Sep/2025):

Can we do even better?

I started studying the new index format of Erigon 3, and while looking at production data I realized improvements could be made to shrink the index size even further.

This article won’t describe how Elias-Fano encoding works; there are plenty of articles out there. Suffice it to say that it incurs some per-key overhead to store the support data structures that allow fast get/search operations inside the index.

That means that for very short sequences it takes less disk space to NOT use Elias-Fano and simply store the raw numbers instead. Now, how often do these short sequences occur in real blockchain data? Let’s take a look at Ethereum mainnet data (as of Sep/25).

First, the number of index keys grouped by how many transaction matches it has (entries with >16 matches are grouped):

Second, the disk usage of each key index, grouped by the number of transaction matches:

The reader shouldn’t bother trying to understand the details, because they relate to Erigon’s internal data structures. For the same reason, they don’t necessarily map 1:1 to other client implementations.

The important data point here is: most of the keys in production data index only 1 or 2 transactions and they correspond to the majority of disk usage (~360GB of index values).

It means that NOT using Elias-Fano for such cases results in large disk-space savings. In fact, the next chart shows a simulation where indexes with <16 transaction matches are naively encoded as a simple concatenation of their raw values (uint64):

The naive solution would reduce the disk usage from over 360GB to 150GB, almost a 60% reduction. In the end I implemented an even more optimized encoding by taking advantage of the file format Erigon 3 is using, and the final index size is ~98GB.
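
The trade-off can be captured with a back-of-the-envelope cost model. The Elias-Fano cost formula and the per-key overhead constant below are illustrative assumptions, not Erigon’s actual figures:

```python
import math

def raw_size_bytes(n):
    """Raw encoding: just n uint64 values concatenated, no per-key overhead."""
    return 8 * n

def elias_fano_size_bytes(n, universe, per_key_overhead=64):
    """Approximate Elias-Fano cost: ~n * (2 + log2(universe / n)) bits,
    plus a fixed per-key overhead for the support structures enabling
    fast get/search. The 64-byte overhead is an assumed figure."""
    if n == 0:
        return per_key_overhead
    low_width = max(0, math.floor(math.log2(universe / n)))
    bits = n * (2 + low_width)
    return per_key_overhead + (bits + 7) // 8

universe = 2_000_000_000  # rough order of mainnet transaction count
for n in (1, 2, 8, 16, 1000):
    raw, ef = raw_size_bytes(n), elias_fano_size_bytes(n, universe)
    winner = "raw" if raw < ef else "elias-fano"
    print(f"n={n:5d}  raw={raw:6d}B  ef={ef:6d}B  -> {winner}")
```

Under these assumptions, raw concatenation wins for tiny posting lists while Elias-Fano wins decisively once sequences grow, which mirrors the <16-matches cutoff used in the real implementation.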

On Polygon (bor-mainnet) the disk savings were even bigger, over 900GB, resulting in the entire archive node going from 5.8TB to 4.9TB. For the record, here are the index values histogram of bor-mainnet on Erigon 3.0 (~1.3TB):

And here is the optimized simulation of it (~290GB):

Result: much smaller archives in Erigon 3.1

All indexes were regenerated on Erigon 3.1 and the table below shows the gains per supported chain (only index files):

Chain | Erigon 3 | Erigon 3.1 | Savings
Ethereum Mainnet | 504GB | 252GB | -252GB (-50%)
Gnosis | 179GB | 102GB | -77GB (-43%)
Polygon | 1,900GB | 970GB | -930GB (-49%)
Sepolia Testnet | 142GB | 72GB | -70GB (-49%)
Holesky Testnet | 101GB | 45GB | -56GB (-56%)
Hoodi Testnet | 12GB | 6GB | -6GB (-50%)

 

And here are the differences considering the entire node size (note: the savings column doesn’t match exactly the previous table because there are other small differences other than indexes between the compared nodes):

Chain | Erigon 3 | Erigon 3.1 | Savings
Ethereum Mainnet | 2,050GB | 1,770GB | -280GB (-14%)
Gnosis | 600GB | 539GB | -61GB (-10%)
Polygon | 5,780GB | 4,850GB | -930GB (-16%)
Sepolia Testnet | 863GB | 780GB | -83GB (-10%)
Holesky Testnet | 342GB | 277GB | -65GB (-19%)
Hoodi Testnet | 58GB | 53GB | -5GB (-9%)

 

Comparison of Erigon 3.1 (Ethereum mainnet) vs other clients* **:

* Reth snapshot from Merkle, image from 2025-09-01. Geth using the path-based archive mode. Erigon 3/3.1 synced up to mid-09/2025.

** Special thanks to Chase Wright who provided data on Reth/Geth from his nodes.

Final thoughts

Erigon 3.1 reaffirms its position as the most space efficient archive node implementation. The state snapshots introduced in Erigon 3.0 established a new foundation enabling a whole new set of possible optimizations.

I’m confident there are many more optimizations waiting to be uncovered as we continue to study production data patterns more closely. Looking forward to bringing more disk space savings in the future!

Erigon 3.1 Pebble Paws
https://erigon.tech/erigon-3-1-pebble-paws/
Wed, 17 Sep 2025

We are thrilled to announce the release of Erigon 3.1.0, a major update packed with new features and optimizations. For users, validators, and node operators, this release brings significant improvements to stability, performance, and overall efficiency, ensuring a smoother and more reliable experience.

Faster, Smoother, and More Reliable

Receipts and Predictable Latency: Erigon 3 was designed to be lean by regenerating transaction receipts on demand, but this could cause latency spikes for high-demand applications. With Erigon 3.1.0, we’re introducing the --persist.receipts flag, which is now enabled by default for full and minimal nodes. This downloads pre-calculated receipts, ensuring consistent and predictable latency, meaning a more reliable experience when interacting with the network, especially during high-traffic periods, and up to 10x faster RPC.

Upgrades and Maintenance Made Easy

Managing a validator or RPC node shouldn’t be a chore. We’ve introduced new tools to simplify the upgrade process and reduce downtime.

  • Upgrading Data Without a Full Re-sync: The new erigon snapshot reset command upgrades your data and triggers a resync without throwing away snapshot files that haven’t changed. This drastically cuts down on the time and resources needed to get a node back online after an update.
  • Efficient Disk Usage: We’ve managed to cut the size of our .ef files by 50%. This means less disk space is needed to run a node, making it more cost-effective and easier to manage. The storage size of Ethereum mainnet archive nodes has decreased from 2.05TB to 1.77TB, and Polygon nodes have seen a reduction from 5.78TB to 4.85TB. More details on how the Erigon team achieved it here.
  • Disk I/O during file merges has been notably reduced, thanks to a more efficient data writing process that minimizes redundant writes and improves overall merge speed.
  • Smarter Snapshot Data Downloader: A new webseeding algorithm reduces network overhead and load on nodes. P2P uploading has been improved to allow higher speeds.
  • Snapshot Downloader Reliability and Data Management: A common issue in Erigon 3.0 where nodes would halt if a snapshot set became inaccessible has been resolved. The new downloader allows nodes to automatically find the latest snapshot set upon restart, preventing them from getting stuck. The file system state has been simplified for better visibility and debugging. Read-only flags are now applied to files once downloaded to help prevent accidental corruption. 
  • Improved Logging and Control: Logging and status output have been significantly improved to provide better visibility into sync speed, estimated completion time, and reasons for any stalls. There’s also a new flag that allows users to control web seeding and peer-to-peer download speeds separately, offering more granular control over data sources. These combined improvements should lead to less downtime for central services and a more consistent syncing experience for users.
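A sketch of the upgrade flow described in the list above. The snapshot reset subcommand name comes from the release notes; the data directory path and surrounding flags are assumptions for illustration — check erigon snapshot --help on your build for the exact syntax.

```shell
# Reset chain data while keeping unchanged snapshot files
# (hypothetical flags; verify against `erigon snapshot --help` for your version)
erigon snapshot reset --datadir=/var/lib/erigon

# Restart the node: it resyncs, reusing the snapshot files that did not change
erigon --datadir=/var/lib/erigon --chain=mainnet
```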

Shutter Network Enhancements

A key highlight of this release is the official support for Shutter’s encrypted mempool on Gnosis Chain. The Shutter Network offers a solution for an encrypted transaction pool through the use of threshold encryption, known as an encrypted mempool. This technology safeguards users from harmful MEV attacks (front-running and sandwich attacks) and real-time censorship.

This is a significant step forward for validators on Gnosis. It allows them to participate in a network designed to prevent malicious MEV attacks and censorship, and provides them with access to extra transactions. These shielded transactions are not available in the public transaction pool, potentially leading to higher block rewards for validators.

You can read more about Shutter’s integration on Gnosis Chain at https://blog.shutter.network/shutterized-gnosis-chain-is-now-live. For instructions on how to run a Shutterized validator using Erigon, head to our docs website.

Polygon support

Pebble Paws will be the last Erigon release series to officially support Polygon.

See also GitHub Release Notes.

]]>
https://erigon.tech/erigon-3-1-pebble-paws/feed/ 0
Erigon Joins the Arbitrum Ecosystem https://erigon.tech/erigon-joins-the-arbitrum-ecosystem/ https://erigon.tech/erigon-joins-the-arbitrum-ecosystem/#respond Wed, 14 May 2025 14:02:59 +0000 https://erigon.tech/?p=2457 press release: Erigon Joins the Arbitrum Ecosystem

Erigon is proud to announce a new collaboration with the Arbitrum Foundation. As part of this partnership, Erigon will begin developing a high-performance execution client for the Arbitrum ecosystem, contributing to a more decentralized, scalable, and efficient infrastructure.

This marks an exciting step in our mission to deliver modular and optimized software for the Ethereum ecosystem, now extended to one of the most widely adopted Layer 2 networks.

“Collaborating with Arbitrum is a natural extension of our work,” said Giulio Rebuffo, CTO of Erigon. “We’re excited to contribute to Arbitrum’s technical diversity and help shape the future of Ethereum scalability.”

To learn more about this initiative and the broader effort to expand client diversity on Arbitrum, read the official announcement:

🔗 blog.arbitrum.io/erigon-and-nethermind-join-arbitrum

]]>
https://erigon.tech/erigon-joins-the-arbitrum-ecosystem/feed/ 0
Introducing Named Releases https://erigon.tech/introducing-named-releases/ https://erigon.tech/introducing-named-releases/#respond Sat, 12 Apr 2025 10:21:58 +0000 https://erigon.tech/?p=2413 With the release of Erigon v3.0.1, we’re introducing something new: for the first time, our version comes with a name — Otterly Odyssey. This marks a small but meaningful evolution in how we present our software. From now on, each major version of Erigon will be accompanied by a distinctive name.

These names are not just labels — they’re a way to make each release more recognizable, memorable, and easier to reference in the community. Patch versions (like 3.0.2, 3.0.3, etc.) will continue under the same name as their parent major release.

This new approach reflects a broader goal: to make Erigon not only a high-performance Ethereum client, but also an accessible and user-friendly piece of infrastructure — one that developers, validators, and ecosystem partners can adopt and talk about more easily.

A Brief History of Erigon Versions:

Erigon has come a long way since its origins as Turbogeth. In the early days, releases followed a date-based versioning scheme: v2020.09.01, v2020.11.02, v2021.01.01 — simple and efficient, in line with the project’s early focus on performance and rapid iteration.

In mid-2021, the project officially transitioned to the name Erigon, signaling a new phase in its development. The versioning remained consistent, continuing with releases like v2021.06.01, v2021.07.02, and so on. These releases were frequent and pragmatic, reflecting the steady refinement of the client.

Starting in late 2021, Erigon adopted a semantic versioning format — v2.0.0 marked this transition — which helped standardize expectations around backward compatibility and feature progression.

Now, in 2025, with the release of v3.0.1 Otterly Odyssey, we’re entering a new chapter — one where each major version comes with a unique identity.

We believe this evolution in naming will help foster a stronger connection between our work and the community that uses it. Names give personality to software, and serve as useful anchors in conversations, documentation, and development roadmaps.

We’re excited to continue this journey, and we’re already looking forward to what the next named release will bring.

]]>
https://erigon.tech/introducing-named-releases/feed/ 0
Superchains and Unified Portal Network https://erigon.tech/superchains-and-unified-portal-network/ https://erigon.tech/superchains-and-unified-portal-network/#respond Tue, 08 Apr 2025 16:39:11 +0000 https://erigon.tech/?p=2404

Video Transcript:

Today I will explain in more detail what optimistic rollups are, what is the idea of rollup superchains, and then I will propose an alternative to the current direction of development of superchains.

To prepare this program I have used my conversations with Grok 3, and specifically DeepSearch and Think modes. So far it works rather well. I am not, of course, just going to read you the results of my queries. All the information went through my own brain, challenged and refined. However, I could still get things wrong, and I would appreciate if you let me know if I did.

First of all, I need to tell you what kind of Superchains I will be talking about. I am referring to the OP Superchain, where OP stands for Optimism. Optimism is a software platform for optimistic rollups that use Ethereum as the parent blockchain, in other words, as Layer 1, or L1, with the optimistic rollups themselves being Layer 2, or L2. The best-known instances of the Optimism “stack” are OP Mainnet and Base (run by Coinbase). Optimistic rollups have their own transactions, their own blocks, and their own state. Sequencers are the computers in the rollup networks that collect transactions from the users, form blocks, and calculate state modifications and state commitments. These sequencers periodically generate and submit transactions to L1. These transactions serve two main functions. Firstly, they invoke a special smart contract on L1, which is associated with the rollup, and insert an L2 state commitment (a Merkle tree hash) into a data structure of that smart contract. These L2 state commitments, existing in L1 data structures, allow Merkle proofs about elements of L2 state to be verified within L1 transactions. The most often cited use case for this is a withdrawal of some tokens from L2 into L1. In that case, what is being proven is the disposal of the tokens on L2, so the corresponding number of tokens can be released on L1. The second function of such periodic L1 transactions generated and submitted by the L2 sequencers is to publish batched L2 blocks. Why is this needed? There are two situations in which those batched blocks are useful. The first situation may arise in both optimistic and non-optimistic rollups. Imagine L2 sequencers keep collecting transactions, forming blocks, and generating and submitting L1 transactions with state commitments, but fail to distribute the blocks that they form (effectively keeping them secret, or unavailable), and the state changes.
If there were no batched blocks published on L1, no one except the L2 sequencer operators would be able to generate any correct Merkle proofs about L2 state, because without knowing the blocks, they cannot re-execute transactions and recalculate the state and its Merkle tree. The second situation where batched blocks are useful can only theoretically arise in optimistic rollups. In optimistic rollups, when we look at it from the perspective of an L1 user wishing, for example, to prove something about L2 state, let’s say for a withdrawal, we assume that any L2 state commitment posted by the L2 sequencer is initially correct. It may not be correct, because the correctness is not explicitly proven. Instead, any such L2 state commitment may not be used for proving anything important for a certain period of time (currently 7 days). It is assumed that if the L2 blocks were executed incorrectly, and the published L2 state commitment corresponds to the incorrectly computed state, then someone will find out and initiate a fault proof procedure (this used to be called a fraud proof, but the name was probably changed for legal reasons). During this procedure, which happens in the form of transactions on L1, information from the batched blocks is used to interactively prove (or not) that the fault has occurred. This proving interaction can go down to the level of individual EVM opcodes. For this reason, it is a requirement that the EVM instruction set on L1 is a superset of the EVM instruction set of the L2 rollup. There are a lot of questions about the practicality of this approach, but I will not discuss them now.
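To make the role of these state commitments concrete, here is a minimal sketch of Merkle proof generation and verification over a simple binary hash tree. This is an illustration of the general technique only: Ethereum actually commits state via a Merkle Patricia Trie with Keccak-256, not the SHA-256 binary tree used here, and all helper names are invented for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Hash function for the tree; Ethereum's MPT actually uses Keccak-256
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Build the tree bottom-up; duplicate the last node on odd-sized levels
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    # Collect the sibling hash at each level, noting whether it sits on the left
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Recompute the root from the leaf and the sibling path
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The point of the construction is that a verifier holding only the root (the commitment posted to L1) can check a claimed leaf with just a logarithmic-sized sibling path, without re-executing anything.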

From the discussion of when the blob data submitted from rollups are useful, we can make an observation that in non-optimistic rollups, these blobs serve only as a mitigation against potential data unavailability. If such mitigation is removed, we arrive at a variant of the non-optimistic rollup design called Validium. What the L2 optimistic sequencer is claimed to be able to prove interactively, the L2 non-optimistic sequencer proves non-interactively. An added consequence of this is that non-optimistic rollups may use any execution environment, or virtual machine, even one not related to the EVM. An example is StarkNet, which uses the Cairo VM.

Let us now come back to optimistic rollups. Let us look at smart contracts on L1 and related smart contracts on L2. For example, these could represent a token with the same ticker, but existing on these two separate blockchains. In order to make such tokens move from L1 to L2 and back, one may employ some mechanism for reliable message passing between these smart contracts, so that tokens locked or destroyed on one blockchain can release or create the same amount of tokens on the other. This message passing needs to be reliable in the sense that it is possible to know when the action that generated the message is final. For optimistic rollups, such a message passing mechanism consists of two mechanisms: one for L1 to L2 message passing, another for L2 to L1 message passing. In order to enable L1 to L2 message passing, the L2 sequencer includes L1 block hashes into L2 blocks. These L1 block hashes serve as anchors for proofs. In order to pass a message from L1 to L2, one needs to invoke a smart contract that emits a log event. Log events are added to the receipt of the L1 transaction where this invocation happened. In turn, the receipts of all transactions in an L1 block are placed into a Merkle tree, and its root hash is inserted into the block header. Therefore, any emitted event can be proven using the L1 block hash as an anchor. This proof would need to happen within a transaction on L2. The message is considered to be locally-safe at this point. If the L1 block that was used in the proof reverts, then the corresponding L2 block will also revert. Therefore, locally-safe messages can be relied upon only within the L1-L2 system itself, by passing messages further within the same L2, or back to L1. But if one wants to rely on this message to perform an action outside the L1-L2 system, then one either takes on the risk of reversion, or has to wait for L1 finality.

Passing messages back from L2 to L1 is more complex: it takes longer, and it costs more, at least for optimistic rollups. This is, of course, due to the fault proof challenge period. The message is initiated by a transaction on L2 which records it within a special smart contract. Then, once the next L2 state commitment is submitted to the L1 rollup contract, this state commitment (or any subsequent commitment) can be used as an anchor to prove that the message on L2 has been registered. It is done by a transaction on L1, which submits the Merkle proof. The Merkle proof has to include the path to the L2 messaging contract and then to the specific storage item containing the message. At this point, the message is considered locally-safe, only for the communication with that particular L2 rollup that sent it. This means that L1 smart contracts can react and send a message back to that L2 rollup, which would also be locally-safe. However, the delivery can only be considered final once the fault proof challenge period has elapsed (7 days) and the block in which the L2 state root was first submitted is finalised. Usually, finalisation happens much earlier.

I noted before that a message sent from L2 to L1 is locally-safe pretty soon, but only for responding to the same L2. If the message has to be relayed to a different L2 via L1, this locally-safe property is not very helpful, since we assume that different optimistic rollups generally experience faults independently of each other.

To summarise, there are two techniques for sending messages between the blockchains, one of which is L1 and the other is L2. The first technique is based on event logs, and it requires the receiving blockchain to incorporate all the block hashes of the sending blockchain. The second technique is based on contract storage used as an outbox, and it requires the receiving blockchain to incorporate the state commitments of the sending chain regularly, but not necessarily for each state change. If we look at the requirements on the receiving end, it becomes clearer why the event log based technique is used for sending messages from L1 to L2, and the outbox technique is used for sending messages from L2 to L1. It is assumed that the L2 blockchain progresses quicker, so it is considered more practical for L2 to incorporate all block hashes of L1, and not the other way around. Even though the second technique, the outbox-based technique, relies on the state root and not on the block hash to prove the existence of a message, there are still log events generated on the sending blockchain. This is a subtle point, and it has to do with the problem of the incomprehensible state of Ethereum. Observing the storage of the outbox contract does not generally make it possible to figure out what messages we are looking at. They would be some kind of key-value mapping, but the keys are all scrambled by Keccak hashing when the Solidity compiler spreads the mapping across the storage. So one either has to use retracing, which is problematic, or emit log events containing the necessary information to find the correct location in the outbox storage without retracing.
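To illustrate why the outbox storage looks scrambled, here is a sketch of how Solidity derives the storage slot for a mapping entry: the slot is the hash of the key concatenated with the mapping's declared slot. One caveat, flagged in the comments: Python's standard library only exposes standardized SHA3-256, used here as a stand-in for Ethereum's Keccak-256 (the two differ only in a padding byte), so real tooling must substitute an actual Keccak implementation.

```python
import hashlib

def keccak_stand_in(data: bytes) -> bytes:
    # CAUTION: hashlib.sha3_256 is standardized SHA3, not Ethereum's Keccak-256;
    # the two differ only in a padding byte. Real tooling must use Keccak-256.
    return hashlib.sha3_256(data).digest()

def mapping_slot(key: int, base_slot: int) -> int:
    # Solidity stores mapping[key] (for a mapping declared at base_slot) at
    # keccak256(pad32(key) ++ pad32(base_slot)), which scrambles the keys
    data = key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")
    return int.from_bytes(keccak_stand_in(data), "big")
```

Because each slot is the output of a hash, scanning the outbox contract's storage reveals nothing about the original keys — which is exactly why the outbox pattern also emits log events pointing at the right locations.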

What, then, is the superchain and what does it do? On the surface the idea is simple, but there are important nuances that are somewhat subtle. Several optimistic rollups, all based on the Optimism platform, join a common system, called a Superchain, with the goal of making it possible for smart contracts on these rollups to pass messages between each other without the intermediation of L1, and therefore quicker. One of the example use cases is the same: the transfer of tokens by locking or destroying them on the source rollup, then delivering the message to the destination rollup, and then unlocking or creating the same amount of tokens there.

How is this done? The second technique is used here, based on the outbox contract. And, as mentioned above, there are log events that are important for anyone who wants to prove the receipt of the message, because those log events contain the location of the message within the scrambled storage of the outbox contract. Given that the second technique is used, we can see two requirements. Firstly, all rollup blockchains participating in the superchain must regularly incorporate the state commitments of the other blockchains from which they wish to be able to receive messages. To achieve this, the sequencers of the participating rollups form a p2p mesh network, where they exchange their state commitments and incorporate those into their blocks. The configuration of such a mesh network is governed by multisig smart contracts. Secondly, in order for anyone to be able to prove the receipt of the messages, they need to have access to the stream of log events from the sending blockchains. For manual, ad-hoc operation, this can be arranged by running an extra node of the sending blockchain. But for automation, the component called OP Supervisor has been developed. In order to operate, it needs to be connected to the nodes of the sending L2 rollups, to listen for relevant log events and store them in a database.

Superchain documentation introduces the term “cross-safe” to describe the status of messages that were passed between two different optimistic rollups participating in the superchain, such that their dependencies are committed to L1. It is similar to the locally-safe status we discussed before, but a bit more confusing. In order for such a message to be relied upon with finality, one has to watch out for fault proofs not only on the receiving chain, but also on the sending chain, because either of them can invalidate the message. Then, of course, the receipt of the message may trigger another message to a third L2 chain (a token transfer across the superchain, for example), and so on. If cross-safe messages are generally trusted, and one of the participant chains experiences a fault, and this fault is challenged successfully, the effects may be felt on many participating chains. This is different from trusting locally-safe messages, where a fault leads to problems only within one chain.

All in all, superchains do bring this new functionality of cross-chain communication without the involvement of L1. The cost of such communication is in fact lower, because there is no need for L1 transactions at all. The latency is also reduced, but not because the whole challenge period of 7 days is cut out. If one still wants the same assurance brought by waiting for the complete 7-day period, one still needs to wait for it. Cross-safe messages are only as safe as locally-safe messages were before.

Now let’s quickly discuss the infrastructure. As we saw, L2 sequencers need to be connected via a special p2p mesh network. Supervisors have to be connected to RPC nodes from all participating networks. More likely than not, this means that the participating rollups will end up never implementing decentralised sequencers, unless this infrastructure is changed. And this is where my proposal comes in. In one of my previous programmes, I mentioned the Portal Network project. It has subnets that create content-addressable distributed storage for headers, blocks, state, receipts, and so on. In other words, they can be used to provide the kind of information the superchain sequencers require from one another, and the information the Supervisors require from the sequencers. The proposal is to enable Portal Network clients to simultaneously participate in the subnets for multiple blockchains, in our example, in the subnets for all Superchain participants. I did not go into the details of such a design, though, because I think it is unlikely that such a design will ever be required.

Original video on the Alexey Akhunov Monoblunt channel on Telegram: https://t.me/monoblunt/132

Original video transcript on the Alexey Akhunov Monoblunt channel on Telegram: https://t.me/monoblunt/133

]]>
https://erigon.tech/superchains-and-unified-portal-network/feed/ 0