feat(evm): add database with separate storage for block data #4485

DracoLi · 2025-11-10T14:13:31Z

Why this should be merged

Adds a ethdb.Database implementation that moves EVM block data out of the shared KV store into dedicated height-indexed databases optimized for block storage, reducing KV store size and compaction cost.

Headers, bodies, and receipts are stored in heightindexdb; all other data remains in the underlying KV store.
A decrease in total LevelDB/PebbleDB size and total compaction time is expected. Detailed metrics will be tracked in Benchmark C-Chain with height-indexed db vs leveldb-only blocks #4523.

How this works

The blockdb.Database wraps ethdb.Database and routes block data by key prefix, while leaving non-block data in the KV store.

Header/body/receipt keys for blocks with height ≥ the configured minimum are written to height-indexed DBs; below that threshold they use the KV store.
At most one block per height is stored. Writing at an occupied height overwrites the previous value and is not an intended use case. Deletes of block data are no-ops.
If a block isn’t yet in the height-indexed DBs and migration isn’t complete, reads fall back to the KV store.
Block-related keys bypass the KV batch and write directly to the height-indexed DBs; non-block keys are batched on the underlying KV store.
Deferred init: With allowDeferredInit, initialization can wait until the minimum block height is known (e.g., when state sync is enabled). Database min height is persisted and cannot be changed without recreating the block databases.

Migration

Canonical block data is migrated from the KV store to the height-indexed databases in the background.

Migrates headers, bodies, and receipts for canonical blocks; genesis is skipped.
During migration, corresponding KV entries are deleted in batches; header/body/receipt ranges are periodically compacted.
Periodic logs report status and ETA.
Migration can be paused and resumed safely.
Migrating ~71M blocks on a Mainnet node took ~5 hours.

How this was tested

Unit tests for routing and migration behavior.
Two Mainnet nodes (with state sync enabled and disabled) with blockdb enabled running for >4 weeks.
C-Chain migrations at height 1M/10M/20M via reexecution tests and live-node migrations (including one with ~71M blocks).
A script comparing block data via EVM JSON-RPC between a blockdb node and a non-blockdb node to verify block data.

Need to be documented in RELEASES.md?

No

Copilot

Pull request overview

This PR introduces a new blockdb.Database implementation that separates EVM block data (headers, bodies, receipts) from the shared KV store into dedicated height-indexed databases. This optimization aims to reduce KV store size and compaction overhead.

Key changes:

Implements routing logic to store block data at/above a minimum height in height-indexed databases while maintaining backward compatibility via fallback reads
Adds background migration to move existing canonical block data from the KV store to height-indexed databases
Provides deferred initialization support for scenarios like state sync where the minimum block height isn't initially known

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
vms/evm/database/blockdb/database.go	Core database implementation with key routing, read/write operations, and batch support
vms/evm/database/blockdb/migrator.go	Background migration logic with progress tracking, compaction, and pause/resume capabilities
vms/evm/database/blockdb/database_test.go	Comprehensive tests for database operations, initialization scenarios, and edge cases
vms/evm/database/blockdb/migration_test.go	Tests for migration completion, resumption, and data integrity during migration
vms/evm/database/blockdb/helpers_test.go	Test utilities for creating blocks, comparing data, and controlling migration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

vms/evm/database/blockdb/migrator.go

vms/evm/database/blockdb/database.go

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

vms/evm/database/blockdb/migrator.go

vms/evm/database/blockdb/database.go

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

vms/evm/database/blockdb/database_test.go:1

[nitpick] The comment uses 'blocks 1-3' but the code implements i >= 1 && i <= 3 without parentheses. While operator precedence makes this correct, adding parentheses would improve clarity: migrated := (i >= 1) && (i <= 3).

// Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

vms/evm/database/blockdb/migrator.go

vms/evm/database/blockdb/migration_test.go

ARR4N · 2025-11-27T10:35:08Z

vms/evm/database/blockdb/database.go

+	stateDB database.Database,
+	evmDB ethdb.Database,
+	dbPath string,
+	allowDeferredInit bool,


Why might someone want to defer? Why would the consumer know the minimum block height later, but not now? I'm thinking that this could instead be defaultMinimBlockHeight uint64, which you use instead of 1 when the current argument is false.

The InitBlockDBs method could then be unexported and the heightsDBReady bool field (which feels like a race-condition waiting to happen) removed.

So for context, I added this to address the state sync use case.

Currently, evm database is created on vm initialization. At that point, if state sync is enabled, we don't know what the height of the first block we will be fetching until we get the state sync summary. While we can just set min height to 1, this allows us to init the block dbs once we know the first block height we need to store is and call InitBlockDBs manually (See the an example here)

I'm open to alternative solutions. The main goal here is to being able to set the min height of the dbs to the first non genesis block we need when state sync is enabled to reduce the size of the db index file. Alternatively, we can also not do that and just use min height = 1. This will lead to the db index files being few GBs larger though.

ARR4N · 2025-11-27T10:38:21Z

vms/evm/database/blockdb/database.go

+	heightindexdb "github.com/ava-labs/avalanchego/x/blockdb"
+)
+
+var (


Although it's idiomatic in other languages, it's not a requirement in Go to declare all globals at the top like this. The primary drawback is that they lack context relative to where they're used so a reader has to jump back and forth between declaration and usage sites.

Please move the declarations (perhaps still in groups, but also feel free to break them up if warranted) closer to where they're used.

ARR4N · 2025-11-27T10:42:48Z

vms/evm/database/blockdb/database.go

+	_ ethdb.Database = (*Database)(nil)
+	_ ethdb.Batch    = (*batch)(nil)
+
+	migratorDBPrefix = []byte("migrator")


Do we want to prefix our prefixes to guarantee that they never clash with a geth value? Like "ava-migrator"?

Is there a downside to doing that? FWIW they use b for block bodies and blt- for something else so we don't need to worry about a substring clash and starting with a is (at first glance) safe.

this is the prefix on the stateDB, which is different than the evmDB, which contains the evm data. The evmDB currently has the ethedb prefix and clients should be passing a different prefixed db when creating the blockdb.Database (See a draft usage of this database here)

Current convention seems to be having the caller creating the prefixdb. But alternatively, we can also just wrap the stateDB with prefix db in New if you think that might be clearer.

ARR4N · 2025-11-27T10:43:38Z

vms/evm/database/blockdb/database.go

+// Since the prefixes should never be changed, we can avoid libevm changes by
+// duplicating them here.


Agreed.

Just noting in case any other reviewer asks me about libevm.

ARR4N · 2025-11-27T10:48:14Z

vms/evm/database/blockdb/database.go

+)
+
+const (
+	hashDataElements = 2


Suggested change

hashDataElements = 2

I don't think this particular constant adds much value, but it does force the reader to go and check it. The two below are feasibly configurable, but the number of elements is a property of the implementation because readHashAndData() (the only place it's used) is just the inverse of writeHashAndData(), which has 2 arguments of interest.

ARR4N · 2025-11-27T13:12:33Z

vms/evm/database/blockdb/database.go

+}
+
+func (db *Database) Close() error {
+	if db.migrator != nil {


This can never be the case when using the public API so the check is unnecessary and may also hide an implementation bug (see my similar comment on initMigrator()).

ARR4N · 2025-11-27T13:21:55Z

vms/evm/database/blockdb/database.go

+func (db *Database) heightDBForKey(key []byte) (database.HeightIndex, error) {
+	switch {
+	case isHeaderKey(key):
+		return db.headerDB, nil
+	case isBodyKey(key):
+		return db.bodyDB, nil
+	case isReceiptsKey(key):
+		return db.receiptsDB, nil
+	default:
+		return nil, errUnexpectedKey
+	}
+}
+
+func (db *Database) useHeightIndexedDB(key []byte) bool {
+	if !db.heightDBsReady {
+		return false
+	}
+
+	var n int
+	switch {
+	case isBodyKey(key):
+		n = len(evmBlockBodyPrefix)
+	case isHeaderKey(key):
+		n = len(evmHeaderPrefix)
+	case isReceiptsKey(key):
+		n = len(evmReceiptsPrefix)
+	default:
+		return false
+	}
+	num := binary.BigEndian.Uint64(key[n : n+blockNumberSize])
+	return num >= db.minHeight
+}


There's a lot of duplicated code here and you regularly call heightDBForKey() if useHeightIndexedDB() returns true. Combining them simplifies the implementation and some of the call sites, and the only "waste" on the false path is the hdb variable. Anywhere that only uses useHeightIndexedDB() can swap to if _, ok := db.heightDBForKey(); ok { (or !ok).

func (db *Database) heightDBForKey(key []byte) (database.HeightIndex, bool) { if !db.heightDBsReady { return nil, false } var ( prefix []byte hdb database.HeightIndex ) switch { case isBodyKey(key): prefix = evmBlockBodyPrefix hdb = db.bodyDB case isHeaderKey(key): prefix = evmHeaderPrefix hdb = db.headerDB case isReceiptsKey(key): prefix = evmReceiptsPrefix hdb = db.receiptsDB default: return nil, false } n := len(prefix) blockNum := binary.BigEndian.Uint64(key[n : n+blockNumberSize]) if blockNum < db.minHeight { return nil, false } return hdb, true }

When I got to parseBlockKey() I noticed that there was further overlap; I think you can collapse all 3. I've pushed my review/experiment branch for you to see what I mean.

Note that everything in there is either func parseBlockKey() (very similar logic) or func (*Database) parseBlockKey(), which has the added functionality of selecting the correct database while parsing.

thx for providing that example. definitely agreed. i think we can clean up a lot by having a parseKey function like your draft. I'll steal that idea and update.

ARR4N · 2025-11-27T13:32:31Z

vms/evm/database/blockdb/database.go

+func isBodyKey(key []byte) bool {
+	if len(key) != len(evmBlockBodyPrefix)+blockNumberSize+blockHashSize {
+		return false
+	}
+	return bytes.HasPrefix(key, evmBlockBodyPrefix)
+}
+
+func isHeaderKey(key []byte) bool {
+	if len(key) != len(evmHeaderPrefix)+blockNumberSize+blockHashSize {
+		return false
+	}
+	return bytes.HasPrefix(key, evmHeaderPrefix)
+}
+
+func isReceiptsKey(key []byte) bool {
+	if len(key) != len(evmReceiptsPrefix)+blockNumberSize+blockHashSize {
+		return false
+	}
+	return bytes.HasPrefix(key, evmReceiptsPrefix)
+}


Suggested change

func isBodyKey(key []byte) bool {

if len(key) != len(evmBlockBodyPrefix)+blockNumberSize+blockHashSize {

return false

}

return bytes.HasPrefix(key, evmBlockBodyPrefix)

}

func isHeaderKey(key []byte) bool {

if len(key) != len(evmHeaderPrefix)+blockNumberSize+blockHashSize {

return false

}

return bytes.HasPrefix(key, evmHeaderPrefix)

}

func isReceiptsKey(key []byte) bool {

if len(key) != len(evmReceiptsPrefix)+blockNumberSize+blockHashSize {

return false

}

return bytes.HasPrefix(key, evmReceiptsPrefix)

}

func isKeyForPrefix(key, prefix []byte) bool {

if len(key) != len(prefix)+blockNumberSize+blockHashSize {

return false

}

return bytes.HasPrefix(key, prefix)

}

func isBodyKey(key []byte) bool {

return isKeyForPrefix(key, evmBlockBodyPrefix)

}

func isHeaderKey(key []byte) bool {

return isKeyForPrefix(key, evmHeaderPrefix)

}

func isReceiptsKey(key []byte) bool {

return isKeyForPrefix(key, evmReceiptsPrefix)

}

ARR4N · 2025-11-27T13:40:12Z

vms/evm/database/blockdb/database_test.go

+	customtypes.Register()
+	params.RegisterExtras()


Why are these needed?

This is needed by the core.GenerateChainWithGenesis call in createBlocksToAddr in helpers_test.go.
Without them I get a nil pointer error on coreth/params.GetExtra and coreth/plugin/evm/customtypes.GetHeaderExtra

ARR4N · 2025-11-27T13:44:01Z

vms/evm/database/blockdb/database.go

+	}
+	num = binary.BigEndian.Uint64(key[n : n+blockNumberSize])
+	bytes := key[n+blockNumberSize:]
+	if len(bytes) != blockHashSize {


This can never happen because all of the is*Key() functions check the length.

I realised this because I thought it was unsafe to take a range into the key slice, wrote a test, and it didn't panic. I think this signals that the checks are too far away from here.

github-project-automation bot added this to avalanchego Nov 10, 2025

DracoLi moved this to In Progress 🏗️ in avalanchego Nov 10, 2025

DracoLi self-assigned this Nov 10, 2025

DracoLi linked an issue Nov 12, 2025 that may be closed by this pull request

Add evm database that supports separate storage for block data #4491

Open

DracoLi force-pushed the dl/evm-blockdb branch from 234838d to 03fde43 Compare November 12, 2025 16:55

DracoLi mentioned this pull request Nov 19, 2025

feat: Add block database ava-labs/coreth#1340

Closed

DracoLi force-pushed the dl/evm-blockdb branch 4 times, most recently from ef6921e to d8ee531 Compare November 23, 2025 23:22

DracoLi marked this pull request as ready for review November 24, 2025 14:30

DracoLi requested review from ARR4N, StephenButtolph and joshua-kim as code owners November 24, 2025 14:30

Copilot AI review requested due to automatic review settings November 24, 2025 14:30

Copilot AI reviewed Nov 24, 2025

View reviewed changes

vms/evm/database/blockdb/migrator.go Outdated Show resolved Hide resolved

vms/evm/database/blockdb/database.go Outdated Show resolved Hide resolved

DracoLi force-pushed the dl/evm-blockdb branch from d8ee531 to 10b2fbe Compare November 25, 2025 15:30

DracoLi requested a review from Copilot November 25, 2025 15:38

Copilot AI reviewed Nov 25, 2025

View reviewed changes

vms/evm/database/blockdb/migrator.go Outdated Show resolved Hide resolved

vms/evm/database/blockdb/database.go Outdated Show resolved Hide resolved

DracoLi force-pushed the dl/evm-blockdb branch from 10b2fbe to 7d7cb8b Compare November 25, 2025 15:45

DracoLi requested a review from Copilot November 25, 2025 15:45

Copilot AI reviewed Nov 25, 2025

View reviewed changes

vms/evm/database/blockdb/migrator.go Outdated Show resolved Hide resolved

vms/evm/database/blockdb/migration_test.go Outdated Show resolved Hide resolved

DracoLi force-pushed the dl/evm-blockdb branch from 7d7cb8b to 3cad123 Compare November 25, 2025 19:09

feat(evm): add database with separate block storage

2c6713d

DracoLi force-pushed the dl/evm-blockdb branch from 3cad123 to 2c6713d Compare November 25, 2025 19:14

ARR4N reviewed Nov 27, 2025

View reviewed changes

		// Since the prefixes should never be changed, we can avoid libevm changes by
		// duplicating them here.

feat(evm): add database with separate storage for block data #4485

Are you sure you want to change the base?

feat(evm): add database with separate storage for block data #4485

Conversation

DracoLi commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why this should be merged

How this works

Migration

How this was tested

Need to be documented in RELEASES.md?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DracoLi Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

DracoLi commented Nov 10, 2025 •

edited

Loading

DracoLi Nov 27, 2025 •

edited

Loading