-
Notifications
You must be signed in to change notification settings - Fork 838
feat(evm): add database with separate storage for block data #4485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
234838d to
03fde43
Compare
ef6921e to
d8ee531
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a new blockdb.Database implementation that separates EVM block data (headers, bodies, receipts) from the shared KV store into dedicated height-indexed databases. This optimization aims to reduce KV store size and compaction overhead.
Key changes:
- Implements routing logic to store block data at/above a minimum height in height-indexed databases while maintaining backward compatibility via fallback reads
- Adds background migration to move existing canonical block data from the KV store to height-indexed databases
- Provides deferred initialization support for scenarios like state sync where the minimum block height isn't initially known
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vms/evm/database/blockdb/database.go | Core database implementation with key routing, read/write operations, and batch support |
| vms/evm/database/blockdb/migrator.go | Background migration logic with progress tracking, compaction, and pause/resume capabilities |
| vms/evm/database/blockdb/database_test.go | Comprehensive tests for database operations, initialization scenarios, and edge cases |
| vms/evm/database/blockdb/migration_test.go | Tests for migration completion, resumption, and data integrity during migration |
| vms/evm/database/blockdb/helpers_test.go | Test utilities for creating blocks, comparing data, and controlling migration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d8ee531 to
10b2fbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
10b2fbe to
7d7cb8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
vms/evm/database/blockdb/database_test.go:1
- [nitpick] The comment uses 'blocks 1-3' but the code implements i >= 1 && i <= 3 without parentheses. While operator precedence makes this correct, adding parentheses would improve clarity:
migrated := (i >= 1) && (i <= 3).
// Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7d7cb8b to
3cad123
Compare
3cad123 to
2c6713d
Compare
| stateDB database.Database, | ||
| evmDB ethdb.Database, | ||
| dbPath string, | ||
| allowDeferredInit bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why might someone want to defer? Why would the consumer know the minimum block height later, but not now? I'm thinking that this could instead be defaultMinimBlockHeight uint64, which you use instead of 1 when the current argument is false.
The InitBlockDBs method could then be unexported and the heightsDBReady bool field (which feels like a race-condition waiting to happen) removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for context, I added this to address the state sync use case.
Currently, evm database is created on vm initialization. At that point, if state sync is enabled, we don't know what the height of the first block we will be fetching until we get the state sync summary. While we can just set min height to 1, this allows us to init the block dbs once we know the first block height we need to store is and call InitBlockDBs manually (See the an example here)
I'm open to alternative solutions. The main goal here is to being able to set the min height of the dbs to the first non genesis block we need when state sync is enabled to reduce the size of the db index file. Alternatively, we can also not do that and just use min height = 1. This will lead to the db index files being few GBs larger though.
| heightindexdb "github.com/ava-labs/avalanchego/x/blockdb" | ||
| ) | ||
|
|
||
| var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it's idiomatic in other languages, it's not a requirement in Go to declare all globals at the top like this. The primary drawback is that they lack context relative to where they're used so a reader has to jump back and forth between declaration and usage sites.
Please move the declarations (perhaps still in groups, but also feel free to break them up if warranted) closer to where they're used.
| _ ethdb.Database = (*Database)(nil) | ||
| _ ethdb.Batch = (*batch)(nil) | ||
|
|
||
| migratorDBPrefix = []byte("migrator") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to prefix our prefixes to guarantee that they never clash with a geth value? Like "ava-migrator"?
Is there a downside to doing that? FWIW they use b for block bodies and blt- for something else so we don't need to worry about a substring clash and starting with a is (at first glance) safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the prefix on the stateDB, which is different than the evmDB, which contains the evm data. The evmDB currently has the ethedb prefix and clients should be passing a different prefixed db when creating the blockdb.Database (See a draft usage of this database here)
Current convention seems to be having the caller creating the prefixdb. But alternatively, we can also just wrap the stateDB with prefix db in New if you think that might be clearer.
| // Since the prefixes should never be changed, we can avoid libevm changes by | ||
| // duplicating them here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Just noting in case any other reviewer asks me about libevm.
| ) | ||
|
|
||
| const ( | ||
| hashDataElements = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| hashDataElements = 2 |
I don't think this particular constant adds much value, but it does force the reader to go and check it. The two below are feasibly configurable, but the number of elements is a property of the implementation because readHashAndData() (the only place it's used) is just the inverse of writeHashAndData(), which has 2 arguments of interest.
| } | ||
|
|
||
| func (db *Database) Close() error { | ||
| if db.migrator != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can never be the case when using the public API so the check is unnecessary and may also hide an implementation bug (see my similar comment on initMigrator()).
| func (db *Database) heightDBForKey(key []byte) (database.HeightIndex, error) { | ||
| switch { | ||
| case isHeaderKey(key): | ||
| return db.headerDB, nil | ||
| case isBodyKey(key): | ||
| return db.bodyDB, nil | ||
| case isReceiptsKey(key): | ||
| return db.receiptsDB, nil | ||
| default: | ||
| return nil, errUnexpectedKey | ||
| } | ||
| } | ||
|
|
||
| func (db *Database) useHeightIndexedDB(key []byte) bool { | ||
| if !db.heightDBsReady { | ||
| return false | ||
| } | ||
|
|
||
| var n int | ||
| switch { | ||
| case isBodyKey(key): | ||
| n = len(evmBlockBodyPrefix) | ||
| case isHeaderKey(key): | ||
| n = len(evmHeaderPrefix) | ||
| case isReceiptsKey(key): | ||
| n = len(evmReceiptsPrefix) | ||
| default: | ||
| return false | ||
| } | ||
| num := binary.BigEndian.Uint64(key[n : n+blockNumberSize]) | ||
| return num >= db.minHeight | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of duplicated code here and you regularly call heightDBForKey() if useHeightIndexedDB() returns true. Combining them simplifies the implementation and some of the call sites, and the only "waste" on the false path is the hdb variable. Anywhere that only uses useHeightIndexedDB() can swap to if _, ok := db.heightDBForKey(); ok { (or !ok).
func (db *Database) heightDBForKey(key []byte) (database.HeightIndex, bool) {
if !db.heightDBsReady {
return nil, false
}
var (
prefix []byte
hdb database.HeightIndex
)
switch {
case isBodyKey(key):
prefix = evmBlockBodyPrefix
hdb = db.bodyDB
case isHeaderKey(key):
prefix = evmHeaderPrefix
hdb = db.headerDB
case isReceiptsKey(key):
prefix = evmReceiptsPrefix
hdb = db.receiptsDB
default:
return nil, false
}
n := len(prefix)
blockNum := binary.BigEndian.Uint64(key[n : n+blockNumberSize])
if blockNum < db.minHeight {
return nil, false
}
return hdb, true
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I got to parseBlockKey() I noticed that there was further overlap; I think you can collapse all 3. I've pushed my review/experiment branch for you to see what I mean.
Note that everything in there is either func parseBlockKey() (very similar logic) or func (*Database) parseBlockKey(), which has the added functionality of selecting the correct database while parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for providing that example. definitely agreed. i think we can clean up a lot by having a parseKey function like your draft. I'll steal that idea and update.
| func isBodyKey(key []byte) bool { | ||
| if len(key) != len(evmBlockBodyPrefix)+blockNumberSize+blockHashSize { | ||
| return false | ||
| } | ||
| return bytes.HasPrefix(key, evmBlockBodyPrefix) | ||
| } | ||
|
|
||
| func isHeaderKey(key []byte) bool { | ||
| if len(key) != len(evmHeaderPrefix)+blockNumberSize+blockHashSize { | ||
| return false | ||
| } | ||
| return bytes.HasPrefix(key, evmHeaderPrefix) | ||
| } | ||
|
|
||
| func isReceiptsKey(key []byte) bool { | ||
| if len(key) != len(evmReceiptsPrefix)+blockNumberSize+blockHashSize { | ||
| return false | ||
| } | ||
| return bytes.HasPrefix(key, evmReceiptsPrefix) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func isBodyKey(key []byte) bool { | |
| if len(key) != len(evmBlockBodyPrefix)+blockNumberSize+blockHashSize { | |
| return false | |
| } | |
| return bytes.HasPrefix(key, evmBlockBodyPrefix) | |
| } | |
| func isHeaderKey(key []byte) bool { | |
| if len(key) != len(evmHeaderPrefix)+blockNumberSize+blockHashSize { | |
| return false | |
| } | |
| return bytes.HasPrefix(key, evmHeaderPrefix) | |
| } | |
| func isReceiptsKey(key []byte) bool { | |
| if len(key) != len(evmReceiptsPrefix)+blockNumberSize+blockHashSize { | |
| return false | |
| } | |
| return bytes.HasPrefix(key, evmReceiptsPrefix) | |
| } | |
| func isKeyForPrefix(key, prefix []byte) bool { | |
| if len(key) != len(prefix)+blockNumberSize+blockHashSize { | |
| return false | |
| } | |
| return bytes.HasPrefix(key, prefix) | |
| } | |
| func isBodyKey(key []byte) bool { | |
| return isKeyForPrefix(key, evmBlockBodyPrefix) | |
| } | |
| func isHeaderKey(key []byte) bool { | |
| return isKeyForPrefix(key, evmHeaderPrefix) | |
| } | |
| func isReceiptsKey(key []byte) bool { | |
| return isKeyForPrefix(key, evmReceiptsPrefix) | |
| } |
| customtypes.Register() | ||
| params.RegisterExtras() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by the core.GenerateChainWithGenesis call in createBlocksToAddr in helpers_test.go.
Without them I get a nil pointer error on coreth/params.GetExtra and coreth/plugin/evm/customtypes.GetHeaderExtra
| } | ||
| num = binary.BigEndian.Uint64(key[n : n+blockNumberSize]) | ||
| bytes := key[n+blockNumberSize:] | ||
| if len(bytes) != blockHashSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can never happen because all of the is*Key() functions check the length.
I realised this because I thought it was unsafe to take a range into the key slice, wrote a test, and it didn't panic. I think this signals that the checks are too far away from here.
Why this should be merged
Adds a
ethdb.Databaseimplementation that moves EVM block data out of the shared KV store into dedicated height-indexed databases optimized for block storage, reducing KV store size and compaction cost.heightindexdb; all other data remains in the underlying KV store.How this works
The
blockdb.Databasewrapsethdb.Databaseand routes block data by key prefix, while leaving non-block data in the KV store.allowDeferredInit, initialization can wait until the minimum block height is known (e.g., when state sync is enabled). Database min height is persisted and cannot be changed without recreating the block databases.Migration
Canonical block data is migrated from the KV store to the height-indexed databases in the background.
How this was tested
blockdbenabled running for >4 weeks.Need to be documented in RELEASES.md?
No