-
Notifications
You must be signed in to change notification settings - Fork 945
Optimise state_root_at_slot for finalized slot
#8353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise state_root_at_slot for finalized slot
#8353
Conversation
| } | ||
|
|
||
| // Fast-path for the split slot (which usually corresponds to the finalized slot). | ||
| let split = self.store.get_split_info(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a new read of the store.split RwLock in this function, prior to reading the canonical_head RwLock.
The risk here is:
Some other thread is holding the split lock exclusively (i.e. the DB finalization migration is running), and this blocks our read. Counter point: this will happen anyway (and with other downsides) if we miss the fast_lookup on the head state and need to load states from disk.
Maybe we could sequence the split read after the head read, as we mostly care about optimising the case where the finalized_slot is out of range of the head but known to the DB. Conversely, maybe using the split over the head is preferable, as there is more write-contention for the head (the head changes every slot whereas the split only changes at most once per epoch).
I suspect the effect sizes of both these phenomena are small. What we have now seems to be performing very well on Holesky, so I'm inclined to just leave it as-is pending further benchmarking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use a try_read here to prevent the deadlock risk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no deadlock risk. I figure try_read isn't worth it because the alternative is doing state reconstruction. We may as well wait for the split lock, as it will be less work in total
* Optimise `state_root_at_slot` for finalized slot (sigp#8353) This is an optimisation targeted at Fulu networks in non-finality. While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state: https://github.com/sigp/lighthouse/blob/2c9b670f5d313450252c6cb40a5ee34802d54fef/beacon_node/http_api/src/lib.rs#L3860-L3861 This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast). Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast. Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * Remove Windows CI jobs (sigp#8362) Remove all Windows-related CI jobs Co-Authored-By: antondlr <anton@sigmaprime.io> * Update proposer-only section in the documentation (sigp#8358) Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> * Fix unaggregated delay metric (sigp#8366) while working on this sigp#7892 @michaelsproul pointed it might be a good metric to measure the delay from start of the slot instead of the current `slot_duration / 3`, since the attestations duties start before the `1/3rd` mark now with the change in the link PR. Co-Authored-By: hopinheimer <knmanas6@gmail.com> Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com> * Downgrade and remove unnecessary logs (sigp#8367) ### Downgrade a non error to `Debug` I noticed this error on one of our hoodi nodes: ``` Nov 04 05:13:38.892 ERROR Error during data column reconstruction block_root: 0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190, error: DuplicateFullyImported(0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190) ``` This shouldn't be logged as an error and it's due to a normal race condition, and it doesn't impact the node negatively. ### Remove spammy logs This logs is filling up the log files quite quickly and it is also something we'd expect during normal operation - getting columns via EL before gossip. We haven't found this debug log to be useful, so I propose we remove it to avoid spamming debug logs. ``` Received already available column sidecar. Ignoring the column sidecar ``` In the process of removing this, I noticed we aren't propagating the validation result, which I think we should so I've added this. The impact should be quite minimal - the message will stay in the gossip memcache for a bit longer but should be evicted in the next heartbeat. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * Prepare `sensitive_url` for `crates.io` (sigp#8223) Another good candidate for publishing separately from Lighthouse is `sensitive_url` as it's a general utility crate and not related to Ethereum. This PR prepares it to be spun out into its own crate. I've made the `full` field on `SensitiveUrl` private and instead provided an explicit getter called `.expose_full()`. It's a bit ugly for the diff but I prefer the explicit nature of the getter. I've also added some extra tests and doc strings along with feature gating `Serialize` and `Deserialize` implementations behind the `serde` feature. Co-Authored-By: Mac L <mjladson@pm.me> * Remove ecdsa feature of libp2p (sigp#8374) This compiles, is there any reason to keep `ecdsa`? CC @jxs Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * CI workflows to use warpbuild ci runner (sigp#8343) Self hosted GitHub Runners review and improvements local testnet workflow now uses warpbuild ci runner Co-Authored-By: lemon <snyxmk@gmail.com> Co-Authored-By: antondlr <anton@sigmaprime.io> * Remove `sensitive_url` and import from `crates.io` (sigp#8377) Use the recently published `sensitive_url` and remove it from Lighthouse Co-Authored-By: Mac L <mjladson@pm.me> * Migrate derivative to educe (sigp#8125) Fixes sigp#7001. Mostly mechanical replacement of `derivative` attributes with `educe` ones. ### **Attribute Syntax Changes** ```rust // Bounds: = "..." → (...) #[derivative(Hash(bound = "E: EthSpec"))] #[educe(Hash(bound(E: EthSpec)))] // Ignore: = "ignore" → (ignore) #[derivative(PartialEq = "ignore")] #[educe(PartialEq(ignore))] // Default values: value = "..." → expression = ... #[derivative(Default(value = "ForkName::Base"))] #[educe(Default(expression = ForkName::Base))] // Methods: format_with/compare_with = "..." → method(...) #[derivative(Debug(format_with = "fmt_peer_set_as_len"))] #[educe(Debug(method(fmt_peer_set_as_len)))] // Empty bounds: removed entirely, educe can infer appropriate bounds #[derivative(Default(bound = ""))] #[educe(Default)] // Transparent debug: manual implementation (educe doesn't support it) #[derivative(Debug = "transparent")] // Replaced with manual Debug impl that delegates to inner field ``` **Note**: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors). Co-Authored-By: Javier Chávarri <javier.chavarri@gmail.com> Co-Authored-By: Mac L <mjladson@pm.me> * Fix flaky reconstruction test (sigp#8321) FIx flaky tests that depends on timing. Previously the test processes all 128 columns and expect reconstruction to happen after all columns are processed. There is a race here, and reconstruction could be triggered before all columns are processed. I've updated the tests to process 64 columns, just enough for reconstruction and wait for 50ms for reconstruction to be triggered. This PR requires the change made in sigp#8194 for the test to pass consistently (blob count set to 1 for all blocks instead of random blob count between 0..max) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io> * Remove `ethers-core` from `execution_layer` (sigp#8149) sigp#6022 Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator. Co-Authored-By: Mac L <mjladson@pm.me> * Include block root in publish block logs (sigp#8111) Debugging sigp#8104 it would have been helpful to quickly see in the logs that a specific block was submitted into the HTTP API. Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot. I believe we can take a minute performance hit to have the block root in all the logs during block publishing. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * fix: clarify `bb` vs `bl` variable names in BeaconProcessorQueue (sigp#8315) since block and blob both start with `bl`, it was not clear how to differentiate between `blbroots_queue` and `bbroots_queue` After renaming, there also seems to be a discrepancy Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com> * Migrate the `deposit_contract` crate to `alloy` (sigp#8139) sigp#6022 Switches the `deposit_contract` crate to use the `alloy` ecosystem and removes the dependency on `ethabi` Co-Authored-By: Mac L <mjladson@pm.me> --------- Co-authored-by: Michael Sproul <michaelsproul@users.noreply.github.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: antondlr <anton@sigmaprime.io> Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com> Co-authored-by: hopinheimer <48147533+hopinheimer@users.noreply.github.com> Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io> Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: lmnzx <snyxmk@gmail.com> Co-authored-by: Javier Chávarri <javier.chavarri@gmail.com> Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
* Optimise `state_root_at_slot` for finalized slot (sigp#8353) This is an optimisation targeted at Fulu networks in non-finality. While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state: https://github.com/sigp/lighthouse/blob/2c9b670f5d313450252c6cb40a5ee34802d54fef/beacon_node/http_api/src/lib.rs#L3860-L3861 This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast). Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast. Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * Remove Windows CI jobs (sigp#8362) Remove all Windows-related CI jobs Co-Authored-By: antondlr <anton@sigmaprime.io> * Update proposer-only section in the documentation (sigp#8358) Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> * Fix unaggregated delay metric (sigp#8366) while working on this sigp#7892 @michaelsproul pointed it might be a good metric to measure the delay from start of the slot instead of the current `slot_duration / 3`, since the attestations duties start before the `1/3rd` mark now with the change in the link PR. Co-Authored-By: hopinheimer <knmanas6@gmail.com> Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com> * Downgrade and remove unnecessary logs (sigp#8367) I noticed this error on one of our hoodi nodes: ``` Nov 04 05:13:38.892 ERROR Error during data column reconstruction block_root: 0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190, error: DuplicateFullyImported(0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190) ``` This shouldn't be logged as an error and it's due to a normal race condition, and it doesn't impact the node negatively. This logs is filling up the log files quite quickly and it is also something we'd expect during normal operation - getting columns via EL before gossip. We haven't found this debug log to be useful, so I propose we remove it to avoid spamming debug logs. ``` Received already available column sidecar. Ignoring the column sidecar ``` In the process of removing this, I noticed we aren't propagating the validation result, which I think we should so I've added this. The impact should be quite minimal - the message will stay in the gossip memcache for a bit longer but should be evicted in the next heartbeat. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * Prepare `sensitive_url` for `crates.io` (sigp#8223) Another good candidate for publishing separately from Lighthouse is `sensitive_url` as it's a general utility crate and not related to Ethereum. This PR prepares it to be spun out into its own crate. I've made the `full` field on `SensitiveUrl` private and instead provided an explicit getter called `.expose_full()`. It's a bit ugly for the diff but I prefer the explicit nature of the getter. I've also added some extra tests and doc strings along with feature gating `Serialize` and `Deserialize` implementations behind the `serde` feature. Co-Authored-By: Mac L <mjladson@pm.me> * Remove ecdsa feature of libp2p (sigp#8374) This compiles, is there any reason to keep `ecdsa`? CC @jxs Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * CI workflows to use warpbuild ci runner (sigp#8343) Self hosted GitHub Runners review and improvements local testnet workflow now uses warpbuild ci runner Co-Authored-By: lemon <snyxmk@gmail.com> Co-Authored-By: antondlr <anton@sigmaprime.io> * Remove `sensitive_url` and import from `crates.io` (sigp#8377) Use the recently published `sensitive_url` and remove it from Lighthouse Co-Authored-By: Mac L <mjladson@pm.me> * Migrate derivative to educe (sigp#8125) Fixes sigp#7001. Mostly mechanical replacement of `derivative` attributes with `educe` ones. ```rust // Bounds: = "..." → (...) // Ignore: = "ignore" → (ignore) // Default values: value = "..." → expression = ... // Methods: format_with/compare_with = "..." → method(...) // Empty bounds: removed entirely, educe can infer appropriate bounds // Transparent debug: manual implementation (educe doesn't support it) // Replaced with manual Debug impl that delegates to inner field ``` **Note**: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors). Co-Authored-By: Javier Chávarri <javier.chavarri@gmail.com> Co-Authored-By: Mac L <mjladson@pm.me> * Fix flaky reconstruction test (sigp#8321) FIx flaky tests that depends on timing. Previously the test processes all 128 columns and expect reconstruction to happen after all columns are processed. There is a race here, and reconstruction could be triggered before all columns are processed. I've updated the tests to process 64 columns, just enough for reconstruction and wait for 50ms for reconstruction to be triggered. This PR requires the change made in sigp#8194 for the test to pass consistently (blob count set to 1 for all blocks instead of random blob count between 0..max) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io> * Remove `ethers-core` from `execution_layer` (sigp#8149) Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator. Co-Authored-By: Mac L <mjladson@pm.me> * Include block root in publish block logs (sigp#8111) Debugging sigp#8104 it would have been helpful to quickly see in the logs that a specific block was submitted into the HTTP API. Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot. I believe we can take a minute performance hit to have the block root in all the logs during block publishing. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * fix: clarify `bb` vs `bl` variable names in BeaconProcessorQueue (sigp#8315) since block and blob both start with `bl`, it was not clear how to differentiate between `blbroots_queue` and `bbroots_queue` After renaming, there also seems to be a discrepancy Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com> * Migrate the `deposit_contract` crate to `alloy` (sigp#8139) sigp#6022 Switches the `deposit_contract` crate to use the `alloy` ecosystem and removes the dependency on `ethabi` Co-Authored-By: Mac L <mjladson@pm.me> --------- Co-authored-by: Michael Sproul <michaelsproul@users.noreply.github.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: antondlr <anton@sigmaprime.io> Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com> Co-authored-by: hopinheimer <48147533+hopinheimer@users.noreply.github.com> Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io> Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: lmnzx <snyxmk@gmail.com> Co-authored-by: Javier Chávarri <javier.chavarri@gmail.com> Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
* Optimise `state_root_at_slot` for finalized slot (sigp#8353) This is an optimisation targeted at Fulu networks in non-finality. While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state: https://github.com/sigp/lighthouse/blob/2c9b670f5d313450252c6cb40a5ee34802d54fef/beacon_node/http_api/src/lib.rs#L3860-L3861 This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast). Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast. Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * Remove Windows CI jobs (sigp#8362) Remove all Windows-related CI jobs Co-Authored-By: antondlr <anton@sigmaprime.io> * Update proposer-only section in the documentation (sigp#8358) Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> * Fix unaggregated delay metric (sigp#8366) while working on this sigp#7892 @michaelsproul pointed it might be a good metric to measure the delay from start of the slot instead of the current `slot_duration / 3`, since the attestations duties start before the `1/3rd` mark now with the change in the link PR. Co-Authored-By: hopinheimer <knmanas6@gmail.com> Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com> * Downgrade and remove unnecessary logs (sigp#8367) I noticed this error on one of our hoodi nodes: ``` Nov 04 05:13:38.892 ERROR Error during data column reconstruction block_root: 0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190, error: DuplicateFullyImported(0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190) ``` This shouldn't be logged as an error and it's due to a normal race condition, and it doesn't impact the node negatively. This logs is filling up the log files quite quickly and it is also something we'd expect during normal operation - getting columns via EL before gossip. We haven't found this debug log to be useful, so I propose we remove it to avoid spamming debug logs. ``` Received already available column sidecar. Ignoring the column sidecar ``` In the process of removing this, I noticed we aren't propagating the validation result, which I think we should so I've added this. The impact should be quite minimal - the message will stay in the gossip memcache for a bit longer but should be evicted in the next heartbeat. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * Prepare `sensitive_url` for `crates.io` (sigp#8223) Another good candidate for publishing separately from Lighthouse is `sensitive_url` as it's a general utility crate and not related to Ethereum. This PR prepares it to be spun out into its own crate. I've made the `full` field on `SensitiveUrl` private and instead provided an explicit getter called `.expose_full()`. It's a bit ugly for the diff but I prefer the explicit nature of the getter. I've also added some extra tests and doc strings along with feature gating `Serialize` and `Deserialize` implementations behind the `serde` feature. Co-Authored-By: Mac L <mjladson@pm.me> * Remove ecdsa feature of libp2p (sigp#8374) This compiles, is there any reason to keep `ecdsa`? CC @jxs Co-Authored-By: Michael Sproul <michael@sigmaprime.io> * CI workflows to use warpbuild ci runner (sigp#8343) Self hosted GitHub Runners review and improvements local testnet workflow now uses warpbuild ci runner Co-Authored-By: lemon <snyxmk@gmail.com> Co-Authored-By: antondlr <anton@sigmaprime.io> * Remove `sensitive_url` and import from `crates.io` (sigp#8377) Use the recently published `sensitive_url` and remove it from Lighthouse Co-Authored-By: Mac L <mjladson@pm.me> * Migrate derivative to educe (sigp#8125) Fixes sigp#7001. Mostly mechanical replacement of `derivative` attributes with `educe` ones. ```rust // Bounds: = "..." → (...) // Ignore: = "ignore" → (ignore) // Default values: value = "..." → expression = ... // Methods: format_with/compare_with = "..." → method(...) // Empty bounds: removed entirely, educe can infer appropriate bounds // Transparent debug: manual implementation (educe doesn't support it) // Replaced with manual Debug impl that delegates to inner field ``` **Note**: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors). Co-Authored-By: Javier Chávarri <javier.chavarri@gmail.com> Co-Authored-By: Mac L <mjladson@pm.me> * Fix flaky reconstruction test (sigp#8321) FIx flaky tests that depends on timing. Previously the test processes all 128 columns and expect reconstruction to happen after all columns are processed. There is a race here, and reconstruction could be triggered before all columns are processed. I've updated the tests to process 64 columns, just enough for reconstruction and wait for 50ms for reconstruction to be triggered. This PR requires the change made in sigp#8194 for the test to pass consistently (blob count set to 1 for all blocks instead of random blob count between 0..max) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io> * Remove `ethers-core` from `execution_layer` (sigp#8149) Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator. Co-Authored-By: Mac L <mjladson@pm.me> * Include block root in publish block logs (sigp#8111) Debugging sigp#8104 it would have been helpful to quickly see in the logs that a specific block was submitted into the HTTP API. Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot. I believe we can take a minute performance hit to have the block root in all the logs during block publishing. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> * fix: clarify `bb` vs `bl` variable names in BeaconProcessorQueue (sigp#8315) since block and blob both start with `bl`, it was not clear how to differentiate between `blbroots_queue` and `bbroots_queue` After renaming, there also seems to be a discrepancy Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com> * Migrate the `deposit_contract` crate to `alloy` (sigp#8139) sigp#6022 Switches the `deposit_contract` crate to use the `alloy` ecosystem and removes the dependency on `ethabi` Co-Authored-By: Mac L <mjladson@pm.me> --------- Co-authored-by: Michael Sproul <michaelsproul@users.noreply.github.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: antondlr <anton@sigmaprime.io> Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com> Co-authored-by: hopinheimer <48147533+hopinheimer@users.noreply.github.com> Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io> Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: lmnzx <snyxmk@gmail.com> Co-authored-by: Javier Chávarri <javier.chavarri@gmail.com> Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
Proposed Changes
This is an optimisation targeted at Fulu networks in non-finality.
While debugging on Holesky, we found that
state_root_at_slotwas being called fromprepare_beacon_proposera lot, for the finalized state:lighthouse/beacon_node/http_api/src/lib.rs
Lines 3860 to 3861 in 2c9b670
This was causing
prepare_beacon_proposercalls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger multiple beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast).Although it would be possible to make the API faster by removing the
state_root_at_slotcall, I believe it's simpler to changestate_root_at_slotitself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast.Additional Info
Speaking of footgun removal: