-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Experimental triggerNextLedger timer Change #4865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
6af4b3f to
99cf3aa
Compare
99cf3aa to
24db760
Compare
src/herder/HerderImpl.cpp
Outdated
| uint64_t nextIndex = nextConsensusLedgerIndex(); | ||
| auto lastIndex = trackingConsensusLedgerIndex(); | ||
|
|
||
| // Note, this is not the LCL externalized close time, but the close time of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a hard time reasoning about this -- why is lcl-1 more trustworthy than lcl?
I think that if you have concerns that a bad validator picks a bad value for close time, this should be handled in the validity checks not after externalizing.
Then this code only needs to be concerned about the impact of latency in both SCP and apply.
By the time you externalize, you already went through both consensus and apply, so now (assuming no clock drift) is:
- under "low network load", close to
lcl.ct+1(with SCP overhead ~ 1s) - under "max capacity", close to
lcl.ct+1+4(SCP overhead and apply time combined are 5 seconds) - over "max capacity", larger than
lcl.ct+5
I think that if you look at lcl-1, you end up with values that are off by 5 across the board -- I suspect lastLedgerStatingPoint = std::max(now - timeSinceExternalized, lastLedgerStatingPoint); is what makes this code recover? (some tests should allow you to confirm/deny this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood what the closeTime in the externalized message represented, this had nothing to do with being more trustworthy. I didn't realize closeTime for block N was the time at which SCP started, but thought it was the time that the message was broadcast at the end of SCP. Using N -1 assumed the time was the end of N - 1, not the beginning. I've pushed an update accordingly.
I had not run a supercluster test after adding the max clamp, but interestingly, the runs with the erroneous approach were still pretty good, and still saw a boost in TPS and fairly stable block times around 5 seconds (accept at times of heavy load). I'm not sure why it worked like that, and I doubt it was safe, but I'll dig into it a big more.
24db760 to
52b7601
Compare
















Description
This adds an experimental flag that when set, uses the closeTime from the last externalized SCP message as the basis for setting the triggerNextLedger timer.
Checklist
clang-formatv8.0.0 (viamake formator the Visual Studio extension)