Checkpoints and replication
Each proposal that gets delivered to the application is signed by 2f+1 nodes. The signatures are piggybacked on the Commit messages. Therefore, each decided proposal can essentially serve as a checkpoint, as defined by PBFT.
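The quorum rule above can be illustrated with a minimal Go sketch. It assumes a cluster of n >= 3f+1 nodes and only counts distinct signers; cryptographic verification is left out. The names `quorum` and `isCheckpoint` are hypothetical and are not taken from the library.

```go
package main

// quorum returns f, the largest number of faulty nodes tolerated by n nodes
// (assuming n >= 3f+1), and q = 2f+1, the number of signatures required.
func quorum(n int) (f int, q int) {
	f = (n - 1) / 3
	q = 2*f + 1
	return f, q
}

// isCheckpoint reports whether a decided proposal carries signatures from at
// least 2f+1 distinct signers. Verifying each signature is omitted here.
func isCheckpoint(signerIDs []uint64, n int) bool {
	_, q := quorum(n)
	distinct := make(map[uint64]struct{}, len(signerIDs))
	for _, id := range signerIDs {
		distinct[id] = struct{}{}
	}
	return len(distinct) >= q
}
```

With n = 4 nodes, f is 1 and three distinct signers are needed, so a proposal signed by nodes 1, 2 and 3 qualifies, while one signed twice by node 1 and once by node 2 does not.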
As a result, the Write Ahead Log (WAL) can be truncated at any point in time, but for efficiency reasons it is truncated whenever it reaches a certain threshold.
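As a rough illustration of threshold-based truncation, the sketch below uses a hypothetical in-memory `wal` type. It assumes the caller signals when the previous proposal has been decided, so that everything already in the log is covered by a checkpoint; the real log's on-disk format and API are not shown in this text and may differ.

```go
package main

// wal is a hypothetical in-memory stand-in for a write-ahead log.
type wal struct {
	entries   [][]byte
	size      int // total bytes currently held
	threshold int // truncate once size reaches this many bytes
}

// Append adds an entry. If afterDecision is true, all earlier entries are
// covered by a decided proposal and may be dropped; the log does so only
// when it has grown to the threshold. It reports whether it truncated.
func (w *wal) Append(entry []byte, afterDecision bool) (truncated bool) {
	if afterDecision && w.size >= w.threshold {
		w.entries = w.entries[:0]
		w.size = 0
		truncated = true
	}
	w.entries = append(w.entries, entry)
	w.size += len(entry)
	return truncated
}
```

Truncating only at the threshold avoids rewriting the log after every decision, even though any decision would be a safe truncation point.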
When a node has been disconnected for too long, it may have missed some committed proposals, and it needs to fill that gap before it can deliver new proposals to the application.
This is triggered by having the node participate in a View and receiving a heartbeat from the leader that doesn't match the view's state.
We distinguish between two scenarios:
1. The node doesn't have the latest verification sequence, so it cannot verify the signatures of the in-flight proposals.
2. The node has the latest verification sequence, so it can verify the signatures of the in-flight proposals.
If (1) occurs, the replicated state machine calls Sync(), which is provided by the application layer and should:
- Contact 2f+1 nodes and obtain the latest verification sequence, the last view number, and the last sequence in that view.
- Replicate the proposals up to and including that sequence, and deliver them to the application.
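The first of these steps might look like the following Go sketch. The `syncReply` type and `syncTarget` function are hypothetical. The text does not say how the replies are reconciled, so picking the highest reported (view, sequence) pair is an assumption made here; the proposals fetched afterwards would still need their 2f+1 signatures checked.

```go
package main

import "errors"

// syncReply is a hypothetical summary returned by one remote node.
type syncReply struct {
	VerificationSeq uint64 // latest verification sequence known to the node
	View            uint64 // last view number
	Seq             uint64 // last sequence in that view
}

// syncTarget requires at least 2f+1 replies and returns the most advanced
// one, comparing by view first and then by sequence (an assumed rule).
func syncTarget(replies []syncReply, f int) (syncReply, error) {
	if len(replies) < 2*f+1 {
		return syncReply{}, errors.New("fewer than 2f+1 replies")
	}
	target := replies[0]
	for _, r := range replies[1:] {
		if r.View > target.View || (r.View == target.View && r.Seq > target.Seq) {
			target = r
		}
	}
	return target, nil
}
```

The application's Sync() would then replicate proposals up to `target.Seq` and deliver them in order.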
Since several proposals could have been created during the invocation of Sync(), the node might start participating in the latest view while having a gap of proposals that weren't delivered to the application. As a result, the node writes these proposals to a temporary Write Ahead Log, calls Sync() again, and then delivers the proposals to the application layer by reading from the Write Ahead Log.
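The buffering described above can be sketched as follows. The `catchUp` type, its fields, and its method names are hypothetical. The sketch assumes proposals arrive in ascending sequence order, uses a slice in place of the temporary Write Ahead Log, and records delivered sequence numbers in a slice instead of calling the application.

```go
package main

// proposal is a hypothetical decided proposal.
type proposal struct {
	Seq     uint64
	Payload []byte
}

// catchUp tracks delivery for a node that rejoined with a gap.
type catchUp struct {
	lastDelivered uint64     // highest sequence delivered to the application
	tempWAL       []proposal // stand-in for the temporary write-ahead log
	delivered     []uint64   // sequences delivered from the log (for illustration)
}

func (c *catchUp) deliver(p proposal) {
	c.delivered = append(c.delivered, p.Seq)
	c.lastDelivered = p.Seq
}

// onDecided delivers p if it directly follows the last delivered proposal
// and nothing is buffered; otherwise it buffers p until the gap is filled.
func (c *catchUp) onDecided(p proposal) {
	if len(c.tempWAL) == 0 && p.Seq == c.lastDelivered+1 {
		c.deliver(p)
		return
	}
	c.tempWAL = append(c.tempWAL, p)
}

// afterSync is called once Sync() has delivered everything up to syncedUpTo.
// It replays the buffered proposals that now follow contiguously.
func (c *catchUp) afterSync(syncedUpTo uint64) {
	if syncedUpTo > c.lastDelivered {
		c.lastDelivered = syncedUpTo
	}
	var remaining []proposal
	for _, p := range c.tempWAL {
		switch {
		case p.Seq <= c.lastDelivered:
			// Already covered by Sync(); skip.
		case p.Seq == c.lastDelivered+1:
			c.deliver(p)
		default:
			remaining = append(remaining, p) // still a gap; keep buffered
		}
	}
	c.tempWAL = remaining
}
```

For example, a node that last delivered sequence 5 and then decides sequences 8 and 9 buffers both; once Sync() has brought it to sequence 7, the buffered proposals are delivered in order.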