chore: Introduce download-live — Continuous GCS-backed Block Ingestion with Validation, Daily Tar Archives, and Automatic Rollover #1866
base: 1736-add-blockchain-cli-tools
Force-pushed from b18732f to 9922483.
Note: a DCO (Developer Certificate of Origin) sign-off is required for every commit, which is why the check fails on this PR. It's very difficult to add one if there are merge commits (one reason I recommend rebasing instead), but you can try the following git command (I'd recommend using a separate clone in case this goes badly): `git rebase -f --signoff --sign HEAD~11`. The number
Just an initial note based on the PR description. To make this maximally compatible with the Block Node, it would be helpful to
Force-pushed from 52531c8 to 84185a0.
Force-pushed from f4c1d91 to afa8cc7.
Sure, will add that.
Force-pushed from 1224daa to 3d937fb.
# Conflicts:
#   tools-and-tests/tools/src/main/java/org/hiero/block/tools/days/subcommands/DownloadLive.java

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
…json)

- Persists `dayKey` and `lastSeenBlock` after each tick for resuming downloads
- Loads state automatically once per day to continue from the last processed block
- Uses minimal JSON read/write via java.nio and regex (no external dependencies)
- Adds console logs for state load/resume/failure
- Ensures parent directories are created before writing
- Completes the core of the resumable day-scoped live poller (next: dedupe + compression)

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
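A minimal sketch of the state file this commit describes, assuming a flat JSON object with exactly the two fields `dayKey` and `lastSeenBlock`. The class name `LiveState` is hypothetical, and writing to a temp file and renaming it over the old state is an illustrative choice, not something the commit confirms:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resumable poller state: the current dayKey and the last ingested block number. */
final class LiveState {
    // Regexes for the two flat fields; adequate only because we control the file's exact shape.
    private static final Pattern DAY_KEY = Pattern.compile("\"dayKey\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern LAST_SEEN = Pattern.compile("\"lastSeenBlock\"\\s*:\\s*(-?\\d+)");

    final String dayKey;     // e.g. "2025-01-15"; assumed never to contain quotes or backslashes
    final long lastSeenBlock;

    LiveState(String dayKey, long lastSeenBlock) {
        this.dayKey = dayKey;
        this.lastSeenBlock = lastSeenBlock;
    }

    String toJson() {
        return "{\"dayKey\":\"" + dayKey + "\",\"lastSeenBlock\":" + lastSeenBlock + "}";
    }

    /** Creates parent directories, writes a temp file, then renames it over the old state. */
    void save(Path file) throws IOException {
        Path target = file.toAbsolutePath();
        Files.createDirectories(target.getParent());
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.writeString(tmp, toJson(), StandardCharsets.UTF_8);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns empty when the file is absent or either field is missing. */
    static Optional<LiveState> load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String json = Files.readString(file, StandardCharsets.UTF_8);
        Matcher day = DAY_KEY.matcher(json);
        Matcher last = LAST_SEEN.matcher(json);
        if (!day.find() || !last.find()) {
            return Optional.empty();
        }
        return Optional.of(new LiveState(day.group(1), Long.parseLong(last.group(1))));
    }
}
```

The rename means a crash mid-write leaves either the previous state or the new one, never a truncated file that the regexes would fail to match.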
• Copy files from sourceRoot into a tmp file, then atomically move them to the per-day folder
• Decompress .gz in memory and parse via RecordFileInfo.parse(...)
• Compute the block hash and compare it against the mirror-provided hash (with optional 0x prefix)
• Append each successfully validated file to a per-day <dayKey>.tar via system tar
• Add background compression and cleanup:
  • On day rollover, schedule async compression of <dayKey>.tar → <dayKey>.tar.zstd using zstd
  • After successful compression, delete the per-day <dayKey>/ folder while keeping both .tar and .tar.zstd
• Make the poller continuous across days:
  • Detect day change in the configured rollover timezone
  • Finalize the previous day in the background and start writing into the new day’s tar
  • Preserve lastSeenBlock across days to keep block ingestion monotonic and avoid re-downloading previous blocks
• Add logging around state load/save, sample descriptors, download/placement, tar append, and compression/cleanup for operational visibility

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
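The day-change detection above can be sketched as follows, assuming each block carries a consensus timestamp and that day keys are ISO dates. `DayRollover` and `observe` are hypothetical names; the background finalize/compress step is left to the caller:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Optional;

/** Hypothetical helper that tracks the current day and reports when a block crosses into a new one. */
final class DayRollover {
    private final ZoneId rolloverZone; // the configured rollover timezone
    private LocalDate currentDay;

    DayRollover(ZoneId rolloverZone, LocalDate startDay) {
        this.rolloverZone = rolloverZone;
        this.currentDay = startDay;
    }

    /**
     * Maps the block's consensus time to a calendar day in the rollover zone. If that day is later
     * than the current one, switches to it and returns the day that just finished (to be finalized
     * and compressed in the background); otherwise returns empty.
     */
    Optional<LocalDate> observe(Instant blockTime) {
        LocalDate day = blockTime.atZone(rolloverZone).toLocalDate();
        if (!day.isAfter(currentDay)) {
            return Optional.empty();
        }
        LocalDate finished = currentDay;
        currentDay = day;
        return Optional.of(finished);
    }

    /** ISO-8601 day key such as "2025-01-16", matching names like <dayKey>.tar. */
    String dayKey() {
        return currentDay.toString();
    }
}
```

Because the comparison is on calendar days in the configured zone rather than on elapsed time, the same instant can be "today" in one zone and "tomorrow" in another. The helper never touches lastSeenBlock, so block numbering stays monotonic across the boundary.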
This then allows us to use download-live in three modes:
1. start date with no end date
2. start date with end date
3. pure live mode

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
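A sketch of how the three modes might be derived from the optional start and end days. The enum `LiveMode` is hypothetical, and two behaviours are assumptions rather than statements from the commit: a start day without an end day backfills and then follows live, and an end day without a start day is rejected:

```java
import java.time.LocalDate;
import java.util.Optional;

/** Hypothetical classification of the three download-live modes from the optional day bounds. */
enum LiveMode {
    START_NO_END,  // backfill from the start day, then keep following live (assumed)
    START_AND_END, // process only the closed range [start, end] (assumed to stop afterwards)
    PURE_LIVE;     // no start day: follow the chain tip only

    static LiveMode resolve(Optional<LocalDate> startDay, Optional<LocalDate> endDay) {
        if (startDay.isEmpty()) {
            // Assumption: an end day without a start day is treated as a usage error.
            if (endDay.isPresent()) {
                throw new IllegalArgumentException("--end-day requires --start-day");
            }
            return PURE_LIVE;
        }
        if (endDay.isEmpty()) {
            return START_NO_END;
        }
        if (endDay.get().isBefore(startDay.get())) {
            throw new IllegalArgumentException("--end-day is before --start-day");
        }
        return START_AND_END;
    }
}
```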
…ily tar rollover

Includes mirror polling, GCS downloads, hash+signature validation, tar append, zstd rollover, state persistence, and quarantine handling.

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
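The tar append and zstd rollover steps shell out to system binaries. A minimal sketch of the commands involved, assuming `tar` and `zstd` are on PATH; `DayArchiver` and its methods are hypothetical names, and the exact flags the PR passes are not shown in the commit:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical wrapper around the system tar and zstd binaries (both assumed to be on PATH). */
final class DayArchiver {
    /** tar -rf <dayKey>.tar -C <dayDir> <name>: append one file, stored relative to the day folder. */
    static List<String> appendCommand(Path tarFile, Path dayDir, String fileName) {
        return List.of("tar", "-rf", tarFile.toString(), "-C", dayDir.toString(), fileName);
    }

    /** zstd -q -f <dayKey>.tar -o <dayKey>.tar.zstd: zstd keeps the input .tar unless --rm is given. */
    static List<String> compressCommand(Path tarFile) {
        return List.of("zstd", "-q", "-f", tarFile.toString(), "-o", tarFile + ".zstd");
    }

    /** Runs a command, inheriting stdout/stderr, and throws if it exits non-zero. */
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with status " + exit);
        }
    }
}
```

A caller could run appendCommand after each validated file and submit compressCommand to an executor at rollover, deleting the per-day folder only after run(...) returns normally, so a failed compression never loses the uncompressed data.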
…rical day ranges and then transitioning seamlessly into live-follow behaviour. The previous implementation always processed “today” regardless of --start-day/--end-day; this patch replaces that logic with explicit day-by-day iteration.

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
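Explicit day-by-day iteration over the historical range could look like the sketch below. `BackfillPlan` is hypothetical, and capping the range at the day before today (so the still-open current day is handled by live-follow) is an assumption about how the hand-off works:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical planner for the historical part of a download-live run. */
final class BackfillPlan {
    /**
     * Returns each day from startDay through endDay (inclusive) in order, capped at the day before
     * today, so the still-open current day is left to the live-follow loop. A null endDay means
     * "no end day". Returns an empty list when there is nothing to backfill.
     */
    static List<LocalDate> historicalDays(LocalDate startDay, LocalDate endDay, LocalDate today) {
        LocalDate lastComplete = today.minusDays(1);
        LocalDate last = (endDay == null || endDay.isAfter(lastComplete)) ? lastComplete : endDay;
        if (last.isBefore(startDay)) {
            return List.of();
        }
        // datesUntil is end-exclusive, so step one day past the last day to include it.
        return startDay.datesUntil(last.plusDays(1)).collect(Collectors.toList());
    }
}
```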
…ed to pass in storage class

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
- Wire LiveDownloader to use DownloadDayLiveImpl.downloadSingleBlockForLive(...) for live-day batches
- Introduce BlockDownloadResult-based fullBlockValidate(...) that builds RecordFileBlockV6 directly from the in-memory files (record, sigs, sidecars) returned by DownloadDayLiveImpl
- Run full block validation in live mode before persisting any files or advancing previousRecordFileHash, skipping persistence on validation failure
- Persist fully validated files into per-day folders and append to <dayKey>.tar to mirror historic downloadDay behaviour
- Keep historical days on the existing full-day downloadDay(...) pipeline for backfill

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
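The validate-before-persist ordering described above can be sketched with simplified stand-in types. `ValidatingIngestor`, `Block`, and `Sink` are hypothetical (not the PR's BlockDownloadResult or RecordFileBlockV6), the full validation is abstracted as a predicate, and accepting the first block without a chain check is an assumption; in practice the starting hash would come from saved state or the mirror node:

```java
import java.security.MessageDigest;
import java.util.function.Predicate;

/** Hypothetical gate: validate a live block fully before persisting it or advancing the hash chain. */
final class ValidatingIngestor {
    /** Illustrative stand-in for an in-memory block download. */
    static final class Block {
        final long number;
        final byte[] previousHash; // hash of the preceding block as recorded in this block
        final byte[] hash;         // this block's own hash

        Block(long number, byte[] previousHash, byte[] hash) {
            this.number = number;
            this.previousHash = previousHash.clone();
            this.hash = hash.clone();
        }
    }

    /** Where accepted blocks go (per-day folder plus tar append in the commit's description). */
    interface Sink {
        void persist(Block block) throws Exception;
    }

    private final Predicate<Block> fullValidate; // signatures, ⅔ rule, sidecar hashes, ...
    private final Sink sink;
    private byte[] previousRecordFileHash;       // null until the first block is accepted

    ValidatingIngestor(Predicate<Block> fullValidate, Sink sink) {
        this.fullValidate = fullValidate;
        this.sink = sink;
    }

    /**
     * Returns true if the block was persisted. On a validation failure nothing is persisted and
     * the chain head is left unchanged, so the caller can quarantine the block.
     */
    boolean ingest(Block block) throws Exception {
        boolean chainOk = previousRecordFileHash == null
                || MessageDigest.isEqual(previousRecordFileHash, block.previousHash);
        if (!chainOk || !fullValidate.test(block)) {
            return false;
        }
        sink.persist(block);                         // may throw; chain head is not advanced then
        previousRecordFileHash = block.hash.clone(); // advance only after a successful write
        return true;
    }
}
```

Advancing the chain head only after the write succeeds means a rejected or failed block cannot become the parent that later blocks are checked against.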
Force-pushed from 3d937fb to c089745.
mustafauzunn left a comment:
build is failing
Force-pushed from f1ba2e6 to 4da2805.
This PR adds a complete live block ingestion pipeline through a new days download-live command. It continuously polls the mirror node for new blocks, downloads the associated record files, signature files, and sidecars from GCS, and validates them using the same full verification path as the historical download2 and Validate tools. The downloader computes the record hash, checks the signature file hash, verifies node signatures against the address book, enforces the ⅔ (node) rule, validates the hash chain, and ensures that all sidecar hashes match the values embedded in the record file.

Successfully validated blocks are written into a per-day directory and appended to an open tar file for that day, while invalid or unverifiable blocks are quarantined with detailed reason logging. The system automatically rolls over at day boundaries: it finalizes the tar, compresses it to zstd in the background, and initializes a new day’s tar.

A lightweight JSON state file records the current day and the last ingested block, allowing ingestion to resume seamlessly after restarts. Overall, this PR brings live ingestion to feature parity with the historical tooling while adding tar-based archival, day rollover, resiliency, and resumable state.
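A sketch of the ⅔ (node) rule, assuming it counts nodes with a verified signature against the address-book size and is inclusive at exactly two-thirds. Both of those readings, and the class name `SignatureQuorum`, are assumptions not confirmed by the description:

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical check for the ⅔ (node) signature rule. */
final class SignatureQuorum {
    /**
     * Counts address-book nodes whose signature verified. Nodes outside the address book are
     * ignored, and each node counts at most once because the map is keyed by node id.
     */
    static int countVerified(Map<Long, Boolean> verifiedByNodeId, Set<Long> addressBookNodeIds) {
        int count = 0;
        for (Long nodeId : addressBookNodeIds) {
            if (Boolean.TRUE.equals(verifiedByNodeId.get(nodeId))) {
                count++;
            }
        }
        return count;
    }

    /** True when verified * 3 >= total * 2; integer math avoids rounding at the boundary. */
    static boolean meetsTwoThirds(int verified, int addressBookSize) {
        if (addressBookSize <= 0) {
            throw new IllegalArgumentException("address book is empty");
        }
        return 3L * verified >= 2L * addressBookSize;
    }
}
```

Comparing `3 * verified` with `2 * total` instead of `verified / (double) total >= 0.6667` keeps the boundary exact: 20 of 30 nodes passes, 19 of 30 does not.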