@rockysingh
Contributor

@rockysingh rockysingh commented Nov 14, 2025

This PR adds a complete live block ingestion pipeline through a new days download-live command. It continuously polls the mirror node for new blocks, downloads the associated record files, signature files, and sidecars from GCS, and validates them using the same full verification path as the historical download2 and Validate tools. The downloader computes the record hash, checks the signature file hash, verifies node signatures against the address book, enforces the ⅔ (node) rule, validates the hash chain, and ensures that all sidecar hashes match the values embedded in the record file. Successfully validated blocks are written into a per-day directory and appended to an open tar file for that day, while invalid or unverifiable blocks are quarantined and the reason is logged in detail. The system automatically rolls over at day boundaries, finalizes the tar, compresses it with zstd in the background, and initializes a new day’s tar. A lightweight JSON state file records the current day and last ingested block, allowing ingestion to resume after restarts. Overall, this PR brings live ingestion to feature parity with the historical tooling while adding tar-based archival, day rollover, resiliency, and resumable state.
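
For readers unfamiliar with the ⅔ rule mentioned above, here is a minimal sketch of how such a threshold check could look. It is not the code in this PR: the class and method names are hypothetical, and it assumes the rule is count-based and inclusive (valid signers make up at least two thirds of the address book), which the description does not spell out.

```java
import java.util.Set;

class SignatureThreshold {
    /**
     * Returns true when enough address-book nodes produced a valid signature.
     * Assumption: the rule is count-based and inclusive (valid >= 2/3 of nodes).
     */
    static boolean meetsTwoThirds(Set<Long> validSigners, Set<Long> addressBookNodes) {
        if (addressBookNodes.isEmpty()) {
            return false; // nothing to verify against
        }
        // Only count signers that actually appear in the address book.
        long valid = validSigners.stream().filter(addressBookNodes::contains).count();
        // Integer arithmetic avoids floating-point rounding: valid / total >= 2 / 3.
        return valid * 3 >= (long) addressBookNodes.size() * 2;
    }
}
```

With a three-node address book, two valid signers pass and one does not; signers missing from the address book are ignored.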

@rockysingh rockysingh requested review from a team as code owners November 14, 2025 00:58
@rockysingh rockysingh changed the title from "Implement End-to-End Live Block Downloader: Polling, Hash Validation, Tar Packaging, Compression, and Resumable State" to "chore: Implement End-to-End Live Block Downloader: Polling, Hash Validation, Tar Packaging, Compression, and Resumable State" Nov 14, 2025
@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch from b18732f to 9922483 Compare November 14, 2025 18:15
@rockysingh rockysingh added the "Block Node Tools" label (Additional tools related to, but not part of, the Block Node) Nov 15, 2025
@rockysingh rockysingh self-assigned this Nov 15, 2025
@rockysingh rockysingh marked this pull request as draft November 15, 2025 02:52
@rockysingh rockysingh added this to the 0.25.0 milestone Nov 15, 2025
@rockysingh rockysingh changed the title from "chore: Implement End-to-End Live Block Downloader: Polling, Hash Validation, Tar Packaging, Compression, and Resumable State" to "chore: Introduce download-live — Continuous GCS-backed Block Ingestion with Validation, Daily Tar Archives, and Automatic Rollover" Nov 15, 2025
@jsync-swirlds
Contributor

Note, DCO (Developer Certificate of Origin) is required for every commit (hence why it fails on this PR).

It's very difficult to add that if there are merge commits (one reason I recommend rebase instead), but you can try the following git command (I'd recommend using a separate clone just in case this goes badly).

git rebase -f --signoff --sign HEAD~11

The number 11 here is the count of commits on the branch after it diverged from main.

@jsync-swirlds
Contributor

Just an initial note based on the PR description.

To make this maximally compatible with the Block Node, it would be helpful to zstd compress the written block files before they're added to a tar (ideally when first written).

@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch from 52531c8 to 84185a0 Compare November 19, 2025 02:40
@rockysingh rockysingh closed this Nov 21, 2025
@rockysingh rockysingh reopened this Nov 21, 2025
@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch from f4c1d91 to afa8cc7 Compare November 21, 2025 21:44
@rockysingh
Contributor Author

> Just an initial note based on the PR description.
>
> To make this maximally compatible with the Block Node, it would be helpful to zstd compress the written block files before they're added to a tar (ideally when first written).

Sure, will add that.

@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch 2 times, most recently from 1224daa to 3d937fb Compare November 25, 2025 22:22
# Conflicts:
#	tools-and-tests/tools/src/main/java/org/hiero/block/tools/days/subcommands/DownloadLive.java

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
…json)

- Persists `dayKey` and `lastSeenBlock` after each tick for resuming downloads
- Loads state automatically once per day to continue from the last processed block
- Uses minimal JSON read/write via java.nio and regex (no external dependencies)
- Added console logs for state load/resume/failure
- Ensures parent directories are created before writing
- Completes core of resumable day-scoped live poller (next: dedupe + compression)
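
A minimal sketch of the state handling described in this commit message, for illustration only. The field names `dayKey` and `lastSeenBlock` come from the message above; the class name `LiveState`, the exact JSON layout, and returning null for a missing or unreadable file are assumptions, not the PR's actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiveState {
    // Regexes for the two fields; enough for a flat, two-field JSON object.
    private static final Pattern DAY = Pattern.compile("\"dayKey\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern BLOCK = Pattern.compile("\"lastSeenBlock\"\\s*:\\s*(-?\\d+)");

    final String dayKey;
    final long lastSeenBlock;

    LiveState(String dayKey, long lastSeenBlock) {
        this.dayKey = dayKey;
        this.lastSeenBlock = lastSeenBlock;
    }

    /** Writes the state as JSON, creating parent directories first. */
    static void save(Path file, LiveState state) throws IOException {
        Path abs = file.toAbsolutePath();
        Files.createDirectories(abs.getParent());
        String json = "{\"dayKey\":\"" + state.dayKey + "\",\"lastSeenBlock\":" + state.lastSeenBlock + "}";
        Files.writeString(abs, json, StandardCharsets.UTF_8);
    }

    /** Returns the saved state, or null when the file is missing or lacks either field. */
    static LiveState load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        String json = Files.readString(file, StandardCharsets.UTF_8);
        Matcher day = DAY.matcher(json);
        Matcher block = BLOCK.matcher(json);
        if (!day.find() || !block.find()) {
            return null;
        }
        return new LiveState(day.group(1), Long.parseLong(block.group(1)));
    }
}
```

The regex approach keeps the tool free of a JSON dependency, at the cost of only supporting this flat layout; a `dayKey` containing quotes or escapes would need a real parser.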

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
- Persists `dayKey` and `lastSeenBlock` after each tick for resuming downloads
- Loads state automatically once per day to continue from the last processed block
- Uses minimal JSON read/write via java.nio and regex (no external dependencies)
- Added console logs for state load/resume/failure
- Ensures parent directories are created before writing
- Completes core of resumable day-scoped live poller (next: dedupe + compression)

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
	•	Copy files from sourceRoot into a tmp file, then atomically move to the per-day folder
	•	Decompress .gz in-memory and parse via RecordFileInfo.parse(...)
	•	Compute the block hash and compare against the mirror-provided hash (with optional 0x prefix)
	•	Append each successfully validated file into a per-day <dayKey>.tar via system tar
	•	Add background compression and cleanup:
		•	On day rollover, schedule async compression of <dayKey>.tar → <dayKey>.tar.zstd using zstd
		•	After successful compression, delete the per-day <dayKey>/ folder while keeping both .tar and .tar.zstd
	•	Make the poller continuous across days:
		•	Detect day change in the configured rollover timezone
		•	Finalize the previous day in the background and start writing into the new day’s tar
		•	Preserve lastSeenBlock across days to maintain monotonic block ingestion and avoid re-downloading previous blocks
	•	Add logging around state load/save, sample descriptors, download/placement, tar append, and compression/cleanup for operational visibility
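
Three of the steps listed above (atomic placement, hash comparison with an optional 0x prefix, and day-change detection) can be sketched as follows. This is an illustrative sketch, not the code in this commit: the class and method names are hypothetical, and the ISO date format for the day key is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.time.ZoneId;
import java.util.Locale;

class LiveFileSteps {
    /** Copies source into a temp file inside dayDir, then atomically renames it into place. */
    static Path placeAtomically(Path source, Path dayDir) throws IOException {
        Files.createDirectories(dayDir);
        Path target = dayDir.resolve(source.getFileName());
        // Temp file lives in the target directory so the move stays on one filesystem.
        Path tmp = Files.createTempFile(dayDir, "dl-", ".tmp");
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Compares two hex hashes, ignoring case and an optional leading 0x. */
    static boolean hashesMatch(String computedHex, String mirrorHex) {
        return normalize(computedHex).equals(normalize(mirrorHex));
    }

    private static String normalize(String hex) {
        String h = hex.trim().toLowerCase(Locale.ROOT);
        return h.startsWith("0x") ? h.substring(2) : h;
    }

    /** Day key for a timestamp in the rollover timezone (assumed format: yyyy-MM-dd). */
    static String dayKey(Instant timestamp, ZoneId rolloverZone) {
        return timestamp.atZone(rolloverZone).toLocalDate().toString();
    }

    /** True when "now" falls on a different day than the one currently being written. */
    static boolean isRollover(String currentDayKey, Instant now, ZoneId rolloverZone) {
        return !dayKey(now, rolloverZone).equals(currentDayKey);
    }
}
```

Because the day key depends on the configured zone, the same instant can belong to different days: 23:30 UTC on 14 November is already 15 November in Asia/Tokyo.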

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
This then allows us to use download-live in three modes:

1. start date with no end date
2. start date with end date
3. pure live mode
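
A rough sketch of how the three modes could be derived from the optional start and end days. The enum, its constant names, and the rejection of an end day without a start day are assumptions for illustration; the actual option handling in DownloadLive.java may differ.

```java
import java.time.LocalDate;

class RunModes {
    // Hypothetical names for the three modes listed above.
    enum Mode { START_THEN_LIVE, BOUNDED_RANGE, PURE_LIVE }

    /** Picks a mode from the optional start and end days (null means "not given"). */
    static Mode resolve(LocalDate startDay, LocalDate endDay) {
        if (startDay == null && endDay == null) {
            return Mode.PURE_LIVE; // no range: follow new blocks only
        }
        if (startDay == null) {
            // Assumption: an end day alone is not a meaningful request.
            throw new IllegalArgumentException("an end day requires a start day");
        }
        if (endDay == null) {
            return Mode.START_THEN_LIVE; // backfill from the start day, then keep following
        }
        if (endDay.isBefore(startDay)) {
            throw new IllegalArgumentException("end day is before start day");
        }
        return Mode.BOUNDED_RANGE; // process the closed range, then stop
    }
}
```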

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
…ily tar rollover

Includes mirror polling, GCS downloads, hash+signature validation, tar append, zstd rollover, state persistence, and quarantine handling.

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
…rical day ranges and then transitioning seamlessly into live-follow behaviour. The previous implementation always processed “today” regardless of --start-day/--end-day; this patch replaces that logic with explicit day-by-day iteration.

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
…ed to pass in storage class

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
- Wire LiveDownloader to use DownloadDayLiveImpl.downloadSingleBlockForLive(...) for live-day batches
- Introduce BlockDownloadResult-based fullBlockValidate(...) that builds RecordFileBlockV6
  directly from the in-memory files (record, sigs, sidecars) returned by DownloadDayLiveImpl
- Run full block validation in live mode before persisting any files or advancing
  previousRecordFileHash, skipping persistence on validation failure
- Persist fully validated files into per-day folders and append to <dayKey>.tar to
  mirror historic downloadDay behaviour
- Keep historical days on the existing full-day downloadDay(...) pipeline for backfill
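
The validate-before-persist ordering described above can be illustrated with a small sketch. Only the hash-chain link stands in for full block validation here, and the `ChainGate` and `Block` types are hypothetical; they are not the `RecordFileBlockV6` or `BlockDownloadResult` classes from this PR. Recording failed blocks in a quarantine list is likewise an assumption based on the PR description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChainGate {
    /** Hypothetical minimal view of a downloaded block. */
    static final class Block {
        final long number;
        final byte[] previousHash; // hash the block claims for its predecessor
        final byte[] hash;         // hash computed for this block

        Block(long number, byte[] previousHash, byte[] hash) {
            this.number = number;
            this.previousHash = previousHash;
            this.hash = hash;
        }
    }

    private byte[] previousRecordFileHash; // null until the first block is accepted
    final List<Long> persisted = new ArrayList<>();
    final List<Long> quarantined = new ArrayList<>();

    /** Validates first; persists and advances the chain only on success. */
    void accept(Block block) {
        boolean chainOk = previousRecordFileHash == null
                || Arrays.equals(previousRecordFileHash, block.previousHash);
        if (!chainOk) {
            quarantined.add(block.number); // nothing written, chain state unchanged
            return;
        }
        persisted.add(block.number);       // stands in for writing files and appending to the tar
        previousRecordFileHash = block.hash;
    }
}
```

Since the stored hash only advances after a successful check, a rejected block cannot corrupt validation of the blocks that follow it.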

Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch from 3d937fb to c089745 Compare November 25, 2025 22:22
@rockysingh rockysingh marked this pull request as ready for review November 25, 2025 22:23
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
Contributor

@mustafauzunn mustafauzunn left a comment

build is failing

@rockysingh rockysingh requested a review from a team as a code owner November 27, 2025 00:01
Signed-off-by: Rocky Thind <harpender.t@swirldslabs.com>
@rockysingh rockysingh force-pushed the 1845-create-live-downloader-for-latest-files branch from f1ba2e6 to 4da2805 Compare November 27, 2025 00:01
Labels

Block Node Tools Additional tools related to, but not part of, the Block Node

4 participants