feat(pdp): BIG RENAME "proof set"->"data set", "root"->"piece" #569
Conversation
-- Check if migration has already been run (idempotency check)
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM information_schema.tables
               WHERE table_schema = current_schema()
               AND table_name = 'pdp_data_sets') THEN
        RAISE NOTICE 'Migration already applied, skipping...';
        RETURN;
    END IF;
END
$$;
Migrations are already guarded in harmonydb, so this is redundant - https://github.com/filecoin-project/curio/blob/main/harmony/harmonydb/harmonydb.go#L271-L297
(To be fair, we could improve the harmonydb guard. Right now the base table prevents a migration from running twice, but it doesn't prevent two nodes from starting a migration at the same time. This could be solved with a second table, e.g. base_apply: before starting an update, a node would look at that table and try to insert an entry with its own address. If the insert fails, it means another node is running the upgrade, or that the upgrade has failed; in either case the correct behavior is to sit there and wait for the entry to disappear and for the relevant base entry to appear.)
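The claim-then-wait semantics described above could be sketched roughly like this. This is a minimal in-memory illustration of the coordination logic only, not curio code; the names (`applyLog`, `tryClaim`, the `base_apply` analogy) are hypothetical, and a real implementation would use a database insert with a uniqueness constraint instead of a mutex-guarded map:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// applyLog stands in for a hypothetical base_apply table: at most one
// node may hold the entry for a given migration at a time.
type applyLog struct {
	mu      sync.Mutex
	running map[string]string // migration name -> node address
}

var errInProgress = errors.New("migration already in progress on another node")

// tryClaim mimics an INSERT that fails on a duplicate key: only the
// first node to claim a migration succeeds; everyone else should wait.
func (l *applyLog) tryClaim(migration, node string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if owner, ok := l.running[migration]; ok {
		return fmt.Errorf("%w (held by %s)", errInProgress, owner)
	}
	l.running[migration] = node
	return nil
}

// release removes the entry once the migration finished (or failed),
// letting waiting nodes re-check whether the base entry appeared.
func (l *applyLog) release(migration string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.running, migration)
}

func main() {
	log := &applyLog{running: map[string]string{}}
	fmt.Println(log.tryClaim("pdp_rename", "node-a")) // first claim succeeds
	fmt.Println(log.tryClaim("pdp_rename", "node-b")) // second claim is rejected
	log.release("pdp_rename")
	fmt.Println(log.tryClaim("pdp_rename", "node-b")) // succeeds after release
}
```

In a real cluster the waiting node would poll (or listen for a notification) until the entry disappears, rather than failing outright.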
What happens if this is applied in a running cluster?
I'm all for the rename, but there are multiple live, pretty serious clusters with the expectation that all upgrades can be performed live (though admittedly that set doesn't really overlap with the set of nodes running PDP, for now).
Maybe for now this is fine, but in the future we need to build some mechanism which makes migrations like this possible - e.g. a feature/upgrade flags table, where each node goes and says e.g. "I support the PDP name migration". The cluster operator would then roughly:
- Do a rolling upgrade of all nodes in the cluster
- All nodes would advertise support for the PDP upgrade
- The upgrade would trigger (automatically or manually) in some atomic way (e.g. cordoning tasks related to the feature flag)
- An additional mechanism could e.g. apply an implicit config layer to nodes NOT supporting a feature flag that is enabled in the cluster (so that old nodes joining the cluster after a feature flag was enabled don't cause a mess)
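The readiness check behind such a flags table could look roughly like the following. This is a hedged sketch of the idea, not an existing curio API; `clusterSupports` and the flag name are made up for illustration, and a real version would read advertised flags from the database rather than a map:

```go
package main

import "fmt"

// clusterSupports reports whether every live node has advertised the
// given feature flag. The upgrade may only trigger once this is true.
func clusterSupports(flag string, nodeFlags map[string][]string) bool {
	if len(nodeFlags) == 0 {
		return false // an empty cluster can't vouch for anything
	}
	for _, flags := range nodeFlags {
		supported := false
		for _, f := range flags {
			if f == flag {
				supported = true
				break
			}
		}
		if !supported {
			return false // at least one node hasn't opted in
		}
	}
	return true
}

func main() {
	nodes := map[string][]string{
		"node-a": {"pdp-name-migration"},
		"node-b": {"pdp-name-migration"},
	}
	fmt.Println(clusterSupports("pdp-name-migration", nodes)) // true

	// An old node joining without the flag makes the cluster not ready,
	// which is where the implicit config layer idea would kick in.
	nodes["node-c"] = nil
	fmt.Println(clusterSupports("pdp-name-migration", nodes)) // false
}
```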
Updated the new PDP contract address in here and made it bork on mainnet PDP because that's not deployed yet (@rjan90 maybe we should get a matching mainnet one deployed too?).
Re database - one problem we have here is that the data itself cannot be migrated, because the new contract is not a plain upgrade of the old one with state copied across, so this unfortunately has to be coupled with a cleared database. That's not going to be ideal for anyone on mainnet, but currently we lack any infrastructure that has a notion of matching proof sets ("data sets") to contracts; it's just one global contract. So we'll have to work on docs & messaging for that, perhaps also some warning or prompting .. somehow.
Retargeted to the |
I've manually merged the changes from |
This is a rename to match the PDPVerifier contract that is about to be deployed. The main changes are the terminology of "proof set" becoming "data set" and "root" becoming "piece", but these have large flow-on effects.
I've split this into 3 commits, so it's possible to not go the whole way, although the last commit does contain additional cleanup as I kept finding things; so if we opt for a part-way rename I'd have to do a bit more reconciliation work. IMO just doing the full thing makes everything easier, and we don't end up with weird mismatches anywhere in the stack. I understand it introduces a little more complication in the mkv2 migration, but hopefully that's minimal since the rules are clear.
This commit also renames fields such as `piece_cid` -> `pieceCid` and `proofsetRoot` -> `dataSetRoot`(s), and opted for "SubRoot" and "subRoot", never "subroot". It does a fairly nice job of cleaning things up, but it is breaking for clients. Ultimately, in the SDK and all external interactions, we want to entirely remove the old language; we don't want to see error traces or network logs showing "proofset" or "root".
Unfortunately, as I mentioned above, this third commit also contains some non-database renames that I found along the way and ended up squashing into it.
Going to leave this as a draft until we have a contract deployed and the ability to test against it. I'd like to do full e2e testing with the SDK rename I'm working on in FilOzone/synapse-sdk#129.