Skip to content

Conversation

@celiala
Copy link
Collaborator

@celiala celiala commented Nov 22, 2025

Part of the quarterly M.4 "Bump MinSupported" task as outlined in pkg/clusterversion/README.md.

This PR bumps MinSupported from v25.3 to v25.4 in 5 commits:

Fixes: #158383

Commit 1: Refactor CLAUDE.md into modular runbooks

Refactors the monolithic 4114-line CLAUDE.md into a modular structure:

  • Main CLAUDE.md (113 lines): Overview, navigation, and general guidance
  • Individual runbooks in pkg/clusterversion/runbooks/:
    • R1_prepare_for_beta.md
    • R2_mint_release.md
    • M1_bump_current_version.md
    • M2_enable_mixed_cluster_logic_tests.md
    • M3_enable_upgrade_tests.md
    • M4_bump_minsupported_version.md
  • Updated README.md Claude Prompts to reference new runbook locations

This makes each runbook easier to read, maintain, and use independently.

Note: This commit also adds Step 7 to the M.1 runbook documenting the requirement to update pkg/storage/pebble.go with a version mapping for the new release, which was missing from the original M.1 documentation.

Commit 2: Bump MinSupported from v25.3 to v25.4

  • Updated MinSupported constant in pkg/clusterversion/cockroach_versions.go from V25_3 to V25_4
  • Updated pkg/storage/pebble.go:
    • Added V25_4 entry to pebbleFormatVersionMap with FormatMarkForCompactionInVersionEdit (fixes missing entry that should have been added during M.1)
    • Updated MinimumSupportedFormatVersion from FormatValueSeparation to FormatMarkForCompactionInVersionEdit

After this change, clusters running v25.3 can no longer connect to clusters running master, and direct upgrades from v25.3 to master are no longer supported.

Commit 3: Remove schema changer rules for release_25_3

  • Deleted entire release_25_3 rules directory (~13,000 lines)
  • Removed import and registry entry from plan.go
  • Updated Bazel dependencies

These frozen schema changer rules are no longer needed since v25.3 nodes can no longer participate in mixed-version clusters.

Commit 4: Remove bootstrap data for v25.3

  • Deleted 25_3_system.keys, 25_3_system.sha256
  • Deleted 25_3_tenant.keys, 25_3_tenant.sha256
  • Removed V25_3 entry from initialValuesFactoryByKey map
  • Removed go:embed variables for v25.3 bootstrap data
  • Updated BUILD.bazel to remove embedsrcs

Bootstrap data files are no longer needed since clusters can no longer start at v25.3 (below MinSupported).

Commit 5: Remove local-mixed-25.3 test configuration

  • Removed local-mixed-25.3 config from logictestbase.go
  • Removed from default-configs and schema-locked-disabled sets
  • Deleted generated test directories for local-mixed-25.3
  • Removed all skipif/onlyif references from 64 logic test files
  • Regenerated Bazel BUILD files via ./dev gen bazel

The local-mixed-25.3 configuration simulates a mixed-version cluster with v25.3 nodes, which can no longer connect after bumping MinSupported.


Reference PR: #157767 (v25.2 → v25.3 MinSupported bump)

Epic: None
Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Nov 22, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@celiala celiala force-pushed the bump-minsupported-25.3-to-25.4 branch 7 times, most recently from 3a7e916 to fca94ab Compare November 24, 2025 17:11
@celiala
Copy link
Collaborator Author

celiala commented Nov 24, 2025

Unit tests are failing for pkg/upgrade/upgrades/first_upgrade_test.go: TestFirstUpgradeRepair

  • @fqazi or @rafiss - could you recommend what to do for the TestFirstUpgradeRepair test failure?
  • See also analysis and recommendation from Claude below

thank you!

# pkg/upgrade/upgrades/first_upgrade_test.go
=== RUN   TestFirstUpgradeRepair
...
first_upgrade_test.go:288: expected query 'ALTER TABLE test.public.foo RENAME TO bar'
error 'relation "foo" \(106\): invalid foreign key backreference: missing table=123456789...'
got: pq: referenced descriptor ID 123456789: looking up ID 123456789: descriptor not found

Claude's Analysis: TestFirstUpgradeRepair Test Failure

Issue: Error message changed from detailed FK-specific format to simpler descriptor lookup format.

TL;DR

Questions

  1. Was this error message change intentional?
  2. Is the detailed FK validation error valuable for debugging, or is "descriptor not found" sufficient?
  3. Should we standardize on the simpler format (matching repair_test.go)?

Recommendation

Update test expectations (Option 1). The new format:

  • Correctly identifies the problem
  • Matches existing test patterns
  • Maintains test's core purpose (validating repair functionality)
  • 3-line change

If detailed FK errors are preferred, investigate and restore previous behavior (Option 2).

Root Cause

Descriptor validation now happens earlier during lookup (pkg/sql/catalog/errors.go) rather than during FK constraint validation. The simpler error surfaces before the detailed FK validation runs. This pattern matches pkg/sql/tests/repair_test.go:910 which already expects the simpler format.

Fix Options

Option 1: Update Test Expectations (Recommended)

// Change from:
const errRE = "relation \"foo\" \\(106\\): invalid foreign key backreference: missing table=123456789..."
const errReForFunction = " function \"f\" \\(107\\): referenced descriptor ID 123456789..."

// To:
const errRE = "referenced descriptor ID 123456789: looking up ID 123456789: descriptor not found"
tdb.ExpectErr(t, errRE, "ALTER TABLE test.public.foo RENAME TO bar")
tdb.ExpectErr(t, errRE, "ALTER FUNCTION test.public.f RENAME TO g")

Why: Simpler, consistent with existing tests, still validates corruption detection and repair.

Option 2: Investigate If Error Change Was Intentional

If the detailed FK error is valuable for debugging, investigate recent descriptor validation changes and consider restoring it.

Option 3: Accept Both Formats

Make test flexible to accept either error format using strings.Contains() checks.

@celiala celiala force-pushed the bump-minsupported-25.3-to-25.4 branch 6 times, most recently from b7abd5f to 66d3555 Compare December 2, 2025 02:13
@celiala celiala marked this pull request as ready for review December 2, 2025 04:58
@celiala celiala requested review from a team as code owners December 2, 2025 04:58
Copy link
Collaborator Author

@celiala celiala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is RFAL - thank you!

var pebbleFormatVersionMap = map[clusterversion.Key]pebble.FormatMajorVersion{
clusterversion.V25_4_PebbleFormatV2BlobFiles: pebble.FormatV2BlobFiles,
clusterversion.V25_3: pebble.FormatValueSeparation,
clusterversion.V25_4: pebble.FormatV2BlobFiles,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V25_4_PebbleFormatV2BlobFiles has been changed to V25_4, as discussed in #158225 (comment)

func TestPebbleFormatVersion(t *testing.T) {
defer leaktest.AfterTest(t)()

if len(pebbleFormatVersionMap) == 1 {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a test skip if len(map) is 1, as discussed in #158225 (comment)


// The corruption should interfere with DDL statements.
const errRE = "relation \"foo\" \\(106\\): invalid foreign key backreference: missing table=123456789: referenced table ID 123456789: referenced descriptor not found"
const errRE = "referenced descriptor ID 123456789: looking up ID 123456789: descriptor not found"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fqazi
Copy link
Collaborator

fqazi commented Dec 2, 2025

Unit tests are failing for pkg/upgrade/upgrades/first_upgrade_test.go: TestFirstUpgradeRepair

@celiala Option 1 is fine for this, the failure is linked to the declarative schema changer being activated. So it hits the same error at the different point in the code.

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes all look good. One comment / concern related to the TestFirstUpgradeRepair runbook, the steps seems geared towards this specific problem. I'm skeptical we'll see the same problem again.

@fqazi reviewed 3 of 81 files at r12, 3 of 3 files at r13, 31 of 31 files at r14, 7 of 7 files at r15, 32 of 37 files at r16, 8 of 8 files at r17, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @celiala, @RaduBerinde, @rail, and @xinhaoz)


pkg/clusterversion/runbooks/M4_bump_minsupported_version.md line 269 at r17 (raw file):

⚠️ IMPORTANT: Update test expectations in first_upgrade_test.go

I don't think we want this in the runbook in general? I think this was a one off thing.

The CLAUDE.md runbook had grown to 4114 lines (141K), making it difficult
to navigate and maintain. This commit refactors it into a modular structure:

- Main CLAUDE.md (113 lines): Overview, navigation, and general guidance
- Individual runbooks in pkg/clusterversion/runbooks/:
  - R1_prepare_for_beta.md
  - R2_mint_release.md
  - M1_bump_current_version.md
  - M2_enable_mixed_cluster_logic_tests.md
  - M3_enable_upgrade_tests.md
  - M4_bump_minsupported_version.md

Also updated README.md Claude Prompts to reference the new runbook locations.

This makes each runbook easier to read, maintain, and use independently.

Release note: None
Part of the quarterly M.4 "Bump MinSupported" task as outlined in
`pkg/clusterversion/README.md`.

This commit updates the MinSupported constant from V25_3 to V25_4,
along with all code that references the MinSupported version.

After this change, clusters running v25.3 can no longer connect to
clusters running master, and direct upgrades from v25.3 to master are
no longer supported.

Changes include updates to:
- MinSupported constant in pkg/clusterversion/cockroach_versions.go
- Storage layer: MinimumSupportedFormatVersion updated to FormatV2BlobFiles

Release note: None
Part of the quarterly M.4 "Bump MinSupported" task as outlined in
`pkg/clusterversion/README.md`.

After bumping MinSupported from v25.3 to v25.4, the frozen schema
changer rules for release 25.3 are no longer needed. These rules were
used to ensure schema changes work correctly in mixed-version clusters
with v25.3 nodes, which are no longer supported.

This commit removes:
- The entire release_25_3 rules directory
- Import and registry entry from plan.go
- Bazel dependency

Release note: None
Part of the quarterly M.4 "Bump MinSupported" task as outlined in
`pkg/clusterversion/README.md`.

This commit removes the bootstrap data for v25.3, which is now below the
minimum supported version after bumping MinSupported from v25.3 to v25.4.

The bootstrap data files are used to initialize clusters at specific
versions. Since clusters can no longer start at v25.3 (it's below
MinSupported), these files are no longer needed.

Changes:
- Removed 25_3_system.keys and 25_3_system.sha256
- Removed 25_3_tenant.keys and 25_3_tenant.sha256
- Removed V25_3 entry from initialValuesFactoryByKey map in initial_values.go
- Removed go:embed variables for v25.3 bootstrap data
- Updated BUILD.bazel to remove embedsrcs for deleted files

Part of cockroachdb#157767 (reference PR for this quarterly task).

Release note: None
Part of the quarterly M.4 "Bump MinSupported" task as outlined in
`pkg/clusterversion/README.md`.

After bumping MinSupported from v25.3 to v25.4, the local-mixed-25.3
test configuration is no longer needed since it simulates a mixed-version
cluster with v25.3 nodes, which can no longer connect to the cluster.

This commit:
- Removes the local-mixed-25.3 config from logictestbase.go
- Removes it from the default-configs and schema-locked-disabled sets
- Deletes the generated test directories for local-mixed-25.3
- Removes all references from logic test files (skipif/onlyif directives)
- Removes empty LogicTest directive lines that resulted from deletions
- Regenerates Bazel BUILD files via `./dev gen bazel`

Note: pebbleFormatVersionMap updates are deferred to a separate PR.
Following the pattern from PR cockroachdb#157767 (v25.2→v25.3 bump), pebble.go is
not modified in the M.4 PR. The map will be updated separately once
V26_1_* internal versions with Pebble format changes are added, similar
to how V25_4_PebbleFormatV2BlobFiles was added before the previous M.4.

Changes affect 64 test files that had skipif or onlyif directives
referencing local-mixed-25.3, plus the generated test files and BUILD
files that were auto-generated based on the removed configuration.

Part of cockroachdb#157767 (reference PR for this quarterly task pattern).

Release note: None
Part of the quarterly M.4 "Bump MinSupported" task as outlined in
`pkg/clusterversion/README.md`.

After bumping MinSupported from v25.3 to v25.4, the cockroach-go-testserver-25.3
configuration and related mixed-version tests are no longer needed:
- Clusters can no longer start below v25.4 (MinSupported)
- Mixed-version tests that check for v25.4 features are now obsolete since
  all clusters start at v25.4 or higher

This commit:
- Removes the cockroach-go-testserver-25.3 configuration from logictestbase.go
- Deletes the cockroach-go-testserver-25.3 test directory
- Deletes obsolete mixed-version test files:
  - mixed_version_ltree: Tested that LTREE is only available after upgrading to 25.4
  - mixed_version_partial_stats: Tested partial stats availability after upgrade to 25.4
- Regenerates Bazel BUILD files via `./dev gen bazel`

Following the pattern from reference PR cockroachdb#157767, which removed
cockroach-go-testserver-25.2 when MinSupported was bumped from v25.2 to v25.3.

Release note: None
@celiala celiala force-pushed the bump-minsupported-25.3-to-25.4 branch from 66d3555 to 7d4fb35 Compare December 3, 2025 03:14
@celiala
Copy link
Collaborator Author

celiala commented Dec 3, 2025

One comment / concern related to the TestFirstUpgradeRepair runbook, the steps seems geared towards this specific problem. I'm skeptical we'll see the same problem again.

Noted - this 1-off fix has been removed from the runbook 👍🏼

TFTRs!

bors r=fqazi

@craig craig bot merged commit 26a5f93 into cockroachdb:master Dec 3, 2025
24 of 26 checks passed
@craig
Copy link
Contributor

craig bot commented Dec 3, 2025

craig bot pushed a commit that referenced this pull request Dec 5, 2025
158684: sql: extend SST writer support to legacy schema changer backfill r=spilchen a=spilchen

Previously, SST writer support during index backfill with distributed merge was only available in the declarative schema changer (added in #158456). This commit extends the same logic to the legacy schema changer. To avoid duplication, shared logic was extracted into a helper function reused by both.

Informs #158378
Epic: CRDB-48845

Release note: none

158832: clusterversion: update M3 M4 runbooks r=rail a=celiala

This PR:
- adds a note to update `sqllogic_corpus_nightly_impl.sh` which was 
  - correctly done for 25.3 bump (#157767), 
  - but missed for 25.4 bump (#158225)
- adds a "quick reference" version for M3 and M4 runbooks, to mitigate bot confusion.

Improvements related to: 
- #158817
- #158741

Release note: None
Epic: None

Co-authored-by: Matt Spilchen <matt.spilchen@cockroachlabs.com>
Co-authored-by: Celia La <celia@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clusterversion: advance MinSupported to v25.4

4 participants