@lalvarezt
Owner

No description provided.

fix(ci): resolve CI/CD inconsistencies and broken baseline storage

This commit addresses several critical issues in the CI/CD workflows:

1. Fix broken baseline artifact storage:
   - benchmark.yml now downloads baseline from update-baseline.yml workflow
   - Previously tried to download from itself after baseline upload was removed

2. Standardize Rust toolchain versions to stable:
   - CI workflow: Changed test and rustfmt jobs from nightly to stable
   - CD workflow: Changed deb publish job from nightly to stable
   - Ensures consistent builds and testing across all jobs

These changes restore benchmark comparisons and ensure consistent CI/CD behavior.

feat(ci): add /bench command for on-demand PR benchmark comparisons

Add new workflow that allows repository owners to trigger benchmark
comparisons between any two refs via PR comments.

Features:
- Command syntax: /bench <ref1> <ref2>
- Owner-only security: Only repo owner can trigger
- Works on PR comments only (not regular issues)
- Compares any two commits, branches, or tags
- Posts detailed comparison report to PR
- Interactive reactions (👀 → 🚀 or 😕)
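
The owner-only and PR-only rules above amount to a small gate on the `issue_comment` event payload. The workflow's actual check is not shown on this page, so the following is only a minimal Python sketch of that gate. The function name `is_authorized_bench_request` is hypothetical; the payload keys (`issue.pull_request`, `comment.user.login`, `repository.owner.login`) are the ones GitHub sends for `issue_comment` events.

```python
from typing import Any, Mapping


def is_authorized_bench_request(event: Mapping[str, Any]) -> bool:
    """Return True only for a /bench comment made by the repo owner on a PR.

    Hypothetical helper: it mirrors the rules listed in the commit message,
    not the workflow's real implementation.
    """
    issue = event.get("issue", {})
    comment = event.get("comment", {})
    repo_owner = event.get("repository", {}).get("owner", {}).get("login")
    commenter = comment.get("user", {}).get("login")

    # issue_comment fires for plain issues and PRs alike; only PR payloads
    # carry a "pull_request" key on the issue object.
    is_pull_request = "pull_request" in issue

    # Require "/bench" as the first whitespace-separated token, so that
    # "/benchmark" or a mid-sentence mention does not trigger a run.
    tokens = (comment.get("body") or "").split()
    is_bench_command = bool(tokens) and tokens[0] == "/bench"

    is_owner = repo_owner is not None and commenter == repo_owner
    return is_pull_request and is_bench_command and is_owner
```

Comparing logins is one way to express "only the repo owner"; a real workflow could equally rely on the comment's `author_association` field.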

Use cases:
- Compare feature branch vs stable release
- Validate optimizations between specific commits
- Test performance across version boundaries
- Ad-hoc performance debugging

Documentation:
- Updated scripts/README.md with complete workflow documentation
- Added security model and usage examples
- Documented all three benchmark workflows

refactor(ci): simplify benchmark system to single /bench command

Removed automatic benchmarking workflows and consolidated to a single
on-demand command-based system with enhanced flexibility.

Changes:
- Removed benchmark.yml (automatic benchmarks on every PR/push)
- Removed update-baseline.yml (manual baseline updates)
- Enhanced bench-command.yml with optional parameters:
  * /bench <ref1> <ref2> [iterations] [sizes]
  * Defaults: iterations=100, sizes=1000,5000,10000
  * Examples: /bench main HEAD
             /bench v0.12.0 v0.13.0 100 1000,5000,10000

New features:
- Graceful handling when benchmark tool doesn't exist in refs
- Posts helpful error messages with commit info
- Validates all parameters before running
- Supports custom iterations and sizes per comparison
- Build logs included in artifacts for debugging
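
As a rough illustration of the command grammar and defaults described above, parsing and validation could look like the Python sketch below. The names `BenchRequest`, `parse_bench_command` and `_positive_int` are hypothetical, and the exact validation rules (positive integers, comma-separated sizes) are assumptions; only the syntax and the default values come from the commit message.

```python
from dataclasses import dataclass
from typing import List

# Defaults as stated in the commit message.
DEFAULT_ITERATIONS = 100
DEFAULT_SIZES = [1000, 5000, 10000]
USAGE = "Usage: /bench <ref1> <ref2> [iterations] [sizes]"


@dataclass(frozen=True)
class BenchRequest:
    ref1: str
    ref2: str
    iterations: int
    sizes: List[int]


def _positive_int(text: str, field: str) -> int:
    """Parse a strictly positive ASCII integer or raise ValueError."""
    if not (text.isascii() and text.isdigit()) or int(text) <= 0:
        raise ValueError(f"{field}: expected a positive integer, got {text!r}. {USAGE}")
    return int(text)


def parse_bench_command(comment_body: str) -> BenchRequest:
    """Parse '/bench <ref1> <ref2> [iterations] [sizes]' into a BenchRequest."""
    tokens = comment_body.split()
    # The command plus two refs are mandatory; iterations and sizes are optional.
    if not tokens or tokens[0] != "/bench" or not 3 <= len(tokens) <= 5:
        raise ValueError(USAGE)

    iterations = DEFAULT_ITERATIONS
    sizes = list(DEFAULT_SIZES)
    if len(tokens) >= 4:
        iterations = _positive_int(tokens[3], "iterations")
    if len(tokens) == 5:
        sizes = [_positive_int(part, "sizes") for part in tokens[4].split(",")]
    return BenchRequest(tokens[1], tokens[2], iterations, sizes)
```

Raising a single `ValueError` that carries the usage string keeps every parameter check ahead of any build step, which matches the "validates all parameters before running" item.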

Benefits:
- No automatic runs = lower CI costs
- Owner controls when benchmarks run
- Flexible parameters per comparison
- Clear error messages for invalid refs
- Only one workflow to maintain

Documentation:
- Completely rewrote scripts/README.md
- Added comprehensive usage examples
- Documented all error cases
- Added use case scenarios
- Simplified troubleshooting guide

fix(ci): improve bench workflow robustness and reliability

Address critical issues to make the workflow less flaky and more reliable.

Fixes:

1. Permissions
   - Added issues: write and pull-requests: write to check-permission job
   - Allows posting comments and reactions without permission errors

2. Concurrency control
   - Added concurrency group: bench-${{ issue.number }}
   - Prevents overlapping benchmark runs on the same PR
   - Uses cancel-in-progress: false to queue instead of cancel

3. Ref fetching
   - Added targeted git fetch before validation
   - Fetches refs, tags, and heads explicitly
   - Prevents "ref not found" errors for remote branches/tags

4. Shell robustness
   - Added set -euo pipefail to all bash scripts
   - Ensures early failure on any error
   - Prevents silent failures and undefined variables

5. Early parse feedback
   - Added continue-on-error to parse step
   - Posts immediate error message on invalid format
   - Shows usage examples with all parameters
   - Prevents workflow from proceeding with bad params
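
To make the effect of item 4 (`set -euo pipefail`) concrete, the sketch below runs a few one-line bash scripts from Python and compares their exit codes. It assumes `bash` is on the `PATH`; the helper name `bash_exit_code` and the variable `BENCH_EXAMPLE_VAR` are hypothetical and not part of the workflow.

```python
import subprocess


def bash_exit_code(script: str) -> int:
    """Run a script with bash and return its exit status."""
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode


STRICT = "set -euo pipefail\n"

# By default a pipeline reports only the status of its last command,
# so the failing `false` is masked by the succeeding `true`.
masked = bash_exit_code("false | true")

# With pipefail the pipeline fails, and -e aborts before the echo runs.
caught = bash_exit_code(STRICT + "false | true\necho 'not reached'")

# With -u, expanding an unset variable is an error instead of an empty string.
unset_var = bash_exit_code(STRICT + 'unset BENCH_EXAMPLE_VAR\necho "$BENCH_EXAMPLE_VAR"')

print(masked, caught != 0, unset_var != 0)
```

The first script exits with status 0 even though a command failed. The two strict-mode scripts exit non-zero, which is what lets a workflow step fail early rather than continue with bad state.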

Benefits:
- More reliable ref resolution
- Clear error messages for invalid input
- No concurrent runs causing conflicts
- Proper permissions for all operations
- Fail-fast behavior prevents wasted CI time

fix(ci): fix YAML syntax errors and improve workflow behavior

Critical fixes for YAML parsing and workflow reliability:

1. YAML Syntax Errors (CRITICAL)
   - Fixed template literals with markdown causing YAML parse errors
   - Converted all multi-line template literals to array.join() pattern
   - Lines starting with ** were interpreted as YAML alias markers
   - Affected: parse error message, acknowledgment, results, error handler

2. Graceful Error Handling
   - Removed process.exit(78) from "Post missing tool error" step
   - Non-zero exit triggers handle-error job unnecessarily
   - Existing if guards properly skip subsequent steps
   - Workflow now exits gracefully without false failures

3. Concurrency Behavior
   - Changed cancel-in-progress from false to true
   - Previous: a new /bench command waited for the running benchmark on the same PR to finish
   - Now: a new /bench command cancels the run already in progress on the same PR
   - Saves CI time and provides faster feedback

All changes maintain functionality while fixing syntax and improving UX.

feat(ci): add intelligent commit ordering and same-commit detection

Implement the intended benchmark logic to ensure correct comparisons:

1. Automatic commit ordering
   - Determines which ref is older (baseline) vs newer (current)
   - Uses git commit timestamps for ordering
   - Ensures baseline is always the older commit regardless of argument order
   - Example: /bench v0.13.0 main → baseline=v0.13.0, current=main (if main is newer)

2. Same-commit detection
   - Checks if both refs resolve to the same SHA
   - Exits early with helpful message if refs are identical
   - Prevents wasteful benchmark runs on identical code
   - Example: /bench main origin/main → detects same commit, posts message

3. Updated all steps to use determined ordering
   - Renamed steps: check_ref1_tool → check_baseline_tool, check_ref2_tool → check_current_tool
   - Updated benchmark steps to use baseline_ref and current_ref
   - Updated artifact names: benchmark_ref1.json → benchmark_baseline.json
   - Updated build logs: build_ref1.log → build_baseline.log
   - Results always show: Baseline (older) vs Current (newer)

4. Improved error messages
   - Missing tool errors now indicate "baseline (older)" vs "current (newer)"
   - Same-commit message explains refs are identical
   - Clear distinction between user input and determined ordering
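
The ordering and same-commit rules in items 1 and 2 can be sketched as a small pure function plus a git-backed resolver. This is an illustration under assumptions, not the workflow's code: `ResolvedRef`, `resolve_ref` and `order_refs` are hypothetical names, the committer timestamp (`%ct`) is assumed since the commit message only says "git commit timestamps", and keeping argument order on a timestamp tie is a choice made here.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ResolvedRef:
    name: str          # ref as typed by the user, e.g. "main" or "v0.13.0"
    sha: str           # full commit SHA the ref resolves to
    committed_at: int  # committer timestamp, seconds since the Unix epoch


def resolve_ref(name: str, repo_dir: str = ".") -> ResolvedRef:
    """Resolve a ref with git; raises CalledProcessError if it does not exist."""
    sha = subprocess.check_output(
        ["git", "rev-parse", "--verify", f"{name}^{{commit}}"],
        cwd=repo_dir, text=True,
    ).strip()
    timestamp = subprocess.check_output(
        ["git", "show", "-s", "--format=%ct", sha],
        cwd=repo_dir, text=True,
    ).strip()
    return ResolvedRef(name, sha, int(timestamp))


def order_refs(
    a: ResolvedRef, b: ResolvedRef
) -> Optional[Tuple[ResolvedRef, ResolvedRef]]:
    """Return (baseline, current), or None when both refs are the same commit."""
    if a.sha == b.sha:
        return None  # identical code: nothing to benchmark
    # Older commit becomes the baseline; on a tie, keep the argument order.
    return (a, b) if a.committed_at <= b.committed_at else (b, a)
```

With this shape, `order_refs(resolve_ref("main"), resolve_ref("v0.13.0"))` yields the same pair as the reversed call. A `None` result is the signal to post the same-commit message rather than start a benchmark.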

Benefits:
- Consistent baseline/current semantics regardless of argument order
- No wasted CI time on identical commits
- Clear, predictable results in every scenario
- Better user experience with informative messages
@lalvarezt lalvarezt self-assigned this Nov 6, 2025
@lalvarezt lalvarezt added the enhancement New feature or request label Nov 6, 2025
@lalvarezt lalvarezt merged commit 21f1bc1 into main Nov 6, 2025
4 checks passed
@lalvarezt lalvarezt deleted the claude/fix-cicd-inconsistencies-011CUrHwbNj9K2r2jwNMkdmh branch November 6, 2025 09:31
lalvarezt added a commit that referenced this pull request Nov 6, 2025
* fix(ci): resolve CI/CD inconsistencies and broken baseline storage

This commit addresses several critical issues in the CI/CD workflows:

1. Fix broken baseline artifact storage:
   - benchmark.yml now downloads baseline from update-baseline.yml
workflow
   - Previously tried to download from itself after baseline upload was
removed

2. Standardize Rust toolchain versions to stable:
   - CI workflow: Changed test and rustfmt jobs from nightly to stable
   - CD workflow: Changed deb publish job from nightly to stable
   - Ensures consistent builds and testing across all jobs

These changes restore benchmark comparisons and ensure consistent CI/CD
behavior.

* feat(ci): add /bench command for on-demand PR benchmark comparisons

Add new workflow that allows repository owners to trigger benchmark
comparisons between any two refs via PR comments.

Features:
- Command syntax: /bench <ref1> <ref2>
- Owner-only security: Only repo owner can trigger
- Works on PR comments only (not regular issues)
- Compares any two commits, branches, or tags
- Posts detailed comparison report to PR
- Interactive reactions (👀 → 🚀 or 😕)

Use cases:
- Compare feature branch vs stable release
- Validate optimizations between specific commits
- Test performance across version boundaries
- Ad-hoc performance debugging

Documentation:
- Updated scripts/README.md with complete workflow documentation
- Added security model and usage examples
- Documented all three benchmark workflows

* refactor(ci): simplify benchmark system to single /bench command

Removed automatic benchmarking workflows and consolidated to a single
on-demand command-based system with enhanced flexibility.

Changes:
- Removed benchmark.yml (automatic benchmarks on every PR/push)
- Removed update-baseline.yml (manual baseline updates)
- Enhanced bench-command.yml with optional parameters:
  * /bench <ref1> <ref2> [iterations] [sizes]
  * Defaults: iterations=100, sizes=1000,5000,10000
  * Examples: /bench main HEAD
             /bench v0.12.0 v0.13.0 100 1000,5000,10000

New features:
- Graceful handling when benchmark tool doesn't exist in refs
- Posts helpful error messages with commit info
- Validates all parameters before running
- Supports custom iterations and sizes per comparison
- Build logs included in artifacts for debugging

Benefits:
- No automatic runs = lower CI costs
- Owner controls when benchmarks run
- Flexible parameters per comparison
- Clear error messages for invalid refs
- Only one workflow to maintain

Documentation:
- Completely rewrote scripts/README.md
- Added comprehensive usage examples
- Documented all error cases
- Added use case scenarios
- Simplified troubleshooting guide

* fix(ci): improve bench workflow robustness and reliability

Address critical issues to make the workflow less flaky and more
reliable.

Fixes:

1. Permissions
   - Added issues: write and pull-requests: write to check-permission
job
   - Allows posting comments and reactions without permission errors

2. Concurrency control
   - Added concurrency group: bench-${{ issue.number }}
   - Prevents overlapping benchmark runs on the same PR
   - Uses cancel-in-progress: false to queue instead of cancel

3. Ref fetching
   - Added targeted git fetch before validation
   - Fetches refs, tags, and heads explicitly
   - Prevents "ref not found" errors for remote branches/tags

4. Shell robustness
   - Added set -euo pipefail to all bash scripts
   - Ensures early failure on any error
   - Prevents silent failures and undefined variables

5. Early parse feedback
   - Added continue-on-error to parse step
   - Posts immediate error message on invalid format
   - Shows usage examples with all parameters
   - Prevents workflow from proceeding with bad params

Benefits:
- More reliable ref resolution
- Clear error messages for invalid input
- No concurrent runs causing conflicts
- Proper permissions for all operations
- Fail-fast behavior prevents wasted CI time

* fix(ci): fix YAML syntax errors and improve workflow behavior

Critical fixes for YAML parsing and workflow reliability:

1. YAML Syntax Errors (CRITICAL)
   - Fixed template literals with markdown causing YAML parse errors
   - Converted all multi-line template literals to array.join() pattern
   - Lines starting with ** were interpreted as YAML alias markers
   - Affected: parse error message, acknowledgment, results, error
handler

2. Graceful Error Handling
   - Removed process.exit(78) from "Post missing tool error" step
   - Non-zero exit triggers handle-error job unnecessarily
   - Existing if guards properly skip subsequent steps
   - Workflow now exits gracefully without false failures

3. Concurrency Behavior
   - Changed cancel-in-progress from false to true
   - Previous: a new /bench command waited for the running benchmark on the same PR to finish
   - Now: a new /bench command cancels the run already in progress on the same PR
   - Saves CI time and provides faster feedback

All changes maintain functionality while fixing syntax and improving UX.

* feat(ci): add intelligent commit ordering and same-commit detection

Implement the intended benchmark logic to ensure correct comparisons:

1. Automatic commit ordering
   - Determines which ref is older (baseline) vs newer (current)
   - Uses git commit timestamps for ordering
   - Ensures baseline is always the older commit regardless of argument
order
   - Example: /bench v0.13.0 main → baseline=v0.13.0, current=main (if
main is newer)

2. Same-commit detection
   - Checks if both refs resolve to the same SHA
   - Exits early with helpful message if refs are identical
   - Prevents wasteful benchmark runs on identical code
   - Example: /bench main origin/main → detects same commit, posts
message

3. Updated all steps to use determined ordering
   - Renamed steps: check_ref1_tool → check_baseline_tool,
check_ref2_tool → check_current_tool
   - Updated benchmark steps to use baseline_ref and current_ref
   - Updated artifact names: benchmark_ref1.json →
benchmark_baseline.json
   - Updated build logs: build_ref1.log → build_baseline.log
   - Results always show: Baseline (older) vs Current (newer)

4. Improved error messages
   - Missing tool errors now indicate "baseline (older)" vs "current
(newer)"
   - Same-commit message explains refs are identical
   - Clear distinction between user input and determined ordering

Benefits:
- Consistent baseline/current semantics regardless of argument order
- No wasted CI time on identical commits
- Clear, predictable results in every scenario
- Better user experience with informative messages

---------

Co-authored-by: Claude <noreply@anthropic.com>
lalvarezt added a commit that referenced this pull request Nov 9, 2025