
Conversation

@Malmahrouqi3 (Collaborator) commented Nov 10, 2025

User description

Description

Enhanced framework to continuously benchmark MFC.
More details to be added...

Initial PR: #936


PR Type

Enhancement


Description

  • Add continuous benchmarking workflow triggered on PR reviews

  • Enhance bench_diff command with historical data comparison (a minimal sketch of the comparison follows this list)

  • Support loading benchmark data from JSON file for trend analysis

  • Improve argument parsing for bench_diff with file and test name options

  • Preserve data.js during documentation updates


Diagram Walkthrough

flowchart LR
  A["PR Review Trigger"] --> B["cont-bench.yml Workflow"]
  B --> C["Run Benchmarks"]
  C --> D["Load Historical Data"]
  D --> E["bench_diff Analysis"]
  E --> F["Compare vs Historical Avg"]
  F --> G["Generate Report"]
  H["data.js"] -.->|Historical Data| D

File Walkthrough

Relevant files
Enhancement
args.py
Add bench_diff arguments for data file and test name         

toolchain/mfc/args.py

  • Add -f/--file argument to bench_diff for specifying data file path
  • Add -n/--name argument to bench_diff for test name specification
  • Move name slugification logic from common parsing to run command only
  • Add special handling for bench_diff command to join multi-word test
    names
+7/-3     
bench.py
Integrate historical data loading and comparison                 

toolchain/mfc/bench.py

  • Import json module for loading benchmark data
  • Load historical benchmark data from JSON file if provided
  • Parse benchmark entries and compute average grind times per case
  • Add CB (Continuous Benchmarking) column to diff output table
  • Calculate offset percentage between current and historical average
  • Color-code performance comparisons (red/yellow/green)
+39/-2   
Configuration changes
bench.yml
Enable continuous benchmarking in bench workflow                 

.github/workflows/bench.yml

  • Clone documentation repository to access data.js
  • Copy data.js to PR directory for bench_diff analysis
  • Pass data file and test name parameters to bench_diff command
  • Enable historical benchmark comparison in existing workflow
+3/-1     
cont-bench.yml
Create continuous benchmarking workflow                                   

.github/workflows/cont-bench.yml

  • New workflow file for continuous benchmarking on PR reviews
  • Triggered on pull_request_review submission or manual dispatch
  • Matrix strategy for multiple clusters and device/interface
    combinations
  • Parallel benchmark execution for master and PR branches
  • Load and compare against historical data from documentation repo
  • Post results using github-action-benchmark with auto-push
  • Polish and update data.js with standardized formatting
+168/-0 
Bug fix
docs.yml
Preserve benchmark data during doc updates                             

.github/workflows/docs.yml

  • Add restore step to preserve documentation/data.js during doc updates
  • Prevent accidental deletion of benchmark historical data
+1/-0     
Documentation
bench.yaml
Add sample benchmark results file                                               

bench.yaml

  • Sample benchmark results file with multiple test cases
  • Contains output summaries with execution times and grind metrics
  • Includes metadata about benchmark invocation and system configuration
+285/-0 
data.js
Add historical benchmark data file                                             

data.js

  • Historical benchmark data in JSON format with multiple test entries
  • Contains commit information, timestamps, and performance metrics
  • Organized by test name with multiple benchmark runs per test
  • Includes grind values and execution statistics for trend analysis
+790/-0 

Copilot AI review requested due to automatic review settings November 10, 2025 19:45
@Malmahrouqi3 requested review from a team and sbryngelson as code owners November 10, 2025 19:45
@qodo-merge-pro (Contributor)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The workflows clone and push using secrets.DOC_PUSH_URL. While masked, echoing or errors around git commands could leak URLs if not handled carefully. Consider using a scoped GITHUB_TOKEN with repository permissions and actions/checkout/actions-js/push patterns instead of embedding credentials in URLs.

⚡ Recommended focus areas for review

Possible Issue

Inconsistent handling of args["name"] across commands: sometimes expected as string (slugified) and sometimes as list (joined). This dual-type behavior can lead to downstream errors if other paths assume a single type.

    if isinstance(args["name"], str):
        # "Slugify" the name of the job
        args["name"] = re.sub(r'[\W_]+', '-', args["name"]).strip('-')
elif args["command"] == "bench_diff" and len(args["name"]) > 0:
    args["name"] = " ".join(args["name"])
Logic Bug

When building case_times, the condition resets the list if a later run has grind_value missing; this can drop prior values and skew averages. Also, offset_pct color logic flags positive deviation as red and zero as yellow, which may be inverted from expected semantics.

                if (case_name is None or case_name not in case_times) or grind_value is None:
                    case_times[case_name] = []
                case_times[case_name].append(grind_value)
        for case_name, values in case_times.items():
            avg = sum(values) / len(values)
            cb_stats[case_name] = {"avg": avg, "count": len(values)}
        cons.print(f"[bold]Loaded cb data for test: [bold]{cb_test}[/bold] ({len(cb_stats)} cases)[/bold]")
    else:
        cons.print(f"[bold yellow]Warning[/bold yellow]: Test '[bold]{cb_test}[/bold]' not found in data file.")
except Exception as e:
    cons.print(f"[bold yellow]Warning[/bold yellow]: Could not load data file: {e}")

table = rich.table.Table(show_header=True, box=rich.table.box.SIMPLE)
table.add_column("[bold]Case[/bold]",    justify="left")
table.add_column("[bold]Pre Process[/bold]", justify="right")
table.add_column("[bold]Simulation[/bold]", justify="right")
table.add_column("[bold]Post Process[/bold]", justify="right")
if cb_stats:
    table.add_column("[bold] CB (Grind)[/bold]", justify="right")

err = 0
for slug in slugs:
    lhs_summary, rhs_summary = lhs["cases"][slug]["output_summary"], rhs["cases"][slug]["output_summary"]
    speedups = ['N/A', 'N/A', 'N/A']
    grind_comparison = 'N/A'

    for i, target in enumerate(sorted(DEFAULT_TARGETS, key=lambda t: t.runOrder)):
        if (target.name not in lhs_summary) or (target.name not in rhs_summary):
            cons.print(f"{target.name} not present in lhs_summary or rhs_summary - Case: {slug}")
            err = 1; continue

        if not math.isfinite(lhs_summary[target.name]["exec"]) or not math.isfinite(rhs_summary[target.name]["exec"]):
            err = 1
            cons.print(f"lhs_summary or rhs_summary reports non-real exec time for {target.name} - Case: {slug}")
        try:
            exec_time_value = lhs_summary[target.name]["exec"] / rhs_summary[target.name]["exec"]
            if exec_time_value < 0.9:
                cons.print(f"[bold yellow]Warning[/bold yellow]: Exec time speedup for {target.name} is less than 0.9 - Case: {slug}")
            speedups[i] = f"Exec: {exec_time_value:.2f}"
            if target == SIMULATION:
                if not math.isfinite(lhs_summary[target.name]["grind"]) or not math.isfinite(rhs_summary[target.name]["grind"]):
                    err = 1
                    cons.print(f"lhs_summary or rhs_summary reports non-real grind time for {target.name} - Case: {slug}")

                grind_time_value = lhs_summary[target.name]["grind"] / rhs_summary[target.name]["grind"]
                speedups[i] += f" & Grind: {grind_time_value:.2f}"
                if grind_time_value <0.95:
                    cons.print(f"[bold red]Error[/bold red]: Benchmarking failed since grind time speedup for {target.name} below acceptable threshold (<0.95) - Case: {slug}")
                    err = 1
                if slug in cb_stats:
                    rhs_grind = rhs_summary[target.name]["grind"]
                    avg = cb_stats[slug]["avg"]
                    offset_pct = (rhs_grind - avg) / avg * 100
                    color = "red" if (offset_pct) > 0 else "yellow" if (offset_pct) == 0 else "green"
                    grind_comparison = f"[{color}]{offset_pct:+.2f}%[/{color}] (avg: {avg:.2f})"
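
On the color semantics, one possible alternative, sketched here under assumed thresholds and not part of this PR, is a tolerance band around the historical average so that noise-level deviations are not flagged red:

def offset_color(offset_pct, warn=2.0, fail=5.0):
    """Map a grind-time offset (%) to a Rich color; the 2%/5% thresholds are illustrative."""
    if offset_pct <= warn:   # faster than, or within noise of, the historical average
        return "green"
    if offset_pct <= fail:   # mildly slower: flag for attention
        return "yellow"
    return "red"             # clearly slower than the historical average

# Example: 10.5 s against a 10.0 s historical average is +5.00%, i.e. "yellow" here
print(offset_color((10.5 - 10.0) / 10.0 * 100))
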
Fragile Workflow

Using ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION and cloning/pushing via a secret URL may expose credentials if logs leak. Also, multiple matrix entries reuse identical names, which can create duplicate benchmark series in data.js.

  ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16
  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
steps:
  - name: Clone - PR
    uses: actions/checkout@v4
    with:
      path: pr

  - name: Clone - Master
    uses: actions/checkout@v4
    with:
      repository: MFlowCode/MFC
      ref: master
      path: master

  - name: Setup & Build
    if: matrix.build_script != ''
    run: | 
      (cd pr     && ${{ matrix.build_script }}) &
      (cd master && ${{ matrix.build_script }}) &
      wait %1 && wait %2

  - name: Bench (Master v. PR)
    run: |
      (cd pr     && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      (cd master && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      wait %1 && wait %2

  - name: Generate & Post Comment
    run: |
      git clone "${{ secrets.DOC_PUSH_URL }}" ../www
      cp ../www/documentation/data.js ./pr/data.js
      (cd pr && . ./mfc.sh load -c ${{ matrix.flag }} -m g)
      (cd pr && ./mfc.sh bench_diff ../master/bench-${{ matrix.device }}-${{ matrix.interface }}.yaml ../pr/bench-${{ matrix.device }}-${{ matrix.interface }}.yaml -f data.js -n "${{ matrix.name }} (${{ matrix.device }}/${{ matrix.interface }})")

  - name: Prep Benchmark Results
    run: |
      # should fill in the template in pr/cont-bench-template.json

  - name: Post Benchmark Results
    uses: benchmark-action/github-action-benchmark@v1
    with:
      name: "${{ matrix.name }} (${{ matrix.device }}/${{ matrix.interface }})"
      tool: 'googlecpp'
      output-file-path: pr/report.json
      github-token: ${{ secrets.TOKEN }}
      auto-push: true
      alert-threshold: '200%'
      comment-on-alert: true
      fail-on-alert: true
      gh-repository: 'github.com/MFlowCode/MFlowCode.github.io'
      gh-pages-branch: 'main'
      benchmark-data-dir-path: 'documentation'

  - name: Polishing Results
    run:  |
      TAG="${{ github.event.inputs.tag || github.ref_name }}"
      echo "tag=$TAG" >> $GITHUB_OUTPUT
      echo "TAG=$TAG" >> $GITHUB_ENV

      set +e
      git ls-remote "${{ secrets.DOC_PUSH_URL }}" -q
      if [ "$?" -ne "0" ]; then exit 0; fi
      set -e
      git config --global user.name  'MFC Action'
      git config --global user.email '<>'
      git clone "${{ secrets.DOC_PUSH_URL }}" ../www
      sed -i 's/"unit": "s\/iter"/"unit": "s"/g; s/cpu: /runtime: /g; s/iterations: /time steps: /g' ../www/documentation/data.js
      sed -i "s/${GITHUB_SHA::7}/$TAG/g" ../www/documentation/data.js

      git -C ../www add -A
      git -C ../www commit -m "Docs @ ${GITHUB_SHA::7}" || true
      git -C ../www push
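
On the duplicate-series concern, a small helper along these lines (series_overview is hypothetical, not part of the toolchain) could report how many runs each series in data.js has accumulated, making it easier to spot two matrix entries feeding the same series name:

import json
from collections import Counter

def series_overview(data_js_path):
    """Count stored runs per benchmark series in a github-action-benchmark data.js file."""
    with open(data_js_path, "r") as f:
        text = f.read()
    # data.js is JavaScript of the form `window.BENCHMARK_DATA = {...}`; strip the
    # assignment so the remainder parses as JSON (simplistic on purpose).
    payload = text.split("=", 1)[1].strip().rstrip(";")
    data = json.loads(payload)
    return Counter({name: len(runs) for name, runs in data.get("entries", {}).items()})

# Hypothetical usage:
# series_overview("documentation/data.js")  # e.g. Counter({'Example Cluster (gpu/acc)': 12, ...})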

High-level Suggestion

The current benchmark data update process in cont-bench.yml is complex and fragile, using both a GitHub Action and manual sed commands. This should be consolidated into a single, robust mechanism, ideally by extending the Python toolchain to handle data transformations before pushing. [High-level, importance: 9]

Solution Walkthrough:

Before:

# in .github/workflows/cont-bench.yml
# ...
  - name: Post Benchmark Results
    uses: benchmark-action/github-action-benchmark@v1
    with:
      # ... (configuration)
      auto-push: true
      gh-repository: 'github.com/MFlowCode/MFlowCode.github.io'
      benchmark-data-dir-path: 'documentation'

  - name: Polishing Results
    run:  |
      # ...
      git clone "${{ secrets.DOC_PUSH_URL }}" ../www
      sed -i 's/"unit": "s\/iter"/"unit": "s"/g; s/cpu: /runtime: /g; ...' ../www/documentation/data.js
      sed -i "s/${GITHUB_SHA::7}/$TAG/g" ../www/documentation/data.js
      git -C ../www add -A
      git -C ../www commit -m "..." || true
      git -C ../www push

After:

# in .github/workflows/cont-bench.yml
# ...
  - name: Post and Polish Benchmark Results
    uses: benchmark-action/github-action-benchmark@v1
    with:
      # ... (configuration)
      auto-push: true
      gh-repository: 'github.com/MFlowCode/MFlowCode.github.io'
      benchmark-data-dir-path: 'documentation'
      # Use a script to post-process the benchmark output before it's committed
      # This script would be part of the MFC toolchain
      post-process-script: './mfc.sh bench_polish_results'

# in toolchain/mfc/bench.py
def polish_results():
    # Load data.js
    # Perform transformations in Python (robustly)
    # e.g., replace units, keys, and tags
    # Save modified data.js
    # The action would then commit and push this polished file.
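
If the transformations moved into the toolchain as suggested, the core of such a step could look like the sketch below, which simply mirrors the workflow's sed substitutions in Python; polish_data_js, its parameters, and the example values are hypothetical, and how the action would invoke and push the result is left open.

from pathlib import Path

def polish_data_js(path, tag, short_sha):
    """Apply the same substitutions as the workflow's sed commands, but in Python."""
    data_js = Path(path)
    text = data_js.read_text()

    # Mirror: s/"unit": "s\/iter"/"unit": "s"/g; s/cpu: /runtime: /g; s/iterations: /time steps: /g
    text = text.replace('"unit": "s/iter"', '"unit": "s"')
    text = text.replace("cpu: ", "runtime: ")
    text = text.replace("iterations: ", "time steps: ")

    # Mirror: s/${GITHUB_SHA::7}/$TAG/g
    text = text.replace(short_sha, tag)

    data_js.write_text(text)

# Hypothetical usage, matching the clone path used by the workflow:
# polish_data_js("../www/documentation/data.js", tag="v4.9.9", short_sha="83fd1b2")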

Comment on lines +132 to +150
  - name: Polishing Results
    run:  |
      TAG="${{ github.event.inputs.tag || github.ref_name }}"
      echo "tag=$TAG" >> $GITHUB_OUTPUT
      echo "TAG=$TAG" >> $GITHUB_ENV

      set +e
      git ls-remote "${{ secrets.DOC_PUSH_URL }}" -q
      if [ "$?" -ne "0" ]; then exit 0; fi
      set -e
      git config --global user.name  'MFC Action'
      git config --global user.email '<>'
      git clone "${{ secrets.DOC_PUSH_URL }}" ../www
      sed -i 's/"unit": "s\/iter"/"unit": "s"/g; s/cpu: /runtime: /g; s/iterations: /time steps: /g' ../www/documentation/data.js
      sed -i "s/${GITHUB_SHA::7}/$TAG/g" ../www/documentation/data.js

      git -C ../www add -A
      git -C ../www commit -m "Docs @ ${GITHUB_SHA::7}" || true
      git -C ../www push

Suggestion: To fix a race condition in the Polishing Results step, implement a retry loop for the git push command. Before pushing, pull the latest changes with rebase to handle concurrent updates from other jobs. [possible issue, importance: 8]

Suggested change
  - name: Polishing Results
    run:  |
      TAG="${{ github.event.inputs.tag || github.ref_name }}"
      echo "tag=$TAG" >> $GITHUB_OUTPUT
      echo "TAG=$TAG" >> $GITHUB_ENV

      set +e
      git ls-remote "${{ secrets.DOC_PUSH_URL }}" -q
      if [ "$?" -ne "0" ]; then exit 0; fi
      set -e
      git config --global user.name  'MFC Action'
      git config --global user.email '<>'
      git clone "${{ secrets.DOC_PUSH_URL }}" ../www
      sed -i 's/"unit": "s\/iter"/"unit": "s"/g; s/cpu: /runtime: /g; s/iterations: /time steps: /g' ../www/documentation/data.js
      sed -i "s/${GITHUB_SHA::7}/$TAG/g" ../www/documentation/data.js

      git -C ../www add -A
      git -C ../www commit -m "Docs @ ${GITHUB_SHA::7}" || true
      for i in {1..5}; do
        git -C ../www pull --rebase && git -C ../www push && break
        sleep 5
      done

Comment on lines +110 to +131
try:
    with open(ARG("file"), 'r') as f:
        data_json = json.load(f)
    cb_test = ARG("name")
    if "entries" in data_json and cb_test in data_json["entries"]:
        benchmark_runs = data_json["entries"][cb_test]
        case_times = {}
        for run in benchmark_runs:
            for benches in run["benches"]:
                case_name = benches.get("name")
                grind_value = benches.get("value")
                if (case_name is None or case_name not in case_times) or grind_value is None:
                    case_times[case_name] = []
                case_times[case_name].append(grind_value)
        for case_name, values in case_times.items():
            avg = sum(values) / len(values)
            cb_stats[case_name] = {"avg": avg, "count": len(values)}
        cons.print(f"[bold]Loaded cb data for test: [bold]{cb_test}[/bold] ({len(cb_stats)} cases)[/bold]")
    else:
        cons.print(f"[bold yellow]Warning[/bold yellow]: Test '[bold]{cb_test}[/bold]' not found in data file.")
except Exception as e:
    cons.print(f"[bold yellow]Warning[/bold yellow]: Could not load data file: {e}")

Suggestion: Fix the benchmark averaging logic by initializing the case_times[case_name] list only when a case_name is first encountered. This prevents overwriting historical data and ensures the average is calculated correctly across all runs. [possible issue, importance: 9]

Suggested change
try:
    with open(ARG("file"), 'r') as f:
        data_json = json.load(f)
    cb_test = ARG("name")
    if "entries" in data_json and cb_test in data_json["entries"]:
        benchmark_runs = data_json["entries"][cb_test]
        case_times = {}
        for run in benchmark_runs:
            for benches in run["benches"]:
                case_name = benches.get("name")
                grind_value = benches.get("value")
                if case_name is None or grind_value is None:
                    continue
                if case_name not in case_times:
                    case_times[case_name] = []
                case_times[case_name].append(grind_value)
        for case_name, values in case_times.items():
            if not values: continue
            avg = sum(values) / len(values)
            cb_stats[case_name] = {"avg": avg, "count": len(values)}
        cons.print(f"[bold]Loaded cb data for test: [bold]{cb_test}[/bold] ({len(cb_stats)} cases)[/bold]")
    else:
        cons.print(f"[bold yellow]Warning[/bold yellow]: Test '[bold]{cb_test}[/bold]' not found in data file.")
except Exception as e:
    cons.print(f"[bold yellow]Warning[/bold yellow]: Could not load data file: {e}")

Copilot AI left a comment

Pull Request Overview

This PR implements continuous benchmarking infrastructure for MFC, allowing comparison of benchmark results against historical data. The system integrates with GitHub Actions to track performance over time and detect regressions.

Key changes:

  • Added functionality to load and compare benchmark results against historical continuous benchmarking data stored in data.js
  • Enhanced the bench_diff command to accept optional file and test name parameters for historical comparison
  • Created a new continuous benchmarking workflow that runs benchmarks and stores results

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

  • toolchain/mfc/bench.py: Added logic to load historical benchmark data from JSON and display comparisons in the diff output
  • toolchain/mfc/args.py: Added --file and --name arguments to bench_diff command; adjusted name slugification logic
  • data.js: New JSON file containing historical benchmark data for CPU and GPU tests
  • bench.yaml: Example benchmark results file in YAML format
  • .github/workflows/docs.yml: Added restoration of data.js file to preserve benchmark history
  • .github/workflows/cont-bench.yml: New workflow for continuous benchmarking automation
  • .github/workflows/bench.yml: Updated to fetch and use historical data.js for comparisons

Comment on lines +110 to +130
try:
    with open(ARG("file"), 'r') as f:
        data_json = json.load(f)
    cb_test = ARG("name")
    if "entries" in data_json and cb_test in data_json["entries"]:
        benchmark_runs = data_json["entries"][cb_test]
        case_times = {}
        for run in benchmark_runs:
            for benches in run["benches"]:
                case_name = benches.get("name")
                grind_value = benches.get("value")
                if (case_name is None or case_name not in case_times) or grind_value is None:
                    case_times[case_name] = []
                case_times[case_name].append(grind_value)
        for case_name, values in case_times.items():
            avg = sum(values) / len(values)
            cb_stats[case_name] = {"avg": avg, "count": len(values)}
        cons.print(f"[bold]Loaded cb data for test: [bold]{cb_test}[/bold] ({len(cb_stats)} cases)[/bold]")
    else:
        cons.print(f"[bold yellow]Warning[/bold yellow]: Test '[bold]{cb_test}[/bold]' not found in data file.")
except Exception as e:
Copilot AI Nov 10, 2025

When ARG('file') returns None (when --file is not provided), this will raise a TypeError: 'expected str, bytes or os.PathLike object, not NoneType'. The try-except catches this but silently continues. Consider checking if ARG('file') is None before attempting to open the file, or use ARG('file', None) and explicitly check for None to make the intent clearer.

Suggested change
cb_file = ARG("file")
if cb_file is not None:
    try:
        with open(cb_file, 'r') as f:
            data_json = json.load(f)
        cb_test = ARG("name")
        if "entries" in data_json and cb_test in data_json["entries"]:
            benchmark_runs = data_json["entries"][cb_test]
            case_times = {}
            for run in benchmark_runs:
                for benches in run["benches"]:
                    case_name = benches.get("name")
                    grind_value = benches.get("value")
                    if (case_name is None or case_name not in case_times) or grind_value is None:
                        case_times[case_name] = []
                    case_times[case_name].append(grind_value)
            for case_name, values in case_times.items():
                avg = sum(values) / len(values)
                cb_stats[case_name] = {"avg": avg, "count": len(values)}
            cons.print(f"[bold]Loaded cb data for test: [bold]{cb_test}[/bold] ({len(cb_stats)} cases)[/bold]")
        else:
            cons.print(f"[bold yellow]Warning[/bold yellow]: Test '[bold]{cb_test}[/bold]' not found in data file.")
    except Exception as e:
        cons.print(f"[bold yellow]Warning[/bold yellow]: Could not load cb data file: {e}")
else:
    cons.print(f"[bold yellow]Warning[/bold yellow]: No --file argument provided; skipping cb data load.")

Comment on lines +121 to +123
if (case_name is None or case_name not in case_times) or grind_value is None:
    case_times[case_name] = []
case_times[case_name].append(grind_value)
Copilot AI Nov 10, 2025

The logic is inverted. When case_name is None or grind_value is None, you're initializing an empty list. However, you should skip adding the value or handle the None case properly. If case_name is None, using it as a dictionary key will cause issues. The condition should be 'if case_name is not None and grind_value is not None' to append values, or explicitly handle None cases.

Suggested change
if case_name is not None and grind_value is not None:
    if case_name not in case_times:
        case_times[case_name] = []
    case_times[case_name].append(grind_value)

Comment on lines +173 to +174
offset_pct = (rhs_grind - avg) / avg * 100
color = "red" if (offset_pct) > 0 else "yellow" if (offset_pct) == 0 else "green"
Copilot AI Nov 10, 2025

The parentheses around 'offset_pct' are unnecessary and the comparison semantics are unclear. Red for positive offset means slower performance (higher grind time), but this should be documented. Consider adding a comment explaining that positive offset means degradation, or use more descriptive variable names like 'performance_degradation_pct'.

Suggested change
# Positive offset_pct means grind time is higher than average (performance degradation)
offset_pct = (rhs_grind - avg) / avg * 100
color = "red" if offset_pct > 0 else "yellow" if offset_pct == 0 else "green"

Comment on lines +98 to +104
      wait %1 && wait %2

  - name: Bench (Master v. PR)
    run: |
      (cd pr     && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      (cd master && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      wait %1 && wait %2
Copilot AI Nov 10, 2025

The 'wait %1 && wait %2' syntax is for Windows batch/PowerShell, not bash. In bash, you should use 'wait' without arguments to wait for all background jobs, or use process IDs. This will cause the workflow to fail when executed in a bash shell.

Suggested change
      wait

  - name: Bench (Master v. PR)
    run: |
      (cd pr     && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      (cd master && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      wait

    run: |
      (cd pr     && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      (cd master && bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh .github/workflows/${{ matrix.cluster }}/bench.sh ${{ matrix.device }} ${{ matrix.interface }}) &
      wait %1 && wait %2
Copilot AI Nov 10, 2025

Same issue as above - 'wait %1 && wait %2' is Windows syntax, not bash. Use 'wait' alone or capture PIDs and wait for them explicitly.

Suggested change
      wait

Comment on lines +113 to +116
  - name: Prep Benchmark Results
    run: |
      # should fill in the template in pr/cont-bench-template.json
Copilot AI Nov 10, 2025

This step contains only a TODO comment and performs no actual work. Either implement the functionality or remove the step if it's not needed yet. Leaving empty placeholder steps in production workflows can be confusing.

Suggested change
  - name: Prep Benchmark Results
    run: |
      # should fill in the template in pr/cont-bench-template.json

Comment on lines +186 to +187
elif args["command"] == "bench_diff" and len(args["name"]) > 0:
args["name"] = " ".join(args["name"])
Copilot AI Nov 10, 2025

If args['name'] is not a list (e.g., if it has a default value of empty list or None), this will fail. Additionally, accessing args['name'] without checking if the key exists could raise KeyError. Consider using args.get('name', []) and verifying it's a list before calling len() and join().

Suggested change
elif args["command"] == "bench_diff" and len(args["name"]) > 0:
args["name"] = " ".join(args["name"])
elif args["command"] == "bench_diff":
name_val = args.get("name", [])
if isinstance(name_val, list) and len(name_val) > 0:
args["name"] = " ".join(name_val)

@codecov

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.02%. Comparing base (bfd732c) to head (83fd1b2).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1033   +/-   ##
=======================================
  Coverage   46.02%   46.02%           
=======================================
  Files          67       67           
  Lines       13437    13437           
  Branches     1550     1550           
=======================================
  Hits         6185     6185           
  Misses       6362     6362           
  Partials      890      890           

☔ View full report in Codecov by Sentry.
