perf(tidy3d): FXC-3721 Speed up test suite #2991
base: develop
Conversation
14 files reviewed, 7 comments
Force-pushed from dd55b29 to 0d5292d
14 files reviewed, 1 comment
Force-pushed from 700ecb3 to 693676d
Diff Coverage (origin/develop...HEAD, staged and unstaged changes): no lines with coverage information in this diff.
I'm wondering a bit what benefit this provides over just running `pytest --durations > times.txt`?
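For context on the comment above: `pytest --durations=N` prints the N slowest setup/call/teardown phases, one per line. A script on top of it would mainly add aggregation. The sketch below is hypothetical (the PR's `scripts/profile_pytest.py` may work differently); `aggregate_durations` and the sample lines are invented for illustration, though the line shape follows pytest's standard `--durations` report:

```python
# Sketch: aggregate per-file totals from `pytest --durations` output.
# The helper name and sample data are hypothetical, not from the PR.
import re
from collections import defaultdict

# matches lines like "12.31s call     tests/test_autograd.py::test_grad[box]"
DURATION_LINE = re.compile(r"^\s*([\d.]+)s\s+(call|setup|teardown)\s+(\S+?)::(\S+)")

def aggregate_durations(report_text: str) -> dict[str, float]:
    """Sum the reported phase durations per test file."""
    totals: dict[str, float] = defaultdict(float)
    for line in report_text.splitlines():
        m = DURATION_LINE.match(line)
        if m:
            seconds, _phase, path, _test = m.groups()
            totals[path] += float(seconds)
    return dict(totals)

sample = """\
12.31s call     tests/test_autograd.py::test_grad[box]
 8.02s call     tests/test_autograd.py::test_grad[sphere]
 0.50s setup    tests/test_web.py::test_upload
"""

for path, total in sorted(aggregate_durations(sample).items()):
    print(f"{total:8.2f}s  {path}")
```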
```diff
 N_tests = 2  # may be increased temporarily, makes it slow for routine tests
+max_structures = np.log10(2)  # may be increased temporarily, makes it slow for routine tests

 # adjust as needed, keeping small to speed tests up
-num_structures = np.logspace(0, 2, N_tests).astype(int)
+num_structures = np.logspace(0, max_structures, N_tests).astype(int)
```
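For context on what this diff changes: `np.logspace(start, stop, num)` returns `num` points spaced logarithmically between `10**start` and `10**stop`, so lowering the upper exponent from `2` to `np.log10(2)` shrinks the largest structure count from 100 to about 2:

```python
import numpy as np

N_tests = 2  # number of sampled structure counts, as in the test

# old upper exponent 2 -> counts up to 10**2 = 100
old = np.logspace(0, 2, N_tests)

# new upper exponent log10(2) -> counts up to 10**log10(2) = 2
max_structures = np.log10(2)
new = np.logspace(0, max_structures, N_tests)

print(old.tolist())  # [1.0, 100.0]
print(new.tolist())
# the test then calls .astype(int), which truncates toward zero,
# turning these into small integer structure counts (roughly [1, 2] here)
```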
I think this essentially turns the test in this file into a smoke test. I believe its purpose was to catch regressions where validation time scales poorly (via serialization) as the structure count grows; now we basically just assert that both cases work. But to be fair, the old test didn't actually assert anything about regressions either, so I'm not really sure what makes sense here. Maybe @momchil-flex has an opinion?
For now I made a `perf` marker and deselected it for regular tests.
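One common way to wire such a marker so routine runs skip it (a sketch of standard pytest configuration, not the PR's actual files; only the marker name `perf` is taken from the comment above):

```ini
# pytest.ini (sketch)
[pytest]
markers =
    perf: performance-regression tests, excluded from routine runs
addopts = -m "not perf"
```

A dedicated perf run can then select these tests explicitly with `pytest -m perf`, since a `-m` given on the command line overrides the one injected by `addopts`.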
Force-pushed from 5bbca11 to 3f326b4
As our test suite has quite a long runtime, this PR identifies the most time-intensive modules/functions/runs and speeds up the most critical ones.
- Added `profile_pytest.py` in `scripts` to analyze the most time-consuming modules/functions/runs (see report below)
- Added `pytest.mark.slow` for the most time-consuming tests. May be used later for skipping in faster test modes.
- Identified `test_autograd.py` as the most critical module (10 min raw runtime!) and cut it down by roughly half. Made sure these tests run at the beginning, so that we avoid non-fully-parallelized work on them at the end.

I'm open to more input on how to speed up the tests.
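The `pytest.mark.slow` tagging and the run-first scheduling mentioned above combine roughly like this (a hypothetical test name; `pytest.mark.order` is provided by the `pytest-order` plugin, which the scheduling presumably relies on):

```python
import pytest

@pytest.mark.slow       # tag for potential deselection in fast test modes
@pytest.mark.order(0)   # pytest-order plugin: schedule this test first
def test_heavy_autograd_case():
    # placeholder body; the real tests exercise the autograd pipeline
    assert True
```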
Analysis before PR:
Analysis after PR:
Greptile Overview
Greptile Summary
This PR achieves a ~30% reduction in test suite runtime (195s → 137s) through strategic optimizations targeting the most time-intensive tests, particularly in the autograd module which was reduced from 636s to 331s.
Key Changes:
- Consolidated `test_autograd_async` from 32 independent test runs to 2 combined batch tests
- Added `pytest.mark.slow` markers to 15+ expensive tests for potential future filtering
- Added `pytest.mark.order(0)` to run autograd tests first, ensuring they execute in parallel rather than serially at the end
- Created `scripts/profile_pytest.py` utility for ongoing performance analysis

Trade-offs:
The performance gains come with reduced test coverage isolation. Tests that previously ran 16+ structure/monitor combinations independently now run all combinations in a single batch, making individual failures harder to isolate and debug. The `test_multi_frequency_equivalence` test now only validates the first structure type instead of all 12.

Confidence Score: 3/5
`tests/test_components/autograd/test_autograd.py`: the test consolidation significantly changes coverage patterns and may make debugging autograd issues harder in production

Important Files Changed
File Analysis
`TEST_MAX_NUM_MEDIUMS` constant reduced from its production value to 3 for faster tests

Sequence Diagram
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Script as profile_pytest.py
    participant Pytest as pytest
    participant Tests as Test Suite
    participant CI as CI Pipeline
    Dev->>Script: Run profiling analysis
    Script->>Pytest: Execute with --durations
    Pytest->>Tests: Run full suite
    Tests-->>Pytest: Execution times
    Pytest-->>Script: Duration data
    Script->>Script: Aggregate by file/test
    Script-->>Dev: Performance report
    Note over Dev: Identifies slow tests<br/>(autograd: 636s)
    Dev->>Tests: Add pytest.mark.slow
    Dev->>Tests: Consolidate parameterized tests
    Dev->>Tests: Reduce test parameters
    Dev->>Tests: Add pytest.mark.order(0)
    CI->>Pytest: Run test suite
    Pytest->>Tests: Execute in order
    Note over Tests: Autograd tests run first<br/>(parallel execution)
    Tests->>Tests: Batch async tests together
    Tests-->>Pytest: Combined results
    Pytest-->>CI: Suite complete (137s vs 195s)
    Note over CI,Tests: 30% faster execution<br/>but reduced test isolation
```