Conversation

@bieryAtFnal (Contributor) commented Dec 11, 2025

Description

This PR is correlated with DUNE-DAQ/integrationtest#137.

Recently, minimal_system_quick_test.py failed in the overnight regression test run: the number of samples for the generated_trigger_records metric was larger than expected.

This was caused by a longer-than-expected run time of the DAQ session (I'm not sure what caused the extra time).

This change, together with the correlated change in the integrationtest repo, addresses this possibility by increasing the upper bound on the expected number of metric samples when the test runs long.
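As a rough sketch of the idea (the function name, bounds, and 10-second reporting interval below are illustrative assumptions, not the actual daqsystemtest code): widen the allowed sample-count range when the measured session time exceeds the expectation.

```python
# Sketch only: names and constants here are assumptions for illustration;
# the actual integtest code in this PR may differ in detail.

def adjusted_sample_upper_bound(nominal_upper_bound, overall_session_time,
                                expected_session_time=40,
                                reporting_interval=10):
    """Widen the upper bound on the expected number of metric samples
    when the DAQ session ran noticeably longer than expected."""
    if overall_session_time is None:
        # No measurement available; keep the nominal bound.
        return nominal_upper_bound
    extra_time_taken = overall_session_time - expected_session_time
    if extra_time_taken > 10:
        # Allow roughly one extra metric sample per extra reporting interval.
        return nominal_upper_bound + int(extra_time_taken // reporting_interval) + 1
    return nominal_upper_bound

print(adjusted_sample_upper_bound(3, 40))  # nominal-length session -> 3
print(adjusted_sample_upper_bound(3, 75))  # ~35 s overrun -> widened bound
```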

Here are instructions for testing these changes along with the ones in the integrationtest repo:

DATE_PREFIX=`date '+%d%b'`
TIME_SUFFIX=`date '+%H%M'`

source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt latest
dbt-create -n NFD_DEV_251211_A9 ${DATE_PREFIX}FDDevTest_${TIME_SUFFIX}
cd ${DATE_PREFIX}FDDevTest_${TIME_SUFFIX}/sourcecode

git clone https://github.com/DUNE-DAQ/daqsystemtest.git -b kbiery/handle_extra_metric_samples
cd ..

dbt-workarea-env

git clone https://github.com/DUNE-DAQ/integrationtest.git -b kbiery/session_time_bundle_info
cd integrationtest; pip install -U . ; cd ..

dbt-build -j 12
dbt-workarea-env

daqsystemtest_integtest_bundle.sh -k minimal
echo ""
echo -e "\U1F535 \U2705 Note that the previous regression test succeeded. \U2705 \U1F535"
echo -e "\U1F535 \U2705 In particular, the metric sample check was successful. \U2705 \U1F535"
echo ""

sed -i 's,boot conf start,boot conf wait 35 start,' sourcecode/daqsystemtest/integtest/minimal_system_quick_test.py

daqsystemtest_integtest_bundle.sh -k minimal
echo ""
echo -e "\U1F535 \U2705 Note that the previous regression test succeeded, even though \U2705 \U1F535"
echo -e "\U1F535 \U2705 the number of metric samples fluctuated upward from 3 to 6. \U2705 \U1F535"

Type of change

  • Optimization (non-breaking change that improves code/performance)

Testing checklist

  • Minimal system quicktest passes (pytest -s minimal_system_quick_test.py)
  • Full set of integration tests pass (daqsystemtest_integtest_bundle.sh)

Further checks

  • Code is commented where needed, particularly in hard-to-understand areas

…upper bound on the expected number of metric samples when the test runs long.
expected_daq_session_time = 40  # this was determined by looking at the overall time from a normal run
if run_nanorc.daq_session_overall_time is not None:
    extra_time_taken = run_nanorc.daq_session_overall_time - expected_daq_session_time
    if extra_time_taken > 10:
A Member commented:
Should a warning message be printed here? Should we measure expected_daq_session_time across several different run_duration values, and see if it scales in an expected way? (And if so, should expected_daq_session_time be expressed as a function of run_duration?)

…quick_test.py to include the specified run_duration
@bieryAtFnal (Contributor, Author) commented Dec 11, 2025
I'd vote against printing out an error message when the DAQ session time is a little longer than expected. That's not really the point of this integtest. (Of course, maybe we should create a test to watch for that.)

I measured the overall DAQ session times for different run durations on daq.fnal.gov and np04-srv-011, and got the following results:

  • run duration=20s : overall DAQ session time ~40 seconds
  • run duration=30s : overall DAQ session time ~50 seconds
  • run duration=40s : overall DAQ session time ~60 seconds

So I'm fairly confident that run_duration plus 20 seconds is a good estimate of the expected overall DAQ session time.

Based on your feedback, I updated the expected DAQ session time calculation to use the run_duration (and fixed a bug in the metric value sum validation: it should also be based on the run duration).
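A sketch of the run_duration-based calculations described above. The formulas and the ±3 tolerance are assumptions inferred from the measurements in this thread and the review output below (a 160 s run reported a sum range of 157..163), not the actual integtest code.

```python
# Assumptions (labeled, not from the actual code): overall session time is
# roughly run_duration + 20 s, per the measurements quoted above, and the
# trigger rate is about one generated trigger record per second, so the
# metric-value sum should track run_duration within a small tolerance.

def expected_daq_session_time(run_duration):
    """Estimate the overall DAQ session time from the configured run duration."""
    return run_duration + 20

def expected_metric_sum_range(run_duration, tolerance=3):
    """Bounds on the summed metric values; the tolerance value is a guess."""
    return (run_duration - tolerance, run_duration + tolerance)

# The three measured (run duration, session time) points from the comment above:
for rd in (20, 30, 40):
    print(rd, expected_daq_session_time(rd))

print(expected_metric_sum_range(160))  # -> (157, 163)
```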

@eflumerf (Member) left a comment
Run duration-based scaling appears to be spot-on, tested with 160s run:

✅ The number of metric samples for key "minimal/df-01/df-01-trb/dfmodules.TRBInfo/generated_trigger_records" (17) is within the expected range (1..19).
✅ The sum of metric values for key "minimal/df-01/df-01-trb/dfmodules.TRBInfo/generated_trigger_records" (160) is within the expected range (157..163).

@bieryAtFnal bieryAtFnal merged commit 01239b1 into develop Dec 15, 2025
3 checks passed
@bieryAtFnal bieryAtFnal deleted the kbiery/handle_extra_metric_samples branch December 15, 2025 15:34