
Improve testing strategy for infrastructure repositories (CMEPS, CDEPS, share, CIME, etc.) #356

@billsacks


This arose out of a discussion with @mvertens about what testing we should do for ESCOMP/CMEPS#595. I have also struggled in the past to decide what testing to do for CMEPS and CDEPS changes. My focus here is on a selection of full-system tests with baseline comparisons, since those give me the most confidence that a change isn't introducing bugs.

Some existing testing methods:

  • @jedwards4b has put in place some great GitHub Actions-based tests, but I believe (please correct me if I'm wrong!) these don't yet do baseline comparisons, and may not even run the code at all?
  • we've had aux_cmeps and aux_cdeps test lists, but my impression (again, please correct me if I'm wrong!) is that they have fallen into disuse and are no longer regularly used or maintained
  • prealpha and prebeta test lists are comprehensive but usually overkill for a single set of relatively small changes
  • aux_cime_baselines is probably the closest to what I'm thinking of, but that test list looks very unbalanced: it has many I compset tests but not a single C or G compset test, etc. If we are relying on it, it should be revamped (and renamed).

I propose that we do one of the following, but I am also open to other ideas. Please note that I view these proposals as strawmen at this point, intended more to generate discussion than as well-thought-through proposals.

(1) Revisit the aux_cmeps and aux_cdeps test lists, and introduce an aux_share test list.

Criteria for these test lists should be:

  • Tests selected from the prealpha test suite, so that baselines will exist and we can be sure the tests will remain passing. Any test we want that isn't yet in the prealpha test suite should also be added to it.
  • Cover the important options / uses / code paths in the respective repository with respect to CESM usage
  • Total test suite run time of about an hour or less (ideally), and no more than two hours (max)
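The first and third criteria are mechanical enough to check automatically. The following is a minimal Python sketch, assuming the candidate list and the prealpha list are available as plain collections of test names and that a per-test wall-clock estimate is known. The function `check_test_list`, the `Report` type, and the test names are hypothetical illustrations, not part of CIME. Run times are summed as if tests ran one after another, which is a conservative upper bound when tests actually run in parallel.

```python
from dataclasses import dataclass

# Budgets taken from the criteria above, in minutes.
IDEAL_BUDGET_MIN = 60
MAX_BUDGET_MIN = 120


@dataclass
class Report:
    missing_from_prealpha: list  # tests that would have no prealpha baselines
    total_minutes: float         # serial sum of per-test estimates
    within_ideal: bool
    within_max: bool


def check_test_list(candidate, prealpha):
    """Check a candidate aux_* test list against the proposed criteria.

    candidate: dict mapping test name -> estimated wall-clock minutes (assumed known)
    prealpha:  set of test names in the prealpha test suite
    """
    missing = sorted(name for name in candidate if name not in prealpha)
    total = sum(candidate.values())
    return Report(
        missing_from_prealpha=missing,
        total_minutes=total,
        within_ideal=total <= IDEAL_BUDGET_MIN,
        within_max=total <= MAX_BUDGET_MIN,
    )


if __name__ == "__main__":
    # Hypothetical test names and run times, for illustration only.
    prealpha = {"TEST_A", "TEST_B", "TEST_C"}
    candidate = {"TEST_A": 30.0, "TEST_B": 45.0, "TEST_D": 20.0}
    print(check_test_list(candidate, prealpha))
```

In this example, `TEST_D` is flagged as missing from prealpha, and the 95-minute total fits the two-hour maximum but not the one-hour ideal. Coverage of important code paths (the second criterion) still needs human judgment.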

I think it will be very hard, if not impossible, to guarantee coverage of any particular change with a small test list, because there are too many different options and code paths in CMEPS and CDEPS. So in many cases I wouldn't view these test lists as a replacement for targeted manual testing, but rather as a supplement to it.

Then I'd like to see the respective test suite run for any non-trivial change to a repository, with baseline comparisons to ensure no unintended answer changes (that last point is really the key point here). In general, the starting point for the code base could be the most recent CESM alpha tag, with an update in just the repository being changed. We could store baselines with a naming convention like cesm30alpha07d_cmepsXXX (indicating an update in cmeps relative to cesm30alpha07d), so that baselines would generally exist for the changes you want to test. Tests could either be run manually or via GitHub Actions or similar.
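To make the proposed naming convention concrete, here is a minimal Python sketch that builds and parses names of the form `<alpha tag>_<repository><repository tag>`. The set of repositories, the repository tag format (the proposal leaves it as `XXX`), and both function names are assumptions for illustration. Splitting on the last underscore lets the alpha tag itself contain underscores, at the cost of forbidding them in the repository tag.

```python
# Assumed set of infrastructure repositories that get their own baselines.
KNOWN_REPOS = ("cmeps", "cdeps", "share")


def baseline_name(alpha_tag, repo, repo_tag):
    """Build a name like 'cesm30alpha07d_cmepsXXX' from its three parts."""
    if repo not in KNOWN_REPOS:
        raise ValueError(f"unknown repository: {repo}")
    if not repo_tag or "_" in repo_tag:
        # The last '_' separates the alpha tag from the repository suffix,
        # so the repository tag must be non-empty and underscore-free.
        raise ValueError("repository tag must be non-empty and contain no '_'")
    return f"{alpha_tag}_{repo}{repo_tag}"


def parse_baseline_name(name):
    """Split a baseline name into (alpha_tag, repo, repo_tag).

    Names without a recognizable repository suffix are treated as plain
    alpha-tag baselines and returned as (name, None, None).
    """
    alpha_tag, sep, suffix = name.rpartition("_")
    if sep:
        for repo in KNOWN_REPOS:
            if suffix.startswith(repo) and len(suffix) > len(repo):
                return alpha_tag, repo, suffix[len(repo):]
    return name, None, None


if __name__ == "__main__":
    # "1.0.20" is a hypothetical CMEPS tag used only for illustration.
    name = baseline_name("cesm30alpha07d", "cmeps", "1.0.20")
    print(name)                       # cesm30alpha07d_cmeps1.0.20
    print(parse_baseline_name(name))  # ('cesm30alpha07d', 'cmeps', '1.0.20')
```

A test harness could use the parsed alpha tag to locate the reference baseline (here, the one for `cesm30alpha07d`) and store the new results under the full name.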

(2) Revamp aux_cime_baselines and rename it to something like aux_infrastructure_baselines

This test list was originally intended to run a small-ish cross-section of tests covering different options in CIME, back when CIME contained the Fortran infrastructure code in addition to the Python case control system. @fischer-ncar set up a cron job to run it nightly to ensure there were no unexpected answer changes from these infrastructure changes. This cron job used to check out the latest version of CIME; I'm not sure whether it has been updated to also check out the latest versions of the other infrastructure repositories. Under this proposal, it would.
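The extended nightly job would first move each infrastructure repository to its latest upstream commit before running the suite. Below is a minimal, dry-run Python sketch of that step. The repository paths inside the CESM checkout, the branch names, and the `update_commands` function are assumptions for illustration and are not taken from the existing cron job. The sketch only builds `git` commands; it does not run them.

```python
import shlex

# Hypothetical layout of infrastructure repositories inside a CESM checkout:
# name -> (relative path, branch to track). Paths and branch names are
# illustrative assumptions.
INFRA_REPOS = {
    "cime": ("cime", "master"),
    "cmeps": ("components/cmeps", "main"),
    "cdeps": ("components/cdeps", "main"),
    "share": ("share", "main"),
}


def update_commands(sandbox):
    """Return git commands that move each repository to its upstream branch tip.

    This is a dry run: the commands are only built. A nightly job based on
    this sketch could execute each one with subprocess.run(cmd, check=True)
    before launching the test suite.
    """
    commands = []
    for relpath, branch in INFRA_REPOS.values():
        path = f"{sandbox}/{relpath}"
        commands.append(["git", "-C", path, "fetch", "origin"])
        # Detached checkout of the remote branch tip; no local merges involved.
        commands.append(["git", "-C", path, "checkout", f"origin/{branch}"])
    return commands


if __name__ == "__main__":
    for cmd in update_commands("/path/to/cesm_sandbox"):
        print(shlex.join(cmd))
```

Because each repository is updated independently, an answer change in a nightly run could come from any of them. This is one reason per-repository testing before tagging, as in (1), makes answer changes easier to attribute.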

We relied on component model developers to add tests to this test suite, and it has since become very unbalanced in its coverage of different compsets. Here is a list of the compsets covered by tests in this test suite, with one line per test (so, e.g., two instances of DTEST means we have two tests covering that compset in aux_cime_baselines):

B1850C_LTso
DTEST
DTEST
F1850
F2000climo
FHIST
FHIST_BGC
I1850Clm45BgcCru
I1850Clm50BgcCrop
I1850Clm50BgcCrop
I1850Clm50BgcCrop
I1850Clm50SpGag
I1850Clm60BgcCrop
I1PtClm60SpRs
I2000Clm50BgcCru
I2000Clm50Sp
I2000Clm50SpRs
I2000Clm60FatesRs
I2000Clm60Sp
IHistClm60Bgc
IHistClm60BgcQianRs
QPC6
QSC6
T1850Gg
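The imbalance in the list above can be made concrete by tallying tests per compset type. This short Python sketch assumes that the leading letter of a compset alias identifies its broad type, which holds for the aliases listed here. The set of types we would want covered (`WANTED_TYPES`) is a hypothetical choice for illustration.

```python
from collections import Counter

# Compsets from the aux_cime_baselines list above, one entry per test.
COMPSETS = """
B1850C_LTso DTEST DTEST F1850 F2000climo FHIST FHIST_BGC
I1850Clm45BgcCru I1850Clm50BgcCrop I1850Clm50BgcCrop I1850Clm50BgcCrop
I1850Clm50SpGag I1850Clm60BgcCrop I1PtClm60SpRs I2000Clm50BgcCru
I2000Clm50Sp I2000Clm50SpRs I2000Clm60FatesRs I2000Clm60Sp IHistClm60Bgc
IHistClm60BgcQianRs QPC6 QSC6 T1850Gg
""".split()

# Hypothetical set of compset types we would want at least one test for.
WANTED_TYPES = set("BCDFGIQT")

# Assumption: the first character of each alias is its compset type.
counts = Counter(name[0] for name in COMPSETS)
missing = sorted(WANTED_TYPES - set(counts))

for letter in sorted(counts):
    print(f"{letter}: {counts[letter]}")
print("no tests for:", ", ".join(missing))
```

Of the 24 tests, 14 are I compsets, while C and G compsets have no tests at all.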

So, if we stick with this mechanism, I feel this needs an overhaul.

The guidance in (1) for the design of the test suite(s) applies here as well, with the main difference being when the tests are run: In (1), tests would be run before making a tag, whereas in (2) tests would be run nightly, covering updates to multiple infrastructure repositories. I have a slight preference for (1), as long as it can be done in a way that doesn't create a large burden for developers.

@briandobbins @fischer-ncar @jedwards4b @mvertens - I'd like to hear any thoughts you have about testing strategies here (@mvertens as a representative for NorESM as well as from your extensive experience testing CMEPS & CDEPS and CESM more generally).
