Adding scripts for ATLAS HEPMC Open Data handling #259

zlmarshall · 2025-04-21T11:39:50Z

This is the first iteration of the scripts required for handling the new ATLAS event generation open data in HEPMC format. The README explains (or should explain) all the different scripts and files included here.

A few samples have been transferred to CERN already to establish the functionality of all the scripts. Everything seems to be ok so far.

The key outstanding item (needed before anything can actually go onto the Open Data Portal) is the record ID and DOI list for all the various records that will be created. Otherwise this should be just about ready to go, at least to the QA portal for checking.

Connected to the ATLAS internal discussion in https://its.cern.ch/jira/browse/CENTRPAGE-569. For now, I think this is doing the correct thing.

- Adding late-requested exotics datasets as an explicit list of datasets, so the various parsing scripts had to be updated accordingly
- Updating production sheet and metadata requests accordingly
- Sorting keywords in metadata (looks nicer)
- Updating sample rules to appropriately handle the new exotics samples

This js file then goes into the ATLAS open data website to document the metadata. Including it here so that everything is in one place. Because it is really just a copy of the csv with a header and footer, not trying to include the output here as well (just unnecessary extra files)
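
As a rough illustration of the "copy of the csv with a header and footer" idea, here is a minimal Python sketch. It is not the script from this PR: the function name `wrap_csv_as_js`, the JavaScript variable name `metadata_csv`, the header and footer strings, and the example CSV columns are all hypothetical, and the real website may expect a different wrapper.

```python
def wrap_csv_as_js(csv_text, var_name="metadata_csv"):
    """Wrap raw CSV text in a JavaScript template literal.

    The header/footer format here is an assumption for illustration only.
    """
    # Escape characters that would end or alter a JS template literal.
    escaped = (
        csv_text.replace("\\", "\\\\")
        .replace("`", "\\`")
        .replace("${", "\\${")
    )
    # Make sure the closing backtick sits on its own line.
    if not escaped.endswith("\n"):
        escaped += "\n"
    header = f"const {var_name} = `\n"
    footer = "`;\n"
    return header + escaped + footer


if __name__ == "__main__":
    # Made-up CSV content; the real column names are not shown in this PR.
    example = "dataset_number,short_name\n123456,example_sample\n"
    print(wrap_csv_as_js(example), end="")
```

Since the output can be regenerated from the CSV at any time by a wrapper of this kind, keeping only the CSV and the script under version control avoids a second copy that could drift out of sync.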

This sets the remaining infrastructure up. I believe it is sufficient for the first release of a test record page. Once more production is done, we can release more of the records.

zlmarshall · 2025-05-05T21:08:36Z

Hi @tiborsimko ,

Thank you for the DOIs and record IDs in
cernopendata/opendata.cern.ch#3737

I've updated the script here to make use of them, and tried to build a little infrastructure so that we don't screw up the assignment or get things mixed up in the future. I've tested a bit, and things seem to be working so far.
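
One way to guard against mixed-up assignments is to keep the record ID and DOI pairs in a single table and validate it before building any records. The sketch below only illustrates that idea and is not the code in this PR: the function name `check_assignments`, the table layout, and the example record IDs and DOIs are all made up.

```python
def check_assignments(assignments):
    """Validate a {dataset_name: (recid, doi)} mapping.

    Raises ValueError if any record ID or DOI is assigned to more than
    one dataset; returns True otherwise.
    """
    seen_recids = {}
    seen_dois = {}
    for name, (recid, doi) in assignments.items():
        if recid in seen_recids:
            raise ValueError(
                f"recid {recid} assigned to both {seen_recids[recid]} and {name}"
            )
        if doi in seen_dois:
            raise ValueError(
                f"DOI {doi} assigned to both {seen_dois[doi]} and {name}"
            )
        seen_recids[recid] = name
        seen_dois[doi] = name
    return True


if __name__ == "__main__":
    # Placeholder values only; these are not real record IDs or DOIs.
    table = {
        "example_sample_a": (1, "10.0000/EXAMPLE.0001"),
        "example_sample_b": (2, "10.0000/EXAMPLE.0002"),
    }
    check_assignments(table)
    print("assignments are unique")
```

Running a check like this every time the metadata is built means a duplicated or swapped identifier fails loudly at build time, rather than showing up later on a published record.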

I think this means we're ready for a test deployment on the QA instance. This is going to feel like a LOT of infrastructure for just two records, but I am hoping that all this setup means we will be able to scale from 2 to 200 without any problems.

The nominal plan (I hope you agree!) would be to get this up on the QA instance, check things over, then to actually release it onto the normal portal. We'd then let phenomenologists look at it and see how to use it, and assuming everything is kosher we would then go ahead and release the next (larger) batch.

Thanks again,
Zach

atlas-2025-odfr-hepmc/create_file_metadata.py

atlas-2025-odfr-hepmc/make_od_hepmc_json.py

zlmarshall · 2025-06-05T02:27:02Z

Hi @tiborsimko ,

Thank you for the test deployment! I've pushed a very minor update to the text that would be nice to have in. The only other thing I'd like if possible is for the empty "Files and Indexes" block to be removed from the meta-record at https://opendata-qa.cern.ch/record/160000 . After that, I think you're all clear to press the button and make these public!

Thanks again,
Zach

Zach Marshall added 6 commits April 21, 2025 13:36

Preferring MC16 to MC15 metadata

da1c06d

Fixing production spreadsheet header after csv move

41db70f

Final updates for first release of a test page

38f5851

tiborsimko mentioned this pull request May 19, 2025

related links using doi over recid is not comfortable whilst developing cernopendata/cernopendata-portal#161

Open

tiborsimko reviewed May 19, 2025

atlas-2025-odfr-hepmc/create_file_metadata.py (resolved)

atlas-2025-odfr-hepmc/make_od_hepmc_json.py (resolved)

Zach Marshall added 2 commits May 19, 2025 20:38

Comments from Tibor (thanks)

4d60322

Little textual update before the release

9ff8c6f

Updating file names to match names on disk

96fa2f4

tiborsimko mentioned this pull request Jun 10, 2025

empty "Files and Indexes" section for summary records cernopendata/cernopendata-portal#169

Open

Zach Marshall added 2 commits June 10, 2025 19:55

Updating description with update notes (ha) and adding a build full m…

8e9bd74

…d script

Suggested changes from Tibor to the description to format it much mor…

0b0e871

…e nicely
