[RF] Compress the embedded RooDataHists in workspaces when serializing #19459

Open
wants to merge 1 commit into base: master

guitargeek
Contributor

Significantly reduce the size of RooWorkspaces with template histograms on disk.

Should circumvent the 1 GB buffer limit that ATLAS is hitting in the Higgs combination.
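As a rough illustration of why compressing per-bin arrays can shrink a serialized histogram, here is a minimal sketch. It is not the code in this PR: it assumes the bin contents are stored as raw 64-bit doubles and uses Python's `zlib` as a stand-in compressor, and the histogram shape and the helper names (`pack_doubles`, `unpack_doubles`) are made up for the example. Templates with many empty or repeated bins compress well; the actual gain depends entirely on the data.

```python
import struct
import zlib


def pack_doubles(values):
    """Serialize a list of floats as raw little-endian 64-bit doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_doubles(data):
    """Inverse of pack_doubles: decode raw bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(data) // 8}d", data))


# Hypothetical template histogram (assumption): mostly empty bins plus a
# region with repetitive, integer-valued content.
n_bins = 10_000
weights = [0.0] * n_bins
for i in range(4_000, 6_000):
    weights[i] = float((i - 4_000) // 10)

raw = pack_doubles(weights)
packed = zlib.compress(raw, 6)

raw_size = len(raw)        # 8 bytes per bin
packed_size = len(packed)  # depends on how repetitive the content is

print(f"raw: {raw_size} bytes, compressed: {packed_size} bytes")

# Compression is lossless: decompressing restores the exact bin contents.
restored = unpack_doubles(zlib.decompress(packed))
```

The round trip at the end is the important property: the bin contents read back are bit-for-bit identical, so only the on-disk (or in-buffer) representation changes.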


Test Results

    21 files      21 suites   3d 6h 39m 39s ⏱️
 3 217 tests  3 210 ✅   0 💤 7 ❌
65 831 runs  65 680 ✅ 144 💤 7 ❌

For more details on these failures, see this check.

Results for commit c2eb4f4.

@chenzhl2018

Hi @guitargeek, thanks for this nice development. I recently tested it with 12 workspaces from the ATLAS Higgs analysis. The on-disk size improves for some workspaces, but others fail. What causes the failures, and how can they be fixed? Thanks a lot.
[image]
[image]

@guitargeek
Contributor Author

Great, thanks for trying it out! Do you also have the comparison for the uncompressed sizes, like reported in the meeting 3 weeks ago?

The errors are because I have not implemented support for template histograms with asymmetric uncertainties, because I thought nobody would do this in practice. Interesting that you have this, because it's not clear at all how to use this in a statistical model :) Maybe it's just an artifact of the way you build the histograms? You can then also reduce the workspace size by removing these asymmetric errors.

In any case, I'll implement support for that in this PR tomorrow, and then you can try again.
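To make the size argument concrete, here is a toy sketch of how dropping asymmetric errors reduces the per-bin payload. It assumes, purely for illustration, that a template stores one array of doubles per quantity (weights, sum of squared weights, lower errors, upper errors); the class `TemplateHist` and its field names are hypothetical and are not the RooFit data layout.

```python
from dataclasses import dataclass
from typing import List, Optional

BYTES_PER_DOUBLE = 8


@dataclass
class TemplateHist:
    """Toy stand-in for a binned template; field names are illustrative only."""
    weights: List[float]
    sumw2: Optional[List[float]] = None
    err_lo: Optional[List[float]] = None
    err_hi: Optional[List[float]] = None

    def payload_bytes(self) -> int:
        """Uncompressed size of all per-bin arrays that are actually stored."""
        arrays = [self.weights, self.sumw2, self.err_lo, self.err_hi]
        return sum(len(a) * BYTES_PER_DOUBLE for a in arrays if a is not None)

    def without_asymmetric_errors(self) -> "TemplateHist":
        """Return a copy that keeps weights and sumw2 but drops err_lo/err_hi."""
        return TemplateHist(self.weights, self.sumw2, None, None)


n = 1_000
full = TemplateHist([1.0] * n, [1.0] * n, [0.8] * n, [1.2] * n)
slim = full.without_asymmetric_errors()

print(full.payload_bytes(), slim.payload_bytes())
```

Under these assumptions the two error arrays account for half of the uncompressed payload, which is why removing them before writing the workspace can be worthwhile when the model does not use them.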

@chenzhl2018

Hi @guitargeek, thanks for your quick reply. I quickly checked the Htautau workspace, and the buffer size decreases substantially with the new development. Next I will fit the workspaces to check that they still work correctly. It's good to see this development.

I am also curious why this development reduces the buffer size. I would appreciate it if you could explain this. Thanks a lot.

[image]

@chenzhl2018

Hi @guitargeek, I would like to follow up on the missing support for template histograms with asymmetric uncertainties. I think these asymmetric uncertainties are the statistical uncertainties of the histograms, right? If so, would it be possible to provide an option to remove them or to skip this error?

@chenzhl2018

For the channels where the new serialization works, the buffer size is reduced by about 15%.
[image]

2 participants