[df] First integration of `RHist` filling #20664

hahnjo · 2025-12-08T10:58:26Z

As the classes are experimental, using the method will print a warning to inform the user.

Note that this is only a first version; some advanced functionality and better error handling will come in future PRs.

This needs one indirection to construct the std::tuple and separate the weight argument.
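For readers unfamiliar with the pattern, here is a minimal standalone sketch of such an indirection. It assumes the weight arrives as the last column value. `ToyFiller`, `FillWeighted` and `FillWeightedImpl` are hypothetical names for illustration only; they are not the ROOT classes or the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a histogram filler; NOT the ROOT API.
struct ToyFiller {
   double fSumX = 0; // sum of the first coordinate, for illustration
   double fSumW = 0; // sum of weights

   template <typename... A>
   void Fill(const std::tuple<A...> &args, double weight)
   {
      fSumX += static_cast<double>(std::get<0>(args));
      fSumW += weight;
   }
};

// The indirection: the index sequence I... covers every value except the last one,
// so the coordinates can be packed into a tuple and the weight picked out separately.
template <typename Filler, typename Tuple, std::size_t... I>
void FillWeightedImpl(Filler &filler, const Tuple &values, std::index_sequence<I...>)
{
   std::tuple args{std::get<I>(values)...};              // the N coordinates
   const double weight = std::get<sizeof...(I)>(values); // element N is the weight
   filler.Fill(args, weight);
}

// Entry point: receives N coordinates followed by one weight.
template <typename Filler, typename... ColumnTypes>
void FillWeighted(Filler &filler, const ColumnTypes &...columnValues)
{
   static_assert(sizeof...(ColumnTypes) >= 2, "need at least one coordinate and a weight");
   FillWeightedImpl(filler, std::forward_as_tuple(columnValues...),
                    std::make_index_sequence<sizeof...(ColumnTypes) - 1>{});
}
```

Calling `FillWeighted(filler, x, y, w)` would forward `(x, y)` as the tuple and `w` as the weight.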

hahnjo · 2025-12-08T12:36:50Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   /// ### Example usage:
+   /// ~~~{.cpp}
+   /// auto h = std::make_shared<ROOT::Experimental::RHist<double>>(10, {5.0, 15.0});
+   /// auto myHist = myDf.Hist(h, {"col0"});
+   /// ~~~
+   template <typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes, typename BinContentType>
+   RResultPtr<ROOT::Experimental::RHist<BinContentType>>
+   Hist(std::shared_ptr<ROOT::Experimental::RHist<BinContentType>> h, const ColumnNames_t &columnList)


I believe we should market this overload as "bring your own histogram" 😅

Jokes aside, this will enable some nice use cases, for example filling a shared histogram from multiple computation graphs, as demonstrated by the RDFHist.PtrRunGraphs unit test.
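To make the shared-histogram idea concrete, here is a toy sketch in plain C++ without ROOT. It assumes two threads stand in for two computation graphs and a small atomic-bin class stands in for the histogram. `ToyHist` and `FillFromTwoGraphs` are hypothetical names and do not reflect how `RHist` or the unit test are implemented.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical toy histogram with atomic bin contents; NOT the ROOT RHist API.
class ToyHist {
   double fLow;
   double fHigh;
   std::vector<std::atomic<long>> fBins;

public:
   ToyHist(std::size_t nBins, double low, double high) : fLow(low), fHigh(high), fBins(nBins)
   {
      for (auto &bin : fBins)
         bin.store(0);
   }

   // Safe to call concurrently: each increment is an atomic operation.
   void Fill(double x)
   {
      if (x < fLow || x >= fHigh)
         return; // the toy simply drops out-of-range values
      const double rel = (x - fLow) / (fHigh - fLow);
      const auto bin = std::min(static_cast<std::size_t>(rel * fBins.size()), fBins.size() - 1);
      fBins[bin].fetch_add(1, std::memory_order_relaxed);
   }

   long GetBinContent(std::size_t i) const { return fBins[i].load(); }

   long GetEntries() const
   {
      long sum = 0;
      for (const auto &bin : fBins)
         sum += bin.load();
      return sum;
   }
};

// Two "graphs" (plain threads here) fill the same user-provided histogram.
void FillFromTwoGraphs(const std::shared_ptr<ToyHist> &h, int entriesPerGraph)
{
   auto graph = [h, entriesPerGraph](double offset) {
      for (int i = 0; i < entriesPerGraph; ++i)
         h->Fill(offset + (i % 5));
   };
   std::thread t1(graph, 5.5);
   std::thread t2(graph, 10.5);
   t1.join();
   t2.join();
}
```

The caller keeps ownership through the `std::shared_ptr`, so the filled contents can be read once both threads have been joined.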

This is great!

github-actions · 2025-12-08T13:08:56Z

Test Results

21 files 21 suites 3d 18h 43m 15s ⏱️
3 787 tests 3 787 ✅ 0 💤 0 ❌
77 582 runs 77 582 ✅ 0 💤 0 ❌

Results for commit 46f531b.

hahnjo

Things to discuss during review (after the break)

hahnjo · 2025-12-12T07:45:50Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   {
+      std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
+      if (h->GetNDimensions() != columnList.size()) {
+         throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");


In histv7, we are consistently throwing std::invalid_argument while RDF prefers std::runtime_error. I think we have to decide which "consistency" to follow...

According to https://en.cppreference.com/w/cpp/error/runtime_error.html and https://en.cppreference.com/w/cpp/error/invalid_argument.html :

std::runtime_error: It reports errors that are due to events beyond the scope of the program and cannot be easily predicted.

std::invalid_argument: It reports errors that arise because an argument value has not been accepted.

I infer from this that any error that arises from breaking a precondition of a function, as in this case, should be an invalid_argument. In RDataFrame, we are just using std::runtime_error everywhere (for simplicity/consistency). If we were to apply the two definitions above, we should actually use both types of errors: runtime_error for things like errors in JITting, invalid_argument for cases like this one.
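One standard-library detail is worth keeping in mind when weighing the two options: `std::invalid_argument` derives from `std::logic_error`, not from `std::runtime_error`, so a handler written for `std::runtime_error` will not catch it. The sketch below illustrates this with hypothetical helpers (`CheckColumns`, `CompileSnippet`, `Classify`); none of them exist in ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical precondition check: the caller passed inconsistent arguments.
void CheckColumns(std::size_t nDimensions, std::size_t nColumns)
{
   if (nDimensions != nColumns)
      throw std::invalid_argument("Wrong number of columns for the specified number of histogram axes.");
}

// Hypothetical failure that the caller could not have predicted.
void CompileSnippet(bool succeeds)
{
   if (!succeeds)
      throw std::runtime_error("JIT compilation failed");
}

// Reports which of the two exception families a callable throws.
template <typename F>
std::string Classify(F &&f)
{
   try {
      f();
   } catch (const std::runtime_error &) {
      return "runtime_error";
   } catch (const std::logic_error &) {
      // std::invalid_argument ends up here, not in the handler above.
      return "logic_error";
   }
   return "none";
}
```

User code that currently wraps RDataFrame calls in `catch (const std::runtime_error &)` would therefore behave differently if some errors became `std::invalid_argument`.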

hahnjo · 2025-12-12T07:47:25Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   /// ROOT::Experimental::RRegularAxis axis(10, {5.0, 15.0});
+   /// auto myHist = myDf.Hist({axis}, {"col0"});
+   /// ~~~
+   template <typename BinContentType = double, typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes>


One could also argue for BinContentType = int or long as default since it's unweighted filling. However, then a number of "post-processing steps" require conversion, for example scaling...

I like this approach: least number of operations by default.
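As a small illustration of the conversion issue mentioned above, consider scaling bin contents stored in plain vectors. This is a sketch with hypothetical helper names (`ScaleInPlace`, `ConvertAndScale`) and says nothing about how `RHist` implements scaling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale while keeping the bin content type: with integer bins, fractions are truncated.
template <typename BinContentType>
void ScaleInPlace(std::vector<BinContentType> &bins, double factor)
{
   for (auto &content : bins)
      content = static_cast<BinContentType>(content * factor);
}

// Convert to double first, then scale: an extra step (and copy) for integer bins.
template <typename BinContentType>
std::vector<double> ConvertAndScale(const std::vector<BinContentType> &bins, double factor)
{
   std::vector<double> scaled(bins.begin(), bins.end());
   for (auto &content : scaled)
      content *= factor;
   return scaled;
}
```

With `double` bins, `ScaleInPlace` is exact for a factor like 0.5 and no conversion is needed; with `long` bins, the in-place variant loses the fractional part, so a converted copy is required.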

hahnjo · 2025-12-12T07:49:34Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

   /// A list containing the names of the columns that will be passed when calling `Fill`.
   ///  (N columns for unweighted filling, or N+1 columns for weighted filling)


Reminder to ourselves that we want to change this and backport a warning message to 6.38.

Opened #20816 to keep track of this comment 👍

hahnjo · 2025-12-12T07:50:53Z

tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx

 struct HistoND{};
 struct HistoNSparseD{};
 struct Hist{};
+struct HistWithWeight{};


Having a second tag felt like the easiest solution, together with just a template argument bool WithWeight = false for RHistFillHelper. Other approaches are certainly possible...

I believe this is valid, and given that everything is internal anyway, I don't see a reason to spend more time on this.
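For reference, the general shape of "a second tag plus a `bool` template argument" can be sketched as follows. This is a simplified standalone illustration: `ToyFillHelper`, `BuildHelper` and the `Tags` namespace are hypothetical and only loosely modelled on the names mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace Tags {
struct Hist {};           // unweighted filling
struct HistWithWeight {}; // weighted filling: the last value is the weight
} // namespace Tags

// One helper template; the bool selects the filling mode at compile time.
template <bool WithWeight = false>
struct ToyFillHelper {
   double fSumW = 0;

   template <typename... Values>
   void Exec(const Values &...values)
   {
      if constexpr (WithWeight) {
         static_assert(sizeof...(Values) >= 1, "weighted filling needs a weight");
         const double asDoubles[] = {static_cast<double>(values)...};
         fSumW += asDoubles[sizeof...(Values) - 1]; // last value is the weight
      } else {
         ((void)values, ...); // coordinates are ignored in this toy
         fSumW += 1.0;        // every entry counts with unit weight
      }
   }
};

// Tag dispatch: each tag maps to one instantiation of the helper.
inline ToyFillHelper<false> BuildHelper(Tags::Hist)
{
   return {};
}
inline ToyFillHelper<true> BuildHelper(Tags::HistWithWeight)
{
   return {};
}
```

The tag only selects the overload; all behavioural differences live in the single `if constexpr` inside the helper.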

vepadulano

Thank you! The PR is already in an advanced state; I have left some minor comments.

vepadulano · 2026-01-07T09:16:28Z

tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx

+      std::tuple args{std::get<I>(columnValues)...};
+      ROOT::Experimental::RWeight weight(std::get<sizeof...(ColumnTypes) - 1>(columnValues));
+      fContexts[slot]->Fill(args, weight);


Is the extra args tuple necessary or can we directly pass columnValues to Fill?

vepadulano · 2026-01-07T09:18:28Z

tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx

 struct HistoND{};
 struct HistoNSparseD{};
 struct Hist{};
+struct HistWithWeight{};


I believe this is valid, and given that everything is internal anyway, I don't see a reason to spend more time on this.

vepadulano · 2026-01-07T09:19:09Z

tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx

 }

+#ifdef R__HAS_ROOT7
+// Action for RHist using RHistConcurrentFiller


Suggested change

-// Action for RHist using RHistConcurrentFiller
+// Action for RHist using RHistConcurrentFiller without weights

vepadulano · 2026-01-07T09:19:16Z

tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx

+   return std::make_unique<Action_t>(Helper_t(h, nSlots), columnList, std::move(prevNode), colRegister);
+}
+
+// Action for RHist using RHistConcurrentFiller


Suggested change

-// Action for RHist using RHistConcurrentFiller
+// Action for RHist using RHistConcurrentFiller with weights

vepadulano · 2026-01-07T09:24:30Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

   /// A list containing the names of the columns that will be passed when calling `Fill`.
   ///  (N columns for unweighted filling, or N+1 columns for weighted filling)


Opened #20816 to keep track of this comment 👍

vepadulano · 2026-01-07T09:28:43Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   {
+      std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
+      if (h->GetNDimensions() != columnList.size()) {
+         throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");


According to https://en.cppreference.com/w/cpp/error/runtime_error.html and https://en.cppreference.com/w/cpp/error/invalid_argument.html :

std::runtime_error: It reports errors that are due to events beyond the scope of the program and cannot be easily predicted.

std::invalid_argument: It reports errors that arise because an argument value has not been accepted.

I infer from this that any error that arises from breaking a precondition of a function, as in this case, should be an invalid_argument. In RDataFrame, we are just using std::runtime_error everywhere (for simplicity/consistency). If we were to apply the two definitions above, we should actually use both types of errors: runtime_error for things like errors in JITting, invalid_argument for cases like this one.

vepadulano · 2026-01-07T09:29:56Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   /// During execution of the computation graph, the passed histogram must only be accessed with methods that are
+   /// allowed during concurrent filling.


This is always a tricky subject to communicate to users, so I wonder if we can be even more explicit here?

Suggested change

-/// During execution of the computation graph, the passed histogram must only be accessed with methods that are
-/// allowed during concurrent filling.
+/// During execution of the computation graph, the passed histogram must only be accessed with methods that are
+/// allowed during concurrent filling, i.e. that are thread-safe

vepadulano · 2026-01-07T09:32:01Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+   /// ### Example usage:
+   /// ~~~{.cpp}
+   /// auto h = std::make_shared<ROOT::Experimental::RHist<double>>(10, {5.0, 15.0});
+   /// auto myHist = myDf.Hist(h, {"col0"});
+   /// ~~~
+   template <typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes, typename BinContentType>
+   RResultPtr<ROOT::Experimental::RHist<BinContentType>>
+   Hist(std::shared_ptr<ROOT::Experimental::RHist<BinContentType>> h, const ColumnNames_t &columnList)


This is great!

vepadulano · 2026-01-07T09:34:22Z

tree/dataframe/inc/ROOT/RDF/RInterface.hxx

+
+      std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
+      if (h->GetNDimensions() != columnList.size()) {
+         throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");


Since we know both numbers, we could think about making the error a bit more friendly:

Suggested change

-throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
+throw std::runtime_error("Wrong number of columns for the specified number of histogram axes: need '" + std::to_string(h->GetNDimensions()) + "', got '" + std::to_string(columnList.size()) + "'.");

Applies similarly in other places.
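Since the same message would be needed in several places, one option is to factor it into a small helper. The sketch below is only an illustration of that idea; `ColumnMismatchMessage` and `CheckColumnCount` are hypothetical names, not functions that exist in ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds the friendlier message once, so that every call site reports both numbers.
inline std::string ColumnMismatchMessage(std::size_t nAxes, std::size_t nColumns)
{
   return "Wrong number of columns for the specified number of histogram axes: need '" +
          std::to_string(nAxes) + "', got '" + std::to_string(nColumns) + "'.";
}

// Throws if the number of columns does not match the number of axes.
inline void CheckColumnCount(std::size_t nAxes, std::size_t nColumns)
{
   if (nAxes != nColumns)
      throw std::runtime_error(ColumnMismatchMessage(nAxes, nColumns));
}
```

A call such as `CheckColumnCount(h->GetNDimensions(), columnList.size())` would then replace the inline `if` and `throw` at each site.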

vepadulano · 2026-01-07T09:41:39Z

tree/dataframe/test/dataframe_hist.cxx

+{
+   RDataFrame df(10);
+   const RRegularAxis axis(10, {5.0, 15.0});
+   auto hist = df.Define("x", "rdfentry_ + 5.5").Hist({axis}, {"x"});


Thinking out loud, outside the scope of this PR, about how much JITting we strictly need. In principle, the bin content type defaults to double, so what we're JITting is the column type. I wonder if we could find a way to ask the type system to just read the contents of the column as double; then we'd have at least one default scenario without JITting and without the need for template arguments.

hahnjo added 2 commits December 8, 2025 11:47

[df] First integration of RHist filling

ad552e4

As the classes are experimental, using the method will print a warning to inform the user.

[df] Implement weighted filling of RHist

46f531b

This needs one indirection to construct the std::tuple and separate the weight argument.

hahnjo requested review from dpiparo, jblomer and vepadulano December 8, 2025 10:58

hahnjo self-assigned this Dec 8, 2025

hahnjo requested review from bellenot and martamaja10 as code owners December 8, 2025 10:58

hahnjo added in:RDataFrame in:Hist labels Dec 8, 2025

vepadulano mentioned this pull request Jan 7, 2026

Improve RInterface::HistoND signature with an extra argument for weight #20816

Open

vepadulano requested changes Jan 7, 2026
