Skip to content

Conversation

@hahnjo
Copy link
Member

@hahnjo hahnjo commented Dec 8, 2025

As the classes are experimental, using the method will print a warning to inform the user.

Note that this is only a first version, there are a couple of advanced functionalities and better error handling that will come in future PRs.

As the classes are experimental, using the method will print a warning
to inform the user.
This needs one indirection to construct the std::tuple and separate
the weight argument.
Comment on lines +2407 to +2414
/// ### Example usage:
/// ~~~{.cpp}
/// auto h = std::make_shared<ROOT::Experimental::RHist<double>>(10, {5.0, 15.0});
/// auto myHist = myDf.Hist(h, {"col0"});
/// ~~~
template <typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes, typename BinContentType>
RResultPtr<ROOT::Experimental::RHist<BinContentType>>
Hist(std::shared_ptr<ROOT::Experimental::RHist<BinContentType>> h, const ColumnNames_t &columnList)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should market this overload as "bring your own histogram" 😅

Jokes aside, this will enable some nice use cases, for example filling a shared histogram from multiple computation graphs, as demonstrated by the RDFHist.PtrRunGraphs unit test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

Test Results

    21 files      21 suites   3d 18h 43m 15s ⏱️
 3 787 tests  3 787 ✅ 0 💤 0 ❌
77 582 runs  77 582 ✅ 0 💤 0 ❌

Results for commit 46f531b.

Copy link
Member Author

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things to discuss during review (after the break)

{
std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
if (h->GetNDimensions() != columnList.size()) {
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In histv7, we are consistently throwing std::invalid_argument while RDF prefers std::runtime_error. I think we have to decide which "consistency" to follow...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://en.cppreference.com/w/cpp/error/runtime_error.html and https://en.cppreference.com/w/cpp/error/invalid_argument.html :

  • std::runtime_error: It reports errors that are due to events beyond the scope of the program and cannot be easily predicted.
  • std::invalid_argument: It reports errors that arise because an argument value has not been accepted.

I infer from this that any type of error that arises from breaking a pre-condition of a function, such as this case, should be an invalid_argument. In RDataFrame, we are just using std::runtime_error everywhere (for simplicity/consistency). If we were to apply the two definitions above, we should actually use both type of errors: runtime_error for things like errors in JITting, invalid_argument for this type of cases

/// ROOT::Experimental::RRegularAxis axis(10, {5.0, 15.0});
/// auto myHist = myDf.Hist({axis}, {"col0"});
/// ~~~
template <typename BinContentType = double, typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could also argue for BinContentType = int or long as default since it's unweighted filling. However, then a number of "post-processing steps" require conversion, for example scaling...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach: least number of operations by default.

Comment on lines 2227 to 2228
/// A list containing the names of the columns that will be passed when calling `Fill`.
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to ourselves that we want to change this, and backport to 6.38 a warning message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #20816 to keep track of this comment 👍

struct HistoND{};
struct HistoNSparseD{};
struct Hist{};
struct HistWithWeight{};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a second tag felt like the easiest solution, together with just a template argument bool WithWeight = false for RHistFillHelper. Other approaches are certainly possible...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is valid and given that everything is internal anyway I don't see a reason to spend more time on this.

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! The PR is already in an advanced state, I have left some minor comments

Comment on lines +513 to +515
std::tuple args{std::get<I>(columnValues)...};
ROOT::Experimental::RWeight weight(std::get<sizeof...(ColumnTypes) - 1>(columnValues));
fContexts[slot]->Fill(args, weight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extra args tuple necessary or can we directly pass columnValues to Fill?

struct HistoND{};
struct HistoNSparseD{};
struct Hist{};
struct HistWithWeight{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is valid and given that everything is internal anyway I don't see a reason to spend more time on this.

}

#ifdef R__HAS_ROOT7
// Action for RHist using RHistConcurrentFiller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Action for RHist using RHistConcurrentFiller
// Action for RHist using RHistConcurrentFiller without weights

return std::make_unique<Action_t>(Helper_t(h, nSlots), columnList, std::move(prevNode), colRegister);
}

// Action for RHist using RHistConcurrentFiller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Action for RHist using RHistConcurrentFiller
// Action for RHist using RHistConcurrentFiller with weights

Comment on lines 2227 to 2228
/// A list containing the names of the columns that will be passed when calling `Fill`.
/// (N columns for unweighted filling, or N+1 columns for weighted filling)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #20816 to keep track of this comment 👍

{
std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
if (h->GetNDimensions() != columnList.size()) {
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://en.cppreference.com/w/cpp/error/runtime_error.html and https://en.cppreference.com/w/cpp/error/invalid_argument.html :

  • std::runtime_error: It reports errors that are due to events beyond the scope of the program and cannot be easily predicted.
  • std::invalid_argument: It reports errors that arise because an argument value has not been accepted.

I infer from this that any type of error that arises from breaking a pre-condition of a function, such as this case, should be an invalid_argument. In RDataFrame, we are just using std::runtime_error everywhere (for simplicity/consistency). If we were to apply the two definitions above, we should actually use both type of errors: runtime_error for things like errors in JITting, invalid_argument for this type of cases

Comment on lines +2404 to +2405
/// During execution of the computation graph, the passed histogram must only be accessed with methods that are
/// allowed during concurrent filling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always a tricky subject to communicate to users, so I wonder if we can be even more explicit here?

Suggested change
/// During execution of the computation graph, the passed histogram must only be accessed with methods that are
/// allowed during concurrent filling.
/// During execution of the computation graph, the passed histogram must only be accessed with methods that are
/// allowed during concurrent filling, i.e. that are thread-safe

Comment on lines +2407 to +2414
/// ### Example usage:
/// ~~~{.cpp}
/// auto h = std::make_shared<ROOT::Experimental::RHist<double>>(10, {5.0, 15.0});
/// auto myHist = myDf.Hist(h, {"col0"});
/// ~~~
template <typename ColumnType = RDFDetail::RInferredType, typename... ColumnTypes, typename BinContentType>
RResultPtr<ROOT::Experimental::RHist<BinContentType>>
Hist(std::shared_ptr<ROOT::Experimental::RHist<BinContentType>> h, const ColumnNames_t &columnList)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!


std::shared_ptr h = std::make_shared<ROOT::Experimental::RHist<BinContentType>>(std::move(axes));
if (h->GetNDimensions() != columnList.size()) {
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know both numbers, we could think about making the error a bit more friendly:

Suggested change
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes.");
throw std::runtime_error("Wrong number of columns for the specified number of histogram axes: need '" + std::to_string(h->GetNDimensions()) + "', got '" + std::to_string(columnList.size()) + "'.");

Applies similarly in other places.

{
RDataFrame df(10);
const RRegularAxis axis(10, {5.0, 15.0});
auto hist = df.Define("x", "rdfentry_ + 5.5").Hist({axis}, {"x"});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud, outside the scope of this PR, about how much JITting we strictly need. In principle, the bin content type defaults to double, so what we're JITting is the column type, I wonder if we could find a way to ask the type-system to just read the contents of the column truly as double, then we'd have at least one default scenario without JITting and without the need for template arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants