SurvivalSplittingRule's relabeled_failures should be ctor initialized #1511

@erikcs

This initialization currently happens at every node of size n, at a cost of O(N) each time, where N is the full data set size. It should be done once, when the splitting rule is instantiated.

To do that, we just need to add one more optional argument to the SplittingRuleFactory::create signature (the caller has everything it needs for that).
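A minimal sketch of the idea, not the actual grf code: every name below (`ToyData`, `ToySurvivalSplittingRule`, `ToySplittingRuleFactory`, `build_relabeled_failures`, `count_failures_in_node`) is hypothetical, and the extra `create` argument is shown as required rather than optional to keep the example short. The point is only that the O(N) pass moves into the constructor, so per-node work touches just the n samples in that node.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the full data set: one failure label per sample.
struct ToyData {
  std::vector<std::size_t> failure_labels;
  std::size_t num_rows() const { return failure_labels.size(); }
};

// Hypothetical O(N) pass over the full data set (a plain copy here;
// the real relabeling logic is not reproduced).
std::vector<std::size_t> build_relabeled_failures(const ToyData& data) {
  std::vector<std::size_t> relabeled(data.num_rows());
  for (std::size_t i = 0; i < data.num_rows(); ++i) {
    relabeled[i] = data.failure_labels[i];
  }
  return relabeled;
}

class ToySurvivalSplittingRule {
public:
  // The O(N) initialization runs exactly once, at construction.
  explicit ToySurvivalSplittingRule(const ToyData& data)
      : relabeled_failures(build_relabeled_failures(data)) {}

  // Per-node work is O(n): it only reads entries for the node's samples.
  std::size_t count_failures_in_node(const std::vector<std::size_t>& samples) const {
    std::size_t count = 0;
    for (std::size_t sample : samples) {
      if (relabeled_failures[sample] > 0) {
        ++count;
      }
    }
    return count;
  }

private:
  std::vector<std::size_t> relabeled_failures;
};

// Hypothetical factory: the extra argument gives the rule what it needs
// to initialize relabeled_failures in its constructor.
class ToySplittingRuleFactory {
public:
  std::unique_ptr<ToySurvivalSplittingRule> create(const ToyData& data) const {
    return std::make_unique<ToySurvivalSplittingRule>(data);
  }
};
```

With this shape, the rule is created once per tree (or per forest, depending on where the factory is called) and each node only pays for its own samples.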

When adding SurvivalSplittingRule, I skipped this "optimization" and the resulting signature change since it made practically zero difference in performance: log-rank splitting was the bottleneck. But with #1509 (which also uses relabeled_failures), this change could give a gain of around 10%.

This is not super urgent, but worth addressing. Are you OK with this change, @jtibshirani?


Labels: performance (issue relates to the speed, memory usage, or scaling aspects of the package)
