@TomAugspurger
Contributor

@TomAugspurger TomAugspurger commented Oct 21, 2025

Description

This adds a keyword-only context argument to the cudf_polars IR.do_evaluate method. The purpose is to provide access to special pieces of data that might be necessary for controlling an IR node's execution but don't belong on the IR node itself as non-child arguments. Specifically, we'd like to provide a CUDA stream argument as part of #20228, but we generalize that slightly and provide a system for passing arbitrary data.

A few notes on the implementation:

  • For now, the context is just an empty dataclass. I suspect its design might change in the future.
  • I've opted to push the creation of the context as high as possible. For now it's created in _callback and passed into ir.evaluate / evaluate_streaming and from there to all the methods that require it.
  • There's some awkwardness between how our IR nodes and Dask's task graph treat arguments. I've opted to make context keyword-only in IR.do_evaluate(..., context). However, Dask's task graph doesn't really deal with keyword arguments: it wants a tuple of (function, arg1, arg2, ...). So we have to bind the context first, using functools.partial(function, context=context)(arg1, arg2, ...).
  • After implementing this, I realized that Expr.evaluate also takes a context, and it's a different type, ExecutionContext :( I can rename the IR variant if we want.
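
A minimal sketch of the functools.partial workaround described in the notes above. Everything here is a stand-in: `IRContext`, the free function `do_evaluate`, and the toy `run_task` executor are hypothetical names for illustration, not the actual cudf_polars or Dask code.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class IRContext:
    """Hypothetical stand-in for the (currently empty) context dataclass."""


def do_evaluate(left, right, *, context):
    # Stand-in for IR.do_evaluate: positional child arguments plus a
    # keyword-only context.
    assert isinstance(context, IRContext)
    return left + right


def run_task(task):
    # Toy executor that mimics the (function, arg1, arg2, ...) tuple
    # convention: every element after the first is passed positionally.
    func, *args = task
    return func(*args)


context = IRContext()

# The tuple convention has no slot for keyword arguments, so the context
# is bound to the function before the task tuple is built.
task = (partial(do_evaluate, context=context), 1, 2)
result = run_task(task)
```

Passing the context positionally, as in `do_evaluate(1, 2, context)`, raises a `TypeError`, which is the point of making it keyword-only.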

Just a draft for now, and probably not worth reviewing until I have a branch somewhere that combines CUDA streams with this to verify it meets our needs.

@copy-pr-bot

copy-pr-bot bot commented Oct 21, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Oct 21, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Oct 21, 2025
@TomAugspurger TomAugspurger added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 21, 2025
@TomAugspurger
Contributor Author

TomAugspurger commented Oct 21, 2025

5a77a80 has a POC for how this can be used. We add a new_stream: Callable[[], Stream] member to the context dataclass. Inside of do_evaluate we can call context.new_stream(). Once rapidsai/rapidsmpf#592 is done, we should be able to pass in a context that uses the stream pool from rapidsmpf.
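
A rough sketch of the `new_stream` idea, assuming a placeholder `Stream` class and a hypothetical round-robin pool. None of these names come from the actual commit or from rapidsmpf; they only illustrate how a pool-backed factory could be swapped in through the context.

```python
from collections.abc import Callable
from dataclasses import dataclass


class Stream:
    """Placeholder for a CUDA stream object; not the real class."""

    def __init__(self, ident: int) -> None:
        self.ident = ident


@dataclass(frozen=True)
class IRContext:
    # Hypothetical context: a zero-argument factory that returns a stream.
    new_stream: Callable[[], Stream]


class RoundRobinPool:
    """Hypothetical stream pool that hands out streams in rotation."""

    def __init__(self, size: int) -> None:
        self._streams = [Stream(i) for i in range(size)]
        self._next = 0

    def get(self) -> Stream:
        stream = self._streams[self._next % len(self._streams)]
        self._next += 1
        return stream


def do_evaluate(data, *, context: IRContext):
    # Each evaluation asks the context for a stream instead of creating one.
    stream = context.new_stream()
    return data, stream.ident


pool = RoundRobinPool(size=2)
context = IRContext(new_stream=pool.get)
idents = [do_evaluate("x", context=context)[1] for _ in range(3)]
```

With a pool of two streams, three evaluations reuse the first stream on the third call, so the caller controls stream reuse without `do_evaluate` knowing about the pool.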

Alternatively, rather than giving a Callable[[], Stream] we could attach a stream directly to the context and rely on dataclasses.replace() to swap in new streams as needed. I'm not sure which is better at the moment.
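
For comparison, a sketch of the dataclasses.replace() alternative, again with a placeholder `Stream` and a hypothetical `IRContext`. `dataclasses.replace` builds a new instance with the given fields overridden and leaves the original untouched.

```python
from dataclasses import dataclass, replace


class Stream:
    """Placeholder for a CUDA stream object; not the real class."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass(frozen=True)
class IRContext:
    # Hypothetical context that carries a concrete stream.
    stream: Stream


base = IRContext(stream=Stream("default"))

# Derive a context for a child task that should run on its own stream.
child = replace(base, stream=Stream("child"))
```

Here the caller decides when a new stream is needed, whereas with the factory approach the decision sits inside `do_evaluate`.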

Finally, we could drop the dataclass and just make it a dictionary. But I'd prefer to keep things structured where possible, so that both the functions and their callers know what belongs in the context. If we need to pass arbitrary things in, we can attach an extra field to the dataclass that's just a dictionary.
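
A small sketch of what the structured option buys us, assuming a hypothetical `IRContext` with one typed field (`stream_name`) and an `extras` dictionary as the escape hatch. Both field names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class IRContext:
    # Known, structured field (hypothetical).
    stream_name: str = "default"
    # Escape hatch for arbitrary, unstructured data.
    extras: dict[str, Any] = field(default_factory=dict)


context = IRContext(extras={"debug": True})

# A misspelled field name fails at construction time...
try:
    IRContext(stream_nam="other")
    typo_rejected = False
except TypeError:
    typo_rejected = True

# ...whereas a plain dictionary accepts the typo silently.
plain = {"stream_nam": "other"}
```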

@TomAugspurger TomAugspurger marked this pull request as ready for review October 21, 2025 17:48
@TomAugspurger TomAugspurger requested a review from a team as a code owner October 21, 2025 17:48
@TomAugspurger
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 6fdbd4e into rapidsai:main Oct 23, 2025
473 of 489 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Oct 23, 2025
@TomAugspurger TomAugspurger deleted the tom/cudf-polars-ir-context branch October 23, 2025 13:23