Skip to content

Conversation

@timsaucer
Copy link
Member

@timsaucer timsaucer commented Nov 24, 2025

Which issue does this PR close?

Addresses part of #18671 but does not close it.

Rationale for this change

We have an issue in the current FFI code in that we will need access to the TaskContext to do function encoding and decoding to pass logical expressions across the boundary. We use the proto crate for this operations.

We cannot encode and decode functions that have been registered since our current code creates a default session context. This causes problems in crates where we are using UDFs as inputs to either aggregations, window functions, or table filters.

With this change we keep a weak reference to a new trait, a TaskContextProvider. By keeping a weak trait we make sure we do not create a circular dependency between function that is internally holding on to the provider, which holds the task context, which holds the function.

What changes are included in this PR?

  • Introduce the TaskContextProvider trait
  • Implement FFI versions of TaskContext and TaskContextProvider.

This PR does not use these structures in the current code. That is coming as part of a later PR in an effort to keep the size of the PRs small for effective code review.

Are these changes tested?

Unit tests are added. Coverage report:

Screenshot 2025-11-25 at 2 36 57 PM

Are there any user-facing changes?

No.

@github-actions github-actions bot added core Core DataFusion crate execution Related to the execution crate ffi Changes to the ffi crate labels Nov 24, 2025
@timsaucer timsaucer changed the title Implement task context and task context provider Implement FFI task context and task context provider Nov 24, 2025
@timsaucer timsaucer marked this pull request as ready for review November 26, 2025 12:11
@timsaucer
Copy link
Member Author

@ntjohnson1 a good starter FFI issue for review

@timsaucer timsaucer requested a review from Copilot November 26, 2025 12:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces FFI support for TaskContext and a new TaskContextProvider trait to enable cross-boundary function encoding/decoding. The changes address the limitation where registered UDFs couldn't be properly encoded/decoded across FFI boundaries due to the use of default session contexts.

Key changes:

  • Added TaskContextProvider trait with implementations for SessionState and SessionContext
  • Created FFI wrapper structures (FFI_TaskContext and FFI_TaskContextProvider) for safe cross-boundary passing
  • Refactored session config handling to use direct SessionConfig instead of the wrapper type

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
datafusion/execution/src/task.rs Defines the new TaskContextProvider trait
datafusion/execution/src/lib.rs Exports the new TaskContextProvider trait
datafusion/core/src/execution/session_state.rs Implements TaskContextProvider for SessionState
datafusion/core/src/execution/context/mod.rs Implements TaskContextProvider for SessionContext
datafusion/ffi/src/session_config.rs Refactors to use SessionConfig directly and adds library marker support
datafusion/ffi/src/table_provider.rs Updates to use SessionConfig directly instead of wrapper
datafusion/ffi/src/execution/task_ctx.rs Implements FFI wrapper for TaskContext
datafusion/ffi/src/execution/task_ctx_provider.rs Implements FFI wrapper for TaskContextProvider with weak references
datafusion/ffi/src/execution/mod.rs Module definition for execution-related FFI types
datafusion/ffi/src/lib.rs Adds execution module to public API
datafusion/ffi/Cargo.toml Adds required dependencies for execution support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@ntjohnson1 ntjohnson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM a few small clarifying questions/clarifications

use datafusion_execution::config::SessionConfig;

/// A stable struct for sharing [`SessionConfig`] across FFI boundaries.
/// Instead of attempting to expose the entire SessionConfig interface, we
Copy link

@ntjohnson1 ntjohnson1 Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is no longer storing ConfigOptions as private data. So this doc string doesn't seem to be totally correct. It still uses config options to go from FFI_ to SessionConfig but doesn't convert to config options to get into FFI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further review I think the documentation is still correct. While we are no longer storing it as ConfigOptions in the private data, when we do pass it across the boundary we convert the session config into config options and pass those as described in the docstring.

/// interface.s
pub struct ForeignSessionConfig(pub SessionConfig);

impl TryFrom<&FFI_SessionConfig> for ForeignSessionConfig {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like maybe a note that the from_string approach allows directly converting rather than extracting with Foreign. This breaks a little bit from the pattern but definitely seems nicer here.

#[repr(C)]
#[derive(Debug, StableAbi)]
#[allow(non_camel_case_types)]
pub struct FFI_TaskContextProvider {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned this is used primarily for data coming across protobuf so should this be set more internally? Ex. pub(crate) or something. You mentioned usage in follow up so can review this point in context there if that makes more sense.

unsafe impl Sync for FFI_TaskContextProvider {}

struct TaskContextProviderPrivateData {
ctx: Weak<dyn TaskContextProvider>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly ok if it makes more sense to look at in follow up PR note here. Do you have a concrete code snippet example for the circular reference this breaks? I feel like a small comment with that might make this clearer for the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more text to the README and the follow on PRs should make it more abundantly clear.

}

unsafe extern "C" fn release_fn_wrapper(ctx: &mut FFI_TaskContext) {
let private_data = Box::from_raw(ctx.private_data as *mut TaskContextPrivateData);
Copy link
Contributor

@comphead comphead Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like in other PR, any chances of double free here for concurrent execution?
I can see

unsafe impl Send/Sync

but it is not guaranteed afaik?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FFI_TaskContext is not cloneable, so I don't know how that would happen. Maybe I'm missing something?

More generally, those other structs that are Clone I don't think could possibly double free because each one owns its own private data.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @timsaucer I think it is good and in the same consistent way like other PRs, my more curious questions on double free and if you planning to comment all pubs, it might be beneficial for auto generated docs


/// Internal data. This is only to be accessed by the provider of the plan.
/// The foreign library should never attempt to access this data.
pub private_data: *mut c_void,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but could this lead to a double free? for example, if FFI users clone the FFI_TaskContext, we might end up holding multiple references to private_data .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FFI_TaskContext does not implement Clone. Do you see somewhere else this could happen?

Copy link
Contributor

@renato2099 renato2099 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @timsaucer , I was thinking on explicitly forbidding cloning / copying, somthing like

impl !Clone for FFI_TaskContext {} 
impl !Copy for FFI_TaskContext {} 

I was thinking that worst case scenario , FFI users could do a memcopy of the struct , so that it would be somehow "safer" to have something like

pub struct FFI_TaskContext {
    // An opaque pointer to an Arc<TaskContextPrivateData>
    arc_ptr: *const Arc<TaskContextPrivateData>,
}

and then returning a raw pointer to inner data , something along these lines

    pub fn as_opaque(&self) -> *const c_void {
        self.arc_ptr as *const c_void
    }

let udf = <Arc<dyn ScalarUDFImpl>>::try_from(&kv_pair.1);

if let Err(err) = &udf {
log::error!("Unable to create WindowUDF in FFI: {err}")
Copy link
Contributor

@renato2099 renato2099 Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question on the expected behavior here, let's say that we fail while attempting try_From , could not failing loud cause confusion/bugs of functions not found/missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent question! In reviewing I discovered that I had some intermediate work where my conversion of FFI UDFs was fallible but the final versions were infallible! I updated the traits to use From<> instead of TryFrom<> so this is no longer an issue in this code snippet. Thank you for bringing it up!

@timsaucer timsaucer force-pushed the feat/2-ffi-implement-task-ctx-provider branch from 867398f to 053ef21 Compare December 3, 2025 12:26
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 3, 2025
@timsaucer
Copy link
Member Author

Thank you for the reviews @ntjohnson1 @renato2099 @comphead ! I've rebased now that the other PR went in and made some adjustments based on the feedback.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @timsaucer for driving this and appreciate @renato2099 @ntjohnson1 for the review

@timsaucer timsaucer added this pull request to the merge queue Dec 3, 2025
Merged via the queue into apache:main with commit 0c6b654 Dec 3, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate documentation Improvements or additions to documentation execution Related to the execution crate ffi Changes to the ffi crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants