@KKould KKould commented Nov 11, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

python: databendlabs/databend-udf#16

Adds support for remote external UDTFs (table functions served by an external UDF server) and for STAGE_LOCATION as a parameter type. Example:

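# StageLocation and udf come from the databend_udf Python package
# (see databendlabs/databend-udf#16); _stage_bucket is a helper not shown here.
@udf(
    # Stage parameters are listed by name; input_types covers the other arguments.
    stage_refs=["data_stage"],
    input_types=["INT"],
    # A list of (column name, type) pairs declares the returned table's columns.
    result_type=[
        ("stage_name", "VARCHAR"),
        ("stage_type", "VARCHAR"),
        ("bucket", "VARCHAR"),
        ("relative_path", "VARCHAR"),
        ("value", "INT"),
        ("summary", "VARCHAR"),
    ],
)
def stage_summary_udtf(data_stage: StageLocation, value: int):
    assert data_stage.stage_type.lower() == "external"
    assert data_stage.storage
    bucket = _stage_bucket(data_stage)
    summary = f"{data_stage.stage_name}:{bucket}:{data_stage.relative_path}:{value}"
    # Each dict in the returned list is one output row.
    return [
        {
            "stage_name": data_stage.stage_name or "",
            "stage_type": data_stage.stage_type or "",
            "bucket": bucket,
            "relative_path": data_stage.relative_path or "",
            "value": value,
            "summary": summary,
        }
    ]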

CREATE OR REPLACE FUNCTION stage_summary_udtf(data_stage STAGE_LOCATION, arg int)
    RETURNS TABLE (stage_name varchar, stage_type varchar, bucket varchar, relative_path varchar, value int, summary varchar)
    LANGUAGE python
    HANDLER = 'stage_summary_udtf'
    HEADERS = ('X-Authorization' = '123')
    ADDRESS = 'http://0.0.0.0:8815';

SELECT * from stage_summary_udtf(@s3_stage/output/2024, 21)
----
s3_stage External test output/2024 21 s3_stage:test:output/2024:21

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Nov 11, 2025
@KKould KKould requested a review from BohuTANG November 14, 2025 16:33
@KKould KKould marked this pull request as ready for review November 14, 2025 16:38
@KKould KKould requested a review from drmingdrmer as a code owner November 14, 2025 16:38
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

pub async fn fold_udf_server(
    &mut self,
    name: &str,
    args: Vec<Scalar>,
    udf_definition: UDFServer,
) -> Result<Scalar> {
    let mut block_entries = Vec::with_capacity(args.len());
    for (arg, dest_type) in args.into_iter().zip(
        udf_definition
            .arg_types
            .iter()
            .filter(|ty| ty.remove_nullable() != DataType::StageLocation),
    ) {
        if matches!(dest_type, DataType::StageLocation) {
            continue;
        }
        let entry = BlockEntry::new_const_column(dest_type.clone(), arg, 1);
        block_entries.push(entry);

P1: Filtered arg types desynchronise UDF constant-fold arguments

When folding immutable server UDFs, the loop builds the Flight arguments by zipping the full args list with udf_definition.arg_types.iter().filter(|ty| ty.remove_nullable() != DataType::StageLocation). Because only the type iterator is filtered, any StageLocation arguments remain in args and are paired with the next non-stage type. A stage argument is therefore sent with the wrong type, every later argument is paired with the type of a later parameter, and one trailing argument is dropped per stage parameter because zip stops at the shorter iterator. The matches!(dest_type, DataType::StageLocation) guard cannot compensate, since the filter has already removed every stage type. For immutable UDFs that accept a stage location, constant folding will call the external server with misordered or missing parameters, producing incorrect results or server-side errors. The loop should filter both sequences consistently (e.g. zip the unfiltered lists and continue when the type is StageLocation).
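
The misalignment described in this comment can be reproduced in isolation. The following is a minimal sketch, not the engine's code: Ty and Arg are hypothetical stand-ins for DataType and Scalar, both function names are invented for illustration, and it assumes (as the comment does) that args still contains the stage arguments when the loop runs.

```rust
// Hypothetical stand-ins for the engine's DataType and Scalar.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    StageLocation,
    Int,
    Varchar,
}

#[derive(Clone, Debug, PartialEq)]
enum Arg {
    Stage(&'static str),
    Int(i64),
    Str(&'static str),
}

/// Mirrors the reviewed loop: the full argument list is zipped with a
/// type list from which stage types have been filtered out.
fn pair_filtering_types_only(args: &[Arg], types: &[Ty]) -> Vec<(Arg, Ty)> {
    args.iter()
        .cloned()
        .zip(types.iter().filter(|ty| **ty != Ty::StageLocation).cloned())
        .collect()
}

/// The shape suggested in the review: zip the unfiltered lists first,
/// then skip every pair whose declared type is a stage location.
fn pair_skipping_stage_pairs(args: &[Arg], types: &[Ty]) -> Vec<(Arg, Ty)> {
    args.iter()
        .cloned()
        .zip(types.iter().cloned())
        .filter(|(_, ty)| *ty != Ty::StageLocation)
        .collect()
}
```

With types [StageLocation, Int, Varchar] and args [Stage("s3_stage"), Int(21), Str("x")], the first function pairs the stage argument with Int and 21 with Varchar, and drops "x"; the second yields (21, Int) and ("x", Varchar). If the caller already strips stage arguments before folding, the original filter would line up correctly, so the sketch applies only under the stated assumption.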
