Skip to content

Conversation

Joe-Abraham
Copy link
Contributor

@Joe-Abraham Joe-Abraham commented Oct 6, 2025

Description

Fixes

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 6, 2025
Copy link
Contributor

sourcery-ai bot commented Oct 6, 2025

Reviewer's Guide

Implements catalog-scoped function metadata retrieval and registration by extending the native sidecar API, HTTP endpoint, client provider, connector integration for Hive functions, OpenAPI spec, and corresponding tests to support custom schemas.

Class diagram for NativeFunctionDefinitionProvider and catalog-scoped retrieval

classDiagram
  class NativeFunctionDefinitionProvider {
    - JsonCodec nativeFunctionSignatureMapJsonCodec
    - HttpClient httpClient
    - NativeFunctionNamespaceManagerConfig config
    - String catalogName
    + NativeFunctionDefinitionProvider(..., String catalogName)
    + UdfFunctionSignatureMap getUdfDefinition(NodeManager)
  }
  class UdfFunctionSignatureMap {
    + UdfFunctionSignatureMap(Map<String, List<JsonBasedUdfFunctionMetadata>>)
  }
  NativeFunctionDefinitionProvider --> UdfFunctionSignatureMap
Loading

File-Level Changes

Change Details Files
Add catalog-filtered metadata retrieval
  • Overload getFunctionsMetadata to accept a catalog parameter
  • Iterate and filter scalar, aggregate, and window functions by catalog using name parts
  • Exclude internal, blocklisted, and companion functions
presto-native-execution/presto_cpp/main/functions/FunctionMetadata.cpp
presto-native-execution/presto_cpp/main/functions/FunctionMetadata.h
Expose catalog-scoped functions endpoint in sidecar
  • Register new GET /v1/functions/{catalog} route in PrestoServer
  • Invoke getFunctionsMetadata(catalog) in the request handler
presto-native-execution/presto_cpp/main/PrestoServer.cpp
Document new catalog endpoint in OpenAPI spec
  • Add /v1/functions/{catalog} path with catalog path parameter
  • Define 200 (UdfSignatureMap) and 404 responses in rest_function_server.yaml
presto-openapi/src/main/resources/rest_function_server.yaml
Consume catalog parameter in NativeFunctionDefinitionProvider
  • Inject catalogName into provider and store it
  • Construct /v1/functions/{catalog} URI for HTTP request
  • Update exception message and remove fallback to unfiltered endpoint
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/functionNamespace/NativeFunctionDefinitionProvider.java
Integrate Hive-specific function registration module
  • Create InitCapFunction and HiveFunctionRegistration for hive.default namespace
  • Call registerHiveNativeFunctions in PrestoToVeloxConnector for hive connectors
  • Update CMakeLists to include presto_hive_functions library and hive subdirectory
presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp
presto-native-execution/presto_cpp/main/connectors/CMakeLists.txt
presto-native-execution/presto_cpp/main/connectors/hive/functions/InitcapFunction.h
presto-native-execution/presto_cpp/main/connectors/hive/functions/HiveFunctionRegistration.cpp
presto-native-execution/presto_cpp/main/connectors/hive/functions/HiveFunctionRegistration.h
presto-native-execution/presto_cpp/main/connectors/hive/CMakeLists.txt
presto-native-execution/presto_cpp/main/connectors/hive/functions/CMakeLists.txt
Add tests and plugin util updates for catalog support
  • Add unit tests for getFunctionsMetadata with valid and non-existent catalogs
  • Extend NativeSidecarPluginQueryRunnerUtils to load hive catalog manager
presto-native-execution/presto_cpp/main/functions/tests/FunctionMetadataTest.cpp
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/NativeSidecarPluginQueryRunnerUtils.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@Joe-Abraham Joe-Abraham changed the title feat(native): support custom schemas in native sidecar function registry feat(native): Support custom schemas in native sidecar function registry Oct 7, 2025
@Joe-Abraham
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The new GET /v1/functions/{catalog} handler always returns 200 with null or empty JSON even for non‐existent catalogs—modify it to return a 404 when no functions are found to match the OpenAPI spec.
  • The /v1/functions/{catalog} path conflicts with the existing /v1/functions/{schema} route—consider renaming or ordering the handlers to avoid ambiguous routing.
  • The hard-coded blocklist in getFunctionsMetadata is scattered in the implementation—extract it into a shared constant or make it configurable to simplify future updates.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new GET /v1/functions/{catalog} handler always returns 200 with null or empty JSON even for non‐existent catalogs—modify it to return a 404 when no functions are found to match the OpenAPI spec.
- The `/v1/functions/{catalog}` path conflicts with the existing `/v1/functions/{schema}` route—consider renaming or ordering the handlers to avoid ambiguous routing.
- The hard-coded blocklist in getFunctionsMetadata is scattered in the implementation—extract it into a shared constant or make it configurable to simplify future updates.

## Individual Comments

### Comment 1
<location> `presto-native-execution/presto_cpp/main/functions/FunctionMetadata.cpp:341` </location>
<code_context>
+      continue;
+    }
+
+    const auto parts = getFunctionNameParts(name);
+    if (parts[0] != catalog) {
+      continue;
</code_context>

<issue_to_address>
**issue:** Potential out-of-bounds access in 'parts' array.

Validate that 'parts' contains at least three elements before accessing indices 1 and 2 to prevent undefined behavior.
</issue_to_address>

### Comment 2
<location> `presto-native-execution/presto_cpp/main/connectors/PrestoToVeloxConnector.cpp:63-65` </location>
<code_context>
+
+  // Register hive-specific functions when hive catalog is detected.
+  // Delegate to generic Hive native function registrar which is idempotent.
+  if (connectorName ==
+          velox::connector::hive::HiveConnectorFactory::kHiveConnectorName ||
+      connectorName == std::string("hive-hadoop2")) {
+    hive::functions::registerHiveNativeFunctions();
+  }
</code_context>

<issue_to_address>
**suggestion:** Connector name comparison may be brittle.

Hardcoding connector names increases maintenance risk if new variants are added. Centralize connector name definitions or use a more robust matching approach.

Suggested implementation:

```cpp
  // Register hive-specific functions when hive catalog is detected.
  // Delegate to generic Hive native function registrar which is idempotent.
  if (isHiveConnector(connectorName)) {
    hive::functions::registerHiveNativeFunctions();
  }

```

```cpp
#include "velox/functions/FunctionRegistry.h"
#include <unordered_set>

```

```cpp
      connectorName);
  protocol::registerConnectorProtocol(
      connectorName, std::move(connectorProtocol));

// Centralized Hive connector name definitions.
namespace {
const std::unordered_set<std::string> kHiveConnectorNames = {
    velox::connector::hive::HiveConnectorFactory::kHiveConnectorName,
    "hive-hadoop2"
};

bool isHiveConnector(const std::string& connectorName) {
  return kHiveConnectorNames.count(connectorName) > 0;
}
} // namespace

```
</issue_to_address>

### Comment 3
<location> `presto-native-execution/presto_cpp/main/connectors/hive/functions/InitcapFunction.h:30` </location>
<code_context>
+struct InitCapFunction {
+  VELOX_DEFINE_FUNCTION_TYPES(T);
+
+  static constexpr bool is_default_ascii_behavior = true;
+
+  FOLLY_ALWAYS_INLINE void call(
</code_context>

<issue_to_address>
**nitpick:** Unused 'is_default_ascii_behavior' constant.

If this constant is not needed, please remove it. If it is reserved for future use or external access, add a comment explaining its purpose.
</issue_to_address>

### Comment 4
<location> `presto-native-execution/presto_cpp/main/functions/tests/FunctionMetadataTest.cpp:117` </location>
<code_context>
   testFunction("variance", "Variance.json", 5);
 }
+
+TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithCatalog) {
+  // Test with the "presto" catalog that is registered in SetUpTestSuite
+  std::string catalog = "presto";
</code_context>

<issue_to_address>
**suggestion (testing):** Missing test coverage for custom schemas and non-default schemas.

Please add tests for functions registered under non-default schemas to ensure getFunctionsMetadata returns correct metadata for those cases.

Suggested implementation:

```cpp
TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithCatalog) {
  // Test with the "presto" catalog that is registered in SetUpTestSuite
  std::string catalog = "presto";
  auto metadata = getFunctionsMetadata(catalog);

  // The result should be a JSON object with function names as keys
  ASSERT_TRUE(metadata.is_object());
  ASSERT_FALSE(metadata.empty());

  // Verify that common functions are present
  ASSERT_TRUE(metadata.contains("abs"));
  ASSERT_TRUE(metadata.contains("mod"));
}

// Register a function under a custom schema for testing purposes.
TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithCustomSchema) {
  // Assume registerFunction is available for registering test functions.
  // Register a function "custom_func" under schema "custom_schema".
  std::string customCatalog = "presto";
  std::string customSchema = "custom_schema";
  std::string functionName = "custom_func";
  // The registration API may differ; adjust as needed for your codebase.
  registerFunction(customCatalog, customSchema, functionName, /*function implementation*/ nullptr);

  auto metadata = getFunctionsMetadata(customCatalog);

  // The result should include the custom function under the custom schema.
  ASSERT_TRUE(metadata.is_object());
  ASSERT_TRUE(metadata.contains(functionName));
  // Optionally, check that the schema is correct in the metadata.
  ASSERT_EQ(metadata[functionName]["schema"], customSchema);
}

// Register a function under another non-default schema for additional coverage.
TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithNonDefaultSchema) {
  std::string catalog = "presto";
  std::string nonDefaultSchema = "analytics";
  std::string functionName = "analytics_func";
  registerFunction(catalog, nonDefaultSchema, functionName, /*function implementation*/ nullptr);

  auto metadata = getFunctionsMetadata(catalog);

  ASSERT_TRUE(metadata.is_object());
  ASSERT_TRUE(metadata.contains(functionName));
  ASSERT_EQ(metadata[functionName]["schema"], nonDefaultSchema);
}

```

- Ensure that the `registerFunction` API exists and is accessible in your test environment. If not, you may need to use the actual function registration mechanism used in your codebase.
- If the metadata structure differs (e.g., schema is not a direct property), adjust the assertions accordingly.
- If setup/teardown is required for custom functions, add appropriate cleanup code.
</issue_to_address>

### Comment 5
<location> `presto-native-execution/presto_cpp/main/functions/tests/FunctionMetadataTest.cpp:150` </location>
<code_context>
+  }
+}
+
+TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithNonExistentCatalog) {
+  // Test with a catalog that doesn't exist
+  std::string catalog = "nonexistent";
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for error conditions or malformed catalog names.

Please include test cases for malformed catalog names, such as empty strings or special characters, to verify robust error handling.
</issue_to_address>

### Comment 6
<location> `presto-native-execution/presto_cpp/main/functions/tests/FunctionMetadataTest.cpp:143` </location>
<code_context>
+      ASSERT_TRUE(signature.contains("functionKind"))
+          << "Function: " << it.key();
+
+      // Schema should be "default" since we registered with "presto.default."
+      // prefix
+      EXPECT_EQ(signature["schema"], "default") << "Function: " << it.key();
</code_context>

<issue_to_address>
**nitpick (testing):** Test assertions could be more robust for schema values.

Hardcoding 'default' for the schema may cause the test to fail if registration logic changes or custom schemas are used. Parameterize the expected schema or derive it from the registration logic to improve test resilience.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

continue;
}

const auto parts = getFunctionNameParts(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Potential out-of-bounds access in 'parts' array.

Validate that 'parts' contains at least three elements before accessing indices 1 and 2 to prevent undefined behavior.

struct InitCapFunction {
VELOX_DEFINE_FUNCTION_TYPES(T);

static constexpr bool is_default_ascii_behavior = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Unused 'is_default_ascii_behavior' constant.

If this constant is not needed, please remove it. If it is reserved for future use or external access, add a comment explaining its purpose.

ASSERT_TRUE(signature.contains("functionKind"))
<< "Function: " << it.key();

// Schema should be "default" since we registered with "presto.default."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (testing): Test assertions could be more robust for schema values.

Hardcoding 'default' for the schema may cause the test to fail if registration logic changes or custom schemas are used. Parameterize the expected schema or derive it from the registration logic to improve test resilience.

@Joe-Abraham Joe-Abraham marked this pull request as ready for review October 8, 2025 11:54
@Joe-Abraham Joe-Abraham requested review from pdabre12 and a team as code owners October 8, 2025 11:54
@prestodb-ci prestodb-ci requested review from a team and sh-shamsan and removed request for a team October 8, 2025 11:54
Copy link
Contributor

@pdabre12 pdabre12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Joe-Abraham ,
From a high-level, the changes look good to me. Will do a more thorough analysis later.

Can you write a small RFC so we can get more feedback on the architecture?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants