Skip to content

Conversation

@lovincyrus
Copy link
Contributor

@lovincyrus lovincyrus commented Nov 19, 2025

This PR supports explicit credentials for S3 connector. Closes https://linear.app/rilldata/issue/APP-396/s3-explicit-credential-flow-for-models

CleanShot 2025-11-21 at 13 36 35@2x CleanShot 2025-11-21 at 13 37 39@2x

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@lovincyrus lovincyrus mentioned this pull request Nov 19, 2025
8 tasks
@lovincyrus lovincyrus self-assigned this Nov 20, 2025
@lovincyrus lovincyrus force-pushed the cyrus/s3-explicit-credentials-1 branch from b347232 to d3448f5 Compare November 21, 2025 04:31
@lovincyrus lovincyrus requested a review from royendo November 21, 2025 05:38
@lovincyrus lovincyrus marked this pull request as ready for review November 21, 2025 05:38
Copy link
Contributor

@royendo royendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice,
Getting Strange Behavior, likely because PLAT-278 was just merged but not applied.

steps to reproduce.

  1. create s3 cred with wrong creds
  2. create s3 cred with right creds
  3. create model with appropriate bucket path that cred_2 can access but it fails due likley using cred 1 in the backend.

Did a quick test, lets go ahead and add the
create_secrets_from_connectors: s3 key into the YAML, solves the above issue. (this will be for all cloud storage: azure, s3, gcs)

Second issue is the name of the YAML. looks like its iterating on s3, s4 due to the integer.
Screenshot 2025-11-21 at 17 20 19

https://jam.dev/c/f3536ced-5c16-411f-b6fb-add0f9b3dcfb

Other than that looks good! public buckets working too.

@lovincyrus lovincyrus force-pushed the cyrus/s3-explicit-credentials-1 branch from d3448f5 to 3604ff3 Compare November 24, 2025 09:36
@lovincyrus
Copy link
Contributor Author

lovincyrus commented Nov 24, 2025

Second issue is the name of the YAML. looks like its iterating on s3, s4 due to the integer.
Screenshot 2025-11-21 at 17 20 19

Fixed in a38d1df

@lovincyrus
Copy link
Contributor Author

Getting Strange Behavior, likely because PLAT-278 was just merged but not applied.

steps to reproduce.

  1. create s3 cred with wrong creds
  2. create s3 cred with right creds
  3. create model with appropriate bucket path that cred_2 can access but it fails due likley using cred 1 in the backend.

Did a quick test, lets go ahead and add the create_secrets_from_connectors: s3 key into the YAML, solves the above issue. (this will be for all cloud storage: azure, s3, gcs)

Is this part of the scope of the ticket? I don't see frontend changes for https://linear.app/rilldata/issue/PLAT-278/support-create-secrets-from-connectors-for-duckdb @royendo

Copy link
Contributor

royendo commented Nov 24, 2025

adding frontend changes isn't part of the scope in PLAT-278.

Depending on the timeline, i was going to have it added on APP-467 since this is when we'll revisit cloud connector storage and model flows. However, if its a trivial add to append the connector name under the create_secrets_from_connectors on these PRs, please do so 🙂

@lovincyrus lovincyrus force-pushed the cyrus/s3-explicit-credentials-1 branch from 9135a0e to d80c37b Compare November 25, 2025 03:45
@lovincyrus
Copy link
Contributor Author

lovincyrus commented Nov 25, 2025

adding frontend changes isn't part of the scope in PLAT-278.Depending on the timeline, i was going to have it added on APP-467 since this is when we'll revisit cloud connector storage and model flows. However, if its a trivial add to append the connector name under the create_secrets_from_connectors on these PRs, please do so 🙂

Added in d80c37b

Can you also provide a brief explanation of when the create_secrets_from_connectors: s3 function should be included in a S3 model file? If no S3 connector file is created, should we omit the create_secrets_from_connectors function from the model file?

@lovincyrus lovincyrus requested a review from royendo November 25, 2025 03:46
@lovincyrus lovincyrus mentioned this pull request Nov 25, 2025
8 tasks
Copy link
Contributor

@royendo royendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also provide a brief explanation of when the create_secrets_from_connectors: s3 function should be included in a S3 model file? If no S3 connector file is created, should we omit the create_secrets_from_connectors function from the model file?

So basically today, we(duck) does some magic to select which credentials to use if you have multiple secrets defined. In my QA i had two connectors and it was 50/50 on which connector it wanted to use. so we add it in on Rill side to ensure its using the correct one.

Id pick to not add anything to the YAML for public sources. This makes the logic easier as we wont need to consider "which one should we add?"

A few things i noticed:

@lovincyrus
Copy link
Contributor Author

lovincyrus commented Nov 25, 2025

We currently don’t know which connector instance name to use until submission because the lookup happens asynchronously during submission.

Update: Oh! I think I'm misunderstanding this flow. We will get the connector instance name when user is in the multi step form.

Update: Added the fix fb5605b

@lovincyrus lovincyrus force-pushed the cyrus/s3-explicit-credentials-1 branch from d0aa62e to fb5605b Compare November 25, 2025 16:56
@royendo royendo self-requested a review November 25, 2025 17:30
Copy link
Contributor

@royendo royendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost!

one strange behavior i see is that
create_secrets_from_connectors: s3
works even when there is no connector present.

@ericpgreen2
Copy link
Contributor

ericpgreen2 commented Nov 25, 2025

Code Review: Abstractions and Function Boundaries

This PR introduces good patterns but has several areas that need attention before merge. Focusing on code structure and boundaries:

🚨 Blocking Issues

1. Validation Logic Broken (web-common/src/features/sources/modal/yupSchemas.ts:8-18)

All S3 fields are now optional, which is incorrect:

s3: yup.object().shape({
  aws_access_key_id: yup.string().optional(), // Should be required in step 1
  aws_secret_access_key: yup.string().optional(), // Should be required in step 1
  path: yup.string().matches(...).optional(), // Should be required in step 2
  name: yup.string().matches(...).optional(), // Should be required in step 2
}),

Solution: Create separate schemas for each step:

s3_connector: yup.object().shape({
  aws_access_key_id: yup.string().required("AWS access key ID is required"),
  aws_secret_access_key: yup.string().required("AWS secret access key is required"),
}),
s3_source: yup.object().shape({
  path: yup.string().matches(/^s3:\/\//, "Must be an S3 URI").required(),
  name: yup.string().matches(VALID_NAME_PATTERN, INVALID_NAME_MESSAGE).required(),
}),

2. Missing Test Coverage (Checklist Item Unchecked)

The multi-step flow introduces complex state management that needs testing:

  • Connector instance name resolution with fallbacks
  • State transitions between steps
  • Edge cases: no matching connectors, network failures, concurrent submissions

🔧 Function Boundary Issues

3. AddDataFormManager.getYamlPreview() Has Mixed Responsibilities (web-common/src/features/sources/modal/AddDataFormManager.ts:395-520)

This function does too many things:

  • State inspection and merging
  • ClickHouse special-casing
  • Multi-step vs single-step logic
  • Connector rewriting
  • YAML generation

Current structure (simplified):

getYamlPreview(ctx: {...}) {
  // 1. Handle ClickHouse special case
  if (isClickhouse) { ... }
  
  // 2. Handle multi-step connectors
  if (isMultiStepConnector && stepState?.step === "source") { ... }
  
  // 3. Rewrite connector if needed
  const rewritten = rewriteConnectorToLocal(...)
  
  // 4. Generate YAML
  return compileYAML(...)
}

Suggested refactor: Break into specialized functions:

private prepareYamlPreviewValues(ctx: PreviewContext): PreparedValues {
  // Handle all state inspection and value merging
}

private getMultiStepSourceYamlPreview(values: PreparedValues): string {
  // Multi-step specific logic
}

private getClickHouseYamlPreview(values: PreparedValues): string {
  // ClickHouse specific logic
}

getYamlPreview(ctx: PreviewContext): string {
  // Route to appropriate handler
}

4. submitAddSourceForm() Mixes Concerns (web-common/src/features/sources/modal/submitAddDataForm.ts:190-250)

Function now handles 4 distinct responsibilities:

  1. Form data preparation
  2. Connector instance name resolution (new complex logic)
  3. File creation
  4. Reconciliation waiting

The connector instance resolution block (lines 202-220) should be extracted:

async function getConnectorInstanceForSource(
  queryClient: QueryClient,
  instanceId: string,
  connector: V1ConnectorDriver,
  stepState: ConnectorStepState | null
): Promise<string | undefined> {
  // Consolidate all the instance name resolution logic here
}

5. Component Duplication: S3MultiStepForm.svelte vs GCSMultiStepForm.svelte

These two components are nearly identical (only differ in which keys to exclude). This violates DRY and makes maintenance harder.

Solution: Create a shared MultiStepConnectorForm.svelte:

<script lang="ts">
  export let connector: string; // "gcs" | "s3"
  export let properties: any[];
  export let paramsForm: any;
  export let paramsErrors: Record<string, any>;
  export let onStringInputChange: (e: Event) => void;
  
  // Define exclusions based on connector type
  const excludeKeys = ["path", "name"]; // Common exclusions
</script>

Then replace both forms with this shared component.


⚠️ Error Handling Gaps

6. Silent Failure in Connector Instance Lookup (submitAddDataForm.ts:180-183)

} catch {
  // If lookup fails, return undefined and rely on auto-detection.
  return undefined;
}

This fails silently. Users won't know if the wrong connector is being used or if there's a configuration issue.

Add logging:

} catch (error) {
  console.warn(`Failed to resolve ${driverName} connector instance:`, error);
  return undefined;
}

7. Race Condition in Submission Tracking (submitAddDataForm.ts:394-401)

if (existingSubmission.completed) {
  await waitForResourceReconciliation(...);
  return newConnectorName;
} else if (!existingSubmission.completed) {
  await existingSubmission.promise;
  return existingSubmission.connectorName;
}

If existingSubmission completes between the first check and the else if, behavior is undefined. Simplify:

if (existingSubmission.completed) {
  // Already done
} else {
  // Not done, wait for it
  await existingSubmission.promise;
}
return existingSubmission.connectorName;

📋 Action Items Before Merge

  1. Fix validation schemas for multi-step S3 forms
  2. Add error logging for connector instance lookup failures
  3. Fix race condition in submission tracking
  4. Add tests for multi-step flow and edge cases
  5. Refactor getYamlPreview() to extract specialized functions
  6. Consolidate S3MultiStepForm and GCSMultiStepForm into shared component
  7. Extract connector instance resolution into helper function
  8. Add JSDoc for submitAddConnectorForm return value change

Developed in collaboration with Claude Code and Eric

Copy link
Contributor

@ericpgreen2 ericpgreen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some specific comments below, and also see the Claude Code comment above. Most importantly, please add e2e tests for the S3 flow.

Description: "Optional external ID to use when assuming an AWS role for cross-account access.",
},
},
SourceProperties: []*drivers.PropertySpec{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these used anymore? Can we delete them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're still being used.

CleanShot 2025-12-09 at 00 12 17@2x

})
.join("\n");

// Add create_secrets_from_connectors for S3 models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused that we're creating model YAML within a compileSourceYAML function. To match the compileConnectorYAML function, shouldn't we have a compileModelYAML function that accepts the connectorName as a parameter? Then we can avoid passing a connectorInstanceName to this compileSourceYAML function, which is quite confusing.

Copy link
Contributor Author

@lovincyrus lovincyrus Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compileSourceYAML is the old naming convention. I've cleaned it up in this commit 5c7d43e

}
}

async function resolveConnectorInstanceName(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a function-level comment to describe what this does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we don't have a formalized concept of a connector instance. So it's a bit confusing to see that terminology introduced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanShot 2025-12-09 at 00 38 44@2x

@lovincyrus
Copy link
Contributor Author

Superseded by #8467

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants