Adds S3 source connector #659
base: main
This commit introduces a new S3 source connector that monitors an S3 bucket for new files and streams their content to a Kafka topic. The connector supports:
- Configuration via environment variables
- File monitoring with a configurable polling interval
- Optional content download and chunking for large files
- Extraction of file metadata

This replaces the previous S3 source implementation with a custom file watcher for more flexibility and control over the data ingestion process.
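The two core behaviours described above, detecting files that are new since the last poll and splitting large content into message-sized chunks, can be sketched as follows. This is a minimal sketch assuming a Python implementation; `find_new_keys` and `chunk_content` are hypothetical helpers, not the connector's actual code, and treating "MB" as MiB (1024 × 1024 bytes) is also an assumption. In practice the key listing would come from S3 itself (for example boto3's `list_objects_v2`).

```python
from typing import Iterable, Iterator, List, Set


def find_new_keys(listed_keys: Iterable[str], seen: Set[str]) -> List[str]:
    """Return keys not seen on earlier polls and mark them as seen (hypothetical helper)."""
    # dict.fromkeys drops duplicates while preserving listing order
    new = [key for key in dict.fromkeys(listed_keys) if key not in seen]
    seen.update(new)
    return new


def chunk_content(data: bytes, max_mb_per_message: float) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most max_mb_per_message MiB (assumed unit)."""
    max_bytes = int(max_mb_per_message * 1024 * 1024)
    if max_bytes <= 0:
        raise ValueError("max_mb_per_message must be positive")
    # An empty file yields no chunks; a real connector might emit one empty message instead.
    for start in range(0, len(data), max_bytes):
        yield data[start:start + max_bytes]


if __name__ == "__main__":
    seen: Set[str] = set()
    print(find_new_keys(["a.csv", "b.csv"], seen))           # ['a.csv', 'b.csv']
    print(find_new_keys(["a.csv", "b.csv", "c.csv"], seen))  # ['c.csv']
    print([len(c) for c in chunk_content(b"x" * 3_000_000, 1)])  # [1048576, 1048576, 902848]
```

Keeping the `seen` set in memory means a restart would re-emit every existing file; a production watcher would likely persist its state or compare object timestamps instead.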
Updates the S3 source configuration to remove the `app.yaml` file and consolidate its settings into `library.json`. Configures the deployment type as 'Service' and increases the CPU and memory allocation. Adds the `MAX_MB_PER_MESSAGE` and `DOWNLOAD_CONTENT` environment variables.
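Because `library.json` stores defaults as strings (the diff excerpt shows `"DefaultValue": "False"`), the new variables need explicit parsing. The sketch below assumes a Python connector; `parse_bool`, `read_content_settings`, and `HYPOTHETICAL_DEFAULT_MAX_MB` are illustrative names, and the fallback of 1 MB is a placeholder, since the actual default for `MAX_MB_PER_MESSAGE` is not stated here.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback: the PR changes MAX_MB_PER_MESSAGE's default, but the
# value is not shown, so this placeholder is an assumption.
HYPOTHETICAL_DEFAULT_MAX_MB = 1.0


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret strings like "True"/"False" (the style of the DefaultValue fields)."""
    if value is None or not value.strip():
        return default
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no"):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def read_content_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two content-related variables added in this commit (hypothetical helper)."""
    raw_max_mb = env.get("MAX_MB_PER_MESSAGE")
    max_mb = float(raw_max_mb) if raw_max_mb else HYPOTHETICAL_DEFAULT_MAX_MB
    if max_mb <= 0:
        raise ValueError("MAX_MB_PER_MESSAGE must be positive")
    return {
        "download_content": parse_bool(env.get("DOWNLOAD_CONTENT"), default=False),
        "max_mb_per_message": max_mb,
    }
```

Explicit parsing matters because in Python `bool("False")` is `True`: any non-empty string is truthy, so passing the raw variable to `bool()` would enable downloads by default.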
"Required": false
},
{
"Name": "DOWNLOAD_CONTENT",
You are missing POLL_INTERVAL_SECONDS
"DefaultValue": "False",
"Required": false
}
],
There is no longer an S3_FILE_FORMAT parameter.
"Description": "The type of file compression used for the files",
"DefaultValue": "gzip",
"Required": true
},
S3_FOLDER_PREFIX is wrong
Refactors S3 source configuration by renaming `S3_FOLDER_PATH` to `S3_FOLDER_PREFIX` and making it optional. Removes the required `S3_FILE_FORMAT` and `S3_FILE_COMPRESSION` parameters. Updates default value for `MAX_MB_PER_MESSAGE` and `DOWNLOAD_CONTENT`. Adds `POLL_INTERVAL_SECONDS` for configuring poll frequency.
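With `S3_FOLDER_PREFIX` now optional and `POLL_INTERVAL_SECONDS` added, the settings could be loaded roughly as below. This is a sketch under stated assumptions: a Python connector, the hypothetical names `S3SourceSettings`, `load_settings`, and `keys_under_prefix`, and a placeholder poll default of 10 seconds, since the PR text does not give one. An empty prefix is treated as "watch the whole bucket".

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping

# Hypothetical fallback; the PR text does not state POLL_INTERVAL_SECONDS's default.
HYPOTHETICAL_DEFAULT_POLL_SECONDS = 10.0


@dataclass(frozen=True)
class S3SourceSettings:
    folder_prefix: str            # "" means "watch the whole bucket"
    poll_interval_seconds: float


def load_settings(env: Mapping[str, str]) -> S3SourceSettings:
    """Build settings from env vars; S3_FOLDER_PREFIX is optional."""
    prefix = env.get("S3_FOLDER_PREFIX", "").strip()
    raw_interval = env.get("POLL_INTERVAL_SECONDS")
    interval = float(raw_interval) if raw_interval else HYPOTHETICAL_DEFAULT_POLL_SECONDS
    if interval <= 0:
        raise ValueError("POLL_INTERVAL_SECONDS must be positive")
    return S3SourceSettings(folder_prefix=prefix, poll_interval_seconds=interval)


def keys_under_prefix(keys: Iterable[str], prefix: str) -> List[str]:
    """S3 has no real folders: a 'folder' is just a shared key prefix."""
    # str.startswith("") is always True, so an empty prefix keeps every key
    return [key for key in keys if key.startswith(prefix)]
```

The client-side filter is only for illustration; with boto3 the same restriction is normally applied server-side by passing `Prefix=` to `list_objects_v2`, and sleeping `poll_interval_seconds` between listings gives the polling loop.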