@JotaBlanco
Collaborator

This commit introduces a new S3 source connector that monitors an S3 bucket for new files and streams their content to a Kafka topic.

The connector supports:

  • Configuration via environment variables
  • File monitoring with configurable polling interval
  • Optional content download and chunking for large files
  • Metadata extraction for file information

This replaces the previous S3 source implementation with a custom file watcher for more flexibility and control over the data ingestion process.
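
For readers of this thread, here is a minimal sketch of the two ideas described above: detecting files that were not present on the previous poll, and splitting downloaded content into size-limited chunks. It is not the code in this PR. The function names `find_new_keys` and `chunk_bytes` are hypothetical, and the S3 listing call is left out so the example stays self-contained.

```python
from typing import Iterable, Iterator, List, Set


def find_new_keys(listed_keys: Iterable[str], seen: Set[str]) -> List[str]:
    """Return keys that have not been seen on earlier polls.

    `seen` is updated in place so the next poll skips these keys.
    (Hypothetical helper, not taken from the PR.)
    """
    new_keys: List[str] = []
    for key in listed_keys:
        if key not in seen:
            seen.add(key)
            new_keys.append(key)
    return new_keys


def chunk_bytes(content: bytes, max_mb: float) -> Iterator[bytes]:
    """Yield consecutive slices of `content`, each at most `max_mb` megabytes.

    Assumes 1 MB = 1024 * 1024 bytes; the connector may define it differently.
    """
    max_bytes = int(max_mb * 1024 * 1024)
    if max_bytes <= 0:
        raise ValueError("max_mb must allow at least one byte per chunk")
    for start in range(0, len(content), max_bytes):
        yield content[start:start + max_bytes]
```

In a real poll loop, the keys would come from an S3 listing call made once per polling interval, and each chunk would be produced to the Kafka topic as its own message. Note that an empty file yields no chunks in this sketch.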

Updates the S3 source configuration to remove the app.yaml file and consolidate its settings into library.json.

Sets the deployment type to 'Service' and increases the CPU and memory allocation.
Adds the 'MAX_MB_PER_MESSAGE' and 'DOWNLOAD_CONTENT' environment variables.
"Required": false
},
{
"Name": "DOWNLOAD_CONTENT",
Collaborator

You are missing POLL_INTERVAL_SECONDS

"DefaultValue": "False",
"Required": false
}
],
Collaborator

There is no longer an S3_FILE_FORMAT parameter

"Description": "The type of file compression used for the files",
"DefaultValue": "gzip",
"Required": true
},
Collaborator

S3_FOLDER_PREFIX is wrong

Refactors S3 source configuration by renaming `S3_FOLDER_PATH` to `S3_FOLDER_PREFIX` and making it optional.

Removes the required `S3_FILE_FORMAT` and `S3_FILE_COMPRESSION` parameters.

Updates the default values for `MAX_MB_PER_MESSAGE` and `DOWNLOAD_CONTENT`.

Adds `POLL_INTERVAL_SECONDS` for configuring poll frequency.
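
As an illustration of how the variables named in this commit could be read at startup, here is a hedged sketch. The variable names come from this thread. The `S3SourceSettings` class, the `load_settings` and `parse_bool` helpers, and the fallback values for `MAX_MB_PER_MESSAGE` and `POLL_INTERVAL_SECONDS` are assumptions made for the example, not the values in library.json.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class S3SourceSettings:
    """Hypothetical container for the connector's environment settings."""
    folder_prefix: str
    max_mb_per_message: float
    download_content: bool
    poll_interval_seconds: float


def parse_bool(raw: str) -> bool:
    """Treat 'true', '1', and 'yes' (any casing) as True, everything else as False."""
    return raw.strip().lower() in {"1", "true", "yes"}


def load_settings(env: Mapping[str, str] = os.environ) -> S3SourceSettings:
    """Build settings from environment variables, all of them optional here."""
    return S3SourceSettings(
        # Optional after this commit; an empty prefix matches the whole bucket.
        folder_prefix=env.get("S3_FOLDER_PREFIX", ""),
        # Assumed fallback of 1 MB; the real default lives in library.json.
        max_mb_per_message=float(env.get("MAX_MB_PER_MESSAGE", "1")),
        # "False" mirrors the default shown in the diff fragment above.
        download_content=parse_bool(env.get("DOWNLOAD_CONTENT", "False")),
        # Assumed fallback of 30 seconds; the real default lives in library.json.
        poll_interval_seconds=float(env.get("POLL_INTERVAL_SECONDS", "30")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in a unit test, for example `load_settings({"DOWNLOAD_CONTENT": "True"})`.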

3 participants