Benefits & Limitations of Spark on AWS Lambda
Patrick Muller edited this page Apr 3, 2023
- Cost effective: Workloads from 50 MB to 300 MB per payload were tested on the Spark on Lambda framework. Each run consumes 1.5 to 2 GB of AWS Lambda memory and completes in under a minute.
- Faster startup time: Spark starts in a few seconds, compared with minutes for cluster-based solutions. Simple tests showed batch loads completing in 30 seconds to 1 minute.
- Ideal for streaming: With the faster startup time, the framework becomes a good candidate for streaming workloads. The batch size of the workloads can be controlled using the Kinesis/Kafka Lambda triggers.
- File size: We stress-tested files up to 1 GB and they worked well, but to be conservative keep files under 500 MB each. Chunking a big file (>1 GB) and sending the chunks to AWS Lambda allows you to process bigger datasets.
- The Kafka AWS Lambda trigger has a limitation of 6 MB per payload. Please ensure that you configure the batch size in the trigger to control the volume.
- As of now, single AWS Lambda execution is tested and functional. Triggering multiple AWS Lambda containers and writing to Apache Hudi is pending.
- The event parameter from AWS Lambda is passed into the Spark script as parameters.
- We still need to test concurrent processing with multiple Lambdas running simultaneously.
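The chunking approach for files larger than 1 GB could be sketched as follows. This is an illustrative helper, not part of the framework: it computes byte ranges of at most 500 MB, each of which could then be sent in a separate Lambda invocation (for example, to drive an S3 ranged GET).

```python
def chunk_ranges(total_bytes: int, chunk_bytes: int = 500 * 1024 * 1024):
    """Yield (start, end) byte ranges of at most chunk_bytes covering total_bytes."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    start = 0
    while start < total_bytes:
        end = min(start + chunk_bytes, total_bytes)
        yield start, end
        start = end
```

Each `(start, end)` pair would be placed in the payload of one Lambda invocation, keeping every invocation under the conservative 500 MB per-file guidance above.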
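As a rough illustration of working within the 6 MB Kafka trigger payload cap, a helper like the following (hypothetical, not part of the repo) estimates the largest trigger batch size for a given average record size:

```python
def max_batch_size(avg_record_bytes: int, payload_cap: int = 6 * 1024 * 1024) -> int:
    """Largest number of records whose combined size stays within the
    Lambda trigger payload cap (6 MB for Kafka event sources)."""
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    return max(1, payload_cap // avg_record_bytes)
```

For example, with records averaging 8 KB, the batch size configured on the trigger should stay at or below `max_batch_size(8 * 1024)` records.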
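The event-passing point above could be sketched like this. This is a minimal assumption of how a handler might forward the trigger event to the Spark script; the actual wiring in the framework may differ, and the `SPARK_EVENT` variable name is made up for illustration.

```python
import json
import os

def lambda_handler(event, context):
    # Hypothetical sketch: serialize the Lambda event so the PySpark
    # script can pick it up as a parameter (here, via an environment
    # variable read by the Spark driver process).
    os.environ["SPARK_EVENT"] = json.dumps(event)
    # ... the Spark script would be launched here and read SPARK_EVENT ...
    return {"status": "submitted", "event_keys": sorted(event)}
```

On the Spark side, the script would deserialize the variable (e.g. `json.loads(os.environ["SPARK_EVENT"])`) to recover the trigger payload as parameters.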