Benefits & Limitations of Spark on AWS Lambda

Patrick Muller edited this page Apr 3, 2023 · 6 revisions

Benefits

Batch Workloads

  • Cost effective: Workloads from 50 MB to 300 MB per payload were tested on the Spark on Lambda framework. A run consumes 1.5 to 2 GB of AWS Lambda memory and completes in under a minute.

  • Faster startup time: Spark starts in a few seconds, compared to minutes for cluster-based solutions. Simple batch-load tests completed in 30 seconds to 1 minute.

Streaming Workloads

  • Ideal for streaming: The fast startup time makes it a good candidate for streaming workloads. The batch size of the workloads can be controlled using the Kinesis/Kafka Lambda triggers.
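As a rough sketch, these are the kind of parameters you would pass to `aws lambda create-event-source-mapping` (or boto3's `create_event_source_mapping`) to bound the micro-batch each invocation receives. The function name, stream ARN, and sizes below are illustrative assumptions, not values from this framework.

```python
# Illustrative event source mapping parameters for a Kinesis-triggered
# Spark-on-Lambda function. All concrete values here are assumptions.
mapping_params = {
    "FunctionName": "spark-on-lambda-streaming",  # hypothetical function name
    "EventSourceArn": "arn:aws:kinesis:us-east-1:123456789012:stream/demo",
    "StartingPosition": "LATEST",
    "BatchSize": 1000,                     # records handed to one invocation
    "MaximumBatchingWindowInSeconds": 30,  # wait to accumulate a fuller batch
}
print(mapping_params["BatchSize"])
```

Tuning `BatchSize` together with `MaximumBatchingWindowInSeconds` trades per-invocation throughput against end-to-end latency.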

Limitations

Batch Workloads

  • File size: We stress tested files up to 1 GB and they worked well, but to be conservative keep it under 500 MB per file. Chunking files larger than 1 GB and sending the chunks to AWS Lambda will allow you to process bigger datasets.
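The chunking approach above can be sketched as follows: split a large object into byte ranges of at most 500 MB and build one Lambda event per range. The bucket, key, and event shape are hypothetical; the framework's actual event format may differ.

```python
# Sketch: split a large S3 object into <= 500 MB byte ranges and build one
# hypothetical Lambda event per chunk. Names and event fields are assumptions.
CHUNK_BYTES = 500 * 1024 * 1024  # stay under the ~500 MB per-file guidance

def chunk_ranges(total_bytes, chunk_bytes=CHUNK_BYTES):
    """Yield (start, end) byte offsets covering the whole object."""
    start = 0
    while start < total_bytes:
        end = min(start + chunk_bytes, total_bytes)
        yield start, end
        start = end

def build_events(bucket, key, total_bytes):
    """One event per chunk, expressed as an HTTP-style S3 Range header value."""
    return [
        {"bucket": bucket, "key": key, "range": f"bytes={s}-{e - 1}"}
        for s, e in chunk_ranges(total_bytes)
    ]

events = build_events("my-bucket", "data/big.csv", 1200 * 1024 * 1024)
print(len(events))  # 1.2 GB -> 3 chunks of <= 500 MB each
```

Each event could then be sent with a separate asynchronous `Invoke` call, letting Lambda process the chunks in parallel.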

Streaming Workloads

  • The Kafka AWS Lambda trigger has a limit of 6 MB per payload. Please ensure that you configure the batch size in the trigger to control the volume.
  • As of now, only single AWS Lambda execution is tested and functional. Triggering multiple AWS Lambda containers and writing to Apache HUDI is pending.
  • The event parameter from AWS Lambda still needs to be passed into the Spark script as parameters.
  • Concurrent processing with multiple Lambdas running simultaneously still needs to be tested.
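Given the 6 MB per-payload limit mentioned above, one way to pick a safe trigger batch size is to divide the limit (with some headroom) by the average record size of your topic. This is a back-of-the-envelope sketch; `avg_record_bytes` is an assumed measurement, not something the framework provides.

```python
# Sketch: estimate a Kafka trigger batch size whose aggregate payload stays
# under Lambda's 6 MB limit. avg_record_bytes is an assumed measurement.
PAYLOAD_LIMIT = 6 * 1024 * 1024  # 6 MB per-payload limit

def safe_batch_size(avg_record_bytes, headroom=0.8):
    """Largest record count whose estimated payload fits, with 20% headroom."""
    return max(1, int(PAYLOAD_LIMIT * headroom) // avg_record_bytes)

print(safe_batch_size(2048))  # e.g. ~2 KB records
```

Record sizes vary in practice, so the headroom factor guards against a batch of unusually large records overflowing the limit.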
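For the pending event-parameter wiring, one plausible shape is a handler that flattens selected fields of the Lambda event into command-line style arguments for the Spark script. The field names and handler interface here are assumptions for illustration only, not the framework's actual API.

```python
# Sketch: forward fields from the Lambda event into the Spark script as
# --key value arguments. Field names and handler shape are assumptions.
def event_to_args(event):
    """Flatten selected event fields into a flat --key value argument list."""
    args = []
    for key in ("input_path", "output_path", "format"):
        if key in event:
            args += [f"--{key}", str(event[key])]
    return args

def handler(event, context=None):
    # In the real framework these would be handed to the Spark script,
    # e.g. via sys.argv or environment variables before it runs.
    return ["spark_script.py"] + event_to_args(event)

print(handler({"input_path": "s3://bkt/in", "output_path": "s3://bkt/out"}))
```

Keeping the mapping in one small function makes it easy to extend as the framework settles on its event schema.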