Python example file's data location does not meet Lambda's expectation

I am using the Python example [python/ml/kmeans_example.py](https://github.com/qubole/spark-on-lambda/blob/lambda-2.1.0/examples/src/main/python/ml/kmeans_example.py). This file hard-codes the relative path `data/mllib/sample_kmeans_data.txt`.

When I run `./bin/spark-submit --master lambda://test examples/src/main/python/ml/kmeans_example.py` from the driver folder, Spark's log shows `java.io.FileNotFoundException: File file:/home/ec2-user/driver/data/mllib/sample_kmeans_data.txt does not exist`.

I was told that the data file location string needs to be consistent between Lambda and Spark. [Your Lambda code](https://github.com/qubole/spark-on-lambda/blob/lambda-2.1.0/bin/lambda/spark-lambda-os.py) expects the data file to be somewhere under `/tmp/lambda`, so I looked at what was actually under `/tmp/lambda` and found a `spark` folder. My workaround was to create a temporary `/tmp/lambda/spark/data/mllib/` directory on my EC2 instance, copy my data file there, and then point `spark.read` at that copy. Specifically, I changed line 42 to:

```
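    import os
    import shutil

    # Source directory on the EC2 driver, and a target directory under /tmp/lambda
    data_folder = '/home/ec2-user/driver/data/mllib'
    lambda_folder = '/tmp/lambda/spark/data/mllib'
    filename = 'sample_kmeans_data.txt'
    # Create the target directory if needed, then copy the data file into it
    if not os.path.isdir(lambda_folder):
        os.makedirs(lambda_folder)
    shutil.copy(os.path.join(data_folder, filename), os.path.join(lambda_folder, filename))
    # Load the dataset from the copy under /tmp/lambda
    dataset = spark.read.format("libsvm").load(os.path.join(lambda_folder, filename))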
```

And then it worked fine.

I suspect that some or many of the Python example files have this problem, which could be a barrier for Python users.
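
If other examples hard-code relative data paths in the same way, the workaround could be wrapped in a small helper. This is only a sketch: `stage_for_lambda` is a hypothetical name, not part of spark-on-lambda, and the default root `/tmp/lambda/spark` is just the folder I observed on my instance, so treat it as an assumption.

```python
import os
import shutil


def stage_for_lambda(relative_path, driver_root, lambda_root="/tmp/lambda/spark"):
    """Copy driver_root/relative_path to lambda_root/relative_path.

    Hypothetical helper; lambda_root defaults to the folder observed
    under /tmp/lambda, which is an assumption rather than a documented path.
    Returns the path of the copy.
    """
    src = os.path.join(driver_root, relative_path)
    dst = os.path.join(lambda_root, relative_path)
    dst_dir = os.path.dirname(dst)
    # Create the target directory tree if it does not exist yet
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    shutil.copy(src, dst)
    return dst
```

In the k-means example this would be used as `spark.read.format("libsvm").load(stage_for_lambda('data/mllib/sample_kmeans_data.txt', '/home/ec2-user/driver'))`.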
