Speed up Oozie Spark example #22

@pregazzoni

For an Oozie Spark job to run on YARN, spark-assembly.jar must be on the job path. Right now we download the jar from the cluster (via WebHDFS) and then upload it (via WebHDFS) into the $jobDir/lib directory. This takes more than a few minutes.

Another approach would be to have the jar in the Oozie shared lib directory by default.

As the oozie user, you can run:

```shell
# Copy spark-assembly jar to Oozie shared lib directory
hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar /user/oozie/share/lib/lib_20160805191701/spark/.

# Set oozie environment
source /usr/iop/current/oozie-client/bin/oozie-env.sh
export OOZIE_URL=http://<replace with oozie node>:11000/oozie

# Update shared lib
oozie admin -sharelibupdate
```
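The `lib_20160805191701` directory name is specific to one installation: Oozie keeps its sharelib in timestamped `lib_<timestamp>` directories. A minimal sketch, assuming the sharelib root is `/user/oozie/share/lib` and that the newest `lib_*` directory is the active one, picks that directory from `hdfs dfs -ls` output instead of hard-coding it. The function name `latest_sharelib_dir` is hypothetical.

```shell
# Hypothetical helper (assumption: newest lib_<timestamp> dir is the active sharelib).
# Reads `hdfs dfs -ls <sharelib root>` output on stdin and prints the newest
# lib_<timestamp> directory path.
latest_sharelib_dir() {
  # The path is the last whitespace-separated field of each listing line;
  # the "Found N items" header is dropped by the grep filter.
  # Timestamps are fixed-width (yyyyMMddHHmmss), so a plain lexical sort
  # is also a chronological sort.
  awk '{ print $NF }' \
    | grep -E '/lib_[0-9]+$' \
    | sort \
    | tail -n 1
}

# Usage (as the oozie user, assuming the default sharelib root):
# SHARELIB_DIR="$(hdfs dfs -ls /user/oozie/share/lib | latest_sharelib_dir)"
# hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar "$SHARELIB_DIR/spark/"
```

Because the timestamp is part of the directory name, resolving it at run time keeps the copy step working after the sharelib is reinstalled.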

Once this is done, there is no need to put the jar under $jobDir/lib, as it will be picked up automatically from the Oozie shared lib (provided the job sets oozie.use.system.libpath=true in its job.properties).
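Oozie adds sharelib jars to an action's classpath only when the job enables the system lib path, typically via `oozie.use.system.libpath=true` in `job.properties`. A minimal sketch, assuming the job's properties live in a local `job.properties` file, that sets the property to `true` or appends it when missing. The function name `ensure_system_libpath` is hypothetical.

```shell
# Hypothetical helper: make sure a job.properties file enables the Oozie
# system lib path so sharelib jars (e.g. spark-assembly.jar) are used.
ensure_system_libpath() {
  local props="$1"
  if grep -Eq '^[[:space:]]*oozie\.use\.system\.libpath[[:space:]]*=' "$props"; then
    # Property already present: force its value to true.
    sed -i.bak -E 's/^([[:space:]]*oozie\.use\.system\.libpath[[:space:]]*=).*/\1true/' "$props"
    rm -f "$props.bak"
  else
    # Property missing: make sure the file ends with a newline, then append it.
    if [ -n "$(tail -c 1 "$props")" ]; then
      printf '\n' >> "$props"
    fi
    printf 'oozie.use.system.libpath=true\n' >> "$props"
  fi
}
```

For example, run `ensure_system_libpath job.properties` before submitting with `oozie job -config job.properties -run`.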
