
How to Add JAR Files to Spark Jobs with spark-submit?

Patricia Arquette
Release: 2024-11-15 00:25:02

Adding JAR Files to Spark Jobs with spark-submit

Ambiguous Details

The following details are often unclear or omitted in the Spark documentation:

  • ClassPath: --driver-class-path and --conf spark.driver.extraClassPath affect the Driver classpath, while --conf spark.executor.extraClassPath affects the Executor classpath.
  • Separation Character: On Linux, classpath entries are separated with a colon (:); on Windows, with a semicolon (;).
  • Distribution:

    • Client mode: JARs are distributed through HTTP by a server on the Driver node.
    • Cluster mode: JARs must be manually made available to Worker nodes via HDFS or similar.
  • URIs: JARs referenced with the "file:/" scheme are served by the Driver's HTTP server, while "hdfs:", "http:", and "ftp:" URIs are pulled directly from the given source. "local:/" assumes the files already exist on each Worker node (see the example after this list).
  • File Location: JARs are copied to the working directory on each Worker node (usually /var/run/spark/work).
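
A minimal illustration of the separators and URI schemes above (the host name and paths are hypothetical; assumes a Linux client):

spark-submit \
  --jars hdfs://namenode:8020/libs/common.jar,local:/opt/libs/native-wrapper.jar \
  --driver-class-path /opt/libs/common.jar:/opt/libs/native-wrapper.jar \
  --class MyClass main-application.jar

Here the "hdfs:" JAR is pulled by each node directly from HDFS, the "local:/" JAR is expected to already exist at that path on every Worker node, and the Driver classpath entries are colon-separated because the client runs on Linux.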

Affected Options

Configuration options are resolved with the following precedence, from highest to lowest (see the sketch after this list):

  1. SparkConf properties set directly in code
  2. Flags passed to spark-submit
  3. Options in spark-defaults.conf
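
For example, a value set in spark-defaults.conf is overridden by a --conf flag, and both are overridden by a property set on SparkConf in application code. A minimal sketch (file paths are hypothetical):

# In $SPARK_HOME/conf/spark-defaults.conf (lowest precedence):
#   spark.executor.extraClassPath  /opt/libs/default.jar

# This flag overrides the value from spark-defaults.conf:
spark-submit \
  --conf spark.executor.extraClassPath=/opt/libs/override.jar \
  --class MyClass main-application.jar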

Option Analysis

  • --jars vs SparkContext.addJar: These are equivalent for adding JAR dependencies.
  • SparkContext.addJar vs SparkContext.addFile: Use addJar for code dependencies that must be on the classpath, and addFile for arbitrary files (e.g., configuration or data files) that only need to be shipped to each node (see the example after this list).
  • DriverClassPath Options: --driver-class-path and --conf spark.driver.extraClassPath are aliases.
  • DriverLibraryPath Options: --driver-library-path and --conf spark.driver.extraLibraryPath are aliases, representing java.library.path.
  • Executor ClassPath: --conf spark.executor.extraClassPath for dependencies.
  • Executor Library Path: --conf spark.executor.extraLibraryPath for JVM library path.
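
The same distinction exists on the spark-submit command line: --jars corresponds to SparkContext.addJar, while --files corresponds to SparkContext.addFile. A hypothetical example (file names are placeholders):

spark-submit \
  --jars additional1.jar,additional2.jar \
  --files app.conf,lookup-table.csv \
  --class MyClass main-application.jar

JARs passed via --jars are shipped and made available as dependencies; files passed via --files are copied to each Executor's working directory but are not placed on the classpath.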

Safe Practice for Adding JAR Files

For simplicity in Client mode, it is safe to use all three main options together:

spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --class MyClass main-application.jar

In Cluster mode, the Driver does not run on the submitting machine, so external JARs must be made available to all Worker nodes yourself, for example by uploading them to HDFS and referencing them with "hdfs:" URIs (see the example below).
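
A hypothetical sequence (the HDFS paths and namenode address are placeholders): upload the JARs to a shared HDFS location once, then reference them with "hdfs:" URIs so every node can fetch them directly:

hdfs dfs -mkdir -p /user/spark/libs
hdfs dfs -put additional1.jar additional2.jar /user/spark/libs/

spark-submit --deploy-mode cluster \
  --jars hdfs://namenode:8020/user/spark/libs/additional1.jar,hdfs://namenode:8020/user/spark/libs/additional2.jar \
  --class MyClass main-application.jar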
