Adding JAR Files to Spark Jobs with spark-submit
Ambiguous Details
The following details are often unclear or omitted in the documentation:
- ClassPath: --driver-class-path and --conf spark.driver.extraClassPath affect the Driver classpath, while --conf spark.executor.extraClassPath affects the Executor classpath.
- Separator character: Linux uses a colon (:), while Windows uses a semicolon (;).
- Distribution:
  - Client mode: JARs are distributed through HTTP by a server on the Driver node.
  - Cluster mode: JARs must be manually made available to Worker nodes, via HDFS or similar.
- URIs: the "file:/" scheme is served by the Driver's HTTP server, while "hdfs", "http", and "ftp" URIs are pulled directly from the source. "local:/" assumes the files are already present on each Worker node.
- File location: JARs are copied to the working directory on each Worker node (usually /var/run/spark/work).
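The separator rule above can be sketched in a short shell snippet; the JAR names are placeholders carried over from the example later in this article:

```shell
# Build a driver classpath string using the platform separator.
# On Linux/macOS the separator is ':'; on Windows it would be ';'.
SEP=":"
DRIVER_CP="additional1.jar${SEP}additional2.jar"
echo "$DRIVER_CP"   # prints additional1.jar:additional2.jar
```

The resulting string is what gets passed to --driver-class-path or the corresponding extraClassPath properties.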
Affected Options
Configuration options are resolved with the following precedence, from highest to lowest:
- SparkConf properties set directly in code
- Flags passed to spark-submit
- Options in spark-defaults.conf
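As a sketch of this precedence (the paths below are hypothetical), a flag passed to spark-submit overrides the same property in spark-defaults.conf, and a value set on SparkConf in code would override both:

```shell
# spark-defaults.conf (lowest precedence) might contain:
#   spark.executor.extraClassPath  /opt/libs/base.jar
#
# This flag overrides the spark-defaults.conf value above:
spark-submit \
  --conf spark.executor.extraClassPath=/opt/libs/override.jar \
  --class MyClass main-application.jar
# A SparkConf.set(...) call inside the application itself would,
# in turn, override the flag.
```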
Option Analysis
- --jars vs SparkContext.addJar: equivalent ways to add JAR dependencies at runtime.
- SparkContext.addJar vs SparkContext.addFile: use addJar for dependencies that must be on the classpath, addFile for arbitrary files.
- Driver class path options: --driver-class-path and --conf spark.driver.extraClassPath are aliases.
- Driver library path options: --driver-library-path and --conf spark.driver.extraLibraryPath are aliases, representing java.library.path.
- Executor class path: --conf spark.executor.extraClassPath adds dependencies to the Executor classpath.
- Executor library path: --conf spark.executor.extraLibraryPath sets the Executors' JVM library path.
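To illustrate the alias pairs above, the following two invocations are equivalent ways to extend the Driver classpath (the JAR path is a placeholder):

```shell
# Dedicated flag form:
spark-submit --driver-class-path /opt/libs/extra.jar \
  --class MyClass main-application.jar

# Equivalent generic --conf form:
spark-submit --conf spark.driver.extraClassPath=/opt/libs/extra.jar \
  --class MyClass main-application.jar
```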
Safe Practice for Adding JAR Files
In Client mode, the simplest safe approach is to use all three main options together:
```shell
spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --class MyClass main-application.jar
```
In Cluster mode, external JARs must be made available to the Worker nodes manually, for example by uploading them to HDFS and referencing them with hdfs:// URIs.
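A minimal Cluster-mode sketch, assuming the JARs are staged under a hypothetical HDFS path /libs so every Worker node can fetch them directly:

```shell
# Stage the dependency JARs on HDFS (paths are placeholders).
hdfs dfs -put additional1.jar /libs/additional1.jar
hdfs dfs -put additional2.jar /libs/additional2.jar

# Reference them with hdfs:// URIs so Workers pull them directly.
spark-submit --deploy-mode cluster \
  --jars hdfs:///libs/additional1.jar,hdfs:///libs/additional2.jar \
  --class MyClass main-application.jar
```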