To run a Spark job on a schedule on a host machine, you first have to package it as a fat jar and then launch it through spark-submit.
What is a fat jar?
From "What is an uber (fat) jar":
An uber jar is defined as one that contains both your package and all its dependencies in one single JAR file.
Install sbt first
Windows:
https://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Windows.html
Linux:
https://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
Set up sbt-assembly
Reference:
Creating Scala Fat Jars for Spark on SBT with sbt-assembly Plugin
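In the project directory (sbt loads plugin definitions from project/ under the build root, next to build.sbt), add a file named assembly.sbt with the following content:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")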
Replace the previous contents of build.sbt with:
lazy val root = (project in file(".")).
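The build.sbt listing above is cut off after its first line. As a rough sketch only, a build definition for a Spark job packaged with sbt-assembly often looks something like the following. The project name and version are inferred from the jar name ScalaSBTTest-assembly-1.0.jar, and the Scala 2.11 line from the scala-2.11 output directory; the exact Scala patch release, the Spark version, and the main class name are assumptions and do not come from the original post.

```scala
// build.sbt -- hypothetical sketch, not the author's actual file
lazy val root = (project in file(".")).
  settings(
    name := "ScalaSBTTest",                     // inferred from the jar name
    version := "1.0",                           // inferred from the jar name
    scalaVersion := "2.11.12",                  // assumption: any 2.11.x release
    mainClass in assembly := Some("WordCount")  // assumption: hypothetical main class
  )

// Assumption: a Spark 2.x release built for Scala 2.11; the version is illustrative.
// Appending % "provided" would keep Spark out of the fat jar,
// since spark-submit supplies the Spark classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1"
```

With sbt-assembly's default naming, these name and version settings would produce a jar called ScalaSBTTest-assembly-1.0.jar under target/scala-2.11.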
In cmd, from the project root (D:\workspace\scala\ScalaSBTTest in this example), run:
sbt assembly
At this point, however, the build fails with an error…
"deduplicate: different file contents found in the following:"...
Reference:
Spark 2: “deduplicate: different file contents found in the following:”
Add a merge strategy to build.sbt so that the conflicting files are resolved:
assemblyMergeStrategy in assembly := {
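The merge strategy above is truncated after its opening brace. One commonly used form, given here as a sketch rather than the author's exact setting, discards conflicting META-INF entries and keeps the first copy of any other duplicated path. It uses the sbt-assembly 0.14.x syntax that matches the plugin version added earlier.

```scala
// build.sbt -- hypothetical sketch of a merge strategy for sbt-assembly 0.14.x
assemblyMergeStrategy in assembly := {
  // Drop META-INF entries (manifests, signatures), which commonly collide across jars
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // For any other duplicated path, keep the first copy found on the classpath
  case _ => MergeStrategy.first
}
```

Discarding everything under META-INF is a blunt choice: it also drops META-INF/services files, which some libraries rely on for service loading. If a dependency needs those, a more selective set of cases may be required.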
Run it again:
sbt assembly
The build succeeds.
The jar is generated at D:\workspace\scala\ScalaSBTTest\target\scala-2.11\ScalaSBTTest-assembly-1.0.jar
Source code:
https://github.com/timmyBeef/SparkSbtAssemblyDemo.git
Run with spark-submit
Go straight to D:\workspace\scala\ScalaSBTTest\target\scala-2.11 and try running it:
D:\workspace\scala\ScalaSBTTest\target\scala-2.11>spark-submit ScalaSBTTest-assembly-1.0.jar
Unsurprisingly, it fails, because word.txt is not in place yet.
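The job's source is not reproduced in this post, so the following is only a guess at its general shape: a minimal word-count sketch that reads word.txt through a relative path. The object name WordCount and all of the logic are assumptions for illustration; the real code is in the linked repository. It shows why the run depends on word.txt being in the directory spark-submit is launched from when running in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical main class; the name and logic are assumptions for illustration.
object WordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is left unset so that spark-submit can supply it.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    try {
      // Relative path: in local mode it resolves against the directory
      // spark-submit is run from, so the job fails if word.txt is missing.
      val counts = sc.textFile("word.txt")
        .flatMap(_.split("\\s+"))   // split each line on whitespace
        .filter(_.nonEmpty)         // drop empty tokens
        .map(word => (word, 1))
        .reduceByKey(_ + _)         // sum the counts per word

      // Bring the results to the driver and print them.
      counts.collect().foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop()
    }
  }
}
```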
Add word.txt.
The results now print successfully, but the temp files cannot be deleted…
What are key differences between sbt-pack and sbt-assembly?
sbt-assembly
sbt-assembly creates a fat JAR - a single JAR file containing all class files from your code and libraries. By evolution, it also contains ways of resolving conflicts when multiple JARs provide the same file path (like config or README file). It involves unzipping of all library JARs, so it’s a bit slow, but these are heavily cached.
sbt-pack
sbt-pack keeps all the library JARs intact, moves them into target/pack directory (as opposed to ivy cache where they would normally live), and makes a shell script for you to run them.
sbt-native-packager
sbt-native-packager is similar to sbt-pack, but it was started by sbt committer Josh Suereth and is now maintained by the highly capable Nepomuk Seiler (also known as muuki88). The plugin supports a number of formats, such as Windows msi and Debian deb files. A recent addition is support for Docker images.
All are viable means of creating deployment images. In certain cases like deploying your application to a web framework etc., it might make things easier if you’re dealing with one file as opposed to a dozen.