Installing Spark (Spark Shell, IntelliJ IDEA)

Posted by Tim Lin on 2019-02-22

I've recently started working on AI-related topics, so I'm beginning with Spark and Scala.

Spark Shell

Download the latest version of Spark:

https://spark.apache.org/downloads.html

I put it at
D:\spark\spark-2.4.0-bin-hadoop2.7\spark-2.4.0-bin-hadoop2.7\bin\spark-shell.bat

Set an environment variable for it,

and add it to PATH as well.

Developing in IntelliJ IDEA

Install the Scala plugin

From the IDE's welcome screen, open Plugins,

search for Scala, and install it.

Create a new project


Scala must be version 2.11.x:

https://spark.apache.org/docs/latest/index.html

Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).

Set up the library dependencies.


From then on, dependencies are re-imported automatically every time build.sbt is updated.
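For reference, a minimal build.sbt for this setup might look like the following — a sketch only; the project name and Scala patch version are my assumptions, chosen to match the Spark 2.4.0 / Scala 2.11.x pairing quoted above:

```scala
// build.sbt — minimal sketch; project name and patch versions are assumptions.
name := "WordCount"

version := "0.1"

// Spark 2.4.0 is built against Scala 2.11.x (see the compatibility note above).
scalaVersion := "2.11.12"

// %% appends the Scala binary version, resolving spark-core_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
```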

Wait until the “dump project structure from sbt” task has fully finished, then create the Scala class.

Create the Scala class under src/main/scala/.

Paste the program into WordCount.scala:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.log4j.{Level, Logger}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Quiet the noisy Spark and Jetty logging.
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    val inputFile = "D:/workspace/scala/word.txt"
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    val wordCount = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
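As a side note, the flatMap/map/reduceByKey pipeline above can be sketched with plain Scala collections (no Spark needed) to see exactly what it computes; here groupBy plus a sum stands in for reduceByKey:

```scala
// Word count on plain Scala collections, mirroring the RDD pipeline above.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))              // split each line into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

  def main(args: Array[String]): Unit =
    println(count(Seq("hello spark", "hello scala")))
}
```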

Create a text file with some content at “D:/workspace/scala/word.txt”.
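The file can also be created from Scala; the sample words below are my own — any whitespace-separated text will do:

```scala
import java.nio.file.{Files, Path, Paths}
import java.nio.charset.StandardCharsets

object MakeWordFile {
  // Sample contents are arbitrary; any whitespace-separated words work.
  val sample = "hello spark\nhello scala\nhello world\n"

  def write(path: Path): Path =
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit =
    write(Paths.get("D:/workspace/scala/word.txt"))
}
```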

Run it.

It threw errors.

Error 1:

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.


To fix it, follow these steps:

  1. Download winutils.exe
  2. Create folder, say C:\winutils\bin
  3. Copy winutils.exe inside C:\winutils\bin
  4. Set environment variable HADOOP_HOME to C:\winutils
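The "null" in the error message is literally the unset HADOOP_HOME: Spark builds the path %HADOOP_HOME%\bin\winutils.exe, so a missing variable yields null\bin\winutils.exe. A quick sanity check from Scala (my own sketch, not part of the original steps):

```scala
object CheckHadoopHome {
  def main(args: Array[String]): Unit = {
    // If HADOOP_HOME is unset, System.getenv returns null — which is exactly
    // where the "null\bin\winutils.exe" in the error message comes from.
    val home = Option(System.getenv("HADOOP_HOME"))
    println(home.map(_ + "\\bin\\winutils.exe").getOrElse("HADOOP_HOME is not set"))
  }
}
```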

Error 2:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: 
Input path does not exist: file:/D:/workspace/scala/word.txt

Originally I had the path written like this:

val inputFile = "D:\\workspace\\scala\\word.txt"


Then I put the file under the project directory instead…

val inputFile = "word.txt"

and that worked.
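The bare filename works because a relative path is resolved against the JVM's working directory, which IntelliJ sets to the project root by default. A quick way to see where a relative name actually points (my own sketch):

```scala
import java.io.File

object ShowWorkingDir {
  def main(args: Array[String]): Unit = {
    // A bare "word.txt" resolves against the current working directory,
    // so this prints the absolute path Spark would try to read.
    println(new File("word.txt").getAbsolutePath)
  }
}
```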

Reference

Setup Spark Development Environment – IntelliJ and Scala

Writing Spark applications with IntelliJ IDEA (Scala + SBT)