Scala is a multi-paradigm programming language designed from the outset to integrate the features of object-oriented programming and functional programming. Scala runs on the Java platform (the Java Virtual Machine) and is compatible with existing Java programs; therefore, before installing the Scala environment, you first need to install the Java JDK. Learning the Scala language lays the foundation for the later study of Spark and Flink. A video walkthrough is available here:
https://www.bilibili.com/video/BV1wdUWYeEcS/?aid=113503537076...
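The blend of the two paradigms can be seen in a few lines of plain Scala. The sketch below (class and value names are purely illustrative) defines an immutable case class, which is object-oriented modelling, and then processes a list of instances with higher-order functions, which is the functional side.

```scala
// A case class: object-oriented modelling with immutability built in
case class Word(text: String, length: Int)

object ParadigmDemo {
  def main(args: Array[String]): Unit = {
    // Build objects from raw strings
    val words = List("Scala", "Spark", "Flink").map(w => Word(w, w.length))

    // Functional style: transform and aggregate with higher-order functions
    val totalLength = words.map(_.length).sum
    println(s"total length = $totalLength")  // prints: total length = 15
  }
}
```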
The following code shows how to develop a WordCount program in Spark using Scala.
package demo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, Logger}

object WordCountDemo {
  def main(args: Array[String]): Unit = {
    // Reduce log noise from Spark and Jetty
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    // Local mode
    //val conf = new SparkConf().setAppName("WordCountDemo").setMaster("local")
    // Cluster mode
    val conf = new SparkConf().setAppName("WordCountDemo")

    // Create the SparkContext
    val sc = new SparkContext(conf)

    val result = sc.textFile("hdfs://bigdata111:9000/input/data.txt")
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .reduceByKey(_ + _)      // sum the counts per word

    // Print to the console
    result.collect().foreach(println)
    // Save to HDFS
    result.saveAsTextFile("hdfs://bigdata111:9000/output/spark/wc")

    sc.stop()
  }
}
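The three transformations in the pipeline above can be tried out on an ordinary Scala collection, with no cluster required. The sketch below is a local illustration using only the standard library, not the Spark API: `groupBy` plus a per-group sum plays the role of `reduceByKey`.

```scala
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    val lines = List("I love Beijing", "I love China")

    val counts = lines
      .flatMap(_.split(" "))   // split lines into words
      .map((_, 1))             // pair each word with a count of 1
      .groupBy(_._1)           // group the pairs by word
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum per word, like reduceByKey

    counts.foreach(println)  // e.g. (love,2), (I,2), (Beijing,1), (China,1)
  }
}
```

Because the result is an unordered `Map`, the printing order may differ from run to run, just as the partition order of a Spark RDD is not guaranteed.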
Scala can also be used with Flink. The following code runs a WordCount program in Flink.
package demo

import org.apache.flink.api.scala._

object WordCount {
  def main(args: Array[String]): Unit = {
    // Batch execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    val text = env.fromElements("I love Beijing", "I love China",
      "Beijing is the capital of China")

    // groupBy(0) groups by tuple field 0 (the word);
    // sum(1) sums tuple field 1 (the count)
    val counts = text.flatMap(_.split(" ")).map((_, 1)).groupBy(0).sum(1)
    counts.print()
  }
}
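To compile the Flink example, the project needs the Flink Scala API on the classpath. A possible sbt setting is sketched below; the version number shown is an assumption for illustration and should be replaced with the version matching your cluster.

```scala
// build.sbt (sketch; the version string is an assumption)
libraryDependencies += "org.apache.flink" %% "flink-scala" % "1.14.0"
```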