www.semtech-solutions.co.nz
info@semtech-solutions.co.nz
Spark: What is it?
Open source alternative to MapReduce for certain applications
A low-latency cluster computing system
For very large data sets
May be 100 times faster than MapReduce for
Uses in-memory cluster computing
Memory access is faster than disk access
Has APIs written in
Can be accessed from Scala and Python shells
Currently an Apache incubator project
Spark Benefits
Scales to very large clusters
Uses in-memory processing for increased speed
High-level APIs
Spark Tuning
CPU, memory or network bandwidth
Java ObjectOutputStream vs Kryo
Use primitive types
Set JVM flags
Store objects in serialized form i.e.
Memory Tuning
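The serialization and serialized-storage points above can be sketched in Scala. This is a minimal sketch, not the deck's own code: it assumes a later Apache Spark release than the incubator version described here (packages under org.apache.spark, with SparkConf and registerKryoClasses available). LogRecord, TuningSketch and the "local[2]" master are hypothetical names chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type, used only to show Kryo class registration.
case class LogRecord(host: String, message: String)

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]") // assumption: run locally with two threads
      .setAppName("Tuning Sketch")
      // Switch from Java serialization (ObjectOutputStream) to Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes lets Kryo avoid writing full class names.
      .registerKryoClasses(Array[Class[_]](classOf[LogRecord]))

    val sc = new SparkContext(conf)
    try {
      val records = sc.parallelize(Seq(
        LogRecord("host1", "started"),
        LogRecord("host2", "stopped")))
      // Keep the cached data in serialized form to reduce memory use,
      // at the cost of extra CPU time when it is read back.
      records.persist(StorageLevel.MEMORY_ONLY_SER)
      println(records.count())
    } finally {
      sc.stop()
    }
  }
}
```

Serialized storage trades CPU for memory, so it tends to help most when the cached data would not otherwise fit in memory.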
Spark Examples
Example from spark-project.org: a Spark job in Scala that counts the lines of a system log containing the letters "a" and "b".
/*** SimpleJob.scala ***/
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file on your system
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Read the file in 2 partitions and cache it, since it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    // Count the lines containing "a" and the lines containing "b"
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
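The filter-and-count logic of the job above can be tried without a Spark installation. The following is a dependency-free sketch that applies the same predicate to a local Seq instead of a distributed dataset. LocalLineCount, countContaining and the sample lines are hypothetical, introduced only for illustration.

```scala
// Dependency-free sketch: the same filter-and-count logic on a local Seq.
// LocalLineCount and countContaining are hypothetical names, not part of Spark.
object LocalLineCount {
  // Count the lines that contain the given substring.
  def countContaining(lines: Seq[String], needle: String): Long =
    lines.count(_.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    // Stand-in for the contents of a log file.
    val sample = Seq("alpha", "beta", "gamma", "bob", "xyz")
    val numAs = countContaining(sample, "a")
    val numBs = countContaining(sample, "b")
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
```

In the Spark version the same predicates run in parallel across the partitions of the file, and cache() keeps the data in memory so the second count does not reread it from disk.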
Contact Us
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You can just pay for those hours that you need to solve your problems