
Apache Spark

What is it?
How does it work?
Benefits
Tuning
Examples

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Spark What is it?

Open-source alternative to MapReduce for certain applications
A low-latency cluster computing system
For very large data sets
May be up to 100 times faster than MapReduce for
    Iterative algorithms
    Interactive data mining
Used with Hadoop / HDFS
Released under the BSD License


Spark How does it work?

Uses in-memory cluster computing
Memory access is faster than disk access
Has APIs written in
    Scala
    Java
    Python
Can be accessed from Scala and Python shells
Currently an Apache Incubator project
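The in-memory model matters most for iterative work, where every pass re-reads the same data. The following is a minimal sketch, not taken from the slides: it assumes a later Spark release in which the classes live under org.apache.spark and SparkConf is available (the older spark.SparkContext package differs). The object name IterativeCacheSketch, the helper fitMean and the sample values are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeCacheSketch {
  // Gradient descent on f(w) = mean((w - x)^2) / 2.
  // Every iteration is one full pass over the same data set.
  def fitMean(sc: SparkContext, values: Seq[Double], iterations: Int, stepSize: Double): Double = {
    // cache() asks Spark to keep the partitions in memory once they are first computed,
    // so later passes read from memory instead of recomputing or re-reading the source.
    val data = sc.parallelize(values).cache()
    val n = data.count()
    var w = 0.0
    for (_ <- 1 to iterations) {
      val current = w // copy into a val so the closure captures an immutable value
      val gradient = data.map(x => current - x).reduce(_ + _) / n
      w -= stepSize * gradient
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master, for trying the sketch without a cluster.
    val conf = new SparkConf().setMaster("local[2]").setAppName("Iterative Cache Sketch")
    val sc = new SparkContext(conf)
    try {
      val estimate = fitMean(sc, Seq(1.0, 2.0, 3.0, 4.0), iterations = 50, stepSize = 0.5)
      println("Estimated mean: " + estimate)
    } finally {
      sc.stop()
    }
  }
}
```

Without the cache() call the result would be the same, but each of the 50 passes would rebuild the data from its source, which is the cost that in-memory computing is meant to avoid.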


Spark Benefits

Scales to very large clusters
Uses in-memory processing for increased speed
High-level APIs
    Java, Scala, Python
Low-latency shell access


Spark Tuning

Bottlenecks can occur in the cluster via
    CPU, memory or network bandwidth
Tune the data serialization method, e.g.
    Java ObjectOutputStream vs Kryo
Memory tuning
    Use primitive types
    Set JVM flags
    Store objects in serialized form, e.g.
        RDD persistence level MEMORY_ONLY_SER
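Two of the tuning points above, Kryo serialization and serialized RDD storage, can be sketched in code. This is an illustrative sketch only: it assumes a later Spark release with the org.apache.spark package names and SparkConf, which differs from the spark.SparkContext API of the release these slides describe. The object name TuningSketch and the helpers kryoConf and evenCountAndSum are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object TuningSketch {
  // Switch data serialization from the Java default (ObjectOutputStream) to Kryo.
  def kryoConf(appName: String): SparkConf =
    new SparkConf()
      .setMaster("local[2]") // assumption: local run for experimenting, not a real cluster
      .setAppName(appName)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

  // Persist an RDD of primitive Ints in serialized form and reuse it for two actions.
  def evenCountAndSum(sc: SparkContext, upTo: Int): (Long, Long) = {
    // MEMORY_ONLY_SER stores each partition as a byte array: smaller, but costs CPU to read.
    val numbers = sc.parallelize(1 to upTo).persist(StorageLevel.MEMORY_ONLY_SER)
    val evens = numbers.filter(_ % 2 == 0).count()
    val total = numbers.map(_.toLong).reduce(_ + _)
    numbers.unpersist()
    (evens, total)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(kryoConf("Tuning Sketch"))
    try {
      val (evens, total) = evenCountAndSum(sc, 1000)
      println("Even numbers: " + evens + ", sum: " + total)
    } finally {
      sc.stop()
    }
  }
}
```

Serialized storage trades some CPU time on each read for a smaller memory footprint, so it tends to pay off when memory, rather than CPU, is the bottleneck.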


Spark Examples
Example from spark-project.org: a Spark job in Scala that counts the lines containing "a" and the lines containing "b" in a system log.

    /*** SimpleJob.scala ***/
    import spark.SparkContext
    import SparkContext._

    object SimpleJob {
      def main(args: Array[String]) {
        val logFile = "/var/log/syslog" // Should be some file on your system
        // Arguments: master, job name, Spark home, and the jars to ship to the workers
        val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
        // Read the log into 2 partitions and cache it, since it is scanned twice below
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }


Contact Us

Feel free to contact us at
    www.semtech-solutions.co.nz
    info@semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You can just pay for those hours that you need to solve your problems
