Spark Details

Spark is an expressive computing system that facilitates in-memory computing to avoid storing intermediate results to disk. It introduces the RDD abstraction of partitioned and distributed datasets that can be cached in memory across a cluster. RDDs support transformations and actions, where transformations build new RDDs and actions trigger execution by returning values or exporting data. Jobs are executed through a DAG scheduler and task scheduler to optimize partitioning and execution across worker nodes.

Uploaded by sarvesh_mishra

Spark

Spark ideas
• expressive computing system, not limited to the map-reduce model
• make use of system memory
– avoid saving intermediate results to disk
– cache data for repeated queries (e.g. for machine learning)
• compatible with Hadoop
RDD abstraction
• Resilient Distributed Datasets
• partitioned collection of records
• spread across the cluster
• read-only
• caching dataset in memory
– different storage levels available
– fallback to disk possible
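
A minimal sketch of choosing a storage level with disk fallback, assuming Spark is on the classpath and a local master is acceptable. The object name StorageLevelSketch and the helper cacheWithDiskFallback are hypothetical and not part of the original slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object StorageLevelSketch {
  // Marks the RDD for caching in memory; partitions that do not fit
  // in memory are written to local disk instead of being recomputed.
  def cacheWithDiskFallback[T](rdd: RDD[T]): RDD[T] =
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads.
    val conf = new SparkConf().setAppName("storage-level-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)

    val numbers = cacheWithDiskFallback(sc.parallelize(1 to 1000))
    println(s"uses memory: ${numbers.getStorageLevel.useMemory}")
    println(s"uses disk:   ${numbers.getStorageLevel.useDisk}")
    println(s"records:     ${numbers.count()}") // first action fills the cache

    numbers.unpersist()
    sc.stop()
  }
}
```

Calling cache() instead is shorthand for persist(StorageLevel.MEMORY_ONLY), which has no disk fallback.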
RDD operations
• transformations build RDDs through deterministic operations on other RDDs
– transformations include map, filter, join
– lazy: nothing is computed until an action runs
• actions return a value or export data
– actions include count, collect, save
– an action triggers execution
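
The difference between lazy transformations and actions can be observed with a counter. This is a sketch under assumptions: it uses sc.longAccumulator, which requires Spark 2.0 or later, and the object name LazyEvaluationSketch is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvaluationSketch {
  // Returns how many records the map function had processed
  // before the action and after the action.
  def processedBeforeAndAfter(sc: SparkContext): (Long, Long) = {
    val processed = sc.longAccumulator("processed records")

    // Transformation: only recorded, not executed.
    val lengths = sc.parallelize(Seq("a", "bb", "ccc")).map { s =>
      processed.add(1)
      s.length
    }
    val before = processed.sum

    // Action: triggers execution of the map above.
    lengths.count()
    val after = processed.sum

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("lazy-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)
    val (before, after) = processedBeforeAndAfter(sc)
    println(s"processed before action: $before, after action: $after")
    sc.stop()
  }
}
```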
Job example
Driver:
val log = sc.textFile("hdfs://...")
val errors = log.filter(_.contains("ERROR"))
errors.cache()
// Actions: each count() triggers execution
errors.filter(_.contains("I/O")).count()
errors.filter(_.contains("timeout")).count()

[Figure: three workers, each holding a cache (Cache1, Cache2, Cache3) over one input block (Block1, Block2, Block3)]
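
The job example can be tried without an HDFS cluster. The sketch below assumes a local master and replaces the HDFS file with a small in-memory list. The object name ErrorLogSketch and the sample log lines are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ErrorLogSketch {
  // Counts cached ERROR lines that mention "I/O" and "timeout".
  def countErrors(sc: SparkContext, lines: Seq[String]): (Long, Long) = {
    val log = sc.parallelize(lines)               // stands in for sc.textFile("hdfs://...")
    val errors = log.filter(_.contains("ERROR"))  // transformation, lazy
    errors.cache()                                // keep the filtered lines in memory

    val io = errors.filter(_.contains("I/O")).count()           // first action: computes and caches errors
    val timeouts = errors.filter(_.contains("timeout")).count() // second action: reuses the cache

    errors.unpersist()
    (io, timeouts)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("error-log-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)
    // Hypothetical sample data.
    val sample = Seq(
      "INFO start",
      "ERROR I/O failure on disk",
      "ERROR timeout while reading",
      "WARN timeout soon",
      "ERROR I/O timeout"
    )
    val (io, timeouts) = countErrors(sc, sample)
    println(s"I/O errors: $io, timeout errors: $timeouts")
    sc.stop()
  }
}
```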

RDD partition-level view
[Figure: dataset-level view beside partition-level view]
• log: HadoopRDD, path = hdfs://...
• errors: FilteredRDD, func = _.contains(…), shouldCache = true
• partition-level view: one task per partition (Task 1, Task 2, ...)

source: https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals
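
The partition-level view can be inspected from the driver. This sketch assumes a local master; the object name PartitionViewSketch and the helper recordsPerPartition are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionViewSketch {
  // Returns (partition index, number of records) for every partition.
  // Each partition is handled by its own task.
  def recordsPerPartition[T](rdd: RDD[T]): Array[(Int, Int)] =
    rdd.mapPartitionsWithIndex { (index, records) =>
      Iterator((index, records.size))
    }.collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-view-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)

    // Eight records spread over four partitions, then filtered.
    val evens = sc.parallelize(1 to 8, 4).filter(_ % 2 == 0)
    println(s"partitions: ${evens.partitions.length}")
    recordsPerPartition(evens).foreach { case (index, size) =>
      println(s"partition $index holds $size record(s)")
    }
    // Lineage of the dataset-level view; the exact text depends on the Spark version.
    println(evens.toDebugString)

    sc.stop()
  }
}
```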
Job scheduling
Pipeline: RDD Objects → DAGScheduler → TaskScheduler → Worker
(handed over between them: DAG → TaskSet → Task)
• RDD Objects: build operator DAG, e.g. rdd1.join(rdd2).groupBy(…).filter(…)
• DAGScheduler: split graph into stages of tasks; submit each stage as ready
• TaskScheduler: launch tasks via cluster manager; retry failed or straggling tasks
• Worker: execute tasks (threads); store and serve blocks (block manager)

source: https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals
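
Where the graph is split into stages can be inspected through an RDD's dependencies. The sketch below assumes a local master and only looks at the direct parents of an RDD. The object name StageBoundarySketch and the helper startsNewStage are hypothetical.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object StageBoundarySketch {
  // True when a direct parent link of the RDD requires a shuffle,
  // which is where the scheduler cuts the graph into a new stage.
  def startsNewStage(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stage-sketch").setMaster("local[2]")
    val sc = SparkContext.getOrCreate(conf)

    val rdd1 = sc.parallelize(Seq((1, "a"), (2, "b")))
    val rdd2 = sc.parallelize(Seq((1, "x"), (2, "y")))

    // Same shape as the pipeline on the slide.
    val grouped = rdd1.join(rdd2).groupBy(_._1)
    val filtered = grouped.filter(_._2.nonEmpty)

    println(s"groupBy starts a new stage: ${startsNewStage(grouped)}")
    println(s"filter starts a new stage:  ${startsNewStage(filtered)}")
    // Full lineage; the exact text depends on the Spark version.
    println(filtered.toDebugString)

    sc.stop()
  }
}
```

A filter has a narrow dependency on its parent, so it is pipelined into the same stage as the operation before it.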
Available APIs
• You can write in Java, Scala or Python
• interactive interpreter: Scala & Python only
• standalone applications: any
• performance: Java & Scala are faster thanks to static typing
Hands-on: interpreter
• script
http://cern.ch/kacper/spark.txt

• run the Scala Spark interpreter
$ spark-shell

• or the Python interpreter
$ pyspark
Hands-on: build and submission
• download and unpack source code
wget http://cern.ch/kacper/GvaWeather.tar.gz; tar -xzf GvaWeather.tar.gz
• build definition in
GvaWeather/gvaweather.sbt

• source code
GvaWeather/src/main/scala/GvaWeather.scala

• building
cd GvaWeather
sbt package

• job submission
spark-submit --master local --class GvaWeather \
target/scala-2.10/gva-weather_2.10-1.0.jar
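
For orientation, a build definition that would produce a jar with the name used above might look like the sketch below. This is not the actual content of GvaWeather/gvaweather.sbt. The Scala patch version, the Spark version and the "provided" scope are assumptions.

```scala
// Hypothetical build definition, not the file from the tarball.
// With this name, version and Scala version, "sbt package" writes
// target/scala-2.10/gva-weather_2.10-1.0.jar
name := "gva-weather"

version := "1.0"

// Assumption: any 2.10.x release gives the _2.10 suffix.
scalaVersion := "2.10.6"

// Assumption: Spark 1.6.3. "provided" keeps Spark out of the jar,
// because spark-submit supplies it at run time.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"
```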
Summary
• concept not limited to single-pass map-reduce
• avoid storing intermediate results on disk or HDFS
• speed up computations when reusing datasets
