Apache Spark

The Next Gen Toolset for Big Data processing

▪ Big Data Analysis methodologies

▪ Drawbacks of Map Reduce
▪ Overview of Spark
▪ RDD – A Restricted form of distributed shared memory
▪ Overview of a Spark application

© EduPristine Spark Apache 1

Big Data

▪ Data too large for conventional systems to store and process

▪ Big Data has 3 dimensions:
• 3Vs: Volume, Velocity and Variety
▪ Storage is a challenge because of the sheer size of the data
▪ Processing without failure is a challenge
▪ Processing may take hours, days, weeks or months
▪ A job may fail partway through because of the data size


Big Data Analysis methodologies

▪ Data is generally processed in one of 3 modes

• Batch
– Ad-hoc queries on historical data
– Trend reporting and data analysis queries

• Interactive
– Queries on historical data
– Queries to get answers to a specific question

• Streaming
– Real-time queries
– In-stream analytics and recommendations
– E.g.: processing Facebook news feeds, tweets on Twitter, log parsing, etc.


Drawbacks of MapReduce

▪ MapReduce writes intermediate results to disk.

▪ Writing to and reading from disk is much slower than in-memory processing
▪ In iterative algorithms and data mining workloads, MapReduce is slow because of disk I/O, serialization and replication

[Figure: iterative MapReduce data flow. Input → iter. 1 → iter. 2 → ..., with an HDFS read and an HDFS write around each iteration; each iteration runs Map and Reduce tasks between Input and Output.]


Need for In-Memory Processing

▪ Most machine learning algorithms are iterative: each iteration refines the result of the previous one
▪ With a disk-based approach, each iteration's output is written to disk, which slows the whole job
▪ MapReduce is slow due to replication, serialization and disk I/O
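
The difference between the two approaches can be sketched in plain Python. This is a toy, single-machine illustration only, not MapReduce or Spark code: the names `refine`, `iterate_via_disk` and `iterate_in_memory` are hypothetical, and `pickle` plus a temporary file stand in for serialization and HDFS writes. Both loops compute the same result; the disk-based one pays a serialize, write, read, deserialize round trip on every iteration.

```python
import os
import pickle
import tempfile


def refine(values):
    """One toy iteration: move every value halfway toward the mean."""
    mean = sum(values) / len(values)
    return [v + 0.5 * (mean - v) for v in values]


def iterate_via_disk(values, iterations, workdir):
    """Disk-based style: each iteration's output is serialized to disk and read back."""
    path = os.path.join(workdir, "iter.pkl")  # hypothetical scratch file
    disk_round_trips = 0
    for _ in range(iterations):
        with open(path, "wb") as f:
            pickle.dump(refine(values), f)   # write this iteration's output
        with open(path, "rb") as f:
            values = pickle.load(f)          # next iteration reads it back
        disk_round_trips += 1
    return values, disk_round_trips


def iterate_in_memory(values, iterations):
    """In-memory style: intermediate results never leave RAM."""
    for _ in range(iterations):
        values = refine(values)
    return values


data = [1.0, 2.0, 3.0, 10.0]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, round_trips = iterate_via_disk(data, 5, workdir)
memory_result = iterate_in_memory(data, 5)

print(disk_result == memory_result, round_trips)
```

On a real cluster the per-iteration cost would also include replication of the written blocks, which this sketch does not model.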


Spark – The Ultimate Solution

▪ Hadoop execution flow

▪ Spark execution flow

Source: http://www.wiziq.com/blog/hype-around-apache-spark/

Overview of Spark

▪ Apache Spark is a fast and general-purpose cluster computing system.

▪ It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs.

▪ It also supports a rich set of higher-level tools.

About Apache Spark

▪ “A big data analytics cluster-computing framework written in Scala.”

▪ Open source; originally developed at the AMPLab at UC Berkeley.
▪ Designed to work with data in memory
▪ Designed for running iterative algorithms and interactive analytics
▪ Highly compatible with Hadoop's storage APIs.
▪ Can run on your existing Hadoop cluster setup.
▪ Developers can write driver programs in multiple programming languages (Java, Scala, Python)
▪ Spark can be used programmatically as well as interactively

▪ Provides in-memory analytics that can be up to 100x faster than Hadoop/Hive.

▪ Spark has scalability and fault tolerance features similar to Hadoop MapReduce
▪ Spark is a generalization of the MapReduce model, built on RDDs (Resilient Distributed Datasets)
▪ Spark uses lineage to reconstitute lost data.
▪ Spark is compatible with Hadoop
▪ Spark supports batch, streaming and interactive operations.
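
The lineage idea can be illustrated with a small pure-Python sketch. `ToyRDD` and its methods are hypothetical names invented for this example, not Spark's API, and everything runs in one process. The assumption being modeled is that a dataset remembers the transformations that produced it, so a lost partition can be rebuilt from the stable source data instead of being restored from a replica.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API)."""

    def __init__(self, source_partitions, lineage=()):
        self.source_partitions = source_partitions  # stable input, e.g. file blocks
        self.lineage = lineage                      # recorded transformations, in order
        self.cache = {}                             # partition index -> computed data

    def map(self, fn):
        # Record the step; nothing is computed yet.
        return ToyRDD(self.source_partitions, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self.source_partitions, self.lineage + (("filter", pred),))

    def compute_partition(self, index):
        # Replay the lineage on the source partition if no cached copy exists.
        if index not in self.cache:
            data = list(self.source_partitions[index])
            for kind, fn in self.lineage:
                if kind == "map":
                    data = [fn(x) for x in data]
                else:
                    data = [x for x in data if fn(x)]
            self.cache[index] = data
        return self.cache[index]

    def collect(self):
        out = []
        for i in range(len(self.source_partitions)):
            out.extend(self.compute_partition(i))
        return out

    def lose_partition(self, index):
        # Simulate a node failure that drops one in-memory partition.
        self.cache.pop(index, None)


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

before = even_squares.collect()
even_squares.lose_partition(1)   # partition 1 is gone from memory
after = even_squares.collect()   # rebuilt by replaying map + filter on the source

print(before, after)  # [4, 16, 36] [4, 16, 36]
```

Only the lost partition is recomputed; partition 0 is served from the cache, which is the property that makes lineage cheaper than replicating every intermediate result.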

Spark Applications

▪ Spark applications are similar to MapReduce (MR) jobs.

▪ Each application is a self-contained computation that runs user-supplied code to compute a result.
▪ Like MapReduce jobs, Spark applications can use the resources available in the cluster.
▪ Spark revolves around the concept of a resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel.
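
As a rough illustration of "a collection of elements that can be operated on in parallel", the sketch below splits a list into partitions and processes each partition in a separate worker. It uses Python's standard `concurrent.futures` thread pool as a stand-in for cluster executors; `split_into_partitions` and `partition_sum_of_squares` are hypothetical helper names, and none of this is Spark code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(elements, num_partitions):
    """Deal elements round-robin into num_partitions chunks."""
    return [elements[i::num_partitions] for i in range(num_partitions)]


def partition_sum_of_squares(partition):
    """Work done independently on one partition."""
    return sum(x * x for x in partition)


elements = list(range(1, 11))
partitions = split_into_partitions(elements, 3)

# Each partition is handled by its own worker; results come back in order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(partition_sum_of_squares, partitions))

# Combine the per-partition results into the final answer.
total = sum(partial_sums)
print(partial_sums, total)  # [166, 93, 126] 385
```

The same two-phase shape (independent work per partition, then a combine step) is what lets a cluster spread one computation across many machines.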

What makes Spark powerful?

▪ Behind the scenes, the RDD is what performs Spark's fault-tolerant, in-memory data processing.
▪ RDD stands for resilient distributed dataset.
▪ It is a fault-tolerant abstraction for in-memory cluster computing.

Thank You!

Support@edupristine.com
www.edupristine.com
