Defense

Performance Evaluation of Dynamic Resource Management in Big Data Applications
MASTER THESIS PRESENTATION, JAN HENNING (336291)
Can we achieve predictable performance in Big Data applications?
Outline
Big Data, or why are we doing this?
Hadoop / YARN: The system we work with
Conceptual challenges: What makes it hard?
A glance at the implemented system
Conclusion and outlook: What to draw from it?
The Term Big Data
Not a clearly defined term
Often used for data that is neither big nor complex
We will focus on the technical side: how to process large amounts of data
Large: datasets that cannot be processed by a single machine in reasonable time
Reasonable time in Big Data

Big Data frameworks operate as a black box
Faults on single machines can substantially increase the execution time
It is therefore hard, if not impossible, to predict the end time of a job
Execution time is disconnected from time-sensitive business objectives, e.g.:
We need our processed weather data before the 20:00 news
Why does our recurring daily job take twice as long as usual?
Why are we doing this?
State of the Art
Current mainstream Big Data systems do not provide timing guarantees
Research on solving this problem is ongoing
Two general approaches:
Online: the system learns during runtime
Offline: profiling data is provided beforehand
The Goal
Add timing guarantees to the black box and implement a real-world proof of concept
The System we use
Open-source Big Data framework
Widely used
Black box for the user
What does a job look like?
How does it actually work?
1. Client submits job
2. Resource Manager allocates container
3. Container spawns on Node #X
4. Node Manager launches container
5. Container executes Application Master
This architecture is called YARN: Yet Another Resource Negotiator
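To make the five steps concrete, here is a toy Python model of the submission flow. It is not the YARN API: `Node`, `ResourceManager`, `submit_job`, and the "most free memory" placement rule are hypothetical simplifications; real YARN schedulers use considerably richer placement policies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node managed by a (hypothetical) Node Manager."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy Resource Manager that hands out memory-sized containers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        # Assumption: place the container on the node with the most free memory.
        fitting = [n for n in self.nodes if n.free_mb >= memory_mb]
        if not fitting:
            return None
        node = max(fitting, key=lambda n: n.free_mb)
        node.free_mb -= memory_mb
        return node


def submit_job(rm, am_memory_mb):
    """Walk through the five submission steps; return (node, event log)."""
    log = ["1. client submits job"]
    node = rm.allocate(am_memory_mb)
    if node is None:
        raise RuntimeError("no node has enough free memory for the Application Master")
    log.append(f"2. Resource Manager allocates a {am_memory_mb} MB container")
    log.append(f"3. container spawns on {node.name}")
    node.containers.append("ApplicationMaster")  # Node Manager starts the process
    log.append(f"4. Node Manager on {node.name} launches the container")
    log.append("5. container executes the Application Master")
    return node, log
```

For example, with nodes offering 1024 MB and 4096 MB, `submit_job(rm, 2048)` can only place the Application Master on the larger node, which is left with 2048 MB free.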
Where to Add Dynamic Resource Management?
The Application Master is the central element that controls the job
Implement one that can dynamically influence the job!
Static Job Execution
Dynamic Job Execution
Each symbol represents one measuring point
Approach: Speed up job execution by adding more tasks
Conceptual Challenges
Speeding up job execution is possible by adding more tasks dynamically
When to add more tasks?
Introduce a deadline the system tries to meet
Detect a possible deadline violation as early as possible
How many more tasks to add?
Trade-off: overusing resources vs. not adding enough
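One possible way to answer both questions, sketched in Python under stated assumptions: the Application Master asks a predictor for the job's estimated end time at a given degree of parallelism, reacts only when the current estimate exceeds the deadline, and then picks the smallest parallelism that meets it, bounded by a cap to avoid overusing resources. `tasks_to_add`, `estimate_end`, and `max_parallel` are hypothetical; the predictor is left abstract, and this is not necessarily the policy the thesis implemented (predicting the exact amount of extra resources is listed under the outlook).

```python
from typing import Callable


def tasks_to_add(
    estimate_end: Callable[[int], float],
    deadline: float,
    current_parallel: int,
    max_parallel: int,
) -> int:
    """Return how many parallel tasks to add so the estimated end meets the deadline.

    estimate_end(n) is a hypothetical predictor returning the estimated job end
    time (on the same clock as `deadline`) when n tasks run in parallel.
    """
    # When: only react if the current configuration is predicted to miss the deadline.
    if estimate_end(current_parallel) <= deadline:
        return 0
    # How many: the smallest increase that meets the deadline, never beyond the cap.
    for n in range(current_parallel + 1, max_parallel + 1):
        if estimate_end(n) <= deadline:
            return n - current_parallel
    # Deadline unreachable within the cap: use everything we are allowed to.
    return max(0, max_parallel - current_parallel)
```

With `estimate_end = lambda n: 100 / n`, a deadline of 30, and 2 running tasks, the estimate of 50 misses the deadline; 3 tasks give about 33.3 and 4 give 25, so the function asks for 2 more tasks.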
Implement Deadline Awareness
Core problem: How to tell when the job will presumably end?
Not a feature of Hadoop
We add this functionality to the custom Application Master
General approach:
1. Determine when an individual task will end
2. Extrapolate from that to the whole job
Determine When A Task Will End

Progress Score: basic progress value between 0.0 and 1.0, usually the fraction of processed data
Progress Rate: Progress Score in relation to time (Progress Score / running time of the task)
Use the Progress Rate to calculate how long the task will take to complete
The algorithm is called LATE: Longest Approximate Time to End
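A minimal Python sketch of the LATE estimate, following the definitions above: the progress rate is the progress score divided by the time the task has been running, and the remaining time is the missing progress divided by that rate. The function names and the dictionary input are illustrative assumptions; the last function shows where LATE gets its name, by picking the running task with the longest estimated time to end.

```python
import math


def progress_rate(progress_score: float, elapsed_s: float) -> float:
    """Progress per second: the progress score in relation to running time."""
    return progress_score / elapsed_s


def time_to_end(progress_score: float, elapsed_s: float) -> float:
    """LATE estimate of the remaining seconds for one task."""
    if progress_score <= 0.0:
        # No progress reported yet: the rate is zero, so no finite estimate exists.
        return math.inf
    return (1.0 - progress_score) / progress_rate(progress_score, elapsed_s)


def longest_time_to_end(tasks: dict) -> str:
    """Return the id of the task with the Longest Approximate Time to End.

    `tasks` maps a task id to (progress_score, elapsed_seconds).
    """
    return max(tasks, key=lambda t: time_to_end(*tasks[t]))
```

For example, a task that is 25 % done after 60 s has a rate of about 0.0042 per second and roughly 180 s left.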
Determine When The Job Will End
Approach:
Use runtimes of finished tasks and LATE predictions for running ones
Calculate the speedup through parallelism using Amdahl's Law
Relies on the fact that the Application Master tracks the tasks running in parallel
S_N = 1 / ((1 - P) + P / N)
N: number of tasks running in parallel
P: fraction of the job that can be parallelized
S_N: speedup, e.g. S_N = 2 means the job runs twice as fast
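A sketch of the job-level estimate in Python. `amdahl_speedup` is Amdahl's Law with N and P as defined above. How the task estimates are combined is an assumption made here for illustration: the remaining work is the LATE estimates of the running tasks plus the pending tasks costed at the mean runtime of the finished ones, i.e. the time a single task slot would need; dividing by S_N turns that into wall-clock time. `estimate_remaining_time` and its parameters are hypothetical.

```python
from statistics import mean


def amdahl_speedup(n: int, p: float) -> float:
    """Amdahl's Law: speedup S_N for n parallel tasks and parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)


def estimate_remaining_time(
    finished_runtimes: list,
    running_time_to_end: list,
    pending_tasks: int,
    n_parallel: int,
    p: float,
) -> float:
    """Estimated wall-clock seconds until the job ends (illustrative model)."""
    # Assumption: pending tasks take as long as finished ones did on average.
    # Without any finished task there is no history, so pending work counts as 0.
    avg_runtime = mean(finished_runtimes) if finished_runtimes else 0.0
    # Remaining work if every task ran one after another in a single slot.
    sequential_work = sum(running_time_to_end) + pending_tasks * avg_runtime
    return sequential_work / amdahl_speedup(n_parallel, p)
```

With finished runtimes of 10, 20 and 30 s, two running tasks with 5 and 15 s left, and three pending tasks, the sequential work is 80 s; N = 4 and P = 1.0 give an estimate of 20 s, while P = 0.5 gives S_N = 1.6 and about 50 s.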
Proof of Concept Implementation
The Implemented System

Features of the system:
Dynamically add tasks to a running job to speed it up
Calculate the estimated finishing time of the job
React to a given deadline
Implemented as Application Master
Custom task structure
Replaces Hadoop Map/Reduce Application Master
Therefore mimics Map/Reduce paradigm
Excursion: Map/Reduce
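A minimal in-process Python sketch of the Map/Reduce paradigm, using the classic word-count example: the map phase emits key/value pairs, a shuffle groups them by key, and the reduce phase aggregates each group. It only mimics the data flow; in Hadoop the phases run as distributed tasks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Emit (word, 1) for every word in one input record.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    # Aggregate all values of one key.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For instance, `word_count(["a b a", "b c"])` counts two occurrences each of `a` and `b` and one of `c`.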
The Implemented System: Statistics
Implementation took 3346 lines of code, according to sloccount
Uses asynchronous communication to notify the Application Master about the progress of individual tasks
Initially used a polling-based API, which proved hard to use
Bugs encountered: YARN #3020 takes the cake
Now in a stable state
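A minimal Python sketch of push-based progress reporting, as opposed to polling: task threads push (task id, progress score) updates into a thread-safe queue, and a consumer thread inside the Application Master applies them as they arrive. `ApplicationMaster`, `notify`, and `run_task` are hypothetical names; this is not the YARN client API, and in the real system tasks run in separate containers rather than threads.

```python
import queue
import threading


class ApplicationMaster:
    """Hypothetical AM that reacts to progress updates pushed by tasks."""

    def __init__(self):
        self._updates = queue.Queue()   # thread-safe FIFO
        self._lock = threading.Lock()
        self.progress = {}              # task id -> latest progress score
        self._consumer = threading.Thread(target=self._consume, daemon=True)
        self._consumer.start()

    def notify(self, task_id, score):
        # Called by tasks; returns immediately instead of waiting to be polled.
        self._updates.put((task_id, score))

    def _consume(self):
        while True:
            item = self._updates.get()  # blocks until an update arrives
            if item is None:            # shutdown sentinel
                break
            task_id, score = item
            with self._lock:
                self.progress[task_id] = score
            # A real AM would re-estimate the job end time here.

    def shutdown(self):
        # Updates queued before the sentinel are still processed in FIFO order.
        self._updates.put(None)
        self._consumer.join()


def run_task(am, task_id, chunks):
    for done in range(1, chunks + 1):
        # ... process one chunk of input data ...
        am.notify(task_id, done / chunks)
```

Usage: create an `ApplicationMaster`, start one thread per task running `run_task(am, i, 4)`, join them, then call `am.shutdown()`; afterwards `am.progress` maps every task id to 1.0.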
Static System Behavior
Dynamic System Behavior
Conclusions
The proof-of-concept implementation showed:
Adjusting resources at runtime is possible
End-time prediction of a job is hard but achievable under most circumstances
Covered use cases:
Adding resources at runtime
Predicting the job finishing time
Reacting to a presumably missed deadline
Outlook
Additional improvements to the implementation:
Assign equally sized data chunks to each node
Deal with stragglers
Predict the exact amount of additional resources needed to meet the deadline
Defining a Job with Dynamic Resource Allocation
Custom task interface introduced
Inherit from ReduceTask / MapTask to achieve Map/Reduce-like behavior
The actual workload is implemented in specialized, task-specific classes
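A Python sketch of such a task hierarchy, assuming a hypothetical abstract `Task` base that tracks the progress score used for deadline prediction. `MapTask` and `ReduceTask` reuse the names from the slide, but their methods here are illustrative; the workload classes `MaxTemperatureMap` and `MaxTemperatureReduce` are hypothetical examples echoing the weather-data use case.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Task(ABC):
    """Hypothetical base: every task exposes a progress score in [0.0, 1.0]."""

    def __init__(self):
        self.progress_score = 0.0

    @abstractmethod
    def run(self, data):
        ...


class MapTask(Task):
    def run(self, records):
        output = []
        for i, record in enumerate(records, start=1):
            output.extend(self.map(record))
            self.progress_score = i / len(records)  # fraction of processed data
        self.progress_score = 1.0
        return output

    @abstractmethod
    def map(self, record):
        """Workload hook: turn one record into (key, value) pairs."""


class ReduceTask(Task):
    def run(self, pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        result = {}
        for i, (key, values) in enumerate(groups.items(), start=1):
            result[key] = self.reduce(key, values)
            self.progress_score = i / len(groups)
        self.progress_score = 1.0
        return result

    @abstractmethod
    def reduce(self, key, values):
        """Workload hook: combine all values of one key."""


class MaxTemperatureMap(MapTask):
    def map(self, record):
        station, temperature = record.split(",")
        return [(station, float(temperature))]


class MaxTemperatureReduce(ReduceTask):
    def reduce(self, key, values):
        return max(values)
```

The framework-facing logic (iteration, grouping, progress tracking) lives in the base classes, so a new workload only implements `map` or `reduce`. Feeding `["berlin,3.5", "hamburg,1.0", "berlin,7.25"]` through both tasks yields the maximum temperature per station.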