
Performance Evaluation of Dynamic Resource Management in Big Data Applications
MASTER THESIS PRESENTATION, JAN HENNING (336291)

Can we achieve predictable performance in Big Data applications?

Outline

Big Data, or why are we doing this?

Hadoop / YARN: The system we work with

Conceptual challenges: What makes it hard?

A glance at the implemented system

Conclusion and outlook: What to draw from it?

The Term Big Data

Not a clearly defined term

Often used for data that is neither big nor complex

We will focus on the technical side: how to process large amounts of data

Large: datasets that cannot be processed by a single machine in reasonable time

Reasonable time in Big Data


Big Data frameworks operate as a black box

Faults on single machines can lead to substantial increases in execution time

It is therefore hard, if not impossible, to predict the end time of a job

Disconnected from time-sensitive business objectives:

  We need our processed weather data in time for the 8 p.m. news

  Why does our recurring daily job take twice as long as normal?

Why are we doing this?

State of the Art

Current mainstream Big Data systems do not provide timing guarantees

Solving this problem is an area of ongoing research

Two general approaches:

  Online: the system learns during runtime

  Offline: profiling data is provided beforehand

The Goal

Add timing guarantees to the black box and implement a real-world proof of concept

The System we use


Open-source Big Data framework

Widely used

Black box for the user

What does a job look like?


How does it actually work?

1. Client submits job
2. Resource Manager allocates container
3. Container spawns on Node #X
4. Node Manager launches container
5. Container executes Application Master

This architecture is called YARN: Yet Another Resource Negotiator
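
The five steps can be pictured with a toy model. This is a minimal sketch, not the YARN API: the Python classes, the round-robin placement and all names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    container_id: int
    node: str
    running: str = ""  # name of the process executed inside the container


@dataclass
class NodeManager:
    node: str
    containers: List[Container] = field(default_factory=list)

    def launch(self, container: Container, process: str) -> None:
        # Steps 4-5: the Node Manager launches the container,
        # which then executes the given process.
        container.running = process
        self.containers.append(container)


@dataclass
class ResourceManager:
    node_managers: List[NodeManager]
    next_id: int = 1

    def submit_job(self, job_name: str) -> Container:
        # Step 1: the client calls submit_job.
        # Steps 2-3: allocate a container and place it on a node
        # (round-robin placement is an assumption of this toy model).
        nm = self.node_managers[(self.next_id - 1) % len(self.node_managers)]
        container = Container(self.next_id, nm.node)
        self.next_id += 1
        nm.launch(container, f"ApplicationMaster({job_name})")
        return container
```

Submitting a job to a `ResourceManager` built from two `NodeManager` objects returns a `Container` whose `running` field names the Application Master of that job.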


Where to Add Dynamic Resource Management?

The Application Master is the central element that controls the job

Implement one that can dynamically influence the job!


Static Job Execution


Dynamic Job Execution


Each symbol represents one measurement point

Approach: Speed up job execution by adding more tasks


Conceptual Challenges

Speeding up job execution is possible by adding more tasks dynamically

When to add additional tasks?

  Introduce a deadline the system tries to meet

  Detect possible deadline violations as early as possible

How many more tasks to add?

  Overuse of resources vs. not adding enough
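
One simple way to frame both questions is a decision function that fires only when the estimated end time exceeds the deadline and that caps the total number of tasks. This is a sketch under assumptions: `ScalingPolicy`, its `step` and `max_tasks` values, and `tasks_to_add` are hypothetical names, not part of the presented system.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    step: int = 2        # hypothetical: tasks added per decision
    max_tasks: int = 16  # hypothetical cap to avoid overusing the cluster


def tasks_to_add(now: float, estimated_remaining: float, deadline: float,
                 running_tasks: int, policy: ScalingPolicy) -> int:
    """Return how many tasks to add; all times are in seconds."""
    # "When": only react if the estimated end time lies past the deadline.
    if now + estimated_remaining <= deadline:
        return 0
    # "How many": add a fixed step, but never exceed the configured cap.
    headroom = max(policy.max_tasks - running_tasks, 0)
    return min(policy.step, headroom)
```

A fixed step trades precision for simplicity: it may take several decisions to catch up, but it cannot overshoot the cap.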


Implement Deadline Awareness

Core problem: How to tell when the job will presumably end?

Not a feature of Hadoop

We add this functionality to the custom Application Master

General approach:

1. Determine when an individual task will end
2. Extrapolate from that to the whole job


Determine When A Task Will End


Progress Score: basic progress value between 0.0 and 1.0, usually the fraction of processed data

Progress Rate: Progress Score in relation to elapsed time

Use the Progress Rate to calculate how long the task will take to complete

The algorithm is called LATE: Longest Approximate Time to End
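
The estimate can be sketched as follows, assuming the Progress Rate is the Progress Score divided by the elapsed time and the remaining time is (1 - Progress Score) / Progress Rate. The function names are hypothetical and the constant-rate assumption is a simplification.

```python
def progress_rate(progress_score: float, elapsed_seconds: float) -> float:
    """Progress per second for a task that has run for elapsed_seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return progress_score / elapsed_seconds


def estimated_time_left(progress_score: float, elapsed_seconds: float) -> float:
    """Estimated seconds until the task completes, assuming a constant rate."""
    if not 0.0 <= progress_score <= 1.0:
        raise ValueError("progress_score must lie between 0.0 and 1.0")
    rate = progress_rate(progress_score, elapsed_seconds)
    if rate == 0.0:
        # No progress observed yet, so no finite estimate is possible.
        return float("inf")
    return (1.0 - progress_score) / rate
```

For example, a task that reached a Progress Score of 0.25 after 30 seconds is estimated to need roughly another 90 seconds.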


Determine When The Job Will End

Approach:

  Use runtimes from previous tasks and the LATE prediction for running ones

  Calculate the speedup through parallelism using Amdahl's Law

  Uses the fact that parallel running tasks are tracked by the Application Master

N: number of tasks running in parallel
P: fraction of the job that can be parallelized
Sn: speedup, e.g. a value of 2 means the job runs twice as fast
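
In its standard form, Amdahl's Law reads Sn = 1 / ((1 - P) + P / N). The sketch below computes this speedup and applies it to the summed task runtimes. How the presented system combines measured and predicted runtimes is not spelled out here, so `estimate_job_runtime` is a hypothetical illustration that simply divides the total sequential work by the speedup.

```python
from typing import Sequence


def amdahl_speedup(n_parallel: int, parallel_fraction: float) -> float:
    """Speedup Sn for N parallel tasks and parallelizable fraction P."""
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0.0 and 1.0")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_parallel)


def estimate_job_runtime(task_runtimes: Sequence[float], n_parallel: int,
                         parallel_fraction: float) -> float:
    """Estimated wall-clock runtime in seconds.

    task_runtimes holds measured runtimes of finished tasks and predicted
    runtimes of running ones (assumption: they are simply summed).
    """
    sequential_runtime = sum(task_runtimes)
    return sequential_runtime / amdahl_speedup(n_parallel, parallel_fraction)
```

With four tasks of 60 seconds each, N = 4 and P = 1.0, the estimate is 60 seconds; with P = 0.5 and N = 2 the speedup drops to about 1.33.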


Proof of Concept Implementation

The Implemented System


Features of the system:

  Dynamically add tasks to a running job to speed it up

  Calculate the estimated finishing time of the job

  React to a given deadline

Implemented as Application Master

Custom task structure

Replaces Hadoop Map/Reduce Application Master

Therefore mimics Map/Reduce paradigm


Excursus: Map/Reduce


The Implemented System: Statistics

Took 3346 lines of code to implement, according to sloccount

Uses asynchronous communication to notify the Application Master about the progress of individual tasks

  Initially used a polling-based API

  Proved hard to use

Bugs encountered: YARN #3020 takes the cake

Now in a stable state
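
The push-based idea can be illustrated with a small sketch in which tasks report progress themselves instead of being polled. It does not use the YARN client API: `ProgressTracker`, `notify`, `drain` and `run_task` are hypothetical names, and threads stand in for remote tasks.

```python
import queue
import threading
from typing import Dict


class ProgressTracker:
    """Hypothetical stand-in for the Application Master's progress bookkeeping."""

    def __init__(self) -> None:
        self._updates: "queue.Queue" = queue.Queue()
        self.latest: Dict[str, float] = {}

    def notify(self, task_id: str, progress_score: float) -> None:
        # Called by a task whenever its progress changes (push, not poll).
        self._updates.put((task_id, progress_score))

    def drain(self) -> None:
        # Apply all pending updates without blocking.
        while True:
            try:
                task_id, score = self._updates.get_nowait()
            except queue.Empty:
                return
            self.latest[task_id] = score


def run_task(tracker: ProgressTracker, task_id: str, steps: int) -> None:
    # Simulated task: reports its progress after every step.
    for i in range(1, steps + 1):
        tracker.notify(task_id, i / steps)


tracker = ProgressTracker()
workers = [threading.Thread(target=run_task, args=(tracker, f"task-{n}", 4))
           for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
tracker.drain()
```

Because each task pushes its own updates, the tracker never has to ask every task for its state on a timer; it only processes what has actually changed.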


Static System Behavior


Dynamic System Behavior


Conclusions

Proof-of-concept implementation showed:

  Adjusting resources at runtime is possible

  End-time prediction of a job is hard but achievable under most circumstances

Covered use cases:

  Adding resources at runtime

  Predicting the job finishing time

  Reacting to a deadline that will presumably be missed


Outlook


Additional improvements of the implementation:

  Assign equally sized data chunks to each node

  Deal with stragglers

  Predict the exact amount of additional resources needed to meet the deadline
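
For the last point, one conceivable starting point is to invert Amdahl's Law, Sn = 1 / ((1 - P) + P / N), and solve for N given the speedup required by the deadline. This is only a sketch of that idea, not something the presented system does; `tasks_needed` and its parameters are hypothetical.

```python
import math
from typing import Optional


def tasks_needed(sequential_runtime: float, time_budget: float,
                 parallel_fraction: float) -> Optional[int]:
    """Smallest N whose Amdahl speedup fits the job into time_budget.

    Returns None if no degree of parallelism can meet the budget.
    """
    if sequential_runtime <= 0 or time_budget <= 0:
        raise ValueError("runtimes must be positive")
    required_speedup = sequential_runtime / time_budget
    if required_speedup <= 1.0:
        return 1  # a single task already meets the budget
    # From S = 1 / ((1 - P) + P / N):  N = P / (1 / S - (1 - P))
    slack = 1.0 / required_speedup - (1.0 - parallel_fraction)
    if slack <= 0.0:
        return None  # the serial part alone exceeds the budget
    return math.ceil(parallel_fraction / slack)
```

The `None` case makes the limit explicit: once the non-parallelizable part alone exceeds the remaining time, adding tasks cannot save the deadline.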

Defining a Job with Dynamic Resource Allocation

Custom Task Interface introduced

Inherit from ReduceTask / MapTask to achieve Map/Reduce-like behavior

Actual workload is implemented in specialized, task-specific classes
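
The structure might look roughly like the sketch below. It is a Python rendition for illustration only: the names `MapTask` and `ReduceTask` come from the slide, while the base class `Task`, the method signatures and the word-count classes are assumptions.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Task(ABC):
    """Hypothetical custom task interface."""

    @abstractmethod
    def run(self, data):
        ...


class MapTask(Task):
    @abstractmethod
    def map(self, record):
        """Return a list of (key, value) pairs for one input record."""

    def run(self, records):
        pairs = []
        for record in records:
            pairs.extend(self.map(record))
        return pairs


class ReduceTask(Task):
    @abstractmethod
    def reduce(self, key, values):
        """Combine all values of one key into a single result."""

    def run(self, pairs):
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return {key: self.reduce(key, values) for key, values in grouped.items()}


# Task-specific classes carry the actual workload.
class WordCountMap(MapTask):
    def map(self, line):
        return [(word, 1) for word in line.split()]


class WordCountReduce(ReduceTask):
    def reduce(self, key, values):
        return sum(values)


counts = WordCountReduce().run(WordCountMap().run(["a b a", "b c"]))
```

Here the generic classes only fix the Map/Reduce-like control flow, while `WordCountMap` and `WordCountReduce` contain the job-specific logic.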
