
Apache Hadoop 3.0 in a Nutshell
Munich, Apr. 2017
Sanjay Radia, Junping Du



About Speakers

Sanjay Radia
Chief Architect, Founder, Hortonworks
Part of the original Hadoop team at Yahoo! since 2007
Chief Architect of Hadoop Core at Yahoo!
Apache Hadoop PMC and Committer

Prior
Data center automation, virtualization, Java, HA, OSs, File Systems
Startup, Sun Microsystems, Inria
Ph.D., University of Waterloo

Junping Du
Apache Hadoop Committer & PMC member
Lead Software Engineer @ Hortonworks YARN Core Team
10+ years developing enterprise software (5+ years as a Hadooper)

Why Hadoop 3.0

The driving reasons:
Lots of content in trunk that did not make it into the 2.x branch
Big features that need a stabilizing major release
JDK upgrade (by itself, does not truly require bumping the major number)

Some features taking advantage of 3.0:
YARN: long running services
Ephemeral ports change (incompatible)
Hadoop command scripts rewrite (incompatible)
Erasure codes



Apache Hadoop 3.0

Key Takeaways
HDFS: Erasure codes
YARN: long running services, scheduler enhancements, isolation & Docker, UI
Lots of Trunk content
JDK8 and newer dependent libraries

Release Timeline
3.0.0-alpha1 - Sep 3, 2016
alpha2 - Jan 25, 2017
alpha3 - Q2 2017 (estimated)
Beta/GA - Q3/Q4 2017 (estimated)

Agenda
Hadoop 3.0 Basics - Major changes you should know before upgrading
JDK upgrade
Dependency upgrade
Changes to default ports for daemons/services
Shell script rewrite
Features
Hadoop Common
Client-Side Classpath Isolation
HDFS
Erasure Coding
Support for more than 2 NameNodes
YARN
Support for long running services
Scheduling enhancements: App/Queue priorities, global scheduling, placement strategies
New UI
ATS v2
MAPREDUCE
Task-level native optimization (HADOOP-11264)
Hadoop Operation - JDK Upgrade
Minimum JDK for Hadoop 3.0.x is JDK8 (HADOOP-11858)
Oracle JDK 7 reached EoL in April 2015!

Moving forward to use new features of JDK8:

Lambda expressions (starting to use these)
Stream API
Security enhancements
Performance enhancements for HashMap, IO/NIO, etc.

Hadoop's evolution with JDK upgrades


Hadoop 2.6.x - JDK 6, 7, 8 or later
Hadoop 2.7.x/2.8.x/2.9.x - JDK 7, 8 or later
Hadoop 3.0.x - JDK 8 or later



Dependency Upgrade
Jersey: 1.9 to 1.19
A root element whose content is an empty collection is now serialized as an empty object ({}) instead of null
Grizzly-http-servlet: 2.1.2 to 2.2.21
Guice: 3.0 to 4.0
cglib: 2.2 to 3.2.0
asm: 3.2 to 5.0.4
netty-all: 4.0.23 to 4.1.x (in discussion)
Protocol Buffer: 2.5 to 3.x (in discussion)



Change of Default Ports for Hadoop Services
Previously, the default ports of multiple Hadoop services were in the Linux
ephemeral port range (32768-61000)
Can conflict with other apps running on the same node

New ports (old -> new):
NameNode ports: 50470 -> 9871, 50070 -> 9870, 8020 -> 9820
Secondary NN ports: 50091 -> 9869, 50090 -> 9868
DataNode ports: 50020 -> 9867, 50010 -> 9866, 50475 -> 9865, 50075 -> 9864
KMS service port: 16000 -> 9600
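A quick way to confirm which ports a running 3.0 cluster actually uses is hdfs getconf; a minimal sketch (the keys are standard HDFS config names, and the values in the comments assume the new defaults listed above):

  # NameNode web UI address (default 0.0.0.0:9870 in 3.0)
  hdfs getconf -confKey dfs.namenode.http-address
  # DataNode data-transfer address (default 0.0.0.0:9866 in 3.0)
  hdfs getconf -confKey dfs.datanode.address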



Hadoop Common
Client-Side Classpath Isolation



Client-side classpath isolation
HADOOP-11656 / HADOOP-13070
Problem
Application code's dependencies (including Apache Hive or dependent projects) can conflict with
Hadoop's dependencies
With a single jar file on the classpath, user code that needs a newer commons library and the
Hadoop client that pulls in an older commons library conflict
Solution
Separate server-side jars from client-side jars
Like hbase-client, the client's dependencies are shaded
User code with its newer commons and the shaded Hadoop client can now co-exist, while the
server keeps its older commons - see the sketch below
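A minimal sketch of what this enables (jar versions, the MyHdfsClient class and my-newer-commons.jar are illustrative, assuming the shaded hadoop-client-api / hadoop-client-runtime artifacts produced by this work):

  # Compile and run a client app against only the shaded client jars,
  # so the application is free to bring its own (newer) commons version
  javac -cp hadoop-client-api-3.0.0.jar:hadoop-client-runtime-3.0.0.jar MyHdfsClient.java
  java  -cp .:hadoop-client-api-3.0.0.jar:hadoop-client-runtime-3.0.0.jar:my-newer-commons.jar MyHdfsClient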
HDFS
Support for Three NameNodes for HA
Erasure coding

Current (2.x) HDFS Replication Strategy
Three replicas by default
1st replica on the local node, local rack or a random node
2nd and 3rd replicas on the same remote rack
Reliability: tolerates 2 failures
Good data locality, local short-circuit reads
Multiple copies => parallel IO for parallel compute
Very fast block recovery and node recovery
Parallel recovery - the bigger the cluster, the faster
10TB node recovery: 30 sec to a few hours
3x storage overhead vs 1.4-1.6x for erasure coding
Remember that Hadoop's JBOD is much, much cheaper:
1/10 - 1/20 of SANs
1/10 - 1/5 of NFS
(Diagram: replicas r1, r2, r3 placed on DataNodes across Rack I and Rack II)
Erasure Coding
k data blocks + m parity blocks (k + m)
Example: Reed-Solomon 6+3 - six data blocks b1-b6 plus three parity blocks P1-P3
Reliability: tolerates m failures
Saves disk space
Saves I/O bandwidth on the write path
1.5x storage overhead
Tolerates any 3 failures

                               3-replication       (6, 3) Reed-Solomon
Maximum fault tolerance        2                   3
Disk usage (N bytes of data)   3N                  1.5N
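As a quick sanity check on the table above (the general formula is background, not from the slide): a (k, m) code stores (k + m)/k bytes per byte of data, so

  \text{overhead}_{\mathrm{RS}(6,3)} = \frac{6+3}{6} = 1.5 \qquad \text{vs.} \qquad \text{overhead}_{3\text{-rep}} = \frac{3}{1} = 3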

Block Reconstruction
Block reconstruction overhead:
Higher network bandwidth cost
Extra CPU overhead
Codes that reduce reconstruction cost: Local Reconstruction Codes (LRC), Hitchhiker
(Diagram: blocks b1-b6 and parity blocks P1-P3 spread across nine racks)
Huang et al. Erasure Coding in Windows Azure Storage. USENIX ATC'12.


Sathiamoorthy et al. XORing elephants: novel erasure codes for big data. VLDB 2013.
Rashmi et al. A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers. SIGCOMM'14.
Erasure Coding on Contiguous/Striped Blocks
Two approaches:

EC on contiguous blocks (data blocks b1-b6 with parity blocks P1-P3)
Pros: better for locality
Cons: small files cannot be handled

EC on striped blocks (cells C1, C2, ... striped across 6 data blocks and 3 parity blocks, stripe by stripe)
Pros: leverages multiple disks in parallel
Pros: works for small files
Cons: no data locality for readers

(Diagram: files f1, f2, f3 mapped onto data blocks b1-b6 and parity blocks P1-P3; striped layout showing cells C1-C12 and parity cells PC1-PC6 over stripes 1..n)

Apache Hadoop's decision

Starting with striping to deal with smaller files

Hadoop 3.0.0 implements Phase 1.1 and Phase 1.2

Erasure Coding Zone

Create a zone on an empty directory
Shell command: hdfs erasurecode createZone [-s <schemaName>] <path>
All the files under a zone directory are automatically erasure coded
Renames across zones with different EC schemas are disallowed
(See the usage sketch below.)
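A usage sketch based on the alpha-era command above (paths, schema name and file name are illustrative; the exact flag form may differ per release, and the final Hadoop 3.0 CLI exposes this as "hdfs ec -setPolicy -path <path> -policy <policyName>" instead):

  # Create an empty directory and turn it into an erasure coding zone
  hdfs dfs -mkdir /data/ec-zone
  hdfs erasurecode createZone -s RS-6-3 /data/ec-zone
  # Anything written under it is now erasure coded
  hdfs dfs -put big-dataset.csv /data/ec-zone/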

Write Pipeline for Replicated Files
Write pipeline to DataNodes: Writer -> DN1 -> DN2 -> DN3, with data forwarded down the pipeline and acks returned (DN = DataNode)
Durability
Uses 3 replicas to tolerate a maximum of 2 failures
Visibility
Reads are supported for files that are being written
Data can be made visible by hflush/hsync
Consistency
A client can start reading from any replica and fail over to any other replica to read the same data
Appendable
Files can be reopened for append
Parallel Write for EC Files
Parallel write (stripe size 1MB)
The client writes to a group of 9 DataNodes at the same time
Parity bits are calculated at the client side, at write time
Durability
(6, 3)-Reed-Solomon can tolerate a maximum of 3 failures
Visibility (same as replicated files)
Reads are supported for files that are being written
Data can be made visible by hflush/hsync
Consistency
A client can start reading from any 6 of the 9 replicas
When reading from a DataNode fails, the client can fail over to any other remaining replica to read the same data
Appendable (same as replicated files)
Files can be reopened for append
(Diagram: writer streaming data to DN1-DN6 and parity to DN7-DN9 in parallel, with acks returned from each)

EC: Write Failure Handling
DataNode failure
The client ignores the failed DataNode and continues writing
Able to tolerate 3 failures
Requires at least 6 DataNodes
Missing blocks will be reconstructed later
(Diagram: writer continues streaming data and parity to the remaining DataNodes among DN1-DN9)

Replication:
Slow Writers & Replace-DataNode-on-Failure
Write pipeline for replicated files
A DataNode can be replaced in case of failure
Slow writers
A write pipeline may last for a long time
The probability of DataNode failures increases over time
Need to replace the DataNode on failure
EC files
Do not support replace-datanode-on-failure
Slow writer handling is improved
(Diagram: replicated write pipeline Writer -> DN1 -> DN2 -> DN3, with DN4 standing in for a failed node)

Reading with Parity Blocks
Parallel read
Read from the 6 DataNodes holding data blocks
Supports both stateful read and pread
Block reconstruction
Read parity blocks to reconstruct missing blocks
(Diagram: reader fetching Block1-Block6 from DN1-DN6, reconstructing the missing Block3 from parity block Parity1 on DN7)
Network Traffic - Needs Good Network Bandwidth

Pros
Low latency because of parallel write/read
Good for small-size files
Cons
Requires high network bandwidth between client and servers
Higher reconstruction cost
A dead DataNode implies high network traffic and reconstruction time

Workload        3-replication         (6, 3) Reed-Solomon
Read 1 block    1 LN                  1/6 LN + 5/6 RR
Write 1 block   1 LN + 1 LR + 1 RR    1/6 LN + 1/6 LR + 7/6 RR

LN: Local Node, LR: Local Rack, RR: Remote Rack
YARN
YARN Scheduling Enhancements
Support for Long Running Services
Re-architecture for YARN Timeline Service - ATS v2
Better elasticity and resource utilization
Better resource isolation and Docker!!
Better User Experiences
Other Enhancements

Scheduling Enhancements
Application priorities within a queue: YARN-1963
In Queue A, App1 > App2 (see the CLI sketch after this list)
Inter-Queue priorities
Q1 > Q2 irrespective of demand / capacity
Previously based on unconsumed capacity

Affinity / anti-affinity: YARN-1042


More constraints on container locations
Global Scheduling: YARN-5139
Get rid of scheduling triggered on node heartbeat
Replaced with global scheduler that has parallel threads
Globally optimal placement
Critical for long running services: they stick to their allocation, so it had better be a good one
Enhanced container scheduling throughput (6x)
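A CLI sketch for the in-queue application priority item above (the application ID and priority value are made up for illustration):

  # Raise the priority of a running application within its queue
  yarn application -appId application_1493700000000_0042 -updatePriority 10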

Key Drivers for Long Running Services

Consolidation of Infrastructure
Hadoop clusters have a lot of compute and storage resources (some unused)
Can't I use Hadoop's resources for non-Hadoop load?
OpenStack is hard to run; can I use YARN?
But does it support Docker? Yes, we heard you
Hadoop-related data services that run outside a Hadoop cluster
Why can't I run them in the Hadoop cluster?
Run Hadoop services (Hive, HBase) on YARN
Run multiple instances
Benefit from YARN's elasticity and resource management

Built-in Support for Long Running Services in YARN
A native YARN framework: YARN-4692
Abstracts a common framework (similar to Slider) to support long running services
Simplified API (to manage the service lifecycle)
Better support for long running services

Recognition of long running services
Affects the policies for preemption, container reservation, etc.
Auto-restart of containers
Containers for long running services are retried on the same node when they have local state

Service/application upgrade support: YARN-4726
In general, services are expected to run long enough to cross versions

Dynamic container configuration
Ask only for just enough resources, and adjust them at runtime (memory is harder)
Discovering Services in YARN
Services can run on any YARN node; how do you get their IP?
A service can also move due to node failure

YARN service discovery via DNS: YARN-4757
Exposes existing service information in the YARN registry via DNS
The current YARN service registry's records will be converted into DNS entries
Discovery of container IP and service port via standard DNS lookups (see the lookup sketch below)

Application: zkapp1.user1.yarncluster.com -> 192.168.10.11:8080
Container: 1454001598828-0001-01-00004.yarncluster.com -> 192.168.10.18
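A lookup sketch against the example records above (the hostnames are the slide's illustrations; this assumes the YARN registry DNS server from YARN-4757 is running and your resolver points at it):

  # Resolve a service record exposed through the YARN registry DNS
  dig +short zkapp1.user1.yarncluster.com
  # Resolve a container's hostname
  dig +short 1454001598828-0001-01-00004.yarncluster.com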

A More Powerful YARN
Elastic Resource Model
Dynamic Resource Configuration
YARN-291
Allows tuning a NodeManager's resources up or down at runtime
Graceful decommissioning of NodeManagers
YARN-914
Drains a node that's being decommissioned to allow running containers to
finish (see the command sketch after this list)
Efficient Resource Utilization
Support for container resizing
YARN-1197
Allows applications to change the size of an existing container
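A command sketch for the graceful-decommissioning and dynamic-resource items above (timeout, node ID and resource values are illustrative, and the exact flags may vary slightly between 2.8/3.0 releases):

  # Graceful decommission: after listing the node in the RM exclude file,
  # refresh with a 3600-second drain timeout so running containers can finish
  yarn rmadmin -refreshNodes -g 3600
  # Dynamic resource configuration (YARN-291): adjust a live NodeManager's
  # memory (MB) and vcores without restarting it
  yarn rmadmin -updateNodeResource worker-17:45454 16384 8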

More Powerful YARN (Contd.)
Resource Isolation
Resource isolation support for disk and network
YARN-2619 (disk), YARN-2140 (network)
Containers get a fair share of disk and network resources using Cgroups

Docker support in LinuxContainerExecutor
YARN-3611
Support for launching Docker containers alongside regular process containers (see the launch sketch below)
Packaging and resource isolation
Complements YARN's support for long running services
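A launch sketch using the distributed-shell example app (image name and jar path are illustrative; this assumes the NodeManagers run the LinuxContainerExecutor with the Docker runtime enabled):

  # Run a container inside a Docker image instead of a plain process
  yarn jar hadoop-yarn-applications-distributedshell-*.jar \
    -jar hadoop-yarn-applications-distributedshell-*.jar \
    -shell_command "cat /etc/os-release" \
    -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
    -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 \
    -num_containers 1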

Docker on YARN & YARN on YARN - YCloud

Can use YARN to test Hadoop!!
(Diagram: an outer YARN cluster running Hadoop apps (MR, Tez, Spark) and TensorFlow, plus a nested YARN cluster that itself runs MR, Tez and Spark)
YARN New UI (YARN-3368)

Timeline Service Revolution - Why ATS v2

Scalability & Performance
v1 limitation: single global instance of writer/reader
v1 limitation: local-disk-based LevelDB storage

Reliability
v1 limitation: data is stored on a local disk
v1 limitation: single point of failure (SPOF) for the timeline server

Usability
Handle flows as first-class concepts and model aggregation
Add configuration and metrics as first-class members
Better support for queries

Flexibility
Data model is more describable
Extended to carry more app-specific info
Core Design for ATS v2

Distributed write path
Logical per-app collector + physical per-node writer
Collector/writer launched as an auxiliary service in the NM
Standalone writers will be added later

Separate reader instances

Pluggable backend storage
Built in with a scalable and reliable implementation (HBase)

Enhanced data model
Entity (bi-directional relation) with flow, queue, etc.
Configuration, Metric, Event, etc.

Aggregation & Accumulation
Aggregation: rolling up the metric values to the parent
Online aggregation for apps and flow runs
Offline aggregation for users, flows and queues
Accumulation: rolling up the metric values across time intervals
Accumulated resource consumption for app, flow, etc.
Other YARN work planned in Hadoop 3.X
Resource profiles
YARN-3926
Users can specify resource profile name instead of individual resources
Resource types read via a config file
YARN federation
YARN-2915
Allows YARN to scale out to tens of thousands of nodes
Cluster of clusters which appear as a single cluster to an end user
Gang Scheduling
YARN-624

Thank you!
Reminder: BoFs on Thursday at 5:50pm

