
TRAINING SHEET

CLOUDERA DEVELOPER TRAINING
FOR SPARK & HADOOP
Take your knowledge to the next level

This four-day hands-on training course delivers the key concepts and expertise developers need to develop high-performance parallel applications with Apache Spark 2. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources. Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms. The course covers how to work with large datasets stored in a distributed file system and how to execute Spark applications on a Hadoop cluster. After taking this course, participants will be prepared to face real-world challenges and build applications that deliver faster decisions, better decisions, and interactive analysis, applied to a wide variety of use cases, architectures, and industries. With this course update, we streamlined the agenda to help you quickly become productive with the most important technologies, including Spark 2.

“Cloudera has not only prepared us for success today, but has also trained us to face and prevail over our big data challenges in the future by using Hadoop.”
Persado

Get Hands-On Experience


Hands-on exercises take place on a live cluster running in the cloud. A private cluster is built for each student to use during the class.

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate
the Hadoop ecosystem, learning how to:
_ Distribute, store, and process data in a Hadoop cluster
_ Write, configure, and deploy Spark applications on a cluster
_ Use the Spark shell for interactive data analysis
_ Process and query structured data using Spark SQL (see the sketch after this list)
_ Use Spark Streaming to process a live data stream
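
For a flavor of these exercises, here is a minimal PySpark sketch of querying structured data with Spark SQL. The file path, view name, and column names are hypothetical illustrations, not taken from the course materials:

# Minimal sketch: interactive analysis with Spark SQL in PySpark.
# The input path and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkSQLSketch").getOrCreate()

# Load a structured dataset and register it as a temporary view.
accounts = spark.read.json("/data/accounts.json")
accounts.createOrReplaceTempView("accounts")

# Query the view with SQL, then inspect the result.
top_zips = spark.sql(
    "SELECT zipcode, COUNT(*) AS n FROM accounts "
    "GROUP BY zipcode ORDER BY n DESC LIMIT 10")
top_zips.show()

spark.stop()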

What to Expect
This course is designed for developers and engineers who have programming experience, but
prior knowledge of Hadoop and/or Spark is not required.
_ Apache Spark examples and hands-on exercises are presented in Scala and Python. The
ability to program in one of those languages is required.
_ Basic familiarity with the Linux command line is assumed.
_ Basic knowledge of SQL is helpful.

Get Certified
Upon completion of the course, attendees are encouraged to continue their study and register
for the CCA Spark and Hadoop Developer exam. Certification is a great differentiator. It helps
establish you as a leader in the field, providing employers and customers with tangible evidence
of your skills and expertise.

Course Details: Developer Training for Spark & Hadoop

Introduction to Apache Hadoop and the Hadoop Ecosystem
_ Apache Hadoop Overview
_ Data Ingestion and Storage
_ Data Processing
_ Data Analysis and Exploration
_ Other Ecosystem Tools
_ Introduction to the Hands-On Exercises

Apache Hadoop File Storage
_ Apache Hadoop Cluster Components
_ HDFS Architecture
_ Using HDFS

Distributed Processing on an Apache Hadoop Cluster
_ YARN Architecture
_ Working With YARN

Apache Spark Basics
_ What is Apache Spark?
_ Starting the Spark Shell
_ Using the Spark Shell
_ Getting Started with Datasets and DataFrames
_ DataFrame Operations

Working with DataFrames and Schemas
_ Creating DataFrames from Data Sources
_ Saving DataFrames to Data Sources
_ DataFrame Schemas
_ Eager and Lazy Execution

Analyzing Data with DataFrame Queries
_ Querying DataFrames Using Column Expressions
_ Grouping and Aggregation Queries
_ Joining DataFrames

RDD Overview
_ RDD Overview
_ RDD Data Sources
_ Creating and Saving RDDs
_ RDD Operations

Transforming Data with RDDs
_ Writing and Passing Transformation Functions
_ Transformation Execution
_ Converting Between RDDs and DataFrames

Aggregating Data with Pair RDDs
_ Key-Value Pair RDDs
_ Map-Reduce
_ Other Pair RDD Operations

Querying Tables and Views with Apache Spark SQL
_ Querying Tables in Spark Using SQL
_ Querying Files and Views
_ The Catalog API
_ Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark

Working with Datasets in Scala
_ Datasets and DataFrames
_ Creating Datasets
_ Loading and Saving Datasets
_ Dataset Operations

Writing, Configuring, and Running Apache Spark Applications
_ Writing a Spark Application
_ Building and Running an Application
_ Application Deployment Mode
_ The Spark Application Web UI
_ Configuring Application Properties

Distributed Processing
_ Review: Apache Spark on a Cluster
_ RDD Partitions
_ Example: Partitioning in Queries
_ Stages and Tasks
_ Job Execution Planning
_ Example: Catalyst Execution Plan
_ Example: RDD Execution Plan

Distributed Data Persistence
_ DataFrame and Dataset Persistence
_ Persistence Storage Levels
_ Viewing Persisted RDDs

Common Patterns in Apache Spark Data Processing
_ Common Apache Spark Use Cases
_ Iterative Algorithms in Apache Spark
_ Machine Learning
_ Example: k-means

Apache Spark Streaming: Introduction to DStreams
_ Apache Spark Streaming Overview
_ Example: Streaming Request Count
_ DStreams
_ Developing Streaming Applications

Apache Spark Streaming: Processing Multiple Batches
_ Multi-Batch Operations
_ Time Slicing
_ State Operations
_ Sliding Window Operations
_ Preview: Structured Streaming

Apache Spark Streaming: Data Sources
_ Streaming Data Source Overview
_ Apache Flume and Apache Kafka Data Sources
_ Example: Using a Kafka Direct Data Source
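
To make the streaming chapters concrete, here is a minimal DStream-style sketch in PySpark. The socket source, host and port, and five-second batch interval are illustrative assumptions, not course code:

# Minimal Spark Streaming (DStream) sketch; counts words per micro-batch.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="DStreamSketch")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches (assumed interval)

# Read lines from a text socket (hypothetical host and port).
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print the counts computed for each batch

ssc.start()
ssc.awaitTermination()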
 
 

Cloudera, Inc. 395 Page Mill Road Palo Alto, CA 94306 USA university.cloudera.com
© 2018 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of
Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies.
Information is subject to change without notice. Cloudera_Developer_Spark_Hadoop_Datasheet_103       180404
