Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Big Data
with
Introduction
ETL
Extract Transform Load
^15
1.1259x10
600 TB
• Distributed Architecture
Needed
• Structured / Unstructured
Around 6 hours
Yes.
Most of the existing systems can’t handle it.
X =>
Devices: Application
Connectivity
Smart Phones Social Networks
Wifi, 4G, NFC, GPS
4.6 billion mobile-phones. Internet of Things
1 - 2 billion people accessing the internet.
1. CPU Speed
4. Network
3. HDD or SSD
2. RAM - Speed & Size
Disk Size + Speed
Introduction to Hadoop & Spark
Which Components Impact the Speed of
Computing?
A. CPU
B. Memory Size
C. Memory Read Speed
D. Disk Speed
E. Disk Size
F. Network Speed
G. All of Above
Telecommunications
1.Customer Churn Prevention
2.Network Performance Optimization
3.Calling Data Record (CDR) Analysis
4.Analyzing Network to Predict Failure
1.Apache Hadoop
○ Apache Spark
2.Cassandra
3.MongoDB
4.Google Compute Engine
Compute Engine
NoSQL Datastore
Resource Manager
File Storage
HDFS
HBase
Hive
Dataframes Streaming MLLib GraphX Libraries
Tachyon
Spark Core
Cassandra
Hadoop
Amazon EC2 Standalone Apache Mesos
YARN
Resource/cluster managers