Data Analysis
Statistics
Machine Learning
Software Engineering
Multivariable Calculus and Linear Algebra
Big Data
Data Cleaning
Data Visualization and Communication
How do you define Big Data?
Any collection of data sets so large and complex that it becomes difficult to
process using traditional data processing applications.
(Wikipedia)
Any collection of data sets so large that it becomes difficult to process using
traditional MATLAB functions, which assume all of the data is in memory.
(MATLAB)
Big Data Data Sources
File I/O: Text, Spreadsheet, XML, CDF/HDF, Image, Audio, Video, Geospatial, Web content
Database Access: ODBC, JDBC, HDFS (Hadoop)
Financial Data
Hardware Access: Data acquisition, Image capture, GPU, Lab instruments
Communication Protocols: CAN (Controller Area Network), DDS (Data Distribution Service), OPC (OLE for Process Control), XCP (eXplicit Control Protocol)
Three Dimensions of Scaling
Data
More data, more quickly
Complicated, incomplete, and variable formats
System too complex to know governing equation
Three Dimensions of Scaling - MathWorks Solutions
Data
MATLAB Hadoop interface
Distributed arrays
Scale Your Data
Memory and Data Access: 64-bit processors, Memory Mapped Variables, Disk Variables, Databases, Datastores
Programming Constructs: Streaming, Block Processing, Parallel-for loops, GPU Arrays, SPMD and Distributed Arrays, MapReduce
Platforms: Desktop (Multicore, GPU), Clusters, Cloud Computing (MDCS for EC2), Hadoop
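
Two of the techniques listed on this slide, memory-mapped variables and block processing, can be combined in a few lines. The following is a minimal sketch, not taken from the original slides: the file name, the block size, and the 1,000-element data set are assumptions chosen only so the example runs on any desktop.

```matlab
% Write a small binary file of doubles to stand in for a large on-disk data set.
% (Hypothetical data: the integers 1 to 1000.)
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, 1:1000, 'double');
fclose(fid);

% Map the file into memory; values are paged in only when they are indexed.
m = memmapfile(fname, 'Format', 'double');

% Process the mapped data in fixed-size blocks, keeping a running sum,
% so that only one block is held in a workspace variable at a time.
blockSize = 100;                 % assumed block size for illustration
n = numel(m.Data);
total = 0;
for startIdx = 1:blockSize:n
    stopIdx = min(startIdx + blockSize - 1, n);
    block = m.Data(startIdx:stopIdx);
    total = total + sum(block);
end

% Release the mapping before deleting the temporary file.
clear m
delete(fname);
```

The same loop structure applies when the file is far larger than memory, since each pass touches only `blockSize` elements.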
MATLAB Access Data in HDFS
ds = datastore('hdfs://localhost:9000/datasets/airline/airlinedata.csv');
[Diagram: a MATLAB datastore reading from HDFS, with the data spread across the nodes of a Hadoop cluster]
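
Once a datastore exists, the data can be read and processed one chunk at a time. The sketch below illustrates that pattern under stated assumptions: it uses a small temporary CSV file instead of the HDFS path above so that it runs without a Hadoop installation, and the table contents and the variable names `FlightNum` and `ArrDelay` are made up for illustration.

```matlab
% Build a small CSV file to stand in for the airline data set.
% (Hypothetical data; the real file would live in HDFS.)
T = table((1:10)', [5; -2; NaN; 30; 0; 12; -7; 3; 18; 1], ...
    'VariableNames', {'FlightNum', 'ArrDelay'});
csvFile = [tempname '.csv'];
writetable(T, csvFile);

% Create the datastore and restrict it to the one column we need.
ds = datastore(csvFile);
ds.SelectedVariableNames = {'ArrDelay'};
ds.ReadSize = 4;                 % rows per read, kept small to force several chunks

% Accumulate the mean arrival delay chunk by chunk, skipping missing values.
delaySum = 0;
delayCount = 0;
while hasdata(ds)
    chunk = read(ds);            % table holding at most ReadSize rows
    valid = ~isnan(chunk.ArrDelay);
    delaySum = delaySum + sum(chunk.ArrDelay(valid));
    delayCount = delayCount + nnz(valid);
end
meanDelay = delaySum / delayCount;

delete(csvFile);
```

Only the file location passed to `datastore` would need to change to point the same loop at a file in HDFS.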
MATLAB Distributed Computing Server - Hadoop
[Diagram: MATLAB Distributed Computing Server on a Hadoop cluster. A datastore points at data in HDFS, and MATLAB MapReduce code runs on each node next to the data stored there]
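
The map and reduce steps in the diagram can be tried on the desktop before moving to a cluster. The following is a minimal sketch, assuming a tiny temporary CSV file in place of the HDFS data; the function names `maxDelayMapper` and `maxDelayReducer`, the key `'MaxArrDelay'`, and the sample values are hypothetical. It runs in the local MATLAB session by calling `mapreducer(0)`.

```matlab
% Small stand-in data set (hypothetical values).
T = table({'AA'; 'UA'; 'AA'; 'DL'}, [12; -3; 45; 7], ...
    'VariableNames', {'Carrier', 'ArrDelay'});
csvFile = [tempname '.csv'];
writetable(T, csvFile);

ds = datastore(csvFile);
ds.SelectedVariableNames = {'ArrDelay'};
ds.ReadSize = 2;                 % two rows per chunk, so the mapper runs twice

% Run in the local MATLAB session rather than on a pool or cluster.
mapreducer(0);
result = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, ...
    'OutputFolder', tempdir, 'Display', 'off');

% The result is itself a datastore of key-value pairs.
out = readall(result);
maxDelay = out.Value{1};

delete(csvFile);

function maxDelayMapper(data, ~, intermKVStore)
    % Map: emit the largest delay found in this chunk under a single key.
    add(intermKVStore, 'MaxArrDelay', max(data.ArrDelay));
end

function maxDelayReducer(key, valueIter, outKVStore)
    % Reduce: combine the per-chunk maxima into one overall maximum.
    best = -Inf;
    while hasnext(valueIter)
        best = max(best, getnext(valueIter));
    end
    add(outKVStore, key, best);
end
```

The mapper and reducer do not refer to where the data lives or where the job runs; in this sketch those choices are confined to the `datastore` and `mapreducer` calls.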
Scalable Data Workflow
Easily migrate from desktop to Clusters/Hadoop
Key Takeaways
Resources
Consulting Team
MATLAB for Business Critical Applications
Thank you!