
Big Data Analytics with MATLAB

Dmitrij Martynenko, Application Engineer, The MathWorks Germany

2015 The MathWorks, Inc.


Data Science with MATLAB

Data Analysis
Statistics
Machine Learning
Software Engineering
Multivariable Calculus and Linear Algebra
Big Data
Data Cleaning
Data Visualization and Communication

How do you define Big Data?

Any collection of data sets so large and complex that it becomes difficult to
process using traditional data processing applications.
(Wikipedia)

Any collection of data sets so large that it becomes difficult to process using
traditional MATLAB functions, which assume all of the data is in memory.
(MATLAB)

Big Data: Data Sources

File I/O
Text
Spreadsheet
XML
CDF/HDF
Image
Audio
Video
Geospatial
Web content

Database Access
Financial Data
ODBC
JDBC
HDFS (Hadoop)

Hardware Access
Data acquisition
Image capture
GPU
Lab instruments

Communication Protocols
CAN (Controller Area Network)
DDS (Data Distribution Service)
OPC (OLE for Process Control)
XCP (eXplicit Control Protocol)

Three Dimensions of Scaling

Data
More data, more quickly
Complicated, incomplete, and variable formats
System too complex to know governing equation

People
Share algorithms, protect IP
Web and enterprise

Compute power
Larger, complex problems
Cloud technologies

Three Dimensions of Scaling - MathWorks Solutions

Data
MATLAB Hadoop interface
Distributed arrays

People
MATLAB deployment tools

Compute power
MATLAB parallel computing solutions

Scale Your Data
Memory and Data Access
64-bit processors
Memory Mapped Variables
Disk Variables
Databases
Datastores

Programming Constructs
Streaming
Block Processing
Parallel-for loops
GPU Arrays
SPMD and Distributed Arrays
MapReduce
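
As a rough illustration of the parallel-for and block processing constructs listed above, the sketch below splits a computation into independent blocks and combines the partial results. The variable names (numBlocks, blockSize, partialSums) and the toy workload are assumptions made for this example and do not come from the slides. With Parallel Computing Toolbox the iterations can run on a pool of workers; without it, parfor simply runs the loop serially.

```matlab
% Minimal sketch: block-wise computation with a parallel-for loop.
% numBlocks, blockSize and the sum-of-squares workload are illustrative only.
numBlocks = 8;
blockSize = 1000;
partialSums = zeros(1, numBlocks);

parfor k = 1:numBlocks
    % Each iteration handles its own block and does not depend on the others
    idx = (k-1)*blockSize + (1:blockSize);
    partialSums(k) = sum(idx.^2);
end

% Combine the per-block results after the loop
total = sum(partialSums);
```

The loop body touches only its own slice of partialSums, which is the property that lets MATLAB distribute the iterations across workers.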

Platforms
Desktop (Multicore, GPU)
Clusters
Cloud Computing (MDCS for EC2)
Hadoop

MATLAB: Access Data in HDFS

ds = datastore('hdfs://localhost:9000/datasets/airline/airlinedata.csv');

[Diagram: a MATLAB datastore pointing at HDFS on a Hadoop cluster, where the data is spread across three nodes]

Datastore accesses portions of data stored in HDFS from MATLAB
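
To make "accesses portions of data" concrete, here is a minimal sketch of reading a datastore one chunk at a time so that the full file never has to fit in memory. It assumes the sample file airlinesmall.csv that ships with MATLAB as a local stand-in for the HDFS file on the slide, and the ArrDelay column and the running-mean calculation are choices made for this example only.

```matlab
% Assumption: airlinesmall.csv (a sample file shipped with MATLAB) stands in
% for the HDFS file, so the sketch runs on a desktop without Hadoop.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};   % read only the column we need

totalDelay = 0;
numFlights = 0;
while hasdata(ds)
    chunk = read(ds);                      % one portion of the file, as a table
    valid = ~isnan(chunk.ArrDelay);        % skip missing values
    totalDelay = totalDelay + sum(chunk.ArrDelay(valid));
    numFlights = numFlights + nnz(valid);
end

meanDelay = totalDelay / numFlights;       % mean arrival delay over all chunks
```

Given a configured Hadoop client, the same loop should in principle work against an hdfs:// location such as the one on the slide by changing only the path passed to datastore.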

MATLAB Distributed Computing Server - Hadoop

[Diagram: MATLAB Distributed Computing Server on a Hadoop cluster. A datastore refers to data in HDFS, MapReduce code goes to the cluster, and each of the three nodes runs Map and Reduce on its own data]
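
The map and reduce steps named on this slide can be sketched with MATLAB's mapreduce function. The example below is a sketch under stated assumptions, not the presenter's code: it runs locally on the sample file airlinesmall.csv that ships with MATLAB, and the function names maxArrivalDelay, maxDelayMapper and maxDelayReducer are hypothetical names chosen for illustration. Save it as maxArrivalDelay.m.

```matlab
function result = maxArrivalDelay()
% Find the maximum arrival delay with mapreduce (illustrative sketch).
% Assumption: airlinesmall.csv, a sample file shipped with MATLAB.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';
outds = mapreduce(ds, @maxDelayMapper, @maxDelayReducer);
result = readall(outds);                   % table with Key and Value columns
end

function maxDelayMapper(data, ~, intermKVStore)
% Map: called once per chunk; emit that chunk's maximum delay
partMax = max(data.ArrDelay);              % max ignores NaN values
add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
end

function maxDelayReducer(~, intermValIter, outKVStore)
% Reduce: combine the per-chunk maxima into one overall maximum
maxVal = -inf;
while hasnext(intermValIter)
    maxVal = max(getnext(intermValIter), maxVal);
end
add(outKVStore, 'MaxArrivalDelay', maxVal);
end
```

Calling result = maxArrivalDelay() returns a one-row table whose Key is 'MaxArrivalDelay'. The mapper only ever sees one chunk at a time, which is what allows the same two functions to run on cluster nodes that each hold part of the data.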

Scalable Data Workflow
Easily migrate from desktop to Clusters/Hadoop

Desktop
datastore/mapreduce
Access HDFS
MATLAB (Parallel Computing)

Connected to Clusters
mapreduce on clusters, including Hadoop (HDFS)
MATLAB Distributed Computing Server

Production Clusters
Deploy mapreduce for use on production clusters
MATLAB Compiler

Key Takeaways

Easy access to Big Data from your desktop with MATLAB

Work on the desktop with MATLAB and scale to clusters

Easy deployment into production, including support for Hadoop

Resources

MATLAB MapReduce and Hadoop
http://www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
Or search Google for "MATLAB Hadoop"

Consulting Team
MATLAB for Business Critical Applications

Reach out to your account team

Thank you!
