Agenda
Big data platforms
Relational databases
Analytical databases
Hadoop
10^3 bytes, 10^6 bytes, 10^9 bytes, 10^12 bytes, 10^15 bytes, 10^18 bytes, 10^21 bytes, 10^24 bytes
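The scale above steps up by a factor of one thousand each time. As a small illustration, the sketch below maps each power of ten to its standard SI unit name and expresses a byte count in the largest unit that fits. The unit names are the standard decimal prefixes, not taken from the slide, and `describe` is a hypothetical helper name.

```python
# Standard SI decimal prefixes for the powers of ten listed on the slide.
# The unit names are an addition for illustration; the slide shows only the powers.
SCALE = [
    (10**3, "kilobyte (KB)"),
    (10**6, "megabyte (MB)"),
    (10**9, "gigabyte (GB)"),
    (10**12, "terabyte (TB)"),
    (10**15, "petabyte (PB)"),
    (10**18, "exabyte (EB)"),
    (10**21, "zettabyte (ZB)"),
    (10**24, "yottabyte (YB)"),
]


def describe(num_bytes: int) -> str:
    """Express a byte count in the largest unit that does not exceed it."""
    unit_size, unit_name = 1, "byte"
    for size, name in SCALE:
        if num_bytes >= size:
            unit_size, unit_name = size, name
    return f"{num_bytes / unit_size:g} {unit_name}"


if __name__ == "__main__":
    for power in (3, 12, 15, 24):
        print(f"10^{power} bytes = {describe(10**power)}")
```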
Systems
Movement
a) Lots of data
b) Different types of data
c) More data than you can handle
d) Purpose-built analytical systems
e) Distributed file system
f) New staging area and archive
g) A Java developers employment act
h) A replacement for the RDBMS
i) A club for hip data people
Yes!
Information explosion
Unstructured & Content Depot Structured & Replicated
Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
(Chart axis: 2005 to 2012)
Data deluge
Structured data
Call detail records, point of sale records, claims data
Semi-structured data
Web logs, sensor data, email, Twitter
Unstructured data
Video, Audio, Images, Text
A Sea of Sensors, The Economist, Nov 4, 2010
Structured
Semi-Structured
Unstructured
Operational System
ETL
Operational System
ETL
Data Mart
BI Server
Reports / Dashboards
Operational System
Challenges:
- Cost to deploy and upgrade
- Doesn't support complex analytics
- Scalability and performance
2. Analytical platforms
1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
Purpose-built database management systems, designed explicitly for query processing and analysis, that provide dramatically higher price/performance and availability than general-purpose solutions.
Deployment options:
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)
Game-changing technology
Quicker to deploy
Preconfigured and tuned
Fast ROI
Built-in analytics
Libraries of functions
Extensible SDK
Less costly
Less power, cooling, space
Fewer people to maintain
Analytical appliance
Analytical Database
3. Hadoop
Ecosystem of open source projects
Hosted by Apache Foundation
Google developed and shared concepts
Distributed file system that scales out on commodity servers with direct attached storage and automatic failover
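To make "scales out on commodity servers with automatic failover" more concrete, here is a toy, in-memory sketch of block replication. It is not Hadoop's API: the `ToyDFS` class, the tiny block size, and the round-robin placement are all assumptions made for illustration. The idea it shows is that a file is cut into blocks, each block is copied to several nodes, and a read succeeds as long as one copy of every block sits on a surviving node.

```python
class ToyDFS:
    """Toy model of a replicated, block-based file system (illustrative only)."""

    def __init__(self, nodes, block_size=4, replication=3):
        if replication > len(nodes):
            raise ValueError("replication cannot exceed the number of nodes")
        self.names = list(nodes)
        self.block_size = block_size
        self.replication = replication
        self.storage = {name: {} for name in self.names}  # node -> {(path, index): bytes}
        self.alive = set(self.names)
        self.block_count = {}  # path -> number of blocks

    def write(self, path, data: bytes):
        """Split data into blocks and place each block on `replication` distinct nodes."""
        blocks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        for index, block in enumerate(blocks):
            for r in range(self.replication):
                node = self.names[(index + r) % len(self.names)]  # round-robin placement
                self.storage[node][(path, index)] = block
        self.block_count[path] = len(blocks)

    def fail(self, node):
        """Simulate a server going offline."""
        self.alive.discard(node)

    def read(self, path) -> bytes:
        """Read each block from the first surviving node that holds a copy."""
        out = []
        for index in range(self.block_count[path]):
            for node in self.names:
                if node in self.alive and (path, index) in self.storage[node]:
                    out.append(self.storage[node][(path, index)])
                    break
            else:
                raise IOError(f"block {index} of {path} has no surviving replica")
        return b"".join(out)


if __name__ == "__main__":
    dfs = ToyDFS(["n1", "n2", "n3", "n4"])
    dfs.write("weather.csv", b"wind,temp,pressure")
    dfs.fail("n1")
    dfs.fail("n2")
    print(dfs.read("weather.csv"))  # still readable with two of four nodes down
```

With three copies per block, any two nodes can fail without losing data; losing all three holders of a block makes the read fail, which is why real systems re-replicate blocks after a failure.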
Benefits
Distributed File System
Data scientist
BIG DATA
Open Source $$ MapReduce
Schema at Read
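Two of the benefits listed above, MapReduce and schema at read, can be sketched together in plain Python. This is not Hadoop code: the log format, the sample lines, and every function name below are assumptions for illustration. Raw lines are stored untouched, a structure is imposed only when they are read (`parse`), and the count of hits per path is computed in map, shuffle, and reduce steps.

```python
from collections import defaultdict

# Hypothetical raw web-log lines, stored as-is with no schema enforced at write time.
RAW_LOG_LINES = [
    "2012-04-01 GET /home 200",
    "2012-04-01 GET /pricing 200",
    "2012-04-02 GET /home 404",
    "malformed line",
    "2012-04-02 POST /signup 200",
]


def parse(line):
    """Schema at read: impose fields on the raw text only when it is queried."""
    parts = line.split()
    if len(parts) != 4:
        return None  # lines that do not fit this reader's schema are skipped
    date, method, path, status = parts
    return {"date": date, "method": method, "path": path, "status": int(status)}


def map_phase(lines):
    """Map: emit a (path, 1) pair for every line that parses."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record["path"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def hits_per_path(lines):
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(hits_per_path(RAW_LOG_LINES))
```

In a real cluster the map and reduce steps would run in parallel on the nodes that hold the data; here they run in one process only to show the data flow.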
Drawbacks
No SQL
Hadoop ecosystem
Source: Hortonworks
Vestas
Site wind turbines by modeling larger volumes of weather data
CBS Interactive
Optimize ad placement and pricing
Nokia
Identify new data services
Hadoop hype
Overheard
Hadoop will replace relational databases.
Hadoop will replace data warehouses.
Hadoop has a superior query engine compared to analytical platforms.
Use Hadoop for any application that requires more than one node.
Considering
Experimenting
Implementing
In production
4%
Hadoop workloads
(Chart: Today vs. In 18 Months; extracted values: 92%, 92%, 92%, 92%, 83%, 58%, 42%, 25%, 58%, 92%)
Ad hoc queries
Scheduled reports
Visual exploration
Data mining
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012
Hadoop
Analytic Database
Structured
Semi-Structured
Unstructured
BI Framework 2020
Business Intelligence
End-User Tools
MAD Dashboards
Continuous Intelligence
Content Intelligence
Architecture
MapReduce, XML schema, key-value pairs, graph notation, etc. HDFS, NoSQL databases
Data Warehousing
Data Warehousing
Dashboard Alerts Event-Driven Alerts and Dashboards Event detection and correlation
CEP, Streams
Event-driven
Ad hoc query, Spreadsheets, Ad hoc SQL OLAP, Visual Analysis, Analytic Workbenches, Hadoop Excel, Access, OLAP, Data mining, visual exploration
Analytics Intelligence
Exploration
Power Users
Pros:
- Alignment
- Consistency
Cons:
- Hard to build
- Politically charged
- Hard to change
- Expensive
- Schema heavy
BI Framework
TOP DOWN- Business Intelligence Corporate Objectives and Strategy
Reporting & Monitoring (Casual Users) Data Warehousing Architecture
Predefined Metrics
Non-volatile Data
Ad hoc queries
Volatile Data
Casual User
Operational System
BI Server
Machine Data
Data Warehouse
Hadoop Cluster
Top-down Architecture
Virtual Sandboxes
Web Data
Bottom-up Architecture
In-memory Sandbox
Audio/video Data
External Data
Power User
Analytical sandboxes
Operational Systems (Structured data)
Operational System
Casual User
Operational System
BI Server
Machine Data
Data Warehouse
Hadoop Cluster
Virtual Sandboxes
Web Data
Audio/video Data
External Data
Power User
Workflows
Capture only what's needed
Source Systems
Capture in case it's needed
5. Explore data
9. Report and mine data
Analytical tools
6. Parse, aggregate
Recommendations
Explore applications for multi-structured data
Apply the right tool for the job
RDBMS, Analytical platform, Hadoop, NoSQL
Make power users full-fledged members of your BI environment
Reconcile top-down and bottom-up BI environments
Create an analytical ecosystem!
Questions?
Analytical thought leader
Founder, BI Leadership Forum
Director of Research, TechTarget
Former director of research at TDWI
Author