
The New BI Ecosystem: How Big Data Merges Top-Down and Bottom-Up Computing


Wayne W. Eckerson
Director of Research and Founder, BI Leadership Forum

Agenda
Big data platforms
- Relational databases
- Analytical databases
- Hadoop

New analytical ecosystem

What comes next?


Kilobyte (KB)     10^3 bytes
Megabyte (MB)     10^6 bytes
Gigabyte (GB)     10^9 bytes
Terabyte (TB)     10^12 bytes
Petabyte (PB)     10^15 bytes
Exabyte (EB)      10^18 bytes
Zettabyte (ZB)    10^21 bytes
Yottabyte (YB)    10^24 bytes
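Each unit on this slide is 1,000 times the previous one. As a minimal sketch of that arithmetic, the Python below converts a raw byte count into the largest fitting unit. The helper name `humanize_bytes` and the `UNITS` list are assumptions made for illustration; they are not part of the original deck.

```python
# Decimal storage units from the slide; each is 10**3 times the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def humanize_bytes(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1.

    Hypothetical helper for illustration only.
    """
    if n_bytes < 0:
        raise ValueError("byte count must be non-negative")
    if n_bytes < 10**3:
        return f"{n_bytes} bytes"
    # Count how many factors of 10**3 fit, capped at the largest unit (YB).
    exponent = 0
    while exponent < len(UNITS) and n_bytes >= 10 ** (3 * (exponent + 1)):
        exponent += 1
    value = n_bytes / 10 ** (3 * exponent)
    return f"{value:g} {UNITS[exponent - 1]}"
```

For example, `humanize_bytes(10**15)` reports one petabyte, matching the table.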

What is big data?


Data
a) Lots of data
b) Different types of data
c) More data than you can handle

Systems
d) Purpose-built analytical systems
e) Distributed file system
f) New staging area and archive

Movement
g) A Java developer's employment act
h) A replacement for the RDBMS
i) A club for hip data people

Yes!

Information explosion
[Chart: 2005 to 2012. Series: Unstructured & Content Depot; Structured & Replicated]

Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009

Every 18 months, non-rich structured and unstructured enterprise data doubles.
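To make the doubling claim concrete: if volume doubles every 18 months, the multiplier after t months is 2^(t/18). The sketch below assumes steady exponential growth, which the slide implies but does not state. The function name `growth_factor` is hypothetical.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplier on data volume after `months`, assuming steady doubling.

    Illustrative only: assumes smooth exponential growth at the slide's
    stated doubling period of 18 months.
    """
    return 2.0 ** (months / doubling_period_months)
```

Under this assumption, `growth_factor(36)` is 4.0 (two doublings), and `growth_factor(12) - 1` gives an annual growth rate of roughly 59%.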

Data deluge
Structured data
- Call detail records
- Point of sale records
- Claims data

Semi-structured data
- Web logs
- Sensor data
- Email, Twitter

Unstructured data
- Video, Audio, Images, Text

Source: "A Sea of Sensors," The Economist, Nov 4, 2010

From transactions to observations

Structured

Semi-Structured

Unstructured

Three big data platforms (systems)


1. General purpose relational database
2. Analytical database
3. Hadoop

1. General purpose RDBMS - Powers first generation DW


Benefits:
- RDBMS already in-house
- SQL-based
- Trained DBAs

Challenges:
- Cost to deploy and upgrade
- Doesn't support complex analytics
- Scalability and performance

[Diagram: Operational Systems -> ETL -> Data Warehouse -> ETL -> Data Mart -> BI Server -> Reports / Dashboards]
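The first-generation flow on this slide (operational systems, ETL, data warehouse, data mart, reports) can be sketched in a few lines. This is a toy illustration using Python's built-in `sqlite3` module with an in-memory database standing in for the warehouse. The table names, column names, and the `run_etl` and `report` functions are all assumptions made for the example, not anything from the deck.

```python
import sqlite3


def run_etl(source_rows):
    """Load cleaned source rows into a warehouse table, then build a data mart.

    `source_rows` is an iterable of (region, amount) tuples standing in for
    an operational system. All names here are hypothetical.
    """
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE sales (region TEXT, amount REAL)")

    # Transform: normalize region names and drop rows with a missing amount.
    cleaned = [
        (region.strip().upper(), float(amount))
        for region, amount in source_rows
        if amount is not None
    ]

    # Load into the warehouse table.
    dw.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)

    # Second ETL step: a subject-specific aggregate acting as the data mart.
    dw.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    return dw


def report(dw):
    """Query the data mart the way a BI server would for a simple report."""
    query = "SELECT region, total FROM mart_sales_by_region ORDER BY region"
    return dw.execute(query).fetchall()
```

Note that the schema is fixed before any data is loaded, which is part of why this style of architecture is costly to change.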

2. Analytical platforms
1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)

Purpose-built database management systems, designed explicitly for query processing and analysis, that provide dramatically higher price/performance and availability than general purpose solutions.

Deployment options:
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)

Game-changing technology
Quicker to deploy
- Preconfigured and tuned
- Fast ROI

Faster and more scalable
- Faster query response times
- Linear performance

Built-in analytics
- Libraries of functions
- Extensible SDK

Less costly
- Less power, cooling, space
- Fewer people to maintain

Business value of analytic platforms


Kelley Blue Book
Consolidates millions of auto transactions each week to calculate car valuations

AT&T Mobility
Tracks purchasing patterns for 80M customers daily to optimize targeted marketing

Analytical appliance

Analytical Database

3. Hadoop

- Ecosystem of open source projects
- Hosted by the Apache Software Foundation
- Google developed and shared the concepts
- Distributed file system that scales out on commodity servers with direct-attached storage and automatic failover
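One of the concepts Google published and Hadoop implements is the MapReduce programming model. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process to show the shape of the computation. It does not use Hadoop or any distributed runtime, and every function name here is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; here they simply run one after another.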


Hadoop distilled: What's new?

[Diagram: BIG DATA at the center, surrounded by the items below]
- Unstructured data
- Distributed File System
- Data scientist
- Open Source $$
- MapReduce
- Schema at Read
- No SQL

Benefits
- Comprehensive
- Agile
- Expressive
- Affordable

Drawbacks
- Immature
- Batch oriented
- Expertise
- TCO
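"Schema at Read" means the raw data is stored as-is and a structure is imposed only when someone queries it. The following is a small sketch of that idea, assuming JSON-lines input. The function `read_with_schema` and the field names in the usage note are invented for illustration.

```python
import json


def read_with_schema(raw_lines, schema):
    """Apply a schema while reading raw JSON lines.

    `schema` maps field name -> cast function. Fields missing from a record
    are skipped, and fields not named in the schema are ignored, so the
    stored data never has to be reshaped in advance. Hypothetical helper.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field])
            for field, cast in schema.items()
            if field in record
        }
```

Two analysts can read the same raw lines with different schemas, for example `{"user": str}` for one question and `{"user": str, "ms": int}` for another, without reloading anything. The trade-off is that parsing cost is paid on every read.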


Hadoop ecosystem

Source: Hortonworks

Hadoop use cases


Sabre Holdings
Analyze airline shopping data

Vestas
Site wind turbines by modeling larger volumes of weather data

CBS Interactive
Optimize ad placement and pricing

Nokia
Identify new data services

Hadoop hype
Overheard
- "Hadoop will replace relational databases."
- "Hadoop will replace data warehouses."
- "Hadoop has a superior query engine compared to analytical platforms."
- "Use Hadoop for any application that requires more than one node."

[Figure: Gartner Group Hype Cycle]


Hadoop adoption rates


No plans         38%
Considering      32%
Experimenting    20%
Implementing      5%
In production     4%

Based on 158 respondents, BI Leadership Forum, April 2012

Hadoop workloads
[Chart: Hadoop workloads, Today vs. In 18 Months]

Workloads: Staging area; Online archive; Transformation engine; Ad hoc queries; Scheduled reports; Visual exploration; Data mining

Values: 92%, 92%, 92%, 92%, 83%, 58%, 42%, 25%, 58%, 92%, 67%, 67%, 67%, 83%

Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012

Which platform do you choose?

Hadoop

Analytic Database

General Purpose RDBMS

Structured

Semi-Structured

Unstructured


Big data platform comparison


              RDBMS                  Analytical Database   Hadoop
Purpose       OLTP                   Analytics             Anything
Volume        Low                    Moderate              High
Variety       Relational             Relational+           Variable
Access        SQL                    SQL+                  Java+
Latency       Low                    Moderate              High
Concurrency   High                   Moderate              Low
Cost per GB   High                   Moderate              Low
Role          DW hub or data mart    DW or sandbox         Staging area and archive
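The comparison on this slide can also be read as a lookup structure. As a rough sketch, the Python below encodes the slide's values in a dictionary and filters platforms by trait. The key names (`cost_per_gb` and so on) and the function `platforms_where` are assumptions made for the example; the ratings themselves come from the slide.

```python
# Ratings transcribed from the slide; key names are illustrative.
PLATFORMS = {
    "RDBMS": {
        "purpose": "OLTP", "volume": "Low", "variety": "Relational",
        "access": "SQL", "latency": "Low", "concurrency": "High",
        "cost_per_gb": "High", "role": "DW hub or data mart",
    },
    "Analytical Database": {
        "purpose": "Analytics", "volume": "Moderate", "variety": "Relational+",
        "access": "SQL+", "latency": "Moderate", "concurrency": "Moderate",
        "cost_per_gb": "Moderate", "role": "DW or sandbox",
    },
    "Hadoop": {
        "purpose": "Anything", "volume": "High", "variety": "Variable",
        "access": "Java+", "latency": "High", "concurrency": "Low",
        "cost_per_gb": "Low", "role": "Staging area and archive",
    },
}


def platforms_where(**criteria):
    """Return the platforms whose slide ratings match every given criterion."""
    return [
        name
        for name, traits in PLATFORMS.items()
        if all(traits.get(key) == value for key, value in criteria.items())
    ]
```

For instance, `platforms_where(cost_per_gb="Low")` returns only Hadoop, which is consistent with the slide positioning it as a staging area and archive.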

The New BI Ecosystem


BI Framework 2020

[Diagram: four intelligences, with layers labeled End-User Tools, Design Framework, and Architecture]

Business Intelligence (Reporting & Analysis)
- Reports and Dashboards
- MAD Dashboards
- Data Warehousing

Continuous Intelligence (Event-driven)
- Event-driven alerts and dashboards
- Event detection and correlation
- CEP, Streams

Content Intelligence
- Keyword search, BI tools, XQuery, Hive, Java, etc.
- MapReduce, XML schema, key-value pairs, graph notation, etc.
- HDFS, NoSQL databases

Analytics Intelligence (Exploration, Power Users)
- Ad hoc query, spreadsheets, ad hoc SQL, OLAP, visual analysis, analytic workbenches, Hadoop
- Excel, Access, OLAP, data mining, visual exploration
- Analytic Sandboxes

BI Framework

TOP DOWN - Business Intelligence
Corporate Objectives and Strategy
Reporting & Monitoring (Casual Users)
Data Warehousing Architecture (Schema Heavy)
- Predefined Metrics
- Non-volatile Data
- Reports Beget Analysis

Pros:
- Alignment
- Consistency
Cons:
- Hard to build
- Politically charged
- Hard to change
- Expensive

BOTTOM UP - Analytics
Processes and Projects
Analysis and Prediction (Power Users)
Analytics Architecture (Schema Light)
- Ad hoc queries
- Volatile Data
- Analysis Begets Reports

Pros:
- Quick to build
- Politically uncharged
- Easy to change
- Low cost
Cons:
- Alignment
- Consistency

The new analytical ecosystem


[Diagram: the new analytical ecosystem, divided into a Top-down Architecture and a Bottom-up Architecture]

Data sources:
- Operational Systems (structured data)
- Machine Data
- Web Data
- Audio/video Data
- External Data
- Documents & Text

Processing and storage:
- Extract, Transform, Load (batch, near real-time, or real-time)
- Streaming/CEP Engine
- Data Warehouse
- Hadoop Cluster
- Dept Data Mart
- BI Server

Sandboxes:
- Virtual Sandboxes
- In-memory Sandbox
- Free-Standing Sandbox (analytic platform or nonrelational database)

Users:
- Casual User
- Power User

Analytical sandboxes
[Diagram: the analytical ecosystem from the previous slide, with its sandboxes]
- Virtual Sandboxes
- In-memory Sandbox
- Free-Standing Sandbox (analytic platform or nonrelational database)

Workflows
Capture only what's needed
Source Systems
1. Extract, transform, load
Analytical database (DW)

Capture in case it's needed
5. Explore data
9. Report and mine data
Analytical tools
6. Parse, aggregate

Recommendations
- Explore applications for multi-structured data
- Apply the right tool for the job: RDBMS, Analytical platform, Hadoop, NoSQL
- Make power users full-fledged members of your BI environment
- Reconcile top-down and bottom-up BI environments
- Create an analytical ecosystem!

Questions?
Wayne Eckerson
weckerson@bileadership.com

- Analytical thought leader
- Founder, BI Leadership Forum
- Director of Research, TechTarget
- Former director of research at TDWI
- Author
