
Database Strategies for Modern Business Intelligence and Analytics
David Stodder
Senior Director, TDWI
Business Intelligence
April 26, 2017
DAVID STODDER
Senior Research Director, Business Intelligence, TDWI
Agenda
• Database landscape
• Data warehouse to big data lake
evolution
• NoSQL and different types of
databases
• Conclusion and poll question
• Looker presentation
• Q&A with the audience
The Crowded Database Landscape
A Poly-Database World: How Did
We Get Here?
• Weaknesses of mainstream relational database solutions
– Expensive to grow
– Scalability issues for massive, “fast” data volumes
– Complexity in handling semi- and unstructured data
– Problems as users and use cases grow more diverse
Data Outside the Relational Realm
• Most users continue to work with spreadsheets for data analysis, calculation, forecasting, charting, etc.; CSV files
• XML, JSON, documents
• Flat files
• Multimedia
• BI and OLAP data stores
• Data in ERP, CRM, point solution business applications
• Varying quality, consistency and completeness
BI & DW: Working With Databases
• Traditional BI: Expectation that the data in the DW will be structured
– Making data consistent and complete, particularly for reporting
• Data warehousing above the DBMS: Integrating and optimizing access to multiple sources for analysis
• Data preparation: profiling, quality improvement, transformation (ETL)
– Drawbacks: can be slow, labor-intensive, and inflexible
As the Database Goes, So Goes
the Data Warehouse…
• Database systems and data management “platforms” are the engines of the data warehouse
– DBMS and data platform limitations circumscribe what firms can do with DW
• Data warehouse modernization: Often starts with the underlying platform
– Augmentation
– Automation
– Optimization
– “Rip and replace”
Structured Data Still Dominates

Source: TDWI Best Practices Report on Big Data and Data Science, 2016
DW Still On Top, But Others Rising

Source: TDWI Best Practices Report on Big Data and Data Science, 2016
Analytics: Driving New User Requirements
• Looking beyond reports: What is the desired business outcome? How close are we? Data interaction
• Optimization: Using insights to reduce
waste and improve efficiency in
business processes, such as marketing,
product/service development
• Analyzing real vs. forecast
performance; identifying outliers and
anomalies; improving agility
• Customer insight: Aspiring to
predictive and proactive engagement

Supporting Democratization of BI & Analytics
• Databases must fuel users with data
to “compete on analytics” at all levels
– Expectation that all decisions, strategic,
tactical, and operational, be data-driven
• “One size fits all” BI becoming
outmoded
– Executives, managers, and front-line
workers have distinct needs and require
self-service access, analysis, and
visualization
Demand for Data to Fill in Gaps in
Understanding Customer Lifecycles
• Customer loyalty: When is loyalty most important, i.e., most profitable and aligned with strategic objectives?
• Showing ROI of analytics: identifying cost to business of inadequate info in each phase of customer lifecycle
• See where missed opportunities & unnecessary costs lie: Engaging customer segments at the right time
• Find answers: Without proper data analysis, marketing managers struggle to know:
– When are customers most likely to churn?
– What products or services would prevent churn? When should we offer new/additional products?
– When is it too costly to try to keep certain customers?
From BI/OLAP to Analytics/Data Science
BI/OLAP Data Needs
• Traditionally limited to query and reporting on narrow selection of structured data
• Reporting: Have to know ahead what you want to know; limits on interaction
• Precise data for precise answers: But are the answers relevant to decisions?
• Metadata, schema limits of BI/OLAP: No one best way to define, categorize all information; silos abound
Analytics/Data Science
• Solving unknowns: Asking right questions, not just getting answers
• Investigative, iterative “what-if” data inquiry; questions leading to more questions, with different variables
• Multivariate analysis using multiple types of data
• Data lakes for all the data; “schema-on-read” for analysis across sources – structured, semi-structured, unstructured
Variety of Big Data Analytics and Data Science Use Cases Driving Diversity and Volume
• Analyzing customer behavior
• Marketing personalization, segmentation
• Social media analysis
• Call detail record
• Systems of engagement (ecommerce, real-time bidding,
gaming)
• Fraud detection
• Risk analysis and loan approvals
• Facility/inventory monitoring using IoT
• Preventive maintenance
• IT operations analytics
Data-Intensive Applications: New
Demands on Database and Platforms
• Applications run on analytics: need data availability
• Data management must serve up diverse internal and
external data to feed analytics
– Could include interaction and observational data, machine data (IoT)
• Data-intensive applications: devoting most processing time to
data interaction and manipulation
– Parallel processing critical to performance and scalability
• Need to minimize data movement; executing algorithms
where the data resides; machine learning and automation
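The idea of executing algorithms where the data resides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `partitions` lists are hypothetical stand-ins for data held on separate nodes, and a thread pool stands in for a real parallel engine. Each partition is summarized locally, and only the small `(sum, count)` pairs are brought together.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for data held on one node.
partitions = [
    [12.0, 7.5, 3.0],
    [8.0, 1.5],
    [20.0, 4.0, 6.0, 2.0],
]


def local_summary(rows):
    """Runs next to one partition and returns only a small summary."""
    return (sum(rows), len(rows))


def global_mean(parts):
    # Partitions are summarized in parallel; only (sum, count) pairs move.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(local_summary, parts))
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


mean = global_mean(partitions)
```

The raw rows never leave their partition; the combining step works on one pair per partition regardless of how many rows each one holds.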
MPP and In-Memory for DW/Big Data
• Massively parallel processing
(MPP) databases and appliances
– Teradata, Pivotal Greenplum, IBM PureData (formerly Netezza), Splice Machine
• In-memory databases
– SAP HANA, MemSQL, Exasol,
Redis
The Data Lake: Open and Unstructured
• A “lake” of raw, natural-state data that is not
uniformly modeled and structured
– Set up for data science; BI use evolving
• Structure as needed: late binding, schema-
on-read
• Taking advantage of cheaper storage
(HDFS, cloud storage: Amazon S3, etc.)
• Data security: a work in progress
• Future: logical data warehouse, information
fabrics, hybrid architecture
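A minimal sketch of "structure as needed" (schema-on-read), assuming raw records arrive as JSON lines. The records, the `read_with_schema` helper, and the `revenue_schema` mapping are all hypothetical names introduced for illustration; the point is that the raw data is stored untouched and a schema is applied only when a question is asked.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no upfront schema).
raw_lines = [
    '{"customer": "a1", "amount": "19.90", "channel": "web"}',
    '{"customer": "b2", "amount": 5}',
    '{"customer": "c3", "note": "no purchase yet"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, cast types, default missing values."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# One possible schema for a revenue question; a different analysis could
# read the same raw lines with a different schema.
revenue_schema = {"customer": (str, None), "amount": (float, 0.0)}

rows = list(read_with_schema(raw_lines, revenue_schema))
total = sum(r["amount"] for r in rows)
```

Fields the schema does not name (such as `channel` or `note`) are simply ignored for this read and remain available in the raw data for later use.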
NoSQL: Freeing Up Database Tech
• “Not only SQL” and often based on open source, enabling
tremendous diversity in how databases are set up
• Key focus: using massively parallel, distributed processing platforms and cheaper storage to enable better scalability
• Hadoop, MapReduce and related technologies part of the
NoSQL environment
• Column-oriented (“columnar”): going deep rather than wide
– HPE Vertica, Amazon Redshift, SAP Sybase IQ, Apache HBase, Cassandra; storing data in columns rather than rows for faster retrieval with less I/O to disk; many work with SQL (or variations)
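Why storing data in columns can mean less I/O is easier to see with a toy example. The sketch below is not how any of the products named above are implemented; it uses a hypothetical three-field sales table held in plain Python structures and counts how many stored values each layout has to touch to total one field.

```python
# Hypothetical sales table with three fields, stored two ways.
rows = [
    {"order_id": 1, "region": "west", "amount": 120.0},
    {"order_id": 2, "region": "east", "amount": 80.0},
    {"order_id": 3, "region": "west", "amount": 50.0},
]

# Row-oriented: each record is stored together.
# Column-oriented: each field is stored together.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(table):
    # Reading one field still means touching every value of every record.
    values_read = sum(len(r) for r in table)
    return sum(r["amount"] for r in table), values_read


def total_amount_column_store(cols):
    # Only the one column the query needs is read.
    values_read = len(cols["amount"])
    return sum(cols["amount"]), values_read
```

Both functions return the same total, but in this toy table the row layout touches nine values and the column layout touches three; the gap widens as tables get wider.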
NoSQL: Expanding to New Data Types
• Real-time data, event processing, and streaming
– Aerospike, Vitria
– From call detail records to capturing whole customer
data lifecycle
• Graph databases
– Neo4j, others
• Documents, media, key value, etc.
– MarkLogic, MongoDB, DataStax, Couchbase, Oracle NoSQL
Conclusion: The Future is Hybrid
• Hybrid BI and analytics
– Enabling a broader range of BI/analytics types
– Using the right platforms for analytics workloads and meeting business requirements
• Hybrid data platforms
– Mixing on-premises and cloud platforms
• Hybrid data and analytics architecture aims:
– Flexibility and openness
– Matching workloads with platforms
– Knowledge about the data; access to trusted data
– Fit with skill sets
Thank You!

David Stodder
Director of Research for Business Intelligence
TDWI (www.tdwi.org)
dstodder@tdwi.org
(415) 859-9933

When to Switch?
Haarthi Sadasivam, Technical Product Marketing Manager, Looker
The space and the needs have evolved

• Data Volume is Growing: The growing number of data sources, coupled with a decrease in the price of storage, is prompting companies to collect everything.
• Need for More Access: Self-service analytics tools are seeing increased adoption at many companies.
Increased Usage / High Concurrency
• Internal: Self-service tools drive usage among employees
• External: Companies with embedded analytics provide insights to customers
“We feel very confident that whatever we run into, we will be able to scale the Snowflake solution to meet the performance requirements of our pharmacy customers.”
- John Foss, Director of Business Intelligence and Manufacturer Reporting at PDX
Volume of Data / Low Latency

• Volume of Data: Storing data has become very cheap. Database solutions are separating storage and compute.
• Latency Needs to Decrease: In-database analytics means that queries can’t take inordinate amounts of time.
• MPP Databases become key: Databases that can scale compute power are winning.
“As our business and data keeps growing, we’re confident that BigQuery will continue to deliver a high-quality interactive analytics experience so our teams can spend more time digging for insights rather than waiting for hours to get answers.”
- Jason Jho, Head of Data Engineering at Blue Apron
Types of Data
• Transactional vs. Analytical Data
• Structured vs. Unstructured Data
• Centralization of Disparate Data Sources for Analytics
“There was a point where really, if we hadn’t made this move [to Redshift], we would have plateaued and peaked, and the data team would have been an anchor slowing down the rest of the company.”
- Scott Breitenother, VP of Data and Analytics at Casper
New Database Technologies

• Rise of Cloud Databases: Proliferation of cloud-based offerings
• Proliferation of MPP Columnar Databases: BigQuery, Redshift, and Snowflake are seeing widespread adoption
• Hadoop: HDFS being leveraged for rapid processing of very large datasets
Questions
What do you want to learn more about?
More resources at
@lookerdata
www.looker.com/learn
Poll Question
What is your biggest challenge in matching
BI/analytics needs with databases?
• Skills and training deficiencies
• Data volume growing faster than we can manage it
• Data diversity
• Users have difficulty accessing and interacting with data
• Data security, governance, and management concerns
QUESTIONS?

tdwi.or
CONTACT INFORMATION
If you have further questions or comments:

David Stodder, TDWI


dstodder@tdwi.org @dbstodder

Haarthi Sadasivam, Looker


haarthi@looker.com

Learn More in Chicago!
TDWI Conference
“Modernizing Your Data Ecosystem”
Keynotes, Educational Classes, Networking, and More
Chicago, IL | May 7-12, 2017
http://www.tdwi.org/chicago

TDWI Leadership Summit
“Architecting a Modern Data Ecosystem”
Chicago, IL | May 8-9, 2017
http://www.tdwi.org/chicagosummit

