Eighth Edition
Chapter 8
Data Warehouses, Business
Intelligence Systems, and
Big Data
Copyright © 2018, 2015, 2013 Pearson Education, Inc. All Rights Reserved
Chapter Objectives (1 of 2)
• Learn the basic concepts of data warehouses and data
marts
• Learn the basic concepts of dimensional databases
• Learn the basic concepts of business intelligence (BI)
systems
• Learn the basic concepts of online analytical processing
(OLAP)
• Learn the basic concepts of virtualization and virtual
machines
Chapter Objectives (2 of 2)
• Learn the basic concepts of cloud computing
• Learn the basic concepts of Big Data, structured storage,
and the MapReduce process
Big Data
• The rapidly expanding amount of data being
stored and used in enterprise information systems
• Search Tools:
– Google
– Bing
• Web 2.0 social networks:
– Facebook
– LinkedIn
– Twitter
Storage Capacity Terms
Name       Symbol  Approximate Value for Reference  Actual Value
Byte                                                8 bits [stores one character]
Kilobyte   KB      About 10^3                       2^10 = 1,024 bytes
Megabyte   MB      About 10^6                       2^20 = 1,024 KB
Gigabyte   GB      About 10^9                       2^30 = 1,024 MB
Terabyte   TB      About 10^12                      2^40 = 1,024 GB
Petabyte   PB      About 10^15                      2^50 = 1,024 TB
Exabyte    EB      About 10^18                      2^60 = 1,024 PB
Zettabyte  ZB      About 10^21                      2^70 = 1,024 EB
Yottabyte  YB      About 10^24                      2^80 = 1,024 ZB
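The approximate and actual columns differ because the first uses powers of 10 and the second uses powers of 2. The short Python sketch below computes both for each unit. The names `UNITS` and `unit_sizes` are illustrative and do not come from the slides.

```python
# Unit symbols from the storage capacity table, smallest to largest.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Return {symbol: (approximate, actual)} in bytes for each unit."""
    sizes = {}
    for i, symbol in enumerate(UNITS, start=1):
        approximate = 10 ** (3 * i)   # reference value: a power of 10
        actual = 2 ** (10 * i)        # actual value: a power of 2
        sizes[symbol] = (approximate, actual)
    return sizes


if __name__ == "__main__":
    for symbol, (approximate, actual) in unit_sizes().items():
        # The ratio shows how far the binary value exceeds the decimal one.
        print(f"{symbol}: {actual:,} bytes, ratio {actual / approximate:.3f}")
```

The gap between the two values widens with each step up the table, which is why the decimal figures are only given as "about".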
Heather Sweeney Designs Review:
Database Design
Heather Sweeney Designs Review:
HSD Database Diagram in SQL Server 2016
Business Intelligence Systems
• Business intelligence (BI) systems are information systems
that:
– assist managers and other professionals in the analysis of current and
past activities and in the prediction of future events
– do not support operational activities, such as the recording and
processing of orders (these are supported by transaction processing
systems)
– support management assessment, analysis, planning and control
The Relationship Among
Operational and BI Applications
Characteristics of Business
Intelligence Applications
Data Warehouses and Data Marts
• A data warehouse is a database system that has
data, programs, and personnel that specialize in the
preparation of data for BI processing.
• Data are read from operational databases by the
Extract, Transform, and Load (ETL) system. The ETL
system then cleans and prepares the data for BI
processing.
• This can be a complex process.
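As a rough illustration of the three ETL steps, the Python sketch below reads rows from a stand-in operational database, cleans them, and loads them into a stand-in warehouse table. It uses in-memory SQLite databases, and the table names (`orders`, `order_fact`), columns, and sample rows are all assumptions made for this example, not part of the HSD databases.

```python
import sqlite3


def extract(operational):
    """Extract: read raw rows from the operational database."""
    return operational.execute(
        "SELECT order_id, customer, amount FROM orders").fetchall()


def transform(rows):
    """Transform: tidy customer names and skip rows that have no amount."""
    cleaned = []
    for order_id, customer, amount in rows:
        if amount is None:
            continue  # a real ETL system might log or repair this row instead
        cleaned.append((order_id, customer.strip().title(), float(amount)))
    return cleaned


def load(warehouse, rows):
    """Load: write the prepared rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS order_fact "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO order_fact VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl():
    """Build a tiny operational database, run ETL, and return warehouse rows."""
    operational = sqlite3.connect(":memory:")
    operational.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount TEXT)")
    operational.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "  alice jones ", "19.50"), (2, "BOB SMITH", None), (3, "carol wu", "7")])
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(operational)))
    return warehouse.execute(
        "SELECT * FROM order_fact ORDER BY order_id").fetchall()
```

Production ETL systems add scheduling, error handling, and far richer cleaning rules, which is part of why the process can be complex.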
Components of a
Data Warehouse
Problems with Operational Data
• “Dirty data,” examples include:
– “G” for gender
– “213” for age
• Missing values
• Inconsistent data
– data that has changed (ex: customer’s phone number)
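The three kinds of problems listed above can be detected with simple checks. The Python sketch below is one possible approach. The allowed gender codes, the age limit, and the record layout are assumptions chosen for illustration only.

```python
VALID_GENDERS = {"F", "M"}   # assumed coding scheme for this sketch
MAX_AGE = 120                # assumed plausibility limit


def find_problems(record):
    """Return a list of problem descriptions for one customer record (a dict).

    Ages are assumed to be integers or numeric strings.
    """
    problems = []
    for field in ("gender", "age", "phone"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    gender = record.get("gender")
    if gender not in (None, "") and gender not in VALID_GENDERS:
        problems.append(f"dirty gender: {gender!r}")
    age = record.get("age")
    if age not in (None, "") and not (0 <= int(age) <= MAX_AGE):
        problems.append(f"dirty age: {age!r}")
    return problems


def find_inconsistent_phones(records):
    """Return customer IDs that appear with more than one phone number."""
    phones = {}
    for record in records:
        if record.get("phone"):
            phones.setdefault(record["customer_id"], set()).add(record["phone"])
    return sorted(cid for cid, seen in phones.items() if len(seen) > 1)
```

For example, `find_problems({"gender": "G", "age": "213", "phone": "555-0100"})` flags both the gender code and the age from the slide's examples.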
Data Warehouse Versus Data Marts
• Like a distributor in a supply chain, the data warehouse takes data
from the data manufacturers (operational systems and purchased data),
cleans and processes the data, and places them on its shelves.
• A data mart is a collection of data that is smaller than that in the
data warehouse and that addresses a particular component or
functional area of the business.
• In this chapter's example, the data warehouse takes data from the
data producers and distributes the data to three data marts.
Enterprise Data Warehouse (EDW)
Architecture
• Combines the data warehouse structure
and the data mart structures shown in the
previous slide
• Expensive to create, staff, and operate
• Smaller organizations use subsets of the
EDW architecture
Dimensional Databases
• A non-normalized database structure used for data
warehouses
• May use slowly changing dimensions:
– values that change infrequently, such as a customer's
phone number or address
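One common way to handle a slowly changing dimension is to keep every past value in its own row with effective dates, an approach often called a Type 2 slowly changing dimension. The slides do not say which approach HSD-DW uses, so the Python sketch below is only an illustration, and the row layout (`valid_from`, `valid_to`) is an assumption.

```python
from datetime import date


def apply_change(dimension_rows, customer_id, new_phone, change_date):
    """Close the customer's current row and append a new current row.

    Each row is a dict with customer_id, phone, valid_from, and valid_to.
    valid_to is None for the row that is currently in effect.
    """
    for row in dimension_rows:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["phone"] == new_phone:
                return dimension_rows      # nothing changed
            row["valid_to"] = change_date  # close out the old value
    dimension_rows.append({
        "customer_id": customer_id,
        "phone": new_phone,
        "valid_from": change_date,
        "valid_to": None,
    })
    return dimension_rows


def phone_as_of(dimension_rows, customer_id, when):
    """Return the phone number that was in effect on the given date."""
    for row in dimension_rows:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= when
                and (row["valid_to"] is None or when < row["valid_to"])):
            return row["phone"]
    return None
```

Keeping the old rows lets a BI query report sales against the value that was true at the time of the sale.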
HSD-DW Star Schema
The HSD-DW SQL
Create Table Statements (1 of 2)
The HSD-DW SQL
Create Table Statements (2 of 2)
The HSD-DW Table Data
A Query to Summarize Products Sold
by Customer and Product
• The following SQL code is used to summarize
products sold by Customer and Product
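The slide's SQL figure is not reproduced in this text. As a stand-in, the Python sketch below builds a tiny star schema in an in-memory SQLite database and runs a query of the same general shape: join the fact table to two dimension tables, then group and sum. The table names, column names, and sample rows are assumptions made for illustration and may not match the actual HSD-DW schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE PRODUCT  (ProductNumber TEXT PRIMARY KEY, ProductName TEXT);
    -- Fact table: one row per sale, with foreign keys to the dimension tables.
    CREATE TABLE PRODUCT_SALES (
        CustomerID    INTEGER REFERENCES CUSTOMER(CustomerID),
        ProductNumber TEXT    REFERENCES PRODUCT(ProductNumber),
        Quantity      INTEGER
    );
    INSERT INTO CUSTOMER VALUES (1, 'Customer A'), (2, 'Customer B');
    INSERT INTO PRODUCT  VALUES ('P1', 'Product One'), ('P2', 'Product Two');
    INSERT INTO PRODUCT_SALES VALUES (1, 'P1', 2), (1, 'P1', 1), (1, 'P2', 4), (2, 'P1', 5);
""")

# Summarize units sold for each combination of customer and product.
SUMMARY_SQL = """
    SELECT C.CustomerName, P.ProductName, SUM(S.Quantity) AS TotalQuantity
    FROM   PRODUCT_SALES AS S
           JOIN CUSTOMER AS C ON C.CustomerID = S.CustomerID
           JOIN PRODUCT  AS P ON P.ProductNumber = S.ProductNumber
    GROUP BY C.CustomerName, P.ProductName
    ORDER BY C.CustomerName, P.ProductName
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
```

Each row of `summary` corresponds to one cell of a customer-by-product matrix, with the summed quantity as the cell value.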
Query Results to Summarize Products
Sold by Customer and Product
Figure 8-12: The HSD-DW Query Results Summarize Product Units Sold by
Customer and Product
Two-Dimensional Matrix
Three-Dimensional Matrix
Conformed Dimensions
and the Extended HSD-DW Schema
Figure 8-15: The HSD-DW Star Schema Extended for RFM Analysis
Online Analytical Processing (OLAP)
• Online Analytical Processing (OLAP) is a
technique for dynamically examining
database data:
– OLAP uses arithmetic functions such as Sum
and Average
OLAP Reports
• OLAP systems produce an OLAP report,
also known as an OLAP cube.
• The OLAP report uses inputs called
dimensions.
• The OLAP report calculates outputs called
measures.
• Excel PivotTables can be used to create
OLAP reports
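The relationship between dimensions (inputs) and measures (outputs) can be sketched in a few lines of Python. The sample rows and the `olap_report` function below are hypothetical. They mimic what a PivotTable does when the user chooses which dimensions to group by, and are not how Excel or any OLAP server is implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, product, city, quantity).
SALES = [
    (2016, "P1", "Dallas", 2),
    (2016, "P2", "Dallas", 1),
    (2017, "P1", "Austin", 4),
    (2017, "P1", "Dallas", 3),
]
COLUMNS = {"year": 0, "product": 1, "city": 2}


def olap_report(rows, dimensions, measure="sum"):
    """Group rows by the chosen dimensions and compute one measure per cell."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[COLUMNS[d]] for d in dimensions)
        cells[key].append(row[3])
    if measure == "sum":
        return {key: sum(values) for key, values in cells.items()}
    if measure == "average":
        return {key: sum(values) / len(values) for key, values in cells.items()}
    raise ValueError(f"unknown measure: {measure}")
```

Calling `olap_report(SALES, ["year", "city"])` and then `olap_report(SALES, ["product"])` on the same rows corresponds to rearranging the dimensions of a report dynamically.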
SQL Query for OLAP Data
SQL View for OLAP Data
Results of the
SQL Query for OLAP Data
Figure 8-16: The HSD-DW Query for OLAP Results: Time-Product-Customer Cube
Excel PivotTable
OLAP Report I
Excel PivotTable
OLAP Report II
Figure 8-19: The HSD-DW OLAP City by ProductNumber, Customer, and Year Report
Distributed Database Processing
• A database is distributed when it is:
– partitioned
– replicated
– both partitioned and replicated
• Distributing a database is fairly straightforward for
read-only replicas, but it can be very difficult for other
installations, such as those that update replicated data.
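The difference between partitioning and replication, and the reason updates make replication harder, can be sketched as follows. This is a simplified in-memory illustration in Python with hypothetical function names. Real distributed DBMSs must also coordinate concurrent updates and failures across servers, which this sketch ignores.

```python
def partition(rows, key, node_count):
    """Horizontally partition rows: each row is stored on exactly one node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[row[key] % node_count].append(row)  # assumes an integer key
    return nodes


def replicate(rows, node_count):
    """Replicate rows: every node stores its own full copy."""
    return [list(rows) for _ in range(node_count)]


def update_replicas(replicas, key, key_value, field, new_value):
    """Apply the same update to every copy.

    With read-only replicas this step never happens. Once updates are
    allowed, every copy must be changed together to stay consistent.
    """
    for replica in replicas:
        for i, row in enumerate(replica):
            if row[key] == key_value:
                replica[i] = {**row, field: new_value}
```

A database that is both partitioned and replicated would apply `replicate` to each partition produced by `partition`.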
Types of Distributed Databases
Object-Relational Database
Management
• Object-oriented programming (OOP) is
based on objects, and OOP is now used as
the basis of many computer programming
languages:
– Java
– Python
– C++
– C#
– Visual Basic.NET
Objects
• Object classes have:
– methods: programs that perform some task with the object
– properties: data items particular to an object
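In Python, one of the languages listed earlier, a minimal object class with properties and methods might look like this. The `Customer` class is a hypothetical example, not part of the HSD application.

```python
class Customer:
    """A hypothetical object class with properties and methods."""

    def __init__(self, name, phone):
        # Properties: data items particular to this object.
        self.name = name
        self.phone = phone

    def change_phone(self, new_phone):
        """Method: a program that performs a task with the object."""
        self.phone = new_phone

    def contact_line(self):
        """Method: build a display string from the object's properties."""
        return f"{self.name} <{self.phone}>"
```

Each `Customer` object carries its own property values, while all objects of the class share the same methods.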
Object Persistence
• Object persistence means that values of
the object properties are storable and
retrievable.
• Object persistence can be achieved by
various techniques:
– a main technique is database technology
– relational databases can be used, but require
substantial programming
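The sketch below gives a sense of the programming involved when a relational database is used for object persistence: the developer writes code to map each property to a column when storing the object and back again when retrieving it. It uses Python's built-in `sqlite3` module, and the `Customer` class, table layout, and function names are assumptions made for this example.

```python
import sqlite3


class Customer:
    """A hypothetical object whose property values should persist."""

    def __init__(self, name, phone, customer_id=None):
        self.customer_id = customer_id
        self.name = name
        self.phone = phone


def open_store():
    """Create an in-memory database with one table for Customer objects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
    return conn


def save(conn, customer):
    """Store the object's property values as one row and record its new ID."""
    cursor = conn.execute(
        "INSERT INTO customer (name, phone) VALUES (?, ?)",
        (customer.name, customer.phone))
    customer.customer_id = cursor.lastrowid
    return customer.customer_id


def load(conn, customer_id):
    """Rebuild a Customer object from its stored row, or return None."""
    row = conn.execute(
        "SELECT name, phone FROM customer WHERE customer_id = ?",
        (customer_id,)).fetchone()
    return None if row is None else Customer(row[0], row[1], customer_id)
```

Even this single class needs hand-written save and load code. Objects that contain other objects require considerably more.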
OODBMS
• Object-Oriented DBMSs (OODBMSs) have been
developed, but they never achieved commercial success:
– it would be too expensive to transfer existing
data from relational and other legacy databases
– the OODBMSs were, therefore, not cost
justifiable
Object-Relational DBMSs
• Some relational DBMS vendors have added
object-oriented features to their products.
– Example: Oracle
• These products are known as object-
relational DBMSs and support object-
relational databases.
Virtualization
• Virtualization is using hardware and software to
simulate another hardware resource.
• Virtualization is done by having one physical computer
host one or more virtual computers, also known as
virtual machines.
• The host machine runs a special program known as
a virtual machine manager or hypervisor.
• There are two ways to implement hypervisors:
– “bare metal” or type 1 (used in large data centers)
– “hosted” or type 2 (used by students and others)
Types of Hypervisors
Cloud Computing
• Cloud computing services are ultimately
provided by large data centers.
• Redundant arrays of independent disks (RAID)
can be configured for maximum access speed or
for reliability.
• Three basic ways to lease cloud services:
– Software as a service (SaaS), Ex: Salesforce.com
– Platform as a service (PaaS), Ex: operating systems,
software development tools & system programs provided
– Infrastructure as a service (IaaS), Ex: Only hardware
provided and users manage their own software
NoSQL
• The NoSQL movement promotes the use of
non-relational databases.
• These databases are often classified into
four categories:
– Key-Value (Dynamo, MemcacheDB, Redis)
– Document (Couchbase, MongoDB,
MarkLogic)
– Column family (Vertica, Apache Cassandra)
– Graph (Neo4j, AllegroGraph, Titan)
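To make the four categories concrete, the sketch below models the same small piece of data in each style using plain Python structures. It illustrates the data models only. It does not use or imitate the API of any of the products named above, and all keys and values are made up.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"customer:1": '{"name": "Ann", "city": "Dallas"}'}

# Document: a structured, nested document that can be queried by field.
documents = [
    {"_id": 1, "name": "Ann", "city": "Dallas",
     "orders": [{"product": "P1", "quantity": 2}]},
]

# Column family: each row key maps to named column families,
# and each family holds its own set of columns.
column_family = {
    "customer:1": {
        "profile": {"name": "Ann", "city": "Dallas"},
        "orders": {"P1": 2},
    },
}

# Graph: nodes plus labeled edges between them.
graph = {
    "nodes": {"ann": {"type": "customer"}, "p1": {"type": "product"}},
    "edges": [("ann", "BOUGHT", "p1")],
}


def neighbors(graph, node, label):
    """Follow edges with the given label out of a node."""
    return [dst for src, lbl, dst in graph["edges"]
            if src == node and lbl == label]
```

The trade-off runs from the simplest lookups (key-value) to the richest relationship queries (graph).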
Column Family Database
Storage System
The MapReduce Process
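The slide's figure is not reproduced here. As a stand-in, the classic word-count example below sketches the general shape of a MapReduce job in Python: a map step emits key-value pairs, the pairs are grouped by key, and a reduce step combines each group. It runs on a single machine and the function names are illustrative. On a real cluster, the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text chunks."""
    pairs = []
    for document in documents:  # on a cluster, chunks are mapped in parallel
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))
```

For example, `word_count(["big data", "Big Data tools"])` counts "big" and "data" twice each and "tools" once.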
Hadoop
• The Hadoop Distributed File System (HDFS)
provides standard file services to clustered
servers so that their file systems can function as
one distributed, replicated file system that
can support large-scale MapReduce
processing.
• All major DBMS vendors support
Hadoop.