
Database Concepts

Eighth Edition

Chapter 8
Data Warehouses, Business Intelligence Systems, and Big Data

Copyright © 2018, 2015, 2013 Pearson Education, Inc. All Rights Reserved
Chapter Objectives (1 of 2)
• Learn the basic concepts of data warehouses and data
marts
• Learn the basic concepts of dimensional databases
• Learn the basic concepts of business intelligence (BI)
systems
• Learn the basic concepts of online analytical processing
(OLAP)
• Learn the basic concepts of virtualization and virtual
machines

Chapter Objectives (2 of 2)
• Learn the basic concepts of cloud computing
• Learn the basic concepts of Big Data, structured storage,
and the MapReduce process

Big Data
• The rapidly expanding amount of data being
stored and used in enterprise information systems
• Search Tools:
– Google
– Bing
• Web 2.0 social networks:
– Facebook
– LinkedIn
– Twitter

Storage Capacity Terms
Name        Symbol   Approximate Value for Reference   Actual Value
Byte                                                   8 bits [Store one character]
Kilobyte    KB       About 10^3                        2^10 = 1,024 bytes
Megabyte    MB       About 10^6                        2^20 = 1,024 KB
Gigabyte    GB       About 10^9                        2^30 = 1,024 MB
Terabyte    TB       About 10^12                       2^40 = 1,024 GB
Petabyte    PB       About 10^15                       2^50 = 1,024 TB
Exabyte     EB       About 10^18                       2^60 = 1,024 PB
Zettabyte   ZB       About 10^21                       2^70 = 1,024 EB
Yottabyte   YB       About 10^24                       2^80 = 1,024 ZB

Figure 8-1: Storage Capacity Terms
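
The gap between the approximate (power of 10) and actual (power of 2) values in Figure 8-1 widens with each larger unit. A minimal Python sketch makes the arithmetic concrete; the names UNITS and unit_sizes are chosen here for illustration only.

```python
# Unit symbols from Figure 8-1, in increasing order of size.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_sizes():
    """Return {symbol: (approximate_bytes, actual_bytes)} for each unit."""
    sizes = {}
    for step, symbol in enumerate(UNITS, start=1):
        approximate = 10 ** (3 * step)   # 10^3, 10^6, ...
        actual = 2 ** (10 * step)        # 2^10, 2^20, ...
        sizes[symbol] = (approximate, actual)
    return sizes

for symbol, (approximate, actual) in unit_sizes().items():
    # Percentage by which the actual value exceeds the approximation.
    excess = (actual - approximate) / approximate * 100
    print(f"{symbol}: actual value is {excess:.1f}% larger than the approximation")
```

The difference is 2.4% for a kilobyte and grows to roughly 21% for a yottabyte.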

Heather Sweeney Designs Review:
Database Design

Heather Sweeney Designs Review:
HSD Database Diagram in SQL Server 2016

Figure 8-2: The HSD Database Diagram

Business Intelligence Systems
• Business intelligence (BI) systems are information systems
that:
– assist managers and other professionals in the analysis of current and
past activities and in the prediction of future events
– do not support operational activities, such as the recording and
processing of orders
  ▪ these are supported by transaction processing systems
– support management assessment, analysis, planning, and control

• BI systems fall into two broad categories:
– reporting systems that sort, filter, group, and make elementary
calculations on operational data
– data mining applications that perform sophisticated analyses on
data; analyses that usually involve complex statistical and
mathematical processing

The Relationship Among
Operational and BI Applications

Figure 8-3: The Relationship Between Operational and BI Applications

Characteristics of Business
Intelligence Applications

Figure 8-4: Characteristics of Business Intelligence Applications

Data Warehouses and Data Marts
• A data warehouse is a database system that has
data, programs, and personnel that specialize in the
preparation of data for BI processing.
• Data are read from operational databases by the
Extract, Transform, and Load (ETL) system. The ETL
system then cleans and prepares the data for BI
processing.
• This can be a complex process.

Components of a
Data Warehouse

Figure 8-5: Components of a Data Warehouse

Problems with Operational Data
• “Dirty data”; examples include:
– “G” for gender
– “213” for age
• Missing values
• Inconsistent data
– data that have changed (ex: a customer’s phone number)
• Nonintegrated data (data from multiple sources)
• Incorrect format (ex: too many or not enough digits)
• Too much data (ex: an excess number of columns)
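
Several of these problems can be detected with simple rule checks before the data are loaded. The following Python sketch is illustrative only: the record layout, the valid gender codes, the age limit, and the 10-digit phone format are all assumptions made for this example.

```python
# Hypothetical operational records; field names and values are illustrative.
records = [
    {"customer_id": 1, "gender": "F", "age": 34, "phone": "206-555-0100"},
    {"customer_id": 2, "gender": "G", "age": 213, "phone": "555-0100"},
    {"customer_id": 3, "gender": "M", "age": None, "phone": "425-555-0199"},
]

VALID_GENDERS = {"F", "M"}   # assumed code list for this sketch
MAX_AGE = 120                # assumed plausibility limit

def find_problems(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if record["gender"] not in VALID_GENDERS:
        problems.append("dirty: gender")
    if record["age"] is None:
        problems.append("missing: age")
    elif not 0 <= record["age"] <= MAX_AGE:
        problems.append("dirty: age")
    # Assumed format: a 10-digit phone number once punctuation is removed.
    digits = [ch for ch in record["phone"] if ch.isdigit()]
    if len(digits) != 10:
        problems.append("format: phone")
    return problems

report = {r["customer_id"]: find_problems(r) for r in records}
print(report)
```

Inconsistent and nonintegrated data are harder to catch this way, because detecting them requires comparing records across time or across sources.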
ETL Data Transformation
• Data may need to be transformed for use in a data
warehouse. For example:
– {CountryCode → CountryName}
  ▪ “US” → “United States”
– Email address to email domain
  ▪ joe@somewhere.com → “somewhere.com”
• The ETL system loads the data into the data warehouse
database.
• The extracted data are stored in a data warehouse
database, using a data warehouse DBMS, which may be
from a different vendor than the organization’s operational
DBMS.
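
The two transformations named above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the COUNTRY_NAMES lookup, the transform function, and the row layout are assumptions introduced for the example.

```python
# Assumed lookup table; a real ETL system would load this from reference data.
COUNTRY_NAMES = {"US": "United States", "CA": "Canada"}

def transform(row):
    """Transform one extracted row into its data warehouse form."""
    return {
        "CustomerID": row["CustomerID"],
        # CountryCode -> CountryName; unknown codes are kept for later review.
        "CountryName": COUNTRY_NAMES.get(row["CountryCode"], row["CountryCode"]),
        # Email address -> email domain (the text after the last "@").
        "EmailDomain": row["Email"].rsplit("@", 1)[-1].lower(),
    }

extracted = [{"CustomerID": 1, "CountryCode": "US", "Email": "joe@somewhere.com"}]
loaded = [transform(row) for row in extracted]
print(loaded)
```

Keeping only the email domain also reduces the amount of personal detail carried into the warehouse while still supporting analysis by domain.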
Characteristics of a
Data Mart

Figure 8-6: Data Warehouse and Data Marts

Data Warehouse Versus Data Marts
• The data warehouse takes data from the data manufacturers
(operational systems and purchased data), cleans and
processes them, and places the data on its shelves, much as a
distributor does in a supply chain.
• A data mart is a collection of data that is smaller than that in the
data warehouse and that addresses a particular component or
functional area of the business.
• In Figure 8-6, the data warehouse takes data from the data
producers and distributes the data to three data marts.

Enterprise Data Warehouse (EDW)
Architecture
• Combines the data warehouse structure
and the data mart structures shown in the
previous slide
• Expensive to create, staff, and operate
• Smaller organizations use subsets of the
EDW architecture

Dimensional Databases
• A non-normalized database structure used for data
warehouses
• May use slowly changing dimensions:
– values change infrequently
  ▪ phone number
  ▪ address
• Uses a date or time dimension

Operational Database                               Dimensional Database
Used for structured transaction data processing    Used for unstructured analytical data processing
Current data are used                              Current and historical data are used
Data are inserted, updated, and deleted by users   Data are loaded and updated systematically, not by users

Figure 8-7: Characteristics of Operational and Dimensional Databases


Star Schema

Figure 8-8: A Star Schema

HSD-DW Star Schema

Figure 8-9: The HSD-DW Star Schema

The HSD-DW SQL
Create Table Statements (1 of 2)

Figure 8-10(1 of 2): The HSD-DW SQL Create Table Statements

The HSD-DW SQL
Create Table Statements (2 of 2)

Figure 8-10(2 of 2): The HSD-DW SQL Create Table Statements

The HSD-DW Table Data

Figure 8-11: The HSD-DW Table Data

A Query to Summarize Products Sold
by Customer and Product
• The following SQL code is used to summarize
products sold by Customer and Product
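
As an illustration of this kind of summary, the sketch below builds a tiny star schema in an in-memory SQLite database from Python and totals units sold by customer and product. It is not the slide's own SQL: the table names, the reduced column lists, and the sample rows are assumptions made for this example, with only ProductNumber, CustomerID, CustomerName, City, and Year taken from the figure captions in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
CREATE TABLE PRODUCT  (ProductNumber TEXT PRIMARY KEY, ProductName TEXT);
CREATE TABLE TIMELINE (TimeID INTEGER PRIMARY KEY, Year INTEGER);
-- Fact table: one row per combination of the three dimensions.
CREATE TABLE PRODUCT_SALES (
    TimeID        INTEGER REFERENCES TIMELINE(TimeID),
    CustomerID    INTEGER REFERENCES CUSTOMER(CustomerID),
    ProductNumber TEXT    REFERENCES PRODUCT(ProductNumber),
    Quantity      INTEGER,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber)
);
INSERT INTO CUSTOMER VALUES (1, 'Customer A', 'Seattle'), (2, 'Customer B', 'Dallas');
INSERT INTO PRODUCT  VALUES ('P1', 'Video'), ('P2', 'Book');
INSERT INTO TIMELINE VALUES (1, 2017), (2, 2018);
INSERT INTO PRODUCT_SALES VALUES (1, 1, 'P1', 2), (2, 1, 'P1', 1), (1, 2, 'P2', 4);
""")

# Join the fact table to two dimension tables and total the measure.
SUMMARY_SQL = """
SELECT   C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName,
         SUM(PS.Quantity) AS TotalQuantity
FROM     CUSTOMER AS C
JOIN     PRODUCT_SALES AS PS ON C.CustomerID = PS.CustomerID
JOIN     PRODUCT AS P        ON P.ProductNumber = PS.ProductNumber
GROUP BY C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName
ORDER BY C.CustomerID, P.ProductNumber;
"""
summary = conn.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)
```

The fact table sits at the center of the star and holds the measure (Quantity); each dimension table is reached through a single join, which keeps analytical queries of this kind simple.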

Query Results to Summarize Products
Sold by Customer and Product

Figure 8-12: The HSD-DW Query Results Summarize Product Units Sold by Customer and Product

Two-Dimensional Matrix

Figure 8-13: The Two-Dimensional ProductNumber-CustomerID Matrix

Three-Dimensional Matrix

Figure 8-14: The Three-Dimensional Time-ProductNumber-CustomerID Cube

Conformed Dimensions
and the Extended HSD-DW Schema

Figure 8-15: The HSD-DW Star Schema Extended for RFM Analysis

Online Analytical Processing (OLAP)
• Online Analytical Processing (OLAP) is a
technique for dynamically examining
database data:
– OLAP uses arithmetic functions such as Sum
and Average

OLAP Reports
• OLAP systems produce an OLAP report,
also known as an OLAP cube.
• The OLAP report uses inputs called
dimensions.
• The OLAP report calculates outputs called
measures.
• Excel PivotTables can be used to create
OLAP reports.
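
The relationship between dimensions (inputs) and measures (outputs) can be sketched without any OLAP product. In this minimal Python example the fact rows, the DIMENSIONS mapping, and the olap_report function are hypothetical; the measure is the sum of Quantity, and swapping the chosen dimensions produces a different report from the same data.

```python
from collections import defaultdict

# Hypothetical fact rows: (Year, City, ProductNumber, Quantity).
facts = [
    (2017, "Seattle", "P1", 2),
    (2018, "Seattle", "P1", 1),
    (2017, "Dallas", "P2", 4),
    (2018, "Dallas", "P1", 3),
]
# Position of each dimension within a fact row.
DIMENSIONS = {"Year": 0, "City": 1, "ProductNumber": 2}

def olap_report(rows, row_dim, col_dim):
    """Sum the Quantity measure for each (row_dim, col_dim) pair."""
    r, c = DIMENSIONS[row_dim], DIMENSIONS[col_dim]
    cells = defaultdict(int)
    for row in rows:
        cells[(row[r], row[c])] += row[3]
    return dict(cells)

by_product_city = olap_report(facts, "ProductNumber", "City")
by_city_year = olap_report(facts, "City", "Year")
print(by_product_city)
print(by_city_year)
```

Changing the dimensions on the fly, as the two calls do, is the dynamic aspect of OLAP that a PivotTable exposes through drag and drop.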
SQL Query for OLAP Data

SQL View for OLAP Data

Results of the
SQL Query for OLAP Data

Figure 8-16: The HSD-DW Query for OLAP Results: Time-Product-Customer Cube

Excel PivotTable
OLAP Report I

Figure 8-17: The HSD-DW OLAP ProductNumber by City Report

Excel PivotTable
OLAP Report II

Figure 8-18: The HSD-DW OLAP ProductNumber by City Report: CustomerName and Year Dimension Added

Excel PivotTable
OLAP Report III

Figure 8-19: The HSD-DW OLAP City by ProductNumber, Customer, and Year Report

Distributed Database Processing
• A database is distributed when it is:
– partitioned
– replicated
– both partitioned and replicated
• Distributed processing is fairly straightforward for
read-only replicas, but it can be very difficult for other
installations.
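
The difference between partitioning and replication can be sketched in Python. The server names, the row layout, and the modulo placement rule are assumptions made for this example, not a description of any particular DBMS.

```python
# Hypothetical table rows keyed by CustomerID.
rows = [{"CustomerID": i, "Name": f"Customer {i}"} for i in range(1, 7)]
SERVERS = ["server_a", "server_b"]   # assumed server names

def partition(table, servers):
    """Horizontal partitioning: each row is stored on exactly one server."""
    placement = {s: [] for s in servers}
    for row in table:
        target = servers[row["CustomerID"] % len(servers)]
        placement[target].append(row)
    return placement

def replicate(table, servers):
    """Replication: every server stores a full copy of the table."""
    return {s: list(table) for s in servers}

partitioned = partition(rows, SERVERS)
replicated = replicate(rows, SERVERS)
print({s: [r["CustomerID"] for r in p] for s, p in partitioned.items()})
print({s: len(copy) for s, copy in replicated.items()})
```

With replication, any update must reach every copy, which is one reason updatable replicas are much harder to manage than read-only ones.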

Types of Distributed Databases

Figure 8-20: Types of Distributed Databases

Object-Relational Database
Management
• Object-oriented programming (OOP) is
based on objects, and OOP is now used as
the basis of many computer programming
languages:
– Java
– Python
– C++
– C#
– Visual Basic .NET

Objects
• Object classes have:
– methods
  ▪ these are programs that perform some task with the
object
– properties
  ▪ these are data items particular to an object
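
A short Python class shows both parts. The Customer class and its members are hypothetical names chosen for this sketch.

```python
class Customer:
    """A hypothetical object class with properties and a method."""

    def __init__(self, name, email):
        # Properties: data items particular to this object.
        self.name = name
        self.email = email

    def email_domain(self):
        # Method: a program that performs a task with the object.
        return self.email.rsplit("@", 1)[-1]

joe = Customer("Joe", "joe@somewhere.com")
print(joe.name, joe.email_domain())
```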

Object Persistence
• Object persistence means that values of
the object properties are storable and
retrievable.
• Object persistence can be achieved by
various techniques:
– a main technique is database technology
– relational databases can be used, but require
substantial programming
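
The programming effort mentioned above comes from mapping each object property to a column by hand. The following sketch, using Python's built-in sqlite3 module, illustrates the idea; the Product class, the PRODUCT table, and the save and load helpers are assumptions made for this example.

```python
import sqlite3

class Product:
    """Hypothetical object class whose properties we want to persist."""
    def __init__(self, number, name):
        self.number = number
        self.name = name

def save(conn, product):
    # Map each property to a column by hand: this is the extra programming.
    conn.execute(
        "INSERT OR REPLACE INTO PRODUCT (ProductNumber, ProductName) VALUES (?, ?)",
        (product.number, product.name),
    )

def load(conn, number):
    # Rebuild the object from its stored property values, if present.
    row = conn.execute(
        "SELECT ProductNumber, ProductName FROM PRODUCT WHERE ProductNumber = ?",
        (number,),
    ).fetchone()
    return Product(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (ProductNumber TEXT PRIMARY KEY, ProductName TEXT)")
save(conn, Product("P1", "Video"))
restored = load(conn, "P1")
print(restored.number, restored.name)
```

Every new property requires changes to the table, to save, and to load, which is the kind of repetitive work that object-oriented and object-relational DBMS products aim to reduce.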

OODBMS
• Object-Oriented DBMSs (OODBMSs) have been developed,
but they never achieved commercial success:
– it would be too expensive to transfer existing
data from relational and other legacy databases
– the OODBMSs were, therefore, not cost
justifiable

Object-Relational DBMSs
• Some relational DBMS vendors have added
object-oriented features to their products.
– Example: Oracle
• These products are known as object-
relational DBMSs and support object-
relational databases.

Virtualization
• Virtualization is using hardware and software to
simulate another hardware resource.
• Virtualization is done by having one physical computer
host one or more virtual computers, also known as
virtual machines.
• The host machine runs a special program known as
a virtual machine manager or hypervisor.
• There are two ways to implement hypervisors:
– “bare metal” or type 1 (used in large data centers)
– “hosted” or type 2 (used by students and others)

Types of Hypervisors

Figure 8-21: Type 1 and Type 2 Hypervisors

Cloud Computing
• Cloud computing services are ultimately
provided by large data centers.
• Redundant arrays of independent disks (RAID)
can be configured for maximum access speed or
for reliability.
• Three basic ways to lease cloud services:
– Software as a service (SaaS), ex: Salesforce.com
– Platform as a service (PaaS): operating systems, software
development tools, and system programs are provided
– Infrastructure as a service (IaaS): only hardware is
provided, and users manage their own software

NoSQL
• The NoSQL movement is a movement to
use non-relational databases.
• These databases are often classified into
four categories:
– Key-Value (Dynamo, MemcacheDB, Redis)
– Document (Couchbase, MongoDB,
MarkLogic)
– Column family (Vertica, Apache Cassandra)
– Graph (Neo4j, AllegroGraph, Titan)

Column Family Database
Storage System

Figure 8-22a: A Generalized Column Family Database Storage System

Column Family Database
Storage System

Figure 8-22b: A Generalized Column Family Database Storage System

The MapReduce Process

Figure 8-23: MapReduce
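
The MapReduce idea can be illustrated with a word count run on a single machine. This Python sketch is only a simplified model: the function names and sample documents are assumptions, and a real system would distribute the map and reduce tasks across a cluster of servers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big clusters", "data warehouses and big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel on different servers.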

Hadoop
• The Hadoop Distributed File System (HDFS), part of the
Apache Hadoop platform, provides standard file services
to clustered servers so that their file systems can
function as one distributed, replicated file system that
can support large-scale MapReduce processing.
• All major DBMS vendors support Hadoop.