Chapter I
Business Intelligence
Aim
The aim of this chapter is to:
Objectives
The objectives of this chapter are to:
Learning outcome
At the end of this chapter, you will be able to:
Fig. 1.1 The BI pyramid (top to bottom): Data Mining; OLAP; Queries & Reports; Data Warehouse
Many authors speak of BI as being an umbrella term, with various components hanging under this umbrella.
Another way of looking at it is as a pyramid: a Data Warehouse at the base, Queries & Reports and OLAP in the middle layers, and Data Mining at the top.
1.3 History of BI
Up to this point, we have agreed on Business Intelligence as being an umbrella that covers a whole range of concepts.
It is clear that BI has somehow evolved from other concepts. Therefore, when exploring the history of Business
Intelligence, it seems wise to take a look at what preceded Business Intelligence.
The problem with topics such as Business Intelligence, Decision Support Systems and many other acronyms with the
S standing for System is that they are all part of a terribly volatile field. Much has been written about Information
and Support Systems; authors have filled tomes describing the existing systems: how they work, how they should
be built, what the requirements are, and so forth. Unfortunately, little to nothing is written on the history and
development of these systems. What we would have to do is take all these writings, lay them out next to each other,
and compare them. Consider the overview given in the figure below.
Fig. 1.2 A timeline of BI-related technologies, 1975-1998: from transaction systems, relational databases, spreadsheet
software and personal computing, via Executive Information Systems (EIS), Decision Support Systems (DSS), ad hoc query
tools, multidimensional databases, extract files and financial reporting systems, to data warehouses, OLAP, reporting
systems, data mining, analytic applications, Customer Information Files (CIFs), marketing databases, demographic data
providers, match/merge services, the World Wide Web, web analytics, enterprise information portals, personalisation,
closed-loop CRM and Customer Resource Management.
(Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)
The information that is most volatile is what we read on the Internet. Whereas up to about ten years ago authors
wrote their findings down in books and journals, nowadays the easier, faster, cheaper and more accessible way
of publishing is on the World Wide Web. The problem with this medium, however, is that a web page has to be
maintained and updated regularly to keep it and its topics alive. When this does not happen, pages get lost, are
wiped away, or simply contain information that is out of date.
The Database Magazine (also known as DB/M) proved a valuable source of information. DB/M is pinpointed as
a magazine not to be missed by anyone interested in BI. Although it has been published since 1990, it was not
until 1997 that BI received the attention of the authors of DB/M.
Within this framework of CRM, BI is no longer only used by management levels, but BI-tools and techniques are
developed for all organisational levels.
At various points in the report we will see how Business Intelligence can influence CRM. To give a brief example
up front, BI can be used to identify what is called customer profitability: which customer profiles are responsible
for the highest profit? Based on the answer to this question, a company can choose to change their strategy and, for
instance, make special offers to certain customer groups.
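A minimal sketch of such a customer-profitability analysis in Python; the customer segments, revenues and costs below are invented for illustration, and a real analysis would of course run against the order data in the warehouse:

```python
from collections import defaultdict

# Hypothetical order records: (customer_segment, revenue, cost).
orders = [
    ("young urban", 120.0, 90.0),
    ("young urban", 80.0, 60.0),
    ("families", 200.0, 150.0),
    ("seniors", 50.0, 55.0),
    ("families", 150.0, 100.0),
]

def profit_by_segment(orders):
    """Aggregate profit (revenue minus cost) per customer segment."""
    totals = defaultdict(float)
    for segment, revenue, cost in orders:
        totals[segment] += revenue - cost
    # Rank segments from most to least profitable.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for segment, profit in profit_by_segment(orders):
    print(f"{segment}: {profit:+.2f}")
```

The ranked output is exactly the answer to "which customer profiles are responsible for the highest profit?", and a segment with negative profit (here the invented "seniors") is a candidate for a changed strategy.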
Fig. 1.3 A 3-dimensional OLAP cube, with the dimensions style, month and outlet
(Source: www.few.vu.nl/en/Images/werkstuk-quarles_tcm39-91416.doc)
The following three cubes show us how we can look at, respectively: data on all shoe styles sold in all months in
the outlet Amsterdam, data on shoe style sneaker sold in all months in all outlets, and data on all shoe styles sold
in all outlets in the month April.
le
When we combine these three dimensions, we get data on the number of sneakers sold in the outlet Amsterdam in
the month April:
Fig. 1.4 The cube cell at the intersection of sneaker, Amsterdam and April, containing the value 250: the number of
sneakers sold in Amsterdam in April.
If we wanted information about the colours of the sneakers or the sizes sold, we would have to define new
dimensions, giving a 4-, 5- or even higher-dimensional cube. Such cubes can of course no longer be visualised,
but in an OLAP application they are possible.
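The slicing described above can be sketched with a plain Python dictionary keyed by the three dimensions. The figures are invented, apart from the 250 sneakers sold in Amsterdam in April, which echoes the example in the text:

```python
# A tiny 3-dimensional "cube" stored as a dict keyed by
# (style, outlet, month) -> units sold.
cube = {
    ("sneaker", "Amsterdam", "April"): 250,
    ("sneaker", "Amsterdam", "May"):   180,
    ("sneaker", "Rotterdam", "April"): 140,
    ("boot",    "Amsterdam", "April"):  90,
}

def slice_cube(cube, style=None, outlet=None, month=None):
    """Return the sub-cube where every given dimension value matches."""
    return {
        key: value for key, value in cube.items()
        if (style is None or key[0] == style)
        and (outlet is None or key[1] == outlet)
        and (month is None or key[2] == month)
    }

# A slice fixes one dimension: all styles, all months, outlet Amsterdam.
amsterdam = slice_cube(cube, outlet="Amsterdam")
# Fixing all three dimensions addresses a single cell.
cell = slice_cube(cube, style="sneaker", outlet="Amsterdam", month="April")
print(sum(amsterdam.values()))   # total units sold in Amsterdam
print(cell)
```

Fixing one dimension gives the three slices described in the text; fixing all three gives the single cell of Fig. 1.4. An OLAP server does the same thing, only against far larger and higher-dimensional data.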
1.8 FASMI
If we go back in time a few decades we come across Dr. E.F. Codd, a well-known database researcher during the
60s, 70s and 80s. In 1993, Dr. Codd wrote a report titled: Providing OLAP (On-Line Analytical Processing) to
User-Analysts: An IT Mandate, in which he defined OLAP in 12 rules. These rules make up the requirements that
an OLAP application should satisfy. A year later, Nigel Pendse and his co-author Richard Creeth became increasingly
occupied by the phenomenon OLAP. After a critical study of the rules of Dr. Codd, some were discarded and others
lumped together in one feature, and a new definition of OLAP was born:
Fast Analysis of Shared Multidimensional Information (FASMI)
In a later article they go on to describe exactly what they mean by the five separate words that make up this
definition:
Fast means that the system is targeted to deliver most responses to users within about five seconds, with the
simplest analyses taking no more than one second and very few taking more than 20 seconds.
Analysis means that the system can cope with any business logic and statistical analysis that is relevant for the
application and the user, and keep it easy enough for the target user.
Shared means that the system implements all the security requirements for confidentiality (possibly down to cell
level) and, if multiple write access is needed, concurrent update locking at an appropriate level.
Multidimensional means that the system must provide a multidimensional conceptual view of the
data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical
way to analyze businesses and organisations.
Information is all of the data and derived information needed, wherever it is and however much is
relevant for the application.
Nigel Pendse declares that this definition was first used by him and his company in early 1995, and that it has not
needed revision in the years since. He states that the definition has now been widely adopted and is cited in over
120 Web sites in about 30 countries. Research with the help of Google revealed there to be 34 countries with one
or more Web site(s) containing the term FASMI. A total of 21 countries host one or more Web site(s) that write
about FASMI in combination with The OLAP Report. The term is widely and globally used. Striking, next to the
mostly English-language sites, is the large number of German (university) sites that include the terms.
We can conclude some points from the history of OLAP:
Multidimensionality is here to stay. Even hard-to-use, expensive, slow and elitist multidimensional products
survive in limited niches; when these restrictions are removed, the market booms. We are about to see the biggest-ever
growth of multidimensional applications.
End-users will not give up their general-purpose spreadsheets. Even when accessing multidimensional databases,
spreadsheets are the most popular client platform. Multidimensional spreadsheets are not successful unless they
can provide full upwards compatibility with traditional spreadsheets, something that Improv and Compete
failed to do.
Most people find it easy to use multidimensional applications, but building and maintaining them takes a
particular aptitude which has stopped them from becoming mass market products. But, using a combination
of simplicity, pricing and bundling, Microsoft now seems determined to prove that it can make OLAP servers
almost as widely used as relational databases.
8/uts
Multidimensional applications are often quite large and are usually suitable for workgroups, rather than
individuals. Although there is a role for pure single-user multidimensional products, the most successful
installations are multi-user, client/server applications, with the bulk of the data downloaded from feeder systems
once rather than many times. There usually needs to be some IT support for this, even if the application is
driven by end-users.
Simple, cheap OLAP products are much more successful than powerful, complex, expensive products. Buyers
generally opt for the lowest cost, simplest product that will meet most of their needs; if necessary, they often
compromise their requirements. Projects using complex products also have a higher failure rate, probably
because there is more opportunity for things to go wrong.
Application              Description
…                        Mostly found in consumer goods industries, retailers and the financial services industry.
Database marketing       Determine who are the best customers for targeted promotions for particular products or services.
Financial reporting      …
Management reporting     Using OLAP-based systems one is able to report faster and more flexibly, with better analysis than the alternative solutions.
Profitability analysis   …
Quality analysis         OLAP tools provide an excellent way of measuring quality over long periods of time and of spotting disturbing trends before they become too serious.
Table 1.2 OLAP application areas
Finding patterns
The idea of Data Mining (DM) is to discover patterns in large amounts of data. Whereas query and even OLAP
functions require human interaction to follow relationships through a data source, data mining programs are able to
derive many of these relationships automatically by analysing and learning from the data values contained in files
and databases (Lewis, 2001). The patterns found in the data can provide information that cannot directly be deduced
from the data itself: patterns and connections that are not straightforward. These invisible patterns are not always
logical or useful.
For instance, for a supermarket chain that operates in several different countries, DM might show that the sales of
yogurt in America are strongly correlated with the sales of bicycles in the UK. Naturally this is a coincidental
connection. But if DM reveals that customers who buy Product X most of the time also purchase Products Y and Z,
it is a very valuable tool for management to help in strategic decision making. Products X, Y and Z
could be placed on shelves located close to each other, or management could choose to make special offers for
these three products at the same time, to increase sales in a short time.
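The Product X, Y, Z example can be quantified with the two standard measures for such co-purchase rules: support (how often the products appear together at all) and confidence (how often buyers of X also take Y and Z). The baskets below are invented for illustration:

```python
# Hypothetical market baskets, one set of products per checkout.
baskets = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X"},
    {"Y", "Z"},
]

def rule_stats(baskets, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent <= b and consequent <= b)
    ante = sum(1 for b in baskets if antecedent <= b)
    support = both / n                       # fraction of all baskets
    confidence = both / ante if ante else 0.0  # fraction of X-buyers
    return support, confidence

support, confidence = rule_stats(baskets, {"X"}, {"Y", "Z"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

With these invented baskets the rule X -> {Y, Z} holds in 40% of all baskets, and half of the customers who buy X also buy Y and Z; a data mining tool searches for such rules automatically across millions of baskets.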
Actually, there is nothing new about looking for patterns in data. People have sought patterns in data ever
since human life began. Hunters seek patterns in animal migration behaviour, farmers seek patterns in crop growth,
and politicians seek patterns in voter opinion. A scientist's job is to make sense of data, to discover the patterns that
govern how the physical world works and to encapsulate them in theories that can be used for predicting what will
happen in new situations. The entrepreneur's job is to identify opportunities, that is, patterns in behaviour that can
be turned into a profitable business, and to exploit them.
1.10.1 The Data Mining Process
A quite general view of the Data Mining process is the one offered by Van der Putten (1999):
Fig. 1.5 The data mining process (Van der Putten, 1999): Business Understanding, Data Understanding,
Data Preparation, Modeling, Evaluation and Deployment.
Step                      Description
Business Understanding    …
Data Understanding        Collecting the initial data, describing and exploring these data and verifying its quality.
Data Preparation          …
Modelling                 …
Evaluation                Evaluating the results, reviewing the process and determining the next steps.
Deployment                Plan deployment, plan monitoring and maintenance, producing the final report and reviewing the project.
Table 1.3 The steps of the data mining process
Solution quality
Speed
Solution comprehensibility
Expertise required
In some cases a DM-tool that provides answers very quickly may be preferred, no matter what the quality of the
solution is. In other cases one might want a solution of very high quality, but if this means that the solution
becomes totally incomprehensible, one will have no use for it.
On which page of the web site do visitors enter / leave the site?
How many visitors fill their shopping cart but leave the site without making a purchase?
An article by Carine Joosse (2000) gives a short but interesting description of the different ways of applying data
mining to the Internet. The first is mining the Web itself. An example of this is collecting data from various sites and
categorising, analysing and presenting them on new web pages for the benefit of the web visitor. Another example
concerns search engines on the Web: by registering hits on a word, phrase or synonym, grouping them into categories
and keeping a history, the search engine could be made more powerful. The data mining elements in this are making
predictions, trend analysis, categorising and data reduction.
A second type of Web mining is Web usage mining. The goal of web usage mining is analysing the site navigation:
how do visitors click through the site, how much time do they spend on which part (page) of the site, and at which point
do they enter or leave the site? This form of analysis is also referred to as Clickstream Analysis. Just as important is
to keep records of which visitors finally make a purchase, which visitors start making a purchase (that is, start filling
their virtual shopping cart) but do not buy in the end, and which visitors leave the site without making a purchase.
By combining all these data with the registered customer profiles it is possible to define the types of customers
that are most likely to purchase over the Internet. These customer profiles, in connection with behaviour on the
Web site, can also be used to see if the site should be designed differently.
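A minimal sketch of such a clickstream analysis, assuming sessions have already been reconstructed from the web server log; the page names and sessions below are invented:

```python
from collections import Counter

# One (list of visited pages, purchased?) pair per visitor session.
sessions = [
    (["home", "shoes", "cart", "checkout"], True),
    (["home", "shoes", "cart"], False),          # filled cart, did not buy
    (["search", "shoes"], False),
    (["home", "sale", "cart", "checkout"], True),
]

# On which page do visitors enter and leave the site?
entry_pages = Counter(pages[0] for pages, _ in sessions)
exit_pages = Counter(pages[-1] for pages, _ in sessions)

# How many visitors fill their cart but leave without a purchase?
carted = [s for s in sessions if "cart" in s[0]]
abandoned = sum(1 for pages, bought in carted if not bought)
abandonment_rate = abandoned / len(carted)

print("entries:", dict(entry_pages))
print("exits:", dict(exit_pages))
print(f"cart abandonment: {abandonment_rate:.0%}")
```

These few counters already answer the questions posed earlier in the chapter; joining the sessions with registered customer profiles would be the next step.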
While most authors ascribe the Web Mining tool Clickstream Analysis to the Data Mining field, Nigel Pendse says in
his OLAP Report that it is one of the latest OLAP applications (Pendse, 2001). He also adds Database Marketing
to his list of OLAP applications. In his opinion, determining who the preferred customers are can be done with
brute force data mining techniques (which are slow and can be hard to interpret), or by experienced business users
investigating hunches using OLAP cubes (which is quicker and easier). In other words, here we encounter once
again the vague boundaries that exist between the concepts within Business Intelligence!
Web mining applications of a more advanced level are personalisation and multichannel-analysis. Personalisation
happens when rules are activated in order to offer personalised content to the visitor. A danger in this application is
that the information is not always fully reliable, in the sense that the visitor cannot be categorised correctly. When
individual visitors make use of a large company network, for example, they will not be recognised as separate
visitors. What Multichannel-analysis comes down to is anticipating the behaviour, wishes and possibilities of the
customer in the use of different communication channels.
Business Intelligence: Business Intelligence (BI) is a broad category of applications and technologies for gathering,
storing, analyzing, and providing access to data to help enterprise users make better business decisions.
Decision Support System: A decision support system (DSS) is a computer program application that analyzes business
data and presents it so that users can make business decisions more easily.
Table 1.4 BI vs. DSS definition
The key similarity in these two definitions is making business decisions; in particular, both concepts are
focused on helping to make these decisions in a better and easier way. The other important similarity is that both
involve decision making based on data.
The way Dekker (2002) looks at it is that Data Warehousing and Data Mining have two precursors: DSS and EIS.
DSS is focused on the lower and middle management and makes it possible to look at and analyze data in different
ways. EIS is the precursor focused on the higher management. Given the fact that Data Warehousing and Data
Mining form a large part of Business Intelligence, we could indeed see DSS as the precursor of BI.
The following (Alter, 1999) fully supports Eiben's theory about BI replacing data-driven decision support: a
number of approaches developed for supporting decision making include online analytical processing (OLAP) and
data mining. The idea of OLAP grew out of difficulties analyzing the data in databases that were being updated
continually by online transaction processing systems.
When the analytical processes accessed large slices of the transaction database, they slowed down transaction
processing critical to customer relationships. The solution was periodic downloads of data from the active transaction
processing database into a separate database designed specifically to support analysis work. This separate database
often resides on a different computer, which together with its specialised software is called a data warehouse.
What Alter points out here is that, because of the difficulties of analysing the data to support decision making, the
data are duplicated in a Data Warehouse, on top of which OLAP and Data Mining can be applied without disturbing
transaction processing. In other words, the components that make up Business Intelligence are replacing the
old-fashioned way of performing data-driven decision support on the original transaction processing systems.
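Alter's periodic download can be sketched with SQLite standing in for both systems. The table and column names are invented, both databases are in-memory, and a real warehouse load would of course involve far more transformation:

```python
import sqlite3

# The live transaction-processing (OLTP) database.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER, product TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "sneaker", 59.95), (2, "boot", 89.00), (3, "sneaker", 59.95)])

# The separate database designed specifically to support analysis work.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_facts (product TEXT, amount REAL)")

# The periodic extract: read from the OLTP system once and load in bulk,
# so analytical queries never touch the transaction-processing database.
rows = oltp.execute("SELECT product, amount FROM orders").fetchall()
warehouse.executemany("INSERT INTO sales_facts VALUES (?, ?)", rows)

total, = warehouse.execute("SELECT SUM(amount) FROM sales_facts").fetchone()
print(f"total sales staged in the warehouse: {total:.2f}")
```

All subsequent analysis queries run against `sales_facts`, leaving `orders` free to serve the transactions that are critical to customer relationships.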
Turban & Aronson (2001) write that the term Business Intelligence (BI) or Enterprise Systems is used to describe the
new role of the Executive Information System, especially now that data warehouses can provide data in easy-to-use,
graphics-intensive query systems capable of slicing and dicing data (Q&R) and providing active multi-dimensional
analysis (OLAP).
Simon & Shaffer (2001) find the following classification of business intelligence applications to be useful:
Queries and reports (Q&R)
Online analytical processing (OLAP)
Data mining
Executive information systems (EISs)
Why do they include EISs as an application amongst Q&R, OLAP and DM? Don't EISs already have some form
of Q&R and even OLAP-like activities in them? One thing is certain: Turban & Aronson and Simon & Shaffer will
not agree on a definition of BI. The first duo says that BI replaces EIS; the second includes EIS in BI.
With BI-tools it is possible to carry out analyses and reports on virtually all conceivable aspects of the underlying
business, as long as the data about this business come in large amounts and are stored in a Data Warehouse.
Departments known to benefit most from Business Intelligence are (Database) Marketing, Sales, Finance,
ICT (especially the Web) and higher Management.
Recall, from the chapter about Queries & Reports, the remark about Q&R not being as far away from our daily line of
work as it may seem. A very good example of this was the SPSS Report Writer, which is tightly integrated with SPSS.
Another BI-tool integrated with an application many of us use daily is Business Intelligence for Excel, offered by
Business Intelligence Technologies, Inc. This tool, also called BIXL, differs from other BI-tools in this respect: the
product delivers to an end-user's Excel spreadsheet data that can be used for analytical and reporting purposes, from
Microsoft's Analysis Services (and other OLEDB for OLAP cube providers), and adds all-important write-back
capabilities for planning (and budgeting and forecasting) tasks.
The output of a CI-process is Actionable Intelligence, abbreviated by Voorma as AI, not to be confused with the
widely accepted abbreviation of Artificial Intelligence! Actionable Intelligence is the action-focused (actionable)
knowledge, the intelligence that stimulates changes in an organisation's strategy.
For the rest, it is not entirely clear from Voorma's article what the added value of CI is. He names a few initial
steps, like:
Collecting data
Summary
BI is a term introduced by Howard Dresner of Gartner Group in 1989.
Many authors speak of BI as being an umbrella term, with various components hanging under this
umbrella.
BI consists of various levels of analytical applications and corresponding tools that are carried out on top of a
Data Warehouse.
Data warehouse is a collection of integrated, subject-oriented databases designed to support the DSS function,
where each unit of data is specific to some moment of time.
The data warehouse contains atomic data and lightly summarised data.
Data Mart is a part of a Data Warehouse, specifically concentrated on a part of the business, like a single
department.
OLAP is a technology that allows users to carry out complex data analyses with the help of a quick and interactive
access to different viewpoints of the information in data warehouses.
Multidimensional applications are often quite large and are usually suitable for workgroups, rather than
individuals.
Simple, cheap OLAP products are much more successful than powerful, complex, expensive products.
Data mining is the use of data analysis tools to try to find the patterns in large transaction databases.
Data mining is analysis of large pools of data to find patterns and rules that can be used to guide decision making
and predict future behaviour.
The idea of Data Mining (DM) is to discover patterns in large amounts of data.
An area of growing importance for companies trying to sell their products is e-commerce.
With BI-tools it is possible to carry out analyses and reports on virtually all thinkable aspects of the underlying
business.
Actionable Intelligence is the action-focused (actionable) knowledge, the intelligence that stimulates changes
in an organisation's strategy.
References
Business Intelligence The Beginning, [Online] Available at: <http://www.few.vu.nl> [Accessed 25 April 2012].
Pechenizkiy, M., 2006. Lecture 2 Introduction to Business Intelligence, [Online] Available at: <http://www.win.tue.nl/~mpechen/courses/TIES443/handouts/lecture02.pdf> [Accessed 27 April 2012].
Biere, M., 2003. Business Intelligence for the Enterprise, Prentice Hall Professional Publication.
2010. What is Business Intelligence?, [Video Online] Available at: <http://www.youtube.com/watch?v=0aHtHljcAs> [Accessed 27 April 2012].
Recommended Reading
Becerra-Fernandez, I. & Sabherwal, R., 2010. Business Intelligence, John Wiley & Sons Publication.
Howson, C., 2007. Successful Business Intelligence, Tata McGraw-Hill Education Publication.
Whitehorn, M., 1999. Business Intelligence: The IBM Solution, Springer Publication.
Self Assessment
1. Which of the following statements is false?
a. The data warehouse contains atomic data and lightly summarised data.
b. Data warehouse is designed to support the DSS function.
c. Data warehouse is unable to deal with very large amount of data.
d. A data warehouse is a database, with reporting and query tools.
2. __________ is a replication of the data existing in the operational databases.
a. Data warehouse
b. Data Mart
c. DBMS
d. Database
3. Which of the following processes is not included while creating a data warehouse?
a. Extraction
b. Manipulation
c. Transformation
d. Loading
4. _________ is the special-purpose computer language used to provide immediate, online answers to user
questions.
a. Report
b. OLAP
c. Extraction
d. Query
5. _______ is a technology that allows users to carry out complex data analyses with the help of a quick and
interactive access to different viewpoints of the information in data warehouses.
a. OLAP
b. OLATP
c. OLAP-tools
d. OLEDB
6. ________ is a program that makes it comparatively easy for users or programmers to generate reports by
describing specific report components and features.
a. Report
b. OLAP
c. Extraction
d. Query
7. _________is analysis of large pools of data to find patterns and rules that can be used to guide decision making
and predict future behaviour.
a. Data extraction
b. Data warehouse
c. Data mining
d. Data manipulation
8. Which of the following processes is not included in the data mining process?
a. Evaluation
b. Abstraction
c. Modelling
d. Deployment
9. Which of the following systems is not a decision support system?
a. Model-driven
b. Data-driven
c. User-driven
d. System-driven
10. The output of a CI-process is_____________.
a. Business Intelligence
b. Artificial Intelligence
c. Actionable Intelligence
d. Competitive Intelligence
Chapter II
Components of Business Intelligence Tools
Aim
The aim of this chapter is to:
Objectives
The objectives of this chapter are to:
Learning outcome
At the end of this chapter, you will be able to:
2.1 Introduction
Business intelligence is not business as usual. It's about making better decisions more easily and making them more
quickly. Businesses collect enormous amounts of data every day: information about orders, inventory, accounts
payable, point-of-sale transactions, and of course, customers. Businesses also acquire data, such as demographics
and mailing lists, from outside sources. Unfortunately, based on a recent survey, over 93% of corporate data is not
usable in the business decision-making process today.
Consolidating and organising data for better business decisions can lead to a competitive advantage, and learning
to uncover and leverage those advantages is what business intelligence is all about. The amount of business data is
increasing exponentially. In fact, it doubles every two to three years. More information means more competition.
In the age of the information explosion, executives, managers, professionals, and workers all need to be able to
make better decisions faster.
IBM Business Intelligence solutions are not about bigger and better technology; they are about delivering more
sophisticated information to the business end user. BI provides an easy-to-use, shareable resource that is powerful,
cost-effective and scalable to our needs. Much more than a combination of data and technology, BI helps us create
knowledge from a world of information. Get the right data, discover its power, and share the value: BI transforms
information into knowledge. Business Intelligence is the practice of putting the right information into the hands
of the right user at the right time to support the decision-making process.
How do we currently monitor the key or critical performance indicators of our business?
How easily can we answer ad hoc questions with our current reporting systems?
Do we have to wait a long time (hours? days?) for answers to new questions?
Depending on the executive's responses, there are certain needs that, if they surface in his answers, identify the
executive as a BI project prospect. The answers to the previously mentioned questions would point to the following
if he is a candidate:
Dissatisfaction is exhibited with the current reporting systems, especially in terms of flexibility, timeliness,
accuracy, detail, consistency, and integrity of the information across all business users.
Many people in the organisation spend a lot of time re-keying numbers into spreadsheets.
The senior executive is very vague about how key performance indicators are monitored.
Do end users often ask IT to produce queries, reports, and other information from the database?
Do end users frequently re-key data into spreadsheets or word processing packages?
Does our production system suffer from a heavy volume of queries and reports running against the system?
Would we like to see our end users receiving more business benefits from the IT organisation?
The IT staff is a data warehousing prospect if the answers point to problem areas, such as:
End users are relying on IT to perform most or all ad hoc queries and reports.
End users have to re-key data into their spreadsheets on a regular basis.
IT identifies end user dissatisfaction with the current reporting systems and processes.
IT has a large backlog built up of end user requests for queries and reports.
IT is concerned about end user queries and reports that are bogging down the production systems.
How are our monthly management reports and budgets delivered and produced?
Do we spend more time preparing, consolidating, and reporting on the data, or on analyzing performance that
is based on what the data has highlighted?
Do all the companys executives and managers have a single view of key information to avoid inconsistency?
How easy is it to prepare budgets and forecasts, and then to disseminate that critical information?
Can we easily track variances in costs and overhead by cost center, product, and location?
Is the year-end consolidation and reporting cycle a major amount of duplicated effort in data preparation and
validation, and then in consolidation reporting?
The financial staff is a data warehousing prospect if the answers given to these questions are like these:
Personnel like using spreadsheets, but they usually or often need to re-key or reformat data.
They indicate in any way that their preferred reporting tool would be a spreadsheet if they did not have to
constantly re-key great amounts of numbers into them.
They admit that much time is spent in the production of reports and the gathering of information, with less time
actually spent analyzing the data, and they can identify inconsistencies and integrity issues in the reports that
have been produced.
Budget collection is a painful and time-consuming process, and there is very little control available in the
collection and dissemination process.
The monthly management reports involve too much time and effort to produce and circulate, and do not easily allow
queries and analysis to be run against them.
Management information does not go into sufficient detail, especially in terms of expense control and overhead
analysis.
How do we perform ad hoc analysis against our marketing and sales data?
How do we monitor and track the effectiveness of a marketing or sales promotion program?
Do we have to wait a long time (days? weeks?) for sales management information to become available at month
or quarter-end?
Are we and our staff using spreadsheets a lot, and re-keying great amounts of data?
Current reporting is very static and ad hoc requests must be accomplished through IT.
Profitability versus volume and value cannot be easily analyzed, and the measurement of data is inconsistent;
for example, there might be more than one way of calculating margin, profit, and contribution.
There is no concept of re-planning and re-budgeting as it is too difficult to accomplish with the current
systems.
Suppliers cannot be provided with timely information, so it is very difficult to achieve reviews of their
performance.
Getting down to the right level of detail is impossible: for example, to the SKU level in a retail store.
General dissatisfaction is expressed with the current process of information flow and management.
How is the validity of the MRP model checked and how accurate do we think it really is?
How do we handle ad hoc analysis and reporting for raw materials, on-time, and quality delivery?
How do we handle shipments and returns, inventory control, supplier performance, and invoicing?
New projects cannot easily be costed out, and trends in quality, efficiency, cost, and throughput cannot be
analyzed.
The preferred access to information would be via a spreadsheet or an easy-to-use graphical user interface.
Currently there is access to daily information only, which means much re-keying into spreadsheets for trending
analysis and so on is required.
The MRP model cannot easily be checked for accuracy and validity on a constant basis.
Data Mining
Data Warehouse
ODS
Drill down
OLTP
OLTP Server
OLAP
Data Mart
Data Visualisation
Meta Data
Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing
operations.
Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent
whole.
Time-variant: All data in the data warehouse is identified with a particular time period.
Calculations and modelling applied across dimensions, through hierarchies and/or across members
OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless
of database size and complexity. OLAP helps the user synthesize enterprise information through comparative,
personalised viewing, as well as through analysis of historical and projected data in various what-if data model
scenarios. This is achieved through use of an OLAP Server.
2.4.7 OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate
on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located
and accessed, based on the intersection of the dimension members that define that item. The design of the server
and the structure of the data are optimised for rapid ad hoc information retrieval in any orientation, as well as for
fast, flexible calculation and transformation of raw data based on formulaic relationships. The OLAP Server may
either physically stage the processed multi-dimensional information to deliver consistent and rapid response times
to end users, or it may populate its data structures in real-time from relational or other databases, or offer a choice
of both. Given the current state of technology and the end user requirement for consistent and rapid response times,
staging the multi-dimensional data in the OLAP Server is often the preferred method.
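The dimension-intersection addressing described above can be sketched in a few lines of Python; the cube, its dimensions, and its figures are purely illustrative:

```python
# Each data item is located by the intersection of one member from every
# dimension (here: product, region, quarter). All names and values are
# illustrative, not drawn from any real system.
cube = {
    ("Widget", "North", "Q1"): 1200,
    ("Widget", "South", "Q1"): 800,
    ("Gadget", "North", "Q1"): 450,
}

def cell(product, region, quarter):
    """Locate a data item via the intersection of its dimension members."""
    return cube.get((product, region, quarter), 0)

def slice_total(region):
    """Aggregate every cell that intersects the given region member."""
    return sum(v for (p, r, q), v in cube.items() if r == region)

print(cell("Widget", "North", "Q1"))  # 1200
print(slice_total("North"))           # 1650
```

A real OLAP server adds indexing, staging, and precalculation on top of this addressing scheme, but the cell-by-intersection idea is the same.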
2.4.8 Metadata
Metadata is the kind of information that describes the data stored in a database and includes such information as:
A description of tables and fields in the data warehouse, including data types and the range of acceptable
values.
A similar description of tables and fields in the source databases, with a mapping of fields from the source to
the warehouse.
A description of how the data has been transformed, including formulae, formatting, currency conversion, and
time aggregation.
Any other information that is needed to support and manage the operation of the data warehouse.
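As an illustration of the items listed above, a metadata record might look like the following sketch; every table name, field name, and transformation rule shown is hypothetical:

```python
# A hypothetical metadata record covering the items listed above:
# warehouse field descriptions, source-to-warehouse field mapping, and
# the transformation applied during loading.
metadata = {
    "warehouse_table": "sales_fact",
    "fields": {
        "revenue_eur": {
            "type": "DECIMAL(12,2)",
            "range": "0 to 10,000,000",
            "source": "orders.amount_usd",  # mapping back to the source field
            "transformation": "amount_usd converted at the daily USD/EUR rate",
        },
        "sale_week": {
            "type": "INTEGER",
            "source": "orders.sale_date",
            "transformation": "daily dates aggregated to ISO week",
        },
    },
}

# An end user or load process can consult the metadata to see how a
# warehouse field was derived from its source.
print(metadata["fields"]["revenue_eur"]["source"])  # orders.amount_usd
```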
2.4.9 Drill-Down
Drill-down can be defined as the capability to browse through information, following a hierarchical structure. A
small sample is shown in the figure below.
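The same idea can be sketched as a small Python hierarchy; the geography members and values are illustrative:

```python
# Drill-down: browsing from a summarised level to progressively more
# detail along a hierarchy. The geography members and figures here are
# illustrative only.
hierarchy = {
    "Europe": {
        "Germany": {"Berlin": 320, "Munich": 210},
        "France": {"Paris": 400},
    },
}

def total(node):
    """Sum the leaf values under a node of the hierarchy."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

def drill_down(node):
    """One drill-down step: show each child member with its total."""
    return {member: total(child) for member, child in node.items()}

print(drill_down(hierarchy))            # {'Europe': 930}
print(drill_down(hierarchy["Europe"]))  # {'Germany': 530, 'France': 400}
```

Each call to `drill_down` descends one level of the hierarchy, which is exactly the browsing behaviour the definition describes.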
On operational databases a high number of transactions take place every hour. The database is always up to
date, and it represents a snapshot of the current business situation, commonly referred to as "point in time" data.
Informational databases are usually stable over a period of time and represent a situation at a specific point in
time in the past, which can be described as historical data.
For example, a data warehouse load is usually done overnight. This load process extracts all changes and new records
from the operational database into the informational database. This process can be seen as one single transaction
that starts when the first record gets extracted from the operational database and ends when the last data mart in the
data warehouse is refreshed. The following figure shows some of the main differences between these two database types.
Extracted, detailed, denormalised data organised in a Star-Join Schema to optimise query performance.
Multiple aggregated and precalculated data marts to present the data to the end user.
Departmental data marts to hold data in an organisational form that is optimised for specific requests; new
requirements usually require the creation of a new data mart, but have no further influence on already existing
components of the data warehouse.
Metadata is the major component guaranteeing the success of this architecture, providing ease of use and
navigation support for end users.
The three different stages in aggregating/transforming data offer the capability to perform data mining tasks
in the extracted, detailed data without creating workload on the operational system.
Workload created by analysis requests is totally offloaded from the OLTP system.
The different stages of aggregation in the data are: OLTP data, ODS Star-Join Schema, and data marts.
Metadata and how it is involved in each process is shown with solid connectors.
The horizontal dotted line in the figure separates the different tasks into two groups.
Tasks to be performed on the dedicated OLTP system are optimised for interactive performance and to handle
the transaction-oriented tasks in the day-to-day business.
Tasks to be performed on the dedicated data warehouse machine require high batch performance to handle the
numerous aggregations, precalculation, and query tasks.
2.7.1 Extraction/Propagation
Data extraction / data propagation is the process of collecting data from various sources and different platforms to
move it into the data warehouse. Data extraction in a data warehouse environment is a selective process to import
decision-relevant information into the data warehouse. Data extraction / data propagation is much more than mirroring
or copying data from one database system to another. Depending on the technique, this process is either:
Pulling (Extraction) or
Pushing (Propagation)
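The pulling (extraction) variant can be sketched as follows; the record layout and the cut-off timestamp are illustrative assumptions:

```python
# Pull extraction: the warehouse-side process selects only the
# decision-relevant rows changed since the last run, rather than
# mirroring the whole operational table. Rows and dates are illustrative.
from datetime import datetime

operational_rows = [
    {"id": 1, "amount": 99.0, "updated": datetime(2012, 4, 26, 9, 0)},
    {"id": 2, "amount": 45.5, "updated": datetime(2012, 4, 27, 8, 30)},
    {"id": 3, "amount": 12.0, "updated": datetime(2012, 4, 27, 11, 15)},
]

def extract_changes(rows, last_extract):
    """Pull only the rows changed after the previous extraction run."""
    return [r for r in rows if r["updated"] > last_extract]

changed = extract_changes(operational_rows, datetime(2012, 4, 27))
print([r["id"] for r in changed])  # [2, 3]
```

In the pushing (propagation) variant the roles are reversed: the operational system itself forwards changed records to the warehouse as they occur.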
2.7.2 Transformation/Cleansing
Transformation of data usually involves code resolution with mapping tables (for example, changing 0 to "female"
and 1 to "male" in the gender field) and the resolution of business rules hidden in data fields, such as account numbers. Also,
the structure and relationships of the data are adjusted to the analysis domain. Transformations occur throughout
the population process, usually in more than one step. In the early stages of the process, the transformations are
used more to consolidate the data from different sources, whereas, in the later stages the data is transformed to suit
a specific analysis problem and/or tool.
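The code-resolution step mentioned above can be sketched as a tiny mapping-table transformation; the record layout is illustrative:

```python
# Code resolution with a mapping table: the coded gender field 0/1 is
# resolved to readable values during the population process. The record
# layout is illustrative.
GENDER_MAP = {0: "female", 1: "male"}

def transform(record):
    """Resolve coded fields into their business meaning."""
    out = dict(record)  # leave the source record untouched
    out["gender"] = GENDER_MAP[record["gender"]]
    return out

source_record = {"customer_id": 4711, "gender": 0}
print(transform(source_record))  # {'customer_id': 4711, 'gender': 'female'}
```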
Data warehousing turns data into information; cleansing, on the other hand, ensures that the data warehouse will
have valid, useful, and meaningful information. Data cleansing can also be described as standardisation of data.
Through careful review of the data contents, the following criteria are matched:
Data consolidation (one view), such as householding and address correction
Data aggregation: Change the level of granularity of the information. Example: The original data is stored on a
daily basis; the data mart contains only weekly values. Therefore, data aggregation results in fewer records.
Data summarisation: Add up values in a certain group of information. Example: The data refining process
generates records that contain the revenue of a specific product group, resulting in additional summary records.
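The aggregation example above (daily values rolled up to weekly ones) can be sketched as follows; the dates and amounts are illustrative:

```python
# Data aggregation: change granularity from daily rows to one value per
# ISO week, so the data mart holds fewer records than the source.
# Dates and amounts are illustrative.
from collections import defaultdict
from datetime import date

daily_sales = [
    (date(2012, 4, 23), 100.0),  # Monday, ISO week 17
    (date(2012, 4, 24), 150.0),
    (date(2012, 4, 30), 200.0),  # Monday, ISO week 18
]

def aggregate_weekly(rows):
    """Roll daily records up to ISO-week granularity."""
    weekly = defaultdict(float)
    for day, amount in rows:
        weekly[day.isocalendar()[1]] += amount
    return dict(weekly)

print(aggregate_weekly(daily_sales))  # {17: 250.0, 18: 200.0}
```

Three daily records become two weekly ones; at larger volumes the reduction in record count is what makes the data mart fast to query.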
(Source: capstone.geoffreyanderson.net)
Both database architectures can be selected to create departmental data marts, but the way to access the data in the
databases is different:
To access data from a relational database, common access methods like SQL or middleware products like
ODBC can be used.
Multidimensional databases require specialised APIs to access the usually proprietary database architecture.
Fact tables
Dimension tables
The following is a definition for those two components of the Star-Join Schema:
Fact tables (what are we measuring?): Contain the basic transaction-level information of the business that is
of interest to a particular application. In marketing analysis, for example, this is the basic sales transaction data.
Fact tables are large, often holding millions of rows, and mainly numerical.
Dimension tables (by what are we measuring?): Contain descriptive information and are small in comparison
to the fact tables. In a marketing analysis application, for example, typical dimension tables include time period,
marketing region, product type, etcetera.
Subject-oriented, based on abstractions of real-world entities such as project, customer, organisation,
etcetera.
Estimates response time by showing the number of records to be processed in a query. Holds calculated fields
and pre-calculated formulas to avoid misinterpretation, and contains historical changes of a view.
The data warehouse administrator perspective of metadata is a full repository and documentation of all contents
and all processes in the data warehouse, whereas, from an end user perspective, metadata is the roadmap through
the information in the data warehouse.
2.7.7 Operational Data Source (ODS)
The operational data source can be defined as an updatable set of integrated data used for enterprise-wide tactical
decision making. It contains live data rather than snapshots, and retains only minimal history.
Provide fast access to information for specific analytical needs or user groups.
Represent the end user's view and data interface of the data warehouse.
Summary
Business intelligence is not business as usual. It's about making better decisions more easily and making them
more quickly.
Businesses acquire data, such as demographics and mailing lists, from outside sources.
Consolidating and organising data for better business decisions can lead to a competitive advantage, and learning
to uncover and leverage those advantages is what business intelligence is all about.
Dissatisfaction is exhibited with the current reporting systems, especially in terms of flexibility, timeliness,
accuracy, detail, consistency, and integrity of the information across all business users.
Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex
processes in a company.
A data warehouse is a database where data is collected for the purpose of being analyzed.
The systems used to collect operational data are referred to as OLTP (On-Line Transaction Processing).
Bill Inmon coined the term data warehouse in 1990. His definition is: "A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process."
A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set
of users.
External data is data that cannot be found in the OLTP systems but is required to enhance the information
quality in the data warehouse.
OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries,
regardless of database size and complexity.
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and
operate on multi-dimensional data structures.
Drill-down can be defined as the capability to browse through information, following a hierarchical structure.
Metadata is the kind of information that describes the data stored in a database.
A summary table on an OLTP system is the most common implementation that is already included in many
standard software packages.
The three different stages in aggregating/transforming data offer the capability to perform data mining tasks in
the extracted, detailed data without creating workload on the operational system.
Data sources can be operational databases, historical data usually archived on tapes and external data.
Data extraction / data propagation is the process of collecting data from various sources and different platforms
to move it into the data warehouse.
Data warehousing turns data into information; cleansing, on the other hand, ensures that the data warehouse will
have valid, useful, and meaningful information.
Data refining is creating subsets of the enterprise data warehouse, which have either a multidimensional or a
relational organisation format for optimised OLAP performance.
References
Reinschmidt, J., Business Intelligence Certification Guide. [pdf] Available at: <capstone.geoffreyanderson.net/
export/.../sg245747.pdf> [Accessed 27 April 2012].
Haag, 2005. Business Driven Technology W/Cd, Tata McGraw-Hill Education Publication.
Schlukbier, A., 2007. Implementing Enterprise Data Warehousing: A Guide for Executives, Lulu.com
Publication.
2011, 1.2.1 BI Tools and Processes, [Video Online] Available at: <http://www.youtube.com/watch?v=ZpBtxKf20zY>
[Accessed 27 April 2012].
Recommended Reading
Panos, V., Vassiliou, Y., Lenzerini, M. & Jarke, M., 2003. Fundamentals of Data Warehouses, 2nd ed. Springer
Publication.
Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Intelligence Applications,
John Paredes Publication.
Scheps, S., 2008. Business Intelligence For Dummies, John Wiley & Sons Publication.
Self Assessment
1. ______ describes the way data is processed by an end user or a computer system.
a. OLTP Server
b. ODS
c. OLAP
d. OLTP
2. A _________is a database where data is collected for the purpose of being analysed.
a. data mart
b. data warehouse
c. metadata
d. data mining
3. An _________is a high-capacity, multi-user data manipulation engine specifically designed to support and
operate on multi-dimensional data structures.
a. OLAP server
b. OLTP Server
c. ODS
d. OLAP
4. __________is the kind of information that describes the data stored in a database and includes information.
a. Data mart
b. ODS
c. Metadata
d. OLAP
5. ________is the capability to browse through information, following a hierarchical structure.
a. Drill-down
b. Meta data
c. Drill-up
d. Data haunting
6. Which one of the following is not a stage of aggregation in the data?
a. OLTP data
b. ODS Star-Join Schema
c. data marts
d. OLAP
7. _________is the process of collecting data from various sources and different platforms to move it into the
data warehouse.
a. Data aggregation
b. Data extraction
c. Data manipulation
d. Drill-down