Architecting a BI Solution
Erich Teichmann
Technical Director, BT

Quick Introduction

• Enterprise Architecture
– Why do we need the solution?
• Market drivers; business needs
• Business Drivers
– What are the major components?
• Key pieces that make up the solution & Key interfaces
• Application Architecture
– How are we going to buy/build the components?
• Vendor Analysis
• Frameworks, Methodologies & Tools
– How are we going to deploy & manage the solution?

• Case Study
– BI solution driven by the Basel II accord
BI Market Analysis
Market Definition

• Business intelligence (BI) is now a ubiquitous, all-encompassing and often overused term covering a range of different tools, technologies and disciplines.
• Most analysts have a very fluid definition of BI, describing it as ‘a set of technologies and processes that support the decision-making process’.
Market Definition - BI Components

• From a software perspective, BI technology covers, but is not limited to, the following capabilities and disciplines:
– extracting and/or processing information from operational IT systems;
– integrating, possibly supplementing, and storing that information in the most appropriate form;
– providing the ability to opportunistically analyse and/or automate the analysis of information according to business needs;
– disseminating and delivering analyses of that information;
– determining how best and when to feed the results of analysis back into a company’s business processes so that they may be actioned;
– providing robust development and management tools to support the BI information infrastructure.

• A BI platform is the name commonly given to the culmination of these technologies and processes, and is the technical infrastructure on which technology- or industry-specific analytic applications can be built and deployed.
BI Components

[Architecture diagram: data feeds flow into a Staging area, then into a Historical Data Store (HDS) made up of per-source silos; a Restructure/Cleanse step populates the Enterprise Layer, underpinned by enterprise metadata; a Subset/Aggregate step then populates the User Layer. Development and Management capabilities span all layers.]
BI Market Drivers

• The growth in BI can be attributed to the use of BI technology to address three broad business drivers:
– reducing exposure to risk;
– reducing costs;
– driving competitive advantage.
BI Market Drivers – Reducing Risk

• Regulation is a fact of corporate life today.
– Over the last few years we have seen the introduction of a raft of regulations, such as Sarbanes-Oxley, Basel II, anti-money laundering and HIPAA, all of which place pressure on companies to be more accountable for the monitoring and use of information.
– These regulatory requirements impact not only the business but also IT, as the process of capturing, storing and reporting business information is brought under tighter control.
• BI software can play a vital role in helping organisations comply with these regulations:
– information audit trails;
– data lineage;
– providing an integrated view of the business; and
– enhanced financial reporting and performance measurement.
• These capabilities bring more standardisation and control to a company’s business processes.
BI Market Drivers – Reducing Cost

• CxOs are being set far more drastic targets by their CEOs.
– CIOs, for example, are being asked to reduce operational IT costs by 30% over three years, which means that for many, trimming around the edges is no longer enough.
• From a BI perspective, organisations are having to take more radical steps to reduce costs, and this is manifesting itself in two broad forms:
– Greater emphasis on the consolidation and rationalisation of BI products and vendors: the cost and complexity of maintaining multiple BI tools is too high.
– Organisations are using BI software to identify areas of the business that are not performing, by defining and measuring key performance indicators across the business – commonly referred to as performance management.
BI Market Drivers – Competitive Advantage

• The window of opportunity for competitive differentiation is short but highly valuable, and BI software can play a key role in helping organisations make the most of this window.
• In particular, the ability to access, analyse and exploit the right information at the right time allows organisations to optimise and make more effective and timely day-to-day business decisions.
• Examples?
Inhibitors to BI Mass Market

• Data integration;
• Pricing;
• Ease of use
Data Integration
• The critical success factor for many BI projects centres on the confidence levels attributed to the data underlying a BI solution.
• Users need to be confident about the validity of the data in order to make successful decisions and ensure widespread BI adoption.
• Organisations need a set of data integration tools and techniques to better integrate their data in such a way that these initiatives and efforts can be shared and reused, thereby reducing some of the costs associated with duplication and unnecessary complexity.
– It is not just a case of slapping some reporting tools onto existing data.
• Data integration issues such as poor data quality, isolated data silos and multiple versions of the truth contribute to this lack of reuse and data confidence, and prevent many BI deployments from maximising their true enterprise potential and ROI.
Pricing Pressures

• The high cost of buying, implementing and maintaining a BI solution has so far prevented adoption of BI technologies on a mass scale.
• The most immediate obstacle to this mass adoption comes from the pricing and licensing models employed by many of the BI vendors.
– Vendors realise that this has to change…
• Large, enterprise-wide projects.


Ease of Use
• Information consumers are typically concerned with getting access to the right information at the right time and do not want to struggle with the baffling array of features and functions present within certain BI tools.
• Vendors address this by:
– offering dashboards that provide a simpler and more relevant view of business information;
– placing an increasing emphasis on integrating with Office productivity tools, especially Excel, to increase the familiarity and usability of the user interface for accessing BI information;
– employing varying levels of visualisation capabilities within their products, ranging from basic charts and gauges to more advanced mapping technologies, to improve the display and comprehensibility of information.
Positioning BI
BI at the right Organisational Layer
• There are three general strata, each with unique characteristics:
– Strategic BI.
• Targets executives and other high-level decision-makers concerned with long-
term corporate strategy.
• e.g. long-term customer and product trend analysis; long-term forecasting
(sales over the next 12 months, expenses for the next three years); and
corporate financials by divisional, geographic, and temporal dimensions.
– Tactical BI.
• Targets middle management, where mid-level decision-makers are concerned
with short-term domain-specific tactics.
• e.g. short-term customer and product analysis; short-term forecasting (workers
needed for the next three weeks, resource use expectations for two months);
and departmental financial forecasts versus actuals.
– Operational BI.
• Targets production workers who make low-level decisions about instantaneous
process-specific actions.
• e.g. immediate call-center upsell/cross-sell; just-in-time manufacturing; trading
floor buy/sell/hold; and retail credit approval.
Key differentiators for three strata of BI

[Table figure not reproduced. Source: Forrester (2004)]


BI Products & Platforms
BI Products - Shortfalls
• Today's BI products fall short in three key areas:
– Data mining: complex statistical and mathematical capabilities that map complex patterns from large quantities of data. Sophisticated products are few and far between…
– ETL: extraction, cleaning, and normalizing of data that is then translated into a common format and loaded into data warehouses for use.
– Vertical reporting tools: standardized report formats and specific business-issue solutions that can be generated and delivered across the enterprise, as well as predefined analytical questions of the data.
• Tier-one and tier-two companies are spending R&D pounds on developing interface and standard report offerings to address vertical issues.
• Companies like Oracle are building out their ETL and data mining offerings, too, but vendors with niche expertise and product offerings are well positioned for acquisition for this functionality, e.g. Ab Initio.
• Data mining in particular needs sophisticated business analytical personnel.
BI Platforms - Shortfalls
• Most firms' BI strategies involve a multitude of tools connected to
many different data sources, resulting in:
• Unnecessary software spending.
– BI software purchases have historically occurred at the department level, so most
firms have more than one competitive BI tool in-house.
– Piecemeal tool purchasing based on per-user licenses also leads to shelfware,
further increasing firms' software costs without delivering any extra value.
• Excessive training, development, and support.
– Vendors use the same language to describe the functionality of their query,
reporting, and OLAP tools, but their actual implementations and tools vary
significantly. As a result, each BI package that firms own requires unique training,
development, and IT support.
• Multiple versions of the truth.
– The status quo at most firms is that departments use their own BI software and
data sources to produce their own metrics.
– The problem? Many metrics - like profit, customer value, and revenue, for example - are bigger than any single department. The result is conflicting metrics that purport to measure the same thing.
Status quo – a tangled web of tools
One-Stop Enterprise Intelligence
• Firms have historically pieced together multivendor BI strategies because
they didn't have a choice -- no single product or vendor provided the
analytical features necessary to serve all users.
• New breed of BI software has arrived: BI platforms. They offer…
• A unified data layer.
– Instead of multiple tools making independent connections to data in different combinations of
source systems, BI platforms provide a common metadata layer that unifies data access,
creating a "virtual warehouse" view of enterprise data.
– Reporting, query, OLAP, and mining tools that make up the BI platform all interact with this
central metadata model so that all users, regardless of their department or analytical prowess,
have access to the same “version of the truth”
• Centralised tools.
– BI platforms replace firms' chaotic mix of multiple, redundant tools for functions like reporting
and OLAP with centralised, Web-based interfaces or services (SOA).
– Sharing a single tool set across the enterprise means that IT can finally gain control over the
aspects of BI they handle best: infrastructure, security, and account maintenance. (go BT!)
• Shared metrics.
– BI platforms put an end to contradictory computations by storing key metrics centrally within
the same metadata layer that unifies access to data. When users want to design a report that
computes "churn," they simply drag it to their report from the corporate metric repository.
Streamlined Enterprise Intelligence
Few vendors offer true BI platforms
• Vendors have converged on the BI platform market from the
reporting market (Brio Software and Crystal Decisions), query tools
(Business Objects), multidimensional analysis (Hyperion Solutions
and Cognos), infrastructure (Microsoft and Oracle), business
applications (SAP), and out of the blue (MicroStrategy). They've all
got work to do.
• Database vendors must improve tools.
– Key players host much of the data that feeds firms' BI platforms and have built
OLAP engines and substantial analytical capabilities into their database engines,
but neither has brought a coherent set of BI tools to market.
• BI veterans must embrace data mining.
– Deliver solid reporting and OLAP features, but they don't provide sophisticated
data mining capabilities that professional data analysts need.
• Leaders must focus on functional depth.
– Provide software that runs the gamut of BI platform functionality, but their tools
are merely adequate, not outstanding, in advanced disciplines like data mining
and inline analytics.
Forrester Research (2004)
Business Intelligence Platforms – ‘04

[Vendor landscape figure not reproduced. Source: Forrester Research (2004)]

Magic Quadrant for BI Platforms

[Quadrant figure not reproduced. Source: Gartner (Jan 2007)]


Questions?
Case Study

DWH / Data Mining Solution in response to the Basel II Accord
Background
• The Basel II Accord demands a new approach to Risk Management
and Capital Adequacy
– Current rules require banks to set aside a fixed proportion (8%) of deposited funds to pay customers when they withdraw money. The rest (up to 92%) they lend out (mortgages, loans, etc.) or otherwise invest.
– The magic 8% is supposed to be sufficient to cover occasional losses incurred
when debtors “default” (fail to pay back their loans when expected)
– It doesn’t take into account the actual level of risk of loss presented by different
debtors – i.e. the likelihood that they will default, and the amount of the lending
exposure that will be lost when they default
– For example – the likelihood of default may be lower for a 40 year old employed
person with no bad credit history than for a 22 year old self-employed person with
several “black marks” on their credit file
– Also – if a mortgage for £50,000 on a house worth £100,000 goes into default, the
expected loss may be significantly less than for an unsecured loan of £25,000
– Current rules do not require banks to perform any risk analysis on their loan book,
nor disclose how they have calculated their capital set-aside
Background
• The Basel II Accord demands a more rigorous approach in three
key respects (“Pillars”):
– 1) Capital adequacy requirements must be calculated using a well-defined
quantitative approach, based on estimates of three key measures:
• Probability of Default (PD) – the likelihood an account will default
• Exposure At Default (EAD) – the amount of the loan exposure at the time of
the default
• Loss Given Default (LGD) – the proportion of the loan exposure that the bank
will lose (i.e. not recoup from collections) if the account goes into default
– 2) Risk calculations and capital requirements must be managed and validated
through a controlled, supervised process approved by the Regulator
– 3) Summary risk evaluations and capital adequacy calculations must be disclosed
through regular reporting to the FSA
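The three Pillar 1 measures are commonly combined into an expected-loss figure per exposure (EL = PD x EAD x LGD). A minimal illustrative sketch follows – the accounts, figures and helper function are invented for illustration, and the actual Basel II capital formula is considerably more involved than this simple product:

```python
# Illustrative only: combining the three Pillar 1 measures into an expected
# loss per exposure (EL = PD x EAD x LGD). Figures are invented; the real
# Basel II capital calculation is more involved.

def expected_loss(pd_: float, ead: float, lgd: float) -> float:
    """Probability of default x exposure at default x loss given default."""
    return pd_ * ead * lgd

portfolio = [
    # (account, PD, EAD in GBP, LGD)
    ("mortgage-001", 0.01, 50_000.0, 0.20),  # secured: lower PD and LGD
    ("loan-002",     0.05, 25_000.0, 0.80),  # unsecured: higher PD and LGD
]

total = sum(expected_loss(pd_, ead, lgd) for _, pd_, ead, lgd in portfolio)
print(f"Total expected loss: £{total:,.2f}")  # £1,100.00 for these figures
```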
Background
• To support the fulfilment of the 3 Pillars, the Accord explicitly or
implicitly specifies a number of high-level requirements relating to
capture and exploitation of data:
– Access to comprehensive, accurate, up-to-date account and customer data
across the entire portfolio of products sufficient to support quantitative estimation
of the 3 risk measures
– Ability to create and validate calculation processes (“models”) for risk measures
using historical data going back up to 7 years or more
– Facility to execute models on a regular basis for actual capital calculations
– Capability to provide a complete audit trail of data back to source systems to
show how every calculation was derived
– Facility to create reports on risk measures and capital requirements broken down
by product and/or customer segments
Customer Requirements
• BASEL HISTORICAL DATA CAPTURE
– Acquire and retain historical data for risk analysis, enabling data exploration and modelling
• CAPACITY
– Store 5 - 7 years of detailed historical data, minimise duplication of data
• FLEXIBILITY
– Support wide range of user queries against current and historical data - data store may need
to support other areas (marketing, general CIS queries) as well as Basel
• PERFORMANCE
– Deliver high-performance for data load process and end-user queries
• USABILITY
– Enable users to access, query and understand data with ease
• SINGLE VIEW OF TRUTH
– Provide consistent “single view” of data to all user groups
• AUDITABILITY
– Provide full audit trail from source data to model output and reporting
BDS Architecture Overview

[Architecture diagram: data feeds → Staging → Historical Data Store (per-source HDS silos) → Restructure/Cleanse → Enterprise Layer → Subset/Aggregate → User Layer, annotated as follows.]

• BASEL HISTORICAL DATA REQUIREMENTS – retains a complete set of historical data for 5 to 7 years.
• CAPACITY – efficient data capture structure; will support nearline storage technology.
• PERFORMANCE – high-speed “Change Data Capture” provides rapid population of the HDS.
• AUDITABILITY – data is held in raw, unmodified form, enabling derived data to be traced back to source.
• SINGLE VIEW OF TRUTH – data is cleaned, transformed and mapped into a single “conformed” structure, with clear business definitions available from the Metadata repository.
• USABILITY – based on Kimball’s dimensional modelling techniques, ensuring usable, high-performance querying. Clear business definitions for all data are available from the Metadata repository.
• PERFORMANCE – the “changes only” approach supports rapid Enterprise Layer population; star schema models support fast querying.
• FLEXIBILITY – can easily be modified or partially rebuilt using historical data to support new or evolving requirements.
• ANALYSIS – uses SAS to support exploration, analysis and modelling to generate risk data, which is fed back into the BDS for internal and external reporting.
• REPORTING TOOLS – used directly against the Enterprise Layer, or against OLAP cubes for rapid multidimensional analysis.
The Historical Data Store

What is the HDS?

[Architecture diagram highlighting the Historical Data Store between Staging and the Restructure/Cleanse step that feeds the Enterprise Layer.]

• The Historical Data Store is a key component of the Data Warehouse Architecture.
• It addresses the challenges of:
– storing the complete history of a data source;
– storing it efficiently;
– updating it quickly;
– supporting flexible development of high-performance, usable data marts in the Enterprise Layer.
What is historical data?
• Data used for analysis and reporting can be divided into two broad categories:
– Transactional data – i.e. records which represent events occurring in time, which tend to be written once and not subsequently updated, e.g. “financial transaction” or “contact event”.
– Static data – i.e. records which represent persistent business entities, e.g. “customer”, “account”, “product”, which may change over time.
• Previous states of static records, and the underlying events which are embodied in changes to static records, can be vital elements of business information.
• New records, changes to existing records, and even the deletion of existing records are all events that can be useful for analysing trends, producing models, triggering customer communication, etc.
• The history of static data is generally not retained in operational systems. A change in a variable, such as a customer changing name or marital status, typically overwrites the variable, losing its previous value.
• The HDS is designed to capture data regularly from source systems, and retain the current and all previous states of each static record over time. It also holds a full history of associated transactional data, enabling transactional records to be linked to the correct static record versions.
What is historical data?
• Representing historical static data requires the storage of multiple record “versions” for each source record:
– an initial version;
– possibly one or more updated versions;
– possibly a “deleted record” version, indicating the record has been deleted.
• These change data records are referred to as “deltas”.
• Each delta is time-stamped with a start and (for previous versions) an end date, representing the time range over which the delta is/was valid.

[Example from the slide: the original source record (primary key 123, “J SMITH”, £300) is held in the HDS as a chain of deltas – a first version at £250, two subsequent updated versions, and finally a “deleted record” version.]
How is historical data stored?
• The HDS stores historical data as follows:
– For each source table, an equivalent table is constructed in the HDS containing “deltas”.
– The fields in the HDS table include all the fields in the source table (except where specified as excluded by the developer), plus a “surrogate key” and a number of control fields.
– The surrogate key is the primary key in the HDS table. The source primary key cannot be used, since multiple deltas for a single source record may be held, resulting in multiple records with the same source primary key.
– The control fields are used to manage multiple record versions: previous key, next key, checksum, start date, end date, open flag, and action (I = Insert, U = Update, D = Delete).
– In general, minimal transformations are performed on the data prior to load into the HDS – it is typically held in the same form as the received extract(s), or split in two to represent slow-changing (static) and fast-changing (transactional) data.

[Example from the slide: a staged source record (account number 123, J SMITH, monthly payment £300) is held in the HDS as three deltas – surrogate key 456 (J SMYTH, £250, action I), 567 (J SMITH, £250, action U) and 678 (J SMITH, £300, open flag set, action U) – each linked to its previous/next version via the control fields.]


How is historical data stored?
• Each delta in the HDS has a time range over which it is/was valid, and is described as being one of three “action” types – an insert (i.e. a new record), an update, or a delete.
• Each delta is linked to its previous and next incarnations, so that each source record is represented as an independent “chain” of deltas spanning the time of the source record’s existence (and possible subsequent deletion).

[Diagram: five source records (legacy keys A–E) plotted against load numbers 1–9, each shown as a chain of Insert, Update and Delete deltas; record B is deleted, and record E is deleted and later re-inserted.]
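As a minimal sketch only – the Python field names, types and dates below are hypothetical, not the actual BDS schema – one HDS delta, and the chain for account 123 from the earlier example, might be represented as:

```python
# Hypothetical sketch of an HDS delta record: all source fields are retained
# unchanged, plus a surrogate key and the control fields described above.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class HdsDelta:
    surrogate_key: int           # primary key within the HDS
    source_key: str              # original source primary key, e.g. account number
    name: str                    # ...plus every other source field, held as received
    monthly_payment: float
    checksum: str                # used by Change Data Capture to spot changed records
    start_date: date             # start of the period this version is/was valid
    end_date: Optional[date]     # None while this is still the open version
    previous_key: Optional[int]  # surrogate key of the previous delta in the chain
    next_key: Optional[int]      # surrogate key of the next delta in the chain
    open_flag: bool              # True for the currently open version
    action: str                  # "I" insert, "U" update, "D" delete

# The three deltas for account 123 from the slide (dates are illustrative):
deltas = [
    HdsDelta(456, "123", "J SMYTH", 250.0, "c1", date(2004, 1, 5), date(2004, 1, 6), None, 567, False, "I"),
    HdsDelta(567, "123", "J SMITH", 250.0, "c2", date(2004, 1, 6), date(2004, 1, 9), 456, 678, False, "U"),
    HdsDelta(678, "123", "J SMITH", 300.0, "c3", date(2004, 1, 9), None, 567, None, True, "U"),
]
```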
How is historical data stored?
• The “Open view” is used to represent the set of deltas that are/were “open” (i.e. representing current and currently deleted source records) at a particular point in time.
• Normally it represents the open records at the current time.

[Diagram: the same delta chains for legacy keys A–E across load numbers 1–9, with the latest delta of each chain highlighted as the current Open view.]
How is historical data stored?
• The Open view can be “rolled back” to a previous load cycle, allowing HDS users to “see” the data as it existed at any previous time-point.

[Diagram: the same delta chains, this time highlighting the deltas that were open at load number 6 – the Open view for load number 6.]
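A minimal sketch of the roll-back idea, assuming each delta records the load cycle in which it was created and (once superseded or deleted) the load cycle in which it was closed – the dictionary field names are hypothetical:

```python
# Hypothetical sketch: reconstruct the Open view as it stood at the end of a
# given load cycle, using per-delta created/closed load numbers.

def open_view_as_of(deltas, load_number):
    """Deltas that were the open version of their source record at that load."""
    return [
        d for d in deltas
        if d["created_load"] <= load_number
        and (d["closed_load"] is None or d["closed_load"] > load_number)
    ]

deltas = [
    # source record A: inserted at load 1, updated at loads 4 and 7
    {"source_key": "A", "created_load": 1, "closed_load": 4},
    {"source_key": "A", "created_load": 4, "closed_load": 7},
    {"source_key": "A", "created_load": 7, "closed_load": None},
]

print(open_view_as_of(deltas, 6))  # only the version created at load 4 was open then
```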


How is historical data loaded?
• Static data is loaded into the HDS using the “Change Data Capture” process. This compares the new set of data with the current Open view, and determines which changes have occurred (inserts, updates, deletes).

[Diagram: a new extract from the data source (A £300, D £400, E £350) is compared with the current Open view (A £250, C £100, D £400); Change Data Capture detects A £300 as an update, C £100 as a delete and E £350 as an insert, and applies the resulting deltas to the HDS.]
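A minimal sketch of the comparison step, assuming both the new extract and the Open view are keyed by the source primary key – the dictionaries below are illustrative, and the real process compares checksums across all fields rather than single values:

```python
# Illustrative Change Data Capture: classify each source record as an insert,
# update, delete or no-change by comparing the new extract with the Open view.

def change_data_capture(new_data: dict, open_view: dict):
    inserts = {k: v for k, v in new_data.items() if k not in open_view}
    deletes = {k: v for k, v in open_view.items() if k not in new_data}
    updates = {k: v for k, v in new_data.items()
               if k in open_view and v != open_view[k]}  # in practice, a checksum test
    return inserts, updates, deletes

open_view = {"A": 250, "C": 100, "D": 400}  # balances currently open in the HDS
new_data  = {"A": 300, "D": 400, "E": 350}  # latest extract from the source system

inserts, updates, deletes = change_data_capture(new_data, open_view)
print(inserts)  # {'E': 350}  -> new delta, action I
print(updates)  # {'A': 300}  -> close the old delta, open a new one, action U
print(deletes)  # {'C': 100}  -> "deleted record" delta, action D
```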
Benefits and caveats
• Benefits:
– Efficient storage – does not store multiple versions of the same record unless it has changed.
– Avoids loss of data through the cleansing process – full original versions of all data fields are retained in the HDS irrespective of whether they are fully or partially used, or even understood; this is a key requirement of the Data Feeds Architecture.
– Flexible – no restrictions on “temporal granularity”, i.e. changes can be captured every day if required, not just once per week/month/quarter.
– Rapid data load – change data capture processes data very quickly.
– High-performance data mart load process (built in load cycle iterations, using indexes on appropriate control columns and keys).
• However…
– Architects must bear in mind that the HDS is not a suitable structure for general user querying. In addition to being “unclean”, data in the HDS does not conform to generally understood principles of relational integrity – i.e. relationships between tables rely on applying time constraints as well as foreign key joins (see the sketch below). This results in sub-optimal performance and usability for some query types.
– The Enterprise Layer is provided in order to expose HDS data to general users in a usable, high-performance, clean structure.
– The HDS captures data from an early point in the development process and ensures that no data is lost through cleaning or restructuring, allowing the Enterprise Layer to be created and modified to support all foreseeable user query types.
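To illustrate the time-constraint point, a minimal sketch of relating two HDS tables – the foreign key alone is not enough, because both sides hold many dated versions of each source record (the field names are hypothetical):

```python
# Hypothetical sketch: an HDS "join" needs a foreign key match AND overlapping
# validity, since each table holds multiple dated versions per source record.
from datetime import date

def customer_version_for(account_delta, customer_deltas):
    """Return the customer delta that was valid when the account delta started."""
    for c in customer_deltas:
        valid_to = c["end_date"] or date.max  # open versions have no end date
        if (c["customer_key"] == account_delta["customer_key"]
                and c["start_date"] <= account_delta["start_date"] < valid_to):
            return c
    return None
```

In the Enterprise Layer this resolution is done once, at load time, so that end users never have to express such time constraints themselves.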
The Enterprise Layer

What is the Enterprise Layer?

[Architecture diagram highlighting the Enterprise Layer between the Restructure/Cleanse step and the Subset/Aggregate step that feeds the User Layer.]

• Based on Kimball’s Dimensional Data Warehouse.
• Presents data to users in a form that is:
– Easy to use – no complex joins, meaning mistakes are less likely.
– Consistent to use – all fact tables and dimensions are used in broadly the same way.
– High-performance – each table is only one join away from a fact table, and fact/dimension joins are consistently optimised by RDBMSs (including SQL Server).
– Supports straightforward load into cube formats.
– Data mart build process accelerated using CDC.
– Capability to partially or completely reconstruct the EL from the HDS if necessary.
Kimball’s Dimensional Warehouse

[Diagram: data feeds → Staging → Restructure/Cleanse → Data Warehouse → summary/subset layer.]

• Dimensional model – a collection of “star schemas” in which data is divided into measures (stored in fact tables) and context (stored in dimension tables), using surrogate keys to link records over time. Fact tables are independent of one another, whereas dimensions are “conformed” (shared between fact tables).
• Users directly access the Data Warehouse, or use summary/subsetted data for enhanced performance or specialised needs.
What is the Enterprise Layer?

• Methods include:
– Dimensional modelling – a widely used, well-documented, structured method for star schema design.
– Pervasive use of “surrogate keys” – integer primary keys unrelated to the original source key – to enable correct treatment of historical data, reduce data volumes and improve performance (see the sketch below).
– Normalised fact tables (all categorical or textual data such as codes, account numbers, indicators, etc., are moved out to dimensions), reducing transaction data volumes.
– Denormalised dimension tables – making dimensional data easier to use, e.g. combining customer with current residential address so that users don’t need to use an intersection table.
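A minimal sketch of the surrogate-key approach – fact rows carry an integer dimension key rather than the source key, so each historical version of a customer has its own denormalised dimension row. All names and values below are illustrative, not the actual BDS model:

```python
# Illustrative star-schema fragment: integer surrogate keys on the fact table,
# denormalised attributes (including current address) on the dimension.

customer_dim = {
    # surrogate key -> one row per historical version of source customer 123
    1: {"source_key": "123", "name": "J SMYTH", "address": "1 High St"},
    2: {"source_key": "123", "name": "J SMITH", "address": "1 High St"},
}

payment_fact = [
    # each fact row references the customer version valid at the time of payment
    {"customer_sk": 1, "amount": 250.0},
    {"customer_sk": 2, "amount": 300.0},
]

# Reporting is then a single fact-to-dimension lookup per dimension:
for row in payment_fact:
    dim = customer_dim[row["customer_sk"]]
    print(dim["name"], row["amount"])
```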
SAS Integration

[Diagram: SAS connects to SQL Server via SAS/CONNECT. The Historical Data Store feeds the Enterprise Layer (covering entities such as Treasury Deal, Party, Account, Product Group and Time). Ad-hoc loads from the Enterprise Layer populate a Data Exploration Mart and a Model Build Mart in SAS; deployed models run against a Production Mart fed by the production load, producing Credit Risk Measures that are written back into the warehouse (“model writeback”).]
Storage Summary

[Diagram: storage split between online and nearline. Online: open static data, static data closed in the last ETL cycle, transaction data up to 14 days old, and the Enterprise Layer dimensions. Nearline (HDS archive): static data closed more than one ETL cycle previously and transaction data more than 14 days old.]
Layer Sizing (5 years)

[Diagram: Basel II Data Store sizing over 5 years – indicative volumes of 42GB, 176GB (online) plus 1.4TB (nearline), 230GB, 2.2TB and 2.5TB quoted across the Staging, HDS, Enterprise Layer and User Layer tiers (online unless noted). Figures include dB overhead but not BCV backup space.]

[Diagram: end-to-end data flow – source files (BorexMaster/Borex, Unisys TCSOUT, CIS and other feeds) are unpacked, formatted and decoded into per-source silos and ODS/staging areas; Change Data Capture populates the HDS; the Enterprise Layer is built from the HDS and feeds the User Layer marts (Model Development, Model Production, Reporting/OLAP and Data Exploration) used by SAS; model output is written back, and a Metadata Repository with a Metadata Browser spans the architecture.]


Metadata architecture

[Diagram: the layered architecture (data feeds from FDE, Unisys, ICL, CIS, RBS, TMS and Experian → Staging (complete refresh or deltas only) → Historical Data Store → Restructure/Cleanse → Enterprise Layer → Subset/Aggregate → User Layer → SAS, OLAP cubes and reporting tools), with the Metadata repository linked to every layer.]

• Metadata captured includes:
– data definitions;
– business rules;
– data structure;
– data quality;
– data models;
– contacts / ownership;
– transformations / data lineage.
Metadata management process
• Processes for generating and maintaining Metadata are integrated into
the development cycle
– Data analysis
– Design and build
– User Testing
– Live operation

• Users have full visibility of metadata, ensuring that they understand the
data being accessed
• Metadata is held and managed using the CASE tool (Popkin), with
extensions to support data lineage information
Summary

• A BI solution is an enterprise-wide initiative:
• realised by a long-term project (with a life of its own…);
• affecting all major systems;
• and users => STAKEHOLDER MANAGEMENT;
• there are various layers of storage;
• front-end tools only provide the “tip of the iceberg”;
• it is imperative for the survival of the enterprise.


Questions?
