BI Solution
Erich Teichmann
Technical Director, BT
Quick Introduction
• Enterprise Architecture
– Why do we need the solution?
• Market drivers; business needs
• Business Drivers
– What are the major components?
• Key pieces that make up the solution & Key interfaces
• Application Architecture
– How are we going to buy/build the components?
• Vendor Analysis
• Frameworks, Methodologies & Tools
– How are we going to deploy & manage the solution?
• Case Study
– BI solution driven by the Basel II accord
BI Market Analysis
Market Definition
[Diagram: source silos feed a Staging area, which is restructured and cleansed into the Historical Data Store (HDS); the HDS is subset and aggregated into an Enterprise Layer and then a User Layer; Development, Management and an enterprise metadata layer span the stack]
BI Market Drivers
• Data integration
• Pricing
• Ease of use
Data Integration
• The critical success factor for many BI projects centres on the
confidence levels attributed to the data underlying a BI solution.
• Users need to be confident about the validity of the data in order to
make successful decisions and ensure widespread BI adoption.
• Organisations need a series of data integration tools and techniques to better
integrate their data in such a way that these initiatives and efforts
can be shared and reused, thereby reducing some of the costs
associated with duplication and unnecessary complexity.
– It is not just a case of slapping some reporting tools on to existing data.
• Data integration issues such as poor data quality, isolated data silos
and multiple versions of the truth contribute to this lack of reuse and
data confidence, and prevent many BI deployments from
maximising their true enterprise potential and ROI.
Pricing Pressures
What is historical data?
• Data used for analysis and reporting can be divided into two broad categories
– Transactional data – i.e. records which represent events occurring in time, which tend to be written once
and not subsequently updated, e.g. “financial transaction” or “contact event”
– Static data – i.e. records which represent persistent business entities, e.g. “customer”, “account”,
“product”, which may change over time
• Previous states of static records, and the underlying events which are embodied
in changes to static records, can be vital elements of business information
• New records, changes to existing records, and even the deletion of existing
records are all events that can be useful for analysing trends, producing models,
triggering customer communication, etc
• History of static data is generally not retained in operational systems. A change in
a variable, such as a customer changing name or marital status, typically
overwrites the variable, losing its previous value
• The HDS is designed to capture data regularly from source systems, and retain
the current and all previous states of each static record over time. It also holds
a full history of associated transactional data, enabling transactional records to be
linked to the correct static record versions.
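The linkage described above can be sketched in a few lines of Python (illustrative structures and names, not the actual HDS implementation): each version of a static record carries the date range for which it was current, and a transaction is matched to whichever version was open on the transaction date.

```python
from datetime import date

# Sketch of linking a transaction to the correct static-record version.
# `customer_versions` is a hypothetical structure: each tuple holds a
# version number, the data fields, and the validity date range.
customer_versions = [  # (version, name, valid_from, valid_to)
    (1, "J SMITH", date(2004, 1, 1), date(2004, 3, 1)),
    (2, "J SMYTH", date(2004, 3, 1), date(9999, 1, 1)),  # name changed 1 Mar
]

def version_at(versions, when):
    """Return the static-record version that was current on `when`."""
    for ver, name, start, end in versions:
        if start <= when < end:
            return ver, name
    return None

# A transaction dated 15 Feb links to the pre-change version of the customer;
# one dated 2 Mar links to the post-change version.
print(version_at(customer_versions, date(2004, 2, 15)))  # (1, 'J SMITH')
print(version_at(customer_versions, date(2004, 3, 2)))   # (2, 'J SMYTH')
```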
What is historical data?
• Representing historical static data requires the storage of multiple record “versions” for each source record
– Initial version
– Possibly one or more updated versions
– Possibly a “deleted record” version, indicating the record has been deleted
– Change data records are referred to as “deltas”
[Diagram: an original data record (primary key 123, name J SMITH, £300) and the corresponding deltas held in the HDS]
How is historical data stored?
• The HDS stores historical data as follows
– For each source table, an equivalent table is constructed in the HDS containing “deltas”
– The fields in the HDS table include all the fields in the source table (except where specified as excluded by the developer), plus a “surrogate key” and a number of control fields (previous key, next key, checksum, start date, end date, open flag, action)
– The surrogate key is the primary key in the HDS table. This is because the source primary key cannot be used, since multiple deltas for a single source record may be held, resulting in multiple records with the same source primary key.
– Control fields are used to manage multiple record versions
– In general, minimal transformations are performed on the data prior to load into the HDS – it is typically held in the same form as the received extract(s), or split in two to represent slow-changing (static) and fast-changing (transactional) data
[Diagram: a staging record (account number 123, name J SMITH, monthly payment £300) and two HDS deltas – surrogate key 456 (Insert, J SMYTH, £250) and surrogate key 567 (Update, J SMITH, £250) – linked via previous/next key control fields]
[Chart: deltas recorded for legacy keys A–E across load numbers 1–9 over time; I = Insert, U = Update, D = Delete]
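The delta-capture idea above can be sketched in Python (illustrative in-memory structures and field names; the actual HDS is a set of database tables). Each load cycle compares the incoming extract with the currently open deltas via a checksum, then writes Insert/Update/Delete deltas with surrogate keys and control fields. For simplicity this sketch closes a record on deletion, whereas the real design also keeps deleted records visible in the Open view.

```python
import hashlib

hds = []            # list of delta records (stands in for the HDS table)
next_surrogate = 1  # surrogate key counter


def checksum(row):
    """Checksum over the data fields, used to detect changed records."""
    return hashlib.md5("|".join(str(v) for v in row.values()).encode()).hexdigest()


def load_cycle(extract, load_number):
    """Apply one load cycle: `extract` maps source primary key -> data fields."""
    global next_surrogate
    open_deltas = {d["source_key"]: d for d in hds if d["open"]}

    for key, row in extract.items():
        prev = open_deltas.get(key)
        if prev is None:
            action = "I"                      # new source record: Insert
        elif prev["checksum"] == checksum(row):
            continue                          # unchanged: no new delta stored
        else:
            action = "U"                      # changed: close old delta, open new
        if prev is not None:
            prev["open"], prev["end"], prev["next"] = False, load_number, next_surrogate
        hds.append({"surrogate": next_surrogate, "source_key": key,
                    "data": row, "checksum": checksum(row),
                    "prev": prev["surrogate"] if prev else None, "next": None,
                    "start": load_number, "end": None,
                    "open": True, "action": action})
        next_surrogate += 1

    for key, prev in open_deltas.items():     # source record disappeared: Delete
        if key not in extract:
            prev["open"], prev["end"] = False, load_number
            hds.append({"surrogate": next_surrogate, "source_key": key,
                        "data": prev["data"], "checksum": prev["checksum"],
                        "prev": prev["surrogate"], "next": None,
                        "start": load_number, "end": None,
                        "open": False, "action": "D"})
            next_surrogate += 1


load_cycle({123: {"name": "J SMITH", "payment": 300}}, 1)
load_cycle({123: {"name": "J SMYTH", "payment": 250}}, 2)   # name changed
load_cycle({}, 3)                                           # record deleted
print([(d["surrogate"], d["action"]) for d in hds])         # [(1, 'I'), (2, 'U'), (3, 'D')]
```

Note that an unchanged extract row stores nothing, which is what gives the efficient-storage property claimed later in the deck.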
How is historical data stored?
• The “Open view” is used to represent the set of deltas that are/were “open” (i.e.
representing current and currently deleted source records) at a particular point in time
• Normally it represents the open records at the current time
[Chart: for legacy keys A–E across load numbers 1–9, the Open view selects the latest delta of each source record; I = Insert, U = Update, D = Delete]
How is historical data stored?
• The Open view can be “rolled back” to a previous load cycle, allowing HDS users to
“see” the data as it existed at any previous time-point
[Chart: the Open view rolled back across load numbers 1–9 for legacy keys A–E, with Change Data Capture comparing values between time-points, e.g. A £300 → £250, C £100, D £400, E £350; I = Insert, U = Update, D = Delete]
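The Open view and its rollback can be sketched as a filter over the deltas (assumed delta layout, not the actual HDS schema): a delta is open at load N if its start load is at or before N and its end load, if any, is after N; excluding “D” deltas yields the records that existed at that point.

```python
deltas = [
    # (surrogate, legacy_key, value, start_load, end_load, action)
    (1, "A", 300, 1, 4, "I"), (4, "A", 250, 4, None, "U"),
    (2, "C", 100, 1, 6, "I"), (5, "C", 100, 6, None, "D"),
    (3, "D", 400, 2, None, "I"),
    (6, "E", 350, 7, None, "I"),
]

def open_view(as_of):
    """Records open at load cycle `as_of`: keep deltas whose validity range
    covers that cycle, dropping 'deleted record' deltas."""
    return {k: (v, a) for _, k, v, s, e, a in deltas
            if s <= as_of and (e is None or e > as_of) and a != "D"}

print(open_view(as_of=9))  # current state: A=250, D=400, E=350 (C deleted at load 6)
print(open_view(as_of=3))  # rolled back: A=300, C=100, D=400
```

Rolling back is therefore just re-evaluating the same view with an earlier load number; no data is copied or restored.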
Benefits and caveats
• Benefits
– Efficient storage – does not store multiple versions of the same record unless it has changed
– Avoids loss of data through the cleansing process – full original versions of all data fields are retained in the HDS irrespective of whether they are fully or partially used, or even understood; this is a key requirement of the Data Feeds Architecture
– Flexible – no restrictions over “temporal granularity”, i.e. changes can be captured every day if required, not just once per week/month/quarter
– Rapid data load – change data capture processes data very quickly
– High-performance data mart load process (built in load cycle iterations, using indexes on appropriate control columns and keys)
• However…
– Architects must bear in mind that the HDS is not a suitable structure for general user querying. In addition to being “unclean”, data in the HDS does not conform to generally understood principles of relational integrity – i.e. relationships between tables rely on applying time constraints as well as foreign key joins. This results in sub-optimal performance and usability for some query types.
– The Enterprise Layer is provided in order to expose HDS data to general users in a usable, high-performance, clean structure.
– The HDS captures data from an early point in the development process and ensures that no data is lost through cleansing or restructuring, allowing the Enterprise Layer to be created and modified to support all foreseeable user query types.
The Enterprise Layer
What is the Enterprise Layer?
[Diagram: data feeds load a Staging area, which is restructured and cleansed into the Data Warehouse]
• Dimensional model – a collection of “star schemas” in which data is divided into measures (stored in fact tables) and context (stored in dimension tables), using surrogate keys to link records over time. Fact tables are independent of one another, whereas dimensions are “conformed” (shared between fact tables).
• Methods include
– Dimensional modelling – a widely used, well-documented, structured
method for star schema design
– Pervasive use of “surrogate keys” - integer primary keys unrelated to
original source key, to enable correct treatment of historical data, reduce
data volumes and improve performance
– Normalised fact tables (all categorical or textual data such as codes,
account numbers, indicators, etc, are moved out to dimensions) reducing
transaction data volumes
– Denormalised dimension tables - making dimensional data easier to
use - e.g. combining customer with current residential address so that
users don’t need to use an intersection table
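The methods above can be illustrated with a minimal star schema (hypothetical table and column names, using Python's built-in sqlite3 in place of the SQL Server platform the deck describes): a normalised fact table holds only measures and integer surrogate keys, while a denormalised dimension table carries the context, including the address held inline.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (          -- denormalised: address held inline
        customer_sk INTEGER PRIMARY KEY, -- surrogate key, unrelated to source key
        source_key  TEXT, name TEXT, address TEXT);
    CREATE TABLE fact_payment (          -- normalised: measures and keys only
        customer_sk INTEGER REFERENCES dim_customer(customer_sk),
        load_month  TEXT, amount INTEGER);
""")
con.execute("INSERT INTO dim_customer VALUES (1, '123', 'J SMITH', '1 High St')")
con.execute("INSERT INTO fact_payment VALUES (1, '2004-01', 300)")
con.execute("INSERT INTO fact_payment VALUES (1, '2004-02', 250)")

# A typical star-schema query: join the fact table to its conformed dimension.
total = con.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_payment f JOIN dim_customer d ON f.customer_sk = d.customer_sk
    GROUP BY d.name
""").fetchone()
print(total)  # ('J SMITH', 550)
```

Because the dimension is shared, a second fact table (e.g. for contact events) could join to the same `dim_customer` rows, which is what makes cross-fact analysis consistent.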
SAS Integration
[Diagram: SAS connects via SAS/CONNECT to SQL Server; the Historical Data Store feeds the Enterprise Layer (Treasury, Party, Deal, Account, Time), which supplies a Data Exploration Mart via ad-hoc loads and a Production Mart via production loads; production models produce Credit Risk Measures, with model write-back]
Storage Summary
[Table: static/dimension data closed more than one cycle previously, and transaction data more than 14 days old, are held nearline; more recent data is held online]
Layer Sizing (5 years)
[Table: Basel II Data Store layer sizing – figures include 42GB, 176GB (online), 2.2TB (online) and 230GB]
• Includes dB overhead, not BCV backup space (online)
User Layer
[Diagram: deployment architecture – source files (Unisys TCSOUT, CIS and other transaction feeds) are unpacked, formatted and decoded into silos; together with ODS sources (BorexMaster/Borex and other static data) they load Staging, the HDS and the Enterprise Layer (approx. 2.5TB online), which feed the User Layer marts: a Model Production Mart with SAS model development and deployment, a Reporting/OLAP Mart, and a Data Exploration Mart]
Metadata
[Diagram: metadata spans the whole architecture – business rules and source silos feed the Historical Data Store, which is restructured and cleansed into the Enterprise Layer and subset/aggregated into the User Layer (SAS, OLAP cubes, reporting tools); the metadata covers data structure, data quality, data models, contacts/ownership, and transformations/data lineage]
Metadata management process
• Processes for generating and maintaining Metadata are integrated into
the development cycle
– Data analysis
– Design and build
– User Testing
– Live operation
• Users have full visibility of metadata, ensuring that they understand the
data being accessed
• Metadata is held and managed using the CASE tool (Popkin), with
extensions to support data lineage information
Summary