Rapid Mart Deployment Guidelines
BusinessObjects Analytics
Copyright No part of the computer software or this document may be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage and retrieval system, without permission in writing from Business
Objects S.A.
The information in this document is subject to change without notice. If you find any problems
with this documentation, please report them to Business Objects S.A. in writing at
documentation@businessobjects.com.
Business Objects S.A. does not warrant that this document is error free.
Copyright Business Objects S.A. 2004. All rights reserved.
Printed in France.
Trademarks The Business Objects logo, WebIntelligence, BusinessQuery, the Business Objects tagline,
BusinessObjects, BusinessObjects Broadcast Agent, Rapid Mart, Set Analyzer, Personal
Trainer, and Rapid Deployment Template are trademarks or registered trademarks of Business
Objects S.A. in the United States and/or other countries.
Contains IBM Runtime Environment for AIX(R), Java(TM) 2 Technology Edition Runtime
Modules (c) Copyright IBM Corporation 1999, 2000. All Rights Reserved.
Contains ICU libraries (c) 1995-2003 International Business Machines Corporation and others.
All rights reserved.
All other company, product, or brand names mentioned herein may be the trademarks of their
respective owners.
Use restrictions This software and documentation is commercial computer software under Federal Acquisition
regulations, and is provided only under the Restricted Rights of the Federal Acquisition
Regulations applicable to commercial computer software provided at private expense. The use,
duplication, or disclosure by the U.S. Government is subject to restrictions set forth in
subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.227-
7013.
Contents
Chapter 1  Deployment Guidelines
    Architecture
    Installing Data Integrator
    Installing the rapid mart(s)
    Testing
    Customization
    Performance tuning
    Deployment
    Addendum: Updating software versions
Chapter 2  Summary
Chapter 1
Deployment Guidelines
Overview
These guidelines have two main purposes:
- to help you get started in your rapid mart deployment
- to propose typical solutions for some of the issues you may encounter
The guidelines provide only a brief overview of the deployment process. They
walk you through deployment step by step, beginning with architectural
concepts and going on to installation according to your chosen architecture,
configuration, and customization.
For more information, refer to the following documentation:
- Data Integrator Getting Started Guide: describes how to install Data Integrator.
- Data Integrator Reference Guide: describes Data Integrator's features.
- Rapid Mart Deployment Guide: explains how a rapid mart is built, and why it is built as it is.
- Rapid Mart documentation (one guide for each rapid mart): describes how to use a specific rapid mart.
- Data Integrator Performance Tuning Guide: describes how to optimize flows.
- Database documentation: describes how to use, configure, and tune a database.
- Ralph Kimball and team: tips and tricks for data modeling.
Architecture
Data Integrator has a very flexible architecture. You can install components such
as the repository, jobserver, Designer, Web Admin and data warehouse in
different locations. You can also install more than one copy of each component.
To reduce the risk of failure, Business Objects recommends that you assign someone to run a test
across the entire system, including looking at the side effects of changes on other
elements.
To enable such tests, build a third system: the integration test system. This will
increase the workload required when moving any change to production, because
the system must pass more tests before the move, but this process will ensure a
higher quality of deployment.
In some cases, it can be beneficial to use a dedicated machine for the integration
test (when upgrading software, for instance), but in most cases the integration
test system can be installed on the development computer (using different user
accounts and file paths for data).
The diagram below shows an overview of the architecture of a recommended
transition from development to production.
[Diagram: Developers, Central repository, Integration repository]
Summary
1. Install all Data Integrator components with the exception of Designer (that is,
repositories, jobserver and WebAdmin) on the data warehouse machine.
2. Set up two separate environments, development and production, on
separate physical machines.
3. As soon as the data warehouse is in production, add an integration test
phase. Integration and development environments can share hardware.
Installing Data Integrator
Repository considerations
Developers need a local repository in order to work independently. Create one
(and only one) local repository for the integration environment, and another one
for the production system. Give the repository owners names that reflect
the environment, such as rep_int and rep_prod.
How you name and configure these local repositories depends largely on your
answers to the following questions:
- How many people will develop simultaneously using Data Integrator?
- Will those people all work on the same tables?
- Can the project be divided into multiple areas?
The answers to these questions normally lead to one of two conclusions:
- The project is easily divisible into areas like cost center accounting, sales, and
purchasing. For each area there is just one lead, with others who act as
backups.
In this case, name the repositories in relation to their subject area, such as
rep_sales and rep_cost. Development relates more closely to the subject
area than to the specific individual developers, who will not be working on the
same area simultaneously.
- A large team is working on the same parts of a data warehouse. For example,
one developer specializes in retrieving sales data from SAP and another from
Summary
1. Set up one server for production and one for test and integration.
2. Set up three separate databases and define a user called dwh in each.
3. In the development database, create a user for each subproject.
4. In the integration database, create a user called rep_int, and in the production
database, create a user called rep_prod.
5. Create all datastores and export them into all repositories.
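As an illustration of steps 2 to 4, the Oracle statements for one environment might look like the following sketch (the user names come from the summary above; the passwords are placeholders):

```sql
-- Development/integration server: data warehouse owner plus repository user
CREATE USER dwh IDENTIFIED BY change_me;
GRANT CONNECT, RESOURCE TO dwh;

-- Integration repository owner (rep_prod is created the same way
-- in the production database)
CREATE USER rep_int IDENTIFIED BY change_me;
GRANT CONNECT, RESOURCE TO rep_int;
```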
Installing the rapid mart(s)
User permissions
Owners should have the CONNECT and RESOURCE roles so that they can
create and drop database objects in their own schemas. In addition, grant
the dwh users the following permissions to enable them to create and
refresh Oracle materialized views:
CREATE MATERIALIZED VIEW
CREATE ANY TABLE
COMMENT ANY TABLE
Grant these permissions with a single GRANT statement.
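A minimal sketch of such a statement, assuming the dwh user named in the installation summary:

```sql
-- Privileges needed to create and refresh materialized views;
-- run as a DBA user against each environment's database.
GRANT CREATE MATERIALIZED VIEW, CREATE ANY TABLE, COMMENT ANY TABLE TO dwh;
```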
Installation
Once you have made sure the database is correctly configured and the users
have the appropriate rights, you can install the rapid mart jobs and the rapid
mart tables. This involves:
- running the installer
- specifying the repository
- selecting the datastores
- providing the information for the target database so that the tables can be created
As part of this installation, two sample stored procedures are installed. Because
the stored procedures are dependent on database versions and configuration,
database administrators must review and configure them as appropriate before
executing them.
One stored procedure, called before each load of a fact table, drops all indexes
and materialized view logs for a more efficient load. The second is called after the
load to recreate the indexes and logs.
Post-installation
To complete your installation, read the rapid mart's Configuration Guide and
specify global parameters for language, default values, and so on.
If you are using a central repository as part of a multi-user environment, you
should add all objects to it now. If you do not, you may end up with different
versions of the same (shared) object.
If you are not using a central repository, create a list of the shared dimensions
across all of your rapid marts. Note where they are used and which repository
owns each component. Make any subsequent modifications to these objects
in the owner repository only.
Summary
1. Ensure all necessary database privileges are granted.
2. Install the rapid mart.
3. Configure the rapid mart as described in the rapid mart's Configuration
Guide.
4. Customize the sample stored procedures to meet your needs.
Testing
Once you have installed, customized, and loaded the rapid mart, check that the
information it provides is accurate. One way to do this is to compare revenue
results according to the rapid mart with those in a standard SAP report. They
should be the same.
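For example, a comparison query on a hypothetical sales fact table (the table and column names are illustrative, not from a specific rapid mart schema) might look like:

```sql
-- Total revenue per fiscal year in the rapid mart, to be compared
-- line by line with the corresponding SAP summary report
SELECT fiscal_year, SUM(revenue) AS total_revenue
FROM sales_fact
GROUP BY fiscal_year
ORDER BY fiscal_year;
```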
The purpose of this testing phase is to become familiar with all the rapid mart
components, and to gain an understanding of the rapid mart data model, its
flows and transformations, and of the SAP reports themselves.
Some common reasons for differences are:
- Not all data was extracted.
- Rapid marts often load data using posting_date, whereas reports are based
on booking_date. Your document might be based on a later date.
- Conditions may be missing in a BusinessObjects report when compared to
the SAP report.
- There may be incorrect assumptions regarding terminology. What do you
mean by revenue, orders, delivery amounts, or billing amount? Do FI
account bookings include payment terms?
Usually the differences are self-evident, but sometimes they can be less easy to
spot. Therefore, be thorough and careful in your analysis.
Summary
Compare rapid mart summary reports with SAP reports.
Analyze carefully to explain any differences, checking dates and
conditions in particular.
Customization
Rapid marts are built to be customized and extended, but you need to follow a
few rules to simplify use later on. Every change has to be moved to production
at some point, so you need to manage these changes.
To customize a rapid mart effectively:
- Create guidelines such as naming conventions and organization. The entire
data warehouse should look and feel consistent, not as if it has been
developed by a number of different people.
- Remember that rapid marts consist not only of the ETL logic, but also the
create-table-scripts, documentation, and universes.
- Create a shared folder to store copies of all rapid mart installation scripts.
- Replicate the load job of the rapid mart, and remove all sections and the calls
of the AW_Start and AW_End functions in the init/end script.
- Use this simplified version of the rapid mart job to test single components.
Modifying a component
To modify a component:
1. Modify the objects in Data Integrator to get an idea of the changes involved:
the additional columns and their data types.
2. Update the rapid mart create-table-script.
3. Recreate the table in the database.
4. Update the target table object in Data Integrator.
5. Call the changed component in the test job and execute it.
6. Update the documentation.
7. If the modified object is a fact table and the production environment is
already loaded, define how this information will be filled in for the data
that is already loaded.
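Steps 2 to 4 might look like the following for a hypothetical dimension table (the table and column names are illustrative):

```sql
-- Recreate a customized dimension table with an additional column;
-- this must match the updated create-table-script and the target
-- table object in Data Integrator.
DROP TABLE customer_dim;
CREATE TABLE customer_dim (
  customer_id    NUMBER        NOT NULL,
  customer_name  VARCHAR2(80),
  sales_region   VARCHAR2(32)  -- newly added column
);
```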
TIP
If you do not have a central repository, copy the updated object to the other
development repositories that share this component.
Summary
1. Use a simplified version of the rapid mart job to test single components.
2. Make any changes according to your rapid mart naming conventions and
organizational guidelines.
3. Update the table scripts.
4. Update the documentation.
Performance tuning
If the data warehouse has been loaded and reports have been built, but
refreshing the reports with new data is taking longer than it should, you probably
need to optimize the database for those reports.
Indexing
The database gathers the data for a report faster when it has less to read.
One technique is to create an index on the key column so that only the indexed
row of data is read, not all records. This is most effective when you are
interested in only one row of data and you know the unique identifier of the row,
such as the customer-number in the customer table.
In typical BI reports, the data you are looking for is spread across several
columns; each column may not be very selective by itself, but the combination
narrows the results down to a much smaller number of records. For example,
there might be millions of records for a specific booking period and hundreds of
thousands for a given customer, but only a few that satisfy both criteria. Having
one index on date and one on customer would work for this one query, but you
would need an index for every possible combination of columns. A better
solution can be to use bitmap indexes, which the database can combine at
query time.
Create a bitmap index on a column when:
- The table is used frequently.
- The table row count is at least 100,000 records.
- The column is frequently used in where clauses.
- The ratio count(distinct column)/count(*) is 1% or less.
- The table doesn't already have a very selective index on a combination of
n columns.
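Under those criteria, the selectivity check and the index creation might look like this sketch (the table and column names are illustrative):

```sql
-- Check the selectivity ratio: 1% or less suggests a bitmap candidate
SELECT COUNT(DISTINCT customer_id) / COUNT(*) AS selectivity
FROM sales_fact;

-- One bitmap index per low-cardinality filter column; Oracle can
-- combine several bitmap indexes to answer multi-column where clauses
CREATE BITMAP INDEX bix_sales_customer ON sales_fact (customer_id);
CREATE BITMAP INDEX bix_sales_period   ON sales_fact (booking_period);
```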
Do not create too many bitmap indexes, as each one significantly reduces the
ETL performance.
For example, in a given table, two bitmap indexes might return 100 rows,
whereas with three the number falls to just 10. In such a case, combining three
indexes may take more time than simply filtering the 100 rows. The aim is to
retrieve about 10,000 rows with the indexes. The rest can be filtered.
Materialized views
Another technique for reducing the amount of data to read is pre-aggregation:
either physically stored or, more dynamically, through the use of what Oracle
calls materialized views. Often the queries of reports are not very selective, such
as when gathering the data for an entire year. If the sums for a year are pre-
calculated in a separate table, the database has less to read (just the
aggregation). The aim is not to have one aggregation for each kind of query, but
to reduce the amount of data to read to a useful amount and in a form that is
common to as many situations as possible.
In the above example the materialized view might be built on periods rather than
years, thus containing 12 times more data. However, reading 12 rows in
comparison to one row for a year takes almost the same effort. The aim should
be to reduce the number of records read to between 10,000 and 100,000 rows.
For example, the materialized view might contain 10 years of data for 5,000
customers for 12 periods, with an index on the year.
Also consider:
- To enable delta loads for the materialized views, all base tables used must
have a materialized view log assigned. This reduces ETL performance by
50%.
- Foreign key relationships must be created for all joins used inside a
materialized view.
- Each foreign key column should be declared NOT NULL.
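A sketch of such a pre-aggregation in Oracle, reusing the hypothetical sales_fact table from the indexing example (all object names are illustrative):

```sql
-- The log on the base table makes fast (delta) refresh possible
CREATE MATERIALIZED VIEW LOG ON sales_fact
  WITH ROWID (booking_period, customer_id, revenue)
  INCLUDING NEW VALUES;

-- Pre-aggregated revenue per period and customer; COUNT(*) and
-- COUNT(revenue) are required for fast refresh of an aggregate view
CREATE MATERIALIZED VIEW mv_sales_by_period
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
AS
SELECT booking_period, customer_id,
       SUM(revenue)   AS revenue,
       COUNT(revenue) AS revenue_count,
       COUNT(*)       AS row_count
FROM sales_fact
GROUP BY booking_period, customer_id;
```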
Stored procedures
All of these database objects (bitmap indexes, materialized views, and
materialized view logs) compromise ETL performance. The way an ETL job
deals with them during a load depends on the number of changes, and on
whether the load is an initial load or a delta load.
For an initial load, an ETL job proceeds as follows:
1. The stored procedure preprocessing_fact_table performs the following steps:
- Drops materialized view logs. (Materialized views themselves are refreshed
per request later so they do not need to be dropped and recreated.)
- Drops all bitmap indexes on the fact table.
- Retrieves a list of all materialized views based on the fact table and drops
their bitmap indexes as well.
2. The data flow loads the fact table.
3. The stored procedure postprocessing_fact_table performs the following
steps:
- Creates all bitmap indexes.
- Creates materialized view logs.
- Refreshes the materialized views.
- Retrieves a list of all materialized views based on that fact table and creates
their bitmap indexes as well.
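A minimal PL/SQL sketch of the preprocessing procedure described above (the procedure shipped with the rapid mart is more complete; the procedure name comes from the text, everything else is illustrative):

```sql
-- Drops the objects that would slow down a bulk load of a fact table.
CREATE OR REPLACE PROCEDURE preprocessing_fact_table (p_table IN VARCHAR2) AS
BEGIN
  -- Drop the materialized view log so the load is not tracked row by row
  EXECUTE IMMEDIATE 'DROP MATERIALIZED VIEW LOG ON ' || p_table;

  -- Drop every bitmap index on the fact table
  FOR idx IN (SELECT index_name
              FROM   user_indexes
              WHERE  table_name = UPPER(p_table)
              AND    index_type = 'BITMAP')
  LOOP
    EXECUTE IMMEDIATE 'DROP INDEX ' || idx.index_name;
  END LOOP;
END;
/
```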
Summary
Create indexes and pre-aggregated fact tables to match the specific
requirements of the reports.
Deployment
Once the development phase is finished:
1. Move all sections of all rapid marts into the integration system.
- If there is a central repository, use get latest.
- If there is no central repository, push the sections from each single
development repository.
2. Create a job called Load_All in the integration repository that supersedes all
the single rapid mart jobs.
This job must:
- contain all the global variables used in all of the rapid marts, with default
values assigned to the variables
- contain init and end scripts
- call all the sections
3. If you are using a central repository, add this job to it.
4. Create all the tables in the production database, together with the table
definitions.
Test that the production database tables and the repository table
definitions are synchronized:
- Open the target data warehouse datastore.
- Select the integration repository data.
- Reconcile all tables.
- Check that the list does not contain any tables marked as changed, as this
indicates that the table just created and the table definitions held in the
repository are not synchronized.
5. Test all the created objects.
Repeat this procedure when moving from integration to production.
Summary
1. Move all sections to the integration system.
2. Create the Load_All job to call all sections and add to the central repository
if applicable.
3. Create all target database tables.
4. Test everything.
5. Copy all objects to production.
Chapter 2
Summary
Overview
This chapter contains all the summaries from the previous chapter. Once you
are familiar with the deployment process, use it as a quick reference.
Architecture:
- Install all Data Integrator components (except Designer) on the data
warehouse machine.
- Set up two separate environments, development and production, on
separate machines.
- As soon as the data warehouse is in production, add an integration test
phase. Integration and development environments can share hardware.
Installing Data Integrator:
- Set up one server for production and one for test and integration.
- Set up three separate databases and define a user called dwh in each.
- In the development database, create a user for each subproject.
- In the integration database, create a user called rep_int, and in the
production database, create a user called rep_prod.
- Create all datastores and export them into all repositories.
Installing the rapid mart(s):
- Ensure all necessary database privileges are granted.
- Install the rapid mart and customize it following its documentation.
- Customize the sample stored procedures to meet your needs.
Testing: Compare rapid mart summary reports with SAP reports.
Customization:
- Use a simplified version of the rapid mart job to test single components.
- Make any changes following the rapid mart guidelines.
- Update the table scripts and the documentation.
Performance tuning: Create indexes and pre-aggregated fact tables to match
the specific requirements of the reports.
Deployment:
- Move all sections to the integration system.
- Create the Load_All job to call all sections.
- Create all target database tables and test everything.
- Copy all objects to production.