
Rapid Mart Deployment Guidelines

BusinessObjects Analytics

Windows and UNIX



Copyright No part of the computer software or this document may be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage and retrieval system, without permission in writing from Business
Objects S.A.
The information in this document is subject to change without notice. If you find any problems
with this documentation, please report them to Business Objects S.A. in writing at
documentation@businessobjects.com.
Business Objects S.A. does not warrant that this document is error free.
Copyright Business Objects S.A. 2004. All rights reserved.
Printed in France.

Trademarks The Business Objects logo, WebIntelligence, BusinessQuery, the Business Objects tagline,
BusinessObjects, BusinessObjects Broadcast Agent, Rapid Mart, Set Analyzer, Personal
Trainer, and Rapid Deployment Template are trademarks or registered trademarks of Business
Objects S.A. in the United States and/or other countries.
Contains IBM Runtime Environment for AIX(R), Java(TM) 2 Technology Edition Runtime
Modules (c) Copyright IBM Corporation 1999, 2000. All Rights Reserved.
Contains ICU libraries (c) 1995-2003 International Business Machines Corporation and others.
All rights reserved.
All other company, product, or brand names mentioned herein, may be the trademarks of their
respective owners.

Use restrictions This software and documentation is commercial computer software under Federal Acquisition
regulations, and is provided only under the Restricted Rights of the Federal Acquisition
Regulations applicable to commercial computer software provided at private expense. The use,
duplication, or disclosure by the U.S. Government is subject to restrictions set forth in
subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.227-
7013.

Patents U.S. Patent Numbers 5,555,403, 6,247,008, and 6,578,027.

Part Number 3D1-50-650-01



Contents
Chapter 1 Deployment Guidelines
Architecture
Installing Data Integrator
Installing the rapid mart(s)
Testing
Customization
Performance tuning
Deployment
Addendum: Updating software versions

Chapter 2 Summary

Deployment Guidelines


Overview
These guidelines have two main purposes:
to help you get started in your rapid mart deployment
to propose typical solutions for some of the issues you may encounter
The guidelines provide only a brief overview of the deployment process. They
walk you through deployment step by step, beginning with architectural
concepts and going on to installation according to your chosen architecture,
configuration, and customization.
For more information, refer to the following documentation:
Data Integrator Getting Started Guide
Describes how to install Data Integrator.
Data Integrator Reference Guide
Describes Data Integrator's features.
Rapid Mart Deployment Guide
Explains how a rapid mart is built, and why it is built as it is.
Rapid Mart documentation (one for each rapid mart)
Describes how to use a specific rapid mart.
Data Integrator Performance Tuning Guide
Describes how to optimize flows.
Your database documentation
Describes how to use, configure, and tune a database.
Ralph Kimball and Team: Tips and Tricks for data modeling


Architecture
Data Integrator has a very flexible architecture. You can install components such
as the repository, jobserver, Designer, Web Admin and data warehouse in
different locations. You can also install more than one copy of each component.

Deciding where to install each component


Business Objects recommends you install all rapid mart components on the data
warehouse database server, with the exception of the Designer development
tool.
In such a configuration, no network activity is involved when the jobserver is
writing to the database, and although it may seem that having the components
share hardware resources is a problem, in practice this is not the case.
The database is used for different activities at different times of the day.
The repository and jobserver are not normally used during office hours, and
the rest of the time the database and Data Integrator components make very
different demands on hardware resources.
Resource issues are avoided because components are used in turn.
During a delta load, for example, objects are read, optimized, and processed
in sequence, not simultaneously.
The jobserver and database run concurrently, but the database uses system
I/O almost exclusively when loading data, while the jobserver uses only CPU.
CPUs process data many times faster than disks can write it, so the jobserver
will frequently wait for the records to be written to disk by the database.
As a result, the jobserver and the database use different resources, even
when sharing the same hardware.
All these points are true for both development and production environments.
Even though the repository is used far more during development, as designers
interact constantly with the repository, the additional load is marginal.

Determining how many phases to use to move from development to production
You could go straight from development to production, but for the reasons
below it is safer to add an integration test system.
The separation of development from production gives developers an opportunity
to make changes themselves without compromising the production system at all.
However, developers are unlikely to test everything. So to further reduce the risk


of failure, Business Objects recommends that you assign someone to run a test
across the entire system, including looking at the side effects of changes on other
elements.
To enable such tests, build a third system, the integration test system. This will
increase the workload required when moving any change to production, because
the system must pass more tests before the move, but this process will ensure a
higher quality of deployment.
In some cases, it can be beneficial to use a dedicated machine for the integration
test (when upgrading software, for instance), but in most cases the integration
test system can be installed on the development computer (using different user
accounts and file paths for data).
The diagram below shows an overview of the architecture of a recommended
transition from development to production.

[Diagram: a test and development server runs database instance one (developer
repositories, central repository, development data warehouse) and database
instance two (integration repository, integration data warehouse); the
production server runs database instance three (production data warehouse).]


Summary
1. Install all Data Integrator components with the exception of Designer (that is,
repositories, jobserver and WebAdmin) on the data warehouse machine.
2. Set up two separate environments, development and production, on
separate physical machines.
3. As soon as the data warehouse is in production, add an integration test
phase. Integration and development environments can share hardware.


Installing Data Integrator


Before you install Data Integrator:
Install and configure your database middleware.
With Oracle it is useful to use one database instance for both the data
warehouse data and the repositories.
Identify a database.
Configure the Oracle network layer so that the same database alias
(TNSNAME) can be used from all systems and designer workstations as well
as from jobserver machines.
Create and configure user accounts.
Create database users that hold the repository tables and the data
warehouse tables. Working with Data Integrator is easier if the data
warehouse owners of all three phases have the same name, such as dwh. To
do so, create a separate database instance for the integration test system. If
you prefer to use the same database instance, call the owners something like
dwh_dev, dwh_int, and dwh_prod (for the production system).
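The user setup described above might be scripted roughly as follows. This is a sketch: the passwords and tablespace are placeholders, and only the dwh_dev / dwh_int / dwh_prod naming convention comes from the text.

```sql
-- Illustrative only: data warehouse owners for the three phases sharing
-- one database instance. Replace passwords and tablespaces with your own.
CREATE USER dwh_dev  IDENTIFIED BY change_me DEFAULT TABLESPACE users;
CREATE USER dwh_int  IDENTIFIED BY change_me DEFAULT TABLESPACE users;
CREATE USER dwh_prod IDENTIFIED BY change_me DEFAULT TABLESPACE users;

-- Owners need to create and drop objects in their own schema.
GRANT CONNECT, RESOURCE TO dwh_dev, dwh_int, dwh_prod;
```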

Repository considerations
Developers need a local repository in order to work independently. Create one
(and only one) local repository for the integration environment, and another one
for the production system. Give the repository owners names that reflect
the environment, such as rep_int and rep_prod.
How you name and configure these local repositories depends largely on your
answers to the following questions:
How many people will develop simultaneously using Data Integrator?
Will those people all work on the same tables?
Can the project be divided into multiple areas?
The answers to these questions normally lead to one of two conclusions:
The project is easily divisible into areas like cost center accounting, sales, and
purchasing. For each area there is just one lead, with others who act as
backups.
In this case, name the repositories in relation to their subject area, such as
rep_sales and rep_cost. Development relates more closely to the subject
area than to the specific individual developers, who will not be working on the
same area simultaneously.
A large team is working on the same parts of a data warehouse. For example,
one developer specializes in retrieving sales data from SAP and another from
JDE, both loading the same target table.


Some project organization needs to be defined, potentially using the multi-user
options within Data Integrator. Clearly, no IT system can help if two
people create the same object but name it differently, such as dim_customer
and dim_customers. However, you should name the developers' repositories
according to their user names (and perhaps synchronize them with the NT
domain accounts), and create a shared central repository named according
to the project area.
At the start of a project, you should decide which of these two options better
suits your deployment needs. The second one is more flexible, but you should
note the following constraints:
You must have the multi-user license for Data Integrator. This is included only
in the enterprise version.
The multi-user environment requires the repositories to keep and maintain all
the old versions of objects. This means that the repositories can grow quickly,
and therefore tend to be slower and more complex to navigate.
Nonetheless, the multi-user environment does have advantages:
You can control different versions of objects, enabling you to build a new
release of an object and modify the current production version (branching).
The person responsible for testing can download specific versions of objects
without having to rely on developers to upload them.
All objects can be maintained by all developers; there is no need to assign an
object to one development repository only.
After installing Data Integrator and all the repositories, create all the datastore
objects in one of the development repositories and copy (export) them to all other
repositoriesincluding production. Change the names carefully; do not indicate
Dev, Int, or Prod in the datastore names; all datastores will be named the same
but will point to different servers/databases. Try to predict the mid- to long-term
future of the production environment as this may impact your decisions: for
example, are there two production servers, such as one for Human Resources
applications and another for all the other applications?


Summary
1. Set up one server for production and one for test and integration.
2. Set up three separate databases and define a user called dwh in each.
3. In the development database, create a user for each subproject.
4. In the integration database, create a user called rep_int, and in the production
database, create a user called rep_prod.
5. Create all datastores and export them into all repositories.


Installing the rapid mart(s)


Rapid marts are pre-built data warehouse solutions that are used out of the box
by many customers, but can also be modified and extended. A broad range of
knowledge and expertise is needed for the first installation, which can be
complicated. However, the knowledge gained in the process can render
subsequent projects much easier to manage.

Database kernel parameters


Rapid marts make use of Oracle bitmap indexes, materialized views, and
table partitioning, and your database administrator must ensure that the
database is configured to support them.
To do so, set the following parameters:
QUERY_REWRITE_ENABLED=TRUE
QUERY_REWRITE_INTEGRITY=STALE_TOLERATED
OPTIMIZER_MODE=FIRST_ROWS
OPTIMIZER_INDEX_COST_ADJ=10
OPTIMIZER_INDEX_CACHING=50
DB_FILE_MULTIBLOCK_READ_COUNT=32
The first two are needed to use Oracle materialized views; the others are
standard for data warehouse environments.
These parameters are in Oracle 9i syntax and are set in one of two ways,
depending on your setup:
using the following statement: alter system set .. scope=spfile
in the init.ora file
Restart the database after making these changes.
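On an spfile-based Oracle 9i instance, the parameters above can be set as follows. This is a sketch; verify each value against your own workload before applying it.

```sql
-- Set the recommended data warehouse parameters persistently (spfile).
-- On an init.ora-based instance, add the same name=value pairs to the
-- file instead. Restart the database afterwards.
ALTER SYSTEM SET query_rewrite_enabled         = TRUE            SCOPE = SPFILE;
ALTER SYSTEM SET query_rewrite_integrity       = STALE_TOLERATED SCOPE = SPFILE;
ALTER SYSTEM SET optimizer_mode                = FIRST_ROWS      SCOPE = SPFILE;
ALTER SYSTEM SET optimizer_index_cost_adj      = 10              SCOPE = SPFILE;
ALTER SYSTEM SET optimizer_index_caching       = 50              SCOPE = SPFILE;
ALTER SYSTEM SET db_file_multiblock_read_count = 32              SCOPE = SPFILE;
```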

User permissions
Owners should have connect and resource role permissions so that they can
create and drop database objects for their own schema. In addition, you should
grant the dwh users the following permissions, to enable them to create and
refresh Oracle materialized views:
CREATE MATERIALIZED VIEW
CREATE ANY TABLE
COMMENT ANY TABLE
Use the following statement:
grant <permission> to <user>
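For example, granting the three privileges listed above to a dwh owner (run as a DBA user; substitute your own user name):

```sql
-- Privileges needed by the dwh owner to create and refresh
-- Oracle materialized views.
GRANT CREATE MATERIALIZED VIEW TO dwh;
GRANT CREATE ANY TABLE         TO dwh;
GRANT COMMENT ANY TABLE        TO dwh;
```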

Installation
Once you have made sure the database is correctly configured and the users
have the appropriate rights, you can install the rapid mart jobs and the rapid
mart tables. This involves:
running the installer
specifying the repository
selecting the datastores
providing the information for the target database so that the tables can be
created
As part of this installation, two sample stored procedures are installed. Because
the stored procedures are dependent on database versions and configuration,
database administrators must review and configure them as appropriate before
executing them.
One stored procedure, called before each load of a fact table, drops all indexes
and materialized view logs for a more efficient load. The second is called after the
load to recreate the indexes and logs.
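The pre-load procedure might look roughly like the sketch below. The procedure name comes from the text, but the body is an assumption for illustration; the shipped samples must still be reviewed and adapted by a DBA.

```sql
-- Sketch in the spirit of preprocessing_fact_table: drop the
-- materialized view log and bitmap indexes on a fact table before an
-- initial load. Illustrative only; not the shipped sample.
CREATE OR REPLACE PROCEDURE preprocessing_fact_table (p_table IN VARCHAR2) AS
BEGIN
  -- Drop the materialized view log on the fact table, if one exists.
  BEGIN
    EXECUTE IMMEDIATE 'DROP MATERIALIZED VIEW LOG ON ' || p_table;
  EXCEPTION
    WHEN OTHERS THEN NULL;  -- no log present
  END;

  -- Drop every bitmap index defined on the fact table.
  FOR ix IN (SELECT index_name
             FROM   user_indexes
             WHERE  table_name = UPPER(p_table)
             AND    index_type = 'BITMAP') LOOP
    EXECUTE IMMEDIATE 'DROP INDEX ' || ix.index_name;
  END LOOP;
END;
/
```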

Post-installation
To complete your installation, read the rapid mart's Configuration Guide and
specify global parameters for language, default values, and so on.
If you are using a central repository as part of a multi-user environment, you
should add all objects to it now. If you do not, you may end up with different
versions of the same (shared) object.
If you are not using a central repository, create a list of the shared dimensions
across all of your rapid marts. Note where they are used and which repository
owns each component. Make any subsequent modifications to these objects
in the owner repository only.

Summary
1. Ensure all necessary database privileges are granted.
2. Install the rapid mart.
3. Configure the rapid mart as described in the rapid mart's Configuration
Guide.
4. Customize the sample stored procedures to meet your needs.


Testing
Once you have installed, customized, and loaded the rapid mart, check that the
information it provides is accurate. One way to do this is to compare revenue
results according to the rapid mart with those in a standard SAP report. They
should be the same.
The purpose of this testing phase is to become familiar with all the rapid mart
components, and to gain an understanding of the rapid mart data model, the
flows and transformations, as well as understanding the SAP reports.
Some common reasons for differences are:
Not all data was extracted.
Rapid marts often load data using posting_date, whereas reports are based
on booking_date. Your document might be based on a later date.
Conditions may be missing in a BusinessObjects report when compared to
the SAP report.
There may be incorrect assumptions regarding terminology. What do you
mean by: Revenue? Orders? Delivery amounts? Billing amount? Do FI
account bookings include payment terms?
Usually the differences are self-evident, but sometimes they can be less easy to
spot. Therefore, be thorough and careful in your analysis.
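A reconciliation query of this kind can make the comparison concrete; the table and column names below are hypothetical examples, not rapid mart objects.

```sql
-- Hypothetical check: total revenue per fiscal period in the rapid mart,
-- to be compared line by line with the corresponding SAP report.
SELECT   fiscal_year, fiscal_period, SUM(revenue) AS total_revenue
FROM     sales_fact
GROUP BY fiscal_year, fiscal_period
ORDER BY fiscal_year, fiscal_period;
```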

Summary
Compare rapid mart summary reports with SAP reports.
Analyze carefully to explain any differences, checking dates and
conditions in particular.


Customization
Rapid marts are built to be customized and extended, but you need to follow a
few rules to simplify use later on. Every change has to be moved to production
at some point, so you need to manage these changes.
To customize a rapid mart effectively:
Create guidelines such as naming conventions and organization. The entire
data warehouse should look and feel consistent, not as if it has been
developed by a number of different people.
Remember that rapid marts consist not only of the ETL logic, but also the
create-table-scripts, documentation, and universes.
Create a shared folder to store copies of all rapid mart installation scripts.
Replicate the load job of the rapid mart, and remove all sections and the calls
to the AW_Start and AW_End functions in the init/end scripts.
Use this simplified version of the rapid mart job to test single components.

Modifying a component
To modify a component:
1. Modify the objects in Data Integrator to get an idea of the changes involved:
the additional columns and their data types.
2. Update the rapid mart create-table-script.
3. Recreate the table in the database.
4. Update the target table object in Data Integrator.
5. Call the changed component in the test job and execute it.
6. Update the documentation.
7. If the modified object is a fact table and the production environment is loaded
already, define how this information will be completed for the data already
loaded.

TIP
If you do not have a central repository, copy the updated object to the other
development repositories that share this component.


Summary
1. Use a simplified version of the rapid mart job to test single components.
2. Make any changes according to your rapid mart naming conventions and
organizational guidelines.
3. Update the table scripts.
4. Update the documentation.


Performance tuning
If the data warehouse has been loaded and reports have been built, but
refreshing the reports with new data is taking longer than it should, you probably
need to optimize the database for those reports.

Indexing
The database gathers the data for a report faster when it has less to read.
One technique is to create an index on the key column so that only the indexed
row of data is read, not all records. This is most effective when you are
interested in only one row of data and you know the unique identifier of the row,
such as the customer-number in the customer table.
In typical BI reports, the data you're looking for is in several columns; each
column may not be very selective by itself, but the combination will narrow
results down to a much smaller number of records. For example, there might
be millions of records for a specific booking period, hundreds of thousands for a
given customer, but only a few that satisfy both criteria. Having one index on date
and one on customer would work for this one query, but an index for all possible
combinations of columns would be needed. A better solution can be to use a
bitmap index.
Create a bitmap index on a column when:
The table is used frequently.
The table row count is at least 100,000 records.
The column is frequently used in where clauses.
The ratio count(distinct column)/count(*) is 1% or less.
The table doesn't already have a very selective index on a combination of
n columns.
Do not create too many bitmap indexes, as each one significantly reduces the
ETL performance.
For example, in a given table, two bitmap indexes might return 100 rows,
whereas with three the number falls to just 10. In such a case, combining three
indexes may take more time than simply filtering the 100 rows. The aim is to
retrieve about 10,000 rows with the indexes. The rest can be filtered.
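The selectivity test and index creation can be sketched as follows; the table and column names are examples only.

```sql
-- Check the selectivity ratio for a candidate column: 1% or less
-- suggests a bitmap index (names here are examples).
SELECT COUNT(DISTINCT customer_id) / COUNT(*) AS selectivity
FROM   sales_fact;

-- If the ratio qualifies, create the bitmap index.
CREATE BITMAP INDEX bix_sales_customer ON sales_fact (customer_id);
```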

Materialized views
Another technique for reducing the amount of data to read is pre-aggregation:
either physically stored or, more dynamically, through the use of what Oracle
calls materialized views. Often the queries of reports are not very selective, such
as when gathering the data for an entire year. If the sums for a year are
precalculated in a separate table, the database has less to read (just the
aggregation). The aim is not to have one aggregation for each kind of query, but
to reduce the amount of data to read to a useful amount and in a form that is
common to as many situations as possible.
In the above example the materialized view might be built on periods rather than
years, thus containing 12 times more data. However, reading 12 rows in
comparison to one row for a year takes almost the same effort. The aim should
be to reduce the number of records read to between 10,000 and 100,000 rows.
For example, the materialized view might contain 10 years of data for 5,000
customers for 12 periods, with an index on the year.
Also consider:
To enable delta loads for the materialized views, all base tables used must
have a materialized view log assigned. This reduces ETL performance by
50%.
Foreign key relationships must be created for all joins used inside a
materialized view.
Each foreign key column should be of the type not null.
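Putting these requirements together, a period-level aggregate might be defined as below. This is a sketch with hypothetical table and column names; fast refresh has additional Oracle-version-specific restrictions to verify.

```sql
-- Materialized view log on the base table, required for delta (fast)
-- refresh of dependent materialized views.
CREATE MATERIALIZED VIEW LOG ON sales_fact
  WITH ROWID (fiscal_year, fiscal_period, customer_id, revenue)
  INCLUDING NEW VALUES;

-- Pre-aggregated sums by period and customer. ENABLE QUERY REWRITE lets
-- the optimizer answer matching report queries from the aggregate; the
-- COUNT columns are required for fast refresh of the SUM.
CREATE MATERIALIZED VIEW mv_sales_by_period
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
  SELECT   fiscal_year, fiscal_period, customer_id,
           SUM(revenue)   AS revenue,
           COUNT(revenue) AS revenue_cnt,
           COUNT(*)       AS row_cnt
  FROM     sales_fact
  GROUP BY fiscal_year, fiscal_period, customer_id;
```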

Stored procedures
All of these database objects (bitmap indexes, materialized views, and materialized
view logs) compromise ETL performance. The way an ETL job deals with a load
depends on the number of changes, and whether the load is an initial load or a
delta load.
For an initial load, an ETL job proceeds as follows:
1. The stored procedure preprocessing_fact_table performs the following steps:
- Drops materialized view logs. (Materialized views themselves are refreshed
per request later so they do not need to be dropped and recreated.)
- Drops all bitmap indexes on the fact table.
- Retrieves a list of all materialized views based on the fact table and drops
their bitmap indexes as well.
2. The data flow loads the fact table.
3. The stored procedure postprocessing_fact_table performs the following
steps:
- Creates all bitmap indexes.
- Creates materialized view logs.
- Refreshes the materialized views.
- Retrieves a list of all materialized views based on that fact table and creates
their bitmap indexes.


For a delta load the logic is similar but:
The materialized view logs are not dropped.
The bitmap indexes can optionally be dropped and recreated.
The refresh is performed as delta refresh of the materialized view.

Summary
Create indexes and pre-aggregated fact tables to match the specific
requirements of the reports.


Deployment
Once the development phase is finished:
1. Move all sections of all rapid marts into the integration system.
- if there is a central repository, use get latest
- if there is no central repository, push from each single development
repository
2. Create a job called Load_All in the integration repository that supersedes all
the single rapid mart jobs.
This job must:
- contain all the global variables used in all of the rapid marts, with default
values assigned to the variables
- contain init and end scripts
- call all the sections
3. If you are using a central repository, add this job to it.
4. Create all the tables in the production database, together with the table
definitions.
Test that the production database tables and the repository table
definitions are synchronized:
- Open the target data warehouse datastore.
- Select the integration repository data.
- Reconcile all tables.
- Check that the list does not contain any tables marked as changed, as this
indicates that the table just created and the table definitions held in the
repository are not synchronized.
5. Test all the created objects, then move to production.
Repeat this procedure for moving to production.

Summary
1. Move all sections to the integration system.
2. Create the Load_All job to call all sections and add to the central repository
if applicable.
3. Create all target database tables.
4. Test everything.
5. Copy all objects to production.


Addendum: Updating software versions


Upgrading to a newer version of the Data Integrator software can be tricky.
There is always the possibility that installation may not be successful the first
time around, causing the repository to become corrupted and jobs to no
longer work. An upgrade to a newer version of the Data Integrator software
cannot be reversed. Therefore, you must plan carefully and make backups of
your software and data so that you can recover the previous version if
necessary.
Business Objects recommends setting a limit of three days for the software
upgrade. During this time there should be no further development. All developers
and testers should dedicate their time to testing their components with the new
version. If more time is needed for the upgrade, you may require separate,
dedicated hardware for the tests. In such circumstances, an integration
environment on its own server is invaluable.
The update process is as follows:
1. Make backups of the integration system.
2. Export the repository of the integration system to a file.
3. Install the new software version.
4. Run an initial load of the integration system.
5. Run a delta load.
Once the update of the integration system is successful, migrate the
development environment as well, followed by the production system.

Summary


Overview
This chapter collects the summaries from the previous chapter. Once you are
familiar with the deployment process, use this page as a quick reference.
Architecture:
- Install all Data Integrator components (except Designer) on the data
warehouse machine.
- Set up two separate environments, development and production, on
separate machines.
- As soon as the data warehouse is in production, add an integration test
phase. Integration and development environments can share hardware.
Installing Data Integrator:
- Set up one server for production and one for test and integration.
- Set up three separate databases and define a user called dwh in each.
- In the development database, create a user for each subproject.
- In the integration database, create a user called rep_int, and in the
production database, create a user called rep_prod.
- Create all datastores and export them into all repositories.
Installing the rapid mart(s):
- Ensure all necessary database privileges are granted.
- Install the rapid mart and customize it following its documentation.
- Customize the sample stored procedures to meet your needs.
Testing: Compare rapid mart summary reports with SAP reports.
Customization:
- Use a simplified version of the rapid mart job to test single components.
- Make any changes following the rapid mart guidelines.
- Update the table scripts and the documentation.
Performance tuning: Create indexes and pre-aggregated fact tables to match
the specific requirements of the reports.
Deployment:
- Move all sections to the integration system.
- Create the Load_All job to call all sections.
- Create all target database tables and test everything.
- Copy all objects to production.

