
Velocity for Data Integration

Module 06: Phase 3 Architect

© 2012 Informatica Inc. All rights reserved


Module Objectives
Phase 3 Architect Learning Objectives:

• Describe the Architecture phase
• Discuss key points in developing a PowerCenter Architecture
• List the installation and configuration recommendations
Phase 3: Architect

• Solution Architecture
  • Tools/Technology
  • Location of Informatica components
  • Production focused
• Development Architecture
  • Develop Dev/Test/Prod strategy
  • Testing/QA strategy
  • PowerCenter Folders/Security
• Implement Architecture
  • Install physical product architecture
Phase 3: Architect

• Develop solutions architecture


• Technical requirements are defined
• Project infrastructure is developed
• Development standards and strategies are
defined
• The conceptual architecture is defined and forms the basis for determining capacity requirements
• Configuration recommendations are made
• The Informatica software is installed
Develop Solution Architecture

• Understanding the Informatica data integration architecture and components is an important step in this phase
• Architectural consideration needs to be given to developing Development, Test, QA and Production systems
• The architectural solutions may be influenced by
several constraints such as budget, time and
regulatory requirements
V9.X Architecture Overview

[Figure: V9.X architecture overview. Labels recoverable from the diagram:
• Client tools: Informatica Analyst, Informatica Developer, Admin Console, Workflow Manager, Mapping Designer, Metadata Manager
• Services: Analyst Service, Model Repository Service (MRS), Data Integration Service (with DO Cache, ODBC/JDBC Driver, and Runtime), Integration Service (PowerCenter), Repository Service, Metadata Manager Service
• Data stores: Stage, MRS Repository, Profile Warehouse, Repository, Domain DB, MM Warehouse
• Other labels: Admin, ISP, “Classic”, “9.0+”]
Key Architecture Considerations
Item                    Impact
Licensing               Can limit the number of CPUs and repositories used in the architecture.
Database Management     DBA group may require Repository and Domain DBs on standard DB servers.
Hardware                Make/model of hardware available may limit the versions of Informatica, to ensure running on a supported ‘PAM’ configuration.
Data Volume             Best to have the integration server as close to the target as possible, especially with high data volume.
Network                 Network traffic/speeds may limit your architecture choices.
Storage                 Shared storage and memory requirements.
Virtualization          CPU, memory and IO configuration.
Cloud Infrastructure    Network, file storage, DB and performance requirements.
Develop The Technical Requirements

• The technical requirements should address, at least at a conceptual level, implementation specifications based on the findings to date (data rules, source analysis, strategic decisions, etc.), such as:
• Business rule derivations
• Logical source and target schema
• High level data quality rules
• Integration systems
• Security
Develop The Logical Model

• In a logical model a physical component can be represented by several logical components
• Other services, applications, etc. can also be represented in a logical model
[Figure: logical model. Labels recoverable from the diagram: Flat Files, Mainframe, Staging (Staging Server), ODS (ODS Server), Data Mart (DM Server), Reporting Service.]

Note: Physically these servers can reside in a single virtual environment.


Informatica Physical Architecture

• Q: What is the correct/reference architecture for Informatica Integration components?
• A: It depends!
• Informatica Integration Architecture
• Flexibility
• Location of Domain/Repository/Integration services and components
• Key is to design an architecture that is right for the project
• Need to review standards, licensing, or available infrastructure
• Architecture can be altered later as new information is presented
• Fits with rest of solution architecture (BI, Scheduling, WS etc.)
• The Product Availability Matrix (PAM) provides the current supported
environments
Project Type Considerations

• To summarize, a data integration project contains many physical components that impact the physical integration architecture, such as:
• CPU processing resources
• CPU memory resources
• Data storage resources (disk and file space)
• Networking resources
• Virtualization
• Cloud infrastructure

• The Velocity project types will have an impact on the size and configuration of the listed resources
Platform Sizing Recommendations

• These recommendations are PowerCenter specific and do not take into account non-PowerCenter processes running on a server, such as databases, web services and other additional services
• Processor
• 1 to 1.5 CPUs per concurrent non-partitioned session or transformation job.
• A virtual CPU (core) is considered as 0.75 of a CPU. For example, 4 CPUs with 4 cores each (16 virtual CPUs) could be considered equivalent to 12 CPUs.
Platform Sizing Recommendations
• Memory
• 20 to 30MB of memory for the Integration Service for session
coordination.
• 20 to 30MB of memory per session, if there are no aggregations,
lookups, or heterogeneous data joins. Note that 32-bit systems
have an operating system limitation of 2GB per session.
• Caches for aggregation, lookups or joins use additional memory:
• Lookup tables are cached in full; the memory consumed depends
on the size of the tables and selected data ports.
• Aggregate caches store the individual groups; more memory is
used if there are more groups. Sorting the input to aggregations
greatly reduces the need for memory.
• Joins cache the master table in a join; memory consumed
depends on the size of the master.
• Full Pushdown Optimization uses far fewer resources on the PowerCenter server than partial (source/target) pushdown optimization.
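
The memory guideline above can be sketched as a simple range estimate. This is an illustration only: it assumes the 20 to 30MB figures add linearly, and it takes the cache total as an input because lookup, aggregator and joiner cache sizes depend on the data. The function name and the sample workload are hypothetical.

```python
def baseline_memory_mb(concurrent_sessions: int, cache_mb: float = 0.0) -> tuple[float, float]:
    """Return a (low, high) memory estimate in MB.

    Uses 20-30 MB for the Integration Service plus 20-30 MB per session,
    then adds a caller-supplied estimate for lookup, aggregator, and
    joiner caches, which depend on the data and cannot be derived here.
    """
    low = 20 + 20 * concurrent_sessions + cache_mb
    high = 30 + 30 * concurrent_sessions + cache_mb
    return low, high


# Hypothetical workload: 10 concurrent sessions, 2048 MB of caches in total.
print(baseline_memory_mb(10, cache_mb=2048))  # (2268, 2378)
```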
General Installation Recommendations
• Make sure you have reviewed Informatica PAM for
system component compatibility
• Review the appropriate product installation guide(s)
• Do not forget licenses
• Use the “My Support” website for latest updates
• Support “Flash” program
• Support multimedia update presentation

• Additional storage space consideration and planning for growth
• Repositories
• Separate file system for infa_shared which includes logs, flat files,
cache and temp storage
Estimating Data Volume Requirements

• Database space requirements are generally based on the target environment such as data marts, operational data stores and staging tables
• Database space determining factors include:
• Number of target tables
• Row size, including column data type, precision, scale
• Codepage
• Initial data load and growth
• Database specific attributes such as indexing, percent free,
etc.

• Involve the project DBAs in this process


Estimating Data Volume Requirements

• The Velocity “Database Sizing Model” spreadsheet can assist in calculating total disk volume requirements
• Determine the upper bound of the precision of each
table row
• Estimate table growth on an incremental time period
such as initial and 12 month intervals
• Consider the archiving of data that does not have
immediate operational use
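
The estimation steps above can be sketched as follows. This is not the formula used by the Velocity spreadsheet, which is not reproduced here; it is a minimal illustration that assumes linear growth, a single index overhead factor, and a percent-free setting that inflates allocated space. The class, field names and sample table are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    name: str
    max_row_bytes: int           # upper bound of the row size
    initial_rows: int            # rows in the initial load
    rows_per_month: int          # expected incremental growth
    index_overhead: float = 0.0  # index space as a fraction of data
    pct_free: float = 0.0        # fraction of each block left free

    def size_bytes(self, months: int) -> float:
        """Estimated size after the given number of months."""
        rows = self.initial_rows + self.rows_per_month * months
        data = rows * self.max_row_bytes
        # Space left free in each block inflates the allocated size.
        allocated = data / (1.0 - self.pct_free)
        return allocated * (1.0 + self.index_overhead)


def total_gb(tables: list[TableEstimate], months: int) -> float:
    """Sum the estimates for all target tables, in GB."""
    return sum(t.size_bytes(months) for t in tables) / 1024**3


# Hypothetical target table, sized at the initial load and after 12 months.
tables = [TableEstimate("SALES_FACT", 200, 1_000_000, 100_000, 0.25, 0.10)]
for months in (0, 12):
    print(months, round(total_gb(tables, months), 2))
```

Codepage and other database-specific attributes are left out; the project DBAs are the right source for those factors.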
Database Sizing Model Spreadsheet

1. Enter the table characteristics
2. Enter the volume information
3. Review the calculated table size estimates

Note: Spreadsheet may be customized to add more tables and DB dependencies

Demo – Estimating Data Volume Using
the Database Sizing Model Spreadsheet
Informatica Single Server Option

• All major components housed on a single server


• Advantages are single server to manage and minimized network
traffic
• Disadvantages are shared resources and single point of failure
• Double licensing Informatica and DBMS

[Figure: Client Tools connect to a single server that hosts both the Informatica Domain (PowerCenter Repository Svc, PowerCenter Integration Svc) and the Database Server (Domain Repository, PowerCenter Repository).]
Separate DB Server and Informatica Server
• Repositories segregated from PowerCenter Server
• Dedicated box for Informatica Domain and Integration Service
• Dedicated box for DBMS and databases

[Figure: Client Tools connect to two servers. One hosts the Informatica Domain (PowerCenter Integration Svc, PowerCenter Repository Svc); the other is the Database Server (Domain Repository, PowerCenter Repository).]
Domain Environment Configuration Options
• Isolated development, test, and production environments
• Replicate hardware as much as possible
[Figure: three isolated domains, each serving Projects 1 & 2 with its own Integration Service, repository, repository database and repository server:
• Dev/Test Domain: Repository Database1 on Repository Server1
• QA Domain: Repository Database2 on Repository Server2
• Prod Domain: Repository Database3 on Repository Server3]
Domain Configuration Options
• Shared development and test. Isolated production environments

[Figure: two domains, each serving Projects 1 & 2:
• Dev/Test Domain: Int Service Dev and Int Service Test, with Repository Database1 on Repository Server1
• Prod Domain: Int Service Prod, with Repository Database3 on Repository Server3]
Architecture Example
Complex Architecture Example
[Figure: complex architecture example. Labels recoverable from the diagram:
• Sources: Source Transaction systems A, B, C and D; ESB Message
• PowerExchange: PowerExchange Oracle Real-time; PowerExchange Application Connector Options
• Informatica PowerCenter: PowerCenter Advanced Edition (8.1.1 → 8.5) with Real-time Option, Enterprise Grid Option, Profiling Option and Pushdown Optimization Option; PowerCenter Data Federation Options
• Environments: Production 1, Production 2, Lab environment; HP Integrity server (HP-UX, IA64)
• Repository servers (Oracle RAC)
• Targets: Real-time DWH (NeoView, Teradata, or Netezza); Target Systems
• EII Client for Developers
Note: All source and target DBs are Oracle (except Real-time DWH).]
Download the answer to this lab from the resources folder

Lab: Architecture Design


In this lab …
• You will be provided with a sample customer
environment for review to design a data integration
architecture
Continuing Environment Definition

• Determine Number of Environments


• Development (Dev)
• Testing (Test)
• Quality Assurance (QA)
• Production (Prod)

• Account for Multiple Teams


• Security considerations
• Connection Information
• Shared Folder Ownership

• Folder Structure
• Developer folders vs. Production folders
Folders – Best Practices
• Folder scope
• Main driver is uniqueness of target table name (e.g. Sales_DM_DEV)
• Source naming does not have to be unique

• Shared/Shortcut Folders
• Typically Sources/Targets
• Can be other shared objects across projects
• Avoid bidirectional shortcuts between folders, which inhibit folder object promotion
• Do NOT make all folders shared!

• Stand Alone Folders
• Typically organized by integration target
• More difficult to schedule across folders so try to keep folders consistent with projects/load plans

• Naming
• Alphabetic organization so consider that when developing names
• Avoid spaces due to command line issues if using Infacmd
• Use a non-alphabetic prefix character such as ~ for developer folders
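
The naming guidance above can be checked mechanically. The sketch below is an illustration, not an Informatica utility: the function names are hypothetical, and the sort shown is Python's default code point ordering, which is only assumed to resemble how the client tools list folders.

```python
def check_folder_name(name: str) -> list[str]:
    """Return a list of problems found in a folder name (empty if none)."""
    problems = []
    if not name:
        problems.append("empty name")
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    return problems


def is_developer_folder(name: str) -> bool:
    """Developer folders carry the ~ prefix."""
    return name.startswith("~")


folders = ["HR_DM", "~Zoe", "FIN_DM", "~Gene", "FIN SHARED"]
for folder in sorted(folders):
    print(folder, is_developer_folder(folder), check_folder_name(folder))
```

Under code point ordering the ~ prefix groups developer folders after the project folders, because ~ sorts after letters and digits.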
Folder Organization Example: Four Repositories

DEV          TEST         QA           PROD
FIN_DM       FIN_DM       FIN_DM       FIN_DM
HR_DM        HR_DM        HR_DM        HR_DM
FIN_SHARED   FIN_SHARED   FIN_SHARED   FIN_SHARED
HR_SHARED    HR_SHARED    HR_SHARED    HR_SHARED

Developer folders: ~Gene, ~Kerstin, ~Maret, ~Neil, ~Zoe

Four repositories can ensure tight control over the development lifecycle:
• Dev: Developers create and test all mappings
• Test: Testers unit test mappings and ensure referential integrity is maintained through workflows
• QA: Data quality testing ensures data integrity
• Production: Live environment

Note the shared repository between two development projects


Folder Security - Why?
• What happens to a folder owned by jsmith if jsmith leaves the company?
• Do you want to have jsmith in your production repository security?
• It is easy to give a user access through a group and then see who is in the group.
• Access can then be managed across folders. For example, if you are on the EDW team, create an EDW folder group and put all EDW developers in it
• If Administrator owns the folders, do you want to hand out Administrator access to the development team member in charge of managing folders?
Folder Security Best Practices

• Consider using Folder Owner Groups


• Example:
• Folder named : STG_SAP
• Group Named: STG_SAP_Folder
• User Named : Folder_Owner

• Put all developers who need access in the ‘STG_SAP_Folder’ group
• Folder_Owner is in the STG_SAP_Folder group
• Set up group security on the folder with read, write and execute (RWX) permissions
Security Summary

• Responsibility of the PowerCenter Administrator


• Maintains folder management

• All users assigned security level based on role


• Administrator, Developer, Operator

• Folder permissions based on group not individual privileges


• PowerCenter client setup should use Repository Manager
Export/Import Registry feature
• Server connection security (Database, FTP, Loader, etc.)
should be maintained at a group level
Velocity Architecture Recommendations
• Velocity best practices related to Architecture:
• Designing Data Integration Architectures
• Domain Configuration
• PowerCenter Enterprise Grid Option
• Master Data Management Architecture with Informatica
• Establishing a B2B Data Transformation Development
Architecture
• Disaster Recovery Planning with PowerCenter HA Option
Development Standards/Processes – BP’s

• Naming Conventions
• PowerCenter/DI
• B2B
• Data Quality
• Etc.
• Organizing and Maintaining Parameter Files
• Metadata Strategy
• Change Control Procedures
• Error Handling
Naming Conventions Sample
• Transformation and mapping naming standards
• Aggregator - agg_(description)
• Expression - exp_(description)
• External Procedures - ext_(description)
• Filters - fil_(description)
• Joiner - jnr_(description)
• Lookup - lkp_(table/description)
• Normalizer - nrm_(description)
• Rank - rnk_(description)
• Sequence - seq_(description)
• Source Qualifier - sq_(source)
• Stored Procedures - sp_(description)
• Mapping - m_(description)
Note: Reference the Velocity section on “Development Techniques”
for additional information
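
A convention like the one above is easier to keep when it is checked automatically. The following is a minimal sketch covering a subset of the prefixes listed; the dictionary and function names are hypothetical and are not part of PowerCenter.

```python
# Subset of the prefixes from the naming standard above.
PREFIXES = {
    "Aggregator": "agg_",
    "Expression": "exp_",
    "Filter": "fil_",
    "Joiner": "jnr_",
    "Lookup": "lkp_",
    "Rank": "rnk_",
    "Sequence": "seq_",
    "Mapping": "m_",
}


def follows_convention(object_type: str, name: str) -> bool:
    """True if the name starts with the expected prefix and has a description."""
    prefix = PREFIXES[object_type]
    return name.startswith(prefix) and len(name) > len(prefix)


print(follows_convention("Lookup", "lkp_customer"))     # True
print(follows_convention("Lookup", "customer_lookup"))  # False
print(follows_convention("Mapping", "m_"))              # False
```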
Data Integration Architecture Best Practices

• Hardware
• Keep in mind the sharing of hardware – DI vs. RDBMS
• Licensing (CPU based)
• Separate OS User for Informatica processes
• Best Practice Documents
• Advanced PowerCenter Server Configuration Options
• Advanced Client Configuration Options
• Domain Configuration
• Document the Installation
TIP: Focus on reuse and take advantage of metadata reporting
Change Control Procedure

• Once the project plan, system architecture, project policies and procedures are in place, change is always a consideration in any project
• Types of changes
• Possible changes to the logical and physical data models,
extract programs, business rules, or deployment plans can
occur in the course of a project
• Normal changes such as source, targets, sessions, reports,
deployment, etc.

• The change control process is used as a formal procedure to request and document changes
CHANGE REQUEST FORM

• The sample Velocity template “Change Request Form” creates a formal procedure to request and document normal operational project changes
• The form can be customized for different project-level changes
Phase 3 : Architect – Velocity Documents
• Database Sizing Model.xls
• Change Request Form.doc
• BP – Designing Data Integration Architectures
• BP – Database Sizing
• BP – Data Services Architecture
• BP – Domain Configuration
• BP – Platform Sizing
• BP – PowerCenter Enterprise Grid Option
• BP – Configuring Security
• BP – Naming Conventions
• BP – Naming Conventions - Data Quality
• BP – Organizing and Maintaining Parameter Files & Variables
Module Summary

Phase 3 Architect module, in review:

• Discuss Velocity Phase 3 Architect
• Create an Informatica PowerCenter Architecture Document
• Calculate database volume estimations
• Define Development Standards including Folder
Structures
• Discuss the need for change control procedures
