DWH Training Material
Version 1.0
REVISION HISTORY
The following table reflects all changes to this document.

Date          Version   Author / Contributor   Description
01-Nov-2004   1.0
14-Sep-2010   1.1                              Updated Document
Table of Contents
1 Introduction
    1.1 Purpose
2 ORACLE
    2.1 Definitions
        Normalization
        Syntaxes
        Oracle Joins (Equi Join, Non-Equi Join, Self Join, Natural Join, Cross Join, Outer Join)
        View
        Materialized View
        Inline View
        Indexes
        Explain Plan
        Stored Procedure
        Packages
        Triggers
    2.2 Important Queries
3 DWH CONCEPTS
    What is BI?
4 ETL-INFORMATICA
    4.1 Informatica Overview
    4.5 UTP Template
5 UNIX
1 Introduction
1.1 Purpose
The purpose of this document is to provide detailed information about DWH concepts and Informatica, based on real-time training.
2 ORACLE
2.1 DEFINITIONS
Organizations can store data on various media and in different formats, such as a hard-copy document in a filing cabinet, or data stored in electronic spreadsheets or in databases.
A database is an organized collection of information.
To manage databases, you need database management systems (DBMS). A DBMS is a program that stores, retrieves, and modifies data in the database on request. There are four main types of databases: hierarchical, network, relational, and, more recently, object-relational (ORDBMS).
NORMALIZATION:
Some Oracle databases were modeled according to the rules of normalization, which are intended to eliminate redundancy.
- The table does not have a composite primary key, meaning that the primary key cannot be subdivided into separate logical entities.
- All the non-key columns are functionally dependent on the entire primary key.
- A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key.
Data Manipulation Language (DML)
Insert
Update
Delete
Data Querying Language (DQL)
Select
Data Control Language (DCL)
Grant
Revoke
Transactional Control Language (TCL)
Commit
Rollback
Savepoint
Syntaxes:
CREATE OR REPLACE SYNONYM HZ_PARTIES FOR SCOTT.HZ_PARTIES;
CREATE DATABASE LINK CAASEDW CONNECT TO ITO_ASA IDENTIFIED BY exact123 USING 'CAASEDW';
Materialized View syntax:
CREATE MATERIALIZED VIEW
EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW
REFRESH COMPLETE
START WITH sysdate
NEXT TRUNC(SYSDATE+1)+ 4/24
WITH PRIMARY KEY
AS
select * from HWMD_MTH_ALL_METRICS_CURR_VW;
Another Method to refresh:
DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');
Case Statement:
Select NAME,
(CASE
WHEN (CLASS_CODE = 'Subscription')
THEN ATTRIBUTE_CATEGORY
ELSE TASK_TYPE
END) TASK_TYPE,
CURRENCY_CODE
From EMP
Decode()
Select empname, Decode(address, 'HYD', 'Hyderabad', 'BANG', 'Bangalore', address) as address from emp;
Procedure:
CREATE
OR
cust_id_IN
REPLACE
In
amount_IN
In
PROCEDURE
Update_bal (
NUMBER,
NUMBER DEFAULT 1) AS
BEGIN
Update account_tbl Set amount= amount_IN where cust_id= cust_id_IN
End
Trigger:
CREATE OR REPLACE TRIGGER EMP_AUR
AFTER UPDATE ON EMP            -- can also be defined as BEFORE UPDATE
REFERENCING NEW AS NEW OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
    IF (:NEW.last_upd_tmst <> :OLD.last_upd_tmst) THEN
        -- Insert a record into the control table
        INSERT INTO emp_w VALUES ('wrk', SYSDATE);
    ELSE
        -- Call the procedure
        update_sysdate;
    END IF;
END;
ORACLE JOINS:
Equi join
Non-equi join
Self join
Natural join
Cross join
Outer join
Left outer
Right outer
Full outer
Equi Join/Inner Join:
select empno, ename, job, dname, loc
from emp join dept using (deptno);
ON CLAUSE
SQL> select empno, ename, job, dname, loc
     from emp e join dept d
     on (e.deptno = d.deptno);
Non-Equi Join
A join which contains an operator other than = in the join condition.
Ex:
where e.deptno > d.deptno;
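A minimal sketch of a complete non-equi join, assuming the classic SALGRADE sample table (LOSAL/HISAL columns) alongside EMP:

select e.ename, e.sal, s.grade
from emp e, salgrade s
where e.sal between s.losal and s.hisal;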
Self Join
Joining a table to itself is called a self join.
Ex1:
select e1.ename, e2.ename from emp e1, emp e2
where e1.mgr = e2.empno;
Ex2:
SELECT worker.employee_id, manager.last_name AS manager_name
FROM employees worker, employees manager
WHERE worker.manager_id = manager.employee_id;
Natural Join
Natural join compares all the common columns.
Ex:
SQL> select empno, ename, job, dname, loc from emp natural join dept;

Outer Join
Outer join gives the non-matching records along with matching records.

Left Outer Join
This will display all matching records, plus the records of the left-hand side table that have no match in the right-hand side table.
Ex:
SQL> select empno, ename, job, dname, loc
     from emp e
     left outer join dept d
     on (e.deptno = d.deptno);
Or
SQL> select empno, ename, job, dname, loc from emp e, dept d where
e.deptno = d.deptno(+);
Right Outer Join
This will display all matching records, plus the records of the right-hand side table that have no match in the left-hand side table.
Ex:
SQL> select empno, ename, job, dname, loc
     from emp e
     right outer join dept d
     on (e.deptno = d.deptno);
Or
SQL> select empno, ename, job, dname, loc from emp e, dept d where
e.deptno(+) = d.deptno;
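The join list above also mentions a full outer join; a minimal sketch of the ANSI syntax, which returns unmatched rows from both sides:

SQL> select empno, ename, job, dname, loc
     from emp e
     full outer join dept d
     on (e.deptno = d.deptno);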
View:
Why Use Views?
To restrict data access
To make complex queries easy
To provide data independence
A simple view is one that:
Derives data from only one table
Contains no functions or groups of data
Can perform DML operations through the view.
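A minimal sketch of a simple view built on the EMP sample table (the view name is illustrative):

CREATE OR REPLACE VIEW emp_dept10_v AS
SELECT empno, ename, sal
FROM emp
WHERE deptno = 10;

-- DML through a simple view updates the underlying table
UPDATE emp_dept10_v SET sal = sal * 1.1;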
Materialized view
It is a database object.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the table's rows, indexes and privileges will also be removed. The operation cannot be rolled back.
Difference between Rowid and Rownum?
ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from the table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is a file number.
ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a
number indicating the order in which Oracle selects the row from a table or set of
joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so
on.
You can use ROWNUM to limit the number of rows returned by a query, as in this
example:
SELECT * FROM employees WHERE ROWNUM < 10;
Rowid vs Rownum:
Rowid is permanent.
Rownum is temporary.
SELECT column, group_function(column)
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];
The WHERE clause cannot be used to restrict groups; you use the HAVING clause to restrict groups.
Having clause
Both the WHERE and HAVING clauses can be used to filter data: WHERE filters rows before they are grouped, whereas HAVING filters groups after the GROUP BY is applied.
MERGE Statement
You can use the MERGE command to perform an insert and an update in a single command.
Ex: Merge into student1 s1
Using (select * from student2) s2
On (s1.no = s2.no)
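A fuller sketch of the same MERGE, assuming illustrative columns no, name and marks on both student tables:

Merge into student1 s1
Using (select * from student2) s2
On (s1.no = s2.no)
When matched then
    update set s1.name = s2.name, s1.marks = s2.marks
When not matched then
    insert (no, name, marks) values (s2.no, s2.name, s2.marks);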
Sub-query
A query nested inside another query; the inner query runs first and its result is used by the outer query.
Co-related sub-query
A sub-query that references a column of the outer query, so it is evaluated once for each row processed by the outer query.
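A minimal sketch of each, assuming the standard EMP/DEPT sample tables:

-- Sub-query: the inner query runs once
select * from emp
where deptno = (select deptno from dept where dname = 'SALES');

-- Co-related sub-query: the inner query runs once per outer row
select * from emp e
where sal > (select avg(sal) from emp where deptno = e.deptno);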
Indexes:
Bitmap indexes are most appropriate for columns having low distinct values, such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) can be as efficient as a B-tree index.
Create an index when the table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows.
Hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance.
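A minimal sketch of creating both kinds of index (index, table and column names are illustrative):

-- B-tree index on a selective join/filter column
CREATE INDEX emp_deptno_idx ON emp (deptno);

-- Bitmap index on a low-cardinality column
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);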
Table analysis and updating statistics - the ANALYZE statement
The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:
ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
Hints for parallel execution, e.g. /*+ PARALLEL(a,4) */ - specify the degree as 2, 4 or 16.
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
Use Hint to force using an index
SELECT /*+ INDEX (TABLE_NAME INDEX_NAME) */ COL1, COL2 FROM TABLE_NAME
Select /*+ use_hash */ empno from emp
ORDERED - This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.
PARALLEL (table, instances) - This specifies that the operation is to be done in parallel.
If an index cannot be created, we fall back on /*+ parallel(table, 8) */ for SELECT and UPDATE statements, for example when the WHERE clause uses operators such as LIKE, NOT IN, >, < or <> that prevent index use.
Explain Plan:
Explain plan tells us whether the query is using indexes properly, what the cost is, and whether it is doing a full table scan; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created in SQL*Plus as follows:
SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;
SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
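A minimal sketch of running an explain plan and viewing the result with DBMS_XPLAN (the query itself is illustrative):

SQL> EXPLAIN PLAN FOR
     SELECT e.empno, e.ename, d.dname
     FROM emp e JOIN dept d ON (e.deptno = d.deptno);

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);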
What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?
If a query is taking a long time, first we run the query through Explain Plan; the explain plan process stores data in the PLAN_TABLE.
It gives us the execution plan of the query, for example whether the query is using the relevant indexes on the joining columns, or whether indexes to support the query are missing.
If the joining columns don't have an index, the query will do a full table scan; if it is a full table scan the cost will be high, so we create indexes on the joining columns and re-run the query, which should give better performance. We also need to analyze the tables if they were last analyzed long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If we still have a performance issue, then we use HINTS; a hint is nothing but a clue. We can use hints like
ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing
systems.
FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then
hashes other table and uses hash index to find corresponding records.
Therefore not suitable for < or > join conditions.
/*+ use_hash */
Hints are most useful to optimize the query performance.
Stored Procedure:
What are the differences between stored procedures and triggers?
Triggers:
Oracle lets you define procedures called triggers that run implicitly when an
INSERT, UPDATE, or DELETE statement is issued against the associated table
Triggers are similar to stored procedures. A trigger stored in the database can
include SQL and PL/SQL
Types of Triggers
This section describes the different types of triggers:
INSTEAD OF Triggers
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement.
For example, if an UPDATE statement updates multiple rows of a table, a row
trigger is fired once for each row affected by the UPDATE statement. If a
triggering statement affects no rows, a row trigger is not run.
BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing--whether the trigger
action is to be run before or after the triggering statement. BEFORE and AFTER
apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on
tables, not on views.
Difference between Trigger and Procedure
Triggers: run implicitly when the triggering DML statement is issued against the associated table; they cannot be invoked directly and do not accept parameters.
Stored Procedures: are invoked explicitly from applications or other PL/SQL and can accept parameters.
Functions: are invoked explicitly, must return a value, and can be called from within SQL statements.
2.2 IMPORTANT QUERIES
4. Convert rows to columns (display each employee's addresses as columns):

Name   No    Add1   Add2
abc    100   hyd    bang
xyz    200   Mysore pune

select emp_id,
       max(decode(rank_id, 1, address)) as add1,
       max(decode(rank_id, 2, address)) as add2,
       max(decode(rank_id, 3, address)) as add3
from (select emp_id, address,
             rank() over (partition by emp_id order by emp_id, address) rank_id
      from temp)
group by emp_id;
5. Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order
by sal desc) r from EMP);
6. Dense rank query:
The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);
7. Top 5 salaries by using rank:
Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over
(order by sal desc) r from emp) where r<=5;
Or
Select * from (select * from EMP order by sal desc) where rownum<=5;
8. 2nd highest salary:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over
(order by sal desc) r from EMP) where r=2;
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
10. Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
3 DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages methods and
technologies that focus on counts, statistics and business objectives to improve
business performance.
The objective of Business Intelligence is to better understand customers and
improve customer service, make the supply and distribution chain more efficient,
and to identify and address business problems and opportunities quickly.
A warehouse is used for high-level data analysis: predictions, time-series analysis, financial analysis, what-if simulations, etc. Basically it is used for better decision making.
What is a Data Warehouse?
A Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making".
In terms of design, a data warehouse and a data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Mart is used on a business division/department level.
Subject Oriented:
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Types of facts?
There are three types of facts:
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions in the fact table.
What is Granularity?
Principle: create fact tables with the most granular data possible to support
analysis of the business process.
In Data warehousing grain refers to the level of detail available in a given fact
table as well as to the level of detail provided by a star schema.
It is usually given as the number of records per key within the table. In general,
the grain of the fact table is the grain of the star schema.
Facts: Facts must be consistent with the grain; all facts are at a uniform grain.
Dimensions: each dimension associated with the fact table must take on a single value for each fact row.
Dimensional Model
Conceptual Data Model
At this level, the data modeler attempts to identify the highest-level relationships among the different entities. No attributes are specified.
Logical Data Model
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how it will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical
data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
Physical Data Model
Features of physical data model include:
At this level, the data modeler will specify how the logical data model will be
realized in the database schema.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints/requirements.
http://www.learndatamodeling.com/dm_standard.htm
10. The differences between a logical data model and a physical data model are shown below.
Logical vs Physical Data Modeling

Logical Data Model     Physical Data Model
Entity                 Table
Attribute              Column
Primary Key            Primary Key Constraint
Alternate Key          Unique Constraint or Unique Index
Rule                   Check Constraint, Default Value
Relationship           Foreign Key
Definition             Comment
[Physical data model diagram - ACW example: staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and ACW_DF_APPROVAL_STG load the fact tables ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F and ACW_DF_APPROVAL_F, which reference the dimension tables ACW_ORGANIZATION_D, ACW_USERS_D, ACW_PART_TO_PID_D, ACW_PRODUCTS_D, ACW_SUPPLY_CHANNEL_D and the EDW_TIME_HIERARCHY. Each fact table carries a surrogate primary key (e.g. ACW_DF_FEES_KEY), foreign keys to the dimensions, measures such as DF_FEES and ADJUSTMENT_AMT, and audit columns D_CREATED_BY, D_CREATION_DATE, D_LAST_UPDATED_BY and D_LAST_UPDATE_DATE.]
Type 1 Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information; no history is kept.
In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key   Name        State
1001           Christina   California

Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost, since the old value is overwritten.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to track historical changes.
Type 2 Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information; both the original and the new record are present.
In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Customer Key   Name        State
1001           Christina   Illinois
1005           Christina   California
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of
rows for the table is very high to start with, storage and performance can
become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the
data warehouse to track historical changes.
Type 3 Slowly Changing Dimension
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the
particular attribute of interest, one indicating the original value, and one
indicating the current value. There will also be a column that indicates when the
current value becomes active.
In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns: Customer Key, Name, Original State, Current State, Effective Date.

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key   Name        Original State   Current State   Effective Date
1001           Christina   Illinois         California      15-JAN-2003
Advantages:
- This does not increase the size of the table, since new information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an attribute is changed more
than once. For example, if Christina later moves to Texas on December 15, 2003,
the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
What is a Staging area and why do we need it in DWH?
If the target and source databases are different and the target table volume is high (it contains millions of records), then without a staging table we would need to design the Informatica mapping with a lookup to find out whether each record exists in the target table; since the target has huge volumes, it is costly to build the cache and it will hit performance.
If we create staging tables in the target database, we can simply do an outer join in the source qualifier to determine insert/update; this approach gives good performance.
It avoids a full table scan to determine inserts/updates on the target.
Also, we can create indexes on the staging tables; since these tables are designed for a specific application, they will not impact any other schemas/users.
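A minimal sketch of the source-qualifier override that flags insert vs update via an outer join against the staging table (table and column names are illustrative assumptions):

SELECT stg.cust_id,
       stg.cust_name,
       CASE WHEN tgt.cust_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
FROM   cust_stg stg
LEFT OUTER JOIN cust_dim tgt
ON     stg.cust_id = tgt.cust_id;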
While processing flat files into the data warehouse we can also perform cleansing.
Data cleansing, also known as data scrubbing, is the process of ensuring that a
set of data is correct and accurate. During data cleansing, records are checked for
accuracy and consistency.
Data cleansing
Weeding out unnecessary or unwanted things (characters and spaces etc)
from incoming data to make it more meaningful and informative
Data merging
Data can be gathered from heterogeneous systems and put together
Data scrubbing
Data scrubbing is the process of fixing or eliminating individual pieces of
data that are incorrect, incomplete or duplicated before the data is passed
to end user.
Data scrubbing is aimed at more than eliminating errors and redundancy.
The goal is also to bring consistency to various data sets that may have
been created with different, incompatible business rules.
4 ETL-INFORMATICA
4.1 Informatica Overview
Informatica Transformations:
Mapping: A mapping is the Informatica object which contains a set of transformations, including source and target. It looks like a pipeline.
Mapplet:
Mapplet is a set of reusable transformations. We can use this mapplet in any
mapping within the Folder.
A mapplet can be active or passive depending on the transformations in the
mapplet. Active mapplets contain one or more active transformations. Passive
mapplets contain only passive transformations.
When you add transformations to a mapplet, keep the following restrictions in mind.
You cannot include the following objects in a mapplet:
Normalizer transformations
COBOL sources
XML sources
Target definitions
Other mapplets
The mapplet must contain at least one Output transformation with at least one port connected to a transformation in the mapplet.
Unconnected Lookup
Lookup Caches:
When configuring a lookup cache, you can specify any of the following options:
Persistent cache
Static cache
Dynamic cache
Shared cache
Dynamic cache: When you use a dynamic cache, the PowerCenter Server
updates the lookup cache as it passes rows to the target.
If you configure a Lookup transformation to use a dynamic cache, you can only
use the equality operator (=) in the lookup condition.
The NewLookupRow port is enabled automatically.
NewLookupRow Value   Description
0                    The Integration Service does not update or insert the row in the cache.
1                    The Integration Service inserts the row into the cache.
2                    The Integration Service updates the row in the cache.
Static cache: It is the default cache; the PowerCenter Server doesn't update the lookup cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions,
configure the Lookup transformation to use a persistent lookup cache. The
PowerCenter Server then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
Differences between dynamic lookup and static lookup
Dynamic Lookup Cache: the PowerCenter Server updates the cache as it passes rows to the target; only the equality operator can be used in the lookup condition.
Static Lookup Cache: it is the default cache; the cache is not updated as rows pass to the target.
Aggregator Transformation:
Transformation type: Active, Connected
The Aggregator transformation performs aggregate calculations, such as
averages and sums. The Aggregator transformation is unlike the Expression
transformation, in that you use the Aggregator transformation to perform
calculations on groups. The Expression transformation permits you to perform
calculations on a row-by-row basis only.
Components of the Aggregator Transformation:
The Aggregator is an active transformation, changing the number of rows in the
pipeline. The Aggregator transformation has the following components and
options
Aggregate cache: The Integration Service stores data in the aggregate cache
until it completes aggregate calculations. It stores group values in an index
cache and row data in the data cache.
Group by port: Indicate how to create groups. The port can be any input,
input/output, output, or variable port. When grouping data, the Aggregator
transformation outputs the last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted
input, you must pass data to the Aggregator transformation sorted by group by
port, in ascending or descending order.
Aggregate Expressions:
The Designer allows aggregate expressions only in the Aggregator
transformation. An aggregate expression can include conditional clauses and
non-aggregate functions. It can also include one aggregate function nested
within another aggregate function, such as:
MAX (COUNT (ITEM))
The result of an aggregate expression varies depending on the group by ports
used in the transformation
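A small sketch of aggregate expressions with a conditional clause and with nesting (port names are illustrative):

-- Sum commissions only for rows where the commission exceeded the quota
SUM( COMMISSION, COMMISSION > QUOTA )

-- Nested aggregate: the highest per-group item count
MAX( COUNT( ITEM ) )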
Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You
can nest one aggregate function within another aggregate function.
The transformation language includes the following aggregate functions:
(AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, SUM, VARIANCE and STDDEV)
When you use any of these functions, you must use them in an expression within
an Aggregator transformation.
SQL transformation (script mode) ports:
Port          Type     Description
ScriptName    Input    Receives the name of the script to execute for the current row.
ScriptResult  Output   Returns PASSED if the script execution succeeds for the row; otherwise contains FAILED.
ScriptError   Output   Returns errors that occur when a script fails for a row.
If we look into session logs they show the busy percentage; based on that we need to find out where the bottleneck is.
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****
Thread [READER_1_1_1] created for [the read stage] of partition point
[SQ_ACW_PCBA_APPROVAL_STG] has completed: Total Run Time = [7.193083]
secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point
[SQ_ACW_PCBA_APPROVAL_STG] has completed. The total run time was
insufficient for any meaningful statistics.
Thread [WRITER_1_*_1] created for [the write stage] of partition point
[ACW_PCBA_APPROVAL_F1, ACW_PCBA_APPROVAL_F] has completed: Total Run
Time = [0.806521] secs, Total Idle Time = [0.000000] secs, Busy Percentage =
[100.000000]
Suppose I have to load 40 lakh (4 million) records into the target table and the workflow is taking about 10 - 11 hours to finish. I've already increased the cache size to 128MB. There are no joiners, just lookups and expression transformations.
Ans:
(1) If the lookups have many records, try creating indexes on the columns used in the lookup condition, and try increasing the lookup cache. If this doesn't increase the performance, and the target has any indexes, disable them in the target pre-load and enable them in the target post-load.
(2) Three things you can do with respect to it:
1. Increase the commit interval (by default it is 10000).
2. Use bulk mode instead of normal mode in case your target doesn't have primary keys, or use pre- and post-session SQL to implement the same (depending on the business requirement).
3. Use key partitioning to load the data faster.
(3) If your target contains key constraints and indexes, they slow the loading of data. To improve the session performance in this case, drop the constraints and indexes before you run the session and rebuild them after completion of the session.
Because it is a mapping variable, it stores the max last_upd_date value in the repository; in the next run, our source qualifier query will fetch only the records updated or inserted after the previous run.
Create two stored procedures, one to update cont_tbl_1 with the session start time, and set the stored procedure type property to Source Pre-load.
Update the previous record's eff_end_date with sysdate and insert the source data as a new record.
Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.
Once we find the match in the lookup, we take the SCD columns from the lookup and the source columns from the SQ into an expression transformation.
If the source and target data are the same, the flag is set to S.
If the source and target data are different, the flag is set to U.
If the source data does not exist in the target (the lookup returns null), the flag is set to I.
Based on the flag values, a router routes the data into the insert and update flows.
Complex Mapping
The source file directory on the server contains files older than 30 days, with timestamps in the file names.
For this requirement, if I hardcode the timestamp in the source file name, it will process the same file every day.
So I use a parameter file to supply the value to the session variable ($InputFilename).
A separate mapping updates the parameter file with the timestamp appended to the file name.
I make sure to run this parameter-file-update mapping before my actual mapping.
If the file size is greater than zero, the shell script sends an email notification to the source system POC (point of contact) along with the record file and an appropriate email subject and body.
If the file size <= 0, it means there are no records in the flat file; in this case the shell script will not send any email notification.
Or
We are expecting a not null value for one of the source columns.
$DBConnection_Source
$DBConnection_Target
$InputFile
$OutputFile
Variable
Parameter
Delimiter
Fixed Width
Developer Changes:
Client applications are the same, but work on top of the new services
framework
Manages the data from source system to target system within the memory
and disk
The main three components of Integration Service which enable data movement
are,
Load Balancer
Adds partitions to the session when the session is configured for dynamic
partitioning.
Sends a request to start worker DTM processes on other nodes when the
session is configured to run on a grid.
Runs post-session stored procedures, SQL, and shell commands and sends
post-session email
Because it is a mapping variable, it stores the max last_upd_date value in the repository; in the next run, our source qualifier query will fetch only the records updated or inserted after the previous run.
The logic in the SQ is to fetch only the rows whose last update date is greater than the $$LastUpdateDateTime variable.
In an expression, assign the max last update date value to the variable using the SETMAXVARIABLE function.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
$$LastUpdateDateTime=01/01/1940
Main mapping
Workflow Design
The mapping has to generate 2 flat files, and the name of each flat file is the corresponding state name; that is the requirement.
Below is my mapping.
Source (Table) -> SQ -> Target (FF)
Source:
State   Transaction City
AP      HYD
AP      TPT
KA      BANG
KA      MYSORE
KA      HUBLI
This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there.
We can achieve it with use of transaction control and special "FileName" port in
the target file .
In order to generate the target file names from the mapping, we should make
use of the special "FileName" port in the target file. You can't create this special
port from the usual New port button. There is a special button with label "F" on it
to the right most corner of the target flat file when viewed in "Target Designer".
When you have different sets of input data with different target files created, use
the same instance, but with a Transaction Control transformation which defines
the boundary for the source sets.
In the target flat file there is an option in the columns tab, i.e. "FileName" as a column; when you click it, a non-editable column gets created in the metadata of the target.
In the transaction control transformation give the condition as
iif(flag=1, tc_commit_before, tc_continue_transaction)
Map the state column to the target's FileName column.
The mapping will be like this:
source -> sq -> expression -> transaction control -> target
Run it, and separate files will be created by the name of the state.
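A small sketch of how the flag used above could be derived in the expression transformation, assuming the rows arrive sorted by state (v_prev_state is a variable port holding the previous row's state; the ports are evaluated in the order listed):

v_flag (variable)       = IIF(STATE != v_prev_state, 1, 0)
o_flag (output)         = v_flag
v_prev_state (variable) = STATE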
Source:
Ename    EmpNo
stev     100
methew   100
john     101
tom      101

Target:
Ename         EmpNo
stev methew   100
john tom      101
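One possible way to get the concatenated target above (an illustrative sketch, not the author's stated solution): sort by EmpNo, accumulate the names in an expression with variable ports, then group by EmpNo in an aggregator so the last (fully accumulated) row of each group is output.

v_names (variable)      = IIF(EMPNO = v_prev_empno, v_names || ' ' || ENAME, ENAME)
o_names (output)        = v_names
v_prev_empno (variable) = EMPNO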
Source:
Ename    EmpNo
stev     100
Stev     100
john     101
Mathew   102

Output:
Target_1:
Ename    EmpNo
Stev     100
John     101
Mathew   102

Target_2:
Ename    EmpNo
Stev     100
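A sketch of one way to split the unique and duplicate rows into the two targets (an assumption, not the author's stated solution): sort on EmpNo, then in an expression compare each row's key with the previous one, and route on the resulting flag.

v_dup (variable)        = IIF(EMPNO = v_prev_empno, 1, 0)
o_dup (output)          = v_dup
v_prev_empno (variable) = EMPNO

Router groups: o_dup = 0 -> Target_1 (distinct rows), o_dup = 1 -> Target_2 (duplicate rows).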
If both workflows exist in the same folder, we can create 2 worklets rather than creating 2 workflows.
We can set the dependency between these two workflows using a shell script, which is one approach.
If the workflows exist in different folders or different repositories, then we can use the approaches below.
1) Using a shell script:
As soon as the first workflow completes, we create a zero-byte file (indicator file).
If the indicator file is not available, we wait for 5 minutes and check again for the indicator. We continue this loop 5 times, i.e. 30 minutes.
After 30 minutes, if the file does not exist, we send out an email notification.
Transformation Specifications
Before developing the mappings, you need to prepare the specifications document for the mappings you are going to develop. A good template is placed in the templates folder. You can use your own template as long as it has as much detail as, or more than, this template.
While estimating the time required to develop mappings, the thumb rule is as follows:
Simple Mapping - 1 Person Day
Medium Complexity Mapping - 3 Person Days
Complex Mapping - 5 Person Days
Usually the mapping for the fact table is most complex and should be allotted as
much time for development as possible.
Data Loading from Flat Files
Its an accepted best practice to always load a flat file into a staging table before
any transformations are done on the data in the flat file.
Always use LTRIM, RTRIM functions on string columns before loading data into a
stage table.
You can also use UPPER function on string columns but before using it you need
to ensure that the data is not case sensitive (e.g. ABC is different from Abc)
If you are loading data from a delimited file then make sure the delimiter is not a
character which could appear in the data itself. Avoid using comma-separated
files. Tilde (~) is a good delimiter to use.
Failure Notification
Once in production, your sessions and batches need to send out a notification to the Support team when they fail. You can do this by configuring an email task at the session level.
Naming Conventions and usage of Transformations
Port Standards:
Input Ports - It will be necessary to change the name of input ports for lookups, expressions and filters where ports might have the same name. If ports do have the same name, they will default to having a number after the name. Change this default to a prefix of in_. This will allow you to keep track of input ports throughout your mappings.
Prefixed with: IN_
Variable Ports - Prefixed with: V_
Quick Reference
Object Type             Syntax
Folder
Mapping                 m_<Purpose>
Session                 s_<Mapping Name>
Batch
Source Definition
Target Definition
Aggregator              AGG_<Purpose>
Expression              EXP_<Purpose>
Filter                  FLT_<Purpose>
Joiner                  JNR_<Purpose>
Lookup                  LKP_<Lookup Table Name>
Normalizer              Norm_<Source Name>
Rank                    RNK_<Purpose>
Router                  RTR_<Purpose>
Sequence Generator      SEQ_<Purpose>
Source Qualifier        SQ_<Source Name>
Stored Procedure        STP_<Database Name>_<Procedure Name>
Update Strategy         UPD_<Purpose>
Mapplet                 MPP_<Purpose>
Input Transformation
Output Transformation
Database Connections    XXX_<Database Name>_<Schema Name>
1. Cache lookups if the source table is under 500,000 rows and DON'T cache for tables over 500,000 rows.
2. Reduce the number of transformations. Don't use an Expression transformation to collect fields. Don't use an Update Strategy transformation if you are only inserting; insert mode is the default.
3. If a value is used in multiple ports, calculate the value once (in a variable) and reuse the result instead of recalculating it for multiple ports.
4. Reuse objects where possible.
5. When overriding the Lookup SQL, always ensure to put a valid ORDER BY statement in the SQL. This will cause the database to perform the ordering rather than the Informatica Server while building the cache.
6. Define the source with the smaller number of rows as the master source in Joiner transformations, since this reduces the search time and also the cache.
7. If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Informatica Server saves and reuses cache files from session to session, eliminating the time required to read the lookup table.
8. Reduce the number of rows being cached by using the Lookup SQL Override option to add a WHERE clause to the default SQL statement.
4.5 UTP Template:
Step #   Description   Test Conditions   Expected Results   Actual Results (Pass or Fail)   Tested By

SAPCMS Interfaces

Step 1
Test Conditions: SOURCE: the PRCHG table for a particular session timestamp. TARGET:
Expected Results: Should be same as the expected
Actual Results: Pass
Tested By: Stev

Step 2
Test Conditions:
select PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE from T_PRCHG
MINUS
select PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE from PRCHG
Expected Results: Should be same as the expected
Actual Results: Pass
Tested By: Stev

Step 3
Description: Check for Insert strategy to load records into the target table.
Expected Results: Should be same as the expected
Actual Results: Pass
Tested By: Stev

Step 4
Description: Check for Update strategy to load records into the target table.
Expected Results: Should be same as the expected
Actual Results: Pass
Tested By: Stev
5 UNIX
Basic Commands:
cat file1 (displays a file; cat > file1 is the command to create a non-zero-byte file)
cat file1 file2 > all ----- combines file1 and file2 into all (it will create the file if it doesn't exist)
cat file1 >> file2 --- appends file1 to file2
> redirects output from standard out (the screen) to a file, printer, or whatever you like.
Crontab entry (minute hour day-of-month month day-of-week command):
25 22 15 * 0 /usr/local/bin/backup_jobs
cp file1 file2 - copy a file
mv file1 newname - rename a file
mv file1 ~/AAA/ - move file1 into the AAA directory
head filename - display the first lines of a file
find . -name aaa.txt - find a file by name
(The usual sed command for global string search and replace is this)
If you want to replace 'foo' with the string 'bar' globally in a file.
$ sed -e 's/foo/bar/g' myfile.txt
You can find out what shell you are using by the command:
echo $SHELL
Interactive History
A feature of bash and tcsh (and sometimes others) you can use
the up-arrow keys to access your previous commands, edit
them, and re-execute them.
Basics of the vi editor
Opening a file:
vi filename
Creating text:
Edit modes: these keys enter editing modes and type in the text of your document.
r - Replace 1 character
R - Replace mode
:w! existing.file  Overwrite an existing file with the file currently being edited.
:wq  Write the file and quit.
:q   Quit.
:q!  Quit without saving changes.