
DBMS & Data Modeling

Q. No. Question
1. What is a Multi-Dimensional Database? How does it fit into DW? (2 + 4)

Answer:
A schema in which multiple dimensions are linked to one or more fact tables is a multidimensional
model.

The benefits listed below are what the user gets out of a multidimensional database for analysis,
which is why dimensional modeling is used for a DW.

Dimensions are
• Descriptive data
• Usually textual (not numeric)
• Source of constraints for queries
• Way business users understand data
• Entry point for data access

Dimensional modeling is,

• Predictable
All query constraints come from dimensions
• Withstands changes in user behavior
Supports future, unplanned queries
• Aggregation utilities available
• Standard approaches to modeling
• Extensible
2. What is the difference between a dimensional data model and a normal data model?
(5)

Answer:
Normal Modeling (OLTP)
• For transactional systems
• Used to model minute relations between data elements
• Very complex models
• Difficult to query
• Eliminates data redundancy

Dimensional Modeling (DW)
• Simpler to design
• Denormalized form
• Easier to build analysis queries
• More intuitive and understandable for users
3. What are conformed dimensions? Explain the need for conformed dimensions? (4)

Answer:
They are dimensions that are shared, with identical meaning and content, by two or more fact tables.

Definition:
A conformed dimension is a dimension which is common to more than one fact table in dimensional
modeling.

For example:
While creating a data mart for Sales and Promotions using dimensional modeling, we need to relate
both facts (Sales and Promotions). For this we have to identify a common dimension that links the
two fact tables so that information from both can be retrieved by referencing the conformed
dimension. This is typically the Time dimension, which contains a time hierarchy applicable to both
facts.

Conclusion:
Most of the time the Time dimension acts as the conformed dimension while designing or creating data
marts through dimensional modeling.

4. What is a bus architecture and how is it implemented? (3)

Answer:
A bus architecture is an array of data marts integrated by conformed dimensions and conformed
facts. This is Kimball's view of the EDW. It is implemented step by step, building one data mart at
a time. Prior to building the data marts, the conformed dimensions and facts should be established.
Data marts cater to particular departmental needs, whereas the bus architecture caters to the
entire organization's needs.

5. What is Market Basket Analysis? How will you arrive at the combination? (5)

Answer:
Market basket analysis looks for combinations of products that sell together.

Start at the top of the merchandise hierarchy, which is assumed to be Department. Calculate the
market basket counts for all pairs of departments. If there are 20 departments, counts for
up to 400 pairs are calculated. Rank the results by total market basket count. The most desirable
results of this first iteration are the records near the top of the list, where the dollars or the
units contributed by the two departments are comparable.

6. Consider the relation CAR_SALE (Car#, Date_sold, Salesman#, Commission%,


Discount_amt). Assume that a car may be sold by multiple salesmen and hence {Car#,
Salesman#} is the primary key. Additional dependencies are Date_sold → Discount_amt and
Salesman # → Commission%. Based on the given primary key, identify the normal form of
the relation. If the relation is not in BCNF, normalize it successively into BCNF. (10)

Answer :
CAR_SALE (Car#, Date_sold, Salesman#, Commission%, Discount_amt).

First Normal Form :


If a table of data meets the definition of a relation, it is in first normal form.
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.

Based on the given primary key {Car#, Salesman#}, the relation is in 1NF only, because partial
dependencies exist (for example, Commission% depends only on Salesman#, which is part of the key).

Table1 : Salesman# (PK), Commission%

Table2 : Car# (PK), Date_sold, Discount_amt

Second Normal Form :


The table should be in 1NF and have no partial functional dependencies.
Partial functional dependency: when one or more non-key attributes are functionally dependent on
part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.

Table1 : Salesman #(PK), Commission%


Table2 : Car #(PK), Date Sold
Table3 : Car #(PK), Discount_Amt

Third Normal Form :


Table should be in 2nd NF and no transitive dependencies
Transitive dependency: a functional dependency between two or more non-key attributes.
No transitive Dependencies Found

BCNF:
3NF and every determinant is a candidate key
The tables are in BCNF.

7. What are the different types of partitions? (6)

Answer :
Partition Types :
1) Range Partition : Range partitioning maps rows to partitions based on ranges of
column values. Range partitioning is defined by the partitioning specification for a
table or index.
2) Hash Partition : Hash partitioning uses a hash function on the partitioning columns
to stripe data into partitions. Hash partitioning allows data that does not lend itself
to range partitioning to be easily partitioned for performance reasons (such as
parallel DML, partition pruning, and partition-wise joins).
3) Composite Partition : Composite partitioning partitions data using the range
method and, within each partition, sub partitions it using the hash method. This
type of partitioning supports historical operations data at the partition level and
parallelism (parallel DML) and data placement at the sub partition level.
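
As a minimal illustrative sketch of the three types (table and column names here are hypothetical,
not taken from the question):

-- Range partitioning on a date column
CREATE TABLE sales_range (
   sale_id   NUMBER,
   sale_date DATE
)
PARTITION BY RANGE (sale_date) (
   PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'),
   PARTITION p_2024 VALUES LESS THAN (DATE '2025-01-01')
);

-- Hash partitioning to spread rows evenly across 4 partitions
CREATE TABLE sales_hash (
   sale_id     NUMBER,
   customer_id NUMBER
)
PARTITION BY HASH (customer_id) PARTITIONS 4;

-- Composite (range-hash) partitioning
CREATE TABLE sales_composite (
   sale_id     NUMBER,
   sale_date   DATE,
   customer_id NUMBER
)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4 (
   PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'),
   PARTITION p_2024 VALUES LESS THAN (DATE '2025-01-01')
);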
8. What are the types of refresh in the materialized view? (4)

Answer :
Oracle maintains the data in materialized views by refreshing them after changes are made to their
master tables.
The refresh method can be
a) incremental (fast refresh) or
b) complete.

Incremental (fast refresh): only the changes made to the master tables since the last refresh are
applied to the existing materialized view data.

Complete refresh: the entire materialized view is rebuilt from scratch.

For materialized views that use the fast refresh method, a materialized view log or direct loader log
keeps a record of changes to the master tables.
Materialized views can be refreshed on demand or at regular time intervals.
Alternatively, materialized views in the same database as their master tables can be refreshed
whenever a transaction commits its changes to the master tables.

9. What are the 2 types of BUILD parameters in a materialized view? (2)

Answer :
Two types of Build Parameters are
1) BUILD IMMEDIATE clause populates the materialized view during creation (default).
2) BUILD DEFERRED clause creates the structure only; in this case the view must be
populated later using the DBMS_MVIEW package. Use BUILD DEFERRED to populate
the materialized view after office hours to avoid affecting normal operations.
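
A minimal sketch combining the build and refresh options (the table and view names are
illustrative, and the master table is assumed to have a primary key):

CREATE MATERIALIZED VIEW LOG ON sales WITH PRIMARY KEY;

CREATE MATERIALIZED VIEW mv_sales_copy
BUILD DEFERRED                 -- structure only; populate later with DBMS_MVIEW
REFRESH FAST ON DEMAND         -- incremental refresh driven by the MV log
AS SELECT * FROM sales;

-- Populate / refresh later, e.g. after office hours
EXEC DBMS_MVIEW.REFRESH('MV_SALES_COPY', 'C');   -- 'C' = complete, 'F' = fast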
10. If you are going for New Fact Tables for Aggregates, Explain the Strategies that can be
followed. Provide at least two strategies. (6)

Answer:
1) LOST DIMENSION AGGREGATES - Lost dimension aggregates are created by completely
excluding one or more dimensions when summarizing a fact table.

2) SHRUNKEN DIMENSION AGGREGATES - Shrunken dimension aggregates have one or
more dimensions replaced by shrunken or rolled-up versions of themselves.

3) COLLAPSED DIMENSION AGGREGATES - Collapsed dimension aggregates are created
when dimensional keys have been replaced with high-level dimensional attributes, resulting in a
single, fully denormalized summary table.


LEVEL-1 Between 15 And 25 Marks

LEVEL-2 Between 26 And 40 Marks

LEVEL-3 Above 40 Marks


PL / SQL
Q.No Question
1. What will happen after the COMMIT statement? (5)

DECLARE
   CURSOR c1 IS
      SELECT empno, ename FROM emp FOR UPDATE;
   eno   emp.empno%TYPE;
   ename emp.ename%TYPE;
BEGIN
   OPEN c1;
   LOOP
      FETCH c1 INTO eno, ename;
      EXIT WHEN c1%NOTFOUND;
      COMMIT;
   END LOOP;
END;

Answer:
A cursor whose query is SELECT ... FOR UPDATE gets closed after a COMMIT/ROLLBACK, so a
subsequent FETCH raises an error (ORA-01002: fetch out of sequence).

A cursor whose query is a plain SELECT (no FOR UPDATE clause) does not get closed by a
COMMIT/ROLLBACK.
2. Which one is the better programming method? (4)

a) IF func1(emp_no) AND ( sal < 450 ) THEN


...
END IF;

b) IF ( sal < 450 ) AND func1(emp_no) THEN


...
END IF;

c) Both

Answer:
b. PL/SQL evaluates conditions joined by AND using short-circuit evaluation, so putting the cheap
comparison (sal < 450) first avoids calling func1(emp_no) whenever the comparison is FALSE.
3. What happens if a procedure that updates a column of table X is called in a database
trigger of the same table ? (7)

Answer :
A mutating-table error (ORA-04091) occurs: the trigger (or a procedure it calls) cannot modify or
query the same table that is being modified by the triggering statement.

Note: If mutating triggers are explained in detail, along with the reasons to avoid them, award 10
marks.
4. I have one validation in a trigger and another validation in an integrity constraint. Which
validation will be performed first? (7)

Answer :
The trigger validation is performed first, and then the integrity constraint validation.
5. What is PRAGMA EXCEPTION_INIT? Explain its usage. (5)

Answer : The PRAGMA EXCEPTION_INIT tells the compiler to associate a user-defined exception with
an Oracle error number, so that the specific Oracle error can be trapped by name.

e.g. PRAGMA EXCEPTION_INIT (exception_name, oracle_error_number)
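
A minimal sketch (the exception name and the UPDATE are illustrative):

DECLARE
   deadlock_detected EXCEPTION;
   PRAGMA EXCEPTION_INIT(deadlock_detected, -60);   -- ORA-00060: deadlock detected
BEGIN
   UPDATE emp SET sal = sal * 1.05;                 -- some DML that could hit a deadlock
EXCEPTION
   WHEN deadlock_detected THEN
      DBMS_OUTPUT.PUT_LINE('Deadlock detected; the statement was rolled back by Oracle');
END;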


6. What will happen to the following code? (3)

Begin
   CREATE TABLE emp
   (
      emp_no   number,
      emp_name char(10)
   );
End;

a) It will give a runtime error.
b) No errors.
c) It will give a compile-time error.

Answer :
c). DDL statements are not allowed directly inside a PL/SQL block, so the block fails to compile.
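
If DDL really has to be issued from PL/SQL, it must go through dynamic SQL; a minimal sketch (the
table name is illustrative):

BEGIN
   -- DDL executed as a dynamic SQL string compiles and runs fine
   EXECUTE IMMEDIATE 'CREATE TABLE emp_copy (emp_no NUMBER, emp_name CHAR(10))';
END;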
7. How to know that somebody modified the code? (5)

Answer :
Code for stored procedures, functions and packages is stored in the Oracle Data Dictionary.
One can detect code changes by looking at the LAST_DDL_TIME column in the
USER_OBJECTS dictionary view.
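
For example, to list recently changed program units:

SELECT object_name, object_type, last_ddl_time
FROM   user_objects
WHERE  object_type IN ('PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY')
ORDER  BY last_ddl_time DESC;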
8. What is the result of the following code? (2)

DECLARE
   a VARCHAR2(10) := NULL;
   b VARCHAR2(10) := NULL;
BEGIN
   IF (a = b) THEN
      DBMS_OUTPUT.PUT_LINE('condition success');
   ELSE
      DBMS_OUTPUT.PUT_LINE('condition failed');
   END IF;
END;

Answer :
It will display 'condition failed', because NULL = NULL evaluates to NULL (unknown), so the IF
condition is not TRUE and the ELSE branch runs.
9. What is Raise_application_error ? (3)

Answer :
RAISE_APPLICATION_ERROR is a procedure of the package DBMS_STANDARD which allows you to issue
user-defined error messages from a stored subprogram or database trigger.
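
For example (the variable, error number and message are illustrative; user-defined error numbers
must lie in the range -20000 to -20999):

DECLARE
   v_balance NUMBER := -50;   -- hypothetical value
BEGIN
   IF v_balance < 0 THEN
      RAISE_APPLICATION_ERROR(-20001, 'Account balance cannot be negative');
   END IF;
END;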
10. I want to execute an operating system command from PL/SQL. How can this be done? (7)

Answer :
From Oracle 8 onwards one can call external 3GL code in a dynamically linked library (DLL or shared
object). Write a library in C/C++ to do whatever is required, and define that C/C++
function to PL/SQL as an external procedure, which makes it executable from PL/SQL.
11. What is pinning in PL/SQL? How to pin? (6)

Answer :
Another way to improve performance is to pin frequently used packages in the shared memory
pool. When a package is pinned, it is not aged out by the least recently used (LRU) algorithm
that Oracle normally uses. The package remains in memory no matter how full the pool gets or
how frequently you access the package.

BEGIN
   DBMS_SHARED_POOL.KEEP('PROCESS_DATE', 'P');
END;
12. Will the following code work? (3)

Create function fun1 ( a in number(5), b out number(3) )
Return number is
Begin
   b := 10;
   a := a + b;
   return 1;
End;

Answer :
It won't work. Formal parameters cannot have size constraints (NUMBER(5) must be just NUMBER),
and the IN parameter a cannot be assigned a value inside the function.
13. What is difference between UNIQUE constraint and PRIMARY KEY constraint ? (2)

Answer :
A column defined as UNIQUE can contain NULLs, while a column defined as PRIMARY KEY cannot
contain NULLs. A table can have only one primary key but many unique constraints.
14. How would you optimise the following code? (4)

DECLARE
TYPE nlist IS VARRAY(20) OF NUMBER;
Dept_type nlist := nlist(10, 30, 70, ...);
BEGIN
...
FOR i IN Dept_type.FIRST.. Dept_type.LAST LOOP
...
UPDATE emp SET sal = sal * 1.10 WHERE deptno = Dept_type(i);
END LOOP;

End;

Answer :
Use FORALL to perform the bulk update in one shot instead of the row-by-row FOR loop, as in the
sketch below.
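
A minimal sketch of the bulk version:

DECLARE
   TYPE nlist IS VARRAY(20) OF NUMBER;
   dept_type nlist := nlist(10, 30, 70);
BEGIN
   -- One context switch to SQL for the whole collection instead of one per iteration
   FORALL i IN dept_type.FIRST .. dept_type.LAST
      UPDATE emp SET sal = sal * 1.10
      WHERE  deptno = dept_type(i);
END;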
15. Consider the scenario of a report-generation block in PL/SQL. The record set for the report is
populated into a driving table from 5 source tables. Before populating the data, all existing
records must be removed from the report table. Which option will you use to remove the data from
the report table? (2)

Answer :
The best option is TRUNCATE TABLE instead of DELETE. TRUNCATE is always faster than DELETE.

Note: If the candidate justifies this by pointing out that TRUNCATE generates no undo for the
removed rows (and cannot be rolled back), award 4.
16. Consider the scenario of moving employee details from one table to another. For each employee
in the source table, insert the details into the target table; if that employee is already present
in the target table, handle the exception. Write the pseudo code. (4)

Answer :
Use an inner block inside a FOR loop. Inside the inner block do the insert and catch the
DUP_VAL_ON_INDEX exception in the exception section of the inner block, as sketched below.
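
A minimal sketch, assuming source and target tables emp_source and emp_target with columns
emp_id and emp_name (all names illustrative):

BEGIN
   FOR rec IN (SELECT emp_id, emp_name FROM emp_source) LOOP
      BEGIN
         INSERT INTO emp_target (emp_id, emp_name)
         VALUES (rec.emp_id, rec.emp_name);
      EXCEPTION
         WHEN DUP_VAL_ON_INDEX THEN
            NULL;   -- employee already present in the target; skip (or log/update)
      END;
   END LOOP;
END;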
17. I want to lock some records in a table for my processing, but I don't want to wait for those
records if they are already locked by another session. How can this be done? (5)

Answer :
Use the FOR UPDATE NOWAIT clause in the cursor; if the rows are already locked, ORA-00054 is
raised immediately instead of waiting for the lock.
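
A minimal sketch (table and predicate are illustrative):

DECLARE
   CURSOR c_emp IS
      SELECT empno, sal
      FROM   emp
      WHERE  deptno = 10
      FOR UPDATE NOWAIT;    -- raises ORA-00054 immediately if the rows are locked
BEGIN
   OPEN c_emp;
   -- process the locked rows here
   CLOSE c_emp;
END;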

LEVEL-1 Between 20 And 30 Marks

LEVEL-2 Between 31 And 60 Marks

LEVEL-3 Above 60 Marks


SQL QUERIES - PERFORMANCE TUNING
Q.No Questions
1. Mention a few optimizer hints used in SQL Queries ? (8)

Answer:
1) COST 2) RULE 3) FIRST_ROWS 4) ALL_ROWS 5) INDEX (TABLE/ALIAS NAME
INDEX_NAME) 6) NESTED_LOOPS 7) PARALLEL 8) USE_MERGE
Note: Award marks based on the number of answers
2. What are the 2 methods of finding / analysing the execution path taken by a SQL Query
? (4)

Answer:
1) EXPLAIN PLAN and 2) SQL TRACE followed by TKPROF
3. Given a query, What would be the approach to tune the same ? (5)

Answer:
Step 1 - Understand the data that needs to be retrieved.
Step 2 - Get the filter conditions right.
Step 3 - Have a step-by-step approach to getting the required output.
Step 4 - Then look at the technical aspects of the SQL, such as joins, indexes, hints and best
practices.
4. Mention a few configurable parameters in init.ora (8)

Answer:
The parameters and their maximum values are given below.
Database size            : 512 PB (up from 32 TB)
Maximum datafiles        : 65,533
Datafiles per tablespace : 1,022
Blocks per datafile      : 4,000,000
Block size               : 32 KB
Maximum datafile size    : 128 GB (4,000,000 * 32 KB)
Maximum tablespace size  : 128 TB (1,022 * 128 GB)
Db_block_buffers         : Unlimited
SGA for 32-bit OS        : 4 GB
Table columns            : 1,000 (up from 256)
5. Can we force the Oracle optimizer to use an index ? (5)

Answer:
Yes, this is possible by using the hint /*+ INDEX(table_name index_name) */ in the query, as in the
example below.
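
For instance (the table, alias and index names are illustrative):

SELECT /*+ INDEX(e emp_name_idx) */ *
FROM   emp e
WHERE  emp_name = 'JOHN';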
6. Give a scenario where it would be better to go for a full table scan instead of the query
using an index range scan ? (5)

Answer:
A query returning more than about 20% of the table's rows is a good candidate for a full table
scan, because the cost of many single-block index lookups exceeds that of a multiblock full scan.

7. How do we ensure that a query uses a key to hit the partition and not all the partitions ?
(8)
Answer:
In the predicate (WHERE) clause of the SQL, make sure the partition key columns are used, so that
the optimizer can perform partition pruning and hit only the relevant partition.
8. Explain Cost / Rule based query processing. (6)

Answer:
Cost-based: the estimated cost of executing the query determines the execution path. The cost is
derived from the statistics available on all the objects we are interested in.

Rule-based: the path is chosen from a predetermined ranking of access paths that the optimizer
applies to a given SQL statement, without using statistics.
9. Is it better to use the operators >= and <= instead of "Between". If the answer to the
question is "YES", please explain. (4)

Answer:
A BETWEEN clause is internally converted into <= and >= predicates. Writing <= and >= directly
avoids that conversion, so the time needed to parse the query is slightly reduced. Hence the
BETWEEN clause is best avoided.
10. Who decides the path of query execution ? (3)

Answer:
The Oracle optimizer.
Note : If the candidate is aware that from Oracle 9i the cost-based optimizer is effectively the
default, award 5.
11. On what basis the optimizer decides the path of execution ? (5)

Answer:
Based on the statistics available about the table and indexes under consideration.
12. A function is used in the WHERE clause of a SQL statement, e.g. LTRIM(column_name). Will
the corresponding index on the column be used for data retrieval and processing? (4)

Answer:
At run time, i.e., during query execution, the index will not be used. The index will be used
only if it was created as a function-based index on the same expression.
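
For example (the index name is illustrative):

-- A plain index on emp_name cannot satisfy WHERE LTRIM(emp_name) = 'JOHN',
-- but a function-based index on the same expression can:
CREATE INDEX emp_name_ltrim_idx ON emp (LTRIM(emp_name));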
13. Explain the difference using a distinct and a group by in a sql query: (3)

Answer:
A DISTINCT does a grouping internally and then returns the unique values, which means a GROUP BY
would achieve the same result; so there can be a performance benefit in using GROUP BY (and
HAVING) rather than a DISTINCT clause in a SQL query.
14. What might happen to a query execution if the statistics are not available for the tables
and indexes in question? (6)

Answer:
The optimizer may fall back to rule-based execution.
Note : We think the optimizer would proceed with the available (possibly stale) statistics only;
please cross-check.
15. Give a scenario where we can use a bitmap index. Why? (4)

Answer:
Columns whose cardinality is low are good candidates for a bitmap index; in other words, columns
that have only a few distinct values, such as gender or a status flag.

16. How are the statistics of a table built? (4)

Answer:
ANALYZE TABLE / ANALYZE INDEX, with either ESTIMATE STATISTICS or COMPUTE STATISTICS, are the
mechanisms used to gather the statistics of the objects under consideration.
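
For example (object names are illustrative; later releases also provide the DBMS_STATS package):

ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE INDEX emp_pk ESTIMATE STATISTICS SAMPLE 20 PERCENT;
-- or, in later releases:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP');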
17. Consider the query

update table1 t1
set (t1.col1, t1.col2) = ( select t2.col1, t2.col2
                           from table2 t2
                           where t2.col3 = t1.col3 )
where t1.col4 = 'Y';

What will happen if, for some record, the where condition inside the subquery finds no matching
row? How can this be resolved? (6)

Answer:
For any row of table1 where the subquery's where condition finds no matching row in table2, the
subquery returns no rows and t1.col1 and t1.col2 are updated to NULL, so the existing values are
lost. The way to resolve this is to restrict the update to rows that actually have a match:

update table1 t1
set (t1.col1, t1.col2) = ( select t2.col1, t2.col2
                           from table2 t2
                           where t2.col3 = t1.col3 )
where exists ( select 'x'
               from table2 t2
               where t2.col3 = t1.col3 )
and t1.col4 = 'Y';

Note: Mention in the question whether the inner or the outer where condition fails.
18. Is it optimal to have more no of indexes in a table in OLTP ? if no why ? (4)

Answer:
It is better to have only the optimal set of indexes on a table in OLTP. As the number of indexes
increases, performance becomes an issue during INSERT/UPDATE/DELETE operations, because every
index has to be maintained on each change.
19. Consider an emp table having an index on emp_name. Will the following query do a full scan
or an index scan? (5)

select * from emp where nvl(emp_name , 'JOHN') = 'PETER';

Answer:
The index will not be used because of the NVL function applied to the column. If a function-based
index had been created on NVL(emp_name, 'JOHN'), then the index could be used.
20. Consider an emp table which has an index on emp_id. Will the following query do a
full scan or an index scan? (5)

select * from emp where emp_name = 'JOHN' and emp_id = 256;

Answer:
The position of emp_id in the WHERE clause does not matter; the optimizer can still use the index
on emp_id to locate the row(s) with emp_id = 256 and then apply the emp_name filter, rather than
doing a full table scan.
21. What is the Advantage and Disadvantage of using Set Operators?. And what are the
different Set Operators available in ORACLE? (6)

Answer:
Set operators can be used when you want to retrieve data without relying on the WHERE predicate.
The advantage of using set operators is parallel processing of the component queries; the
disadvantage is that an index present on a column in the WHERE clause may not be considered.

The different set operators are

1) UNION
2) UNION ALL
3) INTERSECT
4) MINUS

LEVEL-1 Between 35 And 45 Marks

LEVEL-2 Between 46 And 80 Marks

LEVEL-3 Above 80 Marks


ORACLE FEATURES & ARCHITECTURE & ADMINISTRATION
CONCEPTS :-
Q.No Questions
1. What are Clusters ? (5)

Answer:
Clusters are groups of one or more tables that are physically stored together because they share
common columns and are often used together.
2. What is Row Chaining ? (5)

Answer:
In some circumstances, all of the data for a row in a table may not fit in a single data
block. When this occurs, the data for the row is stored in a chain of data blocks (one or more)
reserved for that segment.
3. What is the use of Control File ? (3)

Answer:
When an instance of an ORACLE database is started, its control file is used to identify the
database and redo log files that must be opened for database operation to proceed.
4. What are the different type of Segments ? (3)

Answer:
Data Segment, Index Segment, Rollback Segment and Temporary Segment.
5. When Does DBWR write to the database ? (6)

Answer:
DBWR writes when more data needs to be read into the SGA and too few database buffers
are free. The least recently used data is written to the data files first. DBWR also writes when
CheckPoint occurs.
6. What is the function of Dispatcher (Dnnn) ? (7)

Answer:
The Dispatcher (Dnnn) process is responsible for routing requests from connected user
processes to available shared server processes and returning the responses back to the
appropriate user processes.
At least one dispatcher process is created for every communication protocol in use.
7. Will the Optimizer always use COST-based approach if OPTIMIZER_MODE is set to
"Cost'? (5)

Answer:
The presence of statistics in the data dictionary for at least one of the tables accessed by the
SQL statement is necessary for the optimizer to use the cost-based approach. Otherwise the
optimizer chooses the rule-based approach.
8. What are the values that can be specified for the OPTIMIZER_GOAL parameter of the
ALTER SESSION command? (6)

Answer:
CHOOSE , ALL_ROWS ,FIRST_ROWS and RULE.
9. What is the effect of setting the value "CHOOSE" for OPTIMIZER_GOAL, parameter
of the ALTER SESSION Command ? (6)

Answer:
The optimizer chooses the cost-based approach and optimizes with the goal of best throughput
if statistics for at least one of the tables accessed by the SQL statement exist in the data
dictionary. Otherwise the optimizer chooses the rule-based approach.
10. What is the use of SNAPSHOT LOG ? (5)

Answer:
A snapshot log is a table in the master database that is associated with the master table.
ORACLE uses a snapshot log to track the rows that have been updated in the master table.
Snapshot logs are used in updating the snapshots based on the master table.
11. What is Full Backup ? (3)

Answer:
A full backup is an operating system backup of all the data files, on-line redo log files and
control files that constitute an ORACLE database, plus the parameter file.
12. Which parameter in Storage clause will reduce no. of rows per block? (7)

Answer:
PCTFREE parameter
This is used to reserve a certain amount of space in each block for the expansion of existing
rows; a higher PCTFREE therefore reduces the number of rows per block.
13. How will you monitor rollback segment status ? (7)

Answer:
Querying the DBA_ROLLBACK_SEGS view
14. What are the database administrators utilities avaliable ? (6)

Answer:
SQL*DBA - This allows the DBA to monitor and control an ORACLE database.

SQL*Loader - It loads data from standard operating system files (flat files) into ORACLE
database tables.

Export (EXP) and Import (IMP) - These utilities allow you to move existing data in ORACLE format
to and from an ORACLE database.
15. What is a trace file and how is it created ? (6)

Answer:
Each server and background process can write to an associated trace file. When an internal error
is detected by a server or user process, it dumps information about the error to its trace file.
This can be used for tuning the database.
16. What are the different methods of backing up an oracle database? (3)

Answer:
- Logical Backups
- Cold Backups
- Hot Backups (Archive log)

Logical backup involves reading a set of database records and writing them into a file. The Export
utility is used for taking the backup and the Import utility is used to recover from the backup.

Cold backup is taking a backup of all physical files after a normal shutdown of the database. We
need to take:
- All data files.
- All control files.
- All on-line redo log files.
- The init.ora file (optional)

Hot backup (archive log) is taking the backup while the database is open. For this the ARCHIVELOG
mode should be enabled. The following files need to be backed up:
- All data files
- All archived redo log files
- All control files
17. Name the ORACLE Background Process ? (6)

Answer:
DBWR - Database Writer.
LGWR - Log Writer
CKPT - Check Point
SMON - System Monitor
PMON - Process Monitor
ARCH - Archiver
RECO - Recoverer
Dnnn - Dispatcher
LCKn - Lock
Snnn - Server.
18. What is the use of PGA? (4)

Answer:
The PGA is an area in memory that helps user processes execute; it holds data such as bind
variable information, sort areas, and other aspects of cursor handling.
19. What are functions of PMON ? (4)

Answer:
The Process Monitor (PMON) performs process recovery when a user process fails. PMON is
responsible for cleaning up the cache and freeing the resources that the process was using.
PMON also checks on dispatcher and server processes and restarts them if they have failed.
20. Can we change the DB block size after a database has been created? (cannot rate)

Answer:
The standard block size chosen at creation time cannot be changed. From Oracle 9i onwards,
additional non-default block sizes can be configured and used for new tablespaces.
21. Which export option will generate code to create an initial extent that is equal to the sum
of the sizes of all the extents currently allocated to an object? (4)
A. FULL
B. DIRECT
C. COMPACT
D. COMPRESS

Answer : D – DBA Question


22. What are two reasons for changing user quotas on a tablespace? (3)
A. A datafile becomes full.
B. A user encounters slow response time from the application.
C. Tables owned by a user exhibit rapid and unanticipated growth.
D. Database objects are reorganized and placed in different tablespace.

Answer : C,D
23. A DBA performs the query:
SELECT tablespace_name, max_bytes
FROM dba_ts_quotas
WHERE username = 'JERRY';

That returns the result:

TABLESPACE_NAME   MAX_BYTES
DATA01            -1

What does -1 indicate? (5)

A. Tablespace DATA01 has been dropped.


B. Tablespace DATA01 has no free space.
C. The user has no quotas on tablespace DATA01.
D. The user has an unlimited quota on tablespace DATA01.
E. The user has exceeded his or her quota on the tablespace DATA01.

Answer : D
24. Consider the following command to create the user ‘peter’:

CREATE USER peter


IDENTIFIED by pan
TEMPORARY TABLESPACE temp
PASSWORD EXPIRE;
Since no default tablespace was specified, what will happen if this command is executed?
A. The user will not own a home directory.
B. The user peter will be created using the TEMP tablespace as the default.
C. The user peter will be created using the SYSTEM tablespace as the default.
D. The code will produce an error message, the user peter will not be created. (4)

Answer : C
25. An Oracle user receives the following error:
ORA-01555 SNAPSHOT TOO OLD
What are two possible solutions? (5)
A. Increase the extent size of the rollback segments
B. Perform media recovery.
Q.No Questions
C. Increase the number of rollback segments.
D. Increase the size of the rollback segment tablespace.
E. Increase the value of the OPTIMAL storage parameter.
Answer : A,C
26. MINEXTENTS must be at least _____ when a rollback segment is created. (5)
A. 1
B. 2
C. 3
D. 5

Answer : B
27. You are creating a database with a character set other than US7ASCII. Which operating
system environment variable needs to be set to specify the directory location of the NLS
support files? (6)
A. NLS_LANG
B. ORA_NLS33
C. ORACLE_SID
D. ORACLE_BASE
E. ORACLE_HOME

Answer : B
28. Given the statement:
CREATE DATABASE orc1
LOGFILE GROUP 1 '/u01/Oracle/dba/log1a.rdo' SIZE 10M,
        GROUP 2 '/u01/Oracle/dba/log2a.rdo' SIZE 10M
DATAFILE '/u01/Oracle/dbs/sys_01.dbf' REUSE;
Which statement is true? (6)
A. The online redo logs will be multiplexed.
B. The file ‘u01/Oracle/dbs/sys_01.dbf’ already exists.
C. File ‘u01/Oracle/dbs/sys_01.dbf’ as a parameter file.
D. The control file name is ‘u01/Oracle/dbs/sys_01.dbf’.
E. Oracle will determine the optimum size for ‘u01/Oracle/dba/sys_01.dbf’.

Answer : B
29. What is a default role? (3)
A. A role that requires a password.
B. A role that requires no password.
C. A role automatically enabled when the user logs on.
D. A role automatically assigned when the user is created.

Answer : C
30. Which data dictionary view shows the available free space in a certain tablespace? (4)
A. DBA_EXTENTS
B. V$FREESPACE
C. DBA_FREE_SPACE
D. DBA_TABLESPACE
E. DBA_FREE_EXTENTS

Answer : C
31. Which statement about using PCTFREE and PCTUSED is true? (6)
A. Block space utilization can be specified only at the segment level.
B. Block space utilization can be specified only in the data dictionary.
C. Block space utilization parameters can only be specified at the tablespace.
D. Block space utilization can be specified both at the tablespace level and segment
level.

Answer : A
32. Which type of index should be created to spread the distribution of an index across the
index tree? (6)
A. B-tree indexes.
B. Bitmap indexes.
C. Reverse-key indexes.
D. Function-based indexes.

Answer : C
33. Which statement about rebuilding indexes is true? (5)
A. The NOSORT option must be used.
B. The new index is built using the table as the data source
C. A reverse b-tree index can be converted to a normal index.
D. Query performance may be affected because the index is not available during the rebuild.

Answer : C
34. Which view will show a list of privileges that are available for the current session to a
user? (5)
A. SESSION_PRIVS
B. DBA_SYS_PRIVS
C. DBA_COL_PRIVS
D. DBA_SESSION_PRIVS

Answer : A
35. In which situation is it appropriate to enable the restricted session mode? (7)
A. Creating a table
B. Dropping an index
C. Taking a rollback segment offline
D. Exporting a consistent image of a large number of tables.

Answer : D
36. Which three events are logged in the ALERT file? (6)
A. Socket usage
B. Block corruption errors
C. User session information
D. Internal errors (ORA-600)
E. Database startup activities.

Answer : B,D,E
37. Which data dictionary view displays the database character set? (5)
A. V$DATABASE
B. DBA_CHARACTER_SET
C. NLS_DATABASE_PARAMETERS
D. NLS_DATABASE_CHARACTERSET

Answer : C

LEVEL-1 Between 50 And 94 Marks

LEVEL-2 Between 95 And 141 Marks

LEVEL-3 Above 142 Marks


DBA related Questions (Performance and General DBA activity questions)
Q.No Questions
Oracle Architecture
1. Which statement best describes the purpose of redo log files? (5)
A. They ensure that log switches are performed efficiently.
B. They allow changes to the database to be asynchronously recorded.
C. They provide a means to redo transactions in the event of a database failure.
D. They record changes that have not yet been committed.

The best answer is C.


2. What is the minimum number of redo log file groups that are required for an Oracle
database instance? (3)
A. One
B. Two
C. Three
D. Four

The correct answer is B


3. Select the three statements that are true about checkpoints: (6)
A. Checkpoints occur when an automatic log switch is performed.
B. Checkpoints occur when the database is shut down with the normal, immediate, or
transactional option.
C. The DBA cannot force checkpoints.
D. Checkpoints automatically occur when the DBA performs a manual log switch.
E. Checkpoints are recorded in the alert.log file by default.

The correct answers are A, B, and D.


4. What four parameters most affect SGA size? (5)
A. SGA_MAX_SIZE
B. SHARED_POOL_SIZE
C. DB_CACHE_SIZE
D. LARGE_POOL_SIZE
E. LOG_BUFFERS

The correct answers are B, C, D, and E.


Physical and Logical Schema
5. Select three characteristics of locally managed tablespaces: (6)
A. Extents are managed by the data dictionary.
B. A bitmap in the datafile keeps track of the free or used status of blocks in the
datafile.
C. Each segment stored in the tablespace can have a different storage clause.
D. No coalescing is required.
E. UNDO is not generated when allocation or deallocation of extents occurs.

The correct answers are B, D, and E.


6. What type of tablespace is best for managing sort operations? (3)
A. UNDO tablespace
B. SYSTEM tablespace
C. Temporary tablespace
D. Permanent tablespace
The correct answer is C.
Oracle Networking
7. What is the purpose of Oracle Net Services in an Oracle database environment? (3)
A. Process requests within the database
B. Establish the connection between a client and the database
C. Maintain data integrity within the client application
D. Start the listener process

B is the correct answer.


Background Processes and Oracle Configuration
8. When using dynamic service registration, the process monitor (PMON) reads
initialization parameters using what file? (3)
A. listener.ora
B. tnsnames.ora
C. sqlnet.ora
D. init.ora

The correct answer is D.


9. What two init.ora file parameters must be set to support dynamic service registration?( 6)
A. TNS_ADMIN
B. SERVICE_NAMES
C. SID_NAME
D. INSTANCE_NAME

The correct answer is both B and D.


10. What four statements regarding a dedicated server process environment are true? (6)
A. The user process and server process are separate
B. Each user process has its own server process
C. The user and server processes can run on different machines to take advantage of
distributed processing
D. There is a one-to-many ratio between the user and server processes
E. Even when the user process is not making a database request, the dedicated server
exists but remains idle
F. Processing results are sent from the server processes satisfying the request to a
dispatcher process Next Steps

The correct answer is A, B, C, and E.


11. What two parameters are required to configure an Oracle shared server process
environment? (6)
A. SHARED_SERVERS
B. CIRCUITS
C. MAX_SHARED_SERVERS
D. DISPATCHERS
E. MAX_DISPATCHERS

The correct answer is A and D.


12. In a dedicated server configuration, the contents of the PGA include which of the
following three pieces of data? (6)
A. User session data
B. Stack space
C. Cursor state
D. Shared pool and other memory structures

A, B, and C is the correct answer.


13. What command would you execute to decrease the size of the shared pool from 50MB to
20MB? (5)
A. ALTER SESSION set SHARED_POOL_SIZE 50m;
B. ALTER SYSTEM set SHARED_POOL_SIZE = 50m;
C. ALTER SYSTEM set SHARED_POOL_SIZE 20m;
D. ALTER SYSTEM set SHARED_POOL_SIZE = 20m;

The correct answer is D.


14. Oracle Managed Files are established by setting what two of the following parameters?(6)
A. DB_CREATE_FILE_DEST
B. DB_FILE_NAME_CONVERT
C. DB_FILES
D. DB_CREATE_ONLINE_LOG_DEST_N

The correct answers are A and D.


SQL
15. What ALTER SYSTEM command sets the default directory for OMF? (5)
A. ALTER SYSTEM set DB_CREATE_FILE_DEST '/u01/exam_files';
B. ALTER SYSTEM set DB_CREATE_ONLINE_LOG_DEST_1 '/u01/exam_files';
C. ALTER SYSTEM set DB_CREATE_FILE_DEST = '/u01/exam_files';
D. ALTER SYSTEM set DB_CREATE_ONLINE_LOG_DEST_2 =
'/u01/exam_files';

The correct answer is C.


16. Which of the following commands will fail if you are using OMF? (4)
A. CREATE TABLESPACE ocp_data size 2m;
B. CREATE TABLESPACE ocp_data datafile '/u01/exam.ora' size 2m;
C. CREATE TABLESPACE ocp_data datafile 2m;
D. CREATE TABLESPACE ocp_data;

The correct answer is A.


17. Which one of the following statements is true regarding PCTFREE? (4)
A. It specifies the minimum percentage of used space that the Oracle server tries to
maintain for each data block of the table.
B. It specifies the percentage of space in each block reserved for growth resulting from
changes.
C. The default value is 40 percent.
D. It is not used for index segments.

The correct answer is B


18. Match the STATSPACK activity (from the first set of answers, uppercase A-D) with the
correct statement to perform the activity (from the second set of answers, lowercase a-d).
(6)

A. Install STATSPACK                      a. $ORACLE_HOME/rdbms/admin/spauto.sql
B. Collect statistics                     b. $ORACLE_HOME/rdbms/admin/spreport.sql
C. Automatically collect statistics       c. $ORACLE_HOME/rdbms/admin/spcreate.sql
D. Produce a report                       d. execute STATSPACK.snap

The correct matches are A and c, B and d, C and a, and D and b.


19. What two memory areas are located within the shared pool? (3)
A. Library cache
B. Large pool
C. Keep buffer pool
D. Dictionary cache
E. Default buffer pool

The correct answers are A and D.


20. What init.ora parameter determines the amount of memory that a server process should
use for a sort process? (3)
A. SORT_AREA_SIZE
B. SORT_AREA_RETAINED_SIZE
C. SORT_MEMORY
D. SORT_DISK

The correct answer is A.


21. What two statements are true with regard to when a dedicated server configuration is
used? (5)
A. Sort space is part of the shared pool.
B. Sort space is part of the PGA.
C. Sort space is part of the large pool.
D. The parameter SORT_AREA_RETAINED_SIZE can be set dynamically.

The correct answers are B and D.


22. Consider the following SQL statement: (4)

select last_name, first_name,


department, salary
from employee
order by last_name;

As the DBA, assume that you did some research and found the following:

1. There is one index on the employee table that is based on the employee_id column.
2. The user SCOTT who executes the statement has the permanent tablespace USER01
assigned as both his default and temporary tablespaces.
3. The sort operation is too large to fit within the memory space specified by
SORT_AREA_SIZE.

What two actions can you take to optimize the query?

A. Assign SCOTT to a temporary tablespace that will be used for sort segments.
B. Increase the value of SORT_AREA_RETAINED_SIZE.
C. Create an index on the last_name column of the employee table.
D. Employ a shared server configuration to allow the sort space to be allocated in the
shared pool instead of the PGA.
The correct answers are A and C.
Oracle Objects
23. Select the statements that are true about bitmap index structures: (5)
A. Use for high-cardinality columns.
B. Good for multiple predicates.
C. Use minimal storage space.
D. Best for read-only systems.
E. Updates on key values are relatively inexpensive.
F. Good for very large tables.

The correct answers are B, C, D, and F.


24. Select the statements that are true about materialized views: (5)
A. Fast refreshes apply only to changes made since the last refresh.
B. A complete refresh of a materialized view involves truncating existing data and
reinserting all the data based on the detail tables.
C. A view defined with a refresh type of Force always performs a complete refresh.
D. A refresh type of Never suppresses all refreshes of the materialized view.
E. Materialized views use the same internal mechanism as snapshots for refresh.

The correct answers are A, B, D, and E.


25. What are external tables in oracle? (5)

Answer:
External tables allow Oracle to query data that is stored outside the database in flat files.
The ORACLE_LOADER driver can be used to access any data stored in any format that can be
loaded by SQL*Loader. No DML can be performed on external tables but they can be used for
query, join and sort operations. They are useful in the ETL process of data warehouses since
the data doesn't need to be staged and can be queried in parallel. They should not be used for
frequently queried tables.

Steps to create :
1. A directory object needs to be created (pointing to the OS directory that holds the flat file).
2. Edit the flat file and put it in that path.
3. Create the table with ORGANIZATION EXTERNAL.

Syntax :

CREATE TABLE <table_name>
( <column definitions> )
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY <directory_name>
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY <char>
    FIELDS TERMINATED BY <char>
    ( <column list> )
  )
  LOCATION ('<filename>')
);
LEVEL-1 Between 35 And 50 Marks
LEVEL-2 Between 51 And 80 Marks

LEVEL-3 Above 80 Marks


BI & DW Technologies
Q. No. Question
1. How do you decide whether to build data marts or an EDW? (5)

Answer:
Data marts are logical/physical subsets of the Enterprise Data Warehouse (EDW). This question has
more to do with Kimball's view (bottom-up approach) versus Inmon's view (top-down approach)
of building an EDW.
In the bottom-up approach (Kimball's view), we build physical data marts (catering to particular
departments within the organization, e.g. Sales, HR), integrate them with other data marts over a
period of time using conformed dimensions, and build the bus architecture (EDW).
In the top-down approach (Inmon's view), we build the bigger picture, i.e. the EDW (for the entire
organization), and later, depending on specific departmental requirements, separate logical /
physical subsets of the EDW in the form of data marts.
Criteria for choice of approach:
If the customer wants quick results involving shorter iterations of development, go in for data
marts. This will give the customer confidence in using BI and DW as the ideal technology for taking
strategic decisions.
Data marts are easier to manage.
If the customer has the budget and can wait longer for results, go in for the EDW.
But during the long development cycle of an EDW, business sponsors may change and priorities
may also change, leading to termination of the EDW project.

2. What are slowly changing dimensions? What are the various methods of handling them?(5)

Answer:
Dimensions whose attribute values change over time are called Slowly Changing Dimensions.
1) Type 1 – The old record is updated with the new data. History is not maintained.
2) Type 2 – History of changes is maintained by adding new records.
3) Type 3 – A new attribute (column) is added to hold the previous value, so only the
original/previous and current values are maintained.
3. What is an Operational Data Store (ODS)? How different it is from a data warehouse? (5)

Answer:
ODS :-
• It is a subject-oriented, integrated, volatile, current-valued collection of data in support of an
organization's need for up-to-the-second, operational, integrated, collective information.
• The ODS is a strictly operational construct and it provides mission-critical data. It does not
store summary information because it is dynamic.
• An operational data store (ODS) is a type of database often used as an interim area for a data
warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are
updated through the course of business operations. An ODS is designed to quickly perform
relatively simple queries on small amounts of data (such as finding the status of a customer
order), rather than the complex queries on large amounts of data typical of the data
warehouse. An ODS is similar to your short-term memory in that it stores only very recent
information; in comparison, the data warehouse is more like long-term memory in that it
stores relatively permanent information.

Distinctive traits of ODS and DW :-

• ODS has volatile data, while DW contains non-volatile data.
• ODS contains only the current data, whereas DW contains both current and historical data,
i.e. DW contains data that is no more current than 24 hours, but ODS contains data that
may be only seconds old.
• A major difference is that ODS contains detailed data, while DW contains both detailed and
summary data.
• The type of data in an ODS is different from that in a DW. The ODS has a system of record,
which is a formal identification of the data in the legacy systems that feed the ODS. The DW has
summary-based data that is stored for analysis and reporting.
• The ODS can be a source for the DW, but the DW cannot be a source for the ODS.
4. What are the data warehouse architecture goals? (5)

Answer:
There are different kinds of data warehouse architecture that we can design:
Top-down architecture.
Bottom-up architecture.
Federated architecture.
All the architectures are useful for analysis and business decision making.
Some of the goals are
• Easy analysis by maintaining a centralized, historical, integrated warehouse database.
• Analysis by writing complex queries for decision making according to business
requirements.
• Creating data marts specific to subject areas for maintaining security for end users.
• We cannot modify the data in the warehouse database; we can only append new
sources for multidimensional analysis.
• We can query the data along different dimensions and generate reports in different forms.
• Forecasting from existing data depending on the time period.
• Mostly useful for analysis at corporate level rather than transaction processing.
• Extensibility by anticipating future end-user needs and providing a "roadmap" that
reveals where such needs are addressed (e.g. where and how does the financial budget
management tool fit into the data warehouse architecture?).
• Reusability by documenting reusable components, processes, etc. (e.g. after documenting
and revising the process of building the first data mart, the process should be reused to
build subsequent data marts).
• Improved productivity by enabling reusability and revealing where specific tools may be
necessary to automate data warehouse processes (e.g. how will the incoming data be
analyzed and cleansed?).
5. What is a star schema? (2)

Answer:
In the star schema there is a central fact table in which the numerical measurements of the
business are stored. This is surrounded by dimension tables which contain the textual
descriptions of the dimensions of the business. Each dimension table has a single join
attaching it to the central fact table.
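
A minimal sketch of a star schema in SQL (all table and column names are illustrative):

CREATE TABLE dim_date    (date_key    NUMBER PRIMARY KEY, calendar_date DATE, month_name VARCHAR2(20));
CREATE TABLE dim_product (product_key NUMBER PRIMARY KEY, product_name  VARCHAR2(100));
CREATE TABLE dim_store   (store_key   NUMBER PRIMARY KEY, store_name    VARCHAR2(100));

-- Central fact table: one row per product per store per day
CREATE TABLE fact_sales (
   date_key     NUMBER REFERENCES dim_date,
   product_key  NUMBER REFERENCES dim_product,
   store_key    NUMBER REFERENCES dim_store,
   sales_amount NUMBER,
   units_sold   NUMBER
);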

6. What is a snowflake schema? (2)

Answer:
This is an extension of the star schema; here low-cardinality, redundant attributes are moved to
sub-dimension tables. This is done to save storage space for large dimensions.
7. What is a surrogate key? Justify its usage in a DWH environment (5)

Answer:
Surrogate keys are used to uniquely identify a record in a DWH environment. The following
situations may arise in an OLTP production environment, resulting in the need for generating /
using surrogate keys:
• Production may reuse keys that it has purged but that you are still maintaining.
• Production may make a mistake and reuse a key even when it isn't supposed to. This
happens frequently in the world of UPCs in the retail world, despite everyone's best intentions.
• Production may recompact its key space because it has a need to garbage-collect the
production system.
• Production may legitimately overwrite some part of a product description or a customer
description with new values but not change the product key or the customer key to a new
value.
• Production may generalize its key format to handle some new situation in the transaction
system. Now the production keys that used to be integers become alphanumeric. Or
perhaps the 12-byte keys have become 20-byte keys.
• The company has made an acquisition and there is a need to merge more than a million
new customers into the master customer list. The newly acquired production system has
nasty production keys that don't look remotely like the others.
8. How will we capture the information in an Order Fact table with granularity of order -
item level in the following scenario -

An order having more than one item of the same nature, e.g. two copies of Times Magazine having the
same Universal Magazine Code. (3)

Answer:
In this scenario, the composite key (based on the combination of dimension table primary keys)
will fail to give a unique record. We need to have a surrogate key defined for the fact table which
can be simple sequence numbers. This will serve as unique identifier of each record in the fact
table.
9. What are the Components of typical data warehouse architecture? (5)

Answer:
Note: Below is the detailed explanation for the above question, if the candidate addresses the key
points with relevant explanation award marks.

Key Component Areas


A complete data warehouse architecture includes data and technical elements. Thornthwaite
breaks down the architecture into three broad areas. The first, data architecture, is centered
on business processes. The next area, infrastructure, includes hardware, networking,
operating systems, and desktop machines. Finally, the technical area encompasses the
decision-making technologies that will be needed by the users, as well as their supporting
structures. These areas are detailed in the sub-sections below.

Data Architecture
As stated above, the data architecture portion of the overall data warehouse architecture is
driven by business processes. For example, in a manufacturing environment the data model
might include orders, shipping, and billing. Each area draws on a different set of
dimensions. But where dimensions intersect in the data model the definitions have to be the
same—the same customer who buys is the same that builds. So data items should have a
common structure and content, and involve a single process to create and maintain.

Thornthwaite says that organizations often ask how data should be represented in the
warehouse—entity/relationship or dimensional? “If you have a star schema then use
dimensional. Is your detail normalized or dimensional? Will users be querying detail? Then
use dimensional.” He adds that most data warehousing experts are in substantial agreement;
the [data] sources are typically entity/relationship models and the front end is a dimensional
model. The only issue is where you draw the line between the warehouse itself and the data
staging area.

As you work through the architecture and present data to your users, tool choices will be
made, but many choices will disappear as the requirements are set. For example, he
explains that product capabilities are beginning to merge, like MOLAP and ROLAP.
“MOLAP is okay if you stay within the cube you've built. It's fast and allows for flexible
querying—within the confines of the cube.” Its weaknesses are size (overall and within a
dimension), design constraints (limited by the cube structure), and the need for a proprietary
database.

Infrastructure Architecture
With the required hardware platform and boxes, sometimes the data warehouse becomes its
own IS shop. Indeed, there are lots of “boxes” in data warehousing, mostly used for data
bases and application servers.

The issues with hardware and DBMS choices are size, scalability, and flexibility. In about
80 percent of data warehousing projects this isn't difficult; most businesses can get enough
power to handle their needs.

In terms of the network, check the data sources, the warehouse staging area, and everything
in between to ensure there's enough bandwidth to move data around. On the desktop, run
the tools and actually get some data through them to determine if there's enough power for
retrieval. Sometimes the problem is simply with the machine, and the desktops must be
powerful enough to run current-generation access tools. Also, don't forget to implement a
software distribution mechanism.

Technical Architecture
The technical architecture is driven by the meta data catalog. “Everything should be meta
data-driven,” says Thornthwaite. “The services should draw the needed parameters from
tables, rather than hard-coding them.” An important component of technical architecture is
the data staging process, which covers five major areas:

• Extract - data comes from multiple sources and is of multiple types. Data
compression and encryption handling must be considered at this area, if it applies.
• Transform - data transformation includes surrogate key management,
integration, de-normalization, cleansing, conversion, aggregation, and auditing.
• Load - loading is often done to multiple targets, with load optimization and
support for the entire load cycle.
• Security - administrator access and data encryption policies.
• Job control - this includes job definition, job scheduling (time and event),
monitoring, logging, exception handling, error handling, and notification.

The staging box needs to be able to extract data from multiple sources, like MVS, Oracle,
VM, and others, so be specific when you choose your products. It must handle data
compression and encryption, transformation, loading (possibly to multiple targets), and
security (at the front end this is challenging, Thornthwaite says). In addition, the staging
activities need to be automated. Many vendors' offerings do different things, so he advises
that most organizations will need to use multiple products.

A system for monitoring data warehouse use is valuable for capturing queries and tracking
usage, and performance tuning is also helpful. Performance optimization includes cost
estimation through a “governor” tool, and should include ad hoc query scheduling.
Middleware can provide query management services. Tools for all of these and other
related tasks are available for the front end, for server-based query management, and for
data from multiple sources. Tools are also available for reporting, connectivity, and
infrastructure management. Finally, the data access piece should include reporting services
(such as publish and subscribe), a report library, a scheduler, and a distribution manager.
10. Explain the role of metadata in data warehousing environment? Who are the users of
metadata? (5)

Answer:
Role of Metadata:
Provide a simple catalogue of business metadata descriptions and views
Document/manage metadata descriptions from an integrated development environment
Enable DW users to identify and invoke pre-built queries against the data stores
Design and enhance new data models and schemas for the data warehouse
Capture data transformation rules between the operational and data warehousing databases
Provide change impact analysis, and update across these technologies

Users of Metadata:
Technical Users - Warehouse Administrators, Application Developers
Business Users
11. What are the different phases involved in data warehousing development lifecycle? (5)

Answer:
The different phases involved in datawarehousing development life cycle are
1) Planning
2) Gathering Data requirements and modeling
3) Physical database design and development
4) Development and Implementation
5) Deployment
12. What are fact less fact tables? Mention the different kinds of factless fact tables and explain
with an example. (8)
Answer:
Fact tables which have no measured facts. There are two kinds:
1) Fact table to record events.
Example - Student attendance at a college.
The dimensions include -
Date : One record for each day on the calendar.
Student : One record for each student.
Course : One record for each course taught each semester.
Teacher : One record for each teacher.
Facility : One record for each room, laboratory or athletic field.

The grain of the fact table is the individual student attendance event.
When the student walks through the door into the lecture, a record is generated. It is clear
that these dimensions are all well defined and the fact table record, consisting of just the five
keys, is a good representation of the student attendance event.

The only problem is that there is no obvious fact to record each time a student attends a
lecture or suits up for physical education. This table records the student attendance process and
not a semester grading process.

2) Coverage table. – Coverage tables are frequently needed when a primary fact table in a
dimensional data warehouse is sparse.

For example, let us consider a sales fact table. This records the sales details or sale events that
happened. It cannot tell what did not happen, i.e., "which products were on promotion that
did not sell?", because it contains only the products that did sell.

The coverage table comes to the rescue. A record is placed in the coverage table for each
product in each store that is on promotion in each time period. The items not on promotion
that also did not sell can be left out.

Answering the question "Which products were on promotion that did not sell?" requires a two-step
application. First, consult the coverage table for the list of products on promotion that day in
that store. Second, consult the sales table for the list of products that did sell. The desired
answer is the set difference between these two lists of products.

Another application of a coverage table - It is useful for recording the assignment of sales teams
to customers in businesses in which the sales teams make occasional very large sales. In such
a business, the sales fact table is too sparse to provide a good place to record which sales
teams were associated with which customers, even if some of the combinations never resulted
in a sale.
13. What are the basic load types required while building DW: (4)

Answer:
Full Load: Truncate the target table and reload all rows from the source. This type of load is the
simplest to implement, as there is no update type involved and no need to do a lookup to the
target table to establish whether a row is to be updated or inserted. Here, the target table
will be truncated and loaded, and this type of load is mainly used for staging table loads.

Incremental Load: Only load the rows which are either new or have been updated since they
were originally loaded. This type of load is more complex to implement and can be achieved in
one of two ways:

1) Select all rows from the source table and let the ETL process establish which rows are
to be updated, inserted or ignored. This is implemented through a lookup to the target table.
2) Only process the rows which are new or have been updated since the last extract. This would
involve discarding the rows which have not changed since the last extract.
14. What are the basic issues with the Snowflake Schema? (3)
Answer:
• Only a few tools are optimized for this schema
• More complex presentation, as not all tools are optimized for it
• Browsing is slower; the report drill-down process is also slower
• Problems with multiple joins
15. What are the basic steps needed to be considered for Dimensional modeling? (4)
Answer:
• Identify the Business Process
o A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse
o Example: orders, invoices, shipments, inventory, sales
• Identify the Grain
o The fundamental, lowest level of data represented in a fact table for the business process
o Example: individual transactions, individual daily snapshots
• Identify the Dimensions
o Choose the dimensions that will apply to each fact table record
• Identify the Facts
o Choose the measured facts that will populate each fact table record
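To make these four steps concrete, here is a small, hypothetical design sketch in Python for an "orders" business process; the table and column names are illustrative assumptions only, not taken from any particular system.

```python
# Hypothetical star-schema design for an "orders" business process,
# expressed as a plain data structure that mirrors the four design steps.
orders_star_schema = {
    "business_process": "orders",                      # step 1: the process being modeled
    "grain": "one row per order line item",            # step 2: lowest level of detail
    "dimensions": [                                    # step 3: descriptive entry points
        "date_dim", "customer_dim", "product_dim", "salesperson_dim",
    ],
    "facts": [                                         # step 4: measured, mostly additive values
        "quantity_ordered", "gross_amount", "discount_amount",
    ],
}

for step in ("business_process", "grain", "dimensions", "facts"):
    print(f"{step}: {orders_star_schema[step]}")
```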
What are the various types of facts? Describe and give 2 examples each. (3)
Answer:
Additive Facts: Facts that can be meaningfully added along any dimension. (Ex: dollar sales, units sold)
Non-Additive Facts: Facts that cannot be meaningfully added along any dimension. (Ex: unit price, order number, item number)
Semi-Additive Facts: Facts that may be added along certain dimensions but not all. (Ex: customer counts, which are non-additive across the product dimension but additive across the customer dimension; account balance, which is semi-additive over the time dimension and must be averaged rather than summed across time)
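A small Python illustration of the difference, using hypothetical daily account-balance snapshots: summing across accounts is meaningful, but across time the balance must be averaged, not summed.

```python
# Hypothetical daily balance snapshots: (day, account, balance)
snapshots = [
    ("2024-01-01", "A", 100), ("2024-01-01", "B", 200),
    ("2024-01-02", "A", 100), ("2024-01-02", "B", 200),
]

# Additive across the account dimension: total balance held on a given day.
total_on_jan1 = sum(b for d, a, b in snapshots if d == "2024-01-01")
print(total_on_jan1)                      # 300 -- a meaningful figure

# NOT additive across the time dimension: summing account A over two days
# double-counts the same money; the balance must be averaged over time instead.
a_balances = [b for d, a, b in snapshots if a == "A"]
print(sum(a_balances))                    # 200 -- misleading
print(sum(a_balances) / len(a_balances))  # 100.0 -- the semi-additive treatment
```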
16. What are the different architectural approaches of DM? (4)
Answer:
• Satellite data mart
• Tactical and quickly developed data mart
• Feeder data marts
• Partition data marts
17. When speed and cost are constraints, what architectural approach will you adopt for your DM? (2)
Answer:
Tactical and quickly developed data mart:
• A tactical data mart is typically completed in 90–120 days rather than the much longer time required for a full-scale data warehouse.
• Since the time to develop the system is less, the cost automatically becomes less.
18. Which is a subject-oriented view of a data warehouse? (2)
Answer:
A data mart is a subject-oriented view of a data warehouse because it is related to a specific subject area such as Accounting, Finance, HR, MM, PM, etc.
19. What is the main difference between Data Warehousing and Business Intelligence? (4)
Answer:
The differences are:
DW - is a way of storing data and creating information through leveraging data marts. DMs are segments or categories of information and/or data that are grouped together to provide information about that segment or category. A DW does not require BI to work; reporting tools can generate reports from the DW.
BI - is the leveraging of the DW to help make business decisions and recommendations. Information and data rules engines are leveraged here to help make these decisions, along with statistical analysis tools and data mining tools.
You will find that BI is much like ERP in that it can be extremely expensive and invasive to your
firm and there is a wide range between the offerings - low end to high end - which facilitates the
pricing. There is a long list of tools to select from. There are also services that provide this as an
outsource. Some of these services will allow you to eventually 'own' the solution and in-source it
at a future date. Like anything else, this comes at a price. This scenario works well for those who
do not have a high caliber IT staff and would like to get results with a short ramp up time,
basically because the system is already built. Your rules and reports just have to be generated.
That is a bit oversimplified, but you get the picture.
LEVEL-1 Between 30 And 45 Marks
LEVEL-2 Between 46 And 60 Marks
LEVEL-3 Above 60 Marks
Data Analysis
Q. No. Question
1. What is data analysis and what are the essential steps involved in data analysis? (4)
Answer:
Data analysis is a process wherein data from a source system is analyzed for its completeness, integrity and quality; it identifies the missing elements in the source data and the codification problems in the data coming from various systems.

Data analysis essentially involves the following steps:
• Identify source systems
• Data profiling, involving the following activities (a minimal profiling sketch follows this list):
o Identify the data patterns across the source systems.
o Identify the business rules associated with the data.
o Identify the missing data elements.
• Prepare a data analysis document, which gives the details of the various data files / tables from the source system, the data integrity rules, and the business rules associated with the data elements in the source system.
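As an illustration of the kind of checks done during profiling, here is a minimal, hypothetical Python sketch that reports null counts, distinct values and crude format patterns for each column of an in-memory data set; real profiling would of course run against the actual source files or tables, and the column names here are assumptions.

```python
import re
from collections import Counter

# Hypothetical source rows pulled from one source table.
rows = [
    {"cust_id": "C001", "phone": "98400-12345", "city": "Pune"},
    {"cust_id": "C002", "phone": None,          "city": "Pune"},
    {"cust_id": "C003", "phone": "9840012345",  "city": "Delhi"},
]

def pattern(value):
    """Reduce a value to a crude format pattern: digits -> 9, letters -> A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

for column in rows[0]:
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    patterns = Counter(pattern(v) for v in non_null)
    print(f"{column}: nulls={values.count(None)}, "
          f"distinct={len(set(non_null))}, patterns={dict(patterns)}")
```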
2. What is data profiling? What will be the outcome of the data profiling stage in the data analysis process? (4)
Answer:
Data profiling is the process of understanding the data, its characteristics and its relationships with associated data.
The following will be the outcomes of the data profiling activity:
a. A source (data file / tables) identification document
b. Source to target mapping document
c. Enumeration definition document
d. Data dictionary identifying all the source fields and their descriptions.
3. What should be done to analyse the following?
a) Heterogeneous systems
b) Homogeneous systems
c) Large volumes of data (2)
Answer:
Data Sampling.
4. What is data quality analysis and what are the activities involved in data quality analysis? (4)
Answer:
Data quality analysis is the process of examining inconsistencies in the source data. This involves identifying inconsistencies in the following areas (a minimal sketch of such checks follows the list):
1. Codification differences across various source systems (e.g., use of different codes / values to convey the same meaning)
2. Multiple entries of the same entity
3. Out of range values / Missing values.
4. Identifying uniqueness of keys
5. Ensuring referential integrity
6. Business rules compliance
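The sketch below shows, in plain Python and purely as an illustration, a few of these checks (duplicate entries, out-of-range values, uniqueness of keys and referential integrity) over hypothetical in-memory records; the column names and the 0–120 age rule are assumptions made for the example.

```python
# Hypothetical customer and order records from a source system.
customers = [
    {"cust_id": 1, "age": 34},
    {"cust_id": 2, "age": 210},     # out-of-range age
    {"cust_id": 2, "age": 45},      # duplicate key
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 9}, # orphan: no such customer
]

issues = []

# Uniqueness of keys / multiple entries of the same entity.
seen = set()
for c in customers:
    if c["cust_id"] in seen:
        issues.append(("duplicate key", c))
    seen.add(c["cust_id"])

# Out-of-range values (assumed business rule: 0 <= age <= 120).
issues += [("age out of range", c) for c in customers if not 0 <= c["age"] <= 120]

# Referential integrity: every order must point at an existing customer.
cust_keys = {c["cust_id"] for c in customers}
issues += [("orphan order", o) for o in orders if o["cust_id"] not in cust_keys]

for kind, record in issues:
    print(kind, record)
```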
5. What are the scenarios where data analysis will be helpful? (3)
Answer:
1) Data Migration
2) Data Cleansing for DWH
3) Data Modelling
6. What is data cleansing? How is this process carried out? (2)
Answer:
Data cleansing is an activity that involves identifying unwanted data in the source systems and defining rules to filter this data before migrating it into the target system. Data cleansing requires processing at field level to eliminate the unwanted data; hence business rules or validations need to be developed for each field that has to be filtered during the cleansing process. The data rejected by this process is stored in log files for verification.
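Purely as an illustration of field-level cleansing rules with a reject log, here is a small Python sketch; the field names and validation rules are hypothetical assumptions.

```python
# Hypothetical field-level validation rules: field name -> predicate.
rules = {
    "email":  lambda v: isinstance(v, str) and "@" in v,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}

source = [
    {"email": "a@x.com", "amount": 120.0},
    {"email": "not-an-email", "amount": 50.0},
    {"email": "b@y.com", "amount": -10},
]

clean, reject_log = [], []
for record in source:
    failed = [f for f, ok in rules.items() if not ok(record[f])]
    if failed:
        # Rejected rows go to a log with the failing fields, for later verification.
        reject_log.append({"record": record, "failed_fields": failed})
    else:
        clean.append(record)

print(len(clean), "clean rows;", len(reject_log), "rejected rows")
for entry in reject_log:
    print("rejected:", entry)
```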
LEVEL-1 Above 10 Marks
LEVEL-2 Between 15 And 19 Marks
Data Migration
Q. No. Question
1. What is data migration? (2)
Answer:
Data migration is a process wherein data from single or multiple source systems needs to be transferred into a target system. The source systems can be of a similar type or heterogeneous in nature. Data from one file or table in the source can be split across multiple tables in the target system. The migration process should have a mechanism to reject erroneous records and load them into an error table for validation or correction.
2. What are the most critical steps involved in data migration? (4)
Answer:
The data migration process is critical to the functioning of any system; hence this process should ensure that the data migrated is accurate and clean. Data migration has the following processes:
a) Data Analysis
b) Identify the data to be migrated / Extract the data from source systems.
c) Design source and target mappings.
d) Define business rules for transforming the data.
e) Define the values required for mandatory missing fields from the source data.
f) Design error handling and logging mechanism.
g) Identify tools / mechanism to carry the migration process.
h) Design data validation rules to validate the data migrated.
3. What would be the suggested approach for migrating data from various source systems?
i.e. how to consolidate data from various source systems. (2)
Answer:
The data migration process migrates data from a single source system or from multiple source systems. The data to be migrated needs to be analyzed, and profiling should be done to identify the essential data
elements from the source system. The data from various source systems can be extracted into
data files with a standard format defined. These data files will be loaded into the staging tables
during the migration process to consolidate the data from multiple source systems. The staging
table will have the same structure as the source data file. The data from this staging table will
then be processed and transformed before loading into target tables.
4. What is ETL? (3)
Answer:
ETL is basically a combination of
Extraction: Extracting the required data from source systems in a format defined during the
data analysis.
Transformation: Transforming the source data extracted by applying the rules defined during
the data profiling stage to load the data into the respective target tables.
Loading: Loading data into the target tables and eliminating unwanted data and storing it in
the error tables for verification.
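To tie the three stages together, here is a deliberately tiny Python sketch of an extract–transform–load flow over in-memory data; the record layout, conversion rules and "error table" list are assumptions made only for the example.

```python
def extract():
    # Extraction: pull rows from the source in the agreed format.
    return [{"id": "1", "amount": " 100 "}, {"id": "x", "amount": "20"}]

def transform(rows, errors):
    # Transformation: apply the rules defined during profiling; bad rows go to errors.
    out = []
    for row in rows:
        try:
            out.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            errors.append(row)          # kept aside for verification
    return out

def load(rows, target):
    # Loading: write the transformed rows into the target table.
    target.extend(rows)

target_table, error_table = [], []
load(transform(extract(), error_table), target_table)
print("loaded:", target_table)          # [{'id': 1, 'amount': 100.0}]
print("errors:", error_table)           # [{'id': 'x', 'amount': '20'}]
```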
The ETL process can be carried out using commercially available off-the-shelf tools or by means of custom-developed tools. The decision to go for a custom tool or a standard application depends on the complexity of the migration process, the data volume, the nature of the source and target systems, and also the budget allocated for the migration process.
5. What is data mapping, and what is its role in the data migration process? (2)
Answer:
Data mapping involves providing a mapping between the source data fields and the target data fields. The mapping can be direct, where a source field is moved straight into a target field; through a transformation rule on the field; or by means of data enrichment, wherein a new record is created in the target table based on certain business rules.
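A source-to-target mapping is often captured as a simple specification. The sketch below shows one hypothetical way to represent direct, rule-based and enrichment mappings in Python; the field names and rules are illustrative assumptions.

```python
# Hypothetical source-to-target mapping specification.
mapping = [
    {"target": "customer_name", "type": "direct",    "source": "cust_nm"},
    {"target": "country_code",  "type": "rule",      "source": "country",
     "rule": lambda v: {"India": "IN", "Germany": "DE"}.get(v, "XX")},
    {"target": "load_status",   "type": "enrichment",
     "rule": lambda row: "MIGRATED"},      # derived value, not present in the source
]

def apply_mapping(source_row):
    target_row = {}
    for m in mapping:
        if m["type"] == "direct":
            target_row[m["target"]] = source_row[m["source"]]
        elif m["type"] == "rule":
            target_row[m["target"]] = m["rule"](source_row[m["source"]])
        else:  # enrichment: value derived from the whole source row / business rule
            target_row[m["target"]] = m["rule"](source_row)
    return target_row

print(apply_mapping({"cust_nm": "Asha", "country": "India"}))
# {'customer_name': 'Asha', 'country_code': 'IN', 'load_status': 'MIGRATED'}
```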
6. Which stage of data migration requires defining business rules? How are they applied to the data being migrated? (3)
Answer:
Transformation stage.
The business rules defined will be used either to transform the source data as per the target
database design or to design validation criteria on the source fields before moving data into the
target table fields.
7. How is error handling implemented during migration, and how do you ensure that there is no data loss due to rejections during data migration? (3)
Answer:
Error handling and error logging are essential for data migration; they help in identifying the corrupt and unwanted data to be removed from the source and logged into error tables for verification. The error handling should be strong enough to filter the erroneous records from the source data. The rejected records should be stored in the error tables with an appropriate description indicating why the records were rejected, together with a reference back to the source data or records.
8. What is a staging area? How is it different from the target tables? (2)
Answer:
A staging area is a temporary data area in the target database which holds tables to store the data extracted from the various source systems. The table structure in the staging area is similar to the source file / table structure.
9. What is transformation? (3)
Answer:
In a migration process it may not be possible to migrate data directly into the target system, owing to the target design, changes in the business requirements, or the need to optimize data usage. Hence the source data needs to undergo a few changes to cater to the needs of the target system and the business requirements. The process of applying changes to the source data, or enriching the source data, before migrating it into the target system is called the 'Transformation' process. This process is based on the business rules defined and the default values identified during the data profiling stage.
10. What are the different transformation techniques? Explain each of them. (5)
Answer:
The following are the basic techniques used in the transformation process (a small illustrative sketch follows the list):
a) Structural transformation
In this transformation there is a change in the structure of the source record with respect to the target database. This is a record-level transformation process.
b) Content transformation
In content transformation, changes happen in the data values of a record. This is an attribute-level transformation. In this technique, algorithms and transformation rules / tables are used for changing the content of the source to that of the target database.
c) Functional transformation
Functional transformation results in the creation of new records in the target database based on the source data. This happens through data aggregation or by combining one or more new attributes from a single source record or from multiple source records.
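As a purely illustrative Python sketch of the three techniques (with made-up record layouts): the structural step reshapes a record, the content step changes an attribute's value via a lookup table, and the functional step derives a new aggregate record.

```python
source_records = [
    {"cust": "A", "country": "India",   "q1": 10, "q2": 15},
    {"cust": "B", "country": "Germany", "q1": 5,  "q2": 5},
]

# a) Structural transformation: reshape each record to the target layout
#    (one row per customer per quarter instead of quarters as columns).
structural = [
    {"cust": r["cust"], "quarter": q, "amount": r[q]}
    for r in source_records for q in ("q1", "q2")
]

# b) Content transformation: change attribute values using a lookup table.
country_codes = {"India": "IN", "Germany": "DE"}
content = [dict(r, country=country_codes[r["country"]]) for r in source_records]

# c) Functional transformation: create new records by aggregation.
functional = [{"measure": "total_q1_amount",
               "value": sum(r["q1"] for r in source_records)}]

print(structural)
print(content)
print(functional)
```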
11. Following is the requirement for designing an error handling and logging mechanism in a data migration process.
Data from two source systems, having different data structures for storing the same information, needs to be migrated into a target RDBMS system. The data is very critical to the system; hence there should not be any data loss during the migration process. It is therefore required to capture all the data which is rejected during the migration process, along with the reasons why the data was rejected. The error logging mechanism should provide features to trace the rejected data back to the source records by looking into the error table.
Design an error logging and error handling mechanism for the above requirement. (3)
Answer:
1) Traceability
2) Output to be logged to a reject file / table
3) Type of error
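One way to meet these three requirements is sketched below in Python: each rejected record is written to a reject structure together with the error type and enough source identifiers to trace it back. The field names, record layouts and error categories are assumptions made only for this illustration.

```python
# Hypothetical records from two source systems storing the same information
# in different structures; each carries its system name and source key.
source_a = [{"sys": "A", "src_key": 101,   "cust": "Asha",        "bal": "1200"}]
source_b = [{"sys": "B", "src_key": "B-7", "customer_name": "",   "balance": "abc"}]

error_table, target_table = [], []

def reject(record, error_type, reason):
    # Traceability: keep the source system and source key with every rejection,
    # plus the type of error and a human-readable reason.
    error_table.append({"source_system": record["sys"],
                        "source_key": record["src_key"],
                        "error_type": error_type,
                        "reason": reason})

def migrate(record, name_field, balance_field):
    name, balance = record[name_field], record[balance_field]
    if not name:
        reject(record, "MISSING_VALUE", f"{name_field} is empty")
    elif not balance.isdigit():
        reject(record, "BAD_FORMAT", f"{balance_field} is not numeric")
    else:
        target_table.append({"name": name, "balance": int(balance)})

for r in source_a:
    migrate(r, "cust", "bal")
for r in source_b:
    migrate(r, "customer_name", "balance")

print("loaded:", target_table)
print("rejected:", error_table)
```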
LEVEL-1 Between 10 And 19 Marks
LEVEL-2 Between 20 And 32 Marks