OLTP                                          DWH
Application oriented                          Subject oriented
Used to run the business                      Used to analyze the business
Detailed data                                 Summarized and refined data
Current, up-to-date data                      Snapshot data
Isolated data                                 Integrated data
Repetitive access                             Ad hoc access
Clerical users                                Knowledge users
Performance sensitive                         Performance relaxed
Few records accessed at a time (tens)         Large volumes accessed at a time (millions)
Read/update access                            Mostly read (batch update)
No data redundancy                            Redundancy present
Database size 100 MB - 100 GB                 Database size 100 GB - a few terabytes
10. What is a slowly changing dimension? What kind of SCD is used in your project?
Dimension attribute values may change over time. (For example, a customer dimension has
customer_id, name, and address; a customer's address may change over time.)
How will you handle this situation?
There are 3 types: first, overwrite the existing record; second, create an additional new
record at the time of the change with the new attribute values;
third, create a new field in the original dimension table to keep the new value.
12. What are the types of index? And what type of index is used in your project?
Bitmap index, B-tree index, function-based index, reverse key index, and composite index.
We used bitmap indexes in our project for better performance; they suit the low-cardinality
columns and read-mostly queries typical of a warehouse.
13. How is your DWH data modeled? (Details about star schema)
14. A table has 3 partitions, but I want to update the 3rd partition. How will you do it?
Specify the partition name in the update statement. For example (assuming the 3rd partition is named part3):
UPDATE employee PARTITION (part3) a SET a.empno = 10 WHERE a.ename = 'Ashok';
15. When you give an update statement, how does the memory flow happen and how does Oracle
allocate memory for it?
Oracle first checks the shared SQL area to see whether the same SQL statement is already available; if it is, Oracle reuses it.
Otherwise it allocates memory in the shared SQL area, then creates run-time memory in the private SQL area
to build the parse tree and execution plan. Once complete, these are stored in the shared SQL area in the
previously allocated memory.
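The shared SQL area described above is, at heart, a cache keyed by the statement text. A toy Python sketch of that idea (the class, the plan tuple, and the parse counter are all illustrative stand-ins, not Oracle internals):

```python
# Toy model of a shared SQL area: hard-parse once per distinct statement
# text, then reuse the cached "parse tree / execution plan" afterwards.
class SqlCache:
    def __init__(self):
        self._plans = {}      # shared area: SQL text -> cached plan
        self.parse_count = 0  # number of hard parses performed

    def execute(self, sql):
        plan = self._plans.get(sql)
        if plan is None:                  # not in the shared area: hard parse
            self.parse_count += 1
            plan = ("PLAN", sql.upper())  # stand-in for a real execution plan
            self._plans[sql] = plan
        return plan                       # identical text: plan is reused

cache = SqlCache()
cache.execute("update emp set sal = sal * 1.1")
cache.execute("update emp set sal = sal * 1.1")  # reused, no second parse
print(cache.parse_count)  # 1
```

Note that the cache key is the literal statement text, which is why using bind variables (so repeated statements share one text) matters for shared-pool reuse.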
16. Write a query to find the 5th max salary (in Oracle, DB2, SQL Server).
Using a window function (supported by Oracle, DB2, and SQL Server 2005+):
SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) rnk FROM employee) WHERE rnk = 5;
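The pattern can be checked end to end with SQLite from Python (the table and salary figures below are invented for illustration; DENSE_RANK treats tied salaries as one rank, unlike ROW_NUMBER):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ename TEXT, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("A", 100), ("B", 200), ("C", 300),
                 ("D", 400), ("E", 500), ("F", 600)])

# Rank salaries from highest to lowest, then pick rank 5.
row = con.execute("""
    SELECT salary FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employee
    ) WHERE rnk = 5
""").fetchone()
print(row[0])  # 200: the 5th highest of 600, 500, 400, 300, 200, 100
```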
17. When you give an update statement, how does the undo/rollback segment work? What are the
steps?
Oracle keeps the old values in the undo segment and records the new values in redo entries. When you issue a rollback, it
restores the old values from the undo segment. When you commit, it releases the undo segment entries and
makes the new values permanent.
1. Can 2 Fact Tables share the same Dimension Tables? How many Dimension tables are associated with
one Fact Table in your project?
Ans: Yes
2.What is ROLAP, MOLAP, and DOLAP...?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In
these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite
different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing the
data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a
multidimensional manner.
In ROLAP, architects believe the data should be stored in the relational model; that is, OLAP
capabilities are best provided directly against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It creates
multidimensional datasets that can be transferred from server to desktop, requiring only the DOLAP
software to exist on the target system. This provides significant advantages to portable computer users,
such as salespeople who are frequently on the road and do not have direct access to their office server.
3.What is an MDDB? and What is the difference between MDDBs and RDBMSs?
Ans: MDDB stands for Multidimensional Database. There are two primary technologies used for
storing the data used in OLAP applications: multidimensional databases (MDDB) and relational
databases (RDBMS). The major difference between MDDBs and RDBMSs is in how they store data.
Relational databases store their data in a series of tables and columns. Multidimensional
databases, on the other hand, store their data in large multidimensional arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product,
and Location coordinates of 12-1-2001, Car, and south, respectively.
Advantages of MDDB:
Retrieval is very fast because
The data corresponding to any combination of dimension members can be retrieved with a
single I/O.
Data is clustered compactly in a multidimensional array.
Values are calculated ahead of time.
The index is small and can therefore usually reside completely in memory.
Storage is very efficient because
The blocks contain only data.
A single index locates the block corresponding to a combination of sparse dimension numbers.
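The array-addressing idea above can be sketched in a few lines of Python: the cube is a dense array indexed by dimension-member positions, so any (Date, Product, Location) combination is a direct lookup rather than a table scan (the member lists and sales figure are invented):

```python
# Dimension members map names to array coordinates.
dates     = {"12-1-2001": 0, "12-2-2001": 1}
products  = {"Car": 0, "Truck": 1}
locations = {"south": 0, "north": 1}

# The "MDDB": a dense 3-D array, sales[date][product][location].
sales = [[[0.0 for _ in locations] for _ in products] for _ in dates]
sales[0][0][0] = 125000.0   # Sales for 12-1-2001, Car, south

def lookup(date, product, location):
    # One direct index computation -- no scan over rows, as in an RDBMS.
    return sales[dates[date]][products[product]][locations[location]]

print(lookup("12-1-2001", "Car", "south"))  # 125000.0
```

The flip side, not shown here, is sparsity: a dense array reserves a cell for every combination, even combinations that never occur.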
4. What is MDB modeling and RDB Modeling?
Ans:
7. What is the difference btween OLTP & OLAP?
Ans: OLTP stands for Online Transaction Processing. This is a standard, normalized database structure.
OLTP is designed for transactions, which means that inserts, updates, and deletes must be fast.
Imagine a call center that takes orders. Call takers are continually taking calls and entering orders that
may contain numerous items. Each order and each item must be inserted into a database. Since the
performance of the database is critical, we want to maximize the speed of inserts (and updates and
deletes). To maximize performance, we typically try to hold as few records in the database as
possible.
OLAP stands for Online Analytical Processing. OLAP is a term that means many things to many people.
Here, we will use the terms OLAP and Star Schema pretty much interchangeably. We will assume that
a star schema database is an OLAP system. (This is not the same thing that Microsoft calls OLAP;
they extend OLAP to mean the cube structures built using their product, OLAP Services). Here, we will
assume that any system of read-only, historical, aggregated data is an OLAP system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used
to support decision-making in the organization. That is why many data warehouses are considered to be
DSS (Decision-Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated
data.
By read-only, we mean that the person looking at the data won't be changing it. If a user is looking at
yesterday's sales for a certain product, they should not have the ability to change that number.
The historical part may be just a few minutes old, but usually it is at least a day old. A data
warehouse usually holds data that goes back a certain period in time, such as five years. In contrast,
standard OLTP systems usually only hold data as long as it is current or active. An order table, for
example, may move orders to an archive table once they have been completed, shipped, and received
by the customer. When we say that data warehouses and data marts hold aggregated data, we need to
stress that there are many levels of aggregation in a typical data warehouse.
31(i). What is the difference between a database, a data warehouse and a data mart?
Ans: -- A database is an organized collection of information.
-- A data warehouse is a very large database with special sets of tools to extract and cleanse
data from operational systems and to analyze data.
-- A data mart is a focused subset of a data warehouse that deals with a single area of data and is
organized for quick analysis.
32. What is Data Mart, Data WareHouse and Decision Support System explain briefly?
Ans: Data Mart:
A data mart is a repository of data gathered from operational data and other sources that is
designed to serve a particular
community of knowledge workers. In scope, the data may derive from an enterprise-wide database
or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific
demands of a particular group of knowledge users in terms of analysis, content, presentation, and
ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the presence of the other in
some form. However, most writers using the term seem to agree that the design of a data mart
tends to start from an analysis of user needs and that a data warehouse tends to start from
an analysis of what data already exists and how it can be collected in such a way that the
data can later be used. A data warehouse is a central aggregation of data (which can be
distributed physically); a data mart is a data repository that may derive from a data warehouse or
not and that emphasizes ease of access and usability for a particular designed purpose. In general,
a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be
tactical and aimed at meeting an immediate need.
Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that an enterprise's
various business systems collect. The term was coined by W. H. Inmon. IBM sometimes uses the
term "information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data from various online
transaction processing (OLTP) applications and other sources is selectively extracted and organized
on the data warehouse database for use by analytical applications and user queries. Data
warehousing emphasizes the capture of data from diverse sources for useful analysis and access,
but does not generally start from the point-of-view of the end user or knowledge worker who may
need access to specialized, sometimes local databases. The latter idea is known as the data mart.
data mining, Web mining, and a decision support system (DSS) are three kinds of applications
that can make use of a data warehouse.
Decision Support System:
A decision support system (DSS) is a computer program application that analyzes business data
and presents it so that users can make business decisions more easily. It is an "informational
application" (in distinction to an "operational application" that collects the data in the course of
normal business operation).
Typical information that a decision support application might gather and present would be:
Comparative sales figures between one week and the next
Projected revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a context that is
described
A decision support system may present information graphically and may include an expert system
or artificial intelligence (AI). It may be aimed at business executives or some other group of
knowledge workers.
71. What is difference between data scrubbing and data cleansing?
Ans: Scrubbing data is the process of cleaning up the junk in legacy data and making it accurate and
useful for the next generation of automated systems. This is perhaps the most difficult of all
conversion activities. Very often, it is made more difficult when the customer wants to make good
data out of bad data. This is the dog work. It is also the most important, and it cannot be done
without the active participation of the user.
DATA CLEANSING - a two-step process comprising DETECTION and then CORRECTION of errors
in a data set.
72. What is Metadata and Repository?
Ans:
Metadata is data about data.
It contains descriptive data for end users.
It contains data that controls the ETL processing.
It contains data about the current state of the data warehouse.
ETL updates the metadata to provide the most current state.
20. What is Slowly changing dimensions?
Slowly changing dimensions are dimension tables that have
slowly increasing data as well as updates to existing data.
Q. What are the responsibilities of a data warehouse consultant/professional?
The basic responsibility of a data warehouse consultant is to publish the right data.
Some of the other responsibilities of a data warehouse consultant are:
1. Understand the end users by their business area, job responsibilities, and computer tolerance
2. Find out the decisions the end users want to make with the help of the data warehouse
3. Identify the best users who will make effective decisions using the data warehouse
4. Find the potential new users and make them aware of the data warehouse
5. Determine the grain of the data
6. Make the end user screens and applications much simpler and more template driven
Q. Stars and Cubes (Polaris)
The star schema and OLAP cube are intimately related. Star schemas are most appropriate for very
large data sets. OLAP cubes are most appropriate for smaller data sets where analytic tools can
perform complex data comparisons and calculations. In almost all OLAP cube environments, it's
recommended that you originally source data into a star schema structure, and then use wizards to
transform the data into the OLAP cube.
Q. What is the necessity of having dimensional modeling instead of an ER modeling?
Compared to entity/relation modeling, it's less rigorous (allowing the designer more discretion in
organizing the tables) but more practical because it accommodates database complexity and
improves performance.
Q. Dimensions and Facts.
Dimensional modeling begins by dividing the world into measurements and context. Measurements
are usually numeric and taken repeatedly. Numeric measurements are facts. Facts are always
surrounded by mostly textual context that's true at the moment the fact is recorded. Facts are very
specific, well-defined numeric attributes. By contrast, the context surrounding the facts is
open-ended and verbose. It's not uncommon for the designer to add context to a set of facts partway
through the implementation.
Dimensional modeling divides the world of data into two major types: Measurements and
Descriptions of the context surrounding those measurements. The measurements, which are
typically numeric, are stored in fact tables, and the descriptions of the context, which are typically
textual, are stored in the dimension tables.
A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary key
in a dimension, together with the facts containing the measurements.
Every foreign key in the fact table has a match to a unique primary key in the respective dimension
(referential integrity). This allows the dimension table to possess primary keys that aren't found in
the fact table. Therefore, a product dimension table might be paired with a sales fact table in which
some of the products are never sold.
Dimensional models are full-fledged relational models, where the fact table is in third normal form
and the dimension tables are in second normal form.
The main difference between second and third normal form is that repeated entries are removed
from a second normal form table and placed in their own snowflake. Thus the act of removing the
context from a fact record and creating dimension tables places the fact table in third normal form.
E.g. for Fact tables a Sales, Cost, Profit
E.g. for Dimensions a Customer, Product, Store, Time
Q. What are Additive Facts? Or what is meant by Additive Fact?
The fact tables are mostly very huge, and we almost never fetch a single record into our answer set. We
fetch a very large number of records on which we then do adding, counting, averaging, or taking
the min or max. The most common of these is adding. Applications are simpler if they store facts in
an additive format as often as possible. Thus, in the grocery example, we don't need to store the
unit price. We compute the unit price by dividing the dollar sales by the unit sales whenever
necessary.
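The grocery example can be made concrete: store only the additive measures (dollar sales, unit sales) in the fact rows, sum them, and derive unit price after aggregation (the rows below are invented):

```python
# Each fact row stores additive measures only: (dollar_sales, unit_sales).
# Unit price is non-additive, so it is derived after aggregation.
fact_rows = [
    (6.00, 3),   # 3 units at $2.00
    (10.00, 4),  # 4 units at $2.50
    (4.00, 2),   # 2 units at $2.00
]

dollar_total = sum(d for d, u in fact_rows)   # 20.0
unit_total   = sum(u for d, u in fact_rows)   # 9
avg_unit_price = dollar_total / unit_total    # correct weighted average

# Naively averaging stored per-row unit prices would give
# (2.00 + 2.50 + 2.00) / 3 = 2.1667 -- the wrong, unweighted answer.
print(round(avg_unit_price, 4))  # 2.2222
```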
Q. What is a Conformed Dimension?
When the enterprise decides to create a set of common labels across all the sources of data, the
separate data mart teams (or, single centralized team) must sit down to create master dimensions
that everyone will use for every data source. These master dimensions are called Conformed
Dimensions.
Two dimensions are conformed if the fields that you use as row headers have the same domain.
Q. What is a Conformed Fact?
If the definitions of measurements (facts) are highly consistent, we call them Conformed Facts.
Q. What is meant by Slowly Changing Dimensions and what are the different types of SCDs?
(Mascot)
Dimensions don't change in predictable ways. Individual customers and products evolve slowly and
episodically. Some of the changes are true physical changes. Customers change their addresses
because they move. A product is manufactured with different packaging. Other changes are actually
corrections of mistakes in the data. And finally, some changes are changes in how we label a
product or customer and are more a matter of opinion than physical reality. We call these variations
Slowly Changing Dimension (SCD).
The 3 fundamental choices for handling the slowly changing dimension are:
Overwrite the changed attribute, thereby destroying previous history
eg. Useful when correcting an error
Issue a new record for the customer, keeping the customer natural key, but creating a new
surrogate primary key
Create an additional field in the existing customer record, and store the old value of the attribute
in the additional field. Overwrite the original attribute field
A Type 1 SCD is an overwrite of a dimensional attribute. History is definitely lost. We overwrite when
we are correcting an error in the data or when we truly don't want to save history.
A Type 2 SCD creates a new dimension record and requires a generalized or surrogate key for the
dimension. We create surrogate keys when a true physical change occurs in a dimension entity at a
specific point in time, such as the customer address change or the product packing change. We
often add a timestamp and a reason code in the dimension record to precisely describe the change.
The Type 2 SCD records changes of values of dimensional entity attributes over time. The
technique requires adding a new row to the dimension each time there's a change in the value of an
attribute (or group of attributes) and assigning a unique surrogate key to the new row.
A Type 3 SCD adds a new field in the dimension record but does not create a new record. We might
change the designation of the customer's sales territory because we redraw the sales territory map,
or we arbitrarily change the category of the product from confectionary to candy. In both cases, we
augment the original dimension attribute with an old attribute so we can switch between these
alternate realities.
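The three techniques can be sketched side by side in Python; the row layout, surrogate-key scheme, and sample customer are illustrative, not a specific product's design:

```python
import datetime

# A dimension row: surrogate key, natural key, attribute, effective date.
dim = [{"sid": 1, "cust_id": "C1", "address": "Old St",
        "effective": datetime.date(2001, 1, 1), "current": True}]
next_sid = 2

def type1_overwrite(rows, cust_id, new_addr):
    # Type 1: overwrite in place; history is destroyed.
    for r in rows:
        if r["cust_id"] == cust_id:
            r["address"] = new_addr

def type2_new_row(rows, cust_id, new_addr, change_date):
    # Type 2: expire the current row, add a new one with a new surrogate key.
    global next_sid
    for r in rows:
        if r["cust_id"] == cust_id and r["current"]:
            r["current"] = False
    rows.append({"sid": next_sid, "cust_id": cust_id, "address": new_addr,
                 "effective": change_date, "current": True})
    next_sid += 1

def type3_old_field(rows, cust_id, new_addr):
    # Type 3: keep the prior value in an extra "old_address" field.
    for r in rows:
        if r["cust_id"] == cust_id:
            r["old_address"] = r["address"]
            r["address"] = new_addr

type2_new_row(dim, "C1", "New Ave", datetime.date(2002, 6, 2))
print(len(dim), dim[-1]["sid"])  # 2 rows; the new row has surrogate key 2
```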
Q. What are the techniques for handling SCDs?
Overwriting
Creating another dimension record
Creating a current value field
Q. What is a Surrogate Key and where do you use it? (Mascot)
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key. It is just
a unique identifier or number for each row that can be used for the primary key to the table.
It is useful because the natural primary key (i.e. Customer Number in Customer table) can change
and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the
primary keys (according to the business users), but not only can these change, indexing on a
numerical value is probably better and you could consider creating a surrogate key called, say,
AIRPORT_ID. This would be internal to the system and as far as the client is concerned you may
display only the AIRPORT_NAME.
Another benefit you can get from surrogate keys (SID) is in Tracking the SCD - Slowly Changing
Dimension.
A classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in
your Employee Dimension). This employee has a turnover allocated to him on the Business Unit
'BU1'. But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit
'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old one should
belong to the Business Unit 'BU1'.
If you used the natural business key 'E1' for your employee within your data warehouse everything
would be allocated to Business Unit 'BU2' even what actually belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1'
in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June) with the SID of the
Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' +
'BU2.'
You could consider the Slowly Changing Dimension as an enlargement of your natural key: the
natural key of the Employee was the Employee Code 'E1', but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2'. The difference with the natural-key
enlargement process is that you might not have every part of your new key within your fact table, so
you might not be able to join on the new enlarged key; hence you need another id.
Every join between dimension tables and fact tables in a data warehouse environment should be
based on surrogate keys, not natural keys.
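The E1/BU1 example can be played out in code: each fact row carries the surrogate key that was current when it was loaded, so turnover splits correctly across business units (surrogate key values, amounts, and dates are invented):

```python
# Employee dimension: one Type 2 row per (employee, business-unit) era.
emp_dim = [
    {"sid": 101, "emp_code": "E1", "bu": "BU1"},  # before 2 June 2002
    {"sid": 102, "emp_code": "E1", "bu": "BU2"},  # from 2 June 2002
]

# Fact rows store the surrogate key that was current at load time.
facts = [
    {"emp_sid": 101, "turnover": 500.0},  # January sale -> stays with BU1
    {"emp_sid": 102, "turnover": 300.0},  # July sale    -> goes to BU2
]

# Join fact to dimension on the surrogate key and total by business unit.
by_sid = {r["sid"]: r for r in emp_dim}
totals = {}
for f in facts:
    bu = by_sid[f["emp_sid"]]["bu"]
    totals[bu] = totals.get(bu, 0.0) + f["turnover"]

print(totals)  # {'BU1': 500.0, 'BU2': 300.0}
```

Had the facts stored the natural key 'E1' and joined to a single overwritten dimension row, both amounts would have landed in 'BU2'.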
Q. What is the necessity of having surrogate keys?
Production may reuse keys that it has purged but that you are still maintaining
Production might legitimately overwrite some part of a product description or a
customer description with new values but not change the product key or the customer
key to a new value. We might be wondering what to do about the revised attribute
values (slowly changing dimension crisis)
Production may generalize its key format to handle some new situation in the
transaction system, e.g. changing the production keys from integers to alphanumeric, or
the 12-byte keys you are used to may become 20-byte keys
Acquisition of companies
Q. What are the advantages of using Surrogate Keys?
We can save substantial storage space with integer valued surrogate keys
Eliminate administrative surprises coming from production
Potentially adapt to big surprises like a merger or an acquisition
Have a flexible mechanism for handling slowly changing dimensions
Extraction - As a first step, heterogeneous data from different online transaction processing systems
is extracted. This data becomes the data source for the data warehouse.
Cleansing/transformation - The source data is sent into the populating systems where the data is
cleansed, integrated, consolidated, secured and stored in the corporate or central data warehouse.
Distribution - From the central data warehouse, data is distributed to independent data marts
specifically designed for the end user.
Analysis - From these data marts, data is sent to the end users who access the data stored in the
data mart depending upon their requirement.
Q. What is the life cycle of DW?
Getting data from OLTP systems from diff data sources
Analysis & staging - putting the data in a staging layer: cleaning, purging, assigning surrogate keys,
SCD handling, dimensional modeling
Loading
Writing of metadata
Q. What is a Star Schema?
A star schema is a set of tables comprised of a single, central fact table surrounded by
denormalized dimensions. Each dimension is represented in a single table. Star schema implements
dimensional data structures with denormalized dimensions. Snowflake schema is an alternative to
star schema. A relational database schema for representing multidimensional data. The data is
stored in a central fact table, with one or more tables holding information on each dimension.
Dimensions have levels, and all levels are usually shown as columns in each dimension table.
Q. What is a Snowflake Schema?
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by
normalized dimension hierarchies. Each dimension level is represented in a table. Snowflake
schema implements dimensional data structures with fully normalized dimensions. Star schema is
an alternative to snowflake schema.
An example would be to break down the Time dimension and create tables for each level: years,
quarters, months, weeks, days. These additional branches on the ERD create more of a snowflake
shape than a star.
Q. What is a Join?
A join is a query that combines rows from two or more tables, views, or materialized views
("snapshots"). Oracle performs a join whenever multiple tables appear in the query's FROM clause. The
query's select list can select any columns from any of these tables. If any two of these tables have a
column name in common, you must qualify all references to these columns throughout the query with
table names to avoid ambiguity.
Q. What are join conditions?
Most join queries contain WHERE clause conditions that compare two columns, each from a different
table. Such a condition is called a join condition. To execute a join, Oracle combines pairs of rows,
each containing one row from each table, for which the join condition evaluates to TRUE. The columns
in the join conditions need not also appear in the select list.
Q. What is an equijoin?
An equijoin is a join with a join condition containing an equality operator. An equijoin combines rows
that have equivalent values for the specified columns.
Eg:
Select ename, job, dept.deptno, dname
From emp, dept
Where emp.deptno = dept.deptno;
Q. What are self joins?
A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by
table aliases that qualify column names in the join condition.
Eg:
SELECT e1.ename || ' works for ' || e2.ename AS "Employees and their Managers"
FROM emp e1, emp e2
WHERE e1.mgr = e2.empno;
Sample emp data:
ENAME   EMPNO   MGR
BLAKE   12345   67890
KING    67890   22446
If either or both of the queries select values of datatype VARCHAR2, the returned values have
datatype VARCHAR2.
Q. What is a UNION?
The UNION operator eliminates duplicate records from the selected rows. We must match datatypes
(using the TO_DATE and TO_NUMBER functions) when corresponding columns do not exist in one or the other table.
Q. What is UNION ALL?
The UNION ALL operator does not eliminate duplicate selected rows.
Note: The UNION operator returns only distinct rows that appear in either result, while the UNION ALL
operator returns all rows.
Q. What is an INTERSECT?
The INTERSECT operator returns only those rows returned by both queries. It shows only the distinct
values from the rows returned by both queries.
Q. What is MINUS?
The MINUS operator returns only rows returned by the first query but not by the second. It also
eliminates the duplicates from the first query.
Note: For compound queries (containing set operators UNION, INTERSECT, MINUS, or UNION ALL),
the ORDER BY clause must use positions, rather than explicit expressions. Also, the ORDER BY clause
can appear only in the last component query. The ORDER BY clause orders all rows returned by the
entire compound query.
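The four set operators can be compared directly in SQLite from Python (SQLite uses EXCEPT where Oracle uses MINUS); the two toy tables below share values, and table a contains a duplicate 2:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(sql):
    return [r[0] for r in con.execute(sql).fetchall()]

print(run("SELECT n FROM a UNION     SELECT n FROM b ORDER BY 1"))  # [1, 2, 3, 4]
print(run("SELECT n FROM a UNION ALL SELECT n FROM b"))             # keeps every duplicate
print(run("SELECT n FROM a INTERSECT SELECT n FROM b ORDER BY 1"))  # [2, 3]
print(run("SELECT n FROM a EXCEPT    SELECT n FROM b ORDER BY 1"))  # [1]  (Oracle: MINUS)
```

Note the ORDER BY is attached once, to the last component query, exactly as the rule above requires.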
PL/SQL program units generally are categorized as anonymous blocks and stored procedures.
Q. What are procedures and functions? (KPIT Infotech, Pune)
A procedure or function is a schema object that consists of a set of SQL statements and other PL/SQL
constructs, grouped together, stored in the database, and executed as a unit to solve a specific problem
or perform a set of related tasks. Procedures and functions permit the caller to provide parameters that
can be input only, output only, or input and output values.
Q. What is the difference between Procedure and Function?
Procedures and functions are identical except that functions always return a single value to the caller,
while procedures do not return values to the caller.
Q. What are triggers?
Oracle allows you to define procedures called triggers that execute implicitly when an INSERT, UPDATE, or
DELETE statement is issued against the associated table or, in some cases, against a view, or when
database system actions occur. These procedures can be written in PL/SQL or Java and stored in the
database, or they can be written as C callouts.
Committing and Rolling Back Transactions
The changes made by the SQL statements that constitute a transaction can be either committed or
rolled back. After a transaction is committed or rolled back, the next transaction begins with the next
SQL statement.
Committing a transaction makes permanent the changes resulting from all SQL statements in the
transaction. The changes made by the SQL statements of a transaction become visible to other
sessions' transactions that start only after the transaction is committed.
Rolling back a transaction retracts any of the changes resulting from the SQL statements in the
transaction. After a transaction is rolled back, the affected data is left unchanged as if the SQL
statements in the transaction were never executed.
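The commit/rollback contract can be demonstrated with SQLite's transaction handling from Python (isolation details differ from Oracle, but the rule — rollback retracts, commit makes permanent — is the same; the account data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
con.execute("INSERT INTO accounts VALUES ('alice', 100.0)")
con.commit()

# A transaction that is rolled back leaves the data as if it never ran.
con.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
con.rollback()
bal = con.execute("SELECT balance FROM accounts").fetchone()[0]
print(bal)  # 100.0 -- the update was retracted

# A committed transaction makes the change permanent.
con.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
con.commit()
bal = con.execute("SELECT balance FROM accounts").fetchone()[0]
print(bal)  # 60.0
```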
Materialized Views for Data Warehouses
In data warehouses, materialized views can be used to precompute and store
aggregated data such as the sum of sales. Materialized views in these environments
are typically referred to as summaries, because they store summarized data. They
can also be used to precompute joins with or without aggregations. A materialized
view eliminates the overhead associated with expensive joins or aggregations for a
large or important class of queries.
The Need for Materialized Views
Materialized views are used in data warehouses to increase the speed of queries on
very large databases. Queries to large databases often involve joins between tables
or aggregations such as SUM, or both. These operations are very expensive in terms
of time and processing power.
How do materialized views work?
The query optimizer can use materialized views by
automatically recognizing when an existing materialized view can and should be
used to satisfy a request. It then transparently rewrites the request to use the
materialized view. Queries are then directed to the materialized view and not to the
underlying detail tables. In general, rewriting queries to use materialized views
rather than detail tables results in a significant performance gain.
If a materialized view is to be used by query rewrite, it must be stored in the same
database as its fact or detail tables. A materialized view can be partitioned, and you
can define a materialized view on a partitioned table and one or more indexes on
the materialized view.
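Where a real materialized view is unavailable, the same idea can be sketched as a precomputed summary table that queries read instead of the large detail table (SQLite has no materialized views, so the "refresh" here is manual, and the tables and figures are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (product TEXT, dollars REAL);
    INSERT INTO sales VALUES ('Car', 100.0), ('Car', 50.0), ('Truck', 75.0);
""")

# "Materialized view": compute and store the aggregation once...
con.execute("""
    CREATE TABLE sales_summary AS
    SELECT product, SUM(dollars) AS total FROM sales GROUP BY product
""")

# ...so later queries hit the small summary, not the detail table.
# (Oracle's query rewrite does this redirection automatically.)
rows = con.execute(
    "SELECT product, total FROM sales_summary ORDER BY product").fetchall()
print(rows)  # [('Car', 150.0), ('Truck', 75.0)]
```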