
http://dwhlaureate.blogspot.com/

What is the level of granularity of a fact table?

A fact table is usually designed at a low level of granularity, meaning that it stores the most detailed information available. For example, employee performance on its own is a very high level of granularity, while employee_performance_daily and employee_performance_weekly can be considered lower (finer) levels of granularity.

Granularity is the lowest level of information stored in the fact table, that is, the depth of the data. In a date dimension, the level of granularity could be year, quarter, month, period, week or day.

Determining the granularity consists of the following two steps:

• Determining the dimensions that are to be included
• Determining where in the hierarchy of each dimension the information is to be found

Both decisions are revisited as the requirements dictate.

What is the difference between ‘view’ and ‘materialized view’?

View:

• A view provides a tailored representation of the data in its underlying tables.
• It is a logical structure that does not occupy storage space for data.
• Changes made through an updatable view affect the corresponding base tables.

Materialized view:

• Pre-calculated data is persisted in the materialized view.
• It physically occupies storage space.
• Changes to its stored data do not affect the corresponding base tables.

What is a junk dimension?

• When certain attributes do not fit naturally into any existing dimension of the schema, they can be stored in a junk dimension. The data in a junk dimension usually consists of Boolean or flag values.
• A junk dimension is a single dimension formed by lumping together a number of small, unrelated dimensions. In other words, random flags and text attributes are grouped by moving them into a distinct sub-dimension.

What are the different types of SCDs used in Data Warehousing?

SCDs (slowly changing dimensions) are the dimensions in which the data changes
slowly, rather than changing regularly on a time basis.

Three types of SCDs are commonly used in Data Warehousing:

• SCD1: The new record replaces the original record, so only one record exists in the database for each entity. The current data is overwritten and no history is kept.
• SCD2: A new record is added to the dimension table. The database then holds both the current record and the previous records, which are kept as history.
• SCD3: The original record is modified to hold the new data alongside the old. The record carries two values: the current value and the previous value that it replaced.
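
The three SCD types described above can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the table names (customer_scd1, customer_scd2, customer_scd3), the is_current flag and the sample cities are assumptions made for this example, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical customer dimension, modelled three ways.
    CREATE TABLE customer_scd1 (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE customer_scd2 (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  INTEGER,                            -- natural key
        city         TEXT,
        is_current   INTEGER
    );
    CREATE TABLE customer_scd3 (
        customer_id INTEGER PRIMARY KEY, city TEXT, previous_city TEXT
    );
    INSERT INTO customer_scd1 VALUES (1, 'Rome');
    INSERT INTO customer_scd2 (customer_id, city, is_current) VALUES (1, 'Rome', 1);
    INSERT INTO customer_scd3 VALUES (1, 'Rome', NULL);
""")

def move_scd1(customer_id, new_city):
    """Type 1: overwrite in place; the old value is lost."""
    conn.execute("UPDATE customer_scd1 SET city = ? WHERE customer_id = ?",
                 (new_city, customer_id))

def move_scd2(customer_id, new_city):
    """Type 2: expire the current row and add a new row; history is kept."""
    conn.execute("UPDATE customer_scd2 SET is_current = 0 "
                 "WHERE customer_id = ? AND is_current = 1", (customer_id,))
    conn.execute("INSERT INTO customer_scd2 (customer_id, city, is_current) "
                 "VALUES (?, ?, 1)", (customer_id, new_city))

def move_scd3(customer_id, new_city):
    """Type 3: keep only the previous value, in a dedicated column."""
    conn.execute("UPDATE customer_scd3 SET previous_city = city, city = ? "
                 "WHERE customer_id = ?", (new_city, customer_id))

# Apply the same change (customer 1 moves to Milan) under each strategy.
for move in (move_scd1, move_scd2, move_scd3):
    move(1, "Milan")

scd1_rows = conn.execute("SELECT * FROM customer_scd1").fetchall()
scd2_rows = conn.execute(
    "SELECT customer_id, city, is_current FROM customer_scd2 "
    "ORDER BY customer_key").fetchall()
scd3_rows = conn.execute("SELECT * FROM customer_scd3").fetchall()
```

After the change, the Type 1 table holds one row with only the new city, the Type 2 table holds two rows (the expired one and the current one), and the Type 3 table holds one row with both the current and the previous city.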
8. What is a star schema?
Star schema is a schema used in data warehousing where a single fact table
references a number of dimension tables. In a star schema, “keys” from all the
dimension tables flow into the fact table. This entity-relationship diagram resembles a
star, hence it is named a Star schema.

9. What is a snowflake schema?

Just like the star schema, a snowflake schema has a single fact table that references a number of dimension tables. Here, however, the dimension tables are further normalized into multiple related tables. Because the dimensions branch out into smaller tables, like the arms of a snowflake, this schema is called a snowflake schema.

Define fact-less fact.

A fact-less fact table is a fact table that does not contain any measures. It contains only the keys from the different dimension tables, so each row simply records that an event or relationship occurred.

What do you understand by dimensional modeling?


Dimensional modeling is a design methodology built on “dimensions” and “fact tables”. Fact tables store the transactional measurements, together with keys to the “dimension tables” that qualify (describe) the data.

What is a data mart?


Data mart is a subset of organizational data. In other words, it is a collection of data
specific to a particular group within an organization.

What is data aggregation?


Data aggregation is the broad term for any process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis.

What is a fact table? How many fact tables are there in a star schema?
A fact table is a table that holds the measurements, facts or metrics of a business process. It is located at the center of a star schema; a snowflake schema is a variant of the star schema in which the dimensions are normalized. A fact table usually consists of two types of columns:
1. Columns that hold the fact data (measures)
2. Columns that hold foreign keys to the dimension tables
A star schema or snowflake schema has only one fact table. A schema with multiple fact tables is called a fact constellation schema.

Q: What is the main benefit of normalization?


Normalization reduces data redundancy. This helps keep the data valid and consistent, so that it makes sense to users whenever it is needed.
Q: What is data marting? Explain the different kinds of costs associated?
Data marting is the process of building a “data mart”: information about a specific data set is reorganized so that it makes sense for a particular group.
The different kinds of costs associated with data marting are as follows:

1. Hardware-related costs
2. Software-related costs
3. Network access-related costs
4. Time costs

What Is The Difference Between View And Materialized View?

Answer :
View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes. Materialized view - stores the results of the SQL in table form in the database. The SQL statement executes only when the materialized view is created or refreshed; every query in between uses the stored result set. The advantage is quick query results; the trade-off is that the stored data can be stale until the next refresh.
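
The difference can be seen in a minimal sketch using Python's built-in sqlite3 module. SQLite has no native materialized views, so the example emulates one with a table created from a query and rebuilt by hand; the names sales, v_sales_totals, mv_sales_totals and refresh_mv are hypothetical. Databases such as Oracle or PostgreSQL provide a dedicated CREATE MATERIALIZED VIEW statement instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('brick', 10), ('brick', 5), ('tile', 7);

    -- A view stores only the query; it is re-evaluated on every access.
    CREATE VIEW v_sales_totals AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;

    -- Stand-in for a materialized view: the result set is stored physically.
    CREATE TABLE mv_sales_totals AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;
""")

def totals(relation):
    """Return {product: total} read from the given view or table."""
    return dict(conn.execute(f"SELECT product, total FROM {relation}"))

def refresh_mv():
    """Rebuild the stored snapshot, mimicking a manual full refresh."""
    conn.executescript("""
        DELETE FROM mv_sales_totals;
        INSERT INTO mv_sales_totals
            SELECT product, SUM(amount) FROM sales GROUP BY product;
    """)

# Change the base table, then compare what each object reports.
conn.execute("INSERT INTO sales VALUES ('tile', 3)")
view_after_insert = totals("v_sales_totals")    # sees the new row immediately
mv_before_refresh = totals("mv_sales_totals")   # still the old snapshot
refresh_mv()
mv_after_refresh = totals("mv_sales_totals")    # now matches the view
```

The view reflects the new row at once because its query runs on every access, whereas the stored snapshot only catches up after refresh_mv() is called.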

Question 88. What Is Surrogate Key? Where We Use It? Explain With Examples.
Answer :
A surrogate key is a substitute for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the primary keys of dimension tables. The values can come from an Informatica Sequence Generator, an Oracle sequence, or SQL Server Identity values.
A surrogate key is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME that business users regard as the primary key. Not only can these values change, but indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system; as far as the client is concerned, you may display only the AIRPORT_NAME.
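
The AIRPORT_ID idea can be sketched as follows, again with Python's sqlite3 module as an assumed stand-in for the warehouse database. The tables airport_dim and flight_fact, the helper add_airport and the airport names are invented for illustration; SQLite's AUTOINCREMENT plays the role that an Oracle sequence or SQL Server Identity column would play elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airport_dim (
        airport_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        airport_name TEXT NOT NULL UNIQUE                -- natural (business) key
    );
    CREATE TABLE flight_fact (
        airport_id INTEGER REFERENCES airport_dim(airport_id),
        passengers INTEGER
    );
""")

def add_airport(name):
    """Insert a dimension row and return its system-generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO airport_dim (airport_name) VALUES (?)", (name,))
    return cur.lastrowid

airport_id = add_airport("Old Field Airport")
conn.execute("INSERT INTO flight_fact VALUES (?, ?)", (airport_id, 180))

# The business name changes; only the dimension row is touched, because the
# fact table refers to the airport by its surrogate key.
conn.execute("UPDATE airport_dim SET airport_name = ? WHERE airport_id = ?",
             ("New Field Airport", airport_id))

joined = conn.execute("""
    SELECT d.airport_name, f.passengers
    FROM flight_fact f
    JOIN airport_dim d ON d.airport_id = f.airport_id
""").fetchall()
```

Because the fact row stores only the numeric key, renaming the airport requires a single update in the dimension table and the join still resolves to the new name.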

Question 119. What Is Data Mart?

Answer :
A data mart is used at the business division or department level. It contains only the subject-specific data required for local analysis. It is a database, or collection of databases, designed to help managers make strategic decisions about their business. Data marts are usually smaller than data warehouses and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses. A data mart is a simpler form of a data warehouse focused on a single subject (or functional area) such as sales, finance, marketing or HR, and it typically represents data from a single business process.
Question 120. What Are The Definitions Of Data Warehouse And Data Mart?
Answer :
A data mart is a subset of a data warehouse: a collection of the information of an individual department. A data warehouse, in this view, is a collection of data marts.
A data mart covers a single subject, whereas a data warehouse integrates multiple subjects.
Question 121. What Are Data Marts?
Answer :
A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete, independent data warehouses in their own right, usually smaller than the corporate data warehouse.
Question 122. What Is The Difference Between A Data Warehouse And A Data Mart?
Answer :
A data mart is a subject-oriented database that supports the business needs of an individual department within the enterprise, whereas the data warehouse integrates data for the enterprise as a whole. A data mart is a subset of the enterprise data warehouse. Data marts are also known as high-performance query structures.

Question 130. What Are Aggregate Tables?


Answer :
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. It is usually easier and faster to retrieve data from an aggregate table than from the original table, which may hold millions of records. Aggregate tables reduce the load on the database server and improve query performance, so results are returned more quickly.
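
As a rough illustration, the sketch below builds a monthly aggregate table from a daily-grain fact table using Python's sqlite3 module. The table names sales_fact and sales_monthly_agg and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detail-level fact table: one row per sale.
    CREATE TABLE sales_fact (sale_date TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales_fact VALUES
        ('2024-01-03', 'brick', 10),
        ('2024-01-17', 'brick', 20),
        ('2024-02-05', 'brick', 5),
        ('2024-02-09', 'tile', 7);

    -- Aggregate table: the same data summarized to month and product.
    CREATE TABLE sales_monthly_agg AS
        SELECT substr(sale_date, 1, 7) AS month,
               product,
               SUM(amount) AS total_amount
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 7), product;
""")

# Reports at monthly grain read the small summary table, not the detail rows.
monthly = conn.execute("""
    SELECT month, product, total_amount
    FROM sales_monthly_agg
    ORDER BY month, product
""").fetchall()
```

In a real warehouse the aggregate would be rebuilt or incrementally maintained as part of the load process; that step is omitted here.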

Question 159. Differences Between Star And Snowflake Schemas ?

Answer :
A star schema is created when all the dimension tables link directly to the fact table. Since the graphical representation resembles a star, it is called a star schema. Note that the foreign keys in the fact table link to the primary keys of the dimension tables. The sample referred to here is a schema for a sales_fact table for the year 1998, with the dimensions Store, Customer, Product_class and time_by_day. In that sample, the Product table links to the product_class table through the primary key and indirectly to the fact table. A dimension that reaches the fact table only through another dimension table in this way is the kind of normalization that characterizes a snowflake schema.
Question 160. What Is Fact Table?
Answer :
A fact table contains the measurements, metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.
Question 176. What Is The Need Of Surrogate Key; Why Primary Key Not Used As Surrogate Key?
Answer :
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated sequentially by the system (like the Identity property in SQL Server or a Sequence in Oracle). They do not describe anything. A natural primary key, by contrast, is a natural identifier for an entity: its values are entered by the user and must uniquely identify each row, with no repetition.
Question 177. Need For Surrogate Key Not Primary Key
Answer :
If a column is made a primary key and its data type or length later needs to change, then all the foreign keys that depend on that primary key must be changed as well, which makes the database unstable. Surrogate keys make the database more stable because they insulate the primary and foreign key relationships from changes in data types and lengths.

Question 212. Explain The Difference Between The Truncate And Delete
Commands?
Answer :
Truncate :
It is a DDL command used to remove all rows from a table or cluster. Because it is a DDL command, it auto-commits in databases such as Oracle, so a rollback cannot be performed. It is faster than delete.
Delete:
It is a DML command, generally used to delete some or all records of a table. A rollback can be performed to restore the deleted records. To make the deletion permanent, the "commit" command should be used.

What is Dimension Table?

A dimension table is a table that contains the attributes of the measurements stored in fact tables. It holds hierarchies, categories and logic that can be used to traverse the nodes of those hierarchies.

4. What is Fact Table?

Fact table contains the measurement of business processes, and it contains foreign
keys for the dimension tables.

Example – If the business process is the manufacturing of bricks, the average number of bricks produced by one person/machine is a measure of the business process.

24. What are the key columns in Fact and dimension tables?

Foreign keys of dimension tables are primary keys of entity tables. Foreign keys of fact
tables are the primary keys of the dimension tables.
25. What is SCD?

SCD stands for slowly changing dimension, and it applies to cases where the attributes of a record change over time.

26. What are the types of SCD?

There are three types of SCD and they are as follows:

SCD 1 – The new record replaces the original record

SCD 2 – A new record is added to the existing customer dimension table

SCD 3 – The original record is modified to include the new data

27. What is BUS Schema?

A BUS schema consists of a suite of conformed dimensions and standardized definitions of the facts across the fact tables.

28. What is Star Schema?

Star schema is nothing but a type of organizing the tables in such a way that result can
be retrieved from the database quickly in the data warehouse environment.

29. What is Snowflake Schema?

A snowflake schema has a primary dimension table to which one or more further dimension tables can be joined. The primary dimension table is the only one that can be joined with the fact table.

30. What is a core dimension?

A core dimension is a dimension table that is dedicated to a single fact table or data mart.

31. What is called data cleaning?

As the name implies, data cleaning is the cleaning up of orphan records, data that breaches business rules, inconsistent data and missing information in a database.

32. What is Metadata?

Metadata is defined as data about the data. It contains information such as the number of columns used, whether a file is fixed-width or delimited, the ordering of fields and the data types of the fields.

What is surrogate key?

Surrogate key is nothing but a substitute for the natural primary key. It is set to be a
unique identifier for each row that can be used for the primary key to a table.

https://www.complexsql.com/data-warehouse-interview-questions/
What is Star-schema?
This schema is used in data warehouse models where one centralized fact table references a number of dimension tables, so that the primary keys from all the dimension tables flow into the fact table (as foreign keys), where the measures are stored. The entity-relationship diagram looks like a star, hence the name.

Consider a fact table that stores sales quantity for each product and customer
on a certain time. Sales quantity will be the measure here and keys from
customer, product and time dimension tables will flow into the fact table.
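
The sales example just described might be laid out as in the following sketch, written with Python's sqlite3 module. The table and column names (customer_dim, product_dim, time_dim, sales_fact and so on) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one primary key each.
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table at the center: dimension keys as foreign keys, plus the measure.
    CREATE TABLE sales_fact (
        customer_key   INTEGER REFERENCES customer_dim(customer_key),
        product_key    INTEGER REFERENCES product_dim(product_key),
        time_key       INTEGER REFERENCES time_dim(time_key),
        sales_quantity INTEGER
    );

    INSERT INTO customer_dim VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO product_dim  VALUES (1, 'brick'), (2, 'tile');
    INSERT INTO time_dim     VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO sales_fact   VALUES (1, 1, 1, 100), (2, 1, 1, 50), (1, 2, 2, 30);
""")

# A typical star-schema query: join the fact table to one dimension and
# aggregate the measure by a descriptive attribute.
quantity_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_quantity)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Each dimension is exactly one join away from the fact table, which is what keeps star-schema queries simple.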


What is Fact and Dimension in Data Warehouse?


A Fact in a data warehouse provides quantitative information about the business process. Facts are also called measurements or metrics. Quantity, Sales Amount, Profit and Turnover are some examples of Facts.

A Dimension is an object that describes facts or business numbers. Product, Location and Time are a few examples of Dimensions.

What do you understand by dimensional modeling?


Dimensional modeling is a data modeling technique used in data warehouse design. In this design model, all the data is stored in two types of tables: fact tables and dimension tables. Fact tables hold numeric data. Dimension tables hold the descriptive information that gives the facts their context.

What is Fact Table? What is Dimension table?


Fact Table

A fact table is used in the dimensional model in data warehouse design. It contains the transactional measurements/facts or the quantitative information of business processes, along with foreign keys to the dimension tables.

Dimension Table

A dimension table contains the dimensions of a fact. It provides descriptive information for the measurements recorded in the fact table, e.g. Product ID, Product Category etc. Dimension tables are joined to the fact table via foreign keys held in the fact table.

9. What is Data Mining?


Data Mining is the procedure of mining knowledge from huge sets of data. It
generates detailed insights of the business. The insights derived via Data
Mining can be used for marketing, fraud detection, decision-making process
etc.

10. What is the difference between view and materialized view?

View

i.) A view is a virtual table formed from one or more base tables or views. It
doesn't physically hold any data.

ii.) Since a View is not pre-computed and stored on a disk, you always get
the updated data in a View when any changes are made to the original base
table.

iii.) It is used for security purpose. Using Views, you can restrict the user
from accessing sensitive information in a database.

iv.) It reduces the complexity of queries by getting data from several tables
into a single customized View.

Materialized view

i.) A Materialized View is a physical copy of the data selected from the original base tables. It holds data physically in a table.

ii.) It is pre-computed and stored on a disk like an object, and it is not updated each time it is used; it must be refreshed to pick up changes in the base tables.

iii.) Materialized view improves performance. Since it is pre-computed, it responds faster in comparison to View.

11. What is the main difference between Star and Snowflake Schema?

In dimensional model, all the data is stored in two types of tables - Facts
table and Dimension table. Measures or numbers are stored in fact table and
dimensions which describes the measures are stored in dimension table.

In this model, we have two popular schemas available. They are Star
Schema and Snowflake Schema. (Schema is nothing but arrangement of
tables or database structure)

Star Schema

Star schema is the simplest among the data warehousing schemas. It consists of data in the form of facts and dimensions. It is called a star schema because its diagram resembles a star, where the fact table is surrounded by multiple de-normalized dimension tables.

Due to de-normalization in the star schema, you have redundant or duplicate data that calls for more maintenance. The queries are simpler, with fewer joins.

Snowflake Schema

A Snowflake Schema is an extension of a Star Schema. It is called snowflake because its diagram resembles a snowflake. Unlike in the Star schema, some dimension tables in the Snowflake schema are normalized.

Due to normalization in the Snowflake schema, redundancy is reduced; the schema therefore becomes easier to maintain and saves storage space.

14. Define cursor.

Answer:

A Cursor is a database object that represents a result set and helps in manipulating its data row by row.

15. What is sub-query? Explain properties of sub-query.

Answer:

A subquery is a SELECT statement that is nested within another T-SQL statement. A (non-correlated) subquery can be executed independently of the statement in which it is nested, and it returns a result set.

Properties of Sub-Query:

• A subquery must be enclosed in parentheses.
• A subquery must be placed on the right-hand side of the comparison operator.
• A subquery cannot contain an ORDER BY clause (in T-SQL, unless TOP is also specified).
• A query can contain more than one subquery.
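
A short sketch of a subquery in use follows. It runs against SQLite through Python's sqlite3 module rather than SQL Server, which is an assumption made so that the example is self-contained; the employee table and its rows are invented for illustration. The subquery is enclosed in parentheses and sits on the right-hand side of the comparison operator, as the properties above require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Ann', 50), ('Ben', 70), ('Cy', 90);
""")

# The inner SELECT computes the average salary; the outer query keeps only
# the employees whose salary is strictly above that average.
above_average = conn.execute("""
    SELECT name
    FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee)
    ORDER BY name
""").fetchall()
```

The inner query here does not refer to the outer row, so it could be run on its own and would return the same single value.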

16. What is the difference between Clustered and Non-Clustered Index?

Answer:

A clustered index determines the order in which the records of the table are physically stored.
A non-clustered index is a separate structure from the table data and does not reorder the way the records in the table are stored.

A non-clustered index holds pointers to the data rows, so there can be many non-clustered indexes per table, whereas a table can have only one clustered index.

17. What is database Trigger?

Answer:

A trigger is a special type of stored procedure that fires automatically in response to DML and DDL events.

DML triggers execute when data is modified through a data manipulation language (DML) event such as an INSERT, UPDATE, or DELETE statement on a table. DDL triggers execute in response to Transact-SQL CREATE, ALTER, and DROP statements.
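
A DML trigger can be sketched as below using Python's sqlite3 module. This is an illustration under assumptions: SQLite supports only DML triggers (the DDL triggers mentioned above are a Transact-SQL feature), and the tables account and account_audit and the trigger trg_account_update are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE account_audit (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- Fires automatically after every UPDATE of the balance column and
    -- records the old and new values in the audit table.
    CREATE TRIGGER trg_account_update
    AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO account_audit VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
    UPDATE account SET balance = 80 WHERE id = 1;
""")

# No code wrote to account_audit directly; the trigger did.
audit_rows = conn.execute("SELECT * FROM account_audit").fetchall()
```

The application issues only the UPDATE; the audit row appears as a side effect of the trigger firing.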

18. What are stored procedures? And what are the advantages of using them?

Answer:

A stored procedure is a set of SQL statements that performs a user-defined operation. Since it is precompiled and stored in the database, it can run queries faster. It also reduces network traffic: because many queries can be included in one stored procedure, the round trips needed to execute multiple queries from an application to the database and back are avoided.

20. What is a Database Lock? What are the types of locks?

Answer:

A database lock controls concurrent access to a record. A user can modify only those records on which he holds a lock. This prevents data from being corrupted when multiple users try to write to the database.

Type of lock

1. Shared Lock
When a shared lock is applied on data item, other transactions can only read
the item, but can't write into it.

2. Exclusive Lock
When an exclusive lock is applied on data item, other transactions can't read
or write into the data item.

21. What is a Composite Key?

Answer:

A composite primary key represents a set of columns whose values uniquely identify every row in a table.

For example: if "StudentId" and "Student Name" in a table are combined to uniquely identify a row, it is a Composite Key.

22. What is a Foreign Key?


Answer:

A foreign key is used to link two tables together. A foreign key in one table
points to a primary key in another table.

They are used to enforce referential integrity and prevent any actions that
would destroy links between tables with the corresponding data values.
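
Referential integrity can be observed in a small sketch with Python's sqlite3 module. Two assumptions apply: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection, and the department and employee tables and the helper try_sql are names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (10, 1);
""")

def try_sql(sql):
    """Run one statement; return True on success, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A child row pointing at a non-existent parent is rejected.
orphan_insert_ok = try_sql("INSERT INTO employee VALUES (11, 99)")
# Deleting a parent row that is still referenced is rejected as well.
parent_delete_ok = try_sql("DELETE FROM department WHERE dept_id = 1")
```

Both statements fail, which is the database preventing actions that would break the link between the two tables.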

23. What are the advantages and disadvantages of views in a database?

Answer:

Advantages:

1. The result set of a view is not stored physically, so it doesn't consume extra disk space.
2. A view hides some of the columns and the complexity of joins from the user.
3. Views help limit data access to specific users.

Disadvantages:

1. When a table is dropped, the associated views become invalid.
2. Since the view's query runs each time data is requested from the view, it can be a bit slow.

24. What is a materialized view?

Answer:

It is a database object that contains the results of a query. Unlike Views, which are virtual tables composed of the result set of a SQL query, Materialized Views store the result set in a physical object like a table.

A materialized view can be indexed.

A materialized view is used to improve the response time of expensive operations, such as report queries which join two very large tables.

25. Explain the difference between DELETE, TRUNCATE and DROP commands?

Answer:

DELETE removes rows from a table. Until the change is committed, a Rollback can be performed to restore the data. A WHERE condition can be used along with a DELETE statement.

TRUNCATE removes all the rows of a table. In databases where it commits implicitly (such as Oracle), Rollback can't be performed. A WHERE condition can't be used along with a TRUNCATE statement.

DROP is used to drop a table definition and all the data, indexes, triggers, constraints and permission specifications for that table.
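
The DELETE and DROP behavior can be tried out in the sketch below, which uses Python's sqlite3 module. SQLite has no TRUNCATE statement, so that command is not shown; the orders table and the helper count_orders are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders VALUES (1, 'open'), (2, 'closed'), (3, 'closed');
""")

def count_orders():
    """Return the number of rows currently visible in the orders table."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# DELETE is DML: it accepts a WHERE clause and can be rolled back
# as long as the transaction has not been committed.
conn.execute("DELETE FROM orders WHERE status = 'closed'")
count_after_delete = count_orders()
conn.rollback()
count_after_rollback = count_orders()

# DROP removes the table definition itself together with its data.
conn.execute("DROP TABLE orders")
conn.commit()
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()[0] == 1
```

After the rollback the two deleted rows are back, while after DROP the table no longer appears in the catalog at all.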

14. Explain Function, Procedure and Package in PL/SQL.

Answer:

Function: Its main purpose is to compute and return a single value.

Procedure: It can consist of several SQL statements and can return multiple values through OUT parameters.

Package: It is a schema object which groups logically related functions, procedures, variables and record type declarations, thereby providing modularity to PL/SQL programs.

https://www.1keydata.com/datawarehousing/concepts.html

https://cognosbitech.wordpress.com/2018/08/10/data-warehouse-concepts-interview-questions-answers-part-i/

https://www.folkstalk.com/2012/12/how-delete-duplicate-records-table-oracle-sql.html#more
