
Define Data warehouse.

What are roles of education in a data warehousing delivery process?
Data Warehouse: In its simplest form, a data warehouse is a collection of key
pieces of information used to manage and direct the business for the most profitable
outcome. It helps decide, for example, the amount of inventory to be held, the number of employees to
be hired, and the amount to be procured on loan.
This definition is not precise, but that is the nature of data warehouse systems.
Different authors give different definitions; we proceed with this idea in
mind. A data warehouse is a large collection of data and a set of process managers that
use this data to make information available. The data can be metadata, facts,
dimensions and aggregations. The process managers can be load managers, warehouse
managers or query managers. The information made available allows
end users to make informed decisions.
Roles of education in a data warehousing delivery process:-
Education has two roles to play. The first is to make people, especially top-level policy makers,
comfortable with the concept. The second is to aid the prototyping activity. To
take care of education, an initial (usually scaled-down) prototype is
created and people are encouraged to interact with it. This helps achieve both
roles listed above: the users become comfortable with the use of the system,
and the warehouse developer becomes aware of the limitations of the prototype,
which can then be improved upon.
d) Give the architectures of data mining systems.

e) What are the guidelines for KDD environment ?


It is customary in the computer industry to formulate rules of thumb that help information technology (IT) specialists to apply new developments. In setting up
a reliable data mining environment, we may follow the guidelines below so that the KDD system works in the manner we desire.
i). Support extremely large data sets
ii). Support hybrid learning
iii). Establish a data warehouse
iv). Introduce data cleaning facilities
v). Facilitate working with dynamic coding
vi). Integrate with decision support system
vii). Choose extendible architecture
viii). Support heterogeneous databases
ix). Introduce client/server architecture
x). Introduce cache optimization
1. a) With the help of a diagram explain architecture of data warehouse.
The architecture for a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture. It only gives the major
items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific
requirements of the warehouse and the actual details of implementation.

Before looking into the details of each of the managers, we can get a broad idea of their functionality by mapping the processes that we studied in the
previous chapter to the managers. The extracting and loading processes are taken care of by the load manager. Cleanup and transformation of
data, as well as backup and archiving, are the duties of the warehouse manager, while the query manager, as the name implies, takes care of query
management.
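The mapping of processes to managers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record fields (`region`, `amount`) and the sample rows are all hypothetical, and a real warehouse would use database tooling rather than in-memory lists.

```python
# Hypothetical sketch: the three manager roles as plain functions.
# All names and sample records are illustrative assumptions.

def load_manager(source_rows):
    """Extract rows from a source system and load them into a staging list."""
    return [dict(row) for row in source_rows]

def warehouse_manager(staged_rows):
    """Clean up and transform staged rows before they enter the warehouse."""
    cleaned = []
    for row in staged_rows:
        if row.get("amount") is None:      # clean-up: drop incomplete records
            continue
        cleaned.append({
            "region": row["region"].strip().upper(),  # transformation: normalize codes
            "amount": float(row["amount"]),
        })
    return cleaned

def query_manager(warehouse_rows, region):
    """Answer a simple user query: total amount for one region."""
    return sum(r["amount"] for r in warehouse_rows if r["region"] == region)

source = [
    {"region": " north ", "amount": "120.5"},
    {"region": "south", "amount": None},
    {"region": "North", "amount": "79.5"},
]
warehouse = warehouse_manager(load_manager(source))
north_total = query_manager(warehouse, "NORTH")
print(north_total)
```

Backup and archiving, which the text also assigns to the warehouse manager, are left out of the sketch to keep it short.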

What is an event in data warehousing? List any five events.


An event is defined as a measurable, observable occurrence of a defined action. If this definition seems vague, it is because it encompasses a very large set of
operations. The event manager is software that continuously monitors the system for the occurrence of events and then takes whatever action is suitable
(note that the event is a measurable and observable occurrence). The action to be taken is normally specific to the event.
A partial list of the common events that need to be monitored is as follows:
i). Running out of memory space
ii). A process dying
iii). A process using excessive resources
iv). I/O errors
v). Hardware failure
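As a rough illustration of how an event manager pairs each measurable event with its own action, consider the sketch below. The `Event` class, the metric names (`free_disk_mb`, `loader_alive`, `cpu_percent`) and the thresholds are assumptions made for the example; they do not come from any particular warehouse product.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Event:
    name: str
    occurred: Callable[[dict], bool]   # measurable condition on a metrics snapshot
    action: Callable[[dict], str]      # action specific to this event

def run_event_manager(events: List[Event], snapshot: dict) -> List[str]:
    """Check every registered event against one snapshot and run its action."""
    return [e.action(snapshot) for e in events if e.occurred(snapshot)]

# Hypothetical events and thresholds, chosen only for illustration.
events = [
    Event("out_of_space",
          lambda m: m["free_disk_mb"] < 500,
          lambda m: "alert: free disk down to %d MB" % m["free_disk_mb"]),
    Event("process_died",
          lambda m: not m["loader_alive"],
          lambda m: "restart: loader process"),
    Event("excessive_cpu",
          lambda m: m["cpu_percent"] > 90,
          lambda m: "throttle: process at %d%% CPU" % m["cpu_percent"]),
]

snapshot = {"free_disk_mb": 120, "loader_alive": True, "cpu_percent": 97}
triggered = run_event_manager(events, snapshot)
for line in triggered:
    print(line)
```

A real event manager would take such snapshots continuously; here a single snapshot is checked once so that the behaviour is easy to follow.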

Define data marting. List the reasons for data marting.


The data mart stores a subset of the data available in the warehouse, so that one need not always scan through the entire content of the warehouse. It is
similar to a retail outlet. A data mart speeds up queries, since the volume of data to be scanned is much smaller. It also helps to have tailor-made processes for
different access tools, to impose control strategies, etc.
Following are the reasons for which data marts are created:
i) Since the volume of data scanned is small, they speed up query processing.
ii) Data can be structured in a form suitable for a user access tool.
iii) Data can be segmented or partitioned so that it can be used on different platforms and different control strategies become applicable.
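The idea of a data mart as a smaller subset of the warehouse can be illustrated with Python's built-in `sqlite3` module. The table names (`warehouse_sales`, `mart_north`), the columns and the rows are invented for this sketch; an actual mart would typically live in its own database or server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("north", "pen", 10.0), ("north", "ink", 4.0),
     ("south", "pen", 7.0), ("east", "pad", 3.0)],
)

# The "data mart": a smaller table holding only the rows one group of users needs,
# structured for them (the region column is no longer required).
conn.execute(
    "CREATE TABLE mart_north AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'north'"
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
mart_rows = conn.execute("SELECT COUNT(*) FROM mart_north").fetchone()[0]
mart_total = conn.execute("SELECT SUM(amount) FROM mart_north").fetchone()[0]
print(warehouse_rows, mart_rows, mart_total)
```

Queries against `mart_north` scan two rows instead of four; on realistic volumes this reduction is where the speed-up comes from.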

5. a) Explain how to categorize data mining system.


There are many data mining systems available or being developed. Some are specialized systems dedicated to a given data source or confined to limited
data mining functionalities; others are more versatile and comprehensive. Data mining systems can be categorized according to various criteria. Among the possible
classifications are the following:
a) Classification according to the type of data source mined: this classification categorizes data mining systems according to the type of data handled, such as
spatial data, multimedia data, time-series data, text data, World Wide Web, etc.
b) Classification according to the data model drawn on: this classification categorizes data mining systems based on the data model involved, such as relational
database, object-oriented database, data warehouse, transactional, etc.
c) Classification according to the kind of knowledge discovered: this classification categorizes data mining systems based on the kind of knowledge discovered
or data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive
systems offering several data mining functionalities together.
d) Classification according to mining techniques used: data mining systems employ and provide different techniques. This classification categorizes data
mining systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented
or data warehouse-oriented, etc.

b) List and explain different kind of data that can be mined.


Different kinds of data that can be mined are listed below:-
i). Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level.
ii). Relational Databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity
relationships.
iii). Data Warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to
be used as a whole under the same unified schema.
iv). Multimedia Databases: Multimedia databases include video, images, audio and text media. They can be stored on extended object-relational or
object-oriented databases, or simply on a file system.
v). Spatial Databases: Spatial databases are databases that, in addition to the usual data, store geographical information like maps, and global or regional
positioning.
vi). Time-Series Databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a
continuous flow of new data coming in, which sometimes creates the need for challenging real-time analysis.
vii). World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are
continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily.

Explain how decision trees are useful in data mining.


Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that they are
simple and that decision trees represent rules. Rules can readily be expressed so that humans can understand them, or in a database access language like SQL so
that records falling into a particular category may be retrieved.
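To make the link between a tree and its rules concrete, the sketch below walks a small hand-written tree and turns each root-to-leaf path into a rule, then into an SQL query. The tree, the attribute names (`income`, `age`), the thresholds and the `customers` table are all made up for illustration; no tree-learning algorithm is involved.

```python
# A tiny hand-written tree; attribute names and thresholds are made up.
tree = {
    "test": "income >= 50000",
    "yes": {"test": "age >= 30",
            "yes": {"label": "low_risk"},
            "no": {"label": "medium_risk"}},
    "no": {"label": "high_risk"},
}

def rules(node, conditions=()):
    """Walk the tree and yield (label, list of conditions) for every leaf."""
    if "label" in node:
        yield node["label"], list(conditions)
        return
    yield from rules(node["yes"], conditions + (node["test"],))
    yield from rules(node["no"], conditions + ("NOT (%s)" % node["test"],))

def to_sql(label, table="customers"):
    """Build a SELECT that retrieves the records falling into one category."""
    clauses = [" AND ".join(conds) for lbl, conds in rules(tree) if lbl == label]
    where = " OR ".join("(%s)" % c for c in clauses)
    return "SELECT * FROM %s WHERE %s" % (table, where)

all_rules = list(rules(tree))
low_risk_sql = to_sql("low_risk")
print(low_risk_sql)
```

Each leaf corresponds to exactly one rule: the conjunction of the tests on the path from the root. That is why a tree can be read by a person and also translated mechanically into a `WHERE` clause.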
What are three major activities of data ware house? Explain.
Three major activities of data ware house are :-
i) Populating the ware house (i.e. inclusion of data)
ii) day-to-day management of the ware house.
iii) Ability to accommodate the changes.

i). The processes to populate the ware house have to be able to extract the data, clean it up, and make it available to the analysis
systems. This is done on a daily / weekly basis depending on the quantum of the data population to be incorporated.
ii). The day-to-day management of the data warehouse is not to be confused with maintenance and management of hardware and
software. When large amounts of data are stored and new data are continually added at regular intervals, maintenance of the
quality of data becomes an important element.
iii). Ability to accommodate changes implies the system is structured in such a way as to be able to cope with future changes without
the entire system being remodeled. Based on these, we can view the processes that a typical data ware house scheme should support as
follows.

What is aggregation? Explain the need of aggregation. Give example.
Aggregation: Data aggregation is an essential component of any decision support data warehouse. It helps ensure
cost-effective query performance, which means that the costs incurred to get the answer to a query are
more than offset by the benefits of that answer. Data aggregation attempts to do this by reducing the processing
power needed to process the queries. However, too much aggregation leads to unacceptable levels of
operational costs, while too little aggregation may not improve performance to the required levels. A fine balance between
the two is essential to meet the requirements stated above. One rule of thumb that is often suggested is that about three out
of every four queries would be optimized by the aggregation process, whereas the fourth will take its own time to get
processed. The second, though minor, advantage of aggregations is that they reveal the overall trends in the data.
Such trends may not be obvious when looking at individual data items, whereas aggregated data helps us draw certain
conclusions easily.
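As an example, the sketch below pre-computes a monthly summary from individual sales records using Python's built-in `sqlite3` module. The table names (`sales`, `monthly_sales`), the columns and the figures are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2024-01-05", "A", 100.0), ("2024-01-19", "A", 50.0),
     ("2024-01-07", "B", 30.0), ("2024-02-02", "A", 80.0),
     ("2024-02-11", "B", 40.0), ("2024-02-25", "B", 80.0)],
)

# Pre-computed aggregation: one row per month instead of one row per sale.
conn.execute(
    "CREATE TABLE monthly_sales AS "
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total "
    "FROM sales GROUP BY substr(sale_date, 1, 7)"
)

detail_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary = conn.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()
print(detail_rows, summary)
```

A query for monthly totals now reads two summary rows rather than six detail rows, and the month-over-month trend is visible at a glance. The price is the extra storage and the work of keeping `monthly_sales` up to date, which is the operational cost the text warns about.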
Give the reasons for creating the data mart.
The following are the reasons for which data marts are created :-
i). Since the volume of data scanned is small, they speed up the query processing.
ii). Data can be structured in a form suitable for a user access tool.
iii). Data can be segmented or partitioned so that they can be used on different platforms and
also different control strategies become applicable.

26. Explain the two stages in setting up data marts.


There are two stages in setting up data marts :-
i). Decide whether data marts are needed at all. The facts listed above may help you to
decide whether it is worthwhile to set up data marts or to operate from the warehouse itself.
The problem is similar to that of a merchant deciding whether he wants to set up retail
shops or not.
ii). If you decide that setting up data marts is desirable, then the following steps have to be gone
through before you can freeze the actual strategy of data marting.
a) Identify the natural functional splits of the organization.
b) Identify the natural splits of data.
c) Check whether the proposed access tools require any special database structures.
d) Identify the infrastructure issues, if any, that can help in identifying the data marts.
e) Look for restrictions on access control. They can serve to demarcate the warehouse
details.

27. What are disadvantages of data mart?


There are certain disadvantages :-
i). The cost of setting up and operating data marts is quite high.
ii). Once a data strategy is put in place, the data mart formats become fixed. It may be fairly difficult to change the strategy later,
because the data mart formats also have to be changed.

28. What is role of access control issue in data mart design?


Role of access control issue in data mart design :-
This is one of the major constraints in data mart design. Any data warehouse, with its huge volume
of data, is more often than not subject to various access controls as to who can access which part of the data. The easiest case is where
the data is partitioned so clearly that a user of each partition cannot access any other data. In such cases, each of these partitions can be put in a
data mart and the user of each can access only his own data.
In the data warehouse, the data pertaining to all these marts is stored, but the partitioning is retained. If a super user wants to get an
overall view of the data, suitable aggregations can be generated.
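The easiest case described above, with cleanly partitioned data, can be sketched as follows. The department names, the user table and the function names are hypothetical; real systems would enforce this with database permissions rather than application code.

```python
# Warehouse rows keep their partitioning key (here: the department).
warehouse = [
    {"dept": "sales", "amount": 120.0},
    {"dept": "sales", "amount": 80.0},
    {"dept": "hr", "amount": 40.0},
]

# One mart per partition.
marts = {}
for row in warehouse:
    marts.setdefault(row["dept"], []).append(row)

user_dept = {"alice": "sales", "bob": "hr"}   # hypothetical access-control table

def read_mart(user):
    """Return only the rows in the mart this user is allowed to see."""
    return marts[user_dept[user]]

def super_user_view():
    """Aggregated overview across all partitions for a super user."""
    return {dept: sum(r["amount"] for r in rows) for dept, rows in marts.items()}

bob_rows = read_mart("bob")
overview = super_user_view()
print(bob_rows, overview)
```

Because every user maps to exactly one mart, no per-row permission check is needed; the super user sees only aggregated totals, not another user's detail rows.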

Explain the responsibilities of each manager of data ware house.


Ware house Manager :-
The warehouse manager is responsible for maintaining data of the ware house. It should also create
and maintain a layer of meta data. Some of the responsibilities of the ware house manager are
o Data movement
o Meta data management
o Performance monitoring
o Archiving.
Data movement includes the transfer of data within the ware house, aggregation, creation and
maintenance of tables, indexes and other objects of importance. It should be able to create new aggregations as well as
remove the old ones. Creation of additional rows / columns, keeping track of the aggregation processes and creating meta
data are also its functions.

25. What are the different system management tools used for data warehouse?
The different system management tools used for data warehouse :-
i). Configuration managers
ii). Schedule managers
iii). Event managers
iv). Database managers
v). Backup and recovery managers
vi). Resource and performance monitors

Q No. 4 [5]

Why data warehouse and transaction databases need to be different.


A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a database) that is designed for facilitating querying and analysis. Often designed as
OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently than your
regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.
Separation from the transaction database also ensures that your business intelligence solution is scalable (your bank and ATMs don't go down just because the CFO
asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in
the proverbial haystack of table proliferation), and can answer questions far more efficiently and frequently.

Creation of a DW leads to a direct increase in quality of analyses as the table structures are simpler (you keep only the needed information in simpler tables),
standardized (well-documented table structures), and often denormalized (to reduce the linkages between tables and the corresponding complexity of queries). A
DW drastically reduces the 'cost-per-analysis' and thus permits more analysis per FTE. Having a well-designed DW is the foundation successful BI/Analytics initiatives
are built upon.

If you are still running your reports off the main transaction database, answer this simple question: Would the solution still work next year with 20% more customers,
50% more business, 70% more users, and 300% more reports? What about the year after next? If you are sure that your solution will run without any changes,
great!! However, if you have already budgeted to buy new state-of-the-art hardware and 25 new Oracle licenses with those partition-options and the 33 other cool-sounding
features, good luck to you. (You can probably send me a ticket to Hawaii, since it's gonna cost you just a minute fraction of your budget)
After all, both are databases, and both have some tables containing data. If you look deeper, you'd find that both have indexes, keys, views, and the regular jing-bang.
So is that 'Data warehouse' really different from the tables in your application? And if the two aren't really different, maybe you can just run your queries and
reports directly from your application databases!
Well, to be fair, that may be just what you are doing right now, running some EOD (end-of-day) reports as complex SQL queries and shipping them off to those who
need them. And this scheme might just be serving you fine right now. Nothing wrong with that if it works for you.
But before you start patting yourself on the back for having avoided a data warehouse altogether, do spend a moment to understand the differences, and to
appreciate the pros and cons of either approach.
The primary difference between a transaction database and a data warehouse is that while the former is designed (and optimized) to record, the latter has to be
designed (and optimized) to respond to analysis questions that are critical for your business.
It's probably simpler and more sensible to create a new DW exclusively for your BI needs. And if you are cash strapped, you could easily do that at extremely low
costs by using excellent open source databases like MySQL.
