MC9280 - DATA MINING AND DATA WAREHOUSING

QUESTION BANK
UNIT-I
DATA WAREHOUSING

PART A
1. Define the term Data Warehouse.
2. Write down the applications of data warehousing.
3. When is a data mart appropriate?
4. List out the functionality of metadata.
5. What are the nine decisions in the design of a data warehouse?
6. List out the two different types of reporting tools.
7. Why is data mining used in all organizations?
8. What are the technical issues to be considered when designing and implementing a data
warehouse environment?
9. List out some of the examples of access tools.
10. What are the advantages of data warehousing?
11. Give the difference between horizontal and vertical parallelism.
12. Define star schema.
13. What are the steps to be followed to store data from an external source in the data warehouse?
14. Define Legacy data.
15. Draw the standard framework for metadata interchange.
16. List out the five main groups of access tools.
17. Define Data Visualization.
18. How is data warehouse different from database? How are they similar?
19. What is data transformation? Give example.
20. With an example, explain what metadata is.
21. What is a data mart?
22. Differentiate between OLAP and OLTP.
23. What is meant by OLAP?
24. List out the five categories of decision support tools.
25. What is a data cube?
26. List out any 5 OLAP guidelines.
27. Distinguish between multidimensional and multi-relational OLAP.
28. Define ROLAP and MQE.

PART-B
1. Enumerate the building blocks of data warehouse. Explain the importance of metadata in a
data warehouse environment. [16]
2. Explain various methods of data cleaning in detail. [8]
3. Diagrammatically illustrate and discuss the data warehousing architecture, and briefly explain the
components of a data warehouse. [16]
4. (i) Distinguish between data warehousing and data mining. [8]
(ii) Describe data extraction and cleanup in detail. [8]
5. Write short notes on
(i) Transformation [8]
(ii) Metadata [8]
6. List and discuss the steps involved in mapping the data warehouse to a multiprocessor
architecture. [16]
7. Discuss in detail about Bitmapped Indexing [16]
8. Explain in detail about different Vendor Solutions. [16]
9. Discuss the typical OLAP operations with an example. [6]
10. List and discuss the basic features that are provided by reporting
and query tools used for business analysis. [16]
11. Describe in detail about Cognos Impromptu [16]
12. Explain about OLAP in detail. [16]
13. With relevant examples discuss multidimensional online analytical processing and multi-
relational online analytical processing. [16]
14. Discuss about the OLAP tools and the Internet [16]
15. (i) Explain the multidimensional data model. [10]
(ii) Discuss how computations can be performed efficiently on data cubes. [6]

UNIT-II
DATA MINING AND ASSOCIATION RULE

PART A
1. State why data preprocessing is an important issue for both data warehousing and data mining.
2. What is the need for discretization in data mining?
3. What are the various forms of data preprocessing?
4. What is a concept hierarchy? Give an example.
5. What are the various forms of data preprocessing?
6. Mention the various tasks to be accomplished as part of data pre-processing.
7. Define Data Mining.
8. List out any four data mining tools.
9. What do data mining functionalities include?
10. Define patterns.
11. What are the various forms of data preprocessing?
12. What is correlation analysis?
13. Give the different data formats for data mining.
14. Explain the purpose of data reduction.
15. List the interestingness measures of an association rule.
16. What do you mean by segmentation in the discretization of numerical values?
17. Describe inter-attribute and intra-attribute relationships.
18. What is meant by market basket analysis?
19. What is the use of multilevel association rules?
20. What is meant by pruning in decision tree induction?
21. Write the two measures of Association Rule.
22. With an example explain correlation analysis.
23. Define conditional pattern base.

PART-B
1. (i) Explain the various primitives for specifying a data mining task. [10]
(ii) Describe the various descriptive statistical measures for data mining. [6]
2. Discuss the different types of data and functionalities. [16]
3. (i) Describe in detail the interestingness of patterns. [10]
(ii) Explain data mining task primitives in detail. [6]
4. (i) Discuss the different issues of data mining. [6]
(ii) Explain data preprocessing in detail. [10]
5. How are data mining systems classified? Discuss each classification with an
example. [16]
6. How can a data mining system be integrated with a data warehouse? Discuss with
an example. [16]
7. Discuss in detail the various steps involved in KDD with a suitable illustration.
8. Explain an efficient and scalable method for finding frequent item sets with a suitable example.
9. How can data be preprocessed so as to improve the efficiency of the data mining process?
10. Explain the uses and steps involved in the Apriori algorithm. Discuss it with a suitable example.
11. Write different approaches to data transformation. Explain various methods of data cleaning
with examples.



UNIT-III
CLASSIFICATION AND PREDICTION

PART A
1. What is constraint-based mining?
2. Give the main advantage in employing a lazy learning method.
3. Distinguish between clustering and classification.
4. What is the role of classification?
5. What is meant by network pruning?
6. List out the major strength of decision tree method.
7. In classification trees, what are the surrogate splits, and how are they used?
8. What assumptions does the Naive Bayes classifier make that motivate its name?
9. What is the frequent item set property?
10. List out the major strength of the decision tree Induction.
11. Write the two measures of association rule.
12. How are association rules mined from large databases?
13. What is tree pruning in decision tree induction?
14. What is the use of multi level association rules?
15. What are the Apriori properties used in the Apriori algorithm?
16. How is prediction different from classification?
17. What is a support vector machine?
18. What are the means to improve the performance of association rule mining algorithm?
19. State the advantages of the decision tree approach over other approaches for performing
classification.

PART-B
1. Decision tree induction is a popular classification method. Taking one typical decision tree
induction algorithm briefly outline the method of decision tree classification. [16]
2. Consider the following training dataset and the original decision tree induction algorithm
(ID3). Risk is the class label attribute. The Height values have already been discretized into
disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate
the information gain if Height is chosen as the test attribute. Draw the final decision tree (without
any pruning) for the training dataset. Generate all the IF-THEN rules from the decision tree. [16]
Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium
(a) Given the following transactional database
1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O
(i) We want to mine all the frequent item sets in the data using the Apriori algorithm.
Assume the minimum support level is 30%. (You need to give the set of frequent item sets in L1,
L2 and the candidate item sets in C1, C2.) [9]
(ii) Find all the association rules that involve only B, C, H (on either the left or right hand side of the
rule). The minimum confidence is 70%. [7]
3. Describe the multi-dimensional association rule, giving a suitable example. [16]
4. (a) Explain the algorithm for constructing a decision tree from training samples. [12]
(b) Explain Bayes' theorem. [4]
6. Develop an algorithm for classification using Bayesian classification. Illustrate the algorithm
with a relevant example. [16]
7. Discuss the approaches for mining multi level association rules from the transactional
databases. Give relevant example. [16]
8. Write and explain the algorithm for mining frequent item sets without candidate generation.
Give relevant example. [16]
9. How is attribute oriented induction implemented? Explain in detail. [16]
10. Discuss in detail about Bayesian classification [8]
11. A database has four transactions. Let min_sup = 60% and min_conf = 80%.
TID   DATE      ITEMS_BOUGHT
T100  10/15/07  {K,A,B}
T200  10/15/07  {D,A,C,E,B}
T300  10/19/07  {C,A,B,E}
T400  10/22/07  {B,A,D}
Find all frequent item sets using Apriori and FP-growth, respectively. Compare the efficiency of
the two mining processes. [16]
12. Explain briefly the issues regarding classification and prediction with a relevant example.
13. Discuss Bayesian classification and explain how the Bayesian approach is applied to
datasets.
14. What is a neural network? Explain the back propagation algorithm.
15. Explain the ID3 decision tree classification algorithm with an example (12).
Explain tree pruning with an example (4).
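
Practice aid for the Apriori questions above (for example question 11): the following is a minimal Python sketch of level-wise frequent item set mining, offered as one possible way to check a hand-worked answer. It is not a prescribed solution. The function name apriori, the helper keep_frequent, and the variable names db and frequent_sets are illustrative assumptions; the transactions are copied from question 11 and min_support is given as a fraction of the number of transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(item set): support count} for all frequent item sets.

    transactions: list of sets of items
    min_support:  minimum support as a fraction of the number of transactions
    """
    n = len(transactions)

    def keep_frequent(candidates):
        # Count how many transactions contain each candidate, then filter.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k >= min_support * n}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(level)

    size = 2
    while level:
        previous = set(level)
        # Join step: combine frequent (size-1)-item sets into size-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == size}
        # Prune step (Apriori property): every (size-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, size - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        size += 1
    return frequent


# Transactions T100..T400 from question 11, with min_sup = 60%.
db = [set("KAB"), set("DACEB"), set("CABE"), set("BAD")]
frequent_sets = apriori(db, 0.6)
```

Each key of frequent_sets is a frozenset of items and each value is its support count, so the result can be compared level by level with the L1, L2, ... sets worked out by hand.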

UNIT-IV
CLUSTERING ANALYSIS

Part A
1. What are the requirements of clustering?
2. List the types of data in cluster analysis.
3. Give the categorization of major clustering methods.
4. Distinguish between classification and clustering.
5. Define K-Medoids.
6. Give various density-based methods.
7. What is the objective function of K-means algorithm?
8. Mention the advantages of Hierarchical clustering.
9. Define Partitioning Methods.
10. List the requirements of clustering in data mining.
11. Give model-based methods.
12. Give grid-based methods.
13. What is a nominal variable?
14. What is the use of competitive learning?
15. What is cluster analysis?
16. What are the two data structures in cluster analysis?
17. What is an outlier? Give example.
18. Define Euclidean distance.
19. List the characteristics of grid-based clustering methods.
20. Name any two grid-based clustering methods.
21. Give the main advantage in employing a lazy learning method.

PART-B
1. BIRCH and CLARANS are two interesting clustering algorithms that perform effective
clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [10]
(ii) Compare and outline the major differences of the two scalable clustering algorithms
BIRCH and CLARANS. [6]
2. Explain the different ways of representing clusters for hierarchical clustering with an example.
3. Discuss and elaborate the current trends in data mining. [6+5+5]
4. With a suitable example, explain K-Means Clustering algorithm.
5. Explain the following clustering methods in detail.
(a) BIRCH (b) CURE [16]
6. Describe the working of the PAM (Partitioning Around Medoids) algorithm.
7. Write short notes on
(i) Partitioning methods [8]
(ii) Outlier analysis [8]
8. Describe K means clustering with an example. [16]
9. Describe hierarchical methods in detail.
10. With a relevant example, discuss constraint-based cluster analysis. [16]

UNIT-V
MINING OBJECT, SPATIAL, MULTIMEDIA, TEXT AND WEB DATA

PART A
1. What is spatial mining?
2. What are the applications of spatial databases?
3. What is text mining?
5. Define a spatial database.
6. List out any two commercial data mining tools.
7. What is web usage mining?
8. What is text mining?
9. What are the applications of spatial databases?
10. What is audio data mining?
11. List two applications of data mining.
12. List the types of data which can be stored in multimedia database.
13. What is sequential pattern mining?
14. List the major challenges faced in bringing data mining research to market.
15. Give three advantages of web mining.
16. Define privacy preserving data mining.
17. What are the data mining tasks performed on a text database?
18. List out any two commercial data mining tools.
19. What is web mining? Give its classification.
20. List the advantages of web mining.

PART B
1. Explain the major function of data mining in multimedia and how it can be used for
surveillance systems.
2. Discuss the importance of multidimensional analysis with complex data in detail.
3. Explain how data mining is used in financial data analysis.
4. Write short notes on the following:
(i) Spatial data mining (ii) Multimedia data mining
5. Explain in detail the concepts involved in mining spatial databases.
6. Discuss web usage mining in detail with an example.
7. How is web usage mining different from web structure mining and web content mining?
Discuss the social impacts and various trends in data mining.
8. In what way is text mining related to web mining? Briefly describe the techniques that are currently
used in the field of text mining.
9. (i) Explain the need for data mining in the retail industry. (ii) Discuss in detail any one data
mining tool.
10. (i) Explain how data mining is used for intrusion detection.
(ii) Discuss the various steps involved in text mining.
11. Write a short note on web mining taxonomy. Explain the different activities of text mining.
12. Discuss spatial databases and text databases. [16]
13. What is a multimedia database? Explain the methods of mining multimedia databases. [16]
14. Discuss in detail any four data mining applications. [16]
