Code: 15A05602 R15

B.Tech III Year I Semester (R15) Regular & Supplementary Examinations November/December 2018
DATA WAREHOUSING & MINING
(Information Technology)
Time: 3 hours Max. Marks: 70
PART – A
(Compulsory Question)
*****
1 Answer the following: (10 X 02 = 20 Marks)
(a) What is data reduction?
(b) Mention the various tasks to be accomplished as part of data pre-processing.
(c) What are the uses of multifeature cubes?
(d) Explain the differences between star and snowflake schema.
(e) List two interesting measures for association rules.
(f) Define a predictive model.
(g) What do you mean by cluster analysis?
(h) Define centroid distance.
(i) What do you understand by the term ‘mining data streams’?
(j) What is graph mining?

PART – B
(Answer all five units, 5 X 10 = 50 Marks)
UNIT – I
2 What is a data mining system? Illustrate in detail the integration of a data mining system with a data
warehouse system.
OR
3 What is the data mining process? Explain its steps and classify the various data mining systems.
UNIT – II
4 Why are multidimensional views of data and data cubes used? Explain data cube implementations
in detail.
OR
5 State the differences between the three main types of data warehouse usage: Information
processing, analytical processing and data mining. Describe in detail the characteristics of an OLAP
system.
UNIT – III
6 Consider the training samples shown in the following table for a binary classification problem.
Instance   A1   A2   A3    Target class
1          T    T    1.0   +
2          T    T    6.0   +
3          T    F    5.0   -
4          F    F    4.0   -
5          F    T    7.0   -
6          F    T    3.0   -
7          F    F    8.0   -
8          T    F    7.0   +
9          F    T    5.0   -
(i) What is the entropy of this collection of training samples with respect to the positive class?
(ii) What are the information gains of A1 and A2 relative to these training samples?
(iii) For A3, which is a continuous attribute, compute the information gain for every possible split.
(iv) What is the best split (among A1, A2, A3) according to the information gain?
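
For reference, the quantities asked for in (i) to (iii) can be set up as in the minimal Python sketch below. It assumes the standard definitions (entropy as the sum of -p log2 p over classes, information gain as parent entropy minus the weighted entropy of the partitions) and takes candidate thresholds for A3 as midpoints between consecutive distinct values, which is one common convention and not something the question prescribes. The names SAMPLES, entropy, info_gain and candidate_thresholds are illustrative only.

```python
from math import log2

# Training samples transcribed from the table above: (A1, A2, A3, class)
SAMPLES = [
    ("T", "T", 1.0, "+"),
    ("T", "T", 6.0, "+"),
    ("T", "F", 5.0, "-"),
    ("F", "F", 4.0, "-"),
    ("F", "T", 7.0, "-"),
    ("F", "T", 3.0, "-"),
    ("F", "F", 8.0, "-"),
    ("T", "F", 7.0, "+"),
    ("F", "T", 5.0, "-"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    total = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        total -= p * log2(p)
    return total


def info_gain(samples, key):
    """Information gain of partitioning samples by key(sample)."""
    partitions = {}
    for s in samples:
        partitions.setdefault(key(s), []).append(s[-1])
    remainder = sum(
        len(part) / len(samples) * entropy(part) for part in partitions.values()
    )
    return entropy([s[-1] for s in samples]) - remainder


def candidate_thresholds(samples, index):
    """Midpoints between consecutive distinct values (an assumed convention)."""
    values = sorted({s[index] for s in samples})
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print("entropy:", entropy([s[-1] for s in SAMPLES]))
    print("gain(A1):", info_gain(SAMPLES, lambda s: s[0]))
    print("gain(A2):", info_gain(SAMPLES, lambda s: s[1]))
    for t in candidate_thresholds(SAMPLES, 2):
        print(f"gain(A3 <= {t}):", info_gain(SAMPLES, lambda s: s[2] <= t))
```

Comparing the printed gains then gives the split asked for in (iv).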
OR
7 Write the algorithm to discover frequent item sets without candidate generation and explain it with an
example.
UNIT – IV
8 Briefly describe and give examples of each of the following approaches to clustering:
Partitioning methods, hierarchical methods, density-based methods.
OR
9 Use the k-means algorithm and Euclidean distance to cluster the following samples into 3 clusters:
x1(2,10); x2(2,5); x3(8,4); y1(5,8); y2(7,5); y3(6,4); z1(1,2); z2(4,9).
Suppose that the initial seeds (cluster centres) are x1, x4 and z2. Run the k-means algorithm for
1 epoch. At the end of this epoch show:
(i) The new cluster.
(ii) The center of the new clusters.
(iii) Draw a 10 by 10 space with all the 10 points and show the cluster.
(iv) How many more iterations are needed to converge? Draw the result for each point.
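
For reference, one epoch of k-means (assign every point to its nearest centre by Euclidean distance, then recompute each centre as the mean of its members) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name kmeans_epoch is illustrative, ties go to the lowest-numbered centre, an empty cluster keeps its old centre, and the points in the usage lines are a hypothetical toy set, not the samples of this question.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_epoch(points, centres):
    """Run one assignment step and one update step of k-means."""
    clusters = [[] for _ in centres]
    for p in points:
        # Index of the nearest centre; ties resolve to the lowest index.
        nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
        clusters[nearest].append(p)

    new_centres = []
    for members, old in zip(clusters, centres):
        if members:
            # Coordinate-wise mean of the cluster members.
            new_centres.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            # Assumption: an empty cluster keeps its previous centre.
            new_centres.append(old)
    return clusters, new_centres


if __name__ == "__main__":
    # Hypothetical toy data, chosen only to illustrate the call.
    toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    toy_seeds = [(0, 0), (10, 0)]
    clusters, centres = kmeans_epoch(toy_points, toy_seeds)
    print(clusters)
    print(centres)
```

Calling the function repeatedly, feeding the returned centres back in until they stop changing, gives the number of further iterations needed for convergence.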
UNIT – V
10 Discuss web mining in detail. Explain the algorithms used in mining the structure and content
of the web with suitable applications.
OR
11 What are multimedia databases? Explain the methods of mining multimedia databases.

*****
