
1. Three properties that define data quality are


a) Accuracy, completeness, consistency
b) Believability, interpretability, timeliness
c) Noise, completeness, missing entries
d) Consistency, correctness, timeliness

2. Data reduction includes


a) Dimensionality Reduction
b) Numerosity reduction
c) Discretization
d) All of the above

3. The measure used to evaluate the correlation for nominal data is


a) Chi-square test b) Pearson’s coefficient
c) Covariance d) Regression

4. To correct missing values


a) The tuple can be ignored
b) Use Bayesian inference to identify the most probable value
c) Use measures of central tendency
d) All of the above

5. Market-basket problem was formulated by __________.


a) Agrawal et al. b) Steve et al.
c) Toda et al. d) Simon et al.

6. Reducing the number of attributes to solve the high dimensionality problem is called
________.
a) Dimensionality curse b) Dimensionality reduction
c) Cleaning d) Overfitting

7. Which of the following is not a data mining metric?


a) Space Complexity b) Time Complexity
c) ROI d) All of the above

8. Data that are not of interest to the data mining task are called ______
a) Missing data b) Noisy Data
c) Irrelevant Data d) Uncorrelated data

9. Noisy data can be cleaned using


a) Binning b) Scrubbing
c) Regression d) All of the above

10. The most effective method for calculating support and generating frequent patterns
is
a) Apriori b) FP Growth Algorithm
c) Partition Algorithm d) None of the above

11. The full form of KDD is [ ]
a) Knowledge Discovery
b) Knowledge Discovery in Databases
c) Knowledge data definition
d) Knowledge from Data warehouse to Data Mining

12. The other names of Data Mining are [ ]


a) Knowledge Extraction b) Data Dredging
c) Data Archaeology d) All of the above

13. Which of the following is not a Data Mining task? [ ]


a) Frequent Pattern Mining b) Classification
c) Outlier analysis d) Machine Learning

14. Noisy data can be cleaned using [ ]


a) Binning b) Scrubbing
c) Regression d) All of the above

15. Which of the following is not an attribute selection measure? [ ]


a) Gain Ratio b) Gini Index
c) Information Gain d) Probability

16. __________ describes the data contained in the data warehouse [ ]


a) Relational Data b) Relational Data
c) Metadata d) Metadata

17. The data is stored, retrieved & updated in ____________ [ ]


a) OLAP b) OLTP
c) SMTP d) FTP

18. The expansion of DSS in data warehousing is __________ [ ]


a) Data Summary System b) Decision Support System
c) Data Warehouse Storage System d) Database Support System

19. The type of relationship in star schema is [ ]


a) Many-to-Many b) One-to-Many
c) One-to-One d) Many-to-One

20. Discovery of cross-sales opportunities is called ________________. [ ]


a) Segmentation b) Visualization
c) Correlation d) Association

21. Which of the following is not an attribute selection method? [ ]


a) Stepwise forward selection b) Attribute construction
c) Stepwise backward elimination d) Attribute transformation

22. Reducing the number of attributes to solve the high dimensionality problem is called
________. [ ]
a) Dimensionality curse b) Dimensionality reduction
c) Cleaning d) Overfitting

23. The proportion of transactions in T that support X is called _________ [ ]


a) Support b) Confidence
c) Support Count d) None of the Above
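
For reference alongside the association-rule questions, here is a minimal sketch of how two common itemset measures, the proportion of transactions containing an itemset and the conditional probability of a consequent given an antecedent, might be computed. The toy transaction list and the function names `support` and `confidence` are illustrative assumptions and are not part of the original question set.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    sup_a = support(transactions, a)
    if sup_a == 0:
        return 0.0  # antecedent never occurs, so the ratio is undefined
    return support(transactions, both) / sup_a


# Hypothetical toy data: five market-basket transactions.
transactions = [
    frozenset({"bread", "jam"}),
    frozenset({"bread"}),
    frozenset({"jam", "milk"}),
    frozenset({"bread", "jam", "milk"}),
    frozenset({"milk"}),
]

print(support(transactions, {"bread", "jam"}))       # 2 of 5 transactions
print(confidence(transactions, {"bread"}, {"jam"}))  # 2 of the 3 bread transactions
```

The same ratio applies unchanged to larger counts: divide the number of transactions containing the itemset by the total number of transactions.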

1. The measure used to evaluate the correlation for nominal data is __________
2. Principal component analysis computes ______________ vectors
3. Discovery of cross-sales opportunities is called ________________
4. If T consists of 500,000 transactions, of which 20,000 contain bread, 30,000
contain jam, and 10,000 contain both bread and jam, then the support of bread
and jam is _______
5. The first phase of the Apriori algorithm is _______
6. ___________ predicts future trends & behaviors, allowing business managers to make
proactive, knowledge-driven decisions.
7. Removing duplicate records is a process called _____________ recovery
8. The structure generated in FP Growth Algorithm is called ___________
9. The major steps in classification are ________ and ____________
10. Support is defined as the ratio of the number of transactions containing a particular
itemset to the _______ of transactions
11. Replacing missing values can be done using ____________________
12. The right hand side of an association rule is called __________
13. The ________ is defined as the conditional probability that a set of items is bought,
given that another item has already been bought
14. The term __________ is a misnomer
15. The methods for smoothing data are ___________ and ____________
16. The ___________________ is a data transformation method where the raw values of a
numeric attribute (e.g., age) are replaced by interval labels (e.g., 0–10, 11–20, etc.) or
conceptual labels (e.g., youth, adult, senior).
17. __________ is a process where new attributes are constructed and added from the
given set of attributes to help the mining process
18. __________ is an example of a wavelet family
19. The second phase of the Apriori algorithm is _______.
20. The Apriori property states that all ____________ subsets of a frequent itemset must also be
frequent.
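
Item 16 above describes replacing the raw values of a numeric attribute such as age with interval labels or conceptual labels. A minimal sketch of that transformation follows. The interval boundaries mirror the 0-10, 11-20 pattern from the item itself, while the function names and the age cut-offs for "youth", "adult", and "senior" are assumptions chosen only for illustration.

```python
def interval_label(age: int, width: int = 10) -> str:
    """Map an age to an interval label such as '0-10', '11-20', '21-30'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= width:
        # The first interval also includes 0, matching the '0-10' example.
        return f"0-{width}"
    lower = ((age - 1) // width) * width + 1
    upper = lower + width - 1
    return f"{lower}-{upper}"


def concept_label(age: int) -> str:
    """Map an age to a conceptual label; the cut-offs here are assumed."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 20:        # assumed boundary for 'youth'
        return "youth"
    if age < 60:         # assumed boundary for 'adult'
        return "adult"
    return "senior"


ages = [7, 11, 20, 21, 35, 70]
print([interval_label(a) for a in ages])
print([concept_label(a) for a in ages])
```

In practice the interval width and the concept boundaries would come from domain knowledge or from the data itself rather than being fixed as they are here.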
