4. Differentiate between database management systems (DBMS) and data mining.

Ans. A database management system (DBMS) is the software that manages data on physical storage devices. Data mining is the process of discovering relationships among the data in a database.

Area: DBMS
Task: Extraction of detailed and summary data
Type of result: Information
Method: Deduction (ask the question, verify the data)
Example question: Who purchased mutual funds in the last 3 years?

Area: Data mining
Task: Knowledge discovery of hidden patterns and insights
Type of result: Insight and prediction
Method: Induction (build the model, apply it to new data, get the result)
Example question: Who will buy a mutual fund in the next 6 months and why?
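The deduction/induction contrast can be sketched in a few lines of Python. This is only an illustration: the customer records, the names, and the single-threshold rule on age are all made up for the example, and real data mining would use a proper model rather than one cutoff.

```python
# Hypothetical customer records: (name, age, bought_fund_in_last_3_years)
customers = [
    ("Asha", 52, True),
    ("Ben", 23, False),
    ("Chen", 47, True),
    ("Dina", 31, False),
    ("Eli", 58, True),
    ("Fay", 28, False),
]

# DBMS-style deduction: ask a question whose answer is already stored.
past_buyers = [name for name, age, bought in customers if bought]


# Data-mining-style induction: build a model from the data, then apply it
# to new records for which the database holds no answer yet.
def learn_age_threshold(records):
    """Pick the age cutoff that classifies the most known records correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({age for _, age, _ in records}):
        correct = sum((age >= cutoff) == bought for _, age, bought in records)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


cutoff = learn_age_threshold(customers)

# Hypothetical prospects with no purchase history.
prospects = [("Gus", 50), ("Hana", 25)]
predicted_buyers = [name for name, age in prospects if age >= cutoff]

print("Bought in the last 3 years:", past_buyers)
print("Learned rule: age >=", cutoff)
print("Predicted future buyers:", predicted_buyers)
```

The first query only reads back stored facts. The second step generalizes from them, which is why its answer is a prediction that can turn out to be wrong.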

Data mining is concerned with finding hidden relationships in business data so that businesses can make predictions about the future. It is the data-driven extraction of non-obvious but useful information from large databases. The aim of data mining is to extract implicit, previously unknown and potentially useful (or actionable) patterns from data. Data mining draws on many up-to-date techniques, such as classification (decision trees, naive Bayes classifier, k-nearest neighbor, and neural networks), clustering (k-means, hierarchical clustering, and density-based clustering), and association (one-dimensional, multidimensional, multilevel, and constraint-based association).

Data warehousing is defined as a process of centralized data management and retrieval. A data warehouse is a relational database system designed to support very large databases (VLDB) at a significantly higher level of performance and manageability. A data warehouse is an environment, not a product. It is an architectural construct that provides information which is hard to access or present in traditional operational data stores.

5. Differentiate between K-means and Hierarchical clustering.

Ans. Hierarchical clustering is the sort that you might apply when there is a "tree" structure to the data. Think of the classification of living things. At the top are all of them, which then split into plants, animals and other things such as fungi. Once you are on the animal branch, this splits into mammals, reptiles, etc., and you can keep going until you get down to individual species. AT NO TIME, once things have been split off from the rest of the data onto one of the branches, do subsets ever move to other branches. You might think about whether this is appropriate for your data. Once you have split your data into two sets this split is final, and the process only subdivides further: nothing from set one ever moves into set two.
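The "splits are final" property can be checked on a toy example. The sketch below is an assumption-laden illustration, not a reference implementation: it uses made-up one-dimensional points and builds the tree bottom-up by merging (single-linkage agglomerative clustering) rather than top-down by splitting as described above. Read from the top of the tree downwards, the result is the same kind of nested hierarchy.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns every level of the tree, from n singleton clusters down to 1.
    """
    clusters = [[p] for p in sorted(points)]
    levels = [[tuple(c) for c in clusters]]
    while len(clusters) > 1:
        # On sorted 1-D data the two closest clusters are always neighbours.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels.append([tuple(c) for c in clusters])
    return levels


def is_nested(finer, coarser):
    """True if every cluster in `finer` lies wholly inside one in `coarser`."""
    return all(any(set(f) <= set(c) for c in coarser) for f in finer)


levels = agglomerate([1.0, 1.5, 5.0, 5.4, 9.0])  # made-up data
by_k = {len(level): level for level in levels}

print("3 clusters:", by_k[3])
print("2 clusters:", by_k[2])
print("Every level nested in the next:",
      all(is_nested(levels[i], levels[i + 1]) for i in range(len(levels) - 1)))
```

Because each level is obtained from the previous one by a single merge, the two-cluster solution is always the three-cluster solution with two groups joined, whatever the data.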

K-means clustering does not assume a tree structure. In its pure form you might ask the computer to split these data values into three groups or into four groups, but you can't guarantee that merging two groups from the four-group solution will produce the three-group solution.

If you have only two or three dimensions (or can sensibly reduce your data by factor analysis) you can plot the data and see what sort of relationships you have. Are you looking for nice spherical clusters, or are long chains more suitable? You might consider that your data values were generated from multivariate normal random variables, from groups with different means, and you might consider how best to identify these groups and their means. Sometimes data values fall into such clear groups that almost all clustering methods will find the same clusters. Where the boundaries are fuzzy, the solutions may be very different.

I'll end with a little parable. Suppose I have a very willing idiot working for me, and I ask him to arrange my books nicely. He might do this by author, by subject, by the colour of the cover, by the size of the book, by weight, or by date of publication. If I simply ask for a "nice arrangement" I ought not to complain about any of these, and I might find one or more of them useful. If you just ask SPSS to use cluster analysis to produce a "nice arrangement" then, depending on the method chosen, the order of the data and a possible random element, you might get one of many rather different nice arrangements, and the "best" of these depends on what you want the clustering for.
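The earlier point that the three-group k-means solution need not come from merging two groups of the four-group solution can be seen on a small example. The sketch below is a plain Lloyd's-style k-means on made-up, evenly spaced one-dimensional data (a deliberately "fuzzy boundary" case). The starting centers are hand-picked assumptions so that the run is deterministic; a statistics package would normally choose them itself, often randomly.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to its group's mean
        # (an empty group keeps its old center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


points = [float(x) for x in range(1, 13)]  # made-up data: 1.0, 2.0, ..., 12.0

three = kmeans_1d(points, [1.0, 6.2, 12.0])       # hand-picked starts
four = kmeans_1d(points, [1.5, 5.0, 8.0, 11.5])   # hand-picked starts

# Does every group of the 4-group solution sit inside a group of the 3-group one?
nested = all(any(set(g4) <= set(g3) for g3 in three) for g4 in four)

print("k=3:", three)
print("k=4:", four)
print("4-group solution nested in 3-group solution:", nested)
```

Here the four-group run produces groups of three points and the three-group run produces groups of four, so the group containing 4, 5 and 6 is cut in two when k drops to three. A hierarchical method could never do this.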

6. Differentiate between Web content mining and Web usage mining.


Ans.
