The information and knowledge gained can be used for applications ranging
from business management, production control, and market analysis, to
engineering design and science exploration.
Fig: The evolution of database technology
Q) Explain Knowledge Discovery in Databases (KDD). (or)
Explain Data mining architecture with a neat sketch.
Data mining is the process of extracting or mining knowledge from large
amounts of data. Here is the list of steps involved in the knowledge
discovery process:
Data Cleaning - In this step, noise and inconsistent data are removed.
Data Integration - In this step, multiple data sources are combined.
Data Selection - In this step, data relevant to the analysis task are
retrieved from the database.
Data Transformation - In this step data are transformed or consolidated
into forms appropriate for mining by performing summary or aggregation
operations.
Data Mining - In this step intelligent methods are applied in order to extract
data patterns.
Pattern Evaluation - In this step, data patterns are evaluated.
Knowledge Presentation - In this step, knowledge is represented in various
forms like charts and graphs.
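The steps above can be sketched as a small pipeline. This is an illustrative sketch only: the function names (`clean`, `transform`) and the toy record format are assumptions for the example, not part of any standard API.

```python
# Hypothetical raw data: duplicates and missing values count as noise.
raw = [
    {"item": "milk",  "qty": 2},
    {"item": "milk",  "qty": 2},      # duplicate record (noise)
    {"item": "bread", "qty": None},   # inconsistent record
]

def clean(records):
    """Data Cleaning: drop records with missing values and duplicates."""
    seen, out = set(), []
    for r in records:
        key = (r["item"], r["qty"])
        if r["qty"] is not None and key not in seen:
            seen.add(key)
            out.append(r)
    return out

def transform(records):
    """Data Transformation: consolidate by a summary (aggregation) step."""
    totals = {}
    for r in records:
        totals[r["item"]] = totals.get(r["item"], 0) + r["qty"]
    return totals

print(transform(clean(raw)))  # → {'milk': 2}
```

A real KDD pipeline would add the integration, selection, mining, and evaluation stages between these two functions in the same chained style.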
Fig: The architecture of a typical data mining system
1. Relational Databases:
2. Data Warehouses:
3. Transactional Database:
4. Object-Relational Databases:
These are constructed based on the object-relational data model. This model
extends the relational model by providing rich data types for handling
complex objects and object orientation. Class hierarchies and object-
inheritance properties are added to the relational data model.
5. Spatial Database:
6. Temporal Databases and Time-Series Databases:
Both use time-related data. A temporal database usually stores relational
data that include time-related attributes. A time-series database stores
sequences of values that change with time, such as data collected regarding
the stock exchange.
7. Text Databases and Multimedia Databases:
Text databases are databases that contain word descriptions for objects.
These descriptions may not be simple keywords but rather long sentences
or paragraphs.
Multimedia databases store image, audio, and video data. Specialized storage
and search techniques are required to access multimedia databases.
8. Heterogeneous Databases and Legacy Databases:
A heterogeneous database consists of component databases in which objects
may differ greatly from one another. A legacy database is a group of
heterogeneous databases that combines different kinds of data systems.
9. World Wide Web:
Data mining functionalities are used to specify the kinds of patterns to be
found in data mining tasks. Descriptive mining tasks characterize the
general properties of the data; predictive mining tasks perform inference on
the current data in order to make predictions.
Association Analysis:
Frequent patterns are patterns that occur frequently in data. A frequent
itemset typically refers to a set of items that often appear together in a
transactional data set,
Eg. milk and bread, which are frequently purchased together.
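A minimal sketch of frequent-itemset counting over a toy transactional data set. The transactions and the `min_support` threshold are made up for illustration; full algorithms such as Apriori extend this idea to itemsets of every size.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactional data set: each transaction is a set of items.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]
min_support = 3  # an itemset must appear in at least 3 transactions

def frequent_pairs(db, min_sup):
    """Count every 2-itemset and keep those meeting the support threshold."""
    counts = Counter()
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_sup}

print(frequent_pairs(transactions, min_support))
# → {('bread', 'milk'): 3}
```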
Cluster Analysis:
Clustering analyzes data objects without consulting known class labels; the
objects are clustered based on the principle of maximizing intraclass
similarity and minimizing interclass similarity.
Outlier Analysis:
The data objects that do not comply with the general behavior or
model of the data can be considered outliers. Outliers may be detected
using statistical tests that assume a distribution or probability model for the
data. Distance or deviation based methods are used to identify the outliers.
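One such statistical test can be sketched with a z-score rule: flag values far from the mean, measured in standard deviations. The data and the threshold are assumptions for the example, and the test presumes a roughly normal distribution.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold`
    standard deviations (assumes an approximately normal distribution)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Toy data: 95 deviates strongly from the general behavior of the data.
data = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_outliers(data, threshold=2.0))  # → [95]
```

Distance- and deviation-based methods generalize this idea beyond one-dimensional, distribution-bound tests.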
Statistics:
Statistics is the collection, analysis, interpretation or explanation, and
presentation of data. Data mining has an inherent connection with
statistics.
Statistical models are widely used to model data and data classes.
Eg. Data mining tasks like data characterization and classification use
statistics.
Machine Learning
Machine learning investigates how computers can learn based on data.
Classic problems in machine learning that are highly related to data mining
are:
Supervised learning is basically a synonym for classification. The
supervision in the learning comes from the labeled examples in the training
data set.
Eg. Postal code recognition problem.
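The role of the labeled training examples can be sketched with a tiny 1-nearest-neighbour classifier. The feature vectors and labels below are invented for illustration; in the postal code problem they would be digit images and digit labels.

```python
def nn_classify(train, x):
    """1-nearest-neighbour: predict the label of the closest training
    point. `train` is a list of (feature_vector, label) pairs - the
    labeled examples that supply the supervision."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], x))
    return label

# Hypothetical labeled training set.
train = [((0.0, 0.0), "zero"), ((1.0, 1.0), "one")]
print(nn_classify(train, (0.9, 0.8)))  # → one
```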
Active learning is a machine learning approach that lets users play an active
role in the learning process.
Eg. An active learning approach can ask a user (e.g., a domain
expert) to label an example, which may be from a set of unlabeled examples.
2. User Interaction:
The user plays an important role in the data mining process. Interesting areas of
research include:
Interactive mining: The data mining process should be highly interactive.
Thus, it is important to build flexible user interfaces and an exploratory
mining environment, facilitating the user's interaction with the system.
Ad hoc data mining and data mining query languages: Query languages
(e.g., SQL) have played an important role in flexible searching because they
allow users to pose ad hoc queries. Similarly, high-level data mining query
languages or other high-level flexible user interfaces will give users the
freedom to define ad hoc data mining tasks.
The wide diversity of database types brings about challenges to data mining.
These include:
Social impacts of data mining: With data mining penetrating our everyday
lives, it is important to study the impact of data mining on society. How can
we use data mining technology to benefit society? How can we guard against
its misuse?
The improper disclosure or use of data and the potential violation of
individual privacy and data protection rights are areas of concern that need
to be addressed.
Q) Discuss data mining applications
Business Intelligence:
Business intelligence (BI) technologies provide historical, current, and
predictive views of business operations. Examples include reporting, online
analytical processing, and business performance management.
1. Bottom-Tier:
The bottom tier is a warehouse database server, which is almost always a
relational database system.
2. Middle-Tier:
The middle tier is an OLAP server, typically implemented using either a
relational OLAP (ROLAP) model or a multidimensional OLAP (MOLAP) model.
3. Top-Tier:
The top tier is a front-end client layer, which contains query and reporting
tools, analysis tools, and/or data mining tools (e.g., trend analysis,
prediction, and so on).
There are three data warehouse models: the enterprise warehouse, the data
mart, and the virtual warehouse.
Metadata Repository:
Metadata are data about data. A metadata repository should contain the
following:
Facts are numerical measures. The fact table contains the names of the
facts (measures) and keys to each of the related dimension tables.
This organization provides users with the flexibility to view data from
different perspectives.
•Roll up (drill-up): The roll-up operation (also called the drill-up operation)
performs aggregation on a data cube, either by climbing up a concept
hierarchy for a dimension or by dimension reduction.
•Drill down (roll down): Drill-down is the reverse of roll-up. It navigates
from less detailed data to more detailed data.
Drill-down can be realized by either stepping down a concept hierarchy for a
dimension or introducing additional dimensions.
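Roll-up by dimension reduction can be sketched as summing the measure over the dimensions that are dropped. The fact records below are invented for the example; a real system would run this aggregation inside the OLAP server.

```python
# Hypothetical fact records: (item, city, year) dimensions, `amount` measure.
facts = [
    ("TV",    "Vancouver", 2023, 100),
    ("TV",    "Vancouver", 2024, 150),
    ("TV",    "Toronto",   2023, 200),
    ("Phone", "Toronto",   2024, 300),
]

def roll_up(records, keep):
    """Aggregate `amount` over the dimensions named in `keep`
    (dimension reduction: the other dimensions are summed away)."""
    names = ("item", "city", "year")
    idx = [names.index(k) for k in keep]
    out = {}
    for rec in records:
        key = tuple(rec[i] for i in idx)
        out[key] = out.get(key, 0) + rec[3]
    return out

# Roll up from (item, city, year) to (item,): total amount per item.
print(roll_up(facts, ("item",)))  # → {('TV',): 450, ('Phone',): 300}
```

Drill-down is the reverse direction: reintroducing `city` or `year` into `keep` yields the more detailed cuboid again.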
Design Process:
Choose the business process grain, which is the fundamental, atomic level
of data to be represented in the fact table for this process
Eg. individual transactions, individual daily snapshots.
Choose the dimensions that will apply to each fact table record
Eg. time, item, customer, supplier
Choose the measures that will populate each fact table record.
Eg. Typical measures are numeric additive quantities like dollars sold and
units sold.
DataWarehouse Implementation:
SELECT item, city, year, SUM(amount)
FROM SALES
CUBE BY item, city, year;
There are three choices for data cube materialization given a base cuboid:
1. No materialization: Do not precompute any of the "nonbase" cuboids.
This leads to computing expensive multidimensional aggregates on-the-fly,
which can be extremely slow.
2. Full materialization: Precompute all of the cuboids. This typically
requires huge amounts of memory space to store all of the precomputed
cuboids.
3. Partial materialization: Selectively compute a proper subset of the whole
set of possible cuboids.
Bitmap Indexing:
The bitmap indexing method is popular in OLAP products because it allows
quick searching in data cubes. The bitmap index is an alternative
representation of the record ID (RID) list.
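A bitmap index can be sketched as one bit vector per distinct attribute value, with bit i set when row i holds that value. The column data is made up for the example.

```python
def build_bitmap_index(column):
    """For each distinct value v, store a bit vector whose i-th bit is 1
    when row i holds v - an alternative to the RID list."""
    index = {}
    for row_id, value in enumerate(column):
        bits = index.setdefault(value, [0] * len(column))
        bits[row_id] = 1
    return index

# Hypothetical `city` column of a fact table.
city = ["Vancouver", "Toronto", "Vancouver", "Montreal"]
print(build_bitmap_index(city))
# → {'Vancouver': [1, 0, 1, 0], 'Toronto': [0, 1, 0, 0],
#    'Montreal': [0, 0, 0, 1]}
```

Searching then reduces to fast bitwise operations: ANDing two bit vectors selects the rows matching both predicates at once.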
Join Indexing:
The join indexing method gained popularity from its use in relational
database query processing.
Traditional indexing maps the value in a given column to a list of rows
having that value. In contrast, join indexing registers the joinable rows of
two relations from a relational database.
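Registering the joinable rows can be sketched as precomputing the list of matching row-id pairs between a fact relation and a dimension relation. The relations and attribute names below are assumptions for the example; a real system would build this index once and reuse it at query time.

```python
def build_join_index(sales, city_dim):
    """Register the joinable rows of two relations: for each pair of
    row ids (s, c) whose `city` values match, record (s, c)."""
    pairs = []
    for s_id, sale in enumerate(sales):
        for c_id, city in enumerate(city_dim):
            if sale["city"] == city["name"]:
                pairs.append((s_id, c_id))
    return pairs

# Hypothetical fact relation and dimension relation.
sales = [{"city": "Toronto",  "amount": 10},
         {"city": "Montreal", "amount": 20},
         {"city": "Toronto",  "amount": 30}]
city_dim = [{"name": "Toronto"}, {"name": "Montreal"}]

print(build_join_index(sales, city_dim))  # → [(0, 0), (1, 1), (2, 0)]
```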