ON
AUTHORS
SUPRAJA K
CSE (2/4)
Ph: 040_23818232
e-mail: koneru_supu@yahoo.co.in
SWETHA P
CSE (2/4)
Ph: 9985389725
e-mail: swe_pinky@yahoo.com
INDEX
ABSTRACT
INTRODUCTION
WHAT IS DATA MINING?
WHAT IS DATA WAREHOUSING?
HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
APPLICATIONS
ADVANTAGES
DISADVANTAGES
CONCLUSION
REFERENCES
ABSTRACT
We live in the age of information. Data is the most valuable resource of an enterprise.
In today’s competitive global business environment, understanding and managing enterprise-wide
information is crucial for making timely decisions and responding to changing business
conditions. Many companies are realizing a business advantage by leveraging one of their
key assets – business Data. There is a tremendous amount of data generated by day-to-day
business operational applications. In addition there is valuable data available from external
sources such as market research organizations, independent surveys and quality testing labs.
Studies indicate that the amount of data in a given organization doubles every 5 years.
Data warehousing has emerged as an increasingly popular and powerful concept of
applying information technology to turn these huge islands of data into meaningful
information for better business. Data mining, the extraction of hidden predictive information
from large databases, is a powerful new technology with great potential to help companies
focus on the most important information in their data warehouses. Data mining tools predict
future trends and behaviors, allowing businesses to make proactive, knowledge-driven
decisions.
This paper describes the practicalities and constraints of data mining and data
warehousing, and their advancements over earlier technologies.
INTRODUCTION
Data Warehousing
• A data warehouse can be defined as any centralized data repository which can be
queried for business benefit
Data Mining
particular region and who would purchase them, given the sale of similar products in that
region.
WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process
of analyzing data from different perspectives and summarizing it into useful information -
information that can be used to increase revenue, cut costs, or both. Data mining software is
one of a number of analytical tools for analyzing data. It allows users to analyze data from
many different dimensions or angles, categorize it, and summarize the relationships
identified. Technically, data mining is the process of finding correlations or patterns among
dozens of fields in large relational databases.
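As a minimal sketch of what "finding correlations among fields" can mean in practice, the example below loads a few rows into an in-memory SQLite table and computes the Pearson correlation for every pair of fields. The table name, the field names and the sample values are assumptions made up for illustration; they do not come from any real dataset.

```python
import sqlite3
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical relational table with three numeric fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, store_visits REAL, returns REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(10, 100, 5), (20, 205, 4), (30, 290, 6), (40, 410, 5), (50, 500, 4)],
)

fields = ["ad_spend", "store_visits", "returns"]
rows = conn.execute("SELECT ad_spend, store_visits, returns FROM sales").fetchall()
# Turn the row-oriented result into one list of values per field.
columns = dict(zip(fields, zip(*rows)))

# Correlation for every pair of fields; the strongest pair is the "pattern".
correlations = {
    (a, b): pearson(columns[a], columns[b]) for a, b in combinations(fields, 2)
}
strongest = max(correlations, key=lambda pair: abs(correlations[pair]))
print(strongest, round(correlations[strongest], 3))
```

With dozens of fields the same loop simply produces more pairs; real data mining tools add statistical significance checks and far more scalable algorithms on top of this idea.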
Although data mining is a relatively new term, the technology is not. Companies have
used powerful computers to sift through volumes of supermarket scanner data and analyze
market research reports for years. However, continuous innovations in computer processing
power, disk storage, and statistical software are dramatically increasing the accuracy of
analysis while driving down the cost.
Dramatic advances in data capture, processing power, data transmission, and storage
capabilities are enabling organizations to integrate their various databases into data
warehouses. Data warehousing is defined as a process of centralized data management and
retrieval. Data warehousing, like data mining, is a relatively new term although the concept
itself has been around for years. Data warehousing represents an ideal vision of maintaining a
central repository of all organizational data. Centralization of data is needed to maximize user
access and analysis. Dramatic technological advances are making this vision a reality for
many companies. And, equally dramatic advances in data analysis software are allowing
users to access this data freely. The data analysis software is what supports data mining.
According to Bill Inmon, author of Building the Data Warehouse and the guru who is
widely considered to be the originator of the data warehousing concept, there are generally
four characteristics that describe a data warehouse:
• Time-variant: The data warehouse contains a place for storing data that are 5 to 10
years old, or older, to be used for comparisons, trends, and forecasting. These data are
not updated.
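The time-variant property can be sketched as an append-only store in which every value is kept together with the date it was loaded, so that old snapshots remain available for comparison. The class name, method names and sales figures below are hypothetical and serve only as an illustration.

```python
from datetime import date

class TimeVariantStore:
    """Append-only store: each fact is kept with the date it was loaded."""

    def __init__(self):
        self._rows = []  # list of (as_of_date, key, value)

    def load(self, as_of, key, value):
        # History is never overwritten; a new snapshot row is appended.
        self._rows.append((as_of, key, value))

    def history(self, key):
        """All snapshots for a key, oldest first."""
        return sorted((d, v) for d, k, v in self._rows if k == key)

    def change_between(self, key, start_year, end_year):
        """Difference between the snapshots of two years."""
        by_year = {d.year: v for d, v in self.history(key)}
        return by_year[end_year] - by_year[start_year]

store = TimeVariantStore()
store.load(date(2015, 12, 31), "region_north_sales", 120_000)
store.load(date(2020, 12, 31), "region_north_sales", 150_000)
store.load(date(2024, 12, 31), "region_north_sales", 210_000)

growth = store.change_between("region_north_sales", 2015, 2024)
print(growth)
```

Because nothing is updated in place, a comparison across years is always answered from the snapshots as they were originally loaded.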
This overview provides a description of some of the most common data mining
algorithms in use today. We have broken the discussion into two sections, each with a
specific theme:
Each section will describe a number of data mining algorithms at a high level, focusing
on the "big picture" so that the reader will be able to understand how each algorithm fits into
the landscape of data mining techniques.
The data warehouse selects and transforms the data it collects, and the useful data
is passed on to data mining.
The results of data mining are then evaluated and presented.
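The flow described above (select, transform, mine, evaluate, present) can be sketched as a chain of small functions. This is only an illustrative sketch: the record layout, the function names and the `min_count` threshold are assumptions, and the "mining" step here is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw operational records, including messy and incomplete rows.
raw_records = [
    {"customer": "a", "item": " Bread ", "amount": "2.50"},
    {"customer": "b", "item": "milk", "amount": "1.20"},
    {"customer": "c", "item": "BREAD", "amount": None},  # incomplete row
    {"customer": "d", "item": "bread", "amount": "2.50"},
    {"customer": "e", "item": "Milk", "amount": "1.30"},
    {"customer": "f", "item": "bread", "amount": "2.40"},
]

def select(records):
    """Warehouse step 1: keep only complete records."""
    return [r for r in records if r["amount"] is not None]

def transform(records):
    """Warehouse step 2: normalize item names and convert amounts to numbers."""
    return [
        {"item": r["item"].strip().lower(), "amount": float(r["amount"])}
        for r in records
    ]

def mine(records):
    """Mining step: count how often each item was purchased."""
    return Counter(r["item"] for r in records)

def evaluate(counts, min_count):
    """Evaluation step: keep only patterns that occur often enough."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Presentation step: turn the surviving patterns into report lines."""
    ordered = sorted(patterns.items(), key=lambda kv: -kv[1])
    return [f"{item}: {n} purchases" for item, n in ordered]

report = present(evaluate(mine(transform(select(raw_records))), min_count=3))
print("\n".join(report))
```

Keeping each stage as a separate function mirrors the division of labour in the text: the first two stages belong to the warehouse, the last three to data mining.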
APPLICATIONS
Data Warehousing
o Metadata - information describing the model and definition of the source data
elements
• Transfer - processed data transferred to the data warehouse, a large database on a high
performance box.
Data Mining
• Medicine - drug side effects, hospital cost analysis, genetic sequence analysis,
prediction etc.
• Knowledge Acquisition
ADVANTAGES:
• Enhances end-user access to a wide variety of data.
• Business decision makers can obtain various kinds of trend reports e.g. the item with
the most sales in a particular area / country for the last two years.
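A trend report of the kind mentioned above (the item with the most sales in a particular area over the last two years) might look like the following query against a warehouse table. The `sales` table, its columns, the `top_item` helper and the sample figures are all assumptions made for this sketch; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, area TEXT, year INTEGER, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("tea", "south", 2022, 900),     # outside the two-year window
        ("tea", "south", 2023, 300),
        ("tea", "south", 2024, 350),
        ("coffee", "south", 2023, 400),
        ("coffee", "south", 2024, 450),
        ("coffee", "north", 2024, 999),  # different area, ignored
    ],
)

def top_item(conn, area, last_year, years=2):
    """Return (item, total_units) for the best seller in an area over a window of years."""
    return conn.execute(
        """
        SELECT item, SUM(units) AS total
        FROM sales
        WHERE area = ? AND year > ? AND year <= ?
        GROUP BY item
        ORDER BY total DESC
        LIMIT 1
        """,
        (area, last_year - years, last_year),
    ).fetchone()

best = top_item(conn, "south", last_year=2024)
print(best)
```

The older 2022 rows stay in the table and are simply filtered out by the query, which is what allows the same warehouse to answer reports over different time windows.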
A data warehouse can be a significant enabler of commercial business applications,
most notably Customer Relationship Management (CRM).
DISADVANTAGES:
Data mining systems rely on databases to supply the raw data for input and this raises
problems in that databases tend to be dynamic, incomplete, noisy, and large. Other problems
arise as a result of the adequacy and relevance of the information stored.
Limited Information
A database is often designed for purposes different from data mining and sometimes
the properties or attributes that would simplify the learning task are not present nor can they
be requested from the real world. Inconclusive data causes problems because if some
attributes essential to knowledge about the application domain are not present in the data it
may be impossible to discover significant knowledge about a given domain. For example,
one cannot diagnose malaria from a patient database if that database does not contain the
red blood cell count of the patients.
Missing data can be treated by discovery systems in a number of ways such as;
FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in
data mining since 2000 have been truly Darwinian and show promise of consolidating and
stabilizing around predictive analytics. Variations, novelties and new candidate features have
been expressed in a proliferation of small start-ups that have been ruthlessly culled from the
herd by a perfect storm of bad economic news. Nevertheless, the emerging market for
predictive analytics has been sustained by professional services, service bureaus (rent a
recommendation) and profitable applications in verticals such as retail, consumer finance,
telecommunications, travel and leisure, and related analytic applications. Predictive analytics
have successfully proliferated into applications to support customer recommendations,
customer value and churn management, campaign optimization, and fraud detection. On the
product side, success stories in demand planning, just-in-time inventory and market basket
optimization are a staple of predictive analytics. Predictive analytics should be used to get to
know the customer, segment and predict customer behavior and forecast product demand and
related market dynamics. Be realistic about the required complex mixture of business
acumen, statistical processing and information technology support as well as the fragility of
the resulting predictive model; but make no assumptions about the limits of predictive analytics.
Breakthroughs often occur in the application of the tools and methods to new commercial
opportunities.
CONCLUSION:
REFERENCES:
1.Books Referred:
a. www.kluweronline.nl
b. www.internet2.com
c. www.the-data-mine.com