
DESCRIPTIVE / UNSUPERVISED DATA MINING

Reported by: Cindy C. Gabayeron

Descriptive Data Mining Vs Predictive Data Mining

Descriptive and predictive techniques are the two main ways in which data mining extracts
patterns from data. Descriptive analysis mines data to summarize past or recent events.
Predictive analysis, on the other hand, answers questions about the future, using
historical data as the chief basis for decisions.

Data mining tasks can be descriptive, predictive, or prescriptive. Here we discuss only
two of them: descriptive and predictive. In simple words, descriptive mining means
discovering interesting patterns or associations in the data, whereas predictive mining
means predicting and classifying the behaviour of a model based on current and past
data.

Comparison Chart

BASIS FOR COMPARISON        DESCRIPTIVE MINING                      PREDICTIVE MINING

Basic                       Identifies what happened in the past    Describes what can happen in the future
                            by analyzing stored data.               with the help of past data analysis.

Requires                    Data aggregation and data mining        Statistics and forecasting methods

Preciseness                 Provides accurate data                  Produces results; does not ensure
                                                                    accuracy.

Type of approach            Reactive                                Proactive

Practical analysis methods  Standard reporting, query/drill down    Predictive modelling, forecasting,
                            and ad-hoc reporting.                   simulation and alerts.

Definition of Descriptive Data Mining

Descriptive mining is generally used to produce correlations, cross tabulations,
frequencies, and so on. These techniques are designed to find regularities in the data
and to reveal patterns. Another application of descriptive analysis is to discover
interesting subgroups within the bulk of the data.
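
As a rough illustration of the three summaries named above (frequency, cross tabulation
and correlation), the following sketch uses only the Python standard library. The
records, column layout and helper names are hypothetical and chosen only for the example.

```python
from collections import Counter
from math import sqrt

# Hypothetical records: (region, purchased, visits, spend)
records = [
    ("north", "yes", 3, 30.0),
    ("north", "no", 1, 10.0),
    ("south", "yes", 4, 40.0),
    ("south", "yes", 2, 20.0),
    ("north", "no", 5, 50.0),
]

def frequency(rows, col):
    """Count how often each value occurs in one column."""
    return Counter(row[col] for row in rows)

def cross_tab(rows, col_a, col_b):
    """Count how often each pair of values occurs across two columns."""
    return Counter((row[col_a], row[col_b]) for row in rows)

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

region_counts = frequency(records, 0)
region_by_purchase = cross_tab(records, 0, 1)
visits_vs_spend = pearson([r[2] for r in records], [r[3] for r in records])

print(region_counts)
print(region_by_purchase)
print(visits_vs_spend)
```

None of these summaries predicts anything; each only describes what is already in the
stored records.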

Descriptive analytics focuses on summarizing the data and converting it into meaningful
information for reporting and monitoring. It also allows the data to be examined in
detail, so that questions such as “what has happened?” and “what is happening?” can be
answered easily. Clustering, summarization, and association are the techniques
categorized under descriptive mining.
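
To make the association technique concrete, here is a minimal sketch of the two measures
usually reported for an association rule, support and confidence. The shopping baskets
and function names are assumptions made for illustration, not data from this report.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = support(set(antecedent) | set(consequent), baskets)
    return both / support(antecedent, baskets)

bread_and_milk = support({"bread", "milk"}, transactions)
bread_implies_milk = confidence({"bread"}, {"milk"}, transactions)

print(bread_and_milk, bread_implies_milk)
```

A rule such as "bread implies milk" is considered interesting only when both measures
exceed thresholds chosen by the analyst.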

Definition of Predictive Data Mining

The primary objective of predictive mining is to predict future results rather than
describe current behaviour. It involves supervised learning functions used to predict a
target value. The methods that fall under this mining category are classification,
time-series analysis and regression. Data modelling is a necessity for predictive
analysis, which works by using some variables to anticipate the unknown future values
of other variables.
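
As one hedged example of "using some variables to anticipate the unknown future values
of other variables", the sketch below fits a straight line by ordinary least squares and
extrapolates one step ahead. The monthly sales figures and the name fit_line are
hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept  # prediction for an unseen month

print(slope, intercept, forecast_month_6)
```

The forecast is only as good as the assumption that the past trend continues, which is
why predictive results cannot guarantee accuracy.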

Additionally, predictive mining compares these supervised learning methods with one
another to gain insight into the strengths and weaknesses of each approach. The purpose
of this comparison is to find the most suitable method for extracting the desired
knowledge. Predictive analysis is used to provide information about
“what might happen?” and “why it might happen?”.
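
One common way to carry out such a comparison is to score each candidate method on data
it was not trained on. The sketch below is an illustration under assumptions: the two
candidate models (a constant mean and a least-squares line), the data, and the choice of
mean absolute error as the measure are not taken from this report.

```python
def mean_model(xs, ys):
    """Baseline: always predict the average of the training targets."""
    average = sum(ys) / len(ys)
    return lambda x: average

def line_model(xs, ys):
    """Least-squares straight line fitted to the training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, xs, ys):
    """Average size of the prediction error on the given data."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, split into a training part and a held-out part.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

candidates = {"mean": mean_model, "line": line_model}
scores = {
    name: mean_absolute_error(build(train_x, train_y), test_x, test_y)
    for name, build in candidates.items()
}
best = min(scores, key=scores.get)  # method with the lowest held-out error

print(scores, best)
```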

Key Differences Between Descriptive and Predictive Data Mining


1. Descriptive mining tasks describe the characteristics of the data in a target data
set. Predictive mining tasks, on the other hand, perform induction over current and
past data so that predictions can be made.
2. In terms of accuracy, the descriptive technique is more precise and accurate than
predictive mining.
3. Predictive analysis is proactive: it tries to control the situation as well as
respond to it, whereas descriptive analysis only responds to the situation.
4. The operations performed in the descriptive approach are standard reporting,
query/drill down and ad-hoc reporting, which answer questions such as:

o what happened?
o where exactly is the problem?
o what is the frequency of the problem?

In contrast, predictive mining performs tasks like predictive modelling, forecasting,
simulation and alerts. These answer questions such as:

o what will happen next?
o what is the outcome if these trends continue?
o what actions are required to be taken?

Conclusion

Descriptive mining employs unsupervised learning functions, while predictive mining uses
supervised learning techniques. This is why descriptive analysis cannot anticipate
unknown target values and instead concentrates on the intrinsic arrangement,
interconnections and relations in the data. Conversely, predictive mining specifies and
distinguishes a set of data for future prediction.

Difference Between Supervised and Unsupervised Learning

Supervised and unsupervised learning are machine learning paradigms used to solve a
class of tasks by learning from experience, as judged by a performance measure. They
differ mainly in that supervised learning learns a mapping from the input to the desired
output. Unsupervised learning, by contrast, does not aim to produce an output in
response to a particular input; instead, it discovers patterns in the data.

Supervised and unsupervised learning techniques are implemented in various applications,
such as artificial neural networks, which are data processing systems containing a huge
number of densely interlinked processing elements.

Comparison Chart

BASIS FOR COMPARISON      SUPERVISED LEARNING              UNSUPERVISED LEARNING

Basic                     Deals with labelled data.        Handles unlabeled data.

Computational complexity  High                             Low

Analysis                  Offline                          Real-time

Accuracy                  Produces accurate results        Generates moderate results

Sub-domains               Classification and regression    Clustering and association rule mining

Definition of Supervised Learning

The supervised learning method involves training a system or machine: the training sets,
along with the target pattern (output pattern), are provided to the system so that it
can perform a task. To supervise typically means to observe and guide the execution of
tasks, projects and activities. But where can supervised learning be implemented?
Primarily, it is implemented in regression, classification and neural networks.

Now, how do we train a model? The model is guided by loading it with knowledge, so that
it can predict future instances. It uses labelled datasets for training. In artificial
neural networks, each input pattern used to train the network is associated with an
output pattern.
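
A minimal sketch of training on labelled data is a nearest-neighbour classifier, where
"training" simply means storing input patterns together with their output labels. The
features, labels and function name below are hypothetical, and the technique is chosen
for brevity rather than taken from this report.

```python
from math import dist

# Hypothetical labelled training set: (input pattern, output pattern).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

def predict(point, labelled):
    """Return the label of the training example closest to the new point."""
    _, nearest_label = min(labelled, key=lambda pair: dist(point, pair[0]))
    return nearest_label

label_a = predict((2.0, 1.0), training)
label_b = predict((7.0, 9.0), training)

print(label_a, label_b)
```

Without the labels in the training set, the classifier would have nothing to return,
which is the sense in which the learning is supervised.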

Definition of Unsupervised Learning

The unsupervised learning model does not involve a target output, which means no
labelled training examples are provided to the system. The system has to learn on its
own by detecting and adapting to the structural characteristics of the input
patterns. It uses machine learning algorithms that draw conclusions from unlabeled data.
Unsupervised learning works with more complicated algorithms than supervised learning
because we have little or no information about the data. It creates a less manageable
environment, because the machine or system is expected to generate the results
for us. The main objectives of unsupervised learning are to find structure such as
groups and clusters, to reduce dimensionality, and to perform density estimation.
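
Clustering can be sketched with a small k-means loop, which groups unlabeled points
around centroids without ever seeing a target output. The points, starting centroids and
fixed iteration count are assumptions made for this example.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(centroids)
print(clusters)
```

The result depends on the starting centroids and on the number of clusters requested,
both of which the analyst has to choose.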

Key Differences Between Supervised and Unsupervised Learning

1. The supervised learning technique deals with labelled data, where the output data
patterns are known to the system. In contrast, unsupervised learning works
with unlabeled data, in which the output is based only on the collection of
perceptions.
2. In terms of complexity, the supervised learning method is less complex, while the
unsupervised learning method is more complicated.
3. Supervised learning can also conduct offline analysis, whereas unsupervised
learning employs real-time analysis.
4. The outcome of the supervised learning technique is more accurate and reliable.
In contrast, unsupervised learning generates moderate but reliable results.
5. Classification and regression are the types of problems solved under the
supervised learning method. Conversely, unsupervised learning includes
clustering and association rule mining problems.

Conclusion

Supervised learning is the technique of accomplishing a task by providing training input
and output patterns to the system, whereas unsupervised learning is a self-learning
technique in which the system has to discover the features of the input population on
its own, and no prior set of categories is used.

POSSIBLE APPLICATIONS ON FRAUD DETECTION, TARGETED MARKETING, AND CUSTOMER RELATIONS

Alongside increased risk associated with lending, banks have witnessed growing
fraudulent behavior. This behavior may be internal (by undisciplined staff) or external
(by fraudulent customers). In the insurance market, the incidence of fraudulent events
has grown, especially in certain geographical areas.

Overt fraud is known to be low, but suspect cases and claims that are resolved, for
example, by settlement between the counterparties, are significantly more numerous. Lack
of control over such events can lead, over time, to (sometimes sizeable) losses.
Businesses often do not have the information needed to tackle a variety of fraudulent
situations. It is crucial for fraud managers to have as much information as possible to
spot fraudulent and new abnormal behavior early on, and to identify possible fraudulent
networks of people among counterparties, dealers, and other parties involved in the
business.

Fraud Detection is designed to help organizations reduce fraud-related costs. The
application’s predictive capabilities combine different techniques, merging the business
user's experience with predictive modeling techniques and anomaly detection models into
a single risk score.
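
The idea of a single risk score that blends business rules with anomaly detection can be
sketched as follows. This is not the application's actual method: the claims, the
30-day rule, the equal weights, the z-score anomaly measure and the 0.5 alert threshold
are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical insurance claims.
claims = [
    {"id": "C1", "amount": 1000.0, "days_since_policy_start": 400},
    {"id": "C2", "amount": 1100.0, "days_since_policy_start": 350},
    {"id": "C3", "amount": 900.0, "days_since_policy_start": 500},
    {"id": "C4", "amount": 1000.0, "days_since_policy_start": 20},
    {"id": "C5", "amount": 9000.0, "days_since_policy_start": 10},
]

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def risk_score(claim, z, rule_weight=0.5, anomaly_weight=0.5):
    """Blend a deterministic business rule with an anomaly measure into one score in [0, 1]."""
    # Assumed business rule: a claim filed within 30 days of policy start is suspicious.
    rule_hit = 1.0 if claim["days_since_policy_start"] < 30 else 0.0
    # Assumed anomaly measure: unusual claim amount, capped at three standard deviations.
    anomaly = min(abs(z) / 3.0, 1.0)
    return rule_weight * rule_hit + anomaly_weight * anomaly

amount_z = z_scores([c["amount"] for c in claims])
scores = {c["id"]: risk_score(c, z) for c, z in zip(claims, amount_z)}
alerts = [claim_id for claim_id, score in scores.items() if score >= 0.5]

print(scores)
print(alerts)
```

A fraud manager would tune the weights and the alert threshold to match the control
objectives of the business.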

Fraud Detection screens all claims procedures, loan applications and product purchase
procedures, allocating a risk score to each. The score enables the fraud manager to set
up alert logic and receive signals based on the manager's control objectives.

Users can assess all procedures based on business rules that are specific to
individual industries such as insurance, consumer credit and lending products. Through
predictive analytics, users can define fraud prediction models based on past cases of
overt fraud and, even more so, on cases deemed suspect, thereby capitalizing on the
value of all available information.

Finally, the Fraud Detection application offers SNA (social network analysis) to perform
exploratory analyses of dealers and counterparties and to enable those in charge of
fraud to investigate and recognize abnormal or fraudulent networks.

The application produces risk scores that may relate to individual customers or to other
actors in the chain, such as branch offices, agents and liquidators in the case of
insurance. The application identifies fraud by analyzing the real-time data produced
every day through transactions and customer interactions with the company. Fraud
Detection is able to handle big data, such as the data from the “black boxes”
installed in cars by insurance companies.

The application interacts with core company processes through alerts, reports or
triggers that can activate and/or modify the behavior of the business users involved in
the process, such as a bank counter operator or an insurance liquidator.

In this way, the benefits of advanced analytics spread to all levels of users, even
those without the skills needed to use complex analytical tools.

Key features
• Real-time checking and scoring
• Alerts for claims investigators
• Alert, email and report management based on deterministic rules
• Mapping of actions (initiation of disputes, inspections)
• Overview and breakdown features for anomalies detected / risk level
• Deterministic rules to identify fraudulent behavior, false claims and risky subjects
(customers, employees, companies, third parties)
• Predictive fraud detection algorithms to improve accuracy in each risk scoring
activity
• Specific rules based on process, branch and claim type
• Risk scores based on a risk matrix: best of predictive, best of industry knowledge
• Anomaly detection for potentially fraudulent patterns
• Predictive models for customers or transaction risk score
• Profiling of the types of actions and relationships between subjects via Social
Network Analysis
• Analysis of relationships between counterparties to identify fraudulent networks
and collusions

ONLINE ADVERTISING

Online advertising is one of the most effective ways for businesses of all sizes to
expand their reach, find new customers, and diversify their revenue streams.

ETHICS OF DATA MINING

The core idea of data mining is about analyzing large complex databases and
identifying useful patterns, trends, and information in the unorganized data. This is
accomplished by software programs and machine learning algorithms. Data mining has
been successfully used by retail, marketing, e-commerce, healthcare, and other
business organizations. In business sectors such as marketing, e-commerce, and retail,
data mining is used to analyze customer behavior and predict trends, thereby enhancing
a company's revenue or profits. In the healthcare sector, data mining is used for storing
patient data, for reducing costs and other health-related processes. The insurance
sector has begun using data mining for customer data storage and analysis.
Governmental agencies are well-known to use data mining for accessing and storing
large quantities of individual information for the purposes of national security.

Ethical implications for businesses using data mining are different from legal
implications. Committing a theft is illegal, but even contemplating a theft is
considered unethical. In the same way, many members of the public consider it unethical
when companies so much as attempt to use their shopping information or other data to
target them with more products. Despite this, the ethics surrounding data mining is a
gray area. The technology as a whole cannot be considered good or bad, since it also
has many uses that serve the public good.

With the rise of data mining applications in various sectors, there is an equivalent
rise in concerns about the ethics of mining customer data for profit. Data mining by
companies is not going to decline in the future; rather, it is going to increase as
more organizations gain access to computing power.

One of the most often cited issues with mining personal data arises when the information
mined from an individual's consumption behavior is used to market more products and
services to that individual. Here companies appear to follow the philosophy that if
more data is mined, sales of products will automatically increase. While this may be
true to some extent, it can create serious conflicts with customers. Some examples of
such conflicts are listed below:
• A teenage girl browses the website of a company that sells baby products. The
company's data mining application immediately tracks the customer information and
sends baby-product advertisements addressed to the teenage girl. This can cause
embarrassment to the girl and her family. A prime example is the 2012 Target
store incident.
• A person who has lost his/her legs might simply have browsed online for shoes
out of curiosity or a desire to see shoes. If a company were to send him/her
information about shoes, he/she might be pained at receiving it.

Another area of concern is the ethical use of data mining applications in the healthcare
industry. Patient information is required by law to be gathered only with the complete
consent of the patient, and such information can be accessed or used by research
companies only after many levels of security checks. Despite the regulations on paper
and the agencies implementing them, some organizations mine data unethically,
without any consent or approval, in order to discover a new product that might fetch
high revenue.

The solution to the varied ethical concerns raised by data mining in business is for
companies to maintain transparency in how they mine data and to be accountable for any
breaches of privacy. They must be proactive in implementing these two measures in
order to reassure customers that their personal data is not being misused and that the
data is secure.

