Descriptive and predictive techniques are the two main families of data mining methods,
distinguished by the type of pattern they mine. Descriptive analysis mines data to
summarize what has happened in past or recent events. Predictive analysis, on the other
hand, answers questions about the future, using historical data as the chief basis for
decisions.
Data mining tasks can be descriptive, predictive or prescriptive. Here we discuss only
two of them: descriptive and predictive. In simple words, descriptive mining means
discovering interesting patterns or associations in the data, whereas predictive mining
means predicting or classifying future behaviour with a model built on current and past
data.
Comparison Chart
BASIS FOR COMPARISON | DESCRIPTIVE MINING | PREDICTIVE MINING
(row label lost) | ... the past by analyzing stored data | ... the future with the help of past data analysis
(The remaining rows did not survive extraction; only the fragments "mining", "methods" and "ensure accuracy" remain.)
Descriptive analytics focuses on summarizing data and converting it into meaningful
information for reporting and monitoring. It also allows the data to be examined in
detail, so that questions such as “what has happened?” and “what is happening?” can be
answered easily. Clustering, summarization and association are the techniques
categorized under descriptive mining.
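As an illustration of the association technique named above, here is a minimal sketch that computes the two standard measures of an association rule, support and confidence. The transactions and item names are invented for the example and do not come from any real dataset.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)

print(f"support(bread, milk)      = {bread_milk_support:.2f}")
print(f"confidence(bread -> milk) = {bread_to_milk_confidence:.2f}")
```

A rule such as "bread -> milk" describes a pattern already present in the stored data; it does not by itself predict anything about future baskets.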
The primary objective of predictive mining is to predict future results rather than to
describe current behaviour. It relies on supervised learning functions that predict a
target value. The methods that fall under this category are classification, time-series
analysis and regression. Predictive analysis requires data modelling: it uses some
variables to anticipate unknown future values of other variables.
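To make the idea of using one variable to anticipate another concrete, the following sketch fits a straight line by ordinary least squares and extrapolates one step ahead. The monthly sales figures are made up for illustration, and `fit_line` is a hypothetical helper name, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: month number and sales in that month.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)

# Use the fitted model to anticipate the unknown value for month 6.
forecast = slope * 6 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

Real data is rarely this clean, so a forecast of this kind is an estimate, not a guarantee.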
Additionally, predictive mining compares these supervised learning methods with one
another to gain insight into the strengths and weaknesses of each approach. The purpose
of this comparison is to find the most suitable method for extracting the desired
knowledge. Predictive analysis is used to provide information about “what might
happen?” and “why might it happen?”.
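One common way to compare supervised methods is to score each on data it was not trained on. The sketch below, under the assumption of a tiny invented dataset, uses leave-one-out evaluation to compare a majority-class baseline with a one-nearest-neighbour classifier. Both function names are hypothetical.

```python
from collections import Counter

# Hypothetical labelled data: (measurement, class label).
data = [
    (1.0, "low"), (1.2, "low"), (1.4, "low"), (1.6, "low"),
    (3.0, "high"), (3.3, "high"),
]


def majority_class(train, x):
    """Baseline: always predict the most frequent label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbour(train, x):
    """Predict the label of the training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def leave_one_out_accuracy(predict, data):
    """Hold out each point in turn, train on the rest, and count correct predictions."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == label
    return hits / len(data)


baseline_accuracy = leave_one_out_accuracy(majority_class, data)
neighbour_accuracy = leave_one_out_accuracy(nearest_neighbour, data)
print(f"majority-class baseline: {baseline_accuracy:.2f}")
print(f"nearest neighbour:       {neighbour_accuracy:.2f}")
```

On this toy data the nearest-neighbour method scores higher, but the ranking of methods depends on the dataset, which is why the comparison has to be repeated for each problem.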
Descriptive mining answers questions such as:
o What happened?
o Where exactly is the problem?
o What is the frequency of the problem?
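The three questions above can often be answered with plain aggregation over stored records. The following sketch assumes a hypothetical incident log (all field names and values are invented) and counts incidents by type and by branch.

```python
from collections import Counter

# Hypothetical incident log; every record describes something that already happened.
incidents = [
    {"branch": "North", "type": "late delivery"},
    {"branch": "North", "type": "late delivery"},
    {"branch": "South", "type": "damaged item"},
    {"branch": "North", "type": "damaged item"},
    {"branch": "North", "type": "late delivery"},
]

# What happened, and how often?
by_type = Counter(record["type"] for record in incidents)

# Where exactly is the problem?
by_branch = Counter(record["branch"] for record in incidents)

most_common_type, type_count = by_type.most_common(1)[0]
worst_branch, branch_count = by_branch.most_common(1)[0]

print(f"Most frequent problem: {most_common_type} ({type_count} times)")
print(f"Branch with most incidents: {worst_branch} ({branch_count} incidents)")
```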
In contrast to descriptive mining, predictive mining performs tasks such as predictive
modelling, forecasting, simulation and alerts. These answer questions such as “what
might happen?” and “why might it happen?”.
Supervised and unsupervised learning are machine learning paradigms used to solve a
class of tasks by learning from experience, as judged by a performance measure. They
differ mainly in that supervised learning learns a mapping from inputs to desired
outputs, whereas unsupervised learning does not aim to produce an output in response to
a particular input; instead, it discovers patterns in the data.
The supervised learning method involves training a system or machine by providing it
with training sets together with the target pattern (output pattern) for the task. To
supervise typically means to observe and guide the execution of a task, project or
activity. But where can supervised learning be applied? Primarily, it is applied to
regression and classification problems, and it is the usual way of training neural
networks.
Now, how do we train a model? The model is guided by loading it with knowledge in the
form of labelled datasets, so that it can predict future instances. In artificial neural
networks, the network is trained on input patterns, each of which is associated with an
output pattern.
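As a minimal sketch of training on input patterns paired with output patterns, the code below trains a single perceptron (the simplest neural unit) on the logical AND function. The learning rate and epoch count are arbitrary assumptions chosen for this small example.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train one perceptron on (inputs, target) pairs with targets 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1 if activation > 0 else 0
            # The labelled target supervises the update: no error, no change.
            error = target - output
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained perceptron to a new input pattern."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


# Labelled training set: each input pattern comes with its output pattern.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)
```

A single perceptron can only learn linearly separable patterns such as AND; richer problems need multi-layer networks, but the principle of correcting the model against known outputs is the same.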
The unsupervised learning model does not involve a target output, which means the system
is given no labelled training examples. The system has to learn on its own by
identifying and adapting to the structural characteristics of the input patterns. It
uses machine learning algorithms that draw conclusions from unlabelled data.
Unsupervised learning tends to rely on more complicated algorithms than supervised
learning, because we have little or no prior information about the data. It creates a
less controllable environment, since the machine or system is left to generate results
on its own. The main objectives of unsupervised learning are to find groups or clusters,
to reduce dimensionality and to perform density estimation.
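To illustrate finding clusters without any labels, here is a small sketch of k-means on one-dimensional values. The data and the starting centroids are invented, and fixed starting centroids are used (instead of the usual random initialization) so that the result is reproducible.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values by alternating assignment and centroid update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabelled data: no target output is supplied.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm is told only how many groups to look for; which values belong together is discovered from the structure of the data.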
1. The supervised learning technique deals with labelled data, where the output
patterns are known to the system. In contrast, unsupervised learning works
with unlabelled data, where the output is based only on the patterns the system
itself perceives.
2. In terms of complexity, the supervised learning method is generally less
complex, while the unsupervised learning method is more complicated.
3. Supervised learning is typically run as offline analysis on stored, labelled
data, whereas unsupervised learning is often applied to data in real time.
4. The outcome of the supervised learning technique is more accurate and reliable.
In contrast, unsupervised learning generates moderate but reliable results.
5. Classification and regression are the types of problems solved by the
supervised learning method. Conversely, unsupervised learning covers
clustering and association rule mining problems.
Conclusion
Alongside increased risk associated with lending, banks have witnessed growing
fraudulent behavior. This behavior may be internal (by undisciplined staff) or external
(by fraudulent customers). In the insurance market, the incidence of fraudulent events
has grown, especially in certain geographical areas.
The rate of overt fraud is known to be low, but suspect cases and claims that are
resolved, for example, by settlement between the counterparties are significantly more
frequent. Lack of control over such events can lead over time to (sometimes sizeable)
losses.
Businesses often lack the information needed to tackle the variety of fraudulent
situations they face. It is crucial for fraud managers to have as much information as
possible to spot fraudulent and new abnormal behavior early on, and to identify possible
fraudulent networks of people among counterparties, dealers, and other parties involved
in the business.
Fraud Detection screens all claims procedures, loan applications and product purchase
procedures, and allocates a risk score to each. The fraud manager can then set up alert
logic that sends signals based on the manager's control objectives.
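The screening step described above can be sketched as a set of deterministic rules that each add points to a risk score, with an alert raised above a threshold. Every rule, weight, field name and the threshold below is a hypothetical assumption made for illustration; none of them is taken from the Fraud Detection application.

```python
# Hypothetical rules: (description, condition on a claim, points added to the score).
RULES = [
    ("claim filed within 30 days of policy start",
     lambda claim: claim["days_since_policy_start"] < 30, 40),
    ("claimed amount above 10,000",
     lambda claim: claim["amount"] > 10_000, 30),
    ("claimant has prior suspect claims",
     lambda claim: claim["prior_suspect_claims"] > 0, 30),
]

# Hypothetical control objective chosen by the fraud manager.
ALERT_THRESHOLD = 60


def risk_score(claim):
    """Sum the points of every rule whose condition holds for the claim."""
    return sum(points for _, applies, points in RULES if applies(claim))


def needs_alert(claim):
    """Signal the fraud manager when the score reaches the threshold."""
    return risk_score(claim) >= ALERT_THRESHOLD


early_large_claim = {"days_since_policy_start": 10, "amount": 15_000, "prior_suspect_claims": 0}
routine_claim = {"days_since_policy_start": 200, "amount": 500, "prior_suspect_claims": 0}

print(risk_score(early_large_claim), needs_alert(early_large_claim))
print(risk_score(routine_claim), needs_alert(routine_claim))
```

Keeping the rules in a data structure, separate from the scoring function, lets the threshold and weights be tuned per industry without touching the scoring logic.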
Users can assess all procedures based on certain business rules that are specific to
individual industries such as insurance, consumer credit and lending products. Through
predictive analytics, users can define fraud prediction models based on past cases of
overt fraud and, even more so, on cases deemed suspect, thereby capitalizing on the
value of all available information.
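As a deliberately simple stand-in for a fraud prediction model, the sketch below estimates a fraud probability per sales channel from past cases, counting both overt fraud and suspect cases as positive examples. The case data, channel names and the choice of Laplace smoothing are assumptions for illustration; a production model would use many more variables.

```python
from collections import defaultdict

# Hypothetical past cases: (channel, outcome).
past_cases = [
    ("online", "fraud"), ("online", "suspect"), ("online", "clean"),
    ("branch", "clean"), ("branch", "clean"), ("branch", "suspect"), ("branch", "clean"),
]


def fit_fraud_rates(cases, positive_outcomes=("fraud", "suspect")):
    """Estimate P(fraud or suspect | channel) with add-one (Laplace) smoothing."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for channel, outcome in cases:
        totals[channel] += 1
        positives[channel] += outcome in positive_outcomes
    # Smoothing keeps estimates away from 0 and 1 when a channel has few cases.
    return {channel: (positives[channel] + 1) / (totals[channel] + 2) for channel in totals}


fraud_rates = fit_fraud_rates(past_cases)
for channel, rate in sorted(fraud_rates.items()):
    print(f"{channel}: estimated risk {rate:.2f}")
```

Including suspect cases enlarges the set of positive examples, which matters because overt fraud alone is usually too rare to learn from.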
Finally, the Fraud Detection application offers SNA (social network analysis) to perform
exploratory analyses of dealers and counterparties and enable those in charge of fraud
to investigate and recognize abnormal or fraudulent networks.
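One basic building block of social network analysis is grouping parties that are linked to one another, directly or indirectly. The sketch below finds such connected groups with a depth-first search; the party names and links are invented, and a real analysis would add further measures beyond connectivity.

```python
from collections import defaultdict

# Hypothetical links, e.g. two parties appearing on the same claim.
links = [
    ("dealer_A", "customer_1"),
    ("customer_1", "customer_2"),
    ("dealer_B", "customer_3"),
    ("customer_2", "dealer_A"),
]


def connected_groups(links):
    """Return the sets of parties that are connected directly or indirectly."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        stack = [start]
        group = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


groups = sorted(connected_groups(links), key=len, reverse=True)
for group in groups:
    print(sorted(group))
```

A large or unusually dense group is not proof of collusion; it is a starting point for the exploratory investigation the text describes.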
The application produces risk scores that may relate to individual customers or to other
actors in the chain, such as branch offices, agents and, in the case of insurance,
liquidators. The application identifies fraud by analyzing the real-time data produced
every day through transactions and customer interactions with the company. Fraud
Detection is able to handle big data, such as the data from the “black boxes” that
insurance companies install in cars.
The application interacts with the company's core processes through alerts, reports or
triggers that can prompt or change the actions of business users involved in the
process, such as a bank counter operator or an insurance liquidator.
In this way, the benefits of advanced analytics spread to all levels of users, even
those without the skills needed to use complex analytical tools.
Key features
o Real-time checking and scoring
o Alerts for claims investigators
o Alert, email and report management based on deterministic rules
o Mapping of actions (initiation of disputes, inspections)
o Overview and breakdown features for anomalies detected / risk level
o Deterministic rules to identify fraudulent behavior, false claims and risky subjects (customers, employees, companies, third parties)
o Predictive fraud detection algorithms to improve accuracy in each risk scoring activity
o Specific rules based on process, branch and claim type
o Risk scores based on a risk matrix: best of predictive, best of industry knowledge
o Anomaly detection for potentially fraudulent patterns
o Predictive models for customers or transaction risk score
o Profiling of the types of actions and relationships between subjects via Social Network Analysis
o Analysis of relationships between counterparties to identify fraudulent networks and collusions
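The anomaly detection feature listed above can be approximated, in its simplest form, by flagging values that lie far from the mean in units of standard deviation (a z-score test). The transaction amounts and the threshold of 2.0 are assumptions for illustration; the source does not say which method the application uses.

```python
from statistics import mean, pstdev


def find_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [amount for amount in amounts if abs(amount - mu) / sigma > threshold]


# Hypothetical transaction amounts with one value far outside the usual range.
amounts = [100, 105, 98, 102, 99, 101, 103, 97, 100, 500]

anomalies = find_anomalies(amounts)
print(anomalies)
```

Because an extreme value inflates the mean and standard deviation it is measured against, robust alternatives based on the median are often preferred when the data contains several outliers.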
ONLINE ADVERTISING
Online advertising is one of the most effective ways for businesses of all sizes to
expand their reach, find new customers, and diversify their revenue streams.
The core idea of data mining is to analyze large, complex databases and identify useful
patterns, trends and information in unorganized data. This is accomplished with software
programs and machine learning algorithms. Data mining has been used successfully by
retail, marketing, e-commerce, healthcare and other organizations. In business sectors
such as marketing, e-commerce and retail, data mining is used to analyze customer
behavior and predict trends, thereby enhancing a company's revenue or profits. In the
healthcare sector, data mining is applied to stored patient data to reduce costs and
support other health-related processes. The insurance sector has begun using data mining
for customer data storage and analysis. Governmental agencies are well known to use data
mining to access and store large quantities of information about individuals for the
purposes of national security.
The ethical implications for businesses using data mining are different from the legal
implications. Committing a theft is illegal, but even planning to attempt one is
considered unethical. Similarly, many members of the public consider it unethical when
companies so much as attempt to use their shopping information or other data to target
them with more products. Despite this, the ethics surrounding data mining remain a gray
area. The technology as a whole cannot be considered good or bad, since it also has many
useful applications for the public good.
With the spread of data mining applications across sectors, there is an equivalent rise
in concerns about the ethics of mining customer data for profit. Data mining by
companies is not going to decline in the future; rather, it is going to increase as more
organizations gain access to computing power.
One of the most often cited issues with mining personal data arises when information
mined from an individual's consumption behavior is used to market more products and
services to that individual. Here companies appear to follow the philosophy that if more
data is mined, sales of products will automatically increase. While this may be true to
some extent, it can put companies in serious conflict with their customers. Some
examples of such conflicts are listed below:
o A teenage girl browses the website of a company that sells baby products. The
company's data mining application immediately tracks the customer information and
sends baby product promotions addressed to the teenage girl. This can cause
embarrassment to the girl and her family. A prime example is the 2012 Target store
incident.
o A person who has lost his or her legs might simply have browsed online for shoes
out of curiosity or a desire to see shoes. If a company were to send that person
information about shoes, he or she might be pained at receiving it.
Another area of concern is the ethical use of data mining applications in the healthcare
industry. Patient information is required by law to be gathered only with the patient's
full consent, and such information can be accessed or used by research companies only
after many levels of security checks. Despite the regulations on paper and the agencies
enforcing them, some organizations mine data unethically, without any consent or
approval, in order to discover a new product that might fetch high revenue.
The solution to the varied ethical concerns about data mining by businesses is for
companies to maintain transparency in how they mine data and to be accountable for any
breaches of privacy. They must be proactive in implementing these two measures in order
to reassure customers that their personal data is not being misused and that the data is
secure.