Sei sulla pagina 1di 41

Chapter 2

Business Problems and Data


Science Solution

Fundamental concepts
An important Principle of data science is that data
mining is a process with fairly well-understood
stages.
Some involve the application technology, such as
the automated discovery and evaluation of patterns
from data, while others mostly require an analysts
creativity, business knowledge, and common sense.

Fundamental concepts
Since the data mining process breaks up the overall
task of finding patterns from data into a set of welldefined subtasks, it is also useful for structuring
discussions about data science.
This chapter introduces the data mining process,
but first we provide additional context by
discussing common types of data mining task.

From Business Problems to Data


Mining Tasks
Each data-driven business decision-making
problem is unique, comprising its own combination
of goals, desires, constraints, and even
personalities.
The solutions to the subtasks can then be
composed to solve the overall problem. Some of
subtasks are unique to the particular business
problem, but others are common data mining tasks.

From Business Problems to Data


Mining Tasks
Example: telecommunications churn problem( )
is unique to MegaTelCo:
Estimate from historical data the probability of a customer
terminating her contract shortly after it has expired.

Despite the large number of specific data mining algorithms


developed over the years, there are only a handful of
fundamentally different types of tasks these algorithms
address.
Individual( ):entity. Ex: a customer or a business.
Correlations( ):between a particular variable describing an
individual the company after their contracts expired.

Classification
Classification( ) and class probability
estimation attempt to predict, for each individual
in a population, which of a (small) set of classes
this individual belongs to.
Example question: Among all customers of
MegaTelCo, which are likely to respond to a given
offer?
Two classes: will respond and will not respond.

Scoring or class probability estimation


Score representing the Probability( quantification of
likelihood)

Regression
Regression( )(value estimation) attempts to
estimate or predict, for each individual, the
numerical value of some variable for that individual.
Example question: How much will a given
customer use the service?
Classification predicts whether something will
happen.
Regression predicts how much something will
happen.

Similarity matching
Similarity matching( ) attempts to
identify similar individuals based on data known
about them.
Ex: IBM is interested in finding companies similar to
their best business customer, in order to focus their
sales force on the best opportunities.
Recommendations

Clustering
Clustering( ) attempts to group individuals in a
population together by their similarity, but not
driven by any specific purpose.
Example question: Do our customers form natural
groups or segments?
Decision-making processes.

10

Co-occurrence grouping
Co-occurrence grouping( ) attempts to find
association between entities based on transactions
involving them.
Example question: What items are commonly
purchased together?
Ex: analyzing purchase records from a
supermarket.
Recommendation system

11

Profiling
Profiling( )(also as behavior description(
)) attempts to characterize the typical behavior o
an individual, group, or population.
Example question: What is the typical cell phone
usage of this customer segment?
Profiling is often used establish behavioral norms
for anomaly detection applications.
Fraud detection and monitoring for intrusions to computer
systems.

12

Link prediction
Link prediction( ) attempts to predict
connections between data items, usually by
suggesting that a link should exist, and possibly
also estimating the srength of the link.
Since you and Karen share 10 friends, maybe
youd like to be Karens friend?

13

Data reduction
Data reduction( ) attempts to take a large
set of data and replace it with a smaller set of data
that contains much of the important information in
the larger set.
Ex: a massive dataset on consumer movie-viewing
preferences may be reduce to a much smaller
dataset revealing the customer taste preferences.

14

Causal modeling
Causal modeling attempts to help us understand
what events or actions actually influence others.
Ex: consider that we use predictive modeling to
target advertisements to consumers.

15

Supervised vs. Unsupervised


Methods
Unsupervised: no specific purpose or target.
Do our customers nataturally fall into different group?

Supervised: specific target defined.


Can we find groups of customers who have particularly
high likelihoods of canceling their service soon after their
contracts expires?

16

Supervised vs. Unsupervised


Methods
Supervised data mining: there must be data on the
target.
Tow subclasses of supervised data mining:
Classification
Will this customer purchase service S1 if given incentive I?
Which service package( S1, S2 , or none) will a customer
likely purchase if given incentive I?

Regression
How much will this customer use the service

17

Supervised vs. Unsupervised


Methods
A vital part in the early stages of the data mining
process
To decide whether the line of attack will be supervised or
unsupervised.
If supervised, to produce a precise definition of a target
variable. This variable must be specific quantity that will
be the focus of the data mining.

18

Data Mining and Its Result


Distinction pertaining to mining data:
Mining the data to find patterns and build models.
Using the results of data mining.

Churn example.

19

Data Mining and Its Result

20

The Data Mining Process


Cross Industry
Standard
Process for
Data Mining
process.

21

Business Understanding
It is vital to understand the problem to solved.
A part of the craft where the analysts creativity
plays a large role.
The design team should think carefully about the
use scenario.

22

Data Understanding
The data comprise the available raw material from
which the solution will be built.
Estimating the costs and benefits of each data
source and deciding whether further investment is
merited.
Ex:
Credit card fraud
Medicare fraud

23

Data Preparation

24

Data preparation
Often proceeds along with data understanding.
Ex.
1. converting data to tabular format.
2. removing or inferring missing values.
3. converting data to different types.

25

Data preparation
Leaks
a variable collected in historical data gives information on
the target variable-information that appears in historical
data but is not actually available when the decision has to
be made.
Leakage must be considered carefully during data
preparation.

26

Modeling
Output of modeling is some sort of model or
pattern capturing regularities in the data.

27

Evaluation
Assess the data mining results rigorously and to
gain confidence that they are valid and reliable
before moving on.
Includes both quantitative and qualitative
assessments.

28

Deployment
Put into real use in order to realize some return on
investment.
The clearest cases of deployment involve
implementing a predictive model in some
information system or business process.

ex. Churn example

29

Deployment
The data mining techniques themselves are
deployment.
Two reasons
1. the world may change faster than the data science team
can adapt, as with fraud and intrusion detection.
2. a business has too many modeling tasks for their data
science team to manually curated each model individually.

30

Deployment
Can also be mush less technical
It is not necessary to fail in deployment to start the
cycle again. The Evaluation stage may reveal that
results are not good enough to deploy.

31

Implications for Managing the


Data Science Team
It is tempting - but usually a mistake - to view the
data mining process as a software development
cycle.
Software skills versus analytics skills

32

Other Analytics Techniques and


Technologies
Present six groups of related analytic techniques.
Comparisons and contrasts with data mining.
Data mining => automated search for knowledge,
patterns, or regularities from data.
Business analyst => to recognize what sort of
analytic technique is appropriate for addressing a
particular problem.

33

Statistics
Two different uses in business analytics.
1. used as a catchall term for the computation of particular
numeric values of interest from data.
2. denote the field of study that goes by that name.

34

Data Querying
A specific request for a subset of data or for
statistics about data, formulated in a technical
language and posed to a database system.
Differs fundamentally from data mining in that
there is no discovery of patterns or models.
Ex: select * from customers where age >45
sex = m and domicile = ne

and

35

Data Querying
On-line Analytical Processing (OLAP)
easy-to-use GUI to query large data collections
Data mining tools generally can incorporate new
dimensions of analysis easily as part of the
exploration.

36

Data Warehousing
Collect and coalesce data from across an enterprise,
often from multiple transaction-processing systems,
each with its own database.

37

Regression Analysis
This will involve estimating or predicting values for
cases that are not in the analyze data set.

38

Machine Learning and Data


Mining
A field of study arose as a subfield of Artificial
Intelligence, which was concerned with methods for
improving the knowledge or performance of an
intelligent agent over time.
KDD focused on concerns raised by examining realworld applications.

39

Answer Business Questions with


these Techniques
1. who are the most profitable customers?
2. Is there really a difference between the profitable
customers and the average customer?
3. But who really are these customers? Can I
characterize them?
4. Will some particular new customer be profitable?
How much revenue should I expect this customer to
generate?

40

summary
Data mining is a craft. As with many crafts, there is
a well-defined process that can help to increase the
likelihood of a successful result.
We will refer back to the data mining process
repeatedly throughout the book, showing how each
fundamental concept fits in.

41

THE END

Potrebbero piacerti anche