BAFPRED: Fundamentals of Predictive Analytics
Introduction
THE INFORMATION CONTAINED IN THIS PRESENTATION IS FOR INFORMATIONAL PURPOSES ONLY. IBM SHALL
NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS
PRESENTATION OR ANY OTHER DOCUMENTATION.
IBM, the IBM logo, ibm.com, Cognos, SPSS and iLog are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and other IBM
trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™),
these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information
was published. Trademarks may also be registered or common law trademarks in other countries. A current
list of IBM trademarks is available on the Web at “Copyright and trademark information” at
http://www.ibm.com/legal/copytrade.html. The IBM logo must not be moved, added to or altered in any way.
Other company, product, or service names may be trademarks or service marks of others.
IBM Global Center for Smarter Analytics © 2013 IBM Corporation
Course Description
This course is designed to introduce students to the fundamentals
of predictive analytics. Predictive analytics allows voluminous data
to be used for prediction, classification and association, making it a
very useful tool for projections, forecasts and correlations.
BAFPRES: Fundamentals of Predictive Analytics
Course Overview
12/14/2014
ID  SEX  STATUS  CHILDREN  STATUS
0   F    M       2         Vol
3   F    S       2         InVol

Does this mean that a new subscriber who fits the ...

ID    LONGDIST  International  LOCAL     DROPPED  PAY_MTH  LocalBillType  LongDistanceBillType  AGE  SEX  STATUS  CHILDREN  Est_Income  Car_Owner  STATUS
0     5.2464    7.5151         86.3278   0        CH       FreeLocal      Standard              57   F    M       2         27535.3     Y          Vol
3     0         0              3.94229   0        CC       Budget         Intnl_discount        50   F    S       2         64632.3     N          InVol
4     5.55564   0              9.36347   1        CC       Budget         Intnl_discount        68   F    M       2         81000.9     N          Vol
8     14.0193   5.68043        29.8065   0        CC       Budget         Standard              34   M    S       0         87467.1     Y          Current
10    13.664    2.95642        32.6381   0        CC       FreeLocal      Intnl_discount        60   M    M       2         83220.6     N          Vol
11    0         0              1.41294   0        CC       FreeLocal      Standard              84   F    S       0         50290.7     N          InVol
13    0.281029  0              8.53692   0        CH       Budget         Intnl_discount        28   F    M       2         20850.4     N          Vol
17    1.577     0              19.9808   0        CC       FreeLocal      Standard              52   M    S       0         84112.6     N          Current
20    0.452629  0              73.0122   0        Auto     FreeLocal      Standard              88   F    M       2         73865.9     Y          Vol
23    20.2946   0              76.0518   0        CC       FreeLocal      Standard              76   F    M       1         12309.6     N          Vol
42    8.86499   4.43676        43.6439   0        CH       Budget         Intnl_discount        55   M    S       2         85753.8     N          Vol
4820  0         0              0.660288  0        CH       Budget         Standard              49   F    M       0         68828.4     Y          Vol

MORE DATA may provide a more accurate ANALYSIS.
But is it practical to do this MANUALLY?
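One way to see why manual inspection does not scale is to let a few lines of code do the counting. The following is a minimal sketch, not part of the original slides: it copies the SEX, marital STATUS, CHILDREN and final STATUS values of the twelve sample rows shown above and tallies the outcomes per subscriber profile. The helper name `tally` and the tuple layout are illustrative choices.

```python
from collections import Counter, defaultdict

# (ID, SEX, marital STATUS, CHILDREN, final STATUS), copied from the sample rows above
rows = [
    (0, "F", "M", 2, "Vol"),
    (3, "F", "S", 2, "InVol"),
    (4, "F", "M", 2, "Vol"),
    (8, "M", "S", 0, "Current"),
    (10, "M", "M", 2, "Vol"),
    (11, "F", "S", 0, "InVol"),
    (13, "F", "M", 2, "Vol"),
    (17, "M", "S", 0, "Current"),
    (20, "F", "M", 2, "Vol"),
    (23, "F", "M", 1, "Vol"),
    (42, "M", "S", 2, "Vol"),
    (4820, "F", "M", 0, "Vol"),
]


def tally(records, key):
    """Count final STATUS values for every group produced by `key`."""
    counts = defaultdict(Counter)
    for record in records:
        counts[key(record)][record[4]] += 1
    return counts


# Group by (SEX, marital STATUS, CHILDREN) and show the outcome counts per profile
by_profile = tally(rows, key=lambda r: (r[1], r[2], r[3]))
for profile, outcomes in sorted(by_profile.items()):
    print(profile, dict(outcomes))
```

With only twelve rows the tallies say little about new subscribers, but the same loop runs unchanged on thousands of records, which is the point the slide raises.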
CRISP-DM[8]
Business Understanding[8]
Most important phase of data mining. Includes determining business objectives, situation
assessment, data mining goals and producing a project plan.
• Data selection
Modeling[8]
This is the phase where analysis methods are used to extract information from the data.
Involves selecting modeling techniques, generating test designs, and building and then
assessing models.
Evaluation[8]
Involves evaluating the data mining results. The key aim is to determine whether there are
critical business issues that have not been sufficiently considered.
• Predictor Selection
• Modeling
• Testing
• Predictor Extraction
• Model
• Prediction
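The two bullet lists above can be read as a training workflow (predictor selection, modeling, testing) followed by a scoring workflow (predictor extraction, model, prediction). The sketch below illustrates that reading under stated assumptions: the records, the field names `monthly_minutes` and `dropped_calls`, and the nearest-centroid classifier are all hypothetical stand-ins chosen for brevity, not the technique used in the course.

```python
# Hypothetical labelled records: (monthly_minutes, dropped_calls, label)
labelled = [
    (5.0, 0, "stay"), (6.0, 1, "stay"), (7.0, 0, "stay"), (8.0, 1, "stay"),
    (1.0, 4, "leave"), (2.0, 5, "leave"), (0.5, 6, "leave"), (1.5, 5, "leave"),
]


def extract_predictors(record):
    """Predictor selection / extraction: keep only the chosen input fields."""
    return record[0], record[1]


def train(records):
    """Modeling: compute the mean predictor vector (centroid) of each class."""
    sums = {}
    for record in records:
        x, y = extract_predictors(record)
        sx, sy, n = sums.get(record[2], (0.0, 0.0, 0))
        sums[record[2]] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(model, predictors):
    """Prediction: return the class whose centroid is closest to the predictors."""
    x, y = predictors
    return min(
        model,
        key=lambda label: (model[label][0] - x) ** 2 + (model[label][1] - y) ** 2,
    )


# Testing: hold out every fourth record and measure accuracy on it
test_set = labelled[3::4]
train_set = [r for i, r in enumerate(labelled) if i % 4 != 3]
model = train(train_set)
correct = sum(predict(model, extract_predictors(r)) == r[2] for r in test_set)
accuracy = correct / len(test_set)

# Scoring: apply the same extraction and the trained model to an unseen record
new_label = predict(model, (6.5, 1))
print(accuracy, new_label)
```

The same `extract_predictors` function is used in both workflows, so the model sees new records in the form it was trained on.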
Data Mining and Machine Learning [1]
Data in the world, in our lives seems to go on increasing
Lying hidden in all this data is information, potentially useful information, that
is rarely made explicit or taken advantage of.
Data Mining and Machine Learning [1]
Is about solving problems by finding patterns in data already present
Useful patterns allow prediction of new data
Data Mining and Machine Learning [1]
What are the characteristics of customers who will stay with the Telephone Company
as subscribers?
What do you need to answer this question?
Surveys?
Focus Group Discussions?
Or do you use actual data?
Association model
Goal:
Identify what products are being sold together
Approach:
Use a data extract from a transactional system
Define which fields to use
Visualize relationship between products
Generate association model
Review results
Why?
Identify next likely purchase
Create bundles to increase $ value
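The approach above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical extract of five baskets with made-up product names; it counts how often pairs of products are sold together and derives the two usual association measures, support and confidence. It is not the association algorithm of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction extract: one set of product names per basket
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order
    pair_counts.update(combinations(sorted(basket), 2))


def support(first, second):
    """Share of all baskets that contain both products."""
    return pair_counts[tuple(sorted((first, second)))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]


# Review results: list rules "A -> B" whose confidence is at least 0.75
for a, b in pair_counts:
    for antecedent, consequent in ((a, b), (b, a)):
        if confidence(antecedent, consequent) >= 0.75:
            print(f"{antecedent} -> {consequent}",
                  support(antecedent, consequent),
                  confidence(antecedent, consequent))
```

A rule with high confidence suggests a likely next purchase, and a pair with high support is a candidate for a bundle.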
Deployment
Data mining and text analytics
• Pertains to algorithms
Data Mining
Predictors, Attributes, Features, Inputs
Label, Target, Class, Outputs
Learning Instance, Records
Witten, I. and Frank E. (2005). Data Mining Practical Machine Learning Tools and Techniques. Elsevier.
Sample Problem
Predictors
Target, Class or Label
14 Learning Instances
Rules
What are the rules for deciding whether to play tennis or not?
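The slide's table is not preserved in this text, so the sketch below assumes it showed the standard 14-instance weather data from Witten and Frank (2005). Under that assumption, it applies the 1R (one-rule) method described in the same book: for each predictor, map every value to its most frequent class, then keep the predictor whose rule makes the fewest errors on the learning instances. The function and variable names are illustrative.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Assumed data: the weather problem from Witten & Frank (2005).
# Columns: outlook, temperature, humidity, windy, play (the class label)
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(instances, attributes):
    """Return (attribute, rule, errors) for the single best one-attribute rule."""
    best = None
    for index, name in enumerate(attributes):
        # Count class labels separately for each value of this predictor
        by_value = defaultdict(Counter)
        for row in instances:
            by_value[row[index]][row[-1]] += 1
        # Each value predicts its majority class; the rest count as errors
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        # Strict comparison keeps the earlier attribute when error counts tie
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


attribute, rule, errors = one_r(WEATHER, ATTRIBUTES)
print(attribute, rule, f"{errors}/{len(WEATHER)} errors")
```

A single-predictor rule is deliberately crude; it serves here only to show how rules can be learned from labelled instances rather than written by hand.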
Unsupervised Learning
Learning instances are not labelled
Machine learning must be used to cluster similar instances
An expert will have to characterize the clusters later
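As an illustration of clustering unlabelled instances, here is a minimal k-means sketch. The slides do not name a clustering algorithm, so k-means is an assumption made for this example, and the six two-dimensional points and the choice of starting centroids are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, recompute centroids, repeat."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # A centroid with no assigned points keeps its previous position
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled instances with two numeric predictors each
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The algorithm only returns groups of similar instances. Deciding what each group means, for example "low-usage subscribers", is still left to an expert, as the slide notes.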
Rules
Identify the different components
Outliers
Data + Machine Learning Algorithm =
Check-up
How are data mining, machine learning and predictive analytics related?
References
[1] Witten, I. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and
Techniques. Elsevier.