com/en-us/library/ms175428
Scenario
The Lift Chart tab displays a graphical representation of the change in lift that a mining model causes. For example, the marketing department at Adventure Works Cycles wants to create a targeted mailing campaign. From past campaigns, they know that a 10 percent response rate is typical. They have a list of 10,000 potential customers stored in a table in the database. Therefore, based on the typical response rate, they can expect 1,000 of the potential customers to respond. However, the money budgeted for the project is not enough to reach all 10,000 customers in the database. Based on the budget, they can afford to mail an advertisement to only 5,000 customers. The marketing department has two choices:
Randomly select 5,000 customers to target
Use a mining model to target the 5,000 customers who are most likely to respond
If the company randomly selects 5,000 customers, they can expect to receive only 500 responses, based on the typical response rate. This scenario is what the random line in the lift chart represents. However, if the marketing department uses a mining model to target their mailing, they can expect a higher response rate because they can target those customers who are most likely to respond. A perfect model would make predictions that are never wrong, so the company could expect to receive all 1,000 responses by mailing to only the 1,000 potential customers recommended by the model. This scenario is what the ideal line in the lift chart represents. In reality, the mining model most likely falls between these two extremes: between a random guess and a perfect prediction. Any improvement over the random guess is considered to be lift.
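The arithmetic of the scenario can be sketched in a few lines of Python. This is only an illustration of the figures quoted above; the function names are hypothetical, and the 800-response figure in the usage note is an assumed value, not a result from the source.

```python
# Figures taken from the scenario: 10,000 customers, 10 percent typical response.
POPULATION = 10_000
RESPONSE_RATE = 0.10

# Number of customers who would respond if everyone were mailed (1,000).
TOTAL_RESPONDERS = round(POPULATION * RESPONSE_RATE)


def random_responses(mailed):
    """Expected responses when recipients are chosen at random (the random line)."""
    return mailed * RESPONSE_RATE


def ideal_responses(mailed):
    """Responses if a perfect model ranks every responder first (the ideal line)."""
    return min(mailed, TOTAL_RESPONDERS)


def lift(model_responses, mailed):
    """Ratio of the model's responses to the random baseline for the same mailing size."""
    return model_responses / random_responses(mailed)
```

With these definitions, `random_responses(5_000)` gives the 500 responses of the random scenario and `ideal_responses(1_000)` gives the 1,000 responses of the perfect model. If a model hypothetically produced 800 responses from 5,000 mailings, `lift(800, 5_000)` would be about 1.6.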
meaning that the customer purchased a bike or is likely to do so. The lift chart thus shows the improvement the model provides when identifying customers who are likely to buy a bike. You can add multiple models to a lift chart, as long as the models all have the same predictable attribute. In addition to the basic model, the chart includes a related model that has been filtered to target specific customers. The filter restricts the cases used in both training and evaluation to customers who are under the age of 30. As a result, the number of cases that the model is evaluated against differs for the basic model and the filtered model. This point is important to remember when you interpret the prediction results and other statistics.
The x-axis of the chart represents the percentage of the test dataset that is used to compare the predictions. The y-axis of the chart represents the percentage of predicted values.

The diagonal straight line, shown here in blue, appears in every chart. It represents the results of random guessing, and is the baseline against which to evaluate lift. For each model that you add to a lift chart, you get two additional lines: one line shows the ideal results for the training data set if you could create a model that always predicted perfectly, and the second line shows the actual lift, or improvement in results, for the model.

In this example, the ideal line for the filtered model is shown in dark blue, and the line for actual lift in yellow. You can tell from the chart that the ideal line peaks at around 40 percent, meaning that if you had a perfect model, you could reach 100 percent of your targeted customers by sending a mailing to only 40 percent of the total population. The actual lift for the filtered model when you target 40 percent of the population is between 60 and 70 percent, meaning you could reach 60-70 percent of your targeted customers by sending the mailing to 40 percent of the total customer population.

The Mining Legend contains the actual values at any point on the curves. You can change the place that is measured by clicking the vertical gray bar and moving it. In the chart, the gray line has been moved to 30 percent, because this is the point where both the filtered and unfiltered models appear to be most effective, and after this point the amount of lift declines.

The Mining Legend also contains scores and statistics that help you interpret the chart. These results represent the accuracy of the model at the gray line, which in this scenario is positioned to include 30 percent of the overall test cases.
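The curves described above can be approximated outside the tool. The following is a minimal sketch, assuming each test case has a predicted probability and a 0/1 target label; it is not how Analysis Services computes the chart, and the function names and sample data are hypothetical.

```python
def model_curve(scores, labels, steps=10):
    """Return (percent of population, percent of targets captured) pairs.

    Cases are ranked by predicted probability, highest first, and the
    share of target cases found so far is recorded at each step.
    """
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    total_targets = sum(ranked)
    points = []
    for i in range(1, steps + 1):
        cutoff = round(len(ranked) * i / steps)
        captured = sum(ranked[:cutoff])
        points.append((100 * i / steps, 100 * captured / total_targets))
    return points


def ideal_curve(labels, steps=10):
    """Same axes, but for a perfect ranking that lists every target case first."""
    total_targets = sum(labels)
    points = []
    for i in range(1, steps + 1):
        cutoff = round(len(labels) * i / steps)
        points.append((100 * i / steps, 100 * min(cutoff, total_targets) / total_targets))
    return points


# Hypothetical test set: ten cases, four of which are actual buyers (label 1).
SCORES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
LABELS = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
```

For this sample data, `ideal_curve(LABELS, steps=5)` reaches 100 percent at the 40 percent mark because 40 percent of the cases are buyers, which mirrors the way the ideal line in the chart peaks at around 40 percent. The random-guess baseline is simply the diagonal where the captured percentage equals the population percentage.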
Series, model                                 Score    Predict probability
Targeted mailing all                          0.71     61.38%
Targeted mailing under 30                     0.85     46.62%
Random guess model
Ideal model for: Targeted mailing all
Ideal model for: Targeted mailing under 30
From these results, you can see that, when measured at 30 percent of all cases, the general model (Targeted mailing all) can predict the bike-buying behavior of 47.40% of the target population. In other words, if you sent a targeted mailing to only 30 percent of the customers in your database, you could reach slightly less than half of your target audience. If you used the filtered model, you could reach about 51 percent of your targeted customers.

The value for Predict probability represents the threshold required to include a customer among the "likely to buy" cases. For each case, the model estimates the probability of each prediction and stores that value, which you can use to filter out or to target customers. For example, to identify the customers from the basic model who are likely buyers, you would use a query to retrieve cases with a Predict probability of at least 61 percent. To get the customers targeted by the filtered model, you would create a query that retrieves cases that meet all the criteria: age and a PredictProbability value of at least 46 percent.

It is interesting to compare the models. The filtered model appears to capture more potential customers, but when you target customers with a prediction probability score of 46 percent, you also have roughly a 53 percent chance of sending a mailing to someone who will not buy a bike. Therefore, if you were deciding which model is better, you would want to balance the greater precision and smaller target size of the filtered model against the selectiveness of the basic model.

The value for Score helps you compare models by calculating the effectiveness of the model across a normalized population. A higher score is better, so in this case you might decide that targeting customers under 30 is the most effective strategy, despite the lower prediction probability.
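The filtering step described above can be illustrated as follows. In Analysis Services this would be done with a prediction query; the sketch below uses plain Python on made-up records purely to show the selection logic. The field names, the `select_targets` helper, and the sample cases are all hypothetical; only the two thresholds (61.38% and 46.62%) and the under-30 filter come from the text.

```python
# Hypothetical scored cases; field names and values are illustrative only.
CASES = [
    {"customer_id": 1, "age": 25, "predict_probability": 0.82},
    {"customer_id": 2, "age": 41, "predict_probability": 0.65},
    {"customer_id": 3, "age": 29, "predict_probability": 0.50},
    {"customer_id": 4, "age": 35, "predict_probability": 0.40},
    {"customer_id": 5, "age": 22, "predict_probability": 0.30},
]


def select_targets(cases, min_probability, max_age=None):
    """Return IDs of cases at or above the probability threshold.

    If max_age is given, only customers younger than max_age are kept,
    mirroring the age filter of the filtered model.
    """
    selected = []
    for case in cases:
        if case["predict_probability"] < min_probability:
            continue
        if max_age is not None and case["age"] >= max_age:
            continue
        selected.append(case["customer_id"])
    return selected


# Basic model: threshold of 61.38 percent, no age restriction.
basic_targets = select_targets(CASES, 0.6138)

# Filtered model: threshold of 46.62 percent, customers under 30 only.
filtered_targets = select_targets(CASES, 0.4662, max_age=30)
```

The lower threshold of the filtered model admits cases that the basic model would reject, which is the trade-off the comparison above describes: a wider net within a narrower population, at the cost of less certain predictions.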
You can click in the chart to move the vertical gray bar, and the Mining Legend displays the percentage of cases overall, and the percentage of cases that were predicted correctly. For example, if you position the gray slider bar at the 50 percent mark, the Mining Legend displays the following accuracy scores. These figures are based on the TM_Decision Tree model created in the Basic Data Mining Tutorial.
Series, model       Score    Target population    Predict probability
TM_Decision Tree    0.77     40.50%               72.91%
Ideal model                  50.00%
This table tells you that, at 50 percent of the population, the model that you created correctly predicts 40 percent of the cases. You might consider this a reasonably accurate model. However, remember that this particular model predicts all values of the predictable attribute. Therefore, the model might be accurate in predicting that 90 percent of customers will not buy a bike.
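The caveat above can be made concrete with a small sketch. It assumes a hypothetical test set in which 90 percent of customers are non-buyers and a deliberately trivial model that always predicts "will not buy"; neither comes from the tutorial data, and the helper names are illustrative.

```python
def accuracy(predictions, actuals):
    """Fraction of cases where the prediction matches the actual value."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def accuracy_for_value(predictions, actuals, value):
    """Fraction of cases with the given actual value that were predicted correctly."""
    relevant = [p for p, a in zip(predictions, actuals) if a == value]
    return sum(p == value for p in relevant) / len(relevant)


# Hypothetical test set: 10 buyers (1) and 90 non-buyers (0).
ACTUALS = [1] * 10 + [0] * 90

# A trivial model that predicts "will not buy" for everyone.
ALWAYS_NO = [0] * 100

overall = accuracy(ALWAYS_NO, ACTUALS)                  # 0.9
buyers_found = accuracy_for_value(ALWAYS_NO, ACTUALS, 1)  # 0.0
```

The overall figure looks strong even though the model never identifies a single buyer, which is why a chart that combines all values of the predictable attribute should be read with care.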
Note
The prediction accuracy for all discrete values of the predictable attribute is shown in a single line. If you want to see prediction accuracy lines for any individual value of the predictable attribute, you must create a separate lift chart for that value.
See Also
Concepts
Validating Data Mining Models (Analysis Services - Data Mining)
Profit Chart (Analysis Services - Data Mining)
Classification Matrix (Analysis Services - Data Mining)
Scatter Plot (Analysis Services - Data Mining)
Cross-Validation Report (Analysis Services - Data Mining)