
Summary of the lab:

This lab deals with SAP Predictive Analysis, a data mining tool. Predictive analysis of this
kind is one of the new features of SAP Business Intelligence (BI) and is separate from
SAP Business Warehouse (BW). We upload an MS Excel file of 2201 records, called the
Titanic file, which contains data on all the passengers who travelled on the Titanic, only
some of whom survived. The associations among the items are derived by taking the
Survived column as the input. An association analysis is performed on the survivor data
using the R-Apriori algorithm. The survival chances of passengers are calculated using
the columns Class, Sex and Age. After running the algorithm, we can visualize the rules by
clicking the Visualize option and selecting an appropriate chart from the Charts option.
We can also export the association rules to an output file in CSV format.
Lessons learned:
I learned the following from this lab:
We can perform predictive analysis using the SAP Predictive Analysis tool.
This is a new tool that lets us perform analysis outside SAP BW.
An input file has to be uploaded to the analysis tool to determine the association rules
using properties such as Confidence and Support.
We can visualize the generated rules using the advanced chart options provided by SAP
Predictive Analysis.
We can also predict any column without specifying any key fields.
Files can be exported in various formats using the Export option.
Comparison between SAP Association Analysis with BW in Lab 5 and Predictive Analysis
from this lab:
Similarities:
Both use an Excel file as input for the analysis.
Differences:
SAP Association Analysis is done within the SAP BW environment, while Predictive Analysis is
done in a separate SAP Predictive Analysis environment.
In Association Analysis we obtain a different number of rules by changing the value of N,
while in Predictive Analysis we do so by changing the properties of the R-Apriori algorithm.
In SAP Predictive Analysis we can visualize the rules that are generated, whereas
Lab Questions

Lab question # 1: What can you tell about this data file? If you are asked to perform some
sort of data mining task on it, what would you do (i.e. what kind of data mining task may
you be able to perform)?
The data file lists the passengers who were travelling on the Titanic at the time of the
crash. It has a total of 2201 records and 5 columns: Passenger, Class, Sex, Age and
Survived. The Passenger field gives the passenger ID, the Class field gives the class
in which the passenger booked the journey (1st class, 2nd class or 3rd class), the Sex field
gives the gender (male or female), the Age field states whether the passenger is an adult or
a child, and the Survived field states whether the passenger survived the crash (yes or no).
For the data mining task, we can perform classification analysis on the given data file. The
chances of survival on the basis of Age, Sex and Class can be predicted using this
analysis.
Lab question # 2: Recall the parameters that we have set for this analysis including the
confidence and support that we have chosen. (i.e. screenshot # 2), what is the purpose of
the analysis?
The purpose of this analysis is to list the association rules whose Support is
higher than 1% (0.01) and whose Confidence is higher than 80% (0.80). All possible
association rules are built from only the 4 items Class, Age, Sex and Survived.
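How the two thresholds act on a candidate rule can be sketched in a few lines of Python. This is only an illustration, not the R-Apriori implementation used by the tool: the helper names `rule_metrics` and `passes` are hypothetical, the ten records are made-up toy rows rather than rows from the Titanic file, and treating the thresholds as "at or above" is an assumption.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def rule_metrics(records: List[Record], lhs: Record, rhs: Record) -> Tuple[float, float]:
    """Return (support, confidence) for the rule lhs => rhs."""
    def matches(rec: Record, cond: Record) -> bool:
        return all(rec.get(key) == value for key, value in cond.items())

    n_lhs = sum(1 for r in records if matches(r, lhs))
    n_both = sum(1 for r in records if matches(r, lhs) and matches(r, rhs))
    support = n_both / len(records)                 # share of all records matching lhs and rhs
    confidence = n_both / n_lhs if n_lhs else 0.0   # share of lhs records that also match rhs
    return support, confidence


def passes(support: float, confidence: float,
           min_support: float = 0.01, min_confidence: float = 0.80) -> bool:
    """Apply the lab's thresholds (assumed here to be inclusive)."""
    return support >= min_support and confidence >= min_confidence


# Toy records for illustration only; these are NOT taken from the Titanic file.
toy = (
    [{"Class": "2nd", "Sex": "Female", "Age": "Adult", "Survived": "Yes"}] * 4
    + [{"Class": "2nd", "Sex": "Female", "Age": "Adult", "Survived": "No"}] * 1
    + [{"Class": "3rd", "Sex": "Male", "Age": "Adult", "Survived": "No"}] * 5
)

support, confidence = rule_metrics(
    toy, {"Class": "2nd", "Sex": "Female"}, {"Survived": "Yes"}
)
keep_rule = passes(support, confidence)
```

On the toy rows, 4 of 10 records match both sides of the rule and 4 of the 5 matching the left-hand side also match the right-hand side, so the rule would be kept under these thresholds.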
Lab question # 3: Pick the second rule from the table i.e. {Class=2nd}=> {Age=Adult}, what
can we tell about the rule?
This rule states that a person travelling in 2nd class is an adult with 92% confidence.
Equivalently, such a person is a child (not an adult) with 8% confidence. Among all
passengers, 12% are adults travelling in 2nd class; this is the support of the rule.
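The figures in this answer are linked by the definition of confidence, which is the support of the whole rule divided by the support of its left-hand side. A small sketch of that arithmetic follows; it uses only the rounded percentages quoted above, so the derived values are approximations, and the variable names are chosen here for illustration.

```python
TOTAL_RECORDS = 2201   # number of records in the Titanic file

confidence = 0.92      # share of 2nd-class passengers who are adults
support = 0.12         # share of all passengers who are 2nd-class adults

# confidence = support(lhs and rhs) / support(lhs), so the left-hand side
# {Class=2nd} must cover roughly support / confidence of all passengers.
antecedent_support = support / confidence

# The complementary outcome (a 2nd-class passenger being a child).
complement_confidence = 1 - confidence

# Approximate head count behind the 12% support (rounded inputs, rounded result).
approx_second_class_adults = round(support * TOTAL_RECORDS)
```

With these rounded inputs, the left-hand side covers about 13% of all passengers and the rule corresponds to roughly 264 records.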
Lab question # 4: What are the rules that we have received? Explain the rule which has
100% confidence.
1. A male passenger travelling in 3rd class does not survive, with a support of 19% and a
confidence of 83%.
2. A male passenger travelling in 2nd class does not survive, with a support of 7% and a
confidence of 86%.
3. A female passenger travelling in 1st class survives, with a support of 6% and a
confidence of 97%.
4. A female passenger travelling in 2nd class survives, with a support of 4% and a
confidence of 88%.
5. A child travelling in 2nd class survives, with a support of 1% and a confidence of
100%.
6. A male adult passenger travelling in 3rd class does not survive, with a support of 18%
and a confidence of 84%.
7. A male adult passenger travelling in 2nd class does not survive, with a support of 7% and
a confidence of 92%.
8. A female adult passenger travelling in 1st class survives, with a support of 6% and a
confidence of 97%.
9. A female adult passenger travelling in 2nd class survives, with a support of 4% and a
confidence of 86%.
Among these 9 rules, only Rule 5 has 100% confidence. It states that every child
travelling in 2nd class survived the crash, with a support of 1% (0.01) and a confidence
of 100% (1.00).
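As a quick cross-check, the nine rules can be written down as data and searched for the one with 100% confidence. The sketch below copies the rounded support and confidence values from the list above; the tuple layout and variable names are assumptions made for this example, and the head count derived from the 1% support is only approximate because the percentage is rounded.

```python
TOTAL_RECORDS = 2201   # number of records in the Titanic file

# (rule number, left-hand side, right-hand side, support, confidence)
rules = [
    (1, "Sex=Male, Class=3rd", "Survived=No", 0.19, 0.83),
    (2, "Sex=Male, Class=2nd", "Survived=No", 0.07, 0.86),
    (3, "Sex=Female, Class=1st", "Survived=Yes", 0.06, 0.97),
    (4, "Sex=Female, Class=2nd", "Survived=Yes", 0.04, 0.88),
    (5, "Age=Child, Class=2nd", "Survived=Yes", 0.01, 1.00),
    (6, "Sex=Male, Age=Adult, Class=3rd", "Survived=No", 0.18, 0.84),
    (7, "Sex=Male, Age=Adult, Class=2nd", "Survived=No", 0.07, 0.92),
    (8, "Sex=Female, Age=Adult, Class=1st", "Survived=Yes", 0.06, 0.97),
    (9, "Sex=Female, Age=Adult, Class=2nd", "Survived=Yes", 0.04, 0.86),
]

# Rules whose confidence is exactly 100%.
certain_rules = [rule for rule in rules if rule[4] == 1.00]

# Rough number of records behind the rounded 1% support of that rule.
approx_records = round(certain_rules[0][3] * TOTAL_RECORDS)
```

A support of about 1% of 2201 records means the rule rests on only a couple of dozen passengers, which is worth keeping in mind when reading the 100% confidence.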
Lab question # 5: According to the rules, who are the persons to survive the Titanic
tragedy?
1. All children who travelled in 2nd class survived the crash.
2. Most of the female passengers who travelled in 1st and 2nd class survived the
crash (average support of 5% and average confidence of 92.5%).
3. Most of the female adult passengers who travelled in 1st and 2nd class
survived the crash (average support of 5% and average confidence of 91.5%).
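The averages quoted in points 2 and 3 are simple means over the 1st-class and 2nd-class rules for each group. A short sketch of that arithmetic, using the percentages reported for the corresponding rules in Lab question # 4 (the dictionary layout and the helper name `average` are assumptions made for this example):

```python
def average(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# (support %, confidence %) per class, taken from the rules in Lab question # 4.
female = {"1st": (6, 97), "2nd": (4, 88)}         # rules 3 and 4
female_adult = {"1st": (6, 97), "2nd": (4, 86)}   # rules 8 and 9

female_avg_support = average([s for s, _ in female.values()])
female_avg_confidence = average([c for _, c in female.values()])

female_adult_avg_support = average([s for s, _ in female_adult.values()])
female_adult_avg_confidence = average([c for _, c in female_adult.values()])
```

These are unweighted means of two rules each, so they summarize the rules rather than give the exact survival rate of the combined group.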

Screenshots
Screenshot # 1

The above screenshot displays the analysis worksheet, where the R-Apriori algorithm under
Associations receives its data from the Titanic data source.

Screenshot # 2

The above screenshot displays the settings in the R-Apriori dialog window for the Run
Association Analysis step, where the confidence is set to 0.8.
Screenshot # 3

The above screenshot displays the results of R-Apriori under the Component Selector,
where we have a table of association rules.

Screenshot # 4

The above screenshot displays the rules whose right-hand side (Rhs) is Survived = No or
Survived = Yes.
Screenshot # 5

The above screenshot displays the resulting bubble chart, which shows the lift of each
rule; the colors distinguish the individual rules.
