Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
When the Titanic sank, 1502 of the 2224 passengers and crew got killed. One of the
main reasons for this high level of casualties was the lack of lifeboats on this
supposedly unsinkable ship.
Those that have seen the movie know that some individuals were more likely to
survive the sinking (lucky Rose) than others (poor Jack). In this course you wil apply
machine learning techniques to predict a passenger's chance of surviving using R.
Let's start with loading in the training and testing set into your R environment. You
will use the training set to build your model, and the test set to validate it. The data is
stored on the web as CSV files; their URLs are already available as character strings
in the sample code. You can load this data with the read.csv() function.
The training and test set that you've imported before are already available in the
workspace, as train and test. Which of the following statements is correct?
Rose vs Jack, or Female vs Male
How many people in your training set survived the disaster with the Titanic? To see
this, you can use the table() command in combination with the $-operator to select
a single column of a data frame:
# absolute numbers
table(train$Survived)
# proportions
prop.table(table(train$Survived))
If you run these commands in the console, you'll see that 549 individuals died (62%)
and 342 survived (38%). A simple prediction heuristic could thus be "majority wins":
you predict every unseen observation to not survive.
table(train$Sex, train$Survived)