Df = number of variables
Sort the data on the "mah_2" column
Dependent var : X19
Independent var : X6 – X18
Df = 13
Multivariate outlier :
Case : 98, 36
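The steps above (compute a Mahalanobis distance per case, sort on it, compare against a chi-square value with df equal to the number of variables) can be sketched in R. This is only an illustration under stated assumptions: the data are synthetic stand-ins for X6 to X18, and the 0.001 significance level is a common convention that the notes do not state.

```r
# Sketch: flag multivariate outliers via squared Mahalanobis distance.
# ASSUMPTIONS: synthetic data replaces X6..X18; alpha = 0.001 is a convention,
# not a value given in the notes.
set.seed(1)
p <- 13                                   # number of independent variables
X <- matrix(rnorm(100 * p), ncol = p)
colnames(X) <- paste0("X", 6:18)

# Squared Mahalanobis distance of each case from the centroid
mah <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Chi-square cutoff with df = number of variables
cutoff <- qchisq(1 - 0.001, df = p)

# Sort in descending order, mirroring the "sort on mah_2" step
res <- data.frame(case = seq_len(nrow(X)), mah = mah)
res <- res[order(-res$mah), ]
head(res)

# Cases whose distance exceeds the cutoff are candidate multivariate outliers
res$case[res$mah > cutoff]
```

With real data, `X` would hold the 13 independent variables, and the cases at the top of the sorted table are the ones to inspect first.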
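# Check for missing values in the data
nrows <- nrow(data)
ncomplete <- sum(complete.cases(data))
# Number of rows without any missing value
ncomplete
# Number of rows with at least one missing value
nrows - ncomplete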
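The count of complete rows does not show where the gaps are. As a small sketch, the per-column counts and the affected row numbers can be listed as follows; `toy` is a hypothetical data frame used only for illustration, not the data set from these notes.

```r
# Hypothetical toy data frame standing in for `data`
toy <- data.frame(age  = c(30, NA, 45, 52),
                  loan = c("yes", "no", NA, "no"),
                  y    = c("no", "no", "yes", NA))

# Number of missing values in each column
na_per_col <- colSums(is.na(toy))
na_per_col

# Row numbers that contain at least one missing value
incomplete_rows <- which(!complete.cases(toy))
incomplete_rows
```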
# Check the structure of the data and its variables
str(data)
library(ggplot2)
# Plot the distribution of the dependent variable
pl1 <- ggplot(data, aes(x = y))
# alpha must be numeric, not a string
pl1 + geom_density(fill = "red", alpha = 0.7)
# The distribution looks similar to a half-normal distribution.
# On closer inspection, there is a sudden spike towards the right end of the graph.
# This may be a sentinel value.
# A sentinel value is a special kind of bad numerical value: a value that is used to represent "yes", "no", or other special cases in numeric data.
# One way to detect sentinel values is to look for sudden jumps in an otherwise smooth distribution of values.
# The summary of the "y" variable can confirm this.
summary(data$y)
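A sentinel value tends to show up as one exact value with an unusually large count. The following is a minimal sketch of that check on a hypothetical vector `v`, in which 999 plays the role of the sentinel code; neither the vector nor the code 999 comes from the data set in these notes.

```r
# Hypothetical numeric vector: smooth values plus a sentinel code of 999
set.seed(42)
v <- c(round(rexp(500, rate = 1 / 50)), rep(999, 40))

# A large gap between the 3rd quartile and the maximum hints at a sentinel
summary(v)

# Most frequent values: a sentinel appears as a single value with a large count
freq <- sort(table(v), decreasing = TRUE)
head(freq, 3)

# Share of observations sitting exactly at the maximum
share_at_max <- mean(v == max(v))
share_at_max
```

If a large share of observations sits at a single extreme value, that value is worth checking against the data dictionary before it is treated as a real measurement.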
# Summary statistics for every column
summary(data)
# Distribution of age
hist(data$age)
# Share of each category in y and in loan
pie(table(data$y))
pie(table(data$loan))
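Pie charts show the category shares only visually. As a sketch, the same shares can be obtained as numbers with `prop.table()` and drawn as a bar chart; `loan` here is a hypothetical vector, not the column from the data set.

```r
# Hypothetical categorical vector standing in for data$loan
loan <- c("no", "no", "yes", "no", "unknown", "yes", "no", "no")

counts <- table(loan)
props <- prop.table(counts)   # shares that sum to 1
round(props, 3)

# Bar chart of the same shares, as an alternative to pie()
barplot(props, ylab = "Proportion", main = "loan")
```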