
Exploring numeric variables

Display Summary
summary(evals)
summary(evals[c("ID", "bty_avg")])
       ID           bty_avg     
 Min.   :  1.0   Min.   :1.667  
 1st Qu.:116.5   1st Qu.:3.167  
 Median :232.0   Median :4.333  
 Mean   :232.0   Mean   :4.418  
 3rd Qu.:347.5   3rd Qu.:5.500  
 Max.   :463.0   Max.   :8.167  

Create Plot
boxplot(evals$score)
hist(evals$score, xlab = "Score")

Feature Plot (Library caret), BostonHousing data from library mlbench
library(caret)
library(mlbench)
data(BostonHousing)
featurePlot(x = BostonHousing[, c("age", "lstat", "tax")], y = BostonHousing$medv,
            plot = "scatter", layout = c(3, 1))

Get Quantile
quantile(evals$score)
0% 25% 50% 75% 100%
2.3 3.8 4.3 4.6 5.0
quantile(evals$score, probs = c(0.01, 0.99))
1% 99%
2.7 5.0
quantile(evals$score, seq(from = 0, to = 1, by = 0.20))
0% 20% 40% 60% 80% 100%
2.3 3.7 4.1 4.4 4.7 5.0
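The 25% and 75% quantiles also give the interquartile range (IQR), the spread of the middle half of the data. A minimal sketch, using a small hypothetical score vector in place of evals$score; quantile() and IQR() are base R (stats) functions and both use quantile type 7 by default:

```r
# Hypothetical scores standing in for evals$score
score <- c(2.3, 3.8, 4.1, 4.3, 4.4, 4.6, 4.7, 5.0)

# IQR from the first and third quartiles
q <- quantile(score, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

# Built-in equivalent
iqr_builtin <- IQR(score)

print(c(manual = iqr_manual, builtin = iqr_builtin))
```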

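Feature Plot with box plots per class (Library caret)
featurePlot(x = iris[, 1:4], y = iris$Species, plot = "box",
            scales = list(y = list(relation = "free"), x = list(rot = 90)),
            layout = c(4, 1), auto.key = list(columns = 2))

Measuring spread: variance and standard deviation

Variance
The average of the squared differences between each value and the mean value. Note that var() returns the sample variance, which divides the sum of squared differences by n - 1 instead of n.
var(evals$score)
[1] 0.2957886

sd(evals$score)
[1] 0.5438645

The standard deviation is the square root of the variance and is denoted by sigma:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$
sd() uses the same n - 1 denominator as var(), so sd(evals$score) equals sqrt(var(evals$score)).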
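Computing the variance and standard deviation by hand shows exactly what var() and sd() return. A minimal sketch on a small hypothetical vector x (not the evals data); it computes both the plain average of the squared differences (divide by n) and the n - 1 sample version that R uses:

```r
# Hypothetical sample standing in for evals$score
x <- c(2.3, 3.8, 4.1, 4.3, 4.4, 4.6, 4.7, 5.0)
n <- length(x)

dev2 <- (x - mean(x))^2            # squared differences from the mean

pop_var    <- sum(dev2) / n        # plain average of the squared differences
sample_var <- sum(dev2) / (n - 1)  # n - 1 version, as returned by var()

pop_sd    <- sqrt(pop_var)         # population standard deviation (sigma)
sample_sd <- sqrt(sample_var)      # as returned by sd()

print(c(pop_var = pop_var, var = var(x), pop_sd = pop_sd, sd = sd(x)))
```

For large samples the two versions are almost identical; for small ones the n - 1 version is noticeably larger.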

Exploring categorical variables

table(evals$ethnicity)
minority not minority
64 399
Calculate table proportions
prop.table(table(evals$ethnicity))
minority not minority
0.1382289 0.8617711

prop.table(table(evals$ethnicity)) * 100
minority not minority
13.82289 86.17711
round(prop.table(table(evals$ethnicity)) * 100, digits = 1)
minority not minority
13.8 86.2
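prop.table() on a one-way table simply divides each count by the total. A minimal sketch that rebuilds a vector with the same counts as the ethnicity table above (64 and 399) and checks this; the vector eth is hypothetical, not the evals column itself:

```r
# Hypothetical vector with the same counts as table(evals$ethnicity)
eth <- factor(c(rep("minority", 64), rep("not minority", 399)))
counts <- table(eth)

props <- counts / sum(counts)          # each count divided by the total
pct   <- round(props * 100, digits = 1)
print(pct)
```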

Classification using Nearest Neighbors


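Rescaling features for kNN
kNN compares examples by distance, so features measured on larger scales would otherwise dominate.

Min-max normalization rescales each feature to values from 0 to 1: (x - min(x)) / (max(x) - min(x)).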
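Min-max normalization is a one-line function applied column by column. A minimal sketch, where normalize() is a hypothetical helper name and the numeric iris columns stand in for the kNN features; a constant column would divide by zero and produce NaN:

```r
# Min-max normalization: maps the minimum to 0 and the maximum to 1
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply to every numeric feature column, then rebuild a data frame
iris_n <- as.data.frame(lapply(iris[, 1:4], normalize))
summary(iris_n$Sepal.Length)
```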

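z-score standardization subtracts the mean and divides by the standard deviation, (x - mean(x)) / sd(x), giving values with mean 0 and standard deviation 1 that are not limited to a fixed range.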
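Base R's scale() performs z-score standardization on every column of a numeric matrix or data frame. A minimal sketch, again using the numeric iris columns as stand-in features, with a manual check on one column:

```r
# scale() centers each column on its mean and divides by its sd()
iris_z <- as.data.frame(scale(iris[, 1:4]))

# Manual z-scores for one column, for comparison
z_manual <- (iris$Sepal.Length - mean(iris$Sepal.Length)) / sd(iris$Sepal.Length)

summary(iris_z$Sepal.Length)
```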

Converting a variable to a factor and changing its labels


table(br_di$diagnosis)
  B   M
357 212

br_di$diagnosis <- factor(br_di$diagnosis, levels = c("B", "M"),
                          labels = c("Benign", "Malignant"))

table(br_di$diagnosis)
   Benign Malignant
      357       212

round(prop.table(table(br_di$diagnosis)) * 100, digits = 1)
   Benign Malignant
     62.7      37.3
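Once the features are rescaled and the class label is a factor, they can be passed to a kNN classifier. A minimal sketch, assuming the knn() function from the recommended class package (shipped with R) and using the built-in iris data in place of br_di, which is not available here; the normalize() helper, the 30-row hold-out and k = 5 are illustrative choices, not part of the original notes:

```r
library(class)  # provides knn()

# Min-max normalization helper (hypothetical name)
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# iris stands in for br_di: numeric features plus a factor class label
features <- as.data.frame(lapply(iris[, 1:4], normalize))
labels   <- iris$Species

# Random 30-row test set; the seed makes the split repeatable
set.seed(123)
test_idx <- sample(nrow(iris), 30)

# Classify each test row by majority vote among its 5 nearest training rows
pred <- knn(train = features[-test_idx, ],
            test  = features[test_idx, ],
            cl    = labels[-test_idx],
            k     = 5)

# Predicted vs. actual classes, and overall accuracy
print(table(predicted = pred, actual = labels[test_idx]))
accuracy <- mean(pred == labels[test_idx])
print(accuracy)
```

An odd k reduces the chance of voting ties between two classes.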
