
MINISTERE DE L’ENSEIGNEMENT SUPERIEUR ET RECHERCHE SCIENTIFIQUE

ECOLE POLYTECHNIQUE DE TUNISIE

STATISTICAL ANALYSIS
Assignment #2

Prepared by: Houcem Ben Salem

Academic year: 2019-2020


Exercise 1:

a- [Figure: Results of the 2014 elections]

b- We could simply draw two bar charts, one per election, with identical y-axes in both plots.
However, if we want to compare the two elections in a single graph, the best solution is to plot,
for each political party, the difference between its percentage in the 2014 election and its
percentage in the 2009 election.

[Figure: Difference in results between the 2014 and 2009 elections]
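
A minimal sketch of such a difference plot in R. The party names and vote shares below are hypothetical placeholders, not the actual election results, and all variable names are assumptions made for illustration:

```r
# Hypothetical vote shares in percent (placeholders, NOT real election results)
parties    <- c("Party A", "Party B", "Party C", "Party D")
share_2009 <- c(40, 30, 20, 10)
share_2014 <- c(35, 33, 18, 14)

# Difference per party: a positive value means the party gained votes
diff_share <- share_2014 - share_2009
names(diff_share) <- parties

# One bar per party; bars below the zero line are losses
barplot(diff_share,
        ylab = "Difference in percentage points (2014 - 2009)",
        main = "Difference in results between the 2014 and 2009 elections")
abline(h = 0)
```

A single chart of differences makes gains and losses readable at a glance, at the cost of hiding the absolute level of each party.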

Exercise 2:
a- It’s clear that the scale of X is continuous.

b- The table below contains all the information needed for this exercise, in particular
for drawing the histogram.

j   [ej-1, ej)   nj   fj      dj   hj       F(ej)
1   [0, 15)      19   19/55   15   19/825   19/55
2   [15, 30)     17   17/55   15   17/825   36/55
3   [30, 45)     6    6/55    15   6/825    42/55
4   [45, 60)     5    5/55    15   5/825    47/55
5   [60, 75)     4    4/55    15   4/825    51/55
6   [75, 90)     2    2/55    15   2/825    53/55
7   [90, 96)     2    2/55    6    2/330    1

c-

d- We can reproduce the histogram in R with the function “hist”.
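
A minimal sketch of the “hist” call. The raw observations are not listed here, so the sample below is a stand-in (an assumption): each observation is placed at the midpoint of its class, with class boundaries implied by the class widths dj in the table in (b).

```r
# Class boundaries implied by the widths dj, and the absolute frequencies nj
breaks <- c(0, 15, 30, 45, 60, 75, 90, 96)
counts <- c(19, 17, 6, 5, 4, 2, 2)

# Stand-in sample (an assumption): every observation sits at its class midpoint
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
x    <- rep(mids, times = counts)

# right = FALSE gives left-closed classes [a, b); freq = FALSE plots densities
h <- hist(x, breaks = breaks, right = FALSE, freq = FALSE,
          xlab = "x", main = "Histogram of the grouped data")

# Bar heights equal fj / dj
h$density
```

With the real data, only the definition of x would change; the breaks argument is what produces the unequal last class.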


To obtain a kernel density plot, we first have to compute the density estimate. Drawing it with the
function “plot” gives only the outline of the curve, so we can then use the function “polygon” to
fill the area under the curve with color.
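
A minimal sketch of these three steps, using the base R function density with its default Gaussian kernel and bandwidth. The sample is again a stand-in built from class midpoints (an assumption, since the raw data are not listed here), so the curve is only illustrative:

```r
# Stand-in sample (an assumption): class midpoints repeated by class frequency
breaks <- c(0, 15, 30, 45, 60, 75, 90, 96)
counts <- c(19, 17, 6, 5, 4, 2, 2)
mids   <- (head(breaks, -1) + tail(breaks, -1)) / 2
x      <- rep(mids, times = counts)

# Step 1: kernel density estimate (Gaussian kernel, default bandwidth)
d <- density(x)

# Step 2: plot() draws only the outline of the curve
plot(d, main = "Kernel density estimate")

# Step 3: polygon() fills the area under the curve with color
polygon(d, col = "lightblue", border = "black")
```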
e- The empirical cumulative distribution function for these data has already been calculated in the
last column of the table in (b).

f- We can use the function “plot.ecdf”, which readily gives a plot of the ECDF of the original data.

g- To answer this question we can simply use the formula from the course:

(i) H (X ≤ 45) = F (45) = 42/55 ≈ 0.76.

(ii) H (X > 80) = 1 − F (80) = 1 − (51/55 + 2/825 · (80 − 75)) ≈ 0.061

(iii) H (20 ≤ X ≤ 65) = F (65) − F (20) ≈ 0.43
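
The three proportions can be checked with a short R sketch of the grouped-data ECDF. The function name grouped_ecdf is an assumption made for illustration, and the class boundaries are those implied by the class widths dj in the table in (b):

```r
# Class boundaries and relative frequencies fj from the table in (b)
breaks <- c(0, 15, 30, 45, 60, 75, 90, 96)
f      <- c(19, 17, 6, 5, 4, 2, 2) / 55

# F(x) = F(e_{j-1}) + fj / dj * (x - e_{j-1}) for x inside class j.
# Only valid for x between the first and the last class boundary.
grouped_ecdf <- function(x, breaks, f) {
  F_lower <- c(0, cumsum(f))   # F at the lower boundary of each class
  d       <- diff(breaks)      # class widths dj
  j       <- findInterval(x, breaks, rightmost.closed = TRUE)
  F_lower[j] + f[j] / d[j] * (x - breaks[j])
}

grouped_ecdf(45, breaks, f)                                 # (i)
1 - grouped_ecdf(80, breaks, f)                             # (ii)
grouped_ecdf(65, breaks, f) - grouped_ecdf(20, breaks, f)   # (iii)
```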

Exercise 3:

a-

Score     1      2      3       4       5       6       7       8       9       10
Results   1      3      8       8       27      30      11      6       4       2
fj        1/100  3/100  8/100   8/100   27/100  30/100  11/100  6/100   4/100   2/100
Fj        1/100  4/100  12/100  20/100  47/100  77/100  88/100  94/100  98/100  1

In R:
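
A minimal sketch of how the rows of this table could be computed, assuming the counts are typed in by hand (the variable names results, f and Fj are assumptions):

```r
# Absolute frequencies of the scores 1 to 10, copied from the table
results <- c(1, 3, 8, 8, 27, 30, 11, 6, 4, 2)
names(results) <- 1:10

f  <- results / sum(results)   # relative frequencies fj
Fj <- cumsum(f)                # cumulative relative frequencies Fj

# Stack the three rows into one table, one column per score
rbind(Results = results, fj = f, Fj = Fj)
```
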
b- From the table and the formula from the course, we have:

F (3) = 12 % and F (9) = 98 %

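c- The scores are now grouped into two classes, [0, 5] with relative frequency 0.47 and (5, 10] with
relative frequency 0.53, so the ECDF for the grouped data rises linearly within each class, from 0 to 1.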

Applying the course formula for grouped data: F(x) = F(ej-1) + (fj/dj) · (x − ej-1)

F (3) = F (0) + (0.47/5) · (3 − 0) = 28.2 %

F (9) = F (5) + (0.53/5) · (9 − 5) = 89.4 %.
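
These two values can be reproduced in R with linear interpolation between the class boundaries, which is what the grouped-data formula amounts to. This is a sketch that uses the base function approxfun; the names F_at_breaks and F_grouped are assumptions:

```r
# Two classes, [0, 5] and (5, 10], with relative frequencies 0.47 and 0.53
breaks      <- c(0, 5, 10)
F_at_breaks <- c(0, 0.47, 1)   # ECDF values at the class boundaries

# Linear interpolation between boundaries reproduces the grouped-data formula
F_grouped <- approxfun(breaks, F_at_breaks)

F_grouped(3)   # 0.282
F_grouped(9)   # 0.894
```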

d- There is a large difference between the results of (b) and (c) because a different formula is
applied in each case. In (c) we assumed that the values are uniformly distributed within each category
(each value occurs as often as every other value). However, from question (a) we know that more
values lie towards the central scores. We can therefore conclude that the assumption made in (c) is
inappropriate.

Exercise 4:

a-
b- After installing the tidyverse package (install.packages("tidyverse")), which contains ggplot2, we
can use it.

c-

d-
