
DESCRIPTIVE STATISTICS

Hadyana Sukandar
LEARNING OBJECTIVES :

After studying this module, participants will be able to :


1. Explain why summary indices are needed in medicine.
2. Compute the mean, median, and mode of a given set
of data (grouped and ungrouped).
3. Discuss the uses and limitations of the mean, median
and mode, and their relative merits and disadvantages
as summary indices of health data.
4. Select an appropriate measure of central tendency and
location for a given application.
5. Explain the meaning and application of the confidence
interval of an estimate of a health index.
MEASURES OF CENTRAL TENDENCY
AND LOCATION

1. MEAN
2. MEDIAN
3. MODE
4. WEIGHTED MEAN
5. QUARTILE, DECILE AND PERCENTILE
6. GEOMETRIC MEAN
1. The mean is the arithmetic sum of the observations
divided by the number of observations.

2. The median is the middle observation when the
observations are listed in increasing order. Thus it is
the (n+1)/2 th observation in the ordered series (for
even n, the average of the two middle observations).

3. The mode is the value which occurs most frequently.

4. A weighted mean is a mean for which individual
values in the set are weighted, very often by their
respective frequencies.

5. The geometric mean is often used as an alternative to
the mean when the observations are distributed
asymmetrically with a relatively small number of very
large values. The geometric mean, XG, is the antilog of
the average of the logarithms of the values in the
sample.
A. UNGROUPED DATA

5, 3, 9, 7, 1, 3, 6, 8, 2, 6, 6 (n = 11)

MEAN = (5 + 3 + 9 + … + 6) / 11 = 56 / 11 = 5.1
MEDIAN (ARRAY : 1, 2, 3, 3, 5, 6, 6, 6, 7, 8, 9)
= 6 (the (11 + 1)/2 = 6th observation)
MODE = 6 (occurs three times)
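
The three results above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The 11 ungrouped observations from the example
data = [5, 3, 9, 7, 1, 3, 6, 8, 2, 6, 6]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle value of the ordered array
mode = statistics.mode(data)      # most frequent value

print(round(mean, 1), median, mode)  # 5.1 6 6
```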
EXAMPLE WEIGHTED MEAN (GRAND MEAN):

VILLAGE    NO. OF CHILDREN    MEAN AGE (MONTHS)
------------------------------------------------------------------------
1          54                 58.6
2          52                 59.5
3          49                 61.2
4          48                 62.5
5          48                 64.5
------------------------------------------------------------------------

WEIGHTED MEAN =
[(54 × 58.6) + (52 × 59.5) + … + (48 × 64.5)] / 251
= 15353.2 / 251 = 61.2 MONTHS
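
As a minimal sketch, the grand mean can be reproduced by weighting each village mean by its number of children. The list names are illustrative; the figures are those in the table.

```python
# Number of children and mean age (months) per village, from the table
children = [54, 52, 49, 48, 48]
mean_age = [58.6, 59.5, 61.2, 62.5, 64.5]

total_children = sum(children)

# Each village mean is weighted by the number of children it represents
weighted_mean = sum(n * m for n, m in zip(children, mean_age)) / total_children

print(total_children, round(weighted_mean, 1))  # 251 61.2
```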
GEOMETRIC MEAN
EXAMPLE : 5, 100, 425, 700, 1000

$X_G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdot x_4 \cdot x_5} = \sqrt[5]{5 \times 100 \times 425 \times \dots \times 1000}$

or

$\log(X_G) = \sum \log x_i / n = 11.1724 / 5 = 2.234$

$X_G = 171.6$
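
The logarithm route can be sketched as follows, assuming base-10 logarithms (which match the 11.1724 total in the example). Variable names are illustrative.

```python
import math

values = [5, 100, 425, 700, 1000]

# Average of the base-10 logarithms of the values
mean_log = sum(math.log10(v) for v in values) / len(values)

# The antilog of that average is the geometric mean
geometric_mean = 10 ** mean_log

print(round(mean_log, 3), round(geometric_mean, 1))
```

The arithmetic mean of the same five values is 446, so the geometric mean is pulled far less by the few large observations.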
EXTENSION OF THE MEDIAN :
1. QUARTILES : DIVIDE THE ARRAY INTO 4 GROUPS

2. DECILES : DIVIDE THE ARRAY INTO 10 GROUPS

3. PERCENTILES : DIVIDE THE ARRAY INTO 100 GROUPS

THE UPPER LIMIT OF EACH QUARTILE (Q1, Q2 and Q3) :

[i ( n + 1 ) / 4] th observation in the array; i = 1, 2, 3

THE UPPER LIMIT OF EACH DECILE (D1, D2, …, D9) :

[i ( n + 1 ) / 10] th observation in the array; i = 1, 2, …, 9

THE UPPER LIMIT OF EACH PERCENTILE (P1, P2, …, P99) :

[i ( n + 1 ) / 100] th observation in the array; i = 1, 2, …, 99
EXAMPLE :

52, 56, 57, 60, 64, 66, 70, 75, 82, 86, 92, 99

THE UPPER LIMIT OF Q1 = 1(12 + 1)/4
= 3 ¼ th OBSERVATION IN THE ARRAY.
∴ Q1 = 57 + ¼ ( 60 - 57 ) = 57 ¾
Q3 = ??
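
The i(n + 1)/k rule with linear interpolation can be sketched as a small helper. The function name `quantile` and the clamping of positions that fall outside the array are assumptions made for this illustration, not part of the original material.

```python
def quantile(sorted_values, i, groups):
    """Value at the i(n + 1)/groups-th position of an ordered array,
    interpolating linearly between neighbouring observations."""
    n = len(sorted_values)
    position = i * (n + 1) / groups        # 1-based position, may be fractional
    position = min(max(position, 1), n)    # assumption: clamp to the array ends
    whole = int(position)                  # observation just below the position
    fraction = position - whole            # how far to move towards the next one
    lower = sorted_values[whole - 1]
    upper = sorted_values[min(whole, n - 1)]
    return lower + fraction * (upper - lower)


array = [52, 56, 57, 60, 64, 66, 70, 75, 82, 86, 92, 99]

q1 = quantile(array, 1, 4)   # 3.25th observation
print(q1)                    # 57.75
```

The same helper handles deciles and percentiles by passing `groups=10` or `groups=100`.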
GROUPED DATA :

EXAMPLE :

SERUM ALBUMIN VALUES IN G/LITRE OF BLOOD OF 50
WOMEN SEEN IN A SURVEY ARE AS FOLLOWS :

42 41 42 44 44 36 38 41 42 44
42 39 49 40 45 32 34 43 37 39
41 39 48 42 43 33 43 35 32 34
39 35 43 44 47 40 39 42 41 46
37 49 41 39 43 42 47 48 51 52
USING SIX EQUAL INTERVALS :

Serum albumin (g/litre)    Number of observations
-------------------------------------------------------------
30 – 33                     3
34 – 37                     7
38 – 41                    14
42 – 45                    17
46 – 49                     7
50 – 53                     2
-------------------------------------------------------------
TOTAL                      50
-------------------------------------------------------------
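
The frequency table can be rebuilt from the raw values with a short sketch. The last step, estimating the mean from class midpoints, is a common approach for grouped data and is an assumption here: it is not worked out in the original material.

```python
# The 50 serum albumin values (g/litre) from the survey
values = [
    42, 41, 42, 44, 44, 36, 38, 41, 42, 44,
    42, 39, 49, 40, 45, 32, 34, 43, 37, 39,
    41, 39, 48, 42, 43, 33, 43, 35, 32, 34,
    39, 35, 43, 44, 47, 40, 39, 42, 41, 46,
    37, 49, 41, 39, 43, 42, 47, 48, 51, 52,
]

# Six equal intervals, both limits inclusive
intervals = [(30, 33), (34, 37), (38, 41), (42, 45), (46, 49), (50, 53)]

# Count the observations falling in each interval
counts = [sum(low <= v <= high for v in values) for low, high in intervals]

# Assumption: each observation is represented by its class midpoint
midpoints = [(low + high) / 2 for low, high in intervals]
grouped_mean = sum(f * m for f, m in zip(counts, midpoints)) / sum(counts)

for (low, high), f in zip(intervals, counts):
    print(f"{low} - {high}: {f}")
print("grouped mean:", round(grouped_mean, 2))
```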
MEASURES OF VARIABILITY

1. RANGE
2. INTERQUARTILE RANGE
3. VARIANCE
4. STANDARD DEVIATION
5. COEFFICIENT OF VARIATION
COEFFICIENT OF VARIATION (CV) :

CV = (STANDARD DEVIATION / MEAN) × 100%

NOTE :

THE COEFFICIENT OF VARIATION IS INDEPENDENT OF THE
UNITS USED. FOR THIS REASON IT IS USEFUL IN
COMPARING DISTRIBUTIONS WHERE UNITS MAY BE
DIFFERENT.
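
A minimal sketch of these measures follows, showing that the CV does not change when the units change. The weights are hypothetical values invented for the illustration, and `statistics.stdev` is the sample standard deviation (divisor n - 1), which is an assumption about the intended formula.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical body weights, in kilograms and then in grams
weights_kg = [52.0, 60.5, 71.2, 48.3, 66.0]
weights_g = [w * 1000 for w in weights_kg]

value_range = max(weights_kg) - min(weights_kg)   # range
variance = statistics.variance(weights_kg)        # sample variance
sd = statistics.stdev(weights_kg)                 # sample standard deviation

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

# The SD is 1000 times larger in grams, but the CV is unchanged
print(round(cv_kg, 2), round(cv_g, 2))
```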
CONFIDENCE INTERVAL
Definitions :
1. Estimator :
a formula or process for using sample data to
estimate a population parameter
2. Estimate :
a specific value or range of values used to
approximate some population parameter
3. Point Estimate :
a single value (or point) used to approximate a
population parameter
The sample mean x is the best point estimate of
the population mean µ
4. Confidence Interval (or Interval Estimate) :
a range (or an interval) of values used to
estimate the true value of the population
parameter.

Lower # < population parameter < Upper #

As an example :
Lower # < µ< Upper #
5. Degree of confidence (level of confidence or
confidence coefficient) :
The probability 1 - α (often expressed as the
equivalent percentage value) that is the relative
frequency of times the confidence interval actually
does contain the population parameter, assuming that
the estimation process is repeated a large number of
times.

Usually 95 % (α = 5 %) or 99 % (α = 1 %).


Interpreting a Confidence Interval for µ (for example :
systolic blood pressure of medical students) :
100 < µ < 140

We are 95 % confident that the interval from 100 to 140
actually does contain the true value of µ. This means
that if we were to select many different samples of
size 100 and construct a confidence interval from each,
95 % of them would actually contain the value of the
population mean µ.

PARAMETER = STATISTIC ± ITS ERROR
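
The repeated-sampling interpretation can be illustrated with a small simulation. The population mean and standard deviation below are hypothetical, and the interval is built as sample mean ± z × standard error using the normal approximation; this is a sketch under those assumptions, not the only way to construct the interval.

```python
import random
import statistics

random.seed(1)

POP_MEAN, POP_SD = 120, 15   # hypothetical population of systolic pressures
N, REPEATS = 100, 1000       # sample size and number of repeated samples

# z value leaving 2.5 % in each tail (about 1.96 for 95 % confidence)
z = statistics.NormalDist().inv_cdf(0.975)

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    error = z * statistics.stdev(sample) / N ** 0.5   # z x standard error
    # Does this interval contain the true population mean?
    if mean - error < POP_MEAN < mean + error:
        covered += 1

coverage = covered / REPEATS
print(coverage)   # expected to be close to 0.95
```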


HYPOTHESIS

What is Hypothesis ?

Definition :

A hypothesis, in statistics, is a claim or statement
about a property of a population.

Components of a formal hypothesis test :
- Null Hypothesis : H0
  - Statement about the value of a population parameter
  - Must contain a condition of equality : =, ≤, or ≥
  - The null hypothesis is tested directly
  - Reject H0 or fail to reject H0
- Alternative Hypothesis : H1
  - Must be true if H0 is false
  - Contains ≠, <, or >
  - Opposite of the null

Note : If you are conducting a study and want to
use a hypothesis test to support your claim, the
claim must be worded so that it becomes the
alternative hypothesis.
LEVEL OF SIGNIFICANCE, α

The probability that the test statistic will fall in
the critical region when the null hypothesis is
actually true. Common choices are 0.05, 0.01
and 0.10.
TYPES OF ERROR : TYPE I AND TYPE II

Type I error : - the mistake of rejecting the null
hypothesis when it is true.
- α (alpha) is used to represent the
probability of a type I error.

Example : Rejecting a claim that the mean systolic
blood pressure is 110 mmHg when the mean really
does equal 110 mmHg.

Type II error : - the mistake of failing to reject the null
hypothesis when it is false.
- β (beta) is used to represent the
probability of a type II error.

Example : Failing to reject the claim that the mean
systolic blood pressure is 110 mmHg when the mean
really is different from 110 mmHg.
Table : Type I and Type II Errors

                      True State of Nature
Decision              The null hypothesis is true      The null hypothesis is false
-----------------------------------------------------------------------------------
Reject H0             Type I error                     Correct decision
                      (rejecting a true null
                      hypothesis)
Fail to reject H0     Correct decision                 Type II error
                                                       (failing to reject a false
                                                       null hypothesis)

p value

- Probability of Obtaining a Test Statistic as Extreme or
More Extreme (≤ or ≥) than the Actual Sample Value,
Given H0 Is True.
- Called the Observed Level of Significance
- Used to Make the Rejection Decision
* If p value ≥ α, Do Not Reject H0
* If p value < α, Reject H0
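
The decision rule can be sketched for a test statistic that follows the standard normal distribution. The two-sided form of the p value and the statistic z = 2.0 are assumptions chosen for the illustration; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z,
    in either direction, given that H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 only when the p value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p = two_sided_p_value(2.0)   # hypothetical test statistic
print(round(p, 4), decide(p))
```

With z = 2.0 the p value is about 0.046, which is below α = 0.05, so H0 is rejected; at α = 0.01 the same p value would not lead to rejection.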
Hypothesis Testing: Steps

Test the assumption that the true mean SBP of
participants is 120 mmHg.

* State H0            H0 : µ = 120
* State H1            H1 : µ ≠ 120
* Choose α            α = 0.05
* Choose n            n = 100
* Choose Test : Z, t, χ² Test (or p Value)
* Compute Test Statistic (or compute p value)
* Search for Critical Value
* Make Statistical Decision
* Express Decision
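
The steps can be followed end to end in a short sketch using a Z test and the critical-value approach. H0, α and n come from the list above; the sample mean and standard deviation are hypothetical numbers invented for the illustration.

```python
from statistics import NormalDist

# Steps 1-4: hypotheses, significance level and sample size
mu0, alpha, n = 120, 0.05, 100          # H0: mu = 120, H1: mu != 120

# Hypothetical sample results (not from the original material)
sample_mean, sample_sd = 123.0, 15.0

# Steps 5-6: Z test statistic = (sample mean - null value) / standard error
z = (sample_mean - mu0) / (sample_sd / n ** 0.5)

# Step 7: two-sided critical value for the chosen alpha
critical = NormalDist().inv_cdf(1 - alpha / 2)

# Steps 8-9: decision rule and decision
decision = "reject H0" if abs(z) > critical else "do not reject H0"

print(round(z, 2), round(critical, 2), decision)   # 2.0 1.96 reject H0
```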
Example : One sample-mean Test

- Assumptions
  - Population is normally distributed
- State the null and alternative hypotheses
  H0: µ = µ0
  H1: µ ≠ µ0
- t test statistic :

$t = \dfrac{\text{sample mean} - \text{null value}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
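
A minimal sketch of the t statistic computed from raw data follows. The blood pressure readings are hypothetical and the function name is illustrative. The result would then be compared with a critical value from a t table with n - 1 degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (sample mean - null value) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


# Hypothetical systolic blood pressure readings (mmHg)
sbp = [118, 125, 130, 112, 121, 127, 119, 124]

t = one_sample_t(sbp, 120)   # H0: mu = 120
print(round(t, 3), "with", len(sbp) - 1, "degrees of freedom")
```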
