
Descriptive Statistics

Descriptive statistics are techniques used to organize, summarize, categorize, classify, manipulate, and present a set of data concisely so that it is suitable for analysis.
Raw data are measurements or variables that have not been organized, summarized, or otherwise manipulated.
Objectives of data organization, summarization, and manipulation:
- To see the similarity and dissimilarity of objects.
- To see the important features of the collected data.
- To prepare data for summarization and analysis.
Victory College, Faculty of Health Science, Department of Public Health Officer, Biostatistics Lecture Note
Prepared by Minlikalew D. (B.Sc.), 8/12/2010
Descriptive statistics include:
Frequency distribution.
Tables.
Graphs.
Numerical summary measures;
- Measures of central tendency.
- Measures of variability.

Before summarizing, organizing, categorizing/classifying, presenting, and analyzing data, we need to know:
- The concept of data.
- The concept of variable.
- The concept of measurement and measurement scale.
Data
- Are facts or information that support reasoning.
- Are a collection of observations on one or more variables.
- Are the raw material of statistics.
- Are information collected from a source.
- Can be classified into different groups according to different criteria.

A. Based on the nature of the variable on which the data are collected:
I. Qualitative/Categorical/Non-number data: data collected on a qualitative variable and recorded according to whether a subject possesses a certain attribute or characteristic.
Example:
-Breast feeding status (exclusive, partial, and none).
-Whether the mother was employed (yes, no).
-Marital status (single, married, divorced, widowed).

Nominal data are categorical data where the order of the categories is arbitrary. A good example is race/ethnicity, coded as 1=White, 2=Hispanic, 3=American Indian, 4=Black, 5=Other; the order of these codes is arbitrary. Certain statistical concepts are meaningless for nominal data. For example, it would be silly to ask what the mean and standard deviation of race/ethnicity are.

Ordinal data are categorical data where there is a logical ordering to the categories. A good example is the Likert scale that you see on many surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree. While computation of a median is easily justified for ordinal data, some statisticians have reservations about computing a mean for ordinal data.
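The point about medians on ordinal data can be sketched in Python. The responses below are hypothetical, not data from these notes; `statistics.median_low` is used so that the result is always one of the actual Likert categories.

```python
import statistics
from collections import Counter

# Likert codes as defined in the text above.
LIKERT_LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Hypothetical survey responses (assumption, for illustration only).
responses = [4, 2, 5, 3, 4, 4, 1, 3, 5, 4, 2]

# The median only relies on the ordering of the categories.
median_code = statistics.median_low(responses)

# The mode only relies on counting, so it is valid for nominal data too.
mode_code = Counter(responses).most_common(1)[0][0]

print("Median response:", LIKERT_LABELS[median_code])
print("Most frequent response:", LIKERT_LABELS[mode_code])
```

A mean of these codes could be computed as well, but it would treat the gaps between categories as equal, which is the reservation mentioned above.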
II. Quantitative/number data: the data collected on
quantitative variables and obtained by count or measurement.

Quantitative/number data consist of both continuous and discrete data types.
a. Continuous data: consist of both interval and ratio data.
Interval data are continuous data where differences are interpretable, but where there is no "natural" zero. A good example is temperature in degrees Fahrenheit. Ratios are meaningless for interval data. You cannot say, for example, that one day is twice as hot as another day.
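A minimal Python sketch of why ratios are not meaningful on an interval scale. The two temperatures are illustrative values, not data from these notes. The same two days give different ratios depending on whether they are expressed in Fahrenheit or Celsius, so "twice as hot" has no fixed meaning.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

# Hypothetical temperatures of two days, in degrees Fahrenheit.
day_a_f, day_b_f = 40.0, 80.0

day_a_c = fahrenheit_to_celsius(day_a_f)
day_b_c = fahrenheit_to_celsius(day_b_f)

ratio_f = day_b_f / day_a_f  # ratio on the Fahrenheit scale
ratio_c = day_b_c / day_a_c  # ratio of the same two days on the Celsius scale

# Differences stay comparable across scales, ratios do not.
print(f"Ratio in Fahrenheit: {ratio_f:.2f}")
print(f"Ratio in Celsius:    {ratio_c:.2f}")
```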

Ratio data are continuous data where both differences and ratios are interpretable. Ratio data have a natural zero. A good example is birth weight in kg.
The distinction between interval and ratio data is subtle, but fortunately it is often not important. Certain specialized statistics, such as the geometric mean and the coefficient of variation, can only be applied to ratio data.
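As a sketch of the two ratio-only statistics named above, the following Python computes a geometric mean and a coefficient of variation. The birth weights are hypothetical values chosen for illustration. Because the scale has a true zero, changing the unit from kg to g leaves the coefficient of variation unchanged.

```python
import statistics

# Hypothetical birth weights in kg (assumption, not data from these notes).
birth_weights_kg = [2.8, 3.1, 3.4, 3.0, 3.6, 2.9]

mean_kg = statistics.mean(birth_weights_kg)
sd_kg = statistics.stdev(birth_weights_kg)  # sample standard deviation
geo_mean_kg = statistics.geometric_mean(birth_weights_kg)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = 100 * sd_kg / mean_kg

# The same data in grams: only the unit changes, not the zero point.
birth_weights_g = [w * 1000 for w in birth_weights_kg]
cv_percent_g = 100 * statistics.stdev(birth_weights_g) / statistics.mean(birth_weights_g)

print(f"Arithmetic mean: {mean_kg:.3f} kg")
print(f"Geometric mean:  {geo_mean_kg:.3f} kg")
print(f"CV in kg: {cv_percent:.2f}%, CV in g: {cv_percent_g:.2f}%")
```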
b. Discrete data: quantitative data collected on a discrete variable.

B. Based on the source from which the data are collected:
I. Primary data: data collected by the investigator himself or herself. Such data are original in character and are mostly generated by censuses/sample surveys conducted by individuals or research institutions.
II. Secondary data: data collected from secondary sources, for example journals, reports, government publications, and publications of professionals and research organizations.

Sources of data
There are different sources of data on health and health-related conditions. These are:
- Health surveys
- Vital statistics
- Health service records
- Census

Systems for collecting data
1. Regular system: Registration of events as they become available.
2. Ad hoc system: A form of survey to collect information that is not available on a regular basis.
Data collection techniques/methods
There are different methods of data collection. To select the appropriate method, we need to consider the following points.

Selection of a data collection method is based on:
- The nature of the investigation, i.e. whether the study is qualitative or quantitative.
- The resources available and the relevance of the information.
- The acceptability and accuracy of the method.
- The area the research is intended to focus on and cover.
- Familiarity with the procedure.
- The characteristics of the study population, which are also among the influencing factors.

Based on the above selection points, the methods are:
For qualitative data:
1. Focus group discussion.
2. In-depth interview (unstructured/semi-structured).
3. Observation (participant/non-participant).
4. Case studies.
5. Rapid appraisal techniques.
6. Nominal group techniques.
7. Delphi techniques and life histories.

For quantitative data:
1. Face-to-face interview.
2. Self-administered interview.
3. Postal or mail method and telephone interview.
4. Measuring height, length, weight, BMI, MUAC, chest circumference, head circumference, blood pressure, Hgb, Hct.
5. Using available information (record review), e.g. mortality report, morbidity report.

Decision-makers need information that is:
– Relevant,
– Timely,
– Accurate and
– Usable.

The following table compares different data collection techniques in terms of their advantages and disadvantages.

Summary of each data collection technique

Technique: Using available information
Advantages:
• Is inexpensive, because data is already there.
• Permits examination of trends over the past.
Disadvantages:
• Data is not always easily accessible.
• Ethical issues concerning confidentiality may arise.
• Information may be imprecise or incomplete.
• Data collection may not be standardized.

Technique: Observing
Advantages:
• Gives more detailed and context related information.
• Permits collection of information on facts not mentioned in the questionnaire.
Disadvantages:
• Ethical issues concerning confidentiality or privacy may arise.
• Observer bias may occur (observer may only notice what interests him or her).
• The presence of the data collector can influence the situation observed.
• Thorough training of research assistants is required.

Technique: Interviewing
Advantages:
• Is suitable for use with illiterates.
• Permits clarification of questions.
• Has a higher response rate than written questionnaires.
Disadvantages:
• The presence of the interviewer can influence responses.
• Reports of events may be less complete than information gained through observations.

Technique: Small scale flexible interview
Advantages:
• Permits collection of in-depth information and exploration of spontaneous remarks by respondents.
Disadvantages:
• The interviewer may inadvertently influence the respondents.
• Open ended data is difficult to analyze.

Technique: Large scale fixed interview
Advantages:
• Is easy to analyze.
Disadvantages:
• Important information may be missed because spontaneous remarks by respondents are usually not recorded or explored.

Technique: Administering written questionnaires
Advantages:
• Less expensive.
• Permits anonymity and may result in more honest responses.
• Does not require research assistants.
• Eliminates bias due to phrasing questions differently with different respondents.
Disadvantages:
• Cannot be used with illiterate respondents.
• There is often a low rate of response.
• Questions may be misunderstood.

Variable
A variable is a characteristic which takes different values in different persons, places, or things. It is any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and can take any of a set of values. There may be one variable in a study or many.
E.g., a study of treatment outcome of TB.
Variables can be broadly classified into:
A. Categorical (or Qualitative).
B. Quantitative (or numerical variables).
A. Categorical (or Qualitative)
- Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables.
- Such a variable can’t assume a numerical value but can be classified into non-numerical categories according to a set of rules.
- The notion of magnitude is absent or implicit.

Variables that have only two categories are called binary or dichotomous, e.g. sex. Variables with more than two categories are called polytomous, e.g. occupational status.

Categorical variables can be:
1. Nominal: Variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2...), gender, etc.

2. Ordinal: Variables with an ordered series, e.g. "greatly dislike,
moderately dislike, indifferent, moderately like, greatly like". Numbers
assigned to such variables indicate rank/order only. The "distance"
between the numbers has no meaning.
B. Quantitative (or numerical variables)
- A variable that can assume numerical values and is measured numerically. Quantitative data measure either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count.

A quantitative variable has the notion of magnitude. It can be:
1. Discrete
It can only have a limited number of discrete values (usually whole numbers).
It is characterized by gaps or interruptions in the values.
The values aren’t just labels, but are actual measurable quantities.

Example:
- The number of episodes of diarrhoea a child has had in a year. You can’t have 12.5 episodes of diarrhoea.
- The number of accidents.
- The number of students in this class.
- The number of cars, etc.

2. Continuous
It can have an infinite number of possible values in any given interval.
It does not possess gaps or interruptions.
Example:
- Weight.
- Income.
- Age.
- Time, etc.

3. Interval
Equally spaced variables, e.g. temperature. The difference between a temperature of 66 degrees and 67 degrees is taken to be the same as the difference between 76 degrees and 77 degrees.
They do not have a true zero, e.g. 88 degrees is not necessarily double the temperature of 44 degrees.
4. Ratio variables
Variables spaced at equal intervals with a true zero point, e.g. age.

5. Independent variable
It is a hypothesized cause or influence on a dependent
variable. This might be a variable that you control, like a
treatment, or a variable not under your control, like an
exposure.
6. Dependent variable
The variable that you believe might be influenced or
modified by some treatment or exposure or the variable
you are trying to predict. Sometimes the dependent
variable is called the outcome variable.

The definition of dependent and independent variable
depends on the context of the study. For example
the variable that is dependent in one study may be
independent in the other study.

Measurement and Measurement Scale
Measurement: the assignment of numbers or names to objects or events according to a set of rules. Not all measurements are the same.
Measurement scale: the way in which variables/numbers are defined and categorized. It refers to the degree of precision with which a characteristic is measured.
Depending on the nature of the variable and the set of rules used to measure it, there are four scales of measurement.

Each scale of measurement has certain properties, which in turn determine which statistical analyses are appropriate.
1. Nominal scale
The simplest and lowest (weakest) level of measurement, in which the values fall into unordered categories or classes.
Uses names, labels, or symbols to classify each measurement; any numbers used have NO numerical meaning.
Always measures qualitative data.

Characteristics to be fulfilled:
- The categories should be mutually exclusive.
- The categories should be exhaustive.
- The names or symbols can be interchanged without altering essential information.
Example: Blood type, sex, race, marital status, eye color, type of car, university attended, occupation, residence, etc.

2. Ordinal scale
Assigns each measurement to one of a limited number of categories that are ranked in order.
The differences among categories are not necessarily equal and often not even measurable.
Although non-numerical, the categories can be considered to have a natural ordering.
It is the next higher level of measurement after nominal.
It is usually used for qualitative data.
It is subjective in nature.
Many health care variables are ordinal in nature.
Example: Patient status, cancer stages, social class, pain level, dehydration status, Glasgow coma scale, etc.
3. Interval scale
Measured on a continuum; differences between any two numbers on the scale are of known size.
It assigns each measurement to one of an unlimited number of categories.

It has no true zero point. “0” is arbitrarily chosen and doesn’t reflect the absence of the attribute (e.g. a temperature of 0 does not mean the absence of temperature).
The distance between adjacent values is equal and fixed, but a value is not proportional to the amount of the attribute.
It is used for truly quantitative data.
Examples: Body temperature in °F or °C, directions in degrees, time of the day, IQ.

4. Ratio scale
Measurement begins at a true zero point and the scale has equal spacing.
It is the highest level of measurement.
Used for purely quantitative data.
Examples: Height, weight, BP, etc.


Degree of precision in measuring, from lowest to highest: Nominal, Ordinal, Interval, Ratio.
Summary of each measurement scale

Nominal:
- People or objects with the same scale value are the same on some attribute.
- The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Ordinal:
- People or objects with a higher scale value have more of some attribute.
- The intervals between adjacent scale values are indeterminate.
- Scale assignment is by the property of "greater than," "equal to," or "less than."

Interval:
- Intervals between adjacent scale values are equal with respect to the attribute being measured.
- E.g., the difference between 8 and 9 is the same as the difference between 76 and 77.

Ratio:
- There is a rational zero point for the scale.
- Ratios are equivalent, e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.

Methods of Data Organization and Presentation
In most cases, useful information is not immediately evident from a mass of unsorted data.
Data organization: condensing the data in a way that shows patterns of variation clearly.
Precise methods of analysis can be decided upon only when the characteristics of the data are understood. For this purpose, different techniques of data organization are used.

Objective of data organization
To see the similarity and dissimilarity of objects.
To see the important features of the collected data.
To prepare data for summarization and analysis.
The methods of organizing and presenting (describing) data differ depending on whether the data/variable being organized and presented is numerical or categorical.

1.Describing categorical variables: It includes;
A. Table of frequency distributions
– Frequency
– Relative frequency
– Cumulative frequencies
B. Charts
– Bar charts
– Pie charts

Frequency Distributions
• Frequency: the number of times each observation (for individual data) or each class interval (for grouped data) occurs.
• Frequency distribution: an arrangement of data in a table that shows the possible values of the data with the corresponding frequency or class frequency. A simple and effective way of summarizing categorical data is to construct a frequency distribution table.

Advantages:
Allows data to be more easily appreciated.
Allows quick comparisons to be drawn.
Allows the data to be arranged in the form of a table, or in one of a number of different graphical forms.
Types of frequency distribution
I. Simple Frequency Distribution: a table showing each observed value against its frequency.

Hospital stay (days) of 50 patients in a medical ward (hypothetical data)

Hospital stay in days (xi)    Frequency (fi), the number of patients
0                             5
1                             10
2                             2
4                             23
5                             5
7                             5

In this table the number of days of hospital stay represents the variable under consideration, the number of persons represents the frequency, and the whole distribution is called a simple frequency distribution.
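A simple frequency distribution like the hospital-stay table can be produced by counting raw observations. The Python sketch below is illustrative only: it rebuilds a raw list from the table's own frequencies (the notes do not give the raw list) and then recovers the table with `collections.Counter`.

```python
from collections import Counter

# Frequencies from the hospital-stay table: days of stay -> number of patients.
table = {0: 5, 1: 10, 2: 2, 4: 23, 5: 5, 7: 5}

# Assumption: rebuild one raw observation per patient from the table,
# because the original raw list is not given.
raw_observations = [days for days, freq in table.items() for _ in range(freq)]

# Counting the raw observations gives the simple frequency distribution.
freq_dist = Counter(raw_observations)
n = len(raw_observations)

print("Days (xi)  Frequency (fi)")
for days in sorted(freq_dist):
    print(f"{days:<10} {freq_dist[days]}")
print("Total", n)
```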

i. Array (Ordered Array)
- It is a serial arrangement of numerical data in ascending or descending order.
- It is the first step in organizing data.
- It is appropriate when the number of observations is greater than 6 and less than 20.
- It makes it possible to see quickly the smallest and the largest measurements and the range of the observations.
- It is the simplest method.

Example: Raw data: 5, 6, 4, 9, 11, 0, 3, 8.
Put in an ordered array, these data become: 0, 3, 4, 5, 6, 8, 9, 11.
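The same ordered array can be obtained in Python with the built-in `sorted`. The sketch also reads off the smallest value, the largest value, and the range, which is what an ordered array is meant to make easy.

```python
# Raw data from the example.
raw_data = [5, 6, 4, 9, 11, 0, 3, 8]

# Ordered array in ascending order; the raw list itself is left unchanged.
ordered_array = sorted(raw_data)

smallest = ordered_array[0]
largest = ordered_array[-1]
data_range = largest - smallest

print("Ordered array:", ordered_array)
print("Smallest:", smallest, "Largest:", largest, "Range:", data_range)
```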

ii. Categorical distribution
Non-numerical information (e.g. qualitative variables) can also be represented in a frequency distribution.
Example: HIV positive mothers attending an ANC unit, by their future plan for infant feeding.

Mother's plan               Number of mothers
Exclusive breast feeding    100
Replacement feeding         50
Mixed feeding               30
Nursery                     50
Total                       230
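As a sketch, the relative frequency of each category in the infant feeding table can be computed by dividing each count by the total. The percentages are derived here for illustration and are not stated in the notes.

```python
# Counts from the infant feeding plan table.
feeding_plan = {
    "Exclusive breast feeding": 100,
    "Replacement feeding": 50,
    "Mixed feeding": 30,
    "Nursery": 50,
}

total = sum(feeding_plan.values())

# Relative frequency = category count / total count.
relative_frequency = {plan: count / total for plan, count in feeding_plan.items()}

for plan, count in feeding_plan.items():
    print(f"{plan:<26} {count:>4} {100 * relative_frequency[plan]:6.1f}%")
print(f"{'Total':<26} {total:>4}")
```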

II. Grouped Frequency Distribution
- It is a way of representing large sets of data in class intervals.
- Steps in the construction of a grouped frequency distribution:
1. Choosing the classes (first put the data in an ordered array).
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.

1. Choosing the classes.
- When data consisting of a large number of observations are divided into groups that have defined upper and lower limits, each group is called a class.
- The size of the class is called the class interval.
- Choosing a suitable classification involves:
a. Determining the appropriate number of classes/class intervals.

The number of classes/class intervals is determined by:
I. Non-statistical (convenience) method: choose not fewer than 6 and not more than 20 classes. The average is 15. Fewer than 6 classes summarizes the data too much and causes loss of information; more than 20 classes does not meet the objective of data organization. The exact number we use in a given situation depends mainly on the number of measurements or observations we have to group.

II. Statistical method: choose the number of classes using Sturges's formula:
$$K = 1 + 3.322 \log_{10} n$$
Where K = number of class intervals.
n = number of observations.
Example: The sample size is 275. How many class intervals are needed?
K = 1 + 3.322 (log 275)
K = 1 + 3.322 (2.439) = 9.10 ≈ 9
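The formula can be checked with a few lines of Python. The function name `sturges_classes` is a hypothetical helper introduced here for illustration; `math.log10` is used because the logarithm in the formula is base 10.

```python
import math

def sturges_classes(n):
    """Return the (unrounded) number of classes suggested by Sturges's formula."""
    return 1 + 3.322 * math.log10(n)

# Worked example from the text: sample size n = 275.
k_exact = sturges_classes(275)
k = round(k_exact)

print(f"K = {k_exact:.2f}, rounded to {k} classes")
```

The rounded value is only a guide; the number of classes can still be adjusted for a clearer presentation.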

Note:
Sturges's rule should not be regarded as final, but should be considered as a guide only. The number of classes specified by the rule should be increased or decreased for convenient or clear presentation.
Classes should be mutually exclusive, i.e. they should not overlap.
We must make sure that the smallest and largest values fall within the classification and that none of the values can fall into possible gaps between classes.

b. Determine class width.
Class width, denoted by “W”, is equal for each class:
$$W = \frac{R}{K} = \frac{X_{\max} - X_{\min}}{K}$$
Where W = width of the class.
R = range.
$X_{\max}$ = the largest value in the observations.
$X_{\min}$ = the lowest value in the observations.
K = the number of classes.
Example:
– Leisure time (hours) per week for 40 college students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13
10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18
12 27 15 21 25 16
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Maximum value = 38, Minimum value = 10
Width = (38 − 10)/6 = 4.67 ≈ 5
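The leisure-time example can be carried through to a full grouped frequency distribution. The Python sketch below makes two assumptions that the notes do not state: the class width is rounded up, and the first class starts at the smallest observation.

```python
import math

# Leisure time (hours) per week for 40 college students, from the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
    10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
    12, 27, 15, 21, 25, 16,
]

n = len(leisure_hours)
k = round(1 + 3.322 * math.log10(n))       # number of classes (Sturges's formula)
data_range = max(leisure_hours) - min(leisure_hours)
width = math.ceil(data_range / k)          # assumption: round the width up

# Assumption: the first class starts at the smallest observation.
start = min(leisure_hours)
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]

# Count the observations that fall within each pair of class limits.
frequencies = [sum(lo <= x <= hi for x in leisure_hours) for lo, hi in classes]

print("Class    Frequency")
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}    {f}")
```

With these choices the classes are 10-14, 15-19, and so on up to 35-39, and every observation falls in exactly one class.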

c. Determining true limit/class boundary.
Class limits: the smallest and largest values that can go into any class are regarded as its limits; they are the lower and upper class limits.
True limits/class boundaries are limits determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. The true limits are what the tabulated limits would correspond to if one could measure exactly.
True limits/class boundaries are used to smooth the class intervals.
They are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit. This is a simple convention.
A true limit can be lower or upper.
d. Determining class mark.
- Class mark is denoted by “Xc”.
- It is the midpoint of each class. The formula is:
$$X_c = \frac{UTL + LTL}{2}$$
Where Xc=class mark.
UTL=Upper True Limit.
LTL=Lower True Limit.
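A short Python sketch of the true limit and class mark calculations. The class limits are hypothetical whole-number limits chosen for illustration, and `true_limits` and `class_mark` are helper names introduced here, not names from the notes.

```python
def true_limits(lower_limit, upper_limit, half_unit=0.5):
    """Return (LTL, UTL): subtract 0.5 from the lower limit, add 0.5 to the upper."""
    return lower_limit - half_unit, upper_limit + half_unit

def class_mark(ltl, utl):
    """Return the class mark Xc = (UTL + LTL) / 2."""
    return (utl + ltl) / 2

# Hypothetical class limits recorded in whole units (assumption).
class_limits = [(10, 14), (15, 19), (20, 24)]

rows = []
for lower, upper in class_limits:
    ltl, utl = true_limits(lower, upper)
    rows.append((lower, upper, ltl, utl, class_mark(ltl, utl)))

for lower, upper, ltl, utl, xc in rows:
    print(f"{lower}-{upper}: boundaries {ltl}-{utl}, class mark {xc}")
```

The upper true limit of one class equals the lower true limit of the next, so no gap is left between classes.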
2. Sorting (or tallying) of the data into these classes.
- Tally marks are small vertical bars used in a frequency table to represent the number of times a particular event has appeared in the collected data. If, against a particular class, a value has occurred four times, we put four tally marks (////), but for the fifth occurrence we put a cross tally mark through them to give a block of five. When it occurs for the sixth time we put another tally mark after leaving a space. If we use only continuous tally bars like (//////), there may be confusion in counting and it may lead to mistakes.
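The block-of-five convention can be sketched as a small Python function. The text representation is an assumption made for plain-text output: a vertical bar stands for one tally mark, and `||||/` stands for a crossed block of five.

```python
def tally(count):
    """Render a frequency as tally marks grouped in blocks of five."""
    blocks, remainder = divmod(count, 5)
    # Each completed block of five is shown as four bars plus a cross stroke.
    parts = ["||||/"] * blocks
    if remainder:
        parts.append("|" * remainder)
    # Blocks are separated by a space, as described in the text.
    return " ".join(parts)

for frequency in (4, 5, 6, 12):
    print(frequency, tally(frequency))
```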
3.Counting the number of items in each class.
Relative frequency is the frequency of each class
interval (fi) divided by the total frequency (n). For
grouped data, n = ∑ fi

Cumulative frequencies are obtained when the frequencies of two or more classes are added up. They help to find the total number of items whose values are less than or greater than some value. A cumulative frequency distribution can be:
- Less than cumulative frequency distribution: the cumulation starts from the lowest size of the variable and proceeds to the highest. This is the most common one.
- More than cumulative frequency distribution: the cumulation runs from the highest to the lowest value.

Cumulative relative frequency: It is computed by adding successive relative frequencies. It is also possible to calculate cumulative relative frequency (frc) by dividing the cumulative frequency (fc) by the total frequency (n) (i.e. frc = fc/n for each class).
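Relative, cumulative, and cumulative relative frequencies can be sketched together in Python with `itertools.accumulate`. The class frequencies below are hypothetical values for six classes listed from lowest to highest.

```python
from itertools import accumulate

# Hypothetical class frequencies, ordered from the lowest class to the highest.
frequencies = [5, 11, 12, 7, 3, 2]
n = sum(frequencies)

# Relative frequency: fi / n for each class.
relative = [f / n for f in frequencies]

# "Less than" cumulative frequency: accumulate from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# "More than" cumulative frequency: accumulate from the highest class downward.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Cumulative relative frequency: frc = fc / n for each class.
less_than_crf = [fc / n for fc in less_than_cf]

for f, rf, lcf, mcf, crf in zip(frequencies, relative, less_than_cf,
                                more_than_cf, less_than_crf):
    print(f"{f:>3} {rf:6.3f} {lcf:>3} {mcf:>3} {crf:6.3f}")
```

The last "less than" cumulative frequency and the first "more than" cumulative frequency both equal n, which is a quick check on the arithmetic.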

Exercise: Construct a grouped frequency distribution for the following data. Age of patients
(years) (n=60) in a diabetic clinic in Addis Ababa, January 2000:
19,82,98,78,30,26,32,66,87,81,40,48,70,61,69,58,60,53,28,54,47,40,
80,56,36,53,65,28,90,95,45,32,34,36,20,62,51,20,17,26,70,81,39,63,
33,66,61,77,41,55,76,70,42,67,22,75,24,50,50,44.
Based on the above data, construct a table that contains:
1. Class interval/Class.
2. Class boundary.
3. Class mark.
4. Tally mark.
5. Frequency.
6. Relative frequency.
a. Less than relative frequency.
b. Greater than relative frequency.
7. Cumulative relative frequency.
a. Less than crf.
b. Greater than crf.
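One possible way to tabulate the exercise data is sketched below in Python. The class width of 10 and the starting limit of 10 are assumptions made for illustration only; the exercise does not prescribe them, and a different choice of classes gives a different but equally valid table. The names `WIDTH`, `START`, and `rows` are hypothetical.

```python
# Ages of the 60 patients from the exercise.
ages = [19, 82, 98, 78, 30, 26, 32, 66, 87, 81, 40, 48, 70, 61, 69, 58, 60,
        53, 28, 54, 47, 40, 80, 56, 36, 53, 65, 28, 90, 95, 45, 32, 34, 36,
        20, 62, 51, 20, 17, 26, 70, 81, 39, 63, 33, 66, 61, 77, 41, 55, 76,
        70, 42, 67, 22, 75, 24, 50, 50, 44]

WIDTH = 10  # assumed class width (not given in the exercise)
START = 10  # assumed lower limit of the first class

n = len(ages)
rows = []
cum = 0
lower = START
while lower <= max(ages):
    upper = lower + WIDTH - 1                     # class limits, e.g. 10-19
    f = sum(lower <= a <= upper for a in ages)    # class frequency
    cum += f                                      # "less than" cumulative frequency
    rows.append({
        "class": f"{lower}-{upper}",
        "boundaries": (lower - 0.5, upper + 0.5), # class boundaries
        "mark": (lower + upper) / 2,              # class mark (mid-point)
        "freq": f,
        "rel": f / n,                             # relative frequency
        "cum": cum,
        "cum_rel": cum / n,                       # cumulative relative frequency
    })
    lower += WIDTH

for r in rows:
    print(r["class"], r["boundaries"], r["mark"], r["freq"],
          round(r["rel"], 3), r["cum"], round(r["cum_rel"], 3))
```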

Statistical Tables
A statistical table is an orderly and systematic
presentation of numerical data in rows and columns.
o Rows are horizontal arrangements of data,
and the row heading is termed the stub.
o Columns are vertical arrangements of data,
and the column heading is called the caption.
Both simple and grouped frequency distributions can
be put in statistical tables.

Almost any quantitative information can be
organized into a table.
Tables are useful for demonstrating patterns,
exceptions, differences, and other relationships.
In addition, tables usually serve as the basis for
preparing more visual displays of data, such as
graphs and charts, where some of the detail may be
lost.

Parts of table
1. Table number:
– Serially numbered.
– Should be written in the center at the top.
2. Title:
– Should be written in the center at the top of the table below the
table number.
3. Caption:
– Refers to the name of the column heading.
– Is written at the center of the column.

4. Stub:
– Refers to the name of the row heading.
– Written at the extreme left.
5. Body of the table:
– The numerical data expressed in the table.
– When the body is empty, it is called a dummy table (table
shell) and the variables are termed dummy variables.
6. Head note:
– Short statement about all or major parts of the table.
– Written below the title in brackets.
7. Foot note:
– Used if any clarification is needed about the parts of a table.
– Written at the bottom of the table.
– Indicates the source of the data.
The following structure shows the placement of the
various parts of a table.

Common Rules of Constructing Tables
Although there are no hard and fast rules to follow, the following
general principles should be addressed in constructing tables.
1. It should be as simple as possible.
2. It should be self-explanatory. To create a table
that is self-explanatory, follow the guidelines below:
I. Title should be clear and to the point.
II. The title should state what the table describes, and where
and when the data were collected.

III. Precede the title with a table number.
IV. Label each row and each column clearly and
concisely and include the units of measurement for
the data. Limit the number of variables to three or
fewer.
V. Totals should be shown either in the top row and the
first column or in the last row and last column. If you
show percents (%), also give their total (always 100).
VI. Explain any code, abbreviation, or symbol, or
exclusion in a footnote.

VII. Note the source of the data in a footnote if the data are
not original.
VIII. Put the title at the top of the table.
IX. Numerical entries of zero should be explicitly written
rather than indicated by a dash. Dashes are reserved for
missing or unobserved data.
X. In cross-tabulated data (variables put as row and column
headings), the dependent variable should be the column
heading and the independent variable should be the row
heading.

3. If the data show a qualitative variable, the
observations are listed in alphabetical order or by their
degree of importance.
4. If the data are time bound (classified by time of
occurrence), they should be arranged in chronological order,
from the earliest to the latest or vice versa.
5. If the data represent places, they may be placed in
alphabetical order or in terms of geographic location.

Types of table
Based on the purpose for which the table is
designed and the complexity of the
relationship, a table could be either of;
A. Simple frequency table.
B. Cross tabulation.

A. Simple frequency table (one-way table):
• Is used when the individual observations involve only a single
variable.
• The denominators for the percentages are the sum of all observed
frequencies.
Example: Table X: Overall immunization status of children in
Adami Tullu Woreda, Feb. 1999.

Immunization status     Number   Percent
Not immunized           75       35.7
Partially immunized     57       27.1
Fully immunized         78       37.2
Total                   210      100.0
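As a small illustration of how the percentages in a one-way table are obtained, the Python sketch below divides each count in Table X by the total. The dictionary name is hypothetical. Note that plain rounding gives 37.1 for the fully immunized group, while the table prints 37.2; presumably the printed value was adjusted so that the column totals exactly 100.0.

```python
# Counts from Table X (immunization status of children).
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

total = sum(counts.values())  # denominator: sum of all observed frequencies

# Percentage of each category, rounded to one decimal place.
percents = {status: round(100 * c / total, 1) for status, c in counts.items()}

for status, c in counts.items():
    print(f"{status:<22}{c:>5}{percents[status]:>8}")
print(f"{'Total':<22}{total:>5}")
```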

B. Cross tabulation:
Is used to obtain the frequency distribution of one
variable by the subsets of another variable.
The decision for the denominator is based on the
variable of interest to be compared over the subsets of
the other variable.
Can be of two types:
I. Two-way table.
II. Higher order table.

I. Two-way table:
Shows two variables/characteristics and is formed when either the
caption or the stub is divided into two or more parts.
Example: Table Y: TT immunization by marital status of the women of childbearing age, Addis Ababa town, 2006.
Source: Mikael A. et al. Tetanus Toxoid immunization coverage among women of child bearing age in Assendabo town; Bulletin of JIHS, 1996, 7(1): 13-20.

II. Higher Order Table:
Used when it is desired to represent three or more
characteristics/variables in a single table.
Example: Table Z: Distribution of Health Professional by Sex and
Residence.

Diagrammatic representation of data
• An appropriately drawn graph allows readers to rapidly
obtain an overall grasp of the data presented.
• Well designed graphs can be an incredibly powerful
means of communicating a great deal of information
using visual techniques.
• When graphs are poorly designed, they not only fail
to convey the message effectively, but also often
mislead and confuse.

Importance of Diagrammatic Representation
• Attractiveness.
• They help in deriving the required information in
less time and without any mental strain.
• They facilitate comparison.
• They show unsuspected events and lead to action.
• Memorization.

Limitations of diagrammatic presentation:
• They fail to show slight differences.
• They are not exact; they provide only approximate
information.
• They are not suitable for all statistical data.
• They are not used when comparison is not necessary
or is impossible.

General rules that are commonly accepted about
construction of graphs:
1. A graph should be self-explanatory and as simple as possible.
2. Titles are usually placed below the graph and
should again answer the questions What? Where? When?
3. Legends or keys should be used to differentiate
variables if more than one is shown.
4. The axis labels should be placed so that they read from the left
side and from the bottom.

5. The units into which the scale is divided should be
clearly indicated.
6. The numerical scale representing frequency must start at
zero or a break in the line should be shown.
The choice of the particular form among the different
possibilities will depend on personal choices and/or
the type of the data. Bar charts and pie charts are
commonly used for qualitative or quantitative discrete data.
Histograms and frequency polygons are used for quantitative
continuous data.

Common types of diagrammatic representations
1. Bar graph
 It is the easiest and most adaptable general-purpose
chart.
 Bar graph is especially satisfactory for nominal and
ordinal data.
 The heights of bars represent the value of the
frequency (actual number or percentage) for each
category.

• The categories are represented on the baseline (x-axis)
at regular intervals and the corresponding
values (frequencies or relative frequencies) are
represented on the Y-axis (ordinate) in the case of a
vertical bar diagram, and vice versa in the case of a
horizontal bar diagram.

Tips for constructing bar graph:
1. Whenever possible it is better to construct a bar diagram
on a graph paper
2. All bars drawn in any single study should be of the same
width.
3. Leave equal spaces between the different bars.
4. All the bars should rest on the same line called the base
on the x-axis.
5. Whenever possible, it is advisable to draw bars in order of
magnitude.

6. Label both axes clearly.
7. The scale should be started from zero.
8. Use of divided bars is possible to show the
component parts.

Types of bar graph
A. Simple bar chart:
– It is a one-dimensional diagram in which the bar
represents the whole of the magnitude.
– The height or length of each bar indicates the
size (frequency) of the figure represented.
Example: Fig. X: Distribution of pediatric patients in a hospital ward
by type of admitting diagnosis in Hospital X, Jan 2000.

B. Double bar graph:
• Used to depict two variables.
Example: Fig. Y: TT Immunization status by marital status of women
15-49 years, Asendabo town, 1996.
C. Multiple bar chart:
– Represents the relationships among more than two
variables.
– The component figures (bars) are shown as separate
bars adjoining each other.
– The height of each bar represents the actual value
of the component figure.
Example: Fig. X’: Prevalence of cough in school children by smoking
history of children and their parents, Town A, Jan 2000.

D. Sub-divided (component) bar graph:
It is also called a segmented bar graph. It is constructed
when each total is built up from two or more component
figures. If a given magnitude can be split up into
subdivisions, or if there are different quantities forming the
subdivisions of the totals, simple bars may be subdivided in
the ratio of the various subdivisions to exhibit the
relationship of the parts to the whole. The order in which the
components are shown in one bar is followed in all
bars used in the diagram.
Sub-divided (component) bar graphs are of two types:
I. Actual Component Bar Diagrams:
• The overall height of the bars and the individual
component lengths represent actual figures.
Example: Fig. Y’: TT Immunization status by marital status of women
15-49 years, Asendabo town, 1996.
II. Percentage Component Bar Diagram:
• The individual component lengths represent the percentage
each component forms of the overall total.
Note that a series of such bars will all be the same total
height, i.e., 100 percent.
Example: Fig. Z: TT Immunization status by marital status of women
15-49 years, Asendabo town, 1996.

2. Pie chart
Useful for qualitative or quantitative discrete data.
Shows the relative frequency of each category by dividing a
circle into sectors so that the areas of the sectors are
proportional to the frequencies.
Appropriate for variables having at most six categories,
because the circle should not be divided into more than six
sectors.

Methods of constructing pie-chart:
– Construct a frequency table.
– Change each frequency into a percentage (f/n × 100).
– Change each percentage into degrees,
where degree = (percentage/100) × 360.
– Draw a circle and divide it accordingly.
Example: Fig. X: Distribution of Cause of death of females in
England & Wales, 1999.
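The conversion from frequencies to sector angles can be sketched in Python as follows. Because the data behind the pie chart example are not given in these notes, the sketch reuses the immunization counts from Table X purely as an illustration; the variable names are hypothetical.

```python
# Counts borrowed from Table X (immunization status); used only to
# illustrate the frequency -> percentage -> degrees conversion.
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

n = sum(counts.values())

# Step 1: frequency to percentage; step 2: percentage to degrees.
percent = {k: 100 * f / n for k, f in counts.items()}
angles = {k: p / 100 * 360 for k, p in percent.items()}

for k in counts:
    print(f"{k:<22}{percent[k]:7.1f}%{angles[k]:8.1f} degrees")
```

The sector angles add up to 360 degrees, which is a convenient check before drawing the circle.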
3. Histogram
Is a special kind of bar graph.
Useful for quantitative continuous data.
Is a frequency distribution with continuous class
intervals that has been turned into a graph.
The area of each rectangle represents the frequency
of the corresponding class interval.
To avoid crowding, the horizontal axis can be labelled with class midpoints.

In addition to simplifying a complex data set, a
histogram is important in depicting the shape
(symmetric/skewed) and location of central
tendency (“averages”) of the frequency
distribution of a continuous variable.
Example: Fig. Z: Distribution of the RBC cholinesterase values
(μmol/min/ml) obtained from 35 workers exposed to pesticides.
Source: Knapp RG, Miller MC III: Clinical Epidemiology and
Biostatistics: The National Medical Series for Independent Study.
Williams & Wilkins, 1992, Baltimore, Maryland.

4. Frequency polygon
A frequency polygon is obtained by connecting the midpoints
of the tops of the adjacent rectangles (cells) of the
histogram with line segments.
When the polygon is continued to the X-axis just
outside the range of the values, the total area under the
polygon will be equal to the total area under the
histogram.
It is not essential to draw a histogram in order to obtain
a frequency polygon.
It can be drawn without erecting the rectangles of a histogram as
follows:
Methods of constructing frequency polygon:
• The horizontal scale should be marked in the numerical values of
the mid-points of the intervals.
• Erect ordinates on the mid-points of the intervals, the length or
altitude of each ordinate representing the frequency of the class on
whose mid-point it is erected. Join the tops of the ordinates
and extend the connecting lines to the horizontal scale.

Example of a frequency polygon drawn from a histogram: Fig. z: Frequency polygon for the ages of 2087 mothers with <5 children, Adami Tulu, 2003 (Mean = 27.6, Std. Dev = 6.13, N = 2087).
Example of a frequency polygon drawn without a histogram: Fig. z’: Frequency polygon for the ages of women at the time of marriage (x-axis: Age; y-axis: No of women).
5. Ogive Curve (The Cumulative Frequency Polygon)
Sometimes it may be necessary to know the number
of items whose values are more or less than a certain
amount. To get this information it is necessary to
change the form of the frequency distribution from a
‘simple’ to a ‘cumulative’ distribution.
An ogive curve turns a cumulative frequency
distribution into a graph.
Ogives are much more common than frequency polygons.

To construct an Ogive curve:
I) Compute the cumulative frequency of the
distribution.
II) Prepare a graph with the cumulative frequency on the
vertical axis and the true upper class limits (class
boundaries) of the intervals scaled along the X-axis
(horizontal axis). The true lower limit of the lowest
class interval is also included in the X-axis scale; it is
the true upper limit of the next lower interval, which has
a cumulative frequency of 0.
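The two construction steps can be sketched in Python by computing the points that would be joined to draw the ogive. The classes and frequencies are borrowed from the grouped age example later in these notes; the half-unit adjustment for class boundaries assumes integer-recorded ages, and the names are hypothetical. Plotting itself is left out.

```python
from itertools import accumulate

# Class limits and frequencies from the grouped age example in these notes.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [4, 66, 47, 36, 12, 4]

# Step I: cumulative frequency of the distribution.
cum_freqs = list(accumulate(freqs))

# Step II: true upper class limits (class boundaries) on the X-axis.
# Assumes integer-valued data, so each boundary is the upper limit + 0.5.
upper_bounds = [hi + 0.5 for _, hi in classes]
lowest_bound = classes[0][0] - 0.5  # true lower limit of the lowest class

# The ogive starts at cumulative frequency 0 at the lowest boundary.
xs = [lowest_bound] + upper_bounds
ys = [0] + cum_freqs
ogive_points = list(zip(xs, ys))

for x, y in ogive_points:
    print(x, y)
```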
Example: Construct an ogive for the data below.
Table X: Heart rate of patients admitted to Hospital D, 2000.
Fig. D: Ogive (cumulative frequency curve) of heart rate (beats/minute) of
patients admitted to Hospital B, 2000.

Numerical Summary Measures
MCT (Measure of Central Tendency)
A frequency distribution is a general picture of the
distribution of a variable, but by itself it cannot
indicate the average value and the spread
of the values.
On the scale of values of a variable there is a certain
stage at which the largest number of items tend to
cluster.

Since this stage is usually in the centre of distribution,
the tendency of the statistical data to get concentrated
at a certain value is called “central tendency”.
The various methods of determining the point about
which the observations tend to concentrate are called
MCT (Measure of Central Tendency).
The objective of calculating MCT is to determine a
single figure which may be used to represent the whole
data set.

In that sense it is an even more compact description
of the statistical data than the frequency distribution.
Since a MCT represents the entire data, it facilitates
comparison within one group or between groups of
data.
Characteristics of a good MCT:
A MCT is good or satisfactory if it possesses the
following characteristics;
1. It should be based on all the observations.
2. It should not be affected by the extreme values.
3. It should be as close to the maximum number of values as
possible.
4. It should have a definite value.
5. It should not be subjected to complicated and tedious calculations.
6. It should be capable of further algebraic treatment.
7. It should be stable with regard to sampling.
The three most common measures of central tendency are:
–Mean, Median, and Mode.

Arithmetic Mean
The arithmetic mean is the measure of central location
you are probably most familiar with.
It is the arithmetic average and is commonly called simply
“mean” or “average.”
In formulas, the arithmetic mean is usually represented as μ
for the population mean and x̄, read as “x-bar”, for the sample mean.
It is the sum of all the observations divided by the total
number of observations.

General formula
a) Ungrouped mean
If x1, x2, ..., xn are n observed values, then
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
b) Grouped data
In calculating the mean from grouped data, we assume that all
values falling into a particular class interval are located at the
mid-point of the interval. It is calculated as follows:

\[ \bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} \]

where,
k = the number of class intervals.
mi = the mid-point of the ith class interval.
fi = the frequency of the ith class interval.

Properties of the Arithmetic Mean:
• For a given set of data there is one and only one
arithmetic mean (uniqueness).
• Easy to calculate and understand (simplicity).
• Influenced by each and every value in a data set.
• Greatly affected by the extreme values (Sensitivity).
So, mean is an excellent measure of central
tendency when the distribution is symmetric
(normally or approximately normally distributed).

• The algebraic sum of the deviations of the given values
from their arithmetic mean is always zero (center of
gravity).
• In the case of grouped data, if any class interval is open,
the arithmetic mean cannot be calculated.
• It is not appropriate for either nominal or
ordinal data.
• The sum of the squares of the deviations from the
arithmetic mean is less than that computed from
any other point.
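Two of these properties, the zero sum of deviations and the least-squares property, can be checked numerically. The Python sketch below uses the ten heart rates from Example 1 of these notes; the helper names and the comparison points (138 and 120) are chosen only for illustration.

```python
# Heart rates (beats per minute) of the 10 patients in Example 1.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

mean = sum(heart_rates) / len(heart_rates)

def sum_dev(values, c):
    """Algebraic sum of the deviations of the values from the point c."""
    return sum(x - c for x in values)

def sum_sq_dev(values, c):
    """Sum of the squared deviations of the values from the point c."""
    return sum((x - c) ** 2 for x in values)

# Deviations from the mean cancel out (up to floating-point rounding).
print(sum_dev(heart_rates, mean))

# Squared deviations are smallest when taken about the mean.
for c in (mean, 138, 120):
    print(c, round(sum_sq_dev(heart_rates, c), 1))
```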
Advantages;
1) It is based on all values given in the distribution.
2) It is most easily understood.
3) It is most amenable to algebraic treatment.
Disadvantages;
1) Overly sensitive to extreme values.
2) When the distribution has open-end classes, its
computation would be based on an assumption, and
therefore may not be valid.
3) Sometimes the resulting value may even look ridiculous.
Example 1: The heart rates for n=10 patients were as
follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of
these patients?
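Ans. Mean = (167 + 120 + 150 + 125 + 150 + 140 + 40 + 136 + 120 + 150)/10
= 1298/10 = 129.8 beats per minute.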

Example 2:Compute the mean age of 169 subjects
from the grouped data.
Class interval Mid-point (mi) Frequency (fi) mifi
10-19 14.5 4 58.0
20-29 24.5 66 1617.0
30-39 34.5 47 1621.5
40-49 44.5 36 1602.0
50-59 54.5 12 654.0
60-69 64.5 4 258.0

Total __ 169 5810.5

Ans. Mean = 5810.5/169 = 34.38 years.
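The grouped-mean calculation for this table can be reproduced with a short Python sketch. The function name `grouped_mean` is hypothetical; the mid-points and frequencies are those of the table above.

```python
# Mid-points (mi) and frequencies (fi) of the six age classes.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]

def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum of mi*fi divided by sum of fi."""
    total_mf = sum(m * f for m, f in zip(midpoints, freqs))
    return total_mf / sum(freqs)

mean_age = grouped_mean(midpoints, freqs)
print(round(mean_age, 2))  # 34.38
```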

Median
It is the middle value of a set of observations when the observations
are listed in increasing or decreasing order.
a) Ungrouped data
The median is the value which divides the data set into two
equal parts.
If the number of values is odd, the median will be the middle
value when all values are arranged in order of magnitude, with ½
of the remaining observations being larger than the median value
and ½ smaller.

When the number of observations is even, there is no single
middle value but two middle observations. In this case the
median is the mean of these two middle observations, when
all observations have been arranged in the order of their
magnitude.
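The odd and even cases can be sketched in one small Python function. The function name is hypothetical, and the input is assumed to be a non-empty list of numbers. The even case is illustrated with the ten heart rates from Example 1 of the arithmetic mean section.

```python
def median(values):
    """Median of ungrouped data (assumes a non-empty list of numbers)."""
    ordered = sorted(values)           # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd n: the single middle value
    # even n: mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]
print(median(heart_rates))    # 138.0, the mean of 136 and 140
print(median([7, 3, 5]))      # 5
```

Note that the extreme value 40 pulls the mean of these heart rates down to 129.8, while the median stays at 138.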

b) Grouped data
In calculating the median from grouped data, we
assume that the values within a class-interval are
evenly distributed through the interval.
– The first step is to locate the class interval in
which the median lies.
– Find n/2 and identify the first class interval whose
cumulative frequency is at least n/2.
– Then, use the following formula.

\[ \tilde{x} = L_m + \left( \frac{\frac{n}{2} - F_c}{f_m} \right) W \]
where,
Lm = lower true class boundary of the interval containing the
median.
Fc = cumulative frequency of the interval just before the median
class interval (the row just above it in the table).
fm = frequency of the interval containing the median.
W = class interval width.
n = total number of observations.
Example. Compute the median age of 169 subjects
from the grouped data.
Class interval Mid-point (mi) Frequency (fi) Cum. freq

10-19 14.5 4 4
20-29 24.5 66 70
30-39 34.5 47 117
40-49 44.5 36 153
50-59 54.5 12 165
60-69 64.5 4 169

Total 169

Ans.
n/2 = 169/2 = 84.5
n/2 = 84.5 falls in the 3rd class interval (cumulative frequency 117)
Lower class boundary = 29.5, upper class boundary = 39.5
Frequency of the class (fm) = 47
(n/2 – Fc) = 84.5 – 70 = 14.5

Median = 29.5 + (14.5/47)10 = 32.59 ≈ 33
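The grouped-median steps can be sketched in Python as follows. The function name and argument layout are hypothetical; the lower class boundaries assume integer-recorded ages (lower limit minus 0.5), and the data are those of the table in this example.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data, assuming values are spread evenly in a class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2
    cum_before = 0  # Fc: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cum_before + f >= half:          # first class reaching n/2
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("median class not found")

lower_bounds = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]  # true lower boundaries
freqs = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_bounds, freqs, width=10)
print(round(median_age, 2))  # 32.59
```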

Properties of the median;
• There is only one median for a given set of data
(uniqueness).
• The median is easy to calculate.
• Median is a positional average and hence it is
insensitive to very large or very small values.
• Median can be calculated even in the case of open
end intervals.
• It is determined mainly by the middle points and
less sensitive to the remaining data points
(weakness).
• It is not a good representative of data if the number of
items is small.
• The median can be used as a summary measure for
ordinal, discrete and continuous data; in general, however,
it is not appropriate for nominal data.
Advantages
1)It is easily calculated and is not much disturbed by
extreme values.
2)It is more typical of the series.

3) The median may be located even when the data are
incomplete.
4) The median is nearer to reality and more
representative than the mean.
Disadvantages
1. The median is not so well suited to algebraic
treatment as the arithmetic, geometric and
harmonic means.
2. It is not so generally familiar as the arithmetic mean.

Mode
• The mode is the most frequently occurring value among all the
observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.
• The mode can be used as a summary measure for
nominal, ordinal, discrete and continuous data; in
general, however, it is more appropriate for nominal
and ordinal data.

Any observation of a variable at which the
distribution reaches a peak is called a mode.
Most distributions encountered in practice have one
peak and are described as uni-modal.
Figure: Diagrammatic presentation of mode.

a) Ungrouped data
• It is a value which occurs most frequently in a set
of values.
• If all the values are different, there is no mode. On
the other hand, a set of values may have more than
one mode.

Example 1:
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 “Unimodal”
Example 2:
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes – 2 & 5
• This distribution is said to be “bi-modal”
Example 3:
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
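
The three examples above can be checked with a short Python sketch. The function name `modes` is only illustrative, and the rule "no mode when every value occurs once" follows the convention used in these notes (other texts may define it differently). The sketch assumes the data set is not empty.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or [] when every value occurs once."""
    counts = Counter(values)          # frequency of each distinct value
    highest = max(counts.values())    # largest frequency observed
    # Convention from the notes: if all values are different, there is no mode.
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))         # [4]     unimodal
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))   # [2, 5]  bi-modal
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))   # []      no mode
```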

b) Grouped data
• To find the mode of grouped data, we usually refer
to the modal class, where the modal class is the
class interval with the highest frequency.
• If a single value for the mode of grouped data must
be specified, it is taken as the mid-point of the
modal class interval.

We can also use this formula:
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times C$$
Where:
$L$ = the lower limit of the modal class
$d_1$ = the difference between the frequencies of the modal class and the
preceding class
$d_2$ = the difference between the frequencies of the modal class and the
succeeding class
$C$ = the width of the modal class interval.
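
As a rough illustration of the formula, the sketch below applies it to a small frequency table. The class limits and frequencies are invented for the example, the function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption the notes do not state.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode = L + d1 / (d1 + d2) * C for equal-width class intervals."""
    # Index of the modal class (the first one, if several share the top frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    # Assumption: a missing neighbouring class counts as frequency 0.
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - f_prev   # modal class minus preceding class
    d2 = frequencies[i] - f_next   # modal class minus succeeding class
    return lower_limits[i] + d1 / (d1 + d2) * width


# Hypothetical classes 10-20, 20-30, 30-40, 40-50 with these frequencies.
lower_limits = [10, 20, 30, 40]
frequencies = [5, 12, 20, 8]
print(grouped_mode(lower_limits, frequencies, width=10))  # 34.0
```

Here the modal class is 30-40, so d1 = 20 - 12 = 8, d2 = 20 - 8 = 12, and the mode is 30 + (8/20) × 10 = 34. The mid-point rule would give 35 instead; the formula pulls the estimate toward the neighbour with the higher frequency.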

Properties of mode;
• The mode can be used as a summary measure for
nominal, ordinal, discrete and continuous data. In general,
however, it is more appropriate for nominal and ordinal
data.
• It is not affected by extreme values.
• It can be calculated for distributions with open end
classes.
• Often its value is not unique.
• The main drawback of mode is that often it does not exist.
• It is an average of position.

Advantages
1. Since it is the most typical value it is the most
descriptive average.
2. Since the mode is usually an “actual value”, it indicates
the precise value of an important part of the series.
3. Used for categorical data to describe the most frequent
category.
4. Not affected by extreme values.
5. Easy to understand

Disadvantages
1. Unless the number of items is fairly large and the
distribution reveals a distinct central tendency, the
mode has no significance.
2. It is not well suited to further mathematical treatment.
3. In a small number of items the mode may not exist.
4. Sometimes there may be more than one mode.

Exercise: The table below shows the protein intake of different families.
Find the mean, median, and mode.

Protein intake/consumption   Mid point of class   Number of       fixi     Cumulative
unit/day (g)                 intervals (xi)       families (fi)            frequency
15-25                        20                   30              600      30
25-35                        30                   40              1200     70
35-45                        40                   100             4000     170
45-55                        50                   110             5500     280
55-65                        60                   80              4800     360
65-75                        70                   30              2100     390
75-85                        80                   10              800      400
Total                                             400             19000
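
One way to check your answers to this exercise is the short Python sketch below. It uses the table's own figures. The mean is the sum of fixi divided by the total frequency, and the mode uses the d1, d2 formula given earlier. The median line assumes the usual interpolation formula for grouped data, Median = L + ((n/2 - F) / f) × C, where F is the cumulative frequency before the median class; if your formula differs, adjust that line.

```python
lower = [15, 25, 35, 45, 55, 65, 75]      # lower class limits (g)
freq = [30, 40, 100, 110, 80, 30, 10]     # number of families per class
width = 10                                # common class width

n = sum(freq)                             # total number of families (400)
mid = [l + width / 2 for l in lower]      # class mid-points 20, 30, ..., 80

# Mean of grouped data: sum(fi * xi) / sum(fi)
mean = sum(f * x for f, x in zip(freq, mid)) / n

# Median (assumed formula): interpolate inside the class containing the n/2-th value.
cum = 0                                   # cumulative frequency before the class
for l, f in zip(lower, freq):
    if cum + f >= n / 2:
        median = l + (n / 2 - cum) / f * width
        break
    cum += f

# Mode: L + d1 / (d1 + d2) * C for the class with the highest frequency.
i = freq.index(max(freq))                 # modal class is 45-55 (an interior class)
d1 = freq[i] - freq[i - 1]
d2 = freq[i] - freq[i + 1]
mode = lower[i] + d1 / (d1 + d2) * width

print(mean, round(median, 2), mode)       # 47.5 47.73 47.5
```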

Measures of Dispersion
Measures of central tendency (MCT) alone are not enough to give a clear
understanding of the distribution of the data.
We need to know something about the variability or spread of
the values, that is, whether they tend to be clustered close together
or spread out over a broad range.
Measures of Dispersion: Measures that quantify the
variation or dispersion of a set of data from its central
location.

Dispersion refers to the variability exhibited by the
values of the data.
The amount of dispersion is small when the values are close
together.
If all the values are the same, there is no dispersion.
Other terms synonymous with Measures of
Dispersion:
– “Measures of Variation”
– “Measures of Spread”
– “Measures of Scatter”

Measures of dispersion include:
– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation

1. Range (R)
• The difference between the largest and smallest
observations in a sample.
• Range = Maximum value – Minimum value

Example –
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42-5 = 37

• Because it is determined by only the two extreme
observations, the range is of limited use: it
tells us nothing about how the data between the
extremes are spread.
• Further, interpretation of the range depends on the
number of observations:
– when the number of observations increases, the
range can get larger.

2. Percentiles, Quartiles and Inter-quartile Range
• The quartiles are the three values which divide the ordered
distribution into four parts such that there are an
equal number of observations in each part.
– Q1 = the [(n+1)/4]th ordered observation
– Q2 = the [2(n+1)/4]th ordered observation (the median)
– Q3 = the [3(n+1)/4]th ordered observation
• The inter-quartile range (IQR) is the difference between the
third and the first quartiles.
– IQR = Q3 - Q1
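
The position rule for the quartiles can be sketched in Python as follows. The function name `quartile` is hypothetical. When the position k(n+1)/4 is not a whole number, the sketch interpolates linearly between the two neighbouring ordered observations; that interpolation step is an assumption, since the notes give only the position. It also assumes at least three observations. The data are the eight values from the range example.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k(n+1)/4 in the sorted data."""
    data = sorted(values)
    n = len(data)                      # assumed to be at least 3
    pos = k * (n + 1) / 4              # position given by the rule in the notes
    lo = int(pos)                      # ordered observation just below the position
    frac = pos - lo                    # fractional part of the position
    hi = min(lo + 1, n)                # next ordered observation (capped at n)
    # Assumption: linear interpolation between the two neighbouring observations.
    return data[lo - 1] + frac * (data[hi - 1] - data[lo - 1])


values = [5, 9, 12, 16, 23, 34, 37, 42]
q1, q2, q3 = (quartile(values, k) for k in (1, 2, 3))
print(q1, q2, q3)      # 9.75 19.5 36.25
print(q3 - q1)         # 26.5 (inter-quartile range)
```

Other textbooks and software packages use slightly different position rules, so small differences in the quartiles are normal.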
• Although the inter-quartile range sometimes serves
as a useful descriptive measure, it is mathematically
intractable and can also vary considerably from
sample to sample.
• Percentiles divide the ordered data into 100 parts with an equal
number of observations in each part.
• It follows that the 25th percentile is the first quartile,
the 50th percentile is the median and the 75th
percentile is the third quartile.

3. Variance
• A good measure of dispersion should make use of all the data.
• Intuitively, a good measure could be derived by combining, in some way, the
deviations of each observation from the mean.
• The variance achieves this by averaging the squared deviations from the mean
(for a sample, their sum is divided by n - 1).

• The sample variance of the set x1, x2, ..., xn of n
observations with mean x̄ is
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Note: The sum of the deviations from the mean is always
zero, so it cannot measure spread. It is more useful to square
the deviations, add them, and divide by n - 1 (to get the variance).

4. Standard Deviation
• Because it is based on squared deviations, the variance is limited as
a descriptive statistic: it is not in the same units as
the observations.
• By taking the square root of the variance, we obtain a
measure of dispersion in the original units.
Example : We use the data set of 10 numbers (See Page 29):
19 21 20 20 34 22 24 27 27 27
– The range = 34 – 19 = 15
– The first quartile is 20 and the third quartile is 27
– The inter-quartile range = 27 – 20 = 7.
– The variance is 21.88
– The SD = √21.88 = 4.68.
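
The range, variance and standard deviation in this example can be reproduced with a few lines of Python. This is a minimal sketch that applies the sample formula (divisor n - 1) directly; the variable names are only illustrative.

```python
import math

data = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
n = len(data)

data_range = max(data) - min(data)            # largest minus smallest value
mean = sum(data) / n                          # arithmetic mean (24.1)
deviations = [x - mean for x in data]         # deviation of each value from the mean

# The deviations sum to zero (up to rounding error), so they are squared first.
sums_to_zero = math.isclose(sum(deviations), 0.0, abs_tol=1e-9)

variance = sum(d ** 2 for d in deviations) / (n - 1)   # sample variance
sd = math.sqrt(variance)                               # back in the original units

print(data_range)            # 15
print(sums_to_zero)          # True
print(round(variance, 2))    # 21.88
print(round(sd, 2))          # 4.68
```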
5. Coefficient of variation
• When we want to compare the variability in two sets
of data, the standard deviation, which measures
absolute variation, may be misleading if the sets have
different means or units of measurement.
• The coefficient of variation (CV) measures relative variation and
is better suited to comparing the variability of
two such sets of data.
• CV = standard deviation / mean (often expressed as a percentage)
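
A small Python sketch shows why the CV is useful. The two data sets are invented for illustration: they have the same standard deviation but different means, so their relative variation differs. The function name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (a ratio; multiply by 100 for %)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical measurements, for illustration only.
heights_cm = [150, 160, 170]    # mean 160, SD 10
weights_kg = [50, 60, 70]       # mean 60,  SD 10

print(round(coefficient_of_variation(heights_cm), 4))   # 0.0625
print(round(coefficient_of_variation(weights_kg), 4))   # 0.1667
```

Both sets have SD = 10, yet the weights vary more relative to their mean (about 16.7% against 6.25%). The CV is only meaningful when the mean is not zero and the variable is measured on a scale with a true zero.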

Thank You!!!
Enjoy it.