
Biostatistics

Text Book :
An Introduction to Biostatistics by N.Gurumani
Reference Book:
(1) Biostatistical Analysis by Jerrold H. Zar.
(2) Biostatistics : A foundation for analysis in the health sciences by W.W.Daniel.

Chapter 1 Introduction
History
Tycho Brahe (1546-1601) - Astronomy
Kepler - Detailed study of the information collected by Brahe.
John Graunt (1620-1674) - Father of vital statistics (births and deaths).
Edmund Halley - First life table.
Süssmilch (1707-1767) - Ratio of births and deaths remains constant.
Bernoulli (1654-1705) - Law of large numbers.
Laplace (1749-1827) - Expectation.
Francis Galton (1822-1911) - Study of regression analysis.
Jevons (1835-1882) - Index numbers.
Karl Pearson (1857-1936) - Correlation analysis.
R.A.Fisher (1890-1962) - Tests of significance applied to genetics.

The word STATISTICS

Status - Latin
Statista - Italian
Statistique - French
Statistik - German
Literal meaning : political state (administrative activities of the state)

Definitions

Statistics are numerical statements of facts in any department of inquiry. - BOWLEY (plural)
The science which deals with collection, presentation, analysis and interpretation of
numerical data. - CROXTON and COWDEN (singular)
Science of counting / averages

Example : statistical methods

Question : Find the percentage of marks obtained by Mr. Kumar in his S.S.L.C
examination.
Collection of data: Marks 75 60 100 99 51
Organization of data
Subject          Marks
Tamil            75
English          60
Maths            100
Science          99
Social science   51

Presentation of data
[Bar chart: marks (y-axis, 0 to 120) for each subject (x-axis: Tamil, English, Maths, Science, Social science)]

Analysis of data
Percentage of marks = (secured marks / total marks) X 100
= ( 385 / 500 ) X 100
= 77%
Interpretation of data
Mr. Kumar has got 77% of marks in his SSLC examination.
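
The analysis step can be checked with a short Python sketch. It uses the five marks listed under "Collection of data"; the maximum of 100 marks per subject is an assumption taken from the total of 500 used in the formula.

```python
# Marks from the "Collection of data" step.
marks = {"Tamil": 75, "English": 60, "Maths": 100, "Science": 99, "Social science": 51}

MAX_PER_SUBJECT = 100  # assumption: each paper is marked out of 100

secured = sum(marks.values())            # secured marks
total = MAX_PER_SUBJECT * len(marks)     # total marks
percentage = secured * 100 / total       # (secured marks / total marks) X 100

print(f"{secured} / {total} = {percentage:.0f}%")
```

The five listed marks add up to 385 out of 500, which is the figure the percentage is based on.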

Four functions of the statistics

Collection of data
Tabulation and Presentation of data
Analysis of data
Interpretation of data
Limitations of statistics
It can be used only to study numerically valued data not qualitative phenomena like
intelligence, poverty, honesty etc.
It deals with aggregate and not with individuals.
Statistical data collected for a given purpose cannot be applied to any other situation.
It is not always possible to compare statistical data unless they are homogeneous in
character.
It can be misused.
It is one of the methods of studying a problem.

Biostatistics - Defn

Statistics applied to the analysis of data derived from the biological sciences and
medicine is called biostatistics.

Type of statistics
Descriptive Statistics:
Methods of organizing, summarizing, and presenting data in an informative way.
Inferential Statistics:
Methods of reaching decisions about a large body of data by examining only a small part
of it: a decision, estimate, prediction, or generalization about a population, based on a
sample.

[Figure: A Taxonomy of Statistics]

Chapter 2 Population and Sample


Population, sample, sampling
Population:
The largest collection of values of a random variable in which we have an interest at
a particular time.
Finite:
If a population of values consists of a fixed number of these values, then the population
is said to be Finite.
Infinite:
If a population consists of an endless succession of values, then the population is said
to be Infinite
Sample
A part of a population or A subset of a population.
Sampling
The process of drawing a sample from a population is called sampling.

Population versus Sample


A population is a collection of all possible individuals, objects, or measurements of
interest.
A sample is a portion, or part, of the population of interest


Chapter 3 Variables
Variables and Variate

Variable Defn : Commonly a factor or character which can take different values is
called a variable.
Example: height, length, weight.
Variate Defn : It is a single observation of a variable.

Types of Variables

Qualitative: A variable that cannot be expressed in numbers is known as a qualitative
variable. ( ex- sex, skin colour, smell of flowers)
Quantitative: A quantitative variable is one whose differing states can be expressed in
numbers. ( ex- length of fish, weight of frog, seeds in a fruit)

Type of quantitative variables


A quantitative variable can be classified as discrete or continuous.
Discrete: It can assume certain fixed numerical values with no intermediate values
possible. ( ex- number of seeds in a fruit; number of children per family)
Continuous: It can assume, at least theoretically, an infinite number of values between
any two fixed points. ( ex- length of fish, area, volume, percentages, etc)

Types of Variables

3. Ranked variable: A variable that cannot be measured but can be ordered or ranked by
its magnitude. ( ex- activity of a gland ( pancreas, thyroid, blood pressure, etc) as
assessed in microscopic observations)
4. Derived variable: A variable that is calculated from two or more independently
measured variables, to show the relationship between them. ( ex- ratio, percentage,
indices)

Measurement
The assignment of numbers to objects or events according to a set of rules.
(i) Ratio scale:
Measurement scales having a constant interval size and a true zero point are said to
be ratio scales of measurement.
Example: length, weight, volume
(ii) Interval scale:
A scale on which the objects (measurements, observations) can be ordered and on which
the distance between any two measurements is also known is an interval scale.
(iii) Ordinal scale:
A scale on which the observations differ from category to category and can also be
ordered (ranked) according to some criterion is an ordinal scale.
Example: low - medium - high
(iv) Nominal scale:
The lowest measurement scale is the nominal scale. It consists of naming
observations or classifying them into various mutually exclusive and collectively
exhaustive categories.
Example: child - adult
under 65 - 65 and over
male - female
[Figure: The Characteristics for Levels of Measurement]

Chapter 4. Collection of Data


Collection of data
Data: The raw material of statistics is data.
Types of Data:

(i) Primary data:


The data which are collected from the individual respondents directly for the purpose
of certain study or information are known as primary data. (ex- survey, experiments,
questionnaire, local agents ).
(ii) Secondary data:
Data that have already been collected by some person or agency, or records that have
already been statistically analyzed, are known as secondary data.
(ex- published reports, commercially available data banks, journals, census reports )

Various steps involved in collection of data


Statement of the hypothesis
Nature of the sample
Sample size
Enumeration and Measurement
Scientific Notation
Significant Digits

Rounding of data
Errors in Measurements
Accuracy and Precision of data
Recording of the data

Statement of the hypothesis


Situation/ problem / question
a) To study the relationship between the various factors affecting the growth of catla
(fish), keeping the pH of the water constant.
b) To study the effect of pH on the growth of catla.
c)Various factors: age, sex, maturity, physico-chemical properties of the water, type and
quantity of food, what catla, protein content of flesh.
d)Data can be drawn based on the type of experiment.
The clear statement of the problem should include

The hypothesis ( is a tentative statement that offers an answer / explanation for a
problem)
Precise definition of the population from which the sample is to be obtained and about
which inferences are to be made (It includes sex, age groups, maturity, seasonal
conditions, environmental factors)
Definition of the parameters to be measured
Methodology ( It includes design of experiments, measurement of the parameters, units
of measurements, instrumentation to be used for measurement)

Nature of the Sample

Sample should be unbiased


Random sample ( The unbiased sample is obtained from the population randomly)
Issues related to random samples
Obtaining a satisfactory random sample is not easy. Collecting the samples of plants,
plant parts, air , water might be easier than samples of animals.

Sample size

Size affects the accuracy of the inference.


Larger samples are more useful than the smaller samples.
Small unbiased samples are more useful than large biased samples.
Enumeration and Measurement
In case of discrete variables, the variate is a value of counting or enumeration.
In case of continuous variables, a variate is a value of measurement (scale).
Examples:
a) Discrete variates: (i) Smaller values are recorded accurately (Number of girls in a
family; 0,1,2,3)
(ii) Larger numbers are recorded approximately (Number of cells, insects, bacteria,
RBC in one mm3 of blood)
b) Continuous variates: Measurement variates obtained from continuous variables are
recorded approximately.

Scientific Notation

Scientific Notation : Power of 10


When a number has many zeros before or after the decimal point, we employ scientific
notation for convenience.

Example: 679000000 = 6.79 x 10^8
0.0078 = 7.8 x 10^-3
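
As a rough illustration, Python's format specifier `e` writes a number as a power of 10. The helper name `to_scientific` and its `sig_digits` parameter are hypothetical, chosen only for this sketch.

```python
def to_scientific(value, sig_digits):
    """Format value in scientific notation with the given number of significant digits."""
    # One digit sits before the decimal point, so sig_digits - 1 follow it.
    return f"{value:.{sig_digits - 1}e}"

print(to_scientific(679000000, 3))  # 6.79e+08, i.e. 6.79 x 10^8
print(to_scientific(0.0078, 2))     # 7.8e-03, i.e. 7.8 x 10^-3
```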

Significant digits
The accurate digits, apart from zeros needed to locate the decimal point, are called
significant digits or significant figures of the number.
Example: 76.3 (3 significant digits)
8.6700 (5 s.d)
28.65 (4 s.d)
0.05723 (4 s.d)
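
A minimal sketch of counting significant digits with Python's `decimal` module is given here. It assumes the number is supplied as written (a string) and that every trailing zero in it is meant to be significant, which is ambiguous for whole numbers such as 500. The function name `significant_digits` is hypothetical.

```python
from decimal import Decimal

def significant_digits(text):
    """Count significant digits of a number given as a string, e.g. '8.6700'."""
    # Decimal stores the digits without leading zeros but keeps trailing zeros,
    # so the length of the digit tuple is the count we want.
    return len(Decimal(text).as_tuple().digits)

for number in ("76.3", "8.6700", "28.65", "0.05723"):
    print(number, significant_digits(number))
```

The zeros in front of 0.05723 only locate the decimal point, so the count for that number is 4.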

Rounding of Data
To reduce the number of significant digits while recording measurement variates, we
follow the general rules for rounding.
A digit to be rounded off
(i) is not changed if it is followed by a digit less than 5.
(ii) is increased by 1 if it is followed by a digit greater than 5, or by 5 followed by
non-zero digits.
(iii) if it is followed by 5 standing alone or by 5 followed only by zeros, is unchanged
when it is an even number and is raised by 1 when it is an odd number.
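
The rounding rules above correspond to what is often called round-half-to-even. A small sketch using Python's `decimal` module follows; the helper name `round_half_even` is hypothetical, and values are passed as strings so that they are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(text, places):
    """Round the number written in text to the given number of decimal places."""
    quantum = Decimal(10) ** -places   # e.g. places=1 gives 0.1
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_half_even("7.42", 1))   # followed by a digit less than 5: unchanged
print(round_half_even("7.47", 1))   # followed by a digit greater than 5: raised
print(round_half_even("7.45", 1))   # 5 standing alone after an even digit: unchanged
print(round_half_even("7.35", 1))   # 5 standing alone after an odd digit: raised
```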

Errors in measurement

Two types of errors: Systematic & Random
Systematic error: The error due to a defective instrument.
Random error: The chance difference between the observed value and the true value.
Accuracy and Precision of data

Accuracy: It is the closeness of a measured or computed value to its true value. (If the
systematic error is high, the accuracy of the method, and therefore that of the data,
is low.)
Precision: It is the closeness of the repeated measurements of the same quantity (It is a
measure of reproducibility).
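
The distinction can be illustrated numerically. In this sketch the true value and both sets of repeated readings are invented for illustration; the mean deviation from the true value is used as a rough indicator of accuracy and the sample standard deviation as a rough indicator of precision.

```python
from statistics import mean, stdev

TRUE_VALUE = 10.0  # hypothetical true value of the quantity

# Hypothetical repeated readings from two instruments.
readings_a = [10.1, 9.9, 10.0, 10.2, 9.8]    # centred on the true value
readings_b = [11.0, 11.1, 10.9, 11.0, 11.0]  # closely grouped but shifted upward

def bias(readings, true_value):
    """Mean deviation from the true value: small bias means high accuracy."""
    return mean(readings) - true_value

def spread(readings):
    """Sample standard deviation: small spread means high precision."""
    return stdev(readings)

for name, readings in (("A", readings_a), ("B", readings_b)):
    print(name, round(bias(readings, TRUE_VALUE), 3), round(spread(readings), 3))
```

Instrument B in this made-up example is more precise than A but less accurate, as would happen with a systematic error.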

Recording of the data


A complete and permanent record of the data obtained should be maintained (Notebook,
record sheets, PC, etc)
The record should include
a) date b) number or code c)weight / size d) Location e) brief comments of
observations f) units of measurements g) Derived variables if any.

Chapter 5. Classification and Tabulation of Data


Classification
Defn: It is the process of arranging the available facts into homogeneous groups or
classes to bring out the resemblances, similarities and other relationships.
Objectives:
To condense the mass of data into a concise format.
To bring out the relevant points of similarity, dissimilarity, and comparison.
To make the statistical treatment of the data easy.

Characteristics of Classification

Unambiguity: Clear definition of the terms used should be provided.


Stability: Consistent throughout the analysis

Flexibility: Easy to adapt to new situations and circumstances ( addition/deletion of a
few classes without altering the basic theme)

Types of Classification
Spatial or Geographical: It is based on geographical locations( different continents,
countries, states, towns,)
Temporal or chronological: It is based on time ( year, month, )
Qualitative: It is based on quality or attribute ( colour, behaviour, religion ..)
Quantitative: It is based on enumerable or measurable variable.

Tabulation

Defn: It is defined as presentation of classified data in scientific manner to bring out the
essential features and main characteristics.

Organisation of a Table
Table number : reference and future identification
Title of the table: nature of the data, collected and classified details, other relevant details
Date
Head note ( optional ): Information like unit of measurements and scientific notations
Captions: headings of the vertical columns
Stubs : heading of the horizontal rows
Body of the table : cells, numerical values, totals, statistical analytical values, derived
values
Source
Footnote ( optional ): Explanations to the information given in the various parts of the
table.

Classification vs Tabulation
1. Classification: It is the process of dividing the data into homogeneous subgroups.
   Tabulation: It is the process of arranging the classified data systematically /
   scientifically in rows and columns of a table.
2. Classification: Sorting.
   Tabulation: Summarizing.
3. Classification: This condenses the mass of data and facilitates grasping its nature.
   Tabulation: This provides the data a readily referable and almost permanent form
   ( rows and columns).
4. Classification: This foreruns tabulation.
   Tabulation: This completes an important stage of enumeration.
5. Classification: This is a process of analysis of data.
   Tabulation: This is a process of presentation of data.
6. Classification: Careful planning for tabulation is necessary even at this stage.
   Tabulation: This is a mechanical function after classification.

General rules for the construction of table


Number and Title
Neither too large nor too small
Units of measurement
Large numbers to be approximated ( scientific notation)
Spaces between rows for long unbroken columns
The values to be compared should be kept in adjacent columns / rows
Label the columns with numbers/alphabets.
The column headings should be used in the continuation pages
Items in stub should be in logical order ( alphabetic, chronological, geographical,..)
All less important items are placed in a separate column named as Miscellaneous classes
Rulings ( border / grid lines ) are important. Thick / Multiple lines are used in the main
classes while thinner lines are used to separate the sub-classes.

Types of Tables
Qualitative and Quantitative: sample classified according to some qualitative /
quantitative characteristic is tabulated as qualitative / quantitative
Simple and Complex : Based on number of variables
(a) one way
(b) two way
(c) three way
(d) manifold
Primary and derivative : Primary table is prepared on the basis of the original data
collected. Derivative table is prepared on the basis of the statistical derivatives such as
ratio, percentage, index, etc

Chapter 6. Diagrams and Graphs


Need and Usefulness
To present dry and uninteresting statistical facts in the form of attractive and appealing
pictures and graphs
They render comparisons simple
Forecasting
To understand the relationship between variables
To locate descriptive statistical measures (Median, Mode etc.)

Guidelines for drawing diagrams and Graphs


Choose the diagram or Graph for the data appropriately
Number and title
Scale is neither too large nor too small
Geometric instruments are required
Software packages can be used
a) Microsoft Excel
b) SPSS (Statistical Package for the Social Sciences)

Types of Diagrams

Bar diagrams
a) Simple Bar
b) Multiple Bar
c) Sub divided or Component Bar
d) Percentage Bar
Pie Diagrams
Pictograms and Cartograms

Bar Diagrams
Simple Bar: It is used to represent only one variable. In this diagram, the bars are of
the same width and only the length varies.
Multiple Bar: It is used to represent more than one variable with multiple bars. In this
diagram, the bars are of the same width and only the length varies. The bars are drawn
side by side. Different colours or shades are used to distinguish the bars.
Subdivided Bar: It is used to represent more than one variable. A bar is subdivided into
parts in proportion to the values given in the data, drawn on absolute figures.
Percentage Bar: It is used to represent more than one variable. A bar is subdivided into
parts in proportion to the values given in the data, drawn on percentages (relative basis).
Value in percentage = (Individual value of the item X 100) / Total value of all items
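
A short sketch of the percentage-bar calculation follows. The component names and values are invented for illustration, and the function name `percentage_shares` is hypothetical.

```python
def percentage_shares(values):
    """Convert absolute component values into percentages of their total."""
    total = sum(values.values())
    return {name: value * 100 / total for name, value in values.items()}

# Hypothetical components of one bar.
components = {"A": 50, "B": 30, "C": 120}
shares = percentage_shares(components)
print(shares)
```

The shares of one bar always add up to 100, which is what lets bars with different totals be compared on a relative basis.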

Pie Diagrams
Pie Diagram: It is a circular diagram. The circle is divided into segments which are in
proportion to the size of each component. Different colours or patterns are used to
differentiate the segments.
The values are converted to the angles corresponding to each component by applying the
formula
Angle of the sector = (Individual value of the item X 360°) / Total value of all items
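
The conversion to sector angles can be sketched in the same way. The component values are invented for illustration and the function name `sector_angles` is hypothetical.

```python
def sector_angles(values):
    """Convert component values into pie-sector angles in degrees."""
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}

# Hypothetical components of one pie diagram.
components = {"A": 50, "B": 30, "C": 120}
angles = sector_angles(components)
print(angles)
```

The angles of all sectors add up to 360 degrees, which is a quick check on the arithmetic before drawing.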

Pictograms and Cartograms


Pictograms represent the data in the form of pictures. They are attractive, easy to
understand and useful for exhibition, poster session of seminars, magazines and
newspapers.

Cartograms are used to present the statistical data in the form of maps. (Reports of
temperature, rainfall in the different parts, state)

Graphs
Graphs are charts consisting of points, lines and curves.
They are drawn on graph sheets.
Scales are to be chosen suitably in both X and Y axes.
Statistical measures like quartiles, median and mode can be found out from the graphs.
They are also used to analyze time series and for regression, forecasting and interpolation.
Graph vs Diagram
1. Graphs: A graph consists of points, lines and curves.
   Diagrams: A diagram is a geometrical shape such as a bar, circle, etc.
2. Graphs: Drawn on a graph sheet.
   Diagrams: A graph sheet is not required.
3. Graphs: Numerical variation is in two directions and scales are chosen for both the axes.
   Diagrams: Numerical variation is in one direction and a scale is chosen for only one axis.
4. Graphs: The mathematical relation between two variables is shown by regression lines.
   Diagrams: Useful for visual comparisons.
5. Graphs: Less attractive and require more attention to understand.
   Diagrams: More attractive and easier to understand the nature of the data.
6. Graphs: Widely used in statistical analysis, presentation of data and research.
   Diagrams: Used in advertisements and publicity.
7. Graphs: Trends and tendencies are known.
   Diagrams: Trends and tendencies of the data are not known.

Chapter 7. Frequency Distribution


Frequency Distribution

It is a classification of a random variable into a number of classes or class intervals,
indicating the number of times ( ie the frequency) the different classes or representatives
of the class intervals occur in the data.
It is always presented in a table called a frequency table (FT).

Discrete frequency distribution

Classes: Depends up on maximum and minimum value of the data


Tally marks: Counting process
Frequency: The number of items that fall in the class
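
As a rough sketch, a discrete frequency distribution can be tallied with `collections.Counter`. The observations (number of seeds per fruit) are invented for illustration.

```python
from collections import Counter

# Hypothetical observations: number of seeds in each of 10 fruits.
seeds_per_fruit = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]

frequency = Counter(seeds_per_fruit)   # class -> number of items in the class

# Classes run from the minimum to the maximum observed value.
for seeds in range(min(seeds_per_fruit), max(seeds_per_fruit) + 1):
    print(seeds, "|" * frequency[seeds], frequency[seeds])  # class, tally marks, frequency
```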
Continuous frequency distribution
Practical class intervals: Used for counting purposes ( 3.3-3.5; 3.6-3.8; 3.9-4.1)
True class intervals: Used for all other purposes. They are the implied limits of the
original values and are continuous because there is no break between the classes
( 3.25-3.55; 3.55-3.85; 3.85-4.15)
Class limits / boundaries: Each (practical / true) class-interval consists of two limits
called the lower limit and the upper limit ( 3.3-lower 3.5-upper; 3.25-lower 3.55-upper)
Width / Magnitude of a class interval: Difference between the true upper limit and the
true lower limit
Mid-point / Mid-value / Class mark of a class-interval: The average of the true lower and
true upper limits.
Class frequency: The number of items that fall in the class-interval
Number of class-intervals: Depends upon the total number of items.
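
The step from practical to true class intervals can be sketched as follows, using the practical intervals quoted above. The sketch assumes equally spaced classes, so that half of the gap between adjacent classes is added at each end; the function name `true_class_limits` is hypothetical. `Decimal` is used to keep the limits exact.

```python
from decimal import Decimal

def true_class_limits(practical):
    """Widen each practical interval by half the gap between adjacent classes."""
    gap = practical[1][0] - practical[0][1]   # lower limit of class 2 minus upper limit of class 1
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in practical]

practical = [(Decimal("3.3"), Decimal("3.5")),
             (Decimal("3.6"), Decimal("3.8")),
             (Decimal("3.9"), Decimal("4.1"))]

true_intervals = true_class_limits(practical)
for lower, upper in true_intervals:
    width = upper - lower            # width of the class interval
    midpoint = (lower + upper) / 2   # class mark
    print(f"{lower}-{upper}: width {width}, mid-point {midpoint}")
```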

Steps in framing a frequency distribution


Classes should be clearly defined and should not lead to any ambiguity
Each of the values should be included in one of the classes
Mutually exclusive classes
Classes should be of equal width
Classes should be non-overlapping
Avoid open-end classes
Number of classes: between 5 and 15
Class interval = (max - min) / number of classes, where the number of classes = 1 +
3.322 log10 N (Sturges' rule; N = total number of items)
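
A minimal sketch of this calculation is given below. Rounding the number of classes up to the next whole number is an assumption of the sketch, since the notes do not say how to round; the function names and the data are hypothetical.

```python
import math

def number_of_classes(n):
    """Number of classes = 1 + 3.322 log10 N, rounded up (rounding is an assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_interval(data):
    """Class interval = (max - min) / number of classes."""
    return (max(data) - min(data)) / number_of_classes(len(data))

data = list(range(1, 101))      # hypothetical data: 100 items from 1 to 100
print(number_of_classes(len(data)))
print(class_interval(data))
```

In practice the resulting class interval is usually rounded to a convenient value before the classes are written down.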

Cumulative frequency distributions


It is a frequency distribution when successive frequencies are added together so that
each class includes all the classes below or above, depending upon the end from which
the cumulative process begins.
Less than: The accumulation begins from the first class-interval
More than: The accumulation begins from the last class-interval
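
Both accumulations can be sketched with `itertools.accumulate`. The class frequencies are invented for illustration and are listed from the first class-interval to the last.

```python
from itertools import accumulate

# Hypothetical class frequencies, first class-interval to last.
frequencies = [2, 5, 9, 3, 1]

# Less than: accumulate starting from the first class-interval.
less_than = list(accumulate(frequencies))

# More than: accumulate starting from the last class-interval,
# then restore the original class order.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```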

Relative frequency distribution

Divide the frequency of each class / class interval by the total number of items.
Percent relative frequency: multiply relative frequency by 100
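
A short sketch of both calculations, using invented class frequencies:

```python
# Hypothetical class frequencies.
frequencies = [2, 5, 9, 3, 1]
total = sum(frequencies)   # total number of items

relative = [f / total for f in frequencies]          # relative frequency
percent = [f * 100 / total for f in frequencies]     # percent relative frequency

print(relative)
print(percent)
```

The relative frequencies add up to 1 and the percent relative frequencies to 100.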
Frequency Graphs

Qualitative and Discrete frequency distribution: Bar diagram


Continuous frequency distribution: Histogram, Frequency polygon, Frequency curve.
Cumulative frequency distribution: Ogive
Bar diagram ( graph)

X-axis: classes of the variable


Y-axis: Frequencies
Height of each bar is proportional to the frequency of the respective class.
Histogram

X-axis : class intervals


Y-axis: frequencies
Construct adjacent rectangle for each class interval
Height of each rectangle is proportional to the frequency of the respective class.

Frequency Polygon

It is constructed from the histogram by joining the mid-points of the tops of the various
rectangles by straight lines, extended up to the base line.
The total area of the frequency polygon = total area of the rectangles taken together.

Frequency curve

It is constructed from the histogram by joining the mid-points of the tops of the various
rectangles by a smooth curve, extended up to the base line.

Cumulative frequency Graphs / OGIVES

Construct the table of less than and more than cumulative frequencies.
Plot the less than c.f. (y-axis) against the upper class limits (x-axis) and draw a smooth
line/curve through the points; this is the less than ogive.
Plot the more than c.f. (y-axis) against the lower class limits (x-axis) and draw a smooth
line/curve through the points; this is the required more than ogive.
