Textbook:
An Introduction to Biostatistics by N. Gurumani
Reference Books:
(1) Biostatistical Analysis by Jerrold H. Zar.
(2) Biostatistics: A Foundation for Analysis in the Health Sciences by W. W. Daniel.
Chapter 1 Introduction
History
Tycho Brahe (1546-1601) - Astronomy
Kepler - Detailed study of the information collected by Brahe.
John Graunt (1620-1674) - Father of vital statistics (births and deaths).
Edmund Halley - First life table.
Süssmilch (1707-1767) - Ratio of births and deaths remains constant.
Bernoulli (1654-1705) - Law of large numbers.
Laplace (1749-1827) - Expectation.
Francis Galton (1822-1911) - Study of regression analysis.
Jevons (1835-1882) - Index numbers.
Karl Pearson (1857-1936) - Correlation analysis.
R. A. Fisher (1890-1962) - Tests of significance applied to genetics.
Status - Latin
Statista - Italian
Statistique - French
Statistik - German
Literal meaning: political state (administrative activities of the state)
Definitions
Question: Find the percentage of marks obtained by Mr. Kumar in his S.S.L.C. examination.
Collection of data: Marks 75, 60, 100, 99, 51
Organization of data
Subject | Marks
Tamil | 75
English | 60
Maths | 100
Science | 99
Social science | 51
Presentation of data
[Bar chart: marks (vertical axis, 0 to 120) for each subject: Tamil, English, Maths, Science, Social science]
Analysis of data
Percentage of marks = (secured marks / total marks) × 100
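The percentage formula above can be applied to Mr. Kumar's marks as a short sketch. The notes do not state the maximum mark per subject; the value of 100 used here (`MAX_PER_SUBJECT`) is an assumption made only for illustration.

```python
# Marks collected in the example above.
marks = {
    "Tamil": 75,
    "English": 60,
    "Maths": 100,
    "Science": 99,
    "Social science": 51,
}

# ASSUMPTION: each subject is marked out of 100 (not stated in the notes).
MAX_PER_SUBJECT = 100


def percentage(secured: float, total: float) -> float:
    """Percentage of marks = (secured marks / total marks) x 100."""
    return secured * 100 / total


secured_marks = sum(marks.values())          # 385
total_marks = MAX_PER_SUBJECT * len(marks)   # 500 under the assumption above
result = percentage(secured_marks, total_marks)
print(f"Secured {secured_marks} of {total_marks}: {result:.1f}%")
```

Under that assumption the five steps (collection, organization, presentation, analysis, interpretation) end with a single summary figure for the examination.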
Collection of data
Tabulation and Presentation of data
Analysis of data
Interpretation of data
Limitations of statistics
It can be used only to study numerically valued data, not qualitative phenomena such as intelligence, poverty or honesty.
It deals with aggregates and not with individuals.
Statistical data collected for a given purpose cannot be applied to any other situation.
Statistical data can be compared only if they are homogeneous in character.
It can be misused.
It is only one of the methods of studying a problem.
Biostatistics - Defn
Statistics used to analyse data derived from the biological sciences and medicine is called biostatistics.
Type of statistics
Descriptive Statistics:
Methods of organizing, summarizing, and presenting data in an informative way.
Inferential Statistics:
Methods of reaching decisions about a large body of data by examining only a small part of it, i.e. a decision, estimate, prediction, or generalization about a population, based on a sample.
A Taxonomy of Statistics
Chapter 3 Variables
Variables and Variate
Variable Defn : Commonly a factor or character which can take different values is
called a variable.
Example: height, length, weight.
Variate Defn : It is a single observation of a variable.
Types of Variables
3. Ranked variable: A variable that cannot be measured but can be ordered or ranked by magnitude (ex: activity of a gland (pancreas, thyroid, blood pressure, etc.) as assessed in microscopic observations).
4. Derived variable: A variable calculated from two or more independently measured variables, to show the relationship between them (ex: ratio, percentage, index).
Measurement
The assignment of numbers to objects or events according to a set of rules.
(i) Ratio scale:
A measurement scale having a constant interval size and a true zero point is said to be a ratio scale of measurement.
Ex: length, weight, volume
(ii) Interval scale:
A scale that orders the objects (measurements, observations) and on which the distance between any two measurements is known, but which has no true zero point.
(iii) Ordinal scale:
A scale that orders (ranks) the measurements according to some criterion, but on which the differences between categories are not known.
Rounding of data
Errors in Measurements
Accuracy and Precision of data
Recording of the data
Sample size
Scientific Notation
Significant digits
The accurate digits, apart from zeros needed to locate the decimal point, are called significant digits or significant figures of the number.
Example: 76.3 (3 s.d.)
8.6700 (5 s.d.)
28.65 (4 s.d.)
0.05723 (4 s.d.; the leading zeros only locate the decimal point)
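The definition above can be checked mechanically. The following is a minimal sketch, assuming the number is written as a decimal string that contains a decimal point, so that trailing zeros are recorded deliberately and count as significant; the function name `count_significant_digits` is hypothetical.

```python
def count_significant_digits(text: str) -> int:
    """Count significant digits of a number written as a decimal string.

    ASSUMPTION: the string contains a decimal point, so trailing zeros
    are treated as significant. Leading zeros only locate the decimal
    point and are not counted.
    """
    digits = text.strip().lstrip("+-").replace(".", "")
    return len(digits.lstrip("0"))


for sample in ["76.3", "8.6700", "28.65", "0.05723"]:
    print(sample, count_significant_digits(sample))
```

Whole numbers written without a decimal point (for example 1200) are ambiguous under this definition and are outside the scope of the sketch.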
Rounding of Data
To reduce the number of significant digits while recording measurement variates, we follow the general rules for rounding. A digit to be rounded off:
is not changed if it is followed by a digit less than 5;
is increased by 1 if it is followed by a digit greater than 5 (or by 5 followed by non-zero digits);
if it is followed by 5 standing alone or by 5 followed only by zeros, is left unchanged when it is an even number and raised by 1 when it is an odd number.
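These rules correspond to what is commonly called round-half-to-even. As an illustrative sketch (not part of the original notes), Python's standard `decimal` module provides this mode as `ROUND_HALF_EVEN`; the helper name `round_half_even` is hypothetical, and values are passed as strings so that they are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places, ties going to the even digit."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_half_even("2.44", 1))   # followed by a digit less than 5
print(round_half_even("2.46", 1))   # followed by a digit greater than 5
print(round_half_even("2.45", 1))   # even digit followed by a lone 5
print(round_half_even("2.35", 1))   # odd digit followed by a lone 5
```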
Errors in measurement
Accuracy: It is the closeness of a measured or computed value to its true value. (If the systematic error is high, the accuracy of the method, and therefore that of the data, is low.)
Precision: It is the closeness of repeated measurements of the same quantity to one another (a measure of reproducibility).
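The distinction can be illustrated numerically. In the sketch below, the readings and the true value are hypothetical numbers chosen for illustration; accuracy is summarized by the systematic error (mean reading minus true value) and precision by the sample standard deviation of the repeated readings. These two summaries are one common choice, not the only one.

```python
import statistics

# HYPOTHETICAL data: four repeated readings of a quantity whose true value is 10.0.
true_value = 10.0
readings = [10.4, 10.5, 10.6, 10.5]

# Accuracy: how far the readings sit, on average, from the true value.
systematic_error = statistics.mean(readings) - true_value

# Precision: how closely the repeated readings agree with one another.
spread = statistics.stdev(readings)

print(f"systematic error = {systematic_error:.2f}")
print(f"standard deviation = {spread:.3f}")
```

Here the readings agree closely with each other (good precision) but all sit about 0.5 above the true value (poor accuracy).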
Characteristics of Classification
Types of Classification
Spatial or geographical: It is based on geographical location (continents, countries, states, towns).
Temporal or chronological: It is based on time (year, month).
Qualitative: It is based on a quality or attribute (colour, behaviour, religion).
Quantitative: It is based on an enumerable or measurable variable.
Tabulation
Defn: Tabulation is the presentation of classified data in a scientific manner to bring out its essential features and main characteristics.
Organisation of a Table
Table number: for reference and future identification
Title of the table: nature of the data, details of collection and classification, other relevant details
Date
Head note (optional): information such as units of measurement and scientific notation
Captions: headings of the vertical columns
Stubs: headings of the horizontal rows
Body of the table: cells, numerical values, totals, statistical analytical values, derived values
Source
Footnote (optional): explanations of the information given in the various parts of the table
Classification vs Tabulation
Classification
Tabulation
and Title
Neither too large nor too small
Units of measurement
Large numbers to be approximated ( scientific notation)
Spaces between rows for long unbroken columns
The values to be compared should be kept in adjacent columns / rows
Label the columns with numbers/alphabets.
The column headings should be used in the continuation pages
Items in stub should be in logical order ( alphabetic, chronological, geographical,..)
All less important items are placed in a separate column named as Miscellaneous classes
Rulings ( border / grid lines ) are important. Thick / Multiple lines are used in the main
classes while thinner lines are used to separate the sub-classes.
Types of Tables
Qualitative and Quantitative: sample classified according to some qualitative /
quantitative characteristic is tabulated as qualitative / quantitative
Simple and Complex : Based on number of variables
(a) one way
(b) two way
(c) three way
(d) manifold
Primary and derivative : Primary table is prepared on the basis of the original data
collected. Derivative table is prepared on the basis of the statistical derivatives such as
ratio, percentage, index, etc
Types of Diagrams
Bar diagrams
a) Simple Bar
b) Multiple Bar
c) Sub divided or Component Bar
d) Percentage Bar
Pie Diagrams
Pictograms and Cartograms
Bar Diagrams
Simple Bar: It is used to represent only one variable. In this diagram, the bars are of the same width and only the length varies.
Multiple Bar: It is used to represent more than one variable with multiple bars. In this diagram, the bars are of the same width and only the length varies. The bars are drawn side by side. Different colours or shades are used to distinguish the bars.
Subdivided Bar: It is used to represent more than one variable. A bar is subdivided into parts in proportion to the values given in the data, drawn on absolute figures.
Percentage Bar: It is used to represent more than one variable. A bar is subdivided into parts in proportion to the values given in the data, drawn on percentages (relative basis).
Value in percentage = (individual value of the item × 100) / (total value of the item)
Pie Diagrams
Pie Diagram: It is a circular diagram. The circle is divided into segments (sectors) in proportion to the size of each component. Different colours or patterns are used to differentiate the segments.
The value of each component is converted to an angle by applying the formula
Angle of the sector = (individual value of the item × 360°) / (total value of the item)
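The percentage-bar and pie-sector formulas can be worked through together. The sketch below reuses the marks from the S.S.L.C. example earlier in these notes purely as sample data; treating them as components of a whole is an assumption made for illustration.

```python
# Sample component values (marks from the earlier example, reused for illustration).
components = {
    "Tamil": 75,
    "English": 60,
    "Maths": 100,
    "Science": 99,
    "Social science": 51,
}

total = sum(components.values())

# Value in percentage = individual value x 100 / total value
percentages = {name: value * 100 / total for name, value in components.items()}

# Angle of the sector = individual value x 360 / total value (in degrees)
angles = {name: value * 360 / total for name, value in components.items()}

for name in components:
    print(f"{name:15s} {percentages[name]:6.2f}%  {angles[name]:7.2f} degrees")
```

The percentages should add up to 100 and the angles to 360 degrees, apart from floating-point rounding, which is a quick check before drawing either diagram.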
Cartograms are used to present the statistical data in the form of maps. (Reports of
temperature, rainfall in the different parts, state)
Excel
Graphs
Graphs are charts consisting of points, lines and curves.
They are drawn on graph sheets.
Scales are to be chosen suitably in both X and Y axes.
Statistical measures such as quartiles, median and mode can be found from graphs.
Graphs are also used to analyse time series, regression, forecasting and interpolation.
Graph vs Diagram
Graphs | Diagrams
It consists of points, lines and curves | It is a geometrical shape such as a bar, circle, etc.
This is drawn on a graph sheet | Graph sheet is not required
Numerical variation is in two directions and scales are chosen for both the axes | Numerical variation is in one direction and scales are chosen for only one axis
Mathematical relation between two variables is shown by regression lines | Useful for visual comparisons
Less attractive and requires more attention to understand | More attractive and easier to understand the nature of the data
Widely used in statistical analysis, presentation of data and research | It is used in advertisements and publicity
Trends and tendencies are known | Trends and tendencies of the data are not known
True class-intervals are continuous because there is no break between the classes (3.25-3.55; 3.55-3.85; 3.85-4.15).
Class limits / boundaries: Each (practical / true) class-interval consists of two limits called the lower limit and the upper limit (practical: 3.3 lower, 3.5 upper; true: 3.25 lower, 3.55 upper).
Width / Magnitude of a class interval: Difference b/w the true upper limit and true lower
limit
Mid-point / Mid-value / Class mark of a class-interval: The average of the true lower and
true upper limits.
Class frequency: The number of items that fall in the class-interval
Number of class-intervals: Depends on the total number of items N. By Sturges' rule, number of class-intervals = 1 + 3.322 log10 N
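The class-interval quantities defined above can be computed directly. This is a minimal sketch: the function names are hypothetical, the number of classes uses the usual statement of Sturges' rule, k = 1 + 3.322 log10 N, and rounding the result up to a whole number is an assumption (the notes do not say how to round).

```python
import math


def class_width(true_lower: float, true_upper: float) -> float:
    """Width of a class-interval: true upper limit minus true lower limit."""
    return true_upper - true_lower


def class_mark(true_lower: float, true_upper: float) -> float:
    """Mid-point (class mark): average of the true lower and upper limits."""
    return (true_lower + true_upper) / 2


def sturges_class_count(n_items: int) -> int:
    """Number of class-intervals by Sturges' rule, rounded up (ASSUMPTION)."""
    return math.ceil(1 + 3.322 * math.log10(n_items))


print(round(class_width(3.25, 3.55), 2))  # width of the class 3.25-3.55
print(round(class_mark(3.25, 3.55), 2))   # its mid-point
print(sturges_class_count(100))           # suggested number of classes for N = 100
```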
Relative frequency: divide the frequency of each class-interval by the total number of items.
Percent relative frequency: multiply the relative frequency by 100.
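As a short illustration of these two steps, the sketch below uses the class-intervals quoted above with hypothetical class frequencies (the notes give no frequency data).

```python
# Class-intervals from the notes; the frequencies are HYPOTHETICAL sample values.
class_intervals = ["3.25-3.55", "3.55-3.85", "3.85-4.15"]
frequencies = [4, 10, 6]

total_items = sum(frequencies)

# Relative frequency = class frequency / total number of items
relative = [f / total_items for f in frequencies]

# Percent relative frequency = relative frequency x 100
percent = [f * 100 / total_items for f in frequencies]

for interval, f, r, p in zip(class_intervals, frequencies, relative, percent):
    print(f"{interval}  f={f:2d}  relative={r:.2f}  percent={p:.1f}")
```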
Frequency Graphs
Frequency Polygon
Frequency curve
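Construct the table of less-than and more-than cumulative frequencies.
Plot the less-than cumulative frequencies on the y-axis against the upper class boundaries on the x-axis and join the points with a smooth line/curve; this is the less-than ogive.
Plot the more-than cumulative frequencies on the y-axis against the lower class boundaries on the x-axis and join the points with a smooth line/curve; this is the more-than ogive.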
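The cumulative-frequency table behind the two ogives can be built as follows. This is a sketch with hypothetical frequencies; the class boundaries are the ones quoted earlier in these notes. Less-than values are paired with upper boundaries and more-than values with lower boundaries, which is the usual plotting convention.

```python
from itertools import accumulate

# Class boundaries from the notes; the frequencies are HYPOTHETICAL sample values.
boundaries = [3.25, 3.55, 3.85, 4.15]   # three classes: 3.25-3.55, 3.55-3.85, 3.85-4.15
frequencies = [4, 10, 6]

# Less-than c.f.: running total of frequencies from the first class onward.
less_than_cf = list(accumulate(frequencies))

# More-than c.f.: running total from the last class backward.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (x, y) pairs for each ogive.
less_than_points = list(zip(boundaries[1:], less_than_cf))    # upper boundaries
more_than_points = list(zip(boundaries[:-1], more_than_cf))   # lower boundaries

print("less-than ogive:", less_than_points)
print("more-than ogive:", more_than_points)
```

The x-value at which the two ogives cross, if both are drawn on the same axes, gives the median, which is one of the measures the notes say can be read off a graph.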