
FUNDAMENTALS OF

STATISTICS

BY

D. N. ELHANCE, M. COM.,
Head of the Department and Dean of the Faculty of Commerce,
University of Jodhpur,
Jodhpur.

KITAB MAHAL ALLAHABAD


1972
First Edition, 1957
Second Edition, 1958
Third Edition, 1960
Fourth Edition, 1962
Fifth Edition, 1964
Sixth Edition, 1965
Seventh Edition, 1966
Eighth Edition, 1967
Ninth Edition, 1968
Tenth Edition, 1969
Eleventh Edition, 1970
Twelfth Edition, 1971
Thirteenth Edition, 1972

Printed by: Eagle Offset Printers, 15, Thornhill Road, Allahabad
Published by: Kitab Mahal, 15, Thornhill Road, Allahabad
IN MEMORY

OF

MY FATHER
PREFACE TO THE THIRTEENTH EDITION
A new edition of the now famous book on Statistics has come out
maintaining its old traditions intact but with new approaches all round
to register and record the various changing aspects.
The calculations have been reworked in order to eliminate even the
slightest variations which may have crept in during the past years.
The change to the metric system has also been completed.
In its present form the utility of the book has increased considerably,
and University students as well as administrators will find sufficient
material for their guidance and assistance.
The author will feel grateful to the discriminating student community
and the general users of the book for their indulgence in pinpointing
any error.
D. N. ELHANCE
PREFACE TO THE SIXTH EDITION
The present edition of this book has many new features. Two
new chapters, Designs of Experiments and Statistical Quality Control,
have been added in this volume. The chapter on Growth of Statistics
in India has been brought up to date and the latest figures have been
substituted for old ones.
Some chapters of this book have been revised and new points have
been included in them. A large number of fresh questions have been
added at the end of each chapter to make the book more useful to
examinees. The entire portion on Indian Statistics has been brought up
to date.
I hope the present volume will be found useful by the students
of the subject. I am grateful to a number of students and friends who
gave me valuable suggestions for the improvement of the book and
I am confident that they will continue to do so in future also.
D. N. ELHANCE
PREFACE TO THE SECOND EDITION
From the various reviews which appeared in a large number of
journals and papers, I conclude that the first edition of this book was
very well received. In the present edition I have rearranged certain
chapters and brought the chapter on Growth of Statistics in India up to
date. Besides, I have included a large number of new questions at the
end of each chapter.
The book is now divided into two volumes. Volume I covers the
entire B.Com., B.A. and B.Sc. course of statistics of all the universities
of India and Pakistan. Volume II contains chapters on Probability,
Theoretical Frequency Distributions and Sampling. The two volumes
are available separately as well as in a combined form.
I am grateful to a large number of friends who have given me
valuable suggestions for the improvement of the book. I hope the
students of the subject will find the book more useful than before.
15th April, 1958. D. N. ELHANCE
PREFACE TO THE FIRST EDITION
The science of statistics has assumed great importance in recent
years. It was once known as the "Science of Kings" and its scope
was extremely limited, but today the science of statistics has become
an all-important science, without which no other science can progress.
Modern age is the age of statistics and it is very correctly said that the
extent of the economic development of a country can best be known
by finding out the extent to which statistical organisation has developed
there. Till recently the foreign government of our country and even
our countrymen were very indifferent towards statistics. After the
independence of the country the era of economic planning started and along
with it the importance of statistics increased considerably. In fact
economic planning cannot be imagined in the absence of statistical
data.
It is a matter of great satisfaction that the importance of statistics
is gradually being realised in our country and that the subject is occupying
the place of honour which it should have got much earlier. Statistics
is now taught in almost all the universities of the country and there are
a number of statistical institutes which impart special training in this
subject. This book is an attempt to furnish a simple, non-mathematical
text for those who desire to equip themselves with a knowledge of the
elementary statistical methods used in modern times. The treatment
of this subject has been as far as possible of a non-mathematical character
because most of the students who study this subject do not have
a mathematical background. This book has been written primarily
for the use of M.A., M.Com., and B.Com. students who study this subject.
The book covers the entire course which is prescribed for the statistics
paper in these examinations in various universities of the country as
also the courses prescribed in the I.A.S. and P.C.S. examinations for the
paper. A large number of questions have been given at the end of
each chapter with a view to helping the students in solving numerical
problems and thus familiarising themselves with the different types of
formulae used in statistical analysis.
I am grateful to my colleagues in the Faculty of Commerce,
Allahabad University, who have given me some very valuable suggestions.
Thanks are also due to Mr. S.V. Erasmus, my secretary, who worked
almost like a machine for all the days during which this book was
written. Kitab Mahal, my publishers, deserve congratulations for the
nice printing and get-up of the book.
1st December, 1956. D. N. ELHANCE
CONTENTS
CHAPTER
1. Meaning and Definition of Statistics
2. Origin and Growth of Statistics
3. Importance, Limitations and Functions of Statistics
4. Preliminaries to the Collection of Data
5. Collection of Primary and Secondary Data
6. Accuracy, Approximation and Errors
7. Classification, Seriation and Tabulation
8. Ratios, Percentages and Logarithms
9. Measures of Central Tendency
10. Measures of Dispersion
11. Moments, Skewness and Kurtosis
12. Index Numbers
13. Diagrammatic Representation of Data
14. Graphic Representation of Data
15. Analysis of Time Series
16. Correlation
17. Regression and Ratio of Variation
18. Theory of Attributes and Consistence of Data
19. Association of Attributes
20. Interpolation
21. Business Forecasting
22. Interpretation of Data
23. Probability
24. Theoretical Frequency Distributions
25. Theory of Sampling
26. Sampling of Attributes
27. Sampling of Variables (Large Samples)
28. Chi-square Test and Goodness of Fit
29. Sampling of Variables (Small Samples)
30. Analysis of Variance
31. Designs of Experiments
32. Statistical Quality Control
33. Growth of Statistics in India
34. Mathematical Tables
DETAILED CONTENTS

Chapter 1-Meaning and Definition of Statistics-Meaning;
Definition; Main divisions of the study of Statistics;
Objects of Statistics; Questions.
Chapter 2-Origin and Growth of Statistics-Early
beginnings; 16th to 20th Centuries; Relationship with
other Sciences; Statistics and Economics; Statistics and
Mathematics; Statistics and Astronomy; Statistics and
Biology, etc.; Questions.
Chapter 3-Importance, Limitations and Functions of
Statistics-Statistics and the common man; Causes of
importance; Indispensability of Statistics; Limitations
of the Science of Statistics; Distrust of Statistics;
Functions of Statistics and Statisticians; Questions.
Chapter 4-Preliminaries to the Collection of Data-
Object and Scope of enquiry; Sources of information;
Type of enquiry; Statistical Units; Degree of accuracy;
Questions.
Chapter 5-Collection of Primary and Secondary Data-
Primary and Secondary data; Choice of Methods;
Method of Collecting Primary data; Representative
Data; Random Sampling; Collection of Secondary data;
Scrutiny of Secondary data; Questions.
Chapter 6-Accuracy, Approximation and Errors-
Editing of data; Accuracy; Approximation; Statistical
Errors; Questions.
Chapter 7-Classification, Seriation and Tabulation-
Classification: Need and meaning; Characteristics;
Classification according to attributes; Classification
according to class-intervals.
Seriation: Definition; Time Series; Spatial Series;
Condition Series; Discrete, Continuous, Simple and
Cumulative Series.
Tabulation: Types of tabulation; Rules of tabulation;
Questions.
Chapter 8-Ratios, Percentages and Logarithms-Need;
Derivatives; Fallacies in the use of percentages and
ratios; Some popular Ratios used in Population Studies;
General Fertility Rate; Gross and Net Reproduction
Rates; Logarithms; Computations by logarithms;
Questions.
Chapter 9-Measures of Central Tendency-Need and
Meaning; Objects; Characteristics of representative
average; Measures of various orders; Types of averages.
Arithmetic Average: Calculation of arithmetic average
in a discrete series; Calculation of the arithmetic average
in a continuous series; Charlier's Accuracy Check;
Algebraic properties of arithmetic average; Merits
and Drawbacks.
Median: Meaning; Location of Median in various
types of series; Graphic calculation; Merits and
Drawbacks; Comparison with mean.
Quartiles, Deciles and Percentiles, etc.: Location in
various types of series; Graphic calculation;
Characteristics.
Mode: Meaning; Location of mode in various types
of Series; Determination by curve fitting; Determination
of mode from mean and median; Graphic Method;
Merits and Drawbacks; Comparison with mean
and Median.
Geometric Mean: Meaning; Calculation in various types
of series; Algebraic properties of geometric mean;
Merits and Drawbacks.
Harmonic Mean: Meaning; Calculation; Reciprocal
character of arithmetic average and harmonic mean;
Merits and Drawbacks.
Other Averages: Quadratic mean; Moving average;
Progressive average; Relation between different averages;
Selection of an average; Limitations of averages.
Weighted Average: Need and meaning; Calculation of
weighted arithmetic average by direct and short-cut
methods; When should weighted mean be used;
Weighted geometric and harmonic means; Questions.
Chapter 10-Measures of Dispersion-Need and meaning;
Measures of dispersion.
Range: Its merits, demerits and uses.
Inter-Quartile and Semi-Inter-Quartile Range: Calculation
in various types of series; Merits and Drawbacks.
Mean Deviation: Meaning; Calculation in various types
of series by direct and short-cut methods;
Characteristics and uses of mean deviation.
Standard Deviation: Meaning; Calculation in various
types of series by direct and short-cut methods; Charlier's
check of accuracy; Sheppard's corrections for grouping;
Standard Deviation and the spread of Observations;
Mathematical properties of standard deviation; Merits,
demerits and uses.
Other Measures of Dispersion: Modulus; Precision;
Probable Error; Variance; Co-efficient of Variation;
Gini's Mean Difference.
Relationship between various measures of dispersion;
Choice of a measure of dispersion; Lorenz curve;
Questions.
Chapter 11-Moments, Skewness and Kurtosis-
Moments: Meaning; Calculation of various moments
about the mean; β and γ co-efficients.
Skewness: Need and meaning; Tests of skewness;
Measures of skewness; First and Second measures of
skewness; Positive and Negative skewness.
Kurtosis: Meaning and calculation; Dispersion,
Skewness and Kurtosis contrasted; Questions.
Chapter 12-Index Numbers-Need and Meaning.
Wholesale Price Index Numbers: Technique of construction;
Selection of items and obtaining quotations;
Selection of Base; Fixed and Chain Base; Price relatives
and link relatives; Problem of weighting.
Cost of Living Index Numbers: Need; Difficulties in
construction; Aggregate expenditure method; Family
budget method; Sources of errors in cost of living index
numbers.
Indices of Industrial Production: Need and technique
of construction.
Indices of Business Conditions: Need and technique
of construction.
Relationship between fixed base and chain base index
numbers; Base shifting; Splicing of index numbers;
Deflating of index numbers; Reversibility Tests; Time
Reversal Test; Factor Reversal Test; Circular Test;
Problem of an Ideal Index Number; Various Formulae
used in construction of index numbers; Uses and
Limitations of Index Numbers; Questions.
Chapter 13-Diagrammatic Representation of Data-
Need and usefulness; Characteristics of and rules for
drawing diagrams; Various types of diagrams: Simple,
multiple and sub-divided bars; Rectangles; Squares;
Circles; Cubes; Pictograms; Cartograms; Questions.
Chapter 14-Graphic Representation of Data-Construction
of graphs.
Graphs of Time Series: Absolute Historigrams; False
Base line; Method of showing Range-Zone Graph;
Method of showing differences; Band Graphs; Zee-chart.
Graphs of Frequency Distributions: Bar Frequency
curves; Discontinuous curves; Continuous curves.
Theoretical Frequency Curves: Normal curve of error;
Moderately asymmetrical frequency curves; Extremely
skew curves.
Cumulative Frequency Curves: Less than and more
than curves; Galton's method of locating median.
Graphs on Ratio Scale: Semi-logarithmic curves;
Reading graphs on ratio scale; Special features of ratio
scale.
Linear Relationships.
Non-Linear Relationships: Parabolic and Hyperbolic
curves; Exponential Curves; Questions.
Chapter 15-Analysis of Time Series-Meaning and need;
Secular Trend; Seasonal Variations; Cyclical Fluctuations;
Irregular Fluctuations.
Measurement of Trend: Curve fitting by inspection;
Moving average method; Curve fitting by mathematical
equations; Method of Least Squares; Fitting
curve of the power series; Parabolic curve.
Measurement of Seasonal Fluctuations: Seasonal Variation
Index (by monthly averages); Seasonal Variation
Index (by moving averages); Method of Link Relatives.
Measurement of Cyclical and Irregular Fluctuations;
Questions.
Chapter 16-Correlation-Meaning; Scatter diagram;
Correlation graph.
Coefficient of Correlation: Karl Pearson's formula
and its proof; Calculation of the Coefficient in various
types of series by direct and short-cut methods;
Correlation in time series: long-time changes, short-time
oscillations and cyclical fluctuations; Correlation in
grouped data; Probable Error of the Coefficient of
Correlation in Interpretation of Correlation; Correlation
and method of Least Squares; Rank Correlation;
Coefficient of Concurrent Deviations; Correlation Table;
Lag and Lead Correlation and Determination; Questions.
Chapter 17-Regression and Ratio of Variation-Meaning
and use; Regression equations; Regression Lines;
Regression Coefficient; Ratio of Variation; Galton's
Graph and its interpretation; Questions.
Chapter 18-Theory of Attributes and Consistence of
Data-Meaning; Classification of Data; Rules for
testing consistence of Data; Incomplete Data;
Questions.
Chapter 19-Association of Attributes-Expected values;
Criterion of Independence; Association; Complete
association and disassociation; Intensity of association;
Chance association; Coefficient of association;
Coefficient of Colligation; Partial association; Illusory
association; Manifold Classification; Association in
Contingency tables; Coefficient of contingency; Tschuprow's
Coefficient; Questions.
Chapter 20-Interpolation-Meaning and need;
Assumptions; Methods of interpolation.
Graphic Methods: In continuous time series; in series
showing periodicity; in correlated series.
Algebraic Methods: Methods of curve fitting; Methods
of finite differences; Newton's Formulae; Newton-
Gauss Formula; Stirling Formula; Newton-Gauss
(Backward) Formula; Direct Binomial Expansion
method; Lagrange's Formula; Questions.
Chapter 21-Business Forecasting-Meaning and Need;
Basis; Technique; Business Barometers; General
Assumptions; Theories of Business Forecasting; Time-lag or
Sequence Theory; Action and Reaction Theory;
Specific Historical Analogy Theory; Cross Cut Analysis
Theory; Utility of Business Forecasting; Questions.
Chapter 22-Interpretation of Data-Meaning and Need;
False generalizations; Wrong interpretation of statistical
measures like Index Numbers, Correlation and
Association, etc.; Effect of wrong interpretations of
data; Questions.
Chapter 23-Probability-Permutation and combination;
Calculation of probability; Simple events; Compound
events; Multiplication of probabilities; Addition of
probabilities; Use of binomial theorem; Mathematical
Expectation; Inverse probability; Questions.
Chapter 24-Theoretical Frequency Distributions-
Binomial Distribution: Meaning; Characteristics;
Binomial expansion; General form of binomial
distribution; Comparison of actual and expected
frequencies; Mean and standard deviation of binomial
distribution.
Normal Distribution: Meaning; Properties of a normal
curve; Equation of normal curve; Basic assumptions
of normal curve.
Poisson Distribution: Meaning; Equation of Poisson
distribution; Assumptions in Poisson distribution;
Questions.
Chapter 25-Theory of Sampling-Meaning and use;
Types of Universes; Objects of Sampling; Precision
in Sampling; Types of Sampling; Bias in Sampling.
Selecting Random Sample: Lottery method; Serial,
Geographical or Alphabetical arrangement; Random
numbers.
Selecting Purposive and Mixed Samples: Conducting a
sample enquiry; Questions.
Chapter 26-Sampling of Attributes-Simple sampling;
Mean and Standard Deviation in Simple Sampling of
attributes; Standard Errors; Standard Error and Size of
Sample; Standard Error and Precision; Standard Error
of the difference between proportions of two Samples;
Questions.
Chapter 27-Sampling of Variables (Large Samples)-
Nature of the problem; Sampling Distribution; Standard
Error of Simple Sampling of Variables; Standard Error
of the Mean; Standard Error of the Median, Quartiles,
Deciles, etc.; Standard Error of Mean Deviation,
Standard Deviation, Quartile Deviation, Variance,
Coefficient of Variation and Co-efficient of Skewness;
Standard Error of the Co-efficient of Correlation, Regression
and Association.
Standard Error of the Difference of Sample Means,
Sample Medians and Sample Standard Deviations;
Questions.
Chapter 28-Chi-Square Test and Goodness of Fit-
Meaning; Degrees of Freedom; Levels of Significance;
Formulae of Calculation; Expected Values; Conditions
for the application of Chi-square Test; Additive property
of χ²; Chi-square Distribution; Goodness of Fit;
Questions.
Chapter 29-Sampling of Variables (Small Samples)-
Need of Separate Analysis; Tests of Significance; Null
Hypothesis; Significance of a Sample Mean; Form of
"t" distribution.
Significance of Difference between Two Sample Means;
Significance of the Coefficient of Correlation;
Z-transformation; Significance of the difference between two
Sample Coefficients of Correlation; Questions.
Chapter 30-Analysis of Variance-Meaning and Use;
Total Variation; Variance between the samples;
Variance within the Samples; Formula of Calculation;
Short-cut method; Table Values of F; Questions.
Chapter 31-Designs of Experiments-Meaning and need;
Experimental Designs; Comparison in pairs; Latin
Squares; Factorial Designs; Questions.
Chapter 32-Statistical Quality Control-Meaning;
Purpose.
Process Control: Evolution; Control Chart Technique;
Chart of Industrial observance; Chart of averages;
Chart of Range; Control Chart and analysis of variance;
Control Chart for defects per unit; Selection of Control
limits.
Acceptance Inspection: Meaning and Technique;
Sampling Technique; Sampling Plans; Single, Double and
Sequential Sampling; Average Sampling; OC curve;
Conclusion; Questions.
Chapter 33-Growth of Statistics in India-
Section 1: Statistical Organization: Early beginnings; 18th
Century; 19th Century; 20th Century; Present Position;
Improvement in Methodology, Scope and Coverage.
Section 2: Population Statistics: Census procedure up to
1931; Changes in 1941; Census of 1951; Information
Collected; General Criticism of Indian Population
Census; Census of 1961-some Suggestions.
Vital Statistics-Shortcomings-Suggestions.
Demographic Surveys.
Utility of Population Statistics.
Section 3: Agricultural Statistics: Area Statistics; Temporarily
settled areas and Permanently settled areas; Yield
Statistics; Traditional Method; Random Sampling
Method; Crop-estimates; Land Utilization Statistics;
Publication on Agricultural Statistics; General
Shortcomings of Agricultural Statistics.
Indices of Agricultural Production: Reserve Bank of India
Index; Eastern Economist Index; F.A.O. Index.
Miscellaneous Agricultural Statistics: Livestock Statistics-
Statistics of Holdings; Forest Statistics of Mines
and Minerals.
Section 4: Industrial Statistics: Early Statistics; Present
Position; Annual Census of Manufactures; Statistics of
Industrial Output.
Indices of Industrial Production and Profits: Eastern
Economist Index; Index issued by Ministry of Commerce
and Industry; Capital Index of Industrial Activity;
Index Number of Industrial Profits. Employment
Statistics; Trade Union Statistics; Industrial
Dispute Statistics; Social Security and Labour Welfare
Statistics.
Section 5: Price Statistics: Harvest Prices; Other Prices.
Publications containing price statistics.
Price Index Numbers: Index Number of Harvest Prices;
Economic Adviser's Index of Wholesale Prices; Economic
Adviser's new (Revised) Index of Wholesale
Prices; Calcutta Wholesale Price Index Number.
Labour Bureau Index Number of Retail Prices. Consumer
Price Index Numbers, compiled by the Labour
Bureau and various States; Bombay Working Class
Cost of Living Index; Kanpur Working Class Cost of
Living Index.
Indices of Security Prices: Economic Adviser's Series;
Old Series of the Reserve Bank; New Series of the
Reserve Bank.
Section 6: Wage Statistics: Publications containing Wage
Statistics-Labour Bureau Index of Earnings of Factory
Workers.
Agricultural Wages.
Section 7: Trade Statistics: Publications containing statistics
of Inland and Foreign (Sea, Air and Land) Trade of
India and their detailed study.
Section 8: Financial Statistics: Publications containing financial
statistics and their study.
Section 9: National Income Statistics: Important Methods
of Calculation; Difficulties in the calculation of India's
National Income; Technique suitable to Indian
conditions; Estimate of India's National Income; Special
Features of India's National Income.
Section 10: National Sample Survey: Beginning; Method;
First round; Subsequent rounds; Assessment of results-
Information collected.
Section 11: Present Position: Shortcomings and Suggestions;
Questions.
Mathematical Tables
1. Meaning and Definition of Statistics

MEANING

"Statistics", in its modern connotation, "is a body of methods for
making wise decisions in the face of uncertainty." (Wallis and Roberts)
It is used for the collection, analysis and interpretation of data in order
to provide a basis for making correct decisions. This concept of
statistics is very different from the sense the word originally denoted.
As its name implies, the word Statistics was originally applied only
to such facts and figures as the State required for its official purposes.
The word has since acquired a wider meaning, so that it now embraces
any set of quantitative data relating to a particular phenomenon,
irrespective of whether the data are of interest to the State or not.
The same word is also used, not only for the material which is analysed,
but also for the methods applied in its analysis. Thus in recent times
the word statistics has come to be used in two senses: as numerical data
and as statistical methods.
As numerical data
In common parlance the word Statistics denotes some numerical
data. If, for example, somebody says that he has studied the statistics
of man-hours lost by the Indian cotton mills due to strikes in the year
1955 or that he has seen the statistics of automobile accidents in the
U.S.A., he refers to the numerical figures or data relating to these
phenomena. In this sense, statistics are numerical descriptions of the
quantitative aspects of things. They take the form of counts or
measurements. Statistics about the membership of a certain hostel, for example,
include a count of the number of members and separate counts of the
number of members of various kinds, such as postgraduate and
undergraduate, or over and under 21 years of age. They might include such
measurements as the weights and heights of the members, or the numbers
computed from such counts or measurements, for example, the
proportion of members who are married or the ratios between weights and
heights. The word statistics in this sense is always used in the plural.
However, not every figure or set of figures can be called statistics
irrespective of any other consideration. Many things are taken into account
before using the word statistics for any group of figures. We shall
discuss these a little later.
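The hostel example above (counts, measurements, and numbers computed from them) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member records, the field names and the function summarise are hypothetical and do not come from the text, and a member aged exactly 21 is placed in the "21 or under" group, a case the text does not settle.

```python
from dataclasses import dataclass


@dataclass
class Member:
    level: str        # "postgraduate" or "undergraduate"
    age: int          # completed years
    married: bool
    weight_kg: float
    height_cm: float


# Hypothetical records, invented purely for illustration.
MEMBERS = [
    Member("postgraduate", 24, True, 62.0, 170.0),
    Member("postgraduate", 22, False, 58.0, 165.0),
    Member("undergraduate", 19, False, 55.0, 160.0),
    Member("undergraduate", 20, False, 66.0, 176.0),
]


def summarise(members):
    """Return counts and derived figures for a non-empty list of members."""
    total = len(members)

    # Separate counts of members of various kinds.
    by_level = {}
    for m in members:
        by_level[m.level] = by_level.get(m.level, 0) + 1

    # Assumption: age 21 exactly is counted as "21 or under".
    over_21 = sum(1 for m in members if m.age > 21)

    # Numbers computed from the counts and measurements.
    proportion_married = sum(1 for m in members if m.married) / total
    weight_height_ratios = [m.weight_kg / m.height_cm for m in members]

    return {
        "total": total,
        "by_level": by_level,
        "over_21": over_21,
        "21_or_under": total - over_21,
        "proportion_married": proportion_married,
        "weight_height_ratios": weight_height_ratios,
    }


if __name__ == "__main__":
    for name, value in summarise(MEMBERS).items():
        print(name, value)
```

Each entry in the result is a figure about the group as a whole; no single record taken alone would count as statistics in the sense described here.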
The use of the word statistics in the above sense is, in our opinion,
not very correct. A more appropriate word to indicate numerical
facts is data and as far as possible this word should be used in place of
statistics in this sense.
Statistical methods
The second sense in which the word statistics is used refers to
the statistical principles and methods used in the collection, analysis and
interpretation of data. In this sense the word is used in the singular.
Statistical methods (or statistics) have a very wide range. They include
not only simple and commonly known devices of comparison and
analysis, but also highly technical and mathematical formulae which
are capable of being understood only by experts who have received
special training in this subject.
Statistical methods and experimental methods. Statistical methods
include all those devices which are used in the collection and simplification
of numerical data so as to render them capable of being analysed and
commonly understood without much difficulty. Statistical methods are
different from experimental methods inasmuch as the latter are more
accurate and precise than the former. In experimental methods it is
possible for us to study the effects of any one of the many factors
affecting a phenomenon individually by making the other factors inoperative
for the time being. Thus in physics it is not difficult to study the effects
of, say, only heat on the density of air by making other factors
inoperative for the duration of the study. But the same thing is not possible
in statistical methods. It is not feasible to study the effects of, say,
only inflation on prices. The effects of inflation cannot be studied
separately from the effects of many other factors like demand, supply,
exports and imports, etc. However, by the use of statistical methods
it is possible to have a rough idea of the effects of inflation upon prices.
Statistical study cannot be as accurate as the study done by experimental
methods. Thus we see that statistical methods are comparatively less
accurate and are usually applied in inexact sciences like sociology, though
even in physical sciences (which are classed as exact sciences) the use
of these methods is sometimes necessary. Statistical methods are thus
of universal application though their primary field is the social sciences.
Thus "Statistics are numerical facts, but statistics is a body of
methods for making decisions when there is uncertainty arising from
the incompleteness or the instability of the information available. The
decisions may be made either for the practical purpose of selecting
a course of action or for the scientific purpose of gaining general
knowledge."
DEFINITION
The term Statistics has been defined differently by different
authors. Some authors have defined the word as used in the first sense
(of numerical data) while others have defined it as used in the second
sense (of statistical methods or the science of statistics).
First Type
Of the first type of definitions the one given by Horace Secrist is
the most exhaustive. It is as follows:
"By statistics we mean aggregates of facts affected to a marked
extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to reasonable standards of
accuracy, collected in a systematic manner for a pre-determined
purpose and placed in relation to each other."
This definition makes it clear that statistics (as numerical data)
should possess the following characteristics:
(i) They should be aggregates of facts. Single and unconnected
figures are not statistics. A single age of 25 years or 40 years is not
statistics but a series relating to the ages of a group of persons would
be called statistics. A single figure relating to birth, death, purchase,
sale, accident, etc., does not form statistics though aggregates of figures
relating to births, deaths, purchases, sales, accidents, etc., would be
called statistics because they can be studied in relation to each other and
are capable of comparison. It is possible to study them in relation to
time, place and frequency of occurrence.
(ii) They should be affected to a marked extent by multiplicity of causes.
Usually statistical facts are not traceable to a single cause. Since
statistics are most commonly used in social sciences it is only natural that
they are affected by a large variety of factors at the same time. It is
usually not possible to study the effects of any one of these factors
separately as is the case in experimental methods. In statistical methods
the effects of various factors affecting a particular phenomenon are
generally studied in a combined form though attempts are also made
to study the effects of different sets of factors separately as well. Most
statistics, however, are affected to a considerable degree by multiple
causation. For example, statistics of prices are affected by
conditions of supply, demand, exports, imports, currency circulation and
a large number of other factors.
(iii) They should be numerically expressed. Qualitative expressions
like good, bad, young, old, etc., do not form part of statistical studies
unless a numerical equivalent is assigned to each such expression. If
it is said that the production of wheat per acre in 1953 was 100 maunds
and in the year 1954 it was only 60 maunds, or if it is said that of two
persons A and B, A is 20 years old and B 60 years old, we shall be
making statistical statements.
(iv) They should be enumerated or estimated according to reasonable
standards of accuracy. Numerical statements can either be enumerated,
in which case they are supposed to be accurate and precise, or they can
be estimated by some expert observers. Where the scope of statistical
enquiry is very wide or where the numbers are very large, enumeration
is usually out of the question and in such cases figures can only be estimated.
It is obvious that estimated figures cannot be absolutely accurate and
precise. The degree of accuracy expected in such figures depends to
a large extent on the purpose for which statistics are collected and also
4 JroN1).umNTALS 01' STA'rlS'nCS

on the nature of the particular problem about which data are being
collected. There cannot be a uniform standard of accuracy for all
types of enquiries. For-example. if the heights of a group of individuals
are being measured it"is all right i1' the measurements are correct to the
tenth of an inch but if we are measuring the dista.ri.cc from Bombay
to Calcutta, a difference of a few: furlongs even, can be easily ignored.
(v) They should be collected in a systematic manner. If figures are
collected in a haphazard fashion one can never be sure about the degree
of accuracy of such data. It is, therefore, essential that statistics must
be collected in a systematic manner so that they may conform to
reasonable standards of accuracy.
(vi) They should be collected for a predetermined purpose. It is obvious
that if statistical data are not collected with some predetermined aim
their usefulness would be almost negligible. Figures are usually
collected with some end in view, as without it all the efforts made in the
collection of figures would be completely wasteful and the figures so collected
would not be in any way useful.
(vii) They should be placed in relation to each other. Statistics are
collected mostly for the purpose of comparison. If the collected figures
are not capable of being compared with each other they lose a very
large part of their value. It is necessary that the figures which are
collected should be a homogeneous lot because it is not possible to
compare figures which are of a heterogeneous character and which
cannot be placed in relationship to each other. If, for example, the
height of a person and the money spent by him in getting his house
constructed are placed together it does not make any sense and the figures
cannot be compared to each other. Such figures naturally do not come
under the category of statistics.
Webster has also defined statistics in the same sense in which Secrist
has defined it. Webster's definition of statistics is as follows:
"Statistics are the classified facts respecting the conditions of
the people in a state ... specially those facts which can be stated
in numbers or in tables of numbers or in any tabular or classified
arrangement."
This definition is rather narrow. It confines statistics only to
those facts which relate to the condition of the people in a state. This
was a very old concept of the word statistics and it does not suit modern
conditions. At present statistics relate to all aspects of human activity
and as such this definition falls short of the modern concept of the term.
Moreover, this definition is not as clear and exhaustive as the one given
by Secrist.
Second Type
Of the second type of definitions of the term statistics (as statistical
methods or science of statistics) the one given by Seligman is very
short and simple and yet quite comprehensive. According to Seligman,
"Statistics is the science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting numerical data
collected to throw some light on any sphere of enquiry."
According to King, "the science of statistics is the method of judging
collective, natural or social phenomenon from the results obtained
from the analysis or enumeration or collection of estimates." This
definition is not very exhaustive and it limits the scope of the science
of statistics. The author himself admits this defect but is of the view
that for practical purposes it is all right.
A. L. Bowley has given a series of definitions but most of the
definitions given by him are not complete and lay emphasis only on
some of the aspects of the science. At one place Bowley says, "Statistics
may be called the science of counting." At another place he is of
the view that "Statistics may rightly be called the science of averages".
Both these definitions are defective as the science of statistics does not
confine itself either to counting or to averaging alone. These are no
doubt important statistical methods but they do not cover the entire
field of the science of statistics. Yet another definition given by the
same author characterises statistics as "the science of measurement of
the social organism regarded as a whole in all its manifestations."
Obviously this definition limits the application of the statistical methods
to only one field, namely, sociology. Bowley realised this limitation
and he himself writes at another place that statistics cannot be confined
to any one science.
Boddington has defined statistics as the science of "estimates and
probabilities." This definition gives expression only to certain methods
by which conclusions are derived in this science. No doubt in most
of the cases statistics are estimates and 'probabilities' but it should be
remembered that the scope of the science is not confined merely to
these things.
Lovitt defines the science as "that which deals with the collection,
classification and tabulation of numerical facts as the basis for
explanation, description and comparison of phenomena." This definition is
fairly satisfactory and it indicates that the science of statistics is a
simple and scientific exposition of statistical methods.
Having briefly discussed some of the definitions of the term statistics
and having seen their drawbacks we are now in a position to give
a simple and complete definition of the term in the following words:
Statistics (as used in the sense of data) are numerical statements of facts
capable of analysis and interpretation and the science of statistics is a study of
the principles and methods used in the collection, presentation, analysis and
interpretation of numerical data in any sphere of enquiry.
MAIN DIVISIONS OF THE STUDY OF STATISTICS
Statistics as a science can be divided into two main classes, namely,
statistical methods and applied statistics.
1. Statistical methods
Under statistical methods are studied all those devices, rules of
procedure and general principles which are applicable to all kinds or
groups of data. Thus they include all the general principles and
techniques which are commonly used in the collection, analysis and
interpretation of data relating to any sphere of enquiry. Statistical methods
are the tools in the hands of a statistical investigator. These are devices
for achieving the desired ends explained in theory. Since a method is
always a means to an end, its accuracy and precision depend on the
object which is desired to be achieved and this in turn is considerably
affected by the peculiar features of the problem to which it is related.
This is the reason why different statistical methods are used in different
types of enquiries and no uniform standard of accuracy is desired to
be achieved in different types of investigations.
2. Applied statistics
Applied statistics deal with the application of statistical methods
to specific problems or concrete forms. If we have to estimate the
national income of a country or its industrial or agricultural production
then the special techniques followed to achieve these ends and the
results obtained thereof would form part of applied statistics. As is
clear from the above explanation applied statistics can be further divided
into two main groups. They may be either descriptive or scientific.
Descriptive applied statistics deal with data which are known and
which naturally relate either to the present or to the past. For example,
business statistics are descriptive applied statistics, as they deal with
the analysis, measurement and presentation of business facts relating
to past or present. On the basis of these facts decisions about various
business problems are usually taken.
Scientific applied statistics deal with the formulation of physical
and psychological laws on the basis of quantitative data collected for
descriptive purposes by the use of appropriate statistical methods. If,
for example, by the use of some business statistics we are in a position
to derive certain conclusions, which we use for forecasting the future
trend or tendency of that particular phenomenon, we are making use
of scientific applied statistics. For purposes of business forecasting
we have to make use of such statistics.
OBJECTS OF STATISTICS

In the words of A. L. Boddington, "the ultimate end of statistical
research is to enable comparison to be made between past and present results,
with a view to ascertaining the reasons for changes which have taken place and
the effect of such changes in the future."
To achieve the above mentioned ends data relating to past and
present are collected and presented in the shape of time-series from
which valuable conclusions are drawn and these conclusions are used
for the purpose of forecasting the future trend of different problems.
Collection, presentation, analysis and interpretation of statistical data
are no easy tasks. The latest statistical methods have to be applied for
arriving at correct and dependable conclusions. Researches have been
going on for improving statistical methods with a view to making them
more accurate and precise so that the laws based on the analysis of the
descriptive applied statistics may become comparatively more stable
and dependable. It is thus very obvious that the science of statistics
is very closely associated with the progress of human civilisation. It
helps in assessing the results of past achievements of human activities
and it is also useful for making forecasts about the future course of
events.

Questions
1. Explain clearly the concepts of statistics, statistical methods and statistical
sciences.
2. Examine the main differences between statistical methods and experimental
methods.
3. Critically examine the following definitions of statistics: "Statistics is a
science of counting", "Statistics is a science of averages", and, "Statistics is a science
of the measurement of social organism in all its aspects". (B. Com. Agra, 1943).
4. Discuss the meaning and scope of statistics.
5. "Statistics affects everybody and touches life at many points. It is both
a science and an art." Explain the above statement with appropriate examples.
(B. Com. Agra, 1946).
6. "Statistics of a business can be treated scientifically and the preparation
and study of business statistics may be made a more exact science than the study of
national and social statistics". Explain. (B. Com. Allahabad, 1932).
7. "Science without statistics bears no fruit, statistics without science have
no root." Explain the above statement with necessary comments.
(M. A. Patna, 1943).
8. "Statistics is co-operative counting." Comment.
9. What are the characteristics that statistics (statistical data) possess? Explain
with illustrations.
10. What are the main divisions of statistics? Illustrate with examples.
11. Write a note on the objects of statistics.
12. "Statistical methods include all those devices of analysis and synthesis by
means of which statistics are scientifically collected and used to explain or describe
phenomena either in their individual or related capacities". Comment on the above
statement.
13. Explain with illustrations how statistical methods tend to clarity of thought,
accuracy of estimates, verification of theories and discovery of relations.
(B. Com. Agra, 1947).
14. "By statistics we mean quantitative data affected to a marked extent by a
multiplicity of causes". Explain. (M. Com. Agra, 1945).
15. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics.
16. "A statistician is not an alchemist expected to produce gold from any
worthless material." Comment.
2. Origin and Growth of Statistics
Early Beginnings
The origin of statistics is suggested by the derivation of this word.
It seems to have been derived from the Latin word status which means
a political state. In fact the origin of statistics was due to administrative
requirements of the state. Statistics in the past were a by-product of
administrative activity. The administration of the states required the
collection and analysis of data relating to population and material wealth
of the country for purposes of war and finance. The earliest forms of
statistical data, therefore, relate to census of population and property.
Collection of data for other purposes, however, was not entirely ruled
out. Perhaps one of the earliest censuses of population and wealth
was held in Egypt as early as 3050 B.C. for the erection of pyramids.
Rameses II conducted a census of all lands of Egypt. During the Middle
Ages such censuses were held in England, Germany and other Western
countries as well. In India about 2000 years ago we had an efficient
system of collecting administrative statistics. During the Hindu period,
particularly during the Mauryan regime, our country had an efficient
system of collecting vital statistics and of the registration of births and
deaths. Ain-i-Akbari gives us a detailed account of the administrative
and statistical survey conducted during the reign of Emperor Akbar.
The histories of the other countries of the world also clearly indicate
that in ancient times statistics was regarded as a matter connected with
the activities of the state and that is why it was known as a science of
statecraft. The systematic collection of official statistics originated in
Germany towards the end of the eighteenth century. In its earliest
form it was an attempt to assess, for political purposes, the relative
strengths of the German states by comparing population, industrial and
agricultural output. In England, statistics is a legacy of the Napoleonic
Wars. In order to raise new taxes that the cost of the war demanded,
it was found necessary to collect such facts and figures which would
enable government to have an idea about the probable revenue and
expenditure more accurately.

Sixteenth Century
These spasmodic attempts made in ancient times to collect certain
facts and figures can be left out of account as in those days statistical
methods were not properly developed and we do not know the
technique by which these figures were collected. Most of these figures
are not available and all that we know is that such statistics were collected
in those days. It has been only within comparatively recent times that
mankind has realised the utility and usefulness of collecting statistics
relating to the phenomena of physical and social universe. Prior to it,
the astronomers used to record the movements of heavenly bodies like
stars and planets to foretell their position and to make forecasts about
eclipses. Tycho Brahe (1546-1601) collected valuable information about
the movements of planets and Johannes Kepler made an exhaustive study
of these data and discovered the three famous laws relating to the
movement of planets. It was on the basis of these laws that Sir Isaac Newton
formulated his theory of gravitation. Sir Francis Bacon (1561-1626)
was of the opinion that a proper knowledge of nature can be obtained
only on the basis of the study of data relating to various forms of nature,
and under his influence this method was adopted by scholars in various
fields. When these methods proved their efficacy in physical sciences
and when it was found that the results obtained by the use of these devices
were very accurate, social sciences like politics, economics and sociology
all adopted statistical methods for the formulation of their theories
and for testing the degree of accuracy achieved by them.

Seventeenth Century

During the seventeenth century the methods of statistical science
were used under the banner of Political Arithmetic. Captain John Graunt
of London (1620-1674) was the first person who studied statistics of
births and deaths and he is often referred to as the 'Father of Vital
Statistics'. It was during this period that figures relating to births, deaths
and marriages were collected by other persons also, specially by the
preachers of the Protestant Churches with a view to check
illegitimacy prevailing in those days. During this period Edmund Halley
prepared the first life table giving the expectation of life at each age on
the basis of data collected by Caspar Neumann, in 1691, relating to death
records of Breslau. Sir William Petty (1623-1687) also prepared
mortality tables and calculated expectation of life at different ages. Later
on James Dodson, Thomas Simpson, Dr. Price and others also computed
mortality tables and it was during this period that the idea of life
insurance was developed. The first life insurance institution was founded
in London in the year 1698.
Even in the early 18th century statistical methods were used under the
same old name of political arithmetic. J. P. Süssmilch (1707-1767),
who was a Prussian clergyman, statistically explained the theory of
'Natural Order of Physiocratic School'. He developed the doctrine
that the ratio of births and deaths remains more or less constant and
that it is a kind of natural law.
Jacob Bernoulli (1654-1705), in his great work Ars Conjectandi,
published eight years after his death, was the first person to state the 'Law
of Large Numbers' and S. Poisson (1781-1840) also contributed a brilliant
paper on this subject.
Eighteenth Century
The modern theory of statistics can be said to have been formulated
by L. A. J. Quetelet (1796-1874). He put forward the notion of 'average
man' whose actions, he stated, conform to the average results obtained
from society. He was further of opinion that the action and
behaviour of other persons deviated from this form in a lesser or greater
degree and these deviations from this theoretical average were capable
of being treated by the method of errors and probability. He also
emphasised the importance of the 'law of large numbers' which was
first stated by Jacob Bernoulli.
In fact the science of statistics is highly indebted to the games of
chance. G. Cardano (1501-1576), who was a great mathematician and
at the same time a big gambler also, wrote a valuable treatise on the
hazards of the games of chance and he pronounced certain rules by which
the risks of gambling could be minimized and one could protect
himself against cheating. These rules were based on the correct approach
to the problems which we, in modern times, study under the theory
of probability. Jacob Bernoulli and his nephew Daniel Bernoulli
(1700-1782) laid a solid foundation of the theory of probability and put forward
the idea of 'moral expectation'. It was after this that Pierre Simon de
Laplace (1749-1827) published, in 1812, his monumental work on the
theory of probability. This work is recognised as one of the best ever
done on the subject of probability. It is both mathematical as well as
philosophical. Later on most of the prominent mathematicians of
the eighteenth and nineteenth centuries like De Moivre, Fiuier, Lagrange,
Chrystal, Bayes, Todhunter, Gauss, De Morgan, Lexis and Charlier, to mention
only a few names, contributed to the subject of probability.
Nineteenth and Twentieth Centuries
On these foundations laid by the mathematicians of the eighteenth
and nineteenth centuries the modern theory of statistics was gradually built
up. G. F. Knapp (1842-1926) and W. Lexis (1837-1914) contributed
valuable works on the statistics of mortality. Sir Francis Galton
(1822-1911) was the first to introduce statistical methods in the field of
biometry. Later on Karl Pearson took up this chain and his work on the
subject is too well known to need any detailed description. In the words
of Pearson himself, "the whole problem of evolution is a problem of
vital statistics, a problem of longevity, of fertility, of health, of disease
and it is impossible for the evolutionist to proceed without statistics as it
would be for the Registrar General to discuss the National Mortality
without an enumeration of the population, a classification of deaths and
a knowledge of statistical theory."
It was in the second half of the last century and in the present
century that statistical methods entered the realm of the science of
economics and became intimately associated with the ancient subject of
mathematics. Though the relationship of statistics and mathematics is
very old yet it is only during the last 100 years or so that the two sciences
have come very close to each other. In recent years the domain of
statistical methods has considerably widened and today there is hardly
any science which does not make use of statistical methods. The science
of statistics is now associated with all other sciences in some form or
the other and we shall now study the relationship of statistics with other
sciences, particularly with economics and mathematics. For the past
two decades particularly there has been a remarkable and sustained
growth in the use of statistics. This is because business, government
and science, three fields in which applications of statistics are most
numerous and diverse, are growing in volume and complexity. It is
also because of the technological revolution which has taken place in
data handling, affecting especially computing and tabulating equipment,
and a scientific revolution in statistical theories and techniques.
RELATIONSHIP OF STATISTICS WITH OTHER SCIENCES
Statistics and Economics
Though the relationship of statistics with economics dates back to
1690 when Sir William Petty published his book named "Political
Arithmetic" yet the relationship of these two sciences became intimate rather
very late. No doubt statistical data about economic problems used to be
collected in the past but there was no relationship between statistics and
economic theory. In earlier stages of development the science of
economics was based on deduction and the predominance of deductive
approach was responsible for the disinterest of economists towards
quantitative data for purposes of the development of economic
doctrines. Besides this, there was also a tendency in those days to avoid
figures which were considered to be lifeless, rude and coarse. What
was responsible for this peculiar disposition to figures in those days is
difficult to state. It is a fact that people wanted to avoid rude shocks
which awaited them in the world of facts and always wanted to be vague
in their statements and logic. Gradually this hatred for figures melted
away and even deductive writers like J. S. Mill admitted that "in some
cases instead of deducing our conclusions from reasoning and verifying
them from observations we begin by obtaining them provisionally from
specific experience and afterwards connect them with the principles of
human nature by a priori reasoning." Similarly in 1871 W. S. Jevons
wrote that "the deductive science of economy must be verified and
rendered useful from the purely inductive science of statistics. Theory
must be invested with the reality of life and fact. Political economy
might gradually be erected into the exact science, if only commercial
statistics were far more complete and accurate than they are at present
so that the formulae could be endowed with the exact meaning by the
aid of numerical data." Jevons developed the technique of analysis of
time-series and was the pioneer in the field of price studies and index
numbers. Rightly he has been called the 'Father of Index Numbers'.
Besides Jevons the Historical School (1843-1883) also brought statistics
and economics close to each other. In fact Roscher, Knies and
Hildebrand all were of the opinion that economic doctrines should not be
argued in the abstract and that they should be inductively verified. The
effect of the preachings of the Historical School was indeed very great and
the science of economics no more remained merely deductive in its
approach. By the time the present century began, much of the
opposition to the use of statistical methods in the realm of economics had
ended and in 1907 Alfred Marshall could write, "Disputes as to methods
have ceased. Qualitative analysis has done the greater part of its work ...
that is to say, there is general agreement as to the characteristic and
duration of the changes which various economic forces tend to produce.
Much less progress has been made towards the quantitative
determination of the relative strength of different economic forces ... that higher
and more difficult tasks must wait upon the slow growth of thorough
realistic statistics." At the same time Pareto wrote, "The progress of
political economy in the future will depend in great part upon the
investigations of empirical laws derived from statistics which will then be
compared with known theoretical laws or will suggest derivation from
them of new laws." Later on Lord Keynes, writing about the functions of
statistics, wrote that it is "first, to suggest empirical laws, which may or may
not be capable of subsequent deductive explanations; and, secondly, to
supplement deductive reasoning by checking its results and submitting
them to the test of experience." Now there are no two opinions about
the fact that both induction and deduction are necessary for the growth
and development of economic science. In fact statistics and economics
are so intermixed with each other now that the question of their
separation does not arise.
Factors responsible for closer ties between economics and statistics. Since
1890 two factors have worked together to bring about this great change in
the relationship of statistics and economics. The first is the
development of statistical methods of probability and sampling, simple and
partial correlation and association, periodicity and index numbers, etc.;
the second is the enlargement of statistical material in recent years. In
fact during this period various eminent statisticians like C. B. Davenport,
A. L. Bowley, W. Pearson, W. I. King and R. A. Fisher, etc., have made very
valuable contributions towards the development of the science of
statistics. During this period the statistical data have also increased in
quantum all over the world on account of the establishment of statistical
bureaus in various countries. The improvement of statistical methods
and the expansion of statistical data have thus brought economics and
statistics very close to each other and have marked the real inception of
statistics in the domain of the science of economics.
Statistics, economics and mathematics
It has already been mentioned above that statistics and
mathematics have been closely in touch with each other ever since the
seventeenth century when the theory of probability was found to have bearing on
various statistical methods. During the last 100 years or so not only
have statistics and mathematics come very close to each other due to the
development of mathematical statistics, but these sciences have been
joined by economics as well and now there is a happy union between
statistics, economics and mathematics. Mathematics has considerably
helped in the development of economic theories and now mathematical
economics has become a very important branch of the science of
economics. In December 1930, the first Econometric Society was founded in
the United States of America and this was a sort of recognition of the
union of these great sciences. The purpose of the society was officially
defined as "the advancement of economic theory in its relation to statistics
and mathematics ..." The aim of econometrics is to make economics
a more realistic and practical science. The mathematical approach to economic
theories makes them more precise and logical and similarly statistics give
a quantitative conclusion about the validity of purely theoretical concepts.
Thus we see that in modern times economics, statistics and mathematics
are very much intermixed with each other and this union has proved
helpful for the development and progress of all these sciences.
Statistics and Astronomy
Statistics, as we have already mentioned earlier, were first collected
by astronomers for the study of the movement of stars and planets.
In fact there are a few things which are common between physical sciences
and statistical methods, and astronomers applied statistical methods for
the furtherance of their studies. The method of least squares was first
developed by an astronomer. Astronomers generally take a large
number of measurements and in most cases there is some difference between
these several observations. In order, therefore, to have the best possible
measurement they have to make use of the technique of the law of errors
in the form of the method of least squares. Besides this, even in physical
sciences usually first rough measurements are taken and later on as more
and more data are available and as the precision of instruments of
measurement increases, better estimates are put forward. In order to give an
idea about the degree of accuracy achieved, usually limits are assigned
within which the true value of the phenomenon is expected to lie. This
is in fact a purely statistical approach. Thus we find that even though
astronomy is a physical science, and statistical methods are generally not
applied in such sciences, yet they cannot be entirely ruled out of question.
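The idea of combining several differing observations into one best measurement, with limits attached, can be illustrated by a small sketch in Python. It is only an illustration under stated assumptions: the readings are invented, the function names are hypothetical, and the limits are taken as two standard errors on either side of the mean, which is one common convention and not a rule laid down in the text. For repeated measurements of a single quantity the least-squares estimate is simply the arithmetic mean, since the mean makes the sum of squared deviations as small as possible.

```python
import math
import statistics


def least_squares_estimate(observations):
    """Return (estimate, standard_error) for repeated measurements.

    The estimate is the arithmetic mean, which minimises the sum of
    squared deviations of the observations from the chosen value.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are needed")
    estimate = statistics.mean(observations)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(observations) / math.sqrt(n)
    return estimate, standard_error


def sum_of_squares(observations, value):
    """Sum of squared deviations of the observations from `value`."""
    return sum((x - value) ** 2 for x in observations)


# Hypothetical repeated readings of the same quantity (invented data).
readings = [10.2, 9.8, 10.1, 9.9, 10.0]

estimate, standard_error = least_squares_estimate(readings)

# Assumed convention: limits of two standard errors around the estimate.
lower_limit = estimate - 2 * standard_error
upper_limit = estimate + 2 * standard_error

print("best estimate:", round(estimate, 3))
print("limits:", round(lower_limit, 3), "to", round(upper_limit, 3))
```

Any other candidate value gives a larger sum of squares than the mean does, which can be checked by calling sum_of_squares with the estimate and with a slightly different value.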
Statistics and Biology
The development of biological theories has been found to be closely
associated with statistical methods. Professor Karl Pearson in his
Grammar of Science says that the whole doctrine of heredity rests on statistical
basis. The contention that tall fathers have in general tall sons can be
proved only by the use of statistical data and statistical methods. The
differences observed in the various generations in different zoological
species can be measured and studied only with the help of statistical
technique. Thus we see that statistical methods help in the formation
of the theory of development of human and animal life.
Statistics and Meteorology
Statistics is related to meteorology. In meteorology records are
made of temperature, humidity of air and barometrical pressures, etc.
For purposes of comparison and forecasting it becomes necessary to
average these figures and to study their trend and fluctuations. A
study of the significance of these deviations has also to be made for various
purposes. All this cannot be done without the use of statistical methods.
We thus find that the science of statistics helps meteorology in a large
number of ways.
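The averaging of meteorological records, the study of their trend and the measurement of deviations can be sketched in Python as follows. The temperature figures are invented for illustration, the function names are hypothetical, and a simple three-day moving average is assumed as the measure of trend; other methods of trend fitting could equally be used.

```python
import statistics


def moving_average(values, window):
    """Return simple moving averages of `values` over `window` readings."""
    if window < 1 or window > len(values):
        raise ValueError("window must lie between 1 and len(values)")
    return [
        statistics.mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def deviations_from_mean(values):
    """Return the deviation of each reading from the overall average."""
    average = statistics.mean(values)
    return [v - average for v in values]


# Hypothetical daily maximum temperatures for one week (invented data).
daily_temperatures = [30, 32, 31, 35, 33, 34, 36]

weekly_average = statistics.mean(daily_temperatures)
trend = moving_average(daily_temperatures, window=3)
fluctuations = deviations_from_mean(daily_temperatures)

print("weekly average:", weekly_average)
print("three-day moving averages:", [round(t, 2) for t in trend])
print("deviations from the average:", fluctuations)
```

The moving averages smooth out the day-to-day fluctuations and show the general direction of the series, while the deviations show how far each day departs from the average for the week.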
The above account of the origin and growth of statistics clearly
reveals the fact that the great science of statistics is associated with all
the other important sciences both physical as well as social. In fact
today the domain of statistics is very wide, it is almost universal and
it is difficult to imagine any science worth the name where statistics has
not proved its usefulness in some form or the other. Bowley was right
when he said, "A knowledge of statistics is like a knowledge of foreign
language or of algebra; it may prove of use at any time under any
circumstances."
Causes of the recent growth of Statistics. The tremendous growth in
the use of statistics, as has been shown above, can be attributed mainly to
two factors, viz., increased demand of statistics and decreasing cost of
statistics.
(i) Increased demand. There has been a phenomenal increase in the
demand for statistics in various fields. Statistics are most commonly
used by businessmen, government and scientists. The spheres of the
activities of all these three categories have increased extraordinarily in
modern times. The magnitude of business has considerably increased
resulting in an increased demand of statistics. The business in modern
times has become a very complicated affair and this fact has further
augmented the demand of statistical data. The complexity in business is
on account of numerous government regulations, labour disputes,
ever-increasing taxes and technological revolution which the business world
has witnessed in recent years.
Even more than business activities, the activities of the government
have increased both in size as well as in complexity. Modern states are
welfare states and they have to look after a large variety of things,
resulting in an increased demand for statistics.
Probably the most spectacular development of the modern world is the
growth of scientific research. Science today is a very complex
phenomenon; the different types of research carried on in the field of
science are of an extremely complex nature and make extensive use of
statistical data. We thus find that the demand for statistics has
considerably increased and this is one reason why the science of statistics
is developing so fast.
(ii) Decreasing cost. Another reason why the science of statistics
has developed so fast and has become so popular is that on account of
a number of reasons the cost and the time required for the collection and
analysis of data have gone down. There has been a vast improvement in
the technique of processing the data which has resulted in great economy
of both time and cost. Modern computing and tabulating machines
save not only time but money also. The development of electronic
calculators and other modern machines like desk-calculators and card
sorting machines, etc., has made the task of scientists, businessmen and
administrators very easy and simple. These machines have resulted in a very great
economy both in terms of money as well as the time needed to do a job.
Statistical theory has also developed in modern times in such a
manner that the cost of compilation of statistical data has gone down
considerably. The theory of sampling, the various designs of
experiments and statistical quality control have all contributed towards
lowering the cost of collection and analysis of statistical data.

Questions
1. Write a short essay on the origin and growth of the science of statistics and
throw light on its future.
2. Explain the relationship between economics and statistics. How far has
the use of statistical methods in economics led to its development ?
(B. Com. Lucknow, 1941)
3. "Statistics are the straw out of which every other economist has to
make the bricks." (Marshall).
Explain, in the light of the above observation, the relation between economics
and statistics and discuss how far it is correct to say that the science of economics is
becoming statistical in its method. (M. Com. Allahabad, 1944).
4. Trace the association of mathematics with the science of statistics and show
how the former has considerably helped the development of the latter.
5. Discuss the relationship between statistics and various social sciences.
6. Do you think that statistical methods are of any help in physical sciences?
If so, how?
7. Write a brief essay on the relationship of economics, statistics and
mathematics.
8. Show how the science of statistics which was originally the science of
statecraft has now become the science of universal application. Do you think that statistical
methods are in reality applicable to all types of sciences ?
9. How far has the growth of statistics coincided with the development of
physical and social sciences ?
10. "Statistics is an apparatus by the help of which the validity of the laws of
physical and social sciences can be tested". Comment.
11. Discuss the factors responsible for the quick development of statistics in
recent years.
Importance, Limitations and
Functions of Statistics 3
Statistics and the common man. The fact that in the modern world
statistical methods are of universal applicability is in itself enough to
show how important the science of statistics is. As a matter of fact
there are millions of people all over the world who have not heard a
word about statistics and yet who make profuse use of statistical
methods in their day-to-day decisions. Statistical methods are common
ways of thinking and hence are used by all types of persons. When
a person wishes to purchase a car or a radio and goes through the
price lists of various companies and makers to arrive at a decision, what
he really aims at is to have an idea of the average level and the range
within which the prices vary, though he may not know a word about
these terms. When a farmer wishes to have a particular quantity of
rain in a particular season so that he may have a good crop, he has in fact
an idea of the correlation that exists between rainfall and crop yields and
of the regression line of crop yields on rainfall. Again, when we use the
common proverb "as you sow, so you reap" we indirectly hint that there
is a positive correlation between one's actions and achievements.
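
The terms used in the foregoing paragraph (average, range, correlation and regression line) may be illustrated by a minimal Python sketch. All the figures, the radio prices as well as the rainfall and crop yields, are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative.

```python
from statistics import mean


def price_summary(prices):
    """Return the average price and the range (lowest, highest)."""
    return mean(prices), min(prices), max(prices)


def correlation_and_regression(x, y):
    """Return Pearson's r and the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# Hypothetical price list for radios (rupees).
prices = [4200, 4500, 3900, 4800, 4100]
avg, low, high = price_summary(prices)
print(f"average price {avg:.0f}, range {low} to {high}")

# Hypothetical seasonal rainfall (cm) and crop yield (quintals per hectare).
rainfall = [50, 60, 70, 80, 90]
crop_yield = [10, 12, 15, 17, 21]
r, slope, intercept = correlation_and_regression(rainfall, crop_yield)
print(f"correlation r = {r:.3f}")
print(f"regression line: yield = {intercept:.2f} + {slope:.2f} * rainfall")
```

A value of r close to +1, as with these invented figures, is what the farmer's expectation of a good crop after good rain amounts to in statistical terms.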
Examples can be multiplied to show that human behaviour and
statistical methods have much in common. In fact statistical methods
are so closely connected with human actions and behaviour that practically
all human activity can be explained by statistical methods. This shows
how important and universal statistics is.
CAUSES OF THE IMPORTANCE OF STATISTICS
Simplifies complexity. One reason why statistics is so important
today is that it simplifies complexity. The human mind is not capable of
assimilating huge masses of facts and figures, and statistical methods, by making
these data easily intelligible and readily understandable, render a great
service, because in their absence the information would not have been
of any use. Statistical methods describe a phenomenon in a very simple
fashion. If, for example, we have to study the economic system of
Soviet Russia we cannot properly understand it by a purely descriptive
method in which no statistics are used, but if the different aspects of the
economic system are numerically expressed we can understand the whole
thing in a short time and in a better manner.
Measures results. Similarly if we have to measure the results of a
particular policy it can best be done by statistical methods. If we have
to study, for example, the effect of a rise in the bank rate on the industries
of a country we can do so in a proper manner only by means of a statistical
study of the phenomenon. Such situations involve the rather difficult
and delicate task of the comparison of two phenomena. The pertinent
question in the present case would be, whether a rise in the bank rate has
affected the industries adversely or favourably? An answer to this
question would involve a comparison of the present situation with the
past and also a decision whether the change has been beneficial or
otherwise, from the point of view of industries. Without an adequate use of
statistical data it would be impossible to arrive at any sound and
dependable conclusion. Statistics thus help in measuring the effects of a
particular policy and in arriving at a conclusion about it.
Studies relationships. Yet another reason for the importance of
statistics is that it makes possible a study of correlation between two
phenomena. In all types of studies the importance of observing relation-
ship between different phenomena is very great. The relationship bet-
ween, say, price and supply, or demand and price is a phenomenon which
requires a very careful and close study before any generalization can be
made. In the absence of statistical methods it would be very difficult
to arrive at a precise and correct conclusion in this respect.
Enlarges human experience. Thus we see that the science of statistics
enlarges human experience and knowledge, by making it easier for man
to understand, describe and measure the effects of his own actions or
the actions of others. Many fields of knowledge would have remained
closed to mankind for ever but for the efficient and refined technique
and sound methodology provided by the science of statistics. It has
provided mankind with such a master key that it can be used anywhere to
study any problem in its correct perspective and on right lines.
We have described above how the science of statistics helps in the
solution of many difficulties which mankind has to face. The technique
of the science can briefly be summed up in three words, namely,
Description, Comparison and Correlation. It is with these devices that statistics
help in improving human knowledge about various sciences and in
solving various difficulties which arise in the development and growth
of these studies. We will now briefly discuss how statistics is indispen-
sable in different branches of human activities.
INDISPENSABILITY OF STATISTICS
Importance in economics. In the field of economics it is almost
impossible to find a problem which does not require an extensive use of
statistical data. Important phenomena in all branches of economics
can be described, compared and correlated with the help of statistics.
Statistics of consumption tell us of the relative strength of the desire
of a certain section of the people and its variations from time to time.
By statistical analysis we can study the manner in which people spend
their income over various items of expenditure, namely, food, clothing,
house rent, etc. Statistics of production describe the wealth of a nation
and compare it year after year showing thereby the effect of changing
economic policies and other factors on the level of production.
Exchange statistics throw light on the commercial development of a nation.
They tell us about the volume of business done in a country and the
amount of money in circulation. Distribution statistics disclose the
economic conditions of the various classes of people. They throw light
on the distribution of national dividend amongst the inhabitants of a
country. We thus find that in all types of economic problems statistical
approach is essential and statistical analysis useful. Mathematics and
its offsprings, statistics and accounting, are the powerful instruments
which the modern economist has at his disposal, and of which business,
through the development of research agencies and methods, is making
constantly greater use.
Need in planning. The modern age is an age of planning. The days
of laissez faire are gone and state intervention in practically all aspects
of life has become universal in character. Today we live in a period of
transition; economic activities are being more and more closely directed
to the production of such goods, and the provision of such services, as
the government may decide to be most urgently required. Our future
is very largely being planned, and this planning, to be successful, must be
soundly based on the correct analysis of complex statistical data.
Whenever we think of a plan we have to think of statistics. Planning cannot
be imagined without statistics. If we study the economic plans
implemented in various countries in recent times we will find that all of them
are a statistical study of the economic resources of the respective countries,
and they suggest possible ways and means of utilising these resources
in the best possible manner. Various plans that have been prepared
for the economic development of India have also made use of the
statistical material available about various economic problems. The fact that
in our country the amount of statistical material available to the planners
has been very scanty is responsible for many drawbacks and inaccuracies
in different plans. Not only are plans of economic development
constructed on the basis of statistical data but the success that a plan achieves is
also measured best by the use of statistical apparatus. We thus find that
in the field of economic planning the use of statistics is indispensable.
Usefulness in commerce. Statistics are an aid to business and
commerce. In fact today the situation is that a businessman succeeds or fails
according as his forecasts prove to be accurate or otherwise. When a
man enters business he enters the profession of forecasting, because success
in business is always the result of precision in forecasting and failure in
business is very often due to wrong expectations, which arise in turn due
to faulty reasoning and inaccurate analysis of various causes affecting a
particular phenomenon. Modern devices have made business fore-
casting more definite and precise. Economic barometers are the gifts
of statistical methods and businessmen all over the world make extensive
use of them. A producer estimates probable demand of his goods,
analyses the effects of trade cycles and seasonal variations as also of changes
in habits and customs of people on the demand of his wares, and after
taking all these factors into consideration finally takes decision about the
quantum of production. A businessman who ignores the effects of booms
and depressions can never succeed and is bound to face frustrations as his
calculations are sure to be faulty. A study of all these things is in reality
a study of statistics and hence we say that all types of businessmen have
to make use of statistics in one form or the other if they want any success
in their profession. For the solution of problems connected with the
internal organization and administration of business units and with the
processes of buying and selling that bring the businessman into contact
with the price system, methods of statistical analysis are peculiarly
appropriate. Various branches of commerce utilise the services of statistics
in different forms. Promoters of new business make extensive use of
statistical data to arrive at conclusions which are vital from the point of
view of starting a new concern. In fact in the absence of statistical data
it would be impossible to carry on the activities of promotion of new
concerns on sound lines and the number of business failures would be
much more than at present. Again, cost accounting is entirely statistical in
outlook and it is with the help of this technique that producers are in a
position to decide about the prices of various commodities. We thus
find that the science of statistics is of extreme importance to business and
commerce and in its absence the growth of these things would be lopsided
and very slow, and business uncertainties would considerably increase.
Utility to bankers, brokers, insurance companies, etc. Bankers, stock
exchange brokers, investors, insurance companies and public utility
concerns all make extensive use of statistical data. A banker has to make
a statistical study of business cycles to forecast a probable boom or a
depression and has to study in detail the seasonal variations in the demand
for call money from his clients. It is after a study of these factors that a
banker decides about the amount of reserves that should be kept. Unless
the calculations of a banker are correct, and they cannot be so in the
absence of statistical data, he is always in danger of making a mistake
and losing public confidence upon which his entire existence depends.
Statistics are equally important from the view-point of stock exchange
brokers, speculators and investors. They have to be fully conversant with
the prevailing money rates at various centres and have to study their
future trends. Their success depends on the extent to which their
forecasts about the future trends of money rates come to be true.
Insurance companies cannot carry on their business in the absence of
statistical data relating to life tables and premium rates, etc. In fact,
insurance has been one of the pioneer branches of commerce and business
which has been making use of statistics from the very beginning of its
existence. Insurance as an institution could never have existed in the
absence of statistical data. The theory of probability works itself out fully
in the field of insurance and the success of an insurance company
depends on the accuracy of the basic data that it uses for the calculation
of premium rates, etc. Again, public utility concerns like railways,
electric supply companies, waterworks, etc., also make extensive use of
statistics. As a matter of fact it is difficult to imagine any business, big
or small, which does not make use of statistical data in one form or the
other. The science of statistics is indeed indispensable to business and
commerce.
Utility in Business Management. One element common to all
problems faced by business managers is the need to make decisions in the
face of uncertainty; and the essence of modern statistics lies in the
development of general principles for dealing wisely with uncertainty.
Modern statistical tools of collection, classification, tabulation, analysis
and interpretation of data have been found to be an important aid in
making wise decisions at various levels of managerial function. The
uses to which statistical methods are put in this area are many and
varied; for example, the success of a modern industrial enterprise
depends to a significant extent on the accuracy of production
programming, and sales, quality and inventory control. Statistical tools
are relied upon heavily in arriving at correct decisions in all these
aspects.
The success of production programming both in the short as well
as long period depends to a great extent on the quality of sales forecasts
and projections. The sales forecasts may be made in one or more of
various ways. A company's regional offices may make estimates of sales
in their respective areas. They may also take into account seasonal
variations. These regional estimates may then be subjected to statistical
treatment and final forecasts made. These forecasts may further be recast
in the light of statistically estimated variations in the aggregate
market under the impact of economic programming at the national
level.
The variations in sales below or above the forecast may disturb the
production schedules. These variations may be seasonal and the policy
of stabilising production means that the inventory of finished goods
would vary rather widely. The upper and lower zones are set based
on statistical study of sales fluctuations and if fluctuations occur outside
this zone, changes in production schedules are made.
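
The idea of upper and lower zones may be sketched as follows. The text does not say how the zones are fixed; the sketch assumes, purely for illustration, that they are placed two standard deviations on either side of the average of past sales. The sales figures and the function names are hypothetical.

```python
from statistics import mean, stdev


def control_zone(past_sales, k=2.0):
    """Return (lower, upper) limits at k standard deviations from the mean.

    The choice of k = 2 is an assumption, not a rule taken from the text.
    """
    centre = mean(past_sales)
    spread = stdev(past_sales)
    return centre - k * spread, centre + k * spread


def needs_rescheduling(current_sales, zone):
    """True when sales fall outside the zone, so that schedules need revision."""
    lower, upper = zone
    return not (lower <= current_sales <= upper)


# Hypothetical monthly sales (thousand units).
past_sales = [100, 104, 98, 102, 96, 100]
zone = control_zone(past_sales)
print(f"zone: {zone[0]:.1f} to {zone[1]:.1f}")

for sales in (103, 110):
    action = "revise schedule" if needs_rescheduling(sales, zone) else "no change"
    print(f"sales {sales}: {action}")
```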
Effective control on sales can also be exercised through regional
allocations. In this case also the determination of aggregate and
regional sales potential is based on a statistical study of trends. Statistical
methods also help the sales executive in the establishment of the share of
responsibility of each regional sales office for the achievement of sales
targets. These measures help the sales executive in determining which
of the regional offices is most effectively utilising the market potential
of its region. Market research, consumer preference studies, trade channel
studies and readership surveys are other methods of sales control which
make an extensive use of statistical tools.
Statistical methods also come to the aid of quality control. A
manufacturer of match boxes may select 100 pieces which are
considered to be satisfactory with respect to their characteristics and acceptability
to consumers. The average weight of each box may then be ascertained
and set as a standard of quality. Now variations in the weight of match
boxes from this standard weight may be due to a multiplicity of
variations in the size, quality and weight of its components: variations in the
quality of match and box-wood, quantity and quality of phosphorus,
moisture in wood or phosphorus, labelling, etc. Statistical methods can
also be used to avoid the cumbersome process of weighing each and
every match box. The boxes may be divided into lots on random
sampling basis, and boxes may again be selected from each lot on random
sampling basis. If the weight of these boxes corresponds to standard
weight, the whole lot may be considered to be of standard quality.
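
The two-stage selection described above, lots first and then boxes within each lot, can be sketched in Python. The standard weight, the tolerance, the lot sizes and the rule that a lot passes when the average weight of its sample lies within the tolerance are all assumptions made for illustration; the text only says that the sample weight should "correspond" to the standard.

```python
import random
from statistics import mean


def two_stage_sample(lots, lots_to_pick, boxes_per_lot, rng):
    """Pick lots at random, then pick boxes at random from each chosen lot."""
    chosen = rng.sample(range(len(lots)), lots_to_pick)
    return {i: rng.sample(lots[i], boxes_per_lot) for i in chosen}


def lot_is_standard(sample_weights, standard, tolerance):
    """Pass when the sample's average weight is within `tolerance` of the standard."""
    return abs(mean(sample_weights) - standard) <= tolerance


rng = random.Random(42)  # fixed seed so that the sketch is repeatable
STANDARD = 10.0          # hypothetical standard weight of a box, in grams
TOLERANCE = 0.2          # hypothetical permitted departure, in grams

# Ten hypothetical lots of 50 boxes each; lot 3 is deliberately underweight.
lots = [[rng.gauss(STANDARD, 0.05) for _ in range(50)] for _ in range(10)]
lots[3] = [rng.gauss(9.5, 0.05) for _ in range(50)]

samples = two_stage_sample(lots, lots_to_pick=4, boxes_per_lot=5, rng=rng)
for lot_no, weights in sorted(samples.items()):
    verdict = "standard" if lot_is_standard(weights, STANDARD, TOLERANCE) else "not standard"
    print(f"lot {lot_no}: mean weight {mean(weights):.2f} g, {verdict}")
```

Only 20 of the 500 boxes are weighed in this sketch, which is the economy the passage has in view.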
Inventory control is essential for economical functioning of business
enterprises. It relates both to quantitative and qualitative aspects. The
stocking of inventories at the optimum level depends on the accuracy
of sales forecasts and correlation between the final product and size and
quantity of each raw material, tools, equipment, fuel, etc., needed for it.
Quality control on inventory is not only facilitated but also made more
accurate with the aid of statistics. It may be a cumbersome,
time-consuming and costly process to inspect each and every item of inventory
purchases. This is particularly so if the size of items is very large. It
is also not possible to inspect meticulously each and every item. This
problem is tackled by the method of random sampling. Some items
are selected on this basis and subjected to close quality inspection and
if these are found according to specifications, the whole lot is accepted.
Importance to the State. Statistics are very helpful to a state as they
help it in administration. The modern State makes extensive use of statistical
data on various problems. Before enforcing any policy a State has to
examine its pros and cons and this can be done only with the help of
numerical data. Evils of drinking or crimes need a proper statistical
investigation before remedies can be suggested for them. The state has
to collect figures of population for various purposes; it has to estimate
the figures of national income to find out the prosperity of the country.
A state in the modern set-up besides being an administrative body is a
big commercial concern also. It carries on businesses of various kinds
and has monopoly in many cases. It needs statistics for carrying on these
works. In fact the state is always the most important single unit which
not only collects the largest amount of statistics but also needs statistics
on a very extensive scale. Probably this is the reason why official statis-
tics occupy a very important place in the statistical literature of any
country.
Desirability in Research. Modern statistical methods and statistical
data are being found increasingly useful in research in different fields.
Experiments about crop yields with different types of fertilizers and
different types of soils or the growth of animal life under different types of
diets and environments are very often designed and analysed according
to statistical methods. Even in the field of medicine and public health
statistical methods are used for testing the efficacy of new medicines and
methods of treatment. In the field of industry and commerce statisticians
carry on different types of researches. They try to find out the sources
and causes of variations of different products from their standard quality.
The technique of quality control is entirely statistical in nature. Market
researches are carried on by making extensive use of statistical methods.
Even in the literary field statistical researches have been found to be useful.
Statistical studies about the length of sentences, the frequency of various
words and various parts of speech have been used to find out whether a
disputed work is of one author or the other. As a matter of fact, for
a research worker in any field which is concerned with numerical
results a study of statistical method is not only useful but necessary.
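
The kind of count on which such literary studies rest, the length of sentences and the frequency of words, can be sketched in Python. The sample passage is invented, and the simple rules used here for splitting sentences and words are assumptions; a real study of disputed authorship would need far longer texts and more careful rules.

```python
import re
from collections import Counter
from statistics import mean


def sentence_lengths(text):
    """Return the number of words in each sentence of `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def word_frequencies(text):
    """Return a Counter of lower-cased words in `text`."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


sample = "The cat sat. The cat ran away quickly! Did the dog follow?"
lengths = sentence_lengths(sample)
freq = word_frequencies(sample)

print("sentence lengths:", lengths)
print("average sentence length:", mean(lengths))
print("most common words:", freq.most_common(3))
```

Comparing such averages and frequencies for a disputed work with those of each candidate author's known works is the essence of the method mentioned above.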
Universal applicability of statistical methods. We thus find that
statistical methods are of wide, almost universal applicability.
Government needs them, economists and businessmen need them; in fact all
types of persons, astrologers, astronomers, biologists, meteorologists,
botanists, and zoologists make use of statistics and statistical methods.
Statistics, when used effectively, become so intertwined in the whole
fabric of the subject to which it is applied as to be an integral part of it.
Statistics assist in planning the initial observations, in organizing them
and formulating hypotheses from them, and in judging whether the new
observations agree sufficiently well with the predictions from the
hypotheses. The universality of statistics is enough to indicate its
importance, utility and indispensability to the modern world.
Some illustrations of the uses of statistics in various fields have
been given below.

Uses in Business Management

(i) Qualitative Change in Product. A soap manufacturer found
that his product was losing ground to a competitor who had introduced
a new quality. He decided to make a survey of consumers' preferences
so as to provide the basis for introducing qualitative changes in his
own product. But a survey of the preferences of millions of consumers spread
all over the country was a very expensive and time-consuming
proposition. It was, therefore, decided to take a sample of 1,000 consumers.
Consequently, he sent his own soap and that of his competitor to the
sampled consumers along with a questionnaire requesting them to show
their preference and the reasons for such preference. The tabulation,
analysis and interpretation of the information supplied by them revealed
that the scent, transparency and colour of the competitor's soap were
responsible for its preference by them, so a new formula was devised
for introducing these qualities to a greater degree in his own soap with
the result that his product could again outsell the competitor.
(ii) Sales Control. A manufacturer of electrical appliances was
worried about the declining trend in take-off from the factory. The
only information available in his records related to monthly despatches.
The consumers' survey was indeed an expensive and time-consuming
task. Even the number of dealers in his product ran into thousands and
presented the same problems, although to a lesser degree. He,
therefore, decided to address a questionnaire to a random sample of dealers
including wholesalers and retailers. Their replies were classified
and tabulated separately. A statistical interpretation of the processed data
showed that the wholesalers had drastically reduced their stocks due to
two reasons, viz., credit squeeze exercised by banks and reduced orders
from retailers. The retailers were facing marketing difficulties. The
manufacturer of competing products had started providing mobile
repairing services. These two factors were responsible for the decline in
factory take-off. The provision of necessary credit facilities to
wholesalers and the organisation of mobile repairing services arrested the fall in
take-off.
(iii) Issue of Bonus Shares and Revaluation of Stock. (a) A general
as well as sectoral study of price movements by the Amen Steel Structurals
showed that the prices of all the commodities had been rising steadily
during the last ten years. A statistical analysis of price movements and
developmental outlay further showed that a positive and high
correlation existed between the two. A study of trends in developmental
outlay and of the deliberations of the Planning Commission showed that
during the Fourth Plan the outlay would be nearly double that of the Third
Plan. It was, therefore, concluded that it could be safely assumed that
prices were not likely to decline in future. The company could, on this
basis, take the decision of making a revaluation of its fixed and floating
assets and issuing bonus shares.
(b) The revaluation of fixed assets, inventories and stock also
presented a serious problem as they ran into thousands. Moreover,
many of them were not available in market due to import restrictions
and it was impossible to ascertain their market price. The task of esti-
mating the residual life of all the fixed assets was also very complex and
difficult. The company, therefore, decided to take a random sample of a
few items of each type of asset and make its valuation. On this basis,
the valuation of the whole lot of assets was assessed and results were
found to be very satisfactory.

Uses in Social Sciences


(i) Scholastic Performance. Numerous statistical studies have
demonstrated that a high and positive correlation exists between scholastic
performance and extra-curricular activities. Findings of these studies
have led to a vast expansion of extra-curricular activities in educational
institutions and ways and means have been devised for encouraging
students to participate in them.
(ii) Public Opinion. General impressions about public opinion
are often found to be misleading. Carefully designed statistical analysis
has been very helpful in arriving at accurate conclusions. Immediately
after the cease-fire following the Indo-Pakistan War in September 1965, it
was generally believed that the Indian people wanted to resume fighting.
A poll of public opinion carried out by a leading newspaper
revealed the following result :
                                          Yes    No    No opinion
Are you in favour of another round
of fighting with Pakistan?                 65    25        10
Uses in War
(i) Active lead by Officers. A statistical analysis of the Indian and
Pakistani casualties during the Indo-Pakistani War of September 1965
revealed that the proportion of officers among those killed was higher
on the Indian side. This showed that the Indian armies were actually
led by their officers and this was one of the important factors responsible
for Indian victory. This factor will assume importance in the
formulation of future war strategy.
(ii) Training in the Use of War Equipment. The heavy reverses
suffered by Pakistan during the above War, despite its vastly
superior Air Force and Armoured Corps, came as a great surprise to the
whole world. Statistical analysis of its causes revealed that a high
and positive correlation existed between the period and intensity of
training in the use of aeroplanes and tanks and their effective use in war.
A further investigation into the period and intensity of training provided
in both the countries revealed that the Pakistani failure to make an effective
use of its fighters, bombers and tanks was due to inadequate and inferior
training of its personnel.
(iii) Inspection of purchases. During the war, military requirements
of goods and commodities increase tremendously. Complete
inspection of each and every item involves huge expenditure and the time of a
large number of personnel and it can also not be done expeditiously.
Here statistics come to the help of the army. The use of the sampling
inspection method helps not only in its quick disposal but also gives accurate
results. Under this method, only a few items, say 2 per cent, are selected
on random sample basis and thoroughly inspected. This method is both
cheaper and expeditious. It also ensures accuracy as it is easier to
inspect more closely a few rather than a large number of items.
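
A minimal Python sketch of the 2 per cent sampling inspection follows. The acceptance rule (pass the lot only if every sampled item passes) and the function names are assumptions made for illustration. The second function is an addition not found in the text: it uses the hypergeometric formula to gauge how often a random sample would contain none of the defective items present in a lot.

```python
import math
import random


def sample_size(lot_size, percent=2):
    """Number of items to inspect: `percent` per cent of the lot, at least one."""
    return max(1, math.ceil(lot_size * percent / 100))


def inspect_lot(items, is_acceptable, rng, percent=2):
    """Accept the lot only if every randomly sampled item passes inspection."""
    sample = rng.sample(items, sample_size(len(items), percent))
    return all(is_acceptable(item) for item in sample)


def chance_of_missing_all_defects(lot_size, defectives, n):
    """Probability that a random sample of n items contains no defective item."""
    return math.comb(lot_size - defectives, n) / math.comb(lot_size, n)


rng = random.Random(7)  # fixed seed so that the sketch is repeatable

# Hypothetical lot of 5,000 items, each recorded as True (sound) or False (defective).
lot = [True] * 5000
n = sample_size(len(lot))
print("items inspected:", n)
print("lot accepted:", inspect_lot(lot, lambda ok: ok, rng))

# How often would the same sample size overlook 50 defective items in the lot?
p_miss = chance_of_missing_all_defects(5000, 50, n)
print(f"chance of drawing no defective item: {p_miss:.3f}")
```

The second figure depends on both the sample size and the number of defectives, which is why the percentage to be inspected is itself a matter for statistical judgment.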
LIMITATIONS OF THE SCIENCE OF STATISTICS

Does not study qualitative phenomena. Despite the universality of its
application the science of statistics has its own limitations. The most
important limitation of the science is that it can be applied only to those
problems which are capable of quantitative expression. Such
phenomena as cannot be expressed in figures have very little use for statistical
methods. Honesty, for example, cannot be measured in figures and,
therefore, in a study of honesty statistical methods cannot be of much
help. However, it should be noted that even these subjective concepts
can be related in an indirect fashion to numerical data. Honesty itself
may not be capable of quantitative analysis but many factors which are
related to this phenomenon are capable of being expressed in figures and
as such can throw some light on the study of this problem. A study of
the number of thefts or cases of cheating or swindling can indirectly tell
us something of the problem under study. Again, crime can be
measured in terms of the men who go to prison and if the number of such
persons is decreasing, we can safely say that there is a better enforcement
of the law and crime is on the decrease. Similarly a study about the
culture or civilisation of a country is not possible with the help of statistical
methods though these methods can certainly help in such studies in an
indirect and subsidiary manner.
Does not reveal the entire story. Another limitation of statistics is
that it cannot reveal the entire story of a problem. Since many problems
are affected by factors which are incapable of statistical analysis it
is not always possible to examine a problem in all its manifestations only
by a statistical approach. Many problems have to be examined in the
background of a country's culture, philosophy or religion. All these
things do not come under the orbit of statistics.
Statistical laws are true only on an average. We have already discussed
in an earlier chapter that statistics, as a science, is not as accurate as many
other sciences are, and statistical methods are not very precise and correct.
Laws of statistics are not universally true like the laws of physics or
astronomy. Statistical laws are true only on an average. Statistics deal
with phenomena which are affected by a multiplicity of causes and
it is not possible to study the effects of each of these factors separately
as is done under experimental methods. Due to this limitation in the
statistical methods, the conclusions arrived at are not perfectly accurate
and consequently the same conclusions cannot be arrived at under similar
conditions at all times.
Does not study individuals. Statistical methods have no place for an
individual item of a series. Statistics deal with aggregates, though for
purposes of analysis these aggregates are very often reduced to single
figures. A statistical series is condensed into an average for purposes of
comparison though an individual item of the series has no specific
recognition. This is a limitation. If only 100 persons die of starvation in
India and if the percentage of these deaths to the total population of the
country works out to be a negligible figure, statistically we will be
justified in ignoring it; but this fact does not in any way reduce the torture of
death and its aftermath, so far as these 100 people are concerned, and
consequently from this point of view it is something very important and
material, and yet in statistical analysis the problem does not occupy any
significant place whatever. This type of apathy to individual items of a
series is a serious handicap in many investigations. The average income
of a group of persons might have remained the same over two periods
and yet many persons in the group might have become poorer than what
they were before. Statistical methods ignore such individual cases.
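The point about averages hiding individual cases can be checked with a small sketch. The incomes below are hypothetical figures chosen only for illustration.

```python
from statistics import mean

# Hypothetical monthly incomes (in rupees) of the same five persons
# in two periods; the figures are invented for illustration.
period_1 = [100, 200, 300, 400, 500]
period_2 = [60, 150, 250, 400, 640]

average_1 = mean(period_1)
average_2 = mean(period_2)

# Count the persons whose income fell between the two periods.
poorer = sum(1 for before, after in zip(period_1, period_2) if after < before)

print(average_1, average_2, poorer)
```

The average is the same in both periods, yet three of the five persons are poorer in the second period; the gain of one person conceals the losses of the others.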

Is liable to be misused. Statistics are liable to be misused easily.
Any person can misuse statistics and draw any type of conclusion he
likes. There is very great possibility of the misuse of this science. In
reality statistical methods can be properly used only by trained people and
their use by less expert hands is sure to give inaccurate results. Statistics
is a delicate science and consequently should be used with caution.
Misuses, unfortunately, are probably as common as valid uses of
statistics. The ability to discriminate between a valid and an invalid use of
statistics is more important for most people than knowing how themselves
to make effective use of statistics. No one can afford to be misled by bad
statistics; and everyone needs knowledge that can be gained only through
the effective use of statistics. The fact that it can be used properly only
by experts limits the chances of mass popularity of this important and
useful science.
Some illustrations of the misuses of statistics have been given
below.

Shifting of Definition
(i) Monthly and Hourly Wage Rates. A firm had introduced
productivity methods with the result that productivity had increased. Since
the demand for its product was inelastic and labour laws did not permit
retrenchment, it decided upon reducing the working hours. As a
result, the monthly rates of wages could be increased only marginally.
A dispute arose between labour and management. The contention of
labour was that despite a significant increase in productivity, wages
had increased only marginally; and in support of its argument, it
produced monthly wage statistics. The management's argument was
just the opposite. It maintained that the increase in wages had been
commensurate with the increase in productivity; and in support of its
contention, it produced average hourly wage statistics. Both
labour and management were right. It depends on the definition of
wages which is adopted. Labour's definition will be considered
more appropriate when wages are viewed as the income of workers; and that
of management will be more appropriate when wages are viewed as a cost
of production.
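A short calculation shows how both sides of the wage dispute can be right at once. All the figures below (hours worked and hourly rates) are hypothetical; the text gives no actual numbers.

```python
def percentage_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Hypothetical figures: working hours are cut while the hourly rate rises.
hours_before, hours_after = 200, 160       # hours worked per month
hourly_before, hourly_after = 5.0, 6.5     # rupees per hour

monthly_before = hours_before * hourly_before
monthly_after = hours_after * hourly_after

hourly_rise = percentage_change(hourly_before, hourly_after)
monthly_rise = percentage_change(monthly_before, monthly_after)

print(hourly_rise, monthly_rise)
```

With these assumed figures the hourly wage rises by 30 per cent while the monthly wage rises by only 4 per cent, so the conclusion drawn depends entirely on which definition of wages is adopted.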
(ii) Employment of Women. The census of 1961 showed that the
percentage of working women in India had increased from 23.30 in 1951
to 27.96 in 1961. It might be concluded from this that the female labour
participation ratio increased significantly during the decade. But as a
matter of fact, a major part of this increase was due to the inclusion of
unpaid family workers and housewives under the nomenclature 'workers'.
Inaccurate Measurement Classification


(i) Incidence of Crimes. The newspapers reported that the number
of convicts in jails had been increasing at a fast rate during recent years.
It was inferred from this that crimes were on the increase. The Government
was reprimanded for growing inefficiency in police administration.
The Home Minister, making a statement in the State Assembly, said that
the number of jail convicts had increased not because of an increase in crimes
but because of stricter penal provisions and strengthening of the police force,
especially of its detection wing. He thus maintained that police
administration had become more rather than less efficient.
(ii) Performance of the First Five Year Plan. It was claimed by the
critics of the First Plan of India that per capita income declined from
Rs. 266.5 at its beginning to Rs. 255.0 at its end. This led to the
erroneous conclusion that the economy suffered deterioration during this
period and the Plan was a failure. But, in fact, this decline had occurred
due to a fall in the general price level and real per capita income had
indeed increased. A comparison of per capita incomes in the two years
at constant 1948-49 prices showed that it had
increased to Rs. 267.8 at the end of the Plan as compared to Rs. 247.5 at
its beginning.
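The arithmetic behind this example can be set out as follows. The four income figures are those quoted in the text; the implied price index is derived here as income at current prices divided by income at constant prices, which is an assumption about how the two series are related and is not stated in the text.

```python
def percentage_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Per capita income in rupees, as quoted in the text.
current_prices = {"start": 266.5, "end": 255.0}
constant_prices = {"start": 247.5, "end": 267.8}   # at 1948-49 prices

nominal_change = percentage_change(current_prices["start"], current_prices["end"])
real_change = percentage_change(constant_prices["start"], constant_prices["end"])

# Implied price index (1948-49 = 100), assuming
# index = income at current prices / income at constant prices * 100.
price_index = {
    year: current_prices[year] * 100 / constant_prices[year]
    for year in current_prices
}

print(round(nominal_change, 1), round(real_change, 1))
```

On these figures income at current prices falls by about 4.3 per cent while income at constant prices rises by about 8.2 per cent, and the implied price index is lower at the end of the Plan than at its beginning.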

Inappropriate Comparison
(i) Deaths in Hospitals. The statement that 'the incidence of death
among sick persons is higher in hospitals than at home' is likely to lead
to the conclusion that more patients die in hospitals than at home due
to lack of proper treatment and care. But this conclusion turns out to be
completely erroneous if it is borne in mind that in India only seriously
ailing persons are hospitalised.
(ii) It was claimed by a teacher that his teaching method was
superior to that of others. He supported his argument by showing that all
the students in his class secured a first class. Investigation into the
matter revealed that, unlike the others, all his students had secured a first class
in the previous examination and were merit holders. His success was,
therefore, due to better material in his class rather than to the superiority of
his teaching method.

Defective Method in Selecting Cases
Issue of Abortion. The Morning News reported that 70 per cent of the
people in the country were in favour of legalisation of abortion. It had
come to this conclusion by a statistical analysis and interpretation of the
replies sent to it by its readers in response to a questionnaire. But a
broad based survey made by a social organisation showed that this was
entirely incorrect. In fact, more than 80 per cent of the people were against
it. The newspaper had reached the erroneous conclusion as it was
based on the opinion of educated people who constituted only a small
minority in the population.
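The effect of drawing cases from an unrepresentative group can be illustrated with a weighted average. Only the 70 per cent figure for the newspaper's respondents comes from the text; the population shares and the opinion of the rest of the population are hypothetical values chosen to be consistent with the survey result quoted above.

```python
# Hypothetical shares and opinions. Only the 0.70 figure is from the text.
groups = {
    "newspaper readers": {"share": 0.10, "in_favour": 0.70},
    "rest of population": {"share": 0.90, "in_favour": 0.12},
}

# What the newspaper measured: the opinion of its own readers only.
readers_only = groups["newspaper readers"]["in_favour"]

# What a broad based survey would measure: every group weighted by its share.
whole_population = sum(g["share"] * g["in_favour"] for g in groups.values())

print(readers_only, round(whole_population, 3))
```

Under these assumptions fewer than 20 per cent of the whole population are in favour, although 70 per cent of the respondents were. The arithmetic of the newspaper was not at fault; the selection of cases was.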
DISTRUST OF STATISTICS

Figures may be incomplete or manipulated. Despite its importance and
usefulness the science of statistics is looked upon with a suspicious eye
and is quite often condemned as a tissue of falsehood. It is said that "an
ounce of truth will produce tons of statistics" or that "statistics are lies
of the first order." These statements indicate the extent to which the
science of statistics has fallen into disrepute and is not trusted even in modern
times when its use has spread over all types of human activities. In our
daily life we tend to accept statistical conclusions and the interpretations
placed on them uncritically. But then we are misled so often by skilful
talkers and writers who deceive us with correct facts that we come to
distrust statistics entirely and assert that "statistics can prove anything",
implying, of course, that "statistics can prove nothing." Strangely
enough, whereas on the one hand statistics is condemned in such bitter
language, on the other hand it is also said: "If figures say so it cannot
be otherwise" or "figures don't lie". The reason for such diversity
in views is not far to seek. The reason lies in the innocence of figures.
Figures are innocent and easily believable. It is human psychology that
when facts which are supported by figures come before a man they are
easily believed. Numerical data convey a sense of precision and accuracy
and consequently it is only natural that a man believes a statistical
statement usually without questioning it. There is a great danger in this
type of approach to a numerical statement. Figures which support a
particular statement may not be true. They may be incomplete,
inaccurate or deliberately manipulated by prejudiced persons who wish to
conceal the truth and want to present a false picture to achieve a particular
end. Later on when people realise that even a statistical statement has
belied their expectation, their faith in the science of statistics is shaken
and they begin to condemn it in the strongest possible language. The
fault in such cases does not lie with the science of statistics; it lies with
those who use it. If wrong figures have been used they are bound to
give wrong conclusions and it is the duty of the persons who use statistics
to see that the figures that they use are free from all types of bias and
have been properly collected and scientifically analysed.
Can prove anything is not correct. Sometimes it is remarked that
statistics can prove anything. But people who say so are usually those
who do not know the A. B. C. of the subject. Statisticians rarely claim
to prove anything. They are taught to examine the reliability of their
data and the justification of their conclusion with the utmost suspicion,
due care and caution. Statisticians generally take care that the chances
of their statement being correct are at least 20:1 and as such it is
absolutely wrong to say that statistics can prove anything.
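Odds of 20:1 in favour of a statement can be converted into a probability in one step. The helper name odds_to_probability is hypothetical, and the comparison with the conventional 95 per cent level is an interpretation added here; the text itself states only the odds.

```python
from fractions import Fraction


def odds_to_probability(in_favour, against):
    """Convert odds of in_favour:against into a probability of being correct."""
    return Fraction(in_favour, in_favour + against)


p_correct = odds_to_probability(20, 1)
print(p_correct, round(float(p_correct), 4))
```

Odds of 20:1 thus correspond to a probability of 20/21 of being correct, or a little over 95 per cent, leaving roughly one chance in twenty-one of error.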
Does not prove anything, is merely a tool. Many people disbelieve
statistics because it does not prove a particular thing in a particular
manner. It should be clearly understood that statistics does not prove
anything. Statistics is only a method of approach; it is a tool in the hands
of a statistician to present a phenomenon in a particular manner, nothing
beyond it. The science of statistics does not prove or disprove a thing;
it merely presents the true facts about a problem and leaves the rest to
other people. Different types of conclusions can be arrived at from the
same set of figures if there is a difference in the approach of various
persons. From one set of figures a communist can prove that Russia has
eliminated unemployment and improved the lot of the working class
and from the same set of figures an anti-communist can derive an
opposite conclusion. This fundamental difference in approach, or we may
call it bias in the minds of the investigators, has been responsible for
different conclusions being drawn from the same set of figures. For
this, the science of statistics cannot be blamed. It is not the fault of
the science. It is the mischief of those who use it.
Need of Caution. A layman has, therefore, to be very cautious. If
figures have been given without the context in which they were collected
or if they are not complete, or if they relate to a phenomenon different
from the one under investigation, or even if the figures are correct and
complete but a faulty or biased logic is applied to them, the conclusions
arrived at are bound to be wrong and would strengthen the belief that
statistics are lies of the first order. Unfortunately a set of figures cannot
by itself disclose whether it is dependable or not. Figures do not bear a
trade mark of their accuracy. All figures appear to be correct and
innocent. It is this difficulty of separating good figures from bad ones
which is responsible for discrediting the science to a considerable
extent. It is, therefore, necessary that whenever we use statistics we
should first of all make sure that they were properly collected and are
suitable for the problem under investigation.
Statistical methods are delicate tools. Statistical methods are very
delicate and since they are liable to be misused easily they are very
dangerous as well. The results of the misuse of statistical methods or
statistical data should not be used to discredit the science. If a child cuts his
finger with a sharp knife or an insane person hits his own head or that
of anyone else with a stick, the fault does not lie with the knife or the
stick. It lies with the person who uses it. Similarly if statistical methods
are not properly used the fault does not lie with the science of statistics
but with the person using it. Statistics are tools and can be used in any
way we like and it is in our own interest that we use them in a proper
manner.
"He who accepts statistics indiscriminately will often be duped
unnecessarily. But he who distrusts statistics indiscriminately will often
be ignorant unnecessarily. There is an accessible alternative between
blind gullibility and blind distrust... It is possible to interpret statistics
skilfully. The art of interpretation need not be monopolized by
statisticians, though, of course, technical statistical knowledge helps.
Many important ideas of technical statistics can be conveyed to the
non-statistician without distortion or dilution. Statistical interpretation
depends not only on statistical ideas but also on ordinary clear thinking.
Clear thinking is not only indispensable in interpreting statistics but is
often sufficient even in the absence of specific statistical knowledge...
For the statistician not only death and taxes but also statistical fallacies
are unavoidable. With skill, common sense, patience and above all
objectivity, their frequency can be reduced and their effects minimised.
But eternal vigilance is the price of freedom from serious statistical
blunders." (Wallis and Roberts).

FUNCTIONS OF STATISTICS AND STATISTICIANS

To collect and analyse data. At this stage it is necessary to pause for
a moment to see what are the functions of statistics and statisticians.
We have discussed above that the main function of statistics is to collect
and present numerical data in a systematic manner so that they may be
analysed in a scientific way. Statistics is, as we have seen, not meant
to prove anything; it is merely to analyse the phenomena in a scientific
fashion. Accordingly, the role of a statistician is to collect the data in
a proper fashion, to analyse them scientifically and to set the stage for their
correct interpretation. It is futile to expect him to work wonders or to
give a particular shape to given material. He has simply to arrange the
material in a proper form so that its real worth may be exposed. After
doing this the statistician has finished his job. The task of giving a
particular shape to material is beyond the scope of the science of
statistics. Statistics are like raw materials and to convert them into finished
products is the work of people other than statisticians. The use of
economic statistics for the purpose of formulating an economic policy is
the work of an economist, not of a statistician. A statistician would
simply collect and analyse the economic statistics. He would not
formulate an economic policy on their basis; he would leave this work for
the economist.

In general a successful statistician requires not only a sound
knowledge of statistical methods but he has also to be a specialist in the branch
in which he is carrying on an investigation. If a statistician is asked
to find out if fertilizer A is better than fertilizer B when applied to
potatoes, he should first know all about fertilizers in general, about their
application, the growing of potatoes and many other connected things.
In this case he should be an agricultural expert. A practical statistician
is, therefore, in the first place an engineer, an economist, a biologist
or some other specialist and he has to acquire special knowledge of the
field in which he is making use of statistical methods.
A statistician should also not forget the limitations of the science
of statistics. He should not forget that laws of statistics are true only
on an average, and that he cannot boast of the same precision which
is found in experimental methods. He has to work under various
handicaps and he should be very cautious and vigilant. Even a slight
mistake on his part is liable to render his entire work useless and
defective. He should be free from bias, should have profound common sense
and should work like a true researcher without any preconceived notions
or conclusions about the problem under investigation. It should not be
forgotten, as W. I. King said, that "statistics is a most useful servant but
only of great value to those who understand its proper use."

Questions

1. Discuss fully the importance of statistics as an aid to commerce.
(B. Com. Allahabad, 1942)
2. "Knowledge of statistics is like a knowledge of foreign language or of
algebra. It may prove of use at any time under any circumstances." Explain.
3. "The statistics of a business can be treated practically and the preparation
and study of business statistics can be made a more exact science than the study of
national and social statistics." Explain. (B. Com. Allahabad, 1932)
4. Explain clearly the statistical methods used in any scientific investigation
and show their importance to theoretical economists and practical businessmen.
5. Discuss the scope, utility and limitation of statistics. (B. Com. Agra, 1937)
6. Discuss the importance of statistics and show how it can help the extension
of scientific knowledge, the establishment of a sound business and the introduction
of social and political reform. (B. Com. Agra, 1942)
7. "Figures never lie". "Statistics can prove anything". Comment on the
above two statements indicating the reasons for the existence of such divergent views
regarding the nature and functions of statistics. (B. Com. Agra, 1948)
8. "Statistics should not be used as a blind man does a lamp-post for support
instead of for illumination". Comment on the above remark. (M. A. Agra, 1946)
9. Write an essay on "Statistics in the Service of the State." (I. C. S., 1946)
10. Discuss the usefulness of statistics to businessmen and members of legislative
councils and local bodies in India. (B. Com. Lucknow, 1942)
11. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics. (B. Com. Lucknow, 1939)
12. "Science without statistics bears no fruit and statistics without science has
no root." Comment.
13. Give the important limitations and uses of statistics. Show its relation to
economics and mathematics. (B. Com. Lucknow, 1938)
14. Explain the utility of maintaining statistics in industrial and commercial
concerns. (B. Com. Agra, 1949)
15. "For the most part statistics is a method of investigation for use when
other methods are of no avail; it is often a last resort of a forlorn hope".
Comment.
16. "Statistical analysis properly conducted is a delicate dissection of
uncertainties, a surgery of suppositions." Explain the above statement.
17. "Public knows too little of the statistician as a conscientious and skilled
servant of true science." In the light of the above statement explain why the
science of statistics is not well known to the common man.
18. "There are lies, damn lies and statistics." Comment.
19. "There is more than a germ of truth in the suggestion that in a society where
statisticians thrive, liberty and individuality are likely to be emasculated." Do you
agree with the above statement? If not, why?
20. Clearly show how in modern times statistics is the science of human welfare.
21. Discuss the functions of statistics and statisticians.
22. "The science of statistics is a most useful servant, but only of great value to
those who understand its proper use."-King. Comment.
23. "He who accepts statistics indiscriminately will often be duped unnecessarily.
But he who distrusts statistics indiscriminately will often be ignorant unnecessarily."
Comment.
24. "When you can measure what you are speaking about and express it in
numbers you know something about it; but when you cannot measure it, when you cannot
express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
-Lord Kelvin.
Elucidate the above statement.
25. "... eternal vigilance is the price of freedom from serious statistical
blunders." Explain.
26. Explain and illustrate the use of statistics in the field of business management.
27. Give some examples of the misuse of statistics and point out the precautions
which should be taken in using statistical data.
Preliminaries to the
Collection of Data 4
Need. In order to apply the statistical methods to any type of
enquiry it is essential that statistical data be collected as statistical
analysis is not possible in the absence of quantitative data. Data are in
fact the fundamentals of statistics. Therefore, an all-important step in
statistical work is the collection of facts and figures. The problem
appears to be very simple and easy on first thought but actually it is
not so. A careful study of the technique of collection of data and their
presentation in proper form is absolutely necessary as these things
form the very foundation of the statistical information that has to be
provided. Every aspect of the problem has to be carefully examined
so that the real purpose of the collection of facts may be fulfilled. As
such in all statistical investigations before the collection of data begins
a large number of preliminaries have to be undergone.
Problem should be numerical. A statistical study is always undertaken
to supply answers to some questions which emerge from any important
problem. But all types of questions cannot be answered statistically.
Therefore, the first thing to be observed by a statistical investigator is
whether the problem and more particularly the question arising out of it
is capable of quantitative expression. Such questions as, how great was
Mahatma Gandhi, how brave was Subhas Chandra Bose or how
virtuous was Aurobindo, cannot be answered by the use of statistical
methods. A question, to be suitable for statistical investigation, should
be like: what is the average production of wheat per acre in India?
What is the national income of this country, or what is the total population
of the Indian Union? All these questions are capable of being
answered in absolute or relative numbers.
Having verified the fact that the problem under consideration is
capable of quantitative study, other things to be thought about are
the object of the investigation and the scope of the enquiry. It is also
necessary to study beforehand the data which have to be collected in order
to fulfil the objects of investigation, and the sources from which the
collection has to be done. When all these things have been decided
then only a decision can be taken about the type of enquiry which is
to be conducted. At this stage the statistical units are decided and
defined and an idea is also formed about the degree of accuracy desired. Thus
before the collection of the data actually begins the following steps must
be carefully discussed and analysed:
1. Object and scope of the enquiry.
2. Sources of information.
3. Type of enquiry to be conducted.
4. Statistical units and their definition.
5. Degree of accuracy desired.
We shall now study them in turn.
OBJECT AND SCOPE OF ENQUIRY
Object. The determination of the object of enquiry is a very
important step in statistical investigation. If the object of the enquiry
is properly determined and defined many difficulties of the collection
and analysis of data are automatically removed. It becomes easy to
decide which data are useful and essential for the purpose of the
investigation and which are comparatively less important and can be left out.
This considerably improves the degree of accuracy of the collected data.
The knowledge of the purpose of enquiry serves as a guide in the
collection of facts and the various difficulties experienced by the investigators
are easily solved if the object of the enquiry is kept in mind. Moreover,
with the object of enquiry in mind, it is always possible to have a uniform
approach to different problems which arise during the course of collection
and analysis of facts and figures. The purpose of an enquiry may be
general or specific. Census of population and census of production are
examples of statistics collected for general purpose, while statistics of
cost of living or of indebtedness of a particular group of persons are
examples of statistics collected for specific purpose.
Scope. It is also essential that the scope of the enquiry is
determined beforehand. The extent to which statistical data can be
useful and should be collected for the purpose of a particular
investigation should be decided before the actual work of collection begins.
If a very large quantity of statistical data are collected they are likely
to become unmanageable and it may not be easy to draw correct
inferences from them. On the other hand, if the quantum of statistical
data collected is inadequate, there is every possibility of the
conclusions drawn being incorrect. It is, therefore, essential that efforts
should be made to come to a correct conclusion about the exact
quantum of data that have to be collected. In some cases a complete
record of the whole data may be necessary while in others only a
selective study might suffice.
SOURCES OF INFORMATION

Primary and secondary data. Having determined the object and
scope of the enquiry it becomes necessary to think about the sources
from which data have to be collected. Broadly speaking, the source
of information may be either: (i) Primary or (ii) Secondary. Primary
data are those which are collected for the first time by the investigator
or by enumerators working under him, while secondary data are those
that have already been collected by others and which are usually available
in journals, magazines or research publications. The nature, scope and
objects of the enquiry have to be taken into account for deciding
whether the data are to be collected originally or whether published or
unpublished information which has already been collected can be utilised
for the purpose of investigation. If statistics have to be collected in the
shape of primary data we shall have to decide about the persons from
whom such information is to be gathered. Some enquiries like
population census or rural indebtedness may involve varied sources
from which data have to be collected, while in small enquiries like
tourist traffic in hill stations or expenses of university education in a
particular state, the sources may be comparatively few and less varied.
If secondary data have to be used adequate precautions must be taken
otherwise results of the investigation are likely to be inaccurate. A
careful study should be made as to how the published statistics were
collected, what was the purpose of their collection and whether they
are suitable for use in the enquiry that is being conducted. Published
statistics should never be taken at their face value.
TYPE OF ENQUIRY
Object and scope. A decision about the type of enquiry most suit-
able for a particular problem can be taken only after a study of a large
number of factors. Among these, the object and scope of the enquiry
are comparatively more important. The type of enquiry is considerably
influenced by the object with which the investigation is conducted,
and the scope of the enquiry has also a considerable bearing on this
problem. If, for example, the object of an enquiry is to find out the
total area under wheat in Uttar Pradesh, the type of enquiry best suited
would be one in which there is complete enumeration. A sample study
would not give dependable results. If, on the other hand, the object
of the enquiry is to find out the normal yield, a sample survey would give
fairly accurate results. Similarly, if the scope of the enquiry is wide, it
has to be of one type and if the scope is narrow the enquiry has to be of a
totally different form. Thus, we find that a decision about the type of
enquiry most suitable for a particular case depends considerably on the
object and scope of the investigation.
Who wants information. Another factor affecting the decision about
the type of enquiry is the answer to the question, on whose behalf the data
are being collected. If the enquiry is being conducted on behalf of the
state, the task of collection becomes comparatively easier as the state can
compel people to supply the necessary information and that too at regular
intervals, and at their own cost. If the investigation is being conducted
on behalf of an institution other than the state, for example, a chamber of
commerce, a university or a trade union, there is the force of moral
pressure only. These institutions can only persuade and request people to
give the necessary information. The type of enquiry in such cases is
bound to be of a different fashion. If the data are being collected by an
individual on his own behalf, the position is still worse. He can only
beg for the information which he needs and the enquiry in this case would be
of a still different type. Besides this the financial resources of different
types of persons or institutions conducting statistical investigations also
differ. A state can spend much more than a private institution and
similarly a private institution can ordinarily spend much more than an
individual. Thus a decision about the type of enquiry is affected by its
financial implications also.
How do data emerge. The manner in which statistical information
emerges is another factor which has a bearing on the type of enquiry
which should be done in a particular case. If the data are to be
collected originally, or in other words, if the primary data have to be
collected the type of enquiry suitable in such cases would differ from the
type which would be ideal if the data in question are secondary. If the
secondary data have to be compiled and used the problems of definition
of terms and units, etc., are no more, as the data have already been collect-
ed with certain definitions of the units which cannot be changed. In case
of primary data various terms and units will have to be defined in the
light of the objects of the enquiry; the manner of the collection of data
in the above case would be entirely different from the previous one.
A decision about the type of enquiry best suited for a particular
investigation should be taken after due consideration of all the factors
discussed above. An enquiry may be
(a) Census or sample
(b) Original or repetitive
(c) Direct or indirect
(d) Open or confidential
We shall now briefly explain these types.
Census or sample. A census enquiry is one in which all the units
connected with the problem are taken into account, while in a sample
enquiry only some selected representative units are studied. Whether
a particular enquiry should be of a census type or sample type depends on
a variety of factors like object, scope and nature of the investigation, as
also on the amount of money available for the purpose. A census enquiry
is usually a costly affair and requires a big organisation which ordinarily
only a state or a big private institution can afford.
Original or repetitive. An original enquiry is one which is conducted
for the first time and a repetitive enquiry is one which is carried on in
continuation or repetition of previous enquiries. In an original enquiry
the plan of investigation has to be drawn whereas in repetitive enquiry
the previous plan is only modified to suit the new situation. It should,
however, be remembered that in the repetitive enquiry the definition of
various terms should not be materially altered as this would render com-
parison inaccurate. It should also be remembered that old definitions
very often need a change, and as such, the advantages of a new definition
should be weighed against the drawback of sacrificing comparability and
continuity of figures.
Direct or indirect. Statistical enquiries may be direct or indirect.
Direct enquiries are those in which the data are capable of quantitative
expression and can be directly measured, whereas indirect enquiries
are those in which the problem is not capable of quantitative measurement
directly. If we have to collect statistics of the age, height, weight or
income of a group of persons the enquiry would be a direct enquiry as all
these things are capable of direct measurement. On the other hand, if
the problem relates to the intelligence or character of a certain group of people,
the enquiry would have to be an indirect one as these phenomena
are not capable of direct quantitative measurement. In such cases some
factors which have an indirect bearing on the problem and which can be
quantitatively measured will have to be studied. For example, to study
intelligence of a group of persons we may have to study the marks ob-
tained by the group in a certain test and thus we may have some idea
about the main problem.
Open or confidential. Another classification of statistical enquiries
can be open or confidential. An open enquiry is one which is not
confidential and the details of which are not kept in secrecy. Most of the
enquiries conducted by the state, private institutions and even individuals
are of this type. However, the results of certain enquiries are not open to
public and are kept confidential. Private bodies like manufacturers'
associations, employers' associations or trade unions sometimes collect
information, the details of which are confined only to their members and
none else. Such enquiries are of a confidential type.
STATISTICAL UNITS

Need of definition. The collection of statistics necessitates measurement
or counting, and as such it is essential that the unit in which the
data are to be collected should be properly defined. In the absence
of a proper definition of the unit it is quite likely that the items which
should have been included are omitted and those which should have
been omitted are included. At first thought it might appear to be
a very easy and even unnecessary step but a little thinking would clearly
show how difficult and important the problem is.
Physical and arbitrary units. "The unit of measurement applied to
the data in any particular problem is the statistical unit." In many
studies the unit to be used is conventionally fixed and is well determined
and defined. Physical units of measurement like ton, pound, yard, foot,
inch, hour and year, etc., are examples of this type. These units do not
need any explanation or definition. However, in many statistical studies
such customary and legal units are not available. It is in such cases that
a statistician has to arbitrarily decide about a unit and has to give it a
proper definition. In social sciences such situations arise very frequently.
For instance, if an enquiry is undertaken about the wages of workmen
in any industry the unit of measurement will have to be carefully defined.
Wage is a very general and vague term. It may refer to money wage
or real wage, piece wage or time wage, of skilled worker or of unskilled
worker, weekly wage or monthly wage, and so on. Further, a week may
be of 48 working hours or of 40 working hours or less or more. Under
such circumstances, which unit of wage income should be used is a
question not easy to answer. It is, therefore, essential that a statistician
defines the units of data before he commences the work of collection.
The unit of measurement should be uniform throughout the study of a
particular problem.

Requirements of statistical units


Should be unambiguous and specific. The first and foremost requirement
of a statistical unit is that it should be unambiguous and unmistakable.
If the unit is not specific, and if its meaning is liable to be
misunderstood, the data collected would suffer from various types of
inaccuracies. It is necessary that the units are properly defined, as in the
absence of proper definition ambiguity about its meaning is bound
to arise.
Should be stable. The statistical unit should be stable. If there
are significant fluctuations in the value of a unit the data collected at
different times or at different places would not be comparable and much
of their utility would be lost. Fluctuations in the value of currency or
difference in weights and measurements at different places may create
unending difficulties in comparison and analysis.
Should be appropriate to enquiry. The unit should further be appro-
priate to the enquiry and should be capable of correct ascertainment.
As has been noted earlier the definition and concept of a statistical unit
differ from enquiry to enquiry. Price may mean retail price in one en-
quiry, wholesale price in a second enquiry and cost price in a third en-
quiry. It may be used in other senses also. It is essential that the
unit is defined in such a manner that it completely suits the purposes
of enquiry.
Should be homogeneous. Homogeneity of the units is another essential
thing which requires careful consideration. The unit must be uniform
throughout the enquiry. The unit should imply as far as possible the
same characteristics at different times or at different places. If the data
are not homogeneous they can be broken up in groups and sub-groups to
secure uniformity. For instance, if data relating to industrial accidents
are being collected, the accidents can be divided into a number of classes
on the basis of the type of injury and compensation claimed. Thus even
heterogeneous data can be made homogeneous in small groups to ensure
uniformity in study.
Types of statistical units
Broadly speaking statistical units can be of two types, viz.:
(a) Units of collection.
(b) Units of analysis.
(i) Units of collection. Units of collection are those units in which
figures relating to a particular problem are either enumerated or estimated;
for example, the production of wheat in India may be estimated
in tons, the consumption of electricity in kilowatts and the exports
of cotton in bales. Units of measurement may be either simple or
composite.
Simple units of collection like ton, pound, bale, kilowatt, yard, and
hour, etc., are not at all difficult to define. Their meaning is general
and they are in common use. However, care should be taken in their
actual use. For example, bales of cotton can be of different weights. In
such cases a standardised definition of the units must be used and this
fact should be mentioned. Similarly most of the monetary units have
different values in different countries and even in the same country at
different times the values are not the same. Allowance for such variations
must always be made.
A composite unit is one which is formed by adding a qualifying word
to a simple unit with the result that its scope becomes restricted and its
definition becomes rather difficult. "Mile" is a simple unit and its scope
and meaning are very clear; but if this word is preceded by a qualifying
word "ton" then "ton-mile" becomes a composite unit. It has now a
restricted scope and it requires a special definition. Ton-miles are equal
to the number of tons multiplied by the number of miles carried. Other
examples of composite units are passenger-miles, labour-hours, kilowatt-
hours and bus-miles.
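The definition of ton-miles given above can be illustrated with a short computation. The sketch below is only an illustration: the function name `ton_miles` and the consignment figures are hypothetical and are not taken from any actual traffic returns.

```python
def ton_miles(consignments):
    """Total ton-miles: for each consignment, tons carried times miles hauled."""
    return sum(tons * miles for tons, miles in consignments)


# Hypothetical consignments, each recorded as (tons carried, miles hauled).
consignments = [(10, 50), (4, 120), (25, 8)]

total = ton_miles(consignments)
print(total)  # 10*50 + 4*120 + 25*8 = 1180
```

The same pattern applies to other composite units such as passenger-miles or labour-hours: the simple unit is multiplied by the qualifying quantity for each item before the items are totalled.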
(ii) Units of analysis. Units of analysis and interpretation, as their
name suggests, are those units with which statistical data are analysed and
interpreted. They include ratios, percentages, rates and coefficients. All
these are very useful for the purpose of comparison. Comparison in
statistical analysis may relate to time, place or condition. A series relating
to annual production of manganese in India during the last ten years is a
time-series; if a comparative study of the production in different years is
to be undertaken, it can best be done by calculating ratios, coefficients
or percentages. Similarly series relating to space or condition are also
analysed with the help of such units. Ratios and coefficients involve
comparison between the numerator and denominator both of which are
supposed to be homogeneous. Similarly percentages and rates (per 1000)
are comparisons of certain figures in relation to a fixed level of 100 and
1000 respectively. Elsewhere in this volume we shall discuss the fallacies
of ratios and percentages and we shall point out the precautions which
should be taken while making use of these units for the purpose of making
comparisons or drawing inferences.
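The units of analysis described above reduce to simple arithmetic, as the following sketch indicates. The function names and all the figures are hypothetical and serve only to illustrate percentages (level of 100), rates (level of 1000) and the comparison of a time-series with a chosen base year.

```python
def percentage(part, whole):
    """Express part in relation to a fixed level of 100."""
    return 100 * part / whole


def rate_per_thousand(events, population):
    """Express events in relation to a fixed level of 1000."""
    return 1000 * events / population


def relative_to_base(series, base_year):
    """Express every year's figure as a percentage of the base year's figure."""
    base = series[base_year]
    return {year: round(100 * value / base, 1) for year, value in series.items()}


# Hypothetical annual production figures (same unit in every year).
production = {1961: 200, 1962: 250, 1963: 230}

print(percentage(45, 180))                 # 25.0
print(rate_per_thousand(36, 1200))         # 30.0
print(relative_to_base(production, 1961))  # {1961: 100.0, 1962: 125.0, 1963: 115.0}
```

In each case the numerator and denominator must be expressed in the same homogeneous unit, otherwise the resulting ratio has no meaning.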
DEGREE OF ACCURACY
Absolute accuracy is impossible. Before commencing the work of
actual collection of data it is necessary that the investigator has some
idea in his mind about the degree of accuracy which he desires in his
estimates. The type of enquiry and the mode of collection of data
are affected to a considerable extent by the degree of accuracy which
is aimed at. It should be kept in mind that absolute accuracy is impossible
to achieve, and as such efforts must be made to achieve only a reasonable
standard of accuracy. In most of the statistical investigations,
perfect accuracy, even if it were attainable, is hardly of much use and a
reasonable degree of accuracy is enough to draw dependable inferences.
A decision about the degree of accuracy should be made with regard
to the purpose of investigation and the nature of enquiry. The degree of
precision needed by a grain merchant in weighing grain is much less than
that needed by a chemist in weighing medicine.
The standard of accuracy aimed at should be stated, and if possible,
the limits of the probable error should also be mentioned. We shall
discuss the concept of accuracy in greater detail in the next chapter.
The above discussion gives in brief an idea of the preliminaries
that are necessary before the actual work of collection of data
commences. In the next chapter we shall discuss the methods of the
collection of statistical data.

Questions
1. Discuss the preliminary steps which should be taken before commencing
the work of collection of data.
2. Why is it necessary to determine the object and scope of the enquiry before
planning an investigation?
3. What is a statistical unit? Is it necessary that the data be homogeneous?
(B. Com. Agra, 1939).
4. What steps would you take to organise an economic survey of a typical
Indian village?
5. Describe the various stages in conducting a primary economic investigation.
What precautions will you take at each stage? (M. A. Punjab, 1950).
6. What is meant by (a) units of collection, and (b) units of analysis? Explain
their respective uses.
7. Differentiate between simple and composite units. Give examples of each.
8. Write a note on the purpose and utility of planning a statistical investigation.
9. What is meant by degree of accuracy? How should it be determined?
10. Distinguish between primary and secondary data. Illustrate your answer
with examples.
Collection of Primary and
Secondary Data 5
Primary and secondary data. After the preliminaries discussed in
the last chapter have been gone through, the task of the collection of
data begins. Statistical data, as we have already seen, can be either
primary or secondary. Primary data are those which are collected for
the first time and are thus original in character, whereas secondary
data are those which have already been collected by some other persons
and which have passed through the statistical machine at least
once. Primary data are in the shape of raw materials to which statistical
methods are applied for the purpose of analysis and interpretation.
Secondary data are usually in the shape of finished products since
they have been treated statistically in some form or the other. After
statistical treatment the primary data lose their original shape and become
secondary data. On a closer examination it will be found that the dis-
tinction between primary data and secondary data in many cases is one
of degree only. Data which are secondary in the hands of one may be
primary for others. Statistics of agricultural production are secondary
data for the Agriculture Department of a Government, but for the pur-
pose of calculation of national income these data are primary, because
they will have to go through further analysis and their shape will not
remain the same.
Factors affecting choice of method. It is obvious that the methods
of the collection of primary data and secondary data would not be exactly
identical because in one case the data have to be originally collected
while in the other the work is of the nature of compilation. There are
various methods of the collection of primary and secondary data and the
choice of the method depends on a number of factors. Nature, object
and scope of the enquiry are the most important things on which the
selection of the method depends. The method selected should be
such that it suits the type of enquiry that is being conducted.
Availability of finance is another factor which influences the selec-
tion of the method of collection of data. When financial resources at
the disposal of the investigator are scanty he shall have to leave aside
expensive methods even though they are better than others which are
comparatively cheap.
Availability of time has also to be taken into account. Some methods
involve a long duration of enquiry while with others the enquiry can be
conducted in a comparatively shorter duration. The time at the disposal
of the investigator thus affects the selection of the technique by which
data are to be collected.
METHODS OF COLLECTING PRIMARY DATA
The following methods of the collection of primary data are in
common use:
(a) Direct personal investigation.
(b) Indirect oral investigation.
(c) By schedules and questionnaires.
(d) By local reports.
We shall briefly discuss each of them in turn.
Direct personal investigation
In direct personal investigation as the name suggests the investi-
gator has to collect the information personally from the sources con-
cerned. He has to be on the spot for conducting the enquiry and has to
meet people from whom data have to be collected. It is necessary that in
such cases the investigator has a keen sense of observation and he is
very polite and courteous. He should further acquaint himself with
local conditions, customs and traditions so that he is in a position to
identify himself fully with the persons from whom the information is
sought. In some cases it may not be possible or worthwhile to contact
directly the persons concerned and in such cases the investigator has to
cross-examine other persons who are closely in touch with the sources
of data. The information elicited in such a manner should be carefully
used and the investigator should make sure that the persons from whom
data are being collected actually know the facts fully and can deliver
the goods. The investigator has to be very tactful and cautious in such
cases. He should put easy and simple questions which are capable of
being answered precisely and in a language which is not vague.
The method of direct personal investigation is suitable only for
intensive investigations. It involves enormous cost and usually requires a
long time. It is naturally not suitable for extensive enquiries where the
scope of investigation is wide. Further, in this method the bias or
prejudice of the investigator can do a lot of damage as he is in sole charge
of the collection of data. This method, however, gives very satis-
factory results if the scope of the enquiry is narrow and if the investigator
is fully dependable and is completely unbiased.
Indirect oral investigation
When the above mentioned method cannot be used either on account
of the reluctance of persons to part with information when approached
directly, or on account of the extensive scope of the enquiry or on account
of some other reason an indirect oral examination can be conducted. In
this method data are not collected directly from the persons concerned but
through indirect sources. Persons who are supposed to have knowledge
about the problem under investigation are interrogated and the desired
information is collected. Usually in such enquiries a small list of questions
relating to the investigation is prepared and these questions are put to
different persons (known as witnesses) and their answers are recorded.
Most of the commissions and committees appointed by the Government
to collect statistical data or to carry on such investigations in which
factual data have to be compiled, make use of this method. They request
different types of people to come and give evidence and on the
basis of these records, facts about different problems are ascertained. In
such enquiries the evidence of one person should not be relied upon and
the views of a number of persons should be ascertained to find out the
real position. In this method the accuracy of data collected would largely
depend on the type of persons whose evidence is being recorded. It is,
therefore, necessary to be very cautious in the selection of these persons.
Invariably it should be seen that the person who is being questioned
(a) knows full facts of the problem under investigation;
(b) is not prejudiced;
(c) is capable of expressing himself correctly and can give a true
account; and
(d) is not motivated to give colour to the facts.
Proper allowance should be made for the inherent optimism or
pessimism of the informants. Some people by nature are optimists while
others are pessimists. These persons may be honest and unbiased and
yet their evidence in most cases is likely to be affected by their inherent
psychology. The well-known example of two drunkards (one optimist
and the other pessimist), each of whom was left with half a glass of wine,
illustrates the point very clearly. The optimist said, "What do I care
for the world, I have yet half the glass with me" and the pessimist
remarked, "What can I do in this world, I have only half the glass with
me." Both of them were stating facts correctly and yet the two state-
ments give entirely different impressions.
Schedules and questionnaires
An important method of the collection of data followed usually
by private individuals, research workers, non-official institutions and
sometimes the Government also, is that of schedules and questionnaires.
In this method a list of questions relating to the problem under investi-
gation is prepared and printed and information is collected from various
sources in any of the following ways : -
(a) By sending the questionnaire to the persons concerned and requesting
them to answer the questions and return the questionnaire.
The main advantage of this method is that it is least expensive
and with it, information can be collected from a wide area in a com-
paratively short period of time. If the investigation is properly conducted
the method can easily ensure a reasonable standard of accuracy. Success
in this method depends on the co-operation that the informants are pre-
pared to give. Generally it has been found that the informants adopt an
attitude of indifference towards such enquiries and in many cases do not
even return the questionnaire. Even those who answer the questions do
so most haphazardly and in a very vague and unintelligible manner.
Only those persons who are under the authority of the investigator or
investigating institution or those who are obliged to them in some form
or the other devote some time and energy in answering the questions. In
order to have correct answers the investigator should send a very polite
letter to the informants emphasising the need and usefulness of the
investigation that is being conducted and requesting them to give their
co-operation by sending correct replies. He should further give them
an assurance that if the informants so wish their replies would be kept
confidential. Further the questions that are asked should be very
carefully framed. The questions should be:
(1) Short and clear.
(2) Easy to understand and answer.
(3) Few in number.
(4) Free from ambiguity.
(5) Such as can be answered in Yes or No if opinion is sought on
a particular point.
(6) Corroboratory in nature.
(7) Not such as call for confidential information.
(8) Not such as may hurt the sentiments of the informants
or may arouse resentment in their minds.
However, this method cannot be used if the informants are illiterate.
If they are literate but adopt an indifferent attitude then also the method
should be used with utmost caution as in such cases likelihood of error
is very great.
(b) By sending the questionnaires through enumerators to help the
informants in filling in the answers.
In this method the enumerators go to the informants along with the
questionnaires and help them in recording their answers. The enumerators
explain the aims and objects of the investigation to the informants
and also emphasise the necessity and usefulness of correct answers. They
also remove the difficulties which any informant may feel in understand-
ing the implications of a particular question or the definition or concept of
difficult terms. This method is very useful in extensive enquiries and
with it, fairly dependable results can be expected. It is, however, very
expensive and usually such enquiries can be conducted only by the
Government. Population census all over the world is conducted by this method.
In such enquiries it is necessary that not only the questions are simple
and few in number but the enumerators are also courteous and polite
and have proper training.
The selection of enumerators is a very important task and should be
carefully done. The nature, scope and subject of the investigation should
be thoroughly explained to the enumerators and they should properly
understand the implications of the different questions put and the
definitions of the various terms used. The enumerators should have
intelligence and capacity of cross-examination for the purpose of finding
out the truth and they should be persons who are hard-working and
should have patience and perseverance.
By local reports
The last method of collection of primary data is through local
reports. In this method data are not formally collected by enumerators
but by the local correspondents or agents in their own fashion and to
their own likings. Obviously such data cannot be very reliable and
as such this method is used in those cases where the purpose of investigation
can be served with rough estimates only and where a high degree
of precision is not necessary. This method has the advantage of being
least expensive and it also saves the botheration usually associated with
statistical investigations of other types.
REPRESENTATIVE DATA
As has been pointed out previously a statistical investigation can
be either of census type or of sample type. In a census enquiry all the
units associated with a particular problem are taken into account whereas
in a sample enquiry only a few selected units are studied and on the
basis of such studies attempts are made to draw generalisations which
may be applicable to the whole data. If, for example, we have to find
out the average monthly expenditure of the 2000 students residing in the
hostels of the Allahabad University and if we hold a census investigation
we shall have to study the monthly expenditure of each one of these 2000
students. If, however, we hold a sample investigation we shall select, say,
200 students out of these 2000 and then study their expenditure. On the
basis of the study of these 200 units (technically called a "sample") we
can draw conclusions which will hold good about the expenditure of all
the 2000 students (technically called a "universe" or "population").
The sample is considered to be representative of the universe and if the
sample has been properly selected and if its size is adequate, whatever
holds good for the sample should also hold good for the universe. If
the scope of the enquiry is very wide a census investigation would not
only be very expensive but highly cumbersome also. Moreover it will
take a very long time and require a large number of enumerators. In
such cases a sample investigation is very suitable. A sample usually
gives representative data and the generalisations made on the basis of
such data usually hold good for the universe.
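The hostel example can be tried out numerically. The sketch below is only an illustration under stated assumptions: the 2000 expenditure figures are artificially generated (they are not data from any university), and the sample of 200 is drawn with Python's pseudo-random generator. It compares the census average with the sample average.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so that the illustration is repeatable

# Hypothetical universe: monthly expenditure of 2000 students,
# generated artificially around an assumed average of 150 rupees.
universe = [random.gauss(150, 30) for _ in range(2000)]

# Census enquiry: every one of the 2000 units is studied.
census_mean = mean(universe)

# Sample enquiry: only 200 units, selected at random, are studied.
sample = random.sample(universe, 200)
sample_mean = mean(sample)

print(round(census_mean, 2), round(sample_mean, 2))
```

With a properly selected sample the two averages would ordinarily be close, although they need not coincide exactly.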
The most important point, however, is the selection of the sample.
A sample study would give dependable conclusions only if the sample is
a true representative of the universe. Broadly speaking there are two
methods by which samples can be selected and they are:
(1) Deliberate or purposive sampling,
(2) Random or chance sampling.
Deliberate selection or purposive sampling
In deliberate selection or purposive sampling the investigator himself
chooses from the universe a few such units which according to his
estimate are the best representatives of the population. His selection is
deliberate and is based on his own ideas about the representativeness of
the sampled units. These selected units are intensively studied and
certain conclusions are arrived at. It is supposed that these conclusions
would hold good for the whole population.
¹ For a detailed study see chapters on Sampling.
This technique of selection has many drawbacks. The first and
the foremost of them is that the bias or prejudice of the investigator has enough
scope to work and influence the selection. If the investigator is biased, it is
but natural that he would select such a sample which would give
conclusions which suit his requirements and views. If, for example, an
investigator wants to show that the expenses of students residing in
the hostels of the university are very high he can select such a sample
which consists of those students only who are very aristocratic and who
spend much more than others. Another defect of purposive sampling
is that it is not possible to have an idea about the degree of accuracy achieved in
any statistical investigation conducted by this method. If the scope
of enquiry is very wide the selection of the sample by this method can
never be recommended. However, if the investigator is unbiased and
has the capacity of keen observation and sound judgment even purposive
selection can give fairly dependable results.
Chance selection or random sampling
In random sampling the selection of the units is done in such a
manner that the chance of selection of each unit of the universe is the
same. In other words, the selection of the units depends entirely on
chance and one does not know beforehand which units will actually
constitute the sample. It is for this reason that this method is also
known as the method of chance selection. It is in fact a lottery method of
selection. How can such a selection be made is a question not easy to
answer. Methods, which on first thought, appear to be perfectly ran-
dom may actually prove to be otherwise. If we have to select a sample
of 200 students out of 2000 hostellers, we can write their names on small
chits of papers and after folding them and mixing them together can
blind-folded draw 200 chits. This is the lottery method. It appears to
be random but in actual practice it may not be so. It is quite possible
that some chits were folded and pressed less than others and so their
size was slightly bigger than the size of other chits and if it was so, the
chance of the selection of bigger chits cannot be said to be the same as
that of the smaller ones. Thus, we see that it is not easy to have a purely
random sample. However, various methods are in vogue and among
them the technique of Tippett's Numbers¹ is most popular. In random
sampling attempts are made to eliminate human bias in all forms and that
is why selections are usually made with the help of machines. Each unit
of the universe is assigned a number and then certain numbers are mecha-
nically selected to constitute a sample.
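The mechanical procedure described above (number every unit, then let a device pick the numbers) can be sketched as follows. This is not the method of Tippett's Numbers; a pseudo-random number generator is used here in place of a printed table, and the function name `draw_sample` and the seed value are assumptions made purely for illustration.

```python
import random


def draw_sample(universe_size, sample_size, seed=None):
    """Number the units 1..universe_size and draw sample_size distinct numbers.

    Every unit has the same chance of being selected, and no unit
    can be selected more than once.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, universe_size + 1), sample_size))


# 200 students to be selected out of 2000 hostellers, each identified by a number.
selected = draw_sample(2000, 200, seed=42)
print(selected[:10])  # the first few selected serial numbers
```

Because the numbers are produced by the machine and not by the investigator, the difficulty of unequal chits in the lottery method does not arise.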
Chance selection or random sampling has many advantages over
purposive selection. The most significant merit of this system is that by
theory of probability it is possible to have an idea about the errors of
estimation, and we can always find out whether the results are significant
or not. It is possible to assign limits within which the true value of
a measure of the universe can be expected to lie with a stated degree of
probability. Another point in favour of
this method is that the selection is not affected by the prejudice or bias of the
investigator. As we have noted above, the selection in most cases under
this system is made by mechanical devices and naturally human bias has
hardly any scope here. But it must always be kept in mind that in many
cases it is difficult to say that the selection has been purely random and
that the sample is fully representative of the universe. However, as
far as possible, the selection of the sample should be done on a random
basis as it is always likely to give better results than the method of pur-
posive selection.
Accuracy and size of sample. It should further be kept in mind that
the size of the sample has a relation to the degree of accuracy that it is
expected to achieve. Ordinarily, the bigger the size of a sample the
greater would be the accuracy, but a very big sized sample is likely to
become unmanageable and is very often unnecessary. No hard and fast
rule can be laid down with regard to the size of a sample. An ideal size
would depend on the type of the series and the size of the universe. If
the series is comparatively more variable the sample should be big to
cover up all types of variations. Again, ordinarily the bigger the universe
the greater should be the size of the sample. The accuracy in a random
sample has more or less a fixed relation to its size. The accuracy of a
sample increases in proportion to the square root of its size.
If, for example, the degree of accuracy desired is to be
doubled, the size of the sample should be increased four-fold; if it is to
be trebled the size of the sample should be increased nine-fold. We
shall study this relationship in further detail in the chapters on Sampling.
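The square-root relationship stated above can be sketched in a few lines of Python. This is only an illustration, assuming that "accuracy" is measured by the inverse of the standard error of a sample mean (which varies as one over the square root of the sample size); the function names and the figures used are hypothetical and do not come from the text.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n) (assumed measure of accuracy)."""
    return sigma / math.sqrt(n)

def required_sample_size(current_size, accuracy_factor):
    """Sample size needed to multiply the accuracy by accuracy_factor.

    Accuracy varies as the square root of the size, so the size must be
    multiplied by the square of the factor.
    """
    return current_size * accuracy_factor ** 2

# Hypothetical starting sample of 200 units.
print(required_sample_size(200, 2))   # size needed to double the accuracy
print(required_sample_size(200, 3))   # size needed to treble the accuracy

# Quadrupling the sample halves the standard error (sigma = 10 is assumed).
print(standard_error(10, 100), standard_error(10, 400))
```

The sketch also shows why very large samples are often unnecessary: each further gain in accuracy costs a disproportionately larger sample.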
Random sampling and the theory of probability. The technique of
random sampling is based on the Theory of Probability.¹ Probability
is a mathematical concept and indicates the likelihood or the chance of the
happening or not happening of a particular event. If, for example, a coin is
tossed it can fall in two ways, either with head up or tail up. Each
of these ways is equally likely and so the probability of the coin falling
head or tail up is equal. It is 1/2. If, however, a dice is thrown, there
are six possible ways in which it can fall. The probability of its falling
with No. 6 up is 1/6 because the chance of its falling with any of the six
numbers upward is equal. The chance of the dice not falling with 6
upward would be 5/6 as there are five ways in which it can fall without
No. 6 being upward. Thus, if an event can happen in a ways and fail to
happen in b ways and if each of them is equally likely, the chance of its
happening would be $\frac{a}{a+b}$ and of its not happening $\frac{b}{a+b}$. If from a
pack of 52 cards one card is drawn at random the chance of its being
any king is clearly 4/52 and the chance of its being any card of spade is
13/52. This clearly indicates that if the chances of selection of all the
units in a universe are equal, and if from it selections are made at
random, then the probability is that in the sample so selected the various types
of units would be in nearly the same proportion in which they are in the universe.
On this basis it is said that random sampling gives a representative
sample which contains the characteristics of the population. Further, as
has been pointed out earlier, the size of the sample and its accuracy are
also related. In ten tosses of a coin it is not unlikely that seven times it
falls heads and only three times tails. But if there are 100 tosses there is
a greater chance of the proportions of heads and tails being nearly equal.
If the number of tosses is 1000 the chance of a nearly equal distribution
of heads and tails is still greater.
The bigger the size of the sample the greater is the chance of accuracy.

¹ For detailed description see chapters on Sampling.
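The coin-tossing argument can be checked exactly rather than by experiment. The sketch below assumes a fair coin and takes "nearly equal" to mean that the proportion of heads lies within 10 percentage points of one half; that tolerance, and the function names, are assumptions made for illustration only.

```python
from math import comb

def prob_exactly_equal(n):
    """Exact probability of exactly n/2 heads in n tosses of a fair coin (n even)."""
    return comb(n, n // 2) / 2 ** n

def prob_nearly_equal(n, tolerance_percent=10):
    """Exact probability that the proportion of heads lies within
    tolerance_percent percentage points of 50% in n tosses of a fair coin."""
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        # |k/n - 1/2| <= tolerance/100, written with whole numbers only
        if abs(2 * k - n) * 100 <= 2 * tolerance_percent * n
    )
    return favourable / 2 ** n

for n in (10, 100, 1000):
    print(n, round(prob_exactly_equal(n), 4), round(prob_nearly_equal(n), 4))
```

The two columns move in opposite directions. The chance of heads and tails being exactly equal falls as the number of tosses grows, while the chance of their proportions being nearly equal rises. It is the second sense in which a bigger sample is more accurate.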
Law of statistical regularity. Thus according to the rules of the
theory of probability, if from the universe a moderately large sized sample
is chosen at random, it is almost certain that on an average the sample so
chosen will have the same characteristics as the universe. It is on this
basis that games of chance are played successfully by a large number of
persons and the insurance companies are able to insure people against various
types of calamities. In statistics this law is known as the "Law of Statistical
Regularity". It is a corollary to the main theory of probability.
The theory of probability tells us of the mathematical expectation of the success or
failure of an event and on this basis the law of statistical regularity tells us that
random selection from the universe is very likely to give a representative sample.
Law of inertia of large numbers. We have mentioned above that
there is a relationship between the size of a sample and its accuracy.
The larger the sample the greater would be the accuracy. The reason
for this lies in the fact that in large numbers the chances of compensatory
action are greater. If in the first ten tosses of a coin there are seven heads
and three tails, it is quite possible that in the next ten tosses the situation
might be reversed and there may be seven tails and three heads. The
larger the number of such experiments the greater are the chances of
one irregularity compensating the other. It is said on this basis that
large numbers have got an inertia or that they are more constant. The
production of wheat in the district of Allahabad might show great
variations year after year but the production figures of the state of U. P. would
not vary much, because if in some districts the crop is above normal it is
very likely that in others it might be below normal. Similarly the
production figures of wheat for the whole of India would show still less
variation and the figures of world production would show hardly any
significant change. This phenomenon is characterised as the "Law of
Inertia of Large Numbers" which states that large numbers are relatively
more constant and stable than small ones. It is on the basis of this law
that we say that the larger the size of the sample the greater would be its
accuracy.
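A rough numerical sketch of this compensating effect follows. It assumes, purely for illustration, that the total is made up of independent units of similar size (for example, districts whose crops vary independently); under that assumption the relative variation of the total falls as the square root of the number of units. Real districts are not independent, so the figures are indicative only, and the function name is hypothetical.

```python
import math

def relative_variation_of_total(unit_cv, n_units):
    """Coefficient of variation of a total of n_units independent, similar units.

    If each unit has mean m and standard deviation s, the total has mean
    n*m and standard deviation s*sqrt(n), so its relative variation is
    (s/m) / sqrt(n).
    """
    return unit_cv / math.sqrt(n_units)

# Hypothetical figures: each district's output varies by 20% from year to year.
for n in (1, 25, 400):
    print(n, relative_variation_of_total(0.20, n))
```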
It should not be concluded from the above discussion that the law
of inertia of large numbers does not allow any change in figures with the
passage of time. All that it means is that large numbers are more constant
and stable than small ones. There are no violent fluctuations in large
numbers. After all the figures of world production of wheat do change
from time to time but these changes are not violent and sudden. They
are slow and gradual. Long-period trend is indicated by large numbers:
they simply ignore the short-period regular and irregular fluctuations.
COLLECTION OF SECONDARY DATA
Sources of secondary data
We know that secondary data are those which have already been
collected and analysed by someone else, and as such the problems associated
with the original collection of data do not arise here. Secondary
data may be either published or unpublished. The sources of published data
are usually:
(a) Official publications of the central, state and the local governments.
(b) Official publications of foreign governments or international
bodies like the United Nations Organization and its
subsidiary bodies.
(c) Reports and publications of trade associations, chambers of
commerce, banks, co-operative societies, stock exchanges, and
trade unions, etc.
(d) Technical trade journals like the Economica, Indian
Journal of Economics, Commerce, Capital, etc., and books
and newspapers.
(e) Reports submitted by economists, research scholars, university
bureaus and various other educational associations, etc.
The sources of unpublished data are varied, and such materials may
be found with scholars and research workers, trade associations, chambers
of commerce, labour bureaus, etc. Many enquiries of a private
nature are conducted by these bodies and these findings are not published
and are usually meant for the consumption of their members only.

Editing and scrutiny of secondary data


The secondary data must be used with caution. It is usually very
difficult to verify such data and to edit them to find out inconsistencies,
probable errors and omissions. Scrutiny of the secondary data is essential
because the data might be inaccurate, unsuitable or inadequate. In the
words of Bowley, "It is never safe to take published statistics at their face
value without knowing their meanings and limitations and it is always
necessary to criticise arguments that can be based on them." Statistics
collected by other people cannot be fully depended upon as they may
contain many pitfalls and unless they have been thoroughly scrutinized
they should not be used.

The secondary data should possess the following attributes:

(i) They should be reliable. The reliability of the data can be tested
by finding out:
(a) Who collected the data and from which sources?
(b) Are both the compiler and the source dependable?
(c) Were the data collected by the use of proper methods?
(d) At what time were the data collected? Can it be regarded
as normal time?
(e) Are there any possibilities of deliberate or unconscious bias
on the part of the compiler?
(f) What degree of accuracy was desired by the compiler? Was
it achieved?
(ii) They should be suitable for the purpose of investigation. Even if the
data are reliable they should not be used if they are found to be unsuitable
for the purpose of investigation. Data which are suitable for one
enquiry may be entirely unsuitable for another. An example would make
the point clear. If an enquiry is being conducted about the level of
earnings of factory workers and if some data collected by some agency
relating to wage level are being utilised, it is quite likely that these data
may be unsuitable for the purposes of the present enquiry. It is possible
that the data which are being used might relate to wages of skilled labour
only or might relate to the wages of day workers only or might include
bonus payments. In all these cases the data are unsuitable for investigating
the earnings of factory workers. The definition of various terms and
units of collection must also be carefully scrutinized and the object,
scope and nature of the enquiry should also be properly studied. If
there are differences in these, the data are not fit to be used.
(iii) They should be adequate. The data may be found to be reliable
and suitable but they may be inadequate for the purpose of the enquiry.
The original data may refer to an area which is wider or narrower than
the area of the present enquiry and if it is so, they should not be used,
because there might be significant variations in different regions.
Further the data may not cover suitable periods; for a monthly study of a
phenomenon, yearly figures are inadequate. Again the degree of accuracy
achieved in the data may be found to be inadequate for the purpose of
the investigation in which they are proposed to be used.
Thus it is very risky to use statistics collected by other people unless
they have been thoroughly scrutinized and found reliable, suitable and
adequate.
Questions
1. Distinguish between primary and secondary data. What are the various
methods by which primary data are collected?
2. "In collection of statistical data commonsense is the chief requisite and
experience the chief teacher." Discuss the above statement with comments.
(M. A. Patna, 19~1).
3. Mention the different kinds of statistical methods generally used in
investigations. Are there any fields of enquiry where these methods cannot be used?
(B. Com. Agra, 1940).
4. "Though figures cannot lie, yet liars can figure". Expand the above
statement so as to explain its bearing on the use of secondary statistical data.
(M. Com. Allahabad, 1945).
5. How will you organise an investigation into the handloom weaving industry
of Uttar Pradesh? Prepare a questionnaire for the purpose.
(B. Com. Allahabad, 1942).
6. How far do the results of statistical investigations depend upon correct
sampling? Compare the methods used to secure representative data.
(B. Com. Agra, 19~9).
7. State and explain the law of statistical regularity. Discuss the methods
generally used in sampling. (D. Com. Agra, 1941).
8. Compare the different methods used in the collection of numerical data.
Explain the importance of determining a statistical unit. (B. Com. Agra, 1942).
9. Distinguish between a census and a sample enquiry and briefly discuss their
comparative advantages. Which of these methods would you prefer for calculating
the total wages of workers in a given industry? (M. Com. Agra, 1947).
10. You are required to undertake a rapid sample survey for estimating average
size of a holding for your province. How would you plan the survey and how would
you use the results of this survey on a subsequent occasion?
11. It is desired to obtain reliable data to find out the cost of production of
sugarcane in Uttar Pradesh. How will you proceed to organise the enquiry? What various
points of importance will you consider and what decisions on each such point would
you make? (I. C. S. 1948).
12. What is a random sample? Explain the distinction between a random sample
and a representative sample. How would you apply the technique of random
sampling to an enquiry into working class family budgets?
13. Classify the methods generally employed in the collection of statistical data
and state briefly their respective merits and demerits. (B. Com. Allahabad, 1946).
14. Draw up a suitable questionnaire for surveying the economic aspects of any
cottage industry in which you may be interested. Briefly indicate how you will
proceed to collect the relevant material.
15. Discuss the advantages of direct personal investigation as compared with the
other methods generally used in collecting data. (B. Com. Agra, 1950).
16. How will you organise an economic survey of a small Indian State
comprising five towns and 1,000 villages? (M. Com. Allahabad, 1943).
17. If you are appointed to investigate the housing conditions of industrial
labour in Lucknow how will you proceed to do the job? Give a specimen of the
questions that you would put. (D. Com. Lucknow, 1944).
18. Compare the advantages and disadvantages of the census method and the
sample method of collecting statistics. (B. Com. Calcutta, 1937).
19. Statistical investigations carried out by the Government are usually based
either on complete enumeration of universe of reference, as for instance, the
population census, or on the study of "typical" cases as for instance, the proposals
regarding the economic census. Explain why the method of random samples is to be
preferred to either of these methods. (M. A. Allahabad, 1935).
20. Show the necessity of the use of method of random sampling in any
extensive investigation. How will you make use of this method in carrying out an
economic survey of the rural areas of U. P.?
21. How would you organise an investigation into the hand weaving industry
of U. P.? Propose a questionnaire suitable for the purpose.
(B. Com. Allahabad, 194~).

22. What is 'sampling' and what are its uses? Explain how would you design
a sample survey to estimate an average size of holding in a locality.
(M. A. Agra, 1947).
23. "It is never safe to take published statistics at their face value without
knowing their meanings and limitations and it is always necessary to criticise the arguments
that can be based on them." (Bowley). Elucidate. (B. Com. Allahabad, 1946).
24. Why is it necessary to scrutinize and edit secondary data before its use?
What precautions would you take before using such statistics?
25. Write short notes on:
(a) Theory of Probability.
(b) Law of Statistical Regularity.
(c) Law of Inertia of Large Numbers.
26. "In any sample survey there are many sources of errors. A perfect survey
is a myth". Discuss the statement.
27. Suppose you want to study the changes in the extent of indebtedness of
middle-class people of Allahabad for the next five years. How would you proceed
to do it? Explain all the processes. (B. Com. Banaras, 1955).
28. Describe the procedure you would adopt in order to obtain the necessary
information for introducing compulsory primary education in a big city.
(B. Com. Banaras, 19'2).
29. "Statistics, especially other people's statistics, are full of pitfalls for the user".
(Conner). Do you agree with this statement?
30. "Samples are devices for learning about large masses by observing a few
individuals." (Snedecor).
Elucidate the above statement.
31. How would you conduct an enquiry about 'Payment of Wages' in an
industry? On what points would it be necessary for you to be clear before actually
beginning investigation work? (M. Com. Agra, 1957).
32. How would you organise a marketing survey of the fruit trade in a particular
region with a view to making suggestions for its development? Explain the
procedure you would follow step by step. (M. Com. Agra, 1956).
Accuracy, Approximation and Errors (Chapter 6)
Editing of data. After collection of data the next step in a statistical
investigation is the scrutiny of the collected figures. This is technically
called editing of data. It is a necessary step as in most cases the collected
data contain various types of mistakes and errors. It is quite likely
that some question has been misunderstood by the informants, and if it
is so, this part of the data has to be collected afresh, or it may be that
answers to a particular question are, in general, vague, and it is difficult
to draw inferences from them, or some of the schedules and questionnaires
are so haphazardly filled that it is necessary to reject them. It is
also likely that some of the investigators were biased and the answers
filled by them or the data collected by them show unmistakable signs of
their prejudices. In all such cases the collected data have to be edited
and modified. However, it should be clearly understood that undue
tampering of data should never be done. If only a few schedules are
defective they can be omitted but this too should be done very carefully.
In some cases the omission of a few schedules would not affect the general
conclusions, while in others this may entirely change the complexion
of the problem under study. As has been pointed out earlier, absolute
accuracy is neither possible nor essential but decision about the extent
to which inaccuracies, approximations and errors can be allowed is a
very important step in statistical analysis and we shall study these things
in the following pages.

ACCURACY

Reasons why perfect accuracy not possible. Perfect accuracy means to
describe a phenomenon exactly as it is. It is impossible to be achieved.
We can never describe a thing with complete accuracy. There are two
reasons for it: (a) imperfection of the investigator, and (b) imperfection
of the instruments of inspection and measurement. Since man is not
perfect the investigations done by him and the instruments of measurement
and inspection made by him are also imperfect. For these reasons
the data collected cannot be absolutely accurate.
It is futile to expect complete accuracy in statistical investigations.
When in physical sciences where controlled experiments can be done
perfect accuracy cannot be achieved, it is no use to expect the same in
statistical investigations, where neither the experiments are possible nor
is it possible to use the instruments of measurement at all places. In
statistical methods where personal prejudices, deliberate or unconscious,
are present, efforts to obtain absolute accuracy are bound to end in
failure. In reality one should not be surprised at the fact that statistical
methods have given comparatively inaccurate results, because there are
reasons for it; the fact to be really surprised at is how the statistical
methods have given results which are fairly close to accurate ones. In
fact the science of statistics helps us in understanding the factual world
with all its inaccuracy and imperfectness. When conditions of investigation
are imperfect, the investigator is imperfect and the instruments
of measurement are imperfect it is only natural that the results do not
achieve perfect accuracy.
No need of absolute accuracy. Moreover there is no need of absolute
accuracy in statistical investigations. If reasonably accurate estimates
are available there is no difficulty in understanding or analysing a
phenomenon. At many places it is foolish to try to have absolute precision.
For example, if the distance from the earth to any planet is estimated
correct to inches (if it is possible) this would hardly have any practical
significance. Where billions of miles are being measured or estimated
inches have absolutely no importance. This is an example of extreme
type. In actual practice estimates which are many times more crude than
this are sufficient for the purpose of statistical analysis. No businessman
cares to weigh grain correct to an ounce. Where measurements are
being done in tons it is enough if they are correct to a pound. Similarly
in the measurement of miles a few yards have no significance, not to talk
of feet and inches. In fact we never measure a thing with perfect
accuracy. We simply estimate its true value. If in the estimates there
is reasonable accuracy we have every reason to be satisfied.
What is reasonable accuracy? But on this point a very pertinent
question arises. What do we mean by reasonable accuracy? It is not possible
to give an absolute definition of this term. It depends on the type of
data that are being used and the purpose of the investigation. In many
cases there are conventional standards of accuracy and they also help the
investigator in taking a decision. In measuring the distance from the
earth to the sun a few hundred miles can very safely be left out but
in the measurement of cloth even a few inches cannot be ignored. In
statistics there is no need of absolute accuracy; only relative accuracy
is taken into account.
How the degree of accuracy is shown. Degree of relative accuracy
achieved should always be mentioned. If the production of wheat in
a certain district is 25,000 tons (correct to a 1000) the degree of accuracy
can be shown in any of the following ways:
(a) The production is 25,000 tons (rounded in thousands).
(b) The production is 25,000 tons plus or minus an amount
not exceeding 500 tons; or the production is 25,000
tons ± 500 tons.
(c) The production is between 24,501 and 25,500 tons.
(d) The production is 25,000 tons correct to 2%.
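The four ways of stating the degree of accuracy can be derived mechanically from the figure and the unit to which it was rounded. The following Python sketch is only illustrative; the function name and the wording of the statements are assumptions, not part of the text.

```python
def accuracy_statements(value, unit, measure="tons"):
    """Express a figure rounded to the nearest `unit` in the four ways listed above."""
    half = unit / 2                  # greatest possible rounding error
    percent = 100 * half / value     # that error as a percentage of the figure
    return [
        f"{value} {measure} (rounded to the nearest {unit})",
        f"{value} {measure} ± {half:g} {measure}",
        f"between {value - half:g} and {value + half:g} {measure}",
        f"{value} {measure} correct to {percent:g}%",
    ]

for statement in accuracy_statements(25000, 1000):
    print(statement)
```

The sketch gives the range symmetrically (24,500 to 25,500), whereas the text starts the range at 24,501; the difference lies only in how a figure falling exactly on the half-way point is treated.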
APPROXIMATION
Meaning and need. "Approximation is the basis of rounding off
the figures with a view to simplify them and to make them fit for
consumption and analysis without in any way impairing the standard of
reasonable accuracy." Big numbers are usually confusing to the eye
and the mind, and even when actual figures are available it is worthwhile
to round them off, with a view to make them more intelligible and fit
for analysis and interpretation. At many places there is no need to
give actual numbers and approximate figures serve the purpose all right.
If the actual figures of the production of wheat in India are given without
approximation they would be confusing and difficult to analyse and
interpret. Round figures can safely be given in such a case. It is quite
likely that the figures which are left out or added in the process of
approximation might actually make the data more accurate and remove the
errors of calculation.
Methods of approximation
There are some universally accepted methods of approximation.
They are given below. Out of these the first one is the most accurate.
(a) Approximation to the nearest whole number. In this method the
nearest whole number is written in place of the actual figure. Thus
5,32,671 would become 5,33,000 (to the nearest 1000)
4,12,230 would become 4,12,000 (to the nearest 1000)
The rule is that if the portion that is being left is more than half
the whole number (1000 in the above case) it should be replaced by the
whole number. In the first example given above, 671 has been replaced
by 1000 and the number has thus become 533 thousands. If the portion
approximated is less than half of the whole number it should be ignored.
In the second example above 230 has been left out. If the number to be
approximated is just half of the whole number, it can either be replaced
by the whole number or ignored. However, if such cases are many,
in half of them the whole number should be kept and in the other half
the figures should be ignored. Another practice followed in such cases
is to keep the retained figure unchanged if it is even and to increase it to
the next higher figure if it is odd. Thus 325 will be rounded as 320 and
335 as 340.
The same rule can be applied in case of percentages and ratios
etc. For example 74.8% can be written as 75 percent and 73.2% as
73 percent in round numbers.
(b) Approximation by using the next higher whole number. In this
method in place of the portion which is being left out the next higher
figure is written. According to this rule:
5,32,671 would become 5,33,000 (correct to 1000) and
4,12,230 would become 4,13,000 (correct to 1000)
According to the first rule 4,12,230 was approximated at 4,12,000
but according to this rule it has been approximated at 4,13,000.
Similarly 74.8% would be approximated at 75% and 73.2% at
74% and not 73% as in the previous method.
(c) Approximation by discarding certain digits. In this method a
part of the number which is approximated is entirely left out. Thus
5,32,671 would become 5,32,000 (correct to a thousand)
and 4,12,230 would become 4,12,000 (correct to a thousand).
Similarly 74.8% would become 74% and 73.2% would become 73% (correct to
a whole number).
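The three methods of approximation can be compared side by side with a small Python sketch. It assumes non-negative whole numbers, and for cases falling exactly half-way it adopts the even-figure practice mentioned under method (a); the function name and the method labels are hypothetical.

```python
def approximate(value, unit, method="nearest"):
    """Approximate a non-negative whole number `value` to a multiple of `unit`."""
    quotient, remainder = divmod(value, unit)
    if method == "nearest":
        if 2 * remainder > unit:
            quotient += 1          # more than half: replace by the whole unit
        elif 2 * remainder == unit and quotient % 2 == 1:
            quotient += 1          # exactly half: make the retained figure even
    elif method == "next_higher":
        if remainder:
            quotient += 1          # any portion left over counts as a whole unit
    elif method == "discard":
        pass                       # the portion left over is simply dropped
    else:
        raise ValueError(f"unknown method: {method}")
    return quotient * unit

for figure in (532671, 412230):
    print(figure, [approximate(figure, 1000, m)
                   for m in ("nearest", "next_higher", "discard")])

# Cases falling exactly half-way between two round figures.
print(approximate(325, 10), approximate(335, 10))
```

Because the second method never rounds down and the third never rounds up, their errors all lie on one side, whereas the errors of the first method fall on both sides.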
How much approximation is necessary in a particular case would
depend on the degree of accuracy achieved in the collection of data.
If, for example, certain lines have been measured correct to millimetres
then the figures beyond the tenth part of the millimetre can be removed by
approximation; 3.22 mms. can be approximated as 3.2 mms. Ordinarily all figures except
one beyond the margin of accuracy should be left out.
Method of writing approximated figures. The approximated figure
should be written in such a manner that the degree of approximation
is clear from it. For example, a line has been measured correct to
millimetres and the measurement is 4.99 cms. After approximation it would
become 5 cms. but it should be said and written as 5.0 cms. and not 5 cms.
5 cms. would mean that the measurement is correct up to centimetres
only, i.e., all measurements between 4.5 cms. and 5.5 cms. have been
expressed as 5 cms. On the other hand, 5.0 cms. would mean that the
measurement is correct up to millimetres or in other words all
measurements between 4.95 and 5.05 cms. have been expressed as 5.0 cms.
Thus if there is a zero at the end of an approximated figure it should always
be written.
The method of approximation should also be made clear while
writing an approximated figure. Usually the lower and the upper limits
of the approximated figure should also be stated. For example, in the
illustration given in the preceding paragraph if the measurement is correct
to centimetres it should be written as 5±0.5 cms. and if it is correct to
millimetres it should be written as 5.0±0.05 cms.
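The significance of a trailing zero can be shown with Python's decimal module, which keeps the number of decimal places exactly as written. The sketch below assumes that a written figure is correct to half a unit of its last place; the function name is hypothetical.

```python
from decimal import Decimal

def implied_limits(written):
    """Lower and upper limits implied by a figure as written, e.g. "5" or "5.0"."""
    value = Decimal(written)
    # One unit in the last written place: 1 for "5", 0.1 for "5.0".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half

print(implied_limits("5"))     # correct to centimetres
print(implied_limits("5.0"))   # correct to millimetres
```

The figure is passed as a string because the written form, not just its numerical value, carries the information about accuracy.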
Approximation and other calculations. If approximated figures are
used in multiplication, division or for finding out the roots or powers
great care should be exercised. In such cases the errors due to
approximation would be carried through the multiplication, division, etc., and this may
considerably affect the conclusions. For example, if two figures 194
and 184 are multiplied their product would be 35,696. If, however, they
are approximated as 190 and 180 respectively and then multiplied the
product would be 34,200. There is a considerable difference between the
two results. Similarly in division of figures or in the calculation of roots
and powers, approximated figures may sometimes give erroneous
conclusions. The effect of approximation on percentages calculated from
such figures is negligible. An illustration would make it clear.
1,23,65,357 is 25% of 4,94,61,428. If these figures are approximated correct to
a lakh they would become 1,24,00,000 and 4,95,00,000 respectively. The
former is 25.05% of the latter. Thus we see that even a high degree of
approximation has not materially affected the percentages.
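The arithmetic of both illustrations can be verified with a short Python sketch. The helper `round_to` is a hypothetical name; it rounds to the nearest multiple of the unit, taking exact halves upward, which is sufficient for these figures.

```python
def round_to(value, unit):
    """Round a non-negative whole number to the nearest multiple of unit (halves go up)."""
    return (value + unit // 2) // unit * unit

# Multiplication: the errors of approximation compound.
exact_product = 194 * 184
approx_product = round_to(194, 10) * round_to(184, 10)
product_error_percent = 100 * (exact_product - approx_product) / exact_product
print(exact_product, approx_product, round(product_error_percent, 2))

# Percentage: the errors of approximation largely cancel.
part, whole = 12365357, 49461428          # 1,23,65,357 and 4,94,61,428
exact_percent = 100 * part / whole
approx_percent = 100 * round_to(part, 100000) / round_to(whole, 100000)
print(exact_percent, round(approx_percent, 2))
```

In the product both factors were rounded downward, so the two errors reinforce each other; in the percentage both figures were rounded upward, so the errors in numerator and denominator nearly offset each other.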
STATISTICAL ERRORS

Meaning. The word error is used in a specialised sense in statistics.
It does not mean the same thing as mistake. Mistake in statistics means
a wrong calculation or use of inappropriate method in the collection or
analysis of data. Error, on the other hand, means "the difference between
the true value and the estimated value." We have seen in the preceding pages
that in statistics we only aim at a reasonable standard of accuracy. In
other words, we use approximated values or estimates rather than actual
values. The difference between the approximated or estimated value and
the true value is technically called the statistical error.
Causes of errors. Statistical errors arise due to a large number of
factors. They may be due to inappropriate definitions of statistical units,
bias of the investigator or the inherent instability of the collected data.
Such errors are called Errors of Origin. Errors may also arise on account
of manipulation in counting, measurement, description or approximation.
Such errors are known as Errors of Manipulation. Yet another
cause of statistical errors may be the use of incomplete data; errors may
also arise on account of inadequacy of the size of the sample and all such
errors are called Errors of Inadequacy.
Measurement of Errors
Statistical errors can be measured either:
(a) absolutely, or
(b) relatively.
Absolute and relative errors. If the error is measured absolutely it is
called an absolute error and if it is measured relatively it is called relative
error. Absolute error is the difference between the true value and the
estimate. If the actual figure of sales of a concern is Rs. 9,900 and the
approximated figure is Rs. 10,000 there is a difference of Rs. 100 in these
two figures. This is an absolute error. Relative error is the ratio of the
absolute error to the estimate. In the above example if the absolute error
of Rs. 100 is divided by the estimated figure of Rs. 10,000, the result,
100/10,000 or 0.01, is the relative error. The relative error can also be
expressed in terms of percentages. It is then known as percentage error. In this
example percentage error would be (100/10,000) × 100 or 1%.
Algebraically if U' stands for the actual value, U for the estimated
value, U_e for the absolute error and e for the relative error,
$$U_e = U' - U$$ and
$$e = \frac{U' - U}{U}$$

In statistical analysis relative errors are more valuable than
absolute errors. Absolute errors very often give erroneous conclusions.
If the true value of a phenomenon is 99 and if it is estimated at 100,
the absolute error is 1. Again, if the true value is 99,999 and the
estimated figure 1,00,000 the absolute error is 1. The absolute error
in both cases is the same but the relative error in the first case is 1/100
while in the second it is 1/1,00,000. The first error is relatively 1000
times the second one.
Positive and negative errors. Absolute and relative errors can be
either positive or negative. If the true value exceeds the estimate, the error
is said to be positive and on the other hand if the estimate exceeds the
true value the error is called negative.
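The measures of error defined in this section can be sketched in Python as follows. The sketch keeps the sign convention just stated (true value minus estimate, so an over-estimate gives a negative error) and divides by the estimate; the worked examples in the text quote only the magnitudes. Exact fractions are used so that the comparison of relative errors is free from rounding; the function names are hypothetical.

```python
from fractions import Fraction

def absolute_error(true_value, estimate):
    """Signed absolute error: true value minus estimate."""
    return true_value - estimate

def relative_error(true_value, estimate):
    """Signed relative error: absolute error divided by the estimate."""
    return Fraction(true_value - estimate, estimate)

def percentage_error(true_value, estimate):
    """Relative error expressed as a percentage."""
    return relative_error(true_value, estimate) * 100

# Sales of Rs. 9,900 approximated as Rs. 10,000 (an over-estimate, hence negative).
print(absolute_error(9900, 10000), relative_error(9900, 10000),
      percentage_error(9900, 10000))

# Same absolute error, very different relative errors.
first = relative_error(99, 100)
second = relative_error(99999, 100000)
print(first, second, first / second)
```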
Classes of Errors
Broadly speaking errors may be either:
(a) Biased or
(b) Unbiased.
Biased errors. Biased errors are those which arise on account of
some bias in the mind of the investigator or the informant or in the
instruments of measurement. If the investigator wishes to exaggerate
the figures he would approximate them at the next higher figure. If,
on the other hand, he has a downward bias he would approximate them
by discarding numbers. A biased investigator can play mischief even at
earlier stages of investigation. He can select such data which would suit
his conclusions. Biased errors may also arise due to defective instruments
of measurement. If a yard-stick, which is 35" in length, is used to measure
a certain distance it will always produce a biased error, as there will
always be a short measurement.
Biased errors are cumulative. The larger the number of cases in which
there is a biased error the greater would be its magnitude. If we measure
a distance of 5 yards only with a yard-stick of 35" the error would be
of the magnitude of 5". But if we measure 100 yards the error would
be 100". Thus biased errors are cumulative.
Unbiased errors. Unbiased errors are those which arise just on account
of chance. They are not the results of any prejudice or bias. If
figures are approximated to the nearest whole number, the error would be
unbiased, as in some cases the approximated numbers would be less than
the actual ones while in others they would be more than the true values.
Unbiased errors are generally compensating. One error compensates
the other. The law of statistical regularity works here and since errors
are both positive and negative they usually cancel each other. If the
yard-stick is just 36" and if, with it, certain distances are measured, it
is quite likely that in some cases the measurements are unconsciously
more than 36" while in others they are less. The larger the number
of such measurements the lesser would be the error. An unbiased
coin may fall heads in 3 tosses out of 4 but in 3000 tosses the numbers of
heads and tails are bound to be more or less equal. There is a general
tendency everywhere to give ages in round figures. It is another
example of unbiased error. If some people have, in this process,
over-estimated their ages, others might have under-estimated them. A person
of 29 years of age may call himself 30 but it is also likely that a person
of 31 years may call himself 30, and in such a case the errors cancel
each other.
The following table will illustrate the characteristics of the biased
and unbiased errors : -

TABLE I
Biased and unbiased errors

Exact number   Correct to nearest   Absolute   Correct to next     Absolute
               1000 (unbiased),     error      1000 and over       error
               in thousands                    (biased),
                                               in thousands
   50,241             50             +241            51             -759
   60,507             61             -493            61             -493
   49,361             49             +361            50             -639
   61,427             61             +427            62             -573
   53,764             54             -236            54             -236
   48,090             48             + 90            49             -910
   50,460             50             +460            51             -540
   96,670             97             -330            97             -330
   60,250             60             +250            61             -750
Total 5,30,770       530             +770           536            -5230
When figures are estimated correct to the nearest thousand the
error is an unbiased one. The unbiased absolute error in the above
figures, as shown in column 3, is only 770 and the relative error is
770/5,30,000 = 0.001453. The errors are negligible.
When figures are estimated correct to the next one thousand and
over, the error is a biased one. The biased absolute error in the
above case is -5230 as shown in column 5 and the relative error is
5230/5,36,000 = 0.00975. These errors are comparatively much larger than
in the previous case and cannot be safely ignored.
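The totals of Table I can be checked with a short Python sketch. The helper names are our own; the figures are those of the table, and each error is taken as the exact number minus the approximated one.

```python
# Exact figures from Table I.
exact = [50241, 60507, 49361, 61427, 53764, 48090, 50460, 96670, 60250]


def to_nearest_thousand(n):
    """Unbiased approximation: round to the nearest 1000 (halves go up)."""
    return (n + 500) // 1000 * 1000


def to_next_thousand(n):
    """Biased approximation: always raise to the next 1000 and over."""
    return -(-n // 1000) * 1000


# Signed absolute errors: exact figure minus approximated figure.
unbiased_errors = [n - to_nearest_thousand(n) for n in exact]
biased_errors = [n - to_next_thousand(n) for n in exact]

unbiased_total = sum(to_nearest_thousand(n) for n in exact)
biased_total = sum(to_next_thousand(n) for n in exact)

print(sum(unbiased_errors), sum(unbiased_errors) / unbiased_total)
print(sum(biased_errors), sum(biased_errors) / biased_total)
```

The unbiased errors carry mixed signs and largely cancel, while the biased errors all carry the same sign and pile up.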
Errors in multiplication, division, etc. However, it should be
remembered that neither are unbiased errors always compensating
nor biased errors always cumulative. Where items have to be added
together biased errors would no doubt be cumulative and unbiased
ones compensating; but where items have to be subtracted the situation
is just the reverse, and biased errors would be smaller in size than the
unbiased ones. If two items are multiplied together unbiased errors
would give a better estimate than the biased ones. But if the items
are divided and the algebraic signs of the two errors are the same
(as is the case in biased errors) the result would be quite close to the
true value; and if the signs are opposite (as is the case in unbiased errors)
the results would be away from the true value. In other words, ordinarily,
unbiased errors are compensating only when items have to be added
or multiplied but when the items have to be subtracted or divided
biased errors would give results closer to the true value than the results
given by unbiased errors.
These points can be illustrated as follows : -
       True value   Estimated value with   Estimated value with
                    biased error           unbiased error
(a)       100              99                      99
(b)       200             197                     202
(i) Biased error in (a) = 1 and unbiased error = 1
(ii) Biased error in (b) = 3 and unbiased error = -2
(iii) Biased error of (a+b) or 300 = (300 - 296) or 4 and
unbiased error = (300 - 301) = -1
(iv) Biased error for (b-a) or 100 = (100 - 98) = 2 and
unbiased error = (100 - 103) = -3
(v) Biased error for (a×b) or 20,000 = (20,000 - 19,503) = 497 and
the unbiased error = (20,000 - 19,998) = 2
(vi) Biased error for (b÷a) or (200÷100) or 2 = (2 - 197/99) = 0.01
and the unbiased error = (2 - 202/99) = -0.04.
Thus it is clear that in addition and multiplication the biased
errors are more than the unbiased ones whereas in subtraction and
division the position is reversed and the unbiased errors are more than
the biased ones.
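The six results of the illustration can be reproduced with a small Python sketch. The dictionary of operations and the function name are our own devices; the true and estimated values are those given above.

```python
true_a, true_b = 100, 200
biased = (99, 197)     # both estimates fall short of the true values
unbiased = (99, 202)   # one estimate falls short, the other exceeds

# The four arithmetic operations considered in the text.
operations = {
    "a + b": lambda a, b: a + b,
    "b - a": lambda a, b: b - a,
    "a x b": lambda a, b: a * b,
    "b / a": lambda a, b: b / a,
}


def errors(estimates):
    """Error of each result: true result minus result from the estimates."""
    a, b = estimates
    return {name: op(true_a, true_b) - op(a, b)
            for name, op in operations.items()}


for name in operations:
    print(name, round(errors(biased)[name], 2), round(errors(unbiased)[name], 2))
```

In this one example the biased error is the larger in size for addition and multiplication and the smaller for subtraction and division, as the text states.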
Estimation of errors
In most of the statistical investigations in actual practice the exact
figures or the true values are not known. In such cases we cannot
measure the absolute or the relative error. But it is possible to estimate
them.
Estimation of unbiased errors. Unbiased errors can be estimated
without much difficulty in most of the cases. In the illustration in
Table I, if the actual figures were not known, all we could say was
that the total of the figures (correct to nearest 1000) was 5,30,000.
If the absolute error in the above figures is to be estimated then in
each of the nine items it can range between 0 and 499. It will be
zero if the actual number was in exact thousands, and in such a case
the actual and the approximated figures would be the same. The
maximum error in any figure can be 499 because the approximated
figure will be discarding all numbers less than 500 and adding all
numbers more than 500. Thus 60,250 has been approximated as
60,000 and 60,507 as 61,000. 0 and 499 are the minimum and maximum
of the absolute error per item in this example. The most
likely error, however, would lie somewhere between these two limits.
It can be expected to be about the middle of these limits at 249.5
or say 250. The best estimate of the unbiased absolute error is given by the
product of the average absolute error and the square root of the number of items.
In the above case the estimated average absolute error per item is 250
and the square root of the number of items (√9) is equal to 3. Thus
the estimated value of the absolute error would be :
Average absolute error of items × √(No. of items)
= 250 × √9 = ± 750
The estimated figure of 750 compares well with the actual figure
of the absolute error which is 770.
The relative error can be estimated easily. It is equal to the
estimated absolute error divided by the approximated total of the
items. In this case it would be:
= (250 × √9)/5,30,000 = 750/5,30,000 = 0.001415
The actual relative error as we had calculated was 0.001453.
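The rule for unbiased errors can be written as a small Python function. The function name and its arguments are our own choice; the inputs (250 per item, 9 items, approximated total 5,30,000) are those of the example.

```python
import math


def estimate_unbiased_error(avg_error_per_item, n_items, approx_total):
    """Estimated unbiased error: average error times the square root of
    the number of items, and the same figure relative to the total."""
    absolute = avg_error_per_item * math.sqrt(n_items)
    relative = absolute / approx_total
    return absolute, relative


abs_err, rel_err = estimate_unbiased_error(250, 9, 530000)
print(abs_err, round(rel_err, 6))
```

Because the estimate grows with the square root of the number of items, quadrupling the items only doubles the expected unbiased error.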
Estimation of biased errors. Just as it is possible to estimate the
unbiased errors, similarly, the biased errors can also be estimated to a
certain extent. In the example given in Table I the minimum biased
absolute error per item is 1 and the maximum is 999, as all figures from 1
upward to 999 are approximated at 1000. Thus 60,250 has been approximated
at 61,000 and 60,507 has also been approximated at 61,000. The
likely error per item would be somewhere between 1 and 999. It will
be round about 500. The rule for the estimation of biased errors is
slightly different from the rule for unbiased errors discussed above.
In case of biased errors the estimation is done by multiplying the average
absolute error of items by the number of items (instead of the square root of
the number as in the previous case). Thus in the above illustration
the estimated absolute biased error would be :
Average absolute error of items × Number of items
= 500 × 9 = 4500
The estimated absolute biased error of 4500 compares well with
the actual figure of 5230. The estimated relative biased error can be
found out by dividing this figure by the approximated total. In the
above case it would be :
(Average absolute error of items × Number of items)
÷ Approximated total of all the items
= (500 × 9)/5,36,000 = .0084
The actual relative error had been calculated at .00975.
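The rule for biased errors differs from the unbiased one only in using the number of items itself. A minimal Python sketch follows; the function name is our own and the inputs are those of the example.

```python
def estimate_biased_error(avg_error_per_item, n_items, approx_total):
    """Estimated biased error: average error times the number of items,
    and the same figure relative to the approximated total."""
    absolute = avg_error_per_item * n_items
    relative = absolute / approx_total
    return absolute, relative


abs_err, rel_err = estimate_biased_error(500, 9, 536000)
print(abs_err, round(rel_err, 4))
```

Since the estimate is proportional to the number of items, doubling the items doubles the expected biased error, which is the cumulative behaviour described earlier.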
Questions

1. Write a note on the editing of primary and secondary data for the purposes of
analysis and interpretation.
2. The statistician who desires to safeguard his analysis and results from
imperfections entering at the very start should rest his choice among sources upon a test
of reliability rather than upon accessibility and convenience.
Expand this statement so as to bring out clearly the way in which sources should
be used. (M. Com. Lucknow, 1943).
3. Discuss the standard of accuracy required in statistical calculations. To what
extent should approximations be used? (M. A. Agra, 1949).
4. What precautions should be taken in the use of published statistics?
(B. Com. Agra, 1949).
5. Mention the advantages of approximation of Statistics. What degree of
accuracy is generally required in each statistical investigation?
(M. Com. Rajputana, 1951).
6. What are the different ways of approximating figures? Discuss the merits
of each.
7. To what extent can figures be safely approximated in statistical analysis?
How should such figures be written?
8. (a) Discuss the sources of errors in statistics and their effects.
(b) State the important methods of approximation and their utility in
statistics. (B. Com. Agra, 1940).
9. In what way does a statistical error differ from a mistake? What classes of
errors are there and how may they be measured? (B. Com. Allahabad, 1943).
10. Discuss the various types of errors likely to creep into statistical investigations
and suggest how to avoid or correct them. (B. Com. Agra, 1949).
11. Of the biased errors the statistician should have none; but of the unbiased
ones the more the merrier, notwithstanding that they are also errors. Elucidate.
12. In framing statistical estimates we are not so definite as the Modern Traveller
who:
"........ knew the weather to a T,
Longitude to a degree,
The Latitude exactly."
Explain the bearing of the above on the degree of accuracy desired in statistical
estimates as distinguished from the estimates of the more exact sciences.
(M. A. Punjab, 1952).
13. Show how biased errors are generally cumulative and unbiased ones
compensating. Are there any exceptions to this general rule?
14. Discuss the various methods of estimating biased and unbiased errors both
absolutely and relatively.
15. Distinguish between
(a) Absolute and relative errors and
(b) Biased and unbiased errors.
Discuss the effects of these errors and explain the steps that are taken to meet the
effects. (B. Com. Agra, 1938).
7
Classification, Seriation and Tabulation
CLASSIFICATION
Need and meaning
The data which are collected or compiled in accordance with the
rules and methods discussed in the preceding chapter are usually very
voluminous. As such they are not directly fit for
analysis or interpretation. If, for example, the figures of the expenses
of 2,000 students residing in Allahabad University hostels are before
us, as collected, it would not be possible to draw any inferences from
them because for purposes of comparison, analysis and interpretation
it is essential that the data are in a condensed form. Further, it is
also essential that the likes must be separated from the unlikes. All
the 2,000 students, no doubt, are alike in the sense that all of them
belong to a particular university and live in hostels but they differ in
other respects. Some may be living in single-seated rooms and others
in double or treble-seated rooms; some may be living in costlier hostels
and others in comparatively cheaper ones; some may be having their
private messing arrangements while others may have joined the common
mess. Thus, even though the data collected relate to one set of persons
yet there may be many types of dissimilarities even within this group.
For the purpose of analysis and interpretation, data have to be divided
into homogeneous groups. In order to remove these defects of volume
and heterogeneity, statistical data are tabulated with a view to presenting
a condensed and homogeneous picture. But before the tabulation of
data, it is necessary to arrange them in homogeneous groups so that
there may be no difficulty in tabulation. The process of arranging data in
groups or classes according to resemblances and similarities is technically called
Classification. Thus, by classification we try to strike a note of homogeneity
in the heterogeneous elements of the collected information. Classification
gives expression to the similarities which may be found in the
diversity of individual units. In classification of data units having
a common characteristic are placed in one class and in this fashion the
whole data are divided into a number of classes. Even after classification
the statistical data are not fit for comparison and interpretation but this
is certainly the first step towards the tabulation of data. After tabulation
of data statistical analysis and interpretation are possible. Classification
is a preliminary to tabulation and it prepares the ground for
proper presentation of statistical facts.
Characteristics of an ideal classification
Despite the fact that classification is a very important preliminary
in a statistical analysis no hard and fast rules can be laid down for it.
Technically the classification of data in each investigation has to be
decided after taking into account the nature, scope and purpose of the
enquiry. However, an ideal classification should possess the following
characteristics : -
(a) It should be unambiguous. If there is ambiguity in classification
the very purpose for which it is meant is not served. Classification is
meant for removing ambiguity. It is necessary that the various classes
should be so defined that there is no room for doubt or confusion. It
is by no means an easy task. If we have to divide the population into
two classes, say, literates and illiterates, exhaustive definition of the
terms used is essential. Who is a literate? is a question not easy
to answer. Some criterion has to be laid down. In the last census of
population of India, a literate was defined as one who could read and
write a simple letter. This is technically not a very satisfactory definition.
After all what is meant by a simple letter is a point on which there can
be difference of opinion. But for practical purposes the definition can
be said to be fairly satisfactory.
(b) It should be stable. The ideal classification should have the merit
of stability. If a classification is not stable and if each time an enquiry
is conducted it has to be changed, the data would not be fit for comparison.
The occupational classification in the Indian population census
suffers from this defect. Various occupations have been defined in
different ways in successive censuses, and these figures are not strictly
comparable.
(c) It should be flexible. A good classification should be flexible
and should have the capacity of adjustment to new situations and
circumstances. When we talk of stability of classification we do not
mean rigidity of classes. The term is used in a relative sense. No
classification can be stable for ever. With changes in time, some classes
become obsolete and have to be dropped, while fresh classes have also
to be added. An ideal classification should be such that it can adjust
itself to these changes and yet retain its stability. The data should be
divided into a few major classes which must be sub-divided further.
Ordinarily there would not be many changes in major classes. Only
small sub-classes may need a change and the classification can thus retain
the merit of stability and yet possess flexibility.
Basis of classification
Statistical data are classified on the basis of the characteristics
possessed by the different groups of units of a universe. As has been
pointed out earlier, these characteristics give expression to the unity
of attributes which may be traced in a diversity of individual units.
These characteristics can be either descriptive or numerical.
Unemployment, occupation, literacy, civil condition and sex are
examples of descriptive characteristics while age, income, weight
and height are examples of numerical characteristics. Descriptive
characteristics cannot be quantitatively measured or estimated. Only
their presence or absence in an individual unit can be found out.
For example, we cannot quantitatively measure literacy. All we can
say is whether an individual is literate or illiterate according to
certain definitions laid down. When data are classified on the basis
of qualities or attributes, which are incapable of quantitative measurement,
the classification is said to be according to attributes, and when the
data are classified on the basis of quantitative measurement the
classification is said to be according to class intervals.
Classification according to attributes
Simple classification. In this method the data are divided on the
basis of attributes or qualities. All those units in which a particular
characteristic is present are placed in one group and those in which
it is not present are placed in another group. If, for example, the
problem of blindness is being studied, the universe can be divided
into two classes, one in which the units possess this characteristic
and the other in which this characteristic is not found. We shall thus
have two classes: those who are blind and those who are not blind.
This type of classification in which only one attribute is studied and
the data are divided in two parts is called simple classification or
classification according to dichotomy.
Manifold classification. If, however, more than one attribute is
being studied simultaneously, the data would be divided into a number
of classes. If the problem of blindness is studied sex-wise, there are
two attributes under study, namely, blindness and sex. A person can
be either blind or not blind; further a person can be either a male or
a female. Each of the two attributes is capable of division in two
classes. The data would thus be divided in four classes: (1) males who
are blind, (2) males who are not blind, (3) females who are blind,
(4) females who are not blind. The study can be further extended if
we have a third attribute, say, religion. Now each of the above four
classes is capable of further sub-division on the basis of religion. Such
classification in which more than one attribute is taken into account is
called manifold classification.
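A manifold classification by two attributes can be sketched in Python as a simple count of units in each combined class. The six records below are invented purely for illustration and the function name is our own.

```python
from collections import Counter

# Hypothetical records: (sex, whether blind) for six persons.
persons = [
    ("male", True), ("male", False), ("female", False),
    ("female", True), ("male", False), ("female", False),
]


def classify(records):
    """Count the units falling in each of the four combined classes."""
    counts = Counter()
    for sex, is_blind in records:
        counts[(sex, "blind" if is_blind else "not blind")] += 1
    return counts


table = classify(persons)
for (sex, status), number in sorted(table.items()):
    print(sex, status, number)
```

Adding a third attribute such as religion would simply lengthen each record and the key under which it is counted.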
Arbitrary nature of classification. In the various groups which are
formed in the above mentioned manner the differences are not always
natural or very well defined. Ordinarily such classification is of an
arbitrary nature. If the universe is divided in two groups, tall men
and short men, we shall have to give arbitrary definitions of the two
classes. It can be said that those who are 5 feet 4 inches or above are
tall and those who are less than 5 feet 4 inches are short. The
classification is obviously arbitrary. In those cases where a particular
attribute is decided on the basis of quantitative study, as in the above
case of tall and short men, the classification is comparatively more
definite and precise. But this is not always possible. Many attributes
cannot be studied with the help of figures. The difference between
literacy and illiteracy is an example. Here one attribute gradually
changes into another attribute and there is no clear cut line of demarcation.
The difference between a literate and an illiterate is always a
matter of opinion. There may be many persons whom it would be
difficult to classify either as literates or as illiterates. Whenever data are
classified according to attributes this point should be kept in mind and
attempts must be made to define the attributes in such a manner that
there is the least possibility of doubt and ambiguity.
Classification according to class intervals
This type of classification is applicable only in those cases where
the direct quantitative measurement of data is possible. Data relating
to height, weight, income, production and consumption, etc., come
under this category. In such cases data are classified on the basis of
values or quantities. Thus, instead of saying that a certain group
of persons is tall while the other group is short, the heights can be
specified in class-intervals. Persons whose heights, say, are within
5'4"-5'6" can form one group, those whose heights are within 5'6"-
5'8" can form another group and so on. In this way the data are
divided into a number of classes, each of which is called a Class
Interval. 5'6"-5'8" is one class interval. The limits within which
a class interval lies are called Class Limits. In the present case 5'6"
and 5'8" are respectively the lower and the upper limits of this class.
The difference between two class limits is termed the Class Magnitude,
or Magnitude of the Class Interval. In the above example the magnitude
of the class-interval is 2". The number of items which fall in any
class-interval is called the Class Frequency. If the number of persons whose
heights are 5'6"-5'8" is 116, this would be the frequency of the class
5'6"-5'8".
Classification according to class intervals involves three basic
problems. They are:-
(a) Number of classes and their magnitude.
(b) Choice of class limits.
(c) Counting the number in each class.
Number of classes. Ordinarily, a frequency distribution should
not contain more than 20 to 25 and not less than 6 to 8 classes, depending
upon the total number of items of the series. If the number of
items in a series is large it can have a large number of class intervals
also, because in such a case all class intervals would have a fairly good
frequency. If, on the other hand, the number of items is small, the
number of classes should also be small, as otherwise there would be
no frequency in some classes and very little frequency in others. The
idea contained in the data can be easily and readily grasped when the
number of classes is small. But in such a case there is the danger of
obscuring some important characteristics of the data. If the number
of classes is large, all the characteristics of the data are contained in
them but on account of too many classes it becomes difficult to ascertain
them. In fact, a balance should be struck between these two factors.
An ideal number of classes for any frequency distribution would be that which
gives the maximum information in the clearest fashion.
Magnitude of intervals. The magnitude of class intervals depends
on the range of the data and the number of classes. If the range
(difference between the maximum and the minimum values) of the
heights of a group of persons is 15", and if it is desired to have 10
classes, the magnitude of each class interval would be 1.5". Besides
these things, a few other points should also be kept in mind. The
magnitude of the class intervals should be such that it does not distort
or obscure the important characteristics of the data. Bearing this
fact in mind the magnitude of the class interval should be 2, 5, 10,
25, 50, 100, 500, 1000, 5000 and so on, rather than odd figures like
1, 3, 7, 11, 24, 57, 92 and 472, etc. The multiples of 2, 5 and 10 are
in common use and the human mind considers them almost as natural
magnitudes.
In general, the class intervals should be of equal magnitude. If the
sizes of the class intervals are unequal it may give a misleading impression
and in such cases, comparison of one class with the other may not be
possible.
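The arithmetic of choosing a magnitude can be sketched in Python. The function names and the heights of 58" and 73" are assumptions made only for illustration; rounding the computed 1.5" up to the familiar figure of 2" is our own application of the advice given above.

```python
def class_magnitude(minimum, maximum, n_classes):
    """Range of the data divided by the desired number of classes."""
    return (maximum - minimum) / n_classes


def class_intervals(lower, magnitude, n_classes):
    """Equal class intervals as (lower limit, upper limit) pairs."""
    return [(lower + i * magnitude, lower + (i + 1) * magnitude)
            for i in range(n_classes)]


# Hypothetical heights from 58" to 73": a range of 15" in 10 classes.
print(class_magnitude(58, 73, 10))

# With the rounder magnitude of 2", eight classes cover the same range.
for limits in class_intervals(58, 2, 8):
    print(limits)
```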
Class limits. The most important thing that should be kept in
mind while choosing the class limits is that these should be chosen in
such a manner that the mid-point of a class interval and the actual average
of items of that class interval should be as close to each other as possible.
If it is not so, the class limits would obscure and distort the main
characteristics of the data. Consistent with this point, wherever
possible the class limits should be located at multiples of 2, 5, 10, 100
and such other figures. The class limits must be such that mid-points
of class intervals are familiar and common figures ending with 0, 2, 5, 10,
15, etc. These are capable of easy and simple analysis. As far as
possible in a frequency distribution there should be no indeterminate
classes like under 10 or over 10,000. Such classification may create
difficulties in analysis and interpretation.
The class limits may be written in any of the following ways : -

TABLE 1
I         II                  III       IV
0-10      0 and under 10      0-9        5
10-20     10 and under 20     10-19     15
20-30     20 and under 30     20-29     25
In the first method, items whose values are just 10 or 20 can
be classified either in the 0-10 group and 10-20 group respectively or in
the 10-20 and 20-30 classes respectively. Usually in such cases the item
is classified in the next higher class so that the item whose value is
exactly 10 would come in the 10-20 group. In the second method, this
point is made clear. Items whose values are less than 10 would
be in the 0-10 class interval. This is the exclusive method of
classification. In the exclusive method the items whose values are equal
to the upper limit of a class are grouped in the next higher class.
In other words, the upper limit of a class is excluded and items with
values less than the upper limit are taken into account. As against
this the third method is inclusive. In it the upper limit is also
included in the class interval. This method, in reality, is like the second
method as 0-9 means 0 and under 10. To emphasise this point sometimes
the class interval is written as 0-9.99. The fourth method indicates
only the mid-points.
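The treatment of an item lying exactly on a class limit can be sketched in Python. Both function names are our own, and the inclusive label assumes that the values are whole numbers, as in the 0-9, 10-19 form shown in the table.

```python
def exclusive_class(value, lower=0, magnitude=10):
    """Exclusive method: an item equal to an upper limit goes to the
    next higher class, so each class holds lower <= value < upper."""
    start = lower + (value - lower) // magnitude * magnitude
    return (start, start + magnitude)


def inclusive_label(value, lower=0, magnitude=10):
    """Inclusive way of writing the same class for whole-number values."""
    start, end = exclusive_class(value, lower, magnitude)
    return f"{start}-{end - 1}"


print(exclusive_class(9), exclusive_class(10))
print(inclusive_label(9), inclusive_label(10))
```

Both methods place the value 10 in the second class; they differ only in how the class is written.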
Counting the number of items in each class. After deciding the number
of classes, their magnitude and class limits, the next thing to be done
is to count the number of items falling in each class. This can be done
in any of the following ways : -
(a) By tally sheets. Under this method, the class intervals are
written on a sheet of paper (called Tally Sheet) and for each item a
stroke is marked against the class interval in which it falls. Usually
after every four strokes in a class, the fifth item is indicated by drawing
a horizontal or diagonal line over or through the strokes. These groups
of five are easy to count. Data sorted in such a manner would give
the following type of tally sheet.
TABLE 2
Number of marks obtained by 80 students
(Tally Sheet)
Marks     Tally marks                               Total
20-30     ||||/ ||||/ ||                              12
30-40     ||||/ ||||/ ||||/ |||                       18
40-50     ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |       31
50-60     ||||/ ||||/                                 10
60-70     ||||/ ||||                                   9

Total                                                 80
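The work of a tally sheet can be imitated in Python. The function names and the twelve marks below are invented for illustration (the 80 original marks are not given in the text), and a group of five is shown here as four strokes followed by a slash.

```python
from collections import Counter


def class_frequencies(values, lower, magnitude):
    """Count the items in each class, keyed by the lower class limit.
    An item equal to an upper limit falls in the next higher class."""
    counts = Counter(lower + (v - lower) // magnitude * magnitude
                     for v in values)
    return dict(sorted(counts.items()))


def tally_marks(count):
    """Strokes for a frequency, written in groups of five."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


# Hypothetical marks of twelve students.
marks = [23, 35, 41, 47, 52, 38, 44, 29, 61, 45, 33, 49]
freq = class_frequencies(marks, lower=20, magnitude=10)
for start, count in freq.items():
    print(f"{start}-{start + 10}  {tally_marks(count):<8} {count}")
```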

(b) By mechanical aids. Various types of machines are now available
for purposes of sorting and listing of data. Some of these machines
are hand operated while others are operated with electricity. Among
the hand operated methods, Needle Sorting has
become very popular now. A large number of items can be sorted
with it under any number of headings and sub-headings. Cards of
convenient size and shape, with a series of holes, are used in this method.
Each hole stands for a value and when cards are stacked, a needle passes
through the particular hole representing a particular value. These cards
are later on separated and counted. In this way frequencies of various
classes can be found out by the repetition of this technique.
The technique of punched cards is also equally popular. In this
method the data are recorded on special cards by punched holes made
by means of a special key punch which can be operated either by hand
or electrically. Hollerith and Powers Samas sorting machines sort
the cards at a speed of about 24,000 per hour. Thus we find that
mechanical aids have made the work of classification very easy, quick
and accurate.
SERIATION
Definition. The process of seriation is closely associated with
classification of data. According to L. R. Connor, "If two variable quantities
can be arranged side by side so that the measurable differences in the one
correspond to the measurable differences in the other the result is said to form a
statistical series." If the production figures of wheat in India for the last 10 years
are arranged systematically they would form a statistical series. Similarly
if the marks obtained by a group of 100 students or their heights or
weights are properly arranged they would form statistical series.
The classification of data can be done on three bases, time, space
and condition, and they give rise to three types of statistical series known
as Time Series, Spatial Series and Condition Series.
Time Series. Time series are also known as historical series as
the data collected relate to either past or present. If the figures of
enrolment of students in the Allahabad University during the last 30
years are properly arranged they would form a time series. Similarly
figures of the population of India during the last eight censuses would
form a historical series. The changes in the level of phenomena measured
are related to the changes in time.
Spatial Series. If the data collected do not change in relation to
time but in relation to place the series is called a spatial series. Technically
speaking, such series are not statistical series because changes in place
are not capable of a quantitative measurement. As per the definition
given above, in statistical series both phenomena should be variable
and capable of quantitative measurement. However, in common
parlance data arranged on the basis of place are called spatial series. If
the figures of production of wheat for a particular year, for different
States of India, were noted down they would form a spatial series as
the data are in relation to place.
Condition Series. If statistical data are recorded on the basis of
changes in some condition, the series so formed is called a condition
series. If the data relating to the heights of 100 students were classified
they would form a condition series, as the figures are neither on the
basis of time nor place, but a particular condition, namely, height.
Similarly data relating to income, expenditure, marks, and weight of
students would give rise to condition series.
Discrete and continuous series. Statistical series may be either
discrete or continuous. A discrete series is formed from items which
are capable of exact measurement. In such cases the various units


are not capable of division. Each unit of data is separated and complete.
We can count the number of persons whose salaries are exactly Rs.
100 per month, Rs. 105 per month, or Rs. 200 per month. The
data would give rise to discrete series. But there are certain pheno-
mena which are not capable of exact measurement like height or weight.
Height of an individual cannot be measured with absolute accuracy
and as such, we cannot count the number of persons whose heights
are exactly 5'46
•The actual height may vary by a thousandth part of
an inch from this figure. In such cases, therefore, the data are given
in relation to certain groups or class intervals. For example we can
count the number of persons whose heights are between 5'3" and 5'4".
Here an exact measuement is not possible. Such series are called
continuous series.
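The grouping of a continuous variable into class intervals can be sketched in a few lines of Python. This is only an illustration: the heights are hypothetical, the classes are 2 inches wide starting at 60 inches, and the convention that the lower limit is included and the upper limit excluded is an assumption, not something the text lays down.

```python
from collections import Counter

# Hypothetical heights in inches (illustrative values only).
heights = [60.5, 61.2, 63.0, 63.9, 64.4, 65.1, 66.7, 62.0]

def class_interval(height, start=60, width=2):
    """Return the (lower, upper) class interval containing `height`.

    Assumed convention: lower limit included, upper limit excluded.
    """
    lower = start + width * int((height - start) // width)
    return (lower, lower + width)

# Count how many heights fall in each class interval.
frequency = Counter(class_interval(h) for h in heights)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower}-{upper}: {count}")
```

No person is counted as having "exactly" a given height; each measurement is simply assigned to the class in which it falls.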
In continuous series the statistical unit is capable of division and
can be measured in fractions of any size, no matter how small. A
ton of coal can be divided into 100, 1000, 10,000 or even more parts.
Theoretically a ton of coal can be divided into a limitless number of
sub-divisions. In discrete series the statistical unit is either not divisible
or is not divided. We can imagine half a ton, one-fourth of a gallon or
one-tenth of a pound, but it would be absurd to talk about half a son,
one-fourth of a student and one-tenth of a wife. Here the unit is complete
and indivisible. We can, however, have discrete series even from
divisible units where they are not conventionally divided. For example,
marks are given to the students in whole numbers. It is possible in
such a case to have a discrete series of marks.
Simple and cumulative series. Statistical series can be either simple or
cumulative. In a simple series the frequency against each class interval
or value is shown separately and individually. In a cumulative series
the frequencies are progressively totalled and aggregates are shown.
The following example would clearly show the difference between
discrete and continuous series and simple and cumulative series:-
TABLE 3
Discrete and Continuous Series

Discrete Series                        Continuous Series
No. of children    No. of couples      Height in inches    No. of persons
per couple
2                  50                  60-62               12
3                  75                  62-64               15
4                  40                  64-66               24
5                  28                  66-68               13
The above series are simple. If they have to be converted into
cumulative series they would be as follows : -
TABLE 4
Cumulative Series

No. of children    No. of couples      Height in inches    No. of persons
per couple
Up to 2            50                  Up to 62            12
Up to 3            125                 Up to 64            27
Up to 4            165                 Up to 66            51
Up to 5            193                 Up to 68            64
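The conversion of a simple series into a cumulative one is only a running total, and may be sketched as below. The frequencies are those of Table 3; the second frequency of the discrete series is taken as 75, an assumption that follows from the cumulative total of 125 shown in Table 4.

```python
from itertools import accumulate

# Simple frequencies of the discrete series (couples by number of children).
# The value 75 is inferred from the cumulative figure 125 in Table 4.
children = [2, 3, 4, 5]
couples = [50, 75, 40, 28]

# Simple frequencies of the continuous series (persons by height class).
upper_limits = [62, 64, 66, 68]
persons = [12, 15, 24, 13]

# Progressive totals give the cumulative series.
cumulative_couples = list(accumulate(couples))
cumulative_persons = list(accumulate(persons))

for limit, total in zip(children, cumulative_couples):
    print(f"Up to {limit} children: {total} couples")
for limit, total in zip(upper_limits, cumulative_persons):
    print(f"Up to {limit} inches: {total} persons")
```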
TABULATION
Meaning and importance. In the broadest sense "tabulation is an
orderly arrangement of data in columns and rows". It involves the systematic
presentation of data to elucidate the problem under investigation. It
is a process between the collection of data on the one hand, and its
final analysis on the other. In fact tabulation is meant to properly
arrange the answers relating to the questions posed in any investigation,
and is very helpful in analysis of the collected data as also in drawing
inferences from them. Tabulation is the final stage in collection and
compilation of data, and is a sort of stepping-stone to the analysis and
interpretation of figures. In deciding about the type of tabulation one
has to keep in mind the nature, scope and object of the enquiry. Tabulation
of data should be done in such a form that it suits the nature and
object of the investigation. The importance of proper tabulation is
very great because if the tabulation of data is not satisfactory its analysis
will not only be difficult but defective also.
Types of tabulation
Simple and complex tabulation. Broadly speaking, tabulation of data
can be either simple or complex. Simple tabulation gives information
about one or more groups of independent questions. Complex tabulation
shows the division of data in two or more categories and as such
is meant to give information about one or more sets of inter-related
questions.
One-way tables. Simple tabulation usually gives rise to single or one-way
tables. One-way tables supply answers to questions about one
characteristic of data only. The following table will illustrate the point:-
TABLE 5
Marks obtained by 100 students in statistics

Marks        Number of students
30-40        14
40-50        26
50-60        30
60-70        20
70-80        10
Total        100

This table tells us about the number of students in each class-interval
of marks obtained by them. We can know from this table
that 30 students obtained marks between 50 and 60. This table also
tells us that the minimum marks range from 30 to 40 and the maximum
from 70 to 80. Thus this one-way table gives us information only
about one characteristic of data, that is, marks of students in statistics.
All the questions that can be answered from the table would be
independent of each other.
Two-way tables. As against the above type of table there are double
or two-way tables. Two-way tables give information about two inter-related
characteristics of a particular phenomenon. If the numbers
of students given in the above table are further divided sex-wise, the
table would become a two-way table because it would give information
about two characteristics, namely, the marks obtained by students in
statistics and the sex-wise distribution of students in various class intervals
of marks. The shape of the table will be as follows:-

TABLE 6
Marks obtained by 100 students in statistics (sex-wise)

                 Number of Students
Marks        Males      Females      Total
30-40        8          6            14
40-50        16         10           26
50-60        14         16           30
60-70        12         8            20
70-80        6          4            10
Total        56         44           100

The above table is capable of supplying information about questions
relating to two inter-related phenomena. From the table not only
can we find out that 30 students obtained marks between 50 and 60 but
also the fact that out of them 14 were males and 16 females.
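How the row and column totals of a two-way table arise from its cells may be sketched as follows. The cell figures are those of Table 6; the variable names are merely illustrative.

```python
# Cells of Table 6: marks class -> (males, females).
table6 = {
    "30-40": (8, 6),
    "40-50": (16, 10),
    "50-60": (14, 16),
    "60-70": (12, 8),
    "70-80": (6, 4),
}

# Row totals answer the one-way question: how many students in each class?
row_totals = {marks: males + females for marks, (males, females) in table6.items()}

# Column totals answer the other one-way question: how many of each sex?
male_total = sum(males for males, _ in table6.values())
female_total = sum(females for _, females in table6.values())
grand_total = male_total + female_total

print(row_totals)
print(male_total, female_total, grand_total)
```

The row totals reproduce the one-way Table 5, which shows that a two-way table contains the corresponding one-way tables in its margins.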
Three-way tables. If three inter-related phenomena are to be studied
there would be treble or three-way tables. A three-way table can
answer questions relating to three inter-related problems. In the above
example if we further find out the number of students who were hostellers
and the number who were day-scholars a three-way table would be
necessary. It would be as follows:-
TABLE 7
Marks obtained by 100 students in statistics (sex-wise and
on the basis of residence)

                                  Number of Students
             Males                    Females                  Total
Marks    Host-   Day     Total    Host-   Day     Total    Host-   Day     Total
         ellers  Scho-            ellers  Scho-            ellers  Scho-
                 lars                     lars                     lars
30-40    4       4       8        4       2       6        8       6       14
40-50    10      6       16       5       5       10       15      11      26
50-60    8       6       14       9       7       16       17      13      30
60-70    7       5       12       5       3       8        12      8       20
70-80    5       1       6        2       2       4        7       3       10
Total    34      22      56       25      19      44       59      41      100
The above table can supply us information about (1) marks obtained
by students, (2) the distribution of these students sex-wise and (3) the
distribution of the students on the basis of residence.
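The relation between a three-way table and the two-way table from which it grew can be sketched by summing out the third characteristic. The figures are those of Table 7 (the number of male day scholars in the class 30-40 is taken as 4, which is what the row total of 8 implies); the nested-dictionary layout is only one convenient, assumed representation.

```python
# Cells of Table 7: marks class -> sex -> (hostellers, day scholars).
table7 = {
    "30-40": {"Males": (4, 4), "Females": (4, 2)},
    "40-50": {"Males": (10, 6), "Females": (5, 5)},
    "50-60": {"Males": (8, 6), "Females": (9, 7)},
    "60-70": {"Males": (7, 5), "Females": (5, 3)},
    "70-80": {"Males": (5, 1), "Females": (2, 2)},
}

# Summing over residence collapses the three-way table into a two-way table.
two_way = {
    marks: {sex: hostellers + day for sex, (hostellers, day) in by_sex.items()}
    for marks, by_sex in table7.items()
}

# Summing over marks and sex gives the residence totals.
hosteller_total = sum(h for by_sex in table7.values() for h, _ in by_sex.values())
day_scholar_total = sum(d for by_sex in table7.values() for _, d in by_sex.values())

print(two_way["50-60"])
print(hosteller_total, day_scholar_total)
```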
Higher order tables. The tables can also be manifold or of higher
order. Such tables supply information about a large number of inter-related
questions. If in the above table additional information is given
about civil conditions of the students it would become a four-way table
and similarly tables can be of still higher order: five-way, six-way, and
so on. All such tables are called manifold or higher order tables.

TABLE 8
Marks obtained by students (sex-wise, on the basis of civil
conditions and residence)

[Blank table. Stub: residence (Hostellers, Day scholars, Total), each
sub-divided by marks (30-40, 40-50, 50-60, 60-70, 70-80, Total), followed
by a Grand Total row. Columns: Number of Students, beginning with Males;
the remaining column headings are not recoverable.]
The above table gives information about a large number of inter-related
questions regarding students, namely, about the marks obtained,
sex-wise distribution, civil conditions and residence. Manifold tables
are very useful in presenting population census data.
Rules of tabulation
Having discussed the meaning, importance and types of tabulation,
it is necessary to lay down certain rules regarding construction of tables.
The following general rules should be observed in the construction
of tables:-
1. The table should be precise and easy to understand. It should
not be necessary to go through footnotes or explanation to properly
understand a table.
2. If the data are very large they should not be crowded in a
single table. This would increase the chances of mistakes and would
make the table unwieldy and inconvenient. Such data can be presented
in a number of tables. Each table should be complete in itself and
should serve a particular purpose.
3. The table should suit the size of the paper and, therefore,
the width of the columns should be decided beforehand.
4. There should be thick lines to separate the data under one
class, from the data under another class and the lines separating the
sub-divisions of classes should be comparatively thin.
5. The number of main headings should be few though there
is no harm if the number of sub-headings is large. This will help in
understanding the main points of the table.
6. Captions, headings or sub-headings of columns, and headings
and sub-headings of rows must be self-explanatory.
7. Those columns whose data are to be compared should be
kept side by side. Similarly percentages, totals and averages must also
be kept close to the data.
8. As far as possible figures should be approximated before
tabulation. This would reduce unnecessary details.
9. The units of measurement under each heading or sub-heading
must always be indicated.
10. Total of rows should be placed in the extreme right column,
though sometimes they are placed in the first column after the vertical
captions on the left. The totals of columns should ordinarily be placed
at the foot though in some cases it is helpful to place them at the top of
the table.
11. Items should be arranged either in alphabetical, chronological
or geographical order or according to size, importance, emphasis or
causal relationship to facilitate comparison.
12. If certain figures are to be emphasised they should be in distinctive
type or in a "box" or "circle" or between thick lines.
13. When percentages are given side by side with original figures
they should be in a separate type, preferably italics.
14. If some portion of collected data cannot be classified in any
class or division a miscellaneous class should be created and the data
shown in it.
15. There should be a proper title to each table. It should tell
what exactly the table presents.
Besides the rules mentioned above, the figures should be scruti-
nized before being entered in a table. Below a table, should be given
the method of collection, sources of data, general results obtained and
their limitations. The probable error should also be mentioned.
It should be remembered that there cannot be any rigidity about
these rules. Tables must suit the needs and requirements of an
investigation. Bowley has correctly said that "in collection and tabulation
common sense is the chief requisite and experience the chief
teacher."

Questions
1. What do you understand by classification, seriation and tabulation? Discuss
their importance in a statistical analysis.
2. "Classification is the process of arranging things (either actually or notionally)
in groups or classes according to their resemblances and affinities giving expression
to the unity of attributes that may subsist amongst a diversity of individuals."
Elucidate the above statement. (B. Com. Allahabad, 1947).
3. How would you proceed to classify the observations made and what points
will you take into consideration in tabulating them? Mention the kinds of tables
generally used. (B. Com. Agra, 1941).
4. What precautions would you take in tabulating your data?
(B. Com. Agra, 1933).
5. "In collection and tabulation common sense is the chief requisite and
experience the chief teacher."-Bowley.
What precautions in your opinion are necessary to avoid statistical errors in the
collection and computation of primary data? (M. A. Agra, 1940).

6. Discuss the main functions and importance of tabulation in a scheme of
investigation. Prepare blank tables to show distribution of students of a college
according to age, class and residence for arranging (a) Physical training and
(b) Tutorial classes.
7. (a) Draw up a blank table with suitable headings, spacings, table of lines,
etc. in which could be shown the number and tonnage of ships entered and cleared
at ports in India for 10 years distinguishing steam and sailing vessels and also those
with cargoes from those in ballast.
(b) What do you mean by "A Statistical Unit of Measurement"? Give a
suitable illustration. (B. CO/JI. H()JIs. AiMDTII, 194%)
8. Draw up two independent blank tables giving rows, columns and totals in
each case summarizing the details about the members of a number of families
distinguishing males from females, earners from dependants and adults from children.
9. Draw up in detail, with proper attention to spacing, double lines, etc.,
and showing all sub-totals, a blank table in which could be entered the numbers
occupied in six industries on two dates, distinguishing males from females, and
among the latter single, married and widowed. (M. A. Alld., 1940).
10. Explain how you would tabulate statistics of death from principal diseases
by sexes, in two different provinces in India for a period of five years.
(M. Com. Calcutta, 19")
11. Prepare a table with a proper title, divisions and subdivisions to represent
the following heads of information:-
(a) Export of cotton piecegoods from India.
(b) To Burma, China, Java, Iran, Iraq.
(c) Amount of piecegoods to each country.
(d) Value of piecegoods to each country.
(e) From 1939-40 to 1945-46 year by year.
(f) Total amount exported each year.
(g) Total value of exports each year.
(M. Com. Alld., 1946).
12. Explain the purpose and methods of classification of data. How are the
machine tabulating cards prepared and used?
13. Prepare a blank form with suitable heading and spacing for use in collection
of data on one of the following:-
(a) Survey of trades in your district.
(b) Standard of living of middle class families in a small town.
(c) Expenses of students in a university.
14. Distinguish between one-way, two-way, three-way tables and tables of
higher order. Illustrate your answers with examples.
15. Write short notes on:-
(a) Classification according to attributes.
(b) Class limits.
(c) Magnitude of class interval.
(d) Complex tabulation.
(e) Class frequency.
16. (a) What is the motivation for arranging observed data in a frequency
distribution with a number of class-intervals of the variable?
(b) What are the principles governing the choice of (i) the number of
class-intervals, (ii) the length of the class-interval and (iii) the mid-point of the
class-interval?
(c) It is said that in obtaining a frequency distribution of the census age
returns, the mid-points of the class-intervals should be multiples of 5. Give an
explanation.
(d) For a frequency distribution of marks in history of 200 candidates
(grouped in intervals 0-5, 5-10, ..., etc.) the mean and standard deviation were
found to be 40 and 15. Later it was discovered that the score 43 was misread as 53
in obtaining the frequency distribution. Find the corrected mean and standard
deviation corresponding to the corrected frequency distribution. (I. A. S., 1951).
17. You are given a statistical table. What questions would you ask before
accepting it? Draft a form of tabulation to show:-
(a) Sex; (b) Three ranks: Supervisors, assistants, and clerks; (c) Years 1918
and 194~; (d) Age-groups: 18 years and under, over 18 but less than H years, over
H years. (D. A. Mlll/r6s, 1953).
18. What information can be obtained from a frequency distribution?
19. What are the advantages and disadvantages in having a large number of
class intervals? Discuss.
20. Define Frequency Distribution. State the principles to be observed in its
formation.

The following is a record of weights of 70 students (in lbs.). Tabulate the
data in the form of Frequency Distribution, taking the lowest class as (60-69):-
61 73 93 107 111 76 78 69 96 72
80 88 96 109 103 84 84 106 91 7J
91 92 101 91 101 90 77 10 5 90 86
(13 101 114 7 1 77 1I8 9S 63 99 81
100 106 ~7 89 91 1 07 III 76 8, 86
106
109
107
97
62
74
94
98
73
67
108
Sa
liS
10 4
8S
88
98
88 91
9'
21. Make a frequency table in descending order in inclusive form from the data
given below, selecting a class interval of, units each.

1" I,.I,.a.,
17. a" 19. 11,,14. 'I,.
i6.17. I,.
Ja, 18. ale 15. 20,
10. 22, .17, 21, 19. 19. 16. 18, II, 18, 10 •
11.
19. 17. t6. 14.

22. Following is the record of marks obtained by 90 candidates in an
examination. Form a frequency distribution.
84, 91. ,8, 71 • 44. 87. 76.4" as, 40. 75. 86. 77. 1S. n. 71 , '4. 46, H. 45,
n.76• 94. 6,. 74, ,0,6,,80. '7, ", $6. H. 91. n. 6,. 69. 47. 119.57. n. b,40,
27. 84, H. %9,51.72 • 44. 19, 11, 67, 58, 76.3 8• 16, 37. 74, 46. 50. IS. '9. a7A 92,
IS. 4'. 61. '9.78, 15. 12, 71, 6a. II, 41, ,8, 27, 66, St. 29,6,.47.59.19. a., ".
39. 80. ". (S. A. A/JuInu. I,,,.)
23. Convert the following data of class-frequencies cumulated from the top
and from the bottom into usual type of class-intervals with individual class-frequencies.
(i) Class-frequencies cumulated from the top. (ii) Class-frequencies cumulated
from the bottom.
Below
. ,
Matb Studen~ Abo~e Mub Stucknta

".,
10
I,
10
2ll ..... ,
0 n
45

.. ..
57 10
20
I,
50
H
1,
20
"
IS
S
"

24. In an enquiry to find out relation between age and monthly wages, the
following data were collected from 40 mill workers:
S. No. Age (Years) Wage (Rs.) S. No. Age (Years) Wage (Rs.)
I. 37 81 :11 41 89
1. al 100 aa 38 9a
3· 49 101 a3 41 8I
4· ,6 109 24 37 140
S· 57 (02. as 4S 94
6. 34 104 a6 4 6 .n9
7· 25 8( 2.7 28 99
8. 48 tit, a8 43 109
9. 51 100 2.9 41 92.
10. 41 89 30 31 no
n. 4, 15' 31 5S tao
12.. H 101 32 42 115
13· 38 99 H 4Q 119
14· 41 U3 H 4S 90
IS· 31 100 3S So 76
16. 30 99 56 24 IS8
17· 55 130 37 :n 76
til. 30 159 38 u 76
19· 2.9 90 59 al 94
ao. u 79 40 58 89
Tabulate the above data in the following form:-

                      No. of Mill Workers
Wage group     Age group     Age group     Age group     Total
in Rupees      21-30 yrs.    31-40 yrs.    41-50 yrs.
76-100
101-125
126-150
151-175
Total

Note:- Your answer should show the actual process of tabulation.
(Raj. B. Com. 1958)
25. Present the following information in a suitable tabular form:-
In 1940-41 the total production in India (in thousand tons) of the principal
oil-seeds was as follows: Ground-nuts 3702; linseed 454; rape and mustard n05; castor
105; sesamum .33. Next year the production of each of the first three items fell by
56% and of the remaining items fell by 10% each.
In 1942-43 there was an increase compared to the preceding year of 8% in
ground-nuts, 12% in linseed, 1% in rape and mustard, 50% in castor and 10% in sesamum.
In the next year the figures were respectively ,825, 395, 955, 140 and #1.
(M. Com., Delhi, 1959).
26. The following is the summary of the time of leaving home and the number
of hours spent in the institution of a group of teachers in Bombay University:-
One teacher leaves the home before, .30 a.m. and spends 4 hours in the
institution. Of the 23 teachers who leave their homes between 6 and 7 a.m., 7 teachers
spend 3 hours, 11 teachers 4 hours, 2 teachers 5 hours, and 3 teachers 6 hours. Of
the 16 who leave between 7 and 8 a.m., 4 teachers spend 3 hours, 6 teachers 4 hours,
I teachct •.. S houn and ~ teachera ... 6 houJ:l'. C ~. 9%, v.Lo leave-between 8 and 10
a.m., 6 teachets... 3 houtsl 9 teachera ... 4 houts, :zx'teacher ", I-OUI1' md 46 teachers••.
6 houts. Of the :n who leave between 10 and II a.m., %r,1II,r-.htts"'5 houts, S teachers
· .. 4 houts, 7 teachets ... Shouts and 4 teachets... 6 houts
Present the summary in a suitable tabular form.
(Raj., B. Com., 1961).
27. Arrange in a suitable tabular form the following:
The Food Grain Enquiry Committee made the following comparable study of
size of holdings in the Eastern U. P. and the rest of U. P.
In the 14 eastern districts of U. P., holdings below 2 acres account for 20% of
the area under all holdings comprising a total area of U%SO (thousand acres); the
corresponding figures for the rest of U. P. are 11% and %9036 (thousand acres). Similarly,
the proportion of area covered by holdings exceeding 2 acres but not exceeding 5
acres to the area under all holdings is 29% in 14 districts and only 3% in the rest of
U. P. On the other hand the proportion of area covered by holdings exceeding 5 acres
is much greater in the rest of U. P. than in the 14 districts. (Delhi, M. A., 1958).
(Banaras, B. Com., 1960).
28. In a newspaper account, describing the incidence of influenza among
tubercular persons living in the same family, the following paragraph appeared:-
"Exactly a fifth of the 1,00,000 inhabitants showed signs of tuberculosis and no
fewer than 5000 among them had an attack of influenza, but among them only 1000
lived in infected houses. In contrast with this 1/15th of the tubercular persons who
did not have influenza were still exposed to infection. Altogether 21,000 were
attacked by influenza and 41,000 were exposed to the risk of infection, but the number
having an attack of influenza but not of tuberculosis and living in houses where no other
cases of influenza occurred was only 2,000."
Redraft the information in a concise tabular form.
(M. Com., Agra, 1962).
(R. A. S., 1960).
(M. A., Delhi, 1957).
29. The following figures give the height in inches of 80 students of a class.
Represent the data by a frequency distribution with suitable class-intervals:-
6Z.I, 65.5. 6~.0, 62.2, 64.7, 63.1, 6S.8, 62.3, 60.7, 63.2, 64.1, 59.6, 64.S, 61.1.
65.7,60.2,64.3,67.4, 64.S, 664, 64. 2, 6204, 63·3, 64,0, 6z." 6,.4, 66,3, S9'9, 63·5, 61.8,
6S.4, 67'3,60.4,6,,6,59.1,64. 8,61.9,62.6,67.0,68.1, '9.4, §3.6 64.4, 62.0, 63.7, 6S.3"
63.8,667, 63.9, 60.8,63.0,64.3, 6uz, 6%,7, 64.6, 64.9, 60." 64.4, 61.7, 66.5, 66.6'
63.4, 6S.2, 66.2, S9.7, 67.6, 63·5, 67.41 63. 6, 68.5, 60.0, 61.3 63.6, 61.S, 6,.1,6%.8,61'3
64.0, 68.7, 66.6.
30. Arrange the following marks in a Frequency Table taking the lowest class
interval 10-20:-

13, 81, 58, 81 SS, 7S, 61, 70, 84, 84, 81. 87, 67, 6" 62, 62, 61, S9, 5S, 57, 75,
72, 84, 91, 87, 76, 43, 83, 40 , 73, 86, 73, 43, 33, 76, 95, 73, 65, 77, 72, 72, 29,
43 85, 4%, 80, 75, 85,62, 57, 64, 70,95, 57, 74, ,0. 7S, 49, 55,64, 92, 73, 73, 96,.
69 51, 22, 7S, 80, 36, 70 8S, 47, 69,63, 53, 91, H. 69, 30. (AndbrfJ, B. A., 1914)
31. Tabulate the following data by taking 10 as the class-interval:
30, 45, 55, 65, 60, 90, lIS. 8s. 95> 100, 95, '65, 75. 8S, IZS, lIO, 87, 6"
100, lIS, 65, 60, 75, 9S, 130, 95, 125, II5, 6" 70, 9" 8" 6S, 60, 80, 8"
75, 95. 55, 45, 35, 45, 40, 85, 135, 140, 9S, 65, 4S, 3', U5, 90,80. IZS, 130,
~5. 90, 100, 95, 85, 85, uo, II5, 40, 35, 12 5, 35, lOS, 7',45,
(B. CtIIII., Vwa"" 1964).
Ratios, Percentages and Logarithms 8
RATIOS AND PERCENTAGES

Need. After the statistical data have been collected, edited, classified
and tabulated, they are ready for further statistical analysis. In the
process of classification and tabulation the size of the data is considerably
reduced and a large number of figures are condensed. This is done with
a view to make the data easily understandable and fit for analysis and
interpretation. But even after condensation, data might be fairly large
in quantity and the figures may be very big and unwieldy. It may not be
easy to draw inferences from them. To remove this difficulty, sometimes,
ratios and percentages are calculated so that big figures are reduced to
small ones and a relative study of the data is possible. Absolute figures
are unfit for relative study and in statistical analysis, where most of the
data are compared relatively, absolute figures, even though they are
essential, do not have very great significance.
Derivatives
Ratios and percentages are obtained by a combination of two or
more figures. They are derived from the absolute figures collected for the
purpose of investigation, and that is why they are sometimes referred
to as "derivatives." A derivative is a quantity obtained by the combination
of two or more figures. In a statistical analysis a variety of derivatives
are used. Ratios, percentages, rates, coefficients, measures of central
tendency and measures of dispersion, skewness and kurtosis are all statistical
derivatives. Ratios and percentages are simple derivatives while measures
of central tendency or averages of the first order and measures of
dispersion and skewness or averages of the second order are complex
derivatives, as in their calculation a number of statistical processes have to be
undergone. Simple derivatives may be either co-ordinate or subordinate.
When two or more parts of a universe are compared with each other with
the help of ratios or percentages these derivatives are called co-ordinate
derivatives, and when a part of the universe is compared with the whole
of the universe the derivatives are said to be subordinate. The ratio of
females and males in a population is an example of a co-ordinate derivative
and the ratio of females to the total population is a subordinate derivative.
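The distinction between the two kinds of simple derivative can be shown with a small computation. The population figures below are hypothetical and serve only to illustrate the two definitions.

```python
# Hypothetical universe: a population split into two parts.
females = 480
males = 520
total_population = females + males

# Co-ordinate derivative: one part of the universe compared with another part.
females_per_male = females / males

# Subordinate derivative: a part of the universe compared with the whole.
females_share_of_total = females / total_population

print(f"Females per male (co-ordinate): {females_per_male:.3f}")
print(f"Females as a share of the total (subordinate): {females_share_of_total:.3f}")
```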
Ratios. In the simplest possible form, a ratio is a quotient or the
numerical quantity obtained by dividing one figure by another. If
800 is divided by 100 the quotient is 8. Here 800 has been compared
with 100 which is the base in this case. In other words, 800 is to 100
as 8 is to 1. Or 800 : 100 : : 8 : 1. The process reduces the size of the
numbers and thus facilitates comparison. Instead of saying that the
production of wheat in country A is 3,50,800 tons and in B it is 1,00,000
tons, we can say that the ratio of production in these countries is 3,50,800
tons to 1,00,000 tons or 3.508 to 1. Ratio is the simplest form of
relative comparison between two figures. Ratios are used in a number
of ways. In the ordinary walk of life, the common man is subconsciously
aware of them. "Nehru is one man in a million", or "cost of living
has gone up four times" or "he can lift thrice his own weight" are
expressions in common use. They indicate the universal use of ratios.
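The two numerical illustrations of a ratio given above may be verified as follows. The helper name `ratio_to_one` is hypothetical; exact fractions are used only to avoid rounding in the check.

```python
from fractions import Fraction

def ratio_to_one(figure, base):
    """Express `figure` relative to `base`, i.e. as 'x to 1'."""
    return Fraction(figure, base)

# 800 compared with the base 100.
simple_ratio = ratio_to_one(800, 100)

# Wheat production: 3,50,800 tons in country A against 1,00,000 tons in B.
wheat_ratio = ratio_to_one(350800, 100000)

print(f"{float(simple_ratio)} to 1")
print(f"{float(wheat_ratio)} to 1")
```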
Percentages. Ratios are very often expressed as percentages. In
the calculation of percentages also, one figure is taken as base and is
represented by 100. The other figure is expressed as a ratio of this base.
80 is 400% of 20 or 80 : 20 : : 400 : 100. Instead of saying that the
export of a commodity from a country was valued at 45 lakhs in the year
1955, and 30 lakhs in the year 1954, we can say that the exports in the
year 1955 were 150% of the figures for the year 1954.
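The percentage computation described above amounts to dividing by the base and multiplying by 100. A minimal sketch, with a hypothetical function name, using the two examples from the text:

```python
def percentage(figure, base):
    """Express `figure` as a percentage of `base` (the base stands for 100)."""
    return figure / base * 100

# 80 as a percentage of 20.
pct_simple = percentage(80, 20)

# Exports of 45 lakhs in 1955 as a percentage of 30 lakhs in 1954.
pct_exports = percentage(45, 30)

print(pct_simple, pct_exports)
```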
Rates. Instead of 100 as base other figures like 1, 10, 1000, 10,000
can also be used. Usually when the base is 1000 the ratio is called a rate.
For example, if the number of deaths is divided by the total population
and the quotient is multiplied by 1000 we will get what is called the crude
death rate. However, there is no hard and fast rule that a rate should
have the base of 1000 only. Rate per 1000 is called rate per mille.
Coefficients. Rate per unit is called a co-efficient. The death rate in
India at present is about 1.7 per cent or 17 per thousand. We can say
that the co-efficient of deaths is .017. If this co-efficient of .017 is
multiplied by the total population we shall get the total number of deaths.
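The relation between a rate per thousand, a percentage and a co-efficient may be sketched as follows. The number of deaths and the population are hypothetical, chosen only so that the rate works out to the 17 per thousand mentioned in the text.

```python
# Hypothetical figures chosen to give a rate of 17 per thousand.
deaths = 17_000
population = 1_000_000

def rate(events, universe, base):
    """Number of events per `base` units of the universe."""
    return events * base / universe

crude_death_rate = rate(deaths, population, 1000)   # per mille
death_percentage = rate(deaths, population, 100)    # per cent
death_coefficient = rate(deaths, population, 1)     # per unit

# Multiplying the co-efficient by the population recovers the deaths.
recovered_deaths = death_coefficient * population

print(crude_death_rate, death_percentage, death_coefficient)
print(round(recovered_deaths))
```

All three figures describe the same relation; only the base differs.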
Size of the base. The above discussion clearly indicates that the
difference between ratios, rates, percentages and co-efficients is only in
the base on which they are calculated, otherwise all of them give a relative
picture of two phenomena which are interrelated. Which base should be
chosen for the purpose of comparison is a question which can be decided
after taking into account the nature of the data. Ordinarily the base
should be large enough to permit the numerator to be expressed as a
whole number and it should be small enough to prevent more than three
digits appearing in the numerator to the left of the decimal point. The
death rate in India is 17 per thousand. If the base was reduced from 1000
to 10 the numerator would be .17 which is less easy to understand. If on
the other hand, it was raised from 1000 to 1,00,000 the numerator would
be 1,700. This violates the principle of ratio which is to reduce large
numbers to smaller ones for the sake of easy understanding and analysis.
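The rule of thumb for the size of the base can be put into a small routine. This is only one possible reading of the rule: the base is restricted to powers of ten, and the function `choose_base` with its parameters is hypothetical.

```python
from decimal import Decimal

def choose_base(coefficient, max_digits=3, max_power=9):
    """Return the smallest power-of-ten base for which the numerator is a
    whole number with at most `max_digits` digits, or None if none is found.

    Assumption: only the bases 1, 10, 100, ... are considered.
    """
    limit = Decimal(10) ** max_digits
    for power in range(max_power + 1):
        base = 10 ** power
        numerator = coefficient * base
        is_whole = numerator == numerator.to_integral_value()
        if is_whole and 1 <= numerator < limit:
            return base
    return None

# Co-efficient of deaths from the text, entered exactly as a decimal.
death_coefficient = Decimal("0.017")
base = choose_base(death_coefficient)
print(f"{int(death_coefficient * base)} per {base}")
```

For the co-efficient .017 the bases 1, 10 and 100 give numerators that are not whole numbers, while 1,00,000 would give four digits; the routine therefore settles on 1000.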
Types of bases. Various types of bases can be used for the computation
of ratios. Some of the important ones are:-
1. Total to total. If one group of figures (as a whole) is compared
with another group, the base of the ratio would be the total of one of the
two groups. Income per capita is an example. In its calculation total
income is divided by the total population.

2. Total to part. Where a part is compared to a whole or universe,
the base of the ratio is usually the value of the universe. If the ratio of females
to the total population is calculated, the base would be the total population
and the ratio would be obtained by dividing the number of females
by the total population.
3. Part to part. If the ratio of males to females in a population
is studied the sex ratio of the population will have to be calculated.
This ratio is usually expressed as number of females per 1,000 males.
Here the base of the ratio is one of the two parts, males in this case.
4. Past to present. If the production of wheat in India in the year
1955 is to be expressed as a ratio of the production in the year 1954, we
shall have to use the figures of the past, that is, 1954 as the base of the
ratio. When it is said that the production of wheat in 1955 was 110
per cent of the production in 1954, it means that the production of 1954
is represented by 100 which is the base.
5. Standard area, distance and units. Sometimes the base of the ratio
is a standard area as in the case of population per square mile, or a standard
distance as in the case of cost of railway line per mile, or a standard or
conventional unit as in the case of children per family, students per school
or rooms per house, etc.
6. Arbitrary ratios. In many enquiries it is possible to use arbitrary
units and they sometimes give better results than even conventional
units. Examples of such units are horse-power, ton-mile, light-years,
class-hours, etc.
As has been said earlier, the most common arbitrary units are 1, 10,
100, 1000 and 1,00,000. Among these 100 (or per cent) is the most
popular arbitrary base.
Ratios between like and IInlike tlnits. In order to facilitate comparison
in the shape of ratios it is essential that the two figures compared should
have the same characteristics and should be expressed either in the same
unit or in comparable units. We can calculate a ratio between produc-
tion of wheat (in tons) and export of wheat <in tons). Similarly consump-
tion of cotton (in' bales) can be compared with its production (in bales), but
we cannot calculate ratios between production of wheat <in tons) and the
c0nsumption of cotton <in bales) .. Death rates are comparisons between
persons and persons. Number of persons dead are compared with the num-
ber of persons constituting the total pc·fmlation. However, in many cases
comparison has to be done between items which are expressed in dii'erent
units. For example, total income of a country is divided by the total
population to find out per capita income. Similarly in the comparison of
the number of miles -done with a gallon of p<;trol the units are different. A
direct comparison of rupee and persons or miles and gallon cannot be done
but they can be reduced to a common denominator. The common denominator
in such cases is a number or quantity. Thus, in comparing total income
with the total population we really divide the number of rupees representing
the total inco~e by the number of persons representing the total population.
Similarly in the second example above we compare the number of miles
with the number of gallons. We thus find that in these cases also the
numerator and denominator are identical. They are both numbers.
Fallacies in the use of percentages and ratios
Original data should be comparable. The percentages and ratios
should be used with caution otherwise they are liable to give misleading
conclusions. Percentages should be used to compare only such data
which are comparable in actual figures. If the original figures are not
strictly comparable with each other, for some reason or the other the per-
centages should never be used and the original data should be presented.
Usually it is said that percentages give a better idea of the relationship bet-
ween profits and capital than the original figures. It is for this reason that
profits are usually expressed as percentages of capital invested. This
is all right, but if the figure of capital investment undergoes a change in
any year, and if only percentages of profits of capital invested are shown,
it is likely to give wrong impression. An illustration would make the
point clear. Supposing a company has a capital of Rs. 1,00,000 for the
last five years, and its profits each year are Rs. 10,000 which means that in
each of these 5 years the profits are 10% of the capital. Suppose in the
next year the capital of the company is increased to Rs. 1,50,000 and the
amount of profit is Rs. 12,000. The percentage of profit to capital would
now be only 8. If only percentages of profits are shown for these six
years, it would appear as if the profits in the sixth year have gone down
whereas the profit has increased by Rs. 2,000. This is so on account of
the fact that the original data of capital and profits are no longer
comparable as the amount of capital has increased. In such cases, where the
homogeneity of the original data has been disturbed by any factor,
percentages should not be used without the original figures being shown side
by side.
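
The illustration above can be checked with a short computation. The Python
sketch below is an addition for illustration, not part of the original
exposition; the variable names are assumptions, and the figures are those of
the company described in the text.

```python
# Capital and profit (in rupees) for the six years described in the text.
capital = [100_000] * 5 + [150_000]
profit = [10_000] * 5 + [12_000]

# Profit expressed as a percentage of capital, year by year.
pct_series = [100 * p / c for c, p in zip(capital, profit)]

for year, (c, p, pct) in enumerate(zip(capital, profit, pct_series), start=1):
    # Showing the original figures side by side with the percentage
    # makes it plain that profit rose even though the percentage fell.
    print(f"Year {year}: capital Rs. {c:>7}, profit Rs. {p:>6}, {pct:.1f}% of capital")

# Change in absolute profit between the fifth and sixth years.
profit_change = profit[5] - profit[4]
print("Change in profit in the sixth year: Rs.", profit_change)
```

Printed with the original figures alongside, the fall from 10% to 8% is seen
to accompany a rise of Rs. 2,000 in profit.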
Basis of calculation of percentages. Misleading conclusions by the
use of percentages may also be derived on account of the wrong basis
of the calculation of the percentages. It is essential that the basis of
the percentage calculation is correctly observed. An example would
clearly illustrate how wrong conclusions can be arrived at, if the basis
of calculation of percentage is not properly observed. Suppose it is
said that the price of a commodity fell 5%, then went up 10%, then
again fell 20% and then again went up 25% over a period of time; it is
difficult to find out the change in the price level over the whole period.
Two different answers would be obtained according as the basis of
calculation of percentages is
(i) the original price, or
(ii) the price level ruling at the time of the change.
If the original price level was 100, changes according to the first
supposition would be:
Original price = 100
5% fall ∴ price becomes $100 - \frac{5 \times 100}{100} = 95$
10% rise ∴ price becomes $95 + \frac{10 \times 100}{100} = 105$
20% fall ∴ price becomes $105 - \frac{20 \times 100}{100} = 85$
25% rise ∴ price becomes $85 + \frac{25 \times 100}{100} = 110$
Thus according to this method the prices went up 10% over the
original price.
Using the second supposition:
Original price = 100
5% fall ∴ price becomes $\frac{95 \times 100}{100} = 95$
10% rise ∴ price becomes $\frac{110 \times 95}{100} = 104.5$
20% fall ∴ price becomes $\frac{80 \times 104.5}{100} = 83.6$
25% rise ∴ price becomes $\frac{125 \times 83.6}{100} = 104.5$
Or we can directly calculate the price as
$100 \times \frac{95}{100} \times \frac{110}{100} \times \frac{80}{100} \times \frac{125}{100} = 104.5$
Thus according to the second supposition the rise in price level
during the whole period is only 4.5%.
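
The two suppositions can be compared in a few lines of code. The following
Python sketch is an illustrative addition; the function names
`on_original_base` and `on_current_base` are assumptions introduced here, and
the percentage changes are those used in the example.

```python
# Successive percentage changes from the example: fall 5, rise 10, fall 20, rise 25.
changes = [-5, 10, -20, 25]

def on_original_base(start, changes):
    """Every percentage is taken on the original price (first supposition)."""
    price = start
    for pct in changes:
        price += start * pct / 100
    return price

def on_current_base(start, changes):
    """Every percentage is taken on the price ruling at the time (second supposition)."""
    price = start
    for pct in changes:
        price = price * (100 + pct) / 100
    return price

first = on_original_base(100, changes)
second = on_current_base(100, changes)
print("First supposition: ", round(first, 1))
print("Second supposition:", round(second, 1))
```

The same four percentages thus lead to a final price of 110 on the first
basis and 104.5 on the second, which is why the basis must always be stated.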
Size of items. If the size of items is small percentages may give
misleading conclusions. If, for example, in a school out of 2 candidates
who appeared in an examination both pass, while in another school out of
200 appearing 190 pass, the percentage of success in the first case is
100 and in the second 95 only. On the basis of these percentages we will
not be correct in drawing the conclusion that the first school is better
than the second. Therefore, percentages should not be used if the size of
items is small, say, less than 100.
Precautions in the use of ratios. Due to the fallacies in the use of
percentages mentioned above, sometimes the use of ratios may be advocated
in place of the percentages. In the example given in the last paragraph
about the success of candidates in two schools, ratios would have given
a better picture. If the ratios of successful candidates to the total number
are to be expressed, they would be 2 : 2 and 190 : 200. The fallacy which
we observed by the use of percentages in this case is not found here, and
the ratios give a clear picture of the whole situation. However, ratios
should also be used with caution, particularly in comparing unrelated
and heterogeneous figures. If out of 1,000 persons in a locality 200 are
attacked by small-pox, the ratio of persons not attacked to those attacked
would be 800 : 200 or 4 : 1. If in another locality out of 1,000 persons
who were inoculated only 100 were attacked the ratio would be 900 : 100
or 9 : 1. These ratios, namely, 4 : 1 and 9 : 1 give the impression that
the second locality is healthier than the first, but it may not be so,
because a lesser incidence of small-pox in the second locality is most
probably due to inoculation. The conclusions thus would be fallacious.
Therefore, in cases where the data are not strictly homogeneous, ratios
should also be used with a great amount of caution.
Some popular Ratios used in Population Studies
Ratios, rates and coefficients are used in all types of studies. Income
per capita, population per square mile, production per acre, turnover
ratio, fixed-asset ratio, intelligence quotients, etc., are examples of
various popular ratios used. We give below some of the popular ratios
used in population studies:
Crude death and birth rate. Crude death rate for a locality is found by
dividing the total number of deaths in that area, during the year in
question, by the number of people living in that locality at the mid-point of
that year. It is usually expressed per thousand (or per mille).
Thus
$$\text{Crude death rate} = \frac{\text{No. of deaths in a locality in a year}}{\text{No. of people living at the mid-point of that year}} \times 1000$$
Crude birth rate is similarly found by dividing the total number
of births in a locality, during the year in question, by the total population
of the locality at the mid-point of the year.
Thus
$$\text{Crude birth rate} = \frac{\text{No. of births in a locality during a year}}{\text{No. of people living at the mid-point of that year}} \times 1000$$
Standardized death and birth rates. Crude death-rates and birth-rates
are usually not fit for comparing conditions in two or more localities.
The crude death-rate of two places may be identical and yet their mortality
patterns may be entirely dissimilar. Similarly, the fact that crude birth
rates of two places are identical does not necessarily mean that the fertility
pattern of the two places is similar. This is on account of the fact that
age composition of two populations may differ from each other. The
percentages of people in different age groups may not be similar in the
two populations, and if it is so, the comparisons are bound to be fallacious.
To remove these drawbacks death-rates and birth-rates are standardized.
In the calculation of standardized rates it is presumed that the age
composition of the two populations is identical. The difference between the
age compositions is eliminated with the technique of presuming a population
as normal or standard. Standardized death and birth-rates are calculated
by associating the crude death and birth-rate in different age groups with a
normal or standard population instead of the actual population of the
locality. The following illustration would clarify the point:
Example 1
Suppose we have to calculate the crude and standardized death rates
of two towns A and B. Their age composition and mortality patterns
are as follows : -
TABLE 9
Calculation of crude death-rates of towns A and B

                             Town A                      Town B
Age                  Popula-  No. of  Death-rate  Popula-  No. of  Death-rate
composition          tion     deaths  per 1000    tion     deaths  per 1000
Less than 5 years     3000     180      60         1500      75      50
5-20 years            5000     200      40         2200      55      25
20-50 years           4000     120      30         2800      56      20
Above 50 years        2000     140      70         2500     150      60
Total                14000     640      45.7       9000     336      37.3

Crude death-rate of Town A
$= \frac{640}{14000} \times 1000 = 45.7$
It can also be calculated as:
$\frac{(60 \times 3000) + (40 \times 5000) + (30 \times 4000) + (70 \times 2000)}{3000 + 5000 + 4000 + 2000} = \frac{6,40,000}{14,000} = 45.7$
Crude death-rate of Town B
$= \frac{336}{9000} \times 1000 = 37.3$
It can also be calculated as:
$\frac{(50 \times 1500) + (25 \times 2200) + (20 \times 2800) + (60 \times 2500)}{1500 + 2200 + 2800 + 2500} = \frac{3,36,000}{9,000} = 37.3$
As has been said earlier these rates cannot be directly compared.
To calculate the standardized death-rates we shall assume a standard
population. Supposing the age composition of the normal or standardized
population is as follows : -
TABLE 10
Standard Population

Age composition        Population
Less than 5 years         200
5-20 years                250
20-50 years               400
Above 50 years            150
Now the standardized death-rates of the two towns A and B would
be calculated as follows:
TABLE 11
Calculation of standardized death-rates of towns A and B

                    Standard   Death-rate  Col. 2 ×  Death-rate  Col. 2 ×
Age groups          popula-    of town A   Col. 3    of town B   Col. 5
                    tion
1                      2           3          4          5          6
Less than 5 years     200         60       12,000       50       10,000
5-20 years            250         40       10,000       25        6,250
20-50 years           400         30       12,000       20        8,000
Above 50 years        150         70       10,500       60        9,000
Total                1000                  44,500                33,250

Standardized death-rate of town A $= \frac{44500}{1000} = 44.5$
Standardized death-rate of town B $= \frac{33250}{1000} = 33.25$
We could have calculated the number of deaths on the basis of the
standard population by applying the original death rates and got the same
answers. For example, on the basis of a death rate of 60 per thousand,
the actual deaths in a population of 200 (in less than 5 years group for
town A) would have been 12. Similarly, for the next 3 groups in town
A, actual number of deaths would have been 10, 12 and 10.5 respectively.
The total number of deaths thus would have been (12+10+12+10.5) or
44.5. Since the total population is 1000 the death-rate also would have
been 44.5.
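
The standardization just described can be sketched in code. The following
Python fragment is an illustrative addition built on Tables 10 and 11; the
function name `standardized_rate` is an assumption introduced here.

```python
# Standard population by age group (Table 10) and the age-specific
# death-rates per 1000 of the two towns (Table 11).
standard_population = [200, 250, 400, 150]
rates_a = [60, 40, 30, 70]
rates_b = [50, 25, 20, 60]

def standardized_rate(standard_population, rates_per_1000):
    """Deaths expected in the standard population, per 1000 of that population."""
    expected_deaths = sum(p * r / 1000 for p, r in zip(standard_population, rates_per_1000))
    return 1000 * expected_deaths / sum(standard_population)

std_a = standardized_rate(standard_population, rates_a)
std_b = standardized_rate(standard_population, rates_b)
print("Town A:", std_a)
print("Town B:", std_b)
```

Both towns are now weighted by the same age composition, so the difference
between 44.5 and 33.25 reflects mortality alone.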
In the above example we have presumed a standard population and
on the basis of this population we have calculated the standardized death-
rates for towns A and B. These rates are comparable with each other
because in their calculation, the differences in the age composition of the
two populations have been eliminated. If, however, the standard popu-
lation was not known, the population of any of the two localities could
have been assumed as standard population, and on that basis, the death-
rates would have been calculated.
General fertility rate. The crude birth-rate, as we have seen, does
not take into account either the age composition or the sex ratio of the
population. In the standardization of the birth-rate also, the sex ratio
is ignored. In order to study the pattern of population growth it is
necessary to take both these factors into account because if two
populations have different sex ratios, and if their birth-rates are identical,
it is a definite proof that there is a difference in their fertility patterns.
If the birth-rates of the two populations are identical, then in the locality
in which the number of females is comparatively smaller, the number of births
per 1000 women is bound to be more than in the other locality, because only
then would the two birth-rates be the same. To remove this difficulty general
fertility rate is calculated. It is the ratio of the number of children born
to 1000 women of child-bearing ages.
Thus
$$\text{General fertility rate} = \frac{\text{Total number of births}}{\text{Total number of women in child-bearing age group (15-45)}} \times 1000$$
Specific fertility rates. If a more detailed study of fertility is
necessary the child-bearing age group can be divided in a number of smaller
groups. Fertility in the first half of the child-bearing age group (15-30)
is more than in the second half (30-45). So, for a more detailed study,
fertility rates for specific age groups or individual ages, within the
child-bearing age group, are calculated. They are called specific fertility
rates. Here the total number of births to women within a particular age group
is divided by the total number of women in that age group. Thus for
women in the 20-25 age group, the specific fertility rate would be
$$\text{Specific fertility rate for women in 20-25 age group} = \frac{\text{Total number of children born to women in 20-25 age group}}{\text{Total number of women in 20-25 age group}} \times 1000$$
If the specific fertility rates for various ages are totalled together,
the result is called total fertility rate.
Gross reproduction rate. For a study of the population growth, the
calculation of general fertility rate or even specific fertility rates is not
enough, because in their calculation two important factors, namely the
sex of children born and mortality, are not taken into account. These
factors are considered in the calculation of reproduction rates. In gross
reproduction rate mortality is not taken into account, only the sex factor
is considered, but in the calculation of net reproduction rate both these
things are accounted for. Since children can be born only by women and
since in their case it is easy to lay down the limits of fertility period,
reproduction rates are calculated by taking into account the mothers and
female children only. Now-a-days, however, male reproduction rates and
even combined reproduction rates (for males and females combined) are also
calculated.
Female gross reproduction rate tells us about the number of female
children expected to be born to 1000 newly born females, during their
life-time, on the assumption that none of these 1000 newly born females
would die before crossing the upper limit of child-bearing age period and
further that the current fertility rates would continue to remain unchanged
during the whole of this period. Thus if 1000 newly born female children
remain 1000 till the age of 45 (which is the upper limit of child-bearing
age period) or in other words, if there is no mortality in this group till
the age of 45, and if during this period, on the basis of current fertility
rates, they give birth to 2412 female children, the female gross
reproduction rate would be 2.412. Reproduction rates are generally expressed
in terms of unity. It means that on the assumptions mentioned above for
each mother at the present moment there would be 2.412 mothers in
future.
Thus
$$\text{Female gross reproduction rate} = \frac{\text{Number of female children expected to be born to 1000 newly born females on the basis of current fertility without mortality}}{1000}$$
Net reproduction rate. As has been noted above the gross reproduction
rate does not take into account the factor of mortality. The
net reproduction rate takes into account this factor also. Female net
reproduction rate tells us about the number of female children expected
to be born to 1000 newly born females on the basis of current fertility
and mortality rates. It is quite obvious that net reproduction rate would
be less than the gross reproduction rate. 1000 newly born females would
in actual practice not remain 1000 at the age of say 16. Some of them
would die. Supposing their number is reduced from 1000 to 800 and
suppose further that the current fertility rate for the age of 16 is 20 per
1000, then the total number of children born to them would not be 20
but 16 only. If the sex ratio is 50 : 50 then only 8 female children would
be taken into account for the calculation of female net reproduction
rate. In the calculation of gross reproduction rate 10 female children
would have been taken into account. In each age group of women in the
child-bearing age period, the number would go on declining due to
mortality and the number of children born would also be reduced. If suppose
the total of 2,412 children (presumed in the calculation of gross
reproduction rate) comes down to 1,411, female net reproduction rate would be
1.411. It shows that for every present mother there would be 1.411
future mothers or, in other words, the population is growing. If net
reproduction rate is just 1 it indicates a stationary population in future
and if it is less than 1 it is a sign of declining population.
Thus
$$\text{Female net reproduction rate} = \frac{\text{Number of female children expected to be born to 1000 newly born females on the basis of current fertility and mortality rates}}{1000}$$
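
The single-age illustration in the paragraph above (age 16, fertility rate
of 20 per 1000, sex ratio 50 : 50) can be traced in code. The Python sketch
below is an illustrative addition; the function names and the
`female_share` parameter are assumptions introduced here, and only the
figures quoted in the text are used.

```python
def female_births(women, fertility_per_1000, female_share):
    """Female children born to a given number of women at one age."""
    return women * fertility_per_1000 / 1000 * female_share

# Gross rate: all 1000 newly born females are assumed alive at age 16.
gross_at_16 = female_births(1000, 20, 0.5)
# Net rate: only the 800 survivors are counted.
net_at_16 = female_births(800, 20, 0.5)

def reproduction_rate(total_female_children, cohort=1000):
    """Total female children over the child-bearing period, per newly born female."""
    return total_female_children / cohort

def interpret(net_rate):
    """Reading of the net reproduction rate given in the text."""
    if net_rate > 1:
        return "growing"
    if net_rate == 1:
        return "stationary"
    return "declining"

print(gross_at_16, net_at_16)
print(reproduction_rate(2412), reproduction_rate(1411), interpret(1.411))
```

Summing such contributions over every age of the child-bearing period gives
the totals of 2,412 and 1,411 assumed in the text.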
In the same way male reproduction rates can be calculated by taking
into account the fathers and the number of male children expected to be
born. Combined reproduction rates for males and females can be
calculated by taking into account the population (both males and females)
and the number of children (both males and females) expected to be born.

LOGARITHMS

Like ratios and percentages, logarithms also help in making relative
studies. Logarithms are short-cuts in mathematical calculation. With
their help multiplication, division, roots and powers of big and small
numbers can be easily calculated.
The common system of logs. (short form of logarithms) is based on
10. Log. of a number is the exponent to which 10 is raised to be just equal
to that number. The following example would clarify this point : -
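1,00,000 = $10^{5}$, therefore the logarithm of 1,00,000 = 5
10,000 = $10^{4}$, therefore the logarithm of 10,000 = 4
1,000 = $10^{3}$, therefore the logarithm of 1,000 = 3
100 = $10^{2}$, therefore the logarithm of 100 = 2
10 = $10^{1}$, therefore the logarithm of 10 = 1
1 = $10^{0}$, therefore the logarithm of 1 = 0
.1 = $10^{-1}$, therefore the logarithm of .1 = -1
.01 = $10^{-2}$, therefore the logarithm of .01 = -2
.001 = $10^{-3}$, therefore the logarithm of .001 = -3
.0001 = $10^{-4}$, therefore the logarithm of .0001 = -4
.00001 = $10^{-5}$, therefore the logarithm of .00001 = -5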
The logarithms of the above figures are all integral numbers. The
logarithm of 10 is 1 and of 100 is 2. For all numbers more than 10 and
less than 100 the logs. would be between 1 and 2. Similarly the logs.
of numbers more than .01 and less than .1 would be between -2 and -1.
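
The definition can be tried out directly. The short Python sketch below is
an illustrative addition that uses the standard-library function
`math.log10`; the numbers 47 and .047 are arbitrary choices made here.

```python
import math

# Logarithms of exact powers of ten are whole numbers.
powers = {100000: 5, 1000: 3, 10: 1, 1: 0, 0.1: -1, 0.001: -3}
for number, expected in powers.items():
    print(number, round(math.log10(number), 10), expected)

# A number between 10 and 100 has a logarithm between 1 and 2 ...
log_47 = math.log10(47)
# ... and a number between .01 and .1 has one between -2 and -1.
log_small = math.log10(0.047)
print(round(log_47, 4), round(log_small, 4))
```
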
Characteristic and mantissa. Thus leaving aside numbers like 10,
100, 1000, etc., for all other numbers the logs. would consist of an integral
part and a fraction. The log. of a number thus consists of two parts, namely:
(a) An integral number known as characteristic which can be
either positive or negative.
(b) A fractional part known as mantissa which is always posi-
tive.
Rules for finding out characteristic. There are two rules for finding out
the characteristic of a number:
(1) The characteristic of all numbers more than 1 is equal to one
less than the number of digits to the left of the decimal place. Thus
the characteristic of 214.43 is 2 as the number of digits to the left of
the decimal place is three. Similarly the characteristic of 48297.3 is 4
and that of 11.2 is 1 and of 7 is 0. The characteristic of 1 is also 0.
(2) The characteristic of all numbers less than 1 is negative and is
numerically equal to one more than the number of zeros after the decimal
point and before any significant digit. Thus the characteristic of .003801
is -3 as the number of zeros after the decimal point and before a significant
digit is 2. Similarly the characteristic of .0102 is -2, of .00012 is -4 and
of .182 is -1.
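
The two counting rules can be written out as a small routine. The Python
sketch below is an illustrative addition; the function `characteristic`
is an assumption introduced here and takes the number as text so that the
digits can be counted exactly as the rules describe.

```python
import math

def characteristic(number_text):
    """Characteristic of a positive decimal number, found by counting digits."""
    whole, _, fraction = number_text.partition(".")
    whole = whole.lstrip("0")
    if whole:
        # Rule 1: one less than the number of digits left of the decimal point.
        return len(whole) - 1
    leading_zeros = len(fraction) - len(fraction.lstrip("0"))
    if leading_zeros == len(fraction):
        raise ValueError("the logarithm of zero is not defined")
    # Rule 2: negative, one more than the zeros before the first significant digit.
    return -(leading_zeros + 1)

examples = ["214.43", "48297.3", "11.2", "7", "1", ".003801", ".0102", ".00012", ".182"]
for text in examples:
    # The integral part of the true logarithm gives the same answer.
    print(text, characteristic(text), math.floor(math.log10(float(text))))
```
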
Rules for finding out mantissa. Mantissa of a number is seen from
the log. tables. At the end of this book there is a three-figure log.
table. Log. tables can be of 4, 5, 6 or even more figures. There are two
things which should be remembered about mantissa:
(a) Mantissa is always positive.
(b) Mantissa is not affected by the position of the decimal point.
The mantissa of 785, 78.5, 7.85, .785, .0785 and .00785 would be the
same. Looking at the log. table we find that the mantissa of all these
figures is .8949. Since, in numbers less than 1, the characteristic is
negative and mantissa is positive, the minus sign is not written before the
log. but on the top of the characteristic; thus if the characteristic is -2
and the mantissa is .8949 the log. would be written as $\bar{2}.8949$ and not
as -2.8949.
Finding out logarithm. Thus to find out the log. of a number we
should first write down the characteristic in accordance with the above
rules and then should consult the log. tables and write down the mantissa.
The log. tables given at the end of this book are only 3-figure tables, and
as such, figures with more than 3 digits should be first approximated to 3
digits, and then these tables should be consulted. The following
illustrations would clarify these points:
log. 6789.5 = 3.8319
log. 678.95 = 2.8319
log. 67.895 = 1.8319
log. 6.7895 = 0.8319
log. .67895 = $\bar{1}.8319$
log. .067895 = $\bar{2}.8319$
log. .0067895 = $\bar{3}.8319$
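
That the mantissa does not depend on the position of the decimal point can
be confirmed by computation. The Python sketch below is an illustrative
addition; the function `split_log` is an assumption introduced here, and it
computes the logarithm with `math.log10` instead of reading it from tables.

```python
import math

def split_log(number):
    """Return (characteristic, mantissa) with the mantissa always positive."""
    log = math.log10(number)
    characteristic = math.floor(log)
    mantissa = log - characteristic
    return characteristic, mantissa

# The same digits 7, 8, 5 with the decimal point in different places.
for number in [785, 78.5, 7.85, 0.785, 0.0785, 0.00785]:
    characteristic, mantissa = split_log(number)
    # A negative characteristic corresponds to the "bar" notation of the text.
    print(number, characteristic, round(mantissa, 4))
```

Only the characteristic changes from line to line; the mantissa, rounded to
four figures, stays at .8949 throughout.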
Anti-logarithms. Just as with the help of log. tables it is possible
to find out the log. of a number, similarly by the use of anti-log. tables
numbers can be found out from their logs. To find out a number from
its log. first only mantissa is taken into account. In the anti-log. tables
we can look up the number against the figures of mantissa. After this
the position of decimal point is decided by taking into account the
characteristic. Thus, if we have to find out the number whose log. is 2.874 we
shall read the number in anti-log. table. The number against the mantissa
of .874 (.87 at the margin and 4 at the top in 3-figure tables as given at the
end of this book) is 7482. The characteristic of the number is 2, therefore,
there should be 3 digits in the number before the decimal point. Accordingly
we place the decimal point after 8 and the number whose log. is 2.874 is
748.2. Similarly the anti-log. of $\bar{2}.874$ would be .07482; since the
characteristic is -2 the number of zeros after the decimal point and before
a significant digit would be one.
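
The anti-logarithm is simply 10 raised to the given logarithm. The Python
sketch below is an illustrative addition; the function `antilog` is an
assumption introduced here and takes the characteristic and mantissa
separately so that the "bar" form can be handled.

```python
def antilog(characteristic, mantissa):
    """Number whose log. has the given characteristic and positive mantissa."""
    return 10 ** (characteristic + mantissa)

# log. 2.874: characteristic 2, mantissa .874
print(round(antilog(2, 0.874), 1))
# log. "bar 2".874: characteristic -2, mantissa .874
print(round(antilog(-2, 0.874), 5))
```

The digits are the same in both cases; the characteristic only fixes the
place of the decimal point.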
Computation by logarithms
To multiply numbers. To multiply numbers find out their logs., add
them together and find out the anti-log.
Thus $a \times b$ = Anti-log. (log. a + log. b)
Example I
Multiply 64.7 with 29.8
(a) log. 64.7 = 1.8109
(b) log. 29.8 = 1.4742
log. a + log. b = 3.2851
Anti-log. 3.2851 = 1928
∴ 64.7 × 29.8 = 1928
Example II
Multiply 49.3 with .0842
(a) log. 49.3 = 1.6928
(b) log. .0842 = $\bar{2}.9253$
log. a + log. b = 0.6181
Anti-log. of 0.6181 = 4.150
∴ 49.3 × .0842 = 4.150
Note. Whatever is carried forward from mantissa to characteristic
is positive and in the addition of characteristics, plus and minus signs
are taken into account. In the above example, one is carried forward
from the mantissa to the characteristic; it is positive and when it is added
to the characteristic of the first number it becomes +2; the characteristic
of the second number is -2 and so the total of the characteristics is 0.
Example III
Multiply .0842 and .00741
(a) log. .0842 = $\bar{2}.9253$
(b) log. .00741 = $\bar{3}.8698$
log. a + log. b = $\bar{4}.7951$
Anti-log. $\bar{4}.7951$ = .0006237
∴ .0842 × .00741 = .0006237
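
The rule for multiplication, and the carrying described in the Note, can be
sketched in code. The Python fragment below is an illustrative addition;
`multiply_by_logs` and `add_bar_logs` are names assumed here, and logs. are
represented as (characteristic, mantissa) pairs taken from the examples.

```python
import math

def multiply_by_logs(a, b):
    """a x b found as the anti-log. of (log. a + log. b)."""
    return 10 ** (math.log10(a) + math.log10(b))

def add_bar_logs(log1, log2):
    """Add two logs. given as (characteristic, positive mantissa) pairs."""
    (c1, m1), (c2, m2) = log1, log2
    mantissa_sum = m1 + m2
    carry = int(mantissa_sum)  # whatever is carried from the mantissa is positive
    return c1 + c2 + carry, mantissa_sum - carry

print(round(multiply_by_logs(64.7, 29.8), 2))
# Example II: 1.6928 + "bar 2".9253
print(add_bar_logs((1, 0.6928), (-2, 0.9253)))
# Example III: "bar 2".9253 + "bar 3".8698
print(add_bar_logs((-2, 0.9253), (-3, 0.8698)))
```

The computed product differs slightly from the tabular answer of 1928 only
because the tables are rounded to a few figures.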
To divide numbers. To divide one number by another, find out the
log. of the dividend and from it subtract the log. of the divisor. Find out
the anti-log. of this difference. It will be the required answer.
Thus
$\frac{a}{b}$ = Anti-log. (log. a - log. b)
Example I
Divide 1928.1 by 29.8
(a) log. 1928.1 = 3.2856
(b) log. 29.8 = 1.4742
log. a - log. b = 1.8114
Anti-log. 1.8114 = 64.71
∴ 1928.1 ÷ 29.8 = 64.71
Example II
Divide .0009 by .008
(a) log. .0009 = $\bar{4}.9542$
(b) log. .008 = $\bar{3}.9031$
log. a - log. b = $\bar{1}.0511$
Anti-log. $\bar{1}.0511$ = .1125
∴ .0009 ÷ .008 = .1125
To raise a number to a power. In order to raise a number to a power
multiply the log. of the number by the exponent of the power and find
out the anti-log.
Thus $a^{n}$ = Anti-log. (n × log. a)
Example I
Find out the value of $(95.2)^{4}$
log. 95.2 = 1.9786
1.9786 × 4 = 7.9144
Anti-log. 7.9144 = 82040000
∴ $(95.2)^{4}$ = 82040000
Example II
Find out the cube of .0991
log. of .0991 = $\bar{2}.9961$
$\bar{2}.9961$ × 3 = $\bar{4}.9883$
Anti-log. of $\bar{4}.9883$ = .0009727
∴ $(.0991)^{3}$ = .0009727
Note. In the second example above 2 which is carried forward
from the mantissa to the characteristic is subtracted from the product
of 3 and $\bar{2}$ and thus the characteristic of the product is $\bar{4}$.
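
The rules for division and for powers may be sketched in the same manner.
The Python fragment below is an illustrative addition; the function names
are assumptions introduced here, and logs. in "bar" form are again written
as (characteristic, mantissa) pairs taken from the examples.

```python
import math

def divide_by_logs(a, b):
    """a / b found as the anti-log. of (log. a - log. b)."""
    return 10 ** (math.log10(a) - math.log10(b))

def power_by_logs(a, n):
    """a raised to the power n, found as the anti-log. of (n x log. a)."""
    return 10 ** (n * math.log10(a))

def subtract_bar_logs(log1, log2):
    """Subtract logs. given as (characteristic, positive mantissa) pairs."""
    (c1, m1), (c2, m2) = log1, log2
    characteristic, mantissa = c1 - c2, m1 - m2
    if mantissa < 0:  # borrow one from the characteristic
        characteristic, mantissa = characteristic - 1, mantissa + 1
    return characteristic, mantissa

def scale_bar_log(characteristic, mantissa, n):
    """Multiply a log. in (characteristic, mantissa) form by a whole number n."""
    product = mantissa * n
    carry = int(product)  # the carry from the mantissa is positive
    return characteristic * n + carry, product - carry

print(divide_by_logs(0.0009, 0.008), power_by_logs(0.0991, 3))
print(subtract_bar_logs((3, 0.2856), (1, 0.4742)))
print(subtract_bar_logs((-4, 0.9542), (-3, 0.9031)))
print(scale_bar_log(-2, 0.9961, 3))
```

Answers obtained from short tables, such as 82040000 for the fourth power of
95.2, agree with the exact values only to the first few figures.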
To extract the root of a number. To extract the root of a number
divide the log. of the number by the index of the root and find out the
anti-log.
Thus
$\sqrt[n]{a}$ = Anti-log. $\left(\frac{\text{log. } a}{n}\right)$
Example I
Find out the value of $\sqrt[3]{92.4}$
log. 92.4 = 1.9657
Divided by 3 = .6552
Anti-log. .6552 = 4.519
∴ $\sqrt[3]{92.4}$ = 4.519

Example II
Find out the value of $\sqrt[7]{.00487}$
log. .00487 = $\bar{3}.6875$
To divide $\bar{3}.6875$ by 7 we shall have to write it as $\bar{7}$ + 4.6875
because in $\bar{3}.6875$ the characteristic is negative and the mantissa is
positive and division is not possible with the figures as they are.
So
($\bar{7}$ + 4.6875) ÷ 7 = $\bar{1}.6696$
Anti-log. $\bar{1}.6696$ = .467 (approximately)
∴ $\sqrt[7]{.00487}$ = .467 (approximately)
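
The device of rewriting a negative characteristic before dividing can be set
out as a short routine. The Python sketch below is an illustrative addition;
the function `divide_bar_log` is an assumption introduced here and works on
(characteristic, mantissa) pairs.

```python
def divide_bar_log(characteristic, mantissa, n):
    """Divide a log. in (characteristic, positive mantissa) form by n."""
    # Lower the characteristic to the nearest multiple of n and add the
    # amount borrowed to the mantissa, as in writing bar-3.6875 as bar-7 + 4.6875.
    borrow = characteristic % n
    return (characteristic - borrow) // n, (mantissa + borrow) / n

# Cube root of 92.4: log. is 1.9657
c1, m1 = divide_bar_log(1, 0.9657, 3)
# Seventh root of .00487: log. is "bar 3".6875
c2, m2 = divide_bar_log(-3, 0.6875, 7)

cube_root = 10 ** (c1 + m1)
seventh_root = 10 ** (c2 + m2)
print(c1, round(m1, 4), round(cube_root, 3))
print(c2, round(m2, 4), round(seventh_root, 3))
```
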
The utility of logarithms is very great in statistical calculations.
As has been said in the beginning, they help us in studying
proportionate changes. 10 to 100 is the same degree of relative change as 100
to 1000. In absolute figures these changes are different but if we find
out their logarithms, they would be 1 and 2 (for 10 and 100 respectively)
and 2 and 3 (for 100 and 1000 respectively) indicating that the relative
changes in the two cases are identical.
Questions
1. Define a statistical derivative and discuss its utility in statistical analysis.
2. What is meant by co-ordinate and subordinate derivatives? Illustrate with
examples.
3. What precautions are necessary in the use of ratios and percentages?
4. What do you understand by a crude birth-rate? Is it an accurate
measurement of the population growth of a locality? If not, how can it be
modified to give better results?
5. What is a "standard population"? How are birth rates and death rates
standardized?
6. What do you understand by general fertility rate? Is it an improvement
over standardized birth rates?
7. What statistical data are necessary for the calculation of net-reproduction-
rate? What is the deficiency in the existing Indian data in this respect?
(M. A. Alld., 1951).
8. What is net-reproduction-rate? Explain with the help of an example the
method of calculating it.
9. What are the various ways of the measurement of population growth? In
this connection discuss in detail the calculation of net-reproduction-rate.
(M. Com. Allahabad, 1952).
10. Point out the ambiguity or mistake, if any, in the following statements:
(a) 99% of the people who drink, die before reaching 100 years of age.
Therefore, drinking is bad for longevity.
(b) The rate of increase in the number of cows in India is greater than the
population. Therefore, the people of India are now getting more milk per head.
(M. A. Patna, 1943).

11. Below is given the fertility rate for 1000 women, by their age group for a
certain country for 19;6 : -

Age group      Fertility rate per     Age group      Fertility rate per
(years)        1000 women             (years)        1000 women
16-20               19                36-40               157
21-25              173                41-45                67
26-30          (illegible)            46-50                 9
31-35              201

Assuming that ratio of female babies to total births for tbe country and year
concerned is 48.8%. calculate the gross-reproduction-rate for the country and explain
what this rate means.
12. The following are the death-rates, per thousand, per annum, of two towns
in a certain year:

                          Town A                          Town B
Ages           Popula-   Deaths  Death-rate    Popula-   Deaths  Death-rate
(years)        tion              per 1000      tion              per 1000
Under 2          3000     192      64.0          5000     300      60.0
2-10            10000      70       7.0         12000      78       6.5
10-20           10000      40       4.0         10000      38       3.8
20-60           32500     260       8.0         25000     190       7.6
60 & over        8500     510      60.0          8000     460      57.5
All ages        64000    1072      16.75        60000    1066      17.77

(a) For each age group the death-rate of town A is greater than that of
town B but the reverse is the case when all age-groups are grouped together. Why
is it so?
(b) Calculate the standardized death-rate for town B taking the population
of town A as the standard. (B. Com. Andhra, 1944), (M. A. Punjab, 1954).
13. Compute crude and standardized death-rates in the following and find out
if the local population has a higher or lower death-rate:

Standard population Local population


Age group
Years
Population Deaths Population Deaths
-
Under 5 500 15 0 5000 60
5-15 I 2.000 14 15000 30
15-6s 2.0000 60 %0000

I
80
Above 6) 8000 320 8000 400

14. What are absolute and relative measurements? Explain in this connection
the use of ratios, percentages and co-efficients. (B. Com., 1941).
15. Write short notes on: (a) Derivative series. (b) Complex derivatives.
(c) Total fertility rate. (d) Male reproduction rate. (e) Fallacies in the use of
ratios and percentages.

A B
......__-----I-----I----------
No. of ClDcUdateI Suc:ceuful No. 0( c:aocU- Sac:cellCuJ
appeared data appeared
M. Sc.
M. A.
60 ,0
90
zoo
Z40
160
190

-_.I. ~---1---~~~-- -~-- _.__


100
Sc:. .fOO Joo ZOO 140

160
_____ :'0:. __
ToU! 800 '90 800 '90
(II y,.,.. T. D. C•• R4.. 1961).
17. The following table gives the result of certain examinations of three
universities in the year 19'7. Which is the best university?

M.A.
---------r=- Percentages resultlln the otliveftlty
A B C
----- ... ----------
7
, ----- 0
M.Sc:.

I
70
B.A. 80
B.Bc. 70
B.Com. 60

(M. A. c.Jmtlti)
Measures of Central
Tendency 9
Need and meaning
We have discussed in the last chapter the utility of various statistical
derivatives like ratios, percentages and rates, etc., in reducing the quantum
of data and also in reducing the size of the figures. But these derivatives
are not enough for the proper condensation of figures and sometimes
there are many fallacies in their use. Condensation of data is necessary
in statistical analysis because a large number of big figures are not only
confusing to the mind but difficult to analyse also. In order to reduce the
complexity of data and to make them comparable it is essential that the various
phenomena which are being compared are reduced to one figure each.
If, for example, a comparison is made between the marks obtained by a
group of 200 students belonging to a university and the marks obtained
by another group of 200 students belonging to another university, it
would be impossible to arrive at any conclusion, if the two series relating
to these marks are directly compared. On the other hand, if each of these
series is represented by one figure, comparison would be an extremely
easy affair. It is obvious that a figure which is used to represent a whole
series should neither have the lowest value in the series nor the highest
value, but a value somewhere between these two limits, possibly in the
centre, where most of the items of the series cluster. Such figures are
called Measures of Central Tendency or Averages. An average represents
a whole series and as such, its value always lies between the minimum
and maximum values and generally it is located in the centre or middle
of the distribution.
Objects. Measures of central tendency or averages give a bird's eye
view of the huge mass of statistical data which ordinarily are not easily
intelligible. They are devices to aid the human mind in grasping the true
significance of large aggregates of facts and measurements. They set aside the
unnecessary details of the data and put forward a concise picture of the
complex phenomena under investigation. If the human mind was capable
of grasping all the details of large numbers and their interrelationships,
averages would have no utility. But the human mind is not capable of
this. It is impossible to keep in mind, say, the details of heights, weights,
incomes and expenditures of even 200 students, what to talk of big figures.
This difficulty of keeping all the details in mind necessitates the use of
averages not only for grasping the central theme of a data, but also for the
facility of comparison and further analysis. Averages are thus extremely
helpful for purposes of comparison.
Why is an average a representative. The reason why an average is
a valid representative of a series lies in the fact that ordinarily most of the
items of a series cluster in the middle. On the extreme ends the number
of items is very small. In a population of 10,000 adults there would
hardly be any person who is 2 ft. high or whose height is above 8 ft.
There will be a small range within which these values would vary,
say 5 ft. to 6' 5". Even within this range a large number of persons
would have a height between, say, 5' 5" and 5' 10". In other class intervals
of height the number of persons would be comparatively small. Under
such circumstances if we conclude that the height of this particular
group of persons would be represented by, say, 5' 7", we can reasonably
be sure that this figure would, for all practical purposes, give us a
satisfactory conclusion. This average would satisfactorily represent
the whole group of figures from which it has been calculated. Ordinarily,
items with values less than the average cancel the items whose values
are more than the average. Thus the average of 3, 4 and 5 is 4. The
item before it is one less in value and the item after it is one more in
value than the average figure of 4. Thus the two deviations of -1
and +1 cancel each other.
Typical and descriptive averages. It should, however, be noted
that a series can be represented by an average only if the average is
really typical. Sometimes the average which is calculated is not truly
representative of the series. In such cases it should not be used to
represent the series. Averages which are representative are called
Typical Averages and those which are not representative and have only
a theoretical value are called Descriptive Averages.
Characteristics of a representative average. In whatever way we define
an average it is necessary to keep in mind the fact that an average is
a particular value in a variable and as such it has to be expressed in the
same unit in which the series is. If the variable refers to the weights
of students in pounds the average would also be weight and in pounds.
Similarly the average of ratios and percentages should be in ratios and
percentages only. Averages are meant for condensing a frequency
distribution in one figure and it is necessary that they are in the same
unit in which the original series is. At this stage, it is necessary to decide
about the desiderata or the requirements for a good measure of central
tendency. A typical average should possess the following
characteristics :-
(a) It should be rigidly defined. If an average is left to the estimation
of an observer and if it is not a definite and fixed value it cannot be
representative of a series. The bias of the investigator in such cases
would considerably affect the value of the average. If the average is
rigidly defined this instability in its value would be no more, and it
would always be a definite figure.
(b) It should be based on all the observations of the series. If some
of the items of the series are not taken into account in its calculation
the average cannot be said to be a representative one. As we shall
see later on there are some averages which do not take into account
all the values of a group and to this extent they are not satisfactory
averages.
(c) It should be capable of further algebraic treatment. If an average
does not possess this quality, its use is bound to be very limited. It
will not be possible to calculate, say, the combined average of two or
more series from their individual averages; further it will not be possible
to study the average relationships of various parts of a variable, if it is
expressed as the sum of two or more variables. Many other similar
studies would not be possible if the average is not capable of further
algebraic treatment.
(d) It should be easy to calculate and simple to follow. If the
calculation of the average involves tedious mathematical processes it will
not be readily understood and its use will be confined only to a limited
number of persons. It can never be a popular average. As such,
one of the qualities of a good average is that it should not be too abstract
or mathematical and there should be no difficulty in its calculation.
Further, the properties of the average should be such that they can be
easily understood by persons of ordinary intelligence.
(e) It should not be affected by fluctuations of sampling. If two
independent sample studies are made in any particular field, the averages
thus obtained should not materially differ from each other. No doubt,
when two separate enquiries are made, there is bound to be a difference
in the average values calculated but in some cases this difference would
be great while in others comparatively less. Those averages in which
this difference, which is technically called "fluctuation of sampling",
is less, are considered better than those in which this difference is
more.
One more thing to be remembered about averages is that the items
whose average is being calculated should form a homogeneous group. It is absurd
to talk about the average of a man's height and his weight. If the data
from which an average is being calculated are not homogeneous,
misleading conclusions are likely to be drawn. If, in finding out the average
production of cotton cloth per mill, big and small mills are not
separated, the average would be unrepresentative. Similarly, to study the wage
level in the cotton mill industry of India, separate averages should be
calculated for the male and female workers. Again, adult workers should
be studied separately from the juvenile group. Thus we see that as far
as possible, the data from which an average is calculated should be a
homogeneous lot. Homogeneity can be achieved either by selecting
only like items or by dividing the heterogeneous data into a number
of homogeneous groups.
Measures of various orders
Statistical series may differ from each other in the following three
ways :-
1. They may differ in the values of the variable round which
most of the items cluster.
2. They may differ in the extent to which items are dispersed
round the central value.
3. They may differ in the extent of departure from a normal
distribution.
Accordingly there are three measures designed to study the
above differences. They are respectively known as :-
1. Measures of the first order or measures of central tendency or
averages.
2. Measures of the second order, or measures of dispersion.
3. Measures of the third order or skewness, kurtosis, etc.
We shall study all these measures. In the present chapter, a study
of the measures of the first order, or measures of central tendency is
being made. Measures of the second and third order would be studied
in the next two chapters.
Types of Averages. Measures of central tendency or averages are
usually of the following types :-
(a) Mathematical Averages
1. Arithmetic Average or Mean.
2. Geometric Mean.
3. Harmonic Mean.
(b) Averages of Position
1. Median
2. Mode
Besides these there are less important averages like Quadratic Mean.
There are also some averages which are mostly calculated by using
the technique of the arithmetic average in a modified form. Examples
of such averages are Moving average and Progressive average. Both of them
are used in the analysis of commercial statistics and their utility in the
analysis of a time series is very great.
Of the above mentioned five important averages Arithmetic
Average, Median and Mode are the most popular ones. Geometric
mean and Harmonic mean come next. We shall study them in this
very order.

ARITHMETIC AVERAGE

Arithmetic Average or Mean of a series is the figure obtained by
dividing the total values of the various items by their number. If the heights
of a group of eleven persons are 64", 69", 63", 60", 65", 68", 62", 67",
70", 66" and 61", then to find the arithmetic average of the height of
these persons we shall add these figures and divide the total so obtained
by the number of items which is 11. The total of the items in this case
is 715" and if it is divided by 11 we get the figure of 65". This is the
mean or arithmetic average of the series.
Calculation of the arithmetic average in a series of individual
observations
Direct Method. As has been said above the simple arithmetic
average of a series is equal to the sum of the values of the variable divided by their
number. This method can be expressed in the shape of a mathematical
formula.
Suppose the values of a variable are respectively $m_1, m_2, m_3, \dots, m_n$
and their arithmetic average is represented by $a$, then

$$a = \frac{1}{n}(m_1 + m_2 + m_3 + \dots + m_n)$$

or $$a = \frac{1}{n}\sum m$$ or $$a = \frac{\sum m}{n}$$

Where
a = Arithmetic average; m = Values of the variable; Σ = Summation
or total; n = Number of items.
The following example would illustrate this formula.
Example 1. Calculate the simple arithmetic average of the
following items :
Size of items
20      50      72
28      53      74
34      54      75
39      59      78
42      64      79
Solution. Direct Method
Computation of arithmetic average
Size of items
(m)
20
28
34
39
42
50
53
54
59
64
72
74
75
78
79
n = 15          Σm = 821

Arithmetic average or $a = \frac{\sum m}{n}$, where $\sum m$ represents
the summation of measurements and n the number of items,

$$a = \frac{821}{15} = 54.73$$

Arithmetic average of the series = 54.73
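The direct method of Example 1 can be sketched in a few lines of Python. This is only an illustration of the formula a = Σm / n; the names `items` and `arithmetic_mean` are chosen here for convenience and do not come from the text.

```python
from fractions import Fraction

# Items of Example 1, as listed in the text.
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]


def arithmetic_mean(values):
    """Direct method: total of the items divided by their number."""
    return Fraction(sum(values), len(values))


total = sum(items)           # sum of m
n = len(items)               # number of items
a = arithmetic_mean(items)   # exact value, kept as a fraction

print(total, n, round(float(a), 2))
```

A fraction is used so that the quotient is exact until the final rounding to two decimal places.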


Short-cut Method. The above method of the calculation of
arithmetic average is convenient only when the items are few and the size
of the figures is small. If it is not so, there would be considerable
difficulty in the calculation of the arithmetic average.
To remove this difficulty a short-cut method is used. The method
is based on an important property of the arithmetic average, which is
that the algebraic sum of the deviations of a series of individual observations
from their mean is always equal to zero. Thus the arithmetic average of
4, 6, 8, 10 and 12 is equal to 8. If the difference of each of these items
from the mean is calculated it would be -4, -2, 0, +2, +4. Their total
is zero. This will always be so. This can be easily proved.*
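The property can be checked numerically for the five items quoted above. The following is a small sketch; the variable names are illustrative only.

```python
# The five items used in the text to illustrate the property.
values = [4, 6, 8, 10, 12]

mean = sum(values) / len(values)          # arithmetic average
deviations = [v - mean for v in values]   # each item minus the mean

print(mean, deviations, sum(deviations))
```

The deviations on either side of the mean offset each other exactly, which is what allows an assumed mean to be corrected afterwards.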
This being so we can assume any arbitrary mean and find out the
deviations of items from this assumed mean. The total of these
deviations will not, in general, be zero. If this total is divided by the number of items
and added to the assumed average we shall get the actual arithmetic
average.

* Proof. Suppose $m_1, m_2, m_3$, etc., stand for the values of a
variable and $d_1, d_2, d_3$, etc., for their respective deviations from the
mean, and suppose $a$ stands for their arithmetic average and $n$ for the number
of items.
Then,
$$a = \frac{m_1 + m_2 + m_3 + \dots + m_n}{n}$$
$$m_1 + m_2 + m_3 + \dots + m_n = an$$
The number of items is equal to n.
∴ If we subtract a, n times, from each side of the equation we
get
$$(m_1 - a) + (m_2 - a) + (m_3 - a) + \dots + (m_n - a) = 0$$
But
$(m_1 - a) = d_1$, $(m_2 - a) = d_2$, $(m_3 - a) = d_3$ and so on.
∴ $d_1 + d_2 + d_3 + \dots + d_n = 0$
Or $\sum d = 0$
Symbolically:
$$a = x + \frac{\sum dx}{n}$$
Where
a = Actual arithmetic average; x = Assumed arithmetic average;
Σdx = The sum of the deviations from the assumed mean; n = Number
of items.
It should be remembered that the difference between the actual
arithmetic average and the assumed arithmetic average is equal to the
sum of the deviations from the assumed arithmetic average divided by
the number of items.
Symbolically:
$$a - x = \frac{\sum dx}{n}$$
If we solve example No. 1 by this short-cut method it will give us
exactly the same answer as we got by the direct method. This alternative
method is illustrated below:-
Calculation of arithmetic average
Short-cut method
Size of items      Deviation from an assumed mean (50)
(m)                (dx)
20                 -30
28                 -22
34                 -16
39                 -11
42                 -8
50                 0
53                 +3
54                 +4
59                 +9
64                 +14
72                 +22
74                 +24
75                 +25
78                 +28
79                 +29
n = 15             Σdx = +71

Arithmetic average or $a = x + \frac{\sum dx}{n}$, where x represents the
assumed average, Σdx the total of the deviations from the
assumed average and n the number of items.

$$a = 50 + \frac{71}{15} = 50 + 4.73 = 54.73$$

Arithmetic average of the series = 54.73


In the above solution, when deviations are measured from 50 as the
arbitrary mean, there is in each case an error. This error is a constant
figure and is equal to the difference between the actual arithmetic
average and the assumed arithmetic average (thus the first deviation
of -30 should have been -34.73 measured from the actual arithmetic
average). In the sum of all such deviations from the assumed mean,
the total error would be n times this constant error, since the error
is repeated once for every item included. If the sum of these deviations
is divided by n the actual amount of error is determined, and we can
calculate the actual arithmetic average.
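The short-cut method for individual observations can be sketched as follows. The function name `short_cut_mean` is hypothetical; the data are the items of Example 1 and the assumed mean of 50 is the one used in the text. A second assumed mean (60) is tried only to illustrate that the choice is arbitrary.

```python
import math

# Items of Example 1.
items = [20, 28, 34, 39, 42, 50, 53, 54, 59, 64, 72, 74, 75, 78, 79]


def short_cut_mean(values, assumed_mean):
    """a = x + (sum of dx) / n, where dx = item - assumed mean x."""
    deviations = [v - assumed_mean for v in values]
    return assumed_mean + sum(deviations) / len(values)


sum_dx = sum(v - 50 for v in items)         # total of the deviations from 50
mean_from_50 = short_cut_mean(items, 50)
mean_from_60 = short_cut_mean(items, 60)    # any other assumed mean works too

print(sum_dx, round(mean_from_50, 2), round(mean_from_60, 2))
```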
Calculation of arithmetic average in a discrete series
Direct Method. In a discrete series the values of the variable are
multiplied by their respective frequencies and the products so obtained
are totalled. This total is divided by the number of items, which in a
discrete series is equal to the total of the frequencies. The resulting
quotient is a simple arithmetic average of the series.
Algebraically
If $f_1, f_2, f_3$, etc., stand respectively for the frequencies of the
values $m_1, m_2, m_3$, etc.,

$$a = \frac{m_1 f_1 + m_2 f_2 + m_3 f_3 + \dots}{f_1 + f_2 + f_3 + \dots}$$

Or $$a = \frac{\sum mf}{n} = \frac{\sum mf}{\sum f}$$
The following illustration would clarify the formula :
Example 2. The following table gives the number of children
born per family in 735 families. Calculate the average number of
children born per family.

Number of children       Number of
born per family          families
0                        96
1                        108
2                        154
3                        126
4                        95
5                        62
6                        45
7                        20
8                        11
9                        6
10                       5
11                       5
12                       1
13                       1
Solution. Direct method
Computation of the average number of children born per family
Number of children       Number of
born per family          families         m × f
(m)                      (f)              (mf)
0                        96               0
1                        108              108
2                        154              308
3                        126              378
4                        95               380
5                        62               310
6                        45               270
7                        20               140
8                        11               88
9                        6                54
10                       5                50
11                       5                55
12                       1                12
13                       1                13
Total                    Σf = 735         Σmf = 2166
Arithmetic average or $a = \frac{\sum mf}{n}$, where Σmf represents the
sum of the products of the size of items and corresponding frequencies.
$$a = \frac{2166}{735} = 2.95$$
= 3 children approximately.
The average number of children born per family is equal to 3
approximately.
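The direct method for a discrete series amounts to a weighted sum. A minimal sketch with the figures of Example 2 follows; the variable names are illustrative and not part of the text.

```python
# Example 2: number of children born per family (m) and number of families (f).
children = list(range(14))   # 0, 1, 2, ..., 13
families = [96, 108, 154, 126, 95, 62, 45, 20, 11, 6, 5, 5, 1, 1]

# Multiply each value by its frequency and total the products.
sum_mf = sum(m * f for m, f in zip(children, families))
n = sum(families)            # number of items = total of the frequencies

mean_children = sum_mf / n

print(sum_mf, n, round(mean_children, 2))
```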
Short-cut method. A short-cut method can be used in the discrete
series also. In this method the deviations of the items from an assumed
mean are first found out and they are multiplied by their respective
frequencies. The total of these products is divided by the total
frequencies and added to the assumed mean. The resulting figure is the
actual arithmetic average. For further simplification of the calculations
the deviations from the assumed mean may be divided by a common
factor to reduce their size. If this is done the sum of the products
of the deviations and frequencies is multiplied by this common factor
and then it is divided by the total frequencies and added to the assumed
average.
Algebraically:
$$a = x + \frac{\sum fdx}{n}$$
Where
Σfdx = the total of the products of the deviations from the assumed
average and the respective frequencies of the items.
If the deviations are further divided by a common factor and if
this factor is represented by i
$$a = x + \left(\frac{\sum fdx}{n} \times i\right)$$
The following illustrations would clarify these rules :-
Example 3. The following data relate to sizes of shoes sold at
a store during a given week. Find the average size by the short-cut
method.
Computation of the average size of shoes
Size of shoes      No. of pairs
4.5                1
5                  2
5.5                4
6                  5
6.5                15
7                  30
7.5                60
8                  95
8.5                82
9                  75
9.5                44
10                 25
10.5               15
11                 4
Solution. Short-cut Method No. 1.

Size of shoes    No. of pairs    Deviations from the     Total deviation
                                 assumed mean (8)
(m)              (f)             (dx)                    (fdx)
4.5              1               -3.5                    -3.5
5                2               -3.0                    -6.0
5.5              4               -2.5                    -10.0
6                5               -2.0                    -10.0
6.5              15              -1.5                    -22.5
7                30              -1.0                    -30.0
7.5              60              -0.5                    -30.0
8                95              0                       0
8.5              82              +0.5                    +41.0
9                75              +1.0                    +75.0
9.5              44              +1.5                    +66.0
10               25              +2.0                    +50.0
10.5             15              +2.5                    +37.5
11               4               +3.0                    +12.0
                 n = 457                                 Σfdx = +169.5
Applying the short-cut method,
Arithmetic average or $a = x + \frac{\sum fdx}{n}$
Where x stands for the assumed average; Σfdx for the summation
of the products of the deviations from the assumed average and the
frequencies; and n for the number of items.
We get, $a = 8 + \frac{169.5}{457} = 8 + .37 = 8.37$
Thus the average size of shoes is 8.37
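The short-cut method for a discrete series can be sketched with the shoe data of Example 3. The assumed mean of 8 is the one used in the solution; the comparison with the direct method is added here only as a cross-check and is not part of the original working.

```python
import math

# Example 3: sizes of shoes (m) and number of pairs sold (f).
sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
pairs = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

assumed_mean = 8
n = sum(pairs)

# Deviation of each size from the assumed mean, weighted by its frequency.
sum_fdx = sum(f * (m - assumed_mean) for m, f in zip(sizes, pairs))
short_cut = assumed_mean + sum_fdx / n

# Direct method, for comparison.
direct = sum(m * f for m, f in zip(sizes, pairs)) / n

print(n, sum_fdx, round(short_cut, 2))
```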
Example 4. The following table gives the heights of 350 men.
Calculate the mean height of the group.
Height in inches      Number of persons
59                    1
61                    2
63                    9
65                    48
67                    131
69                    102
71                    40
73                    17
Solution. Short-cut method No. 2.
Computation of the mean height of the group.

Height    No. of     Deviations from the     Step         Total
          persons    assumed mean (67)       deviation    deviations
(m)       (f)                                (dx)         (fdx)
59        1          -8                      -4           -4
61        2          -6                      -3           -6
63        9          -4                      -2           -18
65        48         -2                      -1           -48
67        131        0                       0            0
69        102        +2                      +1           +102
71        40         +4                      +2           +80
73        17         +6                      +3           +51
          n = 350                                         Σfdx = +157

Arithmetic average or $a = x + \left(\frac{\sum fdx}{n} \times i\right)$, where x represents
the assumed average, fdx the product of the frequency and step-deviation,
and i represents the common factor of deviations,

$$a = 67 + \left(\frac{157}{350} \times 2\right) = 67.90$$

The mean height of the group = 67.90"
Calculation of the arithmetic average in a continuous series
The process of the calculation of arithmetic average in a
continuous series is the same as in case of a discrete series. In a
continuous series the midpoints of the various class intervals are written
down to replace the class intervals. Once it is done, there is no
difference between a continuous series and a discrete series. All the
three methods of the calculation of arithmetic average discussed above
in connection with the discrete series can be used here as well. The
following examples would illustrate the point :-
Example 5. Calculate the arithmetic average of the following by
the direct method:
Weekly wages          Number of labourers
(in rupees)
11-13                 3
13-15                 4
15-17                 5
17-19                 6
19-21                 5
21-23                 4
23-25                 3
Solution. Direct method.
Computation of the average weekly wages of labourers.
Wages in rupees    No. of       Mid-values of the    Wages multiplied by
                   labourers    wage groups          the no. of labourers
(m)                (f)          (m.v.)               (mf)
11-13              3            12                   36
13-15              4            14                   56
15-17              5            16                   80
17-19              6            18                   108
19-21              5            20                   100
21-23              4            22                   88
23-25              3            24                   72
                   n = 30                            Σmf = 540
Substituting the above data in the formula,
Arithmetic Mean or $a = \frac{\sum mf}{n}$
We get, $a = \frac{540}{30} = 18$ rupees.
Thus the average weekly wage paid to a labourer is Rs. 18
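For a continuous series the only additional step is replacing each class interval by its midpoint. A brief sketch with the wage classes of Example 5 follows; the tuple representation of the classes is a choice made for this illustration.

```python
# Example 5: wage classes (lower limit, upper limit) and number of labourers.
classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
labourers = [3, 4, 5, 6, 5, 4, 3]

# The midpoint of each class stands in for every item in that class.
midpoints = [(low + high) / 2 for low, high in classes]

sum_mf = sum(m * f for m, f in zip(midpoints, labourers))
n = sum(labourers)
mean_wage = sum_mf / n

print(midpoints, sum_mf, n, mean_wage)
```

From this point the series is treated exactly like a discrete one.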
Example 6. The following table gives the marks obtained by a
set of students in a certain examination. Calculate the average marks
per student.
Marks        Number of students
10-20        1
20-30        2
30-40        3
40-50        5
50-60        7
60-70        12
70-80        16
80-90        10
90-100       4
Solution. Short-cut method.
Computation of average marks per student
Marks      Mid       No. of      Deviation from      Step          Total
           values    students    assumed mean (55)   deviations    deviation
(m)        (m.v.)    (f)                             (dx)          (fdx)
10-20      15        1           -40                 -4            -4
20-30      25        2           -30                 -3            -6
30-40      35        3           -20                 -2            -6
40-50      45        5           -10                 -1            -5
50-60      55        7           0                   0             0
60-70      65        12          +10                 +1            +12
70-80      75        16          +20                 +2            +32
80-90      85        10          +30                 +3            +30
90-100     95        4           +40                 +4            +16
                     n = 60                                        Σfdx = +69

Arithmetic average or $a = x + \left(\frac{\sum fdx}{n} \times i\right)$

$$= 55 + \left(\frac{69}{60} \times 10\right)$$
= 66.5 marks.
Charlier's accuracy check
The accuracy of the calculation of the arithmetic average can be
checked easily with the help of the following formula given by Charlier :
$$\sum fd = \sum\{f(d+1)\} - \sum f$$
If the two sides of the above equation are equal it is a proof that
the calculations are all right. In the above example if +1 is added to
the deviations they would respectively become -3, -2, -1, 0, 1, 2, 3,
4 and 5 and the values of f(d+1) would be -3, -4, -3, 0, 7, 24, 48,
40 and 20. The total of f(d+1) would be +129. Substituting these
values in the equation given above we get
69 = 129 - 60 = 69
Thus $\sum fd = \sum\{f(d+1)\} - \sum f$
and it is a proof that the calculations are all right.
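Charlier's check can be reproduced for Example 6 as below. This is a sketch with illustrative variable names; `steps` holds the step deviations d from the solution table.

```python
# Example 6: frequencies (f) and step deviations (d) from the assumed mean 55.
students = [1, 2, 3, 5, 7, 12, 16, 10, 4]
steps = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

sum_fd = sum(f * d for f, d in zip(students, steps))
sum_f_d_plus_1 = sum(f * (d + 1) for f, d in zip(students, steps))
sum_f = sum(students)

# Charlier's identity: sum(fd) = sum(f(d + 1)) - sum(f)
check_passes = sum_fd == sum_f_d_plus_1 - sum_f

print(sum_fd, sum_f_d_plus_1, sum_f, check_passes)
```

The identity holds algebraically for any data, so a failure of the check points to an arithmetical slip in one of the columns rather than to anything about the data.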
Steps in short-cut method
The short-cut method of calculating arithmetic average should be
used in all cases as it saves time and gives accurate results. The process
of calculating the arithmetic average by the short-cut method can be
summarised as follows :-
(i) Assume as average the midpoint of a class which is in the
middle of the series. Technically any class can be chosen, but if the
class chosen is in the middle of the distribution there is considerable
facility in calculations.
(ii) Calculate the deviations of the items (midpoints in case of
continuous series) from the assumed mean.
(iii) Divide the deviations by a common factor or magnitude
of the class interval. These deviations are known as step deviations
or deviations in class-interval units.
(iv) Multiply the deviations with the respective frequencies of
the various classes and total the products, taking into account the
algebraic signs (plus or minus).
(v) Divide this total by the total of the frequencies and if step
deviations have been taken multiply the result by the common factor
or the magnitude of the class-interval.
(vi) Add this figure to the assumed average and the resulting
figure would be the actual arithmetic average of the series.
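The six steps above can be gathered into one small function. This is a sketch under the assumption that the caller supplies the midpoints, the frequencies, the assumed mean and the common factor; the name `step_deviation_mean` is hypothetical. It is tried on the data of Examples 4 and 6.

```python
def step_deviation_mean(midpoints, frequencies, assumed_mean, common_factor):
    """Arithmetic average by the short-cut (step-deviation) method."""
    # Steps (ii) and (iii): deviations from the assumed mean, in step units.
    steps = [(m - assumed_mean) / common_factor for m in midpoints]
    # Step (iv): multiply by the frequencies and total, keeping the signs.
    total = sum(f * d for f, d in zip(frequencies, steps))
    # Step (v): divide by the total frequency, then restore the common factor.
    correction = total / sum(frequencies) * common_factor
    # Step (vi): add the correction to the assumed mean.
    return assumed_mean + correction


# Example 4: heights in inches, assumed mean 67, common factor 2.
heights = [59, 61, 63, 65, 67, 69, 71, 73]
persons = [1, 2, 9, 48, 131, 102, 40, 17]
mean_height = step_deviation_mean(heights, persons, 67, 2)

# Example 6: mid-values of marks, assumed mean 55, common factor 10.
mid_values = [15, 25, 35, 45, 55, 65, 75, 85, 95]
students = [1, 2, 3, 5, 7, 12, 16, 10, 4]
mean_marks = step_deviation_mean(mid_values, students, 55, 10)

print(round(mean_height, 2), round(mean_marks, 2))
```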
Algebraic properties of the arithmetic average
The arithmetic average has three important mathematical
properties. They are:
(a) The total of the deviations of the items from the mean (taking plus
and minus signs) is equal to zero. We have already seen in previous pages
how important this property of arithmetic average is, and how the
calculation of arithmetic average is based on this rule. The algebraic
proof of this has also been given.
(b) If a series of observations consists of two or more component series
the mean of the whole series can be easily expressed in terms of the means of the
component series. If, for example, a series relating to wages in a particular
industry is divided in two parts, one relating to males and the other
relating to females, and if we know the number of observations in each
group and their respective means, we can find the combined mean of
the two series as follows :
$$a_{12} = \frac{n_1 a_1 + n_2 a_2}{n_1 + n_2}$$
Where
$a_{12}$ is the combined mean of the two series, $a_1$ and $a_2$ the means
of the two series respectively and $n_1$ and $n_2$ the number of observations
in the two series.
Thus if the average wage of male workers is Rs. 30 and their
number is 200 and the average wage of female workers is Rs. 25 and
their number is 100, the combined mean of the two series would be
$$\frac{(200 \times 30) + (100 \times 25)}{200 + 100} = \frac{6000 + 2500}{300}$$
= Rs. 28.3
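The combined mean of component series can be sketched as a weighted average of the component means. The function name `combined_mean` is illustrative; the figures are those of the wage example above, and the function is written for any number of component series although the text states the formula for two.

```python
def combined_mean(sizes, means):
    """Mean of the whole series from the sizes and means of its parts."""
    return sum(n * a for n, a in zip(sizes, means)) / sum(sizes)


# 200 male workers averaging Rs. 30 and 100 female workers averaging Rs. 25.
combined_wage = combined_mean([200, 100], [30, 25])

print(round(combined_wage, 2))
```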
(c) The mean of all the sums or differences of corresponding observations
in two series (with equal number of observations) is equal to the sum or difference
of the means of the two series.
The following illustration would clarify the point :-
                      Section A    Section B    A+B     A-B
                      5            8            13      -3
                      6            10           16      -4
                      7            12           19      -5
                      8            14           22      -6
                      9            16           25      -7
                      10           18           28      -8
                      11           20           31      -9
Total                 56           98           154     -42
Arithmetic Average    8            14           22      -6
In the above example the mean of the corresponding sums of the
two series is equal to 22 and the mean of the differences is -6. These
figures can be directly obtained by adding the means of the two series
(8+14) and by subtracting the mean of the second series from the mean
of the first one (8-14).
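The third property can be verified on the two sections tabulated above. The helper `mean` below is defined only for this sketch.

```python
def mean(values):
    return sum(values) / len(values)


section_a = [5, 6, 7, 8, 9, 10, 11]
section_b = [8, 10, 12, 14, 16, 18, 20]

# Sums and differences of corresponding observations.
sums = [a + b for a, b in zip(section_a, section_b)]
differences = [a - b for a, b in zip(section_a, section_b)]

print(mean(sums), mean(section_a) + mean(section_b))
print(mean(differences), mean(section_a) - mean(section_b))
```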
Merits of arithmetic average
The arithmetic average is the most popularly used measure of
central tendency. There are many reasons for its popularity. In the
beginning of this chapter we had laid down certain characteristics which
an ideal average should possess. We shall now see how far the
arithmetic average fulfils these conditions :-
(i) The first condition that an average should be rigidly defined
is fulfilled by the arithmetic average. It is rigidly defined and a biased
investigator shall get the same arithmetic average from the series as an
unbiased one. Its value is always definite.
(ii) The second characteristic that an average should be based
on all the observations of a series is also found in this average.
Arithmetic average cannot be calculated if even a single item of a series is
left out.
(iii) Arithmetic average is also capable of further algebraic
treatment. While discussing the algebraic properties of the arithmetic
average, we have already seen in detail how various mathematical
processes can be applied to it for purposes of further analysis and
interpretation of data. It is on account of this characteristic of the
arithmetic average that:
(a) It is possible to find the aggregate of items of a series if
only its arithmetic average and the number of items are
known.
(b) It is possible to find the arithmetic average if only the
aggregate of items and their number are known.
(iv) The fourth characteristic laid down for an ideal average that
it should be easy to calculate and simple to follow, is also found in
arithmetic average. The calculation of the arithmetic average is simple
and it is very easily understandable. It does not require the arraying
of data which is necessary in case of some other averages. In fact this
average is so well known that to a common man Average means an
arithmetic average.
Thus the arithmetic average
(a) is simple to calculate,
(b) does not need arraying of data,
(c) is easy to understand.
(v) The last characteristic of an ideal average that it should be
least affected by fluctuations of sampling is also present in arithmetic
average to a certain extent. If the number of items in a series is large,
the arithmetic average provides a good basis of comparison, as in such
cases the abnormalities in one direction are set off against the
abnormalities in another direction.
Drawbacks of arithmetic average
Although the arithmetic average satisfies most of the conditions
of an ideal average, there are certain drawbacks from which it suffers
and as such it should be used with caution. These drawbacks really
arise on account of the peculiar nature of this average and the technique
of its calculation. The points worth consideration in this respect are
as follows:
(i) Since arithmetic average is calculated from all the items of a
series, sometimes the abnormal items may considerably affect this average,
particularly when the number of items is not large. For example,
if the income of a shopkeeper is Rs. 1,000 per month and the incomes of
his three assistants are Rs. 25, Rs. 35 and Rs. 40 per month respectively,
the average income of this group would be Rs. $\frac{1000+25+35+40}{4}$
or Rs. 275 per month. This is not at all a representative figure.
Similarly, if one player in cricket scores 300 runs and the remaining 10 players
score only 140 runs, the total is 440 runs and the average per player is
40 runs. It is not a representative figure as 10 players out of 11 have
scored on an average only 14 runs each.
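The effect of one abnormal item can be seen by computing the figures quoted above. The sketch uses Python's standard `statistics` module; the median is shown only for contrast and is not part of the original illustration.

```python
import statistics

# Monthly incomes (Rs.) of the shopkeeper and his three assistants.
incomes = [1000, 25, 35, 40]

mean_income = statistics.mean(incomes)       # pulled up by the one large item
median_income = statistics.median(incomes)   # middle of the arrayed incomes

# Cricket illustration: one player scores 300, the other ten score 140 in all.
team_average = (300 + 140) / 11
others_average = 140 / 10

print(mean_income, median_income, team_average, others_average)
```

The mean of Rs. 275 exceeds the income of every assistant several times over, whereas the median stays among the assistants' incomes.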
(ii) Further, the fact that the arithmetic average cannot be
calculated without all the items of a series can also be said to be a drawback.
If out of 1000 items the values of 999 items are known the arithmetic
average cannot be calculated. Other averages like median and mode do
not need complete data.
(iii) Arithmetic average is no doubt easy to calculate but in a
relative sense its calculation may be more difficult than that of mode or
median as they can be located merely by inspection.
(iv) Another point to be noted in this connection is that the
arithmetic average can be a figure which does not exist in the series
at all. The arithmetic average of 12, 14 and 19 is 15. No item of the
series has a value of 15.
(v) Arithmetic average sometimes gives such results which appear
almost absurd. If we have to find out the number of children per
family, and if we use the arithmetic average, it is quite likely that we get
the average as 3.4 children. Obviously the result is absurd. A child
cannot be divided in fractions.
(vi) Sometimes arithmetic average gives fallacious conclusions.
Suppose the incomes of two groups of persons are as follows :-
            A          B
            1000       325
            100        300
            75         285
            25         290
Total       1200       1200
The average income of each of these two groups is Rs. 300. It would
appear from the averages that both the groups are economically
at the same level, and the two series are almost similar to each other
but this is not the case. The two series entirely differ from each
other so far as their composition is concerned.
(vii) The arithmetic average gives greater importance to bigger
items of a series and lesser importance to smaller items. It has
an upward bias. One big item among four items, three of which are
small, will push up the average considerably. But the reverse is not
true. If in a series of four items there are three big items and one
small item the average will not be pulled down proportionately as
much.
The above discussion thus leads us to the conclusion that though
arithmetic average fulfils most of the conditions of an ideal average yet
it should be used with caution as it is likely to give erroneous conclusions
under certain conditions.
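The two income groups A and B quoted under drawback (vi) can be compared directly. The sketch below shows that the averages agree while the spread of the items does not; the use of the range (largest item minus smallest) as a rough indicator of composition is a choice made for this illustration only.

```python
# Incomes (Rs.) of the two groups given under drawback (vi).
group_a = [1000, 100, 75, 25]
group_b = [325, 300, 285, 290]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

# Difference between the largest and smallest item in each group.
range_a = max(group_a) - min(group_a)
range_b = max(group_b) - min(group_b)

print(mean_a, mean_b, range_a, range_b)
```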
MEDIAN
Median is the value of the middle item of a series when it is arrayed in
ascending or descending order of magnitude.
It divides the series in two equal parts.
The values of items in one part are less than the value of the median and
in the other part are more than it. If in a class there are 21 students and
if they stand in a line in accordance with their height beginning with the
shortest amongst them and ending with the tallest, then the 11th student
would be in the centre and would divide them in two parts consisting of
ten students each. Students of one part will have heights less than the
height of the 11th student and of the other part more than this height.
The height of the 11th student is the median height. For ungrouped
data it may be convenient to find the value of the median by counting
$\frac{n+1}{2}$ items, beginning with the highest (or lowest) item in the
array. In grouped data this counting procedure is abandoned.
Symbolically $M = \text{size of } \frac{n+1}{2}\text{th item}$
where M stands for the median and n for the number of items.

Location of median in a series of individual observations


Example 7. Find out the median of the following items :-
5, 7, 9, 12, 10, 8, 7, 15, 21
Solution. Items arranged in ascending order of magnitude
Serial number      Size of items
1                  5
2                  7
3                  7
4                  8
5                  9
6                  10
7                  12
8                  15
9                  21
If M represents the median, and n the number of items,
$$M = \text{size of } \frac{n+1}{2}\text{th item} = \text{size of } \frac{9+1}{2}\text{th item}$$
= size of 5th item = 9
In the above example the number of items was odd and there was
no difficulty in finding out the middle item and its value. If the number
of items is even, say, 10, the middle item or $\frac{n+1}{2}$th item would be
the 5.5th item. In such a case the values of the 5th and 6th items would be
added and the total would be divided by 2; the resulting figure would
be the value of the median. The following example would clarify this
point :-
Example 8. The following table gives the marks obtained by a
batch of 30 B. Com. students in a class-test in statistics. (Marks 100).
Roll No.    Marks obtained    Roll No.    Marks obtained
1           33                16          24
2           32                17          33
3           55                18          42
4           47                19          38
5           21                20          45
6           50                21          26
7           27                22          33
8           12                23          44
9           68                24          48
10          49                25          52
11          40                26          30
12          17                27          58
13          44                28          37
14          48                29          38
15          62                30          35
MEASURES OF CENTRAL TENDENCY 115
Find the value of the median.
Solution. Marks obtained by 30 students arranged in ascending
order of magnitude:
Serial No.   Marks   Serial No.   Marks   Serial No.   Marks
1            12      11           33      21           47
2            17      12           35      22           48
3            21      13           37      23           48
4            24      14           38      24           49
5            26      15           38      25           50
6            27      16           40      26           52
7            30      17           42      27           55
8            32      18           44      28           58
9            33      19           44      29           62
10           33      20           45      30           68

If M represents the median, and n the number of items,

M = size of the $\frac{n+1}{2}$th item
  = size of the $\frac{30+1}{2}$th item = size of the 15.5th item
  = $\frac{\text{size of the 15th item} + \text{size of the 16th item}}{2}$
  = $\frac{38+40}{2}$ = 39 marks.
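
The rule used in Examples 7 and 8 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median_individual` is an assumption made here, not something taken from the text, and the data are the items of the two examples.

```python
def median_individual(values):
    """Median of ungrouped data: the size of the (n + 1) / 2 th item."""
    arrayed = sorted(values)               # arrange in ascending order
    n = len(arrayed)
    position = (n + 1) / 2                 # ends in .5 when n is even
    lower = arrayed[int(position) - 1]     # item at the whole-number position
    if position == int(position):          # n odd: a single middle item
        return lower
    upper = arrayed[int(position)]         # n even: average the two middle items
    return (lower + upper) / 2


example_7 = [5, 7, 9, 12, 10, 8, 7, 15, 21]
example_8 = [33, 32, 55, 47, 21, 50, 27, 12, 68, 49, 40, 17, 44, 48, 62,
             24, 33, 42, 38, 45, 26, 33, 44, 48, 52, 30, 58, 37, 38, 35]

print(median_individual(example_7))   # 9
print(median_individual(example_8))   # 39.0
```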
Location of median in discrete series
In a discrete series also the items are first arranged according to the
ascending or descending order of magnitude and their respective
frequencies are written against them. After this, the frequencies are
cumulated and then the value of the middle item can be easily located. The
following example illustrates the procedure:
Example 9. Find the median size of the shoes from figures given
below:
Size of shoes   Frequency   Size of shoes   Frequency
4.5             1           8.5             82
5               2           9               75
5.5             4           9.5             44
6               5           10              25
6.5             15          10.5            15
7               30          11              4
7.5             60
8               95

Solution: Calculation of the median size of the shoes:

Size of shoes   Frequency   Cumulative Frequency
4.5             1           1
5               2           3
5.5             4           7
6               5           12
6.5             15          27
7               30          57
7.5             60          117
8               95          212
8.5             82          294
9               75          369
9.5             44          413
10              25          438
10.5            15          453
11              4           457

Median = size of the $\frac{n+1}{2}$th pair, where n equals the total frequency
       = size of the $\frac{457+1}{2}$th or 229th pair = 8.5

It will be clear from the above figures that the value of items from
213th to 294th is 8.5. The value of the 229th item, thus, is also 8.5.
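
As a rough sketch of the same procedure, the cumulative frequencies can be built and scanned in Python. The function name `median_discrete` is an assumption made for illustration, and the sketch assumes that the sizes are already in ascending order and that the middle position does not fall exactly between two different sizes.

```python
from itertools import accumulate


def median_discrete(sizes, frequencies):
    """Locate the median of a discrete series from its cumulative frequencies.

    Assumes `sizes` is in ascending order. Returns the median size, the
    position of the middle item and the cumulative frequency column.
    """
    cumulative = list(accumulate(frequencies))
    position = (cumulative[-1] + 1) / 2          # the (n + 1) / 2 th item
    for size, running_total in zip(sizes, cumulative):
        if running_total >= position:            # first size that covers the item
            return size, position, cumulative


sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
frequencies = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

median, position, cumulative = median_discrete(sizes, frequencies)
print(position, median)    # 229.0 8.5
```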
Determination of median in a continuous series
When the median of a continuous frequency distribution has to
be determined there is one difficulty. The value of the median lies in
a class interval, and to get a definite figure, interpolation has to be done.
Suppose, for example, it is found that the value of the median lies in the
20 to 30 class interval whose frequency is 40. Now to find out the value
of the median we have to take recourse to interpolation and to apply a
particular formula. This formula, which we discuss below, is based on
the assumption that the frequencies of the class in which the median lies
are uniformly spread over the whole class-interval. In the above case
we shall presume that these 40 units are equally distributed in the whole
class interval of 20 to 30, or each of these ten values 20, 21, 22 and so on,
has a frequency of 4 units.
The formula of interpolation to find out the median is:

$$M = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$$

Where
M = median; $l_1$ = the lower limit of the class in which median lies;
$l_2$ = the upper limit of the class in which median lies; $f_1$ = the frequency
of the class in which median lies; m = middle item; c = cumulative
frequency of the group preceding the median group.
The following examples illustrate the above formula:
Example 10. Find the median of the following distribution.
Class-intervals   Frequencies   Class-intervals   Frequencies
(Rs.)                           (Rs.)
1-3               6             11-13             16
3-5               53            13-15             4
5-7               85            15-17             4
7-9               56
9-11              21            Total             245
Solution. Calculation of median
Class-intervals   Frequency   Cumulative frequency
1-3               6           6
3-5               53          59
5-7               85          144
7-9               56          200
9-11              21          221
11-13             16          237
13-15             4           241
15-17             4           245

Median = the value of the $\frac{n}{2}$th, i.e., the 122.5th item, which lies in the 5-7
group.
Applying the formula of interpolation,

$$M = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$$

we have $M = 5 + \frac{7-5}{85}(122.5 - 59) = 6.5$
In the above example median is the value of the 122.5th item, which lies
in the 5-7 group. In this group the number of items is 85. On the
presumption that these items are uniformly distributed in this class-interval,
we can calculate median by direct arithmetical process also. The 59th item has
the value of 5 and the next 85 items, up to the 144th, are spread over two
values from 5 to 7. From the 59th to the 122.5th there are 63.5 items. The value of
the 63.5th item after the 59th (or the value of the 122.5th item) would exceed 5 (the
value of the 59th item) by $\frac{2}{85} \times 63.5$ or by 1.5. Thus the value of the 122.5th
item would be 5 + 1.5 or 6.5.
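
The interpolation can also be checked with a short Python sketch. The function name `median_continuous` and the representation of each class as a (lower limit, upper limit, frequency) tuple are assumptions made for illustration; the data are those of Example 10.

```python
def median_continuous(classes):
    """Interpolated median, M = l1 + (l2 - l1) / f1 * (m - c).

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples in
    ascending order; frequencies are assumed to be spread uniformly within
    each class, as the formula requires.
    """
    n = sum(f for _, _, f in classes)
    m = n / 2                       # middle item
    c = 0                           # cumulative frequency of preceding classes
    for l1, l2, f1 in classes:
        if f1 and c + f1 >= m:      # this is the median class
            return l1 + (l2 - l1) / f1 * (m - c)
        c += f1
    raise ValueError("no frequencies given")


example_10 = [(1, 3, 6), (3, 5, 53), (5, 7, 85), (7, 9, 56),
              (9, 11, 21), (11, 13, 16), (13, 15, 4), (15, 17, 4)]

print(round(median_continuous(example_10), 1))   # 6.5
```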
Graphic calculation of median
The median of a series can be calculated graphically also. For this
the series is cumulated and a cumulative frequency curve (called Ogive)
is drawn. A perpendicular is then drawn on the base line (called Abscissa)
at the middle item cutting the curve at a particular point. The value of
the median is read on the vertical line (called ordinate) at the point of
intersection. This procedure would be illustrated in the chapter on
Graphs.
Merits of median
(i) It satisfies the first condition laid down in previous pages for
an ideal average as it is rigidly defined.
(ii) It can be easily calculated and it is understood without any
difficulty.
(iii) It is not affected by the values of the extreme items and as
such is sometimes more representative than arithmetic average. If the
incomes of five persons are Rs. 30, Rs. 35, Rs. 40, Rs. 45 and Rs. 1,000
the median would be Rs. 40 whereas the arithmetic average would be
Rs. 230. Median in such cases is a better average.
(iv) Even if the value of the extremes is not known median can
be calculated if the number of items is known.
(v) It can be located merely by inspection in many cases.
(vi) It gives best results in a study of those phenomena which are
incapable of direct quantitative measurement, for example intelligence.
It is impossible to measure intelligence quantitatively but it is possible to
arrange a group of persons in ascending or descending order of intelligence
and thus to locate a person whose intelligence can be said to be average.
Drawbacks of median
(i) Median may not be representative of a series in many cases.
This is specially so when there are wide variations between the values
of different items. For example, if the marks obtained by eleven students
are respectively 15, 16, 16, 18, 18, 20, 54, 60, 60, 60, and 72 the median
marks would be 20. Clearly the average is not representative of the series.
(ii) It is not suitable for further algebraic treatment. For exam-
ple, we cannot find out the total values of the items if we know their
number, and median.
(iii) When median has to be calculated in continuous series it
requires interpolation. The assumption of the interpolation, that all
the frequencies of the class-interval are uniformly spread over their
values in the class-interval, may not be actually true. In most cases it will
not be true.
(iv) If big or small items in a series are to receive greater
importance median would be an unsuitable average. Median ignores the
values of extreme items.
(v) Median is more likely to be affected by the fluctuations of samp-
ling than the arithmetic average.
(vi) The arrangement of items in ascending or descending order
is sometimes very tedious.
Comparison of mean and median
Both the mean and the median satisfy the conditions of rigid
definition and stability but so far as ease in calculation is concerned
median has a distinct advantage over mean. On the other hand, the
general fluctuations of sampling affect the median to a greater extent
than the mean, though there might be some cases where mean is affected
to a greater extent by such fluctuations than the median.
So far as the algebraic treatment of these two averages is
concerned, mean is definitely superior to median. In case of mean,
when several series relating to one phenomenon are combined into one,
it is possible to find out the combined average from the averages of the
various series and their numbers of observations. It is not possible in
case of median. However, if the component series are symmetrical¹
their means and medians would be identical and as such combined mean
and median would also be the same. But in case of asymmetrical
distributions the combined median would not coincide with the mean nor with
any other assignable value. The median of the sums or differences of the
corresponding items of two series is not equal to the sum or difference of
their medians, as is the case with arithmetic average. The calculated value
of the median subject to error is not necessarily the same as the true value
of the median, even if the error is zero, that is, if positive and negative
errors cancel each other.
On the other hand, median has certain advantages over the mean.
It is easily calculated and is readily obtained without even knowing
the value of all the items, provided they can be arrayed. Further, in
some cases mean cannot be calculated due to the extreme class intervals
being infinite, like "less than 100" or "more than 10,000" etc.; but median
can be easily obtained in such distributions. Sometimes median may be
more representative than the arithmetic average, due to the fact that it is
not affected by the values of extreme items. If, for example, the values
of most of the items of a sample cluster round 200, median would not be
affected if suddenly one item, whose value is 3000, is included in the
sample. Mean in such cases is more affected by fluctuations of sampling
than the median. Further, median is generally the value of a particular
item of the series, whereas mean may not be the value of any item of the
series. In this sense median is a more natural average than the mean.
QUARTILES, DECILES AND PERCENTILES
It has been seen that the median divides an arrayed series in two
equal parts. The values of items in one part are more than the median
value, and the values of items in the other part, less than the value of the
median. With a view to have a better study about the composition of
a series it may be necessary to divide it in four, five, six, seven, eight,
nine, ten or hundred parts. Usually the series are divided either in
four, ten or hundred parts. Just as one item divides the series in two
parts, three items would divide it in four parts, nine items in ten parts and
ninety-nine items in hundred parts. The values of these items are
respectively known as Quartiles, Deciles and Percentiles. A series can be
divided in five, seven or eight parts by Quintiles, Septiles and Octiles.
¹ For further explanation see chapters on Dispersion and Skewness.

There are thus three quartiles, nine deciles and ninety-nine
percentiles in a series. The second quartile, fifth decile and 50th percentile is
median. The value of the item which divides the first half of a series
(with values less than median) in two equal parts is called the First Quartile
or Lower Quartile and the value of the item which divides the latter
half of a series (with values more than the median) in two equal parts is
called Third Quartile or Upper Quartile. The Second Quartile or the
Middle Quartile is the same thing as median.
The calculation of Quartiles, Deciles, Percentiles and other such
values is done by following the same rules with which the value of median
is determined.
Thus
$Q_1$ = the value of the $\frac{n}{4}$th item

$Q_3$ = the value of the $\frac{3(n)}{4}$th item

$D_1$ = the value of the $\frac{n}{10}$th item

$D_2$ = the value of the $\frac{2(n)}{10}$th item

$D_9$ = the value of the $\frac{9(n)}{10}$th item

$P_1$ = the value of the $\frac{n}{100}$th item

$P_2$ = the value of the $\frac{2(n)}{100}$th item

$P_{99}$ = the value of the $\frac{99(n)}{100}$th item

Where $Q_1$ and $Q_3$ stand for the first and third quartiles, $D_1$, $D_2$ and
$D_9$ for the first, second and ninth deciles and $P_1$, $P_2$ and $P_{99}$ for the 1st,
2nd and 99th percentiles respectively and n stands for the number of items
in the series.
Location of quartiles, deciles and percentiles, etc., in a series of
individual observations
Example 11. From the data given in Example No. 8 calculate
the value of the quartiles, 6th decile and 70th percentile.

Solution. 1st Quartile or $Q_1$ = size of the $\frac{n}{4}$th item
= size of the $\frac{30}{4}$th or 7.5th item
= size of 7th item + ½ (size of 8th item - size of 7th item)
= 30 + ½ (32 - 30)
= 31 marks.

3rd Quartile or $Q_3$ = size of the $\frac{3(n)}{4}$th item
= size of the $\frac{3(30)}{4}$th or 22.5th item
= size of 22nd item + ½ (size of 23rd item - size of 22nd item)
= 48 + ½ (48 - 48)
= 48 marks.

6th Decile or $D_6$ = size of the $\frac{6(n)}{10}$th item
= size of the $\frac{6(30)}{10}$th or 18th item
= 44 marks.

70th Percentile or $P_{70}$ = size of the $\frac{70(n)}{100}$th item
= size of the $\frac{70(30)}{100}$th or 21st item
= 47 marks.
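
A minimal Python sketch of the rule used in Example 11 is given below. The function name `partition_value` and its parameters `k` and `parts` (for instance k = 3, parts = 4 for the third quartile) are assumptions introduced here for illustration. The fractional part of the position is used as the weight between two neighbouring items, which reduces to the ½ used above when the position ends in .5.

```python
def partition_value(values, k, parts):
    """Size of the k * n / parts th item of an arrayed series."""
    arrayed = sorted(values)
    position = k * len(arrayed) / parts      # e.g. 7.5 for Q1 of 30 items
    whole = int(position)
    if whole < 1:
        raise ValueError("position falls before the first item")
    value = arrayed[whole - 1]               # item at the whole-number position
    fraction = position - whole
    if fraction:                             # interpolate towards the next item
        value += fraction * (arrayed[whole] - arrayed[whole - 1])
    return value


marks = [33, 32, 55, 47, 21, 50, 27, 12, 68, 49, 40, 17, 44, 48, 62,
         24, 33, 42, 38, 45, 26, 33, 44, 48, 52, 30, 58, 37, 38, 35]

print(partition_value(marks, 1, 4))      # Q1  -> 31.0
print(partition_value(marks, 3, 4))      # Q3  -> 48.0
print(partition_value(marks, 6, 10))     # D6  -> 44
print(partition_value(marks, 70, 100))   # P70 -> 47
```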

Location of quartiles, deciles, percentiles, etc., in a discrete series

Example 12. From the data given in Example 9 calculate the
lower and upper quartiles, 7th decile and 46th percentile, 3rd quintile
and 5th octile.
Solution. Lower quartile = size of the $\frac{n}{4}$th pair
= size of the $\frac{457}{4}$th or 114.25th pair
= 7.5

Upper Quartile = size of the $\frac{3(n)}{4}$th pair
= size of the $\frac{3(457)}{4}$th or 342.75th pair
= 9

7th Decile = size of the $\frac{7(n)}{10}$th pair
= size of the $\frac{7(457)}{10}$th or 319.9th pair
= 9

46th Percentile = size of the $\frac{46(n)}{100}$th pair
= size of the $\frac{46(457)}{100}$th or 210.22th pair
= 8

3rd Quintile = size of the $\frac{3(n)}{5}$th pair
= size of the $\frac{3(457)}{5}$th or 274.2th pair
= 8.5

5th Octile = size of the $\frac{5(n)}{8}$th pair
= size of the $\frac{5(457)}{8}$th or 285.6th pair
= 8.5
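
All six values of Example 12 follow from one scan of the cumulative frequencies, which can be sketched in Python as follows. The function name `partition_value_discrete` and its parameters are assumptions made for illustration, and the sizes are assumed to be in ascending order.

```python
from itertools import accumulate


def partition_value_discrete(sizes, frequencies, k, parts):
    """Size against which the k * n / parts th item of a discrete series falls."""
    position = k * sum(frequencies) / parts
    for size, cumulative in zip(sizes, accumulate(frequencies)):
        if cumulative >= position:       # first size whose cumulative
            return size                  # frequency covers the position


sizes = [4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11]
frequencies = [1, 2, 4, 5, 15, 30, 60, 95, 82, 75, 44, 25, 15, 4]

for label, k, parts in [("lower quartile", 1, 4), ("upper quartile", 3, 4),
                        ("7th decile", 7, 10), ("46th percentile", 46, 100),
                        ("3rd quintile", 3, 5), ("5th octile", 5, 8)]:
    print(label, partition_value_discrete(sizes, frequencies, k, parts))
```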

Determination of quartiles, deciles, percentiles, etc., in a
continuous series

In a continuous series, as in the case of median, the values of
quartiles, deciles and percentiles, etc., lie in various class-intervals and the
actual values have to be interpolated by the use of algebraic formulae.
The formulae for the calculation of quartiles, deciles, percentiles, etc., are
almost the same as used in the calculation of median. The assumption
of interpolation is also the same, and it is that the frequencies of a
class-interval are uniformly spread over its values.
Thus

$$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}(q_1 - c)$$

Where $l_1$ and $l_2$ are the lower and upper limits of the class in which
the first quartile lies, $f_1$ the frequency of this class, $q_1$ the quartile number
$\frac{n}{4}$ and c the cumulative frequency of the class preceding the quartile
class.

$$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}(q_3 - c)$$

Where $l_1$ and $l_2$ stand for the lower and upper limits of the class
in which the 3rd quartile lies, $f_1$ for the frequency of this class, $q_3$ the
quartile number and c the cumulative frequency of the class preceding
the quartile class.
Similarly the formulae can be derived for the calculation of deciles,
percentiles, etc.
Thus

$$D_2 = l_1 + \frac{l_2 - l_1}{f_1}(d_2 - c) \quad \text{and} \quad P_{72} = l_1 + \frac{l_2 - l_1}{f_1}(p_{72} - c)$$
Example 13. From the data given below calculate the median and
quartiles.
Solution. Calculation of the median and quartile ages of married females.
Age      Number of married   Cumulative frequency
         females
0-5      3                   3
5-10     31                  34
10-15    410                 444
15-20    1809                2253
20-25    2446                4699
25-30    2223                6922
30-35    1723                8645
35-40    1292                9937
40-45    963                 10900
45-50    762                 11662
50-55    531                 12193
55-60    317                 12510
60-65    156                 12666
65-70    59                  12725
70-75    37                  12762

Total    12,762
The median age of married females
= the age of the $\frac{n}{2}$th female, where n equals the total frequency
= the age of the $\frac{12762}{2}$th, i.e., 6381st married female,
who lies in the 25-30 age group. Applying the formula of interpolation

$$M = l_1 + \frac{l_2 - l_1}{f_1}(m - c)$$

where M represents the median; $l_1$ and $l_2$ the lower and the upper limits
of the group in which median is situated; $f_1$ the frequency of median class;
m the number of the middle item or $\frac{n}{2}$ items and c the cumulative
frequency of the group lower than the one in which median is situated.

$M = 25 + \frac{30-25}{2223}(6381 - 4699) = 28.8$ years approx.

The lower quartile age of married females
= the age of the $\frac{n}{4}$th, i.e., 3190.5th married female, who lies
in the 20-25 age group.
By interpolation

$$Q_1 = l_1 + \frac{l_2 - l_1}{f_1}(q_1 - c)$$

where $Q_1$ represents lower quartile; $l_1$ and $l_2$ the lower and the upper
limits of the group in which lower quartile is situated; $f_1$ the frequency of
lower quartile class; $q_1$ the number of $\frac{n}{4}$ items; and c the
cumulative frequency of the group lower than the one in which the lower
quartile is situated.

$Q_1 = 20 + \frac{25-20}{2446}(3190.5 - 2253) = 21.9$ yrs. approx.

The upper quartile age of married females
= the age of the $\frac{3(n)}{4}$th, i.e., 9571.5th married female, who
is situated in the 35-40 age group.
By interpolation,

$$Q_3 = l_1 + \frac{l_2 - l_1}{f_1}(q_3 - c)$$

where $Q_3$ stands for upper quartile; $l_1$ and $l_2$ for the lower and the upper
limits of the group in which upper quartile is situated; $f_1$ for the frequency
of upper quartile class; $q_3$ for the number of $\frac{3(n)}{4}$ items; and c
for the cumulative frequency of the group lower than the one in which
$Q_3$ is situated.

$Q_3 = 35 + \frac{40-35}{1292}(9571.5 - 8645) = 38.6$ yrs. approx.
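
The three interpolations of Example 13 differ only in the position looked for, so they can be sketched with one Python function. The name `partition_value_continuous` and the tuple layout of the classes are assumptions made for illustration; the frequencies are those of the table above, with classes 0-5, 5-10 and so on up to 70-75.

```python
def partition_value_continuous(classes, k, parts):
    """Interpolate the k * n / parts th item: l1 + (l2 - l1) / f1 * (q - c).

    `classes` holds (lower_limit, upper_limit, frequency) tuples in
    ascending order; frequencies are assumed uniform within each class.
    """
    n = sum(f for _, _, f in classes)
    q = k * n / parts                    # position of the item wanted
    c = 0                                # cumulative frequency so far
    for l1, l2, f1 in classes:
        if f1 and c + f1 >= q:           # class in which the item lies
            return l1 + (l2 - l1) / f1 * (q - c)
        c += f1
    raise ValueError("no frequencies given")


frequencies = [3, 31, 410, 1809, 2446, 2223, 1723, 1292,
               963, 762, 531, 317, 156, 59, 37]
classes = [(5 * i, 5 * i + 5, f) for i, f in enumerate(frequencies)]

print(round(partition_value_continuous(classes, 1, 2), 1))   # median -> 28.8
print(round(partition_value_continuous(classes, 1, 4), 1))   # Q1     -> 21.9
print(round(partition_value_continuous(classes, 3, 4), 1))   # Q3     -> 38.6
```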
Example 14. From the data given in Example 10 calculate
(a) 8th decile and (b) 56th percentile.
Solution: (a) 8th decile
$D_8$ = size of the $\frac{8(n)}{10}$th item, where n equals 245
= size of the 196th item, which lies in the 7-9 group; applying the
formula of interpolation,

$$D_8 = l_1 + \frac{l_2 - l_1}{f_1}(d_8 - c)$$

where $l_1$ and $l_2$ represent the lower and the upper limits of the group
in which 8th decile is situated; $f_1$ the frequency of the same group; $d_8$ the
value of the $\frac{8(n)}{10}$th item and c the cumulative frequency of the group
lower than the one in which 8th decile is situated.

We get $D_8 = 7 + \frac{9-7}{56}(196 - 144)$
= 8.86.
(b) 56th Percentile
$P_{56}$ = size of the $\frac{56(n)}{100}$th item
= size of the $\frac{56(245)}{100}$th item
= size of the 137.2th item, which lies in the 5-7 group.
Applying the formula of interpolation,

$$P_{56} = l_1 + \frac{l_2 - l_1}{f_1}(p_{56} - c)$$

where $l_1$, $l_2$ and $f_1$ represent the lower and the upper limits and the
frequency of the group in which the 56th percentile is situated; $p_{56}$ the value
of the $\frac{56(n)}{100}$th item and c the cumulative frequency of the group lower
than the one in which $P_{56}$ is situated.

We get $P_{56} = 5 + \frac{7-5}{85}(137.2 - 59)$
= 6.84

Graphic calculation of quartiles, deciles and percentiles, etc.

Like median, quartiles, deciles and percentiles can also be calculated
graphically with the help of cumulative frequency curves called Ogives.
The rule for drawing such curves and the procedure for reading the values
of quartiles, deciles and percentiles, etc., would be discussed in detail
in the chapter on Graphs.

Characteristics of quartiles, deciles and percentiles, etc.

It should be remembered that quartiles, deciles and percentiles etc.
are not averages in the same sense in which mean and median are. An
average is representative of a whole series while quartiles, deciles and
percentiles are averages of parts of series. First quartile is the average of
the first half of a series arranged in ascending order. Similarly, the
third quartile is the average of the second half of the series. First decile
in the same way is the average of the first tenth part of a series and first
percentile of the first hundredth part.
These are, however, very helpful in understanding the formation
of a series. They tell us how various items are spread round the median.
Their special utility lies in a study of the dispersion of items from the
median. We shall discuss this point in greater detail in the next chapter
and then the usefulness of this study would become more clear.

MODE

Mode is the most common item of a series. It represents the most typical
or frequent value of a series, a value which is, in fact, the fashion (la mode).
When one speaks of the "average student," "the most common wage,"
"the common man" or "the typical farm" and the like, he is unconsciously
referring to mode. If it is said that the most common wage in a particular
industry is Rs. 50 per month, what it means is that the largest number of
persons get this single figure of Rs. 50 as wage. Other figures of wage
are not as popular as this one, and the number of persons getting them is
less than the number getting Rs. 50 per month.
Methods of calculation. It appears from this definition that it must
be very easy to calculate the mode of a series. In fact it is not always
so. As we shall see later on, the most satisfactory method of calculating
mode is that of "curve fitting" which is an extremely difficult process.
In ordinary practice, however, mode is estimated by easier methods which
are comparatively very much less accurate than the method of curve
fitting. These methods are no doubt very simple and easy.

Mode cannot be determined from a series of individual observations
unless it is converted into either a discrete or continuous series. In a
discrete series the value of the variable against which the frequency is the
largest would be the modal value. Similarly in a continuous frequency
distribution the class-interval having the maximum frequency would
be the modal class. The exact location of mode in a class-interval is
done by interpolation, as in case of median, on the basis of certain
assumptions which we shall examine a little later. Location of the modal
value in a discrete series or of a modal class in a continuous series is
possible only if the concentration of items is at one particular point.
If, however, there are two or more values round which figures concentrate,
it becomes difficult to determine the value of mode. Such series are
called bi-modal, tri-modal and multi-modal depending on whether the
items concentrate at 2, 3 or more values.
Grouping method. In discrete and continuous series if the items
concentrate at more than one value, attempts are made to find out the
point of maximum concentration with the help of grouping method. In
this method values are first arranged in ascending order and the frequencies
against each value are written down. These frequencies are then added
in two's and the totals are written in lines between the values added.
Frequencies can be added in two's in two ways:
(a) By adding frequencies of item numbers 1 and 2; 3 and 4;
5 and 6 and so on.
(b) By adding frequencies of item numbers 2 and 3; 4 and 5; 6
and 7 and so on.
After this the frequencies are added in three's. This
can be done in three ways:
(a) By adding frequencies of item numbers 1, 2 and 3; 4, 5 and
6; 7, 8 and 9 and so on.
(b) By adding frequencies of item numbers 2, 3 and 4; 5, 6 and
7; 8, 9 and 10 and so on.
(c) By adding the frequencies of item numbers 3, 4 and 5; 6, 7
and 8; 9, 10 and 11 and so on.
If necessary frequencies can be added in four's and five's also. After
this the sizes of items containing the maximum frequencies are noted
down and the item which has the maximum frequency the largest
number of times is called the mode. If grouping has been done in case of
continuous series we shall be in a position to determine the modal class
by this process.
We shall now see how mode is determined by the grouping
method in a discrete series.

Location of mode in a discrete series

Example 15
Find out the mode of the following series:

Size   Frequency   Size   Frequency
5      48          13     52
6      52          14     41
7      56          15     57
8      60          16     63
9      63          17     52
10     57          18     48
11     55          19     40
12     50
Solution

Location of mode by grouping

Size of                     Frequency (f)
item      (1)     (2)     (3)     (4)     (5)     (6)
5         48      100             156
6         52              108             168
7         56      116                             179*
8         60              123*    180*
9         63*     120*                    175*
10        57              112                     162
11        55      105             157
12        50              102             143
13        52      93                              150
14        41              98      161
15        57      120*                    172
16        63*             115                     163
17        52      100             140
18        48              88
19        40

The frequencies in column (1) are first added in two's in columns (2)
and (3). Then they are added in three's in columns (4), (5) and (6). Each
total is entered against the first item of its group, and the
maximum frequency in each column is indicated by an asterisk. It
will be observed that mode changes with the change in grouping. Thus
according to column (1) mode should be 9 or 16; according to column
(2) it should be either 9 or 10 or 15 or 16. To find out the point of
maximum concentration the data can be arranged in the shape of a table as
follows:
Analysis Table
Columns            Size of item containing maximum frequency
(1)                            9                         16
(2)                            9     10            15    16
(3)                      8     9
(4)                      8     9     10
(5)                            9     10    11
(6)                7     8     9
No. of times the   1     3     6     3     1       1     2
size occurs
Since the size 9 occurs the largest number of times it is the modal
size or mode is 9.
If we look at the frequencies in the original table, we shall find
that the frequency of 63, which is the maximum single frequency, is
against two values, 9 and 16. The series thus appears to be bi-modal
but the process of grouping leads us to the conclusion that the
concentration of items round 9 is more than the concentration round 16.
Even if the frequency against 16 was 64 instead of 63 probably
grouping would have disclosed that the concentration of items round about
9 is more, even though the individual frequency against 9 is only 63. It
is thus never safe to rely only on the inspection of a series and to locate
the mode at the point of maximum frequency. Mode is affected by the
frequencies of the neighbouring items also, and, therefore, grouping is
essential, as it reveals the true point of maximum concentration.
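
The grouping and analysis tables of Example 15 can be reproduced with a short Python sketch. The function name `grouping_mode` is an assumption made for illustration, and the sketch groups only in ones, two's and three's, as in the example. Every item belonging to a group with the maximum total in a column is tallied once for that column.

```python
from collections import Counter


def grouping_mode(sizes, freqs):
    """Return (modal size, tally) from grouping in ones, two's and three's."""
    tally = Counter()
    n = len(freqs)
    for width in (1, 2, 3):
        for start in range(width):                 # one column per starting item
            groups = [
                (sum(freqs[i:i + width]), sizes[i:i + width])
                for i in range(start, n - width + 1, width)
            ]
            best = max(total for total, _ in groups)
            for total, members in groups:
                if total == best:                  # group(s) with the column maximum
                    tally.update(members)
    modal_size, _ = tally.most_common(1)[0]
    return modal_size, tally


sizes = list(range(5, 20))
freqs = [48, 52, 56, 60, 63, 57, 55, 50, 52, 41, 57, 63, 52, 48, 40]

mode, tally = grouping_mode(sizes, freqs)
print(mode)                    # 9
print(sorted(tally.items()))   # number of times each size occurs
```

The tally agrees with the last row of the analysis table: size 9 occurs six times and size 16 only twice.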
Determination of mode in a continuous series
In a continuous series the determination of mode involves two
steps. First, by the process of grouping, the class in which there is
maximum concentration has to be located. After this the value of
mode is interpolated by the use of a formula. It should be
remembered that mode does not always give satisfactory results in a continuous
series. If the size of the class-interval is changed the modal class also
changes in many cases. Suppose, for example, the magnitude of
class-intervals is 10 and mode lies in, say, 30-40 group. If this series is
regrouped in class-intervals having magnitude of only 5, it is quite likely
that the mode may lie in, say, 45-50 group. It would depend on the
distribution of items in various class intervals. For determining mode
in a continuous series, the class-intervals should not be very big in size,
but if the size of the class-intervals is very small the frequencies also
become very small, the distribution becomes irregular and the
determination of mode becomes very difficult. The series may even become
multi-modal.
It has already been said that the mode is affected by the
frequencies of the neighbouring classes. The formulae for the interpolation of
mode are based on this very assumption. If the frequency of the
preceding class is greater than the frequency of the succeeding class,
mode would be nearer the lower limit of the class-interval and if the
frequency of the succeeding class is more than the frequency of the
preceding class mode would be nearer the upper limit. To study this,
the proportions of frequencies in the preceding and succeeding classes
to the total frequencies in these two classes are found out.
If $f_0$ stands for the frequency of the preceding class and $f_2$ for
the frequency of the succeeding class these proportions would be
(a) $\frac{f_0}{f_0 + f_2}$
(b) $\frac{f_2}{f_0 + f_2}$
These proportions are multiplied by the magnitude of the
class-interval and mode is calculated in any of the following two ways:
either by adding $\frac{f_2}{f_0 + f_2} \times (l_2 - l_1)$ to the lower limit of the modal
class or by deducting $\frac{f_0}{f_0 + f_2} \times (l_2 - l_1)$ from the upper limit of the
modal class. Thus if Z stands for the mode,

$$Z = l_1 + \frac{f_2}{f_0 + f_2} \times (l_2 - l_1)$$

$$Z = l_2 - \frac{f_0}{f_0 + f_2} \times (l_2 - l_1)$$

Mode is also calculated by taking into account (i) the proportion
of difference between the frequency of the modal class and the frequency
of the preceding class, and (ii) the proportion of difference between the
modal frequency and the frequency of the succeeding class.
Thus if $f_1$ stands for the frequency of the modal class, and if we
take into account the lower limit of the modal class, the proportion of
the difference $(f_1 - f_0)$ is added to it and if we take into account the
upper limit, the proportion of the difference $(f_1 - f_2)$ is deducted
from it.
Thus

$$Z = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times (l_2 - l_1)$$

$$Z = l_2 - \frac{f_1 - f_2}{(f_1 - f_0) + (f_1 - f_2)} \times (l_2 - l_1)$$

The two sets of formulae given above would give different values
of mode as they are based on different assumptions. In the first case
we take into account only the frequencies of the preceding and
succeeding classes whereas in the second case (i) the difference of the modal
frequency and the preceding frequency, and (ii) the difference of the
modal frequency and the succeeding frequency, are taken into account.
The second set of formulae are supposed to be better than the
first set and usually mode is interpolated by starting with the lower
limit. As such we shall be making use of the following formula in
the determination of mode in a continuous series.

$$Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(l_2 - l_1)$$

Example 16. The following table gives the length of life of 150
electric lamps:
Life (hours)      Frequency of lamps
0 to 400          4
400 to 800        12
800 to 1200       40
1200 to 1600      41
1600 to 2000      27
2000 to 2400      13
2400 to 2800      9
2800 to 3200      4
Calculate the mode.
Solution. Determination of mode by grouping
Life (hours)                Frequency of lamps
               (1)     (2)     (3)     (4)     (5)     (6)
0-400          4       16              56
400-800        12              52              93*
800-1200       40      81*                             108*
1200-1600      41*             68*     81*
1600-2000      27      40                      49
2000-2400      13              22                      26
2400-2800      9       13
2800-3200      4

Columns   Size of group containing maximum frequency
(1)                               1200-1600
(2)                   800-1200    1200-1600
(3)                               1200-1600   1600-2000
(4)                               1200-1600   1600-2000   2000-2400
(5)       400-800     800-1200    1200-1600
(6)                   800-1200    1200-1600   1600-2000
No. of times
the size  1           3           6           3           1
occurs
Therefore, the 1200-1600 group is the modal group. Mode lies in
this group and by applying the formula of interpolation, viz.,

$$Z = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}(l_2 - l_1)$$

where Z stands for mode, $l_1$ and $l_2$ stand for the lower and upper
limits of the modal group, $f_1$ stands for frequency of the modal group,
$f_0$ stands for frequency of the group preceding the modal group, $f_2$ stands
for frequency in the group succeeding the modal group,

we get $Z = 1200 + \frac{41 - 40}{82 - 40 - 27} \times 400$
= 1226.67 hours
Thus
The modal life of the lamp = 1226.67 hours.
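
As a small check on the arithmetic, the two interpolation formulae for mode can be sketched in Python and applied to the modal class of Example 16. The function names are assumptions made for illustration. The first function uses only the neighbouring frequencies and the second uses the differences from the modal frequency, so, as noted earlier, the two need not agree.

```python
def mode_from_neighbours(l1, l2, f0, f2):
    """First set: Z = l1 + f2 / (f0 + f2) * (l2 - l1)."""
    return l1 + f2 / (f0 + f2) * (l2 - l1)


def mode_from_differences(l1, l2, f0, f1, f2):
    """Second set: Z = l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * (l2 - l1)


# Modal class 1200-1600 with f0 = 40, f1 = 41, f2 = 27 (Example 16)
z_second = mode_from_differences(1200, 1600, 40, 41, 27)
z_first = mode_from_neighbours(1200, 1600, 40, 27)

print(round(z_second, 2))   # 1226.67
print(round(z_first, 2))    # 1361.19
```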
Determination of mode by curve fitting
As has been said earlier, the above methods of the calculation
of mode are unsatisfactory. In most of the distributions, as they arise
in actual practice, these methods would not give satisfactory results.
The ideal method of calculating the mode is that of curve fitting. Since
there are many irregularities in the data which we normally come across,
it is necessary to remove them before determination of mode. These
irregularities are removed by the technique of curve fitting. Attempts
are made to fit an ideal curve which gives the closest possible fit to the
actual distribution. The value of the variable corresponding to the
maximum of this ideal curve is the value of the mode. The technique
of curve fitting is highly mathematical and should be left to the more
advanced students of this subject.
Determination of mode from mean and median
In a symmetrical distribution the mean, median and mode are
identical. We shall discuss in the next chapter the concept of a
symmetrical distribution which gives a normal curve. In actual practice,
however, symmetrical distributions are very rare, and data usually give
an asymmetrical curve. In distributions which moderately differ from
a symmetrical distribution, there is an empirical relationship between
mean, median and mode. This relationship holds good for most of
the moderately asymmetrical distributions. It is as follows:
Mode = Mean - 3 (Mean - Median)
It means that mean is usually on one end and mode on the other.
Median lies at a point one-third of the distance between mean and mode
from the mean towards the mode. The median is thus closer to mean
than mode. From this relationship we can estimate the value of mode
of a moderately asymmetrical distribution if we know the values of
mean and median. This relationship can also be expressed as:
(Median - Mode) = ⅔ (Mean - Mode)
In most of the cases if the series is moderately asymmetrical, value
of the mode as estimated from the mean and median would not differ
significantly from the value calculated by other methods.
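
The empirical relationship is simple enough to sketch in Python. The function name and the figures used (a mean of 50 and a median of 48) are purely hypothetical and are not taken from the text; they serve only to show that the two ways of writing the relationship agree.

```python
def mode_from_mean_and_median(mean, median):
    """Empirical estimate: Mode = Mean - 3 (Mean - Median)."""
    return mean - 3 * (mean - median)


mean, median = 50, 48                       # hypothetical values
mode = mode_from_mean_and_median(mean, median)

print(mode)                                 # 44
# Second form of the relationship: Median - Mode = 2/3 (Mean - Mode)
print(median - mode, 2 * (mean - mode) / 3)
```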
Determination of mode by graphic method
Mode can also be located graphically. In discrete and continuous
series the point of maximum frequency, which would usually be the
apex of the curve, is observed, to find out the modal value. The value
of the variable against the apex of the curve would be the value of the
mode. However, when determining the value of mode graphically
it is better if the curve is smoothed for irregularities. We shall discuss
more about it in the chapter on Graphs.
Merits of mode
Of the many conditions laid down for an ideal average, mode
possesses only a few. They are as follows:
(i) It possesses the merit of simplicity. It can be determined
without much mathematical calculation. In a discrete series mode
can be located even by inspection. In this respect, like median, it has
an advantage over arithmetic average.
(ii) It is commonly understood. As has been said earlier, mode
is an average which people use in their day-to-day expressions. The
average size of the ready-made garment, the typical size of holdings,
the average number of road accidents are all examples of the common
use of mode.
(iii) Since mode is the most common item of a series it is not an
isolated example like the median. Unlike arithmetic average it cannot
be a value which is not found in the series.
(iv) Mode is not affected by the values of extreme items provided
they adhere to the natural law relating to extremes.
(v) For the determination of mode it is not necessary to know
the values of all the items of a series. If the point of norm or maximum
concentration is known it is enough. The values of extreme
items need not even be known, as usually there is very little concentration
round the extreme values.
Drawbacks of mode
Mode is an unsatisfactory average and has many drawbacks.
Some of them are as follows:
(i) Mode is ill-defined, indeterminate and indefinite. The very
first condition laid down for an ideal average, that it should be rigidly
defined, is not fulfilled by it.
(ii) Mode is not based on all the observations of a series and as
such the second condition is also not fulfilled by it.
(iii) Mode is not capable of further mathematical treatment.
(iv) Mode may be unrepresentative in many cases. If in a series of
1000 items 20 have a particular value and other values have frequencies
less than 20, it does not necessarily mean that the value whose frequency
is 20 is the typical or average value. In such cases data should be
converted into class intervals of a bigger magnitude.
(v) In many cases it may be impossible to set a definite value of
mode. There may be 2, 3 or more modal values.
Comparison of mode with mean and median
From the above discussion about the merits and drawbacks of
mean, median and mode it is obvious that mode does not stand in
comparison either to mean or median. Mode no doubt possesses the
merit of being the most popular item of a series and has also the
advantage of easy calculation and common understandability, yet its
drawbacks are too many to be set off against these merits. Mean is
simple in calculation, its value is definite and can be easily determined.
It is amenable to algebraic treatment and is usually not affected much
by fluctuations of sampling. Median is more easily calculated than
even mean, and in certain cases it is as stable as mean, but if variations
in the values of items are not uniform, median is indeterminate, and is
almost incapable of algebraic treatment. Mode is hardly suitable for
most of the elementary studies as it is correctly determined only by
curve-fitting which is an extremely difficult process. It is unrepresentative
in many cases, and is not based on all the observations of a series.
Thus, of these three averages, mean has definite advantages over median
and mode, though there may be some cases where median or mode
may have preference over mean. Mode has its own importance and
it may be the reason for giving its value along with mean but it should
be clearly understood that mode cannot replace mean and for that
matter neither can median do so. However, it should not be taken
to mean that median and mode are superficial averages and have no
independent virtues. There are certain fields in which median or
mode may give better results than the mean, but such cases are few
and the universality of mean cannot be challenged on account of these
cases. We shall discuss more about this point in a later section after
we have examined the other averages also.
GEOMETRIC MEAN
Geometric mean is the nth root of the product of n items of a series.
Thus if the geometric mean of 3, 6 and 12 is to be calculated it would
be equal to the cube root of the product of these figures. Similarly
the geometric mean of 8, 9, 12 and 16 would be the 4th root of the
product of these four figures.
Symbolically
$$g = \sqrt[n]{m_1 \times m_2 \times m_3 \times \dots \times m_n}$$
where g stands for the geometric mean, n for the number of items and
m for the values of the variable.
The calculation of the geometric mean by this process is possible
only if the number of items is very few. If the number of items is
large and their size is big, this method is more or less out of question.
In such cases calculations have to be done with the help of logs. In
terms of logs,
$$\log g = \frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}$$
or
$$g = \text{Anti-log}\left\{\frac{\log m_1 + \log m_2 + \log m_3 + \dots + \log m_n}{n}\right\}$$
or
$$g = \text{Anti-log}\left\{\frac{\sum \log m}{n}\right\}$$


Thus geometric mean is the anti-log of the arithmetic average of the
logs. of the values of a variable. It is also possible to assume a log.
mean and to find out the deviations from it and then calculate the
geometric mean. It should be noted that the value of the geometric
mean is always less than the value of the arithmetic average unless
all the items have equal value in which case the geometric mean
and arithmetic average have identical values.
The following examples would illustrate the calculation of geometric
mean:
Example 17. Calculate the simple geometric mean from the
following items:
133, 141, 125, 173, 182.
Solution. Calculation of the geometric mean
Size of item    Logarithms
133             2.1239
141             2.1492
125             2.0969
173             2.2380
182             2.2601
n = 5           Σ logs = 10.8681
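According to the formula, viz.,
$$\text{Geometric Mean} = \sqrt[n]{m_1 \times m_2 \times m_3 \times \dots \times m_n} = \text{Anti-log}\left(\frac{\log m_1 + \log m_2 + \dots + \log m_n}{n}\right)$$
$$= \text{Anti-log}\ \frac{10.8681}{5} = \text{Anti-log}\ 2.1736 = 149 \text{ (to the nearest whole number)}$$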
Thus the geometric mean is 149.
Alternate Method
Size of item    Logs.     Deviations from assumed
(m)                       log. mean (2.0000)
                          (dx)
133             2.1239    .1239
141             2.1492    .1492
125             2.0969    .0969
173             2.2380    .2380
182             2.2601    .2601
n = 5                     Σdx = .8681

$$\text{Geometric Mean} = \text{Anti-log}\left[\text{assumed log.} + \frac{\sum \text{Deviations}}{n}\right]$$
$$= \text{Anti-log}\left[2 + \frac{.8681}{5}\right] = \text{Anti-log}\ 2.1736 = 149 \text{ (to the nearest whole number)}$$
Thus the geometric mean is 149.
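Both methods of Example 17 can be checked with a few lines of code. The sketch below is in Python and uses the standard `math.log10` function in place of a table of logarithms; the function names `geometric_mean` and `geometric_mean_assumed` are hypothetical names introduced here for illustration.

```python
import math


def geometric_mean(values):
    """Anti-log of the arithmetic average of the logs (base 10)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


def geometric_mean_assumed(values, assumed_log=2.0):
    """Same result via deviations from an assumed log mean."""
    deviations = [math.log10(v) - assumed_log for v in values]
    return 10 ** (assumed_log + sum(deviations) / len(values))


items = [133, 141, 125, 173, 182]   # data of Example 17

print(round(geometric_mean(items)))
print(round(geometric_mean_assumed(items)))
```

The two functions agree because adding the assumed log mean back after averaging the deviations leaves the average of the logs unchanged.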
Example 18.
Calculate the geometric mean of the following two series:
(a)       (b)
2574      .8974
475       .0570
75        .0081
5         .5677
.8        .0002
.08       .0984
.005      .0854
.0009     .5672
Solution. Calculation of geometric mean
(A bar over the characteristic of a logarithm denotes a negative characteristic.)

       Series A                        Series B
       Size of items   Logarithms      Size of items   Logarithms
       (m)                             (m)
(a)    2574            3.4106          .8974           1̄.9530
(b)    475             2.6767          .0570           2̄.7559
(c)    75              1.8751          .0081           3̄.9085
(d)    5               0.6990          .5677           1̄.7541
(e)    .8              1̄.9031          .0002           4̄.3010
(f)    .08             2̄.9031          .0984           2̄.9932
(g)    .005            3̄.6990          .0854           2̄.9317
(h)    .0009           4̄.9542          .5672           1̄.7538
                       Σ log = 2.1208                  Σ log = 1̄6̄.3512

$$\text{Geometric Mean} = \text{Anti-log}\left[\frac{\sum \log m}{n}\right]$$
Series A. According to the formula, we have
$$g = \text{Anti-log}\ \frac{2.1208}{8} = \text{Anti-log}\ 0.2651 = 1.841$$
Series B.
$$g = \text{Anti-log}\ \frac{\overline{16} + 6.3512}{8} = \text{Anti-log}\ \bar{2}.7938 = .06220$$
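The bar notation used for the logarithms of values below 1 can be related to ordinary negative logarithms. The following Python sketch is offered only as an illustration; the helper name `characteristic_and_mantissa` is hypothetical, and the logarithms come from `math.log10`, so the last decimal places may differ slightly from four-figure tables.

```python
import math


def characteristic_and_mantissa(x):
    """Split log10(x) into an integer characteristic (negative for x < 1,
    printed with a bar in log tables) and a mantissa in [0, 1)."""
    lg = math.log10(x)
    characteristic = math.floor(lg)
    return characteristic, lg - characteristic


def geometric_mean(values):
    """Anti-log of the arithmetic average of the logs (base 10)."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


series_a = [2574, 475, 75, 5, 0.8, 0.08, 0.005, 0.0009]    # Example 18 (a)
series_b = [0.8974, 0.0570, 0.0081, 0.5677,
            0.0002, 0.0984, 0.0854, 0.5672]                 # Example 18 (b)

print(characteristic_and_mantissa(0.8))   # characteristic -1, mantissa near .9031
print(geometric_mean(series_a))
print(geometric_mean(series_b))
```

Working directly with signed logarithms avoids the separate handling of barred characteristics, which is the main source of slips in hand calculation.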
Algebraic properties of geometric mean
Geometric mean possesses certain mathematical properties and they
are as follows:
(i) Just as in case of arithmetic average the sum of the items
remains unchanged if each item is replaced by the arithmetic average,
similarly in case of geometric mean the product of the items remains
unchanged if each item is replaced by the geometric mean. Thus the
total of 2, 4 and 8 is 14 and the arithmetic average is 14/3. If in place
of these figures we substitute the arithmetic average the total would
still remain 14. Similarly in case of geometric mean the product of
these three figures 2, 4 and 8 is 64 and the geometric mean is 4. If in
place of these numbers the geometric mean is written the product would
still remain 64.
(ii) On account of the above property of the geometric mean, it
is possible to calculate the combined geometric mean of two or more
series if only their geometric means and the number of items are known.
The formula for finding out the combined geometric mean is:
$$g_{12} = \text{anti-log}\left[\frac{n_1 \log g_1 + n_2 \log g_2}{n_1 + n_2}\right]$$
where $g_{12}$ stands for the combined geometric mean, $n_1$ and $n_2$ for
the number of items in the two series respectively and $g_1$ and $g_2$ for
the geometric means of these two series.
Suppose there are two series A and B with the following values:
A      B
133    125
141    173
       182
and we have to find out their combined geometric mean. The log
of the geometric mean of series A is 2.13655 and of series B it is 2.19833.
If these logs are multiplied by the respective number of items of the
two series, namely 2 and 3, their values would become 4.2731 and
6.5950 respectively. The combined geometric mean would be:
$$\text{anti-log}\left[\frac{4.2731 + 6.5950}{2 + 3}\right] = \text{anti-log}\left[\frac{10.8681}{5}\right] = \text{anti-log}\ 2.1736 = 149$$
If we calculate geometric mean of the five items together we shall
get this very figure. It can be verified from the answer of Example No.
17 in which the geometric mean of these five items has been calculated.
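The combined geometric mean of series A and B can be checked against the geometric mean of all five items taken together. The following Python sketch assumes the hypothetical function names `geometric_mean` and `combined_geometric_mean`; only the data are taken from the text.

```python
import math


def geometric_mean(values):
    """Anti-log of the arithmetic average of the logs (base 10)."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


def combined_geometric_mean(means, counts):
    """Combine group geometric means, weighting each log by its group size."""
    weighted_logs = sum(n * math.log10(g) for g, n in zip(means, counts))
    return 10 ** (weighted_logs / sum(counts))


series_a = [133, 141]
series_b = [125, 173, 182]

g1 = geometric_mean(series_a)
g2 = geometric_mean(series_b)
combined = combined_geometric_mean([g1, g2], [len(series_a), len(series_b)])

print(round(combined), round(geometric_mean(series_a + series_b)))
```

Only the two geometric means and the two counts enter the combined figure; the individual items are not needed once these are known.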
(iii) Just as in the case of arithmetic average the sum of the deviations
from the mean on either side is always equal, similarly in case of geometric
mean the product of the corresponding ratios on either side
is always equal. If the ratios of the geometric mean to the figures
which are equal to or less than it are multiplied together, this product
would be equal to the product of the ratios of the figures more than the
geometric mean to the geometric mean.
Thus the geometric mean of 3, 6, 8 and 9 is equal to 6. The
product of the ratios of the geometric mean to the items equal to it or less
than it would be equal to the product of the ratios of the items more than it
to the geometric mean.
Thus
$$\frac{g}{3} \times \frac{g}{6} = \frac{8}{g} \times \frac{9}{g}$$
or
$$\frac{6}{3} \times \frac{6}{6} = \frac{8}{6} \times \frac{9}{6}$$
This property of the geometric mean is very important. It
indicates that geometric mean measures relative changes. If the price
of a commodity has gone up from 100 to 1000 and of another commodity
has fallen from 100 to 10 there is no relative change in the price
level. The rise in the price of one commodity has been set off against
the fall in the price of the other. In such cases arithmetic average
would give an erroneous conclusion. The arithmetic average of the
original prices of the two commodities would be (100 + 100)/2 or 100 and
the arithmetic average after the changes in prices would be (1000 + 10)/2
or 505, indicating that the prices have gone up. The geometric mean
of the original prices would be 100 and the geometric mean of the
new prices would be √(1000 × 10) or 100. It indicates that relatively
there has been no change in the price level as the rise in the price of
the first commodity has been counter-balanced by the fall in the price
of the other.
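The price illustration and the ratio property may be verified as follows. This is a minimal Python sketch which assumes Python 3.8 or later, where the standard `statistics` module provides `geometric_mean`; the variable names are chosen here for illustration.

```python
from statistics import geometric_mean, mean

old_prices = [100, 100]
new_prices = [1000, 10]

# Arithmetic averages suggest a rise; geometric means show no relative change.
print(mean(old_prices), mean(new_prices))
print(geometric_mean(old_prices), geometric_mean(new_prices))

# Property (iii) for the figures 3, 6, 8 and 9, whose geometric mean is 6.
g = geometric_mean([3, 6, 8, 9])
below = (g / 3) * (g / 6)    # ratios of g to items equal to or less than g
above = (8 / g) * (9 / g)    # ratios of items more than g to g
print(below, above)
```

The tenfold rise and the tenfold fall cancel in the product of the prices, which is why the geometric mean is unchanged while the arithmetic average is pulled up by the single large figure.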
(iv) The geometric mean of the ratios of corresponding observations
in two series is equal to the ratio of their geometric means. Thus
if there are two series as follows:
                   A      B      A/B
                   3      2      1.500
                   6      4      1.500
                   8      4      2.000
                   9      8      1.125
Geometric mean     6      4      1.500
In the above example the geometric means of the two series A
and B are respectively 6 and 4 and their ratio is 1.5 : 1. The geometric
mean of the ratios A/B as calculated in the third column is also
the same figure, i.e. 1.5. Thus the geometric mean of the ratios of
the corresponding values of two series can be directly calculated by
finding out the ratio of their geometric means.
(v) The geometric mean of the products of corresponding items
in two series is equal to the product of their geometric means. Thus
if in the above example we multiply the corresponding items of A and
B series the products would be respectively 6, 24, 32 and 72 and their
geometric mean is equal to 24. The product of the geometric means of
these two series, A and B, is also (6 × 4) or 24.
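Properties (iv) and (v) can be confirmed on the two series given above. The sketch below is in Python and again assumes the `geometric_mean` function of the standard `statistics` module (Python 3.8 or later).

```python
from statistics import geometric_mean

a = [3, 6, 8, 9]
b = [2, 4, 4, 8]

ratios = [x / y for x, y in zip(a, b)]      # 1.5, 1.5, 2.0, 1.125
products = [x * y for x, y in zip(a, b)]    # 6, 24, 32, 72

# Property (iv): G.M. of ratios equals the ratio of the G.M.s
print(geometric_mean(ratios), geometric_mean(a) / geometric_mean(b))

# Property (v): G.M. of products equals the product of the G.M.s
print(geometric_mean(products), geometric_mean(a) * geometric_mean(b))
```

Both properties follow from the fact that the logarithm of a ratio or a product is the difference or the sum of the logarithms.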
(vi) Another mathematical property of the geometric mean is
useful in calculating the average rate of increase of any sum at compound
interest or in calculating the average rate of increase of a population.
In fact in all cases where changes in quantity are directly proportionate
to the quantity itself, or where we are dealing with averages
of ratios as in the case of index numbers of prices, the use of geometric mean
is almost inevitable.
Thus if $P_0$ represents the principal at the beginning of a period,
$P_n$ the principal at the end of the period, r the rate of interest and n the
number of years,
$$P_n = P_0 (1 + r)^n$$
and
$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
Thus if Rs. 1,000 at compound interest become Rs. 1,500 at the end
of 10 years there has been an increase of 50% and the simple rate of
interest is 5%. The compound rate would be
$$r = \sqrt[10]{\frac{1500}{1000}} - 1 = \sqrt[10]{1.5} - 1 = 1.041 - 1 = .041 \text{ or } 4.1\%$$
Whenever we have to find out the average of the rates of increase
or decrease, such problems arise. If we calculate the mean of the
rates of increase or decrease the study would be inaccurate as the mean
measures absolute changes, but if the geometric mean of the rates of increase
or decrease is calculated the results would be accurate, as geometric
mean measures relative changes.
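The compound rate in the illustration above can be computed directly. In the Python sketch below the function name `compound_rate` is hypothetical; the figures are those of the text.

```python
def compound_rate(p0, pn, years):
    """Rate r such that pn = p0 * (1 + r) ** years."""
    return (pn / p0) ** (1 / years) - 1


r = compound_rate(1000, 1500, 10)

print(round(r * 100, 1))        # compound rate, per cent per annum
print(1000 * (1 + r) ** 10)     # growing at r for 10 years recovers about 1500
```

The compound rate is lower than the simple rate of 5% because each year's interest itself earns interest in the following years.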
Merits of geometric mean
Besides the above-mentioned mathematical properties the geometric
mean has many other merits. We shall now examine the worth of
this average by finding out how many conditions of an ideal average
(laid down earlier) it satisfies.
(i) The geometric mean is rigidly defined and its value is a precise
figure.
(ii) It is based on all the observations of a series. Like arithmetic
average it cannot be calculated if even a single value of a series
is missing.
(iii) It is capable of further algebraic treatment. As we have
seen above, various types of mathematical relationships can be established
between data when a relative study is being made with the help of
geometric mean.
(iv) Geometric mean is not much affected by the fluctuations of
sampling. It gives comparatively more weight to smaller items. In
this respect it is better than the arithmetic average and a single big figure
does not push its value very much.
Thus out of five conditions laid down for an ideal average geometric
mean satisfies four.
Drawbacks of geometric mean


(i) Geometric mean is neither easy to calculate nor is it simple
to understand. This is a major drawback of this average.
(ii) If any value in a series is zero the geometric mean would also
be zero. In such cases geometric mean cannot be calculated. Similarly
if a value is negative geometric mean may become an imaginary
figure.
(iii) Like arithmetic average it may be a value which does not
exist in the series.
(iv) The property of giving more weight to smaller items may in
some cases prove to be a drawback of the geometric mean. In some
cases smaller items have to be given smaller weights and bigger items
bigger weights. In such cases geometric mean is not an ideal average.
The above discussion clearly indicates the scope and limitations
of the geometric mean. We shall discuss more about the properties of
the geometric mean in the chapter on Index Numbers.
HARMONIC MEAN

Harmonic mean of a series is the reciprocal of the arithmetic average of
the reciprocals of the values of its various items. The harmonic mean of
2, 4 and 8 would be equal to the reciprocal of
$$\frac{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}}{3}$$
Symbolically
$$h = \text{Reciprocal of } \frac{\frac{1}{m_1} + \frac{1}{m_2} + \frac{1}{m_3} + \dots + \frac{1}{m_n}}{n}$$
where h stands for the harmonic mean, $m_1$, $m_2$, etc., for the values
of the variable and n for the number of items. The following examples
would illustrate the formula:
Example 19. The annual incomes of fifteen families are given
below in rupees:
80, 2500, 90, 1200, 1450, 7200, 120, 1060, 150, 480, 360, 96, 200,
520, 60.
Calculate the Harmonic Mean.
Solution. Computation of the Harmonic Mean of the annual incomes of
fifteen families
Size of items    Reciprocals
Rs. (m)
80               .01250
2,500            .00040
90               .01111
1,200            .00083
1,450            .00069
7,200            .00014
120              .00833
1,060            .00094
150              .00667
480              .00208
360              .00278
96               .01042
200              .00500
520              .00192
60               .01667
                 Σ = .08048
$$\text{Harmonic mean} = \text{Reciprocal of } \frac{\frac{1}{m_1} + \frac{1}{m_2} + \frac{1}{m_3} + \dots + \frac{1}{m_n}}{n}$$
where $m_1, m_2, \dots, m_n$ represent the values of the n items of the variable,
and n is the number of items.
$$= \text{Reciprocal of } \frac{\sum \text{Reciprocals}}{n}$$
Substituting the values,
$$h = \text{Reciprocal of } \frac{.08048}{15} = \text{Reciprocal of } .00536 = 186.5 \text{ rupees.}$$
Example 20. Calculate the harmonic mean of the following
items:
1.0, 1.5, 5.0, 15.0, 250.0, .5, .05, .095, 1245.0, .009.
Solution. Calculation of the harmonic mean.
Size of the items    Reciprocals
1.0                  1.0000
1.5                  .6667
5.0                  .2000
15.0                 .0666
250.0                .0040
.5                   2.0000
.05                  20.0000
.095                 10.5300
1245.0               .0008
.009                 111.0000
                     Σ = 145.4681
$$\text{Harmonic mean} = \text{Reciprocal of } \frac{\sum (\text{Reciprocals of the items})}{n} = \text{Reciprocal of } \frac{145.4681}{10} = \text{Reciprocal of } 14.54681 = .06874$$
The above examples clearly show that the harmonic mean gives
a very great importance to small items of a series. In Example 19 above
if arithmetic average is calculated it would be 1038 whereas the harmonic
mean is only 186.5. In fact harmonic mean gives a value which
is smaller than not only the arithmetic average but also the geometric
mean.
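The comparison of the three averages for the incomes of Example 19 can be sketched in Python as below. The function name `harmonic_mean` is defined here for illustration, and `geometric_mean` is assumed to be available from the standard `statistics` module (Python 3.8 or later). Computed without rounding the individual reciprocals, the harmonic mean comes to about 186.4; the small difference from the figure worked out by hand arises from rounding.

```python
from statistics import geometric_mean, mean


def harmonic_mean(values):
    """Reciprocal of the arithmetic average of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)


incomes = [80, 2500, 90, 1200, 1450, 7200, 120, 1060,
           150, 480, 360, 96, 200, 520, 60]       # Example 19

am = mean(incomes)
gm = geometric_mean(incomes)
hm = harmonic_mean(incomes)

print(round(am), round(gm, 1), round(hm, 1))
```

The few very small incomes dominate the sum of reciprocals, which is why the harmonic mean lies so far below the arithmetic average.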
Reciprocal character of arithmetic average and harmonic mean
The price of a commodity can be quoted in two ways, either in
terms of money or in terms of quantities. Thus either we can say that
the price of mangoes is Rs. 1.50 per dozen or we can say that the price
is eight mangoes per rupee. Suppose mangoes are selling at the following
rates at three shops: 4 for a rupee, 5 for a rupee and 10 for a rupee.
We have to calculate the average price of the mangoes. The arithmetic
average of the figures given above (4, 5 and 10) is 19/3. This is the
average number of mangoes sold per rupee. Therefore, the price of a
mango would be 3/19 rupee or 15.8 paisa. If these quotations are in
terms of prices and not quantities they would be 25 paisa per mango at the
first shop, 20 paisa per mango at the second shop and 10 paisa per mango
at the third shop. The average of these prices (25, 20 and 10) is
18.3 paisa per mango. Thus there is a discrepancy between the averages
calculated above. It is due to the fact that we have calculated the
arithmetic average of "quantity prices" (so many mangoes per rupee).
If we calculate the harmonic mean of these quantity prices, it would be
equal to the reciprocal of (1/4 + 1/5 + 1/10)/3 or the reciprocal of 11/60,
or it would be 60/11 mangoes per rupee. The price of one mango then would
be 11/60 rupee or 18.3 paisa. Thus we find that if we calculate the harmonic
mean of the quantity prices and the arithmetic mean of the money prices
there would be no discrepancy and the price per unit would be the same
in both the cases. Harmonic mean gives accurate results in such cases.
For one rupee we get 60/11 mangoes; therefore the price of a mango is
11/60 rupee. The two are reciprocals of each other.
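The reciprocal relation between the two averages can be checked exactly with fractions. The following Python sketch uses the standard `fractions.Fraction` type; the function names `arithmetic_mean` and `harmonic_mean` are defined here only for illustration.

```python
from fractions import Fraction

quantity_prices = [Fraction(4), Fraction(5), Fraction(10)]   # mangoes per rupee
money_prices = [1 / q for q in quantity_prices]              # rupees per mango


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)


print(arithmetic_mean(quantity_prices))   # 19/3 mangoes per rupee
print(harmonic_mean(quantity_prices))     # 60/11 mangoes per rupee
print(arithmetic_mean(money_prices))      # 11/60 rupee per mango
```

The harmonic mean of the quantity prices is exactly the reciprocal of the arithmetic mean of the money prices, whereas the arithmetic average of the quantity prices is not.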
Merits of harmonic mean
(i) Harmonic mean satisfies the test of rigid definition. Its
definition is precise and its value is always definite.
(ii) Like arithmetic average and geometric mean this average is
also based on all the observations of the series. It cannot be calculated
in the absence of even a single figure.
(iii) Harmonic mean is capable of further algebraic treatment.
(iv) Like geometric mean this average is also not affected very
much by fluctuations of sampling.
(v) It gives greater importance to small items and as such a single
big item cannot push up its value.
(vi) It measures relative changes and is extremely useful in averaging
certain types of ratios and rates.
Drawbacks of harmonic mean


(1) Harmonic mean is not readily understood nor can it be cal-
culated with ease.
(2) It gives a very high weightage to small items and for analysis
of economic data it is not very useful.
(3) It is usually a value which does not exist in a series.
(4) Generally it is not a good representative of a statistical series,
unless the phenomenon is such where small items have to be given a
very high weightage.
OTHER AVERAGES

Having discussed the chief features of the five main averages we
shall briefly discuss some of the minor and less important averages.
Quadratic mean
This is also known as Root Mean Square. It is calculated by taking
the square root of the average of the squares of the numbers. It is
useful when some items have negative values and others positive values
because in such cases the mean is not very representative.
Symbolically:
$$Q_m = \sqrt{\frac{m_1^2 + m_2^2 + m_3^2 + \dots + m_n^2}{n}}$$
where $Q_m$ stands for the quadratic mean, $m_1$, $m_2$, $m_3$, etc., for the values
of the variable and n for the number of items. The following example
would illustrate the formula:
Example 21. Find out the quadratic mean of the following items:
10, 30, 40, 50 and 70.

Solution:
Calculation of quadratic mean
Size of items    Square of the size
(m)              (m²)
10               100
30               900
40               1600
50               2500
70               4900

n = 5            Σm² = 10000
$$Q_m = \sqrt{\frac{10000}{5}} = 44.72$$
The arithmetic average of the series would have been 40. Quadratic
mean is seldom used as an average except in case of finding out
the average of the positive and the negative deviations from a measure
of central tendency. In that case it is known as standard deviation. We
shall discuss it in the next chapter.
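The calculation of Example 21 may be sketched in Python as follows. The function name `quadratic_mean` is hypothetical, and the second set of figures (the deviations) is assumed data added only to show the case of mixed signs.

```python
import math


def quadratic_mean(values):
    """Square root of the average of the squares (root mean square)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


items = [10, 30, 40, 50, 70]            # Example 21
print(round(quadratic_mean(items), 2))

# With positive and negative items the simple mean can be zero
# while the quadratic mean still reflects their size (assumed data).
deviations = [-30, -10, 0, 10, 30]
print(sum(deviations) / len(deviations), quadratic_mean(deviations))
```

Squaring removes the signs, so positive and negative items cannot cancel each other as they do in the simple average.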
Moving average
Moving average is calculated by using the technique of simple
arithmetic average. It is useful in removing the irregularities of time
series and is usually calculated to study the long period trend. The first
thing to be decided in the calculation of moving average is the "period"
for which the average is to be calculated. The moving average may
be three-yearly, five-yearly or seven-yearly depending on the nature of
the series. We shall discuss this problem of periodicity of moving
average later in the chapter on Analysis of Time Series. For the present
we shall simply illustrate the technique of its calculation.
If a three yearly moving average is to be calculated the arithmetic
average of the first three years' figures would be found out and written
against the middle year (second year in this case). Then the first year's
figure would be dropped and the arithmetic average of second, third
and fourth years' figures would be calculated and written against the
third year. Similarly the arithmetic average of the figures of third,
fourth and fifth years would be written against the fourth year and so
on. The following example would illustrate the method of its cal-
culation.
Example 22. Calculate the three yearly moving average of the
following figures relating to the annual sales of a concern (in lakhs of
rupees).
Calculation of three yearly moving average
Year    Sales (in lakhs    3-Yearly moving    3-Yearly moving
        of rupees)         total              average
1945    8                  ...                ...
1946    9                  25                 8.3
1947    8                  24                 8.0
1948    7                  23                 7.7
1949    8                  24                 8.0
1950    9                  27                 9.0
1951    10                 30                 10.0
1952    11                 32                 10.7
1953    11                 34                 11.3
1954    12                 33                 11.0
1955    10                 ...                ...
Similarly, if a five yearly moving average has to be calculated
the first five figures (of years 1945 to 1949) would be added and their
average would be written against the third year or 1947, then the next
five figures leaving the first (of years 1946 to 1950) would be averaged
and the figure written against the middle year of 1948 and so on.
Moving average is very helpful in removing the fluctuations of
time series and giving an idea about the general trend.
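The procedure described above can be sketched in Python as below. The function name `moving_average` is hypothetical, and the sketch handles only odd periods (three-yearly, five-yearly and so on), for which the average can be written against a middle year.

```python
def moving_average(years, values, period=3):
    """Centred moving average for an odd period.
    Returns (middle year, moving total, moving average) tuples."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    rows = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        total = sum(window)
        rows.append((years[i], total, total / period))
    return rows


years = list(range(1945, 1956))
sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]    # Example 22

for year, total, average in moving_average(years, sales, 3):
    print(year, total, round(average, 1))
```

Calling `moving_average(years, sales, 5)` gives the five yearly figures, the first of which falls against 1947.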

Progressive average
It is also calculated by the help of simple arithmetic average. It
is a cumulative average and is different from the moving average. In
the calculation of this average, figures of all previous years are added
and no figure is left out as in the case of moving average. Thus the
progressive average of the second year would be equal to the arithmetic
average of the figures of the first two years; the progressive average
of the third year would be equal to the arithmetic average of the figures
of the first three years and so on.
The following illustration would clarify the procedure :
Example 23. Calculate the progressive average of the data given
in Example 22:
Calculation of progressive average
Year    Sales (in lakhs    Progressive    Progressive
        of rupees)         total          average
1945    8                  8              8.0
1946    9                  17             8.5
1947    8                  25             8.3
1948    7                  32             8.0
1949    8                  40             8.0
1950    9                  49             8.2
1951    10                 59             8.4
1952    11                 70             8.8
1953    11                 81             9.0
1954    12                 93             9.3
1955    10                 103            9.4
Progressive average is used by business houses, particularly in their early
years, with a view to comparing the current profits with those of
the past.
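The progressive totals and averages can be produced with a running sum. The Python sketch below uses `itertools.accumulate` from the standard library; the variable names are chosen here for illustration.

```python
from itertools import accumulate

sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]    # data of Example 22

totals = list(accumulate(sales))                   # progressive totals
averages = [total / count
            for count, total in enumerate(totals, start=1)]

for year, total, average in zip(range(1945, 1956), totals, averages):
    print(year, total, round(average, 1))
```

Unlike the moving average, no figure is ever dropped, so each new year changes the progressive average less and less as the number of years grows.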
Relation between different averages
When different averages have been calculated from a given set of
observations it will be found that there is a relationship between their
values. Generally these relationships are of the following type:
(i) If a series is "normal" or "symmetrical" the values of its mean,
median and mode would be identical.
(ii) If a series is moderately asymmetrical the median would be
somewhere between the mean and the mode. Usually it is at a distance of
one-third from the mean towards the mode, or
median = mean - 1/3 (mean - mode)
or
mode = mean - 3 (mean - median)
or
(median - mode) = 2/3 (mean - mode)
(iii) The arithmetic mean is greater than the geometric mean,
which, in turn, is greater than the harmonic mean; but if all the values
of a variable are equal the arithmetic mean, geometric mean and harmonic
mean would coincide.
(iv) The geometric mean of any two values is equal to the geometric
mean of their arithmetic average and harmonic mean. Thus
the arithmetic average of 4 and 16 is 10, the geometric mean is 8 and the
harmonic mean is 6.4. The geometric mean of 10 and 6.4 is also 8.
This rule holds good only when there are two items in a series. If
there are more than two items this rule would hold good only if the
values of the items increase in geometric progression (like 2, 4, 8, 16 etc.).
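Relationships (iii) and (iv) can be checked numerically. The sketch below is in Python and assumes Python 3.8 or later, where the standard `statistics` module provides `mean`, `geometric_mean` and `harmonic_mean`.

```python
from statistics import geometric_mean, harmonic_mean, mean

pair = [4, 16]
am, gm, hm = mean(pair), geometric_mean(pair), harmonic_mean(pair)
print(am, gm, hm)
print(geometric_mean([am, hm]))        # geometric mean of A.M. and H.M.

progression = [2, 4, 8, 16]            # values in geometric progression
am_p = mean(progression)
gm_p = geometric_mean(progression)
hm_p = harmonic_mean(progression)
print(am_p >= gm_p >= hm_p, gm_p ** 2, am_p * hm_p)
```

For the geometric progression the square of the geometric mean again equals the product of the arithmetic average and the harmonic mean, as stated in (iv).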
Selection of an average
The choice of an average is an important and difficult problem
which a statistician has to face. It is to be very cautiously made, as
if a wrong average has been chosen, inaccurate conclusions are likely
to follow. There are no hard and fast rules for the selection of a
particular average in different fields of statistical investigation. Selection
of a particular average should be done after giving consideration to the
nature and type of enquiry, as also to the object with which the investi-
gation has been conducted. No one average can be said to be good
for all types of enquiries and under all conditions.
In the selection of an average consideration must also be given
to the chief characteristics and limitations of various averages. Most of
the averages suffer from one limitation or the other and they have their
own merits and drawbacks as well. We have seen that the arithmetic
average is, generally speaking, better than other averages as it has many
properties which other averages do not have, but even arithmetic
average cannot be recommended for universal use. If, for example,
a very large number of items in a series have small values and only one
or two items have very big values, arithmetic average would give
fallacious conclusions. In such cases median, mode or geometric mean
would give much better results than the arithmetic average. However,
if the purpose of the investigation is to find out things like "average
output", "average imports or exports", "average cost of production"
or "average price" the arithmetic average would be an ideal one. In
economic and social studies it gives better results than other averages.
If the purpose of the enquiry is to study such phenomena which are
incapable of direct quantitative measurement, like intelligence or honesty,
etc., median has a distinct advantage over all other averages. If, however,
there are wide variations in any series median is the most unsuitable
average. Similarly, if the enquiry under question relates to, say,
"average size of ready-made clothes" or "size of typical farms," the
average to be used is mode. The use of mode is increasing every day
in business and commerce. Modal output per machine or modal time
needed to produce a commodity are very important concepts in the
business world of today. But mode is very often indeterminate and
unrepresentative and is entirely unsuitable for many enquiries. It is
not capable of further algebraic treatment and has limited use. If an
enquiry is being conducted to study the relative changes in the price
level at two periods, neither arithmetic average nor median or mode
would give satisfactory results. In such cases the best average is the
geometric mean. In the construction of index numbers the use of
geometric mean is almost universal. But geometric mean is entirely
useless if bigger items have to be given more weight or if a study of
absolute, rather than relative changes, is undertaken. Harmonic mean
similarly is the best average if small items have to be given more weight
or if we have to find out the average of certain types of rates, etc. If,
for example, we have to calculate the average speed of a person who
walks at four miles per hour for the first mile and at three miles an hour
for the second mile, arithmetic average would give inaccurate results.
Harmonic mean of these figures, which would be 3 3/7, is the correct
average. This person takes fifteen minutes to cover the first mile and
twenty to cover the second mile, or in thirty-five minutes he covers
two miles. The speed is 3 3/7 miles per hour.
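The walking-speed illustration can be worked exactly with fractions. The following Python sketch uses the standard `fractions.Fraction` type; the variable names are chosen here for illustration.

```python
from fractions import Fraction

speeds = [Fraction(4), Fraction(3)]         # miles per hour, one mile at each speed

hours = [1 / s for s in speeds]             # time taken for each mile
average_speed = len(speeds) / sum(hours)    # total distance / total time

print(sum(hours) * 60)                      # total time in minutes
print(average_speed)                        # harmonic mean of the two speeds
print(sum(speeds) / len(speeds))            # arithmetic average, which overstates it
```

Because equal distances are covered at each speed, more time is spent at the slower speed, and the harmonic mean gives that speed its due weight.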
The above discussion clearly shows that each type of average has
its own field of importance and usefulness. Before selecting an average
all these considerations should be kept in mind. In actual practice
two or three averages of a series may be necessary for a proper understanding
of its special features. A discriminate use of averages is
essential for sound statistical analysis. But all said and done, it has to be
admitted that arithmetic average would be found to be the ideal average
for a larger number of enquiries than any other average.
Limitations of averages
Even when an average has been selected very judiciously and is
ideal for a particular investigation, it should never be forgotten that
even the best average has its own limitations. An average is a single
figure representing a series, and no single figure can condense in itself
all the properties of the items which it represents. This is the reason
why conclusions which are drawn on the basis of a study of averages
are not always infallible. The average height of women may be less
than the average height of men but it does not mean that no woman
can be taller than a man. The well-known example of the mathematician
who calculated the average depth of a stream and, finding it lower
than the average height of his family members, attempted to cross it and
drowned with his family in the process, is an illustration on this point.
The average depth of the river may have been lower than the height
of the shortest member of the mathematician's family, but at some
point the depth of the stream must have been more than the height of
the tallest member in the group.
An average is a single figure and can be expected to represent a series
only as best as a single figure can. Averages do not throw light on the
formation of a series or distribution of frequencies round the various
values of a variable. It is for this reason that measures of dispersion
and skewness are calculated. Averages do not reveal the whole story of
a series. A student getting 30, 40 and 50 marks respectively in three
examinations would have the same average as another who gets 50, 40
and 30 marks respectively. The progress of the two students is in
different directions but on the basis of the averages they will be ranked
together.
In fact if wrong conclusions are drawn by the use of judiciously
selected averages, it is not the fault of the averages. The fault lies
with the person drawing the conclusions. The inherent limitations of
averages should always be kept in mind and they should not be expected
to reveal more than what they can.
WEIGHTED AVERAGE
Need and meaning. In the calculation of a simple average each
item of the series is considered equally important, but there may be
cases where all items do not have equal importance, and some of
them are comparatively more important than others. The fundamental
purpose of finding out an average is that it shall "fairly" represent,
so far as a single figure can, the central tendency of the many
varying figures from which it has been calculated. This being so,
it is necessary that if some items of a series are more important than
others, this fact should not be overlooked altogether in the calculation
of an average. If we have to find out the average income of the
employees of a certain mill and if we simply add the figures of the
income of the manager, an accountant, a clerk, a labourer and a watchman
and divide the total by five, the average so obtained cannot be
a fair representative of the income of these people. The reason is that
in a mill there may be one manager, two accountants, six clerks, one
thousand labourers and one dozen watchmen, and if it is so, the relative
importance of the figures of their income is not the same. Similarly,
if we are finding out the change in the cost of living of a certain group
of people and if we merely find the simple arithmetic average of the
prices of the commodities consumed by them, the average would be
unrepresentative. All the items of consumption are not equally important.
The price of salt may increase by 500% but this will not affect
the cost of living to the extent to which it would be affected if the
price of wheat goes up only by 50%. In such cases if an average has
to maintain its representative character, it should take into account
the relative importance of the different items from which it is being
calculated. The simple average gives equal importance to all the
items of a series. In this sense a simple average is also a weighted
average, because in a simple average the relative importance of all the
items is supposed to be the same. But in actual practice the importance
of various items is not always the same and in such cases the
simple arithmetic average and the weighted arithmetic average would
differ in value. Therefore, in order that an average may be a typical or
a representative average, it is necessary that the relative importance
of items is taken into account in its calculation. Thus if item A is
considered to be five times as important as item B, the weights of these
items respectively should be 5 and 1. Weights are figures which indicate
the relative importance of various items.
Difficulties in weighting. It is easy to say that in many cases it is
better to take into account the relative importance of items and to have
a weighted average rather than a simple average, but it is very difficult
to decide the relative importance of different items. If we have to
decide the relative importance of items, the problem that arises
is the basis or criterion for determining that importance.
How weights should be assigned is a question very difficult to
answer. In fact no hard and fast rule can be laid down for the assignment
of weights, as the relative importance of items depends on the
nature and purpose of the investigation. In some cases the weights
are determined without much difficulty, and such cases are those where
weights are determined on the basis of some evidence associated with
the given data. If we have to decide the weights of the income figures of
a manager, an accountant, a clerk, a labourer and a watchman, the
simplest method would be to give them weights in accordance with
their number. Thus if there is one manager, two accountants, six
clerks, one thousand labourers and twelve watchmen, the weights
would also be these very figures respectively. To calculate the average
income of these people, if instead of finding out the simple arithmetic
average of the figures of their incomes we multiply their incomes by
their numbers (weights), and if the total of these products is divided
by the total of weights, we shall get the weighted arithmetic average
of the series. This average would be a better representative of the
series than the simple arithmetic average. Many writers like Secrist
and Kelly are of opinion, and rightly too, that this is not a weighted
average. When values are multiplied by their frequencies and the
sum of their products is divided by the total of their frequencies, it
is in fact a simple arithmetic average of the series. In cases of discrete
and continuous series we have already seen that arithmetic average
is calculated by multiplying the values by their respective frequencies.
Such writers are of the opinion that weights should be determined
by some such evidence which is not associated with the items themselves.
But it is neither easy nor safe to assign weights to various
items arbitrarily, as in such cases the weighted average may give misleading
conclusions. Weights have to be judiciously selected.
In fact the difficulties in the selection of proper weights are so many
that many writers are of opinion that it is better to have a simple average
than to have a weighted average of doubtful fairness. Thus Bowley says:
"The discussion of the proper weights to be used... has occupied a
space in statistical literature out of all proportion to its significance,
for it may be said that no great importance need be attached to the
special choice of weights;" a little further he observes: "So we arrive
at a very important precept; in calculating averages give all care to
making the items free from bias and do not strain after exactness in
weighting."
But this is hardly a full statement of facts. Weights are used to
make items free from bias. Weighting is essential because items usually
constitute a heterogeneous group. Even Bowley admits that "paucity
of data may make the use of weights necessary or an attempt at fairness
of measurement may make weighting expedient". So we arrive at a
conclusion that even though a selection of proper weights is an extremely
difficult task, weighting of items is essential to make them free from
bias.
We shall now see how various averages can be weighted and
what special characteristics such averages possess.
WEIGHTED ARITHMETIC AVERAGE
In calculating the weighted arithmetic average each value of the
variable is multiplied by its weight and the products so obtained are
aggregated. This total is divided by the total of weights and the
resulting figure is the weighted arithmetic average.
Symbolically
$$a' = \frac{m_1 w_1 + m_2 w_2 + m_3 w_3 + \cdots + m_n w_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
where $a'$ stands for the weighted arithmetic average, $m_1$, $m_2$, etc.,
for the values of the variable and $w_1$, $w_2$, etc., for their respective
weights.
The formula can be written in short as:
$$a' = \frac{\sum mw}{\sum w}$$
where $\sum mw$ stands for the sum of the products of the values and their
respective weights, and $\sum w$ for the sum of the weights. The following
illustration would clarify the formula:
Calculation of the weighted arithmetic average: direct method
Example 24. Calculate the weighted arithmetic average of the
price of tea from the following data, assuming the quantities sold as
weights:
Price per pound     Quantities sold
(Rs.)               (pounds)
2.25                14
2.50                11
2.75                 9
3.00                 6
Solution. Calculation of the weighted arithmetic average of the price
of tea:
Price per pound     Quantities sold     Price × quantity
in paisa            (in pounds)
(m)                 (w)                 (mw)
225                 14                   3,150
250                 11                   2,750
275                  9                   2,475
300                  6                   1,800

Total 1,050         40                  10,175

Weighted arithmetic average, or
$$a' = \frac{\sum mw}{\sum w} = \frac{10175}{40}$$
= 254.375 paisa per pound
= 2.54375 rupees per pound.
The simple arithmetic average of the prices would have been
$\frac{1050}{4}$
or 262.5 paisa (2.625 rupees) per pound.
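
The arithmetic of Example 24 can be checked with a short sketch in Python. The language and the helper name `weighted_mean` are our own choices for illustration and are not part of the text; the data are those of the example, with prices expressed in paisa.

```python
def weighted_mean(values, weights):
    """Return sum(m * w) / sum(w), the weighted arithmetic average."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(m * w for m, w in zip(values, weights)) / total_weight


# Data of Example 24: prices in paisa per pound, quantities sold in pounds.
prices = [225, 250, 275, 300]
quantities = [14, 11, 9, 6]

weighted = weighted_mean(prices, quantities)  # 10,175 / 40
simple = sum(prices) / len(prices)            # 1,050 / 4

print(weighted)  # 254.375
print(simple)    # 262.5
```

The weighted figure is lower than the simple one because the larger quantities were sold at the lower prices.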
Actual and estimated weights. In the above example the weights used
were actual. Many times actual weights are not available and estimated
weights have to be used, or even if actual weights are available estimated
weights are used for the sake of simplicity in calculation. It will be
observed that if there is not much difference between the actual and
estimated weights, the results obtained by using the estimated weights
would not materially differ from the results obtained by using the actual
weights. In the above example, if actual weights were not used and
if the estimated weights were respectively 15, 10, 10 and 5, the products
of the prices and the weights would have been respectively 3,375, 2,500,
2,750 and 1,500 and the total would have been 10,125. The total of the
weights is still 40 and the weighted arithmetic average would be
10,125/40 or 253.125 paisa or 2.531 rupees per pound. The difference
between the two averages is not much.
If the actual weights are used and if the values of the items are
slightly changed, the average would be affected to a greater extent than
where values are correct and the weights estimated. Thus, if in the
above example the values are taken as 200, 275, 300 and 325 respectively
and the original weights are used, the products of the values and
weights would be respectively 2,800, 3,025, 2,700 and 1,950. Their
total would be 10,475 and the weighted arithmetic average would be
10,475/40 or 261.875 paisa or 2.619 rupees per pound. This error is
much more than the one in the previous case. Thus it should always
be remembered that an error in weight is less serious than a corresponding
error in the size of items. The reason for it is that the errors in weights
are usually unbiased and compensate each other, while errors in the
values of items are generally biased ones. It is for this reason that
we had concluded above that an attempt should be made to make the
items free from bias and we should not strain after exactness in
weights. According to King, "The items should be as exact as possible
and the weights used should be approximately accurate...".
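
The comparison made above between an error in the weights and an error in the values can be reproduced with the following sketch. It is only an illustration on the tea data of Example 24; the variable names are our own, and a single example of this kind does not by itself establish the general rule.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic average: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


prices = [225, 250, 275, 300]           # correct values, in paisa
quantities = [14, 11, 9, 6]             # actual weights
estimated_quantities = [15, 10, 10, 5]  # estimated weights from the text
altered_prices = [200, 275, 300, 325]   # altered values from the text

true_average = weighted_mean(prices, quantities)
with_estimated_weights = weighted_mean(prices, estimated_quantities)
with_altered_values = weighted_mean(altered_prices, quantities)

# Departure of each result from the average based on correct data.
error_from_weights = abs(with_estimated_weights - true_average)
error_from_values = abs(with_altered_values - true_average)

print(with_estimated_weights, error_from_weights)  # 253.125 1.25
print(with_altered_values, error_from_values)      # 261.875 7.5
```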
Short-cut method of calculating weighted arithmetic average
The method discussed above for the calculation of the weighted
arithmetic average is sometimes found to be very tedious, particularly
when the size of items is big. In such cases a short-cut method can
be used. In this method, first an average is assumed and the deviations
of each item from the assumed average are multiplied by the respective
weights of the items. The sum of these products is then divided by
the total of weights and added to the assumed average. The resulting
figure is the actual weighted arithmetic average of the series.
Symbolically
$$a' = x' + \frac{\sum d'w}{\sum w}$$
where $a'$ stands for the weighted arithmetic average, $x'$ for the
assumed average, $\sum d'w$ for the sum of the products of the deviations
and the respective weights of items, and $\sum w$ for the total of the weights.
The following example would illustrate the formula:
Example 25
From the following table calculate the weighted average price of tea.
Price per lb.      Lbs. sold
Rs.   p.
1     00           200
1     35           275
1     62           400
1     75           150
2     00           100
2     25            75
2     50            50
Solution. Calculation of the weighted average price of a lb. of tea
Price in          Lbs. sold     Deviations from         Total deviations
paisa per lb.                   assumed weighted
                                average (175)
(m)               (w)           (d')                    (d'w)
100               200           -75                     -15,000
135               275           -40                     -11,000
162               400           -13                     - 5,200
175               150             0                           0
200               100           +25                     + 2,500
225                75           +50                     + 3,750
250                50           +75                     + 3,750
Total 1,247       1,250                                 -21,200

Substituting the above data in the formula
$$a' = x' + \frac{\sum d'w}{\sum w}$$
where $a'$ stands for the weighted average, $x'$ for the assumed weighted
average, $w$ for the weight and $d'$ for the deviation from the assumed weighted
average, we get
$$a' = 175 + \frac{-21200}{1250} = 175 - 16.96 = 158.04 \text{ paisa}$$
Thus the weighted average price of a lb. of tea is 158.04 paisa
or 1.58 rupees.
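
A useful check on any short-cut calculation is that it must give the same figure as the direct method, whatever average is assumed. The sketch below applies both methods to the data of Example 25; the function names are our own and the three assumed averages are arbitrary choices made for illustration.

```python
def weighted_mean_direct(values, weights):
    """Direct method: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


def weighted_mean_shortcut(values, weights, assumed):
    """Short-cut method: assumed average + sum(d' * w) / sum(w)."""
    deviations = [m - assumed for m in values]
    correction = sum(d * w for d, w in zip(deviations, weights)) / sum(weights)
    return assumed + correction


# Data of Example 25: prices in paisa per lb., lbs. sold as weights.
prices = [100, 135, 162, 175, 200, 225, 250]
pounds_sold = [200, 275, 400, 150, 100, 75, 50]

direct = weighted_mean_direct(prices, pounds_sold)
shortcut_results = [
    weighted_mean_shortcut(prices, pounds_sold, assumed)
    for assumed in (175, 150, 200)
]

# Every assumed average leads back to the same weighted average.
for result in shortcut_results:
    assert abs(result - direct) < 1e-9
print(round(direct, 2))
```

The choice of the assumed average affects only the size of the figures handled, not the result.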

When to use weighted arithmetic average
Weighted arithmetic average, as we have seen, removes the bias
of items and gives a fair measure of central tendency. Though in
many cases the simple arithmetic average and the weighted arithmetic
average give similar results, yet there are some special circumstances in
which the weighted arithmetic average must invariably be used. In
these cases a weighted arithmetic average is a much better measure than
the simple arithmetic average. These cases are as follows:

(a) When the importance of all the items in a series is not equal. We
have seen that the simple arithmetic average gives equal importance to all
the items of a series. In many cases all the items may not be of equal
importance. If it is so, a simple arithmetic average would give us
misleading conclusions. The following example would clarify the
point:

Example 26. An examination was held to decide the award of


a scholarship. The weights of various subjects were different. The
marks obtained by 3 candidates (out of 100 in each subject) are given
below:-

Subject Weight Marks A Marks B Marks C


Statistics 4 63 60 65
Mathematics 3 65 64 70
Economics 2 58 56 63
Hindi 1 70 80 52
Solution. Calculation of the weighted and simple arithmetic averages.

Subject        Weight   Marks   Marks   Marks   Weighted   Weighted   Weighted
                        A       B       C       marks A    marks B    marks C
Statistics     4        63      60      65      252        240        260
Mathematics    3        65      64      70      195        192        210
Economics      2        58      56      63      116        112        126
Hindi          1        70      80      52       70         80         52

Total          10       256     260     250     633        624        648

Simple arithmetic average of marks
$$A = \frac{256}{4} = 64; \quad B = \frac{260}{4} = 65; \quad C = \frac{250}{4} = 62.5$$
Weighted arithmetic average of marks
$$A = \frac{633}{10} = 63.3; \quad B = \frac{624}{10} = 62.4; \quad C = \frac{648}{10} = 64.8$$
Thus on the basis of the simple arithmetic average B should get the
scholarship, but according to the weighted arithmetic average the
scholarship should be given to C and not to B. Since all the subjects in
which the candidates appear for examination are not of equal importance,
the result given by the weighted arithmetic average is more accurate
and C should get the scholarship.
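
The reversal of ranking in Example 26 can be verified with the sketch below. The dictionary layout and the function names are our own illustrative choices; the marks and subject weights are those given in the example.

```python
subject_weights = {"Statistics": 4, "Mathematics": 3, "Economics": 2, "Hindi": 1}

marks = {
    "A": {"Statistics": 63, "Mathematics": 65, "Economics": 58, "Hindi": 70},
    "B": {"Statistics": 60, "Mathematics": 64, "Economics": 56, "Hindi": 80},
    "C": {"Statistics": 65, "Mathematics": 70, "Economics": 63, "Hindi": 52},
}


def simple_average(candidate_marks):
    """Every subject counts equally."""
    return sum(candidate_marks.values()) / len(candidate_marks)


def weighted_average(candidate_marks, weights):
    """Each subject counts in proportion to its weight."""
    total = sum(candidate_marks[s] * w for s, w in weights.items())
    return total / sum(weights.values())


simple = {c: simple_average(m) for c, m in marks.items()}
weighted = {c: weighted_average(m, subject_weights) for c, m in marks.items()}

best_simple = max(simple, key=simple.get)        # candidate ranked first unweighted
best_weighted = max(weighted, key=weighted.get)  # candidate ranked first weighted

print(simple)    # {'A': 64.0, 'B': 65.0, 'C': 62.5}
print(weighted)  # {'A': 63.3, 'B': 62.4, 'C': 64.8}
print(best_simple, best_weighted)  # B C
```

B gains on the simple average from high marks in Hindi, the subject of least weight, which is why the weighted average places C first.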
(b) When the classes of the same group contain widely varying frequencies.
If we have to compare, say, the salaries of teachers in two towns A and B,
and if teachers at both these places are classified in similar groups and it
is found that the numbers in these groups widely differ, the weighted
arithmetic average would give a much better idea about their salaries than the
simple arithmetic average. The following example would clarify
this point:
Example 27. Compute the weighted means of the salaries of teachers
in towns A and B. Compare them with the unweighted means.
                          Town A                    Town B
Schools                   No. of      Rate of       No. of      Rate of
                          teachers    salary        teachers    salary
                                      (Rs.)                     (Rs.)
1. Municipal school       25          30            34          40
2. Govt. school           26          50            35          60
3. Aided school           20          43            12          25
4. Non-aided school       19          35            11          20
5. Night school           10          32             8          25
Total                     100         190           100         170
Solution. Computation of the weighted and unweighted means of the
salaries of teachers in towns A and B.
                          Town A                        Town B
Description of            Salary   No. of               Salary   No. of
school                    (Rs.)    teachers             (Rs.)    teachers
                          (m)      (w)       (wm)       (m)      (w)       (wm)
1. Municipal school       30       25          750      40       34        1,360
2. Govt. school           50       26        1,300      60       35        2,100
3. Aided school           43       20          860      25       12          300
4. Non-aided school       35       19          665      20       11          220
5. Night school           32       10          320      25        8          200
Total                     190      100       3,895      170      100       4,180
Weighted mean
$$\text{Town A: } a' = \frac{\sum wm}{\sum w} = \text{Rs. } \frac{3895}{100} = \text{Rs. } 38.95$$
$$\text{Town B: } a' = \frac{\sum wm}{\sum w} = \text{Rs. } \frac{4180}{100} = \text{Rs. } 41.8$$
Thus the weighted means of the salaries of teachers in towns A
and B are Rs. 38.95 and Rs. 41.8 respectively.
Unweighted mean
$$\text{Town A: } a = \frac{\sum m}{n} = \text{Rs. } \frac{190}{5} = \text{Rs. } 38$$
$$\text{Town B: } a = \frac{\sum m}{n} = \text{Rs. } \frac{170}{5} = \text{Rs. } 34$$
Thus the unweighted means of the salaries in towns A and B are
Rs. 38 and Rs. 34 respectively.
Thus we see that on the basis of the simple arithmetic average we
would have concluded that the salaries of teachers in town A are on
an average higher than the salaries of teachers in town B. But the
weighted arithmetic average reveals an entirely opposite tendency.
According to the weighted arithmetic average the salaries of teachers in
town B are higher than the salaries of teachers in town A. The conclusion
arrived at by the use of the weighted mean is the correct conclusion.
In cases like the above, where the variation in the number of items in
similar groups is of a high degree, the weighted arithmetic average
should be invariably used, as it gives correct conclusions. In the above
example the weighted arithmetic average is the correct average
because if we multiply the weighted arithmetic average by the total
number of teachers we get the amount of the total salary. Thus, 38.95
× 100 is equal to the total salary paid to the teachers in town A and
similarly 41.8 multiplied by 100 gives us this figure for town B. If
simple arithmetic average of the salaries is multiplied by the number
of teachers it would not give the correct figure of the total salary paid.
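
The test applied above, namely that the correct average multiplied by the number of teachers must reproduce the total salary bill, can be sketched as follows for the data of Example 27. The tuple layout and the function names are assumptions made only for this illustration.

```python
# Each row is (salary in Rs., number of teachers), as in Example 27.
town_a = [(30, 25), (50, 26), (43, 20), (35, 19), (32, 10)]
town_b = [(40, 34), (60, 35), (25, 12), (20, 11), (25, 8)]


def unweighted_mean(rows):
    """Simple average of the salary rates, ignoring numbers of teachers."""
    return sum(salary for salary, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average salary with the numbers of teachers as weights."""
    return sum(s * n for s, n in rows) / sum(n for _, n in rows)


def total_salary(rows):
    """Total salary bill of the town."""
    return sum(s * n for s, n in rows)


for name, rows in (("A", town_a), ("B", town_b)):
    teachers = sum(n for _, n in rows)
    print(
        name,
        unweighted_mean(rows),             # 38.0 for A, 34.0 for B
        weighted_mean(rows),               # 38.95 for A, 41.8 for B
        total_salary(rows),                # 3895 for A, 4180 for B
        unweighted_mean(rows) * teachers,  # does not match the total bill
    )
```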
(c) Where there is a change either in the proportion of values of items
or in the proportion of their frequencies. If in Example 27 above the salaries
of the teachers are doubled in both the towns, the weighted arithmetic
averages would be Rs. 77.9 and Rs. 83.6 respectively. These
averages are in the same ratio as the original averages of Rs. 38.95
and Rs. 41.8. If the salaries remain the same but the number of teachers
in each category is doubled, the weighted arithmetic averages would
remain the same, namely, Rs. 38.95 and Rs. 41.8. In these two cases
there is a change either in the values of the items or in the frequencies,
but the proportions of neither of them are affected. If the salaries
of the teachers change in such a way that the Municipal school teachers
get 20% more and Government school teachers 15% more, then the
original proportions of the values of items are disturbed. Similarly,
if the numbers of teachers are not doubled in all categories, but in some
categories they are doubled and in others trebled, the proportions of
frequencies also change. In cases like these, where the proportions of
either values or frequencies change, the weighted arithmetic average should be
used. It should be remembered that it is not the absolute size of the weight
that matters: it is the relative size of the weights that actually affects the average.
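
The point that only the relative size of the weights matters can be illustrated on the Town A figures of Example 27. In this sketch the helper name is our own, and the pattern of doubling some categories and trebling others is an arbitrary choice made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic average: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


salaries = [30, 50, 43, 35, 32]  # Town A salary rates (Rs.)
teachers = [25, 26, 20, 19, 10]  # Town A numbers of teachers

base = weighted_mean(salaries, teachers)

# Doubling every weight leaves the proportions, and so the average, unchanged.
doubled_weights = weighted_mean(salaries, [2 * n for n in teachers])

# Doubling every value doubles the average.
doubled_values = weighted_mean([2 * s for s in salaries], teachers)

# Doubling some weights and trebling others disturbs the proportions.
factors = [2, 3, 2, 3, 2]  # hypothetical pattern of changes
uneven_weights = weighted_mean(salaries, [f * n for f, n in zip(factors, teachers)])

print(base, doubled_weights, doubled_values, round(uneven_weights, 2))
```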

(d) When ratios, percentages or rates are being averaged. Suppose the
heights of four groups of persons are measured and it is found that
5% of the persons in group A, 10% in group B, 8% in group C, and
4% in group D have heights less than 50" and it is required to find
out the percentage of people in all the groups combined together
whose heights would be less than 50". The simple arithmetic average of
these percentages would give a misleading conclusion. The reason is
that we do not know the number of persons in each group. In such
cases we should presume certain numbers in each group, and then on
that basis calculate the weighted arithmetic average, which gives the
correct results. If, suppose, the numbers of persons in these groups
were respectively 50, 70, 75 and 55, the weighted arithmetic average
can be calculated by taking these frequencies as weights of the various
percentages.
The percentage of people with heights less than 50" (in all
the groups combined together) would be:
$$\frac{(5 \times 50) + (10 \times 70) + (8 \times 75) + (4 \times 55)}{50 + 70 + 75 + 55} = \frac{250 + 700 + 600 + 220}{250} = \frac{1770}{250} \text{ or } 7.08\%$$
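
The same calculation may be sketched in Python, using the group sizes presumed in the text as weights. The variable names are our own.

```python
percent_below = [5, 10, 8, 4]   # percentage under 50 inches in groups A to D
group_sizes = [50, 70, 75, 55]  # presumed numbers of persons in each group

# Weighted average of the percentages, with group sizes as weights.
combined = sum(p * n for p, n in zip(percent_below, group_sizes)) / sum(group_sizes)

# Simple average of the percentages, which ignores the group sizes.
simple = sum(percent_below) / len(percent_below)

print(combined)  # 7.08
print(simple)    # 6.75
```

With other presumed group sizes the combined percentage would be different, which is why the sizes have to be known or reasonably presumed.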
(e) When it is desired to calculate the average of a series from the averages
of its component parts. We have already discussed in the section on
simple arithmetic average how the means of two or more component
series can be combined in one. The method involves the calculation
of the weighted arithmetic average of the different means, using the number
of items in each case as the weights. Thus, if the average of a series
is 20 and the number of items in it is 10, and the average of another series
is 25 and the number of items in it is 15, the combined average of the
two series would be equal to the weighted average of these two averages,
the weights being 10 and 15 respectively (the number of items in each
case). The weighted arithmetic average would be:
$$\frac{(20 \times 10) + (25 \times 15)}{10 + 15} \text{ or } 23$$
The simple arithmetic average of the two averages would be
$\frac{20 + 25}{2}$ or 22.5. This is an inaccurate average, as, if it is multiplied
by the total frequency (now 25), it would not give the correct aggregate.
If, however, we multiply the weighted arithmetic average, or 23,
by the total frequency, or 25, the product would be 575, which is the
total of the aggregates of the two series (200+375).
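
A short sketch of the combined average, with the aggregate check described above, is given below. The names used are our own.

```python
means = [20, 25]   # averages of the two component series
counts = [10, 15]  # numbers of items in the two series

aggregate = sum(m * n for m, n in zip(means, counts))  # 200 + 375
combined_mean = aggregate / sum(counts)                # weighted by item counts
simple_mean = sum(means) / len(means)                  # ignores item counts

print(combined_mean)                # 23.0
print(simple_mean)                  # 22.5
print(combined_mean * sum(counts))  # 575.0, the true aggregate
print(simple_mean * sum(counts))    # 562.5, not the true aggregate
```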
Discriminate weighting. We have seen that in many cases the simple
arithmetic average and the weighted arithmetic average differ considerably,
and the question that arises is, which of the two averages should be used
in such cases to represent the series? For this, it is necessary to study
the weights of the items in relation to their size. Sometimes it would
be found that big items in a series are associated with big weights and
small items with small weights. In such cases the weighted arithmetic
average would be more than the simple arithmetic average. Thus the
simple arithmetic average of the natural numbers 1, 2, 3, 4, 5, 6, 7, 8, 9,
and 10 is 5.5, and if these numbers are associated with weights whose
respective values are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weighted arithmetic
average would be 7.0.
If, on the other hand, big items are associated with small weights
and small items with big weights, the weighted arithmetic average
would be less than the simple arithmetic average. If the weights in
the above case were respectively 10, 9, 8, 7, 6, 5, 4, 3, 2, and 1 the
weighted arithmetic average would be 4.0 whereas the simple arith-
metic average is 5.5.
Chance weighting. If weights are indiscriminately associated with
values or, in other words, if big items are associated with both big and
small weights and similarly small items with both small and big weights,
the weighted average and the simple average would not materially differ.
Thus if for the values of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 the weights
were respectively 10, 3, 6, 4, 5, 8, 2, 1, 9, and 7, the weighted arithmetic
average would be about 5.4 and the simple arithmetic average is 5.5.
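
The three illustrations on the natural numbers 1 to 10 can be reproduced with the sketch below. The helper name and the variable names are our own; the weights are those given in the text.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic average: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)


values = list(range(1, 11))  # the natural numbers 1 to 10
simple = sum(values) / len(values)

# Big items with big weights: the weighted average rises.
rising = weighted_mean(values, values)

# Big items with small weights: the weighted average falls.
falling = weighted_mean(values, values[::-1])

# Weights attached without regard to size: hardly any change.
chance_weights = [10, 3, 6, 4, 5, 8, 2, 1, 9, 7]
chance = weighted_mean(values, chance_weights)

print(simple)            # 5.5
print(rising)            # 7.0
print(falling)           # 4.0
print(round(chance, 2))  # 5.38
```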
We arrive at an important conclusion by the study of the above
illustrations. If weights are biased in one direction or the other, the simple
arithmetic average and the weighted arithmetic average would significantly differ
from each other. But if weights are left to chance (or if there is chance weighting)
and big and small weights are associated with both big and small values, the
weighted and the unweighted average would be almost equal.
Rational weighting. The question that arises now is, should weights
be purposively selected or should they be left to chance? If there is
purposive weighting, and if it is biased, the weighted average would
give a fallacious conclusion and as such it should not be used. If
weights are left to chance there would not be a significant difference
between the weighted and the unweighted averages, and then there
is no point in weighting an average. Thus weights should be neither
biased nor should they be left to chance.
Weights should be rationalised, or we should use what are called
"rational weights." But this does not solve the problem. We shall have
to decide what we mean by rational or logical weights. We shall
leave this point to be discussed in the chapter on Index Numbers.
However, it should be mentioned here that rational weights would
differ in different types of enquiries and the nature and purpose of
the enquiry would have to be carefully studied in deciding what would
be the rational or logical weights in a particular case.
If rational weights are used, and if there is a difference between
the simple and weighted averages, invariably the weighted average should
be used as it would represent a series much better than a simple arithmetic
average in such a case.
WEIGHTED GEOMETRIC MEAN
Just as the arithmetic average can be weighted, it is similarly possible
to calculate a weighted geometric mean. Weighted geometric mean is the
nth root of the product of the various values raised to the power of their respective
weights. Thus if $m_1$, $m_2$, $m_3$, etc., stand for the values of a variable,
$w_1$, $w_2$, $w_3$, etc., for their respective weights, and $n$ for the sum of the weights,
Weighted geometric mean, or
$$g' = \sqrt[n]{m_1^{w_1} \times m_2^{w_2} \times m_3^{w_3} \times \cdots}$$
Logarithms are used for calculating the geometric mean:
$$g' = \text{anti-log} \left[ \frac{\sum (w \log m)}{\sum w} \right]$$
The following example would illustrate the formula:


Example 28. From the following data calculate the weighted
index number, using the geometric mean.
Group              Index Number    Weight
Food               125             7
Clothing           133             5
Fuel and light     141             4
House rent         173             1
Miscellaneous      182             3
Solution: Calculation of the weighted index number (Direct method)

Group              Index Number    Weight    Logs. of       Weight × log.
                   (m)             (w)       Index Nos.
Food               125             7         2.0969         14.6783
Clothing           133             5         2.1239         10.6195
Fuel and light     141             4         2.1492          8.5968
House rent         173             1         2.2380          2.2380
Miscellaneous      182             3         2.2601          6.7803
Total                              20                       42.9129

Weighted geometric mean
$$g' = \text{anti-log} \left[ \frac{\sum (\log m \times w)}{\sum w} \right] = \text{anti-log} \left( \frac{42.9129}{20} \right)$$
= anti-log 2.1456
= 140 (to the nearest whole number)
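
The logarithmic calculation of Example 28 can be sketched as follows. The function name is our own, and the standard-library function `math.log10` takes the place of the four-figure log tables, so the intermediate figures differ slightly from those in the table in the later decimal places.

```python
import math


def weighted_geometric_mean(values, weights):
    """Anti-log of sum(w * log m) / sum(w), using logarithms to base 10."""
    log_sum = sum(w * math.log10(m) for m, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


# Data of Example 28: group index numbers and their weights.
index_numbers = [125, 133, 141, 173, 182]
group_weights = [7, 5, 4, 1, 3]

weighted_index = weighted_geometric_mean(index_numbers, group_weights)
print(round(weighted_index))  # 140
```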
The method discussed above is the direct method. We have
seen in the calculation of the simple geometric mean that a short-cut method
can be used by assuming a geometric mean. The weighted geometric
mean can also be calculated by the short-cut method. The deviations
of the logs. from the log. of the assumed geometric mean are multiplied
by their respective weights, and the sum of the products is divided
by the total of the weights. The resulting figure is added to the log.
of the assumed geometric mean. The anti-log. of this figure would
give the actual weighted geometric mean of the series.
Symbolically
$$g' = \text{anti-log} \left( \log \text{ of assumed mean} + \frac{\sum d'w}{\sum w} \right)$$
where $d'$ stands for the deviations of the logs. from the log. of the
assumed mean.
Example 28 would be solved by this method as follows:
Short-cut method

Group              Index     Weight    Logarithms    Deviations      Weight ×
                   Number                            from log.       Deviation
                                                     mean 2.000
Food               125       7         2.0969        .0969           .6783
Clothing           133       5         2.1239        .1239           .6195
Fuel and light     141       4         2.1492        .1492           .5968
House rent         173       1         2.2380        .2380           .2380
Miscellaneous      182       3         2.2601        .2601           .7803
Total                        20                                      2.9129

Weighted geometric mean
$$g' = \text{anti-log} \left[ \text{assumed log-mean} + \frac{\sum d'w}{\sum w} \right] = \text{anti-log} \left[ 2.000 + \frac{2.9129}{20} \right] = \text{anti-log } 2.14564$$
= 140 (to the nearest whole number)

Thus the weighted index number is 140.
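
As with the arithmetic average, the short-cut method must agree with the direct method whatever log-mean is assumed. The sketch below checks this for Example 28; the function names and the alternative assumed log-means are our own illustrative choices.

```python
import math


def geometric_direct(values, weights):
    """Direct method: anti-log of sum(w * log m) / sum(w)."""
    log_sum = sum(w * math.log10(m) for m, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


def geometric_shortcut(values, weights, assumed_log):
    """Short-cut method: anti-log of (assumed log-mean + sum(d' * w) / sum(w))."""
    deviations = [math.log10(m) - assumed_log for m in values]
    correction = sum(d * w for d, w in zip(deviations, weights)) / sum(weights)
    return 10 ** (assumed_log + correction)


index_numbers = [125, 133, 141, 173, 182]
group_weights = [7, 5, 4, 1, 3]

direct_result = geometric_direct(index_numbers, group_weights)
shortcut_results = [
    geometric_shortcut(index_numbers, group_weights, assumed_log)
    for assumed_log in (2.0, 2.1, 2.2)
]

for result in shortcut_results:
    assert abs(result - direct_result) < 1e-9
print(round(direct_result))  # 140
```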
Weighted geometric mean, as we shall see later on, is very useful
for the calculation of index numbers. In most of the index numbers
calculated nowadays only the weighted geometric mean is used. We shall
discuss the utility of the weighted geometric mean, particularly in the
calculation of index numbers, in another chapter.
WEIGHTED HARMONIC MEAN
Weighted harmonic mean is calculated almost in the same manner
as weighted geometric mean. The reciprocals of items are multiplied
by their respective weights and the sum so obtained is divided by the
total of the weights. The reciprocal of the resulting figure is the weighted
harmonic mean of the series.
Symbolically
$$h' = \frac{1}{\left[ \dfrac{\sum rw}{\sum w} \right]}$$
where $h'$ stands for the weighted harmonic mean, $r$ for the reciprocals
of the various items and $w$ for their respective weights.
The following illustration would clarify the formula.
Example 29. Calculate the weighted harmonic average of the
following items:
Items       Weight
1.0         5
.5          10
10.0        20
45.0        10
175.0       15
.01         2
4.0         15
11.2        8
Solution. Computation of the weighted harmonic mean
Items       Reciprocals     Weight     Weight × Reciprocals
1.0         1.0000          5          5.0000
.5          2.0000          10         20.0000
10.0        .1000           20         2.0000
45.0        .0222           10         .2220
175.0       .0057           15         .0855
.01         100.0000        2          200.0000
4.0         .2500           15         3.7500
11.2        .0893           8          .7144
Total                       85         231.7719

The weighted harmonic mean
$$h' = \frac{1}{\dfrac{\sum rw}{\sum w}} = \text{Reciprocal of } \frac{231.7719}{85}$$
= Reciprocal of 2.727
= .3667
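
The weighted harmonic mean of Example 29 can be computed without tables of reciprocals, as in the sketch below. The function name is our own; because full-precision reciprocals are used instead of four-figure ones, the last decimal place may differ slightly from a hand calculation.

```python
def weighted_harmonic_mean(values, weights):
    """Reciprocal of sum(w * (1 / m)) / sum(w)."""
    if any(m == 0 for m in values):
        raise ValueError("the harmonic mean is undefined when an item is zero")
    weighted_reciprocals = sum(w / m for m, w in zip(values, weights))
    return sum(weights) / weighted_reciprocals


# Data of Example 29.
items = [1, 0.5, 10, 45, 175, 0.01, 4, 11.2]
item_weights = [5, 10, 20, 10, 15, 2, 15, 8]

harmonic = weighted_harmonic_mean(items, item_weights)
print(round(harmonic, 4))
```

The very small item .01 has a reciprocal of 100 and, even with a weight of only 2, contributes most of the sum of weighted reciprocals; this shows how strongly the harmonic mean is pulled towards the smallest items.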
Questions
1. What is meant by measures of central tendency? What are the characteristics
of a good measure of central tendency?
2. Define arithmetic average, geometric mean, median and mode. Which of
these is most representative and why? (M. Com., Agra, 1945).
3. What is a statistical average? What are the desirable properties for an average
to possess? Which of the averages you know possess most of these properties?
(M. A., Delhi).
4. What are the algebraic properties of the arithmetic average?
5. Define weighted average. How does it differ from a simple average? Is a
weighted average better than a simple one? Give reasons.
6. Discuss critically the use of weighted mean in statistics.
(B. Com., Calcutta, 1937).
7. What are the algebraic properties of the geometric mean? Is it a better
average than median and mode? If so, how?
8. Compare and contrast the relative merits and demerits of the various measures
of central tendency which you know.
9. Write a note on the limitation of averages.


10. Is there any relationship between mean, median and mode in a moderately
asymmetrical frequency distribution? Discuss.
11. What is the purpose served by an average? Discuss the special advantages
attached to the different averages and illustrate their use. (B. Com., Agra).
12. On what considerations would you select an average for studying a particular
phenomenon? In which cases is the geometric mean better than other averages?
13. Comment on the following statements:
(a) Median is more representative than mean because it is less affected
by the values of extremes.
(b) True value of mode cannot be calculated exactly in a continuous
frequency distribution.
(c) The harmonic mean of a series of fractions is the same as the reciprocal
of the arithmetic mean of the series.
14. Statistics help collective agreement of wage adjustments. What data are
required for the consideration of a revision in wage rates in a factory? Which average
will you utilise and why? (M. Com., Allahabad, 1943).
15. Compare the merits and demerits of the median and the mode. In which
of the following problems would they be most useful:
(a) Skill measurement (b) Size of holdings (c) Comparison of intelligence
(d) Marks obtained in an examination (e) Heights and weights
of students. (M. A., Agra, 1949).
16. "The figure of 2.2 children per adult female was felt to be in some respects
absurd and the Royal Commission suggested that the middle classes be paid money
to increase the average to a rounder and more convenient number." (Punch).
Commenting on the above statement discuss the limitations of the arithmetic
average.
17. Given below are the marks obtained (out of 200) by the 15 students in an
interview held by a Public Service Commission. Calculate the simple arithmetic
average of the series.
Size of series
180 lao 108
160 IZ4 loa
l}l 115 101
146 IU 8S
143 tIO 68
18. The following data give a distribution obtained by tossing ten pennies
1024 times and recording the number of heads that appeared on each toss. What
is the average number of heads per toss?

Number of heads Frequency Number of headl Frequency


o I 6 209
16 7 118
2 42: 8 53
la6 9 4
4 199 10 3
5
19. Given the following frequency distribution, calculate the Arithmetic Average.
Monthly Wages    Workers    Monthly Wages    Workers
     Rs.                         Rs.
  12.5-17.5          2        37.5-42.5          4
  17.5-22.5         22        42.5-47.5          6
  22.5-27.5         19        47.5-52.5          1
  27.5-32.5         14        52.5-57.5          1
  32.5-37.5          3
(M. Sc., Agriculture, Punjab, 1943).
20. The following table gives the height of 350 men. Calculate the mean height of
the group.
Height in inches    Number of    Height in inches    Number of
                    Persons                          Persons
59 1 67 131
61 2 69 102
63 9 71 40
65 48 73 17
21. The frequency distribution of cost of production of Gur in rupees per maund
for different holdings in two districts is given below. Find the average cost in each
district, and test whether there is any significant difference.
Cost in rupees    District    District
per maund            A           B
   2- 3              9           1
   3- 4             32          10
   4- 5             37          34
   5- 6             21          23
   6- 7             13          21
   7- 8              7          14
   8- 9              5          10
   9-10              2           9
  10-11              1           5
  11-12              2           2
  12-13              1           1
  Total            130         130
(I. C. S., 39).
22. The frequency distribution below gives the cost of production of sugarcane
in different holdings. Obtain the Arithmetic Mean.
Cost     Frequency    Cost     Frequency
 2- 6        1        18-         52
 6-          9        22-         36
10-         21        26-         19
14-         47        30-34        3
(Indian Audit and Accounts Service Exam., 1941).
23. The following table gives the population of males at different age-groups of
the U. K. and India at the time of the census of 1931.

Age-group U. K. Lakhs India Lakhs


0- 5 18 214
5-10 19 258
10-15 10 222
15-20 18 157
20-25 16 145
25-30 14 Hi\
30-40 27 257
40-50 25 184
50-60 19 120
Above 60 17 100
Compare the average age of males in the two countries, and account for the
difference, if any. (B. Com., Alld., 1936, 41).
24. The following table gives the male population of Kanpur and Jaipur in 1931 :-
Age group Population of males in Thousands
(years) Kanpur Jaipur
0-- 5 14 9
5--10 13 8
10--15 13 8
15--20 13 7
20--30 33 15
30-40 29 12
40--50 17 9
50-60 7 6
60-80 4 4
Calculate the average age of males at Kanpur and Jaipur separately, and account
for the difference, if any. (B. Com., Allahabad, 1952).
25. Find the average marks of a student from the following table :_
Marks          No. of Students
Below 10            25
  "   20            40
  "   30            60
  "   40            75
  "   50            95
  "   60           125
  "   70           190
  "   80           240
26. Compute the arithmetic mean from the follOWing data :--
Salary in Rs. Frequency
Below 50 30
50-- 70 16
70--100 19
100--110 20
110--120 10
120 and over 5
27. Find the average wage of a labourer from the following table :
Wage in      No. of       Wage in      No. of
Rs. above    labourers    Rs. above    labourers
    0           650           40          300
   10           500           50          275
   20           425           60          250
   30           375           70          100
28. The following table gives the number of persons with different incomes in the
U. S. A. during the year 1929.
Income in No. of Income in No. of
thousands persons thousands persons in
of dollars in lakhs of dollars lakhs
Under 1 13 10- 25 27
1- 2 90 25- 50 6
2-- 3 81 50- 100 2
3- 5 117 100--1000 2
5--10 66
Calculate the average income per head. (B. Com., Luck., 1939).
29. Make a frequency table having grades of wages with class intervals of two
Annas each from the following data of daily wages received by 30 labourers in a certain
factory and then compute the average daily wages paid to a labourer.
Daily wages in annas,
14, 16, 16, 14, 22, 13, 15, 24, 12, 23,
14, 20, 17, 21, 18, 18, 19, 20, 17, 16,
15, 11, 12, 21, 20, 17, 18, 19, 22, 23.
(B. A. Hons., Punjab, 1945).
30. The following table gives the monthly average of automobile production
in the United States for the years 1926-1932 (unit 1,000 cars).
Year Production Year Production
1926 358.4 1930 279.7
1927 283.4 1931 199.1
1928 363.2 1932 114.2
1929 446.5
Calculate the average per cent of change per year.
31. The following is the table of the age of 30 adult. persons

Digits (Division of class intervals)

Years 1 2 3 4 5 6 8

20-29
°
2 1 2 2 1
7
1 1
9 Total
10
30-39 2 1 2 1 2 8

40-49 2 2 1 1 6
50-59 1 2 1 4

60-69 1. 1 2

Thus there' are two persons of 23 years, one of 57 years and so on.
Find out the mean of the series
(a) by using only totals of class intervals.
(b) by using the entire data
32. A candidate obtains the following percentages in an examination: Sanskrit
75 ; Mathematics 84 ; Economics 56 ; English 78 ; Politics 57 ; History 55 ;
Geography 47. It is agreed to give double weight to marks in English, Mathematics and
Sanskrit. What are the weighted and unweighted means ?
33. Explain what is meant by weighted average, and discuss the effect of weighting.
Calculate (i) the unweighted mean of the prices in column III and (ii) the mean
obtained by weighting each price by the quantity consumed.
I II III
Articles of food Quantity consumed Price in rupee per
maund
Flour 11.5 mds. 5.8
Ghee 5.6 mds. 58.4
Sugar .28 mds. 8.2
Potato .16 mds. 2.5
Oil .35 mds. 20.0
(M. A. Cal., 1937).
34. The following table gives the number of employees and their monthly earnings
in two factories of a particular city :

                       A                        B
Description    No. of      Monthly      No. of      Monthly
of workmen     employees   earnings     employees   earnings
                           Rs.                      Rs.
   (a)             3         800            2         750
   (b)            20         145           10         150
   (c)            15          50           15          60
   (d)            25          30           25          50
   (e)            80          35           40          40
   (f)           250          20          120          20
Compare the weighted average.

35. Suppose that an automobile makes a 200 mile trip, covering the first 100
miles at the rate of 50 miles an hour and the second 100 miles at the rate of 40 miles
an hour. What is its average speed ?

36. A railway train runs for 30 minutes at a speed of 40 miles an hour and then,
because of repairs of the track, runs for 10 minutes at a speed of 8 miles an hour, after
which it resumes its previous speed and runs for 20 minutes except for a period of 2
minutes when it has to run over a bridge at a speed of 30 miles per hour. What
is its average speed ?
37. The following table indicates the increase in cost of living over July 1946,
for a working class family as at 1st January 1955, and the weights assigned to various
groups.

Group Percentage increase Weights


Food 29 7.5
Rent 54 2.0
Clothing 97.5 1.5
Fuel and lighting 75 1.0
Other items 75 0.5
Find out the weighted average of the increase in cost of living.
(B. Com. Allahabad, 1938).

38. The table shows the age distribution of married females according to the sample
census of 1941 in the Baroda State.

Age Number of married Age Number of married


females females
0- 5 3 35-40 1292
5-10 31 40-45 963
10-15 410 45-50 762
15-20 1809 50-55 531
20-25 2446 55-60 317
25-30 2223 60-65 156
30-35 1723 65-70 59
70-75 37

Calculate the median age of married females and also the two quartiles.
(I. A. & A. S., etc., Exam., 1942)
39. Calculate the values of the median and the two quartiles for the following :-

Limits for percentage recovery    Factories in India
of sugar on cane                  (1935-36)
8.0-8.2 2
8.2- 5
8.4- 4
8.6- 11
8.8- 11
9.0- 11
9.2- 13
9.4- 10
9.6- 7
9.8- 6
10.0- 3
10.2- 1
10.4-10.6 1
85
(M. A., Punjab University, 1943).

40. Calculate the mean and median for the following distribution.

Weight of boys in Number Weight of boys in Number


a certain class a certain class
100-104 4 140-144 500
105-109 14 145-149 430
110-114 60 150-154 260
115-119 138 155-159 128
120-124 206 160-164 66
125-129 298 165-169 28
130-134 380 170-174 12
135-139 450
2974
(Indian Audit and Accounts Service Exam., 1938)

41. The following table gives the distribution of the male and female population
of a certain area in India. By finding the mean age, the median age, and the upper
and lower quartile ages, make comments on the age distribution of the two sexes
in the area :-

Age group Male Female


0-9 2756 2787
10-19 2124 2032
20-29 1977 1724
30-39 1481 1485
40-49 1021 1022
50-59 610 579
60-69 245 269
70-79 67 78
80-89 16 20
90-99 3 4
Total 10,000 10,000
(I. C. S., 38).
42. Calculate the average, median, and upper and lower quartile ages in the
following table :-
Age-group Population in thousand


(1881) (1931)
0- 4 3,520 3,280
5- 9 ~160 3~00
10-19 5,340 7,200
20-29 4,560 6,640
30-39 3,420 5,980
40-49 2,660 5,240
50-59 1,900 3,780
60-69 1,320 2,440
70-79 600 1,220
80 and over 120 320
(M. A., Agra, 1940).
43. Determine the quartiles and the median for the following table :-
Income                           No. of persons
Below Rs. 30                           69
Rs. 30 and below Rs. 40               167
Rs. 40  "    "   Rs. 50               207
Rs. 50  "    "   Rs. 60                65
Rs. 60  "    "   Rs. 70                58
Rs. 70  "    "   Rs. 80                27
Rs. 80 and over                        10
Total                                 603
(Bombay, 1942).
44. The following table classifies the she-buffaloes of India in 1940 according to
the yield of milk per day. Calculate from the data the mean and the median yield
of milk per she-buffalo (and its co-efficient of variation) :-
Yield per day in lbs. No. of she-buffaloes in
thousands
Upto 1                      114
Above 1 to 2              2,005
  "   2 to 3              7,706
  "   3 to 4              4,590
  "   4 to 5              2,080
  "   5 to 6                240
  "   6 to 7              3,580
Total                    20,315
(P. C. S., 43).
45. Find the median and the quartiles
Amount of wages Number of workers
receiving such. rate of wages
Not exceeding 8 shillings 85
Over 8 sh. but not exceeding 10 sh. 65
Over 10 sh. .. 12 sh. 59
Over 12 sh. .. .. 14 sh. 50
46. Amend the following table, and locate the median from the amended table.
Also measure the magnitude of the median so located :-
Size Frequency Size Frequency
10-15 10 30-35 28
15-17.5 15 35-40 30
17.5-20 17 45 and onwards 40
20-30 25
(B. Com. Allahabad, 1942).
47. The following table gives the marks obtained by 65 students in Statistics in
a certain examination :-
Examination marks Number of students
More than 70% 7
60% 18
50% 40
40% 40
30% 63
20% 65
Calculate the median of the above series.
48. Find out the median of the following series
Wages No. of labourers
Rs.
60-70 5
50-60 10
40-50 20
30--40 5
20-30 3
49. The following is the age distribution of candidates appearing at the
Matriculation and Intermediate Arts examinations of the Patna University in 1937.
Age in years    12-  13-  14-  15-  16-  17-  18-  19-  20-  21-  22-  Total
Matriculation     5   48  189  303  522  980  981  794  515  474    X   4811
Intermediate      X    X    X    5   45   87  127  150  155  127  175    871
Compare the median and modal ages of the Matriculation candidates with those
of I. A. candidates. (M. A., Patna, 1940).
50. The following table shows the frequency with which profits are made. What
is the Mode ?
                                                    Frequency
Exceeding Rs. 3,000 and not exceeding Rs.  4,000        83
    "      "  4,000  "   "      "      "   5,000        27
    "      "  5,000  "   "      "      "   6,000        25
    "      "  6,000  "   "      "      "   7,000        50
    "      "  7,000  "   "      "      "   8,000        75
    "      "  8,000  "   "      "      "   9,000        38
    "      "  9,000  "   "      "      "  10,000        18
51. Find the modal wage group from the following table :
Wages in Rupees No. of labourers
Above 30 520
40 470
50 399
60 210
70 105
" 80 45
90 7
52. Find out the median and the mode for the following table
No. of days absent    No. of students
Less than  5                29
  "    "  10               224
  "    "  15               465
  "    "  20               582
  "    "  25               634
  "    "  30               644
  "    "  35               650
  "    "  40               653
  "    "  45               655
53. Find the median and mode from the following table :
Class     Frequency    Class     Frequency
 0- 3         4        18-20        24
 3- 6         8        20-24        14
 6-10        10        24-25        16
10-12        14        25-28        11
12-15        16        28-30        10
15-18        20        30-36         6
54. Find the modal wage from the following data :
Weekly Wage No. of wage-eamers
Sh. d. Sh. d.
12 6 to 17 6 4
17 6 22 6 44
22 6 27 6 38
27 6 32 6 28
32 6 .. 37 6 6
37 6 42 6 8
42 6 47 6 12
47 6 .. 52 6 2
52 6 .. 57 6 2
(B. Com., Rajplliana, 1949)
55. Find out the mode of the following series :-
Size of item Frequency Size of item Frequency
0-9.99 10 40-49.99 11
10-19.99 14 50-59.99 13
20-29.99 16 60-69.99 17
30-39.99 14 70-79.99 13
56. Calculate the geometric mean of the following figures : -
5, 10, 192, 14,374, 20,498, 1,20,674, 15,491

57. Compute the weighted geometric average of relative prices of the following
commodities for the year 1939 (Base year 1938-price 100) :-

Weight
Commodity Relative Price (value produced in 1938)
Corn 128.8 1,385
Cotton 62.4 819
Hay 117.7 842
Wheat 99.0 561
Oats 130.9 408
Potatoes 143.5 194
Sugar 125.6 142
Badey 150.2 100
Tobacco 101.1 103
Rye 116.2 25
Rice 117.5 17
Oil seeds 78.7 29
How does it differ from the unweighted geometric mean, and why ?
(B. Com., Alld. 1943)

58. The following table gives index numbers for various items entering the cost
of living. Find an index of the cost of living by computing a weighted average of
these items. The weights to be used are also given in the table :-
Table
Items Index Weight
1. Clothing 77.3 13
2. Food 74.5 43
3. Fuel and light 85.8 6
4. Housing 64.6 18
5. Sundries 92.5 20
59. Compute the geometric mean of the following series
Marks No. of students
0-10 5
10-20 7
20-30 15
30-40 25
40-50 8
60
60. The annual incomes of fifteen families are given below in rupees :-
80, 2500, 90, 1200, 1450, 7200, 120, 1060, 150, 480, 360, 96, 200, 520 and 60.
Calculate the Harmonic Mean.
61. The following table gives (a) the total number of persons possessing holdings
of different sizes and (b) the total area of land comprised in holdings of different
sizes in U. P. during the year ending on 30th June, 1945 :
                                       Total number     Total area in
Size of holdings in acres              of persons in    thousands
                                       thousands        of acres
Not exceeding .5                           2,643              925
Exceeding .5 but not exceeding  1          1,696            1,556
    "      1  "   "      "      2          2,205            3,361
    "      2  "   "      "      3          1,430            3,373
    "      3  "   "      "      4            992            3,458
    "      4  "   "      "      5            703            3,150
    "      5  "   "      "      6            515            2,817
    "      6  "   "      "      7            378            2,446
    "      7  "   "      "      8            283            2,112
    "      8  "   "      "      9            216            1,830
    "      9  "   "      "     10            171            1,617
    "     10  "   "      "     12            206            2,264
    "     12  "   "      "     14            138            1,776
    "     14  "   "      "     16             96            1,424
    "     16  "   "      "     18             68            1,252
    "     18  "   "      "     20             51              972
    "     20  "   "      "     25             70            1,570
Over 25                                      115            5,310
Grand Total                               12,276           41,113
(i) Calculate the average size of holdings in the U. P.
(ji) Assuming the minimum size of an economic holding to be 10 acres-
(1) Calculate the percentage of the area under uneconomic holdings in 1945 in
the U. P.
(2) Calculate the percentage of persons having uneconomic holdings in the
U. P. in 1945. (P. C. S. 1951)
62. (a) Define a 'weighted mean.'
If several sets of observations are combined into a single set, show that the mean
of the combined set is the weighted mean of the means of the several sets.
(b) The number of asthma sufferers whose first attacks came at various ages is
given in the following table. Calculate the mean age at the first attack by any method.
TABLE
Age at first    Number      Age at first    Number
attack          of cases    attack          of cases
  0- 5            298         35-40            64
  5-10            113         40-45            53
 10-15             64         45-50            40
 15-20             61         50-55            35
 20-25             70         55-60            24
 25-30             81         60-65            20
 30-35             77
(I. A. S. 1955)
63. Find the mean, mode, standard deviation and co-efficient of skewness for
the following :-
Year under 10, 20, 30, 40, 50, 60.
No. of persons 15, 32, 51, 78, 97, 109.
(P. C. S. 1952)
64. What are the desiderata for a satisfactory average? Point out the special
characteristics of the arithmetic mean, the median and the geometric mean.
Explain the step-deviation method for finding out the arithmetic mean of a
frequency distribution. Derive the useful formula and apply it to find the arithmetic
mean of the distribution.
Variate 5, 10, 15, 20, 25, 30, 35, 40, 45, 50.
frequency 20, 43, 75, 67, 72, 45, 39, 9, 8, 6.
(P. C. S. 1954)
65. The following table gives the monthly income of 24 families in a certain
locality :-
Serial No. of    Monthly income    Serial No. of    Monthly income
the family       in Rupees         the family       in Rupees
1 60 13 96
2 400 14 98
3 86 15 104
4 95 16 75
5 100 17 80
6 150 18 94
7 110 19 100
8 74 20 75
9 90 21 600
10 92 22 82
11 280 23 200
12 180 24 84
Calculate the arithmetiC average, the median and the mode of the above incomes.
Which average would represent the above series the best? Give reasons.
(P. C. S. 1955),
66. Figures concerning the number of deaths in two towns in a particular year are
given below :-
                      Town A                        Town B
Age-group     No. of persons    Deaths      No. of persons    Deaths
in years      living                        living
 0-10               500           100           12,000         4,800
10-20             3,000           150            6,000           360
20-30             7,000           200            9,000           180
30-40            10,000           300           25,000           250
over 40          19,500           750           48,000           576
Total            40,000         1,500         1,00,000         6,166
Compare the health conditions in both towns.
(P. C. S. 1955)
67. You are given the following statistics of population and unemployment in :-
(a) Your country as a whole for a standardised age distribution.
(b) The local administrative area in which you live.
Calculate (i) the standardised unemployment rate in the country as a whole, (ii)
the standardised rate of unemployment in the local area and (iii) the crude rate of
unemployment in the local area.
Age (Years)
15-30 30-45 45-60 60-75 Total
Standard population
Age constitution 250 350 300 100 1,000
Unemployment rate
per cent 5 8 12 15 -
Local population
Age constitution 300 300 350 50 1,000
Unemployment rate
per cent 4 9 12 20 -
(P. C. S. 1956).
68. Fifty items sold in Department A of the Corner Store had a mean price of 30
rupees. Seventy-five items sold in Department B had a mean price of 20 rupees. The
mean price of commodities sold in Departments A and B was 24 rupees. Is it right?
69. If x1 and x2 are two positive values of a variate, prove that their geometric
mean is equal to the geometric mean of their arithmetic and harmonic means.
70. (a) An examination candidate's percentages are : English, 73; French, 82;
Mathematics, 57; Science, 62; History, 60. Find the candidate's weighted mean if
weights of 4, 3, 3, 1, 1 respectively are allotted to the subjects.
(b) The average percentages for the same examination were 57, 52, 48, 55, 50
for the above subjects respectively. Find the weighted mean for the whole examination.
71. "The inherent inability of the human mind to grasp in its entirety a large body
of numerical data compels us to seek relatively few constants that will adequately des-
cribe the data."-R. A. Fisher.
Comment.
72. Find the average ages of men and women blood donors from the following
data : -
Age, years 10-19 20-29 30-39 40-49 50-59 60-69
Frequency, Men 3016 6894 9229 5714 3575 1492
Women 7845 16,008 13,107 9685 6374 2137
Age years 70-79 80-89 90 and over
Frequency, Men 170 9 1
Women 173 9
73. A candidate obtains the following percentages in an examination : Latin,
75; Mathematics, 84; French, 56; English, 78 ; Science, 57 ; History, 54 ;
Geography, 47. It is agreed to give double weight to the marks in English, Mathematics
and Latin. What is his weighted mean ?
74. The frequency distributions of real income in rupees of the employees of a
big industrial concern, in two different periods, are as given below :
Frequency
Income in Rs. Period 1 Period 2
0-50 90 200
50-100 150 400
100-150 100 120
150-200 80 100
200-250 70 150
over 250 10 30

500 1,000
The total income of 10 employees in the frequency class 'over 250' in Period 1 is
Rs. 3,000 and that of 30 employees in Period 2 is Rs. 18,000.
(a) Compute the mean and median incomes for the two periods.
(b) Write a very brief note on the relative economic conditions of the employees
in the two periods, supporting your statements by analysis of the given
data, if necessary.
(c) Every employee belonging to the top 25 per cent of the earners is required to
pay 1 per cent of his income to a workers' relief fund. Estimate the
increase in contributions to this fund from Period 1 to Period 2. (I. A. S., 1958)

75. The following are the monthly salaries in rupees of 30 employees of a firm :-
139, 126, 114, 100, 88, 62, 77, 99, 103, 144, 148, 63, 69, 148, 132, 118, 142,
116, 123, 104, 95, 80, 85, 106, 123, 133, 140, 134, 108, 129.
The firm gave bonus of Rs. 10, 15, 20, 25, 30 and 35 for individuals in the res-
pective salary groups-Exceeding 60 but not exceeding 75, exceeding 75 but not ex-
ceeding 90 and so on upto exceeding 135 but not exceeding 150. Find out the average
bonus paid per employee. (B. Com., B. H. U.)
76. For a certain group of 'Saree' weavers of Banaras, the median and quartile
earnings per week are Rs. 44.3. Rs. 43.0 and Rs. 45.9 respectively. The earnings for
the group range between Rs. 40 and Rs. 50. Ten percent of the group earn under
Rs. 42 per week, 13 percent earn Rs. 47 and over and 6 percent Rs. 48 and over. put
these data into the form of a frequency distribution and obtain an estimate of the mean
wage. (P. C. S., 19~6).
77. From a frequency distribution of marks in Accounts of 100 students, the mean
was found to be 35. Later it was discovered that the marks 35 were mis-read as 25.
Find the correct mean.
78. From the following data. find the missing frequency.
No. of Tablets : 4-8, 8-12, 12-16, 16-20, 20-24, 24-28, 28-32, 32-36, 36-40
No. of Persons cured : 11 13 16 14 9 17 6 4
The average number of tablets given to cure fever was 20.

79. Calculate the Median, Quartiles, 6th Deciles and 70th Percentile from the
following data : -
Marks less than 80 70 60 50 40 30 20 10
No. of Students 100 90 80 60 52 20 13
(B. Com., Raj., 1951).
80. (a) From the data given below, find the mode:

Age : 20-25 25-30 30-35 35-40 40-45 45-50 50-55 55-60

No. of persons : 70 80 180 150 120 70 10

(b) If the mode and the median of a moderately asymmetrical series are 16
inches and 20.2 inches respectively, compute the most probable median.
(B. Com., Delhi, 1960).

81. Recast the following cumulative table into the form of an ordinary
frequency distribution and determine the value of Mode by using the formula
Mean - Mode = 3 (Mean - Median).

No. of days absent    No. of students    No. of days absent    No. of students
Less than  5                29           Less than 30
  "    "  10               224             "    "  35
  "    "  15               465             "    "  40
  "    "  20               582             "    "  45
  "    "  25               634
(B. Com., Lucknow, 1957)
82. A taxicab drives from a plain-town to a hill-station, 60 miles distant, at a
mileage rate of 10 miles per gallon of petrol and on the return trip at 15 miles per gallon.
Find the harmonic mean rate of mileage per gallon. Verify that this is the proper
average in this particular case.
83. An aeroplane flies around a square the sides of which measure 100 miles
each. The aeroplane covers at a speed of 100 miles per hour the first side, at 200 miles
per hour the second Side, at 300 miles per hour the third side and at 400 m.p.h. the
fourth side. What is the average speed of the aeroplane around the square ?
84. A train moves first 10 miles at the rate of 10 m.p.h., next 20 miles at the rate
of 30 m.p.h., and then due to repairs in the track another 5 miles at the speed of 5
miles per hour. It covers the last 15 miles at the rate of 10 miles an hour. Find the
average speed of the train per hour.
85. The mean wage of 50 labourers working in a factory is Rs. 38. The mean
wage of 30 labourers working in the morning shift is Rs. 40. Find the mean wage
of remaining 20 labourers working in Evening shift.
86. The teachers of statistics reported mean examination marks of 37.5, 41 and
42 in their classes which consisted of 32, 25 and 17 students respectively. Determine
the mean marks for all the classes taken together.
87. The following table gives the distribution of the average weekly wages of
100 workers in a factory. Calculate (i) Average weekly total wage bill of these
workers; (ii) The weekly wage of a worker whose wage is greater than that of
75% workers.
Weekly wages 16-20 ZI-2l 26-3 0 31-35 36-40 41-45 46-50
No. of workers 7 12 It 8
Weekly wages 56 -60
No. of workers
88. The monthly incomes of 8 families in rupees in certain locality are given
below. Calculate the Mean, the Geometric mean and Harmonic Mean, and confirm
that the relationship a > g > h holds true.

Family         A     B     C     D     E     F     G     H
Income (Rs.)   70    10   500    75     8   250     8    42
(Sagar, B. Com., II, 1965)
89. Calculate 3, 4 and 5 yearly moving averages from the following data :-
Years 19P 152153 I 54 155 I 56 I 57 I 58 I 591 60 I 61 16 2 1 6 3 164 16,
Value 18 I 20 I 22 I 25 I 30 I 37 I 38 I 38 I 40 I 43 I 45 I 4 6 r 4 8 I 49 I F
90. The age-distribution of the members of a certain children'S club is as


follows: Age on last birthday
(in yrs.) 6 71 8
Frequency
4
5 r151
-9 +-1-l....;8~.!.-3..!-5--'-1 9110 III 112
-4-1-+--3-"1---',1C-1-5-~1-7-~

There is a member A such that there are twice as many members older than A
as there are members younger than A. Estimate his age (in years up to two decimals).
(M. A., Eco., Delhi, 1963).
91. The arithmetic mean, the mode and the median of a group of 75 observations
were calculated to be 17, H, 19 respectively. It was later discovered that one ob-
servation was wrongly read as 43 instead of the correct value 53. Examine to what
extent the calculated values of the three averages will be affected by the discovery
of this error. (M. A., Eco., Delhi, 1963).
92. If the mode and the median of a moderately asymmetrical series are 166 and
15.6 respectively, what would be its most probable median? (B. Com., Agra, 1960).
93. Under what conditions is a weighted average (i) equal to the simple average, (ii)
greater than the simple average and (iii) less than the simple average? Illustrate your answer
with the help of examples.
94. (a) A train starts from rest and travels successive quarters of a mile at average
speeds of 12, 16, 24 and 48 miles per hour. The average speed over the whole
mile is 19.2 m.p.h. and not 25 m.p.h.
(b) The price of a commodity increased by 5 percent from 1954 to 1955,
by 8 percent from 1955 to 1956 and by 77 percent from 1956 to 1957. The average
increase from 1954 to 1957 is quoted as 26 percent and not 30 percent.
Explain the two statements as you would to a layman and verify the
arithmetic. (M. Com., Agra, 1962)
95. If the arithmetic mean of two numbers is 20 and their geometric mean is 16, find
the harmonic mean.
Measures of Dispersion 10
Need and meaning. In the preceding chapters we have already
discussed why it is necessary to tabulate and classify statistical series
and to condense them into a single figure called an average. The average,
as we have already seen, has its own limitations, and even an ideal average
can represent a series only "as best as a single figure can". No doubt
averages have very great utility in statistical analysis, but they fail
to reveal the entire story of a phenomenon. There may be a dozen
series whose averages are identical but which differ from each
other in a hundred ways. Obviously in such cases further statistical
analysis of the data is necessary so that these differences between various
series may also be studied and accounted for. If this is done, statistical
analysis will be more accurate and we shall be more confident of our
conclusions.
Suppose there are three series of nine items each as follows :

          Series A    Series B    Series C
             40          36           1
             40          37           9
             40          38          20
             40          39          30
             40          40          40
             40          41          50
             40          42          60
             40          43          70
             40          44          80
Total       360         360         360
Mean         40          40          40

In the first series the mean is 40 and the value of all the items
is identical. The items are not at all scattered, and the mean fully
discloses the characteristics of this distribution. In the second case,
though the mean is 40, the items of the series all have different
values. But the items are not very much scattered, as the
minimum value of the series is 36 and the maximum is 44. In this
case also the mean is a good representative of the series. Here the mean
cannot replace each item, yet the difference between the mean and
the other items is not very significant. In the third series also the mean
is 40 and the values of the different items are also different, but here the
values are very widely scattered: the mean is 40 times the
smallest value of the series and half of the maximum value. Obviously
the average does not satisfactorily represent the individual items in
this group. In order to have a correct analysis of these three series,
it is essential that we study something more than their averages, because
the averages are identical and yet the series differ widely from each other in
their formation. The scatter in the first case is nil, in the second case
it varies within a small range, while in the third case the values range
over a very wide span and are widely scattered. It is evident from
the above that the extent of the scatter round an average should
also be studied to throw more light on the composition of a series. The
name given to this scatter is dispersion.
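
The comparison of the three series can be checked with a short sketch. The function names `mean` and `spread` are illustrative choices made here, not terms from the text; `spread` simply takes the distance between the largest and smallest item as a rough indication of scatter.

```python
# The three nine-item series from the example above.
series = {
    "A": [40] * 9,
    "B": [36, 37, 38, 39, 40, 41, 42, 43, 44],
    "C": [1, 9, 20, 30, 40, 50, 60, 70, 80],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def spread(values):
    """Distance between the largest and the smallest item."""
    return max(values) - min(values)

# Every series has the same mean, but the scatter differs widely.
for name, values in series.items():
    print(name, "mean =", mean(values), "spread =", spread(values))
```

The means agree in all three cases, while the spread is nil for Series A, small for Series B and very large for Series C.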
Dispersion in a general sense. Dispersion, thus, refers to the variability
in the size of items. It indicates that the size of items in a series is not
uniform. The values of the various items differ from each other. If this
variation is substantial, dispersion is said to be considerable, and if the
variation is little, dispersion is insignificant. This is rather a general sense
in which this term is used. If there is a series in which the scatter of the
values is large, say, from 100 to 1000, this series would be said to have
more dispersion than one in which the values range only from 100
to 200.
Dispersion in a precise sense. The term dispersion not only gives a
general impression about the variability of a series, but also a precise
measure of this variation. Usually in a precise study of dispersion, the
deviations of the size of items from a measure of central tendency are found
out and then these deviations are averaged, to give a single figure
representing the dispersion of the series. This figure can be compared
with similar figures representing other series. It goes without saying
that such comparisons would give a better idea about the formation of
series than a mere comparison of their averages.
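
A minimal sketch of this idea, assuming the deviations are taken from the arithmetic mean and that their signs are ignored before averaging (the passage does not yet fix either choice; the name `mean_deviation` is used here only for illustration):

```python
def mean_deviation(values):
    """Average of the absolute deviations of the items from their mean.

    Assumption: deviations are measured from the arithmetic mean and
    their signs are ignored; other choices of average are possible.
    """
    centre = sum(values) / len(values)
    return sum(abs(v - centre) for v in values) / len(values)

series_b = [36, 37, 38, 39, 40, 41, 42, 43, 44]
series_c = [1, 9, 20, 30, 40, 50, 60, 70, 80]

# A single figure per series, which can then be compared across series.
print(round(mean_deviation(series_b), 2))
print(round(mean_deviation(series_c), 2))
```

The figure for the widely scattered series is ten times that for the compact one, even though the two means are identical.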

Averages of the second order. Since for a precise study of dispersion we
have to average the deviations of the values of the various items from their
average, the various measures of dispersion are called Averages of the Second
Order. We have seen earlier that mean, median, mode, geometric mean
and harmonic mean, etc., are all averages of the first order. Since in the
calculation of measures of dispersion we average values derived by the
use of the averages of the first order, these measures are called averages
of the second order.

Absolute and relative dispersion. Dispersion or variation can be
expressed either in terms of the original units of a series or as an abstract
figure like a ratio or percentage. If we calculate the dispersion of a series
relating to the income of a group of persons in absolute figures, it will
have to be expressed in the unit in which the original data are, say rupees.
Thus we can say that the average income of a group of persons is Rs. 120 per
month and the dispersion is Rs. 20. This is called Absolute Dispersion.
If, on the other hand, dispersion is measured as a percentage or ratio
of the average, it is called Relative Dispersion. It is not expressed in the
unit of the original data. In the above case the average income would be
referred to as Rs. 120 per month and the relative dispersion as 20/120, or .167,
or 16.7%. In a comparison of the variability of two or more series, it is
the relative dispersion that has to be taken into account, as the absolute
dispersion may be erroneous or unfit for comparison if the series are
originally expressed in different units.
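
The arithmetic of the income example can be written out as a small sketch. The function name `relative_dispersion` is an illustrative choice; the figures are those of the example above.

```python
def relative_dispersion(absolute_dispersion, average):
    """Dispersion expressed as a ratio of the average (a pure number)."""
    return absolute_dispersion / average

average_income = 120    # Rs. per month
absolute = 20           # Rs., the absolute dispersion

ratio = relative_dispersion(absolute, average_income)
print(round(ratio, 3))          # as a ratio
print(round(ratio * 100, 1))    # as a percentage
```

Because the rupee units cancel in the division, the result can be set beside the relative dispersion of a series measured in any other unit.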
Measures of dispersion
The following measures of dispersion are in common use :
1. Range
2. Inter-Quartile-Range
3. Semi-Inter-Quartile-Range or Quartile Deviation
4. Average Deviation or Mean Deviation
5. Standard Deviation or Root-Mean-Square Deviation taken
from the mean.
We shall discuss them in turn.
RANGE
Range is the simplest possible measure of dispersion. It is the
difference between the values of the extreme items of a series. Thus if, in a series
relating to the weight measurements of a group of students, the lightest
student has a weight of 90 pounds and the heaviest of 240 pounds, the
value of range would be 150 pounds. This figure indicates the variability
in the weights of students. The distance on the scale measuring 150
pounds would include the weight of every student. If the data are given
in the shape of a continuous frequency distribution, range is the difference
between the lower limit of the smallest class and the upper limit of the
biggest class.
Range as calculated above is an absolute measure of dispersion which
is unfit for purposes of comparison, if the distributions are in different
units. For example the range of the weights of students cannot be
compared with the range of their height measurements as the range of
weights would be in pounds and that of heights in inches. Sometimes,
for purposes of comparison, a relative measure of range is calculated.
If range is divided by the sum of the extreme items, the resulting figure
is called "The Ratio of the Range" or "The Coefficient of the Scatter."
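The two measures just described can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the list of weights is hypothetical apart from its two extremes (90 and 240 pounds), which are taken from the text.

```python
def value_range(values):
    """Absolute range: the largest value minus the smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range: (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


# Hypothetical weights in pounds; only the extremes come from the text.
weights_lb = [90, 120, 150, 185, 240]

print(value_range(weights_lb))           # in pounds
print(coefficient_of_range(weights_lb))  # a pure number, free of units
```

Because the coefficient is a ratio, it can be compared with the corresponding figure for heights in inches, which the absolute range cannot.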
Merits, demerits and uses of range
A good measure of dispersion should possess the same qualities
which were laid down in the last chapter for a good measure of central
tendency. A good measure of dispersion should be rigidly defined,
easily calculated, readily understood and further, should be capable of
algebraic treatment and should not be affected much by the fluctuations
of sampling.
The only merits possessed by range are that it can be easily calculated
and readily understood. As against these, there are many drawbacks from
which it suffers. The most important point against range is that it is
affected very greatly by fluctuations of sampling. Its value is never
stable and it varies from sample to sample. In a class where normally
the heights of students range from 60 to 72 inches, if a dwarf whose
height is 36 inches is admitted, the range would shoot up from 12 inches to
36 inches. Thus a single variation in the value of an extreme item
affects the value of the range. Range is not based on all the observations
of the series. If the heights of the shortest and tallest students remain
unchanged and if the heights of all other students are changed, range
would remain unaffected. Thus range does not take into account the
composition of a series or the distribution of items within the extremes.
The range of a symmetrical and an asymmetrical distribution can be
identical. Two such distributions can never have the same dispersion.
In this way we find that range is a very unsatisfactory measure of
dispersion and should be used with extreme caution.
However, range as a measure of dispersion is commonly used in
some fields, particularly those where the variation is not much. In
quality control of manufactured products, range is used to study the
variation in the quality of the units manufactured. Even with the most
modern mechanical equipment there may be a small, almost insignificant,
difference in the different units of a commodity manufactured.
Thus, if a company is manufacturing bottles of a particular type, there
may be a slight variation in the size or shape of the bottles manufactured.
In such cases a range is usually determined, and all the units which fall
within these limits are passed as all right while those which fall outside
the limits are rejected. Variations in money rates, rates of exchange,
etc., are also studied with range. However, it should never be forgotten
that range is a very rough measure of dispersion and is entirely
unsuitable for precise and accurate studies.
INTER-QUARTILE RANGE

Just as in case of range the difference of extreme items is found,
similarly, if the difference in the values of the two quartiles is calculated,
it would give us what is called the Inter-Quartile Range. Inter-quartile
range is also a measure of dispersion. It has an advantage over range,
inasmuch as it is not affected by the values of the extreme items. In
fact 50% of the values of a variable are between the two quartiles and as
such the inter-quartile range gives a fair measure of variability. However,
the inter-quartile range suffers from the same defects from which
range suffers. It is also affected by fluctuations of sampling and is not
based on all the observations of a series. It is a measure of location,
and its value is not very stable. The inclusion or exclusion of a single
item may sometimes considerably affect its value. It does not take into
account the composition of a series. It is not capable of further algebraic
manipulation. But inter-quartile range is easy to calculate and is readily
understood.
Sometimes percentile range is also calculated. Since range is affected
by the values of extreme items, and since inter-quartile range leaves
out 50% of the values, a percentile range which takes into account, say, the
90th and the 10th percentiles would give a better measure of dispersion
than either of these two. If the difference of the 90th and the 10th
percentiles is found out it will be called the 10-90 percentile range. Unlike
range it has the advantage of not being affected by the values of the
extreme items of a series and it also does not leave aside 50% of the
values as the inter-quartile range does. A 10-90 percentile range would
leave only 20% of the values at the extremes. It, however, suffers from
most of those defects from which range and inter-quartile range suffer.
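The three "method of limits" measures discussed so far can be compared on one set of figures. The sketch below is illustrative only: the data are hypothetical (the whole numbers 1 to 99), the function names are our own, and the percentiles are located with `statistics.quantiles` from the Python standard library, whose default ("exclusive") interpolation rule is an assumption and may differ slightly from a hand calculation.

```python
import statistics


def percentile_range(values, low=10, high=90):
    """Difference between the high-th and the low-th percentiles."""
    cuts = statistics.quantiles(values, n=100)  # 99 cut points: P1 ... P99
    return cuts[high - 1] - cuts[low - 1]


def inter_quartile_range(values):
    """Difference between the upper and the lower quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = list(range(1, 100))  # hypothetical series: 1, 2, ..., 99

print(max(data) - min(data))       # range: uses only the two extremes
print(inter_quartile_range(data))  # spans the middle 50% of the items
print(percentile_range(data))      # spans the middle 80% of the items
```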

SEMI-INTER-QUARTILE RANGE

Semi-inter-quartile range, as the name suggests, is one half
of the inter-quartile range. In other words, it is one half of the difference
between the third quartile and the first quartile. Symbolically,

$$\text{Semi-inter-quartile range or Quartile deviation} = \frac{Q_3 - Q_1}{2}$$

Where $Q_3$ and $Q_1$ stand for the upper and lower quartiles respectively.
In a symmetrical series median lies half way on the scale from Q1
to Q3. If, therefore, the value of the quartile deviation is added to the
lower quartile or subtracted from the upper quartile, in a symmetrical
series, the resulting figure would be the value of the median. But
generally series are not symmetrical and in a moderately asymmetrical
series Q1 + quartile deviation or Q3 - quartile deviation would not give
the value of the median. There would be a difference between the two
figures and the greater the difference, the greater would be the extent of
departure from normality.
Quartile deviation is an absolute measure of dispersion. If it
is divided by the average value of the two quartiles, a relative measure
of dispersion is obtained. It is called the Coefficient of Quartile Deviation.
Symbolically

$$\text{Coefficient of quartile deviation} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

The following example would clarify the procedure of the calculation
of the quartile deviation and its coefficient:
Example 1. Calculate the Semi-Inter-Quartile Range and its
coefficient of the marks of 59 students in Economics given below.

Marks-group     No. of Students
 0-10                 4
10-20                 8
20-30                11
30-40                15
40-50                12
50-60                 6
60-70                 3
Solution. Computation of the Semi-Inter-Quartile Range

Marks-group     No. of students     Cumulative frequency
 0-10                 4                      4
10-20                 8                     12
20-30                11                     23
30-40                15                     38
40-50                12                     50
50-60                 6                     56
60-70                 3                     59

First Quartile or Q1 = the marks of the 59/4, i.e., 14.75th student, which lie in the
20-30 marks-group. By interpolation,

$$Q_1 = 20 + \frac{30 - 20}{11}(14.75 - 12) = 22.5 \text{ marks}$$

Third Quartile or Q3 = the marks of the 3(59)/4, i.e., 44.25th student, which lie in the
40-50 marks-group. By interpolation,

$$Q_3 = 40 + \frac{50 - 40}{12}(44.25 - 38) = 45.2 \text{ marks}$$

$$\text{Semi-inter-quartile range} = \frac{Q_3 - Q_1}{2} = \frac{45.2 - 22.5}{2} = 11.35 \text{ marks}$$

$$\text{Coefficient of the S. I. range} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45.2 - 22.5}{45.2 + 22.5} = .335$$

Median or M = the marks of the 59/2, i.e., 29.5th student, which lie in
the 30-40 group.

$$M = 30 + \frac{40 - 30}{15}(29.5 - 23) = 34.33 \text{ marks}$$

In the above example the quartile deviation is 11.35 marks. If
these marks are added to the lower quartile the resulting figure would
be 33.85 and if they are subtracted from the upper quartile it will again be
33.85. The actual value of the median is 34.33. It shows that the series
is not perfectly normal though the departure from normality is not much.
It, however, reveals that the dispersion of items on the two sides of the
median is almost equal.
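The interpolation used in Example 1 can be sketched as a short Python routine. It is a minimal illustration rather than a general-purpose tool: it assumes equal class widths, takes the rank of the first quartile as n/4 (the convention followed in the example), and the function name `grouped_quantile` is our own.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Value of the item of rank `position` in a continuous frequency
    distribution, by linear interpolation inside the class holding it."""
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("position exceeds the total frequency")


# Marks of 59 students, from Example 1.
lower_limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [4, 8, 11, 15, 12, 6, 3]
n = sum(freqs)

q1 = grouped_quantile(lower_limits, 10, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 10, freqs, 3 * n / 4)
median = grouped_quantile(lower_limits, 10, freqs, n / 2)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, median, quartile_deviation, coefficient)
```

Note that Q1 plus the quartile deviation and Q3 minus the quartile deviation are always the same figure, namely the mid-point of the two quartiles; it is the distance of the median from this mid-point that indicates asymmetry.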
Merits and drawbacks of quartile deviation
The quartile deviation possesses the merits of simple calculation and
easy understandability. It is commonly understood and its calculation
does not involve any mathematical intricacies. These are the points in
favour of quartile deviation but there are a large number of points which
go against it. Quartile deviation is neither based on all the observations
of the data, nor is it capable of further algebraic treatment. It is affected
to a considerable extent by the fluctuations of sampling. A change in
the value of a single item may in certain cases affect its value considerably.
Thus quartile deviation is not a very good measure of dispersion,
particularly for series in which the variation is considerable. However, for
rough studies, quartile deviation may give an approximate idea of the
extent of variability in a series.

AVERAGE DEVIATION OR MEAN DEVIATION

Method of limits and method of averaging deviations. All the above
mentioned measures of dispersion suffer from one serious defect, i.e.,
they are calculated by taking into account only two values of a series:
either the extreme values as in the case of range, or the values of quartiles
as in case of quartile deviation. They ignore the other values of the
series. They are not based on all the observations of the series. This
method of studying dispersion (by location of limits) is also called the
"method of limits." Range, inter-quartile range and quartile deviation are
all such measures in which dispersion is studied by the method of limits.
This is a serious drawback because in such calculations the composition
of the series is entirely ignored. It is possible that the range or the quartile
deviation of two series is identical and their composition very much
dissimilar. It is, therefore, always better to have such a measure of
dispersion which takes into account all the observations of a series and is
calculated in relation to a central value. Range and quartile deviation,
as we have seen, are not calculated in relation to any average. If the
variations of items were calculated from an average, such a measure of
dispersion would throw light on the formation of the series and the
dispersal of items round a central value. This method of calculating
dispersion is called the method of averaging deviations. As the name suggests,
in this method, the deviations of items from a measure of central tendency
are averaged to study the dispersion of the series.
Mean deviation is such a measure of dispersion. "Mean deviation of a
series is the arithmetic average of the deviations of various items from a measure
of central tendency (either mean, median or mode)." Theoretically, deviations
can be taken from any of the three averages mentioned above but in actual
practice mean deviation is calculated either from mean or from median.
Mode is usually not considered, as its value is indeterminate, and it gives
erroneous conclusions. Between mean and median the latter is supposed
to be better than the former, because the sum of the deviations from the
median (signs ignored) is never greater than the sum of the deviations from
the mean. Therefore, the value of the mean deviation from median is never
greater than the value calculated from mean. In aggregating deviations the
algebraic signs are not taken into account. It should be remembered that if
algebraic signs were taken into account the sum of the deviations from the mean
would be zero and from the median would be nearly zero if the series is
moderately asymmetrical. Since the purpose of a measure of dispersion
is to study the variation of items from a central value, it does not matter
in the least if the plus and minus signs are ignored. However, leaving out
the plus and minus signs renders mean deviation incapable of further
algebraic treatment. Mean deviation is also known as the first moment
of dispersion. Symbolically

(i) $$\delta = \frac{\sum d}{n}$$

Where $\delta$ stands for the mean deviation from mean, $d$ for the deviations
from the mean (signs ignored), and $n$ for the number of items.

(ii) $$\delta_m = \frac{\sum d_m}{n}$$

Where $\delta_m$ stands for the mean deviation from median, $d_m$ for the
deviations from the median (signs ignored), and $n$ for the number of items.

(iii) $$\delta_z = \frac{\sum d_z}{n}$$

Where $\delta_z$ stands for the mean deviation from mode, $d_z$ for the deviations
from the mode (signs ignored), and $n$ for the number of items.
Mean deviation or first moment of dispersion, as calculated above,
would be an absolute measure of dispersion, expressed in the same units
in which the original data are. In order to transform it into a relative
measure, it is divided by the average from which it has been calculated.
It is then known as the Mean coefficient of dispersion.
Thus, the mean coefficient of dispersion from mean, median and mode
would be respectively:

$$\frac{\delta}{a}, \quad \frac{\delta_m}{M} \quad \text{and} \quad \frac{\delta_z}{Z}$$
Calculation of mean deviation and its coefficient in a series of
individual observations
As has been said earlier, mean deviation should be calculated either
from arithmetic average or median, preferably from the latter. In the
illustrations given below we shall show how mean deviation can be
calculated by the direct and short-cut methods from mean as well as median.
In the direct method, as we have seen above, the mean deviation would
be calculated by totalling the deviations from the mean or median (plus
and minus ignored) and dividing this total by the number of items.
In the short-cut method the actual mean or median is located and the
totals of the values of items below it and above it are found
out. The former is subtracted from the latter and divided by the number
of items. The resulting figure is the required mean deviation.
Symbolically

(i) $$\delta_m = \frac{1}{n}(m_y - m_x)$$

Where $\delta_m$ stands for the mean deviation from median, $m_y$ for
the total of the values above the actual median, $m_x$ for the total of the
values below it, and $n$ for the number of items.

(ii) $$\delta = \frac{1}{n}(a_y - a_x)$$

Where $\delta$ stands for the mean deviation from mean, $a_y$ stands for the
total of the values above the actual arithmetic average and $a_x$ for the total
of the values below it. Both forms assume that as many items lie above the
average as below it; when the two numbers differ, the fuller form
$\frac{1}{n}[(a_y - n_y \times a) + (n_x \times a - a_x)]$, used in Example 3 below, applies.
The following example would illustrate these formulae:
Example 2. The following are the marks obtained by a batch of 9
students in a certain test:

Serial No.     Marks            Serial No.     Marks
            (out of 100)                    (out of 100)
    1            68                 5            54
    2            49                 6            38
    3            32                 7            59
    4            21                 8            66
                                    9            41

Calculate the mean deviation of the series.
Solution. Direct method. Calculation of mean deviation of the series
of marks of 9 students (arranged in ascending order of magnitude).

Students     Marks     Deviations from median (49)
                       (+ and - signs ignored)
              (m)              (dm)
    1          21               28
    2          32               17
    3          38               11
    4          41                8
    5          49                0
    6          54                5
    7          59               10
    8          66               17
    9          68               19
                          Σdm = 115

Median = value of the (n+1)/2 th item, i.e., the 5th item
       = 49 marks

$$\text{Mean deviation or } \delta_m = \frac{\sum d_m}{n}$$

Where $\sum d_m$ represents the summation of the deviations from the
median, and $n$ the number of items.

$$\delta_m = \frac{115}{9} \text{ marks} = 12.8 \text{ marks}$$
Short-cut method. Marks arranged in ascending order of magnitude

Marks
 (m)
 21
 32
 38
 41
 49
 54
 59
 66
 68

Sum of items with values less than the median
=(21+32+38+41)=132 (mx)
Sum of items with values more than the median
=(54+59+66+68)=247 (my)

$$\text{Mean Deviation} = \frac{1}{n}(m_y - m_x) = \frac{1}{9}(247 - 132) = \frac{1}{9} \times 115 = 12.8 \text{ marks}$$
Example 3. Calculate the mean deviation (from the arithmetic
average) and its co-efficient for the following prices of a Government
security.
Rs. 100 1/2, 100 1/4, 100 3/8, 100 5/8, 100 3/4, 100 1/8, 100 6/16, 100 10/16, 100 8/16, 100 1/8

Solution
Calculation of Mean Deviation from the arithmetic average.

Prices Rs.     Deviations from arithmetic average
                        (Rs. 100.425)
                   (+ and - signs ignored)
   (m)                      (d)
100.500                    .075
100.250                    .175
100.375                    .050
100.625                    .200
100.750                    .325
100.125                    .300
100.375                    .050
100.625                    .200
100.500                    .075
100.125                    .300
Σm = 1004.250          Σd = 1.750

$$\text{Arithmetic average} = \frac{\sum m}{n} = \frac{1004.25}{10} = \text{Rs. } 100.425$$

$$\text{Mean Deviation from the Arith. Av., or } \delta = \frac{\sum d}{n} = \text{Rs. } \frac{1.750}{10} = \text{Rs. } .175$$

$$\text{Co-efficient of dispersion from the Arith. Average} = \frac{\delta}{a} = \frac{.175}{100.425} = .0017 \text{ approx.}$$

Thus the mean deviation of the given prices of Government security
is Rs. .175 and its co-efficient is .0017 approx.
Short-cut method. Prices arranged in ascending order

  Prices
 100.125
 100.125
 100.250
 100.375
 100.375
 100.500
 100.500
 100.625
 100.625
 100.750
1004.250

$$\text{Arithmetic average or } a = \frac{1004.250}{10} = 100.425$$

Number of items smaller than arithmetic
average or $n_x = 5$ and their total or $a_x = 501.250$
Number of items bigger than arithmetic
average or $n_y = 5$ and their total or $a_y = 503.000$

$$\text{Mean deviation} = \frac{1}{n}\left[(a_y - n_y \times a) + (n_x \times a - a_x)\right] = \frac{1}{n}(a_y - a_x) \quad (\text{since } n_x = n_y)$$

$$= \frac{1}{10}(503.000 - 501.250) = \frac{1.750}{10} = .175 \text{ rupees.}$$

Calculation of mean deviation in discrete series

In a discrete series the deviations of items from median or mean
are multiplied with their respective frequencies and the total of these
products is divided by the number of items. The resulting figure is
the required mean deviation.

$$\text{Symbolically (i) } \delta_m = \frac{\sum f d_m}{n} \qquad \text{(ii) } \delta = \frac{\sum f d}{n}$$

The following illustration would clarify the formula:
Example 4. Find mean deviation of the distribution given below:

No. of accidents     Persons having said
                     number of accidents
       0                     15
       1                     16
       2                     21
       3                     10
       4                     17
       5                      8
       6                      4
       7                      2
       8                      1
       9                      2
      10                      2
      11                      0
      12                      2
   Total                    100
Solution. Calculation of the mean deviation

No. of        Persons having said     Deviation from             Total
accidents     number of accidents     Median (2)                 Deviations
                                      (+ and - signs ignored)
  (m)               (f)                    (dm)                   (fdm)
   0                 15                      2                      30
   1                 16                      1                      16
   2                 21                      0                       0
   3                 10                      1                      10
   4                 17                      2                      34
   5                  8                      3                      24
   6                  4                      4                      16
   7                  2                      5                      10
   8                  1                      6                       6
   9                  2                      7                      14
  10                  2                      8                      16
  11                  0                      9                       0
  12                  2                     10                      20
                n = 100                                     Σfdm = 196

The value of median = 2

$$\text{Mean deviation or } \delta_m = \frac{\sum f d_m}{n} = \frac{196}{100} = 1.96 \text{ accidents.}$$

There is no need to calculate mean deviation by a short-cut method
as the median value in a discrete series is usually a round number and
there is no difficulty in the calculation of mean deviation.
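The calculation of Example 4 can be sketched in Python as follows. The function names are our own, and the median is found by simply writing out every item of the series, which is adequate for a small table of this kind.

```python
import statistics


def discrete_median(values, freqs):
    """Median of a discrete series, found by listing each value f times."""
    expanded = [v for v, f in zip(values, freqs) for _ in range(f)]
    return statistics.median(expanded)


def discrete_mean_deviation(values, freqs, centre):
    """Sum of f times the deviation from `centre` (signs ignored), over n."""
    n = sum(freqs)
    return sum(f * abs(v - centre) for v, f in zip(values, freqs)) / n


accidents = list(range(13))                               # 0, 1, ..., 12
persons = [15, 16, 21, 10, 17, 8, 4, 2, 1, 2, 2, 0, 2]    # Example 4

median = discrete_median(accidents, persons)
print(median)
print(discrete_mean_deviation(accidents, persons, median))
```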
Example 5. Calculate the mean deviation (from mean) of the
following series:

Marks     No. of students
  5              5
 15              8
 25             15
 35             16
 45              6

Solution. Calculation of mean deviation

Marks     Step deviation     No. of                  Deviation from
          from as. av.       students                actual average
             (25)                                        (27)
 (m)         (d')              (f)        (fd')          (d)         (fd)
  5           -2                5          -10            22          110
 15           -1                8          - 8            12           96
 25            0               15            0             2           30
 35           +1               16          +16             8          128
 45           +2                6          +12            18          108
                           Σf = 50     Σfd' = +10                 Σfd = 472

$$\text{Arithmetic average or } a = 25 + \left(\frac{10}{50} \times 10\right) = 27$$

$$\text{Mean deviation} = \frac{\sum f d}{\sum f} = \frac{472}{50} \text{ marks} = 9.44 \text{ marks.}$$
While calculating the mean deviation from the mean it may be
found more convenient to use a short-cut method, by assuming an
arithmetic average. The process of calculating mean deviation by the
short-cut method involves the following steps:
(1) Deviations of items are taken from an assumed mean (signs ignored) and
multiplied by their respective frequencies and the products so obtained
are totalled.
(2) Number of items less than the actual arithmetic average is
multiplied by the difference between the actual and the assumed mean
(actual minus assumed, taken with its sign).
(3) Similarly, the number of items more than the arithmetic average
is multiplied by the difference between the actual and the assumed mean.
(4) The latter (No. 3) is deducted from the former (No. 2) and the
balance is added to the sum of the products of deviations from the assumed
mean and their frequencies (No. 1).
(5) The resulting figure is divided by the number of items and it is
the value of the mean deviation.

Example 5 is solved below by this short-cut method:

Marks     Step deviations     Number of      (fd')               (fd')
          from as. av. (25)   Students       (+ and - signs      (with + and -
 (m)           (d')             (f)          ignored)            signs)
  5             -2               5             10                  -10
 15             -1               8              8                  - 8
 25              0              15              0                    0
 35             +1              16             16                  +16
 45             +2               6             12                  +12
                            Σf = 50        Σfd' = 46           Σfd' = +10

$$\text{Actual arithmetic average} = 25 + \left(\frac{10}{50} \times 10\right) = 27$$

Total deviations from assumed average of 25
= 46 x 10 = 460

Adjustments
Number of items less than the actual arithmetic average (27) = 28
Number of items more than the actual arithmetic average (27) = 22
Difference between actual and assumed average = 2
Total deviations from actual average or (Σfd)
= 460 + (28 x 2) - (22 x 2)
= 460 + 56 - 44 = 472

$$\text{Mean deviation} = \frac{\sum f d}{n} = \frac{472}{50} = 9.44 \text{ marks}$$
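The five steps of the short-cut method can be sketched in Python and checked against the direct calculation. This is only an illustration with names of our own; it covers the simple case treated here, in which no item lies strictly between the assumed and the actual mean, and it refuses to proceed otherwise.

```python
def mean_deviation_via_assumed_mean(values, freqs, assumed):
    """Mean deviation from the mean, worked from an assumed mean."""
    pairs = list(zip(values, freqs))
    n = sum(freqs)
    # Actual mean, from the signed deviations about the assumed mean.
    mean = assumed + sum(f * (v - assumed) for v, f in pairs) / n
    shift = mean - assumed  # actual minus assumed, with its sign
    low, high = min(mean, assumed), max(mean, assumed)
    if any(f > 0 and low < v < high for v, f in pairs):
        raise ValueError("an item lies between the two means; "
                         "a further adjustment is needed")
    # Step 1: total deviations from the assumed mean, signs ignored.
    total = sum(f * abs(v - assumed) for v, f in pairs)
    # Steps 2 to 4: adjust for the items on either side of the two means.
    n_low = sum(f for v, f in pairs if v <= low)
    n_high = sum(f for v, f in pairs if v >= high)
    total += n_low * shift - n_high * shift
    # Step 5: divide by the number of items.
    return total / n


marks = [5, 15, 25, 35, 45]      # Example 5
students = [5, 8, 15, 16, 6]
print(mean_deviation_via_assumed_mean(marks, students, assumed=25))
```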
Calculation of mean deviation in continuous series
The calculation of mean deviation in continuous series is done by
the same procedure by which it is done in discrete series. In the
short-cut method also the same procedure is followed, provided the assumed
mean or median is in the same class-interval in which the actual mean or
median is. If the assumed average is in a different class interval, further
adjustments are necessary. The following examples would illustrate
these procedures:
Example 6. Calculate the mean deviation (from median) from the
following data:

Class interval     Frequency     Class interval     Frequency
    1-3                 6             9-11               21
    3-5                53            11-13               16
    5-7                85            13-15                4
    7-9                56            15-17                4

Solution. Direct and short-cut methods. The median of the above series
is 6.5.

Class       Mid-      Deviations       Fre-       Dev. x        Dev. from      Dev. x
interval    points    from actual      quency     frequency     assumed        frequency
                      median (6.5)                              median (6)
                         (dm)           (f)        (fdm)          (d'm)         (fd'm)
  1-3          2          4.5             6         27.0            4             24
  3-5          4          2.5            53        132.5            2            106
  5-7          6           .5            85         42.5            0              0
  7-9          8          1.5            56         84.0            2            112
  9-11        10          3.5            21         73.5            4             84
 11-13        12          5.5            16         88.0            6             96
 13-15        14          7.5             4         30.0            8             32
 15-17        16          9.5             4         38.0           10             40
 Total                                  245        515.5                         494

Direct Method.

$$\text{Mean deviation} = \frac{\sum f d_m}{n} = \frac{515.5}{245} = 2.1$$

Short-Cut Method. Total of deviations from assumed median = 494
No. of items with values less than the actual median (6.5)
= (6+53+85) = 144
No. of items with values more than the actual median
= (56+21+16+4+4) = 101
Difference between actual and assumed median = (6.5-6) = .5
Total deviation from actual median (when actual and assumed
medians are in the same class interval)
= 494 + (144 x .5) - (101 x .5)
= 494 + 72 - 50.5 = 515.5

$$\text{Mean deviation} = \frac{515.5}{245} = 2.1$$
Example 7. Calculate the mean deviation (from mean) from the
following data:
Difference in age between husband and wife in a particular community:

Difference in years     Frequency     Difference in years     Frequency
      0-5                  449              20-25                109
      5-10                 705              25-30                 52
     10-15                 507              30-35                 16
     15-20                 281              35-40                  4

Direct and short-cut methods
Calculation of the Mean Deviation

Differ-     Mid-       Fre-       Deviation from     dx/i      Total step      Deviation       Total
ence in     values     quency     the as. av.        where     deviations      from the        devia-
years                             (12.5)             i = 5     from the        av. (10.5)      tions
                                                               assumed         (+ and -
                                                               average         signs ignored)
 (m)        (m.v.)      (f)          (dx)                      (f dx/i)          (d)           (fd)
  0-5         2.5       449          -10              -2         -898              8           3,592
  5-10        7.5       705           -5              -1         -705              3           2,115
 10-15       12.5       507            0               0            0              2           1,014
 15-20       17.5       281           +5              +1         +281              7           1,967
 20-25       22.5       109          +10              +2         +218             12           1,308
 25-30       27.5        52          +15              +3         +156             17             884
 30-35       32.5        16          +20              +4          +64             22             352
 35-40       37.5         4          +25              +5          +20             27             108
                    n = 2123                               Σf dx/i = -864                Σfd = 11,340

$$\text{Arithmetic average} = 12.5 + \left(\frac{-864}{2123} \times 5\right) = 10.5$$

$$\text{Mean Deviation} = \frac{\sum f d}{n} = \frac{11340}{2123} = 5.3$$

Short-cut Method. Total deviations from assumed average (± signs
ignored) = (2342 x 5) = 11710
Number of items smaller than actual arithmetic average (10.5) = 1154
Number of items bigger than actual arithmetic average (10.5) = 969
Difference between actual and assumed average = -2
Total deviations from actual arithmetic average (where actual and
assumed averages are in the same class-interval)
= 11710 + (1154 x -2) - (969 x -2)
= 11710 - 2308 + 1938 = 11340

$$\text{Mean deviation} = \frac{11340}{2123} = 5.3$$
When the assumed median or mean is not in the same class interval
in which the actual median or mean is, some further adjustments are
necessary. The following illustration would clarify them:

Example 8. The following table gives the age distribution of
students admitted to a college in the year 1955:
Calculate the mean deviation and its coefficient.

Age     Number of students admitted
        in the year
15-              0
16-              1
17-              3
18-              8
19-             12
20-             14
21-             14
22-              5
23-              2
24-              3
25-              1
26-              0
27-              1
                64

Solution. Computation of the Mean Deviation

Age-group     Mid-value     Frequency     Deviation     Total           Deviation       Total
              of the                      from the      deviations      from the        deviations
              group                       as. av.       from the        average         from the
                                          (19.5)        as. av.         (20.7)          average
                                                                        (+ and - signs
                                                                        ignored)
   (m)         (m.v.)         (f)           (dx)          (fdx)           (d)            (fd)
  15-16         15.5           0             -4              0            5.2             0
  16-17         16.5           1             -3             -3            4.2             4.2
  17-18         17.5           3             -2             -6            3.2             9.6
  18-19         18.5           8             -1             -8            2.2            17.6
  19-20         19.5          12              0              0            1.2            14.4
  20-21         20.5          14             +1            +14             .2             2.8
  21-22         21.5          14             +2            +28             .8            11.2
  22-23         22.5           5             +3            +15            1.8             9.0
  23-24         23.5           2             +4            + 8            2.8             5.6
  24-25         24.5           3             +5            +15            3.8            11.4
  25-26         25.5           1             +6            + 6            4.8             4.8
  26-27         26.5           0             +7              0            5.8             0
  27-28         27.5           1             +8            + 8            6.8             6.8
                           n = 64                      Σfdx = +77                   Σfd = 97.4

$$\text{Arithmetic average or } a = x + \frac{\sum f d_x}{n} = 19.5 + \frac{77}{64} = 20.7 \text{ years.}$$

$$\text{Mean deviation or } \delta = \frac{\sum f d}{n} = \frac{97.4}{64} = 1.52 \text{ years approx.}$$

$$\text{Mean coefficient of dispersion} = \frac{\delta}{a} = \frac{1.52}{20.7} = .07$$
Short-cut Method. Total deviation from assumed average (± signs
ignored) = 111.
Note. Where actual and assumed averages are in different class
intervals a special adjustment is necessary. In such cases the frequency
of the class in which the actual mean lies is treated separately. It is
multiplied by the difference of the deviations of the mid-value from the
actual and assumed averages. The product so obtained is subtracted
from the total deviation from the assumed mean.
Thus, number of items smaller than the actual arithmetic average
(20.7) = 24 (frequency of mean class being ignored)
Number of items bigger than actual arithmetic average (20.7)
= 26
Frequency of the mean class = 14
Difference between actual and assumed averages = 1.2
Difference of deviations of mid-value of mean class from the actual
and assumed averages, each deviation being taken with signs ignored
= .2 - 1
= .8 (± signs ignored)
Total deviations from actual mean
= 111 + (24 x 1.2) - (26 x 1.2) - (14 x .8)
= 111 + 28.8 - 31.2 - 11.2 = 97.4

$$\text{Mean deviation} = \frac{97.4}{64} = 1.52$$

$$\text{Mean co-efficient of dispersion} = \frac{1.52}{20.7} = .07$$


Characteristics and use of mean deviation
(1) Mean deviation is rigidly defined and its value is precise and
definite. However, since mean deviation can be calculated from any
average, it is likely that in some investigations the mean has been used
as base, while in others either median or mode has been used as such.
If it is so, the comparison of the mean deviations would give inaccurate
results. Therefore, it should always be ascertained whether mean
deviation has been calculated by using the mean, median or mode.
(2) The calculation of mean deviation is not very difficult. No
doubt range and quartile deviation have an advantage over mean
deviation in this respect, still the calculation of mean deviation cannot
be said to be a complicated or difficult job.
(3) Mean deviation is readily understood. It is the average of
the deviations from a measure of central tendency.
(4) It is based on all the observations, and unlike quartile deviation
or range, it cannot be calculated in the absence of a single figure.
(5) It is not affected very much by the values of the extreme items.
We shall see later on how the standard deviation is affected by the values
of extremes much more than the mean deviation.
(6) Mean deviation ignores the algebraic signs of the deviations,
and as such, it is not capable of further mathematical treatment.
(7) Mean deviation is not a very accurate measure of dispersion,
particularly when it is calculated from the mode because mode can
be unrepresentative, and even when it is calculated from median, it
cannot be fully relied upon, because if the degree of variability in a
series is high, median is not a representative average. If mean deviation
is calculated from the arithmetic average, it is not very scientific
because the sum of the deviations from the mean (plus and minus signs
ignored) is not less than the sum of the deviations from the median.
Therefore, in many cases, mean deviation may give unsatisfactory
results. In fact, this measure of dispersion is not in common use and
generally dispersion is studied through standard deviation, which, as
we shall see, has many properties not possessed by any other measure
of dispersion. However, mean deviation has found favour with economists
and businessmen due to simplicity in calculation and on account
of the fact that standard deviation gives greater importance to the
deviations of the extreme values.
STANDARD DEVIATION

Meaning. The technique of the calculation of mean deviation is
mathematically illogical as in its calculation the algebraic signs are
ignored. This drawback is removed in the calculation of standard
deviation. One of the easiest ways of doing away with algebraic signs
is to square the figures and this process is adopted in the calculation
of standard deviation. Standard deviation is the square root of the arithmetic
average of the squares of the deviations measured from the mean. Thus in the
calculation of standard deviation, first the arithmetic average is
calculated and the deviations of various items from the arithmetic average
are squared. These squared deviations are totalled and the sum is
divided by the number of items. The square root of the resulting
figure is the standard deviation of the series. The standard deviation
is conventionally represented by the Greek letter sigma, $\sigma$.

Symbolically

$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$

Where $\sigma$ stands for the standard deviation, $\sum d^2$ for the sum of
the squares of the deviations measured from the arithmetic average
and $n$ for the number of items.
Difference between root-mean-square deviation and standard deviation.
Various terms like Mean Error, Mean Square Error and Error of Mean
Square are used to denote the value of standard deviation. We shall
be using the term standard deviation only as it is most popularly used.
Some writers use the term root-mean-square-deviation to denote the
standard deviation. This is technically wrong, because the standard deviation
is only one of the many values that the root-mean-square-deviation
can take. Root-mean-square-deviation is the square root of the arithmetic
average of the squares of deviations measured from any arbitrary value. If the
deviations are measured from the arithmetic average there is no difference
between root-mean-square-deviation and the standard deviation;
in other words, standard deviation is the root-mean-square-deviation
measured from the arithmetic average. If deviations are not measured
from the arithmetic average but from some other value we can find out
the value of the standard deviation from the value of the
root-mean-square-deviation. In fact the short-cut method of calculating the standard
deviation is based on the relationship between standard deviation and
root-mean-square deviation. We shall discuss this point a little later.
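The distinction drawn above can be illustrated numerically. In the sketch below, which uses a function name of our own, the root-mean-square deviation is computed about the arithmetic average and about an arbitrary value; the ten heights are those used in the chapter's standard deviation example, and the arbitrary value 62 is chosen purely for illustration.

```python
import math


def root_mean_square_deviation(values, origin):
    """Square root of the average squared deviation from `origin`."""
    return math.sqrt(sum((v - origin) ** 2 for v in values) / len(values))


heights = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]  # inches
mean = sum(heights) / len(heights)

sigma = root_mean_square_deviation(heights, mean)  # the standard deviation
s = root_mean_square_deviation(heights, 62)        # about an arbitrary value
c = mean - 62                                      # distance between the two origins

print(sigma, s)
print(s ** 2 - c ** 2)  # agrees with sigma ** 2
```

The root-mean-square deviation is smallest when the deviations are measured from the arithmetic average, and its square exceeds the square of the standard deviation by the square of the distance between the two origins.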
Calculation of standard deviation in a series of individual observations
Direct Method No. 1. In a series of individual observations the
deviation of each item from the arithmetic average is found out, and
is squared. The total of these squared deviations is divided by the
number of items. This figure is called the Second Moment about the Mean.
The square root of it is the required standard deviation. The following
example would clarify this procedure:
Example 9. Calculation of standard deviation of the height.
(Direct Method 1)

Height in inches     Deviations from     Deviations squared
                     mean (63")
      (m)                 (d)                  (d²)
       60                 -3                     9
       60                 -3                     9
       61                 -2                     4
       62                 -1                     1
       63                  0                     0
       63                  0                     0
       63                  0                     0
       64                 +1                     1
       64                 +1                     1
       70                 +7                    49
   Σm = 630                                Σd² = 74

$$\text{Arithmetic average or } a = \frac{\sum m}{n} = \frac{630}{10} = 63''$$

$$\text{Standard Deviation or } \sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{74}{10}} = \sqrt{7.4} = 2.72''$$
Direct Method 2. Standard deviation can be calculated by another
method also. In this method the squares of the values of items (not of
deviations) are totalled and from this figure the square of the total of
the values, divided by the number of items, is subtracted. The resulting
figure is again divided by the number of items and its square root is the
required standard deviation.

Symbolically $\sigma = \sqrt{\dfrac{\Sigma m^2 - \dfrac{(\Sigma m)^2}{n}}{n}}$

Where m stands for the values of the variable and n for the number
of items. Example No. 9 above would be solved in the following
manner by this method:

Direct Method 2.

Size of item    Square of item
    (m)              (m²)
    60              3,600
    60              3,600
    61              3,721
    62              3,844
    63              3,969
    63              3,969
    63              3,969
    64              4,096
    64              4,096
    70              4,900
 Σm = 630       Σm² = 39,764

Standard deviation or $\sigma = \sqrt{\dfrac{\Sigma m^2 - \dfrac{(\Sigma m)^2}{n}}{n}} = \sqrt{\dfrac{39{,}764 - \dfrac{(630)^2}{10}}{10}} = \sqrt{\dfrac{74}{10}} = 2.72''$
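As a quick cross-check, the second direct method can be written so that it works only with the total of the values and the total of their squares. The sketch below is illustrative; the function name `std_dev_from_squares` is an assumption of ours.

```python
import math

def std_dev_from_squares(values):
    """Direct Method 2: uses the sum of the items and the sum of their squares."""
    n = len(values)
    total = sum(values)                      # sum of m
    total_sq = sum(v * v for v in values)    # sum of m squared
    return math.sqrt((total_sq - total ** 2 / n) / n)

heights = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]
print(sum(v * v for v in heights))               # 39764
print(round(std_dev_from_squares(heights), 2))   # 2.72
```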
Short-cut method. Standard deviation can also be calculated by
a short-cut method. Here the deviations from an assumed average
are calculated and squared. Their sum is divided by the number of
items, or in other words, the arithmetic average of the squares of
deviations from the assumed average is found out. From this figure the
square of the arithmetic average of the deviations from the assumed
mean is subtracted. The square root of the resulting figure is the
standard deviation.

Symbolically $\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - \left(\dfrac{\Sigma dx}{n}\right)^2}$

Where dx stands for the deviation from the assumed mean.
Example No. 9 would be solved by this method as follows:

Short-cut Method

Size of items    Deviations from        Square of
                 assumed mean 62        deviations
                      (dx)                (dx²)
     60                -2                   4
     60                -2                   4
     61                -1                   1
     62                 0                   0
     63                +1                   1
     63                +1                   1
     63                +1                   1
     64                +2                   4
     64                +2                   4
     70                +8                  64
   Total           Σdx = +10           Σdx² = 84

$\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - \left(\dfrac{\Sigma dx}{n}\right)^2} = \sqrt{\dfrac{84}{10} - \left(\dfrac{10}{10}\right)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72''$

Proof. Let $\sigma = \sqrt{\dfrac{\Sigma f d^2}{n}}$, $s = \sqrt{\dfrac{\Sigma f dx^2}{n}}$ and $c = (a - x)$, where d stands for the deviations from the actual mean a and dx for the deviations from the assumed value x. Then

$\sigma^2 = \dfrac{\Sigma f d^2}{n}$ and $s^2 = \dfrac{\Sigma f dx^2}{n}$

$\sigma^2 = s^2 - c^2$

As s would always be greater than $\sigma$ (unless c is zero), the root-mean-square
deviation from the mean would always be less than the root-mean-square
deviation from any other point.
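This formula can be written in the following ways also:

(i) $\sigma = \sqrt{\dfrac{\Sigma dx^2 - n(a - x)^2}{n}}$

(ii) $\sigma = \sqrt{\dfrac{\Sigma dx^2}{n} - (a - x)^2}$

Where dx stands for the deviations from the assumed average,
a for the actual arithmetic average and x for the assumed average.

Proof. Let d stand for the deviations from the actual mean and let $c = (a - x)$.
Then $dx = d + c$
$(dx)^2 = (d + c)^2 = d^2 + 2cd + c^2$
$\Sigma (dx)^2 = \Sigma d^2 + 2c\,\Sigma d + nc^2$
but $\Sigma d = 0$
$\therefore \Sigma (dx)^2 = \Sigma d^2 + nc^2$
$\dfrac{\Sigma (dx)^2}{n} = \dfrac{\Sigma d^2}{n} + c^2$
$\dfrac{\Sigma d^2}{n} = \dfrac{\Sigma (dx)^2}{n} - c^2 = \dfrac{\Sigma (dx)^2}{n} - (a - x)^2$

Thus in the above example: a = 63 and x = 62. Substituting the
values we get

(i) $\sigma = \sqrt{\dfrac{84 - 10(63 - 62)^2}{10}} = \sqrt{\dfrac{84 - 10}{10}} = \sqrt{7.4} = 2.72$

(ii) $\sigma = \sqrt{\dfrac{84}{10} - (63 - 62)^2} = \sqrt{8.4 - 1} = \sqrt{7.4} = 2.72$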
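The point of the short-cut method is that the answer does not depend on which assumed mean is chosen. The following sketch (function name `std_dev_short_cut` is ours) illustrates this with the heights of Example 9 and two different assumed means.

```python
import math

def std_dev_short_cut(values, assumed_mean):
    """Short-cut method: deviations are taken from an assumed mean x."""
    n = len(values)
    dx = [v - assumed_mean for v in values]
    mean_of_squares = sum(d * d for d in dx) / n   # sum of dx squared, over n
    mean_of_dx = sum(dx) / n                       # equals (a - x)
    return math.sqrt(mean_of_squares - mean_of_dx ** 2)

heights = [60, 60, 61, 62, 63, 63, 63, 64, 64, 70]
print(round(std_dev_short_cut(heights, 62), 2))  # 2.72
print(round(std_dev_short_cut(heights, 65), 2))  # 2.72, same result
```

A convenient assumed mean is simply one that keeps the deviations small and whole.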

The standard deviation is an absolute measure of dispersion. For
purposes of comparison a relative measure of dispersion is calculated
by dividing the standard deviation by the arithmetic average. It is
called "standard coefficient of dispersion" or "coefficient of standard deviation".

Thus, Standard coefficient of dispersion $= \dfrac{\sigma}{a}$

In the above example its value would be $\dfrac{2.72}{63}$ or .04


Calculation of standard deviation in a discrete series

In discrete series the squares of the deviations from the arithmetic
average are multiplied by the respective frequencies of these items.
The total of these products is divided by the total of the frequencies
and the square root of this figure is the standard deviation of the series.

Symbolically $\sigma = \sqrt{\dfrac{\Sigma f d^2}{n}}$

The following illustration would clarify this procedure:

Example 10. Calculate the standard deviation from the following
data:

Size of item    Frequency    Size of item    Frequency
     6              3             10             8
     7              6             11             5
     8              9             12             4
     9             13

Solution. Direct Method. Calculation of Standard Deviation

Size of    Frequency    Size ×       Deviations from    Deviations    Frequency × square
items                   Frequency    the average (9)    squared       of deviations
 (m)         (f)          (mf)            (d)             (d²)             (fd²)
  6           3            18             -3                9               27
  7           6            42             -2                4               24
  8           9            72             -1                1                9
  9          13           117              0                0                0
 10           8            80             +1                1                8
 11           5            55             +2                4               20
 12           4            48             +3                9               36
           n = 48      Σmf = 432                                      Σfd² = 124

Arithmetic average $= \dfrac{\Sigma mf}{n} = \dfrac{432}{48} = 9$

Standard deviation $= \sqrt{\dfrac{\Sigma f d^2}{n}} = \sqrt{\dfrac{124}{48}} = 1.6$
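For a discrete series the same calculation is carried out with frequencies as weights. The sketch below reproduces Example 10; the helper name `discrete_mean_sd` is an illustrative assumption.

```python
import math

def discrete_mean_sd(sizes, freqs):
    """Mean and standard deviation of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(sizes, freqs)) / n
    # Squared deviations from the mean, weighted by frequency.
    sum_fd2 = sum(f * (m - mean) ** 2 for m, f in zip(sizes, freqs))
    return mean, math.sqrt(sum_fd2 / n)

sizes = [6, 7, 8, 9, 10, 11, 12]
freqs = [3, 6, 9, 13, 8, 5, 4]
mean, sd = discrete_mean_sd(sizes, freqs)
print(mean, round(sd, 1))  # 9.0 1.6
```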

In discrete series also, the standard deviation can be calculated by a
short-cut method. The deviations from an assumed mean are first squared
and multiplied by the respective frequencies of items. These products
are totalled and divided by the total of the frequencies. From this
figure the square of the difference between the actual and assumed
average is subtracted. The square root of the resulting figure is the
required standard deviation.

Symbolically

$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - (a - x)^2}$

or $\sigma = \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}}$

or $\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2}$

The following examples would illustrate these formulae:

Example 11. The following table gives the number of finished
articles turned out per day by different numbers of workers in a factory.
Find the mean value and the standard deviation of the daily output
of finished articles.

Number of    Number of    Number of    Number of
articles     workers      articles     workers
   18            3           23           17
   19            7           24           13
   20           11           25            8
   21           14           26            5
   22           18           27            4

Solution. Calculation of standard deviation

Number of    No. of     Deviations from    Total         Deviations    Frequency × square
articles     workers    the assumed        deviations    squared       of deviations
                        average (22)
  (m)          (f)          (dx)             (fdx)         (dx²)           (fdx²)
   18           3            -4               -12            16              48
   19           7            -3               -21             9              63
   20          11            -2               -22             4              44
   21          14            -1               -14             1              14
   22          18             0                 0             0               0
   23          17            +1               +17             1              17
   24          13            +2               +26             4              52
   25           8            +3               +24             9              72
   26           5            +4               +20            16              80
   27           4            +5               +20            25             100
            n = 100                       Σfdx = +38                   Σfdx² = 490

Mean value of the finished articles
$= x + \dfrac{\Sigma f dx}{n} = 22 + \dfrac{38}{100} = 22.38$ articles

Standard deviation
$\sigma = \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}} = \sqrt{\dfrac{490 - 100(0.38)^2}{100}} = \sqrt{4.756} = 2.2$ articles approx.

or

$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} = \sqrt{\dfrac{490}{100} - \left(\dfrac{38}{100}\right)^2} = \sqrt{4.9 - .144} = \sqrt{4.756} = 2.2$

Calculation of standard deviation in continuous series

The technique of the calculation of standard deviation in a
continuous series is exactly the same as it is in discrete series. The class
intervals are represented by their mid-points and once it is done, a
continuous series becomes a discrete series. However, since in
continuous series the class-intervals are usually of equal size the deviations
from the assumed average can be expressed in class-interval units, or
in other words, step-deviations can be found out by dividing the
deviations by the magnitude of the class-intervals. If it is done, a slight
adjustment is necessary in the calculation of the standard deviation.
The formula for the calculation of standard deviation is then written
as follows:

$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} \times i$

Where i stands for the common factor or the magnitude of the
class-interval, and dx stands for the deviations in class-interval units,
and other signs stand for what they stood in previous formulae. The
following examples would illustrate the calculation of standard deviation
in a continuous series by various methods.
Example 12. Calculate the standard deviation for the following table
giving the age distribution of 542 members of the House of Commons.

Age       No. of Members
20-             3
30-            61
40-           132
50-           153
60-           140
70-            51
80-             2
Total         542
Solution. Calculation of the standard deviation of the age distribution
of 542 members of the House of Commons.

Age      Mid-     Frequency    Deviations from    Total         Square of     Frequency × square
group    value                 the assumed        deviations    deviations    of deviations
                               average (55)
          (m)       (f)            (dx)             (fdx)         (dx²)           (fdx²)
20-30      25         3            -30               -90           900             2700
30-40      35        61            -20             -1220           400            24400
40-50      45       132            -10             -1320           100            13200
50-60      55       153              0                 0             0                0
60-70      65       140            +10             +1400           100            14000
70-80      75        51            +20             +1020           400            20400
80-90      85         2            +30               +60           900             1800
                  n = 542                       Σfdx = -150                   Σfdx² = 76500

$(a - x) = \dfrac{-150}{542} = -.28$

Standard deviation $= \sqrt{\dfrac{\Sigma f dx^2 - n(a - x)^2}{n}} = \sqrt{\dfrac{76500 - 542(.28)^2}{542}} = \sqrt{141.07} = 11.9$

The following method will also give us the same result.

Standard deviation of the age distribution $= \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} = \sqrt{\dfrac{76500}{542} - \left(\dfrac{-150}{542}\right)^2} = \sqrt{141.07} = 11.9$ years
Example 13. The following data relate to the ages of a group
of Government employees. Calculate the standard deviation.

Age      Number of employees    Age      Number of employees
50-55            25             30-35            80
45-50            30             25-30           110
40-45            40             20-25           170
35-40            45
Solution. Calculation of standard deviation

Age      No. of       Step deviation from
         employees    assumed average (37.5)
            (f)            (dx)             fdx      fdx²     (dx+1)    (dx+1)²    f(dx+1)²
50-55        25             +3              +75       225       +4        16         400
45-50        30             +2              +60       120       +3         9         270
40-45        40             +1              +40        40       +2         4         160
35-40        45              0                0         0       +1         1          45
30-35        80             -1              -80        80        0         0           0
25-30       110             -2             -220       440       -1         1         110
20-25       170             -3             -510      1530       -2         4         680
        Σf = 500                       Σfdx = -635  Σfdx² = 2435               Σf(dx+1)² = 1665

Standard deviation or
$\sigma = \sqrt{\dfrac{\Sigma f dx^2}{n} - \left(\dfrac{\Sigma f dx}{n}\right)^2} \times i = \sqrt{\dfrac{2435}{500} - \left(\dfrac{-635}{500}\right)^2} \times 5$

= 9.0 years.
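The step-deviation calculation of Example 13 can be sketched as follows. The function name `step_deviation_mean_sd` is ours; the mid-values are those of the seven age classes and the class interval is 5.

```python
import math

def step_deviation_mean_sd(mid_values, freqs, assumed_mean, width):
    """Mean and standard deviation of a grouped series by step deviations."""
    n = sum(freqs)
    # Deviations from the assumed mean expressed in class-interval units.
    steps = [(m - assumed_mean) / width for m in mid_values]
    sum_fdx = sum(f * d for f, d in zip(freqs, steps))
    sum_fdx2 = sum(f * d * d for f, d in zip(freqs, steps))
    sd_in_units = math.sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2)
    mean = assumed_mean + width * sum_fdx / n
    return mean, sd_in_units * width   # convert back to original units

mids = [52.5, 47.5, 42.5, 37.5, 32.5, 27.5, 22.5]
freqs = [25, 30, 40, 45, 80, 110, 170]
mean, sd = step_deviation_mean_sd(mids, freqs, assumed_mean=37.5, width=5)
print(round(mean, 2), round(sd, 1))  # 31.15 9.0
```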

Thus the steps in the calculation of standard deviation in a
continuous frequency distribution are as follows:
(1) Assume an average at the mid-point of a class interval which
is preferably in the centre of the distribution.
(2) Measure the deviations of the mid-values of various class
intervals from the assumed mean and divide them by the magnitude
of the class interval to get step deviations or deviations in class interval
units (dx).
(3) Multiply the step deviations by the respective frequencies
of the classes (fdx). Total these products ($\Sigma f dx$).
(4) Square the deviations and multiply them by the respective
frequencies and obtain the aggregate of such products ($\Sigma f dx^2$).
(5) Divide $\Sigma f dx^2$ by the number of items or the total frequencies
to get $\dfrac{\Sigma f dx^2}{n}$.
(6) Deduct $\left(\dfrac{\Sigma f dx}{n}\right)^2$ from $\dfrac{\Sigma f dx^2}{n}$. This will be the value of the
variance or square of the standard deviation.
(7) Extract the square root of the variance and it would be the
standard deviation in class interval units.
(8) Multiply the standard deviation so obtained by the magnitude
of the class-interval and the resulting figure would be the standard
deviation in original units.
Charlier's check of accuracy

Just as in case of arithmetic average we check the accuracy of
calculations by a formula given by Charlier, similarly, the accuracy of
the calculations can be assured in case of standard deviation also by
the following rule:

$\Sigma f(dx + 1)^2 - \Sigma f dx^2 - 2\Sigma f dx = N$

Substituting the values in the above formula in Example No. 13
we get

1665 - 2435 - 2(-635) = 500
1665 - 2435 - (-1270) = 500
1665 - 2435 + 1270 = 500
2935 - 2435 = 500

We thus find that the two sides of the equation are equal and this
is a proof of the correct calculation of the value of $\Sigma f dx^2$.
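Charlier's check is an arithmetic test applied to column totals obtained by hand. A minimal sketch, assuming the three totals and N have been computed independently (the function name `charlier_check` is ours):

```python
def charlier_check(sum_f_dx_plus_1_sq, sum_fdx2, sum_fdx, n):
    """True when the hand-computed totals satisfy Charlier's identity."""
    return sum_f_dx_plus_1_sq - sum_fdx2 - 2 * sum_fdx == n

# Totals from Example 13.
print(charlier_check(1665, 2435, -635, 500))  # True

# A slip in one total (say 2345 written for 2435) is detected.
print(charlier_check(1665, 2345, -635, 500))  # False
```

The identity holds because $f(dx+1)^2 = f\,dx^2 + 2f\,dx + f$ for every class, so the check only confirms that the totals agree with one another; it cannot detect an error made before the dx column was written down.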

Sheppard's correction for grouping

In the calculation of standard deviation in a continuous series
we take the mid-points of class intervals to represent the classes, or in
other words, we presume that all the frequencies are concentrated at
the mid-point of a class interval. This may not always be so. However,
if the distribution is symmetrical or even moderately asymmetrical,
and if the class intervals are not greater than 1/12th of the range,
the likelihood is that the assumption would not be far from the truth.
If a distribution is continuous and if the frequency tapers off to
zero in both the directions, a correction in the value of standard deviation
is usually done to remove the effects of grouping. These corrections are
known as Sheppard's corrections for grouping. They are as follows:

$\sigma_2^2 = \sigma_1^2 - \dfrac{h^2}{12}$

Where $\sigma_2^2$ stands for the square of the standard deviation after
correction, $\sigma_1^2$ for the square of the standard deviation before correction
and $h^2$ for the square of the magnitude of the class intervals.

Thus if in Example No. 12 above, the standard deviation is
corrected for grouping, it would be as follows:

$\sigma_1^2 = 141.07$
$h = 10$
Therefore $\sigma_2^2 = 141.07 - \dfrac{10^2}{12} = 132.74$
Therefore $\sigma_2 = \sqrt{132.74} = 11.5$
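Sheppard's correction is a one-line adjustment of the variance. The sketch below applies it to the figures of Example No. 12; the function name `sheppard_corrected_sd` is an illustrative assumption.

```python
import math

def sheppard_corrected_sd(uncorrected_variance, class_width):
    """Subtract h^2 / 12 from the grouped variance, then take the square root."""
    corrected_variance = uncorrected_variance - class_width ** 2 / 12
    return math.sqrt(corrected_variance)

print(round(141.07 - 10 ** 2 / 12, 2))                # 132.74
print(round(sheppard_corrected_sd(141.07, 10), 1))    # 11.5
```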

It should be remembered that Sheppard's corrections are not applicable
to J-shaped, U-shaped or highly skewed distributions. Further they
should not be applied if the total frequency is not very large, say less
than 1000.

Standard deviation and the spread of observations

It should be noted that in a symmetrical or moderately asymmetrical
series a range which is six times the standard deviation usually covers
at least 99% of the observations. Thus, in Example No. 12 the standard
deviation (after Sheppard's corrections for grouping) is 11.5. Six times
this gives a range of 69.0. It will be observed that all the observations
lie within a range of 70 (90-20). Similarly in Example No. 13, the
uncorrected standard deviation is 9, and six times this would give a
range of 54. All the observations are covered within a range which is
smaller than this. This property gives a concrete and definite meaning
to standard deviation. We shall discuss more about this property later
on in connection with the normal frequency curve. In fact in a symmetrical
or moderately asymmetrical distribution, mean $\pm 1\sigma$ covers about
68% of all the values, mean $\pm 2\sigma$ about 95% and mean $\pm 3\sigma$ about
99% of the values of the variable.

Mathematical properties of standard deviation

(1) Just as it is possible to find out the mean of a series from
the means of its component parts, similarly the standard deviation of
a series can be found out from the standard deviations of its component
parts and their means.
We know that

$a_{12} = \dfrac{n_1 a_1 + n_2 a_2}{n_1 + n_2}$

Where $a_{12}$ stands for the mean of a series and $a_1$ and $a_2$ for the means
of its component parts, and $n_1$ and $n_2$ for the number of items in the
two component parts respectively.
If further $\sigma_1$ and $\sigma_2$ stand for the standard deviations of these
component parts and $\sigma_{12}$ for the standard deviation of the whole series
then

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

where $d_1 = a_1 - a_{12}$ and $d_2 = a_2 - a_{12}$ are the differences between the
means of the component parts and the mean of the whole series.

The following example would illustrate the formulae:


Example 14. Find out the combined mean and standard deviation
from the following data:

                        Series A    Series B
Number of items            100         500
Mean                        50          60
Standard deviation          10          11

Solution.
Combined mean or

$a_{12} = \dfrac{n_1 a_1 + n_2 a_2}{n_1 + n_2} = \dfrac{(100 \times 50) + (500 \times 60)}{100 + 500} = \dfrac{35{,}000}{600} = 58.3$

Combined standard deviation or

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$

$d_1 = 50 - 58.3 = -8.3$
$d_2 = 60 - 58.3 = 1.7$

$\sigma_{12} = \sqrt{\dfrac{100[(10)^2 + (-8.3)^2] + 500[(11)^2 + (1.7)^2]}{100 + 500}} = \sqrt{\dfrac{16889 + 61945}{600}} = \sqrt{131.39} = 11.5$
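The combination rule extends to any number of component parts. A minimal sketch, assuming each part is described by the triple (number of items, mean, standard deviation); the function name `combine_groups` is ours:

```python
import math

def combine_groups(groups):
    """Combined mean and standard deviation from (n, mean, sd) triples."""
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * a for n, a, _ in groups) / total_n
    # Each part contributes n * (sd^2 + d^2), d being its mean minus the combined mean.
    total = sum(n * (sd ** 2 + (a - combined_mean) ** 2) for n, a, sd in groups)
    return combined_mean, math.sqrt(total / total_n)

mean, sd = combine_groups([(100, 50, 10), (500, 60, 11)])
print(round(mean, 1), round(sd, 1))  # 58.3 11.5
```

Because the code carries the combined mean at full precision instead of rounding it to 58.3, intermediate figures differ slightly from the hand calculation, but the result agrees to one decimal place.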

Similarly the standard deviations of more than two component
parts can be combined in one.
If the number of observations in the two component parts is equal,
and if the means of the two parts are also identical then

$\sigma_{12} = \sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{2}}$

Thus, if in the above example, the number of items in each case
was 100 and if the mean in each case was 50 the combined standard
deviation by the first method would have been

$\sqrt{\dfrac{100(100 + 0) + 100(121 + 0)}{100 + 100}} = \sqrt{\dfrac{10000 + 12100}{200}} = \sqrt{\dfrac{22100}{200}} = \sqrt{\dfrac{221}{2}} = 10.51$

If we apply the second rule then

$\sigma_{12} = \sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{2}} = \sqrt{\dfrac{100 + 121}{2}} = \sqrt{\dfrac{221}{2}} = 10.51$
(2) The standard deviation of the first n natural numbers is

$\sigma = \sqrt{\dfrac{1}{12}(n^2 - 1)}$

We can know from elementary algebra that the sum of the first
n natural numbers is

$\dfrac{n(n + 1)}{2}$

Thus the sum of 1, 2, 3, 4 and 5 $= \dfrac{5(5 + 1)}{2} = 15$

The mean of first n natural numbers $= \dfrac{n + 1}{2}$

Thus the mean of 1, 2, 3, 4 and 5 is $\dfrac{5 + 1}{2}$ or 3. Further it can
be easily proved that the sum of the squares of the first n natural
numbers is

$\dfrac{n(n + 1)(2n + 1)}{6}$

Thus the sum of the squares of natural numbers 1 to 5 would
be 1+4+9+16+25 = 55. It is equal to

$\dfrac{5(5 + 1)(10 + 1)}{6} = \dfrac{5 \times 6 \times 11}{6} = 55$

We have seen in Example No. 9 (Direct method No. 2) that

$\sigma = \sqrt{\dfrac{\Sigma m^2 - \dfrac{(\Sigma m)^2}{n}}{n}}$

In case of natural numbers $\Sigma m^2 = \dfrac{n(n + 1)(2n + 1)}{6}$ and
$\Sigma m = \dfrac{n(n + 1)}{2}$. If we substitute these equations in the above formula
and simplify it we shall get

$\sigma = \sqrt{\dfrac{(n + 1)(2n + 1)}{6} - \dfrac{(n + 1)^2}{4}}$

$\sigma = \sqrt{\dfrac{1}{12}(n^2 - 1)}$

Thus if we calculate the standard deviation of natural numbers
1 to 5 it will be

$\sigma = \sqrt{\dfrac{1}{12}(5^2 - 1)} = \sqrt{2} = 1.414$

If we calculate the standard deviation by the direct method we
shall get exactly the same answer as follows:

Size of item    Deviations from Mean (3)    Square of deviations
                         (d)                        (d²)
     1                   -2                           4
     2                   -1                           1
     3                    0                           0
     4                   +1                           1
     5                   +2                           4
                                                 Σd² = 10

Standard deviation $= \sqrt{\dfrac{\Sigma d^2}{n}} = \sqrt{\dfrac{10}{5}} = \sqrt{2} = 1.414$

The standard deviation possesses many other mathematical
properties which are derived primarily from the above two rules.
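The closed form for the first n natural numbers is easily checked against the direct method. The sketch below does so for several values of n; both function names are illustrative.

```python
import math

def sd_first_n_formula(n):
    """Closed form: square root of (n^2 - 1) / 12."""
    return math.sqrt((n * n - 1) / 12)

def sd_first_n_direct(n):
    """Direct method applied to 1, 2, ..., n."""
    values = range(1, n + 1)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

print(round(sd_first_n_formula(5), 3))  # 1.414
for n in (2, 5, 10, 100):
    # The two methods agree up to floating-point rounding.
    assert abs(sd_first_n_formula(n) - sd_first_n_direct(n)) < 1e-9
```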
Example 15. From the data below, giving the averages and
standard deviations of four sub-groups, calculate the average and the
standard deviation of the whole group.

Sub-group    No. of men    Average wage    S. D. of wage
                             s.    d.        s.    d.
    A            50          61    0          8    0
    B           100          70    0          9    0
    C           120          80    6         10    0
    D            30          83    0         11    0
  Total         300

Solution.
(i) Total wages of 50 men in sub-group A = 50 × 61s. = 3050s.
Total wages of 100 men in sub-group B = 100 × 70s. = 7000s.
Total wages of 120 men in sub-group C = 120 × 80.5s. = 9660s.
Total wages of 30 men in sub-group D = 30 × 83s. = 2490s.
Total wages of 300 men in sub-groups A, B, C and D = (3050 +
7000 + 9660 + 2490)s. = 22,200s.
Average of the whole group $= \dfrac{22200}{300}$s. = 74s.

(ii) We know that $s^2 = \sigma^2 + d^2$
where $s^2$ is the second moment about any arbitrary number and
d is the difference between the mean and this number.
Now mean of the whole group is 74.
$\therefore d_1 = 74 - 61 = 13$; $d_2 = 74 - 70 = 4$; $d_3 = 80.5 - 74 = 6.5$ and
$d_4 = 83 - 74 = 9$
Now $N\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) + N_3(\sigma_3^2 + d_3^2) + N_4(\sigma_4^2 + d_4^2)$
or $300\sigma^2 = 50(64 + 169) + 100(81 + 16) + 120(100 + 42.25) + 30(121 + 81)$
= 11650 + 9700 + 17070 + 6060 = 44480
or $\sigma^2 = 148.3$
$\sigma = 12.2$
Example 16. For a frequency distribution of marks in history
of 200 candidates (grouped in intervals 0-5, 5-10, ... etc.), the mean
and standard deviation were found to be 40 and 15. Later it was
discovered that the score 43 was misread as 53 in obtaining the frequency
distribution. Find the corrected mean and standard deviation
corresponding to the corrected frequency distribution.

Solution. Let us reconstruct the original and wrong tables.
Let the frequency of the ith group be $f_i$; then:

Class-intervals    Mid-values    Frequency     Frequency
                                 (wrong)       (correct)
    0-5               2.5          f₁            f₁
    5-10              7.5          f₂            f₂
    ...               ...          ...           ...
   35-40             37.5          f₈            f₈
   40-45             42.5          f₉            f₉ + 1
   45-50             47.5          f₁₀           f₁₀
   50-55             52.5          f₁₁           f₁₁ - 1
    ...               ...          ...           ...
   Total                           200           200

The value of the mean when 43 was misread as 53 is given by

$40 = \dfrac{1}{200}(2.5f_1 + 7.5f_2 + \dots + 37.5f_8 + 42.5f_9 + 47.5f_{10} + 52.5f_{11} + \dots)$

Let the value of the corrected mean be $\bar{x}$.

Then $\bar{x} = \dfrac{1}{200}\,(2.5f_1 + 7.5f_2 + \dots + 37.5f_8 + 42.5(f_9 + 1) + 47.5f_{10} + 52.5(f_{11} - 1) + \dots)$

Let $2.5f_1 + \dots + 37.5f_8 + 47.5f_{10} + 57.5f_{12} + \dots = s$

Then $40 = \dfrac{1}{200}(s + 42.5f_9 + 52.5f_{11})$ or $s + 42.5f_9 + 52.5f_{11} = 8000$

and $\bar{x} = \dfrac{1}{200}[s + 42.5(f_9 + 1) + 52.5(f_{11} - 1)]$
$= \dfrac{1}{200}[s + 42.5f_9 + 52.5f_{11} + 42.5 - 52.5]$
$= \dfrac{1}{200}(8000 - 10) = \dfrac{7990}{200} = 39.95$

The corrected mean corresponding to the corrected distribution
is 39.95.

Calculation of the corrected Standard Deviation.
When 43 was misread as 53, the second moment about 40, which
was thought to be equal to $\sigma^2$, is given by:

$(15)^2 = \dfrac{1}{200}[f_1(2.5 - 40)^2 + f_2(7.5 - 40)^2 + \dots + f_9(42.5 - 40)^2 + f_{10}(47.5 - 40)^2 + f_{11}(52.5 - 40)^2 + \dots]$

Let $f_1\,37.5^2 + f_2\,32.5^2 + \dots + f_8\,2.5^2 + f_{10}\,7.5^2 + f_{12}\,17.5^2 + \dots = s$

Then $225 = \dfrac{1}{200}[s + 2.5^2 f_9 + 12.5^2 f_{11}]$ or $s + 6.25f_9 + 156.25f_{11} = 45000$

Let the correct value of the second moment about 40 be $s_2^2$.
Then
$s_2^2 = \dfrac{1}{200}[s + 6.25(f_9 + 1) + 156.25(f_{11} - 1)]$
$= \dfrac{1}{200}[s + 6.25f_9 + 156.25f_{11} + 6.25 - 156.25]$
$= \dfrac{1}{200}[45000 - 150] = \dfrac{44850}{200} = 224.25$

$\sigma^2 = s^2 - d^2$ where d is the difference between the actual and
assumed mean.
In this example $s^2 = s_2^2 = 224.25$ and d = (40 - 39.95) = 0.05
$\therefore \sigma^2 = 224.25 - 0.0025 = 224.2475$
$\therefore \sigma = 14.97$

The corrected standard deviation corresponding to the corrected
distribution is 14.97.
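The same correction can be reached by a slightly different route from the one used above: recover the total of the values and the total of their squares from the reported mean and standard deviation, swap the wrong mid-value for the right one, and recompute. The sketch below takes this route; the function name `correct_mean_sd` is an illustrative assumption, and the values passed are the class mid-values (52.5 and 42.5), since the statistics were computed from the grouped table.

```python
import math

def correct_mean_sd(n, mean, sd, wrong_value, right_value):
    """Correct a reported mean and SD after one value was misrecorded."""
    total = n * mean                        # sum of values as originally computed
    total_sq = n * (sd ** 2 + mean ** 2)    # sum of squares as originally computed
    total += right_value - wrong_value
    total_sq += right_value ** 2 - wrong_value ** 2
    new_mean = total / n
    new_variance = total_sq / n - new_mean ** 2
    return new_mean, math.sqrt(new_variance)

new_mean, new_sd = correct_mean_sd(200, 40, 15, wrong_value=52.5, right_value=42.5)
print(round(new_mean, 2), round(new_sd, 2))  # 39.95 14.97
```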
Example 17. The mean age and standard deviation of a group
of 100 persons (grouped in intervals 10-, 12-, ... etc.) were found
to be 32.02 and 13.18. Later it was discovered that the age 57 was
misread as 27. Find the corrected mean and standard deviation.

Solution. The age 57 belongs to the group 56-58 (mid-value
= 57) and the age 27 belongs to the group 26-28 (mid-value = 27).
Let the misread frequencies of these two groups be $f_1$ and $f_2$. Then
the corrected frequencies will be $(f_1 + 1)$ and $(f_2 - 1)$ respectively. All
other frequencies have been entered correctly.

Mid-value    Frequency (wrong)    Frequency (correct)
   57              f₁                  f₁ + 1
   27              f₂                  f₂ - 1

Value of the mean when 57 was misread as 27 is given by

$32.02 = \dfrac{1}{100}(s + 57f_1 + 27f_2)$ or $s + 57f_1 + 27f_2 = 3202$

Where s is the sum of the products of the correctly entered frequencies and
their values. Let the correct value of the mean be $\bar{x}$, then

$\bar{x} = \dfrac{1}{100}\{s + 57(f_1 + 1) + 27(f_2 - 1)\}$
$= \dfrac{1}{100}\{s + 57f_1 + 27f_2 + 57 - 27\}$
$= \dfrac{1}{100}\{3202 + 30\} = \dfrac{3232}{100} = 32.32$

$\therefore$ Corrected mean = 32.32.

Value of the standard deviation when 57 was misread as 27 is given by:

$(13.18)^2 = \dfrac{1}{100}[s + f_1(57 - 32.02)^2 + f_2(27 - 32.02)^2]$

Where $s = \Sigma f(x - 32.02)^2$ for all correctly entered values of f,

or $173.7124 = \dfrac{1}{100}(s + 624.0004f_1 + 25.2004f_2)$

It is the second moment about 32.02 when 57 was misread as 27.
Corrected second moment about 32.02, $s_2^2$, is given by

$s_2^2 = \dfrac{1}{100}(s + 624.0004(f_1 + 1) + 25.2004(f_2 - 1))$
$= \dfrac{1}{100}(s + 624.0004f_1 + 25.2004f_2 + 624.0004 - 25.2004)$
$= \dfrac{1}{100}(17371.24 + 598.8) = \dfrac{17970.04}{100} = 179.7004$

We know that $\sigma^2 = s^2 - d^2$ where d is the difference between the
actual and assumed mean.
In this example $s^2 = s_2^2 = 179.7004$ and d = 32.32 - 32.02 = 0.3
$\therefore \sigma^2 = 179.7004 - 0.09 = 179.6104$
$\therefore$ Corrected standard deviation $= \sqrt{179.6104} = 13.40$
Example 18. The mean and the standard deviation of 1000
values of a variate (grouped in intervals 2.5-, 7.5-, ... etc.) were found
to be 29.93 and 9.977. Later it was discovered that in calculating
these values the errata which was supplied with the data was not
considered. The errata read as follows:

Group 7.5-12.5 for frequency 30 read frequency 28
Group 17.5-22.5 for frequency 120 read frequency 121
Group 27.5-32.5 for frequency 200 read frequency 198
Group 32.5-37.5 for frequency 175 read frequency 176
Group 47.5-52.5 for frequency 25 read frequency 27

Calculate the corrected mean and standard deviation for the
corrected series.

Solution. The frequencies in all other groups were recorded
correctly. Let these frequencies be denoted by $f_i$ and let $\Sigma f_i x_i = s$.
The value of the mean when the errata was not considered is given
by:

$29.93 = \dfrac{1}{1000}[s + (30 \times 10) + (120 \times 20) + (200 \times 30) + (175 \times 35) + (25 \times 50)]$
$= \dfrac{1}{1000}[s + 300 + 2400 + 6000 + 6125 + 1250]$
$= \dfrac{1}{1000}[s + 16075]$ $\therefore s = 29930 - 16075 = 13855$

Correct value of the mean is given by:

$\bar{x} = \dfrac{1}{1000}[s + (28 \times 10) + (121 \times 20) + (198 \times 30) + (176 \times 35) + (27 \times 50)]$
$= \dfrac{1}{1000}[13855 + 280 + 2420 + 5940 + 6160 + 1350]$
$= \dfrac{30005}{1000} = 30.005$

Again let $\Sigma f_i(x_i - 29.93)^2 = T$ where $f_i$ are the correct
frequencies. Then the second moment about 29.93, when the errata was not
considered, is given by:

$(9.977)^2 = \dfrac{1}{1000}[T + 30(10 - 29.93)^2 + 120(20 - 29.93)^2 + 200(30 - 29.93)^2 + 175(35 - 29.93)^2 + 25(50 - 29.93)^2]$

or $99.5405 = \dfrac{1}{1000}[T + 38318.195]$
or T = 99540.5 - 38318.195 = 61222.305

Correct second moment about 29.93 is given by

$s_2^2 = \dfrac{1}{1000}[T + (28 \times 397.2049) + (121 \times 98.6049) + (198 \times .0049) + (176 \times 25.7049) + (27 \times 402.8049)]$
$= \dfrac{1}{1000}[61222.305 + 38453.695] = \dfrac{1}{1000} \times 99676 = 99.676$

We know that $\sigma^2 = s^2 - d^2$. Here $s^2 = s_2^2 = 99.676$ and d = 30.005 - 29.93 = .075
$\therefore \sigma^2 = 99.676 - .0056 = 99.6704$
$\therefore \sigma = 9.98$
Merits, demerits and uses of standard deviation

The standard deviation possesses most of the characteristics which
an ideal measure of dispersion should have. Thus:
(1) Standard deviation is rigidly defined and its value is always
definite.
(2) It is based on all the observations of the data.
(3) It is amenable to algebraic treatment and possesses many
mathematical properties. It is on account of these properties that
standard deviation is used in many advanced studies.
(4) It is less affected by the fluctuations of sampling than most
other measures of dispersion.
(5) The squaring of the deviations makes them positive and the
difficulty about algebraic signs which was experienced in case of mean
deviation is not found here.
(6) However, standard deviation is not easy to calculate, nor
is it easily understood. In any case it is more cumbersome in its
calculation than either quartile deviation or mean deviation.
(7) It gives more weight to extreme items and less to those which
are near the mean, because the squares of the deviations, which are big
in size, would be proportionately greater than the squares of those
deviations which are comparatively small. Thus deviations 2 and 8
are in the ratio of 1 : 4 but their squares, i.e., 4 and 64, would be in
the ratio of 1 : 16.
The above merits and demerits of the standard deviation show
that despite some drawbacks, it is the best measure of dispersion and
should be used wherever possible. However, the standard deviation
has not found favour with economists and businessmen because it gives
greater weight to extreme items and economists and businessmen are
more interested in the results of the modal class. Moreover, the
difficulties of its calculation are also responsible for its comparatively
lesser popularity with the common man. But it should always be kept
in mind that just as mean is the best measure of central tendency (leaving
exceptional cases), standard deviation is the best measure of dispersion,
excepting a few cases, where mean deviation or quartile deviation may
give better results.
OTHER MEASURES OF DISPERSION

Besides range, quartile deviation, mean deviation and standard
deviation there are some other measures of dispersion also. They
are not in common use and are comparatively much less important than
others. They are:

Modulus
Modulus is the square root of twice the second moment of
dispersion about the mean. It is generally denoted by C.
Thus
$C = \sqrt{2\mu_2}$
where $\mu_2$ denotes the second moment about the mean.
Modulus is equal to standard deviation multiplied by the square
root of 2 or
$C = \sigma \times \sqrt{2}$
Like standard deviation this measure is also based on the second
moment about the mean.

Precision
It is the reciprocal of modulus.
Thus
Precision $= \dfrac{1}{\sigma\sqrt{2}}$

Probable error
It is equal to .67449 × standard deviation.
Modulus, precision and probable error are used in the theory of
errors of observations. We shall discuss them in the chapters on Sampling.
Standard deviation should not be confused with the term "Standard
Error" which stands for the standard deviation of simple sampling.
The concept of standard error will also be discussed in detail in the
chapters on Sampling.

Variance
It is equal to the square of the standard deviation or in other words
it is the second moment about the mean.

Coefficient of variation
It stands for the percentage which the value of standard
deviation is to the value of the mean. In other words, if standard
deviation is divided by the mean and multiplied by 100 we get the coefficient
of variation. This measure was first suggested by Professor Karl
Pearson. According to him, coefficient of variation is the "percentage
variation in the mean, the standard deviation being treated as the total
variation in the mean."
Symbolically
Coefficient of variation or $V = \dfrac{\sigma}{a} \times 100$
= Coefficient of standard deviation × 100
Thus, if the mean of a series is 50 and the standard deviation is 10,
the coefficient of variation would be
$\dfrac{10}{50} \times 100$ or 20%
It means that the standard deviation is 20% of the mean.
Gini's mean difference
Corrado Gini, an Italian statistician, has suggested that instead
of measuring dispersion from any measure of central tendency, the
mean difference between the values of all possible pairs of the variable
should be found out, and it would give a good measure of dispersion.
Thus, this measure of dispersion is equal to the mean difference (regardless
of algebraic signs) of each possible pair of the values of the variable.
Symbolically
Gini's mean difference $= \dfrac{g}{m}$
Where g stands for the total of the differences in the values of all
possible pairs of a variable and m stands for the total number of
differences. The total number of differences would be equal to $\frac{1}{2}n(n-1)$.
The following example would illustrate the above formulae:

Example 19. Find out Gini's mean difference from the following
items:
22, 24, 26, 28, 30.

Solution
30-22 = 8    28-22 = 6    26-22 = 4    24-22 = 2
30-24 = 6    28-24 = 4    26-24 = 2
30-26 = 4    28-26 = 2
30-28 = 2
Total   20           12            6            2

The sum of all the differences or
g = (20 + 12 + 6 + 2) = 40
The total number of differences $= \frac{1}{2}n(n-1) = \frac{1}{2} \times 5 \times (5-1) = 10$
Gini's mean difference $= \dfrac{g}{m} = \dfrac{40}{10} = 4$

The mean deviation of the above series = 2.4 and the standard
deviation = 2.8.
Gini's mean difference is always more than the mean deviation
as it gives greater importance to extreme variations. The value of
Gini's mean difference lies in the fact that it studies the variations
amongst the values of a variable rather than from a central value.
If the square root of the average of the squares of all differences is
found it would always be equal to $\sigma\sqrt{2\left(\dfrac{n}{n-1}\right)}$ or nearly $\sigma\sqrt{2}$.
In other words, it would almost be equal to the value of modulus.
In the above example the average of the squares of all the
differences is $\dfrac{200}{10}$ or 20. Its square root is $\sqrt{20}$. The standard deviation
of the series $= \sqrt{8}$.

Thus $\sqrt{20} = \sqrt{8} \times \sqrt{2\left(\dfrac{5}{5-1}\right)} = \sqrt{8} \times \sqrt{\dfrac{5}{2}} = \sqrt{\dfrac{40}{5} \times \dfrac{5}{2}} = \sqrt{20}$
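Both results of Example 19 can be verified by enumerating every pair of values. The sketch below uses `itertools.combinations` from the Python standard library; the function names are illustrative.

```python
import math
from itertools import combinations

def gini_mean_difference(values):
    """Mean absolute difference over all n(n-1)/2 distinct pairs."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    return sum(diffs) / len(diffs)

def root_mean_square_difference(values):
    """Square root of the average squared difference over all distinct pairs."""
    squares = [(a - b) ** 2 for a, b in combinations(values, 2)]
    return math.sqrt(sum(squares) / len(squares))

items = [22, 24, 26, 28, 30]
print(gini_mean_difference(items))  # 4.0

# Relation to the standard deviation: sigma * sqrt(2n / (n - 1)).
n = len(items)
mean = sum(items) / n
sigma = math.sqrt(sum((v - mean) ** 2 for v in items) / n)
print(math.isclose(root_mean_square_difference(items),
                   sigma * math.sqrt(2 * n / (n - 1))))  # True
```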

Relationship between various measures of dispersion

For a normal distribution or even for a moderately asymmetrical
distribution the following relationships between quartile deviation,
mean deviation and standard deviation hold good:

Measure of            Percentage of observations included within       Size of the measure
dispersion            a certain range on either side of the mean       in relation to
                      ± 1 times      ± 2 times      ± 3 times          standard deviation
                      the measure    the measure    the measure
Quartile deviation       50.0           82.3           95.7                 0.6745
Mean deviation           57.5           88.9           98.3                 0.7979
Standard deviation       68.3           95.4           99.7                 1.000

Thus for a symmetrical or moderately asymmetrical distribution:
(1) The quartile deviation is .6745 times the standard deviation or
roughly 2/3rd the value of the standard deviation.
(2) The mean deviation from mean is equal to .7979 times the
standard deviation or roughly 4/5th of the standard deviation.
(3) It follows from the above that a range six times the standard
deviation is equal to a range nine times the quartile deviation and 7.5
times the mean deviation. Within these ranges at least 99% of the
observations are covered.
(4) Mean ± 1 standard deviation would include 68.3% of the
cases.
Mean ± 2 standard deviations would include about 95.4% of the
cases.
Mean ± 3 standard deviations would include 99.7% of the cases.
Mean ± 1 quartile deviation would include 50% of the cases.
Mean ± 1 mean deviation would include 57.5% of the cases.
(5) The probable error is .6745 times the standard deviation.
Thus mean ± 1 probable error would cover roughly 50% of all
the observations.
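For an exactly normal distribution the percentages quoted above follow from the error function: the proportion of cases within k standard deviations of the mean is erf(k/√2). The sketch below, which assumes exact normality (real data will only approximate these figures), reproduces the entries of the table to within rounding; the function name `coverage` is ours.

```python
import math

def coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

QD = 0.6745   # quartile deviation in units of the standard deviation
MD = 0.7979   # mean deviation in units of the standard deviation

for name, size in (("Quartile deviation", QD),
                   ("Mean deviation", MD),
                   ("Standard deviation", 1.0)):
    # Coverage for ranges of 1, 2 and 3 times the measure on either side of the mean.
    row = [round(100 * coverage(m * size), 1) for m in (1, 2, 3)]
    print(name, row)
```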

Choice of a measure dispersion


We have already studied the merits and demerits of various measures
of dispersion and we are now in a position to make a comparative
study of their qualities. It would help in the selection of an appropriate
measure of dispersion for a particular problem under study. Range,
as a measure of dispersion, suffers from serious drawbacks; it is an
unstable measure, affected considerably by the fluctuations of sampling,
and as such, its use cannot be advocated except in cases where the
variation in the size of items is very little. The quartile deviation is a better
measure than the range, as it is not affected too much by the values of
extreme items. It is easily calculated and is readily understood. In
these respects it is better than even the mean and standard deviations.
But quartile deviation has no algebraic properties and its behaviour under
fluctuations of sampling is freakish. As such its use can be recommended
only in those cases where mean deviation or standard deviation cannot
be easily calculated or its calculation is impossible, as in case of indefinite
extreme classes (like more than 1000 or less than 100). Between mean
deviation and standard deviation, the former has an advantage of
comparatively simple calculation and easy understandability, but the
mathematical properties possessed by the standard deviation are not found in
this measure, and it is not easily amenable to further algebraic treatment.
However, in cases where median is supposed to be an ideal average, the
best measure of dispersion would probably be the mean deviation. In
other cases the standard deviation scores over all other measures of
dispersion. We have already seen that amongst the measures of central
tendency, mean occupies a unique position, and the same position is
occupied by the standard deviation amongst the measures of dispersion.
Standard deviation is rigidly defined, is based on all the observations, is
capable of algebraic treatment, and is not affected very much by
fluctuations of sampling. However, it should be kept in mind that standard
deviation gives comparatively greater importance to extreme variations,
which should usually be ignored.
The above discussion leads us to the conclusion that though the
choice of a measure of dispersion would depend on the nature, purpose
and object of an investigation, yet for all practical purposes the standard
deviation is a better measure of dispersion than others.
It should further be remembered that for comparison of variability
of two series, we should always choose a relative measure of dispersion.
Absolute measures of dispersion sometimes give very misleading
conclusions. If, for example, the profits of two companies A and B during
the last three years are as follows :-
A
(Rupees)
2,000
3,000
4,000

The range in both the cases is Rs. 2,000 and the mean deviation
is Rs. 666.7 in both the cases. The absolute measures of dispersion
are thus equal but the variation in the two series is, in reality, not
identical. If, however, we calculate relative measures of dispersion this
anomaly would be removed. The coefficient of range in the two cases
would be 1/3 and 1/11 respectively and similarly the mean coefficient of
dispersion would be 2/9 and 2/33 respectively. In comparing dispersion
of two series, expressed in different units, the use of relative
measures of dispersion is inevitable because absolute measures of
dispersion in such cases would be in different units.
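The contrast between absolute and relative measures can be sketched in a few lines of Python. Company A's profits are those printed above; the figures used for company B are an assumption, chosen only so that B has the same range and mean deviation as A on a larger scale. The function names are illustrative.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def coefficient_of_range(values):
    """(largest - smallest) / (largest + smallest)."""
    return (max(values) - min(values)) / (max(values) + min(values))

def mean_coefficient_of_dispersion(values):
    """Mean deviation divided by the arithmetic mean."""
    return mean_deviation(values) / (sum(values) / len(values))

company_a = [2000, 3000, 4000]        # profits printed in the text
company_b = [10000, 11000, 12000]     # assumed figures, for illustration only

for name, profits in [("A", company_a), ("B", company_b)]:
    print(
        name,
        "range:", max(profits) - min(profits),
        "mean deviation:", round(mean_deviation(profits), 1),
        "coefficient of range:", round(coefficient_of_range(profits), 4),
        "mean coefficient of dispersion:",
        round(mean_coefficient_of_dispersion(profits), 4),
    )
```

Both series show the same absolute dispersion, while the relative measures are several times larger for the series with the smaller values.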
Lorenz curve
Dispersion can be studied graphically also with the help of what
is called Lorenz Curve, after the name of Dr. Lorenz who first studied
the dispersion of distribution of wealth by the graphic method. The
technique of drawing Lorenz Curve is not very difficult. In it the size
of items and the frequencies are both cumulated and taking the total as
100, percentages are calculated for the various cumulated values. These
percentages are plotted on a graph paper. If there is proportionately
equal distribution of the frequencies over various values of a variate, the
points would lie in a straight line. This line is called the "Line of Equal
Distribution." If, however, the distribution of items is not
proportionately equal, it indicates variability, and the curve would be away from
the line of equal distribution. The farther the curve is from this
line, the greater is the variability in the series. The following example
would illustrate the procedure of drawing a Lorenz curve :-
Example 16. Draw a Lorenz curve from the following data :-

                      Number of persons in thousands
Income in
thousand rupees       Group A     Group B     Group C
      10                  5           8          15
      20                 10           7           6
      40                 20           5           2
      50                 25           3           1
      80                 40           2           1
To draw the Lorenz Curve from the above data the size of the item
and frequencies would have to be cumulated and then percentages would
have to be calculated by taking the respective totals as 100. This is
done in the following table :-


        Income                 Group A                Group B                Group C
Value   Cum.   Cum. %    No.   Cum.   Cum. %    No.   Cum.   Cum. %    No.   Cum.   Cum. %
  10     10       5       5      5       5       8      8      32      15     15      60
  20     30      15      10     15      15       7     15      60       6     21      84
  40     70      35      20     35      35       5     20      80       2     23      92
  50    120      60      25     60      60       3     23      92       1     24      96
  80    200     100      40    100     100       2     25     100       1     25     100
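The cumulation and conversion to percentages described for Example 16 can be sketched in Python as follows. The helper name `cumulative_percentages` is illustrative; the data are those of the example.

```python
from itertools import accumulate

def cumulative_percentages(values):
    """Cumulate the values and express each running total as a percentage of the grand total."""
    running_totals = list(accumulate(values))
    grand_total = running_totals[-1]
    return [100 * total / grand_total for total in running_totals]

income = [10, 20, 40, 50, 80]              # thousand rupees
persons = {                                # thousands of persons
    "Group A": [5, 10, 20, 25, 40],
    "Group B": [8, 7, 5, 3, 2],
    "Group C": [15, 6, 2, 1, 1],
}

income_pct = cumulative_percentages(income)
lorenz_points = {
    group: list(zip(cumulative_percentages(counts), income_pct))
    for group, counts in persons.items()
}

# Each point is (cumulative % of persons, cumulative % of income).
for group, points in lorenz_points.items():
    print(group, points)
```

For Group A every point has equal coordinates, which is the line of equal distribution; for Groups B and C the points move progressively away from it.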
Now the cumulative percentages would be plotted on a graph paper.
Percentages relating to the number of persons would be shown on the
abscissa and from left to right the scale would begin with 100 and end
with 0. The income percentages would be shown on the ordinate and
here the scale will begin with 0 at the bottom and go up to 100 at the top.
The above percentages would give the following type of curve :

From the above figure it is clear that in the first group of persons,
the distribution of income is proportionately equal so that 5% of the
income is shared by 5% of the population, 15% of the income by 15%
of the population and so on. It gives the line of equal distribution.
In the second group the distribution is uneven so that 5% of the income
is shared by 32% of the people and 15% of the income by 60% of the
people. In the third group the distribution is still more unequal so that
5% of the income is shared by 60% of the people and 15% of the income
by 84% of the people. The variation in group C is thus greater than the
variation in group B. Curve C is thus at a greater distance from the
line of equal distribution than curve B.
The Lorenz curve has a great drawback. It does not give any
numerical value of the measure of dispersion. It merely gives a picture
of the extent to which a series is pulled away from an equal distribution.
It should be used along with some numerical measure of dispersion. It
is very useful in the study of income distributions, distributions of land
and wages, etc.

Questions
1. What is meant by dispersion ? What are the methods of computing
measures of dispersion ? Illustrate the practical utility of such methods.
eM. C_., Ail••, 194').
2. Explain the meaning of the term dispersion and distinguish between absolute
and relative measures of dispersion. (B. Com., Allahabad, 1946).
3. Discuss the various ways in which the differences in the characteristics of
frequency distributions are generally measured. (B. Com., Lucknow, 1957).
4. Explain the various methods of describing the scatter of a frequency
distribution and say what you know as to the relative worth of the relative measures.
(B. Com., Nagpur, 1944).
5. Frequency distributions may either differ in the numerical size of their
averages though not necessarily in their formations or they may have the same values
of the average but differ in their respective formations.
Explain and illustrate how the measures of dispersion afford a supplement to the
information about the frequency distributions given by the averages.
(M. Com., Rajputana, 1952).
6. Define carefully the mean deviation, standard deviation and quartile
deviation of any given distribution. In what problems should each be used ?
(M. A., Allahabad, 1940).
7. What are the mathematical properties of standard deviation ? How is it a
better measure of dispersion than the mean deviation or quartile deviation ?
8. What is meant by Sheppard's Corrections ? Under what conditions should
these corrections be made ?
9. Define dispersion. Why is it necessary to measure dispersion in order to
make comparisons of frequency distributions ?
10. What is range ? What are its advantages and disadvantages as a measure of
dispersion ?
11. Find directly the standard deviation of the natural numbers from 1 to 10
and verify the answer obtained by a short cut method.
12. Write short notes on
(a) Lorenz Curve (b) Charlier's Check (c) Gini's Mean Difference (d) Precision
(e) Modulus (f) Root Mean Square deviation.

13. The following table gives weights of one hundred persons. Compute the
coefficient of dispersion by the Method of Limits.
Weight in lbs. of 100 persons
Class-interval No. of persons
85- 95 4
95-105 13
105-115 8
115-125 14
125-135 9
135-145 16
145-155 17
155-165 9
165-175 8
175-185 2

100

14. What are the different measures of dispersion ? The following table gives
the height of one hundred persons. Calculate the dispersion by Range Method.
Height of 100 persons in inches
Height in inches Frequency
Below 62 2
63 8
64 19
65 32
66 45
67 58
68 85
69 93
70 100
15. The following are the marks obtained by a batch of 9 students in a certain
test : -
Serial Number Marks Serial number Marks
(out of 100) (out of 100)
1 68 5 54
2 49 6 38
3 32 7 59
4 21 8 66
9 41
Calculate the mean deviation of the series.

16. Summary of receipts and Passengers of a certain Motor Bus Company


Year Receipt Passengers
1925 2,~54 SO,010
1926 2,780 61,060
1927 3,011 70,005
1928 3,020 70,110
1929 3,541 83,001
1930 4,150 91,100
1931 5,000 1,00,000
(B. Com. Allahabad, 1932).
From the foregoing data, find out one measure of dispersion and state whether
the variation in receipts is greater than in passengers.

17. Find Mean Deviation of the distribution given below :-

No. of       Persons having said      No. of       Persons having said
accidents    number of accidents      accidents    number of accidents
    0               15                    7                2
    1               16                    8                1
    2               21                    9                2
    3               10                   10                2
    4               17                   11                0
    5                8                   12                2
    6                4
                                       Total             100

18. Calculate the mean deviation from the following data. What light does it
throw on the social conditions of the community ?
Difference in age between husband and wife in a particular community.
Difference in years Frequency Difference in years Frequency
0- 5 449 20-25 109
5-10 705 25-30 52
10--15 507 30--35 16
15--20 281 35--40 .oJ

19. The following table gives the age distributions of students admitted to a
college in the years 1914 and 1938. Find which of the two groups is more variable in age.
Number of students admitted in
Age 1914 1938
15- 0 1
16- 1 6
17- 3 .4
18- 8 :2
19- ·12 ·5
20- 14 :0
21- 14 7
22- 5 .9·
23- .2 3
24- 3 0
25- 1 O.
.Q 1
26-
27- 1 0

Total 64 148


(B. C_. ~. 19-42).

20. Calculate quartile deviation and its coefficient of A's monthly earnings
for a year.
Months Monthly earnings Months Monthly earnings
Rs. Rs.
1 139 7 160
2 150 8 161
3 151 9 162
4 151 10 162
5 157 11 173
6 158 12 175
21. From the following table giving height of students calculate the
Semi-Interquartile Range and the Coefficient of Quartile Deviation.

Height 'in inches No. of students Height in inches No. of students


53 25 63 24
55 21 65 22
57 28 67 18
59 20 69 23
61 18

22. Find out the Standard Deviation of the following items : -


8, 10, 12, 14, 16, 18, 20, 22, 24, 26.

23. Compute the standard deviation of the rainfall in the various jute-growing
districts of Bengal from the following statement :-

Districts         Rainfall in inches     Districts        Rainfall in inches
                  (1939 July)                             (1939 July)
24-Parganas            17.36             Rajshahi              21.23
Murshidabad            19.17             Dacca                 27.10
Khulna                 22.99             Chittagong            40.97
Burdwan                17.00             Cooch-Bihar           26.58
Midnapur               14.99             Hoogly                17.67
(B. A. Hons., Punjab, 1941).

24. Calculate the standard deviation of the following two series. Which shows
greater deviation ?

Series A Series B Series A Series B


192 83 260 126
288 87 348 126
236 93 291 101
229 109 330 102
184 124 243 108
(P. C. S. 1938).

25. Find standard deviation of the figures in the following table to show whether
the variation is greater in the area or the yield ?

Years     Area in lacs of acres     Yield in lacs of bales of 400 lbs. each
1914-15 1~ G
-16 114 51
-17 138 50
-18 154 45
-19 144 40
-20 153 53
-21 144 59
-22 in w
-23 136 63
1923-24 154 60

26. The index numbers of prices of cotton and coal shares in 1942 were as under :-

Month           Index number of          Index number of
                prices of                prices of
                cotton shares            coal shares
January 188 131
February 178 130
March 173 130
April 164 129
May 172 129
June 183 120
July ~q4 127
August 185 127
September 211 130
October 217 137
November 232 140
December 240 142
Which of the two shares do you consider more variable in price ?
(M. A. Agra, 1944).
27. The fluctuations in the rates of Kohinoor and Tata Deferred on the 7th March
are given below. Find out which of the two shares shows greater variability.
Kohinoor-618, 619, 616, 623, 620, 624, 622, 625, 622, 625, 626, 625.
Tata deferred-2152l, 21321, 2134!, 2132t, 2145, 2142t, 21461. 2130, 21461,
21421, 2150, 2135, 2152t. (Bombay. 1955).
28. The following table gives the number of finished articles turned out per day
by different number of workers in a factory. Find the mean value and the "standard
deviation" of the daily output of finished articles, and explain the significance of
"standard deviation" :-
Number of Number of Number of Number of
articles workers articles workers
18 3 23 17
19 7 24 18
20 11 25 8
21 14 26 5
22 18 27 4
(D. Com. Calcutta, 1937).
29. Calculate the Standard Deviation of the following data with regard to 2,298
families in the U. K.
Number of persons Number of Number of persons Number of
in the family families in the family families
1 165 7 77
2 552 8 41
3 580 9 20
4 433 10 8
5 268 11 5
6 148 12 1
Total 2,298
(M. A. A/M., 1942).
30. Find out the mean daily earnings and standard deviation of earnings from
the following data :
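 36 men get at the rate of Rs. 5.0 per man per day
 40 men get at the rate of Rs. 5.5 per man per day
 94 men get at the rate of Rs. 6.0 per man per day
138 men get at the rate of Rs. 6.5 per man per day
 80 men get at the rate of Rs. 7.0 per man per day
 61 men get at the rate of Rs. 7.5 per man per day
 25 men get at the rate of Rs. 8.0 per man per day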

31. Calculate the standard deviation for the following table giving the age dis-
tribution of 542 members of the House of Commons.
Age No. of members
20- 3
30- 61
40- 132
50- 153
60- 140
70- 51
80- 2
Total 542
32. The following table gives the frequency distribution of expenditure on food
per family per month among working class families in two localities. Find the
arithmetic average and the standard deviation of the expenditure at both places.
Range of expenditure No. of families
in Rs. per month Place A Place B
Rs. 3- 6 28 39
6-- 9 292 284
" 9-12 389 401
" 12-15 212 202
15-18 59 48
18-21 18 21
~~ ~ 5
(P. C. S., 41).

33. Find the mean yield of paddy and the standard deviation for the distribution
of the results of 3,061 crop-cutting experiments shown in the following table --
Yield of paddy per acre in
Lbs. No. of experiments
0- 400 236
401- 800 481
801-1200 604
1201-1600 576
1601--2000 419
2001--2400 333
2401--2800 217
2801--3200 87
3201--3600 64
3601-4000 23
4001-4400 14
4401-4800 6
4801--5200 1

3061
(B. Com., Bombay, 1945).

34. Calculate the mean and standard deviation of the following series :-
Marks Number of students Marks Number of students
1- 5 1 21-25 7
6-10 18 26-30 2
11-15 25 31-35 1
16--20 26

35. Find out the mean and standard deviation of the following data :-
Age under Number of persons Age under Number of persons
dying dying
10 15 50 100
20 30 60 110
30 53 70 115
40 75 80 125
36. Find out the co-efficient of variation of the following series :-
Number of Number of
Income persons Income persons
More than 1000 0 More than 500 600
900 50 400 750
800 110 " 300 350
700 200 200 900
600 400 " 100 1000
37. Calculate the standard deviation of the following series :-
Marks Number of students
More than 0 100
10 90
20 75
30 50
40 25
50 15
60 5
70 0
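38. Find out the mean and variance from the following data :-

                                                 Factory A    Factory B
Wages                                            No. of       No. of
                                                 workers      workers
Not exceeding Rs. 40                                30           45
Exceeding Rs. 40 but not exceeding Rs. 80           25           35
Exceeding Rs. 80 but not exceeding Rs. 120          30           25
Exceeding Rs. 120 but not exceeding Rs. 160         45           40
Exceeding Rs. 160 but not exceeding Rs. 200         25           25
Exceeding Rs. 200 but not exceeding Rs. 240         13           20
Exceeding Rs. 240 but not exceeding Rs. 280         24            5
Exceeding Rs. 280 but not exceeding Rs. 320          8            5
Total                                              200          200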
39. A collar manufacturer is considering the production of a new style of collar
to attract young men. The follOWing statistics of neck circumferences are available
based upon measurements of a typical group of college students : - '
Mid-value No. of students Mid-value No. of students
(inches) (inches)
12.5 4 15.0 29
13.0 19 15.5 18
13.5 30 16.0 1
14.0 63 16.5 1
14.5 66
Compute the Standard Deviation and use the criterion (X ±3 Standard Deviation)
to determine the largest and smallest size of collars he should make in order to meet
the needs of practically all his customers, bearing in mind that collars are wom. on
~:age, ! inch larger than neck si.l.e. (D. Com., RRj., 1949).

40. Calculate the arithmetic average and the standard deviation of the following
figures and state the percentages of cases which lie outside the mean at distance a ± σ,
a ± 2σ, a ± 3σ, where σ stands for the standard deviation.

148, f45, 141, 116, 96, $II, 87, 89, 91, 91, 102, 95, 108, 120, 139.
41. Find the S. D. of the following frequency distribution : -
Exceeding But not exceeding Frequency
5.5 6.5 4
6.5 7.5 2
7.5 8.5 5
8.5 9.5 7
9.5 10.5 9
10.5 11.5 4
11.5 12.5 2
(M. A., Agra, 1934).
42. The following table relates to the profits and losses of 100 firms. Calculate
the average profits and the standard deviation of profits.
Profits Rs. Number of firms
5000 to 6000 8
4000 to 5000 12
3000 to 4000 30
2000 to 3000 10
1000 to 2000 5
0 to 1000 5
-1000 to 0 6
-2000 to -1000 8
-3000 to -2000 9
-4000 to -3000 7
43. In any two series, where /1 and /. represent the deviation from a trial average,
100,
X/l -180 ,E/11=245320
XJ.-250 ,EJ,I-4385Q
II .... 100
Calculate the co-efficient of variation for the two series.
44. In any two samples, where the variates x1 and x2 are measured in the same
units,
n1 = 36, (summation) Σx1² = 49428
n2 = 49, (summation) Σx2² = 71258
Compute the values of the Standard Deviations of the two samples. What
additional information is required to calculate the co-efficient of variation of the above
two samples ? Indicate the uses of such a coefficient. (B. Com., Lucknow, 43).
45. An analysis of the monthly wages paid to workers in two firms A and B,
belonging to the same industry, gives the following results :-
                                          Firm A     Firm B
Number of wage-earners                     586        648
Average monthly wage (Rs.)                 52.5       47.5
Variance of the distribution of wages      100        121
(a) Which firm, A or B, pays out the larger amount as monthly wages ?
(b) In which firm, A or B, is there greater variability in individual wages ?
(c) What are the measures of (i) average monthly wage, and (ii) the variability
in individual wages, of all the workers in the two firms, A and B, taken together ?
(I. A. S., 1951).

46. The following table gives the marks obtained by 100 students :-
Digits (Division of Class-interval)

Marks 0 1 2 3 4 5 6 7 8 9 Total
0-9 2 4 3 1 1 1 12
10-19-----1--5 3 4 2 1 15
- - -20--29 ! -
1 7 8 10 5 4 3 2 40
- - - 30-39 3 5 10 2 1 1 22
40--49-- 4 3 2 2 11
100

Thus, 4 marks obtained by 3 students 13 marks by 4 students, 35 marks by 2


students and so on.
Calculate the mean marks and standard deviation of marks ; -
(i) By using the totals only.
(ii) By using the Whole data.

47. How do you calculate the co-efficient of variation of a distribution?


What is the justification for saying that about 68 per cent of the observed values
lie within one standard deviation of the mean value ?
The following marks Were given to a batch of candidates ; -

66, 62, 45, 79, 32, 51, 56, 60, - 51, 49


25, 42, 54, 54, 58, 70, 43, 58, 50, 52
38, 67, 50, 59, 48, 65, 71, 30, 54, 55
82, 51, 63, 45, 53, 40, 35, 56, 70, 42
67, 55, 57, 30, 63, 42, 74, 58, 44, 55

Find the co-efficient of variation of the marks.


Also draw a cumulative frequency curve, and from this curve find the proportion
of candidates receiving more than 50 marks. (I. A. S., 1953).
48. Explain the terms ; Frequency distribution, frequency polygon, frequency
histogram and frequency curve.
For a cert;ain group of 'Jart~' weavers ofBanaras. the median and quartile earnings
per week are Rs. 44.3, Rs. 43.0 and Rs. 45.9 respectively. The earnings for the group
range between Rs. 40 and Rs. 50. Ten per cent of the group earn under Rs. 42 per
week, 13 per cent earn Rs. 47 and over and 6 per cent Rs. 48 and over. Put these
data into the form of a frequency distribution and obtain an estimate of the mean
wage and the standard deviation. (P. C. S., 1956).
49. Compile a table showing the frequencies with which words of different
numbers of letters occur in the extract reproduced below (omitting punctuation
marks) treating as the variable the number of letters in each word, and obtain the
mean, median, and the co-efficient of variation of the distribution ;
Success in the examination confers no absolute right to appointment, unless
Government is satisfied, after such enquiry as may be considered necessary, that the
candidate is suitable in all respects for appointment to the public service.
(I. A. S .• 1947.)
50. The following table gives the frequency distribution of area under wheat
in a sample of 282 villages in Meerut District during 1936-37. Calculate (a) the
standard deviation, and (b) the semi-interquartile range of the distribution :-

Bighas under     Frequency     Bighas under     Frequency
wheat                          wheat
    0-               3           1,000-             18
  100-               7           1,100-             14
  200-              10           1,200-             14
  300-              17           1,300-             16
  400-              33           1,400-              8
  500-              29           1,500-              8
  600-              27           1,600-              6
  700-              21           1,700-              5
  800-              23           1,800-              2
  900-              20           1,900-2,000         1
(I. A. S., 1949).

51. What are measures of dispersion of a distribution ? Why is the standard
deviation most commonly used as a measure of dispersion in statistics ?

Goals scored by two teams A and B in a football season were as follows :-

Number of goals scored       Number of Matches
in a match                      A        B
     0                         27       17
1 9 9
2 8 6
3 5 5
4 4 3

By calculating the co-efficient of variation in each case, find which team may
be considered more consistent. (I. A. S., 1954).
52. Explain the method of computing the standard deviation of a frequency
distribution from a working origin different from the arithmetical mean.
Calculate the standard deviation for the data given below using the interval
50-59 as working origin :-

-Class-interval Frequency
0- 9 2
10- 19 4
20- 29 23
30- 39 30
40- 49 40
50- 59 45
60- 69 35
70-79 25
80- 89 12
90- 99 9
100-109 6
110-119 10
120-129 3
130-139 1
140-149 1
150-159 3
Total 249

How would the value obtained above be modified if you have to adjust it for
the reason that the data are grouped in class-intervals ? (I. A. S., 1956).

53. The following is a record of the number of bricks laid each day for 20 days
by two bricklayers A and B :-
A- 725, 700, 750, 650, 675, 725, 675, 725, 625, 675,
700, 675, 725, 675, 800, 650, 675, 625, 700, 650.
B- 575, 625, 600, 575, 675, 625, 575, 550, 650, 625,
550, 700, 625, 600, 625, 650, 575, 675, 625, 600.
Calculate the co-efficient of variation in each case, and discuss the relative
consistency of the two bricklayers. If the figures for A were in every case 10 more and
those for B in every case 20 more than the figures given above, how would the
answer be affected ? (M. Com., Banaras, 1950).
54. A distribution consists of three components with frequencies of 200,
250 an? 300 having means of 25. 10 and 15 and standard deviations 0( 3. ",
and 5 respectively. Find the mean and the standard deviation of the combined
distribution ? (M. Com., B4narar. 1954).
55. Suppose each measurement in a distribution is multiplied by 2. What
happens to the : -
(it) mean of the distribution
(/I) variance" ..
(l») standard deviation of "
(J~ each of the three if .. is added to each meaSlUCment ?
56. Compute the values of arithmetic average, mode, median and standard
deviation for the following observations :
96, 8.... 10.3, 88, 92, 98, 100, 96, 87
92, 94.
57. Suppose a group of children have a distribution of I. Q. Scores with mean
100 and standard deviation 10. If one child with I.Q. 70 is removed, what will be the
effect on the mean and standard deviation ?
58. Three distributions each of 100 members and standard deviation 4.5 units
are located with their arithmetic means at 12.1, 17.1 and 22.1 units respectively. Find
the standard deviation of the distribution obtained by combining the three.

S9 The (irst of the two samples bas 100 items with mean and standard deo:rla-
tion ,: If the whole group has 250 items with mean 15.6 and standard deviatiOn
vIT44 find the standard deviation of the second group. \
. , (M. A., Beo., Ik/~/, 1~"91
60. The mean and the standard Deviation of a sample of to? observa.tlOfls waS
calculated a9 40 and 5. I respectively by a student Who took by mIstake '.0 mstcad of
40 for one observation. Calculate the correct mcan and standard deYlallon.
61. Co-efficient of variation of two series are 60% and 70%. Their standard
deviatjons are :z [ {lnd z6. What are their arithmetic means?
62. Given: Number Mean Variance
IG~ ~
11 Group 60 5'
1 and II Group combincd 95 u .,
Find the missing items.
63. Indicate the extent of dispersion graphically for the data giycn in the
follOWing table ; -

Years
Income (in thousands)
I '.><
AII
B1
6
t6
55
8
ao
,6
JJ
(8
57
9
18
58
8
ZO
59
10

Z2
60 61
12

36
10
18
6z 6_
J4
zz
u.
110

64. The tablc given below gives the population and weekly earnings of twO
localities-A and B. Represent the data graphically to bring out the inequalities
of dil;tribution of earnings.

Weekly earning
(in Rs. I
o-:to I 2
20-40 6 S
40-00 8 zo
60-80 IS zS
So-IOO 20 4~

65. Find the actual class-intervals from the data given below :
dx -3 -2 -I 0 1 :l S
f 10 15 25 25 10 10 5

Mean = 31 and standard Deyiation = 15.9.


11
Moments, Skewness and Kurtosis

MOMENTS

While discussing the calculation of mean deviation and standard
deviation we have defined $\frac{\Sigma d}{n}$ and $\frac{\Sigma d^2}{n}$
(or $\frac{\Sigma fd}{n}$ and $\frac{\Sigma fd^2}{n}$ in case
of discrete and continuous series) as the First Moment and Second
Moment about the mean respectively. If the deviations are not taken
from the actual arithmetic average but from any other value x then
$\frac{\Sigma d_x}{n}$ and $\frac{\Sigma d_x^2}{n}$ (or $\frac{\Sigma fd_x}{n}$
and $\frac{\Sigma fd_x^2}{n}$) are known as First and
Second Moments about the value x. It is obvious that the second moment about
a value other than the mean would be more than the second moment
about the mean. The first moment about the mean is 0 because the
sum of the deviations from the mean is always 0. The second moment
about the mean is the variance or the square of the standard deviation.
Just as we can calculate the first and second moments either about the
mean or about any other value similarly 3rd, 4th, 5th and nth moments
can be calculated either about the mean or about any other value of the
variate. Thus the third moment about the mean or

$$\pi_3 = \frac{\Sigma fd^3}{n}$$

and the nth moment or

$$\pi_n = \frac{\Sigma fd^n}{n}$$

Just as it is possible to calculate the second moment about the mean,
or the variance, from the second moment about any other value, similarly,
all other moments about the mean can be expressed in terms of moments
about any other point. The following illustration would clarify these
points :-
Example 1. Calculate the first, second and third moments about
the mean from the following data :
Size of item Frequencies
2 10
4 15
8 8
10 7
Solution. Calculation of the moments about the mean

(i) Deviations from the assumed mean (4)

Size of    Frequency    Deviation
item          (f)         (dx)        fdx      fdx²      fdx³
   2           10          -2         -20       40       -80
   4           15           0           0        0         0
   8            8           4          32      128       512
  10            7           6          42      252      1512
Total          40                     +54      420      1944

(ii) Deviations from the actual mean (5.35)

Size of    Frequency    Deviation
item          (f)          (d)         d²       fd²       d³        fd³
   2           10         -3.35      11.22     112.2    -37.6     -376.0
   4           15         -1.35       1.82      27.3     -2.46     -36.9
   8            8         +2.65       7.02      56.1     18.6      148.8
  10            7         +4.65      21.62     151.3    100.7      704.9
Total          40                              346.9               440.8
The first moment about the arbitrary origin (4) or
$$\nu_1 = \frac{\Sigma fd_x}{n} = \frac{54}{40} = 1.35$$
The first moment about the mean or
$$\pi_1 = \nu_1 - \nu_1 = \frac{54}{40} - \frac{54}{40} = 0$$
The second moment about the arbitrary origin (4) or
$$\nu_2 = \frac{\Sigma fd_x^2}{n} = \frac{420}{40} = 10.5$$
The second moment about the mean or
$$\pi_2 = \nu_2 - \nu_1^2 = \frac{420}{40} - \left(\frac{54}{40}\right)^2 = 8.68$$
The third moment about the arbitrary origin (4) or
$$\nu_3 = \frac{\Sigma fd_x^3}{n} = \frac{1944}{40} = 48.6$$
The third moment about the mean or
$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3
= \frac{1944}{40} - \frac{3 \times 54 \times 420}{40 \times 40} + 2 \times \left(\frac{54}{40}\right)^3
= 48.6 - 42.5 + 4.92 = 11.02$$
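If we were to calculate the various moments about the mean by
taking the actual arithmetic average we would have got the same answers.
Thus
$$\pi_2 = \frac{\Sigma fd^2}{n} = \frac{346.9}{40} = 8.67 \quad\text{and}\quad \pi_3 = \frac{\Sigma fd^3}{n} = \frac{440.8}{40} = 11.02$$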

Similarly the fourth moment about an arbitrary origin or
$$\nu_4 = \frac{\Sigma fd_x^4}{n}$$
and the fourth moment about the mean or
$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$$
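The conversion from moments about an arbitrary origin to moments about the mean can be checked with a short Python sketch, using the data of Example 1 and the arbitrary origin 4. The function names are illustrative and not part of the text.

```python
def moment_about(sizes, freqs, origin, order):
    """Moment of the given order about `origin` for a frequency distribution."""
    n = sum(freqs)
    return sum(f * (x - origin) ** order for x, f in zip(sizes, freqs)) / n

def moments_about_mean(v1, v2, v3, v4):
    """Second, third and fourth moments about the mean from moments about any origin."""
    pi2 = v2 - v1 ** 2
    pi3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    pi4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    return pi2, pi3, pi4

sizes = [2, 4, 8, 10]
freqs = [10, 15, 8, 7]
origin = 4  # the arbitrary origin used in Example 1

v1, v2, v3, v4 = (moment_about(sizes, freqs, origin, k) for k in (1, 2, 3, 4))
pi2, pi3, pi4 = moments_about_mean(v1, v2, v3, v4)

# Direct calculation from the actual mean, for comparison.
mean = origin + v1
direct = [moment_about(sizes, freqs, mean, k) for k in (2, 3, 4)]

print("moments about the origin:", v1, v2, v3, v4)
print("moments about the mean (converted):", pi2, pi3, pi4)
print("moments about the mean (direct):   ", direct)
```

Without intermediate rounding the third moment about the mean comes to about 10.996, which agrees with the 11.02 of the worked example up to the rounding of the hand calculation.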
β and γ Coefficients
Certain values derived from the moments are of special importance,
particularly in a study of Kurtosis.
Thus
$$\beta_1 = \frac{\pi_3^2}{\pi_2^3}, \qquad \beta_2 = \frac{\pi_4}{\pi_2^2}$$
and further
$$\gamma_1 = +\sqrt{\beta_1}$$
and
$$\gamma_2 = \beta_2 - 3$$
Thus for example No. 1,
$$\beta_1 = \frac{(11.02)^2}{(8.68)^3} \quad\text{and}\quad \gamma_1 = \sqrt{\frac{(11.02)^2}{(8.68)^3}}$$
We shall see a little later how these measures are of importance in
studying the departure of a curve from normality (in a study of Kurtosis).
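As an illustration, the β and γ coefficients can be computed as in the following Python sketch. It takes the rounded moments 8.68 and 11.02 of Example 1 for the first pair; the second call uses assumed moments satisfying the normal-curve relation (fourth moment equal to three times the square of the second), purely to show that γ2 is then zero. The function names are illustrative.

```python
import math

def beta1_gamma1(pi2, pi3):
    """beta1 = pi3^2 / pi2^3 and gamma1 = +sqrt(beta1), as defined in the text."""
    beta1 = pi3 ** 2 / pi2 ** 3
    return beta1, math.sqrt(beta1)

def beta2_gamma2(pi2, pi4):
    """beta2 = pi4 / pi2^2 and gamma2 = beta2 - 3."""
    beta2 = pi4 / pi2 ** 2
    return beta2, beta2 - 3

# Rounded moments about the mean from Example 1.
beta1, gamma1 = beta1_gamma1(pi2=8.68, pi3=11.02)
print("beta1 =", round(beta1, 4), "gamma1 =", round(gamma1, 4))

# Assumed moments with pi4 = 3 * pi2^2, the relation that holds for a normal curve.
beta2, gamma2 = beta2_gamma2(pi2=2.0, pi4=12.0)
print("beta2 =", beta2, "gamma2 =", gamma2)
```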
SKEWNESS

Need and meaning. In our studies so far, we have discussed the methods
of measuring the central tendency of a frequency distribution and the
methods of studying the concentration of items round the central value.
These measures of central tendency and dispersion do not reveal whether
the dispersal of values on either side of an average is symmetrical or not.
If observations are arranged in a symmetrical order round a measure of
central tendency, we get what is called a "symmetrical distribution."
When plotted on a graph paper such a distribution gives a normal or
ideal curve. A normal curve has many mathematical properties, which
we shall study in a later chapter in which we shall discuss the various types
of theoretical frequency distributions. For the present it would suffice
to say that in a normal distribution the values of the mean, median and
mode coincide and the quartiles are equidistant from the median. It is obvious
that in such cases the sum of the deviations measured from the mean,
median or mode would be 0. We have already mentioned in earlier
chapters that the empirical relationships between various averages and
measures of dispersion hold good only in a symmetrical distribution.

A normal curve is a bell-shaped frequency curve in which the values on
either side of a measure of central tendency are symmetrical.
In order to study a frequency distribution it would be of great use
to know whether it would give a normal curve, and if not, to what extent
it would deviate from a normal distribution. In fact measures of central
tendency and measures of dispersion should always be supplemented by
what are called measures of skewness. Skewness is opposite of symmetry
and its presence tells us that a particular distribution is not symmetrical or in
other words it is skew. Thus averages tell us about the central value of a
distribution, measures of dispersion tell us about the concentration of
items round the central value, and measures of skewness tell us whether
the dispersal of items from an average is symmetrical or asymmetrical.
The following figures give us an idea about the shape of symmetrical
and asymmetrical curves.
Figure No. 1 gives the shape of an ideal symmetrical curve. It is
bell-shaped, and in it, there is no skewness. The value of mean, median
and mode in such a curve would be identical.

Figure 1. (Symmetrical curve; a, M and Z coincide.)

Figure No. 2 gives the shape of a moderately skewed curve. It is
skewed to the right. In it the value of mean would be more than the
values of median and mode. Median would have a value higher than
the value of the mode. Such curves are called positively skew.

Figure 2. (Curve skewed to the right.)
Figure No. 3 also gives the shape of a moderately skew curve.
This curve is skewed to the left and in it, the value of mode would be
greater than the value of median and the value of median would be greater
than the value of the mean. Such curves are called negatively skew.

Figure 3. (Curve skewed to the left.)
Tests of skewness
In order to find out whether a particular distribution is skew, certain
tests are usually applied. They are as follows :-
(a) In a skew distribution the values of mean, median and mode
would not coincide. The mean and mode would be pulled wide apart
and the median would usually lie between them. We have already seen
that in a moderately asymmetrical distribution
Mean = Mode + 3/2 (Median - Mode)
(b) In a skew distribution the two quartiles would not be equidistant
from the median or, in other words, $(Q_3 - M) - (M - Q_1)$ would
not be 0.
(c) A skew distribution when plotted on graph paper would not
give a symmetrical bell-shaped curve.
Measures of skewness
The above-mentioned tests would indicate whether a particular
distribution is skew or not. If a particular distribution is found to be
skew, the next problem that arises is to measure the extent of skewness.
Some distributions may be slightly different from the symmetrical
distribution while others may be very much different from it. Measures
of skewness are meant to give an idea about the extent of asymmetry
in a series.
First measure of skewness. The first measures of skewness are
based on the assumption that in a skew distribution the values of mean,
median and mode do not coincide. This being so, the difference
between any two of these values indicates the extent of skewness.
Thus the first measures of skewness are :-
(i) Mean - Mode or (a - Z)
(ii) Mean - Median or (a - M)
(iii) Median - Mode or (M - Z)
The above measures of skewness are absolute measures. For purposes
of comparison it is necessary to have relative measures of skewness.
Relative measures of skewness are obtained by dividing the absolute
measures by any measure of dispersion. The absolute measures of
skewness should not be divided by a measure of central tendency or
average because here the problem is not to study the extent of skewness
in relation to the size of items, but to study the asymmetry in relation
to the dispersal of items round a central value. The purpose of studying
skewness is to find out how much more or less the items on one side of
a central value deviate from it than the items on the other side.
Therefore, absolute measures of skewness should be divided by a measure
of dispersion rather than a measure of central tendency. Relative
measures of skewness are known as coefficients of skewness.
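Thus
Coefficient of skewness or
$j = \dfrac{a - Z}{\delta_z}$ ..... (i)
or $j = \dfrac{a - Z}{\delta}$ ..... (ii)
If the mode is ill-defined, the median can be used in place of the mode,
and then
$j = \dfrac{a - M}{\delta_m}$ ..... (iii)
or $j = \dfrac{a - M}{\delta}$ ..... (iv)
Skewness can also be studied by studying the difference of median
and mode. Thus,
$j = \dfrac{M - Z}{\delta_z}$ ..... (v)
or $j = \dfrac{M - Z}{\delta_m}$ ..... (vi)
Here $\delta$, $\delta_m$ and $\delta_z$ stand for the mean deviation
from the mean, the median and the mode respectively.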
Karl Pearson has given a formula in which the denominator is not
the mean deviation but the standard deviation.
Thus $j = \dfrac{a - Z}{\sigma}$ ..... (vii)
If the mode is ill-defined, Karl Pearson is of the opinion that its value
should be estimated on the basis of the empirical relationship which
exists between the values of mean, median and mode in a moderately
asymmetrical distribution. We have seen that in a moderately
asymmetrical distribution
(Mean - Mode) = 3 (Mean - Median)
Thus $j = \dfrac{3(a - M)}{\sigma}$ ..... (viii)
The value of the above coefficients of skewness would be 0 for a
symmetrical distribution and for skew distributions it would be a pure
number. These are the two properties of these coefficients and for these
reasons they are regarded as better than other measures. In theory there
are no limits to the values of coefficients (i), (ii), (iii), (iv),
(v), (vi) and (vii). In actual practice, for moderately asymmetrical
distributions all these coefficients (excepting No. viii) vary between ±1.
The theoretical limits of coefficient No. (viii) are ±3 (because the
theoretical limits of $\dfrac{a - M}{\sigma}$ are ±1) but they are never
reached in actual practice.
Second measure of skewness. The second measure of skewness is
based on the quartiles. It has been said above that in a skew
distribution $(M - Q_1)$ and $(Q_3 - M)$ would not be equal. A measure of
skewness is thus derived by finding out the difference between these
two values.
Thus
Second measure of skewness $= (Q_3 - M) - (M - Q_1)$
$= Q_3 - 2M + Q_1$
$= Q_3 + Q_1 - 2M$
The above is an absolute measure of skewness. The relative
measure can be obtained by dividing this absolute measure by the sum
of $(Q_3 - M)$ and $(M - Q_1)$.
Thus the coefficient of skewness or
$j = \dfrac{(Q_3 - M) - (M - Q_1)}{(Q_3 - M) + (M - Q_1)} = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$ ..... (ix)

This coefficient is also a pure number. Its theoretical limits are
±1. For a symmetrical distribution its value would be 0. However, the
second measure of skewness and its coefficient do not always give
dependable results. In many cases the value of this coefficient may be
zero and yet the distribution may not be perfectly symmetrical. The
reason for this lies in the fact that quartiles are not based on all the
observations of a series. Thus this measure of skewness should be used
with caution and, for purposes of comparison, as far as possible Karl
Pearson's coefficient of skewness should be used.
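The caution about the quartile coefficient can be checked on a small series. The following is only a sketch in Python: the function name `quartile_skewness` and the series of eleven items are made up for illustration and do not come from the text, and the quartiles are assumed to be the values of the (N+1)/4th, 2(N+1)/4th and 3(N+1)/4th items.

```python
from statistics import mean, quantiles


def quartile_skewness(q1, median, q3):
    """Coefficient (ix): (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    spread = q3 - q1
    if spread == 0:
        raise ValueError("Q3 and Q1 coincide; the coefficient is undefined")
    return (q3 + q1 - 2 * median) / spread


# Hypothetical series: ten evenly spaced items and one extreme item.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

# The 'exclusive' method places the quartiles at the (N+1)k/4th items.
q1, med, q3 = quantiles(series, n=4, method="exclusive")

coefficient = quartile_skewness(q1, med, q3)
series_mean = mean(series)

print("Q1, M, Q3:", q1, med, q3)
print("quartile coefficient:", coefficient)
print("mean:", series_mean, "median:", med)
```

Here the quartiles are equidistant from the median, so the coefficient is zero, yet the mean lies well above the median because of the single extreme item. The quartiles take no account of that item, which is the weakness described above.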
The following example illustrates some of the above formulae :-
Example 2. Calculate the coefficient of skewness from the following
data :-

Wages in Rupees      No. of labourers

 0-10                185
10-20                 77
20-30                 34
30-40                180
40-50                136
50-60                 23
60-70                 50
Solution. In the above example
                                            Rs.
Mean or $a$                                 29
Mode or $Z$                                 37.7
Median or $M$                               32.6
First Quartile or $Q_1$                      9.3
Third Quartile or $Q_3$                     42.8
Mean Deviation from mean or $\delta$        16.5
Standard Deviation or $\sigma$              18.9
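Coefficient of skewness
$j = \dfrac{a - Z}{\delta} = \dfrac{29 - 37.7}{16.5} = -.53$ (No. ii)
$j = \dfrac{a - M}{\delta} = \dfrac{29 - 32.6}{16.5} = -.22$ (No. iv)
$j = \dfrac{a - Z}{\sigma} = \dfrac{29 - 37.7}{18.9} = -.46$ (No. vii)
$j = \dfrac{3(a - M)}{\sigma} = \dfrac{3(29 - 32.6)}{18.9} = -.57$ (No. viii)
$j = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{42.8 + 9.3 - 65.2}{42.8 - 9.3} = -.39$ (No. ix)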
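The working of Example 2 can be retraced with a short Python sketch. It rests on assumptions that this section does not state: class mid-points are used for the mean and the deviations, the quartiles and median are interpolated at the N/4th, 2N/4th and 3N/4th items, and the mode is interpolated inside the 30-40 class, which is where the quoted value of 37.7 lies. The helper name `partition_value` is illustrative only.

```python
from math import sqrt

# Wage classes (lower limit, upper limit, number of labourers) from Example 2.
classes = [(0, 10, 185), (10, 20, 77), (20, 30, 34), (30, 40, 180),
           (40, 50, 136), (50, 60, 23), (60, 70, 50)]

n = sum(f for _, _, f in classes)
mids = [((lo + hi) / 2, f) for lo, hi, f in classes]

mean = sum(x * f for x, f in mids) / n
mean_dev = sum(abs(x - mean) * f for x, f in mids) / n      # delta
std_dev = sqrt(sum((x - mean) ** 2 * f for x, f in mids) / n)  # sigma


def partition_value(position):
    """Interpolate the value of the item at `position` in the cumulated series."""
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= position:
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position lies beyond the series")


# Assumption: quartile positions are N/4, 2N/4 and 3N/4.
q1, median, q3 = (partition_value(n * k / 4) for k in (1, 2, 3))

# Assumption: the 30-40 class is treated as modal; f0, f1, f2 are the
# frequencies of the preceding, modal and succeeding classes.
f0, f1, f2 = 34, 180, 136
mode = 30 + (f1 - f0) / (2 * f1 - f0 - f2) * 10

coefficients = {
    "ii": (mean - mode) / mean_dev,
    "iv": (mean - median) / mean_dev,
    "vii": (mean - mode) / std_dev,
    "viii": 3 * (mean - median) / std_dev,
    "ix": (q3 + q1 - 2 * median) / (q3 - q1),
}

for number, value in coefficients.items():
    print(f"No. {number}: {value:.2f}")
```

Under these assumptions the rounded results agree with the figures worked above. Every coefficient is negative, whichever measure of dispersion is used as the divisor.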
Positive and negative skewness
As has been said earlier, if a curve is skewed to the right the value
of the mean would be more than the value of either median or mode.
In such cases skewness is positive.
If, on the other hand, a curve is skewed to the left, the value of the
mean would be less than the values of median and mode. In such
cases skewness is said to be negative.
In the example solved above, skewness is negative as the mean has
a value less than the values of median and mode. Further, the degree
of skewness is high, as a coefficient of skewness of about .4 indicates a
high degree of skewness.

KURTOSIS

We have seen above that measures of skewness tell us whether a
particular distribution differs from a normal or symmetrical distribution
and if so, to what extent. Another measure to test how near a
particular frequency distribution conforms to the normal curve is
kurtosis. It indicates whether a distribution is more flat-topped or more
peaked than the normal distribution. Figure 4 shows a normal
curve and two other curves in which kurtosis is present.
In this figure curve No. 1 is a normal curve. It is also called
mesokurtic. Curve No. 2 is more peaked than the normal curve. Such
curves are known as leptokurtic. Curve No. 3 is more flat-topped than
the normal curve. Such curves are known as platykurtic.
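[Figure 4: a normal or mesokurtic curve (No. 1), a more peaked or leptokurtic curve (No. 2) and a more flat-topped or platykurtic curve (No. 3).]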
Measures of kurtosis
Kurtosis is measured by the coefficient $\beta_2$ or its derivative
$\gamma_2$. We have seen in connection with the study of moments that
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
In other words $\beta_2$ is equal to the fourth moment about the mean
divided by the square of the second moment about the mean.
$\gamma_2 = \beta_2 - 3$
The standard value of $\beta_2$ is taken as 3. Curves with values
of $\beta_2$ less than 3 are called platykurtic and curves with values of
$\beta_2$ more than 3 are called leptokurtic. In a normal or mesokurtic
curve the value of $\beta_2$ is equal to 3. As such, for a normal curve
the value of $\gamma_2 = 0$, and in curves which are more flat-topped or
more peaked than the normal curve the value of $\gamma_2$ would be either
a minus or a plus figure. The bigger the value of $\gamma_2$ (ignoring
its sign) in a frequency distribution, the greater is its departure from
normality.
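As an illustration of the two coefficients, the sketch below works them out in Python for two small series. The series and the function names `central_moment`, `kurtosis_coefficients` and `describe` are invented for this illustration and are not taken from the text. Moments are taken about the mean and divided by the number of items.

```python
def central_moment(values, order):
    """Moment of the given order about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def kurtosis_coefficients(values):
    """Return (beta_2, gamma_2) with beta_2 = mu_4 / mu_2**2 and gamma_2 = beta_2 - 3."""
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    beta2 = mu4 / mu2 ** 2
    return beta2, beta2 - 3


def describe(gamma2, tolerance=1e-9):
    """Name the type of curve from the sign of gamma_2."""
    if abs(gamma2) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if gamma2 > 0 else "platykurtic"


# Hypothetical series: items spread evenly, with no concentration at the centre.
flat = [1, 2, 3, 4, 5]
# Hypothetical series: most items concentrated at the central value.
peaked = [1, 5, 5, 5, 5, 5, 9]

flat_beta2, flat_gamma2 = kurtosis_coefficients(flat)
peaked_beta2, peaked_gamma2 = kurtosis_coefficients(peaked)

print("flat:  ", flat_beta2, flat_gamma2, describe(flat_gamma2))
print("peaked:", peaked_beta2, peaked_gamma2, describe(peaked_gamma2))
```

For the first series the second and fourth moments are 2 and 6.8, so $\beta_2$ is 1.7 and $\gamma_2$ is a minus figure. For the second series $\beta_2$ comes to 3.5 and $\gamma_2$ is a plus figure.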
Dispersion, skewness and kurtosis contrasted
Now that we have studied dispersion, skewness and kurtosis, it
will not be out of place to compare and contrast them, as all these measures
are meant to study the formation of a frequency distribution. Dispersion
studies the scatter of items round a central value or among themselves.
It does not show the extent to which deviations cluster below an average
or above it. Measures of skewness study this point. They tell us about
the cluster of deviations above and below a measure of central tendency.
In a normal distribution the deviations below and above an average are
equal while in an asymmetrical distribution they are not equal. Kurtosis
studies the concentration of items at the central part of a series. If the
items concentrate too much in the centre the curve becomes leptokurtic,
and if the concentration in the centre is comparatively little the curve
becomes platykurtic.
Thus we find that measures of dispersion, skewness and kurtosis
study three different aspects of a frequency distribution. Measures of
dispersion throw light on the span within which values of a variable lie.
They study the size of a series. Measures of skewness throw light on
the shape of the series and the size of variation on either side of a central
value. Kurtosis studies the frequencies of a series at the central values.
The theory of skewness and kurtosis is not of very great importance
in economic and social studies, as in these cases a normal distribution
is usually out of the question, but the importance of these studies is
very great in biological studies and studies relating to the physical
sciences.

Questions
1. Define moments and discuss the method of calculating moments of
dispersion about the mean.
2. How would you calculate the value of a moment about the mean from the
value of the moment about any arbitrary value ?
3. What is skewness? How does it differ from dispersion? What are the
various measures of skewness which you know ?
4. What is kurtosis? What purpose does it serve? Is the study of kurtosis
useful in economic and social sciences ? If not, why ?
5. Find the Second Moment of Dispersion and a coefficient of skewness from
the data in the following series :-
Size of item Frequency Size of item Frequency

3 7·5 8S
7 8·S 32-
za 9·S 8
60
6. Find out the mean wage and a coefficient of skewness for the following :-

3~
40 ..,.
men get at the rate of Rs.

..
•• ,.••
5-5 0 ....
4-50