WHAT IS STATISTICS?
1.0 Objectives
1.1 Introduction
1.2 Statistical Modeling
1.3 Probability
1.4 Common statistical Terminology
1.5 Population
1.6 Probable errors in statistics
1.7 Variables
1.8 Statistical Measures (Tools)
1.8.1 Central Tendency
1.8.2 Measures of Dispersion
1.9 Distribution
1.10 Expectations
1.11 Association
1.12 Summary
1.13 Check your Progress - Answers
1.14 Questions for Self - Study
1.0 Objectives
What is Statistics? / 1
1.1 Introduction
In our day-to-day life we deal a lot with statistics, maybe without
being aware of it. For example, when you state your graduation marks,
you are making use of the ‘averaging’ concept of statistics. When you
talk of the odds in favour of India winning a cricket match, you are
dealing with ‘probability’. When you talk of the best-selling product,
you speak of the ‘modal value’. Weather forecasts are also based on
statistical analysis of weather data collected with the help of satellites.
1.3 Probability
Probability
Taken as a science, it is the theory of chance.
Considered in connection with an event, it is the chance of that
event happening. The probability of any event lies between 0 and 1,
both included.
Mathematical or Objective Probability
Probability theory which is based on statistical data and the
probability axioms is called mathematical probability.
Axioms of probability
There are three axioms of probability: (1) Chances are always
at least zero. (2) The chance that something happens is 100%.
(3) If two events cannot both happen at the same time (mutually
exclusive events), the chance that either one happens is the sum
of their individual chances.
Subjective probability
Probability which is based on the feelings or thinking of a
person is called subjective probability.
Conditional probability
It is the probability of an event calculated on the assumption
that some related event has happened.
Experiment
Action whose outcomes are of interest to us is called as an
experiment e.g. tossing of a coin.
Event and Happening of an event
An event is a set of one or more outcomes of an experiment. An
event is said to have happened if the result of the experiment is
one of its outcomes. E.g. in the experiment of tossing a coin there
are two outcomes, head and tail. Two events A and B can be defined
as follows:
Event A : Head shows up in the experiment of tossing of a coin.
Event B : Tail shows up in the experiment of tossing of a coin.
Now if head shows up, then we can say that event A has happened.
The probability of an event A is denoted by P(A).
Business Statistics / 4
Sample Space
Set of all possible outcomes of an experiment is called as sample
space.
Dependent events
If the happening of one event changes the probability of another
event, then those events are said to be dependent events.
Independent events
If the happening of one event does not change the probability of
another event, then those events are said to be independent events.
Mutually Exclusive Events
Two or more events which cannot happen at the same point of
time are called as mutually exclusive events.
Exhaustive Events
If two or more events cover the entire sample space i.e. if two or
more events cover all possible outcomes of an experiment, then
such events are called as exhaustive events.
Certain Event
If the probability of an event happening is 1, the event is a
certain event.
Impossible event
If the probability of an event happening is 0, the event is an
impossible event.
Complement of an event
The complement of an event means that the event does not happen, i.e.
if event A is getting 1 in a throw of a die, then A complement is not
getting 1 in a throw of a die. The complement of an event A is denoted by
$A^C$, $A'$ or $\bar{A}$, and $P(A^C) = 1 - P(A)$.
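The definitions of sample space, event and complement can be checked numerically. The following is a minimal sketch, assuming a fair die whose six outcomes are equally likely; the helper name `probability` is illustrative and does not come from the text.

```python
from fractions import Fraction

# Sample space for one throw of a die (assumed fair: outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """P(event) = favourable outcomes / all outcomes (equally likely case)."""
    return Fraction(len(event & sample_space), len(sample_space))

event_a = {1}                           # event A: getting 1
complement_a = sample_space - event_a   # A complement: not getting 1

p_a = probability(event_a)
p_not_a = probability(complement_a)
print(p_a, p_not_a)                     # the two values add up to 1
```

The whole sample space behaves as a certain event and the empty set as an impossible event under this helper.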
1.3 Check your Progress
1. What is statistics?
________________________________________________________
________________________________________________________
2. What is meant by descriptive statistics?
________________________________________________________
________________________________________________________
3. Short Notes:-
i. Probability:
________________________________________________________
________________________________________________________
ii. Descriptive Statistics:
________________________________________________________
________________________________________________________
iii. Inferential Statistics:
________________________________________________________
________________________________________________________
0-10 12
10-20 15
20-30 6
30-40 9
Table 1.4.2
Class Interval
The difference between the upper class boundary and the lower class
boundary is called the class interval. In the above case (Table 1.4.2)
the class interval of the class 0-10 is 10 - 0 = 10.
Class Marks
The mid point of a class is known as the class mark. It is the average
of the upper class boundary and the lower class boundary, e.g. 25 is
the class mark for the class 20-30:
$\frac{20 + 30}{2} = \frac{50}{2} = 25$
Class Frequency
Class frequency is the number of observations in the class.
conclusions are to be drawn about the students in a particular college, all
students of that college will comprise the population.
Sample
The part of the population selected for the purpose of the study
is called the sample. In the above case it will be difficult to interview
all the students of the college, as the total number of students
could be in the thousands. In such a case one would select a few
students for interview. The students selected for interview comprise
the sample.
Sample Size
The number of elements in a sample is called as sample size.
Sample Survey
A survey based on the responses of a sample of individuals,
rather than the entire population.
Cluster Sample
In a cluster sample, the entire population is divided into
heterogeneous groups (clusters) and some of these groups are selected
as the sample. A sample of blocks chosen on a geographical basis is an
example of cluster sampling. If the blocks are chosen separately from
different strata, the overall design is a stratified cluster sample.
Convenience Sample
A sample drawn because of its convenience is not a probability
sample, e.g. a sample of people having telephone numbers in
Pune city, used to decide about the population of the city, is a
convenience sample. It is selected because it would be easier to
interview people over the phone than to visit their homes. Samples of
convenience are not representative of the population, and it is
not possible to quantify how unrepresentative results based on
samples of convenience will be.
Random Sample
A random sample is a sample whose members are chosen at
random from a given population in such a way that the chance
of obtaining any particular sample can be computed from
particular population.
Simple Random sample or probability sample
A simple random sample is a sample selected from the population
such that every individual of the population has an equal chance of
getting selected. A simple random sample can be drawn in two
ways: SRSWR (Simple Random Sample with Replacement)
and SRSWOR (Simple Random Sample without Replacement).
In SRSWR an individual once selected can be selected again,
i.e. it is put back in the population.
In SRSWOR an individual once selected cannot be selected again,
i.e. it is not put back in the population.
If we want to draw a sample of size 2 from the numbers
4, 5, 6, then with SRSWR we can have the following 9 samples:
(4,4), (4,5), (4,6), (5,4), (5,5), (5,6), (6,4), (6,5), (6,6), while with
SRSWOR we can have only 3 samples: (4,5), (4,6), (5,6).
Thus if the sample size is n, i.e. if n units are to be drawn from a
population of N units, then the total number of samples that can be
drawn by the SRSWR method is $N^n$, while with the SRSWOR method
we can draw $^{N}C_{n}$ samples.
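The two counts can be verified by listing the samples. This is a small sketch using Python's standard `itertools`; ordered pairs stand for SRSWR samples and unordered combinations for SRSWOR samples, matching the enumeration in the text.

```python
from itertools import combinations, product
from math import comb

population = [4, 5, 6]
n = 2                      # sample size
N = len(population)

# SRSWR: each unit is put back, so the same unit may appear twice.
srswr = list(product(population, repeat=n))

# SRSWOR: a unit cannot be selected again; order is ignored.
srswor = list(combinations(population, n))

print(len(srswr), srswr)    # N ** n samples
print(len(srswor), srswor)  # comb(N, n) samples
```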
Stratified Sample
In random sampling, sometimes the sample is drawn separately
from different disjoint homogeneous (having the same properties)
subsets of a population which itself is heterogeneous (having
different properties), i.e. the population is divided into a number of
groups. Each such group is called a stratum. The plural of stratum
is strata. Samples are drawn separately from each such group.
A sample drawn in this way is called a stratified sample.
For example, to determine the buying habits of persons in
society, one needs to divide the population of the city into various
income groups, because buying habits differ according
to income. Thus a heterogeneous population, that is, a population
having dissimilar incomes, is divided into a number of homogeneous
groups or strata having similar incomes.
Systematic Sample
A systematic sample from a frame of units is one drawn by
listing the units and selecting every individual after a fixed interval.
For example, if there are 100 units in the population and a sample
of 10 is to be drawn, then every 10th unit is selected. The selected
units are not necessarily the first unit, the eleventh unit, the
21st unit and so on: the first unit is usually chosen by a random
number and then every 10th unit after it is selected. Systematic
samples are not random samples, but they often behave essentially
as if they were random, if the order in which the units appear in
the list is haphazard. Systematic samples are a special case of
cluster samples. Systematic samples are not as good as simple random
samples. When the starting unit is decided by judgement rather than
by the random number method, the sample is called a Systematic
Sample and not a Systematic Random Sample.
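The selection rule for a systematic random sample can be sketched as follows. The function name `systematic_sample` and the fixed seed are illustrative assumptions; the population of 100 units and the sample of 10 follow the example in the text.

```python
import random

def systematic_sample(units, sample_size, rng=random):
    """Pick a random start, then every k-th unit, where k = N // sample_size."""
    interval = len(units) // sample_size
    start = rng.randrange(interval)      # random position within the first interval
    return units[start::interval][:sample_size]

units = list(range(1, 101))              # units numbered 1 to 100
sample = systematic_sample(units, 10, random.Random(1))  # seed fixed only for repeatability
print(sample)
```

Replacing the random start with a start chosen by judgement would give a systematic sample that is not a systematic random sample.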
Quota Sample
Quota sampling is a method of sampling widely used in opinion
polling and market research. Interviewers are each given a quota
of subjects of a specified type to attempt to recruit. For example,
an interviewer might be told to go out and select 20 adult men
and 20 adult women, 10 teenage girls and 10 teenage boys so
that they could interview them about their television viewing.
Random Error
All measurements are subject to error, which can often be broken
down into two components: a bias or systematic error, which
affects all measurements the same way; and a random error,
which is in general different each time a measurement is made,
and behaves like a number drawn with replacement from a box
of numbered tickets whose average is zero.
Systematic error
An error that affects all the measurements similarly. For example,
if a ruler is too short, everything measured with it will
appear to be longer than it really is (ignoring random error). If
your watch runs fast, every time interval you measure
with it will appear to be longer than it really is (again, ignoring
random error). Systematic errors do not tend to average out.
Systematic errors can also originate from incorrect sampling
procedures.
That is, the standard error is the standard deviation of the
errors.
1.7 Variable or Variate
A letter which can take the values of all observations, e.g. if the
variable x represents the marks of three students who have scored 40, 50
and 60 marks, then $x_1 = 40$, $x_2 = 50$, $x_3 = 60$.
Categorical Variable
A variable whose value ranges over categories, such as male,
female. Some categorical variables are ordinal.
Continuous Variable
A quantitative variable which can take all values in its range is
called a continuous variable. Its set of possible values is an infinite
set. In practice, one can never measure a continuous variable to
infinite precision, so continuous variables are sometimes
approximated by discrete variables. A random variable X is also called
continuous if the chance that it takes any particular value is zero.
A random variable is continuous if and only if its cumulative
probability distribution function is a continuous function (a function
whose graph does not show any break).
Discrete Variable
A quantitative variable which cannot take all values in its range is
called a discrete variable. Its set of possible values is finite or
countably infinite, i.e. a discrete random variable is one whose set
of possible values is countable. The graph of the cumulative
probability distribution of a discrete random variable has breaks (jumps).
Ordinal Variable
A variable whose possible values can be arranged in some order,
such as short, medium, long. In contrast, a variable whose
possible values are India, China, USA is not an ordinal variable.
Arithmetic with the possible values of an ordinal variable does
not necessarily make sense, but it does make sense to say
that one possible value is larger than another.
E.g. 1) The values 5, 4, 2, 3 can be arranged in order as 2, 3, 4, 5.
2) Good, Better, Best.
Random Variable
A random variable denotes the possible outcomes of a random
experiment. E.g. when a coin is tossed, the outcomes H and T are the
values of the random variable.
Random Experiment
A random experiment is one in which all outcomes have an
equal chance of appearing, e.g. a throw of a fair die has outcomes
1, 2, 3, 4, 5, 6. Since all outcomes have an equal chance of
appearing, a throw of a fair die is a random experiment, and if x
denotes the outcomes 1, 2, 3, 4, 5, 6 then x is a random variable.
Bias
When the measurements are affected by the judgment of the
data collector or data analyst rather than by standard Statistical
procedures, bias is said to be introduced. A biased estimate
gives the value, which is different from the truth. Numerical value
of bias is the average difference between the measurement value
and the actual value which could have been obtained without
bias. Unbiased or random selection procedure is without any
bias.
Dependent Variable
When the value of the first variable is governed by the value of the
second variable, then the first variable is a dependent variable, e.g.
x = Rank in examination, y = Number of marks in examination; here the
rank x depends on the marks y.
Independent Variable
When the value of a variable is not governed by the value of any
other variable, then such a variable is called an independent
variable, e.g. x = Height of a student, y = Marks of the student.
total number of observations. Geometric mean of 2, 2, 8, 8
is $\sqrt[4]{2 \cdot 2 \cdot 8 \cdot 8} = \sqrt[4]{256} = 4$.
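Harmonic Mean
It is the reciprocal of the average of the reciprocals of all
observations.
Harmonic mean of 2, 2, 2, 8 is calculated as
Step I : $\sum \frac{1}{x} = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{8} = \frac{13}{8}$
Step II : $\frac{n}{\sum \frac{1}{x}} = \frac{4}{\frac{13}{8}} = 4 \times \frac{8}{13} = \frac{32}{13}$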
Median
The observation that occupies the middle place when the data is
arranged in increasing order is called the median. Median of
10, 15, 20, 30, 35 is 20.
Mode
Mode is the most frequently occurring observation. There can be
more than one mode. Mode of 100, 200, 200, 500, 600 is 200.
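The worked values above can be reproduced with Python's standard `statistics` module (version 3.8 or later is assumed for `geometric_mean` and `multimode`). This is only a quick check of the examples in the text.

```python
import statistics

# Geometric mean of 2, 2, 8, 8 (fourth root of 256).
gm = statistics.geometric_mean([2, 2, 8, 8])

# Harmonic mean of 2, 2, 2, 8 (4 divided by 13/8, i.e. 32/13).
hm = statistics.harmonic_mean([2, 2, 2, 8])

# Median: middle observation of the ordered data.
med = statistics.median([10, 15, 20, 30, 35])

# multimode returns every most-frequent value, since a data set
# can have more than one mode.
modes = statistics.multimode([100, 200, 200, 500, 600])

print(gm, hm, med, modes)
```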
1.8.2 Dispersion
Dispersion gives an idea of the spread of the data from a central
value, say the mean.
Deviation
Deviation is the difference between an observation and some
reference value. The observation value is usually represented by X.
Deviation of X from some value A is $X - A$.
Deviation from the mean is $X - \bar{X}$.
Absolute deviation
When the deviation is always taken as positive irrespective of its
sign, it is called the absolute deviation. It is represented as
$|X - \bar{X}|$.
Mean Deviation
It is the sum of the absolute deviations from the mean divided by the
total number of observations. See chapter 2 for examples.
Standard Deviation
It is the square root of the sum of the squared deviations from the
mean divided by the total number of observations. See chapter 2 for
examples.
Variance
Variance is the square of the standard deviation. See chapter 2 for
examples.
Quartile deviation
It is the difference between the third quartile and the first quartile
divided by 2. It is also called the semi-interquartile range. See
chapter 2 for examples.
Inter Quartile range
It is the difference between the third quartile and the first quartile.
See chapter 2 for examples.
Range
Range is the difference between the largest value and the smallest
value of the data set. See chapter 2 for examples.
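The measures of dispersion defined above can be computed directly from their definitions. In this sketch the data values are illustrative and do not come from the text; the quartiles come from `statistics.quantiles`, whose default interpolation rule is only one of several conventions and may differ from the method used in chapter 2.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative values, not from the text
n = len(data)
mean = sum(data) / n

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance and standard deviation (dividing by n, as in the definition above).
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles (library convention), inter quartile range and quartile deviation.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
quartile_deviation = iqr / 2

print(mean_deviation, variance, std_dev, data_range, iqr, quartile_deviation)
```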
Trials
The number of times an experiment is repeated is called the
number of trials.
Binomial Distribution
A random variable has a binomial distribution if it denotes
the number of successes of a particular event in n trials,
where p is the probability of success in each trial. The probability
of success remains the same for all trials. The binomial distribution
has two parameters (n, p). It is a discrete distribution, e.g. the
number of heads obtained in tossing a fair coin n times. A variable
representing a binomial distribution is a binomial variable or
binomial variate. See chapter 4 for more details.
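As a small illustration of the two parameters (n, p), the probability of exactly k successes is $^{n}C_{k}\, p^k (1-p)^{n-k}$. The sketch below applies this standard formula to the coin example; the function name `binomial_pmf` and the choice of 10 tosses are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)

# The probabilities over k = 0..n add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```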
Poisson Distribution
A random variable has a Poisson distribution if it denotes the
number of successes of a particular event when units are
picked up from a population and m is the mean number of successes,
e.g. finding the probability that a sample of 10 units would contain 2
defectives if the probability of finding a defective is .05 (here
m = 10 x .05 = 0.5). The Poisson distribution has only one parameter,
m. It is a discrete distribution. The Poisson distribution is usually
applied where the probability of success is quite low, e.g. number of
accidents, number of defective products etc. A variable representing
a Poisson distribution is a Poisson variable or Poisson variate. See
chapter 4 for more details.
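The defectives example can be worked through with the standard Poisson formula $P(X = k) = e^{-m} m^k / k!$. This is a minimal sketch; taking the mean as m = n x p = 10 x 0.05 is the usual Poisson approximation and is an assumption here, and `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k, m):
    """Probability of exactly k successes when the mean number of successes is m."""
    return exp(-m) * m ** k / factorial(k)

m = 10 * 0.05                        # mean number of defectives in a sample of 10
p_two_defectives = poisson_pmf(2, m)
print(round(p_two_defectives, 4))
```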
Normal distribution
A normally distributed random variable is continuous and its
distribution is symmetric about the mean: 50% of the observations
lie below the mean and 50% of the observations lie above the mean.
It has a bell shaped continuous curve in which the two parts made
by the vertical line at the mean fit exactly over each other.
In this distribution mean = mode = median. A variable
representing a normal distribution is a normal variable or normal
variate. See chapter 4 for more details.
Standard Normal Distribution
It is a normal distribution in which mean = mode = median = 0
and standard deviation = 1. A variable representing the standard
normal distribution is a standard normal variable or standard
normal variate. The standard normal variate is denoted by z.
Univariate Distribution
A distribution involving only one variable is called a univariate
distribution, e.g. average marks obtained by students in an
examination.
Bivariate distribution
A distribution involving two variables is called a bivariate
distribution, e.g. marks obtained by students in two subjects, say
economics and statistics, in an examination.
Skewed distribution
A distribution that is not symmetrical is skewed distribution.
Null Hypothesis
It is the initial assumption about an outcome before testing. It is
denoted by $H_0$, e.g. "Average strength is as per required norms",
which means the population mean is the same as the sample mean. It is
written as $H_0: \mu = \bar{X}$.
Alternative Hypothesis
The assumption that is accepted if the null hypothesis is rejected.
It is denoted by $H_1$, e.g. "Average strength is less than required
norms".
Confidence Interval
A confidence interval is a range of values, calculated from the
sample, that is expected to contain the population parameter with a
stated level of confidence, e.g. a 95% confidence interval is
constructed so that, over repeated samples, about 95% of such
intervals would contain the true value of the parameter.
Confidence Level
Confidence level is the percentage of confidence with which we
expect the given parameter of the hypothesis to lie in the
confidence interval.
E.g. a hypothesis is rejected at the 95% confidence level when the
observed sample result falls outside the range in which 95% of
results would lie if the hypothesis were true.
Significance Level, Critical Level
Significance level is the probability of rejecting the null
hypothesis when it is true. It is 100% minus the confidence level,
e.g. 95% confidence level means 5% significance level.
Critical Value
The critical value in a hypothesis test is the value of the test
statistic beyond which we would reject the null hypothesis.
Type I Error
Rejecting the null hypothesis when it is true.
Type II Error
Accepting the null hypothesis when it is false.
One sided tests or one tailed tests :
A test in which we consider only one side of the distribution,
e.g. greater than or less than testing.
Two sided tests or two tailed tests :
A test in which we consider both sides of the distribution,
e.g. equal to and not equal to testing.
1.11 Association
Two variables are associated if variation in one variable is
accompanied by variation in the other variable.
Correlation
It is a measure of association between variables.
Scatter Diagram or Scatter Plot
It is the graph obtained by plotting the values of two variables which
describe a single bivariate observation (e.g. height and weight of a
person). One variable (independent variable) is taken as the X
coordinate and the other variable (dependent variable) as the Y
coordinate.
Correlation coefficient
The correlation coefficient r is a measure of how nearly a scatter
diagram or scatter plot falls on a straight line. The correlation
coefficient is always between – 1 and + 1.
Causation, Causal Relation.
Two variables are causally related if changes in the value
of one cause the other to change.
1.12 Summary
This chapter explains the importance of statistics to people and
the scope of statistics in different fields like medical science,
business etc. It also introduces the different types of statistical
theory and the tools to use in order to reach proper decisions of
interest in business.
1.13 Check your Progress – Answers
1.3
All answers are Descriptive. Take proper help of your SLM and
also write your ideas in your own words.
1.4
1) Class marks
2) Class interval
3) Upper class boundary and lower class
4) Frequency of that observation
5) Specified range
6) Quantitative Data
7) Quantitative Data
8) Chance
9) Analysis of Variance
10) Inferences
1.8
1) Continuous Variable
2) Discrete Variable
3) Truth
4) Mean, Mode and Median
5) Mode
6) Standard Deviation
7) Standard Deviation
8) Inter Quartile Range
9) Range
10) Trials
1.12
1) Correlation
2) – 1 and + 1
3) Type I Error
4) Type II Error
5) Association
CHAPTER 2
MEASURES OF CENTRAL
TENDENCY AND DISPERSION
2.0 Objectives
2.1 Introduction
2.2 Methods of Collection of Primary Data
2.2.1 Direct personal interview
2.2.2 Indirect personal interview
2.2.3 Mailed questionnaire
2.2.4 Scheduled through enumerations
2.3 Organizing the Data
2.3.1 Cumulative Frequency Distribution
2.3.2 Grouped Frequency Distribution
2.3.3 Guidelines for making class intervals
2.3.4 Cumulative grouped frequency distribution
2.4 Graphical Representation
2.5 Pie Chart Calculations
2.6 Frequency Curves
2.7 Cumulative Frequency
2.8 Averages
2.9 Partition Values
2.9.1 Quartiles
2.9.2 Deciles
2.9.3 Percentiles
2.10 Measures of Dispersions
2.10.1 Mean Deviations
2.10.2 Standard Deviation
2.11 The Coefficient of Variation
2.12 Skewness
2.13 Quartiles and the quartile Deviation
2.14 Extreme Values
2.15 Summary
2.16 Check your Progress – Answers
2.17 Questions for Self - Study
2.1 Introduction
age, i.e. dispersion is less, while in the second case data values are more
scattered or spread away from the average, i.e. dispersion is more.
The most common measure of dispersion is the standard deviation.
Primary data is data collected for the first time through census
or sample. There are several ways of collecting such data. These are :
Advantages:
Limitations:
Advantages:
Limitations:
Advantages:
- It can be administered to large groups of individuals.
- It is much less time consuming and is economical.
- A much larger coverage can be made as people in distant places
can be reached without much difficulty.
- It is advantageous in a situation where the persons concerned
move to far away places. For example, in an enquiry relating to
old students of a college, such a method may be useful as
students move out and away after leaving the institution.
- Useful for collection of demographic information, satisfaction
levels and opinions of the program.
Limitations:
- The method can be adopted only in case of enlightened and
educated people.
- As persons are not approached directly, the proportion of non
response is usually much larger. People do not have the time to
spare, nor are they willing to take the trouble of writing the
answers and returning the questionnaire. Sometimes people also
do not like to record information in their own handwriting and
very often avoid answering delicate questions.
Advantages:
Limitations:
- Frequency Distribution
Consider the following set of data, which are the high temperatures
recorded for 30 consecutive days. We wish to summarize this
data by creating a frequency distribution of the temperatures.
Frequency Distribution
1. Identify the highest and lowest values in the data set. In the
given temperature data the highest temperature is 51 and the
lowest temperature is 43.
2. Create a column with the title of the variable we are using, in
this case temperature. Enter the highest score at the top, and
include all values within the range from the highest score to the
lowest score.
3. Create a tally column to keep track of the scores as you enter
them into the frequency distribution. Once the frequency
distribution is completed you can omit this column. Most printed
frequency distributions do not retain the tally column in their
final form.
4. Create a frequency column, with the frequency of each value
as recorded in the tally column.
5. At the bottom of the frequency column record the total frequency
for the distribution, preceded by N =.
6. Enter the name of the frequency distribution at the top of the
table.
Cumulative Frequency Distribution for High Temperatures
Temperature   Tally   Frequency   Cumulative Frequency
51                    4           30
50                    4           26
49                    6           22
48                    0           16
47                    3           16
46                    3           13
45                    4           10
44                    3           6
43                    3           3
              N =     30
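The steps for building a frequency distribution and its cumulative column can be sketched in code. The 30 raw readings are not reproduced in this section, so the list below is rebuilt from the frequencies in the table; that reconstruction and the variable names are assumptions made only for illustration.

```python
from collections import Counter
from itertools import accumulate

# Frequencies copied from the table; the raw daily readings are rebuilt
# from them (their original day-by-day order is unknown).
table = {43: 3, 44: 3, 45: 4, 46: 3, 47: 3, 48: 0, 49: 6, 50: 4, 51: 4}
readings = [temp for temp, freq in table.items() for _ in range(freq)]

# Steps 1-2: list every value from the lowest to the highest temperature.
values = list(range(min(readings), max(readings) + 1))

# Steps 3-4: tally the readings and record the frequency of each value.
counts = Counter(readings)
frequencies = [counts[v] for v in values]

# Cumulative frequency: running total from the lowest value upwards.
cumulative = list(accumulate(frequencies))

# Print with the highest temperature at the top, as in the table.
for v, f, cf in reversed(list(zip(values, frequencies, cumulative))):
    print(v, f, cf)
print("N =", sum(frequencies))   # Step 5: total frequency
```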
Data Set – High Temperatures for 50 days
57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 59 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41
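A grouped frequency distribution of these 50 readings could be built as sketched below. The class width of 5 and the starting point 35 are assumptions chosen for illustration; the text's own class intervals for this data set are not shown in this section.

```python
from collections import Counter

temperatures = [
    57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
    58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
    49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
    53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
    43, 47, 47, 43, 51, 55, 55, 46, 54, 41,
]

WIDTH = 5     # assumed class width
LOWEST = 35   # assumed lower limit of the first class

def class_of(value):
    """Return the (lower limit, upper limit) of the class containing value."""
    lower = LOWEST + ((value - LOWEST) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)

grouped = Counter(class_of(t) for t in temperatures)
for (lower, upper), freq in sorted(grouped.items()):
    print(f"{lower}-{upper}: {freq}")
```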
2. Title :
It can contain the title and subtitle if any of the graph.
3. Axis :
Base line when data is positioned on a graph. Scale and scale
label are displayed on the axis. Unit label, axis title, and break
line are also displayed if necessary. The name of each axis
may vary depending on the chart type.
4. Plot area
The area in which the graph is plotted.
5. Series
The group of associated values displayed in the graph,
e.g. one year will be represented by one series and each series is
represented by a bar in the graph.
6. Legend
The list indicating the colour, line style, or filling pattern of the
Year   Wheat   Rice   Cereals   Total
1990   50      100    150       300
1999   100     150    250       500
Bar Chart
[Figure: bar chart of the Wheat values; vertical axis from 50 to 200]
Multiple or compound bar chart
[Figure: compound bar chart of Wheat, Rice and Cereals for 1990 and 1999; vertical axis from 50 to 250]
PIE CHART
A histogram of a frequency distribution is drawn as follows:
a) The class boundaries are marked on the X-axis, starting and
finishing at convenient points on the axis. The class intervals are
thus marked on the X-axis and are taken as bases.
b) On each base, a rectangle is drawn whose height is equal to
the frequency of that class. If the class intervals are of equal
width, the areas of the rectangles are proportional to the
corresponding class frequencies. Here the vertical axis (or
y-axis, as it is commonly known) is the frequency axis.
c) Instead of class boundaries, class limits may be used if the
frequency distribution is given or constructed in terms of class
limits. But it is better to use class boundaries, especially in case
of continuous variables. We draw below the histogram
corresponding to the frequency distribution given by the table in the
example to be solved.
[Figure: frequency curves: J-shaped, reversed J-shaped, U-shaped]
axis. The corresponding polygon is known as the cumulative frequency
polygon (less than) or ogive. By joining the points by a free hand curve
we get the cumulative frequency curve (“less than”). Similarly we can
construct another cumulative frequency distribution (“more than” type)
by considering the sum of frequencies greater than the lower class
boundaries of the classes. For example, the total frequency greater than
the lower class boundary 158.5 of the class 159-160 is one (1), while the
total frequency greater than the lower class boundary 156.5 of the class
157-158 is 1 + 4 = 5, that of the class 155-156 is 1 + 4 + 6 = 11, and so
on. Given below is Table 3.7 of the cumulative frequency distribution
(“more than”) of the same distribution.
Class interval (in cms.)   Frequency   Cumulative Frequency (Less than)
144.5 – 146.5              2           2
146.5 – 148.5              5           7
148.5 – 150.5              8           15
150.5 – 152.5              15          30
152.5 – 154.5              9           39
154.5 – 156.5              6           45
156.5 – 158.5              4           49
158.5 – 160.5              1           50
Total                      50
[Figure: cumulative frequency curves (ogives), “less than” and “more than”]
Class interval (in cms.)   Frequency   Cumulative Frequency (More than)
145 – 146                  2           50
147 – 148                  5           48
149 – 150                  8           43
151 – 152                  15          35
153 – 154                  9           20
155 – 156                  6           11
157 – 158                  4           5
159 – 160                  1           1
Total                      50
The graph obtained by joining the points obtained by plotting the
cumulative frequencies (“more than”) along the vertical axis and the
corresponding lower class boundaries along the X-axis is known as the
cumulative frequency polygon (greater than) or ogive. By joining the
points by a free hand curve, one gets the cumulative frequency curve
(“more than” type). These two curves are shown in the figure above.
Harmonic mean $= \dfrac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \dfrac{n}{\sum \frac{1}{x}}$
Mean for grouped data
Solution
Total score of 35 boys = 35 x 60 = 2100
Total score of 85 girls = 85 x 40 = 3400
Total score of 120 students = 2100 + 3400 = 5500 marks
$\bar{x} = \dfrac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
Median
Example: -
Median:
Ex: 2,3,5,6,7
No. of observations in the given set = N = 5, which is an odd number.
∴ Median = value of the $\left(\frac{N+1}{2}\right)$th observation
Here N = 5, so $\frac{N+1}{2} = \frac{5+1}{2} = 3$
Position:     1st, 2nd, 3rd, 4th, 5th
Observation:  2,   3,   5,   6,   7
Median = 3rd observation
= 5
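N = 6, which is an even number.
Median $= \dfrac{\text{value of } \left(\frac{N}{2}\right)\text{th} + \text{value of } \left(\frac{N}{2}+1\right)\text{th observation}}{2}$
$= \dfrac{\text{value of } \left(\frac{6}{2}\right)\text{th} + \text{value of } \left(\frac{6}{2}+1\right)\text{th observation}}{2}$
$= \dfrac{\text{value of 3rd} + \text{value of 4th observation}}{2}$
$= \dfrac{0 + 1}{2} = \dfrac{1}{2} = 0.5$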
2 2
N o . of stu d en ts 6 4 16 7 8 2
M a rk s 20 9 25 50 40 80
n
N fi 43
i1
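Solution:
In given data N = 10.
Median $= \dfrac{\text{value of } \left(\frac{N}{2}\right)\text{th} + \text{value of } \left(\frac{N}{2}+1\right)\text{th Obs.}}{2}$
$= \dfrac{\text{value of } \left(\frac{10}{2}\right)\text{th} + \text{value of } \left(\frac{10}{2}+1\right)\text{th Obs.}}{2}$
$= \dfrac{\text{value of 5th} + \text{value of 6th Obs.}}{2}$
$= \dfrac{7 + 8}{2}$
= 7.5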
Mode
Marks 0 1 2 3 4 5 6 7 8
Number of boys 7 10 16 17 26 31 11 2 1
Solution :
x_i   f_i   f_i x_i   c.f.
0     7     0         7
1     10    10        17
2     16    32        33
3     17    51        50
4     26    104       76
5     31    155       107
6     11    66        118
7     2     14        120
8     1     8         121
Mean
$\bar{X} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{440}{121} = 3.64$
Median
M = $\left(\frac{N+1}{2}\right)$th observation
= $\left(\frac{121+1}{2}\right)$th observation
= $\left(\frac{122}{2}\right)$th observation
= 61st observation
= 4
Corresponding c.f. = 76
Corresponding x_i = 4
Mode
Highest frequency in the given data = 31
and the value with the highest frequency = 5
Mode = 5
e.g.
(I) 1, 2, 2, 3, 3, 3, 4
The numbers appear in this data as follows:
1 once, 2 twice, 3 thrice, 4 once
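The mean, median and mode of the marks table can be recomputed from the frequencies as a check on the solution. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

marks = [0, 1, 2, 3, 4, 5, 6, 7, 8]
boys = [7, 10, 16, 17, 26, 31, 11, 2, 1]     # frequency of each mark

N = sum(boys)
mean = sum(x * f for x, f in zip(marks, boys)) / N

# Median: the ((N + 1) / 2)th observation, located through the
# cumulative frequency (c.f.) column.
cumulative = list(accumulate(boys))
position = (N + 1) / 2
median = next(x for x, cf in zip(marks, cumulative) if cf >= position)

# Mode: the mark with the highest frequency.
mode = marks[boys.index(max(boys))]

print(N, round(mean, 2), median, mode)
```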
These are the values that divide the total observations into a number
of equal parts when the data is arranged in increasing order.
2.9.1 Quartiles
$Q_1 = L + \dfrac{\frac{N}{4} - C.F.}{F} \times h$
L = Lower limit of the quartile class
C.F. = Cumulative frequency (c.f.) of previous class
F = Frequency of the quartile class
h = Width of the quartile class
$Q_3 = L + \dfrac{\frac{3N}{4} - C.F.}{F} \times h$
Q1 is called the lower quartile or first quartile.
Q2 is called the middle quartile or second quartile or median.
Q3 is called the upper quartile or third quartile.
2.9.2 Deciles
$D_i = L + \dfrac{\frac{N i}{10} - C.F.}{F} \times h$
For example ;
$D_1 = L + \dfrac{\frac{N \times 1}{10} - C.F.}{F} \times h$
$D_2 = L + \dfrac{\frac{N \times 2}{10} - C.F.}{F} \times h$
2.9.3 Percentiles
Percentiles are the values of the variate that divide the total
frequency into 100 equal parts. There are 99 percentiles in total. The
percentile Pi has i% of the values below it, e.g. if your score in an
examination is at the 86th percentile and your actual marks are 70, it
would mean that 86% of candidates have scored less than 70 marks, i.e.
86% of candidates have scored less marks than you and 14%
(100 - 86 = 14) of candidates have scored more marks than you.
The formula is
$P_i = l + \dfrac{\frac{N i}{100} - C.F.}{F} \times h$
For example;
$P_1 = l + \dfrac{\frac{N \times 1}{100} - C.F.}{F} \times h$
Find first quartile, median, the third quartile. Find D4, P66.
Solution:
The cumulative frequency (cf) is as below:
Here class 200-400 means wages from Rs.200 (200 is lower
class limit and is included in the class) to less than 400 (400 is upper
class limit and is NOT included in the class)
Q1 (first quartile) corresponds to the 280/4 = 70th observation, which
lies in the interval 600-800, so the lower class boundary is L = 600.
The interval contains 78 observations and the c.f. of the earlier class is
66. Class width h = 200.
$Q_1 = l + \dfrac{\frac{N \times 1}{4} - c.f.}{f} \times h$
$Q_1 = 600 + \dfrac{\frac{280 \times 1}{4} - 66}{78} \times 200$
$= 600 + \dfrac{70 - 66}{78} \times 200$
= 600 + 10.256
= 610.256
$Q_2 = 600 + \dfrac{\frac{280 \times 2}{4} - 66}{78} \times 200$
= 600 + 189.74
= 789.74
Q3 (third quartile) corresponds to the $\dfrac{3N}{4} = \dfrac{3 \times 280}{4}$ = 210th observation,
which lies in the interval 800-1000.
$Q_3 = l + \dfrac{\frac{3N}{4} - c.f.}{f} \times h$
$Q_3 = 800 + \dfrac{\frac{3 \times 280}{4} - 144}{80} \times 200$
= 800 + 165
= 965
D4
The 4th decile corresponds to the $\dfrac{N i}{10} = \dfrac{280 \times 4}{10}$ = 112th observation.
Therefore, l = 600, f = 78, c.f. = 66, h = 200
$D_4 = 600 + \dfrac{\frac{4 \times 280}{10} - 66}{78} \times 200$
= 600 + 117.95
= 717.95
P66
The 66th percentile corresponds to the $\dfrac{66 \times 280}{100}$ = 184.8th observation,
which lies in the interval 800-1000.
$P_i = l + \dfrac{\frac{N i}{100} - c.f.}{f} \times h$
$P_{66} = 800 + \dfrac{\frac{66 \times 280}{100} - 144}{80} \times 200$
= 800 + 102
= 902
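The interpolation used for Q1, Q2, Q3, D4 and P66 above follows one pattern, so it can be checked with a short Python sketch. The function name `partition_value` and its argument names are illustrative only; the class figures (lower limits 600 and 800, frequencies 78 and 80, previous cumulative frequencies 66 and 144, width 200, N = 280) are the ones quoted in the worked solution.

```python
def partition_value(lower, cf_prev, freq, width, n, i, parts):
    """Interpolated i-th partition value of grouped data.

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    position = i * n / parts              # rank of the required observation
    return lower + (position - cf_prev) / freq * width

N = 280
q1 = partition_value(600, 66, 78, 200, N, 1, 4)
q2 = partition_value(600, 66, 78, 200, N, 2, 4)
q3 = partition_value(800, 144, 80, 200, N, 3, 4)
d4 = partition_value(600, 66, 78, 200, N, 4, 10)
p66 = partition_value(800, 144, 80, 200, N, 66, 100)

print(round(q1, 3), round(q2, 2), round(q3, 2), round(d4, 2), round(p66, 2))
```

The class containing the required observation still has to be located by hand from the cumulative frequency column before the function is called.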
Ex : For what value of x will 8 and x have the same mean (average)
as 27 and 5 ?
$\dfrac{27 + 5}{2} = 16$. Therefore
$\dfrac{x + 8}{2} = 16$
32 = x + 8
24 = x
Solution :
$\dfrac{72 + 86 + 92 + 63 + 77 + x}{6} = 80$
(80)(6) = 390 + x
480 = 390 + x
90 = x
x x 46
= 38
3(dogs )
Z weighs 34 kg.
Measures of Dispersion:
The Range: The difference between the largest and smallest values
in a set or distribution.
Ex : The daily number of books sold by two separate bookstores
over twelve days were:
Bookstore 1 : 3, 5, 1, 4, 5, 3, 6, 8, 6, 2, 3, 7
Bookstore 2 : 2, 3, 2, 1, 4, 3, 2, 2, 1, 3, 4, 1
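Mean Deviation for Ungrouped Data
$M.D. = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Mean Deviation for Grouped Data
$M.D. = \dfrac{\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|}{\sum_{i=1}^{n} f_i}$
Even if nothing is mentioned, the mean deviation (M.D.) is always taken
about the mean.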
Ex : Find the range and calculate the mean deviation of 84, 92,
73, 67, 88, 74, 91, 74
Range = 92 - 67
= 25
$\text{Mean} = \dfrac{84 + 92 + 73 + 67 + 88 + 74 + 91 + 74}{8} = \dfrac{643}{8} = 80.375$
$\text{Mean Deviation} = \dfrac{\sum |x_i - \bar{x}|}{n} = \dfrac{67}{8} = 8.375$
In other words, each value in the set is, on average, 8.375 units
away from the mean.
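The range, mean and mean deviation of the eight values in this example can be reproduced with a few lines of Python. This is only a checking sketch; the variable names are chosen for illustration.

```python
data = [84, 92, 73, 67, 88, 74, 91, 74]
n = len(data)

data_range = max(data) - min(data)                        # largest minus smallest
mean = sum(data) / n                                      # arithmetic mean
mean_deviation = sum(abs(x - mean) for x in data) / n     # average absolute deviation about the mean

print(data_range, mean, mean_deviation)   # 25 80.375 8.375
```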
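Totals : $\sum f_i x_i = 1631$, $\sum f_i\,|x_i - \bar{x}| = 368.98$
$\text{Mean Deviation : M.D.}(\bar{x}) = \dfrac{\sum f_i\,|x_i - \bar{x}|}{\sum f_i}$
$= \dfrac{368.98}{68}$
= 5.43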
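1) Ungrouped Data
$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$ Where, $\bar{x} = \dfrac{\sum x_i}{n}$
$= \sqrt{\dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2}$
2) A frequency distribution:
$\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
$= \sqrt{\dfrac{\sum f_i x_i^2}{\sum f_i} - \left(\dfrac{\sum f_i x_i}{\sum f_i}\right)^2}$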
Solution:
xi       xi²
84      7056
92      8464
73      5329
67      4489
88      7744
74      5476
91      8281
74      5476
Total 643   52315
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{643}{8} = 80.375 \approx 80.38$
$S.D. = \sqrt{\dfrac{\sum x_i^2}{n} - \bar{x}^2}$
$= \sqrt{\dfrac{52315}{8} - (80.38)^2}$
$= \sqrt{6539.375 - 6460.94}$
$= \sqrt{78.435}$
= 8.856
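The same standard deviation can be recomputed in Python from the raw values. The sketch below keeps the mean unrounded (80.375 rather than 80.38), so it gives a slightly larger figure, roughly 8.90, than the hand calculation; the difference comes only from rounding the mean before squaring it.

```python
import math

data = [84, 92, 73, 67, 88, 74, 91, 74]
n = len(data)

sum_x = sum(data)                          # 643
sum_x2 = sum(x * x for x in data)          # sum of squares
mean = sum_x / n
variance = sum_x2 / n - mean ** 2          # population variance (divide by n)
std_dev = math.sqrt(variance)

print(sum_x, sum_x2, round(std_dev, 3))
```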
Number of weeks 3 17 15 20 9 4
$S.D. = \sqrt{\dfrac{\sum f_i x_i^2}{\sum f_i} - (\bar{x})^2}$
$= \sqrt{\dfrac{41877}{68} - (23.99)^2}$
$= \sqrt{615.838 - 575.52}$
$= \sqrt{40.318}$
= 6.35
2.11 The Coefficient of Variation
Solution:
2.12 Skewness
Skewness is a measure of the asymmetry of a frequency distribution,
and the skewness coefficient is included as one of the statistics.
A right, or positively, skewed forecast has a greater density of values,
and the mode, around the lower end of the range. A left, or negatively,
skewed forecast displays the opposite trend. The skewness of a frequency
distribution can be an important consideration. For example, if
your forecast is net profit, you would prefer a situation that led to a
positively skewed distribution of profit to one that is negatively skewed
(with all else being equal).
$\text{Pearson's skew (SKP)} = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
$\text{Bowley's skewness coefficient} = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
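[Figure: normal curve with the horizontal axis marked at μ – 3σ, μ – 2σ, μ – 1σ, μ, μ + 1σ, μ + 2σ and μ + 3σ. About 34% of the area lies in each band next to the mean, 13.5% in each band between 1σ and 2σ, and 2% in each band between 2σ and 3σ.]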
46 58 65 70 76 49 59 66 71 78
50 59 66 71 79 53 60 66 72 80
54 62 66 73 82 55 63 68 73 83
55 64 68 73 84 57 65 69 74 88
Solution:
As the percentages for this sample are very close to the empirical
rule, it is reasonable to conclude that this sample comes from a
normal population.
Formula:
$Z = \dfrac{x - \text{mean}}{\text{SD}}$
Where Z = standard score
x = any value in a data set
The standard score (Z) of a data value (x) is the number of standard
deviations that the data value is above or below the mean.
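Calculate the coefficient of variation:
$C.V. = \dfrac{SD}{\text{Mean}} \times 100$
$C.V.\ (\text{Stats}) = \dfrac{5}{50} \times 100$
= 10
$C.V.\ (\text{Eco}) = \dfrac{25}{40} \times 100$
= 62.5
C.V. (Stats) < C.V. (Eco)
The result of Stats is better (more consistent) than that of Eco.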
Even give
Jill (Stats) = 70
Jack (Eco) = 90.
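The coefficient of variation and the standard score can be put side by side in a small Python sketch. The means and standard deviations (50 and 5 for Stats, 40 and 25 for Eco) are the ones used in the C.V. calculation; pairing Jill's 70 with Stats and Jack's 90 with Eco follows the two lines above, and treating these as the scores to standardise is an assumption made here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = SD / Mean x 100."""
    return sd / mean * 100

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

cv_stats = coefficient_of_variation(5, 50)
cv_eco = coefficient_of_variation(25, 40)

z_jill = z_score(70, 50, 5)     # Stats score of 70 (assumed pairing)
z_jack = z_score(90, 40, 25)    # Eco score of 90 (assumed pairing)

print(cv_stats, cv_eco, z_jill, z_jack)
```

Under these assumptions the lower raw score (70) is the higher standard score, because the Stats marks are far less spread out.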
- The first quartile (denoted Q1) as the value with 25 percent of the
data below it.
For example, if a distribution has 8 values, Q1 is the value with 2
numbers less than it.
- The third quartile (denoted Q3) as the value which has 75 per-
cent of the data below it.
For example, if a distribution has 8 values, Q3 is the value with 6
numbers less than it.
- The range of the middle 50 percent of the data is found by sub-
tracting Q1 from Q3 – this is called the inter-quartile range.
1. Graphical Approach :
2. Formula approach:
Solution:
2. By formula
2.14 Extreme values
The terms outlier and extreme values are often used interchange-
ably. Both refer to a data value that is atypical of the data set i.e. values
which differ markedly from most of the numbers in the set.
2 1 10 1 1
2.15 Summary
For ungrouped data :
$\text{Mean} = \bar{x} = \dfrac{\sum x_i}{n}$
Where $\sum x_i$ = x1 + x2 + x3 + . . . . . + xn
n = No. of observations given in the data.
For grouped data, frequencies are given as fi;
$\text{Mean} = \bar{x} = \dfrac{\sum f_i x_i}{n}$, where n = $\sum f_i$
Likewise, for the median and mode there are two different formulae, for
grouped and ungrouped data.
1) Title
2) Plot area
3) Smoothed frequency polygon
4) More than and less than
5) 3
6) 5
7) 4
8) 10
9) Percentiles
10) Dispersion
CHAPTER 3
3.0 Objectives
3.1 Introduction
3.2 Scatter diagram
3.3 Correlation & Covariance
3.4 Karl Pearson’s correlation coefficient
3.5 Spearman’s Rank Correlation
3.6 Coefficient of concurrent Deviation
3.7 Standard Error & Probable Error
3.8 Coefficient of Determination
3.9 Regression
3.9.1 Least Square Method
3.9.2 Properties of Regression coefficient
3.10 Residual Values
3.11 Standard Error Estimate
3.12 Limitations
3.13 Homoscedasticity
3.14 Summary
3.15 Check your Progress – Answers
3.16 Questions for Self - Study
3.0 Objectives
Student 1 2 3 4 5
Height 165 175 160 180 160
Weight 52 57 54 60 50
[Figure: scatter diagrams. Scale: X - weight of a student, Y - height of a student. Panel I: correlation coefficient = +1. Panel II: correlation coefficient = -1 (graph is descending). Further panels 3.2.3 to 3.2.6.]
$\text{Cov}(x, y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$
$= \dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$
For uncorrelated variables the covariance is zero. However, if the
variables are correlated in some way, then their covariance will be
non-zero. In fact, if cov (x, y) > 0, then y tends to increase as x increases,
and if cov (x, y) < 0, then y tends to decrease as x increases. Note that
while statistically independent variables are always uncorrelated, the
converse is not necessarily true.
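$\sum x_i$ = 1 + 2 + 3 + 4 + 5 = 15
$\sum y_i$ = 6 + 9 + 6 + 7 + 8 = 36
$\sum x_i y_i$ = 6 + 18 + 18 + 28 + 40 = 110
$\text{Cov}(x, y) = \dfrac{1}{n}\left[\sum x_i y_i - \dfrac{1}{n}\sum x_i \sum y_i\right]$
$= \dfrac{1}{5}\left[110 - \dfrac{1}{5}(15)(36)\right]$
$= \dfrac{1}{5}\,[110 - 108]$
$= \dfrac{2}{5}$
= 0.4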
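As a check on this covariance, the following Python sketch evaluates both the shortcut form used in the working and the defining form with deviations from the means. The x and y lists are read off the sums written out above (x = 1 to 5, y = 6, 9, 6, 7, 8).

```python
x = [1, 2, 3, 4, 5]
y = [6, 9, 6, 7, 8]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shortcut form: (1/n) * [sum(xy) - (1/n) * sum(x) * sum(y)]
cov_shortcut = (sum_xy - sum_x * sum_y / n) / n

# Defining form: average product of deviations from the means
mean_x, mean_y = sum_x / n, sum_y / n
cov_definition = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

print(round(cov_shortcut, 4), round(cov_definition, 4))
```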
The two variables have a bivariate normal distribution for any given
value.
The method is used for measuring the linear relationship between two
variables (series). Pearson's coefficient between two variables (x, y) is
denoted by r (x, y), rxy, ryx or simply r. This is also known as the product
moment correlation coefficient. It is the ratio of the covariance cov
(x, y) to the product of the standard deviations of x and y.
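$r = \dfrac{\text{cov}(x, y)}{\sigma_x \sigma_y}$
σ = standard deviation
Now for n pairs of observations (x1, y1), (x2, y2), ……., (xn, yn)
$\text{cov}(x, y) = \dfrac{1}{n}\sum (x - \bar{X})(y - \bar{Y})$
$\sigma_X = \sqrt{\dfrac{1}{n}\sum (x - \bar{X})^2}$
$\sigma_Y = \sqrt{\dfrac{1}{n}\sum (y - \bar{Y})^2}$
$r = \dfrac{\sum (x - \bar{X})(y - \bar{Y})}{\sqrt{\sum (x - \bar{X})^2 \sum (y - \bar{Y})^2}}$
$r = \dfrac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}}$
$d_x = (x - \bar{X})$ and $d_y = (y - \bar{Y})$
Alternative formula :
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$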
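Ans
$\bar{X} = \dfrac{\sum x_i}{n} = \dfrac{55}{10} = 5.5$
$\bar{Y} = \dfrac{\sum y_i}{n} = \dfrac{88}{10} = 8.8$
$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
$= \dfrac{(10)(586) - (55)(88)}{\sqrt{(10)(385) - (55)^2}\,\sqrt{(10)(1114) - (88)^2}}$
$= \dfrac{1020}{\sqrt{(825)(3396)}} = \dfrac{1020}{\sqrt{2801700}}$
$= \dfrac{1020}{1673.8}$
= 0.61 (approx)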
Ex 3 The following table gives the monthly income and savings
of 10 persons. Calculate the correlation between
monthly income and savings.
Employee          1    2    3    4    5    6    7    8    9   10
Monthly Income  780  360  980  250  750  820  900  620  650  390
Net saving       84   51   91   60   68   62   86   58   53   47
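Solution:
$\bar{X} = \dfrac{6500}{10} = 650$
$\bar{Y} = \dfrac{660}{10} = 66$
Working table columns: No, X, Y, $x = X - \bar{X}$, $y = Y - \bar{Y}$, $x^2$, $y^2$, $xy$
$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
$r = \dfrac{27040}{\sqrt{539800 \times 2224}}$
= 0.78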
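The deviation form of Pearson's coefficient for the income and savings data of Ex 3 can be checked with the sketch below. It recomputes the sums of squared deviations and of products directly from the table rather than copying them, so the intermediate totals it prints come from the data itself; the variable names are illustrative.

```python
import math

income = [780, 360, 980, 250, 750, 820, 900, 620, 650, 390]
saving = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
n = len(income)

mean_x = sum(income) / n
mean_y = sum(saving) / n

dx = [x - mean_x for x in income]      # deviations of income from its mean
dy = [y - mean_y for y in saving]      # deviations of saving from its mean

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
print(sum_dxdy, sum_dx2, sum_dy2, round(r, 2))
```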
$R = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$
Statistics    3  5  8  4  7  10  2   1  6  9
Accountancy   6  4  9  8  1   2  3  10  5  7
Rank X   Rank Y    D    D²
3          6      -3     9
5          4       1     1
8          9      -1     1
4          8      -4    16
7          1       6    36
10         2       8    64
2          3      -1     1
1         10      -9    81
6          5       1     1
9          7       2     4
Total                  214
$R = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$
$= 1 - \dfrac{6(214)}{10(10^2 - 1)}$
= - 0.297
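The rank correlation for the Statistics and Accountancy ranks can be verified with a short Python sketch. It assumes, as in the example, that the two rows are already ranks with no ties.

```python
rank_statistics = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
rank_accountancy = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
n = len(rank_statistics)

# D = difference between the two ranks of each student
d = [a - b for a, b in zip(rank_statistics, rank_accountancy)]
sum_d2 = sum(v * v for v in d)

spearman_r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(spearman_r, 3))   # 214 -0.297
```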
x: 75 88 95 70 60 80 81 50
y: 120 134 150 115 110 140 142 100
n = 8
$\sum D^2$ = 6
$R = 1 - \dfrac{6 \times 6}{8(64 - 1)}$
$= 1 - \dfrac{1}{14}$
$= \dfrac{13}{14}$
= 0.93
3.6 Coefficient of Concurrent Deviation
$r_c = \pm\sqrt{\pm\dfrac{2c - m}{m}}$
Price :   1   4   3   5   5   8  10  10  11  15
Demand : 100  80  80  60  58  50  40  40  35  30
Solution :
Price (X)   CX   Demand (Y)   CY   CXCY
1                100
4           +    80           -
3           -    80           0
5           +    60           -
5           0    58           -
8           +    50           -
10          +    40           -
10          0    40           0    +
11          +    35           -
15          +    30           -
c = 1, m = 9
$r_c = \pm\sqrt{\pm\dfrac{2c - m}{m}}$
$r_c = -\sqrt{-\dfrac{2 \times 1 - 9}{9}} = -\sqrt{\dfrac{7}{9}}$
= - 0.88
The disadvantages are that it is not useful for a long-term range and it
does not differentiate between small and big variations. The results are
a rough indicator and not as accurate as those of other methods.
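A minimal Python sketch of the concurrent deviation calculation for the price and demand series is given below. It assumes the usual form of the coefficient, r_c = ±√(±(2c − m)/m), where c is the number of concurrent deviations and m the number of pairs of deviations, and it counts a pair in which neither series changes as concurrent, which is how the table marks the one "+" entry. The helper names are illustrative. For these series it evaluates to roughly -0.88.

```python
import math

def sign(value):
    """Return +1, 0 or -1 according to the direction of change."""
    return (value > 0) - (value < 0)

def concurrent_deviation(x, y):
    cx = [sign(b - a) for a, b in zip(x, x[1:])]    # direction of change in x
    cy = [sign(b - a) for a, b in zip(y, y[1:])]    # direction of change in y
    m = len(cx)                                      # number of pairs of deviations
    c = sum(1 for a, b in zip(cx, cy) if a == b)     # concurrent deviations (assumed rule)
    ratio = (2 * c - m) / m
    # The sign is taken outside the square root, as in the formula
    r_c = math.copysign(math.sqrt(abs(ratio)), ratio)
    return r_c, c, m

price = [1, 4, 3, 5, 5, 8, 10, 10, 11, 15]
demand = [100, 80, 80, 60, 58, 50, 40, 40, 35, 30]

r_c, c, m = concurrent_deviation(price, demand)
print(c, m, round(r_c, 2))
```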
3.7 Standard Error and Probable Error
$SE = \dfrac{1 - r^2}{\sqrt{n}}$
r = Coefficient of Correlation
n = number of pairs of observations.
Properties of P. E.
1) If r < 6 (P.E.), then it is not significant.
2) If r > 6 (P.E.), then it is significant and correlation exists.
Thus P.E. is used for testing the reliability of the value of r.
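$P.E. = 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$
$0.072 = \dfrac{0.6745(1 - r^2)}{\sqrt{25}}$
$0.072 = \dfrac{0.6745(1 - r^2)}{5}$
$(1 - r^2) = \dfrac{0.072 \times 5}{0.6745}$
$= \dfrac{0.360}{0.6745}$
$= \dfrac{360}{674.5}$
= 0.5337, taken as 0.533
r² = 1 - 0.533
= 0.467
$r = \sqrt{0.467}$
= 0.6833
$\text{Standard Error } SE = \dfrac{1 - r^2}{\sqrt{n}}$
$= \dfrac{0.533}{5}$
= 0.1066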
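The steps above, recovering r from a given probable error and then finding the standard error, can be sketched in Python. The inputs P.E. = 0.072 and n = 25 are the ones used in the working; the function names are illustrative. Because no intermediate rounding is done, the last digits differ slightly from the hand calculation.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def r_from_probable_error(pe, n):
    """Invert the P.E. formula for r (positive root)."""
    one_minus_r2 = pe * math.sqrt(n) / 0.6745
    return math.sqrt(1 - one_minus_r2)

n = 25
r = r_from_probable_error(0.072, n)
standard_error = (1 - r ** 2) / math.sqrt(n)

print(round(r, 3), round(standard_error, 4))
```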
3.9 Regression
Since the points are unlikely to fall precisely on the line, the
exact linear relationship must be modified to include an error (Stochastic
or random disturbance) term
$Y = a + b_{yx} X + e$
1. Linearity
$Y = a + b_{yx} X + e$
$\hat{Y} = a + b_{yx} X$
3.9.1 Least Squares method
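Minimize $\sum (Y - \hat{Y})^2$
Minimize $\sum (X - \hat{X})^2$
It is expressed as $y = a + b_{yx} x$, with
$b_{yx} = \dfrac{\sum d_x d_y}{\sum d_x^2}$
where,
$d_x = X - \bar{X}$ and $d_y = Y - \bar{Y}$
or alternatively,
$b_{yx} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
or
$b_{yx} = \dfrac{\sum xy - n\bar{X}\bar{Y}}{\sum x^2 - n\bar{X}^2}$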
3.9.2 Properties of regression Co-efficient
Sampled individual      1   2   3   4   5   6   7   8   9  10
Selection Test Score   88  85  72  93  70  74  78  93  82  92
Performance Rating     17  16  13  18  11  14  15  19  16  20
Sampled individual     11  12  13  14  15
Selection Test Score   79  84  71  77  87
Performance Rating     14  15  12  13  19
Solution :
$\bar{X} = \sum x / n = 1619 / 20 = 80.95$
$\bar{Y} = \sum y / n = 298 / 20 = 14.90$
$b_{yx} = \dfrac{\sum xy - n\bar{X}\bar{Y}}{\sum x^2 - n\bar{X}^2}$
$= \dfrac{24,492 - 20(80.95)(14.90)}{132,117 - 20(80.95)^2}$
$= \dfrac{24,492 - 24,123.1}{132,117 - 131,058.05}$
$= \dfrac{368.9}{1,058.95}$
= 0.3484
≈ 0.35
$a = \bar{Y} - b_{yx}\bar{X}$
= 14.90 – 0.35 ( 80.95 )
= 14.90 – 28.3325
= -13.43
Therefore the regression equation for estimating the performance
rating on the basis of selection test score is :
Ŷ = a + bX
Ŷ = - 13.43 + 0.35X
The value of byx = 0.35 indicates that the slope of the regression
line is 0.35, i.e. for each increase of one point in the selection
test score there is, on average, an increase of 0.35 in the performance
rating. Therefore, a direct (positive) relationship exists between these
two variables.
The value of a = -13.43 may look a bit puzzling. Graphically, this
is the point of intersection of the regression line with the Y axis; hence
this is the value of Y when X = 0, but how can there be a ‘negative’
performance rating when the data indicate that only positive ratings are
assessed? The answer is that any regression equation is only meaning-
ful for the range of the values of the independent variable included in the
sample.
Now, if a trainee applicant has a selection test score of 90, the
estimated performance rating on the job is :
Ŷ = a + byx X
= -13.43 + 0.35 ( 90 ) = -13.43 + 31.50 = 18.07 ≈ 18
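The least squares fit for this example can be reproduced with the Python sketch below. The first fifteen (score, rating) pairs are the ones tabulated in the problem; the last five are read from the rows numbered 16 to 20 of the residual table, which is an assumption about where those pairs belong. With unrounded coefficients the fitted line differs slightly in the second decimal from the rounded equation Ŷ = -13.43 + 0.35X, so the prediction at X = 90 comes out near 18.05 rather than 18.07.

```python
scores = [88, 85, 72, 93, 70, 74, 78, 93, 82, 92,
          79, 84, 71, 77, 87,
          87, 72, 77, 82, 76]          # last five: rows 16-20 (assumed)
ratings = [17, 16, 13, 18, 11, 14, 15, 19, 16, 20,
           14, 15, 12, 13, 19,
           17, 10, 12, 14, 13]
n = len(scores)

sum_x, sum_y = sum(scores), sum(ratings)
sum_xy = sum(x * y for x, y in zip(scores, ratings))
sum_x2 = sum(x * x for x in scores)

# Slope and intercept of the regression of y (rating) on x (score)
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b_yx * sum_x / n

def predict(score):
    """Estimated performance rating for a given selection test score."""
    return a + b_yx * score

print(round(b_yx, 4), round(a, 2), round(predict(90), 2))
```

As the text notes, such a prediction is only meaningful for scores inside the range covered by the sample.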
e = Y – Ŷ
Individual    X    Y     Ŷ      e = Y – Ŷ
16           87   17   17.02    -0.02
17           72   10   11.77    -1.77
18           77   12   13.52    -1.52
19           82   14   15.27    -1.27
20           76   13   13.17    -0.17
Regression Sum of Squares (RSS) = Explained Variation = $\sum (\hat{Y} - \bar{Y})^2$
Error Sum of Squares (ESS) = Residual variation in y = $\sum (y - \hat{Y})^2$
r² = RSS/TSS = 1 – (ESS/TSS)
i.e. (ESS/TSS) = 1 – r²
$S_{yx} = \sqrt{\text{unexplained error} / n} = \sqrt{\sum (Y - \hat{Y})^2 / n}$
$S_{yx} = \sigma_y \sqrt{1 - r^2}$
Operator 1 2 3 4 5 6 7 8
Experience (x) 16 12 18 4 3 10 5 12
Ratings (y) 87 88 89 68 78 80 75 83
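Solution :
n = 8
we have
$\bar{X} = \dfrac{\sum x}{n} = \dfrac{80}{8} = 10$
$\bar{Y} = \dfrac{\sum y}{n} = \dfrac{648}{8} = 81$
$b_{yx} = \dfrac{\sum d_x d_y}{\sum d_x^2} = \dfrac{247}{218} = 1.133$
By direct method
$b_{yx} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
$= \dfrac{8 \times 6727 - 80 \times 648}{8 \times 1018 - (80)^2}$
$= \dfrac{1976}{1744}$
= 1.133
Equation of the regression line of y on x is
$y - \bar{Y} = b_{yx}(x - \bar{X})$
y – 81 = 1.133 (x – 10)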
                      X     Y
Mean                 36    85
Standard deviation   11     8
Solution :
Given $\bar{X}$ = 36, $\bar{Y}$ = 85
σx = 11, σy = 8
r = 0.66
Now, $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.66 \times \dfrac{11}{8} = 0.908$
The regression equation of x on y is
$x - \bar{X} = b_{xy}(y - \bar{Y})$
x – 36 = 0.908 ( y – 85)
x – 36 = 0.908 y – 77.180
x = 0.908 y – 77.180 + 36
= 0.908 y – 41.180
When y = 75, then x will be;
x = 0.908 × 75 – 41.180
= 26.92 Ans.
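The regression of x on y from summary statistics can be checked with a brief Python sketch. The inputs (means 36 and 85, standard deviations 11 and 8, r = 0.66) are the given values; the function name is illustrative. Evaluating 0.908 × 75 – 41.180 gives about 26.92, and the unrounded slope gives 26.925.

```python
def regression_x_on_y(mean_x, mean_y, sd_x, sd_y, r):
    """Return slope b_xy and intercept of the line x = intercept + b_xy * y."""
    b_xy = r * sd_x / sd_y
    intercept = mean_x - b_xy * mean_y
    return b_xy, intercept

b_xy, intercept = regression_x_on_y(36, 85, 11, 8, 0.66)
x_at_75 = intercept + b_xy * 75

print(round(b_xy, 4), round(intercept, 4), round(x_at_75, 3))
```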
3.12 Limitations of correlation and regression analysis
3.13 Homoscedasticity
This means that variance around the regression line is the same
for all values of predictor variable x. The plot shows a violation of this
assumption. For the lower values, the points are all very near the regres-
sion line. For higher values on the x-axis, there is much more variability
around the regression line.
3.14 Summary
1) Co-Variance
2) Perfect
3) One
4) Bivariate distribution
5) Standard error
6) Coefficient of determination
7) independent Variable
8) Multiple Regression
9) y on x
10) Homoscedasticity
Short Notes:
1) Coefficient of concurrent
2) Types of Errors
3) Line of Regression
4) Homoscedasticity
5) Standard Error Estimate
4.0 Objectives
4.1 Introduction
4.2 Important Definitions
4.3 Basic Calculations in Probability
4.4 Basics of Permutations and Combinations
4.5 Set Theory & Probability Theorems
4.6 Bayes' Theorem
4.7 Mathematical Expectations or Expected Values
4.8 Binomial Distribution
4.9 Poisson Distribution
4.9.1 Properties of Poisson Distribution
4.9.2 Examples of Events
4.10 Normal Distribution
4.10.1 Properties of Normal Distribution
4.11 Standardizing Normal Random Variable
4.12 Summary
4.13 Check your Progress - Answers
4.14 Questions for Self - Study
4.0 Objectives
Probability
It is the theory of chance when taken as a science.
It is the chance of an event happening when considered in connection
with the event. The probability of any event is between 0 and 1, both
included. Probability is also defined as the percentage of times for which
a specific outcome would happen if the same experiment were repeated
a number of times.
Axioms of probability
There are three axioms of probability : (1) Chances are always
at least zero (2) The maximum chance that something happens is
100% (3) If two events cannot both occur at the same time, the chance
that either one occurs is the sum of the chances that each occurs.
Subjective Probability
Probability theory, which is based on feeling or thinking of a per-
son, is called as subjective probability.
Experiment
Action whose outcomes are of interest to us is called as an
experiment. e.g. toss of a coin. Chance of getting head or tail at a time is
exactly one half.
Sample Space
The set of all possible outcomes of an experiment is called the sample
space.
Dependent Events.
If happening of one event changes the probability of another event
then those events are said to be dependent events.
Independent Events
If happening of one event does not change the probability of an-
other event then those events are said to be independent events.
Exhaustive Events
If two or more events cover the entire sample space i.e if two or
more events cover all possible outcomes of an experiment, then such
events are called as exhaustive events.
Impossible Event
If probability of a happening of an event is 0, the event is an
impossible event.
Complement of an event
The complement of an event means that the event does not happen, i.e.
if event A is getting 1 in a throw of a die then the complement of A is not
getting 1 in a throw of the die. The complement of event A is denoted by
$A^c$, $A'$ or $\bar{A}$, and $P(\bar{A}) = 1 - P(A)$.
$P(\bar{A}) = 1 - P(A) = 1 - \dfrac{1}{13} = \dfrac{12}{13}$
Thus,
$P(A) + P(\bar{A}) = 1$
$\dfrac{1}{13} + \dfrac{12}{13} = 1$
Now, if event A is certain to occur then P(A) = 1 and $P(\bar{A})$ = 0
Alternatively probability can be defined as
$P(A) = \dfrac{n(A)}{n(S)}$
Let us consider tossing of a coin. The outcomes are head or tail.
S denotes a complete set of outcomes for a given situation and it is
called as sample space or universe. Thus in above experiment, Sample
space = S = {H,T}
Let us define event A : Getting a head on the top surface.
Therefore A = {H}
Now n(A) denotes the number of elements in the set A. Since set A
has only one element, n(A) = 1. Set S, the sample space, has 2 elements in
it. Therefore, n(S) = 2. Thus the probability that a head is obtained in the
tossing of a coin is;
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{1}{2}$
If we apply the first definition, then the number of favourable out-
comes are the ones in which we are interested. In this case we are
interested only in head i.e. number of favorable outcomes is only 1. Total
number of outcomes is 2. Again the probability getting head is ½.
Solution:
$P(A) = \dfrac{3}{36} = \dfrac{1}{12}$
$^{3}C_{2} = \dfrac{3 \times 2}{1 \times 2} = 3$ (in the denominator go on multiplying up to
the number after C, and in the numerator go on multiplying in
decreasing order starting from the number before C, for the
same number of factors as in the denominator.)
$^{10}C_{3} = \dfrac{10 \times 9 \times 8}{1 \times 2 \times 3} = 120$
$= \dfrac{10!}{(10 - 3)!\,3!} = \dfrac{10!}{7!\,3!}$
$= \dfrac{7! \times 8 \times 9 \times 10}{7! \times 1 \times 2 \times 3}$
Technical definition for combination is
$^{n}C_{r} = \dfrac{n!}{r!\,(n - r)!}$
$^{8}P_{2} = 8 \times 7 = 56$
$^{10}P_{3} = 10 \times 9 \times 8 = 720$
Technical definition for permutation is;
$^{n}P_{r} = \dfrac{n!}{(n - r)!}$
It is read as permutations of n objects taken r at a time.
$^{10}P_{3} = 10! / (10 – 3)! = 10! / 7! = 10 \times 9 \times 8$
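The combination and permutation counts above can be confirmed with Python's standard library (`math.comb` and `math.perm`, available from Python 3.8). The helper `n_c_r` is an illustrative function that spells out the factorial definition.

```python
import math

def n_c_r(n, r):
    """Combinations from the factorial definition n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(math.comb(3, 2), math.comb(10, 3))    # 3 120
print(math.perm(8, 2), math.perm(10, 3))    # 56 720
print(n_c_r(10, 3) == math.comb(10, 3))     # True
```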
Let A be the event where 3 white balls are drawn. Now, the 3 white
balls must come from the 5 available white balls.
n(A) = Number of ways in which 3 white balls can be drawn out
of 5 balls
$= {}^{5}C_{3} = \dfrac{5 \times 4 \times 3}{1 \times 2 \times 3}$
= 10 ways
$n(S) = {}^{9}C_{3} = \dfrac{9 \times 8 \times 7}{1 \times 2 \times 3} = 3 \times 4 \times 7$
= 84 ways
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{10}{84} = \dfrac{5}{42}$
4) $^{n}C_{n}$ = _______________.
5) $^{n}C_{1}$ = _______________.
N = { 1,2,3, ............. }
i) $P(A) \ge 0$
ii) $P(\emptyset) = 0$
iii) $P(S) = 1$
Proof - Since A is an event, therefore $A \subset S$
i) $P(A) = \dfrac{n(A)}{n(S)} \ge 0$
ii) $P(\emptyset) = \dfrac{n(\emptyset)}{n(S)} = \dfrac{0}{n(S)} = 0$
iii) $P(S) = \dfrac{n(S)}{n(S)} = 1$
Points to Remember (Most IMP)
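(since $A \cap B = \emptyset$, $n(A \cup B) = n(A) + n(B)$)
$P(A \cup B) = \dfrac{n(A)}{n(S)} + \dfrac{n(B)}{n(S)}$
$= P(A) + P(B)$
$P(A_1 \cup A_2 \cup \dots \cup A_k) = \sum_{i=1}^{k} P(A_i)$
$P(A - B) = P(A) - P(A \cap B)$
Proof - Let A and B be two events.
[Venn diagram: events A and B, showing the regions A - B and A ∩ B]
$(A - B) \cap (A \cap B) = \emptyset$
$(A - B) \cup (A \cap B) = A$
$P(A - B) = P(A) - P(A \cap B)$
[Venn diagram: events A and B, showing the regions A - B and B]
$(A - B) \cap B = \emptyset$
$(A - B) \cup B = A \cup B$
$P(A \cup B) = P[(A - B) \cup B]$
$= P(A - B) + P(B)$
$= P(A) - P(A \cap B) + P(B)$
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$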
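Theorem 7 - Addition Law for three events -
[Venn diagram: three overlapping events A, B and C]
Let $B \cup C = D$
Then $P(A \cup B \cup C) = P(A \cup D) = P(A) + P(D) - P(A \cap D)$ ……. (1)
…… (by Theorem (1))
But
$A \cap D = A \cap (B \cup C)$
$= (A \cap B) \cup (A \cap C)$
$P(A \cap D) = P[(A \cap B) \cup (A \cap C)]$
$= P(A \cap B) + P(A \cap C) - P[(A \cap B) \cap (A \cap C)]$ ……….(2)
[…….By Theorem 6]
$= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)$
and $P(D) = P(B \cup C) = P(B) + P(C) - P(B \cap C)$ ……..(3)
using (1), (2), and (3) we have
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$
Corollary - If A, B, C are mutually exclusive events,
then
$P(A \cup B) = P(A) + P(B)$
$P(B \cup C) = P(B) + P(C)$
$P(A \cup C) = P(A) + P(C)$
$P(A \cup B \cup C) = P(A) + P(B) + P(C)$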
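The addition law for three events can be verified numerically by counting. In the Python sketch below the sample space (the integers 1 to 30) and the events (multiples of 2, 3 and 5) are hypothetical choices made only for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

S = set(range(1, 31))                       # hypothetical sample space
A = {n for n in S if n % 2 == 0}            # multiples of 2
B = {n for n in S if n % 3 == 0}            # multiples of 3
C = {n for n in S if n % 5 == 0}            # multiples of 5

def P(event):
    """Classical probability n(event) / n(S)."""
    return Fraction(len(event), len(S))

lhs = P(A | B | C)                          # direct count of the union
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))                      # addition law for three events

print(lhs, rhs)   # 11/15 11/15
```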
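[Venn diagram: event A inside event B, with the region B - A]
Proof - Given $A \subset B$
$B = A \cup (B - A)$
and $A \cap (B - A) = \emptyset$
$P(B) = P[A \cup (B - A)]$
$= P(A) + P(B - A)$ (by Theorem 4)
$P(A) \le P(B)$, since $P(B - A) \ge 0$
Theorem 10 - If A is an event associated with a random experiment,
then $0 \le P(A) \le 1$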
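Examples : 1
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{5}{25} = \dfrac{1}{5} = 0.2$
$P(B) = \dfrac{n(B)}{n(S)} = \dfrac{3}{25}$
$= \dfrac{5}{25} + \dfrac{3}{25}$
$= \dfrac{5 + 3}{25}$
$= \dfrac{8}{25}$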
[Venn diagram: T only = 16, T ∩ C = 20, C only = 25, neither = 39]
Now,
By addition theorem
$n(T \cup C) = n(T) + n(C) - n(T \cap C)$ (Number of people taking tea or
coffee or both)
= 36 + 45 – 20 = 61
$n((T \cup C)') = n(S) - n(T \cup C)$ (Number of people taking neither tea nor
coffee)
= 100 – 61 = 39
$P(C) = n(C) / n(S) = \dfrac{45}{100} = 0.45$
4. P (Person takes Tea or Coffee)
$P(T \cup C) = n(T \cup C) / n(S) = \dfrac{61}{100} = 0.61$
5. P (Person takes Tea and Coffee)
$P(T \cap C) = n(T \cap C) / n(S) = 20/100 = 0.20$
6. P (Person takes neither Tea nor Coffee)
$P((T \cup C)') = n((T \cup C)') / n(S) = 39/100 = 0.39$
Or $P((T \cup C)') = 1 - P(T \cup C) = 1 - 0.61 = 0.39$
7. P (Person takes only Tea)
i.e. P (Person takes tea and not coffee)
$P(T - C) = P(T \cap C') = P(T) - P(T \cap C) = 0.36 - 0.20 = 0.16$
So, $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{26}{52}$
n (S) = 52 (Total sample space)
$n(A \cap B) = 2$
So, $P(A \cap B) = \dfrac{n(A \cap B)}{n(S)} = \dfrac{2}{52}$
n (S) = 52 (Total no. of cards in a pack)
[ Note : $A \cup B$ = A or B, $A \cap B$ = A and B ]
Solution
n (S) = 13
Let A be the event that the ball selected has a number that is a
multiple of 3,
i.e. 3, 6, 9, 12
n (A) = 4
$P(A) = \dfrac{4}{13}$
Let B be the event that the ball selected has a number that is a multiple
of 4, i.e. 4, 8, 12, so n (B) = 3
$P(B) = \dfrac{3}{13}$
$P(A \cap B) = \dfrac{1}{13}$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$= \dfrac{4}{13} + \dfrac{3}{13} - \dfrac{1}{13} = \dfrac{6}{13}$
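The addition rule in this example can be checked by listing the outcomes. The sketch below assumes the balls are numbered 1 to 13, which is consistent with n(S) = 13 and the multiples listed in the solution; the helper `P` is illustrative.

```python
from fractions import Fraction

S = set(range(1, 14))                    # balls numbered 1 to 13 (assumed)
A = {n for n in S if n % 3 == 0}         # multiples of 3
B = {n for n in S if n % 4 == 0}         # multiples of 4

def P(event):
    """Classical probability n(event) / n(S)."""
    return Fraction(len(event), len(S))

p_union_by_rule = P(A) + P(B) - P(A & B)     # addition rule
p_union_by_count = P(A | B)                  # direct count

print(sorted(A), sorted(B), p_union_by_rule, p_union_by_count)
```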
n (S) = 20
Let A be the event that the ticket drawn is a multiple of 2
i.e. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
n (A) = 10
$P(A) = \dfrac{10}{20}$
Let B be the event that the ticket drawn is a multiple of 5
i.e. 5, 10, 15, 20
n (B) = 4
$P(B) = \dfrac{4}{20}$
$n(A \cap B) = 2$ (2 numbers are common to both events A and B)
$P(A \cap B) = \dfrac{2}{20}$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$= \dfrac{10}{20} + \dfrac{4}{20} - \dfrac{2}{20} = \dfrac{12}{20} = \dfrac{3}{5}$
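Solution:
P (Failed in Maths) = $\dfrac{30}{100}$ = P (A)
P (Failed in Chem) = $\dfrac{20}{100}$ = P (B)
$P(A \cap B) = \dfrac{10}{100}$
$P(A/B) = \dfrac{P(A \cap B)}{P(B)}$
$= \dfrac{10/100}{20/100}$
$= \dfrac{10}{100} \times \dfrac{100}{20}$
$= \dfrac{1}{2}$
ii ) either in Maths or in Chem
$P(A \cup B)$ = ?
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$= \dfrac{30}{100} + \dfrac{20}{100} - \dfrac{10}{100}$
$= \dfrac{30 + 20 - 10}{100}$
$= \dfrac{40}{100}$
$P(A \cup B) = \dfrac{40}{100}$
= 0.40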
Ex 10 The probability that a contractor will get a plumbing contract
is 2/3, and the probability that he will not get an electric
contract is 5/9. If the probability of getting at least one contract
is 4/5, what is the probability that he will get both the
contracts?
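Solution
Event A = will get plumbing contract
$P(A) = \dfrac{2}{3}$
Event B = will get electric contract
$P(B') = \dfrac{5}{9}$
Probability of getting at least one contract i.e. $P(A \cup B) = \dfrac{4}{5}$
Probability of $(A \cap B)$ = ?
$P(B) = 1 - \dfrac{5}{9} = \dfrac{9 - 5}{9} = \dfrac{4}{9}$
$P(A \cap B) = P(A) + P(B) - P(A \cup B)$
$= \dfrac{2}{3} + \dfrac{4}{9} - \dfrac{4}{5}$
$= \dfrac{42}{135} = \dfrac{14}{45}$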
Ex 11 An urn contains 7 black and 5 white balls. Two balls are
drawn at random one after another. Find the probability that
both balls drawn are black :
i) when the first ball drawn is not replaced before drawing the second
(such drawing is called without replacement) and
ii) when the first ball drawn is replaced before the second ball is drawn
(such drawing is called with replacement)
Solution
Black Balls = 7
White Balls = 5
Total balls = 7 + 5 = 12
So
$n(S) = {}^{12}C_{2} = \dfrac{12 \times 11}{1 \times 2} = 6 \times 11 = 66$
i) When the first ball drawn is not replaced before drawing the second
(such drawing is called without replacement). In such cases we
find the probability by the usual method. 2 black balls can be drawn
out of 7 black balls in 7C2 ways.
$n(A) = {}^{7}C_{2} = \dfrac{7!}{(7 - 2)!\,2!} = \dfrac{5! \times 6 \times 7}{5! \times 1 \times 2} = 21$
$P(A) = \dfrac{{}^{7}C_{2}}{{}^{12}C_{2}} = \dfrac{21}{66} = \dfrac{7}{22}$
ii) When the first ball drawn is replaced before the second ball is drawn
(such drawing is called with replacement)
A = 1st ball drawn is black
i.e. we consider the event in two steps.
For the first step, 1 black ball is to be drawn out of 7 black balls
n (A) = 7C1 = 7
$P(A) = \dfrac{n(A)}{n(S)} = \dfrac{7}{12}$
B = 2nd ball drawn is black when the first ball is replaced.
At this stage the ball drawn is put back into the urn.
We therefore again have the same situation, i.e. 7 black balls and
12 total balls. For the second step, 1 black ball is to be drawn again out
of 7 black balls, as the earlier ball is replaced.
n (B) = 7C1 = 7
And for n (S), 1 ball is to be drawn out of the total 12 balls
n (S) = 12C1 = 12
$P(B) = \dfrac{n(B)}{n(S)} = \dfrac{7}{12}$
Since A and B are independent events
$P(A \cap B) = P(A) \times P(B) = \dfrac{7}{12} \times \dfrac{7}{12} = \dfrac{49}{144}$
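Both parts of the urn example can be checked with exact fractions. The sketch below computes the without-replacement case in two ways (by combinations, and by multiplying the probabilities of the successive draws) and the with-replacement case as a product of independent draws; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

black, white = 7, 5
total = black + white

# i) Without replacement: choose 2 black out of 7, from 2 out of 12
p_without = Fraction(comb(black, 2), comb(total, 2))
# Same result as a product of successive draws
p_without_sequential = Fraction(black, total) * Fraction(black - 1, total - 1)

# ii) With replacement: the two draws are independent
p_with = Fraction(black, total) * Fraction(black, total)

print(p_without, p_without_sequential, p_with)   # 7/22 7/22 49/144
```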
Solution
$n(S) = {}^{10}C_{3} = \dfrac{10 \times 9 \times 8}{1 \times 2 \times 3}$
= 120
2 blue balls can be drawn out of 6 in $^{6}C_{2} = \dfrac{6 \times 5}{1 \times 2}$ = 15 ways.
$P(A) = \dfrac{60}{120}$
$= \dfrac{1}{2}$
Ex. 13 A and B are independent events and P (A) = 1/3, P (B) = 3/4.
Find $P(A \cup B)$.
Solution
Solution
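$P(A) = P(\text{X selection}) = \dfrac{1}{5}$
$P(B) = P(\text{Y selection}) = \dfrac{1}{3}$
i. P (both X and Y are selected) = P(A and B) = $P(A \cap B)$
$= \dfrac{1}{5} \times \dfrac{1}{3}$
$= \dfrac{1}{15}$
ii. P (only one of X and Y is selected) = P(only A) + P(only B)
$P(\text{only A}) = P(A) - P(A \cap B)$
= P (A) – [P(A) . P(B)]
$= \dfrac{1}{5} - \dfrac{1}{5} \times \dfrac{1}{3}$
$= \dfrac{1}{5} - \dfrac{1}{15}$
$= \dfrac{15 - 5}{75}$
$= \dfrac{10}{75}$
$= \dfrac{2}{15}$
And
$P(\text{only B}) = P(B) - P(A \cap B)$
= P(B) – [P(A) . P(B)]
$= \dfrac{1}{3} - \dfrac{1}{5} \times \dfrac{1}{3}$
$= \dfrac{1}{3} - \dfrac{1}{15}$
$= \dfrac{15 - 3}{45}$
$= \dfrac{4}{15}$
Therefore P (only one of X and Y is selected) $= \dfrac{2}{15} + \dfrac{4}{15} = \dfrac{6}{15} = \dfrac{2}{5}$
iii. None of them is selected
$P(\text{neither X nor Y selected}) = P(A' \cap B') = P(A') \times P(B')$
$= \left(1 - \dfrac{1}{5}\right)\left(1 - \dfrac{1}{3}\right)$
$= \dfrac{4}{5} \times \dfrac{2}{3}$
$= \dfrac{8}{15}$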
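The cases in this selection problem can be laid out with exact fractions, which also makes it easy to see that "neither is selected" and "not both are selected" are different events. The sketch assumes, as the solution does, that the two selections are independent with P(X) = 1/5 and P(Y) = 1/3; the variable names are illustrative.

```python
from fractions import Fraction

p_x = Fraction(1, 5)      # P(X is selected)
p_y = Fraction(1, 3)      # P(Y is selected)

p_both = p_x * p_y                      # independence: P(A and B) = P(A) P(B)
p_only_x = p_x - p_both
p_only_y = p_y - p_both
p_exactly_one = p_only_x + p_only_y
p_neither = (1 - p_x) * (1 - p_y)       # neither X nor Y is selected
p_not_both = 1 - p_both                 # complement of "both selected"

print(p_both, p_only_x, p_only_y, p_exactly_one, p_neither, p_not_both)
```

The three mutually exclusive cases (both, exactly one, neither) add up to 1, which is a quick consistency check on the working.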
4.6 Bayes' Theorem
Suppose A1, A2, A3, ….., An are mutually exclusive and exhaustive events
and B is any other event which is spread over the events A1, A2, A3, ….., An.
Consider that there are 3 containers containing balls of different colours. Then event
A1 is selection of container 1, event A2 is selection of container 2 and A3
is selection of container 3. Thus it can be seen that events A1, A2 and A3
are mutually exclusive and exhaustive. If we define event B as drawing a
yellow ball, then a yellow ball can be selected from container 1 or 2 or 3.
Then we say that event B is spread over events A1, A2 and A3. And if we
know that the ball drawn is yellow, then the probability that the ball was
selected from a particular container is given by
$P(A_i/B) = \dfrac{P(A_i)\,P(B/A_i)}{P(A_1)P(B/A_1) + P(A_2)P(B/A_2) + \dots + P(A_i)P(B/A_i) + \dots + P(A_n)P(B/A_n)}$
Alternatively ...
$P(A_i/B) = \dfrac{P(A_i \cap B)}{P(A_1 \cap B) + P(A_2 \cap B) + \dots + P(A_i \cap B) + \dots + P(A_n \cap B)}$
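$P(A_1) = \dfrac{1}{3}$
$P(A_3/B) = \dfrac{P(A_3 \cap B)}{P(A_1 \cap B) + P(A_2 \cap B) + P(A_3 \cap B)}$
$P(B/A_1) = \dfrac{5}{10} = \dfrac{1}{2}$
Similarly ,
$P(B/A_2) = \dfrac{8}{12} = \dfrac{2}{3}$
$P(B/A_3) = \dfrac{6}{9} = \dfrac{2}{3}$
$P(A_1 \cap B) = P(B/A_1) \cdot P(A_1) = \dfrac{1}{3} \times \dfrac{1}{2} = \dfrac{1}{6}$
$P(A_2 \cap B) = P(B/A_2) \cdot P(A_2) = \dfrac{1}{3} \times \dfrac{2}{3} = \dfrac{2}{9}$
$P(A_3 \cap B) = P(B/A_3) \cdot P(A_3) = \dfrac{1}{3} \times \dfrac{2}{3} = \dfrac{2}{9}$
$P(A_3/B) = \dfrac{\frac{2}{9}}{\frac{1}{6} + \frac{2}{9} + \frac{2}{9}}$
$= \dfrac{\frac{2}{9}}{\frac{33}{54}}$
$= \dfrac{2}{9} \times \dfrac{54}{33}$
$= \dfrac{4}{11}$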
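The container calculation can be written as a small Python function so that the same steps apply to any set of mutually exclusive and exhaustive events. The function name `posterior` is illustrative; the priors (1/3 each) and the likelihoods (5/10, 8/12, 6/9) are the values used in the working above.

```python
from fractions import Fraction

def posterior(priors, likelihoods, i):
    """P(A_i / B) by Bayes' theorem.

    priors[k]      = P(A_k)
    likelihoods[k] = P(B / A_k)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_k and B)
    return joint[i] / sum(joint)

priors = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
likelihoods = [Fraction(5, 10), Fraction(8, 12), Fraction(6, 9)]

p_a3_given_b = posterior(priors, likelihoods, 2)   # index 2 is container 3
print(p_a3_given_b)   # 4/11
```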
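Solution
$P(A_1/B) = \dfrac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)}$
$= \dfrac{P(A_1 \cap B)}{P(A_1 \cap B) + P(A_2 \cap B)}$
$P(A_1/B) = \dfrac{\frac{10}{100} \times \frac{95}{100}}{\frac{10}{100} \times \frac{95}{100} + \frac{90}{100} \times \frac{45}{100}}$
$= \dfrac{0.095}{0.095 + 0.405}$
$= \dfrac{0.095}{0.5}$
= 0.19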
= 3 – (1.5) 2
= 3 – 2.25 = 0.75
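Solution
X             1    2    3    4    5    6
Probability  1/6  1/6  1/6  1/6  1/6  1/6
$\sum P_i x_i = \dfrac{21}{6} = \dfrac{7}{2} = m = E(x)$
$\sum P_i x_i^2 = (1^2 \times \tfrac{1}{6}) + (2^2 \times \tfrac{1}{6}) + (3^2 \times \tfrac{1}{6}) + (4^2 \times \tfrac{1}{6}) + (5^2 \times \tfrac{1}{6}) + (6^2 \times \tfrac{1}{6})$
$= \dfrac{1}{6} + \dfrac{4}{6} + \dfrac{9}{6} + \dfrac{16}{6} + \dfrac{25}{6} + \dfrac{36}{6}$
$\sum P_i x_i^2 = \dfrac{91}{6}$
= 15.17 = E(x²)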
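The expected values for the throw of a die can be computed exactly in Python. The variance line uses V(x) = E(x²) − [E(x)]², the same pattern as the earlier variance calculation in this section; applying it to the die is an extra step added here for illustration.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                       # each face is equally likely

mean = sum(p * x for x in faces)         # E(x)
mean_sq = sum(p * x * x for x in faces)  # E(x^2)
variance = mean_sq - mean ** 2           # V(x) = E(x^2) - [E(x)]^2

print(mean, mean_sq, variance)   # 7/2 91/6 35/12
```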
Probability (P) 1 3 3 1 1
8 8 8 8
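Solution
Total Tickets = 8
n (S) = 8C1 = 8
n (A) = 3C1 = 3
n (B) = 5C1 = 5
$P(A) = \dfrac{{}^{3}C_{1}}{{}^{8}C_{1}}$, $P(B) = \dfrac{{}^{5}C_{1}}{{}^{8}C_{1}}$
$P(A) = \dfrac{3}{8}$, $P(B) = \dfrac{5}{8}$
$\sum P_i X_i = (5 \times \tfrac{3}{8}) + (2 \times \tfrac{5}{8})$
$= \dfrac{15}{8} + \dfrac{10}{8} = \dfrac{25}{8}$
$\sum P_i X_i$ = 3.125 = m = E (x)
$\sum P_i x_i = (10 \times \tfrac{3}{28}) + (4 \times \tfrac{10}{28}) + (7 \times \tfrac{15}{28})$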
Solution
             A                      B
Defective    P (A) = 9/100          P (B) = 5/100
Good         P (A') = 91/100        P (B') = 95/100
So the event that a part is not defective will be A' and B' respectively,
i.e. not defective overall is $A' \cap B'$.
Therefore $P(A' \cap B') = \dfrac{91}{100} \times \dfrac{95}{100}$
$= \dfrac{8645}{10000}$
= 0.8645
$= 4 \left(\dfrac{1}{2}\right)^{3} \left(\dfrac{1}{2}\right)^{1}$
$= \dfrac{4}{16} = \dfrac{1}{4}$
6. Sum of all probabilities is 1, i.e. f (0) + f (1) + f (2) + f (3) + f (4) = 1, i.e.
$\sum f(x) = 1$
7. Mean or expected value of x, E (x), of a binomial experiment is
m = np.
In this case the expected value of heads E (x) = m = $4 \times \dfrac{1}{2}$ = 2
8. Variance of x, V(x) = npq. In this case V (x) = $4 \times \dfrac{1}{2} \times \dfrac{1}{2}$ = 1
10. The most likely value mode of x is given by the largest integer
less than or equal to (n + 1) p; if m = (n + 1)p is itself an integer,
then m – 1 and m are both modes.
11. Sums of binomials
If x ~ B (n, p) and y ~ B (m, p) are independent binomial vari-
ables, then z = x + y is again a binomial variable then its distri-
bution is z ~ B (m + n, p)
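The properties listed above can be checked for the coin example (n = 4 tosses, p = 1/2) with a short Python sketch. The function name `binomial_pmf` is illustrative; exact fractions are used so that the sums come out exactly.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 2)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                               # property 6
mean = sum(x * f for x, f in enumerate(pmf))                   # property 7: np
variance = sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2   # property 8: npq
mode = max(range(n + 1), key=lambda x: pmf[x])                 # property 10

print(pmf[3], total, mean, variance, mode)
```

Here (n + 1)p = 2.5 is not an integer, so the single mode is the largest integer below it, namely 2.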
The horizontal axis is the index x. The function is only non-zero
at integer values of x. The connecting lines are only guides for the eye
and do not indicate continuity.
$f(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
This is called the probability density function (PDF), where μ is
the mean, σ is the standard deviation, π is the constant
3.14159, and e is the base of natural logarithms, equal to
2.718282. x can take on any value from – infinity to + infinity.
9. Total probability of the whole area under the curve = 1.
10. Mean or expected value of x, E (x), of the normal distribution is μ.
11. Variance of x, V (x) = σ².
[Figure: normal curves marking the ranges μ – σ to μ + σ and μ – 2σ to μ + 2σ]
68.27% of the area under the curve is within one standard deviation
of the mean. (1σ range i.e. μ ± σ)
95.45% of the area is within two standard deviations. (2σ
range i.e. μ ± 2σ)
99.73% of the area is within three standard deviations. (3σ
range i.e. μ ± 3σ)
99.99% of the area is within four standard deviations. (4σ
range i.e. μ ± 4σ)
99.9999% of the area is within five standard deviations.
(5σ range i.e. μ ± 5σ)
99.999999% of the area is within six standard deviations.
(6σ range i.e. μ ± 6σ)
99.999999999% of the area is within seven standard deviations.
(7σ range i.e. μ ± 7σ)
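The areas quoted above can be recomputed from the error function in Python's standard library: for a normal variable, the area within k standard deviations of the mean is erf(k/√2). The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 4):
    print(k, round(100 * area_within(k), 2))   # 68.27, 95.45, 99.73
```

The result does not depend on μ or σ, which is why the same percentages apply to every normal distribution.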
If X ~ N (μ, σ²), then
$Z = \dfrac{X - \mu}{\sigma}$
4.12 Summary
4.2
1) 1 and 0
2) Experiment
3) Head and tail
4) Sample space
5) Dependent event
1) 1
2) 1
3) $\dfrac{n!}{r!\,(n - r)!}$
4) 1
5) n
4.10
1) Normal distribution
2) Mean = Median = Mode
3) Probability density function
4) One
5) Normal
1) Short Notes –
a) Normal Distribution
b) Probability
2) What is Poisson distribution?
3) Write properties of Poisson distribution?
4) Explain the term, ‘Expected value’
5) Where can Bayes' Theorem be applied?
INDEX NUMBERS
5.0 Objectives
5.1 Introduction
5.2 Price and Quantity Relatives
5.3 Price and Quantity Index, Numbers
5.4 Laspeyre’s, Paasche’s Index Numbers
5.5 Advantages & Disadvantages
5.6 Illustration
5.7 Various Index Numbers
5.8 Consumer price index
5.8.1 Calculating a consumer Price Index
5.9 Summary
5.10 Check your Progress – Answers
5.11 Questions for Self - Study
5.0 Objectives
5.1 Introduction
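If we let P0 be the price in the base period and let PN be the price
in the later period, then the price relative for the price change between
these periods is given by $\dfrac{P_N}{P_O} \times 100$.
Price Relative is given by:
$\dfrac{\text{Price of one commodity in the current year}}{\text{Price of the same commodity in the base year}} \times 100$
$= \dfrac{P_N}{P_O} \times 100$
Quantity Relative
$= \dfrac{Q_N}{Q_O} \times 100$
The Laspeyre's price index is given by $\dfrac{\sum P_n Q_o}{\sum P_o Q_o} \times 100$.
$\text{Laspeyre's Price Index} = \dfrac{\sum \left(\dfrac{P_N}{P_O} \times 100\right) P_O Q_O}{\sum P_O Q_O}$
$= \dfrac{\sum P_N Q_O}{\sum P_O Q_O} \times 100$
The end weighted or Paasche's price index $= \dfrac{\sum P_n Q_n}{\sum P_O Q_N} \times 100$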
The base weighted index has the advantage that we only have to
work out the base year expenditures once. We can then use these in the
calculation of the index in any subsequent period. However, this index
can be misleading in telling us what is actually going on. For example,
the fluctuations in fashion might have a considerable impact on an index.
Suppose that skirts were considered as a separate item in a women’s
clothing manufacturer’s index. The greatly increased relative popularity
of trousers would dramatically affect the quantities sold and any index
which used base year quantities from some time back would be mislead-
ing. The next index that we consider avoids this particular problem.
Commodity  PO  PN   QO    QN   POQO  PNQO  POQN   PNQN   POW   PNW    W
Wheat       5   9  500  1200   2500  4500  6000  10800  3000  5400  600
Rice        4  10  600  1800   2400  6000  7200  18000  3200  8000  800
Cereals     7  14  400   900   2800  5600  6300  12600  2100  4200  300
Commodity Price Rs/Kg Quantity (tons) (PN /PO) 100 (QN/QO) 100
W is any standard quantity other than the base year and current year
quantities.
$\sum P_O$ = 16
$\sum P_N$ = 33
$\sum Q_O$ = 1500
$\sum Q_N$ = 3900
$\sum P_O Q_O$ = 7700
$\sum P_O Q_N$ = 19500
$\sum P_N Q_O$ = 16100
$\sum P_N Q_N$ = 41400
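$\dfrac{\sum \left(\dfrac{P_N}{P_O} \times 100\right)}{\text{Number of Commodities}} = \dfrac{630}{3} = 210$
$\text{Walsh index } P_W = \dfrac{\sum p_n \sqrt{q_o q_n}}{\sum p_o \sqrt{q_o q_n}} \times 100$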
100 10437
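The base weighted and end weighted indices for the wheat, rice and cereals data can be recomputed with the Python sketch below. Reading the first four figures of each row as base price, current price, base quantity and current quantity is an assumption, although it reproduces the column totals listed above. The Fisher index is taken as the geometric mean of the Laspeyres and Paasche indices.

```python
import math

# commodity: (P0, PN, Q0, QN), read from the table (assumed column order)
data = {
    "Wheat":   (5, 9, 500, 1200),
    "Rice":    (4, 10, 600, 1800),
    "Cereals": (7, 14, 400, 900),
}

sum_p0q0 = sum(p0 * q0 for p0, pn, q0, qn in data.values())
sum_pnq0 = sum(pn * q0 for p0, pn, q0, qn in data.values())
sum_p0qn = sum(p0 * qn for p0, pn, q0, qn in data.values())
sum_pnqn = sum(pn * qn for p0, pn, q0, qn in data.values())

laspeyres = sum_pnq0 / sum_p0q0 * 100      # base year quantities as weights
paasche = sum_pnqn / sum_p0qn * 100        # current year quantities as weights
fisher = math.sqrt(laspeyres * paasche)    # geometric mean of the two

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```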
Over the period 1996-2003 there has been a 10% rise in the general
price level. But this hides major changes in average prices for different
products. The average cost of purchasing tobacco products has jumped
by nearly sixty per cent, whereas the prices of clothing, second-hand
cars and communication have been falling.
Price 4 15 6 20
Wheat 3 40 5 35
Jawar 5 20 5 25
Pulses 6 10 8 10
A 8 50 10 60
B 10 40 12 50
C 5 100 9 70
D 6 10 8 20
5.9 Summary
5.7
1. Economic Barometers 2. Base year
3. Base year 4. Price in current year
5. $\dfrac{\sum P_N Q_o}{\sum P_o Q_o} \times 100$ 6. $\dfrac{\sum Q_N P_N}{\sum Q_o P_N} \times 100$
5.8
1. Laspeyre’s Index Number = 138.2
2. Paasche’s Index Number = 135.135
3. Fisher’s Index Number = 136.67.
Index Numbers / 167
A M M A M M E M O A
E E M A O E M A M A
M A O A M E E M A M
5. Profit after tax for a company for the last six years is as given
below. Draw a bar diagram.
0 – 5.0 1
5.0 – 10.0 4
10.0 – 15.0 10
15.0 - 20.0 20
20.0 – 25.0 50
25.0 – 30.0 80
30.0 – 35.0 60
35.0 – 40.0 65
40.0 – 45.0 30
9. Find the median, lower and upper quartiles, 4th decile and 70th
percentile for the following distribution.
Dividend Yield 0-4 4-8 8-12 12-14 14-18 18-20 20-25 25 above
Number of 10 12 18 7 5 8 4 6
Companies
14. If four coins are tossed once, write down the sample space.
15. If three units are tested, each unit will be either Good (G) or
defective (D). Write down the sample space for testing of 3 units.
16. A box contains 200 bulbs of which 20 are defective. If one bulb
is selected at random, find the probability that it is non-defective.
(Ans 0.9)
19. What is the probability that the series which ends when a team
wins 4 games will last 4 games? 5 games? 6 games? 7 games?
Assume that the teams are evenly matched.
Mean 10 90
S. D. 3 12
X 43 44 46 40 44 42 45 42 38 40 472 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
Height of Husband x 60 62 64 66 68 70 72
(in inches)
X 1 2 3 4 5 6 7 8 9 10
Y 20 16 14 10 10 9 8 7 6 5