
WHAT IS STATISTICS?

1.0 Objectives

1.1 Introduction

1.2 Statistical Modeling

1.3 Probability

1.4 Common statistical Terminology

1.5 Population

1.6 Probable errors in statistics

1.7 Variables

1.8 Statistical Measures (Tools)

1.8.1 Central Tendency

1.8.2 Measures of Dispersion

1.9 Distribution

1.10 Expectations

1.11 Association

1.12 Summary

1.13 Check your Progress - Answers

1.14 Questions for Self - Study

1.0 Objectives

After studying this chapter, students can explain the following:

Concepts of statistics

Statistical modelling and statistics for business decision making

What probability is and its use in business for making proper decisions

Common statistical terminology and tools. Students can solve problems of proper decision making in business involving numerical data.

What is Statistics? / 1

1.1 Introduction

In our day-to-day life we deal a lot with statistics, maybe without being aware of it. For example, when you state your graduation marks, you are making use of the ‘averaging’ concept of statistics. When you talk of the odds in favour of India winning a cricket match, you are dealing with ‘probability’. When you talk of the best-selling product, you speak of the ‘modal value’. Weather forecasts are also based on statistical analysis of weather data collected with the help of satellites.

Statistics is concerned with the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities. Statistics is also used for making informed decisions.

The word ‘statistics’ is used in two senses. In the singular sense it describes a discipline, a subject, e.g. BBA semester I has Statistics as one subject. In the plural sense it denotes the results obtained from the data, e.g. "What are the export statistics?" Here you are interested in statistical data related to exports.

for Decision Making.

1. Descriptive Statistics

2. Inferential Statistics

Descriptive statistics tells you what kind of data you are dealing with, e.g. what is the average strength of the cement block? What is the variability of that strength? We know that the cement blocks that are tested may not all give exactly the same value. We need to know how much the test values differ from each other. The idea of this variation is given by variability measures. All these measures are discussed at length in the following chapters. Thus descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic numerical descriptors include the mean and the standard deviation; graphical summaries include various kinds of charts and graphs.

we decide whether the cement blocks have the required strength from

Business Statistics / 2

the given test results? Yes or no? This type of statistics is mainly useful in hypothesis testing. You will study the introductory concepts of hypothesis testing in this book. Thus inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. The inferences are drawn through statistical modelling. Statistical models give us a relationship between the different variables and observations under study. Modelling can be used to draw inferences, which may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or modelling of relationships (regression). Other modelling techniques include ANOVA (Analysis of Variance), time series and data analysis.

Statistical methods can be used to describe a collection of data. In addition, patterns in the data may be modelled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied. Descriptive and inferential statistics together comprise Business Statistics or Applied Statistics.

Mathematical statistics is concerned with the theoretical basis of the subject.

1.3 Probability

Probability is of two kinds: mathematical probability and subjective probability. For example, what are the chances of India winning a cricket match? A person might say that the chances are 50%. Is that the same as probability? If his judgement is based on past statistical data on India's winning and losing records, and the chances are calculated according to the rules of probability, then it is a mathematical probability (the probability of India winning the match is 0.5). Otherwise it is a subjective probability, which is based on what one feels.

The probability of an event is 1 if the event comprises all possible outcomes of an experiment. For example, the probability that a toss of a fair coin will yield head or tail is 1.

The probability of an event is 0 if the event contains none of the possible outcomes of an experiment. For example, the probability that a throw of a fair die will yield 7 is 0.

The probability of an event may change when additional information becomes available. The changed probability is called the posterior probability, and the probability in the absence of such information is called the prior (a priori) probability.


An interesting observation about probability is that even if the probability of an event is very low, say 0.01, its occurrence cannot be ruled out, and this is true for every trial. If the event actually occurs in the majority of trials, the estimated probability would have to be revised upward.

Probability

It is the theory of chance when taken as a science.

It is the chance of an event happening when considered in connection with an event. The probability of any event lies between 0 and 1, both included.

Mathematical or Objective Probability

Probability theory which is based on statistical data and probability axioms is called mathematical probability.

Axioms of probability

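There are three axioms of probability: (1) Chances are always at least zero. (2) The maximum chance that something happens is 100%. (3) If two events cannot both happen at the same time, the chance that either one happens is the sum of their individual chances.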

Subjective probability

Probability theory which is based on the feelings or thinking of a person is called subjective probability.

Conditional probability

It is the probability of an event calculated on the assumption that some related event has happened.

Experiment

An action whose outcomes are of interest to us is called an experiment, e.g. tossing of a coin.

Event and Happening of an event

An event is a set of one or more outcomes of an experiment. An event is said to have happened if one of its outcomes is the result of the experiment, e.g. in the experiment of tossing a coin there are two outcomes, head and tail. Two events A and B can be defined as follows:

Event A : Head shows up in the experiment of tossing of a coin.

Event B : Tail shows up in the experiment of tossing of a coin.

Now if head shows up, then we can say that event A has happened.

The probability of an event A is denoted by P(A).


Sample Space

The set of all possible outcomes of an experiment is called the sample space.

Dependent events

If the happening of one event changes the probability of another event, then those events are said to be dependent events.

Independent events

If the happening of one event does not change the probability of another event, then those events are said to be independent events.

Mutually Exclusive Events

Two or more events which cannot happen at the same point of time are called mutually exclusive events.

Exhaustive Events

If two or more events cover the entire sample space, i.e. if two or more events cover all possible outcomes of an experiment, then such events are called exhaustive events.

Certain Event

If the probability of an event happening is 1, the event is a certain event.

Impossible event

If the probability of an event happening is 0, the event is an impossible event.

Complement of an event

The complement of an event means that the event does not happen, i.e. if event A is getting 1 in a throw of a die, then the complement of A is not getting 1 in a throw of a die. The complement of an event is denoted by $A^C$, $A'$ or $\bar{A}$, and $P(A^C) = 1 - P(A)$.
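The definitions above can be checked on a small example. The following Python sketch models one throw of a fair die; the helper `prob` and the events `B` and `C` are chosen here purely for illustration and do not come from the text.

```python
from fractions import Fraction

# Sample space for one throw of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1}                            # event A: getting 1
A_complement = sample_space - A    # event "not getting 1"
B = {2, 4, 6}                      # illustrative event: an even number
C = {1, 2}                         # illustrative event: a number below 3

print(prob(sample_space))                  # certain event: 1
print(prob({7}))                           # impossible event: 0
print(prob(A_complement) == 1 - prob(A))   # complement rule holds: True
print(A & B == set())                      # A and B are mutually exclusive: True
# B and C are independent because P(B and C) equals P(B) * P(C).
print(prob(B & C) == prob(B) * prob(C))    # True
```

Exact fractions are used instead of floating-point numbers so that the equalities can be compared without rounding error.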

1.3 Check your Progress

1. What is statistics?

________________________________________________________

________________________________________________________

2. What is meant by descriptive statistics?

________________________________________________________

________________________________________________________


3. Short Notes:-

i. Probability:

________________________________________________________

________________________________________________________

ii. Descriptive Statistics:

________________________________________________________

________________________________________________________

iii. Inferential Statistics:

________________________________________________________

________________________________________________________

Data or Data Set

Data is a set of measurements of some qualitative aspect or

quantitative aspect. If we record earnings of five persons in a

city, say 100, 200, 200, 500, 1000 (in Rs.) then those figures

will be our data set or data.

Unit

Unit is an individual about which data is to be collected, e.g. Person.

Observation

Observation is individual measurement in a data set, e.g.

Rs. 700.

Quantitative Data

Quantitative Data is data which has numerical value e.g. the

above data. Another example is marks obtained by students in

a class.

Qualitative Data

Qualitative Data is data which does not have numerical value. It

is data which is of descriptive nature, e.g. colour of eyes.

Table of Dividend Paid by 50 Companies

Dividend (%)    No. of Companies
0 - 6           8
6 - 12          10
12 - 18         15
18 - 24         12
24 - 30         5

Table 1.4.1


Frequency

The number of times an observation repeats is called the frequency of that observation. In our example the frequency of the class 6-12 is 10, which means that 10 observations (companies) fall in that class.

Class

A class is a group of observations whose values fall in a specified range, e.g. if we make classes of marks obtained by students as 11-20, 21-30 etc., then the number of students scoring marks from 11 to 20 will be recorded in class 11-20, and so on.

Class Boundary. (Class Limits)

The extreme values which define which observations are included in a class are called class boundaries.

Upper class boundary and Lower Class boundary

The uppermost and lowermost values of a class are called the upper class boundary and lower class boundary respectively.

In the above example (Table 1.4.1) consider the class 12-18:

12 - Lower limit (lower boundary)

18 - Upper limit (upper boundary)

If dividends of both 12% and 18% are included in this class, i.e. the 15 companies pay dividends from 12% to 18%, the method is called the inclusive method. In the exclusive method, the lower limit is included and the upper limit is not included in that particular class, e.g.

Class      Frequency
0-10       12
10-20      15
20-30      6
30-40      9

Table 1.4.2

Because the method here is exclusive, the value 30 is included in the class 30-40, where it is the lower limit, and excluded from the previous class 20-30.


Class Interval

The difference between the upper class boundary and the lower class boundary is called the class interval. For the class 0-10 in Table 1.4.2 the class interval is 10 - 0 = 10.

Class Marks

The mid point of a class is known as the class mark. It is the average of the upper class boundary and the lower class boundary, e.g. 25 is the class mark for the class 20-30: $\frac{20 + 30}{2} = \frac{50}{2} = 25$.

Class Frequency

Class frequency is the number of observations in the class.
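The terms class, class interval, class mark and class frequency can be tied together in a short Python sketch. It assumes the exclusive method (lower limit included, upper limit excluded); the function name `frequency_table` and the list of marks are hypothetical and used only for illustration.

```python
def frequency_table(values, lower, width, n_classes):
    """Group values into classes [lower, lower + width), ... using the exclusive method."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        # Lower limit is included, upper limit is excluded.
        freq = sum(1 for v in values if lo <= v < hi)
        rows.append({
            "class": (lo, hi),
            "interval": hi - lo,        # upper boundary minus lower boundary
            "mark": (lo + hi) / 2,      # mid point of the class
            "frequency": freq,          # number of observations in the class
        })
    return rows

# Hypothetical marks of ten students (not taken from the text).
marks = [5, 10, 12, 19, 20, 25, 29, 30, 31, 38]
table = frequency_table(marks, lower=0, width=10, n_classes=4)
for row in table:
    print(row)
```

Note that the mark 30 is counted in the class 30-40 and not in 20-30, which is exactly the behaviour of the exclusive method.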

2) Difference between upper class boundary and lower class boundary is called as _______________

3) The uppermost and lowermost values of class are called as

_______________ and _______________.

4) Number of times the observation repeats is called as _______________.

5) Class is a group of the observations whose value fall in the

_______________.

6) Colour of eyes is a _______________.

7) Marks obtained by students in B.B.A Class is _______________

variable.

8) Probability is a theory of _______________.

9) ANOVA is _______________.

10) Modeling can be used to draw _______________.

The entire group about which inferences are to be drawn is called the population or universe, e.g. if some conclusions are to be drawn about the students in a particular college, all students of that college will comprise the population.

Sample

The part of the population selected for the purpose of the study is called the sample. In the above case it will be difficult to interview all the students of the college, as the total number of students could be in the thousands. In such a case one would select a few students for interview. The students selected for interview comprise the sample.

Sample Size

The number of elements in a sample is called as sample size.

Sample Survey

A survey based on the responses of a sample of individuals,

rather than the entire population.

Cluster Sample

In a cluster sample, the entire population is divided into heterogeneous groups and some of these groups are selected as the sample. A sample in which the groups, such as blocks, are chosen on a geographical basis is an example of cluster sampling. If the blocks are chosen separately from different strata, the overall design is a stratified cluster sample.

Convenience Sample

A sample drawn because of its convenience is not a probability sample, e.g. a sample of people having telephone numbers in Pune city, used to draw conclusions about the population of the city, is a convenience sample. It is selected because it would be easier to interview people over the phone than to visit their homes. Samples of convenience are typically not representative of the population, and it is not possible to quantify how unrepresentative results based on samples of convenience will be.

Random Sample

A random sample is a sample whose members are chosen at random from a given population in such a way that the chance of obtaining any particular sample can be computed for that population.

Simple Random sample or probability sample

A simple random sample is a sample selected from a population in which every individual has an equal chance of being selected. A simple random sample can be drawn in two ways: SRSWR (Simple Random Sample with Replacement) and SRSWOR (Simple Random Sample without Replacement). In SRSWR a unit once selected is put back into the population, so it can be selected again. In SRSWOR a unit once selected is not put back into the population, so it cannot be selected again. If we want to draw a sample of size 2 from the numbers 4, 5, 6, then with SRSWR we can have the following 9 samples: (4,4), (4,5), (4,6), (5,4), (5,5), (5,6), (6,4), (6,5), (6,6), while with SRSWOR we can have only 3 samples: (4,5), (4,6), (5,6). Thus if the sample size is n, i.e. if n units are to be drawn from a population of N units, then the total number of samples that can be drawn by the SRSWR method is $N^n$, while with the SRSWOR method we can draw $^{N}C_{n}$ samples.
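The two counts can be verified by listing the samples directly. The sketch below uses Python's standard `itertools` module; it follows the convention of the example above, where samples with replacement are ordered and samples without replacement are unordered.

```python
from itertools import combinations, product
from math import comb

population = [4, 5, 6]
n = 2  # sample size

# SRSWR: a unit is put back after each draw, so repeats such as (4, 4) are possible.
with_replacement = list(product(population, repeat=n))
# SRSWOR: a unit cannot be drawn twice, and the order of the units is ignored.
without_replacement = list(combinations(population, n))

print(with_replacement)
print(without_replacement)
print(len(with_replacement) == len(population) ** n)           # N^n samples
print(len(without_replacement) == comb(len(population), n))    # NCn samples
```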

Stratified Sample

In random sampling, the sample is sometimes drawn separately from different disjoint homogeneous (having the same properties) subsets of a population which itself is heterogeneous (having different properties), i.e. the population is divided into a number of groups. Each such group is called a stratum. The plural of stratum is strata. Samples are drawn separately from each such group. A sample drawn in this way is called a stratified sample.

For example, to determine the buying habits of persons in society, one needs to divide the population of the city into various income groups, because buying habits differ according to income. Thus a heterogeneous population, that is, a population having dissimilar incomes, is divided into a number of homogeneous groups or strata having similar incomes.

Systematic Sample

A systematic sample from a frame of units is one drawn by listing the units and selecting every individual after a fixed interval. For example, if there are 100 units in the population and a sample of 10 is to be drawn, then every 10th unit is selected. These are not necessarily the first unit, the eleventh unit, the 21st unit and so on: the first unit is usually selected by a random number and then every 10th unit after it is selected. Systematic samples are not random samples, but they often behave essentially as if they were random, if the order in which the units appear in the list is haphazard. Systematic samples are a special case of cluster samples. Systematic sampling is not as good as simple random sampling. When the starting unit is decided by judgement rather than by a random number, the sample is called a systematic sample and not a systematic random sample.
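A systematic random sample as described above can be sketched in a few lines of Python. The function name `systematic_sample` is hypothetical, and the sketch assumes that the population size is a multiple of the sample size, as in the example of 100 units and a sample of 10.

```python
import random

def systematic_sample(units, sample_size, rng=random):
    """Pick a random start within the first interval, then take every k-th unit."""
    k = len(units) // sample_size      # fixed sampling interval
    start = rng.randrange(k)           # random starting position: 0 .. k - 1
    return units[start::k][:sample_size]

units = list(range(1, 101))            # units numbered 1 to 100
sample = systematic_sample(units, 10, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` instance only makes the illustration repeatable; replacing the random start with a start chosen by judgement would give a systematic sample that is not a systematic random sample.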

Quota Sample

Quota sampling is a method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men and 20 adult women, 10 teenage girls and 10 teenage boys, so that they could be interviewed about their television viewing.

Sampling Error

Sampling errors are the errors in sample selection which can lead to incorrect results. Sampling errors are broadly classified as random errors and errors due to bias, or systematic errors.

Random Error

All measurements are subject to error, which can often be broken down into two components: a bias or systematic error, which affects all measurements the same way; and a random error, which is in general different each time a measurement is made, and behaves like a number drawn with replacement from a box of numbered tickets whose average is zero.

Systematic error

An error that affects all the measurements similarly. For example, if a ruler is too short, everything measured with it will appear to be longer than it really is (ignoring random error). If your watch runs fast, every time interval you measure with it will appear to be longer than it really is (again, ignoring random error). Systematic errors do not tend to average out. Systematic errors can also originate from incorrect sampling procedures.

The standard error (SE) of a random variable is a measure of how far it is likely to be from its expected value; that is, its scatter in repeated experiments. The SE of a random variable X is defined as $SE(X) = \sqrt{E[(X - E(X))^2]}$. That is, the standard error is the standard deviation of the errors.

1.7 Variable or Variate

A letter which can take the values of all observations, e.g. if the variable x represents the marks of three students who have scored 40, 50 and 60 marks, then $x_1 = 40$, $x_2 = 50$, $x_3 = 60$.

Categorical Variable

A variable whose value ranges over categories, such as male,

female. Some categorical variables are ordinal.

Continuous Variable

A quantitative variable which can take all values in its range is called a continuous variable. Its set of possible values is an infinite set. In practice, one can never measure a continuous variable to infinite precision, so continuous variables are sometimes approximated by discrete variables. A random variable X is also called continuous if its set of possible values is uncountable and the chance that it takes any particular value is zero. A random variable is continuous if and only if its cumulative probability distribution function is a continuous function (a function whose graph does not show any break).

Discrete Variable

A quantitative variable which cannot take all values in its range is called a discrete variable. Its set of possible values is finite or countable. A discrete random variable is one whose set of possible values is countable. A random variable is discrete if and only if the graph of its cumulative probability distribution is a step function, rising only in breaks (jumps).

Ordinal Variable

A variable whose possible values can be arranged in some order, such as short, medium, long. In contrast, a variable whose possible values are India, China, USA is not an ordinal variable. Arithmetic with the possible values of an ordinal variable does not necessarily make sense, but it does make sense to say that one possible value is larger than another.

E.g. 1) The values 5, 4, 2, 3 can be arranged in order as 2, 3, 4, 5.

2) Good, Better, Best.

Random Variable

A random variable denotes the possible outcomes of a random experiment, e.g. when a coin is tossed, the outcome (H or T) is a random variable.

Random Experiment

A random experiment is one whose outcome cannot be predicted with certainty before it is performed, e.g. a throw of a fair die has the outcomes 1, 2, 3, 4, 5, 6. Since all of these outcomes have an equal chance of appearing and none can be predicted in advance, a throw of a fair die is a random experiment, and if x denotes the outcomes 1, 2, 3, 4, 5, 6 then x is a random variable.

Bias

When the measurements are affected by the judgement of the data collector or data analyst rather than by standard statistical procedures, bias is said to be introduced. A biased estimate gives a value which is different from the truth. The numerical value of the bias is the average difference between the measured value and the actual value which would have been obtained without bias. An unbiased or random selection procedure is free of bias.

Dependent Variable

When the value of the first variable is governed by the value of the second variable, the first variable is a dependent variable, e.g. x = rank in examination is governed by y = number of marks in examination.

Independent Variable

When the value of a variable is not governed by the value of any other variable, such a variable is called an independent variable, e.g. x = height of a student, y = marks of the student.

value.

Measures of central tendency are numerical values representative of the data. These are the mean, mode and median.

Arithmetic Mean

It is given by the sum of all observations divided by the total number of observations. Consider the observations 10, 15, 20, 30, 35.

Arithmetic mean = [10 + 15 + 20 + 30 + 35] / 5 = 110 / 5 = 22

Geometric Mean

It is given by the nth root of the product of all observations, where n is the total number of observations. The geometric mean of 2, 2, 8, 8 is $\sqrt[4]{2 \cdot 2 \cdot 8 \cdot 8} = \sqrt[4]{256} = 4$.

Harmonic Mean

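It is the reciprocal of the average of the reciprocals of all observations.

The harmonic mean of 2, 2, 2, 8 is calculated as

Step I : $\sum \frac{1}{x} = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{8} = \frac{13}{8}$

Step II : $\frac{n}{\sum \frac{1}{x}} = \frac{4}{13/8} = \frac{4 \times 8}{13} = \frac{32}{13} \approx 2.46$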

Median

The observation that occupies the middle place when the data is arranged in increasing order is called the median. The median of 10, 15, 20, 30, 35 is 20.

Mode

The mode is the most frequently occurring observation. A data set can have more than one mode. The mode of 100, 200, 200, 500, 600 is 200.
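The measures of central tendency defined above can be computed with Python's standard `statistics` module. This is only a quick check of the worked examples, assuming Python 3.8 or later (needed for `geometric_mean` and `multimode`); the variable names are chosen here for illustration.

```python
import statistics

arithmetic = statistics.mean([10, 15, 20, 30, 35])        # sum divided by count
geometric = statistics.geometric_mean([2, 2, 8, 8])       # 4th root of the product
harmonic = statistics.harmonic_mean([2, 2, 2, 8])         # n / sum of reciprocals
median = statistics.median([10, 15, 20, 30, 35])          # middle value when sorted
modes = statistics.multimode([100, 200, 200, 500, 600])   # list of all modes

print(arithmetic)           # 22
print(round(geometric, 6))  # 4.0
print(round(harmonic, 2))   # 2.46, i.e. 32/13
print(median)               # 20
print(modes)                # [200]
```

`multimode` is used instead of `mode` because it returns every most frequent value, which matches the remark that a data set can have more than one mode.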

1.8.2 Dispersion

Dispersion gives idea of spread of data from the central value say

mean.

Deviation

Deviation is the difference between an observation and some reference value. The observation is usually represented by X.

The deviation of X from some value A is X - A.

The deviation from the mean is $X - \bar{X}$.

Absolute deviation

When a deviation is always taken as positive irrespective of its sign, it is called the absolute deviation. It is represented as $|X - \bar{X}|$.


Mean Deviation

It is the sum of the absolute deviations from the mean divided by the total number of observations. See chapter 2 for examples.

Standard Deviation

It is the square root of the sum of the squared deviations from the mean divided by the total number of observations. See chapter 2 for examples.

Variance

Variance is the square of the standard deviation. See chapter 2 for examples.

Quartile deviation

It is the difference between the third quartile and the first quartile divided by 2. It is also called the semi-interquartile range. See chapter 2 for examples.

Inter Quartile range

It is the difference between the third quartile and the first quartile. See chapter 2 for examples.

Range

Range is the difference between the largest value and the smallest value of the data set. See chapter 2 for examples.
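As a minimal sketch, the dispersion measures above can be computed for the observations 10, 15, 20, 30, 35 with Python's standard `statistics` module. The population forms (`pvariance`, `pstdev`) are used because the definitions above divide by the total number of observations. Quartile conventions differ between textbooks and libraries, so the quartiles produced here are an assumption of this sketch and may not match the method used in chapter 2.

```python
import statistics

data = [10, 15, 20, 30, 35]
mean = statistics.mean(data)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
# Variance: sum of squared deviations from the mean divided by n.
variance = statistics.pvariance(data)
# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)
# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles from the library's default method (one of several conventions).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                     # inter quartile range
quartile_deviation = iqr / 2      # semi-interquartile range

print(mean_deviation, variance, round(std_dev, 3), data_range)
print(iqr, quartile_deviation)
```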

1) A quantitative variable which can take all values in its range is

called as _______________.

2) A quantitative variable which cannot take all values in its range

is called as _______________.

3) A biased estimate gives the value which is different from the

_______________.

4) Measures of Central Tendency are _______________.

5) _______________ is the most frequently occurring observation.

6) Variance is square of _______________.

7) _______________ is the square root of variance.

8) _______________ is the difference between the third quartile

and first quartile.

9) _______________ is the difference between the largest value

and the smallest value of the data.

10) Number of times experiment is repeated is called _______________.


1.9 Distribution

the population. It can be represented by a generalized frequency curve. The distribution can also be represented by a mathematical relationship known as the distribution function.

Trials

The number of times an experiment is repeated is called the number of trials.

Binomial Distribution

A random variable has a binomial distribution if it denotes the number of successes of a particular event in n trials, where p is the probability of success in each trial. The probability of success remains the same for all trials. The binomial distribution has two parameters (n, p). It is a discrete distribution, e.g. the number of heads obtained in tossing a fair coin n times. A variable following a binomial distribution is called a binomial variable or binomial variate. See chapter 4 for more details.
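For illustration, the probability of exactly k successes in n trials can be computed from the standard binomial formula $P(X = k) = {}^{n}C_{k}\, p^k (1 - p)^{n - k}$. The formula is not stated in this section and is assumed here; the function name `binomial_pmf` and the choice of 10 tosses are likewise illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5     # ten tosses of a fair coin (illustrative values)
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))   # chance of exactly 5 heads in 10 tosses
print(sum(probs))              # probabilities over k = 0 .. n add up to 1
```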

Poisson Distribution

A random variable has a Poisson distribution if it denotes the number of successes of a particular event when x units are picked from a population and m is the mean number of successes, e.g. finding the probability that a sample of 10 units contains 2 defectives if the probability of finding a defective is 0.05. The Poisson distribution has only one parameter, m. It is a discrete distribution. It is usually applied where the probability of success is quite low, e.g. the number of accidents, the number of defective products, etc. A variable following a Poisson distribution is called a Poisson variable or Poisson variate. See chapter 4 for more details.
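The defectives example can be worked through as a sketch. It assumes the standard Poisson formula $P(X = k) = e^{-m} m^k / k!$ and takes the mean as m = 10 × 0.05 = 0.5; neither the formula nor this choice of m is stated in this section. The exact binomial value is shown alongside for comparison, and the function name `poisson_pmf` is hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(k, m):
    """P(exactly k successes) for a Poisson variable with mean m."""
    return exp(-m) * m**k / factorial(k)

n, p = 10, 0.05
m = n * p                     # assumed mean number of defectives: 0.5

poisson = poisson_pmf(2, m)                          # Poisson probability of 2 defectives
binomial = comb(n, 2) * p**2 * (1 - p)**(n - 2)      # exact binomial probability

print(round(poisson, 4), round(binomial, 4))
```

The two values are close (roughly 0.076 and 0.075), which is why the Poisson distribution is commonly used when the probability of success is low.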

Normal distribution

A random variable is normally distributed if the variable is continuous and the distribution is symmetric about the mean. 50% of the observations lie below the mean and 50% of the observations lie above the mean. It has a bell-shaped continuous curve in which the two halves made by a vertical line at the mean fit exactly over each other.

In this distribution mean = mode = median. A variable following a normal distribution is called a normal variable or normal variate. See chapter 4 for more details.


Standard Normal Distribution

It is a normal distribution in which mean = mode = median = 0 and standard deviation = 1. A variable following a standard normal distribution is called a standard normal variable or standard normal variate. The standard normal variate is denoted by z.
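A normal variable is usually converted to the standard normal variate with $z = (x - \text{mean}) / \text{standard deviation}$. This conversion is not given in this section and is assumed here; the mean, standard deviation and observation below are hypothetical values. The sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from math import isclose
from statistics import NormalDist

mu, sigma = 50.0, 10.0      # hypothetical mean and standard deviation
x = 65.0                    # hypothetical observation

z = (x - mu) / sigma        # standard normal variate for x
standard_normal = NormalDist(0.0, 1.0)

print(z)                                # 1.5
print(standard_normal.cdf(0.0))         # 0.5: half of the area lies below the mean
# The area below x under N(mu, sigma) equals the area below z under N(0, 1).
print(isclose(standard_normal.cdf(z), NormalDist(mu, sigma).cdf(x)))
```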

Univariate Distribution

A distribution involving only one variable is called a univariate distribution, e.g. average marks obtained by students in an examination.

Bivariate distribution

A distribution involving two variables is called a bivariate distribution, e.g. marks obtained by students in two subjects, say economics and statistics, in an examination.

Skewed distribution

A distribution that is not symmetrical is skewed distribution.

The expected value of a random variable is the weighted average of all possible outcomes, where the weights are the probabilities of the outcomes.

For a continuous distribution the expected value is the mean value.

If X and Y are two random variables, the expected value of their sum is the sum of their expected values: $E(X + Y) = E(X) + E(Y)$. The expected value of a constant a times a random variable X is the constant times the expected value of X: $E(aX) = a\,E(X)$.
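Both properties can be verified on a small example. The sketch below uses throws of a fair die, an example chosen here for illustration, and exact fractions so that the equalities hold without rounding; the helper `expected` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)     # each face of a fair die is equally likely

def expected(values_with_probs):
    """Weighted average of outcomes, with probabilities as weights."""
    return sum(value * prob for value, prob in values_with_probs)

e_x = expected((x, p) for x in faces)                                # E(X)
e_sum = expected((x + y, p * p) for x, y in product(faces, faces))   # E(X + Y), two dice
e_3x = expected((3 * x, p) for x in faces)                           # E(3X)

print(e_x, e_sum, e_3x)        # 7/2 7 21/2
print(e_sum == e_x + e_x)      # E(X + Y) = E(X) + E(Y): True
print(e_3x == 3 * e_x)         # E(aX) = a E(X): True
```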

Hypothesis

An assumption about the outcome of statistical testing is called a hypothesis, e.g. the sample is as per the required norms according to the given parameter.

Parameter

A parameter is a numerical characteristic of the population that serves as the criterion on which a sample is accepted or rejected. The parameter could be the mean, the standard deviation etc.

Estimator

An estimator is a quantity calculated from the sample which is used to estimate the value of a population parameter. An example of an estimator is the sample mean, which is an estimator of the population mean.

Test Statistics

The value calculated from the sample data and used to carry out the test is known as the test statistic.


Null Hypothesis

It is the initial assumption about an outcome before testing. It is denoted by $H_0$, e.g. the average strength is as per the required norms, which means the population mean is the same as the sample mean. It is written as $H_0: \mu = \bar{X}$.

Alternative Hypothesis

It is the assumption adopted if the null hypothesis is rejected. It is denoted by $H_1$, e.g. the average strength is less than the required norms.

Confidence Interval

A confidence interval is a range of values, calculated from the sample, that is expected to contain the population parameter with a stated confidence, e.g. a 95% confidence interval is constructed so that about 95% of such intervals, over repeated samples, contain the parameter.

Confidence Level

Confidence level is the degree of confidence, expressed as a percentage, attached to a confidence interval or to a test of a hypothesis about a given parameter.

e.g. a hypothesis rejected at the 95% confidence level means that the observed sample would be unlikely, with a chance of 5% or less, if the hypothesis were true.

Significance Level, Critical Level

Significance level is the complement of the confidence level: the chance of rejecting the null hypothesis when it is true.

e.g. a 95% confidence level means a 5% significance level.

Critical Value

The critical value in a hypothesis test is the value of the test statistic beyond which we would reject the null hypothesis.

Type I Error

Rejecting the null hypothesis when it is true.

Type II Error

Accepting the null hypothesis when it is false.

One sided tests or one tailed tests :

A test in which we consider only one side of the distribution

e.g. greater than and less than testing.

Two sided tests or two tailed tests :

A test in which we consider both sides of the distribution, e.g. equal to and not equal to testing.


1.11 Association

Two variables are associated if variation in one variable has effect

on variation in other variables.

Correlation

It is a measure of association between variables.

Scatter Diagram or Scatter Plot

It is the graph obtained by plotting the values of two variables which describe a single bivariate observation (e.g. the height and weight of a person). One variable (the independent variable) is plotted as the X coordinate and the other variable (the dependent variable) as the Y coordinate.

Correlation coefficient

The correlation coefficient r is a measure of how nearly the points of a scatter diagram or scatter plot fall on a straight line. The correlation coefficient is always between -1 and +1.
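As a sketch, r can be computed with the usual product-moment (Pearson) formula, which is assumed here because this section does not state a formula. The function name `correlation` and the height and weight figures are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient r of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights = [150, 160, 165, 172, 180]    # cm, hypothetical data
weights = [52, 58, 63, 70, 79]         # kg, hypothetical data
r = correlation(heights, weights)

print(round(r, 3))
print(correlation([1, 2, 3], [2, 4, 6]))   # points exactly on a rising line
```

Points lying exactly on a rising straight line give r = +1, points on a falling line give r = -1, and any other scatter gives a value in between.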

Causation, Causal Relation.

Two variables are causally related if changes in the value of one cause the other to change.

1) _______________ is a measure of association between variables.

2) The correlation Coefficient is always between _______________.

3) _______________ is rejecting the null hypothesis when it is true.

4) _______________ is accepting the null hypothesis when it is

false.

5) Two variables are _______________ if variation in one variable

has effect on variation in other variable.

1.12 Summary

This chapter explains the importance of statistics to people and the scope of statistics in different fields such as medical science and business. It introduces the different types of statistical theory and the tools to use to reach proper decisions in business.


1.13 Check your Progress – Answers

1.3

All answers are Descriptive. Take proper help of your SLM and

also write your ideas in your own words.

1.4

1) Class marks

2) Class interval

3) Upper class boundary and lower class boundary

4) Frequency of that observation

5) Specified range

6) Qualitative Data

7) Quantitative Data

8) Chance

9) Analysis of Variance

10) Inferences

1.8

1) Continuous Variable

2) Discrete Variable

3) Truth

4) Mean, Mode and Median

5) Mode

6) Standard Deviation

7) Standard Deviation

8) Inter Quartile Range

9) Range

10) Trials

1.12

1) Correlation

2) – 1 and + 1


3) Type I Error

4) Type II Error

5) Association

Statistics.

2) Write a short note on, ‘Probability’.

3) What is Sampling?

4) Write types of Sampling.

5) Explain the term ‘Stratified Sampling’.

6) What are the measures of Central Tendency? Write the proper formulae.


CHAPTER 2

MEASURES OF CENTRAL

TENDENCY AND DISPERSION

2.0 Objectives

2.1 Introduction

2.2 Methods of Collection of Primary Data

2.2.1 Direct personal interview

2.2.2 Indirect personal interview

2.2.3 Mailed questionnaire

2.2.4 Scheduled through enumerations

2.3 Organizing the Data

2.3.1 Cumulative Frequency Distribution

2.3.2 Grouped Frequency Distribution

2.3.3 Guidelines for making class intervals

2.3.4 Cumulative grouped frequency distribution

2.4 Graphical Representation

2.5 Pie Chart Calculations

2.6 Frequency Curves

2.7 Cumulative Frequency

2.8 Averages

2.9 Partition Values

2.9.1 Quartiles

2.9.2 Deciles

2.9.3 Percentiles

2.10 Measures of Dispersions

2.10.1 Mean Deviations

2.10.2 Standard Deviation

2.11 The Coefficient of Variation

2.12 Skewness

2.13 Quartiles and the quartile Deviation

2.14 Extreme Values

2.15 Summary

2.16 Check your Progress – Answers

2.17 Questions for Self - Study

2.0 Objectives

and Dispersion, students can explain the following –

Methods of collections of Primary Data.

How to divide and Organize Data.

Types of charts and graphs.

Draw Graphs.

Limitations and advantages of each and every method.

Calculations.

Making decision on the basis of statistical tools in the business

implementations.

2.1 Introduction

data collection, data representation and measures of central tendency.

we want to know about the object or objects under study. For example if

we want to study performance of students in an examination, set of marks

obtained by students will be our data.

detail in this chapter. After collecting the data we need to classify it (in the above case, to understand the performance of the students), analyze it to arrive at conclusions, and finally present our findings to the concerned authorities, e.g. the teacher. Data representation usually precedes data analysis, as we come to know the various trends within the data through data representation.

refers to a measure of the middle or expected value of the data set. There are many different types of averages that can be chosen as a measurement of the central tendency of the data items. The most common is the arithmetic mean; the geometric mean, harmonic mean, etc. are also useful in applicable situations.

majority of students have scored marks near the average or far away from the average. For example, the average of 1 and -1 is 0. Similarly, the average of 100 and -100 is also 0. In the first case the data values are near the average, i.e. dispersion is less, while in the second case the data values are more scattered, or spread away from the average, i.e. dispersion is more. The most common measure of dispersion is the standard deviation.

Primary data is data collected for the first time through a census or a sample. There are several ways of collecting such data. These are:

Direct personal interview.

Indirect personal interview or observation.

Mailed questionnaire.

Scheduled through enumerators.

mation directly from the sources concerned. For example, an investiga-

tor may collect information about cost of cultivation through personal in-

teraction with the farmers who cultivate the land.

Advantages:

- The investigator can check and countercheck the information and get it in the form he desires.

- The investigator can put alternative questions suited to the edu-

cational and cultural level of the persons concerned.

- In such cases, information can be collected by eliminating the

bias and prejudices of the persons concerned.

Limitations:

sive and localized to a locality or a group. This cannot be used

when the enquiry is extensive or is to be done in large areas.

- Such an enquiry is subjective in the sense that the intelligence,

tact, skill as well as personal bias of the investigator are all

reflected in the process.

2.2.2 Indirect Personal Interview

have intimate knowledge of the phenomenon under enquiry. For example,

an investigator may collect information about cost of cultivation indirectly

from village head instead of collecting the information from farmers.

Advantages:

- As information can be collected from more knowledgeable per-

sons, these are expected to be more useful and reliable.

- As fewer persons need be contacted, the enquiry could be more

extensive than in case of direct personal enquiry.

Limitations:

personal bias of the persons from whom it is collected.

- One has to be very careful about the selection of such persons, since not only their knowledge but also their personal attributes affect the quality of the data. Great caution is called for in dealing with such a situation.

tured or semi-structured format. Respondents often choose from among

a set of forced-choice, or provided responses. These can include yes/no

or scaled responses. Questionnaires can be administered in person, by

mail, over the phone, or via email/Internet.

mail to the persons from whom information is to be collected. They, in

their turn, are expected to answer the questions and also to supply addi-

tional information and comments, where, necessary and mail them back

to the investigator. Great care is to be taken while preparing a questionnaire. Skill and experience in the subject under enquiry are needed in drafting a questionnaire. Though there are no hard and fast rules for designing a questionnaire, there are a few general points which should be borne in mind.

· Delicate questions are to be put with great care. Often indirect questions should be put to get answers to some pertinent point. It is sometimes desirable to avoid very delicate questions.

· The size of the questionnaire schedule should be as small as

possible. It saves time, both for the enumerator and the respon-

dent. A large questionnaire is likely to exhaust the patience of

the respondent.

· There should be a natural, logical order in which questions are

arranged.

It should be noted that the information collected through ques-

tions should be such that it is usable.

Advantages:

- It can be administered to large groups of individuals.

- It is much less time consuming and is economical.

- A much larger coverage can be made as people in distant places

can be reached without much difficulty.

- It is advantageous in a situation where the persons concerned

move to far away places. For example, in an enquiry relating to

old students of a college, such a method may be useful as

students move out and away after leaving the institution.

- Useful for collection of demographic information, satisfaction lev-

els and opinions of the program.

Limitations:

- The method can be adopted only in case of enlightened and

educated people.

- As persons are not approached directly, the proportion of non-response is usually much larger. People do not have the time to spare, nor are they willing to take the trouble of writing the answers and returning the questionnaire. Sometimes people also do not like to record information in their own handwriting and very often avoid answering delicate questions.

collect data are called as enumerators) are engaged to collect informa-

tion from the persons concerned. They gather information in schedules

or questionnaire specially prepared for the purpose in the form of an-

swers given by the respondents to specific questions.

In the case of a census, enumerators visit every member of the source in the zones or areas specifically allotted to them, and in the case of a sample survey they visit those members who come under their sampling procedure. This method is applied in censuses and in most other extensive enquiries designed to cover large areas or populations.

Advantages:

suses as well as sample enquiries.

- There is a much lesser degree of subjectivity on the part of

interviewer in this method.

- This method is useful where the scope and coverage is large

enough.

Limitations:

- Thorough training of the enumerators is needed before they are sent to the field. It also needs an organization to handle the whole process of appointment, training and supervision of enumeration work.

- Frequency Distribution

tistical data is by constructing a distribution table. In this method, classification is done according to quantitative magnitude. The items are classified into groups or classes in increasing order of magnitude, and the number of items falling into each group is determined and indicated.

Consider the following set of data, which are the high temperatures recorded for 30 consecutive days. We wish to summarize this data by creating a frequency distribution of the temperatures.

Frequency Distribution

50 45 49 50 43

49 50 49 45 49

47 47 44 51 51

44 47 46 50 44

51 49 43 43 49

45 46 45 51 46

follows:

1. Identify the highest and lowest values in the data set. In the given temperature data the highest temperature is 51 and the lowest temperature is 43.

2. Create a column with the title of the variable we are using, in this case temperature. Enter the highest score at the top, and include all values within the range from the highest score to the lowest score.

3. Create a tally column to keep track of the scores as you enter

them into the frequency distribution. Once the frequency distri-

bution is completed you can omit this column. Most printed

frequency distributions do not retain the tally column in their

final form.

4. Create a frequency column, recording the frequency of each value as shown in the tally column.

5. At the bottom of the frequency column record the total frequency for the distribution, preceded by N =.

6. Enter the name of the frequency distribution at the top of the

table.

following frequency distribution.

Discrete Frequency Distribution for High Temperatures

Temperature    Frequency
51             4
50             4
49             6
48             0
47             3
46             3
45             4
44             3
43             3
               N = 30
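The six steps above can be sketched in a few lines of code. This is only an illustration, assuming the 30 temperatures are held in a plain list; the names `temperatures` and `frequency_table` are not from the text.

```python
from collections import Counter

# High temperatures for 30 consecutive days (data from the text).
temperatures = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

counts = Counter(temperatures)

# List every value from the highest down to the lowest, including
# values that never occur (frequency 0), as step 2 requires.
frequency_table = [
    (value, counts.get(value, 0))
    for value in range(max(temperatures), min(temperatures) - 1, -1)
]

for value, frequency in frequency_table:
    print(f"{value:>4} {frequency:>4}")
print(f"N = {sum(f for _, f in frequency_table)}")
```

`Counter` plays the role of the tally column: it is needed while counting but does not appear in the final table.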

quency distribution by adding an additional column called “Cumulative Frequency”. For each score value, the cumulative frequency is the sum of the frequencies up to and including the frequency for that value. In the cumulative frequency distribution for the high temperatures data below, notice that the cumulative frequency for the lowest temperature (43) is 3, and that the cumulative frequency for the temperature 44 is 3 + 3 or 6. The cumulative frequency for a given value can also be obtained by adding the frequency for the value to the cumulative frequency for the value below it. For example, the cumulative frequency for 45 is 10, which is the cumulative frequency for 44 (6) plus the frequency for 45 (4). Finally, notice that the cumulative frequency for the highest value (51 in the current case) should be the same as the total of the frequency column (30 in the case of the temperature data).

Cumulative Frequency Distribution for High Temperatures

Temperature    Frequency    Cumulative Frequency
51             4            30
50             4            26
49             6            22
48             0            16
47             3            16
46             3            13
45             4            10
44             3            6
43             3            3
               N = 30

1. Create a frequency distribution.

2. Add a column entitled cumulative Frequency.

3. The cumulative frequency for each score is the frequency up to

and including the frequency for that score.

4. The highest cumulative frequency should equal N (the total of

the frequency column)
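The running total described in these steps can be sketched as follows. The frequencies are those of the temperature example, written from the lowest value upwards; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequencies from the temperature table, listed from the lowest
# value (43) up to the highest (51).
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
frequencies = [3, 3, 4, 3, 3, 0, 6, 4, 4]

# The running total of the frequencies is the cumulative frequency.
cumulative = list(accumulate(frequencies))

for value, f, cf in zip(values, frequencies, cumulative):
    print(f"{value:>4} {f:>4} {cf:>4}")

# Step 4: the highest cumulative frequency must equal N.
assert cumulative[-1] == sum(frequencies)
```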

summarize the data properly. For example, you wish to create a fre-

quency distribution for the IQ scores of your class of 30 pupils. The IQ

scores in your class range from 73 to 139. To include these scores in a

frequency distribution you would need 67 different score values (139 down

to 73). This would not summarize the data very much. To solve this prob-

lem we would group scores together and create a grouped frequency

distribution.

If your data has more than 20 score values, you should create a grouped frequency distribution by grouping score values together into class intervals. To create a grouped frequency distribution:

1. Select an interval size so that you have 7-20 class intervals.

2. Create a class interval column and list each of the class inter-

vals.

3. Each interval must be the same size, they must not overlap, and there may be no gaps within the range of class intervals.

4. Create a tally column (optional)

5. Create a midpoint column for interval midpoints

6. Create a frequency column

7. Enter N = sum value at the bottom of the frequency column.

intervals. “Mutually exclusive” means that a score can belong to only one class interval. Two non-mutually exclusive class intervals would be 45-49 and 47-51, since the scores 47, 48, and 49 could belong to either class interval. In a grouped frequency distribution, the class intervals must be mutually exclusive.

2. Do not omit any class intervals. Just as with regular frequency

distribution, all possible scores between the largest score and

the smallest score of the data set must be included in the grouped

frequency distribution. Even if an interval has a frequency of

zero, it is to be included in the list of class intervals.

3. The class interval size is usually more than 3. The class interval size is defined as the upper real limit of the class interval minus the lower real limit of the class interval. For instance, in the class interval 45-49, the class interval size is 49.5 – 44.5 = 5.

4. Pick the smallest class interval size of 3 or 5 or a multiple of 5 that also satisfies the first guideline of producing approximately 7 to 20 class intervals. In other words, if a class interval size of 25 will produce approximately 18 class intervals and a class interval size of 30 will produce approximately 12 class intervals, select 25 as the class interval size. The rationale for this rule is that it is better to under-summarize the data, by using a smaller class interval size, than to over-summarize the data, which would result from a larger class interval size.

5. The class interval size should be equal for all class intervals. If

the class interval size were not equal for all class intervals, then

we could not perform the statistical computations which use

grouped frequency distributions.

6. The lower apparent limit of each class interval should be a mul-

tiple of the class interval size. If the lowest score in the data set

is 46 and a class interval size of 5 has been selected, the first

class interval would be 45-49 because 45 (the lower apparent

limit) is a multiple of 5 while 46 is not.


Look at the following data of temperatures for 50 days. The high-

est temperature is 59 and the lowest temperature is 39. If we were to

create a simple frequency distribution of this data we would have 21

temperature values. This is greater than 20 values so we should create a

grouped frequency distribution.

Data Set – High Temperatures for 50 days

57 39 52 52 43

50 53 42 58 55

58 50 53 50 49

45 49 51 44 54

49 57 55 59 45

50 45 51 54 58

53 49 52 51 41

52 40 44 49 45

43 47 47 43 51

55 55 46 54 41

grouped frequency distribution, we would create the following grouped

frequency distribution. Note that we use an interval size of three so that

each class interval includes three score values. Also note that we have

included an interval midpoint column, this is the midvalue of each inter-

val.

Class Interval    Interval Midpoint    Frequency
57-59             58                   6
54-56             55                   7
51-53             52                   11
48-50             49                   9
45-47             46                   7
42-44             43                   6
39-41             40                   4
                                       N = 50
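The grouping can be checked with a short sketch. It assumes integer scores and an interval size of 3, and it starts the first interval at the largest multiple of the interval size that does not exceed the lowest score, as guideline 6 suggests. The function name `grouped_frequencies` is illustrative.

```python
# High temperatures for 50 days (data from the text).
temperatures = [
    57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
    58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
    49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
    53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
    43, 47, 47, 43, 51, 55, 55, 46, 54, 41,
]

def grouped_frequencies(data, size):
    """Return (lower, upper, midpoint, frequency) for each class interval.

    Assumes integer scores. The first lower limit is the largest
    multiple of `size` that does not exceed the smallest score.
    """
    lower = (min(data) // size) * size
    rows = []
    while lower <= max(data):
        upper = lower + size - 1
        frequency = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, (lower + upper) / 2, frequency))
        lower += size
    return rows

table = grouped_frequencies(temperatures, size=3)

# Print from the highest interval down, as in the printed table.
for lower, upper, midpoint, frequency in reversed(table):
    print(f"{lower}-{upper} {midpoint:>6} {frequency:>4}")
print(f"N = {sum(row[3] for row in table)}")
```

Because every interval between the lowest and highest score is generated, an interval with frequency zero would still be listed, which is what guideline 2 asks for.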

2.3.4 Cumulative grouped frequency distribution

distribution. We just add a cumulative frequency column to the grouped

frequency distribution and we have a cumulative grouped frequency dis-

tribution. The cumulative grouped frequency distribution below was cre-

ated by adding a cumulative frequency column.

Class Interval    Interval Midpoint    Frequency    Cumulative Frequency
57-59             58                   6            50
54-56             55                   7            44
51-53             52                   11           37
48-50             49                   9            26
45-47             46                   7            17
42-44             43                   6            10
39-41             40                   4            4
                                       N = 50

components of graph created by computer software


2. Title :

It can contain the title and subtitle if any of the graph.

3. Axis :

Base line when data is positioned on a graph. Scale and scale

label are displayed on the axis. Unit label, axis title, and break

line are also displayed if necessary. The name of each axis

may vary depending on the chart type.

4. Plot area

The area in which the graph is plotted.

5. Series

The group or series of associated values displayed in the graph, e.g. one year will be represented by one series, and each series is represented by a bar in the graph.

6. Legend

The list indicating the colour, line style, or filling pattern used in the graph for each series. The legend is displayed in the initial state for chart types other than the stock chart.

7. Comment

The comment about the graph.

8. Data label

The string that displays the name or value for the data on the

graph.

9. Text label

The string that can be displayed at any position within the graph.

10. Bar Graph

Ex 1.

Table showing production of wheat rice and cereals for the years

1990 and 1999 is given below.

        Wheat    Rice    Cereals    Total
1990    50       100     150        300
1999    100      150     250        500

[Figure: Bar chart. Legend: Rice, Wheat; vertical axis from 50 to 250.]

[Figure: Multiple or compound bar chart. Legend: Cereals, Rice, Wheat; horizontal axis: 1990, 1999; vertical axis from 50 to 250.]

        Wheat                   Rice                    Cereals
1990    (50/300) x 100 ≈ 17     (100/300) x 100 ≈ 33    (150/300) x 100 = 50
1999    (100/500) x 100 = 20    (150/500) x 100 = 30    (250/500) x 100 = 50

2.5 Pie Chart Calculations

        Wheat                   Rice                     Cereals
1990    (50/300) x 360 = 60     (100/300) x 360 = 120    (150/300) x 360 = 180
1999    (100/500) x 360 = 72    (150/500) x 360 = 108    (250/500) x 360 = 180
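Each sector angle is the component's share of the total multiplied by 360 degrees. A minimal sketch of that calculation, using the production figures from the example (the function name `pie_angles` is illustrative):

```python
def pie_angles(components):
    """Map each component to its sector angle in degrees."""
    total = sum(components.values())
    return {name: value * 360 / total for name, value in components.items()}

production_1990 = {"Wheat": 50, "Rice": 100, "Cereals": 150}
production_1999 = {"Wheat": 100, "Rice": 150, "Cereals": 250}

angles_1990 = pie_angles(production_1990)
angles_1999 = pie_angles(production_1999)

for year, angles in (("1990", angles_1990), ("1999", angles_1999)):
    print(year, angles)
```

The angles for each year always add up to 360 degrees, which is a quick check on the arithmetic.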

[Figure: Pie chart]

A histogram of a frequency distribution is drawn as fol-

lows:

a) The class boundaries are marked on the X-axis starting and

finishing at convenient points on the axis, the class intervals are

thus marked on the X-axis and are taken as bases.

b) On each base, a rectangle is drawn whose height is equal to the frequency of that class. If the class intervals are of equal size or width, the areas of the rectangles are proportional to the corresponding class frequencies. Here the vertical axis (or y-axis, as it is commonly known) is the frequency axis.

c) Instead of class boundaries, class limits may be used if the frequency distribution is given or constructed in terms of class limits. But it is better to use class boundaries, especially in the case of continuous variables. We draw below the histogram corresponding to the frequency distribution given by the table in the example to be solved.

of observations and if the class intervals are taken to be smaller, it may

be possible to have a sizable frequency for most of the classes. Then

the frequency polygon will closely approximate a curve, which is called

frequency curve. Such a curve is also known as smoothed frequency

polygon.

It has been found that frequency curves of data found in nature

and industry generally take the characteristic shapes as indicated.

[Figure: Characteristic shapes of frequency curves, including curves skewed to the right, skewed to the left, J-shaped, reversed J-shaped and U-shaped.]

upper class boundary of a given class interval: this number is the sum of the frequencies up to and including that class to which the boundary corresponds. This sum is known as the cumulative frequency up to and including that class interval.

the points obtained by plotting the cumulative frequencies along the vertical axis and the corresponding upper class boundaries along the x-axis. The corresponding polygon is known as the cumulative frequency polygon (less than) or ogive. By joining the points by a freehand curve we get the cumulative frequency curve (“less than”). Similarly, we can construct another cumulative frequency distribution (“more than” type) by considering the sum of frequencies greater than the lower class boundaries of the classes. For example, the total frequency greater than the lower class boundary 158.5 of the class 159-160 is one (1), while the total frequency greater than the lower class boundary 156.5 of the class 157-158 is 1 + 4 = 5, that of the class 155-156 is 1 + 4 + 6 = 11, and so on. Given below is Table 3.7, the cumulative frequency distribution (“more than”) of the same distribution.

of 50 students.

Class interval (in cms.)    Frequency    Cumulative Frequency (Less than)
144.5 – 146.5               2            2
146.5 – 148.5               5            7
148.5 – 150.5               8            15
150.5 – 152.5               15           30
152.5 – 154.5               9            39
154.5 – 156.5               6            45
156.5 – 158.5               4            49
158.5 – 160.5               1            50
Total                       50

heights of 50 students.

[Figure: “Less than” and “more than” ogives. Plot less than CF against UCL; plot more than CF against LCL. Vertical axis from 10 to 50.]

Class interval (in cms.)    Frequency    Cumulative Frequency (More than)
145 – 146                   2            50
147 – 148                   5            48
149 – 150                   8            43
151 – 152                   15           35
153 – 154                   9            20
155 – 156                   6            11
157 – 158                   4            5
159 – 160                   1            1
Total                       50

The graph obtained by joining the points obtained by plotting the cumulative frequencies (“more than”) along the vertical axis and the corresponding lower class boundaries along the X-axis is known as the cumulative frequency polygon (greater than) or ogive. By joining the points by a freehand curve, one gets the cumulative frequency curve (“more than” type). These two curves are shown in the figure above.
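The points of both ogives can be produced from the same list of frequencies. The sketch below uses the heights example; the variable names are illustrative and no plotting library is assumed, only the coordinates are computed.

```python
from itertools import accumulate

# Heights of 50 students: class boundaries (cm) and class frequencies.
boundaries = [144.5, 146.5, 148.5, 150.5, 152.5, 154.5, 156.5, 158.5, 160.5]
frequencies = [2, 5, 8, 15, 9, 6, 4, 1]

# "Less than" type: running total, paired with UPPER class boundaries.
less_than = list(accumulate(frequencies))
less_than_points = list(zip(boundaries[1:], less_than))

# "More than" type: running total taken from the top class downwards,
# paired with LOWER class boundaries.
more_than = list(accumulate(reversed(frequencies)))[::-1]
more_than_points = list(zip(boundaries[:-1], more_than))

print(less_than_points)
print(more_than_points)
```

Joining `less_than_points` gives the rising ogive and joining `more_than_points` gives the falling one.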

were the same, then this number should be used, e.g. if the numbers are all the same, say 4, 4, 4, 4, then we can use the number 4 to represent this data set. What if they are not the same? Which number should we select to represent the data set? The average is the answer. The average should not depend on the order of the numbers in the list, and it is not less than the smallest number in the list, nor greater than the greatest number in the list.

simply called the mean. The arithmetic mean of two numbers, such as 2

and 8, is obtained by A = (2 + 8) / 2 = 5.

the resulting value obtained for A. The mean 5 is not less than the mini-

mum 2 nor greater than the maximum 8. The mean of a list of integers is

not necessarily an integer.

bers we multiply them. Thus, the geometric mean of 2 and 8 is obtained

by G = square root of (2 x 8) = 4. And again it is seen that changing the

order of the members of the list to be averaged does not change the

result: In order to make sense of the requirement that the mean must be

at least as big as the smallest member of the list and no bigger than the

largest, the geometric mean is usually only applied to lists of positive

numbers, not to lists that can include negative numbers such as tem-

peratures.

many other ways of combining the elements of a list in a manner that

does not change when the order of the list is changed. For each of them

one can define an average based on that method.

called the mode. So the mode of the list (1, 2, 2, 3, 3, 3, 4) is 3. The mode is not necessarily well defined. The list (1, 2, 2, 3, 3, 5) has the two modes 2 and 3. The mode can be subsumed under the general method of defining averages by understanding it as taking the list and setting each member of the list equal to the most common value in the list, if there is a most common value. This list is then equated to the resulting list with all values replaced by the same value. Since they are already all the same, this does not require any change.

to order the list according to its magnitude and then repeatedly remove

the pair consisting of the highest and lowest value till either one or two

values are left. If two values are left replace them with their arithmetic

mean. This method takes the list 1, 7, 3, 13 and orders it to read 1, 3, 7,

13. Then the 1 and 13 are removed to obtain the list 3, 7. Since there are

two elements in this list replace them by their arithmetic mean (3 + 7)/2

= 5. Now do the same for the equal sized list consisting of all the same

value M: M, M, M, M. It is already ordered. We remove the two end

values to get M, M. We take their arithmetic mean to get M. Finally, set

this result equal to our previous result to get M = 5.

for obtaining averages. A number of averages, including the ones discussed above, that have been found to be useful in some circumstances or other are listed below along with their formal solutions.

to calculate, based on each and every observation, stable to sampling

fluctuations, rigidly defined, should not be affected by extreme values.

data is arranged in the increasing order.

est frequency in the data set

Geometric mean = $(x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$

Harmonic mean = $\dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}$
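The three means can be compared on the pair 2 and 8 used earlier in this section. This is a minimal sketch with illustrative function names; the geometric and harmonic means are assumed here to be applied to positive numbers only.

```python
import math

def arithmetic_mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product; intended for positive numbers."""
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    """n divided by the sum of reciprocals; intended for positive numbers."""
    return len(xs) / sum(1 / x for x in xs)

values = [2, 8]
print(arithmetic_mean(values))  # (2 + 8) / 2
print(geometric_mean(values))   # square root of 2 x 8
print(harmonic_mean(values))    # 2 / (1/2 + 1/8)
```

For positive numbers that are not all equal, the arithmetic mean is the largest of the three and the harmonic mean the smallest.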


Mean for grouped data

examination. The mean marks of the boys were found to be 60%, whereas the mean of the girls was 40%. Determine the average marks percentage of the school.

Solution

Total score of 35 boys = 35 x 60 = 2100

Total score of 85 girls = 85 x 40 = 3400

Total score of 120 students = 2100 + 3400 = 5500 marks

Average marks percentage of the school = 5500 / 120 ≈ 45.83%

The mean for grouped data is

$$\bar{X} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_k f_k}{n} = \frac{1}{n} \sum_{i=1}^{k} x_i f_i$$

OR

$$\bar{X} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

Median

Median.

Example: -

of set -3, -1, 0, 1, 2, 3 is (0 + 1)/2 = 0.5.

The median is a value which divides the set of observations into two equal parts such that 50% of the observations lie below the median and 50% above the median.

The median is not affected by the actual values of the observations but rather depends on their positions. The median is also not always a value from the given set.

Procedure to find median

(Frequency Distribution is given)

Step 1: Arrange the values of variables in ascending or descending

order of magnitude.

Step 2 : Find the cumulative frequency (c.f.)

Step 3 : Find N/2, where N is the total frequency.

Step 4 : Find the cumulative frequency (c.f.) first greater than N/2

and determine the corresponding value of the variable.

Step 5 : The value obtained in step 4 is the required median.

Median:

Ex: 2, 3, 5, 6, 7

No. of observations in the given set = N = 5, which is an odd number.

∴ Median = value of the $\left(\frac{N+1}{2}\right)$th observation

Here N = 5, so $\frac{N+1}{2} = \frac{5+1}{2} = 3$

Position:    1st    2nd    3rd    4th    5th
Value:       2      3      5      6      7

Median = 3rd observation = 5

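N = 6, which is an even number.

$$\text{Median} = \frac{\left(\frac{N}{2}\right)\text{th observation} + \left(\frac{N}{2} + 1\right)\text{th observation}}{2}$$

$$= \frac{\left(\frac{6}{2}\right)\text{th observation} + \left(\frac{6}{2} + 1\right)\text{th observation}}{2}$$

$$= \frac{3\text{rd observation} + 4\text{th observation}}{2} = \frac{0 + 1}{2} = \frac{1}{2} = 0.5$$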

No. of students    6     4    16    7     8     2
Marks              20    9    25    50    40    80

quency distribution table –

N = No. of observations = 43, which is odd.

Marks    No. of students    c.f.
9        4                  4
20       6                  10
25       16                 26
40       8                  34
50       7                  41
80       2                  43
Total    43

$$N = \sum_{i=1}^{n} f_i = 43$$

Median = value of the $\left(\frac{N+1}{2}\right)$th observation = value of the $\frac{43+1}{2}$ = 22nd observation.

Therefore, median = 22nd value.

The above table shows that all items from the 11th to the 26th have the value 25. Since the 22nd item falls in this interval, its value is 25.

Hence median = 25 marks.
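The procedure for a frequency distribution can be sketched as below. The function name `median_from_frequencies` is illustrative, and the sketch assumes an odd total N, as in this example, so that the median is a single observation.

```python
from itertools import accumulate

def median_from_frequencies(values, frequencies):
    """Median of a discrete frequency distribution with an odd total N."""
    pairs = sorted(zip(values, frequencies))      # Step 1: ascending order
    n = sum(frequencies)
    position = (n + 1) / 2                        # position of the median
    running = accumulate(f for _, f in pairs)     # Step 2: cumulative freq.
    for (value, _), cf in zip(pairs, running):
        if cf >= position:                        # first c.f. reaching it
            return value

marks = [20, 9, 25, 50, 40, 80]
students = [6, 4, 16, 7, 8, 2]
median_marks = median_from_frequencies(marks, students)
print(median_marks)
```

The loop stops at the mark 25 because its cumulative frequency, 26, is the first one that reaches the 22nd position.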

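Ex : 3, 4, 4, 5, 7, 8, 8, 8, 9, 10

Solution:

In the given data N = 10, which is an even number.

$$\text{Median} = \frac{\left(\frac{N}{2}\right)\text{th obs.} + \left(\frac{N}{2} + 1\right)\text{th obs.}}{2}$$

$$= \frac{\left(\frac{10}{2}\right)\text{th obs.} + \left(\frac{10}{2} + 1\right)\text{th obs.}}{2}$$

$$= \frac{5\text{th obs.} + 6\text{th obs.}}{2} = \frac{7 + 8}{2} = 7.5$$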

Mode

called mode.

For example, in the series 6,5,3,4,7,8,5,9,5,4. We notice 5 oc-

curs most frequently . Hence 5 is the mode.

Marks 0 1 2 3 4 5 6 7 8

Number of boys 7 10 16 17 26 31 11 2 1

Solution :

xi       fi     fi xi    c.f.
0        7      0        7
1        10     10       17
2        16     32       33
3        17     51       50
4        26     104      76
5        31     155      107
6        11     66       118
7        2      14       120
8        1      8        121
Total    121    440

Mean

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{440}{121} = 3.64$$

Median

$$M = \left(\frac{N+1}{2}\right)\text{th observation} = \left(\frac{121+1}{2}\right)\text{th observation} = \frac{122}{2} = 61\text{st observation}$$

Corresponding c.f. = 76

Corresponding xi = 4

Median = 4

Mode

Highest frequency of the given data = 31, and the value with the highest frequency = 5.

Mode = 5

e.g.

(I) 1, 2, 2, 3, 3, 3, 4

The numbers appear in this data as follows: 1 once, 2 twice, 3 thrice, 4 once. Hence 3 is the mode.

(II) 1, 2, 2, 3, 3, 5

The data points appear in this data as follows: 1 once, 2 twice, 3 twice, 5 once. Both 2 and 3 are modes.
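The mean and mode of the boys' marks can be verified with a short sketch. The variable names are illustrative; the mode is returned as a list so that data with two equally common values, such as 1, 2, 2, 3, 3, 5, reports both.

```python
marks = [0, 1, 2, 3, 4, 5, 6, 7, 8]
boys = [7, 10, 16, 17, 26, 31, 11, 2, 1]

# Mean: sum of (value x frequency) divided by the total frequency.
n = sum(boys)
mean_marks = sum(x * f for x, f in zip(marks, boys)) / n

# Mode: every value that attains the highest frequency.
highest = max(boys)
modes = [x for x, f in zip(marks, boys) if f == highest]

print(n, round(mean_marks, 2), modes)
```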

2.9 Partition Values

These are the values that divide the total observations into a number of equal parts when the data is arranged in increasing order.

2.9.1 Quartiles

are total 3 quartiles. The quartile Qi has (25 × i)% of the values below it, e.g. if Q1 of a data set is 40, it would mean that 25% (25*1 = 25) of the observations of that data set are below 40 and 75% (100 - 25 = 75) of the observations have a value more than 40.

$$Q_1 = L + \frac{\frac{N}{4} - C.F.}{F} \times h$$

L = Lower limit of the quartile class

C.F. = Cumulative frequency (c.f.) of previous class

F = Frequency of the quartile class

h = Width of the quartile class

$$Q_3 = L + \frac{\frac{3N}{4} - C.F.}{F} \times h$$

Q1 is called as lower quartile or first quartile

Q2 is called as middle quartile or second quartile or median.

Q3 is called as upper quartile or third quartile.

2.9.2 Deciles

are total 9 deciles. The decile Di has (10 × i)% of the values below it, e.g. if D8 of a data set is 70, it would mean that 80% (10*8 = 80) of the observations of that data set are below 70 and 20% (100 - 80 = 20) of the observations have a value more than 70.

$$D_i = L + \frac{\frac{N \times i}{10} - c.f.}{f} \times h$$

For example;

$$D_1 = L + \frac{\frac{N \times 1}{10} - c.f.}{f} \times h$$

$$D_2 = L + \frac{\frac{N \times 2}{10} - c.f.}{f} \times h$$

c.f. = Cumulative Frequency (c.f.) of previous class

f = Frequency of the decile class

h = Width of the decile class

2.9.3 Percentiles

Percentiles are the values of the variate that divide the total frequency into 100 equal parts. There are 99 percentiles in total. The percentile Pi has i% of the values below it, e.g. if your score in an examination is at the 86th percentile and corresponds to actual marks of 70, it would mean that 86% of candidates have scored less than 70 marks, i.e. 86% of candidates have scored fewer marks than you and 14% (100 - 86 = 14) of candidates have scored more marks than you.

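The formula is

$$P_i = l + \frac{\frac{N \times i}{100} - c.f.}{f} \times h$$

For example;

$$P_1 = l + \frac{\frac{N \times 1}{100} - c.f.}{f} \times h$$

$$P_2 = l + \frac{\frac{N \times 2}{100} - c.f.}{f} \times h$$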

l = Lower limit of the percentile class

c.f. = Cumulative Frequency (c.f.) of previous class

f = Frequency of the percentile class

h = Width of the percentile class

pany is as given below:

Less than 200 12

200-400 16

400-600 38

600-800 78

800-1000 80

1000-1200 35

1200-1400 14

Above 1400 7

Total 280

Find first quartile, median, the third quartile. Find D4, P66.

Solution:

The cumulative frequency (cf) is as below:

Class            Frequency    c.f.
Less than 200    12           12
200-400          16           28
400-600          38           66
600-800          78           144
800-1000         80           224
1000-1200        35           259
1200-1400        14           273
Above 1400       7            280
Total            280

Here the class 200-400 means wages from Rs. 200 (200 is the lower class limit and is included in the class) to less than Rs. 400 (400 is the upper class limit and is NOT included in the class).

Q1 (first quartile) corresponds to the 280/4 = 70th observation, which lies in the interval 600-800, whose lower class boundary is L = 600.

The interval contains 78 observations and the CF of the earlier class is 66. Class width h = 200.

$$Q_1 = l + \frac{\frac{N \times 1}{4} - c.f.}{f} \times h$$

$$Q_1 = 600 + \frac{\frac{280 \times 1}{4} - 66}{78} \times 200$$

$$= 600 + \frac{70 - 66}{78} \times 200$$

= 600 + 10.256

= 610.256

2N/4 = 140 also lies in the same class.

$$Q_2 = 600 + \frac{\frac{280 \times 2}{4} - 66}{78} \times 200$$

= 600 + 189.74

= 789.74

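Q3 (third quartile) corresponds to the $\frac{3N}{4} = \frac{3 \times 280}{4}$ = 210th observation, which lies in the interval 800-1000.

Therefore, L = 800, F = 80, CF = 144, h = 200.

$$Q_3 = l + \frac{\frac{N \times 3}{4} - c.f.}{f} \times h$$

$$Q_3 = 800 + \frac{\frac{280 \times 3}{4} - 144}{80} \times 200$$

= 800 + 165

= 965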

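D4

The 4th decile corresponds to the $\frac{N \times i}{10} = \frac{280 \times 4}{10}$ = 112th observation, which lies in the interval 600-800.

Therefore, l = 600, f = 78, c.f. = 66, h = 200

$$D_4 = 600 + \frac{\frac{4 \times 280}{10} - 66}{78} \times 200$$

= 600 + 117.95

= 717.95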

P66

The 66th percentile corresponds to the $\dfrac{66 \times 280}{100} = 184.8$th observation, which lies in the interval 800-1000.

$P_i = l + \dfrac{\frac{N \times i}{100} - \text{c.f.}}{f} \times h$

$P_{66} = 800 + \dfrac{\frac{66 \times 280}{100} - 144}{80} \times 200$

= 800 + 102

= 902
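The quartile, decile and percentile calculations above all use the same interpolation formula, so they can be checked with one small helper. The following is a minimal sketch, not part of the original text: the function name `grouped_quantile` and the limits 0 and 1600 used to close the two open-ended classes are assumptions made only for illustration.

```python
def grouped_quantile(classes, k, parts):
    """Return l + ((k*N/parts - c.f.) / f) * h for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    k, parts: e.g. (1, 4) for Q1, (4, 10) for D4, (66, 100) for P66.
    """
    total = sum(f for _, _, f in classes)
    target = k * total / parts
    cum = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("k/parts must lie between 0 and 1")


# Wage distribution from the example. The limits 0 and 1600 that close the two
# open-ended classes are assumptions; they do not affect the results below.
wages = [(0, 200, 12), (200, 400, 16), (400, 600, 38), (600, 800, 78),
         (800, 1000, 80), (1000, 1200, 35), (1200, 1400, 14), (1400, 1600, 7)]

q1 = grouped_quantile(wages, 1, 4)
q2 = grouped_quantile(wages, 2, 4)
q3 = grouped_quantile(wages, 3, 4)
d4 = grouped_quantile(wages, 4, 10)
p66 = grouped_quantile(wages, 66, 100)
print(round(q1, 3), round(q2, 2), round(q3, 2), round(d4, 2), round(p66, 2))
```

The values agree with the hand calculation to the rounding shown there.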

Ex : For what value of x will 8 and x have the same mean (average)

as 27 and 5 ?

$\dfrac{27 + 5}{2} = 16$. Therefore

$\dfrac{x + 8}{2} = 16$

32 = x + 8

24 = x

Ex : On his first five tests, X scored 72, 86, 92, 63 and 77. What test score must X earn on his sixth test so that his average (mean score) for all six tests will be 80?

Solution :

$\dfrac{72 + 86 + 92 + 63 + 77 + x}{6} = 80$

(80)(6) = 390 + x

480 = 390 + x

90 = x

Ex : The average weight of three dogs is 38 kg. One of the dogs, X, weighs 46 kg. The other two dogs, Y and Z, have the same weight. Find Z’s weight.

Let x = Y’s weight

Therefore Z’s weight = x (they weigh the same, so they are both “x”)

Average: sum of the data divided by the number of data.

$\dfrac{x + x + 46}{3 \text{ (dogs)}} = 38$

(38)(3) = 2x + 46

114 = 2x + 46

$\dfrac{68}{2} = \dfrac{2x}{2}$

x = 34

Z weighs 34 kg.

Measures of Dispersion:

items and their common mean;

i) the mean deviation

ii) the standard deviation

Central percentage spread of items:

i) the 10 to 90 percentile range

ii) the quartile deviation

The Range: The difference between the smallest and largest val-

ues of item in a set or distribution.

Ex : The daily number of books sold by two separate bookstores

over twelve days were:

Bookstore 1 : 3, 5, 1, 4, 5, 3, 6, 8, 6, 2, 3, 7

Bookstore 2 : 2, 3, 2, 1, 4, 3, 2, 2, 1, 3, 4, 1

The range for Bookstore 1 is 8 – 1 = 7, while the range for Bookstore 2 is 4 – 1 = 3. Thus, daily sales are more variable for Bookstore 1.

difference between each item and the mean. "Absolute" means ignoring the negative sign, i.e. read | x | as "absolute x", so that | -2 | = 2 and | 2 | = 2.

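Mean Deviation for Ungrouped Data

$\text{M.D.} = \dfrac{\sum_{i=1}^{n} |X_i - \bar{x}|}{n}$

Mean Deviation for Grouped Data

$\text{M.D.} = \dfrac{\sum_{i=1}^{n} f_i |X_i - \bar{x}|}{\sum_{i=1}^{n} f_i}$

Even if nothing is mentioned, the mean deviation (M.D.) is always taken about the mean.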

Ex : Find the range and calculate the mean deviation of 84, 92,

73, 67, 88, 74, 91, 74

Range = 92 - 67

= 25

$\text{Mean} = \dfrac{84 + 92 + 73 + 67 + 88 + 74 + 91 + 74}{8} = \dfrac{643}{8} = 80.375$

$\text{Mean Deviation} = \dfrac{\sum |x_i - \bar{x}|}{8} = \dfrac{67}{8} = 8.375$

In other words, each value in the set is, on average, 8.375 units

away from the common mean.
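The range and mean deviation in this example can be reproduced in a few lines. This is only an illustrative sketch; the variable names are not from the text.

```python
scores = [84, 92, 73, 67, 88, 74, 91, 74]

data_range = max(scores) - min(scores)        # largest value minus smallest value
mean = sum(scores) / len(scores)              # 643 / 8
# Mean deviation: average of the absolute differences from the mean.
mean_deviation = sum(abs(x - mean) for x in scores) / len(scores)

print(data_range, mean, mean_deviation)
```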

Ex : Calculate the mean deviation from the following distribution:

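Class              10-14   15-19   20-24   25-29   30-34   35-39
Number of weeks      3      17      15      20       9       4

Class     Number of weeks (f)   Mid-point (x)   fx     |x - x̄|    f|x - x̄|
10-14            3                  12           36     11.99      35.97
15-19           17                  17          289      6.99     118.83
20-24           15                  22          330      1.99      29.85
25-29           20                  27          540      3.01      60.2
30-34            9                  32          288      8.01      72.09
35-39            4                  37          148     13.01      52.04
Total           68                             1631               368.98

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{1631}{68} = 23.99$

$\text{M.D.}(\bar{x}) = \dfrac{\sum f_i |x_i - \bar{x}|}{\sum f_i} = \dfrac{368.98}{68} = 5.43$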

from the mean value.

It is the most common measure of dispersion, (Remember: dis-

persion = spread / variability).

It is used as a measure for comparison only when the units in

the distribution are the same and the respective means are com-

parable.

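1) Ungrouped Data

$\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$, where $\bar{x} = \dfrac{\sum x_i}{n}$

$\sigma = \sqrt{\dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2}$

2) A frequency distribution:

$\sigma = \sqrt{\dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$

$\sigma = \sqrt{\dfrac{\sum f_i x_i^2}{\sum f_i} - \left(\dfrac{\sum f_i x_i}{\sum f_i}\right)^2}$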

Solution:

xi       xi²
84      7056
92      8464
73      5329
67      4489
88      7744
74      5476
91      8281
74      5476
Total   643     52315

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{643}{8} = 80.375 \approx 80.38$

$\text{S.D.} = \sqrt{\dfrac{\sum x_i^2}{n} - \bar{x}^2}$

$= \sqrt{\dfrac{52315}{8} - (80.38)^2}$

$= \sqrt{6539.375 - 6460.94}$

$= \sqrt{78.435}$

= 8.856
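As a cross-check, the same standard deviation can be computed without rounding the mean. The sketch below is illustrative only. Because the hand calculation squares the rounded mean 80.38, it arrives at 8.856, whereas the unrounded mean 80.375 gives a value of about 8.90.

```python
import math

scores = [84, 92, 73, 67, 88, 74, 91, 74]
n = len(scores)

mean = sum(scores) / n                        # 80.375, kept unrounded
sum_of_squares = sum(x * x for x in scores)   # 52315
# Shortcut formula: sigma = sqrt(sum(x^2)/n - mean^2)
sd = math.sqrt(sum_of_squares / n - mean ** 2)

print(sum_of_squares, round(sd, 3))
```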

Class      Number of weeks (f)   Mid-point (x)    fx      fx²
10 – 14           3                  12            36      432
15 – 19          17                  17           289     4913
20 – 24          15                  22           330     7260
25 – 29          20                  27           540    14580
30 – 34           9                  32           288     9216
35 – 39           4                  37           148     5476
Total            68                              1631    41877

$\text{S.D.} = \sqrt{\dfrac{\sum f_i x_i^2}{\sum f_i} - (\bar{x})^2}$

$= \sqrt{\dfrac{41877}{68} - (23.99)^2}$

$= \sqrt{615.838 - 575.52}$

$= \sqrt{40.318}$

= 6.35
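The grouped-data mean, mean deviation and standard deviation for this frequency distribution can be verified together. This is a sketch with illustrative variable names. It keeps the mean unrounded, so the last digits can differ slightly from the hand calculation, which uses the rounded mean 23.99.

```python
import math

midpoints = [12, 17, 22, 27, 32, 37]   # class mid-points x
freqs = [3, 17, 15, 20, 9, 4]          # number of weeks f

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
sd = math.sqrt(sum_fx2 / n - mean ** 2)

print(n, round(mean, 2), round(mean_deviation, 2), sum_fx2, round(sd, 2))
```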


2.11 The Coefficient of Variation

made, it is necessary to do so with regard to their variability.

While the standard deviation is the important measure of spread,

it cannot be used as the sole basis of comparing two distributions

This is because it is an absolute measure of dispersion that

measures variation in the same units as the original data.

(Remember that absolute values ignore the negative signs).

For example, if we have a standard deviation of 10 and a mean of 5, the values vary by an amount twice as large as the mean itself. If, on the other hand, we have a standard deviation of 10 and a mean of 5,000, the variation relative to the mean is insignificant. Therefore we cannot know the dispersion of a set of data until we know how the standard deviation compares with the mean.

A relative measure of dispersion, which compares the mean to

the standard deviation, is the coefficient of variation, which is

found by dividing the standard deviation by the mean.

$\text{Coefficient of Variation} = \dfrac{\text{SD}}{\text{Mean}} \times 100 = \dfrac{\sigma}{\bar{x}} \times 100$

Ex : Given the following data:

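A: $\bar{x}$ = 120, $\sigma$ = 55

B: $\bar{x}$ = 90, $\sigma$ = 50

Solution:

The mean of A (120) is higher than that of B (90). This means that deviations from the mean of A, and thus the standard deviation, will tend to be higher.

A: Coefficient of Variation = (55 / 120) x 100 = 45.8 %

B: Coefficient of Variation = (50 / 90) x 100 = 55.6 %

B has the higher relative variability in weekly wages.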

Although the standard deviation for A is higher in absolute terms,

the dispersion for B is higher in relative terms.
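The comparison of A and B can be checked numerically. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative choice, not something defined in the text.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(55, 120)
cv_b = coefficient_of_variation(50, 90)
print(round(cv_a, 1), round(cv_b, 1))
```

A has the larger standard deviation, but B has the larger coefficient of variation.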

2.12 Skewness

tion.

It can be positive (for a distribution which is skewed to the right),

negative (when a distribution is skewed to the left), or zero (for a

symmetric distribution).

If a distribution is skewed, it means that values of the distribu-

tion are concentrated at either the low end or the high end of the

measuring scale on the horizontal axis. For example, the two

curves below are skewed distributions:

coincide (i.e. mean = median = mode) is known as a symmetrical

distribution. Conversely, when values of mean, median and mode are not

equal the distribution is known as asymmetrical or skewed distribution.

In moderately skewed or asymmetrical distribution a very important

relationship exists among these three measures of central tendency. In

such distributions the distance between the mean and median is about

one-third of the distance between the mean and mode, as will be clear

from the diagrams 1 and 2. Karl Pearson expressed this relationship as:

Mean – Mode = 3 [Mean – Median]

And Median = Mode + $\dfrac{2}{3}$ [Mean – Mode]

Skewness is a measure of the asymmetry of a frequency distri-

bution, and the skewness coefficient is included as one of the statistics.

A right, or positive, skewed forecast has a greater density of values occurring, and the mode, around the lower end of the range. A left, or negative,

skewed forecast displays the opposite trend. The skewness of a fre-

quency distribution can be an important consideration. For example, if

your forecast is Net profit, you would prefer a situation that led to a posi-

tively skewed distribution of profit to one that is negatively skewed (with

all else being equal).

ficient values:

- Greater than 1 or less than -1 indicates a highly skewed distri-

bution ;

- Between 0.5 and 1 or -0.5 and -1 is moderately skewed ; and

- Between -0.5 and 0.5 indicates that the distribution is fairly sym-

metric.

$\text{Pearson's skew (SKP)} = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

efficient or Kelly’s coefficient. Bowley’s coefficient is always between -1

and +1 both included.

$\text{Bowley's skewness coefficient} = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

1) Distribution has positive skewness

SKP > 0

mode < mean

2) Distribution has Negative skewness

SKP < 0

mode > mean

3) Distribution is symmetrical, skewness is zero

SKP = 0

Mode = mean
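Bowley's coefficient needs only the three quartiles, so it can be illustrated with the quartiles obtained for the wage distribution earlier in this chapter (Q1 = 610.256, median = 789.74, Q3 = 965). The sketch below is an illustration under that assumption; the function name is not from the text.

```python
def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2*Median) / (Q3 - Q1); always lies between -1 and +1."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


skew = bowley_skewness(610.256, 789.74, 965)
print(round(skew, 4))
```

The result is very close to zero (slightly negative), which falls in the "fairly symmetric" band described above.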

The Empirical Rule

- The standard deviation can be used to convey information about

variability in a collection of data.

- To illustrate this, we look at the case of a normal population –

this means that the data values have a bell – shaped histogram.

- It can be shown for such populations that about 68% of the data lie within one standard deviation of the mean, about 95% within two standard deviations of the mean, and about 99% within three standard deviations of the mean. This is shown in the diagram below:

[Figure: bell-shaped normal curve. Areas: 34% on each side of μ within 1σ; 13.5% on each side between 1σ and 2σ; 2% on each side between 2σ and 3σ. Horizontal axis: μ – 3σ, μ – 2σ, μ – 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ]

mal population – to do so, it is necessary that the empirical rule should

be satisfied.

an exam.

46 58 65 70 76 49 59 66 71 78

50 59 66 71 79 53 60 66 72 80

54 62 66 73 82 55 63 68 73 83

55 64 68 73 84 57 65 69 74 88

and 10 respectively, does this data satisfy the empirical rule?

Solution:

range 57-77, i.e. within one standard deviation of the mean. These 26 numbers represent 26/40 or 65% of the data, which is very close to the 68% that lie within one standard deviation under the empirical rule. Further calculations are shown below:

Interval              Number of values    % of data    Empirical rule %
1 S. D. (57-77)             26               65              68
2 S. D.’s (47-87)           38               95              95
3 S. D.’s (37-97)           40              100              99

As the percentages for this sample are very close to the empirical rule, it is reasonable to conclude that this sample is coming from a normal population.
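The counts in this solution can be reproduced by tallying how many of the 40 marks fall within one, two and three standard deviations of the mean. This is a sketch only. The standard deviation of 10 is stated in the question, while the mean of 67 is an assumption inferred from the one-standard-deviation range 57-77 quoted in the solution.

```python
marks = [46, 58, 65, 70, 76, 49, 59, 66, 71, 78,
         50, 59, 66, 71, 79, 53, 60, 66, 72, 80,
         54, 62, 66, 73, 82, 55, 63, 68, 73, 83,
         55, 64, 68, 73, 84, 57, 65, 69, 74, 88]

mean, sd = 67, 10   # mean inferred from the range 57-77 (assumption)

counts = []
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    counts.append(sum(low <= m <= high for m in marks))

percentages = [100 * c / len(marks) for c in counts]
print(counts, percentages)
```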

relative to the other data values in the collection.

Formula:

$Z = \dfrac{x - \text{mean}}{\text{standard deviation}}$

Where Z = standard score

x = any value in a data set

The standard score (Z) of a data value (x) is the number of stan-

dard deviations that the data value is above or below the mean:

standard deviation of 5%

standard deviation of 25%.

of 90 in the economics exam.

Solution :

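           Statistics    Economics
Mean          50%           40%
S.D.           5%           25%

$\text{Coefficient of variation} = \dfrac{\text{SD}}{\text{Mean}} \times 100$

$\text{C.V. (Stats)} = \dfrac{5}{50} \times 100 = 10$

$\text{C.V. (Eco)} = \dfrac{25}{40} \times 100 = 62.5$

C.V. (Stats) < C.V. (Eco)

The result in Statistics is better than that in Economics.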

Even give

Jill (Stats) = 70

Jack (Eco) = 90.
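The standard-score formula can also be applied to the two marks quoted here. The sketch below assumes the figures are to be read as follows: Jill scored 70 in Statistics (mean 50, S.D. 5) and Jack scored 90 in Economics (mean 40, S.D. 25). The function name `standard_score` is illustrative.

```python
def standard_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_jill = standard_score(70, 50, 5)    # Statistics
z_jack = standard_score(90, 40, 25)   # Economics
print(z_jill, z_jack)
```

Under these assumptions, Jill's mark lies further above her class mean, in standard-deviation units, than Jack's does, even though Jack's raw mark is higher.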

only takes two values into account, and is obviously affected by extreme

values (outliers). Remember the Handout on Dispersion and skewness,

page 1.

at the range in the middle part of the data, when arranged in ascending or

descending order.

- The first quartile (denoted Q1) as the value with 25 percent of the

data below it.


For example, if a distribution has 8 values, Q1 is the value with 2

numbers less than it.

- The third quartile (denoted Q3) as the value which has 75 per-

cent of the data below it.

For example, if a distribution has 8 values, Q3 is the value with 6

numbers less than it.

- The range of the middle 50 percent of the data is found by sub-

tracting Q1 from Q3 – this is called the inter-quartile range.

viation) is used instead of the interquartile range – this is just the inter-

quartile range divided by two.

$\text{Semi-interquartile range} = \text{Quartile Deviation} = \dfrac{Q_3 - Q_1}{2}$

distribution can be found using one of the two methods.

1. Graphical Approach :

cumulative frequency distribution, and dropping horizontal lines

from 25% and 75% on the cumulative percentage scale to the

ogive , and reading Q1 and Q3 vertically downwards on the hori-

zontal axis.

2. Formula approach:

respectively.

Note that these methods should both give the same answer.

company.

(Note : A grouped frequency distribution):

Weekly Wages Number of Employees

Under 200 16

200 to under 225 153

225 to under 250 101

250 to under 275 92

275 to under 300 68

300 and over 50

approach?

Solution:

cumulative frequency distribution and then get the percentage cumula-

tive frequency:

Weekly wages under (£)    Cumulative frequency
200                            16
225                           169
250                           270
275                           362
300                           430
325                           480

From this graph , it can be seen that :

Q3: the cumulative frequency is ¾ of 480, or 360, which gives Q3 = £ 274

Q1: the cumulative frequency is ¼ of 480, or 120, which gives Q1 = £ 217

=> Inter-quartile range = £ 274 - £ 217 = £ 57

2. By formula

classes, which contain Q1 and Q3 respectively. This means that we

multiply the total frequency by ¼ and ¾ respectively and find the classes

contain these values.
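The formula approach for this wage distribution can be sketched as follows, using the same interpolation formula as for the quartiles earlier in the chapter. The helper name and the limits 175 and 325 used to close the two open-ended classes are assumptions for illustration; neither open class contains Q1 or Q3, so they do not affect the result.

```python
def grouped_quantile(classes, k, parts):
    """l + ((k*N/parts - c.f.) / f) * h for (lower, upper, frequency) classes."""
    total = sum(f for _, _, f in classes)
    target = k * total / parts
    cum = 0  # cumulative frequency below the current class
    for lower, upper, f in classes:
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("k/parts must lie between 0 and 1")


# Limits 175 and 325 for the open-ended classes are assumed.
weekly_wages = [(175, 200, 16), (200, 225, 153), (225, 250, 101),
                (250, 275, 92), (275, 300, 68), (300, 325, 50)]

q1 = grouped_quantile(weekly_wages, 1, 4)   # 120th value, class 200 to under 225
q3 = grouped_quantile(weekly_wages, 3, 4)   # 360th value, class 250 to under 275
iqr = q3 - q1
quartile_deviation = iqr / 2
print(round(q1, 2), round(q3, 2), round(iqr, 2), round(quartile_deviation, 2))
```

To the nearest pound these agree with the values read from the graph (about £217 and £274).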


2.14 Extreme values

The terms outlier and extreme values are often used interchange-

ably. Both refer to a data value that is atypical of the data set i.e. values

which differ markedly from most of the numbers in the set.

played by a team in the last five years is as follows:

2 1 10 1 1

enced by the extreme value also called as outlier 10.

is perhaps more meaningful for comparisons or for setting a norm.

sure of average for data sets with outliers.

sons for such values can vary – it may be due to weather conditions or

due to a recording error.

2.15 Summary

Mean is nothing but the average of the given observations. Mean is calculated for two types of data:

- Ungrouped

- Grouped

For ungrouped data:

$\text{Mean} = \bar{x} = \dfrac{\sum x_i}{n}$

Where $\sum x_i$ = x1 + x2 + x3 + . . . . . + xn

n = No. of observations given in the data.

For grouped data, frequencies are given as fi:

$\text{Mean} = \bar{x} = \dfrac{\sum f_i x_i}{n}$, where $n = \sum f_i$

Likewise, for Median and Mode there are two different formulae for grouped and ungrouped data.

2.16 Check your Progress - Answers

1) Title

2) Plot area

3) Smoothed frequency polygon

4) More than and less than

5) 3

6) 5

7) 4

8) 10

9) Percentiles

10) Dispersion

2) Central Tendency.

3) Measures of Dispersion.

4) Types of Graphs.

5) Quartiles.


CHAPTER 3

3.0 Objectives

3.1 Introduction

3.2 Scatter diagram

3.3 Correlation & Covariance

3.4 Karl Pearson’s correlation coefficient

3.5 Spearman’s Rank Correlation

3.6 Coefficient of concurrent Deviation

3.7 Standard Error & Probable Error

3.8 Coefficient of Determination

3.9 Regression

3.9.1 Least Square Method

3.9.2 Properties of Regression coefficient

3.10 Residual Values

3.11 Standard Error Estimate

3.12 Limitations

3.13 Homoscedasticity

3.14 Summary

3.15 Check your Progress – Answers

3.16 Questions for Self - Study

3.0 Objectives

can explain the following –

Relation between two variables.

Different errors in calculations.

Formulae to calculate the correlation coefficient, covariance, etc.

Use of regression.

Limitations of regression and also can solve the examples with

given numerical data.

3.1 Introduction

ables. If we increase the value of one of the variables by some amount, what effect will it have on the value of the other variable? The answer is provided by correlation analysis. Let us take an example. We take a group of students and record their weights and heights. If an increase in weight generally corresponds to an increase in height, then we can say that the height and weight of the group of students are correlated.

changes between the values of two or more variables.

but not between individual items.

exact mathematical relationship.

determine the degree of association or relationship between two or more

variables. The amount of correlation in a data is measured as a coeffi-

cient of correlation, which is denoted by r.

X axis and the other variable on Y axis e.g. if we want to study the rela-

tionship between heights and weights in a group of student, then mea-

surement of height and weight of each student will be recorded. Say for

example the following readings are obtained.

Student 1 2 3 4 5

Height 165 175 160 180 160

Weight 52 57 54 60 50

on a graph paper with suitable scale. This graph is a scatter diagram or

scatter plot.

[Figure: scatter diagrams. I - Scale: X = weight of a student, Y = height of a student; correlation coefficient = +1. II - correlation coefficient = -1; graph is descending. Further panels: 3.2.3, 3.2.4, 3.2.5, 3.2.6]

axis represents the weight of a student, and that the variable Y graphed on the Y axis represents the height of the student. If the two variables increase together and are perfectly correlated, all points in the scatter diagram fall on a straight line, as you see in the upper left graph of the above figure. In this case the correlation coefficient is plus 1 (+1).

other decreases, the points lie on a falling straight line, as you see in the upper right graph. In this case the correlation coefficient is minus 1 (-1).

In the middle left graph you see a little bit of scatter about the line. The correlation coefficient here is between 0 and 1.

The presence of a correlation means, given X value on the hori-

zontal axis you can make a prediction of the Y value by using the straight

line predictors you see in the graphs. The better the correlation, the

more accurate the prediction. The correlation coefficient just measures

the degree to which the scatter diagrams for the variables approximate

the straight lines.

Graph   Variables                     Correlation coefficient   Remarks
I       X – Weight of a student       +1                        Two variables increase together. Perfect correlation
        Y – Height of a student
II      X – Health of a student       -1                        X increases, Y decreases. Perfect correlation
        Y – Height of a student
III     X – Health of a student       Between 0 and 1           Not perfect correlation
        Y – I.Q. of a student
IV      X – Economic status           Between 0 and -1          Not perfect correlation
        Y – I.Q. of a student
V       X – Health of a student       Almost 0                  No correlation
        Y – I.Q. of a student
VI      X – Social status             Almost 0                  No correlation
        Y – I.Q. of a student

In general, the correlation coefficient:

1) varies between +1 and -1, passing through 0

2) is 0 when there is no correlation

between two or more sets of random variates. The covariance for two random variables x and y, each with sample size n, is defined by the expectation value

$\text{Cov}(x, y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n} = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$

For uncorrelated variables the covariance is zero. However, if the variables are correlated in some way, then their covariance will be nonzero. In fact, if cov (x, y) > 0, then y tends to increase as x increases, and if cov (x, y) < 0, then y tends to decrease as x increases. Note that while statistically independent variables are always uncorrelated, the converse is not necessarily true.

var (x + y) = var (x) + var (y) + 2cov (x, y)

var (x – y) = var (x) + var (y) - 2cov (x, y)

cov (x + z, y) = cov (x, y) + cov (z, y)

cov (ax, by) = ab.cov (x, y)

tion coefficient gives the degree to which the two variables are interre-

lated. It gives the degree of correlation.

ally denoted by r and rxy or rxy and rxy = ryx

such as height of the student in a class.

(2) Bivariate distribution ; In this case there are two variables. Such

as height + weight of the students in a class.

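(3) Covariance: Let the corresponding values of the two variables x and y in the given set of n pairs of observations be given by the pairs (x1, y1), (x2, y2), … (xn, yn). Then

$\text{Cov}(x, y) = \dfrac{1}{n} \sum (x_i - \bar{X})(y_i - \bar{Y})$

where $\bar{X}$ and $\bar{Y}$ are the means of x and y respectively. The above formula for calculation of covariance is difficult and complicated. An easier method of calculation is:

$\text{Cov}(x, y) = \dfrac{1}{n}\left[\sum x_i y_i - \dfrac{1}{n} \sum x_i \sum y_i\right]$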

tion of two variable x and y (1,6) (2, 9) (3, 6) (4,7) (5,8)

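$\sum x_i$ = 1 + 2 + 3 + 4 + 5 = 15

$\sum y_i$ = 6 + 9 + 6 + 7 + 8 = 36

$\sum x_i y_i$ = 6 + 18 + 18 + 28 + 40 = 110

$\text{Cov}(x, y) = \dfrac{1}{n}\left[\sum x_i y_i - \dfrac{1}{n} \sum x_i \sum y_i\right]$

$= \dfrac{1}{5}\left[110 - \dfrac{1}{5}(15)(36)\right]$

$= \dfrac{1}{5}\left[110 - 108\right]$

$= \dfrac{2}{5}$

= 0.4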
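Both covariance formulas give the same value for these five pairs, which can be confirmed with a short sketch (variable names are illustrative):

```python
pairs = [(1, 6), (2, 9), (3, 6), (4, 7), (5, 8)]
n = len(pairs)

sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Definition: average product of deviations from the two means.
mean_x, mean_y = sum_x / n, sum_y / n
cov_definition = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

# Shortcut: (1/n) * [sum(xy) - (1/n) * sum(x) * sum(y)]
cov_shortcut = (sum_xy - sum_x * sum_y / n) / n

print(sum_x, sum_y, sum_xy, cov_definition, cov_shortcut)
```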

Degree of relationship between variables that are linearly related the points

to fall along and about an imaginary straight line that passes through the

clusters.

The two variables have bivariate normal distribution for any given

value.

The method is used for measuring the linear relationship between two variables (series). Pearson’s coefficient between two variables (x, y) is denoted by r (x, y), rxy, ryx or simply r. This is also known as the product moment correlation coefficient. It is the ratio of the covariance cov (x, y) to the product of the standard deviations of x and y.

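$r = \dfrac{\text{cov}(x, y)}{\sigma_x \sigma_y}$

$\sigma$ = standard deviation

Now for n pairs of observations (x1, y1), (x2, y2), ……. (xn, yn)

$\text{cov}(x, y) = \dfrac{1}{n} \sum (x - \bar{X})(y - \bar{Y})$

$\sigma_X = \sqrt{\dfrac{1}{n} \sum (x - \bar{X})^2}$

$\sigma_Y = \sqrt{\dfrac{1}{n} \sum (y - \bar{Y})^2}$

$r = \dfrac{\sum (x - \bar{X})(y - \bar{Y})}{\sqrt{\sum (x - \bar{X})^2 \sum (y - \bar{Y})^2}}$

$r = \dfrac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}}$

$d_x = (x - \bar{X})$ and $d_y = (y - \bar{Y})$

Alternative formula :

$r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$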

data (1, 2) (2, 4) (3 ,8) (4,7) (5, 10) (6,5) (7,14) (8,16) (9,2)

(10,20)

Ans

X       Y       X²      Y²      XY
1       2        1       4       2
2       4        4      16       8
3       8        9      64      24
4       7       16      49      28
5      10       25     100      50
6       5       36      25      30
7      14       49     196      98
8      16       64     256     128
9       2       81       4      18
10     20      100     400     200
Total  55  88  385    1114     586

e.g. XY: 2 = 1 × 2, 8 = 2 × 4

$\bar{X} = \dfrac{\sum x_i}{n} = \dfrac{55}{10} = 5.5$

$\bar{Y} = \dfrac{\sum y_i}{n} = \dfrac{88}{10} = 8.8$

$r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$

$= \dfrac{(10)(586) - (55)(88)}{\sqrt{(10)(385) - (55)^2}\,\sqrt{(10)(1114) - (88)^2}}$

$= \dfrac{1020}{\sqrt{(825)(3396)}} = \dfrac{1020}{\sqrt{2801700}}$

$= \dfrac{1020}{1673.8}$

= 0.61 (approx)
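The alternative formula lends itself to direct computation from the column totals. The following sketch recomputes the totals and the coefficient for the ten pairs of this example; the variable names are illustrative.

```python
import math

pairs = [(1, 2), (2, 4), (3, 8), (4, 7), (5, 10),
         (6, 5), (7, 14), (8, 16), (9, 2), (10, 20)]
n = len(pairs)

sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy, round(r, 2))
```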

Ex 3 The following table gives the monthly income and savings of 10 persons. Calculate the correlation between monthly income and savings.

Employee          1     2     3     4     5     6     7     8     9    10
Monthly income   780   360   980   250   750   820   900   620   650   390
Net saving        84    51    91    60    68    62    86    58    53    47

Solution:

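$\bar{X} = \dfrac{6500}{10} = 650$

$\bar{Y} = \dfrac{660}{10} = 66$

Let $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

No      X      Y      x       y       x²       y²       xy
1      780     84     130      18     16900     324     2340
2      360     51    -290     -15     84100     225     4350
3      980     91     330      25    108900     625     8250
4      250     60    -400      -6    160000      36     2400
5      750     68     100       2     10000       4      200
6      820     62     170      -4     28900      16     -680
7      900     86     250      20     62500     400     5000
8      620     58     -30      -8       900      64      240
9      650     53       0     -13         0     169        0
10     390     47    -260     -19     67600     361     4940
Total 6500    660                    539800    2224    27040

$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$

$r = \dfrac{27040}{\sqrt{539800 \times 2224}}$

= 0.78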

variables X and Y.

This is applied to a problem where there is no quantitative data.

But qualitative data is available.

$R = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$

n = number of paired observations,

are follows,

Statistics     3   5   8   4   7   10   2   1   6   9
Accountancy    6   4   9   8   1    2   3  10   5   7

Rank X    Rank Y     D     D²
3          6        -3      9
5          4         1      1
8          9        -1      1
4          8        -4     16
7          1         6     36
10         2         8     64
2          3        -1      1
1         10        -9     81
6          5         1      1
9          7         2      4
Total                     214

$R = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$

$= 1 - \dfrac{6(214)}{10(10^2 - 1)}$

= -0.297

When the Ranks are not given

1. Assign the rank highest first and the lowest last on both x and y

2. Find the Rank difference (D), then D2

3. Apply formula as done earlier.

data.

x: 75 88 95 70 60 80 81 50

y: 120 134 150 115 110 140 142 100

Assign first rank to the highest

x – 95, 88, 81, 80, 75, 70, 60, 50

y – 150, 142, 140, 134, 120, 115, 110, 100

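x     Rank of x     y     Rank of y     D     D²
75       5         120       5          0     0
88       2         134       4         -2     4
95       1         150       1          0     0
70       6         115       6          0     0
60       7         110       7          0     0
80       4         140       3          1     1
81       3         142       2          1     1
50       8         100       8          0     0

$\sum D^2 = 6$, n = 8

$R = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$

$= 1 - \dfrac{6 \times 6}{8(64 - 1)}$

$= 1 - \dfrac{1}{14}$

$= \dfrac{13}{14}$

= 0.93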
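The three steps (assign ranks, take rank differences, apply the formula) can be sketched in code. This is a minimal illustration with hypothetical helper names. It ranks the highest value first, as in the text, and assumes there are no tied values. It is applied both to the raw x, y data above and to the Statistics and Accountancy ranks of the previous example.

```python
def ranks_highest_first(values):
    """Rank 1 for the largest value, 2 for the next, and so on (no ties assumed)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(rank_x, rank_y):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), with D the difference in ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]
r_raw = spearman_from_ranks(ranks_highest_first(x), ranks_highest_first(y))

statistics_ranks = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
accountancy_ranks = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
r_given = spearman_from_ranks(statistics_ranks, accountancy_ranks)

print(round(r_raw, 2), round(r_given, 3))
```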

It is useful for ordinal data.

distribution but it is not as accurate as Karl Pearson’s coefficient of cor-

relation. It is cumbersome to use for large data and it cannot be used for

continuous variable.


3.6 Coefficient of Concurrent Deviation

is that the variables will fluctuate the way in which short term fluctuations

take place. If the majority of short term fluctuations are in the same direction then the variables will have positive correlation, and if the majority of short term fluctuations are in the opposite direction then the variables will have negative correlation.

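$r_c = \pm \sqrt{\pm \dfrac{2c - m}{m}}$

The sign taken both inside and outside the square root is the sign of (2c - m).

c = number of concurrent deviations, i.e. pairs of deviations having the same sign

m = number of pairs of deviations, which is one less than the actual number of observations N

m = N - 1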

demand by concurrent deviation method.

Price : 1 4 3 5 5 8 10 10 11 15

Demand : 100 80 80 60 58 50 40 40 35 30

Solution :

Price (X)    CX    Demand (Y)    CY    CXCY
1                    100
4            +        80         -      -
3            -        80         0      0
5            +        60         -      -
5            0        58         -      0
8            +        50         -      -
10           +        40         -      -
10           0        40         0      +
11           +        35         -      -
15           +        30         -      -

c = 1

Here

m = N - 1 = 10 - 1 = 9

$r_c = \pm \sqrt{\pm \dfrac{2c - m}{m}}$

$r_c = -\sqrt{-\dfrac{(2 \times 1) - 9}{9}} = -\sqrt{\dfrac{7}{9}}$

= -0.88
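The concurrent deviation method can be sketched as follows for the price and demand data. The helper names are illustrative. The sketch counts a pair of deviations as concurrent when both have the same sign, including the case where both are zero, which is how the table above treats the pair 0, 0. With c = 1 and m = 9 the quantity under the root is 7/9, whose square root is about 0.88.

```python
import math


def sign(value):
    """+1, 0 or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def concurrent_deviation(x, y):
    """r_c = +/- sqrt(+/- (2c - m) / m), taking the sign of (2c - m)."""
    dev_x = [sign(b - a) for a, b in zip(x, x[1:])]   # direction of change in x
    dev_y = [sign(b - a) for a, b in zip(y, y[1:])]   # direction of change in y
    m = len(dev_x)                                    # m = N - 1
    c = sum(dx == dy for dx, dy in zip(dev_x, dev_y)) # concurrent deviations
    inner = (2 * c - m) / m
    return c, m, math.copysign(math.sqrt(abs(inner)), inner)


price = [1, 4, 3, 5, 5, 8, 10, 10, 11, 15]
demand = [100, 80, 80, 60, 58, 50, 40, 40, 35, 30]
c, m, r_c = concurrent_deviation(price, demand)
print(c, m, round(r_c, 2))
```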

to understand and compute and it is extremely useful for short term fluc-

tuation analysis.

The disadvantages are it is not useful for long term range. It does

not differentiate between small and big variations. The results are rough

indicator and not as accurate as other methods.

2) Two variables increase together mean there is _______________

correlation.

3) Univariate distribution there is only _______________ variable

in the data to study.

4) _______________ there are two variables under study.

5) SE means _______________.

6) The square of the correlation coefficient is called _______________.

7) SLR Analysis indicates that there is only one _______________.

8) If two or more independent Variables then _______________

Analysis.

9) byx is the regression coefficient of _______________.

10) _______________ means variance around the regression line

is the same for all values of variable x.


3.7 Standard Error and Probable Error

$SE = \dfrac{1 - r^2}{\sqrt{n}}$

r = Coefficient of Correlation

n = number of observations in pairs.

PE = 0.6745 (SE) ≈ (2/3) (SE)

Properties of P. E.

1) if r < 6 (PE) then it is not significant

2) if r > 6 (PE) then it is significant & correlation exists.

Thus PE is used for testing the reliability of the value of r.

cient of determination.

the total variation. It tells us what proportion of variation in dependent

variable can be attributed to the variation in the independent variable.

r = 0.8, r2 = 0.64.

variation is unexplained)

error of r

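$\text{P.E.} = 0.6745\,\dfrac{(1 - r^2)}{\sqrt{n}}$

P.E. = 0.072

$0.072 = \dfrac{0.6745\,(1 - r^2)}{\sqrt{25}}$

$0.072 = \dfrac{0.6745\,(1 - r^2)}{5}$

$(1 - r^2) = \dfrac{0.072 \times 5}{0.6745} = \dfrac{0.360}{0.6745} = \dfrac{360}{674.5}$

= 0.5337

r² = 1 - 0.5337

= 0.4663

$r = \sqrt{0.4663}$

= 0.683

$\text{Standard Error SE} = \dfrac{(1 - r^2)}{\sqrt{n}} = \dfrac{0.5337}{5}$

= 0.1067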
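The steps of this example (recover 1 - r² from the probable error, then r, then the standard error) can be sketched as below. The variable names are illustrative, and the sketch carries full precision throughout, so the later decimals can differ slightly from a hand calculation that rounds intermediate values.

```python
import math

probable_error = 0.072
n = 25

# P.E. = 0.6745 * (1 - r^2) / sqrt(n)  =>  (1 - r^2) = P.E. * sqrt(n) / 0.6745
one_minus_r2 = probable_error * math.sqrt(n) / 0.6745
r = math.sqrt(1 - one_minus_r2)
standard_error = one_minus_r2 / math.sqrt(n)

print(round(one_minus_r2, 4), round(r, 3), round(standard_error, 4))
```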

3.9 Regression

tionship between the dependent or response variable (Y) and the inde-

pendent or explanatory variable (X), and it is used for estimation of values

of one variable given the value of the other variable.

Simple Linear Regression Analysis usually begins by plotting

the set of (X,Y) values on a scatter diagram and determining by inspec-

tion if there exists an approximate linear relationship.


Y = a + byxX

Since the points are unlikely to fall precisely on the line, the

exact linear relationship must be modified to include an error (Stochastic

or random disturbance) term

Y = a + byxX + e

relationship between variables, rather than estimating the value of a de-

pendent variable.

defines a straight line.

is only one independent variable, while

dent / explanatory variables in estimating the value of the dependent vari-

able.

Some assumptions associated with simple linear regression

analysis

1. Linearity

ship. represented by equation :

Y = a + byxX + e

Ŷ = a + byxX

is the estimated value of the dependent variable from the linear relation-

ship. a is the first parameter of the regression equation, indicating the

value of Y when X = 0 and byx is the second parameter of the regression

equation, indicating the slope of the regression line, e is the random error

in the same observation, associated with sampling process.

Scatter Diagram is a graph that shows the relationship between the two variables and can be used to observe whether there is a general agreement with the assumptions underlying regression analysis.

An alternative graph to determine such agreement is Residual

Graph which is a graph of the residuals e = Y – Ŷ with respect to the

values of Ŷ .

Direct (Positive) Relationship indicates that the values of the dependent variable Y generally increase as the values of the independent variable X increase.

Inverse (Negative) Relationship indicates that the values of Y gen-

erally decrease as the values of X increase.

The general degree of relationship between the variables is indi-

cated by the extent of scatter with respect to the best fitting line.

The mathematical criterion generally used to determine the lin-

ear regression equation is the Least Squares Criterion by which the sum

of the squared deviations between the actual and estimated values of the

dependent variable is minimized.

The parameters A and Byx (Population) in the linear regression

model are estimated by the values of a and byx based on the sample.

Thus, the linear regression equation to determine the estimated (com-

puted) value of the dependent variable, given a value of X for the indepen-

dent variable is :

Ŷ = a + byxX

the value of Y. It takes the form

X = a + bxyY


3.9.1 Least Squares method

line to the sample of (X,Y) observations. When we want to estimate Y

from X the sum of the square (vertical) of Y deviations of points from the

line is to be minimized.

Minimize $\sum (Y - \hat{Y})^2$

tion of the line which is derived for the data so that the sum of squares of

the errors are minimum.

zontal) of X deviations of points from the line is to be minimized.

Minimize $\sum (X - \hat{X})^2$

tion of the line which is derived for the data so that the sum or squares of

the errors are minimum. Thus it would be clear that when we want to

estimate value of y from the given value of x we will need the equation

which is different from the equation which is needed to estimate the value

of x from given value of y. However, if the correlation coefficient is ± 1 then

only one and the same equation is used to estimate the values of both

the variables.

from given value of x.

It is expressed as y = a + byxx

slope of the line respectively. Values of a and b are obtained by solving

the following equations which are called as normal equations for y on x.

These equations are obtained from n given pairs of observations for y and x.

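$\sum y = na + b \sum x$

$\sum xy = a \sum x + b \sum x^2$

$y - \bar{y} = b_{yx} (x - \bar{x})$

$b_{yx} = \dfrac{\text{Cov}(x, y)}{\sigma_x^2} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum d_x d_y}{\sum d_x^2}$

where,

$d_x = X - \bar{X}$ and $d_y = Y - \bar{Y}$

or alternatively,

$b_{yx} = \dfrac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$

or

$b_{yx} = \dfrac{\sum xy - n \bar{X} \bar{Y}}{\sum x^2 - n \bar{X}^2}$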

3.9.2 Properties of regression Co-efficient

2) Both regression coefficient bxy and byx have the same sign.

where r has the same sign (+ or -) as that bxy and byx.

Regression coefficients are independent of change of origin but

are dependent on change of scale.

Basis of Selection Test Scores :

Sampled individual       1    2    3    4    5    6    7    8    9   10
Selection Test Score    88   85   72   93   70   74   78   93   82   92
Perform Rating          17   16   13   18   11   14   15   19   16   20

Sampled individual      11   12   13   14   15
Selection Test Score    79   84   71   77   87
Perform Rating          14   15   12   13   19

Solution :

We can use the above table for determining the linear regression

Sampled       Selection        Perform       XY        X²        Y²
individual    Test Score X     Rating Y
1                 88              17        1496      7744      289
2                 85              16        1360      7225      256
3                 72              13         936      5184      169
4                 93              18        1674      8649      324
5                 70              11         770      4900      121
6                 74              14        1036      5476      196
7                 78              15        1170      6084      225
8                 93              19        1767      8649      361
9                 82              16        1312      6724      256
10                92              20        1840      8464      400
11                79              14        1106      6241      196
12                84              15        1260      7056      225
13                71              12         852      5041      144
14                77              13        1001      5929      169
15                87              19        1653      7569      361
16                87              17        1479      7569      289
17                72              10         720      5184      100
18                77              12         924      5929      144
19                82              14        1148      6724      196
20                76              13         988      5776      169
Total          1,619             298       24492    132117     4590

$\bar{X} = \sum x / n = 1619 / 20 = 80.95$

$\bar{Y} = \sum y / n = 298 / 20 = 14.90$

$b_{yx} = \dfrac{\sum xy - n \bar{X} \bar{Y}}{\sum x^2 - n \bar{X}^2}$

$= \dfrac{24{,}492 - 20(80.95)(14.90)}{132{,}117 - 20(80.95)^2}$

$= \dfrac{24{,}492 - 24{,}123.1}{132{,}117 - 131{,}058.05}$

$= \dfrac{368.9}{1058.95}$

= 0.3484

≈ 0.35

$a = \bar{Y} - b_{yx} \bar{X}$

= 14.90 – 0.35 ( 80.95 )

= 14.90 – 28.3325

= -13.43

Therefore the regression equation for estimating the performance

rating on the basis of selection test score is :

Ŷ = a + bX

Ŷ = - 13.43 + 0.35X

The value of byx = 0.35 indicates that the slope of the regression line is 0.35, i.e. for each increase of one point in the selection test score there is, on average, an increase of 0.35 in the performance rating. Therefore, a direct (positive) relationship exists between these two variables.

The value of a = -13.43 may look a bit puzzling. Graphically, this

is the point of intersection of the regression line with the Y axis; hence

this is the value of Y when X = 0, but how can there be a ‘negative’

performance rating when the data indicate that only positive ratings are

assessed? The answer is that any regression equation is only meaning-

ful for the range of the values of the independent variable included in the

sample.

Now, if a trainee applicant has a selection test score of 90, the

estimated performance rating on the job is :

Ŷ = a + byx X

= -13.43 + 0.35 ( 90 ) = -13.43 + 31.50 = 18.07 ≈ 18
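The least squares fit for the 20 sampled individuals can be reproduced as a check on the hand calculation. The sketch below uses illustrative variable names and keeps the slope unrounded (about 0.3484). This gives an intercept of about -13.30, rather than the -13.43 obtained after rounding the slope to 0.35, while the estimate for a test score of 90 still rounds to 18.

```python
scores = [88, 85, 72, 93, 70, 74, 78, 93, 82, 92,
          79, 84, 71, 77, 87, 87, 72, 77, 82, 76]   # selection test score X
ratings = [17, 16, 13, 18, 11, 14, 15, 19, 16, 20,
           14, 15, 12, 13, 19, 17, 10, 12, 14, 13]  # performance rating Y
n = len(scores)

sum_x, sum_y = sum(scores), sum(ratings)
sum_xy = sum(x * y for x, y in zip(scores, ratings))
sum_x2 = sum(x * x for x in scores)

# Slope and intercept of the regression line of Y on X.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b_yx * sum_x / n

estimate_at_90 = a + b_yx * 90
print(round(b_yx, 4), round(a, 2), round(estimate_at_90, 2))
```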

3.10 Residual Values

and the estimated value ( Ŷ ), it is also called as error and is given by:

e = Y – Ŷ

fitted regression line values Ŷ .

smaller the residuals, error terms), the greater is the variation in Y (de-

pendent variable) ‘explained’ by the estimated regression equation. Total

variation in Y is equal to the explained plus the residual variation.

Individual   Score X   Rating Y   Estimated value Ŷ   e = Y – Ŷ

1 88 17 17.37 -0.37

2 85 16 16.32 -0.32

3 72 13 11.77 1.23

4 93 18 19.12 -1.12

5 70 11 11.07 -0.07

6 74 14 12.47 1.53

7 78 15 13.87 1.13

8 93 19 19.12 -0.12

9 82 16 15.27 0.73

10 92 20 18.77 1.23

11 79 14 14.22 -0.22

12 84 15 15.97 -0.97

13 71 12 11.42 0.58

14 77 13 13.52 -0.52

15 87 19 17.02 1.98

16 87 17 17.02 -0.02

17 72 10 11.77 -1.77

18 77 12 13.52 -1.52

19 82 14 15.27 -1.27

20 76 13 13.17 -0.17

Total Sum of Squares (TSS) = Total variation in Y = $\sum (Y - \bar{Y})^2$

Regression Sum of Squares (RSS) = Explained variation = $\sum (\hat{Y} - \bar{Y})^2$

Error Sum of Squares (ESS) = Residual variation in Y = $\sum (Y - \hat{Y})^2$

TSS = RSS + ESS

Dividing both sides by TSS gives us

1 = RSS/TSS + ESS/TSS

The coefficient of determination (r2) is the proportion of the total variation in Y (dependent variable) explained by the regression of Y on X (independent variable):

r2 = RSS/TSS = 1 – (ESS/TSS)

i.e. (ESS/TSS) = 1 – r2

The quantity (1 – r2) is called the coefficient of non-determination, as it tells the proportion of the variation in one variable which cannot be attributed to the variation in the other variable.

Coefficient of Correlation (r) on the other hand, is the square root

of the coefficient of determination, with the arithmetic sign being desig-

nated as positive if the relationship is direct and negative if the relation-

ship is inverse.

Regression analysis is used, when a relationship exists between two variables X and Y and the value of one variable is given, to estimate the value of the other.

It estimates the unknown values of one variable (called the dependent variable) from the known values of the other (called the independent variable). The regression line describes the average relationship between the variables x and y.

3.11 Standard Error Estimate

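The standard error of estimate, which measures the scatter of the observed values about the regression line, is given by

$$S_{yx} = \sqrt{\frac{\text{unexplained variation}}{n}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n}} = \sigma_y \sqrt{1 - r^2}$$

$$S_{xy} = \sqrt{\frac{\sum (X - \hat{X})^2}{n}}$$

where X and Y are the observed values and X̂ and Ŷ are the estimated values.

The standard error of estimate measures the accuracy of the estimated figures. The smaller its value, the better are the estimates and hence the more representative is the regression line.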

tors and their performance ratings by the number of good

turned out per 100 pieces.

Operator 1 2 3 4 5 6 7 8

Experience (x) 16 12 18 4 3 10 5 12

Ratings (y) 87 88 89 68 78 80 75 83

ence and estimate the probable performance if an operator has 7 years' experience.

Solution :

n = 8

X̄ = Σx / n = 80 / 8 = 10

Ȳ = Σy / n = 648 / 8 = 81

Taking dx = x – X̄ and dy = y – Ȳ, we have

x      y      dx    dy     dx²    dy²    dxdy   x²      y²       xy
16     87      6     6     36     36     36     256     7569     1392
12     88      2     7      4     49     14     144     7744     1056
18     89      8     8     64     64     64     324     7921     1602
4      68     -6   -13     36    169     78      16     4624      272
3      78     -7    -3     49      9     21       9     6084      234
10     80      0    -1      0      1      0     100     6400      800
5      75     -5    -6     25     36     30      25     5625      375
12     83      2     2      4      4      4     144     6889      996
Total  80 (x)  648 (y)  Σdx = 0  Σdy = 0  Σdx² = 218  Σdy² = 368  Σdxdy = 247  Σx² = 1018  Σy² = 52856  Σxy = 6727

$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{247}{218} = 1.133$$

By direct method

$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

$$= \frac{8 \times 6727 - 80 \times 648}{8 \times 1018 - (80)^2}$$

$$= \frac{1976}{1744}$$

= 1.133

Equation of the regression line of y on x is

y – Ȳ = byx (x – X̄)

y – 81 = 1.133 (x – 10)

y – 81 = 1.133x – 11.33

y = 1.133x + 81 – 11.33

y = 1.133x + 69.67 Ans.

If experience is 7 years, the probable performance will be:

x = 7

y = 1.133x + 69.67

= (1.133) 7 + 69.67

= 7.931 + 69.67

= 77.60
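The same regression line can be obtained directly from the eight pairs of observations. The sketch below uses the deviation method described in the solution; the variable names are illustrative assumptions.

```python
# Experience (x, in years) and performance rating (y) of the 8 operators.
x = [16, 12, 18, 4, 3, 10, 5, 12]
y = [87, 88, 89, 68, 78, 80, 75, 83]
n = len(x)

mean_x = sum(x) / n   # 10
mean_y = sum(y) / n   # 81

# Sums of products and squares of deviations from the means.
sum_dxdy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)

b_yx = sum_dxdy / sum_dx2          # slope of y on x
intercept = mean_y - b_yx * mean_x

estimate_for_7_years = intercept + b_yx * 7
print(round(b_yx, 3), round(intercept, 2), round(estimate_for_7_years, 2))
```

With unrounded intermediate values the estimate for 7 years of experience comes out at about 77.6.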

                     X     Y

Mean                 36    85

Standard deviation   11    8

The correlation coefficient between x and y is r = 0.66. Find the regression equation of x on y and hence estimate the value of x when y = 75.

Solution :

Given X̄ = 36, Ȳ = 85

σx = 11, σy = 8

r = 0.66

Now, bxy = r (σx / σy) = 0.66 × (11 / 8) = 0.908

The regression equation of x on y is

x – X̄ = bxy (y – Ȳ)

x – 36 = 0.908 (y – 85)

x – 36 = 0.908 y – 77.180

x = 0.908 y – 77.180 + 36

= 0.908 y – 41.180

When y = 75, x will be:

x = 0.908 × 75 – 41.180

= 68.10 – 41.180

= 26.92 Ans.

3.12 Limitations of correlation and regression analysis

variables dependent or independent then this relation gives false results.

two variables to be correlated, correlation will not consider this kind of

relationship.

3.13 Homoscedasticity

This means that variance around the regression line is the same

for all values of predictor variable x. The plot shows a violation of this

assumption. For the lower values, the points are all very near the regres-

sion line. For higher values on the x-axis, there is much more variability

around the regression line.

3.14 Summary

line of regression, Rank Correlation. Types of Errors etc., These all are

very important to calculate dependent variable with independent variable.

1) Co-Variance

2) Perfect

3) One

4) Bivariate distribution

5) Standard error

6) Coefficient of determination

7) independent Variable

8) Multiple Regression

9) y on x

10) Homoscedasticity

3.16 Questions for Self - Study

Short Notes:

1) Coefficient of concurrent

2) Types of Errors

3) Line of Regression

4) Homoscedasticity

5) Standard Error Estimate

CHAPTER 4

4.0 Objectives

4.1 Introduction

4.2 Important Definitions

4.3 Basic Calculations in Probability

4.4 Basics of Permutations and Combinations

4.5 Set Theory & Probability Theorems

4.6 Bayes' Theorem

4.7 Mathematical Expectations or Expected Values

4.8 Binomial Distribution

4.9 Poisson Distribution

4.9.1 Properties of Poisson Distribution

4.9.2 Examples of Events

4.10 Normal Distribution

4.10.1 Properties of Normal Distribution

4.11 Standardizing Normal Random Variable

4.12 Summary

4.13 Check your Progress - Answers

4.14 Questions for Self - Study

4.0 Objectives

calculations and all the probable formula students can explain the

following –

Concept of probability

Events

Expectations

Expected values

Combinations

Permutations

Students can understand the problems and solve with given nu-

merical data, can give the proper decisions in business administrations.

4.1 Introduction

out of all probable situations for making decision. After reading this chap-

ter, students can understand the mathematical theory of probability for

making business decision. Also we consider some situations where di-

rect theoretical results can be applied.

istic models and probabilistic models. In deterministic model, we do not

consider any uncertainty, while the real life situations are full of uncer-

tainties. And if these uncertainties are not included in decision making

process, one may end up in making incorrect decision incurring losses.

ity is a measure of uncertainty or certainty whichever way we define. For

example, if we say that probability that product will be sold is 0.6 then we

are 60% sure (certainty) that product will be sold and we are 40% unsure

(uncertainty) about the sale of the product.

Probability

It is theory of chance when taken as science.

It is the chance of an event happening when considered in connection with the event. The probability of any event is between 0 and 1, both included. Probability is also defined as the percentage of times a specific outcome would happen if the same experiment were repeated a large number of times.

Probability theory, which is based on statistical data and prob-

ability axioms, is called as mathematical probability.

Axioms of probability

There are three axioms of probability : (1) Chances are always

at least zero (2) The maximum chance that something happens is

100% (3) If two events cannot both occur at the same time, the chance

that either one occurs is the sum of the chances that each occurs.

Subjective Probability

Probability theory, which is based on feeling or thinking of a per-

son, is called as subjective probability.

Conditional Probability

It is the probability of an event calculated on the assumption that some related event has happened.

Experiment

Action whose outcomes are of interest to us is called as an

experiment. e.g. toss of a coin. Chance of getting head or tail at a time is

exactly one half.

Event is a set of one or more outcomes of an experiment. An

event is said to have happened if the outcome is the result of the experi-

ment. e.g. In the experiment of tossing of a coin there are two outcomes

head and tail. Two events A and B can be defined as follows:

Event A: Head shows up in the experiment of tossing of a coin.

Event B: Tail shows up in the experiment of tossing of a coin.

Now if head shows up, then we can say that event A has happened.

Probability of an event A is denoted by P(A).

Sample Space

Set of all possible out comes of an experiment is called as sample

space.

Dependent Events.

If happening of one event changes the probability of another event

then those events are said to be dependent events.

Independent Events

If happening of one event does not change the probability of an-

other event then those events are said to be independent events.

Two or more events which cannot happen at the same point of

time are called as mutually exclusive events.

Exhaustive Events

If two or more events cover the entire sample space i.e if two or

more events cover all possible outcomes of an experiment, then such

events are called as exhaustive events.

Impossible Event

If probability of a happening of an event is 0, the event is an

impossible event.

Certain Event

If probability of a happening of an event is 1, the event is a certain

event.

Complement of an event

Complement of an event means that the event does not happen, i.e. if event A is getting 1 in a throw of dice, then A complement is not getting 1 in a throw of dice. The complement of event A is denoted by Aᶜ, A′ or Ā, and P(A′) = 1 – P(A).

Two or more events are called as equally likely if they have the

same probability of occurrence.

For discrete distribution expected value is weighted mean of all

outcomes where weights are probability of outcome. If X and Y are two

random variables , the expected value of their sum is the sum of their

expected values (E(X+Y) = E(X) + E(Y)), and the expected value of a

constant a times random variable X is the constant times the expected

value of X ( E ( a X) = a E ( X )).

2) Action whose outcomes are of interest to us is called as

_______________.

_______________.

4) Set of all possible outcomes of an experiment is called as

_______________.

5) If happening of one event changes the probability of another

event then those events are said to _______________.

4.3 Basic Calculations in probability

tion or decimal or in percentage. e.g. probability of getting head in a toss

of coin is ½ or 0.5 or 50%.

P(A) = Number of favourable outcomes / Total outcomes

probability that it is a queen.

Number of favourable outcomes = 4 (4 queens)

Thus P(A) = 4/52 = 1/13

getting a queen:-

P(A′) = 1 – P(A) = 1 – 1/13 = 12/13

Thus,

P(A) + P(A′) = 1

1/13 + 12/13 = 1

Now, if event A is certain to occur, then P(A) = 1 and P(A′) = 0

Alternatively probability can be defined as

P(A) = n(A) / n(S)

Let us consider tossing of a coin. The outcomes are head or tail.

S denotes a complete set of outcomes for a given situation and it is

called as sample space or universe. Thus in above experiment, Sample

space = S = {H,T}

Let us define event A : Getting a head on the top surface.

Therefore A = {H}

Now n(A) denotes number of elements in the set A. Since set A

has only one element n(A) = 1. Set S, sample space has 2 elements in

it. Therefore, n(S) = 2. Thus probability that head is obtained in the

tossing of a coin is;

P(A) = n(A) / n(S) = 1/2

If we apply the first definition, then the number of favourable out-

comes are the ones in which we are interested. In this case we are

interested only in head i.e. number of favorable outcomes is only 1. Total

number of outcomes is 2. Again the probability getting head is ½.

of one coin one after another. The possibilities are

Head Head

Head Tail

Tail Head

Tail Tail

Therefore A = {(H,H)}

i.e. n(A) = 1, n(S) = 4

Therefore P(A) = n(A) / n(S) = ¼

Therefore B = {(H,T), (T,H), (H,H)} (At least one, therefore one head or two heads are what we are looking for)

i.e. n(B) = 3, n(S) = 4

Therefore P(B) = n(B) / n(S) = ¾

dice are multiplied together. Find probability that the prod-

uct is 4.

Solution:

n(S) = (Number of outcomes in one trial)^(Number of trials)

n(S) = 6² = 36

Favourable outcomes are

A = {(1, 4), (2, 2), (4, 1)}

i.e. n(A) = 3

P(A) = 3/36 = 1/12
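The counting in this example can be verified by listing the whole sample space. The following is a small illustrative sketch (the variable names are assumptions) that enumerates all 36 outcomes of two dice and keeps those whose product is 4.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces from two dice.
sample_space = list(product(range(1, 7), repeat=2))

# Favourable outcomes: pairs whose product equals 4.
favourable = [(a, b) for a, b in sample_space if a * b == 4]

probability = Fraction(len(favourable), len(sample_space))
print(favourable, probability)
```

Using `Fraction` keeps the answer as an exact ratio, which matches the way the text reports probabilities.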

of 3 balls from a box containing 5 white and 4 black balls?

pose there are three balls: Red, Blue and Yellow. If 2 balls are selected out of these 3 balls then the possible selections are Red, Blue or Blue, Yellow or Yellow, Red, i.e. there are three ways in which the selection can be made.

This is written as 3C2 which means select 2 objects (here balls)

at a time out of 3 objects, i.e. combinations of 2 objects taken at a time

out of 3 objects. And it is calculated as-

3C2 = (3 × 2) / (1 × 2) = 3 (in the denominator go on multiplying up to the number after C and in the numerator go on multiplying in the reverse direction in the decreasing order starting from the number before C for the same number of digits as that of in the denominator.)

10C3 = (10 × 9 × 8) / (1 × 2 × 3) = 120

$$^{10}C_3 = \frac{10!}{(10-3)!\,3!} = \frac{10!}{7!\,3!} = \frac{7! \times 8 \times 9 \times 10}{7! \times 1 \times 2 \times 3} = 120$$

Technical definition for combination is

$$^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

n! = 1 × 2 × 3 × ……… × n

= n(n–1)(n–2)…….1

e.g. 4! = 1 × 2 × 3 × 4

= 4 × 3 × 2 × 1 = 24

10C3 = 10! / (3! (10 – 3)!) = 10! / (3! 7!)

= (10 × 9 × 8) / (1 × 2 × 3) = 120

0! = 1

1! = 1

n! = n(n–1)!

= n(n–1)(n–2)! and so on

i.e. 10! = 10 × 9 × 8! = 10 × 9 × 8 × 7! and so on

nC0 = 1

nCn = 1

nC1 = n

nCr = nC(n–r), i.e. 10C7 = 10C(10–7) = 10C3

tant? Arrangement means the order in which objects are presented is

also important.

Red, Blue or Blue, Red

Blue, Yellow or Yellow, Blue

Yellow, Red or Red, Yellow

Selections and their arrangements is called as permutations.

This is written as 3P2 which means select and arrange 2 objects

(here balls) at a time out of 3 objects. i.e. permutations of 2 objects taken

at a time out of 3 objects. And it is calculated as -

3P2 = 3 × 2 = 6

10P3 = 10 × 9 × 8 = 720

Technical definition for permutation is:

$$^{n}P_{r} = \frac{n!}{(n-r)!}$$

It is read as permutations of r objects taken at a time out of n objects.

10P3 = 10! / (10 – 3)! = 10! / 7!

= 10 × 9 × 8 = 720

nP0 = 1

nPn = n!

nP1 = n

Getting back to our problem

Balls drawn = 3 from a box containing 5W and 4B balls.

White balls = 5

Black balls = 4

Total Balls = 9

Let A be the event where 3 white balls are drawn. Now, the 3 white balls must come from the 5 available white balls.

n(A) = Number of ways in which 3 white balls can be drawn out of 5 balls

= 5C3 = (5 × 4 × 3) / (1 × 2 × 3)

= 10 ways

n(S) = Number of ways in which 3 balls can be drawn out of total 9 balls

= 9C3 = (9 × 8 × 7) / (1 × 2 × 3) = 3 × 4 × 7

= 84 ways

P(A) = n(A) / n(S) = 10/84 = 5/42
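Combinations and permutations of this kind can be checked with Python's standard library. This is a minimal sketch, assuming Python 3.8 or later (where `math.comb` and `math.perm` are available); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, perm

# Ways to choose 3 white balls out of 5, and any 3 balls out of 9.
n_A = comb(5, 3)
n_S = comb(9, 3)

p_three_white = Fraction(n_A, n_S)

# Selections versus arrangements of 3 objects out of 10.
selections = comb(10, 3)      # order does not matter
arrangements = perm(10, 3)    # order matters

print(n_A, n_S, p_three_white, selections, arrangements)
```

The ratio `arrangements / selections` equals 3! = 6, because each selection of 3 objects can be arranged in 3! ways.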

1) 0 ! = _______________.

2) 1! = _______________.

3) nCr = _______________.

4) nCn = _______________.

5) nC1 = _______________.

4.5 Set Theory and Probability Theorems

the concept of sets must be very clear in mind.

N = { 1,2,3, ............. }

special condition.

and A an event, then -

i) P(A) ≥ 0

ii) P(∅) = 0

iii) P(S) = 1

Proof - Since A is an event, therefore A ⊂ S

i) P(A) = n(A) / n(S) ≥ 0

ii) P(∅) = n(∅) / n(S) = 0 / n(S) = 0

iii) P(S) = n(S) / n(S) = 1

Points to Remember (Most IMP)

Theorem 2 - If A and B are mutually exclusive events then P(A ∩ B) = 0

Proof - Since A and B are mutually exclusive, A ∩ B = ∅

P(A ∩ B) = P(∅) = n(∅) / n(S) = 0 / n(S) = 0

Theorem 3 - If A and B are two mutually exclusive and exhaustive events, then P(A) + P(B) = 1

Proof - Let A and B be two mutually exclusive and exhaustive events.

Then A ∩ B = ∅

and A ∪ B = S

Since A and B are mutually exclusive events,

P(A ∩ B) = P(∅) = 0

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

= P(A) + P(B) - 0

= P(A) + P(B)

but A ∪ B = S

P(A ∪ B) = P(S) = 1

P(A) + P(B) = 1

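Theorem 4 - Addition Law - If A and B are mutually exclusive events then P(A ∪ B) = P(A) + P(B)

Proof -

P(A ∪ B) = n(A ∪ B) / n(S) = [n(A) + n(B)] / n(S)

(since A ∩ B = ∅, n(A ∪ B) = n(A) + n(B))

= n(A) / n(S) + n(B) / n(S)

= P(A) + P(B)

In general, if A1, A2, ....., Ak are mutually exclusive events, then

P(A1 ∪ A2 ∪ ..... ∪ Ak) = P(A1) + P(A2) + ............. + P(Ak)

$$= \sum_{i=1}^{k} P(A_i)$$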

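Theorem 5 - For any two events A and B, P(A - B) = P(A) - P(A ∩ B)

Proof - Let A and B be two events. Then

(A - B) ∩ (A ∩ B) = ∅

(A - B) ∪ (A ∩ B) = A

Therefore, by Theorem 4, P(A) = P(A - B) + P(A ∩ B)

i.e. P(A - B) = P(A) - P(A ∩ B)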

Theorem 6 - Addition Law - For any two events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Proof -

(A - B) ∩ B = ∅

(A - B) ∪ B = A ∪ B

P(A ∪ B) = P[(A - B) ∪ B]

= P(A - B) + P(B)

= P(A) - P(A ∩ B) + P(B)   (since P(A - B) = P(A) - P(A ∩ B))

P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Theorem 7 - Addition Law for three events - For any three events A, B and C,

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)

Proof - Let B ∪ C = D, so that A ∪ B ∪ C = A ∪ D

Then P(A ∪ B ∪ C) = P(A ∪ D) = P(A) + P(D) - P(A ∩ D) ……. (1)

…… (by Theorem 6)

But

A ∩ D = A ∩ (B ∪ C)

= (A ∩ B) ∪ (A ∩ C)

P(A ∩ D) = P[(A ∩ B) ∪ (A ∩ C)]

= P(A ∩ B) + P(A ∩ C) - P[(A ∩ B) ∩ (A ∩ C)]

[…….By Theorem 6]

= P(A ∩ B) + P(A ∩ C) - P(A ∩ B ∩ C) ……….(2)

and P(D) = P(B ∪ C) = P(B) + P(C) - P(B ∩ C) ……..(3)

using (1), (2), and (3) we have

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)

Corollary - If A, B, C are mutually exclusive events, then

P(A ∪ B) = P(A) + P(B)

P(B ∪ C) = P(B) + P(C)

P(A ∪ C) = P(A) + P(C)

P(A ∪ B ∪ C) = P(A) + P(B) + P(C)

Theorem 8 - For each event A, P(A′) = 1 - P(A), where A′ is the complementary event of A.

Theorem 9 - If A ⊂ B, then P(A) ≤ P(B)

Proof - Given A ⊂ B

B = A ∪ (B - A)

and A ∩ (B - A) = ∅

P(B) = P[A ∪ (B - A)]

= P(A) + P(B - A) (by Theorem 4)

P(A) ≤ P(B), since P(B - A) ≥ 0

Theorem 10 - If A is an event associated with a random experiment, then 0 ≤ P(A) ≤ 1

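Example 1 - If A and B are mutually exclusive events with P(A) = 0.3, P(B) = P and P(A ∪ B) = 0.5, find the value of P.

Solution - Since A and B are mutually exclusive,

P(A ∩ B) = 0

P(A ∪ B) = P(A) + P(B)

0.5 = 0.3 + P

P = 0.5 - 0.3

= 0.2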

Example 2 - In a class of 25 students with roll numbers 1 to 25 a student is picked up at random to answer a question. Find the probability that the roll number of the selected student is either a multiple of 5 or 7.

Solution - Let A be the event that the roll number is a multiple of 5

Let B be the event that the roll number is a multiple of 7

S = { 1, 2, 3, 4, ………..23, 24, 25}

A = {5, 10, 15, 20, 25}

B = {7, 14, 21}

n (S) = 25

n (A) = 5

n (B) = 3

P(A) = n(A) / n(S) = 5/25 = 1/5 = 0.2

P(B) = n(B) / n(S) = 3/25

A ∩ B = ∅

P(A ∩ B) = 0

P(A ∪ B) = P(A) + P(B)

= 5/25 + 3/25

= (5 + 3) / 25

= 8/25

Answer - The probability that the roll number of the student selected at random is a multiple of 5 or 7 is 8/25.

20 take both tea and coffee. Find the probability that a per-

son selected at random

1. Takes Tea

2. Takes Coffee

3. Takes Tea or Coffee

4. Takes Tea and Coffee both

5. Takes neither Tea nor Coffee

6. Takes only Tea

7. Takes only Coffee

8. Takes only one drink

9. Takes Tea if we know that the person takes coffee

10. Takes coffee if we know that the person takes tea.

[Venn diagram: only tea (T) = 16, both tea and coffee = 20, only coffee (C) = 25, neither = 39]

Event T: Person selected at random takes tea

Event C: Person selected at random takes coffee

Now,

n (S) = 100 (Total number of persons in the group)

n (T) = 36 (Number of persons taking tea)

n (C) = 45 (Number of persons taking coffee)

n (T ∩ C) = 20 (Number of persons taking tea and coffee both)

By addition theorem

n (T ∪ C) = n (T) + n (C) – n (T ∩ C) (Number of people taking tea or coffee or both)

= 36 + 45 – 20 = 61

n (T ∪ C)′ = n (S) – n (T ∪ C) (Number of people neither taking tea nor coffee)

= 100 – 61 = 39

1. P (Person Takes Tea)

P (T) = n (T) / n (S) = 36/100 = 0.36

2. P (Person does not take Tea)

P (T′) = 1 – P (T) = 1 – 0.36 = 0.64

3. P (Person Takes Coffee)

P (C) = n (C) / n (S) = 45/100 = 0.45

4. P (Person Takes Tea or Coffee)

P (T ∪ C) = n (T ∪ C) / n (S) = 61/100 = 0.61

5. P (Person Takes Tea and Coffee)

P (T ∩ C) = n (T ∩ C) / n (S) = 20/100 = 0.20

6. P (Person neither Takes Tea nor Coffee)

P ((T ∪ C)′) = n (T ∪ C)′ / n (S) = 39/100 = 0.39

Or P ((T ∪ C)′) = 1 – P (T ∪ C) = 1 – 0.61 = 0.39

7. P (Person Takes only Tea)

i.e. P (Person takes tea and not coffee)

P (T – C) = P (T ∩ C′) = P (T) – P (T ∩ C) = 0.36 – 0.20 = 0.16

8. P (Person Takes only Coffee)

i.e. P (Person takes coffee and not tea)

P (C – T) = P (C ∩ T′) = P (C) – P (T ∩ C) = 0.45 – 0.20 = 0.25

9. P (Person Takes only one drink)

i.e. P (Person takes only coffee or only tea)

P (C – T) + P (T – C) = 0.25 + 0.16 = 0.41

or P (T) + P (C) – 2 × P (T ∩ C) = 0.36 + 0.45 – 2 × 0.20 = 0.41

10. P (Person Takes Tea if we know that the person takes coffee)

i.e. pick up a tea taking person from the group of coffee drinking persons

P (T/C) = n (T ∩ C) / n (C) = 20/45 = 4/9

11. P (Person Takes Coffee if we know that the person takes Tea)

i.e. pick up a coffee taking person from the group of tea drinking persons

P (C/T) = n (T ∩ C) / n (T) = 20/36 = 5/9

or

P (C/T) = P (T ∩ C) / P (T) = 0.20/0.36 = 5/9
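The whole tea and coffee example follows from four counts. The sketch below recomputes the main answers with exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Counts given in the example.
n_S, n_T, n_C, n_both = 100, 36, 45, 20

# Addition theorem for counts: tea or coffee or both.
n_either = n_T + n_C - n_both

p_either = Fraction(n_either, n_S)
p_neither = 1 - p_either
p_only_tea = Fraction(n_T - n_both, n_S)
p_only_coffee = Fraction(n_C - n_both, n_S)
p_only_one = p_only_tea + p_only_coffee

# Conditional probabilities: restrict the sample space to the given group.
p_tea_given_coffee = Fraction(n_both, n_C)
p_coffee_given_tea = Fraction(n_both, n_T)

print(p_either, p_neither, p_only_one, p_tea_given_coffee, p_coffee_given_tea)
```

Note that the two conditional probabilities differ because the reduced sample spaces (45 coffee drinkers versus 36 tea drinkers) differ, even though the numerator is the same 20 persons.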

a black card or an ace

Solution:

Total Cards = 52

Black Cards = 26

Aces = 4

n (S) = 52 (Total no. of cards in a pack)

Let A be the event of drawing a black card

n (A) = 26C1 = 26

So, P (A) = n (A) / n (S) = 26/52

Let B be the event of drawing an ace

n (B) = 4C1 = 4

So, P (B) = n (B) / n (S) = 4/52

Now 2 aces are black, that is, they are common to both black cards and aces.

n (A ∩ B) = 2

So, P (A ∩ B) = n (A ∩ B) / n (S) = 2/52

P (A ∪ B) = P (A) + P (B) – P (A ∩ B)

= 26/52 + 4/52 – 2/52

= (26 + 4 – 2) / 52 = 28/52 = 7/13

[ Note : A ∪ B = A or B, A ∩ B = A and B ]

probability that a ball selected at random is a ball with a number that is multiple of 3 or 4.

Solution

n (S) = 13

Let A be the event that the ball selected is with a number that is a multiple of 3,

i.e. 3, 6, 9, 12

n (A) = 4

P (A) = 4/13

Let B be the event that the ball selected is with a number that is a multiple of 4, i.e. 4, 8, 12

n (B) = 3

P (B) = 3/13

and n (A ∩ B) = 1 (There is only one number which is a multiple of both 3 and 4, which is 12)

P (A ∩ B) = 1/13

P (A ∪ B) = P (A) + P (B) – P (A ∩ B)

= 4/13 + 3/13 – 1/13 = 6/13

is drawn at random. Find the probability that it will be a multiple of 2 or 5.

Solution:

n (S) = 20

Let A be the event that the ticket drawn is a multiple of 2

i.e. 2, 4, 6, 8, 10, 12, 14, 16, 18, 20

n (A) = 10

P (A) = 10/20

Let B be the event that the ticket drawn is a multiple of 5

i.e. 5, 10, 15, 20

n (B) = 4

P (B) = 4/20

n (A ∩ B) = 2 (2 numbers, 10 and 20, are common to both events A and B)

P (A ∩ B) = 2/20

P (A ∪ B) = P (A) + P (B) – P (A ∩ B)

= 10/20 + 4/20 – 2/20 = 12/20 = 3/5

30% of the students have failed in Mathematics, 20% have failed in Chemistry and 10% have failed in both Mathematics and Chemistry. A student is selected at random.

i) What is the probability that the student has failed in Mathematics if it is known that he has failed in Chemistry?

ii) What is the probability that the student has failed in Mathematics or Chemistry?

Solution:

Event A : Student failed in Mathematics

Event B : Student failed in Chemistry

P (Failed in Maths) = P (A) = 30/100

P (Failed in Chem) = P (B) = 20/100

P (A ∩ B) = 10/100

i) P (A/B) = P (A ∩ B) / P (B)

= (10/100) / (20/100)

= (10/100) × (100/20)

= 1/2

ii) Failed either in Maths or in Chem

P (A ∪ B) = ?

P (A ∪ B) = P (A) + P (B) – P (A ∩ B)

= 30/100 + 20/100 – 10/100

= (30 + 20 – 10) / 100

= (50 – 10) / 100

= 40/100

P (A ∪ B) = 0.40

Ex 10 The probability that a contractor will get a plumbing contract is 2/3, and the probability that he will not get an electric contract is 5/9. If the probability of getting at least one contract is 4/5, what is the probability that he will get both the contracts?

Solution

Event A = will get plumbing contract

P (A) = 2/3

Event B = will get electric contract

P (B′) = 5/9

Probability of getting at least one contract i.e. P (A ∪ B) = 4/5

Probability of (A ∩ B) = ?

P (B) = 1 – 5/9 = (9 – 5)/9 = 4/9

Since P (A ∪ B) = P (A) + P (B) – P (A ∩ B),

P (A ∩ B) = P (A) + P (B) – P (A ∪ B)

= 2/3 + 4/9 – 4/5

= (90 + 60 – 108) / 135

= 42/135 = 14/45

Ex 11 An urn contains 7 black and 5 white balls. Two balls are drawn at random one after another. Find the probability that both balls drawn are black if :

i) the first ball drawn is not replaced before drawing the second (such drawing is called without replacement) and

ii) the first ball drawn is replaced before the second ball is drawn (such drawing is called with replacement)

Solution

Black Balls = 7

White Balls = 5

Total balls = 7 + 5 = 12

So

n (S) = 12C2 = (12 × 11) / (1 × 2) = 6 × 11 = 66

i) When the first ball drawn is not replaced before drawing the second (such drawing is called without replacement). In such cases we find the probability by the usual method. 2 black balls can be drawn out of 7 black balls in 7C2 ways.

n (A) = 7C2 = 7! / ((7 – 2)! 2!)

= (5! × 6 × 7) / (5! × 1 × 2) = 21

P (A) = 7C2 / 12C2

= 21/66

= 7/22

ii) When the first ball drawn is replaced before the second ball is drawn (such drawing is called with replacement)

A = 1st ball drawn is black

i.e. we consider the event in two steps.

For the first step 1 black ball is to be drawn out of 7 black balls

n (A) = 7C1 = 7

And for n (S) 1 ball is to be drawn out of total 12 balls

n (S) = 12C1 = 12

P (A) = n (A) / n (S)

= 7/12

B = 2nd ball drawn is black when the first ball is replaced.

At this stage the ball drawn is put back into the urn. We, therefore, have again the same situation, i.e. 7 black balls and 12 total balls. For the second step 1 black ball is to be drawn again out of 7 black balls as the earlier ball is replaced.

n (B) = 7C1 = 7

And for n (S) 1 ball is to be drawn out of total 12 balls

n (S) = 12C1 = 12

P (B) = n (B) / n (S)

= 7/12

Since A and B are independent events

P (A ∩ B) = P (A) × P (B) = 7/12 × 7/12 = 49/144
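Both answers can also be reached with the multiplication rule for successive draws, which is a different route from the combination count used in the solution. The sketch below is illustrative only; the variable names are assumptions.

```python
from fractions import Fraction

black, total = 7, 12

# Without replacement: the second draw sees one black ball and one ball fewer.
p_without = Fraction(black, total) * Fraction(black - 1, total - 1)

# With replacement: the urn is restored, so the two draws are independent.
p_with = Fraction(black, total) * Fraction(black, total)

print(p_without, p_with)
```

The without-replacement value agrees with 7C2 / 12C2, since counting ordered pairs in both numerator and denominator gives the same ratio as counting unordered pairs.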

blue and 4 red balls. What is the chance that 2 balls are blue and 1 is red?

Solution

3 balls can be drawn out of 10 balls in 10C3 ways.

n (S) = 10C3 = (10 × 9 × 8) / (1 × 2 × 3)

= 120

2 blue balls can be drawn out of 6 in 6C2 ways = (6 × 5) / (1 × 2) = 15 ways.

1 red ball can be drawn out of 4 in 4C1 ways = 4 ways.

n (A) = 6C2 × 4C1 = 15 × 4 = 60

P (A) = 60/120

= 1/2

Ex. 13 A and B are independent events and P (A) = 1/3, P (B) = 3/4. Find P (A ∪ B).

Solution

P (A ∩ B) = P (A) × P (B) = 1/3 × 3/4 = 1/4

And

P (A ∪ B) = P (A) + P (B) – P (A ∩ B) = 1/3 + 3/4 – 1/4

= 1/3 + 2/4

= (4 + 6) / 12

= 10/12

= 5/6

in the same post. The probability of X’s selection is 1/5 and that of Y’s selection is 1/3. What is the probability that

i) both X and Y will be selected

ii) only one of them will be selected and

iii) none of them will be selected?

Solution

P (A) = P (X selection) = 1/5

P (B) = P (Y selection) = 1/3

i. P (both X and Y are selected) = P (A and B) = P (A ∩ B)

= 1/5 × 1/3

= 1/15

ii. P (only one of X and Y is selected) = P (only A) + P (only B)

P (only A) = P (A) – P (A ∩ B)

= P (A) – [P (A) × P (B)]

= 1/5 – (1/5 × 1/3)

= 1/5 – 1/15

= (15 – 5) / 75

= 10/75

= 2/15

And

P (only B) = P (B) – P (A ∩ B)

= P (B) – [P (A) × P (B)]

= 1/3 – (1/5 × 1/3)

= 1/3 – 1/15

= (15 – 3) / 45

= 12/45

= 4/15

Therefore P (only one of X and Y is selected) = 2/15 + 4/15 = 6/15 = 2/5

iii. None of them selected

P (neither X nor Y is selected) = P (A′ ∩ B′)

= P (A′) × P (B′) (as the selections are independent)

= (1 – 1/5) × (1 – 1/3)

= 4/5 × 2/3

= 8/15

Note that this is not 1 – P (both selected): the complement of "both selected" also includes the cases in which exactly one of them is selected. As a check, 1/15 + 6/15 + 8/15 = 1.

4.6 Bayes' Theorem

Let A1, A2, A3 ….. An be mutually exclusive and exhaustive events and let B be any other event which is spread over events A1, A2, A3 ….. An. For example, consider that there are 3 containers containing balls of different colors. Then event A1 is selection of container 1, event A2 is selection of container 2 and A3 is selection of container 3. Thus it can be seen that events A1, A2 and A3 are mutually exclusive and exhaustive. If we define event B as drawing a yellow ball, then the yellow ball can be selected from container 1 or 2 or 3. Then we say that event B is spread over events A1, A2 and A3. And if we know that the ball drawn is yellow, then the probability that the ball is selected from a particular container is given by

$$P(A_i/B) = \frac{P(A_i)\,P(B/A_i)}{P(A_1)P(B/A_1) + P(A_2)P(B/A_2) + \dots + P(A_i)P(B/A_i) + \dots + P(A_n)P(B/A_n)}$$

Alternatively,

$$P(A_i/B) = \frac{P(A_i \cap B)}{P(A_1 \cap B) + P(A_2 \cap B) + \dots + P(A_i \cap B) + \dots + P(A_n \cap B)}$$

and 8 black balls, urn3 contains 3 red and 6 black balls. One urn is chosen at random and a ball is drawn. The color of the ball is black. What is the probability that it has been drawn from urn3 ?

Solution:

Let A1 be the event of selection of urn-1

Let A2 be the event of selection of urn-2

Let A3 be the event of selection of urn-3

Let Event B = black ball is drawn.

Total urns are 3, therefore

P(A1) = 1/3

P(A2) = 1/3

P(A3) = 1/3

We know that the ball drawn is black and we are interested to know the probability that it has come from urn 3, i.e. we want to know P(A3/B)

Now by Bayes' theorem,

P(A3/B) = P(A3 ∩ B) / [P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B)]

The probability of drawing a black ball from urn 1 (that is, we know that urn 1 is selected and we want to know the probability of drawing a black ball) is

P(B/A1) = Number of black balls in urn 1 / Total balls in urn 1

P(B/A1) = 5/10 = 1/2

Similarly,

P(B/A2) = 8/12 = 2/3

P(B/A3) = 6/9 = 2/3

P(A1 ∩ B) = P(B/A1) × P(A1) = 1/2 × 1/3 = 1/6

P(A2 ∩ B) = P(B/A2) × P(A2) = 2/3 × 1/3 = 2/9

P(A3 ∩ B) = P(B/A3) × P(A3) = 2/3 × 1/3 = 2/9

Therefore,

P(A3/B) = P(A3 ∩ B) / [P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B)]

= (2/9) / (1/6 + 2/9 + 2/9)

= (2/9) / (33/54)

= 2/9 × 54/33

= 4/11
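The three-urn calculation is a direct application of the formula: multiply each prior by its likelihood, then divide by the total. A minimal sketch with exact fractions follows; the list names are illustrative assumptions.

```python
from fractions import Fraction

# Prior probability of choosing each urn (chosen at random).
priors = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]

# Probability of a black ball given each urn, as used in the solution.
likelihoods = [Fraction(5, 10), Fraction(8, 12), Fraction(6, 9)]

# Joint probabilities P(Ai and B), and their total P(B).
joint = [p * l for p, l in zip(priors, likelihoods)]
p_black = sum(joint)

# Posterior probabilities P(Ai / B).
posteriors = [j / p_black for j in joint]

print(p_black, posteriors)
```

The posteriors always add up to 1, which is a convenient check on any hand calculation of this kind.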

to collapse whether design is faulty or not. The chance that the design is faulty is 10%. The chance that the building collapses is 95% if the design is faulty and otherwise it is 45%. It is seen that the building collapsed. What is the probability that it is due to faulty design?

Solution

Event A1: Design is faulty.

Event A2: Design is not faulty.

Event B = building collapses.

P(A1) = 0.10 (i.e. design is faulty)

So the probability that the design is not faulty is

P(A2) = 1 – 0.10 = 0.90

Now,

P(B/A1) = 0.95 (Building collapses when we know that design is faulty)

P(B/A2) = 0.45 (Building collapses when we know that design is not faulty).

We want to find the probability that the design is faulty when we know that the building collapsed. That is, we are interested in P(A1/B). By Bayes' Theorem,

$$P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \frac{P(A_1 \cap B)}{P(A_1 \cap B) + P(A_2 \cap B)}$$

$$P(A_1/B) = \frac{\frac{10}{100} \times \frac{95}{100}}{\frac{10}{100} \times \frac{95}{100} + \frac{90}{100} \times \frac{45}{100}}$$

= 0.095 / (0.095 + 0.405)

= 0.095 / 0.5

= 0.19

experiment is repeated large number of times, e.g. when we toss a coin 4 times the possible numbers of heads are 0, 1, 2, 3, 4. If this experiment is repeated, say, 100 times, how many heads are expected? To make such calculations we will calculate the mathematical expectation by probability theory and then multiply the result obtained by 100 to know the expected number of heads in throwing 4 coins 100 times.

ample makes it clear.

is tossed for 3 times.

No. of heads (x)                  0      1      2      3      Total
Probability (P)                   1/8    3/8    3/8    1/8    1
Px (row 2 × row 1)                0      3/8    6/8    3/8    12/8 = 1.5
Px² (row 2 × square of row 1)     0      3/8    12/8   9/8    24/8 = 3

Solution

E (x) = ΣPx = 1.5. This is the mean, denoted by m.

E (x²) = ΣPx² = 3

Variance (x) = E (x²) – m²

= 3 – (1.5)²

= 3 – 2.25 = 0.75

Note that

Variance = E (x²) – m²

= E (x²) – [E (x)]²

= ΣPi xi² – [ΣPi xi]²

Where

x = value of the outcome

P = Probability of that outcome.

Where

X = number of outcomes

P = Probability of that outcome.

balanced dice is thrown. Find also standard deviation.

Solution

X             1      2      3      4      5      6
Probability   1/6    1/6    1/6    1/6    1/6    1/6

i.e. Probability of each is 1/6

ΣPi xi = (1 × 1/6) + (2 × 1/6) + (3 × 1/6) + (4 × 1/6) + (5 × 1/6) + (6 × 1/6)

= 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6

= 21/6 = 7/2 = 3.5 = m = E (x)

ΣPi xi² = (1² × 1/6) + (2² × 1/6) + (3² × 1/6) + (4² × 1/6) + (5² × 1/6) + (6² × 1/6)

= 1/6 + 4/6 + 9/6 + 16/6 + 25/6 + 36/6

= 91/6 = 15.17 = E (x²)

Variance = E (x²) – m²

= 15.17 – (3.5)²

= 15.17 – 12.25 = 2.92

Standard deviation = √2.92 = 1.71
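The expectation and variance of a single throw can be computed exactly, which avoids the rounding of 91/6 in the hand calculation. The sketch below is illustrative; the variable names are assumptions.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)           # each face of a balanced dice is equally likely

mean = sum(p * x for x in faces)             # E(x)
mean_of_squares = sum(p * x * x for x in faces)   # E(x squared)

variance = mean_of_squares - mean ** 2
std_dev = sqrt(float(variance))

print(mean, mean_of_squares, variance, round(std_dev, 2))
```

The exact variance is 35/12, a little over 2.91, so the standard deviation is approximately 1.71.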

In three tosses of a balanced coin, he will get a reward of Rs. 20,000, Rs. 10,000, Rs. 5000 and no reward if he gets three tails, two tails, one tail and no tail respectively. The entrance fee for the contest is Rs. 6000. Will he play the game?

No. of tails          3         2         1        0      Total
Reward (x) in Rs.     20,000    10,000    5,000    0
Probability (P)       1/8       3/8       3/8      1/8    1
Px                    2,500     3,750     1,875    0      8,125

Expected value of reward = Er = ΣPx = 8125

As expected earnings are more than the entrance fee, the person will play the game.

Rs. 5 each and other 5 a prize of Rs. 2.

i. If one ticket is drawn what is the expected value of the prize.

ii. If 2 tickets are drawn what is the expected value of the game?

Solution

i) One ticket is drawn

Event A : Ticket drawn carries prize of Rs. 5

Event B : Ticket drawn carries prize of Rs. 2

3 tickets have Prize Rs. 5

5 tickets have Prize Rs. 2

Total Tickets = 8

n (S) = 8C1 = 8

n (A) = 3C1 = 3

n (B) = 5C1 = 5

P (A) = 3C1 / 8C1, P (B) = 5C1 / 8C1

P (A) = 3/8, P (B) = 5/8

ΣPi Xi = (5 × 3/8) + (2 × 5/8)

= 15/8 + 10/8 = 25/8

ΣPi Xi = 3.125 = m = E (x)

ii) Two tickets are drawn

P (both tickets carry Rs. 5) = 3C2 / 8C2 = 3/28 (total prize Rs. 10)

P (both tickets carry Rs. 2) = 5C2 / 8C2 = 10/28 (total prize Rs. 4)

P (one ticket of each kind) = (3C1 × 5C1) / 8C2 = 15/28 (total prize Rs. 7)

ΣPi xi = (10 × 3/28) + (4 × 10/28) + (7 × 15/28)

= (30 + 40 + 105) / 28 = 175/28 = 6.25
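The expected prize for two tickets can be checked by brute force, listing every possible pair of tickets. This is an illustrative sketch; the list `tickets` and the other names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# 3 tickets worth Rs. 5 and 5 tickets worth Rs. 2.
tickets = [5, 5, 5, 2, 2, 2, 2, 2]

# Expected prize when one ticket is drawn.
expected_one = Fraction(sum(tickets), len(tickets))

# Every unordered pair of distinct tickets is equally likely.
pairs = list(combinations(tickets, 2))
expected_two = Fraction(sum(a + b for a, b in pairs), len(pairs))

print(len(pairs), expected_one, expected_two)
```

The two-ticket expectation is exactly twice the one-ticket expectation, as the rule E(X + Y) = E(X) + E(Y) predicts.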

B. In the process of manufacture of part A, 9 out of 100 are likely to be defective. Similarly 5 out of 100 are likely to be defective in manufacture of part B. Calculate the probability that the assembled part will not be defective.

Solution

Part A                          Part B

Defective P (A) = 9/100         Defective P (B) = 5/100

Good P (A′) = 91/100            Good P (B′) = 95/100

The assembled part is not defective when both A and B are not defective, i.e. A′ and B′.

i.e. not defective = A′ ∩ B′.

Therefore P (A′ ∩ B′) = 91/100 × 95/100

= 8645/10000

= 0.8645

tistics is 3/4, that Y can solve is 2/5, that Z can solve is

5 / 9. If they all try independently, find the probability that

problem will be solved.

Solution:

P (A) = X solves the problem = 3/4

P (B) = Y solves the problem = 2/5

P (C) = Z solves the problem = 5/9

P (S) = 1

P (A' ∩ B') + P (A ∪ B) = P (S) = 1

Similarly

P (A' ∩ B' ∩ C') + P (A ∪ B ∪ C) = 1

Now, P (A solves the problem or B solves the problem or C solves the problem)

= P (A ∪ B ∪ C) = 1 - P (A' ∩ B' ∩ C')

P (A') = X does not solve the problem = 1 – P (A) = 1 – 3/4 = 1/4

P (B') = Y does not solve the problem = 1 – P (B) = 1 – 2/5 = 3/5

P (C') = Z does not solve the problem = 1 – P (C) = 1 – 5/9 = 4/9

Events A, B, C are independent. Therefore events A', B' and C' are also independent.

P (A' ∩ B' ∩ C') = 1/4 × 3/5 × 4/9 = 1/15

P (Problem is solved) = P (A solves the problem or B solves the problem or C solves the problem)

= P (A ∪ B ∪ C) = 1 – P (A' ∩ B' ∩ C') = 1 – 1/15 = 14/15
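The complement rule used in this solution can be sketched in Python as follows. The list `p_solve` holds the three success probabilities from the problem; independence is assumed, as stated in the question.

```python
from fractions import Fraction
from math import prod

# Probabilities that X, Y and Z each solve the problem.
p_solve = [Fraction(3, 4), Fraction(2, 5), Fraction(5, 9)]

# By independence, P(nobody solves it) is the product of the complements.
p_none = prod(1 - p for p in p_solve)

# The problem is solved unless all three fail.
p_solved = 1 - p_none

print(p_none, p_solved)
```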

and 20% of the total number of items of a factory. The percentages of defective output of these machines are respectively 3%, 4% and 5%. If an item is selected at random, what is the probability that the selected item is defective?

Solution.

A: Item produced on machine A

B: Item produced on machine B

C: Item produced on machine C

D: Selected item is defective

Then we get P (A) = 0.50

P (B) = 0.30

P (C) = 0.20

P (D/A) = 0.03; P (D/B) = 0.04; P (D/C) = 0.05

P (D) = P (D ∩ A) + P (D ∩ B) + P (D ∩ C)

= P (D/A) P (A) + P (D/B) P (B) + P (D/C) P (C)

= 0.50 × 0.03 + 0.30 × 0.04 + 0.20 × 0.05

= 0.015 + 0.012 + 0.010

= 0.037
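The same total-probability calculation can be written as a weighted sum over machines. This is a small sketch; the dictionary names `share` and `defect_rate` are illustrative.

```python
# Share of total output produced on each machine, P(A), P(B), P(C).
share = {"A": 0.50, "B": 0.30, "C": 0.20}

# Conditional probability of a defective item given the machine, P(D/machine).
defect_rate = {"A": 0.03, "B": 0.04, "C": 0.05}

# Law of total probability: P(D) = sum of P(D/machine) * P(machine).
p_defective = sum(share[m] * defect_rate[m] for m in share)

print(round(p_defective, 3))
```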

after the Swiss mathematician James Bernoulli (1654-1705). This distribution can be used under the following conditions -

fixed number of times. In other words n, the number of trials, is finite and fixed.

ii. The outcome of the random experiment (trial) results in the classification of events.

A' - The non-occurrence of the event = Failure.

the binomial and Poisson.

4.8.1 Properties of binomial distribution

Here n = 4

2. Each trial has only two outcomes: success or failure. In the above experiment, if we treat getting a head as success, the probability of success is the probability of getting a head. It is denoted by p. Thus p = 1/2, and the probability of failure is the probability of not getting a head. It is denoted by q, and q = 1 – p = 1 – 1/2 = 1/2. n and p are called the parameters of the binomial distribution.

3. The maximum number of successes = n, so the number of possible outcomes in a set is 1 + n. In the above case, when the coin is tossed 4 times, we might get 0 heads, 1 head, 2 heads, 3 heads or 4 heads, i.e. there are 5 outcomes. This distribution can be represented as x ~ B (n, p) where x is a binomial variable which takes values from 0 to n. In this case the distribution is represented as x ~ B (4, 1/2). The probability of success is the same for all trials.

4. The outcome of earlier trials does not affect the outcome of later trials.

5. The probability of getting x successes in a set of n trials is given by $f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$. This is called the probability mass function (PMF), e.g. the probability of getting 3 heads in the above experiment is

$$f(3) = {}^{4}C_{3}\, p^{3} q^{4-3} = 4 \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{1} = \frac{4}{16} = \frac{1}{4}$$

6. The sum of all probabilities is 1, i.e. f (0) + f (1) + f (2) + f (3) + f (4) = 1, i.e. $\sum f(x) = 1$.

7. The mean or expected value of x, E (x), of a binomial experiment is m = np. In this case the expected number of heads is E (x) = m = 4 × 1/2 = 2.

8. The variance of x is V (x) = npq. In this case V (x) = 4 × 1/2 × 1/2 = 1.

10. The most likely value (mode) of x is given by the largest integer less than or equal to (n + 1) p; if k = (n + 1) p is itself an integer, then k – 1 and k are both modes.

11. Sums of binomials

If x ~ B (n, p) and y ~ B (m, p) are independent binomial variables, then z = x + y is again a binomial variable and its distribution is z ~ B (m + n, p).
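Properties 5 to 8 can be checked numerically for the four-toss example. The sketch below defines a helper `binomial_pmf` (an illustrative name, not from the text) and uses exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 2)             # four tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                              # property 6
mean = sum(x * f for x, f in enumerate(pmf))                  # property 7: np
variance = sum(x**2 * f for x, f in enumerate(pmf)) - mean**2  # property 8: npq

print(pmf[3], total, mean, variance)
```

Computing the mean and variance from the PMF, rather than from np and npq directly, confirms that the shortcut formulas agree with the definition.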

discrete probability distribution. It expresses the probability of a number

of events occurring in a fixed period of time if these events occur with a

known average rate, and are independent of the time since the last event.

(1781 -1840) and published, together with his probability theory, in 1838.

Figure: Poisson probability mass function. The horizontal axis is the index x. The function is non-zero only at integer values of x. The connecting lines are only guides for the eye and do not indicate continuity.

over a fixed interval. It can also be characterized by an average rate of occurrence of an event over a fixed lot, e.g. the mean arrival rate of customers in a shop, which means the average number of customers visiting the shop per hour, where the time interval is defined as 1 hr.

2. There are only two outcomes, success or failure. In the above case the arrival of a customer can be defined as a success and non-arrival as a failure.

3. It is a limiting case of the binomial distribution where the number of trials is large and the probability of success is low.

4. The average number of successes is denoted by m.

5. The Poisson distribution has only one parameter, which is m.

6. The distribution can be represented as x ~ Poi(m) where x is a Poisson variable which takes values from 0 onwards.

7. The probability of success in any interval is independent of the outcomes of earlier intervals.


8. The probability of getting x successes in the given interval is given by $f(x) = \dfrac{e^{-m} m^{x}}{x!}$. This is called the probability mass function (PMF), where e = 2.71828.

e.g. if in the above case the average rate of arrival is 2 customers per hour, then m = 2 and the probability of getting 3 customers in an hour is

$$f(3) = \frac{e^{-m} m^{x}}{x!} = \frac{e^{-2}\, 2^{3}}{3!} = \frac{0.1353 \times 8}{6} = 0.1804$$

9. The sum of all probabilities is 1, i.e. f (0) + f (1) + f (2) + f (3) + f (4) + … = 1, i.e. $\sum f(x) = 1$.

10. The mean or expected value of x, E (x), of a Poisson experiment is m.

11. The variance of x is V (x) = m. In this case V (x) = 2.

13. The mode of a Poisson-distributed random variable with non-integer m is equal to the largest integer less than or equal to m. When m is a positive integer, the modes are m and m – 1.

14. Sums of Poisson variables:

If x ~ Poi(m) and y ~ Poi(n) are independent Poisson variables, then z = x + y is again a Poisson variable and its distribution is z ~ Poi(m + n).
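The Poisson probability for the customer-arrival example (m = 2, x = 3) can be evaluated with a few lines of Python. The helper name `poisson_pmf` is illustrative; the infinite sum in property 9 is approximated by truncating at 50 terms, which is an assumption that is harmless for small m.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """Probability of exactly x events when the average number of events is m."""
    return exp(-m) * m**x / factorial(x)

m = 2                                  # average of 2 customers per hour
f3 = poisson_pmf(3, m)                 # probability of exactly 3 customers

# Truncated checks of properties 9 to 11 (terms beyond x = 49 are negligible).
xs = range(50)
total = sum(poisson_pmf(x, m) for x in xs)
mean = sum(x * poisson_pmf(x, m) for x in xs)
variance = sum(x**2 * poisson_pmf(x, m) for x in xs) - mean**2

print(round(f3, 4), round(total, 6), round(mean, 6), round(variance, 6))
```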

Poisson distributions include:

(sufficiently distant from traffic lights) during a given period of

time.

The number of spelling mistakes a secretary makes while typing

a single page.

The number of phone calls at a call centre per minute.

The number of times a web server is accessed per minute.

The number of stars in a given volume of space.

4.10 Normal Distribution

after Carl Friedrich Gauss, a German mathematician, although Gauss

was not the first to work with it), is a probability distribution of great

importance in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean (“average”) and standard deviation (“variability”) respectively. The standard normal distribution is the normal distribution with a mean of zero and a variance of one. It is often called the bell curve because the graph of its probability density resembles a bell.

Cumulative distribution function

same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped.

psychological and educational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction and memory are among the many psychological variables that are approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.

is easy for mathematical Statisticians to work with. This means that many

kinds of Statistical tests can be derived for normal distributions. Almost

all statistical tests discussed in this text assume normal distributions.

Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.

they differ in how spread out they are. The area under each curve is the same. The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).

4.10.1 Properties of Normal Distribution

rence is zero. But it specifies the probability of an observation

lying in a certain range.

2. It is a symmetric distribution about the mean. The number of observations goes on increasing up to the mean and, after crossing the mean, goes on reducing in exactly the same way as it increased up to the mean, i.e. the number of observations is maximum at the mean, and a vertical line drawn at the mean divides the distribution into two parts which exactly match each other.

3. 50% of the observations are above the mean and 50% of the observations are below the mean.

4. In a normal distribution, mean = mode = median.

5. The mean is denoted by μ and the standard deviation by σ.

6. The normal distribution has two parameters, μ and σ.

7. This distribution can be represented as x ~ N (μ, σ²) where x is a normal variable which takes values from – infinity to + infinity.

8. The probability density at x is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\; e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$

This is called the probability density function (PDF), where μ is the mean and σ is the standard deviation, π is the constant 3.14159, and e is the base of natural logarithms and is equal to 2.718282. x can take on any value from – infinity to + infinity.

9. The total probability of the whole area under the curve = 1.

10. The mean or expected value of x, E (x), of a normal distribution is μ.

11. The variance of x is V (x) = σ².

13. The mode of a normal distribution = μ.

14. The inflection points of the curve occur at one standard deviation away from the mean, i.e. at μ – σ and μ + σ.

15. Sums of normal distributions

If x ~ N (μ₁, σ₁²) and y ~ N (μ₂, σ₂²) are independent normal variables, then r = x + y is again a normal variable and its distribution is r ~ N (μ₁ + μ₂, σ₁² + σ₂²). If r = x – y, then r ~ N (μ₁ – μ₂, σ₁² + σ₂²).

distribution.

[Figure: normal curve with the ranges μ ± σ and μ ± 2σ marked on the horizontal axis.]

68.27% of the area under the curve is within one standard deviation of the mean (1σ range, i.e. μ ± σ).

95.45% of the area is within two standard deviations (2σ range, i.e. μ ± 2σ).

99.73% of the area is within three standard deviations (3σ range, i.e. μ ± 3σ).

99.99% of the area is within four standard deviations (4σ range, i.e. μ ± 4σ).

99.9999% of the area is within five standard deviations (5σ range, i.e. μ ± 5σ).

99.999999% of the area is within six standard deviations (6σ range, i.e. μ ± 6σ).

99.999999999% of the area is within seven standard deviations (7σ range, i.e. μ ± 7σ).
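The areas quoted above follow from the normal density and can be reproduced with the error function from Python's standard library. This is a sketch; the helper names `normal_pdf` and `area_within` are illustrative.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in range(1, 5):
    print(f"within {k} sigma: {100 * area_within(k):.2f}%")

peak = normal_pdf(0.0, 0.0, 1.0)   # maximum height of the standard normal curve
```

The area within k standard deviations does not depend on μ or σ, which is why the same percentages apply to every normal distribution.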

4.10 Check your Progress

2) In normal distribution mean = _______________.

3) PDF _______________.

4) Total probability of the whole area under the curve is equal to

_______________.

5) If the values of mean, mode and median are all equal, then the distribution is _______________.

dard normal.

If X ~ N (μ, σ²), then

$$Z = \frac{X - \mu}{\sigma}$$

mean = mode = median = 0

Variance = Standard Deviation = 1

4.11.1 Relations between various parameters for a normal

distribution

standard deviation / average deviation = 1.2533

probable error / standard deviation = 0.6745

probable error / average deviation = 0.8453

probable error / average error = 0.8453

average error/probable error = 1.183


standard deviation /probable error = 1.4826

IQR = 1.35 × σ

The first quartile of any normal distribution is located 0.67σ below the mean and the third quartile is 0.67σ above the mean.

4.12 Summary

interval zero and one, with both values zero and one included. The probability value is usually a fraction and is never negative. Distribution means the spread, or variance, of the given data about the central tendencies such as Mean, Mode and Median. On this basis there are types of distribution such as Normal, Skewed and Poisson.

4.2

1) 1 and 0

2) Experiment

3) Head and tail

4) Sample space

5) Dependent event

4.4

1) 1

2) 1

3) $\dfrac{n!}{r!\,(n - r)!}$

4) 1

5) n

4.10

1) Normal distribution

2) = Median = mode

3) Probability density function

4) One

5) Normal

1) Short Notes –

a) Normal Distribution

b) Probability

2) What is Poisson distribution?

3) Write the properties of the Poisson distribution.

4) Explain the term ‘Expected value’.

5) Where can Bayes’ Theorem be applied?


CHAPTER – 5

INDEX NUMBERS

5.0 Objectives

5.1 Introduction

5.2 Price and Quantity Relatives

5.3 Price and Quantity Index Numbers

5.4 Laspeyre’s, Paasche’s Index Numbers

5.5 Advantages & Disadvantages

5.6 Illustration

5.7 Various Index Numbers

5.8 Consumer price index

5.8.1 Calculating a consumer Price Index

5.9 Summary

5.10 Check your Progress – Answers

5.11 Questions for Self - Study

5.0 Objectives

the idea of calculations and proper decisions of economics of business

and get following –

Price and quantity

Numerical to calculate Price Index

Base of weighted Index Number

Advantages of Index Number

5.1 Introduction

value for two different time periods. It is given in terms of percent relative

change.

Let us consider a price index. Relative change is the ratio of the prices of one period to the prices of the other period, e.g. if the prices of wheat in the years 1990 and 1999 were Rs. 5/- per kg and Rs. 9/- per kg, then the relative change is 9/5 and the price index of 1999 for wheat with respect to the base year 1990 is given by (9/5) × 100 = 180, which is the percent relative change. The period with which the comparison is made is called the base period. The index number is independent of the unit used for comparison. However, the prices of both periods must be expressed in the same units.

100 then in 1999 the price has grown to 180 and it shows that there is

80% increase in prices of wheat in 1999 compared to prices of wheat in

1990.

Volume, GNP, Price, Quantity, Expenditure etc.

arrive at general picture of the prices of two periods.

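If we let $P_O$ be the price in the base period and let $P_N$ be the price in the later period, then the price relative for the price change between these periods is given by $\dfrac{P_N}{P_O} \times 100$.

Price Relative is given by:

$$\frac{\text{Price of one commodity in the current year}}{\text{Price of the same commodity in the base year}} \times 100 = \frac{P_N}{P_O} \times 100$$

Quantity Relative is given by:

$$\frac{\text{Quantity of one commodity in the current year}}{\text{Quantity of the same commodity in the base year}} \times 100 = \frac{Q_N}{Q_O} \times 100$$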
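The price and quantity relatives are the same ratio applied to different data, so one small helper covers both. In this sketch the function name `relative` is illustrative; the wheat prices are the Rs. 5 and Rs. 9 per kg quoted in the introduction.

```python
def relative(current, base):
    """Percent relative of a current-period value against a base-period value."""
    return 100 * current / base

# Wheat: Rs. 5 per kg in the base year 1990, Rs. 9 per kg in 1999.
wheat_price_relative = relative(9, 5)

# The increase over the base year is the relative minus 100.
percent_increase = wheat_price_relative - 100

print(wheat_price_relative, percent_increase)
```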

5.3 Price and Quantity Index Numbers

single index, we need to calculate the index number taking into account

all commodities. Such index numbers are called as aggregative index

numbers. There are two types in such index numbers: Simple aggregative

index numbers and weighted aggregative index numbers.

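$$\text{Simple aggregative price index} = \frac{\text{Sum of prices of all commodities of the current year}}{\text{Sum of prices of all commodities of the base year}} \times 100$$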

year. It is called a base weighted index because we use the quantities

purchased in the base year (here 1990) to weight the unit prices in both

years. Keeping the quantities constant in this way means that any change

in the calculated expenditure is due solely to price changes.

The Laspeyre’s price index is given by $\dfrac{\sum P_N Q_O}{\sum P_O Q_O} \times 100$.

ing price relatives. For this method, we have to use the expenditures in

the base year as weights. This sounds more complicated, but the reason we do this is that it is easier to obtain data on expenditure than on actual quantities bought when we are dealing with a large complicated index.

For example, cost of living weights are obtained by using sampling in the

Survey of House Expenditure. Indeed for some elements of the cost of

living expenses, ‘quantities’ don’t even make sense. You can’t really talk

about ‘quantities’ of public transport, for example.

Here is the general rule for working out the base weighted or Laspeyre’s price index using price relatives.

$$\text{Laspeyre's Price Index} = \frac{\sum \left(\dfrac{P_N}{P_O} \times 100\right) P_O Q_O}{\sum P_O Q_O}$$

Notice that cancelling the $P_O$ above and below on the top line and taking out the factor of 100 gives us the same answer as before.

$$\frac{\sum P_N Q_O}{\sum P_O Q_O} \times 100$$

The end weighted or Paasche’s price index $= \dfrac{\sum P_N Q_N}{\sum P_O Q_N} \times 100$

Paasche’s Quantity Index $= \dfrac{\sum Q_N P_N}{\sum Q_O P_N} \times 100$

The base weighted index has the advantage that we only have to

work out the base year expenditures once. We can then use these in the

calculation of the index in any subsequent period. However, this index

can be misleading in telling us what is actually going on. For example,

the fluctuations in fashion might have a considerable impact on an index.

Suppose that skirts were considered as a separate item in a women’s

clothing manufacturer’s index. The greatly increased relative popularity

of trousers would dramatically affect the quantities sold and any index

which used base year quantities from some time back would be mislead-

ing. The next index that we consider avoids this particular problem.

and Laspeyre’s indices.

only the current year’s prices are required for the calculation of the index,

for the Paasche’s Price Index then the current year quantities and prices

are required. Laspeyre’s Price Index is slightly easier to calculate and

can be obtained in situations where the current year’s quantities are un-

known.

and topical, especially where the quantities may have changed dramatically.

data comes along, as the current period will have changed. With substantial quantity changes, the previously calculated Paasche’s Price Indices may take substantially different values from the newly calculated ones based upon the new current year quantities. Laspeyre’s Price Index does not need to be recalculated, as the base period remains the same.

In summary, the Paasche Price Index is slightly more relevant and has a better interpretation, as it uses current quantities, but it is slightly more inconvenient to calculate. If both quantities and prices are readily available, then it is a simple matter to calculate both indices and the combined index. Substantial differences between the Laspeyre’s and Paasche Price Indices convey information about the changes in prices and quantities.

quantities bought. We’ll suppose first that we are particularly interested

in price changes over time. In complicated situations, where we need to

compare the prices of many items over many different time intervals (such

as for the Retail Price Index) we work with the different prices, and use

the quantities to weight them in different ways for different index

numbers.

5.6 Illustration

Commodity   Price (Rs/kg)    Quantity (tons)   Weight: base year quantity   Weight: current year quantity   Weight: any suitable quantity
            1990    1999     1990    1999      PO QO     PN QO              PO QN     PN QN                 PO W     PN W     W
Wheat       5       9        500     1200      2500      4500               6000      10800                 3000     5400     600
Rice        4       10       600     1800      2400      6000               7200      18000                 3200     8000     800
Cereals     7       14       400     900       2800      5600               6300      12600                 2100     4200     300

Quantity Index Calculation

Commodity   Price (Rs/kg)    Quantity (tons)   (PN/PO) × 100   (QN/QO) × 100
            1990    1999     1990    1999
Wheat       5       9        500     1200      180             240
Rice        4       10       600     1800      250             300
Cereals     7       14       400     900       200             225

QO Quantities of Base Year (1990)

QN Quantities of Current Year (1999)

PO Prices of Base Year (1990)

PN Prices of Current Year (1999)

W It is any standard value other than base and current year quan-

tities.

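$\sum P_O = 16$

$\sum P_N = 31$

$\sum Q_O = 1500$

$\sum Q_N = 3900$

$\sum P_O Q_O = 7700$

$\sum P_O Q_N = 19500$

$\sum P_N Q_O = 16100$

$\sum P_N Q_N = 41400$

2. Simple average of price relatives $= \dfrac{\sum \left(\frac{P_N}{P_O} \times 100\right)}{\text{Number of Commodities}} = \dfrac{630}{3} = 210$

3. Laspeyre’s Price Index $= (\sum P_N Q_O / \sum P_O Q_O) \times 100$ = (16100 / 7700) × 100 = 209.09

4. Laspeyre’s Quantity Index $= (\sum Q_N P_O / \sum Q_O P_O) \times 100$ = (19500 / 7700) × 100 = 253.25

5. Paasche’s Price Index $= (\sum P_N Q_N / \sum P_O Q_N) \times 100$ = (41400 / 19500) × 100 = 212.31

6. Paasche’s Quantity Index $= (\sum Q_N P_N / \sum Q_O P_N) \times 100$ = (41400 / 16100) × 100 = 257.14

7. Fixed weight price index $= (\sum P_N W / \sum P_O W) \times 100$ = (17600 / 8300) × 100 = 212.05

8. Fixed weight quantity index = (32700 / 12700) × 100 = 257.48

9. Value index $= (\sum P_N Q_N / \sum P_O Q_O) \times 100$ = (41400 / 7700) × 100 = 537.66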
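The sums and indices in this illustration can be recomputed from the price and quantity columns. The sketch below uses the wheat, rice and cereals data from the table; the names `rows` and `total` are illustrative, and the fixed-weight indices are left out because they need the extra weight column.

```python
# (commodity, price 1990, price 1999, quantity 1990, quantity 1999)
rows = [
    ("Wheat", 5, 9, 500, 1200),
    ("Rice", 4, 10, 600, 1800),
    ("Cereals", 7, 14, 400, 900),
]

def total(term):
    """Sum term(po, pn, qo, qn) over all commodities."""
    return sum(term(po, pn, qo, qn) for _, po, pn, qo, qn in rows)

po_qo = total(lambda po, pn, qo, qn: po * qo)   # base prices, base quantities
pn_qo = total(lambda po, pn, qo, qn: pn * qo)   # current prices, base quantities
po_qn = total(lambda po, pn, qo, qn: po * qn)   # base prices, current quantities
pn_qn = total(lambda po, pn, qo, qn: pn * qn)   # current prices, current quantities

laspeyres_price = 100 * pn_qo / po_qo
laspeyres_quantity = 100 * po_qn / po_qo
paasche_price = 100 * pn_qn / po_qn
paasche_quantity = 100 * pn_qn / pn_qo
value_index = 100 * pn_qn / po_qo

for name, value in [
    ("Laspeyres price", laspeyres_price),
    ("Laspeyres quantity", laspeyres_quantity),
    ("Paasche price", paasche_price),
    ("Paasche quantity", paasche_quantity),
    ("Value", value_index),
]:
    print(f"{name}: {value:.2f}")
```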

listed in the following table.

Bowley index: $P_B = \tfrac{1}{2}\,(P_L + P_P)$

Fisher index: $P_F = \sqrt{P_L \times P_P}$

WP VO

Pn

VO

P1

W

P P = P 100 & W P0 P1

o 0

Harmonic mean index PH [ p n qo / ( po 2 qo / pn ] X

100

Marshall-Edgeworth index: $P_{ME} = \dfrac{\sum p_n (q_o + q_n)}{\sum p_o (q_o + q_n)} \times 100$

Walsh index: $P_W = \dfrac{\sum p_n \sqrt{q_o q_n}}{\sum p_o \sqrt{q_o q_n}} \times 100$
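To see how the Bowley, Fisher, Marshall-Edgeworth and Walsh indices compare in practice, they can be evaluated on the wheat, rice and cereals data from the illustration in section 5.6. This is a sketch using the standard definitions of these indices; the helper name `index` is illustrative.

```python
from math import sqrt

# (price 1990, price 1999, quantity 1990, quantity 1999) for wheat, rice, cereals
rows = [(5, 9, 500, 1200), (4, 10, 600, 1800), (7, 14, 400, 900)]

def index(numerator, denominator):
    """100 x (sum of numerator terms) / (sum of denominator terms)."""
    return 100 * sum(numerator(*r) for r in rows) / sum(denominator(*r) for r in rows)

laspeyres = index(lambda po, pn, qo, qn: pn * qo, lambda po, pn, qo, qn: po * qo)
paasche = index(lambda po, pn, qo, qn: pn * qn, lambda po, pn, qo, qn: po * qn)

bowley = (laspeyres + paasche) / 2            # arithmetic mean of the two
fisher = sqrt(laspeyres * paasche)            # geometric mean of the two
marshall_edgeworth = index(
    lambda po, pn, qo, qn: pn * (qo + qn),
    lambda po, pn, qo, qn: po * (qo + qn),
)
walsh = index(
    lambda po, pn, qo, qn: pn * sqrt(qo * qn),
    lambda po, pn, qo, qn: po * sqrt(qo * qn),
)

print(round(bowley, 2), round(fisher, 2), round(marshall_edgeworth, 2), round(walsh, 2))
```

For this data all four compromise indices lie between the Laspeyres and Paasche values or very close to them, which is the usual reason for preferring them when the two base indices disagree.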

One problem in the construction of any index number is choos-

ing a suitable base period. We want a base where prices ( or Volumes )

were not unnaturally high or low. An example of this would be if bad

weather had caused an extreme shortage in a particular crop which then

led to a very high price for it. Also, people are not happy with a base

period which is too far in the past. Furthermore, tastes and availability

can change a great deal over time so such an index could be seriously

misleading. One way sometimes used to avoid these problems is to use

a chain-based system where, in calculating successive index numbers,

the base used is the previous period. A chain-based index number is

particularly suited for period by period comparisons, but a fixed-based

index number makes it easier to compare the movement of prices over

time.

Fill in the blanks:

1. Index Numbers are called ________________ .

2. For Index Number period with which comparison is made is

called as ________________.

3. Po is the price in________________.

4. PN is the price in ________________.

5. The Laspeyre’s Price Index Number is given by ________________.

6. The Paasche’s Quantity Index = ________________.

7. QO = ________________.

8. QN = ________________.

9. Price in the base year is denoted by ________________.

10. Quantity in the current year is denoted by ________________.

a) Food: cereals; meat and fish; fruits and vegetables; miscella-

neous food such as dairy products, sugar, tea, pies, etc.

b) Drinks, tobacco and betel nut: soft drinks ( treated waters and

cordials); alcoholic drinks, cigarettes tobacco, betel nut.

c) Clothing and footwear: men and boys’ clothing; women and girls’

clothing; other clothing such as nappies, accessories, etc; foot-

wear.

d) Rent, council charges, fuel and power: dwelling rentals; council

charges for water, sewerage and garbage disposal; electricity

and kerosene

e) Household equipment and operation: durable goods (e.g. sewing machine, kerosene stove); semi-durable goods (e.g. sheets, tableware); non-durable goods (e.g. matches, laundry soap, insecticides).

f) Transport and communication: motor vehicle purchase; motor vehicle operation (petrol, oil, repairs, parts, accessories, licenses and insurance); airline, taxi, bus and public motor vehicle (PMV) fares; telephone and postal charges.

g) Miscellaneous: medical and health care; entertainment and cultural goods and services (e.g. sound equipment, newspapers and magazines, cinema admissions, education fees); other goods (e.g. items for personal care, writing and drawing materials).

a weighted price index for the economy – this is the sort of data which is

then used to calculate the rate of inflation

Category    Price Index (I)    Weighting (W)    Price × weight (I × W)
Total                          100              10437

The price index for each category shows what has happened to the price level since a base year value. To generate a weighted price index we multiply the price index for each category by its weight and then sum these. We then divide by the sum of the weights (100) to find an overall price index (104.37), or 104.4 rounded to one decimal place.
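The weighting rule described above can be sketched as a small function. The totals 10437 and 100 are taken from the text; the three-category basket is purely hypothetical, since the individual category rows of the table are not available here.

```python
def weighted_price_index(categories):
    """Overall index from (price_index, weight) pairs: sum(I x W) / sum(W)."""
    total_weight = sum(weight for _, weight in categories)
    weighted_sum = sum(index * weight for index, weight in categories)
    return weighted_sum / total_weight

# Totals quoted in the text: sum of (index x weight) = 10437, sum of weights = 100.
overall_from_text = 10437 / 100

# Hypothetical basket (illustrative values only, not from the source table).
hypothetical_basket = [(104.0, 40), (106.0, 35), (102.0, 25)]
overall_hypothetical = weighted_price_index(hypothetical_basket)

print(round(overall_from_text, 1), overall_hypothetical)
```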

Here is some real world data on a selection of price indices for goods and services.

Items cars Hand Care

1996 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

Over the period 1996-2003 there has been a 10% rise in the general price level. But this hides major changes in average prices for different products. The average cost of purchasing tobacco products has jumped by nearly sixty per cent, whereas the prices of clothing, second hand cars and communication have been falling.

Index Numbers for prices in the year 2004 with 2009, where 2004 is the base year, from the following data.

Commodity    Base year (2004)           Current year (2009)
             Price PO    Quantity QO    Price P1    Quantity Q1
Rice         4           15             6           20
Wheat        3           40             5           35
Jawar        5           20             5           25
Pulses       6           10             8           10

Commodity Base year Current year

Price Quantity Price Quantity

A 8 50 10 60

B 10 40 12 50

C 5 100 9 70

D 6 10 8 20

5.9 Summary

measuring the relative changes in the value of a variable or of a group of related variables from one period or place to another.

Index Numbers are also called Economic Barometers. We use Index Numbers for measuring the relative changes in the price of a single commodity or a group of commodities.

5.7

1. Economic Barometers 2. Base year

3. Base year 4. Price in current year

5. $\dfrac{\sum P_N Q_O}{\sum P_O Q_O} \times 100$    6. $\dfrac{\sum Q_N P_N}{\sum Q_O P_N} \times 100$

9. PO 10. QN

5.8

1. Laspeyre’s Index Number = 138.2

2. Paasche’s Index Number = 135.135

3. Fisher’s Index Number = 136.67.
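These three answers can be reproduced from the four-commodity table in the exercise (base year 2004, current year 2009). This is a sketch for checking the arithmetic; rows are listed in the order they appear in the table.

```python
from math import sqrt

# (base price, base quantity, current price, current quantity) for each commodity
rows = [(4, 15, 6, 20), (3, 40, 5, 35), (5, 20, 5, 25), (6, 10, 8, 10)]

p1_q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)   # current prices, base quantities
p0_q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)   # base prices, base quantities
p1_q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)   # current prices, current quantities
p0_q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)   # base prices, current quantities

laspeyres = 100 * p1_q0 / p0_q0
paasche = 100 * p1_q1 / p0_q1
fisher = sqrt(laspeyres * paasche)

print(round(laspeyres, 1), round(paasche, 3), round(fisher, 2))
```

The Fisher value computed from unrounded inputs may differ from the quoted answer in the last decimal place, depending on how the intermediate indices are rounded.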

1) Index Number 2) Laspeyre’s Index Number

3) Advantages of Index Number 4) Consumer’s price Index.


QUESTION BANK

tries reported the following data for the amount of time 4-year-old children spent alone with their fathers each day.

Time with

Country Dad (minutes)

India 60

Belgium 30

Canada 44

China 54

Finland 50

Germany 36

Nigeria 42

Sweden 46

United States 42

For the above sample, determine the following measures:

a. The mean

b. The standard deviation

c. The variance

d. The mode

e. The 75th percentile

subjects were. The following represents their response (M =

Management; A = Accounting; E = Economics; O = Others)

A M M A M M E M O A

E E M A O E M A M A

M A O A M E E M A M

b. Construct a relative frequency distribution.

3. The frequency distribution below was constructed from data col-

lected on the quarts of soft drinks consumed per week by 20

employees of a garden centre.

0-3 4

4-7 5

8-11 6

12-15 3

16-19 2

a. Construct a relative frequency distribution.

b. Construct a cumulative frequency distribution.

c. Construct a cumulative relative frequency distribution.

Rs. 90,000 with a standard deviation of Rs. 180. In 2003, the

average donation was Rs. 1,60,000 with a standard deviation of

Rs. 240. In which year do the donations show a more dispersed

distribution?

5. Profit after tax for a company for the last six years is as given

below. Draw a bar diagram.

ate the advantages of sample method over complete enumera-

tion.

data. What is their purpose?

as a percentage is given below. Construct a histogram from the

given data.

Inventory to sales rate (Percentage) No. of Companies

0 – 5.0 1

5.0 – 10.0 4

10.0 – 15.0 10

15.0 - 20.0 20

20.0 – 25.0 50

25.0 – 30.0 80

30.0 – 35.0 60

35.0 – 40.0 65

40.0 – 45.0 30

9. Find the median, lower and upper quartiles, 4th decile and 70th percentile for the following distribution.

Dividend Yield 0-4 4-8 8-12 12-14 14-18 18-20 20-25 25 above

Number of 10 12 18 7 5 8 4 6

Companies

What is the expected value and variance of daily revenue (Y)

from the machine, if X the number of cans sold per day has

E (X) = 125, and Var (X) = 50?

based upon previous claims

Percent loss | 0 25 50 100

Probability | .90 .05 .02 ????

their expected loss in Rs/hectare is approximately.

12. A rock concert producer has scheduled an outdoor concert. If it

is warm that day, she expects to make a Rs. 20,000 profit. If it

is cool that day, she expects to make a Rs. 5000 profit. If it is

very cold that day, she expects to suffer a Rs. 12,000 loss.

Based upon historical records, the weather office has estimated the chances of a warm day to be 0.60 and the chances of a cool day to be 0.25. What is the producer’s expected profit?

14. If four coins are tossed once, write down the sample space.

15. If three units are tested, each unit will be either Good (G) or

defective (D). Write down the sample space for testing of 3 units.

16. A box contains 200 bulbs of which 20 are defective. If one bulb is selected at random, find the probability that it is non-defective. (Ans: 0.9)

getting exactly 2 hours?

lege is 0.3. If 5 students from the same school apply, what is

the probability that at most 2 are accepted?

19. What is the probability that the series which ends when a team

wins 4 games will last 4 games? 5 games? 6 games? 7 games?

Assume that the teams are evenly matched.

drivers and 5000 truck drivers. The probability of an accident is 0.05, 0.02 and 0.10 respectively in the case of scooter, car and truck drivers. One of the insured persons meets with an accident. What is the probability that he is a car driver?

21. In a certain university the percentages of Hindus, Muslims and Christians among students are 50, 25 and 25 respectively. If 50% of Hindus, 90% of Muslims and 80% of Christians are smokers, find the probability that a randomly selected student is a Muslim. Use Bayes’ theorem.

which contribute 30%, 20%, 28%, 22% respectively, to the total

output. It was observed that these sections produced 1%, 2 %,

3% and 4% defective units respectively. If a unit is selected at random and found to be defective, what is the probability that it is from S1 or S4?

Mean 10 90

S. D. 3 12

a. Calculate the two regression lines.

b. Find the likely sales when advertising expenditure is Rs

15 lakhs

c. What should be the advertising expenditure if the company has a sales target of Rs. 120 lakhs?

X 43 44 46 40 44 42 45 42 38 40 472 57

Y 29 31 19 18 19 27 27 29 41 30 26 10

25. Calculate Karl Pearson’s Coefficient of Correlation for the data given below, taking 66 and 63 as the assumed means of x and y respectively.

Height of Husband x 60 62 64 66 68 70 72

(in inches)

the following data.

X 1 2 3 4 5 6 7 8 9 10

Y 20 16 14 10 10 9 8 7 6 5

a. Coefficient of determination

b. Rank correlation.
