AMF 104 Quantitative Analysis in Management Book PDF

AMF104
QUANTITATIVE ANALYSIS IN MANAGEMENT

PREFACE
Decision making is a fundamental part of the management process and it pervades the
activities of every manager. In fact it is the managers competence as a decision maker
that enables us to distinguish between a good manager and a bad one. Modern
management is adopting and applying quantitative techniques to aid the process of
decision-making in an ever increasing measure. This is due the fact that an intelligent
application of the appropriate tools can reduce an otherwise unweildy and complex
problem to one of the mangeable dimensions. Of course, the quantitative techniques are
expected only to supplement and not to suppliant the managers senss of dcisions
making. He can use analytical tools in a wise manner only he comprehends fully the
underlying assumptions : what the analyses acheives, what compromises the model used
makes with reality, and above all how the conclusions derived are to be adapted to the
dynamic environmental conditions. Doubtless, however a knowledge of quantitative
analysis is a boon to the manager.
In the present book, a modest attempt has been made to discuss some of the commonly
used quantitative techniques in a wide spectrum of decision making situations. This book
is intended to provide a comprehensive presentation of the essential topics of quantitative
techniques.
The book presents the application of various techniques through a sufficient number of
examples. Answers to the end chapter quizzes have been provided at the end of this book.
Kumar Saurabh
Master of Finance & Control Copyright Amity university India
Page 1
AMF104
CONTENTS
1.1 Statistics...............................................................................................................................................3
Internal Data ..................................................................................................................................................5
External Data .................................................................................................................................................5
Secondary Data..............................................................................................................................................5
1.2 Introduction to SPSS, SAS and other Statistical Software Packages ..................................................9
1.3 Diagrammatic & Graphical Presentation of Data: Bar Diagram, Histogram, Pie Diagram,
Frequency Polygons, and O gives ...........................................................................................................10
2.1 Central Tendency ..............................................................................................................................32
2.2 Median ...............................................................................................................................................42
2.3 Mode..................................................................................................................................................45
2.4 Dispersion..........................................................................................................................................46
Page 2
AMF104
CHAPTER 1
INTRODUCTION
1.1 Statistics
What do you mean by statistics? What does it bring to mind; matches records, election
records, unemployment figures? Or is it just the requirement of course? The application
of statistics is limitless it has very broad scope of applications to business, government,
and the physical and social sciences. It is for sure that the time to you spend studying this
subject will repay you in many ways.
The Random House College Dictionary defines statistics as "the science that deals with
the collection, classification, analysis, and interpretation of information or data."
Statisticians are trained in collecting numerical information in the form of data,
evaluating it, and drawing conclusions from it. Furthermore, statisticians determine what
information is relevant in a given problem and whether the conclusions drawn from a
study are to be trusted.
Application of Statistics in Business and Management
H. G. Wells, author of science fiction classics like The War of the Worlds and The Time
Machine once said, "Statistical thinking will one day be as necessary for efficient
citizenship as the ability to read and write." The media every day present us the published
results of political, economic, and social surveys. In increasing government emphasis on
drug and product testing, for example, we see vivid evidence of the need to be able to
evaluate data sets intelligently. Consequently, each of us has to develop a discerning
sense-an ability to use rational thought to interpret and understand the meaning of data.
This ability can help you make intelligent decisions, inferences, and generalizations; that
is, it helps you think critically using statistics.
Statistical thinking constitutes of application of rational thought and the science of
statistics to critically assess data and inferences. The variation that exists in populations
and process data is the fundamental thought of statistics.
Successful managers rely heavily on statistical thinking to help them make decisions.
Every managerial decision-making problem begins with a real-world problem. Flow
diagram for every managerial problem is shown in figure, as-
Page 3
AMF104
Managerial
formulation of Problem
Managerial Question
Relating to Problem
Statistical Formulation
of Question
Real
World
Statistic
al
Analysis
Managerial Solution
to Problem
Answer to Managerial
Question
Answer to
Statistical Question
This problem is then formulated in managerial terms and framed as a managerial

question. The next sequence of steps (proceeding counterclockwise around the flow
diagram) identifies the role that statistics can play in this process. The managerial
question is translated into a statistical question, the sample data are collected and
analyzed, and the statistical question is answered. The next step in the process is using
the answer to the statistical question to reach an answer to the managerial question. The
answer to the managerial question may suggest a reformulation of the original managerial
problem, suggest a new managerial question, or lead to the solution of the managerial
problem.
Basic Concepts of Statistical Studies:
Statistics is the science of data. It involves collecting, classifying, summarizing,
organizing, analyzing, and interpreting numerical information.
Data are collections of any number of related observations. Data can be of any nature as
we can collect total revenue earned by a particular store in a given time or we can
consider revenue generated by given number of stores on a given day, we can call our
result as data. A collection of data is called a data set, and a single observation as a data
point.
The collection of data is the basic step in any statistical investigation . The data are
generally classified in Following two groups:
1. Internal data
2. External data
Data can be obtained from two important sources, namely
a. Internal Source
b. External Source
Depending on the source we have internal data, external data.
Now we will discuss these two data in details.
Page 4
AMF104
Internal Data
Internal data comes from internal sources related with the Functioning of an organization
or firm where records regarding Purchase, production, sales are kept on a regular basis.
Since Internal data originate within the business, collecting the desired Information does
not usually offer much difficulty. The Particular procedure depends largely upon the
nature of facts being collected and the form in which they exist. The problem of internal
data can either be insufficient or inappropriate for the statistical enquiry into a
phenomenon.
External Data
The external data are collected and published by external agencies. It can be further
classified as
1. Primary data
2. Secondary data
Primary Data
Primary data are measurements observed and recorded as part of an original study. When
the data required for a particular study can be found neither in the internal records of the
enterprise, nor in published sources, it may become necessary to collect original data, i.e.,
to conduct first hand investigation. The work of collecting original data is usually limited
by time, money and manpower available for the study. When the data to be collected are
very large in volume, it is possible to draw reasonably accurate conclusions from the
study of a small portion of the group called a sample.
Secondary Data
In statistics the investigator need not begin from the very beginning, he may use and must
take into account what has already been discovered by others. Consequently before
starting a statistical investigation we must read the existing literature and learn what is
already known of the general area in which our specific problem falls. When an
investigator uses the data which has already been collected by others, such data are called
secondary data. Secondary data can be obtained from journals, reports, government
publications, publications of research organizations, etc. However, secondary data must
be used with utmost care. The reason is that such data may be full of errors because of
bias, inadequate size of the sample, substitution, errors of definition, arithmetical errors,
etc. Even if there is no error, secondary data may not be suitable and adequate for the
purpose of inquiry. Before using secondary data the investigator should examine the
following aspects:
a. Whether the data are suitable for the purpose of investigation: for example, if the
object of inquiry is to study the wage levels including allowances of workers and the
data relate to basic wages alone, such data would not be suitable for the immediate
purpose.
b. Whether the data are adequate for the purpose of the investigation: for example, if our
object is to study the wage rates of the workers in the cotton industry in India and if
the available data covers only Maharashtra, it would not solve the purpose.
Page 5
AMF104
c. Whether the data are reliable: to determine the reliability of secondary data is perhaps
the most important and at the same time most difficult job. The following tests if
applied, may be helpful to determine how far the given data are reliable :
1.
2.
3.
4.
5.
6.
Was the collecting agency unbiased or not?

If the enumeration was based on a sample, was the sample representative?
Were the enumerators capable and properly trained?
Was there a proper check on the accuracy of field work?
Were the editing, tabulating, and analysis carefully and conscientiously done?
What degree of accuracy was desired by the compiler? How far was it achieved?
Statistical methods are useful for studying, analyzing, and learning about populations. A
population is a set of units (people, objects, transactions, or events) that we are interested
in studying.
In studying a population, we focus on one or more characteristics or properties of the
units in the population. We call such characteristics variables. For example, we may be
interested in the variables age, gender, income, and/or the number of years of education
of the people currently employed in the Africa.
A variable is a characteristic or property of an individual population unit. The name
"variable" is derived from the fact that any particular characteristic may vary among the
units in a population. For statistical analysis we need numerical data but it is not always
possible to find out numerical one so we need to convert non numeric data in to
numerical data using measurement. Measurement is the process we use to assign
numbers to variables of individual population units. We might, for instance, measure the
preference for a soft drink product by asking a consumer to rate the product's taste on a
scale from 1 to 10.
If the population we wish to study is small, it is possible to measure a variable for every
unit in the population. When we measure a variable for every unit of a population, the
result is called a census of the population. The, I populations of interest in most
applications are much larger, involving perhaps many thousands or even an infinite
number of units.
A reasonable alternative would be to select and study a subset (or portion) of the units in
the population.
A sample is a subset of the units of a population. For example, suppose a company is
being audited for invoice errors. Instead of examining all 51,472 invoices produced by
the company during a given year, an auditor may select and examine a sample of just 100
invoices. On that basis he can measure the status as error or no error of each sampled
invoice.
Page 6
AMF104
Census Vs Sample
Census and Sample Method
Under the census or complete enumeration survey method, data are collected for each and
every unit (person, household, field, shop, factory, etc.) as the case may be of the
population or universe, which is the complete set of items, which are of interest in any
particular situation. For example, if the average wage of workers working in. sugar
industry in India is to be calculated, then wage figures would be obtained from each and
every worker working in the sugar industry and by dividing the total wages which all
these workers receive by the number of workers working in sugar industry, we would get
the figure of average wage.
Merits of Census method
The merits of the census method are:
accurate and reliable.

under some crops and yield thereof, the number a persons of certain age groups,
their distribution by sex, educational, level of people, etc. This is the reason why
throughout the world the population data are obtained by conducting a census
generally every 10 years by the census method.
Data of complete enumeration census can be widely used as a basis for various
surveys.
Demerits of Census Method

However, despite these advantages the census method is not very popularly used in
practice because of the following reasons1. The effort, money and time required for carrying out complete enumeration will
generally be very large and in many cases cost may be so prohibitive that the very idea of
collecting information may have to, be dropped. This is truer of underdeveloped countries
where resources constitute a big constraint.
2. Also if the population is infinite or the evaluation process destroys the population unit,
the method cannot be adopted.
What is universe in Statistics?
The word universe as used in Statistics denotes the aggregate from which the sample is
to be taken. For example, if in the year 1999 there are 200000 students in African
Page 7
AMF104
University and a sample of 5,000 students is taken to study their attitude towards
semester system, then 2,00,000 constitutes the universe and 5,000 the sample size. It
should also be noted that the universe may not necessarily comprise persons. It may
consist of any object or thing, example, if one is interested in knowing the number of cars
and buses in Africa then his universe will comprise the total number of cars and buses.
The universe may be either finite or infinite.
A finite universe is one in which the number of items is determinable, such as the
number of students in any African University or in India. An infinite universe is that in,
which the number of items cannot be determined, such as the number of stars in the sky.
In some cases, the universe is so large that for all practical Purposes it is regarded as
infinite, such as, the number of leaves on a tree.
Sampling is simply the process of learning about the population on the basis of a sample
drawn from it. Thus in the sampling technique instead of every unit of the universe only a
part of the universe is studied and the conclusions are drawn on that basis for the entire
universe. A sample is a subset of population units. The process of sampling involves
three elements:
a. Selecting the sample.
b. Col1ecting the information, and
c. Making an inference about the population.
The three elements cannot generally be considered in isolation from one another. Sample
selection, data collection and estimation are all interwoven and each has an impact on the
others. Sampling is not haphazard selection-it embodies definite rules for selecting the
sample. But having followed a set of rules for sample selection, we cannot consider the
estimation process independent of it estimation is guided by the manner in which the
sample has been selected.
Practical examples of sampling
Although much of the development in the theory of sampling has taken place only in
recent years, the idea of sampling is pretty old. Since times immemorial people have
examined a handful of grains to ascertain the quality of the entire lot. A housewife
examines only two or three grains of boiling rice to know whether the pot of rice is ready
or not. A doctor examines a few drops of blood and draws conclusion about the blood
constitution of the whole body. A businessman places orders for material examining only
a small sample of the same. A teacher may put questions to one or two students and find
out whether the class as a whole is following the lesson. In fact there is hardly any field
where the technique of sampling is not used either consciously or unconsciously.
Page 8
AMF104
It should be noted that a sample is not studied for its own sake. The basic objective of its
study is to draw inference about the population. In other words, sampling is a tool which
helps to know the characteristics of the universe or population by examining only a small
part of it. The values obtained from the study of a sample, such as the average and
dispersion, are known as statistics. On the other hand, such values for population are
called parameters.
What is a Theoretical Basis of Sampling?
Theoretical Basis of Sampling
On the basis of sample study we can predict and generalize the behavior of mass
phenomena. This is possible because there is no statistical population whose elements
would vary from each other without limit For example, wheat varies to a limited extent in
color, protein content, length, weight, etc., it can always be identified as wheat. Similarly,
apples of the same tree may vary in size, color, taste, weight, etc., but they can always be
identified as apples. Thus we find that although diversity is a universal quality of mass
data, every population has characteristic properties with limited variation. This makes
possible to select a relatively small unbiased random sample that can portray fairly well
the traits the population.
1.2 Introduction to SPSS, SAS and other Statistical Software Packages

SPSS means Statistical Package for the Social Sciences is a computer program used for
statistical analysis. SPSS is among the most widely used programs for statistical
analysis in social science. It is used by market researchers, health researchers, survey
companies, government, education researchers, marketing organizations and others. In
addition to statistical analysis, data management (case selection, file reshaping, creating
derived data) and data documentation (a metadata dictionary is stored in the datafile) are
features of the base software.
Statistics included in the base software:
Descriptive statistics: Cross

Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial,

distances), Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis, cluster analysis (two-step, Kmeans, hierarchical), Discriminant.
tabulation, Frequencies,
Descriptives,
Explore,
Page 9
AMF104
SAS stands for Statistical Analysis System. It is a statistical and information system that
performs sophisticated data management and statistical analysis. SAS is available in
multiple computing environments. SAS (Statistical Analysis System) is an integrated
system of software products provided by SAS Institute that enables the programmer to
perform:
data entry, retrieval, management, and mining
report writing and graphics
statistical analysis
business planning, forecasting, and decision support
operations research and project management
quality improvement
applications development
data warehousing (extract, transform, load)
platform independent and remote computing
In addition, SAS has many business solutions that enable large scale software solutions
for areas such as IT management, human resource management, financial
management, business intelligence, customer relationship management and more.
1.3 Diagrammatic & Graphical Presentation of Data: Bar Diagram,

Histogram, Pie Diagram, Frequency Polygons, and O gives
The most convincing and appealing ways in which statistical results may be presented is
through diagrams and graphs. Just one diagram is enough to represent a given data more
effectively than thousand words. Moreover even a layman who has nothing to do with
numbers can also understands diagrams. Evidence of this can be found in newspapers,
magazines, journals, advertisement, etc. An attempt is made in this chapter to illustrate
some of the major types of diagrams and graphs frequently used in presenting statistical
data.
The Charts
A chart can take the shape of either a diagram or a graph. For the sake of clarity we will
discuss them under two separate heads:
a. Diagrams, and
b. Graphs
Diagrams
A diagram is a visual form for presentation of statistical data, highlighting their basic
facts and relationship. If we draw diagrams on the basis of the data collected they will
easily be understood and appreciated by all. It is readily intelligible and save a
considerable amount of time and energy.
Page 10
AMF104
Diagrams and graphs are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
There are broadly four types of diagrams:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three- dimensional diagrams
4. Pictographs and cartograms
One-dimensional diagrams:
In such diagrams, only one-dimensional measurement, i.e height is used and the width is
not considered. These diagrams are in the form of bar or line charts and can be classified
as
1. Line Diagram
2. Simple Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line Diagram:
Line diagram is used in case where there are many items to be shown and there is not
much of difference in their values. Such diagram is prepared by drawing a vertical line
for each item according to the scale. The distance between lines is kept uniform. Line
diagram makes comparison easy, but it is less attractive. This is the simplest form of
diagram. The height of each line indicates the value of an item that is being measured.
The line diagram is drawn taking a suitable scale.
Example:
Show the following data by a line chart:
No. of Children
Frequency
0
10
1
14
2
9
3
6
4
4
5
2
Page 11
AMF104
Simple Bar Diagrams

Unlike the line diagram, a simple bar diagram shows a width or column. Simple bar
diagram can be drawn either on horizontal or vertical base, but bars on horizontal base
more common. Bars must be uniform width and intervening space between bars must be
equal. While constructing a simple bar diagram, the scale is determined on the basis of
the highest value in the series. It is used to represent only one variable. Suppose we have
the production of an item X for three years.
Example:
Represent the following data by a bar diagram.
Solution:
Multiple Bar Diagram:

When two or more interrelated series of data are depicted by a bar diagram, then such a
diagram is known as a multiple-bar diagram. Suppose we have export and import figures
for a few years. We can display by two bars close to each other, one representing exports
while the other representing imports figure shows such a diagram based on hypothetical
data. Multiple bar diagram is used for comparing two or more sets of statistical data. Bars
are constructed side by side to represent the set of values for comparison. In order to
distinguish bars, they may be either differently colored or there should be different types
of crossings or dotting, etc.
Example:
Draw a multiple bar diagram for the following data.
Page 12
AMF104
Solution:
Subdivided or Component Bar Diagram

As the name of this diagram implies, it shows subdivisions of components in a single bar.
For example, a bar diagram may show the composition of revenue expenditure of the
Government of India. In a sub-divided bar diagram, the bar is sub-divided into various
parts in proportion to the values given in the data and the whole bar represent the total.
Such diagrams are also called Component Bar diagrams. The sub divisions are
distinguished by different colours or crossings or dottings.
The main defect of such a diagram is that all the parts do not have a common base to
enable one to compare accurately the various components of the data.
Example:
Represent the following data by a sub-divided bar diagram.
Solution:
Page 13
AMF104
Percentage bar diagram:

This is another form of component bar diagram. Here the components are not the actual
values but percentages of the whole. The main difference between the sub-divided bar
diagram and percentage bar diagram is that in the former the bars are of different heights
since their totals may be different whereas in the latter the bars are of equal height since
each bar represents 100 percent. In the case of data having sub-division, percentage bar
diagram will be more appealing than sub-divided bar diagram.
Since these data are in absolute terms, we have to convert them into percentages. This has
been done in the following table.
Calculation of Percentage Expenditure under Different Heads
Page 14
AMF104
Broken Bars
In some cases we find that the data may have very wide variations in values in the sense
that some values may be very large while others extremely small. In such cases, larger
bars may be broken to provide space for the smaller bars. An illustration will make this
clear.
Example:
Suppose we are given data pertaining to the number of students enrolled in certain
faculties in a university. The data regarding the enrolment are given below:
Faculty
Number of Students
Arts
1500
Science
2000
Commerce
6800
Management
500
These data are now shown in Fig.
Deviation Bar Diagrams

Deviation bars are used to show both positive and negative values. For instance, we may
show net profit data for a company for some years. It is possible that in one or two years,
instead of earning net profit the company might have sustained net loss. In such a case,
the data on net profit will be displayed above the base line while the data on net loss
below it. The bars can show the absolute data or the percentages as may be thought
proper. Let us take an example
Suppose we have the following data relating to rates of change (percent) in agricultural
production (all crops) for a few years.
Page 15
AMF104
Duo-directional Bar Diagram

A duo-directional bar diagram as its name indicates is a diagram on both sides of the axis
of X. One component is shown above the horizontal line while the other is shown below
it. The two components taken together give the total value of the item displayed. This
will be clear from the example given below.
Suppose we have the following data for two years-1998-99 and 1999-2000.
Items
1998 -99 ('000 Rs)
1999 -2000 ('000 Rs)
Total earnings from production
100
80
Cost of production (all expenses)
60
55
Net Income
40
25
Sliding Bar Diagram

A sliding bar diagram is similar to the duo-directional bar diagram. The only difference
between the two is that the latter is based on the absolute values, as we have seen above,
while the former is based on percentages. Sliding bar diagrams can be shown in either
way - horizontally or vertically.
Page 16
AMF104
Example:
Suppose we are given the following data to be presented by a sliding bar diagram. Results
of the B.Com. examination of three colleges affiliated to a certain university
College
Pass
Fail
A
60
40
B
75
25
C
80
20
Pyramid Diagram
A pyramid diagram shows a number of horizontal bars, which are arranged in such a
manner as to give an appearance of a pyramid. Such diagrams are suitable to present data
on population, occupation, education, and so forth. Suppose we have the following data
pertaining to the inhabitants of a locality.
Figure displays these data in the form of a pyramid diagram.
Two-dimensional Diagrams:
In one-dimensional diagrams, only length is taken into account. But in two-dimensional
diagrams the area represents the data and so the length and breadth have both to be taken
Page 17
AMF104
into account. Such diagrams are also called area diagrams or surface diagrams. The
important types of area diagrams are:
The three forms of such diagrams are:
1. rectangles,
2. squares and
3. circles,
Rectangles:
Rectangles are used to represent the relative magnitude of two or more values. The area
of the rectangles is kept in proportion to the values. Rectangles are placed side by side for
comparison. When two sets of figures are to be represented by rectangles, either of the
two methods may be adopted. We may represent the figures as they are given or may
convert them to percentages and then subdivide the length into various components. Thus
the percentage sub-divided rectangular
diagram is more popular than sub-divided rectangular since it enables comparison to be
made on a percentage basis.
Example:
Represent the following data by sub-divided percentage rectangular diagram.
Solution:
The items of expenditure will be converted into percentage as shown below:
Page 18
AMF104
Percentage Rectangular Diagram

The foregoing rectangular diagrams were based on the absolute values of different items
of expenditure, total cost, total revenue and profit. The same data on these items can be
presented in the form of percentage rectangular diagrams. In such a case, the area
pertaining to each item represents its percentage to the total, which is taken as 100. For
this purpose, we have to first convert absolute values in terms of percentages and then use
them in the rectangular diagram. These percentages for different items for the two firms
A and B are shown below.
Percentages to the total
Item
Raw material
cost
Labor cost
Other
overhead
expenses
Miscellaneous
expenses
Total cost
Profit
Total revenue
Firm A
Percentage
Cumulative
Percentage
33.3
33.3
Firm B
Percentage
Cumulative
Percentage
38.3
38.3
23.3
13.4
56.6
70
16.4
8.2
54.7
62.9
10
80
2.7
65.6
20
100
80
100
34.4
100
65.6
100
Page 19
AMF104
Squares:
In a square diagram the length and width are of the same dimension. Sometimes a square
diagram is preferable to rectangle. This is because when a comparison is involved
between any two phenomena but their magnitudes are widely different from each other, it
is difficult to draw rectangles on the same page. One rectangle would have very wide
width while the other a very small width. This apart, a meaningful comparison between
the two diagrams would be quite difficult. To overcome this difficulty, square diagrams
are preferred. The method of drawing a square diagram is very simple. One has to take
the square root of the values of various item that are to be shown in the diagrams and then
select a suitable scale to draw the squares.
Example :
Yield of rice in Kgs. per acre of five countries are
Represent the above data by Square diagram.

Solution: To draw the square diagram we calculate as follows:
Page 20
AMF104
Circular or Pie Diagrams
Another type of diagram, which is more commonly used than the square diagrams, is the
circular or pie diagram. In such diagrams, both the total and the component parts or
sectors can be shown. The area of a circle is proportional to the square of its radius.
While making comparisons, pie diagrams should be used on a percentage basis and not
on an absolute basis. In constructing a pie diagram the first step is to prepare the data so
that various components values can be transposed into corresponding degrees on the
circle. The second step is to draw a circle of appropriate size with a compass. The size of
the radius depends upon the available space and other factors of presentation. The third
step is to measure points on the circle and representing the size of each sector with the
help of a protractor.
Example:
Draw a Pie diagram for the following data of production of sugar in quintals of various
countries.
Solution:
The values are expressed in terms of degree as follows.
Page 21
AMF104
The pie diagram is also known as an angular sector diagram though in common usage the
term pie diagram is used. It is advisable to adopt some logical arrangement, pattern or
sequence while laying out the sectors of a pie chart. Usually, the largest sector is given at
the top and others in a clock-wise sequence. The pie chart should also provide
identification for each sector with some kind of explanatory or descriptive label. If the
space within the chart is sufficient, the labels can be placed inside the sectors, otherwise
these should be shown outside the circle, using an arrow or pointing out to the concerned
sector.
Limitations of pie diagrams
There are certain limitations of pie diagrams.
1. They are not as effective as bar diagrams for accurate reading and interpretation. This
limitation becomes all the more obvious when the series are divided into a large number
of components or the differences among the components are too small. When a series
comprises more than five or six categories, pie chart would not be a proper choice since it
would be confusing to differentiate the relative values of several small sectors having
more or less the same size.
2. Although pie diagram is frequently used, it turns out to be inferior to a bar diagram
whether it is simple bar or a divided bar or a multiple bar.
Three-dimensional diagrams:
Three-dimensional diagrams, also known as volume diagram, consist of cubes, cylinders,
spheres, etc. In such diagrams three things, namely length, width and height have to be
taken into account. Of all the figures, making of cubes is easy. Side of a cube is drawn in
proportion to the cube root of the magnitude of data.
Example:
Represent the following data by volume diagram.
Solution:
The sides of cubes can be determined as follows
Page 22
AMF104
Pictograms and Cartograms:

Pictograms are not abstract presentation such as lines or bars but really depict the kind of
data we are dealing with. Pictures are attractive and easy to comprehend and as such this
method is particularly useful in presenting statistics to the layman. When Pictograms are
used, data are represented through a pictorial symbol that is carefully selected.
Cartograms or statistical maps are used to give quantitative information as a geographical
basis. They are used to represent patial distributions. The quantities on the map can be
shown in many ways such as through shades or colours or dots or placing pictogram in
each geographical unit.
Choice of a Suitable Diagram
The foregoing discussion shows that a variety of diagrams are available, which can be
used to display statistical data. At times it may be difficult for one to select a particular
type of diagram from among several types. However, there are some major guidelines
that should be taken into consideration while deciding on the mode of data presentation.
These guidelines are:
1. The nature of data
2. Purpose of the diagrammatic presentation
3. The type of audience or people for whose understanding the diagrammatic presentation
is made.
The first consideration is the type of data to be displayed. As we have seen earlier, when
there are wide variations in the data, one-dimensional diagrams may not be suitable.
Instead, two diagrams seem to be more appropriate. Apart from the nature of the data, the
purpose of the diagram is also a relevant factor while deciding as to how data should be
displayed. As far as the data along with the break-up in the form of several components
are concerned, subdivided bar diagrams would be suitable. Circle diagrams can also be
used in such cases. However, if the magnitudes of components are very small, circular
diagrams would not be the right choice as the angles showing the magnitudes would be
so small that it would be difficult to show them distinctly. Finally, the type of audience,
that is, the people who would be using the diagram or for whose understanding it is being
drawn should be considered. Simple diagrams, which are easily understandable, should
be used for the layman. Psychologists hold the view that most people are able to compare
bar heights with greater ease than compare pie slices. As all these considerations are very
relevant, one should consider the pros and cons of different modes of display in each case
before making a final choice. Above all, one should be aware of a major limitation of
diagrams. While they can be useful up to a certain point, they cannot take the place of
tabular form of data presentation.
Page 23
AMF104
GRAPHICAL PRESENTATION OF DATA

Like diagrammatic presentation, graphical presentation also gives a visual effect.
Diagrammatic presentation is used to present data classified according to categories and
geographical aspects. On the other hand, graphical presentation is used in situations when
we observe some functional relationship between the values of two variables. There are
many forms of graphs, which can be broadly classified as:
1. Time series or line graphs
2. Frequency graphs.
Time Series Graph
As the name implies, it shows the data against time, which could be any measure such as
hours, days, weeks, months and years. It may be noted that the time series graphs are very
commonly used where one variable is the time factor. The data on population, number of
workers in an industrial unit or the entire industry, number of students enrolled in a
college, profits earned by a company, sales of a given product are some of the examples
where time series can be effectively used to present the given series. It may also be noted
that in the same graph we can present two or more related series.
Example :
Represent the following data by volume diagram.
Solution: The sales data are shown in figure as given below:
Graph Showing sales from 1990 1996

A graph of this type clearly shows that there was a decline in sales in 1993 while all other
years, 1991 to 1996, showed an increase in the same.
Frequency Graphs
We now come to another category of graphs known as frequency graphs. The following
types of frequency graphs are normally used:
Page 24
AMF104
1. Line Graph
2. Histogram
3. Polygon
4. Frequency Curve
5. Ogive
A brief discussion of these along with a suitable illustration in each case now follows.
Line Graph
We have earlier seen line graph in the form of a time series. It may be noted that a line
graph is also used to present a discrete frequency distribution. On the axis of X is
measured the size of the items while on the axis of Y is measured the corresponding
frequency. Suppose we have the following data relating to the number of persons in
households in a given locality.
Example:
Solution: Data are shown in Line graph as-
Histogram:
A histogram is a bar chart or graph showing the frequency of occurrence of each value of
the variable being analyzed. In histogram, data are plotted as a series of rectangles. Class
intervals are shown on the X-axis and the frequencies on the Y-axis. The height of
each rectangle represents the frequency of the class interval. Each rectangle is formed
with the other so as to give a continuous picture. Such a graph is also called staircase or
block diagram. However, we cannot construct a histogram for distribution with open-end
classes. It is also quite misleading if the distribution has unequal intervals and suitable
adjustments in frequencies are not made.
Example 10: Draw a histogram for the following data.
Page 25
AMF104
Solution:
Frequency Polygon
A frequency polygon like any polygon consists of many angles. A histogram can be
easily transformed into a frequency polygon by joining the mid-points of the rectangles
by straight lines. It may be noted that instead of transforming a histogram into a
frequency polygon, one can draw straightaway a frequency polygon by taking the midpoint of each class-interval and by joining the mid-points by the straight lines. Another
point to note is that this can be done only when we have a continuous series. In ease of a
discrete distribution, this is not possible. Instead of having a frequency polygon, we can
have a relative frequency polygon. While the relative frequency polygon has the same
shape as the frequency polygon drawn from the same data, it has a different scale of
values son the vertical axis. Instead of having absolute frequencies in the relative
frequency polygon will be the number of frequencies in each class as a proportion of the
total number of frequencies.
Example:
Draw a frequency polygon for the following data.
Page 26
AMF104
The advantages of histogram are:

1. Each rectangle shows distinctly separate class in the distribution.
2. The area of each rectangle in relation to all other rectangles shows the proportion of
the total number of observations pertaining to that class.
Frequency polygons, too, have certain advantages.
1. The frequency polygon is simpler as compared to its histogram.
2. The frequency polygon shows more vividly an outline of the data pattern.
3. As the number of classes and the number of observations increase, so also the
frequency polygon becomes increasingly smooth.
Frequency Curve:
If the middle point of the upper boundaries of the rectangles of a histogram is corrected
by a smooth freehand curve, then that diagram is called frequency curve. The curve
should begin and end at the base line.
Example:
Draw a frequency curve for the following data.
Page 27
AMF104
Solution:
Ogives:
For a set of observations, we know how to construct a frequency distribution. In some
cases we may require the number of observations less than a given value or more than a
given value. This is obtained by a accumulating (adding) the frequencies up to the give
value. This accumulated frequency is called cumulative frequency. These cumulative
frequencies are then listed in a table is called cumulative frequency table. The curve table
is obtained by plotting cumulative frequencies is called a cumulative frequency curve or
an ogive. There are two methods of constructing ogive namely:
1. The less than ogive method
2. The more than ogive method.
In less than ogive method we start with the upper limits of the classes and go adding the
frequencies. When these frequencies are plotted, we get a rising curve. In more than
ogive method, we start with the lower limits of the classes and from the total frequencies
we subtract the frequency of each class. When these frequencies are plotted we get a
declining curve.
Page 28
AMF104
Example:
Draw the Ogives for the following data.
Solution:
Page 29
AMF104
END CHAPTER 1 QUIZZES

Answer the following questions, 1 to 10, choosing the correct answer
1. Bar diagram is a
(a) one-dimensional diagram
(b) two-dimensional diagram
(c) diagram with no dimension
(d) none of the above
2. Data represented through a histogram can help in finding graphically the
(a) mean
(b) mode
(c) median
(d) all the above
3. Ogives can be helpful in locating graphically the
(a) mode
(b) mean
(c) median
(d) none of the above
4. Data represented through arithmetic line graph help in understanding
(a) long term trend
(b) cyclicity in data
(c) seasonality in data
(d) all the above
5. Which of the following is one dimensional diagram.
(a) Bar diagram
(b) Pie diagram
(c) Cylinder
(d) Histogram
6. Percentage bar diagram has
(a) data expressed in percentages
(b) equal width
(c) equal interval
(d) equal width and equal interval
7. Frequency curve
(a) begins at the origin
(b) passes through the origin
(c) begins at the horizontal line.
(d) begins and ends at the base line.
Page 30
AMF104
8. With the help of histogram we can draw

(a) frequency polygon
(b) frequency curve
(c) frequency distribution
(d) all the above
9. Ogives for more than type and less than type distribution intersect at
(a) mean
(b) median
(c) mode
(d) origin
10. Which of the following is not a one-dimensional graph(a) Bar graph
(b) pie graph
(c ) O gives
(d) Cube
Page 31
AMF104
Chapter 2
Summary Statistics
In the last chapter, we discussed the techniques of classification and tabulation, which
help in summarizing the collected data and presenting them in the form of a frequency
distribution. The resulting pictures of frequency distribution illustrate trends and pattern
in the data. Sometime we need more exact measures. In these cases we can use single
numbers called summary statistics to describe characteristics of data set.
A large number of numerical methods are available to describe quantitative data sets.
Most of these methods measure one of two data characteristics:
Central Tendency- The central tendency of the set of measurements that is, the tendency
of the data to cluster, or center, about certain numerical values. Central tendency is the
middle of a distribution. Measures of central tendency are also called measures of
location.
Dispersion- The variability of the set of measurements that is, the spread of the data. It is
the extent to which the observations are scattered.
2.1 Central Tendency

The concept of a measure of central tendency is concerned only with quantitative
variables and is undefined for qualitative variables as these are immeasurable on a scale.
Various Measures of Central TendencyArithmetic Mean
Weighted Mean
Median
Mode
Arithmetic Means- In classification and tabulation of data, we observed that the values
of the variable or observations could be put in the form of any of the following statistical
series, namely:
1. Individual series or ungrouped data.
Page 32
AMF104
2. Discrete frequency distribution.

3. Continuous frequency distribution.
Two methods are used for calculating arithmetic mean of an individual series.
a. Direct method.
b. Short-cut method.
The arithmetic mean is obtained by adding all the observations and dividing the sum by
the number of observations i.e. if the respective values of N observations recorded on a
variable X are X1, X2, ., XN; then the mean, denoted by X , is defined by
Here
be written as X /N .
Example: Calculate the mean of the following five sample measurements: 5 , 3 , 8 , 5 , 6 .

Solution: Using the definition of sample mean and the summation notation, we find
Thus, the mean of this sample is 5.4.
It may be noted that the Greek letter is used to denote the mean of the population and n
to denote the total number of observations in a population. Thus the population mean
= X/N
Page 33
AMF104
The direct method of calculating arithmetic mean is used when the number of items in the
series is relatively less. If the items are more and figures are large enough, the
computation of mean becomes difficult. This difficulty can be solved by using the shortcut method. Under this method an assumed mean is taken as the basis of calculation. The
assumed mean is usually chosen to be a neat round number in the middle of the range of
the given observations, so that deviations can be easily obtained by subtraction. Then a
short-cut formula, based on deviations from assumed mean, for calculating arithmetic
mean is-
= A + d/N
Where X = the arithmetic mean,

A= the assumed mean,
d = (X-A), the deviation of each value of the variable from the assumed mean A.
d = the sum of the deviations.
Example
Calculate the arithmetic mean from the following table:
Student No.
Marks
15 20 25 19 12 11 13 17
9
18
10
20
Solution
Page 34
AMF104
In this example, the data ranges from 11 to 25. Therefore, 18, a neat round value in the
middle of 11 and 25, may be taken as assumed mean, i.e. A=18. The deviations and sum
of deviations needed in formula may be calculated from the table given above.
Using above formula, the arithmetic mean will be
=18 + (-10)/10
=17
It can be seen that the arithmetic mean obtained by the direct and short-cut method is the
same.
Arithmetic mean in a discrete frequency distribution:

We can use the following three methods for computing arithmetic mean in a discrete
frequency distribution:
1. Direct Method
2. Short-cut Method
3. Step-Deviation Method
Page 35
AMF104
Direct Method
In discrete frequency distribution we multiply the values of the variable (X) by their
respective frequencies (f) and get the sum of the products (
fX ). The sum of the
f / N . Thus according to
this method, the formula for calculating arithmetic mean becomes:
fX = the sum of the products of observations with their respective frequencies.

fX = the sum of the frequencies.
Example
Following table gives the wages paid to 125 workers in a factory. Calculate the arithmetic
mean of the wages.
Wages (Rs.):
200 210 220 230 240 250 260
No. of workers : 5
15
32
42
15
12
Solution:
Short-cut Method
According to this method, the formula for calculating arithmetic mean is
Page 36
AMF104
Where A= assumed mean

D= (X-A), the deviation of X from the assumed mean A.
fd =
Step-Deviation Method
In the short-cut method of calculating mean, the deviations taken from an assumed mean
generally have a common factor. In continuous frequency distributions this common
factor is nothing but the uniform class-interval. The computational work in short-cut
method can be further simplified if the deviations obtained are divided by this common
factor. The deviations divided by the common factor are called step deviations.
According to this method, the arithmetic mean is calculated by the formula:
where h = common factor in deviations, d=X-A
Page 37
AMF104
the step-deviations.
fd`= the sum of the products of the step-deviations and their
respective frequencies.
Using formula, the arithmetic mean becomes
Arithmetic Mean in Case of Grouped Data (continuous frequency distribution)

For grouped data, arithmetic mean may be calculated by applying any of the following
methods:
1. Direct method
2. Short-cut method
3. Step-deviation method
In the case of direct method, the formula
Page 38
AMF104
Here X is mid-point of various classes, f is the frequency of each class and N is the
total number of frequencies. The calculation of arithmetic mean by direct method is
shown below.
Example
The following table gives the marks of 58 students in Statistics. Calculate the average
marks of this group.
Marks
No. of Students
10-10
20-20
30-30
11
40-40
15
50-50
12
60-60
70-70
Total
58
Solution:
= 1940/58
= 33.45 marks
Page 39
AMF104
the mid-point of each class is taken as a good approximation of the true mean of the class.
This is based on the assumption that the values are distributed fairy enough throughout
the interval.
In the case of short-cut method, the concept of arbitrary mean is followed. The formula
for calculation of arithmetic mean by the short-cut method is given below:
where, A= arbitrary or assumed mean

f = frequency
d= deviation from the arbitrary or assumed mean.
When the values are extremely large and / or in fractions, the use of the direct method
would be very cumbersome. In such cases, the short-cut method is preferable.
Advantages and Disadvantages of Arithmetic MeanAdvantages of arithmetic mean are many as it is simple and known to everyone. Second,
every data set has a mean. It can be calculated and it is unique. Disadvantage of
arithmetic mean is it can be affected by extreme values that are not representative of the
rest of the data. Second disadvantage of the arithmetic mean is that we are unable to
compute the mean for a data set that has open ended classes as either the high or low end
of the scale.
Weighted Arithmetic Mean

One of the limitations of the arithmetic mean is that it gives equal importance (weight) to
all the items in the series. But there are cases where the relative importance of all the
Page 40
AMF104
items is not equal. Weighted arithmetic mean is the correct tool for measuring the central
tendency of the given observations in such cases. Here, the term weight stands for the
relative importance of different items or observations.
The formula for calculating weighted arithmetic mean is as follows:
Here, Xw
W= the weights attached to values of the variable
X= the values of the variable.
Example
Suppose a student has secured the following marks in three tests:
Mid-term test 30
Laboratory 25
Final 20
The simple arithmetic mean will be (30+25+20)/3 = 25
However, this will be wrong if three tests carry different weights on the basis of their
relative importance. Assuming that the weights assigned to the three tests are:
Mid-term test 2 points
Laboratory
3 points
Final
5 points
Solution
On the basis of this information, we can now calculate a weighted mean as shown below
Page 41
AMF104
= W1X1+W2X2+W3X3 / W1+W2+W3
= 60+75+100/ 2+3+5
= 23.5 marks
It will be seen that weighted mean gives a more realistic picture than the simple or
unweighted mean.
2.2 Median
Median is defined as the value of the middle item (or the mean of the values of the two
middle items) when the data are arranged in an ascending or descending order of
magnitude. Thus, in an ungrouped frequency distribution if the n values are arranged in
ascending or descending order of magnitude, the median is the middle value if n is odd.
When n is even, the median is the mean of the two middle values.
Suppose we have the following series:
15, 19, 21, 7, 33, 25, 18 and 5
We have to first arrange it in either ascending or descending order. These figures are
arranged in an ascending order as follows:
5, 7, 10, 15, 18, 19, 21, 25, 33
Now as the series consists of odd number of items, to find out the value of the middle
item, we use the formula (N+
Where N is the number of items. In this case, n is 9, as such
(N
Page 42
AMF104
Suppose the series consists of one more item, 23. We may, therefore, have to include 23
in the above series at an appropriate place, that is, between 21 and 25. Thus, the series is
now 5, 7, 10, 15, 18, 19, 21, 23, 25, and 33. Applying the above formula, the median is
the size of 5.5th item. Here, we have to take the average of the values of 5th and 6th item.
This means an average of 18 and 19, which gives the median as 18.5.
It may be noted that the formula (N+
merely indicates the position of the median, namely, the number of items we have to
count until we arrive at the item whose value is the median.
Determination of Median in a Continuous Frequency Distribution

In the case of a continuous frequency distribution, we first locate the median class by
cumulating the frequencies until (N/2)th point is reached. Finally, the median is
calculated by linear interpolation with the help of the following formula:
Where M = the median

l1 = the lower limit of the class in which the median lies
l2 = the upper limit of the class in which the median lies
f = the frequency of the class in which the median lies
c= the cumulative frequency of the class preceding the one in which the median lies
Example:
Page 43
AMF104
In order to calculate median in this case, we have to first provide cumulative frequency to
the table. Thus, the table with the cumulative frequency is written as:
Now, Median is the value of N/2=143/2=

1,400). Thus (1,200-1,400) is the median class. For determining the median in this class,
we use interpolation formula as follows:
Page 44
AMF104
2.3 Mode
The mode is another measure of central tendency. It is the value at the point around
which the items are most heavily concentrated. As an example, consider the following
series:
8, 9, 11, 15, 16, 12, 15, 3, 7, 15
There are ten observations in the series wherein the figure 15 occurs maximum number of
times- three. The mode is therefore 15. The series given above is a discrete series; as
such, the variable cannot be in fraction. If the series were continuous, we could say that
the mode is approximately 15, without further computation.
In the case of grouped data, mode is determined by the following formula:
Where, l1 = the lower value of the class in which the mode lies
f1= the frequency of the class in which the mode lies
f0 = the frequency of the class preceding the modal class
f2 = the frequency of the class succeeding the modal class
i = the class- interval of the modal class
It should ensure that the class-intervals are uniform throughout. If the class-intervals are
not uniform, then they should be made uniform on the assumption that the frequencies
are evenly distributed throughout the class. In the case of in equal class-intervals, the
application of the above formula will give misleading results.
Example
Let us take the following frequency distribution:
Page 45
AMF104
Solution
We can see from Column (2) of the table that the maximum frequency of 12 lies in the
class-interval of 60-70. This suggests that the mode lies in this class-interval. Applying
the formula given earlier, we get:
2.4 Dispersion
An average, such as the mean or the median only locates the centre of the data. An
average does not tell us anything about the spread of the data. Dispersion also known as
Scatter, spread or variation, measures the items vary from some central value. It measures
the degree of variation. Dispersion is used to determine the reliability of an average and
to facilitate comparison and control also to facilitate the use of other statistical measures.
The measure of central tendency serves to locate the center of the distribution, but they
do not reveal how the items are spread out on either side of the center. This characteristic
of a frequency distribution is commonly referred to as dispersion. In a series all the items
are not equal. There is difference or variation among the values. The degree of variation
is evaluated by various measures of dispersion. Small dispersion indicates high
uniformity of the items, while large dispersion indicates less uniformity.
Page 46
AMF104
Characteristics of a good measure of dispersion:

An ideal measure of dispersion is expected to possess the following properties
1. It should be rigidly defined
2. It should be based on all the items.
3. It should not be unduly affected by extreme items.
4. It should lend itself for algebraic manipulation.
5. It should be simple to understand and easy to calculate
There are two kinds of measures of dispersion, namely

1. Absolute measure of dispersion
2. Relative measure of dispersion.
Absolute measure of dispersion indicates the amount of variation in a set of values in
terms of units of observations. For example, when rainfalls on different days are available
in mm, any absolute measure of dispersion gives the variation in rainfall in mm. On the
other hand relative measures of dispersion are free from the units of measurements of the
observations. They are pure numbers. They are used to compare the variation in two or
more sets, which are having different units of measurements of observations. The various
absolute and relative measures of dispersion are listed below.
There are various tools to measure dispersion, namely Range, Average Deviation,
Standard Deviation, Variance and Coefficient of Variation.
Range:
This is the simplest possible measure of dispersion and is defined as the difference
between the largest and smallest values of the variable.
In symbols, Range = L S.
Where L = Largest value.
S = Smallest value.
Page 47
AMF104
The range only takes into account the most extreme values. This may not be
representative of the population.
Example: A sample of five accounting graduates revealed the following salaries: 22,000,
28,000, 31,000, 23,000, 24,000.
The range is 31,000-22,000= 9,000
In individual observations and discrete series, L and S are easily identified. In continuous
series, the following two methods are followed.
Method 1:
L = Upper boundary of the highest class
S = Lower boundary of the lowest class.
Method 2:
L = Mid value of the highest class.
S = Mid value of the lowest class
Co-efficient of Range = (L-S)/(L+S)
Example: Find the value of range and its co-efficient for the following data.
7, 9, 6, 8, 11, 10, 4
Solution:
L=11, S = 4.
Range = L S = 11- 4 = 7
Coefficient of Range = (11-4)/(11+4)
= 7/15
= 0.4467
Example: Calculate range and its co efficient from the following distribution.
Page 48
AMF104
Size:
60-63
Number: 5
63-66 66-69
18
69-72
72-75
27
42
Solution:
L = Upper boundary of the highest class.
= 75
S = Lower boundary of the lowest class.
= 60
Range = L S = 75 60 = 15
Co-efficient of Range = (L-S)/(L+S)
= (75-60)/(75+60)
= 15/135
= 0.1111
Merits and Demerits of Range :

Merits:
1. It is simple to understand.
2. It is easy to calculate.
3. In certain types of problems like quality control, weather forecasts, share price
analysis, etc., range is most widely used.
Demerits:
1. It is very much affected by the extreme items.
2. It is based on only two extreme observations.
3. It cannot be calculated from open-end class intervals.
4. It is not suitable for mathematical treatment.
5. It is a very rarely used measure.
Page 49
AMF104
Limitations of Range:
1. It does not indicate the direction of variability.
2. It does not present very accurate picture of the variability.
Uses of Range:
1. It facilitates to study quality control.
2. It facilitates to study variations of prices on shares, debentures, bonds and agricultural
commodities.
3. It facilitates weather forecasts.
Average Deviation:
The range and quartile deviation are not based on all observations. They are positional
measures of dispersion. They do not show any scatter of the observations from an
average. The average deviation is measure of dispersion based on all items in a
distribution. The average deviation takes into consideration all of the values.
Mean deviation is the arithmetic mean of the absolute values of the deviations from the
arithmetic mean. It is the arithmetic mean of the deviations of a series computed from any
measure of central tendency; i.e., the mean, median or mode, all the deviations are taken
as positive i.e., signs are ignored. According to Clark and Schekade, Average deviation
is the average amount scatter of the items in a distribution from either the mean or the
median, ignoring the signs of the deviations.
Coefficient of mean deviation:

Mean deviation calculated by any measure of central tendency is an absolute measure.
For the purpose of comparing variation among different series, a relative mean deviation
is required. The relative mean deviation is obtained by dividing the mean deviation by the
average used for calculating mean deviation.
Coefficient of mean deviation: = Mean deviation/ (Mean or Median or Mode)

If the result is desired in percentage,
the
Page 50
AMF104
Computation of mean deviation Individual Series:

1. Calculate the average mean, median or mode of the series.
2. Take the deviations of items from average ignoring signs and denote these deviations
by |D|.
4. Divide this total obtained by the number of items.

M.D. = D /n
Example: Calculate mean deviation from mean and median for the following data:
100,150,200,250,360,490,500,600,671 also calculate coefficients of M.D.
Solution:
Mean = X = X/n = 3321/9 = 369
Now arrange the data in ascending order

100,150, 200, 250, 360, 490, 500, 600, 671
Median= Value of (n+1/2)th item

= Value of (9+1/2)th item
= Value of 5th item
=360
Page 51
AMF104
Standard Deviation:
Standard deviation is the most commonly used measure of dispersion. It is similar to the
average deviation; the standard deviation takes into account the value of every
observation. The values of the mean deviation and the standard deviation should be
relatively similar. The standard deviation uses the squares of the residuals.
As we have seen range is unstable, average deviation neglects algebraic signs of the
deviations, a measure of dispersion that does not suffer from any of these defects and is at
the same time useful in statistic work is standard deviation. It is represented by (read as
sigma); 2 i.e., the square of the standard deviation is called variance. Here, each
deviation is squared.
The measure is calculated as the average of deviations from arithmetic mean. To avoid
positive and negative signs, the deviations are squared. Further, squaring gives added
weight to extreme measures, which is a desirable feature for some types of data. It is a
square root of arithmetic mean of the squared deviations of individual items from their
arithmetic mean. The mean of squared deviation, i.e., the square of standard deviation is
known as variance. Standard deviation is one of the most important measures of
variation used in Statistics. Let us see how to compute the measure in different situation.
Standard deviation, denoted as, S.D., can be calculated using following steps1. Find the sum of the squares of the residuals
2. Find the mean
3. Then take the square root of the mean
= ((x- x)2/n)0.5
Standard Deviation in Case of Ungrouped Data without Using Arithmetic Mean
Direct Method
The formula in this case, can be expressed as
SD = (( 2 x/n)-(x2/n))0.5
Page 52
AMF104
where x2 ,x denotes respectively the sum of the squares and that of the terms of the
given data.
Deviation Method
By assuming a standard point of reference, we can find from the formula
SD=
Where dx = x - A,d2x,dx and n denote the sum f squares of deviations, the sum of the
deviations and the number of items respectively.
Standard Deviation in Case of Grouped Data

In case of grouped data, we find the deviation of the midpoint of class interval form the
mean ( dx ), square it ( d2x ), multiply this by the frequency of the class ( fd2x ), sum all
the classes, ( fd2x ). This we divide by the total of all frequencies, f = n, to get ( f
d2x)/n which give variance. To get standard deviation, we take square root.
Standard Deviation in Case of Grouped Data Without Using Mean

Direct Method
Find the midpoint of the class interval, square it, multiply it with the frequencies and find
the sum. Similarly, find the sum of products of midpoint of the class with frequencies.
The formula is
where xi, is the mid-point of the i-th class-interval and , are the respective frequencies.
Page 53
AMF104
Deviation Method
Just like the method we discussed to find the Arithmetic mean, we can also find standard
deviation using class-interval steps. This method can be used only when the classintervals are the same size. In this method, we choose a point as assumed mean. Then the
interval of length preceding it will have a value of 1 and succeeding it will have a value
of +1. The step interval deviation so calculated is squared (dx2) and multiplies with the
frequency (f dx2). It is summed over all the class intervals (dx 2). The sum of all the
products of frequency with step-deviation ( fdx2 ) is also found. Then the standard
deviation is found using the formula.
SD=
We can also use the deviations of the mid-point of the class intervals from an assumed
point for dx in which case, c will not appear in the formula.
Example
Find the standard deviation from the following ungrouped data by using as well as not
using the mean.
25, 32, 43, 53, 62, 59, 48, 31, 24, 33.
Solution
Using mean
Page 54
AMF104
Direct Method
Total
Co-efficient of Variation
This is another relative measure of dispersion where the coefficient of respective absolute
measure is multiplied by 100 to convert the figures into percentages or in relation to 100.
When dispersions are measured by the median and quartiles the respective co-efficients
of variations are:
(i)
Co-efficient of Mean variation or C.V.

= (Mean deviation/Mean or Median) *100
(ii)
Co-efficient of Quartile Variation or CV

= CO-efficient of Quartile Deviation *100
= ((Q3-Q1)/(Q3+Q1)) * 100
(iii)
Co-efficient of Variation (Karl Pearsons Measure)

= SD/ X * 100
of the above three measures of coefficients of variation, the one given by Karl Pearson is
mostly commonly used. The importance of such relative measures expressed in
percentages is two-fold. Firstly, it provides a proper basis for comparison when there are
frequency distributions having variables expressed in different units and also when the
size of the distributions is varying. Finally, the expression in percentages enables better
grasp of the magnitude of the deviations in a number of distributions which would not be
possible simply by co-efficient of dispersion.
Page 55
AMF104
END CHAPTER 2 QUIZZES

Answer the following questions, 1 to 10, choosing the correct answer
1. Which of the following is not a measure of central tendency
(a). Mean
(b). Median
(c ). Mode
(d). Standard Deviation
2. Calculate the mean of the following five sample measurements: 5 , 3 , 8 , 5 , 6
(a). 5.4
(b). 5.0
(c). 5.9
(d). None of these
3. Which of the following is an advantage of arithmetic mean is(a). It is simple
(b). every data set has mean
(c). None of the above
(d). Both of the above
4. Median of odd number terms of series is(a). (N+1)/2th term
(b). N/2th term
(c). None of the above
(d). Both of these
5. Which of the following is not a good characteristic of dispersion
(a). It should be rigidly defined
(b). It should be based on all the items.
(c). It should not be unduly affected by extreme items.
(d). None of these
Page 56
AMF104
6. Find the value of range and its co-efficient for the data 7, 9, 6, 8, 11, 10, 4
(a). 7, 0.4467
(b). 5, 0.4467
(c). 7, 0.5
(d). 5, 0.5
7. Find the standard deviation from the following ungrouped data 25, 32, 43, 53, 62, 59,
48, 31, 24, 33
(a). 13.2363
(b). 3.9087
(c). 12.3456
(d). 14.7897
8. Which of the following is not a merit of range
(a). It is simple to understand.
(b). It is easy to calculate.
(c). It is very much affected by the extreme items.
(d). None of these
9. Dispersion means:
(a). is the middle of distribution
(b). variability of distribution
(c ). None of these
(d). both of these
10. Weighted means gives
(a). equal important to every data
(b). more weight to important data
(c). None of these (d). Both of these
Page 57
AMF104
CHAPTER 3
FORECASTING TECHNIQUES
CONTENTS
Forecasting Techniques ...............................................................................................................................59
3.1 Estimation Using the Regression Line ..............................................................................................59
3.2 Scatter Diagram .................................................................................................................................66
3.3 Karl Pearsons Coefficient of Correlation .........................................................................................67
3.4 Time Series Analysis .........................................................................................................................69
3.5 Forecasting ........................................................................................................................................79
Page 58
AMF104
Forecasting Techniques
So, far we have studied problems relating to one variable only. In practice we come
across large number of problems involving the use of two or more than two variables. If
two quantities vary in such a way that movements in one are accompanied by movements
in other, these quantities are correlated.
Managers make personal and professional decisions based on prediction of future events.
For this they rely on relationship between known and what is to be estimated. If decision
makers can determine how the known is related to the unknown, they can aid the
decision-making process considerably.
Regression and correlation analyses show us how to determine both the nature and the
strength of relationship between two variables. In regression analysis, we develop an
estimating equation i.e. a mathematical formula that relates the known variables to the
unknown variable. Afterwards we can apply correlation analysis to determine the degree
to which the variables are related.
Relationship Type
Regression and correlation analyses are based on the relationship, or association, between
two (or more) variables. The known variable is called the independent variable. The
variable which is need to predict is dependent variable.
Scatter Diagrams
First step in determining whether there is a relationship between two variables is to
examine the graph of the observed data. This graph or chart is called the scatter diagram.
Scatter diagram gives two types of information. First, we can look for patterns that
indicate that the variables are related. Then, if variables are related, then what kind of
estimating equation, describes this relationship.
3.1 Estimation Using the Regression Line

In the preceding chapter, it was mentioned that the first step in the correlation analysis is
to plot the values of the two series, X and Y, on the graph and to have a visual idea of the
scatter diagram that emerges. In that chapter, we had also given four different types of
scatter diagrams to indicate different types of relationships between two variables. These
were: positive, negative, spurious and non-linear, here we shall show how to calculate the
regression line by using an equation that relates the two variables. It may be noted that
we are concerned here with the linear relationship between two variables only. The
equation for a straight line is:
Y = a + bX
Page 59
AMF104
where Y is the dependent variable, X is the independent variable, a is the Y-intercept,

which is the point at which the regression line crosses the Y-axis (the vertical axis) and b
is the slope of the regression line. It should be noted that the values of both a and b will
remain constant for any given straight line. On the basis of this equation, we can find the
value of Y for any value of X if values of a and b are known to us.
Suppose that a is 6 and b is 3 and we have to find the value of Y if X is 10.
Y = a + bX
Now how to determine the values of two constants a and b? The slope of the straight line
can be numerically calculated by the following formula:
b = (Y2 Y1) / (X2 X1)
On the basis of this equation, we can now find the values of the dependent variable Y for
varying values of X.
Least Squares Method
In a scatter diagram, if a straight line is drawn, it obviously does not touch several points.
In fact, it may not pass through any of the scattered points. Some points may be above the
straight line while the remaining points may be below it. Surely, there must be a
technique to draw a line that is best fit. Such a straight line will minimize the error
between the estimated point that lies on the line and the actual point based on the data
given. The technique used for this purpose is known as the method of least squares.
We have used Y so far to represent the individual value of the observed points on the Yaxis. A new symbol Yc (computed or estimated value of Y) is used to represent
individual values of the estimated points, that is, those points that actually lie on the
estimating line. In view of this, the equation for the estimating line becomes Yc = a + bX.
The difference between the actual point and the estimated point is shown by a vertical
line in respect of each of the seven points either above or below it.
We may raise a pertinent issue here: which would give a better fit-one or two points
which are far away from the observed points in an estimating equation or a few points
showing small differences from the original or observed points? We can visualize that the
second alternative-smaller differences between the estimated points and the observed
points would give a better fit. This leads us to square the individual differences or errors
before summing them up. This process of squaring serves two purposes. First, it
magnifies the larger differences or errors. Second, the effects of positive and negative
differences get cancelled. This is because the square of negative differences or errors is a
positive figure. This process is known as the least square method as it minimizes the sum
of the squares of the error in the estimating line.
With the help of this method of least squares, we can find out whether one estimating line
is better than the other. But, as a very large number of estimating lines can be drawn on
the basis of the same set of data, it is extremely difficult for us to say that the estimating
line that we have obtained is the best fitting line. To overcome this problem, a set of two
equations known as the normal equations is used to determine both the Y-intercept and
the slope of the best-fitting regression line. The two normal equations are:
Y
Page 60
AMF104
XY
X
X2
where
Y = the total of Y series
n = number of observations
X = the total of X series
XY = the sum of XY column
X2 = the total of squares of individual items in X series
a and b are the Y-intercept and the slope of the regression line, respectively. We now take
up an example to illustrate the use of the two normal equations. Given the following data,
find the regression equation of Y and X.
Example:
X 2 3
Y 7 9
4
10
5
14
6
15
Solution
We have now to set up a worksheet to get the values of the terms shown earlier.
Worksheet for Computing Correlation
X
Y
XY
X2
2
7
14
4
3
9
27
9
4
10
40
16
5
14
70
25
6
15
90
36
X = 20 Y = 55
XY = 241
X2 =90
Substituting these values in the normal equations given above
55 = 5a + 20b (i)
241 = 20a + 90b (ii)
Solving these we get,
a = 2.6
b = 2.1
Therefore, the regression equation of Y on X is
Y = 2.6 + 2.IX
Alternative Approach
We can use an alternative approach, which involves the use of two formulae-one to
calculate the Y-intercept and the other to calculate the slope. The formula for calculating
the slope is
b=
XY-
nXY )/( X 2-
nX2 )
In order to apply the above formula, we should know the values of X and Y in addition to
XY
X2 .
X =X /n =20/5 =4
Y = Y /n
Page 61
AMF104
Calculating the value of b from above equation, we get

b = 2.1
The formula for calculating the Y-intercept of the line is
a
Y bX
where a is Y-intercept. Applying this formula to calculate the Y-intercept, we get
a = 2.6
Hence, the regression equation is Y = 2.6 + 2.1X (same as was obtained earlier by
applying the normal equations). On the basis of this regression, we can find the value of
Y for any value of X. Suppose that we have to ascertain the value of Y where X is 14.
Applying this value of X = 14 in the above equation,
We are now clear as to how the regression line is obtained. The question is how to check
the accuracy of our results. One method is to draw a scatter diagram with original data
pertaining to X and Y series and then to fit a straight line. This graph will give a visual
idea about the suitability of the straight line fitted. A more refined and, therefore, better
approach is based on the mathematical properties of a line fitted by the method of least
squares. This means that the positive and the negative errors (i.e. differences between the
original data points and the calculated points) must be equal so that when all individual
errors are added together, the result is zero.
X
2
Y
7
Yc
2.6 + (2.1 *
Y-Yc
-1.0
-0.2
Total 0
Here, the calculated value of Y is shown as Yc. We find that the sum of positive errors Y
- Yc, is equal to 1.2. The same is true for negative errors. Thus, the sum of the column YYc, comes to zero. This means problem solved is correct.
Regression Coefficients
So far our discussion on regression analysis related to finding the regression of Y on X. It
is just possible that we may think of X as dependent variable and Y as an independent
one. In that case, we may have to use X = a + bY as an estimating equation. Then, the
normal equations will be
X = na +b Y
XY = aY + Y2
X, Y,
XY, Y2 and n. Once these values
are known, we may enter them in the two normal equations. The equations then can be
solved in the same manner as in the case of regression of Y on X.
i. Regression equation of Y on X
Y
Y
r(s y /s x )(X X )
The term r(s y /s x
XY
X2
Page 62
AMF104
ii. Regression equation of X on Y

X
X
r(s x /s y )(Y Y )
The term r(s y /s x
XY / Y2
It may be noted that the square root of the product of two regression coefficients is the
value of the coefficient of correlation.
We may write
bxy or r(s x /s y
xy
y2
byx or r(s x /s y
xy/x 2
r
bxy *b yx )^0.5
Another point to note is that x and y are the deviations in X and Y series from their
arithmetic means.
Example:
X
Y
2
7
3
9
4
10
5
14
6
15
20
55
x =X- X
-2
-1
0
1
2
x = 0
x2
4
1
0
1
4
x
y =Y- Y
-4
-2
-1
3
4
y
y2
16
4
1
9
16
xy
8
2
0
3
8
y2 = 46
xy = 21
X =20/5 = 4
Y =55/5 = 11
Regression equation of X on Y:
X
X =r(s x /s y )(Y Y )
On putting the values, we get
Y = 2.6 + 2.1X
Also,
r
bxy *b yx)^0.5
xy
y 2 )*( xy/x2 ))^0.5
= ((21/46)*(21/10))^0.5
= (0.9587)^0.5
= 0.98
Regression equation of X on Y is
X
X
r(s x /s y )(Y Y )
or X = 40 + 0.5 (10/9) (Y - 45)
or X = 40 + 0.556 (Y - 45)
or X = 40 + 0.556Y- 25.02
or X = 14.98 + 056Y
In order to estimate the value of Y for X = 48, we have to use the regression equation of
Y on X
Y = 27 + 0.45X
when X= 48
or Y= 27 + 21.6
or Y = 48.6
Page 63
AMF104
Properties of Regression Coefficients

At this stage, we should know the properties of the regression coefficients, which are
given below.
1.
The coefficient of correlation is the geometric mean of the two regression
coefficients.
2. As the coefficient of correlation cannot exceed 1, in case one of the regression
coefficients is greater than 1, then the other must be less than 1.
3. Both the regression coefficients will have the same sign, either positive or negative. If
one regression coefficient is positive, then the other will also be positive.
4. The coefficient of correlation and the regression coefficients will have the same sign. If
the regression coefficients are positive, then the correlation coefficient will also be
positive and vice versa.
5. The average of the two regression coefficients will always be greater than the
correlation coefficient. Symbolically,
This should be obvious as r happens to be the square root of the two regression
coefficients.
6. Finally, regression coefficients are not affected by change of origin. But this is not the
case in respect of scale.
The Standard Error of Estimate
We now turn to the concept of the standard error of estimate. It is the measure of the
spread of observed values from the estimated ones, expressed by regression equation.
This concept is similar to the standard deviation, which measures the variation of
individual items about the arithmetic mean. As in the case of standard deviation, we can
find the standard error of an estimate by calculating the mean of squares of deviations
between the actual or observed values and the estimated values based on the regression
equation. It can be written symbolically
where Yc is the calculated or estimated value of Y. Y is the actual or observed value of

variable n is the total number of observations Syx is the standard deviation of regression
of Y values on X values It may be noted that the sum of the squared deviations is divided
by n-2 and not by n This is because we have lost 2 degrees of freedom in estimating the
regression line. One can reason out that while obtaining values of a and b from the
sample data, 2 degrees of freedom have been lost while estimating the regression line
from these points. There is a short-cut method for finding the standard error of estimate.
The formula is
where
X = values of the independent variable
Y = values of dependent variable
Page 64
AMF104
a = Y-intercept
b = slope of the estimating equation
n = number of observations
It should be obvious that this formula gives a short-cut method. When we estimate the
regression equation, all the values, which we need, are already determined.
Example
Suppose that we have been given the following data pertaining to two series X and Y
X
40 30
20
50
60 40
20
60
Y
100 80
60 120 150 90 70
130
X series indicates advertising expenditure in thousand rupees and Y series relates to sales
in units. We are told the regression equation is Yc =24.444 + 1.889 X. We are asked to
calculate the standard error of estimate.
Solution
X
Y
40
100
30
80
20
60
50
120
60
150
40
90
20
70
60 130
Yc
100
81
62
119
138
100
62
138
(Yi-Yc)
(Yi-Yc)2
0
-1
1
-2
4
1
1
12
144
-10
100
8
64
-8
64
Total: 378
= (378/6)^0.5
= (63)^0.5
= 7.9 (standard error)
Higher the magnitude of the standard error of estimate, the greater is the dispersion or
variability of points around the regression line. In contrast, if the standard error of
estimate is zero, then we may take it that the estimate in equation is the best estimator of
the dependent variable. In such a case, all the points would lie on the regression line. As
such, there would be no point scattered around the regression line.
SIMPLE LINEAR CORRELATION

As the summer heat rises, hill stations, are crowded with more and more visitors. Icecream sales become more brisk. Thus, the temperature is related to number of visitors and
sale of ice-creams. Correlation analysis is a means for examining such relationships
systematically. It deals with questions such as:
Page 65
AMF104
Is there any relationship between two variables? If the value of one variable changes,
does the value of the other also change? Do both the variables move in the same
direction? How strong is the relationship?
The purpose of correlation analysis is to measure and interpret the strength of a linear or
nonlinear (eg, exponential, polynomial, and logistic) relationship between two continuous
variables. When conducting correlation analysis, we use the term association to mean
linear association
Simple linear correlation is a measure of the degree to which two variables vary together,
or a measure of the intensity of the association between two variables. The parameter
being measure is r (rho) and is estimated by the statistic r, the correlation coefficient. r
can range from -1 to 1, and is independent of units of measurement. The strength of the
association increases as r approaches the absolute value of 1.0. A value of 0 indicates
there is no association between the two variables tested.
A better estimate of r usually can be obtained by calculating r on treatment means
averaged across replicates. Correlation does not have to be performed only between
independent and dependent variables. Correlation can be done on two dependent
variables. The X and Y in the equation to determine r do not necessarily correspond
between an independent and dependent variable, respectively.
Methods measuring CorrelationWidely used techniques for the study of correlation are scatter diagrams, Karl Pearsons
coefficient of correlation and Spearmans rank correlation. A scatter diagram visually
presents the nature of association without giving any specific numerical value. A
numerical measure of linear relationship between two variables is given by Karl
Pearsons coefficient of correlation. A relationship is said to be linear if it can be
represented by a straight line. Another measure is Spearmans coefficient of correlation,
which measures the linear association between ranks assigned to individual items
according to their attributes. Attributes are those variables which cannot be numerically
measured such as intelligence of people, physical appearance, honesty etc.
3.2 Scatter Diagram

A scatter diagram is a useful technique for visually examining the form of relationship,
without calculating any numerical value. In this technique, the values of the two variables
are plotted as points on a graph paper. The cluster of points, so plotted, is referred to as a
scatter diagram. From a scatter diagram, one can get a fairly good idea of the nature of
relationship. In a scatter diagram the degree of closeness of the scatter points and their
overall direction enable us to examine the relationship. If all the points lie on a line, the
correlation is perfect and is said to be unity. If the scatter points are widely dispersed
around the line, the correlation is low. The correlation is said to be linear if the scatter
points lie near a line or on a line.
Scatter plots are a useful means of getting a better understanding of your data.
Page 66
AMF104
3.3 Karl Pearsons Coefficient of Correlation

This is also known as product moment correlation and simple correlation coefficient. It
gives a precise numerical value of the degree of linear relationship between two variables
X and Y. The linear relationship may be given by Y = a + bX This type of relation may
be described by a straight line. The intercept that the line makes on the Y-axis is given by
a and the slope of the line is given by b. It gives the change in the value of Y for very
small change in the value of X. On the other hand, if the relation cannot be represented
by a straight line as in Y = X2 the value of the coefficient will be zero. It clearly shows
that zero correlation need not mean absence of any type of relation between the two
variables. Let X1, X2, ..., XN be N values of X and Y1, Y2 ,..., YN be the corresponding
values of Y. In the subsequent presentations the subscripts indicating the unit are dropped
for the sake of simplicity. The arithmetic means of X and Y are defined as
and their variances are as follows
The standard deviations of X and Y respectively are the positive square roots of their
variances. Covariance of
X and Y is defined as
Where x = X-X and y = X Y are the deviations of the ith value of X and Y from their
mean values respectively.
The sign of covariance between X and Y determines the sign of the correlation
coefficient. The standard deviations are always positive. If the covariance is zero, the
correlation coefficient is always zero. The product moment correlation or the Karl
Pearsons measure of correlation is given by
Page 67
AMF104
Strong and weak are words used to describe correlation. If there is strong correlation, then
the points are all close together. If there is weak correlation, then the points are all spread
apart. There are ways of making numbers show how strong the correlation is. These
measurements are called correlation coefficients. The best known is the Pearson
product-moment correlation coefficient. If the answer is 1, then there is strong
correlation. If the answer is -1, then there is weak correlation. Another kind of
correlation coefficient is Spearman's rank correlation coefficient.
Importance if Correlation AnalysisThe study if correlation is of immense use in impractical life in view of following:
1. Most of the variables in economics and business area show relationship.
2. Once the correlation is established between two variables, regression analysis
helps us to estimate value of dependent variable for the given value of
independent variable.
3. Correlation analysis together with regression analysis helps us to understand the
behavior of various social and economic variables.
4. The effect of correlation is to reduce the range of uncertainty in our predictions.
There are following types of Correlations1. Positive or Negative
2. Linear or Non-linear
Positive and Negative CorrelationPositive or direct correlation refers to the movement of variables in the same direction.
The correlation is said to be positive or direct when the increase (decrease) in the value of
one variable is accompanied by an increase (decrease) in the value of the other variable.
Linear and Non-Linear Correlation
In perfect linear correlation the amount of the change in one variable bears a constant
ratio to the amount of change in one variable bears a constant ratio to the amount of
change in the other. The graph of variables having such a relation will be a straight line.
On the other hand, in non-linear correlation, the amount in one variable does not bear a
constant ratio to the amount of change in other variable.
Properties of Correlation Coefficient
r has no unit. It is a pure number. It means units of measurement are not part of r.
A negative value of r indicates an inverse relation. A change in one variable is
associated with change in the other variable in the opposite direction.
Page 68
AMF104
If r is positive the two variables move in the same direction.

If r = 0 the two variables are uncorrelated. There is no linear relation between them.
However other types of relation may be there.
If r = 1 or r = 1 the correlation is perfect. The relation between them is exact.
A high value of r indicates strong linear relationship. Its value is said to be high when it
is close to +1 or 1.
A low value of r indicates a weak linear relation. Its value is said to be low when it is
close to zero.
The value of the correlation coefficient lies between minus one and plus one, 1
r<=
r is outside this range it indicates error in
calculation.
The value of r is unaffected by the change of origin and change of scale.
3.4 Time Series Analysis

Arrangement of statistical data in chronological order ie., in accordance with occurrence
of time, is known as Time Series. Such series have a unique important place in the field
of Economic Business statistics. A business man is interested in finding out his likely
sales in the near future, so that the businessman could adjust his production accordingly
and avoid the possibility of inadequate production to meet the demand. In this connection
one usually deal with statistical data, which are collected, observed or recorded at
successive intervals of time. Such data are generally referred to as time series.
According to Mooris Hamburg A time series is a set of statistical observations arranged
in chronological order Ya-Lun- chou defining the time series as A time series may be
defined as a collection of readings belonging to different time periods, of some economic
variable or composite of variables. A time series is a set of observations of a variable
usually at equal intervals of time. Here time may be yearly, monthly, weekly, daily or
even hourly usually at equal intervals of time.
The Primary purpose of the analysis of time series is to discover and measure all types of
variations which characterise a time series. The central objective is to decompose the
various elements present in a time series and to use them in business decision making.
Components of Time series:
The components of a time series are the various elements which can be segregated from
the observed data. The following are the broad classification of these components.
In time series analysis, it is assumed that there is a multiplicative relationship between

these four components.
Page 69
AMF104
Symbolically,
Y=T
Where Y denotes the result of the four elements; T = Trend; S = Seasonal component; C
= Cyclical components; I = Irregular component
In the multiplicative model it is assumed that the four components are due to different
causes but they are not necessarily independent and they can affect one another. Another
approach is to treat each observation of a time series as the sum of these four
components. Symbolically
Y = T + S+ C+ I
The additive model assumes that all the components of the time series are independent of
one another.
1) Secular Trend or Long - Term movement or simply Trend
2) Seasonal Variation
3) Cyclical Variations
4) Irregular or erratic or random movements (fluctuations)
Secular Trend:
It is a long term movement in Time series. The general tendency of the time series is to
increase or decrease or stagnate during a long period of time is called the secular trend or
simply trend.
Methods of Measuring Trend:
Trend is measured by the following mathematical methods.
1. Graphical method
2. Method of Semi-averages
3. Method of moving averages
4. Method of Least Squares
Graphical Method:
This is the easiest and simplest method of measuring trend. In this method, given data
must be plotted on the graph, taking time on the horizontal axis and values on the vertical
axis. Draw a smooth curve which will show the direction of the trend. While fitting a
trend line the following important points should be noted to get a perfect trend line.
(i) The curve should be smooth.
(ii) As far as possible there must be equal number of points above and below the trend
line.
(iii) The sum of the squares of the vertical deviations from the trend should be as small as
possible.
(iv)If there are cycles, equal number of cycles should be above or below the trend line.
(v) In case of cyclical data, the area of the cycles above and below should be nearly
equal.
Example:
Fit a trend line to the following data by graphical method.
Year
1996 1997 1998 1999 2000 2001 2002
Sales (in Rs 000) 60
72
75
65
80
85
95
Solution:
Page 70
AMF104
The dotted lines refers trend line

Merits:
1. It is the simplest and easiest method. It saves time and labour.
2. It can be used to describe all kinds of trends.
3. This can be used widely in application.
4. It helps to understand the character of time series and to select appropriate trend.
Demerits:
1. It is highly subjective. Different trend curves will be obtained by different persons for
the same set of data.
2. It is dangerous to use freehand trend for forecasting purposes.
3. It does not enable us to measure trend in precise quantitative terms.
Method of semi averages:
In this method, the given data is divided into two parts, preferably with the same number
of years. For example, if we are given data from 1981 to 1998 i.e., over a period of 18
years, the two equal parts will be first nine years, i.e., 1981 to 1989 and from 1990 to
1998. In case of odd number of years like 5,7,9,11 etc, two equal parts can be made
simply by omitting the middle year. For example, if the data are given for 7 years from
1991 to 1997, the two equal parts would be from 1991 to 1993 and from 1995 to 1997,
the middle year 1994 will be omitted.
After the data have been divided into two parts, an average of each part is obtained. Thus
we get two points. Each point is plotted at the mid-point of the class interval covered by
respective part and then the two points are joined by a straight line which gives us the
required trend line. The line can be extended downwards and upwards to get intermediate
values or to predict future values.
Example :
Draw a trend line by the method of semi-averages.
Year
1991 1992 1993 1994 1995 1996
Sales Rs in (1000) 60 75 81 110 106 117
Solution:
Divide the two parts by taking 3 values in each part.
Page 71
AMF104
Difference in middle periods = 1995 1992 = 3 years

Difference in semi averages = 111 72 = 39
Trend of 1991 = Trend of 1992 -13
= 72-13 = 59
Trend of 1993 = Trend of 1992 +13
= 72 + 13 = 85
Similarly, we can find all the values
The following graph will show clearly the trend line.
Merits:
1. It is simple and easy to calculate
2. By this method every one getting same trend line.
3. Since the line can be extended in both ways, we can find the later and earlier estimates.
Demerits:
1. This method assumes the presence of linear trend to the values of time series which
may not exist.
2. The trend values and the predicted values obtained by this method are not very
reliable.
Method of Moving Averages:
This method is very simple. It is based on Arithmetic mean. Theses means are calculated
from overlapping groups of successive time series data. Each moving average is based on
value covering a fixed time interval, called period of moving average and is shown
against the center of the interval. The method of odd period of moving average is as
Page 72
AMF104
follows. The moving averages for three years is (a+b+c)/3, (b+c+d)/3, (c+d+e)/3, etc.
Formula for five yearly moving average is (a+b+c+d+e)/5, (b+c+d+e+f)/5,
(c+d+e+f+g)/5, etc.
Steps for calculating odd number of years are following:
1. Find the value of three years total, place the value against the second year.
2. Leave the first value and add the next three years value (ie 2nd, 3rd and 4th years
value) and= put it against 3rd year.
3. Continue this process until the last years value taken.
4. Each total is divided by three and placed in the next column.
Example:
Calculate the three yearly averages of the following data.
Year
1975 1976 1977 1978 1979 1980 1981 1982 1983
1984
Production in (tones)
50
36
43
45
39
38
33
42
41
34
Solution:
Even Period of Moving Averages:

The middle period of each set of values lies between the two time points in case of even
moving period. So we must center the moving averages.
The steps are
1. Find the total for first 4 years and place it against the middle of the 2nd and 3rd year in
the third column.
2. Leave the first year value, and find the total of next four-year and place it between the
3rd and 4th year.
3. Continue this process until the last value is taken.
4. Next, compute the total of the first two four year totals and place it against the 3rd year
in the fourth column.
5. Leave the first four years total and find the total of the next two four years totals and
place it against the fourth year.
Page 73
AMF104
6. This process is continued till the last two four years total is taken into account.
7. Divide this total by 8 (Since it is the total of 8 years) and put it in the fifth column.
These are the trend values.
Example: The production of Tea in India is given as follows. Calculate the Four-yearly
moving averages
Year
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002
Production (tones) 464 515 518 467 502
540 557
571 586 612
Solution:
Merits:
1. The method is simple to understand and easy to adopt as compared to other methods.
2. Method is flexible as mere addition of more figures to the data will not change the
entire calculation. That will produce only some more trend values.
3. Regular cyclical variations can be completely eliminated by a period of moving
average equal to the period of cycles.
4. It is particularly effective if the trend of a series is very irregular.
Demerits:
1. It cannot be used for forecasting or predicting future trend, which is the main objective
of trend analysis.
2. The choice of the period of moving average is sometimes subjective.
3. Moving averages are generally affected by extreme values of items.
4. It cannot eliminate irregular variations completely.
Page 74
AMF104
Method of Least Square:

This method is widely used. It plays an important role in finding the trend values of
economic and business time series. It helps for forecasting and predicting the future
values. The trend line by this method is called the line of best fit.
The equation of the trend line is y = a + bx, where the constants a and b are to be
estimated so as to minimize the sum of the squares of the difference between the given
values of y and the estimate values of y by using the equation. The constants can be
obtained by solving two normal equations.
y = na + bx . (1)
xy = ax + bx2 (2)
Here x represent time point and y are observed values. n is the number of pair- values.
When odd numbers of years are given
Step 1: Writing given years in column 1 and the corresponding sales or production etc in
column 2.
Step 2: Write in column 3 start with 0, 1, 2 .. against column 1 and denote it as X
Step 3: Take the middle value of X as A
Step 4: Find the deviations u = X - A and write in column 4
Step 5: Find u2 values and write in column 5.
Step 6: Column 6 gives the product uy
Now the normal equations become
y = na + bu
where u = X-A
uy = au + bu2
Since u = 0, From above equation
a
y/n
uy = bu2
b
y/u2
y = a + bu = a + b (X - A)
Example:
Fit a straight line trend by the method of least squares for the following data.
Year
1983 1984 1985 1986 1987 1988
Sales (Rs. in lakhs) 3
8
7
9
11 14
Also estimate the sales for the year 1991
Page 75
AMF104
Solution:
u = (X-A)/(1/ 2)
= 2 (X
The straight line equation is
y = a + bX = a + bu
The normal equations are
y = na .(1)
uy
u2 (2)
From (1) 52 = 6a
a = 52/6
= 8.67
From (2) 66 = 70 b
b = 66/70
= 0.94
The fitted straight line equation is
y = a+bu
y = 8.67+0.94(2X-5)
y = 8.67 + 1.88X - 4.7
y = 3.97 + 1.88X
The trend values are
Put X = 0, y = 3.97 X = 1, y = 5.85
X = 2, y = 7.73
X = 3, y = 9.61
X = 4, y = 11.49
X = 5, y = 13.37
The estimated sale for the year 1991 is; put X = x 1983
= 1991 1983 = 8
y = 3.97 + 1.88 8
= 19.01 lakhs
The following graph will show clearly the trend line.
Page 76
AMF104
Merits:
1. Since it is a mathematical method, it is not subjective so it eliminates personal bias of
the investigator.
2. By this method we can estimate the future values. As well as intermediate values of the
time series.
3. By this method we can find all the trend values.
Demerits:
1. It is a difficult method. Addition of new observations makes recalculations.
2. Assumption of straight line may sometimes be misleading since economics and
business time series are not linear.
3. It ignores cyclical, seasonal and irregular fluctuations.
4. The trend can estimate only for immediate future and not for distant future.
Seasonal Variations:
Seasonal Variations are fluctuations within a year during the season. The factors that
cause seasonal variation are
i) Climate and weather condition.
ii) Customs and traditional habits.
Measurement of seasonal variation:
The following are some of the methods more popularly used for measuring the seasonal
variations.
1. Method of simple averages.
2. Ratio to trend method.
3. Ratio to moving average method.
4. Link relative method
Method of simple averages
The steps for calculations:
i) Arrange the data season wise
ii) Compute the average for each season.
iii) Calculate the grand average, which is the average of seasonal averages.
iv) Obtain the seasonal indices by expressing each season as percentage of Grand average
The total of these indices would be 100n where n is the number of seasons in the year.
Page 77
AMF104
Example :
Find the seasonal variations by simple average method for the data given below.
Solution:
Grand average = (47.6

=224/4 = 56
Seasonal Index for I quarter = (First quarterly Average/ Grand Average)
Seasonal Index for II quarter = (Second quarterly Average/Grand Average)
= 113.6
Seasonal Index for III quarter = (Third quarterly Average/Grand Average)
Seasonal Index for IV quarter = (Fourth quarterly Average/Grand Average)
Cyclical variations:
The term cycle refers to the recurrent variations in time series that extend over longer
period of time, usually two or more years. Most of the time series relating to economic
and business show some kind of cyclic variation. A business cycle consists of the
recurrence of the up and down movement of business activity. It is a four-phase cycle
namely.
1. Prosperity
2. Decline
3. Depression
4. Recovery
Each phase changes gradually into the following phase. The following diagram illustrates
a business cycle.
Page 78
AMF104
The study of cyclical variation is extremely useful in framing suitable policies for
stabilizing the level of business activities. Businessmen can take timely steps in
maintaining business during booms and depression.
Irregular variation:
Irregular variations are also called erratic. These variations are not regular and which do
not repeat in a definite pattern. These variations are caused by war, earthquakes, strikes
flood, revolution etc. This variation is short-term one, but it affects all the components of
series. There are no statistical techniques for measuring or isolating erratic fluctuation.
Therefore the residual that remains after eliminating systematic components is taken as
representing irregular variations.
3.5 Forecasting
Introduction:
A very important use of time series data is towards forecasting the likely value of
variable in future. In most cases it is the projection of trend fitted into the values
regarding a variable over a sufficiently long period by any of the methods discussed
latter. Adjustments for seasonal and cyclical character introduce further improvement in
the forecasts based on the simple projection of the trend. The importance of forecasting in
business and economic fields lies on account of its role in planning and evaluation. If
suitably interpreted, after consideration of other forces, say political, social governmental
policies etc., this statistical technique can be of immense help in decision making.
The success of any business depends on its future estimates. On the basis of these
estimates a business man plans his production stocks, selling market, arrangement of
additional funds etc.
Forecasting is different from predictions and projections. Regression analysis, time series
analysis, Index numbers are some of the techniques through which the predictions and
projections are made. Whereas forecasting is a method of foretelling the course of
business activity based on the analysis of past and present data mixed with the
consideration of ensuring economic policies and circumstances. In particular forecasting
means fore-warning. Forecasts based on statistical analysis are much reliable than a guess
work.
Methods of Business forecasting:
There are three methods of forecasting
1. Naive method
Page 79
AMF104
2. Barometric methods
3. Analytical Methods
1. Naive method: It contains only the economic rhythm theory.
2. Barometric methods: It covers
i) Specific historical analogy
ii) Lead- Lag relationship
iii) Diffusion method
iv) Action reaction theory
3. Analytical Methods: It contains
i) The factor listing method
ii) Cross-cut analysis theory
iii) Exponential smoothing
iv) Econometric methods
The economic rhythm theory:
In this method the manufactures analysis the time-series data of his own firm and
forecasts on the basis of projections so obtained. This method is applicable only for the
individual firm for
Which the data are analyzed, the forecasts under this method are not very reliable as no
subjective matters are being considered.
Diffusion method of Business forecasting
The diffusion index method is based on the principle that different factors, affecting
business, do not attain their peaks and troughs simultaneously. There is always time-log
between them. This method has the convenience that one has not to identify which series
has a lead and which has a lag. The diffusion index depicts the movement of broad group
of series as a whole without bothering about the individual series. The diffusion index
shows the percentage of a given set of series as expanding in a time period. It should be
carefully noted that the peaks and troughs of diffusion index are not the peaks troughs of
the business cycles. All series do not expand or contract concurrently. Hence if more than
50% are expanding at a given time, it is taken that the business is in the process of
booming and vice - versa.
The graphic method is usually employed to work out the diffusion index. The diffusion
index can be constructed for a group of business variables like prices, investments, profits
etc.
Cross cut analysis theory of Business forecasting:
In this method a thorough analysis of all the factors under present situations has to be
done and an estimate of the composite effect of all the factors is being made. This method
takes into account the views of managerial staff, economists, consumers etc. prior to the
forecasting. The forecast about the future state of the business is made on the basis of
overall assessment of the effect of all the factors.
Page 80
AMF104
END CHAPTER-3 QUIZZES

Answer the following questions1. _______ is a relative measure of association between two or more variables
(a). coefficient of correlation
(b). coefficient of regression
(c). both
((d). none of these
2. Correlation coefficient lies between
(a) -1 to +1
(b) 0 and +1
(c) -1 and 0
(d) None of these
3. R is independent of ______
(a). choice of origin and not of choice of scale
(b). choice of scale and not of choice of origin
(c). both choice of origin and choice of scale
(d). none of these
4. Correlation between temperature and sale of garments
(a). Positive
(b). 0
(c). negative
(d). none of these
5. Covariance can vary from
(a). -1 to+1
(b) infinity to + infinity
(c). -1 to 0
(d). 0 to +1
6. Karl Pearsons coefficient is defined from
(a). Ungrouped data
(b). Grouped data
(c). Both
(d). None
7. If the value of r2 for a particular situation is 0.49. What is the coefficient of
correlation
(a). 0.49
(b). 0.7
(c). 0.07
(d). Cannot be determined
Page 81
AMF104
8. If r=0.6, then the coefficient for non determination is(a).0.4

(b).-0.6
(c).0.36
(d).0.64
9. Coefficient of determination is(a). r3
(b). r2
(c). 1-r2
(d) 1+r2
10. When r=0 then covariance of (x,y) is equal to
(a). +1
(b). -1
(c). 0
(d). none of these
Page 82
AMF104
CHAPTER-4
PROBABILITY & TESTING OF HYPOTHESIS
CONTENTS
Probability & Testing of Hypothesis ...........................................................................................................84
4.1 Introduction ...........................................................................................................................................84
4.2 Classical Probability ..............................................................................................................................86
4.3 Probability Rules ...................................................................................................................................87
4.5 Probability Distribution .........................................................................................................................88
Cumulative Probability Distributions ......................................................................................................89
Discrete and Continuous Probability Distributions .....................................................................................90
Discrete Probability Distributions ...........................................................................................................90
Continuous Probability Distributions ......................................................................................................91
Binomial Distribution ..................................................................................................................................92
Binomial Experiment ..............................................................................................................................92
Notation ...................................................................................................................................................93
Binomial Distribution ..............................................................................................................................93
Binomial Probability ...............................................................................................................................93
Cumulative Binomial Probability ............................................................................................................94
Normal Distribution ....................................................................................................................................96
The Normal Equation ..............................................................................................................................96
The Normal Curve ...................................................................................................................................96
Probability and the Normal Curve ...........................................................................................................97
Standard Normal Distribution .................................................................................................................99
Standard Normal Distribution Table .......................................................................................................99
The Normal Distribution as a Model for Measurements .......................................................................101
4.6 Sampling..............................................................................................................................................102
4.7 Sampling Distribution .........................................................................................................................103
4.8 Chi-Square Distributions .....................................................................................................................103
4.9 t-Tests ..................................................................................................................................................105
4.10 F-Tests ...............................................................................................................................................108
Page 83
AMF104
Probability & Testing of Hypothesis

4.1 Introduction
A decision maker who always faces some degree of risk while selecting a particular
decision to solve a problem. It is because each strategy can lead to a number of different
possible outcomes. Thus it is necessary for the decisionmakers to enhance their
capability of grasping the probabilistic situation so as to gain a deeper understanding of
the decision problem and base their decisions on rational considerations. For this the
knowledge of the concept of probability, its distribution and various related statistical
terms is needed. The knowledge of probability and its various types of distributions help
in the development of probabilistic decision models.
Suppose a coin-is tossed once and the up face is recorded. The result we see and record-is
called an observation, or measurement, and the process of making an observation is
called an experiment. An experiment is an act or process of observation that leads to a
single outcome that cannot be predicted with certainty.
Some common terminologies used in probability areRandom Experiment
A random experiment is a process leading to two or more possible outcomes with
uncertainty as to which outcome will occur. If we list all the possible outcomes of this
experiment, we have a Sample Space = S = set of all possible outcomes of the
experiment.
Sample Space
The possible outcomes of a random experiment are called the basic outcomes, and the set
of all basic outcomes is called the sample space. The symbol S will be used to denote the
sample space. If E is a subset of S, the E is called an event.
Event
An event E is any subset of basic outcomes from the sample space. An event occurs if the
random experiment results in one of its constituent basic outcomes. The null event
represents the absence of a basic outcome and is denoted by .
Intersection of Events
Let A and B be two events in the sample space S. Their intersection denoted A B, is
the set of all basic outcomes in S that belong to both A and B. Hence, the intersection A
B occurs if and only if both A and B occur. More generally, given K events E1,
E2,Ek their intersection E1E2 .Ek, is the set of all basic outcomes that belong
to every Ei ( i = 1,2, .K).
Mutually Exclusive
If the events A and B have no common basic outcomes, they are called mutually
exclusive and their intersection A B is said to be the empty set indicating that A B
cannot occur. More generally, the K events E1,E2,Ek are said to be mutually exclusive
if every pair of them is a pair of mutually exclusive events.
Union
Let A and B be two events in the sample space S. Their union, denoted A U B, is the set
of all basic outcomes in S that belong to at least one of these two events. Hence, the
Page 84
AMF104
union A U B occurs if and only if either A or B or both occur. More generally, given the
K events E1,E2,Ek their union E1 U E2 U U EK is the set of all basic outcomes
belonging to at least one of these K events.
Complement
Let A be an event in the sample space S. The set of basic outcomes of a random
experiment belonging to S but not to A is called the complement of A and is denoted by
A.
To understand probability distributions, it is important to understand variables. Random
variables, and some notation. A variable is a symbol (A, B, x, y, etc.) that can take on any
of a specified set of values. When the value of a variable is the outcome of a statistical
experiment, that variable is a random variable. Generally, statisticians use a capital
letter to represent a random variable and a lower-case letter, to represent one of its values.
For example,
X represents the random variable X.

P(X) represents the probability of X.
P(X = x) refers to the probability that the random variable X is equal to a
particular value, denoted by x. As an example, P(X = 1) refers to the probability
that the random variable X is equal to 1.
Probability is the numerical assessment of likelihood on a scale from 0 (impossibility) to

1 (absolute certainty). Probability is usually expressed as the ratio between the number of
ways an event can happen and the total number of things that can happen (e.g., there are
13 ways of picking a diamond from a deck of 52 cards, so the probability of picking a
diamond is13/52, or 1/4). Probability of any event lies between 0 and 1. Sum of the
probabilities of all mutually exclusive and collective exhaustive events is 1.
When we are absolutely certain of an event occurring we assign it a probability of 1. So,
P(S)=1. If an event has no chance at all of occurring we assign it a probability of 0. The
closer a probability is to 1, the more certain we are of its chances of occurring. The closer
a probability is to 0, the less certain we are of its chances of occurring.
A sample point is the most basic outcome of an experiment.
Example:
Two coins are tossed, and their up faces are recorded. List all the sample points for this
experiment.
Solution:
Even for a seemingly trivial experiment, we must be careful when listing the sample
points. At first glance, we might expect three basic outcomes: Observe two heads,
Observe two tails. or Observe one head and one tail. However, further reflection reveals
that the last of these, Observe one head and one tail, can be decomposed into two
outcomes: Head on coin 1, Tail on coin 2; and Tail on coin 1, Head on coin 2. Thus, we
have four sample points:
1. Observe HH
2. Observe HT
Page 85
AMF104
3. Observe TH
4. Observe TT
where H in the first position means "Head on coin 1," H in the second position means
"Head on coin 2," and so on.
The collection of all the sample points of an experiment is called the sample space of the
experiment. For example, there are six sample points in the sample space associated with
the die-toss experiment.
Example:
Experiment: Observe the up face on a die
Sample space:
1. Observe a 1
2. Observe a 2
3. Observe a 3
4. Observe a 4
5. Observe a 5
6. Observe a 6
7. This sample space can be represented in a set of six sample points.
S: {1,2,3,4,5,6}
Probability Postulates
Let S denote the sample space of a random experiment, Oi are the basic outcomes, and A
an event. For each event A of the sample space S, we assume that a number P(A) is
defined and we have the postulates.
If A is any event in the sample space S
0<P(A)<1.
Let A be an event in S, and let Oi denote the basic outcomes. Then P(A)= P(Oi ) where
the notation implies that the summation extends over all the basic outcomes in A.
4.2 Classical Probability

Classical Definition of Probability
The classical definition of probability is the proportion of times that an event will occur,
assuming that all outcomes in a sample space are equally likely to occur. The probability
of an event is determined by counting the number of outcomes in the sample space that
satisfy the event and dividing by the total number of outcomes in the sample space. The
probability of an event A is
P(A) = NA/N
Where NA is the number of outcomes that satisfy the condition of event A and N is the
total number of outcomes in the sample space. The important idea here is that one can
develop a probability from fundamental reasoning about the process.
Example:
Page 86
AMF104
In a pack of cards, we have N=52 equally likely outcomes. Now have to determine the
probability that the card is King, Queen and card is not a King.
Solution:
Probability of being King = 4/52
= 1/13
Probability of being Queen = 4/52
=1/13
Probability that card is not a King = (52-4)/52
= 48/52
= 12/13
4.3 Probability Rules

Complement Rule
Let A be an event and its complement. Then the complement rule is:
=1-P(A)
The Addition Rule of Probabilities
Let A and B be two events. The probability of their union is
P(A U B ) = P ( A ) + P( B ) - P( A B )
Conditional Probability
Let A and B be two events. The conditional probability of event A, given that event B has
occurred, is denoted by the symbol P( A|B ) and is found to be:
P(A/B) = P(AB)/P(B)
The Multiplication Rule of Probabilities
Let A and B be two events. The probability of their intersection can be derived from
conditional probability as
P( A B) = P( A|B ) P( B )
Statistical Independence
Let A and B be two events. These events are said to be statistically independent if and
only if
P( A / B ) = P( A ) P( B )
From the multiplication rule it also follows that
P( A|B ) = P( A ) ( if P( B ) > 0 )
More generally, the events E1, E2, ., EK are mutually statistically independent if and
only if
P( E1 E2 .. EK ) = P( E1 ) P( E2 )..P( EK )
Page 87
AMF104
4.5 Probability Distribution

Probability distribution is related to frequency distributions. Probability distribution is
like theoretical frequency distribution. A theoretical frequency distribution is a
probability distribution that describes how outcomes are expected to vary. Because these
distributions deal with expectations, they are useful models in making inferences and
decisions under conditions of uncertainty.
To understand probability distributions, it is important to understand variables. Random

Variable, and some notation. A variable is a symbol (A, B, x, y, etc.) that can take on any
of a specified set of values.
When the value of a variable is the outcome of a statistical experiment, that

variable is a random variable.
Generally, statisticians use a capital letter to represent a random variable and a lower-case
letter, to represent one of its values. For example,
X represents the random variable X.

P(X) represents the probability of X.
P(X = x) refers to the probability that the random variable X is equal to a
particular value, denoted by x. As an example, P(X = 1) refers to the probability
that the random variable X is equal to 1.
The relationship between random variables and probability distributions can be easily
understood by example. Suppose you flip a coin two times. This simple statistical
experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the variable
X represent the number of Heads that result from this experiment. The variable X can
take on the values 0, 1, or 2. In this example, X is a random variable; because its value is
determined by the outcome of a statistical experiment.
A probability distribution is a table or an equation that links each outcome of a
statistical experiment with its probability of occurrence. Consider the coin flip
experiment described above. The table below, which associates each outcome
with its probability, is an example of a probability distribution.
Number of Heads
0
1
2
Probability
0.25
0.50
0.25
The above table represents the probability distribution of the random variable X.
Page 88
AMF104
Cumulative Probability Distributions

A cumulative probability refers to the probability that the value of a random variable
falls within a specified range.
Let us return to the coin flip experiment. If we flip a coin two times, we might ask: What
is the probability that the coin flips would result in one or fewer heads? The answer
would be a cumulative probability. It would be the probability that the coin flip
experiment results in zero heads plus the probability that the experiment results in one
head.
P(X < 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
Like a probability distribution, a cumulative probability distribution can be represented
by a table or an equation. In the table below, the cumulative probability refers to the
probability than the random variable X is less than or equal to x.
Number of heads: x
0
1
2
Probability: P(X = x)
0.25
0.50
0.25
Cumulative Probability: P(X < x)

0.25
0.75
1.00
Example:
Suppose a die is tossed. What is the probability that the die will land on 6 ?
Solution: When a die is tossed, there are 6 possible outcomes represented by: S = { 1, 2,
3, 4, 5, 6 }. Each possible outcome is a random variable (X), and each outcome is equally
likely to occur. Thus, we have a uniform distribution. Therefore, the P(X = 6) = 1/6.
Example:
2
Suppose we repeat the dice tossing experiment described in Example 1. This time, we ask
what is the probability that the die will land on a number that is smaller than 5 ?
Solution: When a die is tossed, there are 6 possible outcomes represented by: S = { 1, 2,
3, 4, 5, 6 }. Each possible outcome is equally likely to occur. Thus, we have a uniform
distribution.
This problem involves a cumulative probability. The probability that the die will land on
a number smaller than 5 is equal to:
Page 89
AMF104
P( X < 5 ) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1/6 + 1/6 + 1/6 + 1/6 = 2/3
Discrete and Continuous Probability Distributions
If a variable can take on any value between two specified values, it is called a continuous
variable; otherwise, it is called a discrete variable.
Some examples will clarify the difference between discrete and continuous variables.
Suppose the fire department mandates that all fire fighters must weigh between
150 and 250 pounds. The weight of a fire fighter would be an example of a
continuous variable; since a fire fighter's weight could take on any value between
150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads
could be any integer value between 0 and plus infinity. However, it could not be
any number between 0 and plus infinity. We could not, for example, get 2.5
heads. Therefore, the number of heads must be a discrete variable.
Just like variables, probability distributions can be classified as discrete or continuous.

Discrete Probability Distributions
If a random variable is a discrete variable, its probability distribution is called a discrete
probability distribution.
An example will make this clear. Suppose you flip a coin two times. This simple
statistical experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let
the random variable X represent the number of Heads that result from this experiment.
The random variable X can only take on the values 0, 1, or 2, so it is a discrete random
variable.
The probability distribution for this statistical experiment appears below.
Number of heads
0
1
2
Probability
0.25
0.50
0.25
The above table represents a discrete probability distribution because it relates each value
of a discrete random variable with its probability of occurrence. In subsequent lessons,
we will cover the following discrete probability distributions.
Page 90
AMF104
Binomial probability distribution

Hypergeometric probability distribution
Multinomial probability distribution
Poisson probability distribution
Note: With a discrete probability distribution, each possible value of the discrete random
variable can be associated with a non-zero probability. Thus, a discrete probability
distribution can always be presented in tabular form.
Continuous Probability Distributions
If a random variable is a continuous variable, its probability distribution is called a
continuous probability distribution.
A continuous probability distribution differs from a discrete probability distribution in
several ways.
The probability that a continuous random variable will assume a particular value
is zero.
As a result, a continuous probability distribution cannot be expressed in tabular
form.
Instead, an equation or formula is used to describe a continuous probability
distribution.
Most often, the equation used to describe a continuous probability distribution is called a
probability density function. Sometimes, it is referred to as a density function, a PDF,
or a pdf. For a continuous probability distribution, the density function has the following
properties:
Since the continuous random variable is defined over a continuous range of values
(called the domain of the variable), the graph of the density function will also be
continuous over that range.
The area bounded by the curve of the density function and the x-axis is equal to 1,
when computed over the domain of the variable.
The probability that a random variable assumes a value between a and b is equal
to the area under the density function bounded by a and b.
For example, consider the probability density function shown in the graph below.
Suppose we wanted to know the probability that the random variable X was less than or
equal to a. The probability that X is less than or equal to a is equal to the area under the
curve bounded by a and minus infinity as indicated by the shaded area.
Page 91
AMF104
Note: The shaded area in the graph represents the probability that the random variable X
is less than or equal to a. This is a cumulative probability. However, the probability that
X is exactly equal to a would be zero. A continuous random variable can take on an
infinite number of values. The probability that it will equal a specific value (such as a) is
always zero.
Later we will discuss following distribution in that chapter:
Normal probability distribution

t distribution
Chi-square distribution
F distribution
Binomial Distribution
To understand binomial distributions and binomial probability, it helps to understand
binomial experiments and some associated notation; so we cover those topics first.
Binomial Experiment
A binomial experiment (also known as a Bernoulli trial) is a statistical experiment that
has the following properties:
The experiment consists of n repeated trials.

Each trial can result in just two possible outcomes. We call one of these outcomes
a success and the other, a failure.
The probability of success, denoted by P, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the
outcome on other trials.
Consider the following statistical experiment. You flip a coin 2 times and count the
number of times the coin lands on heads. This is a binomial experiment because:
The experiment consists of repeated trials. We flip a coin 2 times.

Each trial can result in just two possible outcomes - heads or tails.
The probability of success is constant - 0.5 on every trial.
Page 92
AMF104
The trials are independent; that is, getting heads on one trial does not affect
whether we get heads on other trials.
Notation
The following notation is helpful, when we talk about binomial probability.
x: The number of successes that result from the binomial experiment.

n: The number of trials in the binomial experiment.
P: The probability of success on an individual trial.
Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
b(x; n, P): Binomial probability - the probability that an n-trial binomial
experiment results in exactly x successes, when the probability of success on an
individual trial is P.
nCr: The number of combinations of n things, taken r at a time.
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a
binomial experiment. The probability distribution of a binomial random variable is called
a binomial distribution (also known as a Bernoulli distribution).
Suppose we flip a coin two times and count the number of heads (successes). The
binomial random variable is the number of heads, which can take on values of 0, 1, or 2.
The binomial distribution is presented below.
Number of heads
0
1
2
Probability
0.25
0.50
0.25
The binomial distribution has the following properties:
The mean of the distribution (x) is equal to n * P .

The variance (2x) is n * P * ( 1 - P ).
The standard deviation (x) is sqrt[ n * P * ( 1 - P ) ].
Binomial Probability
The binomial probability refers to the probability that a binomial experiment results in
exactly x successes. For example, in the above table, we see that the binomial probability
of getting exactly one head in two coin flips is 0.50.
Page 93
AMF104
Given x, n, and P, we can compute the binomial probability based on the following
formula:
Binomial FormulaSuppose a binomial experiment consists of n trials and results in x successes. If the
probability of success on an individual trial is P, then the binomial probability is:
B(x; n,P) = nCx * Px * (1-P)n-x
Example
Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the
number of successes is equal to 2, and the probability of success on a single trial is 1/6 or
about 0.167. Therefore, the binomial probability is:
b(2; 5, 0.167) = 5C2 * (0.167)2 * (0.833)3
b(2; 5, 0.167) = 0.161
Cumulative Binomial Probability
A cumulative binomial probability refers to the probability that the binomial random
variable falls within a specified range (e.g., is greater than or equal to a stated lower limit
and less than or equal to a stated upper limit).
For example, we might be interested in the cumulative binomial probability of obtaining
45 or fewer heads in 100 tosses of a coin (see Example 1 below). This would be the sum
of all these individual binomial probabilities.
b(x < 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + ... + b(x = 44; 100, 0.5) +
b(x = 45; 100, 0.5)
Example
The probability that a student is accepted to a prestigeous college is 0.3. If 5 students
from the same school apply, what is the probability that at most 2 are accepted?
Solution: To solve this problem, we compute 3 individual probabilities, using the
binomial formula. The sum of all these probabilities is the answer we seek. Thus,
Page 94
AMF104
b(x < 2; 5, 0.3) = b(x = 0; 5, 0.3) + b(x = 1; 5, 0.3) + b(x = 2; 5, 0.3)

b(x
<
2;
5,
0.3)
=
0.1681
+
0.3601
+
0.3087
b(x < 2; 5, 0.3) = 0.8369
Example
What is the probability that the World Series will last 4 games? 5 games? 6 games? 7
games? Assume that the teams are evenly matched.
Solution:
This is a very tricky application of the binomial distribution. If you can follow the logic
of this solution, you have a good understanding of the material covered in the tutorial, to
this point.
In the world series, there are two baseball teams. The series ends when the winning team
wins 4 games. Therefore, we define a success as a win by the team that ultimately
becomes the world series champion.
For the purpose of this analysis, we assume that the teams are evenly matched. Therefore,
the probability that a particular team wins a particular game is 0.5.
Let's look first at the simplest case. What is the probability that the series lasts only 4
games. This can occur if one team wins the first 4 games. The probability of the National
League team winning 4 games in a row is:
b(4; 4, 0.5) = 4C4 * (0.5)4 * (0.5)0 = 0.0625
Similarly, when we compute the probability of the American League team winning 4
games in a row, we find that it is also 0.0625. Therefore, probability that the series ends
in four games would be 0.0625 + 0.0625 = 0.125; since the series would end if either the
American or National League team won 4 games in a row.
Now let's tackle the question of finding probability that the world series ends in 5 games.
The trick in finding this solution is to recognize that the series can only end in 5 games, if
one team has won 3 out of the first 4 games. So let's first find the probability that the
American League team wins exactly 3 of the first 4 games.
b(3; 4, 0.5) = 4C3 * (0.5)3 * (0.5)1 = 0.25
Page 95
AMF104
Okay, here comes some more tricky stuff, so listen up. Given that the American League
team has won 3 of the first 4 games, the American League team has a 50/50 chance of
winning the fifth game to end the series. Therefore, the probability of the American
League team winning the series in 5 games is 0.25 * 0.50 = 0.125. Since the National
League team could also win the series in 5 games, the probability that the series ends in 5
games would be 0.125 + 0.125 = 0.25.
The rest of the problem would be solved in the same way. You should find that the
probability of the series ending in 6 games is 0.3125; and the probability of the series
ending in 7 games is also 0.3125.
Normal Distribution
The normal distribution refers to a family of continuous probability distributions
described by the normal equation.
The Normal Equation
The normal distribution is defined by the following equation:
Normal equation
The value of the random variable Y is:
Y= [1/ * sqrt(2)] * e-(x-)2/22
Where X is a normal random variable, is the mean, is the standard deviation, is
approximately 3.14159, and e is approximately 2.71828.
The random variable X in the normal equation is called the normal random variable.
The normal equation is the probability density function for the normal distribution.
The Normal Curve
The graph of the normal distribution depends on two factors - the mean and the standard
deviation. The mean of the distribution determines the location of the center of the graph,
and the standard deviation determines the height and width of the graph. When the
standard deviation is large, the curve is short and wide; when the standard deviation is
Page 96
AMF104
small, the curve is tall and narrow. All normal distributions look like a symmetric, bellshaped curve, as shown below.
The curve on the left is shorter and wider than the curve on the right, because the curve
on the left has a bigger standard deviation.
Probability and the Normal Curve
The normal distribution is a continuous probability distribution. This has several
implications for probability.
The total area under the normal curve is equal to 1.

The probability that a normal random variable X equals any particular value is 0.
The probability that X is greater than a equals the area under the normal curve
bounded by a and plus infinity (as indicated by the non-shaded area in the figure
below).
The probability that X is less than a equals the area under the normal curve
bounded by a and minus infinity (as indicated by the shaded area in the figure
below).
Additionally, every normal curve (regardless of its mean or standard deviation) conforms
to the following "rule".
About 68% of the area under the curve falls within 1 standard deviation of the
mean.
About 95% of the area under the curve falls within 2 standard deviations of the
mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the
mean.
Page 97
AMF104
Collectively, these points are known as the empirical rule or the 68-95-99.7 rule.
Clearly, given a normal distribution, most outcomes will be within 3 standard deviations
of the mean.
Example:
An average light bulb manufactured by the Acme Corporation lasts 300 days with a
standard deviation of 50 days. Assuming that bulb life is normally distributed, what is the
probability that an Acme light bulb will last at most 365 days?
Solution: Given a mean score of 300 days and a standard deviation of 50 days, we want
to find the cumulative probability that bulb life is less than or equal to 365 days. Thus, we
know the following:
The value of the normal random variable is 365 days.

The mean is equal to 300 days.
The standard deviation is equal to 50 days.
We enter these values into the Normal Distribution Calculator and compute the
cumulative probability. The answer is: P( X < 365) = 0.90. Hence, there is a 90% chance
that a light bulb will burn out within 365 days.
Example:
Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a
standard deviation of 10, what is the probability that a person who takes the test will
score between 90 and 110?
Solution: Here, we want to know the probability that the test score falls between 90 and
110. The "trick" to solving this problem is to realize the following:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )
We use the Normal Distribution Calculator to compute both probabilities on the right side
of the above equation.
To compute P( X < 110 ), we enter the following inputs into the calculator: The
value of the normal random variable is 110, the mean is 100, and the standard
deviation is 10. We find that P( X < 110 ) is 0.84.
Page 98
AMF104
To compute P( X < 90 ), we enter the following inputs into the calculator: The
value of the normal random variable is 90, the mean is 100, and the standard
deviation is 10. We find that P( X < 90 ) is 0.16.
We use these findings to compute our final answer as follows:

P( 90 < X < 110
P(
90
<
X
P( 90 < X < 110 ) = 0.68
=
<
P( X
110
<
110
)
)
=
P(
0.84
<
-
90 )
0.16
Thus, about 68% of the test scores will fall between 90 and 110.
Standard Normal Distribution
The standard normal distribution is a special case of the normal distribution. It is the
distribution that occurs when a normal random variable has a mean of zero and a standard
deviation of one.
The normal random variable of a standard normal distribution is called a standard score
or a z-score. Every normal random variable X can be transformed into a z score via the
following equation:
z = (X - ) /
where X is a normal random variable, is the mean mean of X, and is the standard
deviation of X.
Standard Normal Distribution Table
A standard normal distribution table shows a cumulative probability associated with a
particular z-score. Table rows show the whole number and tenths place of the z-score.
Table columns show the hundredths place. The cumulative probability (often from minus
infinity to the z-score) appears in the cell of the table.
For example, a section of the standard normal table is reproduced below. To find the
cumulative probability of a z-score equal to -1.31, cross-reference the row of the table
containing -1.3 with the column containing 0.01. The table shows that the probability that
a standard normal random variable will be less than -1.31 is 0.0951; that is, P(Z < -1.31)
= 0.0951.
Page 99
AMF104
z
3.
0
...
1.
4
1.
3
1.
2
...
3.
0
0.00
0.001
3
0.01
0.001
3
0.02
0.001
3
0.03
0.001
2
0.04
0.001
2
0.05
0.001
1
0.06
0.001
1
0.07
0.001
1
0.08
0.001
0
0.09
0.001
0
...
0.080
8
...
0.079
3
...
0.077
8
...
0.076
4
...
0.074
9
...
0.073
5
...
0.072
2
...
0.070
8
...
0.069
4
...
0.068
1
0.096
8
0.095
1
0.093
4
0.091
8
0.090
1
0.088
5
0.086
9
0.085
3
0.083
8
0.082
3
0.115
1
0.113
1
0.111
2
0.109
3
0.107
5
0.105
6
0.103
8
0.102
0
0.100
3
0.098
5
...
0.998
7
...
0.998
7
...
0.998
7
...
0.998
8
...
0.998
8
...
0.998
9
...
0.998
9
...
0.998
9
...
0.999
0
...
0.999
0
Of course, you may not be interested in the probability that a standard normal random
variable falls between minus infinity and a given value. You may want to know the
probability that it lies between a given value and plus infinity. Or you may want to know
the probability that a standard normal random variable lies between two given values.
These probabilities are easy to compute from a normal distribution table. Here's how.
Find P(Z > a). The probability that a standard normal random variable (z) is
greater than a given value (a) is easy to find. The table shows the P(Z < a). The
P(Z
>
a)
=
1
P(Z
<
a).
Suppose, for example, that we want to know the probability that a z-score will be
greater than 3.00. From the table (see above), we find that P(Z < 3.00) = 0.9987.
Therefore, P(Z > 3.00) = 1 - P(Z < 3.00) = 1 - 0.9987 = 0.0013.
Find P(a < Z < b). The probability that a standard normal random variables lies
between two values is also easy to find. The P(a < Z < b) = P(Z < b) - P(Z < a).
For example, suppose we want to know the probability that a z-score will be
greater than -1.40 and less than -1.20. From the table (see above), we find that
P(Z < -1.20) = 0.1151; and P(Z < -1.40) = 0.0808. Therefore, P(-1.40 < Z < -1.20)
= P(Z < -1.20) - P(Z < -1.40) = 0.1151 - 0.0808 = 0.0343.
In school or on the Advanced Placement Statistics Exam, you may be called upon to use
or interpret standard normal distribution tables. Standard normal tables are commonly
found in appendices of most statistics texts.
Page 100
AMF104
The Normal Distribution as a Model for Measurements

Often, phenomena in the real world follow a normal (or near-normal) distribution. This
allows researchers to use the normal distribution as a model for assessing probabilities
associated with real-world phenomena. Typically, the analysis involves two steps.
Transform raw data. Usually, the raw data are not in the form of z-scores. They
need to be transformed into z-scores, using the transformation equation presented
earlier: z = (X - ) / .
Find probability. Once the data have been transformed into z-scores, you can use
standard normal distribution tables, online calculators (e.g., Stat Trek's free
normal distribution calculator), or handheld graphing calculators to find
probabilities associated with the z-scores.
Example: Mr. X earned a score of 940 on a national achievement test. The mean test
score was 850 with a standard deviation of 100. What proportion of students had a higher
score than Mr. X? (Assume that test scores are normally distributed.)
(A) 0.10
(B) 0.18
(C) 0.50
(D) 0.82
(E) 0.90
Solution:
The correct answer is B. As part of the solution to this problem, we assume that test
scores are normally distributed. In this way, we use the normal distribution as a model for
measurement. Given an assumption of normality, the solution involves three steps.
First, we transform Mr. X's test score into a z-score, using the z-score
transformation equation.
z = (X - ) / = (940 - 850) / 100 = 0.90
Then from the standard normal distribution table, we find the cumulative
probability associated with the z-score. In this case, we find P(Z < 0.90) = 0.8159.
Therefore, the P (Z > 0.90) = 1 P (Z < 0.90) = 1 - 0.8159 = 0.1841.
Thus, we estimate that 18.41 percent of the students tested had a higher score than Mr. X.
Page 101
AMF104
4.6 Sampling
A population is commonly understood to be a natural, geographical, or political
collection of people, animals, plants, or objects. Some statisticians use the word in the
more restricted sense of the set of measurements of some attribute of such a collection;
thus they might speak of the population of heights of male college students. Or they
might use the word to designate a set of categories of some attribute of a collection, for
example, the population of religious affiliations of U.S. government employees.
In statistical discussions, we often refer to the physical collection of interest as well as to
the collection of measurements or categories derived from the physical collection. In
order to clarify which type of collection is being discussed, in this book we use the term
population as it is used by the research scientist: The population is the physical
collection. The derived set of measurements or categories is called the set of values of the
variable of interest. Thus, in the first example above, we speak of the set of all values of
the variable height for the population of male college students.
After we have defined the population and the appropriate variable, we usually find it
impractical, if not impossible, to observe all the values of the variable. For example, all
the values of the variable miles per gallon in city driving for this years model of a certain
type of car could not be obtained since some of the cars probably are yet to be produced.
Even if they did exist, the task of obtaining a measurement from each car is not feasible.
In another example, the values of the variable condition of all packaged bandages (sterile
or contaminated) produced on a particular day by a certain firm could be obtained, but
this is not desirable since the bandages would be made useless in the process of testing.
Instead, we consider a sample (a portion of the population), obtain measurements or
observations from this sample (the sample data), and then use statistics to make an
inference about the entire set of values. To carry out this inference, the sample must be
random.
Types of Sampling
There are two methods of selecting samples from populations: nonrandom or judgmental
sampling and random or probability sampling. In probability sampling, all the items in
population have a chance of being chosen in the sample. In judgmental sampling,
personal knowledge and opinion are used to identify the items from the population that
are to be included in the sample.
Random sampling
Most statistics departments have entire courses in which different sampling techniques
and their efficiencies are studied; only a brief description of sampling can be given here.
If we have a population of N items from which a sample of n is to be drawn and we
choose the n items in such a way that every combination of n items has an equally likely
chance of being chosen, then this is called a simple random sample.
Page 102
AMF104
There are other methods of sampling besides simple random sampling. One is stratified
random sampling. This consists in dividing the population into groups, or strata, and then
taking a simple random sample from each stratum. This is done to improve the accuracy
of estimates, to reduce cost, or to make it possible to compare strata. The sampling is
often proportional so that the sizes of the samples from the strata are proportional to the
sizes of the strata. Other is systematic sampling, in which elements are selected from the
population at a uniform interval that is measured in time, order, or space. While in cluster
sampling we divide the population into groups, or clusters, and then select a random
sample of these clusters.
4.7 Sampling Distribution

A probability distribution of all the possible means of the samples is a distribution of
sample means or sampling distribution of the mean.
Population mean and sample average
The population mean is to be described as the populations center or location. If the
population were totally accessible, its mean would be computed by the formula
=y/N
in which (the lower-case Greek letter mu) is the symbol for the population mean, y is
the sum of all of the values of the variable of interest for the whole population, and N is
the number of elements in the population. We rarely have an opportunity to use this
formula since most of the populations we study are not totally accessible; they either are
too large, perhaps even infinite, or would be destroyed in the process of measurement.
Population variance and sample variance
The population variance is a measure of the spread of the population. Suppose we want to
choose between two investment plans and are told that both have mean earnings of 10%
per annum; we might conclude that they were equally good. However, suppose we learn
that plan A has a variance twice as large as plan B. This gives us additional information
on which to base a choice. A population variance can be computed from ungrouped data
or from data that are grouped into a frequency or relative frequency distribution if the
population is of the accessible variety. For ungrouped data, a population variance is
defined to be
4.8 Chi-Square Distributions

The chi-square is a continuous probability distribution. Although this theoretical
probability distribution is usually not a direct model of a population distribution, it has
many uses when we are trying to answer questions about populations. For example, the
Page 103
AMF104
chi-square distribution can be used to decide whether or not a set of data fits a specified
theoretical probability model a goodness-of-fit test.
Goodness-of-fit tests
Goodness-of-Fit Test with a Specified Parameter
Example: Each day a salesperson calls on 5 prospective customers and she records
whether or not the visit results in a sale. For a period of 100 days her record is as follows:
Number of sales: 0 1 2 3 4 5
Frequency:
15 21 40 14 6 4
A marketing researcher feels that a call results in a sale about 35% of the time, so he
wants to see if this sampling of the salespersons efforts fits a theoretical binomial
distribution for 5 trials with 0.35 probability of success, b( y; 5, 0.35). This binomial
distribution has the following probabilities and leads to the following expected values for
100 days of records:
Page 104
AMF104
Since the last category has an expected value of less than 1, he combines the last two
categories to perform the goodness-of-fit test.
In this goodness-of-fit test the hypotheses are:

H0: This sample is from b(y; 5, 0:35)
Ha: This sample is not from b(y; 5, 0:35)
The degrees of freedom are v k 2 1 5 2 1 4. The critical value is
The null hypothesis is rejected if this value is exceeded. Thus the marketing researcher
rejects the null hypothesis. The sales do not follow the pattern of this binomial
distribution.
4.9 t-Tests
If random samples of size less than 30 are taken from a normal distribution and the
samples used to estimate the
Variance, then the statistic
is not normally distributed. The probabilities in the tails of this distribution are greater
than for the standard normal distribution
Page 105
AMF104
Comparison of the standard normal distribution and a t distribution

t distributions are
1. unimodal;
2. asymptotic to the horizontal axis;
3. symmetrical about zero, E(t);
4. dependent on n, the degrees of freedom (for the statistic under discussion, v n 2 1);
5. more variable than the standard normal distribution, V(t) = v/(v - 2) for n > 2;
6. approximately standard normal if v is large.
Example:
Using a t Distribution to Test a Hypothesis about
The sports physiologist would like to test H0:= 17 against Ha:
marathon runners. In a random sample of 8 female runners, he finds
for female
Since n = 8, the degrees of freedom are v =7, and at = 0.05 the null hypothesis will be
rejected if |t|= t0.025,7 2.365. The test statistic is
Thus he rejects the null hypothesis and concludes that for women the distance until stress
is more than 17 miles.
It is possible to make inference about another type of mean, the mean of the difference
between two matched groups. For example, the mean difference between pretest scores
and post-test scores for a certain course or the mean difference in reaction time when the
same subjects have received a certain drug or have not received the drug might be
desired. In such situations, the experimenter will have two sets of sample data (in the
examples just given, pretest/post-test or received/did not receive); however, both sets are
obtained from the same subjects. Sometimes the matching is done in other ways, but the
object is always to remove extraneous variability from the experiment. For example,
identical twins might be used to control for genetically caused variability or two types of
seeds are planted in identical plots of soil under identical conditions to control for the
Page 106
AMF104
effect of environment on plant growth. If the experimenter is dealing with two matched
groups, the two sets of sample data contain corresponding members thus he has,
essentially, one set consisting of pairs of data. Inference about the mean difference
between these two dependent groups can be made by working with the differences within
the pairs and using a t distribution with n - 1 degrees of freedom in which n is the number
of pairs.
Example: Matched-Pair t Test
Two types of calculators are compared to determine if there is a difference in the time
required to perform a certain common statistical calculation. Twelve students chosen at
random are given drills with both calculators so that they are familiar with the operation
of each type. Then the time they take to complete the calculation on each device is
measured in seconds (which calculator they are to use first is determined by some random
procedure to control for any additional learning during the first calculation). The data are
as follows:
The null hypothesis is H0: = 0 and Ha:

in which d is the population mean for
the difference in time on the two devices. Thus
The test statistic is
and since t > 2.201, the test is

significant and the two calculators differ in the time necessary to perform the calculation.
Page 107
AMF104
Looking at the data, since yd is positive, the experimenter concludes that the calculation
is faster on machine B.
In the above example, the experimenter was interested in whether there is a difference in
time required on the two calculators; thus d= 0 was tested. The population mean
specified in the null hypothesis need not be zero; it could be some other specified
amount. For example, in an experiment about the reaction time the experimenter might
hypothesize that after taking a certain drug reaction times are slower by 2 seconds; then
H0: d = 2 would be tested, with
. The alternative hypothesis may be
one-tailed or two-tailed, as appropriate for the experimental question.
Using a matched-pair design is a way to control extraneous variability. If the study of the
two calculators involved a random sample of 12 students who used calculator A and
another random sample of 12 students who used calculator B, additional variability
would be introduced because the two groups are made up of different people. Even if
they were to use the same calculator, the means of the two groups would probably be
different. If the differences among people are large, they interfere with our ability to
detect any difference due to the calculators. If possible, a design involving two dependent
samples that can be analyzed by a matched-pair t test is preferable to two independent
samples.
4.10 F-Tests
Inference about two variances
There are situations, of course, in which the variances of the two populations under
consideration are different. The variability in the weights of elephants is certainly
different from the variability in the weights of mice, and in many experiments, even
though we do not have these extremes; the treatments may affect the variances as well as
the means.
The null hypothesis H0:12=22 is tested by using a statistic that is in the form of a ratio
rather than a difference; the statistic is s12=s22. Intuitively, if the variances are equal, this
ratio should be approximately equal to 1, so values that differ greatly from 1 indicate
inequality.
It has been found that the statistic s12/s22 from two normal populations with equal
variances follows a theoretical distribution known as anF distribution. The density
functions for F distributions are known, and we can get some understanding of their
nature by listing some of their properties. Let us call a random variable that follows an F
distribution F; then the following properties exist:
1. F > 0.
2. The density function of F is not symmetrical.
3. F depends on an ordered pair of degrees of freedom v1 and v2; that is, there is a
different F distribution for each ordered pair v1, v2. (v1 corresponds to the degrees of
freedom of the numerator of s12 /s22 and v2 corresponds to the denominator.)
4. If a is the area under the density curve to the right of the value Fa,v1,v2 , then
Fa,v1,v2 = 1/F1-a,v2,v1
Page 108
AMF104
5. The F distribution is related to the t distribution:

Fa,1,v2 = (ta/2,v2 )2
Table A.12 in the Appendix gives upper critical values for F if a 0.050, 0.025, 0.010,
0.005, 0.001. Lower-tail values can be found using property 4 above.
Example Testing for the Equality of Two Variances
Both rats and mice carry ectoparasites that can transmit disease organisms to humans. To
determine which of the two rodents presents the greater health hazard in a certain area, a
public health officer traps (presumably at random) both and counts the number of
ectoparasites each carries. The data are presented first in side-by-side stem-and-leaf plots
and then as side-by-side box-and-whisker plots:
He wants to test for the equality of means with a group comparison t test. He assumes
that these discrete counts are approximately normally distributed, but because he is
studying animals of different species, sizes, and body surface areas, he has some doubts
about the equality of the variances in the two populations, and the box plots seem to
support that concern. Thus he first must test
with the test statistic F =s12/s22 =43:4/13:0 = 3:34. Since n1 = 31 and n2 = 9, the degrees
of freedom for the numerator are v1 = n1 - 1 = 30 and for the denominator v2 = n2 - 1 =
8.
From table
F0:05,30,8 = 3:079 and F0:05,8,30 = 2:266
thus the region of rejection at a = 0.10 is F >= F0:05, 30,8 = 3:079 and F <= F0:95,30,8 =
1/ F0:05,8,30= 1/ 2:266= 0:441
Since the computed F equals 3.34, the null hypothesis is rejected, and the public health
officer concludes that the variances are unequal. Since one of the sample sizes is small,
he may not perform the usual t test for two independent samples.
One-tailed tests of hypotheses involving the F distribution can also be performed, if
desired, by putting the entire probability of a Type I error in the appropriate tail. Central
confidence intervals on 12= 22 are found as follows:
Page 109
AMF104
Although the public health officer cannot perform the usual t test for two independent
samples because of the unequal variances and the small sample size, there are
approximation methods available. One such test is called the BehrensFisher, or the t 0
test for two independent samples and using adjusted degrees of freedom.
Page 110
AMF104
CHAPTER-4 END CHAPTER QUIZZES

Answer the following
1.
If one event is unaffected by the outcome of another event, the two events are said
to be
(a). Dependent
(b). Independent
(c). Mutually exclusive
(d). All of the above
2. If P(A or B)= P(A), then
(a). A abd B are mutually exclusive
(b). The Venn diagram areas of A and B overlap
(c). P(A) + P(B) is the joint probability of A and B
(d). none of the above
3. The simple probability of an occurrence of an event is called the
(a). Bayesian probability
(b). Joint Probability
(c). Marginal Probability
(d). Conditional Probability
4. What is the probability that a value chosen at random from a particular populationis
larger than the median of the population
(a). 0.25
(b). 0.5
(c). 1.0
(d). 0.67
5. Which of the following is a characteristic of the probability distribution for any
random variable?
(a). A probability is provided for every possible value
(b). The sum of all probabilities is 1.
(c). No given probability occurs more than once.
(d). (a) and (b) both
6. Which of the following is a necessary condition for use of a Poisson distribution?
(a). Probability of one arrival per second is constant.
(b). The number of arrivals in any 1-second interval is independent of arrivals in the other
intervals.
(c). the probability of two or more arrivals in the same second is zero.
(d). Both (b) and (c)
7. Which of the following is method of selecting samples from a population?
(a). Judgment sampling
(b). Random sampling
Page 111
AMF104
(c). Probability sampling

(d). All of these
8. Choose the pair of symbols that best completes this sentence: ________is a parameter,
whereas ____ is a statistics.
(a) N,
(b) , s
(c) N, n
(d) All of these
9. The chi-square and the t-distribution are both:
(a). Always symmetrical distribution
(b). Used for hypothesis testing
(c). Dependent on the number of degrees of freedom
(d). both (b) and (c)
10. The F- ratio contains:
(a). Two estimates of the population variance
(b). Two estimates of population mean
(c). One estimate of population mean and one estimate of population variance
(d). None of the above
Page 112
AMF104
CHAPTER-5
DECISION THEORY
CONTENTS
5.1 Decision Environment .........................................................................................................................115
5.2 Structure of Decision Making Problem ...............................................................................................115
5.3 Types of Decision Making Criteria .....................................................................................................118
5.4 Expected Monetary Value (EMV).......................................................................................................122
5.5 Expected Value of Perfect Information ...............................................................................................123
5.6 Expected Opportunity Loss (EOL) ......................................................................................................124
Page 113
AMF104
Decision Theory
Typically, personal and professional decisions can be made with little difficulty. Either
the best course of action is clear or the ramifications of the decision are not significant
enough to require a great amount of attention. On occasion, decisions arise where the
path is not clear and it is necessary to take substantial time and effort in devising a
systematic method of analyzing the various courses of action.
When a decision maker must choose one among a number of possible actions, the
ultimate consequences of some if not all of these actions will generally depend on
uncertain events and future actions extending indefinitely far into the future. With
decisions under uncertainty, the decision maker must:
1. Take an inventory of all viable options available for gathering information, for
experimentation, and for action;
2. List all events that may occur;
3. Arrange all pertinent information and choices/assumptions made;
4. Rank the consequences resulting from the various courses of action;
5. Determine the probability of an uncertain event occurring.
According to the function cycles of management the managerial activity as whole
includes five phases or life processes which are as follows:1. Planning,
2. Organization,
3. Direction,
4. Supervision and
5. Control.
In performing all of these activities the management has to face several such situations
where they have to make a choice of the best among a number of alternative courses of
action. This choice making is technically termed as decision making or decision
taking. A decision is simply a selection from two or more courses of action. Decision
making permeates all management activities. It is essential for the following:1. Setting objectives,
2. Deciding basic corporate policies,
3. Preparing future product and facility programs
4. Other major plans or proposals that will have an important long run influence on the
future profit and assets requirements of the enterprise.
It also involves determining the type of organization structure ascertaining how to
motivate personnel and accepting innovations. Decision making may be defined as:process which results in the selection from a set of alternative courses of action, that
course of action which is considered to meet the objectives of the decision problem more
satisfactorily then other as judged by the decision maker.
Scientific decision making process is characterized by adoption of systematic, logical and
through-going reasoning to understand problem situations and to arrive at the best
Page 114
AMF104
Possible decisions through the process of :1. Abstraction,

2. Observation,
3. Investigation,
4. Experimentation,
5. Measurement and
6. Analysis of data in the relevant real word decision environment.
5.1 Decision Environment

An organization is working in an environment which is continuously being affected by
the parties outside to it. e.g. Government and other statutory authorities, ,competitors and
general public etc. Apart from a clear cut benefit/cost analysis, where the benefit should
be more than the related costs for a decision to be implemented, an organization should
also seek to know the views of the outside parties.
For example, certain restrictions imposed by the relevant government (rules relating to
environmental pollution, elimination of air and water pollution etc.), the policies of
competitors (pricing policy, marketing policy) & the hopes and expectations of the
consumers (low price, good quality, safety usage) would also have a direct bearing on the
organization decisions.
It may be noted that objectivity rather than subjectivity, reason rather than emotion, open
adherence to certain verifiable concepts, principles and decision rules rather than
expedient opportunism, orderly precise processes rather than haphazard, rough and ready
rule of thumb process, are the basic ingredients of decision making process. The method
permits the decision maker :1. to generate accurate empirical data,
2. to establish logical relationships among the various variables in the decision situation
and
3. to make optimal decisions based on facts.
The idea behind decision making process is that various options are developed and
weighted logically and empirically as far as practicable and feasible before adapting
particular choice or course of action. There are at least five stages in the decision making
process.
These are:
1. Perceiving the need and opportunity for decision making on any particular problem.
2. Determination of objectives sought by solution of the problem.
3. Collection of relevant information pertinent to the problem on hand.
4. Developing and evaluating alternative courses of action.
5. Choosing one of the alternatives.
5.2 Structure of Decision Making Problem

Decision theory commences with the assumption that regardless of the type of decisionwhether it involves long-range or short-range consequences; whether it is in finance,
production or making, or some other area; whether it is at a relatively high or low level of
Page 115
AMF104
managerial responsibility-certain common characteristics of the decision problem are to

be discussed). The decision problem under study may be represented by a model in terms
of the following elements:
1. The Decision maker.
The decision maker refers to individual or a group of individuals responsible for making
the choice of an appropriate course of action amongst the available courses of action.
2. Courses of Action.
The alternative courses of action or strategies are the acts that are available to decision
maker. The decision analysis involves a selection among two or more alternative courses
of action and the problem is to choose the best of these alternatives, in order to achieve an
objective.
An example of an act or state of nature is the number of units of a particular item to be
ordered for stock.
3. State of Nature
The events identify the occurrences which are outside of the decision makers control and
which determine the level of success for a given act. These events are often called state
of nature, or outcomes. (The decision maker has no control over which event take place
and would only attach a subjective probability or degree of belief of occurrence of each).
An example of an event or state of nature is the level of market for a particular item
during a stipulated time period.
4. Payoff
Each combination of a course of action and a state of nature is associated with a payoff,
which measures the net benefit to the decision maker that accrues from a given
combination of decision alternatives and events. They are also known as conditional
profit values or conditional economic consequences.
For example the conditional profit can be of Rs. 50 associated with the action of stocking
15 units of an item when the outcome is a demand of 10 units of that item. Costs can be
handled as negative profit.
5. Payoff Table
For a given problem, a payoff table, lists the states of nature (outcomes or events) which
are mutually exclusive as well as collectively exhaustive and a set of given courses of
action (strategies). For each combination of state of nature and course of action, the
payoff is calculated.
Suppose the problem under consideration has m possible events or states of nature
denoted by S1, S2, , Sm and n alternative courses of action (strategies) denoted by A1,
A2, , An. Then the pay-off corresponding to strategy Aj of the decision maker under the
state of nature Si will be denoted by Pij (i = 1, 2, , m; j = 1, 2, , n). The mn pay-off
can be conveniently arranged in a tabular form known as a m x n payoff table.
Payoff Table (Conditional Pay-off)
Page 116
AMF104
The weighted profit associated with a given combination of state of nature and course of
action is obtained by multiplying the payoff for that state of nature and course of action
by the probability of occurrence of the given state of nature (outcome). Although it is not
universal, mostly payoff is measured in tern of monetary units.
Opportunity Loss Table
The opportunity loss has been defined to be the difference between the highest possible
profit for a state of nature and the actual profit obtained for the particular action taken,
i.e., an opportunity loss is the loss incurred due to failure of not adopting the best possible
course of action or strategy. Opportunity losses are calculated separately for each state of
nature that might occur. For a given state of nature the opportunity loss of possible course
of action is the difference between the pay-off value for that course of action and the pay
off for the best possible course of action that could have been selected.
Consider a fixed state of nature Si. The pay-offs corresponding to the n strategies are
given by Pi1, Pi2, , Pin Suppose Mi is the maximum of these quantities. Then if A1 is
used by the decision maker there is loss of opportunity of M1-Pi1 and so on. Then a table
showing opportunity loss can be computed as follows:
Example:
Suppose electrical goods have a resource base to buy for resale purposes in a market,
electric irons in the range of 0 to 4. His resource base permits him to buy:1. Nothing or
2. 1 or
3. 2 or
4. 3 or
5. 4 units.
These are his alternative courses of action or strategies. The demand for electric irons in
any month is something beyond his control and hence is a state of nature. Let us presume
that the dealer does not know how many units will be bought from him by the customers.
The demand could be anything from 0 to 4. The dealer can buy each unit of electric iron
@ Rs. 40 and can sell it at Rs. 45 each, his margin being Rs 5 per unit. Assume the stock
on hand is valueless. Portray in a pay off table and opportunity loss table the quantum of
total margin (loss), that he gets in relation to various alternative strategies and states of
nature.
Solution
Conditional Pay-off value = (Marginal Profit) (Units sold) (Marginal Loss) (Units not
sold)
= (Rs. 45 Rs. 40) (Units sold) (Rs. 40) (Units not sold)
Payoff Table
Page 117
AMF104
Opportunity Loss Table
5.3 Types of Decision Making Criteria

The main aim of decision theory is to help the decision-maker in selecting best course of
action from amongst the available courses of action. Based upon the type of information
that is available about the occurrence of the various states of nature as well as upon the
decision environment, the decision models have been classified into four types: 1. Certainty,
2. Risk,
3. Uncertainty, and
4. Conflict.
Decision Making Under Certainty
In this environment, the decision maker knows with certainty the consequence of
selecting every course of action or decision choice. In this type of decision problems the
decision maker presumes that only one state of nature is relevant for his purposes. He
identifies this state of nature, takes it for granted and presumes complete knowledge as to
its occurrence.
For example, suppose a person has Rs. 20,000 to invest for a one year period. One
alternative is to open a saving account paying 6 percent interest and another is to invest in
a government treasury note paying 10 percent interest. If both investments are secure and
guaranteed, then there is a certainty that the treasury not will be the better investment.
The various techniques used for solving problems under certainty are:
1. Breakeven analysis,
2. Linear Programming models,
3. Transportation and Assignment techniques,
4. Inventory models under certainty,
5. Analysis of cost, volume and profit, etc.
Page 118
AMF104
Upon systematically describing the problem and recording all necessary data, judgments,
and preferences, the decision maker must synthesize the information set before him/her
using the most appropriate decision rules. Decision rules prescribe how an individual
faced with a decision under uncertainty should go about choosing a course of action
consistent with the individuals basic judgments and preferences.
Hurwicz criterion;
Laplace insufficient reason criterion;
Maximax criterion;
Maximin criterion;
Savage minimax regret criterion.
Decision Rules:
A tool commonly used to display information needed for the decision process is a payoff
matrix or decision table. The table shown below is an example of a payoff matrix. The
A's stand for the alternative actions available to the decision maker. These actions
represent the controllable variables in the system. The uncertain events or states of
nature are represented by the S's. Each S has an associated probability of its occurance,
denoted P. (However, the only decsion rule that makes use of the probabilities is the
Laplace criterion.) The payoff is the numerical value associated with an action and a
particular state of nature. This numerical value can represent monetary value, utility, or
both. This type of table will be used to illustrate each type of decision rule.
Actions\States
A1
A2
A3
S1 (P=.25)
20
0
50
S2 (P=.25)
60
20
-20
S3 (P=.25)
-60
-20
-80
S4 (P=.25)
20
20
20
i. Hurwicz criterion.
This approach attempts to strike a balance between the maximax and maximin criteria. It
suggests that the minimum and maximum of each strategy should be averaged
using a and 1 - a as weights. a represents the index of pessimism and the alternative with
the highest average is selected. The index reflects the decision makers attitude towards
risk taking. A cautious decision maker will set a = 1 which reduces the Hurwicz criterion
to the maximin criterion. An adventurous decision maker will set a = 0 which reduces
the Hurwicz criterion to the maximax criterion. A decision table illustrating the
application of this criterion (with a = .5) to a decision situation is shown below.
Actions\States
A1
A2
A3
S1
20
0
50
S2
60
20
-20
S3
-60
-20
-80
S4
20
20
20
a = .5
0
0
-15
Page 119
AMF104
ii. Laplace insufficient reason criterion.

The Laplace insufficient reason criterion postulates that if no information is available
about the probabilities of the various outcomes, it is reasonable to assume that they are
equally likely. Therefore, if there are n outcomes, the probability of each is 1/n. This
approach also suggests that the decision maker calculate the expected payoff for each
alternative and select the alternative with the largest value. The use of expected values
distinguishes this approach from the criteria that use only extreme payoffs. This
characteristic makes the approach similar to decision making under risk. A table
illustrates
this
criterion
below.
Actions\States
A1
A2
A3
S1 (P=.25)
20
0
50
S2 (P=.25)
60
20
-20
S3 (P=.25)
-60
-20
-80
S4 (P=.25)
20
20
20
Expected Payoff:
0
5
-7.5
iii. Maximax criterion.

The maximax criterion is an optimistic approach. It suggests that the decision maker
examine the maximum payoffs of alternatives and choose the alternative whose outcome
is the best. This criterion appeals to the adventurous decision maker who is attracted by
high payoffs. This approach may also appeal to a decision maker who likes to gamble
and who is in the position to withstand any losses without substantial inconvenience. See
the table below for an illustration of this criterion.
Actions\States
A1
A2
A3
S1
20
0
50
S2
60
20
-20
S3
-60
-20
-80
S4
20
20
20
Max Payoff
60
20
50
iv. Maximin criterion.

The maximin criterion is a pessimistic approach. It suggests that the decision maker
examine only the minimum payoffs of alternatives and choose the alternative whose
outcome is the least bad. This criterion appeals to the cautious decision maker who seeks
to ensure that in the event of an unfavorable outcome, there is at least a known minimum
payoff. This approach may be justified because the minimum payoffs may have a higher
probability of occurrence or the lowest payoff may lead to an extremely unfavorable
outcome. This criterion is illustrated in the table below.
Page 120
AMF104
Actions\States
A1
A2
A3

S1
20
0
50
S2
60
20
-20
S3
-60
-20
-80
S4
20
20
20
Min payoff
-60
-20
-80
v. Savage minimax regret criterion.

The Savage minimax regret criterion examines the regret, opportunity cost or loss
resulting when a particular situation occurs and the payoff of the selected alternative is
smaller than the payoff that could have been attained with that particular situation. The
regret corresponding to a particular payoff Xij is defined as Rij = Xj(max) Xij where
Xj(max) is the maximum payoff attainable under the situation Sj. This definition of
regret allows the decision maker to transform the payoff matrix into a regret matrix. The
minimax criterion suggests that the decision maker look at the maximum regret of each
strategy and select the one with the smallest value. This approach appeals to cautious
decision makers who want to ensure that the selected alternative does well when
compared to other alternatives regardless of what situation arises. It is particularly
attractive to a decision maker who knows that several competitors face identical or
similar circumstances and who is aware that the decision makers performance will be
evaluated in relation to the competitors. This criterion is applied to the same decision
situation and transforms the payoff matrix into a regret matrix. This is shown below
Actions\States
A1
A2
A3
R1
30
50
0
R2
0
40
80
R3
40
0
60
R4
0
0
0
Max Regret
40
50
80
Decision Making Under Risk

In this situation, the decision maker faces several states of nature. But one is supposed to
have :1. Believable evidential information,
2. Knowledge,
3. Experience or
4. Judgment,
To enable him to assign probability values to the likelihood of occurrence of each state of
nature. Probabilities could be assigned to future events by reference to similar previous
experience and information. Sometimes past experience, or past records often enable the
decision-maker to assign probability values to the likely possible occurrence of each state
of nature.
Knowing the probability distribution of the states of nature, the best decision is to select
that course of action which has the largest expected payoff value. The most widely used
Page 121
AMF104
criterion for evaluating the alternative courses of action, in this case is the Expected
Monetary Value (EMV) or expected utility. The objective of decision-making here is
to optimize the expected payoff, which may mean either maximization of expected profit
or minimization of expected regret.
For decision problems involving risk situations one most popular method of decision
criterion for evaluating the alternative strategies is the expected monetary value or
expected pay-off. The objective should be to optimize the expected payoff, which may
mean either maximization of expected profit or minimization of expected regret.
5.4 Expected Monetary Value (EMV)

Given a payoff table with conditional values (pay-offs) and probability assessments for
all states of nature, it is possible to determine expected monetary value (EMV) for each
course of action if the decision could be repeated a large number of times. The EMV for
given course of action is just sum of possible pay-off of the alternative, each weighted by
the probability of that payoff occurring.
Suppose there is prior knowledge either on the basic of past experience or on a subject
basis that the state of nature Si has a probability of occurrence P(Sj) [j = 1, 2, , k]. Then
the expected monetary value corresponding to courses of action Ai of the decision maker
is given by:
EMV (Ai) = Pi1 P(S1) + Pi2 P(S2) + Pi3 P(S3) + + Pij P(Sj) + .. + Pim P(Sm)
1. Usually, the conditional values (payoffs) are measured in monetary units and hence the
expected pay-off in the context of decision theory is referred to as the expected monetary
value (EMV).
2. The criterion of selecting the maximum expected pay off act is sometimes referred to
as the Bayes decision rule. This is also known as expected monetary value criterion
abbreviated as EMV criterion. It is frequently used in decision making to find the best
strategy.
3. Steps for calculating EMV. The various steps involved are as follows:
(i) Construct payoff table listing the various courses of action and state of nature.
(ii) List the payoff associated with each possible combination of course of action and
state of nature along with the corresponding probability of each state of nature.
(iii) Calculate an EMV for each course of action by multiplying (weighting) the
conditional profits or costs or losses (payoff) by the associated probabilities and sum
these weighted value for each course of action.
(iv)On the basis of specified decision, determine the course of action which corresponds
to the EMV.
Example:
Using the structure and date of illustration on the previous page determine the EMV for
each alternative.
Computation of Expected Monetary Value
Page 122
AMF104
The maximum value of EMV is corresponding to course of action A3. Hence, according
to the EMV criterion the dealer can buy w electric irons which yields the maximum
EMV,
i.e., EMV* = EMV (A3) = Rs 3.7
5.5 Expected Value of Perfect Information

The expected value with perfect information is the expected or average return, in the long
run, if we have perfect information before a decision has to be made. In order to calculate
this value, we choose the best alternative for each state of nature and multiply its payoff
times the probability of occurrence of that state of nature.
Expected value with perfect information = (Best outcome or consequence for 1st state of
nature) X (Probability of 1st state of nature) + (Best outcome for 2nd state of nature) X
(Probability of 2nd state of nature) + + (Best outcome for last state of nature) X
(Probability of last state of nature)
The Expected value of perfect information, EVPI, is the expected outcome with perfect
information minus the expected outcome without perfect information, namely, the
maximum EMV.
EVPI = Expected value with perfect information maximum EMV
From previous example the dealer can calculate the maximum, that he would pay for
information, i.e., the expected value of perfect information, or EVPI. He follows a twostate process:1. First of all, the expected value with perfect information is computed.
2. Then, using this information EVPI is calculated.
The procedure is outlined as follows:
1. The best outcome for the state of nature S1 is with a payoff of Rs. 0, the best outcome
for the state of nature S2 is with a payoff of Rs. 5 and so on.
Expected value with perfect information = (Re. 0)(0.04) + (Rs. 5)(0.06) + (Rs. 10)(0.20)
+ (Rs.
15)(0.30) + (Rs. 20)(0.40)
= Rs. 0.30 + Rs. 2.0 + Rs. 4.5 + Rs. 8.0 = Rs.
14.8
Page 123
AMF104
2. The maximum EMV is Rs. 3.70, which is the expected outcome without perfect
information.
EVPI = Expected value with perfect information-maximum EMV
= Rs. 14.8 Rs. 3.70 = Rs. 11.10
5.6 Expected Opportunity Loss (EOL)

An alternative approach to maximizing expected monetary value (EMV) is to minimize
expected opportunity loss (EOL). Expected opportunity loss or expected value of regrets
is calculated in the same manner as the EMV criterion discussed earlier.
If P(S1), P(S2), , P(Sm) are the prior probabilities corresponding to the states of nature
S1, S2, , Sm then the expected opportunity loss corresponding to act Aj is
EOL (AJ) = (M1 P1j) P(S1) + (M2 P2j) P(S2) + + (Mm Pmj) P(Sm) ; (j = 1,2,
, n)
Where (Mj-Oij) [i = 1, 2, , m; j = 1, 2, , n] refers to the possible regrets of an
alternative with reference to the highest payoff of an alternative.
Remarks
1. According to the expected opportunity loss (EOL) criterion the decision maker should
choose the strategy with the minimum expected opportunity loss.
2. It may be noted that although the EMV criterion and the EOL criterion are
conceptually different, however, in practice both the criteria will lead to the choice of the
same strategy or course of action. From the equation the expected opportunity loss
(EOL) corresponding to act Ai is M1 P(S1) + M2 P(S2) + +Mk P(Sk) {Oi1 P(S1) + Oi2
P(S2) ++ Oik P(Sk)}
The quantity inside the bracket is the EMV of Ai. Hence the strategy with the maximum
EMV is also the course of action with the minimum EOL.
3. The EOL of the optimum course of action is also same as Expected Value of Perfect
Information (EVPI).
Conceptually, this is the loss or penalty which the decision maker is suffering due to
uncertainty or equivalently due to lack of perfect information.
Page 124
AMF104
END CHAPTER-5 QUIZZES

Answer the following questions:
1. From which of the following can retailers immediately determine how many cases
they should stock each day to maximize profits?
(a). Expected profit from each stocking action
(b). Expected loss from each stocking action
(c). Conditional profit table
(d). Both (a) and (b)
2. Consider a conditional loss table with possible sales levels listed vertically in the
first column and possible stock actions listed horizontally in the first row. Any
value in a column below a zero indicates:
(a). An opportunity loss
(b). An obsolescene loss
(c). A profit
(d). None of these
3. Suppose that the only two possible stocking actions for a particular product are 10
and 15 bottles. Expected profits are $3.35 for 10 bottles and $3.5 for 15 bottles. If
expected loss for 10 bottles is $1.1 and 15 bottles are stocked, we may conclude
that expected loss is:
(a). Higher than $1.1
(b). Lower than $1.1
(c). Higher than $2.25
(d). Cannot be determined from the above information
4. A businessman who is said to be averse to risk:
(a). Prefers to take large risks to earn large gains
(b). Prefers to act any time the expected monetary value is positive
(c). Avoids all situations but those with very high expected values.
(d). None of these
5. Suppose that the actual standard deviation of a normal distribution is unknown,
but that you are told, The odds are 5 to 3 that a random observation is between
500 and 900, You know that the mean is 700. The area under the normal curve
between the values 700 and 900 is:
(a). 8/16
(b). 3/8
(c). 5/8
(d). 5/16
Page 125
AMF104
6. A certain product sells for $25 and is purchased by the retailer for $17. If it is not
sold within 2 weeks, the retailer will recoup only $8 of his original $17
investment because of spoilage. The value of MP for this situation is:
(a). $9
(b). $17
(c). $8
(d). $25
7. For a particular decision, the total benefit of a new plant is $ 18,200,000. If the
expected net benefit of this plant is $11,500,000, what is the cost of the plant?
(a). $6,700,000
(b). $8,400,000
(c). $ 29,700,000
(d). $11,500,000
8. A person who is attempting to maximize his expected utility would use the
expected value criterion, if:
(a). He is risk-averse
(b). He is risk-seeker
(c). He has non linear utility curve
(d). none of these
9. When a problem has a large number of possible actions, we would normally use a
(a). Conditional table
(b). Marginal table
(c). Utility table
(d). Marginal analysis
10. Decision theory deals with:
(a). Making decisions under conditions of uncertainty
(b). Quantity-oriented decisions, ignoring financial repercussions
(c). Worth of additional information to the decision maker.
(d). Both (a) and (c)
Page 126
AMF104
CHAPTER-6
LINEAR PROGRAMMING
CONTENTS
6.1 Definition of Operations Research and Its Main Features or Characteristics......................................129
6.2 Features/Characteristics of OR ............................................................................................................130
6.3 Scope of Operations Research .............................................................................................................131
6.4 Methodology of Operations Research .................................................................................................134
6.5 Models in Operations Research ...........................................................................................................137
6.6 Linear vs Non-linear Programming Model .........................................................................................138
Page 127
AMF104
Linear Programming
The organizations today are working in a highly competitive and dynamic external
environment. Not only the number of decisions required to be made has increased
tremendously but the time period within which these have to be made has also shortened
considerably. Decisions can no longer be taken on the basis of personal experience or
gut feeling. This has resulted in a need for the application of appropriate scientific
methods in decision making.
The name operations research (O.R.) is taken directly from the context in which it was
developed and applied. Subsequently, it came to be known by several other names such
as:1. Management science,
2. Decision science.
3. Quantitative methods and
4. Operational analysis.
After the war, the success of the military provided a much needed boost to the discipline
for. The industry during that period was struggling to cope up with the increase in
complexity. There were complex decision problems. Solutions to which were neither
apparent nor forthcoming.
The successful implementation of operations research technique during the war was
probably the most important event, the industry was waiting for. This paved the way for
the application of OR to the business & industry. As .the business requirements changed,
newer and better operations research techniques evolved.
Another factor which has significantly contributed to the development of OR during the
last few decades is the development of high speed computers capable of performing a
large number of operations in a very short time period. Since 1960s, there has been a
rapid increase in the areas in which operations research has found acceptability. A part
from the industry and the business, OR also finds applicability in areas such as:1. Regional planning.
2. Telecommunications,
3. Crime investigation,
5. Public transportation and
6. Medical sciences.
Operations research has now become one of the most importable tools in decisionmaking and is currently being taught under various management and business programs.
Due to the fast pace at which it has developed & gained widespread acceptance,
professional society devoted to the cause of operations research and its allied activities
have been founded world-wide. e.g. Institute of Management Sciences founded in 1953
seeks to integrate scientific knowledge with the management of an industrial house - by
the development of quantitative methodology to the functional aspects of management.
Critical Path Method (CPM) and Project Evaluation and Review Technique (PERT) were
developed in 1958. These are extensively used in scheduling and monitoring complex
and lengthy projects having a greater time and cost over-run. PERT is now considered as
the ultimate management technique and finds applicability in such diverse areas as: Master of Finance & Control Copyright Amity university India
Page 128
AMF104
1. Construction projects,
2. Ship-building projects,
3. Transportation projects and
4. Military projects.
A large number of business and industrial houses adopted the methodology of operations
research techniques by early 1970s.
The first use of OR techniques in India, was in the year 1949 at Hyderabad, where at the
Regional Research Institute, an independent operations research unit was set-up. To
identify evaluate and solve the problems related to:1. Planning,
2. Purchases and
3. Proper maintenance of stores
6.1 Definition of Operations Research and Its Main Features or

Characteristics
The phrase operations-research is taken from military-operations where it was first used.
It lays down emphasis on the various activities within an organization with a view to
control and co-ordinate them.
Operational Research seeks the optimal solution to a problem & not merely one which
gives better solutions than the one presently in use. A decision taken may not be
acceptable to each & every department with-in an organization but it should be the
optimal decision for greater part of the area organization. To arrive at such decisions the
decision maker must follow up the effects & interactions of a particular decision.
Although a large number of definitions of operations research have been given from time
to time, yet is it almost impossible to provide a single definition of OR which has a
uniform acceptability.
However, we provide some of the important definitions of OR:
(1). Operational Research is the application of the methods of science to complex
problems arising in the direction & management of large number of
a. men,
b. machines,
c. materials and
d. money,
In industry, business, government & defense the distinctive approach is to develop a
scientific model of the system, incorporating measurement of the factors such as chance
& risk, with which to predict & compare the outcomes of alterative decision, strategies &
contra is the purpose is to help the management, determine its policy and actions
scientifically.
Besides being too lengthy, this definition has also been criticized since it focuses on
complex problems & large systems, giving the impression that it is a highly sophisticated
& technical approach which is suitable on by for very large and complex organizations.
Page 129
AMF104
OR is an experimental & applied science devoted to observing understanding &

predicting the behavior of purposeful man and machine in systems; arid operations
research workers are actively engaged in applying this- knowledge to practical problems
in business, government and society.
6.2 Features/Characteristics of OR
Following are the salient features and characteristics of OR:
1. Inter-disciplinary Team Approach.
It is one of the most important features of operations research. What it signifies is that it
is impossible for a single individual to have a thorough and extensive knowledge of all
the aspects of a particular problem which is to be analyzed with the help of OR. This
would require a team of individuals having varied & diverse skills & backgrounds. Such
a team should be inter-disciplinary and., it should include individuals having adequate
degree of proven skills in the fields of:a. Statistics,
b. Engineering,
c. Computer_,
d. Economies and
e. Mathematics etc.
Every expert member of this team analyses each and every aspect of the problem and
determines if a similar problem has ever been undertaken previously by him or his
colleagues. By functioning in such a manner, each member of the team suggests an
approach which may be optimal for the problem under consideration, by utilizing his
experience and skills.
Hence, operations research makes optimal and most effective utilization of people from
diverse disciplines for developing latest tools and techniques applicable to the business.
For example, while working on production planning & control in an organization, one
may need the services of a production engineer, who knows the functions of the
production or assembly-line, a Cost accountant and a statistician. In this manner, every
member of the team so formed benefits from the views of other member of the team.
Such a solution developed through team work and collaboration has a relatively higher
chance of acceptance by the top management.
2. Methodological Approach.
OR is a highly systematic and scientific- method. It follows a fixed & definite pattern
from the start to finish. OR process starts with the careful observation & formulation
where the problem is broken down into various workable and manageable parts and then
carefully, observed by the OR team. The essence of the real problem is then sought to be
translated into the OR model. This model is examined, solutions ascertained, most
optimal ones applied to the model and results examined. If the results are found to be
satisfactory, these solutions are applied to the real or actual problem at hand.
Page 130
AMF104
3. Objectivity Approach
The primary objective of OR is to find the best (optimal) solution to a problem under
consideration. To achieve this goal it is first necessary to define a measure of
effectiveness that takes into consideration the main objectives of the organization. This
measure can then be used as a standard or bench-mark to compare & evaluate the
alternative actions.
4. Totalistic Approach.
Any action or decision within an organization must first be analyzed & their interactions
and the effect on the entire organization carefully considered. Under totalistic approach,
any action should not be seen in isolation. Before evaluating it, its impact on the whole
organization should always be kept in mind.
For example, to remove bottlenecks in product-ion a production manger can store large
quantity of inventories, raw materials & finished goods. While this is important from the
production managers view-point it may lead to a situation where there may arise a direct
conflict between the marketing and the finance departments of the organization.
In such a scenario, the OR team examines all the related factors such as cost of raw
materials, holding & storage costs & competitors prices etc. Based on these, an
appropriate OR-model can be formulated for the solution of the problem.
5. Continuous Process.
OR is a never-ending continuous process, It does not stop when a relevant OR model is
applied to the problem since this may further create new problems in the related sectors
as well as in the implementation, of the decision taken. To implement the optimal
solutions (decisions), sometimes a change in the organizational structure might be
needed. This is provided again by the OR. After implementation stage, the controlling of
results is also the primary function of operations research. Therefore, it can be considered
to be a continuous process.
6. Broad Outlook.
OR has a very great scope. It not only seeks to provide the most suitable optimal
solutions to a problem - but that it also uncovers new problems by study methods.
7. Economy of Operations.
The main function of OR is to provide solution to organizational problem and to assist in
decision-making. In case of any conflict or complexity in a situation, OR helps to
minimize costs and maximize profits resulting into economy of operations. Quite often
the problems are of such grave nature that they cannot be solved on the basis of intuition
or past experiences e.g. New product development, product innovation, entry into new
markets, product-mix etc. In all such cases, OR techniques are applied to achieve the
economy of operations.
6.3 Scope of Operations Research

The scope of operations research is very wide & depends upon the 3 key areas or phases
of OR. These are:1. Judgement Phase.
The various components of this phase are as under:
a. Ascertaining the operation
b. Establishing the objectives.
c. Determining the appropriate effectiveness measures
Page 131
AMF104
d. Formulating the problem

2. Research Phase
The various components of this phase are as under:
a. Observation & collection of data, to fully understand the various complex items that
form a part of the problem.
b. Hypothesis or assumptions formulation
c. Model formulation
d. Analyzing the available data
e. Testing & verifying the accuracy of the hypothesis
t. Generalizing the results,
g. Considering the alternative methods
3. Action Phase
After the above two phases are complete and an optimal solution has been obtained, in
the action phase, recommendation are made to the top management for accepting the,
proposed solutions. Such a recommendation be made by the persons who first presented
the problem for consideration or by any other person who is in a position to make an
effective decision. The decision, once accepted, is then implemented in the
implementation stage. Depending on these key areas, operations research can be applied
in a very large number of real life complex situations. However, any tools or techniques
of OR should be utilized only after carefully considering the organizational requirements
& should not be followed blindly. Moreover; the relative merits & demerits of a
technique should always be kept in mind.
It should be ensured that the basic postulates or assumptions on which it is based should
not change. As and when the new requirement emerge and the existing problems become
redundant, the technique in-use should either be mediated or replaced by another one
immediately
In general, OR techniques can be applied extensively in the following areas in an
organization:
1. Finance:
i. Cash flow analysis, cash management models
ii. Investment portfolios
iii. Long range capital needs
iv. Financial planning models
v. Dividend policy
vi. Equipment replacement policy
viii. Capital allocation under various alternatives.
2. Accounts:
i. Auditing and verification work
ii. Credit policy, credit risks
iii. Credit worthiness/Ratings
iv. Transfer pricing
v. Cost accounting - by-products & standard costs
Page 132
AMF104
3. Marketing:
i. Optimum product mix
ii. Advertising mix
iii. Sales promotion mix
iv. Product selection, timing, competitive action
v. No. of salesmen, frequency of calling, time spent on future customers
vi. Packaging
vii. Ware-housing
viii. Introducing a new product
4. Purchases Management:
i. Terms & conditions for purchases
ii. Pricing of purchases
iii. Quantities to be purchased
iv. When to purchase
v. Bidding policies
vi. Searching for a new supplier or source
vii. Exploiting a new supplier or source
5. Production Management:
a. Physical Distribution
i. Location of warehouse
ii. Size of warehouse
iii. Channels of distribution
iv. Centre of distributions & retail outlets
b. Distribution policy
i. Facilities Planning
ii. No. of factories/warehouses
iii. Location of factories/warehouses
iv. Facilities for loading/unloading for road as well as railway transport schedule
v. Choosing sites with improved/better facilities
c. Manufacturing
i. Production scheduling
ii. Production sequencing
iii. Minimizing in-process inventory
iv. Minimizing wastages/scraps/losses
v. Stabilizing production/lay-offs/product-mix
d. Maintenance & Project Scheduling
i. Maintenance policy
ii. Preventive maintenance
iii. Corrective maintenance
iv. Maintenance of crew size
v. Project scheduling
vi. Allocating the resources
Page 133
AMF104
6. Research & Development (R & D):

i. Areas interested in or key research areas
ii. Selecting the project
iii. Fixing the R & D budget.
iv Reliability & alternative design
v. Finding time over runs
vi. Finding cost - over runs
vii Controlling
7. Organisational Development/Personnel Management:
i. Minimizing the need for temporary help through better scheduling
ii. Selecting the skilled personnel at lowest possible cost
iii. Recruitment policies
iv. Assigning the right job to the right man
v. Right blend of age & skills.
6.4 Methodology of Operations Research

OR involves construction of mathematical models for organizational problems to clearly
represent the complex & uncertain situation. These models are then analyzed to check
their validity & for making proper recommendations to the management. Since OR is a
complicated technique, a problem dealing with operations research is broken down into
the following phases to facilitate proper analysis and evaluation of the alternatives:
1. Acknowledging the problem & presenting it in a clear form
2. Based on the preceding step, trying to create a model having all the relevant
characteristics of the problem.
3. Specifying any alternatives available at that point of time
4. Developing a feasible & acceptable solution on the basis of the given criterion or
constraints.
5. Choosing the alternative which best meets the specified criterion
6. Testing of the solution prior to implementation on real life situations with the help of
simulation etc.
7. Finally implementing the decision deemed to fit in with the organizational objectives.
The various steps involved in OR are as under:Step I : Defining and Formulating the Problem.
The identification of all important interactions between the gray area or the problem area
and all other) operations of an
organisation in totality results in the definition & the formulation of the problem. .
To develop a clear, definite statement of the problem is the first step in OR and often the
most difficult. The reason being that each problem has its unique characteristics and
hence may require the applications of totally distinct approaches to define and
subsequently formulate it. It is very important to look behind the symptoms of a problem
and identify its true causes responsible for the current state of affairs of the organisation.
Page 134
AMF104
Some of the main steps required to be taken into considerations for formulating the
problem are given below:
1. Organisational Culture
The first step in identification of the problem under study is to have a very good
understanding of the organisation its culture, objectives and expectations. This is often
overlooked by the inexperienced managers leading finally to a solution-which may not
work in the organisational climate.
2. Grey or Problem Area
All problems should be ranked properly priority-wise, after their proper identification.
There may be many problems all of which may not be equally important. Listing down
all the problems, breaking them further into components and then ranking in the order of
significance, importance or urgency helps the OR team to focus most on the area having
the highest or the top priority.
Nominally, strategic problems (related to the long-term performance - profitability,
market share, growth etc.) get preference over tactical problems (concerned with the daily
working of an organisation). Due to a multiplicity of objectives, some of the objectives
may be direct; while the other may be indirect or implied ones. Such objectives which are
implied by the other objectives should be eliminated.
3. Climate for Decision Making
An organisation is working in an environment which is continuously being affected by
the parties outside to it. e.g. Government and other statutory authorities, ,competitors and
general public etc. Apart from a clear cut benefit/cost analysis, where the benefit should
be more than the related costs for a decision to be implemented, an organization should
also seek to know the views of the outside parties.
For example, certain restrictions. imposed by the relevant government (rules relating to
environmental pollution, elimination of air and water pollution etc.), the policies of
competitors (pricing policy, marketing policy) & the hopes and
expectations of the c.9nsumers (low price, good quality, safety usage) would also have a
direct bearing on the organizations decisions.
4. Availability of Various Alternatives
Availability of alternatives gives a cushion or margin of safety to the organisation - and
minimizes the chances of incurring losses due to poor or faulty decisions. Some of the
alternatives are inherent to an organisation and flow out of its organisational structure,
while others are discovered during the process of problem formulation. However,
alternatives which are not feasible or which do not satisfy the objectives or the constraints
may require to be eliminated.
Step II: Developing and Constructing a Model
The second step in the OR methodology is to develop and construct a mathematical
model. A model is simply a mathematical representation of a situation. It consists of
mathematical relationships in the form filtrations or inclinations.
By applying the techniques of OR such as:Master of Finance & Control Copyright Amity university India
Page 135
AMF104
a. linear programming,
b. transportation,
c. assignment,and
d. simulation etc. the solution of the model is obtained.
A mathematical model has two main advantages over other types of models
e. it allows to Use the high-speed new generation computing machines and
f. one may also employ the advanced mathematical tools and techniques to obtain the
desired solution.
A mathematical model has three important constituents:
1. Decision Variables and Parameters
The controllable or decision variables are the unknowns which can be obtained by
solving the model. The given or the known values are known as parameters. These maybe
of two types
(a) Deterministic or
(b) Probabilistic.
The objective function and the constraints function are related to the decision variables
through parameters.
2. Constraints
Constraints are essentially the limitations within which the organisation has to function.
These are the limiting factors which confine the activities of an organisation to certain
well deified areas. Such factors could be implicit as well as explicit.
The constraints thus restrict the decision variables to a range offeasib1e values.
3. Objective Function
It show the objective required to be achieved by the organisation. At the optimal or the
best value of solution, the objective function and hence the objectives of the firm are
maximized. Thus, it measures the effectiveness of the system.
Step III: Solving the Model
After the model has been developed and formulated, in this third stage those values of
decision variables are ascertained which would optimize the objective function of the
firm. The choice of the technique required to solve the model would depend upon the
nature and type of the available data and the extent of complexity involved: After the
optimal solution has been reached, a post-optimal analysis called Sensitivity Analysis
can be performed to ascertain the behaviour of the system, changes in systems design,
parameters and specifications.
Sensitivity analysis is a very important part at this stage as it gives an idea as to the
accuracy of data and the various postulates and assumptions that are required to be made
in the earlier stages.
Step IV: Solution-Testing or Model-Validation
Since the optimal solution depends on both-the input data as well as the model, they both
require extensive testing. If the original data were collected by say interviews, some
additional data must also be collected from some other source such as sampling etc. The
idea is to have two sets of data from distinct sources which would then be compared and
statistical tests performed on them. The differences or any discrepancy in the result may
hence easily be detected: The model may not be the most suitable one where the results
Page 136
AMF104
are incompetent in relation to the problem under consideration, but the input data are
accurate. The option in such a case is to thoroughly check the model to ensure that it is
logical and that it represents the real problem or situation.
Retrospective Testing Retrospective Testing can also be employed for a model
validation. The model may be compared with a historical standard to ascertain, whether it
would have predicted correctly what has since been observed to occur.
Tracking Trackingis another technique used in testing the solution generated by OR.
Under this method, the parameters of the real system are varied systematically & the
model is minutely observed for its capability to track all such changes.
However, if it is not possible to test the solution before its implementation, it is better be
patient and cautious in approach so that the solution is allowed to be implemented in
phases. With the passage of time, as and when the model proves itself, other areas can
also be allowed to be covered.
Step V: Implementation Stage
This is the process of incorporating the solution obtained through operations research,
into the organisation. The top management making the decisions has a two-fold task i.e.
a. to identify good decision alternatives and
b. to select those alternatives which are capable of being implemented.
In this context, the decision maker should be when and truly aware of any assumptions as
well as the limitations of the model. Moreover, quite often the feasible optimal solution to
the problem may not prove to be advantageous to the organisation, as it may have a direct
conflict with the system and the ideas and the capabilities of the people who are a part of
the system. As such, many theoretically sound OR solutions may not see the light of the
implementation stage.
Step VI: Establishing Control Mechanisms
No system can be assumed to be completely static. Every instant of time, it is interacting
with its external environment which is in a. continuous state of change. Therefore control
mechanisms -either inherent or external are required to ensure that the changing
conditions do not make a solution unsuitable. Thus, the controls serve a very useful
purpose they maintain the operational effectiveness of the solution.
For example, a recession or a boom in the economy, an increasing or decreasing demand,
fall in the general purchasing power or per capita .income etc. are just some of the
changes that has to be closely monitored & the model modified accordingly.
6.5 Models in Operations Research

Contrary to the popular belief, operations research is not a highly advanced and
complicated discipline, totally independent of any other branch of menagerie geared
towards helping the managers in making effective decisions, On the country, it borrows
quite heavily from other fields especially the mathematics & its allied disciplines. OR
should be viewed as a mix of all the problem solving techniques available to the
management out of which an organisation has to settle for the one which best suits its
long term requirements. Following are some of the models extensively used in OR in
solving various types of problems:
1. Allocation Models
The resources- whether natural or man-made or financial are, scarce while the pressure
on these resources is many-fold due to inversely competitive needs of the organization.
Page 137
AMF104
Allocation models are concerned solely with the problem of optimal allocation of
precious and scarce resources for optimizing the given objective function subject to the
limiting factors prevailing at that point of time or the constraints within which a firm has
to most effectively operate. All such problems related to the allocation aspects are
collectively known as the mathematical programming problems.
Linear programming is very widely used in OR is just an example of the mathematical
programming problem.
6.6 Linear vs Non-linear Programming Model

The nature of the objective function along with its constraints separates a linear
programming problem from a non-linear one. Organisational problems, where both the
objective and the constraints functions are capable of being represented as a linear
function are solved with .the help of linear programming. Even the assignment &
transportation problems can be viewed essentially as a linear programming problem
though of a special type requiring procedures devised specially for them. On the other
hand, if the decision -parameters could be restricted to either integer value or zero-one
values in a linear programming problem, these are known as integer programming and
zero one programming respectively. Linear goal programming models are concerned with
those types of special problems which have conflicting, multiples objective functions in
relation to their linear constraints.
2. Simulation Models
It is very much similar to the managements trial and error approach in decision-making.
To simulate is to duplicate the features of the problem in a working model, which is then
solved using well known OR techniques. The results hence obtained are tested for
sensitivity analysis, after which these are applied to the original problem. By simulating
the characteristics and features of the organisational problems not a model, the various
decisions can be evaluated and-the risks inherent in actually implementing them is
drastically cut-down or eliminated. Simulation models are normally used for those kinds
of problems or situations which cannot be studied or understood by any other technique.
3. Inventory Models
The inventory models are primarily concerned with the optimal stock or inventory
policies of the organisation. Inventory problems deal with the determination of optimum
levels of different inventory items and ordering policies, optimizing a pre-specified
standard of effectiveness. It is concerned with the factors such as:
1. demand per unit time,
2. cost of placing orders,
3. costs incurred while keeping the goods in inventory,
4. stock-out costs and
5. costs of lost sales etc.
If a customer demands a certain quantity of a product, which is not available, then it
results in a lost sale. On the other hand, excess inventories mean blocked working capital
which is the life blood of modern business. Similarly, in the case of raw materials,
shol1age of even a very small item may cause bottlenecks in the production and the entire
assemb1y line may came to a halt. Inventory models are also useful in dealing with
quantity discounts and multiple products. These models can be of two types
Page 138
AMF104
1. deterministic and
2. probabilistic and are used in calculating various important decision variables such as :1. re-order quantity,
2. lead-time,
3. economic order quantity and
4. the pessimistic,. optimistic & the most likely level of stock keeping.
4. Network Models
Networking models are extensively used in planning, scheduling and controlling complex
projects which can be represented in the form of a net-work of various activities & subactivities.
Two of the most important and commonly used networking models are
1. Critical Path Method (CPM) and
2. Programme Evaluation & Review Technique (PERT).
PERT is the better known and more extensively applied of the two and it involves,
finding the time requirements of a given project, & the allocation of scarce resources to
complete the project as scheduled i.e.; within the planned stipulated time and with
minimum cost.
5. Sequencing Models
Sequencing models deal with the selection of the most appropriate or the optimal
sequence in which a series of jobs can be performed on different machines so as to
maximize the operational efficiency of the system. e.g. consider a job shop, where Jobs
are required to be processed on Y machines. Different jobs require different amounts of
time on different machines and each job must be processed on all the machines. In what
order should the jobs be processed so as to minimize the total processing time of all the
jobs. There are several variations of the same problem which can be evaluated by
sequencing models - with the different kinds of optimization criterion. Hence, sequencing
is primarily concerned with those problems in which the efficiency of operations depends
solely upon the sequence of performing a series of jobs.
6. Competitive Problems Models
The competitive problems deal with making decisions under conflict caused by opposing
interests or under competition.
Many problems related to business such as bidding for the same contract, competitions
for the market share; negotiating with labour unions and other associations etc. involve
intense competition. Games theory is the OR technique which is used in such situations,
where only one of the two or more players can win.
However, the competitive model has yet to find the widespread industrial and business
acceptability. Its biggest raw back is that it is too idealistic in outlook and fails to take
into consideration the actual reality and other related factors, within which an
organisation has to operate.
7. Queuing or Waiting Line Models
Any problem that involves waiting before the required service could be provided is
termed as a queuing or waiting line problem. These models seek to ascertain the various
important characteristics of queuing systems such as:1. average time spent in line by a customer,
2. average length of queue etc.
Page 139
AMF104
The waiting line models find very wide applicability across virtually every organisation
and in our daily life. Examples of queuing or waiting-line models are:1. waiting for service in a bank,
2. waiting list in schools,
3. waiting for purchases etc.
These models aim at minimizing the cost of providing service. Most of the realistic
waiting line problems are extremely complex and often simulation is used to analyze
such situations.
8. Replacement Models
These models are concerned with determining the optimal time required to replace
equipment or machinery that deteriorates or fails. Hence it seeks to formulate the optimal
replacement policy of an organization.
For example, when to replace the old machine with a newer one in the factory or at what
interval should an old car is replaced with a newer one? In all such cases there exists an
economic tradeoff between the increasing and the decreasing cost functions.
9. Routing and Trans-Shipment Models
These category of problems involve finding the optimal route from the starting or
invitation point (i.e., the origin) to the final or termination point (i.e., the destination),
where a finite number of possible routes are available.
For example:a. traveling salesman problems,
b. finding the shortest path and
c. transport dispatching problems,
could be solved by the routing and transshipment models..
10. Search Models
The main objectives of search models is:a. to search,
b. ascertain and
c. retrieve the relevant information required by a decision maker.
For example:a. auditing text-books for errors,
b. storage & retrieval of data in computers and
c. exploration for the natural resources are the problems falling within the domain of the
search models.
11. Markonian Models
These are the advanced operational research models and are applicable in highly
specialized cases or situation. e.g.:a. where the states of a system is capable of being accurately defined by precise
numerical value or
b. where the system is in a state of flux i.e.; it moves from one state to another which can
be computed using the theorems of probability. An example of the application or
markonian model is the brand-switching problems considered under the marketing
research.
The various steps involved in the operations research methodology can be depicted by the
help of flow-diagram as under:
Step I
Page 140
AMF104
Minute examination of the organisational environment & the problem area,

Step II
Acknowledge, analyse & rationally formulate the problem Construct a model based on
the above
Step III
Solve the model on the basis of the relevant data
Step IV
Testing the accuracy; authenticity & the reasonableness of the solution
Step VI
Implementation stage/Establishing control mechanisms then analyse the Main Reasons
For Wide-spread Applicability of OR
Reasons for Wide-spread Applicability of OR

The application of various tools and techniques of operations research has increased
tremef.1dously during the, past few decades. At present it is being extensively used in
almost all kinds of business and industrial situations.
The main factors responsible for the growth in the use of OR tools are:
1. Research and studies conducted during the 2nd World War
The classical problem solving mathematical techniques such as:a. calculus and
b. statistics,
have been applied to business situations for long, but their applicability was very limited.
These mathematical concepts were useful in solving the conventional problems of
mercantile & trade related to profits and productivity but could not be applied in cases of
problems involving
i. Scarcities of resources on one hand and ii. The profit maximization or
iii. The cost minimization objectives on the other.
2. Development in the field of Information Technology
Another factor which has significantly contributed to the growth of operations research
during the last 4-5 decades is the developments in the field of information technology
especially computers capable of performing millions of calculations in a very short period
of time. Both analog & digital computers are very efficient computing machines and thus
they provide a cost effective way of solving very large and highly complicated problems,
such as those found in business and trade.
Most of the mathematical and other OR techniques can be used with a c6mputer, thereby
compressing the time and hence making the task of decision making relatively easier. An
important technique extensively employed in most of the operations research situations
called simulation has been developed due to the availability of high speed computing
machines such as computers. Simulation is an extremely vital technique which assists the
managers in answering the important what-if questions.
Another factor which has favorably affected the sphere of OR is:i. the development of mini-computers,
ii. personal computers and
Page 141
AMF104
iii. remote access terminals etc., due to which the computers are now less expensive and
well within the reach of even a small organization.
As a result, the Central Planning Cell which is concerned with the strategic decision
making, can make effective use of the tools and techniques of operations research. This,
too, has contributed towards the growth and the development of OR.
3. Complexity of modern business.
The modern business has become highly competitive and extremely complex. In fact,
with each passing day, the degree of complexity is increasing many times. As technology
develops at an astronomical pace, the rate of redundancy and obsoleteness rises and the
cost of doing business increases. The present day consumer is more aware of his rights
and demands a high quality products and services at very reasonable prices which every
business is not in a position to satisfy.
Decision-making in such times is an extremely risky and complex proposition with
virtually no margin for error. A single faulty decision could spell disaster for the
organisation, which might take years to recover and retain its market position. Hence,
because costs and risks are rising, managers are increasingly turning to operations
research to help them make more effective decisions.
4. Resistance to change.
The fourth and probably one of the most important inherent factors is the human
resistance to change. Many Otherwise excellent OR techniques were avoided by
managers in the past. This was on account of two factors viz; firstly, these techniques
lacked wide-spread usage and could at best be applied to very limited situations and
secondly, due to a general uncertainty or apprehension about making use of a concept
whose usefulness and utility were not yet established. This may be summed up as the
human resistance to change or the fear of the unknown.
Limitations of OR
Despite being the concept which has been greatly responsible for the way in which
decisions are made and businesses are conducted world-wide, operations research is not
without its share of pit -falls or defects.
The main limitations of OR are as under
1. Extremely Lengthy Computations
The Central theme in all OR problems is to find the best possible optimum solution by
taking into consideration all the relevant factors. Since the modern business is extremely
complex, identifying all these factors and then expressing them into quantitative terms
requires voluminous, calculations and greater efforts on the part of the management.
Further, some of the factors originally considered may change due to the dynamic nature
of the business, which may require modification or deletion. Newer factors, as and when
they emerge may be required to be included in the problem model so as to truly reflect
the real or the original problem. Moreover, any technique used in OR is based on certain
assumptions or postulates which are assumed to hold well only in a fixed finite range of
values. Some of these assumptions may not either hold good in the actual and practical
situations or it may change over a period of time.
2. Difficulties in Quantification
Page 142
AMF104
The formulation of an operations research problem requires proper, finite quantification

of all the related constituents. It means that each and every aspect of the problem under
consideration which has a bearing on it must be capable of being expressed in precise
terms.
For example, i. quantity,
ii. price,
iii. capacity utilization,
iv. installed capacity etc.
Tangible factors such as above pose no difficulty since intrinsically they are quantifiable.
But many intangible and extremely important factors such as quality of employees,
workers motivation and morale, public relations etc. cannot be properly expressed.
Hence, they have to be excluded from the purview of the OR- although they may be more
important than the tangible factors.
3. Lack of Trained Personnel
Operations research is the job of a specialist. It requires individuals having a
comprehensive broad-based knowledge of the various sub-systems of the organisation as
well as the recent OR techniques which could be applied in a given situation.
Importance of OR in Managerial Decision Making
Decision making is the top most responsibility of a modern day, manager both during the
short-run as well as the long run planning period, Decisions are required to be taken for
the smooth functioning of the organisation so that it may achieve its objectives and
mission without any major problems. These decisions, which relate to the day-to-day
operations, are known as the tactical decisions. Yet another kind of decisions which
influence the organisations working and results on a long-term basis are the strategic
decisions. The latter are of a far greater importance than the tactical decisions because
they ensure that the organisation is well equipped to capitalize on its strength and grasp
the available opportunities for its growth. The strategic decisions also help in identifying
the weakness and suggesting & providing for corrective measures to protect it from
various kinds of external threats.
The basic problem is two-fold:i. firstly, the resources enjoyed by an organisation are not infinite. Had these resources
been unlimited, there would not have been any difficulties in achieving the organizational
goals and there would not have been any scope for employing the quantitative
methodology or operations research tools to the management areas.
The limitation or scarcity of resources calls for an optimal or most efficient utilization.
This implies that almost all the resources have an opportunity cost and some of the
objectives have to be sacrificed to achieve the others.
ii. Secondly, the organisation is dynamic in nature and not in a static state. It is
perpetually changing and always in a state of continuous evolution. Thus, even if a
decision is valid & effective at the point of time when it is made, it may not hold good in
the changing environment, since some of the assumptions on which it is based may
change. But perhaps more disturbing than the change is the rate of change or the rapid
pace at which newer and more efficient technology is being continuously developed on a
global basis.
Page 143
AMF104
This has resulted in the increased cost of doing business. Due to the saturation of most of
the economies and the related markets, organisations must continuously seek new
markets or develop newer products. This has resulted in a breakdown of the geographical
boundaries and multinational corporations are the order of the day.
The most important aspect of this development has been a perceptible increase in
competition at virtually all levels. Only the best products and the organisations can
survive in such a business climate. Perform or perish is the new mantra. There is
practically no place to hide.
Decision-making seen in this context appears to be an extremely complicated and risky
exercise. It is at this state that OR enters into the picture & comes to the rescue of the line
managers. OR is applied decision theory which uses any mathematical, scientific or
logical method to cope up with the organisational problems.
It seeks to ensure rationality in the process of decision making. It thus provides the top
management with a quantitative basis for efficient decision making and increases the
firms capability to make long-term plans and also to solve the everyday problems with
greater efficiency and control. The main advantages of operations research as applied to
the executive decision making are as follows :1. Better Planning
Realistic and practical short-term and long-term plans can be made with the help of OR
techniques by taking into account the organ insertions internal as well as the external
limiting factors or the constraints.
2. Flexibility in Operations
With the help of environmental scanning and analysis an organisation can ascertain with
a reasonable degree of accuracy, when some or all of its decision variables are likely to
be affected. The various techniques can then be suitably modified to preserve the
practicability of the solution.
3. Better Co-ordination
Co-ordinating all the activities and maintaining the desired balance between them is one
of the top most responsibilities of a sound management. Operations research is of
immense importance in maintaining proper co-ordination and control between the various
sub-systems.
4. Better Decisions
Since operations research is a quantitative tool, the quality of decision improves
tremendously by its application to a particular problem or situation. There is no need to
take decisions based only on intuition or gut-feeling. Some of the situations
encountered by even a medium-size organisation can be so complex that the human mind
can never hope to assimilate all the important factors without the help of operations
research and its related tools. However, it should not be concluded that there is no scope
for intuition, intelligence or experience in decision making. OR simply cuts down the
chances of failure and reduces the risks related to decision making.
5. Better Systems
Since an incorrect decision can prove to be catastrophic for the organisation, it is very
important that the best information and decision-making tools become available to the
management. OR tools provide an overall view of the problem and present the most
optimal solutions under a given situation. It hence improves the profitability and
efficiency of the systems and results in a better overall system.
Page 144
AMF104
Introduction to linear programming

Linear Programming is that branch of mathematical programming which is designed to
solve optimization problems where all the constraints as well as the objectives are
expressed as linear function. Linear Programming is a technique for making decisions
under certainty i.e.; when all the courses of options available to an organization are
known & the objective of the firm along with its constraints are quantified. That course of
action is chosen out of all possible alternatives, which yields the optimal results. Linear
Programming can also be used as a verification and checking mechanism to ascertain the
accuracy and the reliability of the decisions which are taken solely on the basis of
managers experience-without the aid of a mathematical model. Some of the definitions
of Linear Programming are as follows:
Linear Programming is a method of planning and operation involved in the construction
of a model of a real-life situation having the following elements: a) Variables which denote the available choices and
b) the related mathematical expressions, which relate the variables to the controlling
conditions, reflect clearly the criteria to be employed for measuring the benefits flowing
out of each course of action and providing an accurate measurement of the organizations
objective. The method maybe so devised as to ensure the selection of the best alternative
out of a large number of alternative available to the organizationKohler
Linear Programming is the analysis of problems in which a linear function of a number
of variables is to be optimized (maximized or minimized) when whose variables are
subject to a number of constraints in the mathematical near inequalities.
From the above definitions, it is clear that:
I.Linear Programming is an optimization technique, where the underlying objective is
either to maximize the profits or to minimize the Cost function.
II. It deals with the problem of allocation of finite limited resources amongst different
competiting activities in the most optimal manner.
III. It generates solutions based on the feature and characteristics of the actual problem or
situation. Hence the scope of linear programming is very wide as it finds application in
such diverse fields as marketing, production, finance & personnel etc.
IV. Linear Programming has be-en highly successful in solving the following types of
problems:
Product-mix problems
Investment planning problems
Blending strategy formulations and
Marketing & Distribution management.
V. Even though Linear Programming has wide & diverse applications, yet all LP
problems have the following properties in common:
The objective is always the same (i.e. profit rnaximisation or cost minimization).
Presence of constraints, which limit the extent to which the objective can be pursued/
achieved.
Availability of alternatives i.e.; different courses of action to choose from, and
The objectives and constraints can be expressed in the form of linear relation.
Regardless of the size or complexity, all LP problems take the same form i.e. allocating
scarce resources among various compete ting alternatives. Irrespective of the manner in
Page 145
AMF104
which one defines Linear Programming, a problem must have certain basic characteristics
before this technique can be utilized to find the optimal values.
The characteristics or the basic assumptions of linear programming are as follows:
1. Decision or Activity Variables & Their Inter Relationship.
The decision or activity variables refer to any activity, which are in competition with
other variables for limited resources. Examples of such activity variables are: services,
projects, products etc. These variables are most often inter-related in terms of utilization
of the scarce resources and need simultaneous solutions. It is important to ensure that the
relationship between these variables be linear.
2. Finite Objective Functions.
A Linear Programming problem requires a clearly defined, unambiguous objective
function, which is to be optimized. It should be capable of being expressed as a liner
function of the decision variables. The single-objective optimization is one of the most
important pre-requisites of linear programming. Examples of such objectives can be:
cost-minimization, sales, profits or revenue maximization & the idle-time minimization
etc.
3. Limited Factors/Constraints.
These are the different kinds of limitations on the available resources e.g. important
resources like availability of machines, number of man hours available, production
capacity and number of available markets or consumers for finished goods are often
limited even for a big organization. Hence, it is rightly said that each and every
organization function within overall constraints both internal and external. These limiting
factors must be capable of being expressed as linear equations or in equations in terms of
decision variables.
4. Presence of Different Alternatives.
Different courses of action or alternatives should be available to a decision maker, who is
required to make the decision, which is the most effective or the optimal. For example,
many grades of raw material may be available, the same raw material can be purchased
from different supplier, the finished goods can be sold to various markets, production can
be done with the help of different machines.
5. Non-Negative Restrictions.
Since the negative value of (any) physical quantity has no meaning, therefore all the
variables must assume non-negative values. If some of the variables are unrestricted in
sign, the help of certain mathematical tools can enforce the non- negativity restriction
without altering the original information contained in the problem.
6. Linearity Criterion.
The relationship among the various decision variables must be directly proportional i.e.;
both the objective and the constraint must be expressed in terms of linear equations or
inequalities.
For example if one of the factor inputs (resources like material, labour, plant capacity
etc.) increases, then it should result in a proportionate manner in the final output. These
linear equations and inequations can graphically be presented as a straight line.
7. Additively.
It is assumed that the total profitability and the total amount of each resource utilized
would be exactly equal to the sum of the respective individual amounts. Thus the function
Page 146
AMF104
or the activities must be additive - and the interaction among the activities of the
resources does not exist.
8. Mutually Exclusive Criterion.
All decision parameters and the variables are assumed to be mutually exclusive In other
words, the occurrence of anyone variable rules out the simultaneous occurrence of other
such variables.
9. Divisibility.
Variables may be assigned fractional values. i.e.; they need not necessarily always be in
whole numbers. If a fraction of a product cannot be produced, an integer-programming
problem exists. Thus, the continuous values of the decision variables and resources must
be permissible in obtaining an optimal solution.
10. Certainty
It is assumed that conditions of certainty exist i.e.; all the relevant parameters or
coefficients in the Linear Programming model are ful1y and completely known and that
they dont change during the period. However, such an assumption may not hold good at
all times.
11. Finiteness.
Linear Programming assumes the presence of a finite number of activities and constraints
without which it is not possible to obtain the best or the optimal solution. What are the
advantages and limitation of Now it is time to examine the advantages as well as the
limitations of Linear Programming.
Advantages of Linear Programming approach
1. Scientific Approach to Problem Solving.
Linear Programming is the application of scientific approach to problem solving. Hence it
results in a better and true picture of the problems-which can then be minutely analysed
and solutions ascertained.
2. Evaluation of All Possible Alternatives.
Most of the problems faced by the present organisations are highly complicated - which
cannot be solved by the traditional approach to decision making. The technique of Linear
Programming ensures thatll possible solutions are generated - out of which the optimal
solution can be selected.
3. Helps in Re-Evaluation.
Linear Programming can also be used in. re-evaluation of a basic plan for changing
conditions. Should the conditions change while the plan is carried out only partially,
these conditions can be accurately determined with the help of Linear Programming so as
to adjust the remainder of the plan for best results.
4. Quality of Decision.
Linear Programming provides practical and better quality of decisions that reflect very
precisely the limitations of the system i.e.; the various restrictions under which the system
must operate for the solution to be optimal. If it becomes necessary to deviate from the
optimal path, Linear Programming can quite easily evaluate the associated costs or
penalty.
5. Focus on Grey-Areas.
Highlighting of grey areas or bottlenecks in the production process is the most significant
merit of Linear Programming.
Page 147
AMF104
During the periods of bottlenecks, imbalances occur in the production department. Some
of the machines remain idle for long periods of time, while the other machines are unable
toffee the demand even at the peak performance level.
6. Flexibility.
Linear Programming is an adaptive & flexible mathematical technique and hence can be
utilized in analyzing a variety of multi-dimensional problems quite successfully.
7. Creation of Information Base.
By evaluating the various possible alternatives in the light of the prevailing constraints,
Linear Programming models provide an important database from which the allocation of
precious resources can be don rationally and judiciously.
8. Maximum optimal Utilization of Factors of Production.
Linear Programming helps in optimal utilization of various existing factors of production
such as installed capacity, labour and raw materials etc.
Limitations of Linear Programming
Although Linear Programming is a highly successful fool having wide applications in
business and trade for solving optimization problems, yet it has certain demerits or
defects. Some of the important-limitations in the application of Linear Programming are
as follows:
I. Linear Relationship.
Linear Programming models can be successfully applied only in those situations where a
given problem can clearly be represented in the form of linear relationship between
different decision variables. Hence it is based on the implicit assumption that the
objective as well as all the constraints or the limiting factors can be stated in term of
linear expressions - which may not always hold well in real life situations. In practical
business problems, many objective function & constraints cannot be expressed linearly.
Most of the business problems can be expressed quite easily in the form of a quadratic
equation (having a power 2) rather than in the terms of linear equation. Linear
Programming fails to operate and provide optimal solutions in all such cases.
e.g. A problem capable of being expressed in the form of: ax2 +bx+c = 0 where a # 0
cannot be solved with the help of Linear Programming techniques.
1. Constant Value of objective & Constraint Equations.
Before a Linear Programming technique could be applied to a given situation, the values
or the coefficients of the objective function as well as the constraint equations must be
completely known. Further, Linear Programming assumes these values to be constant
over a period of time. In other words, if the values were to change during the period of
study, the technique of LP would lose its effectiveness and may fail to provide optimal
solutions to the problem.
However, in real life practical situations often it is not possible to determine the
coefficients of objective function and the constraints equations with absolute certainty.
These variables in fact may, lie on a probability distribution curve and hence at best, only
the Iikelil1ood of their occurrence can be predicted.
2. No Scope for Fractional Value Solutions.
There is absolutely no certainty that the solution to a LP problem can always be
quantified as an integer quite often, Linear Programming may give fractional-varied
Page 148
AMF104
answers, which are rounded off to the next integer. Hence, the solution would not be the
optimal one. For example, in finding out the number of men and machines required to
perform a particular job, a fractional solution would be meaningless.
3. High Degree of Complexity.
Many large-scale real life practical problems cannot be solved by employing Linear
Programming techniques even with the help of a computer due to highly complex and
Lengthy calculations. Assumptions and approximations are required to be made so that
the, given problem can be broken down into several smaller problems and, then solve
separately. Hence, the validity of the final result, in all such cases, may be doubtful:
4. Multiplicity of Goals.
The long-term objectives of an organisation are not confined to a single goal. An
organisation ,at any point of time in its operations has a multiplicity of goals or the goals
hierarchy all of which must be attained on a priority wise basis for its long term growth.
Some of the common goals can be Profit maximization or cost minimization, retaining
market share, maintaining leadership position and providing quality service to the
consumers. In cases where the management has conflicting, multiple goals, Linear
Programming model fails to provide an optimal solution. The reason being that under
Linear Programming techniques, there is only one goal which can be expressed in the
objective function. Hence in such circumstances, the situation or the given problem has to
be solved by the help of a different mathematical programming technique called the
Goal Programming.
5. Flexibility.
Once a problem has been properly quantified in terms of objective function and the
constraint equations and the tools of Linear Programming are applied to it, it becomes
very difficult to incorporate any changes in the system arising on account of any change
in the decision parameter. Hence, it lacks the desired operational flexibility.
The basic model of Linear Programming:
Linear Programming is a mathematical technique for generating & selecting the optimal
or the best solution for a given objective function. Technically, Linear Programming may
be formally defined as a method of optimizing (i.e.; maximizing or minimizing) a linear
function for a number of constraints stated in the form of linear in equations.
Mathematically the problem of Linear Programming may be stated as that of the
optimization of linear objective function of the following form:
Z = C1x1 + C2x2 ++ Cixi+.. Cnxn
Subject to the Linear constrains of the form:
a11x1 + a12x2 + a13x3++a1ixi++ainxn >= or <= b1
ajix1 + a22x2 + a23x3 +.+a2ixi + ....+a2nxn >= or <= b2
These are called the non-negative constraints. From the above, it is linear that a LP
problem has:
(I) Linear objective function which is to be maximized or minimized.
(ii) Various linear constraints which are simply the algebraic statement of the limits of the
resources or inputs at the disposal.
(iii) Non-negatively constraints.
Page 149
AMF104
Linear Programming is one of the few mathematical tools that can be used to provide
solution to a wide variety of large, complex managerial problems.
Graphical method of solution in linear programming
Once the Linear programming model has been formulated on the basis of the given
objective & the associated constraint functions, the next step is to solve the problem &
obtain the best possible or the Optimal solution various mathematical & analytical
techniques can be employed for solving the Linear programming model.
The graphic solution procedure is one of the method of solving two variable Linear
programming problems. It consists of the following steps:
Step I : Defining the problem. Formulate the problem mathematically. Express it in terms
of several mathematical constraints & an objective function. The objective function
relates to the optimization aspect is, maximisation or minimisation Criterion.
Step II: Plot the constraints Graphically. Each inequality in the constraint equation has to
be treated as an equation. An arbitrary value is assigned to one variable & the value of the
other variable is obtained by solving the equation. In the similar manner, a different
arbitrary value is again assigned to the variable & the corresponding value of other
variable is easily obtained.
These 2 sets of values are now plotted on a graph and connected by a straight line. The
same procedure has to be repeated for all the constraints. Hence, the total straight lines
would be equal to the total no of equations, each straight line representing one constraint
equation.
Step III: Locate the solution space. Solution space or the feasible region is the graphical
area which satisfies all the constraints at the same time. Such a solution point (x, y)
always occurs at the comer. points of the feasible. The feasible region is determined as
follows:
(a) For greater than & greater than or equal to constraints i.e.; the feasible region or the
solution space is the area that lies above the constraint lines.
(b) For Less Then & Less than or equal to constraint ie.
The feasible region or the solution space is the area that lies below the constraint lines.
Step IV. Selecting the graphic technique. Select the appropriate graphic technique to be
used for generating the solution. Two techniques viz; Corner Point Method and Iso-profit
(or Isocost) method may be used we give below both there techniques, however, it is
easier to generate solution by using the corner point method.
(a) Corner Point Method
(i) Since the solution point (x. y) always occurs at the corner point of the feasible or
solution space, identify each of the extreme points or corner points of the feasible region
by the method of simultaneous equations.
(ii) By putting the value of the corner points co-ordinates into the objective function,
calculate the profit (or the cost) at each of the corner points.
(iii) In a maximisation problem, the optimal solution occurs at that corner point which
gives the highest profit.
(iv) In a minimisation problem, the optimal solution occurs at that corner point which
gives the lowest profit.
(b) Iso-Profit (or Iso-Cost) method. The term Iso-profit sign if is that any combination of
points produces the same profit as any other combination on the same line. The various
steps involved in this method are given below.
Page 150
AMF104
(i) Selecting a specific figure of profit or cost, an iso-profit or iso-cost line is drawn up so
that it lies within the shaded area.
(ii) This line is moved parallel to itself and farther or closer with respect to the origin till
that point after which any further movement would lead to this line falling totally out of
the feasible region.
(iii) The optimal solution lies at the point on the feasible region which is touched by the
highest possible isoprofit or the lowest possible iso-cost line.
(iv)The co-ordinates of the optimal point (x. Y) are calculated with the help of
simultaneous equations and the optimal profit or cost is as curtained.
Example: X Ltd wishes to purchase a maximum of 3600 units of a product two types of
product a. & are available in the market Product a occupies a space of 3 cubic Jeet & cost
Rs. 9 whereas occupies a space of 1 cubic feet & cost Rs. 13 per unit. The budgetary
constraints of the company do not allow to spend more than Rs. 39,000. The total
availability
of space in the companys godown is 6000 cubic feet. Profit margin of both the product a
& is Rs. 3 & Rs. 4 respectively.
Formulate as a linear programming model and solve using graphical method. You are
required to ascertain the best possible combination of purchase of a & so that the total
profits are maximized.
Solution: Let x1 = no of units of product &
x2 = no of units of product b
Then the problem can be formulated as a P model as follows:
Objective function,
Maximise Z =
x
x2
Constraint equations,
x1 + x2<= 3600 (Maximum Units Constraints)
3x1 + x2 <= 6000 (Storage area constraints)
9x1 + 13 x2<= 39000 (Budgetary constraints)
x1 + x2 <= 0
Step I. Treating all the constraints as equality, the first constraint is
x1+
x2=3600
Step II. Determine the set of the points which satisfy the constraint:
x1 + x2 = 3600
This can easily be done by verifying whether the origin (0,0) satisfies the constraint.
Here,
Step III. The 2nd constraint is: 3 x1+
x2<=6000
Page 151
AMF104
Now draw its graph.

Step IV. Like its in the above step II, determine the set of points which satisfy the
constraint.
Hence, all the points below the line will satisfy the constraint.
Step V. The 3rd constraint is:
Put & the point is (0, 3000)
Put
Now draw its graph.
Step VI. Again the point (0,0) ie; the origin satisfies the constraint . Hence, all the points
below the line will satisfy the constraint.
Step VII. The intersection of the above graphic denotes the feasible region for the given
problem.
Step VIII. Finding Optimal Solution. Always keep in mind two things:
(i) For constraint the feasible region will be the area which lie above the constraint lines
and for constraints, it will lie below the constraint lines.
This would be useful in identifying the feasible region.
(ii) According to a theorem on linear programming, an optimal solution to a problem (if it
exists) is found at a corner point of the solution space.
Step IX. At corner points (O, A, B, C), find the profit value from the objective function.
That point which maximize the profit is the optimal point.
Step IX. At corner points (O, A, B, C), find the profit value from the objective function.
The point which maximizes the profit is the optimal point.
For point B, solve the equation 9 x1 + 12 x2 =39000

And 3x1 + 6x2 =6000 to find point B
( A+ B, these two lines are intersecting )
ie, 3x1 + x2 =6000 (1)
9 x1 + x2 =39000 (2)
Page 152
AMF104
On solving we get x1 = 13000 and x2= 21000

At point (1300, 2100)
Z = 3x1 + 4x2
= 3*1300 + 4*2100
= 12,300 which is the maximum value.
Result. The optimal solution is:
No of units of product a = 1300
No of units of product b = 2100
Total profit, = 12300 which is the maximum
Page 153
AMF104
End Chapter-6 Quizzes

Answer the following questions:
2. Operation Research is applicable to which area:
(a). Management science
(b). Decision science
(c). Quantitative methods
(d). All of these
3. PERT can be used in following kind of problems:
(a). Ship-building projects
(b). Transportation projects
(c). Military projects
(d). All of these
4. OR emphasize on:
(a). Optimal solution
(b). Better solution
(c). None of these
(d). Both (a) and (b)
5. Scope of OR is in:
(a). Judgment phase
(b) Research phase
(c). Action phase
(d). All of these
6. Techniques of OR is/are:
(a). LPP
(b). Transportation
(c). All of these
(d). None of these
7. Concern/s of inventory method is/are:
(a). demand per unit time
(b). cost of placing order
(c). Stock out cost
(d). All of these
8. Which of the following is/are types of network model:
(a). CPM
(b). PERT
(c). Both of these
(d). None of these
Page 154
AMF104
9. Queuing theory is applicable in which of the following situations:

(a). Service in bank
(b). List of school
(c). Waiting in a shopping mall
(d). All of these
10. Routing model is used in
(a). finding the shortest path
(b). transport dispatching problem
(c). All of these
(d). None of these
11. Limitations of OR:
(a) Lengthy computation
(b) Difficulty in quantification
(c) Both of these
(d) None of these
Page 155
AMF104
Key to end chapter quizzes

Chapter 1: 1.(a) 2. (d) 3. (a) 4. (a) 5. (a) 6. (a) 7. (d) 8. (d) 9. 10. (d)
Chapter 2: 1.(d) 2. (a) 3. (d) 4. (a) 5. (d) 6. (a) 7. (a) 8. (c) 9.(b) 10. (b)
Chapter 3: 1.(a) 2. (a) 3. (c) 4. (c) 5. (b) 6. (b) 7. (b) 8. (d) 9.(b) 10. (c)
Chapter 4: 1.(b) 2. (d) 3. (d) 4. (b) 5. (d) 6. (d) 7. (d) 8. (b) 9.(d) 10. (a)
Chapter 5: 1.(a) 2. (a) 3. (b) 4. (c) 5. (d) 6. (c) 7. (a) 8. (d) 9.(d) 10. (d)
Chapter 6: 1.(d) 2. (d) 3. (a) 4. (d) 5. (c) 6. (d) 7. (c) 8. (d) 9.(c) 10. (c)
Page 156
AMF104
Bibliography
1. Richard I. Levin and David S. Rubin, Statistics for Management, Prentice Hall of India
2. N D Vohra, Quantitative Techniques in Management, Tata McGraw-Hill Publishing
Ltd
3. Gupta, S C and V K Kapoor, Fundamentals of Applied Statistics, S Chand and Sons
4. Gupta S P & Gupta M P, Business Statistics, Sultan Chand & Sons
5. Sharma J K, Operation Research: Theory & Application, Mac Millan Indian Ltd
Page 157
AMF104
Page 158
AMF104
Page 159
AMF104
Page 160
AMF104
Page 161
AMF104
Page 162
AMF104
Page 163
AMF104
Page 164

AMF 104 Quantitative Analysis in Management Book PDF

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

AMF 104 Quantitative Analysis in Management Book PDF

Caricato da

Copyright:

Formati disponibili

AMF104

QUANTITATIVE ANALYSIS IN MANAGEMENT

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

This problem is then formulated in managerial terms and framed as a managerial

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

QUANTITATIVE ANALYSIS IN MANAGEMENT

Was the collecting agency unbiased or not?

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

accurate and reliable.

Demerits of Census Method

QUANTITATIVE ANALYSIS IN MANAGEMENT

QUANTITATIVE ANALYSIS IN MANAGEMENT

1.2 Introduction to SPSS, SAS and other Statistical Software Packages

Descriptive statistics: Cross

Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial,

Prediction for numerical outcomes: Linear regression

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

data entry, retrieval, management, and mining

report writing and graphics

business planning, forecasting, and decision support

operations research and project management

data warehousing (extract, transform, load)

platform independent and remote computing

1.3 Diagrammatic & Graphical Presentation of Data: Bar Diagram,

QUANTITATIVE ANALYSIS IN MANAGEMENT

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Simple Bar Diagrams

Multiple Bar Diagram:

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Subdivided or Component Bar Diagram

QUANTITATIVE ANALYSIS IN MANAGEMENT

Percentage bar diagram:

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Deviation Bar Diagrams

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Duo-directional Bar Diagram

Sliding Bar Diagram

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Figure displays these data in the form of a pyramid diagram.

QUANTITATIVE ANALYSIS IN MANAGEMENT

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Percentage Rectangular Diagram

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Represent the above data by Square diagram.

Master of Finance & Control Copyright Amity university India

QUANTITATIVE ANALYSIS IN MANAGEMENT

Circular or Pie Diagrams