
Modern business management is more a science than an art. Ever-increasing global competition requires business
managers to address uncertainty with scientific methods and to be objective decision makers.
Forecasting, planning, organizing and decision making, some of the key activities of a manager, all aim at a better
future for the business. The only certainty about the future is its ‘uncertainty’. Even though one cannot eliminate
uncertainty, it is possible to measure it using Statistics: a manager can make informed decisions by using
statistical methods and statistical thinking. This calls for unlocking the power of Statistics for managers.
Broadly, knowledge of statistics helps a manager to describe the problem, identify and evaluate alternative courses
of action, estimate error, monitor processes and take appropriate corrective actions to achieve optimum results.
Applications of Statistics for Managers
Both descriptive and inferential statistical methods have an important place in business management. To cite a few of
the many applications across functions:
1. A Marketing manager needs to gather and analyze a large amount of data pertaining to market dynamics
and target customers. Ideally, marketing strategy depends upon the outcomes of market research, which
involves statistical methods for collecting and analyzing data, applying sampling techniques and evaluating
the effect of various marketing strategies.
2. A Production manager would ideally use Statistical Process Control techniques to improve productivity and
quality. Knowledge and application of Control Charts, sampling techniques and probability distributions help to achieve
better processes and products. This also leads to lower production costs and higher profits.
3. An HR manager would be interested in identifying the best approach to train employees and in evaluating the
impact of training. There is also a need to measure attrition and understand the underlying factors.
4. For a Finance manager, crunching financial data and using financial techniques is an integral part of the day-to-day
job. Knowledge of Statistics enhances the competency and proficiency of a manager as a researcher and
therefore provides an edge.
In an era where Total Quality Management (TQM), Lean organization and Six Sigma are some of the buzzwords, it is
essential for a manager to be conversant with Statistics.
Also, in today’s scenario, a graduate from a B-school is expected not just to manage the functional departments of a
corporate house but also to be an intrapreneur, approaching work with an entrepreneurial mindset.
Professionally trained and skilled graduates are encouraged to establish ventures, create jobs and create value. In
order to be entrepreneurial it is important to be able to ‘think outside the box’ and be ‘objective’. In other words, one
needs to nurture ‘Creative and Statistical thinking’ for success. Quantitative analysis provides an objective,
factual underpinning of situations and responses. Analysis, along with data, helps quantify the extent of problems
and solutions in ways that other information seldom can.
Be it simple tools like tables and graphs for presentation and measures of central tendency, dispersion and association
for analysis of data, or complex techniques like multivariate techniques, Big Data analysis and Structural Equation
Modeling, statistical techniques have made their way into corporate boardrooms. This indicates the indispensability
of Statistics for managerial decisions.

Definitions
Raw Data

Data collected in original form.

Frequency

The number of times a certain value or class of values occurs.

Frequency Distribution

The organization of raw data in table form with classes and frequencies.

Categorical Frequency Distribution

A frequency distribution in which the data is only nominal or ordinal.


Ungrouped Frequency Distribution

A frequency distribution of numerical data. The raw data is not grouped.

Grouped Frequency Distribution

A frequency distribution where several numbers are grouped into one class.

Class Limits

Separate one class in a grouped frequency distribution from another. The limits could actually appear in the
data and have gaps between the upper limit of one class and the lower limit of the next.

Class Boundaries

Separate one class in a grouped frequency distribution from another. The boundaries have one more
decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper
boundary of one class and the lower boundary of the next class. The lower class boundary is found by
subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to
the upper class limit.

Class Width

The difference between the upper and lower boundaries of any class. The class width is also the difference
between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not
the difference between the upper and lower limits of the same class.

Class Mark (Midpoint)

The number in the middle of the class. It is found by adding the upper and lower limits and dividing by
two. It can also be found by adding the upper and lower boundaries and dividing by two.

Cumulative Frequency

The number of values less than the upper class boundary for the current class. This is a running total of the
frequencies.

Relative Frequency

The frequency divided by the total frequency. This gives the proportion of values falling in that class (multiply by
100 to express it as a percent).

Cumulative Relative Frequency (Relative Cumulative Frequency)

The running total of the relative frequencies or the cumulative frequency divided by the total frequency.
Gives the proportion of the values which are less than the upper class boundary.

Histogram

A graph which displays the data by using vertical bars of various heights to represent frequencies. The
horizontal axis can be either the class boundaries, the class marks, or the class limits.

Frequency Polygon

A line graph. The frequency is placed along the vertical axis and the class midpoints are placed along the
horizontal axis. These points are connected with lines.

Ogive
A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is
the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The
graph always starts at zero at the lowest class boundary and will end up at the total frequency (for a
cumulative frequency) or 1.00 (for a relative cumulative frequency).

Pareto Chart

A bar graph for qualitative data with the bars arranged according to frequency.

Pie Chart

Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of
degrees in any slice is the relative frequency times 360 degrees.

Pictograph

A graph that uses pictures to represent data.

Stem and Leaf Plot

A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form
groups or classes. This is very useful for sorting data quickly.

Statistics: Grouped Frequency Distributions

Guidelines for classes

1. There should be between 5 and 20 classes.


2. The class width should be an odd number. This will guarantee that the class midpoints are integers instead
of decimals.

3. The classes must be mutually exclusive. This means that no data value can fall into two different classes.

4. The classes must be all inclusive or exhaustive. This means that all data values must be included.

5. The classes must be continuous. There are no gaps in a frequency distribution. Classes that have no values
in them must be included (unless it's the first or last class which are dropped).

6. The classes must be equal in width. The exception here is the first or last class. It is possible to have a
"below ..." or "... and above" class. This is often used with ages.

Creating a Grouped Frequency Distribution


1. Find the largest and smallest values
2. Compute the Range = Maximum - Minimum

3. Select the number of classes desired. This is usually between 5 and 20.

4. Find the class width by dividing the range by the number of classes and rounding up. There are two things
to be careful of here. You must round up, not off. Normally 3.2 would round to be 3, but in rounding up, it
becomes 4. If the range divided by the number of classes gives an integer value (no remainder), then you
can either add one to the number of classes or add one to the class width. Sometimes you're locked into a
certain number of classes because of the instructions. The Bluman text fails to mention the case when there
is no remainder.
5. Pick a suitable starting point less than or equal to the minimum value. You will be able to cover: "the class
width times the number of classes" values. You need to cover one more value than the range. Follow this
rule and you'll be okay: The starting point plus the number of classes times the class width must be greater
than the maximum value. Your starting point is the lower limit of the first class. Continue to add the class
width to this lower limit to get the rest of the lower limits.

6. To find the upper limit of the first class, subtract one from the lower limit of the second class. Then
continue to add the class width to this upper limit to find the rest of the upper limits.

7. Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5 units to the upper
limits. The boundaries are also half-way between the upper limit of one class and the lower limit of the next
class. Depending on what you're trying to accomplish, it may not be necessary to find the boundaries.

8. Tally the data.

9. Find the frequencies.

10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be necessary
to find the cumulative frequencies.

11. If necessary, find the relative frequencies and/or relative cumulative frequencies.

It is possible to have the TI-82 calculator find the frequencies for you. You will have to find the class width and class
boundaries first.
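
The steps above can be sketched in Python. This is only a minimal illustration, assuming integer data and a starting point equal to the minimum value; the function name grouped_frequency_distribution and the sample values are hypothetical and are not part of the original notes.

```python
def grouped_frequency_distribution(data, num_classes):
    """Build a grouped frequency table for integer data (illustrative sketch)."""
    smallest, largest = min(data), max(data)
    data_range = largest - smallest

    # Round up, not off. Integer division plus one also covers the case
    # where the range divides evenly (one is added to the class width).
    width = data_range // num_classes + 1

    total = len(data)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = smallest + i * width      # lower class limit
        upper = lower + width - 1         # upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        cumulative += frequency
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "frequency": frequency,
            "cumulative": cumulative,
            "relative": frequency / total,
        })
    return width, rows


# Hypothetical sample of 20 integer values (assumption, for illustration only).
sample = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31,
          33, 35, 36, 40, 41, 44, 45, 47, 50, 52]
width, table = grouped_frequency_distribution(sample, 5)
for row in table:
    print(row["limits"], row["frequency"], row["cumulative"], row["relative"])
```

With a range of 40 and 5 classes there is no remainder, so the sketch adds one to the class width rather than to the number of classes; either choice is allowed by step 4.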

Definitions

Statistic
Characteristic or measure obtained from a sample
Parameter
Characteristic or measure obtained from a population
Mean
Sum of all the values divided by the number of values. This can either be a population mean (denoted by
mu) or a sample mean (denoted by x bar)
Median
The midpoint of the data after being ranked (sorted in ascending order). There are as many numbers below
the median as above the median.
Mode
The most frequent number
Skewed Distribution
The majority of the values lie together on one side with a very few values (the tail) to the other side. In a
positively skewed distribution, the tail is to the right and the mean is larger than the median. In a negatively
skewed distribution, the tail is to the left and the mean is smaller than the median.
Symmetric Distribution
The data values are evenly distributed on both sides of the mean. In a symmetric distribution, the mean is
the median.
Weighted Mean
The mean when each value is multiplied by its weight and summed. This sum is divided by the total of the
weights.
Midrange
The mean of the highest and lowest values. (Max + Min) / 2
Range
The difference between the highest and lowest values. Max - Min
Population Variance
The average of the squares of the distances from the population mean. It is the sum of the squares of the
deviations from the mean divided by the population size. The units on the variance are the units of the
population squared.
Sample Variance
Unbiased estimator of a population variance. Instead of dividing by the population size, the sum of the
squares of the deviations from the sample mean is divided by one less than the sample size. The units on
the variance are the units of the population squared.
Standard Deviation
The square root of the variance. The population standard deviation is the square root of the population
variance and the sample standard deviation is the square root of the sample variance. The sample standard
deviation is not the unbiased estimator for the population standard deviation. The units on the standard
deviation are the same as the units of the population/sample.
Coefficient of Variation
Standard deviation divided by the mean, expressed as a percentage. We won't work with the Coefficient of
Variation in this course.
Chebyshev's Theorem

The proportion of the values that fall within k standard deviations of the mean is at least 1 - 1/k^2, where
k > 1. Chebyshev's theorem can be applied to any distribution regardless of its shape.
Empirical or Normal Rule
Only valid when a distribution is bell-shaped (normal). Approximately 68% of the values lie within 1 standard
deviation of the mean; 95% within 2 standard deviations; and 99.7% within 3 standard deviations of the
mean.
Standard Score or Z-Score
The value obtained by subtracting the mean and dividing by the standard deviation. When all values are
transformed to their standard scores, the new mean (for Z) will be zero and the standard deviation will be
one.
Percentile
The percent of the population which lies below that value. The data must be ranked to find percentiles.
Quartile
Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.
Decile
Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.
Lower Hinge
The median of the lower half of the numbers (up to and including the median). The lower hinge is the first
Quartile unless the remainder when dividing the sample size by four is 3.
Upper Hinge
The median of the upper half of the numbers (including the median). The upper hinge is the 3rd Quartile
unless the remainder when dividing the sample size by four is 3.
Box and Whiskers Plot (Box Plot)
A graphical representation of the minimum value, lower hinge, median, upper hinge, and maximum. Some
textbooks, and the TI-82 calculator, define the five values as the minimum, first Quartile, median, third
Quartile, and maximum.
Five Number Summary
Minimum value, lower hinge, median, upper hinge, and maximum.
InterQuartile Range (IQR)
The difference between the 3rd and 1st Quartiles.
Outlier
An extremely high or low value when compared to the rest of the values.
Mild Outliers
Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd
Quartile. Note, some texts use hinges instead of Quartiles.
Extreme Outliers
Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or above the 3rd
Quartile. Note, some texts use hinges instead of Quartiles.
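
The mild and extreme outlier definitions above can be sketched as a small Python function. It is a hedged illustration only: the quartiles are passed in by the caller (because texts differ on quartiles versus hinges), the function name classify_outliers and the sample values are hypothetical, and treating a value exactly 3.0 IQRs away as mild is an assumption.

```python
def classify_outliers(data, q1, q3):
    """Split values into mild and extreme outliers using IQR fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 * IQR fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # 3.0 * IQR fences

    mild, extreme = [], []
    for x in data:
        if x < outer_low or x > outer_high:
            extreme.append(x)      # more than 3.0 * IQR beyond a quartile
        elif x < inner_low or x > inner_high:
            mild.append(x)         # between 1.5 and 3.0 * IQR beyond a quartile
    return mild, extreme


# Hypothetical values and quartiles (assumptions, for illustration only).
values = [-25, -6, 12, 14, 18, 36, 51]
mild, extreme = classify_outliers(values, q1=10, q3=20)
print("mild:", mild)
print("extreme:", extreme)
```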

Stats: Measures of Central Tendency

The term "Average" is vague

Average could mean one of four things: the arithmetic mean, the median, the midrange, or the mode. For this reason, it is
better to specify which average you're talking about.

Mean

This is what people usually intend when they say "average"

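Population Mean: mu = (sum of x) / N, where N is the population size

Sample Mean: x bar = (sum of x) / n, where n is the sample size

Frequency Distribution: x bar = (sum of x * f) / (sum of f), where f is the frequency of the value x
The mean of a frequency distribution is also the weighted mean.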
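
As a minimal sketch of the frequency-distribution mean treated as a weighted mean (the function name frequency_mean is hypothetical; the values are the five-number sample 4, 5, 3, 6, 5 used later in these notes, written as an ungrouped frequency distribution):

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: each value weighted by its frequency."""
    weighted_total = sum(x * f for x, f in zip(values, frequencies))
    return weighted_total / sum(frequencies)


# The sample 4, 5, 3, 6, 5 as value/frequency pairs.
values = [3, 4, 5, 6]
frequencies = [1, 1, 2, 1]
mean = frequency_mean(values, frequencies)   # 23 / 5
print(mean)
```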

Median

The data must be ranked (sorted in ascending order) first. The median is the number in the middle.

There are several formulas that could be used to find the depth of the median; the one that we will use is:
Depth of median = 0.5 * (n + 1)

Raw Data

The median is the number in the "depth of the median" position. If the sample size is even, the depth of the median
will be a decimal -- you need to find the midpoint between the numbers on either side of the depth of the median.
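
The raw-data rule can be sketched as follows. This is an illustrative sketch assuming a non-empty list of numbers; the function name median_by_depth is hypothetical.

```python
def median_by_depth(data):
    """Median of raw data using depth = 0.5 * (n + 1)."""
    ranked = sorted(data)                 # the data must be ranked first
    depth = 0.5 * (len(ranked) + 1)       # 1-based position of the median
    if depth.is_integer():
        return ranked[int(depth) - 1]     # odd n: the value at that position
    below = ranked[int(depth) - 1]        # even n: value just below the depth
    above = ranked[int(depth)]            # and the value just above it
    return (below + above) / 2


odd_median = median_by_depth([4, 5, 3, 6, 5])   # depth 3 in 3, 4, 5, 5, 6
even_median = median_by_depth([4, 5, 3, 6])     # depth 2.5 in 3, 4, 5, 6
print(odd_median, even_median)
```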

Ungrouped Frequency Distribution

Find the cumulative frequencies for the data. The first value with a cumulative frequency greater than or equal to the
depth of the median is the median. If the depth of the median is exactly 0.5 more than the cumulative frequency of the
previous value, then the median is the midpoint between the two values.
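
A hedged Python sketch of this rule, assuming every listed value has a frequency of at least one and reading the rule as "the first value whose cumulative frequency reaches the depth of the median"; the function name median_ungrouped is hypothetical.

```python
def median_ungrouped(values, frequencies):
    """Median of an ungrouped frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))
    n = sum(frequencies)
    depth = 0.5 * (n + 1)

    cumulative = 0
    for i, (value, f) in enumerate(pairs):
        previous = cumulative              # cumulative frequency before this value
        cumulative += f
        if cumulative >= depth:
            if depth - previous == 0.5:
                # The depth falls half-way between the previous value and this one.
                return (pairs[i - 1][0] + value) / 2
            return value
    raise ValueError("frequencies must sum to at least one")


m_odd = median_ungrouped([3, 4, 5, 6], [1, 1, 2, 1])    # data 3, 4, 5, 5, 6
m_even = median_ungrouped([3, 4, 5, 6], [1, 1, 1, 1])   # data 3, 4, 5, 6
print(m_odd, m_even)
```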

Grouped Frequency Distribution


This is the tough one.

Since the data is grouped, you have lost all original information. Some textbooks have you simply take the midpoint
of the class. This is an over-simplification which isn't the true value (but much easier to do). The correct process is
to interpolate.

Find out what proportion of the distance into the median class the median lies by dividing the sample size by 2,
subtracting the cumulative frequency of the previous class, and then dividing all that by the frequency of the
median class.

Multiply this proportion by the class width and add it to the lower boundary of the median class.
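
The interpolation can be sketched in Python as below. This is an illustration under stated assumptions: classes are given by their lower boundaries in ascending order with equal widths, the median class is taken to be the first class whose cumulative frequency reaches n / 2, and the function name grouped_median and the sample distribution are hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Interpolated median of a grouped frequency distribution."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution contains no values")
    half = n / 2

    cumulative_before = 0                  # cumulative frequency of previous classes
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative_before + f >= half:
            # Proportion of the way into the median class.
            proportion = (half - cumulative_before) / f
            return lower + proportion * class_width
        cumulative_before += f


# Hypothetical distribution: classes 12-20, 21-29, 30-38, 39-47, 48-56.
boundaries = [11.5, 20.5, 29.5, 38.5, 47.5]
freqs = [3, 5, 5, 5, 2]
median = grouped_median(boundaries, freqs, class_width=9)
print(median)   # 29.5 + (10 - 8) / 5 * 9
```

Here n / 2 = 10, the cumulative frequency before the third class is 8, and the median lies 2/5 of the way into that class.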

Mode

The mode is the most frequent data value. There may be no mode if no one value appears more than any other. There
may also be two modes (bimodal), three modes (trimodal), or more than three modes (multi-modal).

For grouped frequency distributions, the modal class is the class with the largest frequency.
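
A minimal Python sketch of finding the mode or modes, and the modal class of a grouped distribution. The function name modes, the rule "no mode when every value occurs equally often", and the sample class frequencies are assumptions made for illustration.

```python
from collections import Counter


def modes(data):
    """Return all modes in ascending order, or an empty list when there is no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # No mode if no one value appears more often than any other.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)


single = modes([4, 5, 3, 6, 5])      # one mode
bimodal = modes([1, 1, 2, 2, 3])     # two modes
no_mode = modes([1, 2, 3])           # every value appears once

# Modal class of a grouped distribution: the class with the largest frequency.
classes = [(12, 20), (21, 29), (30, 38), (39, 47), (48, 56)]
class_freqs = [3, 7, 5, 4, 1]        # hypothetical frequencies
modal_class = max(zip(classes, class_freqs), key=lambda pair: pair[1])[0]
print(single, bimodal, no_mode, modal_class)
```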

Midrange

The midrange is simply the midpoint between the highest and lowest values.

Summary

The Mean is used in computing other statistics (such as the variance) and does not exist for open ended grouped
frequency distributions (1). It is often not appropriate for skewed distributions such as salary information.

The Median is the center number and is good for skewed distributions because it is resistant to change.

The Mode is used to describe the most typical case. The mode can be used with nominal data whereas the others
can't. The mode may or may not exist and there may be more than one value for the mode (2).

The Midrange is not used very often. It is a very rough estimate of the average and is greatly
affected by extreme values (even more so than the mean).

Property                      Mean      Median    Mode      Midrange
Always exists                 No (1)    Yes       No (2)    Yes
Uses all data values          Yes       No        No        No
Affected by extreme values    Yes       No        No        Yes


Stats: Measures of Variation

Range

The range is the simplest measure of variation to find. It is simply the highest value minus the lowest value.

RANGE = MAXIMUM - MINIMUM

Since the range only uses the largest and smallest values, it is greatly affected by extreme values, that is - it is not
resistant to change.

Variance

"Average Deviation"

The range only involves the smallest and largest numbers, and it would be desirable to have a statistic which
involved all of the data values.

The first attempt one might make at this is something one might call the average deviation from the mean, defined
as:

Average deviation = (sum of (x - mu)) / N

The problem is that this summation is always zero, because the positive and negative deviations cancel:
sum of (x - mu) = (sum of x) - N * mu = 0. So, the average deviation will always be zero. That is why the
average deviation is never used.
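
A quick check of this claim in Python, using exact fractions so that rounding does not hide the result. The data are the five values 4, 5, 3, 6, 5 used in the worked example later in these notes; the variable names are hypothetical.

```python
from fractions import Fraction

data = [4, 5, 3, 6, 5]
mean = Fraction(sum(data), len(data))        # exactly 23/5
deviations = [x - mean for x in data]        # -3/5, 2/5, -8/5, 7/5, 2/5
total_deviation = sum(deviations)            # positives and negatives cancel
average_deviation = total_deviation / len(data)
print(total_deviation, average_deviation)
```

The cancellation is not special to this sample: the sum of the values equals the number of values times the mean, so subtracting the mean from every value always leaves a total of zero.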

Population Variance

So, to keep it from being zero, each deviation from the mean is squared and called the "squared deviation from the
mean". The average of these squared deviations from the mean is called the variance:

Population variance = (sum of (x - mu)^2) / N

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the population mean replaced by
the sample mean. However, one of the major uses of a statistic is to estimate the corresponding parameter, and that
formula has the problem that, on average, it underestimates the population variance. To counteract this, the sum of the
squares of the deviations is divided by one less than the sample size:

Sample variance = (sum of (x - x bar)^2) / (n - 1)
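
The two divisors can be compared in a short Python sketch. The hand-written functions are hypothetical illustrations; Python's standard statistics module provides pvariance (divide by N) and variance (divide by n - 1), which are used here only as a cross-check.

```python
import statistics


def population_variance(data):
    """Average squared deviation from the mean: divide by N."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Unbiased estimate of the population variance: divide by n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


data = [4, 5, 3, 6, 5]
pop_var = population_variance(data)      # 5.2 / 5
samp_var = sample_variance(data)         # 5.2 / 4
print(pop_var, statistics.pvariance(data))
print(samp_var, statistics.variance(data))
```

The sample variance is always the larger of the two for the same data, which is how dividing by n - 1 compensates for the underestimate.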
Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That means that the units were also
squared. To get the units back the same as the original data values, the square root must be taken.

The sample standard deviation is not the unbiased estimator for the population standard deviation.

The calculator does not have a variance key on it. It does have a standard deviation key. You will have to square the
standard deviation to find the variance.

Sum of Squares (shortcuts)

The sum of the squares of the deviations from the mean is given a shortcut notation and several alternative
formulas.

SS(x) = sum of (x - x bar)^2

A little algebraic simplification returns:

SS(x) = (sum of x^2) - (sum of x)^2 / n

What's wrong with the first formula, you ask? Consider the following example, worked in the steps and table below. The
last row of the table contains the totals for the columns.

1. Total the data values: 23


2. Divide by the number of values to get the mean: 23/5 = 4.6

3. Subtract the mean from each value to get the numbers in the second column.

4. Square each number in the second column to get the values in the third column.

5. Total the numbers in the third column: 5.2

6. Divide this total by one less than the sample size to get the variance: 5.2 / 4 = 1.3

x     x - mean            (x - mean)^2
4     4 - 4.6 = -0.6      (-0.6)^2 = 0.36
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
3     3 - 4.6 = -1.6      (-1.6)^2 = 2.56
6     6 - 4.6 =  1.4      ( 1.4)^2 = 1.96
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
23    0.00 (Always)       5.2

Not too bad, you think. But this can get pretty bad if the sample mean doesn't happen to be a "nice"
number. Think about having a mean of 19/7 = 2.714285714285... Those subtractions get nasty, and when you square
them, they're really bad. Another problem with the first formula is that it requires you to know the mean ahead of
time. For a calculator, this would mean that you have to save all of the numbers that were entered. The TI-82 does
this, but most scientific calculators don't.

Now, let's consider the shortcut formula. The only things that you need to find are the sum of the values and the sum
of the values squared. There is no subtraction and no decimals or fractions until the end. The last row contains the
sums of the columns, just like before.

1. Record each number in the first column and the square of each number in the second column.
2. Total the first column: 23

3. Total the second column: 111

4. Compute the sum of squares: 111 - 23*23/5 = 111 - 105.8 = 5.2

5. Divide the sum of squares by one less than the sample size to get the variance = 5.2 / 4 = 1.3

x     x^2
4     16
5     25
3     9
6     36
5     25
23    111
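
Both routes to the sum of squares can be sketched in Python for the same five values. The function names are hypothetical; the shortcut version keeps only three running totals, which mirrors how a calculator can work without saving the numbers that were entered.

```python
def sum_of_squares_definition(data):
    """Sum of squared deviations: needs the mean first, so the data must be kept."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sum_of_squares_shortcut(data):
    """Shortcut: (sum of x^2) - (sum of x)^2 / n, using running totals only."""
    n = 0
    total = 0
    total_of_squares = 0
    for x in data:                 # one pass, nothing stored
        n += 1
        total += x
        total_of_squares += x * x
    return total_of_squares - total * total / n


data = [4, 5, 3, 6, 5]
ss_definition = sum_of_squares_definition(data)
ss_shortcut = sum_of_squares_shortcut(data)        # 111 - 23*23/5
variance = ss_shortcut / (len(data) - 1)           # divide by n - 1
print(ss_definition, ss_shortcut, variance)
```

One caveat, offered as a general observation rather than something from the notes: in floating-point arithmetic the shortcut can lose accuracy when the values are large compared with their spread, so it is best suited to hand calculation or exact arithmetic.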
