
Subject :- Business Statistics

Topic :- Business Statistics


(BC.203)

Name :- Gautam Bisht


Roll No. :- 19134110159
Enrollment No. :- G191342338
Father's Name :- Mr Deepak Bisht
Submitted To :- Dr. Amir Khan
Define the median and explain its importance.
In statistics and probability theory, the median is the value separating the
higher half from the lower half of a data sample, a population or
a probability distribution. For a data set, it may be thought of as the
"middle" value. The basic advantage of the median in describing data,
compared to the mean (often simply described as the "average"), is that it
is not skewed much by a small proportion of extremely large or small
values, so it may give a better idea of a "typical" value. For example, in
statistics such as household income or assets, which vary greatly, the mean
may be skewed by a small number of extremely high or low values, and the
median income may be a better indication of what a "typical" income is.
Because of this, the median is of central importance in robust statistics, as
it is the most resistant statistic, having a breakdown point of 50%: so long
as no more than half the data are contaminated, the median will not give an
arbitrarily large or small result.
Median of Ungrouped Data
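Arrange the n observations in ascending order.

If the number of observations is odd:
$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation}$$

If the number of observations is even:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$$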
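The two cases above can be sketched in Python as a minimal illustration. The function name `median_ungrouped` and the sample values are hypothetical. Because Python lists are indexed from 0, the ((n + 1)/2)th observation sits at index n // 2.

```python
def median_ungrouped(values):
    """Median of raw (ungrouped) observations."""
    data = sorted(values)  # arrange the observations in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2  # 0-based index of the middle position
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation sits at 0-based index n // 2
        return data[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th observations
    return (data[mid - 1] + data[mid]) / 2


print(median_ungrouped([7, 1, 3, 9, 5]))      # odd n = 5
print(median_ungrouped([7, 1, 3, 9, 5, 11]))  # even n = 6
```

In the even case the result is the average of 5 and 7, which is 6.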


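Median of Grouped Data

For the median of grouped data, we first find the cumulative frequencies
and then calculate the median position n/2, where n is the total frequency.
The median lies in the group (class) whose cumulative frequency is the first
to reach or exceed n/2; this is the median class. We use the following
formula to find the median:

$$\text{Median} = l + \frac{\frac{n}{2} - c}{f} \times h$$

where l is the lower limit of the median class, c is the cumulative
frequency of the class preceding the median class, f is the frequency of the
median class and h is the width of the median class.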
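The grouped-data formula for the median can be applied mechanically. The sketch below is a minimal version and makes two assumptions: classes are supplied in ascending order as (lower limit, upper limit, frequency), and the intervals are contiguous. The function name and the frequency table are hypothetical.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order,
    with contiguous class intervals assumed.
    """
    n = sum(f for _, _, f in classes)  # total frequency
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2  # median position n/2
    cum = 0  # cumulative frequency of the classes before the current one (c)
    for lower, upper, f in classes:
        if cum + f >= half:  # first class whose cumulative frequency reaches n/2
            h = upper - lower  # width of the median class
            return lower + (half - cum) / f * h
        cum += f


# hypothetical frequency table: (lower limit, upper limit, frequency)
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print(round(grouped_median(table), 2))
```

Here n = 40, so n/2 = 20. The cumulative frequencies are 5, 13, 25, 35 and 40, which makes 20 to 30 the median class. The median is therefore 20 + (20 - 13)/12 × 10 ≈ 25.83.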

Uses & Importance


The median can be used as a measure of location when one attaches
reduced importance to extreme values, typically because a distribution
is skewed, extreme values are not known, or outliers are untrustworthy,
i.e., they may be measurement or transcription errors.
For example, consider the multiset
1, 2, 2, 2, 3, 14.
The median is 2 in this case (as is the mode), and it might be seen as a
better indication of the center than the arithmetic mean of 4, which is
larger than all but one of the values. However, the widely cited empirical
relationship that the mean is shifted "further into the tail" of a distribution
than the median is not generally true. At most, one can say that the two
statistics cannot be "too far" apart: for a distribution with finite variance,
the mean and the median differ by at most one standard deviation.
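The multiset example can be checked with Python's standard `statistics` module (Python 3.8 or later is assumed). The second data set, where 14 is replaced by 1400, is a hypothetical extension. It shows the robustness claim directly: the mean moves a lot, while the median and mode stay put.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 2, 3, 14]
print(mean(data), median(data), mode(data))

# Hypothetical: replace the extreme value 14 with something far larger.
# The mean jumps, while the median and mode are unchanged.
extreme = [1, 2, 2, 2, 3, 1400]
print(mean(extreme), median(extreme), mode(extreme))
```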

As a median is based on the middle data in a set, it is not necessary to
know the value of extreme results in order to calculate it. For example, in a
psychology test investigating the time needed to solve a problem, if a small
number of people failed to solve the problem at all in the given time, a
median can still be calculated.[6]
Because the median is simple to understand and easy to calculate, while
also being a robust approximation to the mean, the median is a
popular summary statistic in descriptive statistics. In this context,
there are several choices for a measure of variability: the range,
the interquartile range, the mean absolute deviation, and the median
absolute deviation.
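The four measures of variability listed above can be compared on the same multiset 1, 2, 2, 2, 3, 14. This is a minimal sketch using Python 3.8+ `statistics`. One caveat: quartiles have several competing conventions, and the choice of `method="inclusive"` here is an assumption, so other software may report a different interquartile range.

```python
from statistics import mean, median, quantiles

data = [1, 2, 2, 2, 3, 14]

value_range = max(data) - min(data)  # range: driven entirely by the extremes

# quartiles: "inclusive" is one of several conventions (an assumption here)
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: ignores the outer quarters

m = mean(data)
mean_abs_dev = mean([abs(x - m) for x in data])  # mean absolute deviation about the mean

med = median(data)
median_abs_dev = median([abs(x - med) for x in data])  # median absolute deviation

print(value_range, iqr, round(mean_abs_dev, 3), median_abs_dev)
```

The single value 14 dominates the range and the mean absolute deviation. The interquartile range and the median absolute deviation stay small, mirroring the robustness of the median itself.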

For practical purposes, different measures of location and dispersion are
often compared on the basis of how well the corresponding population
values can be estimated from a sample of data. The median, estimated
using the sample median, has good properties in this regard. While it is not
usually optimal if a given population distribution is assumed, its properties
are always reasonably good. For example, a comparison of
the efficiency of candidate estimators shows that the sample mean is
more statistically efficient when, and only when, the data are
uncontaminated by data from heavy-tailed distributions or from mixtures of
distributions. Even then, the median has about 64% efficiency (2/π)
compared to the minimum-variance mean for large normal samples,
which is to say the variance of the median will be roughly 57% greater
(π/2 ≈ 1.57 times) than the variance of the mean.
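The efficiency figure can be checked roughly by simulation. This is a Monte Carlo sketch, not a proof. It assumes standard normal samples of size 101 (a hypothetical choice), 3000 repetitions and a fixed seed. The estimated ratio var(median)/var(mean) should land near π/2 ≈ 1.57, and it will vary slightly with the seed and the sample size.

```python
import math
import random
from statistics import fmean, median, pvariance


def variance_ratio(n=101, trials=3000, seed=1):
    """Monte Carlo estimate of var(sample median) / var(sample mean)
    for samples of size n from a standard normal distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return pvariance(medians) / pvariance(means)


ratio = variance_ratio()
print(f"simulated ratio: {ratio:.2f}, large-sample value pi/2: {math.pi / 2:.2f}")
```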
Differentiate between Geometric and Harmonic mean.
The harmonic mean is a type of numerical average. It is calculated by
dividing the number of observations by the sum of the reciprocals of the
numbers in the series. Thus, the harmonic mean is the reciprocal of the
arithmetic mean of the reciprocals.

The harmonic mean of 1, 4 and 4 is:
$$\text{HM} = \frac{3}{\frac{1}{1} + \frac{1}{4} + \frac{1}{4}} = \frac{3}{1.5} = 2$$
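The harmonic-mean calculation for 1, 4 and 4 can be reproduced by hand and checked against Python's `statistics.harmonic_mean` (Python 3.8+ assumed). The helper name `harmonic` is hypothetical. The same values also show how the three means differ: for positive data the arithmetic mean is at least the geometric mean, which is at least the harmonic mean.

```python
from statistics import geometric_mean, harmonic_mean, mean


def harmonic(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)


data = [1, 4, 4]
hm = harmonic(data)
print(hm, harmonic_mean(data))  # hand-rolled vs standard library

# for positive data: arithmetic mean >= geometric mean >= harmonic mean
print(mean(data), round(geometric_mean(data), 4), hm)
```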

The geometric mean is an average obtained by multiplying a set of numbers
together rather than adding them; it is commonly used to determine the
performance results of an investment or portfolio. It is technically defined
as "the nth root of the product of n numbers." The geometric mean should be
used when working with percentages (such as periodic rates of return), which
are derived from values, whereas the arithmetic mean works with the values
themselves.
The geometric mean is an important tool for calculating portfolio
performance for many reasons, but one of the most significant is that it
takes into account the effects of compounding.

The formula for the geometric mean is:
$$\text{GM} = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$$
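Compounding is why the geometric mean suits percentage returns. The sketch below uses hypothetical returns of +10% in one period and -10% in the next, converts them to growth factors 1.10 and 0.90, and takes their geometric mean. The result is the constant per-period rate that reproduces the actual ending value. By contrast, the arithmetic mean of the two returns is 0%, which wrongly suggests no loss. Python 3.8+ `statistics` is assumed.

```python
from statistics import fmean, geometric_mean

returns = [0.10, -0.10]  # hypothetical: +10% one period, -10% the next
growth = [1 + r for r in returns]  # percentages converted to growth factors

avg_growth = geometric_mean(growth) - 1  # compound average rate per period
naive = fmean(returns)  # arithmetic mean of the returns ignores compounding

final = 1.0
for g in growth:
    final *= g  # actual value of 1 unit invested after both periods

print(f"geometric: {avg_growth:.4%}, arithmetic: {naive:.4%}, final value: {final:.4f}")
```

Compounding the geometric rate for two periods, (1 + avg_growth)², gives back the final value of 0.99. The arithmetic rate of 0% would give 1.00 instead.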

Example of the Harmonic Mean

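As an example, take two firms. One has a market capitalization of $100
billion and earnings of $4 billion (P/E of 25), and the other has a market
capitalization of $1 billion and earnings of $4 million (P/E of 250). In an
index made of the two stocks, with 10% invested in the first and 90%
invested in the second, the P/E ratio of the index is given by the weighted
harmonic mean:
$$\text{P/E} = \frac{0.1 + 0.9}{\frac{0.1}{25} + \frac{0.9}{250}} = \frac{1}{0.0076} \approx 131.6$$
A weighted arithmetic mean (0.1 × 25 + 0.9 × 250 = 227.5) would
significantly overstate the P/E ratio of the index.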
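The index P/E can be checked from first principles. Suppose one unit of money is invested with the stated 10%/90% weights. Each holding buys weight / (P/E) of earnings, so the index P/E is the total amount invested divided by the total earnings bought, which is exactly the weighted harmonic mean. This is a minimal sketch. The variable names are hypothetical, and the weighted arithmetic mean is included only for contrast.

```python
weights = [0.10, 0.90]  # share of the index invested in each firm
pe = [25, 250]  # P/E ratios of the two firms

# earnings bought per unit of money invested: weight / (P/E) for each holding
earnings = sum(w / p for w, p in zip(weights, pe))

# weighted harmonic mean = total invested / total earnings
pe_index_harmonic = sum(weights) / earnings

# weighted arithmetic mean, shown only for contrast
pe_index_arithmetic = sum(w * p for w, p in zip(weights, pe)) / sum(weights)

print(round(pe_index_harmonic, 1), round(pe_index_arithmetic, 1))
```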
What do you understand by mode? Explain its uses and limitations with
examples.

The mode of a set of data values is the value that appears most
often. If X is a discrete random variable, the mode is the value x (i.e., X = x)
at which the probability mass function takes its maximum value. In
other words, it is the value that is most likely to be sampled.
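The definition of the mode as the value where the probability mass function is largest can be illustrated with a binomial distribution. The choice of n = 10 trials with success probability p = 0.3 is a hypothetical example, not taken from the text. The sketch evaluates the PMF at every possible value and picks the largest. It assumes Python 3.8+ for `math.comb`.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical parameters
# the mode is the value x at which the PMF takes its maximum value
mode_value = max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))
print(mode_value, round(binomial_pmf(mode_value, n, p), 4))
```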
Like the statistical mean and median, the mode is a way of expressing, in
a (usually) single number, important information about a random
variable or a population. The numerical value of the mode is the same as
that of the mean and median in a normal distribution, and it may be very
different in highly skewed distributions.
The mode is not necessarily unique to a given discrete distribution, since
the probability mass function may take the same maximum value at
several points x1, x2, etc. The most extreme case occurs in uniform
distributions, where all values occur equally frequently.
When the probability density function of a continuous distribution has
multiple local maxima it is common to refer to all of the local maxima as
modes of the distribution. Such a continuous distribution is
called multimodal (as opposed to unimodal). A mode of a continuous
probability distribution is often considered to be any value x at which
its probability density function has a locally maximum value, so any
peak is a mode.
In symmetric unimodal distributions, such as the normal distribution,
the mean (if defined), median and mode all coincide. For samples, if it is
known that they are drawn from a symmetric unimodal distribution, the
sample mean can be used as an estimate of the population mode.
Uses
1. Unlike mean and median, the concept of mode also makes sense for
"nominal data" (i.e., not consisting of numerical values in the case
of mean, or even of ordered values in the case of median). For
example, taking a sample of Korean family names, one might find
that "Kim" occurs more often than any other name. Then "Kim" would
be the mode of the sample. In any voting system where a plurality
determines victory, a single modal value determines the victor, while
a multi-modal outcome would require some tie-breaking procedure to
take place.

2. Unlike median, the concept of mode makes sense for any random
variable assuming values from a vector space, including the real
numbers (a one-dimensional vector space) and
the integers (which can be considered embedded in the reals). For
example, a distribution of points in the plane will typically have a
mean and a mode, but the concept of median does not apply. The
median makes sense when there is a linear order on the possible
values. Generalizations of the concept of median to higher-
dimensional spaces are the geometric median and
the centrepoint.
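Both uses listed above can be sketched with Python 3.8+ `statistics.multimode`. The name sample and the points are hypothetical. For the nominal data only the mode is meaningful. For the points in the plane, a mean point and a modal point both exist, but there is no natural ordering, so there is no ordinary median.

```python
from statistics import fmean, multimode

# hypothetical sample of family names (nominal data: no order, no arithmetic)
names = ["Kim", "Lee", "Kim", "Park", "Choi", "Kim", "Lee"]
print(multimode(names))

# hypothetical points in the plane: a mean point and a modal point exist,
# but the points have no natural linear order, so no ordinary median
points = [(0, 0), (1, 2), (1, 2), (3, 1), (1, 2), (0, 0)]
mean_point = (fmean(x for x, _ in points), fmean(y for _, y in points))
print(multimode(points), mean_point)
```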

Limitations of Mode

There are some limitations to using the mode. In some distributions, the
mode may not reflect the centre of the distribution very well. For example,
when the following retirement ages are ordered from lowest to highest value,
it is easy to see that the centre of the distribution is 57 years, but the mode
is lower, at 54 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
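The retirement-age example can be verified directly. This sketch assumes Python's standard `statistics` module and uses the eleven ages listed above. With an odd count, the median is the 6th ordered value, 57, while the most frequent value is 54.

```python
from statistics import mean, median, mode

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # retirement ages from the text

print("median:", median(ages))  # centre of the ordered data
print("mode:", mode(ages))  # most frequent value, well below the centre
print("mean:", round(mean(ages), 2))
```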

It is also possible for there to be more than one mode for the same
distribution of data (bimodal or multimodal). The presence of more than
one mode can limit the ability of the mode to describe the centre or typical
value of the distribution, because a single value to describe the centre
cannot be identified. In some cases, particularly where the data
are continuous, the distribution may have no mode at all (i.e., if all values
are different). In cases such as these, it may be better to use the median
or mean, or to group the data into appropriate intervals and find the
modal class.
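The situations described above (several modes, no repeated value, and grouping into a modal class) can be shown with hypothetical data sets. This is a minimal sketch assuming Python 3.8+. `statistics.multimode` returns every value tied for the highest frequency. The class width of 1 used for grouping is an arbitrary assumption.

```python
import math
from collections import Counter
from statistics import multimode

# hypothetical data sets
bimodal = [1, 1, 2, 3, 3]
all_distinct = [2.7, 3.1, 3.4, 3.9, 4.8, 5.0]

print(multimode(bimodal))  # two values tie for most frequent
print(multimode(all_distinct))  # every value ties, so there is no single mode

# group the continuous values into classes of width 1 and find the modal class
width = 1
class_counts = Counter(math.floor(x / width) * width for x in all_distinct)
modal_class, count = class_counts.most_common(1)[0]
print(f"modal class: [{modal_class}, {modal_class + width}) with {count} values")
```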
May Not Represent the Data Accurately
Another limitation of the mode is that it may not represent the data
accurately. For example, in a data set where the value 2 appears twice and
values such as 3, 5 and 6 appear once each, the mode is 2. If 3, 5 and 6 are
replaced by 100, 200 and 300, the mode stays the same, which is not a
correct representation of the data. Hence one should be careful when
analyzing data only on the basis of the mode if the series under
consideration has extreme values.

Other Values Become Insignificant

The biggest disadvantage of the mode is that other values are not taken
into consideration. In the same example, values such as 3, 5 and 6 are of no
use; only the value 2 matters, because it appears twice.
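Both limitations can be seen side by side with a hypothetical data set. The set [2, 2, 3, 5, 6] is an assumption chosen to match the description: 2 appears twice, and 3, 5 and 6 appear once. The sketch replaces 3, 5 and 6 with 100, 200 and 300. The mode does not change, while the median and mean both respond.

```python
from statistics import mean, median, mode

# hypothetical data consistent with the text: 2 twice, 3, 5 and 6 once each
original = [2, 2, 3, 5, 6]
# 3, 5 and 6 replaced by the extreme values 100, 200 and 300
replaced = [2, 2, 100, 200, 300]

for label, data in (("original", original), ("replaced", replaced)):
    print(label, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```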
