
Business Statistics

Fall, 2013 Introduction

Lecture outline
- What is statistics?
- Summarizing the distribution of data
  - capturing the central tendency
  - spread

The word statistics originally meant the collection of information about and for the state.
It is now a scientific method of collecting and analyzing data (making sense of numerical/quantitative information) to assist in making more effective decisions.

In statistics, we deal with uncertainty. We don't deal with "what is" but with "what probably is".
But what do we mean by "it is probably that"? Language alone is inadequate to express the degree of uncertainty; we need a more formal structure for this purpose. The language of probability will be the focus of the first part of this course.

Also, in statistics, we deal with samples. We make statements about a population based on the results of a sample. This is the focus of the second part of this course. Beware, some uncertainty will always remain.

Then, in future econometrics courses, you will learn to use statistical tools to
- analyze relationships between variables in an economic context
- do forecasting

Caveat:
- Statistics provides useful tools to help managers in decision making.
- However, these tools are not intended as substitutes for the familiarity with the business environment that develops through years of study and accumulated experience.
- It is in alliance with other relevant expertise in the business environment that statistical methods have proved most valuable as management tools.

Statistics are everywhere. Wherever they appear, they are used to speak with authority.
It is therefore important to use the right statistic for the job!

Data point, data point, data point, ...
Distribution of the data points
Characterize the shape of the distribution:
- the center, usually the mean
- the spread, the variance
- the lopsidedness, the skewness
- (the "peakedness", the kurtosis)

Capturing the central tendency


After every exam, you will receive your own score, and I will give you the average score of the class. Why do I assume that you are interested in that average score?



Now suppose I ask you to poll your classmates about their opinions on making the market economy the core of any country's development process, on a scale of 1 to 5, with 1 being strongly in favor of it and 5 being strongly against it.
1 strongly agree, 2 agree, 3 indifferent, 4 disagree, 5 strongly disagree

It turns out that half of the class answered 1 and the other half of the class answered 5. When I ask you to tell me the result, that is, to summarize the class opinion for me, would you add 1 and 5 and then divide the answer by 2? That's how you calculate the average. If you do so, you will get 3, which indicates indifference. Would you report to me that the general opinion in the class on this point is actually quite indifferent?
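The polling example can be checked with a small sketch. The class size of 10 is an assumption made only for illustration; any class split evenly between 1 and 5 gives the same picture.

```python
from statistics import mean, multimode

# Hypothetical class of 10 students: half answer 1, half answer 5.
responses = [1] * 5 + [5] * 5

average = mean(responses)      # (5*1 + 5*5) / 10
modes = multimode(responses)   # most frequent answers, in order of first appearance

print("mean :", average)
print("modes:", modes)
```

The mean lands on 3, an answer that nobody gave, while the two modes (1 and 5) show that the class is split.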



Suppose you are dealing with manufacturers who produce clothing in various sizes. Is knowing that the mean shirt size of European men is 41.3, or that the average shoe size of American women is 8.24, useful?



Let's consider the incomes or wealth of households in a city. Usually, a large proportion of the population has relatively modest incomes, but the incomes of, say, the highest 10% of all earners can be very large.
In such a case, would you use the mean income to represent economic well-being in the city?



The average, or mean, is generally appropriate to summarize the data's central tendency when we have numerical data.
But with categorical data, such as opinion scales, the mean is meaningless. What is valuable for inventory decisions is not the mean size but the modal size: the size of items sold most often, that is, the size in heaviest demand.



But even with numerical data, the mean can sometimes give misleading information about the center.
In the case of income distribution, the mean income can be inflated by the very wealthy. The very wealthy are also an illustration of outliers: numbers that are far from the rest of the data. Positive outliers tend to increase the mean but have little or no effect on the median. The median is preferred to the mean in such a case to describe the center of the income distribution.

Please calculate the mean, the median, and the mode, and tell us which statistic more reasonably captures the central tendency of this dataset?

How can we determine if the mean is being heavily influenced by outliers? The simple answer is: don't just look at the mean, look at more statistics. If the mean and the median are not close together, then the mean may be affected by outliers, as is the case in this example.
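As a rough illustration of comparing the mean with the median, here is a minimal sketch. The income figures are invented for this example and are not the dataset from the lecture.

```python
from statistics import mean, median

# Hypothetical monthly incomes; the last household is a very wealthy outlier.
incomes = [900, 1000, 1000, 1100, 1200, 20000]
without_outlier = incomes[:-1]

mean_without = mean(without_outlier)
median_without = median(without_outlier)
mean_with = mean(incomes)
median_with = median(incomes)

print("without outlier: mean =", mean_without, "median =", median_without)
print("with outlier   : mean =", mean_with, "median =", median_with)
```

Adding the single large value roughly quadruples the mean, while the median barely moves. A wide gap between the two is the warning sign described above.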

Even if the mean and the median are equal in a dataset, does it mean that we can use either one to adequately capture the central tendency?

[Figure: histogram of salaries; x-axis: Salary, y-axis: Frequency; Mean salary = Median salary]

The mean and median salaries are some of the least frequently reported values. These salaries appear to be bimodal. Perhaps in this case both staff and executive salaries have been collected. Because there are two frequently occurring values, the mode salary values may be the best way to summarize the dataset.
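A bimodal dataset of this kind can be imitated with a small sketch. The salary values (in thousands) are assumptions chosen so that the mean equals the median; they are not the data behind the figure.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical salaries in thousands: a staff cluster and an executive cluster.
salaries = [30, 30, 30, 30, 50, 70, 70, 70, 70]

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_modes = multimode(salaries)
counts = Counter(salaries)   # how often each salary value occurs

print("mean:", salary_mean, "median:", salary_median, "modes:", salary_modes)
print("frequency of the mean value:", counts[salary_mean])
```

Here the mean and the median coincide at 50, yet only one person earns that amount; the two modes describe the typical salaries far better.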

Running the numbers to get mean, median, and mode is simply not sufficient.
Graph the data before deciding how best to summarize a dataset.

Zhijiang's migrant income in 2007


The intervals into which the data are broken down are called bins (or classes).

The numbers of observations in each class are called frequencies.


A histogram is a representation of the tabulated frequencies over specified bins.

You can tell quite a bit about a variable by looking at a chart of its frequency distribution. The migrant income histogram clearly stretches out to the right; we call this positively skewed. We can tell that the mean is greater than the median in this case:
- Mean: 1234 yuan
- Median: 1000 yuan
- Mode: 1000 yuan
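The ideas of bins, frequencies, and a right-stretching histogram can be sketched in a few lines. The incomes and the bin width of 500 are assumptions for illustration only, not the migrant income data.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical incomes in yuan (not the survey data from the lecture).
incomes = [400, 800, 900, 1000, 1000, 1000, 1200, 1500, 1800, 2600, 3400]
bin_width = 500

# Map each observation to the lower edge of its bin, then count per bin.
frequencies = Counter((x // bin_width) * bin_width for x in incomes)

# Text histogram: one '#' per observation, empty bins included.
for lower in range(0, max(incomes) + 1, bin_width):
    upper = lower + bin_width - 1
    print(f"{lower:>5}-{upper:<5} {'#' * frequencies[lower]}")

print("mean > median:", mean(incomes) > median(incomes))
```

The bars thin out toward the higher bins, and the mean sits above the median, which is the pattern the slide calls positive skew.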

A word on skewness

Skewness describes the direction and relative magnitude in which the mean, and the tail of a graphed dataset, is pulled. When the mean is pulled to higher values, we say there is positive, or right, skewness. When the mean is pulled to lower values, we say there is negative, or left, skewness. One type of distribution has zero skewness: the symmetric distribution. With a symmetric distribution, the mean is equal to the median.

The variability or spread of a distribution


When we have two datasets with the same mean, how can we tell which dataset is
- more variable?
- more volatile?
- less precise?
- less predictable?



The easiest way to think about the volatility of a dataset:
Range of the dataset
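As a minimal sketch, the range is taken here to be the largest value minus the smallest. The two datasets and the helper name `data_range` are hypothetical, chosen so that both datasets share the same mean.

```python
from statistics import mean

def data_range(values):
    """Range of a dataset: largest value minus smallest value."""
    return max(values) - min(values)

# Two hypothetical datasets with the same mean but different spread.
a = [48, 49, 50, 51, 52]
b = [10, 30, 50, 70, 90]

print("means :", mean(a), mean(b))
print("ranges:", data_range(a), data_range(b))
```

Both means are 50, but the second dataset has a much wider range, so it is the more variable of the two by this measure.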



What about how far each point is from the mean? The dataset with the higher average distance from the mean should be more spread out or variable?
We can express this idea using the following formula:
$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)$$



But this formula always equals zero! (TA session) We must improve this formula slightly so that deviations on either side of the mean don't offset each other in the aggregate. To get rid of the offsets, we could either use absolute distances or square the distances.
We choose to square the distances, since that is easier to deal with mathematically in many applications.
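One way to see why the average signed deviation is always zero is the short derivation below. It assumes the notation $x_1, \dots, x_N$ for the data points and $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ for their mean.

```latex
\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)
  = \frac{1}{N}\sum_{i=1}^{N} x_i \;-\; \frac{1}{N}\cdot N\mu
  = \mu - \mu
  = 0
```

For example, with the data 1, 2, 6 the mean is 3 and the deviations are -2, -1, and 3, which sum to zero.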



Now we create mean squared deviations from the mean.
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
We call the mean squared deviations from the mean the statistical variance.



But along comes another problem. Variance is measured in the units of the data, squared. Wouldn't it be better to use a spread statistic that is expressed in the same units as the data being studied? So we take the square root of the variance.

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

And this is called the standard deviation.
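The two definitions can be checked numerically with a short sketch. The eight data values are an assumption chosen for easy arithmetic; the standard-library functions `pvariance` and `pstdev` divide by N, matching the formulas above.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical dataset chosen so the arithmetic comes out in round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
n = len(data)

# Variance: mean of the squared deviations from the mean (divide by N).
variance = sum((x - mu) ** 2 for x in data) / n
# Standard deviation: square root of the variance, in the units of the data.
std_dev = sqrt(variance)

print("mean:", mu, "variance:", variance, "standard deviation:", std_dev)
print("library check:", pvariance(data), pstdev(data))
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 instead, which is the sample version and gives slightly larger values.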

Look at two investment funds below:


                                MBA Student Fund A    MBA Student Fund B
Average return over 10 years    5%                    5%
Median return                   7%                    2%
Standard deviation              10%                   1%

In which fund would you invest?
