Lecture outline
What is statistics?
Summarizing the distribution of data
- capturing the central tendency
- spread
The word statistics originally meant the collection of information about and for the state.
It is now a scientific method of collecting and analyzing data (making sense of numerical/quantitative information) to assist in making more effective decisions.
In statistics, we deal with uncertainty. We do not deal with what is, but with what probably is.
But what do we mean by "it is probably that"? Language alone is inadequate to convey the degree of uncertainty; we need a more formal structure for this purpose. The language of probability will be the focus of the first part of this course.
Also, in statistics, we deal with samples. We make statements about a population based on the results of a sample. This is the focus of the second part of this course. Beware, some uncertainty will always remain.
Then, in future econometrics courses, you will learn to use statistical tools to
- analyze relationships between variables in an economic context
- do forecasting
Caveat:
- Statistics provide useful tools for managers to help them in decision making.
- However, these tools are not intended as substitutes for the familiarity with the business environment that develops through years of study and accumulated experience.
- It is in alliance with other relevant expertise in the business environment that statistical methods have proved most valuable as management tools.
Statistics are everywhere. Wherever they appear, those who cite them do so to speak with authority.
It is therefore important to use the right statistic for the job!
Data point, data point, data point, ...
Distribution of the data points
Characterize the shape of the distribution:
- the center, usually the mean
- the spread, the variance
- the lopsidedness, the skewness
- (the "peakedness", the kurtosis)
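To see why the center alone does not describe a distribution, here is a minimal sketch using Python's standard `statistics` module. The two datasets are hypothetical, chosen only so that they share a mean but differ in spread.

```python
from statistics import mean, pvariance

# Hypothetical data: both sets are centered on 50 but differ in spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

tight_mean, wide_mean = mean(tight), mean(wide)

# pvariance is the population variance: the average squared
# distance of the data points from their mean.
tight_var, wide_var = pvariance(tight), pvariance(wide)

print(f"tight: mean={tight_mean}, variance={tight_var}")
print(f"wide:  mean={wide_mean}, variance={wide_var}")
```

Both means come out the same, while the variances differ by a factor of several hundred, so a report of the mean alone would hide the difference between the two sets.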
It turns out that half of the class answered 1 and the other half of the class answered 5. When I ask you to tell me the result, that is, to summarize the class opinion for me, would you add 1 and 5 and then divide the answer by 2? That is how you calculate the average. If you do so, you will get 3, which indicates indifference. Would you report to me that the general opinion in the class on this point is actually quite indifferent?
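The split-opinion example can be checked with a short sketch. The class size of 20 is an assumption made for illustration; only the half-and-half split between 1 and 5 comes from the example.

```python
from statistics import mean, median, multimode

# Assumed class of 20 students: half answered 1, half answered 5.
responses = [1] * 10 + [5] * 10

avg = mean(responses)         # (10*1 + 10*5) / 20
mid = median(responses)       # average of the two middle values, 1 and 5
modes = multimode(responses)  # every value tied for most frequent

print("mean:", avg)
print("median:", mid)
print("modes:", modes)
```

The mean and the median both land on 3, an answer that nobody in the class gave. The two modes, 1 and 5, describe the divided opinion far better.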
Please calculate the mean, the median, and the mode, and tell us which statistic most reasonably captures the central tendency of this dataset.
How can we determine if the mean is being heavily influenced by outliers? The simple answer is: do not just look at the mean, look at more statistics. If the mean and the median are not close together, then the mean may be affected by outliers, as is the case in this example.
Even if the mean and the median are equal in a dataset, does that mean that we can use either one to adequately capture the central tendency?
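A quick way to apply this check is to compute both statistics with and without the suspect value. The numbers below are hypothetical and are not the dataset from the lecture.

```python
from statistics import mean, median

# Hypothetical dataset: five typical values and one extreme value.
data = [30, 32, 35, 36, 38, 400]

# Mean and median of the full dataset.
with_outlier = (mean(data), median(data))

# Same statistics after dropping the last (extreme) value.
without_outlier = (mean(data[:-1]), median(data[:-1]))

print("with outlier:    mean=%.1f median=%.1f" % with_outlier)
print("without outlier: mean=%.1f median=%.1f" % without_outlier)
```

With the extreme value included, the mean sits far above the median. Once it is removed, the two are close together, while the median has barely moved.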
[Figure: frequency histogram of salaries, with mean salary = median salary]
The mean and median salaries are among the least frequently reported values. These salaries appear to be bimodal. Perhaps in this case both staff and executive salaries have been collected. Because there are two frequently occurring values, the modal salary values may be the best way to summarize the dataset.
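The situation in the salary chart can be reproduced with a small sketch. The salary figures are hypothetical (in arbitrary units) and were chosen so that the mean equals the median while most observations sit in two separate clusters.

```python
from statistics import mean, median, multimode

# Hypothetical salaries: a staff cluster at 30, an executive
# cluster at 90, and a single value in between.
salaries = [30] * 5 + [60] + [90] * 5

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_modes = multimode(salaries)

# How many people actually earn the mean/median salary?
at_center = salaries.count(salary_median)

print("mean:", salary_mean, "median:", salary_median)
print("modes:", salary_modes)
print("observations at the mean/median:", at_center)
```

The mean and the median agree, yet only one observation takes that value. The two modes are what a reader would need to know to picture the data.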
Running the numbers to get mean, median, and mode is simply not sufficient.
Graph the data before deciding how best to summarize a dataset.
You can tell quite a bit about a variable by looking at a chart of its frequency distribution. It is clear that the migrant income histogram stretches out to the right; we call this positively skewed. We can tell that the mean is greater than the median in this case:
Mean: 1234 yuan
Median: 1000 yuan
Mode: 1000 yuan
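Graphing does not require special tools. The sketch below prints a rough text histogram and the three summary statistics for a hypothetical set of incomes. These values are illustrative only and are not the migrant income data behind the figures above.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical monthly incomes in yuan (illustrative only).
incomes = [800, 900, 1000, 1000, 1000, 1200, 1500, 3000]

# Text histogram: one '#' per observation at each income level.
counts = Counter(incomes)
for value in sorted(counts):
    print(f"{value:>5} {'#' * counts[value]}")

income_mean = mean(incomes)
income_median = median(incomes)
income_mode = mode(incomes)

print("mean:", income_mean)
print("median:", income_median)
print("mode:", income_mode)
```

The single large income stretches the picture to the right and pulls the mean above the median, which is the pattern described for a positively skewed distribution.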
A word on skewness
Skewness describes the direction and relative magnitude by which the mean is pulled, which is also the direction in which the tail of a graphed dataset extends. When the mean is pulled to higher values, we say there is positive or right skewness. When the mean is pulled to lower values, we say there is negative or left skewness. A distribution with zero skewness is a symmetric distribution. With a symmetric distribution, the mean is equal to the median.
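One common way to put a number on skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The lecture does not name a specific formula, so this choice is an assumption, and the `skewness` helper and the three datasets below are hypothetical.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(data)
    mu = mean(data)
    m2 = sum((x - mu) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 10]   # long tail toward high values
symmetric = [1, 2, 3, 4, 5]           # mirror-symmetric around 3
left_tail = [-x for x in right_tail]  # mirror image of right_tail

for name, data in [("right_tail", right_tail),
                   ("symmetric", symmetric),
                   ("left_tail", left_tail)]:
    print(f"{name}: skewness = {skewness(data):.3f}")
```

The sign of the result matches the terminology above: positive for the right-tailed data, zero for the symmetric data, and negative for the left-tailed data.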
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$