Methods
Chapter 1 : Data and Statistics
Kui Zhang, Mathematical Sciences
Introduction
Statistics deals largely with principles and procedures for collecting,
describing, and drawing conclusions from data.
The purpose of this chapter is to:
1.
2.
3.
4.
AGE   SEX   HAPPY   TVHOURS
41    1     2       0
31    2     1       0
64    2     3       0
26    1     2       2
Data format
Observation(s): a row in the data file
Variable(s): a column in the data file
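The row/column convention can be sketched in a few lines of Python. This is only an illustration using the four observations listed in this section; the names `rows`, `variables`, and `age_column` are made up for the example.

```python
# Each dict is one observation (a row); each key is one variable (a column).
rows = [
    {"AGE": 41, "SEX": 1, "HAPPY": 2, "TVHOURS": 0},
    {"AGE": 31, "SEX": 2, "HAPPY": 1, "TVHOURS": 0},
    {"AGE": 64, "SEX": 2, "HAPPY": 3, "TVHOURS": 0},
    {"AGE": 26, "SEX": 1, "HAPPY": 2, "TVHOURS": 2},
]

n_observations = len(rows)                  # number of rows
variables = list(rows[0].keys())            # column names
age_column = [row["AGE"] for row in rows]   # one variable across all rows

print(n_observations, variables, age_column)
```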
Chapter 1, MA5701 Statistical Methods, Fall 2016
Variables (Cont.)
Definition 1.8 The ordinal scale distinguishes between
measurements on the basis of the relative amounts of some
characteristic they possess.
You can convert a ratio or interval scale to an ordinal scale, but the criteria are not
always clear, and the conversion loses information.
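A small sketch of such a conversion, assuming hypothetical cut-offs at 30 and 60 years (the slides do not give any thresholds, which is exactly the "criteria are not always clear" problem):

```python
def age_group(age):
    """Map an age in years (ratio scale) to an ordered category (ordinal scale).

    The cut-offs 30 and 60 are arbitrary assumptions made for illustration.
    """
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "older"

ages = [41, 31, 64, 26]
groups = [age_group(a) for a in ages]
print(groups)

# Ages 41 and 31 fall in the same category, so the 10-year difference
# between them can no longer be recovered from the ordinal variable.
```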
Variables Example
Obs  Zip  Age  Bed  Bath  Size   Lot     Exter  garage  fp  Price
1    3    21   3    2     951    64904   Other  0       0   30000
3    4    7    1    1     676    54450   Other  2       0   46500
5    1    51   3    1     1186   10857   Other  1       0   51500
7    3    8    3    2     1368   .       Frame  0       0   56990
9    1    51   2    1     1176   6259    Frame  1       1   65500
Distributions
Definition 1.10 A frequency distribution is a listing of the frequencies
of all categories of the observed values of a variable.
Definition 1.11 A relative frequency distribution consists of the
relative frequencies, or proportions (percentages), of observations
belonging to each category.
Definition A cumulative frequency distribution gives the frequency
of observed values less than or equal to the upper limit of each class
interval.
Definition A cumulative percent gives the relative frequency of
observed values less than or equal to the upper limit of each class
interval.
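As a rough illustration of these four definitions, the sketch below rebuilds the 69 bedroom values from the counts reported in this chapter's frequency table for `bed` and derives each distribution with the Python standard library. The variable names are made up for the example.

```python
from collections import Counter
from itertools import accumulate

# Counts per number of bedrooms, taken from the chapter's table for `bed`.
bed_counts = {1: 1, 2: 3, 3: 46, 4: 16, 5: 3}
beds = [b for b, k in bed_counts.items() for _ in range(k)]  # raw observations

freq = Counter(beds)
n = len(beds)
categories = sorted(freq)

frequencies = [freq[c] for c in categories]                 # frequency distribution
percents = [round(100 * f / n, 2) for f in frequencies]     # relative frequency (%)
cum_freq = list(accumulate(frequencies))                    # cumulative frequency
cum_pct = [round(100 * c / n, 2) for c in cum_freq]         # cumulative percent

for row in zip(categories, frequencies, percents, cum_freq, cum_pct):
    print(row)
```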
bed   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1     1           1.45      1                      1.45
2     3           4.35      4                      5.80
3     46          66.67     50                     72.46
4     16          23.19     66                     95.65
5     3           4.35      69                     100.00
Class interval   Frequency   Percent   Cumulative Frequency   Cumulative Percent
[  0,  50k)      4           5.80      4                      5.80
[ 50k, 100k)     22          31.88     26                     37.68
[100k, 150k)     23          33.33     49                     71.01
[150k, 200k)     10          14.49     59                     85.51
[200k, 250k)     2           2.90      61                     88.41
[250k, 300k)     1           1.45      62                     89.86
[300k, 350k)     4           5.80      66                     95.65
[350k, 400k)     3           4.35      69                     100.00
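For a continuous variable, values are first grouped into class intervals that are closed on the left and open on the right, such as [0, 50k) and [50k, 100k). A minimal sketch, assuming a fixed width of 50,000 and using only the five prices from the Variables Example (the helper `class_interval` is hypothetical):

```python
def class_interval(value, width=50_000):
    """Return the half-open interval [lower, upper) that contains `value`."""
    lower = (value // width) * width
    return (lower, lower + width)

prices = [30000, 46500, 51500, 56990, 65500]

counts = {}
for p in prices:
    interval = class_interval(p)
    counts[interval] = counts.get(interval, 0) + 1

for (lower, upper), k in sorted(counts.items()):
    print(f"[{lower}, {upper}): {k}")
```

Because the intervals are half-open, a value of exactly 50,000 would be counted in [50k, 100k) and not in [0, 50k).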
Distributions Histogram
Graphical Representation
Descriptive Statistics
[Figure with two panels: Exterior = Brick, Exterior = Frame]
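Mean:
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
Ordered observations:
$$y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$$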
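The measures of location built from the mean and the ordered observations can be sketched as follows. The data are the five `Size` values from the Variables Example; the variable names are chosen for this example only.

```python
import statistics

sizes = [951, 676, 1186, 1368, 1176]

ordered = sorted(sizes)                       # y_(1) <= y_(2) <= ... <= y_(n)
smallest, largest = ordered[0], ordered[-1]   # y_(1) and y_(n)

mean = sum(sizes) / len(sizes)
median = statistics.median(sizes)             # middle ordered value when n is odd
value_range = largest - smallest
midrange = (smallest + largest) / 2

print(ordered, mean, median, value_range, midrange)
```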
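Distance based on absolute deviations:
$$\frac{1}{n} \sum_{i=1}^{n} |y_i - \bar{y}|$$
New distance, based on squared deviations:
$$\text{new distance} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Sum of squares:
$$SS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
Sample variance:
$$s^2 = \text{mean square} = \frac{SS}{df} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$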
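A short numerical check of these quantities: the sum of squares computed from deviations, the same quantity from the computational shortcut, and the mean square s^2 = SS/df with df = n - 1. The data are the five `Size` values from the Variables Example; this is a sketch, not output from the course software.

```python
import math
import statistics

y = [951, 676, 1186, 1368, 1176]
n = len(y)
y_bar = sum(y) / n

# Sum of squares from the definition: sum of squared deviations from the mean.
ss_definition = sum((yi - y_bar) ** 2 for yi in y)

# Computational shortcut: sum(y_i^2) - (sum(y_i))^2 / n.
ss_shortcut = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

df = n - 1
s2 = ss_definition / df          # sample variance (mean square)
s = math.sqrt(s2)                # sample standard deviation

# Average absolute deviation, the distance that squaring replaces.
mean_abs_dev = sum(abs(yi - y_bar) for yi in y) / n

print(ss_definition, ss_shortcut, s2, s, mean_abs_dev)
```

The two sum-of-squares values agree up to floating-point rounding, and `s2` matches `statistics.variance(y)`.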
Change of Scale
Linear transformation (from the change of unit)
Non-Linear transformation (squared transformation, log transformation)
What will be changed?
Mean
Variance and Standard Deviation
CV
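One way to explore the question is numerically. For a linear transformation a + b*y, the mean becomes a + b times the old mean, the variance is multiplied by b^2, and the standard deviation by |b|; the CV (standard deviation divided by mean) is unchanged by a pure rescaling with b > 0 but changes when a constant is added. The sketch below uses the `Size` values from the Variables Example with hypothetical constants a = 100 and b = 0.5, and also shows that a log transformation does not commute with the mean.

```python
import math
import statistics

def summarize(values):
    """Return mean, variance, standard deviation, and CV of a sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"mean": mean, "variance": sd ** 2, "sd": sd, "cv": sd / mean}

x = [951.0, 676.0, 1186.0, 1368.0, 1176.0]
a, b = 100.0, 0.5   # hypothetical shift and scale, chosen only for illustration

original = summarize(x)
rescaled = summarize([b * xi for xi in x])   # change of unit: y = b * x
shifted = summarize([a + xi for xi in x])    # change of origin: y = a + x

# Non-linear transformation: the mean of the logs is not the log of the mean.
mean_of_logs = statistics.mean([math.log(xi) for xi in x])
log_of_mean = math.log(original["mean"])

print(original, rescaled, shifted, mean_of_logs, log_of_mean)
```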
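Exterior by zip(zip)

Exter   Statistic   zip 1   zip 2   zip 3   zip 4   Total
brick   Frequency   4       10      4       30      48
        Percent     5.80    14.49   5.80    43.48   69.57
        Row Pct     8.33    20.83   8.33    62.50
        Col Pct     66.67   76.92   25.00   88.24
frame   Frequency   1       1       5       1       8
        Percent     1.45    1.45    7.25    1.45    11.59
        Row Pct     12.50   12.50   62.50   12.50
        Col Pct     16.67   7.69    31.25   2.94
other   Frequency   1       2       7       3       13
        Percent     1.45    2.90    10.14   4.35    18.84
        Row Pct     7.69    15.38   53.85   23.08
        Col Pct     16.67   15.38   43.75   8.82
Total   Frequency   6       13      16      34      69
        Percent     8.70    18.84   23.19   49.28   100.00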
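Each cell of a contingency table like this one carries four numbers: the frequency, the percent of the grand total, the percent of its row total, and the percent of its column total. The sketch below recomputes them in Python. The cell counts are an assumption: they are the counts implied by the reported percentages with n = 69, not values copied verbatim from the slide, and `cell_stats` is a hypothetical helper.

```python
# Rows are exterior types, columns are zip areas 1 to 4.
# Counts are inferred from the reported percentages (assumption, n = 69).
counts = {
    "brick": [4, 10, 4, 30],
    "frame": [1, 1, 5, 1],
    "other": [1, 2, 7, 3],
}

n = sum(sum(row) for row in counts.values())
row_totals = {name: sum(row) for name, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

def cell_stats(row_name, col_index):
    """Return (frequency, percent, row percent, column percent) for one cell."""
    f = counts[row_name][col_index]
    return (
        f,
        round(100 * f / n, 2),
        round(100 * f / row_totals[row_name], 2),
        round(100 * f / col_totals[col_index], 2),
    )

print(cell_stats("brick", 0))
print(cell_stats("other", 2))
```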
Data Collection
Goal: make statements about the population based on samples
Random sampling, or some more advanced probability sampling, is the
appropriate way to collect data. In this book, we assume all samples
come from simple random sampling
Definition Simple random sampling is a sampling scheme in which
each possible sample of the specified size has an equal chance of
being selected
Random sampling can be difficult to implement in practice
Convenience samples are dangerous (be careful)
Sample size and power calculation
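A minimal sketch of simple random sampling with the Python standard library. The population of 69 unit IDs, the sample size of 10, and the seed are all assumptions made for the example. The second part enumerates every possible sample of size 2 from a population of 4 to show what "each possible sample has an equal chance" means.

```python
import random
from itertools import combinations

# Hypothetical sampling frame: unit IDs 1..69.
population = list(range(1, 70))

rng = random.Random(2016)              # fixed seed so the draw is reproducible
sample = rng.sample(population, 10)    # simple random sample without replacement
print(sorted(sample))

# With 4 units and sample size 2 there are 6 possible samples;
# under simple random sampling each has probability 1/6.
small_population = ["A", "B", "C", "D"]
all_samples = list(combinations(small_population, 2))
probability_each = 1 / len(all_samples)
print(all_samples, probability_each)
```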
Chapter Summary
Variable: nominal, ordinal, interval, ratio, discrete,
continuous
Table: frequency table, contingency table
Numerical measurement: mean, variance, standard
deviation, largest, smallest, median, range, midrange,
percentile, quartile
Graphic: histogram, bar chart, pie chart, block chart,
scatter plot
Writing Report
Use appropriate tables and figures to summarize your
data set
Do not copy tables or output directly from SAS;
edit them first (e.g., add descriptions, appropriate row
and/or column names, and a sensible number of significant digits)
Produce appropriate figures (see previous slides)
Discrete variables: report frequency and percentage
Continuous variables: report mean and standard deviation
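The last two reporting rules could be sketched as small formatting helpers. Both function names are hypothetical, and the data are the `Exter` and `Price` columns of the five observations in the Variables Example.

```python
import statistics
from collections import Counter

def describe_discrete(values):
    """Report each level as 'level: frequency (percentage%)'."""
    n = len(values)
    return [
        f"{level}: {count} ({100 * count / n:.1f}%)"
        for level, count in sorted(Counter(values).items())
    ]

def describe_continuous(values, digits=1):
    """Report 'mean (standard deviation)' rounded to a chosen number of digits."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return f"{mean:.{digits}f} ({sd:.{digits}f})"

exter = ["Other", "Other", "Other", "Frame", "Frame"]
price = [30000, 46500, 51500, 56990, 65500]

print("Exterior:", describe_discrete(exter))
print("Price, mean (SD):", describe_continuous(price))
```

Controlling the number of digits in one place keeps the report consistent across tables.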