
Summary Table

 gives the number of observations, mean, standard deviation, minimum, and maximum value
 Syntax:
su variable
 for specific summary measures,
 Syntax:
tabstat variable, statistics(desired summary measures separated by spaces; the names must match Stata's statistic names exactly)
example: tabstat variable, statistics(mean range sd var median p25 p50 p75)
 or go to Statistics > Summaries, tables, and tests > Other tables > Compact table of summary statistics
 can summarize one or more variables

Frequency Distribution Table


 enter data in one column
 Syntax:
1. egen clvariable = cut(variable), at(LCB(width)UCB) label
2. tabulate clvariable

Histogram
 enter data in one column
 Syntax:
histogram variable, width(#) start(LCB) frequency ytitle(Title of y-axis) xtitle(Title of x-axis) xlabel(LCB1 LCB2 LCB3 LCB4)

Frequency Polygon
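 make separate columns for the midpoints and the frequency
 remember: add an empty class (frequency 0) before the first class and after the last class so the polygon returns to the x-axis
 Syntax:
twoway (connected frequency variable), xlabel(midpoints separated by spaces)
 "variable" is the column of midpoints (x-axis); frequency is on the y-axis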

Cumulative Frequency Polygon


 make separate columns for the midpoints and the cumulative frequency
 relative cumulative frequency polygon may also be generated this way
 Syntax:
twoway (connected cumufrequency variable), xlabel(midpoints separated by spaces)

Box and Whisker Plot


 make separate columns for the numerical variable (var1) and the categorical variable (var2)
 for the categorical variable, codes may be used (remember to assign values to the codes)
 Syntax:
graph box var1, over(var2)

Stem and Leaf Plot


 enter data in one column
 Syntax:
stem variable
 to set the number of lines per stem (lines(1) gives a one-line stemplot):
stem variable, lines(#)

Vertical Bar Graph


 make separate columns for each variable
 Syntax:
graph bar (asis) var1, over(var2) ytitle(Title of y-axis)
 var1 is on the y-axis
 titles may be added manually (consumes more time)

Horizontal Bar Graph


 make separate columns for each variable
 Syntax:
graph hbar (asis) var1, over(var2) ytitle(Title of y-axis)
 var2 is on the vertical (category) axis; ytitle() titles the numerical axis, which graph hbar draws horizontally
 for variables with more than one level:
graph hbar (asis) var1 var2 var3, over(var4) ytitle(Title of y-axis)
 var4 is on the vertical (category) axis

Component Bar Graph


 make a separate column for each variable and for each level
 generate the total for all levels
 Syntax:
1. gen total = var1+var2+var3
2. graph hbar (asis) var1 var2 var3, over(var4, sort(total) descending) stack blabel(bar, position(center))
 var4 is on the vertical (category) axis
 "blabel(bar, position(center))" labels each bar with its value and places the label at the center of the bar
 "stack" is essential: without it the levels are drawn as separate side-by-side bars instead of one stacked bar per category
 "sort(total) descending" orders the bars by total, largest first; without "descending" the bars are ordered from smallest to largest
 for a vertical graph, use "graph bar" instead (the two axes are swapped)

Pie Chart
 make separate columns for the variable and the percentage/frequency
 Syntax:
graph pie percent, over(variable) sort
 "sort" arranges the levels in ascending order
 "sort descending" arranges the levels in descending order
 sorting includes "others" in the arrangement; for convenience, enter the levels in descending order and place "others" in the last row. The "sort" option may then be omitted
 labels may be inserted manually or by adding the option:
plabel(_all percent, size(*#) color(color))

Line Graph
 make separate columns for each variable, including the time variable
 Syntax:
twoway (line var1 time), ytitle(Title of y-axis) ylabel(firstvalue(interval)lastvalue) xtitle(time) xlabel(firstvalue(interval)lastvalue)
 for comparison of several levels:
twoway (line var1 time) (line var2 time) (line var3 time), ytitle(Title of y-axis) ylabel(firstvalue(interval)lastvalue) xtitle(time) xlabel(firstvalue(interval)lastvalue)

Scatterplot
 make separate columns for each variable
 Syntax:
twoway (scatter var1 var2), ytitle(Title of y-axis) ylabel(firstvalue(interval)lastvalue) xtitle(Title of x-axis) xlabel(firstvalue(interval)lastvalue)
 var1 is on the y-axis
 var2 is on the x-axis

Sturges' Rule

$K = 1 + 3.32 \log_{10}(n)$, where K is the number of class intervals
$\text{width} = \text{range} / K$

Outliers

$x > Q_3 + 1.5(Q_3 - Q_1)$
$x < Q_1 - 1.5(Q_3 - Q_1)$

Extremely Outlying Values

$x > Q_3 + 3.0(Q_3 - Q_1)$
$x < Q_1 - 3.0(Q_3 - Q_1)$
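
As a worked illustration of Sturges' rule and the outlier fences, here is a minimal Python sketch. The function names and the sample numbers (100 observations between 12 and 82, Q1 = 20, Q3 = 40) are hypothetical, and K is left unrounded because the notes do not say how to round it.

```python
import math

def sturges_classes(n):
    """Number of class intervals: K = 1 + 3.32 * log10(n) (not rounded)."""
    return 1 + 3.32 * math.log10(n)

def class_width(data_range, k):
    """Class width = range / K."""
    return data_range / k

def fences(q1, q3, factor=1.5):
    """Return (lower, upper) cutoffs: Q1 - factor*IQR and Q3 + factor*IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

# Hypothetical data summary: n = 100, minimum 12, maximum 82
k = sturges_classes(100)            # 1 + 3.32 * 2 = 7.64
width = class_width(82 - 12, k)     # 70 / 7.64

# Hypothetical quartiles: Q1 = 20, Q3 = 40 (IQR = 20)
lower, upper = fences(20, 40)                          # outliers lie outside (-10, 70)
outer_lower, outer_upper = fences(20, 40, factor=3.0)  # extreme values lie outside (-40, 100)
```

In practice K is usually rounded to a whole number before the width is computed; which direction to round is a judgment call.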
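
Measures of Central Tendency

Mean
ungrouped:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
grouped:
$\bar{x} = \frac{1}{n} \sum_{i} f_i x_i$
$f_i$ is the frequency of the ith category
$x_i$ is the midpoint of the ith category
n is the total number of observations

Median
ungrouped, n odd:
the $\left(\frac{n+1}{2}\right)$th observation
ungrouped, n even:
$\frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2} + 1\right)\text{th observation}}{2}$
grouped:
find the median class: the first class with $cf \ge \frac{n}{2}$
$Md = LCB_{Md} + \left[\frac{\frac{n}{2} - cf_{Md-1}}{f_{Md}}\right] c$
$LCB_{Md}$ is the LCB of the median class
$cf_{Md-1}$ is the cf of the class before the median class
$f_{Md}$ is the frequency of the median class
c is the class width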
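
The grouped mean and grouped median formulas can be checked with a small Python sketch. The frequency table below (four classes of width 10 with lower class boundaries 9.5 to 39.5) and the function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * x_i) / n, with x_i the class midpoints."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

def grouped_median(lcbs, freqs, width):
    """Median of grouped data: LCB + ((n/2 - cf_before) / f) * c for the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lcbs, freqs):
        if cf_before + f >= n / 2:  # first class whose cf reaches n/2
            return lcb + ((n / 2 - cf_before) / f) * width
        cf_before += f

# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49
lcbs = [9.5, 19.5, 29.5, 39.5]
midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [5, 8, 12, 5]  # n = 30

mean = grouped_mean(midpoints, freqs)        # 905 / 30, about 30.17
median = grouped_median(lcbs, freqs, 10)     # 29.5 + (15 - 13) / 12 * 10, about 31.17
```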

Mode
ungrouped:
most occurring value
grouped:
find the modal class: the class with the highest frequency
the mode is the midpoint of the modal class
or
$Mode = L_{Mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$
$\Delta_1$ is the difference between the frequencies of the modal class and the class before the modal class
$\Delta_2$ is the difference between the frequencies of the modal class and the class after the modal class
$L_{Mo}$ is the lower boundary of the modal class

Interpretations
Mean:
a. On the average, ______________ is __________.
b. The average ____________ is ____________.
Median:
Half (or 50%) of the observations are below or equal to ______ while half (or 50%) of the observations are equal to or above _______.
Mode:
a. (modal class) is the interval having the highest frequency.
b. (midpoint) is the most frequently occurring value.
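
The interpolated formula for the grouped mode can be sketched in Python as follows. The function name and the frequency table are hypothetical. Two choices in the code are assumptions that the notes do not cover: a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(lcbs, freqs, width):
    """Mode of grouped data: L_Mo + (d1 / (d1 + d2)) * c."""
    # Index of the modal class (first class with the highest frequency; assumption)
    m = max(range(len(freqs)), key=lambda j: freqs[j])
    # Neighbouring frequencies; a missing neighbour counts as 0 (assumption)
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_prev  # modal class vs. the class before it
    d2 = freqs[m] - f_next  # modal class vs. the class after it
    if d1 + d2 == 0:
        # Flat table: fall back to the midpoint of the modal class
        return lcbs[m] + width / 2
    return lcbs[m] + (d1 / (d1 + d2)) * width

# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49
lcbs = [9.5, 19.5, 29.5, 39.5]
freqs = [5, 8, 12, 5]

mode = grouped_mode(lcbs, freqs, 10)  # 29.5 + 4 / (4 + 7) * 10, about 33.14
```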

Measures of Spread

[Absolute] Range
$Range = HCB - LCB$
$Range = \text{highest observation} - \text{lowest observation}$
Interpretation: The difference between the highest & lowest values is ______.

[Absolute] Interquartile Range
Difference between P25 and P75 (or Q1 and Q3)
Interpretation: Half of the observations fall within ____ to ____.

[Absolute] Variance
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
or
$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$
Interpretation: The ______ varies by _______.

[Absolute] Standard Deviation
ungrouped:
$s = \sqrt{\frac{n \sum_i x_i^2 - (\sum_i x_i)^2}{n(n - 1)}}$
grouped:
$s = \sqrt{\frac{n \sum_i f_i x_i^2 - (\sum_i f_i x_i)^2}{n(n - 1)}}$
Interpretation: on the average, the distance of each observation from the mean is ________.
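
The variance and standard deviation formulas above are algebraically equivalent, which a short Python sketch can confirm. The function names and the eight sample values are hypothetical; the grouped version is fed the same data as distinct values with their frequencies.

```python
import math

def variance_definitional(xs):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_computational(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def sd_ungrouped(xs):
    """s = sqrt((n * sum(x^2) - (sum(x))^2) / (n * (n - 1)))."""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))

def sd_grouped(midpoints, freqs):
    """s = sqrt((n * sum(f * x^2) - (sum(f * x))^2) / (n * (n - 1)))."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Hypothetical sample: n = 8, sum = 40, sum of squares = 232
data = [2, 4, 4, 4, 5, 5, 7, 9]
var_a = variance_definitional(data)    # 32 / 7, about 4.57
var_b = variance_computational(data)   # same value
sd_a = sd_ungrouped(data)              # sqrt(32 / 7), about 2.14
sd_b = sd_grouped([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])  # same value
```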

[Relative] Coefficient of Variation
$CV = \frac{SD}{mean}$
$RSD = \frac{SD}{mean} \times 100$

Interpretation:
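
Measures of Location

Quartile (Qi)
i = 1, 2, 3
quartile class: $cf \ge \frac{n \times i}{4}$
ungrouped, n even:
$Q_i = \frac{\left(\frac{n}{4} \times i\right)\text{th} + \left[\left(\frac{n}{4} \times i\right) + 1\right]\text{th}}{2}$
ungrouped, n odd:
$Q_i = \frac{(n + 1)i}{4}\text{th}$
grouped:
$Q_i = LCB_{Q_i} + \frac{\left(\frac{n}{4} \times i\right) - {<}CF_{Q_i - 1}}{f_{Q_i}} \times c$
$LCB_{Q_i}$ is the LCB of the Qith class
c is the class width
${<}CF_{Q_i - 1}$ is the cf of the class before the Qith class
$f_{Q_i}$ is the frequency of the Qith class

Decile (Di)
i = 1, 2, 3, ..., 9
decile class: $cf \ge \frac{n \times i}{10}$
ungrouped, n even:
$D_i = \frac{\left(\frac{n}{10} \times i\right)\text{th} + \left[\left(\frac{n}{10} \times i\right) + 1\right]\text{th}}{2}$
ungrouped, n odd:
$D_i = \frac{(n + 1)i}{10}\text{th}$
grouped:
$D_i = LCB_{D_i} + \frac{\left(\frac{n}{10} \times i\right) - {<}CF_{D_i - 1}}{f_{D_i}} \times c$
the symbols are defined as for the quartile, with the Dith class in place of the Qith class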
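
Because the grouped formulas for quartiles, deciles, and percentiles differ only in the divisor (4, 10, or 100), one function can cover all three. The sketch below is a minimal Python illustration under that observation; the function name, the `parts` parameter, and the frequency table are hypothetical.

```python
def grouped_quantile(lcbs, freqs, width, i, parts):
    """Grouped quantile: LCB + ((n * i / parts - cf_before) / f) * c.

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(freqs)
    target = n * i / parts
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lcbs, freqs):
        # Quantile class: first non-empty class whose cf reaches n * i / parts
        if f > 0 and cf_before + f >= target:
            return lcb + ((target - cf_before) / f) * width
        cf_before += f
    raise ValueError("frequency table is empty")

# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49 with n = 30
lcbs = [9.5, 19.5, 29.5, 39.5]
freqs = [5, 8, 12, 5]

q1 = grouped_quantile(lcbs, freqs, 10, 1, 4)      # 19.5 + (7.5 - 5) / 8 * 10 = 22.625
q3 = grouped_quantile(lcbs, freqs, 10, 3, 4)      # 29.5 + (22.5 - 13) / 12 * 10, about 37.42
d5 = grouped_quantile(lcbs, freqs, 10, 5, 10)     # same as the median, about 31.17
p25 = grouped_quantile(lcbs, freqs, 10, 25, 100)  # equals q1
```

With this table, Q1 and P25 coincide, as do D5 and the grouped median.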

Percentile (Pi)
i = 1, 2, 3, ..., 99
percentile class: $cf \ge \frac{n \times i}{100}$
ungrouped, n even:
$P_i = \frac{\left(\frac{n}{100} \times i\right)\text{th} + \left[\left(\frac{n}{100} \times i\right) + 1\right]\text{th}}{2}$
ungrouped, n odd:
$P_i = \frac{(n + 1)i}{100}\text{th}$
grouped:
$P_i = LCB_{P_i} + \frac{\left(\frac{n}{100} \times i\right) - {<}CF_{P_i - 1}}{f_{P_i}} \times c$

Interpretations:
____% of the observations are below or equal to ____ and ____% of the observations are above _____.

Percentile  Decile  Quartile
P90         D9
P80         D8
                    Q3 = P75
P70         D7
P60         D6
P50         D5      Q2 = P50
P40         D4
P30         D3
                    Q1 = P25
P20         D2
P10         D1
