
1.

Overview of Statistics & Collection of Data


1.1 Introduction to statistics - Definition, types, basic terms, level of data measurement.
1.2 Methods of Collection of Data - Census & Sampling Methods
Shaya'a Othman Definition of Statistics
"Statistics is a scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions."
Definition by Shaya'a Othman
OVERVIEW OF STATISTICS
Collecting & Publishing Numerical Data
Scientific Method of Collecting, Organizing, Presenting, Analyzing and Interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.
DESCRIPTIVE STATISTICS:
Methods of organizing and presenting data in an informative way.
INFERENTIAL STATISTICS:
Methods of determining something about a population based on a sample.
DATA
Qualitative or attribute (type of car owned)
Quantitative or numerical:
    discrete (number of children)
    continuous (time taken for an exam)
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
[Diagram labels: Inferential, Descriptive, Science, common]
ETHICS
Misleading Data
Use of Average
Use of Graphic
Use of Association
Computer Application:
Microsoft Excel
SPSS, NVivo (CAQDAS)
COMPUTER
STATISTICS
Collection of Data
Primary Data
Secondary Data
Census [Total Count]
Sample [Selected Count]
SAMPLING TECHNIQUES:
Systematic sampling
Stratified sampling
Multi-stage sampling
Cluster sampling
Quota sampling
METHODS OF COLLECTING
Interviews, Forms - Direct/Phone
Mailing Questionnaires
Computer - eMail, eFax, etc.
Mobile Phone - SMS
MALAYSIAN GOVERNMENT PUBLICATION:
Statistics Dept., PM Dept.
Econ. Planning Unit, PM Dept.
Research Institution - RRI, PORIM, MARDI
Private Survey/Research Co.
INTERNATIONAL ORGANIZATION:
United Nations
OIC, ASEAN
World Bank, Islamic Dev. Bank
Government Publications
Private Publication/Data
Total Count of Population
Selected Count of Population
Internet, Website, CIA Data
METHODS
Internet
COLLECTING DATA
COLLECTION OF DATA
ANALYSIS OF DATA
TEST OF HYPOTHESIS
RESEARCH METHODOLOGY
WHAT IS A HYPOTHESIS?
STEPS     ACTIONS AND DESCRIPTIONS
STEP 1    State Null and Alternative Hypothesis
          Null Hypothesis: \(H_0: \mu = 0\)
          Alternative Hypothesis: \(H_1: \mu \neq 0\)
          Note:
          1. Two-tailed test if the alternative hypothesis does not state a direction [greater or less].
          2. One-tailed test if the alternative hypothesis states a direction.
STEP 2    Select Level of Significance
          1. .05 level [5% level] for consumer research
          2. .01 level [1% level] for quality assurance
          3. .10 level [10% level] for political polling
STEP 3    Identify the Test Statistic
          Parametric test: z and t as test statistics, and others
          Other test statistics: F and \(\chi^2\) (chi-square), the latter for non-parametric tests
STEP 4    Formulate Decision Rule
          Find the critical value of z from the normal distribution table, or the value of t from the t distribution table, where appropriate.
STEP 5    Take a Sample, Arrive at Decision
          Only ONE DECISION is possible in hypothesis testing: either do not reject the null hypothesis, or reject the null hypothesis and accept the alternative hypothesis.
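The five steps above can be sketched in code. This is only an illustrative sketch, assuming a two-tailed one-sample z test with a known population standard deviation; the function name `one_sample_z_test` and the sample figures are hypothetical and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test laid out along the five steps."""
    # Step 1: H0: mu = mu0 versus H1: mu != mu0 (no direction, so two-tailed).
    # Step 2: alpha is the chosen level of significance.
    # Step 3: the test statistic is z.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: decision rule, reject H0 when |z| exceeds the critical value.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: arrive at the single possible decision.
    decision = "reject H0" if abs(z) > critical else "do not reject H0"
    return z, critical, decision


# Hypothetical figures chosen only for illustration.
z, critical, decision = one_sample_z_test(sample_mean=203.5, mu0=200, sigma=16, n=50)
print(round(z, 2), round(critical, 2), decision)
```

The critical value is taken from `statistics.NormalDist` instead of a printed table; at the .05 level it is about 1.96 for a two-tailed test.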
FIVE-STEP PROCEDURE FOR TESTING A HYPOTHESIS
Two-tailed test if the alternative hypothesis does not state a direction (greater or less).
One-tailed test if the alternative hypothesis states a direction.
Possibility of two types of errors: Type I and Type II.
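One-Sample Tests of Hypothesis: Large sample [n more than 30], Small sample [n less than 30]
Two-Samples Tests of Hypothesis: Large sample [n more than 30], Small sample [n less than 30]
Two-Tail Test [No direction]
One sample, large sample (using normal distribution table):
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
One sample, small sample (using t distribution table):
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad df = n - 1 \]
Two samples, large sample:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
Two samples, small sample (using t distribution table):
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad df = n_1 + n_2 - 2 \]
One-Tail Test [With direction: Greater or less than]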
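A minimal sketch of two of the test statistics listed above, assuming summary figures (means, standard deviations, sample sizes) are already available. The function names and all numbers are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def one_sample_t(sample_mean, mu, s, n):
    """t statistic for one small sample; degrees of freedom are n - 1."""
    t = (sample_mean - mu) / (s / sqrt(n))
    return t, n - 1


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic for two large independent samples."""
    return (mean1 - mean2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical summary figures, not taken from the notes.
t, df = one_sample_t(sample_mean=52, mu=50, s=4, n=16)
z = two_sample_z(mean1=105, mean2=100, sigma1=12, sigma2=16, n1=36, n2=64)
print(t, df, round(z, 2))
```

The computed statistic is then compared with the critical value from the normal or t table, as in Step 4.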


[Figure: one-tail test on the normal curve. Horizontal axis marked at 0 and at the critical value 1.65. "Do not reject" region, probability = .95; region of rejection, probability = .05.]
STATISTICAL TEST OF HYPOTHESIS
Hypothesis: "A supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation." (Oxford Dictionary)
Hypothesis: "A statement or conjecture which is neither true nor false, subjected to be verified." (Shaya'a Othman, KUIS)
Hypothesis: "A statement about a population parameter developed for the purpose of testing." (Douglas A. Lind, Statistical Techniques in Business & Economics)
Hypothesis Testing: "A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement." (Douglas A. Lind, Statistical Techniques in Business & Economics)
Null Hypothesis: "A statement about the value of a population parameter." (Douglas A. Lind, Statistical Techniques in Business & Economics)
Alternative Hypothesis: "A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false." (Douglas A. Lind, Statistical Techniques in Business & Economics)
Describing Data: Measures of Location
Population Mean = Sum of all the values in the population / Number of values in the population
Sample Mean = Sum of values in the sample / Number of values in the sample: \( \bar{x} = \sum x / n \)
Weighted Mean: \( \bar{x}_w = \sum wx / \sum w \)
Parameter: A characteristic of a population.
Median: The midpoint of the values after they have been ordered from the smallest to the highest.
Mode: The value of the observation that appears most frequently.
Describing Data: Measures of Dispersion
Range = Largest value - Smallest value
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean:
\[ MD = \frac{\sum |x - \bar{x}|}{n} \]
where \(\sum\) (sigma) is the sum, \(x\) is the value of each observation, \(\bar{x}\) is the arithmetic mean of the values, \(n\) is the number of observations, and \(|\;|\) indicates absolute values.
Variance: The arithmetic mean of the squared deviations from the mean.
Standard Deviation: The square root of the variance.
Location of Percentiles: \( L_p = (n + 1)\frac{P}{100} \)
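The percentile location formula can be tried out with a short sketch. The helper names are hypothetical, and the linear interpolation between neighbouring ranks for fractional positions is an assumption that the notes do not spell out. The ages are the seven student ages used in the median example of these notes.

```python
def percentile_location(n, p):
    """Position L_p = (n + 1) * P / 100 in the ordered data (1-based)."""
    return (n + 1) * p / 100


def percentile(values, p):
    """Value at position L_p; interpolates linearly for fractional positions."""
    data = sorted(values)
    loc = percentile_location(len(data), p)
    lower = int(loc)      # whole part of the position
    frac = loc - lower    # fractional part of the position
    if lower < 1:
        return data[0]    # position falls before the first observation
    if lower >= len(data):
        return data[-1]   # position falls at or beyond the last observation
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


ages = [21, 25, 19, 20, 22, 18, 27]
print(percentile_location(len(ages), 50))  # position of the median
print(percentile(ages, 50))                # the 50th percentile is the median
```

With P = 50 the formula reduces to (n + 1)/2, the same position used to locate the median.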

Characteristics of the Mean
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.
The major characteristics of the mean are:
1. It is calculated by summing the values and dividing by the number of values.
2. It requires the interval scale.
3. All values are used.
4. It is unique.
5. The sum of the deviations from the mean is 0.
Population Mean
For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:
\[ \mu = \frac{\sum X}{N} \]
where
\(\mu\) is the population mean,
\(N\) is the total number of observations,
\(X\) is a particular value,
\(\sum\) indicates the operation of adding.
Example 1
A Parameter is a measurable characteristic of a population.
AHMAD's family owns four cars. The following is the current mileage on each of the four cars: 56,000; 23,000; 42,000; 73,000.
Find the mean mileage for the cars.
\[ \mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500 \]
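As a quick check of Example 1, the short sketch below recomputes the population mean and also illustrates the property that the deviations from the mean sum to zero. The variable names are illustrative only.

```python
# Mileage of the four cars in Example 1 (the whole population).
mileage = [56_000, 23_000, 42_000, 73_000]

# Population mean: sum of all values divided by the number of values.
mu = sum(mileage) / len(mileage)

# Deviations of each value from the mean; they cancel out exactly here.
deviations = [x - mu for x in mileage]

print(mu)
print(sum(deviations))
```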
Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
\[ \bar{X} = \frac{\sum X}{n} \]
where \(n\) is the total number of values in the sample.
Example 2
A statistic is a measurable characteristic of a sample.
A sample of five executives received the following bonus last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0.
\[ \bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4 \]
Example 4
During a one-hour period on a hot Saturday afternoon in Langkawi, Ahmad sold fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
\[ \bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89 \]
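The weighted mean of Example 4 can be reproduced with a small sketch. The function name `weighted_mean` is illustrative, and the fourth price is taken as $1.15, the figure used in the worked computation, which totals $44.50 over fifty drinks.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.15]   # price per drink in dollars
drinks_sold = [5, 15, 15, 15]       # number sold at each price (the weights)

mean_price = weighted_mean(prices, drinks_sold)
print(round(mean_price, 2))
```

The weights are the numbers of drinks sold, so prices that sold more often count more heavily than the single low price.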
The Median
The Median is the midpoint of the values after they have been ordered from the smallest to the largest.
There are as many values above the median as below it in the data array.
The median is found at the (n+1)/2 ranked observation. For an even set of values, this position falls between the two middle numbers, and the median is their arithmetic average.
The Median (continued)
The ages for a sample of seven INSANIAH students visiting the Islamic Artifact Exhibition: 21, 25, 19, 20, 22, 18, 27.
Arranging the data in ascending order gives: 18, 19, 20, 21, 22, 25, 27.
Thus the median is 21.
Example 5
The heights of 3 INSANIAH lecturers, in inches, are: 76, 73, 80.
Arranging the data in ascending order gives: 73, 76, 80.
The median is found at the (n+1)/2 = (3+1)/2 = 2nd data point.
Thus the median is 76.
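The two median examples can be checked with Python's standard `statistics.median`. The helper `median_position` and the four-value list are illustrative additions; the four-value list is hypothetical and only shows the even-count case.

```python
from statistics import median


def median_position(n):
    """Rank of the median in the ordered data: (n + 1) / 2."""
    return (n + 1) / 2


ages = [21, 25, 19, 20, 22, 18, 27]   # student ages from the notes
heights = [76, 73, 80]                # lecturer heights from Example 5
even_case = [76, 73, 80, 75]          # hypothetical data with an even count

print(median(ages), median(heights))
print(median_position(len(heights)))  # the 2nd ordered value
print(median(even_case))              # average of the two middle values
```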
The Mode: Example 6
The Mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and the like.
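A short sketch of Example 6 using `statistics.multimode` (available from Python 3.8), which returns every most frequent value and so also covers the bimodal case. The second list is hypothetical and only illustrates two modes.

```python
from statistics import multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]  # exam scores from Example 6
bimodal = [2, 2, 5, 5, 7]                          # hypothetical data with two modes

print(multimode(scores))   # a single mode
print(multimode(bimodal))  # two modes
```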
The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: A distribution having the same shape on either side of the center.
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.
Symmetric Distribution
Zero skewness: Mean = Median = Mode.
Right Skewed Distribution
Positively skewed: Mean and median are to the right of the mode. Mean > Median > Mode.
Left Skewed Distribution
Negatively skewed: Mean and median are to the left of the mode. Mean < Median < Mode.
Geometric Mean
The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)} \]
The geometric mean is used to average percents, indexes, and relatives.
Example 7
The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5+21+4)/3 = 10.0.
The geometric mean is
\[ GM = \sqrt[3]{(5)(21)(4)} = 7.49 \]
The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
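Example 7 can be verified with a few lines. This is a sketch using `math.prod` (Python 3.8 or later); the function name is illustrative.

```python
from math import prod


def geometric_mean(values):
    """nth root of the product of the n values."""
    return prod(values) ** (1 / len(values))


rates = [5, 21, 4]  # bond interest rates in percent, from Example 7

gm = geometric_mean(rates)
am = sum(rates) / len(rates)
print(round(gm, 2), am)
```

The geometric mean comes out below the arithmetic mean because the single large rate of 21 percent pulls the arithmetic mean up more strongly.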
Geometric Mean (continued)
Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.
\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \]
[Figure: Growth in Sales 1999-2004. Horizontal axis: Year. Vertical axis: Sales in Millions ($).]
Example 8
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
\[ GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = 0.0127 \]
That is, the geometric mean rate of increase is 1.27 percent.
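The growth-rate use of the geometric mean in Example 8 can be sketched as follows. The function name `average_growth_rate` is illustrative; the number of periods is the eight years from 1992 to 2000.

```python
def average_growth_rate(start, end, periods):
    """nth root of (end value / start value), minus 1."""
    return (end / start) ** (1 / periods) - 1


# Female college enrolment, 1992 to 2000 (Example 8).
rate = average_growth_rate(start=755_000, end=835_000, periods=2000 - 1992)
print(round(rate, 4))         # as a proportion
print(round(rate * 100, 2))   # as a percentage
```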
Describing Data: Measures of Dispersion
Measures of Dispersion
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
Range = Largest value - Smallest value
Example 9
The following represents the current year's Return on Equity of the 25 companies in an investor's portfolio.
-8.1    3.2    5.9    8.1   12.3
-5.1    4.1    6.3    9.2   13.3
-3.1    4.6    7.9    9.5   14.0
-1.4    4.8    7.9    9.7   15.0
 1.2    5.7    8.0   10.3   22.1
Highest value: 22.1
Lowest value: -8.1
Range = Highest value - Lowest value = 22.1 - (-8.1) = 30.2
Mean Deviation
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
\[ MD = \frac{\sum |X - \bar{X}|}{n} \]
The main features of the mean deviation are:
1. All values are used in the calculation.
2. It is not unduly influenced by large or small values.
3. The absolute values are difficult to manipulate.
Example 10
The weights of a sample of crates containing books for the INSANIAH Library (in pounds) are: 103, 97, 101, 106, 103.
Find the mean deviation.
\(\bar{X} = 102\)
The mean deviation is:
\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4 \]
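A minimal sketch of the mean deviation for the crate weights in Example 10. The function name is illustrative.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


crate_weights = [103, 97, 101, 106, 103]  # pounds, from Example 10

md = mean_deviation(crate_weights)
print(md)
```

The absolute deviations are 1, 5, 1, 4 and 1, so their mean is 12/5.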
Variance and Standard Deviation
Variance: The arithmetic mean of the squared deviations from the mean.
Standard deviation: The square root of the variance.
Population Variance
The major characteristics of the Population Variance are:
1. All values are used in the calculation.
2. It is less dependent on the two extreme values than the range is.
3. The units are awkward: the square of the original units.
Variance and Standard Deviation
Population Variance formula:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
\(X\) is the value of an observation in the population
\(\mu\) is the arithmetic mean of the population
\(N\) is the number of observations in the population
Population Standard Deviation formula:
\[ \sigma = \sqrt{\sigma^2} \]
Example 9 (continued)
In Example 9, the variance and standard deviation are:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227 \]
\[ \sigma = \sqrt{42.227} = 6.498 \]
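The figures for Example 9 can be reproduced with the standard `statistics` module, whose `pvariance` and `pstdev` divide by N as in the population formulas. The variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance

# Return on Equity of the 25 companies in Example 9.
roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

value_range = max(roe) - min(roe)   # highest value minus lowest value
mu = mean(roe)                      # population mean
variance = pvariance(roe)           # divides by N, not N - 1
std_dev = pstdev(roe)               # square root of the population variance

print(round(value_range, 1), round(mu, 2), round(variance, 3), round(std_dev, 3))
```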
Sample Variance and Standard Deviation
Sample variance (\(s^2\)):
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
Sample standard deviation (\(s\)):
\[ s = \sqrt{s^2} \]
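Example 11
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
\[ \bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40 \]
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30 \]
\[ s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \]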
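A sketch of the sample variance and standard deviation for the wages in Example 11, written out by hand so that the n - 1 divisor is visible. The function name is illustrative.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


wages = [7, 5, 11, 8, 6]  # hourly wages in dollars, from Example 11

s2 = sample_variance(wages)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.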
[Figures: Histogram & Frequency Polygon; Cumulative Frequency Polygon. Class labels on the horizontal axis: "29 less 39", "39 less 49", "49 less 59", "59 less 69".]
Describing Data: Measures of Location for Grouped Data
MEAN
MEDIAN
MODE
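The Mean of Grouped Data
The Mean of a sample of data organized in a frequency distribution is computed by the following formula:
\[ \bar{X} = \frac{\sum fX}{n} \]
where \(X\) is the class midpoint, \(f\) is the class frequency, and \(n\) is the total number of observations.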
Example 12
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing    Frequency f    Class midpoint X    (f)(X)
1 up to 3              1                2               2
3 up to 5              2                4               8
5 up to 7              3                6              18
7 up to 9              1                8               8
9 up to 11             3               10              30
Total                 10                               66

\[ \bar{X} = \frac{\sum fX}{n} = \frac{66}{10} = 6.6 \]
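The grouped mean of Example 12 can be sketched as below. Representing each class as a `(lower, upper, frequency)` tuple is an assumption made for this illustration, as is the function name.

```python
# Each class: (lower limit, upper limit, frequency), from the Example 12 table.
classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]


def grouped_mean(classes):
    """Sum of frequency * class midpoint, divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


x_bar = grouped_mean(classes)
print(x_bar)
```

Each class midpoint stands in for every observation in that class, so the result is an estimate of the mean of the raw data.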
The Median of Grouped Data
The Median of a sample of data organized in a frequency distribution is computed by:
\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i) \]
where \(L\) is the lower limit of the median class, \(CF\) is the cumulative frequency preceding the median class, \(f\) is the frequency of the median class, and \(i\) is the median class interval.
Finding the Median Class
To determine the median class for grouped data:
1. Construct a cumulative frequency distribution.
2. Divide the total number of data values by 2.
3. Determine which class will contain this value. For example, if n = 50, 50/2 = 25, then determine which class will contain the 25th value.
Example 12 (continued)

Movies showing    Frequency    Cumulative Frequency
1 up to 3             1                 1
3 up to 5             2                 3
5 up to 7             3                 6
7 up to 9             1                 7
9 up to 11            3                10
Example 12 (continued)
From the table, L = 5, n = 10, f = 3, i = 2, CF = 3.
\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i) = 5 + \frac{\frac{10}{2} - 3}{3}\,(2) = 6.33 \]
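The search for the median class and the grouped median formula can be combined in one short sketch. The `(lower, upper, frequency)` tuples and the function name are assumptions made for this illustration.

```python
# Each class: (lower limit, upper limit, frequency), from the Example 12 table.
classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]


def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i, applied to the median class."""
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= n / 2:
            # This class contains the (n/2)th value, so it is the median class.
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f
    raise ValueError("no observations in the frequency distribution")


med = grouped_median(classes)
print(round(med, 2))
```

For the Example 12 data the loop stops at the class "5 up to 7", with CF = 3 and f = 3, which matches the values read from the cumulative frequency table.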
BUSINESS STATISTICS; LECTURE NOTE [Shaya'a Othman]
