Class13 15

Rural Research Methodology
PGP ABM II

Data Analysis
  Data Preparation
    Coding
    Codebook
    Metadata
    Deciding on the data format
    Data entry
    Data cleaning
    Handling missing data
  Data Analysis
    Type of analysis
      Univariate
        Descriptive
        Inferential
      Bivariate
        Descriptive
      Multivariate
        Descriptive
    Method of analysis
      To a large extent depends on the level of measurement
Univariate Descriptive Statistics
  Frequency distribution
  Measures of central tendency
    Mode
    Median
    Mean
  Measures of dispersion
    Range
    Average absolute deviation
    Variance & Standard deviation
  Percentiles & Quantiles
  Standardized Scores
Frequency distribution:
A frequency distribution (FD) shows how many observations occur in each response category of a variable. It is a table of the outcomes, or response categories, of a variable and the number of times each outcome is observed.
  Relative FD
  Percent
  Cumulative frequency
  Cumulative percent
Frequency distribution

Statistics: Interest in Movies
N   Valid      1504
    Missing      13

Interest in Movies
                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Great Interest         467      30.8            31.1                 31.1
          Some Interest          872      57.5            58.0                 89.0
          No Interest            165      10.9            11.0                100.0
          Total                 1504      99.1           100.0
Missing   NA                      13        .9
Total                           1517     100.0
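
As an illustration, the percent, valid percent, and cumulative percent columns can be recomputed from the raw frequencies. This is a minimal sketch in Python; the function name `frequency_table` is made up for this example and is not part of the original slides.

```python
# Counts taken from the "Interest in Movies" table; 13 responses are missing (NA).
counts = {"Great Interest": 467, "Some Interest": 872, "No Interest": 165}
missing = 13


def frequency_table(counts, missing):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent)."""
    valid_n = sum(counts.values())      # cases with a valid answer
    total_n = valid_n + missing         # all cases, including missing
    rows, running = [], 0
    for category, freq in counts.items():
        running += freq
        rows.append((
            category,
            freq,
            round(100 * freq / total_n, 1),     # percent of all cases
            round(100 * freq / valid_n, 1),     # percent of valid cases
            round(100 * running / valid_n, 1),  # cumulative valid percent
        ))
    return rows


table = frequency_table(counts, missing)
for row in table:
    print(row)
```

Percent uses all 1517 cases as the base, while valid percent and cumulative percent use only the 1504 valid cases.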
Grouped Distribution:
Grouped data are data that have been collapsed into a smaller number of categories. Constructing a frequency distribution for a continuous variable first requires grouping the data.
The process of grouping a continuous variable from many initial values into fewer categories is called recoding.
Grouped Distribution

Statistics: Highest Year of School Completed
N   Valid      1510
    Missing       7

Highest Year of School Completed
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0               2        .1              .1                   .1
          3               5        .3              .3                   .5
          4               5        .3              .3                   .8
          5               6        .4              .4                  1.2
          6              12        .8              .8                  2.0
          7              25       1.6             1.7                  3.6
          8              68       4.5             4.5                  8.1
          9              56       3.7             3.7                 11.9
          10             73       4.8             4.8                 16.7
          11             85       5.6             5.6                 22.3
          12            461      30.4            30.5                 52.8
          13            130       8.6             8.6                 61.5
          14            175      11.5            11.6                 73.0
          15             73       4.8             4.8                 77.9
          16            194      12.8            12.8                 90.7
          17             43       2.8             2.8                 93.6
          18             45       3.0             3.0                 96.6
          19             22       1.5             1.5                 98.0
          20             30       2.0             2.0                100.0
          Total        1510      99.5           100.0
Missing   NA              7        .5
Total                  1517     100.0
Grouped Distribution

Statistics: Highest Year of School Completed
N   Valid      1510
    Missing       7

Highest years of Schooling - Recoded
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 (0-5)                   18       1.2             1.2                  1.2
          2 (6-10)                 234      15.4            15.5                 16.7
          3 (11-15)                924      60.9            61.2                 77.9
          4 (16-20)                334      22.0            22.1                100.0
          Total                   1510      99.5           100.0
Missing   System Missing             7        .5
Total                             1517     100.0
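
The recoding step can be sketched in a few lines of Python. The frequencies are those of the ungrouped "Highest Year of School Completed" table, and the group boundaries are those of the recoded table; the function name `recode` is only an illustrative choice.

```python
# Frequencies by highest year of school completed (ungrouped table).
years_freq = {0: 2, 3: 5, 4: 5, 5: 6, 6: 12, 7: 25, 8: 68, 9: 56, 10: 73,
              11: 85, 12: 461, 13: 130, 14: 175, 15: 73, 16: 194,
              17: 43, 18: 45, 19: 22, 20: 30}


def recode(years):
    """Map years of schooling to the recoded group: 1 (0-5), 2 (6-10), 3 (11-15), 4 (16-20)."""
    if years <= 5:
        return 1
    if years <= 10:
        return 2
    if years <= 15:
        return 3
    return 4


# Collapse the many original values into the four recoded categories.
grouped = {}
for years, freq in years_freq.items():
    group = recode(years)
    grouped[group] = grouped.get(group, 0) + freq

print(grouped)
```

The grouped counts can then be fed into the same frequency-table calculation as any categorical variable.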
Mode
The category among the K categories in a distribution with the largest number of observations.
A distribution may be bimodal.
The mode is a central tendency statistic applicable to nominal, ordinal, & interval variables.
Median
The median is the outcome that divides an ordered distribution exactly into halves. Half the cases will have scores above the median value and half will have scores below the median.
For a grouped frequency distribution the median is the value of the category at which the cumulative percentage reaches 50%.
The median is a central tendency statistic applicable to ordinal & interval variables.
Mean:
The arithmetic average of a set of data, in which the values of all observations are added together and divided by the number of observations. Applicable to interval variables.
$$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N}$$
Mean of grouped frequency distribution:
$$\bar{Y} = \frac{\sum_{i=1}^{K} f_i Y_i}{N}$$
f_i = The frequency of cases with score Y_i
K = The no. of categories in the distribution
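
To make the three measures of central tendency concrete, the sketch below computes the mode, the median (as the first value whose cumulative percentage reaches 50%), and the grouped mean from the "Highest Year of School Completed" frequencies. It is an illustration in Python, not the procedure used to produce the SPSS tables.

```python
# Frequencies f_i by score Y_i (highest year of school completed).
years_freq = {0: 2, 3: 5, 4: 5, 5: 6, 6: 12, 7: 25, 8: 68, 9: 56, 10: 73,
              11: 85, 12: 461, 13: 130, 14: 175, 15: 73, 16: 194,
              17: 43, 18: 45, 19: 22, 20: 30}
n = sum(years_freq.values())  # N, number of valid cases

# Mode: the category with the largest number of observations.
mode = max(years_freq, key=years_freq.get)

# Median: the first ordered value at which the cumulative share reaches 50%.
running = 0
median = None
for value in sorted(years_freq):
    running += years_freq[value]
    if running / n >= 0.5:
        median = value
        break

# Grouped mean: sum of f_i * Y_i divided by N.
mean = sum(freq * value for value, freq in years_freq.items()) / n

print(mode, median, round(mean, 2))
```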
Range:
The difference between the largest and smallest scores in a distribution.
Average absolute deviation:
The mean of the absolute values of the differences between a set of continuous measures and their mean.
$$AAD = \frac{\sum_{i=1}^{N} |d_i|}{N}, \qquad d_i = Y_i - \bar{Y}$$
Variance and Standard Deviation:
Variance is the mean squared deviation of a continuous distribution.
$$S_Y^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N}$$
Standard deviation is the square root of the variance.
$$S_Y = \sqrt{S_Y^2}$$
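
The measures of dispersion can be checked by hand on a small data set. The eight scores below are hypothetical, chosen only so that the arithmetic comes out in round numbers. The variance divides by N, following the "mean squared deviation" definition; Python's `statistics.variance` would divide by N - 1 instead.

```python
from math import sqrt

# Hypothetical scores, not taken from the slides.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Range: largest score minus smallest score.
value_range = max(scores) - min(scores)

# Average absolute deviation: mean of |Y_i - mean|.
aad = sum(abs(y - mean) for y in scores) / n

# Variance: mean squared deviation (divides by N).
variance = sum((y - mean) ** 2 for y in scores) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(value_range, aad, variance, std_dev)
```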
Percentile:
The outcome or score below which a given percentage of the observations falls.
The median is the 50th percentile.
Quantiles:
A division of observations into groups with known proportions in each group.
Percentiles divide observations into 100 equal groups.
Quartiles divide the observations into 4 groups of equal size, quintiles into 5 equal groups, and deciles into 10 equal groups.
Deciles

Statistics: Highest Year of School Completed
N             Valid       1510
              Missing        7
Percentiles   10          9.00
              20         11.00
              25         12.00
              30         12.00
              40         12.00
              50         12.00
              60         13.00
              70         14.00
              75         15.00
              80         16.00
              90         16.00
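
One simple way to obtain such percentiles from a frequency distribution is to take the first value whose cumulative percentage reaches the requested level. The sketch below uses that rule on the schooling frequencies; statistical packages offer several interpolation rules, so this is an assumption about the method, not a description of what SPSS does internally.

```python
# Frequencies by highest year of school completed.
years_freq = {0: 2, 3: 5, 4: 5, 5: 6, 6: 12, 7: 25, 8: 68, 9: 56, 10: 73,
              11: 85, 12: 461, 13: 130, 14: 175, 15: 73, 16: 194,
              17: 43, 18: 45, 19: 22, 20: 30}


def percentile(freq, p):
    """Return the first ordered value whose cumulative percent is at least p."""
    n = sum(freq.values())
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if 100 * running / n >= p:
            return value
    return max(freq)


levels = [10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90]
percentiles = {p: percentile(years_freq, p) for p in levels}
print(percentiles)
```

The 25th, 50th, and 75th entries are the quartiles; the entries at multiples of 10 are the deciles.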
Standardized scores (Z scores)
A transformation of the scores of a continuous frequency distribution by subtracting the mean from each outcome and dividing by the standard deviation.
Useful for comparing scores across distributions.
The mean of Z scores equals zero, and their variance and standard deviation equal 1.
$$Z_i = \frac{Y_i - \bar{Y}}{S_Y}$$
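
A short check of these properties, using hypothetical scores and the standard deviation computed with N in the denominator:

```python
import statistics

# Hypothetical scores, not taken from the slides.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(scores)
std_dev = statistics.pstdev(scores)  # population form: divides by N

# Z score: distance from the mean in standard deviation units.
z_scores = [(y - mean) / std_dev for y in scores]

z_mean = statistics.fmean(z_scores)
z_variance = statistics.pvariance(z_scores)

print(z_scores)
print(z_mean, z_variance)
```

Because every distribution is rescaled to mean 0 and standard deviation 1, a Z score of 1.0 means "one standard deviation above the mean" regardless of the original units.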
INFERENTIAL ANALYSIS
Inferential Analysis: From Sample to Population
Researchers usually have no interest in studying the characteristics of a sample per se.
The major objective is to draw inferences about the population from which the sample was drawn.
The sample statistic is used to estimate the population parameter.
Population and Sample descriptions

Mean
  Sample statistic:       $$\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N}$$
  Population parameter:   $$\mu_Y = E(Y) = \sum_{i=1}^{k} Y_i \, p(Y_i)$$
Variance
  Sample statistic:       $$S_Y^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}$$
  Population parameter:   $$\sigma_Y^2 = E(Y - \mu_Y)^2 = \sum_{i=1}^{k} (Y_i - \mu_Y)^2 \, p(Y_i)$$
Standard Deviation
  Sample statistic:       $$S_Y = \sqrt{S_Y^2}$$
  Population parameter:   $$\sigma_Y = \sqrt{\sigma_Y^2}$$
Inferential Statistics
  Significance tests
  Making interval estimates
Let us first clarify some basic concepts.
Basic Probability Concepts
Probability distribution: a set of outcomes, each of which has an associated probability of occurrence.
In a deck of cards:
  The probability of randomly drawing a card from the heart suit is 13/52 (0.25).
  The probability of randomly drawing the ace of spades is 1/52 (0.019).
If an outcome cannot occur it has a probability of 0.00.
Basic Probability Concepts
Continuous probability distribution:
A probability distribution for a continuous variable, with no interruptions or spaces between the outcomes of the variable.
[Figure: curve of p(Y) plotted against Y; the area under the curve between a and b represents P(a ≤ Y ≤ b).]
Normal Distribution & Confidence Intervals
(Source: http://trochim.human.cornell.edu/kb/sampstat.htm)
[Figure: normal distribution with Mean = 3.75 and S.D. = 0.25]
Sampling Distribution
Central Limit Theorem:
If all possible random samples of size N are drawn from any population with mean $\mu_Y$ and variance $\sigma_Y^2$, then as N grows large, the sample means approach a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2 / N$.
The hypothetical distribution of all possible (infinite) means for samples of size N is called the sampling distribution of sample means.
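
The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in for "all possible samples": it draws 2000 random samples of size 50 from a skewed (exponential) population with mean 1 and variance 1, and compares the mean and variance of the resulting sample means with $\mu_Y$ and $\sigma_Y^2 / N$. The population, sample size, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(13)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50     # N, the size of each sample
NUM_SAMPLES = 2000   # number of repeated samples drawn

POP_MEAN = 1.0       # mean of the exponential population used here
POP_VARIANCE = 1.0   # variance of the same population

# Draw repeated samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print("mean of sample means:", round(mean_of_means, 3), "expected about", POP_MEAN)
print("variance of sample means:", round(variance_of_means, 4),
      "expected about", POP_VARIANCE / SAMPLE_SIZE)
```

Even though the population is strongly skewed, the sample means cluster symmetrically around the population mean, with a spread that shrinks as N grows.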
Sampling Distribution
(Source: http://trochim.human.cornell.edu/kb/sampstat.htm)
[Figure: sampling distribution]
Sampling Distribution
Standard Error:
The standard deviation of the sampling distribution is referred to as the standard error. It indicates how much the means of different samples vary.
$$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$$
Sampling Error:
In the sampling context, the standard error is called the sampling error.
Confidence Intervals
We do not know the sampling distribution, but we do know the distribution of our sample.
We set the mean of the sampling distribution to the mean of our sample and calculate the standard error from our sample. We can now construct the sampling distribution in order to estimate confidence intervals for the population parameter.
Confidence Intervals
(Source: http://trochim.human.cornell.edu/kb/sampstat.htm)
[Figure: confidence intervals]
Inferential Statistics
  Significance tests
  Making interval estimates
Significance Tests: Nominal & Ordinal Variables
Binomial test for dichotomous variables

Government should provide electricity free of cost to the farmers
                                          Agree   Disagree
Population assumption of no difference      50%        50%
Sample observation                          47%        53%

Two explanations are possible:
  The sample is unrepresentative. The discrepancy between our assumption and the sample observation is due to sampling error.
  Our assumption (null hypothesis) of an equal split in the population is incorrect.
Binomial test for dichotomous variables
The binomial test of statistical significance estimates the likelihood of obtaining a random sample in which sampling error produced a difference between categories as big as the one we have observed (53/47).
The figures obtained in this test range from 0.00 to 1.00 and are called significance levels.
The lower the significance level, the greater the confidence that observed percentage differences reflect real differences in the population.
The binomial test can also be used for other splits against known population proportions.
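
A minimal sketch of an exact two-sided binomial test follows. The slides do not state the sample size behind the 47/53 split, so a sample of 100 respondents is assumed purely for illustration; the function name `binomial_test_two_sided` is likewise made up for this example.

```python
from math import comb


def binomial_test_two_sided(successes, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one when the population proportion is p (the null hypothesis).
    """
    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    observed = pmf(successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))


# Assumed sample: 100 respondents, 47 agree and 53 disagree.
significance = binomial_test_two_sided(47, 100)
print(round(significance, 3))
```

With a sample this small, a 47/53 split is well within what sampling error alone could produce, so the null hypothesis of an equal split would be retained. The same function accepts other null proportions through the `p` argument.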
One-sample chi-square test
Used for testing differences across the categories of a variable with three or more categories.

Farmers' Orientation towards Farming
                                          Business   Subsistence   Others
Population assumption of no difference       33.3%         33.3%    33.3%
Sample observation (332)                     33.1%         37.0%    29.8%
Chi-square 2.6 (p = 0.27)

If the null hypothesis were true, sampling error alone would produce differences across categories at least this large about 27% of the time. Therefore we continue with the null hypothesis that each orientation is equally prevalent.
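
The chi-square value above can be reproduced approximately. The slide reports only percentages, so the counts below (110, 123, 99) are reconstructed from the percentages and the sample size of 332, and should be read as an assumption. For 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), which avoids the need for a statistics library.

```python
from math import exp

# Counts reconstructed from 33.1%, 37.0%, 29.8% of 332 (an assumption).
observed = {"Business": 110, "Subsistence": 123, "Others": 99}

n = sum(observed.values())
expected = n / len(observed)  # equal split under the null hypothesis

# Chi-square: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

degrees_of_freedom = len(observed) - 1

# Upper-tail probability; this closed form is valid only for df = 2.
p_value = exp(-chi_square / 2)

print(round(chi_square, 2), degrees_of_freedom, round(p_value, 2))
```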
Interval Estimates: Nominal & Ordinal Variables
Rather than estimating how likely it is that the sample pattern will hold in the population, interval estimate procedures calculate the likely margin of error in the sample figures.
Suppose in a survey 35% of the respondents view farming as a way of life. What is the likely margin of error of this estimate? How close is the true population figure likely to be to 35%?
Interval Estimates: Nominal & Ordinal Variables
Compute the standard error of the binomial:
$$S_B = \sqrt{\frac{PQ}{N}}$$
S_B = Std. error for the binomial distribution
P = Per cent in the category of interest
Q = Per cent in the remaining category(ies)
N = No. of cases in the sample
Confidence interval = $P \pm S_B$ (about 68% confidence; $P \pm 1.96\,S_B$ gives the 95% interval)
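
For the farming example, the standard error of the binomial can be worked through as follows. The survey's sample size is not given in the slides, so N = 400 is a hypothetical value, and the 1.96 multiplier assumes the usual normal approximation for a 95% interval.

```python
from math import sqrt

P = 35.0      # per cent who view farming as a way of life (from the slide)
Q = 100 - P   # per cent in the remaining categories
N = 400       # hypothetical sample size; not stated in the slides

# Standard error of the binomial, in percentage points.
std_error = sqrt(P * Q / N)

# 95% confidence interval under the normal approximation.
margin = 1.96 * std_error
lower, upper = P - margin, P + margin

print(round(std_error, 2), round(lower, 1), round(upper, 1))
```

Quadrupling the sample size halves the standard error, which is why margins of error shrink slowly as samples grow.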
Significance Tests: Interval Variables
Binomial and chi-square tests of significance can be used for interval data.
But with interval data we do not need to limit our analysis to the examination of percentages only.
We can test whether the sample mean differs from an assumed or known population mean.
Significance Tests: Interval Variables
t-test

Average annual income
Sample mean (560 rural people)   Rs. 35277
Known population mean            Rs. 38922
Null hypothesis                  The mean in the sample is the same as the known population mean
t-test significance level        0.000
Interpretation                   The difference between the average income of rural people and that of the general population is sufficiently large for a sample of this size that it almost certainly reflects a real population difference rather than being due to sampling error

Source: de Vaus (2002)
Interval Estimates: Interval Variables
Using the same general logic as with nominal and ordinal variables, we estimate the margin of error. However, in place of percentages, we estimate the margin of error of sample means.
$$S_M = \frac{s}{\sqrt{N}}$$
S_M = Std. error of the mean
s = Std. deviation
N = No. of cases in the sample
Confidence interval = $\bar{Y} \pm S_M$ (about 68% confidence; $\bar{Y} \pm 1.96\,S_M$ gives the 95% interval)
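
As a worked illustration, the sketch below applies the formula to the rural income example (sample mean Rs. 35277, N = 560). The sample standard deviation is not reported in the slides, so s = 15000 is a hypothetical value, and the 1.96 multiplier assumes a large-sample normal approximation.

```python
from math import sqrt

sample_mean = 35277.0   # Rs., from the t-test slide
n = 560                 # number of rural people in the sample
s = 15000.0             # hypothetical standard deviation; not given in the slides

# Standard error of the mean.
std_error_mean = s / sqrt(n)

# 95% confidence interval for the population mean.
margin = 1.96 * std_error_mean
lower, upper = sample_mean - margin, sample_mean + margin

print(round(std_error_mean, 2), round(lower, 1), round(upper, 1))
```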
Bivariate Analysis: Nominal & Ordinal Variables
Bivariate analysis provides a systematic way of measuring whether two variables are associated (related).
Using univariate analysis we established variation among people; bivariate analysis explains this variation.
If two variables are associated, then knowing a person's characteristic on just one variable improves our prediction about other characteristics of that person.
Frequency Distributions

Statistics: Importance of Crop Insurance
N   Valid             1802
    Missing              0
Mean                  2.01
Median                2.00
Mode                     1
Std. Deviation       1.290

Importance of Crop Insurance
                                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   VERY IMPORTANT                 836      46.4            46.4                 46.4
        FAIRLY IMPORTANT               574      31.9            31.9                 78.2
        OF LITTLE IMPORTANCE           132       7.3             7.3                 85.6
        OF NO IMPORTANCE                62       3.4             3.4                 89.0
        DONT KNOW                      198      11.0            11.0                100.0
        Total                         1802     100.0           100.0

Statistics: SEX
N   Valid             1802
    Missing              0
Mode                     2

SEX
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   MALE             842      46.7            46.7                 46.7
        FEMALE           960      53.3            53.3                100.0
        Total           1802     100.0           100.0
Cross-tabulations

Importance of Crop Insurance * SEX Crosstabulation
Count
                                  SEX
                          MALE   FEMALE   Total
VERY IMPORTANT             432      404     836
FAIRLY IMPORTANT           241      333     574
OF LITTLE IMPORTANCE        54       78     132
OF NO IMPORTANCE            26       36      62
DONT KNOW                   89      109     198
Total                      842      960    1802

Labels on the slide:
  Independent Var.: SEX (columns)
  Dependent Var.: Importance of Crop Insurance (rows)
  Count or cell freq.: the entries in the body of the table
  Row Marginals: the row totals (Total column)
  Column Marginals: the column totals (Total row)
IMPORTANCE OF CROP INSURANCE * SEX Crosstabulation
                                                                    SEX
                                                              MALE    FEMALE     Total
VERY IMPORTANT         Count                                   432       404       836
                       % within CROP INSURANCE IMPORTANCE    51.7%     48.3%    100.0%
                       % within SEX                          51.3%     42.1%     46.4%
                       % of Total                            24.0%     22.4%     46.4%
FAIRLY IMPORTANT       Count                                   241       333       574
                       % within CROP INSURANCE IMPORTANCE    42.0%     58.0%    100.0%
                       % within SEX                          28.6%     34.7%     31.9%
                       % of Total                            13.4%     18.5%     31.9%
OF LITTLE IMPORTANCE   Count                                    54        78       132
                       % within CROP INSURANCE IMPORTANCE    40.9%     59.1%    100.0%
                       % within SEX                           6.4%      8.1%      7.3%
                       % of Total                             3.0%      4.3%      7.3%
OF NO IMPORTANCE       Count                                    26        36        62
                       % within CROP INSURANCE IMPORTANCE    41.9%     58.1%    100.0%
                       % within SEX                           3.1%      3.8%      3.4%
                       % of Total                             1.4%      2.0%      3.4%
DONT KNOW              Count                                    89       109       198
                       % within CROP INSURANCE IMPORTANCE    44.9%     55.1%    100.0%
                       % within SEX                          10.6%     11.4%     11.0%
                       % of Total                             4.9%      6.0%     11.0%
Total                  Count                                   842       960      1802
                       % within CROP INSURANCE IMPORTANCE    46.7%     53.3%    100.0%
                       % within SEX                         100.0%    100.0%    100.0%
                       % of Total                            46.7%     53.3%    100.0%

Labels on the slide:
  Row percent: % within CROP INSURANCE IMPORTANCE
  Column percent: % within SEX
  Total percent: % of Total
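
The three kinds of percentages differ only in the base used for division. The sketch below recomputes them from the cell counts of the Importance of Crop Insurance by SEX table; the dictionary layout and variable names are illustrative choices.

```python
# Cell counts as (MALE, FEMALE) for each response category.
counts = {
    "VERY IMPORTANT": (432, 404),
    "FAIRLY IMPORTANT": (241, 333),
    "OF LITTLE IMPORTANCE": (54, 78),
    "OF NO IMPORTANCE": (26, 36),
    "DONT KNOW": (89, 109),
}

row_totals = {label: sum(cells) for label, cells in counts.items()}
col_totals = [sum(cells[j] for cells in counts.values()) for j in range(2)]
grand_total = sum(row_totals.values())

row_pct, col_pct, total_pct = {}, {}, {}
for label, cells in counts.items():
    # Row percent: cell count as a share of its row total.
    row_pct[label] = [round(100 * c / row_totals[label], 1) for c in cells]
    # Column percent: cell count as a share of its column total.
    col_pct[label] = [round(100 * c / col_totals[j], 1) for j, c in enumerate(cells)]
    # Total percent: cell count as a share of the grand total.
    total_pct[label] = [round(100 * c / grand_total, 1) for c in cells]

print(row_pct["VERY IMPORTANT"], col_pct["VERY IMPORTANT"], total_pct["VERY IMPORTANT"])
```

When the independent variable defines the columns, as SEX does here, the column percentages are the ones to compare across groups.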
Cross-tabulations

CROP INSURANCE IMPORTANCE * SEX Crosstabulation (column percent)
                                SEX
                          MALE    FEMALE     Total
1       Count              432       404       836
        % within SEX     51.3%     42.1%     46.4%
2       Count              241       333       574
        % within SEX     28.6%     34.7%     31.9%
3       Count               54        78       132
        % within SEX      6.4%      8.1%      7.3%
4       Count               26        36        62
        % within SEX      3.1%      3.8%      3.4%
5       Count               89       109       198
        % within SEX     10.6%     11.4%     11.0%
Total   Count              842       960      1802
        % within SEX    100.0%    100.0%    100.0%
If the two variables in any cross-tabulation are independent, the formula for the expected frequency in row i and column j is:
$$f_{ij} = \frac{(f_{i.})(f_{.j})}{N}$$
f_{ij} = the expected frequency of the cell in the i-th row & the j-th column
f_{i.} = the total in the i-th row marginal
f_{.j} = the total in the j-th column marginal
N = the grand total, or sample size for the entire table
The chi-square test statistic summarizes the differences across the cells between the observed frequencies and the expected frequencies. Chi-square is calculated by the formula:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
E_{ij} = The expected frequency in the i-th row, j-th column under independence.
O_{ij} = The observed frequency in the corresponding cell.
C = The number of columns in the cross-tabulation.
R = The number of rows in the cross-tabulation.
Compare the computed value of chi-square with the critical value at the relevant degrees of freedom. If the computed chi-square statistic is greater than the critical value at an acceptable significance level, reject the null hypothesis.
df = (R-1)(C-1)
How often go market * SEX Crosstabulation
                                  SEX
                            MALE    FEMALE     Total
1       Count                432       404       836
        Expected Count     390.6     445.4     836.0
        % within SEX       51.3%     42.1%     46.4%
2       Count                241       333       574
        Expected Count     268.2     305.8     574.0
        % within SEX       28.6%     34.7%     31.9%
3       Count                 54        78       132
        Expected Count      61.7      70.3     132.0
        % within SEX        6.4%      8.1%      7.3%
4       Count                 26        36        62
        Expected Count      29.0      33.0      62.0
        % within SEX        3.1%      3.8%      3.4%
5       Count                 89       109       198
        Expected Count      92.5     105.5     198.0
        % within SEX       10.6%     11.4%     11.0%
Total   Count                842       960      1802
        Expected Count     842.0     960.0    1802.0
        % within SEX      100.0%    100.0%    100.0%
Chi-Square Tests
                                  Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square               16.022(a)    4                    .003
Likelihood Ratio                 16.047       4                    .003
Linear-by-Linear Association      5.753       1                    .016
N of Valid Cases                   1802
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 28.97.
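
The expected counts and the Pearson chi-square value can be recomputed from the observed counts with the two formulas given earlier. This Python sketch uses only the standard library; the p-value line relies on the closed form exp(-x/2)(1 + x/2) for the chi-square upper tail, which holds only for 4 degrees of freedom.

```python
from math import exp

# Observed counts: rows are response categories 1-5, columns are MALE, FEMALE.
observed = [
    [432, 404],
    [241, 333],
    [54, 78],
    [26, 36],
    [89, 109],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only for df = 4.
half = chi_square / 2
p_value = exp(-half) * (1 + half)

min_expected = min(min(row) for row in expected)
print(round(chi_square, 3), degrees_of_freedom, round(p_value, 3), round(min_expected, 2))
```

Since the significance level is far below 0.05, the null hypothesis of independence between the two variables would be rejected.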
Bivariate Analysis: Interval Variables
Large number of possible values; cannot apply the methods used for nominal or ordinal variables.
  Collapse the categories, or
  Use techniques that can handle a large number of numeric values
Bivariate Analysis: Interval Variables
Collapsing of categories:
  Loss of information
  Cannot use powerful statistical tools
Therefore, use different techniques that can handle a large number of categories.
Dependent: interval, Independent: dichotomous
Comparison of Means: t-test

Case Processing Summary
                                            Included            Excluded            Total
                                            N       Percent     N       Percent     N       Percent
Amount spent on nature related activity
  * Sex of the respondent                   40081   65.9%       20708   34.1%       60789   100.0%

Report: Amount spent on nature related activity
Sex of the respondent     Mean       N   Std. Deviation
Male                    992.42   19560         3659.611
Female                  464.21   20521         2260.503
Total                   721.98   40081         3036.691
Dependent: interval, Independent: dichotomous
Comparison of Means: t-test

Group Statistics: Amount spent on nature related activity
              N     Mean   Std. Deviation   Std. Error Mean
Male      19560   992.42         3659.611            26.167
Female    20521   464.21         2260.503            15.780

Independent Samples Test: Amount spent on nature related activity
Levene's Test for Equality of Variances: F = 456.503, Sig. = .000

t-test for Equality of Means
                                                                                                95% Confidence Interval of the Difference
                                   t          df   Sig. (2-tailed)   Mean Difference   Std. Error Difference      Lower      Upper
Equal variances assumed       17.473       40079              .000            528.20                  30.230    468.952    587.457
Equal variances not assumed   17.286   32300.106              .000            528.20                  30.557    468.312    588.097
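
The "equal variances not assumed" row can be reproduced from the group statistics alone. The sketch below applies the standard Welch formulas for the standard error of the difference, the t statistic, and the approximate degrees of freedom; small discrepancies from the SPSS output can arise because the inputs are rounded.

```python
from math import sqrt

# Group statistics from the table: amount spent on nature related activity.
male = {"n": 19560, "mean": 992.42, "sd": 3659.611}
female = {"n": 20521, "mean": 464.21, "sd": 2260.503}

# Squared standard error of each group mean: s^2 / n.
var_male = male["sd"] ** 2 / male["n"]
var_female = female["sd"] ** 2 / female["n"]

# Standard error of the difference between the two means.
se_difference = sqrt(var_male + var_female)

mean_difference = male["mean"] - female["mean"]
t_statistic = mean_difference / se_difference

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df = (var_male + var_female) ** 2 / (
    var_male ** 2 / (male["n"] - 1) + var_female ** 2 / (female["n"] - 1)
)

print(round(mean_difference, 2), round(se_difference, 3),
      round(t_statistic, 3), round(welch_df, 1))
```

Because Levene's test is significant here, the unequal-variances row is the appropriate one to read.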
Dependent: interval, Independent: interval
Pearson's Correlation coefficient

Correlations
                                                                  Weekly earnings   Amount spent on nature related activity
Weekly earnings                            Pearson Correlation                  1                                   -.032**
                                           Sig. (2-tailed)                      .                                      .000
                                           N                                60789                                     40081
Amount spent on nature related activity    Pearson Correlation            -.032**                                         1
                                           Sig. (2-tailed)                   .000                                         .
                                           N                                40081                                     40081
**. Correlation is significant at the 0.01 level (2-tailed).
Dependent: interval, Independent: interval
Pearson's Correlation coefficient

Correlations
                                                                      Weekly earnings   Amount spent on nature related activity   Total spent on membership fees, donation
Weekly earnings                              Pearson Correlation                    1                                   -.032**                                       .022
                                             Sig. (2-tailed)                        .                                      .000                                       .215
                                             N                                  60789                                     40081                                       3317
Amount spent on nature related activity      Pearson Correlation              -.032**                                         1                                     .109**
                                             Sig. (2-tailed)                     .000                                         .                                       .000
                                             N                                  40081                                     40081                                       3317
Total spent on membership fees, donation     Pearson Correlation                 .022                                    .109**                                          1
                                             Sig. (2-tailed)                     .215                                      .000                                          .
                                             N                                   3317                                      3317                                       3317
**. Correlation is significant at the 0.01 level (2-tailed).
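
For reference, a Pearson correlation coefficient can be computed directly from paired observations. The five pairs below are hypothetical and serve only to show the calculation; they are unrelated to the survey data in the tables.

```python
from math import sqrt

# Hypothetical paired observations, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y over the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    sum_yy = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(x, y)
print(round(r, 4))
```

The coefficient ranges from -1 to +1. With very large samples, as in the tables above, even a coefficient as small as -.032 can be statistically significant while indicating a very weak relationship.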
