
Operationalization II

Statistical Tools for Data Analysis

Tools for data analysis differ with the type of data.

Several statistical tools can be used to analyze data such as
a) qualitative and quantitative data
b) time series and cross-section data
c) panel data.

The specification of the procedure and the analytical
tool to be used depends on the
objectives of the study.
Tools for data analysis are classified
into three broad categories:
Univariate Tools
Bivariate Tools
Multivariate Tools
This classification is based on the
number of variables a tool uses.

Classification of Statistical Tools (Contd)


Univariate Tools for Data Analysis
Frequency Tables & Distribution
Histogram
Ogive or Cumulative frequency curve
Pie-Charts
Measures of Central Tendency (MCTs)
Measures of Dispersion
Kurtosis etc
Some of these tools are visual aids.

Classification of Statistical Tools (Contd)

Bivariate Tools for Data Analysis


Cross Tables
Scatter Plots
Correlation
Bivariate Regression
Trend Lines
Binary Choice (Logistic Regression, Linear
Probability Models using two variables)
Etc

Classification of Statistical Tools (Contd)


A few Multivariate Tools are as follows
Multiple Regression
Factor Analysis
Cluster Analysis
Discriminant analysis
Multivariate Analysis of variance (MANOVA)
Conjoint Analysis
Canonical Correlation
Multi Dimensional Scaling (MDS)
Structural Equation Modeling (SEM)
Logistic Regression using more than 2 variables etc

Master Table for Survey Research Using a Questionnaire/Schedule

When data are collected through a questionnaire in
social research, a master table is prepared to
summarize the data and support further analysis.
A master table can be prepared either manually
or with a software package such as Excel.
Code numbers can be used to record the responses.
After the data are summarized, different statistical
tools can be used to analyze them.
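
A minimal sketch of a coded master table and a frequency count drawn from it. The questions, the code numbers and the five respondents are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical code book: code numbers assigned to the answers of one question.
GENDER_CODES = {1: "Male", 2: "Female"}

# Hypothetical master table: one row per respondent, one column per question.
master_table = [
    {"respondent": 1, "gender": 1, "income": 12000},
    {"respondent": 2, "gender": 2, "income": 15000},
    {"respondent": 3, "gender": 2, "income": 9000},
    {"respondent": 4, "gender": 1, "income": 11000},
    {"respondent": 5, "gender": 2, "income": 14000},
]


def frequency_table(rows, column, codes):
    """Count how often each coded answer occurs in one column."""
    counts = Counter(row[column] for row in rows)
    return {codes[code]: n for code, n in sorted(counts.items())}


gender_counts = frequency_table(master_table, "gender", GENDER_CODES)
print(gender_counts)
```

Keeping the code book separate from the table lets the same column be decoded again for any later tabulation.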

1. Univariate Tools
As the name suggests, all univariate tools use
one variable.
The primary objectives of using univariate tools are
a) to summarize and introduce the sample to the
reader
b) to examine the nature of the variable in terms of
its distribution and other characteristics.
The tools can be broadly divided into two groups:
graphical and numeric.
Broad conclusions on the nature of the distribution of
the data can be drawn from these tools.

Summarisation of data

Graphical representation of data
  Frequency distribution: histogram, frequency polygon, cumulative frequency curves, etc.
  Stem-and-leaf plot
  Dot plot
  Box-and-whisker plot
  Pie charts, bar charts, Pareto charts, etc.

Numerical representation of data
  Measures of central tendency: mean, median and mode
  Measures of variability: range, inter-quartile range, variance, standard deviation, coefficient of variation
  Measures of location: quantiles such as quartiles, deciles and percentiles, and the Z-score
  Measures of shape: coefficient of skewness and coefficient of kurtosis

Univariate Tools (Contd)


Summary Statistics (Descriptive Statistics)
Often the researcher wants to represent a set of
data on a variable by a single number or
figure.
For example: A researcher has a set of observations on
the income of a group of persons.
The researcher wants to summarize the variable for the group in
terms of the average and the deviation from the average.
Such statistical tools are known as descriptive
statistics since the numbers describe the
distribution of the variable.

Univariate Tools (Contd)


Some of the summary statistics are:
Measures of Central Tendency, Measures
of Dispersion and Measures of Peakedness.
a) MCT:
Arithmetic Mean: $\text{AM} = \sum X_i / N$ (the simple average).
Weighted AM: $\text{WAM} = \sum W_i X_i / \sum W_i$ (weights each value by its importance
to the overall total).
Geometric Mean: We use the GM when we need to know
the average rate of growth of a series of numbers.
$\text{GM} = (X_1 X_2 \cdots X_N)^{1/N}$, the Nth root of the product of the N values of X.
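
A short sketch of the three means using Python's standard library. The values, weights and growth factors are hypothetical, and `weighted_mean` is an illustrative helper that divides by the sum of the weights.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) divided by sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


values = [10, 20, 30]    # hypothetical observations
weights = [1, 2, 3]      # hypothetical importance of each observation

am = statistics.mean(values)
wam = weighted_mean(values, weights)

# Hypothetical yearly growth factors (10%, 20% and 30% growth).
growth_factors = [1.10, 1.20, 1.30]
gm = statistics.geometric_mean(growth_factors)

print(am, round(wam, 3), round(gm, 4))
```

The geometric mean of the growth factors, minus one, is the average growth rate per period.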

Univariate Tools (Contd)


Harmonic Mean: It is used where a series contains extreme
values (usually high values).
For example, consider the series
12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99.
The arithmetic mean may not represent this series well. In such
cases we use the harmonic mean, which gives less
weight to the higher values.
$\text{HM} = N / \sum (1/X_i)$, the reciprocal of the average of the reciprocals of the N
values of X, i.e. 1 / (AM of 1/12, 1/13, ...).
Median: The most central item of the ordered series.
Mode: The value repeated most often.
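
The example series can be checked with the standard library. This sketch assumes the series reads as thirteen separate numbers, i.e. that the dots in the slide stand for commas.

```python
import statistics

# Assumed reading of the slide's series: thirteen values, two of them extreme.
series = [12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99]

am = statistics.mean(series)
hm = statistics.harmonic_mean(series)   # N / sum(1 / X)
median = statistics.median(series)      # middle item of the sorted series
modes = statistics.multimode(series)    # every value tied for most frequent

print(round(am, 2), round(hm, 2), median, modes)
```

The harmonic mean comes out well below the arithmetic mean because the two extreme values contribute very small reciprocals.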

Univariate Tools (Contd)


Measures of Dispersion:
Range, Mean Deviation, Variance and Standard
Deviation.
They have several implications and uses in analyzing a
set of observations.
For example, the mean ± one standard deviation covers about
68% of the observations in a normal distribution.
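
A brief sketch of the dispersion measures on a small hypothetical data set, together with the share of a normal distribution lying within one standard deviation of the mean. The population forms of variance and standard deviation are used here as an assumption.

```python
import statistics


def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)


data = [4, 8, 6, 5, 7]   # hypothetical observations

data_range = max(data) - min(data)
md = mean_deviation(data)
variance = statistics.pvariance(data)   # population variance
sd = statistics.pstdev(data)            # population standard deviation

# Share of a normal distribution between mean - 1 SD and mean + 1 SD.
std_normal = statistics.NormalDist(mu=0, sigma=1)
share_within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(data_range, md, variance, round(sd, 4), round(share_within_one_sd, 4))
```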

Statistical Tests
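Z and 't' tests are used to examine the significance of the
difference between sample and population means.
Similarly, $\chi^2$ and 'F' tests are used to examine
differences in variances: the $\chi^2$ test compares a sample variance with a population variance, and the 'F' test compares two sample variances.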
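
A minimal sketch of a one-sample 't' statistic for the difference between a sample mean and a population mean. The sample and the population mean of 50 are hypothetical. The tabulated value 2.776 is the two-tailed 5% value for 4 degrees of freedom, hard-coded because the standard library has no 't' distribution.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error


sample = [52, 48, 51, 49, 55]        # hypothetical observations
t_value = one_sample_t(sample, 50)   # hypothetical population mean

TABULATED_T = 2.776                  # two-tailed, 5% level, n - 1 = 4 d.f.
reject_null = abs(t_value) > TABULATED_T

print(round(t_value, 4), reject_null)
```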

2. Bivariate Tools
Bivariate tools are used to highlight the
relationship between two variables.
Some of the bivariate tools are
a) Cross Tables, Graphs and Scatter Plots
b) Correlations (Rank and Simple)
c) Bivariate linear and non-linear regression
d) Binary Choice (Logistic Regression, Linear
Probability Models using two variables)
e) Trend Lines (time is the independent variable)

Bivariate Tools (Contd..)


Scatter Plots (give an idea of the nature of the relationship
between the variables)
Correlation
Rank (Spearman) and simple (Karl Pearson) correlations are used in
bivariate data analysis.
These two types of correlation differ in the type of
data used: ordinal-scale (rank-order) data are used for rank
correlation, whereas metric data are used for simple correlation.
Each uses a specific formula to calculate the correlation
coefficient, which ranges from -1 to 1.
The correlation coefficient indicates the direction and the
strength of the correlation.
No cause-and-effect relationship is examined, but the relationship should have
construct validity.
Redundant relationships should be avoided.
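
A sketch of both coefficients written out by hand. The rank correlation is computed here as the simple correlation of the ranks, with tied values sharing the average rank. The data are hypothetical; Y rises with X but not along a straight line.

```python
import math


def pearson(x, y):
    """Simple (Karl Pearson) correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1 for the smallest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # hypothetical: increasing in x, but not linear

r_simple = pearson(x, y)
r_rank = spearman(x, y)
print(round(r_simple, 4), round(r_rank, 4))
```

The rank coefficient reaches 1 because the ordering of Y matches the ordering of X exactly, while the simple coefficient stays below 1 because the points do not lie on a straight line.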

Bivariate Tools (Contd..)


Bivariate Linear & Non-Linear Regression
1. Linear Regression:
The simplest relationship between two variables is
a linear one, which can be specified as follows:
$Y_i = \alpha + \beta X_i + U_i$, where
Y - Dependent variable
X - Independent variable
U - Error term or disturbance term
$\alpha$ - Intercept, $\beta$ - Slope.

Bivariate Tools (Contd..)


A scatter plot gives us some idea about the
relationship between two variables.
There could be alternative lines representing the
relationship between the variables.
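Consider $\sum e_i$ and $\sum e_i^2$ about the alternative lines.
$\sum e_i$ will not be comparable across lines, since positive and negative deviations cancel.
$\sum e_i^2$ will be non-negative and will vary with the spread
of the points about the lines.
Now, each line has its own $\alpha$ and $\beta$ values.
Therefore $\sum e_i^2$, conditional upon the $X_i$s, will be a
function of $\alpha$ and $\beta$.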

Bivariate Tools (Contd..)


What is $e_i$?
$e_i$ = Actual observation on Y - Estimated Y
Therefore, $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
or
$\sum e_i^2 = \sum [Y_i - (\alpha + \beta X_i)]^2$
Therefore we need to minimize $\sum e_i^2$ with respect
to $\alpha$ and $\beta$, which identifies the line giving
the least squared error.
The estimated $\alpha$ and $\beta$ are known as the Least
Squares Estimates.

Bivariate Tools (Contd..)


The process of minimization gives two
normal equations with two unknowns.
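By solving the equations we get the formulae for
estimating the values of $\alpha$ and $\beta$:
$\hat{\beta} = \sum x_i y_i / \sum x_i^2$ (in deviation form, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$)
$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$
These estimates are known as the Least Squares Estimates.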
With the help of these estimated values of the intercept
and the slope we can write the equation of the line of
best fit.
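
A minimal sketch of the least squares formulas in deviation form. The five observations are hypothetical, and `least_squares` is an illustrative helper name.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the line of best fit.

    slope     = sum(x_dev * y_dev) / sum(x_dev ** 2)   (deviation form)
    intercept = mean(Y) - slope * mean(X)
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    x_dev = [xi - mean_x for xi in x]
    y_dev = [yi - mean_y for yi in y]
    slope = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

alpha_hat, beta_hat = least_squares(x, y)
print(f"Y = {alpha_hat:.2f} + {beta_hat:.2f} X")
```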

Bivariate Tools (Contd..)


The null hypothesis (Ho):
A null hypothesis that is commonly tested is
Ho: $\beta = 0$
This means that there is no linear relation between X
and Y, i.e. the regression line is a straight line parallel to
the X axis.
This null hypothesis (Ho) is rejected if the absolute value of the computed 't'
value exceeds the tabulated 't' value for the given
degrees of freedom and significance level.
Rejection of Ho means there is a relationship
between the two variables.
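
A sketch of the computed 't' value for the slope, assuming the usual two-variable formula in which the standard error of the slope is the square root of [RSS / (n - 2)] divided by the sum of squared deviations of X. The data are hypothetical. The tabulated value 3.182 is the two-tailed 5% value for 3 degrees of freedom, hard-coded for this sample size.

```python
import math


def slope_t_statistic(x, y):
    """Computed t for Ho: slope = 0 in a two-variable linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)   # n - 2 degrees of freedom
    return slope / se_slope


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

t_computed = slope_t_statistic(x, y)
TABULATED_T = 3.182   # two-tailed, 5% level, n - 2 = 3 d.f.
reject_null = abs(t_computed) > TABULATED_T

print(round(t_computed, 4), reject_null)
```

With only five observations the slope is not significantly different from zero at the 5% level, even though the fitted slope is positive.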

Bivariate Tools (Contd..)


The Coefficient of Determination ($R^2$)
Three quantities can be calculated from the line of
regression with respect to the given Y and X values.
TSS: Total Sum of Squares of the deviations of Y from its mean
ESS: Explained Sum of Squares
RSS: Residual Sum of Squares

$R^2$ = Explained Sum of Squares / Total Sum of Squares.
Since TSS = ESS + RSS, when RSS declines ESS tends to TSS and
$R^2$ approaches 1.
This is known as the explanatory power of the model.
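
A short sketch that splits the total sum of squares into its explained and residual parts for a fitted line. The data are hypothetical and `sums_of_squares` is an illustrative helper name.

```python
def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((fi - mean_y) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

tss, ess, rss = sums_of_squares(x, y)
r_squared = ess / tss
print(round(tss, 2), round(ess, 2), round(rss, 2), round(r_squared, 2))
```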

Forms of Bivariate Regression Models and Their Uses

Various forms of two-variable regression
models have different objectives/uses.
A few examples:
1. Simple Linear Model
$Y_i = \alpha + \beta X_i + U_i$, where
Y - Dependent variable, X - Independent variable
and U - Error term
It highlights the linear relationship between Y and X
as discussed earlier.

2. Linear Trend
The linear growth of a variable can be
calculated using a simple regression
model such as $Y = \alpha + \beta t + u$, where Y
is the variable under consideration and t
is the time or trend variable.
The positive or negative trend of the variable
over the time period is determined by
the sign of the slope $\beta$.

3. Log Linear Model
$Y_i = \alpha X_i^{\beta} e^{u_i}$; taking logs,
$\ln Y_i = \ln \alpha + \beta \ln X_i + u_i$
It is an exponential regression model (known
as the double-log or log-linear model).
This model is popular in applied work since
the slope coefficient measures the elasticity of Y
with respect to X (the % change in Y due to a 1%
change in X).
Example: To estimate the advertising elasticity
of a product we may use this model,
specifying Sales Volume = f(Advertising Expenditure). The
slope will give the advertising elasticity.
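
A sketch of the double-log estimation on made-up data. The advertising figures are hypothetical, and sales are generated without error from Sales = 2 x Adv^1.5, so the estimated slope should recover the elasticity of 1.5.

```python
import math


def ols(x, y):
    """Least squares (intercept, slope) for a two-variable regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data generated from Sales = 2 * Adv ** 1.5 (no error term).
adv_expenditure = [10, 20, 40, 80]
sales_volume = [2 * a ** 1.5 for a in adv_expenditure]

# Regress ln(Sales) on ln(Adv); the slope is the advertising elasticity.
log_intercept, elasticity = ols(
    [math.log(a) for a in adv_expenditure],
    [math.log(s) for s in sales_volume],
)
scale = math.exp(log_intercept)   # antilog of the intercept

print(round(elasticity, 4), round(scale, 4))
```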

4. Semi-log Regression Model

The semi-log model can be used to
measure the growth rate of a variable over
a time period.
This model is specified as
$\ln Y_i = \alpha + \beta t + u_i$
This is known as the semi-log model since only
one variable appears in log form. It is also
known as the log-lin model.

Semi-log Regression Model (Contd.)

In the semi-log model the slope coefficient
measures the constant proportional or relative
change in Y for a given absolute change in
X ('t' in the above equation).
Slope x 100 gives the instantaneous (point-in-time) percentage
growth rate of Y with respect to a unit change in X.
The compound growth rate can be found by
the formula: $[\text{antilog}(\beta) - 1] \times 100$
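
A sketch of both growth rates from a semi-log trend regression. The series is hypothetical and grows at exactly 5% per period, and the antilog is taken as the exponential because natural logs are used.

```python
import math


def ols(x, y):
    """Least squares (intercept, slope) for a two-variable regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical series growing at 5% per period from a base of 100.
t = [0, 1, 2, 3, 4, 5]
y = [100 * 1.05 ** period for period in t]

# Regress ln(Y) on t.
intercept, slope = ols(t, [math.log(value) for value in y])

instantaneous_growth = slope * 100               # slope x 100
compound_growth = (math.exp(slope) - 1) * 100    # [antilog(slope) - 1] x 100

print(round(instantaneous_growth, 3), round(compound_growth, 3))
```

The compound rate recovers the 5% built into the series, while the instantaneous rate is slightly lower.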

5. Quadratic/Cubic Model:

The forms of a quadratic or a cubic model
could be
$Y = a + bX + cX^2 + u$ or $Y = a + bX + cX^2 + dX^3 + u$
Since these models use one independent variable they
can be categorized under the two-variable regression
equations.
The quadratic models are used to examine whether
a minimum or maximum exists in the curve depicting the
relationship between X and Y. Examples: total revenue curves, average
cost curves, etc. Cubic models are used in total cost
functions, etc.
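
Once the quadratic coefficients are estimated, the turning point follows from setting the first derivative b + 2cX to zero. A minimal sketch, with hypothetical coefficients for a total revenue curve:

```python
def turning_point(a, b, c):
    """Turning point of Y = a + b*X + c*X**2, found where b + 2*c*X = 0."""
    if c == 0:
        raise ValueError("c = 0 gives a straight line with no turning point")
    x_star = -b / (2 * c)
    y_star = a + b * x_star + c * x_star ** 2
    kind = "maximum" if c < 0 else "minimum"
    return x_star, y_star, kind


# Hypothetical total revenue curve: TR = 20*Q - 2*Q**2.
x_star, y_star, kind = turning_point(a=0, b=20, c=-2)
print(x_star, y_star, kind)
```

A negative coefficient on the squared term signals a maximum, as in a total revenue curve; a positive one signals a minimum, as in an average cost curve.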

Assignment 2
Collect relevant data and estimate the five forms of
two-variable regression models explained above.
Use the SPSS package for estimation with the linear option.
Interpret the results.
The groups' exercises will be presented and
discussed in the next class.
The results may be summarized in the tabular form
shown on the next page.
The SPSS spreadsheet incorporating the data may
be kept as a backup.

Summary of results:
Dependent variable: ..........
Independent variable: ..........

Equation   Intercept   Slope 1          Slope 2   Slope 3   R2
Linear     12.098      0.098 (0.000)                        0.79
