Chapter Objectives
Express quantitatively the degree and direction of the covariation or association between two variables.
Determine the validity and reliability of the covariation or association between two variables.
Introduction
The importance of examining the statistical relationship between two or more variables can be expressed through the following questions, each of which requires appropriate statistical methods to answer:
Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
Is the relationship strong or significant enough to be useful in arriving at a desirable conclusion?
Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to a given value of the independent variable or variables?
Example
Family income & expenditure on luxury items. Yield of a crop & quantity of fertilizer used.
A statistical technique that is used to analyze the strength and direction of the relationship between two quantitative variables is called correlation analysis. A few definitions of correlation analysis are:
An analysis of the relationship of two or more variables is usually called correlation. (A. M. Tuttle)
When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. (Croxton and Cowden)
The coefficient of correlation is a number that indicates the strength (magnitude) and direction of the statistical relationship between two variables. The strength of the relationship is determined by the closeness of the points to a straight line when pairs of values of the two variables are plotted on a graph. A straight line is used as the frame of reference for evaluating the relationship. The direction is determined by whether one variable generally increases or decreases when the other variable increases.
Although correlation does not establish any cause-and-effect relationship, it can be used as a source for testing null and alternative hypotheses about a population. For example, it has been shown that smoking causes lung damage. However, health problems often have multiple causes, so a cause such as stress cannot be ruled out. Similarly, there is a positive correlation between the yields of rice and tea because both crops are influenced by the amount of rainfall, but the yield of one is not influenced by the other.
Mutual influence: There may be a high degree of relationship between two variables, but it is difficult to say which variable is influencing the other.
For example, variables like the price, supply, and demand of a commodity are mutually correlated.
According to the principles of economics, as the price of a commodity increases, its demand decreases, so price influences the demand level. But if the demand for a commodity increases due to growth in population, then its price also increases. In this case the increased demand affects the price. However, the amount of export of a commodity is influenced by an increase or decrease in customs duties, but the reverse is normally not true.
Types of Correlations
There are three broad types of correlations:
Positive and negative,
Linear and non-linear,
Simple, partial, and multiple.
4/14/2012
Example:
X: 5 8 10 15 17
Y: 10 12 16 18 20

X: 17 15 10 8 5
Y: 20 18 16 12 10
A negative (or inverse) correlation refers to the change in the values of variables in opposite direction.
X: 17 15 10 8 5
Y: 10 12 16 18 20
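The direction of a relationship can be checked numerically. The following is a minimal sketch, not a method given in the slides: the helper name `covariation_sign` is hypothetical, and it assumes that the sign of the sum of products of deviations from the means is an acceptable indicator of direction.

```python
def covariation_sign(xs, ys):
    """Return +1, -1, or 0 for the sign of the sum of products of deviations.

    Illustrative helper (not from the source): +1 suggests the variables
    move in the same direction, -1 that they move in opposite directions.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (total > 0) - (total < 0)


# Data sets taken from the examples in this section.
same_direction = covariation_sign([5, 8, 10, 15, 17], [10, 12, 16, 18, 20])
opposite_direction = covariation_sign([17, 15, 10, 8, 5], [10, 12, 16, 18, 20])
print(same_direction, opposite_direction)
```

The sign only indicates direction; it says nothing about how strong the relationship is.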
Linear Correlation
A linear correlation implies a constant change in one of the variable values with respect to a change in the corresponding values of another variable. In other words, a correlation is referred to as linear correlation when variations in the values of two variables have a constant ratio. Example:
X: 10 20 30 40 50
Y: 40 60 80 100 120
Non-Linear Correlation
A non-linear correlation implies an absolute change in one of the variable values with respect to changes in the values of another variable. In other words, a correlation is referred to as non-linear when the amount of change in the values of one variable does not bear a constant ratio to the amount of change in the corresponding values of another variable. Example:
x: 8 9 9 10 10 28 29 30
y: 80 130 170 150 230 560 460 600
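The "constant ratio" criterion that separates linear from non-linear correlation can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, consecutive pairs are compared in the order given, and steps where x does not change are skipped because the ratio is undefined there.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive pairs."""
    ratios = []
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dy = ys[i] - ys[i - 1]
        if dx != 0:  # assumption: skip steps where x is unchanged
            ratios.append(dy / dx)
    return ratios


def has_constant_ratio(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Linear example from this section: y changes by 20 for every change of 10 in x.
print(has_constant_ratio([10, 20, 30, 40, 50], [40, 60, 80, 100, 120]))

# Non-linear example from this section: the ratios vary from step to step.
print(has_constant_ratio([8, 9, 9, 10, 10, 28, 29, 30],
                         [80, 130, 170, 150, 230, 560, 460, 600]))
```

Real data rarely satisfy the criterion exactly, so in practice a scatter diagram or the correlation coefficient is used rather than a strict equality check.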
If only two variables are chosen to study the correlation between them, then such a correlation is referred to as simple correlation. Example: A study of the yield of a crop with respect to only the amount of fertilizer, or of sales revenue with respect to the amount of money spent on advertisement.
Example:
(i) The yield of a crop is influenced by the amount of fertilizer applied, rainfall, quality of seed, type of soil, and pesticides.
(ii) Sales revenue from a product is influenced by the level of advertising expenditure, quality of the product, price, competitors, distribution, and so on.
Example: The employer-employee relationship in any organization may be examined with reference to training and development facilities; medical, housing, and children's education facilities; salary structure; the grievance handling system; and so on.
Scatter diagram method
Karl Pearson's method
Spearman's method
[Figure: interpretation of the correlation coefficient on a scale from perfect negative correlation (-1.00) through no correlation (0) to perfect positive correlation (+1.00), with weak, moderate, and strong correlation in between.]
A scatter diagram (or a graph) can be obtained on a graph paper by plotting observed (or known) pairs of values of variables x and y, taking the independent variable values on the x-axis and the dependent variable values on the y-axis.
Scatter diagram: A graph of pairs of values of two variables that is plotted to indicate a visual display of the pattern of their relationship.
A low value of r does not indicate that the variables are unrelated; it indicates that the relationship is poorly described by a straight line. A non-linear relationship may also exist.
Correlation does not imply a cause-and-effect relationship; it is merely an observed association.
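For ungrouped data, with d_x and d_y denoting the deviations of the x and y values from their assumed means and n the number of pairs:
\[ r = \frac{n\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\;\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}} \]
For grouped data, with f denoting the frequency and n the total frequency:
\[ r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{n\sum f d_x^2 - \left(\sum f d_x\right)^2}\;\sqrt{n\sum f d_y^2 - \left(\sum f d_y\right)^2}} \]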
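As a minimal sketch, Karl Pearson's coefficient for ungrouped data can be computed from deviations taken from assumed means. The function name `pearson_r` and its parameters are hypothetical choices for illustration; the data are the positive-correlation and linear examples from this deck.

```python
from math import sqrt


def pearson_r(xs, ys, assumed_x=0.0, assumed_y=0.0):
    """Pearson's r from deviations d_x = x - assumed_x and d_y = y - assumed_y.

    The result does not depend on the assumed means chosen; they only
    simplify hand calculation.
    """
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator


x = [5, 8, 10, 15, 17]
y = [10, 12, 16, 18, 20]
print(round(pearson_r(x, y), 4))
print(round(pearson_r(x, y, assumed_x=10, assumed_y=16), 4))  # same value
```

The function fails with a division by zero when either variable is constant, which matches the fact that r is undefined in that case.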
The correlation coefficient is appropriate to calculate when both variables x and y are measured on an interval or a ratio scale.
Both variables x and y are normally distributed, and there is a linear relationship between these variables.
The correlation coefficient is largely affected by truncation of the range of values in one or both of the variables. This occurs when the distributions of both the variables greatly deviate from the normal shape.
There is a cause-and-effect relationship between two variables that influences the distributions of both the variables. Otherwise the correlation coefficient might either be extremely low or even zero.
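The standard error of the coefficient of correlation is:
\[ SE_r = \frac{1 - r^2}{\sqrt{n}} \]
The probable error of the coefficient of correlation is calculated by the expression:
\[ PE_r = 0.6745\, SE_r = 0.6745\, \frac{1 - r^2}{\sqrt{n}} \]
Thus, with the help of PE_r we can determine the range within which the population coefficient of correlation is expected to fall, using the following formula:
\[ \rho = r \pm PE_r \]
where \rho (rho) represents the population coefficient of correlation.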
Remarks
If r < PE_r, then the value of r is not significant, that is, there is no relationship between the two variables of interest.
If r > 6PE_r, then the value of r is significant, that is, there exists a relationship between the two variables.
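The probable-error rule can be sketched as follows. This is an illustration only: the function names are hypothetical, the constant 0.6745 is the usual probable-error multiplier, the absolute value of r is used so that negative coefficients are handled (an assumption beyond the rule as stated), and r = 0.8 with n = 25 is a made-up input.

```python
from math import sqrt


def standard_error(r, n):
    """SE_r = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)


def probable_error(r, n):
    """PE_r = 0.6745 * SE_r."""
    return 0.6745 * standard_error(r, n)


def assess(r, n):
    """Apply the rule: |r| < PE_r is not significant, |r| > 6 PE_r is significant."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # the rule makes no statement in between


r, n = 0.8, 25  # hypothetical values for illustration
pe = probable_error(r, n)
print(assess(r, n), (r - pe, r + pe))  # verdict and limits r - PE_r, r + PE_r
```

Values of r between PE_r and 6PE_r are reported as inconclusive here because the remarks above do not cover that range.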
If r^2 = 0, then no variation in y can be explained by the variable x. In this case x is of no value in predicting the value of y, and there is no association between x and y.
If r^2 = 1, then the values of y are completely explained by x. There is perfect association between x and y.
A value of r^2 closer to 0 shows that only a low proportion of the variation in y is explained by x. On the other hand, a value of r^2 closer to 1 shows that the variable x can predict the actual value of the variable y.
\[ r^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum y^2 - a\sum y - b\sum xy}{\sum y^2 - n\,\bar{y}^2} \]
where \hat{y} = a + bx is the estimated value of y for given values of x. One minus the ratio between these two variations is referred to as the coefficient of determination.
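A short sketch of the coefficient of determination, assuming that a and b are the ordinary least-squares intercept and slope of the line y = a + bx (the slides do not say how they are obtained). The function names are hypothetical; the data are the positive-correlation example from this deck.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def r_squared(xs, ys):
    """1 - (unexplained variation) / (total variation)."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


def r_squared_shortcut(xs, ys):
    """Same quantity from sums only: 1 - (Sy2 - a*Sy - b*Sxy) / (Sy2 - n*ybar^2)."""
    a, b = fit_line(xs, ys)
    n = len(ys)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return 1 - (sum_y2 - a * sum_y - b * sum_xy) / (sum_y2 - n * (sum_y / n) ** 2)


x = [5, 8, 10, 15, 17]
y = [10, 12, 16, 18, 20]
print(round(r_squared(x, y), 4), round(r_squared_shortcut(x, y), 4))
```

Both functions agree up to rounding, and for a simple linear fit the result equals the square of Pearson's r.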
The ranking is decided by using a set of ordinal rank numbers, with 1 for the individual observation ranked first, either in terms of quantity or quality, and n for the individual observation ranked last in a group of n pairs of observations. Mathematically, the coefficient is computed from the differences between the paired ranks, across three types of cases.
Advantages
This method is easy to understand and its application is simpler than Pearson's method.
This method is useful for correlation analysis when variables are expressed in qualitative terms like beauty, intelligence, honesty, efficiency, and so on.
This method is appropriate to measure the association between two variables if the data type is at least ordinal scaled (ranked).
The sample data of values of two variables is converted into ranks, either in ascending or descending order, for calculating the degree of correlation between the two variables.
Disadvantages
Values of both variables are assumed to be normally distributed and to describe a linear rather than a non-linear relationship.
A large computational time is required when the number of pairs of values of two variables exceeds 30.
This method cannot be applied to measure the association between two variables in grouped data.
For example, if two observations are ranked equal at third place, then the average rank of (3 + 4)/2 = 3.5 is assigned to these two observations.
Similarly, if three observations are ranked equal at third place, then the average rank of (3 + 4 + 5)/3 = 4 is assigned to these three observations.
\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \cdots\right]}{n\left(n^2 - 1\right)} \]
where m_i (i = 1, 2, 3, . . .) stands for the number of times an observation is repeated in the data set for both variables.
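The average-rank rule and the tie-corrected coefficient can be sketched together. Assumptions are labeled here and in the comments: rank 1 goes to the largest value, d is the difference between the paired ranks, the function names are hypothetical, and the scores in the usage lines are made up for illustration. Without ties the correction terms vanish and the formula reduces to R = 1 - 6 Σd² / (n(n² - 1)).

```python
from collections import Counter


def average_ranks(values):
    """Rank values with 1 = largest (assumption); ties share the average position."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_r(xs, ys):
    """Rank correlation with the (m^3 - m)/12 correction for each group of ties."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = sum(
        (m ** 3 - m) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
        if m > 1  # only repeated observations contribute
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Two observations tied at third place share rank (3 + 4) / 2 = 3.5.
print(average_ranks([50, 40, 30, 30, 20]))

# Hypothetical scores given by two judges to six candidates.
print(round(spearman_r([50, 40, 30, 30, 20, 10], [45, 48, 30, 25, 25, 12]), 4))
```

Sorting makes the ranking step O(n log n), so the hand-computation burden noted for more than 30 pairs does not apply when the calculation is automated.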