
Cramer's V


Description
Cramer's V is a way of calculating correlation in tables which are larger than 2 x 2 (more than two rows or columns). It is used as a post-test to determine the strength of association after chi-square has determined significance. V is calculated by first calculating chi-square, then using the following calculation:

V = SQRT( χ² / (n(k - 1)) )

...where χ² is the chi-square statistic, n is the total number of observations, and k is the smaller of the number of rows or the number of columns.
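As a quick illustration, here is a minimal Python sketch of the same calculation; the table contents are made-up numbers and the function name cramers_v is just for this example:

import numpy as np

def cramers_v(table):
    # Cramer's V for an r x c table of counts (illustrative sketch).
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: (row total x column total) / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape)  # the smaller of the number of rows and columns
    return np.sqrt(chi2 / (n * (k - 1)))

observed = [[10, 20, 30],
            [25, 15, 20]]   # hypothetical counts
print(cramers_v(observed))  # prints a value between 0 and 1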

Discussion
Chi-square says that there is a significant relationship between variables, but it does not say how strong that relationship is. Cramer's V is a post-test that gives this additional information. Cramer's V varies between 0 and 1. Values close to 0 indicate little association between the variables; values close to 1 indicate a strong association. Where the table is 2 x 2, use Phi instead. Cramer's V is named after the Swedish mathematician and statistician Harald Cramér.

Phi (φ) Correlation

Description
Phi (φ) correlation is used to assess the correlation between two variables where they form a 2 x 2 table (i.e. both variables are dichotomous). Phi is calculated by first calculating chi-square, then using the following calculation:

φ = SQRT( χ² / N )

...where χ² is the chi-square statistic and N is the total number of observations.

Discussion

Chi-square says that there is a significant relationship between variables, but it does not say how strong that relationship is. Phi correlation is a post-test that gives this additional information. Phi varies between -1 and 1. Values close to 0 indicate little association between the variables; values close to 1 indicate a strong positive association; values close to -1 indicate a strong negative association. Note that the SQRT(χ² / N) form yields only the magnitude of phi; the sign must be read from the direction of the association in the table. Remember that Phi is only of use in 2 x 2 tables. Where tables are larger, use Cramer's V.
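Because SQRT(χ² / N) can never be negative, the sign has to be recovered from the table itself. Below is a small Python sketch (with hypothetical counts) using the equivalent cross-product form of phi, which carries the sign:

import math

# Hypothetical 2 x 2 table:    col 1  col 2
a, b = 30, 10                # row 1
c, d = 15, 25                # row 2
n = a + b + c + d

# Signed phi via the cross-product form; |phi| equals SQRT(chi2 / N)
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
chi2 = n * phi ** 2          # rearranging phi = sqrt(chi2 / n)
print(phi, chi2)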

Chi-square test

Description
The chi-square (χ²) test measures the alignment between two sets of frequency measures. These must be categorical counts and not percentages or ratio measures (for those, use another correlation test). Note that the frequency counts should generally be at least 5 (although an occasional lower figure may be acceptable, as long as it is not part of a pattern of low figures).

Goodness of fit

A common use is to assess whether a measured/observed set of measures follows an expected pattern. The expected frequency may be determined from prior knowledge (such as a previous year's exam results) or by calculation of an average from the given data. The null hypothesis, H0, is that the two sets of measures are not significantly different.

Independence

The chi-square test can be used in the reverse manner to goodness of fit: just as you can show that two sets of measures align, you can also determine whether they do not align. The null hypothesis here is that the two sets of measures are similar. The main difference between goodness-of-fit and independence assessments is in the use of the Chi Square table. For goodness of fit, attention is on the 0.05, 0.01 or 0.001 figures. For independence, it is on the 0.95 or 0.99 figures (this is why the table has two ends to it).

Calculation
Chi-square, χ² = SUM( (observed - expected)² / expected ) = SUM( (fo - fe)² / fe )

...where fo is the observed frequency and fe is the expected frequency.

Note that the expected values may need to be scaled to be comparable to the observed values. A simple check is that the total frequency/count should be the same for observed and expected values. In a table, the expected frequency, if not known, may be estimated as:

fe = (row total) x (column total) / n

...where n is the grand total of all observations (the sum of all row totals, or equivalently all column totals). The result is used with a Chi Square table to determine whether the comparison shows significance. In a table, the degrees of freedom are:

df = (R - 1) x (C - 1)

...where R is the number of rows and C is the number of columns.
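A short Python sketch of this calculation, using the grade figures from the worked example below (the scaling step produces the 'scaled last year' row):

observed = [23, 32, 20, 15, 10]   # this year's grade counts, total 100
prior    = [25, 20, 15, 25, 10]   # last year's counts, total 95

# Scale the expected counts so both totals match
scale = sum(observed) / sum(prior)
expected = [f * scale for f in prior]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(round(chi2, 1), df)         # 12.1 with 4 degrees of freedom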

Example
Goodness of fit

English test grade distributions have changed from last year, with grade B's somewhat lower. Is this significant? The table below shows the calculation. First, the expected values are created by scaling last year's results to the same total as this year's. Then the test statistic is calculated as SUM((O - E)²/E).

English test results:

                      Grade A  Grade B  Grade C  Grade D  Grade E    Sum
This year, O             23       32       20       15       10      100
Last year                25       20       15       25       10       95
Scaled last year, E      26.3     21.1     15.8     26.3     10.5    100
O - E                    -3.3     10.9      4.2    -11.3     -0.5
(O - E)²                 11.0    119.8     17.7    128.0      0.3
(O - E)²/E                0.4      5.7      1.1      4.9      0.0     12.1

Chi-square is found to be 12.1 and the degrees of freedom are (5 - 1) = 4 (there are five possible grades). Looking this up in the Chi Square table shows the probability is between 5% (critical value 9.49) and 1% (critical value 13.28), so H0 is rejected and a significant change can be claimed.

Independence

A year group in school chooses between drama and history as below. Is there any difference between boys' and girls' choices?

Observed:

                 Boys   Girls   Total
Chose drama       43     52      95
Chose history     55     54     109
Total             98    106     204

Expected = (row total x column total) / overall total:

                 Boys   Girls   Total
Chose drama      45.6   49.4     95
Chose history    52.4   56.6    109
Total            98     106     204

(Observed - Expected)² / Expected:

                 Boys   Girls
Chose drama       0.2    0.1
Chose history     0.1    0.1

Sum = 0.55

Chi-square is 0.55. There is (2 - 1) x (2 - 1) = 1 degree of freedom. Checking the Chi Square table shows that 0.55 lies between the 0.95 critical value (0.004) and the 0.05 critical value (3.84), so no conclusion can be drawn about independence or similarity between boys' and girls' choices.
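The same independence test can be run with SciPy, if it is available; chi2_contingency computes the expected table and degrees of freedom itself (Yates' continuity correction is switched off here to match the hand calculation above):

from scipy.stats import chi2_contingency

observed = [[43, 52],   # chose drama:   boys, girls
            [55, 54]]   # chose history: boys, girls

chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 2), df, round(p, 2))   # ~0.55, 1, ~0.46: not significant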

Reporting
Chi-square is reported in the following form:

χ²(3, N = 125) = 10.2, p = .012

Where:

3 is the degrees of freedom
125 is the number of subjects in the sample
10.2 is the χ² test statistic
.012 is the probability of obtaining a result at least this extreme if the null hypothesis were true
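The exact p-value can be obtained from software rather than a printed table. A sketch with SciPy (assumed installed), using the chi-square and degrees of freedom from the grade example above:

from scipy.stats import chi2

p = chi2.sf(12.1, 4)   # survival function: P(chi-square >= 12.1) with 4 df
print(round(p, 3))     # ~0.017, consistent with 'between 5% and 1%' above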

Discussion
This test compares observed data with what we would expect to get if the null hypothesis of no difference were true. It is based on the principle that if two variables are not related (for example, gender is not related to deafness), then measures applied to each variable will give similar results (for example, about the same proportion of men and women being found to use a hearing aid), with any variation between the results being caused purely by chance. If the experimental measures are significantly different, then some relationship can be claimed.

Percentages do not work because the test depends on absolute counts, and the small fractional numbers that percentages produce distort the statistic. In practice, you can often get away with percentages by converting them into larger whole numbers.

The measurement is unusual in that the numerator is squared while the denominator is not. Squaring removes negatives and exaggerates outliers, which increases the sensitivity of chi-square to the difference between two data sets.

Note that the test only reports whether two sets of figures are similar. It says nothing about the nature of the similarity.

A chi-gram is a bar-chart plot of a set of chi-square calculations and can visually show how chi-square varies across a set of related measurements.

Where variables are dichotomous (i.e. can have only one of two values), McNemar's Q is a similar test that is customized for this circumstance.

Note that this test is called the 'Chi-square' test, not 'Chi-squared'. The Chi-square test is non-parametric.

See also
Chi Square table, Choosing a correlation test, McNemar's Q

Cramér's V
From Wikipedia, the free encyclopedia

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a popular[citation needed] measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]

Usage and interpretation

φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. Also, the order of rows/columns doesn't matter, so φc may be used with nominal data types or higher (ordered, numerical, etc.).

Cramér's V may also be applied to goodness-of-fit chi-squared models when there is a 1 x k table (e.g. r = 1). In this case k is taken as the number of optional outcomes and it functions as a measure of tendency towards a single outcome.

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when the two variables are equal to each other. φc² is the mean square canonical correlation between the variables[citation needed]. In the case of a 2 x 2 contingency table Cramér's V is equal to the Phi coefficient.

Note that as chi-squared values tend to increase with the number of cells, the greater the difference between r (rows) and c (columns), the more likely φc will tend to 1 without strong evidence of a meaningful correlation.[citation needed]

Calculation

Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and the minimum dimension minus 1. The formula for the φc coefficient is:

V = φc = SQRT( (χ² / n) / min(k - 1, r - 1) )

where:

φ is the phi coefficient,
χ² is derived from Pearson's chi-squared test,
n is the grand total of observations,
k is the number of columns, and
r is the number of rows.

The p-value for the significance of φc is the same one that is calculated using Pearson's chi-squared test[citation needed].

The formula for the variance of φc is known.[3]
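For a cross-check on any hand calculation, recent SciPy versions (1.7 or later) ship this measure directly as scipy.stats.contingency.association:

import numpy as np
from scipy.stats.contingency import association

observed = np.array([[10, 20, 30],
                     [25, 15, 20]])   # hypothetical counts
print(association(observed, method="cramer"))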

See also

Other measures of correlation for nominal data:

The phi coefficient
Tschuprow's T
The uncertainty coefficient
The Lambda coefficient

Other related articles:

Contingency table
Effect size

References

1. ^ Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press, p. 282. ISBN 0-691-08004-6.

2. ^ Sheskin, David J. (1997). Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.

3. ^ Liebetrau, Albert M. (1983). Measures of Association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32, pp. 15-16.

Cramér, H. (1999). Mathematical Methods of Statistics. Princeton University Press.

External links

A Measure of Association for Nonparametric Statistics (Alan C. Acock and Gordon R. Stavig, pp. 1381-1386)

Nominal Association: Phi, Contingency Coefficient, Tschuprow's T, Cramer's V, Lambda, Uncertainty Coefficient
