
# 11/4/2019 Correlation Test Between Two Variables in R - Easy Guides - Wiki - STHDA

STHDA
Statistical tools for high-throughput data analysis


Contents:

- What is correlation test?
- Install and load required R packages
- Methods for correlation analyses
- Correlation formula
  - Pearson correlation formula
  - Spearman correlation formula
  - Kendall correlation formula
- Compute correlation in R
  - R functions
  - Visualize your data using scatter plots
  - Preliminary test to check the test assumptions
  - Pearson correlation test
  - Interpretation of the result
  - Kendall rank correlation test
  - Spearman rank correlation coefficient
- Interpret correlation coefficient
- Online correlation coefficient calculator
- Summary
- Infos

## What is correlation test?

A correlation test is used to evaluate the association between two or more variables.

For instance, if we are interested in knowing whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question.

If there is no relationship between the two variables (father and son heights), the average height of sons should be the same regardless of the height of the fathers, and vice versa.


Here, we’ll describe the different correlation methods and provide practical examples using R software.

## Install and load required R packages

We’ll use the ggpubr R package for easy ggplot2-based data visualization. Install the development version from GitHub:

    if (!require(devtools)) install.packages("devtools")
    devtools::install_github("kassambara/ggpubr")

Or, install from CRAN as follows:

    install.packages("ggpubr")

Load the package:

    library("ggpubr")

## Methods for correlation analyses

There are different methods to perform correlation analysis:

- Pearson correlation (r), which measures the linear dependence between two variables (x and y). It’s also known as a parametric correlation test because it depends on the distribution of the data. It should be used only when x and y are normally distributed. The plot of y = f(x) is called the linear regression curve.

- Kendall tau and Spearman rho, which are rank-based (non-parametric) correlation coefficients.

The most commonly used method is the Pearson correlation method.

## Correlation formula

In the formulas below,

- x and y are two vectors of length n
- $m_x$ and $m_y$ correspond to the means of x and y, respectively.

## Pearson correlation formula

$$
r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}
$$
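As a quick check of the formula, the sketch below computes r term by term and compares it with R's built-in `cor()`. It assumes the built-in `mtcars` data set (used later in this guide) as example input; the variable names `mx`, `my` and `r_manual` are chosen here for illustration.

```r
# Pearson's r computed directly from the formula
x <- mtcars$wt
y <- mtcars$mpg

mx <- mean(x)  # mean of x
my <- mean(y)  # mean of y

# Numerator: sum of cross-products of deviations from the means.
# Denominator: square root of the product of the sums of squared deviations.
r_manual <- sum((x - mx) * (y - my)) /
  sqrt(sum((x - mx)^2) * sum((y - my)^2))

r_manual
cor(x, y, method = "pearson")  # should agree with r_manual
```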

The p-value (significance level) of the correlation can be determined:

1. by using the correlation coefficient table for the degrees of freedom df = n − 2, where n is the number of observations in the x and y variables;

2. or by calculating the t value as follows:

$$
t = \frac{r}{\sqrt{1 - r^2}} \sqrt{n - 2}
$$

In case 2, the corresponding p-value is determined using the t distribution table for df = n − 2.

If the p-value is < 5%, then the correlation between x and y is significant.
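The t-based p-value can be reproduced in a few lines. This is a minimal sketch, assuming a two-sided test and using `mtcars` as example data; `pt()` is the cumulative distribution function of the t distribution, standing in for the printed table.

```r
x <- mtcars$wt
y <- mtcars$mpg

n <- length(x)      # number of paired observations
r <- cor(x, y)      # Pearson correlation coefficient
dof <- n - 2        # degrees of freedom

# t statistic from the formula above
t_stat <- r / sqrt(1 - r^2) * sqrt(n - 2)

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(-abs(t_stat), df = dof)

c(t = t_stat, df = dof, p.value = p_value)

# cor.test() reports the same statistic and p-value
res_check <- cor.test(x, y, method = "pearson")
```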

## Spearman correlation formula

The Spearman correlation method computes the correlation between the rank of x and the rank of y variables.

$$
\rho = \frac{\sum (x' - m_{x'})(y' - m_{y'})}{\sqrt{\sum (x' - m_{x'})^2 \sum (y' - m_{y'})^2}}
$$

where $x' = rank(x)$ and $y' = rank(y)$.
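In other words, Spearman's rho is the Pearson correlation applied to ranks. The sketch below illustrates this on `mtcars` (an example data set chosen here); `rank()` assigns average ranks to ties, which is the same convention `cor()` uses for `method = "spearman"`.

```r
x <- mtcars$wt
y <- mtcars$mpg

rx <- rank(x)  # ranks of x (ties receive their average rank)
ry <- rank(y)  # ranks of y

# Pearson correlation of the ranks
rho_manual <- cor(rx, ry, method = "pearson")

rho_manual
cor(x, y, method = "spearman")  # should agree with rho_manual
```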

## Kendall correlation formula

The Kendall correlation method measures the correspondence between the ranking of x and y variables. The total number of possible pairings of x with y
observations is n(n − 1)/2 , where n is the size of x and y.

The procedure is as follows:

Begin by ordering the pairs by the x values. If x and y are correlated, then they would have the same relative rank orders.

Now, for each $y_i$, count the number of $y_j > y_i$ with $j > i$ (concordant pairs, c) and the number of $y_j < y_i$ with $j > i$ (discordant pairs, d).

$$
\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}
$$

Where,

- $n_c$: total number of concordant pairs
- $n_d$: total number of discordant pairs
- $n$: size of x and y
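The pair-counting procedure can be sketched with a double loop. The five data points below are hypothetical values chosen for illustration and contain no ties; with tied values, R's `cor(method = "kendall")` applies a tie correction, so the simple formula above would no longer match it exactly.

```r
# Small tie-free example (hypothetical values)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

n <- length(x)
nc <- 0  # number of concordant pairs
nd <- 0  # number of discordant pairs

# Visit each of the n(n - 1)/2 pairs once
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    s <- sign(x[j] - x[i]) * sign(y[j] - y[i])
    if (s > 0) nc <- nc + 1  # x and y move in the same direction
    if (s < 0) nd <- nd + 1  # x and y move in opposite directions
  }
}

tau_manual <- (nc - nd) / (n * (n - 1) / 2)

c(nc = nc, nd = nd, tau = tau_manual)  # nc = 8, nd = 2, tau = 0.6
cor(x, y, method = "kendall")          # 0.6 as well, since there are no ties
```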

## Compute correlation in R

## R functions

The correlation coefficient can be computed using the functions cor() or cor.test():

- cor() computes the correlation coefficient
- cor.test() tests for association/correlation between paired samples. It returns both the correlation coefficient and the significance level (or p-value) of the correlation.

The simplified formats are:

    cor(x, y, method = c("pearson", "kendall", "spearman"))
    cor.test(x, y, method = c("pearson", "kendall", "spearman"))

- x, y: numeric vectors with the same length
- method: correlation method

If your data contain missing values, cor() can handle them by case-wise deletion through its use argument, for example use = "complete.obs".
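A minimal sketch of case-wise deletion, using two short hypothetical vectors with missing values. `use = "complete.obs"` is a documented option of `cor()`; `cor.test()` has no such argument because it drops incomplete pairs itself.

```r
# Hypothetical data with missing values
x <- c(1.2, 2.5, NA, 4.1, 5.0, 6.3)
y <- c(2.0, NA, 3.5, 4.4, 5.9, 6.1)

cor(x, y)  # NA: by default, any missing value propagates to the result

# Keep only the pairs where both x and y are observed
r_complete <- cor(x, y, method = "pearson", use = "complete.obs")
r_complete

# cor.test() removes incomplete pairs before testing
r_test <- cor.test(x, y, method = "pearson")$estimate
r_test
```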

## Import your data into R

1. Prepare your data as specified here: Best practices for preparing your data set for R

2. Save your data in an external .txt tab or .csv file

3. Import your data into R as follows:

       # If .txt tab file, use this
       my_data <- read.delim(file.choose())
       # Or, if .csv file, use this
       my_data <- read.csv(file.choose())

Here, we’ll use the built-in R data set mtcars as an example.

We want to compute the correlation between the mpg and wt variables in the mtcars data set. The R code below loads the data and shows its first rows:

    my_data <- mtcars
    head(my_data, 6)

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

## Visualize your data using scatter plots

To use R base graphs, see: scatter plot - R base graphs. Here, we’ll use the ggpubr R package.

    library("ggpubr")
    ggscatter(my_data, x = "mpg", y = "wt",
              add = "reg.line", conf.int = TRUE,
              cor.coef = TRUE, cor.method = "pearson",
              xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)")
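For readers who prefer base R graphics, a rough equivalent of the plot above might look like the sketch below. It is not taken from the original guide: the point style, the line color and the title showing r are choices made here, and no confidence band is drawn.

```r
my_data <- mtcars

# Scatter plot of weight against miles per gallon
plot(my_data$mpg, my_data$wt,
     xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)",
     pch = 19)

# Add the least-squares regression line of wt on mpg
fit <- lm(wt ~ mpg, data = my_data)
abline(fit, col = "blue")

# Show the Pearson correlation coefficient in the title
r <- cor(my_data$mpg, my_data$wt, method = "pearson")
title(main = paste("r =", round(r, 2)))
```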

## Preliminary test to check the test assumptions

1. Is the covariation linear? Yes, from the plot above, the relationship is linear. In situations where the scatter plots show curved patterns, we are dealing with a nonlinear association between the two variables.

2. Do the data from each of the 2 variables (x, y) follow a normal distribution?
   - Use the Shapiro-Wilk normality test: R function shapiro.test()
   - and look at the normality plot: R function ggpubr::ggqqplot()

The Shapiro-Wilk test can be performed as follows:

Null hypothesis: the data are normally distributed.
Alternative hypothesis: the data are not normally distributed.

    # Shapiro-Wilk normality test for mpg
    shapiro.test(my_data$mpg) # => p = 0.1229
    # Shapiro-Wilk normality test for wt
    shapiro.test(my_data$wt) # => p = 0.09

From the output, the two p-values are greater than the significance level 0.05, implying that the distribution of the data is not significantly different from a normal distribution. In other words, we can assume normality.

Visual inspection of the data normality using Q-Q plots (quantile-quantile plots). A Q-Q plot draws the correlation between a given sample and the normal distribution.

    library("ggpubr")
    # mpg
    ggqqplot(my_data$mpg, ylab = "MPG")
    # wt
    ggqqplot(my_data$wt, ylab = "WT")


From the normality plots, we conclude that both populations may come from normal distributions.

Note that, if the data are not normally distributed, it’s recommended to use a non-parametric correlation, such as the Spearman or Kendall rank-based correlation tests.
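One way to turn this recommendation into code is sketched below. The decision rule is an assumption made here for illustration (0.05 cutoff on both Shapiro-Wilk p-values, with Spearman as the fallback), not a rule stated by the guide; in practice the Q-Q plots should be considered as well.

```r
my_data <- mtcars

# Shapiro-Wilk p-values for the two variables
p_mpg <- shapiro.test(my_data$mpg)$p.value
p_wt  <- shapiro.test(my_data$wt)$p.value

alpha <- 0.05  # assumed significance level

# Use Pearson only if neither variable departs significantly from normality
chosen_method <- if (p_mpg > alpha && p_wt > alpha) "pearson" else "spearman"
chosen_method

cor.test(my_data$wt, my_data$mpg, method = chosen_method)
```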

## Pearson correlation test

Correlation test between mpg and wt variables:

    res <- cor.test(my_data$wt, my_data$mpg,
                    method = "pearson")
    res

        Pearson's product-moment correlation

    data:  my_data$wt and my_data$mpg
    t = -9.559, df = 30, p-value = 1.294e-10
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.9338264 -0.7440872
    sample estimates:
           cor
    -0.8676594

In the result above:

- t is the t-test statistic value (t = -9.559),
- df is the degrees of freedom (df = 30),
- p-value is the significance level of the t-test (p-value = $1.294 \times 10^{-10}$),
- conf.int is the confidence interval of the correlation coefficient at 95% (conf.int = [-0.9338, -0.7441]),
- sample estimates is the correlation coefficient (cor.coeff = -0.87).

## Interpretation of the result

The p-value of the test is $1.294 \times 10^{-10}$, which is less than the significance level alpha = 0.05. We can conclude that wt and mpg are significantly correlated, with a correlation coefficient of -0.87 and a p-value of $1.294 \times 10^{-10}$.

The function cor.test() returns a list containing, among others, the following components: p.value (the p-value of the test) and estimate (the correlation coefficient).

    # Extract the p-value
    res$p.value

    [1] 1.293959e-10

    # Extract the correlation coefficient
    res$estimate

           cor
    -0.8676594
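The other values printed by the test can be extracted the same way. The sketch below recomputes the test on `mtcars` so that it runs on its own; `statistic`, `parameter` and `conf.int` are documented components of the object returned by `cor.test()`.

```r
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

names(res)     # all available components

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$conf.int   # 95% confidence interval of the correlation coefficient
```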

## Kendall rank correlation test

The Kendall rank correlation coefficient, or Kendall’s tau statistic, is used to estimate a rank-based measure of association. This test may be used if the data do not necessarily come from a bivariate normal distribution.

    res2 <- cor.test(my_data$wt, my_data$mpg,
                     method = "kendall")
    res2

        Kendall's rank correlation tau

    data:  my_data$wt and my_data$mpg
    z = -5.7981, p-value = 6.706e-09
    alternative hypothesis: true tau is not equal to 0
    sample estimates:
           tau
    -0.7278321

tau is the Kendall correlation coefficient.

The correlation coefficient between x and y is -0.7278 and the p-value is $6.706 \times 10^{-9}$.

## Spearman rank correlation coefficient

Spearman’s rho statistic is also used to estimate a rank-based measure of association. This test may be used if the data do not come from a bivariate normal distribution.

    res2 <- cor.test(my_data$wt, my_data$mpg,
                     method = "spearman")
    res2

        Spearman's rank correlation rho

    data:  my_data$wt and my_data$mpg
    S = 10292, p-value = 1.488e-11
    alternative hypothesis: true rho is not equal to 0
    sample estimates:
          rho
    -0.886422

rho is Spearman’s correlation coefficient.

The correlation coefficient between x and y is -0.8864 and the p-value is $1.488 \times 10^{-11}$.

## Interpret correlation coefficient

The correlation coefficient ranges between -1 and 1:

- -1 indicates a strong negative correlation: this means that every time x increases, y decreases (left panel figure)
- 0 means that there is no association between the two variables (x and y) (middle panel figure)
- 1 indicates a strong positive correlation: this means that y increases with x (right panel figure)
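The three cases can be illustrated with simulated data. This is only a sketch: the vectors below are made up for the purpose, and the "no association" case uses random noise, so its coefficient is close to 0 but not exactly 0.

```r
set.seed(123)  # make the random example reproducible

x <- 1:20

y_neg  <- -2 * x + 5   # y decreases exactly linearly as x increases
y_pos  <- 3 * x + 1    # y increases exactly linearly with x
y_none <- rnorm(20)    # random noise, generated independently of x

r_neg  <- cor(x, y_neg)   # -1
r_pos  <- cor(x, y_pos)   # 1
r_none <- cor(x, y_none)  # somewhere between -1 and 1, typically near 0

c(negative = r_neg, none = r_none, positive = r_pos)
```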

## Online correlation coefficient calculator

You can compute a correlation test between two variables online, without any installation, using the following tool:

Correlation coefficient calculator

## Summary

- Use the function cor.test(x, y) to compute the correlation coefficient between two variables and to get the significance level of the correlation.
- Three correlation methods are available through cor.test(x, y): pearson, kendall, spearman.
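To compare the three methods side by side, a short loop is enough. The sketch below uses `mtcars` as in the examples above; `suppressWarnings()` is added here because the rank-based tests warn that exact p-values cannot be computed when the data contain ties.

```r
methods <- c("pearson", "kendall", "spearman")

# Run cor.test() once per method and keep the estimated coefficient
estimates <- sapply(methods, function(m) {
  res <- suppressWarnings(cor.test(mtcars$wt, mtcars$mpg, method = m))
  unname(res$estimate)
})

round(estimates, 4)
#  pearson  kendall spearman
#  -0.8677  -0.7278  -0.8864
```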

## Infos

This analysis has been performed using R software (ver. 3.2.4).
