11/4/2019 Correlation Test Between Two Variables in R - Easy Guides - Wiki - STHDA


 Correlation Test Between Two Variables in R


What is correlation test?
Install and load required R packages
Methods for correlation analyses
Correlation formula
  Pearson correlation formula
  Spearman correlation formula
  Kendall correlation formula
Compute correlation in R
  R functions
  Import your data into R
  Visualize your data using scatter plots
  Preliminary test to check the test assumptions
  Pearson correlation test
    Interpretation of the result
    Access to the values returned by cor.test() function
  Kendall rank correlation test
  Spearman rank correlation coefficient
Interpret correlation coefficient
Online correlation coefficient calculator
Summary
Infos

What is correlation test?

A correlation test is used to evaluate the association between two or more variables.

For instance, if we want to know whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question.

If there is no relationship between the two variables (father and son heights), the average height of sons should be the same regardless of the height of the fathers, and vice versa.

Here, we’ll describe the different correlation methods and provide practical examples using R software.

Install and load required R packages


We’ll use the ggpubr R package for easy ggplot2-based data visualization.

Install the latest version from GitHub as follows (recommended):

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

Or, install from CRAN as follows:

install.packages("ggpubr")

Load ggpubr as follows:

library("ggpubr")

Methods for correlation analyses


There are different methods to perform correlation analysis:

Pearson correlation (r), which measures the linear dependence between two variables (x and y). It’s also known as a parametric correlation test because it depends on the distribution of the data. It should be used only when x and y come from a normal distribution. The plot of y = f(x) is called the linear regression curve.

Kendall tau and Spearman rho, which are rank-based (non-parametric) correlation coefficients.

The most commonly used method is the Pearson correlation method.

Correlation formula
In the formulas below,

x and y are two vectors of length n
mx and my correspond to the means of x and y, respectively.

Pearson correlation formula


$$
r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}
$$

mx and my are the means of x and y variables.

The p-value (significance level) of the correlation can be determined:

1. by using the correlation coefficient table for the degrees of freedom: df = n − 2, where n is the number of observations in the x and y variables.

2. or by calculating the t value as follows:


$$
t = \frac{r}{\sqrt{1 - r^2}} \sqrt{n - 2}
$$

In case 2, the corresponding p-value is determined using the t distribution table for df = n − 2.

If the p-value is < 5%, then the correlation between x and y is significant.
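
As a minimal sketch, the Pearson formula and the t statistic above can be reproduced by hand and compared with R's built-in functions. The example assumes the built-in mtcars data set (wt and mpg columns); the variable names r, t_stat and p_value are introduced here for illustration only.

```r
# Sketch: Pearson r and its t statistic computed from the formulas above
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

mx <- mean(x)
my <- mean(y)

# r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
r <- sum((x - mx) * (y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))

# t = r / sqrt(1 - r^2) * sqrt(n - 2), with df = n - 2
t_stat <- r / sqrt(1 - r^2) * sqrt(n - 2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Compare with the built-in functions
all.equal(r, cor(x, y))
all.equal(p_value, cor.test(x, y)$p.value)
```

Both comparisons should return TRUE, which suggests that cor.test() with method = "pearson" relies on the same t-based calculation.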

Spearman correlation formula


The Spearman correlation method computes the correlation between the rank of x and the rank of y variables.
$$
\rho = \frac{\sum (x' - m_{x'})(y' - m_{y'})}{\sqrt{\sum (x' - m_{x'})^2 \sum (y' - m_{y'})^2}}
$$

Where x′ = rank(x) and y′ = rank(y).
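
A short check of this definition, assuming the built-in mtcars data set: Spearman's rho should equal the Pearson correlation computed on the ranks. The names rho_manual and rho_builtin are illustrative.

```r
# Sketch: Spearman's rho as the Pearson correlation of the ranks
x <- mtcars$wt
y <- mtcars$mpg

# rank() assigns average ranks to tied values by default
rho_manual  <- cor(rank(x), rank(y), method = "pearson")
rho_builtin <- cor(x, y, method = "spearman")

all.equal(rho_manual, rho_builtin)
```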

Kendall correlation formula


The Kendall correlation method measures the correspondence between the rankings of the x and y variables. The total number of possible pairings of x with y observations is n(n − 1)/2, where n is the size of x and y.

The procedure is as follows:

Begin by ordering the pairs by the x values. If x and y are correlated, then they would have the same relative rank orders.

Now, for each yi, count among the observations that follow it (j > i) the number of yj > yi (concordant pairs (c)) and the number of yj < yi (discordant pairs (d)).

The Kendall correlation coefficient is defined as follows:

$$
\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)}
$$

Where,

nc : total number of concordant pairs
nd : total number of discordant pairs
n : size of x and y
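
The counting procedure can be sketched as follows. The toy vectors are hypothetical values chosen to contain no ties; with tied values, the result of this plain formula can differ from cor(), which adjusts for ties.

```r
# Sketch: Kendall's tau by counting concordant and discordant pairs
# (hypothetical tie-free toy data, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# Order the y values by increasing x
y_ord <- y[order(x)]

nc <- 0  # number of concordant pairs
nd <- 0  # number of discordant pairs
for (i in seq_len(n - 1)) {
  later <- y_ord[(i + 1):n]          # observations that follow y_i
  nc <- nc + sum(later > y_ord[i])
  nd <- nd + sum(later < y_ord[i])
}

# tau = (nc - nd) / (n(n - 1) / 2)
tau <- (nc - nd) / (n * (n - 1) / 2)
tau

# Built-in equivalent on the same data
cor(x, y, method = "kendall")
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so tau = (8 − 2)/10 = 0.6.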

Compute correlation in R

R functions
The correlation coefficient can be computed using the functions cor() or cor.test():

cor() computes the correlation coefficient

cor.test() tests for association/correlation between paired samples. It returns both the correlation coefficient and the significance level (or p-value) of the correlation.

The simplified formats are:

cor(x, y, method = c("pearson", "kendall", "spearman"))


cor.test(x, y, method=c("pearson", "kendall", "spearman"))

x, y: numeric vectors with the same length


method: correlation method

 If your data contain missing values, use the following R code to handle missing values by case-wise deletion.

cor(x, y, method = "pearson", use = "complete.obs")
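
A small illustration of the difference, using hypothetical vectors with one missing value each. With the default settings cor() returns NA as soon as a value is missing, while use = "complete.obs" keeps only the pairs in which both values are present.

```r
# Sketch: effect of missing values on cor() (hypothetical toy data)
x <- c(2.1, 3.4, NA, 5.0, 6.2, 7.7)
y <- c(1.0, 2.2, 3.1, NA, 5.9, 7.4)

# Default behaviour: the result is NA
cor(x, y)

# Case-wise deletion: pairs with a missing value are dropped
cor(x, y, method = "pearson", use = "complete.obs")

# Equivalent manual filtering of the complete pairs
keep <- complete.cases(x, y)
cor(x[keep], y[keep])
```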

Import your data into R


1. Prepare your data as specified here: Best practices for preparing your data set for R

2. Save your data in an external tab-delimited .txt file or a .csv file

3. Import your data into R as follows:

# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
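
Because file.choose() opens an interactive file dialog, a script that must run unattended can pass a path instead. The sketch below is an assumption-laden illustration: it writes two columns of mtcars to temporary files (the paths tmp_csv and tmp_txt are hypothetical) and reads them back with the same import functions.

```r
# Sketch: non-interactive import using explicit file paths
# (temporary files stand in for your own data files)
tmp_csv <- tempfile(fileext = ".csv")
tmp_txt <- tempfile(fileext = ".txt")

example_data <- mtcars[, c("mpg", "wt")]
write.csv(example_data, tmp_csv, row.names = FALSE)
write.table(example_data, tmp_txt, sep = "\t", row.names = FALSE, quote = FALSE)

# .csv file
my_csv <- read.csv(tmp_csv)
# tab-delimited .txt file
my_txt <- read.delim(tmp_txt)

str(my_csv)
str(my_txt)
```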

Here, we’ll use the built-in R data set mtcars as an example.

The R code below loads the mtcars data set and displays its first six rows:

my_data <- mtcars
head(my_data, 6)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

 We want to compute the correlation between mpg and wt variables.

Visualize your data using scatter plots


To use R base graphs, click this link: scatter plot - R base graphs. Here, we’ll use the ggpubr R package.

library("ggpubr")
ggscatter(my_data, x = "mpg", y = "wt",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)")
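
For readers who prefer R base graphs, a rough equivalent can be sketched as below. This is not the ggpubr output: it draws the points, adds a least-squares line fitted with lm(), and prints the Pearson coefficient in the corner. The object names fit and r are illustrative.

```r
# Sketch: base R scatter plot with a regression line and the Pearson coefficient
my_data <- mtcars

plot(my_data$mpg, my_data$wt,
     xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)",
     pch = 19)

# Least-squares regression line of wt on mpg
fit <- lm(wt ~ mpg, data = my_data)
abline(fit, col = "blue")

# Annotate the plot with the correlation coefficient
r <- cor(my_data$mpg, my_data$wt, method = "pearson")
legend("topright", legend = paste("R =", round(r, 2)), bty = "n")
```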

Preliminary test to check the test assumptions


1. Is the covariation linear? Yes, from the plot above, the relationship is linear. In situations where the scatter plot shows curved patterns, we are dealing with a nonlinear association between the two variables.

2. Do the data from each of the 2 variables (x, y) follow a normal distribution?
Use the Shapiro-Wilk normality test -> R function: shapiro.test()
and look at the normality plot -> R function: ggpubr::ggqqplot()

The Shapiro-Wilk test can be performed as follows:

Null hypothesis: the data are normally distributed
Alternative hypothesis: the data are not normally distributed

# Shapiro-Wilk normality test for mpg
shapiro.test(my_data$mpg) # => p = 0.1229
# Shapiro-Wilk normality test for wt
shapiro.test(my_data$wt) # => p = 0.09

From the output, the two p-values are greater than the significance level 0.05, implying that the distributions of the data are not significantly different from a normal distribution. In other words, we can assume normality.

Visual inspection of the data normality using Q-Q plots (quantile-quantile plots). A Q-Q plot draws the correlation between a given sample and the normal distribution.

library("ggpubr")
# mpg
ggqqplot(my_data$mpg, ylab = "MPG")
# wt
ggqqplot(my_data$wt, ylab = "WT")



 From the normality plots, we conclude that both populations may come from normal distributions.

 Note that, if the data are not normally distributed, it’s recommended to use the non-parametric correlation, including Spearman and Kendall rank-based
correlation tests.
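
One possible way to turn this recommendation into code is sketched below. The decision rule (Pearson if both Shapiro-Wilk p-values exceed alpha, Spearman otherwise) is a simplification introduced here for illustration, not a rule prescribed by the text, and it does not replace looking at the Q-Q plots.

```r
# Sketch: choose the correlation method from the Shapiro-Wilk results
# (simplified, hypothetical decision rule)
my_data <- mtcars
alpha <- 0.05

p_mpg <- shapiro.test(my_data$mpg)$p.value
p_wt  <- shapiro.test(my_data$wt)$p.value

# Use Pearson only if normality is not rejected for both variables
method <- if (p_mpg > alpha && p_wt > alpha) "pearson" else "spearman"
method

res <- cor.test(my_data$wt, my_data$mpg, method = method)
res$estimate
```

For mpg and wt, neither test rejects normality at the 0.05 level, so this rule selects the Pearson method.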

Pearson correlation test


Correlation test between mpg and wt variables:

res <- cor.test(my_data$wt, my_data$mpg,
                method = "pearson")
res

	Pearson's product-moment correlation

data:  my_data$wt and my_data$mpg
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9338264 -0.7440872
sample estimates:
       cor
-0.8676594

In the result above:

t is the t-test statistic value (t = -9.559),
df is the degrees of freedom (df = 30),
p-value is the significance level of the t-test (p-value = 1.294 × 10^{-10}),
conf.int is the confidence interval of the correlation coefficient at 95% (conf.int = [-0.9338, -0.7441]),
sample estimates is the correlation coefficient (Cor.coeff = -0.87).

Interpretation of the result

The p-value of the test is 1.294 × 10^{-10}, which is less than the significance level alpha = 0.05. We can conclude that wt and mpg are significantly correlated, with a correlation coefficient of -0.87 and a p-value of 1.294 × 10^{-10}.

Access to the values returned by cor.test() function


The function cor.test() returns a list containing the following components:

p.value: the p-value of the test
estimate: the correlation coefficient

# Extract the p.value
res$p.value

[1] 1.293959e-10

# Extract the correlation coefficient
res$estimate

       cor
-0.8676594
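
The returned list holds further components besides these two. As a brief sketch, the test statistic, the degrees of freedom and the confidence interval of the Pearson test can be extracted in the same way; names(res) lists everything that is available.

```r
# Sketch: other components of the object returned by cor.test()
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

# All available components
names(res)

# t statistic and degrees of freedom
res$statistic
res$parameter

# 95% confidence interval of the correlation coefficient
res$conf.int

# Correlation coefficient as a plain number, without the "cor" label
unname(res$estimate)
```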

Kendall rank correlation test


The Kendall rank correlation coefficient, or Kendall’s tau statistic, is used to estimate a rank-based measure of association. This test may be used if the data do not necessarily come from a bivariate normal distribution.

res2 <- cor.test(my_data$wt, my_data$mpg, method = "kendall")
res2

	Kendall's rank correlation tau

data:  my_data$wt and my_data$mpg
z = -5.7981, p-value = 6.706e-09
alternative hypothesis: true tau is not equal to 0
sample estimates:
       tau
-0.7278321

tau is the Kendall correlation coefficient.

The correlation coefficient between x and y is -0.7278 and the p-value is 6.706 × 10^{-9}.

Spearman rank correlation coefficient


Spearman’s rho statistic is also used to estimate a rank-based measure of association. This test may be used if the data do not come from a bivariate normal
distribution.

res2 <- cor.test(my_data$wt, my_data$mpg, method = "spearman")
res2

	Spearman's rank correlation rho

data:  my_data$wt and my_data$mpg
S = 10292, p-value = 1.488e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
-0.886422

rho is the Spearman correlation coefficient.

The correlation coefficient between x and y is -0.8864 and the p-value is 1.488 × 10^{-11}.

Interpret correlation coefficient


The correlation coefficient lies between -1 and 1:

-1 indicates a perfect (the strongest possible) negative correlation: every time x increases, y decreases (left panel of the figure)
0 means that there is no linear association between the two variables (x and y) (middle panel of the figure)
1 indicates a perfect (the strongest possible) positive correlation: y increases with x (right panel of the figure)
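
The three reference values can be reproduced with a few hypothetical toy vectors. The last case is a reminder that a Pearson coefficient of 0 only rules out a linear trend: y = x^2 depends entirely on x, yet its correlation with x is 0 on these symmetric values.

```r
# Sketch: reference values of the Pearson coefficient (hypothetical toy data)
x <- c(-2, -1, 0, 1, 2)

# Perfect negative linear relationship: equals -1 up to floating-point error
r_neg <- cor(x, -3 * x + 1)

# Perfect positive linear relationship: equals 1 up to floating-point error
r_pos <- cor(x, 2 * x + 5)

# Nonlinear but symmetric relationship: no linear trend, coefficient of 0
r_zero <- cor(x, x^2)

c(negative = r_neg, positive = r_pos, none = r_zero)
```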

Online correlation coefficient calculator


You can compute a correlation test between two variables online, without any installation, using the following link:

Correlation coefficient calculator

Summary

Use the function cor.test(x, y) to analyze the correlation coefficient between two variables and to get the significance level of the correlation.
Three correlation methods are available in cor.test(x, y): pearson, kendall, spearman
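
As a closing sketch, the three methods can be run side by side on the wt and mpg columns of mtcars. The helper structure (methods, results) is illustrative; suppressWarnings() is used here on the assumption that the rank-based tests warn about tied values in these data.

```r
# Sketch: compare the three correlation methods on the same pair of variables
methods <- c("pearson", "kendall", "spearman")

results <- sapply(methods, function(m) {
  # Rank-based tests may warn that exact p-values cannot be computed with ties
  res <- suppressWarnings(cor.test(mtcars$wt, mtcars$mpg, method = m))
  c(estimate = unname(res$estimate), p.value = res$p.value)
})

# One row per method: coefficient and p-value
t(results)
```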

Infos

 This analysis has been performed using R software (ver. 3.2.4).

