STHDA
Statistical tools for high-throughput data analysis
Correlation Test Between Two Variables in R
Compute correlation in R
R functions
Import your data into R
Visualize your data using scatter plots
Preliminary test to check the test assumptions
Pearson correlation test
Interpretation of the result
Access to the values returned by cor.test() function
A correlation test is used to evaluate the association between two or more variables.
For instance, if we want to know whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to
answer this question.
If there is no relationship between the two variables (father and son heights), the average height of sons should be the same regardless of the height of
the fathers, and vice versa.
www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r 1/11
11/4/2019 Correlation Test Between Two Variables in R - Easy Guides - Wiki - STHDA
Here, we'll describe the different correlation methods and provide practical examples using R software.
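# Option 1: install the latest development version of ggpubr from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
# Option 2: install the released version from CRAN instead
install.packages("ggpubr")
# Load the package
library("ggpubr")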
Pearson correlation (r), which measures the linear dependence between two variables (x and y). It's also known as a parametric correlation test because it
depends on the distribution of the data. It should be used only when x and y come from a normal distribution. The plot of y = f(x) is called the linear regression
curve.
Kendall tau and Spearman rho, which are rank-based correlation coefficients (non-parametric).
Correlation formula
The p-value (significance level) of a Pearson correlation coefficient r can be determined in two ways:
1. by using the correlation coefficient table for the degrees of freedom df = n − 2, where n is the number of observations in the x and y variables;
2. by computing a t statistic from r and n.
In case 2, the corresponding p-value is determined using the t distribution table for df = n − 2.
If the p-value is < 5%, then the correlation between x and y is significant.
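For reference, the significance test above rests on two standard textbook definitions, written out here as a sketch from general statistics rather than as a quotation from this page. The symbols m_x and m_y (the sample means of x and y) are notation assumed for this illustration.

```latex
% Pearson correlation coefficient of two samples x and y of size n
r = \frac{\sum_{i=1}^{n} (x_i - m_x)(y_i - m_y)}
         {\sqrt{\sum_{i=1}^{n} (x_i - m_x)^2 \; \sum_{i=1}^{n} (y_i - m_y)^2}}

% t statistic used to test r against zero, with df = n - 2
t = \frac{r}{\sqrt{1 - r^2}} \, \sqrt{n - 2}
```

Under the null hypothesis of no correlation, t follows a t distribution with n − 2 degrees of freedom, which is why the same df appears in both ways of obtaining the p-value.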
The Kendall correlation method measures the correspondence between the rankings of the x and y variables. The total number of possible pairings of x with y
observations is n(n − 1)/2, where n is the size of x and y.
Begin by ordering the pairs by the x values. If x and y are correlated, then they would have the same relative rank orders.
Now, for each y_i, count among the later values y_j (j > i) the number of y_j > y_i (concordant pairs (c)) and the number of y_j < y_i (discordant pairs (d)).
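$$\tau = \frac{n_c - n_d}{\frac{1}{2}\,n(n - 1)}$$
Where,
n_c: total number of concordant pairs
n_d: total number of discordant pairs
n: size of x and y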
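The counting procedure described above can be sketched in a few lines of R. The vectors below are hypothetical, tie-free toy data, and kendall_tau() is a helper written for this illustration, not a function from R or from this page. The result is compared with the built-in cor(..., method = "kendall").

```r
# Hypothetical tie-free toy data (not taken from the article)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7)
y <- c(2.0, 1.5, 3.9, 3.0, 5.5, 6.1)

# Illustrative helper: Kendall's tau from concordant/discordant counts
kendall_tau <- function(x, y) {
  n <- length(x)
  idx <- combn(n, 2)              # all n(n-1)/2 index pairs (i, j) with i < j
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  nc <- sum(dx * dy > 0)          # pairs ordered the same way in x and y
  nd <- sum(dx * dy < 0)          # pairs ordered in opposite ways
  (nc - nd) / (n * (n - 1) / 2)
}

tau_manual  <- kendall_tau(x, y)
tau_builtin <- cor(x, y, method = "kendall")

print(c(manual = tau_manual, builtin = tau_builtin))
```

With these six points there are 15 pairs, 13 concordant and 2 discordant, so tau = 11/15. The two results agree here because the data contain no ties; with tied values the built-in function applies a tie correction that this simple count does not.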
Compute correlation in R
R functions
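The correlation coefficient can be computed using the functions cor() or cor.test(). cor() returns only the coefficient; cor.test() also tests it for significance and returns a p-value.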
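As a minimal sketch of the two calls, assuming the built-in mtcars data set used in the rest of this page. The functions and arguments are those of base R's stats package; the variable names x, y, r and res are only illustrative.

```r
# Two numeric vectors of equal length, taken from the built-in mtcars data
x <- mtcars$mpg
y <- mtcars$wt

# cor() returns only the coefficient; method is one of
# "pearson" (the default), "kendall" or "spearman"
r <- cor(x, y, method = "pearson")

# cor.test() returns the coefficient together with a test of
# the null hypothesis that the true correlation is 0
res <- cor.test(x, y, method = "pearson")

print(r)
print(res$estimate)  # same coefficient as r
print(res$p.value)   # significance level of the test
```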
If your data contain missing values, use the following R code to handle missing values by case-wise deletion.
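One way to do this, sketched here with hypothetical toy vectors, is the use argument of cor(). The manual version with complete.cases() is included only to show what case-wise deletion means.

```r
# Hypothetical toy data with one missing value in each vector
x <- c(1.0, 2.0, NA, 4.0, 5.0, 6.0)
y <- c(2.1, 3.9, 6.2, NA, 9.8, 12.1)

# With the default settings a single NA makes the result NA
r_default <- cor(x, y)

# use = "complete.obs" drops every pair in which x or y is missing
r_complete <- cor(x, y, method = "pearson", use = "complete.obs")

# The same deletion done by hand
keep <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep])

print(c(default = r_default, complete = r_complete, manual = r_manual))
```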
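The R code below draws a scatter plot of the mpg and wt variables in the mtcars data set and annotates it with their Pearson correlation:
library("ggpubr")
# my_data holds the built-in mtcars data set
my_data <- mtcars
# Scatter plot with regression line, confidence band and correlation coefficient
ggscatter(my_data, x = "mpg", y = "wt",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Miles/(US) gallon", ylab = "Weight (1000 lbs)")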
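The link between the regression line and the Pearson coefficient can be checked in base R. This is a sketch that assumes the line drawn on the plot is an ordinary least-squares fit of wt on mpg; the names fit, slope and r_squared are illustrative.

```r
my_data <- mtcars

# Least-squares line of wt on mpg
fit <- lm(wt ~ mpg, data = my_data)
slope <- unname(coef(fit)["mpg"])

# Pearson correlation of the same two variables
r <- cor(my_data$mpg, my_data$wt)

# For simple linear regression: slope = r * sd(y) / sd(x), and R^2 = r^2
slope_from_r <- r * sd(my_data$wt) / sd(my_data$mpg)
r_squared <- summary(fit)$r.squared

print(c(slope = slope, slope_from_r = slope_from_r,
        r_squared = r_squared, r_times_r = r^2))
```

The sign of r matches the sign of the slope, so a downward-sloping line on the scatter plot corresponds to a negative coefficient.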
2. Does each of the two variables (x, y) follow a normal distribution?
Use the Shapiro-Wilk normality test -> R function: shapiro.test()
and look at the normality plot -> R function: ggpubr::ggqqplot()
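The Shapiro-Wilk step can be sketched as follows, assuming my_data is the built-in mtcars data set. The names sw_mpg and sw_wt are illustrative.

```r
my_data <- mtcars

# Shapiro-Wilk test; null hypothesis: the sample comes from a normal distribution
sw_mpg <- shapiro.test(my_data$mpg)
sw_wt  <- shapiro.test(my_data$wt)

# A p-value above 0.05 means normality is not rejected at the 5% level
print(c(mpg = sw_mpg$p.value, wt = sw_wt$p.value))
```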
From the output, the two p-values are greater than the significance level 0.05, implying that the distributions of the data are not significantly different from a
normal distribution. In other words, we can assume normality.
Visual inspection of data normality using Q-Q plots (quantile-quantile plots). A Q-Q plot draws the sample quantiles against the corresponding quantiles of the normal
distribution; points that fall close to the reference line indicate approximately normal data.
library("ggpubr")
# mpg
ggqqplot(my_data$mpg, ylab = "MPG")
# wt
ggqqplot(my_data$wt, ylab = "WT")
From the normality plots, we conclude that both populations may come from normal distributions.
Note that, if the data are not normally distributed, it’s recommended to use the non-parametric correlation, including Spearman and Kendall rank-based
correlation tests.
The p-value of the test is 1.294 × 10^-10, which is less than the significance level alpha = 0.05. We can conclude that wt and mpg are significantly
correlated, with a correlation coefficient of -0.87 and a p-value of 1.294 × 10^-10.
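These figures can be reproduced with cor.test() on the built-in mtcars data, and the p-value can be recomputed by hand from the t statistic with df = n − 2. This is a sketch; the names res, t_manual and p_manual are illustrative.

```r
my_data <- mtcars

# Pearson correlation test between wt and mpg
res <- cor.test(my_data$wt, my_data$mpg, method = "pearson")

r  <- unname(res$estimate)
n  <- nrow(my_data)
df <- n - 2

# t statistic and two-sided p-value recomputed from r and n
t_manual <- r * sqrt(df) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = df)

print(res)
print(c(t_manual = t_manual, p_manual = p_manual))
```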
The p-value is stored in the p.value element of the object returned by cor.test():
[1] 1.293959e-10
The correlation coefficient is stored in the estimate element:
      cor
-0.8676594
With the Kendall method, the correlation coefficient (tau) between x and y is -0.7278 and the p-value is 6.706 × 10^-9.
With the Spearman method, the correlation coefficient (rho) between x and y is -0.8864 and the p-value is 1.488 × 10^-11.
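The rank-based tests use the same cor.test() call with a different method argument. The sketch below assumes the same wt and mpg columns of mtcars. Setting exact = FALSE is a choice made for this illustration: the data contain tied values, so an exact p-value cannot be computed, and asking for the approximation explicitly avoids a warning.

```r
my_data <- mtcars

# Kendall rank correlation test (coefficient: tau)
res_kendall <- cor.test(my_data$wt, my_data$mpg,
                        method = "kendall", exact = FALSE)

# Spearman rank correlation test (coefficient: rho)
res_spearman <- cor.test(my_data$wt, my_data$mpg,
                         method = "spearman", exact = FALSE)

print(res_kendall$estimate)
print(res_spearman$estimate)
print(c(kendall_p = res_kendall$p.value, spearman_p = res_spearman$p.value))
```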
-1 indicates a strong negative correlation: this means that every time x increases, y decreases (left panel figure)
0 means that there is no linear association between the two variables (x and y) (middle panel figure)
1 indicates a strong positive correlation: this means that y increases with x (right panel figure)
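The three reference values can be illustrated with hypothetical toy data built to hit each case exactly. The last case is a reminder that a Pearson coefficient of 0 rules out a linear trend only: there y is fully determined by x.

```r
# Hypothetical toy data, symmetric around zero
x <- c(-2, -1, 0, 1, 2)

r_neg  <- cor(x, -3 * x + 1)  # y lies on a decreasing line
r_pos  <- cor(x,  2 * x + 5)  # y lies on an increasing line
r_zero <- cor(x, x^2)         # y depends on x, but not linearly

print(c(negative = r_neg, positive = r_pos, zero = r_zero))
```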
Summary
Use the function cor.test(x, y) to analyze the correlation coefficient between two variables and to get the significance level of the correlation.
Three correlation methods are available through the method argument of cor.test(x, y): pearson, kendall, spearman.