Permutation tests do not rely on assumptions about the distribution of the sampled populations, as some other tests do. It is my understanding, however, that certain tests, for example those specifically testing a difference in means, do carry assumptions about the underlying populations. For example, the Fisher-Pitman test is sensitive to the mean and the dispersion simultaneously, so unequal dispersion among groups can affect a test intended to compare means.
Permutation tests work by resampling the observed data many times in order to determine a p-value for the test. Recall that the p-value is defined as the probability of getting data at least as extreme as the observed data when the null hypothesis is true. If the data are shuffled many times in accordance with the null hypothesis being true, the number of shuffles producing data at least as extreme as the observed data can be counted, and the p-value is calculated as the proportion of such shuffles.
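The shuffling procedure described above can be sketched in base R for a two-group difference in means. This is a minimal illustration, not the method used by the packages in this chapter. The data and object names are hypothetical, and adding 1 to the numerator and denominator (so that the observed arrangement counts as one of the shuffles) is a common convention that I am assuming here.

```r
# Hypothetical two-group data, invented for illustration only
x <- c(12, 15, 14, 10, 13)
y <- c(18, 17, 16, 20, 15)

values <- c(x, y)
group  <- rep(c("x", "y"), times = c(length(x), length(y)))

# Test statistic: difference in group means
mean_diff <- function(v, g) mean(v[g == "x"]) - mean(v[g == "y"])

observed <- mean_diff(values, group)

set.seed(1234)   # fixes the random shuffles so the run is reproducible
n_perm <- 5000

# Under the null hypothesis the group labels are exchangeable,
# so shuffle the labels and recompute the statistic each time
perm_stats <- replicate(n_perm, mean_diff(values, sample(group)))

# Two-sided p-value: proportion of shuffles at least as extreme as observed
# (a small tolerance guards against floating-point ties)
n_extreme <- sum(abs(perm_stats) >= abs(observed) - 1e-12)
p_value   <- (n_extreme + 1) / (n_perm + 1)

observed
p_value
```

Changing the seed or the number of shuffles will change the p-value slightly, which is the variation referred to when results of permutation tests are said to vary between runs.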
The advantages of permutation tests are:
• the lack of assumptions about the distribution of the underlying data,
• their flexibility in the kinds of data they can handle (nominal, ordinal, interval/ratio),
• and their being relatively straightforward to conduct and interpret.
The disadvantages of permutation tests are:
• the limited complexity of designs they can handle,
• and their unfamiliarity to many readers.
R packages
The coin package offers a very flexible framework to conduct permutation tests. It provides functions for common permutation tests, and, in the general framework, can handle nominal, ordinal, and interval/ratio data.
Another useful package is lmPerm, which conducts analyses analogous to general linear models (lm in R) with permutation tests.
There are other packages that implement permutation tests.
Packages used in this chapter
The packages used in this chapter include:
• coin
• lmPerm
• rcompanion
• psych
• FSA
The following commands will install these packages if they are not already installed:
if(!require(coin)){install.packages("coin")}
if(!require(lmPerm)){install.packages("lmPerm")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}
Permutation test example
The following example uses the data from the One-way Anova chapter.
Note that results from permutation tests may vary due to the resampling procedure and the number of iterations.
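To give a rough sense of how much a resampled p-value can vary with the number of iterations, the sketch below uses the usual binomial approximation for the Monte Carlo standard error. The function name mc_se and the example values are hypothetical, and the approximation assumes a fixed number of independent resamples, which may not match every package's resampling scheme.

```r
# Approximate Monte Carlo standard error of a resampled p-value,
# treating each resample as an independent Bernoulli trial
mc_se <- function(p, n_resample) {
  sqrt(p * (1 - p) / n_resample)
}

# Hypothetical values chosen for illustration
mc_se(p = 0.05, n_resample = 1000)    # roughly 0.007
mc_se(p = 0.05, n_resample = 10000)   # roughly 0.002
```

Increasing the number of resamples tenfold reduces the standard error by a factor of about three. Calling set.seed() before a resampling test makes a particular run reproducible.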
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Instructor Student Sodium
'Brendon Small' a 1200
'Brendon Small' b 1400
'Brendon Small' c 1350
'Brendon Small' d 950
'Brendon Small' e 1400
'Brendon Small' f 1150
'Brendon Small' g 1300
'Brendon Small' h 1325
'Brendon Small' i 1425
'Brendon Small' j 1500
'Brendon Small' k 1250
'Brendon Small' l 1150
'Brendon Small' m 950
'Brendon Small' n 1150
'Brendon Small' o 1600
'Brendon Small' p 1300
'Brendon Small' q 1050
'Brendon Small' r 1300
'Brendon Small' s 1700
'Brendon Small' t 1300
'Coach McGuirk' u 1100
'Coach McGuirk' v 1200
'Coach McGuirk' w 1250
'Coach McGuirk' x 1050
'Coach McGuirk' y 1200
'Coach McGuirk' z 1250
'Coach McGuirk' aa 1350
'Coach McGuirk' ab 1350
'Coach McGuirk' ac 1325
'Coach McGuirk' ad 1525
'Coach McGuirk' ae 1225
'Coach McGuirk' af 1125
'Coach McGuirk' ag 1000
'Coach McGuirk' ah 1125
'Coach McGuirk' ai 1400
'Coach McGuirk' aj 1200
'Coach McGuirk' ak 1150
'Coach McGuirk' al 1400
'Coach McGuirk' am 1500
'Coach McGuirk' an 1200
'Melissa Robins' ao 900
'Melissa Robins' ap 1100
'Melissa Robins' aq 1150
'Melissa Robins' ar 950
'Melissa Robins' as 1100
'Melissa Robins' at 1150
'Melissa Robins' au 1250
'Melissa Robins' av 1250
'Melissa Robins' aw 1225
'Melissa Robins' ax 1325
'Melissa Robins' ay 1125
'Melissa Robins' az 1025
'Melissa Robins' ba 950
'Melissa Robins' bc 925
'Melissa Robins' bd 1200
'Melissa Robins' be 1100
'Melissa Robins' bf 950
'Melissa Robins' bg 1300
'Melissa Robins' bh 1400
'Melissa Robins' bi 1100
")
### Order factors by the order in data frame
### Otherwise, R will alphabetize them
Data$Instructor = factor(Data$Instructor,
levels=unique(Data$Instructor))
### Check the data frame
library(psych)
headTail(Data)
str(Data)
summary(Data)
Summarize data by group
library(FSA)
Summarize(Sodium ~ Instructor,
data=Data,
digits=3)
      Instructor  n    mean      sd  min      Q1 median      Q3  max
1  Brendon Small 20 1287.50 193.734  950 1150.00 1300.0 1400.00 1700
2  Coach McGuirk 20 1246.25 142.412 1000 1143.75 1212.5 1350.00 1525
3 Melissa Robins 20 1123.75 143.149  900 1006.25 1112.5 1231.25 1400
Fisher-Pitman permutation test
library(coin)
oneway_test(Sodium ~ Instructor,
data = Data)
Asymptotic K-Sample Fisher-Pitman Permutation Test
chi-squared = 9.6282, df = 2, p-value = 0.008114
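The output above is labeled "Asymptotic", meaning the p-value comes from a large-sample approximation to the permutation distribution rather than from actual resampling. As a sketch, a Monte Carlo version can be requested through the distribution argument. The small data frame Toy is hypothetical, and I am assuming a recent version of coin in which the resampling count is passed to approximate() as nresample; as I understand it, older versions named this argument B.

```r
library(coin)

# Hypothetical data, invented for illustration (not the chapter's data)
Toy <- data.frame(
  Group = factor(rep(c("A", "B", "C"), each = 6)),
  Value = c(10, 12, 11, 13, 12, 14,
            11, 13, 12, 14, 15, 13,
            15, 17, 16, 18, 17, 19)
)

set.seed(1234)   # makes the resampling reproducible

# Fisher-Pitman test with a Monte Carlo (resampled) reference distribution
test <- oneway_test(Value ~ Group,
                    data         = Toy,
                    distribution = approximate(nresample = 10000))

test
pvalue(test)
```

With a resampled distribution the p-value will differ slightly from run to run unless the seed is fixed.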
Post-hoc analysis with pairwise tests
library(rcompanion)
PT = pairwisePermutationTest(Sodium ~ Instructor,
data = Data,
teststat = "quadratic",
method = "fdr")
PT
Comparison Stat p.value p.adjust
1 Brendon Small - Coach McGuirk = 0 0.5949 0.4405 0.44050
2 Brendon Small - Melissa Robins = 0 7.63 0.00574 0.01722
3 Coach McGuirk - Melissa Robins = 0 6.329 0.01188 0.01782
cldList(comparison = PT$Comparison,
p.value = PT$p.adjust,
threshold = 0.05)
Group Letter MonoLetter
1 BrendonSmall a a
2 CoachMcGuirk a a
3 MelissaRobins b b
Permutation test with lmPerm
library(lmPerm)
model = lmp(Sodium ~ Instructor, data = Data,
perm="Prob",
seqs=FALSE)
anova(model)
Analysis of Variance Table
Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor 2 290146 145073 5000 0.008 **
Residuals 57 1487812 26102
summary(model)
Multiple R-Squared: 0.1632
Post-hoc analysis with pairwise tests
### Brendon Small vs. Coach McGuirk
model.1 = lmp(Sodium ~ Instructor,
data = Data[Data$Instructor=="Brendon Small" |
Data$Instructor=="Coach McGuirk" ,],
perm="Prob",
seqs=FALSE)
anova(model.1)
Analysis of Variance Table
Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor 1 17016 17016 120 0.4583
Residuals 38 1098469 28907
### Brendon Small vs. Melissa Robins
model.2 = lmp(Sodium ~ Instructor,
data = Data[Data$Instructor=="Brendon Small" |
Data$Instructor=="Melissa Robins" ,],
perm="Prob",
seqs=FALSE)
anova(model.2)
Analysis of Variance Table
Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor 1 268141 268141 5000 0.0026 **
Residuals 38 1102469 29012
### Coach McGuirk vs. Melissa Robins
model.3 = lmp(Sodium ~ Instructor,
data = Data[Data$Instructor=="Coach McGuirk" |
Data$Instructor=="Melissa Robins" ,],
perm="Prob",
seqs=FALSE)
anova(model.3)
Analysis of Variance Table
Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor 1 150062 150062 4969 0.01992 *
Residuals 38 774688 20387
### Adjust p-values
p.adjust(c(0.4583, 0.0026, 0.01992), method="fdr")
0.45830 0.00780 0.02988
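The "fdr" method in p.adjust is the Benjamini-Hochberg procedure: each p-value is multiplied by the number of tests divided by its rank, and the results are then forced to be monotone and capped at 1. The following sketch works through that arithmetic by hand for the three p-values above. The object names are hypothetical.

```r
# The three unadjusted p-values from the pairwise lmp models
p <- c(0.4583, 0.0026, 0.01992)
m <- length(p)

# Work from the largest p-value down to the smallest
ord    <- order(p, decreasing = TRUE)
scaled <- p[ord] * m / (m:1)          # p * (number of tests / rank)

# Enforce monotonicity with a running minimum and cap at 1
adj_sorted <- pmin(1, cummin(scaled))

# Put the adjusted values back in the original order
manual      <- numeric(m)
manual[ord] <- adj_sorted

manual
all.equal(manual, p.adjust(p, method = "fdr"))
```

Because only three comparisons are made here, the smallest p-value is multiplied by 3, the middle one by 1.5, and the largest is left unchanged.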
References
For more information on permutation tests and the coin package, see:
Hothorn, T., K. Hornik, M.A. van de Wiel, and A. Zeileis. 2015. Implementing a Class of Permutation Tests: The coin Package. cran.r-project.org/web/packages/coin/vignettes/Implementation.pdf.
library(coin); help(package="coin")
library(lmPerm); help(package="lmPerm")
library(lmPerm); vignette("lmPerm")