Introduction to Permutation Tests

Permutation tests do not rely on assumptions about the distribution of the sampled populations, as some other tests do. It is my understanding, however, that certain tests, such as those specifically testing a difference in means, do make assumptions about the underlying populations. For example, the Fisher-Pitman test is sensitive to the mean and the dispersion simultaneously.

Permutation tests work by resampling the observed data many times in order to determine a p-value for the test. Recall that the p-value is defined as the probability of getting data at least as extreme as the observed data when the null hypothesis is true. If the data are shuffled many times in a way consistent with the null hypothesis being true, the proportion of shuffles that produce data at least as extreme as the observed data can be counted and used as the p-value.
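The shuffling idea described above can be sketched in base R for the simple case of two groups. This is a minimal illustration, not the method used by any particular package: the data values, the object names (group_a, group_b, mean_diff), the seed, and the number of shuffles are all hypothetical choices made for this example.

```r
# Hypothetical measurements for two groups (made up for illustration)
group_a <- c(12.1, 14.3, 11.8, 15.2, 13.9, 14.8)
group_b <- c(10.2, 11.5,  9.8, 12.0, 10.9, 11.1)

values <- c(group_a, group_b)
labels <- rep(c("a", "b"), times = c(length(group_a), length(group_b)))

# Test statistic: difference in group means
mean_diff <- function(v, g) mean(v[g == "a"]) - mean(v[g == "b"])

observed <- mean_diff(values, labels)

# Under the null hypothesis the group labels are exchangeable,
# so shuffle the labels many times and recompute the statistic
set.seed(1234)
n_perm <- 9999
perm_stats <- replicate(n_perm, mean_diff(values, sample(labels)))

# Two-sided p-value: proportion of shuffles at least as extreme as
# the observed statistic; the +1 counts the observed arrangement itself
p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)

observed
p_value
```

Adding one to the numerator and denominator is a common convention that keeps the estimated p-value from being exactly zero.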

The advantages of permutation tests are:

• the lack of assumptions about the distribution of the underlying data,

• their flexibility in the kinds of data they can handle (nominal, ordinal, interval/ratio),

• and their being relatively straightforward to conduct and interpret.

The disadvantages of permutation tests are:

• the limited complexity of designs they can handle,

• and their unfamiliarity to many readers.

R packages

The coin package offers a very flexible framework for conducting permutation tests. It provides functions for common permutation tests and, in its general framework, can handle nominal, ordinal, and interval/ratio data.

Another useful package is lmPerm, which conducts analyses analogous to general linear models (lm in R) with permutation tests.

There are other packages that implement permutation tests.

Packages used in this chapter

The packages used in this chapter include:

• coin

• lmPerm

• rcompanion

• psych

• FSA

The following commands will install these packages if they are not already installed:

if(!require(coin)){install.packages("coin")}
if(!require(lmPerm)){install.packages("lmPerm")}
if(!require(rcompanion)){install.packages("rcompanion")}
if(!require(psych)){install.packages("psych")}
if(!require(FSA)){install.packages("FSA")}

Permutation test example

The following example uses the data from the One-way Anova chapter.

Note that results from permutation tests may vary due to the resampling procedure and the number of iterations.
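As a rough guide to how much a resampled p-value can vary from run to run, the sketch below computes an approximate Monte Carlo standard error. It assumes a fixed number of independent random shuffles, so that the estimated p-value behaves like a binomial proportion; procedures that stop early or enumerate permutations exactly will not follow this formula. The function name mc_se and the example values are hypothetical.

```r
# Approximate standard error of a Monte Carlo p-value estimate,
# treating each random shuffle as an independent Bernoulli trial
mc_se <- function(p, n_iter) sqrt(p * (1 - p) / n_iter)

p_est   <- 0.008                  # illustrative p-value
n_iters <- c(500, 5000, 50000)    # illustrative numbers of shuffles

se <- mc_se(p_est, n_iters)

data.frame(iterations = n_iters,
           std_error  = signif(se, 3))
```

The standard error shrinks with the square root of the number of iterations, so a tenfold increase in iterations reduces the run-to-run variation by a factor of about three.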

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="

Instructor       Student Sodium
'Brendon Small'      a    1200
'Brendon Small'      b    1400
'Brendon Small'      c    1350
'Brendon Small'      d     950
'Brendon Small'      e    1400
'Brendon Small'      f    1150
'Brendon Small'      g    1300
'Brendon Small'      h    1325
'Brendon Small'      i    1425
'Brendon Small'      j    1500
'Brendon Small'      k    1250
'Brendon Small'      l    1150
'Brendon Small'      m     950
'Brendon Small'      n    1150
'Brendon Small'      o    1600
'Brendon Small'      p    1300
'Brendon Small'      q    1050
'Brendon Small'      r    1300
'Brendon Small'      s    1700
'Brendon Small'      t    1300
'Coach McGuirk'      u    1100
'Coach McGuirk'      v    1200
'Coach McGuirk'      w    1250
'Coach McGuirk'      x    1050
'Coach McGuirk'      y    1200
'Coach McGuirk'      z    1250
'Coach McGuirk'      aa   1350
'Coach McGuirk'      ab   1350
'Coach McGuirk'      ac   1325
'Coach McGuirk'      ad   1525
'Coach McGuirk'      ae   1225
'Coach McGuirk'      af   1125
'Coach McGuirk'      ag   1000
'Coach McGuirk'      ah   1125
'Coach McGuirk'      ai   1400
'Coach McGuirk'      aj   1200
'Coach McGuirk'      ak   1150
'Coach McGuirk'      al   1400
'Coach McGuirk'      am   1500
'Coach McGuirk'      an   1200
'Melissa Robins'     ao   900
'Melissa Robins'     ap   1100
'Melissa Robins'    aq   1150
'Melissa Robins'     ar   950
'Melissa Robins'     as   1100
'Melissa Robins'     at   1150
'Melissa Robins'     au   1250
'Melissa Robins'     av   1250
'Melissa Robins'     aw   1225
'Melissa Robins'     ax   1325
'Melissa Robins'     ay   1125
'Melissa Robins'     az   1025
'Melissa Robins'     ba    950
'Melissa Robins'     bc    925
'Melissa Robins'     bd   1200
'Melissa Robins'     be   1100
'Melissa Robins'     bf    950
'Melissa Robins'     bg   1300
'Melissa Robins'     bh   1400
'Melissa Robins'     bi   1100
")

### Order factors by the order in data frame
### Otherwise, R will alphabetize them

Data$Instructor = factor(Data$Instructor,
                         levels=unique(Data$Instructor))

### Check the data frame

library(psych)

headTail(Data)

str(Data)

summary(Data)

Summarize data by group

library(FSA)

Summarize(Sodium ~ Instructor,
          data   = Data,
          digits = 3)

      Instructor  n    mean      sd  min      Q1 median      Q3  max
1  Brendon Small 20 1287.50 193.734  950 1150.00 1300.0 1400.00 1700
2  Coach McGuirk 20 1246.25 142.412 1000 1143.75 1212.5 1350.00 1525
3 Melissa Robins 20 1123.75 143.149  900 1006.25 1112.5 1231.25 1400

Fisher-Pitman permutation test

library(coin)

oneway_test(Sodium ~ Instructor,
            data = Data)

Asymptotic K-Sample Fisher-Pitman Permutation Test

chi-squared = 9.6282, df = 2, p-value = 0.008114
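To connect the shuffling idea to the test above, the following base R sketch approximates a K-sample permutation test by Monte Carlo resampling. It assumes that the reported statistic equals (N - 1) times the ratio of the between-group sum of squares to the total sum of squares, which appears to reproduce the chi-squared value above for these data. The function name quad_stat, the seed, and the number of resamples are choices made for illustration and are not part of the coin package.

```r
# Sodium values copied from the data frame above, 20 per instructor
Sodium <- c(
  1200, 1400, 1350,  950, 1400, 1150, 1300, 1325, 1425, 1500,
  1250, 1150,  950, 1150, 1600, 1300, 1050, 1300, 1700, 1300,
  1100, 1200, 1250, 1050, 1200, 1250, 1350, 1350, 1325, 1525,
  1225, 1125, 1000, 1125, 1400, 1200, 1150, 1400, 1500, 1200,
   900, 1100, 1150,  950, 1100, 1150, 1250, 1250, 1225, 1325,
  1125, 1025,  950,  925, 1200, 1100,  950, 1300, 1400, 1100)

Instructor <- factor(rep(c("Brendon Small", "Coach McGuirk",
                           "Melissa Robins"), each = 20))

# Assumed form of the statistic: (N - 1) * SS_between / SS_total
quad_stat <- function(y, g) {
  grand       <- mean(y)
  group_means <- tapply(y, g, mean)
  group_n     <- tapply(y, g, length)
  ss_between  <- sum(group_n * (group_means - grand)^2)
  ss_total    <- sum((y - grand)^2)
  (length(y) - 1) * ss_between / ss_total
}

observed <- quad_stat(Sodium, Instructor)

# Shuffle the responses across instructors and recompute the statistic
set.seed(2016)
n_resample <- 9999
perm_stats <- replicate(n_resample,
                        quad_stat(sample(Sodium), Instructor))

# Proportion of shuffles at least as extreme as the observed statistic
p_value <- (sum(perm_stats >= observed) + 1) / (n_resample + 1)

round(observed, 4)   # compare with the chi-squared value reported above
p_value              # Monte Carlo estimate; varies with seed and resamples
```

Because the p-value here comes from random shuffles rather than the asymptotic chi-squared distribution, it is not expected to match the asymptotic p-value exactly.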

Post-hoc analysis with pairwise tests

library(rcompanion)

PT = pairwisePermutationTest(Sodium ~ Instructor,
                             data     = Data,
                             teststat = "quadratic",
                             method   = "fdr")

PT

                          Comparison   Stat p.value p.adjust
1  Brendon Small - Coach McGuirk = 0 0.5949  0.4405  0.44050
2 Brendon Small - Melissa Robins = 0   7.63 0.00574  0.01722
3 Coach McGuirk - Melissa Robins = 0  6.329 0.01188  0.01782

cldList(comparison = PT$Comparison,
        p.value    = PT$p.adjust,
        threshold  = 0.05)

          Group Letter MonoLetter
1  BrendonSmall      a         a
2  CoachMcGuirk      a         a
3 MelissaRobins      b          b

Permutation test with lmPerm

library(lmPerm)

model = lmp(Sodium ~ Instructor, data = Data,
            perm = "Prob",
            seqs = FALSE)

anova(model)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  2   290146    145073 5000    0.008 **
Residuals  57  1487812     26102

summary(model)

Multiple R-Squared: 0.1632

Post-hoc analysis with pairwise tests

### Brendon Small vs. Coach McGuirk

model.1 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Brendon Small" |
                          Data$Instructor=="Coach McGuirk" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.1)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1    17016     17016  120   0.4583
Residuals  38  1098469     28907

### Brendon Small vs. Melissa Robins

model.2 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Brendon Small" |
                          Data$Instructor=="Melissa Robins" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.2)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1   268141    268141 5000   0.0026 **
Residuals  38  1102469     29012

### Coach McGuirk vs. Melissa Robins

model.3 = lmp(Sodium ~ Instructor,
              data = Data[Data$Instructor=="Coach McGuirk" |
                          Data$Instructor=="Melissa Robins" ,],
              perm = "Prob",
              seqs = FALSE)

anova(model.3)

Analysis of Variance Table

           Df R Sum Sq R Mean Sq Iter Pr(Prob)
Instructor  1   150062    150062 4969  0.01992 *
Residuals  38   774688     20387

### Adjust p-values

p.adjust(c(0.4583, 0.0026, 0.01992), method="fdr")

0.45830 0.00780 0.02988
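For readers curious about what the "fdr" adjustment does, the sketch below reproduces it by hand in base R, on the understanding that method="fdr" in p.adjust is the Benjamini-Hochberg procedure. The object names (ord, adj_sorted, adj) are hypothetical and chosen only for this illustration.

```r
# Unadjusted p-values from the three pairwise comparisons above
p <- c(0.4583, 0.0026, 0.01992)
m <- length(p)

# Sort from largest to smallest p-value
ord <- order(p, decreasing = TRUE)

# Multiply each p-value by m / rank, where the largest has rank m,
# then take the running minimum so adjusted values stay monotone
adj_sorted <- pmin(1, cummin(p[ord] * m / (m:1)))

# Return the adjusted values to the original order
adj <- adj_sorted[order(ord)]

adj

# Check against the built-in function
all.equal(adj, p.adjust(p, method = "fdr"))
```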

References

For more information on permutation tests and the coin package, see:

Hothorn, T., K. Hornik, M.A. van de Wiel, and A. Zeileis. 2015. Implementing a Class of Permutation Tests: The coin Package. cran.r-project.org/web/packages/coin/vignettes/Implementation.pdf.

library(coin); help(package="coin")

library(lmPerm); help(package="lmPerm")

library(lmPerm); vignette("lmPerm")

Non-commercial reproduction of this content, with attribution, is permitted.
For-profit reproduction without permission is prohibited.

If you use the code or information in this site in a published work, please cite it as a source. Also, if you are an instructor and use this book in your course, please let me know. My contact information is on the About the Author of this Book page.

Citation

Mangiafico, S.S. 2016. Summary and Analysis of Extension Program Evaluation in R, version 1.23.1, revised 2025. rcompanion.org/handbook/. (Pdf version: rcompanion.org/documents/RHandbookProgramEvaluation.pdf.)