Appropriate data
• One-way data with two or more groups
• Dependent variable is ordinal, interval, or ratio
• Independent variable is a factor with levels indicating groups
• Observations between groups are independent. That is, not paired or repeated measures data
Hypotheses
• Null hypothesis: The medians of the populations from which the groups were sampled are equal.
• Alternative hypothesis (two-sided): The medians of the populations from which the groups were sampled are not equal.
Interpretation
Significant results can be reported as “The median value of group A was significantly different from group B.”
Packages used in this chapter
The packages used in this chapter include:
• RVAideMemoire
• coin
• nonpar
• FSA
• rcompanion
The following commands will install these packages if they are not already installed:
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(coin)){install.packages("coin")}
if(!require(nonpar)){install.packages("nonpar")}
if(!require(FSA)){install.packages("FSA")}
if(!require(rcompanion)){install.packages("rcompanion")}
Mood’s median test example
This example uses the formula notation indicating that Likert is the dependent variable and Speaker is the independent variable. The data= option indicates the data frame that contains the variables. For the meaning of other options, see ?mood.medtest or the documentation for the employed function.
For appropriate plots and summary statistics, see the Two-sample Mann–Whitney U Test chapter.
Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
Speaker Likert
Pooh 3
Pooh 5
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 4
Pooh 5
Pooh 5
Piglet 2
Piglet 4
Piglet 2
Piglet 2
Piglet 1
Piglet 2
Piglet 3
Piglet 2
Piglet 2
Piglet 3
")
### Check the data frame
library(psych)
headTail(Data)
str(Data)
summary(Data)
RVAideMemoire package
library(RVAideMemoire)
mood.medtest(Likert ~ Speaker,
data = Data,
exact = FALSE)
Mood's median test
X-squared = 9.8, df = 1, p-value = 0.001745
Coin package
### Median test
library(coin)
median_test(Likert ~ Speaker,
data = Data)
Asymptotic Two-Sample Brown-Mood Median Test
Z = -3.4871, p-value = 0.0004883
### Exact median test
median_test(Likert ~ Speaker,
data = Data,
distribution="exact")
Exact Two-Sample Brown-Mood Median Test
Z = -3.4871, p-value = 0.001093
### Median test by Monte Carlo
simulation
library(coin)
median_test(Likert ~ Speaker,
data = Data,
distribution = approximate(nresample = 10000))
Approximative Two-Sample Brown-Mood Median Test
Z = -3.4871, p-value = 0.0011
nonpar package
X = Data$Likert[Data$Speaker=="Pooh"]
Y = Data$Likert[Data$Speaker=="Piglet"]
library(nonpar)
mediantest(x = X, y = Y, exact=TRUE)
Exact Median Test
The p-value is 0.0010825088224469
Effect size measurements
A simple effect size measurement for Mood’s median test is is to compare the medians of the groups.
In addition, the whole 5-number summary could be used, including the minimum, 1st quartile, median, 3rd quartile, and the maximum.
library(FSA)
Summarize(Likert ~ Speaker, data=Data)
Speaker n mean sd min Q1 median Q3 max
1 Piglet 10 2.3 0.8232726 1 2 2 2.75 4
2 Pooh 10 4.2 0.6324555 3 4 4 4.75 5
Examining the medians and confidence intervals would be a somewhat different approach. Here, be cautious that confidence intervals by bootstrap may not be appropriate for the median for ordinal data with may ties, such as with Likert item data, or with small samples.
library(rcompanion)
groupwiseMedian(Likert ~ Speaker, data=Data, bca=FALSE, perc=TRUE)
Speaker n Median Conf.level Percentile.lower Percentile.upper
1 Piglet 10 2 0.95 2 3
2 Pooh 10 4 0.95 4 5
### Note that confidence intervals by bootstrap
may vary.
In addition, looking at a statistic of stochastic dominance, like Vargha and Delaney’s A, may be useful in this case.
library(rcompanion)
vda(Likert ~ Speaker, data=Data, verbose=TRUE)
Statistic Value
1 Proportion Ya > Yb 0.01
2 Proportion Ya < Yb 0.91
3 Proportion ties 0.08
VDA
0.05
Finally, we can divide the difference in medians from two groups by their pooled median absolute deviation (mad). Unless I find another reference for this statistic, I’ve termed it Mangiafico’s d. It’s somewhat analogous to a nonparametric version of Cohen’s d. Note that this statistic assumes the data are at least interval in nature, as so may not be appropriate for Likert item data.
A = c(1,2,2,2,2,3,4,5)
B = c(2,3,4,4,4,5,5,5)
Y = c(A, B)
Group = c(rep("A", length(A)), rep("B", length(B)))
Data2 = data.frame(Group, Y)
library(rcompanion)
mangiaficoD(Y ~ Group, data=Data2, verbose=TRUE)
Group Statistic Value
1 A Median 2.000
2 B Median 4.000
3 Difference -2.000
4 A MAD 0.741
5 B MAD 1.480
6 Pooled MAD 1.170
d
-1.71
Manual calculation
MU = median(Data$Likert)
A1 = sum(Data$Likert[Data$Speaker=="Pooh"] < MU)
B1 = sum(Data$Likert[Data$Speaker=="Piglet"] < MU)
A2 = sum(Data$Likert[Data$Speaker=="Pooh"] >= MU)
B2 = sum(Data$Likert[Data$Speaker=="Piglet"] >= MU)
Matrix = matrix(c(A1, B1, A2, B2), byrow=FALSE, ncol=2)
rownames(Matrix) = c("Pooh", "Piglet")
colnames(Matrix) = c("LessThanMu", "GreaterThanEqualMu")
Matrix
LessThanMu GreaterThanEqualMu
Pooh 1 9
Piglet 9 1
Monte Carlo simulation
chisq.test(Matrix, simulate.p.value=TRUE, B=10000)
Pearson's Chi-squared test with simulated p-value (based on 10000 replicates)
X-squared = 12.8, df = NA, p-value = 0.0009999