Published on 12 November 2018

Non-metric multidimensional scaling (NMDS) on macroinvertebrate communities

Non-metric multidimensional scaling (NMDS) is a technique often used to find similarities and similarities between communities composition. As a part of a broader analysis I wrote to assess the ecological integrity of a set of freshwater streams in the Auckland region, I used NMDS (using R) to compare macroinvertebrate communities composition among the studied streams.

Finding similarities and dissimilarities among sampled streams can help confirm or deny the relationships between sites, such as land cover. Although this sort of technique is usually applied to samples with full species composition (Death & Collier, 2010; Belmar et al., 2013; McManamay et al., 2013), it is used in this study with values that translate into abundance codes (see table below). The original data obtained from the Auckland Council only provides abundance code for each macroinvertebrate taxa, hence the conversion from abundance code to a numeric value. The translation from qualitative to quantitative values follows the thresholds set by Stark (1998).

Abundance code Counts (value)
Rare (R) 1
Common (C) 5
Abundant (A) 20
Very Abundant (VA) 100
Very very abundant (VVA) 500

NMDS can be understood as using a new axis for each macroinvertebrate taxa, where the taxa value for each site is plotted as a point and the "distance" among sites is used to evaluate how "different" sites are among them. It you only have one or two taxa it is a bit of an overkill to use NMDS but if you have dozens or hundreds of them then it certainly becomes very useful.

For this particular case I used the NDMS implementation provided in the R vegan package, which I highly recommend if you're into ecological assessments. 

library(dplyr)
library(extrafont)
library(ggplot2)
library(cowplot)
library(vegan)

#############
## WARNING ##
#############
## I like using Gill Sans Nova typeface for plots and maps.
## If you need to load your fonts in Windows, uncomment the line below.
## If you don't have Gill Sans Nova locally just replace all occurences of `pfamily="Gill Sans Nove"`
## or with your typeface of choice.

# loadfonts(device="win")

pfamily <- "Gill Sans Nova"
ptextsize <- 18

# Load and polish data
dsamples <- read.table(file="./data/auckland_streams.csv", sep=",", header=TRUE)
dsites <- read.table(file="./data/sites.csv", sep=",", header=TRUE)
# Polish date
dsamples$date <- as.Date(dsamples$date, format="%d/%m/%Y")
# Keep only sites we want
dsamples <- dsamples[dsamples$site %in% dsites$sitename,]
dsamples <- droplevels(dsamples)

##############################################################
# NMDS
##############################################################

dsamples <- merge(dsamples, dsites, by.x = c("site"), by.y = c("sitename"))
# Change some levels otherwise labels overlap
levels(dsamples$streamname)[levels(dsamples$streamname) == "Otara"] <- "Otara"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Mahurangi"] <- "Mahurangi"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Oakley LTB"] <- "Oakley"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Ngakaroa"] <- "Ngakaroa"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Puhinui"] <- "Puhinui"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Papakura LTB"] <- "Papakura"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Oteha"] <- "Oteha"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Kumeu"] <- "Kumeu"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Hoteo"] <- "Hoteo"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Vaughns Lower"] <- "Vaughn"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Wairoa"] <- "Wairoa"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Matakana]"] <- "Matakana"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Opanuku"] <- "Opanuku"
levels(dsamples$streamname)[levels(dsamples$streamname) == "West Hoe"] <- "West Hoe"
levels(dsamples$streamname)[levels(dsamples$streamname) == "Cascade LTB"] <- "Cascades"

# Subset for stream, sitetype and taxas only
dtsamples <- subset(dsamples, select = -c(year, date, sitenumber, site, protocol, richness, EPT, MCI, SQMCI, NativeForest, Pasture, Forestry, Urban, Horticulture))
# replace abundance codes by values R=1, C=5, A=20, VA=100, VVA=500
abundanceToInt <- function(a) {
    case_when(
        a == "R" ~ 1,
        a == "C" ~ 5,
        a == "A" ~ 20,
        a == "VA" ~ 100,
        a == "VVA" ~ 500,
        a == ""  ~ 0
    )
}

####################
# Plot MDS stuff
####################

diffcols <- setdiff(colnames(dtsamples), list("streamname", "sitetype"))
dtsamples[diffcols] <- lapply(dtsamples[diffcols], abundanceToInt)

not_all_na <- function(x) any(!is.na(x))
dtsamples <- dtsamples %>% select_if(not_all_na)
# update as some got removed
diffcols <- setdiff(colnames(dtsamples), list("streamname", "sitetype"))
# save streamnames-types separately
dtypes <- dtsamples %>% distinct(sitetype, streamname, .keep_all = F)
dtsamples$sitetype <- NULL # drop type, will add later
# mean values for each site as there are samples for multiple years
dtsamples_simpl <- aggregate(. ~ streamname, transform(dtsamples, streamname = streamname), mean)
# add type back
dtsamples_simpl$sitetype <- dtypes$sitetype
# NMDS
dtsamples_co <- metaMDS(comm = dtsamples_simpl[diffcols], distance = "bray", trace = FALSE, autotransform = FALSE)
spsc <- as.data.frame(scores(dtsamples_co, "species"))
# add species and type back
spsc$species <- rownames(spsc)
# get points with site (stream) name and type
MDS_xy <- data.frame(dtsamples_co$points)
MDS_xy$streamname <- dtsamples_simpl$streamname
MDS_xy$sitetype <- dtsamples_simpl$sitetype
# Plot
mdsplot <- ggplot(MDS_xy, aes(MDS1, MDS2)) + 
    geom_text(aes(label=streamname, colour=sitetype), size=5) + # stream names
    geom_point(data=spsc, aes(x=NMDS1,y=NMDS2), size=1, alpha=0.5) +  # add the species labels
    theme(plot.title=element_text(hjust=0.5),
        legend.text=element_text(size=ptextsize-4),
        axis.title.x=element_text(size=ptextsize-2, family=pfamily),
        axis.title.y=element_text(size=ptextsize-2, family=pfamily),
        axis.text=element_text(size=ptextsize-8 , family=pfamily, colour = "black"),
        axis.line = element_line(colour = "black"),
        text=element_text(size=ptextsize, family=pfamily),
        rect=element_blank()) +
    theme(panel.grid=element_blank(), panel.border=element_rect(colour = "black", fill=NA, size=0.5)) +
    scale_colour_manual(values=c("#006266", "#009900", "#FF6633", "#993333")) +
    guides(colour=guide_legend(title=""))
png(file="./output/mds.png", width=800, height=400)
print(mdsplot)
dev.off()

The resulting image reduces the data into 2 dimensions (axis x and y) and plots each site name colored by its type (forestry, native forest, urban or pasture). 

NMDS

Result of the NMDS analysys.

The code is available in a Github repo and soon I'll post the whole analysis. If you have any questions or recommendations please do get in touch, cheers.

References

Belmar, O., Velasco, J., Gutiérrez‐Cánovas, C., Mellado‐Díaz, A., Millán, A., & Wood, P. J. (2013). The influence of natural flow regimes on macroinvertebrate assemblages in a semiarid Mediterranean basin. Ecohydrology, 6(3), 363-379.

Death, R. G., & Collier, K. J. (2010). Measuring stream macroinvertebrate responses to gradients of vegetation cover: when is enough enough? Freshwater Biology, 55(7), 1447-1464.

McManamay, R. A., Orth, D. J., & Dolloff, A. C. (2013). Macroinvertebrate Community responses to gravel addition in a Southeastern regulated river. Southeastern Naturalist., 12(3), 599-618.

Stark, J. D. (1998). SQMCI: A biotic index for freshwater macroinvertebrate coded‐abundance data. New Zealand Journal of Marine and Freshwater Research, 32(1), 55-66.