Non-metric multidimensional scaling (NMDS) on macroinvertebrate communities
Non-metric multidimensional scaling (NMDS) is a technique often used to find similarities and similarities between communities composition. As a part of a broader analysis I wrote to assess the ecological integrity of a set of freshwater streams in the Auckland region, I used NMDS (using R) to compare macroinvertebrate communities composition among the studied streams.
Finding similarities and dissimilarities among sampled streams can help confirm or deny the relationships between sites, such as land cover. Although this sort of technique is usually applied to samples with full species composition (Death & Collier, 2010; Belmar et al., 2013; McManamay et al., 2013), it is used in this study with values that translate into abundance codes (see table below). The original data obtained from the Auckland Council only provides abundance code for each macroinvertebrate taxa, hence the conversion from abundance code to a numeric value. The translation from qualitative to quantitative values follows the thresholds set by Stark (1998).
| Abundance code | Counts (value) |
|---|---|
| Rare (R) | 1 |
| Common (C) | 5 |
| Abundant (A) | 20 |
| Very Abundant (VA) | 100 |
| Very very abundant (VVA) | 500 |
NMDS can be understood as using a new axis for each macroinvertebrate taxa, where the taxa value for each site is plotted as a point and the "distance" among sites is used to evaluate how "different" sites are among them. It you only have one or two taxa it is a bit of an overkill to use NMDS but if you have dozens or hundreds of them then it certainly becomes very useful.
For this particular case I used the NDMS implementation provided in the R vegan package, which I highly recommend if you're into ecological assessments.
library(dplyr) library(extrafont) library(ggplot2) library(cowplot) library(vegan) ############# ## WARNING ## ############# ## I like using Gill Sans Nova typeface for plots and maps. ## If you need to load your fonts in Windows, uncomment the line below. ## If you don't have Gill Sans Nova locally just replace all occurences of `pfamily="Gill Sans Nove"` ## or with your typeface of choice. # loadfonts(device="win") pfamily <- "Gill Sans Nova" ptextsize <- 18 # Load and polish data dsamples <- read.table(file="./data/auckland_streams.csv", sep=",", header=TRUE) dsites <- read.table(file="./data/sites.csv", sep=",", header=TRUE) # Polish date dsamples$date <- as.Date(dsamples$date, format="%d/%m/%Y") # Keep only sites we want dsamples <- dsamples[dsamples$site %in% dsites$sitename,] dsamples <- droplevels(dsamples) ############################################################## # NMDS ############################################################## dsamples <- merge(dsamples, dsites, by.x = c("site"), by.y = c("sitename")) # Change some levels otherwise labels overlap levels(dsamples$streamname)[levels(dsamples$streamname) == "Otara"] <- "Otara" levels(dsamples$streamname)[levels(dsamples$streamname) == "Mahurangi"] <- "Mahurangi" levels(dsamples$streamname)[levels(dsamples$streamname) == "Oakley LTB"] <- "Oakley" levels(dsamples$streamname)[levels(dsamples$streamname) == "Ngakaroa"] <- "Ngakaroa" levels(dsamples$streamname)[levels(dsamples$streamname) == "Puhinui"] <- "Puhinui" levels(dsamples$streamname)[levels(dsamples$streamname) == "Papakura LTB"] <- "Papakura" levels(dsamples$streamname)[levels(dsamples$streamname) == "Oteha"] <- "Oteha" levels(dsamples$streamname)[levels(dsamples$streamname) == "Kumeu"] <- "Kumeu" levels(dsamples$streamname)[levels(dsamples$streamname) == "Hoteo"] <- "Hoteo" levels(dsamples$streamname)[levels(dsamples$streamname) == "Vaughns Lower"] <- "Vaughn" levels(dsamples$streamname)[levels(dsamples$streamname) == "Wairoa"] <- "Wairoa" levels(dsamples$streamname)[levels(dsamples$streamname) == "Matakana]"] <- "Matakana" levels(dsamples$streamname)[levels(dsamples$streamname) == "Opanuku"] <- "Opanuku" levels(dsamples$streamname)[levels(dsamples$streamname) == "West Hoe"] <- "West Hoe" levels(dsamples$streamname)[levels(dsamples$streamname) == "Cascade LTB"] <- "Cascades" # Subset for stream, sitetype and taxas only dtsamples <- subset(dsamples, select = -c(year, date, sitenumber, site, protocol, richness, EPT, MCI, SQMCI, NativeForest, Pasture, Forestry, Urban, Horticulture)) # replace abundance codes by values R=1, C=5, A=20, VA=100, VVA=500 abundanceToInt <- function(a) { case_when( a == "R" ~ 1, a == "C" ~ 5, a == "A" ~ 20, a == "VA" ~ 100, a == "VVA" ~ 500, a == "" ~ 0 ) } #################### # Plot MDS stuff #################### diffcols <- setdiff(colnames(dtsamples), list("streamname", "sitetype")) dtsamples[diffcols] <- lapply(dtsamples[diffcols], abundanceToInt) not_all_na <- function(x) any(!is.na(x)) dtsamples <- dtsamples %>% select_if(not_all_na) # update as some got removed diffcols <- setdiff(colnames(dtsamples), list("streamname", "sitetype")) # save streamnames-types separately dtypes <- dtsamples %>% distinct(sitetype, streamname, .keep_all = F) dtsamples$sitetype <- NULL # drop type, will add later # mean values for each site as there are samples for multiple years dtsamples_simpl <- aggregate(. ~ streamname, transform(dtsamples, streamname = streamname), mean) # add type back dtsamples_simpl$sitetype <- dtypes$sitetype # NMDS dtsamples_co <- metaMDS(comm = dtsamples_simpl[diffcols], distance = "bray", trace = FALSE, autotransform = FALSE) spsc <- as.data.frame(scores(dtsamples_co, "species")) # add species and type back spsc$species <- rownames(spsc) # get points with site (stream) name and type MDS_xy <- data.frame(dtsamples_co$points) MDS_xy$streamname <- dtsamples_simpl$streamname MDS_xy$sitetype <- dtsamples_simpl$sitetype # Plot mdsplot <- ggplot(MDS_xy, aes(MDS1, MDS2)) + geom_text(aes(label=streamname, colour=sitetype), size=5) + # stream names geom_point(data=spsc, aes(x=NMDS1,y=NMDS2), size=1, alpha=0.5) + # add the species labels theme(plot.title=element_text(hjust=0.5), legend.text=element_text(size=ptextsize-4), axis.title.x=element_text(size=ptextsize-2, family=pfamily), axis.title.y=element_text(size=ptextsize-2, family=pfamily), axis.text=element_text(size=ptextsize-8 , family=pfamily, colour = "black"), axis.line = element_line(colour = "black"), text=element_text(size=ptextsize, family=pfamily), rect=element_blank()) + theme(panel.grid=element_blank(), panel.border=element_rect(colour = "black", fill=NA, size=0.5)) + scale_colour_manual(values=c("#006266", "#009900", "#FF6633", "#993333")) + guides(colour=guide_legend(title="")) png(file="./output/mds.png", width=800, height=400) print(mdsplot) dev.off()
The resulting image reduces the data into 2 dimensions (axis x and y) and plots each site name colored by its type (forestry, native forest, urban or pasture).
Result of the NMDS analysys.
The code is available in a Github repo and soon I'll post the whole analysis. If you have any questions or recommendations please do get in touch, cheers.
References
Belmar, O., Velasco, J., Gutiérrez‐Cánovas, C., Mellado‐Díaz, A., Millán, A., & Wood, P. J. (2013). The influence of natural flow regimes on macroinvertebrate assemblages in a semiarid Mediterranean basin. Ecohydrology, 6(3), 363-379.
Death, R. G., & Collier, K. J. (2010). Measuring stream macroinvertebrate responses to gradients of vegetation cover: when is enough enough? Freshwater Biology, 55(7), 1447-1464.
McManamay, R. A., Orth, D. J., & Dolloff, A. C. (2013). Macroinvertebrate Community responses to gravel addition in a Southeastern regulated river. Southeastern Naturalist., 12(3), 599-618.
Stark, J. D. (1998). SQMCI: A biotic index for freshwater macroinvertebrate coded‐abundance data. New Zealand Journal of Marine and Freshwater Research, 32(1), 55-66.