The Microbiome Quality Control Project Baseline study: Sources of variation in microbial community amplicon sequencing

Precision medicine has been made possible by the translation of ’omics to the clinic, and human microbiome studies must likewise transition to applications in public health. This will require especially robust measurements and assimilation of data from multiple population-scale cohorts. We thus initiated the Microbiome Quality Control (MBQC, http://mbqc.org) project and report a baseline investigation of variability in taxonomic profiling due to physical sample handling, 16S amplicon sequencing, and bioinformatic processing. Blinded sample sets from human stool, chemostats, and mock community mixtures were sequenced by 15 handling laboratories and analyzed by nine bioinformatics protocols. The resulting 16,554 taxonomic profiles were integrated to evaluate the sources and extent of measurement accuracy and variability. Biological variability was typically largest, followed by that from DNA extraction, sample handling environment, and smaller effects from other protocol variables and bioinformatics; almost all factors, however, could produce large effects under at least some circumstances. Quantitative relative measures such as weighted alpha- and beta-diversity were generally robust to bioinformatics methods, but different samples were often differentially affected by individual protocol factors (e.g. DNA extraction effects were largest in fresh stool samples, sample handling environment in negative controls). Analysis of artificial community positive controls revealed systematic differences both in extraction efficiency and in bioinformatic classification, and negative controls identified sources of contamination both in silico and during sample handling. Future evaluations of other microbiome sample types, human body sites, and metagenomic sequencing strategies will be necessary, but these results permit researchers to make informed experimental design choices for gut microbiome studies comparable across labs.