On the stability of the Bayenv method in assessing human SNP-environment associations
1 Department of Biology, Stanford University, Stanford, CA 94305, USA
2 AncestryDNA, 153 Townsend St. Ste. 800, San Francisco, CA 94107, USA
Human Genomics 2014, 8:1 doi:10.1186/1479-7364-8-1Published: 9 January 2014
Phenotypic variation along environmental gradients has been documented among and within many species, and in some cases, genetic variation has been shown to be associated with these gradients. Bayenv is a relatively new method developed to detect patterns of polymorphisms associated with environmental gradients. Using a Bayesian Markov Chain Monte Carlo (MCMC) approach, Bayenv evaluates whether a linear model relating population allele frequencies to environmental variables is more probable than a null model based on observed frequencies of neutral markers. Although this method has been used to detect environmental adaptation in a number of species, including humans, plants, fish, and mosquitoes, stability between independent runs of this MCMC algorithm has not been characterized. In this paper, we explore the variability of results between runs and the factors contributing to it.
Independent runs of the Bayenv program were carried out using genome-wide single-nucleotide polymorphism (SNP) data from samples from 60 worldwide human populations following previous applications of the Bayenv method. To assess factors contributing to the method's stability, we used varying numbers of MCMC iterations and also analyzed a second modified data set that excluded two Siberian populations with extreme climate variables. Between any two runs, correlations between Bayes factors and the overlap of SNPs in the empirical p value tails were surprisingly low. Enrichments of genic versus non-genic SNPs in the empirical tails were more robust than the empirical p values; however, the significance of the enrichments for some environmental variables still varied among runs, contradicting previously published conclusions. Runs with a greater number of MCMC iterations slightly reduced run-to-run variability, and excluding the Siberian populations did not have a large effect on the stability of the runs.
Because of high run-to-run variability, we advise against making conclusions about genome-wide patterns of adaptation based on only one run of the Bayenv algorithm and recommend caution in interpreting previous studies that have used only one run. Moving forward, we suggest carrying out multiple independent runs of Bayenv and averaging Bayes factors between runs to produce more stable and reliable results. With these modifications, future discoveries of environmental adaptation within species using the Bayenv method will be more accurate, interpretable, and easily compared between studies.