Genome-wide scans for loci under selection in humans

Ronald, James; Akey, Joshua M.

doi:10.1186/1479-7364-2-2-113

Review
Published: 01 June 2005

Human Genomics volume 2, Article number: 113 (2005)

Abstract

Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (ie local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection.

Introduction

Phenotypic diversity is a ubiquitous characteristic of natural populations. Individuals vary in almost every conceivable way, including physical appearance, behaviour, disease susceptibility, ability to detoxify drugs and perception of environmental stimuli [1]. Although environmental forces undoubtedly contribute to phenotypic variation, so too does genetic variation. Therefore, explaining the evolutionary forces that create, maintain and shape patterns of human genetic variation is of fundamental importance in understanding phenotypic variation [2].

An important goal in studies of human genetic variation is to identify loci that have been targets of natural selection due to their variable effects on the fitness of individuals throughout a population's history. Signatures of natural selection delimit regions of the genome that are, or have been, functionally important. Therefore, identifying such regions will facilitate the identification of genetic variation that contributes to phenotypic variation and help to functionally annotate the genome. Unfortunately, inferring the action of natural selection remains a challenge. This is likely to change in the near future, as high-throughput methods for cataloguing genetic variation on a genome-wide scale and new statistical tools for detecting selection have been, and continue to be, developed.

Much important work has been done on genome scans for natural selection in model organisms such as Drosophila [3–5]; this review, however, will focus on studies performed in human populations. First, the effects of natural selection and population history on patterns of genetic variation will be summarised, and some of the common statistical methods used to test for deviations from neutrality will be presented. Next, a critical evaluation of several empirical genome-wide scans for selection will be presented. Finally, the paper will highlight several important problems, both practical and conceptual, that need to be addressed in future studies.

Human genetic variation: The neutral expectation

The evolutionary sojourn of a newly arisen mutation depends upon how it affects the fitness of the individual who possesses it. The neutral theory of molecular evolution posits that the vast majority of polymorphisms in a population are selectively neutral and have no appreciable effects on fitness [6, 7]. Under neutrality, changes in allele frequency are governed by the stochastic effects of genetic drift in populations of finite size. Thus, the effective population size, N_e, and neutral mutation rate, μ_o, determine levels of polymorphism within species and the rate of divergence between species [8]. In addition, mutations with small fitness effects behave as 'nearly neutral' if the absolute value of the product of N_e and s (the selection coefficient, which measures the strength of selection) is < 1 [9, 10]. For human populations, N_e is approximately 10,000 and therefore |s| must be greater than 1/N_e = 10^-4 to overcome the stochastic effects of genetic drift. Because the neutral theory makes explicit and quantitative predictions about expected patterns of genetic variation within and between species, it is an indispensable tool in studies of natural selection. Specifically, the neutral theory provides an essential foundation for evaluating the evidence either for or against selection in empirical data, as it serves as the null hypothesis when exploring alternative evolutionary models [11, 12].
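The drift threshold described above can be illustrated with a short sketch. The function names and the default N_e of 10,000 are illustrative assumptions taken from the figure quoted in the text; this is not a method from any of the cited studies.

```python
def drift_threshold(n_e=10_000):
    """Smallest |s| that selection needs to exceed to overcome drift (1 / N_e)."""
    return 1.0 / n_e


def is_nearly_neutral(s, n_e=10_000):
    """True when |N_e * s| < 1, ie drift dominates the fate of the mutation."""
    return abs(n_e * s) < 1


# Illustrative selection coefficients (hypothetical values)
for s in (5e-5, -5e-5, 1e-3):
    label = "nearly neutral" if is_nearly_neutral(s) else "visible to selection"
    print(f"s = {s:+.0e}: {label}")
```

Because the criterion depends on the product of N_e and s, the same mutation could be nearly neutral in a small population and effectively selected in a large one.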

In combination with the neutral theory, coalescent theory provides a powerful framework for conceptualising and making inferences about evolutionary forces. The coalescent is a stochastic model of gene genealogies [13–17] and has emerged as the primary analytical tool in studies of genetic variation. In classical population genetics theory, the initial state of a population is defined and one observes the evolution of the entire population by looking forward in time. By contrast, the coalescent is a sample-driven theory that traces the history of coalescent events backwards in time (see Figure 1 for some basic properties of the coalescent). Several excellent and detailed reviews of coalescent theory can be found elsewhere [18, 19]. Deviations from the standard neutral model distort the branch lengths, topology and coalescent times of gene genealogies, as described below.
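As a rough illustration of the backwards-in-time view, the sketch below simulates the time to the most recent common ancestor (TMRCA) under the standard neutral coalescent. It assumes time is measured in units of 2N_e generations, so that k lineages coalesce at rate k(k - 1)/2; the function names are illustrative and do not come from the reviews cited above.

```python
import random


def expected_tmrca(n):
    """Expected TMRCA for a sample of n lineages, in units of 2*N_e generations."""
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n, rng):
    """Draw one TMRCA by summing exponential waiting times between coalescences."""
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0  # number of lineage pairs that can coalesce
        t += rng.expovariate(rate)
        k -= 1  # each coalescent event merges two lineages into one
    return t


rng = random.Random(1)
draws = [simulate_tmrca(10, rng) for _ in range(20_000)]
print(sum(draws) / len(draws), expected_tmrca(10))
```

Under these assumptions, a selective sweep or population expansion would shorten the simulated times, whereas balancing selection or population structure would lengthen them, which is the distortion of genealogies discussed in the next section.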

Evolutionary forces perturb patterns of genetic variation

Natural selection and population demographic history perturb patterns of genetic variation relative to what is expected under a standard neutral model (constant-sized, randomly mating, panmictic population at mutation-drift equilibrium). Below, the way in which selection and demographic history affect patterns of genetic variation will be considered, from a coalescent point of view.

Theoretical studies have investigated the evolutionary dynamics of genetic variation subject to a variety of selective pressures, including purifying [20–22], positive [23–26] and balancing selection [27–30]. This paper will focus on positive and balancing selection, as these have been the primary types of selection that current genome-wide scans have studied. Positive selection acts to increase the frequency of advantageous alleles in a population. Strongly advantageous mutations are rapidly swept to fixation, hence the term 'selective sweep'. Importantly, through a process referred to as 'genetic hitchhiking', positive selection also affects patterns of neutral polymorphisms linked to an advantageous mutation [23]. Positive selection leads to a shallow star-like genealogy (Figure 2) with a decreased time to the most recent common ancestor. Tracing the history of alleles backwards in time, these effects are a direct consequence of the rapid coalescence of lineages in the small but expanding progenitor population [31]. The signature of positive selection includes reduced levels of genetic variation compared with neutral expectations [24–26], a skew in the allele frequency spectrum towards low-frequency alleles [32], together with an excess of high-frequency derived alleles [33], and elevated levels of linkage disequilibrium [34].

Balancing selection occurs when polymorphisms are selectively maintained in a population. By contrast with positive selection, the genealogy of a locus subject to balancing selection is characterised by an increased time to the most recent common ancestor and long internal branches (Figure 2). The effect of balancing selection on gene genealogies can be understood by considering balanced alleles as distinct subpopulations, such that coalescence events can occur rapidly within a subpopulation but slowly between subpopulations [27]. The signature of balancing selection includes elevated levels of polymorphism relative to neutral expectations and a skew of the allele frequency distribution towards an excess of intermediate frequency alleles [29, 30, 35].

In addition to natural selection, population demographic history can also have strong influences on patterns of genetic variation, which often mimic the effect of natural selection [36, 37]. In other words, inferences of natural selection are confounded by population demographic history. For example, both positive selection and increases in population size have similar effects on gene genealogies (Figure 2); both processes therefore lead to an excess of low-frequency alleles in a population. In fact, strong positive selection can be thought of as a rapid population expansion of an advantageous allele as it sweeps through a population. Similarly, population structure and balancing selection both result in subdivided genealogies and therefore both processes are expected to result in an excess of intermediate-frequency alleles in a population (Figure 2). Population bottlenecks can lead to an excess of either low- or intermediate-frequency alleles relative to neutral expectations, depending on the age and severity of the bottleneck. Figure 2 demonstrates the effect of a severe and recent bottleneck, which forces all lineages to coalesce at the time of the size reduction and results in a genealogy that is similar to positive selection. Human populations clearly do not meet all of the assumptions of the standard neutral model; hence, rejecting the standard neutral model for a particular locus cannot be interpreted as unambiguous evidence for selection.

Detecting the signature of natural selection

Before presenting the results from genome-wide scans for natural selection, there now follows a brief description of some commonly used statistical methods designed to detect departures from neutrality, highlighting some of their strengths and limitations. The following is not meant to be an exhaustive discussion of such tests, and descriptions of many interesting and useful methods will not be included here. For further study, the reader is encouraged to see an excellent review by Kreitman [38].

Statistical tests of neutrality can broadly be classified into three categories, based upon the type of data that they use: 1) within-species tests; 2) within- and between-species tests; and 3) between-species tests (Table 1). The most common class of within-species tests compares summary statistics of the observed allele frequency distribution at a locus with the values expected under neutral evolution; it includes Tajima's D [39], Fu and Li's D and F [40], Fay and Wu's H [33] and Fu's F_s [41]. An attractive feature of these tests is that they do not require any a priori classification of functional versus non-functional sites, thus making them equally suitable for protein-coding and non-protein-coding regions. In a thorough examination of the power of Tajima's D and Fu and Li's D and F, Simonsen et al. [42] found that these tests can only detect selective sweeps in a narrow time interval in the recent past, and that they can only detect balancing selection if it has acted for a very long time period. Interestingly, Simonsen et al. also found that the power of these tests could drop below the nominal false-positive rate, α, if non-neutral evolution did not occur within these critical time windows, thus creating the undesirable scenario in which rejection of the null is more likely if the null is true than if it is false. Fu observed similar results, although he found that F_s was most powerful and performed better at detecting more ancient positive selection [41].
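To make the site-frequency spectrum tests more concrete, the following is a minimal sketch of Tajima's D using the standard published formula, which contrasts the mean number of pairwise differences with the number of segregating sites. The input format (aligned strings of '0' and '1', one per sampled chromosome) and the function name are assumptions made for illustration; assessing significance would additionally require neutral or demographic simulations, which are not shown.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D from aligned, equal-length haplotype strings of '0' and '1'."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    n_sites = len(haplotypes[0])
    # Count the '1' allele at each site and keep only segregating sites
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences (pi), summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (hypothetical): all singletons versus all intermediate-frequency variants
print(tajimas_d(["1000", "0100", "0010", "0001"]))  # negative: excess of rare alleles
print(tajimas_d(["1111", "1111", "0000", "0000"]))  # positive: excess of common alleles
```

In this toy example the sign of D tracks the skew in the frequency spectrum: an excess of low-frequency alleles (as after a sweep or expansion) gives negative values, and an excess of intermediate-frequency alleles (as under balancing selection or subdivision) gives positive values.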

Table 1 Statistical tests of neutrality

The site-frequency spectrum tests discussed above are confounded by demographic events such as population growth, bottlenecks and subdivision (Figure 2) and are rendered conservative by intra-locus recombination. The desire to estimate population demographic parameters, recombination rates and evolutionary parameters has prompted the development of maximum likelihood-based methods which use the complete data, rather than summary statistics [31, 43, 44]. These methods are computationally intensive and are not currently feasible for large datasets, but they potentially allow for substantial gains in statistical power relative to summary statistics methods and are likely to become increasingly important tools in the future (for a general discussion, see Felsenstein [45]).

Another within-species test that has been used to detect selection is to compare the variation in allele frequencies between populations, which can be quantified by the statistic F_ST. Under selective neutrality, F_ST is determined by genetic drift, whereas natural selection is a locus-specific force that can cause systematic deviations in F_ST values for a selected gene and nearby genetic markers. For example, geographically restricted directional selection may lead to an increase in F_ST of a selected locus, whereas balancing or species-wide directional selection may lead to a decrease in F_ST compared with neutrally evolving loci [46–50]. In a series of simulation experiments analysing two different F_ST test implementations, Beaumont and Balding found that this approach yielded sufficient power to detect positive selection provided that the selective coefficient was approximately five times larger than the migration rate, but that F_ST had little power to detect balancing selection [50].
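As a simple illustration of how F_ST summarises allele frequency differences, the sketch below computes F_ST for a single biallelic locus as (H_T - H_S)/H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean within-population heterozygosity. It assumes equally weighted populations and known allele frequencies, and it ignores sampling corrections; the estimators used in the studies cited above may differ.

```python
def fst_biallelic(freqs):
    """F_ST for one biallelic locus from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0.0:
        raise ValueError("locus is monomorphic in the pooled sample")
    # Mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations
print(fst_biallelic([0.5, 0.5]))  # identical frequencies, no differentiation
print(fst_biallelic([0.2, 0.8]))  # strongly differentiated locus
print(fst_biallelic([0.0, 1.0]))  # fixed difference between populations
```

Loci whose values fall far above the bulk of the genome would be candidates for geographically restricted selection, and loci far below it for balancing or species-wide directional selection.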

Positive selection is also expected to increase levels of linkage disequilibrium (LD) relative to neutral expectations. Recently, a new statistical test was developed, the long-range haplotype (LRH) test [34], which takes advantage of ancestral recombination events and the associated decay in LD to identify genes subject to positive selection. The rationale for this test is that a common allele with long-range LD potentially represents a site that has appeared recently and was driven to high frequency before recombination could erode LD. The LRH approach does not detect balancing selection, however, and the robustness of the test to non-neutral population demographics, the choice of haplotype-defining markers and phase misspecification has not been well studied.

The second major class of neutrality tests compares levels of within-species polymorphism and between-species divergence and includes the Hudson–Kreitman–Aguadé (HKA) [51] and McDonald–Kreitman (MK) [52] tests. The HKA method tests the goodness of fit of the observed levels of polymorphism within species and the observed divergence between species to those predicted under neutral theory. In order to determine polymorphism and divergence expectations under neutrality, data are required from at least two loci in each species, so that a simultaneous estimate can be made of a time-since-speciation parameter and a relative population size parameter. Under the HKA test, rejection of the null is formally interpreted as elevated polymorphism at one locus or reduced polymorphism at the other, or excess divergence at one locus or limited divergence at the other. Thus, it may not be obvious which locus or which process is responsible for producing a statistically significant test. McDonald [53] has described improvements to the HKA test which may ameliorate this problem.

In the MK test, a 2 × 2 contingency table is formed to compare the number of non-synonymous and synonymous sites that are polymorphic within a species (P_N and P_S) and fixed between species (D_N and D_S). Under neutrality, the ratio of non-synonymous to synonymous sites that are polymorphic equals the ratio of non-synonymous to synonymous sites that are fixed (ie P_N/P_S = D_N/D_S). Under positive selection, however, these two ratios are no longer equal and D_N/D_S > P_N/P_S [54]. Among the strengths of the MK test are that it does not require assumptions about population demographic history (although under some circumstances the test can be adversely affected by increases in effective population size [54]) and is relatively insensitive to intra-locus recombination. Positive or purifying selection for codon usage may, however, bias the MK test [38].
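The 2 × 2 comparison can be sketched as follows. The example applies a two-sided Fisher's exact test to the table, which is one common way of assessing an MK table and is an assumption here rather than a procedure specified in this review. It also reports the neutrality index, (P_N/P_S)/(D_N/D_S), a widely used summary that is not discussed above; values below one correspond to D_N/D_S > P_N/P_S. The counts in the usage example are hypothetical.

```python
from math import comb


def mk_test(d_n, d_s, p_n, p_s):
    """Return (neutrality index, two-sided Fisher exact p-value) for an MK table."""
    row_fixed = d_n + d_s  # fixed differences between species
    row_poly = p_n + p_s   # polymorphisms within species
    col_nonsyn = d_n + p_n
    total = row_fixed + row_poly

    def prob(a):
        # Hypergeometric probability of a non-synonymous fixed differences
        return comb(row_fixed, a) * comb(row_poly, col_nonsyn - a) / comb(total, col_nonsyn)

    lowest = max(0, col_nonsyn - row_poly)
    highest = min(row_fixed, col_nonsyn)
    p_obs = prob(d_n)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    p_value = sum(
        prob(a) for a in range(lowest, highest + 1) if prob(a) <= p_obs * (1 + 1e-9)
    )
    # The neutrality index is undefined when either denominator count is zero
    ni = (p_n * d_s) / (p_s * d_n) if p_s > 0 and d_n > 0 else None
    return ni, min(1.0, p_value)


# Hypothetical counts: an excess of non-synonymous fixed differences
ni, p = mk_test(d_n=7, d_s=17, p_n=2, p_s=42)
print(ni, p)
```

With these hypothetical counts the neutrality index is well below one and the table departs significantly from the neutral expectation, the pattern the text associates with positive selection.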

The final class of neutrality tests uses between-species data to test for adaptive protein evolution. The classic test of positive selection compares the number of non-synonymous (amino acid-changing) substitutions per non-synonymous site in a gene (d_n) with the number of synonymous substitutions per synonymous site (d_s). Under neutrality, the substitution rate at both categories of sites is the same, and d_n/d_s is expected to equal one; however, d_n/d_s < 1 for proteins subject to purifying selection and d_n/d_s > 1 for proteins under adaptive evolution. Although d_n/d_s > 1 provides strong evidence for adaptive protein evolution, it is a very conservative test, particularly if only a small number of codons have been selected for. The basic test has also been extended by Nielsen and Yang [55] and others to include models of codon and transition/transversion bias, to detect variation in d_n/d_s ratios among lineages and to identify specific codons under selection [56, 57].

Key advantages of genome-wide analyses

As alluded to above, distinguishing between the confounding effects of natural selection and population demographic history is difficult when studying a single locus. When many unlinked genes are considered, however, a clear strategy emerges. Population demographic history affects patterns of variation at all loci in a genome in a similar manner, whereas natural selection acts upon specific loci [12, 37, 46, 58]. Therefore, by sampling a large number of unlinked loci throughout the genome, empirical distributions of test statistics can be constructed and genes subject to locus-specific forces, such as natural selection, can be identified as outlier loci.
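The outlier strategy can be sketched in a few lines: rank a test statistic across loci and flag those in the extreme tails of the empirical distribution. The function name, the locus labels and the tail fraction are illustrative assumptions, and the choice of cut-off is arbitrary rather than taken from any of the studies reviewed here.

```python
def empirical_outliers(stats, tail=0.025):
    """Return (low, high) lists of loci in the lower and upper empirical tails.

    stats maps a locus name to its test statistic (eg F_ST or Tajima's D).
    """
    ranked = sorted(stats.items(), key=lambda item: item[1])
    k = max(1, round(len(ranked) * tail))  # number of loci flagged in each tail
    low = [name for name, _ in ranked[:k]]
    high = [name for name, _ in ranked[-k:]]
    return low, high


# Hypothetical statistics for 20 unlinked loci
stats = {f"locus{i}": float(i) for i in range(20)}
low, high = empirical_outliers(stats, tail=0.1)
print(low, high)
```

Note that this approach always flags a fixed fraction of loci, whether or not any of them has been a target of selection, so outliers are best treated as candidates for follow-up rather than as confirmed targets.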

To provide some examples of how genome-wide analyses can facilitate inferences of natural selection, Figure 3 shows empirical distributions of Tajima's D, F_ST and d_n/d_s, along with their theoretical distributions, simulated under both standard neutral models and alternative demographic histories. Figure 3 highlights two important points. First, empirical distributions provide important information that can be used to infer population demographic history. For example, the demographic models used in simulating Tajima's D (Figure 3A and 3B) recapitulate the empirical distributions much more closely than data simulated under a standard neutral model. Secondly, outlier loci can be identified with greater precision and accuracy with more realistic models of human demographic history. Specifically, the best-fitting non-neutral distributions dramatically reduce the number of test statistics that are apparent outliers under neutrality. Conversely, some test statistics that do not appear to deviate from neutrality are outliers under the best non-neutral distributions. Thus, in principle, empirical distributions of test statistics can be used both to reduce the false-positive rate and to improve power. Although this general strategy has recently been dubbed 'population genomics', the theoretical foundation of searching for outlier loci to find targets of natural selection was outlined decades ago [46, 47].

In addition to providing empirical distributions, genome-wide scans for natural selection offer several additional advantages compared with single-locus studies. Genome-wide scans can suggest general principles about the types of variation that natural selection acts most forcefully upon. Datasets derived from an unbiased sampling of loci throughout the genome allow for the discovery of novel functional elements whose presence is revealed by evidence for selection. Whole-genome scans also have the potential to reveal networks of genes whose evolutionary histories are correlated due to their collaboration in executing cellular functions. Finally, it is important to stress that genome-wide analyses do not preclude single-locus analyses, and that achieving a detailed and thorough understanding of the selective and demographic forces acting upon a locus will necessitate focused single-locus analyses drawing from multiple scientific disciplines.

Genome scans for natural selection

Several genome-wide scans for natural selection have recently been performed and are summarised in Table 2. These studies have used a variety of different statistical approaches, data and populations, but are united by the common theme of sampling a large number of loci and making inferences of natural selection. Below, some of these studies will be considered in more detail, to highlight the salient results emerging from genome-wide scans for selection.

Table 2 Summary of genome-wide scans for selection

One of the first genome-wide screens for selection to be performed analysed 26,530 single nucleotide polymorphisms (SNPs), which were genotyped in three human populations: African-Americans, East Asians and European-Americans [65]. An empirical distribution of F_ST was constructed and outlier SNPs in gene regions were identified. As discussed above, geographically restricted selection (local adaptation) can accentuate levels of population structure by creating large differences in allele frequencies between populations. Conversely, balancing selection can lead to lower than expected levels of population structure. In total, 174 candidate selection genes were identified whose levels of population structure were significantly different compared with neutral expectations (156 genes had exceptionally high values of F_ST and 18 had exceptionally low values of F_ST). In addition, the average F_ST was significantly different between SNPs located in exons, introns and non-genic regions, which is consistent with the action of purifying selection. One limitation of this study was that it relied upon markers that were discovered in a small number of chromosomes, which can lead to significant ascertainment bias (ie in this case, an over-representation of intermediate-frequency alleles). Such ascertainment bias complicates inferences of natural selection, and, as the authors note, additional analyses are needed to confirm the signature of selection in these genes.

Three genome-wide scans for natural selection have also been performed with microsatellite markers [66–68], the largest of which analysed 5,257 microsatellite markers in 28 individuals of European descent [66]. A sliding window analysis across the genome revealed 43 bins that contained a significant reduction in heterozygosity relative to neutral expectations. Interestingly, the recombination rate in these 43 bins was significantly reduced compared with the genome-wide average, which is consistent with theoretical predictions that positive selection will be easier to detect in regions of the genome with low recombination rates [23].

The other two microsatellite-based genome-wide scans for selection included multiple populations and searched for evidence of local adaptation by identifying outlier loci that exhibited large levels of population structure relative to the empirical distribution of all loci. Specifically, Kayser et al. [67] studied 332 microsatellite markers in 47 Europeans and 47 Africans (23 Ethiopians and 24 South Africans). The test statistics R_ST, a multiallelic analogue of F_ST, and ln RV, which is the natural log of the ratio of the variances in allele size between populations [69], were calculated for all loci. Numerous outlier loci were detected and 11 were studied further by genotyping additional microsatellite markers in these regions. The additional microsatellite analyses confirmed the large differences in genetic differentiation, which strengthens the hypothesis that outlier loci have been targets of geographically restricted selective pressures. Similarly, Storz et al. [68] analysed a total of 624 microsatellite loci that were previously genotyped in multiple populations from Africa, Europe and Asia. Again, measures of population structure were calculated for all markers (F_ST and an analogue to ln RV) and outlier loci were identified. In total, 13 outlier loci were found and all but one had significant reductions in heterozygosity in non-African populations; this was interpreted as evidence that local adaptation was more common outside of Africa. An important limitation of the microsatellite analyses is that the high mutation rate of microsatellites may obscure signatures of selection, except in low-recombining regions of the genome [70, 71].

In one of the largest gene-based genome-wide screens performed to date, Clark et al. [62] analysed 7,645 orthologous genes from humans, chimpanzees and mice (see also Figure 3D). Maximum-likelihood models were fitted to protein-coding DNA sequences to estimate rates of synonymous (d_s) and non-synonymous (d_n) substitutions. In total, 1,547 genes had d_n/d_s ratios > 1 in humans, which is commonly interpreted as evidence for positive selection, but the neutral model could be formally rejected at p < 0.05 for only six of these genes. Using an alternative statistical method with greater sensitivity, branch site models were fitted to the data in order to detect accelerated rates of d_n/d_s in the human lineage for a subset of nucleotide sites (ie d_n/d_s does not have to be > 1 for the entire gene). A total of 667 genes were identified as significant at p < 0.05 in this analysis; subsequent bioinformatics analyses revealed two interesting observations. First, accelerated rates of evolution were found for several functional classes of genes, including olfactory, nuclear transport and sensory perception. Secondly, genes with evidence for positive selection were enriched for genes that are associated with human diseases, as defined by the Online Mendelian Inheritance in Man (OMIM) database. OMIM primarily contains monogenic disease genes with large phenotypic effects, and it will therefore be interesting to see if these results also extend to complex disease genes. Indeed, signatures of natural selection have been described for several genes associated with various complex diseases [34, 72–78]. If complex disease genes are enriched for signatures of natural selection, finding targets of adaptive evolution may be a useful strategy for prioritising candidate genes in disease-mapping studies.

It is important to note that a recent theoretical study has suggested that maximum-likelihood branch site models may have a high false-positive rate [79] and, therefore, the 667 significant (at p < 0.05) genes in the study by Clark et al. [62] may contain a higher than anticipated fraction of false positives. In addition, increased rates of d_n/d_s along a lineage do not always indicate the action of positive selection and can also occur due to relaxation of purifying selection [79, 80]. As the authors point out, obtaining polymorphism data from human populations would provide further insight into the evolutionary history of these genes and help to clarify some of the issues raised above.

Local adaptation

An interesting observation that has consistently emerged from large-scale studies of selection is that local adaptation may be a more common feature of recent human evolutionary history than previously thought [52–60, 63–68]. Human populations have clearly had dramatic range expansions during the past 100,000 years that, at least theoretically, may have led to geographically restricted selective pressures, such as unique dietary, pathogenic and climatic challenges. Several genes that possess patterns of genetic variation consistent with local adaptation have previously been reported (Table 3) [60, 72–74, 76, 81–85]. As an illustrative example, Figure 4 shows patterns of genetic variation for a 115 kilobase region on chromosome 7q33 that possesses a striking signature of local adaptation in European-American populations [60]. Two of the genes in this region, TRPV5 and TRPV6, mediate the rate-limiting step of dietary calcium absorption; [86, 87] given the fact that lactase persistence and related metabolic pathways were selected for in northern European populations, [83] they are particularly strong candidates for the gene or genes driving this pattern of local adaptation.

Table 3 Genes with evidence of local adaptation

In addition, several studies have found that non-African populations possess more evidence for selection relative to African populations [60, 67, 68]. As most studies have considered only a single African population, however, it is difficult to determine whether the observed differences in the frequency of selective events between African and non-African populations are a general phenomenon or simply reflect the need to sample African populations more comprehensively. Furthermore, theoretical studies have demonstrated that the power to detect a recent selective sweep is greater than the power to detect an older sweep [41, 42, 88, 89]. Therefore, the frequency of selective events may be similar in African and non-African populations, but such events may be easier to detect in non-African populations if they occurred more recently.

Looking ahead: The HapMap project

The HapMap project (http://www.hapmap.org/) is a large international collaboration to describe patterns of common haplotype variation throughout the human genome [61]. The initial goal of the HapMap project is to genotype 600,000 SNPs in 270 individuals: 90 individuals of northern and western European ancestry (30 trios consisting of two parents and an adult child), 90 Yoruban individuals from Ibadan, Nigeria (30 trios), 45 unrelated Japanese individuals from Tokyo, Japan, and 45 unrelated Han Chinese individuals from Beijing, China. Although the HapMap project was initially developed to facilitate the search for complex disease genes, it will provide a powerful resource for population genetics and evolutionary studies. Specifically, it will provide a unifying publicly-available resource of genome-wide variation data to interrogate systematically for signatures of natural selection. As numerous evolutionary analyses will undoubtedly be conducted on the HapMap data, results can be verified across studies, which will allow prioritising candidate selection genes for subsequent studies.

Future challenges

It is important to temper our enthusiasm for genome-wide scans of natural selection because several analytical and conceptual challenges remain. For example, as indicated above, thousands of hypothesis tests will be performed in a typical study and it is necessary to correct for multiple tests to avoid an unacceptably high false-positive rate. One particularly appealing approach is to control the false discovery rate, [90, 91] which is more powerful than traditional methods such as Bonferroni corrections and has been used in a wide variety of genomics analyses. Furthermore, as numerous genome-wide scans for selection will be applied to common datasets, such as the HapMap, methods for combining results across studies would be invaluable.
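For readers unfamiliar with false discovery rate control, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, which is one standard way of controlling the false discovery rate and is assumed here for illustration. The p-values in the usage example are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests are rejected at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is below rank * alpha / m
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five tests
pvalues = [0.025, 0.5, 0.005, 0.039, 0.028]
fdr_hits = benjamini_hochberg(pvalues, alpha=0.05)
bonferroni_hits = [p < 0.05 / len(pvalues) for p in pvalues]
print(sum(fdr_hits), sum(bonferroni_hits))
```

In this toy example the false discovery rate procedure rejects four of the five tests, whereas a Bonferroni correction at the same nominal level rejects only one, which illustrates the gain in power mentioned above.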

A critical issue that has already arisen in current genome-wide scans for selection is the need to verify the signature of selection through replication studies and by alternative experimental approaches. The importance of follow-up studies cannot be overstated because in their absence we will simply be left with a list of interesting 'candidate selection genes'. The problem of follow-up replication in genome-wide studies is a general one that has been considered in linkage analysis [92] and genetic association studies [93]. Clearly, replication in independent samples from the same population is an important criterion that can be used to discard false positives that accumulate from the multiple testing inherent in genome scans. Genome-wide study designs are known to suffer from the 'winner's curse' phenomenon, however, whereby the effect sizes of statistically significant loci are systematically over-estimated [93, 94]. If such concerns are ignored, the statistical power of subsequent replication attempts is likely to be over-estimated, leading the community to place undue faith in the veracity of failed replication attempts. Even if signatures of selection are confirmed, it remains difficult to identify the specific variants that have been subject to selection. Ideally, suspected targets of selection will be functionally characterised, which will facilitate inferences on genotype–phenotype correlations and ultimately on how the putative selected alleles affect fitness. Finally, more powerful methods to estimate evolutionary parameters, such as the timing of selective events and the strength of selection, need to be developed.

In addition to the issues described above, it is important to note that all of the statistical methods and studies considered in this review are predicated upon simple theoretical models of natural selection. For example, tests such as Tajima's D search for signatures of selection acting on a single locus. Genes do not exist in isolation, however, and it is possible, perhaps even likely, that selection acts on combinations of alleles, a process that is referred to as epistatic selection [95]. Recently, two studies in Drosophila melanogaster demonstrated strong empirical evidence for epistatic selection [96, 97]. It seems likely that progress in reconstructing gene and protein networks will serve as a valuable guide in beginning to explore epistatic selection in humans.
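To show what is meant by a single-locus test, the following sketch computes Tajima's D from a set of aligned haplotypes at one locus, using the standard constants from Tajima's 1989 formulation. The function name and the two toy alignments are illustrative assumptions; the statistic uses only the variation within the locus it is given and carries no information about associations between loci, which is why tests of this kind cannot by themselves detect epistatic selection.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings sampled at one locus."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites in the sample.
    S = sum(1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1)
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: average number of pairwise differences between haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignment with only singletons (an excess of rare variants).
rare_variants = ["100000", "010000", "001000", "000100", "000010", "000001"]
# Toy alignment with two deeply diverged haplotype classes
# (an excess of intermediate-frequency variants).
two_classes = ["111111", "111111", "111111", "000000", "000000", "000000"]

print("D, excess of rare variants:", round(tajimas_d(rare_variants), 2))
print("D, intermediate-frequency variants:", round(tajimas_d(two_classes), 2))
```

An excess of rare variants gives a negative value and an excess of intermediate-frequency variants gives a positive value, but in both cases the calculation is confined to one locus at a time.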

Conclusions

The intersection of high-throughput methods to access human genetic variation on a genome-wide scale and statistical tools to identify signatures of natural selection will undoubtedly provide a deeper understanding of how adaptive processes helped to shape our genomes. Furthermore, the same resources used to scan the genome for signatures of selection will also provide a more comprehensive understanding of human demographic history, which will be necessary to understand how neutral and non-neutral evolutionary forces have interacted to shape extant patterns of human genetic and phenotypic diversity. Although many hurdles are likely to be encountered, the evolutionary insights obtained from genome-wide analyses will have implications for many contemporary issues, such as the functional annotation of the human genome and the discovery of complex disease genes.

References

Valle D: Genetics, individuality, and medicine in the 21st century. Am J Hum Genet. 2004, 74: 374-381. 10.1086/382790.
Bamshad M, Wooding SP: Signatures of natural selection in the human genome. Nat Rev Genet. 2003, 4: 99-111. 10.1038/nrg999.
Harr B, Kauer M, Schlötterer C: Hitchhiking mapping: A population-based fine mapping strategy for adaptive mutations in Drosophila melanogaster. Proc Natl Acad Sci USA. 2002, 99: 12949-12954. 10.1073/pnas.202336899.
Kauer MO, Dieringer D, Schlötterer C: A microsatellite variability screen for positive selection associated with the "Out of Africa" habitat expansion of Drosophila melanogaster. Genetics. 2003, 165: 1137-1148.
Schöfl G, Schlötterer C: Patterns of microsatellite variability among X chromosomes and autosomes indicate a high frequency of beneficial mutations in non-African D. simulans. Mol Biol Evol. 2004, 21: 1384-1390. 10.1093/molbev/msh132.
Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.
King JL, Jukes TH: Non-Darwinian evolution. Science. 1969, 164: 788-798. 10.1126/science.164.3881.788.
Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge University Press, Cambridge, UK
Ohta T: Slightly deleterious mutant substitutions in evolution. Nature. 1973, 246: 96-98. 10.1038/246096a0.
Ohta T, Gillespie JH: Development of neutral and nearly neutral theories. Theor Popul Biol. 1996, 49: 128-142. 10.1006/tpbi.1996.0007.
Otto SP: Detecting the form of selection from DNA sequence data. Trends Genet. 2000, 16: 526-529. 10.1016/S0168-9525(00)02141-7.
Nielsen R: Statistical tests of selective neutrality in the age of genomics. Heredity. 2001, 86: 641-647. 10.1046/j.1365-2540.2001.00895.x.
Kingman JFC: The coalescent. Stochastic Process Appl. 1982, 13: 235-248. 10.1016/0304-4149(82)90011-4.
Kingman JFC: On the genealogy of large populations. J Appl Prob. 1982, 19A: 27-43.
Hudson RR: Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983, 23: 183-201. 10.1016/0040-5809(83)90013-8.
Hudson RR: Testing the constant-rate neutral allele model with protein sequence data. Evolution. 1983, 37: 203-217. 10.2307/2408186.
Tajima F: Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983, 105: 437-460.
Fu YX, Li WH: Coalescing into the 21st century: An overview and prospects of coalescent theory. Theor Popul Biol. 1999, 56: 1-10. 10.1006/tpbi.1999.1421.
Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3: 380-390. 10.1038/nrg795.
Charlesworth B, Morgan MT, Charlesworth D: The effect of deleterious mutations on neutral molecular variation. Genetics. 1993, 134: 1289-1303.
Hudson RR, Kaplan NL: Deleterious background selection with recombination. Genetics. 1995, 141: 1605-1617.
Neuhauser C, Krone SK: The genealogy of samples in models with selection. Genetics. 1997, 145: 519-534.
Maynard Smith J, Haigh J: The hitch-hiking effect of a favorable gene. Genet Res. 1974, 23: 23-35.
Thomson G: The effect of a selected locus on a linked neutral locus. Genetics. 1977, 85: 752-788.
Kaplan N, Hudson RR, Langley CH: The "hitchhiking effect" revisited. Genetics. 1989, 123: 887-899.
Stephan W, Wiehe THE, Lenz MW: The effect of strongly selected substitutions on neutral polymorphism: Analytical results based on diffusion theory. Theor Popul Biol. 1992, 41: 237-254. 10.1016/0040-5809(92)90045-U.
Nordborg M: Structured coalescent processes on different time scales. Genetics. 1997, 146: 1501-1514.
Schierup MH, Vekemans X, Charlesworth D: The effect of subdivision on variation at multi-allelic loci under balancing selection. Genet Res. 2000, 76: 51-62. 10.1017/S0016672300004535.
Kelly JK, Wade MJ: Molecular evolution near a two-locus balanced polymorphism. J Theor Biol. 2000, 204: 83-101. 10.1006/jtbi.2000.2003.
Nordborg M, Innan H: The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population. Genetics. 2003, 163: 1201-1213.
Nielsen R: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics. 2000, 154: 931-942.
Braverman JM, Hudson RR, Kaplan NL, et al: The hitchhiking effect on the site frequency spectrum of DNA polymorphism. Genetics. 1995, 140: 783-796.
Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.
Sabeti PC, Reich DE, Higgins JM, et al: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
Takahata N, Nei M: Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics. 1990, 124: 967-978.
Tajima F: The effect of change in population size on DNA polymorphism. Genetics. 1989, 123: 597-601.
Przeworski M, Hudson RR, Di Rienzo A: Adjusting the focus on human variation. Trends Genet. 2000, 16: 296-302. 10.1016/S0168-9525(00)02030-8.
Kreitman M: Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet. 2000, 1: 539-559. 10.1146/annurev.genom.1.1.539.
Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
Fu YX, Li WH: Statistical test of neutrality of mutations. Genetics. 1993, 133: 693-709.
Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997, 147: 915-925.
Simonsen KL, Churchill GA, Aquadro CF: Properties of statistical tests of neutrality for DNA polymorphism data. Genetics. 1995, 141: 413-429.
Kuhner MK, Yamato J, Felsenstein J: Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics. 1995, 140: 1421-1430.
Kuhner MK, Yamato J, Felsenstein J: Maximum likelihood estimation of population growth rates based on the coalescent. Genetics. 1998, 149: 429-434.
Felsenstein J: Likelihood calculations on coalescents. Inferring Phylogenies. Edited by: Felsenstein J. 2004, Sinauer Associates, Sunderland, MA, 470-487.
Cavalli-Sforza LL: Population structure and human evolution. Proc R Soc Lond B Biol Sci. 1966, 164: 362-379. 10.1098/rspb.1966.0038.
Lewontin RC, Krakauer J: Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 1973, 74: 175-195.
Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.
Vitalis R, Dawson K, Boursot P: Interpretation of variation across marker loci as evidence of selection. Genetics. 2001, 158: 1811-1823.
Beaumont MA, Balding DJ: Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol. 2004, 13: 969-980. 10.1111/j.1365-294X.2004.02125.x.
Hudson RR, Kreitman M, Aguade M: A test of neutral molecular evolution based on nucleotide data. Genetics. 1987, 116: 153-159.
McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991, 351: 652-654. 10.1038/351652a0.
McDonald JH: Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol Biol Evol. 1998, 15: 377-384. 10.1093/oxfordjournals.molbev.a025934.
Eyre-Walker A: Changing effective population size and the McDonald-Kreitman test. Genetics. 2002, 162: 2017-2024.
Nielsen R, Yang Z: Likelihood models for detecting positive selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.
Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
Suzuki Y, Gojobori T: A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 1999, 16: 1315-1328. 10.1093/oxfordjournals.molbev.a026042.
Andolfatto P: Adaptive hitchhiking effects on genome variability. Curr Opin Genet Dev. 2001, 11: 635-641. 10.1016/S0959-437X(00)00246-X.
Hudson RR: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002, 18: 337-338. 10.1093/bioinformatics/18.2.337.
Akey JM, Eberle MA, Rieder MJ, et al: Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2004, 2: 1591-1599.
International HapMap Consortium: The international HapMap project. Nature. 2003, 426: 789-794. 10.1038/nature02168.
Clark AG, Glanowski S, Nielsen R, et al: Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science. 2003, 302: 1960-1963. 10.1126/science.1088821.
Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43. 10.1093/oxfordjournals.molbev.a026236.
Yang Z: PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl BioSci. 1997, 13: 555-556.
Akey JM, Zhang G, Zhang K, et al: Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002, 12: 1805-1814. 10.1101/gr.631202.
Payseur BA, Cutter AD, Nachman MW: Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol Biol Evol. 2002, 19: 1143-1153. 10.1093/oxfordjournals.molbev.a004172.
Kayser M, Brauer S, Stoneking M: A genome scan to detect candidate regions influenced by local natural selection in human populations. Mol Biol Evol. 2003, 20: 893-900. 10.1093/molbev/msg092.
Storz JF, Payseur BA, Nachman MW: Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol Biol Evol. 2004, 21: 1800-1811. 10.1093/molbev/msh192.
Schlötterer C: A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics. 2002, 160: 753-763.
Schlötterer C, Wiehe T: Microsatellites, a neutral marker to infer selective sweeps. Microsatellites: Evolution and Applications. Edited by: Goldstein D, Schlötterer C. 1999, Oxford University Press, Oxford, UK, 238-248.
Wiehe T: The effect of selective sweeps on the variance of the allele distribution of a linked multi-allele locus-hitchhiking of microsatellites. Theor Popul Biol. 1998, 53: 272-283. 10.1006/tpbi.1997.1346.
Hamblin MT, Di Rienzo A: Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. Am J Hum Genet. 2000, 66: 1669-1679. 10.1086/302879.
Tishkoff SA, Varkonyi R, Cahinhinan N, et al: Haplotype diversity and linkage disequilibrium at human G6PD: Recent origin of alleles that confer malarial resistance. Science. 2001, 293: 455-462. 10.1126/science.1061573.
Hamblin MT, Thompson EE, Di Rienzo A: Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet. 2002, 70: 369-383. 10.1086/338628.
Bamshad MJ, Mummidi S, Gonzalez E, et al: A strong signature of balancing selection in the 5' cis-regulatory region of CCR5. Proc Natl Acad Sci USA. 2002, 99: 10539-10544. 10.1073/pnas.162046399.
Fullerton SM, Bartoszewicz A, Ybazeta G, et al: Geographic and haplotype structure of candidate type 2 diabetes susceptibility variants at the calpain-10 locus. Am J Hum Genet. 2002, 70: 1096-1106. 10.1086/339930.
Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on MMP3 regulation has shaped heart disease risk. Curr Biol. 2004, 14: 1531-1539. 10.1016/j.cub.2004.08.051.
Nakajima T, Wooding S, Sakagami T, et al: Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am J Hum Genet. 2004, 74: 898-916. 10.1086/420793.
Zhang J: Frequent false detection of positive selection by the likelihood method with branch-site models. Mol Biol Evol. 2004, 21: 1332-1339. 10.1093/molbev/msh117.
Rooney AP, Zhang J: Rapid evolution of a primate sperm protein: Relaxation of functional constraint or positive Darwinian selection?. Mol Biol Evol. 1999, 16: 706-710. 10.1093/oxfordjournals.molbev.a026153.
Gilad Y, Rosenberg S, Przeworski M, et al: Evidence for positive selection and population structure at the human MAO-A gene. Proc Natl Acad Sci USA. 2002, 99: 862-867. 10.1073/pnas.022614799.
Rana BK, Hewett-Emmett D, Jin L, et al: High polymorphism at the human melanocortin 1 receptor locus. Genetics. 1999, 151: 1547-1557.
Bersaglieri T, Sabeti PC, Patterson N, et al: Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004, 74: 1111-1120. 10.1086/421051.
Stephens JC, Reich DE, Goldstein DB, et al: Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet. 1998, 62: 1507-1515. 10.1086/301867.
Rockman MV, Hahn MW, Soranzo N, et al: Positive selection on a human-specific transcription factor binding site regulating IL4 expression. Curr Biol. 2003, 13: 2118-2123. 10.1016/j.cub.2003.11.025.
Nijenhuis T, Hoenderop JGJ, Nilius B, Bindels RJM: (Patho)physiological implications of the novel epithelial Ca2⁺ channels TRPV5 and TRPV6. Pflugers Arch. 2003, 446: 401-409. 10.1007/s00424-003-1038-7.
van de Graaf SF, Hoenderop JG, Gkika D, et al: Functional expression of the epithelial Ca2⁺ channels (TRPV5 and TRPV6) requires association of the S100A10-annexin 2 complex. EMBO J. 2003, 22: 1478-1487. 10.1093/emboj/cdg162.
Kim Y, Stephan W: Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics. 2000, 155: 1415-1427.
Przeworski M: The signature of positive selection at randomly chosen loci. Genetics. 2002, 160: 1179-1189.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 1995, 57: 289-300.
Storey JD, Tibshirani R: Statistical significance for genome-wide experiments. Proc Natl Acad Sci USA. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.
Lander E, Kruglyak L: Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nat Genet. 1995, 11: 241-247. 10.1038/ng1195-241.
Lohmueller KE, Pearce CL, Pike M, et al: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet. 2003, 33: 177-182. 10.1038/ng1071.
Goring HH, Terwilliger JD, Blangero J: Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001, 69: 1357-1369. 10.1086/324471.
Lewontin RC, Kojima K: The evolutionary dynamics of complex polymorphisms. Evolution. 1960, 14: 458-472. 10.2307/2405995.
Takano-Shimizu T, Kawabe A, Inomata N, et al: Interlocus nonrandom association of polymorphisms in Drosophila chemoreceptor genes. Proc Natl Acad Sci USA. 2004, 101: 14156-14161. 10.1073/pnas.0401782101.
Zapata C, Nunez C, Velasco T: Distribution of nonrandom associations between pairs of protein loci along the third chromosome of Drosophila melanogaster. Genetics. 2002, 161: 1539-1550.

Acknowledgements

We thank Jennifer Madeoy, Dayna Akey and an anonymous reviewer for critical reading of the manuscript and providing valuable comments. J.R. is supported by the University of Washington Medical Scientist Training Program. J.M.A. is supported by a Pilot and Feasibility Award from the Clinical Nutrition Research Unit at the University of Washington.

Author information

Authors and Affiliations

University of Washington, Seattle, Washington, USA
James Ronald & Joshua M. Akey

Corresponding author

Correspondence to Joshua M. Akey.

About this article

Cite this article

Ronald, J., Akey, J.M. Genome-wide scans for loci under selection in humans. Hum Genomics 2, 113 (2005). https://doi.org/10.1186/1479-7364-2-2-113

Received: 31 January 2005
Accepted: 31 January 2005
Published: 01 June 2005
DOI: https://doi.org/10.1186/1479-7364-2-2-113
