Open Access Primary research

Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations

Guojie Zhang1*, Zhang Pei1, Edward V Ball2, Matthew Mort2, Hildegard Kehrer-Sawatzki3 and David N Cooper2

Author Affiliations

1 Bioinformatics Department, Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China

2 Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK

3 Institute of Human Genetics, University of Ulm, Albert-Einstein-Allee 11, 89081 Ulm, Germany

Human Genomics 2011, 5:453-484  doi:10.1186/1479-7364-5-5-453

The electronic version of this article is the complete one and can be found online at: http://www.humgenomics.com/content/5/5/453


Received: 22 March 2011
Accepted: 22 March 2011
Published: 1 July 2011

© 2011 Henry Stewart Publications

Abstract

The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.

Keywords:
Human; chimpanzee; Neanderthal; Denisovan hominin; genome sequence; potentially compensated mutations; disease

Introduction

The recent publication of the draft sequence of the Neanderthal genome [1] ushered in a new age in molecular archaeology [2,3]. This achievement was followed closely by the publication of the draft genome sequence (1.9-fold coverage) of a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia [4]. This hominin (a 'Denisovan') is thought to have been a member of a sister group of hominins to the Neanderthals, with whom they lived sympatrically during the Upper Pleistocene [4-7]. Denisovans appear to be more closely related to Neanderthals than to modern humans, having diverged from Neanderthals about 640,000 years ago and from extant Africans about 804,000 years ago [4].

Access to DNA sequence data from ancient hominins not only promises to revolutionise our knowledge of hominin relationships, but is also potentially informative in the context of exploring the molecular basis of human genetic disease [8,9]. We have previously cross-compared the human, chimpanzee and Neanderthal genome sequences with a set of disease-causing/disease-associated missense and regulatory mutations in order to identify genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species ('potentially compensated mutations' [PCMs]) [10]. PCMs correspond to variants that may have been deleterious for a certain period of evolutionary time but which persisted long enough in a given population or species to have become positively selected upon the introduction of a 'compensatory' nucleotide change [8,11-14]. Such compensatory changes are thought to be localised in the same gene as the PCM [15]. Not only do PCMs represent excellent candidates for recent population-specific selection (with different alleles having exhibited differential functional importance in different environments), but they may also furnish us with new insights into the genetic basis of susceptibility to common diseases [8,14]. Here, in an attempt to identify further PCMs of interest, we have compared a dataset of human mutations of putative pathological significance with their corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes.

Methods

Human Gene Mutation Database (HGMD) dataset

A total of 46,060 disease-causing mutations (DMs) or disease-associated mutations were obtained from the HGMD [16] (http://www.hgmd.org) as of 13th May 2010. These data comprised 44,348 missense mutations from within the coding regions of 2,628 genes, and 1,712 single base-pair substitutions from within the regulatory regions (5' and 3' untranslated/flanking regions) of 807 genes. Some 42,595 of the mutations were disease-causing (41,960 missense and 635 regulatory), whereas 3,465 represented disease-associated or functional polymorphisms (2,388 missense and 1,077 regulatory) (Table 1). The latter were further ascribed to three distinct subcategories: (1) disease-associated polymorphisms (DPs), comprising variants reported to be in statistically significant (p < 0.05) association with a particular human disease state but lacking experimental evidence of functionality (for example, from expression studies); (2) disease-associated polymorphisms with experimental evidence of functionality (DFPs) such as altered in vitro gene expression or protein function; (3) functional polymorphisms (FPs) that have been shown in vitro or in vivo to affect the structure, function or expression of the gene or gene product but for which no statistically significant disease association has yet been reported (see http://www.hgmd.cf.ac.uk/docs/poly.html for further information).

Table 1. Missense and regulatory mutations from the HGMD used in this study, categorised by mutation type and putative role in disease aetiology

Identification of PCMs

A total of 8,280,851 nucleotide positions at which the Denisovan genome differs from either the human (NCBI36/hg18) or chimpanzee genome were downloaded from the website of the Max Planck Institute for Evolutionary Anthropology (http://bioinf.eva.mpg.de/download/DenisovaGenome/Denisova_Neandertal_catalog.tgz) [1,4]. The human and the Denisovan hominin were found to exhibit the same nucleotide at 7,283,268 positions (87.95 per cent), indicating that the human-chimpanzee mismatches at these positions must have arisen before the divergence of modern humans and Denisovans (termed a 'derived' or 'D' state in the Denisovan). A total of 941,947 positions (11.38 per cent) displayed the same nucleotide in both Denisovan and chimpanzee, suggesting that the respective substitutions were human specific ('ancestral' or 'A' state in the Denisovan). The remaining 55,636 positions, which display different nucleotides in modern humans, Denisovans and chimpanzees, were termed 'undefined' ('N' state). Of the 8,280,851 Denisovan nucleotide positions investigated here, there were 5,205,736 positions at which the Neanderthal was found to differ from at least one of modern human, chimpanzee and Denisovan. From these 5,205,736 sites, we identified 197 sites for which the apparent wild-type nucleotide in Denisovan, Neanderthal or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 2). From the remaining 3,075,115 sites, we identified 117 sites for which the apparent wild-type nucleotide in the Denisovan or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 3).
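The three-way state assignment described above can be illustrated with a minimal Python sketch. The function name `classify_site` and the example sites are hypothetical and are not taken from the study; the sketch assumes one base per genome per site and simply applies the D/A/N rules as worded in the text. The reported counts are then used to recompute the quoted percentages.

```python
from collections import Counter


def classify_site(human, denisovan, chimp):
    """Assign the Denisovan state at one catalogue position.

    'D' (derived):   Denisovan matches human but not chimpanzee.
    'A' (ancestral): Denisovan matches chimpanzee but not human.
    'N' (undefined): any other pattern of differences.
    """
    h, d, c = (base.upper() for base in (human, denisovan, chimp))
    if h == d == c:
        # Assumption: such sites are not part of the catalogue of differences.
        raise ValueError("all three genomes agree at this site")
    if d == h:
        return "D"
    if d == c:
        return "A"
    return "N"


# Hypothetical example sites: (human, Denisovan, chimpanzee).
example_sites = [("A", "A", "G"), ("A", "G", "G"), ("A", "C", "G"), ("T", "T", "C")]
state_counts = Counter(classify_site(*site) for site in example_sites)

# Counts reported in the text; their shares reproduce the quoted percentages.
reported = {"D": 7_283_268, "A": 941_947, "N": 55_636}
total_positions = sum(reported.values())
shares = {state: round(100 * n / total_positions, 2) for state, n in reported.items()}
print(state_counts, total_positions, shares)
```

The three reported counts sum to the 8,280,851 catalogue positions, which is a useful internal check on the figures.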

Table 2. HGMD-derived mutations identified as PCMs in the Denisovan, Neanderthal and/or chimpanzee genomes

Table 3. HGMD-derived mutations identified as PCMs in the Denisovan genome and/or chimpanzee genome

Gene ontology (GO) enrichment analysis

A GO enrichment analysis of PCM-containing genes against a background of 2,688 human disease-causing genes was performed using the DAVID bioinformatics tool [17]. The statistical significance of each GO term was calculated using Fisher's exact test, and the resulting p-values were then adjusted for multiple testing by means of the Benjamini-Hochberg correction [18].
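As an illustration of the two statistical steps named above, the following Python sketch computes a one-sided Fisher's exact test for enrichment (written as a hypergeometric upper tail) and the Benjamini-Hochberg adjustment. It is not the DAVID implementation, whose scoring may differ in detail; the function names and the gene counts in the usage lines are hypothetical, and only the background size of 2,688 genes comes from the text.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: PCM-containing genes annotated with the GO term
    n: all PCM-containing genes
    K: background genes annotated with the GO term
    N: all background genes
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three GO terms: (k, n, K); N is the stated background.
N_BACKGROUND = 2688
hypothetical_terms = [(12, 27, 500), (3, 27, 300), (1, 27, 40)]
raw_p = [fisher_enrichment_p(k, n, K, N_BACKGROUND) for k, n, K in hypothetical_terms]
print(benjamini_hochberg(raw_p))
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based formulas can show for gene lists of this size.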

Calculation of Wright's fixation index (FST) values

The FST measures the proportion of genetic diversity in a subdivided population that is attributable to allele frequency differences between subpopulations. Pairwise FST values have also been used as a measure of genetic distance between populations. In this context, the allele frequencies of polymorphic ancestral PCMs in selected populations were obtained from HapMap (http://hapmap.ncbi.nlm.nih.gov/) and pairwise FST values were estimated for each polymorphism using the small sample estimate proposed by Weir and Hill [19]. The significance of individual FST values was then assessed by reference to the empirical distribution of FST among all single nucleotide polymorphisms (SNPs) in HapMap.
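To make the quantity concrete, the sketch below computes a pairwise FST from two allele frequencies using the basic heterozygosity-based definition, FST = (HT - HS) / HT, with equal population weights. This is deliberately a simpler stand-in and not the Weir and Hill small-sample estimator used in the study, which additionally corrects for sample size. The allele frequencies shown are hypothetical; the population labels are standard HapMap panel codes.

```python
from itertools import combinations


def pairwise_fst(p1, p2):
    """Basic pairwise FST for one biallelic site, equal population weights.

    p1, p2: frequency of the same allele in the two populations.
    H_T is the expected heterozygosity of the pooled population and
    H_S the mean expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # Site is monomorphic in both populations: no differentiation to measure.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies for one polymorphic PCM.
hypothetical_freqs = {"YRI": 0.95, "CEU": 0.05, "CHB": 0.90}
pairwise = {
    (a, b): pairwise_fst(hypothetical_freqs[a], hypothetical_freqs[b])
    for a, b in combinations(hypothetical_freqs, 2)
}
print(pairwise)
```

With real data, the sample-size correction matters most for small panels and rare alleles, which is presumably why a small-sample estimator was preferred in the analysis.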

Results and discussion

Identification of PCMs in the Denisovan, Neanderthal and/or chimpanzee genomes

A total of 44,348 missense mutations from 2,628 genes and 1,712 putative regulatory mutations from 807 genes, which have been recorded in the HGMD as being either causative of (or associated with) a human inherited disease state, were cross-compared with the corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes.

When the 197 PCMs covered by both the Denisovan and the Neanderthal sequences were considered, these included 129 of 143 PCMs identified in the Neanderthal genome (10/12 DMs, 65/73 DPs, 25/26 FPs, 29/32 DFPs), and 123 (62 per cent) PCMs for which the Denisovan, Neanderthal and chimpanzee wild-type nucleotides were identical to the human disease-causing/disease-associated mutant allele. Of the 117 PCMs covered only by the Denisovan sequence, there were 79 (67.5 per cent) for which both the Denisovan nucleotide and the chimpanzee nucleotide were identical to a human DM/disease-associated mutation. This may be indicative of either a bottleneck effect or selection during the evolution of the modern human lineage. Of the 197 PCMs, there was one mutation that was compensated only in the Neanderthal, one that was compensated only in the Denisovan, five that were compensated in both Neanderthal and Denisovan and 16 that were compensated only in the chimpanzee. There were also 18 mutations that differed between the Neanderthal and the Denisovan, which could imply that such mutations were identical-by-state (Tables 2 and 3).

Disease-causing PCMs

There were 15 human DMs that were found to be potentially compensated in the chimpanzee, Denisovan or Neanderthal (covered by both the Neanderthal and the Denisovan sequence) and 12 human DMs potentially compensated in the chimpanzee or Denisovan (covered only by the Denisovan sequence) (Table 4).

Table 4. Human DMs identified as PCMs

Of the human DMs that were potentially compensated in the chimpanzee, Denisovan or Neanderthal, only the putatively pathological F5 variant was specific to the Denisovan. In humans, this missense mutation, Val1736Met, is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage [20,21]. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this archaic hominin.

Even though Denisovans appear to be more closely related to Neanderthals than to modern humans, the Neanderthal and Denisovan were discrepant with respect to certain PCMs (eg the SLC5A1 H615Q variant associated with glucose-galactose malabsorption). In this case, the Denisovan (and the chimpanzee) possessed the allele that was mutant in humans (G), whereas the Neanderthal possessed the allele (C) which was wild-type in humans. In this context, it may be pertinent to mention that SLC5A1 is located on chromosome 22q12.3 within a region of putative gene flow from Neanderthals to Eurasians [1].

Some of the PCMs listed in Table 4 may well have been misclassified by the original authors as disease-causing in humans (especially those variants which have been allocated a '?' by the HGMD; see Table 4) when they were actually neutral polymorphisms; however, this is much less likely in the case of the 15 human disease-causing mutations that are covered by both the Neanderthal and Denisovan sequences. These mutant alleles would have had to have been maintained in both Neanderthal and Denisovan populations for ~640,000 years, since these two hominins last shared a common ancestor, and this would have been unlikely if such variants had been neutral polymorphisms.

Statistically enriched GO terms were identified for genes containing human DMs identified as PCMs (Table 4) against a background of known disease-causing genes (from the HGMD) and are shown in Table S1 (Table 6). Five significantly enriched GO terms were found; all relate to the plasma membrane.

Table S1. Significantly enriched GO terms (Benjamini-corrected p-value <0.05) for human genes containing DMs identified as PCMs (listed in Table 4) against a background of known disease-causing genes. No significantly enriched GO terms were found to relate to biological processes or molecular function

With respect to the DPs/FPs, 100 DPs, 39 FPs and 43 DFPs were covered by both the Neanderthal and Denisovan sequences (Table S2 (Table 7)), while 52 DPs, 26 FPs and 27 DFPs were covered by the Denisovan but not the Neanderthal sequence (Table S3 (Table 8)); these DPs/FPs may be relevant to human genetic disease.

Table S2. PCMs covered by both the Denisovan sequence and the Neanderthal sequence

Table S3. PCMs covered by the Denisovan sequence but not the Neanderthal sequence

Human variants with significantly different population frequencies at sites of PCMs

The FST was used to quantify the allele frequency differences for the different polymorphic PCMs between extant African, Asian and European populations. Alleles that have been the target of localised positive selection tend to exhibit unusually high FST values [22,23]. We therefore compared the FST values of the ancestral polymorphic PCMs with the empirical FST distribution derived from all HapMap SNPs [24] to assess the significance of individual FST values. We identified six PCMs with significantly elevated FST values (Table 5).
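The comparison against an empirical distribution can be sketched as follows in Python: the empirical p-value of an observed FST is taken as the fraction of background SNPs whose FST is at least as large. The function name, the background values and the observed value are all hypothetical; in the analysis described here the background would be the FST values of all HapMap SNPs for the same population pair.

```python
from bisect import bisect_left


def empirical_p_value(observed, sorted_background):
    """Fraction of background FST values that are >= the observed value.

    sorted_background must be sorted in ascending order.
    """
    n = len(sorted_background)
    if n == 0:
        raise ValueError("background distribution is empty")
    first_at_or_above = bisect_left(sorted_background, observed)
    return (n - first_at_or_above) / n


# Hypothetical background of genome-wide FST values for one population pair.
background_fst = sorted([0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90])
observed_fst = 0.80  # hypothetical FST of one polymorphic PCM
print(empirical_p_value(observed_fst, background_fst))
```

Sorting the background once and using binary search keeps each lookup cheap when many PCMs are tested against several million background SNPs.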

Table 5. PCMs (disease-causing and disease-related) with significantly different genotype frequencies in different HapMap populations

Although four of these PCMs had already been identified in our previous comparative analysis of the human, chimpanzee and Neanderthal genomes [10], two novel PCMs were identified in the putative cation exchanger SLC24A5 (DP) gene and in the alcohol dehydrogenase ADH1B (FP) gene. These genes have in common the GO terms GO:0046872, GO:0043169 and GO:0043167, terms which relate to metal ion binding, cation binding and ion binding, respectively. The SLC24A5 variant appears to be associated with increased skin pigmentation and predominates in African/East Asian populations [25,26].

In conclusion, using the newly reported genome sequence from a Denisovan hominin, we have identified a number of PCMs in the chimpanzee, Neanderthal and Denisovan. Those human PCMs that were ancestral (ie both the Denisovan nucleotide and the chimpanzee nucleotide were identical to the human DM/disease-associated mutation) could potentially be indicative of either the human lineage-specific loss of compensatory nucleotide changes within the respective genes carrying the PCM, or adaptive differences between modern humans and Denisovans.

References

  1. Green RE, Krause J, Briggs AW, Maricic T, et al.: A draft sequence of the Neanderthal genome.

    Science 2010, 328:710-722.

  2. Noonan JP: Neanderthal genomics and the evolution of modern humans.

    Genome Res 2010, 20:547-553.

  3. Gibbons A: Paleogenetics. Close encounters of the prehistoric kind.

    Science 2010, 328:680-684.

  4. Reich D, Green RE, Kircher M, Krause J, et al.: Genetic history of an archaic hominin group from Denisova Cave in Siberia.

    Nature 2010, 468:1053-1060.

  5. Reich D, Green RE, Kircher M, Krause J, et al.: The complete mitochondrial DNA genome of an unknown hominin from southern Siberia.

    Nature 2010, 468:1053-1060.

  6. Krause J, Fu Q, Good JM, Viola B, et al.: The complete mitochondrial DNA genome of an unknown hominin from southern Siberia.

    Nature 2010, 464:894-897.

  7. Martinón-Torres M, Dennell R, Bermúdez de Castro JM: The Denisova hominin need not be an out of Africa story.

    J Hum Evol 2011, 60:251-255.

  8. Di Rienzo A, Hudson RR: An evolutionary framework for common diseases: The ancestral-susceptibility model.

    Trends Genet 2005, 21:596-601.

  9. Crespi BJ: The origins and evolution of genetic disease risk in modern humans.

    Ann N Y Acad Sci 2010, 1206:80-109.

  10. Zhang G, Pei Z, Krawczak M, Ball EV, et al.: Triangulation of the human, chimpanzee, and Neanderthal genome sequences identifies potentially compensated mutations.

    Hum Mutat 2010, 31:1286-1293.

  11. Gao L, Zhang J: Why are some human disease-associated mutations fixed in mice?

    Trends Genet 2003, 19:678-681.

  12. Azevedo L, Suriano G, van Asch B, Harding RM, Amorim A: Epistatic interactions: How strong in disease and evolution?

    Trends Genet 2006, 22:581-585.

  13. Ferrer-Costa C, Orozco M, de la Cruz X: Characterization of compensated mutations in terms of structural and physico-chemical properties.

    J Mol Biol 2007, 365:249-256.

  14. Corona E, Dudley JT, Butte AJ: Extreme evolutionary disparities seen in positive selection across seven complex diseases.

    PLoS One 2010, 5:e12236.

  15. Baresić A, Hopcroft LE, Rogers HH, Hurst JM, et al.: Compensated pathogenic deviations: Analysis of structural effects.

    J Mol Biol 2010, 396:19-30.

  16. Stenson PD, Mort M, Ball EV, Howells K, et al.: The Human Gene Mutation Database: 2008 update.

    Genome Med 2009, 1:13.

  17. Huang da W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.

    Nat Protoc 2009, 4:44-57.

  18. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing.

    J R Stat Soc Series B 1995, 57:289-300.

  19. Weir BS, Hill WG: Estimating F-statistics.

    Annu Rev Genet 2002, 36:721-750.

  20. Dawood F, Mountford R, Farquharson R, Quenby S: Genetic polymorphisms on the factor V gene in women with recurrent miscarriage and acquired APCR.

    Hum Reprod 2007, 22:2546-2553.

  21. Chegeni R, Kazemi B, Hajifathali A, Pourfathollah A, et al.: Factor V mutations in Iranian patients with activated protein C resistance and venous thrombosis.

    Thromb Res 2007, 119:189-193.

  22. Holsinger KE, Weir BS: Genetics in geographically structured populations: Defining, estimating and interpreting FST.

    Nat Rev Genet 2009, 10:639-650.

  23. Thornton KR, Jensen JD: Controlling the false-positive rate in multilocus genome scans for selection.

    Genetics 2007, 175:737-750.

  24. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, et al.: A second generation human haplotype map of over 3.1 million SNPs.

    Nature 2007, 449:851-861.

  25. Lamason RL, Mohideen MA, Mest JR, Wong AC, et al.: SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans.

    Science 2005, 310:1782-1786.

  26. Stokowski RP, Pant PV, Dadd T, Fereday A, et al.: A genomewide association study of skin pigmentation in a South Asian population.

    Am J Hum Genet 2007, 81:1119-1132.