The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.
Keywords: Human; chimpanzee; Neanderthal; Denisovan hominin; genome sequence; potentially compensated mutations; disease
The recent publication of the draft sequence of the Neanderthal genome ushered in a new age in molecular archaeology [2,3]. This achievement was followed closely by the publication of the draft genome sequence (1.9-fold coverage) of a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia. This hominin (a 'Denisovan') is thought to have been a member of a sister group of hominins to the Neanderthals, with whom they lived sympatrically during the Upper Pleistocene [4-7]. Denisovans appear to be more closely related to Neanderthals than to humans, having diverged from Neanderthals about 640,000 years ago and from extant Africans about 804,000 years ago.
Access to DNA sequence data from ancient hominins not only promises to revolutionise our knowledge of hominin relationships, but is also potentially informative in the context of exploring the molecular basis of human genetic disease [8,9]. We have previously cross-compared the human, chimpanzee and Neanderthal genome sequences with a set of disease-causing/disease-associated missense and regulatory mutations in order to identify genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species ('potentially compensated mutations' [PCMs]). PCMs correspond to variants that may have been deleterious for a certain period of evolutionary time but which persisted long enough in a given population or species to have become positively selected upon the introduction of a 'compensatory' nucleotide change [8,11-14]. Such compensatory changes are thought to be localised in the same gene as the PCM. Not only do PCMs represent excellent candidates for recent population-specific selection (with different alleles having exhibited differential functional importance in different environments), but they may also furnish us with new insights into the genetic basis of susceptibility to common diseases [8,14]. Here, in an attempt to identify further PCMs of interest, we have compared a dataset of human mutations of putative pathological significance with their corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes.
Human Gene Mutation Database (HGMD) dataset
A total of 46,060 disease-causing mutations (DMs) or disease-associated mutations had been obtained from the HGMD (http://www.hgmd.org) as of 13th May 2010. These data comprised 44,348 missense mutations from within the coding regions of 2,628 genes, and 1,712 single base-pair substitutions from within the regulatory regions (5' and 3' untranslated/flanking regions) of 807 genes. Some 42,595 of the mutations were disease-causing (41,960 missense and 635 regulatory), whereas 3,465 represented disease-associated or functional polymorphisms (2,388 missense and 1,077 regulatory) (Table 1). The latter were further ascribed to three distinct subcategories: (1) disease-associated polymorphisms (DPs), comprising variants reported to be in statistically significant (p < 0.05) association with a particular human disease state but lacking experimental evidence of functionality (for example, from expression studies); (2) disease-associated polymorphisms with experimental evidence of functionality (DFPs), such as altered in vitro gene expression or protein function; (3) functional polymorphisms (FPs) that have been shown in vitro or in vivo to affect the structure, function or expression of the gene or gene product but for which no statistically significant disease association has yet been reported (see http://www.hgmd.cf.ac.uk/docs/poly.html for further information).
Table 1. Missense and regulatory mutations from the HGMD used in this study, categorised by mutation type and putative role in disease aetiology
Identification of PCMs
A total of 8,280,851 nucleotide positions at which the Denisovan genome differs from either the human (NCBI36/hg18) or chimpanzee genome were downloaded from the website of the Max Planck Institute for Evolutionary Anthropology (http://bioinf.eva.mpg.de/download/DenisovaGenome/Denisova_Neandertal_catalog.tgz) [1,4]. The human and the Denisovan hominin were found to exhibit the same nucleotide at 7,283,268 positions (87.95 per cent), so that the human-chimpanzee mismatches must have arisen before the divergence of modern humans and Denisovans (termed a 'derived' or 'D' state in the Denisovan). A total of 941,947 positions (11.38 per cent) displayed the same nucleotide in both Denisovan and chimpanzee, suggesting that the respective substitutions were human specific ('ancestral' or 'A' state in the Denisovan). The remaining 55,636 positions, which display different nucleotides in modern humans, Denisovans and chimpanzees, were termed 'undefined' ('N' state). Of the 8,280,851 Denisovan nucleotide positions investigated here, there were 5,205,736 positions at which the Neanderthal was found to differ from at least one of modern human, chimpanzee and Denisovan. From these 5,205,736 sites, we identified 197 sites for which the apparent wild-type nucleotide in Denisovan, Neanderthal or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 2). From the remaining 3,075,115 sites, we identified 117 sites for which the apparent wild-type nucleotide in the Denisovan or chimpanzee was logged in the HGMD as disease causing or disease associated in modern humans (Table 3).
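The D/A/N classification described above can be illustrated with a minimal sketch. The function name, the single-letter inputs and the toy site list are hypothetical and are not taken from the downloaded catalogue, whose file layout is not reproduced here.

```python
from collections import Counter


def classify_denisovan_state(human: str, chimp: str, denisovan: str) -> str:
    """Label a catalogue site as derived ('D'), ancestral ('A') or undefined ('N').

    'D': Denisovan matches human but not chimpanzee.
    'A': Denisovan matches chimpanzee but not human.
    'N': anything else, e.g. all three nucleotides differ.
    """
    human, chimp, denisovan = (base.upper() for base in (human, chimp, denisovan))
    if denisovan == human and denisovan != chimp:
        return "D"
    if denisovan == chimp and denisovan != human:
        return "A"
    return "N"


# Toy sites as (human, chimpanzee, Denisovan) triples; illustrative only.
toy_sites = [("A", "G", "A"), ("C", "T", "T"), ("G", "A", "C"), ("T", "C", "T")]
state_counts = Counter(classify_denisovan_state(*site) for site in toy_sites)
print(state_counts)
```

Tallying the labels over all catalogue sites in this way would yield the per-state counts reported above.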
Table 2. HGMD-derived mutations identified as PCMs in the Denisovan, Neanderthal and/or chimpanzee genomes
Table 3. HGMD-derived mutations identified as PCMs in the Denisovan genome and/or chimpanzee genome
Gene ontology (GO) enrichment analysis
A GO enrichment analysis of PCM-containing genes against a background of 2,688 human disease-causing genes was performed using the DAVID bioinformatics tool. The statistical significance of a particular GO term was calculated using Fisher's exact test, which was then adjusted to allow for multiple testing by means of the Benjamini-Hochberg correction.
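For readers unfamiliar with the two statistical steps, the following is a minimal sketch of a one-sided Fisher's exact (hypergeometric) enrichment test followed by the Benjamini-Hochberg adjustment. It is not the DAVID implementation, whose statistic may differ in detail; the function names and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: genes in the background; K: background genes annotated with the term;
    n: genes in the study list; k: study-list genes annotated with the term.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three GO terms: (k, n, K, N).
toy_terms = [(8, 27, 300, 2688), (3, 27, 250, 2688), (1, 27, 40, 2688)]
raw = [enrichment_p_value(*term) for term in toy_terms]
print(benjamini_hochberg(raw))
```

A term would then be reported as significantly enriched if its adjusted p-value falls below the chosen threshold (0.05 in this study).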
Calculation of Wright's fixation index (FST) values
The FST measures the proportion of genetic diversity in a subdivided population that is attributable to allele frequency differences between subpopulations. Pairwise FST values have also been used as a measure of genetic distance between populations. In this context, the allele frequencies of polymorphic ancestral PCMs in selected populations were obtained from HapMap (http://hapmap.ncbi.nlm.nih.gov/) and pairwise FST values were estimated for each polymorphism using the small sample estimate proposed by Weir and Hill. The significance of individual FST values was then assessed by reference to the empirical distribution of FST among all single nucleotide polymorphisms (SNPs) in HapMap.
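As an illustration of the quantity being estimated, the sketch below computes a simple heterozygosity-based pairwise FST, (HT - HS)/HT, for one biallelic site from two population allele frequencies. This is a simplification assuming equal population weights and is not the Weir and Hill small sample estimator used in this study; the function name and example frequencies are hypothetical.

```python
def pairwise_fst(p1: float, p2: float) -> float:
    """Heterozygosity-based FST for one biallelic site in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Populations are weighted equally; no small-sample correction is applied.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        # Site is monomorphic in both populations; FST is undefined,
        # reported as 0.0 here purely by convention.
        return 0.0
    return (h_total - h_within) / h_total


# Illustrative frequencies only.
print(pairwise_fst(0.95, 0.10))
print(pairwise_fst(0.30, 0.30))
```

Values close to 1 indicate near-complete allele frequency differentiation between the two populations, whereas values close to 0 indicate similar frequencies.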
Results and discussion
Identification of PCMs in the Denisovan, Neanderthal and/or chimpanzee genomes
A total of 44,348 missense mutations from 2,628 genes and 1,712 putative regulatory mutations from 807 genes, which have been recorded in the HGMD as being either causative of (or associated with) a human inherited disease state, were cross-compared with the corresponding nucleotide positions in the Neanderthal, Denisovan and chimpanzee genomes.
When the 197 PCMs covered by both the Denisovan and the Neanderthal sequences were considered, these included 129 of 143 PCMs identified in the Neanderthal genome (10/12 DMs, 65/73 DPs, 25/26 FPs, 29/32 DFPs), and 123 (62 per cent) PCMs for which the Denisovan, Neanderthal and chimpanzee wild-type nucleotides were identical to the human disease-causing/disease-associated mutant allele. Of the 117 PCMs covered only by the Denisovan sequence, there were 79 (67.5 per cent) for which both the Denisovan nucleotide and the chimpanzee nucleotide were identical to a human DM/disease-associated mutation. This may be indicative of either a bottleneck effect or selection during the evolution of the modern human lineage. Of the 197 PCMs, there was one mutation that was compensated only in the Neanderthal, one that was compensated only in the Denisovan, five that were compensated in both Neanderthal and Denisovan and 16 that were compensated only in the chimpanzee. There were also 18 mutations that differed between the Neanderthal and the Denisovan, which could imply that such mutations were identical-by-state (Tables 2 and 3).
There were 15 human DMs that were found to be potentially compensated in the chimpanzee, Denisovan or Neanderthal (covered by both the Neanderthal and the Denisovan sequence) and 12 human DMs potentially compensated in the chimpanzee or Denisovan (covered only by the Denisovan sequence) (Table 4).
Table 4. Human DMs identified as PCMs
Of the human DMs that were potentially compensated in the chimpanzee, Denisovan or Neanderthal, only the putatively pathological F5 variant was specific to the Denisovan. In humans, this missense mutation, Val1736Met, is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage [20,21]. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this archaic hominin.
Even though Denisovans appear to be more closely related to Neanderthals than to humans, the Neanderthal and Denisovan were discrepant with respect to certain PCMs (eg the SLC5A1 H615Q variant associated with glucose-galactose malabsorption). In this case, the Denisovan (and the chimpanzee) possessed the allele that was mutant in humans (G), whereas the Neanderthal possessed the allele (C) which was wild-type in humans. In this context, it may be pertinent to mention that SLC5A1 is located on chromosome 22q12.3 within a region of putative gene flow from Neanderthal to Eurasian.
Some of the PCMs listed in Table 4 may well have been misclassified by the original authors as disease-causing in humans (especially those variants which have been allocated a '?' by the HGMD; see Table 4) when they were actually neutral polymorphisms; however, this is much less likely in the case of the 15 human disease-causing mutations that are covered by both the Neanderthal and Denisovan sequences. These mutant alleles would have had to have been maintained in both Neanderthal and Denisovan populations since these two hominins last shared a common ancestor ~640,000 years ago, and this would have been unlikely if such variants had been neutral polymorphisms.
Statistically enriched GO terms were identified for genes containing human DMs identified as PCMs (Table 4) against a background of known disease-causing genes (from the HGMD) and are shown in Table S1 (Table 6). Five significantly enriched GO terms were found; all relate to the plasma membrane.
Table S1. Significantly enriched GO terms (Benjamini-corrected p-value <0.05) for human genes containing DMs identified as PCMs (listed in Table 4) against a background of known disease-causing genes. No significantly enriched GO terms were found to relate to biological processes or molecular function
With respect to the DPs/FPs, 100 DPs, 39 FPs and 43 DFPs were covered by both the Neanderthal and Denisovan sequences (Table S2 (Table 7)), while 52 DPs, 26 FPs and 27 DFPs were covered by the Denisovan but not the Neanderthal sequence (Table S3 (Table 8)); these DPs/FPs may be relevant to human genetic disease.
Table S2. PCMs covered by both the Denisovan sequence and the Neanderthal sequence
Table S3. PCMs covered by the Denisovan sequence but not the Neanderthal sequence
Human variants with significantly different population frequencies at sites of PCMs
The FST was used to quantify the allele frequency differences for the different polymorphic PCMs between extant African, Asian and European populations. Alleles that have been the target of localised positive selection tend to exhibit unusually high FST values [22,23]. We therefore compared the FST values of the ancestral polymorphic PCMs with the empirical FST distribution derived from all HapMap SNPs (International HapMap Consortium, 2007), to assess the significance of individual FST values. We identified six PCMs with significantly elevated FST values (Table 5).
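The comparison against an empirical distribution can be sketched as follows: the empirical p-value of an observed FST is taken as the fraction of background SNPs whose FST is at least as large. The function name and the toy background values are hypothetical; the actual HapMap-derived distribution is not reproduced here.

```python
from bisect import bisect_left


def empirical_upper_tail(observed: float, background_sorted: list[float]) -> float:
    """Fraction of background values greater than or equal to `observed`.

    `background_sorted` must be sorted in ascending order.
    """
    n = len(background_sorted)
    n_at_least = n - bisect_left(background_sorted, observed)
    return n_at_least / n


# Toy background of genome-wide FST values (illustrative only).
background = sorted([0.01, 0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.15, 0.30, 0.60])
print(empirical_upper_tail(0.30, background))
```

An FST value would be regarded as significantly elevated when this upper-tail fraction falls below the chosen significance threshold.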
Table 5. PCMs (disease-causing and disease-related) with significantly different genotype frequencies in different HapMap populations
Although four of these PCMs had already been identified in our previous comparative analysis of the human, chimpanzee and Neanderthal genomes, two novel PCMs were identified in the putative cation exchanger SLC24A5 (DP) gene and in the alcohol dehydrogenase ADH1B (FP) gene. These genes have in common the GO terms GO:0046872, GO:0043169 and GO:0043167, terms which relate to metal ion binding, cation binding and ion binding, respectively. The SLC24A5 variant appears to be associated with increased skin pigmentation and predominates in African/East Asian populations [25,26].
In conclusion, using the newly reported genome sequence from a Denisovan hominin, we have identified a number of PCMs in the chimpanzee, Neanderthal and Denisovan. Those human PCMs that were ancestral (ie both the Denisovan nucleotide and the chimpanzee nucleotide were identical to the human DM/disease-associated mutation) could potentially be indicative of either the human lineage-specific loss of compensatory nucleotide changes within the respective genes carrying the PCM, or adaptive differences between modern humans and Denisovans.