Edited by: Thomas Flatt, Vetmeduni Vienna, Austria
Reviewed by: Josephine Hoh, Yale University, USA; Stuart Kim, Stanford University, USA
*Correspondence: Paola Sebastiani, Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Room 317, Boston, MA 02118, USA. e-mail:
This article was submitted to Frontiers in Genetics of Aging, a specialty of Frontiers in Genetics.
This is an open-access article distributed under the terms of the
Supercentenarians (aged 110 years or older) generally delay or escape age-related diseases and disability well beyond the age of 100, and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of a male and a female supercentenarian, both aged >114 years. We show that: (1) the sequence variant spectrum of these two individuals’ DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity-enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging.
Human aging is affected by genes, lifestyle, and environmental factors. The genetic contribution to average human aging can be modest, with genes explaining ∼20–25% of the variability of human survival to the mid-eighties (Herskind et al.,
The nature and contribution of genetic variation to exceptional longevity remains unclear, particularly the role for undiscovered rare genetic variants with large effects and/or the presence of many common genetic variants with small effects (Bloss et al.,
In this report, we describe the complete DNA sequences of two supercentenarians, a male and a female, both aged >114 years. Although these data cannot provide conclusive evidence about the genetic determination of human exceptional longevity, they are the first step toward the generation of a comprehensive reference panel of exceptionally long-lived individuals. The data also provide interesting insights into genetic backgrounds that are conducive to exceptional longevity and allow us to test different models of exceptional longevity.
DNA from both individuals was sequenced using the Illumina Genome Analyzer II by Illumina’s Clinical Laboratory Service, using paired-end reads of 100 bp producing 1,650,463,996 reads for the woman and 1,804,595,182 reads for the man. Reads were mapped to the genome reference NCBI36 and NCBI37 using the procedures described in Figure
Single nucleotide polymorphisms (SNPs) were called using the Illumina CASAVA (Bentley et al.,
The two subjects shared 1,997,897 SNPs, and 66% of these SNPs had the same genotypes (Figure
The number of called SNPs in the man and the fraction of novel SNPs in both subjects were consistent with the projections made by The 1000 Genomes Project Consortium (
For additional assessment of the quality of the data, we computed the concordance between genotype calls in the sequences of the two subjects and their SNP array data. For both subjects, the concordance between genotype calls of SNPs in the array and SNPs in the sequences was >99.7%. In addition, more than 98% of the SNPs in the array not included in the list of called SNPs from the sequencing were homozygous for the reference allele. Transition to transversion ratios (2.11 in the man and 2.07 in the woman) were consistent with the expected number in Caucasians (Ebersberger et al.,
We used both SAMTools and Dindel (Albers et al.,
We used a suite of bioinformatics tools assembled and built by researchers at The Scripps Research Institute (Torkamani et al.,
We used the whole genome sequences of these two subjects to test different hypotheses about the genetics of exceptional longevity. These non-exclusive hypotheses and the results of the analyses are described in the sections that follow.
Genes involved in the insulin pathway (Guarente and Kenyon,
A. Coding SNPs linked to exceptional longevity in published literature |
|||||||||
---|---|---|---|---|---|---|---|---|---|
SNP | HG18 | Alleles | Ref Allele | EL Allele | Pubmed | PG17 | PG26 | ||
|
rs12206094 | chr6:109012893 | C/T | C | T | 0.27 | CC | ||
rs2764264 | chr6:109041154 | C/T | C | C | 0.29 | TT | |||
rs7762395 | chr6:109051800 | A/G | G | A | 0.17 | 20849522 | GG | ||
rs9400239 | chr6:109084356 | C/T | T | T | 0.24 | CC | |||
rs479744 | chr6:109126725 | G/T | G | T | 0.20 | GG | |||
rs2229765 | chr15:97295748 | G/A | G | A | 0.42 | 12843179 | |||
rs34516635 | chr15:97269499 | G/A | G | A | 0.02 | GG | GG | ||
chr15:97068418 | G/A | G | A | 18316725 | GG | GG | |||
chr15:97272104 | G/A | G | A | GG | GG | ||||
rs2227956 | chr6:31886251 | G/A | G | A | 0.76 | PMC1576475 | |||
rs5882 | chr16:55573593 | A/G | G | GG | 0.36 | 20068209 | AA | AA | |
rs662 | chr7:94775382 | T/C | T | C | 0.33 | 15050299 | TT | ||
rs9664222 | chr10:89338633 | A/C | A | C | 0.76 | 20304771 | CC | ||
rs3758391 | chr10:69313348 | C/T | T | T | 0.27 | 17895433 | CC | ||
rs9536314 | chr13:32526138 | T/G | T | G | 0.14 | 11792841 | |||
rs9527025 | chr13:32526193 | G/C | G | C | 0.19 | ||||
|
rs2273773 | chr10:69336604 | C/T | T | T | 0.96 | Syn | TT | CT |
rs28365927 | chr11:226091 | A/G | G | G | 0.85 | Non-syn | AG | GG | |
rs2772364 | chr13:32488851 | C/T | T | C | 1.00 | Syn | CC | TT | |
rs9527026 | chr13:32526239 | A/G | G | G | 0.83 | Syn | AG | AG | |
rs564481 | chr13:32532983 | C/T | C | C | 0.63 | Syn | CT | CT | |
rs648202 | chr13:32533463 | C/T | T | C | 0.85 | Syn | CC | CC | |
rs649964 | chr13:32533835 | C/T | T | C | 1.00 | Syn | CC | CC | |
rs35812156 | chr15:97252339 | A/C | C | C | 0.98 | Syn | AC | CC | |
rs352493 | chr19:4131836 | C/T | C | T | 0.96 | Non-syn | CT | TT | |
rs3757261 | chr6:13707282 | C/T | C | C | 0.73 | Syn | CT | CT | |
rs854560 | chr7:94784020 | A/T | A | A | 0.59 | Non-syn | AA | TT |
The woman was homozygous for only one of the 16 longevity variants (allele A of rs2227956 in
As noted earlier, both subjects markedly delayed both disability and age-related diseases until very late in their lives. We tested the hypothesis that these two whole genome sequences did not include disease-predisposing variants or, if they did, that the number was significantly lower than in currently available genomes. We compiled a list of 62,339 disease-annotated variants from the Human Gene Mutation Database (HGMD®; Stenson et al.,
The bar plot in Figure
Overall, the two subjects carried 403 disease-associated variants with the same genotype (Table S1 in Supplementary Material), but only 209 of these were coding variants, and in only 76 of these positions the two sequences were homozygous for the risk allele. For example, both subjects were CC homozygotes for rs222859 (
This analysis shows that the two supercentenarians did not carry a lower rate of disease-associated variants than other genomes.
One of the explanations for the small number of genetic variants irrefutably linked to exceptional longevity (Schachter et al.,
Eleven of the genes with novel mutations in the woman were linked to the class of diseases “infection” in the genetic association database (
In a genome-wide association study of exceptional longevity with 801 centenarians (median age at death 104 years) and 914 genetically matched controls, we identified 281 SNPs that were significantly associated with exceptional longevity and could be used to predict the phenotype with 78% sensitivity for a replication set with a mean age of 108 years (Sebastiani et al.,
We generated the whole genome sequences of a male and a female supercentenarian who were selected for this study because of both their exceptional lifespan and healthspan. Although two sequences do not provide sufficient data for general inference on the genetics of exceptional longevity, they are a first step toward the generation of a reference panel of exceptionally long-lived individuals and provide some interesting insights about genetic backgrounds that might be conducive to exceptional longevity. The $10 million Archon Genomics X Prize
The analysis of next generation sequence data is still very challenging, and no single mapping and variant calling algorithm has emerged as the standard tool to use. Therefore, we used two algorithms to select a set of robust variants to examine further. The specifically selected data show that the genetic architectures of the two subjects are comparable to published sequenced genomes, particularly in terms of rates of coding variants and predicted damaging variants, and it is likely that overall differences seen between genomes are due to different platforms, algorithms, and annotation methods rather than major structural changes that can be linked to exceptional longevity, at least in the case of germ line mutations.
We also used the genome sequences of these two subjects to test different genetic models of exceptional longevity. The insulin pathway, caloric restriction, and lipid metabolism significantly influence lifespan in other organisms including the mouse, fly, and worm (Christensen et al.,
One of the hypothetical genetic models of exceptional longevity is that, in order for centenarians to achieve their exceptional survival, they must lack disease-predisposing variants. Phenotypically, there is evidence that this is not necessarily the case, since approximately 40% of centenarians have had onset of age-related diseases before the age of 80 (Evert et al.,
An alternative, complementary hypothesis is that variants, possibly rare and not hitherto associated with health maintenance, compensate for disease-causing variants among the healthy elderly. In this light, the genomes of these two supercentenarians were not particularly enriched for novel variants, although the stringency of our approach to variant calling may have increased the false-negative rate in order to reduce the false-positive rate. Nevertheless, the 1% novel variants discovered through this analysis may lead to the discovery of novel genes involved in exceptional longevity. The observation that the novel variants were in genes implicated in “alternative splicing” highlights the importance of pursuing follow-up studies involving transcriptome sequencing from different cells and tissues to identify RNA isoforms or expression profiles that may be important for exceptional longevity.
In summary, the two supercentenarian genomes we studied have different features including, for example, a number of private mutations in specific categories of genes in the case of the woman but not the man. Other ongoing studies of centenarians are showing that different centenarian genomes have different characteristics (Cirulli et al.,
It is also likely that environmental factors, and possibly genetic ancestry, influence an individual's likelihood of living to an advanced age, either directly or by interacting with the genetic background. The NECS has shown that the chance of male and female siblings of centenarians living past 100 can be 8 and 17 times higher than in the general population (Perls et al.,
Subjects provided informed consent and this project was approved by the Boston University Medical Campus Institutional Review Board.
The man and woman were both age 114+ years at the time of blood collection for DNA extraction. Their age was confirmed by birth/baptism certificates and early U.S. Census entries. For both subjects we examined the European ancestry by principal component analysis of genome-wide genotype data, as described in Sebastiani et al. (
Peripheral blood was obtained from each subject and used for DNA extraction. DNA samples were sequenced using a GAII sequencer at the Illumina Clinical Service Laboratory (San Diego, CA) using 100 bp paired-end reads. Fastq files were generated from the image files using the Illumina pipeline software (Firecrest for image analysis and Bustard for base calling).
Reads in FASTQ files were mapped to the NCBI36 reference genome, including all chromosomes and mitochondrial DNA, using the Eland aligner (Bentley et al.,
aln -n 4 -o 1 -e 2 -k 2 -l 35 -R 5
to limit the maximum edit distance to 4, the maximum number of gap opens to 1, the maximum number of gap extensions to 2, the maximum edit distance in the seed to 2, and the seed to the first 35 bases, and to proceed with suboptimal alignments if there are no more than 5 best hits. These more relaxed thresholds produced a larger number of aligned reads [1,559,315,652 aligned reads for PG26 (86.41%) and 1,334,406,235 aligned reads for PG17 (80.85%)].
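As a minimal sketch, the parameter settings listed above can be assembled into a command line as follows. The flag values and their meanings are taken from the paragraph above; the helper `build_bwa_aln_command` and the file names `reference.fa` and `reads_1.fastq` are hypothetical and only illustrate the shape of the call. The command is built but not executed.

```python
import shlex

# Flag values quoted in the text; descriptions paraphrase the same paragraph.
BWA_ALN_OPTIONS = {
    "-n": (4, "maximum edit distance"),
    "-o": (1, "maximum number of gap opens"),
    "-e": (2, "maximum number of gap extensions"),
    "-k": (2, "maximum edit distance in the seed"),
    "-l": (35, "seed length (first 35 bases of the read)"),
    "-R": (5, "proceed with suboptimal alignments if no more than 5 best hits"),
}


def build_bwa_aln_command(reference, fastq, options=BWA_ALN_OPTIONS):
    """Return the argument list for one `bwa aln` call (nothing is run)."""
    args = ["bwa", "aln"]
    for flag, (value, _description) in options.items():
        args += [flag, str(value)]
    args += [reference, fastq]
    return args


# Hypothetical file names, for illustration only.
command = build_bwa_aln_command("reference.fa", "reads_1.fastq")
print(shlex.join(command))
```

Keeping each flag next to its description makes it easy to compare these relaxed thresholds with the defaults of whichever BWA version is in use.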
Single nucleotide polymorphisms were called using the CASAVA algorithm (see
We annotated the single polymorphic bases that differ from the reference genome (hg18) against dbSNP build 132 and the summary data distributed by the 1000 Genomes project
We identified 15 additional whole genome sequences for comparison. The samples and references are in Table
We assessed SNP call quality using several measures.
We used SNP array data genotyped with the Illumina array to describe the genome-wide structure of chromosomal alterations of PG17 and PG26. The two samples were genotyped with the Illumina 610 array (PG17) and 1M array (PG26) and data were processed as described in Sebastiani et al. (
We compared genotype calls detected from next generation sequence analysis against the genotype calls determined with SNP array analysis. Missing genotypes were ignored.
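The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: genotypes are assumed to be two-letter strings keyed by SNP identifier, allele order is assumed not to matter, and the missing-genotype codes, the SNP names, and the function `genotype_concordance` are hypothetical. SNPs absent from the sequence calls are simply skipped here.

```python
def normalize(genotype):
    """Treat a genotype as an unordered allele pair, so 'AG' equals 'GA'."""
    return "".join(sorted(genotype.upper()))


def genotype_concordance(sequence_calls, array_calls,
                         missing=("", "--", "NN", None)):
    """Return (fraction of matching genotypes, number of SNPs compared).

    SNPs with a missing genotype in either data set are ignored.
    """
    compared = 0
    matching = 0
    for snp_id, array_gt in array_calls.items():
        seq_gt = sequence_calls.get(snp_id)  # None if not called
        if seq_gt in missing or array_gt in missing:
            continue
        compared += 1
        if normalize(seq_gt) == normalize(array_gt):
            matching += 1
    if compared == 0:
        raise ValueError("no SNPs with genotypes in both data sets")
    return matching / compared, compared


# Toy data with made-up SNP names.
sequence = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "--"}
array = {"snp1": "GA", "snp2": "CC", "snp3": "CT", "snp4": "AA", "snp5": "GG"}
rate, n_compared = genotype_concordance(sequence, array)
print(f"{rate:.3f} concordance over {n_compared} SNPs")
```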
We calculated the number of transitions (A ↔ G or C ↔ T) and transversions (A ↔ C, A ↔ T, C ↔ G, and G ↔ T) and their ratio in all called SNPs, in SNPs reported only in the 1000 Genomes project, and in novel SNPs. The heterozygous-to-homozygous ratio was computed as the number of heterozygous calls divided by the number of homozygous calls, using only SNPs with alleles that differ from the reference genome.
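A minimal sketch of these two summary statistics is given below. The function names are hypothetical. The counts in the final lines are the transition, transversion, heterozygous, and homozygous SNP counts reported for PG17 and PG26 in the variant summary table of this article.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T; False for any purine<->pyrimidine change."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions


def het_hom_ratio(n_heterozygous, n_homozygous):
    """Ratio of heterozygous to homozygous non-reference SNP calls."""
    return n_heterozygous / n_homozygous


# Toy example: three transitions and one transversion.
toy_ratio = ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])

# Counts reported for the two genomes in this article.
pg17_ti_tv = 2_080_095 / 1_004_743
pg26_ti_tv = 2_208_983 / 1_044_913
pg17_het_hom = het_hom_ratio(1_706_103, 1_378_735)
pg26_het_hom = het_hom_ratio(1_901_250, 1_352_646)
print(round(pg17_ti_tv, 2), round(pg26_ti_tv, 2))
print(round(pg17_het_hom, 2), round(pg26_het_hom, 2))
```

The ratios computed from the reported counts agree with the Ti/Tv values (2.07 for the woman, 2.11 for the man) and the het/homo values (1.24 and 1.41) listed for SNPs in dbSNP.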
For short indel finding, the reads were mapped to the human reference genome (NCBI build 37) using BWA in paired-end mode with default settings except for the following differences in BWA
We found that using BWA with the last three parameters reduces the number of discordant pairs and of pairs with ends mapped to different chromosomes without sacrificing overall alignment accuracy. Since Dindel uses both ends of discordant pairs for realignment and for calculation of variant qualities, we wanted to keep the number of discordant pairs low. The BAM alignment files produced by BWA were merged and split by chromosome, library and read group information was added, and duplicates were removed on a per-library basis using Picard MarkDuplicates. Short indel calls were made using Dindel and SAMTools
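One simple way to combine the output of two indel callers is to keep only the calls reported by both. The sketch below illustrates that idea. The exact combination rule used for the Dindel and SAMTools calls is not given in this passage, so the strict intersection, the `Indel` record, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple, Set


class Indel(NamedTuple):
    """One short indel call, identified by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_calls(caller_a: Iterable[Indel],
                    caller_b: Iterable[Indel]) -> Set[Indel]:
    """Keep only indels reported identically by both callers (assumed rule)."""
    return set(caller_a) & set(caller_b)


# Toy calls with made-up coordinates.
calls_a = [Indel("chr1", 1000, "A", "AT"), Indel("chr2", 500, "GC", "G")]
calls_b = [Indel("chr1", 1000, "A", "AT"), Indel("chr3", 42, "T", "TA")]
robust = consensus_calls(calls_a, calls_b)
print(sorted(robust))
```

An exact-match intersection is conservative: two callers can report the same indel at slightly different positions in a repeat, so a real pipeline would usually left-align the calls before comparing them.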
For bioinformatics analysis, we used the complementary genome-wide variant annotation tools embedded in a suite of tools developed by researchers at The Scripps Research Institute (Torkamani et al.,
We compiled this list by merging disease-annotated variants from the HGMD
We used the DAVID functional annotation tool (Huang da et al.,
We detected the longevity-associated variants in PG17 and PG26 as those genotypes that increase the posterior probability of exceptional longevity in the two subjects in the list of 281 SNPs reported in Sebastiani et al. (
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at
Metric | Illumina report (hg18), Male >110 | Illumina report (hg18), Female >110 | BWA (hg18), Male >110 | BWA (hg18), Female >110 |
---|---|---|---|---|
No. of reads | 1,804,595,182 | 1,650,463,996 | 1,804,595,182 | 1,650,463,996 |
Read length | 100 | 100 | 100 | 100 |
Number of bases used to generate consensus | 114,850,993,633 | 98,120,678,830 | 156,017,244,672 | 133,347,354,219 |
Base pairs reported (consensus) | 2,736,567,883 | 2,699,489,763 | 2,855,361,624 | 2,828,277,701 |
Number of aligned reads | 1,430,677,828 (79.3%) | 1,246,883,285 (75.5%) | 1,559,315,652 (86.41%) | 1,334,406,235 (80.85%) |
Rate of genome covered | 0.957 | 0.952 | 0.999 | 0.998 |
Coverage depth | 43.14 | 36.40 | 54.60 | 47.04 |
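The alignment percentages in the table above are the number of aligned reads divided by the total number of reads. The short sketch below recomputes them from the tabulated counts; the dictionary layout and the function name `alignment_rate` are illustrative choices.

```python
# Read counts taken from the mapping summary table above.
TOTAL_READS = {"male": 1_804_595_182, "female": 1_650_463_996}
ALIGNED_READS = {
    "Illumina report": {"male": 1_430_677_828, "female": 1_246_883_285},
    "BWA": {"male": 1_559_315_652, "female": 1_334_406_235},
}


def alignment_rate(aligned, total):
    """Percentage of reads that were aligned to the reference."""
    return 100.0 * aligned / total


rates = {
    (pipeline, subject): alignment_rate(aligned, TOTAL_READS[subject])
    for pipeline, per_subject in ALIGNED_READS.items()
    for subject, aligned in per_subject.items()
}
for (pipeline, subject), rate in rates.items():
    print(f"{pipeline:16s} {subject:6s} {rate:.2f}%")
```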
Distribution of coverage and Phred scores of indels.
Genome | Ethnicity | Gender | Platform | Reference |
---|---|---|---|---|
NA12891 | CEU | M | Illumina | |
NA12892 | CEU | F | Illumina | |
NA12878 | CEU | F | ABI_SOLiD | |
NA12878 | CEU | F | Illumina | |
YH | Chinese | M | Illumina | |
AK1 | Korean | M | Illumina | |
NA07022 | CEU | M | Complete genomics | |
NA20431 | CEU | M | Complete genomics | |
NA19240 | AA | F | Complete genomics | |
NA19240 | AA | F | ABI_SOLiD | |
NA18507 | AA | M | Illumina | |
NA18507 | AA | M | ABI_SOLiD | |
Quake | CEU | M | Helicos | |
Venter | CEU | M | Sanger | |
Watson | CEU | M | Roche 454 | |
PG17 | CEU | F | Illumina | |
PG26 | CEU | M | Illumina |
NECS |
Watson | Venter | Complete genomics |
AK1 | YH | Illumina |
1000 Genomes |
|||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Male >111 | Female >113 | NA07022 (CEU/M) | NA20431 (PG/M) | NA19240 (YRI/F) | NA18507 (YRI/M) | CEU trio | YRI trio | |||||
Ti/Tv (in dbSNP) | 2.11 | 2.07 | 2.01 | 2.04 | 2.17 | 2.19 | 2.13 | 2.07 | 2.01 | 2.06 | 2.07 | 2.09 |
Ti/Tv (only 1000 genomes) | 2.33 | 2.27 | ||||||||||
Ti/Tv (Novel) | 2.07 | 2.13 | 1.94 | 2.02 | ||||||||
het/homo (in dbSNP) | 1.41 | 1.24 | 1.30 | 1.2 | 1.64 | 1.72 | 2.01 | 1.57 | 1.27 | 1.75 | ||
het/homo (only 1000 genomes) | 14.57 | 7.12 | ||||||||||
het/homo (Novel) | 6.28 | 9.16 | 12.3 | 27.48 | 13.73 | |||||||
Mean mapped depth | 41.64 | 35.05 | 7.40 | 7.5 | 87 | 45 | 63 | 29 | 36 | 41 | 43 | 40 |
Mean mapped depth (only 1000 genomes) | 35.51 | 30.02 | ||||||||||
Mean mapped depth (Novel) | 37.93 | 34.08 | ||||||||||
CGI | ||||||||||||
Watson/Venter/AK1/YH | ||||||||||||
Illumina | ||||||||||||
1000 Genomes | ||||||||||||
Venter |
Source | NECS | Complete Genomics (2009) | Watson | JC Venter | AK1 | YH (Chinese) | Illumina | 1000 Genomes | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Variation type | PG17 (Fem CEU) | PG26 (Male CEU) | NA19240 (Fem YRI) | NA07022 (Male CEU) | NA20431 (Male PG) | Male (CEU) | Male (CEU) | Male (Korean) | YH (Chinese) | NA18507 (Male YRI) | CEU trio | YRI trio |
All | 3,084,838 (1.4) | 3,253,896 (1.5) | 4,042,801 (19) | 3,076,869 (10) | 2,905,517 (10) | 3,322,093 (18) | 3,075,281 (18) | 3,453,653 (17) | 3,074,097 (14) | 4,139,196 (16) | 3,646,764 (11) | 4,502,439 (23) |
Homozygous | 1,378,735 (0.37) | 1,352,646 (0.52) | 1,297,601 (4) | 1,097,899 (2) | 965,029 (1) | 1,464,414 | 1,450,860 | 1,343,250 (4) | 1,351,824 | 1,503,420 | ||
Heterozygous | 1,706,103 (2.54) | 1,901,250 (2.18) | 2,639,864 (27) | 1,800,287 (15) | 1,657,540 (16) | 1,857,679 | 1,624,421 | 2,110,403 (25) | 1,722,273 | 2,635,776 | ||
Transitions | 2,080,095 (1.46) | 2,208,983 (1.48) | 3,635,882 | 2,858,818 | 2,658,112 | |||||||
Transversions | 1,004,743 (1.43) | 1,044,913 (1.53) | 1,706,195 | 1,316,837 | 1,213,232 | |||||||
Coding | 19,581 (1.9) | 21,377 (2.1) | 23,000 (16) | 18,723 (9) | 16,532 (10) | 21,152 | 27,762 | 15,686 | 26,140 | |||
Non-synonymous | 9,152 (2.5) | 10,022 (2.8) | 11,400 (19) | 9,286 (U) | 8,215 (12) | 10,569 | 6,889 | 10,162 | 7,062 | 10,875 | 8,299–11,122 | 10,349–11,122 |
All | 338,455 | 370,266 | 496,194 | 337,635 | 269,794 | 227,718 | 214,691 | 170,202 (62) | 135,262 (59) | 404,416 (50) | 411,611 (25) | 502,462 (37) |
Short insertions | 169,833 (63.9) | 187,058 (65.2) | 242,391 (40) | 168,909 (37) | 136,786 (37) | 188,388 (20) | 226,361 (30) | |||||
Short deletions | 168,622 (53.3) | 183,208 (53.5) | 253,803 (44) | 168,726 (37) | 133,008 (36) | 225,189 (29) | 277,036 (43) | |||||
Coding short indels | 314 (56.4) | 420 (58.3) | 549 (56) | 556 (58) | 435 (59) | 345 | 863 | 212 | 65 | 476 (21) | 658 (33) | |
Complete genomics | ||||||||||||
Watson/Venter/AK1/YH | ||||||||||||
Illumina | ||||||||||||
1000 Genomes | ||||||||||||
Venter |
Variant class | PG17 (all SNPs) | PG26 (all SNPs) | PG17 (all ins) | PG26 (all ins) | PG17 (all del) | PG26 (all del) |
---|---|---|---|---|---|---|
Total number of variants creating miRNA binding sites | 5,965 | 6,150 | 208 | 241 | 265 | 298 |
Total number of variants destroying miRNA binding sites | 6,007 | 6,267 | 298 | 320 | 197 | 192 |
Total number of variants perturbing miRNA binding sites (diff delta | 13,190 | 13,655 | 880 | 938 | 808 | 877 |
Total number of TFBS disrupting variants | 587,761 | 638,857 | 28,896 | 33,234 | 50,997 | 58,709 |
Total number of major TFBS disrupting variants ( | 4,037 | 4,619 | 7,560 | 9,005 | 30,123 | 35,117 |
Total number of TFBS deleting variants | 52 | 62 | 0 | 0 | 416 | 524 |
Total number of splice site variants | 3,385 | 4,284 | 306 | 352 | 269 | 308 |
Total number of “splicing change” variants | 23 | 28 | 85 | 104 | 102 | 129 |
Total number of variants creating exonic splicing enhancer binding sites | 6,628 | 7,240 | 39 | 66 | 189 | 200 |
Total number of variants destroying exonic splicing enhancer binding sites | 6,646 | 7,300 | 50 | 65 | 193 | 211 |
Total number of variants creating exonic splicing silencer binding sites | 3,738 | 4,038 | 38 | 43 | 176 | 179 |
Total number of variants destroying exonic splicing silencer binding sites | 3,607 | 3,982 | 14 | 23 | 158 | 153 |
Variant class | PG17 (all SNPs, %) | PG26 (all SNPs, %) | PG17 (all ins, %) | PG26 (all ins, %) | PG17 (all del, %) | PG26 (all del, %) |
---|---|---|---|---|---|---|
Total number of variants creating miRNA binding sites | 0.19 | 0.19 | 0.12 | 0.13 | 0.16 | 0.16 |
Total number of variants destroying miRNA binding sites | 0.19 | 0.19 | 0.18 | 0.17 | 0.12 | 0.10 |
Total number of variants perturbing miRNA binding sites (diff delta | 0.43 | 0.42 | 0.52 | 0.50 | 0.48 | 0.48 |
Total number of TFBS disrupting variants | 19.05 | 19.63 | 17.01 | 17.77 | 30.24 | 32.04 |
Total number of major TFBS disrupting variants ( | 0.13 | 0.14 | 4.45 | 4.81 | 17.86 | 19.17 |
Total number of TFBS deleting variants | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.29 |
This work was funded by grants from the National Institute on Aging R56-AG027216 (Clinton T. Baldwin), R01HL087681 (Martin H. Steinberg), K24AG025727 and a Glenn Foundation for Medical Research Breakthroughs in Gerontology Award (Thomas T. Perls), National Science Foundation IIS-1017621 (Gary Benson), NIH/NCRR UL1 RR025774; NIH/NHLBI R01 HL089655-03; and NIH/NIDA R01 DA030976, and the Price Foundation and Scripps Genomic Medicine (Ali Torkamani, Nicholas J. Schork, Phillip Pham). We are beholden to the very special and wonderful subjects of this study for their interest and participation.