1000 Genomes Project Consortium. A global reference for human genetic variation.

In practice, significance thresholds must balance false positives and false negatives [22,23,24]. To evaluate variant discovery power and genotyping accuracy, we also generated deep Complete Genomics data (mean depth = 47×) for 427 individuals (129 mother-father-child trios, 12 parent-child duos, and 16 unrelateds). All individuals were sequenced using both whole-genome sequencing (mean depth = 7.4×) and targeted exome sequencing (mean depth = 65.7×). Comparison to haplotypes constructed from fosmids suggests the average distance between phasing errors is 1,062 kb, with typical phasing errors stretching 37 kb (Supplementary Table 12).

Extended Data Figure 4: The standardized number of variant sites per genome, partitioned by population and variant category.
Pies are divided into four slices, representing variants private to a population (darker colour; unique to population), private to a continental area (lighter colour; shared across continental group), shared across continental areas (light grey), and shared across all continents (dark grey).
To provide a measure of uncertainty, one curve is plotted for each chromosome.
We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Not all classes of mutation occur with equal frequency, nor are they equivalent with respect to their contribution to disease. The sharing of haplotypes among individuals is widely used for imputation in GWAS, a primary use of 1000 Genomes data. Although the number of characterized variants has more than doubled relative to phase 1, 2.3 million previously described variants are not included in the current analysis; most missing variants were rare or marked as low quality: 1.6 million had frequency <0.5% and may be missing from our current read set, while the remainder were removed by our filtering processes. [...] 6c, d), although those confined within a population tend to be younger, with a shared common ancestor 143 generations ago (3,570 to 4,284 years ago) [13]. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations.

Extended Data Figure 1: Summary of the callset generation pipeline.
Linkage disequilibrium was calculated around 10,000 randomly selected polymorphic sites in each population, having first thinned each population down to the same sample size (61 individuals).
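The linkage disequilibrium calculation described above can be illustrated with a minimal sketch. It assumes phased haplotypes coded as 0/1 alleles and uses the standard r² definition; the function names `thin_sample` and `r_squared` are hypothetical and are not taken from the project's pipeline.

```python
import random


def thin_sample(individuals, size, seed=0):
    """Randomly thin a population down to a fixed sample size.

    Assumption: simple random sampling without replacement; the source
    only states that populations were thinned to the same size (61).
    """
    rng = random.Random(seed)
    return rng.sample(individuals, size)


def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic sites.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    Returns None when either site is monomorphic, where r^2 is undefined.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the 1 allele at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


if __name__ == "__main__":
    kept = thin_sample(list(range(100)), 61)
    print(len(kept))
    print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
```

Thinning before computing r² keeps the estimates comparable across populations, because r² is biased upward in small samples.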
The magnitude of this difference is unlikely to be explained by demography [10,11], but instead reflects the ethnic bias of current genetic studies. The current human reference genome is predominantly derived from a single individual and does not adequately reflect human genetic diversity. An overview of the sample collection, data generation, data processing, and analysis is given in Extended Data Fig. [...]. [...] 4): we observed 2,000 variants per genome associated with complex traits through genome-wide association studies (GWAS) and 24-30 variants per genome implicated in rare disease through ClinVar, with European ancestry genomes at the high end of these counts. For example, east-west clines are visible in Africa and East Asia, a north-south cline is visible in Europe, and European, African, and Native-American admixture is visible in genomes sampled in the Americas.

a, Population structure inferred using a maximum likelihood approach with 8 clusters.
d, Heterozygote discordance for phase 3 compared to phase 1 within the intersecting sample.

Funding for this work was from the Wellcome Trust Core Award 090532/Z/09/Z and Senior Investigator Award 095552/Z/11/Z (P.D. [...]).
First, the 1000 Genomes Project samples provide a broad representation of human genetic variation, in contrast to the bulk of complex disease studies in humans, which primarily study European ancestry samples and which, as we show, fail to capture functionally important variation in other populations. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically [...]. We find that each common variant typically has over 15-20 tagging variants in non-African populations, but only about 8 in African populations (Fig. [...]). The squared correlation between imputed and experimental genotypes was >95% for common variants in each population, decreasing gradually with minor allele frequency (Fig. [...]). As such, we estimate that improved rare variant discovery by deep sequencing our entire sample would at least double the total number of variants in our sample but increase the number of variants in a typical genome by only 20,000 to 60,000.

a, Genotype covariance (above diagonal) and sharing of f2 variants (below diagonal) between pairs of individuals.
The diagonal represents the percentage of eQTLs in TFBS using the original discovery sample.
For each category, z-scores were calculated by subtracting the mean number of sites per genome (calculated across the whole sample), and dividing by the standard deviation.
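The z-score standardization described in the legend above can be sketched as follows. This is an illustration only: the function name `standardize_counts` and the genome identifiers are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the source does not say which was used.

```python
from statistics import mean, pstdev


def standardize_counts(sites_per_genome):
    """Convert per-genome site counts for one variant category to z-scores.

    sites_per_genome: dict mapping a genome identifier to its site count.
    The mean and standard deviation are taken across the whole sample.
    Assumption: population standard deviation (pstdev).
    """
    values = list(sites_per_genome.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all genomes have the same count; z-score undefined")
    return {genome: (count - mu) / sd for genome, count in sites_per_genome.items()}


if __name__ == "__main__":
    # Hypothetical counts, not values from the paper.
    counts = {"genome_1": 10, "genome_2": 20, "genome_3": 30}
    for genome, z in standardize_counts(counts).items():
        print(genome, round(z, 3))
```

Standardizing each category separately puts categories with very different absolute counts (for example, protein-truncating versus regulatory sites) on a common scale.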
Analysis of the small set of variants with large frequency differences between closely related populations can identify targets of recent, localized adaptation. Importantly, each release has examined larger numbers of individuals, aiding population-based analyses that identify and leverage shared haplotypes during genotyping. Sequence analysis methods improved with the development of strategies for identifying and filtering poor-quality data, for more accurate mapping of sequence reads (particularly in repetitive regions), for exchanging data between analysis tools and enabling ensemble analyses, and for capturing more diverse types of variants. The 1000 Genomes Project has already elucidated the properties and distribution of common and rare variation, provided insights into the processes that shape genetic diversity, and [...].

a, Performance of imputation in 6 populations using a subset of phase 3 as a reference panel (n = 2,445), phase 1 (n = 1,065), and the corresponding data within intersecting samples from both phases (n = 1,006).
b, PSMC curves estimated separately for all individuals within the 1000 Genomes sample.
b, The average number of detected variants per genome with whole-sample allele frequencies <0.5% (grey bars), with the average number of singletons indicated by colours.
Individuals from recently admixed populations show great variability in the number of variants, roughly proportional to the degree of recent African ancestry in their genomes. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. Although most common variants are shared across the world, rarer variants are typically restricted to closely related populations (Fig. [...]). Whereas our first analyses produced high-confidence short-variant calls for 80-85% of the reference genome [1], our newest analyses reach 96% of the genome using the same metrics, although our ability to accurately capture structural variation remains more limited [33]. Controlling for sample size, the decay of LD as a function of physical distance is fastest in African populations and slowest in East Asian populations (Extended Data Fig. [...]). In contrast, the bottleneck experienced by African populations during the same time period appears less severe, with Ne > 4,250.

Extended Data Figure 9: Performance of imputation.
f, Heterozygote genotype discordance as a function of sequencing depth, as compared to Complete Genomics data.
The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Among variants in the GWAS catalogue (which have an average frequency of 26.6% in project haplotypes), the number of proxies averages 14.4 in African populations and 30.3-44.4 in other continental groupings (Supplementary Table 10). We estimate the power to detect SNPs and indels to be >95% and >80%, respectively, for variants with sample frequency of at least 0.5%, rising to >99% and >85% for frequencies >1% (Extended Data Fig. [...]).
Including an African population provided the greatest reduction in the count of associated variants and the greatest increase in overlap between top variants and TFBSs.

Extended Data Figure 2: Power of discovery and heterozygote genotype discordance.
This provided a cost-effective means to discover genetic variants and estimate individual genotypes and haplotypes [1,2]. The number of alleles associated with a disease or phenotype in each genome did not follow this pattern of increased diversity in Africa (Extended Data Fig. [...]). Because of the wide availability of the data and samples, these samples have been and will continue to be used for studying many molecular phenotypes. This analysis separates continental groups, highlights their internal substructure, and reveals genetic similarities between related populations. Modelling the distribution of variation within and between genomes can provide insights about the history and demography of our ancestor populations [14]. Compared to phase 1, rare variation imputation improved considerably, particularly for newly sampled populations (for example, PEL and PJL, Extended Data Fig. [...]). The performance of imputation and GWAS studies depends on the local distribution of linkage disequilibrium (LD) between nearby variants.

Within each continental group, the maximum PBS statistic was selected from all pairwise population comparisons within the continental group against all possible out-of-continent populations.
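The selection of a maximum PBS (population branch statistic) value can be sketched as below. The text here does not give the formula, so this sketch assumes the commonly used definition, in which each pairwise FST is converted to a branch length T = -ln(1 - FST) and PBS for the target population is (T_target,sister + T_target,out - T_sister,out) / 2. The function names, the `fst` lookup table, and the population labels are hypothetical.

```python
import math
from itertools import permutations


def branch_length(fst):
    """Transform a pairwise FST value into a divergence-time-like branch length."""
    return -math.log(1.0 - fst)


def pbs(fst_target_sister, fst_target_out, fst_sister_out):
    """Population branch statistic for the target population (assumed standard form)."""
    return (
        branch_length(fst_target_sister)
        + branch_length(fst_target_out)
        - branch_length(fst_sister_out)
    ) / 2.0


def max_pbs(fst, group, outgroups):
    """Maximum PBS over all ordered within-group pairs and all outgroups.

    fst: dict mapping frozenset({pop_a, pop_b}) to the FST at one variant.
    group: populations in one continental group.
    outgroups: populations from other continents.
    Returns (value, target, sister, outgroup), or None if no comparison exists.
    """
    best = None
    for target, sister in permutations(group, 2):
        for out in outgroups:
            value = pbs(
                fst[frozenset((target, sister))],
                fst[frozenset((target, out))],
                fst[frozenset((sister, out))],
            )
            if best is None or value > best[0]:
                best = (value, target, sister, out)
    return best


if __name__ == "__main__":
    # Hypothetical FST values at a single variant.
    table = {
        frozenset(("A", "B")): 0.5,
        frozenset(("A", "O")): 0.5,
        frozenset(("B", "O")): 0.0,
    }
    print(max_pbs(table, ["A", "B"], ["O"]))
```

Taking the maximum over ordered pairs means each population in the group is tried as the target, so a variant differentiated in only one population is still picked up.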
To assess imputation based on the phase 3 data set, we used Complete Genomics data for 9 or 10 individuals from each of 6 populations (CEU, CHS, LWK, PEL, PJL, and YRI). Notable differences are confined to very recent time intervals, where the additional rare variants identified by deep sequencing suggest larger population sizes. Several potentially novel selection signals are also highlighted (such as TRBV9, which appears particularly differentiated in South Asia, PRICKLE4, differentiated in African and South Asian populations, and a number of genes in the immunoglobulin cluster, differentiated in East Asian populations; Extended Data Fig. [...]). The human reference genome is the most widely used resource in human genetics and is due for a major update. Due to the shared ancestry of all humans, only a modest number of variants show large frequency differences among populations. Drifted variants within such populations may reveal phenotypic associations that would be hard to identify in much larger global samples [15].
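Imputation assessments of this kind compare imputed genotypes with experimentally determined ones, and the accuracy metric reported in the paper is the squared correlation between the two. A minimal sketch of that metric for a single variant follows; it assumes genotypes are coded as alternate-allele counts or dosages between 0 and 2, and the function name `squared_correlation` is hypothetical.

```python
from statistics import mean


def squared_correlation(imputed, truth):
    """Squared Pearson correlation between imputed and experimental genotypes.

    imputed: imputed dosages (floats in [0, 2]) for one variant, one per individual.
    truth: experimentally determined genotypes (0, 1 or 2) for the same individuals.
    Returns None when either vector has no variance, where the metric is undefined.
    """
    if len(imputed) != len(truth) or len(imputed) < 2:
        raise ValueError("need two equal-length vectors with at least two entries")
    mean_x, mean_y = mean(imputed), mean(truth)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical dosages and genotypes for four individuals.
    print(squared_correlation([0.1, 0.9, 1.8, 0.0], [0, 1, 2, 0]))
```

In practice the per-variant values would be averaged within minor allele frequency bins, which is why accuracy is usually reported as a curve against frequency.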
Over the course of the 1000 Genomes Project there have been substantial advances in sequence data generation, archiving and analysis. Using a maximum likelihood approach [12], we estimated the proportion of each genome derived from several putative ancestral populations (Fig. [...]). For structural variants, additional orthogonal methods were used for confirmation, including microarrays and long-read sequencing, resulting in FDR < 5% for deletions, duplications, multi-allelic copy-number variants, Alu and L1 insertions, and <20% for inversions, SVA (SINE/VNTR/Alu) composite retrotransposon insertions and NUMTs [8] (nuclear mitochondrial DNA variants).

Lines represent the within-population median PSMC estimate, smoothed by fitting a cubic spline passing through bin midpoints.
(2b) Multi-allelic SNPs, indels, and complex variants (represented by yellow shapes, or variation in copy number) were placed onto the haplotype scaffold one at a time, exploiting the local linkage disequilibrium information but leaving haplotypes for other variants undisturbed [39].
The total number of observed non-reference sites differs greatly among populations (Fig. [...]). Furthermore, we estimate heterozygous genotype accuracy at 99.4% for SNPs and 99.0% for indels (Supplementary Table 4), a threefold reduction in error rates compared to our previous release [2], resulting from the larger sample size, improvements in sequence data accuracy, and genotype calling and phasing algorithms. African genomes were consistently at the high end of these ranges. These novel variants especially enhance our catalogue of genetic variation within South Asian (which account for 24% of novel variants) and African populations (28% of novel variants). The majority of variants in the data set are rare: 64 million autosomal variants have a frequency <0.5%, 12 million have a frequency between 0.5% and 5%, and only 8 million have a frequency >5% (Extended Data Fig. [...]). Compared to phase 1, the number of imputed common and intermediate frequency variants increased by 7%, whereas the number of rare variants increased by >50%, and the number of indels increased by 70% (Supplementary Table 6).
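The three frequency classes used in the breakdown above (<0.5%, 0.5% to 5%, >5%) can be tallied with a short sketch. The function names are hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the text does not specify it.

```python
from collections import Counter


def frequency_class(allele_frequency):
    """Assign a variant to one of the three frequency classes used in the text.

    Assumption: exactly 0.5% and exactly 5% both fall in the middle class.
    """
    if allele_frequency < 0.005:
        return "rare (<0.5%)"
    if allele_frequency <= 0.05:
        return "intermediate (0.5-5%)"
    return "common (>5%)"


def tally_by_class(allele_frequencies):
    """Count variants per frequency class."""
    return Counter(frequency_class(af) for af in allele_frequencies)


if __name__ == "__main__":
    # Hypothetical allele frequencies, not values from the data set.
    print(tally_by_class([0.0002, 0.001, 0.02, 0.3, 0.004]))
```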
In contrast to earlier phases of the project, we expanded analysis beyond bi-allelic events to include multi-allelic SNPs, indels, and a diverse set of structural variants (SVs). When we restricted analyses to the variants most likely to affect gene function, we found a typical genome contained 149-182 sites with protein truncating variants, 10,000 to 12,000 sites with peptide-sequence-altering variants, and 459,000 to 565,000 variant sites overlapping known regulatory regions (untranslated regions (UTRs), promoters, insulators, enhancers, and transcription factor binding sites). In the ARMS2/HTRA1 locus, the most strongly associated variant was now a structural variant (estimated imputation R² = 0.89) that previously could not be imputed, consistent with some functional studies [30]. After imputation, five independent signals in four previously reported AMD loci [25,26,27,28] reached genome-wide significance (Supplementary Table 8).

"A Global Reference for Human Genetic Variation." Nature 526 (7571) (September 30): 68-74. doi:10.1038/nature15393.