Description

These data include a total of 18,916 subjects from the Health and Retirement Study (HRS) genotyped on Illumina HumanOmni2.5 arrays. Data files also include data imputed using the 1000 Genomes and Haplotype Reference Consortium (HRC) reference panels.

Respondents who consented to saliva collection in 2006 (Phase 1), 2008 (Phase 2), 2010 (Phase 3), or 2012 (Phase 4) have been genotyped on Illumina Omni genotyping platforms. Phase 1 and 2 participants were genotyped and imputed together previously (see dbGaP accession number phs000428.v1.p1). Phase 3 participants were genotyped subsequently and imputed together with Phases 1-2 (dbGaP accession number phs000428.v2.p2). An additional 3,303 Phase 4 participants were genotyped in 2015 and imputed together with Phases 1-3, yielding a total of 18,923 unique HRS participants: 15,620 from Phases 1-3 and 3,303 from Phase 4. After quality control (QC), 18,916 unique HRS participants were included in this dataset.

Additional information can be found on the HRS website: https://hrs.isr.umich.edu/data-products/genetic-data

Sample Summary per Data Type

Sample Set | Accession | Data Type | Number of Samples
HRS-All Phases | snd10027 | GWAS, 1000G Imputation, HRC Imputation | 19,004
HRS-Phase 4 | snd10028 | GWAS | 3,475

Available Filesets

Fileset | Accession | Latest Release | Description
HRS GWAS | fsa000020 | NG00119.v1 | GWAS Illumina HumanOmni2.5
HRS Imputation | fsa000021 | NG00119.v1 | 1000G Imputation data, HRC Imputation data

View the File Manifest for a full list of files released in this dataset.

The HRS is a nationally representative sample with oversamples of African-American and Hispanic populations. The target population for the original HRS cohort includes all adults in the contiguous United States born during the years 1931–1941 who reside in households. HRS was subsequently augmented with additional cohorts in 1993 and 1998 to represent the entire population aged 51 and older in 1998 (b. 1947 and earlier). Since then, the steady-state design has called for refreshment every six years with a new six-year birth cohort of 51- to 56-year-olds. This was done in 2004 with the Early Baby Boomers (EBB) (b. 1948–53) and in 2010 with the Mid Baby Boomers (MBB) (b. 1954–59).

Sample Set | Accession | Number of Subjects
HRS-All Phases | snd10027 | 18,916
HRS-Phase 4 | snd10028 | 3,303

Consent Level | Number of Subjects
GRU-IRB-PUB-NPU | 18,916

Visit the Data Use Limitations page for definitions of the consent levels above.

Total number of approved DARs: 5
  • Investigator:
    Benjamin, Daniel
    Institution:
    NBER and UCLA
    Project Title:
    How health-relevant outcomes are influenced by genetics.
    Date of Approval:
    June 1, 2022
    Request status:
    Approved
    Research use statements:
    Technical Research Use Statement:
    We will use the HRS data to pursue two complementary strategies. One is the discovery of particular genetic polymorphisms associated with social-science outcomes. Because the effect of an individual genetic polymorphism on the outcome is likely to be very small, the HRS sample is too small, taken alone, to be used to discover new associations. Hence, we will pursue this strategy with HRS data in conjunction with other datasets that we have organized in the Social Science Genetic Association Consortium (SSGAC; www.thessgac.org). Our second strategy focuses on exploiting the uniquely rich social-science data in the HRS. We will conduct analyses that will shed light on the genetic architecture of a range of social-science outcomes. We will apply statistical methods that use the information contained in the dense SNP data taken as a whole and are thus well-powered in a sample the size of the HRS. Our specific aims are:
    1. To incorporate data from the HRS into ongoing meta-GWAS efforts from the SSGAC for a range of social-science outcomes, such as educational attainment and personality.
    2. To continue to include HRS in future releases of the Polygenic Index (PGI) Repository. PGIs (aka polygenic scores) are summaries of a person's genetic predisposition to a particular trait. HRS was included in the first release of the Repository, for which we created PGIs for 47 phenotypes in 11 datasets; these were returned to the datasets to be shared with users according to the datasets’ own data-sharing procedures. We will regularly update the existing PGIs and add new phenotypes as larger GWAS and better methodologies become available. Details on the Repository can be found in Becker et al. (2021, Resource profile and user guide of the Polygenic Index Repository. Nat. Hum. Behav.).
    3. To use the HRS genotype data to conduct polygenic prediction analyses for a range of social-science traits. Besides the direct interest in assessing the predictive power of PGIs, we will examine how these PGIs interact with environmental factors to influence life outcomes.
    4. To estimate heritability and genetic correlations for social-science traits in an older population.
    Non-Technical Research Use Statement:
    We will use HRS data to explore the genetic architecture of social-science outcomes. To do so, we will either use HRS data together with other datasets to identify specific genetic variants associated with these outcomes, or analyze the aggregate effect of all genetic variants in HRS alone using heritability analyses and polygenic indexes (PGIs). PGIs are summaries of a person's known genetic predisposition to a particular trait. We will use PGIs to examine the pathways underlying the relationship between genetic variants and outcomes of interest, including analyses of how genes and environment interact. We will also include HRS in future releases of the PGI Repository, an initiative that makes PGIs for a wide range of traits available in a number of datasets that may be useful to social scientists (https://www.thessgac.org/pgi-repository). HRS was included in the first release of the Repository, and we wish to continue to update the HRS PGIs and add PGIs for new phenotypes as more data or better methodologies become available.
  • Investigator:
    Crimmins, Eileen
    Institution:
    University of Southern California
    Project Title:
    GWAS and Systems Biology Analyses for Aging-Related Conditions: Longevity and Disease
    Date of Approval:
    August 31, 2022
    Request status:
    Approved
    Research use statements:
    Technical Research Use Statement:
    Our project will rely on phenotype and genotype data from the Health and Retirement Study (HRS), a nationally representative longitudinal study of the older adult population in the U.S. This is an ongoing study. The data we have been using since 2016 come from an approved application through dbGaP and cover 15,507 HRS participants; they include single nucleotide polymorphism (SNP) data on just under 2.5 million markers, imputed data on approximately 21 million DNA variants, and phenotype data on disease incidence and prevalence, functioning, biomarkers, mortality, and environmental and behavioral covariates. Our request to NIAGADS would add genetic data on 3,409 participants to the sample we have been using (yielding N=18,916 total with harmonized genetic data through NIAGADS). Data usage will not create additional risk to participants. The aims of the project are to (1) identify genetic networks and pathways that influence human aging, disease, functioning, and longevity; (2) develop predictive models of aging-related health outcomes using information from gene networks; and (3) examine how social and environmental conditions interact with genes within these aging-related gene networks. We will implement statistical models to test for associations between genetic variants and these phenotype data. With the additional samples, we will use the HRS genome-wide data to examine genetic signatures of healthspan, lifespan, and cognitive aging. Using these genetic signatures, we plan to (i) run pathway enrichment analysis to identify influential biological pathways, (ii) use them for predictive modeling of morbidity/mortality risk and cognitive aging, and (iii) incorporate information from social and behavioral data to examine gene-by-environment (GxE) interactions. The overall goal of the project is to identify mechanistic gene and environment networks that contribute to aging acceleration or deceleration.
    Non-Technical Research Use Statement:
    Aging is the largest risk factor for morbidity and mortality. Previous research using animal models or case-control studies of centenarians has suggested that variation in the pace of aging may be partially explained by genetic and genomic differences. However, few genetic regulators of human lifespan and healthspan have been identified. Furthermore, there is reason to suggest that the pace of aging may be a polygenic trait, for which multiple genes form complex networks that collectively influence aging and longevity phenotypes. These complex genetic networks may further interact with exogenous factors, causing health outcomes to vary across diverse environments. The goal of this project is to use advanced statistical modeling techniques to understand how gene-gene and gene-environment interactions influence longevity and aging-related conditions.
  • Investigator:
    Pan, Wei
    Institution:
    University of Minnesota
    Project Title:
    Powerful and novel statistical methods to detect genetic variants associated with or putative causal to Alzheimer’s disease
    Date of Approval:
    June 8, 2022
    Request status:
    Approved
    Research use statements:
    Technical Research Use Statement:
    We have been developing more powerful statistical methods to detect common variant (CV)- or rare variant (RV)-complex trait associations and/or putative causal relationships for GWAS and DNA sequencing data. Here we propose applying our new methods, along with other suitable existing methods, to the existing ADSP sequencing data and other AD GWAS data provided by NIA, and hence request approval to access the ADSP sequencing and other related GWAS/genetic data. We have the following two specific aims. Aim 1. Association testing under genetic heterogeneity: For complex traits, genetic heterogeneity, especially of RVs, is ubiquitous, as is well acknowledged in the literature; however, there is barely any existing methodology to explicitly account for genetic heterogeneity in association analysis of RVs based on a single sample/cohort. We propose using secondary and other omic data, such as transcriptomic or metabolomic data, to stratify the given sample, then applying a weighted test to the resulting strata, explicitly accounting for genetic heterogeneity, i.e., the possibility that causal RVs differ (with varying effect sizes) across unknown, hidden subpopulations. Some preliminary analyses have confirmed power gains of the proposed approach over the standard analysis. Aim 2. Meta-analysis of RV tests: Although it is well appreciated that meta-analysis of RVs for multi-ethnic cohorts must account for varying association effect sizes and directions, existing tests are not highly adaptive to varying association patterns across the cohorts and across the RVs, leading to power loss. We propose a highly adaptive test based on a family of sum of powered score (SPU) tests, which cover many existing meta-analysis tests as special cases. Our preliminary results suggest potentially substantial power gains.
    Non-Technical Research Use Statement:
    We propose applying our newly developed statistical analysis methods, along with other suitable existing methods, to the existing ADSP sequencing data and other AD GWAS data to detect common or rare genetic variants associated with Alzheimer’s disease (AD). The novelty and power of our new methods lie in two aspects: first, we consider and account for possible genetic heterogeneity across several subcategories of AD; second, we apply powerful meta-analysis methods to combine the association analyses across multiple subcategories of AD. The proposed research is feasible, promising, and potentially significant to AD research. In addition, our proposed analyses of the large body of existing ADSP sequencing data and other AD GWAS data with our newly developed methods are novel, powerful, and cost-effective.
  • Investigator:
    Wingo, Thomas
    Institution:
    Emory University
    Project Title:
    Identifying Alzheimer's Disease Genetic Risk Factors By Integrated Genomic and Proteomic Analysis
    Date of Approval:
    August 25, 2022
    Request status:
    Approved
    Research use statements:
    Technical Research Use Statement:
    We aim to uncover new genetic risk variants for Alzheimer’s disease (AD) through an integrated analysis of proteomic and genetic sequencing data generated at Emory University. Results of these analyses will be used to weight analyses of whole-genome sequencing (WGS), whole-genome genotyping (WGG), and whole-exome sequencing (WES) data from dbGaP and ADSP. We plan to publish our findings so that they are shared with the scientific community. Outcomes that will be tested include: (1) clinical disease status, (2) pathologic characterization (e.g., measures of beta-amyloid, tau, etc.), and (3) cognitive decline. For sequencing data, we will perform joint calling with PECaller, using default settings, on samples previously mapped by ADSP. Variant annotation will be performed using Bystro, and quality control will follow Wingo et al., 2017. For rare variants, we will use burden- and variance-based tests to estimate the association between genetic variants and each outcome for every gene in the genome. External weights from proteomic analyses, as well as measures of genomic conservation for each site, may optionally be used. For common variants, we plan to test for differences in allele frequencies using maximum likelihood tests. For all analyses, we plan to control for population structure using principal components derived from the underlying sequencing or genotyping data.
    Non-Technical Research Use Statement:
    Our aim is to identify genetic variants that are associated with Alzheimer's disease (AD), using either genomic data (from dbGaP or from Emory University) or brain protein sequencing data (from Emory University) as a starting point. Each center’s data will be analyzed separately, and we will determine whether the findings are consistent among the centers. Additionally, we will use protein data from the brain or cerebrospinal fluid of individuals with or without AD to guide the analysis of the genomic data to identify genetic variants that influence AD risk. Our overarching aim is to use genetic discoveries to identify mechanisms of AD pathogenesis and to create more meaningful models of the disease.
  • Investigator:
    Zhi, Degui
    Institution:
    University of Texas Health Science Center at Houston
    Project Title:
    Genetics of deep-learning-derived neuroimaging endophenotypes for Alzheimer's Disease
    Date of Approval:
    July 14, 2022
    Request status:
    Approved
    Research use statements:
    Technical Research Use Statement:
    Alzheimer’s disease (AD) affects 5.6 million Americans over the age of 65 and exacts tremendous and increasing demands on patients, caregivers, and healthcare resources. Our current understanding of the biology and pathophysiology of AD is still limited, hindering advances in the development of therapeutic and preventive strategies. Existing genetic studies of AD have had some success, but they explain only a fraction of the overall disease risk, suggesting opportunities for additional discoveries. The proposed project will leverage existing neuroimaging and genetic data resources from the UK Biobank, the Alzheimer’s Disease Sequencing Project (ADSP), the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium, and will be conducted by a multidisciplinary team of investigators. We will derive AD endophenotypes from neuroimaging data in the UK Biobank using deep learning (DL). We will identify novel genetic loci associated with DL-derived imaging endophenotypes and optimize the co-heritability of these endophenotypes with AD-related phenotypes using UK Biobank genetic data. We will leverage resources and collaborations with AD consortia and the power of DL-derived neuroimaging endophenotypes to identify novel genes for Alzheimer’s disease and AD-related traits. We will also develop DL-based neuroimaging harmonization and imputation methods and distribute implementation software to the research community. We expect to discover new genes relevant to AD, which may lead to an understanding of the molecular basis of AD and to potential new treatments.
    Non-Technical Research Use Statement:
    Alzheimer’s disease (AD) exacts a tremendous burden on patients, caregivers, and healthcare resources. Our current understanding of the biology of AD is still limited, hindering advances in the development of treatment and prevention. Existing genetic studies of AD have had some success, but more studies are needed. The proposed project will leverage existing neuroimaging and genetic data resources from the UK Biobank, the Alzheimer’s Disease Sequencing Project (ADSP), and other consortia, and will be conducted by a multidisciplinary team of investigators. We will derive new AD-relevant intermediate phenotypes from neuroimaging data using deep learning (DL), an AI approach. We will identify novel genetic loci associated with these phenotypes. We will also develop imaging harmonization and imputation methods and distribute implementation software to the research community. We expect to discover new genes relevant to AD, which may lead to an understanding of the molecular basis of AD and to potential new treatments.

Acknowledgment statement for any data distributed by NIAGADS:

Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.

Use the study-specific acknowledgment statements below (as applicable):

For investigators using any data from this dataset:

Please cite/reference the use of NIAGADS data by including the accession NG00119.

For investigators using LASI-DAD (sa000019) data:

In text: "The Longitudinal Aging Study in India, Diagnostic Assessment of Dementia data is sponsored by the National Institute on Aging (grant numbers R01AG051125 and U01AG065958) and is conducted by the University of Southern California."

In references: "The Longitudinal Aging Study in India, Diagnostic Assessment of Dementia Study. Produced and distributed by the University of Southern California with funding from the National Institute on Aging (grant numbers R01AG051125 and U01AG065958), Los Angeles, CA."

For investigators using HRS (sa000021) data:

HRS is supported by the National Institute on Aging (NIA U01AG009740). The genotyping was partially funded by separate awards from NIA (RC2 AG036495 and RC4 AG039029). Our genotyping was conducted by the NIH Center for Inherited Disease Research (CIDR) at Johns Hopkins University. Genotyping quality control and final preparation were performed by the Genetics Coordinating Center at University of Washington (Phases 1-3) and the University of Michigan (Phase 4).