NG00134 - National Health and Aging Trends Study (NHATS) GWAS

To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00134) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.

Description

A dried blood spot (DBS) collection in Round 7 (2017) of NHATS provided the biological material for genotyping. Samples were genotyped at Erasmus Medical Center in Rotterdam, Netherlands, on the Illumina Infinium Global Screening Array v3.0. The array contains clinical and rare variants ideal for multiethnic populations. After quality control steps that removed variants with high (>5%) missingness and individuals with high (>5%) missingness, a total of 700,009 variants and 4,006 samples were included in the NHATS genetic dataset. Quality control was performed at the Arking Lab at the Johns Hopkins University and validated independently at the University of Michigan. The dataset includes genotyped data (build hg19/GRCh37, PLINK format), TOPMed imputed data (build GRCh38, VCF format), ancestry-specific analytic groups, and recommended sample filtering information. Within-ancestry principal components are available from the NHATS study by request.
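To make the missingness thresholds above concrete, the following is a minimal sketch of a 5% missingness filter on a small in-memory genotype matrix. It is not the pipeline used by the Arking Lab: the function names, the matrix layout, and the order of the two steps (variants first, then individuals) are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# A genotype call: alternate-allele count (0, 1, 2) or None when the call is missing.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Return the fraction of missing calls (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(c is None for c in calls) / len(calls)


def filter_by_missingness(
    matrix: List[List[Genotype]], max_missing: float = 0.05
) -> Tuple[List[List[Genotype]], List[int], List[int]]:
    """Filter a samples-by-variants matrix by missingness.

    matrix[i][j] is the genotype of sample i at variant j.
    Assumed order (not stated in the source): drop variants whose
    missing rate exceeds max_missing, then drop samples whose missing
    rate over the remaining variants exceeds max_missing.
    """
    n_samples = len(matrix)
    n_variants = len(matrix[0]) if matrix else 0

    keep_variants = [
        j for j in range(n_variants)
        if missing_rate([matrix[i][j] for i in range(n_samples)]) <= max_missing
    ]
    keep_samples = [
        i for i in range(n_samples)
        if missing_rate([matrix[i][j] for j in keep_variants]) <= max_missing
    ]
    filtered = [[matrix[i][j] for j in keep_variants] for i in keep_samples]
    return filtered, keep_samples, keep_variants


# Toy data: 3 samples x 4 variants; variants 1 and 2 have missing calls.
toy = [
    [0, None, 1, 2],
    [1, None, 0, 0],
    [2, 1, None, 1],
]
filtered, kept_samples, kept_variants = filter_by_missingness(toy)
```

In practice this kind of filter is normally applied with dedicated genetics software rather than hand-written code; the sketch only shows what the threshold means.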

Self-reported primary race/ethnicity, with missing values assigned the modal category, indicated 729 non-Hispanic Black, 2,962 non-Hispanic White, 223 Hispanic, and 92 other race/ethnicity samples (see population breakdown below). For details about each self-reported race/ethnicity group, see the NHATS User Guide [https://nhats.org/researcher/nhats/methods-documentation?id=user_guide]. To request phenotype data for participants in this study, apply at https://www.nhats.org/researcher/data-access/sensitive-data-files?id=restricted_data_files

Race/Ethnicity	Male	Female	Total
Non-Hispanic White	1,261	1,701	2,962
Non-Hispanic Black	279	450	729
Other*	44	48	92
Hispanic	87	136	223
Total	1,671	2,335	4,006

*American Indian, Alaska Native, Asian, Native Hawaiian, Pacific Islander
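The modal-category assignment mentioned above for missing self-reported race/ethnicity can be sketched as follows. This is a generic illustration, not the NHATS procedure: the function name and the toy input are hypothetical, and the real assignment may have been done within strata or with additional rules that this page does not describe.

```python
from collections import Counter
from typing import List, Optional


def assign_modal_category(values: List[Optional[str]]) -> List[str]:
    """Replace missing entries (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to take a mode from")
    # most_common(1) returns [(category, count)] for the top category.
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical toy input using category labels from the table above.
reported = ["Non-Hispanic White", "Non-Hispanic Black", None, "Non-Hispanic White"]
completed = assign_modal_category(reported)
```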

Sample Summary per Data Type

Sample Set	Accession	Data Type	Number of Samples
National Health & Aging Trends Study (NHATS) GWAS	snd10042	GWAS	4,006

Available Filesets

Name	Accession	Latest Release	Description
NHATS GWAS: Genotype data	fsa000044	NG00134.v1	Genotype data
NHATS GWAS: TOPMed imputation data	fsa000045	NG00134.v1	TOPMed imputation data

View the File Manifest for a full list of files released in this dataset.

This dataset provides a set of GWAS files that underwent quality control by the Arking Lab at the Johns Hopkins University, as well as genotypes imputed from the TOPMed reference panel. A total of 4,006 subjects were genotyped at the Erasmus Medical Center in Rotterdam, Netherlands, on the Illumina Infinium Global Screening Array v3.0, yielding genotype data on 700,009 SNPs after quality control.

Sample Set	Accession Number	Number of Subjects
National Health & Aging Trends Study (NHATS) GWAS	snd10042	4,006

Consent Level	Number of Subjects
GRU-IRB-PUB-NPU	4,006

Visit the Data Use Limitations page for definitions of the consent levels above.

Total number of approved DARs: 3

Investigator: Cruchaga, Carlos
Institution: Washington University School of Medicine
Project Title: The Familial Alzheimer Sequencing (FASe) Project
Date of Approval: May 9, 2024
Request status: Approved
Research use statements:
Technical Research Use Statement:
The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three first-degree affected individuals were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease, were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 years, and controls had an age at last assessment greater than the latest AAO within the family. Whole exome sequencing (WES) and whole genome sequencing (WGS) data were generated for 1,235 individuals (285 families), which, together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families), will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single variant analysis will consist of a combination of association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this, we will identify families loaded with AD cases that also include individuals with a high burden of known risk variants who do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD).
We will perform single variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
Non-Technical Research Use Statement:
Family-based approaches led to the identification of disease-causing Alzheimer’s Disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we have identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effect sizes for risk for AD, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), which, combined with the ADSP and the NIMH datasets, will lead to a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect sizes will be found, which will lead to a better understanding of the biology of the disease.
Investigator: Wainberg, Michael
Institution: Sinai Health System
Project Title: Uncovering the causal genetic variants, genes and cell types underlying brain disorders
Date of Approval: April 3, 2024
Request status: Approved
Research use statements:
Technical Research Use Statement:
We propose a multifaceted approach to elucidate and interpret genetic risk factors for Alzheimer's disease. First, we propose to perform a whole-genome sequencing meta-analysis of the Alzheimer's Disease Sequencing Project with the UK Biobank and All of Us to associate rare coding and non-coding variants with Alzheimer's disease and related dementias. We will explore a variety of case definitions in the UK Biobank and All of Us, including those based on ICD codes from electronic medical records (inpatient, primary care and/or death), self-report of Alzheimer's disease or Alzheimer's disease and related dementias, and/or family history of Alzheimer's disease or Alzheimer's disease and related dementias. We will perform single-variant, coding-variant burden, and non-coding variant burden tests using the REGENIE genome-wide association study toolkit. Second, we propose to develop statistical and machine learning models that can effectively infer (“fine-map”) the causal gene(s), variant(s), and cell type(s) underlying each association we find, as well as associations from existing genome-wide association studies and other Alzheimer's- and aging-related cohorts found in NIAGADS. In particular, we propose to improve causal gene identification by incorporating knowledge of gene function as a complement to functional genomics. For instance, we plan to develop improved methods for inferring biological networks, particularly from single-cell data, and integrate these networks with the results of the non-coding associations from our first aim to fine-map causal genes. To fine-map causal variants and cell types, we plan to integrate the associations from our first aim with single-nucleus chromatin accessibility data from postmortem brain cohorts to simultaneously infer which variant(s) are causal for each discovered locus and which cell type(s) they act through.
Non-Technical Research Use Statement:
We have a comprehensive plan to understand and explain the genetic factors that contribute to Alzheimer's disease. Our approach involves two main steps. First, we'll analyze genetic information from large research databases to identify rare genetic changes associated with Alzheimer's and related memory disorders. We'll look at both specific changes in genes and other parts of the genetic code. We'll use data from different studies and combine them to get a clearer picture. Second, we'll create advanced computer models that can help us figure out which specific genes, genetic changes, and cell types are responsible for these associations. This will help us pinpoint the most important factors contributing to Alzheimer's disease. We'll also analyze data from previous studies to build a more complete understanding of these genetic links.
Investigator: Zhao, Jinying
Institution: University of Florida
Project Title: Identifying novel biomarkers for human complex diseases using an integrated multi-omics approach
Date of Approval: November 21, 2023
Request status: Approved
Research use statements:
Technical Research Use Statement:
GWAS, WES and WGS have identified many genes associated with Alzheimer’s Dementia (AD) and its related traits. However, the genes identified thus far collectively explain only a small proportion of disease heritability, suggesting that more genes remain to be identified. Moreover, there is a clear gender and ethnic disparity in AD susceptibility, but little research has been done to identify gender- and ethnicity-specific variants associated with AD. Of the many challenges in deciphering AD pathology, the lack of efficient and powerful statistical methods for genetic association mapping and causal inference represents a major bottleneck. To tackle this challenge, we have developed a set of novel statistical and bioinformatics approaches for genetic association mapping and multi-omics causation inference in large-scale ethnicity-specific epidemiological studies. The goal of this project is to leverage the multi-omics and clinical data archived by the ADSP, ADNI, ADGC as well as other AD-related data repositories to identify novel genes and molecular markers for AD. Specifically, we will (1) validate our novel methods for identifying novel risk and protective genomic variants and multi-omics causal pathways of AD; (2) identify novel ethnicity- and gender-specific genes and molecular causal pathways of AD. We will share our results, statistical methods and computational software with the scientific community.
Non-Technical Research Use Statement:
Although many genes have been associated with Alzheimer’s Dementia (AD), these genes altogether explain only a small fraction of disease etiology, suggesting that more genes remain to be identified. Of the many challenges in deciphering AD pathology, the lack of powerful statistical methods represents a major bottleneck. To tackle this challenge, we have developed a set of novel statistical and bioinformatics approaches for genetic association mapping and multi-omics causation inference in large-scale ethnicity-specific epidemiological studies. The goal of this project is to leverage the rich genetic and other omic data along with clinical data archived by the ADSP, ADNI, ADGC as well as other AD-related data repositories to identify novel genes and molecular markers for AD. Such results will enhance our understanding of AD pathogenesis and may also provide biomarkers for early diagnosis and therapeutic targets.

Acknowledgment statement for any data distributed by NIAGADS:

Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.

Use the study-specific acknowledgement statements below (as applicable):

For investigators using any data from this dataset:

Please cite/reference the use of NIAGADS data by including the accession NG00134.

For investigators using National Health and Aging Trends Study (NHATS) (sa000030) data:

In text: “National Health and Aging Trends Study (NHATS) is sponsored by the National Institute on Aging (grant number NIA U01AG32947) and conducted by the Johns Hopkins University.”

In references: “National Health and Aging Trends Study. Produced and distributed by www.nhats.org with funding from the National Institute on Aging (grant number NIA U01AG32947).”
