NG00075 - IGAP Rare Variant Summary Statistics - Kunkle et al. (2019)

To access this data, please log into DSS and submit an application. Within the application, add this dataset (accession NG00075) in the “Choose a Dataset” section. Once approved, you will be able to log in and access the data within the DARM portal.

The p-value only files are available in the “Open Access Dataset” tab.

Description

The International Genomics of Alzheimer’s Project (IGAP) is releasing the summary results data from the Alzheimer’s disease GWAS of Kunkle et al. (Nat Genet, 2019) to enable other researchers to examine particular variants or loci for evidence of association.

Please note that these summary data should not be used for research into the genetics of intelligence, education, social outcomes such as income, or potentially sensitive behavioral traits such as alcohol or drug addictions. The files include p-values and direction of effect at over 11 million directly genotyped or imputed single nucleotide polymorphisms (SNPs). Due to the possibility of identification of individuals from these summary results, allele frequency data are accessible through the application process.

Two datasets are provided. The first corresponds to the stage 1 meta-analysis results, including genotyped and imputed data (11,480,632 variants, phase 1 integrated release 3, March 2012), from 21,982 Alzheimer’s disease cases and 41,944 cognitively normal controls. The second corresponds to the meta-analysis results for the 11,632 variants that were genotyped on the I-select chip and tested for association in an independent set of 8,362 Alzheimer’s disease cases and 10,483 controls, together with the combined stage 1/stage 2 P-values. Of the I-select chip variants, 11,540 were available for meta-analysis with the stage 1 dataset. The Stage 3A (n = 11,666) and Stage 3B (n = 30,511) (for variants in regions not well captured on the I-select chip) results are available in the manuscript. The final sample was 35,274 clinical and autopsy-documented Alzheimer’s disease cases and 59,163 controls.
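As a quick consistency check on the sample sizes quoted above, the stage totals can be summed directly. The sketch below is illustrative only: it reads n = 11,666 for Stage 3A as a sample count, and the observation that Stage 1, Stage 2, and Stage 3A together match the final sample is an inference from the arithmetic, not a statement made on this page. All variable names are hypothetical.

```python
# Sample sizes as quoted in the dataset description.
stage1 = {"cases": 21_982, "controls": 41_944}
stage2 = {"cases": 8_362, "controls": 10_483}
stage3a_total = 11_666  # assumption: "n" is a count of samples
final = {"cases": 35_274, "controls": 59_163}


def total(group):
    """Return cases plus controls for one stage."""
    return group["cases"] + group["controls"]


stage1_total = total(stage1)  # 63,926
stage2_total = total(stage2)  # 18,845
final_total = total(final)    # 94,437

# Stage 1 + Stage 2 + Stage 3A, compared against the quoted final sample.
combined = stage1_total + stage2_total + stage3a_total
print(stage1_total, stage2_total, combined, final_total)
```

Under these assumptions the three stages sum to the quoted final sample of 94,437 individuals.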

Although the individual datasets examined excluded any SNPs with call rates <95%, the IGAP meta-analysis only analyzed SNPs either genotyped or successfully imputed in at least 30% of the AD cases and 30% of the control samples across all datasets.
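The inclusion rule described above can be sketched as a simple predicate. This is a minimal illustration, not the IGAP pipeline: the function name, its parameters, and the use of the stage 1 sample sizes as denominators are assumptions made for the example.

```python
def passes_coverage_filter(cases_with_data, total_cases,
                           controls_with_data, total_controls,
                           min_fraction=0.30):
    """Return True if a SNP was genotyped or successfully imputed in at
    least `min_fraction` of cases AND at least `min_fraction` of controls.

    Hypothetical helper for illustration; counts are pooled across datasets.
    """
    if total_cases <= 0 or total_controls <= 0:
        raise ValueError("total_cases and total_controls must be positive")
    case_fraction = cases_with_data / total_cases
    control_fraction = controls_with_data / total_controls
    return case_fraction >= min_fraction and control_fraction >= min_fraction


# Illustrative counts against the stage 1 sample (21,982 cases, 41,944 controls).
kept = passes_coverage_filter(7_000, 21_982, 20_000, 41_944)
dropped = passes_coverage_filter(6_000, 21_982, 20_000, 41_944)
print(kept, dropped)
```

Both thresholds must be met, so a SNP that is well covered in controls but present in fewer than 30% of cases is still excluded.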

P-value data is generally available to all users using the link below. However, gaining access to allele frequencies requires a formal data request.

This dataset was originally published on the NIAGADS archive site on 03/03/2021 and was moved to DSS on 06/03/2025.

Available Filesets

Name	Accession	Latest Release	Description
IGAP Rare Variant - Kunkle et al. (2019) P-values only (open access)	fsa000139	NG00075.v1	P-values only
IGAP Rare Variant - Kunkle et al. (2019) Full Summary Statistics (application needed)	fsa000140	NG00075.v1	Full Summary Statistics

View the File Manifest for a full list of files released in this dataset.

Data Dictionary Files

Consent Levels

Consent	Number of Subjects
DS-ADRD-IRB-PUB-NPU	NA

Visit the Data Use Limitations page for definitions of the consent levels above.

Acknowledgment statement for any data distributed by NIAGADS:

Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer's Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.

Use the study-specific acknowledgement statements below (as applicable):

For investigators using any data from this dataset:

Please cite/reference the use of NIAGADS data by including the accession NG00075.

For investigators using IGAP Rare Variant Summary Statistics - Kunkle et al. (2019) (sa000069) data:

We thank the International Genomics of Alzheimer's Project (IGAP) for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the control subjects, the patients, and their families. The i–Select chips was funded by the French National Foundation on Alzheimer's disease and related disorders. EADI was supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University Hospital. GERAD/PERADES was supported by the Medical Research Council (Grant n° 503480), Alzheimer's Research UK (Grant n° 503176), the Wellcome Trust (Grant n° 082604/2/07/Z) and German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant n° 01GI0102, 01GI0711, 01GI0420. CHARGE was partly supported by the NIH/NIA grant R01 AG033193 and the NIA AG081220 and AGES contract N01–AG–12100, the NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by the NIH/NIA grants: U01 AG032984, U24 AG021886, U01 AG016976, and the Alzheimer's Association grant ADGC–10–196728.

Total number of approved DARs: 10

Investigator:
Belloy, Michael
Institution:
Washington University in St Louis
Project Title:
Elucidating sex-specific risk for Alzheimer's disease through state-of-the-art genetics and multi-omics
Date of Approval:
March 31, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
• Objectives: In this project, we seek to holistically investigate the genetic and molecular drivers of sex dimorphism in Alzheimer’s disease across ancestries.
• Study design: This study integrates large-scale population genetics with multi-omics and endophenotype analyses. We are integrating all data available from ADGC and ADSP, together with other data from AMP-AD and biobanks such as UKB, FinnGen, and MVP to conduct large-scale multi-ancestry GWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. We also particularly focus on X chromosome association studies. The study design also interrogates interactions with ancestry, hormone exposures, and with APOE*4, as well as comparisons to non-stratified GWAS/XWAS of Alzheimer’s disease. Further, we will also employ genetic correlation analyses, Mendelian randomization, colocalization, and pleiotropy analyses to interrogate overlap with other complex traits and better understand the mechanisms underlying sex dimorphism in Alzheimer’s disease.
• Analysis plan, including the phenotypic characteristics that will be evaluated in association with genetic variants: Our phenotypes will include Alzheimer’s disease risk, conversion risk, various endophenotypes (including amyloid/tau biomarkers, brain imaging metrics, etc.) as well as molecular traits. As noted above, we will conduct large-scale multi-ancestry GWAS, XWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. Specific aims include interrogating these questions and analyses on (1) the autosomes, (2) the X chromosome, and (3) leveraging sex-stratified QTL studies to drive discovery of risk genes.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) manifests itself differently across men and women, but the genetic and molecular factors that drive this remain elusive. AD is the most common cause of dementia and to this day remains largely untreatable. It is thus crucial to study the genetics of AD in a sex-specific manner, as this will help the field gain important insights into disease pathophysiology, identify novel sex-specific risk factors relevant to personalized genetic medicine, and uncover potential new AD drug targets that may benefit both sexes. This project uses large-scale genomics and multi-omics to elucidate novel sex-agnostic and sex-specific AD risk genes. We will interrogate sex dimorphism for AD risk on the autosomes and the sex chromosomes. We will similarly interrogate sex dimorphism in the genetic regulation of gene expression and protein levels, which we will integrate with genetic risk for Alzheimer’s disease to further discover risk genes. Throughout, we will also interrogate how sex-specific risk for AD interacts with hormone exposures, ancestry, and the APOE*4 risk allele.
Investigator:
Brown, Rebecca
Institution:
University of Pennsylvania
Project Title:
Trajectories of Cognition in Middle Age: Implications for Alzheimer's Disease and Related Dementias in the U.S.
Date of Approval:
March 16, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Polygenic risk scores (PRS) for dementia and aging-related conditions are known to be associated with cognitive outcomes in older age, but little is known about their relationship to mid-life cognitive decline. We plan to use raw genetic data to derive novel PRS from GWAS sources (including the Lambert Alzheimer’s disease PRS, with and without APOE; a PRS for coronary artery disease; a longevity PRS) and evaluate their predictive accuracy for cognitive outcomes in middle age relative to existing PRS. Specifically, we want to create a measure of genetic risk associated with three outcomes: age-related cognition; telomere shortening; and methylation/epigenetic clocks. To achieve this, we will combine the HRS Genotype data with other HRS datasets (Harmonized Cognitive Assessment Protocol (HCAP) (2016 Early V1.0); 2008 Telomere Data; Epigenetic Clocks; 2016 Venous Blood Study (VBS)) to which we already have access. Once we have approved NIAGADS genomics data access, we will additionally request access to the HRS-NIAGADS Cross-Reference File (Genotype Data v3, 2006-2012) to link the genomics and HRS datasets. In our ongoing analyses, we would like to update our PRS models by incorporating the most recent GWAS summary statistics. For Alzheimer's disease, this requires access to the full summary statistics from the Kunkle et al., 2019 GWAS. We would also like access to the full summary statistics from the Farrell et al., 2024 GWAS and the Rajabli et al., 2025 GWAS to identify genetic modifiers of tauopathy by comparing progressive supranuclear palsy GWAS results with cross-ancestry Alzheimer’s disease GWAS results.
Non-Technical Research Use Statement:
There is evidence to suggest that differences in people’s genetic code might contribute to differences in age-associated cognitive changes. For example, some people develop memory problems in middle age, and other people experience no changes in memory. Researchers think this may be partially explained by differences in people’s genetic code. We might be able to predict who could experience age-related cognitive changes based on their DNA sequence. If we know which people have experienced memory problems, we can see what their DNA has in common compared to the DNA of people who don’t have any memory problems. Then, we can test this by looking at the DNA of a different group of people; evaluating if their DNA has the same things in common as the group of people with memory problems (vs. no memory problems); and predicting whether they will develop memory problems. The long-term goal of this work is to help identify people who might be at risk for developing memory problems and help them access preventative care or interventions to minimize future cognitive impairment.
Investigator:
Cruchaga, Carlos
Institution:
Washington University School of Medicine
Project Title:
The Familial Alzheimer Sequencing (FASe) Project
Date of Approval:
January 21, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three first-degree affected individuals were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease, were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 years, and controls had a later age at last assessment than the latest AAO within the family. Whole exome (WES) and whole genome sequencing (WGS) data were generated for 1,235 individuals (285 families), which together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families) will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single variant analysis will consist of a combination of association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this, we will identify families loaded with AD cases that also include individuals with a high burden of known risk variants who do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD). We will perform single variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
Non-Technical Research Use Statement:
Family-based approaches led to the identification of disease-causing Alzheimer’s Disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we have identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effect sizes for risk for AD, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), that combined with the ADSP and the NIMH dataset will lead to a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect size will be found, which will lead to a better understanding of the biology of the disease.
Investigator:
Fernandez, Victoria
Institution:
ACE Alzheimer Center
Project Title:
GADIR
Date of Approval:
February 10, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The objective of this study is to contribute to our understanding of neurodegenerative diseases by examining the genetic contributors to major dementia neuropathological hallmarks (amyloid-β deposition, tau pathology, TDP-43, hippocampal sclerosis, Lewy body pathology, and cerebrovascular disease, among others). We will generate the largest Iberian database (N=3500) of neuropathologically curated brains (Aim 1), with a subset of those (N≈350) undergoing deep digital phenotyping (Aim 3). We will generate an associated genetic map (Aim 2) in order to elucidate how common and rare genetic variants contribute to specific pathologies. We additionally aim to determine how polygenic risk scores (PRS) and pathway-specific PRS correspond to single and mixed neuropathological profiles, and to clarify the genetic architecture driving co-pathologies that frequently complicate clinical diagnosis. Eventually, we will replicate and fine-map our findings (Aim 4) leveraging available datasets at NIAGADS and other public repositories. Our analysis plan includes genome-wide association testing of ordinal, binary, and quantitative neuropathological traits; rare-variant burden analyses for coding and non-coding regions; PRS and pathway-PRS modeling across multiple dementia-related diseases; unsupervised clustering to identify variant sets defining specific endophenotypes; and pathway and network analyses to interpret significant signals. Colocalization and functional annotation approaches will integrate genomic findings with transcriptomic and proteomic resources. Data obtained from NIAGADS will be used to strengthen replication, broaden meta-analytic power, validate associations across independent neuropathology cohorts, and support functional interpretation using available genetic, expression, and multi-omic datasets. All analyses will use de-identified data in compliance with ethical and data-sharing standards.
Non-Technical Research Use Statement:
Dementia is an immensely challenging and prevalent condition, deeply impacting the lives of over 55 million individuals worldwide. While Alzheimer's disease stands as the most commonly recognized form of dementia, there exist other conditions that present comparable symptoms but distinct underlying pathological characteristics. To provide more effective support to patients and their families, we need to better understand the genetic causes associated with each of these brain pathologies, and to develop advanced tools for early classification and diagnosis. This grant proposal aims to tackle these challenges by establishing the largest Iberian (Spanish and Portuguese) database of dementia neuropathological cases, marked by a modernized and standardized neuropathological classification alongside comprehensive genomic data. Our goal is to delve further into the genetic architecture underpinning these pathological features and to refine existing risk assessment tools for more accurate diagnoses.
Investigator:
Goate, Alison
Institution:
Icahn School of Medicine at Mount Sinai
Project Title:
Study of Alzheimer's disease and other dementias (e.g. frontotemporal dementia) and related phenotypes
Date of Approval:
July 8, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Alzheimer's disease (AD) is the most common form of dementia but has no effective prevention or treatment. Developing a comprehensive picture of the genetic architecture of AD, including a network-level functional assessment of risk/resilience genes, is essential to develop novel therapeutic targets. The overarching goals of this study are to use genetic and genomic approaches to: 1) identify genes and variants that are involved in the development of AD and related disorders; 2) identify functional networks enriched for AD or related disorder risk and protective loci; 3) determine how cellular function and physiology are impacted by these genetic factors in disease-relevant cell types and animal models. This study will use publicly available whole genome/exome sequence data generated by the Alzheimer’s Disease Sequencing Project (ADSP) and genome-wide association study (GWAS) data from the International Genomics of Alzheimer’s Project (IGAP) and others. We will apply a suite of case-control and family approaches to investigate genetic association with dichotomous and continuous disease traits. This study will not only further our understanding of the genetic architecture of AD but also provide key information regarding the molecular mechanisms, setting the stage for novel therapeutic development.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) is the only disease among the top ten killers in the U.S. without a disease modifying therapy. Genetic studies provide a powerful means to identify genes and pathways that are causally linked to disease etiology. We propose to use genomic and functional approaches to identify genes that alter the risk of AD and investigate how these genes disrupt cellular pathways leading to disease.
Investigator:
Johnson, Emma
Institution:
Washington University School of Medicine
Project Title:
Investigating the Multivariate Genetic Architecture of Aging-Related Traits
Date of Approval:
January 30, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Aging encompasses a range of phenotypes, including lifespan, healthspan, aging-related diseases, and other aging-related outcomes. Aging-related phenotypes are correlated with each other but not perfectly. We will develop a multivariate genomic model of aging-related outcomes and diseases in order to 1) understand whether aging-related phenotypes are best described individually, clustered in small groups, or as a single group and 2) identify genetic variants that underlie processes shared across multiple aging-related phenotypes. First, we will examine genetic correlations among aging-related phenotypes such as Alzheimer's disease, longevity, epigenetic aging, telomere length, parental lifespan, healthspan, frailty, and other aging-related diseases (e.g. Parkinson's disease). We will then examine patterns of genomic covariation among aging-related phenotypes using Genomic Structural Equation Modeling. Finally, we will perform multivariate genome-wide association studies and derive polygenic scores from these GWAS.
Non-Technical Research Use Statement:
A wide range of characteristics and behaviors are related to aging, including lifespan, aging-related diseases such as Alzheimer's disease and Parkinson's disease, and frailty. Aging-related characteristics are genetically correlated, meaning that some genetic variants affect more than one aging-related phenotype (e.g. some genetic variants may confer risk for Alzheimer's disease and may be related to shorter lifespan). We aim to identify how different aspects of aging are genetically related to each other, and whether we can use broader groups of correlated phenotypes to identify genetic variants related to certain aspects of aging.
Investigator:
Kamboh, M. Ilyas
Institution:
University of Pittsburgh
Project Title:
Genetics of Alzheimer's Disease and Endophenotypes
Date of Approval:
March 31, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objectives: We are requesting access to the NIAGADS datasets to augment our ongoing studies on the genetics of Alzheimer’s disease (AD) and AD-related endophenotypes being carried out by Kamboh and his group since 1995. We are doing GWAS using array genotypes, whole-exome sequencing and whole-genome sequencing on datasets derived from the University of Pittsburgh ADRC and ancillary population-based longitudinal studies on dementia and biomarkers. Different available phenotypes include AD and non-AD dementia, age-at-onset, disease progression and survival, neuroimaging, cognitive decline, and plasma biomarkers for the core ATN and non-ATN pathologies. We also plan to expand on gene-gene interaction and sex-stratified analyses, which require the actual genotype data. The NIAGADS datasets will be used for replication and meta-analysis, and for gene-gene interaction and sex-stratified analyses. Study Design: A case-control design will incorporate a diverse cohort of individuals with AD and age-matched controls. For quantitative traits (neuroimaging and plasma biomarkers, cognitive performance measures, indicators of disease progression), linear regression analyses will be performed to identify genetic loci. To ensure the findings are robust and inclusive, participants from diverse demographic backgrounds will be included, enabling the exploration of potential genetic variations across populations. Analysis Plan: We will conduct GWAS and targeted analyses on candidate genes on different AD and AD-related phenotypes. Primary phenotypic variables include AD disease status, age-at-onset, last age for controls, APOE genotype, cognitive decline trajectories, sex, and race. Analyses will evaluate the influence of specific genetic variants on disease risk, cognitive performance, and biomarker levels, considering both individual and interactive effects of the APOE genotype. Results will be adjusted for potential confounders, such as demographic factors, to ensure valid associations.
Detailed analytical methods are described in our published papers for case-control (PMID: 32651314; 35694926), quantitative traits (PMID: 30361487; 37666928), and cognitive decline (PMID: 37089073; 30954325).
Non-Technical Research Use Statement:
Our research group at the University of Pittsburgh (Pitt) has been working on the genetics of Alzheimer’s disease (AD) and AD-related endophenotypes for almost three decades, on data derived largely from the University of Pittsburgh Alzheimer’s Disease Research Center and ancillary dementia studies. We are requesting access to the NIAGADS genotype and phenotype datasets to augment our sample size and increase power to detect novel genetic associations with AD and related endophenotypes.
Investigator:
Katt, Moriah
Institution:
West Virginia University
Project Title:
Machine Learning-Driven Identification of Alzheimer’s Disease-Specific Targets
Date of Approval:
June 10, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The objective of this study is to prioritize Alzheimer’s disease (AD)-associated genetic targets using computational and machine learning approaches applied to genome-wide association study (GWAS) summary statistics derived from the NIAGADS dataset (NG00075). Specifically, we will analyze variant-level association signals from the International Genomics of Alzheimer’s Project (IGAP) meta-analysis (Kunkle et al., 2019) and integrate these signals into gene-level representations to identify genes that may be associated with AD pathology and potential molecular targets. Using statistical aggregation and machine learning modeling approaches, we will evaluate patterns of genetic association across variants mapped to genes and genomic regions to prioritize candidate genes that may be linked to AD biological mechanisms. These analyses will leverage association statistics and allele frequency information available in the controlled-access IGAP summary statistics. Allele frequency information provides population-level context for variant-disease associations and can support prioritization of candidate genes whose associated variants may reflect biological processes relevant to AD across broader segments of the population represented in the GWAS. This study is a secondary computational analysis of existing, de-identified GWAS summary statistics. The dataset contains variant-level association metrics describing the statistical relationship between genetic variants and AD. No individual-level genotype data, clinical records, or personally identifiable information will be accessed. All analyses will be conducted using secure institutional computing systems in compliance with NIAGADS data use policies. Computational analyses will be performed on a HIPAA-compliant High-Performance Computing cluster operated by Research Computing at West Virginia University that provides a secure institutional environment for data storage and analysis. 
The phenotypic characteristic evaluated in association with genetic variants is AD case-control status as defined in the original IGAP meta-analysis, and variants will be analyzed in relation to their reported statistical association with AD.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that impacts millions of people worldwide. Previous large genetic studies have identified many differences in people’s DNA that are associated with an increased risk of developing AD. By studying these genetic differences across large groups of individuals, researchers can identify genes that may be connected to the biological changes that occur in the brain during the disease. In this project, we will analyze existing genetic research data to identify genes that are strongly associated with AD. The long-term goal of this work is to help identify molecular targets that could inform future therapeutic delivery strategies aimed at improving the ability of treatments to localize to and remain in affected brain tissue.
Investigator:
Kim, Jong Hun
Institution:
KOREA UNIVERSITY RESEARCH AND BUSINESS FOUNDATION
Project Title:
Discovery of APOE-Interacting Genes Through Trans-Ancestry and Sex-Stratified Analysis to Elucidate Alzheimer's Disease Risk Mechanisms and Stratify ARIA Risk Using Proxy Outcomes
Date of Approval:
July 20, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objectives: This project identifies ancestry- and sex-specific APOE ε4 modifier genes—variants that amplify or attenuate APOE ε4’s effect on AD risk and ARIA susceptibility from anti-amyloid immunotherapy. Aim 1: Trans-ancestry sex-stratified GWIS to construct an APOE-Wide Epistasis Map. Aim 2: Mechanistic validation via eQTL/pQTL colocalization and epistasis network. Aim 3: Explainable AI (XAI) integrating modifier SNPs, multi-omics subtypes, and ARIA proxy outcomes to stratify pre-treatment ARIA risk. Study Design: Multi-cohort secondary analysis using NIAGADS-controlled ADSP data exclusively. Individual-level data from all 15 ADC cohorts (NG00022–NG00151) and multi-ancestry ADSP WGS (NG00067, NG00166) span European, African American, Hispanic/Latino, and South/East Asian ancestries. Functional datasets (eQTL/pQTL: NG00102, NG00118, NG00120, NG00130) support Aim 2; imaging and neuropathology datasets (NG00103, NG00147, NG00175) enable Aim 3 ARIA proxy development. No prospective recruitment. Multi-dataset rationale: GWIS requires 4–8× more samples than standard GWAS (Gauderman 2002); no single cohort is independently powered—all 15 ADC cohorts must be pooled. Trans-ancestry GWIS requires ancestry-matched datasets (NG00100/African, NG00106/South Asian, NG00141/Hispanic) because population-specific LD cannot be imputed from summary statistics. Functional datasets (eQTL, pQTL, methylation) are non-redundant—each covers a distinct regulatory layer for Aim 2. All datasets are AD-specific; non-AD neurodegeneration data are excluded. Analysis Plan: Phenotypes: AD case/control (primary); APOE ε4 × SNP interaction; lobar microbleed count (ARIA-H proxy); SVD score (WMH, lacunar infarcts, perivascular spaces); longitudinal cognitive decline. Covariates: age, sex, top 20 ancestry PCs, stratum. Methods: logistic GWIS; trans-ancestry meta-analysis (METAL/MR-MEGA); sex-stratified/X-chromosome analyses; eQTL/pQTL colocalization (COLOC2/SMR); XGBoost XAI with 5-fold CV and SHAP.
Non-Technical Research Use Statement:
Alzheimer’s disease affects tens of millions worldwide. Lecanemab, approved in 2024, slows Alzheimer’s progression by removing amyloid plaques—but causes dangerous brain side effects (ARIA: Amyloid-Related Imaging Abnormalities) especially in APOE ε4 carriers, who also most need treatment. Currently, doctors cannot predict which APOE ε4 carriers will benefit versus be harmed. Our research identifies modifier genes controlling how dangerous APOE ε4 is. We leverage the ADSP’s diverse dataset spanning 15+ cohorts across European, African American, Hispanic/Latino, and Asian ancestries—a scale statistically necessary because detecting gene–gene interactions requires 4–8× more samples than standard genetic studies. Population-specific patterns allow high-confidence modifier identification. MRI-based brain bleeds and vascular markers serve as validated ARIA surrogates available at scale. The result is an explainable AI tool that predicts—before treatment begins—which APOE ε4 patients face high ARIA risk and which will benefit from lecanemab, enabling precision Alzheimer’s therapy.
Investigator:
Zhan, Huixin
Institution:
New Mexico Institute of Mining and Technology
Project Title:
AI-Driven Analysis of Genetic and Transcriptomic Data in Alzheimer’s Disease
Date of Approval:
March 30, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objectives: This study aims to improve understanding of the genetic and molecular mechanisms underlying Alzheimer’s disease (AD) by applying advanced computational and deep learning models to existing genomic and transcriptomic datasets. Specifically, we seek to identify and characterize genetic variants associated with AD risk, progression, and related phenotypes, contributing to precision medicine approaches for neurodegenerative disorders. Study Design: This project involves secondary analysis of de-identified, controlled-access datasets from NIAGADS (NG00067, NG00116, NG00174, NG00027, NG00075). No new data will be collected. The data will be securely downloaded and analyzed on institutional servers at New Mexico Tech under an approved IRB and Data Use Agreement. Analysis Plan: We will integrate genomic, transcriptomic, and phenotypic data to develop and evaluate machine learning models—such as large language model–based architectures and disease-specific neural networks—to predict variant pathogenicity and gene-level associations. Phenotypic characteristics evaluated will include Alzheimer’s disease diagnosis, cognitive performance measures, neuropathological burden, and biomarker profiles (e.g., amyloid and tau levels). Statistical and model-based analyses will assess associations between genetic variants and these phenotypes, with results reported in aggregate, non-identifiable form. Collaborations (if applicable): N/A
Non-Technical Research Use Statement:
This project uses advanced artificial intelligence and statistical tools to study the genetic and molecular factors that contribute to Alzheimer’s disease. By analyzing existing, de-identified research data from the National Institute on Aging’s NIAGADS repository, we aim to identify genetic variants and biological pathways linked to disease risk and progression. The study will combine information from DNA and gene-expression data to build computer models that can better predict how certain genetic changes affect brain health. Our ultimate goal is to improve scientific understanding of Alzheimer’s disease and support future efforts in early detection and personalized treatment.

Total number of subjects: 0