Overview
To access the dataset, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00171) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.
Description
This dataset includes summary statistics characterizing the genetic population structure of individuals in the LASI-DAD cohort. This includes estimates of population admixture and founder events, including segments of Neanderthal and Denisovan ancestry and regions of homozygosity in India. We used whole genome sequences for 2,762 individuals from the LASI-DAD sample set snd10033. Sequencing depth information is provided for 2,712 individuals as a preliminary metric of data quality. Of the total sample pool, 2,679 samples passed quality control filters. For these individuals, we provide per-individual sequencing depth, ancestry proportions estimated with the population genetics tool qpAdm, runs of homozygosity inferred using PLINK, and estimates of Neanderthal and Denisovan ancestry based on hmmix. Details of the analysis and methods are described in the bioRxiv preprint (bioRxiv 2024.02.15.580575, doi: https://doi.org/10.1101/2024.02.15.580575).
To obtain subject ID mapping between LASI-DAD datasets, please contact Jinkook Lee (jinkookl@usc.edu) or apply for data access on the LASI-DAD website.
Sample Summary per Data Type
| Sample Set | Accession | Data Type | Number of Samples |
|---|---|---|---|
| Population Structure in LASI-DAD | snd10117 | Population Structure Summary Statistics | 2,712 |
Available Filesets
| Name | Accession | Latest Release | Description |
|---|---|---|---|
| LASI-DAD Population Structure | fsa000126 | NG00171 | Population Structure Summary Statistics |
View the File Manifest for a full list of files released in this dataset.
Data Dictionary Files
Sample information
A total of 2,762 LASI-DAD participants, including 22 trios (mother-father-child), were sequenced at MedGenome, Inc. (Bangalore, India) at an average read depth of 30x. Individuals were sampled from 18 different states across India, with a median sample size of 157 individuals per state. The raw whole genome sequences were sent to the Genome Center for Alzheimer’s Disease (GCAD) at the University of Pennsylvania for joint calling and quality control. A total of 2,679 samples and 73.2 million autosomal bi-allelic variants passed quality control filters, including 67.1 million single nucleotide variants (SNVs) and 6.04 million insertion-deletions (indels). The dataset includes individuals born in 23 different states, speaking at least 26 different languages, from both rural (63%) and urban (37%) areas, and belonging to various caste groups as recognized by the Indian government: 4% from Scheduled Tribes, 18% from Scheduled Castes, and 44% from Other Backward Classes (OBC). Nearly equal numbers of males and females were recruited, with females constituting 52% of the sample. For many analyses, individuals were categorized, based on their birth location, into six major geographic regions: North (n=555), West (n=385), Central (n=373), South (n=715), North-East (n=73), and East (n=530). Most analyses in this dataset were performed on the 2,620 individuals who passed quality control checks, excluding first-degree relatives.
| Sample Set | Accession Number | Number of Subjects | Number of Samples |
|---|---|---|---|
| Population Structure in LASI-DAD | snd10117 | 1 | 2,712 |
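As a quick sanity check on the regional grouping described in the sample information above, the following minimal sketch tallies the six stated region counts and reports each region's share. The counts are copied from the text; the variable and function names are illustrative only and are not part of any released file. Note that the stated counts sum to 2,631, and the text does not say which sample subset they refer to.

```python
# Region counts as stated in the sample description (grouped by birth location).
REGION_COUNTS = {
    "North": 555,
    "West": 385,
    "Central": 373,
    "South": 715,
    "North-East": 73,
    "East": 530,
}


def region_shares(counts):
    """Return each region's share of the categorized total, as a percentage."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


total = sum(REGION_COUNTS.values())
shares = region_shares(REGION_COUNTS)

# Print regions from largest to smallest share.
for region, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{region:<10} n={REGION_COUNTS[region]:>4}  {pct:5.1f}%")
print(f"Total categorized: {total}")
```

The same pattern could be applied to any per-individual region column in the released summary statistics, should one be provided.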
Data Releases
Related Studies
- The Harmonized Diagnostic Assessment of Dementia for the Longitudinal Aging Study in India (LASI-DAD) is an add-on study to the Longitudinal Aging Study in India (LASI) focused on late-life cognition…
Cohorts
Phenotype Harmonization
Consent Levels
| Consent Level | Number of Subjects |
|---|---|
| GRU-IRB-PUB | 2,712 |
Visit the Data Use Limitations page for definitions of the consent levels above.
Acknowledgement
Acknowledgment statement for any data distributed by NIAGADS:
Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.
Use the study-specific acknowledgement statements below (as applicable):
For investigators using any data from this dataset:
Please cite/reference the use of NIAGADS data by including the accession NG00171.
For investigators using The Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD) (sa000019) data:
In text: "The Longitudinal Aging Study in India, Diagnostic Assessment of Dementia data is sponsored by the National Institute on Aging (grant numbers R01AG051125 and U01AG064948) and is conducted by the University of Southern California."
In references: "The Longitudinal Aging Study in India, Diagnostic Assessment of Dementia Study. Produced and distributed by the University of Southern California with funding from the National Institute on Aging (grant numbers R01AG051125 and U01AG064948), Los Angeles, CA."
Publications
- Kerdoncuff E. 50,000 years of Evolutionary History of India: Insights from ~2,700 Whole Genome Sequences. bioRxiv: the preprint server for biology. 2024 Feb 17.
Third-Party Access
Approved Users
- Investigator: Belloy, Michael
  - Institution: Washington University in St Louis
  - Project Title: Elucidating sex-specific risk for Alzheimer's disease through state-of-the-art genetics and multi-omics
  - Date of Approval: March 31, 2026
  - Request status: Approved
  - Technical Research Use Statement:
    - Objectives: In this project, we seek to holistically investigate the genetic and molecular drivers of sex dimorphism in Alzheimer’s disease across ancestries.
    - Study design: This study integrates large-scale population genetics with multi-omics and endophenotype analyses. We are integrating all data available from ADGC and ADSP, together with other data from AMP-AD and biobanks such as UKB, FinnGen, and MVP to conduct large-scale multi-ancestry GWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. We also particularly focus on X chromosome association studies. The study design also interrogates interactions with ancestry, hormone exposures, and APOE*4, as well as comparisons to non-stratified GWAS/XWAS of Alzheimer’s disease. Further, we will employ genetic correlation analyses, Mendelian randomization, colocalization, and pleiotropy analyses to interrogate overlap with other complex traits and better understand the mechanisms underlying sex dimorphism in Alzheimer’s disease.
    - Analysis plan, including the phenotypic characteristics that will be evaluated in association with genetic variants: Our phenotypes will include Alzheimer’s disease risk, conversion risk, various endophenotypes (including amyloid/tau biomarkers, brain imaging metrics, etc.), as well as molecular traits. As noted above, we will conduct large-scale multi-ancestry GWAS, XWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. Specific aims include interrogating these questions and analyses on (1) the autosomes, (2) the X chromosome, and (3) leveraging sex-stratified QTL studies to drive discovery of risk genes.
  - Non-Technical Research Use Statement: Alzheimer’s disease (AD) manifests itself differently across men and women, but the genetic and molecular factors that drive this remain elusive. AD is the most common cause of dementia and to this day remains largely untreatable. It is thus crucial to study the genetics of AD in a sex-specific manner, as this will help the field gain important insights into disease pathophysiology, identify novel sex-specific risk factors relevant to personalized genetic medicine, and uncover potential new AD drug targets that may benefit both sexes. This project uses large-scale genomics and multi-omics to elucidate novel sex-agnostic and sex-specific AD risk genes. We will interrogate sex dimorphism for AD risk on the autosomes and the sex chromosomes. We similarly interrogate sex dimorphism in the genetic regulation of gene expression and protein levels, which we will integrate with genetic risk for Alzheimer’s disease to further discover risk genes. Throughout, we will also interrogate how sex-specific risk for AD interacts with hormone exposures, ancestry, and the APOE*4 risk allele.
- Investigator: Cruchaga, Carlos
  - Institution: Washington University School of Medicine
  - Project Title: The Familial Alzheimer Sequencing (FASe) Project
  - Date of Approval: January 21, 2026
  - Request status: Approved
  - Technical Research Use Statement: The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three first-degree affected individuals were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease, were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 years, and controls had a greater age at last assessment than the latest AAO within the family. Whole exome sequencing (WES) and whole genome sequencing (WGS) data were generated for 1,235 individuals (285 families); together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families), these will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single variant analysis will consist of a combination of association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this, we will identify families loaded with AD cases that also include individuals with a high burden of known risk variants who do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD). We will perform single variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
  - Non-Technical Research Use Statement: Family-based approaches led to the identification of disease-causing Alzheimer’s Disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we have identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effect sizes for risk for AD, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), which, combined with the ADSP and NIMH datasets, will yield a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect size will be found, which will lead to a better understanding of the biology of the disease.
- Investigator: Konermann, Silvana
  - Institution: Arc Institute
  - Project Title: Modeling Alzheimer’s disease risk and associated molecular phenotypes
  - Date of Approval: August 8, 2025
  - Request status: Approved
  - Technical Research Use Statement: The objective of the proposed research is to determine the relationship between Alzheimer’s disease (AD) genetic risk and associated molecular phenotypes. Genotype data will be used to compute a polygenic risk score (PRS) for disease-affected and control (non-disease-affected) participants. Statistical regression and mediation analyses will be used to model variation of molecular phenotypes with respect to PRS and, where available, pathology stage or cognitive impairment. Molecular phenotypes to be analyzed include bulk/single-cell/single-nucleus transcriptome, epigenome, proteome, metabolome, lipidome, amyloid, and tau. Molecular phenotypes of participants, including controls, will be matched with molecular phenotypes of in vitro cellular models, informing the design of in vitro perturbation experiments that recapitulate the genetic drivers of AD risk.
  - Non-Technical Research Use Statement: Our goal is to determine the relationship between human genetic profiles associated with Alzheimer’s disease (AD) risk and specific measurable characteristics of human cells. Using multiple statistical analysis methods, we will build quantitative models that describe how those characteristics vary as a function of AD genetic risk. The models we build will help us design in vitro cellular systems that reflect different levels of AD risk, enabling experiments that inform new strategies for treating or preventing AD.