NG00162 - Oligodendrocytes single-cell whole genome and RNA sequencing

To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00162) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.

Description

Characterizing the mechanisms of somatic mutations in the brain is important for understanding aging and disease, but little is known about the mutational patterns of different cell types. We performed whole-genome sequencing of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals (0.4 to 104 years old) and compared the rates and signatures of somatic single nucleotide variants (sSNVs) and small insertions and deletions (indels) from each cell type. We further correlated this data with single-cell RNA (scRNA-seq) and chromatin accessibility (scATAC-seq) data generated from the same brains to compare the mutagenic processes in glia and neurons.

single-cell whole genome sequencing (scWGS):

Fluorescence-activated nuclear sorting (FANS) was used to isolate SOX10 cells from fresh frozen human brain tissue from the prefrontal cortex. Whole-genome amplification was performed using MDA or PTA following manufacturer guidelines. Sequencing libraries were generated using the KAPA HyperPlus kit (Roche) with dual indexes and were sequenced across 5 lanes of Illumina NovaSeq6000 (2x150bp), targeting 20x coverage (75Gbp)/sample. SCAN2 was used to identify single-cell somatic mutations.
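As a rough illustration of the sequencing target above, the following sketch converts the stated 75 Gbp per sample into 2x150 bp read pairs and into raw mean coverage. The genome size of 3.1 Gbp and the function names are assumptions made for this example and are not taken from the dataset description. Raw coverage computed this way comes out above the usable coverage, since duplicates and unmapped reads are not accounted for.

```python
def read_pairs_for_yield(target_bases: float, read_length: int = 150) -> float:
    """Read pairs needed to reach a target yield with paired-end reads."""
    return target_bases / (2 * read_length)


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Raw mean coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


TARGET_BASES = 75e9  # 75 Gbp per sample, as stated in the description
GENOME_SIZE = 3.1e9  # assumed approximate human genome size (not from the source)

pairs = read_pairs_for_yield(TARGET_BASES)
raw_cov = mean_coverage(TARGET_BASES, GENOME_SIZE)

print(f"read pairs per sample: {pairs:,.0f}")
print(f"raw mean coverage: {raw_cov:.1f}x")
```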

single-cell RNA sequencing (scRNA-seq):

Sequencing libraries were prepared using the 10X Genomics Chromium Next GEM Single Cell Reagent Kit v3.1 with nuclear pellets from fresh frozen human brain tissue from the prefrontal cortex of 2 individuals. Each library preparation was submitted for paired-end, single-index sequencing on Illumina HiSeqX or NovaSeq6000 targeting ~50,000 read pairs per nucleus. The data were demultiplexed using bcl2fastq. scRNA-seq FASTQ files were processed using the 10X Genomics cellranger count pipeline for gene expression to perform alignment to hg19, barcode counting, UMI counting, and generation of feature-barcode matrices. Cell Ranger filtered count matrices were used for downstream analysis using Seurat 3.0. Each library was further filtered for cells with > 200 and < 3000 genes and <5% mitochondrial genes, and genes with <10,000 UMI counts and >3 cells. RNA counts were normalized using the LogNormalize method and the 2,000 most highly variable features were identified using the vst method. Data were scaled by regressing out the percentage of mitochondrial genes. Non-linear dimensional reduction and clustering were then performed. DoubletFinder was used to remove doublets using optimal parameters as per the paramSweep function. Finally, cell-type identities were assigned to each cluster in the Uniform Manifold Approximation and Projection (UMAP) based on expression of known brain cell-type markers.
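The filtering and normalization steps described above were run in Seurat (R). The following NumPy sketch only illustrates the same arithmetic on a genes x cells count matrix and is not the authors' code. The function name, the argument names, and the literal reading of "genes with <10,000 UMI counts" as a per-gene total are assumptions made for this example. The normalization follows the usual definition of LogNormalize: counts divided by the cell total, multiplied by a scale factor of 10,000, then log1p-transformed.

```python
import numpy as np


def qc_filter_and_lognormalize(counts, is_mito, min_genes=200, max_genes=3000,
                               max_pct_mito=5.0, min_cells=3,
                               max_gene_umis=10_000, scale_factor=1e4):
    """Apply cell/gene QC thresholds, then log-normalize (hypothetical helper).

    counts  : genes x cells matrix of UMI counts
    is_mito : boolean flag per gene marking mitochondrial genes
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(is_mito, dtype=bool)

    detected = counts > 0
    genes_per_cell = detected.sum(axis=0)
    umis_per_cell = counts.sum(axis=0)
    mito_umis = counts[is_mito].sum(axis=0)
    pct_mito = 100.0 * mito_umis / np.maximum(umis_per_cell, 1.0)

    # Cells: > min_genes and < max_genes detected genes, < max_pct_mito % mito UMIs.
    keep_cells = ((genes_per_cell > min_genes)
                  & (genes_per_cell < max_genes)
                  & (pct_mito < max_pct_mito))

    # Genes: detected in > min_cells cells, total UMIs < max_gene_umis
    # (assumption: the UMI threshold in the text is read as a per-gene total).
    cells_per_gene = detected.sum(axis=1)
    umis_per_gene = counts.sum(axis=1)
    keep_genes = (cells_per_gene > min_cells) & (umis_per_gene < max_gene_umis)

    filtered = counts[np.ix_(keep_genes, keep_cells)]

    # Log-normalization: log1p(count / cell total * scale factor).
    totals = np.maximum(filtered.sum(axis=0), 1.0)
    normalized = np.log1p(filtered / totals * scale_factor)
    return normalized, keep_genes, keep_cells
```

The defaults mirror the thresholds quoted in the text; on a small toy matrix they would need to be lowered, since no toy cell reaches 200 detected genes.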

single-cell ATAC sequencing (scATAC-seq):

Nuclei were obtained from the same brain region as used for single-cell whole-genome amplification. Nuclei derived from different individuals were processed for transposition separately, before loading onto the 10x Chromium Controller for GEM generation, barcoding, and library construction, as per manufacturer instructions. Libraries were submitted for paired-end dual index sequencing on one flow cell of Illumina S2 NovaSeq6000 (100 cycles) to obtain ~50,000 reads per nucleus. Sequencing data were demultiplexed using bcl2fastq and mkfastq. cellranger-atac count v1.1.0 was run separately on the resulting FASTQ files for each scATAC-seq library (one per individual) with default parameters and the vendor-provided hg19 reference. Results from the individual library analyses were then merged by cellranger-atac aggr --normalize=depth. scATAC-seq data were analyzed by Signac v1.1.0 and Seurat v3 following the authors’ instructions.

Sample Summary per Data Type

Sample Set	Accession	Data Type	Number of Samples
Oligodendrocytes single-cell whole genome and RNA sequencing	snd10084	scATAC-seq, scRNA-seq, WGS	123

Available Filesets

Name	Accession	Latest Release	Description
single-cell ATAC sequencing (scATAC-seq)	fsa000106	NG00162.v1	scATAC-seq
single-cell RNA sequencing (scRNA-seq)	fsa000107	NG00162.v1	scRNA-seq
single-cell whole genome sequencing (scWGS)	fsa000108	NG00162.v1	scWGS

View the File Manifest for a full list of files released in this dataset.

The first release includes bam and vcf files for whole-genome sequencing from 15 participants, fastq files for single-cell RNA sequencing from 2 participants, and bed and fastq files for single-cell ATAC sequencing from 9 participants. Samples were sequenced using Illumina NovaSeq6000.

Sample Set	Accession Number	Number of Subjects	Number of Samples
Oligodendrocytes single-cell whole genome and RNA sequencing	snd10084	15	123

Consent Level	Number of Subjects
HMB-IRB-PUB	15

Total number of approved DARs: 8

Investigator:
Belloy, Michael
Institution:
Washington University in St Louis
Project Title:
Elucidating sex-specific risk for Alzheimer's disease through state-of-the-art genetics and multi-omics
Date of Approval:
January 6, 2025
Request status:
Expired
Research use statements:
Technical Research Use Statement:
• Objectives: In this project, we seek to holistically investigate the genetic and molecular drivers of sex dimorphism in Alzheimer’s disease across ancestries.
• Study design: This study integrates large-scale population genetics with multi-omics and endophenotype analyses. We are integrating all data available from ADGC and ADSP, together with other data from AMP-AD and biobanks such as UKB, FinnGen, and MVP to conduct large-scale multi-ancestry GWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. We also particularly focus on X chromosome association studies. The study design also interrogates interactions with ancestry, hormone exposures, and with APOE*4, as well as comparisons to non-stratified GWAS/XWAS of Alzheimer’s disease. Further, we will also employ genetic correlation analyses, Mendelian randomization, colocalization, and pleiotropy analyses, to interrogate overlap with other complex traits to better understand the mechanisms underlying sex dimorphism in Alzheimer’s disease.
• Analysis plan, including the phenotypic characteristics that will be evaluated in association with genetic variants: Our phenotypes will include Alzheimer’s disease risk, conversion risk, various endophenotypes (including amyloid/tau biomarkers, brain imaging metrics, etc.) as well as molecular traits. As noted above, we will conduct large-scale multi-ancestry GWAS, XWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. Specific aims include interrogating these questions and analyses on (1) the autosomes, (2) the X chromosome, and (3) leveraging sex stratified QTL studies to drive discovery of risk genes.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) manifests itself differently across men and women, but the genetic and molecular factors that drive this remain elusive. AD is the most common cause of dementia and remains largely untreatable to this day. It is thus crucial to study the genetics of AD in a sex-specific manner, as this will help the field gain important insights into disease pathophysiology, identify novel sex-specific risk factors relevant to personalized genetic medicine, and uncover potential new AD drug targets that may benefit both sexes. This project uses large-scale genomics and multi-omics to elucidate novel sex-agnostic and sex-specific AD risk genes. We will interrogate sex dimorphism for AD risk on the autosomes and the sex chromosomes. We similarly interrogate sex dimorphism in the genetic regulation of gene expression and protein levels, which we will integrate with genetic risk for Alzheimer’s disease to further discover risk genes. Throughout, we will also interrogate how sex-specific risk for AD interacts with hormone exposures, ancestry, and the APOE*4 risk allele.
Investigator:
Cheng, Feixiong
Institution:
Cleveland Clinic
Project Title:
A Multimodal Infrastructure for Alzheimer’s MultiOme Data Repurposing: Artificial Intelligence, Network Medicine, and Therapeutics Discovery
Date of Approval:
September 4, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
We propose to develop capable and intelligent computer-based toolboxes that enable searching, sharing, visualizing, querying, and analyzing genetics, genomics, multi-omics, and clinical data for AD. The central unifying hypothesis of this project (1U01AG073323-01, pending Council meeting in May 2021) is that a genome-wide, multimodal artificial intelligence (AI) framework to identify novel risk genes and networks from human WGS/WES and multi-omics findings will offer drug targets for targeted therapeutic development in AD. Aim 1 will identify rare coding variant-based risk genes using a sequence and structure-based deep learning model. Aim 2 will identify rare non-coding variant-based risk genes using a multiple kernel learning approach. Aim 3 will test whether GWAS common variants linked to AD pathobiology and endophenotypes are enriched in gene regulatory networks in a cell-type specific manner using a Bayesian framework. These analyses will leverage variants from ethnically diverse WGS/WES and clinical data (i.e., imaging, biomarkers, and cognitive measures) from the Alzheimer's Disease Sequencing Project (ADSP), and publicly available chromatin interactomic data from NIH RoadMap, FANTOM5, and NIH 4D Nucleome. We will validate our findings using WGS/WES data and protein expression data from our existing cohorts: The Cleveland Clinic Lou Ruvo Center for Brain Health Aging and Neurodegenerative Disease Biobank (CBH-Biobank) and the Cleveland Alzheimer's Disease Research Center (CADRC). We will compile information for clinical data harmonization, including functional imaging, AD biomarkers, and cognitive measures for all integrative analyses. No PHI will be collected or used in the data analysis. We have no planned collaboration with researchers outside Cleveland Clinic in the current analytic plans.
Non-Technical Research Use Statement:
It is estimated that more than 16 million people with AD will live in the United States by 2050, and the predisposition to AD involves a complex, polygenic, and pleiotropic genetic architecture. This project will develop intelligent computer-based network medicine and systems biology tools, capable of identifying and validating human genome sequencing findings for novel risk gene discoveries and targeted therapeutic development in AD. The innovative network-based, artificial intelligence toolboxes and novel risk genes and biologically relevant targeted therapeutic approaches developed in this proposal will prove to be novel and effective ways to improve outcomes in long-term brain care for the rapidly growing AD population, an essential goal of AD precision medicine.
Investigator:
Cruchaga, Carlos
Institution:
Washington University School of Medicine
Project Title:
The Familial Alzheimer Sequencing (FASe) Project
Date of Approval:
January 21, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three first-degree affected individuals were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 yo and controls had a larger age at last assessment than the latest AAO within the family. Whole exome (WES) and whole genome sequencing (WGS) was generated for 1,235 individuals (285 families) that together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families) will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single variant analysis will consist of a combination of association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this we will identify families loaded with AD cases, that also include individuals with a high burden of known risk variants but that do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD).
We will perform single variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
Non-Technical Research Use Statement:
Family-based approaches led to the identification of disease-causing Alzheimer’s Disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we have identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effect sizes for risk for AD, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), that combined with the ADSP and the NIMH dataset will lead to a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect size will be found, which will lead to a better understanding of the biology of the disease.
Investigator:
Konermann, Silvana
Institution:
Arc Institute
Project Title:
Modeling Alzheimer’s disease risk and associated molecular phenotypes
Date of Approval:
August 8, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The objective of the proposed research is to determine the relationship between Alzheimer’s disease (AD) genetic risk and associated molecular phenotypes. Genotype data will be used to compute a polygenic risk score (PRS) for disease-affected and control (non-disease-affected) participants. Statistical regression and mediation analyses will be used to model variation of molecular phenotypes with respect to PRS and, where available, pathology stage or cognitive impairment. Molecular phenotypes to be analyzed include bulk/single-cell/single-nucleus transcriptome, epigenome, proteome, metabolome, lipidome, amyloid, and tau. Molecular phenotypes of participants, including controls, will be matched with molecular phenotypes of in vitro cellular models, informing the design of in vitro perturbation experiments that recapitulate the genetic drivers of AD risk.
Non-Technical Research Use Statement:
Our goal is to determine the relationship between human genetic profiles associated with Alzheimer’s disease (AD) risk and specific measurable characteristics of human cells. Using multiple statistical analysis methods, we will build quantitative models that describe how those characteristics vary as a function of AD genetic risk. The models we build will help us design in vitro cellular systems that reflect different levels of AD risk, enabling experiments that inform new strategies for treating or preventing AD.
Investigator:
Lodato, Michael
Institution:
University of Massachusetts Chan Medical School
Project Title:
Somatic mutation analysis during aging
Date of Approval:
January 7, 2025
Request status:
Expired
Research use statements:
Technical Research Use Statement:
Introduction: Our lab studies somatic mutation in the human brain. Somatic mutation is the process by which mutations occur in cells of the developing or postnatal body. Somatic mutations that occur in proliferative cells are inherited by all cells derived from that mutated founder. We refer to these variants as clonal somatic mutations. Somatic mutations occurring in post-mitotic cells are restricted to the cell in which they occurred. We refer to these variants as non-clonal somatic mutations. In the human brain, most clonal somatic mutations reflect embryonic development, making them useful for lineage tracing studies. Non-clonal somatic mutations in the brain accumulate during life in differentiated cells like neurons and oligodendrocytes. Studying the molecular features of these variants, for example the type of base change comprising a substitution, or the nucleotide context in which a mutation occurs, nominates mechanisms responsible for generating a given mutation. Study Design: We have performed single-cell whole-genome sequencing (scWGS) on several postmortem human donors from across the human lifespan to profile changes in the burden, molecular signature, and distribution of somatic mutations during life. Many of the donors we studied in our lab were also part of the cohort of the dataset we currently request from NIAGADS, published in Cell by Ganz et al. We aim to integrate the data from Ganz et al. with our own to increase the statistical power of our study. Analysis Plan: Data will be mapped to the human genome using the BWA algorithm, and mutations will be identified using the SCAN2 pipeline and other pipelines as needed to broaden our study. Mutation burden, distribution, and signatures will be compared across cell types, during aging, and across collection sites (Boston Children's Hospital for these data, UMass Chan Medical School for our data).
The only phenotypic data we would share would be age, sex, cause of death, and QC data (RIN, etc.). Planned Collaboration: We may share these data with Dr. Zhiping Weng, also at UMass Chan Medical School. We often collaborate with her computational biology group to analyze genomic data.
Non-Technical Research Use Statement:
DNA damage has long been thought to contribute to human aging. Our lab uses cutting-edge techniques to study specific types of DNA damage, called somatic mutations, in the human brain at high resolution. We will add the data in this collection to our own to increase the power of our study to learn new things about human brain aging, possibly leading to new anti-aging interventions.
Investigator:
Seshadri, Sudha
Institution:
Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center, San Antonio, TX
Project Title:
Therapeutic target discovery in ADSP data via comprehensive whole-genome analysis incorporating ethnic diversity and systems approaches
Date of Approval:
August 12, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objective: Utilize ADSP data sets to identify genes & specific genetic variants that confer risk for or protection from Alzheimer disease. Aim 1: Using combined WGS/WES across the ADSP Discovery, Disc-Ext, and FUS Phases, including single nucleotide variants, small insertion/deletions, and structural variants. We will: Aim 1a. Perform whole genome single variant and rare variant case/control association analyses of AD using ADSP and other available data; Aim 1b. Target protective variant identification via association analysis using selected controls within the ADSP data and performing meta analysis across association results based on selected controls from non-ADSP data sets. Aim 1c. Perform endophenotype analyses including cognitive function measures, hippocampal volume and circulation beta-amyloid ADSP data in subjects for which these measures are available. Meta analysis will be conducted across ADSP and non-ADSP analysis results. Aim 2: To leverage ethnically-diverse and admixed populations to identify AD variants we will: Aim 2a. Estimate and account for global and local ancestry in all analyses; Aim 2b. Perform admixture mapping in samples of admixed ancestry; and Aim 2c. Perform ethnicity-specific and trans-ethnic meta-analyses. Aim 3: To identify putative therapeutic targets through functional characterization of genes and networks via bioinformatics, integrative ‘omics analyses. We will: Aim 3a. Annotate variants with their functional consequences using bioinformatic tools and publicly available “omics” data. Aim 3b. Prioritize results, group variants with shared function, and identify key genes functionally related to AD via weighted association analyses and network approaches. Analyses will be performed in coordination with the following PIs. Coordination will involve sharing expertise, analysis plans or analysis results. No individual level data will be shared across institutions. 
Philip De Jager, Columbia University; Eric Boerwinkle & Myriam Fornage, U of Texas Health Science Center, Houston; Sudha Seshadri, U of Texas, San Antonio; Ellen Wijsman, U of Washington; William Salerno, Baylor College of Medicine.
Non-Technical Research Use Statement:
This proposal seeks to analyze existing genetic sequencing data generated as part of the Alzheimer’s Disease Sequencing Project (ADSP) including the ADSP Follow-up Study (FUS) with the goal of identifying genes and specific changes within those genes that either confer risk for Alzheimer’s Disease or provide protection from Alzheimer’s Disease. Analytic challenges include analysis of whole genome sequencing data, appropriately accounting for population structure across European ancestry, Hispanic, and African American participants, and interpreting results in the context of other genomic data available.
Investigator:
Wingo, Thomas
Institution:
University of California Davis
Project Title:
Identifying Alzheimer's Disease Genetic Risk Factors By Integrated Genomic and Proteomic Analysis
Date of Approval:
January 21, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
We aim to uncover new genetic risk variants for Alzheimer’s disease (AD), AD-related dementia (ADRD), and behavioral and psychiatric symptoms (BPS) associated with AD/ADRD. We expect to use whole-genome sequencing (WGS), whole-genome genotyping (WGG), and whole-exome sequencing (WES) data. Additionally, we will use the results of brain proteomic analysis to nominate genes and pathways for AD, ADRD, and dementia BPS. We plan to publish our findings to share them with the scientific community. Outcomes that will be tested include: (1) clinical disease status, (2) pathologic characterization (e.g., measures of beta-amyloid, tau, etc.), (3) cognitive decline, (4) BPSD, and (5) outcomes related to AD/ADRD severity. For sequencing data, we will extract raw sequencing reads from CRAM/BAM (or equivalent encrypted files) and re-map those to the hg38 build of the human genome using PEMapper. Base calling will be performed using PECaller with default settings. Variant annotation will use Bystro and quality control will follow approaches to assess completeness and account for ancestry as is customary in our lab. For rare variants, we will use a variety of kernel-based approaches and for common variants, use standard statistical modeling. For all analyses, we plan to control for population structure by deriving principal components from the underlying sequencing or genotyping data.
Non-Technical Research Use Statement:
Our aim is to identify genetic variants that are associated with Alzheimer's Disease (AD) to uncover new genetic associations. We will examine the role of important risk factors for AD (e.g., age and sex) in our analyses. Separately, we will perform integration of genetic findings for AD with information about how genetic variants influence or are associated with gene expression in the brain, cerebrospinal fluid, or blood to uncover new pathways of disease. Our overarching aim is to use genetic discoveries to identify mechanisms of AD pathogenesis to help nominate new treatment targets.
Investigator:
Zhao, Zhongming
Institution:
University of Texas Health Science Center at Houston
Project Title:
AIM-AI: an Actionable, Integrated and Multiscale genetic map of Alzheimer's disease via deep learning
Date of Approval:
March 27, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objectives: The objective of our study is to advance our understanding of the genetic basis of Alzheimer’s Disease (AD) through the analysis of comprehensive genomic datasets such as Whole Exome Sequencing (WES), Whole Genome Sequencing (WGS), single-nuclei RNA sequencing, and Genome-Wide Association Studies (GWAS), as well as the related phenotype. We aim to identify genetic variants that are integral to the development and progression of AD. Study Design: Our approach involves a detailed multi-omics analysis focusing on both coding and non-coding regions within these datasets. We will develop new analytical variables from existing data, ensuring that our research adheres to the established data use limitations and contributes meaningfully to the field of genetic research in AD. Analysis Plan: The plan centers on investigating the correlation between genetic variants and AD, exploring how these variants influence the disease at a genetic level. We will employ cutting-edge computational methods to analyze interactions between these genetic markers and their potential role in AD pathogenesis. The integration of data from multiple sources will be carefully executed to maintain compliance with data use agreements, emphasizing the scientific exploration of AD.
Non-Technical Research Use Statement:
Our research is dedicated to unraveling the genetic components of Alzheimer’s Disease. By analyzing genetic sequences and variations through various genomic datasets, we seek to deepen the scientific understanding of how these genetic elements contribute to AD. The outcomes of this study will be shared with the public, enhancing general knowledge of Alzheimer’s Disease and supporting the global research community in its ongoing efforts to decode this complex condition.

Total number of samples: 15