Overview
To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00162) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.
Description
Characterizing the mechanisms of somatic mutations in the brain is important for understanding aging and disease, but little is known about the mutational patterns of different cell types. We performed whole-genome sequencing of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals (0.4 to 104 years old) and compared the rates and signatures of somatic single nucleotide variants (sSNVs) and small insertions and deletions (indels) from each cell type. We further correlated this data with single-cell RNA (scRNA-seq) and chromatin accessibility (scATAC-seq) data generated from the same brains to compare the mutagenic processes in glia and neurons.
single-cell whole genome sequencing (scWGS):
Fluorescence-activated nuclear sorting (FANS) was used to isolate SOX10 cells from fresh frozen human brain tissue from the prefrontal cortex. Whole-genome amplification was performed using MDA or PTA following manufacturer guidelines. Libraries for sequencing were generated using the KAPA HyperPlus kit (Roche) with dual indexes and were sequenced across 5 lanes of Illumina NovaSeq6000 (2x150 bp), targeting 20x coverage (75 Gbp) per sample. SCAN2 was used to identify single-cell somatic mutations.
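As a rough illustration of the sequencing target above, the sketch below converts the stated yield per sample into read pairs and raw depth. The read length and yield come from the text; the genome size of 3.1 Gbp is an assumption, and the source does not say how the 20x target and the 75 Gbp figure were reconciled (for example, through duplicate or unmapped reads).

```python
# Back-of-the-envelope sequencing yield for one scWGS library.
READ_LENGTH_BP = 150      # from the text: 2x150 bp paired-end reads
TARGET_YIELD_BP = 75e9    # from the text: 75 Gbp per sample
GENOME_SIZE_BP = 3.1e9    # assumption: approximate haploid human genome size


def read_pairs_needed(yield_bp: float, read_length_bp: int) -> float:
    """Number of read pairs that together produce `yield_bp` bases."""
    return yield_bp / (2 * read_length_bp)


def raw_depth(yield_bp: float, genome_size_bp: float) -> float:
    """Mean raw depth, before duplicate removal and mapping losses."""
    return yield_bp / genome_size_bp


pairs = read_pairs_needed(TARGET_YIELD_BP, READ_LENGTH_BP)
depth = raw_depth(TARGET_YIELD_BP, GENOME_SIZE_BP)
print(f"read pairs per sample: {pairs:.3g}")
print(f"raw depth: {depth:.1f}x")
```

Under these assumptions the raw depth comes out somewhat above the 20x target, which leaves headroom for reads lost during alignment and filtering.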
single-cell RNA sequencing (scRNA-seq):
Sequencing libraries were prepared using the 10X Genomics Chromium Next GEM Single Cell Reagent Kit v3.1 with nuclear pellets from fresh frozen human brain tissue from the prefrontal cortex of 2 individuals. Each library preparation was submitted for paired-end, single-index sequencing on Illumina HiSeqX or NovaSeq6000, targeting ~50,000 read pairs per nucleus. The data were demultiplexed using bcl2fastq. scRNA-seq FASTQ files were processed using the 10X Genomics cellranger count pipeline for gene expression to perform alignment to hg19, barcode counting, UMI counting, and generation of feature-barcode matrices. Cell Ranger filtered count matrices were used for downstream analysis with Seurat 3.0. Each library was further filtered to keep cells with more than 200 and fewer than 3,000 detected genes and less than 5% mitochondrial genes, and genes with fewer than 10,000 UMI counts that were detected in more than 3 cells. RNA counts were normalized using the LogNormalize method, and the 2,000 most highly variable features were identified using the vst method. Data were scaled by regressing out the percentage of mitochondrial genes. Non-linear dimensional reduction and clustering were then performed. DoubletFinder was used to remove doublets using optimal parameters as per the paramSweep function. Finally, cell-type identities were assigned to each cluster in the Uniform Manifold Approximation and Projection (UMAP) based on expression of known brain cell-type markers.
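The per-cell filters and the normalization step described above can be sketched in plain Python. This is an illustration only, not the study's Seurat code: the thresholds come from the text, while the scale factor of 10,000 (Seurat's default for LogNormalize), the reading of "mitochondrial genes" as the percentage of counts from genes prefixed `MT-`, and the toy cells are assumptions.

```python
import math

# Thresholds taken from the text.
MIN_GENES, MAX_GENES = 200, 3000  # keep cells with > 200 and < 3000 detected genes
MAX_MITO_PCT = 5.0                # keep cells with < 5% mitochondrial counts
SCALE_FACTOR = 10_000             # assumption: Seurat's default LogNormalize scale factor
MITO_PREFIX = "MT-"               # assumption: human mitochondrial gene symbol prefix


def cell_passes_qc(counts):
    """Apply the gene-count and mitochondrial filters to one cell's UMI counts."""
    total = sum(counts.values())
    if total == 0:
        return False
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(MITO_PREFIX))
    mito_pct = 100.0 * mito / total
    return MIN_GENES < n_genes < MAX_GENES and mito_pct < MAX_MITO_PCT


def log_normalize(counts):
    """LogNormalize: log1p of counts scaled to SCALE_FACTOR per cell."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * SCALE_FACTOR) for gene, c in counts.items()}


# Hypothetical toy cells: one passes, one has too few detected genes.
good_cell = {f"GENE{i}": 1 for i in range(250)}
good_cell["MT-CO1"] = 5
sparse_cell = {f"GENE{i}": 1 for i in range(100)}

kept = [cell for cell in (good_cell, sparse_cell) if cell_passes_qc(cell)]
normalized = [log_normalize(cell) for cell in kept]
print(f"cells kept: {len(kept)} of 2")
```

In a real analysis these steps would run on the sparse count matrix inside Seurat; the dictionary form here only makes the thresholds and the formula explicit.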
single-cell ATAC sequencing (scATAC-seq):
Nuclei were obtained from the same brain region as used for single-cell whole-genome amplification. Nuclei derived from different individuals were processed for transposition separately before loading onto the 10x Chromium Controller for GEM generation, barcoding, and library construction, as per manufacturer instructions. Libraries were submitted for paired-end dual-index sequencing on one flow cell of Illumina S2 NovaSeq6000 (100 cycles) to obtain ~50,000 reads per nucleus. Sequencing data were demultiplexed using bcl2fastq and mkfastq. cellranger-atac count v1.1.0 was run separately on the resulting FASTQ files for each scATAC-seq library (one per individual) with default parameters and the vendor-provided hg19 reference. Results from the individual library analyses were then merged by cellranger-atac aggr --normalize=depth. scATAC-seq data were analyzed with Signac v1.1.0 and Seurat v3 following the authors’ instructions.
Sample Summary per Data Type
Sample Set | Accession | Data Type | Number of Samples |
---|---|---|---|
Oligodendrocytes single-cell whole genome and RNA sequencing | snd10084 | scATAC-seq, scRNA-seq, WGS | 123 |
Available Filesets
Name | Accession | Latest Release | Description |
---|---|---|---|
single-cell ATAC sequencing (scATAC-seq) | fsa000106 | NG00162.v1 | scATAC-seq |
single-cell RNA sequencing (scRNA-seq) | fsa000107 | NG00162.v1 | scRNA-seq |
single-cell whole genome sequencing (scWGS) | fsa000108 | NG00162.v1 | scWGS |
View the File Manifest for a full list of files released in this dataset.
Sample information
The first release includes BAM and VCF files for whole-genome sequencing from 15 participants, FASTQ files for single-cell RNA sequencing from 2 participants, and BED and FASTQ files for single-cell ATAC sequencing from 9 participants. Samples were sequenced using Illumina NovaSeq6000.
Sample Set | Accession Number | Number of Subjects | Number of Samples |
---|---|---|---|
Oligodendrocytes single-cell whole genome and RNA sequencing | snd10084 | 15 | 123 |
Related Studies
Consent Levels
Consent Level | Number of Subjects |
---|---|
HMB-IRB-PUB | 15 |
Visit the Data Use Limitations page for definitions of the consent levels above.
Approved Users
- Investigator: Cheng, Feixiong
  Institution: Cleveland Clinic
  Project Title: A Multimodal Infrastructure for Alzheimer’s MultiOme Data Repurposing: Artificial Intelligence, Network Medicine, and Therapeutics Discovery
  Date of Approval: August 14, 2024
  Request status: Approved
  Technical Research Use Statement: We propose to develop capable and intelligent computer-based toolboxes that enable searching, sharing, visualizing, querying, and analyzing genetics, genomics, multi-omics, and clinical data for AD. The central unifying hypothesis of this U01 project (U01AG073323) is that a genome-wide, multimodal artificial intelligence (AI) framework to identify novel risk genes and networks from human WGS/WES and multi-omics findings will offer drug targets for targeted therapeutic development in AD. Aim 1 will identify rare coding variant-based risk genes using a sequence and structure-based deep learning model. Aim 2 will identify rare non-coding variant-based risk genes using a multiple kernel learning approach. Aim 3 will test whether GWAS common variants linked to AD pathobiology and endophenotypes are enriched in gene regulatory networks in a cell-type specific manner using a Bayesian framework. These analyses will leverage variants from ethnically diverse WGS/WES and clinical data (i.e., imaging, biomarkers, and cognitive measures) from Alzheimer's Disease Sequencing Project (ADSP), and publicly available chromatin interactomic data from NIH RoadMap, FANTOM5, and NIH 4D Nucleome. We will validate our findings using WGS/WES data and protein expression data from our existing cohorts: The Cleveland Clinic Lou Ruvo Center for Brain Health Aging and Neurodegenerative Disease Biobank (CBH-Biobank) and the Cleveland Alzheimer's Disease Research Center (CADRC). We will compile information for clinical data harmonization, including functional imaging, AD biomarkers, and cognitive measures for all integrative analyses. No PHI will be collected or used in the data analysis. The current analytic plans do not include collaboration with researchers outside Cleveland Clinic.
  Non-Technical Research Use Statement: It is estimated that more than 16 million people with AD will live in the United States by 2050, and the predisposition to AD involves a complex, polygenic, and pleiotropic genetic architecture. This project will develop intelligent computer-based network medicine and systems biology tools, capable of identifying and validating human genome sequencing findings for novel risk gene discoveries and targeted therapeutic development in AD. The innovative network-based, artificial intelligence toolboxes and novel risk genes and biologically relevant targeted therapeutic approaches developed in this proposal will prove to be novel and effective ways to improve outcomes in long-term brain care for the rapidly growing AD population, an essential goal of AD precision medicine.
Acknowledgement
Acknowledgment statement for any data distributed by NIAGADS:
Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.
Use the study-specific acknowledgement statements below (as applicable):
For investigators using any data from this dataset:
Please cite/reference the use of NIAGADS data by including the accession NG00162.
For investigators using Rates and mechanisms of age-related somatic mutation in normal and Alzheimer brain (sa000051) data:
Sequencing data from this study was generated with support from the National Institute on Aging (R01AG070921) to Christopher A. Walsh at Boston Children’s Hospital.