To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00105) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.
Human post-mortem brain samples were obtained from the Netherlands Brain Bank (NBB) and the Neuropathology Brain Bank and Research CoRE at Mount Sinai Hospital. Permission to collect human brain material was obtained from the Ethical Committee of the VU University Medical Center, Amsterdam, The Netherlands, and the Mount Sinai Institutional Review Board. For the Netherlands Brain Bank, informed consent for autopsy and for the use of brain tissue and accompanying clinical information for research purposes was obtained from each donor ante-mortem.
Samples were genotyped using the Illumina Infinium Global Screening Array (GSA). Genotype imputation was performed for the 90 donors with genotype data through the Michigan Imputation Server v1.4.1 (Minimac4), using the 1000 Genomes Phase 3 v5 (GRCh37) European reference panel and Eagle v2.4 phasing, in quality control and imputation mode with the rsq filter set to 0.3. After imputation, variants were lifted over to the GRCh38 reference to match the RNA-seq data using Picard LiftoverVcf and the “b37ToHg38.over.chain.gz” chain file.
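To illustrate what the rsq filter does, the sketch below keeps only imputed variants whose estimated imputation quality meets the 0.3 threshold. It assumes Minimac4-style VCF output, where imputation quality is reported as an `R2=` key in the INFO column; the function names and the toy input are hypothetical. In practice the Michigan Imputation Server applies this filter itself, so this is only a sketch of the logic, not part of the published workflow.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'AF=0.1;R2=0.85;IMPUTED' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        else:
            fields[item] = True  # flag entries such as IMPUTED or TYPED
    return fields


def filter_by_rsq(vcf_lines, min_rsq=0.3):
    """Yield header lines unchanged and variant lines whose R2 >= min_rsq.

    Assumption: variants without an R2 value are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        r2 = parse_info(columns[7]).get("R2")  # INFO is the 8th VCF column
        if isinstance(r2, str) and float(r2) >= min_rsq:
            yield line


# Toy input (hypothetical variant IDs), not data from this dataset.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10177\trsA\tA\tAC\t.\tPASS\tAF=0.4;R2=0.91;IMPUTED",
    "1\t10352\trsB\tT\tTA\t.\tPASS\tAF=0.1;R2=0.12;IMPUTED",
]
kept = list(filter_by_rsq(example))
```

Here `rsA` passes (R2 = 0.91) and `rsB` is removed (R2 = 0.12); header lines are always kept.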
RNA extraction and sequencing
RNA was isolated using the RNeasy Mini Kit (Qiagen) with the optional DNase I step, or as described previously (Melief J, et al., 2016). Library preparation was performed at Genewiz using the Ultra-low input system, which uses poly(A) selection; libraries were constructed from 100 ng of RNA with the SMART-Seq v4 Ultra Low Input RNA Kit. The libraries were sequenced as 150 bp paired-end reads on the Illumina HiSeq 2500, with an average depth of 29 million read pairs (range 14-82 million).
RNA-seq data processing
RNA-seq data were processed using the RAPiD pipeline (Wang YC, et al., 2015). RAPiD aligns reads to the hg38 genome build with STAR (Dobin A, et al., 2013), using the GENCODE v30 transcriptome reference, and calculates quality control metrics with Picard. RNA-seq quality control applied four filters to remove samples: 1) samples with fewer than 10 million reads aligned by STAR; 2) samples with more than 20% of reads aligned to ribosomal regions; 3) samples with fewer than 10% of reads mapping to coding regions; 4) samples from brain regions with fewer than 20 donors. Transcript abundance was estimated with RSEM (Li B and Dewey CN, 2011), and transcript estimates were summed to the gene level with tximport (Love MI, et al., 2017). Genes with more than 1 count per million (CPM) in at least 30% of samples were kept for downstream analysis. Gene-level read counts were normalized as transcripts per million (TPM) to adjust for differences in sequencing library size.
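The four sample filters and the CPM gene filter above can be sketched as follows. This is a minimal illustration, not the RAPiD implementation: the `SampleMetrics` fields and function names are hypothetical, library sizes are taken as column sums of the supplied count matrix, and the region filter (filter 4) is assumed to be applied after filters 1-3, since the text does not state the order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SampleMetrics:
    sample_id: str
    donor_id: str
    region: str
    aligned_reads: int    # reads aligned by STAR
    pct_ribosomal: float  # % of reads aligned to ribosomal regions
    pct_coding: float     # % of reads mapping to coding regions


def passes_sample_filters(s: SampleMetrics) -> bool:
    """Filters 1-3: keep samples with >= 10M aligned reads,
    <= 20% ribosomal reads and >= 10% coding reads."""
    return (s.aligned_reads >= 10_000_000
            and s.pct_ribosomal <= 20.0
            and s.pct_coding >= 10.0)


def qc_samples(samples, min_donors_per_region=20):
    """Apply filters 1-3, then filter 4 (drop regions with < 20 donors)."""
    passing = [s for s in samples if passes_sample_filters(s)]
    donors = defaultdict(set)
    for s in passing:
        donors[s.region].add(s.donor_id)
    return [s for s in passing if len(donors[s.region]) >= min_donors_per_region]


def cpm_filter(counts, min_cpm=1.0, min_fraction=0.3):
    """Keep genes with CPM > min_cpm in at least min_fraction of samples.

    counts maps gene -> list of raw counts, one per sample (same order).
    """
    rows = list(counts.values())
    n_samples = len(rows[0])
    lib_sizes = [sum(row[j] for row in rows) for j in range(n_samples)]
    kept = []
    for gene, row in counts.items():
        n_pass = sum(1 for c, lib in zip(row, lib_sizes)
                     if lib > 0 and c / lib * 1e6 > min_cpm)
        if n_pass >= min_fraction * n_samples:
            kept.append(gene)
    return kept
```

Because filter 4 counts donors only among samples that survived filters 1-3, a region can drop below 20 donors after per-sample QC; applying the filters in the other order could give a different result.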
Quantitative Trait Loci mapping
To perform expression QTL (eQTL) mapping, we followed the latest pipeline from the GTEx consortium (Aguet et al. 2019), using normalization and filtering steps separate from those described for the earlier analyses. Gene expression matrices were created from the RSEM output using tximport (Love, Soneson, and Robinson 2017). The matrices were then converted to GCT format, TMM normalized, and filtered for lowly expressed genes, keeping only genes with at least 0.1 TPM in at least 20% of samples and at least 6 reads in at least 20% of samples. Each gene was then inverse-normal transformed across samples. After filtering, we tested a total of 18,430 genes. PEER factors (Stegle et al. 2012) were then calculated to estimate hidden confounders in the expression data. We created a combined covariate matrix that included the PEER factors and the first four ancestry multidimensional scaling (MDS) components from the genotype data as input to the analysis. We tested 0 to 20 PEER factors and found that between 5 and 10 factors produced the largest number of eGenes in each region.
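The expression filter and per-gene inverse-normal transform can be sketched as below, in the style of the GTEx expression preparation step: a gene passes only if both the TPM and the read-count thresholds are met, and each gene's values are mapped through their ranks to standard normal quantiles, Φ⁻¹(r / (n + 1)). TMM normalization and PEER estimation are omitted, and the function names are hypothetical; this is an illustration, not the pipeline's code.

```python
from statistics import NormalDist


def passes_expression_filter(tpm_row, count_row, min_tpm=0.1, min_count=6,
                             min_fraction=0.2):
    """Keep a gene with >= min_tpm TPM in >= 20% of samples AND
    >= min_count reads in >= 20% of samples."""
    n = len(tpm_row)
    tpm_ok = sum(t >= min_tpm for t in tpm_row) >= min_fraction * n
    count_ok = sum(c >= min_count for c in count_row) >= min_fraction * n
    return tpm_ok and count_ok


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def inverse_normal_transform(values):
    """Rank-based inverse normal transform of one gene across samples:
    the value with rank r out of n maps to Phi^-1(r / (n + 1))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]
```

Dividing by n + 1 rather than n keeps every quantile strictly inside (0, 1), so the largest value never maps to infinity.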
To test for cis-eQTLs, linear regression was performed with tensorQTL (Taylor-Weiner et al. 2019) in cis_nominal mode for each SNP-gene pair, using a 1 megabase window around the transcription start site (TSS) of each gene. To test for association between gene expression and the top cis variant, we ran the tensorQTL cis permutation pass per gene with 1,000 permutations. To identify eGenes, we applied q-value correction (Storey 2003) to the permutation P-values of the top association per gene, at a threshold of 0.05.
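Two pieces of the eGene calling above can be made concrete: selecting variants inside the ±1 Mb cis window, and converting per-gene P-values to q-values. The sketch below is a simplification under stated assumptions: Storey's method estimates the null proportion π0 from the P-value distribution (as the qvalue package does), whereas here π0 is a fixed parameter, and with the default π0 = 1 the result equals Benjamini-Hochberg adjusted P-values. Function names are hypothetical.

```python
def in_cis_window(variant_pos, tss, window=1_000_000):
    """True if the variant lies within `window` bp of the gene's TSS."""
    return abs(variant_pos - tss) <= window


def qvalues(pvalues, pi0=1.0):
    """q-values from per-gene top-association P-values.

    q_(i) = min over j >= i of pi0 * m * p_(j) / j, capped at 1, where p_(j)
    is the j-th smallest P-value. With pi0 = 1 this is Benjamini-Hochberg.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q
```

For example, `qvalues([0.01, 0.04, 0.03, 0.2])` gives about `[0.04, 0.053, 0.053, 0.2]`, so at a 0.05 threshold only the first gene would be called an eGene.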
We performed splicing quantitative trait loci (sQTL) analysis using splice junction read counts generated by regtools (Feng et al. 2018). Junctions were clustered using Leafcutter (Li et al. 2018) with a maximum intron length of 100 kb. Following the GTEx pipeline, introns without read counts in at least 50% of samples, or with fewer than 10 read counts in at least 10% of samples, were removed, as were introns with insufficient variability across samples. Filtered counts were then quantile normalized using prepare_phenotype_table.py from Leafcutter, merged, and converted to BED format using the coordinates of the middle of each intron cluster. As for eQTLs, we created a combined covariate matrix that included the PEER factors and the first four ancestry MDS components. We mapped sQTLs with 0 to 20 PEER factors as covariates and determined 5 to be optimal in the medial frontal gyrus (MFG), superior temporal gyrus (STG), and thalamus (THA); no PEER factors were used for the subventricular zone (SVZ).
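Leafcutter quantifies splicing as intron usage ratios: each junction's read count divided by the total reads in its cluster for that sample. The sketch below computes those ratios and a cluster midpoint for the BED coordinate. It is an illustration only: it omits the filtering and quantile normalization that prepare_phenotype_table.py performs, the midpoint convention (center of the span covered by the cluster's introns) is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def intron_ratios(junction_counts, cluster_of):
    """Per-sample intron usage ratios: junction count / total count of its cluster.

    junction_counts maps junction_id -> list of read counts (one per sample);
    cluster_of maps junction_id -> Leafcutter cluster_id.
    Samples where the whole cluster has zero reads get NaN.
    """
    n_samples = len(next(iter(junction_counts.values())))
    cluster_totals = defaultdict(lambda: [0] * n_samples)
    for jid, counts in junction_counts.items():
        totals = cluster_totals[cluster_of[jid]]
        for s, c in enumerate(counts):
            totals[s] += c
    ratios = {}
    for jid, counts in junction_counts.items():
        totals = cluster_totals[cluster_of[jid]]
        ratios[jid] = [c / t if t > 0 else float("nan")
                       for c, t in zip(counts, totals)]
    return ratios


def cluster_midpoint(intron_coords):
    """Midpoint of the span covered by a cluster's introns, given (start, end) pairs."""
    start = min(s for s, _ in intron_coords)
    end = max(e for _, e in intron_coords)
    return (start + end) // 2
```

Ratios within a cluster sum to 1 in every sample with reads, which is why junctions are tested individually but grouped by cluster for the permutation pass.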
To test for cis-sQTLs, linear regression was performed using the tensorQTL nominal pass for each SNP-junction pair, using a 100 kb window around the center of each intron cluster. Although junctions were grouped into clusters, each SNP-junction pair was tested separately, which is the standard approach (Li et al. 2018; Aguet et al. 2019). To test for association between intron excision ratio and the top cis variant, we used the tensorQTL permutation pass, grouping junctions by cluster with the --grp option. To identify significant clusters, we applied q-value correction at a threshold of 0.05.
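When junctions are grouped by cluster, the permutation pass reports one result per cluster, based on the strongest variant-junction pair across all of the cluster's junctions. The sketch below shows only that selection step applied to nominal results, restricted to the 100 kb window around the cluster center; it does not perform permutations. The input layout and function name are hypothetical.

```python
def top_pair_per_cluster(nominal_results, cluster_of, cluster_center,
                         window=100_000):
    """Pick the strongest variant-junction pair per Leafcutter cluster.

    nominal_results: iterable of (variant_id, variant_pos, junction_id, pvalue)
    cluster_of: junction_id -> cluster_id
    cluster_center: cluster_id -> center coordinate of the cluster
    Only variants within `window` bp of the cluster center are considered.
    Returns cluster_id -> (variant_id, junction_id, pvalue).
    """
    best = {}
    for variant_id, variant_pos, junction_id, p in nominal_results:
        cluster = cluster_of[junction_id]
        if abs(variant_pos - cluster_center[cluster]) > window:
            continue  # outside the cis window for this cluster
        if cluster not in best or p < best[cluster][2]:
            best[cluster] = (variant_id, junction_id, p)
    return best
```

The q-value step would then be applied to one permutation P-value per cluster rather than per junction.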
Sample Summary per Data Type
|MiGA – Microglia Genomic Atlas – GWAS Data||fsa000008||NG00105.v1||1000Genomes Imputed GWAS|
|MiGA – Microglia Genomic Atlas – QTL Summary Statistics||fsa000009||NG00105.v1||QTL Summary Statistics|
|MiGA – Microglia Genomic Atlas – RNASeq Data||fsa000010||NG00105.v1||RNASeq BAM files|
View the File Manifest for a full list of files released in this dataset.
The Microglia Genomic Atlas (MiGA) is a genetic and transcriptomic resource consisting of 255 primary human microglia samples isolated ex vivo from four brain regions of 100 human subjects with neurodegenerative, neurological, or neuropsychiatric disorders, as well as unaffected controls. We performed systematic analyses to investigate sources of microglial heterogeneity, including brain region, age, and sex. We then performed expression and splicing QTL analyses in each region, followed by a meta-analysis across the four regions to increase discovery power. Finally, we performed colocalization and used fine-mapping and microglia-specific epigenomic data to prioritize genes and variants that influence neurological disease susceptibility through gene expression and splicing in microglia. With this approach, we have built the most comprehensive resource to date of cis genetic effects on the microglial transcriptome and propose underlying molecular mechanisms for potentially causal functional variants in several brain disorders.
|Sample Set||Accession||Number of Subjects|
|MiGA – Microglia Genomic Atlas||snd10022||n = 108|
|Consent Level||Number of Subjects|
|GRU-IRB-PUB||n = 108|
Visit the Data Use Limitations page for definitions of the consent levels above.
- Investigator: Black, Mary Helen
  Institution: JOHNSON/JOHNSON/PHARM/RES/ DEVELOPMENT
  Project Title: Target identification and validation in Alzheimer’s Disease with Whole-Genome and Whole-Exome Sequence Data
  Date of Approval: April 18, 2022
  Request status: Approved
  Technical Research Use Statement: Alzheimer’s disease (AD) is a common, progressive, neurodegenerative disorder with a strong genetic component, with heritability estimates ranging from 58–79% for late-onset AD and over 90% for early-onset AD. Genetic association studies are important to highlight key biological mechanisms contributing to the etiology of AD and provide key insights into potential pathways that can ultimately be targeted for future therapeutic development. The objective of this study is to perform a retrospective analysis of genetic data collected from large-scale population-based and case-control cohorts, including the UK Biobank, the Alzheimer’s Disease Sequencing Project (ADSP), and FinnGen, and to integrate them with publicly available multi-omics datasets including, but not limited to, Genotype-Tissue Expression (GTEx), Microglia Genomic Atlas (MiGA), and neuroimaging data to identify novel and existing evidence for genetic determinants of AD. No attempt will be made to identify subjects. Aim 1: Identify novel and replicate existing gene associations for AD. We will perform case-control and family-based genetic analyses with AD diagnosis as the outcome of interest. Covariates include age, sex, and principal components. ADSP, UKB, and FinnGen will be analyzed separately and combined with meta-analysis. Biobank cases will be defined using ICD-9/ICD-10 codes, and proxy cases and controls will be carefully defined using questionnaire data on parental history of AD. Both true and proxy cases will be considered to maximize the number of AD cases. Aim 2: Prioritize novel gene associations identified in Aim 1. We will perform genetic fine-mapping and leverage tissue- and cell-type-specific datasets (e.g., GTEx and MiGA) to prioritize targets for further functional and analytical interrogation. Statistical methods used for target prioritization include colocalization, statistical fine-mapping, and Mendelian randomization. Furthermore, multi-omics-based network approaches will be used to identify disease-related molecular modules and tissue-specific regulatory circuits.
  Non-Technical Research Use Statement: Alzheimer’s disease (AD) is a common, progressive, neurodegenerative disorder with a strong genetic component, with heritability estimates ranging from 58–79% for late-onset AD and over 90% for early-onset AD. To date, there is only one treatment option intended to modify the progression of AD, while all others treat symptoms associated with AD. Genetic association studies are important to highlight key biological mechanisms contributing to the etiology of AD and provide key insights into potential pathways that can ultimately be targeted for future therapeutic development. The objective of this study is to perform a retrospective analysis of genetic data collected from large-scale population-based and case-control cohorts, including the UK Biobank, the Alzheimer’s Disease Sequencing Project (ADSP), and FinnGen, and to integrate them with publicly available multi-omics datasets including, but not limited to, Genotype-Tissue Expression (GTEx), Microglia Genomic Atlas (MiGA), and neuroimaging data to identify novel and existing evidence for genetic determinants of AD.
- Investigator: Cruchaga, Carlos
  Institution: Washington University School of Medicine
  Project Title: The Familial Alzheimer Sequencing (FASe) Project
  Date of Approval: March 2, 2022
  Request status: Approved
  Technical Research Use Statement: The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three affected first-degree relatives were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease, were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 years, and controls had an age at last assessment greater than the latest AAO within the family. Whole-exome (WES) and whole-genome sequencing (WGS) data were generated for 1,235 individuals (285 families); together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families), these data will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single-variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single-variant analysis will combine association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this, we will identify families loaded with AD cases that also include individuals with a high burden of known risk variants who do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD). We will perform single-variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
  Non-Technical Research Use Statement: Family-based approaches led to the identification of disease-causing Alzheimer’s disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effects on AD risk, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), which, combined with the ADSP and the NIMH dataset, will yield a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect sizes will be found, which will lead to a better understanding of the biology of the disease.
- Investigator: Hohman, Timothy
  Institution: Vanderbilt University Medical Center
  Project Title: Genetic Drivers of Resilience to Alzheimer's Disease
  Date of Approval: December 23, 2021
  Request status: Approved
  Technical Research Use Statement: “Asymptomatic” Alzheimer’s disease (AD) is a phenomenon in which 30% of individuals over age 65 meet criteria for autopsy-confirmed pathological AD (beta-amyloid plaques and tau aggregation) but do not clinically manifest cognitive impairment [1-3]. The resilience that underlies asymptomatic AD is marked by both protection from neurodegeneration (brain resilience) [4] and preserved cognition (cognitive resilience). Our central hypothesis is that genetic effects allow a subset of individuals to endure extensive AD neuropathology without marked brain atrophy or cognitive impairment. We are uniquely positioned to identify resilience genes by leveraging the Resilience from Alzheimer’s Disease (RAD) database, a local resource in which we have harmonized a validated quantitative phenotype of resilience across 8 large AD cohort studies. Our strong interdisciplinary team represents international leaders in genetics, neuroscience, neuropsychology, neuropathology, and psychometrics, who will leverage the infrastructure and rich resources of the AD Genetics Consortium, IGAP, ADSP, and our recently established and harmonized continuous metric of resilience to fulfill the following aims. Aim 1: Identify and replicate common genetic variants that predict cognitive resilience (preserved cognition) and brain resilience (protection from brain atrophy) in the presence of AD pathology. We hypothesize that common genetic variation will explain variance in resilience above and beyond known predictors such as education. Replication analyses will leverage age-of-onset data from IGAP to demonstrate that resilience loci predict a later age of AD onset. Aim 2: Identify and replicate rare and low-frequency genetic variants that predict cognitive and brain resilience. Rare and low-frequency variants with large effects have been identified in AD case/control studies, providing new insight into the genetic architecture of AD. Aim 3: Identify sex-specific genetic drivers of cognitive and brain resilience to AD pathology. Our preliminary results highlight sex differences in the downstream consequences of AD neuropathology, including sex-specific genetic markers of resilience.
  Non-Technical Research Use Statement: As the population ages, late-onset Alzheimer’s disease (AD) is becoming an increasingly important public health issue. Clinical trials targeted at reducing AD progression have demonstrated that patients continue to decline despite therapeutic intervention. Thus, there is a pressing need for new treatments aimed at novel therapeutic targets. A shift in focus from risk to resilience has tremendous potential for major public health impact by highlighting mechanisms that naturally counteract the damaging effects of AD neuropathology. The goal of the present project is to characterize genetic factors that protect the brain from the downstream consequences of AD neuropathology. We will identify both rare and common genetic variants using a robust metric of resilience developed and validated by our research team. The identification of such genetic effects will provide novel targets for therapeutic intervention in AD.
- Investigator: Pendergrass, Rion
  Institution: Genentech
  Project Title: Genetic Analyses Using Data from MiGA and related studies
  Date of Approval: March 17, 2022
  Request status: Approved
  Technical Research Use Statement: The purpose of our study is to identify novel genetic factors associated with age-related neurodegeneration. This includes identifying genetic factors associated with the risk of these conditions, as well as genetic risk factors associated with age at onset (AAO) for these conditions. The findings from our analyses have the potential to identify new therapeutic targets for Alzheimer's disease and other age-related neurodegenerative diseases, as well as genetic and phenotypic biomarkers that will be beneficial for subsetting patients in new ways. Using the requested data, we will identify genes driving neurodegenerative diseases by detecting dysregulated genes in cases from total and allele-specific gene expression profiles. Genotypes and RNA-seq reads will be used to generate allele-specific expression (ASE). RNA-seq counts and ASE from controls will be used to model the variance of both total and allele-specific gene expression. Total gene expression versus ASE from cases will then be used to identify dysregulated genes in single individuals. These will be compared with pathway and known disease-associated genes. Case/control status, genotype, and RNA-seq data will all be evaluated together through quantitative trait loci (QTL) analyses and additional statistical association analyses. All data will remain anonymized and securely stored, and only those listed on our application and their staff will have access to these data. We will not share any individual-level data outside of Genentech or beyond the researchers on our application. We will adhere to all data use agreement stipulations through NIAGADS. We have a secure computational environment called Rosalind within Genentech where we will use these data. Our IT security staff constantly monitor all of our research computing, ensuring the safety and privacy of all stored data. We will not collaborate with researchers at other institutions.
  Non-Technical Research Use Statement: Genetic variation and gene expression data allow us to understand more of the genetic contribution to risk of, and protection from, diseases such as Alzheimer’s disease and dementia. This information also allows us to identify important biological contributors to disease, for developing effective treatment strategies and identifying the groups of individuals who would benefit most from new treatments. Our exploration of the relationship between genotype, disease traits, gene expression, and outcomes through these datasets will allow us to pursue important new findings for disease treatment.
- Investigator: Yang, Jingjing
  Institution: Emory University
  Project Title: Novel Bayesian methods for integrating transcriptomic data in GWAS
  Date of Approval: February 16, 2022
  Request status: Approved
  Technical Research Use Statement: The objective of the proposed project is to derive novel Bayesian methods to integrate multi-omics data in genome-wide association studies (GWAS) of complex phenotypes, with the goal of prioritizing genetic variants and identifying causal genes. First, we will model expression quantitative trait loci (eQTL) and other molecular QTL information in GWAS with an adapted Bayesian variable selection model, so that the model can quantify the enrichment of associated genetic variants with respect to each annotation (such as eQTL) and prioritize genetic variants with the enriched annotation. Second, we will conduct transcriptome-wide association studies (TWAS) with a Bayesian approach to identify potentially causal genes. Third, we will use our Bayesian GWAS results to evaluate a Bayesian polygenic risk score for the complex phenotype of interest. We will first learn molecular QTL information from external transcriptomic datasets such as GTEx V8 and external molecular QTL from TCGA, and then integrate this information with the whole-genome sequence data from ADSP to prioritize genetic variants associated with complex phenotypes of interest and conduct TWAS to identify risk genes. We are interested in studying all complex phenotypes profiled for the ADSP samples, especially Alzheimer’s disease (AD) and AD-related complex phenotypes. In particular, our lab has access to the ROS/MAP multi-omics data shared by the Rush Alzheimer’s Disease Center (http://www.radc.rush.edu/). All samples in the ROS/MAP study are well characterized, with extensive complex phenotypes profiled, including clinical diagnosis of AD, AD-related complex phenotypes, and psychological phenotypes. We will combine the whole-genome sequence data from ADSP and ROS/MAP samples to increase the total sample size in our study, thus improving mapping power. The purpose of using ADSP data is to increase the sample size for testing our methods for functional genetic association studies of complex phenotypes. We are not limited to studying AD; we may study any complex phenotype profiled for both ADSP and ROS/MAP samples.
  Non-Technical Research Use Statement: This project will develop novel Bayesian methods to integrate multi-omics data, such as transcriptomic data, into genome-wide association studies (GWAS) of complex phenotypes, with the goal of prioritizing genetic variants and identifying causal genes. i) We will model molecular quantitative trait loci information in GWAS, so that the model can quantify the enrichment of associated genetic variants with respect to each annotation and prioritize genetic variants with the enriched annotation. ii) We will derive a novel Bayesian model that uses eQTL effect sizes as weights to conduct gene-based association tests. iii) We will use the Bayesian results from these two methods to calculate Bayesian polygenic risk scores. We propose to test our methods on the applied genomic analysis data and ROS/MAP multi-omics data to study complex phenotypes profiled for both ADSP and ROS/MAP samples, including AD, AD-related pathology traits, and related psychological disorders.
Acknowledgment statement for any data distributed by NIAGADS:
Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689), funded by the National Institute on Aging.
Use the study-specific acknowledgement statements below (as applicable):
For investigators using any data from this dataset:
Please cite/reference the use of NIAGADS data by including the accession NG00105.
For investigators using MiGA (sa000018) data:
We thank members of the Raj and de Witte labs for their feedback on the manuscript. This work was supported by grants from the US National Institutes of Health (NIH NIA R21-AG063130, NIA R01-AG054005, NIA U01-AG068880, and NIA R56-AG055824). This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD026880. The authors thank Michael Chao for his assistance with genotyping QC. The authors thank the teams of the Netherlands Brain Bank and the Mount Sinai Neuropathology Brain Bank and Research CoRE for their services. We thank the study participants for their generous gifts of brain donation. The microglia were isolated through the efforts of a large team, and we would like to thank Manja Litjens, Roland D. van Dijk, Alba Fernández-Andreu, Paul R. Ormel, Hans C. van Mierlo, Y. He, Stephanie Gumbs, Miriam E van Strien, Saskia Burm, Vanessa Donega, and Elly M. Hol for all their contributions to this effort. Gijsje Snijders was supported through ZonMw and the foundation “De Drie Lichten” in the Netherlands. Elisa Navarro was supported by a Ramon Areces fellowship.
Katia de Paiva Lopes*, Gijsje Snijders*, Jack Humphrey*, et al. “Atlas of genetic effects in human microglia transcriptome across brain regions, aging and disease pathologies”. bioRxiv, 2020.