NG00128 - Proteomic profiling identified plasma biomarkers for SARS-CoV-2 infection and severity of COVID-19 patients

To access this data, please log into DSS and submit an application.
Within the application, add this dataset (accession NG00128) in the “Choose a Dataset” section.
Once approved, you will be able to log in and access the data within the DARM portal.

Description

The goal of the WU350 cohort is to address the many complexities of the COVID-19 pandemic. Among the 332 COVID-19 cases, ~90% were symptomatic, 93.7% were hospitalized, 46.7% were admitted to the ICU, 24.7% were on ventilation, and 19.0% died due to COVID-19 (82 ventilated and 63 died; 44 of the deceased had been ventilated prior to death). COVID-19 patients were 59 years old on average; 58.7% were men and 67.8% were of African American ancestry.
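The ventilation and mortality percentages follow directly from the reported counts and the 332 cases. A minimal Python check, using only numbers stated in this paragraph (the variable and function names are illustrative, and the last ratio is derived here rather than stated in the text):

```python
# Counts reported in the cohort description.
N_CASES = 332
VENTILATED = 82
DIED = 63
DIED_AFTER_VENTILATION = 44


def pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


ventilated_pct = pct(VENTILATED, N_CASES)  # share of cases on ventilation
died_pct = pct(DIED, N_CASES)              # share of cases who died
# Share of the deceased who had been ventilated (derived, not stated in the text).
died_ventilated_pct = pct(DIED_AFTER_VENTILATION, DIED)

print(ventilated_pct, died_pct, died_ventilated_pct)
```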

A total of 150 age-, sex-, and race-matched non-COVID-19 samples were used as controls. Control samples were collected from the Charles F. and Joanne Knight Alzheimer Disease Research Center (Knight-ADRC) at Washington University in St. Louis. The Knight-ADRC is one of 30 ADRCs funded by NIH. The goal of this collaborative research effort is to advance AD research with the ultimate goal of treatment or prevention of AD.

Peripheral blood was collected from the 482 individuals, and plasma was isolated by centrifugation and stored at -80 °C. Plasma proteomic data were measured using SomaScan v4.1 7K, a multiplexed, single-stranded DNA aptamer-based platform from SomaLogic (Boulder, CO). Protein levels targeted by the 7,055 modified aptamers are reported as a readout in relative fluorescent units (RFU) rather than in physical units.
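Because RFU values are relative intensities rather than concentrations, downstream analyses of aptamer-based data are often carried out on a log scale. The sketch below is an illustration under that assumption, not the processing applied to this dataset: it log2-transforms a small made-up table of RFU readouts. The sample identifiers, aptamer identifiers, and values are all hypothetical.

```python
import math

# Hypothetical RFU readouts: {sample_id: {aptamer_id: RFU}}.
# None of these identifiers or values come from the dataset.
rfu = {
    "sample_A": {"aptamer_1": 1024.0, "aptamer_2": 512.0},
    "sample_B": {"aptamer_1": 2048.0, "aptamer_2": 256.0},
}


def log2_transform(table):
    """Return a copy of the table with every RFU value log2-transformed."""
    return {
        sample: {aptamer: math.log2(value) for aptamer, value in readouts.items()}
        for sample, readouts in table.items()
    }


log_rfu = log2_transform(rfu)

# A difference of 1 on the log2 scale corresponds to a two-fold RFU difference.
fold_diff = log_rfu["sample_B"]["aptamer_1"] - log_rfu["sample_A"]["aptamer_1"]
print(fold_diff)
```

On the log2 scale, differences between samples read as fold changes, which is convenient when the readout has no physical unit.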

Additional information can be found on the websites:
https://neurogenomics.wustl.edu/
https://covid.proteomics.wustl.edu/

Sample Summary per Data Type

Sample Set	Accession	Data Type	Number of Samples
Knight ADRC & WU350	snd10038	Proteomics	482

Available Filesets

Fileset	Accession	Latest Release	Description
COVID19 - Proteomic and Phenotypic Data	fsa000034	NG00128.v1	Proteomic and Phenotypic Data

View the File Manifest for a full list of files released in this dataset.

Data Dictionary Files

The Washington University 350 (WU350) cohort consists of COVID-19 cases (N=350) who presented with respiratory illness symptoms and had a physician-ordered positive SARS-CoV-2 test performed at the Barnes Jewish Hospital between 26 March 2020 and 28 August 2020. The Knight-ADRC cohort collects cognitive data, plasma, CSF, and imaging to study the risk factors for Alzheimer’s disease. A total of 150 age-, sex-, and race-matched Knight-ADRC cohort participants were used as COVID-19 controls.

Sample Set	Accession Number	Number of Subjects	Number of Samples
Knight ADRC & WU350	snd10038	482	482

Consent Level	Number of Subjects
DS-ADRD-IRB-PUB	482

Total number of approved DARs: 9

Investigator:
Belloy, Michael
Institution:
Washington University in St Louis
Project Title:
Elucidating sex-specific risk for Alzheimer's disease through state-of-the-art genetics and multi-omics
Date of Approval:
March 31, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
• Objectives: In this project, we seek to holistically investigate the genetic and molecular drivers of sex dimorphism in Alzheimer’s disease across ancestries.
• Study design: This study integrates large-scale population genetics with multi-omics and endophenotype analyses. We are integrating all data available from ADGC and ADSP, together with other data from AMP-AD and biobanks such as UKB, FinnGen, and MVP to conduct large-scale multi-ancestry GWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. We also particularly focus on X chromosome association studies. The study design also interrogates interactions with ancestry, hormone exposures, and with APOE*4, as well as comparisons to non-stratified GWAS/XWAS of Alzheimer’s disease. Further, we will also employ genetic correlation analyses, mendelian randomization, colocalization, and pleiotropy analyses, to interrogate overlap with other complex traits to better understand the mechanisms underlying sex dimorphism in Alzheimer’s disease.
• Analysis plan, including the phenotypic characteristics that will be evaluated in association with genetic variants: Our phenotypes will include Alzheimer’s disease risk, conversion risk, various endophenotypes (including amyloid/tau biomarkers, brain imaging metrics, etc.) as well as molecular traits. As noted above, we will conduct large-scale multi-ancestry GWAS, XWAS, rare-variant gene aggregation analyses, QTL studies, PWAS, TWAS, etc. Specific aims include interrogating these questions and analyses on (1) the autosomes, (2) the X chromosome, and (3) leveraging sex-stratified QTL studies to drive discovery of risk genes.
Non-Technical Research Use Statement:
Alzheimer’s disease (AD) manifests itself differently across men and women, but the genetic and molecular factors that drive this remain elusive. AD is the most common cause of dementia and to this day remains largely untreatable. It is thus crucial to study the genetics of AD in a sex-specific manner, as this will help the field gain important insights into disease pathophysiology, identify novel sex-specific risk factors relevant to personalized genetic medicine, and uncover potential new AD drug targets that may benefit both sexes. This project uses large-scale genomics and multi-omics to elucidate novel sex-agnostic and sex-specific AD risk genes. We will interrogate sex dimorphism for AD risk on the autosomes and the sex chromosomes. We similarly interrogate sex dimorphism in the genetic regulation of gene expression and protein levels, which we will integrate with genetic risk for Alzheimer’s disease to further discover risk genes. Throughout, we will also interrogate how sex-specific risk for AD interacts with hormone exposures, ancestry, and the APOE*4 risk allele.
Investigator:
Cruchaga, Carlos
Institution:
Washington University School of Medicine
Project Title:
The Familial Alzheimer Sequencing (FASe) Project
Date of Approval:
January 21, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The goal of this study is to identify new genes and mutations that cause or increase risk for Alzheimer disease (AD), as well as protective factors. Individuals and families were selected from the Knight-ADRC (Washington University) and the NIA-LOAD study. Only families with at least three first-degree affected individuals were included. Families with pathogenic variants in the known AD or FTD genes, or in which APOE4 segregated with disease, were excluded. At least two cases and one control were selected per family. Cases had an age at onset (AAO) after 65 years, and controls had an age at last assessment greater than the latest AAO within the family. Whole exome sequencing (WES) and whole genome sequencing (WGS) data were generated for 1,235 individuals (285 families) that, together with data from our collaborators and the ADSP family-based cohort (3,449 individuals and 757 families), will provide enough statistical power to identify new genes for AD. Dr. Tanzi (Harvard Medical School) will provide WGS from 400 families from the NIMH Alzheimer disease genetics initiative study. We will perform single variant and gene-based analyses to identify genes and variants that increase risk for disease in AD families. Single variant analysis will consist of a combination of association and segregation analyses. We will run family-based gene-based methods to identify genes that show an overall enrichment of variants in AD cases. We will also look for protective and modifier variants. To do this we will identify families loaded with AD cases that also include individuals with a high burden of known risk variants but who do not develop the disease (escapees). We will use the sequence data and the family structure to identify variants that segregate with the escapee phenotype. The most promising variants and genes will be replicated in independent datasets (ADSP case-control, ADNI, Knight-ADRC, NIA-LOAD).
We will perform single variant and gene-based analyses to replicate the initial findings, and survival analysis to replicate the protective variants. We will select the most promising variants/genes for functional studies.
Non-Technical Research Use Statement:
Family-based approaches led to the identification of disease-causing Alzheimer’s Disease (AD) variants in the genes encoding APP, PSEN1 and PSEN2. The identification of these genes led to the Aβ-cascade hypothesis and to the development of drugs that target this pathway. Recently, we have identified rare coding variants in TREM2, ABCA7, PLD3 and SORL1 with large effect sizes for risk for AD, confirming that rare coding variants play a role in the etiology of AD. In this proposal, we will identify rare risk and protective alleles using sequence data from families densely affected by AD. We hypothesize that these families are enriched for genetic risk factors. We already have sequence data from 695 families (2,462 individuals), that combined with the ADSP and the NIMH dataset will lead to a dataset of more than 1,042 families (4,684 individuals). Our preliminary results support the flexibility of this approach and strongly suggest that protective and risk variants with large effect size will be found, which will lead to a better understanding of the biology of the disease.
Investigator:
Greicius, Michael
Institution:
Stanford University School of Medicine
Project Title:
Examining Genetic Associations in Neurodegenerative Diseases
Date of Approval:
March 31, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
We are studying the effects of rare (minor allele frequency < 5%) genetic variants on the risk of developing late-onset Alzheimer’s Disease (AD). We are interested in variants that have a protective effect in subjects who are at an increased genetic risk, or variants that lead to multiple dementias. Our aim is to identify any genetic variants that are present in the “case” group but not the “AD control” groups for both types of variants. The raw data we receive will be annotated to identify SNP locations and frequencies using existing databases such as 1000 Genomes. We will filter the data based on genetic models such as compound heterozygosity, recessive and dominant models to identify different types of variants.
Non-Technical Research Use Statement:
Current genetic understanding of Alzheimer’s Disease (AD) does not fully explain its heritability. The APOE4 allele is a well-established risk factor for the development of Alzheimer’s Disease (AD). However, some individuals who carry APOE4 remain cognitively healthy until advanced ages. Additionally, the cause of mixed dementia pathology development in individuals remains largely unexplained. We aim to identify genetic factors associated with these “protected” and mixed pathology phenotypes.
Investigator:
Pendergrass, Rion
Institution:
Genentech
Project Title:
Genetic Analyses Using Data from the Alzheimer’s Disease Sequencing Project (ADSP) and related studies
Date of Approval:
February 3, 2026
Request status:
Expired
Research use statements:
Technical Research Use Statement:
The purpose of our study is to identify novel genetic factors associated with Alzheimer’s Disease, corticobasal degeneration (CBD) and progressive supranuclear palsy (PSP). This includes identifying genetic factors associated with the risk of these conditions, as well as genetic risk factors associated with age-at-onset (AAO) for these conditions. We will also evaluate genetic associations with sub-phenotypes individuals have within these broad disease categories, such as their Braak staging results, which provide insights into the level of severity of Alzheimer’s. Thus we are requesting access to the set of genomic Whole Exome and Whole Genome Sequences (WES and WGS) that have just been released through the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (DSS NIAGADS). The findings from our genetic association testing have the potential for identification of new therapeutic targets for Alzheimer's Disease, CBD, and PSP. The findings from our studies also have the potential for identification of genetic and phenotypic biomarkers that will be beneficial for subsetting patients in new ways. We will use standard genetic epidemiological methods to handle the WGS and WES data. All data will remain anonymized and securely stored, and only those listed on our application and their staff will have access to these data. We will not share any of the individual level data outside of Genentech nor beyond the researchers on our application. We will adhere to all data use agreement stipulations through the DSS NIAGADS. We have a secure computational environment called Rosalind within Genentech where we will use these data. We have IT security staff that constantly monitor all our research computing, assuring safety and privacy of all of our stored data. We will not collaborate with researchers at other institutions.
Non-Technical Research Use Statement:
Genetic variation allows us to understand more of the genetic contribution to risk and protection from diseases such as Alzheimer’s and dementia. This information also allows us to identify important biological contributors to disease for developing effective treatment strategies, and identifying groups of individuals that would benefit most from new treatments. Our exploration of this relationship between genotype and disease traits and outcomes through these datasets will allow us to pursue important new findings for disease treatment.
Investigator:
Safo, Sandra
Institution:
University of Minnesota
Project Title:
Innovative Machine and Deep Learning Analyses of Alzheimer's Disease Omics and Phenotypic Data
Date of Approval:
October 27, 2023
Request status:
Expired
Research use statements:
Technical Research Use Statement:
AD is the most common cause of dementia and presents a substantial and increasing economic and social burden. Our ability to diagnose and classify AD from cognitive normals (CN), or discriminate among individuals with AD, early mild cognitive impairment (EMCI), or late mild cognitive impairment (LMCI), is essential for the prevention, diagnosis, and treatment of AD. Since individuals with MCI have a high chance of converting to AD, effectively discriminating between those who convert to AD (MCI-C) and those who do not convert (MCI-NC) is important for early diagnosis of AD. The heterogeneity of AD has motivated attempts to classify distinct subgroups of AD to better inform the underlying physiology. There is evidence to suggest that using data across multiple modalities (e.g. genetics, imaging, metabolomics) has potential to classify AD subgroups better than using a single modality. We will apply machine and deep learning methods to gain deeper insight into AD and ADRD pathobiology. We will use datasets that include genomics, genetics, metabolomics, and phenotypic data for this purpose. Data will be divided into discovery and validation sets. On the discovery set, state-of-the-art ML and DL methods for integrative analysis that we and others have developed will be coupled with resampling techniques to determine candidate molecular signatures and pathways discriminating the AD groups considered. Molecular scores will be developed from these candidate biomarkers. The clinical utility of the scores beyond well-known clinical risk factors for AD will be ascertained. We will validate our findings using the validation data. We will visually and quantitatively compare the risk scores across several clinical variables and outcomes. We will use (un)supervised clustering methods to identify molecular clusters, and we will investigate molecular clusters differentiating MCI to AD converters from non-converters. We may explore differences across ethnic subgroups.
We will also innovatively apply our multimodal molecular subtyping methods to discover, reproduce, and characterize novel molecular subgroups of AD; this will allow for better risk stratification.
Non-Technical Research Use Statement:
We have been developing novel machine learning (ML) and deep learning (DL) methods that leverage genomics, other omics (including proteomics and metabolomics), clinical and epidemiology data to better understand the pathogenesis of complex diseases. By integrating data from different sources, we have identified molecular signatures contributing to the risk of the development of complex diseases beyond established risk factors. We are proposing to innovatively apply these, and other existing, methods, to data pertaining to Alzheimer’s disease (AD) and Alzheimer’s disease related dementias (ADRD). A deeper understanding of the genes, genetic pathways, and other molecular signatures of AD is essential and could facilitate the identification of potential therapeutic targets for the disease.
Investigator:
Seshadri, Sudha
Institution:
Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center, San Antonio, TX
Project Title:
Therapeutic target discovery in ADSP data via comprehensive whole-genome analysis incorporating ethnic diversity and systems approaches
Date of Approval:
August 12, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objective: Utilize ADSP data sets to identify genes & specific genetic variants that confer risk for or protection from Alzheimer disease. Aim 1: Using combined WGS/WES across the ADSP Discovery, Disc-Ext, and FUS Phases, including single nucleotide variants, small insertion/deletions, and structural variants, we will: Aim 1a. Perform whole genome single variant and rare variant case/control association analyses of AD using ADSP and other available data; Aim 1b. Target protective variant identification via association analysis using selected controls within the ADSP data and performing meta-analysis across association results based on selected controls from non-ADSP data sets; Aim 1c. Perform endophenotype analyses including cognitive function measures, hippocampal volume and circulating beta-amyloid, using ADSP data in subjects for whom these measures are available. Meta-analysis will be conducted across ADSP and non-ADSP analysis results. Aim 2: To leverage ethnically-diverse and admixed populations to identify AD variants we will: Aim 2a. Estimate and account for global and local ancestry in all analyses; Aim 2b. Perform admixture mapping in samples of admixed ancestry; and Aim 2c. Perform ethnicity-specific and trans-ethnic meta-analyses. Aim 3: To identify putative therapeutic targets through functional characterization of genes and networks via bioinformatics, integrative ‘omics analyses, we will: Aim 3a. Annotate variants with their functional consequences using bioinformatic tools and publicly available “omics” data. Aim 3b. Prioritize results, group variants with shared function, and identify key genes functionally related to AD via weighted association analyses and network approaches. Analyses will be performed in coordination with the following PIs. Coordination will involve sharing expertise, analysis plans or analysis results. No individual level data will be shared across institutions.
Philip De Jager, Columbia University; Eric Boerwinkle & Myriam Fornage, U of Texas Health Science Center, Houston; Sudha Seshadri, U of Texas, San Antonio; Ellen Wijsman, U of Washington; William Salerno, Baylor College of Medicine.
Non-Technical Research Use Statement:
This proposal seeks to analyze existing genetic sequencing data generated as part of the Alzheimer’s Disease Sequencing Project (ADSP) including the ADSP Follow-up Study (FUS) with the goal of identifying genes and specific changes within those genes that either confer risk for Alzheimer’s Disease or provide protection from Alzheimer’s Disease. Analytic challenges include analysis of whole genome sequencing data, appropriately accounting for population structure across European ancestry, Hispanic, and African American participants, and interpreting results in the context of other genomic data available.
Investigator:
Shelton, Janie
Institution:
Bristol Myers Squibb
Project Title:
A longitudinal study of Alzheimer’s Disease and other dementing illnesses – KnightADRC GWAS
Date of Approval:
January 30, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Recently approved Alzheimer’s disease (AD) therapies, such as lecanemab (Leqembi) and donanemab (Kisunla), represent a significant advancement toward disease-modifying treatment. However, their impact on cognitive decline remains modest, and both are associated with potentially serious adverse events, including amyloid-related imaging abnormalities (ARIA). These limitations underscore the urgent need for additional therapeutic strategies to reduce disease burden. Genetic approaches offer a powerful avenue for drug target discovery, with evidence suggesting that genetically supported targets are at least twice as likely to progress successfully through clinical development to FDA approval (Nelson et al., 2015, Nat Genet; King et al., 2019, PLoS Genet; Minikel et al., 2024, Nature). To date, most genetic studies in AD have focused on identifying loci associated with disease risk. Large-scale genome-wide association studies (GWAS) have uncovered approximately 75 risk loci (Bellenguez et al., 2022, Nat Genet), providing valuable insights into disease etiology. However, therapeutic interventions are typically aimed at individuals already diagnosed with AD, making the genetics of disease progression a critical—yet underexplored—complementary approach for target discovery. Progression-focused genetic studies face challenges due to limited availability of longitudinal phenotypic data. To address this, meta-analysis of multiple GWAS datasets offers a practical strategy to increase statistical power and detect robust associations. We propose to incorporate summary statistics from the Knight Alzheimer Disease Research Center (Knight-ADRC) AD progression GWAS into a meta-analysis alongside several publicly available and proprietary datasets. Our objective is to identify novel genetic drivers of AD progression, prioritize new therapeutic targets, and assess the impact of existing pipeline candidates on disease trajectory.
Non-Technical Research Use Statement:
New Alzheimer’s treatments like lecanemab (Leqembi) and donanemab (Kisunla) are an important step forward in the search for ways to help patients, but these drugs have only moderate benefits and can come with serious side effects. Better therapies are still needed to reduce the impact of the disease. Genetics offers a powerful way to discover new drugs: studies show that treatments based on genetic findings are more likely to succeed. So far, most genetic research has focused on the genes that increase the risk of developing Alzheimer’s, but understanding genes that drive how the disease progresses in Alzheimer’s patients may be even more beneficial. However, this type of data, which involves following participants over time, is limited; combining results from multiple smaller studies (a meta-analysis) can help uncover important patterns. We plan to add data from the Knight Alzheimer Disease Research Center to a larger analysis to find new genetic clues, identify better treatment targets, and evaluate how current and future drugs may slow disease progression.
Investigator:
Yang, Jingjing
Institution:
Emory University
Project Title:
Novel statistical methods for integrating transcriptomic and proteomic data in GWAS
Date of Approval:
December 2, 2025
Request status:
Approved
Research use statements:
Technical Research Use Statement:
The objective of the proposed project is to derive novel statistical methods to integrate multi-omics data and pathology data in genome-wide association studies (GWAS) for studying complex phenotypes, with the goal of prioritizing genetic variants and identifying causal genes. First, we will develop novel statistical methods to integrate summary-level omics data and pathology data of diverse populations with GWAS data to prioritize risk genes. Second, we will apply our tools to publicly available xQTL data and the ADSP GWAS data. Third, we will also use the ADSP GWAS summary data to conduct causal analysis of other aging-related phenotypes and AD dementia. We will first develop novel statistical methods to integrate summary-level xQTL data of multiple populations with GWAS data to test gene associations with complex human diseases. We are interested in studying all complex phenotypes that were profiled for the ADSP samples, especially Alzheimer’s disease (AD) and AD-related complex phenotypes. In particular, our lab has access to the ROS/MAP multi-omics data shared by the Rush Alzheimer’s disease center (http://www.radc.rush.edu/), and GTEx data. All samples in the ROS/MAP study are well-characterized with extensive complex phenotypes profiled, including clinical diagnosis of AD, AD-related complex phenotypes, and psychological phenotypes. GTEx provides transcriptomic data of multiple human tissues. We will leverage multiple omics data profiled from the ROS/MAP study and transcriptomics data profiled from GTEx to learn SNP-omics relations, and then integrate such learned relationships with ADSP data to identify risk genes of complex diseases.
We will also validate our findings by using omics and pathology data in the requested data sets. The purpose of using ADSP data is to increase sample size for testing our derived methods for functional genetic association studies of complex phenotypes, studying the genetic etiology of AD and AD-related phenotypes, and validating our findings by using the omics data from Rush Alzheimer's Disease Center. We are not limited to studying AD only. We are flexible to study any complex phenotypes that are profiled for ADSP samples.
Non-Technical Research Use Statement:
This proposed project is to develop novel statistical methods to integrate summary-level multi-omics data such as transcriptomic, proteomics, and epigenetics, and pathology data, in genome-wide association studies (GWAS) of complex phenotypes, with the goal of identifying causal genes. i) We will develop novel statistical methods for integrating summary-level omics data and pathology data with GWAS data. ii) We will apply our tools to publicly available summary-level omics data, omics data from the ROS/MAP study, and ADSP GWAS data for studying AD and AD-related phenotypes. iii) We will conduct causal inference to test the causal relationship between AD and other aging-related phenotypes. We propose to test our proposed methods on the applied genomic analysis data to study complex phenotypes that are profiled for ADSP, including AD, AD-related pathology traits, and related psychological disorders.
Investigator:
Zhao, Zhongming
Institution:
University of Texas Health Science Center at Houston
Project Title:
AIM-AI: an Actionable, Integrated and Multiscale genetic map of Alzheimer's disease via deep learning
Date of Approval:
June 1, 2026
Request status:
Approved
Research use statements:
Technical Research Use Statement:
Objectives: The objective of our study is to advance our understanding of the genetic basis of Alzheimer’s Disease (AD) through the analysis of comprehensive genomic datasets such as Whole Exome Sequencing (WES), Whole Genome Sequencing (WGS), single-nuclei RNA sequencing, and Genome-Wide Association Studies (GWAS), as well as the related phenotype. We aim to identify genetic variants that are integral to the development and progression of AD. Study Design: Our approach involves a detailed multi-omics analysis focusing on both coding and non-coding regions within these datasets. We will develop new analytical variables from existing data, ensuring that our research adheres to the established data use limitations and contributes meaningfully to the field of genetic research in AD. Analysis Plan: The plan centers on investigating the correlation between genetic variants and AD, exploring how these variants influence the disease at a genetic level. We will employ cutting-edge computational methods to analyze interactions between these genetic markers and their potential role in AD pathogenesis. The integration of data from multiple sources will be carefully executed to maintain compliance with data use agreements, emphasizing the scientific exploration of AD.
Non-Technical Research Use Statement:
Our research is dedicated to unraveling the genetic components of Alzheimer’s Disease. By analyzing genetic sequences and variations through various genomic datasets, we seek to deepen the scientific understanding of how these genetic elements contribute to AD. The outcomes of this study will be shared with the public, enhancing general knowledge of Alzheimer’s Disease and supporting the global research community in its ongoing efforts to decode this complex condition.

Total number of samples: 482

Female	202	41.9%
Male	280	58.1%

American Indian/Alaska Native	2
Asian	2
Black or African American	327
White	145
Other	4
NA	2

Acknowledgement

Acknowledgment statement for any data distributed by NIAGADS:

For investigators using any data from this dataset:

For investigators using Knight ADRC & WU350 (sa000026) data:

Total number of samples: 482

COVID-19
Control	150	31.1%
Case	332	68.9%
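
The sex, race, and COVID-19 status breakdowns on this page should each account for all 482 samples. A minimal Python sketch of that consistency check, using only the counts listed on this page (the dictionary and function names are illustrative):

```python
TOTAL_SAMPLES = 482

# Counts as listed in the sample breakdowns on this page.
sex = {"Female": 202, "Male": 280}
race = {
    "American Indian/Alaska Native": 2,
    "Asian": 2,
    "Black or African American": 327,
    "White": 145,
    "Other": 4,
    "NA": 2,
}
covid_status = {"Control": 150, "Case": 332}


def shares(counts):
    """Return each category's percentage of the category total, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Every breakdown should sum to the total number of samples.
totals = {
    "sex": sum(sex.values()),
    "race": sum(race.values()),
    "covid_status": sum(covid_status.values()),
}
sex_shares = shares(sex)
covid_shares = shares(covid_status)

print(totals)
print(sex_shares, covid_shares)
```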