The ADSP has released an updated version of the NG00067 dataset (ng00067.v10). This release comprises two key components:
- A quality-controlled project-level Variant Call Format (pVCF) file of bi-allelic autosomal variants derived from the R4 whole genome sequencing (WGS) dataset, encompassing 36,361 samples.
- Individual-level structural variant calls generated by two callers, Manta and Smoove, specifically for the new samples introduced in R4 (n=19,451).
The R4 quality-controlled pVCF is also available in compact and compact filtered versions, which are smaller and more convenient to handle.
Additionally, a script has been made available on GitHub to help researchers generate an integrated phenotype file. The script combines multiple phenotype files into one and includes a list of genetically unique samples from the R4 WGS dataset. Used together with the R4 genotype files, the integrated phenotype file lets researchers conduct an Alzheimer’s disease (AD) case/control analysis without assembling the phenotype data by hand.
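To illustrate the general idea of integrating phenotype files, here is a minimal Python sketch. It is not the ADSP script from GitHub. The column names (`SampleID`, `AD_status`), the status coding (`1` for case, `0` for control), the rule that the first occurrence of a sample wins, and the toy sample IDs are all assumptions made for illustration; consult the release notes and the actual script for the real file layouts.

```python
import csv
import io


def read_table(text, delimiter="\t"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def integrate_phenotypes(tables, unique_ids, id_column="SampleID"):
    """Combine rows from several phenotype tables into one mapping.

    Only samples listed in unique_ids are kept, and each sample appears
    once (the first table that contains it wins; this is an assumption).
    """
    keep = set(unique_ids)
    merged = {}
    for table in tables:
        for row in table:
            sample = row[id_column]
            if sample in keep and sample not in merged:
                merged[sample] = row
    return merged


def case_control_counts(merged, status_column="AD_status"):
    """Count cases and controls, assuming '1' = case and '0' = control."""
    counts = {"case": 0, "control": 0, "other": 0}
    for row in merged.values():
        status = row.get(status_column)
        if status == "1":
            counts["case"] += 1
        elif status == "0":
            counts["control"] += 1
        else:
            counts["other"] += 1
    return counts


# Toy inputs with made-up sample IDs (hypothetical, not ADSP data).
case_control = read_table("SampleID\tAD_status\nS1\t1\nS2\t0\nS3\tNA\n")
family = read_table("SampleID\tAD_status\nS2\t0\nS4\t1\nS5\t1\n")
unique_ids = ["S1", "S2", "S4"]  # stand-in for a genetically unique sample list

merged = integrate_phenotypes([case_control, family], unique_ids)
counts = case_control_counts(merged)
print(counts)
```

The resulting sample-to-phenotype mapping is what one would then join against the sample columns of the genotype files before running a case/control test.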
The next planned ADSP releases will include quality-controlled multi-allelic autosome and chromosome X files for the R4 dataset, R4 genomic annotation files, the R4 quality-controlled pVCF in Genomic Data Structure (GDS) format, and R4 structural variant joint genotyping files.
For additional information, check out the NG00067 dataset web page and see the RELEASE NOTES for full release details.
To submit a Data Access Request for this dataset, follow the instructions on the Application Instructions web page.