Data Submission Instructions

This page describes the process and documentation required to submit data to NIAGADS. Depending on the size of the data, a NIAGADS staff member will work with you on the data transfer. Contact niagads@pennmedicine.upenn.edu to deposit data or to ask questions.

Required Policy Documents

Please email the following required documents to niagads@pennmedicine.upenn.edu in order to deposit and share your data:

  1. Institutional Certification for ADRD Studies that covers all subjects in your study. Multiple certifications may be required.
  2. Signed copy of the NIA AD Genomics Sharing Plan.
  3. Data Registration Template.

NOTE: All documents related to the application should be provided in English. For institutions where English is not the primary language, please provide translations of documents along with the original document. Translated documents should be signed by the institutional signing official.

Data Submission Checklist

Genotype data

  1. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
  2. README file (see below for suggested file contents)
  3. APOE Genotypes (if applicable)
  4. Genotypes in PLINK or VCF file format (preferred)
  5. Consent level as specified in the Institutional Certification form for each subject
  6. List of cohorts included and a description for each
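
As a rough pre-submission check, the sketch below compares the column headers of a tab-delimited phenotype file against the variable names in a data dictionary. The dictionary layout (tab-delimited, with a header row and variable names in the first column), the sample columns, and the function name are assumptions made for illustration; they are not a NIAGADS requirement.

```python
import csv
import io


def compare_columns(phenotype_text, dictionary_text):
    """Return (undocumented, unused) column names.

    Assumes both inputs are tab-delimited text with a header row, and that
    the first column of the dictionary lists variable names (hypothetical
    layout; adapt it to your own dictionary format).
    """
    pheno_reader = csv.reader(io.StringIO(phenotype_text), delimiter="\t")
    pheno_columns = next(pheno_reader)

    dict_reader = csv.reader(io.StringIO(dictionary_text), delimiter="\t")
    next(dict_reader)  # skip the dictionary's own header row
    documented = [row[0] for row in dict_reader if row]

    # Phenotype columns that the dictionary does not describe.
    undocumented = [c for c in pheno_columns if c not in documented]
    # Dictionary entries that do not appear in the phenotype file.
    unused = [v for v in documented if v not in pheno_columns]
    return undocumented, unused


# Made-up example data: "dx" is present in the file but not documented.
phenotype = "subject_id\tsex\tage\tdx\nS1\tF\t71\t1\n"
dictionary = (
    "variable\tdescription\n"
    "subject_id\tSubject identifier\n"
    "sex\tSex\n"
    "age\tAge at visit\n"
)
undocumented, unused = compare_columns(phenotype, dictionary)
print(undocumented)
print(unused)
```

Both lists being empty suggests that every phenotype column is described and that the dictionary has no stale entries.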

Summary statistics / Association results

  1. Results files in .txt format
  2. README (see below for suggested file contents)

Whole genome or whole exome sequencing

  1. Sequencing read data can be submitted in any of the following formats:
    • FASTQ: please save all reads, including those that could not be mapped to the reference genome.
    • BAM: please save all reads, including those that could not be mapped to the reference genome.
    • CRAM: please save all reads, including those that could not be mapped to the reference genome.
    • VCF: standard VCF 4.2 format (we recommend splitting the files by chromosome and gzip-compressing them)
  2. Provide any relevant sequencing information, including the following:
    • Sequencing Center
    • Sequencer Machine
    • Read Length
    • PCR Free or PCR Amplified?
    • Kit Name/Version
    • Copy of the WES target regions if applicable
    • Sequencing quality control metrics
  3. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
  4. APOE Genotypes (if applicable)
  5. Genotypes in PLINK or VCF file format (preferred)
  6. Consent level as specified in the Institutional Certification form for each subject
  7. List of cohorts included and a description for each
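
The recommendation to split VCF files by chromosome and compress them could be carried out along the lines of the sketch below. It is a minimal illustration using only the Python standard library; the function name, output file naming, and demo records are assumptions. In practice, dedicated tools such as bcftools with bgzip and tabix are commonly used instead, since indexing generally requires bgzip rather than plain gzip compression.

```python
import gzip
import os
import tempfile


def split_vcf_by_chromosome(vcf_path, out_dir):
    """Write one gzip-compressed VCF per chromosome; return {chrom: path}.

    Header lines (starting with '#') are copied into every output file.
    Assumes all header lines come before the first record.
    """
    header = []
    handles = {}
    paths = {}
    opener = gzip.open if vcf_path.endswith(".gz") else open
    try:
        with opener(vcf_path, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):
                    header.append(line)
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # First record for this chromosome: open its file
                    # and write the shared header.
                    paths[chrom] = os.path.join(out_dir, f"{chrom}.vcf.gz")
                    handles[chrom] = gzip.open(paths[chrom], "wt")
                    handles[chrom].writelines(header)
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths


# Made-up three-record VCF used only to exercise the function.
demo_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n"
    "chr2\t200\t.\tC\tT\t.\tPASS\t.\n"
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\n"
)
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "demo.vcf")
with open(source, "w") as fh:
    fh.write(demo_vcf)
paths = split_vcf_by_chromosome(source, work_dir)
print(sorted(paths))
```

The sketch keeps one open file handle per chromosome, which is fine for typical human references but may need rethinking for assemblies with very many contigs.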

RNA-seq or microarray data

  1. Sequencing Read Data
    • Required Information:
      1. Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. The BAM file should contain all reads, including those that could not be mapped to the reference genome.
      2. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
      3. README:
        • Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details for iPSCs or single cells
        • RNA extraction protocol (e.g., Trizol/chloroform extraction, Qiagen RNeasy kit)
        • RNA integrity number (RIN) per sample
        • Library preparation protocol (e.g., polyA capture, adapters used for ligation, read length and sequencing machine, single-cell platform)
        • Contributor contact information
        • Dataset Reference Genome Build
      4. Consent level as specified in the Institutional Certification form for each subject
      5. List of cohorts included and a description for each
    • Optional Information:
      1. QC report per sample (e.g., library characteristics (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
  2. Summary Data
    • Required Information:
      1. Read abundance files can be submitted as summaries in tab-separated file format with explanations.
      2. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
      3. README:
        • Sample source and organism; provide protocol details if iPSCs
        • How the raw data were generated and processed (the steps needed, e.g., how mapping was done and how multi-mapping reads were handled)
        • Raw data and library preparation protocol information (e.g., polyA capture, sequencing machine)
        • Unit of quantification in these summary files (e.g., genes, exons, etc.)
        • Annotation source and version (e.g., ENSEMBL version 94)
        • Unit of counts (e.g., raw counts, RPKM values, UMI counts). Please provide details if normalization was performed or if technical variation or batch effects were accounted for.
        • Software name and version used to generate these counts.
        • Contributor contact information
        • Dataset Reference Genome Build
    • Optional Information:
      1. Any publication that describes the data or findings from the data
      2. QC report per sample (e.g., library information (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
      3. We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket).

Epigenetics studies (e.g., ChIP-seq, ATAC-seq)

  1. Sequencing Read Data
    • Required Information:
      1. Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. Save all reads, including those that could not be mapped to the reference genome. Background samples (input or mock IP samples) must also be included.
      2. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
      3. README:
        • Sample source and organism; provide protocol details if iPSCs
        • Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)
        • Contributor contact information
        • Dataset Reference Genome Build
      4. Consent level as specified in the Institutional Certification form for each subject
      5. List of cohorts included and a description for each
    • Optional Information:
      1. QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)
  2. Summary Data
    • Required Information:
      1. Processed peak files can be submitted in BED format with explanations (including significance of called peaks).
      2. Phenotype Data File in tab delimited format (including pedigree structures if applicable) and a data dictionary
      3. README:
        • Sample source and organism; provide protocol details if iPSCs
        • Description of all the BED columns
        • Software name and version used to generate these values, with processing details (e.g., how reads were filtered before peak calling, whether narrow or broad peaks were called, and how p-values were corrected, if at all).
        • Contributor contact information
        • Dataset Reference Genome Build
    • Optional Information:
      1. Any publication that describes the data or findings from the data
      2. QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)
      3. We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket).
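
Before sending processed peak files, it may help to confirm that each record has well-formed coordinates. The sketch below is one possible check, assuming tab-delimited BED records in which the first three columns are chromosome, 0-based start, and end, and assuming that a peak must have non-zero length. The function name and demo records are hypothetical.

```python
def check_bed_lines(lines):
    """Return (line_number, message) pairs for malformed BED records."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text or text.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and track/browser lines
        fields = text.split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-separated columns"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, "start/end are not integers"))
            continue
        if start < 0 or end <= start:
            problems.append((number, "expected 0 <= start < end"))
    return problems


# Made-up records: the second has zero length, the third uses spaces.
demo = [
    "chr1\t100\t250\tpeak1\t8.2\n",
    "chr1\t300\t300\tpeak2\t5.1\n",
    "chr2 500 900 peak3\n",
]
problems = check_bed_lines(demo)
print(problems)
```

Columns beyond the first three (peak name, score, significance values) vary by peak caller, which is why the README should describe every column.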

Quantitative trait locus (QTL) analysis summary stats

  1. Required Information:
    • Variant position: chr, start, end
    • Allele information: ref, alt, a1, a2
    • Feature name (e.g. gene name, protein name)
    • P-value and/or Q-value
    • Effect size (Beta and Beta SE), or Spearman correlation p value
  2. Optional Information:
    • Allele frequency or allele count
    • Feature location: chr, start, end
    • Cis/trans
    • README:
      1. Detailed sample source, molecular trait and organism; provide protocol details if iPSCs
      2. Description of all the columns
      3. Software name and version used to perform the analyses
      4. Contributor contact information
      5. Dataset Reference Genome Build and annotation resource information (e.g., Ensembl version, dbSNP version)
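
The required QTL fields listed above can be checked against a file header with a short script such as the following sketch. The instructions name the fields but do not prescribe header spellings, so every column name here (`feature`, `pvalue`, `qvalue`, `beta`, `beta_se`, `spearman`) is a hypothetical choice that should be adapted to your own files.

```python
import csv
import io

# Hypothetical column names; only the field list comes from the instructions.
ALWAYS_REQUIRED = ["chr", "start", "end", "ref", "alt", "a1", "a2", "feature"]
SIGNIFICANCE = ["pvalue", "qvalue"]  # at least one must be present


def missing_qtl_columns(header):
    """Return a list of problems found in a QTL summary-statistics header."""
    present = {name.strip().lower() for name in header}
    problems = [f"missing column: {c}" for c in ALWAYS_REQUIRED if c not in present]
    if not present.intersection(SIGNIFICANCE):
        problems.append("missing column: pvalue or qvalue")
    # Effect size needs both beta and its standard error, unless a
    # Spearman correlation column is supplied instead.
    has_beta = {"beta", "beta_se"} <= present
    if not (has_beta or "spearman" in present):
        problems.append("missing effect size: beta and beta_se, or spearman")
    return problems


# Made-up header line that lacks the standard error of beta.
sample = "chr\tstart\tend\tref\talt\ta1\ta2\tfeature\tpvalue\tbeta\n"
header = next(csv.reader(io.StringIO(sample), delimiter="\t"))
problems = missing_qtl_columns(header)
print(problems)
```

An empty list means the header contains every required field under the assumed names; it says nothing about whether the values themselves are valid.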

NOTE: Please provide an MD5 checksum for every submitted data file so that complete transfer of each file can be verified.
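
One way to produce these checksums is sketched below using Python's standard `hashlib` module. The manifest file name `checksums.md5` and the helper names are assumptions; the `<md5>  <filename>` line layout follows the output of the common `md5sum` utility, so the manifest can be checked with `md5sum -c` on the receiving side.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="checksums.md5"):
    """Write '<md5>  <filename>' lines for every regular file in data_dir."""
    manifest_path = os.path.join(data_dir, manifest_name)
    entries = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if name == manifest_name or not os.path.isfile(path):
            continue  # skip the manifest itself and any subdirectories
        entries.append((md5_of_file(path), name))
    with open(manifest_path, "w") as out:
        for checksum, name in entries:
            out.write(f"{checksum}  {name}\n")
    return entries


# Demo on a throwaway directory containing a single small file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "results.txt"), "wb") as fh:
    fh.write(b"hello\n")
entries = write_manifest(demo_dir)
print(entries)
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte BAM, CRAM, or FASTQ files.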

README Description

A README should include the following information (please use plain text (.txt), PDF (.pdf), or Microsoft Word (.doc or .docx)):

  1. Description of the dataset and concise description of the study design
  2. Platform or array
  3. Any version information
  4. List of included files and formats
  5. Contributor contact information
  6. Dataset Reference Genome Build
  7. Publications