Amazon Instructions

To download files larger than 5 GB from the DSS, you will need an Amazon Web Services (AWS) account, because files over 5 GB must be downloaded directly from Amazon.

After your Data Access Request (DAR) has been approved and you have saved your AWS account information in your profile, it will take 24 hours before you have permission to download files from Amazon. Below are some instructions and guidelines for using Amazon.

AWS Account Verification

After logging into the DSS, enter your AWS Canonical ID on the My Profile page. A new file containing your pass code will be generated and can be retrieved using your AWS account. The file will be deleted and the pass code invalidated 24 hours after being generated. You can use any AWS S3 tool to download the file. We recommend the AWS Command Line Interface (AWS CLI). Detailed instructions on how to set up the AWS CLI can be found below.
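
If you are unsure of your canonical ID, one way to look it up is through the AWS CLI once it is configured (setup is covered later on this page). The sketch below assumes your credentials are allowed to list buckets; get_canonical_id is only an illustrative wrapper name, not part of the AWS CLI or the DSS.

```shell
# Illustrative wrapper (the name is an assumption, not an AWS or DSS command).
# Asks S3 who owns the configured account and prints the owner's
# canonical user ID, the long hexadecimal string the My Profile page expects.
get_canonical_id() {
    aws s3api list-buckets --query "Owner.ID" --output text
}
```

After defining the function in your shell, run get_canonical_id and paste the printed value into the My Profile page.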

S3 URLs

The DSS website provides users with file URLs for downloading files. This is an example of an AWS S3 URL:

s3://ryft-public-sample-data/passengers-ipv4.txt

The example above is a publicly accessible file held in a public AWS S3 bucket named ryft-public-sample-data. The data in the DSS is held in a private bucket, so downloading it requires an AWS account and an external tool configured with that account. Several tools can download private S3 data, such as AWS CLI, s3cmd, or S3 Browser. Once a tool is configured with your own account information, you can use it to download from AWS S3.
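
An S3 URL has two parts: the bucket name, followed by the key (the path of the file inside the bucket). As a small sketch, the two can be separated in the shell as shown here. The helper names s3_bucket and s3_key are made up for this illustration and are not AWS CLI commands.

```shell
# Illustrative helpers (names are assumptions): split s3://<bucket>/<key>.
s3_bucket() {
    rest="${1#s3://}"               # drop the s3:// scheme
    printf '%s\n' "${rest%%/*}"     # text before the first slash is the bucket
}

s3_key() {
    rest="${1#s3://}"
    case "$rest" in
        */*) printf '%s\n' "${rest#*/}" ;;   # text after the first slash is the key
        *)   printf '\n' ;;                  # URL names a bucket only, no key
    esac
}

url="s3://ryft-public-sample-data/passengers-ipv4.txt"
echo "bucket: $(s3_bucket "$url")"
echo "key:    $(s3_key "$url")"
```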

AWS Downloading Costs

The cost of downloading depends on whether you are downloading to another AWS resource or to a destination outside AWS. You are not charged if you download to an AWS resource in the same region as our S3 bucket, US-East (N. Virginia). Generally, it costs $0.09 per GB to download to resources outside of AWS. See the pricing table for specifics: S3 Data Transfer Pricing.
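
For a rough idea of what a transfer might cost, the per-GB figure can be multiplied by the amount of data. The sketch below assumes a single flat rate (the $0.09 per GB quoted above by default) and ignores any tiered or regional pricing, so treat the result as an estimate only. The function name estimate_transfer_cost is hypothetical.

```shell
# Illustrative helper (name is an assumption).
# $1 = data size in GB, $2 = price per GB in USD (defaults to 0.09).
# Assumes one flat rate; actual AWS pricing may be tiered.
estimate_transfer_cost() {
    LC_ALL=C awk -v gb="$1" -v rate="${2:-0.09}" \
        'BEGIN { printf "%.2f\n", gb * rate }'
}

estimate_transfer_cost 500      # 500 GB to a destination outside AWS
estimate_transfer_cost 500 0    # same data to a same-region AWS resource
```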

If you plan to download the data locally, an affordable transfer option is an Amazon Snowball (more details below). The device costs $250 to transfer 80TB of data. See the pricing information for specifics: Amazon Snowball Pricing.

For CRAMs and gVCFs, the requesting institution will incur the cost of downloading the data. These files can be downloaded using the Amazon Requester Pays option. Downloading genotype, phenotype, and miscellaneous files is free to the requesting institution.

Whether a file is free or not can be determined by the beginning of its S3 path:

Free files: S3 path begins with wanglab-dss-tier0
Requester pays files: S3 path begins with wanglab-dss-share
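
The prefix rule above can be checked mechanically before starting a download. This is a minimal sketch; payer_for_url is an illustrative helper name, and any URL that matches neither prefix is reported as unknown rather than guessed.

```shell
# Illustrative helper (name is an assumption): classify an S3 path by prefix.
payer_for_url() {
    case "$1" in
        s3://wanglab-dss-tier0*|wanglab-dss-tier0*) echo "free" ;;
        s3://wanglab-dss-share*|wanglab-dss-share*) echo "requester-pays" ;;
        *)                                          echo "unknown" ;;
    esac
}
```

For example, payer_for_url followed by an S3 URL copied from your cart prints free, requester-pays, or unknown.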

AWS Command Line Interface

Amazon provides a command-line tool called AWS CLI; see aws.amazon.com/cli/ for details. It must be configured with your account information by running aws configure (see the AWS documentation for aws configure). Once it is properly set up, you can download from S3. To download from the DSS private bucket, you must have an approved DAR and have added the files you want to download to your cart within the portal. Please follow the site's instructions on how to add files to your cart. The cart will provide you with a list of S3 URLs for specific files.
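
As a sketch of what setup and a quick sanity check might look like: aws configure asks for four values interactively, and aws sts get-caller-identity then reports which account the stored credentials belong to. The helper name check_aws_cli is made up for this example, and the choice of us-east-1 as the default region is only a suggestion based on the bucket location given above.

```shell
# Run once, interactively. aws configure prompts for:
#   AWS Access Key ID
#   AWS Secret Access Key
#   Default region name     (us-east-1 is US-East (N. Virginia))
#   Default output format   (may be left blank)
#
#   aws configure

# Illustrative helper (name is an assumption): confirm that the CLI is
# installed and that the stored credentials are accepted by AWS.
check_aws_cli() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found on PATH" >&2
        return 1
    fi
    # Prints the account and identity the credentials resolve to;
    # returns non-zero if no valid credentials are configured.
    aws sts get-caller-identity --output text
}
```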

CRAMs and gVCFs:

Downloading a CRAM or gVCF file requires the requester pays flag, --request-payer requester, on the aws command line. The requesting institution will incur the cost of downloading the data. Its use is shown in the example below. Use a real path/file name in place of <example.file>.

To copy (download) file:

aws s3 cp --request-payer requester s3://wanglab-dss-share/<example.file> <example.file>

Genotypes, Phenotypes, and Miscellaneous Files:

Downloading genotype, phenotype, and miscellaneous files is free to the requesting institution. To download any type of data other than a CRAM or gVCF, use the command line below:

To copy (download) file:

aws s3 cp s3://wanglab-dss-tier0/<example.file> <example.file>
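
When the cart contains many files, the two commands above can be applied in a loop. The following is a sketch under stated assumptions: the S3 URLs from the cart have been saved one per line in a text file, the helper name download_from_list is invented for this example, and the DRY_RUN variable is a convenience added here so the commands can be reviewed before any data (and cost) is incurred.

```shell
# Illustrative helper (name, DRY_RUN, and the list file are assumptions).
# $1 = text file with one S3 URL per line, $2 = destination directory.
# With DRY_RUN=1 the commands are printed instead of executed.
download_from_list() {
    list_file="$1"
    dest_dir="${2:-.}"
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue          # skip blank lines
        name="${url##*/}"                  # local name = last path component
        case "$url" in
            s3://wanglab-dss-share/*)      # CRAMs and gVCFs: requester pays
                set -- aws s3 cp --request-payer requester "$url" "$dest_dir/$name" ;;
            s3://wanglab-dss-tier0/*)      # genotype, phenotype, miscellaneous
                set -- aws s3 cp "$url" "$dest_dir/$name" ;;
            *)
                echo "skipping unrecognized URL: $url" >&2
                continue ;;
        esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@" </dev/null || echo "download failed: $url" >&2
        fi
    done < "$list_file"
}
```

Running it first with DRY_RUN=1 shows exactly which files would use the requester pays flag.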

Full Dataset Transfer Using External Device

If you plan to download hundreds of terabytes to a local device, we recommend using AWS Snowball. AWS Snowball allows you to export data stored in S3 to a physical device that is shipped to your address, from which you can then copy the files to a local device. To use this service, you will need to have an S3 bucket set up under your own account. We will create a bucket with your requested data and grant access to your AWS canonical user ID; you can then order a Snowball using your own AWS account. Reach out to us at help@niagads.org for assistance.

Advanced Processing Option

Certain tools can use S3 URLs to read data without having to download the file. Our CRAM files can be read by an S3-aware alignment reader such as samtools.

This allows users to read either a portion or all of the data in a file without having to save the entire file to a local drive. Although file access may be slower, there are cost savings.
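
To illustrate the idea of reading only part of a file, the AWS CLI itself can request a byte range of an object. This is not how samtools is invoked; it is only a sketch of a partial read, assuming the file is in the requester pays bucket. The helper name fetch_byte_range and its argument order are invented for this example.

```shell
# Illustrative helper (name and argument order are assumptions).
# $1 = S3 URL, $2 = first byte, $3 = last byte (inclusive), $4 = output file
# Downloads only the requested byte range of the object, not the whole file.
fetch_byte_range() {
    rest="${1#s3://}"
    bucket="${rest%%/*}"      # text before the first slash
    key="${rest#*/}"          # text after the first slash
    aws s3api get-object \
        --bucket "$bucket" \
        --key "$key" \
        --range "bytes=$2-$3" \
        --request-payer requester \
        "$4"
}
```

With requester pays, charges apply to the data actually transferred, so a ranged read of a large CRAM should cost less than copying the whole file.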