To download files larger than 5 GB from the DSS, you need an Amazon Web Services (AWS) account, because files over 5 GB must be downloaded directly from Amazon.
Please note: after your Data Access Request (DAR) has been approved and you have saved your AWS account information in your profile, it takes 24 hours for permission to download files from Amazon to take effect. Below are some instructions and guidelines for using Amazon.
AWS Account Verification
After logging in to the DSS, enter your AWS Canonical ID on the My Profile page. A new file containing your pass code will be generated, which you can retrieve using your AWS account. The file is deleted and the pass code invalidated 24 hours after it is generated. You can use any AWS S3 tool to download the file; we recommend the AWS command line interface (awscli). Detailed instructions on how to set up the awscli can be found below.
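The steps above do not say where to find your AWS Canonical ID. As a minimal sketch, assuming awscli is already installed and configured with your own account credentials, the command aws s3api list-buckets reports the account owner's canonical user ID in its Owner.ID field. The wrapper name get_canonical_id is hypothetical.

```shell
# Hypothetical helper: print the canonical user ID of the AWS account whose
# credentials awscli is currently using (assumes awscli is configured).
get_canonical_id() {
  # list-buckets returns the bucket owner; Owner.ID is the canonical user ID.
  aws s3api list-buckets --query Owner.ID --output text
}

# Usage:
#   get_canonical_id
```

Paste the printed value into the Canonical ID field on the My Profile page.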
The DSS website provides users with file URLs for downloading files, such as:
This is an example of an AWS S3 URL. The example above is a publicly accessible file held in a public AWS S3 bucket named ryft-public-sample-data. Data in the DSS is held in private buckets, so downloading it requires an AWS account and an external S3 tool configured with your account information. Several tools can download private S3 data, such as awscli, s3cmd, or S3 Browser; once a tool is configured with your own account information, you can use it to download from AWS S3.
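An S3 URL names a bucket followed by an object key (the path of the file inside the bucket). The sketch below, using the hypothetical helper names s3_bucket and s3_key, splits a URL into those two parts with plain shell parameter expansion and makes no AWS calls; it assumes URLs of the form s3://bucket/key.

```shell
# Hypothetical helpers: split an S3 URL (s3://bucket/key) into its parts.
s3_bucket() {
  local rest="${1#s3://}"          # drop the s3:// scheme
  printf '%s\n' "${rest%%/*}"      # everything before the first slash
}

s3_key() {
  local rest="${1#s3://}"
  case "$rest" in
    */*) printf '%s\n' "${rest#*/}" ;;  # everything after the first slash
    *)   printf '\n' ;;                 # bucket only, no key
  esac
}
```

For example, s3_bucket s3://wanglab-dss-tier0/example.file prints wanglab-dss-tier0, and s3_key on the same URL prints example.file.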
AWS Downloading Costs
Download costs depend on whether you are downloading to another AWS resource or to a destination outside AWS. You are not charged for downloads to an AWS resource in the same region as our S3 bucket, US-East (N. Virginia). Downloading to resources outside AWS generally costs $0.09 per GB. See the pricing table for specifics: S3 Data Transfer Pricing.
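The cost arithmetic above is simply size times rate. A minimal sketch, using the $0.09 per GB figure quoted above as an assumed default; actual S3 transfer pricing is tiered by monthly volume and can change, so treat the result as a rough estimate only. The function name estimate_egress_cost is hypothetical.

```shell
# Hypothetical helper: rough download cost = size in GB x flat USD/GB rate.
# Default rate 0.09 is the figure quoted in the text (assumption; pricing
# is tiered and may change).
estimate_egress_cost() {
  local size_gb="$1"
  local rate="${2:-0.09}"
  awk -v gb="$size_gb" -v rate="$rate" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Usage:
#   estimate_egress_cost 500      # destination outside AWS
#   estimate_egress_cost 500 0    # same-region AWS destination (no charge)
```

For example, estimate_egress_cost 500 prints 45.00 (USD) for a 500 GB download outside AWS.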
If you plan to download the data to local storage, AWS Snowball can be an affordable transfer option (more details below). A device costs $250 and transfers up to 80 TB of data. See the pricing information for specifics: Amazon Snowball Pricing.
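To see how many devices a dataset needs, divide its size by the 80 TB per-device capacity and round up. This sketch uses only the figures quoted above ($250 and 80 TB per device) and ignores shipping and any other charges listed on the pricing page; the function name snowball_estimate is hypothetical and the size must be given in whole terabytes.

```shell
# Hypothetical helper: Snowball device count and device fees for a dataset,
# using the figures quoted in the text (80 TB and 250 USD per device).
snowball_estimate() {
  local size_tb="$1"                          # whole terabytes
  local devices=$(( (size_tb + 79) / 80 ))    # ceiling of size_tb / 80
  printf 'devices=%s cost_usd=%s\n' "$devices" "$(( devices * 250 ))"
}
```

For example, snowball_estimate 100 prints devices=2 cost_usd=500.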
For CRAMs and gVCFs, the requesting institution incurs the cost of downloading the data, and these files must be downloaded using the Amazon S3 Requester Pays option. Downloading genotype, phenotype, and miscellaneous files is free to the requesting institution.
AWS Command Line Interface
Amazon provides a command-line tool called aws; see details at aws.amazon.com/cli/. Before first use, it must be configured with your account information by typing:
aws configure
See aws configure for details. Once it is set up, you can download from S3. To download from the DSS private bucket, you must have an approved DAR and have the files you want to download in your cart within the portal. Please follow the site's instructions on how to add files to your cart. The cart provides a list of S3 URLs for the selected files.
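After running aws configure, it can help to confirm the setup before trying a download. A minimal sketch, assuming awscli is installed: aws sts get-caller-identity fails if no valid credentials are configured, and aws configure get region shows the default region (the DSS bucket is in us-east-1). The function name check_awscli_setup is hypothetical.

```shell
# Hypothetical helper: sanity-check awscli after running "aws configure".
check_awscli_setup() {
  # Stop early if the aws command is not installed or not on the PATH.
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found" >&2
    return 1
  fi
  # Prints your account ID; fails if no valid credentials are configured.
  aws sts get-caller-identity --query Account --output text || return 1
  # Show the default region (the DSS bucket is in us-east-1).
  local region
  region="$(aws configure get region)" || region=""
  echo "region: ${region:-not set}"
}
```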
CRAMs and gVCFs:
Downloading a CRAM or gVCF file requires an extra option on the aws command line, --request-payer requester, which enables Requester Pays: the requesting institution incurs the cost of downloading the data. Its use is shown in the example below.
aws s3 ls --request-payer requester s3://wanglab-dss-share/example.file
aws s3 cp --request-payer requester s3://wanglab-dss-share/example.file example.file
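Because the requesting institution pays for these transfers, you may want to check a file's size before copying it. This sketch assumes the single-object aws s3 ls output format "DATE TIME SIZE NAME" (size in bytes in the third column), treats 1 GB as 1024^3 bytes, and reuses the $0.09 per GB figure from AWS Downloading Costs as an assumed rate. If the URL is a prefix that matches several objects, only the first listed line is used. The function name preview_requester_pays is hypothetical.

```shell
# Hypothetical helper: estimate the cost of a requester-pays download
# before running "aws s3 cp". Rate and output format are assumptions.
preview_requester_pays() {
  local url="$1" rate="${2:-0.09}"
  aws s3 ls --request-payer requester "$url" |
    awk -v rate="$rate" 'NR == 1 {
      gb = $3 / (1024 * 1024 * 1024)   # bytes -> GB (1024^3 bytes)
      printf "%s bytes, about %.2f GB, estimated %.2f USD\n", $3, gb, gb * rate
    }'
}

# Usage:
#   preview_requester_pays s3://wanglab-dss-share/example.file
```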
Genotypes, Phenotypes, and Miscellaneous Files:
Downloading genotype, phenotype, and miscellaneous files is free to the requesting institution. To download any type of data other than CRAMs and gVCFs, use the commands below, without the requester-pays option:
aws s3 ls s3://wanglab-dss-tier0/example.file
aws s3 cp s3://wanglab-dss-tier0/example.file example.file
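The cart provides a list of S3 URLs, so many files can be downloaded in one loop using the two command forms above. A minimal sketch, assuming you have saved the cart's URLs to a text file with one URL per line; the helper name download_cart is hypothetical, and the file-name patterns used to recognize CRAMs and gVCFs (.cram, .crai, .g.vcf, .gvcf) are assumptions to adjust to the actual names in your cart.

```shell
# Hypothetical helper: download every S3 URL listed (one per line) in a file.
# CRAM/gVCF URLs (matched by ASSUMED extensions) get the requester-pays flag.
download_cart() {
  local list="$1" dest="${2:-.}" url cr
  cr="$(printf '\r')"
  mkdir -p "$dest" || return 1
  while IFS= read -r url || [ -n "$url" ]; do
    url="${url%"$cr"}"                      # tolerate Windows line endings
    case "$url" in
      ''|'#'*)
        continue ;;                         # skip blank lines and comments
      *.cram|*.crai|*.g.vcf*|*.gvcf*)
        # CRAMs and gVCFs: the requesting institution pays.
        aws s3 cp --request-payer requester "$url" "$dest/" ;;
      *)
        # Genotype, phenotype, and miscellaneous files: free.
        aws s3 cp "$url" "$dest/" ;;
    esac || return 1
  done < "$list"
}

# Usage (hypothetical file name):
#   download_cart cart_urls.txt ./dss_data
```

The loop stops at the first failed copy so that a credentials or permission problem is not repeated for every remaining file.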
Full Dataset Transfer Using External Device
If you plan to download hundreds of terabytes to a local device, we recommend using AWS Snowball. AWS Snowball lets you export data stored in S3 to a physical device that is shipped to your address, from which you can then copy the files to local storage. Reach out to us at email@example.com to set up this service.
Advanced Processing Option
Certain tools can read data directly from S3 URLs without downloading the whole file first. Our CRAM files can be read by S3-aware alignment tools such as samtools.
This lets users read part or all of a file's data without saving the entire file to a local drive. Although access may be slower, it can save costs, since you transfer only the data you actually read.
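As an illustration of reading only part of a CRAM, the sketch below extracts one genomic region straight from S3 with samtools view. It assumes a samtools/htslib build with S3 support, AWS credentials that htslib can find (for example in ~/.aws/credentials), a CRAM index (.crai) stored next to the CRAM, and a local copy of the reference FASTA the CRAM was encoded against. Whether your htslib version sends the requester-pays header needed for requester-pays buckets is something to verify in the htslib documentation before relying on this. The function name and the names in the usage line are hypothetical.

```shell
# Hypothetical helper: write one region of an S3-hosted CRAM to a local BAM.
# Assumes S3-enabled samtools/htslib, a .crai index, and the matching reference.
cram_region_from_s3() {
  local url="$1" region="$2" ref="$3" out="$4"
  # -b: BAM output, -T: reference FASTA, -o: output file; region follows input.
  samtools view -b -T "$ref" -o "$out" "$url" "$region"
}

# Usage (hypothetical names):
#   cram_region_from_s3 s3://wanglab-dss-share/example.cram chr20:1-100000 ref.fa region.bam
```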