There are many ways to upload your data.
Submit a ticket to contact the Genome10K VGP and obtain access credentials for uploading. Note that the VGP does not require any credentials for downloading. By downloading the data, you agree to and accept the data use policy.
Install the AWS CLI (Command Line Interface) from here. Below is a short example for installing without root permission.
curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
unzip awscli-bundle.zip
./awscli-bundle/install -i $path_to_install
export PATH=$path_to_install/bin:$PATH
The next step is to configure your credentials obtained from the assembly-group.
aws configure
Type in your given aws_access_key_id and aws_secret_access_key when prompted.
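On a cluster or in a script, where an interactive prompt is inconvenient, the same two values can be written to a credentials file directly. The following is a minimal sketch that assumes the standard AWS CLI credentials file layout with a `[default]` profile. The function name `write_aws_credentials` and the placeholder values are hypothetical and are not part of the VGP tooling.

```shell
# Hypothetical helper: write an AWS CLI credentials file with a [default] profile.
# Usage: write_aws_credentials <path> <access_key_id> <secret_access_key>
# Note: this overwrites any existing file at <path>.
write_aws_credentials() (
  umask 077                       # keep the file readable by the owner only
  mkdir -p "$(dirname "$1")"
  printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' "$2" "$3" > "$1"
)

# Example (placeholders, not real keys), left commented out:
# write_aws_credentials "$HOME/.aws/credentials" "<your_access_key_id>" "<your_secret_access_key>"
```

The function body runs in a subshell, so the stricter umask does not leak into the calling shell.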
Copy a file following the data_structure.
aws s3 cp <file> s3://genomeark-upload/species/<species_name>/<species_id>/<data_type>/<file>
For example, to upload a PacBio subreads.bam file from the hummingbird:
aws s3 cp mXXXX.subreads.bam s3://genomeark-upload/species/Calypte_anna/bCalAnn1/pacbio/mXXXX.subreads.bam
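When uploading many files, it can help to build the destination from its components rather than typing it each time. The sketch below follows the path pattern shown above. The helper name `genomeark_dest` is hypothetical, and the choice to keep only the file's base name is an assumption about the intended layout.

```shell
# Hypothetical helper: build the upload URI from its components.
# Usage: genomeark_dest <species_name> <species_id> <data_type> <file>
genomeark_dest() {
  # Only the base name of <file> is used, so local directories are not copied into the key.
  printf 's3://genomeark-upload/species/%s/%s/%s/%s\n' "$1" "$2" "$3" "$(basename "$4")"
}

# Example (needs configured credentials, so it is left commented out).
# --dryrun prints the operation without performing it:
# aws s3 cp --dryrun mXXXX.subreads.bam "$(genomeark_dest Calypte_anna bCalAnn1 pacbio mXXXX.subreads.bam)"
```

For the hummingbird example, `genomeark_dest Calypte_anna bCalAnn1 pacbio mXXXX.subreads.bam` prints the same URI as in the command above.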
Run check_etag.sh and check that its output matches the eTag of the uploaded file. The eTag is the MD5 checksum (as reported by md5 or md5sum) for files smaller than 5 GB, and a combined hash of the parts for multi-part uploads of files larger than 5 GB.
./check_etag.sh mXXXX.subreads.bam
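To illustrate how the two kinds of eTag differ, here is a rough sketch of the commonly described calculation: a plain MD5 for a file uploaded in one piece, and for a multi-part upload the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the number of parts. This is not the project's check_etag.sh. The function name `s3_etag` is hypothetical, the part size is passed explicitly because it depends on how the upload was split, and `md5sum` (GNU coreutils) is assumed to be available.

```shell
# Hypothetical helper (not the project's check_etag.sh).
# Usage: s3_etag <file> <part_size_bytes>
# Assumes md5sum (GNU coreutils), tail, head, and awk are available.
s3_etag() (
  file=$1
  part_size=$2
  size=$(wc -c < "$file" | tr -d ' ')

  if [ "$size" -lt "$part_size" ]; then
    # Smaller than one part: the plain MD5 of the whole file.
    md5sum "$file" | cut -d ' ' -f 1
  else
    parts=$(( (size + part_size - 1) / part_size ))
    # Hex MD5 of each part, rewritten as octal escapes (\ooo) for printf.
    escaped=$(
      i=0
      while [ "$i" -lt "$parts" ]; do
        tail -c +"$((i * part_size + 1))" "$file" | head -c "$part_size" |
          md5sum | cut -d ' ' -f 1
        i=$((i + 1))
      done | awk '{
        for (k = 1; k < length($0); k += 2) {
          hi = index("0123456789abcdef", substr($0, k, 1)) - 1
          lo = index("0123456789abcdef", substr($0, k + 1, 1)) - 1
          printf "\\%03o", hi * 16 + lo
        }
      }'
    )
    # MD5 of the concatenated binary digests, then "-<number of parts>".
    combined=$(printf "$escaped" | md5sum | cut -d ' ' -f 1)
    printf '%s-%s\n' "$combined" "$parts"
  fi
)

# The eTag stored on the uploaded object could be read with a command like the
# one below, assuming your credentials permit it (left commented out):
# aws s3api head-object --bucket genomeark-upload --key species/<species_name>/<species_id>/<data_type>/<file> --query ETag --output text
```

The result only matches the stored eTag when the part size passed in is the one actually used for the upload.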
There are ways to change the eTag, but please use the default behavior of the AWS CLI and do not change the eTag.
This is very important. Let us know when your upload is complete. After a short check of the file structure, your data will be transferred to GenomeArk.
TBA