VCF submission in detail

The transPLANT variation archive accepts variant calls in VCF format (version 4 and above) on known reference sequences, i.e. sequences present in the databases of the INSDC: ENA, GenBank, or DDBJ. Submitters must supply valid VCF and appropriate metadata: Institute, Title, Study reference, Assembly reference, Sample references, and Sequence references. The full process for VCF submission is as follows:

1) Register for an account on the transPLANT website
The user's email address and institute name, together with a random 32-character key (for identification), are stored in a back-end database.
2) An email is sent with VCF upload instructions
A new directory is created in the transPLANT ENA 'drop-box' FTP account, and an email with the FTP credentials and path is sent to the user.
3) Upload VCFs to the FTP site
Files are uploaded with any FTP client, using the supplied credentials and path.
4) Uploaded VCFs are checked and metadata extracted
A background process periodically lists all files in each user's FTP directory. When a new file is found, it is downloaded and parsed for the labels used in the samples line and in the 'chromosome' column. The MD5 hash of the file is also calculated. The parsed information is used to create an instance of the Config class, which holds all metadata required for ENA submission. This instance is stored together with information about the user.
5) An email is sent with a link to a web form
When the background process above has found new files, an email is sent to the user with a link to a page listing all processed files. From this list a VCF file can be selected for submission.
6) Missing metadata is collected via the web form
If the selected VCF contains fewer than 100 samples and fewer than 100 chromosomes, a form is presented for the entry of metadata (title, study reference, assembly reference, sample references, and sequence references). Otherwise the user is asked to download the whole Config instance as a single JSON file to be filled in locally. The completed file can then be uploaded, which has the same effect as submitting the form.
7) XML is generated for submission
Using the information in the Config instance (coming from the form or JSON upload) the 'submission' and 'analysis' XMLs are created.
8) VCFs and XML are submitted to ENA
These XMLs are submitted using the ENA REST API.
9) Newly generated ENA accessions are returned (or an error message)
The XML response of the submission is parsed and the user is presented with either the assigned submission and analysis accessions or any submission errors.
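
As an illustration of steps 4 and 6, the sketch below shows one way the sample labels, chromosome labels, and MD5 hash could be collected from an uploaded VCF, and how the choice between the web form and the JSON download could then be made. It is not the archive's actual code: the function names, the returned dictionary (standing in for the Config class), and the assumption that the file is uncompressed, tab-delimited text are all hypothetical.

```python
import hashlib
import io

FORM_LIMIT = 100  # threshold from step 6: below this, the web form is used


def scan_vcf(stream):
    """Read a binary VCF stream once, collecting labels and the MD5 hash.

    Hypothetical helper: returns a plain dict rather than a Config instance.
    """
    md5 = hashlib.md5()
    samples = []
    chromosomes = []
    seen = set()
    for raw in stream:
        md5.update(raw)  # hash the exact bytes that were uploaded
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The header has 9 fixed columns; sample labels follow them.
            samples = fields[9:]
            continue
        chrom = fields[0]  # first column of every data line
        if chrom not in seen:
            seen.add(chrom)
            chromosomes.append(chrom)
    return {"md5": md5.hexdigest(), "samples": samples, "chromosomes": chromosomes}


def use_web_form(info):
    """True if the form applies; otherwise the JSON download is offered."""
    return len(info["samples"]) < FORM_LIMIT and len(info["chromosomes"]) < FORM_LIMIT


# Small made-up VCF for demonstration only.
example = (
    b"##fileformat=VCFv4.1\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    b"1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t1/1\n"
    b"1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/1\n"
    b"2\t50\t.\tG\tA\t.\t.\t.\tGT\t1/1\t0/1\n"
)
info = scan_vcf(io.BytesIO(example))
```

Here `info["samples"]` holds the labels that need sample references and `info["chromosomes"]` the labels that need sequence references. Reading the file in a single pass keeps memory use low for large uploads.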
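
Step 9 can be sketched in a similar way. The example below assumes the response is a receipt document with a `success` attribute on the root element, `SUBMISSION` and `ANALYSIS` elements carrying an `accession` attribute, and `ERROR` elements holding messages. These names are an assumption made for illustration and should be checked against the ENA documentation; the accessions in the sample receipt are made up.

```python
import xml.etree.ElementTree as ET


def parse_receipt(xml_text):
    """Return accessions on success, or the error messages on failure.

    Hypothetical helper; element and attribute names are assumed.
    """
    root = ET.fromstring(xml_text)
    if root.get("success") == "true":
        return {
            "ok": True,
            "submission": [e.get("accession") for e in root.iter("SUBMISSION")],
            "analysis": [e.get("accession") for e in root.iter("ANALYSIS")],
        }
    errors = [(e.text or "").strip() for e in root.iter("ERROR")]
    return {"ok": False, "errors": errors}


# Made-up receipts for demonstration only.
ok_receipt = """<RECEIPT success="true">
  <ANALYSIS accession="ERZ0000001" alias="example-analysis"/>
  <SUBMISSION accession="ERA0000001" alias="example-submission"/>
</RECEIPT>"""

failed_receipt = """<RECEIPT success="false">
  <MESSAGES><ERROR>Example error message</ERROR></MESSAGES>
</RECEIPT>"""

accepted = parse_receipt(ok_receipt)
rejected = parse_receipt(failed_receipt)
```

Returning one dictionary for both outcomes lets the calling page show either the accessions or the error list without handling exceptions.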
