As any bioinformatician knows, there are few things more frustrating than trying to understand how to use someone else's program. I struggled with this myself while working on this package. However, in the realm of scientific research, we must learn to appreciate the stringency of our frequently used tools. I will not tell you to ignore the various warnings and errors produced by VariTAS in this vignette, because they are essential to ensure that the pipeline produces statistically robust, reproducible results.
That being said, I empathise with the frustration of trying to use a new tool only to be met with a barrage of errors and incompatible data. So to minimise the amount of time you have to spend interpreting laconic error messages and resubmitting processes, I have written this guide. I hope that it helps to explain why these errors are thrown and more importantly, how to make them go away.
These are errors thrown when the pipeline is verifying the various options and parameters submitted to it through the config file. This includes a number of 'file ____ does not exist'-type errors that I have omitted for what I hope are obvious reasons.
An incompatible stage has been submitted to the main pipeline function. The only supported stages are 'alignment', 'qc', 'calling', 'annotation', and 'merging'.
Ensure that the
start.stage parameter is set to one of the allowed stages.
varitas.optionsmust be a list of options or a string giving the path to the config YAML file
Whatever you have tried to use as the VariTAS options file is incorrect. You shouldn't see this error if you're following the template in the Introduction vignette.
Ensure that you are pointing to the correct file when submitting it to
overwrite.varitas.options. It should be based on the
config.yaml file contained in the
inst directory of this package.
There must be a
reference_build parameter set somewhere in the config file so that the script knows which version of the genome you are using. This setting is present in the
config.yaml file found in the
inst directory of this package.
Add a parameter to the config file called
reference_build and make sure it's set to either 'grch37' or 'grch38' (anything else will cause you to run into the next error).
reference_buildmust be either grch37 or grch38
reference_build parameter in the config file can only be set to either 'grch37' or 'grch38', which are the two versions of the human genome supported by the pipeline. See also the previous error.
reference_build is set to your version of the genome, in the form of either 'grch37' or 'grch38'.
Only reference genomes in the FASTA format are supported by the various tools used in this pipeline. Of course, your genome might already be in FASTA format with a different file extension, but it's better to be sure.
Use a reference in FASTA format with the .fa or .fasta file extension.
target_panelmust be provided for alignment and variant calling stages
As VariTAS is meant to be run on data from amplicon sequencing experiments, some of the stages require a file detailing the target panel. This should be in the form of a BED file, the format of which is described here.
Ensure that you have a properly formatted BED file supplied as the
target_panel parameter in the config file.
Followed by “Reference genome chromosomes: ____ Target panel chromosomes: ____”. This error probably looks familiar if you've ever had the great priviledge of working with GATK. Essentially, the chromosomes listed in your target panel don't match up with those in the reference genome. In practice, it means you have one or more chromosomes in the target panel that are not in the reference.
This issue can arise from a few different places, so be sure to check that it's not something very simple first.
This issue and the next two are related to preparing the reference genome file. Various tools require that large FASTA files are indexed and have sequence dictionaries so that they can be parsed quickly. Once you fix these issues, they shouldn't come up again as long as the index files are in the same directory as the reference.
bwa index on the indicated file.
gatk CreateSequenceDictionary on the indicated file.
See above (x2)
samtools faidx on the indicated file.