For my colleague Julien: VCF with one sample called using different callers. Only keep a variant if it was found in x other files, where min <= x <= max.
Usage: vcfcomparecallersonesample [options] Files
Options:
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-o, --output
Output file. Optional. Default: stdout
--version
print version and exit
-M
max number of challengers found, inclusive.
Default: 2147483646
-a
ignore ALT allele
Default: false
-f
VCF to be challenged. Must be sorted on dict. Must contain a dict.
Default: []
-m
min number of challengers found, inclusive.
Default: 0
The java executable must be in your ${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23 ).
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew vcfcomparecallersonesample
The java jar file will be installed in the dist directory.
The project is licensed under the MIT license.
Should you cite vcfcomparecallersonesample? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
I used:
java -jar dist/VcfCompareCallersOneSample.jar -m1 -M1 -f samtools.vcf gatk.vcf
java -jar dist/VcfCompareCallersOneSample.jar -m1 -M1 -f gatk.vcf samtools.vcf
Shouldn’t I get the same number of variants in both files? The answer is “not always”, as in the following case:
in gatk.vcf:
11 244197 rs1128322 T C
in samtools.vcf:
11 244197 rs1128322 T C,G
In
java -jar dist/VcfCompareCallersOneSample.jar -m1 -M1 -f samtools.vcf gatk.vcf
we keep the variant because ‘C’ is found in both samtools.vcf and gatk.vcf. In
java -jar dist/VcfCompareCallersOneSample.jar -m1 -M1 -f gatk.vcf samtools.vcf
the variant is discarded because ‘G’ is found in samtools.vcf but not in gatk.vcf.
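
The asymmetry can be reproduced with a small Python sketch. This is a simplified model inferred from the example above, not the tool's actual code. It assumes a challenger "finds" a variant when it has a record with the same contig, position and REF whose ALT alleles include every ALT allele of the input variant. The names Variant, found_in and keep are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal VCF record: contig, position, REF and ALT set."""
    chrom: str
    pos: int
    ref: str
    alts: frozenset


def found_in(variant, challenger, ignore_alt=False):
    """Assumed rule: same site and REF, and all input ALTs present in the challenger."""
    for other in challenger:
        same_site = (other.chrom, other.pos, other.ref) == (
            variant.chrom, variant.pos, variant.ref)
        if same_site and (ignore_alt or variant.alts <= other.alts):
            return True
    return False


def keep(variant, challengers, min_found=0, max_found=2147483646, ignore_alt=False):
    """Keep the variant if min_found <= number of challengers finding it <= max_found."""
    count = sum(found_in(variant, c, ignore_alt) for c in challengers)
    return min_found <= count <= max_found


# The two records from the example above.
gatk = [Variant("11", 244197, "T", frozenset({"C"}))]
samtools = [Variant("11", 244197, "T", frozenset({"C", "G"}))]

# Equivalent of: -m1 -M1 -f samtools.vcf gatk.vcf
kept_from_gatk = [v for v in gatk if keep(v, [samtools], 1, 1)]
# Equivalent of: -m1 -M1 -f gatk.vcf samtools.vcf
kept_from_samtools = [v for v in samtools if keep(v, [gatk], 1, 1)]

print(len(kept_from_gatk), len(kept_from_samtools))
```

Under this model, passing ignore_alt=True (the counterpart of -a) compares only the site and REF, so the two directions agree for this record.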