Variant annotation : variants clustered near the ends of reads
Usage: vcfclusteredreadedge [options] Files
Options:
-B, --bams
Paths of indexed BAM files with read groups. You can put those paths in a
text file having a *.list suffix
Default: []
-d, --distance
minimal distance to the end of the **CLIPPED** read.
Default: 1
-filter, --filter
A JEXL Expression that will be used to filter out some sam-records (see
https://software.broadinstitute.org/gatk/documentation/article.php?id=1255).
An expression should return a boolean value (true=exclude, false=keep
the read). An empty expression keeps everything. The variable 'record'
is the current observed read, an instance of SAMRecord (https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMRecord.html).
Default: record.getMappingQuality()<1 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()
-gt, --gt
Genotype FILTER name
Default: EDGEVAR
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-o, --output
Output file. Optional. Default: stdout
--version
print version and exit
-vt, --vt
Variant FILTER name: set if ALL Genotypes have a variant near the edge.
Default: EDGEVAR
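To make the true=exclude convention of --filter concrete, here is a minimal Python sketch that mirrors the default expression. It is only an illustration, not the tool's code: the `Read` class and its field names are hypothetical stand-ins for the SAMRecord getters used in the JEXL expression.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for a SAM record (field names are illustrative)."""
    mapping_quality: int = 60
    is_duplicate: bool = False
    is_qcfail: bool = False
    is_secondary: bool = False
    is_supplementary: bool = False


def exclude(read: Read) -> bool:
    """Mirror of the default filter: True means the read is excluded."""
    return (
        read.mapping_quality < 1      # record.getMappingQuality()<1
        or read.is_duplicate          # record.getDuplicateReadFlag()
        or read.is_qcfail             # record.getReadFailsVendorQualityCheckFlag()
        or read.is_secondary          # record.isSecondaryOrSupplementary()
        or read.is_supplementary
    )


reads = [
    Read(),                        # kept
    Read(mapping_quality=0),       # excluded: mapping quality below 1
    Read(is_duplicate=True),       # excluded: duplicate
    Read(is_supplementary=True),   # excluded: supplementary alignment
]
kept = [r for r in reads if not exclude(r)]
```

Only the first read survives here. An expression that always returns false would keep every read, which matches the behaviour described for an empty expression.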
Java must be in the ${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23).
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew vcfclusteredreadedge
The java jar file will be installed in the dist directory.
The project is licensed under the MIT license.
Should you cite vcfclusteredreadedge? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
For Sandro B.
GATK ClusteredReadPosition https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_cancer_ClusteredReadPosition.php only works with Mutect2
The program looks for SNVs in the VCF, then goes back to the reads in the BAM.
For a given variant, if all the reads carrying the variant have it at less than 'distance' from the end of the clipped read, then the genotype is FILTERED.
If all the genotypes are FILTERED, the variant is FILTERED.
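The decision rule described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the program's implementation: `ReadHit`, `edge_distance`, `genotype_is_edge` and `variant_is_edge` are hypothetical names, the distance is taken as the number of bases between the variant and the nearest clipped end of the read, and a genotype with no supporting read is assumed to be left unfiltered.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReadHit:
    """Hypothetical record: one read that carries the variant allele."""
    clipped_start: int  # first aligned (non-clipped) reference position, inclusive
    clipped_end: int    # last aligned (non-clipped) reference position, inclusive


def edge_distance(read: ReadHit, variant_pos: int) -> int:
    """Bases between the variant and the nearest clipped end (assumed definition)."""
    return min(variant_pos - read.clipped_start, read.clipped_end - variant_pos)


def genotype_is_edge(reads: List[ReadHit], variant_pos: int, distance: int = 1) -> bool:
    """True if every supporting read has the variant at less than `distance` from an end."""
    if not reads:
        return False  # assumption: no supporting read, nothing to filter
    return all(edge_distance(r, variant_pos) < distance for r in reads)


def variant_is_edge(reads_by_sample: Dict[str, List[ReadHit]],
                    variant_pos: int, distance: int = 1) -> bool:
    """True if the genotypes of all samples are flagged."""
    if not reads_by_sample:
        return False
    return all(genotype_is_edge(reads, variant_pos, distance)
               for reads in reads_by_sample.values())


# Sample S1: the variant sits on the last base of its only supporting read.
# Sample S2: one supporting read has the variant well inside the read.
samples = {
    "S1": [ReadHit(100, 200)],
    "S2": [ReadHit(200, 300), ReadHit(150, 250)],
}
s1_flag = genotype_is_edge(samples["S1"], 200)
s2_flag = genotype_is_edge(samples["S2"], 200)
site_flag = variant_is_edge(samples, 200)
```

In this example the genotype of S1 would be flagged while S2 is not, so the variant itself is not flagged, because the variant-level filter is only set when every genotype is flagged.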
java -jar dist/vcfclusteredreadedge.jar -B in.bam in.vcf
find . -name "*.bam" > tmp.list
java -jar dist/vcfclusteredreadedge.jar -B tmp.list in.vcf