Group VCF data by gene/transcript. By default it uses data from VEP and SnpEff.
This program is now part of the main jvarkit
tool. See jvarkit for compiling.
Usage: java -jar dist/jvarkit.jar groupbygene [options] Files
  Options:
    --cases
      File or comma-separated list of case samples
    --controls
      File or comma-separated list of control samples
    -e, -E, --extractors
      Gene extractor names, separated by spaces, semicolons or commas.
      custom:tag is a custom extractor that reads every value of INFO/tag
      as one or more gene names
      Default: ANN/GeneId VEP/GeneId BCSQ/gene SMOOVE SpliceAI
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    --gtFiltered, --ignore-filtered-gt
      ignore FILTERED genotypes
      Default: false
    -l, --list
      [20190626] list all available gene extractors
    --maxRecordsInRam
      When writing files that need to be sorted, this will specify the number
      of records stored in RAM before spilling to disk. Increasing this number
      reduces the number of file handles needed to sort a file, and increases
      the amount of RAM needed
      Default: 50000
    -o, --output
      Output file. Optional. Default: stdout
    --tmpDir
      Temporary working directory. Default: java.io.tmpdir
      Default: []
    --version
      print version and exit
    -positions
      include variant positions in the output table.
      Default: false
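To illustrate how the options above combine on one command line, here is a minimal sketch. The sample names are borrowed from the example table in this document; `input.vcf` and `genes.tsv` are hypothetical paths, and `groupbygene_cmd` is an illustrative helper, not part of jvarkit. The function only prints the command so it can be reviewed before anything is run.

```shell
# Sketch only: input.vcf and genes.tsv are hypothetical paths;
# the sample names come from the example table in this document.
groupbygene_cmd() {
    # Print the command line instead of running it, so it can be checked first.
    echo java -jar dist/jvarkit.jar groupbygene \
        --extractors VEP/GeneId \
        --cases M10475,M10478 \
        --controls M10500,M128215 \
        -positions \
        -o genes.tsv \
        input.vcf
}

groupbygene_cmd
```

Once the printed command looks right, it can be executed with `eval "$(groupbygene_cmd)"`, assuming jvarkit has been compiled into `dist/jvarkit.jar`.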
Creation date: 20140531
The project is licensed under the MIT license.
Should you cite groupbygene? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
Group VCF data by gene/transcript. By default it tries to use data from VEP and SnpEff
```
$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
 java -jar dist/jvarkit.jar groupbygene |\
 head | column -t
#chrom  min.POS    max.POS    gene.name  gene.type         samples.affected  count.variations  M10475  M10478  M10500  M128215
chr10   52004315   52004315   ASAH2      snpeff-gene-name  2                 1                 0       0       1       1
chr10   52004315   52004315   ASAH2      vep-gene-name     2                 1                 0       0       1       1
chr10   52497529   52497529   ASAH2B     snpeff-gene-name  2                 1                 0       1       1       0
chr10   52497529   52497529   ASAH2B     vep-gene-name     2                 1                 0       1       1       0
chr10   48003992   48003992   ASAH2C     snpeff-gene-name  3                 1                 1       1       1       0
chr10   48003992   48003992   ASAH2C     vep-gene-name     3                 1                 1       1       1       0
chr10   126678092  126678092  CTBP2      snpeff-gene-name  1                 1                 0       0       0       1
chr10   126678092  126678092  CTBP2      vep-gene-name     1                 1                 0       0       0       1
chr10   135336656  135369532  CYP2E1     snpeff-gene-name  3                 2                 0       2       1       1
```
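The output table is plain delimited text, so it can be post-processed with standard tools. As a sketch, assuming the column order shown in the example (gene name in column 4, gene type in column 5, `samples.affected` in column 6), the awk filter below keeps the genes affected in at least a given number of samples. The function name `genes_min_samples` and the threshold are illustrative and not part of jvarkit.

```shell
# Keep rows whose samples.affected column (6th) is >= the threshold passed as $1.
# Reads the groupbygene table on stdin and skips the header line starting with '#'.
# Assumes the column layout of the example output; adjust the field numbers otherwise.
genes_min_samples() {
    awk -v min="$1" '!/^#/ && $6 + 0 >= min + 0 { print $4, $5, $6 }'
}

# Demo on a header and three rows copied from the example output.
printf '%s\n' \
  '#chrom min.POS max.POS gene.name gene.type samples.affected count.variations M10475 M10478 M10500 M128215' \
  'chr10 52004315 52004315 ASAH2 snpeff-gene-name 2 1 0 0 1 1' \
  'chr10 48003992 48003992 ASAH2C snpeff-gene-name 3 1 1 1 1 0' \
  'chr10 135336656 135369532 CYP2E1 snpeff-gene-name 3 2 0 2 1 1' |
  genes_min_samples 3
```

With a threshold of 3, the demo prints the ASAH2C and CYP2E1 rows and drops ASAH2. In practice the function would sit at the end of a pipeline that starts with the groupbygene command.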