Restriction sites overlapping variations in a VCF
This program is now part of the main jvarkit
tool. See jvarkit for compilation instructions.
Usage: java -jar dist/jvarkit.jar vcfrebase [options] Files
Usage: vcfrebase [options] Files
Options:
-A, --attribute
VCF INFO attribute
Default: ENZ
--bcf-output
If this program writes a VCF to a file, the format is first guessed from
the file suffix. Otherwise, force BCF output. The currently supported BCF
version is 2.1, which is not compatible with bcftools/htslib (last
checked 2019-11-15)
Default: false
-E, -enzyme, --enzyme
restrict to that enzyme name. Default: use all enzymes
Default: []
--generate-vcf-md5
Generate MD5 checksum for VCF output.
Default: false
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-o, --out
Output file. Optional. Default: stdout
-R, -reference, --reference
Indexed fasta Reference file. This file must be indexed with samtools
faidx and with picard/gatk CreateSequenceDictionary or samtools dict
--version
print version and exit
-w, -weight, --weight
Minimum enzyme weight: 6 = 6-cutter like GAATTC, 2 = 2-cutter like ATNNNNNNAT
Default: 5.0
Creation date: 20131115
The project is licensed under the MIT license.
Should you cite vcfrebase? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
$ java -jar dist/jvarkit.jar vcfrebase -w 6 -R ~/data/human_g1k_v37.fasta src/test/resources/test_vcf01.vcf | bcftools annotate -x '^INFO/ENZ' | bcftools view --drop-genotypes | grep ENZ
##INFO=<ID=ENZ,Number=.,Type=String,Description="Enzyme overlapping: Format: (Name|Site|Sequence|pos|strand)">
##bcftools_annotateCommand=annotate -x ^INFO/ENZ; Date=Wed Nov 13 10:38:39 2019
1 852063 . G A 387 PASS ENZ=PflMI|CCANNNN^NTGG|CCAGGCCCTGG|852064|+
1 866893 . T C 431 PASS ENZ=SacI|GAGCT^C|GAGCtC|866889|+
1 875770 . A G 338 PASS ENZ=ClaI|AT^CGAT|ATCGaT|875766|+
1 909238 . G C 229 PASS ENZ=PmaCI|CAC^GTG|CACgTG|909235|+
1 913889 . G A 372 PASS ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGCCCCgT|913880|-
1 918384 . G T 489 PASS ENZ=DraIII|CACNNN^GTG|CACgCCGTG|918381|+
1 933790 . G A 436 PASS ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGAGGGgT|933781|-
1 940005 . A G 188 PASS ENZ=GsuI|CTGGAG(16/14)|CTGGAG|940006|+,BaeI|(10/15)ACNNNNGTAYC(12/7)|GGTaCTGGAGT|940002|-
1 940096 . C T 487 PASS ENZ=BcgI|(10/12)CGANNNNNNTGC(12/10)|cGAGGTGGGTGC|940096|+
1 950113 . GAAGT G 1427 PASS ENZ=Eco57I|CTGAAG(16/14)|CTgaag|950111|+
1 950243 . A C 182 PASS ENZ=BclI|T^GATCA|TGaTCA|950241|+
1 951283 . C T 395 PASS ENZ=NarI|GG^CGCC|GGcGCC|951281|+
1 951564 . A G 105 PASS ENZ=BstXI|CCANNNNN^NTGG|CCaAGTAGTTGG|951562|+
1 952003 . G A 177 PASS ENZ=Bpu10I|CCTNAGC(-5/-2)|CCTCAGC|952004|+,BbvCI|CCTCAGC(-5/-2)|CCTCAGC|952004|+
1 952428 . G A 456 PASS ENZ=EciI|GGCGGA(11/9)|TCCgCC|952425|-
1 953952 . G A 490 PASS ENZ=BsrDI|GCAATG(2/0)|CATTgC|953948|-
1 959155 . G A 370 PASS ENZ=BarI|(7/12)GAAGNNNNNNTAC(12/7)|gAAGCCGCTCTAC|959155|+
1 959231 . G A 350 PASS ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGGTCCgT|959222|-
1 960409 . G C 357 PASS ENZ=BseYI|CCCAGC(-5/-1)|CCCAgC|960405|+
1 962210 . A G 300 PASS ENZ=NcoI|C^CATGG|CCaTGG|962208|+
1 964389 . C T 32 LowGQXHetSNP;LowGQXHomSNP ENZ=BseYI|CCCAGC(-5/-1)|cCCAGC|964389|+
1 967658 . C T 515 PASS ENZ=StuI|AGG^CCT|AGGCcT|967654|+
1 970215 . G C 379 PASS ENZ=DrdI|GACNNNN^NNGTC|GACCCCTCGGTC|970216|+
1 972180 . G A 403 PASS ENZ=AgeI|A^CCGGT|ACCgGT|972177|+
1 1004957 . G A 316 PASS ENZ=BsgI|GTGCAG(16/14)|gTGCAG|1004957|+
1 1004980 . G A 292 PASS ENZ=BsePI|G^CGCGC|gCGCGC|1004980|+
1 1011087 . CG C 1052 PASS ENZ=Eam1105I|GACNNN^NNGTC|GACTCTCAGTc|1011077|+
1 1017170 . C G 507 PASS ENZ=AloI|(7/12)GAACNNNNNNTCC(12/7)|GAACAGAGcATCC|1017162|+
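The ENZ value in the output above can be split into its fields downstream. Below is a minimal Python sketch, not part of jvarkit, that follows the format given in the INFO header (Name|Site|Sequence|pos|strand, with several hits separated by commas). The names `EnzymeHit`, `parse_enz` and `lowercase_position` are hypothetical. The sketch also assumes, based only on the sample records shown here, that `pos` is the 1-based coordinate of the first base of `Sequence` and that a lowercase base marks where the variant falls inside the site.

```python
from typing import List, NamedTuple, Optional


class EnzymeHit(NamedTuple):
    """One enzyme hit from an ENZ INFO value (hypothetical helper type)."""
    name: str
    site: str
    sequence: str
    pos: int
    strand: str


def parse_enz(value: str) -> List[EnzymeHit]:
    """Split an ENZ value: hits are comma-separated, fields are pipe-separated."""
    hits = []
    for item in value.split(","):
        name, site, sequence, pos, strand = item.split("|")
        hits.append(EnzymeHit(name, site, sequence, int(pos), strand))
    return hits


def lowercase_position(hit: EnzymeHit) -> Optional[int]:
    """Return the coordinate of the first lowercase base, or None if there is none.

    Assumption: hit.pos is the 1-based coordinate of the first base of
    hit.sequence, as the sample records suggest.
    """
    for offset, base in enumerate(hit.sequence):
        if base.islower():
            return hit.pos + offset
    return None


# ENZ value copied from the record at 1:940005 in the sample output.
hits = parse_enz(
    "GsuI|CTGGAG(16/14)|CTGGAG|940006|+,"
    "BaeI|(10/15)ACNNNNGTAYC(12/7)|GGTaCTGGAGT|940002|-"
)
for hit in hits:
    print(hit.name, hit.pos, hit.strand, lowercase_position(hit))
```

For the record at 1:940005, the BaeI hit yields 940005 (the variant position), while the GsuI hit has no lowercase base and yields `None`.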