jvarkit

VcfRebase

Last commit

Restriction sites overlaping variations in a vcf

Usage

This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: java -jar dist/jvarkit.jar vcfrebase  [options] Files

Usage: vcfrebase [options] Files
  Options:
    -A, --attribute
      VCF INFO attribute
      Default: ENZ
    --bcf-output
      If this program writes a VCF to a file, The format is first guessed from 
      the file suffix. Otherwise, force BCF output. The current supported BCF 
      version is : 2.1 which is not compatible with bcftools/htslib (last 
      checked 2019-11-15)
      Default: false
    -E, -enzyme, --enzyme
      restrict to that enzyme name. Default: use all enzymes
      Default: []
    --generate-vcf-md5
      Generate MD5 checksum for VCF output.
      Default: false
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    -o, --out
      Output file. Optional . Default: stdout
    -R, -reference, --reference
      Indexed fasta Reference file. This file must be indexed with samtools 
      faidx and with picard/gatk CreateSequenceDictionary or samtools dict
    --version
      print version and exit
    -w, -weight, --weight
      min enzyme weight 6 = 6 cutter like GAATTC, 2 = 2 cutter like ATNNNNNNAT
      Default: 5.0

Keywords

Creation Date

20131115

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/vcfrebase/VcfRebase.java

Unit Tests

https://github.com/lindenb/jvarkit/tree/master/src/test/java/com/github/lindenb/jvarkit/tools/vcfrebase/VcfRebaseTest.java

Contribute

License

The project is licensed under the MIT license.

Citing

Should you cite vcfrebase ? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

$ java -jar dist/vcfrebase.jar -w 6 -R ~/data/human_g1k_v37.fasta src/test/resources/test_vcf01.vcf | bcftools annotate -x '^INFO/ENZ' | bcftools view --drop-genotypes | grep ENZ

##INFO=<ID=ENZ,Number=.,Type=String,Description="Enzyme overlapping: Format: (Name|Site|Sequence|pos|strand)">
##bcftools_annotateCommand=annotate -x ^INFO/ENZ; Date=Wed Nov 13 10:38:39 2019
1	852063	.	G	A	387	PASS	ENZ=PflMI|CCANNNN^NTGG|CCAGGCCCTGG|852064|+
1	866893	.	T	C	431	PASS	ENZ=SacI|GAGCT^C|GAGCtC|866889|+
1	875770	.	A	G	338	PASS	ENZ=ClaI|AT^CGAT|ATCGaT|875766|+
1	909238	.	G	C	229	PASS	ENZ=PmaCI|CAC^GTG|CACgTG|909235|+
1	913889	.	G	A	372	PASS	ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGCCCCgT|913880|-
1	918384	.	G	T	489	PASS	ENZ=DraIII|CACNNN^GTG|CACgCCGTG|918381|+
1	933790	.	G	A	436	PASS	ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGAGGGgT|933781|-
1	940005	.	A	G	188	PASS	ENZ=GsuI|CTGGAG(16/14)|CTGGAG|940006|+,BaeI|(10/15)ACNNNNGTAYC(12/7)|GGTaCTGGAGT|940002|-
1	940096	.	C	T	487	PASS	ENZ=BcgI|(10/12)CGANNNNNNTGC(12/10)|cGAGGTGGGTGC|940096|+
1	950113	.	GAAGT	G	1427	PASS	ENZ=Eco57I|CTGAAG(16/14)|CTgaag|950111|+
1	950243	.	A	C	182	PASS	ENZ=BclI|T^GATCA|TGaTCA|950241|+
1	951283	.	C	T	395	PASS	ENZ=NarI|GG^CGCC|GGcGCC|951281|+
1	951564	.	A	G	105	PASS	ENZ=BstXI|CCANNNNN^NTGG|CCaAGTAGTTGG|951562|+
1	952003	.	G	A	177	PASS	ENZ=Bpu10I|CCTNAGC(-5/-2)|CCTCAGC|952004|+,BbvCI|CCTCAGC(-5/-2)|CCTCAGC|952004|+
1	952428	.	G	A	456	PASS	ENZ=EciI|GGCGGA(11/9)|TCCgCC|952425|-
1	953952	.	G	A	490	PASS	ENZ=BsrDI|GCAATG(2/0)|CATTgC|953948|-
1	959155	.	G	A	370	PASS	ENZ=BarI|(7/12)GAAGNNNNNNTAC(12/7)|gAAGCCGCTCTAC|959155|+
1	959231	.	G	A	350	PASS	ENZ=BsaXI|(9/12)ACNNNNNCTCC(10/7)|GGAGGGTCCgT|959222|-
1	960409	.	G	C	357	PASS	ENZ=BseYI|CCCAGC(-5/-1)|CCCAgC|960405|+
1	962210	.	A	G	300	PASS	ENZ=NcoI|C^CATGG|CCaTGG|962208|+
1	964389	.	C	T	32	LowGQXHetSNP;LowGQXHomSNP	ENZ=BseYI|CCCAGC(-5/-1)|cCCAGC|964389|+
1	967658	.	C	T	515	PASS	ENZ=StuI|AGG^CCT|AGGCcT|967654|+
1	970215	.	G	C	379	PASS	ENZ=DrdI|GACNNNN^NNGTC|GACCCCTCGGTC|970216|+
1	972180	.	G	A	403	PASS	ENZ=AgeI|A^CCGGT|ACCgGT|972177|+
1	1004957	.	G	A	316	PASS	ENZ=BsgI|GTGCAG(16/14)|gTGCAG|1004957|+
1	1004980	.	G	A	292	PASS	ENZ=BsePI|G^CGCGC|gCGCGC|1004980|+
1	1011087	.	CG	C	1052	PASS	ENZ=Eam1105I|GACNNN^NNGTC|GACTCTCAGTc|1011077|+
1	1017170	.	C	G	507	PASS	ENZ=AloI|(7/12)GAACNNNNNNTCC(12/7)|GAACAGAGcATCC|1017162|+