jvarkit

VcfToBed

vcf to bed

Usage

Usage: vcf2bed [options] Files
  Options:
    -F, --format
      output format
      Default: bed
      Possible Values: [bed, interval]
    -header, --header
      Print Header
      Default: false
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    -M, --max
      Optional filter: max sequence length. A distance specified as a positive 
      integer.Commas are removed. The following suffixes are interpreted : 
      b,bp,k,kb,m,mb 
    -m, --min
      Optional filter: min sequence length. A distance specified as a positive 
      integer.Commas are removed. The following suffixes are interpreted : 
      b,bp,k,kb,m,mb 
    -c, --no-ci
      For structural variant, ignore the extention of the boundaries using 
      INFO/CIPOS and INFO/CIEND
      Default: false
    -o, --output
      Output file. Optional . Default: stdout
    -R, --reference, --dict
      A SAM Sequence dictionary source: it can be a *.dict file, a fasta file 
      indexed with 'picard CreateSequenceDictionary', or any hts file 
      containing a dictionary (VCF, BAM, CRAM, intervals...)
    -x, --slop
      Extends interval. Extending interval. The following syntaxes are 
      supported: 1000; 1kb; 1,000; 30%(shrink); 150% (extend); 0.5 (shrink); 
      1.5 (extend)
      Default: 0
    --version
      print version and exit
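
The distance syntax accepted by `--min` and `--max` can be illustrated with a small Python sketch. This is not the tool's own parser: the function name `parse_distance`, the case-insensitive matching, and the multipliers (b/bp = 1, k/kb = 1,000, m/mb = 1,000,000) are assumptions based on the help text above.

```python
import re

# Assumed multipliers for the suffixes listed in the help text.
_UNITS = {"": 1, "b": 1, "bp": 1, "k": 1_000, "kb": 1_000,
          "m": 1_000_000, "mb": 1_000_000}


def parse_distance(text: str) -> int:
    """Parse strings such as '1000', '1,000', '1kb' or '2mb' into a number of bases."""
    # Commas are removed, as stated in the help text.
    cleaned = text.replace(",", "").strip().lower()
    match = re.fullmatch(r"(\d+)\s*(bp|b|kb|k|mb|m)?", cleaned)
    if match is None:
        raise ValueError(f"not a valid distance: {text!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix or ""]


if __name__ == "__main__":
    for example in ("1000", "1,000", "1kb", "2mb"):
        print(example, "->", parse_distance(example))
```

Under these assumptions, `1,000` and `1kb` both denote 1000 bases.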

Keywords

Compilation

Requirements / Dependencies

Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew vcf2bed

The java jar file will be installed in the dist directory.

Creation Date

20181203

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/misc/VcfToBed.java

Unit Tests

https://github.com/lindenb/jvarkit/tree/master/src/test/java/com/github/lindenb/jvarkit/tools/misc/VcfToBedTest.java

Contribute

License

The project is licensed under the MIT license.

Citing

Should you cite vcf2bed? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Motivation

I’m lazy about using awk or bioalcidaejdk for this task, and I want something that uses INFO/CIPOS and INFO/CIEND for structural variants.

Input

Input is one or more VCF files.

A single file ending with ‘.list’ is interpreted as a list of paths (one per line).

A single file ending with ‘.zip’, ‘.tar’ or ‘.tar.gz’ is interpreted as an archive, and all the files that look like VCF files are extracted.

If there is no input, the program reads VCF from stdin.
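
The input rules above can be summarized as a small dispatch on the arguments. The following Python sketch only mirrors the rules as written here; the name `expand_inputs` and the `(kind, path)` return shape are hypothetical and are not taken from the jvarkit source.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz")


def expand_inputs(args: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (kind, path) pairs, where kind is 'stdin', 'archive' or 'vcf'."""
    args = list(args)
    if not args:
        # No input: read VCF from standard input.
        return [("stdin", "-")]
    if len(args) == 1:
        only = args[0]
        if only.endswith(".list"):
            # One path per line; blank lines are skipped (an assumption).
            lines = Path(only).read_text().splitlines()
            return [("vcf", line.strip()) for line in lines if line.strip()]
        if only.endswith(ARCHIVE_SUFFIXES):
            # The archive would then be scanned for VCF-like entries.
            return [("archive", only)]
    # Otherwise every argument is treated as a VCF file.
    return [("vcf", path) for path in args]
```

For example, `expand_inputs(["calls.list"])` reads the paths from the hypothetical file `calls.list`, while `expand_inputs([])` selects stdin.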

Example

$ wget -q -O - "https://github.com/hall-lab/cshl_sv_2014/blob/master/supplemental/NA12878.lumpy.vcf?raw=true" |\
	grep -A 10 '#CHROM'
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA12878
1	869423	1	G	<DEL>	345.50	.	SVTYPE=DEL;SVLEN=-857;END=870280;STR=+-:25;IMPRECISE;CIPOS=-1,34;CIEND=0,0;EVENT=1;SUP=25;PESUP=25;SRSUP=0;EVTYPE=PE;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	1/1:25:25:0:0.00:71:19:52:345.50:-36,-8,-1
1	1588585	5	A	<DUP>	0.00	.	SVTYPE=DUP;SVLEN=65356;END=1653941;STR=-+:7;IMPRECISE;CIPOS=-126,1;CIEND=-2,67;EVENT=5;SUP=7;PESUP=7;SRSUP=0;EVTYPE=PE;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:7:7:0:74.40:139:125:13:0.00:-1,-8,-24
1	1594964	6	C	<DUP>	0.00	.	SVTYPE=DUP;SVLEN=65855;END=1660819;STR=-+:8;IMPRECISE;CIPOS=-81,2;CIEND=-1,127;EVENT=6;SUP=8;PESUP=8;SRSUP=0;EVTYPE=PE;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:8:8:0:77.96:153:137:15:0.00:-1,-9,-25
1	2566176	7	A	<DEL>	121.20	.	SVTYPE=DEL;SVLEN=-418;END=2566594;STR=+-:14;IMPRECISE;CIPOS=-2,68;CIEND=0,0;EVENT=7;SUP=14;PESUP=14;SRSUP=0;EVTYPE=PE;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:14:14:0:0.00:78:44:33:121.20:-13,-1,-12
1	2911548	8	G	<DEL>	440.34	.	SVTYPE=DEL;SVLEN=-302;END=2911850;STR=+-:20;CIPOS=0,0;CIEND=0,0;EVENT=8;SUP=20;PESUP=8;SRSUP=12;EVTYPE=PE,SR;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:20:8:12:0.00:185:86:99:440.34:-48,-4,-15
1	2919034	9	G	<DEL>	289.83	.	SVTYPE=DEL;SVLEN=-332;END=2919366;STR=+-:22;CIPOS=0,0;CIEND=0,0;EVENT=9;SUP=22;PESUP=10;SRSUP=12;EVTYPE=PE,SR;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:22:10:12:0.00:160:86:74:289.83:-31,-2,-20
1	5447229	14	G	<DUP>	380.12	.	SVTYPE=DUP;SVLEN=210;END=5447439;STR=-+:11;CIPOS=0,0;CIEND=0,0;EVENT=14;SUP=11;PESUP=1;SRSUP=10;EVTYPE=PE,SR;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	1/1:11:1:10:0.00:197:104:93:380.12:-39,-7,-1
1	5876603	15	G	<DEL>	0.00	.	SVTYPE=DEL;SVLEN=-928;END=5877531;STR=+-:8;CIPOS=0,0;CIEND=0,0;EVENT=15;SUP=8;PESUP=0;SRSUP=8;EVTYPE=SR;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:8:0:8:294.07:169:168:0:0.00:-8,-37,-117
1	5877530	16	T	<DEL>	63.31	.	SVTYPE=DEL;SVLEN=-72;END=5877602;STR=+-:13;CIPOS=0,0;CIEND=0,0;EVENT=16;SUP=13;PESUP=0;SRSUP=13;EVTYPE=SR;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:13:0:13:0.00:188:136:51:63.31:-10,-4,-54
1	6619067	19_1	T	[1:6619506[T	0.00	.	SVTYPE=BND;STR=--:7;IMPRECISE;CIPOS=-88,1;CIEND=-26,2;MATEID=19_2;EVENT=19;SUP=7;PESUP=7;SRSUP=0;EVTYPE=PE;PRIN	GT:SUP:PE:SR:GQ:DP:RO:AO:SQ:GL	0/1:7:7:0:127.76:131:117:13:0.00:-1,-14,-66


$ wget -q -O - "https://github.com/hall-lab/cshl_sv_2014/blob/master/supplemental/NA12878.lumpy.vcf?raw=true" |\
	java -jar dist/vcf2bed.jar |\
	head

1	869421	870280	1	345
1	1588458	1654008	5	0
1	1594882	1660946	6	0
1	2566173	2566594	7	121
1	2911547	2911850	8	440
1	2919033	2919366	9	289
1	5447228	5447439	14	380
1	5876602	5877531	15	0
1	5877529	5877602	16	63
1	6618978	6619069	19_1	0
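
The coordinates above are consistent with a simple rule: the BED start is POS minus one (BED is 0-based) plus the lower bound of INFO/CIPOS, the BED end is INFO/END plus the upper bound of INFO/CIEND, and the score is QUAL truncated to an integer. The Python sketch below reproduces that arithmetic. It is inferred from the example output only, not from the tool's source. The function name `vcf_record_to_bed`, the fallback to POS when INFO/END is absent, and the behaviour of `use_ci=False` (meant to mimic `--no-ci`) are assumptions.

```python
from typing import Optional, Tuple


def vcf_record_to_bed(chrom: str, pos: int, var_id: str, qual: float,
                      end: Optional[int] = None,
                      cipos: Tuple[int, int] = (0, 0),
                      ciend: Tuple[int, int] = (0, 0),
                      use_ci: bool = True) -> Tuple[str, int, int, str, int]:
    """Compute a BED line (chrom, start, end, name, score) for one variant."""
    # Assumption: without INFO/END (e.g. the BND record) the interval ends at POS.
    stop = pos if end is None else end
    start = pos
    if use_ci:
        start += cipos[0]  # lower bound of the confidence interval around POS
        stop += ciend[1]   # upper bound of the confidence interval around END
    # BED starts are 0-based; BED ends are exclusive, i.e. the 1-based end.
    return (chrom, start - 1, stop, var_id, int(qual))


if __name__ == "__main__":
    # First record of the example: POS=869423, END=870280, CIPOS=-1,34, CIEND=0,0
    bed = vcf_record_to_bed("1", 869423, "1", 345.50, end=870280,
                            cipos=(-1, 34), ciend=(0, 0))
    print("\t".join(str(field) for field in bed))
```

For the first record this gives a start of 869423 - 1 - 1 = 869421 and an end of 870280 + 0 = 870280, matching the first output line.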

With tar.gz

$ tar cvfz ~/jeter.tar.gz src/test/resources/rotavirus_rf.*.vcf.gz
$ tar tvfz ~/jeter.tar.gz
-rw-r--r-- lindenb/lindenb 5805 2019-01-11 18:29 src/test/resources/rotavirus_rf.ann.vcf.gz
-rw-r--r-- lindenb/lindenb 27450 2019-01-11 18:29 src/test/resources/rotavirus_rf.freebayes.vcf.gz
-rw-r--r-- lindenb/lindenb  7366 2019-01-11 18:29 src/test/resources/rotavirus_rf.unifiedgenotyper.vcf.gz


With zip

$ zip ~/jeter.zip src/test/resources/rotavirus_rf.*.vcf.gz
$ unzip -l ~/jeter.zip

$ java -jar dist/vcf2bed.jar ~/jeter.zip  | wc