jvarkit

VcfToIntervals

Split a VCF into intervals (interval_list or BED) for parallelization.

Usage

This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: java -jar dist/jvarkit.jar vcf2intervals  [options] Files

Usage: vcf2intervals [options] Files
  Options:
    --bed, --bed-output
      force BED format as output. (Default is '.interval_list')
      Default: false
    -D, --distance
      min size of an interval (or use option -N). A distance specified as a 
      positive integer. Commas are removed. The following suffixes are 
      interpreted: b,bp,k,kb,m,mb,g,gb
      Default: -1
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    --intervals, --bed-input
      Search for intervals for EACH record of the provided bed file. VCF path 
      must be provided and indexed.
    --min-distance
      extends the interval if the last variant is within distance 'x' of the 
      next interval. Ignored if negative. A distance specified as a positive 
      integer. Commas are removed. The following suffixes are interpreted: 
      b,bp,k,kb,m,mb,g,gb 
      Default: -1
    -N, --variants, --n-variants
      number of variants per interval (or use option -D)
      Default: -1
    -o, --output
      Output file. Optional. Default: stdout
    --version
      print version and exit
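
The distance syntax accepted by -D and --min-distance can be illustrated with a small parser. This is only a sketch, not the parser jvarkit uses: the function name `parse_distance` is hypothetical, and the decimal multipliers (k = 1,000, m = 1,000,000, g = 1,000,000,000) and the case-insensitive suffix matching are assumptions, since the help text lists the suffixes without defining them.

```python
import re

# ASSUMPTION: decimal multipliers; the help text lists the suffixes only.
_SUFFIXES = {
    "": 1, "b": 1, "bp": 1,
    "k": 10**3, "kb": 10**3,
    "m": 10**6, "mb": 10**6,
    "g": 10**9, "gb": 10**9,
}


def parse_distance(text: str) -> int:
    """Hypothetical helper: convert '1,000', '10kb' or '2Mb' to base pairs."""
    # Commas are removed, as stated in the help text.
    # Lower-casing (case-insensitive suffixes) is an assumption.
    cleaned = text.replace(",", "").strip().lower()
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", cleaned)
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"not a valid distance: {text!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


if __name__ == "__main__":
    for example in ("300", "1,000", "10kb", "2Mb"):
        print(example, "->", parse_distance(example))
```

The sketch rejects negative values, so the default of -1 (option disabled) would have to be handled before calling it.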

Creation Date

20211112

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/vcf2intervals/VcfToIntervals.java

License

The project is licensed under the MIT license.

Citing

Should you cite vcf2intervals? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Input

Input is a VCF file or a VCF stream. Input must be sorted on chrom/pos.
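
A quick pre-flight check of the sort requirement might look like the sketch below. It is not part of jvarkit. The function names are hypothetical, and "sorted" is taken to mean that each contig appears as one contiguous block with non-decreasing positions. Contig order is not compared against the VCF header, which is a simplifying assumption.

```python
import gzip
from typing import Iterable, Optional, Tuple


def first_unsorted_record(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (chrom, pos) of the first record breaking the order, else None."""
    seen = set()          # contigs already encountered
    prev_chrom = None
    prev_pos = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue      # skip header lines and blank lines
        fields = line.split("\t", 2)
        chrom, pos = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen:
                return (chrom, pos)   # contig reappears after another one
            seen.add(chrom)
            prev_chrom = chrom
        elif pos < prev_pos:
            return (chrom, pos)       # position decreases within a contig
        prev_pos = pos
    return None


def check_vcf(path: str) -> Optional[Tuple[str, int]]:
    """Hypothetical wrapper: open a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_unsorted_record(handle)
```

A return value of `None` means no ordering problem was found under these assumptions.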

Example

$ java -jar dist/jvarkit.jar vcf2intervals -N 5 src/test/resources/rotavirus_rf.vcf.gz --min-distance 1
@HD	VN:1.6	SO:coordinate
@SQ	SN:RF01	LN:3302
@SQ	SN:RF02	LN:2687
@SQ	SN:RF03	LN:2592
@SQ	SN:RF04	LN:2362
@SQ	SN:RF05	LN:1579
@SQ	SN:RF06	LN:1356
@SQ	SN:RF07	LN:1074
@SQ	SN:RF08	LN:1059
@SQ	SN:RF09	LN:1062
@SQ	SN:RF10	LN:751
@SQ	SN:RF11	LN:666
@RG	ID:S1	SM:S1
@RG	ID:S2	SM:S2
@RG	ID:S3	SM:S3
@RG	ID:S4	SM:S4
@RG	ID:S5	SM:S5
@CO	vcf2intervals. compilation:20211112182935 githash:9b2ab03 htsjdk:2.24.1 date:20211112183125. cmd:-N 5 src/test/resources/rotavirus_rf.vcf.gz --min-distance 1
RF01	970	970	1	1
RF02	251	1965	5	1715
RF03	1221	2150	5	930
RF03	2201	2573	3	373
RF04	887	1860	5	974
RF04	1900	1920	2	21
RF05	41	1297	5	1257
RF05	1339	1339	1	1
RF06	517	1132	5	616
RF07	98	952	4	855
RF08	926	992	2	67
RF09	294	414	3	121
RF10	46	175	3	130
RF11	74	79	1	6
$ java -jar dist/jvarkit.jar vcf2intervals --distance 300 --min-distance 0 src/test/resources/rotavirus_rf.vcf.gz
@HD	VN:1.6	SO:coordinate
@SQ	SN:RF01	LN:3302
@SQ	SN:RF02	LN:2687
@SQ	SN:RF03	LN:2592
@SQ	SN:RF04	LN:2362
@SQ	SN:RF05	LN:1579
@SQ	SN:RF06	LN:1356
@SQ	SN:RF07	LN:1074
@SQ	SN:RF08	LN:1059
@SQ	SN:RF09	LN:1062
@SQ	SN:RF10	LN:751
@SQ	SN:RF11	LN:666
@RG	ID:S1	SM:S1
@RG	ID:S2	SM:S2
@RG	ID:S3	SM:S3
@RG	ID:S4	SM:S4
@RG	ID:S5	SM:S5
@CO	vcf2intervals. compilation:20211112182935 githash:9b2ab03 htsjdk:2.24.1 date:20211112183310. cmd:--distance 300 --min-distance 0 src/test/resources/rotavirus_rf.vcf.gz
RF01	970	970	1	1
RF02	251	251	1	1
RF02	578	877	2	300
RF02	1726	1965	2	240
RF03	1221	1242	2	22
RF03	1688	1708	2	21
RF03	2150	2315	3	166
RF03	2573	2573	1	1
RF04	887	991	2	105
RF04	1241	1262	2	22
RF04	1857	1920	3	64
RF05	41	41	1	1
RF05	499	795	2	297
RF05	879	879	1	1
RF05	1297	1339	2	43
RF06	517	695	4	179
RF06	1129	1132	1	4
RF07	98	225	2	128
RF07	684	952	2	269
RF08	926	992	2	67
RF09	294	414	3	121
RF10	46	175	3	130
RF11	74	79	1	6
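
To feed these intervals to parallel jobs, the output can be parsed into `chrom:start-end` region strings. The sketch below is illustrative only. The names `Interval` and `read_intervals` are hypothetical. Reading column 4 as the number of variants and column 5 as the interval length (end - start + 1) is an interpretation inferred from the sample output above, not a documented contract. It applies to the default '.interval_list' output; BED output may use different coordinates.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    n_variants: int  # column 4 (ASSUMED meaning, inferred from the sample)
    length: int      # column 5 (ASSUMED meaning, inferred from the sample)

    def region(self) -> str:
        """Format as 'chrom:start-end', 1-based and inclusive."""
        return f"{self.chrom}:{self.start}-{self.end}"


def read_intervals(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield intervals, skipping the '@' header lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        chrom, start, end, count, length = line.rstrip("\n").split("\t")[:5]
        interval = Interval(chrom, int(start), int(end), int(count), int(length))
        # Sanity check of the assumed meaning of column 5.
        if interval.length != interval.end - interval.start + 1:
            raise ValueError(f"unexpected length column: {line!r}")
        yield interval


if __name__ == "__main__":
    sample = [
        "@HD\tVN:1.6\tSO:coordinate",
        "RF02\t251\t1965\t5\t1715",
        "RF03\t1221\t2150\t5\t930",
        "RF03\t2201\t2573\t3\t373",
    ]
    for interval in read_intervals(sample):
        print(interval.region(), interval.n_variants)
```

Each region string can then be passed to one worker, for example as the region argument of a tool that reads an indexed VCF.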