jvarkit

VCFShuffle

Shuffle a VCF

Usage

Usage: java -jar dist/vcfshuffle.jar  [options] Files
Usage: vcfshuffle [options] Files
  Options:
    --bcf-output
      If this program writes a VCF to a file, The format is first guessed from 
      the file suffix. Otherwise, force BCF output. The current supported BCF 
      version is : 2.1 which is not compatible with bcftools/htslib (last 
      checked 2019-11-15)
      Default: false
    --generate-vcf-md5
      Generate MD5 checksum for VCF output.
      Default: false
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    --maxRecordsInRam
      When writing  files that need to be sorted, this will specify the number 
      of records stored in RAM before spilling to disk. Increasing this number 
      reduces the number of file  handles needed to sort a file, and increases 
      the amount of RAM needed
      Default: 50000
    -o, --out
      Output file. Optional . Default: stdout
    -N, --seed
      random seed. Optional. -1 = use current time.
      Default: -1
    --tmpDir
      tmp working directory. Default: java.io.tmpDir
      Default: []
    --version
      print version and exit

Keywords

Compilation

Requirements / Dependencies

Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew vcfshuffle

The Java jar file will be installed in the dist directory.

Creation Date

20131210

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/misc/VCFShuffle.java

Unit Tests

https://github.com/lindenb/jvarkit/tree/master/src/test/java/com/github/lindenb/jvarkit/tools/misc/VCFShuffleTest.java

Contribute

License

The project is licensed under the MIT license.

Citing

Should you cite vcfshuffle? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

$ java -jar dist/vcfshuffle.jar input.vcf
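
For a repeatable shuffle, the --seed and -o options listed under Usage can be combined. The wrapper below is only a sketch: the function name shuffle_with_seed is hypothetical, and it assumes that a fixed seed produces the same record order on every run, which the help text suggests but does not state explicitly.

```shell
# Hypothetical wrapper, not part of jvarkit.
# usage: shuffle_with_seed SEED INPUT.vcf OUTPUT.vcf
shuffle_with_seed() {
    # --seed and -o are the options documented in the Usage section.
    # The jar path assumes the command is run from the jvarkit directory
    # after the build step.
    java -jar dist/vcfshuffle.jar --seed "$1" -o "$3" "$2"
}
```

It could be called as `shuffle_with_seed 42 input.vcf shuffled.vcf`; leaving the seed at -1 falls back to the current time.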

Native alternative

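# keep the header apart
bcftools view --header-only in.vcf > tmp1.vcf
# prefix each record with a random key, sort on that key, then drop the key
bcftools view --no-header in.vcf |\
	awk 'BEGIN{srand();} {printf("%d\t%s\n",int(rand()*10000),$0);}' |\
	sort -t $'\t' -k1,1n -T . |\
	cut -f 2- > tmp2.vcf

cat tmp1.vcf tmp2.vcf > shuffled.vcf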
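
Whichever route is used, it may be worth checking that the shuffle changed only the order of the records. The helper below is a minimal sketch of such a check; the name same_vcf_content is hypothetical, and it assumes plain-text (uncompressed) VCF files in which header lines start with '#'.

```shell
# Hypothetical helper, not part of jvarkit or bcftools.
# Succeeds when both files have identical header lines (same order)
# and the same set of records (any order).
same_vcf_content() {
    a="$1"
    b="$2"
    # Header lines start with '#' and must match exactly.
    [ "$(grep '^#' "$a")" = "$(grep '^#' "$b")" ] || return 1
    # Records are sorted before comparison, so their order is ignored.
    [ "$(grep -v '^#' "$a" | LC_ALL=C sort)" = "$(grep -v '^#' "$b" | LC_ALL=C sort)" ]
}
```

For example, `same_vcf_content in.vcf shuffled.vcf && echo "records preserved"` prints the message only when no record was lost, duplicated or altered.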