Creates an archive of small bams with only a few regions.
This program is now part of the main jvarkit
tool. See jvarkit for compiling.
Usage: java -jar dist/jvarkit.jar mkminibam [options] Files
Usage: mkminibam [options] Files
Options:
--bnd
[20190427] When reading a VCF file, don't get the mate position for the
structural BND variants.
Default: false
-b, --bounds, --edge
[20190427] If `b` is greater than 0 and the user interval has a length
greater than `b` then consider the edges of the object as two positions.
The idea is to just save the boundaries of a large deletion. A distance
specified as a positive integer. Commas are removed. The following
suffixes are interpreted: b,bp,k,kb,m,mb,g,gb
Default: -1
-C, --comment
[20190427] Add a file '*.md' with this comment.
Default: <empty string>
-x, --extend
Extend the positions by 'x' bases. A distance specified as a positive
integer. Commas are removed. The following suffixes are interpreted:
b,bp,k,kb,m,mb,g,gb
Default: 5000
--filter
A filter expression. Reads matching the expression will be filtered out.
An empty string means 'filter out nothing / accept all'. See https://github.com/lindenb/jvarkit/blob/master/src/main/resources/javacc/com/github/lindenb/jvarkit/util/bio/samfilter/SamFilterParser.jj
for a complete syntax. 'default' is 'mapqlt(1) || Duplicate() ||
FailsVendorQuality() || NotPrimaryAlignment() ||
SupplementaryAlignment()'
Default: Accept All/ Filter out nothing
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
--no-samples
[20191129] Allow no sample / no read group: use the filename instead
Default: false
* -o, --output
An existing directory or a filename ending with the '.zip' or '.tar' or
'.tar.gz' suffix.
--prefix
File prefix in the archive. Special value 'now' or empty string will be
replaced by the current date
Default: miniBam.
-R, --reference
Optional Reference file for CRAM files. Multiple allowed. Indexed fasta
Reference file. This file must be indexed with samtools faidx and with
picard/gatk CreateSequenceDictionary or samtools dict
Default: []
-T, --tmp
Tmp working directory
Default: /tmp
* -B, --bed, -p, --pos, -V, --variant, --vcf, --regions
A source of intervals. The following suffixes are recognized: vcf,
vcf.gz, bed, bed.gz, gtf, gff, gff.gz, gtf.gz. Otherwise it could be an
empty string (no interval) or a list of plain intervals separated by
'[ \t\n;,]'
Default: (unspecified)
--version
print version and exit
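The distance syntax shared by --bounds and --extend, the separators for plain intervals, and the "keep only the edges" rule can be illustrated with a short Python sketch. This is not jvarkit's Java implementation: the function names are hypothetical, the multipliers (k = 1,000, m = 1,000,000, g = 1,000,000,000) are an assumption because the help text lists the suffixes without their values, and coordinates are assumed to be 1-based and inclusive, with extension clamped at position 1.

```python
import re

# Assumed multipliers: the help text names the suffixes but not their values.
_UNITS = {
    "": 1, "b": 1, "bp": 1,
    "k": 1_000, "kb": 1_000,
    "m": 1_000_000, "mb": 1_000_000,
    "g": 1_000_000_000, "gb": 1_000_000_000,
}


def parse_distance(text):
    """Parse a positive distance such as '5,000', '5kb' or '1mb' into bases."""
    cleaned = text.replace(",", "").strip().lower()  # commas are removed
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", cleaned)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"not a distance: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def split_intervals(text):
    """Split a plain interval list on spaces, tabs, newlines, ';' and ','."""
    return [token for token in re.split(r"[ \t\n;,]+", text) if token]


def target_regions(start, end, bounds=-1, extend=5000):
    """Return the regions to extract for one user interval.

    If bounds > 0 and the interval is longer than bounds, only its two
    edges are kept; every kept region is then extended by `extend` bases.
    """
    if bounds > 0 and (end - start + 1) > bounds:
        anchors = [(start, start), (end, end)]
    else:
        anchors = [(start, end)]
    # Assumption: extension never goes below position 1.
    return [(max(1, s - extend), e + extend) for s, e in anchors]
```

Under these assumptions, `target_regions(1000, 50000, bounds=parse_distance("1kb"), extend=100)` yields two small windows around positions 1000 and 50000 instead of one 49 kb region, which is the point of --bounds for large deletions.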
20190410
The project is licensed under the MIT license.
Should you cite mkminibam? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
BAM files are too big, and my users often ask to visualize a small region of a set of BAM files.
Input is a set of BAM files or a file with the suffix ‘.list’ containing one path to a BAM file per line.
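The input convention described here can be sketched in Python as follows. The helper name `expand_inputs` is hypothetical, and skipping blank lines is an assumption; the text only states that a ‘.list’ file holds one path per line.

```python
from pathlib import Path


def expand_inputs(args):
    """Return alignment file paths, expanding each '*.list' argument."""
    paths = []
    for arg in args:
        if arg.endswith(".list"):
            # One path per line; blank lines are skipped (assumption).
            for line in Path(arg).read_text().splitlines():
                line = line.strip()
                if line:
                    paths.append(line)
        else:
            paths.append(arg)
    return paths
```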
$ find src/test/resources/ -name "S*.bam" > bams.list
$ java -jar dist/jvarkit.jar mkminibam -p "RF01:100" -o out.zip bams.list
[INFO][MakeMiniBam]src/test/resources/S5.bam
[INFO][MakeMiniBam]src/test/resources/S2.bam
[INFO][MakeMiniBam]src/test/resources/S4.bam
[INFO][MakeMiniBam]src/test/resources/S3.bam
[INFO][MakeMiniBam]src/test/resources/S1.bam
$ unzip -t out.zip
Archive: out.zip
testing: miniBam.S5.bam OK
testing: miniBam.S5.bai OK
testing: miniBam.S2.bam OK
testing: miniBam.S2.bai OK
testing: miniBam.S4.bam OK
testing: miniBam.S4.bai OK
testing: miniBam.S3.bam OK
testing: miniBam.S3.bai OK
testing: miniBam.S1.bam OK
testing: miniBam.S1.bai OK
No errors detected in compressed data of out.zip.
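The listing above suggests that each input yields one `<prefix><sample>.bam` entry and one matching `.bai` entry. A minimal Python sketch for checking that pairing in a produced zip is given below. The function names are hypothetical, and the naming pattern is inferred only from this example output, not from the tool's source.

```python
import zipfile


def samples_in_archive(names, prefix="miniBam."):
    """Map each sample name to the set of extensions ('bam', 'bai') found."""
    found = {}
    for name in names:
        base = name.rsplit("/", 1)[-1]  # ignore any directory part
        if not base.startswith(prefix):
            continue
        stem, dot, ext = base[len(prefix):].rpartition(".")
        if dot and ext in ("bam", "bai"):
            found.setdefault(stem, set()).add(ext)
    return found


def missing_indexes(zip_path, prefix="miniBam."):
    """List samples in the zip that lack either the BAM or its index."""
    with zipfile.ZipFile(zip_path) as archive:
        found = samples_in_archive(archive.namelist(), prefix)
    return sorted(s for s, exts in found.items() if exts != {"bam", "bai"})
```

Entries that do not match the pattern, such as a '*.md' comment file added with --comment, are simply ignored by this check.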