Coverage statistics for a BED file, grouped by gene


This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: java -jar dist/jvarkit.jar bamstats05  [options] Files

Usage: bamstats05 [options] Files
  * -B, --bed
      bed file (columns: chrom(tab)start(tab)end(tab)GENE)
    -f, --filter, --jexl
      A JEXL Expression that will be used to filter out some sam-records (see 
      An expression should return a boolean value (true=exclude, false=keep 
      the read). An empty expression keeps everything. The variable 'record' 
      is the current observed read, an instance of SAMRecord (https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMRecord.html).
      Default: record.getMappingQuality()<1 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()
    --groupby
      Group Reads by. Data partitioning using the SAM Read Group (see 
      https://gatkforums.broadinstitute.org/gatk/discussion/6472/ ) . It can 
      be any combination of sample, library....
      Default: sample
      Possible Values: [readgroup, sample, library, platform, center, sample_by_platform, sample_by_center, sample_by_platform_by_center, any]
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    --mapq
      Min mapping quality
      Default: 1
    -merge, --merge
      [20181122] Merge overlapping intervals for the same gene.
      Default: false
    -m, --mincoverage
      Coverage threshold. Any depth under this value will be considered as 
      'not-covered'.  Default: 0
      Default: []
    -o, --output
      Output file. Optional . Default: stdout
    -R, --reference
      For reading/writing CRAM files. Indexed fasta Reference file. This file 
      must be indexed with samtools faidx and with picard/gatk 
      CreateSequenceDictionary or samtools dict
    --version
      print version and exit
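
The filter option is easy to read backwards, because the expression returns true for reads that are excluded. The following is a minimal Python sketch of the default expression, not the tool's own code. The `Read` class and its field names are hypothetical stand-ins for the four SAMRecord getters named in the default.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical stand-in for the SAMRecord fields used by the default filter."""
    mapping_quality: int
    duplicate: bool = False
    fails_vendor_qc: bool = False
    secondary_or_supplementary: bool = False


def default_filter(record: Read) -> bool:
    """Mirror the default expression: True means the read is EXCLUDED."""
    return (
        record.mapping_quality < 1
        or record.duplicate
        or record.fails_vendor_qc
        or record.secondary_or_supplementary
    )


reads = [
    Read(60),                                   # kept
    Read(0),                                    # excluded: mapping quality below 1
    Read(60, duplicate=True),                   # excluded: duplicate
    Read(30, secondary_or_supplementary=True),  # excluded: secondary/supplementary
]

# Keep only the reads for which the filter returns False.
kept = [r for r in reads if not default_filter(r)]
print(len(kept))
```

An expression that always returns false would keep every read, which matches the behaviour described for an empty expression.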


See also in Biostars

Creation Date


Source code


Unit Tests




The project is licensed under the MIT license.


Should you cite bamstats05? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:


Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030


Input is one or more indexed BAM files.

A single file with the suffix '.list' is interpreted as a text file with one path per line.

If there is no argument, stdin is interpreted as a list of paths to the BAM files, as produced by find . -name "*.bam"
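
The three input rules above can be sketched in Python as follows. This is an illustration of the described behaviour, not jvarkit's implementation. The names `resolve_inputs` and `read_paths` are hypothetical, and skipping blank lines is an assumption.

```python
import sys
from typing import Iterable, List, Sequence


def read_paths(lines: Iterable[str]) -> List[str]:
    """Return the non-empty lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def resolve_inputs(args: Sequence[str], stdin: Iterable[str] = sys.stdin) -> List[str]:
    """Turn command-line arguments into a list of BAM paths."""
    if not args:
        # No argument: stdin carries one path per line.
        return read_paths(stdin)
    if len(args) == 1 and args[0].endswith(".list"):
        # A single '.list' file: a text file with one path per line.
        with open(args[0], encoding="utf-8") as handle:
            return read_paths(handle)
    # Otherwise the arguments are the BAM files themselves.
    return list(args)


print(resolve_inputs(["a.bam", "b.bam"]))
print(resolve_inputs([], stdin=["x.bam\n", "y.bam\n"]))
```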

Cited In:


$ head genes.bed
1	179655424	179655582	ZORG
1	179656788	179656934	ZORG

$ java -jar dist/jvarkit.jar bamstats05 -B genes.bed --mincoverage 10 in.bam > out.txt

$ head out.txt
#chrom	start	end	gene	sample	length	mincov	maxcov	avg	nocoverage.bp	percentcovered
1	179655424	179656934	ZORG	SAMPLE1	304	27	405	216.80921052631578	0	100
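
The length column in this output (304) equals the summed size of the two ZORG intervals in genes.bed. The Python sketch below reproduces that arithmetic and shows how the other columns could be derived from per-base depths. Two assumptions are made here and are not confirmed by the tool's source: bases with depth strictly below the threshold count as nocoverage.bp, and percentcovered is the remaining fraction of the gene length. The depth values in `toy` are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def gene_length(intervals: Sequence[Tuple[int, int]]) -> int:
    """Sum of (end - start) over non-overlapping, half-open BED intervals."""
    return sum(end - start for start, end in intervals)


def coverage_summary(depths: Sequence[int], min_coverage: int) -> Dict[str, float]:
    """Summarise per-base depths for one gene and one sample (assumed formulas)."""
    length = len(depths)
    # Assumption: a base is 'not-covered' when its depth is under the threshold.
    uncovered = sum(1 for d in depths if d < min_coverage)
    return {
        "length": length,
        "mincov": min(depths),
        "maxcov": max(depths),
        "avg": sum(depths) / length,
        "nocoverage.bp": uncovered,
        "percentcovered": 100.0 * (length - uncovered) / length,
    }


# The two ZORG intervals from genes.bed: 158 bp + 146 bp.
zorg = [(179655424, 179655582), (179656788, 179656934)]
print(gene_length(zorg))  # 304

# Made-up depths for an 8 bp region, with --mincoverage 10.
toy = [0, 5, 12, 30, 30, 8, 10, 25]
print(coverage_summary(toy, min_coverage=10))
```

In the toy region three bases (depths 0, 5 and 8) fall under the threshold, so under these assumptions 5 of 8 bases, or 62.5 percent, are reported as covered.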