Coverage statistics for a BED file, grouped by gene


This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: java -jar dist/jvarkit.jar bamstats05  [options] Files

Usage: bamstats05 [options] Files
  * -B, --bed
      bed file (columns: chrom(tab)start(tab)end(tab)GENE)
    -f, --filter, --jexl
      A JEXL Expression that will be used to filter out some sam-records (see 
      An expression should return a boolean value (true=exclude, false=keep 
      the read). An empty expression keeps everything. The variable 'record' 
      is the current observed read, an instance of SAMRecord (https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMRecord.html).
      Default: record.getMappingQuality()<1 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()
    --groupby
      Group Reads by. Data partitioning using the SAM Read Group (see 
      https://gatkforums.broadinstitute.org/gatk/discussion/6472/ ) . It can 
      be any combination of sample, library....
      Default: sample
      Possible Values: [readgroup, sample, library, platform, center, sample_by_platform, sample_by_center, sample_by_platform_by_center, any]
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
      Min mapping quality
      Default: 1
    -merge, --merge
      [20181122] Merge overlapping intervals for the same gene.
      Default: false
    -m, --mincoverage
      Coverage threshold. Any depth under this value will be considered as 
      'not-covered'.  Default: 0
      Default: []
    -o, --output
      Output file. Optional. Default: stdout
    -R, --reference
      For reading/writing CRAM files. Indexed fasta Reference file. This file 
      must be indexed with samtools faidx and with picard/gatk 
      CreateSequenceDictionary or samtools dict
    --version
      print version and exit
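
The --filter option described above takes a JEXL expression that returns true for reads to exclude. As a hedged sketch, the following tightens the default filter to discard reads with a mapping quality below 30. The threshold of 30, the FILTER variable, and the file names in.bam, genes.bed and out.txt are illustrative assumptions; only getMappingQuality() and getDuplicateReadFlag() from SAMRecord are used.

```shell
# JEXL expression: true = exclude the read, false = keep it.
# The cutoff of 30 is an arbitrary example value, not a recommendation.
FILTER='record.getMappingQuality() < 30 || record.getDuplicateReadFlag()'

# Run only when the jar and the (hypothetical) input files are present.
if [ -f dist/jvarkit.jar ] && [ -f in.bam ] && [ -f genes.bed ]; then
  java -jar dist/jvarkit.jar bamstats05 -B genes.bed \
    --filter "$FILTER" --mincoverage 10 in.bam > out.txt
fi
```

Single quotes keep the shell from interpreting the parentheses and the || operator inside the expression.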


The project is licensed under the MIT license.


Should you cite bamstats05 ? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:


Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030


Input is one or more indexed BAM files.

One file with the suffix ‘.list’ is interpreted as a text file with one path per line.

If there is no argument, stdin is interpreted as a list of paths to the BAM files, as produced by find . -name "*.bam"
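
The two ways of passing many BAM files described above can be sketched as follows. The file names bams.list, genes.bed and out.txt are placeholders, and the java calls are guarded so the snippet only invokes the tool when the jar and the BED file are actually present.

```shell
# Collect BAM paths into a '.list' file, one path per line.
find . -name "*.bam" | sort > bams.list

if [ -f dist/jvarkit.jar ] && [ -f genes.bed ] && [ -s bams.list ]; then
  # Option 1: pass the '.list' file as the single file argument.
  java -jar dist/jvarkit.jar bamstats05 -B genes.bed bams.list > out.txt

  # Option 2: no file argument, the paths are read from stdin.
  find . -name "*.bam" | java -jar dist/jvarkit.jar bamstats05 -B genes.bed > out.txt
fi
```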

$ head genes.bed
1	179655424	179655582	ZORG
1	179656788	179656934	ZORG

$ java -jar dist/jvarkit.jar bamstats05 -B genes.bed --mincoverage 10 in.bam > out.txt

$ head out.txt
#chrom	start	end	gene	sample	length	mincov	maxcov	avg	nocoverage.bp	percentcovered
1	179655424	179656934	ZORG	SAMPLE1	304	27	405	216.80921052631578	0	100
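
In this example the length column (304) equals the summed length of the two ZORG intervals: (179655582 - 179655424) + (179656934 - 179656788) = 158 + 146. The sketch below reproduces that sum with awk. It assumes the intervals of a gene do not overlap, and it writes to a hypothetical file example_genes.bed so that an existing genes.bed is not overwritten.

```shell
# Recreate the two BED lines from the example (tab-separated).
printf '1\t179655424\t179655582\tZORG\n1\t179656788\t179656934\tZORG\n' > example_genes.bed

# Sum (end - start) per gene name found in column 4.
lengths=$(awk -F'\t' '{ len[$4] += $3 - $2 } END { for (g in len) print g "\t" len[g] }' example_genes.bed)

echo "$lengths"
# prints: ZORG	304
```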