jvarkit

Biostar78285

Extract the coverage of BAM files as a VCF file.

Usage

This program is now part of the main jvarkit tool. See jvarkit for compiling.

Usage: java -jar dist/jvarkit.jar biostar78285  [options] Files

Usage: biostar78285 [options] Files
  Options:
    -B, --bed, --capture
      Limit analysis to this bed file
    -f, --filter
      A JEXL Expression that will be used to filter out some sam-records (see 
      https://software.broadinstitute.org/gatk/documentation/article.php?id=1255). 
      An expression should return a boolean value (true=exclude, false=keep 
      the read). An empty expression keeps everything. The variable 'record' 
      is the current observed read, an instance of SAMRecord (https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMRecord.html).
      Default: record.getMappingQuality()<1 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()
    -gcw, --gc-percent-window, --gcw
      GC% window size. (if REF is defined)
      Default: 20
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    -m, --min-depth
      Min depth thresholds.
      Default: []
    -o, --output
      Output file. Optional. Default: stdout
    --partition
      When using display READ_GROUPS, how should we partition the ReadGroup? 
      Data partitioning using the SAM Read Group (see 
      https://gatkforums.broadinstitute.org/gatk/discussion/6472/). It can 
      be any combination of sample, library...
      Default: sample
      Possible Values: [readgroup, sample, library, platform, center, sample_by_platform, sample_by_center, sample_by_platform_by_center, any]
    -R, --reference
      Optional. Indexed fasta Reference file. This file must be indexed with 
      samtools faidx and with picard/gatk CreateSequenceDictionary or samtools 
      dict 
    --version
      print version and exit
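
The default `--filter` expression above follows the convention true=exclude, false=keep. As a rough illustration of that logic only, here is a minimal Python sketch that applies the same four tests to a read's FLAG and mapping quality. It is not the tool's implementation (jvarkit evaluates a JEXL expression against an htsjdk `SAMRecord`); the function name `is_excluded` and the sample reads are hypothetical, and the bit values are the ones defined by the SAM specification.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


def is_excluded(flag: int, mapq: int) -> bool:
    """Hypothetical stand-in for the default filter expression.

    Returns True when the read would be excluded, False when it is kept.
    """
    return (
        mapq < 1                                                  # getMappingQuality()<1
        or bool(flag & FLAG_DUPLICATE)                            # getDuplicateReadFlag()
        or bool(flag & FLAG_QC_FAIL)                              # getReadFailsVendorQualityCheckFlag()
        or bool(flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY))     # isSecondaryOrSupplementary()
    )


# Made-up (flag, mapq) pairs: a clean read, a MAPQ 0 read, a duplicate,
# a secondary alignment and a reverse-strand read.
reads = [(0, 60), (0, 0), (0x400, 60), (0x100, 30), (0x10, 20)]
kept = [read for read in reads if not is_excluded(*read)]
print(kept)
```

Only the first and last reads survive here, because being on the reverse strand (0x10) is not one of the exclusion criteria.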

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar78285.java

Unit Tests

https://github.com/lindenb/jvarkit/tree/master/src/test/java/com/github/lindenb/jvarkit/tools/biostar/Biostar78285Test.java

License

The project is licensed under the MIT license.

Citing

Should you cite biostar78285? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

$ java -jar dist/jvarkit.jar biostar78285 -m 300 -R ref.fa S*.bam

##fileformat=VCFv4.2
##Biostar78285.SamFilter=record.getMappingQuality()<1 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()
##FILTER=<ID=DP_LT_300,Description="All  genotypes have DP< 300">
##FORMAT=<ID=DF,Number=1,Type=Integer,Description="Number of Reads on plus strand">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="Number of Reads on minus strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AVG_DP,Number=1,Type=Float,Description="Mean depth">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=FRACT_DP_LT_300,Number=1,Type=Float,Description="Fraction of genotypes having DP< 300">
##INFO=<ID=GC_PERCENT,Number=1,Type=Integer,Description="GC% window_size:20">
##INFO=<ID=MAX_DP,Number=1,Type=Integer,Description="Max depth">
##INFO=<ID=MEDIAN_DP,Number=1,Type=Float,Description="Median depth">
##INFO=<ID=MIN_DP,Number=1,Type=Integer,Description="Min depth">
##INFO=<ID=NUM_DP_LT_300,Number=1,Type=Integer,Description="Number of genotypes having DP< 300">
##contig=<ID=rotavirus,length=1074>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S1	S2	S3	S4
rotavirus	1	.	G	.	.	DP_LT_300	AVG_DP=4.25;DP=17;FRACT_DP_LT_300=1.0;GC_PERCENT=38;MAX_DP=5;MEDIAN_DP=4.50;MIN_DP=3;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:5:5:0	./.:5:5:0	./.:3:3:0	./.:4:4:0
rotavirus	2	.	G	.	.	DP_LT_300	AVG_DP=9.50;DP=38;FRACT_DP_LT_300=1.0;GC_PERCENT=40;MAX_DP=14;MEDIAN_DP=8.50;MIN_DP=7;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:14:14:0	./.:9:9:0	./.:8:8:0	./.:7:7:0
rotavirus	3	.	C	.	.	DP_LT_300	AVG_DP=12.25;DP=49;FRACT_DP_LT_300=1.0;GC_PERCENT=39;MAX_DP=18;MEDIAN_DP=11.50;MIN_DP=8;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:18:18:0	./.:11:11:0	./.:12:12:0	./.:8:8:0
rotavirus	4	.	T	.	.	DP_LT_300	AVG_DP=16.25;DP=65;FRACT_DP_LT_300=1.0;GC_PERCENT=37;MAX_DP=22;MEDIAN_DP=17.00;MIN_DP=9;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:22:22:0	./.:16:16:0	./.:18:18:0	./.:9:9:0
rotavirus	5	.	T	.	.	DP_LT_300	AVG_DP=20.75;DP=83;FRACT_DP_LT_300=1.0;GC_PERCENT=40;MAX_DP=27;MEDIAN_DP=21.50;MIN_DP=13;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:27:27:0	./.:18:18:0	./.:25:25:0	./.:13:13:0
rotavirus	6	.	T	.	.	DP_LT_300	AVG_DP=24.25;DP=97;FRACT_DP_LT_300=1.0;GC_PERCENT=42;MAX_DP=33;MEDIAN_DP=25.50;MIN_DP=13;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:30:30:0	./.:21:21:0	./.:33:33:0	./.:13:13:0
rotavirus	7	.	T	.	.	DP_LT_300	AVG_DP=28.00;DP=112;FRACT_DP_LT_300=1.0;GC_PERCENT=40;MAX_DP=38;MEDIAN_DP=30.00;MIN_DP=14;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:38:38:0	./.:23:23:0	./.:37:37:0	./.:14:14:0
rotavirus	8	.	A	.	.	DP_LT_300	AVG_DP=30.50;DP=122;FRACT_DP_LT_300=1.0;GC_PERCENT=42;MAX_DP=41;MEDIAN_DP=33.00;MIN_DP=15;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:41:41:0	./.:26:26:0	./.:40:40:0	./.:15:15:0
rotavirus	9	.	A	.	.	DP_LT_300	AVG_DP=34.75;DP=139;FRACT_DP_LT_300=1.0;GC_PERCENT=44;MAX_DP=48;MEDIAN_DP=37.50;MIN_DP=16;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:46:46:0	./.:29:29:0	./.:48:48:0	./.:16:16:0
rotavirus	10	.	T	.	.	DP_LT_300	AVG_DP=40.75;DP=163;FRACT_DP_LT_300=1.0;GC_PERCENT=43;MAX_DP=56;MEDIAN_DP=43.00;MIN_DP=21;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:56:56:0	./.:35:35:0	./.:51:51:0	./.:21:21:0
rotavirus	11	.	G	.	.	DP_LT_300	AVG_DP=44.75;DP=179;FRACT_DP_LT_300=1.0;GC_PERCENT=45;MAX_DP=58;MEDIAN_DP=49.50;MIN_DP=22;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:58:58:0	./.:42:42:0	./.:57:57:0	./.:22:22:0
rotavirus	12	.	C	.	.	DP_LT_300	AVG_DP=48.75;DP=195;FRACT_DP_LT_300=1.0;GC_PERCENT=43;MAX_DP=66;MEDIAN_DP=53.00;MIN_DP=23;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:66:66:0	./.:46:46:0	./.:60:60:0	./.:23:23:0
rotavirus	13	.	T	.	.	DP_LT_300	AVG_DP=53.50;DP=214;FRACT_DP_LT_300=1.0;GC_PERCENT=42;MAX_DP=73;MEDIAN_DP=58.50;MIN_DP=24;NUM_DP_LT_300=4	GT:DF:DP:DR	./.:73:73:0	./.:51:51:0	./.:66:66:0	./.:24:24:0
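
To see how the INFO fields relate to the per-sample FORMAT values, the following Python sketch recomputes the summary statistics of the first record from its four `DP` values. It is a consistency check on the example output, not the tool's code; the helper names `sample_depths` and `summarize` are hypothetical, and the threshold 300 comes from the `-m 300` option of the example command.

```python
from statistics import mean, median

# Tab-separated columns of the first data line in the example output.
columns = [
    "rotavirus", "1", ".", "G", ".", ".", "DP_LT_300",
    "AVG_DP=4.25;DP=17;FRACT_DP_LT_300=1.0;GC_PERCENT=38;MAX_DP=5;"
    "MEDIAN_DP=4.50;MIN_DP=3;NUM_DP_LT_300=4",
    "GT:DF:DP:DR", "./.:5:5:0", "./.:5:5:0", "./.:3:3:0", "./.:4:4:0",
]


def sample_depths(cols):
    """Return the DP value of every sample column (column 10 onwards)."""
    dp_index = cols[8].split(":").index("DP")
    return [int(sample.split(":")[dp_index]) for sample in cols[9:]]


def summarize(depths, min_depth):
    """Recompute the depth summaries reported in the INFO column."""
    below = sum(1 for depth in depths if depth < min_depth)
    return {
        "DP": sum(depths),
        "AVG_DP": mean(depths),
        "MEDIAN_DP": median(depths),
        "MIN_DP": min(depths),
        "MAX_DP": max(depths),
        f"NUM_DP_LT_{min_depth}": below,
        f"FRACT_DP_LT_{min_depth}": below / len(depths),
        # The DP_LT_<n> FILTER is described as "All genotypes have DP< n".
        "all_below": below == len(depths),
    }


depths = sample_depths(columns)
stats = summarize(depths, 300)
for key, value in stats.items():
    print(key, value)
```

The recomputed values agree with the INFO column of that record: total depth 17, mean 4.25, median 4.5, minimum 3, maximum 5, and all four samples below the threshold of 300.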
