Extracting reads from a BAM file using a regular expression on the read names
This program is now part of the main jvarkit
tool. See jvarkit for compiling.
Usage: java -jar dist/jvarkit.jar biostar9462889 [options] Files
Usage: biostar9462889 [options] Files
Options:
--bamcompression
Compression Level. 0: no compression. 9: max compression;
Default: 5
--force
overwrite existing files
Default: false
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-M, --manifest
Manifest file describing the generated files. Optional
* -o, --output
(prefix) output directory
--prefix
Output file prefix
Default: split
-R, --reference
Indexed fasta Reference file. This file must be indexed with samtools
faidx and with picard/gatk CreateSequenceDictionary or samtools dict
* --regex, -regex
Regular expression that can be used to parse read names in the incoming
SAM file. Regex groups are used to classify the reads.
Default: <empty string>
--regions
Limit analysis to this interval. A source of intervals. The following
suffixes are recognized: vcf, vcf.gz, bed, bed.gz, gtf, gff, gff.gz,
gtf.gz. Otherwise it could be an empty string (no interval) or a list of
plain intervals separated by '[ \t\n;,]'
--samoutputformat
Sam output format.
Default: BAM
Possible Values: [BAM, SAM, CRAM]
--validation-stringency
SAM Reader Validation Stringency
Default: LENIENT
Possible Values: [STRICT, LENIENT, SILENT]
--version
print version and exit
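The grouping driven by `--regex` can be previewed on plain read names before running the tool. The following is a minimal sketch, not the tool's implementation: the read names are made up for illustration, the first capture group is assumed to be the classification key, and the pattern is evaluated by `sed`/`grep` as a POSIX extended regular expression, whose syntax differs from Java regular expressions in general (the simple pattern used here behaves the same in both).

```shell
# Hypothetical read names, made up for illustration only.
names='RF01_read1
RF02_read1
RF02_read2
other_read1'

# Same pattern as in the example; the first capture group is taken as the key.
regex='^(RF[0-9]+)_'

# Keep only the names that match, replace each one by its captured group,
# then count how many names fall into each group (output: group<TAB>count).
groups=$(printf '%s\n' "$names" \
    | sed -n -E "s/${regex}.*/\1/p" \
    | sort | uniq -c \
    | awk '{print $2 "\t" $1}')

# Count the names that do not match the regex at all.
unmatched=$(printf '%s\n' "$names" | grep -c -v -E "$regex")

printf '%s\n' "$groups"
echo "unmatched: $unmatched"
```

Checking a pattern this way on a sample of names (for instance names taken from `samtools view`) helps spot reads that would not be assigned to any group.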
Creation date: 20210402
The project is licensed under the MIT license.
Should you cite biostar9462889? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
## Example
$ java -jar dist/jvarkit.jar biostar9462889 --manifest jeter.mf -o TMP --regex '^(RF[0-9]+)_' src/test/resources/S1.bam
WARNING: BAM index file /home/lindenb/src/jvarkit/src/test/resources/S1.bam.bai is older than BAM /home/lindenb/src/jvarkit/src/test/resources/S1.bam
[INFO][Biostar9462889]Creating output for "RF01" N=1
[INFO][Biostar9462889]Creating output for "RF02" N=2
[INFO][Biostar9462889]Creating output for "RF03" N=3
[INFO][Biostar9462889]Creating output for "RF04" N=4
[INFO][Biostar9462889]Creating output for "RF05" N=5
[INFO][Biostar9462889]Creating output for "RF06" N=6
[INFO][Biostar9462889]Creating output for "RF07" N=7
[INFO][Biostar9462889]Creating output for "RF08" N=8
[INFO][Biostar9462889]Creating output for "RF09" N=9
[INFO][Biostar9462889]Creating output for "RF10" N=10
[INFO][Biostar9462889]Creating output for "RF11" N=11
[WARN][Biostar9462889]0 read(s) where lost because the regex '^(RF[0-9]+)_' failed.
$ ls TMP/*.bam
TMP/split.000001.bam TMP/split.000004.bam TMP/split.000007.bam TMP/split.000010.bam
TMP/split.000002.bam TMP/split.000005.bam TMP/split.000008.bam TMP/split.000011.bam
TMP/split.000003.bam TMP/split.000006.bam TMP/split.000009.bam
$ cat jeter.mf
RF01 TMP/split.000001.bam
RF02 TMP/split.000002.bam
RF03 TMP/split.000003.bam
RF04 TMP/split.000004.bam
RF05 TMP/split.000005.bam
RF06 TMP/split.000006.bam
RF07 TMP/split.000007.bam
RF08 TMP/split.000008.bam
RF09 TMP/split.000009.bam
RF10 TMP/split.000010.bam
RF11 TMP/split.000011.bam
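The manifest can be used to post-process each generated file. The following is a minimal sketch under stated assumptions: each manifest line is assumed to hold the group name followed by the file path, separated by whitespace as in the listing above (the exact delimiter is not documented here), and `summarize_manifest` is a hypothetical helper name. Record counts rely on `samtools view -c` and are reported as `NA` when samtools or the file is missing.

```shell
# summarize_manifest FILE
# Print "group<TAB>path<TAB>count" for each manifest entry.
# count is the number of records reported by `samtools view -c`,
# or "NA" when samtools is not installed or the file does not exist.
summarize_manifest() {
    while read -r group path; do
        # Skip blank lines.
        [ -n "$group" ] || continue
        if command -v samtools >/dev/null 2>&1 && [ -f "$path" ]; then
            # Redirect stdin so samtools cannot consume the manifest.
            count=$(samtools view -c "$path" </dev/null)
        else
            count=NA
        fi
        printf '%s\t%s\t%s\n' "$group" "$path" "$count"
    done < "$1"
}

# Run on the manifest from the example, if it exists.
if [ -f jeter.mf ]; then
    summarize_manifest jeter.mf
fi
```

Because the loop only depends on the manifest, the same pattern works for other per-group steps, such as indexing or sorting each file.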