jvarkit

Biostar9462889

Extracts reads from a BAM file into separate output files, using a regular expression on read names to classify them.

Usage

Usage: biostar9462889 [options] Files
  Options:
    --bamcompression
      Compression Level. 0: no compression. 9: max compression;
      Default: 5
    --force
      overwrite existing files
      Default: false
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    -M, --manifest
      Manifest file describing the generated files. Optional
  * -o, --output
      (prefix) output directory
    --prefix
      Output file prefix
      Default: split
    -R, --reference
      Indexed fasta Reference file. This file must be indexed with samtools 
      faidx and with picard CreateSequenceDictionary
  * --regex, -regex
      Regular expression that can be used to parse read names in the incoming 
      SAM file. Regex groups are used to classify the reads.
      Default: <empty string>
    --regions
      Limit analysis to this interval. A source of intervals. The following 
      suffixes are recognized: vcf, vcf.gz, bed, bed.gz, gtf, gff, gff.gz, 
      gtf.gz. Otherwise it could be an empty string (no interval) or a list of 
      plain intervals separated by '[ \t\n;,]'
    --samoutputformat
      Sam output format.
      Default: BAM
      Possible Values: [BAM, SAM, CRAM]
    --validation-stringency
      SAM Reader Validation Stringency
      Default: LENIENT
      Possible Values: [STRICT, LENIENT, SILENT]
    --version
      print version and exit
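
The --regex option states that regex groups are used to classify the reads. The sketch below illustrates that idea on plain strings; it is not the tool's code. `ReadNameClassifier` is a hypothetical helper, and two behaviours are assumptions: a partial match (`find`) is accepted, and several capture groups are joined with `_`. The read names in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of jvarkit): derives a group key from a read name. */
final class ReadNameClassifier {
    private final Pattern pattern;

    ReadNameClassifier(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Returns the classification key, or empty when the regex does not match. */
    Optional<String> keyOf(String readName) {
        Matcher m = pattern.matcher(readName);
        // Assumption: a partial match is enough; the real tool may require a full match.
        if (!m.find()) return Optional.empty();
        // No capture group: fall back to the whole matched text.
        if (m.groupCount() == 0) return Optional.of(m.group());
        StringBuilder key = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            String g = m.group(i);
            if (g == null) continue;               // optional group that did not participate
            if (key.length() > 0) key.append('_'); // assumption: separator between groups
            key.append(g);
        }
        return Optional.of(key.toString());
    }

    /** Counts reads per key in first-seen order; unmatched names are skipped. */
    Map<String, Integer> countByKey(List<String> readNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : readNames) {
            keyOf(name).ifPresent(k -> counts.merge(k, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        ReadNameClassifier c = new ReadNameClassifier("^(RF[0-9]+)_");
        // Made-up read names, for illustration only.
        System.out.println(c.countByKey(List.of("RF01_1", "RF02_1", "RF02_2", "other")));
    }
}
```

With the pattern `^(RF[0-9]+)_`, a name such as `RF02_1` yields the key `RF02`, while a name that does not match yields no key, which corresponds to the reads the tool reports as lost.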

Keywords

See also in Biostars

Compilation

Requirements / Dependencies

Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew biostar9462889

The java jar file will be installed in the dist directory.

Creation Date

20210402

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar9462889.java

Contribute

License

The project is licensed under the MIT license.

Citing

Should you cite biostar9462889? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

$ java -jar dist/biostar9462889.jar --manifest jeter.mf -o TMP --regex '^(RF[0-9]+)_' src/test/resources/S1.bam
WARNING: BAM index file /home/lindenb/src/jvarkit/src/test/resources/S1.bam.bai is older than BAM /home/lindenb/src/jvarkit/src/test/resources/S1.bam
[INFO][Biostar9462889]Creating output for "RF01" N=1
[INFO][Biostar9462889]Creating output for "RF02" N=2
[INFO][Biostar9462889]Creating output for "RF03" N=3
[INFO][Biostar9462889]Creating output for "RF04" N=4
[INFO][Biostar9462889]Creating output for "RF05" N=5
[INFO][Biostar9462889]Creating output for "RF06" N=6
[INFO][Biostar9462889]Creating output for "RF07" N=7
[INFO][Biostar9462889]Creating output for "RF08" N=8
[INFO][Biostar9462889]Creating output for "RF09" N=9
[INFO][Biostar9462889]Creating output for "RF10" N=10
[INFO][Biostar9462889]Creating output for "RF11" N=11
[WARN][Biostar9462889]0 read(s) where lost because the regex '^(RF[0-9]+)_' failed.

$ ls TMP/*.bam
TMP/split.000001.bam  TMP/split.000004.bam  TMP/split.000007.bam  TMP/split.000010.bam
TMP/split.000002.bam  TMP/split.000005.bam  TMP/split.000008.bam  TMP/split.000011.bam
TMP/split.000003.bam  TMP/split.000006.bam  TMP/split.000009.bam

$ cat jeter.mf
RF01	TMP/split.000001.bam
RF02	TMP/split.000002.bam
RF03	TMP/split.000003.bam
RF04	TMP/split.000004.bam
RF05	TMP/split.000005.bam
RF06	TMP/split.000006.bam
RF07	TMP/split.000007.bam
RF08	TMP/split.000008.bam
RF09	TMP/split.000009.bam
RF10	TMP/split.000010.bam
RF11	TMP/split.000011.bam
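
Judging from the listing above, the manifest holds one line per group: the regex key, a separator, and the path of the file written for that key. A minimal sketch of loading it for downstream processing follows. `ManifestReader` is a hypothetical helper, not part of jvarkit, and it assumes the two columns are separated by a single tab.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of jvarkit): maps each group key to its output file. */
final class ManifestReader {
    private ManifestReader() {}

    /** Parses manifest lines of the form key, TAB, path. Blank lines are ignored. */
    static Map<String, Path> parse(List<String> lines) {
        Map<String, Path> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            // Assumption: exactly one tab separates the key from the path.
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected two tab-separated columns: " + line);
            }
            byKey.put(line.substring(0, tab), Paths.get(line.substring(tab + 1)));
        }
        return byKey;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ManifestReader jeter.mf
        Map<String, Path> byKey = parse(Files.readAllLines(Paths.get(args[0])));
        byKey.forEach((key, path) -> System.out.println(key + " -> " + path));
    }
}
```

A `LinkedHashMap` keeps the keys in the order they appear in the file, so iterating over the result follows the manifest line by line.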