jvarkit

Biostar173114

Make a BAM file smaller by removing unwanted information. See also https://www.biostars.org/p/173114/

Usage

This program is now part of the main jvarkit tool. See the main jvarkit documentation for how to compile it.

Usage: java -jar dist/jvarkit.jar biostar173114 [options] Files

Usage: biostar173114 [options] Files
  Options:
    --bamcompression
      Compression Level. 0: no compression. 9: max compression;
      Default: 9
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    -keepAtt, --keepAttributes
      keep Attributes
      Default: false
    -keepCigar, --keepCigar
      keep cigar : don't remove hard clip
      Default: false
    -keepName, --keepName, --name
      keep Read Name, do not try to create a shorter name
      Default: false
    -keepQuals, --keepQualities
      keep base qualities
      Default: false
    -keepRG, --keepReadGroup
      if attributes are removed, keep the RG
      Default: false
    -keepSeq, --keepSequence
      keep read sequence
      Default: false
    -mate, --mate
      keep Mate/Paired Information
      Default: false
    -o, --output
      Output file. Optional . Default: stdout
    -R, --reference
      For reading/writing CRAM files. Indexed fasta Reference file. This file 
      must be indexed with samtools faidx and with picard/gatk 
      CreateSequenceDictionary or samtools dict
    --samoutputformat
      Sam output format.
      Default: SAM
      Possible Values: [BAM, SAM, CRAM]
    --version
      print version and exit
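
The options above describe which parts of each alignment are dropped unless a `keep` flag is set. The following Python function is a minimal sketch of that behaviour on a single tab-separated SAM line. It is an illustration only, not the tool's Java implementation. The function name `slim_record`, the `R` + hexadecimal naming scheme, and the exact handling of mate flags are assumptions made for this example.

```python
import re

# Mate-related FLAG bits from the SAM specification:
# 0x1 paired, 0x2 proper pair, 0x8 mate unmapped, 0x20 mate reverse,
# 0x40 first in pair, 0x80 second in pair.
MATE_BITS = 0x1 | 0x2 | 0x8 | 0x20 | 0x40 | 0x80


def slim_record(line, index, keep_name=False, keep_seq=False, keep_quals=False,
                keep_cigar=False, keep_attributes=False, keep_rg=False,
                keep_mate=False):
    """Return a slimmed copy of one SAM alignment line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    core, tags = fields[:11], fields[11:]
    if not keep_name:
        # Assumed naming scheme: 'R' plus the record index in lowercase hex.
        core[0] = "R" + format(index, "x")
    if not keep_mate:
        # Assumed: clear the pairing bits and reset RNEXT, PNEXT, TLEN.
        core[1] = str(int(core[1]) & ~MATE_BITS)
        core[6], core[7], core[8] = "*", "0", "0"
    if not keep_cigar:
        # Drop hard-clip operators; an empty CIGAR becomes '*'.
        core[5] = re.sub(r"\d+H", "", core[5]) or "*"
    if not keep_seq:
        core[9] = "*"
    if not keep_quals:
        core[10] = "*"
    if not keep_attributes:
        # Optionally keep only the read group tag.
        tags = [t for t in tags if keep_rg and t.startswith("RG:")]
    return "\t".join(core + tags)


record = ("read/1\t99\tchr1\t100\t60\t5H10M\t=\t300\t210\t"
          "ACGTACGTAC\tIIIIIIIIII\tRG:Z:S1\tNM:i:0")
print(slim_record(record, 0, keep_seq=True))
```

With `keep_seq=True` the sequence column survives while the name, mate columns, qualities and tags are reduced, which mirrors the shape of the records produced with `--keepSequence`.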

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar173114.java

Unit Tests

https://github.com/lindenb/jvarkit/tree/master/src/test/java/com/github/lindenb/jvarkit/tools/biostar/Biostar173114Test.java

License

The project is licensed under the MIT license.

Citing

Should you cite biostar173114? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

 $ java -jar dist/jvarkit.jar biostar173114 --keepSequence my.bam

@HD	VN:1.5	GO:none	SO:coordinate
@SQ	SN:rotavirus	LN:1074
R0	0	rotavirus	1	60	70M	*	0	0	GGCTTTTAATGCTTTTCAGTGGTTGCTGCTCAATATGGCGTCAACTCAGCAGATGGTCAGCTCTAATATT	*
R1	0	rotavirus	1	60	70M	*	0	0	GGCTTTTACTGCTTTTCAGTGGTTGCTTCTCAAGATGGAGTGTACTCATCAGATGGTAAGCTCTATTATT	*
R2	0	rotavirus	1	60	70M	*	0	0	GGCTTTTAATGCTTTTCATTTGATGCTGCTCAAGATGGAGTCTACACAGCAGATGGTCAGCTCTATTATT	*
R3	0	rotavirus	1	60	70M	*	0	0	GGCTTTTAATGCTTTTCAGTGGTTGCTGCTCAAGATGGAGTCTCCTGAGCAGCTGGTAAGCTCTATTATT	*
R4	0	rotavirus	1	60	70M	*	0	0	GGCATTTAATGCTTAACAGTGGTTGCTGCTCAAGATGGAGTCTACTCAGCAGATGGTAAGCTCTCTTATT	*
R5	0	rotavirus	2	60	70M	*	0	0	GCTTTTAAAGCTTTTCAGTTGTTGCTGCTCAAGATGGAGTCTACTCAGCAGATGTTACGCTCTATTATTA	*
R6	0	rotavirus	2	60	51M19S	*	0	0	GCTTTTAATGCTTTTCAGTTGTTGCTGCACAAGATGGAGTCTACACAGCAGCTGTTCATCTCTCTTCATC	*
R7	0	rotavirus	2	60	70M	*	0	0	GCTTTTAATGCTTTTCAGTGGTTTCTTCTCACGATGGAGTCTACTCAGCAGAAGGTAAGCACTATTATTA	*
R8	0	rotavirus	2	60	70M	*	0	0	GCTTTTAAAGCATTACAGTTGTTGCAGCTCAAGAAGGAGACTACTCAGCAGATGGTAAGCTCTATAATTA	*
R9	0	rotavirus	2	60	70M	*	0	0	GCTTTTAATTCTATTCAGTGGTTGCTGCTCCAGAAGGAGTCTACTCAGGAGATGGTACGCTCTCTTATTA	*
Ra	0	rotavirus	2	60	70M	*	0	0	GCTTTTAATGCTTTTCAGTGGTTGCTGCTCAAGATGGAGTCTACTCAGCAGATGGTAAGCTCAATTATTA	*
Rb	0	rotavirus	2	60	70M	*	0	0	GCTTTTAATGCTTTTCAGTTGTAGCTGCTCAAGATGGAGTCTACTCATCAGATGGTAAGCTCTCTTCTTA	*
Rc	0	rotavirus	2	60	63M7S	*	0	0	TCTTTAAATGCTTTTCAGTGTTTGCTGCTCAAGATGGAGTCTACTCAGCAGAAGGTAAGCTCTCTTAAAC	*
Rd	0	rotavirus	2	60	66M4S	*	0	0	GCATTTAATGCTTTTCAGTGGTTGCTGCACAAGATGGAGTCTACTCAGCAGATTGTAAGCTCTATTCTAA	*
Re	0	rotavirus	3	60	70M	*	0	0	CTTTTAATGCTTTTCAGTGGTTGCTGCTCAAGAAGGCGTCTCCTGATGAGATGGTAAGCTCTATTATTAA	*
Rf	0	rotavirus	3	60	70M	*	0	0	CTTTTAATGGTTATGAGTGGTTGGTGCACAAGATGGAGTCTACTCAGCAGATGGTACTCTCTATAATTAA	*
R10	0	rotavirus	3	60	70M	*	0	0	CTTTTAAAGCTTTTCAGTGGTTGCTGCTCAAGATGGAGTCTACTCAGCAGATGGTACTCTCTATTCTTAA	*
R11	0	rotavirus	3	60	70M	*	0	0	CTTTTAAAGCTTTTCAGAGGTTGCTGCTCAAGATGTAGTCTACTCAGGAGATTGTAAGCTCTATTATTAA	*
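
The shortened read names in this listing (R0 to R9, then Ra to Rf, then R10 and R11) look like a running counter written in lowercase hexadecimal with an `R` prefix. This is inferred from the output alone and not confirmed against the source code. A minimal Python sketch, using a hypothetical helper `short_name`, checks that reading:

```python
# Read names copied from the first column of the example output.
observed = ["R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9",
            "Ra", "Rb", "Rc", "Rd", "Re", "Rf", "R10", "R11"]


def short_name(index):
    """Hypothetical helper: 'R' followed by the index in lowercase hex."""
    return "R" + format(index, "x")


generated = [short_name(i) for i in range(len(observed))]
print(generated == observed)
```

If the comparison holds, a name stays at three characters or fewer for the first 256 reads.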
