jvarkit

FindNewSpliceSites

Last commit

use the ‘N’ operator in the cigar string to find unknown splice sites

Usage

Usage: findnewsplicesites [options] Files
  Options:
    --bamcompression
      Compression Level.
      Default: 5
    -B, --bed
      Optional BED output
  * -g, --gtf
      A GTF (General Transfer Format) file. See 
      https://www.ensembl.org/info/website/upload/gff.html . Please note that 
      CDS are only detected if a start and stop codons are defined.
    -h, --help
      print help and exit
    --helpFormat
      What kind of help. One of [usage,markdown,xml].
    --maxRecordsInRam
      When writing  files that need to be sorted, this will specify the number 
      of records stored in RAM before spilling to disk. Increasing this number 
      reduces the number of file  handles needed to sort a file, and increases 
      the amount of RAM needed
      Default: 50000
    -out, --out
      Output file. Optional . Default: stdout
    -R, --reference
      For reading cram. Indexed fasta Reference file. This file must be 
      indexed with samtools faidx and with picard CreateSequenceDictionary
    --samoutputformat
      Sam output format.
      Default: SAM
      Possible Values: [BAM, SAM, CRAM]
    --tmpDir
      tmp working directory. Default: java.io.tmpDir
      Default: []
    --version
      print version and exit
    -d
      max distance between known splice site and cigar end
      Default: 0

Keywords

bam
sam
rnaseq
splice
gtf

Compilation

Requirements / Dependencies

java compiler SDK 11. Please check that this java is in the ${PATH}. Setting JAVA_HOME is not enough : (e.g: https://github.com/lindenb/jvarkit/issues/23 )

Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew findnewsplicesites

The java jar file will be installed in the dist directory.

Source code

https://github.com/lindenb/jvarkit/tree/master/src/main/java/com/github/lindenb/jvarkit/tools/rnaseq/FindNewSpliceSites.java

Contribute

Issue Tracker: http://github.com/lindenb/jvarkit/issues
Source Code: http://github.com/lindenb/jvarkit

License

The project is licensed under the MIT license.

Citing

Should you cite findnewsplicesites ? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md

The current reference is:

http://dx.doi.org/10.6084/m9.figshare.1425030

Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Example

$  java -jar dist/findnewsplicesites.jar \
     --gtf http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.gtf.gz \
      hg19.bam > out.sam

This site is open source. Improve this page.