Use the 'N' operator in the CIGAR string to find unknown splice sites.
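As an illustration of what the 'N' operator encodes, here is a minimal Python sketch that lists the reference intervals skipped by 'N' in a single alignment. It is not the tool's own code: the function name `cigar_introns` and the example coordinates are assumptions made for this example. It relies only on the SAM convention that M, D, N, = and X advance the reference position.

```python
import re

# CIGAR operators that advance the position on the reference (SAM convention).
REF_CONSUMING = set("MDN=X")

def cigar_introns(pos, cigar):
    """Return 1-based inclusive (start, end) reference intervals skipped by 'N'.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    Hypothetical helper for illustration only.
    """
    introns = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            # The skipped region starts at the current reference position.
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns

# A read aligned at position 100 with a 1000-base skipped region.
print(cigar_introns(100, "50M1000N50M"))
```

Each returned interval gives a candidate donor/acceptor pair; whether it is "new" depends on the annotation supplied with --gtf.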
Usage: findnewsplicesites [options] Files
Options:
--bamcompression
Compression Level.
Default: 5
-B, --bed
Optional BED output
* -g, --gtf
A GTF (General Transfer Format) file. See
https://www.ensembl.org/info/website/upload/gff.html . Please note that
CDS are only detected if start and stop codons are defined.
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
--maxRecordsInRam
When writing files that need to be sorted, this will specify the number
of records stored in RAM before spilling to disk. Increasing this number
reduces the number of file handles needed to sort a file, and increases
the amount of RAM needed
Default: 50000
-out, --out
Output file. Optional. Default: stdout
-R, --reference
For reading cram. Indexed fasta Reference file. This file must be
indexed with samtools faidx and with picard CreateSequenceDictionary
--samoutputformat
Sam output format.
Default: SAM
Possible Values: [BAM, SAM, CRAM]
--tmpDir
tmp working directory. Default: java.io.tmpDir
Default: []
--version
print version and exit
-d
max distance between known splice site and cigar end
Default: 0
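The -d option describes a distance tolerance between a known splice site and the end of a CIGAR segment. The Python sketch below shows one way such a tolerance check could work; it is an assumption about the general idea, not the tool's actual implementation. The function `near_known_site` and the coordinates are hypothetical.

```python
from bisect import bisect_left

def near_known_site(position, known_sites, max_distance=0):
    """Return True if a known splice site lies within max_distance bases.

    known_sites must be sorted in ascending order.
    Hypothetical helper for illustration only.
    """
    # First known site that is >= the lower bound of the tolerance window.
    i = bisect_left(known_sites, position - max_distance)
    return i < len(known_sites) and known_sites[i] <= position + max_distance

known = [150, 1149, 5000]  # hypothetical annotated splice-site coordinates

print(near_known_site(152, known))     # tolerance 0: exact match required
print(near_known_site(152, known, 2))  # tolerance 2: 150 is close enough
```

With a tolerance of 0 (the default shown above), only exact matches count as known under this sketch. A larger value would treat nearby junctions as already annotated.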
${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23 )
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew findnewsplicesites
The Java jar file will be installed in the dist directory.
The project is licensed under the MIT license.
Should you cite findnewsplicesites? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
$ java -jar dist/findnewsplicesites.jar \
--gtf http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.gtf.gz \
hg19.bam > out.sam