Takes a UCSC genePred file, scans the 5’ UTRs and generates a GFF3 file containing upstream ORFs (uORFs). Inspired by https://github.com/ImperialCardioGenetics/uORFs
This program is now part of the main jvarkit tool. See jvarkit for compiling.
Usage: java -jar dist/jvarkit.jar gff3upstreamorf [options] Files
Usage: gff3upstreamorf [options] Files
Options:
--break-original-orf
If the ATG of the uORF is in frame with the original ORF, do not calculate the
peptide beyond the original ATG.
Default: false
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
-o, --output
Output file. Optional. Default: stdout
* -r, -R, --reference
Indexed FASTA reference file. This file must be indexed with samtools
faidx and with picard/gatk CreateSequenceDictionary or samtools dict.
--strength
Only accept events whose Kozak strength is greater than or equal to this value.
Default: nil
Possible Values: [Strong, Moderate, Weak, nil]
--version
print version and exit
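The --strength threshold can be illustrated with a short Python sketch. It assumes a commonly used convention for grading the Kozak context of an ATG (purine at position -3 and G at position +4 is Strong, one of the two is Moderate, neither is Weak); whether gff3upstreamorf applies exactly this rule is an assumption, and the names `Kozak`, `kozak_strength` and `accept` are hypothetical.

```python
from enum import IntEnum


class Kozak(IntEnum):
    # Ordered so that ">=" mirrors "greater or equal to this Kozak strength".
    NIL = 0
    WEAK = 1
    MODERATE = 2
    STRONG = 3


def kozak_strength(seq: str, atg: int) -> Kozak:
    """Grade the context of the ATG starting at 0-based index `atg`.

    Assumed convention (not taken from the tool's source): the A of ATG
    is position +1, and only positions -3 and +4 are inspected.
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    if atg - 3 < 0 or atg + 3 >= len(seq):
        return Kozak.NIL  # not enough flanking sequence to decide
    purine_minus3 = seq[atg - 3] in "AG"
    g_plus4 = seq[atg + 3] == "G"
    if purine_minus3 and g_plus4:
        return Kozak.STRONG
    if purine_minus3 or g_plus4:
        return Kozak.MODERATE
    return Kozak.WEAK


def accept(strength: Kozak, threshold: Kozak) -> bool:
    """Keep an event only if its strength reaches the threshold."""
    return strength >= threshold
```

With the default threshold of nil, every event passes this filter; raising it to Moderate would drop Weak and nil contexts.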
20220724
The project is licensed under the MIT license.
Should you cite gff3upstreamorf? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
Part of this code was inspired by: https://github.com/ImperialCardioGenetics/uORFs/blob/master/5primeUTRannotator/five_prime_UTR_annotator.pm
Wikipedia:
An Upstream Open Reading Frame (uORF) is an open reading frame (ORF) within the 5’ untranslated region (5’UTR) of an mRNA. uORFs can regulate eukaryotic gene expression. Translation of the uORF typically inhibits downstream expression of the primary ORF. In bacteria, uORFs are called leader peptides, and were originally discovered on the basis of their impact on the regulation of genes involved in the synthesis or transport of amino acids.
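To make the definition above concrete, here is a minimal Python sketch that scans a 5’ UTR sequence for ATG codons and looks for the first in-frame stop codon. It is an illustration only, not the tool's implementation: it assumes the UTR has already been extracted as a spliced, transcript-oriented DNA string, and the function name `find_uorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str):
    """Return a list of (start, end) pairs, one per ATG in the 5' UTR.

    `start` is the 0-based index of the A of the ATG. `end` is the
    exclusive index just past the first in-frame stop codon, or None
    when no in-frame stop is found before the end of the UTR (such an
    ORF would continue into the downstream sequence).
    """
    utr5 = utr5.upper()
    found = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "ATG":
            continue
        end = None
        # Walk codon by codon, in frame with this ATG.
        for i in range(start + 3, len(utr5) - 2, 3):
            if utr5[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        found.append((start, end))
    return found
```

For example, `find_uorfs("CCATGAAATAGCC")` reports a single uORF starting at index 2 and ending after the TAG stop codon, while an ATG with no in-frame stop inside the UTR is reported with `end` set to `None`.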
java -jar dist/jvarkit.jar gff3upstreamorf -R GRCh38.fa Homo_sapiens.GRCh38.107.chr.gff3.gz > uorf.gff3
note to self: test ENSG00000141736 https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003529