Maps UniProt/GenBank annotations onto a BLAST result.
This program is now part of the main jvarkit tool.
See the jvarkit documentation for how to compile it.
Usage: java -jar dist/jvarkit.jar blastmapannots [options] Files
Usage: blastmapannots [options] Files
Options:
--exclude
Exclude uniprot/feature/type or genbank/feature/key.
Default: []
* -u, -g, --genbank, --uniprot
XML sequence file Genbank.xml or uniprot.xml.
-h, --help
print help and exit
--helpFormat
What kind of help. One of [usage,markdown,xml].
--include
Restrict to uniprot/feature/type or genbank/feature/key.
Default: []
--version
print version and exit
* -APC
append the sequence accession before the feature name.
Default: false
The project is licensed under the MIT license.
Should you cite blastmapannots? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
Download UniProt entry P04514 (Rotavirus Non-structural protein 3) as XML:
$ curl -o P04514.xml "http://www.uniprot.org/uniprot/P04514.xml"
Download the same entry as FASTA:
$ curl -o P04514.fasta "http://www.uniprot.org/uniprot/P04514.fasta"
Run TBLASTN with P04514.fasta against an RNA of NSP3 in GenBank (http://www.ncbi.nlm.nih.gov/nuccore/AY065842.1) and save the output as XML:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>tblastn</BlastOutput_program>
(...)
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|18139606|gb|AY065842.1|</Hit_id>
<Hit_def>Rhesus rotavirus nonstructural protein 3 (NSP3) gene, complete cds</Hit_def>
<Hit_accession>AY065842</Hit_accession>
<Hit_len>1078</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_bit-score>546.584</Hsp_bit-score>
<Hsp_score>1407</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>313</Hsp_query-to>
<Hsp_hit-from>26</Hsp_hit-from>
<Hsp_hit-to>964</Hsp_hit-to>
<Hsp_qseq>MLKMESTQQMASSIINTSFEAAVVAATSTLELMGIQYDYNEIYTRVKSKFDYVMDDSGVKNNLLGKAATIDQALNGKFGSVMRNKNWMTDSRTVAKLDEDVNKLRMMLSSKGIDQKMRVLNACFSVKRIPGKSSSVIKCTRLMKDKIERGAVEVDDSFVEEKMEVDTVDWKSRYDQLERRFESLKQRVNEKYTTWVQKAKKVNENMYSLQNVISQQQNQIADLQNYCSKLEADLQNKVGSLVSSVEWYLKSMELPDEVKTDIEQQLNSIDTISPINAIDDLEILIRNLIHDYDRTFLMFKGLLRQCNYEYAYE</Hsp_qseq>
<Hsp_hseq>MLKMESTQQMASSIINSSFEAAVVAATSTLELMGIQYDYNEVYTRVKSKFDLVMDDSGVKNNLIGKAITIDQALNGKFSSAIRNRNWMTDSRTVAKLDEDVNKLRIMLSSKGIDQKMRVLNACFSVKRIPGKSSSIVKCTRLMKDKLERGEVEVDDSFVEEKMEVDTIDWKSRYEQLEKRFESLKHRVNEKYNHWVLKARKVNENMNSLQNVISQQQAHINELQMYNNKLERDLQSKIGSVVSSIEWYLRSMELSDDVKSDIEQQLNSIDQLNPVNAIDDFESILRNLISDYDRLFIMFKGLLQQCNYTYTYE</Hsp_hseq>
<Hsp_midline>MLKMESTQQMASSIIN SFEAAVVAATSTLELMGIQYDYNE YTRVKSKFD VMDDSGVKNNL GKA TIDQALNGKF S RN NWMTDSRTVAKLDEDVNKLR MLSSKGIDQKMRVLNACFSVKRIPGKSSS KCTRLMKDK ERG VEVDDSFVEEKMEVDT DWKSRY QLE RFESLK RVNEKY WV KA KVNENM SLQNVISQQQ I LQ Y KLE DLQ K GS VSS EWYL SMEL D VK DIEQQLNSID P NAIDD E RNLI DYDR F MFKGLL QCNY Y YE</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
(...)
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
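To see which values a mapping step has to work with, the hit accession and the HSP ranges can be pulled out of the BLAST XML. The following is a minimal sketch in Python using only the standard library. It is not part of jvarkit, the function name `iter_hsps` is hypothetical, and it assumes a complete XML file (the excerpt above elides parts with `(...)`).

```python
import xml.etree.ElementTree as ET


def iter_hsps(blast_xml_text):
    """Yield one dict per HSP with the hit accession and the aligned ranges.

    Coordinates are returned exactly as BLAST reports them
    (1-based, inclusive); no conversion is applied here.
    """
    root = ET.fromstring(blast_xml_text)
    for hit in root.iter("Hit"):
        accession = hit.findtext("Hit_accession")
        for hsp in hit.iter("Hsp"):
            yield {
                "accession": accession,
                "query_from": int(hsp.findtext("Hsp_query-from")),
                "query_to": int(hsp.findtext("Hsp_query-to")),
                "hit_from": int(hsp.findtext("Hsp_hit-from")),
                "hit_to": int(hsp.findtext("Hsp_hit-to")),
            }
```

For the HSP shown above this gives the query range 1-313 and the hit range 26-964 on AY065842. The hit range spans 939 nucleotides, that is 313 codons, one per aligned residue.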
Now produce a BED file with this blast result to map the features of P04514 to AY065842.
$ java -jar dist/jvarkit.jar blastmapannots --uniprot P04514.xml blast.xml
AY065842 25 961 Non-structural_protein_3 943 + 25 961 255,255,255 1 936 25
AY065842 34 469 RNA-binding 970 + 34 469 255,255,255 1 435 34
AY065842 472 640 Dimerization 947 + 472 640 255,255,255 1 168 472
AY065842 532 724 Interaction_with_ZC3H7B 917 + 532 724 255,255,255 1 192 532
AY065842 646 961 Interaction_with_EIF4G1 905 + 646 961 255,255,255 1 315 646
AY065842 520 733 coiled-coil_region 916 + 520 733 255,255,255 1 213 520
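Each output row has 12 whitespace-separated columns that appear to follow the BED12 layout (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts). A minimal Python sketch for reading one row is given below. The function name `parse_bed12` is hypothetical, and the column interpretation is an assumption based on the standard BED format rather than on the jvarkit source.

```python
def parse_bed12(line):
    """Split one 12-column BED line into a dict of typed fields."""
    cols = line.split()
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}")
    start, end = int(cols[1]), int(cols[2])
    return {
        "chrom": cols[0],
        "start": start,
        "end": end,
        "name": cols[3],
        "score": int(cols[4]),
        "strand": cols[5],
        "thick_start": int(cols[6]),
        "thick_end": int(cols[7]),
        "item_rgb": cols[8],
        "block_count": int(cols[9]),
        "block_sizes": [int(x) for x in cols[10].split(",") if x],
        "block_starts": [int(x) for x in cols[11].split(",") if x],
        # Derived values: feature length in nucleotides and in whole codons.
        "length_nt": end - start,
        "length_codons": (end - start) // 3,
    }


rec = parse_bed12("AY065842 34 469 RNA-binding 970 + 34 469 255,255,255 1 435 34")
print(rec["name"], rec["length_nt"], rec["length_codons"])  # RNA-binding 435 145
```

In the rows above, end minus start equals the block size, and every length is a multiple of 3, as expected for protein features projected onto a nucleotide sequence.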