Filters a UniProt XML dump with a JavaScript (Java Rhino) expression. The script context contains ‘entry’, a UniProt entry, and ‘index’, the index of that entry in the XML file.
Usage: uniprotfilterjs [options] Files
Options:
    -h, --help
        print help and exit
    --helpFormat
        What kind of help. One of [usage,markdown,xml].
    -o, --output
        Output file. Optional. Default: stdout
    --version
        print version and exit
    -e
        (js expression). Optional.
    -f
        (js file). Optional.
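As an illustration of the kind of expression `-e` might receive, here is a minimal sketch that uses both context variables. It assumes that the value of the last evaluated expression decides whether an entry is kept, and that `index` is a number; whether it starts at 0 or 1 is not stated here. The `entry` and `index` assignments at the top are hypothetical stand-ins so the snippet runs outside the tool; in real use the tool supplies them.

```javascript
// Hypothetical stand-ins for the context variables (provided by the tool in real use).
var entry = { getDbReference: function () { return null; } };
var index = 3;

// Candidate expression: keep only early records that carry cross-references.
var keep = index < 10 && entry.getDbReference() != null;
```

Only the right-hand side of the last line (`index < 10 && entry.getDbReference() != null`) would be passed as the argument of `-e`.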
${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23).
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew uniprotfilterjs
The java jar file will be installed in the dist directory.
The project is licensed under the MIT license.
Should you cite uniprotfilterjs? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030
The following script keeps the human (taxon id=9606) UniProt entries that have a cross-reference to Ensembl:
function accept(e)
    {
    var ok=0,i;
    // check organism is human
    if(e.getOrganism()==null) return false;
    var L= e.getOrganism().getDbReference();
    if(L==null) return false;
    for(i=0;i<L.size();++i)
        {
        if(L.get(i).getId()=="9606") {ok=1;break;}
        }
    if(ok==0) return false;
    // check the entry has at least one Ensembl cross-reference
    ok=0;
    L= e.getDbReference();
    if(L==null) return false;
    for(i=0;i<L.size();++i)
        {
        if(L.get(i).getType()=="Ensembl") {ok=1;break;}
        }
    return ok==1;
    }
// the value of the last expression decides whether the entry is kept
accept(entry);
$ curl -skL "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz" | gunzip -c |\
java -jar dist/uniprotfilterjs.jar -f filter.js > output.xml
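The two loops in the script above follow the same pattern, so the filter can also be written around a small helper. The sketch below is one possible refactoring, not the author's version: `anyMatch`, `mockList`, `mockRef` and `mockEntry` are hypothetical names introduced here, and the mock objects only imitate the `size()`/`get(i)` list interface and the getters used above so that the logic can be tried outside the tool.

```javascript
// Returns true if any element of a java.util.List-like object satisfies `test`.
function anyMatch(list, test) {
    if (list == null) return false;
    for (var i = 0; i < list.size(); ++i) {
        if (test(list.get(i))) return true;
    }
    return false;
}

// Same rule as the script above: human organism AND an Ensembl cross-reference.
function accept(e) {
    if (e.getOrganism() == null) return false;
    var isHuman = anyMatch(e.getOrganism().getDbReference(), function (ref) {
        return String(ref.getId()) === "9606";
    });
    if (!isHuman) return false;
    return anyMatch(e.getDbReference(), function (ref) {
        return String(ref.getType()) === "Ensembl";
    });
}

// --- Hypothetical mocks, only for trying the filter without the tool ---
function mockList(items) {
    return {
        size: function () { return items.length; },
        get: function (i) { return items[i]; }
    };
}
function mockRef(type, id) {
    return {
        getType: function () { return type; },
        getId: function () { return id; }
    };
}
function mockEntry(taxId, refTypes) {
    var refs = [];
    for (var i = 0; i < refTypes.length; ++i) refs.push(mockRef(refTypes[i], "X" + i));
    return {
        getOrganism: function () {
            return { getDbReference: function () { return mockList([mockRef("NCBI Taxonomy", taxId)]); } };
        },
        getDbReference: function () { return mockList(refs); }
    };
}

var human = mockEntry("9606", ["Ensembl", "PDB"]);
var mouse = mockEntry("10090", ["Ensembl"]);
var humanNoEnsembl = mockEntry("9606", ["PDB"]);
```

In a real filter file, only `anyMatch` and `accept` would be kept, followed by `accept(entry);` as the last statement. The `String(...)` wrappers are there so the strict comparison behaves the same whether the getters return Java strings or JavaScript strings.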