How to prepare taxonomic subsets from public databanks

The Databank Manager module of KoriBlast is where you prepare your own databanks. Making a taxon-specific databank requires two elements: the NCBI Taxonomy databank and a source of sequences that carries taxonomic information. For the second element, databanks formatted using the EMBL, GenBank or UniProt data model contain taxonomic information ready to be used by KoriBlast.

First, install the NCBI Taxonomy databank. From the Databank Manager of KoriBlast, locate the entry called Ncbi_Taxonomy and click on the [Install] button, as illustrated in this picture:

[Figure: Installing NCBI Taxonomy databank]

As soon as the taxonomy databank is installed, you'll be able to prepare custom-made databanks containing sequences for particular taxonomic groups.

To illustrate how to do this, we'll take a simple example: preparing a Macaca mulatta (Rhesus monkey) dataset out of SwissProt. After this short tutorial, you'll be able to adapt it to install your own databanks for other species, families, etc.

First, we need to know the NCBI Taxonomy identifier that corresponds to our organism. So, let's go to the NCBI Taxonomy databank and search for Macaca mulatta:

[Figure: Search for NCBI taxon ID]

In the figure, we can easily locate the NCBI Taxonomy ID: 9544.
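If you prefer to look the identifier up offline, the same search can be sketched in Python against the names.dmp file of the NCBI taxonomy dump, whose fields are separated by tab, pipe, tab. This is only an illustration and not part of KoriBlast: the function name find_taxids is hypothetical, and the three sample records below are hand-written for the example rather than copied from a real dump.

```python
import io

def find_taxids(names_dmp, scientific_name):
    """Return the tax IDs whose scientific name equals `scientific_name`.

    `names_dmp` is any iterable of lines in names.dmp layout:
    tax_id <tab>|<tab> name <tab>|<tab> unique name <tab>|<tab> name class <tab>|
    """
    wanted = scientific_name.strip().lower()
    hits = []
    for line in names_dmp:
        # Drop the trailing "|" terminator, then split the record on "|".
        fields = [f.strip() for f in line.rstrip("\n").rstrip("|").split("|")]
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        tax_id, name, _unique_name, name_class = fields[:4]
        # Only the "scientific name" class is matched, not common names.
        if name_class == "scientific name" and name.lower() == wanted:
            hits.append(int(tax_id))
    return hits

# Hand-written sample records (illustrative, not a real dump).
SAMPLE = (
    "9544\t|\tMacaca mulatta\t|\t\t|\tscientific name\t|\n"
    "9544\t|\tRhesus monkey\t|\t\t|\tgenbank common name\t|\n"
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
)

print(find_taxids(io.StringIO(SAMPLE), "Macaca mulatta"))  # [9544]
```

With a real dump, you would pass an open file handle on names.dmp instead of the in-memory sample.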

Now, we have to tell KoriBlast what to do with that information: we need to create a specific databank descriptor for that purpose. To facilitate that work, a new descriptor is always created from an existing one. Since we want to create a databank out of SwissProt, we select the corresponding databank descriptor as illustrated here:

[Figure: Preparing a custom databank from SwissProt]

Then, we click on the [Create] button, and we enter the name of our databank:

[Figure: Create a databank descriptor]

We end up with a dialogue box containing the information that tells KoriBlast how to deploy SwissProt locally from the data available remotely at UniProt. Here, we have to update three fields: description, unit tasks and global tasks. In the "description" field, enter any name you find appropriate. Update the two task fields as follows (be careful while editing task parameters: do not insert space characters, and respect letter case, semicolons and parentheses):

[Figure: Updating databank descriptor]

As you can see, we add a specific argument (taxinc) to the post-processing tasks:

     Unit tasks are: gunzip,idxsw(taxinc=9544)

     Global tasks are: delgz,deltmpidx,formatdb(lclid=false;check=true;nr=true;taxinc=9544)

They instruct KoriBlast to retain ("taxinc" parameter) only sequences whose taxon is 9544 or a child of taxon ID 9544. Note that the "taxinc" parameter accepts a comma-separated list of taxonomic IDs. The opposite parameter, "taxexc", can be used to exclude sequences, and "taxexc" and "taxinc" can be used together. For more details, have a look at the KoriBlast electronic manual (click on the [Help and Tutorial] button, then follow the table of contents: User Guide / Databank Manager Module / Adding a new public databank for installation).
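To make the "matching or being a child of" rule concrete, here is a minimal Python sketch of such a filter. It is not KoriBlast's implementation: the function names, the toy parent table (a hand-made simplification that omits intermediate ranks) and, in particular, the choice that an exclusion overrides an inclusion when both apply are assumptions made for this illustration. Check the electronic manual for the actual behaviour when "taxinc" and "taxexc" are combined.

```python
def lineage(tax_id, parent_of):
    """Yield tax_id, then each of its ancestors up to the root."""
    seen = set()
    while tax_id is not None and tax_id not in seen:
        seen.add(tax_id)  # guards against a root that is its own parent
        yield tax_id
        tax_id = parent_of.get(tax_id)

def keep_sequence(tax_id, parent_of, taxinc=(), taxexc=()):
    """Decide whether a sequence annotated with tax_id is retained.

    Assumption for this sketch: a sequence is dropped if its lineage hits
    any excluded taxon, otherwise kept if it hits an included taxon
    (or if no inclusion list is given at all).
    """
    ids = set(lineage(tax_id, parent_of))
    if ids & set(taxexc):
        return False
    return not taxinc or bool(ids & set(taxinc))

# Toy child -> parent table; intermediate ranks are deliberately omitted.
PARENT_OF = {
    9544: 9539,  # Macaca mulatta -> Macaca
    9541: 9539,  # Macaca fascicularis -> Macaca
    9539: 9443,  # Macaca -> Primates
    9606: 9443,  # Homo sapiens -> Primates
    9443: 1,     # Primates -> root
}

print(keep_sequence(9544, PARENT_OF, taxinc=[9544]))                 # True
print(keep_sequence(9606, PARENT_OF, taxinc=[9544]))                 # False
print(keep_sequence(9544, PARENT_OF, taxinc=[9443], taxexc=[9539]))  # False
```

The last call shows the two parameters working together under the stated assumption: all primates are included, but everything below the genus Macaca is excluded.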

When the new descriptor is ready, click on [Ok] to close the databank descriptor editor, then click on the [Install] button:

[Figure: Install custom made taxonomic databank]

Please note that installing taxonomic subsets takes more time, since KoriBlast has to walk the NCBI Taxonomy classification in order to retain the sequences matching the criteria defined with the "taxexc" and "taxinc" parameters.

That's it.

Now, if you want to install taxonomic subsets from other major databanks (EMBL, TrEMBL, GenBank, etc.), start by making a custom descriptor from one of those available within the Databank Manager.

© 2007-2012 Korilog SARL, all rights reserved. Terms of Use and Privacy Policy