| How to fetch data from your sequence databases? |
|
Introduction

As a reminder, KoriBlast can retrieve sequences (Fasta formatted) and sequence data (features, taxonomy, db cross-references, etc.) from sequence databanks. By default, KoriBlast does this by querying the NCBI Entrez facility. However, you can tell KoriBlast to use your own sequence databanks, served from an existing web-based server. Starting with KoriBlast 3.0, databanks can be managed from the Databank Manager of KoriBlast. The rest of this article explains how to do this with the standard release of the software. Please note that this article is intended for readers who are familiar with HTTP, the communication protocol of the World Wide Web.

The KoriBlast WWW Connectors

To query sequence databanks in a generic way, KoriBlast uses the well-known Hypertext Transfer Protocol, or HTTP for short. The querying is done by what we call a connector, i.e. a component capable of sending a query to a server and then receiving the answer back from it. KoriBlast uses two connectors: one to get Fasta formatted sequences, the other to get sequence data. Because these connectors rely on the World Wide Web communication protocol, they are called WWW Connectors. KoriBlast uses them to connect to your data gateways, as explained in the next section.

What do you need to allow KoriBlast to query your own databanks?

To let KoriBlast query your databanks, you need two main pieces (in addition to your databanks, of course): a web server and a software gateway. The first piece is obvious. The second piece is the software that acts as the gateway between KoriBlast and your databanks. That software gateway has to be installed behind your web server and can be anything: a shell script, a cgi-bin, a servlet, etc. We do not provide this piece because it is highly dependent on your IT system.
Either you develop the gateway yourself, or we can provide a consultancy service to help you develop and tune your gateway (software design and implementation, and preparation of INSD XML files). More about our consultancy service is available on this page. The next two sections detail how KoriBlast talks to a web server to get the requested data.

How does KoriBlast query a data server to get FASTA sequences?

Since KoriBlast relies on a web server, it gets data by emitting a POST query that provides the server with a set of key/value pairs. To get Fasta formatted sequences, KoriBlast sends two key/value pairs to the server, with the keys 'database' and 'id':
Accepted value for key 'database' is either 'nucleotide' or 'protein'. Accepted values for key 'id' are sequence identifiers; when several sequence IDs are given in one query, they have to be separated by commas.

Example. When KoriBlast needs the Fasta formatted sequence of entry P12265 from SwissProt, it emits the following HTTP POST request: http://path_to_your_sequence_gateway?database=protein&id=P12265 ('path_to_your_sequence_gateway' is explained below). To serve KoriBlast with valid data, the gateway has to return a Fasta formatted file when answering such a query.

How does KoriBlast query a server to get sequence data?

KoriBlast gets sequence data by emitting a POST query that provides the server with four key/value pairs, with the keys 'database', 'id', 'seq_start' and 'seq_stop':
Accepted value for key 'database' is either 'nucleotide' or 'protein'. Accepted value for key 'id' is a sequence identifier; please note that only one sequence ID can be provided per query. Accepted values for 'seq_start' and 'seq_stop' are positive integers that define the sequence coordinate range for which to retrieve the features; whatever the sequence type (protein or nucleotide) or strand, 'seq_start' is always less than or equal to 'seq_stop'. Keys 'seq_start' and 'seq_stop' are optional. If they are not provided, the server has to return the full entry to KoriBlast. If they are provided, the server has to return only the sequence data for the range they define.

The server has to answer KoriBlast with a valid INSDSeq XML formatted file. Please refer to the web site of the INSD for more information.

Example. When KoriBlast needs the sequence data of the full entry P12265 from SwissProt, it emits the following HTTP POST request: http://path_to_your_data_gateway?database=protein&id=P12265

Example. When KoriBlast needs the sequence data in the range [2,165] for the same entry, it emits the following HTTP POST request: http://path_to_your_data_gateway?database=protein&id=P12265&seq_start=2&seq_stop=165

How to configure KoriBlast to use your data server?

In the above examples, we have written 'path_to_your_sequence_gateway' and 'path_to_your_data_gateway' in the URLs used by KoriBlast to connect to your sequence/data server. These items have to correspond to the fully qualified URLs of your gateways. The general form of such a gateway URL is: http://server_address:server_port/service_path/service_name?...
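To make the parameter rules and the general URL form above more concrete, here is a minimal Python sketch that assembles both kinds of query. It is not part of KoriBlast: the function names (fasta_query, data_query, gateway_url) are hypothetical, and requiring 'seq_start' and 'seq_stop' to be given together is our assumption. The sketch can be handy for emulating KoriBlast queries while testing a gateway.

```python
from urllib.parse import urlencode

ALLOWED_DATABASES = {"nucleotide", "protein"}


def fasta_query(database, ids):
    """Key/value pairs for a Fasta query; several IDs are comma-separated."""
    if database not in ALLOWED_DATABASES:
        raise ValueError("database must be 'nucleotide' or 'protein'")
    if not ids:
        raise ValueError("at least one sequence ID is required")
    return {"database": database, "id": ",".join(ids)}


def data_query(database, seq_id, seq_start=None, seq_stop=None):
    """Key/value pairs for a sequence data query (one ID per query)."""
    if database not in ALLOWED_DATABASES:
        raise ValueError("database must be 'nucleotide' or 'protein'")
    if not seq_id or "," in seq_id:
        raise ValueError("exactly one sequence ID is allowed per data query")
    params = {"database": database, "id": seq_id}
    if seq_start is not None or seq_stop is not None:
        # Assumption: the two optional keys are always sent as a pair.
        if seq_start is None or seq_stop is None:
            raise ValueError("seq_start and seq_stop must be given together")
        if not 1 <= seq_start <= seq_stop:
            raise ValueError("need positive integers with seq_start <= seq_stop")
        params["seq_start"] = str(seq_start)
        params["seq_stop"] = str(seq_stop)
    return params


def gateway_url(server_address, server_port, service_path, service_name, params):
    """Assemble a URL following the general gateway URL form."""
    # safe="," keeps the comma separator between sequence IDs readable.
    return "http://{}:{}/{}/{}?{}".format(
        server_address, server_port, service_path, service_name,
        urlencode(params, safe=","),
    )
```

The dictionaries returned by fasta_query and data_query can also be form-encoded with urlencode and sent as the body of a POST request, which may be closer to what your gateway receives from KoriBlast.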
Here is a simple example for 'path_to_your_sequence_gateway': http://my.lab.com:8080/cgi-bin/fasta.cgi?database=protein&id=P12265

Here is a simple example for 'path_to_your_data_gateway': http://my.lab.com:8080/cgi-bin/insd.cgi?database=protein&id=P12265&seq_start=2&seq_stop=165

These URLs can be set up directly from KoriBlast. Open the Preferences dialogue box (menu Edit->Preferences), and locate the two items 'WWW Sequence Connector' and 'WWW Feature Connector' on the left side: these are the KoriBlast WWW Connectors. The following picture shows how to configure the 'WWW Sequence Server' using the URL of our example (of course, you need to replace the values with your own):
In the figure you can see the Fasta Directory and CGI items, as well as the Entry Directory and CGI items. The first two items (Fasta) are configured as explained earlier. The other two items correspond to the service used by KoriBlast to display a sequence database entry in a web browser. You can configure these items with values that correspond to an appropriate gateway located on your server. The following picture shows how to configure the 'WWW Feature Server' using the URL of our example (of course, you need to replace the values with your own):
Now that you have set up the WWW Connectors, you have to tell KoriBlast for which Blast services they are available. If you use a WWW Blast server and want to fetch sequence data for the results obtained with this Blast system, proceed as illustrated in this picture:
You can see that we have set up the WWWBlast system to use the WWW Feature Connector (line 'WWWBLast.floader=WWWFeature') and the WWW Sequence Connector (line 'WWWBLast.sloader=WWWSequence'). Save the configuration: KoriBlast is now set up to use your databanks. |
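As a closing illustration, here is a minimal sketch of what a Fasta gateway could look like, written as a Python WSGI application. Everything in it is an assumption made for the example: the DATABANK dictionary and its DEMO entries are placeholders standing in for your own databank, the function names are hypothetical, and the 400/404 error answers are our choice, since this article does not specify how errors should be reported to KoriBlast. A data gateway would follow the same pattern but return an INSDSeq XML file instead.

```python
from urllib.parse import parse_qs

# Placeholder entries (not real sequences); a real gateway would look
# the identifiers up in your own databank.
DATABANK = {
    ("protein", "DEMO1"): ("placeholder entry one", "MKTAYIAK"),
    ("protein", "DEMO2"): ("placeholder entry two", "MSLLTEVE"),
}


def fasta_response(query_string):
    """Answer a Fasta query given as 'database=...&id=...'.

    Returns a (status, body) pair.
    """
    params = parse_qs(query_string)
    database = params.get("database", [""])[0]
    ids = [i for i in params.get("id", [""])[0].split(",") if i]
    if database not in ("nucleotide", "protein") or not ids:
        return "400 Bad Request", "invalid 'database' or 'id'\n"
    records = []
    for seq_id in ids:
        entry = DATABANK.get((database, seq_id))
        if entry is None:
            return "404 Not Found", "unknown entry: {}\n".format(seq_id)
        description, sequence = entry
        records.append(">{} {}\n{}\n".format(seq_id, description, sequence))
    return "200 OK", "".join(records)


def application(environ, start_response):
    """WSGI entry point: accepts the key/value pairs in the URL or the POST body."""
    query = environ.get("QUERY_STRING", "")
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8", "replace")
        query = "&".join(part for part in (query, body) if part)
    status, text = fasta_response(query)
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [text.encode("utf-8")]
```

For a quick local trial, the application can be served with the standard library, for example with wsgiref.simple_server.make_server("", 8080, application).serve_forever(); in production it would sit behind your own web server.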