Getting Blast sequence databanks
Introduction
Blast only accepts sequences that have been formatted for it. This is not specific to KoriBlast: it is a requirement of the Blast software itself. To run Blast against any set of sequences, you must first pass them to formatdb, a tool available from the NCBI as part of the Blast software suite. formatdb is the only tool capable of creating a Blast-ready sequence databank, and it only accepts Fasta-formatted sequences as input. KoriBlast extends the capability of formatdb and gives you an easy way to prepare Blast databanks from standard sequence data sets formatted as Genbank, Embl, Uniprot, or Swissprot files (either plain text or gzipped).
There are two ways to provide KoriBlast (and Blast) with a sequence databank: either download a Blast-ready databank from the web, or prepare one from a set of sequence files. In both cases, KoriBlast helps you install the databank.
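For readers who want to see what the underlying formatdb step looks like outside KoriBlast, here is a minimal Python sketch that only builds the command line and does not execute anything. The function name and the file names are hypothetical; the options used (-i for the input Fasta file, -p for the sequence type, -n for the databank base name) are those of the legacy NCBI formatdb tool, and the sketch assumes formatdb is installed separately if you decide to run the command.

```python
def build_formatdb_command(fasta_path, db_name, is_protein=True):
    """Return the argument list for a formatdb call (nothing is executed)."""
    return [
        "formatdb",
        "-i", str(fasta_path),             # input file, must be Fasta formatted
        "-p", "T" if is_protein else "F",  # T = protein, F = nucleotide
        "-n", db_name,                     # base name of the databank files
    ]


if __name__ == "__main__":
    # Hypothetical file and databank names, for illustration only.
    command = build_formatdb_command("my_proteins.fasta", "my_proteins")
    print(" ".join(command))
```

The resulting list could be passed to subprocess.run on a machine where formatdb is available. Within KoriBlast this step is handled for you.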
Getting Blast databanks from the NCBI
As far as we know, only the NCBI directly provides Blast-ready sequence databanks. They can be downloaded from the NCBI FTP site: ftp://ftp.ncbi.nih.gov/blast/db. The files of interest all have the extension '.tar.gz' and have to be downloaded using FileZilla or any other FTP tool of your choice. Starting with KoriBlast 2.6, these files can be uncompressed and unarchived directly from KoriBlast. The following table explains which files to download for each databank you may want to use. The NCBI help desk provides a more detailed document about the content of this repository, so do not hesitate to read it too.
All the files you can download from the NCBI Blast databank repository can be provided to KoriBlast to run Blast searches. This is quite easy, as explained here.
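If you are using a KoriBlast version older than 2.6, or simply want to inspect an archive by hand, the uncompress and unarchive step can be sketched with the Python standard library. This is an illustration under stated assumptions, not KoriBlast's own procedure: the function name and the paths are hypothetical, and the sketch deliberately skips any archive entry that is not a regular file or that would be written outside the destination directory.

```python
import shutil
import tarfile
from pathlib import Path


def extract_databank_archive(archive_path, dest_dir):
    """Uncompress and unarchive a .tar.gz file; return the extracted names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            # Skip non-regular entries and anything landing outside dest.
            if not member.isfile() or dest not in target.parents:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(member.name)
    return sorted(extracted)


if __name__ == "__main__":
    import sys
    # Usage: python extract_databank.py <archive.tar.gz> <destination directory>
    if len(sys.argv) == 3:
        for name in extract_databank_archive(sys.argv[1], sys.argv[2]):
            print(name)
```

The extracted files all belong to one databank and should stay together in the same directory.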
Getting sequence data files from the Web
Many sequence data sets are available worldwide in the file formats accepted by KoriBlast (Genbank, Embl, Uniprot, Swissprot, Fasta). We cannot mention all of them, but here are some major sources of data:
Any file downloaded from the sources in this table can be provided to KoriBlast to prepare a Blast databank. This is quite easy, as explained here.
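Before handing a downloaded file to KoriBlast, it can be useful to check whether it is gzipped and which format it appears to be in. The Python sketch below is a rough heuristic, not the detection logic used by KoriBlast: the function names are hypothetical, and the check relies only on well-known line prefixes (">" for Fasta, "LOCUS" for Genbank, "ID   " for Embl and Uniprot/Swissprot flat files). Since Embl and Uniprot/Swissprot entries both start with an ID line, the sketch does not try to tell them apart.

```python
import gzip


def open_sequence_file(path):
    """Open a plain-text or gzipped sequence file as text."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"  # gzip magic number
    if is_gzip:
        return gzip.open(path, "rt", encoding="ascii", errors="replace")
    return open(path, "r", encoding="ascii", errors="replace")


def guess_sequence_format(path, max_lines=50):
    """Guess the format from the first recognizable line near the top."""
    with open_sequence_file(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            if line.startswith(">"):
                return "fasta"
            if line.startswith("LOCUS"):
                return "genbank"
            if line.startswith("ID   "):
                return "embl-or-uniprot"
    return "unknown"


if __name__ == "__main__":
    import sys
    # Usage: python guess_format.py <sequence file> [<sequence file> ...]
    for file_name in sys.argv[1:]:
        print(file_name, guess_sequence_format(file_name))
```

Scanning the first few lines rather than only the very first one allows for files that carry a short header before the first entry.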
Using NCBI thematic databanks
The NCBI provides a wide range of sequence databanks for use with its Blast Internet service, and KoriBlast can use all of them. This is explained in the KoriBlast User Manual. Within KoriBlast, click the [Help and Visual Tutorial] button to open the manual. In the table of contents, follow the path: The Configuration Module, Updating the sequence databases for the NCBI Blast system.