| How to use pre-formatted Blast sequence databases? |
|
As you may know, some institutes such as the NCBI provide the scientific community with pre-formatted Blast sequence databanks. The NCBI banks are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/ . These banks can be used with KoriBlast to run Blast searches on your own computer. Here is the procedure.

First of all, download the databanks of interest: you can either use an FTP client such as FileZilla or configure the KoriBlast Databank Manager to do the job. For example, if you want to use the protein 'nr' Blast sequence databank, download all the files named 'nr.XX.tar.gz' from the NCBI FTP site. Starting with KoriBlast 2.6, you can uncompress, unarchive and install native Blast databanks directly with KoriBlast. This video shows you how to proceed.

Technical note about the pre-formatted (i.e. NCBI native) Blast databanks.
Pre-formatted Blast databanks are binary files that are archived with the Unix tool 'tar' and then compressed with 'gzip', which explains the file extension of native Blast databanks: '.tar.gz'. Once these files are uncompressed and unarchived, several files are extracted for each databank. Among them, the files ending with one of the Blast file extensions '.nin', '.nal', '.pin' or '.pal' are of particular interest (see the end of this article for the meaning of these extensions).

Blast sequence databanks made of multiple volumes (in our example, the 'nr' bank is made of 3 volumes: 'nr.00', 'nr.01' and 'nr.02') are always associated with a file having the '.pal' or '.nal' extension (in our example, 'nr.pal'). This is the file you have to provide to KoriBlast once you have extracted the content of the databank archives. Note: when a Blast sequence databank is made of a single volume (pdb or swissprot, for example), select the file having the '.pin' or '.nin' extension instead.

What do the Blast databank file extensions mean?
'nin' stands for Nucleotide INdex, 'nal' for Nucleotide ALias, 'pin' for Protein INdex and 'pal' for Protein ALias. A 'nin' file is always available for a nucleotide databank, whereas a 'nal' file is only available when the nucleotide databank is made of several volumes. The same applies to 'pin' and 'pal' for protein databanks.
Important notice: when you have downloaded a Blast sequence databank from the NCBI, be careful when selecting the file to use with KoriBlast:
1. When a databank is made of several volumes, give KoriBlast the '.pal' or '.nal' file.
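The procedure described in this article (uncompress and unarchive the '.tar.gz' files, then pick the alias file or the index file) can be sketched in Python. This is only an illustration under stated assumptions: the function names `extract_archives` and `file_for_koriblast` are hypothetical and are not part of KoriBlast, and the sketch assumes that the alias or index file is named after the databank (for example 'nr.pal' or 'swissprot.pin'), as in the examples given here.

```python
import tarfile
from pathlib import Path

# Alias files exist only for multi-volume databanks; index files otherwise.
ALIAS_EXTENSIONS = (".pal", ".nal")
INDEX_EXTENSIONS = (".pin", ".nin")


def extract_archives(download_dir, target_dir):
    """Uncompress and unarchive every '.tar.gz' file found in download_dir.

    Hypothetical helper: it only mirrors what 'gzip' and 'tar' would do.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(download_dir).glob("*.tar.gz")):
        # Mode "r:gz" undoes the gzip compression and reads the tar archive.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)


def file_for_koriblast(databank_dir, name):
    """Return the file to select for databank `name`, or None if none exists.

    Assumption: the file is called '<name>.pal', '<name>.nal',
    '<name>.pin' or '<name>.nin'.
    """
    folder = Path(databank_dir)
    # Prefer the alias file (several volumes), then fall back to the index file.
    for extension in ALIAS_EXTENSIONS + INDEX_EXTENSIONS:
        candidate = folder / (name + extension)
        if candidate.is_file():
            return candidate
    return None
```

With the 'nr' example, calling `file_for_koriblast(folder, "nr")` would return the path to 'nr.pal' if that file is present in `folder`, while a single-volume bank such as swissprot would resolve to 'swissprot.pin'.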
|