KLAST was compared with BLAST and SSearch to evaluate the speedup and the quality of the results produced by the new algorithm. Because SSearch was included in this test, we chose reduced data sets to keep its long running times manageable.
The KLASTp benchmark compared the first 2327 proteins from the black cottonwood (Populus trichocarpa) proteome against the first 2.9 million sequences from the NCBI RefSeq databank. All computations were conducted on an Apple MacPro computer.
KLAST: release 2.0
BLAST: release 2.2.26+ from NCBI
SSearch: release 36 from University of Virginia
Data sets retrieved on April 25th, 2012:
1. Query databank: Populus trichocarpa, Fasta file Ptrichocarpa_156_peptide.fa.gz from ftp://ftp.jgi-psf.org/pub/JGI_data/phytozome/v7.0/Ptrichocarpa/annotation/.
2. Subject databank: NCBI RefSeq pre-formatted databank, volume 00. The file refseq_protein.00.tar.gz from ftp://ftp.ncbi.nih.gov/blast/db/ was processed with the blastdbcmd tool to extract the Fasta file.
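The "first N sequences" subsets described above can be taken from a Fasta file with a few lines of Python. This is only a minimal sketch, not the procedure actually used for the benchmark: the function names `read_fasta` and `first_n_records` and the toy records are hypothetical and serve for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from Fasta-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_n_records(lines: Iterable[str], n: int) -> List[Tuple[str, str]]:
    """Return the first n records, reading no more input than needed."""
    return list(islice(read_fasta(lines), n))


# Toy input (hypothetical records, not taken from the benchmark data).
example = [">p1 demo", "MKT", "AAV", ">p2", "MSS", ">p3", "MLL"]
subset = first_n_records(example, 2)
```

For a compressed file such as Ptrichocarpa_156_peptide.fa.gz, the same functions could be fed from `gzip.open(path, "rt")`, for example `first_n_records(handle, 2327)`.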
All tests were conducted on an Apple MacPro computer running OS X Lion (10.7.3) with two 2.66 GHz 6-core Intel Xeon “Westmere” processors, 32 GB RAM and a 1 TB HDD.
Running time (s)
- All three programs were configured with an increasing number of cores for computation, a BLOSUM62 matrix and an E-value threshold of 1e-3. Results were written to tabular files to enable comparison of data between BLAST/KLAST and SSearch.
- Accuracy was evaluated by computing the fraction (%) of sequence alignments produced by each algorithm that are also found by a reference algorithm: SSearch. Results from BLAST and KLAST were compared with SSearch as follows: for each query sequence, we checked equality between hit sequence IDs and sequence alignment locations.
- It is worth noting that KLAST is faster than BLAST even on a single computing core.
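The accuracy measure described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the script used for the benchmark: it assumes the tabular files follow the 12-column BLAST tabular layout (query ID, subject ID, identity, length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), and it treats two alignments as equal only when the IDs and all four coordinates match exactly. The names `parse_tabular` and `accuracy` and the sample rows are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (query ID, hit ID, query start, query end, hit start, hit end)
Alignment = Tuple[str, str, int, int, int, int]


def parse_tabular(lines: Iterable[str]) -> Set[Alignment]:
    """Collect alignment keys from tab-separated result lines.

    Assumes the 12-column BLAST tabular layout; comment lines
    starting with '#' and blank lines are skipped.
    """
    alignments: Set[Alignment] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        alignments.add((f[0], f[1], int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return alignments


def accuracy(candidate: Set[Alignment], reference: Set[Alignment]) -> float:
    """Percentage of candidate alignments also present in the reference."""
    if not candidate:
        return 0.0
    return 100.0 * len(candidate & reference) / len(candidate)


# Hypothetical rows for illustration only (not benchmark results).
reference_rows = [
    "q1\ts1\t90.0\t100\t5\t0\t1\t100\t11\t110\t1e-30\t200",
    "q1\ts2\t80.0\t50\t10\t0\t5\t54\t1\t50\t1e-10\t90",
]
candidate_rows = [
    "q1\ts1\t90.0\t100\t5\t0\t1\t100\t11\t110\t2e-30\t199",  # same hit, same location
    "q1\ts3\t70.0\t40\t12\t0\t3\t42\t7\t46\t1e-5\t60",       # absent from the reference
]
score = accuracy(parse_tabular(candidate_rows), parse_tabular(reference_rows))
```

Keying each alignment by query ID, hit ID and coordinates means that scores and E-values may differ between tools without affecting the match, which mirrors the comparison on IDs and locations described above.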
Comparison of algorithms running on 8 cores.