This file discusses some advantages and disadvantages of the BLAST program and shows an example of a BLAST search in GCG.

    BLAST

    1. Advantages:

      1. Speed: The BLAST program searches for sequences similar to your query sequence. It is preferred over the FASTA program because it executes much faster. A typical local BLAST search finishes in under a minute, whereas a local FASTA search takes about 30 to 60 minutes or more.
      2. Sensitivity: When both programs are run with their default parameters, BLAST is usually more sensitive than FASTA for detecting protein sequence similarity, because the first stage of a BLAST search does not require a perfect match.
      3. DNA vs AA: The BLAST program can translate a nucleotide sequence directly into all six reading frames and search a protein database in a single run. With FASTA this would require six separate searches.
      4. Mask Repeat Areas: The BLAST program can filter out repeat regions and areas of low complexity, so you do not get many false hits just because your sequence contains a short repeat.
    2. Disadvantages:

      1. Weak DNA Hits: The long word size BLAST uses in a DNA similarity search lets the program run extremely fast, but the price of that speed is a loss in sensitivity. The FASTA program will show some weak DNA hits that will not appear in your BLAST report.
    3. Conclusion:

      1. The BLAST program is preferred over FASTA for sequence similarity searching: it gives you your answers in a minute or two instead of making you wait an hour for your FASTA report.
      2. As with any sequence comparison tool, interpreting the significance of a result is difficult. There are some rules of thumb, though: BLAST P-values of less than 1e-20 for protein comparisons can be taken as strong support for a shared sequence domain, and values between 1e-7 and 1e-19 are of potential interest. However, for any large set of matches to a query in this intermediate range, one must check for common functionality.
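    The rule of thumb in the conclusion can be written as a small function. This is a sketch only. The names classify_protein_hit and p_from_expect are hypothetical. Values in the small gap between 1e-20 and 1e-19, which the rule leaves open, are grouped with "potential interest" as an assumption. The GCG prompt asks for an expect value (E) rather than a P-value, so the sketch also includes the standard BLAST relation P = 1 - exp(-E) for converting between the two.

```python
import math


def p_from_expect(expect: float) -> float:
    """Convert a BLAST expect value E to a P-value using P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-expect)


def classify_protein_hit(p_value: float) -> str:
    """Apply the rule of thumb for protein comparisons to a P-value.

    Assumption: P-values between 1e-20 and 1e-19, which the rule does not
    assign, are grouped with the "potential interest" band.
    """
    if p_value < 1e-20:
        return "strong: likely shared sequence domain"
    if p_value <= 1e-7:
        return "potential interest: check matches for common function"
    return "not significant by this rule of thumb"


if __name__ == "__main__":
    for p in (3e-45, 2e-12, 0.004):
        print(f"{p:g}\t{classify_protein_hit(p)}")
```

    Keep in mind that thresholds like these are heuristics. They do not replace checking what the matched sequences actually are.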

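    The Speed, Sensitivity and Weak DNA Hits items all come down to how BLAST starts its search from short words. The sketch below shows only that seeding step for DNA. It indexes every exact word of length w in the query and reports where the same words occur in a subject sequence.

    This is a simplification, not BLAST itself. There is no extension or scoring step, and protein BLAST seeds on scored neighboring words rather than on exact matches. The function word_hits and both example sequences are hypothetical. The subject is built to differ from the query at every fifth base. As a result, a 4-base word still finds seeds on the main diagonal, while an 11-base word (a typical BLASTN word size) cannot.

```python
# Sketch of the word-seeding step only; no extension or scoring is done.

# Hypothetical mutation rule: substitute a different base at chosen positions.
MUTATE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


# Hypothetical example: the subject differs from the query at every fifth base.
query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
subject = "".join(
    MUTATE[base] if k % 5 == 4 else base for k, base in enumerate(query)
)

if __name__ == "__main__":
    for w in (4, 11):
        diagonal = [h for h in word_hits(query, subject, w) if h[0] == h[1]]
        print(f"word size {w:2d}: {len(diagonal)} seeds on the main diagonal")
```

    Longer words produce fewer chance seeds, so there is less to extend and the search runs faster. The trade-off is that a weak but real similarity with scattered mismatches may never produce a seed at all.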
    The following is a short tutorial on how to fetch a sequence and run a BLAST search within the GCG package.

    cyrus% gcg <Return>
    

    First, fetch a sequence to use as the query sequence. The GCG program "fetch" copies a sequence from your local database into a file. The file name is the sequence name followed by the database name as an extension. For example, the sequence "t57624" comes from the GenBank Expressed Sequence Tag database, so it is saved in a file called "t57624.gb_est1".
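    When scripting around fetch output, the naming convention can be captured in two small helpers. Both function names are hypothetical, and neither one calls GCG. The sketch also assumes that the database name itself contains no period.

```python
def fetched_filename(entry: str, database: str) -> str:
    """Local file name for a fetched entry: the entry name, a period, the database name."""
    return f"{entry}.{database}"


def split_fetched_filename(filename: str) -> tuple[str, str]:
    """Recover (entry, database) from a name built by fetched_filename."""
    # Split on the last period only; assumes the database name has no period.
    entry, database = filename.rsplit(".", 1)
    return entry, database


if __name__ == "__main__":
    name = fetched_filename("t57624", "gb_est1")
    print(name, split_fetched_filename(name))
```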

    cyrus% fetch t57624 <Return>
    

    Now, use the GCG blast program to perform the sequence search.

    cyrus% blast <Return>
    BLAST searches for sequences similar to a query sequence. The query and the
    database searched can be either peptide or nucleic acid in any combination. 
    BLAST can search databases on your own computer or databases maintained at
    the National Center for Biotechnology Information (NCBI) in Bethesda,
    Maryland, USA. 
     BLAST search with what query sequence?  t57624.gb_est1 <Return>
     Search for query in what sequence database:
       1) nr          p Non-redundant GenBank CDS translations+PDB+SwissProt+PIR   
       2)   pdb       p PDB protein sequences
       3)   swissprot p SwissProt sequences                                  
       4) yeast       p Saccharomyces cerevisiae protein sequences
       5) kabat       p Kabat Sequences of Proteins of Immunological Interest
       6) alu         p Translations of Select Alu Repeats from REPBASE
       7) month       p All new or revised GenBank CDS translation+PDB+SwissProt+PI
       8) nr          n Non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST's
       9)   pdb       n PDB nucleotide sequences
      10)   vector    n Vector subset of GenBank
      11) yeast       n Saccharomyces cerevisiae genomic nucleotide sequences
      12) est         n Non-redundant Database of GenBank+EMBL+DDBJ EST Division   
      13) sts         n Non-redundant Database of GenBank+EMBL+DDBJ STS Division   
      14) gss         n Genome Survey Sequences
      15) mito        n Database of mitochondrial sequences, Rel. 1.0, July 1995   
      16) kabat       n Kabat Sequences of Nucleic Acid of Immunological Interest  
      17) epd         n Eukaryotic Promotor Database
      18) alu         n Select Alu Repeats from REPBASE
      19) month       n All new or revised GenBank+EMBL+DDBJ+PDB sequences released
     Please choose one (* 1 *):  8 <Return>  
     Ignore hits expected to occur by chance more than (* 10.0 *) times? <Return>  
     Limit the number of sequences in my output to (* 250 *) ?   <Return>  
     What should I call the output file (* t57624.blastn *) ?   <Return>  
     Trying cruncher.nlm.nih.gov (130.14.25.175)
     Connected to cruncher.nlm.nih.gov
    Waiting for 7 of 8 other BLAST requests on the system to finish.
    Still waiting
     Search in progress on the network server.
     ............................................
     Retrieving results.
     ....
    WARNING:  Descriptions of 668 database sequences were not reported due to the
              limiting value of parameter V = 250.
    ..........................................
    WARNING:  HSPs involving 818 database sequences were not reported due to the
              limiting value of parameter B = 100.
    .
    cyrus% more t57624.blastn <Return>  
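    The prompts and warnings in this session fit together as follows:
    - The expect value (10.0) decides which database sequences count as hits.
    - V (250) caps how many one-line descriptions are listed.
    - B (100) caps how many sequences get alignments (HSPs).

    The two warnings are consistent with each other, since 250 + 668 = 100 + 818 = 918 sequences passed the expect cutoff. The sketch below models only that bookkeeping. Hit and limit_report are hypothetical names, the hit list is synthetic rather than real search data, and this is not GCG's code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A database sequence found by a search (hypothetical record type)."""
    name: str
    expect: float


def limit_report(hits, expect_cutoff=10.0, v=250, b=100):
    """Apply an expect cutoff, then cap descriptions at V and alignments at B.

    Returns (descriptions, alignments, descriptions_dropped, alignments_dropped).
    """
    # Keep hits expected by chance at most `expect_cutoff` times, best first.
    kept = sorted((h for h in hits if h.expect <= expect_cutoff), key=lambda h: h.expect)
    descriptions = kept[:v]
    alignments = kept[:b]
    return descriptions, alignments, len(kept) - len(descriptions), len(kept) - len(alignments)


# Synthetic hits: 918 below the cutoff plus one above it (not real search data).
hits = [Hit(f"seq{k}", k * 0.01) for k in range(918)] + [Hit("noise", 50.0)]

if __name__ == "__main__":
    desc, aln, lost_v, lost_b = limit_report(hits)
    print(f"{len(desc)} descriptions shown, {lost_v} not reported (V = 250)")
    print(f"{len(aln)} alignments shown, {lost_b} not reported (B = 100)")
```

    Raising V or B at the prompts would show more of the hits that already passed the cutoff. Lowering the expect value instead shrinks the set of hits itself.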
    
    
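    The query and database types in the menu above (marked n or p) determine which member of the NCBI BLAST family runs. The default output name t57624.blastn appears to reflect a nucleotide query searched against a nucleotide database. The sketch below pairs that choice with the six-frame translation step, described under "DNA vs AA", that lets blastx search a protein database with a DNA query.

    It is an illustration, not GCG's implementation. The names choose_program, translate and six_frames are hypothetical, and only the standard genetic code is used. tblastx, which searches a translated query against a translated nucleotide database, is left out of the table.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# BLAST program for each (query type, database type), using the menu's n/p codes.
PROGRAMS = {
    ("n", "n"): "blastn",   # nucleotide query vs. nucleotide database
    ("p", "p"): "blastp",   # protein query vs. protein database
    ("n", "p"): "blastx",   # six-frame translated query vs. protein database
    ("p", "n"): "tblastn",  # protein query vs. six-frame translated database
}


def choose_program(query_type: str, db_type: str) -> str:
    """Pick the BLAST program for a query/database type pair ("n" or "p")."""
    return PROGRAMS[(query_type, db_type)]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[int, str]:
    """Translate frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    dna = dna.upper()
    rev_comp = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rev_comp[offset:])
    return frames


if __name__ == "__main__":
    print(choose_program("n", "n"))
    for frame, protein in six_frames("ATGGCCTAA").items():
        print(f"frame {frame:+d}: {protein}")
```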

    You can BLAST your sequence against genome databases on our WWW site at http://bmerc-www.bu.edu/genome/genomeblastp.html

    You can BLAST your sequence at NCBI via the WWW site http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast?Jform=0

    You can BLAST your sequence against EMBL in France at the WWW site http://vega.crbm.cnrs-mop.fr/bin/blast-guess.cgi