• Motivation for Multiple Alignment

    After you run a blast sequence similarity search, you will have a file containing the alignments of database sequences against your query sequence. This output has two limitations. First, it shows only the pairwise alignments between your sequence and each database sequence; it does not show how all the sequences align with each other. Second, it shows only the regions that your sequence shares with each database sequence; it does not show the whole sequences aligned with each other. To get around these limitations, you can do a multiple alignment of a set of sequences. You might want to look at your blast output and retrieve all the sequences that have a blast score better than 10^-28. If you have too many hits, you might want to take just one hit from each family.

    PIMAII: aligns a group of protein sequences supplied in table format.

    1. Advantages:

      1. Speed: The PIMAII program is much faster than pileup.
      2. Sensitivity: The PIMAII program is usually more sensitive than other multiple alignment programs.
    2. Disadvantages:

      1. No DNA: The PIMAII program does not work with DNA sequences. Your sequences must be Protein.
    3. Conclusion:

      1. The PIMAII program is preferred over pileup and other multiple alignment programs when aligning a set of protein sequences.
    
    

    The following is a short tutorial on how to align a group of sequences with the PIMAII program. Begin by starting GCG.

    cyrus% gcg <Return>
    

    First, fetch a sequence to use as a query sequence. The GCG program called "fetch" retrieves a sequence from your local database and saves it in a file whose name ends with the name of the database. For example, the sequence "hba_human" is saved in a file called "hba_human.swissprot" because "hba_human" comes from the Swissprot database.

    cyrus% fetch hba_human <Return>
    

    Now, do the same for the rest of your sequences.
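    If you have many sequences, repeating the fetch step by hand gets tedious. The following is only a sketch of how it could be looped, written for a Bourne-style shell (the "cyrus%" prompt suggests a C shell, so you may need to run it with sh). The helper name "fetch_all" and the FETCH variable are hypothetical, and it assumes that "fetch" accepts a sequence name on the command line as shown above. By default it only prints the commands it would run.

```shell
# Hypothetical helper: run the GCG "fetch" command once per sequence name.
# FETCH defaults to a dry run that only prints each command; set FETCH=fetch
# inside a GCG session to actually retrieve the sequences.
FETCH=${FETCH:-"echo fetch"}

fetch_all() {
    for name in "$@"; do
        # $FETCH is left unquoted on purpose so "echo fetch" splits into two words.
        $FETCH "$name" || echo "could not fetch $name" >&2
    done
}

# The seven Swissprot entries used in this tutorial.
fetch_all glb5_petma hba_horse hba_human hbb_horse hbb_human lgb2_luplu myg_phyca
```

    Check the printed commands first, then rerun with FETCH=fetch once they look right.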

    cyrus% toig *.swissprot <Return>

    ToIG converts GCG sequence file(s) into a single file in IntelliGenetics format.

    glb5_petma   149 aa
    hba_horse    141 aa
    hba_human    141 aa
    hbb_horse    146 aa
    hbb_human    146 aa
    lgb2_luplu   153 aa
    myg_phyca    153 aa

    What should I call the output file (* glb5_petma.ig *) ? swiss.ig <Return>

    cyrus% dir *.ig <Return>
    2 1coh.ig
    2 2mhb.ig
    136 swiss.ig

    cyrus% cat *.ig > peps <Return>
    cyrus% more peps
    cyrus% IG-to-tbl peps > peps.tbl <Return>

    cyrus% pimaII <Return>
    Usage: pimaII [-h] Sequence-File Outfile-Prefix -(l|g|s) -[f|v|k|a|p] [-c config-file] [-t locus-names-file]

    cyrus% pimaII peps.tbl horse -l <Return>

    The pimaII program will create the following output files.

    cyrus% dir horse* <Return>
    8 horse
    2 horse.cluster
    2 horse.root
    4 horse.align
    4 horse.pattern
    cyrus%

    The file "horse.align" will contain the alignment information.
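    Before going further, it can help to confirm that the run produced everything it should. The following is a minimal sketch for a Bourne-style shell. The function name "check_pima_outputs" is hypothetical, and it assumes that every run produces the same five files as the listing above, which may not hold for other pimaII options.

```shell
# Hypothetical check: report which of the five pimaII output files
# (prefix, prefix.cluster, prefix.root, prefix.align, prefix.pattern)
# are missing or empty. Returns 0 only when all five are present.
check_pima_outputs() {
    prefix=$1
    status=0
    for f in "$prefix" "$prefix.cluster" "$prefix.root" "$prefix.align" "$prefix.pattern"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example use with the "horse" prefix from this tutorial:
# check_pima_outputs horse && echo "pimaII output looks complete"
```

    The function prints one line per missing file, so a failed run is easy to spot before you try to format the alignment.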

    To put this alignment information into a prettier format, you can use the "print_pimaII.sh" program.

    cyrus% print_pimaII.sh horse.align > horse.pp <Return>
    cyrus% more horse.pp <Return>  
    
    

    You can also run PIMAII on your sequences at our WWW site at http://bmerc-www.bu.edu/protein-seq/pimaII-new.html

    You can access a comprehensive list of multiple alignment programs at the VSNS BioComputing Division Multiple Alignment Resource Page at http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html

    You can learn more about multiple sequence alignments by going to the ALGORITHMS FOR MULTIPLE SEQUENCE ALIGNMENTS homepage at http://ben.vub.ac.be/embnet.news/vol2_1/align.html