
    clustalw: Align a group of protein sequences in table format.

      The clustalw program is an easy-to-use, menu-driven program that accepts several sequence file formats. It is described in: Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Research, submitted, June 1994. If your sequences are in IG (IntelliGenetics) format, convert them to Fasta format first (you can use readseq to do this), then type clustalw to run the program. The program has online help. For more information, see the readme.txt and clustalw.ms files in the directory /usr/local/src/clustalw
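
      The IG-to-Fasta conversion mentioned above is normally done with readseq. As a rough illustration of what that conversion involves, here is a minimal Python sketch. It assumes the usual IG layout (comment lines starting with ";", one line with the sequence name, then sequence lines whose last line ends in "1" or "2"); the function name ig_to_fasta and the sample record are hypothetical and are not part of readseq or clustalw.

```python
def ig_to_fasta(ig_text, width=60):
    """Convert IntelliGenetics (IG) records to Fasta text.

    Assumed IG layout: optional comment lines starting with ';', one line
    holding the sequence name, then sequence lines. The final sequence
    line of a record ends with '1' (linear) or '2' (circular).
    """
    records = []
    name, residues = None, []
    for line in ig_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if name is None:
            if line.startswith(";"):
                continue          # comment line before the name
            name = line           # first non-comment line is the name
            continue
        finished = line[-1] in "12"
        if finished:
            line = line[:-1]      # drop the terminator digit
        residues.append(line.replace(" ", ""))
        if finished:
            records.append((name, "".join(residues)))
            name, residues = None, []
    if name is not None and residues:
        # Tolerate a last record that lacks the terminator digit.
        records.append((name, "".join(residues)))

    out = []
    for rec_name, seq in records:
        out.append(">" + rec_name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Made-up two-record sample, only for demonstration.
SAMPLE_IG = (
    "; hypothetical comment line\n"
    "seqA\n"
    "MVLSPADKTN\n"
    "VKAAWGKV1\n"
    "; second record\n"
    "seqB\n"
    "MKT2\n"
)

if __name__ == "__main__":
    print(ig_to_fasta(SAMPLE_IG), end="")
```

      For real work readseq remains the safer choice, since it handles many more formats and edge cases than this sketch.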
      
      

      The following is a short tutorial on how to align a group of sequences with the clustalw program.

      cyrus% gcg <Return>
      

      First, fetch a sequence to use as a query sequence. The GCG program called "fetch" retrieves a sequence from your local database and saves it in a file named after the sequence, with the name of the database appended. For example, the sequence "hba_human" is saved in a file called "hba_human.swissprot" because "hba_human" comes from the Swissprot database.

      cyrus% fetch hba_human <Return>
      

      Now, do the same for the rest of your sequences.

      cyrus% toig *.swissprot <Return>

      ToIG converts GCG sequence file(s) into a single file in IntelliGenetics format.

         glb5_petma   149 aa
         hba_horse    141 aa
         hba_human    141 aa
         hbb_horse    146 aa
         hbb_human    146 aa
         lgb2_luplu   153 aa
         myg_phyca    153 aa

      What should I call the output file (* glb5_petma.ig *) ? hb.ig

      Note: Now put all of the IG sequences into one file and convert it to fasta format with the readseq program.

      cyrus% dir *.ig <Return>
         2 1coh.ig      2 2mhb.ig    136 hb.ig
      cyrus% cat 1coh.ig 2mhb.ig hb.ig > temp.ig <Return>
      cyrus% readseq <Return>
      Enter an output filename: hb.fa <Return>
      Choose an output format (name or #): 8 <Return>
      Name an input sequence or -option: temp.ig <Return>
      Choose a sequence (# or All): all <Return>
      Name an input sequence or -option: Hit Return key <Return>
      cyrus% more hb.fa
      cyrus% clustalw <Return>

      In the main menu, enter

         1. Sequence Input From Disc
         Your choice: 1

      You will be asked for your sequence file, which can be in any of the following formats: NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF

         Enter the name of the sequence file: horse.fa

      You will return to the main menu. Now enter the choice for

         2. Multiple Alignments
         Your choice: 2

      In the ****** MULTIPLE ALIGNMENT MENU ****** enter

         9. Output format options
         Your choice: 9

         3. Toggle GCG/MSF format output = OFF
         Enter number (or [RETURN] to exit): 3
         Enter number (or [RETURN] to exit): 4

      At the "****** MULTIPLE ALIGNMENT MENU ******"

         1. Do complete multiple alignment now (Slow/Accurate)
         Your choice: 1

         Enter a name for the CLUSTAL output file [horse.aln]:
         Enter a name for the GCG output file [horse.msf]:
         Enter a name for the PHYLIP output file [horse.phy]:
         Enter name for GUIDE TREE file [horse.dnd]:
         Press [RETURN] to continue or X to stop:
         Press [RETURN] to continue or X to stop:
         Press [RETURN] to continue:

         Your choice:

         4. Phylogenetic trees
         Your choice: 4

         2. Exclude positions with gaps? = OFF
         3. Correct for multiple substitutions? = OFF
         Your choice: 2
         Your choice: 3

         4. Draw tree now
         Your choice: 4
         Enter name for PHYLIP tree output file [horse.ph]:

      To print out the tree, use the phylip program drawtree or drawgram

      cyrus% drawtree <Return>
      drawtree: can't read fontfile
      Please enter a new filename> /opt/pkg/PHYLIP/font1 <Return>
      printer? (Select choice L--Apple Laserwriter)
      preview? N (drawtree will create a postscript file called plotfile)

      ghostview plotfile    (To look at tree on screen--make sure your DISPLAY variable is set.)
      lpr -Plw plotfile     (To print out plotfile.)
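
      The tree files written in the session above (horse.dnd and horse.ph) are plain text in the parenthesized tree format that drawtree and drawgram read. Before plotting, it can be handy to check which sequences ended up in the tree. The following Python sketch lists leaf names and branch lengths with a regular expression; the function name newick_leaves and the sample tree with its branch lengths are made up for illustration and do not come from the run shown here.

```python
import re

# A leaf label directly follows '(' or ','; an optional ':length' may follow it.
# Labels after ')' (internal nodes) are deliberately not matched.
_LEAF = re.compile(
    r"[(,]\s*([^\s(),:;]+)\s*(?::\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?))?"
)


def newick_leaves(tree_text):
    """Return (name, branch_length) pairs for the leaves of a tree string.

    branch_length is None when the leaf has no ':length' part.
    """
    leaves = []
    for name, length in _LEAF.findall(tree_text):
        leaves.append((name, float(length) if length else None))
    return leaves


# Hypothetical tree, only for demonstration.
SAMPLE_TREE = """(
hba_human:0.10,
hba_horse:0.20,
(hbb_human:0.30,hbb_horse:0.40):0.05);
"""

if __name__ == "__main__":
    for name, length in newick_leaves(SAMPLE_TREE):
        print(name, length)
```

      To inspect a real file, read it with open("horse.ph").read() and pass the text to newick_leaves. A regular expression is enough for listing leaves, but recovering the branching structure would need a proper parser.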

      The clustalw program will create the following output files.

      cyrus% dir horse* <Return>
      -rw-r--r--   1 tom      users        397 Sep  3 15:45 horse.ph
      -rw-r--r--   1 tom      users       3739 Sep  3 15:42 horse.msf
      -rw-r--r--   1 tom      users       2276 Sep  3 15:42 horse.phy
      -rw-r--r--   1 tom      users        397 Sep  3 15:41 horse.dnd
      -rw-r--r--   1 tom      users       3200 Sep  3 15:37 horse.aln
      cyrus%
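
      Of these files, horse.aln holds the alignment in CLUSTAL format: a header line beginning with "CLUSTAL", followed by blocks in which each line carries a sequence name and a piece of the aligned sequence, with a conservation line (starting with blanks) under each block. The Python sketch below stitches those pieces back into one aligned string per sequence. The function name read_clustal and the sample alignment are hypothetical, and the sketch assumes the layout just described.

```python
def read_clustal(text):
    """Collect aligned sequences from CLUSTAL-format text, keyed by name."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a CLUSTAL alignment: missing header line")
    alignment = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with a blank).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue              # no sequence chunk on this line
        name, chunk = fields[0], fields[1]
        # Append this block's chunk to what was collected for the name.
        alignment[name] = alignment.get(name, "") + chunk
    return alignment


# Made-up two-block alignment, only for demonstration.
SAMPLE_ALN = """CLUSTAL W (1.4) multiple sequence alignment


seq1            MVLSPADKTN
seq2            MV-SGEDKSN
                ** *  ** *

seq1            VKAAWG
seq2            IKAAWG
                 *****
"""

if __name__ == "__main__":
    for name, seq in read_clustal(SAMPLE_ALN).items():
        print(name, len(seq), seq)
```

      All sequences in a valid alignment should come back with the same length, which makes a quick sanity check after a run: read horse.aln with open("horse.aln").read() and compare the lengths of the returned strings.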

      You can align your sequences with clustalw at our WWW site at http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/

      You can access a comprehensive list of multiple alignment programs at the VSNS BioComputing Division Multiple Alignment Resource Page http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html

      You can learn more about multiple sequence alignments by going to the ALGORITHMS FOR MULTIPLE SEQUENCE ALIGNMENTS homepage at the www site http://ben.vub.ac.be/embnet.news/vol2_1/align.html