The following is a short tutorial on how to align a group of sequences with the clustalw program.
cyrus% gcg <Return>

First, fetch a sequence to use as a query sequence. The GCG program "fetch" retrieves a sequence from your local database and saves it in a file whose name is the sequence name with the database name appended. For example, the sequence "hba_human" is saved in a file called "hba_human.swissprot" because "hba_human" comes from the Swissprot database.
cyrus% fetch hba_human <Return>

Now, do the same for the rest of your sequences.
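Repeating the fetch step by hand for every sequence is easy to get wrong, so it can help to write the list down first. The following is a minimal sketch in Python (a language this tutorial does not otherwise use) that only prints the commands to type and the file each one should produce; it does not run GCG. The helper name fetch_plan is hypothetical, and the entry names other than "hba_human" are assumed to follow the same name_species form.

```python
# Entry names for the globin sequences used in this tutorial.
# Only "hba_human" is spelled out in the text above; the others are
# assumed to follow the same name_species pattern.
ENTRY_NAMES = [
    "glb5_petma", "hba_horse", "hba_human", "hbb_horse",
    "hbb_human", "lgb2_luplu", "myg_phyca",
]


def fetch_plan(names, database="swissprot"):
    """Return (command, expected_file) pairs, one per entry name.

    The expected file name follows the rule described above:
    the sequence name with the database name appended.
    """
    return [(f"fetch {name}", f"{name}.{database}") for name in names]


# Dry run: show what to type at the prompt and which file should appear.
for command, expected_file in fetch_plan(ENTRY_NAMES):
    print(f"{command:<20} -> {expected_file}")
```

After fetching, a listing of *.swissprot should show one file per line printed by the sketch.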
cyrus% toig *.swissprot <Return>

 ToIG converts GCG sequence file(s) into a single file in IntelliGenetics format.

   glb5_petma   149 aa
   hba_horse    141 aa
   hba_human    141 aa
   hbb_horse    146 aa
   hbb_human    146 aa
   lgb2_luplu   153 aa
   myg_phyca    153 aa

 What should I call the output file (* glb5_petma.ig *) ? hb.ig

Note: Now put all of the IG sequences into one file and convert it to fasta format with the readseq program.

cyrus% dir *.ig <Return>
   2 1coh.ig
   2 2mhb.ig
 136 hb.ig
cyrus% cat 1coh.ig 2mhb.ig hb.ig > temp.ig <Return>
cyrus% readseq <Return>
Enter an output filename: hb.fa <Return>
Choose an output format (name or #): 8 <Return>
Name an input sequence or -option: temp.ig <Return>
Choose a sequence (# or All): all <Return>
Name an input sequence or -option: <Return>   (Hit Return key)
cyrus% more hb.fa
cyrus% clustalw <Return>

In the main menu, enter

   1. Sequence Input From Disc
   Your choice: 1

You will be asked for your sequence file, which can be in any of the following formats: NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF

   Enter the name of the sequence file: horse.fa

You will return to the main menu. Now enter the choice for

   2. Multiple Alignments
   Your choice: 2

In the ****** MULTIPLE ALIGNMENT MENU ****** enter

   9. Output format options
   Your choice: 9

   3. Toggle GCG/MSF format output = OFF
   Enter number (or [RETURN] to exit): 3
   Enter number (or [RETURN] to exit): 4

At the "****** MULTIPLE ALIGNMENT MENU ******"

   1. Do complete multiple alignment now (Slow/Accurate)
   Your choice: 1
   Enter a name for the CLUSTAL output file [horse.aln]:
   Enter a name for the GCG output file [horse.msf]:
   Enter a name for the PHYLIP output file [horse.phy]:
   Enter name for GUIDE TREE file [horse.dnd]:
   Press [RETURN] to continue or X to stop:
   Press [RETURN] to continue or X to stop:
   Press [RETURN] to continue:
   Your choice:

   4. Phylogenetic trees
   Your choice: 4

   2. Exclude positions with gaps? = OFF
   3. Correct for multiple substitutions? = OFF
   Your choice: 2
   Your choice: 3

   4. Draw tree now
   Your choice: 4
   Enter name for PHYLIP tree output file [horse.ph]:

To print out the tree, use the PHYLIP program drawtree or drawgram.

cyrus% drawtree <Return>
drawtree: can't read fontfile
Please enter a new filename> /opt/pkg/PHYLIP/font1 <Return>
printer? (Select choice L--Apple Laserwriter)
preview? N (drawtree will create a postscript file called plotfile)

   ghostview plotfile   (To look at tree on screen--make sure your DISPLAY variable is set.)
   lpr -Plw plotfile    (To print out plotfile.)

The clustalw program will create the following output files.
cyrus% dir horse* <Return>
-rw-r--r-- 1 tom users  397 Sep 3 15:45 horse.ph
-rw-r--r-- 1 tom users 3739 Sep 3 15:42 horse.msf
-rw-r--r-- 1 tom users 2276 Sep 3 15:42 horse.phy
-rw-r--r-- 1 tom users  397 Sep 3 15:41 horse.dnd
-rw-r--r-- 1 tom users 3200 Sep 3 15:37 horse.aln
cyrus%

You can align your sequences with clustalw at our WWW site at http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/
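The CLUSTAL-format alignment file (horse.aln in the listing above) is plain text, so it can be checked with a short script once the alignment has finished. The following is a minimal sketch in Python, assuming the usual layout of such a file: a header line beginning with "CLUSTAL", then blocks of "name sequence" lines, each block followed by a conservation line that starts with white space. The function name parse_clustal and the sample data are made up for illustration; the sample is not output from an actual clustalw run.

```python
def parse_clustal(text):
    """Parse CLUSTAL-format alignment text into {name: aligned_sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a CLUSTAL alignment: missing header line")

    alignment = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with white space).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]
        # The same name reappears in every block; join the pieces in order.
        alignment[name] = alignment.get(name, "") + chunk
    return alignment


# Toy two-block alignment, invented for this example.
SAMPLE = """CLUSTAL W multiple sequence alignment

seqA            VLSPADKTNV
seqB            VLSAADKTNV
                *** ******

seqA            KAAWGKV-AH
seqB            KAAWSKVGGH
                **** **  *
"""

aligned = parse_clustal(SAMPLE)
for name, sequence in aligned.items():
    print(name, len(sequence), sequence)
```

To try it on a real file, read the file's contents into a string and pass that string to parse_clustal. Every aligned sequence should come out with the same length, gaps included.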
You can access a comprehensive list of multiple alignment programs at the VSNS BioComputing Division Multiple Alignment Resource Page http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html
You can learn more about multiple sequence alignments by going to the ALGORITHMS FOR MULTIPLE SEQUENCE ALIGNMENTS homepage at http://ben.vub.ac.be/embnet.news/vol2_1/align.html