Here are some notes on how to create phylogenetic trees by using the
pima-to-paup.sh program after running pima.sh
CREATING PHYLOGENETIC TREES USING PIMA AND PIMA-TO-PAUP
If you want to create a phylogenetic tree you could run
doolitle in Eugene, but this works well only if your sequences
are similar.
To create a tree using pima and pima-to-paup, you will need one file
containing all of your DNA sequences in table format, and a second
file containing the exactly corresponding amino acid translations of
those DNA sequences, also in table format. How you create these
files is immaterial. One way to do it is the following:
STEP 1: CREATING A FILE OF DNA SEQUENCES.
Use lynx or the gcg program lookup to search for the DNA sequences
that you want.
Write down the LOCUS name for each sequence you are interested in.
Type gcg to bring up the GCG package.
Fetch all of the sequences you want from GenBank in GCG format.
Look at the Features table for each sequence. Write down the
positions of the CODING SEQUENCE, "CDS".
Now, use the program seqed to write out the CDS portions of your
DNA sequences to new files:
mbcrr% seqed hiv1bz163b.gb_vi <Return>
: 1,1015 w hiv1bz163b.dna <Return>
(The above command creates a file containing only the DNA that
you are interested in: the DNA that corresponds exactly to
the amino acid translation.)
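As a rough illustration of what this extraction does, here is a minimal
Python sketch that cuts a CDS out of a longer sequence. It assumes the
positions are 1-based and inclusive, as in a GenBank Features table. The
function name extract_cds is made up for this example and is not part of
GCG.

```python
def extract_cds(sequence, start, end):
    """Return bases start..end of sequence (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions fall outside the sequence")
    # Python slices are 0-based and exclude the end index,
    # so position `start` is index start - 1.
    return sequence[start - 1:end]


# Toy sequence (not real data): two flanking bases on each side.
print(extract_cds("GGATGAAATAAGG", 3, 11))
```

The length of the result is always end - start + 1, which is a quick
way to check a file written by seqed.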
REPEAT the above steps until you have retrieved all of your
sequences, and written down the CDS positions for each.
Tutorial: To work through a phylogenetic analysis of a set of
HIV GAG sequences, use the seqed command as described above to
write out the GAG CDS positions of the following GenBank
sequences:
GenBank Locus    Positions
HIV1BZ163B       1..1015
HIV2ROD          546..2114
HIVLBV217        1..1462
HIVUG268         1..1465
HIVVI557         1..1468
SIVAGM3          431..1996
SIVAGM677        897..2438
SIVMM251         1041..2561
SIVSTM           709..2232
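Because the DNA must match the protein translation exactly, it can help
to know how long each extracted piece should be. The following is a
small Python sketch, not part of pima or GCG, that computes the length
of each range in the table above and how many bases are left over after
whole codons. The dictionary name CDS_RANGES and the function cds_length
are made up for this example.

```python
# Positions copied from the tutorial table (1-based, inclusive).
CDS_RANGES = {
    "HIV1BZ163B": (1, 1015),
    "HIV2ROD": (546, 2114),
    "HIVLBV217": (1, 1462),
    "HIVUG268": (1, 1465),
    "HIVVI557": (1, 1468),
    "SIVAGM3": (431, 1996),
    "SIVAGM677": (897, 2438),
    "SIVMM251": (1041, 2561),
    "SIVSTM": (709, 2232),
}


def cds_length(start, end):
    """Number of bases in an inclusive 1-based range."""
    return end - start + 1


for locus, (start, end) in CDS_RANGES.items():
    length = cds_length(start, end)
    # length % 3 is the number of bases beyond the last whole codon.
    print(f"{locus:<12} {length:5d} bases, {length % 3} left over")
```

A nonzero remainder is not necessarily an error, but it means the DNA
file is not a whole number of codons, which is worth checking against
the translated protein.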
Now, put all of these sequences into IG format:
mbcrr% toig *.dna <Return>
output file: bigdnafile
mbcrr% IG-to-tbl bigdnafile > bigdnafile.tbl <Return>
Now you are done with STEP 1, creating a file of DNA sequences.
Before we use this file, we must create a file of protein
sequences, and then run pima.
STEP 2: CREATING A FILE OF PROTEIN SEQUENCES.
These must be the amino acid translations that exactly match the
DNA sequences above.
With the GCG package you can use the translate program.
(If you have followed the above directions closely, and all of
your DNA CDS files end in ".dna", you can translate them all
in one step:
mbcrr% translate <Return>
filename? *.dna <Return>
Otherwise, you will have to run translate on each of your DNA files.)
mbcrr% translate <Return>
filename? ciohombox.dna <Return>
pos 1 to end
output? ciohombox.aa <Return>
Remember to use a consistent naming strategy for your peptide files
so that you can use a wildcard like "*.aa" to list all of your
peptide files.
Repeat the above step for each DNA sequence.
mbcrr% toig *.aa <Return>
output file: bigpepfile
mbcrr% IG-to-tbl bigpepfile > bigpepfile.tbl <Return>
Your output file, bigpepfile.tbl, is now in table format.
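Before running pima, you may want to confirm that the two table files
really correspond. The Python sketch below is one possible check, not
part of the pima distribution. It assumes that table format means one
record per line (a name, whitespace, then the sequence) and that the
same names appear in both files; both are assumptions, and the function
names read_tbl and find_mismatches are made up for this example.

```python
def read_tbl(text):
    """Parse table-format text into {name: sequence}.

    Assumes one record per line: name, whitespace, sequence.
    """
    records = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        records[parts[0]] = "".join(parts[1:])
    return records


def find_mismatches(dna_text, pep_text):
    """List (name, problem) pairs where DNA and protein disagree."""
    dna = read_tbl(dna_text)
    pep = read_tbl(pep_text)
    problems = []
    for name in sorted(set(dna) | set(pep)):
        if name not in dna or name not in pep:
            problems.append((name, "missing from one file"))
        elif len(dna[name]) != 3 * len(pep[name]):
            # Exact correspondence means three bases per residue.
            problems.append((name, "length mismatch"))
    return problems


# Toy records, not real data.
toy_dna = "seqa ATGAAA\nseqb ATGAA\n"
toy_pep = "seqa MK\nseqb MK\nseqc M\n"
print(find_mismatches(toy_dna, toy_pep))
```

To use it on real files, read bigdnafile.tbl and bigpepfile.tbl into
strings and pass them to find_mismatches.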
STEP 3: RUN THE PIMA PROGRAM.
mbcrr% pima.sh clus1 25.0 bigpepfile.tbl <Return>
The above command runs pima. "clus1" is the prefix used to name all
of pima's output files, 25.0 is the recommended default value
(you could use 0 instead, but this would force all of the sequences
into one cluster), and "bigpepfile.tbl" is the name of the file
containing all of the protein sequences in table format.
NOTE: If you get an error message such as
"ERROR: Sequence "hiv2rod" contains undefined characters !! ", check
that you do not have any STOP CODONS in your input file. If you see
any "." or "*" characters at the end of your protein sequences,
remove them, and delete the corresponding codons from your DNA file.
Then repeat the pima.sh command above.
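The stop-codon cleanup described in the note can be sketched in Python
as follows. This is only an illustration of the idea, not a tool that
ships with pima. It assumes the DNA ends with exactly one codon for each
trailing "*" or "." in the protein; the function name
strip_trailing_stops is made up for this example.

```python
def strip_trailing_stops(protein, dna, stop_chars="*."):
    """Remove trailing stop symbols and their codons.

    Returns the trimmed (protein, dna) pair. Assumes each trailing
    stop symbol corresponds to the last three bases of the DNA.
    """
    trimmed = protein.rstrip(stop_chars)
    n_stops = len(protein) - len(trimmed)
    if n_stops == 0:
        return protein, dna
    # Drop three bases from the end of the DNA per stop symbol.
    return trimmed, dna[:len(dna) - 3 * n_stops]


# Toy pair, not real data: "TAA" is the stop codon being removed.
print(strip_trailing_stops("MK*", "ATGAAATAA"))
```

Stop symbols in the middle of a sequence are left alone here; those
would need to be looked at by hand.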
Pima will create a number of output files; you are only interested in
"clus1-ML.pima", or possibly "clus1-SB.pima".
To put your alignments in printable form, run the following program.
mbcrr% print-pima.sh pimafile > outputfile <Return>
for example, we would type the following:
mbcrr% print-pima.sh clus1-ML.pima > clus1-ML.pp <Return>
mbcrr% more clus1-ML.pp
NOTE: above file viewed with gcg pretty program.
mbcrr% enscript -rfCourier8 -d qms clus1-ML.pp <Return>
or
mbcrr% lpr clus1-ML.pp <Return>
To get detailed help on pima, type "man pima" at the mbcrr% prompt.
mbcrr% man pima <Return>
STEP 4: RUN PIMA-TO-PAUP TO CREATE A PHYLOGENETIC TREE.
(Note: this program only runs on SunOS machines, not Solaris
machines, so you must run it on mbcrrb.dfci.harvard.edu
or hinshelwood.bu.edu.)
mbcrr% telnet mbcrrb (Or type rlogin mbcrrb) <Return>
login name: enter login name
passwd: enter password
mbcrrb% pima-to-paup.sh clus1-ML.pima bigdnafile.tbl <Return>
This program will take a few minutes to run. When finished:
mbcrr% enscript -rfCourier8 -d qms *ML.paup* <Return>
(to print out ascii text files sideways)
or
mbcrr% lpr -Plj3a *ML.tree* <Return>
(to print out postscript files on the lj3a laser printer located
in Mayer 3A-12)
Note to darwin/hinshelwood users:
You must specify the printer you want, such as:
hinshelwood% lpr -Plj *.tree* <Return>
or
hinshelwood% enscript -rfCourier8 -d qms *.paup* <Return>
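These notes do not describe how pima-to-paup.sh combines the protein
alignment with the DNA table. One common way to combine such inputs is
to thread each DNA sequence onto its aligned protein, three bases per
residue and three gap characters per gap. The Python sketch below shows
that idea only, to illustrate why the DNA must correspond exactly to the
protein; it is not the program's own code. The gap character "-" and the
function name thread_codons are assumptions made for this example.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Lay codons from dna under an aligned protein sequence."""
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(dna) != 3 * n_residues:
        raise ValueError("DNA does not match protein exactly")
    pieces = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            pieces.append(gap * 3)  # one protein gap = three DNA gaps
        else:
            pieces.append(dna[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Toy alignment row, not real data.
print(thread_codons("M-K", "ATGAAA"))
```

If a stop codon or a stray base is left in the DNA, the length check
fails, which is the kind of mismatch the earlier steps are meant to
prevent.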
Example of PAUP tree for 1st codon positions
![1st Codon Position PAUP tree [21 kilobytes]](clus1-ML.tree-1.gif)
Example of PAUP tree for 2nd codon positions
![2nd Codon Position PAUP tree [30 kilobytes]](clus1-ML.tree-2.gif)
Example of PAUP tree for 3rd codon positions
![3rd Codon Position PAUP tree [30 kilobytes]](clus1-ML.tree-3.gif)
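The three example trees are labeled by codon position. As a minimal
sketch of what separating codon positions means, the Python below splits
an in-frame DNA sequence into its first, second, and third codon
position sites. It assumes the sequence starts at the first base of a
codon; the function name split_codon_positions is made up for this
example, and this is not a description of how pima-to-paup.sh or PAUP
does it internally.

```python
def split_codon_positions(in_frame_dna):
    """Return (first, second, third) codon-position sites."""
    # Every third character starting at offsets 0, 1 and 2.
    return tuple(in_frame_dna[offset::3] for offset in range(3))


# Toy sequence, not real data: codons ATG, AAA, TTT.
first, second, third = split_codon_positions("ATGAAATTT")
print(first, second, third)
```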
Pima.sh Alignment of GAG Sequences displayed with prettyplot
![pima.sh&prettyplot example [300 kilobytes]](color1.gif)