These notes describe how to create phylogenetic trees by running pima.sh and then the pima-to-paup.sh program.

CREATING PHYLOGENETIC TREES USING PIMA AND PIMA-TO-PAUP


If you want to create a phylogenetic tree you could run
doolitle in Eugene, but this works well only if your sequences
are similar.

To create a tree using pima and pima-to-paup, you will need two
files: one containing all of your DNA sequences in table format,
and a second containing the exactly corresponding amino acid
translations of those DNA sequences, also in table format.  How
you create these files is immaterial.  One way to do it is
the following:

STEP 1:  CREATING A FILE OF DNA SEQUENCES

Use lynx or the gcg program lookup to search for the DNA sequences
that you want.
Write down the LOCUS name for each sequence you are interested in.

Type gcg to bring up the gcg package.

Fetch all of the sequences you want from GenBank in GCG format.
Look at the Features table for each sequence.  Write down the
positions of the coding sequence, "CDS".

Now, use the program seqed to write out the CDS portions of your
DNA sequences to new files:

mbcrr% seqed hiv1bz163b.gb_vi <Return>
: 1,1015 w hiv1bz163b.dna  <Return>
	(The above command will create a file containing only the DNA
	 that you are interested in: the DNA that corresponds exactly
	 to the amino acid translation.)


REPEAT the above steps until you have retrieved all of your
sequences, and written down the CDS positions for each.

Tutorial:  If you would like to work through a tutorial on
phylogenetic analysis of a set of HIV GAG sequences, use the
seqed command as discussed above to write out the GAG CDS
positions of the following GenBank sequences:
GenBank Locus      Positions
HIV1BZ163B         1..1015
HIV2ROD            546..2114
HIVLBV217          1..1462
HIVUG268           1..1465
HIVVI557           1..1468
SIVAGM3            431..1996
SIVAGM677          897..2438
SIVMM251           1041..2561
SIVSTM             709..2232
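
If you have many sequences, typing the seqed range line for each
one is error prone.  The sketch below only prints the range lines
for you to type (or paste) at the seqed ":" prompt; it does not run
seqed itself.  It assumes that you have saved the table above in a
text file with the locus name first and the start and end positions
after it.  The function name cds_ranges and the lowercase ".dna"
output names are choices made for this example, following the
hiv1bz163b.dna example above.

```shell
#!/bin/sh
# Sketch: turn "LOCUS  START..END" lines into seqed range lines.
# cds_ranges is a hypothetical helper, not part of gcg or pima.
cds_ranges() {
    awk 'NF >= 2 {
        locus = tolower($1)
        rest = $0
        # Drop the locus name, then keep only the numbers that follow it.
        sub(/^[ \t]*[^ \t]+/, "", rest)
        gsub(/[^0-9]+/, " ", rest)
        n = split(rest, pos, " ")
        # Skip header lines and anything without exactly two positions.
        if (n == 2)
            printf ": %s,%s w %s.dna\n", pos[1], pos[2], locus
    }'
}
```

For example, if the table is saved in a file called positions.txt
(a made-up name), you would type "cds_ranges < positions.txt" after
loading the function into a Bourne-type shell.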


Now, put all of these sequences into IG format:
mbcrr%  toig *.dna  <Return>
output file:  bigdnafile

mbcrr%   IG-to-tbl bigdnafile > bigdnafile.tbl <Return>

Now you are done with STEP 1, creating a file of dna sequences.
Before we use this file, we must create a file of protein
sequences, and then run pima.


STEP 2:  CREATING A FILE OF PROTEIN SEQUENCES

These must be the exactly matching amino acid translations of the
DNA sequences above.

With the gcg package you can use the translate program.

If you have followed the directions above closely, and all of
your DNA CDS files end in ".dna", you can translate them all
in one step:
mbcrr%  translate  <Return>
filename?  *.dna <Return>

Otherwise, you will have to run translate on each of your DNA files:
mbcrr%  translate  <Return>
filename?   ciohombox.dna <Return>
pos 1 to end
output?    ciohombox.aa <Return>
Repeat the above step for each DNA sequence.
Remember to use a consistent naming strategy for your peptide files
so that you can use a wildcard like "*.aa" to list all of your
peptide files.

mbcrr%  toig *.aa  <Return>
output file:  bigpepfile

mbcrr%   IG-to-tbl bigpepfile > bigpepfile.tbl <Return>
Your output file, bigpepfile.tbl is now in table format.
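
Since the DNA and protein files have to correspond exactly, it can
be worth checking the two table files against each other before
going on.  The following is only a sketch.  It assumes that table
format means one record per line, with the sequence name first and
the sequence second, separated by white space, and that the same
names appear in both files.  The function name check_tables is made
up for this example.

```shell
#!/bin/sh
# Sketch: report protein records whose DNA record is missing or is
# not exactly three bases per residue.
# Assumed table format: "name <white space> sequence", one per line.
# Usage: check_tables DNA_TABLE PROTEIN_TABLE
check_tables() {
    awk '
        BEGIN { bad = 0 }
        # First file (DNA): remember the length of each sequence.
        NR == FNR { dna[$1] = length($2); next }
        # Second file (protein): compare against the DNA length.
        !($1 in dna) {
            printf "%s: no DNA record\n", $1
            bad = 1
            next
        }
        dna[$1] != 3 * length($2) {
            printf "%s: %d bases vs %d residues\n", $1, dna[$1], length($2)
            bad = 1
        }
        # Exit status 1 if any record was reported.
        END { exit bad }
    ' "$1" "$2"
}
```

With the file names used in these notes you would type
"check_tables bigdnafile.tbl bigpepfile.tbl".  No output means that
every protein sequence has a DNA sequence exactly three times as long.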

STEP 3:  RUN THE PIMA PROGRAM

mbcrr%   pima.sh clus1 25.0 bigpepfile.tbl <Return>
The above command runs pima.  "clus1" is the prefix used to name
all of the output files from pima, 25.0 is the recommended
default value (you could use 0 instead, but this would force all
of the sequences into one cluster), and "bigpepfile.tbl" is the
name of the file containing all of the protein sequences in table
format.
NOTE:  If you get an error message such as
"ERROR: Sequence "hiv2rod" contains undefined characters !! ", check
that you do not have any STOP CODONS in your input file.  If
you see any "." or "*" characters at the end of your protein sequences,
remove them, and delete the corresponding codons from your DNA file.
Then repeat the pima.sh command above.
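
The stop codon cleanup described in the note can be done by hand
in an editor, or with something like the sketch below.  It makes
several assumptions: table format is one record per line with
exactly two fields (name, then sequence), the names match between
the two files, and each protein sequence has at most one trailing
"." or "*".  The function name strip_stops is made up for this
example, and the output is written with a tab between the name and
the sequence.

```shell
#!/bin/sh
# Sketch: remove one trailing stop character ("." or "*") from each
# protein sequence and the matching last codon from its DNA sequence.
# Usage: strip_stops PEP_IN DNA_IN PEP_OUT DNA_OUT
strip_stops() {
    awk -v pep_out="$3" -v dna_out="$4" '
        # First file (protein): trim the stop and remember the name.
        NR == FNR {
            seq = $2
            if (seq ~ /[.*]$/) {
                stop[$1] = 1
                seq = substr(seq, 1, length(seq) - 1)
            }
            print $1 "\t" seq > pep_out
            next
        }
        # Second file (DNA): drop the last three bases where a stop
        # was trimmed from the protein of the same name.
        {
            seq = $2
            if ($1 in stop)
                seq = substr(seq, 1, length(seq) - 3)
            print $1 "\t" seq > dna_out
        }
    ' "$1" "$2"
}
```

For example, "strip_stops bigpepfile.tbl bigdnafile.tbl pep.clean.tbl
dna.clean.tbl" (the two output names are made up) leaves the original
files untouched.  Look over the new files before using them in the
pima.sh and pima-to-paup.sh commands.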


Pima will create a number of output files; you are only interested in
"clus1-ML.pima", or possibly "clus1-SB.pima".
To put your alignments in printable form, run the following program:

mbcrr% 	print-pima.sh pimafile > outputfile <Return>
For example, we would type the following:
mbcrr%   print-pima.sh clus1-ML.pima > clus1-ML.pp <Return>
mbcrr% more clus1-ML.pp
NOTE:  above file viewed with gcg pretty program.

mbcrr%   enscript -rfCourier8 -d qms clus1-ML.pp <Return>
or mbcrr% lpr clus1-ML.pp <Return>

To get detailed help on pima, type "man pima" at the mbcrr% prompt.
mbcrr%  man pima <Return>

STEP 4:  RUN PIMA-TO-PAUP TO CREATE A PHYLOGENETIC TREE

(Note: this program runs only on SunOS machines, not Solaris
machines, so you must run it on mbcrrb.dfci.harvard.edu
or hinshelwood.bu.edu.)

mbcrr% 	telnet mbcrrb	(Or type    rlogin mbcrrb) <Return>
login name:	enter login name
passwd:		enter password

mbcrrb% pima-to-paup.sh clus1-ML.pima bigdnafile.tbl  <Return>

This program will take a few minutes to run.  When finished:
                          
mbcrr%  enscript -rfCourier8 -d qms *ML.paup*  <Return>
(to print out ascii text files sideways)
or
mbcrr%  lpr -Plj3a *ML.tree*  <Return>
(to print out postscript files on the lj3a laser printer located
in Mayer 3A-12)

Note to darwin/hinshelwood users:  
You must specify the printer you want, such as:
hinshelwood% lpr -Plj *.tree* <Return>
or
hinshelwood%  enscript -rfCourier8 -d qms *.paup* <Return>


Examples:

1st Codon Position PAUP tree [21 kilobytes]
2nd Codon Position PAUP tree [30 kilobytes]
3rd Codon Position PAUP tree [30 kilobytes]
Pima.sh alignment of GAG sequences displayed with prettyplot [300 kilobytes]