AFRC TRAINING MANUAL - CHAPTER 15

15. PHYLOGENY INFERENCING

15.1. Starting Phylip

PHYLIP, the PHYLogeny Inference Package, is not part of GCG. It comprises
some 30 programs and accepts data either as sequences or as "discrete data"
(numeric character states). If you use these programs, make some effort to
understand the theory behind them: the method and settings you choose can
change the tree you get.

The following examples take a multiple sequence alignment, convert it to
PHYLIP format and calculate a distance matrix from it. Trees are then derived
from the matrix by two different methods: Neighbor-Joining and
Fitch-Margoliash.

To start up the PHYLIP package, type:

    $ Seqprogs
    $ Phylip

Note the list of program names that is displayed. Documentation can be found
in the directory PHYLIP_NOTES.

15.2. MSF to Phylip

RNA.MSF is the output from a PILEUP alignment of bacterial ribosomal RNA
sequences. Use the READSEQ program to convert it to PHYLIP format, then use
the editor to change any dots in the sequences to dashes (GCG marks alignment
gaps with dots, whereas PHYLIP expects gaps as dashes):

    $ Readseq -a rna.msf -format=phylip -output=rna.phy
    $ Edit rna.phy

15.3. The DNADIST program

DNADIST calculates the distance between every pair of sequences, taking
account, for DNA, of the difference between transitions and transversions.
NB: DNADIST offers four different distance methods (Kimura 2-parameter,
Jin/Nei, maximum likelihood and Jukes-Cantor), chosen with option D!

    $ dnadist
    PHYLIP version 3.5c - running DNADIST
    Input file name (* DNADIST.DAT *) ? : rna.phy
    Output file name (* RNA.OUT *) ? : rna.dist
    Submit as batch job (* No *) ? :

    Nucleic acid sequence Distance Matrix program, version 3.53c

    Settings for this run:
      D  Distance (Kimura, Jin/Nei, ML, J-C)?  Kimura 2-parameter
      T  Transition/transversion ratio?  2.0
      C  One category of substitution rates?  Yes
      L  Form of distance matrix?  Square
      M  Analyze multiple data sets?  No
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes

    Are these settings correct? (type Y or letter for one to change)
    y

Examine the distance matrix in the file RNA.DIST.

15.4. The NEIGHBOR and FITCH programs

The file PHYLIP_NOTES:DISTANCE.DOC summarises four distance-based methods for
nucleic acid and protein sequences. Fitch-Margoliash (using "the PHYLIP search
method") and Neighbor-Joining do not assume that evolutionary rates are
constant across lineages. UPGMA (an option within NEIGHBOR) and the KITSCH
program do assume rate constancy across lineages, i.e. a molecular clock.
FITCH and Neighbor-Joining are equally preferred; NEIGHBOR is the faster of
the two because it builds a single tree directly instead of searching among
alternative trees.

    $ Neighbor
    PHYLIP version 3.5c - running NEIGHBOR
    Input file name (* NEIGHBOR.DAT *) ? : rna.dist
    Output file name (* RNA.OUT *) ? : neighbor.out
    Treefile name to WRITE (* NEIGHBOR.TREE *) ? :
    Submit as batch job (* No *) ? :

    Neighbor-Joining/UPGMA method version 3.5

    Settings for this run:
      N  Neighbor-joining or UPGMA tree?  Neighbor-joining
      O  Outgroup root?  No, use as outgroup species 1
      L  Lower-triangular data matrix?  No
      R  Upper-triangular data matrix?  No
      S  Subreplicates?  No
      J  Randomize input order of species?  No. Use input order
      M  Analyze multiple data sets?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)

Type Y to accept these settings. The output file NEIGHBOR.OUT contains the
phylogenetic tree; because option 4 is set, the tree is also written to the
treefile NEIGHBOR.TREE.

Use the same distance data for FITCH:

    $ fitch
    PHYLIP version 3.5c - running FITCH
    Input file name (* FITCH.DAT *) ? : rna.dist
    Output file name (* RNA.OUT *) ? : fitch.out
    Treefile name to WRITE (* FITCH.TREE *) ? :
    Submit as batch job (* No *) ? :

Accept the defaults and examine the output file FITCH.OUT. Do the trees
produced by the two programs differ, or are they the same?

The programs DRAWGRAM (rooted trees) and DRAWTREE (unrooted trees) can be used
to display the .TREE files graphically. RETREE can be used to rearrange the
nodes of a tree, for example to re-root it.
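The dot-to-dash edit described in 15.2 can also be done with a short script, which avoids accidentally altering species names that contain dots. The Python below is a minimal sketch, not part of PHYLIP, GCG or READSEQ. It assumes the interleaved PHYLIP layout (a header line, then a first block whose lines carry a 10-column species name, then later blocks holding sequence only); the function names, command-line arguments and example species are hypothetical.

```python
"""Change gap dots to dashes in an interleaved PHYLIP file.

A minimal sketch, assuming: a header line "<species> <sites>", a first block
in which columns 1-10 of each line hold the species name, and (if
interleaved) later blocks holding sequence only. Names are left untouched.
"""
import sys

NAME_WIDTH = 10  # species names occupy the first 10 columns


def dots_to_dashes(lines):
    """Return a copy of lines with '.' replaced by '-' in sequence data only."""
    header, body = lines[0], lines[1:]
    n_species = int(header.split()[0])
    out = [header]
    named = 0  # how many name-bearing lines of the first block we have seen
    for line in body:
        if not line.strip():
            out.append(line)  # blank line between interleaved blocks
        elif named < n_species:
            # First block: keep the name field as-is (e.g. "E.coli").
            out.append(line[:NAME_WIDTH] + line[NAME_WIDTH:].replace(".", "-"))
            named += 1
        else:
            # Later blocks contain sequence only.
            out.append(line.replace(".", "-"))
    return out


def convert_file(src_path, dst_path):
    """Read src_path, fix the gaps and write the result to dst_path."""
    with open(src_path) as src:
        lines = src.read().splitlines()
    with open(dst_path, "w") as dst:
        dst.write("\n".join(dots_to_dashes(lines)) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Hypothetical usage: <script> rna.phy rna_fixed.phy
        convert_file(sys.argv[1], sys.argv[2])
    else:
        # Hypothetical two-species, 12-site example split into two blocks.
        demo = ["2 12", "E.coli    ACGT..ACGT", "B.subtil  ACGTTTACGT", "", "..", "GA"]
        print("\n".join(dots_to_dashes(demo)))
```

Writing to a new file rather than overwriting the READSEQ output keeps the original conversion available if the layout assumption turns out to be wrong for a particular file.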
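The Kimura 2-parameter distance, the DNADIST setting used in 15.3, corrects the observed proportions of transitions (P) and transversions (Q) between two sequences for multiple substitutions at the same site: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python below is a minimal sketch of this textbook formula, for illustration only; DNADIST's own calculation may differ in detail (for example in how it uses the transition/transversion ratio T, or in how it treats gaps). Skipping gaps and ambiguity codes is an assumption of this sketch, and the function names and example sequences are hypothetical.

```python
"""Kimura two-parameter distance between aligned nucleotide sequences.

A minimal sketch of the textbook formula
    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
where P and Q are the proportions of compared sites showing transitions and
transversions. Gaps and ambiguity codes are skipped (an assumption).
"""
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def kimura_2p(seq1, seq2):
    # Treat RNA U as T so ribosomal RNA sequences can be compared directly.
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    sites = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # gap or ambiguity code: site not compared
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    d = -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return abs(d)  # d is never negative; abs() only turns -0.0 into 0.0


def square_matrix(seqs):
    """Return (names, matrix) with the full square matrix of distances."""
    names = list(seqs)
    matrix = [[0.0 if a == b else kimura_2p(seqs[a], seqs[b]) for b in names]
              for a in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical aligned fragments; real input would come from RNA.PHY.
    seqs = {"seqA": "ACGUACGUAC", "seqB": "GCGUACGUAA", "seqC": "GCGUAUGUAA"}
    names, matrix = square_matrix(seqs)
    for name, row in zip(names, matrix):
        print(f"{name:<10}" + " ".join(f"{d:8.4f}" for d in row))
```

The matrix is laid out like DNADIST's "Square" form (option L), with names padded to 10 columns; the correction fails when 1 - 2P - Q or 1 - 2Q reaches zero, which is why very divergent sequences cannot be given a Kimura distance.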
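The Neighbor-Joining method run by NEIGHBOR in 15.4 works directly on the distance matrix: with n clusters and row sums r, it repeatedly joins the pair (a, b) that minimises Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), replaces them by a new node, and stops when three clusters remain. The Python below is a minimal sketch of this textbook algorithm, not PHYLIP's code, so tie-breaking and output layout may differ from NEIGHBOR; the function name and the 5-species matrix in the example are hypothetical. The result is an unrooted tree in parenthesized (Newick) form, with branch lengths rounded to 4 significant digits.

```python
"""Neighbor-Joining on a square distance matrix (textbook algorithm).

A minimal sketch, not PHYLIP's NEIGHBOR: returns an unrooted tree in
Newick form, labelling each new cluster by its own Newick text.
"""


def neighbor_joining(names, matrix):
    if len(names) < 3:
        raise ValueError("need at least three species")
    # dist[x][y] holds the current distance between clusters x and y.
    dist = {a: {b: matrix[i][j] for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(dist[x][y] for y in nodes) for x in nodes}  # row sums
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        new = f"({a}:{la:.4g},{b}:{lb:.4g})"
        dist[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining cluster.
                dist[new][c] = dist[c][new] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three clusters at a single central node.
    a, b, c = nodes
    la = (dist[a][b] + dist[a][c] - dist[b][c]) / 2
    lb = (dist[a][b] + dist[b][c] - dist[a][c]) / 2
    lc = (dist[a][c] + dist[b][c] - dist[a][b]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


if __name__ == "__main__":
    # Hypothetical, exactly tree-like (additive) distances for 5 species.
    names = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(names, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the example matrix is additive, the branch lengths are recovered exactly. Distances estimated from real sequences, such as those in RNA.DIST, are rarely additive, and Neighbor-Joining can then give small negative branch lengths.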
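FITCH, also run in 15.4, chooses the tree and branch lengths that minimise the Fitch-Margoliash weighted sum of squares: the sum over species pairs of (D_ij - d_ij)^2 / D_ij^2, where D_ij is the observed distance and d_ij is the path length between species i and j in the tree. The Python below is a minimal sketch that only scores one fixed tree against a set of observed distances; FITCH's search over tree topologies and branch lengths is not shown. The edge-list tree representation, the function names and the example tree are hypothetical. The power parameter generalises the weighting: 2 gives the Fitch-Margoliash criterion and 0 gives ordinary (unweighted) least squares.

```python
"""Score one fixed tree with the Fitch-Margoliash least-squares criterion.

A minimal sketch: FITCH searches for the tree and branch lengths that
minimise this score; here the tree is given and only the score is computed.
"""


def path_lengths(edges, leaves):
    """Path length between every ordered pair of leaves of an unrooted tree.

    edges: list of (node, node, branch_length); leaves: set of leaf names.
    """
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    result = {}
    for start in leaves:
        stack = [(start, None, 0.0)]  # (node, node we came from, distance so far)
        while stack:
            node, came_from, dist = stack.pop()
            if node in leaves and node != start:
                result[(start, node)] = dist
            for nxt, length in adj[node]:
                if nxt != came_from:  # a tree has no cycles, so this suffices
                    stack.append((nxt, node, dist + length))
    return result


def fm_criterion(observed, tree_dist, power=2.0):
    """Sum of (D_ij - d_ij)^2 / D_ij^power over the observed pairs."""
    total = 0.0
    for pair, d_obs in observed.items():
        if d_obs == 0 and power > 0:
            raise ValueError(f"zero observed distance for {pair}")
        total += (d_obs - tree_dist[pair]) ** 2 / d_obs ** power
    return total


if __name__ == "__main__":
    # Hypothetical 5-species tree; u, v and w are internal nodes.
    edges = [("a", "u", 2), ("b", "u", 3), ("u", "v", 3), ("c", "v", 4),
             ("v", "w", 2), ("d", "w", 2), ("e", "w", 1)]
    names = "abcde"
    tree = path_lengths(edges, set(names))
    observed = {(x, y): tree[(x, y)] for i, x in enumerate(names) for y in names[i + 1:]}
    observed[("d", "e")] = 4.0  # perturb one distance so the fit is imperfect
    print(fm_criterion(observed, tree))           # 0.0625 (power 2)
    print(fm_criterion(observed, tree, power=0))  # 1.0 (unweighted)
```

Dividing by D_ij^2 down-weights the large distances, which are estimated less precisely than small ones; that weighting is what distinguishes the Fitch-Margoliash criterion from plain least squares.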