From usenet.ucs.indiana.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!biosci!afrc.ac.uk!odonnell Tue Sep 21 17:52:04 EST 1993
Article: 583 of bionet.announce
Path: usenet.ucs.indiana.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!europa.eng.gtefsd.com!uunet!biosci!afrc.ac.uk!odonnell
From: odonnell@afrc.ac.uk
Newsgroups: bionet.announce
Subject: Training manual available on ftp
Message-ID: <1993Sep21.084559.22770@gserv1.dl.ac.uk>
Date: 21 Sep 93 08:46:00 GMT
Sender: kristoff@net.bio.net
Lines: 491
Approved: bionews-moderator@net.bio.net

Molecular Biology Software Training Manual on ftp
=================================================

The AFRC's training manual is available on anonymous ftp from

        ftp.embl-heidelberg.de

as the compressed tar file

        /pub/doc/afrc_manual.tar.Z

and on the EMBL file server at e-mail address:

        netserv@embl-heidelberg.de

using the request "get doc:afrc_man.uaa"

Summary

The training manual comprises a contents section, 16 chapters and four
appendices. There are three illustrations: a front cover and two
diagrams for chapter 6, which must be added separately.

Chapter 1 is very AFRC-specific. Chapter 2 is essentially a
GCG-reference section with some AFRC-specific information included.
Chapters 3-16 contain worked exercises and background notes. All the
examples show how the software behaves on Agrenet VAXes - some programs
default to batch-submission, using local queue names. All sequences
used in the exercises are obtainable from the exercises themselves, or
can be located using Appendix D.

The separate chapter-files are in PostScript form. Each chapter is
formatted for double-sided printing, so some pages are blank.

The contents and preface pages (in plain text) are shown below, along
with a list of the tar file contents.

Dare I say "comments are welcome"?
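
The ftp retrieval described above can also be scripted. The following is only a sketch, assuming Python's standard ftplib module and assuming that the host and path quoted in this announcement are still reachable, which may no longer be the case. The names fetch_manual and download, and the local file name, are illustrative and not part of the announcement.

```python
from ftplib import FTP

HOST = "ftp.embl-heidelberg.de"              # host quoted in the announcement
REMOTE_PATH = "/pub/doc/afrc_manual.tar.Z"   # path quoted in the announcement


def fetch_manual(ftp, remote_path=REMOTE_PATH, local_name="afrc_manual.tar.Z"):
    """Copy remote_path to local_name over an already logged-in FTP session."""
    with open(local_name, "wb") as out:
        # Binary mode matters here: the file is a compressed tar archive.
        ftp.retrbinary("RETR " + remote_path, out.write)
    return local_name


def download():
    """Connect anonymously and fetch the manual (never called automatically)."""
    with FTP(HOST) as ftp:
        ftp.login()  # with no arguments this is an anonymous login
        return fetch_manual(ftp)
```

The .Z suffix marks a file produced by the UNIX compress utility, so after the transfer it would normally be unpacked with uncompress and then tar.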
*************************************************************************
Cary O'Donnell
Scientific Support Group, AFRC Computing Division,
West Common, Harpenden, Herts AL5 2JE, United Kingdom
(AFRC = Agricultural & Food Research Council)

Tel: (+44) 582 762271     Internet e-mail: odonnell@afrc.ac.uk
Fax: (+44) 582 761710     Long/Lat 00d 21m 45s West 51d 48m 30s North
-------------------------------------------------------------------------

=============================================================================

             MOLECULAR BIOLOGY SOFTWARE TRAINING MANUAL

                              CONTENTS

CHAPTER 1       STARTING SEQUENCE ANALYSIS ON AGRENET

  1.1     LOGGING ON . . . . . . . . . . . . . . . . . . . . 1-3
  1.2     STARTING UP THE MAIN PACKAGES . . . . . . . . . . 1-3
  1.3     USING LOGIN.COM . . . . . . . . . . . . . . . . . 1-4
  1.4     OTHER SOFTWARE PACKAGES . . . . . . . . . . . . . 1-4
  1.5     HELP INFORMATION . . . . . . . . . . . . . . . . . 1-4
  1.6     DOCUMENTATION . . . . . . . . . . . . . . . . . . 1-4
  1.7     GRAPHICAL OUTPUT . . . . . . . . . . . . . . . . . 1-4
    1.7.1   Unipict files . . . . . . . . . . . . . . . . . 1-5
  1.8     SOFTWARE AND DATABASES ON AGRENET . . . . . . . . 1-5
  1.9     BROOKHAVEN DATABASE . . . . . . . . . . . . . . . 1-6
  1.10    USEFUL VMS COMMANDS . . . . . . . . . . . . . . . 1-6
  1.11    QUEUES ON AGRENET . . . . . . . . . . . . . . . . 1-6
    1.11.1  Batch queues . . . . . . . . . . . . . . . . . . 1-6
    1.11.2  Printer queues . . . . . . . . . . . . . . . . . 1-7

CHAPTER 2       THE GENETICS COMPUTER GROUP PACKAGE

  2.1     WHAT IS GCG ? . . . . . . . . . . . . . . . . . . 2-3
    2.1.1   Program Examples . . . . . . . . . . . . . . . . 2-3
    2.1.2   Command Line Modifiers . . . . . . . . . . . . . 2-3
  2.2     DATABASES WITH GCG . . . . . . . . . . . . . . . . 2-4
    2.2.1   Database Sequence Names . . . . . . . . . . . . 2-4
    2.2.2   Database Accession Numbers . . . . . . . . . . . 2-4
    2.2.3   Searching Databases . . . . . . . . . . . . . . 2-4
  2.3     DATABASES ON AGRENET . . . . . . . . . . . . . . . 2-5
    2.3.1   Nucleic Acid Databases . . . . . . . . . . . . . 2-5
    2.3.1.1 Database Divisions . . . . . . . . . . . . . . . 2-5
    2.3.2   Protein databases . . . . . . . . . . . . . . . 2-5
  2.4     GROUPS OF SEQUENCES . . . . . . . . . . . . . . . 2-6
    2.4.1   Files Of Sequence Names . . . . . . . . . . . . 2-6
    2.4.2   Multiple Sequence Files . . . . . . . . . . . . 2-6
  2.5     PROGRAM DEFAULT VALUES . . . . . . . . . . . . . . 2-6
  2.6     GRAPHICS WITH THE GCG PACKAGE . . . . . . . . . . 2-7
    2.6.1   Graphics Driver-Selection . . . . . . . . . . . 2-7
    2.6.2   Fonts . . . . . . . . . . . . . . . . . . . . . 2-7
    2.6.3   GCG 'Figure' Files . . . . . . . . . . . . . . . 2-7
  2.7     LOCAL DATA FILES . . . . . . . . . . . . . . . . . 2-8
    2.7.1   Enzyme Data Files . . . . . . . . . . . . . . . 2-8
  2.8     NUCLEOTIDE SYMBOLS IN GCG . . . . . . . . . . . . 2-9
  2.9     AMINO ACID SYMBOLS AND THE STANDARD TRANSLATION
          TABLE . . . . . . . . . . . . . . . . . . . . . 2-10
  2.10    NUCLEOTIDE SYMBOL COMPARISON TABLE FOR BESTFIT . 2-11
  2.11    AMINO ACID SYMBOL COMPARISON TABLE . . . . . . . 2-11
    2.11.1  Analysis of symbol comparison values . . . . . 2-12
  2.12    PROGRAMS IN THE GCG PACKAGE (RELEASE 7.2) . . . 2-14
    2.12.1  Supplementary Programs . . . . . . . . . . . . 2-16

CHAPTER 3       GENERAL SEQUENCE MANIPULATION

  3.1     HELP INFORMATION AND DOCUMENTATION . . . . . . . . 3-3
    3.1.1   GENHELP - Help on GCG programs. . . . . . . . . 3-3
    3.1.2   GENMANUAL - Help by program function . . . . . . 3-3
    3.1.3   EASYGCG - Finding your way . . . . . . . . . . . 3-4
  3.2     FORMATTING A GCG SEQUENCE . . . . . . . . . . . . 3-4
    3.2.1   Formatting raw sequence data . . . . . . . . . . 3-4
    3.2.2   DNA <-> RNA conversion . . . . . . . . . . . . . 3-5
    3.2.3   Sequence complementing . . . . . . . . . . . . . 3-5
  3.3     SEQHELP - HELP FOR OTHER ANALYSIS PROGRAMS . . . . 3-6
  3.4     COPYING A DATABASE SEQUENCE . . . . . . . . . . . 3-6
  3.5     RESTRICTION MAPPING PROGRAMS . . . . . . . . . . . 3-7
    3.5.1   MAP . . . . . . . . . . . . . . . . . . . . . . 3-7
    3.5.1.1 Selecting enzymes . . . . . . . . . . . . . . . 3-7
    3.5.2   MAPSORT (and selecting enzymes by region) . . . 3-8
    3.5.2.1 Digest (and selecting enzymes by name) . . . . . 3-8
    3.5.2.2 Creating a plasmid map. . . . . . . . . . . . . 3-9
    3.5.3   The enzyme list . . . . . . . . . . . . . . . . 3-9
    3.5.4   PROTEIN SEQUENCE MAPPING . . . . . . . . . . . . 3-9
    3.5.5   SEQUENCE EDITING . . . . . . . . . . . . . . . 3-10
    3.5.5.1 Editing "Modes" . . . . . . . . . . . . . . . 3-10
    3.5.5.2 Adding Sequence Data From a File . . . . . . . 3-10
    3.5.5.3 Moving Around in the Sequence . . . . . . . . 3-10
    3.5.5.4 Editing Comments . . . . . . . . . . . . . . . 3-11
    3.5.5.5 Writing Part of the Sequence to a File . . . . 3-11
    3.5.5.6 Deleting Part of the Sequence . . . . . . . . 3-11
    3.5.5.7 Help . . . . . . . . . . . . . . . . . . . . . 3-11
    3.5.5.8 Exiting . . . . . . . . . . . . . . . . . . . 3-11
    3.5.5.9 Changing your keyboard . . . . . . . . . . . . 3-11
    3.5.6   INTERCONVERTING SEQUENCE FORMATS . . . . . . . 3-12
    3.5.6.1 PIR format . . . . . . . . . . . . . . . . . . 3-12
    3.5.6.2 STADEN format . . . . . . . . . . . . . . . . 3-12

CHAPTER 4       PROTEIN ANALYSIS

  4.1     IDENTIFY OPEN READING FRAMES . . . . . . . . . . . 4-3
  4.2     IDENTIFYING POTENTIAL CODING REGIONS . . . . . . . 4-4
    4.2.1   Base composition of bulk DNA . . . . . . . . . . 4-4
    4.2.2   Base composition in the third codon position . . 4-5
    4.2.3   Codon usage bias . . . . . . . . . . . . . . . . 4-5
  4.3     TRANSLATING RNA INTO PROTEIN . . . . . . . . . . . 4-6
    4.3.1   Three and one-letter abbreviations . . . . . . . 4-6
  4.4     PREDICTING SECONDARY STRUCTURE IN PROTEINS . . . . 4-7
    4.4.1   PEPTIDESTRUCTURE & PLOTSTRUCTURE . . . . . . . . 4-7
    4.4.2   MOMENT . . . . . . . . . . . . . . . . . . . . . 4-8
    4.4.3   PEPPLOT . . . . . . . . . . . . . . . . . . . . 4-8
  4.5     TRANSLATING PROTEIN INTO RNA . . . . . . . . . . . 4-9
    4.5.1   Best Sequence Option . . . . . . . . . . . . . . 4-9
    4.5.2   Most Ambiguous Sequence Option . . . . . . . . . 4-9

CHAPTER 5       COMPARING SEQUENCES

  5.1     IDENTIFYING SEQUENCE HOMOLOGY . . . . . . . . . . 5-3
    5.1.1   WORD COMPARISON . . . . . . . . . . . . . . . . 5-3
    5.1.2   DOTPLOTTING . . . . . . . . . . . . . . . . . . 5-3
    5.1.2.1 Interpreting the Plot . . . . . . . . . . . . . 5-3
    5.1.2.2 The Effect of Word Size . . . . . . . . . . . . 5-3
    5.1.2.3 Types of patterns . . . . . . . . . . . . . . . 5-4
    5.1.3   WINDOW COMPARISON . . . . . . . . . . . . . . . 5-4
    5.1.4   COMPARISON OF PROTEINS . . . . . . . . . . . . . 5-5
    5.1.5   Symbol comparison tables . . . . . . . . . . . . 5-5
  5.2     SEQUENCE ALIGNMENTS . . . . . . . . . . . . . . . 5-6
    5.2.1   BESTFIT . . . . . . . . . . . . . . . . . . . . 5-6
    5.2.2   GAP . . . . . . . . . . . . . . . . . . . . . . 5-6
    5.2.3   Protein sequence alignment . . . . . . . . . . . 5-7
    5.2.4   ALIGNMENT MEASUREMENTS . . . . . . . . . . . . . 5-7
    5.2.4.1 Quality . . . . . . . . . . . . . . . . . . . . 5-7
    5.2.4.2 Ratio . . . . . . . . . . . . . . . . . . . . . 5-7
    5.2.4.3 Identity . . . . . . . . . . . . . . . . . . . . 5-7
    5.2.4.4 Similarity . . . . . . . . . . . . . . . . . . . 5-7

CHAPTER 6       SEARCHING DATABASES

  6.1     SEARCHING BY TEXT - STRINGSEARCH . . . . . . . . . 6-3
    6.1.1   Definition search . . . . . . . . . . . . . . . 6-3
    6.1.2   Full text search . . . . . . . . . . . . . . . . 6-3
  6.2     INTERACTIVE TEXT SEARCHES - XQS . . . . . . . . . 6-4
    6.2.1   Nucleotide databases . . . . . . . . . . . . . . 6-4
    6.2.2   Protein databases . . . . . . . . . . . . . . . 6-5
  6.3     SEQUENCE HOMOLOGY SEARCH . . . . . . . . . . . . . 6-6
    6.3.1   FASTA (direct search) . . . . . . . . . . . . . 6-6
    6.3.2   TFASTA (translation search) . . . . . . . . . . 6-6
    6.3.3   EXHAUSTIVE HOMOLOGY SEARCHING . . . . . . . . . 6-7
  6.4     INTERPRETING FASTA OUTPUT . . . . . . . . . . . . 6-8
    6.4.1   The FASTA algorithm . . . . . . . . . . . . . . 6-8
    6.4.1.1 Disadvantages of the FASTA algorithm . . . . . . 6-8
    6.4.2   A FASTA Strategy . . . . . . . . . . . . . . . . 6-8
    6.4.3   The histogram . . . . . . . . . . . . . . . . 6-10
    6.4.4   Mean Scores and CPU . . . . . . . . . . . . . 6-10
    6.4.5   Example FASTA histogram: . . . . . . . . . . . 6-11
    6.4.6   The best scores . . . . . . . . . . . . . . . 6-12
    6.4.7   The alignments . . . . . . . . . . . . . . . . 6-13
  6.5     INTERPRETING PROSRCH OUTPUT . . . . . . . . . . 6-14
    6.5.1   Data check . . . . . . . . . . . . . . . . . . 6-14
    6.5.2   Symbol comparison table . . . . . . . . . . . 6-14
    6.5.3   Additional information . . . . . . . . . . . . 6-14
    6.5.4   How PROSRCH works . . . . . . . . . . . . . . 6-14
    6.5.5   Figure: Score vs log (number of entries). . . 6-15
    6.5.6   The score distribution and statistics . . . . 6-16
    6.5.7   The alignments . . . . . . . . . . . . . . . . 6-17
    6.5.8   The individual alignment scores . . . . . . . 6-17
    6.5.9   Score ratios and PAM tables . . . . . . . . . 6-17
    6.5.10  Mapping . . . . . . . . . . . . . . . . . . . 6-18
    6.5.11  Additional alignments . . . . . . . . . . . . 6-18
    6.5.12  A PROSRCH strategy . . . . . . . . . . . . . . 6-19
  6.6     OTHER BIOSEARCH PROGRAMS . . . . . . . . . . . . 6-19

CHAPTER 7       MULTIPLE SEQUENCE ALIGNMENT

  7.1     CLUSTER ALIGNMENTS . . . . . . . . . . . . . . . . 7-3
    7.1.1   PILEUP . . . . . . . . . . . . . . . . . . . . . 7-3
    7.1.2   CLUSTALV . . . . . . . . . . . . . . . . . . . . 7-4
  7.2     MANUAL ALIGNMENT . . . . . . . . . . . . . . . . . 7-6
  7.3     ALIGNMENT DISPLAYS . . . . . . . . . . . . . . . . 7-7
    7.3.1   Threshold, Plurality and Weightings . . . . . . 7-7
  7.4     BOXED GRAPHIC DISPLAYS . . . . . . . . . . . . . . 7-8
    7.4.1   PRETTYPLOT . . . . . . . . . . . . . . . . . . . 7-8
    7.4.2   PRETTYBOX . . . . . . . . . . . . . . . . . . . 7-8

CHAPTER 8       FRAGMENT ASSEMBLY SYSTEM

  8.1     INTRODUCTION . . . . . . . . . . . . . . . . . . . 8-3
    8.1.1   Goals . . . . . . . . . . . . . . . . . . . . . 8-3
    8.1.2   Summary of programs . . . . . . . . . . . . . . 8-3
  8.2     FRAGMENT ASSEMBLY TUTORIAL . . . . . . . . . . . . 8-4
    8.2.1   NEWGELSTART . . . . . . . . . . . . . . . . . . 8-4
    8.2.2   GELENTER . . . . . . . . . . . . . . . . . . . . 8-4
    8.2.3   GELMERGE . . . . . . . . . . . . . . . . . . . . 8-5
    8.2.4   GELVIEW . . . . . . . . . . . . . . . . . . . . 8-5
    8.2.5   GELASSEMBLE . . . . . . . . . . . . . . . . . . 8-6
    8.2.6   GELOVERLAP . . . . . . . . . . . . . . . . . . . 8-7
    8.2.7   GELENTER as an editor . . . . . . . . . . . . . 8-8
    8.2.8   Redefining the project . . . . . . . . . . . . . 8-8
    8.2.9   Detecting vector the unofficial way . . . . . . 8-8

CHAPTER 9       FINDING SEQUENCE MOTIFS

  9.1     LOCATING PARTIAL SEQUENCES . . . . . . . . . . . . 9-3
  9.2     PROTEIN STRUCTURE MOTIFS . . . . . . . . . . . . . 9-4
    9.2.1   Retrieving PROSITE documentation . . . . . . . . 9-4

CHAPTER 10      SEQUENCE PROFILING

  10.1    THE PROFILING METHOD . . . . . . . . . . . . . . 10-3
  10.2    DESCRIPTION OF THE PROFILE TABLE . . . . . . . . 10-3
    10.2.1  A profile as a symbol comparison table . . . . 10-4
  10.3    FINDING A NEW MEMBER OF THE ALIGNMENT . . . . . 10-4
  10.4    PROFILING TUTORIAL . . . . . . . . . . . . . . . 10-5
    10.4.1  PROFILEMAKE . . . . . . . . . . . . . . . . . 10-5
    10.4.2  PROFILEGAP . . . . . . . . . . . . . . . . . . 10-5
    10.4.3  PROFILESEARCH . . . . . . . . . . . . . . . . 10-6
    10.4.4  PROFILESEGMENTS . . . . . . . . . . . . . . . 10-6
    10.4.5  PROFILESCAN . . . . . . . . . . . . . . . . . 10-7
  10.5    NUCLEOTIDE PROFILING . . . . . . . . . . . . . . 10-7

CHAPTER 11      RNA SECONDARY STRUCTURE

  11.1    IDENTIFYING INVERTED REPEATS . . . . . . . . . . 11-3
  11.2    CALCULATING RNA FOLDING . . . . . . . . . . . . 11-4
  11.3    DISPLAY OF FOLDING STRUCTURES . . . . . . . . . 11-4
  11.4    ALTERNATIVE STRUCTURES . . . . . . . . . . . . . 11-5

CHAPTER 12      GCG COMMAND FILES

  12.1    WHAT ARE THEY? . . . . . . . . . . . . . . . . . 12-3
  12.2    EDITING GCLUSTALV.COM . . . . . . . . . . . . . 12-3
  12.3    STRINGSEARCH . . . . . . . . . . . . . . . . . . 12-4
  12.4    OTHER COMMAND FILES . . . . . . . . . . . . . . 12-4

CHAPTER 13      GCG DATA FILES

  13.1    LOCAL DATA FILES . . . . . . . . . . . . . . . . 13-3
    13.1.1  Enzyme Tables . . . . . . . . . . . . . . . . 13-3
    13.1.2  Codon Usage (or Codonpreference) Tables . . . 13-4
    13.1.3  Symbol Comparison Tables . . . . . . . . . . . 13-4
    13.1.4  Translation Tables . . . . . . . . . . . . . . 13-5
    13.1.5  Yet more data!! . . . . . . . . . . . . . . . 13-5
  13.2    PLASMIDMAP FILES . . . . . . . . . . . . . . . . 13-6
    13.2.1  Displaying blocks and ranges . . . . . . . . 13-6

CHAPTER 14      DATABASE HANDLING

  14.1    ORGANISING YOUR OWN DATABASES . . . . . . . . . 14-3
    14.1.1  Using files of sequence names . . . . . . . . 14-3
    14.1.2  Create an indexed database . . . . . . . . . . 14-3

CHAPTER 15      PHYLOGENY INFERENCING

  15.1    THE PHYLIP PACKAGE . . . . . . . . . . . . . . . 15-3
  15.2    CONVERTING TO PHYLIP FORMAT . . . . . . . . . . 15-3
  15.3    THE DNADIST PROGRAM . . . . . . . . . . . . . . 15-3
  15.4    THE NEIGHBOR AND FITCH PROGRAMS . . . . . . . . 15-4

CHAPTER 16      BEYOND GCG .......

  16.1    SUBMITTING A SEQUENCE TO THE DATABASES . . . . . 16-3
    16.1.1  Copy the Submission Form . . . . . . . . . . . 16-3
    16.1.2  Enter the Details . . . . . . . . . . . . . . 16-3
    16.1.3  Mail the Sequence . . . . . . . . . . . . . . 16-3
    16.1.4  Acknowledgement . . . . . . . . . . . . . . . 16-3
    16.1.5  Authorin . . . . . . . . . . . . . . . . . . . 16-3
  16.2    OBTAINING SOFTWARE FROM REMOTE SITES . . . . . . 16-4
    16.2.1  The EMBL file server . . . . . . . . . . . . . 16-4
    16.2.2  The Indiana FTP site . . . . . . . . . . . . . 16-4
  16.3    BIOSCI BULLETIN BOARD . . . . . . . . . . . . . 16-5
    16.3.1  Topics . . . . . . . . . . . . . . . . . . . . 16-5
    16.3.2  Subscription requests . . . . . . . . . . . . 16-6
    16.3.3  Sending a message to a bulletin board. . . . . 16-6
    16.3.4  Reading the bulletin board . . . . . . . . . . 16-6
    16.3.5  Cancelling subscriptions . . . . . . . . . . . 16-7
    16.3.6  Biosci and local bulletins on Agrenet . . . . 16-7
  16.4    THE INTERNET GOPHER . . . . . . . . . . . . . . 16-8

APPENDIX A      DATABASE SEARCH RESULTS

  A.1     FASTA SEARCH OF PLATELET.SEQ WITH A WORD SIZE = 6 A-3
    A.1.1   Histogram . . . . . . . . . . . . . . . . . . . A-3
    A.1.2   The 100 best scores . . . . . . . . . . . . . . A-4
    A.1.3   The alignments . . . . . . . . . . . . . . . . A-6
  A.2     FASTA SEARCH OF PLATELET.SEQ WITH A WORD SIZE = 1 A-7
    A.2.1   The histogram . . . . . . . . . . . . . . . . . A-7
    A.2.2   The 100 best scores . . . . . . . . . . . . . . A-8
  A.3     TFASTA SEARCH OF PLATELET.PEP WITH A WORD SIZE =
          1 . . . . . . . . . . . . . . . . . . . . . . . A-10
    A.3.1   The histogram . . . . . . . . . . . . . . . . A-10
    A.3.2   The best 100 scores . . . . . . . . . . . . . A-11
    A.3.3   The alignments . . . . . . . . . . . . . . . . A-13

APPENDIX B      USING A SEQUENCE DIGITISER

  B.1     DIGISEQ . . . . . . . . . . . . . . . . . . . . . B-3
  B.2     INTERPRETATION OF SEQUENCE GELS . . . . . . . . . B-4
    B.2.1   Band distribution . . . . . . . . . . . . . . . B-4
    B.2.2   Variable band intensity . . . . . . . . . . . . B-4
    B.2.2.1 The C rules: . . . . . . . . . . . . . . . . . . B-4
    B.2.2.2 The A rules: . . . . . . . . . . . . . . . . . . B-4
    B.2.2.3 Other rules: . . . . . . . . . . . . . . . . . . B-4
  B.3     KERMIT FILE TRANSFERS . . . . . . . . . . . . . . B-5
  B.4     EMUTEK FILE TRANSFERS . . . . . . . . . . . . . . B-6

APPENDIX C      DOTPLOT DIAGRAMS

  C.1     A SEQUENCE COMPARED TO ITSELF . . . . . . . . . . C-3
  C.2     SEQUENCE DIVERGENCE . . . . . . . . . . . . . . . C-4
  C.3     INSERTIONS AND DELETIONS . . . . . . . . . . . . . C-4
  C.4     TANDEM DUPLICATION . . . . . . . . . . . . . . . . C-5
  C.5     INTERNAL REPEATS . . . . . . . . . . . . . . . . . C-5

APPENDIX D      MISCELLANEOUS

  D.1     SEQUENCES USED IN THE EXERCISES . . . . . . . . . D-3
    D.1.1   Main example mRNA sequence. . . . . . . . . . . D-3
    D.1.2   Other RNA sequences . . . . . . . . . . . . . . D-3
    D.1.3   Protein sequences . . . . . . . . . . . . . . . D-3
  D.2     FURTHER READING . . . . . . . . . . . . . . . . . D-3

=============================================================================

                               PREFACE

History

This document began, in 1989, as a set of exercises for a training
course in the use of the GCG package.
Its main aim was to introduce research workers, most of whom were
novice computer users, to molecular biology software. The current
document has evolved to include background notes and other software as
part of the training course. The revised aim is to provide a brief
introduction to the facilities available within, and beyond, AGRENET,
the AFRC's VAX/VMS network.

This document was never intended to give comprehensive coverage of the
subject. Many items of detail that were covered in short verbal
presentations during the course are omitted.

Given the shortage of such training material, one should not be
surprised at the many requests for copies of the document from outside
the AFRC. This is the main reason for making the document generally
available at FTP sites. If the user is prepared to play around with
some data and explore, then this document may be of some use.

Many users will find themselves using UNIX-based systems, in which case
the amendments for using GCG programs are quite minor: each command
line option is introduced by a space and a minus sign instead of a
slash. For example, on page 3-8 use:

        mapsort -exclude=388,1020 -six

Contents summary

Chapter 1 is very AFRC-specific. Chapter 2 is essentially a
GCG-reference section with some AFRC-specific information included.
Chapters 3-16 contain worked exercises and background notes. All the
examples show how the software behaves on Agrenet VAXes - some programs
default to batch-submission, using local queue names. All sequences
used in the exercises are obtainable from the exercises themselves, or
can be located using Appendix D.

The course

In its present form the course is given over a period of two days,
although a three-day course might be more appropriate. The order of
presentation is intended to start with the easy parts and move on to
progressively more complex programs, or to those where greater
explanation is required. A case is easily made for providing a course
with the chapters in a completely different order.
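
The UNIX note in the History section (a space and a minus in place of a slash) can be sketched as a small conversion routine. This is an illustration only, not part of the manual: the function name vms_to_unix is hypothetical, and the slash-style input MAPSORT/EXCLUDE=388,1020/SIX is an assumed reconstruction of the page 3-8 command, inferred from the UNIX form quoted in the preface. The sketch also assumes that the program name and option names are written in lower case on UNIX, as in that example, and that option values contain no slashes.

```python
def vms_to_unix(command: str) -> str:
    """Rewrite slash-style options as minus-style options.

    Assumption: every "/" introduces an option, so values must not
    themselves contain a slash.
    """
    words = []
    for token in command.split():
        # Text before the first "/" is a plain word (program or file name).
        head, *options = token.split("/")
        if head:
            words.append(head)
        for option in options:
            if not option:
                continue
            # Lower-case the option name only; leave any value untouched.
            name, sep, value = option.partition("=")
            words.append("-" + name.lower() + sep + value)
    if words:
        # Assumption: the program name is lower case on UNIX.
        words[0] = words[0].lower()
    return " ".join(words)


print(vms_to_unix("MAPSORT/EXCLUDE=388,1020/SIX"))
# prints: mapsort -exclude=388,1020 -six
```

Only the option syntax is handled here; any other difference between the VMS and UNIX versions of a program is outside the scope of the sketch.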
Acknowledgements

My thanks to the many (hundreds of) people who have attended the course
for their comments, criticisms, and suggestions for improvements. I am
particularly grateful to the following people for their comments on the
document and for their contributions to my own understanding.

David Judge  - Department of Genetics, University of Cambridge, U.K.
Sarah McQuay - BRU, Kings Buildings, University of Edinburgh, U.K.
Frank Wright - SASS, Kings Buildings, University of Edinburgh.

Cary O'Donnell  06-Sep-1993

==============================================================================

Contents of ftp tar file:

Files without file extensions are plain text.

   2971 Sep 14 10:04 0ADVERT       - Training Course timetable
  18459 Sep 13 17:58 0CONTENTS     - Contents pages of manual
  45080 Sep 13 17:58 0Contents.PS
   3166 Sep 13 17:58 0PREFACE      - Preface to manual
   1894 Sep 20 16:19 0README       - This file
  46197 Sep 13 17:58 Chapter01.PS
 154711 Sep 13 17:58 Chapter02.PS
 134566 Sep 13 17:58 Chapter03.PS
  72367 Sep 13 17:58 Chapter04.PS
  52480 Sep 13 17:58 Chapter05.PS
 329050 Sep 13 17:58 Chapter06.PS
  79583 Sep 13 17:58 Chapter07.PS
  65286 Sep 13 17:58 Chapter08.PS
  26510 Sep 13 17:58 Chapter09.PS
  41700 Sep 13 17:58 Chapter10.PS
  27116 Sep 13 17:58 Chapter11.PS
  19249 Sep 13 17:58 Chapter12.PS
  47195 Sep 13 17:58 Chapter13.PS
  15447 Sep 13 17:58 Chapter14.PS
  27291 Sep 17 11:20 Chapter15.PS
  73666 Sep 13 17:58 Chapter16.PS
  81145 Sep 13 17:58 appendixA.PS
  50164 Sep 13 17:58 appendixB.PS
 270018 Sep 20 15:57 appendixC.PS
  25011 Sep 17 11:19 appendixD.PS
  46631 Sep 17 15:53 cover.ps      - Cover page
 338776 Sep 13 17:58 dapjob.ps     - page 6-15
  56764 Sep 13 17:58 fasta.ps      - page 6-9

Cary O'Donnell  20-Sep-1993