AFRC TRAINING MANUAL - CHAPTER 1

STARTING SEQUENCE ANALYSIS ON AGRENET

Logging on

Log on to ARCB via a PAD connection:

    PAD>call arcb
    *** Call connected
    AFRC Computing Centre 4600 running VMS V5.5-2 on Agrenet CLUSTER node ARCB
    Username: Studentn
    Password:

Starting up the main packages

Start up the general sequence analysis software.

    $ Seqprogs

Next start up the GCG software.

    $ GCG

Now specify what type of graphical output you require:

    $ GKSTERM

    SELECT GRAPHICS OUTPUT DEVICE FOR GKS
    For SEQPROGS programs select one terminal and one file.
    For GCG programs graphics output is sent to the most recent choice.

    Terminal devices          Last selected device: NONE
       0 = NONE        Unselect terminal AND plotter
       1 = P7800       Pericom or Monteray MG100, MG200
       2 = TEKBBC      Agreterm ROM Tektronix 4010/4014 emulator
       3 = TEKPCK      Kermit running Tektronix 4010/4014 emulator
       4 = X11         XWindows
       5 = TEK4207     EMUTEK, Pericom MX7200, or Tektronix 4207
       6 = VT340       VT340 colour terminal

    Plotter devices           Last selected device: UNIPICT
       7 = CALCOMP81   Plotter (A3 or A4 paper)
       8 = BBCSE293    Plotter (A3 or A4 paper)
       9 = HP7475      Plotter (A3 or A4 paper)
      10 = DECLN03     Laser printer
      11 = METAFILE    GKS metafile
      12 = UNIPICT     Unipict file
      13 = POST        Postscript (laserwriter) file

    GCG graphic output currently UNIPICT to WISC.UPI

    Choose one device by name or number (* P7800 *) : Emutek

The last prompt asks you where to send the graphics output: directly to your terminal (the default), or to a file.

    The port name TERMINAL sends graphics output to a VDU.
    OR Enter a file name to port TEK4207 output to disk (* TERMINAL *):

You may ignore the statement "..select one TERMINAL and one FILE".

Test your selection using the PLOTTEST program:

    $ Plottest/font=1

NB: Non-AFRC sites may have a SETPLOT command, which provides a list of the devices supported at that site. At such sites, use SETPLOT instead of GKSTERM.

Using LOGIN.COM

You can place the following in your login.com file to do the above automatically each time you log in.
    $ SEQPROGS
    $ GCG
    $ GKS EMUTEK TERM

Other software packages

    $ PHYLIP     - Initiates the PHYLogeny Inference Package

Help information

    $ SEQHELP    - Describes all the general sequence analysis programs.
    $ GENHELP    - Describes all programs in the GCG package.
    $ GENMANUAL  - GCG programs in subject sections.
    $ EGENHELP   - Help on the Extended-GCG programs.

Documentation

The SEQNOTES and PHYLIP_NOTES directories contain files which can be printed. Ask your site manager for GCG printed documentation.

Graphical output

There are several graphics "languages" available to you via "driver" programs that allow GCG programs to access those languages. Each language is capable of "driving" a range of plotting equipment, or "devices". The preferred route on AGRENET is to use a driver called UNIGKS. GKSTERM is a driver-selection program that allows you to select from a list of "devices" that the UNIGKS driver knows about. You select the device at the following prompt, for example for a VT340:

    Choose one option by name or number (* P7800 *) : VT340

After the device is chosen, you must say where to send ("port") the graphics. By default the driver-selection program uses the word "TERMINAL" to send the graphics to the device on your desktop. This is usually your VDU or a plotter connected to your VDU. If you give a filename instead of "TERMINAL", the file becomes the "port", ie: graphics instructions are written to that file. You can later plot the file on another device.

Unipict files

When you first start GCG on AGRENET, the GKS driver is selected automatically, directing graphics output to a UNIPICT file called WISC.UPI. Use the program PLOT to display the graphics in WISC.UPI on a range of graphic devices, or edit the graphics using UNIEDIT. A UNIPICT file can be converted into a Freelance CGM file using the program PITOFL. The file can then be transferred to a PC and edited using the Freelance, Drawperfect or Wordperfect packages.
Software and database availability on AGRENET

Not all software and data are installed on all VAX computers on AGRENET; they are distributed as follows:

    ---------------------------------------------------------
    ARCB AVRI IAPC RESA FRIR IAPE IRAD GCRI JII NPU NFL
    ---------------------------------------------------------
    EMBL, PIR, Swissprot, Prosite        Y   Y   Y   -   -
    Full Genbank database                Y   -   -   -   -
    GBonly database                      Y   Y   Y   -   -
    EMBL intermediate updates            Y   Y   Y   -   -
    Brookhaven Data Bank                 -   -   Y   -   Y
    GCG Package & Vecbase                Y   Y   Y   -   -
    Staden Package                       Y   Y   Y   Y   -
    Los Alamos Package                   Y   Y   Y   Y   -
    Phylip                               Y   Y   Y   Y   -
    Miscellaneous software               Y   Y   Y   Y   -
    Help, Documentation, Bulletins       Y   Y   Y   Y   -

Many users will therefore require an account on ARCB:: as well as on their host machine. Some understanding of the AGRENET network and mechanisms for file transfer is therefore necessary.

Databases on AGRENET

Nucleic acid databases

    GCG name   Short name   Contents
    EMBL       EM           Entire EMBL database, updated quarterly.
    EMNEW      EMN          New EMBL entries since the full release.
    Genbank    GB           Entire Genbank database, updated quarterly.
    GBOnly     GBO          Subset of Genbank database not duplicated in EMBL.
    GenEMBL    GE           EMBL, GBONLY and EMNEW together.
    Euprom     EUP          Eukaryotic promoter database.
    KabatN     KBN          Kabat nucleotide database.
    Vecbase    Vec          About 140 vector sequences.

For sites without the full Genbank database: 'Genbank' = 'GBOnly'.
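
The naming scheme in the table above can be restated as a small lookup. The following Python sketch is illustrative only and is not part of GCG, which resolves these names itself. The names NUCLEIC_DATABASES, SHORT_TO_NAME and expand_database are hypothetical, and the case-insensitive matching is an assumption; the data comes only from the table and from the rule for sites without the full Genbank database.

```python
# Illustrative sketch only; GCG resolves these names itself.
# GCG name -> (short name, contents), restating the table above.
NUCLEIC_DATABASES = {
    "EMBL":    ("EM",  "Entire EMBL database, updated quarterly"),
    "EMNEW":   ("EMN", "New EMBL entries since the full release"),
    "GENBANK": ("GB",  "Entire Genbank database, updated quarterly"),
    "GBONLY":  ("GBO", "Subset of Genbank not duplicated in EMBL"),
    "GENEMBL": ("GE",  "EMBL, GBOnly and EMNew together"),
    "EUPROM":  ("EUP", "Eukaryotic promoter database"),
    "KABATN":  ("KBN", "Kabat nucleotide database"),
    "VECBASE": ("VEC", "About 140 vector sequences"),
}

# Reverse lookup so that short names (EM, GB, ...) are accepted too.
SHORT_TO_NAME = {short: name for name, (short, _) in NUCLEIC_DATABASES.items()}


def expand_database(name, full_genbank=True):
    """Return the underlying databases covered by a GCG or short name.

    full_genbank=False models a site without the full Genbank database,
    where 'Genbank' means 'GBOnly'. Matching is case-insensitive
    (an assumption made for this sketch).
    """
    key = name.upper()
    key = SHORT_TO_NAME.get(key, key)
    if key not in NUCLEIC_DATABASES:
        raise KeyError(f"unknown database name: {name}")
    if key == "GENEMBL":
        # GenEMBL is the composite of EMBL, GBOnly and EMNew.
        return ["EMBL", "GBONLY", "EMNEW"]
    if key == "GENBANK" and not full_genbank:
        return ["GBONLY"]
    return [key]


print(expand_database("GE"))  # ['EMBL', 'GBONLY', 'EMNEW']
```

Calling expand_database("Genbank", full_genbank=False) returns ['GBONLY'], mirroring the rule for sites that hold only the GBOnly subset.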
Database divisions

Each Genbank and EMBL entry is classified into one of several divisions, with the following names:

    EMBL    Genbank  EmNew    GbOnly   EM+EmN+GbO  Division
    Em_Pr   Gb_Pr    EmN_Pr   GbO_Pr   Pr          Primate sequences
    Em_Ro   Gb_Ro    EmN_Ro   GbO_Ro   Ro          Rodent sequences
    Em_Ma   Gb_Ma    EmN_Ma   GbO_Ma   Ma          Other Mammalian (ie: not above)
    Em_Vr   Gb_Vr    EmN_Vr   GbO_Vr   Vr          Other Vertebrate (ie: not above)
    Em_In   Gb_In    EmN_In   GbO_In   In          Invertebrate
    Em_Pl   Gb_Pl    EmN_Pl   GbO_Pl   Pl          Plant (Genbank: includes fungi)
    Em_Or            EmN_Or            Or          Eukaryote organelles (EMBL only)
    Em_Vi   Gb_Vi    EmN_Vi   GbO_Vi   Vi          Viral
    Em_Ba   Gb_Ba    EmN_Ba   GbO_Ba   Ba          Bacterial (prokaryote)
    Em_Ph   Gb_Ph    EmN_Ph   GbO_Ph   Ph          Bacteriophage
    Em_Sy   Gb_Sy    EmN_Sy   GbO_Sy   Sy          Synthetic
    Em_Un   Gb_Un    EmN_Un   GbO_Un   Un          Unclassified
    Em_Es   Gb_Es    EmN_Es   GbO_Es   Es          Expressed sequence tags
    Em_Fu            EmN_Fu            Fu          Fungal (EMBL only)
            Gb_St             GbO_St   St          Structural RNA (Genbank only)
    Em_Pa            EmN_Pa            Pa          Patent entries
    Em_Bb            EmN_Bb            Bb          NCBI backbone entries
    Em_Nw   Gb_Nw                      Nw          Entries added in last quarter
    Em_Mo                              Mo          Entries modified in last quarter

To search only the primate entries in the EMBL database:

    Search for query in what sequence(s) (* GenEMBL:* *) ? EM_Pr:*

Protein databases

    GCG name   Contents
    PIR1       NBRF annotated and classified entries
    PIR2       NBRF preliminary ("new") entries
    PIR3       NBRF unverified entries
    NBRF       PIR1, PIR2, PIR3 together
    Swiss      Swissprot database
    NRL3d      Sequences taken from the Brookhaven database
    KabatP     Kabat protein database

The Prosite database is available to the GCG MOTIFS program, with all the original data in the directory named prositedir.

Brookhaven database

The Brookhaven database is unavailable via GCG, but entries are provided as individual files in the directory:

    BHAVEN$DISK:[BHAVEN.DATA]

An index is available in BHAVEN$DISK:[DOC]INDEX.DAT
Other documentation is in BHAVEN$DISK:[ADVISORY]
FORTRAN analysis programs are in BHAVEN$DISK:[PROGRAMS]

Useful VMS commands

    $ HELP                 - Lists VMS commands and describes their function.
    $ SD                   - Shows the name of the current (default) directory.
    $ DIR                  - Lists all the filenames in the current directory.
    $ SET DEF [.WORK]      - Changes default directory to the sub-directory WORK.
    $ HOME                 - Changes default directory to the top level.
    $ UP                   - Changes default directory to one higher level.
    $ SH QUE/ALL LONG      - Lists all entries in LONG batch queue.
    $ SH SYS/B             - Shows how long batch jobs have been running.
    $ DEL/ENTRY=45         - Deletes batch job 45 from the queue.
    $ CREAT/DIR [.MYDATA]  - Creates a sub-directory.

Queues on AGRENET

Batch queues

    Queue name   Max CPU    Job limit
    SHORT        2 min      1
    MEDIUM       10 min     1
    LONG         1 hour     1
    OFFPEAK      infinite   2

To see the current state of the queues:

    $ SH QUE/ALL queuename

To see how long the current jobs have been running:

    $ SH SYS/B

Printer queues

Each site will have a number of different queues. Some are for draft text output, others for laser printers, plotters etc. The following queues exist on the centre cluster:

    Printer          Queue-Name      Output location
    Se293 Plotter    ARCCG1          Computing div - Machine room
    Benson Plotter   ARCCG2          Computing div - Machine room
    Calcomp Plotter  ARCCG4          Computing div - Machine room
    LN03             ARCCL1          Computing div - Reception
    LN03 graphics    ARCCL2          Computing div - Reception
    Laser jet        ARCCL3          Computing div - Open plan
    LN03             ARCCL4          Computing div - Reception
    Laser jet        ARCCL5          Computing div - Reception
    HP laserjet      LASER           Computing div - Terminal room
    HP laserjet      LASERWRITER     Computing div - Terminal room
    Apple laser      LASERWRITERII   Computing div - Terminal room
    P300             SYS$PRINT       Computing div - Machine room
    P300             PRINTONA4       Computing div - Machine room
    P300             LABELS          Computing div - Machine room
    P300             LPA0            Computing div - Machine room
    DL5600           CDADP1          Computing div - Room 20
    DL5600           TXA4            Computing div - Machine room
                     BWCP1           AGRI
                     IRADP1          IRAD
                     IRADP2          IRAD
                     JII$PRINT       JII
                     JIIP1           JII
                     LARSG1          LARS
                     LARSP1          LARS
                     NFLG1           NFL
                     NFLP1           NFL
                     NWP1            NW
                     WPBSG1          WPBS
                     WPBSG2          WPBS
                     WPBSP1          WPBS

eg:

    $ Print/que=arccl1 output.txt
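
Returning to the batch queues listed earlier in this section, the CPU limits suggest a simple rule of thumb for picking a queue. The Python sketch below is an illustration under stated assumptions, not site policy: it assumes you choose the smallest queue whose CPU limit covers your own estimate of the job, and the names BATCH_QUEUES and choose_queue are hypothetical. The limits are copied from the batch queue table.

```python
# Illustrative sketch only; the selection rule is an assumption, not site policy.
# (queue name, max CPU in minutes), copied from the batch queue table.
# OFFPEAK has no CPU limit, modelled here as infinity.
BATCH_QUEUES = [
    ("SHORT", 2),
    ("MEDIUM", 10),
    ("LONG", 60),
    ("OFFPEAK", float("inf")),
]


def choose_queue(estimated_cpu_minutes):
    """Return the smallest queue whose CPU limit covers the estimate."""
    if estimated_cpu_minutes < 0:
        raise ValueError("CPU estimate must not be negative")
    for name, limit in BATCH_QUEUES:
        if estimated_cpu_minutes <= limit:
            return name
    # Only reached for a NaN estimate, since OFFPEAK covers any finite value.
    raise ValueError("CPU estimate is not a number")


print(choose_queue(5))  # MEDIUM
```

A job estimated at 45 CPU minutes would map to LONG, and anything over an hour to OFFPEAK. The sketch ignores the job limits in the table, which restrict how many jobs run at once rather than how long each may run.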