ALN3 and ALP3 (BIONET version 1.0)

January 1989                 Osamu Gotoh
BIONET revisions March 1989  Spencer Yeh
("readme.doc" added 3/23/89 by Spencer Yeh)


INTRODUCTION

ALN3 and ALP3 are a pair of triple alignment programs, ALN3 for nucleic
acid sequences and ALP3 for protein sequences. The phrase "ALX3" will be
used to refer to both of the programs. The algorithm used in the programs
is described in Gotoh, O. (1986) J. Theor. Biol. 121, 327-337.

The user can create different protein scoring matrices by using the
program MAKMDM.EXE. The two matrices provided by the author are MDM_1.dat
and MDM_10.dat; the latter stores its values to more significant figures.

The program uses a non-BIONET data format described in ALIGN3.DOC. The
maximum sequence length limitations are not documented; however, I have
successfully aligned 3 sequences of 1000 bp each using ALN3. Please be
aware that the program is only minimally user-friendly.


CONTACT ADDRESS

For questions about the program or suggestions for future improvements
please contact:

    Dr. Osamu Gotoh
    Dept. of Biochemistry
    Saitama Cancer Center Research Institute
    Ina-machi
    Saitama 362
    JAPAN
    tel.: 0487-22-1111 (ext. 255)


SYSTEMS SUPPORTED

An IBM-compatible computer running MS-DOS (ver. 2.0 or greater) is
needed. The MAKMDM program requires an 80X87 math coprocessor, but this
program is not needed to run the ALX3 programs.


AVAILABILITY

These programs are available by anonymous FTP from BIONET (net.bio.net)
in the directory ~ftp/public/dos/alx3 or, for BIONET subscribers only, by
postal mail from the BIONET Lending Library. If you are a BIONET
subscriber and would like to receive the ALX3 diskette by mail, please
send a stamped, self-addressed return envelope along with a formatted
diskette (specify capacity) and your request to:

    BIONET Administrator
    BIONET/IntelliGenetics, Inc.
    700 East El Camino Real, Suite 300
    Mountain View, CA 94040
    tel.: (415) 962-7337


SOURCE CODE

Source code written in C is available in the archive file "ALX3SRC.ARC".
The program was originally compiled under Optimizing C86, but has since
been modified to run under Turbo C (ver. 1.5a, Borland). The "diff.c"
source file was missing on the diskette I received from Dr. Gotoh, but
one can use the NCSEQ.LIB file without recompiling "diff.c". The BIONET
version was changed to use the current directory as the default
directory. Please see REVIS.DOC.


PROGRAM FILES before de-ARCing (BIONET diskette version) 210 Kb:

    README.DOC    This documentation file. (8 Kb).
    ALX3.EXE      Self-extracting archive file for the executables and
                  documentation. (123 Kb).
    ALX3SRC.ARC   Archive file for the source code and object files.
                  (75 Kb).


PROGRAM FILES before de-ARCing (BIONET downloadable version)

    README.DOC    This documentation file.
    ALX3.UUE      Archived and uuencoded file for the executable and
                  documentation. (161 Kb).
    ALX3SRC.UUE   Archived and uuencoded C source code and object
                  libraries. (104 Kb).


PROGRAM FILES after de-ARCing:

    ALIGN3   DOC    12928   3-15-89  11:15a   Dr. Gotoh's documentation.
    REVIS    DOC     1017   3-23-89  10:18a   History of BIONET revisions.
    ALN3     EXE    42698   3-23-89   9:58a   Executable file.
    ALP3     EXE    42962   3-23-89   9:58a   Executable file.
    S1       SEQ       77   3-23-89   9:42a   Nucl. acid test file.
    S2       SEQ       81   3-23-89   9:43a   Nucl. acid test file.
    S3       SEQ       77   3-23-89   9:43a   Nucl. acid test file.
    S123     OUT     1272   3-23-89  10:01a   Sample ALN3 output file.
    P1       PEP      122   3-21-89   3:28p   Protein test file.
    P2       PEP      110   3-21-89   3:28p   Protein test file.
    P3       PEP      113   3-21-89   3:28p   Protein test file.
    P123     OUT     1255   3-23-89  10:02a   Sample ALP3 output file.
    MDM_1    DAT     8064  10-21-88  11:29a   Default protein scoring matrix.
    MDM_10   DAT     8064  10-21-88  11:30a   High precision scoring matrix.
    MAKMDM   EXE    35298  12-23-88   5:48p   Program to create MDM matrices.
    MDSQ     BAT      128  12-05-88   3:15p   Batch file to create
                                              subdirectories.


DOCUMENTATION

The program is briefly documented in the file ALIGN3.DOC. The BIONET
version has been altered to use the current directory as the default
location instead of B:\NAS or B:\PAS.
These changes are documented in REVIS.DOC. The program has no internal
help, and the source code is not commented.


STARTING THE PROGRAM

On the BIONET diskette version, the archive file ALX3.EXE is a
self-extracting archive, whereas the ".uue" files in the downloadable
version require both the UUDECODE program and an "ARC"-compatible
dearchiving program such as PKUNPAK to restore the executable file.

Self-dearchiving files:

    De-archive the program by "running" the archive file and specifying
    the drive and directory path where you want the program installed.
    E.g., to install the programs in the \aln3 directory of the c:
    drive, you should type:

        >ALX3 c:\aln3

".UUE" files:

    First decode the ".uue" file:

        >UUDECODE alx3.uue

    Then dearchive the resulting ".arc" file to an appropriate directory
    by using PKUNPAK (or compatible program):

        >PKUNPAK alx3.arc c:\aln3

Once installed, CD to the appropriate directory and then start the
program by typing its name at the MS-DOS prompt:

        >ALN3


SAMPLE PROGRAM OUTPUT (of ALP3):

p1.pep (1 - 107) - p2.pep (1 - 96) - p3.pep (1 - 99)
PAM = 250, BIAS = 0, u = 6, v = 6
Dist3 = -125, 3-id. = 11, 1-id. = 36, 0-id. = 47, Gaps = 9, Unpairs = 14
p2.pep (1 - 96) - p3.pep (1 - 99)
Dist2 = 6, Matches = 18 ( 17.82 %) Gaps = 6, Unpairs = 7
p3.pep (1 - 99) - p1.pep (1 - 107)
Dist2 = 43, Matches = 13 ( 12.04 %) Gaps = 6, Unpairs = 10
p1.pep (1 - 107) - p2.pep (1 - 96)
Dist2 = -180, Matches = 38 ( 35.51 %) Gaps = 5, Unpairs = 11

   1 TVYTVGDSAGWKVPFFGDVDYDWKWASNKTFHIGDVLVFKYDRRFHNVDKVTQKNYQSCN 60
     .** **.:.** * = * . * *: ****:*.*: *:* * * * .*:
   1 AVYVVGGSGGW--TFNTE---SW--PKGKRFRAGDILLFNYNPSMHNVVVVNQGGFSTCN 53
     ::*. .* .* . * =. : * : .. * . . : .
   1 IDVLLGADDGS-LAFVPS---EFSISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKIS 56

  61 DTTPIASYNTGNN-RINLKTVGQKYYICGVPKHCDLGQKVHINVTVRS 107
     . * .* : .*.* ** **** * **: * ** *:
  54 TPAGAKVYTSGRD-QIKL-PKGQSYFICNFPGHCQSGMKIAVNA---L 96
     . : . * :* * **: * *. * * .** *
  57 MSEEDLLNAKGETFEVALSNKGEYSFYCS-P-HQGAGMVGKVTV---N 99


KNOWN PROBLEMS:

1. Make sure that the MDM_1.DAT file is located in the current directory
   or in the root directory of A:, B:, C:, D:, or E:; otherwise you will
   get a message that the E:MDM_1.DAT file was not found.

2. Printing seems to require the AUX option instead of the PRN option.

3. The program does not check for memory limitations and will crash if
   it runs out of memory.

4. If you are trying to recompile the program, the "diff.c" source file
   is missing. However, by updating the NCSEQ.LIB file, you probably
   won't need to use the missing source code.