From lfk@ATHENA.MIT.EDU Wed Aug 15 10:19:12 1990
Received: from ATHENA.MIT.EDU by silver.ucs.indiana.edu (5.61+/9.2jsm)
    id AA02970; Wed, 15 Aug 90 10:19:03 -0500
Received: from E40-008-8.MIT.EDU by ATHENA.MIT.EDU with SMTP
    id AA29548; Wed, 15 Aug 90 11:17:42 EDT
From: lfk@ATHENA.MIT.EDU
Received: by E40-008-8.MIT.EDU (5.61/4.7)
    id AA11989; Wed, 15 Aug 90 11:17:32 -0400
Date: Wed, 15 Aug 90 11:17:32 -0400
Message-Id: <9008151517.AA11989@E40-008-8.MIT.EDU>
To: Fuchs@embl.bitnet, davison@uhnix2.uh.edu, gilbertd@silver.ucs.indiana.edu
Subject: ProSearch Update -- New Version (DCL Shar)
Status: R

Please place this file in your archives. It is a slightly new version of ProSearch, the ProSite Database Searching Package. You should have received the Shar file of the same in my mass mailing.

This is a DCL shell archive. Unpack it on VMS with @filename.com

Thanks.

Frank Kolakowski

======================================================================
|lfk@athena.mit.edu                          || Lee F. Kolakowski    |
|lfk@eastman2.mit.edu                        || M.I.T.               |
|kolakowski@wccf.mit.edu                     || Dept of Chemistry    |
|lfk@mbio.med.upenn.edu                      || Room 18-506          |
|lfk@hx.lcs.mit.edu                          || 77 Massachusetts Ave.|
|AT&T: 1-617-253-1866                        || Cambridge, MA 02139  |
|--------------------------------------------------------------------|
|                              #include                              |
|                          One-Liner Here!                           |
======================================================================

$! This is a DCL shar-type archive created by Unix dclshar.
$!
$CREATE COPYING
$DECK
                    GNU GENERAL PUBLIC LICENSE
                     Version 1, February 1989

 Copyright (C) 1990 Lee F. Kolakowski
 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

                            Preamble

The license agreements of most software companies try to keep users at the mercy of those companies. By contrast, our General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
The General Public License applies to the Free Software Foundation's software and to any other program whose authors commit to using it. You can use it for your programs, too. When we speak of free software, we are referring to freedom, not price. Specifically, the General Public License is designed to make sure that you have the freedom to give away or sell copies of free software, that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of a such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. 
This License Agreement applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any work containing the Program or a portion of it, either verbatim or with modifications. Each licensee is addressed as "you". 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this General Public License and to the absence of any warranty; and give any other recipients of the Program a copy of this General Public License along with the Program. You may charge a fee for the physical act of transferring a copy. 2. You may modify your copy or copies of the Program or any portion of it, and copy and distribute such modifications under the terms of Paragraph 1 above, provided that you also do the following: a) cause the modified files to carry prominent notices stating that you changed the files and the date of any change; and b) cause the whole of any work that you distribute or publish, that in whole or in part contains the Program or any part thereof, either with or without modifications, to be licensed at no charge to all third parties under the terms of this General Public License (except that you may choose to grant warranty protection to some or all third parties, at your option). 
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the simplest and most usual way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this General Public License. d) You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. Mere aggregation of another independent work with the Program (or its derivative) on a volume of a storage or distribution medium does not bring the other work under the scope of these terms. 3. You may copy and distribute the Program (or a portion or derivative of it, under Paragraph 2) in object code or executable form under the terms of Paragraphs 1 and 2 above provided that you also do one of the following: a) accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Paragraphs 1 and 2 above; or, b) accompany it with a written offer, valid for at least three years, to give any third party free (except for a nominal charge for the cost of distribution) a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Paragraphs 1 and 2 above; or, c) accompany it with the information you received as to where the corresponding source code may be obtained. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form alone.) Source code for a work means the preferred form of the work for making modifications to it. 
For an executable file, complete source code means all the source code for all modules it contains; but, as a special exception, it need not include source code for modules which are standard libraries that accompany the operating system on which the executable file runs, or for standard header files or definitions files that accompany that operating system. 4. You may not copy, modify, sublicense, distribute or transfer the Program except as expressly provided under this General Public License. Any attempt otherwise to copy, modify, sublicense, distribute or transfer the Program is void, and will automatically terminate your rights to use the Program under this License. However, parties who have received copies, or rights to use copies, from you under this General Public License will not have their licenses terminated so long as such parties remain in full compliance. 5. By copying, distributing or modifying the Program (or any work based on the Program) you indicate your acceptance of this license to do so, and all its terms and conditions. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. 
If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation. 8. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS Appendix: How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to humanity, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) 19yy This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) 19xx name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. 
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (a program to direct compilers to make passes at assemblers) written by James Hacker.

  , 1 April 1989
  Ty Coon, President of Vice

That's all there is to it!
$EOD
$!
$CREATE INSTALL.msdos
$DECK
INSTALLATION for MSDOS

1) Get the datafile ProSite.doc from NETSERV@EMBL.BITNET
   This file must be mailed to you from the file server. Send a Mail
   message to the above address with a subject line that says:

        Subject: Get Prosite:Prosite.doc

2) If you do not have a working version of Awk, FTP to
   WSMR-SIMTEL20.ARMY.MIL (26.2.0.74). Simtel is a DEC-20, so use
   tenex mode for binaries.

        mget PD1:GAWK*.*

3) Place the two batch files (prosearc.bat and pros.bat) in your path.
   Edit the line:

        set prolib=\mit\lfk\lib\prosite

   to reflect where the awk scripts, the regular expression file and
   the prosite.doc file will be kept. Then replace all occurrences of
   'awk' in the batch files with the path and name of your
   implementation of the AWK language.

4) If you get Readseq working on your system, uncomment the lines in
   the scripts to use readseq. Readseq can be obtained from
   iubio.bio.indiana.edu 129.79.1.101. This code requires an ANSI C
   compiler.
$EOD
$!
$CREATE INSTALL.unix
$DECK
INSTALLATION for UNIX

1) Get the datafile ProSite.doc from NETSERV@EMBL.BITNET
   This file must be mailed to you from the file server. Send a Mail
   message to the above address with a subject line that says:

        Subject: Get Prosite:Prosite.doc

2) If you do not have a working version of Awk, FTP to
   PREP.AI.MIT.EDU (18.71.0.38) and get the file pub/gnu/gawk*

3) Place the two scripts (prosearch and pros) in your path.
   Edit the line:

        prolib='/mit/lfk/lib/prosite'

   to reflect where the awk scripts, the regular expression file and
   the prosite.doc file will be kept.
   Then edit the next line:

        awk=gawk

   to reflect the path and name of your implementation of the AWK
   language.

4) If you get Readseq working on your system, uncomment the lines in
   the scripts to use readseq. Readseq can be obtained from
   iubio.bio.indiana.edu 129.79.1.101. This code requires an ANSI C
   compiler.
$EOD
$!
$CREATE INSTALL.vms
$DECK
INSTALLATION for VMS

1) Get the datafile ProSite.doc from NETSERV@EMBL.BITNET
   This file must be mailed to you from the file server. Send a Mail
   message to the above address with a subject line that says:

        Subject: Get Prosite:Prosite.doc

2) If you do not have a working version of Awk, FTP to
   RML2.SRI.COM (128.18.22.20) and get the file getting_gawk. It
   contains instructions for getting the backup save set containing
   the VMS implementation of gawk.

3) Place the two command files (prosearch.com and pros.com) in a
   directory. Edit the lines

        $ prosearch_awk = "GCGMITPROSITE:prosite.awk"
        $ prodoc_awk = "GCGMITPROSITE:prodoc.awk"
        $ prosite_doc = "gengenbankdisk:[prosite]prosite.doc"
        $ prosite_regex ="gengenbankdisk:[prosite]prosite.regex"

   to reflect where the awk scripts, the regular expression file and
   the prosite.doc file will be kept. Then replace all occurrences of
   'awk' in the command files with the path and name of your
   implementation of the AWK language.

4) You must get Readseq working on your system. Readseq can be
   obtained from iubio.bio.indiana.edu 129.79.1.101. This code
   requires an ANSI C compiler.

5) I'd like to thank Anna Tomecka and Jasper Rees for writing the
   VMS command files.
$EOD
$!
$CREATE MANIFEST
$DECK
   File            Description
   ====            ============
COPYING            Gnu Public License
INSTALL.msdos      Details for MSDOS
INSTALL.unix       Details for UNIX
INSTALL.vms        Details for VMS
MANIFEST           this file
prodoc.awk         awk script for data formatting
pros               short output script
pros.1             unformatted manual page
pros.bat           MSDOS batch file for short output
pros.com           VMS command file for short output
pros.nro           formatted manual page
prosearc.bat       MSDOS batch file for long output
prosearch          long output script
prosearch.com      VMS command file for long output
prosearch.doc      background and info
prosite.awk        awk script for search
prosite.bug        Details of a small bug
prosite.regex      regular expression data for search
$EOD
$!
$CREATE prodoc.awk
$DECK
# prodoc.awk - release version 1.1
# Copyright (C) 1990 Lee F. Kolakowski
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 1, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
# Send bugs or improvements to
#     lfk@athena.mit.edu
#
# August 13, 1990
# usage: {ng}awk -f prodoc.awk
#     this provides long output
#
# usage: {ng}awk -f prodoc.awk
#     this provides short output
#
BEGIN {
    # Print the table header once, before any input is read.
    printf("\n%s\t%12s\t%-20s\t%s\n", "Access#", "From->To", "Name", "Doc#")
    printf("%s\t%12s\t%-20s\t%s\n", "_______", "________", "____________________", "_________")
    n=1
}

{
    # Match lines from the search step: echo them and remember the Doc# of each.
    if ($0 ~ /^PS/ && NF == 4) {
        printf("%s\t%12s\t%-20s\t%s\n", $1, $2, $3, $4)
        regex[NR] = "{"$4
        regex_num = NR
    }
    # Documentation entries: print each entry whose Doc# was remembered above.
    if ($0 ~ /^{PDOC/ && n < regex_num) {
        for (i=n; i <= regex_num; i++) {
            if ( regex[i] != regex_last ) {
                regex_last = regex[i]
                if ($0 ~ regex[i]) {
                    n = i
                    print $0
                    while ( $0 !~ /{END}/) {
                        # Stop at end of input if {END} is never found.
                        if ((getline) <= 0) break
                        print $0
                    }
                }
            }
        }
    }
}
$EOD
$!
$CREATE pros
$DECK
#!/bin/sh
# pros - release version 1.1
# Copyright (C) 1990 Lee F. Kolakowski
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 1, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
# Send bugs or improvements to
#     lfk@athena.mit.edu
#
# August 13, 1990
#
# usage: pros files...
#     produces short output
#
prolib='/mit/lfk/src/prosearch'
awk=gawk

echo 'Prosite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch'
echo 'ProSearch Software -- Release 1.1 -- Copyright: Lee Kolakowski'

for file in $* ; do
    echo "The following patterns are in < $file >:"
#   readseq -f10 $file > /tmp/pros$$.tmp
#   ${awk} -f ${prolib}/prosite.awk ${prolib}/prosite.regex /tmp/pros$$.tmp |
    ${awk} -f ${prolib}/prosite.awk ${prolib}/prosite.regex $file |
    ${awk} -f ${prolib}/prodoc.awk -
done
$EOD
$!
$CREATE pros.1
$DECK
.TH PROS 1 "July 13, 1990"
.SH NAME
pros \- search protein sequence for Prosite Patterns
.SH SYNOPSIS
.B pros file ...
.br
.B prosearch file ...
.br
.SH DESCRIPTION
.I Pros
reads each
.I file
in sequence and searches for the regular expression patterns that describe sites or structures in the Prosite Database. The output is displayed on the standard output. The output is a table of sites. Longer output is generated by prosearch, which also displays the relevant section from the Prosite database.
.SH "SEE ALSO"
awk(1), gawk(1)
.br
prosite.regex - the regular expression file
.br
prosite.doc - the Prosite database (available from NETSERV@EMBL.BITNET)
$EOD
$!
$CREATE pros.bat
$DECK
REM Note: If you uncomment the ReadSeq Lines replace RIGHT_CARET
REM with the proper redirection character
echo off
REM pros.bat - release version 1.1
REM Copyright (C) 1990 Lee F. Kolakowski
REM
REM This program is free software; you can redistribute it and/or modify
REM it under the terms of the GNU General Public License as published by
REM the Free Software Foundation; either version 1, or (at your option)
REM any later version.
REM
REM This program is distributed in the hope that it will be useful,
REM but WITHOUT ANY WARRANTY; without even the implied warranty of
REM MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
REM GNU General Public License for more details.
REM
REM You should have received a copy of the GNU General Public License
REM along with this program; if not, write to the Free Software
REM Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
REM
REM
REM Send bugs or improvements to
REM     lfk@athena.mit.edu
REM
REM August 13, 1990
REM
REM usage: pros files...
REM     produces short output
REM
set prolib=\usr\lib\prosite
echo Prosite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch
echo ProSearch Software -- Release 1.1 -- Copyright: Lee Kolakowski
echo The following patterns are in [ %1 ]:
REM readseq -f10 %1 RIGHT_CARET pros$$.tmp
REM awk -f %prolib%\prosite.awk %prolib%\prosite.regex pros$$.tmp RIGHT_CARET pros$$2.tmp
awk -f %prolib%\prosite.awk %prolib%\prosite.regex %1 > pros$$2.tmp
awk -f %prolib%\prodoc.awk pros$$2.tmp
del pros$$*.tmp
$EOD
$!
$CREATE pros.com
$DECK
$! pros.com - release version 1.1
$! Copyright (C) 1990 Lee F. Kolakowski
$!
$! This program is free software; you can redistribute it and/or modify
$! it under the terms of the GNU General Public License as published by
$! the Free Software Foundation; either version 1, or (at your option)
$! any later version.
$!
$! This program is distributed in the hope that it will be useful,
$! but WITHOUT ANY WARRANTY; without even the implied warranty of
$! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
$! GNU General Public License for more details.
$!
$! You should have received a copy of the GNU General Public License
$! along with this program; if not, write to the Free Software
$! Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
$!
$!
$! Send bugs or improvements to
$!     lfk@athena.mit.edu
$!
$! August 13, 1990
$!
$ ver=F$verify(0)
$ Start:
$ type sys$input

 ProSite Database Version 5.0 Copyright Amos Bairoch
 ProSearch Version 1.1 Copyright Lee F. Kolakowski

$ goto ProMatch
$!
$ ProMatch:
$!==========
$ prosearch_awk = "GCGMITPROSITE:prosite.awk"
$ prodoc_awk = "GCGMITPROSITE:prodoc.awk"
$ prosite_doc = "gengenbankdisk:[prosite]prosite.doc"
$ prosite_regex ="gengenbankdisk:[prosite]prosite.regex"
$ type sys$input

 Prosearch scans your protein sequence against the Prosite database and
 records the list of matches as positions, pattern names and accession
 numbers. Your file of protein sequence can be in any format; it will be
 read into the correct format by this procedure.

$!
$ get_doc = 2
$ count = 1
$ Get_seq:
$ file = p'count'
$ if file.nes.""
$ then
$   if f$search("''file'").nes.""
$   then
$     write sys$output " Analyzing sequence ''file'"
$   else
$     write sys$output " file ''file' cannot be found, trying next file"
$     count = count + 1
$     goto get_seq
$   endif
$ else
$   if count.gt.1 then goto leave
$   Inquire file " Name of protein sequence file "
$   if file.eqs."" then goto get_seq
$   if f$search("''file'").eqs.""
$   then
$     write sys$output ""
$     write sys$output " your file ''file' was not found"
$     inquire retry " type 1 to continue, return to quit "
$     if retry .eqs. "1"
$     then
$       file=""
$       goto get_seq
$     else
$       goto leave
$     endif
$   endif
$ endif
$
$Setup:
$ file = "''f$parse(file)'"
$ def_dir="''f$environment("DEFAULT")'"
$ staden_file = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".STADENPRO"
$ temp_file = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".temp_file"
$ outfile = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".prosearch"
$
$ write sys$output ""
$ write sys$output " Your output will be in ''outfile'"
$ if get_doc.eqs."1" .or. get_doc.eqs."2" then goto work_out
$Get_DOC:
$ write sys$output ""
$ write sys$output " Do you want to include documentation in ''outfile'"
$ inquire get_doc " Type 1 to include, 2 to exclude, QUIT to quit"
$ if get_doc.eqs."QUIT" then goto leave
$ if get_doc.eqs."1" .or. get_doc.eqs."2" then goto work_out
$ goto Get_doc
$
$work_out:
$ write sys$output ""
$ write sys$output " Matching against prosite: please wait...."
$ write sys$output ""
$
$TOSTADEN:
$ on control_y then goto leave
$ readseq -f13 'file' -o'staden_file'
$
$ Prosearch:
$ gawk -f 'prosearch_awk' 'prosite_regex' 'staden_file' > 'temp_file'
$ if Get_doc.eqs."1"
$ then
$   gawk -f 'prodoc_awk' 'temp_file' 'prosite_doc' > 'outfile'
$ else
$   gawk -f 'prodoc_awk' 'temp_file' > 'outfile'
$ endif
$
$check_count:
$ count= count+1
$ goto get_seq
$Leave:
$ save_message = f$environment("MESSAGE")
$ set message/nofacility/noiden/noseverity/notext
$ dele/nolog/noconf *.stadenpro;*
$ dele/nolog/noconf *.temp_file;*
$ set message 'save_message'
$EOD
$!
$CREATE pros.nro
$DECK

PROS(1)             UNIX Programmer's Manual              PROS(1)

NAME
     pros - search protein sequence for Prosite Patterns

SYNOPSIS
     pros file ...
     prosearch file ...

DESCRIPTION
     Pros reads each file in sequence and searches for the regular
     expression patterns that describe sites or structures in the
     Prosite Database. The output is displayed on the standard
     output. The output is a table of sites. Longer output is
     generated by prosearch, which also displays the relevant
     section from the Prosite database.

SEE ALSO
     awk(1), gawk(1)
     prosite.regex - the regular expression file
     prosite.doc - the Prosite database (available from
     NETSERV@EMBL.BITNET)

Printed 7/13/90           July 13, 1990                         1
$EOD
$!
$CREATE prosearc.bat
$DECK
REM Note: If you uncomment the ReadSeq Lines replace RIGHT_CARET
REM with the proper redirection character
echo off
REM prosearc.bat - release version 1.1
REM Copyright (C) 1990 Lee F. Kolakowski
REM
REM This program is free software; you can redistribute it and/or modify
REM it under the terms of the GNU General Public License as published by
REM the Free Software Foundation; either version 1, or (at your option)
REM any later version.
REM
REM This program is distributed in the hope that it will be useful,
REM but WITHOUT ANY WARRANTY; without even the implied warranty of
REM MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
REM GNU General Public License for more details.
REM
REM You should have received a copy of the GNU General Public License
REM along with this program; if not, write to the Free Software
REM Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
REM
REM
REM Send bugs or improvements to
REM     lfk@athena.mit.edu
REM
REM August 13, 1990
REM
REM usage: prosearch files...
REM     produces long output
REM
set prolib=\usr\lib\prosite
echo Prosite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch
echo ProSearch Software -- Release 1.1 -- Copyright: Lee Kolakowski
echo The following patterns are in [ %1 ]:
REM readseq -f10 %1 RIGHT_CARET pros$$.tmp
REM awk -f %prolib%\prosite.awk %prolib%\prosite.regex pros$$.tmp RIGHT_CARET pros$$2.tmp
awk -f %prolib%\prosite.awk %prolib%\prosite.regex %1 > pros$$2.tmp
awk -f %prolib%\prodoc.awk pros$$2.tmp %prolib%\prosite.doc
del pros$$*.tmp
$EOD
$!
$CREATE prosearch
$DECK
#!/bin/sh
# prosearch - release version 1.1
# Copyright (C) 1990 Lee F. Kolakowski
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 1, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
# Send bugs or improvements to
#     lfk@athena.mit.edu
#
# August 13, 1990
#
# usage: prosearch files...
#     produces long output
#
prolib='/mit/lfk/lib/prosite'
awk=gawk

echo 'Prosite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch'
echo 'ProSearch Software -- Release 1.1 -- Copyright: Lee Kolakowski'

for file in $* ; do
    echo "The following patterns are in < $file >:"
#   readseq -f10 $file > /tmp/pros$$.tmp
#   ${awk} -f ${prolib}/prosite.awk ${prolib}/prosite.regex /tmp/pros$$.tmp |
    ${awk} -f ${prolib}/prosite.awk ${prolib}/prosite.regex $file |
    ${awk} -f ${prolib}/prodoc.awk - ${prolib}/prosite.doc
done
$EOD
$!
$CREATE prosearch.com
$DECK
$! prosearch.com - release version 1.1
$! Copyright (C) 1990 Lee F. Kolakowski
$!
$! This program is free software; you can redistribute it and/or modify
$! it under the terms of the GNU General Public License as published by
$! the Free Software Foundation; either version 1, or (at your option)
$! any later version.
$!
$! This program is distributed in the hope that it will be useful,
$! but WITHOUT ANY WARRANTY; without even the implied warranty of
$! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
$! GNU General Public License for more details.
$!
$! You should have received a copy of the GNU General Public License
$! along with this program; if not, write to the Free Software
$! Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
$!
$!
$! Send bugs or improvements to
$!     lfk@athena.mit.edu
$!
$! August 13, 1990
$!
$ ver=F$verify(0)
$ Start:
$ type sys$input

 ProSite Database Version 5.0 Copyright Amos Bairoch
 ProSearch Version 1.1 Copyright Lee F. Kolakowski

$ goto ProMatch
$!
$ ProMatch:
$!==========
$ prosearch_awk = "GCGMITPROSITE:prosite.awk"
$ prodoc_awk = "GCGMITPROSITE:prodoc.awk"
$ prosite_doc = "gengenbankdisk:[prosite]prosite.doc"
$ prosite_regex ="gengenbankdisk:[prosite]prosite.regex"
$ type sys$input

 Prosearch scans your protein sequence against the Prosite database and
 records the list of matches as positions, pattern names and accession
 numbers.
 Your file of protein sequence can be in any format; it will be read into
 the correct format by this procedure.

$!
$ get_doc = 1
$ count = 1
$ Get_seq:
$ file = p'count'
$ if file.nes.""
$ then
$   if f$search("''file'").nes.""
$   then
$     write sys$output " Analyzing sequence ''file'"
$   else
$     write sys$output " file ''file' cannot be found, trying next file"
$     count = count + 1
$     goto get_seq
$   endif
$ else
$   if count.gt.1 then goto leave
$   Inquire file " Name of protein sequence file "
$   if file.eqs."" then goto get_seq
$   if f$search("''file'").eqs.""
$   then
$     write sys$output ""
$     write sys$output " your file ''file' was not found"
$     inquire retry " type 1 to continue, return to quit "
$     if retry .eqs. "1"
$     then
$       file=""
$       goto get_seq
$     else
$       goto leave
$     endif
$   endif
$ endif
$
$Setup:
$ file = "''f$parse(file)'"
$ def_dir="''f$environment("DEFAULT")'"
$ staden_file = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".STADENPRO"
$ temp_file = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".temp_file"
$ outfile = "''def_dir'"+"''F$parse(file,,,"NAME")'"+".prosearch"
$
$ write sys$output ""
$ write sys$output " Your output will be in ''outfile'"
$ if get_doc.eqs."1" .or. get_doc.eqs."2" then goto work_out
$Get_DOC:
$ write sys$output ""
$ write sys$output " Do you want to include documentation in ''outfile'"
$ inquire get_doc " Type 1 to include, 2 to exclude, QUIT to quit"
$ if get_doc.eqs."QUIT" then goto leave
$ if get_doc.eqs."1" .or. get_doc.eqs."2" then goto work_out
$ goto Get_doc
$
$work_out:
$ write sys$output ""
$ write sys$output " Matching against prosite: please wait...."
$ write sys$output ""
$
$TOSTADEN:
$ on control_y then goto leave
$ readseq -f13 'file' -o'staden_file'
$
$ Prosearch:
$ gawk -f 'prosearch_awk' 'prosite_regex' 'staden_file' > 'temp_file'
$ if Get_doc.eqs."1"
$ then
$ gawk -f 'prodoc_awk' 'temp_file' 'prosite_doc' > 'outfile'
$ else
$ gawk -f 'prodoc_awk' 'temp_file' > 'outfile'
$ endif
$
$check_count:
$ count= count+1
$ goto get_seq
$Leave:
$ save_message = f$environment("MESSAGE")
$ set message/nofacility/noiden/noseverity/notext
$ dele/nolog/noconf *.stadenpro;*
$ dele/nolog/noconf *.temp_file;*
$ set message 'save_message'
$EOD
$!
$CREATE prosearch.doc
$DECK
INTRODUCTION

Over the past year or so Amos Bairoch (bairoch@cgecmu51.BITNET) has
released a number of versions of his Prosite database. This is a database
of patterns which have been associated with particular enzymatic
activities or structures. One example is the well-known pattern for
N-linked glycosylation, Asn-Xxx-Ser/Thr. Amos has compiled a database that
consists of references about each pattern, validity of the patterns,
occurrences, and a host of other details.

This database is of general use, and has been used by Amos in his PC/Gene
Suite of programs for analysis of DNA and Protein sequences. I wanted to
use this database on a Unix machine and be able to ask the question,
"Which of these patterns occur in sequence X?"

This is the second release of Prosearch. It completely supersedes the
first version, with one important bug fix and support for VMS, MS-DOS, and
UNIX. Also, by using ReadSeq, a fine program from Don Gilbert, more
protein data formats are accessible.

IMPLEMENTATION

Most patterns can be expressed as regular expressions. For example, the
pattern '^P', when used with the Unix utility grep, matches any line in
the input that begins with a 'P'. I translated all but one of the 337
patterns in Prosite to Unix-style regular expressions and wrote a simple
searching program to search a protein sequence for their occurrence.
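As an illustration of the translation step, here is a minimal sketch. It
is not the procedure that was used to build prosite.regex, and the helper
name prosite_to_regex is hypothetical. It assumes the Prosite notation in
which elements are separated by '-', 'x' stands for any residue, '{..}'
lists forbidden residues, and a trailing '(k)' repeats an element k times.
Variable-length gaps such as x(2,3) are not handled; in prosite.regex
those appear as several lines with the same accession number.

```shell
# Hypothetical helper, not part of ProSearch: convert one Prosite-style
# pattern with fixed-length gaps into an awk regular expression.
prosite_to_regex() {
    printf '%s\n' "$1" | awk '{
        n = split($0, elem, "-")
        out = ""
        for (i = 1; i <= n; i++) {
            e = elem[i]
            rep = 1
            # a trailing "(k)" means: repeat this element k times
            if (match(e, /\([0-9]+\)$/)) {
                rep = substr(e, RSTART + 1, RLENGTH - 2) + 0
                e = substr(e, 1, RSTART - 1)
            }
            if (e == "x")
                e = "."                                   # any residue
            else if (substr(e, 1, 1) == "{")
                e = "[^" substr(e, 2, length(e) - 2) "]"  # forbidden residues
            for (j = 1; j <= rep; j++)
                out = out e
        }
        print out
    }'
}

prosite_to_regex 'N-{P}-[ST]-{P}'          # prints N[^P][ST][^P]
prosite_to_regex '[RK]-x(2)-[DE]-x(3)-Y'   # prints [RK]..[DE]...Y
```

Both results appear verbatim in prosite.regex (PS00001 and one of the
PS00007 lines).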
The pattern I did not translate was PS00003, Tyrosine Sulfation. There is
no clean pattern for this modification.

The program is written in the Awk language, and runs on machines which
have either Nawk from AT&T, Gawk from the Free Software Foundation, or one
of several versions of Awk which run on MS-DOS compatibles. Read the
appropriate INSTALL file for details.

INPUT FILES

Input files are any protein sequence files in an unstructured format. AWK
will accept the input on any number of lines of any length (I've tried
protein sequences up to 2500 amino acids on one line with no problem).
Each ASCII character will be interpreted as an amino acid, and all letters
must be capitalized. With 'readseq', any of a number of formats can be
used.

OUTPUT

There are two possible forms of output. The "short" form is a table of
accession numbers, positions in the sequence, and short names for
patterns. The "long" form is the same except that the relevant sections
from the Prosite Database are also printed.

Here is an example of the short output for Bovine Rhodopsin.

Prosite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch
ProSearch Software -- Release 0.1beta -- Copyright: Lee Kolakowski
The following patterns are in < test.ops >:

Access#  From->To  Name
_______  ________  ____
PS00001  2->6      ASN_GLYCOSYLATION
PS00001  15->19    ASN_GLYCOSYLATION
PS00001  200->204  ASN_GLYCOSYLATION
PS00005  14->17    PKC_PHOSPHO_SITE
PS00005  229->232  PKC_PHOSPHO_SITE
PS00005  243->246  PKC_PHOSPHO_SITE
PS00006  22->26    CK2_PHOSPHO_SITE
PS00006  193->197  CK2_PHOSPHO_SITE
PS00006  198->202  CK2_PHOSPHO_SITE
PS00006  229->233  CK2_PHOSPHO_SITE
PS00006  338->342  CK2_PHOSPHO_SITE
PS00007  21->30    TYR_PHOSPHO_SITE
PS00008  89->95    MYRISTYL
PS00008  120->126  MYRISTYL
PS00008  156->162  MYRISTYL
PS00008  182->188  MYRISTYL
PS00013  157->168  PROKAR_LIPOPROTEIN
PS00237  68->85    G_PROTEIN_RECEPTOR
PS00238  296->314  OPSIN

USAGE

Usage is described in the file pros.1; a printable version is in pros.nro.
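The From->To column of the short output can be reproduced with a sketch
like the one below. It is not the packaged prosite.awk; the function name
scan_pattern is hypothetical, and the test sequence is assumed to be the
first 20 residues of bovine rhodopsin. As in the sample output, "To" is
the match start plus the match length, that is, one position past the last
matched residue. The search restarts one residue after each match start,
so overlapping occurrences are reported too.

```shell
# Hypothetical helper: list every occurrence of REGEX in SEQUENCE,
# overlapping ones included, in the From->To form of the short output.
scan_pattern() {   # usage: scan_pattern REGEX SEQUENCE
    awk -v re="$1" -v seq="$2" 'BEGIN {
        offset = 1
        while (match(substr(seq, offset), re)) {
            from = offset + RSTART - 1      # position in the full sequence
            printf("%d->%d\n", from, from + RLENGTH)
            offset = from + 1               # resume one residue further on
        }
    }'
}

# PS00001 (ASN_GLYCOSYLATION) against the assumed rhodopsin N-terminus;
# prints 2->6 and then 15->19
scan_pattern 'N[^P][ST][^P]' MNGTEGPNFYVPFSNKTGVV
```

The two positions printed agree with the first two PS00001 rows of the
sample output.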
BUGS

Please send bug reports or improvements to me.

NOTICES

This code is covered by the Free Software Foundation's GNU General Public
License. See the file COPYING for details.

Frank Kolakowski

======================================================================
|lfk@athena.mit.edu                          || Lee F. Kolakowski    |
|lfk@eastman2.mit.edu                        || M.I.T.               |
|kolakowski@wccf.mit.edu                     || Dept of Chemistry    |
|lfk@mbio.med.upenn.edu                      || Room 18-506          |
|lfk@hx.lcs.mit.edu                          || 77 Massachusetts Ave.|
|AT&T: 1-617-253-1866                        || Cambridge, MA 02139  |
======================================================================
$EOD
$!
$CREATE prosite.awk
$DECK
# prosite.awk - release version 1.1
# Copyright (C) 1990 Lee F. Kolakowski
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 1, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
# Send bugs or improvements to
# lfk@athena.mit.edu
#
# August 13, 1990
#
# usage: {ng}awk -f prosite.awk prosite.regex filenames...
# produces unformatted table for prodoc.awk
#
{
  if ( FILENAME ~ /prosite\.reg/ ) {
    # pattern database: accession, regex, short name, documentation entry
    accession[NR] = $1 ; regex[NR] = $2; name[NR] = $3; doc[NR] = $4;
    regex_num = NR;
  } else {
    if (FILENAME != lastfile) {
      # first record of a new sequence file: scan that file exactly once
      lastfile = FILENAME
      # join every line of the file into a single sequence string
      input = ""
      while ( (getline < FILENAME) > 0 ) { input = input $0 }
      close(FILENAME)
      $0 = input
      n = length($0)
      for (i = 1; i <= regex_num ; i++ ) {
        if (match($0, regex[i])) {
          printf("%s\t%d->%d\t%s\t%s\n", accession[i], RSTART, RSTART+RLENGTH, name[i], doc[i]);
          # search again from the residue after each match start, so that
          # overlapping occurrences are reported as well
          offset = RSTART+1
          seq_rem = n-offset+1
          new = substr($0, offset, seq_rem)
          while (match(new,regex[i])) {
            printf("%s\t%d->%d\t%s\t%s\n", accession[i], offset+RSTART-1, offset+RSTART+RLENGTH-1, name[i], doc[i]);
            offset += RSTART
            seq_rem = n-offset+1
            new = substr($0, offset, seq_rem)
          }
        }
      }
    }
  }
}
$EOD
$!
$CREATE prosite.bug
$DECK
The following is a note about an error in ProSite.doc. Please correct
your version.

Date: Fri, 27 Jul 90 22:23 N
From: Amos Bairoch
Subject: Re: PS number ambiguities in ProSite.doc

The cross reference for engrailed should be PS00033. Sorry about that, and
thanks for pointing it out to me.

Amos
$EOD
$!
$CREATE prosite.regex
$DECK
PS00001 N[^P][ST][^P] ASN_GLYCOSYLATION PDOC00001
PS00002 SG.G GLYCOSAMINOGLYCAN PDOC00002
PS00004 [RK][RK].[ST] CAMP_PHOSPHO_SITE PDOC00004
PS00005 [ST].[RK] PKC_PHOSPHO_SITE PDOC00005
PS00006 [ST]..[DE] CK2_PHOSPHO_SITE PDOC00006
PS00007 [RK]...[DE]...Y TYR_PHOSPHO_SITE PDOC00007
PS00007 [RK]...[DE]..Y TYR_PHOSPHO_SITE PDOC00007
PS00007 [RK]..[DE]...Y TYR_PHOSPHO_SITE PDOC00007
PS00007 [RK]..[DE]..Y TYR_PHOSPHO_SITE PDOC00007
PS00008 G[^EDKRHPYFW]..[STAGCN][^P] MYRISTYL PDOC00008
PS00009 .G[RK][RK] AMIDATION PDOC00009
PS00010 C.[DN]....[FY].C.C ASX_HYDROXYL PDOC00010
PS00011 ............E...E.C......[DEN].[LIVMFY].........[FYW] GLU_CARBOXYLATION PDOC00011
PS00012 [LI]G[LIVMFYA]DS[LI]...[DE] PHOSPHOPANTETHEINE PDOC00012
PS00013 [^DERK][^DERK][^DERK][^DERK][^DERK][^DERK][^DERK][LIVSTAG][LIVSTAG][AG]C PROKAR_LIPOPROTEIN PDOC00013
PS00014 [RKH][DEN]EL$ ER_TARGET PDOC00014
PS00015 [RKTA]KK[RQNTSG]K NUCLEAR PDOC00015
PS00016 RGD RGD PDOC00016
PS00017 [AG]....GK[ST] ATP_A PDOC00017
PS00018 D.[DNS][^ILVFYW][DENSTG][DNQGHKR][^GP][LIVMC][DENQSTAGC]..[DE][LIVMFYW] EF_HAND PDOC00018
PS00019 Q[RK]KTFT.W.N ACTININ_1 PDOC00019
PS00020 A.....I.K[LIVM][LIVM]D..D[LIVM] ACTININ_2 PDOC00019
PS00021 [FY]CRNPD KRINGLE PDOC00020
PS00022 C.C.....G..C EGF PDOC00021
PS00023 C..PF.[FYW].......C..........WC....[ND][FYW].....[FYW].[FYW]C FIBRONECTIN_2 PDOC00022
PS00023 C..PF.[FYW].......C..........WC....[ND][FYW]...[FYW].[FYW]C FIBRONECTIN_2 PDOC00022
PS00023 C..PF.[FYW].......C........WC....[ND][FYW].....[FYW].[FYW]C FIBRONECTIN_2 PDOC00022
PS00023 C..PF.[FYW].......C........WC....[ND][FYW]...[FYW].[FYW]C FIBRONECTIN_2 PDOC00022
PS00024 [LI]...W...[PE]..[LIVMFY][DE]A[AV][LIVMFY] HEMOPEXIN PDOC00023
PS00024 [LI]...W..[PE]..[LIVMFY][DE]A[AV][LIVMFY] HEMOPEXIN PDOC00023
PS00025 R..CG[FY]...[ST]...C....C TREFOIL PDOC00024
PS00026 CG.......C....CCS..G.CG....[FYW]C CHITIN_BINDING PDOC00025
PS00027 [LIVMF].....[LIVM]....[IV][RKQ].W........[RK] HOMEOBOX PDOC00027
PS00028 C....C............H.....H ZINC_FINGER_C2H2 PDOC00028
PS00028 C....C............H...H ZINC_FINGER_C2H2 PDOC00028
PS00028 C..C............H.....H ZINC_FINGER_C2H2 PDOC00028
PS00028 C..C............H...H ZINC_FINGER_C2H2 PDOC00028
PS00029 L......L......L......L LEUCINE_ZIPPER PDOC00029
PS00030 [RK]G[^EDKRHPCG][AGCI][FY][LIVA].[FY] RNP_1 PDOC00030
PS00031 C..C.[DE].....H[FY]....C..CK.FF.R STEROID_FINGER PDOC00031
PS00032 [LIVM][FY]PWM ANTENNAPEDIA PDOC00032
PS00033 LMAQGLYN ENGRAILED PDOC00033
PS00034 RPC...........CVS PAIRED_BOX PDOC00034
PS00035 RRIKLG POU PDOC00035
PS00036 [RK][RK].[RKS]N..[STA][STA].[RK].R.[RK] FOS_JUN_BASIC PDOC00036
PS00037 W[ST]..ED..[LIV] MYB_1 PDOC00037
PS00038 K[LIVMA].[IT]L..[TA]...[LIVMA]..[LIVM] HELIX_LOOP_HELIX PDOC00038
PS00039 [LIVM][LIVM]DEAD.[LIVM][LIVM] ATP_HELICASE_1 PDOC00039
PS00040 Y[LIVM]HRIGR ATP_HELICASE_2 PDOC00039
PS00041 [LIV]..[LIV]....G[IFY].....F...[FY].......P HTH_ARAC_FAMILY PDOC00040
PS00042 [ST]R.[DE]I...[LIV]G.[ST].ET HTH_CRP_FAMILY PDOC00041
PS00043 E..[LIVM]...F.VSR..[LIVM]R.A[LIVM] HTH_GNTR_FAMILY PDOC00042
PS00044 [LIVF]..[STAV][STA].....[STA][PQHR]..[LIVM][STA]..[LIVF]..[LIVF][RKEQ]..[LIVFY] HTH_LYSR_FAMILY PDOC00043
PS00045 GF..............NP.T HISTONE_LIKE PDOC00044
PS00046 AGL.FPV HISTONE_H2A PDOC00045
PS00047 GAKRH HISTONE_H4 PDOC00046
PS00048 [AV]RYR...[ST].S.S PROTAMINE_P1 PDOC00047
PS00049 A[LIV][LIV][LIV].........[DN]G....[FY]..N..V[LIV] RIBOSOMAL_L14 PDOC00048
PS00050 [RK][RK][AM][IVY][IV][RKT]L RIBOSOMAL_L23 PDOC00049
PS00051 R[FY]N..RR.WRR RIBOSOMAL_L39 PDOC00050
PS00052 L....[LIVM]......GKK.....I[LIVMF] RIBOSOMAL_RS7 PDOC00051
PS00053 G..[LIV][LIV][ST]T..G[LIV]M....AR RIBOSOMAL_S8 PDOC00052
PS00054 [DN]VTP.P.[DN] RIBOSOMAL_S11 PDOC00053
PS00055 [RK].PNSA.R RIBOSOMAL_S12 PDOC00054
PS00056 GD.[LIV].[LIV]...RP[LIV]..T RIBOSOMAL_S17 PDOC00055
PS00057 AIK.AR...[LF]LP RIBOSOMAL_S18 PDOC00056
PS00058 LGFRGEAL DNA_MISMATCH_REPAIR PDOC00057
PS00059 GHE..G.....G..V ADH_ZINC PDOC00058
PS00060 G..H..AH..G.....PHG ADH_IRON PDOC00059
PS00061 Y[STAGC][STAGC][STAGC]K.[AG][LIVMAG]..[LIVMF] ADH_INSECT_TYPE PDOC00060
PS00062 G....[LIVM]G[LIVM]SNF ALDOKETO_REDUCTASE_1 PDOC00061
PS00063 Q.....[LIVM][AP]KS....R...N ALDOKETO_REDUCTASE_2 PDOC00061
PS00064 [LIVM]G[EQ]HG[DN][ST] L_LDH PDOC00062
PS00065 LIN..RG.V.D GLC_2_HYDROXYACID_DH PDOC00063
PS00066 [RKH]......D.MG.N.[LIVM] HMG_COA_REDUCTASE_1 PDOC00064
PS00067 GF[LIVM].NR[LIVM] 3HCDH PDOC00065
PS00068 [LIVM]T[TR]LD..R[STA] MDH PDOC00066
PS00069 DHYLGKE G6P_DEHYDROGENASE PDOC00067
PS00070 [AG].F...GQ.C.A ALDEHYDE_DEHYDROGEN PDOC00068
PS00071 ASCTT GAPDH PDOC00069
PS00072 G..[FYW][LIV][LIV]NG.K.[FYW]ITN ACYL_COA_DH_1 PDOC00070
PS00073 Q..GG.G[FY]..[DE].P ACYL_COA_DH_2 PDOC00070
PS00074 [LIV]..GG[STAG]K[STAG]....[DN] GLU_DEHYDROGENASE PDOC00071
PS00075 [LIF]G....[LIVMF]PW DHFR PDOC00072
PS00076 GG.C[LIV]..GC[LIV]P PYRIDINE_REDOX PDOC00073
PS00077 W.HH[LM] COX1 PDOC00074
PS00078 C[SA]..CG..H COX2 PDOC00075
PS00079 G.[FYW].[LIVMFYW].[CST]........G[LM]...[LIVMFYW] MULTICOPPER_OXIDASE1 PDOC00076
PS00080 HCH...H...G[LM] MULTICOPPER_OXIDASE2 PDOC00076
PS00081 HP[LIV].KL[LIV]..H LIPOXYGENASE PDOC00077
PS00082 H........Y...P.G...E EXTRADIOL_DIOXYGENAS PDOC00078
PS00083 G.[LIVM]....G..[LIVM]....[LIVM][DE].......G.[FY] INTRADIOL_DIOXYGENAS PDOC00079
PS00084 HHM..F.C CU2_MONOOXYGENASE_1 PDOC00080
PS00085 H.F....HTH..G CU2_MONOOXYGENASE_2 PDOC00080
PS00086 F[SGN].[GD].[RHP].C[LIVFA][GD] CYTOCHROME_P450 PDOC00081
PS00087 [RH][GA][IF]H[LIV]H..G SOD_CU_ZN_1 PDOC00082
PS00088 D.WEH[STA][FY][FY] SOD_MN PDOC00083
PS00089 G..NS...A.MP RIBORED_SMALL PDOC00084
PS00090 [LIVM]...[STANQ][ET]C.....GDD NITROGENASE_1 PDOC00085
PS00091 M.L.PC....Q THYMIDYLATE_SYNTHASE PDOC00086
PS00092 [LIVMA][LIVMFYA].[DN]PP[FY] N6_MTASE PDOC00087
PS00093 [LIVM]TSPP[FY] N4_MTASE PDOC00088
PS00094 [DN].[LIV]..G.PC..[FW]S C5_MTASE_1 PDOC00089
PS00095 [RKQ]..GN[STA][LIV]...[LIV]...[LIV]...[LIV] C5_MTASE_2 PDOC00089
PS00096 TTTTHKTL SER_HYDROXYMETHYLTRF PDOC00090
PS00097 F.[EK].STRT CARBAMOYLTRANSFERASE PDOC00091
PS00098 C[SAG]S[SAG][ILVMFY][RKQ][SAG][ILVM]......I THIOLASE_1 PDOC00092
PS00099 [AG][LIVM].[STA].C.G.G.[AG] THIOLASE_2 PDOC00092
PS00100 HH.VCD CAT PDOC00093
PS00101 IGAGS[LIVM]V CYSE_LACA_NODL PDOC00094
PS00102 GT.NMK PHOSPHORYLASE PDOC00095
PS00103 [LIVMFYWC][LIVM][LIVM][LIVM][DE][DE].[LIVM]..[GC].[STA] PUR_PYR_PR_TRANSFER PDOC00096
PS00104 C..KT[FYW]P.[FYW][FYW] EPSP_SYNTHASE PDOC00097
PS00105 S[FY][SA]K...LY ASP_AMINOTRANSFERASE PDOC00098
PS00106 GR.NLIGEH.DY GALACTOKINASE PDOC00099
PS00107 [LIV]G.G.[FY][SG].[LIV] PROTEIN_KINASE_ATP PDOC00100
PS00108 [LIVMFYC].[HY].D[LIVMFY]K..N[LIMVFC][LIMVFC][LIMVFC] PROTEIN_KINASE_ST PDOC00100
PS00109 [LIVMFYC].[HY].D[LIVMFY][RA]..N[LIMVFC][LIMVFC][LIMVFC] PROTEIN_KINASE_TYR PDOC00100
PS00110 II.KIEN PYRUVATE_KINASE PDOC00101
PS00111 WNGP.G.FE PGLYCERATE_KINASE PDOC00102
PS00112 LTCPSN CREATINE_KINASE PDOC00103
PS00113 DG[FY]PR.[LIVM].Q ADENYLATE_KINASE PDOC00104
PS00114 DLHA.QIQGFFD[LIVM]P[LIVM]D PRPP_SYNTHETASE PDOC00105
PS00115 Y[ST]P[ST]SP[STANK] RNA_POL_II_REPEAT PDOC00106
PS00116 [YA].DTDS[LIVM] DNA_POLYMERASE_B PDOC00107
PS00117 G....HPH.Q GAL-P-UDP-TRANSFER PDOC00108
PS00118 CC..H..C PA2_HIS PDOC00109
PS00119 [LIVM]C[^LIVMFYWPCST]CD.....C PA2_ASP PDOC00109
PS00120 [LIV].[LIVFY][LIV]G[HY]S.G LIPASE_SER PDOC00110
PS00121 Y..YY.C.C COLIPASE PDOC00111
PS00122 P.....[LIVFA]G.SAG CARBOXYLESTERASE_B PDOC00112
PS00123 V.DS..[STG]AT ALKALINE_PHOSPHATASE PDOC00113
PS00124 GKLR[LIV]LYE FBPASE PDOC00114
PS00125 RGNHE SER_THR_PHOSPHATASE PDOC00115
PS00126 HD[LIVMFY].H.[AG]..N.[LIVMFY] PDEASE PDOC00116
PS00127 CK..NTF RNASE_PANCREATIC PDOC00118
PS00128 C...C..[LF]...[DEN][LI].....C LACTALBUMIN_LYSOZYME PDOC00119
PS00129 WIDMN SUCRASE PDOC00120
PS00130 WA..GVLLLN U_DNA_GLYCOSYLASE PDOC00121
PS00131 GESYAG CARBOXYPEPTIDASE_SER PDOC00122
PS00132 [LIVMFY]H[SAG].E.[LIVM][STAG]......[LIVMFY] CARBOXYPEPTIDASE_ZN1 PDOC00123
PS00133 H[SAG]...[LIVM]..[LIVMFYW]P[FYW] CARBOXYPEPTIDASE_ZN2 PDOC00123
PS00134 [LIVM][ST]A[STAG]HC TRYPSIN_HIS PDOC00124
PS00135 GDSGG TRYPSIN_SER PDOC00124
PS00136 [SAIV].[LIVM][LIVM]D[DSTA]G[LIVMFC]...[DNH] SUBTILISIN_ASP PDOC00125
PS00137 HGT..[STA]G.[LIVMA] SUBTILISIN_HIS PDOC00125
PS00138 GTS.[SA].P..[STAV][AG] SUBTILISIN_SER PDOC00125
PS00139 Q...[GE].CW..[STAG] THIOL_PROTEASE PDOC00126
PS00140 N.CG...[LIVM][LIVM]H UCH PDOC00127
PS00141 [LIVFA]DTG[STA][STAN] EUK_ASP_PROTEASE PDOC00128
PS00142 [TAIV]..HE[LIVMFYW][^DEHKRP]H.[LIVMFYWQ] ZINC_PROTEASE PDOC00129
PS00143 GL.H..EHM IDE_PTR PDOC00130
PS00144 [STAG]TGGTIA[STAG] ASN_GLN_ASE PDOC00132
PS00145 MVCHHLD UREASE PDOC00133
PS00146 F.[LIVMFY].S..K....[AG].[LIVM]L BETA_LACTAMASE_A PDOC00134
PS00147 LGGDHS ARGINASE_1 PDOC00135
PS00148 SGNLHG ARGINASE_2 PDOC00135
PS00149 GKWHLG SULFATASE PDOC00117
PS00150 SVDYE[LIVM].G[RK] ACYLPHOSPHATASE_1 PDOC00136
PS00151 GTV.GQ.QGP ACYLPHOSPHATASE_2 PDOC00136
PS00152 P[SAP][IV][DN]...S.S ATPASE_ALPHA_BETA PDOC00137
PS00153 IT.E..E...GA.A ATPASE_GAMMA PDOC00138
PS00154 DKTGT[LIVM]T ATPASE_E1_E2 PDOC00139
PS00155 GGYSQG CUTINASE PDOC00140
PS00156 F.D.KF.DI..T OMPDECASE PDOC00141
PS00157 G.DF.K.DE RUBISCO_LARGE PDOC00142
PS00158 EG.LLKPN ALDOLASE PDOC00143
PS00159 LEVTLR ALDOLASE_KDPG_KHG_1 PDOC00144
PS00160 FK.FPAE ALDOLASE_KDPG_KHG_2 PDOC00144
PS00161 KKCGHM ISOCITRATE_LYASE PDOC00145
PS00162 Q.H.HWG CARBONIC_ANHYDRASE PDOC00146
PS00163 GS..M..K.N FUMARATE_LYASES PDOC00147
PS00164 DDLTV[STA]NP ENOLASE PDOC00148
PS00165 K........S[IF]K.RG DEHYDRATASE_SER_THR PDOC00149
PS00166 G.ALGGG ENOYL_COA_HYDRATASE PDOC00150
PS00167 G...[LIVM]ELG..[FY][ST]DP[LIVM]A[DE]G TRP_SYNTHASE_ALPHA PDOC00151
PS00168 L.H.G[STA]HK.N TRP_SYNTHASE_BETA PDOC00152
PS00169 D.[LIVM][LIVM]VKP D_ALA_DEHYDRATASE PDOC00153
PS00170 PG...MAN.GP PPIASE PDOC00154
PS00171 AYEP.W TPI PDOC00155
PS00172 [LI]EPKP..P XYLOSE_ISOMERASE_1 PDOC00156
PS00173 FHD.D[LIV].P XYLOSE_ISOMERASE_2 PDOC00156
PS00174 [FY]DQWGVELGK P_GLUCOSE_ISOMERASE PDOC00157
PS00175 [LIVM].RHG[EQ]...N PG_MUTASE PDOC00158
PS00176 E........SK..Y[LIM] TOPOISOMERASE_I_EUK PDOC00159
PS00177 [LIVM].EGDSA.[STAG] TOPOISOMERASE_II PDOC00160
PS00178 P..[STAN]..[LIVMFYP][HT][LIVMFYA]G[HNTG][LIVMFYSTA] AA_TRNA_LIGASE_HIGH PDOC00161
PS00178 P[STAN]..[LIVMFYP][HT][LIVMFYA]G[HNTG][LIVMFYSTA] AA_TRNA_LIGASE_HIGH PDOC00161
PS00179 [AG].G[LIVMF][DE]R[LIVM].[LMA][LIVMF] AA_TRNA_LIGASE_ATP PDOC00161
PS00180 [FYW]DGSS GLNA_1 PDOC00162
PS00181 NG[SA]G.H...S GLNA_ATP PDOC00162
PS00182 K[LIVM].....[LIVM]D[RK][DN][LI]Y GLNA_ADENYLATION PDOC00162
PS00183 [FY]HP........[LIV]C[LIV].[LIV][LIV].....P UBIQUITIN_CONJUGAT PDOC00163
PS00183 [FY]HP.......[LIV]C[LIV].[LIV][LIV].....P UBIQUITIN_CONJUGAT PDOC00163
PS00184 RFGDPETQ GARS PDOC00164
PS00185 [RK].[STA]..S.CY[SL] IPNS_1 PDOC00165
PS00186 [LIVM][LIVM].CG[STA]..[STAG]..T.[DNG] IPNS_2 PDOC00165
PS00187 P....[LIVMF].[LIVMF].GD..[LIVMF].[LIVMF]...[DE] TPP_ENZYMES PDOC00166
PS00188 [LIVM].[AV]MKM...[LIVM] BIOTIN PDOC00167
PS00189 G..[LIVF]...[DEQN].[LIVF]..[LIVF]...K[STAV][STAVQN]..[LIVF] LIPOYL PDOC00168
PS00190 C[^CPWHF][^CPWR]CH[^CFWY] CYTOCHROME_C PDOC00169
PS00191 F[LIV]..HPGG CYTOCHROME_B5 PDOC00170
PS00192 [DEQ]...G[FYW].[LIVM]R..H CYTOCHROME_B_HEME PDOC00171
PS00193 PEW[FY][LFY][LFY] CYTOCHROME_B_QO PDOC00171
PS00194 [TA].WC[AG][PH]C THIOREDOXIN PDOC00172
PS00195 C.[FY]C..[TA][KQ].[LI] GLUTAREDOXIN PDOC00173
PS00196 Y.[VFY].C..P.H COPPER_BLUE PDOC00174
PS00196 Y.[VFY].C..PH COPPER_BLUE PDOC00174
PS00196 Y.[VFY].C.P.H COPPER_BLUE PDOC00174
PS00196 Y.[VFY].C.PH COPPER_BLUE PDOC00174
PS00196 Y[VFY].C..P.H COPPER_BLUE PDOC00174
PS00196 Y[VFY].C..PH COPPER_BLUE PDOC00174
PS00196 Y[VFY].C.P.H COPPER_BLUE PDOC00174
PS00196 Y[VFY].C.PH COPPER_BLUE PDOC00174
PS00197 C..[STA]..C[STA][^P]C 2FE2S_FERREDOXIN PDOC00175
PS00197 C.[STA]..C[STA][^P]C 2FE2S_FERREDOXIN PDOC00175
PS00198 C..C..C...C[PEG] 4FE4S_FERREDOXIN PDOC00176
PS00199 CTHLGCV RIESKE_1 PDOC00177
PS00200 CPCHGS RIESKE_2 PDOC00177
PS00201 [FY].[ST].TG.T...A..I FLAVODOXIN PDOC00178
PS00202 W.CP.C[AG] RUBREDOXIN PDOC00179
PS00203 C...C.C..C.C..C METALLOTHIONEIN_CL1 PDOC00180
PS00204 DPH..DF[LI]E FERRITIN PDOC00181
PS00205 Y.[VA]VA[VA][VA][RK] TRANSFERRIN_1 PDOC00182
PS00206 Y.GA..CL.[DE] TRANSFERRIN_2 PDOC00182
PS00207 LLC.[DN].....V.....C..A....H.V..R TRANSFERRIN_3 PDOC00182
PS00208 [SN]P.L..HA...F PLANT_GLOBIN PDOC00183
PS00209 Y[FYW].ED[LIVM]..N......H...P HEMOCYANIN_1 PDOC00184
PS00210 T..RDP.F[FYW] HEMOCYANIN_2 PDOC00184
PS00211 [LVF]SGG...[RK][LIVMA].[LIVMF][AG] ATP_BIND_TRANSPORT PDOC00185
PS00212 [FY]......CC.......C[LFY]......[LIVMFYW] ALBUMIN PDOC00186
PS00213 [DENST]...[LIVFY].G.W[FYWRH].[LIVM] LIPOCALIN PDOC00187
PS00214 G.[FYW].[LIVM]....N[FY][DE] FABP PDOC00188
PS00215 P.[DE].[IVA][RK].[LR][LIVMFY] MITOCH_CARRIER PDOC00189
PS00216 [LIVMST][DE].[LIVMFA]GR[RK].....G SUGAR_TRANSPORT_1 PDOC00190
PS00216 [LIVMST][DE].[LIVMFA]GR[RK]....G SUGAR_TRANSPORT_1 PDOC00190
PS00217 [LIVMF].G[LIVMFA]..G........[LY]..[EQ]......[RK] SUGAR_TRANSPORT_2 PDOC00190
PS00218 A.GG.IGTGL AMINO_ACID_PERMEASE PDOC00191
PS00219 FGGL[LIVM]RD[LIVM][RK]RRYP ANION_EXCHANGER_1 PDOC00192
PS00220 FLISLIFIYETF.KL ANION_EXCHANGER_2 PDOC00192
PS00221 SG.H.NPAVT MIP_NO26_GLPF PDOC00193
PS00222 GCGCC..C IGF_BINDING PDOC00194
PS00223 [TG][STV]........[LIVMF]..R...[DEQNH].......[IFY].......[LIVMF]...[LIVMF]...........[LIVMF]..[LIVMF] ANNEXIN PDOC00195
PS00224 FLAQQES CLATHRIN_LIGHT_CHAIN PDOC00196
PS00225 [LIVMFYWA].[^DEHKRSTP][FY][DEQHKY]...[FY].G....[LIVMFCST] CRYSTALLIN_BETAGAMMA PDOC00197
PS00226 I.[TA]Y[RK].[LM]L[DE] IF PDOC00198
PS00227 [AG]GGTG[SA]G TUBULIN PDOC00199
PS00228 ^MR[DE]I TUBULIN_B_AUTOREG PDOC00200
PS00229 GS..N..H.PGGG MAP2_TAU PDOC00201
PS00230 Y.Y[DE]..[DE][RK] MAP1B_NEURAXIN PDOC00202
PS00231 CDYNRD F_ACTIN_CAPPING_BETA PDOC00203
PS00232 [LIV].[LIV].D.ND[NH].P CADHERIN PDOC00205
PS00233 G......Y.A.E.GY CUTICLE_LARVAL PDOC00206
PS00234 SLVGIE GAS_VESICLE_A PDOC00207
PS00235 FL..T...R...A..Q...L..F GAS_VESICLE_C PDOC00208
PS00236 C.[LIVM].[LIVM]..[FY]P.D...C NEUROTR_ION_CHANNEL PDOC00209
PS00237 [LMR][RKHQN]...[NT][LIVMFYW][LIVMFYW][LIV].[SNH][LIV]...[DEG][LIVMFYWA] G_PROTEIN_RECEPTOR PDOC00210
PS00238 K.....[DN]P.[IV]Y......[FY] OPSIN PDOC00211
PS00239 D[LIV]Y...YYR RECEPTOR_TYR_KIN_II PDOC00212
PS00240 C...G.P.P...W..C RECEPTOR_TYR_KIN_III PDOC00213
PS00241 C[FR]........[STVN]C.W RECEPTOR_CYTOKINES PDOC00214
PS00241 C[FR].......[STVN]C.W RECEPTOR_CYTOKINES PDOC00214
PS00242 [FYW][RK].GFF.R INTEGRIN_ALPHA PDOC00215
PS00243 C.[GNQ]...G.C.C..C.C INTEGRIN_BETA PDOC00216
PS00243 C.[GNQ].G.C.C..C.C INTEGRIN_BETA PDOC00216
PS00244 N....P.H..[SAG]...........[SAG].H[SAG][SAG] REACTION_CENTER PDOC00217
PS00245 APH.CH PHYTOCHROME PDOC00218
PS00246 QECKCHG INT1 PDOC00219
PS00247 G.L.[STAG]......[DE]C.F.E HBGF_FGF PDOC00220
PS00248 C........GC[RK]GID..HWNS.C NGF PDOC00221
PS00249 CV...RC.GCCN PDGF PDOC00222
PS00250 [LIVM]..P..[FY]....C.G.C TGF_BETA PDOC00223
PS00251 Y..YSQV.F TNF PDOC00224
PS00252 [FY]L.......[CY]AW INTERFERON_ALPHABETA PDOC00225
PS00253 F.S...P..[FY][LI].T INTERLEUKIN_1 PDOC00226
PS00254 C.........C......GL..[FY]...L INTERLEUKIN_6 PDOC00227
PS00255 CFLKRLL INTERLEUKIN_7 PDOC00228
PS00256 [EQ][LV][NT]F[ST]..W AKH PDOC00229
PS00257 WA.G[SH][LF]M BOMBESIN PDOC00230
PS00258 C[STAGDN][STAGDN]..TC[LIVMA]...[LFY]...[LFY] CALCITONIN PDOC00231
PS00258 C[STAGDN][STAGDN].TC[LIVMA]...[LFY]...[LFY] CALCITONIN PDOC00231
PS00259 Y.[GD][WH]M[DR]F GASTRIN PDOC00232
PS00259 Y[GD][WH]M[DR]F GASTRIN PDOC00232
PS00260 [YH][STA][DEQN][AG].[FY]..[DEQNST].............[LIV] GLUCAGON PDOC00233
PS00261 C[SA]G.C.[ST] GLYCOPROTEIN_HORMONE PDOC00234
PS00262 CC...C....[LIMF]...C INSULIN PDOC00235
PS00263 CFG...DRIG..S..GC NATRIURETIC_PEPTIDE PDOC00236
PS00264 C[IFY][IFY].NCP.G NEUROHYPOPHYS_HORM PDOC00237
PS00265 N..TR.RY PANCREATIC_HORMONE PDOC00238
PS00266 C.[ST]..[LIVMFY].[LIVMSTA]P.....[TAV].......[LIVMFY]......[LIVMFY]..[STA]W SOMATOTROPIN_1 PDOC00239
PS00267 F[IVFY]G[LM]M[G$] TACHYKININ PDOC00240
PS00268 W..[KN]..K[KE][LI]E[RKN] CECROPIN PDOC00241
PS00268 W[KN]..K[KE][LI]E[RKN] CECROPIN PDOC00241
PS00269 C.......G.C.........CC DEFENSIN PDOC00242
PS00270 C.C....D..C..[FY]C ENDOTHELIN PDOC00243
PS00271 CC.....R..[FY]..C THIONIN PDOC00244
PS00272 CP........[LIVYST].CC SNAKE_TOXIN PDOC00245
PS00272 CP......[LIVYST].CC SNAKE_TOXIN PDOC00245
PS00273 CC..CC.PAC.GC ENTEROTOXIN_H_STABLE PDOC00246
PS00274 T..NW..TNT AEROLYSIN PDOC00247
PS00275 [LIV]....EA.R[FY][RKQ].[LIV] SHIGA_RICIN PDOC00248
PS00276 T..W.P[LIVMFY][LIVMFY][LIVMFY]..E CHANNEL_COLICIN PDOC00249
PS00277 YGG[LIV]T....N STAPH_STREP_TOX_1 PDOC00250
PS00278 K..[LIV]....[LIV]D...R..L.....[LIV]Y STAPH_STREP_TOX_2 PDOC00250
PS00279 Y......[FY]GTH[FY] MAC_PERFORIN PDOC00251
PS00280 F...GC......[FY].....C BPTI_KUNITZ PDOC00252
PS00281 C.[SAD][STA]C..C BOWMAN_BIRK PDOC00253
PS00282 C.......C......Y...C...C KAZAL PDOC00254
PS00282 C.......C......Y...C..C KAZAL PDOC00254
PS00283 [LIVM].D..G..[LIVM].....Y.[LIVM] SOYBEAN_KUNITZ PDOC00255
PS00284 [LIVMF].[LIVMFA][DNQ][RKHQ][PS]F[LIVMFY][LIVMFY].[LIVMF] SERPIN PDOC00256
PS00285 [FYW]PE[LIV][LIV]...[STAV]..A POTATO_INHIBITOR PDOC00257
PS00286 CP.....CK....C...C.C SQUASH_INHIBITOR PDOC00258
PS00287 Q[LIVT]V[SAG]G..[LIVMFY].[LIVMFY].[LIVMFY] CYSTATIN PDOC00259
PS00288 C.C.P.HP TIMP PDOC00260
PS00289 H.C.[ST]W.S PENTRAXIN PDOC00261
PS00290 [FY].C.[VA].H IG_MHC PDOC00262
PS00291 GG.WGQ PRION PDOC00263
PS00292 R..[LIV]..[FYW][LIV]........[LIV].....[FYW]......D[RK] CYCLIN PDOC00264
PS00293 S[LIVM]SKI[LIVM][RK]C PCNA PDOC00265
PS00294 C[^DENQ][LIVM].$ FARNESYLATION PDOC00266
PS00295 N...K.VKKIK ARRESTIN PDOC00267
PS00296 AA.EE....GGG CHAPERONIN PDOC00268
PS00297 [IV]DLGTT.S HSP70_1 PDOC00269
PS00298 NKEIFL HSP90 PDOC00270
PS00299 VLRLRGG UBIQUITIN PDOC00271
PS00300 PI.[FY][LIVM]G.G SRP54 PDOC00272
PS00301 D....E...[GC].T[IV] EFACTOR_GTP PDOC00273
PS00302 TGKHGH IF4D_HYPUSINE PDOC00274
PS00303 LD...[DN]....[FY][EQ].[FY] S100_CABP PDOC00275
PS00304 GSVGGE SASP PDOC00276
PS00305 NG.[DE][DE]..C[ST] 11S_SEED_STORAGE PDOC00284
PS00306 CL[LV]A.A[LV]A CASEIN_ALPHA_BETA PDOC00277
PS00307 H[LIV]GI[DN][LIV].[ST][LIV].S..T LECTIN_LEGUME_BETA PDOC00278
PS00308 P[EQ][FYW]V.[LIV]G.[ST] LECTIN_LEGUME_ALPHA PDOC00278
PS00309 WG.E.RE LECTIN_GALACTOSIDE PDOC00279
PS00310 [STA]C[LIVM][LIVMFYW]A.[LIVMFYW]...[LIVMFYW]...Y LAMP_1 PDOC00280
PS00311 G.K..HAGY LAMP_2 PDOC00280
PS00312 II..VMAG GLYCOPHORIN_A PDOC00281
PS00313 GQD.VK.....K SVP PDOC00282
PS00314 AGYGST.T ICE_NUCLEATION PDOC00283
PS00315 SSSSSSSED[DE]G DEHYDRIN PDOC00285
PS00316 G.C.TGDC.G...C THAUMATIN PDOC00286
PS00317 C.[^C][DN]..C.....CC 4_DISULFIDE_CORE PDOC00026
PS00318 [LIVM]G.[LIVM]GG[AG]T HMG_COA_REDUCTASE_2 PDOC00064
PS00319 GVEFVCCP A4_EXTRA PDOC00204
PS00320 NGYENPTYK A4_INTRA PDOC00204
PS00321 ALKFYASVR RECA PDOC00131
PS00322 KAPRKQL HISTONE_H3 PDOC00287
PS00323 M[LIV]G[RKHNQ]KLGEF RIBOSOMAL_S19 PDOC00288
PS00324 KFGG[ST]S ASPARTOKINASE PDOC00289
PS00325 ^M[DE]AIKKKM TROPOMYOSIN_MUSCLE PDOC00290
PS00326 LKEAE.RAE TROPOMYOSIN PDOC00290
PS00327 [FYW]..LD[LIVM].AK..[FYW] BACTERIAL_OPSIN PDOC00291
PS00328 HRHRGH..[DE][DE][DE][DE][DE][DE][DE] HCP PDOC00292
PS00329 DLGGGTFD HSP70_2 PDOC00269
PS00330 D.[LI]....G.D.[LI].GG...D HEMOLYSIN_CALCIUM PDOC00293
PS00331 F.DD..GTA.V..AGLL MALIC_ENZYMES PDOC00294
PS00332 [STAG]G[PAG]H[FY][DN]P SOD_CU_ZN_2 PDOC00082
PS00333 EG[LIVM][LIVM][LIVM]K...[GC] DNA_LIGASE PDOC00295
PS00334 W..[LI][SAG].....R........[YW]...[LIM] MYB_2 PDOC00037
PS00335 VSE.Q..H..G PARATHYROID PDOC00296
PS00336 FELGS[LIVM]SKTF BETA_LACTAMASE_C PDOC00134
PS00337 P.STFKI BETA_LACTAMASE_D PDOC00134
PS00338 C[LIVMFY]..D[LIVMFYST].....[LIVMFY]..[LIVMFY]..C SOMATOTROPIN_2 PDOC00239
$EOD