From <@bloom-beacon.MIT.EDU:news@bloom-picayune.mit.edu> Wed Apr  8 22:21:23 1992
Received: from IUBIO.BIO.INDIANA.EDU by sunflower.bio.indiana.edu
	(4.1/9.5jsm) id AA07474; Wed, 8 Apr 92 22:21:16 EST
Newsgroups: bionet.software.sources
Path: 18.70.0.226!lfk
From: lfk@eastman1.mit.edu (Lee F. Kolakowski)
Subject: ProSearch 2.0
Message-Id: <LFK.92Apr8231055@eastman1.mit.edu>
Sender: news@athena.mit.edu (News system)
Nntp-Posting-Host: eastman1.mit.edu
Organization: Mass. Inst. of Tech., Dept. of Chemistry
Distribution: bionet
Date: Thu, 9 Apr 1992 04:10:55 GMT
Lines: 2520
Apparently-To: bionet-software-sources@bloom-beacon.mit.edu
Status: R


This group seems to be pretty dull; let's see if we can get some more code
out of those closets. :->;-)8-)

Most of you are quite familiar with Amos Bairoch's ProSite Database
which is now up to version 8.1. This database is a collection of
sequence motifs and references which provide many potentially
interesting facts on protein sequences. There are a number of programs
which utilize this data.

This code is the second version of ProSearch, which uses the
ProSite database and Awk to search for patterns in protein sequences.
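
To make the idea concrete, here is a minimal sketch (not the ProSearch
code itself) of how an awk regular expression can locate a motif in a
sequence. The function name find_motif, the short test fragment, and the
1-based inclusive numbering are all assumptions made for illustration;
ProSearch's own numbering may differ. The regex N[^P][ST][^P] is the
PROSITE N-glycosylation pattern N-{P}-[ST]-{P} rewritten by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ProSearch): list every match of an
# awk regular expression in a protein sequence, one "from->to" per line.
find_motif() {
    # $1 = regular expression, $2 = sequence in upper-case one-letter code
    awk -v re="$1" -v seq="$2" 'BEGIN {
        offset = 0                      # residues already skipped
        rest = seq
        while (length(rest) > 0 && match(rest, re)) {
            from = offset + RSTART      # 1-based start in the full sequence
            to = from + RLENGTH - 1     # inclusive end (an assumption)
            printf "%d->%d %s\n", from, to, substr(rest, RSTART, RLENGTH)
            # Step one residue past the match start so that
            # overlapping occurrences are reported as well.
            offset += RSTART
            rest = substr(rest, RSTART + 1)
        }
    }'
}

# N-{P}-[ST]-{P} written as a regular expression, on a made-up fragment
find_motif 'N[^P][ST][^P]' 'MNGTEGPNFYVPFSNKTGVV'
```

Run as is, this prints "2->5 NGTE" and "15->18 NKTG". Stepping one
residue past each match start, rather than past its end, is what lets
overlapping sites show up.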

The previous version ran on Unix, VMS, and MSDOS systems. This code is
the Unix version only. The VMS version has been made obsolete by the GCG
program motif, and the MSDOS version is increasingly limited by the
memory available in PCs. As a result, I no longer intend to support VMS
or MSDOS. For Macs there is a fine program called MacPattern, written by
Rainer Fuchs, which provides similar features.

This version makes the output format more configurable and allows
the use of alternative pattern files.

I'd like to thank Joe Smith (jes@mbio.med.upenn.edu) for improvements
in the main searching routines. This distribution includes cregex
written by Jack A.M. Leunissen.

I hope this is useful to somebody out there.

Frank Kolakowski

=======================================================================
O Email: lfk@eastman1.mit.edu or kolakowski@helix.mgh.harvard.edu     O
O US Mail: Lee F. Kolakowski        Endocrine Unit                    O
O Massachusetts General Hospital    Wellman 5                         O
O Boston, MA 02114                  Phone AT&T:  1-617-726-3966       O
=======================================================================
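
If you saved this article with its headers, the text above the archive
has to go before sh will accept it. A minimal sketch of one way to do
that follows; the helper name strip_article_header and the file names
prosearch.post and prosearch.shar are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the archive): print a saved news
# article from its first "#! /bin/sh" line onward, dropping the mail
# headers and the covering note above it.
strip_article_header() {
    # $1 = file holding the saved article
    sed -n '/^#! *\/bin\/sh/,$p' "$1"
}

# Typical use, assuming the article was saved as prosearch.post:
#   strip_article_header prosearch.post > prosearch.shar
#   sh prosearch.shar
```

Writing the result to a file first, instead of piping it straight into
sh, leaves a chance to look the archive over before running it.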

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  ProSearch2.0 ProSearch2.0/COPYING ProSearch2.0/INSTALL
#   ProSearch2.0/Makefile ProSearch2.0/README
#   ProSearch2.0/cregex_ansi.c ProSearch2.0/cregex_sun.c
#   ProSearch2.0/prodoc.awk ProSearch2.0/prosearch.1
#   ProSearch2.0/prosearch.help ProSearch2.0/prosearch.orig
#   ProSearch2.0/prosite.awk ProSearch2.0/regex_update.orig
#   ProSearch2.0/test.out ProSearch2.0/test.pep
# Wrapped by lfk@eastman1 on Wed Apr  8 22:11:46 1992
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test ! -d 'ProSearch2.0' ; then
    echo shar: Creating directory \"'ProSearch2.0'\"
    mkdir 'ProSearch2.0'
fi
if test -f 'ProSearch2.0/COPYING' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/COPYING'\"
else
echo shar: Extracting \"'ProSearch2.0/COPYING'\" \(12488 characters\)
sed "s/^X//" >'ProSearch2.0/COPYING' <<'END_OF_FILE'
X
X		    GNU GENERAL PUBLIC LICENSE
X		     Version 1, February 1989
X
X Copyright (C) 1989 Free Software Foundation, Inc.
X                    675 Mass Ave, Cambridge, MA 02139, USA
X Everyone is permitted to copy and distribute verbatim copies
X of this license document, but changing it is not allowed.
X
X			    Preamble
X
X  The license agreements of most software companies try to keep users
Xat the mercy of those companies.  By contrast, our General Public
XLicense is intended to guarantee your freedom to share and change free
Xsoftware--to make sure the software is free for all its users.  The
XGeneral Public License applies to the Free Software Foundation's
Xsoftware and to any other program whose authors commit to using it.
XYou can use it for your programs, too.
X
X  When we speak of free software, we are referring to freedom, not
Xprice.  Specifically, the General Public License is designed to make
Xsure that you have the freedom to give away or sell copies of free
Xsoftware, that you receive source code or can get it if you want it,
Xthat you can change the software or use pieces of it in new free
Xprograms; and that you know you can do these things.
X
X  To protect your rights, we need to make restrictions that forbid
Xanyone to deny you these rights or to ask you to surrender the rights.
XThese restrictions translate to certain responsibilities for you if you
Xdistribute copies of the software, or if you modify it.
X
X  For example, if you distribute copies of a such a program, whether
Xgratis or for a fee, you must give the recipients all the rights that
Xyou have.  You must make sure that they, too, receive or can get the
Xsource code.  And you must tell them their rights.
X
X  We protect your rights with two steps: (1) copyright the software, and
X(2) offer you this license which gives you legal permission to copy,
Xdistribute and/or modify the software.
X
X  Also, for each author's protection and ours, we want to make certain
Xthat everyone understands that there is no warranty for this free
Xsoftware.  If the software is modified by someone else and passed on, we
Xwant its recipients to know that what they have is not the original, so
Xthat any problems introduced by others will not reflect on the original
Xauthors' reputations.
X
X  The precise terms and conditions for copying, distribution and
Xmodification follow.
X
X		    GNU GENERAL PUBLIC LICENSE
X   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
X
X  0. This License Agreement applies to any program or other work which
Xcontains a notice placed by the copyright holder saying it may be
Xdistributed under the terms of this General Public License.  The
X"Program", below, refers to any such program or work, and a "work based
Xon the Program" means either the Program or any work containing the
XProgram or a portion of it, either verbatim or with modifications.  Each
Xlicensee is addressed as "you".
X
X  1. You may copy and distribute verbatim copies of the Program's source
Xcode as you receive it, in any medium, provided that you conspicuously and
Xappropriately publish on each copy an appropriate copyright notice and
Xdisclaimer of warranty; keep intact all the notices that refer to this
XGeneral Public License and to the absence of any warranty; and give any
Xother recipients of the Program a copy of this General Public License
Xalong with the Program.  You may charge a fee for the physical act of
Xtransferring a copy.
X
X  2. You may modify your copy or copies of the Program or any portion of
Xit, and copy and distribute such modifications under the terms of Paragraph
X1 above, provided that you also do the following:
X
X    a) cause the modified files to carry prominent notices stating that
X    you changed the files and the date of any change; and
X
X    b) cause the whole of any work that you distribute or publish, that
X    in whole or in part contains the Program or any part thereof, either
X    with or without modifications, to be licensed at no charge to all
X    third parties under the terms of this General Public License (except
X    that you may choose to grant warranty protection to some or all
X    third parties, at your option).
X
X    c) If the modified program normally reads commands interactively when
X    run, you must cause it, when started running for such interactive use
X    in the simplest and most usual way, to print or display an
X    announcement including an appropriate copyright notice and a notice
X    that there is no warranty (or else, saying that you provide a
X    warranty) and that users may redistribute the program under these
X    conditions, and telling the user how to view a copy of this General
X    Public License.
X
X    d) You may charge a fee for the physical act of transferring a
X    copy, and you may at your option offer warranty protection in
X    exchange for a fee.
X
XMere aggregation of another independent work with the Program (or its
Xderivative) on a volume of a storage or distribution medium does not bring
Xthe other work under the scope of these terms.
X
X  3. You may copy and distribute the Program (or a portion or derivative of
Xit, under Paragraph 2) in object code or executable form under the terms of
XParagraphs 1 and 2 above provided that you also do one of the following:
X
X    a) accompany it with the complete corresponding machine-readable
X    source code, which must be distributed under the terms of
X    Paragraphs 1 and 2 above; or,
X
X    b) accompany it with a written offer, valid for at least three
X    years, to give any third party free (except for a nominal charge
X    for the cost of distribution) a complete machine-readable copy of the
X    corresponding source code, to be distributed under the terms of
X    Paragraphs 1 and 2 above; or,
X
X    c) accompany it with the information you received as to where the
X    corresponding source code may be obtained.  (This alternative is
X    allowed only for noncommercial distribution and only if you
X    received the program in object code or executable form alone.)
X
XSource code for a work means the preferred form of the work for making
Xmodifications to it.  For an executable file, complete source code means
Xall the source code for all modules it contains; but, as a special
Xexception, it need not include source code for modules which are standard
Xlibraries that accompany the operating system on which the executable
Xfile runs, or for standard header files or definitions files that
Xaccompany that operating system.
X
X  4. You may not copy, modify, sublicense, distribute or transfer the
XProgram except as expressly provided under this General Public License.
XAny attempt otherwise to copy, modify, sublicense, distribute or transfer
Xthe Program is void, and will automatically terminate your rights to use
Xthe Program under this License.  However, parties who have received
Xcopies, or rights to use copies, from you under this General Public
XLicense will not have their licenses terminated so long as such parties
Xremain in full compliance.
X
X  5. By copying, distributing or modifying the Program (or any work based
Xon the Program) you indicate your acceptance of this license to do so,
Xand all its terms and conditions.
X
X  6. Each time you redistribute the Program (or any work based on the
XProgram), the recipient automatically receives a license from the original
Xlicensor to copy, distribute or modify the Program subject to these
Xterms and conditions.  You may not impose any further restrictions on the
Xrecipients' exercise of the rights granted herein.
X
X  7. The Free Software Foundation may publish revised and/or new versions
Xof the General Public License from time to time.  Such new versions will
Xbe similar in spirit to the present version, but may differ in detail to
Xaddress new problems or concerns.
X
XEach version is given a distinguishing version number.  If the Program
Xspecifies a version number of the license which applies to it and "any
Xlater version", you have the option of following the terms and conditions
Xeither of that version or of any later version published by the Free
XSoftware Foundation.  If the Program does not specify a version number of
Xthe license, you may choose any version ever published by the Free Software
XFoundation.
X
X  8. If you wish to incorporate parts of the Program into other free
Xprograms whose distribution conditions are different, write to the author
Xto ask for permission.  For software which is copyrighted by the Free
XSoftware Foundation, write to the Free Software Foundation; we sometimes
Xmake exceptions for this.  Our decision will be guided by the two goals
Xof preserving the free status of all derivatives of our free software and
Xof promoting the sharing and reuse of software generally.
X
X			    NO WARRANTY
X
X  9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
XFOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
XOTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
XPROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
XOR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
XMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
XTO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
XPROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
XREPAIR OR CORRECTION.
X
X  10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
XWILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
XREDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
XINCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
XOUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
XTO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
XYOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
XPROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
XPOSSIBILITY OF SUCH DAMAGES.
X
X		     END OF TERMS AND CONDITIONS
X
X	Appendix: How to Apply These Terms to Your New Programs
X
X  If you develop a new program, and you want it to be of the greatest
Xpossible use to humanity, the best way to achieve this is to make it
Xfree software which everyone can redistribute and change under these
Xterms.
X
X  To do so, attach the following notices to the program.  It is safest to
Xattach them to the start of each source file to most effectively convey
Xthe exclusion of warranty; and each file should have at least the
X"copyright" line and a pointer to where the full notice is found.
X
X    <one line to give the program's name and a brief idea of what it does.>
X    Copyright (C) 19yy  <name of author>
X
X    This program is free software; you can redistribute it and/or modify
X    it under the terms of the GNU General Public License as published by
X    the Free Software Foundation; either version 1, or (at your option)
X    any later version.
X
X    This program is distributed in the hope that it will be useful,
X    but WITHOUT ANY WARRANTY; without even the implied warranty of
X    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
X    GNU General Public License for more details.
X
X    You should have received a copy of the GNU General Public License
X    along with this program; if not, write to the Free Software
X    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
X
XAlso add information on how to contact you by electronic and paper mail.
X
XIf the program is interactive, make it output a short notice like this
Xwhen it starts in an interactive mode:
X
X    Gnomovision version 69, Copyright (C) 19xx name of author
X    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
X    This is free software, and you are welcome to redistribute it
X    under certain conditions; type `show c' for details.
X
XThe hypothetical commands `show w' and `show c' should show the
Xappropriate parts of the General Public License.  Of course, the
Xcommands you use may be called something other than `show w' and `show
Xc'; they could even be mouse-clicks or menu items--whatever suits your
Xprogram.
X
XYou should also get your employer (if you work as a programmer) or your
Xschool, if any, to sign a "copyright disclaimer" for the program, if
Xnecessary.  Here a sample; alter the names:
X
X  Yoyodyne, Inc., hereby disclaims all copyright interest in the
X  program `Gnomovision' (a program to direct compilers to make passes
X  at assemblers) written by James Hacker.
X
X  <signature of Ty Coon>, 1 April 1989
X  Ty Coon, President of Vice
X
XThat's all there is to it!
END_OF_FILE
if test 12488 -ne `wc -c <'ProSearch2.0/COPYING'`; then
    echo shar: \"'ProSearch2.0/COPYING'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/COPYING'
fi
if test -f 'ProSearch2.0/INSTALL' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/INSTALL'\"
else
echo shar: Extracting \"'ProSearch2.0/INSTALL'\" \(1668 characters\)
sed "s/^X//" >'ProSearch2.0/INSTALL' <<'END_OF_FILE'
XINSTALLATION for UNIX(tm) Systems
X
X1) Get the Prosite Database from your nearest molecular biology server
X   (e.g. ftp to ncbi.nlm.nih.gov, cd to repository/prosite and
X   get all the files there).
X
X   Keep these files in a directory hereafter referred to as PROSITE.
X   Generally, I place all the ProSite stuff in the same directory as
X   this code.
X
X2) If you do not have a working version of Awk conforming to the 
X   specifications for new awk (nawk) get one from a server near you.
X   The fastest awk I know of is mawk and is available many places
X   including "server.uga.edu" /usr/pub/packages/mawk.1.1.tar.Z
X   Place the compiled version of awk in a general place in your path.
X   Awk is a very useful language for small projects. Place the
X   awk executable in your path.
X
X3) Get readseq, a general sequence format translator that can be
X   obtained from ftp.bio.indiana.edu. This code requires an ANSI
X   C compiler but compiled versions for some systems are available.
X   Follow the instructions for Readseq to install that program in 
X   your path.
X
X4) Edit the Makefile to define locations to find PROSITE and these 
X   files referred to as PROSEARCH, and also AWK. 
X
X5) Type make to generate the regex file and executable shell scripts.
X   If your machine's C compiler is not ANSI compatible, type 'make sun'
X   instead.
X
X6) Type 'make test'. If there are errors or if a file called prosite.error
X   is generated you may have to fiddle around a little with the shell 
X   scripts.
X
X7) Type 'make install' to get rid of unnecessary files.
X
X8) Move the file 'prosearch' to a place in the path.
X
X
X
X$Id: INSTALL,v 1.4 1992/04/09 02:09:57 lfk Exp $END_OF_FILE
END_OF_FILE
if test 1668 -ne `wc -c <'ProSearch2.0/INSTALL'`; then
    echo shar: \"'ProSearch2.0/INSTALL'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/INSTALL'
fi
if test -f 'ProSearch2.0/Makefile' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/Makefile'\"
else
echo shar: Extracting \"'ProSearch2.0/Makefile'\" \(1330 characters\)
sed "s/^X//" >'ProSearch2.0/Makefile' <<'END_OF_FILE'
X# Makefile for ProSearch
X# $Id: Makefile,v 1.4 1992/04/09 02:09:57 lfk Exp $
X
X# EDIT HERE
X# define these variables to be where the database and the 
X# searching tools will be stored, ideally the same
X
XPROSITE=/usr/people/khorana/lfk/ProSearch
XPROSEARCH=/usr/people/khorana/lfk/ProSearch
X
X# define the name of the implementation of awk you use
XAWK=mawk
X
X# You should not need to edit below here
X
X# targets
Xall:	prosearch  regex_update  cregexa  prosite.regex
Xsun:	prosearch  regex_update  cregexkr prosite.regex
Xhelp:	prosearch.help
X
Xcregexa:	cregex_ansi.o
X	$(CC) -O cregex_ansi.o -o cregex
X
Xcregexkr:	cregex_sun.o
X	$(CC) -O cregex_sun.o -o cregex
X
Xprosite.regex:	prosite.dat
X	-regex_update
X
Xprosearch:	prosearch.orig
X	@cat prosearch.orig | \
X	sed 's^@PROSITE@^$(PROSITE)^g' | \
X	sed 's^@PROSEARCH@^$(PROSEARCH)^g' | \
X	sed 's^@AWK@^$(AWK)^g' > prosearch
X	chmod +x prosearch
X
Xregex_update:	regex_update.orig
X	@cat regex_update.orig | \
X	sed 's^@PROSITE@^$(PROSITE)^g' > regex_update
X	chmod +x regex_update
X
Xprosearch.help:	prosearch.1
X	groff -Tascii -man prosearch.1 > prosearch.help
X
Xtest:
X	./prosearch -d -s test.pep > test.prosearch
X	@diff test.prosearch test.out
X	@echo There should be no errors
X
Xclean:
X	rm -f regex_update prosearch cregex *.o prosite.regex
X
Xinstall:
X	compress prosite.dat
X	rm -f *.orig *.c INSTALL *.o test*
END_OF_FILE
if test 1330 -ne `wc -c <'ProSearch2.0/Makefile'`; then
    echo shar: \"'ProSearch2.0/Makefile'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/Makefile'
fi
if test -f 'ProSearch2.0/README' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/README'\"
else
echo shar: Extracting \"'ProSearch2.0/README'\" \(3800 characters\)
sed "s/^X//" >'ProSearch2.0/README' <<'END_OF_FILE'
XINTRODUCTION
X
X	Over the past year or so Amos Bairoch (bairoch
X@cgecmu51.BITNET) has published a number of versions of his Prosite
Xdatabase. This is a database of patterns which have been associated
Xwith particular enzymatic activities or structures. For example, the
Xwell known pattern for N-link glycosylation Asn-Xxx-Ser/Thr.
X
X	Amos has compiled a database that consists of references about
Xeach pattern, validity of the patterns, occurrences, and a host of
Xother details. This database is of general use, and has been used by
XAmos in his PC/Gene Suite of programs for analysis of DNA and Protein
Xsequences.
X
X	I wanted to use this database on a Unix machine and be able to
Xask the question, "Which of these patterns occur in sequence X?"
X
X	This is the second release of Prosearch. It completely
Xsupersedes the first version with one important bug fix, and support
Xfor VMS, MS-DOS, and UNIX. Also, by using ReadSeq, a fine program
Xfrom Don Gilbert <gilbertd@silver.ucs.indiana.edu>, more protein
Xdata formats are accessible.
X
XIMPLEMENTATION
X
X	Most patterns can be expressed as regular expressions. For
Xexample the pattern '^P' when used with the unix utility grep matches
Xany line in the input that begins with a 'P'.
X
X	I translated all but 1 of the 337 patterns in Prosite to Unix
Xstyle regular expressions and wrote a simple searching program to
Xsearch a protein sequence for their occurrence. The pattern I did not
Xtranslate was the pattern PS0003 which is Tyrosine Sulfation. There is
Xno clean pattern for this modification.
X
X	The program is written in the Awk language, and runs on
Xmachines which have either Nawk from AT&T, Gawk from the Free Software
XFoundation, or one of several versions of Awk which run on MSDOS
Xcompatibles. See the appropriate INSTALL file for details.
X
XINPUT FILES
X
X	Input files are any protein sequence files in an unstructured
Xformat. AWK will accept the input on any number of lines of any length
X(I've tested protein sequences up to 2500 amino acids on one line with
Xno problem). Each ASCII character will be interpreted as an amino
Xacid, and all letters must be capitalized. With 'readseq' any of a
Xnumber of formats can be used.
X
XOUTPUT
X
X	There are two possible forms of output. The "short" form is a
Xtable of accession numbers, positions in the sequence and short names
Xfor patterns. The "long" form is the same except that the relevant
Xsections from the Prosite Database are printed too.
X
XHere is an example of the short output for Bovine Rhodopsin.
X
XProsite Database -- Release 5.0 of April 1990 Copyright: Amos Bairoch
XProSearch Software -- Release 0.1beta -- Copyright: Lee Kolakowski
XThe following patterns are in < test.ops >:
X
XAccess#     From->To    Name
X_______     ________    ____
XPS00001         2->6    ASN_GLYCOSYLATION
XPS00001       15->19    ASN_GLYCOSYLATION
XPS00001     200->204    ASN_GLYCOSYLATION
XPS00005       14->17    PKC_PHOSPHO_SITE
XPS00005     229->232    PKC_PHOSPHO_SITE
XPS00005     243->246    PKC_PHOSPHO_SITE
XPS00006       22->26    CK2_PHOSPHO_SITE
XPS00006     193->197    CK2_PHOSPHO_SITE
XPS00006     198->202    CK2_PHOSPHO_SITE
XPS00006     229->233    CK2_PHOSPHO_SITE
XPS00006     338->342    CK2_PHOSPHO_SITE
XPS00007       21->30    TYR_PHOSPHO_SITE
XPS00008       89->95    MYRISTYL
XPS00008     120->126    MYRISTYL
XPS00008     156->162    MYRISTYL
XPS00008     182->188    MYRISTYL
XPS00013     157->168    PROKAR_LIPOPROTEIN
XPS00237       68->85    G_PROTEIN_RECEPTOR
XPS00238     296->314    OPSIN
X
XUSAGE
X
X	This is described in prosearch.1; a plain-text copy is
Xin prosearch.help.
X
X
XBUGS
X
X	Please send bug reports or improvements to me.
X
XNOTICES
X
X	This code is covered by the Free Software Foundation's Gnu
XPublic License. See the file COPYING for details.
X
X
XFrank Kolakowski 
X$Id: README,v 1.2 1992/04/08 22:56:25 lfk Exp $
END_OF_FILE
if test 3800 -ne `wc -c <'ProSearch2.0/README'`; then
    echo shar: \"'ProSearch2.0/README'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/README'
fi
if test -f 'ProSearch2.0/cregex_ansi.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/cregex_ansi.c'\"
else
echo shar: Extracting \"'ProSearch2.0/cregex_ansi.c'\" \(13050 characters\)
sed "s/^X//" >'ProSearch2.0/cregex_ansi.c' <<'END_OF_FILE'
X/* This is distributed with Prosearch with the permission of 
X   the author. */
X/* $Id: cregex_ansi.c,v 1.2 1992/04/08 22:56:25 lfk Exp $ */
X/*** program cregex ***********************************************************
X *
X *
X * Name:
X *      cregex: create regex-file from PROSITE for use with PROSEARCH.
X *
X *
X * Syntax:
X *      cregex <database> <regex-file>
X *
X *
X * Description:
X *      Cregex creates the file containing valid AWK regular expressions,
X *      from the native PROSITE data bank. This file is used in Kolakowski's
X *      PROSEARCH script.
X *      For appearances, the output-file may be sorted alphabetically.
X *
X *
X * Author:
X *      Jack A.M. Leunissen, CAOS/CAMM Center, Nijmegen, The Netherlands.
X *
X *
X * Version:     Date:           By:             Update:
X *      1.0     23-Oct-1990     JackL           -
X *      1.01    26-Nov-1990     JackL           Bug in check_title() fixed.
X *      1.1     26-Dec-1990     JackL           Processing changed.
X *
X */
X
X/*** preprocessor *************************************************************
X *
X */
X
X#include <stdio.h>
X#include <string.h>
X#include <stdlib.h>
X
X#define MAXLEN 512
X#define MAXNUM 100
X#define MAXSTR 512
X#define MAXWRD  30
X#define MAXCOD  10
X#define MAXTTL  80
X#define MAXDOC  10
X
X#define TRUE    1
X#define FALSE   0
X
X#define N_TERM  1
X#define C_TERM  2
X#define INLIST  4
X#define EXCLUD  8
X
X#define NEWLINE '\012'
X
X/*** globals ******************************************************************
X *
X */
X
Xchar pattern[MAXNUM][MAXLEN], word[MAXWRD], *ptr;
Xchar code[MAXCOD], title[MAXTTL], patlin[MAXSTR], docu[MAXDOC];
Xint number;
XFILE *fi, *fo;
X
X/*** prototypes ***************************************************************
X *
X */
X
Xvoid open_files(int, char **);
Xint read_data(void);
Xvoid check_title(void);
Xvoid extract(char *, char *, char);
Xint get_pat(char *, char *, char);
Xvoid initialize(void);
Xint read_word(void);
Xvoid process(void);
Xvoid add_pattern(char *, int, int);
Xvoid print_pat(void);
Xint parse_word(char *, int *, int *);
Xvoid close_files(void);
X
X/*** main *********************************************************************
X *
X */
X
Xint main(int argc, char **argv)
X{
X        int not_done;
X
X        open_files(argc, argv);
X        while (read_data()) {
X                initialize();
X                do {
X                        not_done = read_word();
X                        process();
X                } while (not_done);
X                print_pat();
X        }
X        close_files();
X}
X
X/*** open_files ***************************************************************
X *
X * Open the input-file (PROSITE.DAT) and output-file (PROSITE.REGEX).
X *
X */
X
Xvoid open_files(int argc, char **argv)
X{
X        if (argc != 3) {
X                fprintf(stderr, "Usage: %s database regex-file\n", argv[0]);
X                exit(1);
X        }
X        if ((fi = fopen(argv[1], "r")) == NULL) {
X                fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[1]);
X                exit(1);
X        }
X        if ((fo = fopen(argv[2], "w")) == NULL) {
X                fprintf(stderr, "%s: cannot create %s\n", argv[0], argv[2]);
X                exit(2);
X        }
X}
X
X/*** read_data ****************************************************************
X *
X * ID-line -> title
X * AC-line -> code
X * PA-line -> pattern
X * DO-line -> documentation
X * //-line -> end of entry
X *
X */
X
Xint read_data(void)
X{
X        char c, line[MAXSTR];
X        int has_pat, ac_found;
X
X        has_pat = FALSE;
X
X        do {
X                /*
X                 * Find the ID-line
X                 *
X                 */
X                if (fgets(line, MAXSTR, fi) == NULL) return(FALSE);
X                while (line[0] != 'I' || line[1] != 'D') {
X                        if (fgets(line, MAXSTR, fi) == NULL)
X                                return(FALSE);
X                }
X                extract(title, line, ';');
X                check_title();
X
X                /*
X                 * Find the other relevant lines
X                 *
X                 */
X                if (fgets(line, MAXSTR, fi) == NULL) return(FALSE);
X                ac_found = FALSE;
X                while (line[0] != '/' || line[1] != '/') {
X                        if (line[0] == 'A' && line[1] == 'C') {
X                                if (ac_found)
X                                        fprintf(stderr,
X                                        "Too many AC-lines in %s\n",code);
X                                else {
X                                        extract(code, line, ';');
X                                        ac_found++;
X                                }
X                        }
X                        if (line[0] == 'P' && line[1] == 'A')  {
X                                if (!get_pat(patlin, line, '.'))
X                                        return(FALSE);
X                                has_pat++;
X                        }
X                        if (line[0] == 'D' && line[1] == 'O')
X                                extract(docu, line, ';');
X                        if (fgets(line, MAXSTR, fi) == NULL)
X                                return(FALSE);
X                }
X        } while (!has_pat);
X
X        return(TRUE);
X}
X
X/*** extract ******************************************************************
X *
X * Extract a substring 'so' from string 'si', ending with character 'end'.
X *
X */
X
Xvoid extract(char *so, char *si, char end)
X{
X        while (*si != ' ') *si++;               /* skip code    */
X        while (*si == ' ') *si++;               /* skip blanks  */
X        while (*si != end) *so++ = *si++;       /* find end     */
X        *so = '\0';
X}
X
X/*** get_pat ******************************************************************
X *
X * Extract the pattern from the PA-line(s).
X *
X */
X
Xint get_pat(char *so, char *si, char end)
X{
Xcont:
X        while (*si != ' ') *si++;               /* skip code    */
X        while (*si == ' ') *si++;               /* skip blanks  */
X        while (*si && *si != end && *si != NEWLINE)
X                *so++ = *si++;
X        if (*si != end) {
X                if (fgets(si, MAXSTR, fi) == NULL) return(FALSE);
X                goto cont;
X        }
X        *so++ = end;
X        *so = '\0';
X        return(TRUE);
X}
X
X/*** check_title **************************************************************
X *
X * Check the title for the occurrence of blanks, and change them into
X * underscores.
X * NOTE: This function is obsolete!
X *
X */
X
Xvoid check_title(void)
X{
X        char *s = title;
X        while (*s) {
X                *s = (*s == ' ') ? '_' : *s;
X                *s++;
X        }
X}
X
X/*** close_files **************************************************************
X *
X * Close the input- and output-file.
X *
X */
X
Xvoid close_files(void)
X{
X        fclose(fi);
X        fclose(fo);
X        exit(0);
X}
X
X/*** initialize ***************************************************************
X *
X * Initialize the pattern.
X *
X */
X
Xvoid initialize(void)
X{
X        int i;
X        number = 1;
X        for (i = 0; i < MAXNUM; pattern[i++][0] = '\0');
X        ptr = patlin;
X}
X
X/*** read_word ****************************************************************
X *
X * Read a 'word', i.e. one pattern entity.
X *
X */
X
Xint read_word(void)
X{
X        char *w = word;
X
X        while (*ptr != '-' && *ptr != '.') {
X                *w++ = *ptr++;
X        }
X        *w = '\0';
X        return ((int)*++ptr);
X}
X
X/*** process ******************************************************************
X *
X * Translate a pattern unit into valid AWK pattern description(s).
X *
X */
X
Xvoid process(void)
X{
X        int first, last, numold, i, j, k, ret;
X        char str[MAXWRD], tmp[MAXSTR];
X
X        ret = parse_word(str, &first, &last);
X
X        if (ret & N_TERM) {
X                /*
X                 * Special case: N-terminus
X                 */
X                if (ret & INLIST && ! (ret & EXCLUD)) {
X                        for (i = 0; i < number; i++)
X                                strcpy(pattern[number+i], pattern[i]);
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], "^");
X                        for (i = number; i < 2*number; i++)
X                                strcat(pattern[i], str);
X                        number *= 2;
X                }
X                else {
X                        tmp[0] = (ret & EXCLUD) ? '.' : '^';
X                        tmp[1] = '\0';
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], tmp);
X                        add_pattern(str, first, last);
X                }
X        }
X
X        else if (ret & C_TERM) {
X                /*
X                 * Special case: C-terminus
X                 */
X                if (ret & INLIST && ! (ret & EXCLUD)) {
X                        for (i = 0; i < number; i++)
X                                strcpy(pattern[number+i], pattern[i]);
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], "$");
X                        for (i = number; i < 2*number; i++)
X                                strcat(pattern[i], str);
X                        number *= 2;
X                }
X                else {
X                        add_pattern(str, first, last);
X                        tmp[0] = (ret & EXCLUD) ? '.' : '$';
X                        tmp[1] = '\0';
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], tmp);
X                }
X        }
X
X        else
X                /*
X                 * No special case.
X                 */
X                add_pattern(str, first, last);
X
X
X}
X
X/*** add_pattern **************************************************************
X *
X */
X
Xvoid add_pattern(char *str, int first, int last)
X{
X        int numold, i, j, k;
X        char tmp[MAXSTR];
X
X        numold = number;
X        if (last - first) {
X                for (i = first; i < last; i++) {
X                        for (j = 0; j < numold; j++)
X                                strcpy(pattern[number+j], pattern[j]);
X                        number += numold;
X                }
X        }
X        for (i = first; i <= last; i++) {
X                tmp[0] = '\0';
X                for (j = 0; j < i; j++)
X                        strcat(tmp, str);
X                for (j = 0; j < numold; j++) {
X                        k = numold * (i - first) + j;
X                        strcat(pattern[k], tmp);
X                }
X        }
X}
X
X/*** parse_word ***************************************************************
X *
X * Parse the pattern entity: translate the 'word' into AWK rules, and
X * determine the repeat factor/range, if present.
X *
X */
X
Xint parse_word(char *str, int *first, int *last)
X{
X        char *w = word, *s = str;
X        int term, beg_list, end_list, exclude;
X
X        *first = *last = 1;
X        term = beg_list = end_list = exclude = FALSE;
X
X        do {
X                switch (*w) {
X
X                case '<':
X                        term |= N_TERM;
X                        if (beg_list)
X                                term |= INLIST;
X                        break;
X                case '>':
X                        term |= C_TERM;
X                        if (beg_list && !end_list)
X                                term |= INLIST;
X                        break;
X                case 'x':
X                        *s++ = '.';
X                        break;
X                case '[':
X                        *s++ = '[';
X                        beg_list = TRUE;
X                        break;
X                case '{':
X                        *s++ = '[';
X                        *s++ = '^';
X                        exclude = EXCLUD;
X                        beg_list = TRUE;
X                        break;
X                case ']':
X                        *s++ = ']';
X                        end_list = TRUE;
X                        break;
X                case '}':
X                        *s++ = ']';
X                        exclude = EXCLUD;
X                        end_list = TRUE;
X                        break;
X                case '(':
X                        *first = *last = 0;
X                        while (*++w != ',' && *w != ')')
X                                *first = *first * 10 + *w - '0';
X                        if (*w == ',')
X                                while (*++w != ',' && *w != ')')
X                                        *last = *last * 10 + *w - '0';
X                        else *last = *first;
X                        break;
X                default:
X                        *s++ = *w;
X                        break;
X                }
X        } while (*w++);
X
X        return (term|exclude);
X}
X
X/*** print_pat ****************************************************************
X *
X * Store the translated pattern(s).
X *
X */
X
Xvoid print_pat(void)
X{
X        int i;
X
X        for (i = 0; i < number; i++) {
X                fprintf(fo, "%s ", code);
X                fprintf(fo, "%s ", pattern[i]);
X                fprintf(fo, "%s ", title);
X                fprintf(fo, "%s\n", docu);
X        }
X}
X
X
END_OF_FILE
if test 13050 -ne `wc -c <'ProSearch2.0/cregex_ansi.c'`; then
    echo shar: \"'ProSearch2.0/cregex_ansi.c'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/cregex_ansi.c'
fi
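: <<'END_OF_NOTE'
The parse_word() routine in cregex_ansi.c above maps each PROSITE pattern
element onto Awk regular expression syntax: 'x' becomes '.', '{...}' becomes
'[^...]', and '(n)' or '(n,m)' sets a repeat range. The following is a minimal
sketch of that mapping only, not the author's code. The name
translate_element() and its buffer-size argument are assumptions made for
illustration, and the handling of '<' and '>' is simplified to plain '^' and
'$' anchors.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cregex: translate one PROSITE pattern
 * element (e.g. "x(2,4)" or "{PG}") into a regex atom plus a repeat range.
 * Returns 0 on success, -1 on a malformed repeat or a full buffer. */
static int translate_element(const char *w, char *out, size_t outsz,
                             int *first, int *last)
{
    size_t n = 0;

    assert(w != NULL && out != NULL && outsz > 0);
    *first = *last = 1;                  /* default: exactly one residue */
    for (; *w != '\0'; w++) {
        char single[2] = { *w, '\0' };
        const char *piece;

        switch (*w) {
        case 'x': piece = ".";  break;   /* any residue */
        case '{': piece = "[^"; break;   /* excluded residues */
        case '}': piece = "]";  break;
        case '<': piece = "^";  break;   /* N-terminus (simplified) */
        case '>': piece = "$";  break;   /* C-terminus (simplified) */
        case '(':                        /* repeat count or range */
            *first = 0;
            for (w++; isdigit((unsigned char)*w); w++)
                *first = *first * 10 + (*w - '0');
            *last = *first;
            if (*w == ',') {
                *last = 0;
                for (w++; isdigit((unsigned char)*w); w++)
                    *last = *last * 10 + (*w - '0');
            }
            if (*w != ')')
                return -1;               /* malformed repeat */
            continue;                    /* loop increment skips the ')' */
        default:  piece = single; break; /* literal residue, '[' or ']' */
        }
        if (n + strlen(piece) >= outsz)
            return -1;                   /* output buffer too small */
        strcpy(out + n, piece);
        n += strlen(piece);
    }
    out[n] = '\0';
    return 0;
}
```

Called on "x(2,4)", the sketch yields the atom "." with first = 2 and
last = 4; cregex itself then expands such a range into separate patterns.
END_OF_NOTE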
if test -f 'ProSearch2.0/cregex_sun.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/cregex_sun.c'\"
else
echo shar: Extracting \"'ProSearch2.0/cregex_sun.c'\" \(13513 characters\)
sed "s/^X//" >'ProSearch2.0/cregex_sun.c' <<'END_OF_FILE'
X/* cregex is distributed with prosearch with the permission of the author */
X/* Modified for K&R compiler by
XFrom reisner@ee.su.OZ.AU Fri Jan 18 12:25:20 1991
XSubject: cregex
XFrank,
X	Again thanks for sending down the latest version of cregex.
XAlec Dunn here at El. Eng. has modified it so that the standard C
Xcompiler on the Sun won't complain.  I'm taking the liberty of sending
Xthe source to you (I also sent a copy to Amos Bairoch in case he might
Xfind it of some use.         Cheers,alex
X*/
X/* $Id: cregex_sun.c,v 1.2 1992/04/08 22:56:25 lfk Exp $ */
X/*** program cregex ***********************************************************
X * Name: cregex : create regex-file from PROSITE for use with PROSEARCH.
X * Syntax: cregex <database> <regex-file>
X * Description:
X *      Cregex creates the file containing valid AWK regular expressions,
X *      from the native PROSITE data bank. This file is used in Kolakowski's
X *      PROSEARCH script.
X *      For appearances, the output-file may be sorted alphabetically.
X * Author:
X *      Jack A.M. Leunissen, CAOS/CAMM Center, Nijmegen, The Netherlands.
X *
X *
X * Version:     Date:           By:             Update:
X *      1.0     23-Oct-1990     JackL           -
X *      1.01    26-Nov-1990     JackL           Bug in check_title() fixed.
X *      1.1     26-Dec-1990     JackL           Processing changed.
X *
X * Modified A Dunn, Jan 91, to remove ANSI-C features to allow
X *	compilation by Sun C compiler.  Also closed the "preprocessor"
X *	comment below.
X */
X
X/*** preprocessor *************************************************************
X */
X
X#include <stdio.h>
X#include <string.h>
X#include <stdlib.h>
X
X#define MAXLEN 512
X#define MAXNUM 100
X#define MAXSTR 512
X#define MAXWRD  30
X#define MAXCOD  10
X#define MAXTTL  80
X#define MAXDOC  10
X
X#define TRUE    1
X#define FALSE   0
X
X#define N_TERM  1
X#define C_TERM  2
X#define INLIST  4
X#define EXCLUD  8
X
X#define NEWLINE '\012'
X
X/*** globals ******************************************************************
X *
X */
X
Xchar pattern[MAXNUM][MAXLEN], word[MAXWRD], *ptr;
Xchar code[MAXCOD], title[MAXTTL], patlin[MAXSTR], docu[MAXDOC];
Xint number;
XFILE *fi, *fo;
X
X/*** prototypes ***************************************************************
X *
X */
X
Xvoid open_files();
Xint read_data();
Xvoid check_title();
Xvoid extract();
Xint get_pat();
Xvoid initialize();
Xint read_word();
Xvoid process();
Xvoid add_pattern();
Xvoid print_pat();
Xint parse_word();
Xvoid close_files();
X
X/*** main *********************************************************************
X *
X */
X
Xint main(argc, argv)
Xint argc;
Xchar **argv;
X{
X        int not_done;
X
X        open_files(argc, argv);
X        while (read_data()) {
X                initialize();
X                do {
X                        not_done = read_word();
X                        process();
X                } while (not_done);
X                print_pat();
X        }
X        close_files();
X}
X
X/*** open_files ***************************************************************
X *
X * Open the input-file (PROSITE.DAT) and output-file (PROSITE.REGEX).
X *
X */
X
Xvoid open_files(argc, argv)
Xint argc;
Xchar **argv;
X{
X        if (argc != 3) {
X                fprintf(stderr, "Usage: %s database regex-file\n", argv[0]);
X                exit(0);
X        }
X        if ((fi = fopen(argv[1], "r")) == NULL) {
X                fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[1]);
X                exit(1);
X        }
X        if ((fo = fopen(argv[2], "w")) == NULL) {
X                fprintf(stderr, "%s: cannot create %s\n", argv[0], argv[2]);
X                exit(2);
X        }
X}
X
X/*** read_data ****************************************************************
X *
X * ID-line -> title
X * AC-line -> code
X * PA-line -> pattern
X * DO-line -> documentation
X * //-line -> end of entry
X *
X */
X
Xint read_data()
X{
X        char c, line[MAXSTR];
X        int has_pat, ac_found;
X
X        has_pat = FALSE;
X
X        do {
X                /*
X                 * Find the ID-line
X                 *
X                 */
X                if (fgets(line, MAXSTR, fi) == NULL) return(FALSE);
X                while (line[0] != 'I' || line[1] != 'D') {
X                        if (fgets(line, MAXSTR, fi) == NULL)
X                                return(FALSE);
X                }
X                extract(title, line, ';');
X                check_title();
X
X                /*
X                 * Find the other relevant lines
X                 *
X                 */
X                if (fgets(line, MAXSTR, fi) == NULL) return(FALSE);
X                ac_found = FALSE;
X                while (line[0] != '/' || line[1] != '/') {
X                        if (line[0] == 'A' && line[1] == 'C') {
X                                if (ac_found)
X                                        fprintf(stderr,
X                                        "Too many AC-lines in %s\n",code);
X                                else {
X                                        extract(code, line, ';');
X                                        ac_found++;
X                                }
X                        }
X                        if (line[0] == 'P' && line[1] == 'A')  {
X                                if (!get_pat(patlin, line, '.'))
X                                        return(FALSE);
X                                has_pat++;
X                        }
X                        if (line[0] == 'D' && line[1] == 'O')
X                                extract(docu, line, ';');
X                        if (fgets(line, MAXSTR, fi) == NULL)
X                                return(FALSE);
X                }
X        } while (!has_pat);
X
X        return(TRUE);
X}
X
X/*** extract ******************************************************************
X *
X * Extract a substring 'so' from string 'si', ending with character 'end'.
X *
X */
X
Xvoid extract(so, si, end)
Xchar *so;
Xchar *si;
Xchar end;
X{
X        while (*si && *si != ' ') si++;         /* skip code    */
X        while (*si == ' ') si++;                /* skip blanks  */
X        while (*si && *si != end) *so++ = *si++; /* find end    */
X        *so = '\0';
X}
X
X/*** get_pat ******************************************************************
X *
X * Extract the pattern from the PA-line(s).
X *
X */
X
Xint get_pat(so, si, end)
Xchar *so;
Xchar *si;
Xchar end;
X{
Xcont:
X        while (*si != ' ') *si++;               /* skip code    */
X        while (*si == ' ') *si++;               /* skip blanks  */
X        while (*si && *si != end && *si != NEWLINE)
X                *so++ = *si++;
X        if (*si != end) {
X                if (fgets(si, MAXSTR, fi) == NULL) return(FALSE);
X                goto cont;
X        }
X        *so++ = end;
X        *so = '\0';
X        return(TRUE);
X}
X
X/*** check_title **************************************************************
X *
X * Check the title for the occurrence of blanks, and change them into
X * underscores.
X * NOTE: This function is obsolete!
X *
X */
X
Xvoid check_title()
X{
X        char *s = title;
X        while (*s) {
X                *s = (*s == ' ') ? '_' : *s;
X                *s++;
X        }
X}
X
X/*** close_files **************************************************************
X *
X * Close the input- and output-file.
X *
X */
X
Xvoid close_files()
X{
X        fclose(fi);
X        fclose(fo);
X        exit(0);
X}
X
X/*** initialize ***************************************************************
X *
X * Initialize the pattern.
X *
X */
X
Xvoid initialize()
X{
X        int i;
X        number = 1;
X        for (i = 0; i < MAXNUM; pattern[i++][0] = '\0');
X        ptr = patlin;
X}
X
X/*** read_word ****************************************************************
X *
X * Read a 'word', i.e. one pattern entity.
X *
X */
X
Xint read_word()
X{
X        char *w = word;
X
X        while (*ptr != '-' && *ptr != '.') {
X                *w++ = *ptr++;
X        }
X        *w = '\0';
X        return ((int)*++ptr);
X}
X
X/*** process ******************************************************************
X *
X * Translate a pattern unit into valid AWK pattern description(s).
X *
X */
X
Xvoid process()
X{
X        int first, last, numold, i, j, k, ret;
X        char str[MAXWRD], tmp[MAXSTR];
X
X        ret = parse_word(str, &first, &last);
X
X        if (ret & N_TERM) {
X                /*
X                 * Special case: N-terminus
X                 */
X                if (ret & INLIST && ! (ret & EXCLUD)) {
X                        for (i = 0; i < number; i++)
X                                strcpy(pattern[number+i], pattern[i]);
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], "^");
X                        for (i = number; i < 2*number; i++)
X                                strcat(pattern[i], str);
X                        number *= 2;
X                }
X                else {
X                        tmp[0] = (ret & EXCLUD) ? '.' : '^';
X                        tmp[1] = '\0';
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], tmp);
X                        add_pattern(str, first, last);
X                }
X        }
X
X        else if (ret & C_TERM) {
X                /*
X                 * Special case: C-terminus
X                 */
X                if (ret & INLIST && ! (ret & EXCLUD)) {
X                        for (i = 0; i < number; i++)
X                                strcpy(pattern[number+i], pattern[i]);
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], "$");
X                        for (i = number; i < 2*number; i++)
X                                strcat(pattern[i], str);
X                        number *= 2;
X                }
X                else {
X                        add_pattern(str, first, last);
X                        tmp[0] = (ret & EXCLUD) ? '.' : '$';
X                        tmp[1] = '\0';
X                        for (i = 0; i < number; i++)
X                                strcat(pattern[i], tmp);
X                }
X        }
X
X        else
X                /*
X                 * No special case.
X                 */
X                add_pattern(str, first, last);
X
X
X}
X
X/*** add_pattern **************************************************************
X *
X */
X
Xvoid add_pattern(str, first, last)
Xchar *str;
Xint first;
Xint last;
X{
X        int numold, i, j, k;
X        char tmp[MAXSTR];
X
X        numold = number;
X        if (last - first) {
X                for (i = first; i < last; i++) {
X                        for (j = 0; j < numold; j++)
X                                strcpy(pattern[number+j], pattern[j]);
X                        number += numold;
X                }
X        }
X        for (i = first; i <= last; i++) {
X                tmp[0] = '\0';
X                for (j = 0; j < i; j++)
X                        strcat(tmp, str);
X                for (j = 0; j < numold; j++) {
X                        k = numold * (i - first) + j;
X                        strcat(pattern[k], tmp);
X                }
X        }
X}
X
X/*** parse_word ***************************************************************
X *
X * Parse the pattern entity: translate the 'word' into AWK rules, and
X * determine the repeat factor/range, if present.
X *
X */
X
Xint parse_word(str, first, last)
Xchar *str;
Xint *first;
Xint *last;
X{
X        char *w = word, *s = str;
X        int term, beg_list, end_list, exclude;
X
X        *first = *last = 1;
X        term = beg_list = end_list = exclude = FALSE;
X
X        do {
X                switch (*w) {
X
X                case '<':
X                        term |= N_TERM;
X                        if (beg_list)
X                                term |= INLIST;
X                        break;
X                case '>':
X                        term |= C_TERM;
X                        if (beg_list && !end_list)
X                                term |= INLIST;
X                        break;
X                case 'x':
X                        *s++ = '.';
X                        break;
X                case '[':
X                        *s++ = '[';
X                        beg_list = TRUE;
X                        break;
X                case '{':
X                        *s++ = '[';
X                        *s++ = '^';
X                        exclude = EXCLUD;
X                        beg_list = TRUE;
X                        break;
X                case ']':
X                        *s++ = ']';
X                        end_list = TRUE;
X                        break;
X                case '}':
X                        *s++ = ']';
X                        exclude = EXCLUD;
X                        end_list = TRUE;
X                        break;
X                case '(':
X                        *first = *last = 0;
X                        while (*++w != ',' && *w != ')')
X                                *first = *first * 10 + *w - '0';
X                        if (*w == ',')
X                                while (*++w != ',' && *w != ')')
X                                        *last = *last * 10 + *w - '0';
X                        else *last = *first;
X                        break;
X                default:
X                        *s++ = *w;
X                        break;
X                }
X        } while (*w++);
X
X        return (term|exclude);
X}
X
X/*** print_pat ****************************************************************
X *
X * Store the translated pattern(s).
X *
X */
X
Xvoid print_pat()
X{
X        int i;
X
X        for (i = 0; i < number; i++) {
X                fprintf(fo, "%s ", code);
X                fprintf(fo, "%s ", pattern[i]);
X                fprintf(fo, "%s ", title);
X                fprintf(fo, "%s\n", docu);
X        }
X}
X
X
X
X
X
X
END_OF_FILE
if test 13513 -ne `wc -c <'ProSearch2.0/cregex_sun.c'`; then
    echo shar: \"'ProSearch2.0/cregex_sun.c'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/cregex_sun.c'
fi
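: <<'END_OF_NOTE'
The add_pattern() routine in cregex_sun.c above handles a repeat range such
as x(2,4) by emitting one pattern per repeat count: the translated atom is
appended 2, 3 and 4 times to separate copies of the pattern built so far.
The following is a minimal sketch of just that expansion step, under stated
assumptions: expand_range() and ALT_LEN are hypothetical names, and the
bookkeeping for previously built patterns is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALT_LEN 64   /* hypothetical limit on one expanded alternative */

/* Hypothetical helper, not part of cregex: write atom repeated
 * first..last times into alts[], one alternative per repeat count.
 * Returns the number of alternatives written, or -1 if the range is
 * invalid or the result does not fit. */
static int expand_range(const char *atom, int first, int last,
                        char alts[][ALT_LEN], int max_alts)
{
    int count, i, n = 0;
    size_t len;

    assert(atom != NULL && alts != NULL);
    len = strlen(atom);
    if (first < 0 || last < first || last - first + 1 > max_alts)
        return -1;                       /* bad range or too many results */
    if ((size_t)last * len >= ALT_LEN)
        return -1;                       /* longest alternative too long */
    for (count = first; count <= last; count++) {
        alts[n][0] = '\0';
        for (i = 0; i < count; i++)
            strcat(alts[n], atom);       /* atom repeated count times */
        n++;
    }
    return n;
}
```

With atom "." and the range 2 to 4, the sketch produces the three
alternatives "..", "..." and "....", which is why a single PROSITE entry
can turn into several lines of the regex file.
END_OF_NOTE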
if test -f 'ProSearch2.0/prodoc.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/prodoc.awk'\"
else
echo shar: Extracting \"'ProSearch2.0/prodoc.awk'\" \(2098 characters\)
sed "s/^X//" >'ProSearch2.0/prodoc.awk' <<'END_OF_FILE'
X# Written by 
X#      Lee F. Kolakowski, Jr. 
X#      Massachusetts General Hospital
X#      Endocrine Unit
X#      Wellman 5
X#      Fruit Street
X#      Boston, MA 02114
X#
X# COPYRIGHT
X#      Copyright (c) 1990, 1991, 1992 
X#      by Lee F. Kolakowski, Jr. All rights reserved.
X#
X#      This program is free software; you can redistribute it and/or
X#      modify it under the terms of the GNU General Public License
X#      (version 1), as published by the Free Software Foundation, and
X#      found in the file 'COPYING' included with this distribution.
X#
X#      This program is distributed in the hope that it will be useful,
X#      but WITHOUT ANY WARRANTY; without even the implied warranty of
X#      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
X#      GNU General Public License for more details.
X#
X#      You should have received a copy of the GNU General Public License
X#      along with this program;  if not, write to the Free Software
X#      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
X#
X#      Send bugs or improvements to kolakowski@helix.mgh.harvard.edu
X#
X##
X#      This file is prodoc.awk.
X#      This code presents the results from the search performed by
X#      prosite.awk and is the interface to prosite.doc
X# $Id: prodoc.awk,v 1.2 1992/04/08 22:56:25 lfk Exp $
X#############################################################
X
XBEGIN {
X  printf("\n%s\t%12s\t%20s\t%s\n", "Access#", "From->To", "Name", "Documentation#")
X  printf("%s\t%12s\t%20s\t%s\n", "_______", "________", "____", "______________")
X  n=1
X  count=1
X}
X{
X  if ($0 ~ /^PS/ && NF == 4) {
X    printf("%s\t%12s\t%20s\t%s\n", $1, $2, $3, $4)
X    regex[count] = "{"$4
X    regex_num = count
X    count++
X  }
X  else if ($0 ~ /^SITE/) {
X    gsub(/^SITE/, ""); print
X  }
X  else if ($0 ~ /^{PDOC/ && n < regex_num) {
X    for (i=n; i <= regex_num; i++) {
X      if ( regex[i] != regex_last ) {
X	regex_last = regex[i]
X	if (verbose)
X	  print "regex ="regex[i], "line ="$0
X	if (match($0,regex[i])) {
X          n = i
X	  print $0
X	  while ( $0 !~ /{END}/ && (getline) > 0)
X	    print $0
X	}
X      }
X    }
X  }
X}
END_OF_FILE
if test 2098 -ne `wc -c <'ProSearch2.0/prodoc.awk'`; then
    echo shar: \"'ProSearch2.0/prodoc.awk'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/prodoc.awk'
fi
if test -f 'ProSearch2.0/prosearch.1' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/prosearch.1'\"
else
echo shar: Extracting \"'ProSearch2.0/prosearch.1'\" \(2248 characters\)
sed "s/^X//" >'ProSearch2.0/prosearch.1' <<'END_OF_FILE'
X.\" $Id: prosearch.1,v 1.2 1992/04/08 22:56:25 lfk Exp $
X.TH PROSEARCH 1 "Apr 8 1992"
X.SH NAME
Xprosearch \- search protein sequences for Prosite Database patterns
X.SH SYNOPSIS
X.B prosearch
X[-sdh] [-rfilename] file ...
X.br
X.SH DESCRIPTION
X.I prosearch
Xreads each 
X.I file
Xin sequence and searches the contents for regular expression patterns
Xdescribed in the Prosite Database. The output is displayed on the
Xstandard output. ProSearch is written in the Awk programming language.
X.LP
XIf a pattern is found in a protein sequence, it does not necessarily
Xmean that any structure or function attributed to that pattern is a
Xcharacteristic of that protein.
X.SH OPTIONS
XThe following options add additional information to the output.
X.TP 15
X.B \-s(ites)
XIf the
X.B \-s(ites)
Xoption is specified, the amino acid sequence in the target protein
Xand the regular expression are displayed.
X.TP 15
X.B \-d(oc)
XIf the
X.B \-d(oc)
Xoption is specified, the relevant information in the ProSite documentation
Xis displayed for each pattern matched.
X.TP 15
X.B \-rfilename
XIf the
X.B \-rfilename
Xoption is specified, the filename must follow -r directly, with no space,
Xand name a file of local patterns, one per line, in the following format:
X.sp
Xaccession#<TAB>pattern<TAB>name<TAB>documentation#
X.sp
XThe patterns must conform to the specifications of regular expressions in
Xyour local version of Awk.
X.LP
XIf errors occur during processing, a file called "prosearch.error" will
Xappear in your directory describing the error.
X.SH SEE ALSO
Xawk(1), gawk(1), nawk(1), or mawk(1) 
X.br
Xcregex(1) translates prosite.dat to the regular expression format and
Xwas written by Jack A.M. Leunissen, CAOS/CAMM Center, Nijmegen, The
XNetherlands.
X.br
Xprosite.regex - the regular expression file
X.br
Xprosite.doc - the Prosite database (available from NETSERV@EMBL.BITNET)
X.SH AUTHORS
XProsearch was written by and is maintained by Lee F. Kolakowski, Jr.
X(email lfk@eastman1.mit.edu). If a citation is needed for ProSearch
Xplease cite as "unpublished method."
X.LP
XThe ProSite Database is written and maintained by
X.br
XAmos Bairoch
X.br
XMedical Biochemistry Department
X.br
XUniversity of Geneva
X.br
XSwitzerland
X.br
XEmail: bairoch@cmu.unige.ch or bairoch@cgecmu51.bitnet
X.br
XTelephone: (+41 22) 618 492
X
END_OF_FILE
if test 2248 -ne `wc -c <'ProSearch2.0/prosearch.1'`; then
    echo shar: \"'ProSearch2.0/prosearch.1'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/prosearch.1'
fi
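: <<'END_OF_NOTE'
The man page above gives the layout of a local pattern file for -rfilename
as four TAB-separated fields per line: accession#, pattern, name and
documentation#. The following is a minimal sketch of a check for that
layout, assuming every field must be non-empty. is_pattern_line() is a
hypothetical helper and is not part of ProSearch, which does no such
validation itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check, not part of ProSearch: return 1 if line holds
 * exactly four non-empty TAB-separated fields, else 0. A trailing
 * newline, if present, ends the last field. */
static int is_pattern_line(const char *line)
{
    int fields = 0;
    const char *p = line;

    assert(line != NULL);
    for (;;) {
        size_t len = strcspn(p, "\t\n"); /* length of the current field */

        if (len == 0)
            return 0;                    /* empty field */
        fields++;
        p += len;
        if (*p != '\t')
            break;                       /* newline or end of string */
        p++;                             /* step over the TAB */
    }
    return fields == 4;
}
```

A line with fewer or more than four fields, or with two TABs in a row, is
rejected. Whether a given pattern is a valid regular expression still
depends on the local version of Awk, as the man page notes.
END_OF_NOTE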
if test -f 'ProSearch2.0/prosearch.help' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/prosearch.help'\"
else
echo shar: Extracting \"'ProSearch2.0/prosearch.help'\" \(3187 characters\)
sed "s/^X//" >'ProSearch2.0/prosearch.help' <<'END_OF_FILE'
X
X
X
XPROSEARCH(1)                                         PROSEARCH(1)
X
X
XNNAAMMEE
X       prosearch  - search protein sequences for Prosite Database
X       patterns
X
XSSYYNNOOPPSSIISS
X       pprroosseeaarrcchh [-sdh] [-rfilename] file ...
X
XDDEESSCCRRIIPPTTIIOONN
X       _p_r_o_s_e_a_r_c_h reads each _f_i_l_e in sequence and searchs the con-
X       tents  for  regular  expression  patterns described in the
X       Prosite Database. The output is displayed on the  standard
X       output.  ProSearch  is written in the Awk programming lan-
X       guage.
X
X       If a pattern is found in a protein sequence, it  does  not
X       necessarily mean that any structure or function attributed
X       to that pattern is a characteristic of that protein.
X
XOOPPTTIIOONNSS
X       The following options added additional information to  the
X       output.
X
X       --ss((iitteess))       If  the  --ss((iitteess))  option is specified, the
X                      amino acid sequence in the  target  protein
X                      and the regular expression are displayed.
X
X       --dd((oocc))         If the --dd((oocc)) option is specified, the rel-
X                      evant information in the ProSite documenta-
X                      tion is displayed for each pattern matched.
X
X       --rrffiilleennaammee     If the --rrffiilleennaammee option is specified,  the
X                      argument  should be a filename preceeded by
X                      -r with local patterns  conforming  to  the
X                      following format on each line:
X
X                      acces-
X                      sion#<TAB>pattern<TAB>name<TAB>documentation#
X
X                      The patterns must conform to the specifica-
X                      tions of regular expressions in your  local
X                      version of Awk.
X
X       If errors occur during processing, a file called
X       "prosearch.error" will appear in your directory describing
X       the error.
X
XSSEEEE AALLSSOO
X       awk(1), gawk(1), nawk(1), or mawk(1)
X       cregex(1) translates prosite.dat to the regular expression
X       format and was written by Jack A.M.  Leunissen,  CAOS/CAMM
X       Center, Nijmegen, The Netherlands.
X       prosite.regex - the regular expression file
X       prosite.doc - the Prosite database (available from
X       NETSERV@EMBL.BITNET)
X
XAAUUTTHHOORRSS
X       Prosearch was written by  and  is  maintained  by  Lee  F.
X       Kolakowski,  Jr.  (email lfk@eastman1.mit.edu). If a cita-
X       tion is need for ProSearch  please  cite  as  "unpublished
X       method."
X
X       The ProSite Database is written and maintained by
X       Amos Bairoch
X       Medical Biochemistry Department
X       University of Geneva
X       Switzerland
X       Email: bairoch@cmu.unige.ch or bairoch@cgecmu51.bitnet
X       Telephone: (+41 22) 618 492
X
X                            Apr 8 1992
END_OF_FILE
echo shar: 114 control characters may be missing from \"'ProSearch2.0/prosearch.help'\"
if test 3187 -ne `wc -c <'ProSearch2.0/prosearch.help'`; then
    echo shar: \"'ProSearch2.0/prosearch.help'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/prosearch.help'
fi
if test -f 'ProSearch2.0/prosearch.orig' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/prosearch.orig'\"
else
echo shar: Extracting \"'ProSearch2.0/prosearch.orig'\" \(2366 characters\)
sed "s/^X//" >'ProSearch2.0/prosearch.orig' <<'END_OF_FILE'
X#!/bin/sh
X
X# Written by 
X#      Lee F. Kolakowski, Jr. 
X#      Massachusetts General Hospital
X#      Endocrine Unit
X#      Wellman 5
X#      Fruit Street
X#      Boston, MA 02114
X#
X# COPYRIGHT
X#      Copyright (c) 1990, 1991, 1992 
X#      by Lee F. Kolakowski, Jr. All rights reserved.
X#
X#      This program is free software; you can redistribute it and/or
X#      modify it under the terms of the GNU General Public License
X#      (version 1), as published by the Free Software Foundation, and
X#      found in the file 'COPYING' included with this distribution.
X#
X#      This program is distributed in the hope that it will be useful,
X#      but WITHOUT ANY WARRANTY; without even the implied warranty of
X#      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
X#      GNU General Public License for more details.
X#
X#      You should have received a copy of the GNU General Public License
X#      along with this program;  if not, write to the Free Software
X#      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
X#
X#      Send bugs or improvements to kolakowski@helix.mgh.harvard.edu
X#
X# $Id: prosearch.orig,v 1.3 1992/04/08 23:42:04 lfk Exp $
X
X
Xtrap 'rm -f /tmp/pros$$.tmp' 0
X
X# shell variables
X#awk='nawk' #AT&T Awk
X#awk='mawk' #fastest
X#awk='gawk' #slowest
Xawk=@AWK@
X
Xversion=2.0
Xdb_version=8.1
X
Xsite_dir='@PROSITE@'
Xsearch_dir='@PROSEARCH@'
X
Xprosearch=${search_dir}/prosite.awk
Xregsites=${site_dir}/prosite.regex
Xdocumentation=${site_dir}/prosite.doc
Xhelpfile=${search_dir}/prosearch.help
Xdocflag=""
Xsiteflag=no
X
X# process arguments
Xfor arg in $* 
Xdo	case $arg in
X		-s*)	siteflag=yes ;; # display sites in output
X		-d*)	docflag=${documentation}  ;; # display documentation 
X		-h*)	cat ${helpfile}; exit 0 ;;
X		-r*)	regsites=`echo $arg | sed 's@-r@@'` ;
X			echo `basename $0`: using $regsites for pattern file ;;
X		-*)	echo `basename $0`: unknown flag $arg ; 
X			echo Usage: prosearch [-sites] [-doc] [-help] -rfile filenames;
X			exit 1 ;;
X		*)	args="${args} $arg" ;; # echo $args ;;
X	esac
Xdone
X
Xecho "ProSearch${version} searches The Prosite Database ${db_version}"
Xfor file in $args
Xdo
X echo "The following ProSite Patterns are in < $file >:"
X readseq -C -f13 -p < $file > /tmp/pros$$.tmp
X ${awk} -f ${prosearch} -v showsites=$siteflag ${regsites}  /tmp/pros$$.tmp |
X ${awk} -f ${search_dir}/prodoc.awk - $docflag
X rm -f /tmp/pros$$.tmp
Xdone
X
END_OF_FILE
if test 2366 -ne `wc -c <'ProSearch2.0/prosearch.orig'`; then
    echo shar: \"'ProSearch2.0/prosearch.orig'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/prosearch.orig'
fi
if test -f 'ProSearch2.0/prosite.awk' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/prosite.awk'\"
else
echo shar: Extracting \"'ProSearch2.0/prosite.awk'\" \(3495 characters\)
sed "s/^X//" >'ProSearch2.0/prosite.awk' <<'END_OF_FILE'
X# Written by 
X#      Lee F. Kolakowski, Jr. 
X#      Massachusetts General Hospital
X#      Endocrine Unit
X#      Wellman 5
X#      Fruit Street
X#      Boston, MA 02114
X#
X# This version was modified extensively by Joe Smith (jes@mbio.med.upenn.edu)
X#
X# COPYRIGHT
X#      Copyright (c) 1990, 1991, 1992 
X#      by Lee F. Kolakowski, Jr. All rights reserved.
X#
X#      This program is free software; you can redistribute it and/or
X#      modify it under the terms of the GNU General Public License
X#      (version 1), as published by the Free Software Foundation, and
X#      found in the file 'COPYING' included with this distribution.
X#
X#      This program is distributed in the hope that it will be useful,
X#      but WITHOUT ANY WARRANTY; without even the implied warranty of
X#      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
X#      GNU General Public License for more details.
X#
X#      You should have received a copy of the GNU General Public License
X#      along with this program;  if not, write to the Free Software
X#      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
X#
X#      Send bugs or improvements to kolakowski@helix.mgh.harvard.edu
X#
X##
X#      This file is prosite.awk.
X#      This code searches the target sequence for patterns in regex format.
X#      
X# $Id: prosite.awk,v 1.2 1992/04/08 22:56:25 lfk Exp $
X#############################################################
X
XBEGIN {
X# Increase to 11 for a file with debugging output "x.msg"
X  verbose = 0
X  me = "prosite.awk"
X  errfile = msgfile = "prosite.error"
X
X  if (ARGC < 2) {
X    printf "%s: missing pattern filename\n", me >errfile
X    exit 1
X  }
X  npat = get_patterns( ARGV[1] )
X  if (verbose)
X    printf "%s: %d patterns read from %s\n", me, npat, ARGV[1] >msgfile;
X
X  if ( ARGC > 2 )
X    for (i = 2; i < ARGC; ++i)
X      prosite( ARGV[i] )
X  else if ( ARGC == 2 )
X    prosite( "/dev/stdin" )
X  exit
X}
X
Xfunction get_patterns ( patfile,   n) {
X  for (n = 0; (getline <patfile) > 0; ++n) {
X    if (NF == 4) {
X      accession[n] = $1
X      pattern[n] = $2
X      name[n] = $3
X      doc[n] = $4
X    }
X    else {
X      printf "%s: error in pattern file '%s', line %d\n",
X        me, patfile, n+1 >errfile
X      exit 1
X    }
X  }
X  close(patfile)
X  $0 = ""
X  return n
X}
X
Xfunction prosite ( seqfile,    i, offset, from, to, sites, matched) {
X# read & build the sequence string
X  for (seq = ""; getline <seqfile > 0; seq = seq$0)
X    ;
X  if (verbose)
X    printf "%s: read %d aa from %s\n",
X      me, length(seq), seqfile >msgfile
X  if (verbose >= 10)
X    for (offset = 1; offset <= length(seq); offset += 50)
X      print substr(seq, offset, 50) >msgfile
X
X  for (i = 0; i < npat; ++i) {
X    if (verbose >= 10)
X      printf "%s: searching for %s (%d of %d)\n",
X        me, accession[i], i+1, npat >msgfile
X      sites = 0
X      offset = 1
X      while (match(substr(seq, offset), pattern[i])) {
X	++sites
X	from = offset + RSTART - 1
X	to = from + RLENGTH - 1
X	printf("%s\t%d->%d\t%s\t%s\n",
X          accession[i], from, to, name[i], doc[i]);
X	if (showsites == "YES" || showsites == "yes") 
X	  printf "SITE\tPattern\t%s matched\nSITE\tSite\t%d %s %d\n",
X	    pattern[i], from, substr(seq, from, RLENGTH), to
X	if (verbose >= 10)
X	  printf "%s: %s matched %s\n", me, pattern[i],
X	  substr(seq, from, RLENGTH) >msgfile
X	offset = from + 1
X      }
X      if (sites) 
X	++matched
X      allsites += sites
X    }
X    if (verbose)
X      printf "%s: %d patterns matched at %d sites\n",
X        me, matched, allsites >msgfile
X}
END_OF_FILE
if test 3495 -ne `wc -c <'ProSearch2.0/prosite.awk'`; then
    echo shar: \"'ProSearch2.0/prosite.awk'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/prosite.awk'
fi
if test -f 'ProSearch2.0/regex_update.orig' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/regex_update.orig'\"
else
echo shar: Extracting \"'ProSearch2.0/regex_update.orig'\" \(1621 characters\)
sed "s/^X//" >'ProSearch2.0/regex_update.orig' <<'END_OF_FILE'
X#!/bin/sh
X#
X# Update the ProSearch regular expression file
X#
X# Written by 
X#      Lee F. Kolakowski, Jr. 
X#      Massachusetts General Hospital
X#      Endocrine Unit
X#      Wellman 5
X#      Fruit Street
X#      Boston, MA 02114
X#
X# COPYRIGHT
X#      Copyright (c) 1990, 1991, 1992 
X#      by Lee F. Kolakowski, Jr. All rights reserved.
X#
X#      This program is free software; you can redistribute it and/or
X#      modify it under the terms of the GNU General Public License
X#      (version 1), as published by the Free Software Foundation, and
X#      found in the file 'COPYING' included with this distribution.
X#
X#      This program is distributed in the hope that it will be useful,
X#      but WITHOUT ANY WARRANTY; without even the implied warranty of
X#      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
X#      GNU General Public License for more details.
X#
X#      You should have received a copy of the GNU General Public License
X#      along with this program;  if not, write to the Free Software
X#      Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
X#
X#      Send bugs or improvements to kolakowski@helix.mgh.harvard.edu
X#
X# $Id: regex_update.orig,v 1.2 1992/04/08 22:56:25 lfk Exp $
X
Xsite_dir='@PROSITE@'
XDATA=$site_dir/prosite.dat
XREGEX=$site_dir/prosite.regex
XCREGEX=cregex
X
X#if [ $DATA -nt $REGEX ]; then
X#	echo Updating $REGEX with $DATA
X#	mv $REGEX ${REGEX}.old
X#	cregex $DATA $REGEX
X#	sort $REGEX -o $REGEX
X#else
X#	echo $REGEX is up to date
X#fi
X
Xecho -n Updating `basename $REGEX` with `basename $DATA` ...
Xmv $REGEX ${REGEX}.old
X$CREGEX $DATA $REGEX
Xsort $REGEX -o $REGEX
Xecho done
END_OF_FILE
if test 1621 -ne `wc -c <'ProSearch2.0/regex_update.orig'`; then
    echo shar: \"'ProSearch2.0/regex_update.orig'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/regex_update.orig'
fi
if test -f 'ProSearch2.0/test.out' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/test.out'\"
else
echo shar: Extracting \"'ProSearch2.0/test.out'\" \(15369 characters\)
sed "s/^X//" >'ProSearch2.0/test.out' <<'END_OF_FILE'
XProSearch2.0 searches The Prosite Database 8.1
XThe following ProSite Patterns are in < test.pep >:
X
XAccess#	    From->To	                Name	Documetation#
X_______	    ________	                ____	_____________
XPS00001	        2->5	   ASN_GLYCOSYLATION	PDOC00001
X	Pattern	N[^P][ST][^P] matched
X	Site	2 NGTE 5
XPS00001	      15->18	   ASN_GLYCOSYLATION	PDOC00001
X	Pattern	N[^P][ST][^P] matched
X	Site	15 NKTG 18
XPS00001	    200->203	   ASN_GLYCOSYLATION	PDOC00001
X	Pattern	N[^P][ST][^P] matched
X	Site	200 NESF 203
XPS00005	      14->16	    PKC_PHOSPHO_SITE	PDOC00005
X	Pattern	[ST].[RK] matched
X	Site	14 SNK 16
XPS00005	    229->231	    PKC_PHOSPHO_SITE	PDOC00005
X	Pattern	[ST].[RK] matched
X	Site	229 TVK 231
XPS00005	    243->245	    PKC_PHOSPHO_SITE	PDOC00005
X	Pattern	[ST].[RK] matched
X	Site	243 TQK 245
XPS00006	      22->25	    CK2_PHOSPHO_SITE	PDOC00006
X	Pattern	[ST]..[DE] matched
X	Site	22 SPFE 25
XPS00006	    193->196	    CK2_PHOSPHO_SITE	PDOC00006
X	Pattern	[ST]..[DE] matched
X	Site	193 TPHE 196
XPS00006	    198->201	    CK2_PHOSPHO_SITE	PDOC00006
X	Pattern	[ST]..[DE] matched
X	Site	198 TNNE 201
XPS00006	    229->232	    CK2_PHOSPHO_SITE	PDOC00006
X	Pattern	[ST]..[DE] matched
X	Site	229 TVKE 232
XPS00006	    338->341	    CK2_PHOSPHO_SITE	PDOC00006
X	Pattern	[ST]..[DE] matched
X	Site	338 SKTE 341
XPS00007	      21->29	    TYR_PHOSPHO_SITE	PDOC00007
X	Pattern	[RK]...[DE]...Y matched
X	Site	21 RSPFEAPQY 29
XPS00008	      89->94	            MYRISTYL	PDOC00008
X	Pattern	G[^EDRKHPFYW]..[STAGCN][^P] matched
X	Site	89 GGFTTT 94
XPS00008	    120->125	            MYRISTYL	PDOC00008
X	Pattern	G[^EDRKHPFYW]..[STAGCN][^P] matched
X	Site	120 GGEIAL 125
XPS00008	    156->161	            MYRISTYL	PDOC00008
X	Pattern	G[^EDRKHPFYW]..[STAGCN][^P] matched
X	Site	156 GVAFTW 161
XPS00008	    182->187	            MYRISTYL	PDOC00008
X	Pattern	G[^EDRKHPFYW]..[STAGCN][^P] matched
X	Site	182 GMQCSC 187
XPS00237	    123->139	  G_PROTEIN_RECEPTOR	PDOC00210
X	Pattern	[GSTALIVMC][GSTAPDE][^EDPKRH]..[LIVMNG]..[LIVMFT][GSTAN][LIVMFYWAS][DEN]R[FYWCH]..[LIVM] matched
X	Site	123 IALWSLVVLAIERYVVV 139
XPS00238	    296->313	               OPSIN	PDOC00211
X	Pattern	K.....[DN]P.[IV]Y......[FY] matched
X	Site	296 KTSAVYNPVIYIMMNKQF 313
X{PDOC00001}
X{PS00001; ASN_GLYCOSYLATION}
X{BEGIN}
X************************
X* N-glycosylation site *
X************************
X
XIt has been known for a long time [1] that potential N-glycosylation sites are
Xspecific to the consensus sequence Asn-Xaa-Ser/Thr.  It must be noted that the
Xpresence of the consensus  tripeptide  is  not sufficient  to conclude that an
Xasparagine residue is glycosylated, due to  the fact that the  folding of  the
Xprotein plays an important  role in the  regulation of N-glycosylation [2]. It
Xhas been shown [3] that  the  presence of proline between Asn and Ser/Thr will
Xinhibit N-glycosylation; this  has  been confirmed by a recent [4] statistical
Xanalysis of glycosylation sites, which also  shows that about 50% of the sites
Xthat have a proline C-terminal to Ser/Thr are not glycosylated.
X
XIt must also  be noted that there  are  a few  reported cases of glycosylation
Xsites with the pattern Asn-Xaa-Cys; an  experimentally demonstrated occurrence
Xof such non-standard site is found in the plasma protein C [5].
X
X-Consensus pattern: N-{P}-[ST]-{P}
X                    [N is the glycosylation site]
X-Last update: May 1991 / Text revised.
X
X[ 1] Marshall R.D.
X     Annu. Rev. Biochem. 41:673-702(1972).
X[ 2] Pless D.D., Lennarz W.J.
X     Proc. Natl. Acad. Sci. U.S.A. 74:134-138(1977).
X[ 3] Bause E.
X     Biochem. J. 209:331-336(1983).
X[ 4] Gavel Y., von Heijne G.
X     Protein Eng. 3:433-442(1990).
X[ 5] Miletich J.P., Broze G.J. Jr.
X     J. Biol. Chem. 265:11397-11404(1990).
X{END}
X{PDOC00005}
X{PS00005; PKC_PHOSPHO_SITE}
X{BEGIN}
X*****************************************
X* Protein kinase C phosphorylation site *
X*****************************************
X
XIn vivo, protein kinase C  exhibits  a  preference  for the phosphorylation of
Xserine or threonine residues  close to a  C-terminal basic residue [1,2].  The
Xpresence of additional  basic residues at the  N- or C-terminal of  the target
Xamino acid enhances the Vmax and Km of the phosphorylation reaction.
X
X-Consensus pattern: [ST]-x-[RK]
X                    [S or T is the phosphorylation site]
X-Last update: June 1988 / First entry.
X
X[ 1] Woodget J.R., Gould K.L., Hunter T.
X     Eur. J. Biochem. 161:177-184(1986).
X[ 2] Kishimoto A., Nishiyama K., Nakanishi H., Uratsuji Y., Nomura H.,
X     Takeyama Y., Nishizuka Y.
X     J. Biol. Chem. 260:12492-12499(1985).
X{END}
X{PDOC00006}
X{PS00006; CK2_PHOSPHO_SITE}
X{BEGIN}
X*****************************************
X* Casein kinase II phosphorylation site *
X*****************************************
X
XCasein kinase II (CK-2) is a protein serine/threonine kinase that has activity
Xindependent of cyclic nucleotides  and  of calcium.  CK-2  phosphorylates many
Xdifferent proteins.   The  substrate  specificity [1]  of  this  enzyme can be
Xsummarized as follows:
X
X (1) Under comparable conditions Ser is favoured over Thr.
X (2) An acidic residue (either Asp or Glu)  must  be present three residues to
X     the C-terminal of the phosphate acceptor site.
X (3) Additional acidic  residues in  positions +1, +2, +4, and +5 increase the
X     phosphorylation rate.  Most  physiological  substrates  have at least one
X     acidic residue in these positions.
X (4) Asp is preferred to Glu as the provider of acidic determinants.
X (5) A basic residue to the N-terminal  of the  acceptor  site  decreases  the
X     phosphorylation rate, while an acidic one will increase it.
X
X-Consensus pattern: [ST]-x(2)-[DE]
X                    [S or T is the phosphorylation site]
X-Note: this pattern  is  found  most,  but not all, of the known physiological
X substrates.
X-Last update: May 1991 / Text revised.
X
X[ 1] Pinna L.A.
X     Biochim. Biophys. Acta 1054:267-284(1990).
X{END}
X{PDOC00007}
X{PS00007; TYR_PHOSPHO_SITE}
X{BEGIN}
X****************************************
X* Tyrosine kinase phosphorylation site *
X****************************************
X
XSubstrates of tyrosine protein kinases are generally characterized by a lysine
Xor an arginine seven residues  to  the N-terminal side  of  the phosphorylated
Xtyrosine.  An acidic residue (Asp  or Glu) is often  found at either  three or
Xfour residues to  the N-terminal side  of  the tyrosine  [1,2,3].  There are a
Xnumber of exceptions to  this rule such as the  tyrosine phosphorylation sites
Xof enolase and lipocortin II.
X
X-Consensus pattern: [RK]-x(2)-[DE]-x(3)-Y
X                 or [RK]-x(3)-[DE]-x(2)-Y
X                    [Y is the phosphorylation site]
X-Last update: June 1988 / First entry.
X
X[ 1] Patschinsky T., Hunter T., Esch F.S., Cooper J.A., Sefton B.M.
X     Proc. Natl. Acad. Sci. U.S.A. 79:973-977(1982).
X[ 2] Hunter T.
X     J. Biol. Chem. 257:4843-4848(1982).
X[ 3] Cooper J.A., Esch F.S., Taylor S.S., Hunter T.
X     J. Biol. Chem. 259:7835-7841(1984).
X{END}
X{PDOC00008}
X{PS00008; MYRISTYL}
X{BEGIN}
X*************************
X* N-myristoylation site *
X*************************
X
XAn  appreciable  number of eukaryotic  proteins  are  acylated by the covalent
Xaddition of myristate (a C14-saturated fatty acid) to their N-terminal residue
Xvia an amide  linkage [1,2].  The  specificity  of the  enzyme responsible for
Xthis modification,  myristoyl CoA:protein  N-myristoyl  transferase (NMT), has
Xbeen derived from   the sequence  of  known N-myristoylated  proteins and from
Xstudies using  synthetic peptides.  The sequence specificity  seems to  be the
Xfollowing:
X
X - The N-terminal residue must be glycine.
X - In position 2, uncharged residues  are allowed.  Charged residues,  proline
X   and large hydrophobic residues are not allowed.
X - In positions 3 and 4, most, if not all, residues are allowed.
X - In position 5, small  uncharged residues  are  all  allowed (Ala, Ser, Thr,
X   Cys, Asn and Gly). Serine is favored.
X - In position 6, proline is not allowed.
X
X-Consensus pattern: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
X                    [G is the N-myristoylation site]
X-Note: we deliberately  include as  potential myristoylated  glycine  residues
X those which  are  internal to a  sequence, for  it  could  well  be  that the
X sequence under  study  represents a  viral  polyprotein  precursor  and  that
X subsequent proteolytic processing  could expose an internal glycine as the N-
X terminal of a mature protein.
X-Last update: October 1989 / Pattern and text revised.
X
X[ 1] Towler D.A., Gordon J.I., Adams S.P., Glaser L.
X     Annu. Rev. Biochem. 57:69-99(1988).
X[ 2] Grand R.J.A.
X     Biochem. J. 258:625-638(1989).
X{END}
X{PDOC00210}
X{PS00237; G_PROTEIN_RECEPTOR}
X{BEGIN}
X*****************************************
X* G-protein coupled receptors signature *
X*****************************************
X
XG-protein coupled receptors [1,2] (also called R7G)  are an extensive group of
Xhormones, neurotransmitters, and light receptors which transduce extracellular
Xsignals by  interaction  with  guanine  nucleotide-binding (G)  proteins.  The
Xreceptors that  are  currently known to belong to this family are listed below
X(references are only provided for very recently sequenced proteins).
X
X - 5-hydroxytryptamine (serotonin) 1a, 1c and 2 [Reviewed in 3].
X - Acetylcholine, muscarinic-type, M1 to M5.
X - Adenosine A1 and A2 [4].
X - Adrenergic alpha-1 and alpha-2; beta-1, beta-2, and beta-3 [Reviewed in 5].
X - Angiotensin II (proto-oncogene mas).
X - Bombesin/gastrin-releasing peptide.
X - C5a anaphylatoxin [6].
X - Cannabinoid.
X - Dopamine D1 to D5 [Reviewed in 7].
X - Endothelin ETa and ETb [Reviewed in 8].
X - f-Met-Leu-Phe (fMLP) (N-formyl peptide).
X - Follicle stimulating hormone (FSH-R) [Reviewed in 9].
X - Histamine H2 (gastric receptor I) [10].
X - Lutropin-choriogonadotropic hormone (LSH-R) [Reviewed in 9].
X - Neuromedin K (NK-3).
X - Neurotensin.
X - Octopamine, from insects.
X - Odorants [11].
X - Platelet activating factor (PAF-R) [12].
X - Substance-K (NK-2).
X - Substance-P (NK-1).
X - Thromboxane A2 [13].
X - Thyrotropin (TSH-R) [Reviewed in 9].
X - Thyrotropin releasing factor.
X - Visual pigments (opsins and rhodopsin) [Reviewed in 14].
X - Two putative receptors from dog: RDC1 and RDC4.
X - Two putative receptors from rat: FC5, and RTA.
X - Three putative receptors encoded  in  the  genome of cytomegalovirus: US27,
X   US28, and UL33.
X - Slime mold cyclic AMP receptor.
X
XThe  structure of all  these receptors is  thought  to be identical: they have
Xseven hydrophobic regions,  each  of which most probably  spans  the membrane.
XThe N-terminus is located on the  extracellular side of the  membrane,  and is
Xglycosylated, while the C-terminus is cytoplasmic  and  phosphorylated.  Three
Xextracellular loops  alternate  with   three intracellular  loops to  link the
Xseven transmembrane regions.   The most conserved parts of  all these proteins
Xare the transmembrane regions and the first two cytoplasmic loops. A conserved
Xacidic-Arg-aromatic triplet is  present  in  the  N-terminal  extremity of the
Xsecond cytoplasmic loop [15], it could be implicated in the interaction with G
Xproteins.
X
XTo detect this widespread family of  proteins we have developed a pattern that
Xcontains the conserved triplet and that also spans the major part of the third
Xtransmembrane helix.
X
X-Consensus pattern: [GSTALIVMC]-[GSTAPDE]-{EDPKRH}-x(2)-[LIVMNG]-x(2)-
X                    [LIVMFT]-[GSTAN]-[LIVMFYWAS]-[DEN]-R-[FYWCH]-x(2)-[LIVM]
X-Sequences known to belong to this class detected by the pattern: ALL,  except
X for slime mold cAMP receptor which does not seem to really belong to the  R7G
X family.
X-Other sequence(s) detected in SWISS-PROT: Drosophila insulinase.
X
X-Expert(s) to contact by email: Chollet A.
X                                chollet@clients.switch.ch
X                                Attwood T.K.
X                                bph6tka@biovax.leeds.ac.uk
X
X-Last update: December 1991 / Pattern and text revised.
X
X[ 1] Strosberg A.D.
X     Eur. J. Biochem. 196:1-10(1991).
X[ 2] Kerlavage A.R.
X     Curr. Opin. Struct. Biol. 1:394-401(1991).
X[ 3] Hartig P.R.
X     Trends Pharmacol. Sci. 10:64-69(1989).
X[ 4] Libert F., Schiffmann S.N., Lefort A., Parmentier M., Gerard C.,
X     Dumont J.E., Vanderhaeghen J.-J., Vassart G.
X     EMBO J. 10:1677-1682(1991).
X[ 5] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
X     Trends Neurosci. 11:321-324(1988).
X[ 6] Gerard N.P., Gerard C.
X     Nature 349:614-617(1991).
X[ 7] Stevens C.F.
X     Curr. Biol. 1:20-22(1991).
X[ 8] Vane J.
X     Nature 348:673-673(1990).
X[ 9] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
X     Biochimie 73:109-120(1991).
X[10] Gantz I., Schaffer M., Delvalle J., Logsdon C., Campbell V., Uhler M.,
X     Yamada T.
X     Proc. Natl. Acad. Sci. U.S.A. 88:429-433(1991).
X[11] Buck L., Axel R.
X     Cell 65:175-187(1991).
X[12] Honda Z.-I., Nakamura M., Miki I., Minami M., Watanabe T., Seyama Y.,
X     Okado H., Toh H., Ito K., Miyamoto T., Shimizu T.
X     Nature 349:342-346(1991).
X[13] Hirata M., Hayashi Y., Ushikubi F., Yokota Y., Kageyama R., Nakanishi S.,
X     Narumiya S.
X     Nature 349:617-620(1991).
X[14] Applebury M.L., Hargrave P.A.
X     Vision Res. 26:1881-1895(1986).
X[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
X     Gene 98:153-159(1991).
X{END}
X{PDOC00211}
X{PS00238; OPSIN}
X{BEGIN}
X*************************************************
X* Visual pigments (opsins) retinal binding site *
X*************************************************
X
XVisual pigments [1,2] are the light-absorbing  molecules that  mediate vision.
XThey consist of  an apoprotein, opsin,  covalently  linked  to the chromophore
Xcis-retinal.  Vision is  effected trough  the  absorption of a  photon by cis-
Xretinal  which is isomerized to  trans-retinal.  This isomerization leads to a
Xchange  of conformation  of the protein. Opsins are integral membrane proteins
Xwith seven  transmembrane  regions.  The  attachment  site  for  retinal  is a
Xconserved lysine  residue in the middle of the seventh transmembrane helix.
X
XIn vertebrates there are four different pigments.   Cone cells, which function
Xin bright light, are responsible for color vision and contain  the three color
Xpigments (red, blue and green).  Rod cells, which mediate vision in dim light,
Xcontain the fourth pigment, rhodopsin.   The sequence of all  three human cone
Xcolor opsins is known, and  rhodopsin has been sequenced in mammals as well as
Xin various other species such as chicken, octopus, or lamprey [3].
X
XIn Drosophila, the  eye   is composed   of 800   facets  or   ommatidia.  Each
Xommatidium contains eight photoreceptor cells (R1-R8):  the R1 to R6 cells are
Xouter cells,  while R7 and  R8 are  inner  cells.  Each  of the three types of
Xcells (R1-R6, R7 and R8) expresses a specific opsin.
X
XWe developed a  pattern  which  includes  the retinal binding lysine and which
Xallows the specific detection of opsins.
X
X-Consensus pattern: K-x(5)-[DN]-P-x-[IV]-Y-x(6)-[FY]
X                    [K is the retinal binding site]
X-Sequences known to belong to this class detected by the pattern: ALL.
X-Other sequence(s) detected in SWISS-PROT: NONE.
X-Last update: December 1991 / Text revised.
X
X[ 1] Applebury M.L., Hargrave P.A.
X     Vision Res. 26:1881-1895(1986).
X[ 2] Fryxell K.J., Meyerowitz E.M.
X     J. Mol. Evol. 33:367-378(1991).
X[ 3] Hisatomi O., Iwasa T., Tokunaga F., Yasui A.
X     Biochem. Biophys. Res. Commun. 174:1125-1132(1991).
X{END}
END_OF_FILE
if test 15369 -ne `wc -c <'ProSearch2.0/test.out'`; then
    echo shar: \"'ProSearch2.0/test.out'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/test.out'
fi
if test -f 'ProSearch2.0/test.pep' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ProSearch2.0/test.pep'\"
else
echo shar: Extracting \"'ProSearch2.0/test.pep'\" \(375 characters\)
sed "s/^X//" >'ProSearch2.0/test.pep' <<'END_OF_FILE'
X> bov.ops	348 bases
XMNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIML
XGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLH
XGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGE
XNHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNN
XESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEV
XTRMVIIMVIAFLICWLPYAGVAFYIFTHQGFDFGPIFMTIPAFFAKTSAV
XYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
END_OF_FILE
if test 375 -ne `wc -c <'ProSearch2.0/test.pep'`; then
    echo shar: \"'ProSearch2.0/test.pep'\" unpacked with wrong size!
fi
# end of 'ProSearch2.0/test.pep'
fi
echo shar: End of shell archive.
exit 0
--
Frank Kolakowski

=======================================================================
O Email: lfk@eastman1.mit.edu or kolakowski@helix.mgh.harvard.edu     O
O US Mail: Lee F. Kolakowski        Endocrine Unit                    O
O Massachusetts General Hospital    Wellman 5                         O
O Boston, MA 02114                  Phone AT&T:  1-617-726-3966       O
=======================================================================

