From BLAZE-MAIL@ren-n-stimpy.ig.com Mon May 18 10:58:45 1992
Received: from ren-n-stimpy (ren-n-stimpy.ig.com)
        by sunflower.bio.indiana.edu (4.1/9.5jsm)
        id AA24197; Mon, 18 May 92 10:58:39 EST
Received: by ren-n-stimpy (5.57/IG-2.0)
        id AA11679; Mon, 18 May 92 08:57:37 -0700
Date: Mon, 18 May 92 08:57:37 -0700
Message-Id: <9205181557.AA11679@ren-n-stimpy>
From: Genbank BLAZE E-mail Service
Subject: Have some help
To: Don Gilbert
Status: R

BLAZE Server Help

GenBank now offers a preliminary version of BLAZE, a massively parallel
sequence similarity search program from IntelliGenetics and MasPar
Computer Corporation.

BLAZE - full version
--------------------

The full version of BLAZE - currently under development - searches DNA
and protein databases for sequence similarity and reports the alignment
score, percentage of match, and statistical significance of each
database sequence. Queries can include stop codons and IUPAC codes.
BLAZE also can report the optimal Smith-Waterman local alignment of your
query with a database sequence by showing the number of matches, the
number of gaps, and the number of mismatches. BLAZE can also display the
annotations.

Standard supported databases are GenBank, Swiss-Prot, GENESEQ,
VectorBank, PIR and HIV. User-developed databases are also supported in
the FastA (Pearson) or IntelliGenetics formats.

BLAZE uses the full dynamic programming algorithm of Smith and Waterman
for maximum sensitivity. Every character in the query is compared with
every character in the database. At every position, BLAZE considers
every possible insertion or deletion gap of any length. Approximate
methods employing word matching are not used.

BLAZE - preliminary version
---------------------------

The preliminary version now being offered contains important limitations
to the functionality described above. Here are the current limitations:

   1) The dynamic programming algorithm currently uses a fixed penalty
      for indels (insertion or extension of gaps).
      (The full version uses affine gaps.)
   2) The local alignments are not displayed.
   3) The only supported database is Swiss-Prot.

BLAZE - anticipated schedule
----------------------------

The full version of BLAZE is expected to be released in the 4th week of
July.

BLAZE - performance
-------------------

Executing on a MasPar MP-1 massively parallel computer, BLAZE can search
sequence databases 1000 times faster than on a workstation, comparing
tens of millions of residues per second. The 4,096-processor MP1104
system running BLAZE searches Swiss-Prot 21 in 30 seconds with a query
of 100 amino acids.

Access to BLAZE
---------------

You can access the GenBank BLAZE Server through a number of different
networks, including Internet, BITNET, EARN, NETNORTH and JANET.

The BLAZE program allows you to send a specially formatted mail message
containing the protein query sequence to the BLAZE Server at GenBank. A
BLAZE sequence similarity search is then performed using the full
Smith-Waterman dynamic programming algorithm.

If you use BLAZE as a research tool, we ask that the IntelliGenetics
BLAZE software be acknowledged in any resulting publication or public
presentation. If you find errors, inconsistencies or other problems
using the BLAZE software, we ask that you notify IntelliGenetics or
GenBank via electronic mail.

To access the program, send an electronic mail message containing the
formatted query sequence (as described below) to the following Internet
address:

     BLAZE@GENBANK.BIO.NET

If you are not on the Internet, you may need to change the format of the
address. Consult your systems manager to determine the correct address.

Obtaining Help
--------------

If you would like to receive instructions on using the BLAZE program,
send a mail message to the address above containing the word "Help" on a
single line of the mail message. Leave the Subject line in the mail
header blank.
The help text will be updated when new information is available for
BLAZE searches (such as new databases on-line). For additional help on
using BLAZE, send an electronic mail message to the address:

     BLAZE-REQ@GENBANK.BIO.NET

Formatting a Query
------------------

Queries consist of a mail message with search parameters identifying the
database to be searched, values related to the search and the query
sequence to be used in the search. The mail message has two mandatory
lines, five optional lines and a line identifying the query sequence as
described below. All lines must be 80 characters or less or they will be
truncated. These lines are typed into the body of the mail message in
the order shown below. The keywords GAPPEN, SCORES and PERCENT are each
followed by a number, and RANGE by two numbers, as in SCORES 50 or
RANGE 1 100.

Search Parameter      Mandatory  Explanation

DATALIB Swiss-Prot    Yes        This line specifies the database to be
                                 searched and must be included in the
                                 message. The only database available in
                                 this release of the software is
                                 Swiss-Prot.

One of:
MATRIX PAM50          No         This line specifies the Dayhoff scoring
MATRIX PAM100                    matrix to use. The default is PAM250.
MATRIX PAM150
MATRIX PAM200
MATRIX PAM250

GAPPEN                No         This line specifies the penalty for an
                                 indel, whether it creates a gap or
                                 extends one. The default is 6, which is
                                 the average score of a match with
                                 PAM250.

SCORES                No         This line specifies the number of
                                 best-ranked sequences to be listed in
                                 the results. The default is 100.

PERCENT               No         This line specifies the minimum percent
                                 match to the query required to be
                                 listed in the results. The percent
                                 match is the ratio of the score to the
                                 maximum (self-comparison) score.
                                 PERCENT has no default; when it is
                                 omitted, only the SCORES limit (100 by
                                 default) applies.

                                 If both SCORES and PERCENT are given,
                                 the number of sequences returned is the
                                 smaller of the two counts that each
                                 line would yield on its own.

RANGE                 No         This line specifies a range of amino
                                 acids to use as the query sequence. The
                                 two numbers are the lower and upper
                                 bounds. The entire sequence is used as
                                 the default.

BEGIN                 Yes        This line must be included in the
                                 message. No other information is typed
                                 on it.
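
The parameter lines in the table above can also be assembled by a small
program. The following is a minimal Python sketch and not part of BLAZE:
the function name blaze_parameter_lines and its keyword arguments are
hypothetical, and the 79-character limit is an assumption made because
this help text gives the limit both as "80 characters or less" and as
"less than 80 characters".

```python
# Hypothetical helper, not part of BLAZE: builds the parameter lines of a
# request in the order given in the table above.

ALLOWED_MATRICES = ("PAM50", "PAM100", "PAM150", "PAM200", "PAM250")

# The help text says both "80 characters or less" and "less than 80";
# this sketch assumes the stricter reading.
MAX_LINE_LENGTH = 79


def blaze_parameter_lines(matrix=None, gappen=None, scores=None,
                          percent=None, seq_range=None):
    """Return the request lines from DATALIB through BEGIN as a list."""
    lines = ["DATALIB Swiss-Prot"]  # mandatory; the only database offered
    if matrix is not None:
        name = matrix.upper()
        if name not in ALLOWED_MATRICES:
            raise ValueError(f"unsupported matrix: {matrix!r}")
        lines.append(f"MATRIX {name}")
    if gappen is not None:
        lines.append(f"GAPPEN {gappen}")
    if scores is not None:
        lines.append(f"SCORES {scores}")
    if percent is not None:
        lines.append(f"PERCENT {percent}")
    if seq_range is not None:
        lower, upper = seq_range
        if lower > upper:
            raise ValueError("RANGE lower bound exceeds upper bound")
        lines.append(f"RANGE {lower} {upper}")
    lines.append("BEGIN")  # mandatory; nothing else is typed on this line
    for line in lines:
        if len(line) > MAX_LINE_LENGTH:
            raise ValueError(f"line would be truncated: {line!r}")
    return lines


if __name__ == "__main__":
    print("\n".join(blaze_parameter_lines(matrix="PAM200", scores=50,
                                          percent=25, seq_range=(1, 100))))
```

Omitted arguments produce no line at all, so the server defaults
described in the table apply. The query sequence still has to be
appended after the BEGIN line.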

The remainder of the message contains the query sequence in either FastA
(Pearson) format or in IntelliGenetics format.

FastA Format: The first line begins with '>' (mandatory) followed by the
sequence name (mandatory), white space, and an optional description. The
sequence data begins on the next line. For example:

>AGREP4 Monkey SV40-like genomic segment promoting transcription.
ccccttcaaatctattacaaggtgagcgtctcgccaaggcaatgaaatcgcaatatgatg
tttccatttactttggattatacgtcattataaa

IG Format: Optional comment lines begin with ';'. The first line not
beginning with ';' has the sequence name only (mandatory). The sequence
data begins on the next line. The last character must be a '1' (linear)
or a '2' (circular). For example:

; Monkey SV40-like genomic segment promoting transcription.
AGREP4
ccccttcaaatctattacaaggtgagcgtctcgccaaggcaatgaaatcgcaatatgatg
tttccatttactttggattatacgtcattataaa1

In either format, white space characters (including newlines) within the
sequence data are ignored.

Sending the Query Sequence
--------------------------

Use your local mail program to send GenBank your query sequence. Most
mail programs allow you to import a file into the mail message. You can
import your sequence file into the mail message on the line after
"BEGIN".

Please follow the format in the following example of a BLAZE request
PRECISELY, but note that the program is case-insensitive, i.e. either
upper or lower case letters may be used.

This is an example of a mail message sent for a BLAZE search. Note that
the first four lines are a mail header that is automatically created
when you address a mail message. Nothing need be entered for the
Subject. Each line of information must be less than 80 characters in
length. Longer lines will be truncated.

From: drbob@someaddress.somewhere.edu Tue Jun 14 21:36:38 1988
Date: 14 Jan 1992 2129:02-PDT
To: blaze@genbank.bio.net
Subject:

The text that you enter into the body of the message begins with DATALIB
(do not add blank lines in the message):

DATALIB Swiss-Prot
MATRIX PAM200
SCORES 50
PERCENT 25
RANGE 1 100
BEGIN
>METR_SALTY MetR activatory protein.
mieikhlktlqalrnsgslaaaaavlhqtqsalshqfsdleqrlgfrlfvrksqplrftpqgevllqlan
qvlpqisralqacnepqqtrlriaiechsciqwltpalenfraswpqvemdftsgvtfdpqpalqqgeld
lvmtsdilprselhyspmfdfevrlvlapdhplasktqitpedlasetlliypvqrsrldvwrhflqpag
ispllksvdntllliqmvaarmgiaalphwvvesverqglvvtktlgdglwsrlyaavrdatsvrr

The sequence is then sent to the BLAZE Server at GenBank. Once your
message is received, it is placed in a batch queue and processed in the
order it is received.

Handling the Results of a BLAZE Search
--------------------------------------

When the results are returned, use your local mail program to retrieve
them. You can transfer the results of a BLAZE search to a separate disk
file to free up space in your mail directory. Consult the documentation
for your local mail program for the commands to transfer and read mail.

If you wish to obtain sequences of interest, use the e-mail retrieval
server mentioned below or the IRX searching system available through the
GenBank On-line Service. Contact GenBank for details (415-962-7364).

The mail message returned after the BLAZE search will contain:

   - The program banner.

   - All input parameters, including the default values used for lines
     that were unspecified in the input message.

   - The statistics of the search, including the total number of
     residues in the database searched, the number of sequences
     searched, the processing time for the search, and the number (in
     millions) of operations per second.

   - The names, lengths, scores and percent match (ratio of the score to
     the maximum score) of the best-ranking sequences.
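
The percent match in the last item is simple arithmetic. The following
is a minimal Python sketch, not BLAZE code: the function name
percent_match is hypothetical, and rounding to the nearest whole percent
is an assumption (the help text does not say how the server rounds),
although it agrees with the scores and percentages in the sample results
in this help text.

```python
def percent_match(score, self_score):
    """Return score as a whole-number percentage of the self-comparison score."""
    if self_score <= 0:
        raise ValueError("self-comparison score must be positive")
    # Rounding to the nearest whole percent is an assumption of this sketch.
    return round(100 * score / self_score)


if __name__ == "__main__":
    self_score = 387  # score of the query against itself in the sample results
    for name, score in [("METR_ECOLI", 375), ("AMPR_ENTCL", 106),
                        ("CATM_ACICA", 95)]:
        print(name, score, percent_match(score, self_score))
```

With the sample scores, 375 out of 387 gives 97 and 106 out of 387
gives 27, as listed in the sample results.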

Example of a returned message:

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=
Here are your search results from the BLAZE e-mail server.

Database versions currently in use BLAZE searches.

     Database      Version Used
     --------      ------------
     SWISS-PROT    21

Please report suspected bugs to us at blaze-req@rns.ig.com
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=

============================                  ==========================
                              B L A Z E (tm)
=======
=======   A High-Performance High-Sensitivity Biological
=======   Sequence Similarity Searching Program
=======   Utilizing a Massively Parallel Implementation
=======   of the Dynamic Programming Algorithm of
=======   Smith and Waterman
=======
========================================================================
========================= Pre-Release 0.5 - May 1992 ===================

Copyright (c) 1992 by IntelliGenetics, Inc. and MasPar Computer Corporation

********************************************************************
*                                                                  *
* Please note the following important limitations of this release: *
*                                                                  *
*  1) The dynamic programming algorithm currently uses a fixed     *
*     penalty for indels (insertion or extension of gaps)          *
*  2) The local alignments are not displayed.                      *
*  3) The only supported database is Swiss-Prot.                   *
*                                                                  *
********************************************************************

INPUT PARAMETERS

DATALIB SWISS-PROT
MATRIX PAM200
GAPPEN 6.0
SCORES 50
PERCENT 25

>METR_SALTY MetR activatory protein.
mieikhlktlqalrnsgslaaaaavlhqtqsalshqfsdleqrlgfrlfvrksqplrftpqgevllqlan
qvlpqisralqacnepqqtrlriaiechsciqwltpalenfraswpqvemdftsgvtfdpqpalqqgeld
lvmtsdilprselhyspmfdfevrlvlapdhplasktqitpedlasetlliypvqrsrldvwrhflqpag
ispllksvdntllliqmvaarmgiaalphwvvesverqglvvtktlgdglwsrlyaavrdatsvrr

SEARCH STATISTICS

Query sequence length:         100
Number of sequences searched:  23742
Number of residues:            7866594
Time:                          00:00:14.439
Mops:                          54.482

    Sequence Name Description                              Length  Score %Match
-------------------------------------------------------------------------------
 1. METR_SALTY    METR ACTIVATORY PROTEIN.                    276    387    100
 2. METR_ECOLI    METR ACTIVATORY PROTEIN.                    317    375     97
 3. AMPR_ENTCL    AMPR ACTIVATORY PROTEIN.                    290    106     27
 4. AMPR_CITFR    AMPR ACTIVATORY PROTEIN.                    290    106     27
 5. CATR_PSEPU    CATBC OPERON REGULATORY PROTEIN.            289    106     27
 6. TFDS_ALCEU    PROBABLE ACTIVATOR PROTEIN (FRAGMENT).      180    105     27
 7. AMPR_RHOCA    AMPR ACTIVATORY PROTEIN.                    289    102     26
 8. LYSR_ECOLI    LYSA ACTIVATORY PROTEIN.                    311    100     26
 9. ILVY_ECOLI    ILVY ACTIVATORY PROTEIN.                    297    100     26
10. YFEB_ECOLI    HYPOTHETICAL 33.6 KD PROTEIN IN GLTY-ALA    294     97     25
11. GLTC_BACSU    REGULATORY PROTEIN GLTC.                    306     97     25
12. CATM_ACICA    PROBABLE CAT OPERON REPRESSOR PROTEIN.      251     95     25

Retrieving DataBank Entries found with BLAZE
--------------------------------------------

Database entries can be retrieved by either locus name or accession
number. To use the GenBank Retrieval System, send an electronic message
to RETRIEVE@GENBANK.BIO.NET. Leave the Subject: line blank. The text of
the message contains accession numbers, entry names, or both, one per
line.

Obtaining BLAZE
---------------

The BLAZE program can be purchased. For information on equipment and
network configuration, please contact:

     IntelliGenetics
     700 E. El Camino Real
     Mountain View, CA 94040

or:

     MasPar Computer Corporation
     749 North Mary Avenue
     Sunnyvale, CA 94086

End of BLAZE Server Help