
9.1. Locating partial sequences

FINDPATTERNS locates short sequence patterns, or 'motifs'. Patterns are specified either by typing them in at the keyboard or by listing them in a file called PATTERN.DAT.

$ Findpatterns

FINDPATTERNS in what sequence(s) ? Platelet.seq
.
.
Pattern 1: GGAGGA
Pattern 2: TTCTTC
Pattern 3:

What should I call the output file (* Platelet.Find *) ?

Submit Findpatterns job to which batch queue (* LONG *) ? Short

Examine the file PLATELET.FIND.

When searching for several different patterns, it may be easier to give a file of patterns. FINDPATTERNS recognises the "local data file" name of PATTERN.DAT. An example can be fetched:

$ FETCH pattern.dat

$ Type pattern.dat

Name      Offset  Pattern             Overhang  Documentation ..

BamHI     1       GGATCC              0         !
EcoRI     1       GAATTC              0         !
Promotor  1       TAATA(N){20,30}ATG  0         !

These files are similar to the enzyme data files used by the mapping programs, and FINDPATTERNS can even read an enzyme data file directly.
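
The pattern column in the listing above reads much like a regular expression: in the Promotor entry, (N){20,30} appears to stand for 20 to 30 bases of any kind. The following Python sketch illustrates that reading. It is not part of FINDPATTERNS; the helper names parse_pattern_line and to_regex are hypothetical, the columns are assumed to be separated by whitespace, and only the ambiguity symbol N is handled. The test sequence is made up.

```python
import re

def parse_pattern_line(line):
    """Split one data line into name, offset, pattern and overhang.

    Assumes whitespace-separated columns, with anything after '!' being
    free-text documentation (an assumption based on the listing above).
    """
    fields, _, doc = line.partition("!")
    name, offset, pattern, overhang = fields.split()
    return {"name": name, "offset": int(offset), "pattern": pattern,
            "overhang": int(overhang), "doc": doc.strip()}

def to_regex(pattern):
    """Translate a pattern into a compiled Python regular expression.

    Only the ambiguity symbol N (any base) is handled here; repeat counts
    such as {20,30} already use regex syntax and pass through unchanged.
    """
    return re.compile(pattern.upper().replace("N", "[ACGT]"))

entry = parse_pattern_line("Promotor 1 TAATA(N){20,30}ATG 0 !")
regex = to_regex(entry["pattern"])

# Made-up sequence: TAATA, then 22 arbitrary bases, then ATG.
seq = "CC" + "TAATA" + "GC" * 11 + "ATG" + "TT"
match = regex.search(seq)

# Report the entry name, the 1-based start position and the matched text.
print(entry["name"], match.start() + 1, match.group())
```

The sketch searches one strand only and ignores the Offset and Overhang columns, which matter mainly for enzyme data.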

Edit the file to add a real or imaginary pattern of your own. This time, allow a single mismatch in any pattern by using the /mismatch qualifier:

$ Findpatterns/mismatch=1

FINDPATTERNS in what sequence(s) ? @myoglobin.strings

Search patterns read from "Pattern.Dat"

What should I call the output file (* FindPatterns.Find *) ? myoglobin.find

Submit Findpatterns job to which batch queue (* LONG *) ? Short

The file MYOGLOBIN.STRINGS was created using the STRINGSEARCH program, which identified all entries with "myoglobin" in the description line. MYOGLOBIN.FIND shows all the occurrences of the patterns (with up to one mismatch) in the sequences named in MYOGLOBIN.STRINGS.
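
To make "up to one mismatch" concrete, here is a minimal Python sketch of a mismatch-tolerant search. It is an illustration only and says nothing about how FINDPATTERNS is implemented: the function name find_with_mismatches is hypothetical, the sequence is made up, and the sketch handles plain literal patterns on a single strand.

```python
def find_with_mismatches(seq, pattern, max_mismatch=1):
    """Return (1-based position, matched text, mismatch count) tuples.

    Slides the pattern along the sequence and counts the positions at
    which the two differ; a window is reported when that count does not
    exceed max_mismatch.
    """
    seq, pattern = seq.upper(), pattern.upper()
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatch:
            hits.append((i + 1, window, mismatches))
    return hits

# Made-up sequence containing one exact and one near match of GGAGGA.
seq = "AAGGAGGATTGGTGGACC"
for hit in find_with_mismatches(seq, "GGAGGA", max_mismatch=1):
    print(hit)
```

Setting max_mismatch=0 reduces this to an exact search, which corresponds to running the program without /mismatch.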

