Protein profiles can be used to scan the nucleotide databases using the EGCG program TPROFILESEARCH. The program TPROFILEGAP allows the comparison of a nucleotide sequence to a profile.
(T)PROFILESEARCH is slow, and each program can search only a limited number of entries: 30,000 for PROFILESEARCH and 80,000 for TPROFILESEARCH. The GenBank and EMBL databases already exceed 100,000 entries, so you should restrict searches to a single division, or to a list of combined divisions, whose total does not exceed the limit. You can check the division totals by typing the database .NAM files:
$Type EMBLDIR:EMBL.NAM
$Type GENBANKDIR:GENBANK.NAM
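Once you have the division totals, checking a combination against the limits is simple arithmetic. The following is a minimal Python sketch of that check, not part of GCG. The function name fits_limit is hypothetical, the entry counts are made-up placeholders (take the real ones from the .NAM file), and it assumes a total exactly equal to the limit is still acceptable.

```python
# Entry limits quoted in the text above.
ENTRY_LIMITS = {"PROFILESEARCH": 30_000, "TPROFILESEARCH": 80_000}


def fits_limit(program, division_totals, divisions):
    """Return (total, ok) for the combined entry count of the chosen divisions.

    Assumption: a total equal to the limit is treated as acceptable.
    """
    limit = ENTRY_LIMITS[program.upper()]
    total = sum(division_totals[name] for name in divisions)
    return total, total <= limit


# Placeholder counts for illustration only; read the real ones from the .NAM file.
example_totals = {"EM_Ba": 20_000, "EM_Ph": 1_500, "EM_Vi": 15_000}

total, ok = fits_limit("TPROFILESEARCH", example_totals, ["EM_Ba", "EM_Ph", "EM_Vi"])
print(total, "entries;", "within limit" if ok else "over limit")
```

With these placeholder counts the combined total is 36,500 entries, which would fit the TPROFILESEARCH limit but not the PROFILESEARCH limit.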
A file of sequence names (FOSN) which defines the prokaryote, bacteriophage and virus divisions in EMBL would contain:
EM_Ba:*
EM_Ph:*
EM_Vi:*
If the file is called EMBUGS.FIL, the three divisions would be searched by answering the sequence prompt with the file name preceded by @:
Search for query in what sequence(s) (* GenEmbl:* ) ? @Embugs.fil
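If you build such lists often, the file can also be generated by a script. The following is a minimal Python sketch, not part of GCG. The function name write_fosn is hypothetical, and it assumes the file needs only the one-line-per-division entries shown above.

```python
from pathlib import Path


def write_fosn(path, divisions):
    """Write one 'division:*' line per division and return the file text."""
    text = "".join(f"{name}:*\n" for name in divisions)
    Path(path).write_text(text)
    return text


# Recreate the EMBUGS.FIL example from the text.
fosn_text = write_fosn("EMBUGS.FIL", ["EM_Ba", "EM_Ph", "EM_Vi"])
print(fosn_text, end="")
```

The resulting file is then given to the program as @Embugs.fil at the sequence prompt.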
PROFILESEARCH examines only the first 100,000 symbols of any sequence. (The limit for all other GCG programs is 350,000.)