SRS-FastA Subset Search

This service matches your query sequence against subsets of nucleic acid and protein databanks. You choose these subsets by keyword selection on the sequence documentation.

This is an experimental service.

Your feedback is essential to improving this service. Please send your suggestions and comments to Don Gilbert.


About SRS-FastA Subset Searches

There may be times when you will get better information by eliminating unwanted sections of the databanks before performing a sequence search. Because the biosequence databanks are large and constantly updated, it is difficult to produce subsets of these data directly for similarity searching.

However, one can do this if the search software is tuned to search only certain indexed entries in the databank. By coupling similarity search software with keyword selection software, one can interactively produce subsets based on various criteria, then search those subsets for similarity.

Obvious examples of subset selection are to pick all entries from one species group, e.g. Drosophila or Arabidopsis, and search only that group's sequences. You may also select any group based on other fields in the sequence documentation, including accession, date, definition, keywords, organism, authors, title, reference, comment, features, and sequence length. Please suggest subsets that would be frequently used. These can be put into a predefined subset section, for quicker and simpler use.
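As an illustration of field-based subset selection, here is a minimal Python sketch. It is not how SRS itself works: the Entry record, the select function, and the accession values are all hypothetical, and only a few of the fields listed above are modelled.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for the documentation of one databank entry."""
    accession: str
    organism: str
    keywords: tuple
    length: int


def select(entries, organism=None, keyword=None, min_length=0):
    """Return the accessions of entries that match every given criterion."""
    chosen = []
    for e in entries:
        # Case-insensitive substring match on the organism field.
        if organism is not None and organism.lower() not in e.organism.lower():
            continue
        # Case-insensitive exact match against any one keyword.
        if keyword is not None and keyword.lower() not in (k.lower() for k in e.keywords):
            continue
        if e.length < min_length:
            continue
        chosen.append(e.accession)
    return chosen


# Made-up entries, for illustration only.
entries = [
    Entry("ACC0001", "Drosophila melanogaster", ("kinase",), 1200),
    Entry("ACC0002", "Arabidopsis thaliana", ("kinase", "membrane"), 800),
    Entry("ACC0003", "Drosophila simulans", ("homeobox",), 300),
]

drosophila = select(entries, organism="drosophila")
long_kinases = select(entries, keyword="kinase", min_length=1000)
```

The resulting list of accessions is the kind of subset that would then be handed to the similarity search, so that only those entries are compared with the query.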

The two software components used in this service are SRS, for keyword selection of databank entries, and FastA, for sequence similarity searching.

Other software could be adapted for the same task: BLAST for sequence similarity, and WAIS or IRX for subset selection, would be possible choices. The main modifications needed for this service are to (a) teach SRS to produce a pair of files that index the databanks for a subset selected from an SRS query, and (b) teach FastA to read the subset indices as a new form of sequence library.

NOTICE. The sequence similarity searches, using FastA software, require significant computer processor time. This service currently runs on a one-processor computer that must also provide several other services. Consequently, this test service may be unavailable for short periods if more than one or two people use it at the same time. If you get a message that the server is too busy at the moment, you may be able to resubmit your search in 5 minutes or so.

Other biocomputing centers are encouraged to copy this set-up and provide this service, locally or over the Internet. If this becomes a useful service, the IUBio server is unlikely to be able to handle heavy demand. Please contact Don Gilbert, who will be happy to help others set up a similar service.

References

Home of FastA software for rapid sequence similarity searches.

FastA references:

Home of Sequence Retrieval System (SRS) software for selecting databank records based on keywords.

SRS References:

Modifications to FastA source for searching subsets of sequence databanks are at IUBio Archive, in the molbio/search/subsets/ folder.

Modifications to SRS source for creating subset lists used by FastA are at IUBio Archive, in the molbio/search/subsets/ folder.

               -- Don Gilbert 
               September 1995