Directories of Bio-data

These directories are based on LDAP, providing computable, networked search and retrieval of bio-data objects (as opposed to flat-file databanks or web pages). This test sample includes an experimental LDAP-SRS gateway for querying and retrieving bulk biodata via the Lightweight Directory Access Protocol (LDAP).

LDAP has several features favorable for data-grid computing: efficiency with large numbers and volumes of objects; a public, standard query syntax based on a shared schema; and mature protocols and software systems that include security, binary-encoded transport, and flexibility in employing multiple back-end data servers (RDBMS, Berkeley-DB, Perl and shell tools). New back ends, such as the bioinformatics data search standard SRS, can be added with relative ease.

The software here is experimental and a work in progress (July 2002).

Experimental Web access to Bio-directories via LDAP and WebServices, Oct. 2002. The software behind this is all collected in iubio-srs.tar.gz (note: this may be offline at times).

The simple client programs, ldapsearch.java and ldapsearch.pl (Perl), should be usable from most computer systems, and are straightforward programming examples of using LDAP to search and retrieve data. They include test examples for IUBio Archive's LDAP bio-data services (which may change).
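For readers without the archive at hand, the following is a minimal sketch of what an LDAP search client of this kind can look like in Java, using the standard JNDI classes. It is not the ldapsearch.java distributed here. The class name BioLdapSearch, its helper methods, and the command-line arguments (server URL, base DN, attribute name, value) are all hypothetical placeholders; the real host, base DN and attribute names of the IUBio services are not stated in this text and must be supplied by the user.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical illustration only; not the ldapsearch.java from iubio-srs.tar.gz.
final class BioLdapSearch {

    // Escape characters that are special inside an LDAP search filter (RFC 4515).
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a simple equality filter such as (attribute=value).
    static String equalityFilter(String attribute, String value) {
        return "(" + attribute + "=" + escapeFilterValue(value) + ")";
    }

    // Run a subtree search, print each matching entry, and return the number of entries.
    static int search(String url, String baseDn, String filter) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url); // assumption: anonymous access is allowed
        DirContext ctx = new InitialDirContext(env);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, controls);
            int count = 0;
            while (results.hasMore()) {
                SearchResult entry = results.next();
                System.out.println(entry.getNameInNamespace());
                System.out.println("  " + entry.getAttributes());
                count++;
            }
            return count;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        if (args.length < 4) {
            System.out.println("usage: java BioLdapSearch <ldap-url> <base-dn> <attribute> <value>");
            return;
        }
        String filter = equalityFilter(args[2], args[3]);
        int found = search(args[0], args[1], filter);
        System.out.println(found + " entries matched " + filter);
    }
}
```

The search itself is a single call to DirContext.search; most of the remaining code is connection setup and filter escaping, which is part of what makes LDAP clients short to write.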

The srs-gnoinf-dirs-talk, MS Powerpoint or Portable Doc, is a slide show presented at ISMB 2002 (Edmonton AB) for the SRS User group meeting. It touches on use of SRS with genome information systems like FlyBase and euGenes, and for grid data directory systems.

The iubio-srs.tar.gz tar file is a collection of new software for testing SRS linked to an OpenLDAP server as a back-end search system for bio-data. It seems to work well in preliminary tests. Tests with Web-XML (SOAP) are also underway to compare their usefulness (Oct. 2002; so far there are SOAP memory and efficiency problems with large amounts of data). See the srsldap-speed3.pdf chart of the efficiencies of various methods (LDAP, SOAP, Wgetz, FTP) for biodata directory search and retrieval. (An older chart is in srsldap-speed2.pdf.)

Tests have focused mainly on the efficiency of this approach for distributed (grid) computing. The general idea is that, for useful grid computing with biodata, one needs to be able to select and move quantities on the order of 100K - 1000K records, 1 GB - 10 GB in size, to many computers quickly enough to make this better than running the computation on one central server.
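The figures above imply records of roughly 10 KB each (1 GB over 100K records, or 10 GB over 1000K records, in decimal units). The sketch below turns that reasoning into a back-of-envelope estimate. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the link speed is a parameter the reader must choose, and the break-even rule (transfer time plus an even split of the compute time must beat the central compute time) is a deliberate simplification that ignores query time, protocol overhead and uneven workloads. It is not a substitute for the measured results in the speed charts.

```java
// Hypothetical back-of-envelope model; not part of the iubio-srs software.
final class GridTransferEstimate {

    // Average record size in bytes for a given data volume and record count.
    static double bytesPerRecord(long totalBytes, long records) {
        return (double) totalBytes / records;
    }

    // Seconds needed to move totalBytes over a link of the given speed (decimal megabits per second).
    static double transferSeconds(long totalBytes, double megabitsPerSecond) {
        return (totalBytes * 8.0) / (megabitsPerSecond * 1_000_000.0);
    }

    // Simplified break-even test: assumes workers receive data in parallel, so
    // transferSeconds is wall-clock time, and that compute time divides evenly.
    static boolean worthDistributing(double transferSeconds, double centralSeconds, int workers) {
        return transferSeconds + centralSeconds / workers < centralSeconds;
    }

    public static void main(String[] args) {
        long oneGB = 1_000_000_000L;          // decimal gigabyte
        double linkMbps = 100.0;              // assumed link speed, adjust as needed
        double centralSeconds = 3600.0;       // assumed compute time on one central server
        int workers = 10;                     // assumed number of grid nodes

        double perRecord = bytesPerRecord(oneGB, 100_000L);
        double moveTime = transferSeconds(oneGB, linkMbps);
        System.out.println("bytes per record: " + perRecord);
        System.out.println("transfer seconds: " + moveTime);
        System.out.println("worth distributing: "
                + worthDistributing(moveTime, centralSeconds, workers));
    }
}
```

Under this simple model, the retrieval method matters most when the computation per record is short, since transfer time then dominates the total.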

Though work remains to make this widely usable, those who are interested in testing SRS over LDAP can use it as a starting point. The source code in iubio-srs.tar consists primarily of:

For further information, please contact Don Gilbert (gilbertd@bio.indiana.edu).