Experimental Biodata Directory Systems
October 2002

Experiments in developing practical, widely usable bio-data access for the BioGrid
Further information, sources and test components used here are available at http://iubio.bio.indiana.edu/biogrid/directories/
Test Cases
This is a test of new automated methods for bio-data search and retrieval systems. The goal is to find practical methods that are suited to high-volume data search and retrieval, that provide standard programming interfaces for wide use of automated bio-data access, and that fit data grid and related distributed computing for bioinformatics.

This work is based on these components:


Why Java?

This work is now strongly focused on Java components. Why? The methods being tested are all multi-language, multi-source technologies with wide software industry support, including open-source and commercial implementations in all standard programming languages. This is a necessary basis for a bioinformatics technology designed for wide acceptance and deployment. The author is testing these methods using C, C++, Perl and Java.

Java currently provides the best common ground for testing these methods, and especially for easily deploying the tests to new service centers. It offers a wide range of freely available libraries for network services, XML, Web and other needs, and it is simple to package for deployment (with some very carefully chosen exceptions, the tests will run without compiling any platform-specific code on the range of Solaris, Linux, MacOSX and other Unix systems).

Perl is a very attractive language for bioinformatics and bio-data access. However, tests so far using standard Perl network and related libraries have fallen short of the performance of Java, and have often blown up in high data volume tests (as a few Java tests have). The proportion of compiled, platform-specific additions is also higher for Perl than for Java, making packaging and deployment on multiple systems more difficult.

The C-based OpenLDAP server has provided the most efficient method so far for bio-directory access. But its installation and use require more effort, and modifying it to use bioinformatics data access backends takes more work. For a production system, it may be the best choice. The JavaLDAP package comes close (about 1/2x) to that performance, while being much simpler to deploy and develop with.
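
As an illustration of what client access to such an LDAP bio-directory might look like from Java, here is a minimal sketch using the standard JNDI classes (javax.naming). It is not taken from the test components themselves. The class name BioDirectoryQuery, the attribute "cn", the sample value, and any server URL or base DN passed on the command line are hypothetical placeholders; the actual directory schema is not described here. The sketch builds a search filter with the special characters escaped as RFC 4515 requires, and contacts a server only when a URL and base DN are supplied.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical client sketch; names and attributes are assumptions.
class BioDirectoryQuery {

    // Escape the characters that RFC 4515 reserves in filter values.
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a simple equality filter such as (attribute=value).
    static String buildFilter(String attribute, String value) {
        return "(" + attribute + "=" + escapeFilterValue(value) + ")";
    }

    // JNDI environment for the LDAP provider bundled with the JDK.
    static Hashtable<String, String> buildEnvironment(String url) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        return env;
    }

    // Run a subtree search and print the full name of each matching entry.
    static void search(String url, String baseDn, String filter) throws NamingException {
        DirContext ctx = new InitialDirContext(buildEnvironment(url));
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, controls);
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
            }
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // "cn" and the sample value are placeholders, not a real schema.
        String filter = buildFilter("cn", "sample(entry)");
        System.out.println("filter: " + filter);
        // Only contact a server when a URL and base DN are given.
        if (args.length == 2) {
            search(args[0], args[1], filter);
        }
    }
}
```

Run with no arguments, the program only prints the escaped filter. Given an LDAP URL and a base DN as its two arguments, it would attempt the search against that server, whether OpenLDAP or a Java-based one, since both speak the same protocol.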


Contact: Don Gilbert, gilbertd@bio.indiana.edu, Oct. 2002