README for iubio-srs bio-directory services

Source: ftp://iubio.bio.indiana.edu/biogrid/gridserver/directories/iubio-srs.tar.gz

This is free software. The SRS biodata retrieval system it works with is
commercial, but free for academic users.

Contact: Don Gilbert, gilbertd@bio.indiana.edu, Oct. 2002

-------------------

Experimental Biodata Directory Systems
October 2002

Experimental tests for developing practical, widely usable bio-data access
for BioGrids. Further information, sources and test components used here
are available at http://iubio.bio.indiana.edu/biogrid/directories/

Test client/server systems include
  Sequence Retrieval System via LDAP
  Sequence Retrieval System via Web Services

This is a test of new automated methods for bio-data search and retrieval.
It looks for practical methods that are suited to high-volume data search
and retrieval, that provide standard programming interfaces for wide use
of automated bio-data access, and that are suited to data grid and related
distributed computing for bioinformatics.

This work is based on these components:

  SRS, the bioinformatics Sequence Retrieval System. SRS is used as a
  backend data access system. Currently SRS provides bio-object search
  and retrieval for the largest volume of public bioinformatics data
  (some 30 million objects at EBI's server, Sept. 2002).

  LDAP, the Lightweight Directory Access Protocol, which provides a
  mature standard client/server protocol for search and retrieval of
  high-volume directories of objects (whether people info, computer
  resources, or bio-data).

  The OpenLDAP implementation, which provides an open-source,
  high-performance component for implementing LDAP services (C source).

  JavaLDAP, an easy to use and manipulate implementation of LDAP
  services (Java source). Performance is near to that of OpenLDAP, while
  providing a simpler installation and programming interface for
  experimental work.

  Web Services, an emerging IT standard for computable access to data
  and compute services, using XML over Web (HTTP) protocols (SOAP, WSDL,
  UDDI and others).

  GLUE, a very useful and practical Web Service toolkit. It also
  provides a common Web server and Java servlet page methods for testing
  LDAP and WS through web page clients.

  Jakarta, the Apache Jakarta project set of Java-based web services and
  related tools.

  DSML, the Directory Services Markup Language (an LDAP-XML
  translation). This provides a standard directory service XML
  translation to/from LDAP, ensuring a common standard for these
  services, and providing for co-development of LDAP and WS access
  methods.

Why Java?

This work is now strongly focused on Java components. Why? The methods
being tested are all multi-language, multi-source technologies with wide
software industry support, including open-source and commercial
implementations in all standard programming languages. This is a necessary
basis for a bioinformatics technology designed for wide acceptance and
deployment.

The author is testing methods using C, C++, Perl and Java. Java currently
provides the best common ground for testing, and especially for easily
deploying these tests to new service centers. This is both because of the
wide range of freely available libraries for network services, XML, Web
and other needs, and because of its simple packaging for deployment (with
some very carefully chosen exceptions, the tests will run without
compiling any platform-specific code on Solaris, Linux, MacOSX and other
Unix systems).

Perl is a very attractive language for bioinformatics and bio-data access.
However, tests so far using standard Perl network and related libraries
have fallen short of the performance of Java, and have often blown up in
high data volume tests (as a few Java tests have). The proportion of
compiled, platform-specific additions is also higher for Perl than for
Java, making packaging and deployment on multiple systems more difficult.

The C-based OpenLDAP server has provided the most efficient method so far
for bio-directory access. But its installation and use require more
effort, and modifying it to use bioinformatics data access backends takes
more work. For a production system, this may be the best choice. The
JavaLDAP package comes close (about 1/2x) to that performance, while being
much simpler to deploy and develop with.


Contents of iubio-srs distribution
----------------------------------

build.pl*   etc/     jldapserver/   lib/        schema/   srs.wsdl
build.sh*   iubio/   jwebserver/    mimedata/   slapd/    tmp/

 -- Use build.pl (or build.sh) to build the SRS-ldap compiled backend,
    linking to your SRS v6 library. It will also compile the Java source
    in iubio/srs/ and, with some help, the OpenLDAP backend for SRS use.

./etc: Miscellany files
bionames-dsml.xsl     pcre-3.9.tar.gz         srs.jsp.html
iubio-srs.dir.list    pcrs-0.0.1-src.tar.gz   srs6ldap.notes
iubio-srs.oat.tests   pcrs.readme             web-app_2_2.dtd
jbuild.sh*            runglue*

 -- mostly these are working notes.
 -- runglue is a small script to set Java paths to run the GLUE clients
 -- the pcre, pcrs sources are C regex packages for enabling LDAP to SRS
    query syntax conversion (not currently needed).

./iubio: Primary source code for this package
lib-srs/   openldap-back-srs/   srs/

./iubio/lib-srs: SRS-ldap C++ source
iubio_srs_SRSjni.h   srs6ldap.cc   srs6ldap.h   srs6ldap_jni.cc

 -- srs6ldap.cc is the main interface to SRS. It builds as a standalone
    program (similar to SRS getz) and is linked into OpenLDAP and Java JNI
 -- srs6ldap_jni is a Java native interface

./iubio/openldap-back-srs: OpenLDAP backend for srs
Makefile         bioldapsearch.c   pcre.h       version.c
Makefile.in      config.c          pcrs.h
Makefile.slapd   external.h        portable.h
backend.c        init.c            search.c

 -- parts needed to build back-srs for openldap

./iubio/srs: iubio-srs Java source and class files
ConvertKeyVals.class               SRSsoapPortType.class
ConvertKeyVals.java                SRSsoapPortType.java
LdapBackendSRS$AddMap.class        SRSsoapclient.class
LdapBackendSRS$MyEntry.class       SRSsoapclient.java
LdapBackendSRS$SrsEntrySet.class   SRSsoapserver.class
LdapBackendSRS.class               SRSsoapserver.java
LdapBackendSRS.java                ldapsearch.class
SRSjni.class                       ldapsearch.java
SRSjni.java

 -- SRSjni, the SRS interface
 -- ConvertKeyVals - helper class
 -- LdapBackendSRS - backend for JavaLDAP
 -- SRSsoapserver, SRSsoapPortType, SRSsoapclient - SOAP interface
 -- ldapsearch - simple LDAP client

./jldapserver: Java LDAP server directory
javaldap-README.txt   jldapdb.data         log.srs6ldap
javaldap.prop*        jldapdb.properties   old/
javaldap.sh*          jldapdb.script       srs6ldap.conf
jldapdb.backup        log.ldapserver       std.oc.xml

 -- javaldap.sh - start server
 -- javaldap.prop - configurations
 -- jldapdb - data for hsql (java) database backend
 -- std.oc.xml - ldap schema (not used)
 -- srs6ldap.conf - configuration for srs6ldap

./jwebserver: Java Web server
bionames/      common/        log.srs6ldap   srs6ldap.conf   tmp@
bionames.sh*   log.bionames   srs.wsdl       srssoap.sh*

 -- bionames.sh - start server
 -- srssoap.sh - start simple SOAP service only

./jwebserver/bionames: bionames web application
WEB-INF/              index.html@   srsldap.jsp.txt@
bionames-index.html   iubio@        srssoap.jsp
dsmlxsl.txt           srsldap.jsp   srssoap.jsp.txt@

./jwebserver/bionames/WEB-INF: GLUE web server for bionames application
classes/     jstl-tld/   maps/       services/   web.xml
config.xml   lib/        security/   tmp/

./jwebserver/bionames/WEB-INF/services: SRS web service configuration
srs.xml   system/

./jwebserver/common: GLUE web client/server common configurations

./lib: library files (mostly Java jars)
 -- includes GLUE package, JavaLDAP, various XML, Web and network
    services packages
 -- copyrights for these all provide for free redistribution of the
    libraries

./lib/TARGET_OS: compiled shared library for SRS
 (must have Lion Biosciences license; not freely redistributable)

./mimedata: folder for SOAP client

./schema: Biodata LDAP schema

./slapd: OpenLDAP slapd server configuration folder
srsslapd.conf   srsslapd.start*

./tmp: temp folder for soap server
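

Example: a minimal LDAP query client (illustrative sketch)

The LDAP services described above can be queried by any standard LDAP
client. The following is a minimal sketch of such a client using the
standard Java JNDI classes. It is not the ldapsearch.java shipped in
iubio/srs/, and the server URL, base DN and attribute name used in main()
(ldap://localhost:389, "srv=srs", "id") are placeholders assumed for
illustration only; take the real values from your own server
configuration and schema. The class name BioLdapQuery and its helper
methods are likewise hypothetical.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical example class; not part of the iubio-srs distribution.
class BioLdapQuery {

    // Escape characters that are special in an LDAP search filter value
    // (backslash, asterisk, parentheses, NUL), as defined by RFC 4515.
    static String escapeFilterValue(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Build a simple equality filter such as (id=P12345).
    static String equalityFilter(String attribute, String value) {
        return "(" + attribute + "=" + escapeFilterValue(value) + ")";
    }

    // Run a subtree search, print each entry's DN, and return the entry count.
    static int search(String url, String baseDn, String filter)
            throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        DirContext ctx = new InitialDirContext(env);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> results =
                    ctx.search(baseDn, filter, controls);
            int count = 0;
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
                count++;
            }
            return count;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Assumed attribute name "id"; replace with one from your schema.
        String filter = equalityFilter("id", args.length > 0 ? args[0] : "P12345");
        System.out.println("filter: " + filter);
        if (args.length > 1) {
            // args[1] = server URL, args[2] = base DN (both placeholders here).
            String baseDn = args.length > 2 ? args[2] : "srv=srs";
            System.out.println(search(args[1], baseDn, filter) + " entries");
        }
    }
}
```

Run without arguments, the sketch only prints the filter it would send.
Given a value, a server URL and a base DN (for example
"java BioLdapQuery P12345 ldap://localhost:389 srv=srs", all assumed
values), it performs the search and prints the matching entry names.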