NOTES for test DAS-LDAP server

This das2ldap.pl is an LDAP-backend server for distributed genome
annotation data, adapted from LDasServer-1.05 (http://www.biodas.org/).
It provides LDAP-standard directory queries using the same schema output as
  ldap://eugenes.org:3891/srv=srsgnomap
  (where http://eugenes.org:7180/cgi-bin/srsdas/dsn has the same backend and data)
but using Bio::DB::GFF as the backend (with a MySQL database) for overlaying
on existing DAS servers.

A test server for das2ldap is
  ldap://eugenes.org:3892/srv=das,o=euGenes
  (where http://eugenes.org:7180/cgi-bin/das/dsn has the same backend and data)

Given the steps below, one can fairly easily install an LDAP gateway to DAS
servers that use bioperl Bio::DB::GFF methods.

It may be that LDAP directories of bio-data offer a wider range of possible
data-directory services, using existing protocols and relatively mature
software. Servers and clients for LDAP browsing, search and retrieval are
commonly available in several languages, and hooking in biology data takes
not much more effort than writing backends like this one to existing
bio-databases, plus installing LDAP adaptors in client software.
Development of a consensus directory schema suited to biology data is the
largest chore, but then reaching a consensus on any bio-data exchange
format is always a chore :)

Additional parts of this software, client and server, for genome maps are
available at http://iubio.bio.indiana.edu/soft/molbio/java/apps/gnomap/vers2/

D. Gilbert, gilbertd@bio.indiana.edu
May 2002

DAS-LDAP server configuration

LDAP server configuration notes

- install ldap server (slapd, http://www.openldap.org or equivalent)
  which permits backend-shell use of this script. This will run as a
  stand-alone daemon, as any user. Be sure to set the --enable-shell
  flag when running configure.
  This is my openldap configure setting:
    ./configure --prefix=/usr/local/openldap2b --enable-aci --enable-debug \
      --enable-dynamic --enable-modules --enable-ldbm --enable-shell --enable-sql \
      --enable-rlookups --enable-phonetic

- configure schema and server for this data
  My test set for das2ldap is found with this source, packaged as
  das-slapd.tar.gz, including these needed files and some others:
    start.dasslapd  -- simple script to start server
    dasslapd.conf   -- configuration for the ldap server and the das2ldap database
    gnomap2.schema  -- schema for genome annotation data directories
    schema/         -- accessory schema included above (mostly standard ldap)
    das2ldap.pl     -- this program

- in dasslapd.conf, this is the das2ldap.pl backend interface definition
    database  shell
    suffix    "srv=das,o=euGenes"
    search    das2ldap.pl

- (re)start server: edit and run start.dasslapd to suit needs.
    set ldap=/usr/local/openldap2b/
    setenv LDAPURL ldap://eugenes.org:3892/   # for das2ldap.pl
    setenv MYSQL_PWD mysql                    # needed by Bio::DB::GFF
    setenv LD_LIBRARY_PATH "$ldap/lib:${LD_LIBRARY_PATH}"
    $ldap/libexec/slapd -u gilbertd \
      -h "ldap://0.0.0.0:3892/" -f dasslapd.conf

Test DAS-LDAP server

This server at 'srv=das,o=euGenes' has wormbase GFF data loaded as per
LDasServer methods.
  # list species, other info
  ldapsearch -H "ldap://eugenes.org:3892/" -b "srv=das,o=euGenes" -s base
  ldapsearch -H "ldap://eugenes.org:3892/" -b "srv=das,o=euGenes" -s one

  # list segments
  ldapsearch -H "ldap://eugenes.org:3892/" -b "spp=worm,srv=das,o=euGenes" -s base
  ldapsearch -H "ldap://eugenes.org:3892/" -b "spp=worm,srv=das,o=euGenes" -s one

  # query for features and dna
  ldapsearch -H "ldap://eugenes.org:3892/" -b "chr=CHROMOSOME_V,spp=worm,srv=das,o=euGenes" -s sub \
    '(&(|(objectClass=Feature)(objectClass=NA-sequence))(start=50000)(stop=60000))'

  # component features
  # (filter kept on one line: a backslash-newline inside single quotes
  #  would become part of the filter string)
  ldapsearch -H "ldap://eugenes.org:3892/" -b "chr=CHROMOSOME_V,spp=worm,srv=das,o=euGenes" -s sub \
    '(&(objectClass=Feature)(|(ft=Sequence:curated)(ft=exon:curated)(ft=intron:curated))(start=50000)(stop=60000))'

  # compound features
  ldapsearch -H "ldap://eugenes.org:3892/" -b "chr=CHROMOSOME_V,spp=worm,srv=das,o=euGenes" -s sub \
    '(&(objectClass=Feature)(ft=transcription)(start=>50000)(stop=<80000))'

Compare to SRS-LDAP annotation server at euGenes.org

Compare this DAS-LDAP server to the eugenes LDAP server at 'srv=srsgnomap',
which has data extracted from wormbase, flybase, SGD, TIGR (weed) and
reformatted into a GenBank/EMBL style set of features (e.g. mRNAs and CDSs
instead of exons and introns). Note that the data on the two servers are
not currently (May 2002) in synchrony. This server uses an SRS-based
backend rather than MySQL. You can find the data for this and the SRS
parser configurations at ftp://iubio.bio.indiana.edu/eugenes/ for each
species, in e.g. fly/features/*.ldif (LDAP's exchange format), and
tools/srs/vers6/gnomapldif.* for the SRS parser.
  # list species, other info
  ldapsearch -H "ldap://eugenes.org:3891/" -b "srv=srsgnomap" -s base
  ldapsearch -H "ldap://eugenes.org:3891/" -b "srv=srsgnomap" -s one

  # list segments
  ldapsearch -H "ldap://eugenes.org:3891/" -b "spp=worm,srv=srsgnomap" -s base
  ldapsearch -H "ldap://eugenes.org:3891/" -b "spp=worm,srv=srsgnomap" -s one

  # query for features & sequence
  ldapsearch -H "ldap://eugenes.org:3891/" -b "chr=V,spp=worm,srv=srsgnomap" -s sub \
    '(&(|(objectClass=Feature)(objectClass=NA-sequence))(start=>10000)(stop=<30000))'

  ldapsearch -H "ldap://eugenes.org:3891/" -b "chr=V,spp=worm,srv=srsgnomap" -s sub \
    '(&(objectClass=Feature)(|(ft=gene)(ft=mRNA)(ft=CDS))(start=>50000)(stop=<80000))'

LDAP client software for genome data

The work-in-progress genome map display program, gnomap, now supports
LDAP and DAS data input adaptors. This is a minimal map program focused
on generating server-side imagemaps for genome browsing, as at
http://eugenes.org/
Software is at http://iubio.bio.indiana.edu/soft/molbio/java/apps/gnomap/vers2

An LDAP-aware bioinformatics GRID program (in progress) is BioGridRunner,
at http://iubio.bio.indiana.edu/grid/

A very useful generic LDAP browser/editor is available at
http://www.iit.edu/~gawojar/ldap/ or http://www.mcs.anl.gov/~gawor/ldap/

A minimal Perl client for LDAP searches is:

  # Perl network libraries available from CPAN
  use URI::URL;
  use LWP::UserAgent;
  use Net::LDAP;

  # top level genome map directory (service srv=srsgnomap)
  my $url= 'ldap://eugenes.org:3891/srv=srsgnomap??one?(objectClass=*)';
  ldapSearch( $url );

  # one level directory of worm species data
  $url= 'ldap://eugenes.org:3891/spp=worm,srv=srsgnomap??one?(objectClass=*)';
  ldapSearch( $url );

  # Search directory of fly species, chromosome 2L, and all subdirectories
  # for Feature and NA-Sequence objects, for the chromosome base range
  # and limit search to specific feature (ft) kinds
  $url= 'ldap://eugenes.org:3891/chr=2L,spp=fly,srv=srsgnomap'
    .'?'      # select attributes to return, or all, e.g. 'id,loc,name'
    .'?sub'   # search scope - base,one,sub(directories)
    .'?(&(|(objectClass=Feature)(objectClass=NA-Sequence))' # search filter
      .'(|(ft=gene)(ft=CDS)(ft=insertion))'
      .'(start=<2000000)(stop=>1000000))'
    .'?sizelimit=20,deref=always' # other ldap options
    ;
  ldapSearch( $url );

  sub ldapSearch {
    # minimal - should do error checks
    my $surl= shift;
    my $url= new URI::URL($surl);
    print "\n [ $url ] \n";
    if ($url->scheme ne 'ldap') { warn "not ldap: $url\n"; return; }

    # translate the parts of the LDAP URL into Net::LDAP search options
    my $scope = $url->scope || "base";
    my @opts = (scope => $scope);
    push @opts, "base" => $url->dn if $url->dn;
    push @opts, "filter" => $url->filter if $url->filter;
    my @att = $url->attributes;
    push @opts, "attrs" => \@att if @att;
    my @extn = $url->extensions;
    push @opts, @extn if (@extn);

    my $ldap = new Net::LDAP($url->host, port => $url->port) or die "$@";
    $ldap->bind; # anonymous
    my $mesg = $ldap->search(@opts);
    if ($mesg->code) { warn $mesg->error,"\n"; }
    else { foreach my $e ($mesg->all_entries) { $e->dump; } }
    $ldap->unbind;
  }
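
The feature queries in these notes all share one filter shape: an
objectClass test, an optional set of feature types (ft), and a start/stop
range. As a rough sketch only, a small shell helper can assemble that
filter so the long quoted strings need not be typed by hand. The function
name feature_filter is hypothetical and not part of das-slapd.tar.gz. The
range operators are copied as written in the examples above (=> and =<);
standard LDAP filters spell these >= and <=, and how das2ldap.pl
interprets either form is an assumption to check against the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of das-slapd.tar.gz): build a feature
# filter in the style of the ldapsearch examples in these notes.
# usage: feature_filter START STOP [FT ...]
feature_filter() {
    start=$1
    stop=$2
    shift 2
    fts=""
    for ft in "$@"; do
        fts="${fts}(ft=${ft})"
    done
    # several feature types are wrapped in an OR clause;
    # a single type (or none) is used as-is
    if [ "$#" -gt 1 ]; then
        fts="(|${fts})"
    fi
    # range operators are written as in the notes (=> and =<)
    printf '(&(objectClass=Feature)%s(start=>%s)(stop=<%s))\n' \
        "$fts" "$start" "$stop"
}

# Print, rather than run, the resulting command
filter=$(feature_filter 50000 80000 gene mRNA CDS)
echo "ldapsearch -H ldap://eugenes.org:3891/ -b chr=V,spp=worm,srv=srsgnomap -s sub '$filter'"
```

With the arguments shown, the helper reproduces the gene/mRNA/CDS filter
used in the srsgnomap example; printing the command first makes it easy to
inspect the filter before sending it to a server.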