Dear Tin Wee and other biogridders,

I've spent several days of my 'holiday' looking at globus beta v2,
as it has data replication and other advancements over v1 which are
of interest to bio-data-grids.

My conclusion is that version 2 is usable for the same things that
version 1 of globus does (remote jobs, url-copy, compute resource
directories, all in a secure/authenticated structure), but the
new data replication methods are still too limited/incomplete to be
useful for biodata replication, compared to existing ftp mirroring.

There likely are some intermediate uses of the data replication
software: the Grid/gsiftp modified methods may well provide more
efficient, parallel, higher-performance data exchange than other ftp
software. But the replication mechanism that wraps around this is still
cumbersome and limited, e.g. it doesn't track file dates, it requires
a lot of pre-processing of file names/locations, etc.

The one thing that becomes clear to me, for uses not only with data
grids but generally in bioinformatics, is a need for directories of
reference data. The LDAP software that globus and others use for
directory information looks well designed for such, being more
computable, with security, searching, etc. that other means such as
XML or HTML based data directory information lack.

I hope to start on such a biodata and software directory here at
IUBio, in conjunction with the Bio-Mirror project. A test case is
found at ldap://iubio.bio.indiana.edu:3891/
If you use OpenLDAP, a base search of this would be
  ldapsearch -t -H ldap://iubio.bio.indiana.edu:3891/ -b 'ou=bio.indiana.edu,o=Grid'

-- Don

Notes on Globus grid software package installation, tests
beta version 2
don gilbert, gilbertd@bio.indiana.edu
Tue Dec 25 17:26:02 EST 2001

For globus version 1, which includes gatekeeper (remote job run),
security, and ftp url-copy, see the globus_sag1.1.3.pdf document, which
clearly describes installation.
For globus v2, documentation is still sparse, with major changes from
version 1, though the general sense of how it works is similar to
version 1.

Do read a bunch of the documents at www.globus.org before trying this.
  globus_sag1.1.3.pdf  -- version 1 install notes, very clear
  GlobusQuickStart.pdf -- how to use
  GlobusFAQ.html
For version 2,
  mds21.pdf
  FAQ.html - has some essentials about changes from version 1
  ggf1replica.pdf
  gridftp-C2WPdraft3.pdf
  replicaGettingStarted.pdf
  ReplicaManagementService.pdf

Notes:
-- don't use binaries of beta 2; my trial found them incomplete

-- successfully installing and using version 1 may be a prerequisite to
   getting beta version 2 to work, given its current sparse documentation.

-- installed on sun solaris v8 systems, with much trial and error.

-- getting/installing SSL security certificates is a prerequisite to
   using any of this. Globus offers these now, for research purposes,
   but be aware a human has to certify, so avoid the mistake I made
   of requesting a bunch of these for several machines. You can use
   CA certs from other sources, e.g. your own institution if available.
   These should be the same as HTTPS certificates, but note that
   self-signed ones won't work; they must be validatable thru a CA
   security chain.

-- the naming scheme of directory information is a common part of all
   this, for certificates, use of gatekeeper jobs, gridftp, and LDAP
   resource directories. Naming seems to be somewhat in flux - docs
   variously use things like
     "ou=organization, o=Globus, o=Grid"
     "o=organization, o=Globus, c=Country"
   I've chosen "ou=domain.name, o=Grid" as a common, simple one

-- version 2 is substantially changed from version 1, and currently
   lacks full documentation. The most interesting addition for data
   intensive work is the replica management system, which is at an
   early stage of development: functional, but still lacking abilities
   that are essential (to me).
-- the build process for version 2 creates .tar.gz packages that
   potentially can be reused to reinstall this on other machines.
   I found this didn't work well, but a tar copy of the installation to a
   new system, then re-running the setup/globus/ methods (below) on the
   new system, was all that was needed.

-- for any LDAP based directory services, such as with globus,
   this is an invaluable browser/editor:
   http://www-unix.mcs.anl.gov/~gawor/ldap/

set gl=/usr/local/globus2l
setenv GLOBUS_LOCATION /usr/local/globus2l

source of globus2 beta:
  ftp://ftp.globus.org/pub/gt2/beta/gpt/gpt-0.2.tar.gz
  ftp://ftp.globus.org/pub/gt2/beta/bundles/src/ ; get *.tar.gz

oat% cd gpt
/c7/soft/grid/globus2/beta/src/gpt
oat% ./build_gpt
/c7/soft/grid/globus2/beta
oat% ls -1 src/glob*gz
src/globus_api_bundle.tar.gz
src/globus_service_static_bundle.tar.gz
src/globus_services_1_bundle.tar.gz
src/globus_services_2_bundle.tar.gz
src/globus_tools_bundle.tar.gz

$gl/sbin/globus-build -installdir=$gl -verbose \
  -log=logs/globus_api_bundle.log src/globus_api_bundle.tar.gz vendorcc32dbg

$gl/sbin/globus-build -installdir=$gl -verbose \
  -log=logs/globus_service_static_bundle.log src/globus_service_static_bundle.tar.gz vendorcc32dbg

$gl/sbin/globus-build -installdir=$gl -verbose \
  -log=logs/globus_services_1_bundle.log src/globus_services_1_bundle.tar.gz vendorcc32dbg

$gl/sbin/globus-build -installdir=$gl -verbose \
  -log=logs/globus_services_2_bundle.log src/globus_services_2_bundle.tar.gz vendorcc32dbg

$gl/sbin/globus-build -installdir=$gl -verbose \
  -log=logs/globus_tools_bundle.log src/globus_tools_bundle.tar.gz vendorcc32dbg

## not in current source bundle: ncftp - try from alpha
$gl/sbin/globus-build -installdir=$gl -verbose -log=$gl/../logs/ncftp.log vendorcc32dbg

cd $gl/setup/globus
./setup-globus-core
./setup-globus-common
./setup-globus-gatekeeper
./setup-globus-gram-job-manager
./setup-globus-gram-job-manager-scripts
./setup-globus-mds-common
./setup-globus-mds-gris
./setup-ssl-utils

-- as root, start slapd for GRIS/GIIS
  $gl/sbin/SXXgris start

-- as user, initialize
  source $GLOBUS_LOCATION/etc/globus-user-env.csh

===== tests ..............

--- basic proxy security, need certificates to do any of this
grid-proxy-init
  Your identity: /O=Grid/O=Globus/OU=bio.indiana.edu/CN=Don Gilbert

--- services
etc/grid-info.conf:GRID_INFO_PORT="2135"
gsiftp             2811/tcp   # grid-ftp/globus
globus-gatekeeper  2119/tcp   # grid/globus

--- GRAM/gatekeeper
./cogrun org.globus.tools.GlobusRun -s -r oat '&(executable=/bin/ls) (arguments="-l")'
  ... ok

--- GRIS
ldapsearch -x -h oat -p 2135 -b "Mds-Vo-name=local, o=Grid" -s sub "(objectclass=*)"
  # oat, local, grid
  dn: Mds-Host-hn=oat,Mds-Vo-name=local,o=grid
  ...

--- GIIS
ldapsearch -x -h oat -p 2135 -b "Mds-Vo-name=site, o=Grid" -s sub "(objectclass=*)"
  # oat, site, Grid
  dn: Mds-Host-hn=oat,Mds-Vo-name=site,o=Grid
^^^ haven't yet got GIIS to do multi-host service lookups -- NO DOCUMENTATION ON HOW TO

--- GridFTP/gsiftp (wuftpd server, gsincftp client, globus-url-copy)
globus-url-copy gsiftp://oat.bio.indiana.edu/var/tmp/rshelp.out \
  gsiftp://oat.bio.indiana.edu/var/tmp/rshelp2.out

--- REPLICAS - globus2 replica tests
 - basics work, but not yet ready/fully developed for bioinformatics
   data replication needs

>> this is an invaluable browser/editor for working with LDAP directories
   http://www-unix.mcs.anl.gov/~gawor/ldap/

-- use separate slapd server for this,
  database   ldbm
  suffix     "ou=bio.indiana.edu, o=Grid"
  rootdn     "cn=Data Manager, ou=bio.indiana.edu, o=Grid"
  directory  /usr/local/var/openldap-datagrid-ldbm

## need to create rc= entry first !
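
One possible way to create that rc= entry is to feed an LDIF record to the OpenLDAP ldapadd client. The sketch below is an assumption, not the recorded procedure: the dn, objectClass and cn values are copied from the ldapsearch output shown further down in these notes, the function name make_rc_ldif is made up for illustration, and the replica catalog schema may require attributes not listed here (some servers also insist on an explicit "rc:" naming attribute). The parent entry ou=bio.indiana.edu,o=Grid must already exist.

```shell
#!/bin/sh
# Sketch only: values are taken from the ldapsearch output in these notes;
# the schema loaded into slapd may demand more attributes than shown.
RC_DN='rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid'

# Print an LDIF record for the replica catalog root entry (hypothetical helper).
make_rc_ldif() {
  cat <<EOF
dn: $RC_DN
objectClass: GlobusTop
objectClass: GlobusReplica
objectClass: ReplicaCatalog
cn: biomirror-rc
cn: BioMirror Replica Catalog
EOF
}

# Show the record; review it before loading.
make_rc_ldif

# To load it, bind as the rootdn from slapd.conf (-W prompts for its password).
# Not run here because it needs the live server:
#   make_rc_ldif | ldapadd -x -H ldap://oat.bio.indiana.edu:3891/ \
#     -D "cn=Data Manager,ou=bio.indiana.edu,o=Grid" -W
```

Once the entry exists, the globus-replica-catalog commands that follow can create the lc=blast collection beneath it.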
globus-replica-catalog \
  -h "ldap://oat.bio.indiana.edu:3891/lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid" \
  -manager "cn=Data Manager,ou=bio.indiana.edu,o=Grid" \
  -collection -create biomir-blast.files

# Register A location
globus-replica-catalog \
  -h "ldap://oat.bio.indiana.edu:3891/lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid" \
  -manager "cn=Data Manager,ou=bio.indiana.edu,o=Grid" \
  -location "biomir-us" -create "ftp://bio-mirror.net/biomirror/blast/" \
  biomir-blast.files

# Register B location
globus-replica-catalog \
  -h "ldap://oat.bio.indiana.edu:3891/lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid" \
  -manager "cn=Data Manager,ou=bio.indiana.edu,o=Grid" \
  -location "biomir-au" -create "ftp://bio-mirror.au.apan.net/biomirror/blast/" \
  biomir-blast.files

# manage C location -- create
globus-replica-management \
  -collection "ldap://oat.bio.indiana.edu:3891/lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid" \
  -manager "cn=Data Manager,ou=bio.indiana.edu,o=Grid" \
  -location "bmir-test" -create "gsiftp://pondscum.bio.indiana.edu/bio/biomirror/blast/"

# manage C - copy from A
globus-replica-management \
  -collection "ldap://oat.bio.indiana.edu:3891/lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid" \
  -manager "cn=Data Manager,ou=bio.indiana.edu,o=Grid" \
  -location "biomir-us" \
  -files bmir-files -copy "bmir-test"

ldapsearch -H ldap://oat:3891/ -x -b 'ou=bio.indiana.edu,o=Grid' '(objectclass=*)'

# bio.indiana.edu,Grid
dn: ou=bio.indiana.edu,o=Grid
objectClass: top
objectClass: organization
o: bio.indiana.edu
o: Biology Department, Indiana Unversity
street: 1001 E. 3rd Street
st: Indiana

# BioMirror Replica Catalog,bio.indiana.edu,Grid
dn: rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid
objectClass: GlobusTop
objectClass: GlobusReplica
objectClass: ReplicaCatalog
cn: biomirror-rc
cn: BioMirror Replica Catalog

# blast,rc=BioMirror Replica Catalog,bio.indiana.edu,Grid
dn: lc=blast,rc=BioMirror Replica Catalog,ou=bio.indiana.edu,o=Grid
objectClass: top
objectClass: GlobusTop
objectClass: GlobusReplicaLogicalCollection
filename: README
filename: alu.a.Z
filename: alu.n.Z
filename: drosoph.aa.Z

==== testing giis/gris slapd/sasl security problem

grid-info-search -h oat -p 2135 -b "Mds-Vo-name=site, o=Grid" -s sub "(objectclass=*)"
  SASL/GSS-OWNYQ6NTEOAUVGWG authentication started
  ldap_sasl_interactive_bind_s: Local error
^^^ need CA certificate for slapd/server to use version2 giis/gris slapd

===== setups ..............

oat% ./setup-globus-core
creating globus-script-initializer
creating globus-sh-tools.sh
Done

oat% ./setup-globus-common
Creating globus-hostname
Creating globus-domainname
Done

oat% ./setup-globus-gatekeeper
Creating gatekeeper configuration file...
Done
Creating grid services directory...
Done

oat% ./setup-globus-gram-job-manager
Setting up fork job manager
------------------------------
Creating job manager configuration file...
    - Getting gatekeeper subject
    - Getting gatekeeper port
Done
Creating grid service jobmanager...
Done

oat% ./setup-globus-gram-job-manager-scripts
Setting up job manager scheduler scripts...
Done

oat% ./setup-globus-mds-common
Creating...
    /usr/local/globus2l/etc/grid-info.conf
Done

oat% ./setup-globus-mds-gris
Creating...
    /usr/local/globus2l/sbin/SXXgrid
    /usr/local/globus2l/etc/grid-info-resource-ldif.conf
    /usr/local/globus2l/etc/grid-info-resource-register.conf
    /usr/local/globus2l/etc/grid-info-slapd.conf
    /usr/local/globus2l/etc/grid-info-site-giis.conf
    /usr/local/globus2l/etc/grid-info-site-policy.conf
Done

oat% ./setup-ssl-utils
setup-ssl-utils: Configuring ssl-utils package
Running setup-ssl-utils-sh-scripts...
***************************************************************************
Note: To complete setup of the GSI software you need to run the
following script as root to configure your /etc/grid-security/
directory:
/usr/local/globus2l/setup/globus//setup-gsi
***************************************************************************
setup-ssl-utils: Complete
Press return to continue.

microbe% ./setup-globus-gram-job-manager
Setting up fork job manager
------------------------------
Creating job manager configuration file...
    - Getting gatekeeper subject
Warning: Host cert file: /etc/grid-security/hostcert.pem not found.
rerun setup-globus-gram-job-manager after installing host cert file.
    - Getting gatekeeper port
Done
Creating grid service jobmanager...
Done
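
A footnote on the build step earlier in these notes: the five globus-build invocations differ only in the bundle name, so they can be generated by a loop. The following is a dry-run sketch in Bourne shell (the notes themselves use csh); the function name build_cmds is made up for illustration, and the install path, flavor and bundle names are simply those recorded above. It only prints the commands, so it can be checked before anything is built.

```shell
#!/bin/sh
# Dry-run sketch: print one globus-build command per source bundle,
# using the same options as the commands recorded in these notes.
gl=/usr/local/globus2l
flavor=vendorcc32dbg
bundles="globus_api_bundle globus_service_static_bundle globus_services_1_bundle globus_services_2_bundle globus_tools_bundle"

# Hypothetical helper: emits the commands, does not run them.
build_cmds() {
  for b in $bundles; do
    echo "$gl/sbin/globus-build -installdir=$gl -verbose -log=logs/$b.log src/$b.tar.gz $flavor"
  done
}

build_cmds

# To execute instead of print, from the directory holding src/ and logs/:
#   build_cmds | sh -e
```

Piping through sh -e stops at the first failing bundle, which matters here since later bundles presumably depend on the earlier ones having installed.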