BioGridRunner
Bioinformatics Data Grid application

Home of package

CAVEAT

This is a work in progress, an experimental program. No warranty is made for its operation. It includes methods which view and manipulate files, on your computer and on others. The author believes it is safe for testing, but cannot promise that its use will not inadvertently damage your files or data.

Intro

This is a distributed computing application for bioinformatics, incorporating directory services (data and software), grid computing methods (security, authentication, data transport and remote jobs), and gene sequence and genomic data processing methods.

This program requires a Java runtime (java or jre), preferably version 1.3 or later. It will likely work with version 1.2, but not with version 1.1.

This Java version requirement precludes its use on Macintosh OS 9 or earlier, as Apple Computer will not make Java version 1.2+ available for those systems. You can use this program on MacOS X, where it works well.

Fetching

The current home of this package is at http://iubio.bio.indiana.edu/grid/runner/

The easiest way to run this is with Java Web Start, if that is installed on your computer. This is a standard part of Macintosh OS X 10.1. Web Start software is available for MS Windows and Unix systems at http://java.sun.com/products/javawebstart/. To use Web Start for launching BioGridRunner, find the Web Start ".jnlp" script here

You will probably find source code (Java 1.2/1.3) for BioGridRunner included with the author's Java source library at ftp://iubio.bio.indiana.edu/molbio/java/source/ as iubiojava-src.zip. This is a work in progress, done on a fluctuating schedule; updates will be available as time permits.

Starting

If you use Web Start, it includes methods to check for software updates, download all the Java archive files needed, and launch the program. If you prefer not to use this method, there are command-line scripts for Unix and MS Windows in the home folder:
Unix: http://iubio.bio.indiana.edu/grid/runner/biogridrun.sh
MSWin: http://iubio.bio.indiana.edu/grid/runner/biogridrun.bat
With these you also need to fetch the contents of the http://iubio.bio.indiana.edu/grid/runner/lib/ folder with its Java archives. Then you can run the program by running these scripts. For the hardy command-liner, the scripts are currently equivalent to
  java -cp lib/biogridrun.jar:lib/readseq.jar:lib/xerces.jar:lib/cog912all.jar iubio.grid.app
(on MS Windows, change the Unix "/" and ":" separators to "\" and ";"). If the program fails to run, check that you have the above .jar files.
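The separator change between Unix and MS Windows can be left to Java itself. The following is only a sketch, not part of BioGridRunner: the class and method names are made up for illustration, and only the jar names and the iubio.grid.app main class come from the command line above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the BioGridRunner distribution.
final class ClasspathSketch {
    // Jar names as listed in the command line above.
    static final List<String> JARS = Arrays.asList(
        "biogridrun.jar", "readseq.jar", "xerces.jar", "cog912all.jar");

    // Join "<libDir><fileSep><jar>" entries with the given path separator.
    static String classpath(String libDir, String fileSep, String pathSep) {
        StringBuilder sb = new StringBuilder();
        for (String jar : JARS) {
            if (sb.length() > 0) {
                sb.append(pathSep);
            }
            sb.append(libDir).append(fileSep).append(jar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // File.separator and File.pathSeparator are "/" and ":" on Unix,
        // "\" and ";" on MS Windows.
        String cp = classpath("lib", File.separator, File.pathSeparator);
        System.out.println("java -cp " + cp + " iubio.grid.app");
    }
}
```

Run on either system, this prints the launch command with the separators that system expects.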

Functions

A central theme of this program is directories of information, bio-data and software, on your computer and on bioinformatics services around the globe.

This client program aims to make it easy to find information, and move it from there to here, or there to elsewhere. Each resource has a URL, such as we know of from web hyperlinks, but extending to non-web GRID Internet resources. These URLs are the "name" attached to an object, whether data, software, computer disk or other resource. In this grid-runner, you can find these in directories, and move them among places using Drag'n'Drop methods to pull a URL from here to there.
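To make the idea of a URL as the "name" of a resource concrete, here is a small sketch using the standard java.net.URI class, which accepts non-web schemes such as gsiftp. It is not taken from the BioGridRunner source; the class name is made up, and the gsiftp host and path are invented examples.

```java
import java.net.URI;

// Hypothetical illustration, not part of BioGridRunner.
final class ResourceUrlSketch {
    // Split a resource URL into scheme, host and path.
    static String describe(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + " | " + uri.getHost() + " | " + uri.getPath();
    }

    public static void main(String[] args) {
        // The FTP URL appears earlier in this document; the GSIFTP one is invented.
        System.out.println(describe("ftp://iubio.bio.indiana.edu/molbio/java/source/"));
        System.out.println(describe("gsiftp://grid.example.org/data/seqs.fasta"));
    }
}
```

The same three parts (how to reach it, which computer, which item) are all that is needed to name an object, whatever protocol serves it.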

Another basic part of this program is the ability to run other programs, given a description of how those programs operate. In this case we focus on command-line programs for bioinformatics, such as Clustal W (sequence alignment), the EMBOSS and GCG sequence analysis packages, and others. BioGridRunner uses descriptions of how these programs run: their input data, command-line program options, and outputs. Given this description (now in an XML format, the "BIX" command script), this program allows you to run such bio-apps with a form or dialog to select options easily, and to select or drag'n'drop data for input into the program. You can choose to run these programs on your own computer or on ones where you have GRID credentials to run programs.
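As a rough sketch of the idea of turning a program description plus the user's choices into a command line: the code below is not the BIX format and not BioGridRunner code. The class, the map-based description, and the file names are assumptions for illustration. The "-name=value" option style follows Clustal W's command-line convention, but check the option names against your installed version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the real BIX description format is not shown here.
final class CommandSketch {
    // Build an argument list: program name, then one "-name=value" per option.
    // An option with an empty value becomes a bare "-name" flag.
    static List<String> buildCommand(String program, Map<String, String> options) {
        List<String> cmd = new ArrayList<>();
        cmd.add(program);
        for (Map.Entry<String, String> e : options.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                cmd.add("-" + e.getKey());
            } else {
                cmd.add("-" + e.getKey() + "=" + value);
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Choices a user might make in a form; file names are invented.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("infile", "seqs.fasta");
        options.put("outfile", "seqs.aln");
        options.put("align", "");
        System.out.println(String.join(" ", buildCommand("clustalw", options)));
    }
}
```

The resulting list could be handed to java.lang.ProcessBuilder to run the program on your own computer; running it on a remote computer would go through the GRID job methods instead.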

Security and authenticated data and resource use is a basic part of GRID methods included here. Directories of data may include collaborative projects where you and others share data in a secure, authenticated way.

The program uses Drag'n'Drop methods, which will be improved so you can move things by their URL between directories and computers, and into program jobs. Note that these drag and drop methods work between this program and others on your computer. You can drag files from your computer windows (Finder, MS Explorer) into this app, or drag URLs out of this app into a web browser or other program.

The application window has a menu with File, Grid Options, Help and Windows.

File/New Directory

-- this is currently the best starting option. The program operations center around directories of information, including data and program files on your computer, and on remote computers available through FTP (file transport) and GSIFTP (Grid secure file transport).

The Lightweight Directory Access Protocol (LDAP) is being tested here as a primary way of organizing federations of bioinformatic data and software. The test services at ldap://iubio.bio.indiana.edu/ include directories of gigabytes of bioinformatic data from the Bio-Mirror project; software cataloged at the IUBio Archive; and genome data from the euGenes eukaryote genome service. These directories are changing, and will be added to, for testing of GRID-based data and software search and retrieval. One hope of this pilot test is to find methods whereby you can use this BioGridRunner to semi-automatically find current data of interest to you, and the software needed to analyze it, and move those data and software (with simple "Drag-n-Drop" visual methods) to the computer(s) you want to use them on. There is much behind-the-scenes programming and information engineering needed for this to work, but the LDAP, GRID and related tool sets make this all feasible now.

LDAP looks like an important method for automating the finding, searching and accessing of current data in biology. LDAP provides means for searching among many computers, including globally linked ones, in ways that cannot be achieved with current Web or other means. It has been developed over the past decade for use with directories of people, computer resources, and other information. It has a range of methods for searching, joining disparate information sources, and defining information objects and attributes, and it offers security and widespread software support.
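For readers who have not used LDAP from Java, a directory search can be sketched with the standard JNDI classes (javax.naming). This is a minimal sketch under stated assumptions, not the code BioGridRunner uses: the class and method names are made up, and the base name and filter you pass in depend entirely on how a given directory is organized, which is not described here.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical illustration, not part of BioGridRunner.
final class LdapSearchSketch {
    // JNDI settings for an anonymous connection to an LDAP server.
    static Hashtable<String, String> environment(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    // Search the whole subtree below the base name, with a result limit.
    static SearchControls subtreeControls(long maxResults) {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setCountLimit(maxResults);
        return controls;
    }

    // Connect, search, and print the full name of each matching entry.
    static void printMatches(String providerUrl, String baseName, String filter)
            throws NamingException {
        DirContext ctx = new InitialDirContext(environment(providerUrl));
        try {
            NamingEnumeration<SearchResult> results =
                ctx.search(baseName, filter, subtreeControls(20));
            while (results.hasMore()) {
                System.out.println(results.next().getNameInNamespace());
            }
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        if (args.length != 3) {
            System.out.println("usage: LdapSearchSketch <ldap-url> <base-name> <filter>");
            return;
        }
        printMatches(args[0], args[1], args[2]);
    }
}
```

A filter is a parenthesized expression such as "(cn=*clustal*)"; that attribute name is only an example, and the attributes actually present depend on the directory's schema.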

Grid options/Grid Credentials

-- to use data and programs on remote computers, secure, authenticated access to them is a necessary preliminary. This application uses the Globus GRID package for this. You need to install Grid credentials, in the form of digital certificates signed by a Certificate Authority, on the computers which you will use for secure access. For testing purposes, a certificate should be available for limited use. [ TO BE ADDED]

Grid options/Grid Job

-- this is a simple form to run any program through GRID methods, straight from the Globus COG toolkit.

Grid options/URL Copy

-- this is a simple form to copy any URL (File, FTP, GSIFTP, maybe others) from one place to another, whether your computer or other Internet computers. The method uses "third party transfers" so that if you copy between two remote computers, the data goes between those two without touching your computer.
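One way to picture when a copy is a third-party transfer is the sketch below. It is not how BioGridRunner or the Globus toolkit decides this: the class is hypothetical, the host names are invented, and the rule used (both URLs name a remote host) is a simplifying assumption.

```java
import java.net.URI;

// Hypothetical illustration, not part of BioGridRunner.
final class TransferKindSketch {
    // Treat a URL as remote if it is not a file: URL and names a host.
    static boolean isRemote(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme();
        return scheme != null
            && !scheme.equalsIgnoreCase("file")
            && uri.getHost() != null;
    }

    // When both ends are remote, the data need not pass through your computer.
    static boolean isThirdParty(String from, String to) {
        return isRemote(from) && isRemote(to);
    }

    public static void main(String[] args) {
        System.out.println(isThirdParty(
            "gsiftp://a.example.org/data/seqs.fasta",
            "gsiftp://b.example.org/data/seqs.fasta"));
        System.out.println(isThirdParty(
            "file:///tmp/seqs.fasta",
            "gsiftp://b.example.org/data/seqs.fasta"));
    }
}
```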

The author even likes it :) I've used URL copy to move parts of this developing program to/from computers.


Author

Don Gilbert
Center for Genomics and Bioinformatics
Biology Department, Indiana University
Bloomington, Indiana, 47405 USA
software@bio.indiana.edu