SeqPup

a biosequence editor


version 0.9
September 1999

Copyright 1990-1999 by D.G. Gilbert
email: seqpup@bio.indiana.edu


Summary

SeqPup, version 0.9, September 1999

SeqPup is a biological sequence editor and analysis program. It includes links to network services and external analysis programs. It is usable on common computer systems that support the Java 1.1 runtime environment, including Macintosh, MS-Windows and X-Windows.

Features include

( new this version, significant update this version )

Many of the features have been significantly improved in this release. As well, this release is much easier to install and use. This application is a work in progress; it has bugs.

Originally written in C++, this program has been ported to the new Java language. SeqApp/SeqPup was started in 1990 as a sequence editor and analysis platform on which analysis programs from other authors could be easily incorporated into a usable interface.

You can obtain this release through anonymous FTP or HTTP to iubio.bio.indiana.edu, in folder /molbio/seqpup/java/. This version will work on any computer system that supports Java runtime version 1.1, including Macintosh, MS-Windows, and Unix/X-Windows systems. The Internet URLs to this software are

ftp://iubio.bio.indiana.edu/molbio/seqpup/java/
http://iubio.bio.indiana.edu/soft/molbio/seqpup/java/

=== Brief Usage and installation, release 0.9 =====

You will need to fetch the SeqPup.jar java archive.  It includes
help documentation and data files, which will be installed when you
first run the program. 

To start SeqPup from MSWindows or Unix, use one of these command lines
  jre  -cp SeqPup.jar  run   (for Java 1.1.x with jre)
  java -cp SeqPup.jar  run   (for Java 1.2.x - not a recommended version)
  java -classpath SeqPup.jar:$CLASSPATH  run   (for Java 1.1.x without jre)
  
To start SeqPup from MacOS, use the MRJ application, in SeqPup9-macos.hqx.
Apple Java (MRJ) version 2.1.2 or later is recommended.
  
There are several external analysis applications available for SeqPup,
as compiled programs for MacOS and MSWindows.  Find these in
  seqpup9-methods-msdos.zip   (MS Windows ZIP archive)
  seqpup9-methods-macos.sit   (MacOS Stuffit archive)
These should be installed in a methods/ folder in the same folder as
SeqPup.jar.

Java version: This program will not work properly with the
Java 1.2 runtime that is now commonly available on MSWindows systems.
SeqPup will work with the Java 1.1.8 for MSWindows, which
can be installed in addition to Java 1.2.

Developers will find the source code for this application and others in the iubio:/molbio/java/source folders. Comments, bug reports and suggestions for new features are very welcome and should be sent via e-mail to SeqPup@Bio.Indiana.Edu.

September 1999: version 0.9 release
Additions to this Java release include sequence document and feature manipulation and display, enhanced sequence file handling, enhanced drawing/pretty print methods, including PDF output, user-developed plug-in methods for sequence analysis and display (Java classes), improved single-sequence editing, improved alignment manipulations.



The following needs revision for version 0.9

Contents

Summary
Fetching and Installing
Installing MacOS, MSWindows, Unix and others
Help
Updates and source Internet links
Views, Windows and Dialogs
About the application, Aligned multi-sequence view, Single sequence view, Print views
Data files
Restriction Enzyme Table, Codon Table
Features
File, Editing, Sequence manipulations, Single sequences - editing and features
Options and preferences
Pretty Print configuration
Color Selections
Internet
Internet Sequence Search and Fetch
SRS client, NCBI-BLAST client
Child Tasks
Configuring child applications
About the Java version
Source code and DCLAP
Comments, Copyright, Bugs and History



Fetching and Installing


Perhaps the most annoying problem to me with this software is the complexity of fetching and installing it. If you use Netscape or other Java enabled Internet browsers, you can use a Java applet just by clicking your way to a WWW page that has one embedded. This software is more complex than is suitable for an applet; for one thing it reads and writes files on your computer like any average program, which applets cannot yet do.

Fetching


The current state of Java applications on computer systems means that you will need to fetch a Java runtime system as well as the files for this specific application. If you have other Java applications on your computer, you may already have the Java runtime system needed. Some current operating systems, including MacOS 8.1 and Sun Solaris 2.6, include the Java 1.1 runtime already.

SeqPup is available over the Internet at
ftp://iubio.bio.indiana.edu/molbio/seqpup/java/
http://iubio.bio.indiana.edu/soft/molbio/seqpup/java/

You will find a package file that includes all files needed for this application, in the full package/ subfolder at the Internet site above. Macintosh users will find these files in the archive file macos.seqpup-??.sit.bin. This is encoded as a Stuffit-MacBinary file. Many Internet fetching programs such as Fetch and Netscape will decode this directly, or utility software like Stuffit can do the job. MSWindows users will find program files in the archive file mswin.seqpup-??.zip. You will want to use a Win95/NT unzip utility that preserves long file names to extract this file. Unix and others will find just the program files in the archive file unix.seqpup-??.tar.gz.

The essential files and folders that make up this application are
As I've learned with other multi-platform applications that I make available to the bioscience community, there are difficulties involved in updating multiple archives for different computer systems. The simplest way for me to offer updates to this software is to provide it as separate files. The most current version of this software is also available in its un-archived form. See also the Updates section below, for semi-automatic updating.

This current release is based on Java version 1.1. To run it, you should have installed a Java Runtime Environment (JRE), version 1.1 or later. This can be found through Javasoft, and at various mirror sites around the world.

This link provides Javasoft's runtime for Solaris and MSWindows systems
http://www.javasoft.com/products/jdk/1.1/jre/index.html

For Java runtimes for various operating systems, see
http://www.javasoft.com/cgi-bin/java-ports.cgi

Installing


The program is installed as follows. On all systems, the following items should be in one folder: the SeqPup.jar Java class archive file, the data/ and classes/ folders, and the seqpup-doc.html document. Each system also needs in this folder an application or script that starts the application in the Java runtime environment.


Installing for MacOS

There are various Java runtime systems available for Macintosh. For general use, the Macintosh Runtime Java (MRJ) produced by Apple Computer is the only one on which this program has been fully tested.

1. Keep in the same folder the application SeqPup, the SeqPup.jar Java class archive, and the data and classes folders. Move the application SeqPup from the local/Macintosh/ folder into this main folder.

2. Use the MRJ installer from Apple Computer to install this Java runtime software. It needs to be version 2.0 or later. If you have MacOS 8.1, this is included as part of the OS. With MacOS 8, an earlier version of MRJ is included, but it isn't compatible with this software. You should upgrade to the MRJ 2.0 release.

3. Unpack the local/Macintosh/data-methods-macos.sit archive. It includes child app methods that need to be placed in the data/methods/ folder.

This program calls an Internet browser to display HTML documents. The default is to call Netscape. The browser application must already be open; SeqPup won't yet open it for you (a bug). The browser can be changed by editing SeqPup preferences and changing the user.openurl variable. It needs to be set to the MacOS "creator" name for the program you want (sorry, I don't yet have an easy interface to set this). For Netscape, the creator is "MOSS"; for MS Internet Explorer, the creator is "MSIE". This is what the setting looks like now
user.openurl=MOSS

Installing for MSWindows

This has only been tested with MSWindows95/NT, and may not work with MSWin3. When you unzip the archive file, use a current unzipper that preserves long file names.

1. Keep in the same folder the program batch file SEQPUP.BAT, the SeqPup.jar Java class archive, and the data and classes folders. Move the SEQPUP.BAT file from the local/MSWindows/ folder to this main seqpup folder.

2. Install a Java Runtime system for MS Windows. A recommended Java runtime is found at http://www.javasoft.com/products/jdk/1.1/jre/index.html. You may want to install this in a general MSWindows folder, perhaps C:\WINDOWS\JAVA. I don't know of a preferred location yet on MS Windows systems for Java runtime files. You will need to edit the batch file (step 3) to account for this location.

3. Edit the SEQPUP.BAT file to make the path names match the file locations on your computer.

set JAVA=C:\WINDOWS\JAVA
set APPPATH=C:\seqpup08

NOTE: If you get this message when running the batch file
OUT OF ENVIRONMENT SPACE
then try setting Properties:Memory:Initial environ value to 4096 for this batch file

4. Unpack the local/MSWindows/data-methods-mdos.zip archive. It includes child app methods that need to be placed in the data/methods/ folder.

For the application to link properly to Netscape or other Internet browser, you may need to edit the preferences file. You can do this from within the application; see the Options/Edit Framework ... Menu. Or you can edit the file dclap.ini which will be created in the \java\ folder. In either case, you want to enter the variable name user.openurl= then the full path to your browser to be sure that it works properly. This path may well be the same on your system as mine, which is as follows. Note the quote marks (due to space in name) and the double backslashes \\ which are required to insert one \.

user.openurl="C:\\Program Files\\Netscape\\Navigator\\Program\\netscape.exe"


If you use the Edit prefs menu, after editing close the window. You should be prompted to save changes; do so.
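
The quoting and double-backslash rules above are easy to get wrong by hand. The following is a minimal Python sketch (not part of SeqPup; the function name openurl_pref is made up for illustration) that builds the user.openurl line from an ordinary MSWindows path, assuming only the two rules stated above: double every backslash, and add quote marks when the path contains a space.

```python
def openurl_pref(browser_path: str) -> str:
    """Build a user.openurl preference line from a plain Windows path."""
    # Every "\" is written as "\\", since the preference file reads
    # a doubled backslash as one literal backslash.
    escaped = browser_path.replace("\\", "\\\\")
    # Paths that contain a space are wrapped in quote marks.
    if " " in escaped:
        escaped = '"' + escaped + '"'
    return "user.openurl=" + escaped


if __name__ == "__main__":
    print(openurl_pref(r"C:\Program Files\Netscape\Navigator\Program\netscape.exe"))
```

Paste the printed line into dclap.ini, or into the Edit prefs window, in place of any existing user.openurl line.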


Installing for Unix and others

This has only been tested with Sun Solaris systems.

1. Keep in the same folder, the program start script seqpup, the SeqPup.jar Java class archive, and the data and classes folders. Move the seqpup script file from the local/Unix/ folder to this main seqpup folder.

2. Install a Java Runtime system for your system. For Sun Solaris systems, see http://www.javasoft.com/products/jdk/1.1/jre/index.html. For other systems, see http://www.javasoft.com/cgi-bin/java-ports.cgi

3. Edit the seqpup file to make the path names match the file locations on your computer.
set java=/usr/local/java

4. You will want to compile or install binaries of the child applications for your system to use this feature (see Child Tasks below). Source code is provided for example child apps in the data/methods/ folder. You can use other pre-compiled versions of these on your system. You will need to edit the .command files in data/methods/ if these apps are located in other folders.

You need to define the user.openurl= variable to find your Netscape or equivalent. You can do this from within the application; see the Options/Edit basic prefs... Menu. Or edit the file ~/.dclaprc directly to enter such a line. The variable line for my unix system is

user.openurl=/usr/local/bin/netscape

Also, you might instead use a shell script (like the "netscape.sh" included). If you rename that to netscape, edit it to suit, and put in the folder with SeqPup.jar file, it may take the place of editing the preference file.



Help

Program help is available from this document. Typically the program documentation that I write gets done last and doesn't receive the effort that it deserves (because typically I haven't enough time to finish the software either, which is written in my spare, unpaid time). The help that I can offer to individual questions may be very limited. But please do send your questions and comments by e-mail to the address "seqpup@bio.indiana.edu", and these will be taken into account for future updates.

See also the Bugs section below for a list of known program bugs and some work-around hints.

Updates and source Internet links

This software has a preliminary network method for easier updating when the program is revised. On the main splash-screen, the Updates button (on the "version" label) will check whether the software version you have is out-of-date. See also the File menu, Check updates command. These connect by Internet to the home archive for the software.

These options and the help command use an Internet browser program for opening URLs (Internet universal resource locators), which needs to be configured as discussed in the Install section. The updates option will list in your browser those items of this software that are newer than the version you have installed.



Views, Windows and Dialogs


About the application

The first window displayed when you start SeqPup is a splash screen that tells you a bit about the application and has active buttons to perform some basic commands. These include opening sequence files, fetching sequence from Internet servers, opening the help information, and network links to application updates, e-mail comments, and application source code.

All these functions are also accessible from the standard application menus. This form of Hypercard-like picture window with active buttons is used in all the DCLAP applications. Active button areas are highlighted when your mouse moves over them, and each button's function is explained at the bottom of the window. Mouse clicking, once or more depending on the clicksToActivate preference, will activate that function. These Hypercard-like windows are configured as per standard HTTP NCSA-Imagemap information, stored in the data/ folder pix/about.gif.map file. Functions can be changed and new images substituted if you desire.


Data views

The program has these main kinds of views and windows onto data:

A multiple-sequence view which is the primary display when you open a sequence document; the single sequence editing view; various print views which result from an analysis, like the Restriction map; and dialog views where you control some function.

Many of these views have dialog controls -- push buttons, check boxes, radio controls and editable text items -- to let you fine-tune a view to fit your preference. Many of these views also will remember your last preferences.

When a view has editable text items, including the sequence entry views, most usual undo/cut/copy/paste features will work.

Two or more views of the same data are possible. Some of these are truly views of the same data -- changes made in one view are reflected in another. For instance, one can have a single sequence view open, select a feature and mark that feature position on the main document view, and also have that feature mark show in any open pretty print of that sequence.

Other views are static pictures taken of the data at the time the analysis was performed -- later changes to the data do not affect that picture.


Aligned multi-sequence view


The main view into a sequence document is the multiple sequence editor window, which lists sequence names to the left and sequence bases as one line that can be scrolled through. Bases can be colored or black. Sequences can be edited here, especially to align them, and subranges and subgroupings can be selected for further operations or analysis. Entire sequence(s) can be cut/copied/pasted by selecting the left name(s). Mouse-down selects one. Shift-mouse down selects many in a group, Command-mouse down selects many unconnected. Double click a name to open a single sequence view. Select a name, then grab and move up or down to relocate.

Select the lock/unlock button at the view top to lock/unlock text editing in the sequence line. With lock on (no editing) you can use shift and command mouse to select a subrange of sequences to operate on.

Bases can be slid to left and right, like beads on an abacus, when the edit lock is On (now default). Select a base or group of bases (over one or several sequences), using mouse, shift+mouse, option+mouse, command+mouse. Then grab selected bases with mouse (mouse up, then mouse down on selection), and slide to left or right. Indels "-" or spacing on ends "." will be added and squeezed out as needed to slide the bases. See also the "Degap" menu selection to remove all gaps thus entered from a sequence.

Single sequence view


For entering/editing a single sequence, this view displays one sequence with more info and control. Edit the name here (later other documentation). Bring out this view by double-clicking the sequence name in align view, or choosing Edit from the Sequence menu.

Print views


Various analyses provide non-editable displays. These are usually saveable as PICT, POSTSCRIPT and GIF formats for editing in your favorite graphic editor program, or printing. When a print or graphic view is displayed, choosing the File/Save As command will offer you the choice of where to save and in what format.



Data files


SeqPup uses plain text files for its basic sequence data. These files can be exchanged without modification with many other sequence analysis programs. SeqPup automatically determines the sequence format of a data file when opening it. You have a choice of several formats to save it as.

The program looks in the folder "data/prefs" for text files containing various data. At present these files include "codon.prefs", "renzyme.table" and "color.prefs".

Various temporary files are created for child tasks, currently in the main folder where the program lives. Currently you cannot run the Child Tasks portion of SeqPup from a locked file server because these temporary files need to be created. Otherwise, SeqPup should operate from a locked fileserver properly, and can be launched by several users at once.

In the data/prefs/ folder that comes with the application, you will find these files
color.prefs -- for base colors in displays
seqmasks.prefs -- for pretty printing displays
renzyme.table -- for restriction maps
codon.prefs -- for protein translation
Any of these can substitute for the codon.prefs file
codon-drosophila.prefs
codon-human.prefs
codon-ecoli.prefs
codon-rat.prefs
codon-tobacco.prefs


Restriction Enzyme Table


The file called "renzyme.table" contains restriction enzyme data, as distributed in REBASE by R.Roberts. The format used is identical to that used by GCG software.

{ documentation ...}

Commercial sources of restriction enzymes are abbreviated as follows:

          A     Amersham (12/91)
          B     BRL (6/91)
          ...
          X     New York Biolabs (4/91)
          Y     P.C. Bio (9/91)

..  { separates data}
;AatI      3 AGG'CCT        0 !  Eco147I,StuI                      >OU
AatII      5 G_ACGT'C      -4 !                                    >EJLMNOPRSUVX
AccI       2 GT'mk_AC       2 !                                    >ABDEIJKLMNOPQRSUVXY
;AccII     2 CG'CG          0 !  Bsp50I,BstUI,MvnI,ThaI            >DEJKQVXY
;AccIII    1 T'CCGG_A       4 !  BseAI,BsiMI,Bsp13I,BspEI,Kpn2I,MroI  >DEJKQRVY
;Acc65I    1 G'GTAC_C       4 !  Asp718I,KpnI                      >DFNY
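
To make the column layout of these data lines concrete, here is a minimal Python sketch of a line parser. It is not SeqPup's own reader. The field meanings are an assumption based on the usual GCG enzyme file layout: name, cut offset, recognition site (with ' and _ marking the cuts on the two strands), overhang, then "!" followed by isoschizomers, then ">" followed by supplier codes. A leading ";" is assumed to mean the entry is commented out. The Enzyme and parse_enzyme names are made up for illustration.

```python
from typing import List, NamedTuple


class Enzyme(NamedTuple):
    name: str
    enabled: bool             # False when the line starts with ";"
    cut_offset: int
    site: str                 # recognition site with ' and _ cut marks
    overhang: int
    isoschizomers: List[str]
    suppliers: str            # one letter per commercial source


def parse_enzyme(line: str) -> Enzyme:
    """Split one renzyme.table data line into its fields."""
    line = line.strip()
    enabled = not line.startswith(";")
    head, _, tail = line.lstrip(";").partition("!")
    name, offset, site, overhang = head.split()
    names, _, suppliers = tail.partition(">")
    isos = [n.strip() for n in names.split(",") if n.strip()]
    return Enzyme(name, enabled, int(offset), site, int(overhang),
                  isos, suppliers.strip())


if __name__ == "__main__":
    print(parse_enzyme(";AatI      3 AGG'CCT        0 !  Eco147I,StuI      >OU"))
    print(parse_enzyme("AatII      5 G_ACGT'C      -4 !                    >EJLMNOPRSUVX"))
```

Each letter in the suppliers field can be looked up in the list of commercial sources at the top of the file.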


Codon Table


The file called "codon.prefs" in folder "Tables" is used for translation of nucleic to protein sequence, and for backtranslation. This file may be replaced with a table of your choice in the following format. The format is nearly identical to that used by GCG software codon tables. The Codon column has been put first. Each codon is followed by an "=" equal sign. Any documentation is preceded by a "#" pound symbol.

#Escherichia coli
#
# any documentation
#
#Codon   AmAcid    Number    /1000     Fraction   ..
GGG=     Gly     1743.00      9.38      0.13
GGA=     Gly     1290.00      6.94      0.09
GGT=     Gly     5243.00     28.22      0.38
GGC=     Gly     5588.00     30.08      0.40

{ continue for 64 codons }
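
If you build or check a replacement table with a script, the format above is simple to read. The following Python sketch (not SeqPup code; parse_codon_table is a made-up name) collects the codon and amino acid columns into a dictionary. It assumes only what is described above: "#" lines are documentation, and each data line is a codon, an "=" sign, and then whitespace-separated columns with the amino acid first.

```python
def parse_codon_table(text: str) -> dict:
    """Map each codon to its amino acid, ignoring '#' documentation lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        codon, sep, rest = line.partition("=")
        if not sep:
            continue                      # not a "CODON= ..." data line
        fields = rest.split()
        if fields:
            table[codon.strip().upper()] = fields[0]
    return table


SAMPLE = """\
#Escherichia coli
#Codon   AmAcid    Number    /1000     Fraction   ..
GGG=     Gly     1743.00      9.38      0.13
GGA=     Gly     1290.00      6.94      0.09
"""

if __name__ == "__main__":
    print(parse_codon_table(SAMPLE))
```

A complete table should yield 64 entries, which makes a quick sanity check before replacing codon.prefs.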




Features


The following topics describe main features found in the SeqPup menus.

File


New will create a document of sequence data (alignment view). With a new document one can add new sequences, or copy selections from another document.

Open commands will open existing files.
The Open as Sequence... choice will open a file of sequences into a new align view document.
You can also open a file and append its sequences to the current document (Append to sequence list).
You can fetch sequences from an Internet server (see the SRS information below) with the Open sequence from databanks... command.
The Open Text command will open and display a file as plain text.
The Open URL command will open an Internet connection (or local file) given a URL of the format http://internet.address:port/path/to/data.file, as in http://iubio.bio.indiana.edu/Readme. If the file is sequence data it will be displayed in an alignment window. Currently only the HTTP protocol is supported for this command.
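
As an illustration of the URL format accepted by Open URL, this Python sketch splits a URL into host, port and path and rejects anything that is not HTTP. It uses the standard urllib.parse module; the function name check_open_url and the default port of 80 are assumptions for the example, not SeqPup behaviour.

```python
from urllib.parse import urlsplit


def check_open_url(url: str):
    """Return (host, port, path) for an http:// URL, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        raise ValueError("only http:// URLs are supported: " + url)
    if not parts.hostname:
        raise ValueError("URL has no host: " + url)
    port = parts.port or 80               # assumed default when no :port is given
    return parts.hostname, port, parts.path or "/"


if __name__ == "__main__":
    print(check_open_url("http://iubio.bio.indiana.edu/Readme"))
    print(check_open_url("http://internet.address:8080/path/to/data.file"))
```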

Save and Save as will save the current document to disk files. Save is context sensitive and will be active when a document has been changed.

Revert will restore the open align view to the last version saved to disk.
Save selection will save only the highlighted sequences to a new disk file. It doesn't affect the save status of the current full alignment document.

Print setup, Print will print the current view (see bugs).

Check Updates will connect to the home server for the application and offer information on new versions and updates to the application.

Help brings up a view to page thru the help file.

Quit - terminate the program

Editing


Undo, redo -- Standard application commands to return a document to its state before a command was performed (undo), and to again do the command (redo) after an undo. For instance, complementing (changing) a sequence should be undoable. These are context sensitive, and should be enabled only when possible. Current design is to offer several levels of undo and redo, but see bugs.

Cut, copy, paste, clear, select all -- Standard application commands that are available in a context-sensitive manner. Cut moves a selection from the document to the clipboard. Copy makes a copy to the clipboard. Paste copies from the clipboard to the active document. Clear removes a selection without copying to the clipboard. The clipboard is an application-wide special document that stores these data until overwritten by new data. Clipboard data is potentially copyable to other applications (see Bugs).

For instance, selected editable text should have these functions to manipulate the text. Sequence selections enable these functions to move sequence data within and between alignment documents. Not all appropriate contexts may yet have these commands enabled (see Bugs).

Find, Find same, Find "selection" will search for strings in text.

Find ORF, this will select the first or next open reading frame of the selected sequence.

Sequence manipulations


New sequence -- append a new, blank sequence to the sequence document.

Edit -- open single sequence editing view for selected items.

Reverse, Complement, Rev-complement -- Reverse, complement or reverse+complement a sequence. Works on one or more sequences, and the selected subrange.

Rna-Dna,Dna-Rna -- Convert dna to rna (t->u) and vice versa. Works on one or more sequences, and the selected subrange.
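
For readers who want to check the results of these commands by hand, the operations can be sketched in a few lines of Python. This is an illustration only, not SeqPup's implementation: the function names are made up, only the A, C, G, T and U codes are complemented (ambiguity codes and gaps pass through unchanged), and the subrange is given as 0-based start and end positions.

```python
# Complement table for plain DNA/RNA codes; other characters are left as-is.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")
_TO_RNA = str.maketrans("Tt", "Uu")
_TO_DNA = str.maketrans("Uu", "Tt")


def reverse(seq: str) -> str:
    return seq[::-1]


def complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def dna_to_rna(seq: str) -> str:
    return seq.translate(_TO_RNA)          # t -> u


def rna_to_dna(seq: str) -> str:
    return seq.translate(_TO_DNA)          # u -> t


def on_subrange(seq: str, start: int, end: int, operation) -> str:
    """Apply an operation to seq[start:end] only, leaving the rest unchanged."""
    return seq[:start] + operation(seq[start:end]) + seq[end:]


if __name__ == "__main__":
    print(reverse_complement("GACGTC"))
    print(on_subrange("AAAGATTACAAAA", 3, 10, dna_to_rna))
```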

Degap -- remove alignment gaps "~". Gaps of "-" are locked and not affected by Degap. Works on one or more sequences, and the selected subrange.

Lock Indel & Unlock Indel
-- Convert from unlocked gaps "~", to locked gaps "-". Unlocked gaps will disappear and appear as needed as you slide bases left and right. Locked gaps are not affected by sliding nor by Degap. Works on one or more sequences, and the selected subrange.
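
The difference between the two gap characters can be shown with a small Python sketch. It is a simplified illustration (the function names are made up, and it works on a whole sequence string rather than a selection), following the description above: "~" is an unlocked gap, "-" is a locked gap, and Degap removes only the unlocked ones.

```python
UNLOCKED_GAP = "~"
LOCKED_GAP = "-"


def degap(seq: str) -> str:
    """Remove unlocked gaps; locked gaps are kept."""
    return seq.replace(UNLOCKED_GAP, "")


def lock_indels(seq: str) -> str:
    """Convert unlocked gaps to locked gaps."""
    return seq.replace(UNLOCKED_GAP, LOCKED_GAP)


def unlock_indels(seq: str) -> str:
    """Convert locked gaps back to unlocked gaps."""
    return seq.replace(LOCKED_GAP, UNLOCKED_GAP)


if __name__ == "__main__":
    aligned = "AC~~G-T"
    print(degap(aligned))         # the "-" survives
    print(lock_indels(aligned))   # now Degap would remove nothing
```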

Consensus -- generate a consensus sequence of the selected sequences. The Options/Seq Prefs... dialog modulates this function.

Translate -- translate to/from amino acid. This relies on Codon.prefs data, which can be changed for specific needs (see optional species-specific codon preference files).

Distance -- generate a distance or similarity matrix of the selected sequences. The Options/Seq Prefs... dialog modulates this function.

Pretty print -- a prettier view of a single or aligned sequences. Use these views to print your sequences. Printing from the editing display will not be supported fully, and may not print all of your sequence(s).

Restriction map -- Restriction enzyme cut points of selected sequence. Also protein translation options.

Dotty plot -- provide a dot plot comparison of two sequences.

Nucleic, amino codes -- These provide both reminders of the base codes, and a way to select colors to associate with each code. See below for some discussion of the two "aa-color" documents that now ship with SeqPup.


Single sequences - editing and features


The Edit sequence function opens a single sequence editing window for selected sequence(s). One can edit sequence bases here, change sequence name and perform some sequence manipulations and analyses.

A recent addition is Document and Features sections, along with the Sequence editing window. These sections are currently editable text. The format is not yet formalized but follows the specific sequence file format. Currently only Genbank and EMBL formats are parsed for documentation and features.

The features section includes sequence position information, as per

     GC_signal       115..122
     exon            <447..571
     CDS             join(447..571,1786..2005,3441..3554)

These positions will be read by the program when you highlight the text, then choose the commands in the Features/ menu. Mark on main view command will copy the selected position to the alignment window, erasing any other mark for that sequence. Add to main view command will copy the position, adding it to any other marks. This is most useful when the main view has a Mask level selected. One can add feature marks to different mask levels. Then one can pretty print the sequence and these marked features will be highlighted according to the current styles for those masks.
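
The position syntax shown in the features example can be read with a short regular expression. The following Python sketch is only an illustration of that syntax and is not how SeqPup parses it: parse_location is a made-up name, the "<" and ">" partial markers are accepted but dropped, and forms such as complement(...) or single-base positions are not handled beyond the start..end ranges they contain.

```python
import re

# start..end, where either number may carry a "<" or ">" partial marker
_RANGE = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def parse_location(text: str):
    """Return the (start, end) pairs found in a feature location string."""
    return [(int(m.group(1)), int(m.group(2))) for m in _RANGE.finditer(text)]


if __name__ == "__main__":
    print(parse_location("115..122"))
    print(parse_location("<447..571"))
    print(parse_location("join(447..571,1786..2005,3441..3554)"))
```

The positions are 1-based and inclusive, as in the GenBank and EMBL feature tables.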



Options and preferences

One can set various options which persist to later uses of the program. The Options menu includes several dialogs for these preferences, including for Sequence data functions, for Sequence Pretty print styles, for Base colors and styles, Codon table, Sequence Retrieval System (SRS) server.

Also among the options are dialogs to edit directly application and framework preference files. Generally you can ignore these, as other dialogs handle this. But some options don't yet have an easier interface for changing. One important one is the framework preference AWTs.clicksToActivate=1 which sets the number of mouse clicks to activate an icon button or other relevant item. Many people prefer clicksToActivate=2 (double-click).

For MSWindow and XWindow systems, the framework preference user.openurl is important. For Macintoshes, the equivalent is done using InternetConfig software.
See the above Installing section for details of setting the user.openurl preference.

An application preference with no current dialog choice is Adorns.backColor= 0xe8f0ff, which sets window background color (0xE8F0FF is a light blue).

Option files are stored in a system specific location as text files. One can edit them, when the application is not running, with a text editor. On Macintosh systems, the files are stored in System Folder:Extensions:MRJ Libraries: as dclap.prefs and seqpup.prefs files (when using the MRJ runtime). On MS Windows systems, they are stored in the C:\WINDOWS folder as dclap.ini and seqpup.ini. On Unix systems, these options are stored in ~/.xxxrc files, including .dclaprc for the framework prefs, and .seqpuprc for the application prefs.


Pretty Print configuration


This is the syntax for specifying style information used in the pretty print function.
This information is currently stored in the data/prefs/seqmasks.prefs file, and can be edited with the Options/Base style table... command, or with a standard text editor. Each Style label should now be prefixed with the mask level it applies to, as in

     mask1.style=bold underline uppercase
     mask2.style=italic box lowercase

Style tags for pretty print include

style=
bold - bold font
italic - italic font
underline - underline font
box - put a box line around selected mask region
uppercase - convert base to uppercase
lowercase - convert base to lowercase
invertcolor - invert the colors of the font and background
Use any combination of values for style, separated by space or commas

repeatchar=.
- use this if you want mult-align repeated chars set to a single character

fontname=
- set a valid computer font name, like Courier, Helvetica, Times, ...
fontsize=
- set point size of the font
fontcolor=
- set rgb color of the font, using 6 digit hexadecimal value, see sample values in table (e.g., 0xff0000 is red, 0x00ff00 is green, and 0x0000ff is blue, 0x000000 is black and 0xffffff is white, 0xaaaaaa is one shade of grey).

backcolor=
- set rgb color of the background behind font

boxstyle=solid
set the style of the boxing line
current values are dashed, dotted, solid, dark, medium or light

fillpattern=
- set the pattern used to draw the background color or fill. This
will allow "hatching" types of shades. Not well tested yet (mostly needs
printer output to see).
- set this with two 8-digit hexadecimal values (to create an 8x8
pattern array). You need to experiment with values to find a nice
pattern. An example is fillpattern=0xaa55aa55 0xaa55aa55
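
To help with that experimenting, a pattern value can be previewed as text before printing. This Python sketch is a guess at the layout, not taken from SeqPup: it assumes each of the 8 bytes is one row of the 8x8 array, with the most significant byte and bit first. The fill_rows name is made up for illustration.

```python
def fill_rows(value: str):
    """Expand a fillpattern value into 8 text rows of 8 cells each."""
    rows = []
    for word in value.split():
        number = int(word, 16)            # base 16 accepts the "0x" prefix
        for shift in (24, 16, 8, 0):      # four bytes per 8-digit value
            byte = (number >> shift) & 0xFF
            bits = format(byte, "08b")
            rows.append(bits.replace("1", "#").replace("0", "."))
    return rows


if __name__ == "__main__":
    for row in fill_rows("0xaa55aa55 0xaa55aa55"):
        print(row)
```

Under this assumption the example value gives a checkerboard, since 0xaa and 0x55 are alternating bit patterns offset by one position.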


Currently you can set four mask styles in this table. These should start with a header like the one below, but you can name it as you like. Lines starting with "#" or "!" are comments that are ignored. Style names starting with "mask1." are associated with the sequence alignment mask called "Select mask 1...". Start the names with "mask2." to associate them with "Select mask 2...", with "mask3." to use with "Select mask 3...", and with "mask4." to use with "Select mask 4...".

##----------------------------
##[mask1]
mask1.description=a test style 

## style values=bold,italic,underline,box,
##      uppercase/lowercase,invertcolor
mask1.style=bold uppercase box

## repeatchar -- use if you want mult-align repeated chars
mask1.repeatchar=. 

## font selection 
mask1.fontname=Courier 
mask1.fontsize=9
mask1.fontcolor=0xff0000     # red
mask1.backcolor=0x80e0e0     # lt.blue
mask1.boxcolor= 0x0000ff     # blue

## boxstyle values= dashed dotted solid dark medium light
mask1.boxstyle=dashed

## fillpattern= use 2 hex-long values for this 8-byte pattern
mask1.fillpattern=0x88228822 0x88228822 
mask1.fillpattern=0xaa55aa55 0xaa55aa55 
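
A table like the one above can be checked with a short script before SeqPup reads it. The Python sketch below is not SeqPup's parser; parse_masks is a made-up name. It groups the settings by mask prefix and skips "#" and "!" comment lines as described. Two points are assumptions: text after a "#" on a setting line is treated as a trailing comment (so a "#" cannot be used as a value here), and when a key is repeated, as fillpattern is in the sample, the last line wins.

```python
def parse_masks(text: str) -> dict:
    """Group 'maskN.key=value' lines into {maskN: {key: value}}."""
    masks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue                                  # comment or blank line
        key, sep, value = line.partition("=")
        if not sep or "." not in key:
            continue
        value = value.split("#", 1)[0].strip()        # drop a trailing "# red"
        mask, _, name = key.strip().partition(".")
        if name == "style":
            # style values may be separated by spaces or commas
            value = value.replace(",", " ").split()
        masks.setdefault(mask, {})[name] = value      # a repeated key overwrites
    return masks


SAMPLE = """\
##[mask1]
mask1.description=a test style
mask1.style=bold uppercase box
mask1.repeatchar=.
mask1.fontcolor=0xff0000     # red
mask1.fillpattern=0x88228822 0x88228822
mask1.fillpattern=0xaa55aa55 0xaa55aa55
"""

if __name__ == "__main__":
    print(parse_masks(SAMPLE))
```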



Color Selections


Base colors can be set for the alignment display and pretty prints. The preference file Color.prefs specifies color codes for each nucleic and amino base. It may be edited from the Options/Base color table... function, or directly with a text editor.

Currently color values are stored as hexadecimal codes. Each code is a 3-byte hex value holding the Red, Green and Blue (RGB) components in that order. 0xFF0000 is red, 0x00FF00 is green, 0x0000FF is blue. Future versions of the program should include a color picker interface.
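
Until a color picker exists, converting between a hex code and its red, green and blue parts is a matter of shifting bytes. A minimal Python sketch follows (the function names are made up for illustration):

```python
def parse_rgb(code: str):
    """Split a code such as 0xFF0000 into (red, green, blue), each 0-255."""
    value = int(code, 16)                 # base 16 accepts the "0x" prefix
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("not a 3-byte color code: " + code)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def format_rgb(red: int, green: int, blue: int) -> str:
    """Build the hex code for three 0-255 components."""
    return "0x%02x%02x%02x" % (red, green, blue)


if __name__ == "__main__":
    print(parse_rgb("0xe8f0ff"))          # the light blue window background
    print(format_rgb(0, 192, 192))
```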

A few early users of this new version provided color amino selections that ship with SeqPup. Here is one description.


Date: Mon, 7 Jun 1993 15:50:09 +0200
From: Heikki.Lehvaslaiho@Helsinki.FI (Heikki Lehvaslaiho)
Subject: aa colors

                           2 4 - b i t                M a c
COLOR           AA      R       G       B       R       G       B
---------------------------------------------------------------------
Magenta         AGPST   255     000     255     65535   0       65535
Black           BDENQZ  000     000     000
Red             C       225     000     000     57600   0       0
Blue            FWY     000     000     255     0       65535   65535
Light blue      HKR     000     192     192     0       49344   49344
Green           ILMV    000     192     000     0       49344   0
Gray            JOUX    145     145     145     37265   37265   37265


Internet



The Internet features of SeqPup let you interchange ideas and data with people and biocomputing services around the world. SeqPup includes a selection of network access features in the developing area of networked biocomputing.

Internet Sequence Search and Fetch


New in version 0.7 are (a) a client for the Sequence Retrieval System (SRS) to look up and fetch sequences from databanks like GenBank, EMBL, Swiss-Prot and PIR, and (b) a client for the NCBI-BLAST server.

SRS client


The SRS client lets you search Internet databanks for sequences based on key words in the documentation, such as title, accession number, locus name, organism, author, and other documentation. To learn more of the Sequence Retrieval System (SRS), see http://srs.ebi.ac.uk:5000/, or http://iubio.bio.indiana.edu/srs/, or others listed at http://srs.ebi.ac.uk:5000/srs5list.html.

Use the File/Open sequences from databank menu command (or the Fetch sequences button on About SeqPup) to access SRS servers. Type one or more key words to describe the sequences you want to view. Sequence titles are fetched for all matches (which may be hundreds or thousands) from the selected server, and displayed in an alignment document view. You then can fetch full data for specific sequences by active clicking the name, or choosing the Sequence/Edit command.

You can use boolean operators & (AND), | (OR), ! (NOT) to join several key words in a query to tailor your search. SRS servers offer searches by fields of data. The general field "all" searches all indexed fields; each databank offers a selection of fields such as organism, accession, title, comments, and so forth.

The current SRS client in SeqPup is fairly simple, and doesn't offer the rich range of options you will find via an HTML browser, but it does offer the direct step of loading sequence data from an Internet server into this sequence editor.

The Options/SRS setup dialog lets you set your preferred server, data libraries and data fields for a query.

NCBI-BLAST client


The NCBI BLAST server performs a sequence similarity search of GenBank and/or other sequence databanks, matching your sequence against published sequences. To learn more of BLAST at NCBI, see http://www.ncbi.nlm.nih.gov/

The current BLAST client in SeqPup is also fairly simple. It doesn't offer any more than an HTML browser, except for the direct step of loading sequence data from your sequence editor to the analysis server.

To perform a BLAST search, select a sequence entry in a document and choose the Sequence/BLAST@NCBI command, which opens a sequence edit view with BLAST option choices. You can edit the sequence here (without affecting the sequence in your main document). You can select the results document file (in HTML format, which will be opened by your preferred HTML viewer). There is an Options drop-down section; click the BLAST options triangle/arrow to open it. Choose which BLAST program to run and which data library to search. Both of these are sensitive to sequence context -- DNA and amino sequences have different selections. The Do BLAST button sends your sequence to the server at NCBI via HTTP, and saves results to the selected file, which will be displayed in your HTML viewer.




Child Tasks


The Externals menu lets you link SeqPup with external sequence analysis programs that you or others may write. SeqPup can be configured to run command-line style applications, sending them sequence data and command information. When the child program is finished with its analysis, SeqPup will display its results.

The current Externals menu has

When BOP servers are attached, their commands are added also to this menu.

The general design of child applications is taken to be data analysis programs that have a command-line user-interface, that take input data from a file or from the system "standard input" file (stdin), and that write outputs to files and to the two system standard files "standard output" (stdout) and "standard error" (stderr). This is how many existing analysis programs work, and it is straightforward to program this basic kind of interface.
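
The stdin/stdout/stderr pattern described above can be sketched as follows. This is a Python illustration only, not SeqPup's Java implementation. The helper name run_child is hypothetical, and the "child" here is a stand-in one-line script that reverses its input rather than a real analysis program.

```python
import subprocess
import sys

def run_child(argv, input_text):
    """Run a command-line program, feed it stdin, collect stdout and stderr."""
    result = subprocess.run(argv, input=input_text,
                            capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr

# Stand-in child program: reads a sequence from stdin, writes it reversed
# to stdout, and writes a status note to stderr.
child = [sys.executable, "-c",
         "import sys; seq = sys.stdin.read().strip(); "
         "sys.stdout.write(seq[::-1]); sys.stderr.write('done')"]

code, out, err = run_child(child, "ACGT\n")
print(code, out, err)
```

Keeping stdout and stderr separate lets the parent show results and error messages in different views, as the STDOUT and STDERR parameters in the command files do.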

The value of SeqPup joined with these kinds of programs is that the SeqPup can concentrate on providing an easy-to-use interface for biologists, and the analysis application can concentrate on data analyses, without having to add a lot of software to provide a humanly usable interface.

Many command-line biocomputing programs, including versions of Clustal, CAP, tacg, primer, FastA, and so forth can be added as Child apps or BOP remote services.

Which child applications?

I hope this new ChildApp/BOP method is general enough to let you add almost any command-line program. I'm still working on special cases like the Phylip package, which requires a structured command-file instead of command-line options. If you add any biocomputing programs that can be freely distributed with SeqPup, consider sending them, or the command configuration file, back to IUBio archive for addition to the general distribution.

On command-line systems, including Unix and MSDos/MSWin, you should be able to use any pre-compiled version of a program that runs in this command-line style. On Macintosh systems (command-line-less), you will need to compile a command-line program with the ChildAppJ.c main program source (see the data/methods folder). This allows SeqPup to send command line parameters using a file.


Configuring child applications


You can add new child apps to SeqPup by adding text files to the data/methods/ folder that have the suffix .command, include the string "Content-type: biocompute/command" at the top, and follow the syntax described below and given in example files. See especially the clustalw.command file.


The biocompute/command file syntax

The general biocompute/command file format is a nested list of
key = value
key = { structured value }

Newlines or ';' separate key=value pairs in a structure. Values that include white space need to be quoted with "" or ''.
Use backslash to escape special characters in a string, mainly tabs, newlines and such. A string can be continued on multiple lines using \ just before the line end. Enclose such a string in quotes.
A structured value (with subfields) needs to be enclosed in curly brackets {}.
The order of fields in a structure does not matter. Some fields are required, some are optional.
Strings in a string list in the value.list, menupath, resultsKinds and others can be separated with tab or | (pipe) or comma characters.
Comment lines starting with # are ignored.
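
To make these rules concrete, here is a minimal reader for the syntax, sketched in Python. It is an illustration under stated assumptions, not the ReadCommand.java parser: the names tokenize, parse_struct and parse_command are hypothetical, a "#" outside quotes is treated as a comment to the end of the line, keys are lower-cased because key words are case-insensitive, and the "demo" command in the sample is invented. Structures are returned as lists of (key, value) pairs because a key such as par can repeat.

```python
import re

TOKEN = re.compile(r'''
    (?P<comment>\#[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<punct>[{}=;\n])
  | (?P<word>[^\s{}=;\#"']+)
  | (?P<space>[ \t\r]+)
''', re.VERBOSE | re.DOTALL)

def unescape(body):
    """Resolve backslash escapes; backslash-newline continues the line."""
    def repl(match):
        ch = match.group(1)
        return {"n": "\n", "t": "\t", "\n": ""}.get(ch, ch)
    return re.sub(r"\\(.)", repl, body, flags=re.DOTALL)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        pos = match.end()
        kind = match.lastgroup
        if kind == "string":
            tokens.append(("value", unescape(match.group()[1:-1])))
        elif kind == "word":
            tokens.append(("value", match.group()))
        elif kind == "punct":
            tokens.append((match.group(), None))
        # comments and spaces are dropped
    return tokens

def parse_struct(tokens, i=0, nested=False):
    """Parse key=value pairs; return (list of pairs, next token index)."""
    pairs = []
    while i < len(tokens):
        kind, text = tokens[i]
        if kind in (";", "\n"):          # separators between pairs
            i += 1
            continue
        if kind == "}":
            if not nested:
                raise ValueError("unmatched '}'")
            return pairs, i + 1
        if kind != "value" or i + 2 >= len(tokens) or tokens[i + 1][0] != "=":
            raise ValueError("expected key = value near token %d" % i)
        key, (vkind, vtext) = text.lower(), tokens[i + 2]
        if vkind == "{":
            value, i = parse_struct(tokens, i + 3, nested=True)
        elif vkind == "value":
            value, i = vtext, i + 3
        else:
            raise ValueError("missing value for key %r" % key)
        pairs.append((key, value))
    if nested:
        raise ValueError("missing '}'")
    return pairs, i

def parse_command(text):
    lines = text.splitlines()
    if lines and lines[0].lower().startswith("content-type:"):
        lines = lines[1:]                # drop the header line
    return parse_struct(tokenize("\n".join(lines)))[0]

sample = r"""Content-type: biocompute/command

command = {
  id = demo          # hypothetical command
  menu = "Utilities|Demo"
  parlist = {
    par = { id = TITLE; value.title = "Demo \
tool" }
    par = { id = ALIGN; value.boolean = true; runSwitch = -align }
  }
}
"""

tree = parse_command(sample)
print(tree[0][0], dict(tree[0][1])["id"])
```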

The top level key is command = { various other key=value pairs }
Within a command, most of the fields are parameter lists (parlist = { list of pars } ) and parameters (par = { structure} ). All parameter values should include an id field, a value field, and can include a label field for display.

See bopper2.idl and ReadCommand.java for current key words (these may change).
Key words match fields in bopper2.idl, and are case-insensitive.
At this writing the key words are
commandKeys = { "id", "transport", "action", "filepath", "parlist", "menu", "command" };
parameterKeys = { "id", "label", "value", "ifelseRules", "runSwitch" };
containerKeys = { "required", "parlist" };
choiceKeys = { "multiple", "minToShow", "parlist" };
dataKeys = { "datatype", "dataflow", "filename", "flavor", "data" };

ID values are case-sensitive, unique strings. You reference IDs and other variables with dollar sign, as $ID. TITLE, INFO and HELP are special parameter ids.

The command key includes these subfields:
id = a unique string (required)
action = the command line to be executed, with runswitches of parameters to be substituted (required)
transport = local: for use on the same computer, bop: for bopper. This may be optional, and should be set by software
parlist = { list of parameters } (required)
resultKinds = string list of MimeTypes: text/plain, biosequence/fasta, ... (optional)
filepath = path to app on server (optional)
menupath = menu item name, with submenu path, e.g. "Utilities|Reformat" (optional)

The parameter key includes these subfields:
id = a unique string (required)
label = visible label (optional)
value.type = value (see below, required)
runSwitch = the command line string to be inserted in the action string. It is optional. This often includes the term $value, which is the special variable signifying the parameter value chosen by user. In the case of value.boolean types, this runswitch is set to null if the value is false.
ifelseRules = string list of rules to enable the parameter, based on things like protein or nucleic type of the input data; yet to be implemented (optional)
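
The way runSwitch strings feed into the action line can be sketched as below. This is a Python illustration of the substitution described above, not SeqPup's actual code. The function expand_action, the dictionary layout of the parameters, and the rule that a $ID inside a runSwitch expands to that parameter's value are assumptions.

```python
import re

VAR = re.compile(r"\$(\w+)")

def expand_action(action, params, variables=None):
    """Replace each $ID in the action with that parameter's run switch.

    params maps id -> {"value": ..., "runSwitch": ...};
    variables holds other names such as filepath.
    """
    variables = dict(variables or {})

    def switch_for(pid):
        par = params[pid]
        value, switch = par.get("value"), par.get("runSwitch")
        # No switch defined, or a boolean set to false: contributes nothing.
        if switch is None or value is False:
            return ""
        def inner(match):
            name = match.group(1)
            if name == "value":
                return str(value)
            if name in params:       # assumption: $ID here means its value
                return str(params[name].get("value", ""))
            return str(variables.get(name, ""))
        return VAR.sub(inner, switch)

    def outer(match):
        name = match.group(1)
        if name in params:
            return switch_for(name)
        return str(variables.get(name, ""))

    # Collapse the gaps left by empty switches.
    return " ".join(VAR.sub(outer, action).split())

params = {
    "INFILE": {"value": "clustalw.pir", "runSwitch": "-infile=$value"},
    "ALIGN": {"value": True, "runSwitch": "-align"},
    "TREE": {"value": False, "runSwitch": "-tree"},
    "BOOT": {"value": True, "runSwitch": "-bootstrap=$BOOTVAL"},
    "BOOTVAL": {"value": 1000},
}
line = expand_action("$filepath/clustalw $INFILE $ALIGN $TREE $BOOT",
                     params, {"filepath": "data/methods"})
print(line)
```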

Labels and values of a parameter are shown to the user in a dialog form. The values can be changed by the user, depending on type of value. Other parts of this description are mostly for the server's use in determining how to run a command-line program, and how to get and return data.


There are many variants of the value field. These are specified as value.boolean =, value.integer =, value.string = , and so forth. These match the ValueUnion structure of the bopper2.idl.

Primitive value types are value.boolean, value.integer, value.float, value.string.

value.title displays a non-editable string
value.url is an Internet URL

value.integerRange is a range of integers specified as "default,minimum,maximum,step value".

value.floatRange is a range of real values specified as "default,minimum,maximum,step value".
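
A range string of this form can be unpacked as in the following Python sketch. The class NumberRange, the function parse_range, and the check that the default lies within the limits are illustrative assumptions, not SeqPup behaviour.

```python
from dataclasses import dataclass

@dataclass
class NumberRange:
    default: float
    minimum: float
    maximum: float
    step: float

    def clamp(self, value):
        """Force a user-supplied value back inside the allowed range."""
        return max(self.minimum, min(self.maximum, value))

def parse_range(text, number=int):
    """Parse 'default,minimum,maximum,step'; pass number=float for floatRange."""
    parts = [number(part.strip()) for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected default,minimum,maximum,step")
    rng = NumberRange(*parts)
    if not rng.minimum <= rng.default <= rng.maximum:
        raise ValueError("default lies outside minimum..maximum")
    return rng

print(parse_range("10,1,100,1"))
print(parse_range("0.5, 0.0, 1.0, 0.1", float).clamp(2.0))
```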

value.data specifies a data file with these subfields (all required?):
datatype = mime/type of data, e.g. text/plain, biosequence/fasta
dataflow = input or output
filename = string value of file, e.g., 5srna.fasta
flavor = file flavor, from the set stdin, stdout, stderr, input, output, serverlocal
A final subfield, data need not be specified and is used in the interface to pass actual data. An example is
value.data={ dataflow= output; datatype= text/plain;
flavor= stdout; filename= clustalw-out.text; }


value.container = is a value that includes other parameters, and is displayed as a container of options to the user. It may be a required or optional container. It has the subfields:
required = true or false
parlist = { list of pars } (required)

value.choice = is a value that includes other parameters, often boolean options. It has the subfields:
multipleChoices = true or false
minToShow = minimum number of choices to be displayed
parlist = { list of pars } (required)

value.list = a list of strings to select from, separated with pipe or tab chars, e.g.
value.list= "AatII|AccI|AceIII|AciI|AclI|AflII"
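
Splitting such a string list takes one regular expression. The Python sketch below accepts pipe, tab and comma separators, as stated in the syntax notes; the function name split_string_list is hypothetical.

```python
import re

def split_string_list(text):
    """Split a string list on pipe, tab or comma, dropping empty items."""
    return [item.strip() for item in re.split(r"[|\t,]", text) if item.strip()]

print(split_string_list("AatII|AccI|AceIII|AciI|AclI|AflII"))
```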



For MacOS, there is limited support for AppleScript commands when using the MRJ java runtime. Use the word applescript as the action command:
action = "applescript text of script to run here"
Currently no objects are returned, but script results are printed to System.out


Example command file

This is an abbreviated version of the clustalw.command file.

Content-type: biocompute/command

command = {

id = clustalw
menu = "Sequence Alignment|Clustal multiple alignment"
filepath = data/methods

## server only config -- I think these values are set by
## software, but this may need fixing.
transport = local:
#transport = bop:

action= "$filepath/clustalw \
$INFILE $OUTFILE $ALIGN $TREE $QUICK $BOOT \
$GAPEXT $GAPOPEN $200 $220 $100 $PAIRGAP $KTUP \
$TOPDIAGS $WINDOW $PWGAPOPEN $PWGAPEXT $PWMATRIX \
$300 $221 $211 $212 $213"

parlist = {
par = {
id = TITLE
value.title = "Clustal W Alignment"
}

par = {
id = INFO
label = "About Clustal W"
value.title = "Clustal W - for multiple sequence alignment \
by Des Higgins and colleagues. Clustal W is a general purpose multiple \
alignment program for DNA or proteins."
}

par = {
id = main
label = "Clustal W - A multiple sequence alignment program"

value.container = {
required = true
parlist = {
par = {
id = HELP
label = "Help with Clustal W"
value.url = file://$filepath/clustalw_help
}

par = {
id = ALIGN
label = "Do full multiple align"
value.boolean = true
runSwitch = -align
}
}
}
} # end main

par = {
id = IOfiles
label = "Input/Output files"
value.container = {
required = false
parlist = {

par = {
id = INFILE
label = "Input sequences"
value.data = {
dataflow = input
datatype = biosequence/nbrf
filename = clustalw.pir
flavor = input
}
runSwitch = "-infile=$value"
}

par = {
id = OUTFILE
label = "Output aligned sequences"
value.data = {
dataflow = output
datatype = biosequence/gcg
filename = clustalw.msf
flavor = output
}
runSwitch = "-outfile=$value -output=GCG"
}

par={ id= STDOUT; label="Command output";
value.data={dataflow= output; datatype= text/plain; flavor= stdout;
filename= clustalw-out.text;}
}

par={ id= STDERR; label="Errors";
value.data={dataflow= output;datatype= text/plain; flavor= stderr;
filename= clustalw.err;}
}
}
}
}

par = {
id = treeoptions
label = "Tree options"

value.container = {
required = false
parlist = {

par = {
id= TREE
label= "Calculate NJ tree"
value.boolean = false
runSwitch = -tree
}

par = {
id= BOOT
label= "Bootstrap NJ tree"
value.boolean = false
runSwitch = "-bootstrap=$BOOTVAL"
}

par = {
id= BOOTVAL
label= "No. of bootstraps"
value.integer = 1000
}
}
}
}

## more option parameters here...

} # end parameters
} # end command




Side note: Prior versions of SeqPup used an HTML FORMS syntax. This has been replaced by a new syntax, with some misgivings, because programming effort to support HTML was much more costly, and this new syntax can be extended more easily to include features needed for biocomputing. The syntax evolved from this prior work and the GCG SeqLab configurations. It will be extended in the future to more fully cover needs for biocomputing programs. That may include adding back some of the HTML formatting options.



Bopper and Internet biosequence analyses


An Internet method of using "child apps" is now available with SeqPup. This allows one to run analysis programs on a remote computer, and interface with SeqPup's editor platform (fairly) transparently, as for the local child apps. This is made possible with a network protocol I've acronymed BOP (Biocomputing Office Protocol; obviously the acronym came first). The first version of BOP, written in 1996, was based directly on the POP internet mail protocol. BOP2 (Bopper2) uses a CORBA-based interface, and replaces the unfinished BOP1 methods.

Many command-line programs, including versions of Clustal, CAP, tacg, primer, FastA, BLAST, the Phylip series, fastDNAml, and so forth can be added as BOP services fairly simply.

One potentially popular use for this BOP interface may be to offer a simple-to-use client for Genetics Computer Group (GCG) command-line software. As of this writing, an example Bopper server for GCG software isn't quite ready, but will soon be.

If you are an administrator of GCG software for your institution and would like to test this experimental version of Bopper2 with GCG at your site, please let me know.

The configuration of apps on a server computer is essentially the same now as configuration of local child apps running from the SeqPup data/methods/ folder.

To install and configure a Bopper2 server, see the distribution software, in ftp://iubio.bio.indiana.edu/molbio/java/source/bopper2.tar.gz

To provide BOP services to SeqPup or other clients, follow these steps:

-- Install Bopper2 on a server computer. The current Bopper2 is based on a CORBA Interface Definition (IDL). It is implemented in Java, using the free Omnibroker ORB. It will potentially run on any system with a Java runtime, but has only been tested on Unix. The bopper2 distribution should include all Java source and classes needed to run it, excluding a Java runtime and the command-line programs themselves.

-- Configure bopper2 to add command-line programs. The same .command file syntax is used for local and remote external commands with SeqPup. But one may need to modify file path and perhaps other information for each specific system. See the data/methods/ folder in SeqPup for example .command files.

-- Run the bopper server and publish its access url. I hope to add some directory of bop servers mechanism, but that currently isn't available. The URL for the test bop server at IUBio archive is
iiop://iubio.bio.indiana.edu:7000/bop

Note the IIOP protocol specifier, which is a CORBA standard network protocol. "bop" is the name of this specific service. Other named services may be run at the same host:port.
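
The parts of such a service URL can be pulled apart with a standard URL parser. The Python sketch below is only an illustration of the host:port/service layout; the function parse_bop_url is hypothetical and does not contact any server.

```python
from urllib.parse import urlsplit

def parse_bop_url(url):
    """Return (host, port, service name) for an iiop:// service URL."""
    parts = urlsplit(url)
    if parts.scheme != "iiop":
        raise ValueError("expected an iiop:// URL, got %r" % url)
    return parts.hostname, parts.port, parts.path.lstrip("/")

host, port, service = parse_bop_url("iiop://iubio.bio.indiana.edu:7000/bop")
print(host, port, service)
```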





About the Java version


I have high hopes now for Java as a development language and toolkit, good for rapid development of complex applications, and for many other development uses. Writing Phylodendron and LoopDloop applications in early 1997 gave me a feeling that Java is ready for complex application development. Extending this to FlyNapp and SeqPup applications, which are quite complex, has shown me that Java does have the potential for rapid application development with a good app framework. See also the Genesis of Phylodendron.

These four biocomputing applications now share about 60 - 70% of their code. Improvements in one lead to improvements in the other in many cases, and that holds for future applications written with this framework.

This framework called DCLAP was started with the NCBI toolkit, a cross platform C toolkit on which Entrez, Sequin and other apps from NCBI are written (Thank you to Jonathan Kans and colleagues at NCBI for this wonderful, free toolkit). On top of this toolkit I wrote a C++ framework which is meant to handle much of the basic application chores such as document opening, saving, doc and window management, menu and command management, etc. With the advent of Java as a C++-like language that has broad support and funding of tools from the commercial sector, but which also is available in free form at its basics, it looked like a good underpinning for rapid, cross-platform app development. However, neither NCBI toolkit nor Java nor other sources provide the kind of application framework freely that makes it quick and easy to produce robust, easy to use, full featured applications for the biosciences. As I write new applications, I aim at improving such a framework so that the next application can be written more quickly than the last. The source code for this framework, in C++ and now its beginnings in Java, is available freely to others for scientific application development. The current Java version of DCLAP is preliminary, and will change significantly over coming months as it is more fully converted to use Java version 1.1. However I find it now very helpful in producing new applications, and hope that other programmers may also find it useful.

Developers will find the source code for this application and others in the iubio:/molbio/java/source folders.



Source code and DCLAP

This describes the pre-Java version of DCLAP (still available).

The C++/C source code for the prior version is at iubio:/util/dclap/source/

SeqPup is built on an object-oriented application framework, originally written in C++, called DCLAP. This framework is designed to speed the development of easy to use, complex programs with a rich user-interface. At this point, DCLAP is an unfinished framework. It is lacking in documentation. However, it is complete enough to build complex programs like SeqPup.

DCLAP includes the following segments
DClap/ -- basic application framework, including command, control, dialog, file, icon, list, menu, display panel, table view, mouse tracker, child application, window and view classes.
Drtf/ -- rich text display handlers, including RTF, HTML document, PICT and GIF image format readers.
DNet/ -- Internet connection tools, including TCP/IP, SMTP, Gopher and preliminary HTTP classes.
DBio/ -- Biocomputing methods, including biosequence, restriction enzyme, sequence editor, seq. manipulator, seq. output classes.

New applications can be built to employ and reuse these classes fairly quickly. Variations on the current methods are simple to add in the class derivation method of C++. For instance, new document formats can be added on the Drtf display objects, and new sequence manipulations can be added in the biosequence handlers, by building on current methods.

DCLAP rests upon the NCBI toolkit, including the Vibrant GUI toolkit, which is designed for cross-platform functioning. The successful genome data browser Entrez is written with the NCBI toolkit.

All of this source is available without charge for non-profit use (see copyright). The NCBI toolkit portion is further available for profit use, and such arrangements may be made for use of DCLAP.

DCLAP will never compete with commercial programming frameworks, but it has the virtue of being freely available and redistributable, and includes support specifically for biocomputing applications. If you are undertaking a biocomputing project requiring a rich user interface, and wish it to run on multiple computer platforms, this may be a worthwhile choice, especially if you wish to redistribute your source code for the benefit of the scientific community.

The DCLAP developer archive is at ftp://iubio.bio.indiana.edu/util/dclap/
Please contact Don Gilbert for further information on using this framework in other applications.




Comments, Copyright, Bugs and History


Comments

Problems and shortcomings of this software are the responsibility of Don Gilbert, to whom any correspondence regarding problems should be addressed. Comments, bug reports and suggestions for new features (see below) are very welcome and should be sent via e-mail to SeqPup@Bio.Indiana.Edu

With any bug reports, I would appreciate as much detail as is reasonable without putting you off from making the report. If you don't have time to send detailed descriptions of problems, please do send comments and reports, even if all you say is "Good" or "Bad" or "Ugly".

Please include mention of computer hardware, and operating system software, including version. Describe how the problem may be repeated, if it is repeatable. If it is sporadic or only seen once, please also describe actions leading up to it. Include copies of data if relevant.

If you need to use land mail, mail to

Don Gilbert
Biocomputing Office, Biology Department
Indiana University, Bloomington, IN 47405


Copyright


This SeqPup program is Copyright (C) 1990-1997 by D.G. Gilbert.
All Rights are reserved.

gilbertd@bio.indiana.edu
Biology Dept., Indiana University, Bloomington, IN 47405

You may use this program for your personal use, and/or to provide a non-profit service to others. You may not use this program in a commercial product, nor to provide commercial service, nor may you sell this code without express written permission of the author.
You may redistribute this program freely. If you wish to redistribute it as part of a commercial collection or venture, you need to contact the author for permission.

The source code to this program is likewise copyrighted, and may be used, modified and redistributed in a free manner. Commercial uses of it need prior permission of the author.

Any external applications that may be distributed with SeqPup are copyrighted by their respective authors and subject to distribution provisions as described by those authors. At present this includes ClustalW, by Des Higgins and colleagues; CAP, by Xiaoqiu Huang; and FastDNAml, written by Joseph Felsenstein with modifications by Gary Olsen, Hideo Matsuda and Ross Overbeek, which is copyrighted by University of Washington and Joseph Felsenstein.

Distribution of external analysis applications with this program is done as a convenience for users, and in no way modifies the original copyright. If there is a problem with this, instructions to users for obtaining and installing external applications will be substituted.

No warranty, express or implied, is provided with this software. The author is trying to produce a good quality program, and will incorporate corrections to problems reported by users of it.


Bugs


v0.8 [java] -- Known bugs and missing features:

General
- view size-sensitive window scroll bars are used in several windows. These may not yet work fully and seamlessly. Views may be shifted above the scroll area, or scroll bars may not show up as they should when views extend beyond the window. Resizing the window will often cure these problems.
- drop down boxes are used extensively in the dialog windows, to hide/show selected information. Currently when a box is dropped down by clicking its drop arrow, the box isn't resized/displayed fully. One needs to resize the window with a mouse drag to get it displayed fully.
- appMenuBar needs work -- menus not showing in new doc (mrj), other bugs
- context sensitive menus not always properly sensitive to context (disabled when they should be enabled, or vice versa).
- undo/redo isn't working in some cases where it should, but does work in several cases. repeated undo/redo generally doesn't work, while first level undo often does.
- copy/cut/paste functions may not be working as smoothly in as many contexts yet as they should be.
- clipboard use (via copy/paste) and display needs work; export of clipboard to other apps not yet supported ( will happen when converted to jdk1.1)
- window menu doesn't always list items (java runtime/os variable)
- preference editing needs improved user interface

Sequence functions
- not yet ready : Restrict map, Dot plot, Nucleic & Amino codes pictures.
- Consensus overwrites first sequence in selection w/ cons as well as appending cons seq at end of list
- mask menu items not always enabled when mask views are selected (context sens. menu bug).
- find bases not ready; find ORF may be okay but needs testing
- sequence file reading and writing (readseq functions) still need testing and may well have bugs. Interleaved formats NEXUS/Paup, Phylip are not debugged. New formats await adding.
- single sequence editor is slow for long sequences
- sequence manipulation functions for single sequence window may not be ready.

- feature table parsing is preliminary; expect it to be improved in future releases.
-- saving feature/document info associated with sequence works now only for genbank and embl formats; cross-saving is still problematic (embl->genbank and vice versa)
-- editing feature/doc info in the single sequence windows should work but needs testing and may have bugs
-- using feature ranges to mark up masks and pretty prints, while now possible, is still too awkward a process; I hope to make this essentially automatic in later releases.
- changes to prefs such as codon prefs, color and style prefs may not be stored (edit these files w/ external text editor if need be).

Child/external applications
- v0.8 1st java version supporting these, with new interface. Undoubtedly there are bugs and missing features.
- remote external app interface is CORBA designed, using OmniBroker ORB. Interface def. will change. Current system doesn't well support user/password logins (primarily server-end problem)
- file handling for child apps still a problem -- where to put temp/results files
- need more sequence input checking, handling of specific child app needs
- need if/else handling of dialog items w/ respect to input seqs (prot vs nucleic)

Internet functions
- SRS functions need more testing and debugging;

Macintosh specific:
- MRJ 2.0 seems to have problems that other JRE's don't (besides slowness).
-- The work around with several window display problems is to resize the window a bit (grab size box and drag some) to get it to display properly.
-- Menus disappear when a new window is opened. The work-around is to select another SeqPup window then switch back to the new window, and menus appear.
-- scrolling dialog windows don't update on scroll -- esp. ones w/ dialog items.
-- text document display is horribly slow for any text longer than 20-30 lines

MS Windows specific:
- window menu may be non-functional, or picks wrong window
- functions that depend on window list (close at least) not working properly

XWindows specific:
- the BLAST dialog seems to send only a few bases to NCBI server

Fixes in v 0.8a
xx- printing directly now supported (java 1.0 missing feature; will happen when converted to 1.1). Still needs work (printing graphics works, printing a TextDoc fails).
x?- mswin & xwin: menu command keys may now be supported (XWin test shows them but non functional?)
xx- remote data analysis via biocomputing office protocol (BOP) now supported
xx- child apps now supported
xx- seqed/editable text areas now wrap
xx - mac - window location preferences are not used (seems fixed in MRJ2).
xx - save selection command fixed.

Fixes in v0.8b
xx - fix for mswindows file:/// url prefix ?
xx - improved readseq file format detection (I hope!)
xx - alignment editing of sequences enabled


Coming Features


Somewhat further on, I'd like to make SeqPup a bean-box, capable of incorporating new functions using the JavaBeans technology. My hope, though I don't know if it is feasible in my programming time frame, is that this bean interface will be simple enough that an average interested biologist could put together a data analysis function in Java and add it to SeqPup without having to spend a lot of time learning programming and software development methods. There are suggestions that Java will become a more ubiquitous and easier to use language than the combination of C, C++ and Perl, which are often used now for various biocomputing analyses.



History


SeqApp was started Sept. 1990 as a MacApp sequence editor/analysis platform on which analysis programs from other authors, typically command line w/ weak user interfaces, could be easily incorporated into a useable Mac interface.


January 1998: version 0.8 release

+ Update to Java version; C++ version no longer updated
+ Bopper2 remote/local interface to command-line applications added. This is based on CORBA standard. It is experimental; the interface will likely change (improve I hope). But it has the basic functionality needed to attach local or remote network command-line style applications to this program. It is user-configurable (with help and better documentation).
+ 1 Feb 98 - added color picker, background color command, base color prefs dialog, sequence styles dialog. corrected several pretty print problems.

August 97: version 0.7b release

+ First java - based version

June/July 96: version 0.6d release

+ "bopper" Internet protocol for client/server use of command line programs such as the GCG suite.
+ autoseq base calling app for reading ABI and SCF sequencer trace file data, plus base/trace editing functions.
+ Started expanding maximum sequence limit to 2 megabases (from about 30Kb), however most functions beyond viewing will still fail for >30Kb sequences.
+ Several bug fixes are included for mac, mswin, unix. Added background color in align view, minimum ORF size pref, improved tracking of changed data, improved align editing, save pretty print to PICT or text; fixed child app bugs; fixed mswin edit truncation to 255 bases; editable data tables in selection dialogs


Jan. 96: version 0.5 of SeqPup.

fixed Save file in place -- now saves in proper folder, not in seqpup folder
improved seqpup folder path finding:
- MacOS: now should always find :tables:, :apps: if they exist w/in SeqPup folder, and prefs paths are relative (e.g., apps=apps, tables=tables in .prefs)
- UnixOS: now can 'setenv SEQPUPHOME /path/to/seqpup/folder'
- MSDOS : ditto with 'set SEQPUPHOME=c:\path\to\seqpup'
NOTE: must use "APPNAME"HOME, so if you change name of SeqPup to PeekUp, you need to change env var to PEEKUPHOME.
added click-top-index-line to mask sequence column (only when sequence mask mode 1..4 is selected in main window popup)
added mask-to-selection, selection-to-mask commands -- mask-to-selection is not yet useful because base DTableView selection methods need to be rewritten to allow disjoint selections.
added seq-index display -- lists base number that mouse pointer is at
added mac file bundle rez & finder-open, finder-print
added save of pretty print in PICT format (mac), metafile (mswin - still buggy?!)
added variable position grey coloring in align display
added mswin/xwin sticky menubar window
fixed mswin mouse-shift commands
added mswin menu command keys
fixed mac/mswin text edit command keys: cut/copy/paste
many updates to mswin version for Microsoft win32/winnt/win95
updated fastdnaml child app to new version 1.1.1
added configurable child-app launch parameters
-- dialogs in HTML.form format; needs more work, additions
added dna distance/similarity matrix function
added child apps: DeSoete's LSADT, Felsenstein's DrawTree & DrawGram


July 95: Version 0.4 of SeqPup.
This includes most of the features of its ancestor SeqApp. Alignment window: shift & slide sequences, copy/cut/paste/undo sequence entries among windows; Restriction maps and pretty print output; useable child apps for mac, mswin, and unix.

v0.4 corrections:
- File/Open for non-sequence data (text, rtf, etc.) has alternate open menu, to distinguish from sequence data. Added sequence append-open.
- Cut/copy/paste/undo for align-seq view now available
- Sequence menu items that are now ready: Consensus, Pretty print, Restriction Map, nucleic & amino codes. Some of these need further work (pretty, remap options).
- Child apps usage improved, may need more work though.
- The Mac/68K, Mac/PPC, MSWin, Unix now do Child applications.
- Includes ClustalW, CAP, and FastDNAml child apps
- Restriction map function is extensively revised and improved.
- FindORF and Find string functions added
- Printing for pretty print, r.e.map now functional on Mac (and maybe MSWin)

v0.4 Known bugs and missing features (see above Bugs section for fuller list):
- Character editing (unlocked text) in the alignment (main) window is not working on Xwindow systems, and may be buggy in MSWindow and Mac systems.
- Single sequence editor (Sequence/Edit) is very slow for long sequences (6,000 bases)
- Sequence menu items not yet ready : Dot plot.
- Child Apps fail in various ways on MSWindows and Unix systems.
-- CAP seems most likely to succeed completely.
-- ClustalW and FastDNAml may be launched and run properly, but SeqPup will fail to automatically open their results files.
- MSWindows and XWindows versions are less stable than Mac versions.
- XWindows versions reliably crash/core dump when Quit is chosen. This is an annoyance but doesn't seem to impair use.
- Internet menu needs testing & reworking - I haven't tested any of the e-mail services listed since last year.
- Nucleic codes picture shows PICT processing bug -- misplaced text, and an error in biology -- complement of W is W, not S, and complement of S is S, not W.
- Repeated copy/cut/paste of the alignment window entries might cause problems. Please let me know if you see this.
- There is no printing for X Window systems.
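
The nucleic codes bug noted in this list (complement of W is W, and complement of S is S) follows from the standard IUPAC ambiguity codes: W means A or T and S means G or C, so each is its own complement. The sketch below is only an illustration of that table in Java; the class and method names are hypothetical and are not taken from the SeqPup source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of SeqPup: IUPAC nucleotide complement table.
public class IupacComplement {
    private static final Map<Character, Character> COMPLEMENT = new HashMap<>();

    static {
        // Each code in 'from' is complemented by the code at the same index in 'to'.
        // W (A or T) and S (G or C) map to themselves.
        String from = "ACGTURYSWKMBDHVN";
        String to   = "TGCAAYRSWMKVHDBN";
        for (int i = 0; i < from.length(); i++) {
            COMPLEMENT.put(from.charAt(i), to.charAt(i));
        }
    }

    // Complement one base, preserving case; gaps and unknown characters pass through unchanged.
    public static char complement(char base) {
        Character c = COMPLEMENT.get(Character.toUpperCase(base));
        if (c == null) {
            return base;
        }
        return Character.isLowerCase(base) ? Character.toLowerCase(c) : c;
    }

    // Reverse-complement a whole sequence.
    public static String reverseComplement(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            out.append(complement(seq.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(complement('W'));
        System.out.println(complement('S'));
        System.out.println(reverseComplement("AWSG"));
    }
}
```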

21 Mar 95: Second release of SeqPup, version 0.1.
This release has more parts of the SeqApp program put into it. This includes some alignment view manipulations, limited use of child applications, some undo-able commands, choosing data tables for colors, codon and r.enzymes. This release also includes much of the basics of GopherPup, including display of RTF, HTML, PICT, GIF document formats. However there is still some work to be done to let you open these w/o interpreting them as sequence data.
This release has just Mac PowerPC (SeqPup/PPC) and Mac 68000 processor (SeqPup/68K) versions. When more of the basic bugs are worked out, I'll try Sun and MSWindows versions.

v0.1 Known bugs/missing features:
- Use of character editing (unlocked text) in the alignment (main) window will lead to a crash after a few windows have been opened/closed or other manipulations performed.
- File/Open for non-sequence data (text, rtf, etc.) may well mistakenly identify them as sequence data. File/New is probably not doing anything useful, or bombing.
- Single sequence editor (Sequence/Edit) is very slow for long sequences (6,000 bases)
- Single seq. editor may be failing in various ways (I've not looked at it carefully yet).
- No cut/copy/paste/undo for align-seq view yet (coming soon I hope).
- Internet menu needs reworking - I haven't tested any of the e-mail services listed there since last year.
- Sequence menu items not yet ready : Consensus, Pretty print, Restriction Map, Dot plot, nucleic & amino codes.
- Child apps usage needs more development to work smoothly.
- The Mac/68K version fails when using Child applications.
- Only the ClustalW child app is ready for distribution (may have FastDNAml, CAP, and DNAml soon -- let me know of programs you would like to see here).

1 Mar 94: First public release of SeqPup, version -1.
It has plenty of bugs and missing features, including:
no Undo (this is a real bite to those used to it)
mostly no cut/copy/paste/clear
limited printing of documents or views
mostly no align-view manipulations (move,cut/copy,edit in place, shift, ...)
no pretty print views
no restriction maps
no dot plots
no ...
problems w/ window display & keeping track of active window (x,mswin)
I'll be adding back many of these features from the Macintosh SeqApp as time permits.

SeqApp 12+ June 93, version 1.9a157+
a semi-major update, and time extension release with various enhancements and corrections. These include
-- lock/unlock indels (alignment gaps). Useful when sliding bases around during hand alignment, to keep alignment fixed in some sections.
-- color amino (and nucleic) acids of your choice.
-- added support for more sequence file formats: MSF, PAUP, PIR. SeqApp now relies on the current Readseq code for sequence reading & writing.
-- save selection option to save subset of bases to file.
-- addition of the useful contig assembly program CAP, written by Xiaoqiu Huang.
-- major revision of preference saving method (less buggy, I hope)
-- major revision of the underlying application framework, due to moving from MacApp 2 to MacApp 3.
-- fixed a bug that caused loss of data when alignment with a selection was saved to disk.

5 Oct 92, version 1.8a152+ -- a semi-major update with various enhancements and corrections. These include
- corrections to the main alignment display,
- improvements to the help system,
- major changes to the sequence print-out options,
-- including addition of a dotplot display (courtesy of DottyPlot),
-- a phylogeny tree display (courtesy of TreeDraw Deck & J. Felsenstein's DrawTree),
-- improved Pretty Print, which now has a single sequence form and a better aligned sequence form,
-- improved Restriction map display,
- addition and updating of several e-mail service links,
-- including Blast Search and Genbank Fetch via NCBI,
-- BLOCKS, Genmark, and Pythia services,
- updated Internet gopher client (equal to GopherApp),
- editable Child Tasks dialogs
- addition of links to Phylip applications as Child Tasks
- addition of Phylip interleaved format as sequence output option

11 June 92, version 1.6a35 is primarily a bug fix release. Several of the disastrous bugs have been squashed. This version now works on the Mac SE model, except for sendmail. No new features have been added.

7Jun92, v. 1.5a?? -- fixed several of the causes of mysterious bombs (mostly uninitialized handles), link between multiseq and 1-seq views is better now, folded in GopherApp updates, death date moved to Jan 93.

25Mar92, v1.5a32 (or later). First release to general public. Includes Internet Gopher client. Also released subset as GopherApp for non-biologists.

4Mar92, v 1.4a38 -- added base sliding in align view. Bases now slide something like beads on an abacus. Select a section with mouse, then grab section and shift left or right. Gaps are inserted/removed as needed. For use as contig aligner, still needs equivalent of GCG GelOverlap to automatically find contig/fragment overlaps.

Also added "Degap" menu item, to remove "." and "-". Fixed several small bugs including Align pretty print which again should display.

2Mar92, v 1.4a19 -- fixed several annoying bugs, see SeqApp.Help, section on bugs for their resolution. These include Complement/Reverse/Dna2Rna/ Translation which should work now in align view; Consensus menu item; entering sequence in align window now doesn't freeze after 30+ bases; pearson/fasta format reading; ...

10Feb92, v 1.4a6 -- fix for Mac System 6; add Internet service dialogs for Univ. Houston gene-server, Geneid @ BU, Grail @ ORNL; correct About Clustalv attribution.

5Feb92, v 1.4a4 -- limited release to network resource managers, clustalv authors, testers.

Vers 1.4, Dec91 - Feb92. Dropped multi-sequence picker window, made multi-align window the primary view (no need for both; extra confusion for users). Added pretty print, restriction map, sequence conversions. Generalized "call clustal" to Hypercard-like, System 7 aware menu for calling external tasks. Fleshed out internet e-mail objects, added help objects, window menu, nucleic/amino help windows. Many major/minor revisions to all aspects to clean out bugs. Preliminary release to a limited set of testers (1.4a?).

Vers. 1.3, Sept - Dec91. Modified clustalv for use as external app (commandline file, background task, ...). Added basic Internet e-mail routines and a call clustal routine (preliminary child task). Many major/minor revisions to all aspects to clean out bugs.

Jun91-Aug91: overwork at other tasks kept SeqApp on back burner.

Mar91-Jun91: not much work on SeqApp, fleshed out TCP methods (UTCP, USMTP, UPOP).

Feb 1991, vers 1.2? made available to Indiana University biologists and NCBI biocomputists.

Vers. 1.1, Oct 1990, multiple sequence picker and multiple sequence alignment window, including colored bases, added to deal with alignment and common multi-sequence file formats.

Version 1, Sep 1990. Single sequence edit window + TextEdit window, from MacApp skeleton/example source + readseq.