From usenet.ucs.indiana.edu!sol.ctr.columbia.edu!spool.mu.edu!agate!ames!bionet!SALK-SC2.SDSC.EDU!mangalam Tue Sep 8 19:34:15 EST 1992
Article: 613 of bionet.software
Path: usenet.ucs.indiana.edu!sol.ctr.columbia.edu!spool.mu.edu!agate!ames!bionet!SALK-SC2.SDSC.EDU!mangalam
From: mangalam@SALK-SC2.SDSC.EDU
Newsgroups: bionet.software
Subject: Reviews of Seq Analysis Progs (longish)
Message-ID: <9209082318.AA02348@genbank.bio.net>
Date: 8 Sep 92 23:18:28 GMT
Sender: daemon@genbank.bio.net
Distribution: bionet
Lines: 740

Harry Mangalam               Vox:(619) 453-4100, x250
Dept of Biocomputing         Fax:(619) 552-1546
The Salk Institute           mangalam@salk-sc2.sdsc.edu
10010 N Torrey Pines Rd      mangalam@salk-sgi.sdsc.edu
La Jolla CA 92037            mangalam@salk.bitnet

Greetings, Netlandos,

In response to the recent spate of advisories, queries, and warnings about sequence analysis programs (SAPs), I thought I'd muddy the metaphoric waters further by throwing my own $.02 into the ring. I work in a mostly Mac environment, hence the emphasis is on Mac software. A few PC packages are included mostly because they are useful enough to warrant putting up with the hassle of transferring and modifying files. I have recently begun to work in an X window situation and therefore a few X Window programs are also included.

Notably absent from this review is Steve Smith's Genetic Data Environment (GDE), but I will try to get to it Real Soon Now. It is, for those who don't know, an extensible X window app for molecular biology whose server program (client, in the bizarro world of X) runs on Sun SPARC hardware (and now on SGI, thanks to Anthony Persechini). But enough of GDE for now.
If you're looking for a program that will do everything from restriction maps to multiple alignments, will make only a polite dent in your wallet, is well-debugged, takes advantage of the latest network services, co-exists peacefully with the rest of your applications and is easy to use, you will have to search beyond the narrow confines of this planet.

Many of the packages mentioned below are commercially produced and thus have to make a profit. Because of the restricted market for sales (as opposed to a general purpose package like a spreadsheet or word processor), they must be relatively expensive and many address the all-too-common practice of software piracy by requiring the presence of a hardware lock. Those that are freeware obviously cannot support the level of debugging and support of a commercial program; however, there are some that are of surprisingly high quality.

The following notes are not meant as an exhaustive, objective review. They are my opinions (and a few measurements) based on having used the programs or the demo versions and are obviously biased by my approach to various problems - what I may dismiss without a thought may be a critical determinant for someone else. And certainly, don't take my word for it - I've (fitfully) tried to include the email addresses of people who have taken opposing views. Also, while I have made a reasonable attempt to be accurate, features, updates, and prices change so often in this field that like many reviews, this one is sliding into obsolescence as you read it. All prices are approximate (and sometimes negotiable, especially at the end of a fiscal period).

Consider this a work in progress - as time allows, I plan to post reviews of other packages not covered here and increase the detail of the reviews, but for now a quick overview will have to do. I invite comments, corrections, and flames and will certainly post explanations, expansions, apologies, and retractions if they are warranted.

Additional Sources of Information on SAPs:

You can also search the archived biosci postings for additional information by WAIS and gopher. One gopher path is:

   Title: BioSci-Bionet-News.src
   Host:  fly.bio.indiana.edu
   Path:  waissrc:/Other-Bio-Gophers-Etc/Wide-Area-Info-Servers/BioSci-Bionet-News.src
   Type:  Query

There are additional reviews on sequence analysis software, notably by Peter Markiewicz, available from the Bio-archives. His review (titled pm-macinmolbio.txt on the Indiana archives) has a very good introduction and covers more ground than this review. I highly recommend it.

Dan Jacobson (danj@welchgate.welch.jhu.edu) recently posted a more extensive review of public domain primer/oligo analysis programs, including chunks from their documentation. You can access this via the gopher mentioned above.

Disclaimer:

The views expressed below are my own. I have not received any considerations, monetary or otherwise, from any of the entities mentioned here. I have acted as an uncompensated beta tester for the "Sequencher" program and for a module of DNASTAR, and a friend (Lisa Caballero) wrote much of the guts for IBI's AssemblyLign (a competitor to Sequencher, incidentally).

In most cases of freeware, I have included the author's email address; these should be used sparingly - you should first try the appropriate archive, read the included documentation, and only as a last resort or to report a bug, contact the author - let them keep working to keep bringing us these programs.

At the end of the text is a table that compares a (very) few of the execution times for some of the programs that do approximately the same things.

Opening Diatribe:

Crippled demos are a lousy idea - DNASTAR, whatever you might think about their corporate leadership or programs, has implemented the correct introductory strategy - a 60 day free trial of the full, everything-enabled, working program; after 60 days, the program irreversibly suicides.
It is a rare demo that gives you a good feel for the program when you can't save your work, or print, or import your own data. There are some exceptions (see the blurb for Gene Construction Kit below), but in general, crippled demos are not worth the floppies they rode in on. Rather, see if you can get the company to give you a 30 or 60 day trial period.

High Quality Freeware/Shareware:

The Don Gilbert Collection:

{I nominate Don Gilbert for the BioGNUdos Prize (apologies to Dan Jacobson for the nested pun), awarded annually to the author of the most useful free software for the Biological Sciences.}

Just about everything I have ever tried by Don Gilbert (gilbertd@sunflower.bio.indiana.edu - also keeper of the Indiana Archives) has been exceptionally useful. This includes:

- DottyPlot, a diagonal comparison program that plots identities or similarities between 2 easily input sequences. You can magnify the area of interest and save the output for further evaluation or as a PICT file for inclusion into a graphic. DNA Strider 1.2 now provides a very similar comparison, but to my surprise, DottyPlot is faster by quite a bit. Using proteins of 789 aas (M11969) and 1127 aas (M69238), I measured the following times on a MacIIci:

                         DottyPlot    DNA Strider 1.2
   Window:15, Match:7      7.5"           27"
   Window:9,  Match:4      9.5"           32"
   Window:7,  Match:3     13.5"           18"

- READSEQ, a sequence format converter that is unsurpassed in flexibility and portability.

- GopherApp, a Mac version of the U of Minn gopher protocol that allows you to attach the computer resources of the Internet to your Mac as sort of an extended hard disk. Not surprisingly, DG's BioGopher hole at Indiana is one of the best, with his gopher access to Genbank beating most of the CD-ROMs on our local network ("but", he whined, "when are you going to implement Boolean searches?"). This is one of those programs that is so useful that it is worth buying a Mac (and ethernet card) for.

[Another aside - assuming you are within reach of an ethernet backbone, the single most cost-effective piece of equipment you can buy for your Mac or PC is an ethernet card. For ~$200, you get (almost) instantaneous access to terabytes of helpfully sorted information, reasonably supported software, BBSs, e-mail, etc.]

- loopDloop, a visual RNA secondary structure editor, sort of like Canvas specifically for RNA. It takes as input the output from the Zuker RNA folding programs and helps you turn it into quite pretty figures.

- SeqApp, an Internet aware, extensible, multiple sequence editor and analysis package. This is what sequence analysis packages of the future will look like if they want to sell. It is still in 'alpha' testing, and as such, is still rough around the edges, but it is definitely the shape of things to come. From within this program, you can send and receive mail via a POP mailer, send off sequences for FASTA, BLAST, GRAIL, and GeneID searches, retrieve Genbank sequences, initiate gopher sessions, and interconvert sequence formats, as well as run a number of the usual sequence analysis functions. And, if the function you want is not included, you can also add your own (clustalv, a multiple sequence alignment program, is included as an example). As well, there is an almost-hypertextual help system and, possibly the most responsible gripe reporter available, instant mail to the author from within the program. A warning - because of its neonatal state, it's not yet ready for those who need to be spoon fed; as DG says himself, "expect it to fail in many ways." However, if you are reasonably Mac-fluent and could use some of the tools that _do_ work, I highly recommend it.

Other Useful Freeware/Shareware:

Primer Analysis Programs:

- Amplify by Bill Engels (WREngels@wisc.macc.edu). A native Mac application to do oligo/primer analysis.
Not quite as glitzy or as full featured as Oligo (National Biosciences Inc, 800 747 4362, for those of you with HHMI funding), but then it doesn't cost $800 ($640 nonprofit) either. It will, given oligos, search a target sequence for near matches and graphically display the results of using the various primers. It will test the oligos for matching sequence and examine them for internal repeats but will not search for the best primers to use given a target sequence and a set of conditions as will Primer and OSP (see below).

Two other freeware primer/oligo programs spring to mind, both of them more capable than Amplify, but both harder to use.

- Primer (by Stephen E. Lincoln, Mark J. Daly, and Eric S. Lander; primer@genome.wi.edu, FAX 617-258-6505) was one of the first of these types of programs and has been ported to just about every platform that you'd ever find (although the DOS version suffers from the memory limitations of that purported OS and no one yet has posted a DJGPP-compiled or otherwise 'DOS-extended' version that would break the normal DOS memory limitations). It is also one of the most capable, containing just about every feature that you'd want in an oligo program, from testing oligos to looking for them, under a very large number of conditions. It is, for portability, a command line-driven program that requires that you know how to use an external editor to make up or alter data files.

- OSP was originally an X window app whose authors (LaDeana Hillier, lfw@elegans.wustl.edu, and Phil Green, pg@genome.wustl.edu; FAX license requests to (314) 362-2985 c/o Paula; DJ reminded me - it's free, but you have to sign a licensing agreement to get it) graciously eviscerated it and stuffed the guts into a text-window Mac app for the rest of us. The X-win version is much nicer (and allows you to access TED, their ABI trace editor), but both will get the job done. OSPX is closer to Primer in abilities, but closer to Amplify in ease of use.
It is a very nice piece of work.

A Suggestion: If you have an ethernet connection and a Sun SPARC machine available to you (almost everyone does, whether they know it or not), you can get an X-window emulator for your Mac or PC for ~$400, run native-mode OSP and still save $200 compared to buying Oligo.

Other programs:

- Speakquencer (not to be confused with Sequencher, see below) by Christian Fritze (fri @midway.uchicago.edu). This utility allows you to add variable-speed voice readback to any program. This function has been added to most commercial programs, but if you want to go the PD route, it's a very nice utility to have.

- NCSA GelReader/ContigAsm is a gel reading program somewhat similar in idea to Helix (see below), which has been a work in progress for the last 2 or 3 years. It, like all the NCSA programs (Image, PALedit, Telnet, Datascope, and many others), is free, of surprisingly high quality, and can be obtained by FTP to zaphod.ncsa.uiuc.edu. In short, what this combination intends to be (someday) is a gel image analysis and sequence assembly program rolled into one. As of now, the GelReader part of it is useful for analyzing gels to a point (it can read in gels that have been digitized into a TIFF format and can do video densitometry and fragment sizing), and the ContigAsm part of it can be used to assemble restriction maps into simple physical contigs.

- COMAP, a program by Kay Hoffman (KHOFMANN@cipvax.biolan.Uni-Koeln.DE), and available from most bio-archives, provides functionality for DOS machines somewhat similar to that of the above-mentioned GelReader/ContigAsm.

- MACAW - from Greg Schuler at NCBI (schuler@ncbi.nlm.nih.gov; FTP to ncbi.nlm.nih.gov, in /pub/macaw) is the only strictly molecular biology program of which I know that runs under MS Windows (which should give you an idea of how easy it is to program for Windows).
It is an exceptionally nice bit of work, allowing you to look for matching blocks of homology in a restricted set of protein or nucleic acid sequences. You can load up to about 16 sequences easily and up to at least 21 sequences if you're prepared to wait up to 30 min. (!) - a bug in the file handling routine of 1.03 that may have been fixed in 1.05. You can graphically pick the sequences (or subsequences) that you want to compare and after the block searching (start with a very high cutoff and work low!), you can lock segments of homology together as you deem fit. You can view the homologies as sequence or as graphic blocks, with or without color accents. The program will also let you print out the graphics, but you may have to expend some energy massaging your printer to make the output reflect the screen.

Commercial Programs:

- DNA Strider - The only for-pay DNA analysis program that I can unhesitatingly recommend. For $200, you can't buy more program. It can't draw pretty pictures by reading Genbank Feature tables (but if it's already in Genbank, most scientists I know aren't interested in the sequence per se, but what can be done with it, and if you modify it, then the annotations don't coincide, so you still have to draw the picture yourself anyway). Strider can output its graphics in PICT form, so that they import smoothly into drawing programs like Canvas. It doesn't pretend to predict secondary or (God forbid) tertiary structure. It does not support color. It doesn't speak to you in multiple voices. What it does do, though, it does incredibly fast. I've only seen ONE program that comes close to Strider for speed in restriction mapping, and the screen output, while perhaps a little spare (in the best Edward Tufte tradition), is _useful_! Its interface is simple and smooth and easy to do real work with.
It will do some limited protein analysis, such as hydrophobicity and acid/base prediction, and the latest version (1.2) includes the ability to do some primer analysis and diagonal comparisons (much like DottyPlot). Also, when you paste sequence into a Strider window, it intelligently strips spaces, numbers, etc., allowing you to add sequences from non-Strider formats relatively easily. Strider does not allow you to analyze more than 32.5K bases at a time, a nasty fault in my view, and, like most of the commercial packages, doesn't allow you to add your own routines.

DNA Strider is available only from its author Christian Marck at the following address:

   Dr. Christian Marck
   Service de Biochimie et de Genetique Moleculaire
   Bat. 142
   Centre d'Etudes de Saclay
   91191 GIF-SUR-YVETTE CEDEX
   FRANCE
   fax: (33 1) 69 08 47 12

(warning: he has been known to be slow in responding to correspondence not containing cheques in the amount of US$200)

- MacVector and Geneworks: The two most commonly mentioned packages in this forum are MacVector from IBI/Kodak (800 243 2555, 203 786 5600, fax 203 624 3143) and Geneworks from Intelligenetics (800 876 9994, 415 962 7300, fax 962 7302). Both cost on the order of $3000 per machine, enforced by the use of a hardware lock. From personal experience and watching the BBSs, IBI has historically been more willing to make deals on multiple purchases and/or selling additional locks so that installing the software on additional machines doesn't have to cost the full amount (Geneworks will sell you one additional lock per package for $500). For example, the local area was offered a deal whereby academic users could buy MacVector for $1500 and then could buy an additional lock for another $200. (This deal has since expired, but certainly ask your local rep if he has a similar deal.)

Both have their strengths and weaknesses. MacVector has been around longest and therefore _should_ be the most stable.
From my experience and others', this is a questionable assumption - see recent threads for examples. Since this is not an intensive examination of each program and since these are the most popular of the SAPs, I will leave it to others (or at least to another day) to go into niggling, nitpicking, picayune detail about their blemishes. Instead, a brief overview, mostly as to value.

Both come with almost everything but the kitchen sink; Geneworks includes a skeletal sequence assembly program for which MacVector makes you pay extra. I assume that because of the care (and time....;^)...) that it has taken IBI since it began being advertised, the IBI AssemblyLign will be quite a nice piece of work, but I believe that it has still not been released.

Geneworks, according to promotional material, was designed to take advantage of the latest techniques in object-oriented programming - there has been a recent thread about this topic on the bio-soft group. Well, there's good and bad in that. It seems to take the approach that you should always be looking at the 'gestalt' of the analysis, and to this end, all the views of a sequence are linked; modify one and the rest change to reflect that change. That's fine, except that sometimes you don't care about the other 5 views on-screen, and you don't want to wait the extra XX seconds it takes for them all to update. A personal preference.

MacVector takes the approach (or did - our version is 3.5, about a year old) that it shouldn't do anything unless you specifically tell it exactly what to do. Want a restriction map? Fine, fill in the (admittedly) extensive selection menu as to HOW you want the restriction map presented. As mentioned previously, I am not at all happy about its stability. While my view of MacVector is on the cool side, a vote of support has been recently forwarded by stine@jeeves.ucsd.edu (Blaine Stine), so for the benefits of MacVector, contact him (her?).

- MacDNASIS/PROSIS aka MacDNAsis Pro is from Hitachi Software (sold locally through NOVEX 800 456 6839). I just tested their latest demo (1.01) and frankly, it's a little schizophrenic. It's cheaper than the rest (~$1500), not hardware copy-protected, and in fact the reps at the demo said that they considered that limited copying (within one lab) was within the license agreement, although I'm not sure that assessment is the official legal line.

What bothered me most about DNASIS was that its menu system was not very intuitive - it very much looked like the product of programmers who had limited input from a bench scientist, with features grouped algorithmically rather than by use. They have also taken broad liberties with the Mac interface, which steepens the learning curve quite a bit. The other glaring fault was that restriction digests are horrifically slow, fully 2 _orders_of_magnitude_ slower than Strider or DNASTAR (see the performance table at end). It will take sequences larger than 32.5K, but you wouldn't want to analyze them. Its protein analysis routines were also neither particularly quick nor complete and were scattered around the menus haphazardly. It will do sequence alignments, but will not allow you to set very many parameters and the ones it does allow you to change are not well explained by the otherwise quite good help system.

Surprisingly, its CD ROM database searching routines are well thought-out and relatively fast, subjectively quite a bit faster than others I've tried, including ENTREZ. The interface between the CD searcher and the rest of the program is a little cumbersome, but if you're not close to an Internet connection, it's one of the better CD searchers around.
It is still slower than gopher, but it has the ability to do boolean searches on specified fields on multiple databases. It also comes with a sequence assembly program, but again, it isn't particularly well thought-out or full featured, although it does support the use of a standard electromagnetic digitizer (as opposed to some programs which require that you buy the company-modified digitizer; the PC version of DNASTAR was one, IBI was another). Watching others use it, it appeared that it was not very intuitive either.

There were some unexpected pleasures - DNAsis includes quite a nice 'Plasmid Artist'-like drawing program and it claims to be able to do Zuker RNA folding analyses (the rep said that it could do 2000 bases overnight on a Quadra - isn't that a little fast?). MacDNAsis also includes a primitive primer analysis tool.

All of the above criticisms made to the reps were answered with "yeah, we took care of that in Ver 2.0". I managed to take a peek at it and it does look a bit better, but not tremendously compelling. They also said that they were working on bringing their PC version (at present, almost unusable) up to speed by re-writing it for Windows NT but I wouldn't hold my breath. In short, I don't consider it a particularly good deal. Hitachi makes a terrific plunge router (I have one); it is almost as good a cloning tool as MacDNAsis.

- MacMolly Tetra Ver 1.0 by Soft Gene (030-8326342, fax 030-8219764) is a rewrite and rename of one of the first Mac SAPs. I'm surprised that it hasn't gotten more air time than it has (although John.Hardham@lambada.oit.unc.edu recently posted a short, relatively positive note on it). If I remember the first version correctly, they have merged some of the original's features, but it still comes as multiple programs (like DNASTAR Mac), although as far as I know, you cannot buy them individually. Would that they had spiffed up the code as much as the packaging.
In the central module "Analyze", you can still only import a very restricted number of formats, and their restriction mapper is positively Devonian for a recent Mac application. While enzyme selection is the _best_ I've seen; easy to use and very flexible (select and deselect by 6, 5, and 4 cutters, extensions, from all, asymmetric, or convertible sites, heat tolerant or inactivated enzymes, sensitive or insensitive to dam or dcm methylation, salt sensitivity, single strand or star activity!), and it does support a digitizer, it is extraordinarily slow and its output is basically a TEXT WINDOW (Aack!). It does have basic oligo/primer tools included, but again, it is on the slow side. There is also no on-line help. The Complign module does a reasonable job of multiple alignments but not really any faster than clustalv available with Don Gilbert's SeqApp (see timing chart at end). In short, Tetra looks like a beta test of what might turn out to be a reasonably good SAP, but it's really not up to snuff compared to the other programs available.

- DNASTAR Mac (608 258 7420) is the Mac rewrite of the popular (maybe _popular_ is the wrong word - maybe widespread is better) program for the PC. In keeping with its PC past, instead of a huge, monolithic program, there are a number of smaller ones, each with a different focus: EditSeq, Mapdraw, Protean, XRay, Seqman, Geneman, etc. An advantage of this is that you can buy them separately. If you don't want the sequence assembly module (which is functional; better than Geneworks, but lackluster compared to Sequencher, for example) it's an easy way to save $800, or if you use gopher or other network solutions to access sequence databases, you can save another $800 by clipping Geneman (not to mention the cost of the CDROM), or leave out Align because it will only align 2 sequences at present.
This last problem is being addressed by DNASTAR and they were quite willing to let us try out a "pre-alpha" version of their multiple alignment package. For a "pre-alpha", it is surprisingly well put together and full-featured; the main point being that it does in fact exist and their protestations of it being released 'Real Soon Now' have some basis in fact.

Compared to its PC parent, this version is infinitely smoother, much more intuitive, and its on-line help is complete, if organized in a peculiar manner (it's not at all alphabetical, it seems to be only slightly context-sensitive, and it lacks hypertext links). It is also not as capable as the parent; the PC version, like the GCG suite to which it is somehow related, could (eventually, with enough aggro) do just about anything. The Mac version is somewhat more restricted in scope, but much better integrated.

With all these reasons not to buy it, why bother with it at all? Because the restriction mapper is _quite_ flexible and FAST, as fast as or faster than DNA Strider for most things. One part about it that I don't like is that you have to use the separate sequence editor module to import/export sequences. EditSeq does this reasonably well and covers most popular formats, although not as many as DG's READSEQ (see above) does. DNASTAR's protein analysis module (Protean) is _quite_ spectacular and the rest of the modules are acceptably good.

Maybe most importantly to a group of researchers, it is the only SAP available with a token check-out licensing scheme. This drives down the price for the full setup (from ~$3,000 per user to $1700 per user for 5, getting cheaper the more you buy) _AND_ it's not tied to particular machines - the 5 people using it can be anywhere your network extends. It is because of this last feature in particular that we're evaluating it with a view to installing it as an Institute service - the rest of you molbio software companies, take note!

An additional point, at least obliquely related to this topic: It was one of the principals of DNASTAR (Fred Blattner) who recently instigated the brouhaha with NCBI, alleging that a government organization had no business in the development of biotech software or information services and should therefore have its budget cut severely (Science 257:156-7, 1992). Those of you who have used and appreciated NCBI's services may feel that this was not a welcome turn of events and may change your feelings toward DNASTAR.

- Helix (~$2000, including a hand scanner; less if you already have a compatible scanner) is a new sequencing product from, of all people, General Atomics (800 424-3549). It's a good idea - use a handscanner to scan in the lanes that you want read, then have the software interpret it for you. Why buy a gigantically expensive 11x17" scanner when you usually only want part of the film read anyway? They claim (see the August 1992 Biotechniques Vol 13 (2), p207) that it will do a scan's worth of reading in under 2 min. I have tried it and it does work - sort of. It requires System 7 and a MacII+FPU with 10 (!) MB of memory (but you can get away (slowly) with using virtual memory); it also uses a hardware lock.

The program is surprisingly well-designed for a Ver. 1.0 release. Installing the hardware and software is easy and it has a nice interface. The problem is that it just doesn't work well enough to warrant spending the money. The problems are:

a) it's difficult to get the scans to enter completely straight and if the scans are warped, you cannot get the lane guides to track the lanes correctly. Unlike NCSA's GelReader, which uses a 'comb' aid to set the lanes, you can use an expandable box to include the lanes you want and then _individually_ set the lanes within this box, so that if the lanes are at an angle, you can correct for it - a nice touch; however, you cannot correct for warped lanes.

b) Most problematic of all, it doesn't determine sequence very well, at least from the default settings, and I fed it some pretty nice gels (which it inhaled with surprising ease via the handscanner and then compressed on the fly to about 1.5 - 2 MB per full length scan of 32 lanes). Like some of the more upscale automated scanners, it then shows the digitized image on the screen, supplements it with the traces of the densitometry, much like an ABI trace output, and then interprets it. For the amount of computation involved, it runs surprisingly fast, perhaps _too_ fast, because the base calls that it makes are very inaccurate. I could excuse some of the confusion in difficult areas, but it ignores sequence that is clear and unambiguous. Granted, it does allow you to oversee its calls (and the interface for doing so is nicely thought out, with a window for the image, another for the textual sequence, and links between the two such that if you touch a called base in the image, the corresponding base in the text window is highlighted and vice versa), but the sequence itself is so riddled with errors that the whole point of the software is bypassed.

In short, this version is not quite there, but I would imagine that with a little more work on the interpretation algorithm, it will be a very nice product. I just heard from the programmers today (8-27-92) who said that an error had been introduced while trying to speed the interpretation algorithm and a new version will be in my hands shortly - more as it becomes available...

- Sequencher is (a silly name for) an exceptional sequence assembly program from Gene Codes (313 769 7249; $1200 w/ hardware lock, site/multiple copy discounts). It is specifically for gel assembly, not for general purpose analysis, although it does include a very nice (though not blazingly fast) restriction mapper. It is one of the easiest, fastest interfaces for what was once a horrible job.
From my experience with the demo on some truly awful sequence: I threw the sequence at the program and it threw the contig back at me in about the same time as it takes for the GCG GelAssembly to load. Of course, it also allows assisted manual manipulation of the sequence as well. It also allows you to view the contig diagrammatically or as sequence in a very nicely designed, scalable window (sort of like the "pretty" output from the GCG programs). It also does something that few other programs will do - take tracefile output from an ABI autosequencer and allow you to view it for verification while assembling the contig (it will also compress the tracefiles automatically on exit). Of course, if you use film, it also supports a standard-interface electromagnetic digitizer.

One of the strengths of the program is that the parent company is small, aggressive, and very willing to listen to suggestions. Their response time with fixes or enhancements is nothing short of miraculous. After sending me the demo, a rep from Gene Codes (the president, as it turned out) called me to see how I liked it. I had some suggestions for the interface and some questions; within 2 weeks I had another version incorporating those suggestions. The same has been the case for another lab here. I was very impressed.

- Gene Construction Kit by Textco (vox/fax: 603 643 1471) is a relatively new entry but it is an extremely interesting cloning tool. All of the previously mentioned programs allow you to 'clone electronically', but GCK is optimized for it. It remembers the history of the sequences you use to clone, what enzymes you used to clip the sequences, and what you did to the ends; whatever you create, you have a history. For any lab that does a significant amount of cloning, it is easily worth the money. Besides being a great idea, it is also relatively easy to use.
The last program that attempted this trick was a piece of work from Intelligenetics called Strategene (a great idea, if only they hadn't made it for _only_ a Xerox workstation and demanded about $50,000 for it). Despite my usual rage at demos, there is an excellent demo-tutorial of GCK available at the usual archives (ftp.bio.indiana.edu, in the IUBIO Software+Data/molbio/mac folder by gopher).

Timing Charts:

The following times were obtained using a MacIIfx (40 MHz 68030, 68881 FPU, 8 MB RAM, 507 MB Fujitsu HD, System 6.0.5, 19" E-machines color monitor). Times are also listed for DNA Strider on a Mac SE (8 MHz 68000, no FPU, 2.5 MB RAM, 40 MB CMS HD (Seagate mechanism), System 6.0.5, built-in B+W monitor) and on a MacIIci (25 MHz 68030+68881, 8 MB RAM, 105 MB HD (Conner LPS mechanism)). The times were measured to the nearest half second using Douglas Adams' instrument of societal dissolution, the digital watch. The sequence was 32.5 kb of lambda, the largest amount of sequence that Strider can handle in one window. The other programs can, unless noted, handle more sequence, although with varying facility. Square brackets indicate an explanatory note or additional information listed at bottom. Times are in seconds unless otherwise marked or noted.

                      Time to   Restriction    Restriction     6 Frame
Program               Load      Map - Text     Map - Graphic   ORF Map
--------------------  --------  -------------  --------------  --------
DNA Strider 1.2       2.5[1]    3.5/8.5[2]     1               1
  (Mac SE)            19.2      42/1'43"       5.5             8
  (MacIIci)           8         9.5/17         1.5             3.5
MacVector 3.5         6         45/32[3]       29              4[4]
Geneworks 2.0 (Demo)  14        14/27[5]       18.5            4.5
DNAsis 1.0 (Demo)     5         10 Minutes[6]  [7]             38[8]
MacMolly Tetra        3.5       10.5[10]       [10]            12.5
  Analyze [9]
Sequencher (Demo)     10        [11]           3[11]           n/a
SeqApp                5         42[12]         [12]            [12]
DNASTAR Mac           10        4.5[13]        16[14]          []

[1] At startup, Strider reads in and recalculates the hash table for its Restriction Enzyme library, an unnecessary waste of time, especially for slower Macs. This should really be made optional.
[2] Times are for "Restriction Map"/"Complete Restriction Map"; the latter displays not only the textual restriction map, but also all the enzyme sites sorted by number and location of cuts. (170 enzymes)

[3] After 45", the program stopped with the error "too many sites"; after <10 sites were selected, it completed in 32". (163 enzymes)

[4] MacVector can find ORFs by defining an ORF in a number of ways, including Fickett's method. I used the default ORF, which looks for ORFs beginning with an atg and ending with a stop, with a minimum translation of 25 aas.

[5] 14" for all 100 enzymes; 27" for those that cut <=2 times (the default).

[6] DNAsis could not import the text file of sequence, and when I tried to cut and paste the sequence into a "new" window, it reported a number of invalid residues which it then stripped out, leaving the sequence about 400 nt short (there were no ambiguous nt in the sequence), so I pasted in the missing # of nt from the same sequence. I did not wait for the restriction map to complete; the program throws up a "remaining" thermometer, and when it hit 15%, I stopped it and did the math. DNAsis can show either actual site cleavage or use the 5' end of the recognition sequence (as Strider does); it takes the same time either way. (247 enzymes; a contributing factor)

[7] I could not determine how to make DNAsis give me a graphic map without modifying the enzyme file.

[8] DNAsis not only gives you an ORF map, but also includes an ORF position table.

[9] Tetra comes in multiple modules; Analyze is the module that performs the restriction and translation functions.

[10] The restriction enzyme selection process is quite flexible, allowing you to select or deselect by # of bases in the recognition site or by overhang (but not by number of times of cutting). After 10.5 minutes, the first part of the screen appeared, then an irreversible memory error occurred.
After rebooting, I selected only 6 cutters, and after 1 minute the list had only gotten through the enzymes starting with "A", so I canceled the digest; the typical Mac 'cancel' ("command" + ".") works smoothly and does not kill the application. (324 enzymes)

[11] The Sequencher demo comes with only 21 enzymes entered in its database, a strange omission, but Ver 1.0 comes with 59.

[12] No doubt a function of its 'alpha' test status, SeqApp has some quirks. Despite its incorporation of READSEQ, it was not able to smoothly import the 32.5 kb sequence file, and when I tried to copy/paste it into a 'new' window, the sequence could be pasted, but it was not 'intelligently pasted' as it would have been in Strider - i.e. uneven lines, incomplete numbering, and most bizarrely, no sequence showing up in the multiple sequence window. The sequence name was there, and when selected again, the sequence would appear in the editing window (still unevenly formatted, but with the correct numbering), but I could not get it to show up in the multiple sequence window. After 42", the text window appeared showing the enzyme cut sites, but I was unable to make the complete text sequence window appear. SeqApp does not support (yet?) graphic maps, and the translation is supposed to show up underneath the text sequence, but doesn't. (189 enzymes)

[13] I was astonished at how fast DNASTAR was able to produce the initial map, including the sequence, complement, all restriction sites (146 enzymes), and 3 letter translations in all 6 frames, any and all of which can be modified or removed using a single easy-to-understand menu. It, like a number of other programs, has screen sensing, so that it will expand the window to full size initially. This may be annoying, because it obscures the rest of your screen.

Multiple Alignments:

For this exercise, I used 5 POU domain proteins ranging in size from 235 to 451 aas.
Because of the various implementations of the homology search, I was not able to do exact comparisons, but I tried to use similar parameters when possible. Times are in min. (') and sec (").

Program                     Time (min=', sec=")
--------------------------  ----------------------
DNASTAR (MegAlign)          40"[1]
Geneworks                   2'43"[2]
MacMolly Tetra (Complign)   2'47"[3]
MacDNAsis Pro               175' (yes, minutes)[4]
ClustalV                    33"[5]
MACAW (386/387 PC)          17"[6]

[1] This is not quite fair because this is not a released version, but it gives an idea of how the finished product will perform. MegAlign flatters Geneworks quite a bit in terms of its presentation and format, but manages to improve on the speed considerably. It also introduces a number of improvements on the interface. DNASTAR and Geneworks are by far the easiest to set up and use. Their use of color for shading differs slightly, but both are helpful in determining conserved sequences.

[2] See the comments above. Geneworks gives you more access to the controlling parameters and to the coloring scheme.

[3] Don't these guys use their own software? Don't they check out the competition? The menu system for Complign is among the most ill-thought-out that I've ever seen. They have managed to bury a reasonably fast implementation under a serpentine, recursive, oddly phrased, and unnecessary set of menus and procedures so complex that it made me want to scream. Let's see - where to start? First, you can't set up a multiple homology. For reasons of their own, they give you the choice of comparing 1 protein to either one or multiple others. It then takes 3 sets of menus to start the alignment, the output of which is displayed on precisely overlapping sets of text windows. Finally, the alignment itself proceeds dynamically, being modified onscreen as you watch. This may be eye-catching the first time, but it is a tremendous time sink if all you want done is the alignment. It looks like they didn't take out the debugging statements.
Finally, while I admittedly did not take the time to delve into these programs down to the last combination of options, the alignment that Complign finally presented to me was completely wrong. I suspect that the gap value it presents as a parameter does not mean the same thing as in the other programs.

[4] Again, MacDNAsis finishes a distant last, somewhat perplexing given the generally positive things in the press about Hitachi Software. Again, I missed the end because of impatience; after 8% (14 minutes), I killed it.

[5] I included ClustalV because many of the commercial programs are implementations of it, and you can get an idea of how much speed you gain (or lose) by spending a great deal of money. ClustalV is fastest among the Mac programs (using the MacII+FPU version supplied as an additional module to Don Gilbert's SeqApp; it runs as a stand-alone, text-window application), but its output is a non-proportional font text file. Don't expect whizbang graphics and autoshading of identities, but if you're doing exploratory alignments, it is certainly worth having if only because it's very fast. For final figure production, the text is easily importable into Canvas, so it can be shaded to your heart's content.

[6] MACAW 1.05, running under Windows 3.1 on a 25 MHz 386/387 with 8 MB extended memory. MACAW is included in this report because it is free and relatively easy to use; it can import 16 sequences with ease, 6 more uneasily, and is quite fast. It does not do an autoalignment like the others, but instead highlights local blocks of homology above a user-defined level. For this reason, I like it more than the autoalign programs for exploratory work. Also, because it chooses only local blocks, you can detect homologies that might go undetected if you forced a global alignment.
For example, if there were an EGF homology at the Nterm of one protein and an HLH motif at the Cterm, and this order were reversed in another protein, you would only pick up the stronger of the two homologies with the general multiple alignment programs. After you have examined the local similarities, you can link the ones you think are significant with a 'link' command that automatically introduces gaps and enhances the visibility of the blocks.

Let me know if this has been useful or what it needs to become so. Please address correspondence to the 'salk-sc2' address and if you want to send me mail, please prefix the Subject line with HJM!

Cheers,
Harry
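
A note on the two extrapolated times above (the "10 Minutes" for the DNAsis restriction map and the 175' for the MacDNAsis Pro alignment): both runs were stopped partway and the total was scaled up from the progress indicator. The Python sketch below shows that arithmetic. It assumes the indicator advances linearly with time, which may well not hold for these programs. The function names are made up for illustration. The 14 minutes at 8% comes from the alignment note; the elapsed time behind the 10 minute figure was not recorded, so the second function only shows what that estimate would imply.

```python
def estimate_total_minutes(elapsed_minutes, percent_done):
    """Scale an elapsed time up to 100%, assuming linear progress."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_minutes * 100 / percent_done


def implied_elapsed_minutes(total_minutes, percent_done):
    """Invert the estimate: elapsed time implied by a quoted total."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return total_minutes * percent_done / 100


# MacDNAsis Pro alignment: killed at 8% after 14 minutes.
print(estimate_total_minutes(14, 8))    # 175.0

# DNAsis restriction map: stopped at 15%, total quoted as 10 minutes.
print(implied_elapsed_minutes(10, 15))  # 1.5
```

If the thermometer is not linear (for instance, if later enzymes in the list cut more often), the true totals could differ substantially in either direction, so both figures are best read as rough estimates.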