/*
From gilbertd@bio.indiana.edu Wed Mar 18 01:22:09 EST 1998
Article: 982 of bionet.molbio.genbank
Path: news.indiana.edu!not-for-mail
From: gilbertd@bio.indiana.edu (Don Gilbert)
Newsgroups: bionet.molbio.genbank
Subject: Re: Cross-referencing protein/nucleotide sequences
Date: 18 Mar 1998 06:00:27 GMT
Organization: Biology, Indiana University - Bloomington
Lines: 145
Message-ID: <6enntr$6on$1@jetsam.uits.indiana.edu>
References: <6ejp8k$hlr$1@nnrp1.dejanews.com>
NNTP-Posting-Host: chipmunk.bio.indiana.edu
Xref: news.indiana.edu bionet.molbio.genbank:982

The Perl scripts that Bill Pearson gave are a good example of how
biologists can use programs to automate data analysis. I think
learning a programming language is a useful skill for most students
in biology, certainly for those dealing with biosequence analysis.

My suggestion is to start with Java as a first language. Java
provides a very rich set of standard methods (graphic interfaces,
standard network functions, database connectivity), is easy to
learn, and can be used for projects small and large. There are also
a very large number of Java resources (people, software libraries)
out there.

See below for a rough Java equivalent of the Perl script to fetch
sequences from Entrez. It requires nothing beyond a standard, free
Java compiler and the standard Java libraries (version 1.1).

B.F. Francis Ouellette noted:
| It makes good use of the bandwidth between your site and that of NCBI in
| setting up a session where the whole request of multiple records is
| transferred as one file, as opposed to a transfer for each accession/gi
| back and forth that a "foreach" loop would create.

Francis, I tried programming an http post request to your batch
server, but couldn't get past a Proxy error response (it worked on
an echo server). Are there any instructions around for programmatic
access to this?
  http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/nph-batch/

Anyway, the attached java example does batch the requests by
default, using your Entrez/query server. (&uid=a,b,c,...)

-- Don

--------------------- cut -----------------*/

// FetchFromEntrez.java
// a simple example of a biocomputing network tool written in java
// d.gilbert, mar'98
// to compile: javac FetchFromEntrez.java
// test run : java FetchFromEntrez - dna gb_U30153 gb_M81833 gb_L13173
// test run : java FetchFromEntrez - prot gi_304809 gi_2286196

import java.io.*;
import java.net.*;
import java.util.*;

public class FetchFromEntrez
{
  static boolean batch= true;
  public static String entrezUrl=
    "http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=6";
  public static String usage=
    "This java program extracts dna or protein sequences using gid numbers.\n"+
    "from a perl script by Bill Pearson, converted to java by d. gilbert\n\n"+
    "Usage: java FetchFromEntrez [options] gid gid gid ...\n"+
    "options:\n"+
    " dna|protein - choose dna or protein output\n"+
    " fasta|genbank - sequence format\n"+
    " html|text - html or plain text\n"+
    " - - write to standard output\n"+
    " output=somefile - output filename\n"+
    " single|batch - do single or batch request\n";

  public static void main(String[] args)
  {
    if (args.length==0) System.out.println(usage);
    else {
      boolean html= false;
      String db= "n", form= "f", outname= "entrez_fetch";
      for (int i=0 ; i" : "LOCUS");
      }
      if (batch) {
        String gids= "";
        for (int i=0; i
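
The listing breaks off inside the batch branch. As a minimal sketch of the batching idea described in the post (one request carrying &uid=a,b,c,... instead of one request per id), the URL construction might look like the following. The class BatchUidSketch and its methods are hypothetical names that are not part of the original program, the base URL is copied from the entrezUrl field above, and appending &uid= directly to that URL with no further parameters is an assumption. The sketch only builds strings and opens no network connection. It uses generics, so it needs a newer compiler than the Java 1.1 named in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the original FetchFromEntrez listing.
class BatchUidSketch {
  // Base URL copied from the entrezUrl field in the post; the server it
  // named may no longer answer at this address.
  static final String ENTREZ_URL =
      "http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=6";

  // Batch mode: join all ids with commas into a single uid parameter,
  // so that every record is requested in one round trip.
  // Assumption: ids contain only characters that need no URL encoding.
  static String buildBatchUrl(List<String> gids) {
    StringBuilder uids = new StringBuilder();
    for (String gid : gids) {
      if (uids.length() > 0) uids.append(',');
      uids.append(gid);
    }
    return ENTREZ_URL + "&uid=" + uids;
  }

  // Single mode: one URL per id, which costs one round trip per record.
  static List<String> buildSingleUrls(List<String> gids) {
    List<String> urls = new ArrayList<String>();
    for (String gid : gids) {
      urls.add(ENTREZ_URL + "&uid=" + gid);
    }
    return urls;
  }

  public static void main(String[] args) {
    List<String> gids = Arrays.asList(args);
    // Print the batch URL first, then the per-id URLs for comparison.
    System.out.println(buildBatchUrl(gids));
    for (String url : buildSingleUrls(gids)) {
      System.out.println(url);
    }
  }
}
```

Run as `java BatchUidSketch gi_304809 gi_2286196`, the first printed line ends in `&uid=gi_304809,gi_2286196`, followed by one URL per id. The difference between the two modes is the number of round trips, which is the bandwidth point quoted in the post.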