MAVG contains a number of C programs for finding CpG islands in a
genomic sequence.

Description file for MAVT program (11/22/2001):

mavg (mavg.c): the main testing program

  Usage: reads testing data from the standard input and outputs the result.

  Important subroutines:
    get_input()  [line number 38]
    mln_part()   [line number 61]
    mslc()       [line number 207] --> O(n)
    drs_part()   [line number 76]
    mavs()       [line number 156] --> O(n) ?? but analyzed O(nL)?
    Locate()     [line number 121] --> used by mavs()
    report()     [line number 247] --- debug info. To see it, turn it on in main()
    clock()      ---> builtin function used as timer
                 NB: 1000 clock = 1 sec.

  Input format for mavg.exe:
    n L range seed kk
    (n integers)
    0 0 0 0 0

    Note: n: number of elements; L: limit length.
          range and seed are output by gen-data so that an experiment
          can be repeated.
          Several test sets can be put into one file, but the file must
          end with five zeros: 0 0 0 0 0
    Note: For an example, see the [mavg.in] file.
          Currently the maximum n allowed is 4,000,000 (4 million), but
          this limit can be changed in the source on a machine with
          more memory.

  Output for mavg.exe:
    The first few lines (8) are for debugging purposes; just ignore them.
    They are followed by two groups of output:
      The first group prints the largest-average [kk] subsequences.
      The second group prints the smallest-average [kk] subsequences.

gen-data (gen-data.c): generates a random input sequence for mavg.

  Input arguments for gen-data.exe:
    Usage: gen-data n L range seed kk
    Note: all five arguments have default values.

  Output format for gen-data.exe:
    Same as the input format for mavg.exe.

First-Time-Tester: (or just run: "sh runme.sh")
  1. Try this: gen-data | mslc -s
  2. Try this: gen-data 3000 | mslc -s
  3. Try this: gen-data 10000 | mslc -s
  4. Try this: gen-data 1000000 | mslc

-- Yaw-Ling Lin, Dept CSIM, PU.

Generation of an input file for MAVT (4/18/2002):

base2no is a program for converting a sequence file in FASTA format
into a sequence of numbers for MAVG. Each dinucleotide is converted
into a number.
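This file does not state which number base2no assigns to each
dinucleotide, so the following is only a minimal sketch of the idea and
not the program's actual mapping. It assumes a hypothetical index scheme
(A=0, C=1, G=2, T=3, code = 4*first + second) and assumes that
overlapping dinucleotides are encoded; the function names base_index,
dinuc_code and encode_dinucs are invented for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical base index: A=0, C=1, G=2, T=3; -1 for anything else (e.g. N). */
int base_index(char b)
{
    switch (toupper((unsigned char)b)) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Hypothetical dinucleotide code in 0..15, or -1 if either base is unknown. */
int dinuc_code(char first, char second)
{
    int i = base_index(first);
    int j = base_index(second);

    if (i < 0 || j < 0)
        return -1;
    return 4 * i + j;
}

/*
 * Encode every overlapping dinucleotide of seq[0..len-1] into out[].
 * out must have room for len - 1 entries (assumption: overlapping pairs).
 * Returns the number of codes written.
 */
size_t encode_dinucs(const char *seq, size_t len, int *out)
{
    size_t k;

    if (len < 2)
        return 0;
    for (k = 0; k + 1 < len; k++)
        out[k] = dinuc_code(seq[k], seq[k + 1]);
    return len - 1;
}
```

Under this assumed scheme the sequence ACGT yields three codes: 1 (AC),
6 (CG) and 11 (GT). A real converter would most likely assign scores
that favour CG pairs, but those values are not given here.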
The usage for base2no is

    base2no Seq L K > Num_File

where
    Seq       is a file of a DNA sequence in FASTA format,
    L >= 10   is the minimum length of regions,
    K >= 1    is the number of regions to be reported,
    Num_File  is an output file of numbers.

Xiaoqiu Huang
Department of Computer Science
Iowa State University

Description file for MAVG program (11/22/2001):