Don,

I am including an update of my MISMATCH program (for designing PCR primers
so that any mutant alleles can be distinguished from wild type by
restriction enzyme digests). The update includes a program MISINDEL which
can handle insertion-deletion mutations. The original MISMATCH program is
included (unchanged) and handles base substitution mutations. You have the
previous version in
http://iubio.bio.indiana.edu:70/IUBio-Software+Data/molbio/ibmpc/mismatch.txt-basic

Cheers,
Lance

***************************************************************************
Lance S. Davidow, PhD            Computer Systems Manager
Dept. of Molecular Biology       Fax:    (617) 726-6893
Mass. General Hospital           Phone:  (617) 726-5955
Boston, MA 02114                 E-Mail: Davidow@frodo.mgh.harvard.edu
***************************************************************************

---------cut here for text version of MISMATCH and MISINDEL plus docs----

By Lance Davidow-- NEW ADDRESS as of JULY 1993
Email davidow@frodo.mgh.harvard.edu
Dept of Molecular Biology, Mass. General Hospital, Boston, MA 02114

Ref: L. S. Davidow (1992). Selecting PCR designed mismatch primers to
create diagnostic restriction sites. Comput. Applic. in Biosci. (CABIOS)
8:193-194.

************************************************************************
MISINDEL (covers insertion and deletion mutations) added 14 December 1994.
Run MISINDEL separately from, but similarly to MISMATCH.
************************************************************************

-------------------MISMATCH.DOC follows------------------------

This file, MISMATCH.DOC, contains user information for the BASIC program
MISMATCH.BAS.

BACKGROUND.

This program automates the design of "Directed Mismatch" experiments. In
this technique (probably first used by Haliassos et al. (1989) Nucleic
Acids Res. 17:3606) the user wants to use restriction enzyme digestion to
distinguish between the wild type allele and a common mutant allele of an
interesting gene, even though the two sequences do not immediately differ
by the presence vs. absence of a site. The user designs a PCR primer
containing a single mismatch at a site near the mutation such that the PCR
products of the wild and mutant genes differ by a restriction site. Since
the mismatched primer functions poorly in the first few cycles, the PCR
reaction is usually run a few cycles longer than a reaction with correctly
matched primers.

Many diagnostic laboratories prefer the Directed Mismatch technique to the
Allele Specific Oligonucleotide hybridization method for allelotyping
cystic fibrosis genes (A. Beaudet, R. Fenwick et al., personal
communication). Many published uses of the Directed Mismatch technique
involve allelotyping at the various RAS genes.

COMPUTER REQUIREMENTS.

Any computer capable of running BASIC should be able to run this program
since it does not use unusual instructions or features. It has been run
with IBM-BASIC and GW-BASIC on IBM-PCs and clones.

USER SUPPLIES.

The user must supply the wild type sequence for the region of his/her gene
in a file as a single line consisting of all upper case letters and no
intervening blank spaces. A 50 character line containing 25 bases before
and after the mutant site is large enough to allow the program to test all
the known restriction sites in the current default file APR91.ENZ.

The user must either supply a restriction site file or else use the
default file. The enzyme file contains the name of an enzyme in columns
1-13 and the recognition site beginning in column 14. The recognition site
uses all upper case letters and the same ambiguity base assignments as New
England Biolabs (sorry, there are two different systems in use and I mixed
them in the March 1991 version of the MISMATCH.BAS program).
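As a rough illustration of the two input formats just described, here is a minimal sketch. It is written in Python rather than BASIC purely for convenience, and the function names parse_enzyme_line and check_sequence are hypothetical, not part of MISMATCH. Unlike the BASIC program, the sketch strips trailing blanks from the recognition site; that is an assumption made here for tidiness.

```python
def parse_enzyme_line(line):
    """Split one enzyme-file line into (name, recognition_site).

    Columns 1-13 hold the enzyme name; the recognition site starts in
    column 14. Trailing blanks are stripped (an assumption of this sketch).
    """
    line = line.rstrip("\r\n")
    name = line[:13].strip()   # columns 1-13
    site = line[13:].strip()   # column 14 onward
    return name, site


def check_sequence(seq):
    """Return True if seq is one run of upper case letters with no blanks."""
    return seq.isalpha() and seq.isupper()


if __name__ == "__main__":
    example = "BamHI".ljust(13) + "GGATCC"
    print(parse_enzyme_line(example))
    print(check_sequence("AAGGATACTT"), check_sequence("aagg atactt"))
```

A sequence file that fails check_sequence (lower case letters, embedded blanks) would need to be cleaned up before being given to the program.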
The user is asked for the position in the sequence file of the mutation in
question, the identity of the mutant base and the wild type base. The user
is also asked for the name of an output file in which the program can list
the possibly useful single mismatched sites and the alignment of the
recognition sites with the gene sequence. Output is also sent to the
screen.

RUNNING THE PROGRAM.

An inexperienced user should follow these directions. All of your
responses to the computer requests are terminated by the Return or Enter
key.

1. Change the default disk and directory on your PC to the directory
   containing your BASIC program (BASIC.COM, BASICA.COM, GWBASIC.COM,
   etc.), for example: cd c:\basic. Copy the program MISMATCH.BAS, the
   restriction file APR91.ENZ and your one line gene sequence file (I am
   assuming this last file is called myfile) to that directory as well.
2. At the prompt C> issue the command BASIC MISMATCH.
3. When the program asks for your sequence file, type in its file name
   (myfile in this example).
4. When the program asks for the enzyme file, just press Return to accept
   the default restriction file.
5. When the program asks for the name of an output file, invent a name
   such as mygene.mis which you can later print with the DOS print command
   or edit with your word processor.
6. Answer the program's questions about the position of the mutation and
   the identities of the wild type and mutant bases.
7. Sit back while the program runs. When it is finished, you can either
   run it again (choosing a different enzyme file or a different gene
   sequence file) with the command RUN, or you can leave BASIC with the
   command SYSTEM.

A compiled version runs considerably faster than interpreted BASIC.

------------------"MISMATCH.BAS" follows:---------------------------
10 REM This program, "MISMATCH", by LANCE DAVIDOW (COLLABORATIVE RESEARCH,Inc.
20 REM (Two Oak Park, Bedford MA 01730. Phone (617) 275-0004 ext.115)
30 REM finds directed mismatch primers to allow allelotyping tests by a
40 REM restriction digest analysis following PCR. version 11 april 1991
50 REM ref: Haliassos et al. (1989). Nuc Acids Res 17:3606
60 DEFINT A-Z
70 REM ALLBASES$ IS THE STRING OF ALL POSSIBLE LEGAL BASES
80 ALLBASES$="ABCDGHKMNRSTUVWY"
90 REM matrule$ array has NEBiolabs ambiguous base matching rules
100 DIM MATRULE$(16)
110 REM blanks or illegal characters in the recognition site treated as "N"
120 MATRULE$(0)="ACGTU"
130 MATRULE$(1)="A"
140 MATRULE$(2)="CGTU"
150 MATRULE$(3)="C"
160 MATRULE$(4)="AGTU"
170 MATRULE$(5)="G"
180 MATRULE$(6)="ACTU"
190 MATRULE$(7)="GTU"
200 MATRULE$(8)="AC"
210 MATRULE$(9)="ACGTU"
220 MATRULE$(10)="AG"
230 MATRULE$(11)="CG"
240 MATRULE$(12)="TU"
250 MATRULE$(13)="TU"
260 MATRULE$(14)="ACG"
270 MATRULE$(15)="ATU"
280 MATRULE$(16)="CTU"
290 INPUT "FILE NAME WITH WILD TYPE SEQUENCE REGION ON 1 LINE";GENEFILE$
300 INPUT "RESTRICTION ENZYME FILE [APR91.ENZ]";ENZFILE$
310 IF ENZFILE$="" THEN ENZFILE$="APR91.ENZ"
320 INPUT "OUTPUT FILE NAME [NUL=NO OUTPUT FILE]";OUTFILE$
330 IF OUTFILE$="" THEN OUTFILE$="NUL"
340 OPEN GENEFILE$ FOR INPUT AS #1
350 LINE INPUT #1, GENESEQ$
360 CLOSE #1
370 OPEN OUTFILE$ FOR OUTPUT AS #3
380 PRINT#3,"GENE FILE=";GENEFILE$,"ENZYME FILE=";ENZFILE$,"OUTPUT=";OUTFILE$
390 INPUT "MUTANT POSITION in bp from start--e.g. 25";MUTPOS
400 INPUT "WILD TYPE BASE--UPPER CASE ONLY!!!--e.g. C";WTBASE$
410 REM program does not verify that position and base agree
420 INPUT "MUTANT BASE--UPPER CASE ONLY!!!" ; MUTBASE$
430 PRINT#3,
440 PRINT#3,"AN '*' INDICATES MUTANT POSITION. A '-' DENOTES MISMATCH POSITION."
450 PRINT#3, SPC(MUTPOS-1) "*"
460 PRINT#3, GENESEQ$
470 PRINT#3,"MUTANT POSITION=";MUTPOS,"WT base=";WTBASE$,"Mutant base=";MUTBASE$
480 PRINT "First Pass--Sites Present in WT but not in Mutant"
490 PRINT#3,"First Pass--Sites Present in WT but not in Mutant"
500 FOR PASS=1 TO 2
510 IF PASS=2 THEN MID$(GENESEQ$,MUTPOS)=MUTBASE$:SWAP MUTBASE$,WTBASE$: PRINT "--PASS#2--Site in MUT": PRINT#3,"--PASS#2--Site in MUT"
520 OPEN ENZFILE$ FOR INPUT AS #2
530 WHILE NOT EOF(2)
540 LINE INPUT #2,NEXTENZ$
550 ENZNAME$=MID$(NEXTENZ$,1,13)
560 REM ENZYME NAMES IN COLS 1 TO 13. RECOGNITION SITE FROM 14 TO END
570 RECOG$=MID$(NEXTENZ$,14,65)
580 SITESIZE=LEN(RECOG$)
590 FOR INSITE=1 TO SITESIZE
600 REM DOES THIS BASE MATCH WT BUT NOT MUTANT?
610 LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
620 A$=MATRULE$(LOOKUP)
630 REM call subroutine if this base does distinguish wt and mut
640 IF INSTR(A$,WTBASE$)<>0 AND INSTR(A$,MUTBASE$)=0 THEN GOSUB 740
650 NEXT INSITE
660 WEND
670 CLOSE #2
680 PRINT
690 PRINT#3,
700 NEXT PASS
710 CLOSE #3
720 END
730 REM
740 REM Subroutine to count up mismatches between enzyme and sequence
750 REM then call up another subroutine to output any alignments with
760 REM one or 0 mismatches
770 REM align recognition site with sequence. INSITE base over MUTPOS base
780 MISS=0
790 MISSPT=0
800 FOR TEST=1 TO SITESIZE
810 LOOKUP=INSTR(ALLBASES$,(MID$(RECOG$,TEST,1)))
820 A$=MATRULE$(LOOKUP)
830 IF INSTR(A$,MID$(GENESEQ$,(MUTPOS-INSITE+TEST),1))=0 THEN MISS=MISS+1: MISSPT=TEST
840 NEXT TEST
850 IF MISS<=1 THEN GOSUB 880
860 RETURN
870 REM
880 REM Subroutine to output a useful restriction site and alignment
890 REM print a "*" at the mutation site and a "-" at the mismatch base
900 PRINT#3,
910 PRINT
920 OUTLINE$=SPACE$(20)
930 IF MISSPT<>0 THEN MID$(OUTLINE$,MISSPT)="-"
940 MID$(OUTLINE$,INSITE)="*"
950 PRINT , OUTLINE$
960 PRINT#3, , OUTLINE$
970 PRINT#3, ENZNAME$, RECOG$, MISS;" MISMATCHES"
980 PRINT ENZNAME$, RECOG$, MISS;" mismatches"
990 PRINT#3, "TARGET DNA ", MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1000 PRINT "target dna ",MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1010 RETURN

-----------------------------APR91.ENZ follows:-------------------------
AatII        GACGTC
AccI         GTMKAC
AciI         CCGC
AflII        CTTAAG
AflIII       ACRYGT
AgeI         ACCGGT
AhaII        GRCGYC
AluI         AGCT
AlwI         GGATC
AlwI         GATCC
AlwNI        CAGNNNCTG
ApaI         GGGCCC
ApaLI        GTGCAC
AseI         ATTAAT
AvaI         CYCGRG
AvaII        GGWCC
AvrII        CCTAGG
BamHI        GGATCC
BanI         GGYRCC
BanII        GRGCYC
BbsI         GAAGAC
BbsI         GTCTTC
BbvI         GCAGC
BbvI         GCTGC
BcgI         CGANNNNNNTGC
BcgI         GCANNNNNNTCG
BclI         TGATCA
BglI         GCCNNNNNGGC
BglII        AGATCT
BsaI         GGTCTC
BsaI         GAGACC
BsaAI        YACGTR
BsaBI        GATNNNNATC
BsaJI        CCNNGG
BsaHI        GRCGYC
BsiWI        CGTACG
BslI         CCNNNNNNNGG
BsmI         GAATGC
BsmI         GCATTC
BsmAI        GTCTC
BsmAI        GAGAC
Bspl286      GDGCHC
BspDI        ATCGAT
BspEI        TCCGGA
BspHI        TCATGA
BspMI        ACCTGC
BspMI        GCAGGT
BsrI         ACTGG
BsrI         CCAGT
BssHII       GCGCGC
BstBI        TTCGAA
BstEII       GGTNACC
BstNI        CCWGG
BstUI        CGCG
BstXI        CCANNNNNNTGG
BstYI        RGATCY
Bsu36I       CCTNAGG
Cfr10I       RCCGGY
ClaI         ATCGAT
DdeI         CTNAG
DpnI         GATC
DpnII        GATC
DraI         TTTAAA
DraIII       CACNNNGTG
DrdI         GACNNNNNNGTC
EaeI         YGGCCR
EagI         CGGCCG
EarI         CTCTTC
EarI         GAAGAG
EcoNI        CCTNNNNNAGG
EcoO109I     RGGNCCY
EcoRI        GAATTC
EcoRV        GATATC
Eco47III     AGCGCT
EspI         GCTNAGC
Fnu4HI       GCNGC
FokI         GGATG
FokI         CATCC
FspI         TGCGCA
GdiII        YGGCCG
GsuI         CTCCAG
GsuI         CTGGAG
HaeI         WGGCCW
HaeII        RGCGCY
HaeIII       GGCC
HgaI         GACGC
HgaI         GCGTC
HgiAI        GWGCWC
HgiEII       ACCNNNNNNGGT
HhaI         GCGC
HinCII       GTYRAC
HinDIII      AAGCTT
HinFI        GANTC
HinPI        GCGC
HpaI         GTTAAC
HphI         GGTGA
HphI         TCACC
KasI         GGCGCC
KpnI         GGTACC
MaeII        ACGT
MaeIII       GTNAC
MboI         GATC
MboII        GAAGA
MboII        TCTTC
MluI         ACGCGT
MnlI         CCTC
MnlI         GAGG
MscI         TGGCCA
MseI         TTAA
MspI         CCGG
NaeI         GCCGGC
NarI         GGCGCC
NciI         CCSGG
NcoI         CCATGG
NdeI         CATATG
NheI         GCTAGC
NlaIII       CATG
NlaIV        GGNNCC
NotI         GCGGCCGC
NruI         TCGCGA
NsiI         ATGCAT
NspBII       CMGCKG
NspHI        RCATGY
PacI         TTAATTAA
PaeR7I       CTCGAG
PflMI        CCANNNNNTGG
PleI         GAGTC
PleI         GACTC
PmlI         CACGTG
PpuMI        RGGWCCY
PstI         CTGCAG
PvuI         CGATCG
PvuII        CAGCTG
RmaI         CTAG
RsaI         GTAC
RsrII        CGGWCCG
SacI         GAGCTC
SacII        CCGCGG
SalI         GTCGAC
Sau96I       GGNCC
ScaI         AGTACT
ScrFI        CCNGG
SfaNI        GCATC
SfaNI        GATGC
Sfi I        GGCCNNNNNGGCC
SmaI         CCCGGG
SnaI         GTATAC
SnaBI        TACGTA
SpeI         ACTAGT
SphI         GCATGC
SplI         CGTACG
SspI         AATATT
StuI         AGGCCT
StyI         CCWWGG
TaqI         TCGA
TfiI         GAWTC
Tth111I      GACNNNGTC
Tth111II     CAARCA
Tth111II     TGYTTG
XbaI         TCTAGA
XcmI         CCANNNNNNNNNTGG
XhoI         CTCGAG
XmaI         CCCGGG
XmnI         GAANNNNTTC

----------------------------MISINDEL.BAS follows--------------------------
10 REM This program, "MISINDEL", by LANCE DAVIDOW. ver 14June91.(rems 14Dec94)
15 REM Work done at Collaborative Research, Inc,in 1991
20 REM My new address:Dept of Molecular Biology
21 REM Mass. General Hospital, Boston MA 02114. (effective July 1993)
22 REM Email davidow@frodo.mgh.harvard.edu.
23 REM This program is separate from and supplements MISMATCH (L. S.
24 REM Davidow (1992). Selecting PCR designed mismatch primers to create
25 REM diagnostic restriction sites. Comput. Applic. in Biosci. (CABIOS)
26 REM 8:193-194.)
30 REM It finds directed mismatch primers to allow allelotyping by restriction
40 REM digest analysis following PCR for INSERTIONS and DELETIONS
50 REM ref: Haliassos et al. (1989). Nuc Acids Res 17:3606
60 REM Overall Program Logic= For larger allele, enzyme recognition site must
70 REM include at least one of the bases in the insertion.
80 REM I'm not checking if the site is present in the deleted allele but would
90 REM generate a different sized fragment
100 REM For Deleted allele, the recognition site must include at least 1 bp
110 REM on both sides of the deletion. Test 1st thru (last-1) bp of recog site
120 REM to fit bp of gene before deletion OR 2nd thru last to fit bp after del
130 DEFINT A-Z
140 REM ALLBASES$ IS THE STRING OF ALL POSSIBLE LEGAL BASES
150 ALLBASES$="ABCDGHKMNRSTUVWY"
160 REM matrule$ array has NEBiolabs ambiguous base matching rules
170 DIM MATRULE$(16)
180 REM blanks or illegal characters in the recognition site treated as "N"
190 MATRULE$(0)="ACGTU"
200 MATRULE$(1)="A"
210 MATRULE$(2)="CGTU"
220 MATRULE$(3)="C"
230 MATRULE$(4)="AGTU"
240 MATRULE$(5)="G"
250 MATRULE$(6)="ACTU"
260 MATRULE$(7)="GTU"
270 MATRULE$(8)="AC"
280 MATRULE$(9)="ACGTU"
290 MATRULE$(10)="AG"
300 MATRULE$(11)="CG"
310 MATRULE$(12)="TU"
320 MATRULE$(13)="TU"
330 MATRULE$(14)="ACG"
340 MATRULE$(15)="ATU"
350 MATRULE$(16)="CTU"
355 REM to avoid duplicate hits, XENZNAME$ is the ENZNAME$ previously tried
357 XENZNAME$=""
358 XDIFF=-100
360 REM the array XTRAB$(20) stores each consecutive base in the insertion
370 DIM XTRAB$(20)
380 INPUT "FILE NAME WITH THE LONGER SEQUENCE ON 1 LINE";GENEFILE$
390 INPUT "RESTRICTION ENZYME FILE [APR91.ENZ]";ENZFILE$
400 IF ENZFILE$="" THEN ENZFILE$="APR91.ENZ"
410 INPUT "OUTPUT FILE NAME [NUL=NO OUTPUT FILE]";OUTFILE$
420 IF OUTFILE$="" THEN OUTFILE$="NUL"
430 OPEN GENEFILE$ FOR INPUT AS #1
440 LINE INPUT #1, GENESEQ$
450 CLOSE #1
460 OPEN OUTFILE$ FOR OUTPUT AS #3
470 PRINT#3,"GENE FILE=";GENEFILE$,"ENZYME FILE=";ENZFILE$,"OUTPUT=";OUTFILE$
480 INPUT "POSITION of last shared base before insertion--e.g.25";MUTPOS
490 INPUT "NUMBER OF BASES EXTRA IN LARGER ALLELE--e.g.3";DELSIZE
500 REM 21>DELSIZE>=1 and at least DELSIZE bases must be left in GENESEQ$
510 FOR I=1 TO DELSIZE
520 XTRAB$(I)=MID$(GENESEQ$,(MUTPOS+I),1)
530 NEXT I
540 PRINT#3,
550 PRINT#3,"ARROWS FLANK THE NUMBERED INSERTED BASES. '-' AT MISMATCH POSITION"
560 PRINT#3,
570 TAG$=LEFT$("123456789abcdefghijk",DELSIZE)
580 REM TAG$ is numbering (base 21) printed over the sequence to show insert
590 OUTLINE$= SPACE$(MUTPOS-1) + ">"+ TAG$ + "<" + SPACE$(20)
600 PRINT#3, OUTLINE$
610 PRINT#3, GENESEQ$
620 PRINT#3,"INSERT AFTER bp ";MUTPOS,"NUMBER OF INSERTED BASES= ";DELSIZE
630 PRINT "First Pass--Sites Present only in Longer Allele"
640 PRINT#3,"First Pass--Sites Present Only in Longer Allele"
650 OPEN ENZFILE$ FOR INPUT AS #2
660 WHILE NOT EOF(2)
670 LINE INPUT #2,NEXTENZ$
680 ENZNAME$=MID$(NEXTENZ$,1,13)
690 REM ENZYME NAMES IN COLS 1 TO 13. RECOGNITION SITE FROM 14 TO END
700 RECOG$=MID$(NEXTENZ$,14,65)
710 SITESIZE=LEN(RECOG$)
720 FOR INSITE=1 TO SITESIZE
730 REM DOES THIS BASE MATCH the next extra base in insert?
740 FOR INDEL=1 TO DELSIZE
750 LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
760 A$=MATRULE$(LOOKUP)
770 REM call subroutine if this base matches a base in the insert
780 REM need to eliminate testing an alignment already tested
790 REM i.e. if 2nd base in recog site matches 2nd in insert and
800 REM we already tested 1st base matches 1st base. repeats in output
810 IF INSTR(A$,XTRAB$(INDEL))<>0 THEN GOSUB 1170
820 NEXT INDEL
830 NEXT INSITE
840 WEND
850 CLOSE #2
860 PRINT
870 PRINT#3,
880 PRINT "Second Pass--Sites Present Only in Shorter Allele"
890 PRINT#3, "Second Pass--Sites Present Only in Shorter Allele"
900 GENESEQ$=MID$(GENESEQ$,1,MUTPOS) + MID$(GENESEQ$,(MUTPOS+DELSIZE+1))
910 OUTLINE$=SPACE$(MUTPOS-1)+">" + "<" +SPACE$(20)
930 OPEN ENZFILE$ FOR INPUT AS #2
940 WHILE NOT EOF(2)
950 LINE INPUT #2, NEXTENZ$
960 ENZNAME$=MID$(NEXTENZ$,1,13)
970 RECOG$=MID$(NEXTENZ$,14)
980 SITESIZE=LEN(RECOG$)
985 TESTBASE$=MID$(GENESEQ$,MUTPOS,1)
990 INDEL=0
1000 FOR INSITE=1 TO (SITESIZE-1)
1010 LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
1020 A$=MATRULE$(LOOKUP)
1030 IF INSTR(A$,TESTBASE$)<>0 THEN GOSUB 1170
1040 NEXT INSITE
1050 TESTBASE$=MID$(GENESEQ$,(MUTPOS+1),1)
1060 REM set indel=1 to increment the alignment to insite base over mutpos+1
1070 INDEL=1
1080 FOR INSITE=2 TO SITESIZE
1090 LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
1100 A$=MATRULE$(LOOKUP)
1110 IF INSTR(A$,TESTBASE$)<>0 THEN GOSUB 1170
1120 NEXT INSITE
1130 WEND
1140 CLOSE #3
1150 END
1160 REM
1170 REM Subroutine to count up mismatches between enzyme and sequence
1180 REM then call up another subroutine to output any alignments with
1190 REM one or 0 mismatches
1200 REM align recog. site with sequence. INSITE base over MUTPOS+INDEL base
1210 REM MISS is running sum of mismatches between recog site and gene
1220 REM need to avoid duplications if already tried this alignment.
1230 MISS=0
1240 REM MISSPT is position of mismatch. e.g. 5th bp in recog. site
1250 MISSPT=0
1260 FOR TEST=1 TO SITESIZE
1270 LOOKUP=INSTR(ALLBASES$,(MID$(RECOG$,TEST,1)))
1280 A$=MATRULE$(LOOKUP)
1290 IF INSTR(A$,MID$(GENESEQ$,(MUTPOS+INDEL-INSITE+TEST),1))=0 THEN MISS=MISS+1: MISSPT=TEST
1300 IF MISS>1 THEN RETURN
1310 NEXT TEST
1320 GOSUB 1331
1330 RETURN
1331 REM Subroutine to output a useful restriction site and alignment
1332 IF (XENZNAME$=ENZNAME$) AND (INSITE-MUTPOS-INDEL=XDIFF) THEN RETURN
1334 XENZNAME$=ENZNAME$
1336 XDIFF=INSITE-MUTPOS-INDEL
1340 REM
1360 PRINT#3,
1370 PRINT
1380 ALIGN$=MID$(OUTLINE$,(MUTPOS+INDEL-INSITE+1),SITESIZE)
1390 IF MISSPT<>0 THEN MID$(ALIGN$,MISSPT)="-"
1400 PRINT , ALIGN$
1410 PRINT#3, , ALIGN$
1420 PRINT#3, ENZNAME$, RECOG$, MISS;" MISMATCHES"
1430 PRINT ENZNAME$, RECOG$, MISS;" mismatch"
1440 TARGET$=MID$(GENESEQ$,(MUTPOS+INDEL-INSITE+1),SITESIZE)
1450 PRINT#3, "TARGET DNA ", TARGET$
1460 PRINT "target dna ",TARGET$
1470 RETURN
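
-----------------illustrative sketch (not part of the programs)-----------------

For readers who do not have a BASIC interpreter handy, the mismatch-counting subroutine (lines 1170-1330) and the first-pass rule of MISINDEL can be sketched roughly as follows. This is a Python stand-in, not a translation of the author's code: the names MATCHES, count_mismatches and sites_covering_insert are hypothetical, positions inside the functions are 0-based, and the explicit check that the site stays inside the sequence is an assumption added here (the BASIC listing does not make that check).

```python
# Ambiguity codes, following the MATRULE$ table in the listings
# (U is accepted wherever T is).
MATCHES = {
    "A": "A", "C": "C", "G": "G", "T": "TU", "U": "TU",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG",
    "K": "GTU", "M": "AC", "R": "AG", "S": "CG",
    "W": "ATU", "Y": "CTU", "N": "ACGTU",
}


def count_mismatches(site, gene, start):
    """Align site with its first base over gene[start] (0-based).

    Returns (mismatch_count, mismatch_position_1_based), with position 0
    when there is no mismatch. Returns None when the site would run off
    the sequence (an added assumption) or has more than one mismatch.
    """
    if start < 0 or start + len(site) > len(gene):
        return None
    miss, misspt = 0, 0
    for i, code in enumerate(site):
        allowed = MATCHES.get(code, "ACGTU")  # unknown characters act as N
        if gene[start + i] not in allowed:
            miss += 1
            misspt = i + 1
            if miss > 1:
                return None
    return miss, misspt


def sites_covering_insert(site, gene, mutpos, delsize):
    """Yield (start, miss, misspt) for alignments on the longer allele.

    mutpos is the 1-based position of the last shared base before the
    insertion and delsize the number of inserted bases. An alignment is
    kept only if the site overlaps the insert and matches at least one
    inserted base, mirroring the test on line 810.
    """
    first, last = mutpos, mutpos + delsize - 1  # 0-based span of the insert
    for start in range(first - len(site) + 1, last + 1):
        hit = count_mismatches(site, gene, start)
        if hit is None:
            continue
        lo = max(start, first)
        hi = min(start + len(site) - 1, last)
        if any(gene[p] in MATCHES.get(site[p - start], "ACGTU")
               for p in range(lo, hi + 1)):
            yield (start,) + hit


if __name__ == "__main__":
    # BamHI site over a sequence with one mismatched base (5th of the site).
    print(count_mismatches("GGATCC", "AAGGATACTT", 2))
    # Two inserted bases (AT) after position 5 create a BamHI site.
    print(list(sites_covering_insert("GGATCC", "TTTGGATCCAAA", 5, 2)))
```

Iterating over start positions, rather than over pairs of site base and inserted base as the BASIC does, means each alignment is visited once, so the XENZNAME$/XDIFF duplicate check is not needed in this sketch.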