The Domain Theory (for recognizing promoters):

% Promoters have a region where a protein (RNA polymerase) must make contact
% and the helical DNA sequence must have a valid conformation so that
% the two pieces of the contact region spatially align.
% Prolog notation is used.

promoter :- contact, conformation.

% There are two regions "upstream" from the beginning of the gene
% at which the RNA polymerase makes contact.

contact :- minus_35, minus_10.

% The following rules describe the compositions of possible contact regions.

minus_35 :- p-37=c, p-36=t, p-35=t, p-34=g, p-33=a, p-32=c.
minus_35 :- p-36=t, p-35=t, p-34=g, p-32=c, p-31=a.
minus_35 :- p-36=t, p-35=t, p-34=g, p-33=a, p-32=c, p-31=a.
minus_35 :- p-36=t, p-35=t, p-34=g, p-33=a, p-32=c.

minus_10 :- p-14=t, p-13=a, p-12=t, p-11=a, p-10=a, p-9=t.
minus_10 :- p-13=t, p-12=a, p-10=a, p-8=t.
minus_10 :- p-13=t, p-12=a, p-11=t, p-10=a, p-9=a, p-8=t.
minus_10 :- p-12=t, p-11=a, p-7=t.

% The following rules describe sequence characteristics that produce
% acceptable conformations.

conformation :- p-47=c, p-46=a, p-45=a, p-43=t, p-42=t, p-40=a, p-39=c,
                p-22=g, p-18=t, p-16=c, p-8=g, p-7=c, p-6=g, p-5=c,
                p-4=c, p-2=c, p-1=c.
conformation :- p-45=a, p-44=a, p-41=a.
conformation :- p-49=a, p-44=t, p-27=t, p-22=a, p-18=t, p-16=t, p-15=g, p-1=a.
conformation :- p-45=a, p-41=a, p-28=t, p-27=t, p-23=t, p-21=a, p-20=a,
                p-17=t, p-15=t, p-4=t.

% If exact matches are required, this domain theory matches NONE
% of the examples below. Also note that some of the MINUS_35 rules
% are subsumed by another MINUS_35 rule. This occurs because the
% biological evidence is inconclusive with respect to the correct specificity.
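The rules above can be read as a disjunction of exact position/base tests. The following is a minimal Python sketch of that reading, not part of the original theory. It assumes the first character of each 57-base sequence is position p-50 (the text here does not state this), and the names `RULES`, `FIRST_POSITION`, `base_at`, `holds`, `is_promoter`, and `subsumed` are hypothetical.

```python
# Each rule is a dict mapping a position (offset upstream of the gene start)
# to the base required there. Rules under one name are alternatives.
RULES = {
    "minus_35": [
        {-37: "c", -36: "t", -35: "t", -34: "g", -33: "a", -32: "c"},
        {-36: "t", -35: "t", -34: "g", -32: "c", -31: "a"},
        {-36: "t", -35: "t", -34: "g", -33: "a", -32: "c", -31: "a"},
        {-36: "t", -35: "t", -34: "g", -33: "a", -32: "c"},
    ],
    "minus_10": [
        {-14: "t", -13: "a", -12: "t", -11: "a", -10: "a", -9: "t"},
        {-13: "t", -12: "a", -10: "a", -8: "t"},
        {-13: "t", -12: "a", -11: "t", -10: "a", -9: "a", -8: "t"},
        {-12: "t", -11: "a", -7: "t"},
    ],
    "conformation": [
        {-47: "c", -46: "a", -45: "a", -43: "t", -42: "t", -40: "a", -39: "c",
         -22: "g", -18: "t", -16: "c", -8: "g", -7: "c", -6: "g", -5: "c",
         -4: "c", -2: "c", -1: "c"},
        {-45: "a", -44: "a", -41: "a"},
        {-49: "a", -44: "t", -27: "t", -22: "a", -18: "t", -16: "t",
         -15: "g", -1: "a"},
        {-45: "a", -41: "a", -28: "t", -27: "t", -23: "t", -21: "a",
         -20: "a", -17: "t", -15: "t", -4: "t"},
    ],
}

FIRST_POSITION = -50  # ASSUMPTION: sequence index 0 corresponds to p-50


def base_at(seq, position):
    """Return the base at a negative position such as -35."""
    return seq[position - FIRST_POSITION]


def holds(seq, name):
    """True if at least one rule for `name` matches the sequence exactly."""
    return any(all(base_at(seq, p) == b for p, b in rule.items())
               for rule in RULES[name])


def is_promoter(seq):
    """promoter :- contact, conformation.  contact :- minus_35, minus_10."""
    contact = holds(seq, "minus_35") and holds(seq, "minus_10")
    return contact and holds(seq, "conformation")


def subsumed(name):
    """1-based indices of rules whose conditions strictly contain another rule's."""
    rules = RULES[name]
    return [i + 1 for i, rule in enumerate(rules)
            if any(j != i and other.items() < rule.items()
                   for j, other in enumerate(rules))]


trp = "tctgaaatgagctgttgacaattaatcatcgaactagttaactagtacgcaagttca"
print(holds(trp, "minus_35"), holds(trp, "minus_10"))  # True False
print(subsumed("minus_35"))  # [1, 3]
```

Under the stated position assumption, the TRP example satisfies a MINUS_35 rule but no MINUS_10 rule, so exact matching rejects it. The `subsumed` helper flags the first and third MINUS_35 rules, whose conditions are supersets of the fourth rule's.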
To: ronnyk@cs.stanford.edu
Date: Fri, 21 Jan 1994 09:10:12 -0800
From: Eddie Schwalb

+,S10, tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt
+,AMPC, tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa
+,AROH, gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg
+,DEOP2, aattgtgatgtgtatcgaagtgtgttgcggagtagatgttagaatactaacaaactc
+,LEU1_TRNA, tcgataattaactattgacgaaaagctgaaaaccactagaatgcgcctccgtggtag
+,MALEFG, aggggcaaggaggatggaaagaggttgccgtataaagaaactagagtccgtttaggt
+,MALK, cagggggtggaggatttaagccatctcctgatgacgcatagtcagcccatcatgaat
+,RECA, tttctacaaaacacttgatactgtatgagcatacagtataattgcttcaacagaaca
+,RPOB, cgacttaatatactgcgacaggacgtccgttctgtgtaaatcgcaatgaaatggttt
+,RRNAB_P1, ttttaaatttcctcttgtcaggccggaataactccctataatgcgccaccactgaca
+,RRNAB_P2, gcaaaaataaatgcttgactctgtagcgggaaggcgtattatgcacaccccgcgccg
+,RRNDEX_P2, cctgaaattcagggttgactctgaaagaggaaagcgtaatatacgccacctcgcgac
+,RRND_P1, gatcaaaaaaatacttgtgcaaaaaattgggatccctataatgcgcctccgttgaga
+,RRNE_P1, ctgcaatttttctattgcggcctgcggagaactccctataatgcgcctccatcgaca
+,RRNG_P1, tttatatttttcgcttgtcaggccggaataactccctataatgcgccaccactgaca
+,RRNG_P2, aagcaaagaaatgcttgactctgtagcgggaaggcgtattatgcacaccgccgcgcc
+,RRNX_P1, atgcatttttccgcttgtcttcctgagccgactccctataatgcgcctccatcgaca
+,TNAA, aaacaatttcagaatagacaaaaactctgagtgtaataatgtagcctcgtgtcttgc
+,TYRT, tctcaacgtaacactttacagcggcgcgtcatttgatatgatgcgccccgcttcccg
+,ARAC, gcaaataatcaatgtggacttttctgccgtgattatagacacttttgttacgcgttt
+,LACI, gacaccatcgaatggcgcaaaacctttcgcggtatggcatgatagcgcccggaagag
+,MALT, aaaaacgtcatcgcttgcattagaaaggtttctggccgaccttataaccattaatta
+,TRP, tctgaaatgagctgttgacaattaatcatcgaactagttaactagtacgcaagttca
+,TRPP2, accggaagaaaaccgtgacattttaacacgtttgttacaaggtaaaggcgacgccgc
+,THR, aaattaaaattttattgacttaggtcactaaatactttaaccaatataggcatagcg
+,BIOB, ttgtcataatcgacttgtaaaccaaattgaaaagatttaggtttacaagtctacacc
+,FOL, catcctcgcaccagtcgacgacggtttacgctttacgtatagtggcgacaatttttt
+,UVRBP1, tccagtataatttgttggcataattaagtacgacgagtaaaattacatacctgcccg
+,UVRBP3, acagttatccactattcctgtggataaccatgtgtattagagttagaaaacacgagg
+,LEXA, tgtgcagtttatggttccaaaatcgccttttgctgtatatactcacagcataactgt
+,PORI-L, ctgttgttcagtttttgagttgtgtataacccctcattctgatcccagcttatacgg
+,SPOT42, attacaaaaagtgctttctgaactgaacaaaaaagagtaaagttagtcgcgtagggt
+,M1RNA, atgcgcaacgcggggtgacaagggcgcgcaaaccctctatactgcgcgccgaagctg
+,GLNS, taaaaaactaacagttgtcagcctgtcccgcttataagatcatacgccgttatacgt
+,TUFB, atgcaattttttagttgcatgaactcgcatgtctccatagaatgcgcgctacttgat
+,SUBB-E, ccttgaaaaagaggttgacgctgcaaggctctatacgcataatgcgccccgcaacgc
+,STR, tcgttgtatatttcttgacaccttttcggcatcgccctaaaattcggcgtcctcata
+,SPC, ccgtttattttttctacccatatccttgaagcggtgttataatgccgcgccctcgat
+,RPOA, ttcgcatatttttcttgcaaagttgggttgagctggctagattagccagccaatctt
+,RPLJ, tgtaaactaatgcctttacgtgggcggtgattttgtctacaatcttacccccacgta
+,PORI-R, gatcgcacgatctgtatacttatttgagtaaattaacccacgatcccagccattctt
+,ALAS, aacgcatacggtattttaccttcccagtcaagaaaacttatcttattcccacttttc
+,ARABAD, ttagcggatcctacctgacgctttttatcgcaactctctactgtttctccatacccg
+,BIOA, gccttctccaaaacgtgttttttgttgttaattcggtgtagacttgtaaacctaaat
+,DEOP1, cagaaacgttttattcgaacatcgatctcgtcttgtgttagaattctaacatacggt
+,GALP2, cactaatttattccatgtcacacttttcgcatctttgttatgctatggttatttcat
+,HIS, atataaaaaagttcttgctttctaacgtgaaagtggtttaggttaaaagacatcagt
+,HISJ, caaggtagaatgctttgccttgtcggcctgattaatggcacgatagtcgcatcggat
+,ILVGEDA, ggccaaaaaatatcttgtactatttacaaaacctatggtaactctttaggcattcct
+,LACP1, taggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtg
+,LPP, ccatcaaaaaaatattctcaacataaaaaactttgtgtaatacttgtaacgctacat
+,TRPR, tggggacgtcgttactgatccgcacgtttatgatatgctatcgtactctttagcgag
+,UVRB_P2, tcagaaatattatggtgatgaactgtttttttatccagtataatttgttggcataat
-, 867, atatgaacgttgagactgccgctgagttatcagctgtgaacgacattctggcgtcta
-,1169, cgaacgagtcaatcagaccgctttgactctggtattactgtgaacattattcgtctc
-, 802, caatggcctctaaacgggtcttgaggggttttttgctgaaaggaggaactatatgcg
-, 521, ttgacctactacgccagcattttggcggtgtaagctaaccattccggttgactcaat
-, 918, cgtctatcggtgaacctccggtatcaacgctggaaggtgacgctaacgcagatgcag
-,1481, gccaatcaatcaagaacttgaagggtggtatcagccaacagcctgacatccttcgtt
-,1024, tggatggacgttcaacattgaggaaggcataacgctactacctgatgtttactccaa
-,1149, gaggtggctatgtgtatgaccgaacgagtcaatcagaccgctttgactctggtatta
-, 313, cgtagcgcatcagtgctttcttactgtgagtacgcaccagcgccagaggacgacgac
-, 780, cgaccgaagcgagcctcgtcctcaatggcctctaaacgggtcttgaggggttttttg
-,1384, ctacggtgggtacaatatgctggatggagatgcgttcacttctggtctactgactcg
-, 507, atagtctcagagtcttgacctactacgccagcattttggcggtgtaagctaaccatt
-, 39, aactcaaggctgatacggcgagacttgcgagccttgtccttgcggtacacagcagcg
-,1203, ttactgtgaacattattcgtctccgcgactacgatgagatgcctgagtgcttccgtt
-, 988, tattctcaacaagattaaccgacagattcaatctcgtggatggacgttcaacattga
-,1171, aacgagtcaatcagaccgctttgactctggtattactgtgaacattattcgtctccg
-, 753, aagtgcttagcttcaaggtcacggatacgaccgaagcgagcctcgtcctcaatggcc
-, 630, gaagaccacgcctcgccaccgagtagacccttagagagcatgtcagcctcgacaact
-, 660, ttagagagcatgtcagcctcgacaacttgcataaatgctttcttgtagacgtgccct
-,1216, tattcgtctccgcgactacgatgagatgcctgagtgcttccgttactggattgtcac
-, 835, tgctgaaaggaggaactatatgcgctcatacgatatgaacgttgagactgccgctga
-, 35, catgaactcaaggctgatacggcgagacttgcgagccttgtccttgcggtacacagc
-,1218, ttcgtctccgcgactacgatgagatgcctgagtgcttccgttactggattgtcacca
-, 668, catgtcagcctcgacaacttgcataaatgctttcttgtagacgtgccctacgcgctt
-, 413, aggaggaactacgcaaggttggaacatcggagagatgccagccagcgcacctgcacg
-, 991, tctcaacaagattaaccgacagattcaatctcgtggatggacgttcaacattgagga
-, 751, tgaagtgcttagcttcaaggtcacggatacgaccgaagcgagcctcgtcctcaatgg
-, 850, ctatatgcgctcatacgatatgaacgttgagactgccgctgagttatcagctgtgaa
-, 93, gcggcagcacgtttccacgcggtgagagcctcaggattcatgtcgatgtcttccggt
-,1108, atccctaatgtctacttccggtcaatccatctacgttaaccgaggtggctatgtgta
-, 915, tggcgtctatcggtgaacctccggtatcaacgctggaaggtgacgctaacgcagatg
-,1019, tctcgtggatggacgttcaacattgaggaaggcataacgctactacctgatgtttac
-, 19, tattggcttgctcaagcatgaactcaaggctgatacggcgagacttgcgagccttgt
-,1320, tagagggtgtactccaagaagaggaagatgaggctagacgtctctgcatggagtatg
-, 91, cagcggcagcacgtttccacgcggtgagagcctcaggattcatgtcgatgtcttccg
-, 217, ttacgttggcgaccgctaggactttcttgttgattttccatgcggtgttttgcgcaa
-, 957, acgctaacgcagatgcagcgaacgctcggcgtattctcaacaagattaaccgacaga
-, 260, ggtgttttgcgcaatgttaatcgctttgtacacctcaggcatgtaaacgtcttcgta
-, 557, aaccattccggttgactcaatgagcatctcgatgcagcgtactcctacatgaataga
-,1355, agacgtctctgcatggagtatgagatggactacggtgggtacaatatgctggatgga
-, 244, tgttgattttccatgcggtgttttgcgcaatgttaatcgctttgtacacctcaggca
-, 464, tgcacgggttgcgatagcctcagcgtattcaggtgcgagttcgatagtctcagagtc
-, 296, aggcatgtaaacgtcttcgtagcgcatcagtgctttcttactgtgagtacgcaccag
-, 648, ccgagtagacccttagagagcatgtcagcctcgacaacttgcataaatgctttcttg
-, 230, cgctaggactttcttgttgattttccatgcggtgttttgcgcaatgttaatcgcttt
-,1163, tatgaccgaacgagtcaatcagaccgctttgactctggtattactgtgaacattatt
-,1321, agagggtgtactccaagaagaggaagatgaggctagacgtctctgcatggagtatga
-, 663, gagagcatgtcagcctcgacaacttgcataaatgctttcttgtagacgtgccctacg
-, 799, cctcaatggcctctaaacgggtcttgaggggttttttgctgaaaggaggaactatat
-, 987, gtattctcaacaagattaaccgacagattcaatctcgtggatggacgttcaacattg
-,1226, cgcgactacgatgagatgcctgagtgcttccgttactggattgtcaccaaggcttcc
-, 794, ctcgtcctcaatggcctctaaacgggtcttgaggggttttttgctgaaaggaggaac
-,1442, taacattaataaataaggaggctctaatggcactcattagccaatcaatcaagaact
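Each example record appears to consist of a class label (`+` or `-`), an instance name, and a 57-base sequence, separated by commas. As a minimal sketch under that assumption, the records could be pulled out of the raw text with a regular expression, whether or not they sit one per line. The names `RECORD`, `parse_records`, and `class_counts` are hypothetical and not part of the dataset.

```python
import re
from collections import Counter

# ASSUMED record shape: class (+ or -), instance name, 57 lowercase bases.
# The lookahead rejects runs longer than 57 bases.
RECORD = re.compile(r"([+-]),\s*([^,\s]+),\s*([acgt]{57})(?![acgt])")


def parse_records(text):
    """Return (label, name, sequence) tuples found anywhere in `text`."""
    return [m.groups() for m in RECORD.finditer(text)]


def class_counts(records):
    """Count how many records carry each class label."""
    return Counter(label for label, _, _ in records)


sample = (
    "+,S10, tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt "
    "-, 867, atatgaacgttgagactgccgctgagttatcagctgtgaacgacattctggcgtcta"
)
records = parse_records(sample)
print([(label, name) for label, name, _ in records])
# [('+', 'S10'), ('-', '867')]
```

Matching on the whole text rather than splitting on newlines is a deliberate choice here: it tolerates the padding before numeric names such as ` 867` and ignores surrounding non-record text such as mail headers.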