
Slide 1: Using a Beagle to Sniff for Bacterial Promoters
Stefan R. Maetschke, Michael Towsey and James M. Hogan
Queensland University of Technology (CRICOS No. 00213J)

Slide 2: An Agenda
Bacterial promoters
  – The domain and the motifs
  – Earlier approaches, including ours
Why dumber is better
  – Not quite, but flexibility before sophistication
  – Exploiting new features as they are identified
Results
CRICOS No. 00213J, a university for the real world®

Slide 3: Upstream from a Bacterial Gene
[Figure: promoter and RNA polymerase upstream of the transcription start site (TSS); transcription runs past the gene start site (GSS) into the gene]
Search for ‘conserved’ -10 and -35 hexamers
  – Except they’re not really conserved
  – Plagued by massive false-positive rates
But this is the Reader’s Digest version.
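A minimal sketch of why naive hexamer searching is so noisy: it counts placements of TTGACA and TATAAT in uniformly random sequence, allowing up to k mismatches per box. The 15 to 19 bp spacer range, the random background and the function names are assumptions for illustration, not details taken from the slides.

```python
import random

MINUS35 = "TTGACA"  # canonical -35 hexamer
MINUS10 = "TATAAT"  # canonical -10 hexamer

def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def count_hits(seq: str, max_mm: int, spacers=range(15, 20)) -> int:
    """Count (-35 start, spacer) placements where both boxes have <= max_mm mismatches."""
    hits = 0
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS35) > max_mm:
            continue
        for s in spacers:  # assumed spacer range, 15-19 bp
            j = i + 6 + s  # start of the -10 box
            if j + 6 <= len(seq) and mismatches(seq[j:j + 6], MINUS10) <= max_mm:
                hits += 1
    return hits

random.seed(0)
background = "".join(random.choice("ACGT") for _ in range(10_000))  # random "genome"
for k in range(3):
    print(f"max {k} mismatches per box: {count_hits(background, k)} hits")
```

With two mismatches allowed per box, a random 6-mer matches a given hexamer with probability 154/4096 (about 3.8%), so roughly 70 chance placements are expected per 10 kb of random sequence, none of them real promoters.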

Slide 4: Previous Work
Mainly in the E. coli σ70 system
PWMs (position weight matrices): simple, but poor discrimination
  – Good performance if compound structure is used
  – (Collado-Vides et al.: state of the art before 2006)
HMMs (hidden Markov models): less successful than in eukaryotes
TDNNs (time-delay neural networks): boosted by the GSS offset distribution
SVMs (support vector machines): spectrum kernel ensemble
  – (Gordon et al. (us): state of the art, but at a price)
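The spectrum kernel behind the SVM ensemble compares two sequences through their k-mer counts. A minimal sketch of the kernel alone, assuming exact k-mer counting with no mismatches; the SVM training and the ensemble are not shown, and the function names are hypothetical.

```python
from collections import Counter

def spectrum(seq: str, k: int) -> Counter:
    """k-mer spectrum: counts of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    # Counter returns 0 for k-mers absent from y, so only shared k-mers contribute.
    return sum(count * sy[kmer] for kmer, count in sx.items())
```

For example, TATAAT has 2-mer counts TA:2, AT:2, AA:1, so its kernel value with itself is 4 + 4 + 1 = 9, while it shares no 2-mers with TTGACA.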

Slide 5: Beagle
Principled and rapid inclusion of motifs as they are discovered or hypothesised
  – Before the Gordon et al. paper, a TP:FP ratio of 1:300 was considered good
  – But this was based solely on the -10 and -35 motifs
A model description language and parser
  – Less sophisticated than it sounds, but sufficient
Iterative refinement of the model
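For scale, a TP:FP ratio of 1:300 corresponds to a precision of one correct prediction in every 301 reported sites:

```latex
\text{precision} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}} = \frac{1}{1 + 300} = \frac{1}{301} \approx 0.33\%
```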

Slide 6: Upstream from a Bacterial Gene
[Figure: -35 element (TTGACA) and -10 element (TATAAT) upstream of the TSS, with the GSS at the ATG start codon]
Core enzyme: α2ββ′ω
Specific sigma controls binding at the -10 and -35 elements
  – But binding probability varies enormously
  – Compensate when hexamers are weak
“It has long been known that domains 2 and 4 … bind to the strongly conserved -10 and -35 boxes.” Except when they don’t, because they aren’t…
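A PWM turns aligned examples of a box into per-position scores, so a hexamer can score anywhere from strong to weak rather than simply match or fail. A minimal sketch of a log-odds PWM, assuming a uniform 0.25 background and a pseudocount of 0.5; the six aligned -10 sites are made up for illustration and are not data from the talk.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from equal-length aligned sites, smoothed with pseudocounts."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, kmer):
    """Sum of per-position log-odds: positive means more site-like than background."""
    return sum(col[b] for col, b in zip(pwm, kmer))

# Hypothetical aligned -10 boxes (illustrative only, not real promoter data).
sites = ["TATAAT", "TATAAT", "TAAAAT", "TATGAT", "GATAAT", "TACAAT"]
pwm = build_pwm(sites)
```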

Slide 7: Upstream from a Bacterial Gene
[Figure: -35 element (TTGACA), extended -10 element (TRTG), -10 element (TATAAT), TSS and GSS (ATG)]
Simple extended -10: TG
  – Discovered in B. subtilis; found in 20% of promoters in E. coli
Extended -16: hypothesised to be important in E. coli, with consensus TRTG, i.e. T(A/G)TG
But even the alpha subunits aren’t what they seem…
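Consensus strings such as TRTG use IUPAC ambiguity codes (R = A or G, W = A or T, N = any base). A minimal sketch that translates such a consensus into a regular expression and reports every, possibly overlapping, match; the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def iupac_to_regex(pattern: str) -> str:
    return "".join(IUPAC[ch] for ch in pattern.upper())

def find_motif(seq: str, pattern: str):
    """Start positions of all matches; the lookahead lets matches overlap."""
    rx = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in rx.finditer(seq)]
```

For example, `find_motif("CCTATGCTGTGA", "TRTG")` finds TATG at position 2 and TGTG at position 7.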

Slide 8: Upstream from a Bacterial Gene
[Figure: proximal and distal UP element subsites (AAAAAARNR, AWWWWWTTTTT) contacted by CTD 1 and CTD 2; NTD 1 and NTD 2; -35 element (TTGACA); extended -10 element (TRTG, TGTATAAT; -16); TSS; GSS (ATG)]
CTDs are the carboxy-terminal domains of the α subunits, which bind to the UP elements
  – AT-rich region; the proximal element is more important
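A minimal sketch of locating the most AT-rich window in an upstream region, as a crude stand-in for an "AT-rich region" UP element model. The window width is a free parameter here; the slides do not say how Beagle scores AT richness, and the function names are hypothetical.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases that are A or T."""
    return sum(b in "AT" for b in seq) / len(seq)

def most_at_rich_window(seq: str, width: int):
    """Return (start, AT fraction) of the first most AT-rich window of the given width."""
    best = max(range(len(seq) - width + 1), key=lambda i: at_fraction(seq[i:i + width]))
    return best, at_fraction(seq[best:best + width])
```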

Slide 9: The Data
E. coli and B. subtilis
Confirmed TSS locations within 250 bp of the nearest gene start
  – No overlapping reading frames
N = 492 (E. coli), 205 (B. subtilis)
250 bp upstream regions (USRs) available
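A minimal sketch of the selection rule on this slide, assuming a hypothetical record with 0-based, forward-strand TSS and GSS coordinates: keep a gene only if its confirmed TSS lies inside the 250 bp USR ending at the gene start, and return the USR together with the TSS offset inside it. The overlapping-reading-frame filter is not modelled.

```python
from dataclasses import dataclass

USR_LEN = 250  # length of the upstream region, as on the slide

@dataclass
class Gene:  # hypothetical record: 0-based, forward-strand coordinates
    name: str
    tss: int  # confirmed transcription start site
    gss: int  # gene start site

def extract_usr(genome: str, gene: Gene):
    """Return (usr, tss_offset), or None if the TSS is not within USR_LEN bp upstream of the GSS."""
    start = gene.gss - USR_LEN
    if start < 0 or not (start <= gene.tss < gene.gss):
        return None
    return genome[start:gene.gss], gene.tss - start
```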

Slide 10: Beagle Algorithm
Define a consensus promoter
  – e.g.
  – Ordered pairs specify gap ranges
Parse the description and define PWMs and weighted gaps
  – Initially trivial
Refine using the confirmed TSS locations
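The transcript does not preserve Beagle's description syntax, so the sketch below invents one: IUPAC motif tokens separated by (min,max) gap ranges, such as TTGACA (15,19) TATAAT (4,8). It parses a description into PWMs and weighted gaps and gives both a "trivial" starting value: probability shared among the bases each consensus letter allows, and uniform gap weights. The syntax, the names and the eps floor are all assumptions.

```python
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
               "W": "AT", "S": "CG", "K": "GT", "M": "AC", "N": "ACGT"}

def initial_pwm(consensus: str, eps: float = 0.01):
    """Starting PWM: mass shared among the allowed bases, eps on the others; columns sum to 1."""
    pwm = []
    for ch in consensus:
        allowed = IUPAC_BASES[ch]
        share = (1 - eps * (4 - len(allowed))) / len(allowed)
        pwm.append({b: share if b in allowed else eps for b in "ACGT"})
    return pwm

def parse_description(text: str):
    """Parse a hypothetical description like 'TTGACA (15,19) TATAAT (4,8)'.

    Gap tokens are (min,max) pairs written without spaces.
    Returns a list of ('motif', pwm) and ('gap', {length: weight}) elements.
    """
    elements = []
    for token in text.split():
        if token.startswith("(") and token.endswith(")"):
            lo, hi = (int(x) for x in token[1:-1].split(","))
            if not 0 <= lo <= hi:
                raise ValueError(f"bad gap range {token}")
            # Weighted gap: start from a uniform distribution over [lo, hi].
            elements.append(("gap", {g: 1 / (hi - lo + 1) for g in range(lo, hi + 1)}))
        elif token and set(token) <= set(IUPAC_BASES):
            elements.append(("motif", initial_pwm(token)))
        else:
            raise ValueError(f"unrecognised token {token!r}")
    return elements
```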

Slide 11: Beagle Algorithm
For each USR in the training set:
  – Anchor the pattern to the known TSS location
  – Determine the best match under the current model
Find the maximum-likelihood estimate (MLE) of the model parameters from the best matches in the training data
Test the refined definition on unseen data
  – 10 repeats × 10-fold cross-validation
  – Essentially TSS prediction
Iterate until improvement ceases
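A minimal sketch of the refinement loop for a two-box model (-35 box, spacer, -10 box, gap to the TSS), assuming log-odds scoring against a uniform background and pseudocount-smoothed maximum-likelihood updates (strictly a MAP estimate). Each step anchors the pattern at the known TSS, keeps the single best placement per USR (a hard-EM, Viterbi-style update) and re-estimates the PWMs and gap weights from those placements. Unlike the slide, which stops when held-out TSS prediction stops improving, this sketch stops when the anchored training score stops improving; the cross-validation loop is omitted and all names are hypothetical.

```python
import math

BASES = "ACGT"
BG = 0.25  # assumed uniform background base frequency

def consensus_pwm(consensus, eps=0.01):
    """Trivial starting PWM: most of the mass on the consensus base."""
    return [{b: 1 - 3 * eps if b == c else eps for b in BASES} for c in consensus]

def uniform(lo, hi):
    return {g: 1 / (hi - lo + 1) for g in range(lo, hi + 1)}

def normalise(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def log_odds(pwm, kmer):
    return sum(math.log(col[b] / BG) for col, b in zip(pwm, kmer))

def best_match(model, usr, tss):
    """Best (-35, -10) placement for a USR whose TSS offset is known (the anchor)."""
    pwm35, pwm10, gap_sp, gap_tss = model
    best = None
    for d2, p2 in gap_tss.items():      # bases between the -10 box and the TSS
        s10 = tss - d2 - len(pwm10)
        for d1, p1 in gap_sp.items():   # spacer between the -35 and -10 boxes
            s35 = s10 - d1 - len(pwm35)
            if s35 < 0:
                continue
            score = (log_odds(pwm35, usr[s35:s35 + len(pwm35)])
                     + log_odds(pwm10, usr[s10:s10 + len(pwm10)])
                     + math.log(p1) + math.log(p2))
            if best is None or score > best[0]:
                best = (score, s35, s10, d1, d2)
    return best

def refine(model, data, pc=0.5):
    """One hard-EM step; returns the new model and the total score under the current model."""
    pwm35, pwm10, gap_sp, gap_tss = model
    c35 = [dict.fromkeys(BASES, pc) for _ in pwm35]
    c10 = [dict.fromkeys(BASES, pc) for _ in pwm10]
    c_sp, c_tss = dict.fromkeys(gap_sp, pc), dict.fromkeys(gap_tss, pc)
    total = 0.0
    for usr, tss in data:
        match = best_match(model, usr, tss)
        if match is None:
            continue
        score, s35, s10, d1, d2 = match
        total += score
        for col, b in zip(c35, usr[s35:]):  # zip stops after the box width
            col[b] += 1
        for col, b in zip(c10, usr[s10:]):
            col[b] += 1
        c_sp[d1] += 1
        c_tss[d2] += 1
    new_model = ([normalise(c) for c in c35], [normalise(c) for c in c10],
                 normalise(c_sp), normalise(c_tss))
    return new_model, total

def fit(model, data, max_iter=20, tol=1e-6):
    """Iterate refinement until the anchored training score stops improving."""
    best_model, best_total = model, -math.inf
    for _ in range(max_iter):
        new_model, total = refine(model, data)  # `total` scores the current `model`
        if total <= best_total + tol:
            break
        best_model, best_total, model = model, total, new_model
    return best_model

def predict_tss(model, usr):
    """Unanchored use: the TSS offset whose best anchored match scores highest."""
    scored = [(m[0], t) for t in range(len(usr) + 1)
              if (m := best_match(model, usr, t)) is not None]
    return max(scored)[1] if scored else None
```

A typical call starts from `(consensus_pwm("TTGACA"), consensus_pwm("TATAAT"), uniform(15, 19), uniform(4, 8))`, fits it on `(usr, tss_offset)` pairs, then applies `predict_tss` to unseen USRs.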

Slide 12: TSS Recognition (% accuracy)
Pattern                          E. coli       B. subtilis
Canonical -35, -10 boxes         37.5 ± 1.4    61.6 ± 1.8
Canonical + distance to GSS      43.3 ± 1.2    61.2 ± 1.7
Guess which promoter boxes are more strongly conserved…
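The slides do not define the ± values. A minimal sketch of the 10 repeats × 10-fold evaluation protocol and of one common summary, mean ± standard error over the 100 fold accuracies; whether the talk reports standard error or standard deviation is an assumption, and the helper names are hypothetical.

```python
import random
import statistics

def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            yield [i for i in idx if i not in test_set], test

def mean_and_sem(values):
    """Mean and standard error of the mean (needs at least two values)."""
    return statistics.mean(values), statistics.stdev(values) / len(values) ** 0.5
```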

Slide 13: Including UP Elements
NNW15NN
  – AT-rich region
NNAAAWWTWTTNNAAANNN
  – Estrem et al. 1998
NNAAAWWTWTTN and A6RNR
  – Gourse et al. 2000
  – Distal–proximal motif
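Read as run-length shorthand, W15 stands for fifteen W's, and A6RNR expands to AAAAAARNR, the sequence shown in the UP element figure on slide 8. A minimal sketch that expands this shorthand and converts the result into a regular expression; treating a trailing number as a repeat count is an assumption about the notation, and the names are hypothetical.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
LETTERS = "".join(IUPAC)
RUN = re.compile(rf"([{LETTERS}])(\d*)")  # a code letter plus an optional repeat count

def expand(pattern: str) -> str:
    """Expand run-length shorthand: 'W15' -> fifteen W's, 'A6RNR' -> 'AAAAAARNR'."""
    if not re.fullmatch(rf"(?:[{LETTERS}]\d*)+", pattern):
        raise ValueError(f"not an IUPAC pattern: {pattern!r}")
    return "".join(letter * int(count) if count else letter
                   for letter, count in RUN.findall(pattern))

def to_regex(pattern: str) -> str:
    return "".join(IUPAC[ch] for ch in expand(pattern))
```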

Slide 14: TSS Recognition (% accuracy)
Pattern                                            E. coli       B. subtilis
Canonical boxes + distance to GSS                  43.3 ± 1.2    61.2 ± 1.7
Canonical + distance to GSS + Estrem UP element    41.4 ± 1.2    62.0 ± 1.7
Canonical + distance to GSS + AT-rich region       47.3 ± 1.2    64.8 ± 1.8

Slide 15: Comparing E. coli and B. subtilis Promoters
[Figure panels: B. subtilis -35 element; B. subtilis -10 element; E. coli -10 element; E. coli -35 element]
E. coli has 7 known sigmas; B. subtilis 18…
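One common way to quantify how strongly each position of a box is conserved is its information content: 2 minus the Shannon entropy (in bits) of the base distribution in that column. A minimal sketch for aligned sites, assuming a uniform background and no small-sample correction; the figure itself may use a different measure, and the function name is hypothetical.

```python
import math

def column_information(sites):
    """Information content (bits) of each column of equal-length aligned sites."""
    ic = []
    for col in zip(*sites):
        n = len(col)
        freqs = [col.count(b) / n for b in "ACGT"]
        entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
        ic.append(2.0 - entropy)  # 2 bits = maximum for a 4-letter alphabet
    return ic
```

A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0, so comparing per-column values across organisms gives a rough measure of which boxes are more strongly conserved.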

Slide 16: Motifs ‘in the Gap’
Extended -10 element
  – Consensus TGTATAAT
  – Strongly implicated in B. subtilis
  – Hypothesised as significant in 20% of E. coli promoters
Extended -16 element
  – Consensus TRTG

Slide 17: TSS Recognition (% accuracy)
Pattern                                            E. coli       B. subtilis
Canonical boxes + distance to GSS                  43.3 ± 1.2    61.2 ± 1.7
Canonical + distance to GSS + TG extended -10      41.6 ± 1.3    62.5 ± 1.8
Canonical + distance to GSS + TRTG extended -10    37.6 ± 1.3    62.6 ± 1.8

Slide 18: The Complete Picture
[Figure: RNA polymerase (σ70, β′, α NTD and CTDs) on the -35 and -10 elements; marked positions -40.5, -52, -62 and -72; UP element / AT-rich region at a variable location]

Slide 19: TSS Recognition (% accuracy)
Pattern                                                             E. coli       B. subtilis
Canonical boxes + distance to GSS                                   43.3 ± 1.2    61.2 ± 1.7
Canonical + distance to GSS + TG extended -10 + AT-rich region      48.3 ± 1.5    68.8 ± 1.6
Canonical + distance to GSS + TRTG extended -10 + AT-rich region    40.5 ± 1.4    71.2 ± 1.7

Slide 20: TSS Recognition (% accuracy), Summary
Model                                      E. coli (TG extended -10)    B. subtilis (TRTG extended -10)
Canonical + distance to GSS                43.3                         61.2
+ AT-rich region                           47.3                         64.8
+ extended -10                             41.6                         62.6
+ extended -10 + AT-rich region            48.3                         71.2

Slide 21: Conclusions
Beagle provides a simple bridge between experiment and computational discovery
  – Is the extended -16 motif really important in E. coli?
  – (Well, not in any general sense)
Fast, robust and flexible
Extensions
  – Combination of model organisms
  – Comparative genomics and regulation

