Introduction to Bioinformatics Thursday, 26 February In this episode I introduce the Research Project, an activity that will occupy most of your bioinformatics-related.


1 Introduction to Bioinformatics. Thursday, 26 February. In this episode I introduce the Research Project, an activity that will occupy most of your bioinformatics-related attention for the rest of the semester. I’ll talk about: WHY a research project; WHAT is the scientific rationale; HOW to get started and continue to make progress; WHEN specific events along the way will take place.

2 WHY a research project? Arguably, the main purpose of this class (or any other class in a liberal arts curriculum) is to help you become independent producers of value, able to define for yourselves important questions to address and then to find a way to address them. There’s no better way to learn how to do this... than to do it. To read more on this topic, go to the course web page and click Course at a Glance, Strategies, Research Project.

3 WHY a research project? But conceiving a project and bringing it to fruition takes time. Four years is often not enough for most people in PhD programs. We’ll therefore take some shortcuts. One is confining the project to computational analysis, even though such analysis can only suggest and predict biological significance. Laboratory confirmation is much slower than computational analysis. You’ll have available a nonstandard computational environment, BioBIKE, that does many of the preliminary tasks for you (e.g. gathering together genomes). You’ll focus on topics where rapid progress is possible and do so in a social environment that facilitates cross-fertilization.

4 WHAT is the scientific rationale? What topics are ripe for progress and within our reach? Consider a genome...

5 TGAGACACATATTTTTGATATTCCAGTTGTTGCAATC GAATGTAAAACATATTTAGATCTTTATAGTTATGGTAC ATTCAAGATCCAACCTTCATTCTAGTGTTTAAAGAGAAC TGATTTGTTTGCAGGGGCAGGAGGCTTTGGTTTGGAGATTG AAATGGCAGGCTTCTCTGTACCTTTATCTGTTGAAATTGATAC C TGGGCTTGTGATACACTACGCTACAACCGCCCTGATTCAACAGT TA TTCAAAATGATATCGGTAACTTTAGTACAGAAAATGACGTTAAGA ATA TCTGCAACTTTAAACCTGATATTATTATTGGCGGGCCTCCATGCCAG GGATT TAGTATTGCTGGGCCAGCCCAAAAAGATCCTAAAGATCCTAGAAATG GTTTATT CATCAACTTTGCACAATGGATAAAATTTCTTGAACCTAAAGCGTTTGTC ATGGAAAA CGTAAAAGGATTGCTATCAAGGAAAAATGCAGAAGGTTTTAAAGTTATAG ATATTATTAAG AAAACATTTGAAGAAAAAATGGCGCACATTGTGCGACATTTTTTAAATGCTGCGGAATATGGC A TTCCGCAAATTAGAGAACGTATTTTTATTGTTGGCAATAAAAAAGGTAAAGTACT AGGTATTCCTAAAAA AACACATTCTCTGCAATTTTTAAATTTAAATAGGTCTCAATTATCGATCTTCGATGAT ATGAGTATTATACCTGCACT AACTTTGTGGGACGCAATATCAGACTTACCAGAACTTAATGCGCGTGAAGGAAGTGAA GAGCAACCCTATCATTTAAAACC TCAAAATACTTATCAGACTTGGGCTAGAAATGGTAGTGCTACGCTTTACAATCATGTTGCAAT GGAACATTCTGACCGTTTAGTAGAA CGTTTCCGGCATATAAAATGGGGTGAATCCAGTTCGGATGTATCTAAAGAACATGGAGCTAGACGACGT AGTGGTAATGGTGAATTATCAAACAAA TCATATGATCAGAATAATCGCCGTTTAAATCCTCATAAACCGTCTCACACTATTGCTGCGTCATTCTATGCTAATTTTG TCCATCCTTTTCAACATCGAAATTTAACAGC CCGTGAAGGAGCTAGAATCCAATCTTTTCCAGATAACTATAGATTTTTTGGAAAAAAAACTGTCGTATCTCATAAACTATTGCATCGA GAAGAAAGATTTGATGAAAAATTTCTTTGTCAATA TAATCAAATCGGTAATGCTGTACCCCCTCTTCTCGCTAAAGTAATTGCACATCATCTTCTAGAGAAATTAGAGTTATGCCAACAACTGATAGAAATCCTCTA GTGCATGGATCAAATCTTGAACAAAAAGAGAATCATCGTA CAAAATACAGAGATACTGAAAGCAGGACTTTCCTTAGAGAAATCAGAACTGAATATGACAAATGGCATAAAGCAAATATGAACCTGGTTGGACCAAAATCAGAAATTACTGACCA AGATGATTCAATTATTACTCAAAGAGTGGAACTTCTCACTAAATATAAA GATTTTTTAGATCAGCAGCATTATGCAGAAAAATTTGATTCAAGATCCAACCTTCATTCTAGTGTTTTAGAGACCATTTATAAAGTAAATCTTTAGACGACTAGACGACGTAGCATAATACGAGTCATAACGGCATATATG GCAGCCTCACTCATTTCTGGGAGACGCTCATAATCCTTACTGAGACGACGGTACTGGTTTA ACCAGCCAAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCTGAAATTCTTGATTAGTACGCCGGATTACCTCAACATGAGCTTGAATCATCAGCCAAACAGAGAGCGCAAATTTATCACCGTCATAGCCGGAATCAACCCAGATGACTTCAACTTTTTCCAGTAATTCTGGACGCTCTTCTAACAGTTCCATCAAAGTATA 
[The sequence continues in the same way, filling the slide.] Information in a genome. What sense can we make of a genome? A sea of letters...

6 [Background: the same genome sequence as on slide 5.] Information in a genome. Actually, we can make a lot of sense.

7 [Background: the same genome sequence as on slide 5.] Information in a genome. We know how to identify protein-encoding genes with a fair bit of confidence. They’re not a mystery.
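The simplest idea behind identifying protein-encoding genes can be sketched as a scan for open reading frames: stretches that begin with a start codon and run, in frame, to a stop codon. The sketch below assumes ATG as the only start codon and the standard stop codons TAA, TAG, and TGA; the names `find_orfs` and `min_codons` are made up for illustration and are not BioBIKE functions. Real gene finders add statistical evidence on top of this, so treat it as a starting point rather than a method.

```python
# A minimal open-reading-frame (ORF) scan. Assumptions: ATG is the only
# start codon, and TAA/TAG/TGA are the stop codons (standard genetic code).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return a list of (start, end, strand) for ATG...stop ORFs.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned ("+" is seq itself, "-" its reverse complement).
    The codon count includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"  # hypothetical toy input
    print(find_orfs(toy, min_codons=5))
```

The length cutoff matters: short ATG...stop stretches occur by chance all over any sequence, so a low `min_codons` will report many spurious candidates.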

8 [Background: the same genome sequence as on slide 5.] Information in a genome + Junk. What’s the rest? It’s been called junk -- just the dough if all you really like are the chocolate chips. (Are there such people?)

9 [Background: the same genome sequence as on slide 5. Label: RNA Polymerase.] Information in a genome + Junk. ...but wait! Genes don’t just sit on the genome. They have to be transcribed. There has to be a signal -- a sequence -- for RNA polymerase to bind to.

10 [Background: the same genome sequence as on slide 5. Label: Ribosome.] Information in a genome + Junk. ...and the RNA has to be translated. Some of the supposed junk must really be ribosome binding sites. But the rest...
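One way to look for a candidate ribosome binding site is to search a short window just upstream of a gene's start codon for something resembling a known motif. The sketch below assumes a bacterial genome and uses AGGAGG (a Shine-Dalgarno-like motif) as the target; the motif choice, the 20-nucleotide window, the mismatch limit, and the function names are all assumptions made for illustration.

```python
# Search upstream of a start codon for a Shine-Dalgarno-like motif.
# Assumptions: bacterial sequence, motif AGGAGG, ungapped matching.


def best_motif_match(region, motif="AGGAGG"):
    """Return (offset, mismatches) of the best ungapped match, or None.

    Ties go to the match closest to the beginning of the region.
    """
    best = None
    for i in range(len(region) - len(motif) + 1):
        window = region[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if best is None or mismatches < best[1]:
            best = (i, mismatches)
    return best


def find_rbs(seq, start, window=20, max_mismatches=1):
    """Look in the `window` nucleotides before `start` for the motif.

    Returns (position_in_seq, mismatches), or None if nothing in the
    window is within `max_mismatches` of the motif.
    """
    upstream_begin = max(0, start - window)
    region = seq[upstream_begin:start].upper()
    hit = best_motif_match(region)
    if hit is None or hit[1] > max_mismatches:
        return None
    return (upstream_begin + hit[0], hit[1])


if __name__ == "__main__":
    toy = "TTTT" + "AGGAGG" + "TTTTTT" + "ATGAAA"  # hypothetical toy input
    print(find_rbs(toy, toy.index("ATG")))
```

Allowing a mismatch or two is a deliberate choice here: real binding sites rarely match a consensus exactly, but each extra mismatch allowed also lets in more chance hits.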

11 [Background: the same genome sequence as on slide 5.] Information in a genome + Junk. ...well, there are also the signals for the termination of transcription.

12 [Background: the same genome sequence as on slide 5. Diagram labeled "Termination signal": an RNA sequence drawn from its 5' end, including runs of U residues; the letter layout is not recoverable from the transcript.] Information in a genome + Junk. ...well, there are also the signals for the termination of transcription.
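A termination signal of the kind drawn here can be searched for computationally. The sketch below assumes the signal is an intrinsic (factor-independent) bacterial terminator, that is, a short stem-loop followed by a run of U's in the RNA, which appear as T's in the DNA; the slide itself does not say which kind it shows. The stem length, loop sizes, window, and function names are assumptions chosen for illustration, and only perfectly paired stems are considered.

```python
import re

# Flag runs of T that are preceded by a possible stem-loop.
# Assumption: intrinsic terminators look like hairpin + T-run in the DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(region, stem=5, min_loop=3, max_loop=8):
    """True if region contains stem + loop + reverse-complemented stem."""
    for i in range(len(region) - 2 * stem - min_loop + 1):
        target = reverse_complement(region[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # where the right half of the stem would begin
            if region[j:j + stem] == target:
                return True
    return False


def candidate_terminators(seq, window=30, min_t_run=4):
    """Return start positions of T-runs that have a hairpin just upstream."""
    seq = seq.upper()
    hits = []
    for match in re.finditer("T{%d,}" % min_t_run, seq):
        region = seq[max(0, match.start() - window):match.start()]
        if has_hairpin(region):
            hits.append(match.start())
    return hits


if __name__ == "__main__":
    # Hypothetical toy input: stem GCCGC, loop AAAA, stem GCGGC, then TTTTT.
    toy = "AA" + "GCCGC" + "AAAA" + "GCGGC" + "TTTTT" + "AA"
    print(candidate_terminators(toy))
```

This scans one strand only; genes on the opposite strand would need the same scan run on the reverse complement.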

13 [Background: the same genome sequence as on slide 5.] Information in a genome + Junk. But these are small sequences; the rest must surely be... Junk?

14 [Background: the same genome sequence as on slide 5.] Information in a genome + Junk. Maybe not. In the past 10 years there’s been a revolution in our understanding of the human genome. It used to be that 97% of it was junk. Now there’s far less, owing to our appreciation of small, noncoding RNA.

15 [Genome sequence as on slide 14] Information in a genome. Junk: the part of DNA we don't understand. Of course, the amount of junk in genomes hasn’t changed. What’s changed is our understanding. A plausible definition of “junk DNA” is... the part of DNA we don't understand.

16 [Genome sequence as on slide 14] Information in a genome. Junk: the part of DNA we don't understand. How do we come to understand the significance of the parts of DNA for which we have no clues? How do we know where to look for signals? Consider this similar situation...

17 [Genome sequence as on slide 14] Information in a genome. We are visited by aliens... but are they friendly?

18 GREETINGS PRIMITIVE EARTHLINGS! WE COME IN PEACE. WE INVITE YOU TO LIVE IN PEACE WITHIN A GALACTIC COMMUNITY… TESTING, TESTING 1 2 3 (hey, is this thing on?) Information in a message Whew! That’s a relief! Wait! They probably can’t speak English (and no, they haven’t learned through reruns of I Love Lucy).

19 VYNNWGZVT LYGXGWGDN NUYWHPGZVT! QN 2SXN GZ LNU2N. QN GZDGWN RSF WS PGDN GZ LNU2N QGWHGZ U VUPU2WG2 2SXXFZGWR… WNTWGZV, WNTWGZV K J O (gkq, xt rgxt rgxjb dj?) Information in a message ...so this is what we might receive as the first intergalactic message. You don’t speak alien, but are there any clues here? Is it just intergalactic noise? Look carefully...

20 VYNNWGZVT LYGXGWGDN NUYWHPGZVT! QN 2SXN GZ LNU2N. QN GZDGWN RSF WS PGDN GZ LNU2N QGWHGZ U VUPU2WG2 2SXXFZGWR… WNTWGZV, WNTWGZV K J O (gkq, xt rgxt rgxjb dj?) Information in a message There’s a suspicious repetition of QN. That’s not likely to have arisen by chance (and you should know how to calculate the probability that it is!)
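One rough way to put a number on "not likely to have arisen by chance" is a birthday-problem calculation. The sketch below assumes each letter is drawn independently and uniformly from a 26-letter alphabet, which is an assumption of this example and certainly not true of real languages. The function name `p_any_repeat` is illustrative, not something from the course.

```python
from math import prod


def p_any_repeat(num_words: int, alphabet_size: int = 26, word_length: int = 2) -> float:
    """Probability that at least two of `num_words` random words are identical.

    Assumes every letter is chosen independently and uniformly at random
    (an assumption made only for this sketch).
    """
    possible = alphabet_size ** word_length  # number of distinct words, e.g. 26**2 = 676
    # Probability that all words differ: the i-th word must avoid the i earlier ones.
    p_all_distinct = prod((possible - i) / possible for i in range(num_words))
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Chance that two random two-letter words happen to be the same word.
    print(p_any_repeat(2))
    # Chance of any coincidence among five random two-letter words.
    print(p_any_repeat(5))
```

With two words the result reduces to 1/676, the chance that the second word copies the first. The probability grows as more words are compared, so the count of two-letter words in the message matters when judging how suspicious a repeat is.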

21 VYNNWGZVT LYGXGWGDN NUYWHPGZVT! QN 2SXN GZ LNU2N. QN GZDGWN RSF WS PGDN GZ LNU2N QGWHGZ U VUPU2WG2 2SXXFZGWR… WNTWGZV, WNTWGZV K J O (gkq, xt rgxt rgxjb dj?) Information in a message And there are other repetitions, surely indicative of meaning. See any others?

22 VYNNWGZVT LYGXGWGDN NUYWHPGZVT! QN 2SXN GZ LNU2N. QN GZDGWN RSF WS PGDN GZ LNU2N QGWHGZ U VUPU2WG2 2SXXFZGWR… WNTWGZV, WNTWGZV K J O (gkq, xt rgxt rgxjb dj?) Information in a message Sure are. There would be in any language that communicates meaning. I made this example easy, sliding in a symbol that may not be in their language: spaces.

23 VYNNWGZVTOLYGXGWGD XSFNENUYWHPGZVTEIMJO FPPFWAOXY2LZMKQDJFK HARQNG2SXNGZNLNU2NXL 12VEA2XJFNJBXMUP2TSB QNNGZDGWNKRSFKWSVHC DNTGZKLNU2NFQGWHZGU VUPU2WG2F2SXXFZGWR3A XTWVZZEYDWNVYDHW VHWNTWGZVNWNTWGZV2 PCMIRZX KPJSOH22MNWU (gkq, xt rgxt rgxjb dj?) Information in a message Is it still possible to pick out repetitions?

24 VYNNWGZVTOLYGXGWGD XSFNENUYWHPGZVTEIMJO FPPFWAOXY2LZMKQDJFK HARQNG2SXNGZNLNU2NXL 12VEA2XJFNJBXMUP2TSB QNNGZDGWNKRSFKWSVHC DNTGZKLNU2NFQGWHZGU VUPU2WG2F2SXXFZGWR3A XTWVZZEYDWNVYDHW VHWNTWGZVNWNTWGZV2 PCMIRZX KPJSOH22MNWU (gkq, xt rgxt rgxjb dj?) Information in a message Not a problem... (Well, now you might want to automate the process)
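Automating the search could look something like the sketch below, which counts every substring of a fixed length k (a "k-mer") and keeps those seen more than once. This is a minimal illustration in Python rather than BioBIKE, and the function name `repeated_kmers` is made up for the example. The test fragment is one row of the spaceless message above.

```python
from collections import Counter


def repeated_kmers(text: str, k: int, min_count: int = 2) -> dict[str, int]:
    """Return every length-k substring that occurs at least `min_count` times."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # One row of the message on the slide, with no spaces to help us.
    fragment = "VHWNTWGZVNWNTWGZV2"
    for kmer, n in repeated_kmers(fragment, k=7).items():
        print(kmer, n)
```

Choosing k is a judgment call: small values report many chance repeats, while large values miss short words. Trying several values of k and comparing against the chance expectation is a reasonable way to proceed.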

25 [Genome sequence as on slide 14] Information in a genome. This is the problem we face. There are messages in DNA, sent not by aliens (probably) but by a few billion years of evolution. Detecting repetitions can provide valuable clues.

26 Repeated Sequences Tandem repeats Microsatellites: CACACACACA… Repeats come in many forms. One form is tandem repeats. Tandem repeats are famous for their role in forensic identification.

27 Repeated Sequences Tandem repeats Microsatellites: CACACACACA… Some bacterial genomes are full of larger tandem repeats, their origin and function not understood. 7-mer repeats: AAAATTCAAAATTCAAAATTCAAAATTC…
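A tandem repeat is a unit followed immediately by copies of itself, so a regular expression with a backreference can find one. The sketch below is only an illustration: the function name, the fixed unit length, and the minimum copy number are choices made for this example, and real tandem-repeat finders also tolerate mismatches, which this one does not.

```python
import re


def find_tandem_repeats(seq: str, unit_len: int, min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Find exact tandem repeats of a unit of length `unit_len`.

    Returns (start, unit, copies) for each non-overlapping run found.
    """
    # ([ACGT]{k}) captures a candidate unit; \1{n,} demands n or more extra copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("CACACACACA", unit_len=2))
    print(find_tandem_repeats("GG" + "AAAATTC" * 4 + "GG", unit_len=7))
```

Because the unit length is fixed per call, scanning a genome for unknown repeats would mean looping over a range of unit lengths.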

28 Repeated Sequences Tandem repeats Dispersed repeats There are also dispersed repeats (the repeat units not next to each other). The most famous of these are transposons...

29 Repeated Sequences Tandem repeats Dispersed repeats - Long sequence repeats transposase ~1000 nt...consisting of a gene encoding an enzyme (transposase) that recognizes the ends of the DNA that contains it and moves or copies it to a different location.

30 Repeated Sequences Tandem repeats Dispersed repeats - Long sequence repeats transposase ~1000 nt - Shorter sequence repeats transposase ~1000 nt (minitransposons; MITES) Smaller sequences are parasitic on transposons, having lost the gene but retained the ends recognized by the transposase.

31 Repeated Sequences Tandem repeats Dispersed repeats - Long sequence repeats transposase ~1000 nt - Shorter sequence repeats transposase ~1000 nt (minitransposons; MITES) - Short sequence repeats TTATCCACA (e.g. DnaA-binding sites) Of course you’re already very familiar with other short repeats, serving as binding sites for regulatory proteins. Remember...?
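Because a binding site can sit on either strand, a search for a short motif such as TTATCCACA usually looks for the motif and for its reverse complement. The following is a minimal sketch of that idea. The function names are illustrative, and exact matching is an assumption, since real binding sites often vary at a few positions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (position, strand) for every exact match of `motif` on either strand."""
    targets = [(motif, "+")]
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be reported twice
        targets.append((rc, "-"))
    hits = []
    for target, strand in targets:
        pos = seq.find(target)
        while pos != -1:
            hits.append((pos, strand))
            pos = seq.find(target, pos + 1)  # step by 1 so overlapping sites are found
    return sorted(hits)


if __name__ == "__main__":
    motif = "TTATCCACA"
    seq = "GG" + motif + "GG" + reverse_complement(motif) + "CC"
    print(find_sites(seq, motif))
```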

32 Repeated Sequences Tandem repeats Dispersed repeats - Long sequence repeats transposase ~1000 nt - Shorter sequence repeats transposase ~1000 nt (minitransposons; MITES) - Short sequence repeats (Highly Iterated Palindromes; HIP) GCGATCGC - Short sequence repeats (e.g. DnaA-binding sites) TTATCCACA There are also sequences that are amongst the most common in all of bacteria. Function? Unknown.
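In DNA, "palindrome" means that a sequence reads the same as its reverse complement, not that the letters mirror each other. A short check, offered here as a sketch with illustrative function names, confirms that GCGATCGC qualifies and the DnaA-binding site does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("GCGATCGC", "TTATCCACA"):
        print(site, is_dna_palindrome(site))
```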

33 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats A third class of repeated sequences are not quite tandem and not quite dispersed. Better to explain through an example...

34 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats TTTGATTATTGCCTGTGCGGCAGTGAACTCAGGGGACTGGCGAACAATGTCTTTCATGATTT TCTAAGCTGCCTGTGCGGCAGTGAACGAAAAGGTAAGATGGGCAAGCTTCTAGTAGTTTTTC TAAGCTGCCTGTGCGGCAGTGAACATTATCTGAATGGCATTTTCTTTGGCGCAGATTTTCTA AGCTGCCTGTGCGGCAGTGAACTCGCCATTCCGTGAACCTGAGCGCGTTCGCGATTTCTAAG CTGCCTGTGCGGCAGTGAACATATTCTCGAGCGATAGCAATAGCCATTCCACTTTCTAAGCT GCCTGTGCGGCAGTGAACTCGGTCAAACAAATTTAGGCGACGATTTAACATTTCTAAGCTGC CTGTGCGGCAGTGAACAAAAAGAATTTGGGATTAAAGTTACCCATCAGTTTCTAAGCTGCCT GTGCGGCAGTGAACTCAATGCCTGAATCTCTGGCGTGATAGCTGCGGTTTCTAAGCTGCCTG Do you see any repetition here? (Look for a bunch of T’s)

35 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats [Sequence as on slide 34] Is that all there is? Just T’s?

36 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats [Sequence as on slide 34] No, the repetition extends far beyond the easily noticeable T’s... but then it stops! Let me redraw this, breaking the lines just before the red repeats...

37 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
TTTGATTATTGCCTGTGCGGCAGTGAACTCAGGGGACTGGCGAACAATGTCTTTCATGAT
TTTCTAAGCTGCCTGTGCGGCAGTGAACGAAAAGGTAAGATGGGCAAGCTTCTAGTAGTT
TTTCTAAGCTGCCTGTGCGGCAGTGAACATTATCTGAATGGCATTTTCTTTGGCGCAGAT
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGCCATTCCGTGAACCTGAGCGCGTTCGCGA
TTTCTAAGCTGCCTGTGCGGCAGTGAACATATTCTCGAGCGATAGCAATAGCCATTCCAC
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGGTCAAACAAATTTAGGCGACGATTTAACA
TTTCTAAGCTGCCTGTGCGGCAGTGAACAAAAAGAATTTGGGATTAAAGTTACCCATCAG
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCAATGCCTGAATCTCTGGCGTGATAGCTGCGG
TTTCTAAGCTGCCTG
Notice that the repeat units are spaced by non-repeat units.
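Once a repeat unit has been identified, the non-repeat units between copies (usually called spacers) can be pulled out mechanically. The sketch below assumes the repeat unit is already known and matches exactly. The unit TTTCTAAGCTGCCTGTGCGGCAGTGAAC is read off the sequence on the slide, and the function name `extract_spacers` is made up for this example.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    starts = []
    pos = array.find(repeat)
    while pos != -1:
        starts.append(pos)
        pos = array.find(repeat, pos + len(repeat))  # resume after this copy
    # Each spacer runs from the end of one repeat to the start of the next.
    return [array[a + len(repeat):b] for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    repeat = "TTTCTAAGCTGCCTGTGCGGCAGTGAAC"
    # Two spacers copied from the slide, placed between three repeat copies.
    spacers = ["GAAAAGGTAAGATGGGCAAGCTTCTAGTAGTT", "ATTATCTGAATGGCATTTTCTTTGGCGCAGAT"]
    array = repeat + spacers[0] + repeat + spacers[1] + repeat
    for spacer in extract_spacers(array, repeat):
        print(len(spacer), spacer)
```

Near-constant spacer lengths are what put the "regularly" in the name, so printing the lengths is a quick sanity check on a candidate array.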

38 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats [Sequence as on slide 37] Also notice the palindrome within the repeat unit.
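The transcript does not say which bases of the repeat unit form the palindrome, so the sketch below simply searches for one. It assumes the palindrome is an inverted repeat: a short stretch followed, after a small gap, by its own reverse complement. The stem length and maximum gap are arbitrary choices for illustration, and the function names are not from the course.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem_len: int = 5, max_loop: int = 10) -> list[tuple[int, str, int]]:
    """Return (start, left_arm, loop_length) for each inverted repeat found."""
    hits = []
    for i in range(len(seq) - 2 * stem_len + 1):
        left = seq[i:i + stem_len]
        target = reverse_complement(left)  # what the right arm must look like
        for loop in range(max_loop + 1):
            j = i + stem_len + loop
            if seq[j:j + stem_len] == target:
                hits.append((i, left, loop))
    return hits


if __name__ == "__main__":
    repeat_unit = "TTTCTAAGCTGCCTGTGCGGCAGTGAAC"
    for start, arm, loop in find_inverted_repeats(repeat_unit):
        print(start, arm, loop)
```

On the repeat unit from the slide, one hit is the arm CTGCC paired with GGCAG five bases downstream. Whether that is the palindrome highlighted on the original slide is a guess.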

39 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats [Sequence as on slide 37] So these things are Clustered, Regularly Interspaced, Short, Palindromic, and Repeats... CRISPRs!

40 Repeated Sequences Tandem repeats Dispersed repeats CRISPR Clustered Regularly Interspaced Short Palindromic Repeats [Sequence as on slide 37] CRISPRs are used by bacteria as an immune system (which I don’t have time to talk about) and by humans for targeted gene modification (which I really don’t have time to talk about).

41 WHAT is the scientific rationale? What topics are ripe for progress and within our reach? I’ll tell you a few specific scientific stories in the coming weeks about repeated sequences, but for now, let’s turn to...

42 HOW to get started? Decide whether you have a preference as to group research topic To do this, go to: Course web site, List of Topics, Research Project, Groups and investigate the links to the four groups. If you have a preference let me know prior to March 8.

43 HOW to get started? Decide whether you have a preference as to group research topic Once groups are assigned (March 9), go to your group page and look at the articles listed on it. Investigate your group topic

44 HOW to get started? Decide whether you have a preference as to group research topic. Investigate your group topic. Home in on a research topic:
- Use the articles to build a picture of what is known and what is not known in your group topic.
- Find other articles that explore what is known and unknown.
- Find a question at the edge of what is known and claim it as yours.
- Explore with your fingers. At each stage, try out ideas on BioBIKE. Do at least as much as you read.

45 HOW to get started? Decide whether you have a preference as to group research topic. Investigate your group topic. Home in on a research topic. Build a bibliography of useful articles.

46 HOW to get started? Decide whether you have a preference as to group research topic. Investigate your group topic. Home in on a research topic. Build a bibliography of useful articles. Find a key article closest to your idea:
- Understand its key experiment thoroughly.
- Replicate the experiment, if feasible.
- Use the experiment as a guide to your own experiments.

47 HOW to get started? At each stage, online advice and examples are available on the Research Project topic page.

48 WHEN to get started? (This one’s easy...) Now (if you want a say in the group topic, otherwise...) March 9 Your greatest strength is your subconscious. Give it lots of time and lots of directed sleep.

49 WHEN are key events? There are several events along the way, including weekly meetings with your research group (I’ll be there as well)

50 WHEN are key events? You’ll present the fruits of your labor to your peers at the end of the semester (during finals week)

51 WHEN are key events? You’ll also submit your work in the form of a written report, after having gotten the benefit of criticism on your presentation.

52 Research Project It’s a lot of work. Here’s what your predecessors said about it (anonymous year-end survey)...

53 Words of your predecessors The project certainly did require me to think on my own and to create and solve my own question. This project definitely opened my eyes to what science has still left to research and discover....by far the best part. Although I don't think bioinformatics is for me, the course itself has sparked my interest in research. I liked how the class was set up to prepare us for the process of scientific discovery. The projects were great. I feel that it was a lot of work but at the end I felt proficient in what I was doing. …it was an awesome experience researching and finding answers on your own. The project was a good way in giving students a taste of what actual bioinformatics is like. I was a little late in terms of being inspired. Looking back through all the work I did for the project, I feel that I really like the whole self-directed research and the process in doing something for a larger goal.

54 More words of your predecessors I feel as if the ideas of the individual projects of the course have the same problems of independence as the rest of the course in that there isn't enough base to work from in order to gain independence….For sure we were treated as if we were experts in the field, but wasn't it not clear that at the end of the project we were not? This project pushed me the most out of everything in this class,... I actually enjoyed having a self-directed research project, where I set up my own set of questions and answered them to the best of my knowledge and ability. Learned a ton, screamed a ton. Got a ton out of it. one of if not the most challenging project I have ever done. I found this to be a learning experience for me and what good leadership is- it's not about doing the work yourself but helping others do it in a way that will truly help them. I also learned things from my group members and that encouraged me to revise my strategy on how I am tackling some problems. Overall, it was extremely valuable to my development as an independent researcher.

