
Gene Finding. Biological Background The Central Dogma Transcription RNA Translation Protein DNA.


1 Gene Finding

2 Biological Background

3 The Central Dogma: DNA → (transcription) → RNA → (translation) → protein

4 Background
*Essential Cell Biology, p. 268
Non-coding regions → gene regulation
- Vicinity of the TSS (transcription start site): direct interactions with the Pol II complex
- Larger vicinity: indirect interactions (chromatin remodelling)

5

6 The Genetic Code (codon table indexed by first, second, and third letter)

7 tRNA – Responsible for Translation (adapted from Genetic Analysis V, p. 388)

8 tRNA – Responsible for Translation (adapted from Genetic Analysis V, p. 388)

9 Frame Shifts
- Code triplets ("codons") do not overlap
- → 3 × 2 = 6 possible ways of reading, depending on the strand and the relative position where reading starts
- This is not just our concern when looking for genes; it is also the cell's concern in terms of mutations:
  Original: THE FAT CAT ATE THE BIG RAT
  Delete C: THE FAT ATA TET HEB IGR AT
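As an illustration of the six reading frames and of the frame-shift example, here is a minimal Python sketch. The function names (`reverse_complement`, `six_frames`, `triplets`) are made up for this example and do not come from the slides; the input is assumed to be an uppercase string.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(dna))


def six_frames(dna: str) -> dict:
    """Split a sequence into complete codons for each of the 3 x 2 reading frames.

    Keys are (strand, offset); incomplete trailing codons are dropped.
    """
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = [
                seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)
            ]
    return frames


def triplets(text: str) -> list:
    """Cut a string into consecutive triplets, keeping an incomplete tail."""
    return [text[i:i + 3] for i in range(0, len(text), 3)]


# The slide's frame-shift example: delete the first C and re-read in triplets.
original = "THEFATCATATETHEBIGRAT"
cut = original.index("C")
mutated = original[:cut] + original[cut + 1:]
print(" ".join(triplets(original)))
print(" ".join(triplets(mutated)))
```

Every triplet downstream of the deletion changes, which is why a single-base deletion is so much more disruptive than a single-base substitution.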

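10 Prokaryotic Gene Finding
- No nucleus
- Most DNA is coding (e.g. 70% in H. influenzae)
- Each gene is one continuous DNA sequence (no introns)
- Pol I – rRNA, Pol II – mRNA, Pol III – tRNA (these three polymerases are eukaryotic; prokaryotes transcribe all RNA classes with a single RNA polymerase)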

11 Detecting ORFs
Simple idea:
- If no gene is encoded, the expected frequency of STOP codons is 3/64, i.e. about one in every 21 codons
- ORF (open reading frame): a sequence of codons with no STOP codon
Simple algorithm:
1. Scan until you find a stop codon, in all reading frames.
2. Scan back to find a start codon.
3. If it is long enough, report this ORF as a putative gene.
Cons:
- Cannot detect short genes
- High false-positive rate (E. coli has 6500 ORFs but only 1100 genes)
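The simple algorithm above can be sketched in Python as follows. This is only an illustration under some assumptions that the slide does not state: the start codon is taken to be ATG only, "scan back" is interpreted as choosing the most upstream ATG since the previous stop, and the threshold `min_codons` is an arbitrary parameter. The names `find_orfs` and `reverse_complement` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative start codons are ignored


def reverse_complement(dna: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(dna))


def find_orfs(dna: str, min_codons: int = 100) -> list:
    """Report putative genes as (strand, start, end, n_codons).

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned (the reverse strand is scanned as its reverse complement).
    n_codons counts the start codon but not the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            first_start = None  # most upstream ATG since the last stop
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon in STOP_CODONS:
                    if first_start is not None:
                        n_codons = (i - first_start) // 3
                        if n_codons >= min_codons:
                            orfs.append((strand, first_start, i + 3, n_codons))
                    first_start = None
                elif codon == START_CODON and first_start is None:
                    first_start = i
    return orfs


# Toy usage with a very low threshold so the short example is reported.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))
```

Raising `min_codons` trades false positives for missed short genes, which is exactly the weakness listed under "Cons".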

12 Coding vs. Non-coding Regions: Codon Frequencies
- Codon usage in coding regions is different
- Leucine, alanine, and tryptophan are encoded by 6, 4, and 1 codons respectively
- → Expect to see a ratio of 6:4:1 in random sequence
- In proteins they appear in a 6.9:6.5:1 ratio
- Another example: A or T appears in 90% of cases as the last letter of a codon in protein-coding regions
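The 6:4:1 figure, and the 3/64 stop-codon frequency used for ORF detection, can be checked by counting codons in the standard genetic code. The sketch below builds the table from the usual compact encoding (bases ordered T, C, A, G; `*` marks a stop codon); the variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())

print(degeneracy["L"], degeneracy["A"], degeneracy["W"])  # Leu, Ala, Trp
print(degeneracy["*"], "stop codons out of", len(CODON_TABLE))
```

Under a uniform random sequence, each amino acid is expected in proportion to its codon count, so a deviation such as 6.9:6.5:1 is evidence of selection on codon usage in real proteins.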

13 Nucleotide MM (Markov Model) for Gene Detection

14

15 2nd-Order MM
Idea: extend the model to capture codons
Results: poor; codons overlap in this model

16 MM over Codons
Idea: transform the sequence into codons, then use a 1st-order MM
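A first-order Markov model over codons might look roughly like the sketch below. The slides do not give training details, so the add-one pseudocount, the function names, and the assumption that training sequences are in-frame uppercase A/C/G/T strings are all choices made for this illustration.

```python
import math
from collections import defaultdict
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def to_codons(seq: str, offset: int = 0) -> list:
    """Cut a sequence into complete, non-overlapping codons."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def train_codon_mm(sequences, pseudocount: float = 1.0) -> dict:
    """Estimate P(next codon | previous codon) from in-frame sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        codons = to_codons(seq)
        for prev, nxt in zip(codons, codons[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in CODONS:
        total = sum(counts[prev].values()) + pseudocount * len(CODONS)
        model[prev] = {
            nxt: (counts[prev][nxt] + pseudocount) / total for nxt in CODONS
        }
    return model


def log_likelihood(model: dict, seq: str, offset: int = 0) -> float:
    """Sum of log transition probabilities along the codon chain."""
    codons = to_codons(seq, offset)
    return sum(math.log(model[p][n]) for p, n in zip(codons, codons[1:]))
```

In practice one would train one such model on known coding sequences and another on non-coding sequence, then compare the two log-likelihoods for a candidate ORF. Because the states are whole codons, the overlap problem of the 2nd-order nucleotide model does not arise.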

17 Why not use codon frequencies directly? “Codon Preferences” program:

18 "Codon Preferences" program
- Uses a window of 25 codons around each point
- Score:
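The program's actual score is not preserved in this transcript, so the sketch below does not reproduce it. As a stand-in, it shows the general idea of a sliding window of 25 codons, using an average log-odds of coding versus background codon frequencies. The function name, the two frequency tables, and the log-odds form are assumptions made purely for illustration.

```python
import math


def window_scores(codons, coding_freq, background_freq, window=25):
    """Average per-codon log-odds in a window centred on each codon.

    coding_freq and background_freq map codon -> frequency (hypothetical
    inputs). Windows are truncated at the ends of the sequence.
    """
    half = window // 2
    log_odds = [math.log(coding_freq[c] / background_freq[c]) for c in codons]
    scores = []
    for i in range(len(codons)):
        lo, hi = max(0, i - half), min(len(codons), i + half + 1)
        scores.append(sum(log_odds[lo:hi]) / (hi - lo))
    return scores
```

Positive scores mark windows whose codon usage looks more like the coding table than the background; plotting the score along each reading frame gives a profile in which coding stretches stand out.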

19 Using the Promoter's Signal
- We are still far from perfect…
- → Idea: try to detect signals in the promoter regions, to help discriminate real genes among ORFs
- Prokaryotes: around −35 from the TSS: TTGACA; around −10 from the TSS: TATAAT ("TATA box" signal)
- No single promoter has the exact consensus
- Nearly all promoters have 2–3 of the conserved bases in TAxyzT
- 80–90% have all 3
- In 50%, xyz = TAA
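Since no promoter matches the consensus exactly, a detector has to tolerate mismatches. The sketch below scans for near-matches to a consensus hexamer and counts how many of the three conserved TAxyzT positions a hexamer has. The function names and the mismatch threshold are hypothetical; real promoter finders typically use position weight matrices rather than plain mismatch counts.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_hits(seq: str, consensus: str, max_mismatches: int = 1) -> list:
    """Return (position, hexamer) for every window close to the consensus."""
    k = len(consensus)
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], consensus) <= max_mismatches
    ]


def conserved_taxyzt_matches(hexamer: str) -> int:
    """Count matches at the conserved positions of TAxyzT (T, A, and final T)."""
    return sum(hexamer[i] == base for i, base in ((0, "T"), (1, "A"), (5, "T")))


# Toy upstream region with an exact -35 box and an imperfect -10 box.
upstream = "GG" + "TTGACA" + "G" * 17 + "TATGAT" + "GG"
print(consensus_hits(upstream, "TTGACA", max_mismatches=0))
print(consensus_hits(upstream, "TATAAT", max_mismatches=1))
```

A candidate ORF whose upstream region contains both boxes at a plausible spacing is more likely to be a real gene than one with no such signal.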

20

21

22

23 Summary So Far
- We have seen the problems in trying to find genes in a genome-wide scan, and that is only in prokaryotes!
- The bottom line is that the problem is not really solved, but most research in gene finding focuses on eukaryotes, where the main interest lies…
- Next lecture: much more sophisticated models, to handle the much more complex situation in eukaryotes in general, and human in particular

