Gene, Proteins, and Genetic Code. Protein Synthesis in a Cell.


1 Gene, Proteins, and Genetic Code

2 Protein Synthesis in a Cell

3 A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
A protein sequence may have from a few hundred to several thousand amino acids.

4 Protein synthesis

5 Genetic code
..ATTCACAGTGGA..
..ATT CAC AGT GGA..
   I   H   S   G
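The codon-by-codon reading on this slide can be sketched in Python. This is a minimal illustration, assuming the standard genetic code and DNA (not RNA) input; the names `CODON_TABLE` and `translate` are made up for this example and do not come from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated
# in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string codon by codon, starting at offset `frame`.

    Codons containing anything other than A, C, G, T become 'X'.
    A trailing incomplete codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATTCACAGTGGA"))  # the slide's example: IHSG
```

Passing `frame=1` or `frame=2` reads the same strand in the other two reading frames.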

6 Notes on translation Three reading frames (per strand). The third base of a codon is often not important. Translation reads 5’ -> 3’. Start and end (stop) codons. Open Reading Frame (ORF): each gene is an ORF, but not all ORFs are genes.
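The notes above (three reading frames, start and stop codons, ORFs) can be made concrete with a small ORF scanner. This is only a sketch under stated assumptions: ATG is treated as the only start codon, the stop codons are those of the standard genetic code, and only the given strand is scanned (the reverse complement would supply three more frames). The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
START = "ATG"                  # assumption: ATG is the only start codon
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(dna, min_codons=1):
    """Return (frame, start, end) for every ORF on the given strand.

    `start` is the index of the first base of the start codon and `end`
    is one past the last base of the stop codon. `min_codons` counts the
    codons before the stop codon, including the start codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ORF
    return orfs


print(find_orfs("CCATGAAATGACC"))  # [(2, 2, 11)], i.e. ATGAAATGA in frame 2
```

Raising `min_codons` filters out short ORFs that are likely to arise by chance, at the cost of missing genuinely short genes.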

7 The Central Dogma of Molecular Biology DNA -> RNA -> Protein (DNA to RNA: transcription; RNA to protein: translation; DNA to DNA: replication). Genotype -> phenotype.

8 Exception – retroviruses RNA -> DNA (reverse transcription), in addition to DNA -> RNA -> Protein (transcription, translation) and DNA replication. Genotype -> phenotype.

9 Protein Phenotype DNA (Genotype) Biology

10 Genes One gene encodes one protein (or sometimes an RNA). Like a program, it starts with a start codon (e.g. ATG); then each group of three nucleotides (a codon) codes for one amino acid, and a stop codon (e.g. TGA) signifies the end of the gene. Genes are dense in prokaryotes and sparse in eukaryotes. In the middle of a eukaryotic gene there are introns, which are spliced out (as junk) after transcription; the parts that are kept are called exons. Locating genes and their structure in a genome is the task of gene finding.

11 Gene-related diseases Hemophilia: gene on the X chromosome. Sickle-cell anemia: single-nucleotide mutation in the first exon of the beta-globin gene (removes a restriction enzyme cutting site); 1 in 12 African Americans are carriers (homozygotes have the disease). BRCA1 gene (chr. 17q): responsible for about half of inherited breast cancer (10% of breast cancer). Fragile X syndrome (intellectual disability): 1 in 1,250 males, 1 in 2,500 females (dominant, but females have a partially expressed good copy of the gene); FMR-1 gene: more than 200 tri-nucleotide repeats cause the disease. p53 gene: chr. 17p, tumor suppressor protein.

12 Gene Prediction and Annotation Prokaryotes: 1. Start/stop codon (ORF) 2. Promoters 3. Content 4. Sequence similarity

14 Start Codon Limitations of ORF-based prediction: short genes may be missed; it is not known which start codon to use; ORFs in different reading frames may overlap.

15 Promoters
5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3'
       -35            -10       Gene to be transcribed

-10 (Pribnow box):  T    A    T    A    A    T
                    77%  76%  60%  61%  56%  82%
-35:                T    T    G    A    C    A
                    69%  79%  61%  56%  54%  54%

In prokaryotes, the promoter consists of two short sequences at the -10 and -35 positions upstream of the gene, that is, prior to the gene in the direction of transcription. The sequence at -10 is called the Pribnow box and usually consists of the six nucleotides TATAAT. The Pribnow box is absolutely essential to start transcription in prokaryotes. The other sequence, at -35, usually consists of the six nucleotides TTGACA. Its presence allows a very high transcription rate. These rules are only approximately correct.

16 Scoring a 6-mer as Pribnow box We need a “score function” to measure the likelihood that a 6-mer is a Pribnow box.

17 An example function for Pribnow box fitness evaluation log()
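The slide's own scoring formula did not survive in the transcript, so the following is not that function. It is one plausible log-based score, sketched under loud assumptions: the per-position consensus frequencies are taken from the -10 table on the Promoters slide, the remaining probability at each position is split equally among the three non-consensus bases, and the background base composition is uniform (0.25). The names `PRIBNOW`, `pribnow_score` and `best_pribnow` are hypothetical.

```python
import math

# Consensus base and its frequency at each of the six -10 positions,
# as listed on the Promoters slide.
PRIBNOW = [("T", 0.77), ("A", 0.76), ("T", 0.60),
           ("A", 0.61), ("A", 0.56), ("T", 0.82)]
BACKGROUND = 0.25  # assumption: all four bases equally likely in background


def pribnow_score(hexamer):
    """Log-odds score of a 6-mer against the -10 consensus (higher = better)."""
    if len(hexamer) != 6:
        raise ValueError("expected a 6-mer")
    score = 0.0
    for base, (consensus, freq) in zip(hexamer.upper(), PRIBNOW):
        # Assumption: the slide gives only the consensus frequency, so the
        # other three bases share the remaining probability equally.
        p = freq if base == consensus else (1.0 - freq) / 3.0
        score += math.log(p / BACKGROUND)
    return score


def best_pribnow(seq):
    """Return (score, position) of the best-scoring 6-mer window in seq."""
    return max((pribnow_score(seq[i:i + 6]), i) for i in range(len(seq) - 5))


print(pribnow_score("TATAAT") > pribnow_score("GCGCGC"))  # True
```

A position-specific table with observed frequencies for all four bases would replace the equal-split assumption if such data were available.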

18 Content I – codon bias A codon XYZ occurs with different frequencies in coding regions and non-coding regions. In coding regions, different amino acids have different frequencies, and different codons for the same amino acid have different frequencies. In non-coding regions, the frequency of XYZ is approximately p(X)*p(Y)*p(Z).

19 http://www.kazusa.or.jp/codon/

20 Codon bias First, use many known genes of the organism or of similar organisms to train a codon frequency table: each codon c_i has a frequency f(c_i). Second, compute the background frequency bf(X) of each base X = A, C, G, T. The “significance” of a codon c = XYZ is then log( f(c) / (bf(X)*bf(Y)*bf(Z)) ). A high average significance in a region is an indication of a gene.
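The two training steps and the significance score can be sketched as follows. This is an illustrative sketch, not the presenter's implementation: the score is computed as the log of the ratio of f(c) to the background product, so that higher values mean "more gene-like"; the pseudo-frequency for unseen codons and all function names are assumptions, and the training data below is a toy example rather than real genes.

```python
import math
from collections import Counter


def codon_frequencies(genes):
    """Train f(c): relative frequency of each codon over known genes (read in frame 0)."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def base_frequencies(dna):
    """Background frequency bf(X) of each base X in A, C, G, T."""
    counts = Counter(b for b in dna.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def significance(codon, f, bf, pseudo=1e-6):
    """log( f(c) / (bf(X)*bf(Y)*bf(Z)) ); `pseudo` (an assumption) stands in for unseen codons."""
    x, y, z = codon
    return math.log(f.get(codon, pseudo) / (bf[x] * bf[y] * bf[z]))


def average_significance(region, f, bf):
    """Mean codon significance over a region read in frame 0."""
    region = region.upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    return sum(significance(c, f, bf) for c in codons) / len(codons)


# Toy data for illustration only.
f = codon_frequencies(["ATGGCTGCTTAA"])
bf = base_frequencies("ACGT" * 10)
print(average_significance("GCTGCT", f, bf) > average_significance("CCCCCC", f, bf))  # True
```

In practice the average would be computed over a sliding window in each reading frame, and regions with consistently high values flagged as candidate genes.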

22 Content II - Hidden Markov Model (HMM)

23 Eukaryotes The basic idea is similar to prokaryotes. Differences:

24 DNA-specific transcription factors These are the basis of gene-regulatory networks, another hot area in bioinformatics.

25 Splicing Consensus sequences have been identified as necessary but not sufficient for splicing. In vertebrates, these sequences are (the slash identifies the exon-intron or intron-exon junction):
(C or A)AG/GT(A or G)AGT   "donor" splice site
(T or C)n N (C or T)AG/G   "acceptor" splice site
A third sequence, which in yeast is TACTAAC, is necessary within the intron sequence. These rules are only approximately correct.
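The two consensus patterns can be turned into a simple scan for candidate splice sites. This is a rough sketch: because the consensus sequences are necessary but not sufficient, every hit is only a candidate. The slide does not say how long the pyrimidine run "(T or C)n" is, so `MIN_PYRIMIDINES` is an assumed value, and the function names are hypothetical.

```python
import re

# Donor: (C or A)AG / GT(A or G)AGT. Lookahead keeps overlapping hits.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")

# Acceptor: (T or C)n N (C or T)AG / G.
MIN_PYRIMIDINES = 6  # assumption: the slide leaves n unspecified
ACCEPTOR = re.compile(r"(?=([TC]{%d,}[ACGT][CT]AG)G)" % MIN_PYRIMIDINES)


def donor_sites(dna):
    """Indices of the first intron base (the G of GT) at candidate donor sites."""
    return [m.start() + 3 for m in DONOR.finditer(dna.upper())]


def acceptor_sites(dna):
    """Indices of the first exon base (after AG) at candidate acceptor sites."""
    # Several start positions inside one pyrimidine run give the same
    # junction, so collect the junction indices in a set.
    return sorted({m.end(1) for m in ACCEPTOR.finditer(dna.upper())})


print(donor_sites("CCCAAGGTAAGTCC"))           # [6]
print(acceptor_sites("AAAATTTCCTCTTGCAGGTT"))  # [17]
```

Pairing each donor candidate with a downstream acceptor candidate gives candidate introns, which a gene finder would then score with further evidence such as codon bias.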

