DEPARTMENT OF STATISTICS. Prof. Hélio Magalhães de Oliveira, UFPE, 21/08/2013. 1/2 × n-ário = 1 × (semi-n-ário). Thanks to Dr. Francisco Cysneiros.


1 DEPARTMENT OF STATISTICS. Prof. Hélio Magalhães de Oliveira, UFPE, 21/08/2013. 1/2 × n-ário = 1 × (semi-n-ário). Thanks to Dr. Francisco Cysneiros.

2 Prof. H. Magalhães de Oliveira, UFPE, Aug 2013. Statistical data on biological life: randomness as an indelible mark on the genome of species. UNIVERSIDADE FEDERAL DE PERNAMBUCO, DEPARTMENT OF STATISTICS

3 Chronological Scale of the Evolution of Life. DNA and the origin of life: a chronology (Battail, 2001)

4 WHAT IS LIFE, REALLY? 1st shift: the overcoming of vitalism. 2nd shift: the disappearance of sharp boundaries in the distinction between living and non-living. Natural selection: Darwinism and the theory of evolution; DNA / RNA. Current trends are breaking down the barriers between the living and the non-living.

5 Characteristic properties of natural life: capacity for reproduction; sensitivity to the environment; metabolism; chemical uniqueness; high degree of complexity and organization; a genetic program that directs development; a history shaped by natural selection.

6 Difficulties in defining life. SEEDS are alive but do not metabolize. VIRUSES do not self-reproduce (cf. mules). SAUSAGES are not alive, yet they contain a genetic program and are made of proteins and DNA. COMPUTER VIRUSES have properties of biological life: they reproduce, are sensitive to the environment, metabolize (consume processing time and memory), can be complex, and survive through natural selection.

7 Fundamentals of DNA Structure. Living organisms => cells. Prokaryotes vs. eukaryotes. In eukaryotic cells, all activities are coordinated by the nucleus. Nucleus: holds the DNA, which contains the genetic information and is responsible for the transmission of genetic information and for protein synthesis.

8 DNA: Structure and Function. Nitrogenous bases: purines and pyrimidines

9 DNA: Structure. Phosphodiester bond

10 DNA: Structure. Complementary bases

11 1953: discovery of the structure of DNA. Watson & Crick: the double-helix structure of DNA

12 DNA: Structure and Function. Double helix

13 DNA: Duplication. Duplication relies on enzymes: helicase breaks the hydrogen bonds between the nitrogenous bases so that the two DNA strands separate, and DNA polymerase pairs free nucleotides present in the cell with their complementary bases on each strand. Two identical DNA molecules are formed. DNA duplication is called semiconservative because each new DNA molecule has one new strand and one old strand inherited from the parent molecule.
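The semiconservative scheme on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: strands are plain strings, strand orientation (5' to 3') is ignored, and the names `COMPLEMENT`, `complementary_strand` and `replicate` are hypothetical, not taken from the talk.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the strand that pairs base-by-base with `strand` (orientation ignored)."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Semiconservative copy: each daughter keeps one old strand and gains one new strand."""
    old_a, old_b = duplex
    daughter_1 = (old_a, complementary_strand(old_a))  # old strand a + newly built partner
    daughter_2 = (complementary_strand(old_b), old_b)  # newly built partner + old strand b
    return daughter_1, daughter_2


parent = ("ATGGTGCAC", complementary_strand("ATGGTGCAC"))
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)  # both daughters carry the same sequence as the parent
```

Each daughter duplex reuses one parental string unchanged, which is the "one old strand, one new strand" property the slide describes.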

14 The Central Dogma (diagram labels): DNA → DNA (replication); DNA → RNA (transcription, RNA polymerase); RNA → protein synthesis (translation); X; in vivo

15 Protein Synthesis: Translation. Translation takes place on the ribosomes. Base triplet of the mRNA: codon. Base triplet of the tRNA: anticodon.

16 Translation. Nirenberg & Khorana

17 Protein synthesis

18 Mapping DNA into Proteins. The genetic source is characterized by a four-letter alphabet: N = {U, C, A, G}. Input alphabet: N^3 = {(n1, n2, n3) | ni ∈ N, i = 1, 2, 3}. Output alphabet: A = {Leu, Pro, Arg, Gln, His, Ser, Phe, Trp, Tyr, Asn, Lys, Ile, Met, Thr, Asp, Glu, Gly, Ala, Val, Cys, Stop}. Highly redundant map GC: N^3 → A, with ||N^3|| = 64 and ||A|| = 21.
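The map GC: N^3 → A can be written out explicitly. The sketch below builds the standard genetic code table from a 64-character string (one-letter amino acid codes, `*` for Stop, codons enumerated with each position running over U, C, A, G) and checks the 64-to-21 redundancy. The helper names (`GC`, `translate`) are illustrative choices, not part of the original slides.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # the four-letter source alphabet N

# Standard genetic code in one-letter symbols; '*' marks Stop.
# Order: first base slowest, third base fastest, each over U, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# GC maps each element of N^3 (a codon) to an element of A.
GC = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, stopping at the first Stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        symbol = GC[mrna[i:i + 3]]
        if symbol == "*":
            break
        protein.append(symbol)
    return "".join(protein)


print(len(GC), len(set(GC.values())))   # size of N^3 and of A
print(Counter(GC.values()).most_common(3))  # the most redundant output symbols
print(translate("AUGGUGCACCUGUAA"))
```

Counting how many codons map to each symbol makes the redundancy concrete: some amino acids are reached from six codons, others from only one.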

19 The Genetic Code (rows: 1st letter; columns: 2nd letter U | C | A | G; within each cell the 3rd letter runs over U, C, A, G)
1st letter U: phenylalanine, leucine | serine | tyrosine, stop | cysteine, stop, tryptophan
1st letter C: leucine | proline | histidine, glutamine | arginine
1st letter A: isoleucine, methionine (start) | threonine | asparagine, lysine | serine, arginine
1st letter G: valine | alanine | aspartic acid, glutamic acid | glycine

20 Analogy would lead me one step further, namely, to the belief that all animals and plants have descended from some one prototype [...] All living things have much in common, in their chemical composition, their germinal vesicles, their cellular structure, and their laws of growth and reproduction [...] Probably all the organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed.... From so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved. CHARLES DARWIN (1859) On the Origin of Species

21 DNA: Similarities. Similarity between the DNA of two humans: 99 to 99.1%. Human-chimpanzee similarity: 98.5%. Only ~2% of the human genome codes for proteins: bp -> 120 Mb / (8 b/B) = 15 MB
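The storage figure on this slide can be reproduced with a short calculation. The genome length did not survive in the transcript, so the sketch assumes a round 3 × 10^9 bp and 2 bits per base (four possible letters); both values are assumptions made here for illustration.

```python
GENOME_BP = 3_000_000_000   # assumed human genome length (not legible on the slide)
CODING_PERCENT = 2          # "~2% of the human genome codes for proteins"
BITS_PER_BASE = 2           # assumption: 4 letters -> 2 bits each

coding_bp = GENOME_BP * CODING_PERCENT // 100   # protein-coding base pairs
coding_bits = coding_bp * BITS_PER_BASE         # bits needed to store them
coding_megabytes = coding_bits / 8 / 1e6        # 8 bits per byte, 10^6 bytes per MB

print(coding_bits / 1e6, "Mb")
print(coding_megabytes, "MB")
```

Under these assumptions the coding part comes to 120 Mb, i.e. 15 MB, matching the right-hand side of the slide.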

22 Is man closer to the gorilla or to the orangutan? Comparison of mitochondrial DNA.
human: ATA ACC ATG CAC ACT ACT ATA ACC ACC CTA ACC CTG ACT TCC CTA ATT CCC CCC ATC CTT ACC CTC GTT ACC...
gorilla: ATA ACT ATG TAC GAT ACC ATA ACC ACC TTA GCC CTA ACT TCC TTA ATT CCC CCT ATC CTT ACC TTC ATC ACT...
orangutan: ACA GCC ATG TTT ACT ACC ATA ACT GCC CTC ACC TTA ACT TCC CTA ATC CCC CCC ATT ACC GCT CTC ATT AAC...
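One simple way to answer the question on this slide is to count the positions at which two aligned sequences differ (the Hamming distance). The sketch below applies it to the three excerpts shown; using Hamming distance is a choice made here for illustration, and real phylogenetic comparisons use longer sequences and more refined models.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(x != y for x, y in zip(a, b))


# Mitochondrial DNA excerpts as printed on the slide (spaces removed).
HUMAN = ("ATA ACC ATG CAC ACT ACT ATA ACC ACC CTA ACC CTG "
         "ACT TCC CTA ATT CCC CCC ATC CTT ACC CTC GTT ACC").replace(" ", "")
GORILLA = ("ATA ACT ATG TAC GAT ACC ATA ACC ACC TTA GCC CTA "
           "ACT TCC TTA ATT CCC CCT ATC CTT ACC TTC ATC ACT").replace(" ", "")
ORANGUTAN = ("ACA GCC ATG TTT ACT ACC ATA ACT GCC CTC ACC TTA "
             "ACT TCC CTA ATC CCC CCC ATT ACC GCT CTC ATT AAC").replace(" ", "")

for name, seq in (("gorilla", GORILLA), ("orangutan", ORANGUTAN)):
    print(f"human vs {name}: {hamming(HUMAN, seq)} differences out of {len(HUMAN)}")
```

On this excerpt the human sequence differs from the gorilla sequence at fewer positions than from the orangutan sequence.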

23 1953: first amino acid sequence. MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTP KARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN Sanger: amino acid sequence of bovine insulin

24 Alternative Representations of the Genetic Code: inner-to-outer map; 2D-Gray genetic map; genetic world-chart representations. DE OLIVEIRA, H.M., SANTOS-MAGALHÃES, N.S., The Genetic Code revisited: Inner-to-outer map, 2D-Gray map, and World-map Genetic Representations, 11th International Conference on Telecommunications, August 1-7, Fortaleza, Brazil, ICT2004, 2004, submitted. SANTOS-MAGALHÃES, N.S., BOUTON, E.A., DE OLIVEIRA, H.M., How to Represent the Genetic Code?, Reunião Anual da Sociedade Brasileira de Bioquímica, SBBq, 2004, submitted.

25 The Inner-to-outer Map. Inner-to-outer map for the genetic code. First nucleotide: inner circle. Second nucleotide: surrounding. Third nucleotide: outer region. Homophonemes.

26 64-QAM modem (de Oliveira)

27 U [11]; A [00]; G [10]; C [01]. Bacteriophage ΦX174: each binary codeword belongs to a constant-weight code.
DNA | Codeword
G...C | 0110
A...T | 0011
G...C | 0110
T...A | 1100
A...T | 0011
T...A | 1100
G...C | 0110
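The constant-weight property follows from the labelling: complementary bases carry complementary bit pairs, so every base pair concatenates to a 4-bit word with exactly two ones. The sketch below checks this with the labels stated on the slide (T is treated like U). Note that with G = 10 and C = 01 the pair G...C comes out as 1001, whereas the slide's table prints 0110, which would correspond to swapping the labels of G and C; the weight is 2 either way. Function and variable names are illustrative.

```python
# Two-bit labels as stated on the slide; T is given the same label as U.
BITS = {"U": "11", "T": "11", "A": "00", "G": "10", "C": "01"}
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick pairing


def codeword(base: str) -> str:
    """4-bit word for a base pair: label of the base followed by label of its partner."""
    return BITS[base] + BITS[PARTNER[base]]


def weight(word: str) -> int:
    """Hamming weight: number of ones in the word."""
    return word.count("1")


for base in "GAGTATG":  # the strand shown in the slide's table
    word = codeword(base)
    print(f"{base}...{PARTNER[base]}  {word}  weight={weight(word)}")
```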

28 2D-Gray representation (de Oliveira, Santos-Magalhães 2004)

29 Genetic Code: mapping of the amino acids (Santos-Magalhães, E. Bouton, de Oliveira 2004)

30 Coloured 2D-Gray genetic map. Coloured genetic code map for amino acids. This representation merges regions mapped into the same amino acid! [Figure: 2D grid of amino acid labels: Val, Ile, Thr, Ala, Phe, Leu, Pro, Ser, Trp, Arg, Gln, Stop, Cys, His, Tyr, Gly, Asn, Asp, Lys, Glu, Met]

31 Spectrum for locating exons (gene F56F11.4). Genomic analysis

32 Wavelet analysis of genomic sequences: human -cardiac, bp; c-myb oncogene (chicken), bp

33 INTRONS & EXONS

34 Removing the introns during transcription

35 Stretch of DNA from human -hemoglobin (reading frames) ...ACA GAC ACC ATG GTC CAC CTT GAC CAG ACA CCA TGG TGC ACC TGG AGA CAC CAT GGT GCA CCT TGA... Genes of the hemoglobin subunit (2 genes) B 90 bp 131 bp 222 bp 851 bp 126 bp A

36 Portion of the DNA of the HIV-1 genome: GGG TTC TTG GGA GCA GCA GGA AGC ACT ATG GGC GCA... Cancer is caused by agents (carcinogens, radiation, viruses) that damage DNA or interfere with its replication and/or repair mechanisms.

37 Genome Music - Body Music. Susumu Ohno. URL- /

38 DNA of bacteriophage ΦX174: bp, 10 genes (A to K)
Gene | no. of amino acids | reading frame
A | 455 (1539 bp) | 2
B | 120 (360 bp) | 1
C | 86 (258 bp) | 1
D | 152 (456 bp) | 3
E | 91 (273 bp) | 1
F | 427 (1281 bp) | 2
G | 175 (525 bp) | 1
H | 328 (984 bp) | 3
J | 38 (114 bp) | 2
K | 56 (168 bp) |

39 Genes in the DNA of bacteriophage ΦX174

40 GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAG TGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTG GATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGA TTACTATCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGG CTTCTGCCGTTTTGGATTTAACCGAAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATG GTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGG AAGGCGCTGAATTTACGGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACT GACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTA CTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGC GCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGA CGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGAT TAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCC TAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTAT GGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAA GCTGCTTATGCTAATTTGCATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAACCTCTTATGACGCTGACAACCGTCCTT TACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATA 
CCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGCTGAGGGT CAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGC CACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATG ACTTCGTGATAAAAGATTGAGTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGCTTAGGAGT TTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGC TACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTT GGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGAATG GTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACC GCTACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTT CTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGG TGATGCTGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGG TACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTG CATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGTTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAAT CAGAAAGAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGC TTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTC AAACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCT CATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCT 
GGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTA TGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCC AATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATGTTGACGGCC ATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGA CCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTAT AATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACA GGCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGG TCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATG CGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTAC AGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGG TTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAA GAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACG CCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCA AATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA

41 Genome Sizes. Smallest number of genes: Mycoplasma genitalium, 470 genes. Human genome: man ~ genes (as was wrongly believed!)

42 Bacteriophage ΦX174

43 ORDER OF MAGNITUDE OF GENOMES (base pairs = bp)
Viruses: 10 kbp (SV40 5k, T k...)
Bacteria: 4 Mbp (E. coli 4.7 Mb)
Yeast: 9 Mbp
Nematode: 90 Mbp
Insects: Gbp
Fruit fly: 180 Mbp
Mammals: Gbp (man 3.2 Gbp)
Lungfish: 140 Gbp
Mustard weed: 200 Mbp
Pine: 68 Gbp
Amoeba dubia: 670 Gbp

44 THE C-VALUE PARADOX. C-value = amount of DNA in the haploid genome. Many less complex organisms have surprisingly high C-values. Does the extra DNA have a function? If not, why is it preserved from generation to generation?

45 Gene | disease | length: human -globin, sickle-cell anemia, bp; human Factor VIII, hemophilia, bp; protein kinase, muscular dystrophy, bp

46 Number of living species on Earth ~ 10^7. Assume these are a fraction of 1/100 of those that have ever existed (extinction). This gives ~10^9 species (apparently large...). This is ridiculously small compared with the total number of possible genomes in the absence of redundancy: GENOMES ~ 4^(10^9) ~ (for a typical genome of 10^9 nucleotides). With the identity of living things provided by the genetic substrate, the hypothesis that species are sparse (Battail) seems valid.
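The gap between 10^9 species and 4^(10^9) possible genomes is easiest to see on a logarithmic scale. The sketch below computes the decimal exponent of 4^(10^9) without building the number itself; the variable names are illustrative.

```python
import math

GENOME_LENGTH = 10**9        # nucleotides in a "typical" genome, as on the slide
LOG10_SPECIES = 9            # ~10^9 species that ever existed

# 4^(10^9) = 10^(10^9 * log10(4)); only the exponent is computed.
log10_genomes = GENOME_LENGTH * math.log10(4)

print(f"possible genomes ~ 10^{log10_genomes:.3e}")
print(f"species          ~ 10^{LOG10_SPECIES}")
print(f"the exponents alone differ by a factor of ~{log10_genomes / LOG10_SPECIES:.2e}")
```

The number of possible genomes has roughly 600 million decimal digits, while the number of species has ten, which is the sense in which species are sparse in genome space.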

47 A Brief Chronology of Genomes
1977: complete sequencing of the genome of phage ΦX174 (5,386 bp)
1995: first living organism, genome of Haemophilus influenzae (1.8 Mbp)
1996: Saccharomyces cerevisiae (12.1 Mbp)
1997: Escherichia coli (4.6 Mbp)
1998: first animal, a nematode: genome of Caenorhabditis elegans (97.1 Mbp)
1999: first human chromosome, chromosome 22 (33.4 Mbp)
2000: Drosophila melanogaster (120 Mbp)
2000: chromosomes 5, 16, 19; Human Genome Project, June 2000: milestone draft sequence

48 Reductionism: a warning. Andras Paldi (CNRS). The tremendous reductionism of genetics researchers ends up treating the living being as a strict sum of juxtaposed elements. By compiling a catalogue of proteins we run the risk of aggravating the problem. It is as if we tried to understand how a rocket works by reading its parts catalogue!

49 Of Protein Size and Genomes. NEREIDE S. SANTOS-MAGALHÃES, HÉLIO M. DE OLIVEIRA. WSEAS TRANS. ON BIOLOGY AND BIOMEDICINE, Issue 2, Vol. 3, February 2006, ISSN: ~200 academia downloads. Number of genes (in living organisms)? 1) Bacterial genomes: number of genes ≈ genome size in kbp; bacterial proteins reveal 350 amino acid residues as typical. 2) C. elegans: genome of 99 Mbp and genomic rate of 25%; its protein size distribution has an average polypeptide length of 469 amino acids.

50 Human proteins: serum albumin has 609 amino acid residues, collagen about 1,000, apolipoprotein B 4,536, human titin 26,926. A DNA code is specified by the triplet DNA(C, R, d), where C is the genome size (bp), R is the genomic rate, and d is the coding density (bp/gene). R = (number of protein-coding base pairs) / (total number C of base pairs of the genome).

51 Further DNA parameters: g is the number of genes of the genome, e is the average number of exons per gene.

52 Coding density: estimated in terms of the expected protein size (bp/gene). The average bacterial protein is ~300 amino acids long and the bacterial genomic rate is ~0.8 to 0.9. Bacteria usually have a coding density d ≈ 1,000 bp/gene. Number of genes for bacteria: g ≈ C/1,000 (this is strikingly confirmed at
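The rule of thumb g ≈ C/1,000 can be traced back to the two figures quoted on this slide. A minimal sketch, assuming that each amino acid costs 3 bp and that a fraction R of the genome is coding, so that d ≈ 3·L/R; this reading of the relation, and the function names, are assumptions made here rather than formulas stated in the slides. The E. coli genome size is the 4.64 Mbp value listed in the deck's parameter table.

```python
def coding_density(avg_protein_aa: float, genomic_rate: float) -> float:
    """Estimated bp per gene: 3 bp per residue, spread over a genome of which R is coding."""
    return 3 * avg_protein_aa / genomic_rate


def estimated_genes(genome_bp: float, density_bp_per_gene: float) -> float:
    """Estimated number of genes g = C / d."""
    return genome_bp / density_bp_per_gene


for rate in (0.8, 0.9):
    print(f"R = {rate}: d ~ {coding_density(300, rate):.0f} bp/gene")

E_COLI_BP = 4_640_000  # genome size C for E. coli from the deck's parameter table
print(f"E. coli: g ~ {estimated_genes(E_COLI_BP, 1_000):.0f} genes")
```

With proteins of about 300 residues and R between 0.8 and 0.9, the density lands close to the 1,000 bp/gene used on the slide.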

53 Protein size histograms (straightforward organisms): the ΦX174 and phage λ viruses

54 C. elegans

55 The coding density of different chromosomes of lower eukaryotic species is roughly the same, i.e. there are only slight fluctuations from one chromosome to another in the same organism. S. cerevisiae (C = 12,057,849 bp, g = 6,268 genes) has an average coding density of 1,947 bp/gene over its chromosomes.
S. cerevisiae, coding density (bp/gene):
Chr1 2,093 | Chr9 1,864
Chr2 1,918 | Chr10 1,906
Chr3 1,855 | Chr11 1,960
Chr4 1,870 | Chr12 1,989
Chr5 2,090 | Chr13 1,841
Chr6 2,144 | Chr14 1,854
Chr7 1,891 | Chr15 1,908
Chr8 2,017 | average 1,947 bp/gene
(from
The coefficient of variation (CV %) of the coding density is 5.06 %
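The average and the coefficient of variation quoted for S. cerevisiae can be recomputed from the per-chromosome values listed on this slide. The sketch assumes the CV is the sample standard deviation divided by the mean; that convention is not stated in the source, but it reproduces the quoted figure.

```python
from statistics import mean, stdev

# Coding density (bp/gene) per chromosome, as listed on the slide (15 values).
YEAST_DENSITY = {
    "Chr1": 2093, "Chr2": 1918, "Chr3": 1855, "Chr4": 1870, "Chr5": 2090,
    "Chr6": 2144, "Chr7": 1891, "Chr8": 2017, "Chr9": 1864, "Chr10": 1906,
    "Chr11": 1960, "Chr12": 1989, "Chr13": 1841, "Chr14": 1854, "Chr15": 1908,
}


def cv_percent(values) -> float:
    """Coefficient of variation in percent, using the sample standard deviation."""
    values = list(values)
    return 100 * stdev(values) / mean(values)


avg = mean(YEAST_DENSITY.values())
cv = cv_percent(YEAST_DENSITY.values())
print(f"average coding density: {avg:.0f} bp/gene")
print(f"coefficient of variation: {cv:.2f} %")
```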

56 The six chromosomes of C. elegans (C = 98,971,533 bp, g = 17,585 genes) present an average coding density of 5,731 bp/gene.
C. elegans, coding density (bp/gene):
ChrI 5,072
ChrII 5,592
ChrIII 5,771
ChrIV 6,312
ChrV 4,899
ChrX 6,740
average 5,731 bp/gene
(from
The coding density barely varies from one chromosome to another. The coefficient of variation (CV %) of the coding density is 1.72 %

57 DNA parameters for some well-known genomes: virus ΦX174; microbial: M. genitalium, H. pylori, H. influenzae, S. aureus, B. subtilis, M. tuberculosis, E. coli, X. fastidiosa

58 Organism | genome size C (Mbp) | coding density (bp/gene) | number of genes g | genomic rate R | average protein length | genomic information (Mbits) | redundancy 1-R (%)
ΦX174 (bacteriophage) | ~0
M. genitalium | 0.58 | 1,
H. pylori | 1.67 | 1,066 | 1,
H. influenzae | 1.83 | 1,071 | 1,
S. aureus | 2.80 | 1,069 | 2,
B. subtilis | 4.21 | 1,025 | 4,
M. tuberculosis | 4.41 | 1,126 | 3,
E. coli | 4.64 | 1,082 | 4,
X. fastidiosa | 2.52 | 1,238 | 2,
S. cerevisiae | 12.06 | 1,924 | 6,
C. elegans | 99 | 5,628 | 17,
D. melanogaster | 180 Mbp ~60* 120 | ~13,235 ' ~8,823 | 13,
Human (old) | ~3,000 Mbp 1,000* 2,000 | ~30,000 ' ~20, | ,000? | ~0.03 | ~300? | ~180.0? | ~97?
Human (update) | ~2,900 Mbp 967* 1,933 | ~112,500 ~75,000 | ~25,800 | ~0.016 | ~600 | ~92.9 | ~98.4
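The last row of this table (Human, update) is complete enough to check how its columns relate. The sketch below assumes 3 bp per amino acid and 2 bits per base pair, and takes the genomic rate as coding base pairs over genome size; these relations are an interpretation made here, chosen because they reproduce the listed values.

```python
GENOME_BP = 2_900_000_000    # C, "~2,900 Mbp"
GENES = 25_800               # g
AVG_PROTEIN_AA = 600         # average protein length (residues)

BP_PER_RESIDUE = 3           # one codon per amino acid
BITS_PER_BP = 2              # assumption: four letters -> 2 bits

coding_bp = GENES * AVG_PROTEIN_AA * BP_PER_RESIDUE
genomic_rate = coding_bp / GENOME_BP                  # R
information_mbits = coding_bp * BITS_PER_BP / 1e6     # genomic information
redundancy_percent = 100 * (1 - genomic_rate)         # 1 - R

print(f"R ~ {genomic_rate:.3f}")
print(f"genomic information ~ {information_mbits:.1f} Mbits")
print(f"redundancy ~ {redundancy_percent:.1f} %")
```

The three computed values agree with the ~0.016, ~92.9 and ~98.4 entries in the row.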

59 1) An unsuccessful attempt to explain the complexity of living beings: genome length. The so-called C-value paradox showed that this is incorrect. 2) The number of genes was supposed to be related to complexity, which led people to expect more genes than humans actually have; an estimate of about 100,000 was widespread from the 1980s to the late 1990s. 3) A potential measure that correlates with complexity: average protein size.

60 Storing all the genes of a single human requires less than 10 MB (although the entire human DNA sequence requires about 1 GB). Let C' and d' denote the genome size and the coding density with highly repetitive sequences excluded. About one third of higher-eukaryote DNA corresponds to these sequences, which are not transcribed but may have structural properties. Therefore C' = 2C/3 and d' = 2d/3. The prime refers to the expurgated genome, i.e. with the highly repeated sequences set apart.

61 Expected gene distribution in the 23 human chromosomes
chromosome | length (bp) | predicted genes (unveiled genes)
Chr1 | 226,828,929 | 2,016
Chr2 | 205,000,000 | 1,822 (1,346)
Chr3 | 195,073,306 | 1,734
Chr4 | 115,000,000 | 1,022 (796)
Chr5 | 117,696,509 | 1,046 (923)
Chr6 | 169,212,327 | 1,504 (1,557)
Chr7 | 310,210,944 | 1,367 a (1,150)
Chr8 | 143,297,300 | 1,274
Chr9 | 117,790,386 | 1,047 (1,149)
Chr10 | 132,016,990 | 1,173 (816)
Chr11 | 130,908,954 | 1,163
Chr12 | 129,826,379 | 1,154
Chr13 | 90,000, | (633)
Chr14 | 87,191, | (1,050)
Chr15 | 81,992, |
Chr16 | 79,932, | (880)
Chr17 | 79,376, |
Chr18 | 74,658, |
Chr19 | 55,878, | b (1,461)
Chr20 | 59,424, | (727)
Chr21 | 33,924, | c (225)
Chr22 | 34,352, | (545)
Chr X | 152,118,949 | 1,352 (1,098)

62 Gene distribution in human chromosomes: genome size C = 2,881 Mbp; number of genes g = 22,525. The mean gene size (bp) in each chromosome is:

63 Human chromosomes: mean lengths
Chrom. number | C (bp) | genes & pseudo (only genes) | (bp) | e | (kbp)
Chr2 [27] | 237,000,000 | 2,585 (1,346)
Chr4 [27] | 186,000,000 | 1,574 (796)
Chr6 [28] | 166,800,000 | 2,190 (1,557) | 3187,
Chr9 [29] | 109,044,351 | 1,575 (1,149) | 3426, a 34.4
Chr10 [30] | 131,666,441 | 1,357 (816) | 3227,
Chr13 [31] | 95,500, | (633) | 3209,
Chr14 [32] | 87,410,661 | 1,443 (1,050) | 2958, a 45.7
Chr20 [33] | 59,187, | (727) | 2925,
Chr22 [34] | 34,491, | (545) | 2664,

64 The average number of amino acid residues (L) and the genomic rate (R) are shown per chromosome. Columns: Chr6, Chr9, Chr10, Chr13, Chr14, Chr20, Chr22. Rows: L (aa); R (%)

65 CONCLUSIONS: average exon length about 300 bp; average intron length about 6,900 bp; mean of about 6 exons/gene (from single-exon genes to 175 exons for the titin gene!); average number of residues for coded proteins ~600 aa. Average protein size is a worthy criterion for assessing life complexity.
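The averages on this slide are mutually consistent, which a few lines of arithmetic make visible. The sketch assumes a gene with n exons contains n - 1 introns and that every exon base is coding; both are simplifications introduced here for illustration.

```python
EXONS_PER_GENE = 6
AVG_EXON_BP = 300
AVG_INTRON_BP = 6_900

coding_bp = EXONS_PER_GENE * AVG_EXON_BP                 # exon bases per gene
protein_aa = coding_bp // 3                              # 3 bp per amino acid
intron_bp = (EXONS_PER_GENE - 1) * AVG_INTRON_BP         # assumed n - 1 introns
gene_span_bp = coding_bp + intron_bp                     # exons plus introns
coding_fraction = coding_bp / gene_span_bp

print(f"coding bases per gene: {coding_bp} bp -> {protein_aa} aa")
print(f"average gene span: {gene_span_bp} bp")
print(f"coding fraction inside a gene: {coding_fraction:.1%}")
```

Six exons of 300 bp give 1,800 coding bases, i.e. the ~600 residues quoted for the average protein, while introns make up the bulk of the gene.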

66 DNA-Error Control Code May Be Unstructured. H. M. DE OLIVEIRA, N.S. SANTOS-MAGALHÃES. The astonishing reliability with which deoxyribonucleic acid (DNA) has been preserved through the ages implies that the cell's replication machinery has to guard against copying mistakes. The replication machine is self-correcting and operates with a mean of 1 error per 10^7 nucleotides copied. Around 99% of such errors are corrected by the DNA mismatch repair mechanism, resulting in 1 error per 10^9 nucleotides copied.
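The two error rates on this slide are linked by a single multiplication, shown below together with the expected number of residual errors per genome copy. The genome length of 3.2 Gbp is the figure given for man earlier in the deck; treating errors as independent per nucleotide is a simplifying assumption made here.

```python
import math

RAW_ERROR_RATE = 1e-7        # errors per nucleotide copied, before mismatch repair
REPAIR_EFFICIENCY = 0.99     # fraction of those errors corrected by mismatch repair
GENOME_BP = 3.2e9            # human genome length quoted earlier in the deck

residual_rate = RAW_ERROR_RATE * (1 - REPAIR_EFFICIENCY)   # errors surviving repair
errors_per_copy = residual_rate * GENOME_BP                # expected errors per replication

print(f"residual error rate: {residual_rate:.1e} per nucleotide")
print(f"expected errors per genome copy: {errors_per_copy:.1f}")
```

Even at one error per 10^9 nucleotides, a genome of a few billion base pairs is expected to pick up a handful of errors per replication under these assumptions.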

67 Introns & exons. Most eukaryotic genes have their coding sequences interrupted by noncoding regions (the so-called introns, for intervening sequences). Introns are usually longer than exons. INTRONS: size ranging from 20 bp to 250,000 bp; EXONS: size ranging from 50 to 600 bp (average 300 bp). Attempts to understand the biological role of introns have found no recognized function.

68 Highly repetitive sequences: SINEs (short interspersed elements), 13% of the genome; LINEs (long interspersed elements), 21% of the genome. Repetitive DNA has commonly been regarded as junk DNA. Noncoding DNA: introns, 26% of the human genome. Viruses and bacteria have high fecundity and few gene families; they have little or almost no need for protection. Plants and animals have high permanency => they must be robust to mutations (survivors of natural selection).

69 Standard error-correcting codes are designed by imposing constraints on the sequences. Why use structured codes? Answer: the (mistaken) belief that decoding a random code is unfeasible, since the lack of structure => an exhaustive search. We think that Darwinian mechanisms for protecting DNA may be quite different. No parity rules should be looked for! (HMdO)

70 We believe that introns were the spontaneous mechanism for introducing uncertainty. In a battle, a crucial payload is to be sent to the front. If the only way is to send it through the battlefield, it should not be dispatched directly. Many fake cargos can be added, and the relevant one will be hidden among them. If the enemy (noise, mutation) tries hard to intercept this crucial delivery, he will now probably not succeed, owing to the amount of uncertainty added to the process. Many ineffective cargos (junk cargos, or introns) will be hit, but the main one will probably be missed. The same strategy is used to safeguard authorities such as the presidents of some nations (uncertain routes and doubles).

71 DNA coding has a trivial decoding scheme (asynchronous start-stop protocol). The DNA code meets Battail's close-to-random criterion. Biological evolutionary codes match Shannon's paradigm: they are long, truly random codes. We quote Battail: Nature appears as an outstanding engineer…

72 WRAP-UP: This seminar is essentially a provocation! If Statistics deals with large masses of data (data already available) whose behaviour is inherently random, then the publicly available genome databases are a source of challenges for excellent work and discoveries. Thank you...

