Bioinformatics and Gene Discovery. Iosif Vaisman, 1998. University of North Carolina at Chapel Hill, Bioinformatics Tutorials.


1 Bioinformatics and Gene Discovery. Iosif Vaisman, 1998. University of North Carolina at Chapel Hill, Bioinformatics Tutorials

2

3 From genes to proteins

4 From genes to proteins [Diagram: DNA -> RNA (transcription) -> mRNA (splicing) -> protein (translation). Labeled features: promoter elements, splice sites, start codon, stop codon.]

5

6 Comparative Sequence Sizes (base pairs)
   Yeast chromosome 3                              350,000
   Escherichia coli (bacterium) genome           4,600,000
   Largest yeast chromosome now mapped           5,800,000
   Entire yeast genome                          15,000,000
   Smallest human chromosome (Y)                50,000,000
   Largest human chromosome (1)                250,000,000
   Entire human genome                       3,000,000,000

7 Low-resolution physical map of chromosome 19

8 Chromosome 19 gene map

9 Computational Gene Prediction
   Where are genes unlikely to be located?
   How do transcription factors know where to bind a region of DNA?
   Where are the transcription, splicing, and translation start and stop signals?
   What do coding regions do that non-coding regions do not?
   Can we learn from examples?
   Does this sequence look familiar?

10 Artificial Intelligence in Biosciences
   Neural Networks (NN)
   Genetic Algorithms (GA)
   Hidden Markov Models (HMM)
   Stochastic context-free grammars (SCFG)

11 Information Theory [Figure: two states, 0 and 1; labeled "1 bit"]

12 Information Theory [Figure: four states, 00, 01, 11, and 10; labeled "1 bit"]

13 Information Theory [Figure labeled "1 bit"]
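
The figures on slides 11 to 13 did not survive extraction, so the following is only a sketch of the standard relationship they appear to illustrate: singling out one of N equally likely states takes log2(N) bits. The function names are illustrative and not from the slides.

```python
import math


def bits_to_choose(n_states: int) -> float:
    """Bits needed to single out one of n equally likely states."""
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    return math.log2(n_states)


def shannon_entropy(probabilities) -> float:
    """Average information, in bits, of a discrete distribution."""
    # Zero-probability outcomes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(bits_to_choose(2))            # two states: 0 and 1
    print(bits_to_choose(4))            # four states: 00, 01, 11, 10
    print(shannon_entropy([0.25] * 4))  # four equally likely nucleotides
```

Under these assumptions, each position of a uniformly random DNA sequence carries two bits, one for each binary decision needed to pick among A, C, G, and T.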

14 Scientific Models
   Mechanistic models: predictive power, elegance, consistency
   Stochastic models: predictive power
   Hidden Markov models
   Mechanism; black box; stochastic mechanism
   Physical models -- Mathematical models

15 Neural Networks
   An interconnected assembly of simple processing elements (units or nodes)
   Node functionality is similar to that of the animal neuron
   Processing ability is stored in the inter-unit connection strengths (weights)
   Weights are obtained by a process of adaptation to, or learning from, a set of training patterns
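
As a minimal sketch of "weights obtained by learning from training patterns", the example below trains a single threshold unit with the classic perceptron rule. The slide does not name a learning rule, so the rule, the logical-AND training set, and all names here are assumptions chosen for illustration.

```python
def step(activation):
    """Threshold unit: fires (1) when the weighted input is non-negative."""
    return 1 if activation >= 0 else 0


def predict(weights, bias, inputs):
    """Output of one unit for one input pattern."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(patterns, epochs=20, rate=1.0):
    """Adapt the weights to the training patterns (perceptron rule)."""
    weights = [0.0] * len(patterns[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in patterns:
            error = target - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(AND_PATTERNS)
    for inputs, target in AND_PATTERNS:
        print(inputs, target, predict(weights, bias, inputs))
```

A single unit like this can only separate linearly separable patterns. Gene-finding networks combine many units and several sequence-derived inputs.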

16 Genetic Algorithms
Search or optimization methods using simulated evolution. A population of potential solutions is subjected to natural selection, crossover, and mutation.
   choose initial population
   evaluate each individual's fitness
   repeat
      select individuals to reproduce
      mate pairs at random
      apply crossover operator
      apply mutation operator
      evaluate each individual's fitness
   until terminating condition
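
The loop on this slide can be sketched in Python as follows. The slide leaves the details open, so everything concrete here is an assumption made for illustration: the toy fitness function (count of 1 bits), fitness-proportional selection, the parameter defaults, and the fixed generation count used as the terminating condition.

```python
import random


def fitness(individual):
    """Toy fitness (assumed): the number of 1 bits in the individual."""
    return sum(individual)


def select(population, rng):
    """Fitness-proportional selection; +1 keeps every weight positive."""
    weights = [fitness(ind) + 1 for ind in population]
    return rng.choices(population, weights=weights, k=len(population))


def crossover(a, b, rng):
    """One-point crossover producing two children."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]


def run_ga(n_bits=20, pop_size=30, generations=40, mutation_rate=0.01, seed=0):
    """Return the fittest individual seen; pop_size is assumed to be even."""
    rng = random.Random(seed)
    # Choose initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # terminating condition: generation limit
        parents = select(population, rng)
        rng.shuffle(parents)  # mate pairs at random
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            for child in crossover(a, b, rng):
                children.append(mutate(child, mutation_rate, rng))
        population = children
        # Evaluate fitness and remember the best individual seen so far.
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    winner = run_ga()
    print(fitness(winner), winner)
```

In a gene-finding setting the bit string would be replaced by an encoding of the candidate solution, for example a set of exon boundaries, and the fitness function by a score of how well that candidate explains the sequence.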

17 Crossover and Mutation [Figure: Parent A and Parent B exchange segments at a crossover point, producing Child AB and Child BA; mutation is also illustrated.]
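
The two operators in this figure can be written out with an explicit cut point, which makes the naming of Child AB and Child BA concrete. This is a sketch on plain strings; the function names and the example parents are illustrative and not taken from the slide.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length parents at `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("crossover point must fall inside the parents")
    child_ab = parent_a[:point] + parent_b[point:]  # head of A, tail of B
    child_ba = parent_b[:point] + parent_a[point:]  # head of B, tail of A
    return child_ab, child_ba


def point_mutation(individual, position, new_symbol):
    """Replace the single symbol at `position`."""
    return individual[:position] + new_symbol + individual[position + 1:]


if __name__ == "__main__":
    child_ab, child_ba = one_point_crossover("AAAAAA", "BBBBBB", 2)
    print(child_ab, child_ba)
    print(point_mutation(child_ab, 0, "B"))
```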

18 Markov Model (or Markov Chain)
The probability of each character depends only on the several characters that precede it in the sequence.
Number of preceding characters = order of the Markov model
Probability of a sequence, shown for s = ATCTAG under a second-order model:
P(s) = P[A] P[A,T] P[A,T,C] P[T,C,T] P[C,T,A] P[T,A,G]
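
The product on this slide can be reproduced by sliding a context window of at most `order` characters along the sequence. This is a minimal sketch: `cond_prob` stands for any function returning the probability of a symbol given its context, and the uniform model used in the demonstration is a placeholder assumption, not a trained model.

```python
def contexts(seq, order):
    """Pair every symbol with the (up to `order`) symbols preceding it."""
    return [(seq[max(0, i - order):i], symbol) for i, symbol in enumerate(seq)]


def sequence_probability(seq, order, cond_prob):
    """Multiply the conditional probabilities of all symbols in `seq`."""
    probability = 1.0
    for context, symbol in contexts(seq, order):
        probability *= cond_prob(context, symbol)
    return probability


def uniform_model(context, symbol):
    """Placeholder model (assumed): every nucleotide equally likely."""
    return 0.25


if __name__ == "__main__":
    # The context/symbol pairs mirror the factors P[A] P[A,T] P[A,T,C] ...
    for context, symbol in contexts("ATCTAG", 2):
        print(f"P[{','.join(context + symbol)}]")
    print(sequence_probability("ATCTAG", 2, uniform_model))
```

In practice `cond_prob` would be estimated from counts in training sequences, with separate models for coding and non-coding regions.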

19 Hidden Markov Models
States -- well-defined conditions
Edges -- transitions between the states
[Figure: a chain of states emitting A, then T or C, then G or T, then A, then C, which generates the sequences ATGAC, ATTAC, ACGAC, and ACTAC]
Each transition is assigned a probability.
Probability of the sequence:
   single path with the highest probability -- the Viterbi path
   sum of the probabilities over all paths -- the forward algorithm (the quantity used in Baum-Welch training)
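
To make the Viterbi path concrete, here is a small sketch of the dynamic-programming recursion. The two-state model ("AT" for AT-rich regions, "GC" for GC-rich regions) and all of its probabilities are hypothetical values invented for this example; they do not come from the slides.

```python
def viterbi(obs, states, start, trans, emit):
    """Return the most probable state path for a non-empty `obs`."""
    # Each column maps state -> (probability of the best path, previous state).
    columns = [{s: (start[s] * emit[s][obs[0]], None) for s in states}]
    for symbol in obs[1:]:
        previous = columns[-1]
        column = {}
        for s in states:
            prob, origin = max((previous[r][0] * trans[r][s], r) for r in states)
            column[s] = (prob * emit[s][symbol], origin)
        columns.append(column)
    # Trace back from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    best_prob = columns[-1][state][0]
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical model parameters, for illustration only.
STATES = ["AT", "GC"]
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

if __name__ == "__main__":
    path, prob = viterbi("AATTGGCC", STATES, START, TRANS, EMIT)
    print(path, prob)
```

For long sequences the products underflow, so practical implementations usually work with log probabilities and replace the multiplications with additions.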

20 Hidden Markov Model of Biased Coin Tosses
States (S_i): two biased coins {C1, C2}
Outputs (O_j): two possible outputs {H, T}
p(Output O_ij): p(C1, H), p(C1, T), p(C2, H), p(C2, T)
Transitions: from state X to Y {A11, A22, A12, A21}
p(Initial S_i): p(I, C1), p(I, C2)
p(End S_i): p(C1, E), p(C2, E)
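
The coin model can be written down directly as a set of tables, and the total probability of a toss sequence, summed over every possible sequence of coins, computed with the forward algorithm. The slide gives no numeric values, so every number below is a made-up assumption. For each coin, the two transition probabilities and the end probability are chosen to sum to 1.

```python
def forward(obs, states, start, trans, emit, end):
    """Probability of a non-empty `obs`, summed over all state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    # Finish by moving from the last coin into the end state.
    return sum(alpha[s] * end[s] for s in states)


# Hypothetical parameters, named after the slide's notation.
STATES = ["C1", "C2"]
START = {"C1": 0.5, "C2": 0.5}                      # p(I, C1), p(I, C2)
TRANS = {"C1": {"C1": 0.8, "C2": 0.1},              # A11, A12
         "C2": {"C1": 0.1, "C2": 0.8}}              # A21, A22
END = {"C1": 0.1, "C2": 0.1}                        # p(C1, E), p(C2, E)
EMIT = {"C1": {"H": 0.9, "T": 0.1},                 # p(C1, H), p(C1, T)
        "C2": {"H": 0.2, "T": 0.8}}                 # p(C2, H), p(C2, T)

if __name__ == "__main__":
    print(forward("HHTT", STATES, START, TRANS, EMIT, END))
```

The coins are "hidden" because an observer sees only the heads and tails, never which coin produced each toss.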

21 Hidden Markov Model for Exon and Stop Codon (VEIL Algorithm)

22 GRAIL gene identification program [Figure stages: possible exons -> refined exon positions -> final exon candidates]

23 Suboptimal Solutions for the Human Growth Hormone Gene (GeneParser)

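24 Measures of Prediction Accuracy: Nucleotide Level
[Figure: aligned "reality" and "prediction" tracks with each nucleotide classed as TP, FP, TN, or FN, plus a table of predicted versus real coding (c) and non-coding (nc) positions]
Sensitivity: Sn = TP / (TP + FN)
Specificity: Sp = TP / (TP + FP)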
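
A short sketch of the nucleotide-level calculation follows. It assumes that reality and prediction are given as equal-length strings with "c" for coding and "n" for non-coding positions; that encoding and the example tracks are assumptions made for illustration.

```python
def nucleotide_counts(reality, prediction):
    """Count TP, FP, TN, FN over two aligned 'c'/'n' annotation strings."""
    if len(reality) != len(prediction):
        raise ValueError("annotations must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for real, pred in zip(reality, prediction):
        if pred == "c":
            counts["TP" if real == "c" else "FP"] += 1
        else:
            counts["FN" if real == "c" else "TN"] += 1
    return counts


def sensitivity(counts):
    """Sn = TP / (TP + FN): fraction of real coding bases that were found."""
    return counts["TP"] / (counts["TP"] + counts["FN"])


def specificity(counts):
    """Sp = TP / (TP + FP): fraction of predicted coding bases that are real."""
    return counts["TP"] / (counts["TP"] + counts["FP"])


if __name__ == "__main__":
    reality = "nnccccnncc"
    prediction = "ncccnnnncc"
    counts = nucleotide_counts(reality, prediction)
    print(counts, sensitivity(counts), specificity(counts))
```

Both ratios are undefined when their denominator is zero, for example when nothing is predicted as coding; this sketch leaves that case to raise `ZeroDivisionError`.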

25 Measures of Prediction Accuracy: Exon Level
[Figure: aligned "reality" and "prediction" tracks showing a correct exon, a wrong exon, and a missing exon]
Sensitivity: Sn = (number of correct exons) / (number of actual exons)
Specificity: Sp = (number of correct exons) / (number of predicted exons)
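
The exon-level measures can be sketched the same way. The slide does not define its categories precisely, so this example assumes a common convention: an exon is "correct" only if both boundaries match exactly, "missing" if an actual exon overlaps no prediction, and "wrong" if a predicted exon overlaps no actual exon. Coordinates are hypothetical (start, end) pairs with inclusive ends.

```python
def overlaps(a, b):
    """True when two (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_accuracy(actual, predicted):
    """Return Sn, Sp, missing exons, and wrong exons."""
    actual, predicted = set(actual), set(predicted)
    correct = actual & predicted  # exact match on both boundaries
    sn = len(correct) / len(actual) if actual else 0.0
    sp = len(correct) / len(predicted) if predicted else 0.0
    missing = sorted(a for a in actual
                     if not any(overlaps(a, p) for p in predicted))
    wrong = sorted(p for p in predicted
                   if not any(overlaps(p, a) for a in actual))
    return sn, sp, missing, wrong


if __name__ == "__main__":
    actual = [(100, 200), (300, 400), (500, 600)]
    predicted = [(100, 200), (300, 410), (700, 800)]
    print(exon_level_accuracy(actual, predicted))
```

In the example, the prediction (300, 410) overlaps a real exon but misses one boundary, so it counts as neither correct, missing, nor wrong under these definitions.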

26 GeneMark Accuracy Evaluation

27 Gene Discovery Exercise
http://metalab.unc.edu/pharmacy/Bioinfo/Gene
Bibliography
http://linkage.rockefeller.edu/wli/gene/list.html and http://www-hto.usc.edu/software/procrustes/fans_ref/

