CSCI 6900/4900 Special Topics in Computer Science: Automata and Formal Grammars for Bioinformatics


1 CSCI 6900/4900 Special Topics in Computer Science: Automata and Formal Grammars for Bioinformatics
Bioinformatics problems: sequence comparison, pattern/structure search, pattern/structure recognition, relationship of sequences
Algorithm design: optimal algorithms, heuristic algorithms, parallel algorithms
Probabilistic models: stochastic finite state automata (HMMs), stochastic regular grammars, stochastic context-free grammars, more complex grammar models

2 Probabilistic modeling and algorithms
M: modeling a family of sequences (e.g. RNA) to capture certain properties $Q_1, Q_2, \ldots$
(1) Each sequence $x$ possesses a property $Q_k(x)$ with probability $P_k(x)$.
(2) There is a probability distribution for each sequence $x$ over the properties, i.e., $\sum_k P_k(x) = 1$ for each given $x$.
(3) The most likely property $Q^*(x)$ is the one with the highest probability, i.e., $Q^*(x) = \arg\max_k \{ P_k(x) \}$.
(4) Algorithms are designed to find the most likely property for given sequences.
But how? Computational linguistic systems can describe desired properties of bio sequences.
[Diagram labels: D (sample, training data); modeling mechanism M; assigning probabilities]
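Points (1) to (4) on slide 2 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the slides: the two property names, their nucleotide frequencies, the independence of sequence positions, and the uniform prior over properties. The sketch only shows the general pattern of assigning $P_k(x)$ so that the values sum to 1 and then taking the arg max.

```python
import math

# Hypothetical properties Q_k, each modeled by per-nucleotide frequencies.
# These names and numbers are illustrative assumptions, not course material.
MODELS = {
    "GC-rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT-rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def property_probabilities(x):
    """Return P_k(x) for every property k, normalized so the values sum to 1."""
    # Log-likelihood of x under each model, assuming independent positions.
    log_like = {
        name: sum(math.log(freq[base]) for base in x)
        for name, freq in MODELS.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_like.values())
    weights = {name: math.exp(value - top) for name, value in log_like.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def most_likely_property(x):
    """Return the property with the highest probability: arg max_k P_k(x)."""
    probs = property_probabilities(x)
    return max(probs, key=probs.get)
```

For example, `most_likely_property("GCGCGGCC")` selects the hypothetical "GC-rich" property. In the course itself the models would be HMMs or stochastic grammars trained on data D rather than fixed frequency tables.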

3 Outline for the course
Part 0: molecular biology basics and review of probability theory
Part 1: pairwise alignment, HMMs, profile-HMMs, gene finding, and multiple alignment (chapters 1-6)
Potential research projects: efficient HMM algorithms, gene finding
Part 2: RNA stem-loops, SCFG, secondary structure prediction, structural homology search (chapters 9-10)
Potential research projects: efficient SCFG algorithms, pseudoknot prediction, protein secondary structure prediction
Part 3: phylogeny reconstruction, probabilistic approaches (chapters 7-8)
Potential research projects: grammar modeling of evolution

4 How this course is to be conducted
To learn new concepts and techniques: lectures (by the instructor and students)
To apply learned knowledge to research: research discussions (led by students and the instructor)
To demonstrate learning effectiveness: presentations of research results (by students)

5 The central dogma of molecular biology

6 Nucleotides: building blocks of DNA
Purines: adenine, guanine
Pyrimidines: cytosine, thymine

7 Double helix of DNA

8 DNA replication

9 Genetic code

10 Mutations
(1) synonymous
(2) missense
(3) nonsense
(4) frame-shift
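The four mutation types on slide 10 can be told apart with the genetic code from slide 9. The following is a minimal sketch, not the course's own method: the function names are hypothetical, the codon table is the standard genetic code written in DNA letters, and the substitution classifier assumes the reference codon is a sense codon, so the loss of a stop codon is outside its scope.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a change from one codon to another (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == "*":
        # Assumption of this sketch: only sense reference codons are handled.
        raise ValueError("reference codon is a stop codon")
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid is encoded
    if alt_aa == "*":
        return "nonsense"    # amino acid codon becomes a stop codon
    return "missense"        # a different amino acid is encoded


def is_frameshift(indel_length):
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0
```

For instance, GAA to GAG leaves glutamate unchanged (synonymous), GAA to GTA replaces it with valine (missense), and TAC to TAA turns tyrosine into a stop (nonsense).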

11 RNA synthesis

12 RNA synthesis (cont'd)

13 RNA can fold onto itself

14 Protein synthesis

15 Biological information flow
Genome: AGACGCTGGTAT CGCATTAACTAA CGGGTTACTCG GATATTACCTTA CTATAGGGCGCT ATCGCGCGTTAA TCTGGTATC
[Diagram labels: introns, exons; gene sequence; protein sequence; protein structure; regulatory DNA sequence; sequence family; structure family; protein-DNA interactions; protein-protein interactions; gene regulation; gene expression; protein function; protein abundance; cellular role]

16 What bioinformatics is NOT:
Not just using a computer to speed up biology
Not just applying computer algorithms to biology
Not just the accountant of genomic data
What bioinformatics is, then:
The creative use of computers to define and solve central biological puzzles
The computer becomes a hypothesis machine, making predictions to be tested at the bench.

