1 Application of Algorithm Research to Molecular Biology R. C. T. Lee Dept. Of Computer Science National Chinan University.


1 Application of Algorithm Research to Molecular Biology R. C. T. Lee Dept. Of Computer Science National Chinan University

2 There is one peculiar characteristic of all living organisms: we can reproduce ourselves. Yet it is important that what we reproduce is the same as we are. That is, wild flowers produce the same kind of wild flowers and birds reproduce the same kind of birds.

3 Information about ourselves must be passed to our descendants. Question: How is this done? Answer: Through DNA.

4 DNA (Deoxyribonucleic Acid) can be viewed as two strands of nucleic acids formed as a double helix.

5

6 There are only four types of nucleotide bases in every DNA: A: Adenine, G: Guanine, C: Cytosine, T: Thymine.

7 Each strand of a DNA is a sequence of A, G, C and T. Each A in one strand is paired with a T in the other strand. Similarly, each G is paired with a C.

8 Human Mitochondrial DNA Control Region TTCTTTCATGGGGAAGCAAA AAGAAAGTACCCCTTCGTTT
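The pairing rule can be checked mechanically on the two strands shown in slide 8. The following is a minimal Python sketch; the names `COMPLEMENT` and `complement` are illustrative and not part of the original slides.

```python
# Map each base to its partner on the opposite strand (A-T, G-C).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return strand.translate(COMPLEMENT)

top = "TTCTTTCATGGGGAAGCAAA"
bottom = "AAGAAAGTACCCCTTCGTTT"
print(complement(top) == bottom)
```

Complementing a strand twice returns the original strand, which is why either strand carries the full information.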

9 DNA exists in cells. For each living organism, there are a lot of different kinds of cells. For instance, in human beings, we have muscle cells, blood cells, neural cells etc. How can different cells perform different functions?

10 Genes In each DNA sequence, there are subsequences which are called genes. Each gene corresponds to a distinct protein, and it is the protein which determines the function of the cell. For instance, red blood cells must contain the oxygen-carrying protein haemoglobin, and the production of this protein is controlled by a certain gene.

11 Proteins Each protein consists of amino acids. There are 20 different amino acids.

12

13 The Relationship between a Gene and its Corresponding Protein

14 As shown above, each amino acid is coded by a triplet. For instance, TTC denotes PHE(Phenylalanine). Each triplet is called a codon. There are three codons, namely TAA, TGA and TAG which represent “end of gene”.

15 Protein RNase A begins with the amino acids KETAAAKFER. The corresponding DNA sequence is: AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT

16 How Is a Protein Produced? RNA (Ribonucleic Acid) Each cell is able to recognize all of the starting points of genes relevant to the proteins important to the functions of the cell.

17 The RNA system scans a gene. For each codon being scanned, it produces a corresponding amino acid. After all codons have been scanned, the corresponding protein is produced.

18

19 AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT KETAAAKFER Note that codon AAA corresponds to amino acid K and CGT corresponds to R. Remember TAA, TGA and TAG signify “end of gene”.
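The codon-by-codon scan described above can be sketched in Python. The table below is the standard genetic code written with DNA letters; the names `CODON_TABLE` and `translate` are illustrative assumptions, and the sketch ignores start codons and other biological detail.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# "*" marks the three "end of gene" codons TAA, TAG and TGA.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon, stopping at an end codon."""
    dna = dna.replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

print(translate("AAA GAA ACT GCT GCT GCT AAA TTT GAA CGT"))
```

Applied to the DNA sequence of slide 19, the sketch reproduces the amino acid string KETAAAKFER.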

20 Problems 1. String Matching Problem 2. Sequence Alignment Problem 3. Evolution Tree Problem 4. RNA Secondary Structure Prediction Problem 5. Protein Structure Problem 6. Physical Mapping Problem

21 Exact String Matching Problems
–Instance: A text T of length n and a pattern P of length m, where n > m.
–Question: Find all occurrences of P in T.
–Example: If T = “ttaptaap” and P = “ap”, then P occurs in T starting at positions 3 and 7.
Linear time (O(n+m) time) Algorithms
–Knuth-Morris-Pratt (KMP) algorithm
–Boyer-Moore algorithm
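As an illustration of a linear-time method, here is a minimal Python sketch of the Knuth-Morris-Pratt algorithm. The function names are illustrative; positions are reported 1-based to match the example in slide 21.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return the 1-based start positions of all occurrences of pattern."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 2)  # convert 0-based end to 1-based start
            k = fail[k - 1]
    return hits

print(kmp_search("ttaptaap", "ap"))
```

The text pointer never moves backwards, which is what gives the O(n+m) bound.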

22 Approximate String Matching Problems
–Instance: A text T of length n, a pattern P of length m, and a maximum number k of allowed errors.
–Question: Find all text positions where the pattern matches the text with at most k errors, where an error is the substitution, deletion, or insertion of a character.
–Example: Let T = “pttapa”, P = “patt” and k = 2. The substrings T[1..2], T[1..3], T[1..4] and T[5..6] each match P with at most 2 errors.
Algorithms
–Dynamic programming approach
–NFA approach
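The dynamic programming approach can be sketched as follows. This is one common formulation (a column-by-column edit distance table in which a match may start anywhere in the text), not necessarily the one the author had in mind; the function name is illustrative. It reports the 1-based text positions at which some substring matching P with at most k errors ends.

```python
def approx_match_ends(text, pattern, k):
    """Return 1-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = best distance between pattern[:i] and a substring ending
    # at the current text position (initially the empty prefix).
    col = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends

print(approx_match_ends("pttapa", "patt", 2))
```

For T = “pttapa”, P = “patt” and k = 2 the end positions are 2, 3, 4 and 6, which agree with the substrings listed in the example. The running time is O(nm).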

23 Sequence Alignment Problem ATTCATTACAACCGCTATG ACCCATCAACAACCGCTATG It appears that these two sequences are quite different. An alignment will produce the following: ATTCATTA-CAACCGCTATG ACCCATCAACAACCGCTATG

24 Given two sequences, any alignment will have a corresponding score. Each exact match scores 2. Each mismatch, including a character aligned with a gap, scores -1.
Alignment 1: AGC- over AAAC, score 2 - 3 = -1.
Alignment 2: AG-C over AAAC, score 2×2 + 2×(-1) = 2.

25 The sequence alignment problem: Given two sequences, find an alignment which produces the highest score. Approach: Dynamic Programming The multiple sequence alignment problem is NP-hard
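A minimal Python sketch of the dynamic programming approach for two sequences follows. It uses the scores from slide 24 (2 for a match, -1 for a mismatch) and assumes that a gap also costs -1, which is how the slide's examples are scored; the function name and parameters are illustrative.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the highest score of any global alignment of a and b."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag,               # align a[i-1] with b[j-1]
                           prev[j] + gap,      # a[i-1] against a gap
                           cur[j - 1] + gap))  # b[j-1] against a gap
        prev = cur
    return prev[-1]

print(alignment_score("AGC", "AAAC"))
```

For AGC and AAAC the best score is 2, so the alignment AG-C over AAAC from slide 24 is optimal under this scoring. The sketch returns only the score; recovering the alignment itself requires keeping the full table and tracing back.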

26 The Evolution Tree Problem

27

28 The evolution tree problem: Given a distance matrix of n species, find an evolution tree under some criterion. Usually, the criteria are such that all of the tree distances reflect the original distances. That is, when two species are close to each other in the distance matrix, they should be close in the evolution tree.

29 Each criterion corresponds to a distinct evolution tree problem. Most of them are NP-complete. Algorithms which produce optimal evolution trees in polynomial time are mostly based upon the minimum spanning tree approach.
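To give a feel for the minimum spanning tree building block, here is a small Python sketch of Prim's algorithm on a distance matrix. It only computes the spanning tree; how a particular evolution tree criterion uses it is not covered here. The four-species matrix is a made-up example, not data from the slides.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.
    Returns a list of edges (u, v, distance)."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the closest species not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Hypothetical distances between four species (illustration only).
D = [[0, 2, 6, 7],
     [2, 0, 5, 6],
     [6, 5, 0, 3],
     [7, 6, 3, 0]]
print(minimum_spanning_tree(D))
```

Species that are close in the matrix (0 and 1, 2 and 3) end up directly connected in the tree.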

30 A Partial Evolution Tree of Homo sapiens (Intelligent Human Beings, also Modern Men) Our ancestors are from Africa.

31 Secondary Structure of RNA Due to hydrogen bonds, the primary structure of an RNA can fold back on itself to form its secondary structure. Base pairs (formed by hydrogen bonds):
1. A-U (Watson-Crick base pair)
2. C-G (Watson-Crick base pair)
3. G-U (wobble base pair)

32 AGGCCUUCCU

33 2D & 3D Structures of Yeast Phenylalanyl-Transfer RNA 2D Structure 3D Structure

34 Secondary Structure Prediction Problem Given an RNA sequence, determine the secondary structure with the minimum free energy for this sequence. Approach: Dynamic Programming
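A full free-energy model is beyond a short example, so the sketch below swaps in a simpler, well-known objective: maximizing the number of non-crossing base pairs (a Nussinov-style dynamic program). It allows the three pair types from slide 31. The function name and the `min_loop` parameter (minimum number of unpaired bases between two paired bases) are assumptions made for illustration.

```python
# Allowed base pairs: Watson-Crick A-U and C-G, plus wobble G-U.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def max_base_pairs(rna, min_loop=3):
    """Maximum number of non-crossing base pairs in rna, with at least
    min_loop unpaired bases enclosed by every pair."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j stays unpaired
            for t in range(i, j - min_loop):
                if (rna[t], rna[j]) in PAIRS:  # case: base t pairs with j
                    left = best[i][t - 1] if t > i else 0
                    score = max(score, left + 1 + best[t + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("AGGCCUUCCU"))
```

The table has O(n^2) entries and each takes O(n) time to fill, so the sketch runs in O(n^3) time. Energy-based methods use the same recursive decomposition with energies in place of pair counts.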

35 Protein Structure Problem Each amino acid of a protein can be classified into either of the following two types: –H (hydrophobic, non-polar) (hating water) –P (hydrophilic, polar) (loving water) Then the amino acid sequence of a protein can be viewed as a binary sequence of H’s (1’s) and P’s (0’s).

36 Example Instance: Score = 5 and Score = 3

37 H-P Model Instance: A sequence of 1’s (H’s) and 0’s (P’s). Question: Find a self-avoiding path embedded in either a 2D or 3D lattice which maximizes the score, where the score is the number of pairs of 1’s that are adjacent in the lattice without being adjacent in the sequence. The problem is NP-complete even for the 2D lattice.
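Finding the best fold is hard, but scoring a given fold is simple. The following Python sketch computes the score of one candidate path on the 2D square lattice; the function name and the coordinate representation (a list of (x, y) points, one per residue) are assumptions for illustration.

```python
def hp_score(sequence, path):
    """Score a fold in the 2D H-P model.
    sequence: string of '1' (H) and '0' (P).
    path: list of (x, y) lattice points, one per residue, in order."""
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    score = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):  # skip neighbours in the sequence
            if sequence[i] == "1" and sequence[j] == "1":
                (x1, y1), (x2, y2) = path[i], path[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    score += 1
    return score

# Sequence H P P H folded into a unit square: the two H's become neighbours.
print(hp_score("1001", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```

Folding the same sequence along a straight line gives score 0, since the two H residues are then three lattice steps apart.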

38 Physical Mapping Problem Select a subset of cosmid clones of minimum total length that covers the YAC DNA.
C: Full DNA, 10^8 bp.
Cut C and clone into overlapping YAC clones.
Cut the DNA in each YAC clone and clone into overlapping cosmid clones.
Duplicate the cosmid and then cut the copies randomly. Select and sequence short fragments and then reassemble them into a deduced cosmid string.
The figure labels two stages: physical mapping and fragment assembling.

39 Shortest Common Superstring Input: A collection F of strings. Output: A shortest possible string S such that for every f ∈ F, S is a superstring of f. For example: F = {ACT, CTA, AGT}, S = ACTAGT. The problem is NP-complete.
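For very small inputs the problem can be solved by trying every ordering of the strings and merging neighbours with their largest overlap. The Python sketch below does exactly that; it takes exponential time and assumes that no input string is a substring of another. The function names are illustrative.

```python
from itertools import permutations

def merge(a, b):
    """Append b to a, reusing the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b

def shortest_common_superstring(strings):
    """Exhaustive search over all orderings (only feasible for tiny inputs)."""
    if not strings:
        return ""
    best = None
    for order in permutations(strings):
        s = order[0]
        for nxt in order[1:]:
            s = merge(s, nxt)
        if best is None or len(s) < len(best):
            best = s
    return best

print(shortest_common_superstring(["ACT", "CTA", "AGT"]))
```

For F = {ACT, CTA, AGT} the sketch returns ACTAGT, the superstring S shown in slide 39.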

40 Suppose the target is too long and its contents are unknown. What can we do? Enzyme A → {6, 8, 3, 10}. Enzyme B → {7, 11, 4, 5}. Enzymes A and B → {1, 5, 2, 6, 7, 3, 3}.

41 A B AB This problem is called the two-digest (double digest) problem, which is NP-complete.
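For an instance as small as the one in slide 40, a brute-force search over all orderings of the A and B fragments is enough. The Python sketch below looks for an ordering of each fragment set whose combined cut sites reproduce the fragments of the joint digest; the function names are illustrative, and the running time grows factorially with the number of fragments.

```python
from itertools import permutations

def cut_sites(order):
    """Positions of the internal cuts when fragments appear in this order."""
    sites, pos = set(), 0
    for length in order[:-1]:
        pos += length
        sites.add(pos)
    return sites

def double_digest(a, b, ab):
    """Return orderings (pa, pb) of the fragment lengths a and b whose
    combined cuts give the fragment multiset ab, or None if none exists."""
    if not (sum(a) == sum(b) == sum(ab)):
        return None
    total = sum(a)
    target = sorted(ab)
    for pa in permutations(a):
        sites_a = cut_sites(pa)
        for pb in permutations(b):
            sites = sorted(sites_a | cut_sites(pb)) + [total]
            fragments = [s - p for p, s in zip([0] + sites, sites)]
            if sorted(fragments) == target:
                return pa, pb
    return None

print(double_digest([6, 8, 3, 10], [7, 11, 4, 5], [1, 5, 2, 6, 7, 3, 3]))
```

One consistent arrangement for this instance orders the A fragments as 3, 10, 8, 6 and the B fragments as 4, 7, 5, 11; the sketch may report a different but equally valid arrangement.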

42 TAA, TGA, or TAG. Do you know what they mean? End of Gene. Thank you for your patience. Have a good conference.