Algorithms in Computational Biology (236522), Fall 2005-6, Lecture #1

1 Algorithms in Computational Biology (236522) Fall 2005-6 Lecture #1
Lecturer: Shlomo Moran, Taub 639, tel 4363. Office hours: Wednesday 12:30-13:30 (or 15:30-16:30)
TA: Ilan Gronau, Taub 700, tel 4894. Office hours: Monday
Lecture: Wednesday 10:30-12:30, Taub 6
Tutorial: Monday 14:30-15:30, Taub 6
This class was initially edited from Nir Friedman’s lecture at the Hebrew University. Changes were made by Dan Geiger, then by Shlomo Moran.

2 Course Information Requirements & Grades: 15-25% homework, in five assignments (submit within two weeks). Homework is obligatory. The test makes up the rest of the grade; you must score above 55 on it for the homework grade to count. Exam date: to be coordinated.

3 Bibliography
- Biological Sequence Analysis, R. Durbin et al., Cambridge University Press, 1998
- Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis, PWS Publishing Company, 1997
- Phylogenetics, C. Semple, M. Steel, Oxford University Press, 2003
Course URL: webcourse.cs.technion.ac.il/~cs236522

4 Course Prerequisites Computer Science and Probability Background: Data Structures 1 (cs234218), Algorithms 1 (cs234247), Probability (any course).

5 Relations to Some Other Courses
Bioinformatics Software (cs236523): the course Introduction to Bioinformatics covers practical aspects and hands-on experience with many web-based bioinformatics programs. Although it is not a formal requirement, it is recommended that you look at the web site and examine the relevant software.
Bioinformatics Algorithms (cs236522): this is the current course, which focuses on modeling some bioinformatics problems and presents algorithms for their solution.
Bioinformatics Project (cs236524): developing bioinformatics tools under close guidance.

6 Biological Background Due time: Tutorial class of (<3 weeks from today). First homework assignment: read the first chapter (pages 1-30) of Setubal et al. (copies are available in the Taub building library and in the central library), then answer the questions of the first assignment on the course site.

7 Computational Biology Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of study in the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics. Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly, restricting the field to molecular biology only.

8 Examples of Areas of Interest
- Building evolutionary trees from molecular (and other) data
- Efficiently constructing genomes of various organisms
- Understanding the structure of genomes (SNPs, SSRs, genes)
- Understanding the function of genes in the cell cycle and in disease
- Deciphering the structure and function of proteins
SNP: Single Nucleotide Polymorphism. SSR: Simple Sequence Repeat.

9 Exponential growth of biological information: growth of sequences, structures, and literature.

10 Four Aspects Biological –What is the task? Algorithmic –How to perform the task at hand efficiently? Learning –How to adapt/estimate/learn parameters and models describing the task from examples Statistics –How to differentiate true phenomena from artifacts

11 Example: Sequence Comparison Biological –Evolution preserves sequences, thus similar genes might have similar function Algorithmic –Consider all ways to “align” one sequence against another Learning –How do we define “similar” sequences? Use examples to define similarity Statistics –When we compare to ~10^6 sequences, which match is random and which is a true one?
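The algorithmic aspect of this slide can be illustrated with a minimal sketch. It assumes the simplest possible scoring scheme (unit cost for each substitution, insertion, or deletion), which is an assumption made for illustration and not necessarily the scoring the course uses; the function name `edit_distance` is hypothetical. Dynamic programming implicitly considers all alignments of the two sequences without enumerating them.

```python
def edit_distance(a: str, b: str) -> int:
    """Cost of the cheapest alignment of a against b under unit costs (assumed scoring)."""
    # prev[j] holds the best cost of aligning the processed prefix of a with b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with the empty prefix needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # align ca with cb (match or substitution)
                prev[j] + 1,               # align ca with a gap
                curr[j - 1] + 1,           # align cb with a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("GATTACA", "GACTATA"))
```

The table has (len(a)+1) x (len(b)+1) entries, so the running time is proportional to the product of the sequence lengths, even though the number of possible alignments grows exponentially.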

12 Course Goals Learning about computational tools for (primarily) molecular biology. Cover computational tasks that are posed by modern molecular biology Discuss the biological motivation and setup for these tasks Understand the kinds of solutions that exist and what principles justify them

13 Topics I Dealing with DNA/Protein sequences: Informal biological background (1 week). Finding similar sequences (~3 weeks). Models of sequences: Hidden Markov Models (~2 weeks). Parameter estimation: ML methods and the EM algorithm (~4 weeks).

14 Topics II Reconstructing evolutionary trees: Background: Darwin’s theory of evolution. Distance based methods (~2 weeks). Character based methods (~2 weeks). The presentations are similar to those given in the fall semester 04-05, and can be found on the site of that semester. Updated presentations will be uploaded to the course site before the lectures.

15 Topics III (if time allows) Protein World: How proteins fold (secondary & tertiary structure). How to predict protein folds from sequence data. How to analyze protein changes from raw experimental measurements (MassSpec).

16 Human Genome Most human cells contain 46 chromosomes: 22 pairs of chromosomes named autosomes, and 2 sex chromosomes (X, Y): XY in males, XX in females.

17 DNA Organization Source: Alberts et al

18 The Double Helix Source: Alberts et al

19 DNA Components Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T). Hydrogen bonds (electrostatic connection) pair the bases: A-T and C-G.
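The pairing rule on this slide can be sketched in a few lines. The names `COMPLEMENT`, `complement`, and `reverse_complement` are hypothetical. The reversal step relies on a fact not stated on the slide: the two strands of the double helix run in opposite directions, so the partner strand is conventionally written reversed.

```python
# Base pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of a DNA strand, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own reading direction (assumes antiparallel strands)."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGACGACT"))
    print(reverse_complement("ATGC"))
```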

20 Genome Sizes
E. coli (bacteria): 4.6 x 10^6 bases
Yeast (simple fungi): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases

21 Genetic Information Genome: the collection of genetic information. Chromosomes: storage units of genes. Gene: the basic unit of genetic information. Genes determine the inherited characters.

22 Genes The DNA strings include:
Coding regions (“genes”)
- E. coli has ~4,000 genes
- Yeast has ~6,000 genes
- C. elegans has ~13,000 genes
- Humans have ~32,000 genes
Control regions
- These are typically adjacent to the genes
- They determine when a gene should be “expressed”
“Junk” DNA (unknown function; ~90% of the DNA in human chromosomes)

23 The Cell All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.

24 Example: Tissues in Stomach How is this variety encoded and expressed ?

25 Central Dogma Gene → (transcription) → mRNA → (translation) → Protein. Cells express different subsets of the genes in different tissues and under different conditions.

26 Transcription Coding sequences can be transcribed to RNA. RNA is similar to DNA, with slightly different nucleotides: a different backbone, and Uracil (U) instead of Thymine (T). Source: Mathews & van Holde
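At the level of sequence strings, the difference described on this slide amounts to a one-character substitution. The sketch below is a simplification under a stated assumption: the input is the coding strand, so the RNA has the same sequence with U in place of T. The function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """RNA sequence for a DNA coding strand: identical bases, with U replacing T."""
    dna = coding_strand.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"not a DNA sequence, unexpected symbols: {sorted(unknown)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGACGACT"))
```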

27 Transcription: RNA Editing
1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists
Exons hold information; they are more stable during evolution. This process takes place in the nucleus. The mRNA molecules then diffuse through the nuclear membrane into the cytoplasm.
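The intron removal and exon joining steps can be sketched as string slicing. Everything in the example is hypothetical: the function name `splice`, the representation of exons as (start, end) index pairs, and the toy sequence, which is made up for illustration and is not a real gene.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (start inclusive, end exclusive), dropping everything between."""
    pieces = []
    last_end = 0
    for start, end in exons:
        # Exons must be given in order, non-empty and non-overlapping.
        if not (last_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"bad exon interval: {(start, end)}")
        pieces.append(pre_mrna[start:end])
        last_end = end
    return "".join(pieces)


if __name__ == "__main__":
    # Toy transcript: exon, intron, exon, intron, exon (made-up sequence).
    pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAUAAA" + "GUCCCAG" + "UUUUAA"
    print(splice(pre_mrna, [(0, 6), (18, 24), (31, 37)]))  # all three exons
    print(splice(pre_mrna, [(0, 6), (31, 37)]))            # middle exon skipped
```

Passing a different subset of exons to the same transcript models alternative splicing: one gene, several possible mRNA molecules.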

28 RNA Roles
Messenger RNA (mRNA): encodes protein sequences. Every three nucleotides translate to one amino acid (the protein building block).
Transfer RNA (tRNA): decodes the mRNA molecules to amino acids. It connects to the mRNA on one side and holds the appropriate amino acid on its other side.
Ribosomal RNA (rRNA): part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like enzymes) the reaction that attaches the amino acid hanging from the tRNA to the amino acid chain being created.

29 Translation Translation is mediated by the ribosome. The ribosome is a complex of protein & rRNA molecules. The ribosome attaches to the mRNA at a translation initiation site. The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (polypeptide), which is released and folds into a protein.

30 Genetic Code There are 20 amino acids from which proteins are built.
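The mapping from nucleotide triplets to the 20 amino acids can be sketched as a lookup table. The table below is the standard genetic code in one-letter amino acid notation, with `*` marking the three stop codons. The names `CODON_TABLE` and `translate` are hypothetical, and treating the first AUG as the translation initiation site is a simplifying assumption, not a claim about how the ribosome chooses where to start.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons from the first AUG (assumed start) until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCCGAUAAAUUUUAA"))
```

Since 64 codons map to 20 amino acids plus stop, several codons encode the same amino acid; the table makes that redundancy visible.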

31 Protein Structure Proteins are polypeptides: chains of amino acids. The structure of a protein is (mostly) determined by the sequence of amino acids that make it up.

32 Protein Structure

33 Evolution Related organisms have similar DNA –Similarity in sequences of proteins –Similarity in organization of genes along the chromosomes Evolution plays a major role in biology –Many mechanisms are shared across a wide range of organisms –During the course of evolution existing components are adapted for new functions

34 Evolution Evolution of new organisms is driven by: Diversity –Different individuals carry different variants of the same basic blueprint. Mutations –The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc. Selection bias.

35 The Tree of Life Source: Alberts et al

36 Example of a graph theoretic problem related to evolution trees: the perfect phylogeny problem

37 Characters in Species A (discrete) character is a property which distinguishes between species (e.g. dental structure, a certain gene). A character’s state is a value of the character (e.g. human dental structure). Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.

38 Species ≡ Vertices. Characters ≡ Colorings. States ≡ Colors. Each species is identified by its states. Evolutionary tree ≡ a tree with many colorings, containing the given vertices. [Figure: a tree on species A, B, C, D, colored by the states “teeth” and “no teeth”]

39 Another tree. Which tree is more reasonable? [Figure: a second tree on species A, B, C, D, colored by the states “teeth” and “no teeth”]

40 Evolutionary trees should avoid reversal transitions: a species regains a state its direct ancestor has lost. Famous (and rare) examples: –Teeth in birds. –Legs in snakes.

41 Evolutionary trees should avoid convergence transitions: two species possess the same state while their least common ancestor possesses a different state. Famous example: the marsupials.

43 Common Assumption: Characters with reversal or convergence transitions are highly unlikely in the evolutionary tree. A character that exhibits neither reversals nor convergence is called homoplasy free.

44 A character is homoplasy free if and only if the corresponding coloring is convex (each color induces a connected subtree).

45 A partial coloring is convex if it can be completed to a (total) convex coloring
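Convexity of a (partial or total) coloring can be tested with a short sketch. It relies on a characterization that is not stated on the slides: a partial coloring of a tree can be completed to a convex coloring exactly when the minimal subtrees connecting each color class are pairwise vertex-disjoint. The names `carrier` and `is_convex`, the adjacency-dictionary representation, and the example tree are all hypothetical.

```python
from collections import deque


def carrier(tree: dict, terminals: set) -> set:
    """Vertices of the minimal subtree connecting `terminals` (assumed non-empty)."""
    degree = {v: len(neighbors) for v, neighbors in tree.items()}
    removed = set()
    # Repeatedly prune leaves that are not terminals; what remains is the carrier.
    queue = deque(v for v in tree if degree[v] <= 1 and v not in terminals)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in tree[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] <= 1 and u not in terminals:
                    queue.append(u)
    return set(tree) - removed


def is_convex(tree: dict, coloring: dict) -> bool:
    """True if the carriers of the color classes are pairwise disjoint."""
    classes = {}
    for vertex, color in coloring.items():
        classes.setdefault(color, set()).add(vertex)
    used = set()
    for members in classes.values():
        subtree = carrier(tree, members)
        if subtree & used:
            return False
        used |= subtree
    return True


if __name__ == "__main__":
    # Hypothetical tree: leaves A, B hang off x; leaves C, D hang off y; x-y is an edge.
    tree = {"A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
            "x": ["A", "B", "y"], "y": ["x", "C", "D"]}
    print(is_convex(tree, {"A": "teeth", "B": "teeth", "C": "none", "D": "none"}))
    print(is_convex(tree, {"A": "teeth", "C": "teeth", "B": "none", "D": "none"}))
```

In the second coloring both color classes need the edge x-y to stay connected, so the character would require a reversal or a convergence on this tree.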

46 The Perfect Phylogeny Problem Input: a set of species and many characters, each assigning a state (color) to every species. Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?

47 The Perfect Phylogeny Problem (combinatorial setting)
Input: some colorings (C_1,…,C_k) of a set of vertices (in the example: 3 colorings, left, center and right, each using the same two colors).
Problem: is there a tree T which includes these vertices, such that (T, C_i) is convex for i = 1,…,k?
[Figure: three red/blue colorings of the vertex set]
Complexity: NP-hard in general; in P for some special cases.
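One special case that is easy to decide is the one in which every character has only two states and every species is colored by every character. The sketch below uses a classical pairwise criterion that is not stated on the slide, so treat it as an assumption about which special case is meant: a tree exists if and only if no two characters show all four state combinations across the species. The function names and the encoding of a character as a string with one letter per species are hypothetical.

```python
from itertools import combinations


def compatible(char_a: str, char_b: str) -> bool:
    """Two two-state characters are compatible if at most 3 of the 4 state pairs occur."""
    return len(set(zip(char_a, char_b))) < 4


def has_perfect_phylogeny_binary(characters: list[str]) -> bool:
    """Pairwise test for two-state characters; position i of each string is species i."""
    return all(compatible(a, b) for a, b in combinations(characters, 2))


if __name__ == "__main__":
    # Four species, colors R and B (made-up data).
    print(has_perfect_phylogeny_binary(["RRBB", "RBBB"]))
    print(has_perfect_phylogeny_binary(["RRBB", "RBRB"]))
```

The check compares every pair of characters over all species, so it runs in time polynomial in the input size, in contrast to the general multi-state problem.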