Introduction to Algorithms in Computational Biology, Lecture 1. This class has been edited from Nir Friedman’s lecture, which is available at www.cs.huji.ac.il/~nir.

Introduction to Algorithms in Computational Biology, Lecture 1
This class has been edited from Nir Friedman’s lecture, which is available at www.cs.huji.ac.il/~nir. Changes made by Dan Geiger.
Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.

2 Course Information
Meetings:
- Lecture, by Dan Geiger: Mondays 16:30-18:30, Taub 4.
- Tutorial, by Ydo Wexler: Tuesdays 10:30-11:30, Taub 2.
Grade:
- 20% from five question sets. These question sets are obligatory. Each contains 4-6 theoretical problems. Submit in pairs within two weeks.
- 80% test. The test grade must be above 55 for the homework grade to count.
Information and handouts:
- A brochure with photocopied material at the Taub library.

3 Course Prerequisites
Computer Science and Probability Background:
- Data Structures 1 (cs234218)
- Algorithms 1 (cs234247)
- Probability (any course)
Some Biology Background:
- Formally: none, to allow CS students to take this course.
- Recommended: Biology 1 (especially for those in the Bioinformatics track), or a similar Biology course, and/or a serious desire to complement your knowledge of Biology by reading the appropriate material (see the course web site).
Studying the algorithms in this course while acquiring enough biology background is far more rewarding than ignoring the biological context.

4 Relations to Some Other Courses
- Intro to Bioinformatics (cs236523). This course covers practical aspects and hands-on experience with web-based bioinformatics software. Although it is not a formal requirement, it is recommended that you look at its web site and examine the relevant software.
- Algorithms in Computational Biology (cs236522). This is the current course, which focuses on modeling some bioinformatics problems and presents algorithms for their solution.
- Bioinformatics project (cs ). Developing bioinformatics tools under close guidance.

5 First Homework Assignment
Read carefully the first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001.
Solve two of the questions for Chapter 2 and two of the questions for Chapter 3.
Due: during the third tutorial class, or earlier in the teaching assistant’s mail slot. Remember to submit in pairs.

6 Computational Biology
Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of study in the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics. Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly by restricting the field to molecular Biology only.

7 Examples of Areas of Interest
- Building evolutionary trees from molecular (and other) data
- Efficiently assembling genomes of various organisms
- Understanding the structure of genomes (SNP, SSR, genes)
- Understanding the function of genes in the cell cycle and disease
- Deciphering the structure and function of proteins

8 Exponential growth of biological information: growth of sequences, structures, and literature.

9 Four Aspects
- Biological: What is the task?
- Algorithmic: How to perform the task at hand efficiently?
- Learning: How to adapt/estimate/learn parameters and models describing the task from examples?
- Statistics: How to differentiate true phenomena from artifacts?

10 Example: Sequence Comparison
- Biological: Evolution preserves sequences, thus similar genes might have similar function.
- Algorithmic: Consider all ways to “align” one sequence against another.
- Learning: How do we define “similar” sequences? Use examples to define similarity.
- Statistics: When we compare to ~10^6 sequences, what is a random match and what is a true one?
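The algorithmic aspect, considering all ways to align one sequence against another, can be sketched with dynamic programming. The following is a minimal illustration in the style of global (Needleman-Wunsch) alignment; the function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example, not taken from the lecture.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b.

    score[i][j] holds the best score for aligning a[:i] with b[:j].
    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap   # a[i-1] aligned to a gap
            gap_in_a = score[i][j - 1] + gap   # b[j-1] aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)
    return score[-1][-1]
```

For instance, `global_alignment_score("ACGT", "AGT")` gives 2 under these parameters: three matches and one gap. The table has (len(a)+1) x (len(b)+1) cells, so all alignments are considered implicitly without enumerating them.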

11 Course Goals
- Learning about computational tools for (primarily) molecular biology.
- We will cover computational tasks that are posed by modern molecular biology.
- We will discuss the biological motivation and setup for these tasks.
- We will understand the kinds of solutions that exist and what principles justify them.

12 Topics I
Dealing with DNA/Protein sequences:
- Finding similar sequences
- Models of sequences: Hidden Markov Models
- Gene finding
- Genome projects and how sequences are found

13 Topics II
Models of genetic change:
- Long term: evolutionary changes among species
- Reconstructing evolutionary trees from sequences
- Short term: genetic variations in a population
- Finding genes by linkage and association

14 Topics III (One class, if time allows)
Protein World:
- How proteins fold: secondary & tertiary structure
- How to predict protein folds from sequence data
- How to analyze protein changes from raw experimental measurements (MassSpec)

15 Human Genome
Most human cells contain 46 chromosomes:
- 2 sex chromosomes (X, Y): XY in males, XX in females.
- 22 pairs of chromosomes, called autosomes.

16 DNA Organization Source: Alberts et al

17 The Double Helix Source: Alberts et al

18 DNA Components
Four nucleotide types:
- Adenine
- Guanine
- Cytosine
- Thymine
Hydrogen bonds (electrostatic connection):
- A-T
- C-G
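The A-T and C-G pairing rule means one strand determines the other. A small sketch of that rule follows; the names `COMPLEMENT`, `complement`, and `reverse_complement` are chosen for this illustration only.

```python
# Base-pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Return the complementary strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]
```

The reversal in `reverse_complement` reflects that the two strands of the double helix run in opposite directions, so the partner strand is conventionally written reversed.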

19 Genome Sizes
- E. coli (bacteria): 4.6 x 10^6 bases
- Yeast (simple fungi): 15 x 10^6 bases
- Smallest human chromosome: 50 x 10^6 bases
- Entire human genome: 3 x 10^9 bases

20 Genetic Information
- Gene: the basic unit of genetic information. Genes determine the inherited characters.
- Genome: the collection of genetic information.
- Chromosomes: storage units of genes.

21 Genes
The DNA strings include:
- Coding regions (“genes”)
  - E. coli has ~4,000 genes
  - Yeast has ~6,000 genes
  - C. elegans has ~13,000 genes
  - Humans have ~32,000 genes
- Control regions
  - These typically are adjacent to the genes
  - They determine when a gene should be expressed
- “Junk” DNA (unknown function)

22 The Cell All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.

23 Example: Tissues in Stomach
How is this variety encoded and expressed?

24 Central Dogma
Gene -> (transcription) -> mRNA -> (translation) -> Protein
Cells express different subsets of the genes in different tissues and under different conditions.

25 Transcription
- Coding sequences can be transcribed to RNA.
- RNA nucleotides:
  - Similar to DNA, with a slightly different backbone
  - Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
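At the level of symbols, the U-for-T substitution can be sketched as a string replacement. This assumes the input is the coding (sense) strand, whose sequence the mRNA mirrors; the function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand):
    """Return the RNA sequence for a DNA coding strand.

    Assumption: the input is the coding (sense) strand, so the RNA has
    the same sequence with Uracil (U) in place of Thymine (T).
    """
    return coding_strand.upper().replace("T", "U")
```

If the template (antisense) strand were given instead, it would first need to be reverse-complemented; that case is not handled here.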

26 Transcription: RNA Editing
1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists.
Exons hold information; they are more stable during evolution. This process takes place in the nucleus. The mRNA molecules then diffuse through the nuclear membrane into the cytoplasm.
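The intron-removal and exon-joining steps can be sketched as string slicing. The exon coordinates below are hypothetical inputs (0-based, half-open intervals); real exon boundaries come from biological signals that this sketch does not model.

```python
def splice(pre_mrna, exons):
    """Join the given exon intervals in order, dropping everything else.

    pre_mrna: the transcribed RNA string, introns included.
    exons: list of (start, end) pairs, 0-based and half-open; these
           coordinates are assumed to be known in advance.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical transcript: exon "AUG", intron "GUAAGU", exon "CCAUAA".
mature = splice("AUGGUAAGUCCAUAA", [(0, 3), (9, 15)])
```

Passing a different subset of exon intervals to `splice` yields a different mature mRNA from the same transcript, which is one simple way to picture alternative splicing.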

27 RNA roles
- Messenger RNA (mRNA)
  - Encodes protein sequences. Each group of three nucleotides translates to one amino acid (the protein building block).
- Transfer RNA (tRNA)
  - Decodes the mRNA molecules to amino acids. It connects to the mRNA on one side and holds the appropriate amino acid on its other side.
- Ribosomal RNA (rRNA)
  - Part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like enzymes) the reaction that attaches the hanging amino acid from the tRNA to the amino acid chain being created.
- ...

28 Translation (outside the nucleus)
- Translation is mediated by the ribosome.
- The ribosome is a complex of protein & rRNA molecules.
- The ribosome attaches to the mRNA at a translation initiation site.
- The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (a polypeptide), which is released and folds into a protein.

29 Genetic Code
There are 20 amino acids from which proteins are built.
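The mapping from three-nucleotide groups to amino acids can be sketched as a lookup table. The table below is the standard genetic code written compactly in one-letter amino acid codes, with "*" marking stop codons. The names `CODON_TABLE` and `translate` are chosen for this illustration, and reading simply starts at position 0 rather than searching for an initiation site.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon.

    Simplification: translation starts at index 0; a trailing partial
    codon is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)
```

With 4 bases there are 4^3 = 64 codons but only 20 amino acids plus stop, so several codons map to the same amino acid.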

30 Protein Structure
- Proteins are polypeptides of amino acids.
- A protein's structure is (mostly) determined by the sequence of amino acids that make up the protein.

31 Protein Structure

32 Evolution
- Related organisms have similar DNA
  - Similarity in sequences of proteins
  - Similarity in organization of genes along the chromosomes
- Evolution plays a major role in biology
  - Many mechanisms are shared across a wide range of organisms
  - During the course of evolution, existing components are adapted for new functions

33 Evolution
Evolution of new organisms is driven by:
- Diversity
  - Different individuals carry different variants of the same basic blueprint
- Mutations
  - The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
- Selection bias

34 The Tree of Life Source: Alberts et al

35 Example for Phylogenetic Analysis
Input: four nucleotide sequences, AAG, AAA, GGA, AGA, taken from four species.
Question: Which evolutionary tree best explains these sequences?
One answer (the parsimony principle): Pick a tree that has a minimum total number of substitutions of symbols between species and their originator in the evolutionary tree (also called a phylogenetic tree).
[Figure: example tree with node labels AGA, AAA, GGA, AAG, AAA; total #substitutions = 4]

36 Example Continued
There are many possible trees. For example:
[Figure: left tree with node labels AGA, GGA, AAA, AAG, AAA, AGA, AAA; total #substitutions = 3]
[Figure: right tree with node labels GGA, AAA, AGA, AAG, AAA; total #substitutions = 4]
The left tree is “better” than the right tree.
Questions:
- Does this principle yield realistic phylogenetic trees? (Evolution)
- How can we compute the best tree efficiently? (Computer Science)
- What is the probability of substitutions given the data? (Learning)
- Is the best tree found significantly better than others? (Statistics)
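The substitution counts in this example can be checked mechanically. The sketch below uses Fitch's algorithm, a standard method for counting the minimum number of substitutions on a fixed tree; the slides do not name this algorithm, so it is a swapped-in technique. Trees are written as nested pairs of leaf sequences, a representation assumed for this illustration, and the exact tree shapes drawn on the slides are not recoverable from the transcript.

```python
def fitch_site(tree, site):
    """Return (candidate_states, substitutions) for one sequence position.

    tree: either a leaf (a sequence string) or a pair (left, right).
    """
    if isinstance(tree, str):                  # leaf: state is observed
        return {tree[site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, site)
    right_states, right_cost = fitch_site(right, site)
    common = left_states & right_states
    if common:                                 # children can agree: no new substitution
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, length):
    """Minimum total substitutions over all positions for a fixed tree."""
    return sum(fitch_site(tree, site)[1] for site in range(length))

# The three ways to group the four input sequences into two pairs.
trees = [
    (("AAG", "AAA"), ("GGA", "AGA")),
    (("AAG", "GGA"), ("AAA", "AGA")),
    (("AAG", "AGA"), ("AAA", "GGA")),
]
scores = [parsimony_score(tree, 3) for tree in trees]
```

Here `scores` is `[3, 4, 4]`: grouping AAG with AAA and GGA with AGA needs 3 substitutions, and the other two groupings need 4, consistent with the totals on the slides. This only scores given trees; searching over all trees efficiently is the harder computational question raised above.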

37 Werner’s Syndrome A successful application of genetic linkage analysis

38 The Disease
- First references in 1960s
- Causes premature ageing
- Linkage studies from 1992
- WRN gene cloned in 1996
- Subsequent discovery of mechanisms involved in wild-type and mutant proteins

39 A Sample Input
The study used 13 markers; here we see only one. The study used 14 families; here we see only one.
[Figure: pedigree of one family. Individuals are labeled H or D, with genotypes at one marker with alleles A1 and A2 (e.g. H A1/A1, D A2/A2, H A1/A2, D A1/A2, H A2/A2). Annotations: "Phase inferred", "Recombinant".]

40 Genehunter Output
[Figure: Genehunter output with columns position, LOD_score, information (data skipped). Positions are distances between markers in centimorgans; marker names shown: D8S339, D8S131, D8S259.]
The LOD score is the log likelihood of placing the disease gene at a given distance, relative to it being unlinked. The maximum log likelihood score marks the most ‘likely’ position.
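The idea of a log likelihood relative to being unlinked can be illustrated with the simplest textbook case: a two-point LOD score for phase-known meioses. This is not Genehunter's computation, which combines many markers at once. The function name and the counts in the usage line are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known data (illustrative only).

    theta: assumed recombination fraction, 0 < theta <= 0.5.
    Compares the likelihood theta^r * (1 - theta)^(n - r) against the
    unlinked case theta = 0.5, on a log10 scale.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical counts: 1 recombinant in 10 meioses, scanned over a grid of theta.
best_theta = max((t / 100 for t in range(1, 51)),
                 key=lambda t: lod_score(1, 10, t))
```

Scanning theta and taking the maximum mirrors, in miniature, how a most likely position is read off from the maximum score; at theta = 0.5 the score is 0 by construction.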

41 Final Location
[Figure: map showing marker D8S131, marker D8S259, the location of marker D8S339, and the final location of the WRN gene.]
The error in location by genetic linkage is about 1.25M base pairs.