341: Introduction to Bioinformatics Dr. Nataša Pržulj Department of Computing Imperial College London 1.

1 341: Introduction to Bioinformatics
Dr. Nataša Pržulj, Department of Computing, Imperial College London
natasha@imperial.ac.uk

4 Course overview: Motivation
Flood of available biological data:
• Sequences and microarrays (Dr. Rice)
• Protein 3D structure (Dr. Malod-Dognin)
• Networks, e.g., of protein interactions; expected to be as useful as sequence data in uncovering new biology (Dr. Pržulj)
The goal of systems biology:
• Systems-level understanding of biological systems, e.g., the cell
• Analyze not only individual components but also their interactions and the functioning of the system as a whole
• E.g., learn new biology from the topology (wiring patterns) of such interaction networks
However, biological data analysis research faces considerable challenges:
• Incomplete and noisy data
• Computational intractability of many (e.g., graph-theoretic) problems

5 Course overview
We will cover:
1. Sequence analysis (Dr. Peter Rice)
2. Microarray analysis (Dr. Peter Rice)
3. Graph theoretic aspects:
   - Fundamental topics in graph theory (e.g., basic graph notation, graph representation, and special graph types)
   - Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis)
   - Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness), which pose challenges for analyzing biological networks
4. Protein 3D structure (Dr. Malod-Dognin)
5. Biological network aspects:
   - Basic biological concepts (e.g., DNA, genes, proteins, gene expression, …)
   - Different types of biological networks
   - Experimental techniques for acquiring the data and their biases
   - Public databases and other sources of biological network data
6. Existing approaches for analyzing and modeling biological networks:
   - Structural properties of large networks
   - Network models
   - Network clustering
   - Network alignment
   - Integration of various heterogeneous networks
   - Software tools for network analysis
7. Applications (data analysis): interplay of topology and biology
   - Learn how the above methods have been applied
   - Discuss valuable insights that have been gained into biological function, evolution, complex diseases (e.g., cancer), and drug discovery

6 Course overview
Grading scheme:
• One coursework assignment
   - Given out on Feb 13 by email and posted on the class website
   - Due on Thursday, March 5, by 2pm
• Written exam
• The standard DoC Grading Scheme will be used, as described by the Degree Regulations at https://www.doc.ic.ac.uk/internal/teachingsupport/regulations/index.htm
• Other departments: we provide the coursework and exam marks, and each department decides on the weighting for the final grade

7 Course overview

8 Course organization:
1. Lectures: relevant theoretical concepts and examples
2. Tutorials: exercises on the concepts covered in class
3. One coursework assignment: opportunity to solve problems using the methods learned in class
4. Written exam: tests students’ understanding of the concepts learned in lectures
Tutorial helpers:
• Anida Sarajlic (a.sarajlic12@imperial.ac.uk)
• Dr. Noel Malod-Dognin (n.malod-dognin@imperial.ac.uk)
• Vladimir Gligorijevic (v.gligorijevic@imperial.ac.uk)
• Vuk Janjic (v.janjic11@imperial.ac.uk)

9 Course overview
Textbooks and readings
• Recommended textbooks:
   - Pevzner and Shamir, “Bioinformatics for Biologists,” Cambridge University Press, 2011
   - Junker and Schreiber, “Analysis of Biological Networks,” Wiley, 2008
   - West, “Introduction to Graph Theory,” 2nd edition, Prentice Hall, 2001, or T. Cormen et al., “Introduction to Algorithms,” 3rd edition, MIT Press, 2009
   - A list of up-to-date research papers selected by the instructor: see http://www.doc.ic.ac.uk/~natasha/course2012/class_material.html
• Recommended readings:
   - F. Kepes (Author, Editor), “Biological Networks (Complex Systems and Interdisciplinary Science),” World Scientific Publishing Company, 1st edition, 2007
   - Bornholdt and Schuster (Editors), “Handbook of Graphs and Networks: From the Genome to the Internet,” Wiley, 2003, or Dorogovtsev and Mendes (Authors), “Evolution of Networks: From Biological Nets to the Internet and WWW (Physics),” Oxford University Press, 2003
   - Chapter 17 from: Chen and Lonardi (Editors), “Biological Data Mining,” Chapman and Hall/CRC Press, 2009
   - Chapter 4 from: Jurisica and Wigle (Editors), “Knowledge Discovery in Proteomics,” CRC Press, 2005
   - Kurt Mehlhorn and Stefan Näher, “LEDA: A Platform for Combinatorial and Geometric Computing,” Cambridge University Press, 1999

10 Course overview
When and where:
• Thursdays 11-13h (LT 144) and Fridays 16-18h (LT 311)
• Huxley Building
Contact:
• E-mail: natasha@imperial.ac.uk
• Subject: “341 Bioinformatics”
Office hours:
• Fridays after class
• Office: 407 C Huxley

11 Course overview
Prerequisites: no formal ones, but
• General computational/mathematical maturity
• Basic programming skills are desirable
• An introduction to biological concepts will be provided
Course website (curriculum, class material, etc.):
• http://www.doc.ic.ac.uk/~natasha/course2012/index.html (also linked from CATE)
Academic code of honor

12 Topics
Introduction: biology (Dr. Pržulj, 1 lecture)
Sequence analysis (Dr. Rice, 2 lectures)
Microarray analysis (Dr. Rice, 3 lectures)
Introduction to graph theory (Dr. Pržulj, 2 lectures)
Protein 3D structure (Dr. Malod-Dognin, 2 lectures)
Network biology (Dr. Pržulj, 8 lectures):
• Network properties
   - Network/node centralities
   - Network motifs
• Network models
• Network/node clustering
• Network comparison/alignment
• Network data integration
• Software tools for network analysis
• Interplay between topology and biology

13 Course overview
Any questions so far?

14 Course overview
About you…

15 Introduction: biology

16 Introduction: biology
Cell: the building block of life
• Cytoplasm and organelles separated by membranes: mitochondria, nucleus, etc.

17 Introduction: biology
Distinguish between:
• Prokaryotes
   - Single-celled, with no cell nucleus or any other membrane-bound organelles
   - The genetic material in prokaryotes is not membrane-bound
   - The bacteria and the archaea
   - Model organism: E. coli
• Eukaryotes
   - Have “true” nuclei containing their DNA
   - May be unicellular, as in amoebae
   - May be multicellular, as in plants and animals
   - Model organism: S. cerevisiae (baker’s yeast)

18 Introduction: biology
Nucleus contains DNA
• Deoxyribonucleic acid
DNA nucleotides: A pairs with T, C pairs with G
DNA structure: double helix
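
The base-pairing rule above (A with T, C with G) means that one strand of the double helix determines the other. The following is a minimal Python sketch of that idea; the function name `reverse_complement` and the example sequence are illustrative assumptions, not part of the slides.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Return the partner strand, read in its own 5'->3' direction.

    The two strands of the helix run antiparallel, so the partner is
    the complement of the input read backwards.
    """
    return "".join(PAIR[base] for base in reversed(strand))

partner = reverse_complement("ATGC")
print(partner)
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.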

19 Introduction: biology
DNA is organized into chromosomes
RNA: similar to DNA, except that U replaces T (T → U) and RNA is single-stranded

20 Introduction: biology
Main role of DNA: long-term storage of genetic information
Genes: DNA segments that carry this information
• Intron: part of a gene that is not translated into protein; it is spliced out of the mRNA (messenger RNA, which conveys genetic information from DNA to the ribosome, where proteins are made)
• Exon: part of a gene whose mRNA is translated into protein; a protein consists only of exon-derived sequences
Genome: total set of all genes in an organism
• Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism
So how can we have different cells (neuron, liver, …)?

21 Introduction: biology
Codons: sets of three nucleotides
• 4 nucleotides → 4^3 = 64 possible codons
Each codon codes for an amino acid
• 64 codons produce 20 different amino acids
• More than one codon stands for one amino acid. Why?
Polypeptide:
• String of amino acids, composed from a 20-character alphabet
Proteins:
• Composed of one or more polypeptide chains (70-3000 amino acids)
• Sequence of amino acids is defined by a gene
• Gene expression: information transmission from DNA to proteins
Proteome: total set of proteins in an organism
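
The codon arithmetic above can be checked by enumeration. The sketch below is an illustration, not part of the slides: it builds the standard genetic code from its usual compact form (bases in the order U, C, A, G, one-letter amino acid codes, `*` for a stop codon) and counts how many codons map to each amino acid. The variable names are assumptions chosen for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, matching the order of AMINO_ACIDS
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
n_codons = len(CODON_TABLE)                  # 4**3 possible codons
n_stop = codons_per_residue["*"]             # codons that end translation
n_amino_acids = len(codons_per_residue) - 1  # distinct residues, excluding stop

print(n_codons, n_amino_acids, n_stop)
print(codons_per_residue.most_common(3))
```

With 64 codons and only 20 amino acids plus a stop signal, several codons must share an amino acid; in this table leucine (`L`) has six codons while methionine (`M`) has one.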

22 Introduction: biology
The 20 amino acids

23 Introduction: biology
Levels of protein structure:

24 Introduction: biology
Genes vs. proteins
• Genes: passive; proteins: active
Protein synthesis: from genes to proteins
• Transcription (in nucleus)
• Splicing (eukaryotes)
• Translation (in cytoplasm)

25 Introduction: biology
Transcription (in nucleus)
• The RNA polymerase enzyme builds an RNA strand from a gene (DNA is “unzipped”)
• The gene is transcribed to messenger RNA (mRNA)
• Transcription is regulated by proteins called transcription factors
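
As a rough illustration of transcription, the sketch below treats strands as plain strings. It assumes the template strand is written 5'→3' and ignores promoters, transcription factors, and everything else that regulates the process; the function names and the example sequence are hypothetical.

```python
# RNA base that pairs with each DNA base on the template strand
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_5to3):
    """Build the mRNA (5'->3') complementary to a template strand written 5'->3'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_5to3))

def transcribe_coding(coding_5to3):
    """Shortcut: the mRNA matches the coding strand with T replaced by U."""
    return coding_5to3.replace("T", "U")

coding = "ATGGCC"    # coding (sense) strand of a made-up gene fragment
template = "GGCCAT"  # its reverse complement, i.e. the template strand
mrna = transcribe_template(template)
print(mrna)
```

Both routes give the same mRNA, which is why sequence databases usually list only the coding strand.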

27 Introduction: biology
Splicing (eukaryotes): in the nucleus, during or after transcription
• Regions that do not code for proteins (introns) are removed from the pre-mRNA sequence
• Mature mRNA is produced
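
Splicing can be pictured as keeping the exon intervals of the pre-mRNA and discarding everything between them. The sketch below assumes the exon coordinates are already known and given as half-open `(start, end)` index pairs; the sequence, the coordinates, and the function name `splice` are made up for illustration.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, dropping the introns between them.

    exons: list of (start, end) pairs, 0-based, end exclusive, in order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

exon1, intron, exon2 = "AUGGCC", "GUAAGUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (15, 21)]  # hypothetical exon coordinates

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

In a real cell the splice sites are recognized from sequence signals rather than supplied as coordinates; this sketch only shows the bookkeeping.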

28 Introduction: biology
Translation (in cytoplasm)
• Ribosomes synthesize proteins from mRNA
• mRNA is decoded and used as a template to guide the synthesis of a chain of amino acids that forms a protein
• Translation: the process of converting the mRNA codon sequence into an amino acid polypeptide chain
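
Translation amounts to reading the mRNA three bases at a time and looking each codon up in the genetic code. The following sketch uses the standard genetic code in its compact form (bases ordered U, C, A, G; `*` for stop). It assumes the reading frame starts at the first base and simply stops at the first stop codon; the function name and example sequence are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA string into one-letter amino acid codes."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: the chain is released
            break
        protein.append(residue)
    return "".join(protein)

peptide = translate("AUGUUUGGAUAA")
print(peptide)
```

Here AUG, UUU, and GGA are read as methionine, phenylalanine, and glycine, and UAA ends the chain.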

29 Introduction: biology
Microarrays:
• Measure mRNA abundance for each gene
• The amount of transcribed mRNA correlates with gene expression, i.e., the rate at which a gene produces the corresponding protein
• It is hard to measure protein levels directly!

30 Introduction: biology
Every cell* contains the complete genome of an organism
How is the variety of different tissues encoded and expressed?

31 Introduction: biology
22,000?

32 Introduction: biology
-ome and -omics: studying all genes, proteins, etc. collectively
• Genome and genomics
• Proteome and proteomics
• …

33 Course overview: Motivation
What is a network (or graph)?
• A set of nodes (vertices) and edges (links)
• Edges describe a relationship between the nodes
[Figure: two example graphs on the nodes A, B, C, D]
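
The definition above maps directly onto a simple data structure. The sketch below stores an undirected graph on the nodes A, B, C, D as an adjacency list and computes each node's degree. The edge list is a made-up example, since the edges drawn on the slide are not recoverable from the transcript.

```python
from collections import defaultdict

# Hypothetical undirected edges between the nodes A, B, C, D
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)  # undirected: record the edge in both directions
        adjacency[v].add(u)
    return dict(adjacency)

adjacency = build_adjacency(edges)
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}

for node in sorted(adjacency):
    print(node, sorted(adjacency[node]), degrees[node])
```

An adjacency list uses memory proportional to the number of edges, which suits sparse networks; an adjacency matrix is the usual alternative for dense graphs.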

34 Course overview: Motivation
Networks model many real-world phenomena

35 Course overview
Facebook

36 Course overview
WWW

37 Course overview
Internet

38 Course overview
Airline routes

39 Course overview
Biological nets: protein-protein interaction (PPI) networks

40 Course overview
Biological nets: protein structure networks

41 Course overview
Biological nets: metabolic networks

42 Course overview
Biological nets: other network types

43 From functional genomics to systems biology 2010 (EMBO)

