Computational Biology, Part 1: Introduction. Robert F. Murphy. Copyright © 1996, 2000, 2001. All rights reserved.


1 Computational Biology, Part 1: Introduction. Robert F. Murphy. Copyright © 1996, 2000, 2001. All rights reserved.

2 Course Introduction
- What these courses are about
- What I expect
- What you can expect

3 What these courses are about
- overview of ways in which computers are used to solve problems in biology
- supervised learning of illustrative or frequently-used programs
- (03-510) supervised learning of programming techniques and algorithms selected from these uses

4 I expect
- students will have basic knowledge of biology and chemistry (at the level of Modern Biology/Chemistry) and willingness to learn more
- students will have basic familiarity with use of computers (e.g., at the level of Computing Skills Workshop) and eagerness to gain new skills
- (03-510) students have some programming experience and willingness to work to improve
- heterogeneous class - I plan to include refreshers on each new topic
- students will ask questions in class and via email

5 You can expect
- Three major course sections
  - Sequence Analysis (13 classes)
  - Biological Modeling (11 classes)
  - Biological Imaging (4 classes)
- Class sessions: lectures/demonstrations/exercises/quizzes
- Homework assignments
  - 4 homework assignments for 03-311 (80% of grade)
  - 8 homework assignments for 03-310 (70% of grade)
  - 10 homework assignments for 03-510 (70% of grade)
- Test March 1 (20% for 03-311, 10% for others)
- Final (20% of grade for 03-310, 03-510)
- Communication on class matters via email list

6 Textbooks for first half of course
- For 03-310/311 students
  - “Required textbook” is Baxevanis & Ouellette
- For 03-510 students
  - “Recommended” textbook is Durbin et al.
- Additional suggested book
  - Computational Molecular Biology, Peter Clote & Rolf Backofen (ISBN 0-471-87252-0)
  - Chap. 1 is an excellent introduction to Molec. Biol. for non-Biology majors

7 Specific sources for CMU computational biology classes
- Web page (http://www.bio.cmu.edu/Courses/03310 or 03311 or 03510)
  - Lecture Notes (as PowerPoint files)
  - Homework Assignments (as Word files)
  - Additional materials as needed
- FTP server (www.bio.cmu.edu)
  - Files needed for homework assignments
- CompBiol project volume on AFS
  - /afs/andrew.cmu.edu/usr/murphy/CompBiol

8 Additional classes for 03-510
- We will have one additional class meeting per week for 03-510 for the first half of the semester only
- Purpose is to cover some more advanced material and programming assignments

9 Other relevant courses
- Second half mini-course “47-863: Topics in Operations Research: Computational Biology” will be taught by Dr. R. Ravi
  - Tuesday-Thursday 1:30-2:50 starting 3/13
  - Recommended for 03-510 students
- Fall 2001 course on advanced topics in computational molecular biology will be taught by Dr. Dannie Durand
  - Prerequisite: 03-310/311/510

10 Information flow
- A major task in computational molecular biology is to “decipher” information contained in biological sequences
- Since the nucleotide sequence of a genome contains all information necessary to produce a functional organism, we should in theory be able to duplicate this decoding using computers

11 Review of basic biochemistry
- Central Dogma: DNA makes RNA makes protein
- Sequence determines structure determines function

12 Structure
- macromolecular structure divided into
  - primary structure (1D sequence)
  - secondary structure (local 2D & 3D)
  - tertiary structure (global 3D)
- DNA composed of four nucleotides or "bases": A,C,G,T
- RNA composed of four also: A,C,G,U (T transcribed as U)
- proteins are composed of amino acids

13 DNA properties - base composition
- Some properties of long, naturally occurring DNA molecules can be predicted accurately given only the base composition, usually expressed as either
  - %GC (the percent of all base pairs that are G:C), or
  - $\chi_{GC}$ (the mole fraction of all bases that are either G or C)
  - $\%GC = 100\,\chi_{GC}$
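The two ways of expressing base composition can be computed directly from a sequence string. The following is a minimal sketch, not part of the original slides; the function names `gc_fraction` and `percent_gc` are hypothetical, and skipping ambiguity codes such as N is an assumption made here for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the mole fraction of bases that are G or C (0.0 to 1.0)."""
    # Assumption: only unambiguous bases are counted; codes such as N are skipped.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def percent_gc(seq: str) -> float:
    """Return %GC, which is 100 times the GC mole fraction."""
    return 100.0 * gc_fraction(seq)
```

For example, `percent_gc("ATGC")` gives 50.0 because two of the four bases are G or C.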

14 DNA properties - melting temperature and buoyant density
- Two such properties are
  - $T_m$, the melting temperature, defined as the temperature at which half of the DNA is single-stranded and half is double-stranded
    - $T_m\ (^\circ\mathrm{C}) = 69.3 + 41\,\chi_{GC}$ (for 0.15 M NaCl)
  - $\rho_0$, the buoyant density, defined as the density of a solution in which a DNA molecule will feel no net force when centrifuged (the density at the point in a density gradient at which the DNA stops moving, or “bands”)
    - $\rho_0\ (\mathrm{g\,cm^{-3}}) = 1.660 + 0.098\,\chi_{GC}$ (for CsCl)
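As a worked illustration of the two linear relations on this slide, the sketch below evaluates them for a given GC mole fraction. The function names are hypothetical; the coefficients are the ones quoted on the slide (0.15 M NaCl for the melting temperature, CsCl for the buoyant density), and the relations are only meant for long, naturally occurring DNA.

```python
def melting_temperature_c(gc_fraction: float) -> float:
    """Melting temperature in degrees C for 0.15 M NaCl: 69.3 + 41 * GC fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 69.3 + 41.0 * gc_fraction


def buoyant_density(gc_fraction: float) -> float:
    """Buoyant density in g/cm^3 in CsCl: 1.660 + 0.098 * GC fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc_fraction
```

For DNA that is 50% GC (a mole fraction of 0.5), these give a melting temperature of about 89.8 °C and a buoyant density of about 1.709 g cm^-3.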

15 DNA structure - restriction maps
- Restriction enzymes cut DNA at specific sequences.
- A restriction map is a graphical description of the order and lengths of fragments that would be produced by the digestion of a DNA molecule with one or more restriction enzymes
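The fragment lengths behind a restriction map can be sketched with a simple string search. This is an illustrative sketch only: the function names are hypothetical, only the given strand is searched (adequate for palindromic sites such as EcoRI's GAATTC, which is cut after the first G), and sites that span the origin of a circular molecule are not detected.

```python
def find_cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return 0-based cut positions for every occurrence of `site` in `seq`.

    `cut_offset` is the number of bases of the site that lie before the cut.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # allow overlapping occurrences
    return positions


def fragment_lengths(seq_length: int, cuts: list[int], circular: bool = False) -> list[int]:
    """Return fragment lengths, in map order, produced by cutting at `cuts`."""
    cuts = sorted(set(cuts))
    if not cuts:
        return [seq_length]  # uncut molecule
    if circular:
        # n cuts in a circle give n fragments; the last one wraps past the origin.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(seq_length - cuts[-1] + cuts[0])
        return lengths
    # n cuts in a linear molecule give up to n + 1 fragments.
    bounds = [0] + cuts + [seq_length]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

A possible use, with a made-up 20-base sequence: `fragment_lengths(20, find_cut_positions("AAGAATTCTTTTGAATTCCC", "GAATTC", 1))` yields three fragments for the linear molecule, while passing `circular=True` yields two.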

16 Restriction map of a circular plasmid with one enzyme
[Figure: restriction map of pGEM4 showing the AccII sites]

17 Restriction map of all enzymes that cut only once
[Figure: restriction map of pGEM4. Enzyme labels, in the order extracted: AcsI, ApoI, EcoRI, Ecl136II, EcoICRI, SacI, SstI, Acc65I, Asp718I, AvaI, BcoI, Cfr9I, Eco88I, KpnI, PspAI, XmaI, SmaI, BamHI, BstI, XbaI, SalI, AccI, HincII, HindII, PstI, Sse8387I, BspMI, BbuI, PaeI, SphI, HindIII, PvuII, SapI, AflIII, AlwNI, AhdI, AspEI, Eam1105I, EclHKI, BpmI, GsuI, BglI, AviII, FspI, BspCI, PvuI, XorII, Eco255I, ScaI, Asp700I, XmnI, SspI, AatII, EcoNI, BsmFI, DsaI, Aor51HI, Eco47III, SgrAI, NgoAIV, NgoMI, NaeI, NheI, Bsp1407I, BsrGI, SspBI]

18 Transcription
- transcription is accomplished by RNA polymerase
- RNA polymerase binds to promoters
- promoters have distinct regions "-35" and "-10"
- efficiency of transcription controlled by binding and progression rates
- transcription start and stop affected by tertiary structure
- regulatory sequences can be positive or negative

19 RNA processing
- eukaryotic genes are interrupted by introns
- these are "spliced" out to yield mRNA
- splicing done by spliceosome
- splicing sites are quite degenerate but not all are used

20 Translation
- conversion from RNA to protein is by codon: 3 bases = 1 amino acid
- translation done by ribosome
- translation efficiency controlled by mRNA copy number (turnover) and ribosome binding efficiency
- translation affected by mRNA tertiary structure
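The codon rule (3 bases = 1 amino acid) can be sketched as a table lookup. The example below is not from the slides; it assumes the standard genetic code, reads a single frame starting at the first base, and uses the hypothetical names `CODON_TABLE` and `translate`. Unknown codons are reported as X, which is a choice made here for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str, stop_at_stop: bool = True) -> str:
    """Translate an RNA (or DNA) string, reading codons from the first base."""
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "X")  # X for codons with unknown bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For instance, `translate("AUGGCCUAA")` returns `"MA"`: AUG codes for methionine, GCC for alanine, and UAA is a stop codon.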

21 Protein localization
- leader sequences can specify cellular location (e.g., insert across membranes)
- leader sequences usually removed by proteolytic cleavage

22 Posttranslational processing
- peptides fold after translation - may be assisted or unassisted
- processing enzymes recognize specific sites (amino acid sequences)
- protein signals can involve secondary and tertiary structure, not just primary structure

23 Goals of Sequence Analysis
- Assigned Reading: Baxevanis & Ouellette, Chapter 10

24 Goals of Sequence Analysis
- Management of sequence information
- Assembly of sequence fragments into complete units (proteins, genes, chromosomes)

25 Goals of Sequence Analysis
- Confirmation and prediction of restriction enzyme sites (for nucleic acids)
  - can aid sequence determination in areas of uncertainty by permitting testing of specific bases
  - can permit selection of appropriate enzymes for sequence checking
  - can permit selection of appropriate enzymes for subcloning or generation of probes

26 Goals of Sequence Analysis
- Finding open reading frames (ORFs) for cDNAs or genomic DNA from organisms without introns
- Finding protein coding regions in DNAs using codon usage tables
  - not all ORFs are made into proteins
  - redundancy in genetic code is not fully reflected in the tRNAs made by a particular organism (codon preference)
  - can use to identify "real" coding regions (pseudo-genes "drift" in their codon usage)
  - can use expressed sequence tags (ESTs)
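A basic ORF search for intron-free DNA can be sketched as a scan of the three forward reading frames for an ATG followed in frame by a stop codon. This is a simplified illustration, not the method of any particular program: the name `find_orfs` and the `min_codons` parameter are hypothetical, the reverse strand is not searched, and codon usage is not taken into account.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs for ORFs in the three forward frames.

    `start` is the 0-based index of the A in ATG; `end` is the index just
    past the stop codon, so dna[start:end] is the whole ORF.
    `min_codons` counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Raising `min_codons` is one simple way to discard very short ORFs, which occur frequently by chance and, as the slide notes, are not all made into proteins.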

27 Goals of Sequence Analysis
- Finding and using consensus sequences
  - Examples
    - promoters
    - transcription initiation sites
    - transcription termination sites
    - polyadenylation sites
    - ribosome binding sites
    - protein features
  - use sets of sequences identified (by other means) as related
  - use sets of sequences identified by sequence comparison
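Given a set of related sites that are already aligned and of equal length, a consensus can be sketched as the most frequent base in each column. The function name `consensus` and the choice to write N where the top two bases tie are assumptions made for this illustration; real tools often use weight matrices or IUPAC ambiguity codes instead.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Return the column-wise majority base for equal-length aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column).most_common()
        top_base, top_count = counts[0]
        # Assumption: a tie for the most frequent base is reported as N.
        tied = len(counts) > 1 and counts[1][1] == top_count
        result.append("N" if tied else top_base)
    return "".join(result)
```

With the made-up input `["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]`, every column has a clear majority and the result is `"TATAAT"`.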

28 Goals of Sequence Analysis
- Comparison and alignment of sequences
  - compare sequence to database - goal: find related sequences (SIMILARITY)
  - compare sequence to sequence - goal: find matching domains (ALIGNMENT)
  - compare database to database - goal: estimate genetic distance (EVOLUTION)
  - either: determine consensus sequences
  - comparisons can be pairwise or multiple-strand
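One standard way to compare a sequence to a sequence is global alignment by dynamic programming (the Needleman-Wunsch approach with a linear gap penalty). The slide does not name a method, so the sketch below is only one possible illustration; the function name and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions. It returns the best score only, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the best global alignment score of `a` against `b`."""
    # prev[j] holds the best score for aligning the current prefix of a
    # with the first j characters of b; only two rows are kept in memory.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

With these scores, aligning `"ACGT"` against `"AGT"` gives 1: three matches and one gap.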

29 Goals of Sequence Analysis
- Translation to protein sequence and prediction of protein properties - use measured propensities of particular amino acids or amino acid stretches
  - Predict molecular weight
  - Predict isoelectric point (pI)
  - Predict extinction coefficient
- Prediction of secondary and tertiary structure
  - RNA - use base pairing energies
  - protein - use propensities
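Of the listed properties, molecular weight is the simplest to predict: sum the average residue masses and add one water for the free termini. The sketch below is illustrative; the name `molecular_weight` is hypothetical, the residue masses are approximate average values in daltons, and modifications such as disulfide bonds or cleaved leader sequences are ignored.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Return the approximate average molecular weight of a peptide in daltons."""
    protein = protein.upper()
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    if not protein:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS
```

As a sanity check, `molecular_weight("G")` is close to 75.07, the molecular weight of free glycine.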

