Exploration Session Week 8: Computational Biology
Melissa Winstanley (based on slides by Martin Tompa, Luca Cardelli)

Exploring DNA Sequences

Overview of DNA
- Instructions for cellular function
  - Building proteins
- Composed of nucleotides
  - Adenine, thymine, cytosine, guanine
  - A pairs with T, C pairs with G
- Double-stranded: forms a double helix
  - Strands have an orientation
  - Pairing of antiparallel strands
- Huge amount of DNA
  - 3 billion base pairs, 2 m long in a cell
  - 133 AU long in a human
  - 20 million light years long in the human population
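The pairing rule and the antiparallel orientation can be made concrete with a small sketch. The function name `reverse_complement` and the example strand are illustrative choices, not from the slides; the sketch assumes each strand is written in its own reading direction, so the partner strand comes out reversed.

```python
# Pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own reading direction.

    Because the two strands are antiparallel, the partner is read in the
    opposite direction, so we walk the input backwards while complementing.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))

print(reverse_complement("ACTCGGC"))  # GCCGAGT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.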

Overview of Proteins
- Workhorses of cells
- Composed of a sequence of amino acids
  - 20 to 5000 amino acids in a protein
  - 20 possible amino acids
- Proteins fold into complex 3D shapes
  - Fold-It
- Information to make proteins is encoded in DNA
  - Codon: 3 base pairs
    - Ex. CTA → leucine
  - Gene: sequence of DNA for 1 protein
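As a rough illustration of how codons map to amino acids, the sketch below reads a DNA string three letters at a time. The table is deliberately tiny: it holds only five entries of the standard genetic code (including the slide's CTA → leucine), and the names `CODON_TABLE` and `translate` are hypothetical helpers for this example.

```python
# A small excerpt of the standard genetic code (DNA coding-strand codons).
# A real table has 64 entries; only these five are included here.
CODON_TABLE = {
    "ATG": "Met",
    "CTA": "Leu",   # the example from the slide
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "STOP",
}

def translate(dna: str) -> list[str]:
    """Translate DNA codon by codon until a stop codon or the end of input."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "???")  # "???" = not in our excerpt
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

print(translate("ATGCTAGGCTAA"))  # ['Met', 'Leu', 'Gly']
```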

Overall Goals
- Overall
  - Identify key molecules in organisms
  - Identify interactions among molecules
- Computational focus: sequence analysis
  - Identify genes
  - Determine gene function (what protein is produced?)
  - Identify proteins involved in gene expression
  - Identify key functional regions
- Why do we care?
  - Determining function of a new sequence
  - Genetic diseases
  - Evolution

String Alignment
- How to judge how well two strings are aligned?

    acbcdb    a c - - b c d b
    cadbd     - c a d b - d -

- Each dash represents an inserted space
- Assign +2 to every exact match, -1 to every mismatch (a letter opposite a space counts as a mismatch)
  - 3 * 2 + 5 * (-1) = 1
- Higher score indicates a greater match between the strings
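The scoring rule can be checked with a short sketch. It assumes the two aligned strings already have their dashes inserted and are the same length; `alignment_score` is a name made up for this example.

```python
def alignment_score(top: str, bottom: str, match: int = 2, mismatch: int = -1) -> int:
    """Score a fixed alignment column by column.

    A column scores `match` when both letters are equal, and `mismatch`
    otherwise (including a letter opposite an inserted space, "-").
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == b and a != "-":
            score += match
        else:
            score += mismatch
    return score

# The alignment of "acbcdb" and "cadbd" from the slide:
print(alignment_score("ac--bcdb", "-cadb-d-"))  # 1
```

This only scores a given alignment; finding the best-scoring alignment is a separate search problem.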

BLAST Algorithm
- “Basic Local Alignment Search Tool”
- For comparing biological sequence information
  - Amino acid sequences (proteins) or nucleotide sequences (DNA)
- Inputs
  - A query sequence Q
  - A database D of sequences
- Output
  - Sequences from D that match Q above a certain threshold
- Usefulness
  - Unknown gene in a mouse, so query the human gene database to see if a similar gene exists in humans

BLAST ctd
- Make k-letter subsequences from Q
  - Ex. k = 3: “acbcdb” → “acb”, “cbc”, “bcd”, “cdb”
- Usually k = 28 for DNA, k = 3 for proteins
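Splitting the query into k-letter subsequences is a sliding window over the string. A minimal sketch, with `kmers` as an illustrative name:

```python
def kmers(query: str, k: int) -> list[str]:
    """Return every contiguous k-letter subsequence of the query, left to right."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]

print(kmers("acbcdb", 3))  # ['acb', 'cbc', 'bcd', 'cdb']
```

A query of length n yields n - k + 1 subsequences, so a query shorter than k yields none.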

BLAST ctd
- For each subsequence w, find matching subsequences
- Only consider a matching subsequence if its alignment score is at least some threshold T
  - Alignment(seq) >= T
  - Ex. T = 2, w = “TCG”
    - seq = “TCA” → Alignment = 2 * 2 + 1 * (-1) = 3 (2 matches, 1 mismatch): considered
    - seq = “ACT” → Alignment = 2 * 1 + 2 * (-1) = 0 (1 match, 2 mismatches): not considered
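The threshold test can be sketched by brute force: score every possible k-letter sequence against w and keep those that reach T. This assumes the simple +2/-1 scoring from the earlier slide and a four-letter DNA alphabet; real BLAST implementations use substitution matrices and faster enumeration, and the names `word_score` and `neighborhood` are made up for this example.

```python
from itertools import product

def word_score(w: str, seq: str, match: int = 2, mismatch: int = -1) -> int:
    """Ungapped score of two equal-length words, position by position."""
    return sum(match if a == b else mismatch for a, b in zip(w, seq))

def neighborhood(w: str, threshold: int, alphabet: str = "ACGT") -> list[str]:
    """All words of len(w) over the alphabet whose score against w is >= threshold."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(w)))
    return [seq for seq in candidates if word_score(w, seq) >= threshold]

print(word_score("TCG", "TCA"))       # 3  -> considered when T = 2
print(word_score("TCG", "ACT"))       # 0  -> not considered
print(len(neighborhood("TCG", 2)))    # 10 (the word itself plus 9 one-letter variants)
```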

BLAST ctd
- Scan the database for exact matches with the high scoring subsequences
- Take each exact match and extend in either direction (no gaps)
  - Until the score decreases below a “dropoff”
  - Forms a “high-scoring segment pair” (HSP)
- Only save match extensions above a certain score threshold S

    Query seq:   A  C  T  C  G  G  C
    Database:    G  C  T  C  A  G  T
    Score:      -1  2  2  2 -1  2 -1

  Exact match: CTC. HSP: CTCGG against CTCAG, score = 8 - 1 = 7 (+2 per match, -1 per mismatch)
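The ungapped extension step can be sketched as follows. The slide does not spell out the stopping rule, so this sketch assumes one common reading: keep extending while the running score stays within `dropoff` of the best score seen so far, then report the best-scoring segment. The function name, parameters, and the dropoff value of 2 are assumptions for illustration.

```python
def extend_seed(query, db, q_start, d_start, k, match=2, mismatch=-1, dropoff=2):
    """Extend an exact k-letter match in both directions without gaps.

    Returns (best score, query segment, database segment) for the
    highest-scoring segment pair found around the seed.
    """
    def column(offset):
        return match if query[q_start + offset] == db[d_start + offset] else mismatch

    best = sum(column(n) for n in range(k))
    left, right = 0, k  # HSP spans offsets [left, right) relative to the seed start

    # Extend to the right of the seed.
    current, n = best, k
    while q_start + n < len(query) and d_start + n < len(db):
        current += column(n)
        if current > best:
            best, right = current, n + 1
        elif best - current > dropoff:
            break  # score fell too far below the best seen so far
        n += 1

    # Extend to the left of the seed.
    current, n = best, -1
    while q_start + n >= 0 and d_start + n >= 0:
        current += column(n)
        if current > best:
            best, left = current, n
        elif best - current > dropoff:
            break
        n -= 1

    return (best,
            query[q_start + left:q_start + right],
            db[d_start + left:d_start + right])

# Seed "CTC" starts at index 1 in both sequences from the slide.
print(extend_seed("ACTCGGC", "GCTCAGT", 1, 1, 3))  # (7, 'CTCGG', 'CTCAG')
```

An HSP would then be kept only if its score exceeds the threshold S mentioned on the slide.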

BLAST ctd
- For each HSP, do a gapped extension (spaces possible)
- Output each extension whose probability of occurring at random is below a pre-set threshold x

More Complicated Analysis
- Multiple sequence alignment
- Different ways to score subsequences
- Considering context around a sequence
- Predicting 3D structures of proteins

Programming Molecules

Getting Smaller
- First transistor
- 25 nm NAND flash
- Single molecule transistor
- Molecules on a chip
  - ~10 Moore’s Law cycles left

Building Smaller
- How to build things smaller than your tools?
  - You can’t
- Solution: self-assembly
  - Molecular IKEA
  - Dear IKEA, please send me a chest of drawers that assembles itself.
- At a molecular scale, many such materials exist
  - Proteins, DNA/RNA, membranes

Machines in Biochemistry
- Gene Machine
  - Made of: nucleotides
  - Data structure: strings
  - Operations: activation, inhibition
- Protein Machine
  - Made of: amino acids
  - Data structure: records
  - Operations: on/off switches, binding sites
- Membrane Machine
  - Made of: phospholipids
  - Data structure: multisets
  - Operations: fusion, fission

How do we form a “language”?
- Chemical reactions
  - A + C → B + D (rate r)
  - Instructions in a “program”
- Problem: combinatorial explosion
  - SO MANY chemical reactions in a cell
- Model reactions as automata: machines that perform a task
- Problem: chemistry is not an executable language
  - Dear Chemist, please execute this arbitrary reaction.
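One way to picture reactions as instructions in a program is to treat the contents of the cell as a multiset of molecules and each reaction as a rewrite rule on it. The sketch below is an illustrative toy, not the formalism from the slides: it ignores the rate r, fires reactions in a fixed order, and the names `fire` and `run` are made up for this example.

```python
from collections import Counter

def fire(state: Counter, reactants: Counter, products: Counter):
    """Apply one reaction if all reactants are present; otherwise return None."""
    if any(state[species] < n for species, n in reactants.items()):
        return None
    return state - reactants + products

def run(state: Counter, reactions, max_steps: int = 1000) -> Counter:
    """Keep firing the first applicable reaction until none applies."""
    for _ in range(max_steps):
        for reactants, products in reactions:
            next_state = fire(state, reactants, products)
            if next_state is not None:
                state = next_state
                break
        else:
            return state  # no reaction could fire
    return state

# The reaction A + C -> B + D, started with two A molecules and one C.
reactions = [(Counter("AC"), Counter("BD"))]
print(run(Counter({"A": 2, "C": 1}), reactions))
```

With two A and one C, the reaction fires once and leaves one A, one B, and one D. Writing one rule per reaction is exactly what becomes unmanageable when a cell has very many reactions.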

Controlling Systems on a Nanoscale
- Computing
- Sensing
- Constructing
- Actuating

DNA Tweezers

One Approach to Autonomous Computing
- Goal: precisely control organization and dynamics of matter and information at the molecular level
- Uses DNA, but use is accidental
  - No genes involved
- [Figure: “gates” and “transducers”, drawn as DNA strands with segments labeled t, x, a, y]

Molecular programming workflow
- First figure out what gates you want to use and signals you want to send
- Signals + gates → structures of DNA
- Structures → sequences of DNA (NUPACK)
- Sequences → DNA synthesis (IDT)
- DNA synthesis → mail
- Receipt of DNA → water → execution
- Fluorescence is your “print” statement