Challenges for computer science as a part of Systems Biology
Benno Schwikowski
Institute for Systems Biology, Seattle, WA

Math and Computer Science Challenges (Benno Schwikowski)

Towards integrative models

Dimensions: Species, Conditions/time, Genes

Protein interaction
- Interaction partner
- Direct/indirect
- Affinity
- Effect

DNA
- Sequence
- Genomic locus
- Domain content
- Intron/exon structure
- Regulatory motifs
- Chemical modifications
- SNPs
- Splice variants
- Accessibility
- Variation

mRNA
- Abundance
- Regulatory information
- Initiation/termination signals

Protein
- Abundance
- State
- Localization
- 3D structure
- Functional characterization
- Half-life
- Active sites
- Biochemical function
- Cellular role

Challenge: Integrative models

- Across genes and proteins: many genes involved (e.g., multifactorial diseases)
- Across model systems: lack of experimental platforms in the target system
- Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)
- Across experiments: robustness against errors in mass spectrometry and mRNA measurements
- Across timescales

Challenge: Capturing evolutionary constraints

Levels of biological organization: DNA, RNA, Proteins, Modules, Organelles, Cells, Organs, Individuals, Populations, Ecologies

“Nothing in biology makes sense except in the light of evolution.” Theodosius Dobzhansky

Challenge: Which tools and experiments to use

Challenge: Choosing experiments

Machine Learning: Determine the most likely classification/parameterization on the basis of a randomly sampled dataset.

Active Learning: Allow an algorithm to query selected data points, using the results of previous queries.
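The contrast between learning from a random sample and active learning can be illustrated with a toy problem that is not from the slides: locating an unknown threshold on the interval [0, 1]. The following is a minimal sketch under that assumption. The `oracle` function stands in for an experiment, its threshold value is arbitrary, and all names are hypothetical.

```python
import random


def oracle(x, threshold=0.618):
    """Hypothetical experiment: reports whether x lies at or above an unknown threshold."""
    return x >= threshold


def passive_learn(label, n_queries, rng):
    """Learn from randomly sampled points; return the interval still consistent with the labels."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()          # data point chosen at random
        if label(x):
            hi = min(hi, x)       # threshold is at or below x
        else:
            lo = max(lo, x)       # threshold is above x
    return lo, hi


def active_learn(label, n_queries):
    """Choose each query from the results of previous queries (here: bisect the open interval)."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2         # most informative point given what is known so far
        if label(x):
            hi = x
        else:
            lo = x
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(0)
    print("passive:", passive_learn(oracle, 10, rng))
    print("active: ", active_learn(oracle, 10))
```

In this toy setting each active query halves the remaining uncertainty, so ten queries narrow the interval to a width of 2^-10, whereas the interval left by ten random samples depends on where they happen to fall.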

Challenge: Relations between system variables can be quite complex

Yuh, Bolouri, Davidson, Science, 1998

Challenge: Develop models that allow extremely efficient algorithms

AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...

CLUSTALW(1.74) multiple sequence alignment

Cotton     ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA AGGCTTTACCATT
Pea        GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA AGG--TTAGCACA
Tobacco    TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA ATGGCTTAGCACC
Ice-plant  TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip     ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA A GGAGC
Wheat      TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed   TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch      TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton     CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea        C---AAAACTTTTCAATCT TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT A
Tobacco    AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant  ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip     CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT A
Wheat      GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC
Duckweed   ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch      TTCTCGTATAAGGCCACCA TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton     ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea        GGCAGTGGCC---AACTAC CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco    GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant  GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip     CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat      CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed   TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch      CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton     T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea        TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco    CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant  TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch      TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip     TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat      GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed   CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG

Challenge: Developing models that allow extremely efficient algorithms

Parsimony score: 1

AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...

Strings shown: ACGG, ACGT

J. Comp Biol. 2002

An Exact Algorithm (generalizing Sankoff and Rousseau 1975)

$W_u[s]$ = best parsimony score for the subtree rooted at node $u$, if $u$ is labeled with string $s$.

Sequences:
AGTCGTACGTG
ACGGGACGTGC
ACGTGAGATAC
GAACGGAGTAC
TCGTGACGGTG

Table entries shown at the tree nodes:
… ACGG: 2, ACGT: 1 ...
… ACGG: 0, ACGT: 2 ...
… ACGG: 1, ACGT: 1 ...
… ACGG: +∞, ACGT: 0 ...
… ACGG: 1, ACGT: k entries
… ACGG: 0, ACGT: +∞ ...
… ACGG: ∞, ACGT: 0 ...

$W_u[s] = \sum_{v:\ \text{child of } u} \min_t \left( W_v[t] + d(s, t) \right)$

J. Comp Biol. 2002
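The recurrence for $W_u[s]$ can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation from the cited paper: $d(s, t)$ is taken to be the Hamming distance, a leaf table is assumed to hold 0 for every length-$k$ substring of the leaf sequence and $+\infty$ otherwise, and trees are written as nested tuples. The function names and the tree encoding are hypothetical.

```python
import math
from itertools import product


def hamming(s, t):
    """Number of positions at which two equal-length strings differ (assumed d(s, t))."""
    return sum(a != b for a, b in zip(s, t))


def kmers(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def table(node, k, alphabet="ACGT"):
    """Return W_u as a dict over all strings of length k.

    A node is either a str (leaf sequence) or a tuple of child nodes.
    """
    strings = ["".join(p) for p in product(alphabet, repeat=k)]
    if isinstance(node, str):
        present = kmers(node, k)
        # Assumption: a leaf may be labeled at no cost with any k-mer it contains.
        return {s: (0 if s in present else math.inf) for s in strings}
    child_tables = [table(child, k, alphabet) for child in node]
    # W_u[s] = sum over children v of min over t of (W_v[t] + d(s, t))
    return {
        s: sum(min(W_v[t] + hamming(s, t) for t in strings) for W_v in child_tables)
        for s in strings
    }


def best_parsimony(tree, k):
    """Best score at the root and the root labels that achieve it."""
    W = table(tree, k)
    score = min(W.values())
    return score, sorted(s for s, w in W.items() if w == score)


if __name__ == "__main__":
    # Hypothetical three-leaf tree; the topology is an assumption for illustration.
    print(best_parsimony((("ACGT", "ACGG"), "ACGT"), 4))
```

This naive version evaluates every pair $(s, t)$ for every tree edge, which is practical only for small $k$.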

What are good challenges to tackle?

- Biological/medical questions asked
- Experimental technologies to acquire a lot of relevant data
- Available datasets with a formalized notion of “data quality”

Memory complexity: $O(k \cdot 4^{2k})$ per node

Time complexity: total time $O(n\,k\,(4^{2k} + l))$

where $n$ is the number of species, $l$ the average sequence length, and $k$ the motif length.

J. Comp Biol. 2002

Technology-based challenges: Universal DNA Tag Systems

Existing applications in high-throughput technologies:
- Universal DNA arrays
- Padlock probes
- LYNX mRNA technology

Formalization

Define: weight(A/T) = 1, weight(C/G) = 2

weight(AACTTG) = 1 + 1 + 2 + 1 + 1 + 2 = 8, so that melting temperature(AACTTG) = 2·weight(AACTTG)

l-u code problem: Given two integers, l < u, find the largest set of tags such that
- Each tag has weight ≥ u
- Each string of weight ≥ l occurs at most once

J. Comp Biol & 2003
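The weight definition and the two conditions of the l-u code problem can be checked mechanically. The sketch below is only a verifier for a given tag set, not a method for finding the largest set, and it rests on one interpretive assumption: "occurs at most once" is read as "appears as a substring at most once across all tags". The function names are hypothetical.

```python
from collections import Counter

# Weights as defined on the slide: A/T count 1, C/G count 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    """Sum of the per-nucleotide weights of s."""
    return sum(WEIGHT[ch] for ch in s)


def is_lu_code(tags, l, u):
    """Check both l-u conditions for a list of tags (interpretation assumed, see lead-in)."""
    # Condition 1: every tag has weight at least u.
    if any(weight(tag) < u for tag in tags):
        return False
    # Condition 2: count every occurrence of every substring of weight at least l.
    counts = Counter()
    for tag in tags:
        for i in range(len(tag)):
            for j in range(i + 1, len(tag) + 1):
                sub = tag[i:j]
                if weight(sub) >= l:
                    counts[sub] += 1
    return all(c <= 1 for c in counts.values())


if __name__ == "__main__":
    print(weight("AACTTG"))                          # 8, as on the slide
    print(is_lu_code(["AACTTG"], l=4, u=8))          # single tag, no repeated substring
    print(is_lu_code(["AACTTG", "GGAACT"], l=4, u=8))  # "AAC" (weight 4) appears in both tags
```

Enumerating all substrings is quadratic in the tag length, which is adequate for short tags; a faster check would only need the shortest substrings that reach weight l.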

Challenge: Visualization

Andrea Weston et ISB & Cytoscape

Challenge: Visualization

Cytoscape, pre-release 2.0

A computer scientist’s perspective

“Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”
Donald Knuth, 7 Dec 1993