Introduction to Bioinformatics Biostatistics & Medical Informatics 576 Computer Sciences 576 Fall 2015 Colin Dewey



Goals for today
– Administrivia
– Course Topics
– Short survey of interests/background

Course Web Site
– syllabus
– readings
– tentative schedule
– lecture slides in PDF/PPT
– homework
– link to Piazza discussion board
– etc.

Your Instructor: Colin Dewey
– website:
– office: 2128 Genetics-Biotechnology Center
– Associate professor in the Department of Biostatistics & Medical Informatics, with an affiliate appointment in Computer Sciences
– research interests: probabilistic modeling, biological sequence evolution, analysis of “next-generation” sequencing data (RNA-Seq in particular), whole-genome alignment

Finding My Office: 2128 Genetics-Biotechnology Center
– slightly confusing building(s)
– best bet: use Henry Mall main entrance
[Map labels: Engineering Hall, Genetics-Biotechnology Center, Computer Sciences, my office]

Course TAs
– Manish Bansal
  – Office: 1309 Computer Sciences
– Zhen Niu
  – Office: TBA

Office Hours
– To be announced
– Will begin next week
– Doodle poll to determine a good office hour schedule for the TAs and me
  – Please fill out the poll to increase the likelihood that our office hours will work for you!
  – With a class of this size, we have limited ability to accommodate appointments outside of office hours
– You are encouraged to visit our office hours!

Expected Background
– CS 367 (Intro to Data Structures) or equivalent
  – Arrays
  – Hash tables
  – Trees
  – Graphs
– Statistics: good if you’ve had at least one course, but not required
  – Continuous/discrete probability distributions
  – Conditional and joint distributions
– Molecular biology: no knowledge assumed, but an interest in learning some basic molecular biology is mandatory

Course grading
– 7 or so homework assignments: ~60%
  – Programming problems
  – Written exercises
– midterm exam: ~20%
– final exam: ~20%

Homework assignments
– For programming exercises, you should use one of:
  – C
  – C++
  – Java
  – Perl (discouraged, TAs cannot read Perl)
  – Python
  – R (somewhat discouraged, not general-purpose)
  – Matlab (somewhat discouraged, not general-purpose)
– These are the most commonly used languages in bioinformatics
– Use a language not on this list at your own risk
– Written exercises must be typed up (e.g., LaTeX, Word)
– Homework will be submitted electronically

Computing Resources for the class
– UNIX workstations in Dept. of Biostatistics & Medical Informatics
  – accounts will be created soon
  – two machines: mi1.biostat.wisc.edu, mi2.biostat.wisc.edu
– UNIX tutorial:

Exams
– Midterm: October 27th, in class
– Final: December 23rd, 12:25-2:25pm

Participation
– Attending lectures is not optional
– A significant amount of material is not in the slides (e.g., board work)
– Questions are welcome during class

Piazza Discussion Forum
– Instead of a mailing list
– Please consider posting your questions to Piazza first, before emailing the instructor or TAs
– Consider answering your classmates’ questions!
– Quick announcements will also be posted to Piazza
– Email the instructor or TAs with questions inappropriate for Piazza
– Expect a response within 24 hours

Course readings
– Readings assigned for each lecture
  – please read these ahead of time
– Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Cambridge University Press.
– Articles from the primary literature (scientific journals, etc.)

Reading assignment for Sep 8th
– Life and Its Molecules: A Brief Introduction, by Lawrence Hunter
  – ter04.pdf

Goals for today
– Administrivia
– Course Overview
– Short survey of interests/background

Learning goals of this class
– Gain an overview of different problem areas in bioinformatics
– Understand significant and interesting algorithms
– Be able to apply the computational concepts to related problems in biology and other areas
– Be able to understand scientific articles about more cutting-edge approaches
– Build a foundation for independent learning and deeper study of related topics

What is Bioinformatics?
– The term Bioinformatics was coined in the 1970s
– Very close cousin: Computational Biology
– An interdisciplinary field rooted in computer and information sciences and life sciences
– Draws from other areas such as
  – Math, statistics, machine learning, physics, genetics, evolutionary biology, biochemistry
– Definitions from the National Institutes of Health
  – Bioinformatics: Research, development, or application of computational tools and approaches to make the vast, diverse and complex life sciences data more understandable and useful.
  – Computational biology: The development and application of mathematical and computational approaches to address theoretical and experimental questions in biology.

Why Bioinformatics?
– Biology is a data-driven field
  – By far the richest types and sources of data
  – Biological systems are complex and noisy
– Need informatics tools to
  – Store, manage, mine, visualize biological data
  – Model biological complexity
  – Generate testable hypotheses
– Many biological questions translate naturally into computational problems
  – Pattern extraction
  – Search
  – Inferring function of biochemical entities
  – Finding relationships among entities

Bioinformatics then and now (Kanehisa and Bork, 2003)
– 1990s: Mostly data storage, search and retrieval of sequence data, and databases to store biological knowledge
– Now: abstract knowledge and principles from large-scale data, to present a complete representation of cells and organisms, and to make computational predictions of systems of higher complexity such as cellular interaction networks and global phenotypes

A few important dates

Year | Biological landmarks | Computational advances
1953 | DNA’s double helix structure |
1967 | Availability of protein sequences | First database of protein sequences by Margaret Dayhoff
 | | Global and local alignment algorithms
1987 | | Swissprot: First indexed database
1990 | | BLAST, a fast program to search large databases for query sequences
 | Several whole genomes sequenced | HMMs for sequence analysis
1997 | First DNA microarrays | Clustering to expression data
2000 | Large collections of expression data | Probabilistic graphical models to analyze networks
2003 | Human genome sequence published |
2005- | Growth of next-generation sequencing methods | Advanced statistical and machine learning methods for next-gen sequencing data

Overview of bioinformatics topics
– Sequence assembly
– Sequence alignment
– Phylogenetic trees
– Genome annotation
– Analysis of “omic” datasets
– Modeling and analysis of biological networks

Computer Science Topics
– Algorithms
– Graphs
– Exact
– Greedy
– Dynamic Programming
– Branch and bound
– Heuristics
– Computational Complexity

Statistics Topics
– Probability for discrete random variables
– Markov Chains
– Hidden Markov Models
– Maximum Likelihood
– Expectation-Maximization
– Bayesian networks

Sequence Assembly How do we determine the genome sequence of an organism?

Topics in sequence assembly
– Sequencing technologies
– Fragment assembly problem
– Spectral assembly problem
– Graph algorithms
– Assembly in practice
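As a rough illustration of how the spectral assembly problem connects to graph algorithms, the sketch below builds the k-mer spectrum of a set of reads and a de Bruijn graph from that spectrum. The function names (kmer_spectrum, de_bruijn_graph), the toy reads, and the choice k = 3 are assumptions made for this example and are not taken from the course materials.

```python
from collections import defaultdict


def kmer_spectrum(reads, k):
    """Return the set of all length-k substrings (k-mers) found in the reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn_graph(kmers):
    """Build a de Bruijn graph: each k-mer is an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    # Sort successors so the result is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}


# Hypothetical toy reads that overlap on "GTC".
reads = ["ACGTC", "GTCAA"]
graph = de_bruijn_graph(kmer_spectrum(reads, 3))
for node in sorted(graph):
    print(node, "->", graph[node])
```

In this formulation, reconstructing the sequence amounts to finding a path that uses every edge of the graph (an Eulerian path); the sketch stops at graph construction.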

Sequence comparison: How similar are the sequences?
[Figure: Human ADNP gene compared with Mouse ADNP gene]

Topics in sequence alignment
– Pairwise alignment
  – Global alignment
  – Local alignment
– Multiple sequence alignment
– Scores and substitution matrices
– Practical algorithms for sequence alignment
  – BLAST
  – Progressive multiple alignment
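To make "global alignment" concrete, here is a minimal sketch of the standard dynamic programming recurrence (Needleman-Wunsch) that computes the optimal global alignment score of two sequences. The scoring values (match = 1, mismatch = -1, linear gap penalty = -2) and the function name are assumptions chosen for illustration; the course may use different scores or substitution matrices.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of x and y under a linear gap penalty."""
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # x[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # y[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align x[i-1] with y[j-1]
                score[i - 1][j] + gap,               # gap in y
                score[i][j - 1] + gap,               # gap in x
            )
    return score[n][m]


print(global_alignment_score("ACGT", "ACGT"))  # four matches
print(global_alignment_score("ACGT", "AGT"))   # three matches and one gap
```

Local alignment uses the same table with two changes: cell values are floored at zero, and the answer is the maximum over the whole table rather than the bottom-right cell.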

How are these organisms related?
[Figure from Toh et al, Nature, 2011]

Topics in phylogenetic trees
– Reconstructing phylogenetic trees
  – distance-based approaches
  – probabilistic methods
  – parsimony methods
– Inferring ancestral sequences
– Felsenstein’s algorithm
– Neighbor Joining
– UPGMA

Where are the genes in this genome?
[Slide shows a long stretch of raw genomic DNA sequence, beginning CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACATCCTAACACTACCCTAACACAGCCC...]

Protein coding sequence
[Slide repeats the genomic DNA sequence from the previous slide under the label above]
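A naive first pass at the question "Where are the genes in this genome?" is to scan for open reading frames (ORFs): stretches that begin with the start codon ATG and run, in frame, to a stop codon. The sketch below does this for the forward strand only. The function name find_orfs, the min_codons threshold, and the toy sequence are assumptions for illustration; real gene finders, including the probabilistic models covered in this course, are considerably more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. `min_codons` counts
    codons before the stop (including the ATG).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


# Hypothetical toy sequence containing one short ORF: ATG AAA TAG.
toy = "CCATGAAATAGCC"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

A fuller version would also scan the reverse complement, since genes can lie on either strand.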

Topics in sequence annotation
– Markov chains
– Hidden Markov models
– Inference and parameter estimation
  – Forward, Backward, Viterbi algorithms
– Applications to genome segmentation
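As a small illustration of how a hidden Markov model can segment a genome, the sketch below runs the Viterbi algorithm on a toy two-state model. The states ("AT-rich", "GC-rich") and every probability are invented for this example; they are assumptions, not parameters from the course. The code works in log space and assumes a non-empty observation sequence with no zero probabilities.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for `obs` (non-empty)."""
    # scores[s] = log-probability of the best path ending in state s
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev = max(states, key=lambda r: scores[r] + math.log(trans_p[r][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: two regions with different base composition.
states = ("AT-rich", "GC-rich")
start_p = {"AT-rich": 0.5, "GC-rich": 0.5}
trans_p = {"AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
           "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9}}
emit_p = {"AT-rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
          "GC-rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
print(path)
```

The Forward algorithm has the same structure, with the maximum over predecessor states replaced by a sum of probabilities.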

How do cells function under different conditions?
– Measure mRNA/protein levels under different environmental conditions
– Compare levels of genes under different conditions
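One common way to compare a gene's level between two conditions is the log2 fold change. The sketch below is illustrative only: the function name, the pseudocount of 1 (added to avoid division by zero), and the example values are assumptions, not something specified in the slides.

```python
import math


def log2_fold_change(level_a, level_b, pseudocount=1.0):
    """log2 ratio of expression in condition B relative to condition A.

    Positive values mean higher expression in B; negative, higher in A.
    """
    return math.log2((level_b + pseudocount) / (level_a + pseudocount))


# Hypothetical measurements for one gene in two conditions.
print(log2_fold_change(3, 15))   # (15 + 1) / (3 + 1) = 4, so log2 is 2.0
print(log2_fold_change(0, 0))    # no change
```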

Topics in data analysis from high-throughput experiments
– Clustering algorithms
  – hierarchical clustering
  – k-means clustering
  – EM-based clustering
– Interpretation of clusters
– Evaluation of clusters
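The following is a minimal sketch of k-means clustering (Lloyd's algorithm) applied to toy expression profiles. To keep the output reproducible, it seeds the centroids with the first k points instead of a random choice; that seeding, the function names, and the data values are all assumptions made for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    # Assumption: deterministic seeding with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical expression levels of four genes under two conditions.
profiles = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.2), (5.2, 4.8)]
centroids, labels = kmeans(profiles, k=2)
print(labels)
print(centroids)
```

On this toy input the two low-expression genes end up in one cluster and the two high-expression genes in the other, even though the initial centroids were both taken from the low-expression pair.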

How do molecular entities interact within a cell?
[Figure panels: Interactions within a cell; Network model (nodes A and B, where A controls B)]

What networks get perturbed in a disease?
[Figure: Subnetworks of genes predictive of cancer prognosis (Chuang et al., MSB 2007)]

Topics in network modeling
– Different types of biological networks
– Probabilistic graphical models for representing networks
– Algorithms for network inference
– Evaluating inferred networks
– Analysis of inferred networks

The Short-term Plan
– Tuesday (9/8): “Molecular Biology 101” lecture
  – Optional for molecular biology students
– Thursday (9/10): start on “Sequence Assembly”

Reminder: Reading assignment for Tuesday
– Life and Its Molecules: A Brief Introduction, by Lawrence Hunter
  – ter04.pdf

Goals for today
– Administrivia
– Course Overview
– Short survey of interests/background