CS 7010: Computational Methods in Bioinformatics (course introduction)
Dong Xu
Computer Science Department
109 Engineering Building West
Challenges of Our Civilization - 1
Top 125 unsolved problems in science over the next quarter-century (http://www.sciencemag.org/sciext/125th/). The top 25:
- What Is the Universe Made Of?
- What Is the Biological Basis of Consciousness?
- Why Do Humans Have So Few Genes?
- To What Extent Are Genetic Variation and Personal Health Linked?
- Can the Laws of Physics Be Unified?
- How Much Can Human Life Span Be Extended?
- What Controls Organ Regeneration?
- How Can a Skin Cell Become a Nerve Cell?
Challenges of Our Civilization - 2
- How Does a Single Somatic Cell Become a Whole Plant?
- How Does Earth's Interior Work?
- Are We Alone in the Universe?
- How and Where Did Life on Earth Arise?
- What Determines Species Diversity?
- What Genetic Changes Made Us Uniquely Human?
- How Are Memories Stored and Retrieved?
- How Did Cooperative Behavior Evolve?
- How Will Big Pictures Emerge from a Sea of Biological Data?
Challenges of Our Civilization - 3
- How Far Can We Push Chemical Self-Assembly?
- What Are the Limits of Conventional Computing?
- Can We Selectively Shut Off Immune Responses?
- Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
- Is an Effective HIV Vaccine Feasible?
- How Hot Will the Greenhouse World Be?
- What Can Replace Cheap Oil, and When?
- Will Malthus Continue to Be Wrong?
Lecture Outline
- What does bioinformatics do?
- Course topics
- Course organization
- Workload/grades
Technical Definitions (NIH, http://www.bisti.nih.gov/)
- Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
- Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.
Scope of Bioinformatics: Studying Biology on Computer
- What bioinformatics does: data management; data mining; modeling; prediction; theory formulation
- What it studies: genes, proteins, protein complexes, pathways, cells, organisms, ecosystems
- An indispensable part of biological science, with its own methodology
- It has both an engineering aspect and a scientific aspect
- Contributing disciplines: computer science, biology, statistics; also physics, mathematics, chemistry, engineering, …
Why Is Bioinformatics So Hot? (I)
- More than 80 universities offer graduate degrees in bioinformatics
- It sits at the intersection of the two most active fields: computer science and molecular biology
- Exponential growth in computer technologies (hardware, Internet) paves the way for bioinformatics development
Why Is Bioinformatics So Hot? (II)
- Analytical technology
- High-throughput data
- Biological knowledge
- Medicine & bioengineering
What Can Computing Do for Biology?
- Data interpretation in analytical technologies
- Data management and computational infrastructure
- Discovery from data mining
- Modeling, prediction and design
- Theoretical / in silico biology
Together these cover almost every area of computer science.
Data Interpretation in Analytical Technologies (I)
Analytical technologies are the driving force of the new (large-scale) biology:
- DNA sequencing (genomics)
- X-ray / NMR structure determination (structural genomics)
- Protein identification using mass spectrometry (proteomics)
- Microarray chips (functional genomics)
Data Interpretation in Analytical Technologies (II)
Example: NMR protein structure determination
NMR spectra → peak assignment → structural restraint extraction → structure calculation → protein structure
[Figure: residues i-1, i, i+1, i+2, i+3, i+4 along the protein chain]
Data Interpretation in Analytical Technologies (III)
- From image to data (image processing)
- Large-scale data cannot be handled without computers
- Noisy data (optimization of under-constrained / over-constrained problems)
- Computer algorithms/programs can mimic the human interpretation process and do it much faster
- Automation of experimental data interpretation
Data Management and Computational Infrastructure
- Track instruments, experimental conditions, and results at each step of a complicated biological experiment (laboratory information management systems, LIMS, in modern wet labs)
- Data storage and retrieval (databases)
- Data visualization
- Data query and analysis pipelines
Discovery from Data Mining (II)
Pattern/knowledge discovery from data:
- Many biological data are generated by biological processes that are not well understood
- Interpreting such data requires discovering convoluted relationships hidden in the data, for example:
  - which segment of a DNA sequence represents a gene or a regulatory region
  - which genes are possibly responsible for a particular disease
- Complicated data: large-scale, high-dimensional, and noisy (false positives and false negatives)
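As a toy illustration of the "which segment is a gene" question, the sketch below scans the forward strand of a DNA string for open reading frames (ORFs): stretches that start at an ATG codon and run in frame to the first stop codon (TAA, TAG or TGA). It assumes the standard genetic code and ignores the reverse strand, introns and regulatory signals; the function name `find_orfs` and its `min_codons` threshold are hypothetical. Real gene finders rely on much richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    An ORF runs from an ATG start codon to the first in-frame stop codon,
    stop codon included. ATGs nested inside an ORF already found are skipped,
    so each stop codon is reported at most once (with its most upstream ATG).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):              # the three forward reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3                  # not a start codon: move to next codon
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3                  # extend the ORF codon by codon
            if j + 3 > len(seq):
                break                   # no in-frame stop left in this frame
            if (j + 3 - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3                   # resume scanning after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGCCCTGACC"))   # one ORF in frame 2: ATG CCC TGA
```

For `"CCATGCCCTGACC"` the call returns `[(2, 11)]`. The small default for `min_codons` is only for the demo; realistic analyses require much longer ORFs before calling a segment a candidate gene.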
Modeling, Prediction and Design (I)
Modeling and prediction of biological objects/processes:
- Modeling of biochemistry: enzyme reaction rates
- Modeling of biophysics: dynamics of biomolecules
- Modeling of evolution: prediction of phylogeny
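One classic way to model enzyme reaction rates is Michaelis-Menten kinetics, in which the initial velocity is v = V_max [S] / (K_m + [S]). The sketch below simply evaluates that formula; the function name `michaelis_menten_rate` and the sample parameter values are illustrative assumptions, not course material.

```python
def michaelis_menten_rate(s: float, v_max: float, k_m: float) -> float:
    """Initial reaction velocity under Michaelis-Menten kinetics.

    s     -- substrate concentration [S] (same units as k_m)
    v_max -- maximum velocity, approached when the enzyme is saturated
    k_m   -- Michaelis constant: the [S] at which the velocity is v_max / 2
    """
    if s < 0 or k_m <= 0:
        raise ValueError("need s >= 0 and k_m > 0")
    return v_max * s / (k_m + s)


if __name__ == "__main__":
    # Rate grows almost linearly at low [S] and saturates toward v_max at high [S].
    for s in (0.5, 2.0, 8.0, 100.0):
        print(s, round(michaelis_menten_rate(s, v_max=10.0, k_m=2.0), 3))
```

A quick sanity check: at [S] = K_m the rate is exactly half of V_max, so `michaelis_menten_rate(2.0, v_max=10.0, k_m=2.0)` gives `5.0`.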
Modeling, Prediction and Design (II)
- Prediction of outcomes of biological processes
- Computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation
- From prediction to engineering design: from protein structure prediction to protein engineering; design of genetically modified species
Theoretical / In Silico Biology
Generate new hypotheses, and formulate and test fundamental theories of biology:
- New hypotheses about detailed evolutionary history, through mining genomic sequence data?
- New hypotheses about a particular signaling network, through data mining?
- New hypotheses about protein folding pathways, through simulations?
Bioinformatics Applications to Biological Systems
- Bacteria (Synechococcus)
- Yeast (Saccharomyces cerevisiae)
- Plants (Arabidopsis)
- Neural systems (neurons)
- Viruses (SARS)
Can Biology Help Computing?
- Computational techniques inspired by biology: neural networks (artificial intelligence); genetic algorithms; automata
- A new driver of computer science: better hardware (supercomputers); new data representations
- Developing new theoretical frameworks: DNA computing; network communication (communication between ants, see …)
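To make the "genetic algorithm" item concrete, here is a minimal sketch on the toy OneMax problem (maximize the number of 1s in a bit string). The problem, the parameter values, tournament selection, one-point crossover and keeping the two fittest individuals (elitism) are all illustrative choices of this sketch, not something the slide prescribes; the names `onemax` and `genetic_algorithm` are hypothetical.

```python
import random


def onemax(bits: list[int]) -> int:
    """Fitness of a bit string: the number of 1s (toy objective)."""
    return sum(bits)


def genetic_algorithm(n_bits: int = 20, pop_size: int = 30, generations: int = 50,
                      mutation_rate: float = 0.02, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    # Random initial population of bit strings ("genomes").
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        next_pop = pop[:2]  # elitism: the two fittest survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of two random individuals is a parent.
            p1 = max(rng.sample(pop, 2), key=onemax)
            p2 = max(rng.sample(pop, 2), key=onemax)
            cut = rng.randrange(1, n_bits)       # one-point crossover (recombination)
            child = p1[:cut] + p2[cut:]
            # Point mutation: flip each bit with a small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=onemax)


if __name__ == "__main__":
    best = genetic_algorithm()
    print(onemax(best), best)
```

Because the two fittest individuals are copied unchanged into every new generation, the best fitness in the population can never decrease from one generation to the next.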
Computing versus Biology
- “What computer science is to molecular biology is like what mathematics has been to physics.” (Larry Hunter, ISMB’94)
- “Molecular biology is (becoming) an information science.” (Leroy Hood, RECOMB’00)
- Bioinformatics is still in its infancy!
Lecture Outline
- What does bioinformatics do?
- Course topics
- Course organization
- Workload/grades
Course Topics
- Data interpretation in analytical technologies
- Data management and computational infrastructure
- Discovery from data mining
- Modeling, prediction and design
- Theoretical / in silico biology
Covers classical/mainstream bioinformatics problems from a computer science perspective.
Course Schedule
See http://digbio.missouri.edu/cs7010/
- First take-home exam: given 9/29; due 10/6
- Second take-home exam: given 11/17; due 11/29
- Three phases of project: due 9/22, 10/20 and 11/17; final report due 12/8
What I Will Teach
- A general introduction to a few major problems in the field of bioinformatics
  - problem definitions: from a biological problem to a computable problem
  - some key computational techniques
- A way of thinking: tackling “biological problems” computationally
  - how to look at a biological problem from a computational point of view
  - how to formulate a computational problem to address a biological issue
  - how to collect statistics from biological data
  - how to build a computational model
  - how to design algorithms for the model
  - how to test and evaluate a computational algorithm
  - how to assess the confidence of a prediction result
Lecture Outline
- What does bioinformatics do?
- Course topics
- Course organization
- Workload/grades
A Brief Survey
- Registered for the course?
- Academic department?
- Computer background?
- Biology background?
- Statistical background?
- Taken another bioinformatics course?
Prerequisites
- CS 2050 (Algorithm Design and Programming II) or equivalent training
- Statistics 2500 (Introduction to Probability and Statistics I) or equivalent training
- Programming skills in any programming language are required
- No biology background is necessary
Course Info
- Co-instructor: Trupti Joshi
- Course web site:
Reference Books - 1
• Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.
• Pavel A. Pevzner: Computational Molecular Biology: An Algorithmic Approach. MIT Press.
• Tao Jiang, Ying Xu, and Michael Zhang (eds.): Current Topics in Computational Molecular Biology. MIT Press.
Reference Books - 2
• Pierre Baldi and Soren Brunak: Bioinformatics: The Machine Learning Approach (second edition). MIT Press, 2001.
• Dan Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press.
• Warren J. Ewens and Gregory R. Grant: Statistical Methods in Bioinformatics: An Introduction. Springer.
• Terry Speed: Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC.
Lectures
- 3:30pm – 4:45pm, Tuesday and Thursday
- PowerPoint slides for each lecture (posted before the lecture)
- Questions/answers at the beginning and end of each lecture
- Discussions are encouraged during the lecture (a topic discussion may take place at the end of a lecture)
Office Hours
- 4:45pm – 5:35pm, Tuesdays and Thursdays
- The instructor who delivers the lecture will hold the office hour
  - Dong Xu: Room 109, Engineering Building West
  - Trupti Joshi: Room 317, Engineering Building North
- Special office hours will be arranged close to the final
- Appointments at other times
Lecture Outline
- What does bioinformatics do?
- Course topics
- Course organization
- Workload/grades
Minimum Requirements
- Attend class regularly
- Read the suggested class handouts after class
- Complete the two take-home exams
- Deliver a final project (for graduate students)
- Expected workload: 5–6 hours/week in addition to class attendance
How to Get the Most out of the Course
- Study the suggested reading/slides before class
- Study the optional reading
- Ask questions in class
- Visit office hours frequently
- Do the homework assignments (not graded): not required and not counted in the final grade, but encouraged
Grading
A final grade of A, B, C, etc. will be assigned.
- 2 take-home exams (20% each)
- Project: 3 phase reports (5% each), final report (15%), software demo (15%), presentation (15%)
Final project
- A working bioinformatics program that can be used by biologists, or a comprehensive computational analysis of bioinformatics tool outputs
- One student, one project (independent development), with consultation from the instructors
- Potential for publication
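Adding up the grading weights confirms that the exams and the project components account for the entire grade, with the exams worth 40% and the four project components together worth 60%:

```latex
\underbrace{2 \times 20\%}_{\text{take-home exams}}
+ \underbrace{3 \times 5\%}_{\text{phase reports}}
+ \underbrace{15\%}_{\text{final report}}
+ \underbrace{15\%}_{\text{software demo}}
+ \underbrace{15\%}_{\text{presentation}}
= 40\% + 15\% + 45\% = 100\%
```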
Three Phases of Project
- Phase 1 (due 9/22): define your project subject; give a brief literature survey and explain its importance
- Phase 2 (due 10/20): describe the key methods
- Phase 3 (due 11/17): present the key results
- Final report: due 12/8
Discussion
What do you expect from this course?
- content?
- ways of teaching?
- how can the instructors help?
- …
Assignments
- Suggested reading:
- Optional reading: Bioboxes in Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.
- Optional reading: Chapter 1 in Tao Jiang, Ying Xu, and Michael Zhang (eds.): Current Topics in Computational Molecular Biology. MIT Press.