Computer Science and Bioinformatics James Edwards and Rajinder Singh Bhatti.


Biology and Computer Science?
- Initially, Biology depended on Chemistry to make major strides: Biochemistry.
- Biology then needed to work at the atomic level to explain phenomena: Biophysics.
- The modern era of Biology needs to interpret a wealth of data, using tools that only Computer Science is able to provide: hence Bioinformatics.

What is Bioinformatics?
- The study of computational methods to expand the use of biological data (data orientated).
- Often (incorrectly) used in place of the term 'Computational Biology', which is a slightly different discipline.
- Computational Biology is the use of computational and mathematical methods to study or simulate biological systems (hypothesis orientated). [source: National Institutes of Health]

Overlaps Between the Two Disciplines
1. Bioinformatics problems
2. Computational Biology problems
3. Problems in both categories
4. Problems in neither category

Motivation for Bioinformatics
- Quote from Donald Knuth, Turing Award winner: “…I can’t be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on. It’s at that level.” [source: Wikiquote]
- Can biological life be equated with computing? Results so far would suggest the answer is yes!

Common Bioinformatics Problems
- Finding and assessing similarities between strings (next slides).
- Detecting patterns in strings.
- Constructing trees of the evolution of organisms.
- Classifying new data by clustering existing data.
- Applying machine vision to detect interactions between proteins.

Data Structures Used in Biology
- Strings for representing sequences (e.g. DNA, RNA, amino acid sequences):
  “ATACGGCGCGCAAGGCT”
  “TATGCCGCGCGTTCCGA”
- Trees for representing the evolution of organisms, and for other purposes.
  [Figure: example tree with nodes Prokaryotes, Eukaryotes, Reptiles, Birds, …]
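
The two example strings on this slide are base-by-base complements of each other (A pairs with T, C pairs with G). As a minimal sketch of how a sequence held as a plain string can be processed, the following Python computes that complement. The function names are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Complement read in the opposite direction, as the paired strand is usually written."""
    return complement(strand)[::-1]

print(complement("ATACGGCGCGCAAGGCT"))  # prints TATGCCGCGCGTTCCGA
```

Because a sequence is just a string, the usual string tools (slicing, searching, comparison) apply to it directly.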

Data Structures (Cont.)
- Graphs can represent signalling pathways (often found in neural networks).
- 3D points and their linkages can represent protein structures.

First Instance of a Problem: DNA Shotgun Sequencing
- In order to derive a DNA sequence, the DNA must first be duplicated many times.
  ATCACCGTAAGAGGA
- The copies are then broken at random into smaller pieces named 'fragments'; gel electrophoresis is used to separate the fragments by size.
  ATCACCGTAAGAGGA
  ATC CCGT AGGA AAG CCGT AAGA TCA TAA GTA
- This is a very simplified instance of the problem; typically each fragment is between 250 and 1000 bases long.
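
To make the setup concrete, here is a minimal Python sketch that simulates the fragmenting step: several copies of a sequence are each cut at random positions. The function name, its parameters, and the fixed random seed are assumptions made for illustration; real fragment lengths and cut positions are set by the laboratory process, not by code like this.

```python
import random

def shotgun_fragments(sequence, copies, min_len, max_len, seed=0):
    """Cut `copies` copies of `sequence` into consecutive random-length pieces."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    fragments = []
    for _ in range(copies):
        pos = 0
        while pos < len(sequence):
            step = rng.randint(min_len, max_len)
            # The final piece of a copy may be shorter than min_len.
            fragments.append(sequence[pos:pos + step])
            pos += step
    return fragments

frags = shotgun_fragments("ATCACCGTAAGAGGA", copies=3, min_len=3, max_len=4)
print(frags)
```

Because each copy is cut in different places, fragments from different copies overlap, and those overlaps are what make reassembly possible.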

Alignments: the Smith-Waterman Method
- How do we identify fragments which link together?
- We can use dynamic programming to compute optimal alignment scores between fragments.
- Each aligned position is a match (+1), a mismatch (-1), or part of a gap (-(1/3 x length of gap)).
- The score in each cell is the best total score reachable from an already computed neighbouring cell (diagonal, above, or left) plus the cost of the alignment step. If a score is < 0 it is set to 0.
- The first row and first column are always filled with 0's.
  [Figure: 3 x 3 scoring matrix with labels A, T, A and C, T, A]

The Smith-Waterman Algorithm (Continued)
- Following this, trace back a path through the optimum alignment, starting at the highest number in the matrix and ending at the first 0.
- In this case it is: 'AT'.
- The algorithm is extremely expensive: O(NM) run time and O(NM) storage complexity.
- It always finds the optimum solution.
  [Figure: 3 x 3 scoring matrix with labels A, T, A and C, T, A]
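
The fill and trace-back steps described on these two slides can be sketched in Python as follows. This is a minimal illustration, not a production aligner. It uses the scores stated on the slide (match +1, mismatch -1, gap of length k costing k/3, applied here as 1/3 per gapped position). The input strings "ATC" and "ATA" are an assumption, chosen so that the result agrees with the 'AT' quoted above.

```python
from fractions import Fraction

MATCH = Fraction(1)
MISMATCH = Fraction(-1)
GAP = Fraction(-1, 3)  # per gapped position, so a gap of length k costs k/3

def smith_waterman(a, b):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at zero.
    score = [[Fraction(0)] * cols for _ in range(rows)]
    best, best_pos = Fraction(0), (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(
                Fraction(0),              # negative scores are reset to 0
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("ATC", "ATA"))  # best score 2, aligned region 'AT' in both
```

The nested loop visits every cell once and the whole matrix is kept for the trace-back, which is where the O(NM) time and storage come from. Exact fractions are used only so that the equality checks in the trace-back are not affected by floating-point rounding.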

Alternative to the SW Algorithm
- Sequences are usually at the very least tens of thousands of characters long, which makes the O(NM) run time (and storage complexity) unacceptable.
- Alternative: use the BLAST (Basic Local Alignment Search Tool) algorithm.
- Gives a much more reasonable run time, of roughly O(N+M).
- However, it does not always compute the best solution.

BLAST Algorithm
- Computing an entire matrix of values will always require N x M space.
- Iterating over all of the values will always require N x M time.
- Solution: ignore parts of the alignment which are unlikely to improve the score.
- This improves the storage complexity, as only a single alignment must be stored.
- It also improves the run-time complexity, as at each stage of the algorithm only the optimum so far is processed.

BLAST Illustrated
- Consider forming an alignment between two sequences:
  CTCTCTCTCATTGATTGCGGGGGG
  GGGGGGGGGATTGATTGCCCCCCC
- The strings at the beginning and end are very unlikely to improve the score of the alignment, so no gaps or mismatches are computed for them in the matrix. Only the shared region is aligned:
  ATTGATTGC
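
The idea of skipping unpromising regions can be illustrated with a much-simplified "seed and extend" sketch in Python. It is not the real BLAST algorithm, which scores seeds against a threshold and tolerates mismatches during extension. This version finds exact shared words of length k and extends them only while the characters keep matching. The function name and the choice k=3 are assumptions made for illustration.

```python
from collections import defaultdict

def seed_and_extend(query, subject, k=3):
    """Return (longest exact shared region, start in query, start in subject)."""
    # Index every word of length k in the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    best = (0, 0, 0)  # (length, query start, subject start)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            lo_i, lo_j = i, j
            # Extend to the left while the characters agree.
            while lo_i > 0 and lo_j > 0 and query[lo_i - 1] == subject[lo_j - 1]:
                lo_i -= 1
                lo_j -= 1
            hi_i, hi_j = i + k, j + k
            # Extend to the right while the characters agree.
            while hi_i < len(query) and hi_j < len(subject) and query[hi_i] == subject[hi_j]:
                hi_i += 1
                hi_j += 1
            if hi_i - lo_i > best[0]:
                best = (hi_i - lo_i, lo_i, lo_j)
    length, q_start, s_start = best
    return query[q_start:q_start + length], q_start, s_start

print(seed_and_extend("CTCTCTCTCATTGATTGCGGGGGG", "GGGGGGGGGATTGATTGCCCCCCC"))
# prints ('ATTGATTGC', 9, 9)
```

No N x M matrix is built: only positions that share a seed word are examined, which is why the flanking CTCT…, GGGG… and CCCC… runs cost almost nothing. The price is that an alignment with no exact seed of length k is missed entirely.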

Alignments: Relation to Shotgun Sequencing
- We now have a way to measure which fragments are likely to align, but we still need a way to find the correct order efficiently.
- An in-depth algorithm is beyond the scope of this presentation.
- However, the best current techniques are:
  - Greedy methods (align every pair of fragments, then use only the best solutions).
  - Evolutionary algorithms (start with an initial set of solutions, compute the sum of alignment scores, then 'evolve' the set of solutions in each iteration).
- The problem is NP-hard, so these techniques give approximations.
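
As a rough illustration of the greedy approach, the Python sketch below repeatedly merges the two fragments with the largest exact suffix-to-prefix overlap. It is a simplification under stated assumptions: overlaps are exact matches rather than alignment scores, fragments are error-free, and the three input fragments are hypothetical pieces of the example sequence ATCACCGTAAGAGGA, chosen to be longer than the ones on the earlier slide so that the overlaps are unambiguous.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments):
    """Merge the best-overlapping pair until one string remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    if not frags:
        return ""
    while len(frags) > 1:
        best_n, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        # Join the chosen pair, writing the shared part only once.
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]

print(greedy_assemble(["ATCACCG", "CCGTAAG", "TAAGAGGA"]))  # prints ATCACCGTAAGAGGA
```

Each merge is only locally best, so on repetitive sequences the greedy choice can commit to a wrong join. That is consistent with the point above that these techniques give approximations.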

Relating Computer Science to Biology
- What have we Computer Science students studied so far in this MSc course that can be of use to Bioinformatics?
  - Data Mining
  - Artificial Intelligence
  - Heuristic approaches (e.g. Knowledge Representation, Logics)
  - Algorithm techniques

Data Mining and Bioinformatics: How and why?
- Some of you do COMP 527 Data Mining with Rob.
- Why Data Mining is essential in Bioinformatics:
  - KDD (Knowledge Discovery in Databases) is the process of finding useful information and patterns in data.
  - Data Mining is the use of algorithms to extract the information and patterns derived by the KDD process.
- Graphical techniques such as brushing, data smoothing, etc.

Data Mining and Bioinformatics: Algorithm implementation examples
- Data Mining algorithms are used for tackling problems in Bioinformatics, in conjunction with microarray technology.
- Predict a patient's outcome, such as survival time, disease recurrence, health risk assessments, etc.
- How does Data Mining help? Accurate predictions could help provide better treatment!

AI and Bioinformatics: Artificial Intelligence?
- Research in genetics, molecular biology, etc. generates enormous amounts of data.
- Use AI to extract useful information from the wealth of available data.
- Build good probabilistic models (gene models).
- AI provides several powerful algorithms and techniques for solving these problems using the stored data.

AI and Bioinformatics: AI techniques used
- Neural networks (biological and artificial)
- Hidden Markov models (probabilistic statistical models)
- Bayesian networks (probabilistic graphical models)
- and many others...

Logic and Bioinformatics
- Biology works by applying prior knowledge (“what is known”) to unknown entities.
- Biology is therefore said to be knowledge-based (rather than axiom-based): pre-existing knowledge is used to make inferences about the item under investigation.
- Description Logic?

Description Logic and Bioinformatics
- Why Description Logic?
  - It is a decidable logic with good systems available.
  - It is impossible for a single biologist to deal with all of a domain's knowledge! This is similar to programmers writing extremely complex programs without an IDE to help with libraries.
  - Medical diagnosis systems make good use of ABOX and TBOX assertions, for example to determine if a patient's problem is an instance of a particular known disease.

Description Logic Example
TBOX
  sick ≡ person ⊓ ∃isInfected.Cancer
  non_sick ≡ person ⊓ ∃isInfected.Cold
ABOX
  Tim : person
  Steven : person
  Cancer : Problem
  Cold : Problem
  (Tim, Cancer) : isInfected
  (Steven, Cold) : isInfected
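
As a rough sketch of what such assertions let a system conclude, the Python below stores the ABOX facts from this slide in ordinary sets and looks up who falls under each TBOX definition. It assumes the TBOX lines are read as "a sick individual is a person who isInfected with Cancer" and "a non_sick individual is a person who isInfected with Cold". It is a simple lookup over the stated facts, not a description logic reasoner, and all function and variable names are hypothetical.

```python
# ABOX: concept assertions (individual : concept) from the slide.
concept_members = {
    "person": {"Tim", "Steven"},
    "Problem": {"Cancer", "Cold"},
}

# ABOX: role assertions ((subject, object) : role) from the slide.
role_pairs = {
    "isInfected": {("Tim", "Cancer"), ("Steven", "Cold")},
}

def has_role_to(role, target):
    """Individuals x for which the pair (x, target) is asserted in the role."""
    return {x for (x, y) in role_pairs[role] if y == target}

def instances(base_concept, role, target):
    """Members of base_concept that are linked to target through role."""
    return concept_members[base_concept] & has_role_to(role, target)

# TBOX definitions, under the reading assumed above.
sick = instances("person", "isInfected", "Cancer")
non_sick = instances("person", "isInfected", "Cold")

print(sick)      # prints {'Tim'}
print(non_sick)  # prints {'Steven'}
```

A real reasoner works under open-world semantics and can also check that the definitions themselves are consistent, which this lookup does not attempt.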

Improvements: How far has Bioinformatics come?
- “One is struck both by how far the field has come in a relatively short period of time, and also by how far it has yet to go.” (Jessica D. Tenenbaum)
- The discipline of Bioinformatics has vastly improved over recent years due to:
  - Fast technological development in the computer industry.
  - Demand for Computer Scientists: more computer scientists than ever before!
  - Biological “unknown” discoveries: things that are discovered with no previous knowledge base.
  - Growth of sub-fields of Biology, such as molecular biology.

Improvements: How far will Bioinformatics go?
- It depends entirely on whether the gap between Biology and Computer Science increases or decreases.
- The gap increases if:
  - Educational institutions decide to ignore Bioinformatics (put the emphasis on prospective students).
  - Computer Scientists choose to ignore Biology.
  - Biologists choose to ignore Computer Science.

Closing the gap I
- Biologists cannot build their own analytical tools.
- Computer Scientists don't know what to build!

Closing the gap II
- Putting a Computer Scientist (Data Mining expert) into a room with a Biologist investigator won't solve the problem.
- Boundaries such as methodologies and discipline language are a problem.

Closing the gap III
- Computer Science is the “science of the artificial”.
- Biology is the “science of discovery”.
- The only way to bridge the gap is for both parties to learn the basic fundamentals of each science.

Breakthroughs of Bioinformatics
- Spatial patterns of structures for understanding protein folding, evolution, and biological functions.
  - To predict protein functions, a method has been developed that rapidly matches local surfaces and incorporates evolutionary information specific to individual binding regions via a Bayesian Monte Carlo approach.
- These kinds of breakthroughs encourage the computer industry to get involved and work with Biology.

Related Problems?
Are there any other disciplines which involve a similar integration of Computer Science with Biology?
- Cheminformatics/Chemoinformatics:
  - The application of informatics tools to solve discovery chemistry problems.
  - An integral component of hit and lead generation.
  - Development of new computational methods or efficient algorithms for chemical software and pharmaceutical chemistry, including analyses of biological activity and other issues related to drug discovery.

Related Problems? (Cont.)
- Other similar interests:
  - Ecoinformatics
  - Geoinformatics
  - Quantum informatics
  - Astroinformatics
  - Business informatics
  - And many others...

Follow-ups of Jacques Cohen
- 'Bioinformatics: An Introduction for Computer Scientists' is a previous publication from Jacques Cohen.
  - It aims to encourage Computer Scientists to get involved with Biology.
- 'Updating Computer Science Education' was released after Bioinformatics and Computer Science.
  - It talks about persuading the next generation of Computer Scientists that Computer Science is more than just programming.

Who is Jacques Cohen?
- Currently serving Brandeis University since
- Doctor in the field of analysis of algorithms, parsing and compiling, memory management, logic and constraint logic programming, and parallelism.
- Recently started researching his interest in Bioinformatics.
- His most recent publication is about methods used in microarray data interpretation. See

References and related material (all web links last accessed 4th February 2008)
- Shotgun sequencing
- G. Luque, E. Alba Torres and S. Khuri, Assembling DNA Fragments with a Distributed Genetic Algorithm, Parallel Computing for Bioinformatics and Computational Biology, Wiley-Interscience, New Jersey, 2006, Chapter 12, pp
- L.D. Paulson, Bioinformatics Experiences Important Breakthroughs, 2005, pp
- J. Cohen, Bioinformatics: An Introduction for Computer Scientists, ACM Computing Surveys, 36(2), ,
- B. Tjaden, J. Cohen, A Survey of Computational Methods used in Microarray Data Interpretation, Applied Mycology and Biotechnology, Bioinformatics 6,
- J. Cohen, Updating Computer Science Education, Communications of the ACM, 48(6), 29-31,
- J. Cohen, Computational Molecular Biology: A Promising Application Using Logic Programming and Constraint Logic Programming, Lecture Notes in Artificial Intelligence,
- R. Stevens, C.A. Goble and S. Bechhofer, Ontology-based Knowledge Representation for Bioinformatics,
- Jinyan Li, Limsoon Wong and Qiang Yang, Data Mining in Bioinformatics,
- Various material about Bioinformatics,
- Data Mining in Bioinformatics, muenchen.de/Forschung/Bioinformatics/