Bioinformatics. Timothy Ketcham, Union College Graduate Seminar, 2003.

Presentation transcript:


Introduction: Agenda
- What is Bioinformatics?
- Goals
- Molecular Biology – Genes & Proteins
- AI Techniques applied to Gene & Protein Studies
- Molecular Biology – Phylogenetic Trees
- CS Techniques applied to Tree Estimation
- Databases
- Tools
- Results
- Discussion

Introduction: What is Bioinformatics?
- Entire field of Computational Biology?
- Computational Molecular Biology?
- Application of Computer Science to Genome Analysis?

Introduction: What is Bioinformatics?
Definition: …conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics techniques” (derived from disciplines such as applied maths, computer science and statistics) to understand and organize the information associated with these molecules, on a large scale. In short, Bioinformatics is an information management system for molecular biology…

Goals
- Organizing existing biological data.
- Developing tools and techniques to mine the data.
- Using the data and tools for knowledge discovery.

Molecular Biology: Genetics
- Genome
- Chromosomes
- Genes
- Nucleotides
- Base Pairs
- Key Point: The sequence of nucleotides in a gene determines its functions, and changes in the sequence can lead to major changes in those functions.
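
The key point above can be illustrated with a minimal sketch. The three short sequences are invented for illustration, and the codon table is only a five-entry subset of the standard genetic code; neither comes from the slides. One single-base change leaves the encoded amino acid unchanged, while another replaces it.

```python
# Illustrative five-entry subset of the standard genetic code (DNA codons).
CODONS = {"ATG": "Met", "GAG": "Glu", "GAA": "Glu", "GTG": "Val", "TAA": "Stop"}

def translate(dna):
    """Translate a coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODONS[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

# Hypothetical toy sequences, each differing from the original by one base.
original = "ATGGAGTAA"
silent = "ATGGAATAA"    # GAG -> GAA: still encodes Glu
missense = "ATGGTGTAA"  # GAG -> GTG: Glu is replaced by Val

for name, seq in [("original", original), ("silent", silent), ("missense", missense)]:
    print(name, seq, "-".join(translate(seq)))
```

Whether a substitution like the second one actually changes a protein's function depends on where it falls in the structure, which is the subject of the next slide.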

Molecular Biology: Proteins
- Linear chains of amino acids
- Structural Components
  - Primary
  - Secondary
  - Tertiary
  - Quaternary
- Key Point: The four structural components along with the chemical properties of the amino acids determine the function of the protein.

Techniques: Artificial Intelligence
- Components
- Performance element
- Learning element
- Critic
- Training
- Testing
- Operation

Techniques: Decision Trees
[Diagram: Attribute 1 branches on Condition 1, Condition 2 and Condition 3; each branch leads to an Attribute 2 test whose Condition 1 and Condition 2 branches end in Result 1 and Result 2.]

Techniques: Decision Trees
[Diagram: Attribute 2 branches on Condition 1 to Result 1 and on Condition 2 to Result 2.]
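
A decision tree of the shape shown on these two slides can be sketched as a nested dictionary that is walked from the root to a leaf. The exact layout (three Attribute 1 branches, each followed by the same Attribute 2 test) is an assumption read off the flattened diagram, and the names `TREE`, `subtree` and `classify` are hypothetical.

```python
def subtree():
    """Attribute 2 test: two conditions, each ending in a result (leaf)."""
    return {
        "attribute": "Attribute 2",
        "branches": {"Condition 1": "Result 1", "Condition 2": "Result 2"},
    }

# Root test on Attribute 1 with three conditions (assumed layout).
TREE = {
    "attribute": "Attribute 1",
    "branches": {
        "Condition 1": subtree(),
        "Condition 2": subtree(),
        "Condition 3": subtree(),
    },
}

def classify(node, example):
    """Follow the branch matching the example's value at each test node."""
    while isinstance(node, dict):
        observed = example[node["attribute"]]
        node = node["branches"][observed]
    return node  # a leaf is just the result label

example = {"Attribute 1": "Condition 3", "Attribute 2": "Condition 2"}
print(classify(TREE, example))
```

In practice the tree is learned from training data rather than written by hand; this sketch only shows how a finished tree is used to classify an example.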

Techniques: Neural Networks
[Diagram: Attributes 1 to 5 feed an activation function f_act, which outputs a Decision.]
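
The single unit in the diagram can be sketched as a weighted sum of the five attributes passed through an activation function. The slide does not say which function f_act is, so the sigmoid here is an assumption, as are the weights, the bias and the 0.5 decision threshold.

```python
import math

def f_act(x):
    """Sigmoid activation (an assumed choice for the slide's f_act)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through f_act."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f_act(total)

# Hypothetical attribute values and hand-picked weights for illustration.
attributes = [1.0, 0.0, 1.0, 1.0, 0.0]
weights = [0.8, -0.4, 0.3, 0.5, -0.9]
bias = -1.0

activation = neuron(attributes, weights, bias)
decision = activation >= 0.5  # threshold the activation to get a yes/no decision
print(round(activation, 3), decision)
```

A real network would learn the weights from training examples and would usually stack many such units in layers.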

Techniques: Belief Networks
[Diagram: Attribute 1, Attribute 2 and Attribute 3 linked to Result 1, Result 2 and Result 3, with edge probabilities p = 0.3, p = 0.7, p = 0.3, p = 0.4 and p = 0.5.]

Techniques: Hidden Markov Models
[Diagram: Match State, Delete State, Insert State.]

Molecular Biology: Phylogenetic Trees
- Used to map evolutionary relationships
- Traditionally done at the organism level
- Mapping at molecular level can help evaluate the relationships and/or evolution of genetic structures, proteins or organisms

Techniques: Tree Estimation
- Number of Trees (T) for a given number of taxa (n)
- T increases very rapidly (about 10^8 trees for 11 taxa)
- Need efficient search methods
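
The growth of T can be made concrete with the standard counting formulas for fully resolved (binary) trees on n labelled taxa: (2n-5)!! unrooted topologies and (2n-3)!! rooted ones. The slide does not say which count it uses, so the sketch below prints both; for 11 taxa they come to 34,459,425 and 654,729,075, which bracket the slide's order-of-magnitude figure.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Unrooted binary tree topologies on n labelled taxa (n >= 3)."""
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Rooted binary tree topologies on n labelled taxa (n >= 2)."""
    return double_factorial(2 * n - 3)

for n in (4, 8, 11, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```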

Techniques: Exhaustive Search
- Brute Force Method
- Algorithm
  - Create all possible trees
  - Evaluate against optimality criteria
  - Select best tree
- Only used up to 11 taxa
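
The three steps above can be sketched for a toy data set. The slide does not name an optimality criterion, so this sketch assumes maximum parsimony scored with Fitch's algorithm; the four sequences are invented, and trees are represented as nested tuples of taxon names. Rooted trees are enumerated for simplicity, which is redundant for parsimony (the score does not depend on the root) but keeps the code short.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one edge of tree."""
    yield (tree, taxon)                      # attach above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)

def all_trees(taxa):
    """Create all rooted binary trees by adding one taxon at a time."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for t in trees for new in insertions(t, taxon)]
    return trees

def fitch(tree, site):
    """Return (possible states, changes) for one alignment column."""
    if not isinstance(tree, tuple):
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def parsimony(tree, seqs):
    """Total number of changes the tree needs across all columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )

# Hypothetical aligned sequences for four taxa.
SEQS = {"A": "AAGT", "B": "AAGC", "C": "CCGT", "D": "CCGC"}

trees = all_trees(sorted(SEQS))
best_tree = min(trees, key=lambda t: parsimony(t, SEQS))
print(len(trees), best_tree, parsimony(best_tree, SEQS))
```

Because every tree is scored, the result is guaranteed optimal under the chosen criterion, but the number of trees grows so quickly that this is only feasible for a handful of taxa.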

Techniques: Branch and Bound
- Effectively used for problems involving fewer than 20 taxa (approximately trees)
- Algorithm
  - Establish minimally acceptable criteria
  - Evaluate all n taxa trees, discard ones not meeting criteria
  - Evaluate n+1 taxa trees using remaining n taxa trees as bases
  - Repeat until all taxa have been evaluated
  - Select optimal remaining tree
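
One common way to realise these steps is sketched below, under assumptions the slide does not state: the criterion is Fitch parsimony, the bound is the score of the best complete tree found so far, and partial trees are discarded as soon as their score reaches that bound. This is valid because adding a taxon can never lower a tree's parsimony score. The sequences and function names are hypothetical.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one edge of tree."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)

def fitch(tree, site):
    """Return (possible states, changes) for one alignment column."""
    if not isinstance(tree, tuple):
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

def parsimony(tree, seqs):
    """Score only the taxa present in tree, summed over all columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )

def branch_and_bound(taxa, seqs):
    """Grow trees one taxon at a time, pruning branches that cannot win."""
    best = {"score": float("inf"), "tree": None}

    def extend(tree, remaining):
        score = parsimony(tree, seqs)
        if score >= best["score"]:
            return                      # bound: no extension can beat the best
        if not remaining:
            best["score"], best["tree"] = score, tree
            return
        for bigger in insertions(tree, remaining[0]):
            extend(bigger, remaining[1:])

    extend((taxa[0], taxa[1]), list(taxa[2:]))
    return best["tree"], best["score"]

# Hypothetical aligned sequences for four taxa.
SEQS = {"A": "AAGT", "B": "AAGC", "C": "CCGT", "D": "CCGC"}
tree, score = branch_and_bound(sorted(SEQS), SEQS)
print(tree, score)
```

The search still returns an optimal tree, but the amount of pruning depends heavily on how quickly a good complete tree is found; implementations often seed the bound with a heuristic tree first.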

Techniques: Branch Swapping
- Used in most phylogenetic tree estimates
- Algorithm
  - Construct trees with n taxa
  - Discard all but optimal tree
  - Rearrange branches of optimal tree to check for more optimal arrangement
  - Best tree becomes base for n+1 taxa
  - Repeat for n+1 taxa

Techniques: Divide and Conquer
- Subdivides the problem by finding optimal sub-trees and combining them into a super-tree
- Algorithm
  - Select a subset size (less than n)
  - Divide taxa into subsets
  - Find optimal trees for each subset of taxa
  - Combine optimal sub-trees into super-tree with all taxa

Techniques: Problem
All the previous methods (except Exhaustive Search) may result in finding a locally optimal tree, but not the globally optimal tree.

Techniques: Stochastic Methods
- Simulated Annealing Algorithm
  - Create trees for n taxa (based on other methods)
  - Evaluate against optimality criteria, select best
  - Evaluate remaining trees using other parameters (“cooling schedule”)
  - Tree retained is one best meeting both optimality criteria and cooling schedule
  - Allows retention of a less optimal tree in some cases, but may lead to better globally optimal result
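
The idea of occasionally keeping a worse candidate can be sketched with the standard Metropolis acceptance rule and a geometric cooling schedule, which may differ in detail from the variant summarised on the slide. To stay short, the sketch searches a toy one-dimensional landscape instead of tree space; `LANDSCAPE`, `step` and all parameter values are hypothetical. For trees, `cost` would be the optimality criterion and `neighbor` a random branch rearrangement.

```python
import math
import random

def simulated_annealing(start, cost, neighbor, t_start=5.0, t_end=0.01,
                        alpha=0.95, rng=None):
    """Minimise cost, accepting some uphill moves while the temperature is high."""
    rng = rng or random.Random(0)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_end:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - current_cost
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= alpha            # cooling schedule
    return best, best_cost

# Toy landscape: index 1 is a local minimum, index 6 the global one.
LANDSCAPE = [6, 4, 5, 7, 3, 8, 1, 9]

def step(i, rng):
    """Move one position left or right, staying inside the landscape."""
    return min(len(LANDSCAPE) - 1, max(0, i + rng.choice((-1, 1))))

best, best_cost = simulated_annealing(1, lambda i: LANDSCAPE[i], step)
print(best, best_cost)
```

A purely greedy search started at index 1 would never leave it, since both neighbours are worse; the temperature-dependent acceptance is what gives the search a chance to cross the barrier.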

Techniques: Stochastic Methods
- Genetic Algorithm
  - Create trees for n taxa (based on other methods)
  - Select a population of trees to proceed to next generation
  - Allow trees to mutate or cross over based on criteria established by designer
  - Follows the Darwinian Evolution Model (Survival of the Fittest)
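
The select, cross over and mutate loop can be sketched on a toy problem. Individuals here are bit strings and fitness is the number of 1 bits, standing in for trees and an optimality criterion; tournament selection, one-point crossover, the mutation rate and keeping one elite copy per generation are all design choices assumed for this sketch, not taken from the slides.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Run a small genetic algorithm; return (best individual, initial best fitness)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        elite = max(population, key=fitness)
        next_gen = [elite[:]]                       # survival of the fittest
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random individuals.
            mother = max(rng.sample(population, 3), key=fitness)
            father = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, length)          # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), initial_best

best, initial_best = evolve(fitness=sum)
print(sum(best), initial_best)
```

Applying this to phylogenetics mainly requires tree-specific operators: a mutation that rearranges a branch and a crossover that combines subtrees from two parents while keeping every taxon exactly once.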

Resources: Databases
- Overwhelming amount of information available
- As of 1998, over 200 databases
- Some have well over 1,000,000 entries
- Includes sequences and metadata
- Most freely available over web

Resources: Databases
- EpoDB
  - Used for study of gene regulation of blood
  - Organized by gene, not structure
  - 10,000 entries
- GenBank
  - Operated by NIH
  - Over 18,000,000 records
  - Contains info on all publicly available DNA sequences
  - Flat file structure

Resources: Databases
- GeneCards
  - Focus on medical aspects of genetics
  - Uses metadata
  - Provides efficient navigation system to other databases
- The Genome Database
  - Official database for HGI
  - Information includes maps of gene locations, genetic structure and variations.

Resources: Databases
- PIR – International Protein Sequence Database
  - Oldest database of molecular sequence info
  - Begun in the 1960s (paper based)
  - Info on protein sequences, functional and structural properties and phylogeny
- SWISS-PROT
  - Protein database (90,000 entries)
  - Links to other databases
  - Most often cited

Resources: Tools
- Search engines
- Programming languages for structured queries
- Phylogenetic Tree Analysis tools

Resources: Tools
- BLAST (Basic Local Alignment Search Tool)
  - Dominant search engine for biological sequence databases.
  - Uses an algorithm that concentrates on finding regions of high local similarity and then attempts to extend the match over adjacent areas.
  - Provides an estimate of the statistical significance of sequence matches.
  - Various versions
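
The "find a region of high local similarity, then extend it" idea can be sketched as a simplified seed-and-extend search. This is not the BLAST implementation: it uses exact word matches as seeds, ungapped extension with a simple drop-off rule, and no significance statistics. The function names, scoring values and test sequences are hypothetical.

```python
def extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Extend without gaps from (qi, sj) in direction step; return (gain, length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break                       # score fell too far below the best: stop
        qi += step
        sj += step
    return best, best_len

def best_local_match(query, subject, k=4, match=1, mismatch=-1, x_drop=2):
    """Return (score, query segment, subject segment) of the best ungapped hit."""
    # Index every word of length k in the subject sequence.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    best = (0, "", "")
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):     # exact word hit = seed
            right, r_len = extend(query, subject, i + k, j + k, 1,
                                  match, mismatch, x_drop)
            left, l_len = extend(query, subject, i - 1, j - 1, -1,
                                 match, mismatch, x_drop)
            score = k * match + left + right
            if score > best[0]:
                best = (score,
                        query[i - l_len:i + k + r_len],
                        subject[j - l_len:j + k + r_len])
    return best

print(best_local_match("TTGACCGTAAGG", "CCCGACCGTAAGTTT"))
```

Indexing short words is what makes this approach fast: only positions that share a word with the query are ever examined, instead of aligning the query against every position of every database sequence.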

Resources: Tools
- Entrez
  - Search and retrieval system at National Center for Biotechnology Information
  - Searches all databases at NCBI for information on nucleotide and protein sequences, macromolecular structures and whole genomes.
  - User defined custom search strategies
  - Frequently cited

Resources: Tools
- Kleisli
  - Integrated data management system
  - Functional programming language (CPL)
  - Built-in data types, user extensible
  - Extends Flat and Relational DBs to OODB
  - Works with Sybase, ORACLE, Entrez & BLAST

Resources: Tools
- PHYLIP (Phylogeny Inference Package)
  - Collection of tools for developing trees
  - Works with proteins and genes
  - Uses branch and bound & branch swapping techniques.
  - Created in 1980 (lots of citations)
  - Freely available on web (both source code & executables)

Resources: Tools
- SMART (Simple Modular Architecture Research Tool)
  - Analyzes protein sequences
  - Can identify more than 400 structural families
  - Information on phylogeny, function and structure
  - Uses Hidden Markov Models
  - Web-based

Research: Human Genome Project
- Requires identifying and decoding 35,000 genes
- From 2,000 – 2,000,000 base pairs per gene
- First draft (~90% of base pairs) in
- Recently published 4th chromosome map (87,000,000 base pairs)
- Expect to complete in April, 2003

Research: Other Work
- HIV-1 Genome Mutation Detection
- Link between Neuregulin-1 and Schizophrenia
- MLP and Cardiomyopathy Link

Research: Other Work
- Study to Identify Genetic & Environmental Disease Causes
- “in silico” Biology

Discussion
- What level of domain knowledge is needed for IT professionals working in Bioinformatics?
- What courses would be needed in a Bioinformatics curriculum?
- Is a Bioethics course needed for IT professionals working in the field?