Bioinformatics, Timothy Ketcham, Union College Graduate Seminar 2003

1 Bioinformatics
Timothy Ketcham
Union College Graduate Seminar 2003

2 Agenda (Introduction)
- What is Bioinformatics?
- Goals
- Molecular Biology – Genes & Proteins
- AI Techniques applied to Gene & Protein Studies
- Molecular Biology – Phylogenetic Trees
- CS Techniques applied to Tree Estimation
- Databases
- Tools
- Results
- Discussion

3 What is Bioinformatics? (Introduction)
- Entire field of Computational Biology?
- Computational Molecular Biology?
- Application of Computer Science to Genome Analysis?

4 What is Bioinformatics? (Introduction)
Definition: …conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics techniques” (derived from disciplines such as applied maths, computer science and statistics) to understand and organize the information associated with these molecules, on a large scale. In short, Bioinformatics is an information management system for molecular biology…

5 Goals
- Organizing existing biological data.
- Developing tools and techniques to mine the data.
- Using the data and tools for knowledge discovery.

6 Genetics (Molecular Biology)
- Genome
- Chromosomes
- Genes
- Nucleotides
- Base Pairs
- Key Point: The sequence of nucleotides in a gene determines its functions, and changes in the sequence can lead to major changes in those functions.

7 Proteins (Molecular Biology)
- Linear chains of amino acids
- Structural Components
  - Primary
  - Secondary
  - Tertiary
  - Quaternary
- Key Point: The four structural components, along with the chemical properties of the amino acids, determine the function of the protein.

8 Artificial Intelligence (Techniques)
- Components
  - Performance element
  - Learning element
  - Critic
- Training
- Testing
- Operation

9 Decision Trees (Techniques)
[Diagram: a decision tree that first splits on Attribute 1 (Conditions 1–3), then splits each branch on Attribute 2 (Conditions 1–2); every Attribute 2 Condition 1 leaf is Result 1 and every Condition 2 leaf is Result 2]

10 Decision Trees (Techniques)
[Diagram: the simplified tree, a single split on Attribute 2 where Condition 1 leads to Result 1 and Condition 2 leads to Result 2]
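Slides 9 and 10 suggest that when the result depends only on Attribute 2, the two-level tree can be reduced to a single split on Attribute 2. The slides do not name a splitting criterion; the sketch below assumes ID3-style information gain, and the rows are hypothetical toy data shaped like the slide 9 diagram.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attr`."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Grow a tree as nested dicts {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # pure node, or no attributes left
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


# Hypothetical data shaped like slide 9: the result depends only on attribute2.
ROWS = [{"attribute1": a1, "attribute2": a2,
         "result": "result1" if a2 == "condition1" else "result2"}
        for a1 in ("condition1", "condition2", "condition3")
        for a2 in ("condition1", "condition2")]

tree = id3(ROWS, ["attribute1", "attribute2"], "result")
```

Here attribute2 has a gain of 1 bit and attribute1 a gain of (essentially) 0, so the learned tree is the single split of slide 10: `{"attribute2": {"condition1": "result1", "condition2": "result2"}}`.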

11 Neural Networks (Techniques)
[Diagram: Attributes 1–5 feed a node whose activation function f_act produces the Decision]
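Slide 11 shows five attributes feeding a decision through an activation function f_act. A minimal sketch, assuming a single-layer perceptron with a hard-threshold f_act and the classic perceptron learning rule (the slide specifies neither); the training set, "decide 1 when at least three of the five attributes are present", is hypothetical.

```python
from itertools import product


def f_act(z):
    """Hypothetical activation: a hard threshold that fires only when z > 0."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the attributes plus a bias, passed through f_act."""
    return f_act(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, n_inputs, epochs=1000, rate=1):
    """Perceptron rule: nudge the weights toward each misclassified example."""
    weights, bias = [0] * n_inputs, 0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Hypothetical training set over five binary attributes.
SAMPLES = [(x, int(sum(x) >= 3)) for x in product((0, 1), repeat=5)]
weights, bias = train(SAMPLES, n_inputs=5)
```

Because this target is linearly separable, the perceptron convergence theorem guarantees training stops with every example correct; a target such as XOR would need a hidden layer.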

12 Belief Networks (Techniques)
[Diagram: a network linking Attributes 1–3 to Results 1–3, with edge probabilities p = 0.3, 0.7, 0.3, 0.4 and 0.5]
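The probabilities on slide 12 cannot be tied to specific edges from the transcript, so the network below is hypothetical: two attribute nodes with prior probabilities and one result node whose probability depends on both. The sketch uses inference by enumeration (sum the joint distribution over every assignment consistent with the evidence), which is exact but exponential in the number of nodes, so it only suits tiny networks.

```python
from itertools import product

# Hypothetical network: Attribute1 and Attribute2 have no parents; Result depends on both.
# Each node maps to (parents, table), where table[parent values] = P(node is True | parents).
NETWORK = {
    "Attribute1": ((), {(): 0.3}),
    "Attribute2": ((), {(): 0.6}),
    "Result": (("Attribute1", "Attribute2"),
               {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.4, (False, False): 0.1}),
}


def joint(assignment):
    """P(full assignment) as the product of each node's conditional probability."""
    p = 1.0
    for node, (parents, table) in NETWORK.items():
        p_true = table[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(var, evidence):
    """P(var is True | evidence), by enumerating every full assignment."""
    nodes = list(NETWORK)
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(assignment)
        denominator += p
        if assignment[var]:
            numerator += p
    return numerator / denominator
```

With these made-up numbers, `query("Result", {})` is 0.442 and `query("Attribute1", {"Result": True})` is 0.246 / 0.442, about 0.557: observing the result raises the belief in Attribute 1 from 0.3.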

13 Hidden Markov Models (Techniques)
[Diagram: hidden Markov model with Match, Delete and Insert states]
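The match, delete and insert states on slide 13 are the layout used by profile HMMs. A full profile HMM is too large for a short example, so the sketch below swaps in a generic two-state HMM over DNA (states and probabilities are hypothetical) and shows the forward algorithm, which sums over all hidden-state paths to get the probability of an observed sequence. Profile-HMM scoring uses a similar dynamic-programming recurrence over a larger state graph.

```python
# Hypothetical two-state model: "H" (GC-rich) and "L" (AT-rich) regions.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5},
         "L": {"H": 0.4, "L": 0.6}}
EMIT = {"H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}


def forward(seq):
    """P(seq | model), summed over every hidden-state path."""
    # alpha[s] = P(symbols seen so far, current state is s)
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        alpha = {s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())
```

For example, `forward("GC")` is 0.0615 up to floating-point rounding; the work grows linearly with sequence length instead of exponentially with the number of paths.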

14 Phylogenetic Trees (Molecular Biology)
- Used to map evolutionary relationships
- Traditionally done at the organism level
- Mapping at the molecular level can help evaluate the relationships and/or evolution of genetic structures, proteins or organisms

15 Tree Estimation (Techniques)
- Number of trees (T) for a given number of taxa (n)
- T increases very rapidly (on the order of 10^8 trees for 11 taxa)
- Need efficient search methods
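The growth described on slide 15 has a closed form: for n taxa there are (2n-3)!! = 1 × 3 × 5 × … × (2n-3) rooted bifurcating trees and (2n-5)!! unrooted ones. The slides do not say which count they use; the rooted count is about 6.5 × 10^8 for 11 taxa and about 8.2 × 10^21 for 20 taxa, in line with the figures on slides 15 and 17.

```python
def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of distinct rooted, bifurcating trees on n >= 2 labeled taxa."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of distinct unrooted, bifurcating trees on n >= 3 labeled taxa."""
    return double_factorial(2 * n - 5)
```

For instance, `rooted_trees(11)` returns 654,729,075 and `unrooted_trees(11)` returns 34,459,425; each extra taxon multiplies the count by roughly 2n.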

16 Exhaustive Search (Techniques)
- Brute-force method
- Algorithm
  - Create all possible trees
  - Evaluate each against the optimality criteria
  - Select the best tree
- Only used for up to 11 taxa
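Slide 16 leaves the optimality criterion open. The sketch below assumes maximum parsimony scored with Fitch's algorithm, represents a rooted tree as nested pairs of taxon names, and creates every tree by attaching each new taxon to every branch of each smaller tree. The four-taxon alignment is hypothetical.

```python
def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    if common:
        return common, lc + rc
    return left | right, lc + rc + 1  # children disagree: one extra change


def parsimony(tree, seqs):
    """Total number of changes the tree needs, summed over all columns."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


def add_taxon(tree, taxon):
    """Yield every rooted tree made by attaching `taxon` to one branch of `tree`,
    including a new branch above the current root."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_taxon(left, taxon):
            yield (new_left, right)
        for new_right in add_taxon(right, taxon):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Create all possible rooted binary trees on the given taxa."""
    trees = [(taxa[0], taxa[1])]
    for taxon in taxa[2:]:
        trees = [t for base in trees for t in add_taxon(base, taxon)]
    return trees


def exhaustive_search(seqs):
    """Evaluate every tree against the criterion and keep the best one."""
    trees = all_rooted_trees(sorted(seqs))
    best = min(trees, key=lambda t: parsimony(t, seqs))
    return best, parsimony(best, seqs)


# Hypothetical aligned sequences, one per taxon.
SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}
best_tree, best_score = exhaustive_search(SEQS)
```

The best score here is 4 (a tree grouping A with B), while a tree pairing A with C needs 8 changes. Already at five taxa `all_rooted_trees` builds 105 trees, which is why this approach stops being practical so early.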

17 Branch and Bound (Techniques)
- Effectively used for problems involving fewer than 20 taxa (approximately 10^22 trees)
- Algorithm
  - Establish minimally acceptable criteria
  - Evaluate all n-taxa trees; discard the ones not meeting the criteria
  - Evaluate (n+1)-taxa trees, using the remaining 4-taxa trees as bases
  - Repeat until all taxa have been evaluated
  - Select the optimal remaining tree
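A sketch of the slide 17 steps, assuming maximum parsimony (Fitch scoring), trees as nested pairs of taxon names, and a hypothetical alignment. The "minimally acceptable criterion" is taken to be an upper bound on the score, here the score of an arbitrary complete tree. Discarding is safe because adding taxa to a tree never lowers its parsimony score, so a partial tree already over the bound cannot grow into the optimum.

```python
def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)


def parsimony(tree, seqs):
    """Changes needed by `tree`; `seqs` may also hold taxa not yet in the tree."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


def add_taxon(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_taxon(left, taxon):
            yield (new_left, right)
        for new_right in add_taxon(right, taxon):
            yield (left, new_right)


def branch_and_bound(seqs, bound):
    """Grow trees one taxon at a time, discarding any tree that scores above `bound`.
    `bound` should be the score of some complete tree so the optimum always survives."""
    taxa = sorted(seqs)
    survivors = [(taxa[0], taxa[1])]
    for taxon in taxa[2:]:
        survivors = [t for base in survivors for t in add_taxon(base, taxon)
                     if parsimony(t, seqs) <= bound]
    best = min(survivors, key=lambda t: parsimony(t, seqs))
    return best, parsimony(best, seqs)


SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}  # hypothetical alignment

# Minimally acceptable criterion: the score of a simple "caterpillar" tree (((A,B),C),D).
taxa = sorted(SEQS)
caterpillar = taxa[0]
for taxon in taxa[1:]:
    caterpillar = (caterpillar, taxon)
best_tree, best_score = branch_and_bound(SEQS, parsimony(caterpillar, SEQS))
```

Production implementations usually tighten the bound whenever a better complete tree is found, which prunes far more of the search than this fixed-bound version.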

18 Branch Swapping (Techniques)
- Used in most phylogenetic tree estimates
- Algorithm
  - Construct trees with n taxa
  - Discard all but the optimal tree
  - Rearrange branches of the optimal tree to check for a more optimal arrangement
  - The best tree becomes the base for n+1 taxa
  - Repeat for n+1 taxa
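A sketch of the slide 18 loop, assuming maximum parsimony (Fitch scoring), trees as nested pairs of taxon names, and a hypothetical alignment. The branch rearrangement used here is nearest-neighbor interchange (NNI) on rooted trees, the simplest kind of swap; larger moves such as subtree pruning and regrafting are not implemented.

```python
def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)


def parsimony(tree, seqs):
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


def add_taxon(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_taxon(left, taxon):
            yield (new_left, right)
        for new_right in add_taxon(right, taxon):
            yield (left, new_right)


def nni_neighbors(tree):
    """Yield every tree one rooted nearest-neighbor interchange away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(right, str):  # swap `left` with either child of `right`
        c, d = right
        yield (c, (left, d))
        yield (d, (c, left))
    if not isinstance(left, str):  # swap `right` with either child of `left`
        a, b = left
        yield ((a, right), b)
        yield ((right, b), a)
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def swap_until_stuck(tree, seqs):
    """Hill-climb: apply any NNI that lowers the score until none does."""
    score = parsimony(tree, seqs)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(tree):
            candidate_score = parsimony(candidate, seqs)
            if candidate_score < score:
                tree, score, improved = candidate, candidate_score, True
                break
    return tree, score


def stepwise_with_swapping(seqs):
    taxa = sorted(seqs)
    tree = (taxa[0], taxa[1])
    for taxon in taxa[2:]:
        # Construct all trees with one more taxon and keep only the best one ...
        tree = min(add_taxon(tree, taxon), key=lambda t: parsimony(t, seqs))
        # ... then rearrange its branches looking for a better arrangement.
        tree, _ = swap_until_stuck(tree, seqs)
    return tree, parsimony(tree, seqs)


SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}  # hypothetical alignment
best_tree, best_score = stepwise_with_swapping(SEQS)
```

Because each step keeps only one tree and accepts only improving swaps, the search can stop at a tree that no single NNI improves, which is exactly the local-optimum risk raised on slide 20.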

19 Divide and Conquer (Techniques)
- Subdivides the problem, finding optimal sub-trees and combining them into a super-tree
- Algorithm
  - Select a subset size (less than n)
  - Divide the taxa into subsets
  - Find optimal trees for each subset of taxa
  - Combine the optimal sub-trees into a super-tree with all taxa
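A deliberately naive sketch of the divide / solve / combine shape on slide 19, assuming maximum parsimony (Fitch scoring), trees as nested pairs, and a hypothetical alignment. The taxa are split into disjoint consecutive groups, each group is solved exhaustively, and the sub-trees are simply joined under new root nodes. Real supertree methods merge overlapping sub-trees far more carefully, so this combine step is an illustrative assumption, not the method the slide refers to.

```python
def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)


def parsimony(tree, seqs):
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


def add_taxon(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_taxon(left, taxon):
            yield (new_left, right)
        for new_right in add_taxon(right, taxon):
            yield (left, new_right)


def all_rooted_trees(taxa):
    trees = [(taxa[0], taxa[1])]
    for taxon in taxa[2:]:
        trees = [t for base in trees for t in add_taxon(base, taxon)]
    return trees


def divide_and_conquer(seqs, subset_size):
    taxa = sorted(seqs)
    # Divide: consecutive groups of at most `subset_size` taxa.
    groups = [taxa[i:i + subset_size] for i in range(0, len(taxa), subset_size)]
    subtrees = []
    for group in groups:
        if len(group) == 1:
            subtrees.append(group[0])
        else:
            # Conquer: exhaustive search is affordable on a small subset.
            sub_seqs = {t: seqs[t] for t in group}
            subtrees.append(min(all_rooted_trees(group),
                                key=lambda t: parsimony(t, sub_seqs)))
    # Combine (naively): join the sub-trees under new root nodes, left to right.
    supertree = subtrees[0]
    for sub in subtrees[1:]:
        supertree = (supertree, sub)
    return supertree, parsimony(supertree, seqs)


SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}  # hypothetical alignment
supertree, score = divide_and_conquer(SEQS, subset_size=2)
```

With subsets {A, B} and {C, D} the naive merge happens to give the optimal tree `(("A", "B"), ("C", "D"))` with score 4; with a different grouping it can easily miss the global optimum.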

20 Problem (Techniques)
The heuristic methods above (branch swapping, divide and conquer) may find a locally optimal tree rather than the globally optimal tree; only Exhaustive Search and Branch and Bound are guaranteed to find the global optimum.

21 Stochastic Methods (Techniques)
- Simulated Annealing algorithm
  - Create trees for n taxa (based on other methods)
  - Evaluate them against the optimality criteria and select the best
  - Evaluate the remaining trees using other parameters (the “cooling schedule”)
  - The tree retained is the one that best meets both the optimality criteria and the cooling schedule
  - Allows retention of a less optimal tree in some cases, which may lead to a better global result
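Slide 21 describes the cooling schedule loosely. In the common formulation sketched here, a random neighboring tree is always accepted if it is no worse and is accepted with probability exp(-Δ/T) if it is worse by Δ, while the temperature T shrinks geometrically. The sketch assumes Fitch parsimony as the criterion, rooted nearest-neighbor interchange (NNI) as the move, and a hypothetical alignment; `t0`, `cooling` and `steps` are illustrative, untuned values.

```python
import math
import random


def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)


def parsimony(tree, seqs):
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))


def nni_neighbors(tree):
    """Yield every tree one rooted nearest-neighbor interchange away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(right, str):  # swap `left` with either child of `right`
        c, d = right
        yield (c, (left, d))
        yield (d, (c, left))
    if not isinstance(left, str):  # swap `right` with either child of `left`
        a, b = left
        yield ((a, right), b)
        yield ((right, b), a)
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def simulated_annealing(start, seqs, t0=2.0, cooling=0.95, steps=2000, seed=0):
    """Random NNI walk that sometimes accepts worse trees, less often as T cools.
    `start` needs at least three taxa so that it has NNI neighbors."""
    rng = random.Random(seed)
    current, current_score = start, parsimony(start, seqs)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = rng.choice(list(nni_neighbors(current)))
        delta = parsimony(candidate, seqs) - current_score
        # Take any non-worse move; take a worse one with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = current, current_score
        temperature = max(temperature * cooling, 1e-3)  # geometric cooling schedule
    return best, best_score


SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}  # hypothetical alignment
best_tree, best_score = simulated_annealing((("A", "C"), ("B", "D")), SEQS)
```

Tracking the best tree seen separately from the current one means an accepted uphill move never loses a good tree already found.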

22 Stochastic Methods (Techniques)
- Genetic Algorithm
  - Create trees for n taxa (based on other methods)
  - Select a population of trees to proceed to the next generation
  - Allow trees to mutate or cross over based on criteria established by the designer
  - Follows the Darwinian evolution model (survival of the fittest)
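A compact sketch of the slide 22 loop, under stated assumptions: fitness is Fitch parsimony on a hypothetical alignment, the better half of the population survives each generation, mutation is a random rooted nearest-neighbor interchange, and crossover grafts a random clade of one parent onto the other parent after deleting that clade's taxa from it. Population size, generation count and mutation rate are illustrative values, and this crossover is one simple choice among several used for trees.

```python
import random

def fitch_site(tree, seqs, i):
    """Fitch pass for alignment column i; returns (candidate states, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (left, lc), (right, rc) = (fitch_site(child, seqs, i) for child in tree)
    common = left & right
    return (common, lc + rc) if common else (left | right, lc + rc + 1)

def parsimony(tree, seqs):
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(length))

def add_taxon(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        for new_left in add_taxon(tree[0], taxon):
            yield (new_left, tree[1])
        for new_right in add_taxon(tree[1], taxon):
            yield (tree[0], new_right)

def nni_neighbors(tree):
    """Yield every tree one rooted nearest-neighbor interchange away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(right, str):
        c, d = right
        yield (c, (left, d))
        yield (d, (c, left))
    if not isinstance(left, str):
        a, b = left
        yield ((a, right), b)
        yield ((right, b), a)
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)

def leaves(tree):
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])

def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0])
        yield from subtrees(tree[1])

def prune(tree, drop):
    """Remove the taxa in `drop`, collapsing one-child nodes; None if nothing is left."""
    if isinstance(tree, str):
        return None if tree in drop else tree
    left, right = prune(tree[0], drop), prune(tree[1], drop)
    if left is None or right is None:
        return right if left is None else left
    return (left, right)

def crossover(p1, p2, rng):
    """Graft a random clade of p1 onto p2 after removing that clade's taxa from p2."""
    clade = rng.choice(list(subtrees(p1)))
    rest = prune(p2, leaves(clade))
    return clade if rest is None else (clade, rest)

def random_tree(taxa, rng):
    tree = (taxa[0], taxa[1])
    for taxon in taxa[2:]:
        tree = rng.choice(list(add_taxon(tree, taxon)))
    return tree

def genetic_algorithm(seqs, pop_size=20, generations=30, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    population = [random_tree(sorted(seqs), rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: parsimony(t, seqs))
        parents = population[:pop_size // 2]  # survival of the fittest
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < mutation_rate:
                child = rng.choice(list(nni_neighbors(child)))
            children.append(child)
        population = parents + children
    best = min(population, key=lambda t: parsimony(t, seqs))
    return best, parsimony(best, seqs)

SEQS = {"A": "AAGG", "B": "AAGG", "C": "TTCC", "D": "TTCC"}  # hypothetical alignment
best_tree, best_score = genetic_algorithm(SEQS)
```

Every child still contains all taxa, because the grafted clade supplies exactly the taxa that were pruned from the other parent.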

23 Databases (Resources)
- Overwhelming amount of information available
- As of 1998, over 200 databases
- Some have well over 1,000,000 entries
- Includes sequences and metadata
- Most freely available over the web

24 Databases (Resources)
- EpoDB
  - Used to study gene regulation in blood cells
  - Organized by gene, not structure
  - 10,000 entries
- GenBank
  - Operated by NIH
  - Over 18,000,000 records
  - Contains info on all publicly available DNA sequences
  - Flat file structure

25 Databases (Resources)
- GeneCards
  - Focus on medical aspects of genetics
  - Uses metadata
  - Provides an efficient navigation system to other databases
- The Genome Database
  - Official database for HGI
  - Information includes maps of gene locations, genetic structure and variations

26 Databases (Resources)
- PIR (International Protein Sequence Database)
  - Oldest database of molecular sequence information
  - Begun in the 1960s (paper based)
  - Information on protein sequences, functional and structural properties, and phylogeny
- SWISS-PROT
  - Protein database (90,000 entries)
  - Links to other databases
  - Most often cited

27 Tools (Resources)
- Search engines
- Programming languages for structured queries
- Phylogenetic tree analysis tools

28 Tools (Resources)
- BLAST (Basic Local Alignment Search Tool)
  - Dominant search engine for biological sequence databases
  - Uses an algorithm that first finds regions of high local similarity and then tries to extend each match over the adjacent regions
  - Provides an estimate of the statistical significance of sequence matches
  - Various versions
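A minimal sketch of the seed-and-extend idea described on slide 28, assuming DNA, exact-match seed words of length `w`, +1/-2 match/mismatch scores, and an ungapped X-drop extension; all of these are illustrative choices, not BLAST's actual parameters. Real BLAST programs add neighborhood words for proteins, substitution matrices, gapped extension and the significance estimates mentioned on the slide, none of which appear here.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -2  # illustrative DNA scores, not BLAST's defaults


def word_index(seq, w):
    """Map every length-w word of `seq` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, db, qi, di, step, xdrop):
    """Ungapped extension from (qi, di) in direction `step` (+1 or -1).
    Returns (best score gained, number of positions needed to reach it)."""
    score = best = best_len = steps = 0
    while 0 <= qi < len(query) and 0 <= di < len(db):
        score += MATCH if query[qi] == db[di] else MISMATCH
        steps += 1
        if score > best:
            best, best_len = score, steps
        elif best - score > xdrop:
            break  # X-drop: the score fell too far below its best, so stop
        qi += step
        di += step
    return best, best_len


def seed_and_extend(query, db, w=4, xdrop=3):
    """Return ungapped hits as (score, query start, db start, length), best first."""
    index = word_index(db, w)
    hits = set()  # several seeds inside one region extend to the same hit
    for qi in range(len(query) - w + 1):
        for di in index.get(query[qi:qi + w], []):  # exact word match = seed
            right, r_len = extend(query, db, qi + w, di + w, +1, xdrop)
            left, l_len = extend(query, db, qi - 1, di - 1, -1, xdrop)
            hits.add((w * MATCH + left + right, qi - l_len, di - l_len,
                      w + l_len + r_len))
    return sorted(hits, reverse=True)
```

For example, `seed_and_extend("GATTACA", "CCCGATTACACCC")` finds the full 7-letter match starting at position 3 of the database string: four different seeds extend to the same hit, `(7, 0, 3, 7)`.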

29 Tools (Resources)
- Entrez
  - Search and retrieval system at the National Center for Biotechnology Information (NCBI)
  - Searches all NCBI databases for information on nucleotide and protein sequences, macromolecular structures and whole genomes
  - User-defined custom search strategies
  - Frequently cited

30 Tools (Resources)
- Kleisli
  - Integrated data management system
  - Functional programming language (CPL)
  - Built-in data types (user extensible)
  - Extends flat and relational DBs to OODB
  - Works with Sybase, ORACLE, Entrez & BLAST

31 Tools (Resources)
- PHYLIP (Phylogeny Inference Package)
  - Collection of tools for developing trees
  - Works with proteins and genes
  - Uses branch and bound & branch swapping techniques
  - Created in 1980 (lots of citations)
  - Freely available on the web (both source code & executables)

32 Tools (Resources)
- SMART (Simple Modular Architecture Research Tool)
  - Analyzes protein sequences
  - Can identify more than 400 structural families
  - Information on phylogeny, function and structure
  - Uses Hidden Markov Models
  - Web-based

33 Human Genome Project (Research)
- Requires identifying and decoding 35,000 genes
- From 2,000 to 2,000,000 base pairs per gene
- First draft (~90% of base pairs) in 2001
- Recently published 4th chromosome map (87,000,000 base pairs)
- Expect to complete in April 2003

34 Other Work (Research)
- HIV-1 Genome Mutation Detection
- Link between Neuregulin-1 and Schizophrenia
- MLP and Cardiomyopathy Link

35 Other Work (Research)
- Study to Identify Genetic & Environmental Disease Causes
- “in silico” Biology

36 Discussion
- What level of domain knowledge is needed for IT professionals working in Bioinformatics?
- What courses would be needed in a Bioinformatics curriculum?
- Is a Bioethics course needed for IT professionals working in the field?

