Bioinformatics Research Overview (Li Liao)

1 Bioinformatics Research Overview
Li Liao
Develop new algorithms and (statistical) learning methods
> Capable of incorporating domain knowledge
> Effective, Expressive, Interpretable

2 Motivations
Understanding correlations between genotype and phenotype
Predicting genotype → phenotype
Phenotypes:
– Protein function
– Drug/therapy response
– Drug-drug interactions for expression
– Drug mechanism
– Interacting pathways of metabolism

3 Projects
– Homology detection, protein family classification (funded by a DuPont S&E award)
  > Support Vector Machines
  > Hidden Markov models
  > Graph theoretic methods
– Probabilistic modeling for BioSequence (funded by NIH)
  > HMMs, and beyond
  > Motif finding
  > Secondary structure
– Comparative Genomics
  > Identify genome features for diagnostic and therapeutic purposes (funded by an Army grant)
  > Evolution of metabolic pathways
  > Tree and graph comparisons

4 Detect remote homologues
Attributes to be looked at:
- Sequence similarity, aggregate statistics (e.g., protein families), pattern/motif, and more attributes (presence in a phylogenetic tree).
How to incorporate domain-specific knowledge into the model so that a classifier can be more accurate?
Results:
- Quasi-consensus based comparison of profile HMM for protein sequences (submitted to Bioinformatics)
- Using extended phylogenetic profiles and support vector machines for protein family classification (SNPD 04)
- Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships (JCB 2003)

5 Support Vector Machines

6 Data: phylogenetic profiles
[Figure: binary phylogenetic profiles x, y, z compared under two measures, Hamming distance (3 and 3) and tree-based distance (0.5 and 0.1)]
- How to account for correlations among profile components? → profile extension (Narra & Liao, SNPD 04)
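To make the Hamming-distance side of this comparison concrete, here is a minimal sketch. The three profiles are made-up illustrative vectors, not the ones from the slide, and the tree-based distance is not implemented because the transcript does not define it.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length binary profiles differ."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(x, y))


# Hypothetical presence/absence profiles over six genomes (illustration only).
x = (1, 1, 1, 1, 0, 1)
y = (1, 0, 1, 0, 1, 1)
z = (0, 1, 1, 1, 1, 0)

print(hamming_distance(x, y))  # 3
print(hamming_distance(x, z))  # 3
```

Both pairs are three mismatches apart, yet the mismatches fall on different genomes. Hamming distance ignores how those genomes are related, which is the kind of correlation among profile components that the slide asks about.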

7 Quasi-consensus based comparison of HMMs
– From MSA to profile HMMs using existing packages (SAM-T99 or HMMER)
– Generation of a quasi-consensus sequence from the model
– Alignment of the consensus sequence of one model with the other model
– Extraction of two alignments, one in each direction
[Figure: seed alignments 1 and 2 are turned into profile HMMs M1 and M2; consensus 1 (V G A N V A E H) is generated from M1 and consensus 2 (V K A T I A E H) from M2; each consensus is aligned against the other model, giving scores S(c1|M2) and S(c2|M1) and alignments Aln 12 and Aln 21]
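As a rough illustration of the consensus-generation step, the sketch below derives a consensus directly from alignment columns by majority vote. This is a simplified stand-in, not the quasi-consensus procedure of the slide, which works from a trained profile HMM (SAM-T99 or HMMER); the function name, the gap threshold, and the toy alignment are all assumptions.

```python
from collections import Counter


def column_consensus(msa, gap="-", max_gap_fraction=0.5):
    """Majority-residue consensus of an alignment.

    Columns in which more than max_gap_fraction of the rows are gaps are
    skipped, loosely mimicking insert columns that get no match state.
    """
    if not msa:
        return ""
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*msa):
        residues = [c for c in column if c != gap]
        n_gaps = len(column) - len(residues)
        if not residues or n_gaps > max_gap_fraction * len(column):
            continue  # gap-dominated column: contributes nothing
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment (illustrative only, not the seed alignment from the slide).
toy_msa = ["VGA-H", "VGA-H", "VKA-N", "V-ATH"]
print(column_consensus(toy_msa))  # VGAH
```

In the workflow on the slide, the consensus of one model would then be aligned against the other model, once in each direction.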

8

9 Sequence Models (HMMs and beyond)
Motivations:
What is responsible for the function?
– Patterns/motifs
– Secondary structure
To capture long-range correlations of bio sequences
– Transporter proteins
– RNA secondary structure
Methods: generative versus discriminative
– Linear dependent processes
– Stochastic grammars
– Model equivalence

10 TMMOD: An improved hidden Markov model for predicting transmembrane topology (to appear in IEEE ICTAI04)
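For readers unfamiliar with how an HMM assigns a topology label to each residue, here is a minimal Viterbi decoding sketch on a toy two-state model. It is not the TMMOD architecture: the states ("M" for membrane, "O" for non-membrane), the two-letter residue alphabet ("h" hydrophobic, "p" polar), and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs under a discrete HMM (log space)."""
    if not obs:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at t
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy parameters (not taken from TMMOD).
STATES = ("M", "O")
START = {"M": 0.5, "O": 0.5}
TRANS = {"M": {"M": 0.9, "O": 0.1}, "O": {"M": 0.1, "O": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "O": {"h": 0.2, "p": 0.8}}

print("".join(viterbi("ppphhhhppp", STATES, START, TRANS, EMIT)))  # OOOMMMMOOO
```

The decoded path labels the hydrophobic stretch as membrane and the flanking polar residues as non-membrane. A real topology predictor uses many more states to separate inside loops, outside loops, and helix caps.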

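11
Columns: Mod., Reg., Data set, Correct topology, Correct location, Sensitivity, Specificity

TMMOD 1 (a) (b) (c), S-83
  Correct topology: 65 (78.3%), 51 (61.4%), 64 (77.1%)
  Correct location: 67 (80.7%), 52 (62.7%), 65 (78.3%)
  Sensitivity: 97.4%, 71.3%, 97.1%
  Specificity: 97.4%, 71.3%, 97.1%
TMMOD 2 (a) (b) (c), S-83
  Correct topology: 61 (73.5%), 54 (65.1%), 65 (78.3%)
  Correct location: 61 (73.5%), 66 (79.5%)
  Sensitivity: 99.4%, 93.8%, 99.7%
  Specificity: 97.4%, 71.3%, 97.1%
TMMOD 3 (a) (b) (c), S-83
  Correct topology: 70 (84.3%), 64 (77.1%), 74 (89.2%)
  Correct location: 71 (85.5%), 65 (78.3%), 74 (89.2%)
  Sensitivity: 98.2%, 95.3%, 99.1%
  Specificity: 97.4%, 71.3%, 97.1%
TMHMM, S-83
  Correct topology: 64 (77.1%)
  Correct location: 69 (83.1%)
  Sensitivity: 96.2%
PHDtm, S-83
  Correct topology: (85.5%)
  Correct location: (88.0%)
  Sensitivity: 98.8%
  Specificity: 95.2%

TMMOD 1 (a) (b) (c), S-160
  Correct topology: 117 (73.1%), 92 (57.5%), 117 (73.1%)
  Correct location: 128 (80.0%), 103 (64.4%), 126 (78.8%)
  Sensitivity: 97.4%, 77.4%, 96.1%
  Specificity: 97.0%, 80.8%, 96.7%
TMMOD 2 (a) (b) (c), S-160
  Correct topology: 120 (75.0%), 97 (60.6%), 118 (73.8%)
  Correct location: 132 (82.5%), 121 (75.6%), 135 (84.4%)
  Sensitivity: 98.4%, 97.7%, 98.4%
  Specificity: 97.2%, 95.6%, 97.2%
TMMOD 3 (a) (b) (c), S-160
  Correct topology: 120 (75.0%), 110 (68.8%), 135 (84.4%)
  Correct location: 133 (83.1%), 124 (77.5%), 143 (89.4%)
  Sensitivity: 97.8%, 94.5%, 98.3%
  Specificity: 97.6%, 98.1%
TMHMM, S-160
  Correct topology: 123 (76.9%)
  Correct location: 134 (83.8%)
  Sensitivity: 97.1%
  Specificity: 97.7%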

12 Genomics study of enterobacterial BT agents (funded by the US Army via Center for Biological Defense, USF)
Goals:
– Identification of genes and sequence tags as targets for novel diagnosis and therapy
– BT agents: Yersinia pestis, Salmonella, Escherichia coli O157:H7
Methods:
– Various bioinformatics tools and databases

13 Comparative Genomics
Motivation:
– Evolution of metabolic pathways
– Gene functions
– De novo (alternative pathways)
  > Genetic engineering
  > Drug discovery
Methods:
– Put data into a context: knowledge/data representation
  > Trees, graphs, etc.
– Learning models/methods

14 Profiling: attribute-value pairs
[Figure: binary (0/1) profile matrix with entries indexed by O1, O2, ..., Om and P1, ..., Pn]
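One way to read "attribute-value pairs" is that each profile is a fixed-order vector with one value per attribute. The sketch below builds such a vector from a sparse list of pairs. The labels P1 to P4 are borrowed from the figure only as placeholder names, and the helper and its default value are assumptions rather than the representation used in the project.

```python
def profile_from_pairs(pairs, attributes, default=0):
    """Turn (attribute, value) pairs into a vector ordered like `attributes`.

    Attributes that are not mentioned in `pairs` get the default value.
    """
    values = dict(pairs)
    unknown = set(values) - set(attributes)
    if unknown:
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    return [values.get(name, default) for name in attributes]


# Hypothetical attribute labels and pairs (illustration only).
attributes = ["P1", "P2", "P3", "P4"]
o1 = profile_from_pairs([("P1", 1), ("P3", 1)], attributes)
print(o1)  # [1, 0, 1, 0]
```

Stacking one such vector per object gives a 0/1 matrix like the one in the figure.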

15 What we found:
– An informative way to compare genomes
– The majority of pathways (or rather their enzyme components) evolve in congruence with species

16 What we do next:
– Database and search engine
– Off-line self-consistent iteration
– Pathways in a network
  > Graph comparisons
– Identify key components of networks
– Small world topology
  > Cross-level interactions with regulatory networks

