Kiyoko F. Aoki-Kinoshita Dept. of Bioinformatics, Soka University

Presentation transcript:

Glycome Informatics for the Practical Application of Computational Models
  Kiyoko F. Aoki-Kinoshita
  Dept. of Bioinformatics, Soka University (formerly of Bioinformatics Center, Kyoto University)
  Frontiers in Glycomics: Bioinformatics and Biomarkers in Disease
  September 11-13, 2006

Another vocabulary term: Glycome Informatics
  Algorithms, methods and computational models for the study of the glycome
  (Glycome: the repertoire of glycans in a cell, tissue, or organism)

Informatics Techniques for Glycomics
  Mass spec prediction/annotation
    StrOligo (M. Ethier et al., Methods Mol Biol., 2006)
    Cartoonist (D. Goldberg et al., Proteomics, 2005)
    H. Tang et al., Bioinformatics, 2005
  Method using glycan arrays
  Structure prediction from glycosyltransferase expression data (S. Kawano et al., Bioinformatics, 2005)

Algorithmic Techniques for Glycome Informatics
  Computer Theoretic Algorithms for Trees
    KCaM: K.F. Aoki et al., NAR, 2004
    Score matrix for glycan linkages: K.F. Aoki et al., Bioinformatics, 2005
    Least common supertree approximation algorithm for reconstructing glycans from spectral data: K.F. Aoki-Kinoshita et al., ISAAC 2006
  Probabilistic Models
    PSTMM: N. Ueda et al., TKDE, 2005
    Profile PSTMM: K.F. Aoki-Kinoshita et al., ISMB 2006
    OTMM: Hashimoto et al., KDD 2006
  Kernel Methods
    Leukemia marker detection: Y. Hizukuri et al., Carbohydrate Research, 2005
    General purpose marker detection: T. Kuboyama et al., GIW 2006 (submitted)

  Proteins          Glycans
  Smith-Waterman    KCaM
  PAM/BLOSUM        Glycan Score Matrix
  (Profile) HMM     (Profile) PSTMM
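
The protein/glycan pairing above rests on one difference: proteins are linear sequences, while glycans are branched trees, so comparison has to work on trees. The following is a minimal sketch of that idea, not KCaM itself. It assumes a hypothetical `Node` type (residue name, linkage to the parent, children) and a hypothetical scoring rule of one point per node whose residue and linkage both match. It only compares two trees anchored at their roots, and it tries every pairing of children by brute force, which is reasonable only because glycan nodes have few branches.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Node:
    """Hypothetical glycan node: a residue plus the linkage to its parent."""
    residue: str            # e.g. "Man"
    linkage: str = ""       # e.g. "a1-3"; empty string at the root
    children: tuple = ()    # child Node objects


def common_subtree_size(a, b):
    """Number of nodes in the largest common subtree rooted at both a and b."""
    if a.residue != b.residue or a.linkage != b.linkage:
        return 0
    # Pair each child of the smaller family with a distinct child of the larger.
    small, large = sorted((a.children, b.children), key=len)
    best = 0
    for chosen in permutations(large, len(small)):
        total = sum(common_subtree_size(x, y) for x, y in zip(small, chosen))
        best = max(best, total)
    return 1 + best


# Illustrative structures (assumed for this example): an N-glycan core,
# and the same core with one extra GlcNAc on the a1-3 arm.
core = Node("GlcNAc", "", (
    Node("GlcNAc", "b1-4", (
        Node("Man", "b1-4", (
            Node("Man", "a1-3"),
            Node("Man", "a1-6"),
        )),
    )),
))
extended = Node("GlcNAc", "", (
    Node("GlcNAc", "b1-4", (
        Node("Man", "b1-4", (
            Node("Man", "a1-6"),
            Node("Man", "a1-3", (Node("GlcNAc", "b1-2"),)),
        )),
    )),
))

shared = common_subtree_size(core, extended)
```

Here `shared` counts the five core residues the two structures have in common, regardless of the order in which the branches are listed. A local alignment in the spirit of Smith-Waterman would additionally search over all pairs of starting nodes and use a graded score matrix instead of exact matches.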

Applications of Probabilistic Models
  Statistically compute the common patterns in tree structures
    Profile PSTMM (Probabilistic Sibling-dependent Tree Markov Model)
  Given binding affinity data for a specific lectin, compute the most likely structure being recognized
  Statistically compute the key patterns of sulfation in GAGs based on various biological measurements (e.g. inhibition)

Kernel Methods
  Machine learning method, e.g. Support Vector Machines (SVM)
  http://www-kairo.csce.kyushu-u.ac.jp/~norikazu/research.en.html
  Can handle features in high dimensions, e.g. expression data, pathway information, localization information, etc.
  Statistically computes commonalities by reducing the dimensions of the data
  Data classification
  Feature extraction
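
A kernel is a similarity function between two objects, and a kernel machine such as an SVM needs nothing else from the data. As a minimal sketch, assuming a hypothetical encoding of each glycan as a list of (child residue, linkage, parent residue) triples, the function below counts shared linkages, which is a dot product of linkage-count vectors. It illustrates the general idea only and is not the kernel used in the leukemia or general-purpose marker studies.

```python
from collections import Counter


def linkage_kernel(g1, g2):
    """Dot product of linkage-count vectors for two glycans."""
    c1, c2 = Counter(g1), Counter(g2)
    return sum(c1[edge] * c2[edge] for edge in c1.keys() & c2.keys())


def normalized_kernel(g1, g2):
    """Cosine-normalized kernel, so glycans of different sizes are comparable."""
    denom = (linkage_kernel(g1, g1) * linkage_kernel(g2, g2)) ** 0.5
    return linkage_kernel(g1, g2) / denom if denom else 0.0


# Illustrative glycans (assumed for this example), one triple per glycosidic bond.
glycan_a = [
    ("GlcNAc", "b1-4", "GlcNAc"),
    ("Man", "b1-4", "GlcNAc"),
    ("Man", "a1-3", "Man"),
    ("Man", "a1-6", "Man"),
]
glycan_b = [
    ("GlcNAc", "b1-4", "GlcNAc"),
    ("Man", "b1-4", "GlcNAc"),
    ("Man", "a1-3", "Man"),
    ("GlcNAc", "b1-2", "Man"),
]

similarity = normalized_kernel(glycan_a, glycan_b)
```

The two example glycans share three of their four linkages, so `similarity` is 0.75. A matrix of such values over a labeled set of glycans is the kind of input an SVM can use for classification.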

Leukemia-specific features
  Hizukuri et al., Carbohydr. Res. 340, 2270-2278 (2005)
  Used KEGG GLYCAN data: entries whose CarbBank annotations were related to leukemic cells, erythrocytes, plasma and serum
  Predicted possible glycan markers
  Correlated well with experimental data

Glyco-Databases & Resources: the Vision
  [Figure: diagram of databases connected by XML, including the Bacterial Carbohydrate Structure DataBase]

Resource for Glycome Informatics at Soka (RINGS)
  Goal: a publicly available web resource of tools for glycome analysis
  Started April 2006
  Currently based on KEGG GLYCAN, REACTION, ENZYME data
    Glycan structures
    Glycan interaction information
    Links proteins to related glycans
  3D protein data from PDB
  Currently available tools:
    BLAST server for nucleotide and protein sequences
    2D drawing tool written in Java for queries
    Glycan structure estimation from microarray expression data

DrawRINGS
  2D glycan structure drawing tool
  Can also query the RINGS database and retrieve similar structures
  Resulting glycan IDs are linked to corresponding entry pages
  Each interaction that the resulting glycans are involved in is also listed

BLAST Search
  Glycan-related proteins can be searched for by sequence using BLAST

3D Protein Structures

RINGS Microarray Tool
  Based on method of S. Kawano et al., Bioinformatics, 2005
  Input: glycosidic bonds and values corresponding to glycosyltransferase expression data
  Output: glycan structures and related interaction information
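
The input/output contract above can be pictured with a small sketch. It assumes a hypothetical scoring rule: each candidate structure is scored by the mean of the values attached to its glycosidic bonds, and candidates are returned best first. The bond names, values, and candidate names are invented for illustration. This is a simplification for orientation only, not the published method of Kawano et al. and not the RINGS implementation.

```python
def score_structure(bonds, bond_values):
    """Mean value over the bonds of one candidate; unknown bonds count as 0."""
    if not bonds:
        return 0.0
    return sum(bond_values.get(bond, 0.0) for bond in bonds) / len(bonds)


def rank_structures(candidates, bond_values):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [
        (name, score_structure(bonds, bond_values))
        for name, bonds in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical input: one value per glycosidic bond, standing in for the
# expression level of the glycosyltransferase that forms it.
bond_values = {
    "Gal b1-4 GlcNAc": 0.9,
    "Fuc a1-3 GlcNAc": 0.8,
    "Neu5Ac a2-3 Gal": 0.1,
}

# Hypothetical candidate structures, each listed as the bonds it contains.
candidates = {
    "candidate_A": ["Gal b1-4 GlcNAc", "Fuc a1-3 GlcNAc"],
    "candidate_B": ["Gal b1-4 GlcNAc", "Neu5Ac a2-3 Gal"],
}

ranking = rank_structures(candidates, bond_values)
```

With these made-up values, `candidate_A` ranks first because both of its bonds correspond to highly expressed enzymes, while `candidate_B` depends on a bond with a low value.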

RINGS Microarray Tool

Ongoing Work
  Implementation of more free web-based tools for analysis
  Careful incorporation of other data is planned
    Glycosciences, CFG, BCSDB, etc.
    MSn data...
  But...

For the Community
  What tools are currently lacking?
    Web portal
      Glycosciences.de has links to many resources
    Search of multiple resources from a single query interface
      By structure
      By protein
      By disease
      Or a combination of the above
    Web-based MS analysis tool
  Open to suggestions/requests
  http://rings.t.soka.ac.jp
  Email: kkiyoko@t.soka.ac.jp

Acknowledgements
  Masao Ichikawa, Shuichi Ikeda, Kouichi Yamada, Takako Yamaguchi
  NIH