Larry Lam Southern California Bioinformatics Summer Institute 2009 Graeber Lab – Crump Institute for Molecular Imaging UCLA A Data Management and Analysis.

Presentation transcript:

Larry Lam
Southern California Bioinformatics Summer Institute 2009
Graeber Lab – Crump Institute for Molecular Imaging, UCLA
A Data Management and Analysis Software Platform for Phospho-Proteomics Data

Outline
- Graeber Lab Background
- Project Objective
- My Experimental Project (Example Dataset)
- Software Design
- Software Demo
- Conclusion / Future Work
- Acknowledgements

Systems Biology of Cancer Signaling
Lab Goals
– Understand Cancer Signaling Through Systems Biology Approaches
– [long term] Improve Cancer Treatment
Signaling Pathway Modeling Through
– Kinetics
– Phospho-Profiling
– Adaptor Complex Analysis

Project Objective
Develop a Software Platform for Convenient Storage and Analysis of Large-Scale Data Sets
- Design Database to Collect and Store Large-Scale Proteomic Data Sets
- Allow for Comprehensive Meta Information
- Simplify Access to Multiple Data Sets
- Simplify the Use of Common Analysis Tools

BCR/Abl Leukemia
BCR/Abl fusion protein found in
- 90%-95% of chronic myeloid leukemia
- 20% of adult acute lymphoblastic leukemia
- 5% of childhood acute lymphoblastic leukemia
Analyze the adaptor proteins in BCR/Abl signaling
- Adaptor proteins mediate protein interactions
[Figure labels: Bait (Capture Protein), Prey (Interacting Protein), Complex]

Experimental Workflow
[Diagram, Current Workflow. Labels: Experimental Protocol; Complex Purification; Phospho Profiling; Mass Spectrometry Quantitation Pipeline; IPI Proteomics Database; Quantitation Output File; [Complex] NS Filter/Consolidation; Manual Organization/Analysis]

Identifying Interactions of the Crk Adaptor Proteins
1. Genetic modification of pro-B lymphocytes (Baf3): express adaptor + streptavidin-binding peptide (SBP)
2. Culture
3. Lyse each culture for protein complex purification
[Figure labels: Crk I Lysate, Crk L Lysate, Crk II Lysate, NTAP Lysate]

Protein Complex Purification
1. Separation of protein complex with streptavidin beads
2. Trypsin digestion of proteins into peptides
3. Separation of phosphorylated peptides with Fe(III)-NTA beads
4. Liquid Chromatography + Mass Spectrometry
5. Quantitation Pipeline

Quantitation Output File
Consolidation of quantified peptides and associated proteins per sample
- All peptides identified
- All adaptor proteins used
- Phosphorylation position within the peptide [optional]
Peptide Sequence      Description / IPI Accession   Crk I  Crk L  Crk II  NTAP
K.ADAAEFWR.K          CBL IPI
R.QEAVALLQGQR.H       Isoform Crk-II IPI

NS Filter/Consolidation
- Quantitation Output File
- Collapse Peptides To Protein Quantity
- Remove Insignificant Proteins
- Heatmap Analysis
- Remove Known Contaminants
Peptide Sequence      Description / IPI Accession   Crk I  Crk L  Crk II  NTAP
K.ADAAEFWR.K          CBL IPI
K.ALVIAHNNIEMAK.N     CBL IPI
R.QEAVALLQGQR.H       Isoform Crk-II IPI
K.IHYLDTTTLIEPVAR.S   Isoform Crk-II IPI
Quantity Is Normalized For Each Row
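The "Collapse Peptides To Protein Quantity" step might look roughly like the Python sketch below. It is an illustration only, not the lab's code (the platform itself is described as C# with R). It assumes the protein quantity in each sample is the median over that protein's peptides; the function name `collapse_to_proteins`, the row layout, and every number are hypothetical.

```python
from collections import defaultdict
from statistics import median


def collapse_to_proteins(peptide_rows):
    """Collapse peptide-level quantities to one quantity per protein and sample.

    peptide_rows: iterable of (peptide, protein, {sample: quantity}).
    Assumption: the per-protein quantity is the median of its peptides.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for _peptide, protein, quantities in peptide_rows:
        for sample, quantity in quantities.items():
            grouped[protein][sample].append(quantity)
    return {
        protein: {sample: median(values) for sample, values in per_sample.items()}
        for protein, per_sample in grouped.items()
    }


# Peptide and protein names come from the slide; the quantities are made up.
peptide_rows = [
    ("K.ADAAEFWR.K", "CBL", {"Crk I": 4.0, "NTAP": 1.0}),
    ("K.ALVIAHNNIEMAK.N", "CBL", {"Crk I": 6.0, "NTAP": 3.0}),
    ("R.QEAVALLQGQR.H", "Isoform Crk-II", {"Crk I": 2.0, "NTAP": 2.0}),
]
protein_quantities = collapse_to_proteins(peptide_rows)
print(protein_quantities["CBL"])
```

A sum or mean would be an equally plausible way to collapse; the median is used here only because the enrichment formula on a later slide is phrased in terms of medians.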

NS Filter/Consolidation
- Quantitation Output File
- Collapse Peptides To Protein Quantity
- Remove Insignificant Proteins
- Heatmap Analysis
- Remove Known Contaminants
Protein Enrichment Factor = (Median - NTAP Median) / σ(Protein NTAP)
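As a worked illustration of the enrichment factor, here is a short Python sketch. It reads the formula as: the median of a protein's quantities in an adaptor sample, minus the median of its quantities in the NTAP control, divided by the spread of the NTAP quantities. Treating that denominator as the sample standard deviation is an assumption, as are the function name and the numbers.

```python
from statistics import median, stdev


def enrichment_factor(sample_values, ntap_values):
    """(median(sample) - median(NTAP)) / standard deviation of NTAP.

    Assumption: the denominator is the sample standard deviation of the
    protein's NTAP (control) quantities.
    """
    spread = stdev(ntap_values)  # needs at least two NTAP values
    if spread == 0:
        raise ValueError("NTAP quantities have zero spread; factor is undefined")
    return (median(sample_values) - median(ntap_values)) / spread


# Made-up quantities for one protein in an adaptor sample and in NTAP.
factor = enrichment_factor([8.0, 10.0, 12.0], [1.0, 2.0, 3.0])
print(factor)  # (10.0 - 2.0) / 1.0
```

A protein could then be treated as insignificant when its factor falls below some cutoff; the slides do not state a cutoff, so none is assumed here.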

NS Filter/Consolidation
- Quantitation Output File
- Collapse Peptides To Protein Quantity
- Remove Insignificant Proteins
- Heatmap Analysis
- Remove Known Contaminants
Configuration File of Known Contaminants
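One way to picture the contaminant filter is the Python sketch below. The slides do not show the configuration file format, so it assumes a plain-text file with one protein name per line and `#` comments; the helper names and the example entries are hypothetical.

```python
import io


def load_contaminants(lines):
    """Read contaminant names, one per line; '#' starts a comment (assumed format)."""
    names = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            names.add(entry.lower())
    return names


def remove_contaminants(protein_quantities, contaminants):
    """Drop proteins whose name matches a known contaminant (case-insensitive)."""
    return {
        protein: quantity
        for protein, quantity in protein_quantities.items()
        if protein.lower() not in contaminants
    }


# Stand-in for the configuration file; the entries are illustrative only.
config_file = io.StringIO("# known contaminants\nKeratin\nTrypsin\n")
contaminants = load_contaminants(config_file)

proteins = {"CBL": 5.0, "Keratin": 9.0, "Isoform Crk-II": 2.0}
filtered = remove_contaminants(proteins, contaminants)
print(sorted(filtered))
```

Matching on an accession rather than a display name would be more robust, but the slides do not say which key the configuration file uses.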

Statistical Analysis: Peptide Quantity Heatmap
[Heatmap shown in Java TreeView. Color scale: High Quantity to Low Quantity. Columns: Crk I, Crk L, Crk II, NTAP. Labeled rows: Cbl Peptides, Crk I Peptides]

Experimental Workflow
[Diagram, Current Workflow. Labels: Experimental Protocol; Complex Purification; Phospho Profiling; Mass Spectrometry Quantitation Pipeline; IPI Proteomics Database; Quantitation Output File; [Complex] NS Filter/Consolidation; Manual Organization/Analysis]
[Diagram, New Workflow. Labels: Quantitation Import; Local DB; Statistical Analysis; External Sources]

Program Design
[Diagram labels: C# GUI Application; Quantitation Output File; DATA IMPORT; MySQL Database; DATA QUERY; Quantitation Data Set; R Statistical Function]
Programming Language: C#
Database: MySQL
– Free
Statistical Computing: R
– Free, Accessible to C#

Data Import Methodology
1. Define Meta Data (Descriptors) And Relationships About The Quantitation Values
2. Create The Tables In MySQL
3. Access Using MySQL Connector/Net

Statistical Analysis Methodology
R Language and Environment for Statistical Computing and Graphics
- Modeling
- Statistical Tests
- Clustering
- Heatmaps
Develop a Graphical User Interface To R Functions
- Access R Functions Through R-(D)COM Interface

Software Demo

Conclusion
Management Software
– Standardized approach to maintaining lab data
Analyze Data Sets
– Analysis tools highly accessible to biologists of various technical levels
Combine Data Sets
– Potentially lead to new discoveries

Future Work
- Add More Links To External Databases
- Enhance Data Query
- Include More Analysis Functions

Acknowledgments
Graeber Lab Members
– Dr. Thomas Graeber
– Dr. Björn Titz
SoCalBSI Faculty and Members
– Dr. Jamil Momand
– Dr. Sandy Sharp
– Dr. Nancy Warter-Perez
– Dr. Wendie Johnston
– Dr. Beverly Krilowicz
– Ronnie Cheng
Funding

Main Window

Main Window: Options

Batch Import

Batch Information

Sample Information

Sample Information: Technical Replicates

Feature Type

Features

Project Assignment

Batch Protocol Assignment

Biological System Assignment

Import

Batch Query

Feature Type Selection

Matrix/Heatmap Dialog

Heatmap Options

Data Import Design Methodology
1. Define Meta Data (Descriptors) About The Quantitation Values
   - Define Relationships
2. Create The Tables In MySQL
3. Develop Support for MySQL Access
   - MySQL Connector
[Diagram of tables and their descriptors]
Batch: Label, Description, Experimenter, Date
Feature: Label, Description, Feature Type
Sample: Label, Description, Quality
Feature Value: Value, Value Type
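The table design above could be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL and MySQL Connector; the column types, key columns, and the links between tables (sample belongs to a batch, a feature value joins a sample and a feature) are assumptions, since the slide lists only the descriptors. All inserted data is made up.

```python
import sqlite3

# Assumed schema: the descriptor names follow the slide, everything else is a guess.
SCHEMA = """
CREATE TABLE batch (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    description TEXT,
    experimenter TEXT,
    date TEXT
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL REFERENCES batch(id),
    label TEXT NOT NULL,
    description TEXT,
    quality TEXT
);
CREATE TABLE feature (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    description TEXT,
    feature_type TEXT
);
CREATE TABLE feature_value (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    feature_id INTEGER NOT NULL REFERENCES feature(id),
    value REAL NOT NULL,
    value_type TEXT NOT NULL,
    PRIMARY KEY (sample_id, feature_id, value_type)
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.executescript(SCHEMA)
cur = conn.cursor()

# Import one hypothetical quantitation value.
cur.execute("INSERT INTO batch (label, experimenter) VALUES (?, ?)",
            ("demo batch", "hypothetical user"))
batch_id = cur.lastrowid
cur.execute("INSERT INTO sample (batch_id, label) VALUES (?, ?)", (batch_id, "Crk I"))
sample_id = cur.lastrowid
cur.execute("INSERT INTO feature (label, feature_type) VALUES (?, ?)", ("CBL", "protein"))
feature_id = cur.lastrowid
cur.execute(
    "INSERT INTO feature_value (sample_id, feature_id, value, value_type) VALUES (?, ?, ?, ?)",
    (sample_id, feature_id, 5.0, "quantity"),
)
conn.commit()

# Query every value in a batch, joined back to its sample and feature labels.
rows = conn.execute(
    """
    SELECT s.label, f.label, v.value
    FROM feature_value AS v
    JOIN sample AS s ON s.id = v.sample_id
    JOIN feature AS f ON f.id = v.feature_id
    JOIN batch AS b ON b.id = s.batch_id
    WHERE b.label = ?
    """,
    ("demo batch",),
).fetchall()
print(rows)
```

Keeping values in a single long table keyed by sample and feature lets one schema hold different feature types (peptides, proteins) without a column per sample, which fits the stated goal of simplifying access to multiple data sets.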