Bioinformatics with MATLAB Noviembre 18, 2003 Pontificia Universidad Javeriana.

Slides:



Advertisements
Similar presentations
The Complete Technical Analysis and Development Environment An attractive alternative to MATLAB and GAUSS - Physics World.
Advertisements

Copyright © 2008, SAS Institute Inc. All rights reserved. Discovering Meaningful Patterns in Genomics Data with JMP Genomics Jordan Hiller JMP Genomics.
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
BiGCaT Bioinformatics Hunting strategy of the bigcat.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
MATLAB Presented By: Nathalie Tacconi Presented By: Nathalie Tacconi Originally Prepared By: Sheridan Saint-Michel Originally Prepared By: Sheridan Saint-Michel.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Bioinformatics: One Minute and One Hour at a Time Laurie J. Heyer L.R. King Asst. Professor of Mathematics Davidson College
Bioinformatics Dr. Aladdin HamwiehKhalid Al-shamaa Abdulqader Jighly Lecture 1 Introduction Aleppo University Faculty of technical engineering.
NYU Microarray Database (NYUMAD)
© 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,
Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003.
Data-intensive Computing: Case Study Area 1: Bioinformatics B. Ramamurthy 6/17/20151.
Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Introduction to Genomics, Bioinformatics & Proteomics Brian Rybarczyk, PhD PMABS Department of Biology University of North Carolina Chapel Hill.
Packard BioScience. Packard BioScience What is ArrayInformatics?
August 29, 2002InforMax Confidential1 Vector PathBlazer Product Overview.
Agilent: The Company, The Myth, The Lengend. Agilent: Agilent Technologies Inc. (NYSE: A) is a world-wide, diverse technology company focused on expansion.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Overview of Search Engines
Proteomics Understanding Proteins in the Postgenomic Era.
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
Anne Mascarin DSP Marketing The MathWorks
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
Waters Corporation Connecting Data to Decisions John Swallow Principal Engineer Waters Data Products
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
Digital Image Processing Lecture3: Introduction to MATLAB.
9/30/2004TCSS588A Isabelle Bichindaritz1 Introduction to Bioinformatics.
Copyright 2000, Media Cybernetics, L.P. Array-Pro ® Analyzer Software.
© 2004 The MathWorks, Inc. 1 MATLAB for C/C++ Programmers Support your C/C++ development using MATLAB’s prebuilt graphics functions and trusted numerics.
Computing for Bioinformatics Introduction to databases What is a database? Database system components Data types DBMS architectures DBMS systems available.
ROOT: A Data Mining Tool from CERN Arun Tripathi and Ravi Kumar 2008 CAS Ratemaking Seminar on Ratemaking 17 March 2008 Cambridge, Massachusetts.
Whole Genome Expression Analysis
© 2005 The MathWorks December 2 nd, 2005 MATLAB ® and HDF Accelerating Engineering Productivity and Scientific Discovery.
© 2008 The MathWorks, Inc. ® ® Parallel Computing with MATLAB ® Silvina Grad-Freilich Manager, Parallel Computing Marketing
Bioinformatics and it’s methods Prepared by: Petro Rogutskyi
Chapter 5 Engineering Tools for Electrical and Computer Engineers.
DNA MICROARRAYS WHAT ARE THEY? BEFORE WE ANSWER THAT FIRST TAKE 1 MIN TO WRITE DOWN WHAT YOU KNOW ABOUT GENE EXPRESSION THEN SHARE YOUR THOUGHTS IN GROUPS.
Matlab Bioinformatics Toolkit Evaluation Kanishka Bhutani.
MathCore Engineering AB Experts in Modeling & Simulation WTC.
A New Oklahoma Bioinformatics Company. Microarray and Bioinformatics.
Doug Raiford Lesson 3.  More and more sequence data is being generated every day  Useless if not made available to other researchers.
MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)
1 Computer Programming (ECGD2102 ) Using MATLAB Instructor: Eng. Eman Al.Swaity Lecture (1): Introduction.
Microarray - Leukemia vs. normal GeneChip System.
Organizing information in the post-genomic era The rise of bioinformatics.
Microarrays and Gene Expression Analysis. 2 Gene Expression Data Microarray experiments Applications Data analysis Gene Expression Databases.
ImArray - An Automated High-Performance Microarray Scanner Software for Microarray Image Analysis, Data Management and Knowledge Mining Wei-Bang Chen and.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
EB3233 Bioinformatics Introduction to Bioinformatics.
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
BBN Technologies Copyright 2009 Slide 1 The S*QL Plugin for Cytoscape Visual Analytics on the Web of Linked Data Rusty (Robert J.) Bobrow Jeff Berliner,
Bioinformatics lectures at Rice University Li Zhang Lecture 11: Networks and integrative genomic analysis-3 Genomic data
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
GeWorkbench Overview Support Team Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT and Harvard.
Meeting Users Needs with Octave A survey of the use of Matlab™ and Octave at the NIH Tom Holroyd NIMH MEG Core Facility Presented at the Octave 2006 Workshop.
Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE.
High throughput biology data management and data intensive computing drivers George Michaels.
Microarray: An Introduction
Wednesday NI Vision Sessions
MATLAB Distributed, and Other Toolboxes
Data-intensive Computing: Case Study Area 1: Bioinformatics
INTRODUCTION TO BASIC MATLAB
MATLAB Applications in Bioinformatics
Sequence Based Analysis Tutorial
Digital Image Processing
Introduction to Bioinformatic
Presentation transcript:

Bioinformatics with MATLAB Noviembre 18, 2003 Pontificia Universidad Javeriana

2 Agenda n Bioinformatics an engineering challenge n Overview of MATLAB ® n The Bioinformatics Toolbox n Developing and deploying applications with MATLAB. n Product demonstrations and examples n Questions and answer session

3 Improved health care technology leads to increased lifespan - with age bringing more diseases. n 61 million Americans have some form of cardiovascular disease; 8.5 million Americans who had cancer are alive today. n Total health care expenditures are 14% of US GDP and rising. n Health care spending in the United States is projected to reach $3.1 trillion in 2012, up from $1.4 trillion in 2001* n The worldwide pharma and biotech industries spent $69 billion in 2001 on R&D. n The US government spent $22 billion on life science R&D in *Centers for Medicare & Medicaid Services Life Expectancy at Birth USA

4 Life Science R&D spending is growing and R&D activities are becoming more quantitative. *sources: American Association for the Advancement of Science, Ernst & Young n Pharmaceutical and biotech companies are starting to adopt discovery techniques using genomics and bioinformatics and are becoming more dependent on engineering methods. n Medical instrumentation and devices companies are pushing the boundaries of mechanical, electrical and biomedical engineering and can oftentimes benefit from a variety of engineering disciplines in their work. Drug Discovery Medical Devices and Instrumentation

5 Bioinformatics is the application of computational methods to biology. Combine rapidly evolving biological sciences...  Genomics  Proteomics  Metabolic pathways... with computational methods...  Gene sequencing (Human Genome Project)  Expression analysis (DNA microarrays)  Combinatorial chemistry... to develop engineered products.  Main application: automate drug target discovery.  Basic research into the causes of disease.  Genetically engineer better crops & livestock.

6 Complete draft of the human genome 26-JUN-2000 Genetic Sequence Information is growing exponentially.

7 Important Bioinformatics Milestones

The practical challenge of working as a Bioinformatics Specialist

9 The data intensive discovery process in Pharma and Biotech. Algorithm development Custom one-off analyses Robust programs for biologists Exchange of ideas and discussion of requirements Research Biologists and Chemists need timely and easy to access analysis reports. Research Biologists and Chemists Need intuitive analysis tools. Bioinformatics and Software Development Team Research Biologists and Chemists Report Desktop Technical Applications Data Analysis, Modeling & Visualization Data I/O Algorithm/ System Design & Analysis Mathematical Modeling

10 Data access and integration of data sources and applications. Report Desktop Technical Applications Data Analysis, Modeling & Visualization Data I/O Algorithm/ System Design & Analysis Mathematical Modeling Your Databases Your Files Your instruments that generate data, signals and images Internet databases you use Your existing Excel / COM Applications Your existing C/C++, Java, Perl programs Control Instruments and acquire data Read and write files in most formats Read and write and run methods on your databases. Read from and write to Internet sites Run or interface to your applications Run your applications Read / Write Execute Control Execute Converting formats and integrating systems

11 Speed up analysis tasks with ready made functionality. Report Desktop Technical Applications Data Analysis, Modeling & Visualization Data I/O Algorithm/ System Design & Analysis Mathematical Modeling Statistics, Curve Fitting Numerical and Symbolic Math Optimization Neural Networks Image Processing, Signal Processing 2D, 3D Graphics prototyping and custom algorithm development Deploy to the Web Integrate in larger software project Generate desktop executables

12 The MathWorks at a Glance n Founded in 1984, privately held n Over 1000 employees, including 1/3 in product development n Revenues exceeding $200M n More than 500,000 users in 100 countries n Natick, MA - World Headquarters l Product Development l Technical Support n European Offices l UK, France, Germany, Italy, Switzerland, Spain, and The Netherlands n Distributors in 21 countries

13 MathWorks Mission and Vision Accelerate innovation and discovery in engineering and science MATLAB l a powerful, high-level language to develop algorithms, collect and analyze data, and visualize information Simulink l a graphical system to model and simulate complex systems, and implement real-time and embedded systems

14 l Aerospace and Defense l Automotive l Biotech, Pharmaceutical and Medical l Communications, Semiconductor l Education l Financial Services l Industrial Equipment and Machinery l Instrumentation l Medical Devices and Instrumentation MathWorks Products are Used in Various Industries

15 More than 450 textbooks for education and professional use, in 19 languages l Biosciences l Controls l Signal Processing l Image Processing l Mechanical Engineering l Mathematics l Natural Sciences l Environmental Sciences Thousands of universities teach students using MathWorks products.

16 Toolboxes The MathWorks Product Family Toolboxes for Modeling, Analysis, and Computation Specific functionality for data analysis, modeling, design, and other capabilities Signal Processing Statistics Image Processing Neural Networks Optimization Compiler Symbolic Math Curve Fitting Filter Design Wavelet Spline Fuzzy Logic

17 A portion of the DNA dye-label spectral profile, which allows the researcher to read the sequence of bases in a selected strand of DNA. Technical Applications Rosetta Inpharmatics predicts breast cancer outcome from genetic profile

18 Sequence Analysis Applications Start Stop Deploying a Sequence Analysis Algorithm Hidden Markov Model for Pair-wise Alignment

Case Study: Microarray Image Processing

20 mRNA (messenger RNA) from several cell types are each tagged with a fluor emitting a different color light and then hybridized to an array of cDNA (complementary DNA). How do Microarrays work?

21 n Automate image and statistical analysis n Try out different algorithms n Build software applications n Gather quality control measures n Normalize High-throughput experimental techniques require automated image analysis

22 Analyzing DNA with Microarray Imaging Through image analysis, the fluorescence at the site of each immobilized cDNA can be quantified. For example, the log ratio of red-to-green intensity gives a measure of gene expression. Fluorescently tagged mRNA from different cells are hybridized to a microscopic array of hundreds of thousands of cDNA spots that correspond to different genes. Illuminated spots emit different color light, indicating which genes are expressed (e.g., green=control, red=sample, yellow=both).

23 n Clean up images with noise n Correct for rotation, skew => regular spot spacing (rows, cols) n Isolate sub-image array of colored spots n Separate red and green planes n Remove non-uniform local background n Identify regular grid pattern of spots on slide n Address individual spots by region of interest n Integrate red and green intensity values n Detect poor spot quality and flag as bad data points n Determine gene expression from intensities n Develop robust algorithm to automate process n Deploy application to implement algorithm. Application Challenges

24 1. Read image file (imread) 2. Determine horizontal spot locations (columns) a. Create horizontal profile using column averages (mean) b. Remove local background using morphology (imtophat) c. Segment and label spot columns (im2bw, bwlabel) d. Extract spot centers (regionprops,.Centroid) e. Calculate column boundaries between spots 3. Transpose image and repeat => spot rows 4. Display detected spot locations on top of image 5. Tabulate spot intensities. Solution Algorithm

25 What did this case study show? 1. MATLAB environment was great for developing an algorithm (environment + language + graphics) 2. Image Processing Toolbox provided a rich set of functions for segmentation, region properties and background removal 3. Signal Processing Toolbox provided autocorrelation function to determine spot periodicity.

© 2003 The MathWorks, Inc. 26 The Bioinformatics Toolbox

27 Function Overview n File I/O l Read FASTA, PDB, GenePix, Affymetrix and many more format files n Web connectivity l Directly access GenBank, PDB, EMBL, PIR,… n Sequence analysis l Base density, codon counts, ORF finding,… n Sequence alignment l Local, global and profile HMM based alignment n Microarray normalization & visualization l Normalization tools, Gene filters, expression profile cluster analysis,… n Protein visualization l Hydrophobicity plots, Ramachandran plots,…

28 Getting data into MATLAB “get” functions retrieve data from Internet based databases. getembl - Sequence data from EMBL. getgenbank - Sequence data from GenBank. getgenpept - Sequence data from GenPept. getpdb - Sequence data from PDB. getpir - Sequence data from PIR-PSD. gethmmprof - HMM from the PFAM database. getgeodata - Gene Expression Omnibus (GEO) data

29 Sequence Alignment Tutorial Example n Get human and mouse genes from GenBank n Look for open reading frames (ORFs) n Convert DNA sequences to amino acid sequences n Create a dotplot of the two sequences n Perform global alignment n Perform local alignment

30 Microarray Data Analysis Tutorial Example n Plot expression profiles for genes n Filter genes based on information content of profile n Perform hierarchical clustering n Perform K-means clustering n Perform Principal Component Analysis Reference: DeRisi, JL, Iyer, VR, Brown, PO. "Exploring the metabolic and genetic control of gene expression on a genomic scale." Science Oct 24;278(5338):680-6.

Integrating and Deploying Bioinformatics Tools with MATLAB Robert Henson The MathWorks, Inc. Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks, Inc. 31 Integrating and Deploying Bioinformatics Tools with MATLAB Rob Henson Bioinformatics Development

32 Connecting to MATLAB Web Instrument Control Data Acquisition Image Acquisition Excel / COM File I/O DatabaseToolbox C/C++ Java Perl

33 C/C++ Web Stand-alone ExcelCOM Deploying with MATLAB

34 Database Connections n ODBC or JDBC compliant database l ODBC and JDBC on PC l JDBC on UNIX n Data types are preserved n Retrieval of large/partial data sets n Access multiple connections (same or different DB) n Database connections remain open

35 Database Connections Visual Query Builder n Access data without knowing SQL Scroll through tables and fields Customize your query n Built-in visualization tools Plotting and charting Creating HMTL reports Handling date strings n Reuse SQL statements in your own program

36 Customized Reports

37 C/C++ Web Stand-alone ExcelCOM Deploying with MATLAB

38 Push Data into MATLAB Data I/O n Import Excel ranges into MATLAB n Export MATLAB data into Excel ranges n Evaluate MATLAB Statements in Excel

39 Computational Engine for Excel Spread Sheet Applications n MATLAB Excel Link can be the computational engine behind your Excel applications n Fast scalable solution MLPutMatrix("data",B2:H43) MLPutMatrix("Genes",A2:A43) MLPutMatrix("TimeSteps",B1:H1) MLEvalString("clustergram(data,'RowLabels',… Genes,'ColLabels',TimeSteps)")

40 Summary n Read and write to a database l Powerful math and data analysis l Generate custom reports n Create standalone applications n Easy integration with Excel l MATLAB as a computational engine l Create Excel Plug-ins in minutes

41 Industry Issues & Solutions Integrating tools from various programming languages is difficult, closed source tools are not customizable, and freeware is often not supported. There is no standard biological data format. Applications must be easily deployable within organizations. MATLAB is a supported, viewable source, user-friendly environment for data analysis across applications, algorithm development, and deployment. MATLAB and the Bioinformatics Toolbox provides file format support for common data sources (web-based, sequences, microarray, etc.). MATLAB’s deployment tools and user- interface design environment allow easy deployment of MATLAB based applications.

42 Further Information n Product Information and Demos Trials and technical literature are available through the MathWorks. n MATLAB Central l File exchange and newsgroup access for MATLAB and Simulink users l l Access to comp.soft-sys.matlab file exchange and newsgroup access for the MATLAB & Simulink user community

Free trials and technical literature are available through the MathWorks Visit