Meeting the Bioinformatics Challenges of Functional Genomics VanBUG 11 September 2003.


1 Meeting the Bioinformatics Challenges of Functional Genomics VanBUG 11 September 2003

2 Acknowledgments
The TIGR Gene Index Team: Foo Cheung, Svetlana Karamycheva, Yudan Lee, Babak Parvizi, Geo Pertea, Razvan Sultana, Jennifer Tsai, John Quackenbush, Joseph White. Funding provided by the Department of Energy and the National Science Foundation.
TIGR Human/Mouse/Arabidopsis Expression Team: Emily Chen, Bryan Frank, Renee Gaspard, Jeremy Hasseman, Lara Linford, Fenglong Liu, Simon Kwong, John Quackenbush, Shuibang Wang, Yonghong Wang, Ivana Yang, Yan Yu.
Array Software Hit Team: Nirmal Bhagabati, John Braisted, Tracey Currier, Jerry Li, Wei Liang, John Quackenbush, Alexander I. Saeed, Vasily Sharov, Mathangi Thiagarajan, Joseph White.
Assistant: Sue Mineo.
Funding provided by the National Cancer Institute, the National Heart, Lung, and Blood Institute, and the National Science Foundation.
H. Lee Moffitt Center/USF: Timothy J. Yeatman, Greg Bloom.
TIGR PGA Collaborators: Norman Lee, Renae Malek, Hong-Ying Wang, Truong Luu, Bobby Behbahani.
TIGR Faculty, IT Group, and Staff.
PGA Collaborators: Gary Churchill (TJL), Greg Evans (NHLBI), Harry Gavras (BU), Howard Jacob (MCW), Anne Kwitek (MCW), Allan Pack (Penn), Beverly Paigen (TJL), Luanne Peters (TJL), David Schwartz (Duke).
Emeritus: Jennifer Cho (TGI), Ingeborg Holt (TGI), Feng Liang (TGI), Kristie Abernathy (mA), Sonia Dharap (mA), Julie Earle-Hughes (mA), Cheryl Gay (mA), Priti Hegde (mA), Rong Qi (mA), Erik Snesrud (mA), Heenam Kim (mA).

3 Acknowledgments: Thanks to Syntek, Inc. for the GeneShaving MeV module and assistance with MyMADAM. Thanks to DataNaut, Inc. for the RelNet and Terrain Map modules and assistance with Client/Server MeV.

4 Science is built with facts as a house is with stones – but a collection of facts is no more a science than a heap of stones is a house. – Jules Henri Poincaré

5 There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman, physicist, Nobel laureate (1918-1988)

6 Microarray Analysis at TIGR

7 Step 1: Experimental Design

8 Step 2: Data Collection

9 Step 3: Data Analysis

10 Step 4: Consulting with the ArraySW gang in the trailer

11 Step 5: Sharing data with our collaborators

12 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

13 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

14 TIGR Gene Indices home page: www.tigr.org/tdb/tgi (~60 species, >16,000,000 sequences)

15 TGICL Tools are available – with more coming. Available with source. Geo Pertea, Razvan Sultana, Valentin Antonescu

16 Gene Index Assembly process. Inputs: ESTs from GenBank (dbEST), TIGR ESTs, and Expressed Transcripts (ET) from GenBank CDS. Remove vector, poly-A, adapter, mitochondrial and ribosomal sequence; reduce redundancy. High-stringency pairwise comparisons to build clusters. Each cluster is assembled to obtain Tentative Consensus sequences (TCs). Annotate TCs and release.
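The clustering step on this slide can be sketched roughly as follows. This is a toy illustration, not TGICL: the shared k-mer count is a crude stand-in for a real high-stringency pairwise alignment, and the function names, `k`, and `min_shared` are all assumptions made for the example. Sequences that pass the pairwise test are linked, and clusters are the connected components.

```python
from itertools import combinations


def shared_kmers(a, b, k=8):
    """Count distinct k-mers shared by two sequences (stand-in for an alignment score)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def cluster_sequences(seqs, k=8, min_shared=5):
    """Single-linkage clustering of a {name: sequence} dict.

    Any pair sharing at least `min_shared` k-mers is linked; clusters are
    the connected components, tracked with a small union-find.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if shared_kmers(seqs[a], seqs[b], k) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

In a real pipeline each resulting cluster would then be handed to an assembler to produce the consensus; that step is outside this sketch.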

17 The Mouse Gene Index

18 A TC Example

19 Babak Parvizi GO Terms and EC Numbers

20 The TIGR Gene Indices. Dan Lee, Ingeborg Holt

21 Tentative Orthologues And Paralogues Building TOGs: Reflexive, Transitive Closure Thanks to Woytek Makałowski and Mark Boguski
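One way to read "reflexive, transitive closure" is: keep only best hits that are reciprocal between two species, then merge any reciprocal pairs that share a member. The sketch below works under that reading, which is an assumption; the `species|TC` identifier format and the `best_hit` input structure are hypothetical and are not the TOGA data model.

```python
def build_togs(best_hit):
    """Group TCs into tentative orthologue groups.

    best_hit maps a TC id such as "human|TC1" to a dict
    {other_species: best-matching TC id in that species}.
    """
    # Step 1 (reflexive): keep pairs where each TC is the other's best hit.
    reciprocal = []
    for tc, hits in best_hit.items():
        species = tc.split("|")[0]
        for other in hits.values():
            if best_hit.get(other, {}).get(species) == tc:
                reciprocal.append((tc, other))

    # Step 2 (transitive closure): union-find over the reciprocal pairs.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in reciprocal:
        parent[find(a)] = find(b)

    groups = {}
    for tc in list(parent):
        groups.setdefault(find(tc), set()).add(tc)
    return sorted(sorted(group) for group in groups.values())
```

A TC whose best hit is not returned by its partner never enters a group, which is the conservative behaviour the word "tentative" suggests.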

22 TOGA: A Sample Alignment: bithoraxoid-like protein

23

24 Gene Finding in Humans is easy! Razvan Sultana

25 Gene Finding in Humans is easy? Razvan Sultana

26 Gene Finding in Humans is difficult? Razvan Sultana

27 Gene Finding in Humans is difficult? Razvan Sultana A genome and its annotation is only a hypothesis that must be tested.

28 http://pga.tigr.org/tools.shtml RESOURCERER Jennifer Tsai

29 RESOURCERER: An Example

30 RESOURCERER: Using Genetic Markers Just added: Integrated QTLs

31 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

32 SOPs are available cDNA/template prep PCR purification Printing RNA labeling Hybridization Coming: Data QC SOP

33 What data should we collect? Nature Genetics 29, December 2001 MAGE-ML – XML-based data exchange format EVERYTHING

34 An Example of MIAME-lite

35 MIAME Relational Schema

36 What's Wrong with MIAME? MIAME was designed as a model for capturing information necessary to create public databases. MIAME-based databases lack LIMS capabilities, which are necessary for large-scale studies. We do not want to store images in our database for practical reasons – limited space. We needed to develop a variety of tools adapted to our existing infrastructure and legacy data and databases. Definitions: Probes are labeled and applied to the arrays. An experiment is a hybridization. A study is a collection of hybridization experiments.
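The three definitions on this slide (probe, experiment, study) can be written down as a small data model. This is only an illustration of the vocabulary; the class and field names are hypothetical and are not taken from the MAD schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Probe:
    """A labeled sample applied to an array (field names are illustrative)."""
    name: str
    dye: str


@dataclass
class Hybridization:
    """One experiment: one slide hybridized with its labeled probes."""
    slide_barcode: str
    probes: List[Probe]


@dataclass
class Study:
    """A study is a collection of hybridization experiments."""
    title: str
    hybridizations: List[Hybridization] = field(default_factory=list)

    def dyes_used(self):
        """Return the sorted set of dyes across all hybridizations."""
        return sorted({p.dye for h in self.hybridizations for p in h.probes})
```

For example, `Study("sleep", [Hybridization("S1", [Probe("control", "Cy3"), Probe("test", "Cy5")])])` describes a study with a single two-colour hybridization.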

37 MAD Microarray Database Schema

38 Conceptual Schema (MAD): Protocol, Primer_pair, Primer, PCR, New_plate, Slide, Slide_type, Spot, Scan, Analysis, Normalize, Experiment, Expression, Expt_probe, Hyb, Study, Probe, Probe_source, Gene, Clone

39 MADAM: Microarray Data Manager. Available with source and MySQL. Marie-Michelle Cordonnier-Pratt, UGA, converted MySQL to Oracle and made MADAM work!

40 ExpDesigner

41 MAGE OM for Array Data. Purpose: Provide an object model to capture data for a MIAME data repository. Accept data in XML format from any database, i.e. schema independent. Load data into MIAME database. Export data from MIAME database in XML format. (Diagram labels: MAD, XML Formatted Document, MAGE OM, MIAME)

42 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

43 Microarray Overview I. Microbial ORFs: Design PCR Primers, PCR Products. Eukaryotic Genes: Select cDNA clones, PCR Products. Microtiter Plate: many different plates containing different genes. For each plate set, many identical replicas. Microarray Slide (with 60,000 or more spotted genes).

44 Microarray Overview: PCR Amplification; Selected Genes; Primer Design; Gel-based Scoring; Primer Synthesis; Clone Selection. MAD. PCR Scorer: Reads/loads primer data file to MAD and allows PCR data entry and translation of 96 → 384. (Alex Saeed, developer and maintainer; enhancements: Wedge Smith)
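The "translation of 96 384" on this slide presumably refers to mapping 96-well plate coordinates onto a 384-well plate. A minimal sketch, assuming the common interleaved layout in which four 96-well plates fill the four quadrants of one 384-well plate; the actual PCR Scorer layout is not given in the slides, and the function name and `quadrant` numbering are made up for this example.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A-H, columns 1-12
ROWS_384 = string.ascii_uppercase[:16]  # rows A-P, columns 1-24


def well_96_to_384(well, quadrant):
    """Map a 96-well coordinate such as "B3" to a 384-well coordinate.

    quadrant is 0-3 and identifies which of the four source plates the well
    comes from. Assumed layout: quadrant 0 lands on odd rows and odd columns,
    1 on odd rows and even columns, 2 on even rows and odd columns, and
    3 on even rows and even columns.
    """
    if quadrant not in (0, 1, 2, 3):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row = ROWS_96.index(well[0].upper())  # raises ValueError for a bad row
    col = int(well[1:]) - 1
    if not 0 <= col < 12:
        raise ValueError("column must be between 1 and 12")
    row_384 = 2 * row + quadrant // 2
    col_384 = 2 * col + quadrant % 2
    return f"{ROWS_384[row_384]}{col_384 + 1}"
```

Under this layout the mapping is one-to-one: the 4 x 96 source wells cover all 384 destination wells exactly once.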

45 The Beast: Microarray Robot from Intelligent Automation

46 Additional Software for Arrays: Scheduler. Microarray Scheduler: Allows scheduling of all instruments. Designed and maintained by Jerry Li. Available with source.

47 Microarray Overview: Amplified/Purified Genes Loaded in Arrayer; Run Parameters Set; Slides Printed. MAD. SliTrack/Controller: Takes Slide Order and Run parameters, generates spot order and IAS control file, launches IAS run software, loads database. (J. Li, developer and maintainer)

48 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

49 Microarray Overview II: Prepare Fluorescently Labeled Probes (Control, Test); Hybridize, Wash; Measure Fluorescence in 2 channels (red/green); Analyze the data to identify patterns of gene expression

50 Microarray Overview II: Prepare Fluorescently Labeled Probes (Control, Test); Hybridize, Wash; Measure Fluorescence in 2 channels (red/green); Analyze the data to identify patterns of gene expression. (Image labels: Weed, Bush)

51 Microarray Overview II: Obtain RNA Samples; Prepare Fluorescently Labeled Probes (Control, Test); Hybridize, Wash; Measure Fluorescence in 2 channels (red/green); Analyze the data to identify differentially expressed genes

52 Microarray Overview: Obtain RNA Samples; Prepare Fluorescently Labeled Probes (Control, Test); Hybridize, Wash. MAD. MADAM: Allows data entry (J. Li & J. White, web prototype; A. Saeed, J. White, J. Li, & V. Sharov, developers)

53 Microarray Overview: Obtain RNA Samples; Prepare Fluorescently Labeled Probes (Control, Test); Hybridize, Wash. MAD. MABCOS: Uses Bar Codes to track samples (J. Li, developer). Available with source.

54 Microarray Overview: Paired TIFF Image Files. MADAM + MAP: Allows data entry, moves/renames files to long-term storage (A. Saeed, J. White, J. Li, & V. Sharov, developers). MAD. NetAPP.

55 Microarray Overview: NetAPP. Spotfinder: Provides Image Analysis, writes data to flat files or directly to db (V. Sharov, developer and maintainer). MAD. Available as Executable for Windows; device-independent C/C++ coming.

56 The TIGR Array Software System: SLITRACK, MADAM, PCRSCORE, ExpDesigner, SpotFinder, MABCOS, McCoder, MeV, MIDAS, MAD

57 Data Normalization and Filtering

58 Lowess Normalization. Why LOWESS? Observations: 1. Intensity-dependent structure. 2. Data not mean centered at log2(ratio) = 0. (Plot A: SD = 0.346)

59 LOWESS (Contd): Local linear regression model. Tri-cube weight function. Least squares. Estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5). (Plot A: SD = 0.346)
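The three ingredients named on this slide (local linear regression, tri-cube weights, least squares) fit together as in the sketch below. It is a bare-bones single-pass version in pure Python, without the robustness iterations of full LOWESS, and it is not the MIDAS implementation; the function names and the `frac` smoothing parameter are assumptions.

```python
import math


def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, frac=0.4):
    """Fit y as a smooth function of x with tri-cube weighted local lines."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local window
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1]  # window half-width
        if h == 0.0:
            w = [1.0 if xi == x0 else 0.0 for xi in x]
        else:
            w = [tricube((xi - x0) / h) for xi in x]
        # Weighted least squares for a straight line, in closed form.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:  # degenerate window: fall back to weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * x0)
    return fitted


def normalize_log_ratios(cy3, cy5, frac=0.4):
    """Subtract the fitted trend of log2(Cy5/Cy3) versus log10(Cy3*Cy5)."""
    a = [math.log10(g * r) for g, r in zip(cy3, cy5)]
    m = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

After normalization the log ratios are centered on zero at every intensity, which addresses both observations on the "Why LOWESS?" slide. This version is quadratic in the number of spots, so it is for illustration rather than whole-array use.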

60 LOWESS Results

61 Slice Analysis (Intensity-dependent Z-score)
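An intensity-dependent Z-score can be sketched as follows: rank spots by intensity, take a sliding window of neighbours around each spot, and express the spot's log ratio in units of the local standard deviation. The window size, the use of rank-based neighbours, and the function name are assumptions made for this example, not the MIDAS parameters.

```python
import statistics


def slice_z_scores(log_intensity, log_ratio, half_window=10):
    """Return one Z-score per spot, computed within a local intensity slice."""
    n = len(log_ratio)
    order = sorted(range(n), key=lambda i: log_intensity[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(n, rank + half_window + 1)
        local = [log_ratio[j] for j in order[lo:hi]]
        if len(local) < 2:
            continue  # not enough neighbours to estimate a spread
        mean = statistics.mean(local)
        sd = statistics.stdev(local)
        if sd > 0:
            z[i] = (log_ratio[i] - mean) / sd
    return z
```

Spots with, say, |Z| > 2 would then be called differentially expressed relative to other spots of similar intensity, rather than by a single global fold-change cutoff.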

62 MIDAS: Data Analysis. Wei Liang. Available with OSI source. Adding Error Models, MAANOVA, Automated Reporting.

63 Microarray Overview: MAD. MIDAS: Performs data normalization and filtering, including, soon, ANOVA.

64 Steps in the Process: Select array elements and annotate them. Build a database to manage stuff. Print arrays and manage the lab. Hybridize and analyze images; manage data. Analyze hybridization data and get results.

65 MeV: Data Mining Tools. Alexander Saeed, Alexander Sturn, Nirmal Bhagabati, John Braisted, Syntek Inc., Datanaut, Inc. Available with OSI source.

66 MeV: Metabolic pathway analysis is coming Maria Klapa and Chris Koenig

67 Analyses available in MeV... Hierarchical clustering (HCL); Bootstrapped/Jackknifed HCL; k-means clustering (KMC); k-means support (iterative KMC); Self-Organizing Maps (SOMs); Cluster Affinity Search Technique (CAST); Figure of Merit for CAST and KMC (soon SOM); QT-clust (Heyer Jackknife); Principal component analysis (PCA); Gene Shaving; Relevance Networks; Support Vector Machines (SVM); Self-Organizing Trees; Classification approaches, including Template Matching; t-tests; Significance Analysis of Microarrays (SAM); ANOVA tools; GO, Metabolic Pathway, and Genome Localization annotation/clustering; Client-server mode with well-defined API
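As an illustration of one entry in this list, here is a minimal k-means clustering (KMC) sketch over expression profiles. It is plain Lloyd's algorithm with a deterministic farthest-first initialization chosen for reproducibility; it is not MeV's code, and the function name and return shape are assumptions.

```python
def kmeans(profiles, k, max_iter=100):
    """Cluster expression profiles (equal-length number sequences) into k groups.

    Returns (labels, centroids). Initialization takes the first profile, then
    repeatedly adds the profile farthest from the centroids chosen so far.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in profiles]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In practice each profile would be a gene's normalized log ratios across the hybridizations of a study, and k would be chosen with something like the Figure of Merit analysis also listed on the slide.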

68 Missing from MeV... MAGE-ML output for direct submission to databases... Coming in the next MADAM release. Links to BioConductor … are coming. Array CGH module from Barb Weber and Adam Margolin... is coming. EASE module from Doug Hosack... is coming Lots of stuff we are not smart enough to think about.

69 Sleep Deprivation Studies in Mouse (figure: time axis with points at 0, 3, 6, 9, 12)

70 Experimental Paradigm: Compare gene expression between sleeping and sleep-deprived mice in cortex and hypothalamus. Perform 3 biological replicates. Normalize and filter data and use data mining techniques to select distinct patterns of gene expression. Use Gene Ontology (GO) assignments to classify genes by cellular localization, molecular function, and biological process. Use GO analysis to develop an understanding of response.

71 Differential Expression in Cortex: Energy Metabolism; Transcription; Mitochondrial and Ribosomal Proteins; Stress Response; Intermediate Metabolism and Signal Transduction

72 Differential Expression in Hypothalamus Sleep signaling

73 EASE Analysis of GO terms (Hosack, et al. 2003). Themes: General biological trends based on representation of functional roles on the array. Problem: Requirement of functional class assignment limits utility for discovery of new functional networks. Thanks to Doug Hosack, NIAID.
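The idea of judging a theme by its "representation on the array" can be illustrated with a plain hypergeometric over-representation test. This is a stand-in for the concept only: it is not the EASE score itself, and the function and parameter names are made up for this example.

```python
from math import comb


def overrepresentation_p(list_hits, list_size, array_hits, array_size):
    """Probability of seeing at least `list_hits` genes from a GO category
    in a gene list of `list_size`, when `array_hits` of the `array_size`
    genes on the array belong to that category (hypergeometric upper tail).
    """
    total = comb(array_size, list_size)
    upper = min(list_size, array_hits)
    tail = sum(
        comb(array_hits, k) * comb(array_size - array_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total
```

For example, `overrepresentation_p(2, 2, 2, 4)` asks how likely it is that both genes in a two-gene list fall in a category that covers two of four genes on the array; the answer is 1/6. The slide's stated limitation still applies: genes with no functional class assignment cannot contribute to any theme.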

74 Now available... The TGI databases, including RESOURCERER. The TGICL Gene Index Clustering and Assembly Tools. A freely-available MySQL version of our MIAME-supportive database. A freely-available, open source, Java-based set of tools: MADAM: Microarray Data Manager; MIDAS: Microarray Data Analysis System; MeV: Multiexperiment Viewer. A freely-available image processing software system linked to the database: TIGR Spotfinder.

75 Nobody in the game of football should be called a genius. A genius is somebody like Norman Einstein. - Joe Theismann, former quarterback

76 A theory has only the possibility of being right or wrong. A model has a third possibility; it may be right but irrelevant. – Manfred Eigen

77 Unless a reviewer has the courage to give you unqualified praise, I say ignore the bastard. - John Steinbeck

78 Acknowledgments
The TIGR Gene Index Team: Foo Cheung, Svetlana Karamycheva, Yudan Lee, Babak Parvizi, Geo Pertea, Razvan Sultana, Jennifer Tsai, John Quackenbush, Joseph White. Funding provided by the Department of Energy and the National Science Foundation.
TIGR Human/Mouse/Arabidopsis Expression Team: Emily Chen, Bryan Frank, Renee Gaspard, Jeremy Hasseman, Heenam Kim, Lara Linford, Simon Kwong, John Quackenbush, Shuibang Wang, Yonghong Wang, Ivana Yang, Yan Yu.
Array Software Hit Team: Nirmal Bhagabati, John Braisted, Tracey Currier, Jerry Li, Wei Liang, John Quackenbush, Alexander I. Saeed, Vasily Sharov, Mathangi Thiagarajan, Joseph White.
Assistant: Sue Mineo.
Funding provided by the National Cancer Institute, the National Heart, Lung, and Blood Institute, and the National Science Foundation.
H. Lee Moffitt Center/USF: Timothy J. Yeatman, Greg Bloom.
TIGR PGA Collaborators: Norman Lee, Renae Malek, Hong-Ying Wang, Truong Luu, Bobby Behbahani.
TIGR Faculty, IT Group, and Staff.
PGA Collaborators: Gary Churchill (TJL), Greg Evans (NHLBI), Harry Gavras (BU), Howard Jacob (MCW), Anne Kwitek (MCW), Allan Pack (Penn), Beverly Paigen (TJL), Luanne Peters (TJL), David Schwartz (Duke).
Emeritus: Jennifer Cho (TGI), Ingeborg Holt (TGI), Feng Liang (TGI), Kristie Abernathy (mA), Sonia Dharap (mA), Julie Earle-Hughes (mA), Cheryl Gay (mA), Priti Hegde (mA), Rong Qi (mA), Erik Snesrud (mA).

