Title: Assign Pathways to Gene Set June 21, 2007 Guanming Wu.

Presentation transcript:

Contents: Recap of previous work; Introduction to the dataset from Scott Powers; Statistical model used; Results; Summary and future directions

Results from Previous Talks. Notes: 1. Values in parentheses are the numbers of proteins in SwissProt (column 2) or the coverage in SwissProt (column 3). 2. Coverages in column 3 were calculated by dividing the numbers in column 2 by the total number of HPRD entries (25205) or SwissProt entries (14446). 3. Citations: 1). Joshi-Tope G. et al. Nucleic Acids Res. 33: D (2005) 2). Huaiyu Mi et al. Nucleic Acids Res. 35: D247-D252 (2007) 3). 4). 5). 6).

Results from Previous Talks Naïve Bayes Classifier

Dataset from Scott Powers. Lung cancer samples or cell lines: 135. Amplified fragments: 365. Genes contained in the fragments: 3900. Question: How can statistically significant pathways be found for these genes?

A Simple Model Binomial Test Bonferroni Correction ?
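
The slide names only a binomial test, so the following is a minimal sketch under an assumed setup: each gene in the set is treated as an independent draw that lands in a given pathway with probability equal to the fraction of all annotated genes belonging to that pathway. The function name, the 1% pathway fraction, and the hit count of 60 are hypothetical; only the gene-set size of 3900 comes from the dataset slide.

```python
from math import exp, lgamma, log


def binomial_upper_tail(hits, sample_size, pathway_fraction):
    """Return P(X >= hits) for X ~ Binomial(sample_size, pathway_fraction)."""
    if hits <= 0:
        return 1.0
    if hits > sample_size or pathway_fraction <= 0.0:
        return 0.0
    if pathway_fraction >= 1.0:
        return 1.0
    log_p = log(pathway_fraction)
    log_q = log(1.0 - pathway_fraction)
    total = 0.0
    for k in range(hits, sample_size + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_choose = (lgamma(sample_size + 1) - lgamma(k + 1)
                      - lgamma(sample_size - k + 1))
        total += exp(log_choose + k * log_p + (sample_size - k) * log_q)
    return min(1.0, total)


# Hypothetical pathway: covers 1% of annotated genes, 60 of 3900 genes hit it.
p_value = binomial_upper_tail(60, 3900, 0.01)
print(f"uncorrected p-value: {p_value:.3g}")
```

A one-sided upper tail is used here because the question is whether a pathway is hit more often than expected by chance; whether the original analysis used a one-sided or two-sided test is not stated.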

Results from Simple Model

Bonferroni Correction: $p_{\text{corrected}} = \min(1,\ n \cdot p)$, where $p$ is the uncorrected p-value and $n$ is the number of pathways
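
As a small illustration of the standard Bonferroni correction, each p-value is multiplied by the number of pathways tested and capped at 1. The pathway names and p-values below are hypothetical placeholders, not results from the talk.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    n = len(p_values)
    return {name: min(1.0, p * n) for name, p in p_values.items()}


# Hypothetical uncorrected p-values for three pathways.
uncorrected = {"pathway_A": 0.001, "pathway_B": 0.2, "pathway_C": 0.6}
corrected = bonferroni(uncorrected)
for name, p in corrected.items():
    print(name, p)
```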

How to consider frequencies? To take frequencies into account, a new list of genes was generated in which each gene is counted multiple times according to its frequency, e.g. OR2T29 → 14, MYC → 9, etc. Total number: 5717. Panels: Redundant Set, Non-redundant Set
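
One way to build such a frequency-weighted ("redundant") list is sketched below. The counts for OR2T29 and MYC are taken from the slide; `GENE_X` and the function name are hypothetical additions for illustration.

```python
def build_redundant_list(gene_frequencies):
    """Repeat each gene as many times as its observed frequency."""
    redundant = []
    for gene, count in gene_frequencies.items():
        redundant.extend([gene] * count)
    return redundant


# OR2T29 and MYC counts are from the slide; GENE_X is a made-up single hit.
frequencies = {"OR2T29": 14, "MYC": 9, "GENE_X": 1}
redundant_set = build_redundant_list(frequencies)
non_redundant_set = sorted(frequencies)

print(len(redundant_set), len(non_redundant_set))
```

The non-redundant set is simply the distinct gene names, so the same pathway test can be run on either list.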

Results from Simple Model - Redundant Set Bonferroni Correction cannot make any difference!

Permutation Based Model: Sampling genes → Binomial test → Filtering out hit pathways based on cut-off value (× 1000) → Counting occurrences of pathways → Generating a mapping file → Binomial test of actual sample → Correcting sample p-values using mapping file → Choosing cut-off p-value
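
The slide does not spell out how the mapping file turns raw p-values into corrected ones, so the sketch below substitutes a common permutation scheme as an assumption: for each random gene sample, record the smallest pathway p-value, then report the fraction of random runs whose minimum is at least as small as the observed p-value. Genes are drawn uniformly from a toy universe rather than by chromosome segment, and all names and data are hypothetical.

```python
import random
from math import exp, lgamma, log


def upper_tail(hits, n, p):
    """P(X >= hits) for X ~ Binomial(n, p), summed in log space."""
    if hits <= 0:
        return 1.0
    if hits > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for k in range(hits, n + 1):
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        total += exp(log_choose + k * log(p) + (n - k) * log(1.0 - p))
    return min(1.0, total)


def pathway_p_values(genes, pathways, universe_size):
    """Binomial test of one gene list against every pathway."""
    result = {}
    for name, members in pathways.items():
        hits = sum(1 for gene in genes if gene in members)
        result[name] = upper_tail(hits, len(genes), len(members) / universe_size)
    return result


def null_min_p_values(universe, pathways, sample_size, runs=1000, seed=0):
    """Smallest pathway p-value from each of `runs` random gene samples."""
    rng = random.Random(seed)
    minima = []
    for _ in range(runs):
        sample = rng.sample(universe, sample_size)
        minima.append(min(pathway_p_values(sample, pathways, len(universe)).values()))
    return minima


def corrected_p(observed_p, null_minima):
    """Fraction of random runs with a minimum p-value <= the observed one."""
    return sum(1 for q in null_minima if q <= observed_p) / len(null_minima)


# Toy data: 200 genes, two pathways, and a gene set enriched for pathway A.
universe = [f"g{i}" for i in range(200)]
pathways = {"A": set(universe[:20]), "B": set(universe[20:60])}
observed_genes = universe[:15] + universe[100:105]

observed = pathway_p_values(observed_genes, pathways, len(universe))
null_minima = null_min_p_values(universe, pathways, len(observed_genes), runs=200)
for name, p in observed.items():
    print(name, corrected_p(p, null_minima))
```

With 1000 runs the smallest reportable corrected value is 1/1000, so a result of zero would be stated as "< 0.001".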

Sampling Genes Chromosome segment based: Using a fixed length to sample a chromosome based on CNV information Example: Chromosome 1
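
A rough sketch of fixed-length segment sampling follows. It assumes gene coordinates are available as start and end positions on one chromosome and that a window of fixed length is placed at a uniformly random position; how the original work used CNV information to choose the length or position is not described, so the coordinates, names, and lengths here are all hypothetical.

```python
import random


def sample_segment(gene_positions, chrom_length, segment_length, rng):
    """Pick a random fixed-length window and return the genes overlapping it."""
    start = rng.randrange(0, chrom_length - segment_length + 1)
    end = start + segment_length
    genes = [
        name
        for name, (gene_start, gene_end) in gene_positions.items()
        if gene_start < end and gene_end > start  # intervals overlap
    ]
    return start, end, genes


# Hypothetical gene coordinates on a toy chromosome of length 1000.
toy_genes = {"GENE_A": (100, 200), "GENE_B": (150, 400), "GENE_C": (900, 950)}
rng = random.Random(42)
start, end, sampled = sample_segment(toy_genes, 1000, 300, rng)
print(start, end, sampled)
```

Sampling whole segments keeps neighbouring genes together, which mimics how amplified fragments pick up physically adjacent genes.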

One Run 2.9E-07 B cell receptor signaling pathway(I) 3.0E-07

B cell receptor signaling pathway(I) p value: < 0.001

Another Run 2.7E-06 TGFBR(C) 3.0E-06

TGFBR(C) p value: 0.002

Significantly Hit Pathways - Non-redundant Set

Significantly Hit Pathways - Redundant Set

Results from A Simple Sampling Sampling: Randomly pick 3900 genes from all human genes

Summary: A framework has been built to look for statistically significant pathways for a list of genes. Using this framework, we found several pathways linked to the gene set from lung cancer CNVs. However, the relationships among these hit pathways, and among the genes in them, need further investigation.

Future Directions: Validate the predicted results: pick disease-related gene sets with known pathways (e.g. Type 1 diabetes). Develop a web-based application to deploy the combined network to end users. Develop methods based on graph theory to explore relationships among genes in hit pathways: protein interaction data will be used as bridges to traverse different pathways.

Reference Osier, MV, Zhao, H and Cheung, KH: Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 2004, 5: 124

Thanks!!!