Control Case Common Always active

Slides:



Advertisements
Similar presentations
Analysis of High-Throughput Screening Data C371 Fall 2004.
Advertisements

DREAM4 Puzzle – inferring network structure from microarray data Qiong Cheng.
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Huffman code and ID3 Prof. Sin-Min Lee Department of Computer Science.
Using a Mixture of Probabilistic Decision Trees for Direct Prediction of Protein Functions Paper by Umar Syed and Golan Yona department of CS, Cornell.
Pattern Recognition and Machine Learning
CGeMM – University of Louisville Mining gene-gene interactions from microarray data - Coefficient of Determination Marcel Brun – CGeMM - UofL.
Putting genetic interactions in context through a global modular decomposition Jamal.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Minimum Redundancy and Maximum Relevance Feature Selection
Correlation Aware Feature Selection Annalisa Barla Cesare Furlanello Giuseppe Jurman Stefano Merler Silvano Paoli Berlin – 8/10/2005.
Decomposition of overlapping protein complexes: A graph theoretical method for analyzing static and dynamic protein associations Algorithms for Molecular.
GENIE – GEne Network Inference with Ensemble of trees Van Anh Huynh-Thu Department of Electrical Engineering and Computer Science, Systems and Modeling,
Mutual Information Mathematical Biology Seminar
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Semantic text features from small world graphs Jure Leskovec, IJS + CMU John Shawe-Taylor, Southampton.
Reduced Support Vector Machine
Cancer classification using Machine Learning Techniques on Microarray Data Yongjin Park 1 and Ming-Chi Tsai 2 1 Department of Biology, Computational Biology.
Lecture 5 (Classification with Decision Trees)
Microarray analysis Algorithms in Computational Biology Spring 2006 Written by Itai Sharon.
Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.
Statistical Analysis of Microarray Data
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.
Ensemble Learning (2), Tree and Forest
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
Digital Camera and Computer Vision Laboratory Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan, R.O.C.
A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12 th Annual Research.
Slide Image Retrieval: A Preliminary Study Guo Min Liew and Min-Yen Kan National University of Singapore Web IR / NLP Group (WING)
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
Genetic network inference: from co-expression clustering to reverse engineering Patrik D’haeseleer,Shoudan Liang and Roland Somogyi.
Gene Set Enrichment Analysis (GSEA)
by B. Zadrozny and C. Elkan
Digital Camera and Computer Vision Laboratory Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan, R.O.C.
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
Gene Regulatory Network Inference. Progress in Disease Treatment  Personalized medicine is becoming more prevalent for several kinds of cancer treatment.
1 1 Slide Evaluation. 2 2 n Interactive decision tree construction Load segmentchallenge.arff; look at dataset Load segmentchallenge.arff; look at dataset.
CellFateScout step- by-step tutorial for a case study Version 0.94.
One-class Training for Masquerade Detection Ke Wang, Sal Stolfo Columbia University Computer Science IDS Lab.
Abstract Background: In this work, a candidate gene prioritization method is described, and based on protein-protein interaction network (PPIN) analysis.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Slides for “Data Mining” by I. H. Witten and E. Frank.
Digital Camera and Computer Vision Laboratory Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan, R.O.C.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
1 CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014 Network problems Tamer Kahveci.
SUPPLEMENTAL FIGURES AND TABLES. Supplementary Table 1: List of new and improved features in GSEA-P version 2 Java software. Examples and screenshots.
PCA vs ICA vs LDA. How to represent images? Why representation methods are needed?? –Curse of dimensionality – width x height x channels –Noise reduction.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
Microarray Data Analysis The Bioinformatics side of the bench.
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
A Kernel Approach for Learning From Almost Orthogonal Pattern * CIS 525 Class Presentation Professor: Slobodan Vucetic Presenter: Yilian Qin * B. Scholkopf.
Combining multiple learners Usman Roshan. Decision tree From Alpaydin, 2010.
Advanced Gene Selection Algorithms Designed for Microarray Datasets Limitation of current feature selection methods: –Ignores gene/gene interaction: single.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Lecture 7: Constrained Conditional Models
University of Rochester
Heping Zhang, Chang-Yung Yu, Burton Singer, Momian Xiong
Fast Kernel-Density-Based Classification and Clustering Using P-Trees
Trees, bagging, boosting, and stacking
Chapter 6 Classification and Prediction
Source: Procedia Computer Science(2015)70:
Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification M N S S K Pavan Kumar Advisor : Dr. C. V. Jawahar.
Computer Vision Chapter 4
Ying Dai Faculty of software and information science,
Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang
Chapter 7: Transformations
Presentation transcript:

Control Case Common Always active

2 MinePath MinePath introduces a new methodology for the identification of differentially expressed functional paths or sub-paths within a gene regulatory network (GRN) using microarray data analysis. Innovative features & benefits: ◦ MinePath takes advantage of the regulatory mechanisms in a GRN such as the direction and the type of interaction (activation/inhibition) between genes for each sub-pathway. ◦ Contrary to similar efforts which visualize the state of genes on a pathway, MinePath identifies and visualizes differentially expressed regulatory mechanisms and sub-pathways of GRNs. ◦ MinePath is a web based application (no setup is needed) which can compute, identify and visualize differentially expressed paths from your expression data within seconds

3

4 Information Gain & Entropy*  Based on Information Gain & Entropy*  0 indicates a non- expressed or under- expressed gene  1 indicates over- expressed gene * Potamias G., Koumakis L., & Moustakis V. "Gene selection via discretized gene-expression profiles and greedy feature-elimination." In Methods and Applications of Artificial Intelligence, pp Springer Berlin Heidelberg, 2004.

5 GRNs are described through standard graph annotations.  Nodes can be either genes, groups of genes, compounds or other networks.  Edges can be one of the gene relations known from the biology theory

6 Sub-paths decomposition:  KGML (KEGG XML) processing  All possible GRN sub-paths are extracted

7 MinePath provides two options to cope with the one to many (probe to gene) issue:  Max Probe: selection of the value of the probe with the highest intensity out of all the probes that map to the same gene (default option).  Probes clones: produce all the possible combinations of sub-paths based on probes and not on gene ids. ◦ 3*3*1*3*2 = 54 sub-paths in this example

NormalDisease S1S2S3S4S5S6 A B C D E A D C B E D E Microarrays Gene Regulatory Networks

9 Boolean operations Sub-paths are extracted from the graph using basic Boolean operations for optimization logic AND  Activation is mapped as a logic AND logic XOR  Inhibition as a logic XOR  sub-paths with more than one reaction require the combination of previous sub- path and the last relation using a logic AND

10 The result is an array of sub- paths with binary values for every sample in the form of a discretized microarray

11 MinePath produces a binary matrix containing information about the sub-paths (active or not) for the specific samples  Transformation does not aim to reduce the dimensionality issue of microarrays  e.g. U133A ( probes) & all hsa KEGG pathways produce more than sub-paths  MinePath analysis identifies: ◦ The “best” or in our case the most discriminant features (sub- paths) using two different filtering/ranking methodologies:  the discriminant ranking  the polarity ranking ◦ The “best” common sub-paths (sub-paths that appear to be functional for both phenotypes)

12 Assume the two phenotypic classes P (positive), N (negative) : ◦ H P = number of P samples that the sub-path holds. ◦ L P = number of P samples that the sub-path does not hold. ◦ H N = number of N samples that the sub-path holds. ◦ L N = number of N samples that the sub-path does not hold.

13  Relative polarity  Relative polarity: A variation of the polarity ranking formula where in-stead of absolute counts we use the percentages of the over expressed per class. ◦ valuable for datasets with unbalanced number of samples per class  Boost true positive  Boost true positive: gives a low ranking (“penalty”) to sub-paths which are activated in a small number of samples for the one class and in none or almost none samples in the other. ◦ This variation of the polarity will assure that the almost always non-functional sub-paths will be rejected

14 Sub-paths which are always activated may fill-in the gap (functional interaction) between two sub- paths and reveal a complete functional and biologically valuable route.

15 MinePath provides mechanisms that validate the best sub-paths against the different phenotypes using well-known algorithms and validation procedures from the area of machine learning: ◦ Decision tree learning (C4.5) ◦ Naïve Bays ◦ Support Vector Machines (Linear kernel) By default MinePath computes, stores and reports 10-fold cross-validation results, but additional modelling experiments could be conducted and evaluated ◦ e.g. following a train vs. independent test experimentation mode

16

17 Select or upload gene expression dataset Select pathwaysRun MinePath

18 Statistics for each pathway participated in the experiment such as the number of genes, the number of sub-paths, and number of sub-paths for each class and for the common sub-paths, percentages and three scores: pwA  Pathway power (pwA): is the sum of the significant sub-paths in the pathway (including the common sub-paths) divided by the number of the total sub-paths of the pathway. pwDS  Pathway discriminant power (pwDS): is the number of the significant sub-paths for the two classes divided by the number of the total sub-paths of the pathway. Score  The pathway score (Score):Score = pwA * pwDS The user can also short the results based on any of these features.

19 MinePath supports active interaction and immediate visualization when the end user sets new thresholds for the two phenotypes or for the always active sub-paths, as well as to hide/show the overlapping relations and hide/show the association- dissociations of the pathway from the control panel An example of the ErbB pathway for the ‘4ERdatasets’ dataset Colour coding:  Red: relations active at class 1  Blue: relations active at class 2  Magenta: overlapping relations in the two classes  Orange: sub-paths that are always active.

20 MinePath is equipped with special functionality that enables the reduction of network’s complexity: deletion of genes deletion of relations deletion of parts of the network re-orientation of its topology. The functionality is available with a right click (in the viewer). Thresholds:13 for class 1 (ERpos), 13 for class 2 (ERneg), 95% for always active sub-paths