An Association Analysis Approach to Biclustering


Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar
Department of Computer Science and Engineering, University of Minnesota
Website:

MOTIVATION

- Bicluster: a group of objects showing similarity over only a subset of the features in a data set.
- The problem has been studied extensively for microarray data, for finding various types of biclusters.
- Biclustering finds more functionally enriched groups of genes than hierarchical clustering [Prelic et al, 2006].

Types of biclusters (figure panels):
- Constant value biclusters
- Constant row (column) biclusters
- Constant addition biclusters

CURRENT BICLUSTERING APPROACHES

Define an objective function/measure for the coherence of a bicluster, then search for biclusters that optimize it:
- Coclustering: reorder rows and columns for a global minimum.
- Cheng & Church (CC): eliminate rows and columns for a local minimum.
- ISA: eliminate rows and columns starting from a random seed.

Common Issues
- Non-exhaustive: the heuristic search scheme doesn't enumerate all biclusters satisfying the specified condition.
- Bias towards larger biclusters: the objective function/measure is satisfied early.
- Non-overlapping biclusters (some approaches).

[Madeira & Oliveira, 2004]

APPROACH

Association patterns are biclusters!

Advantages
- Exhaustive (and efficient) discovery of biclusters.
- Can discover small biclusters owing to the bottom-up search procedure.

Disadvantage
- Need to binarize or discretize the original real-valued data set, which causes a loss of information [Becquet et al, 2002; Creighton et al, 2003; McIntosh et al, 2007].

Range Support: an anti-monotonic support measure for real-valued data!
- Constraints imposed:
  - Consistency of expression values
  - Same direction of expression
  - These conditions are satisfied over a substantial number of conditions
- Can be used within an Apriori-like framework [Agrawal et al. 1994]
- Implementation at

Figure labels (itemset lattice): Found to be Infrequent; Pruned supersets

RESULTS

Figure panels:
- Functional enrichment for large classes ( members)
- Functional enrichment for small classes (1-30 members)
- Fraction of patterns (biclusters) enriched by several groups of small classes at p-value 1 × 10^-5
- Fraction of class covered by patterns (biclusters) among several groups of small classes at p-value 1 × 10^-5

ACKNOWLEDGEMENTS

This work has been supported by NSF grants #CRI , #IIS and #ITR . Computational resources for this work were provided by MSI.

REFERENCES

- G. Pandey, V. Kumar, and M. Steinbach. Computational Approaches for Protein Function Prediction: A Survey. Technical Report , Department of Computer Science, University of Minnesota, October 2006.
- G. Pandey, M. Steinbach, R. Gupta, T. Garg, and V. Kumar. Association Analysis-based Transformations for Protein Interaction Networks: A Function Prediction Case Study. Proceedings of ACM KDD, pp , 2007.
- G. Pandey, G. Atluri, M. Steinbach, and V. Kumar. Association Analysis for Real-valued Data: Definitions and Application to Microarray Data. TR , Department of Computer Science, University of Minnesota, 2008.
- H. Xiong, G. Pandey, M. Steinbach, and V. Kumar. Enhancing data analysis with noise removal. IEEE Transactions on Knowledge and Data Engineering, 18(3):304–319.
- H. Xiong, X. He, C. Ding, Y. Zhang, V. Kumar, and S. R. Holbrook. Identification of functional modules in protein complexes via hyperclique pattern discovery. In Proc. Pacific Symposium on Biocomputing (PSB), pages 221–232.
- H. Xiong, P.-N. Tan, and V. Kumar. Hyperclique pattern discovery. Data Min. Knowl. Discov., 13(2):219–242.
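
The claim that association patterns are biclusters, and that an anti-monotonic support measure permits an exhaustive bottom-up search, can be illustrated with a minimal sketch. It assumes the data have already been binarized (the case the poster lists as a disadvantage), uses plain transaction-count support rather than Range Support, and is not the authors' implementation. The function name `apriori_biclusters`, the toy data, and the gene/condition labels are hypothetical.

```python
from itertools import combinations


def apriori_biclusters(transactions, min_support):
    """Level-wise (Apriori-style) search on binarized data.

    transactions: dict mapping a transaction id to the set of items it contains.
    Returns a dict mapping each frequent itemset (frozenset) to the frozenset of
    transaction ids that contain it. Each (itemset, transaction ids) pair is an
    all-ones submatrix, i.e. a bicluster of the binary matrix.
    """
    # Level 1: collect the supporting transactions of every single item.
    support_sets = {}
    for tid, items in transactions.items():
        for item in items:
            support_sets.setdefault(frozenset([item]), set()).add(tid)
    level = {
        itemset: frozenset(tids)
        for itemset, tids in support_sets.items()
        if len(tids) >= min_support
    }

    result = {}
    while level:
        result.update(level)
        next_level = {}
        for a, b in combinations(level, 2):
            union = a | b
            # Only join itemsets that differ in exactly one item.
            if len(union) != len(a) + 1 or union in next_level:
                continue
            # Anti-monotonicity: if any subset is infrequent, prune the superset.
            if any(union - {item} not in level for item in union):
                continue
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[union] = tids
        level = next_level
    return result


# Toy binarized data (hypothetical): conditions as transactions, genes as items.
example = {
    "c1": {"g1", "g2", "g3"},
    "c2": {"g1", "g2", "g3"},
    "c3": {"g1", "g2"},
    "c4": {"g3", "g4"},
}
patterns = apriori_biclusters(example, min_support=2)
for itemset, tids in patterns.items():
    print(sorted(itemset), "supported by", sorted(tids))
```

Because the search enumerates every itemset that meets `min_support`, small and overlapping biclusters are reported alongside large ones, which is the contrast the poster draws with heuristic approaches. Substituting a different anti-monotonic measure for the transaction count would leave the pruning step unchanged.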