The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.


1 The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006

2 Mining Biological Data KU EECS 800, Luke Huan, Fall’06 slide2 10/11/2006 Graph Mining in Microarray Overview A quick review of PCA Graph mining in microarray analysis Graph indexing

3 Data Matrix: The data matrix is $X = (x_1, x_2, \ldots, x_n)$, where each $x_i$ is a column vector and $\bar{x}$ is the column mean of $X$.

4 Projection: Project the data matrix $X$ onto a line, $y = \lambda^{T} X$, where $\lambda$ is a unit column vector ($\lambda^{T}\lambda = 1$). The variance of the projection is $V = \lambda^{T}\Sigma\lambda$, where $\Sigma$ is the covariance matrix of $X$.

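5 Derivation of PCs: To find the $\lambda$ that maximizes $V = \lambda^{T}\Sigma\lambda$ subject to $\lambda^{T}\lambda = 1$, let $k$ be a Lagrange multiplier and set the derivative of $\lambda^{T}\Sigma\lambda - k(\lambda^{T}\lambda - 1)$ with respect to $\lambda$ to zero: $2\Sigma\lambda - 2k\lambda = 0$, so $\Sigma\lambda = k\lambda$. Therefore $\lambda$ is an eigenvector of $\Sigma$ with eigenvalue $k$, and $V = \lambda^{T}\Sigma\lambda = k$. Choose the eigenvector with the largest eigenvalue.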
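The derivation can be checked numerically. The following is a minimal sketch, not the method used in the lecture: it assumes observations are stored as rows (the slides use column vectors), divides by n when forming the covariance, and uses plain power iteration to approximate the eigenvector with the largest eigenvalue. The function names and the toy data are illustrative.

```python
import math

def covariance_matrix(rows):
    """Covariance matrix of observations given as equal-length rows (divides by n)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def first_principal_component(sigma, iterations=100):
    """Power iteration: approximate the unit eigenvector of sigma with the
    largest eigenvalue. Assumes sigma is nonzero and the start vector is not
    orthogonal to that eigenvector."""
    d = len(sigma)
    lam = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        nxt = [sum(sigma[i][j] * lam[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        lam = [v / norm for v in nxt]
    # Variance of the projection, V = lam^T Sigma lam, which equals the eigenvalue k.
    k = sum(lam[i] * sigma[i][j] * lam[j] for i in range(d) for j in range(d))
    return k, lam

# Toy data (made up for illustration): points scattered around the line y = x.
data = [[2.0, 1.0], [1.0, 2.0], [4.0, 3.0], [3.0, 4.0]]
sigma = covariance_matrix(data)
k, lam = first_principal_component(sigma)
```

For this toy data the covariance matrix has eigenvalues 2.0 and 0.5, so the projection onto the returned direction has variance 2.0, and no other unit vector gives a larger value.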

6 Relational Graph: Each node represents a distinct entity. Examples: social networks, gene relevance networks, and protein interaction networks (for instance, with nodes such as YKL172W, YOR206W, YPL146C, …).

7 Motivation: Highly connected subgraphs in a large graph are usually not artifacts; they tend to correspond to a group or a shared functionality. Recurrent patterns discovered in multiple graphs are more robust than patterns mined from a single graph.

8 Microarray Data Analysis: How can we integrate results from multiple microarray experiments that are performed on the same set of genes?

9 Problem Definition: Given a set of relational graphs, find all frequent closed subgraphs with high edge connectivity.

10 Constraints: Highly connected subgraph: the edge connectivity is greater than a threshold. Frequent subgraph: a subgraph is frequent if a large number of graphs contain it. Closed subgraph: a subgraph is closed if no supergraph has the same support.
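The frequency and closedness constraints can be sketched for relational graphs, where every node is a distinct entity and a graph can therefore be treated as a set of edges. This is an illustrative sketch under that assumption, not the algorithm from the lecture: a pattern is "contained" in a graph when its edge set is a subset, and the closedness test ignores any requirement that the supergraph be connected. The helper names are hypothetical.

```python
def edge(u, v):
    """An undirected edge between two distinct, uniquely named nodes."""
    return frozenset((u, v))

def support(pattern, graphs):
    """Number of graphs whose edge set contains every edge of the pattern."""
    return sum(1 for g in graphs if set(pattern) <= set(g))

def is_frequent(pattern, graphs, min_support):
    return support(pattern, graphs) >= min_support

def is_closed(pattern, graphs):
    """Closed: no proper superset of the pattern has the same support.
    Equivalently, the edges shared by all supporting graphs are exactly the pattern."""
    containing = [set(g) for g in graphs if set(pattern) <= set(g)]
    if not containing:
        return False
    common = set.intersection(*containing)
    return common == set(pattern)

ab, bc, ca, cd = edge("a", "b"), edge("b", "c"), edge("c", "a"), edge("c", "d")
graphs = [{ab, bc, ca}, {ab, bc, ca, cd}, {ab, bc}]
```

Here the single edge `ab` appears in all three graphs but is not closed, because the larger pattern `{ab, bc}` has the same support; `{ab, bc}` and the triangle `{ab, bc, ca}` are both closed.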

11 Minimum Cut Decomposition: A minimum cut of a graph G is a smallest set of edges whose removal disconnects G. The edge connectivity of G is the size of a minimum cut of G. Problem: find the subgraphs of a graph whose minimum cut size (edge connectivity) is greater than K.

12 Minimum Cut Decomposition: Solution: repeatedly find a minimum cut in the graph and remove the cut edges, until the minimum cut size of each remaining piece is greater than K or there is no edge left.
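The decomposition can be sketched for a single small graph as follows. This is a minimal illustration, not the lecture's implementation: it assumes a simple undirected graph stored as an adjacency dictionary, and it finds a global minimum cut by running unit-capacity max-flow from one fixed vertex to every other vertex, which is simple but slow. All function names are illustrative.

```python
from collections import deque
from itertools import combinations

def build(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def st_min_cut(adj, s, t):
    """Return (cut size, source side) of a minimum s-t cut, unit edge capacities."""
    cap = {u: {v: 1 for v in adj[u]} for u in adj}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:  # no path left: reachable set is the source side
            return flow, set(parent)
        v = t
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def global_min_cut(adj):
    """Minimum over all t of the s-t minimum cut for a fixed s (needs >= 2 nodes)."""
    nodes = list(adj)
    return min((st_min_cut(adj, nodes[0], t) for t in nodes[1:]),
               key=lambda cut: cut[0])

def decompose(adj, k):
    """Vertex sets of the pieces whose edge connectivity is greater than k."""
    result, stack = [], [adj]
    while stack:
        g = stack.pop()
        if len(g) < 2:
            continue
        size, side = global_min_cut(g)
        if size > k:
            result.append(set(g))
            continue
        # Removing the cut edges leaves the induced subgraphs on the two sides.
        for part in (side, set(g) - side):
            stack.append({u: {v for v in g[u] if v in part} for u in part})
    return result

# Two 4-cliques joined by a single bridge edge d-e.
edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2)) + [("d", "e")]
result = decompose(build(edges), k=2)
```

With K = 2 the bridge is cut first, and each 4-clique (edge connectivity 3) is reported as a highly connected piece.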

13 Challenges: How to perform minimum cut decomposition in the context of multiple relational graphs. How to integrate it with the pattern-growth approach and the pattern-reduction approach.

14 No Downward Closure Property: Given two graphs G and G’, if G is a subgraph of G’, it does not follow that the connectivity of G is less than that of G’, nor the reverse: a subgraph can be more or less connected than the graph that contains it.

15 Minimum Degree Constraint: Let G be a frequent graph and X be the set of edges that can be added to G such that G ∪ e (e ∈ X) is connected and frequent. The graph G ∪ X is the maximal graph to which G can be extended over the vertices belonging to G.

16 Pattern-Growth Approach: Find a small frequent candidate graph. Remove vertices (shadow graph) whose degree is less than the connectivity. Decompose it to extract the subgraphs satisfying the connectivity constraint; stop decomposing when the subgraph has been checked before. Extend this candidate graph by adding new vertices and edges. Repeat.
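The degree-based pruning step rests on a simple fact: a vertex of degree d can be isolated by removing its d incident edges, so no subgraph with edge connectivity at least k can contain a vertex of degree below k. A minimal sketch of that pruning alone, assuming an adjacency-dictionary graph and an illustrative function name:

```python
def prune_low_degree(adj, k):
    """Repeatedly delete vertices whose degree is below k.
    Deleting one vertex can push its neighbors below k, so they are rechecked."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    queue = [u for u in adj if len(adj[u]) < k]
    while queue:
        u = queue.pop()
        if u not in adj:
            continue  # already removed
        for v in adj.pop(u):
            adj[v].discard(u)
            if len(adj[v]) < k:
                queue.append(v)
    return adj

# Triangle a-b-c with a tail c-d-e hanging off it.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
pruned = prune_low_degree(graph, k=2)
```

With k = 2, vertex e is removed first, which drops d to degree 1 and removes it as well; only the triangle survives for the more expensive decomposition step.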

17 Pattern-Reduction Approach: Decompose the relational graphs according to the connectivity constraint.

18 Pattern-Reduction Approach (cont.): Intersect the decomposed graphs and decompose the resulting subgraphs.
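Because nodes in relational graphs are distinct entities, intersecting graphs reduces to intersecting edge sets. The sketch below shows only that intersection step and then splits the result into connected pieces; it is an illustration under these assumptions, not the Splat algorithm itself, and the connectivity-based decomposition of each piece is left out. The function names are hypothetical.

```python
def edge(u, v):
    return frozenset((u, v))

def intersect(graphs):
    """Edges that are present in every graph (each graph is a set of edges)."""
    return set.intersection(*(set(g) for g in graphs))

def components(edges):
    """Connected components (as vertex sets) of the graph formed by the edges."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in part:
                continue
            part.add(u)
            stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts

g1 = {edge("a", "b"), edge("b", "c"), edge("c", "a"), edge("c", "d"), edge("d", "e")}
g2 = {edge("a", "b"), edge("b", "c"), edge("c", "a"), edge("d", "e"), edge("e", "f")}
common = intersect([g1, g2])
parts = components(common)
```

In this toy example the edge c-d is lost in the intersection, so the shared structure falls apart into the triangle a-b-c and the single edge d-e, each of which would then be checked against the connectivity constraint.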

19 Experimental Results: Pattern-growth approach: CloseCut. Pattern-reduction approach: Splat. Synthetic data parameters: the number of graphs, objects, and seeds, the size of seeds, the density, the number of seeds per graph, and the density of noise edges.

20 Experimental Results (cont.): 32 yeast microarray data sets from the Stanford Microarray Database and the NCBI Gene Expression Omnibus. Each data set has the expression profiles of 6,661 genes in at least 8 experiments (cell cycle, amino acid starvation, heat shock, …). We constructed 32 relational graphs from these data sets.

21 Experimental Results (Synthetic Data)

22 Experimental Results (32 Microarray Datasets)

23 Discovered Patterns [figure labels: Ribosomal RNA Processing; UNKNOWN]

24 Discovered Patterns [figure labels: Ribosomal Biogenesis; UNKNOWN]

25 Summary: Introduce a new graph mining problem. Develop graph algorithms in the context of multiple graphs, where the existing methods should be re-examined. Demonstrate the applicability of frequent graph mining in biological networks.

26 Types of Graph Database Queries: Given a query graph Q and a graph database G, perform one of the following. Graph isomorphism query: find a graph in G equivalent to Q. Subgraph isomorphism query: find all graphs in G with a subgraph equivalent to Q. Similarity query: find all graphs in G that are similar to Q.

27 Graph Isomorphism: Let V(G) be the vertex set of a graph G and E(G) its edge set. Graphs G and H are isomorphic iff there is a bijection f: V(G) → V(H) such that uv ∈ E(G) if and only if f(u)f(v) ∈ E(H).
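The definition translates directly into a brute-force test that tries every bijection between the vertex sets. This is a sketch for tiny, simple, unlabeled graphs only (it takes factorial time), and the function name and edge-list representation are assumptions made for illustration.

```python
from itertools import permutations

def are_isomorphic(nodes_g, edges_g, nodes_h, edges_h):
    """Try every bijection f: V(G) -> V(H) and check that edges map to edges.
    Assumes simple graphs with no duplicate edges and no self-loops."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(nodes_g) != len(nodes_h) or len(eg) != len(eh):
        return False
    for image in permutations(nodes_h):
        f = dict(zip(nodes_g, image))
        # With equal edge counts and f injective, mapping every edge of G onto
        # an edge of H is equivalent to the "if and only if" in the definition.
        if all(frozenset(f[x] for x in e) in eh for e in eg):
            return True
    return False

path = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
relabeled_path = (["w", "x", "y", "z"], [("y", "w"), ("w", "z"), ("z", "x")])
star = (["w", "x", "y", "z"], [("w", "x"), ("w", "y"), ("w", "z")])
```

The path and the star have the same numbers of vertices and edges, so only the search over bijections tells them apart: `are_isomorphic(*path, *relabeled_path)` is true, while `are_isomorphic(*path, *star)` is false.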

28 Graph Labeling: All nodes in a graph may be considered equivalent; labels in such a graph are merely names. Alternatively, graphs may be labeled with class labels. For example, in the graph of benzene, vertices labeled “C” correspond to carbon atoms and vertices labeled “H” correspond to hydrogen atoms. Nodes and edges with different class labels are not considered interchangeable.

29 Class Labels and Isomorphism: Under a class labeling scheme, graph isomorphism restricts the bijection to map nodes and edges only to nodes and edges with the same class label.

30 Subgraph Isomorphism: Let V(G) be the vertex set of a graph G and E(G) its edge set. Graph G is sub-isomorphic to graph H iff there is an injection f: V(G) → V(H) such that uv ∈ E(G) implies f(u)f(v) ∈ E(H). In other words, G is sub-isomorphic to H iff G is isomorphic to at least one subgraph of H.
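A brute-force version of this test, extended with class labels on nodes, can be sketched as follows. It checks that every edge of G maps onto an edge of H, which is the reading "G is isomorphic to at least one subgraph of H". The representation (a dictionary from node to class label plus an edge list) and the function name are assumptions for illustration, and the search is exponential, so it is only suitable for very small graphs.

```python
from itertools import permutations

def is_subgraph_isomorphic(labels_g, edges_g, labels_h, edges_h):
    """True if some injection f: V(G) -> V(H) preserves node class labels
    and maps every edge of G onto an edge of H."""
    eh = {frozenset(e) for e in edges_h}
    nodes_g = list(labels_g)
    # Every ordered selection of |V(G)| distinct nodes of H is one injection.
    for image in permutations(labels_h, len(nodes_g)):
        f = dict(zip(nodes_g, image))
        if (all(labels_g[u] == labels_h[f[u]] for u in nodes_g)
                and all(frozenset((f[u], f[v])) in eh for u, v in edges_g)):
            return True
    return False

# Database graph: C - C - H (a toy fragment, not a real molecule).
h_labels = {1: "C", 2: "C", 3: "H"}
h_edges = [(1, 2), (2, 3)]

query_ch = ({"a": "C", "b": "H"}, [("a", "b")])  # a C-H bond
query_hh = ({"a": "H", "b": "H"}, [("a", "b")])  # an H-H bond
```

The C-H query matches (for example a → 2, b → 3), while the H-H query does not, because the database graph has only one node labeled H.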

31 Graph Similarity: What makes two graphs similar? Abstractly, two graphs can be described as similar if they have a high number of corresponding nodes and edges. However, depending on the interpretation, changing a single node may or may not completely change the properties of the represented object. Thus, similarity is application dependent.

32 Graph Similarity: For this discussion, the similarity between two graphs G1 and G2 is defined as the maximum number of node and edge matches under any mapping of nodes between them. [Figure: two example graph pairs, with Similarity = 6 and Similarity = 3.]
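This definition can be made concrete with an exhaustive search over node mappings. The sketch below fixes details the slide leaves open, so treat them as assumptions: a node match is a pair of mapped nodes with equal labels, an edge match is an edge of the smaller graph whose endpoints map onto an edge of the larger graph, edges are unlabeled, and mappings are injective. The graphs are made up and do not reproduce the figure's examples.

```python
from itertools import permutations

def similarity(labels1, edges1, labels2, edges2):
    """Maximum, over injective node mappings, of node matches plus edge matches."""
    if len(labels1) > len(labels2):  # always map the smaller graph into the larger
        labels1, edges1, labels2, edges2 = labels2, edges2, labels1, edges1
    e2 = {frozenset(e) for e in edges2}
    nodes1 = list(labels1)
    best = 0
    for image in permutations(labels2, len(nodes1)):
        f = dict(zip(nodes1, image))
        node_matches = sum(labels1[u] == labels2[f[u]] for u in nodes1)
        edge_matches = sum(frozenset((f[u], f[v])) in e2 for u, v in edges1)
        best = max(best, node_matches + edge_matches)
    return best

g1 = ({"a": "B", "b": "B", "c": "M"}, [("a", "b"), ("b", "c")])
g2 = ({"x": "B", "y": "B", "z": "N"}, [("x", "y"), ("y", "z")])
```

Here `similarity(*g1, *g1)` is 5 (three nodes and two edges all match), while `similarity(*g1, *g2)` is 4, because the M and N nodes can never match.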

33 A Graph Isomorphism Query: [Figure: a graph query is compared against a database of graphs; the graph isomorphism matches are shown.]

34 A Subgraph Isomorphism Query: [Figure: a graph query is compared against a database of graphs; the subgraph isomorphism matches are shown.]

35 A Graph Similarity Query: [Figure: a graph query is compared against a database of graphs; the similarity matches under the criterion Similarity > 4 are shown.]

36 Computational Challenges: Pairwise graph comparisons are difficult. The graph isomorphism problem is GI-complete. The subgraph isomorphism problem is NP-complete. The usual similarity comparisons are also not known to be in P. Graph databases are often large: the NCI/NIH AIDS antiviral screen dataset contains ~42,000 chemical compounds with an average of 25 vertices and 27 edges. Intelligent indexing and filtering are needed!

37 Related Work: GraphGrep by Shasha et al. filters by enumerating all possible node-to-node paths up to a specified maximum length. gIndex by Yan et al. indexes by finding distinctive features among frequently occurring subgraphs. Limitations: they support only discrete values for nodes and edges; they require exhaustive enumeration of features; summarizing features loses information about the graphs.

38 Graph Closures: A graph closure is an element-wise union of graphs. It has the characteristics of a graph except that, instead of singleton labels, a graph closure can have multiple labels. The symbol ε represents a null label. [Figure: graphs G1 and G2 and their closure C1 = Closure(G1, G2), with node label sets B, {B,C}, M, and {C, ε}.]

39 Volume of Graph Closures: A graph closure is a bounding container that can contain one or more graphs. The volume of a graph closure is determined by the number of graph permutations it contains. [Figure: the closure C1 with node label sets B, {B,C}, M, {C, ε} and the four graphs it contains; Volume(C1) = 4.]
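The closure and its volume can be sketched for node labels alone. This is a simplified illustration, not the Closure-Tree implementation: it assumes the two graphs have already been aligned node by node (with ε padding the graph that has fewer nodes), it ignores edges, and it counts volume as the product of the label-set sizes. The node labels follow the slide's example; the function names are hypothetical.

```python
from itertools import product
from math import prod

EPS = "ε"  # null label for a node that is absent from one of the graphs

def closure(aligned1, aligned2):
    """Element-wise union of two aligned node-label lists."""
    return [frozenset((a, b)) for a, b in zip(aligned1, aligned2)]

def volume(c):
    """Number of label assignments the closure can represent."""
    return prod(len(label_set) for label_set in c)

def enclosed_label_lists(c):
    """Enumerate every concrete labeling inside the closure, dropping null nodes."""
    for choice in product(*(sorted(s) for s in c)):
        yield [label for label in choice if label != EPS]

g1 = ["B", "B", "M", EPS]  # G1 has three nodes, padded with a null node
g2 = ["B", "C", "M", "C"]  # G2 has four nodes
c1 = closure(g1, g2)       # label sets B, {B,C}, M, {C, ε}
```

The two singleton sets contribute a factor of 1 each and the two-element sets a factor of 2 each, so `volume(c1)` is 4, which agrees with the Volume(C1) = 4 stated on the slide.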

40 Isomorphism of Graph Closures: Isomorphism can be extended to graph closures. When matching, any label of a node or edge can be used: a graph is sub-isomorphic to a graph closure if it is sub-isomorphic to at least one of the graphs it encloses. [Figure: the closure with node label sets B, {B,C}, M, {C, ε} and example sub-isomorphic graphs.]

41 Pseudo Subgraph Isomorphism

42 Further Readings:
H. Hu, X. Yan, Yu, J. Han, and X. J. Zhou, "Mining coherent dense subgraphs across massive biological networks for functional discovery", ISMB'05.
X. Yan, X. J. Zhou, and J. Han, "Mining closed relational graphs with connectivity constraints", SIGKDD'05.
H. He and A. K. Singh, "Closure-Tree: An Index Structure for Graph Queries", ICDE'06.
D. Williams, J. Huan, and W. Wang, "Graph Database Indexing Using Structured Graph Decomposition", ICDE'07.

