Counting subgraphs: support measures for graphs. Natalia Vanetik. PhD seminar, CS BGU, 5/12/2015.
This research was carried out under the supervision of Prof. Eyal S. Shimony and Prof. Ehud Gudes. Published in DAMI Journal vol. 13(2), September 2006.
Multiflows in graphs; counting functions in graphs; research directions.
Problem description: Let D and G be graphs. We need to measure the statistical significance of G as a subgraph of D. To do so, we observe the instances (isomorphic copies) of G within D. [Figure: three database graphs D containing instances of G, captioned "G has zero significance", "G has high significance", and "G has some significance".]
Definition: A counting function on graphs that measures the statistical significance of one graph G as a subgraph of another graph D is called a support measure. When G is not a subgraph of D, this function should return 0; otherwise, it should return a value greater than 0.
Traditional support measure: An item-set X in the relational model is a set of tuples (f_1, v_1), …, (f_n, v_n), where the f_i are field names and the v_i are values. A transaction T supports X if the value of f_i in T equals v_i for every i = 1…n. The support of an item-set X is the number of transactions in the database that support X.
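The item-set definition above can be sketched in a few lines of Python. This is a minimal illustration only: the toy database, its field names, and the helper names `supports` and `support` are assumptions made for the example, not taken from the talk.

```python
def supports(transaction, itemset):
    """True if the transaction has value v in field f for every (f, v) in the item-set."""
    return all(f in transaction and transaction[f] == v for f, v in itemset.items())

def support(database, itemset):
    """Number of transactions in the database that support the item-set."""
    return sum(1 for t in database if supports(t, itemset))

# Hypothetical toy database: each transaction maps field names to values.
database = [
    {"color": "red", "size": "L"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "L"},
]

print(support(database, {"color": "red"}))               # 2
print(support(database, {"color": "red", "size": "L"}))  # 1
```

In this toy example, enlarging the item-set cannot increase its support, since every transaction that supports the larger item-set also supports the smaller one.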
Admissibility: It is important, especially for graph mining, that a support measure is admissible, that is, that it has the downward closure property (anti-monotonicity): the support of a graph cannot be smaller than the support of its supergraph. Equivalently, the support of a graph cannot be larger than the support of any of its subgraphs.
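Written as a formula, with σ_D(·) denoting a support measure on subgraphs of D (the symbol σ is notation assumed here, not taken from the slides) and g ⊆ G meaning that g is a subgraph of G, the admissibility condition reads:

```latex
\[
  g \subseteq G \;\Longrightarrow\; \sigma_D(g) \ge \sigma_D(G)
\]
```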
Motivation: A significant amount of data in the world is graph-like rather than relational. Graph data is usually represented by one or more large graphs; transaction-like graph datasets are rare. The traditional support definition is not admissible in this setting. Admissible support measures are required for mining graph data and for other tasks.
Instance graph: We observe all the subgraphs of D that are isomorphic to G, called instances. Two instances are considered connected if they have an edge, node, or subgraph in common (depending on the chosen notion of overlap). The graph whose nodes are the instances of G, with an edge between every pair of connected instances, is called the instance graph of G in D.
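As a rough sketch of this construction, the Python snippet below builds an instance graph from a list of instances that is assumed to be already known (finding the instances is a subgraph isomorphism problem and is not shown). It assumes edge overlap as the notion of connection; the function names and the toy path graph are hypothetical.

```python
from itertools import combinations

def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def instance_graph(instances):
    """Build the instance graph under edge overlap.

    `instances` is a list of edge sets, one per instance of G in D.
    Returns an adjacency dict: instance index -> set of overlapping indices.
    """
    adj = {i: set() for i in range(len(instances))}
    for i, j in combinations(range(len(instances)), 2):
        if instances[i] & instances[j]:  # the two instances share an edge
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Hypothetical example: D is the path 1-2-3-4-5 and G is a path with two edges,
# so G has exactly three instances in D.
instances = [
    {edge(1, 2), edge(2, 3)},
    {edge(2, 3), edge(3, 4)},
    {edge(3, 4), edge(4, 5)},
]

print(instance_graph(instances))  # {0: {1}, 1: {0, 2}, 2: {1}}
```

Switching to node overlap would only change the intersection test, for example by comparing the node sets of the two instances instead of their edge sets.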
Instance graph: an example. [Figure: a pattern G, a database graph D, and the instance graph of G in D.]
Intuitive support measures: Just count the instances, or perform some sort of weighted count. [Figure: two examples on a database graph D, one with Count_D(G) = 3 and one with Wcount_D(G) = Count_D(G) / 3 = 1.]
The problem with the intuitive approach is that these measures are not admissible. [Figure: two examples with a pattern G and its subgraph g in D. In the first, Count_D(G) = 3 but Count_D(g) = 1. In the second, Wcount_D(G) = 1 + 1 = 2 but Wcount_D(g) = 1/2 + 1/2 + 1/2 = 3/2.]
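A small Python sketch shows how plain instance counting can violate admissibility. The labeled graph below is a hypothetical example chosen to give the same counts as on the slide; it is not claimed to be the graph in the original figure.

```python
# Hypothetical labeled graph D: one node labeled "a" joined to three nodes labeled "b".
labels = {0: "a", 1: "b", 2: "b", 3: "b"}
edges = [(0, 1), (0, 2), (0, 3)]

# g is a single node labeled "a"; G is an edge joining an "a" node to a "b" node,
# so g is a subgraph of G.
count_g = sum(1 for lab in labels.values() if lab == "a")
count_G = sum(1 for u, v in edges if {labels[u], labels[v]} == {"a", "b"})

print(count_g, count_G)  # 1 3
```

Here the subgraph g has a smaller count than its supergraph G, which is exactly what an admissible measure must rule out.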
What is going on? A counting function can be viewed as acting on the instance graph. A graph g and its supergraph G have different instance graphs I_g and I_G, and I_g is obtained from I_G by a series of graph operations. If a counting function does not decrease under these operations, it is admissible (at least for the specific G and g).
Operations on instance graphs: We narrowed it down to the following three operations on instance graphs: clique contraction, node addition, and edge removal.
Clique contraction: A clique is contracted into a single node. Another node is adjacent to the new node only if it was adjacent to all the nodes in the clique. [Figure: intuition behind the operation, showing instances of G and g.]
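The contraction rule can be sketched on an adjacency-set representation as follows. The function name `contract_clique` and the example graph are hypothetical, and the sketch assumes that "only if" on the slide means "exactly when", that is, an outside node is joined to the new node whenever it was adjacent to every clique member.

```python
def contract_clique(adj, clique, new_node):
    """Contract `clique` into `new_node` in an undirected graph.

    `adj` maps each node to the set of its neighbours. A node outside the
    clique is joined to `new_node` exactly when it was adjacent to every
    clique member (an assumed reading of the rule on the slide).
    """
    clique = set(clique)
    if not clique:
        raise ValueError("clique must be non-empty")
    if new_node in adj and new_node not in clique:
        raise ValueError("new_node already exists in the graph")
    if any(not (clique - {u}) <= adj[u] for u in clique):
        raise ValueError("the given nodes do not form a clique")
    # Outside nodes that were adjacent to all clique members.
    common = set.intersection(*(adj[u] for u in clique)) - clique
    # Drop the clique members and every edge that touched them.
    result = {u: nbrs - clique for u, nbrs in adj.items() if u not in clique}
    result[new_node] = set(common)
    for u in common:
        result[u].add(new_node)
    return result

# Hypothetical instance graph: triangle {0, 1, 2}; node 3 sees the whole
# triangle, node 4 sees only node 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print(contract_clique(adj, {0, 1, 2}, "c"))  # {3: {'c'}, 4: set(), 'c': {3}}
```

Node 4 loses its only edge because it was not adjacent to the whole clique, while node 3 stays attached to the contracted node.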
Node addition: A new node and some edges incident to this node are added. [Figure: intuition behind the operation, showing instances of G and g.]
Edge removal: An edge is removed. [Figure: intuition behind the operation, showing instances of G and g.]
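Node addition and edge removal are simple to express on an adjacency-set representation of an instance graph. The following is a minimal sketch; the function names `add_node` and `remove_edge` and the example path graph are hypothetical.

```python
def add_node(adj, node, neighbours=()):
    """Return a copy of the graph with `node` added and joined to `neighbours`."""
    if node in adj:
        raise ValueError("node already exists in the graph")
    result = {u: set(nbrs) for u, nbrs in adj.items()}
    result[node] = set()
    for u in neighbours:
        if u not in adj:
            raise ValueError("neighbour is not in the graph")
        result[u].add(node)
        result[node].add(u)
    return result

def remove_edge(adj, u, v):
    """Return a copy of the graph with the undirected edge {u, v} removed."""
    if v not in adj[u]:
        raise ValueError("edge is not in the graph")
    result = {w: set(nbrs) for w, nbrs in adj.items()}
    result[u].discard(v)
    result[v].discard(u)
    return result

# Hypothetical instance graph: the path 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(add_node(path, 3, [0]))   # {0: {1, 3}, 1: {0, 2}, 2: {1}, 3: {0}}
print(remove_edge(path, 1, 2))  # {0: {1}, 1: {0}, 2: set()}
```

Both functions return a new graph and leave the input untouched, which makes it easy to compare a counting function before and after an operation.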
The main result: Theorem. A support measure on graphs is admissible if and only if it does not decrease under the following operations on instance graphs: 1. clique contraction, 2. edge removal, 3. node addition.
Sufficiency: To prove sufficiency for these three operations, we need to show that for every graph D and every pair of graphs G and g such that g is a subgraph of G, the instance graph I_g of g can be obtained from the instance graph I_G of G by these operations alone.
Sufficiency: proof outline. The proof is constructive (algorithmic). The main idea is to build a pair of mappings, the first from instances of G to instances of g and the second from instances of g to instances of G. Perform clique contractions and node additions to obtain the vertex set of I_g from the vertex set of I_G. Perform edge removals as necessary.
Necessity: To prove necessity, we need to show that for every graph H and every operation from the above list that produces a graph h from H, there exist a database graph D and a pair of its subgraphs G and g, where g is a subgraph of G, such that H = I_G and h = I_g.
Necessity: proof outline. The proof is constructive. Specific graphs G and g are constructed; for convenience, these graphs are labeled. Intersection types for instances of G and g in D are defined, and D is constructed accordingly.
Necessity: the patterns. [Figure: the labeled patterns G and g, with node labels a, b, c, and d, and with parts named Top, Bottom, Legs, and Arms.]
Necessity: intersection. The following intersection types are allowed in D. Bottom overlap: all legs of two instances overlap. Leg overlap: two instances have exactly one leg in common. Arm overlap: two instances have exactly one arm in common.
Bottom overlap: for clique contraction. [Figure: three instances G1, G2, and G3 of the labeled pattern G.]
Leg overlap: for node addition. [Figure: two instances G1 and G2 of the labeled pattern G.]
Arm overlap: for edge removal. [Figure: two instances G1 and G2 of the labeled pattern G.]
Necessity: proof outline. Use instances of G to construct the database graph D. Prove that no additional instances of G arise from the overlaps. Show that the instance graph of g arises from the instance graph of G by applying the chosen operation.
MIS measure: The MIS measure is the size of a maximum independent set (anti-clique) in the instance graph. It satisfies the necessity conditions (a direct admissibility proof is also available). It has been used in several papers (Han, Kuramochi, etc.). No other admissible support measure has been found to date.
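For small instance graphs, the MIS measure can be computed by brute force. The sketch below is an illustration only: the function name `mis_size` and the example graphs are hypothetical, and the search takes exponential time (finding a maximum independent set is NP-hard in general).

```python
from itertools import combinations

def mis_size(adj):
    """Size of a maximum independent set, found by exhaustive search.

    `adj` maps each node of the instance graph to the set of its neighbours.
    Subsets are tried from largest to smallest, so the first independent
    subset found is a maximum one.
    """
    nodes = list(adj)
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return k
    return 0  # empty graph

# Hypothetical instance graphs.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
triangle_minus_edge = {0: {2}, 1: {2}, 2: {0, 1}}  # edge {0, 1} removed

print(mis_size(triangle))             # 1
print(mis_size(triangle_minus_edge))  # 2
```

In this example the value does not decrease when an edge is removed, in line with the edge removal condition of the theorem.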
MIS: example. [Figure: a pattern G, a database graph D, and the instance graph I_G of G in D, with MIS(I_G) = 1.]
Extensions: The necessary and sufficient conditions can be reformulated for different pattern intersection types (for example, a common node can be considered an intersection).
Open problems and conjectures: Is computing an admissible support measure NP-hard, regardless of the measure chosen? Is every admissible support measure a function of the MIS size? If so, what kind of function?