Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor : James Abello.

Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 zzyye@eden.rutgers.edu Mentor : James Abello

Talk Outline I. From Data to Graphs via Similarity Measure II. Our Research Project Input: REU participants information DIMACS Workshop Data Output: A variety of Graphs III. Main Questions a. Choose good similarity measures b. Visualize and detect “interesting” patterns

Original data records for building REU-Participants graphs

Original Data Records

DIMACS Workshop Abstracts

General Method Step 1: Compute a similarity measure among the data records shown above. Since a record can be viewed as a unweighted/weighted set of attributes we use unweighted/weighted version of an standard metric among finite sets that uses the size of the intersection over the size of the union between two sets

Weighted case Weighted case eat 0.7 shaggy 0.8 brown 0.9 fat 0.75 pet 0.6 hairy 0.85 fat 0.75 pet 0.6 pet fat dog cat

Computation Computation eat 0.7 shaggy 0.8 brown 0.9 fat 0.75 pet 0.6 hairy 0.85 fat 0.75 pet 0.6 pet fat dog cat

General Method Step 1: Compute a similarity measure among the data records shown above. Step 2: Deal with different types of data records respectively.

Computing Edge Weight To deal with different types of information, we partition the attributes into different classes according to their value types and compute a similarity measure for each class and then combine these values using a convex combination Eg. Total Weight= 0.3*Weighted Coeff+0.7*Unweighted Coeff

REU participants example How to calculate the Edge Weight? Unweighted Weighted

REU participants example How about the Vertices' Weight(ball size) We can simply convert these 3 columns to three-digit numbers !!!

General Method Step 1: Compute a similarity measure among the data records shown above. Step 2: Deal with different types of data records respectively. Step 3: Build weighted graph where each record is now treated as a vertex and two vertices are joined by an edge with weight equal to their computed similarity

General Method Step 1: Compute a similarity measure among the data records shown above. Step 2: Deal with different types of data records respectively. Step 3: Build weighted graph where each record is now treated as a vertex and two vertices are joined by an edge with weight equal to their computed similarity Step 4: Visualize the graph use GraphView Software and find interesting clusters

REU Participants Graph

Workshop Abstract Example Read in all workshop abstracts file Delete stop words -> unimportant words Get a count of number of appearances (freqency) of ALL words left in All workshop abstracts Compute Jaccard Coefficient

After-delete file

Dimacs Workshop Abstract Graph

Conclusion We have shown how data set records can be transformed into a weighted graph by using a similarity measure among records This methodology allows us to use powerful graph clustering techniques to analyze and visualize data bases.

References [ 1 ] GraphView system [ 2 ] C Gasperin, P Gamallo, A Agustini, G Lopes, V Lima 2001- Using syntactic contexts for measuring word similarity [3] Resnik, Philip (1999) Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research

Thank you! The end

Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor : James Abello.

Similar presentations

Presentation on theme: "Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor : James Abello."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor : James Abello.

Similar presentations

Presentation on theme: "Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor : James Abello."— Presentation transcript:

Similar presentations

About project

Feedback