DIMACS Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009 Mentor: James Abello.



Talk Outline
I. From Data to Graphs via Similarity Measures
II. Our Research Project
    Input: REU participant information, DIMACS workshop data
    Output: a variety of graphs
III. Main Questions
    a. Choosing good similarity measures
    b. Visualizing and detecting “interesting” patterns

Original data records for building REU-Participants graphs

Original Data Records

DIMACS Workshop Abstracts

General Method
Step 1: Compute a similarity measure among the data records shown above. Since a record can be viewed as an unweighted or weighted set of attributes, we use the unweighted or weighted version of a standard similarity measure on finite sets (the Jaccard coefficient): the size of the intersection of two sets divided by the size of their union.
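The unweighted intersection-over-union measure from Step 1 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `jaccard`, the two sample records, and the convention of returning 0.0 for two empty records are all assumptions made here.

```python
def jaccard(a, b):
    """Unweighted Jaccard coefficient: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed for this sketch: two empty records score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical attribute sets for two records (not taken from the REU data).
record_a = {"graphs", "clustering", "python"}
record_b = {"graphs", "python", "statistics"}

# Two shared attributes out of four distinct attributes overall.
score = jaccard(record_a, record_b)
print(score)
```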

Weighted case (figure): records “dog” and “cat” with weighted attributes eat 0.7, shaggy 0.8, brown 0.9, fat 0.75, pet 0.6 and hairy 0.85, fat 0.75, pet 0.6; shared attributes: pet, fat.

Computation (figure): the same “dog” and “cat” records with weighted attributes eat 0.7, shaggy 0.8, brown 0.9, fat 0.75, pet 0.6 and hairy 0.85, fat 0.75, pet 0.6; shared attributes: pet, fat.
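The slide's own computation did not survive in the transcript, so the sketch below is only one plausible reading. It uses a common weighted generalization of the Jaccard coefficient (sum of per-attribute minimum weights over sum of per-attribute maximum weights), which may or may not be the formula used in the talk. The assignment of attributes to "dog" and "cat" is also an assumption inferred from the order of the labels.

```python
def weighted_jaccard(a, b):
    """Weighted Jaccard: sum of min weights / sum of max weights.

    `a` and `b` map attribute names to non-negative weights; a missing
    attribute counts as weight 0. This formula is an assumption, not
    necessarily the one shown on the original slide.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


# Assumed split of the slide's attributes between the two records.
dog = {"eat": 0.7, "shaggy": 0.8, "brown": 0.9, "fat": 0.75, "pet": 0.6}
cat = {"hairy": 0.85, "fat": 0.75, "pet": 0.6}

# Shared weight (fat + pet) divided by the total weight of all attributes.
weighted_score = weighted_jaccard(dog, cat)
print(round(weighted_score, 4))
```

When every weight is 1, this reduces to the unweighted intersection-over-union measure.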

General Method
Step 1: Compute a similarity measure among the data records shown above.
Step 2: Handle each type of data record separately.

Computing Edge Weight
To deal with different types of information, we partition the attributes into classes according to their value types, compute a similarity measure for each class, and then combine these values using a convex combination.
E.g., Total Weight = 0.3 * Weighted Coeff + 0.7 * Unweighted Coeff
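The convex combination above can be sketched in a few lines. The helper name `combine` and the two sample coefficient values are hypothetical; only the 0.3 and 0.7 coefficients come from the slide's example.

```python
def combine(scores, coefficients):
    """Convex combination of per-class similarity scores.

    The coefficients must be non-negative and sum to 1, so the result
    stays within the range spanned by the individual scores.
    """
    if len(scores) != len(coefficients):
        raise ValueError("need one coefficient per score")
    if any(c < 0 for c in coefficients) or abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    return sum(c * s for c, s in zip(coefficients, scores))


# Hypothetical per-class similarities for one pair of records.
weighted_coeff = 0.5
unweighted_coeff = 0.2

# Total Weight = 0.3 * Weighted Coeff + 0.7 * Unweighted Coeff
total_weight = combine([weighted_coeff, unweighted_coeff], [0.3, 0.7])
print(total_weight)
```

Requiring the coefficients to sum to 1 keeps the combined edge weight on the same 0 to 1 scale as the individual coefficients.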

REU participants example: how do we calculate the edge weight? [Figure labels: Unweighted, Weighted]

REU participants example: what about the vertex weight (ball size)? We can simply convert these 3 columns to three-digit numbers.

General Method
Step 1: Compute a similarity measure among the data records shown above.
Step 2: Handle each type of data record separately.
Step 3: Build a weighted graph in which each record is a vertex and two vertices are joined by an edge whose weight equals their computed similarity.

General Method
Step 1: Compute a similarity measure among the data records shown above.
Step 2: Handle each type of data record separately.
Step 3: Build a weighted graph in which each record is a vertex and two vertices are joined by an edge whose weight equals their computed similarity.
Step 4: Visualize the graph using the GraphView software and find interesting clusters.
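Steps 1 and 3 can be sketched together as below, using the unweighted coefficient only. The record names, the attribute sets, and the `min_weight` cutoff are hypothetical. Step 4 is not shown because the transcript gives no details of the GraphView input format.

```python
from itertools import combinations


def jaccard(a, b):
    """Unweighted Jaccard coefficient of two attribute sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(records, min_weight=0.0):
    """Return {(u, v): weight} for every pair with similarity above min_weight.

    `records` maps a vertex name to its attribute set. `min_weight` is a
    hypothetical cutoff added here so that zero-similarity pairs get no edge.
    """
    edges = {}
    for u, v in combinations(sorted(records), 2):
        w = jaccard(records[u], records[v])
        if w > min_weight:
            edges[(u, v)] = w
    return edges


# Hypothetical records; each key becomes a vertex of the graph.
records = {
    "p1": {"graphs", "python"},
    "p2": {"graphs", "statistics"},
    "p3": {"biology"},
}

edges = build_similarity_graph(records)
for (u, v), w in edges.items():
    print(u, v, round(w, 3))
```

Comparing every pair takes quadratic time in the number of records, which is fine for small data sets like a participant list.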

REU Participants Graph

Workshop Abstract Example
Read in all workshop abstract files.
Delete stop words (unimportant words).
Count the number of appearances (frequency) of all remaining words across all workshop abstracts.
Compute the Jaccard coefficient.
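The abstract-processing steps above can be sketched as follows. The stop-word list, the tokenizer, and the two sample abstracts are all made up for illustration. The sketch also assumes the Jaccard coefficient is taken over the sets of remaining words; the slides do not say whether word frequencies were used as weights.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the list actually used is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def word_counts(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def jaccard(a, b):
    """Unweighted Jaccard coefficient over the distinct words of a and b."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical abstracts standing in for the workshop abstract files.
abstracts = {
    "w1": "Clustering of large graphs and the visualization of clusters",
    "w2": "Visualization of large networks",
}

counts = {name: word_counts(text) for name, text in abstracts.items()}

# Frequency of every remaining word across all abstracts.
total = sum(counts.values(), Counter())

similarity = jaccard(counts["w1"], counts["w2"])
print(total.most_common(3))
print(round(similarity, 3))
```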

File after stop-word deletion

DIMACS Workshop Abstract Graph

Conclusion
We have shown how data set records can be transformed into a weighted graph by using a similarity measure among records. This methodology allows us to use powerful graph clustering techniques to analyze and visualize databases.

References
[1] GraphView system.
[2] C. Gasperin, P. Gamallo, A. Agustini, G. Lopes, V. Lima. Using syntactic contexts for measuring word similarity.
[3] P. Resnik (1999). Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research.

Thank you! The end