Algorithms for Data Mining and Querying with Graphs

Investigators: Padhraic Smyth, Sharad Mehrotra
University of California, Irvine
Students: Joshua O'Madadhain, Dawit Seid, Jon Hutchins

JUNG: Java Universal Network/Graph Framework

- extensible, open source software library (API) for graph/network modeling, analysis, and visualization
  - can decorate graphs, vertices, edges with any JUNG object
  - complex filtering/transformation/subset management
  - includes library of network and graph algorithms
    - clustering, centrality, importance, paths, flows, etc.
  - includes visualization API, or can use other visualization APIs (e.g. prefuse)
  - supports graphs, hypergraphs, parallel edges, mixed-mode graphs, k-partite graphs
- active user/developer community
  - 30,000 downloads, 1.3 million page visits
  - ranked #60 out of 100k SourceForge projects
  - used in social network analysis, games, trust metrics, upcoming version of HP Zoomgraph, email visualization, and Netsight

Example of software built using JUNG: Netsight, an interactive graph visualization and analysis tool

JUNG software is publicly available at http://jung.sourceforge.net

Link Prediction Algorithms

We have developed a general predictive learning approach that uses historical graph data to learn a predictive model of whether a link is likely to exist between any pair of nodes A and B in a future time period. The prediction model draws on both structural graph features around A and B and individual node attributes for A and B. For example, for co-author graphs, features can include the distance between A and B in the co-author graph, properties of A's and B's graph neighborhoods, and topic models in the form of probability distributions characterizing A's and B's research interests.

GAAL: A General-Purpose Graph Query Language
We have developed a new query language called GAAL that allows users to express complex relational queries on attributed graphs. It supports queries on graph properties and aggregation operations, and is intended to scale to very large graphs. In 2005 we extended this approach to provide an algebraic framework for spatio-temporal analysis of semantic graphs.

[Figure: Multi-relational (attributed graph) representation of entity/event data. Labels: Institute, WorksIn, Researcher, Write, Paper, Cite]

[Figure: Architecture. Components: Graph Schema and Other Metadata; Graph Schema Definition Interface; Visualization/Analysis Applications (NetSight); GAAL Language (Graph Querying Algebra); DBMS-Specific Adapters; Extensible ORDBMS]

[Figures: Query Example; Algebraic Framework]

Algorithms for Ranking Nodes in Dynamic Networks

We have developed a novel algorithmic approach to the problem of determining the importance of nodes in a network where the links occur over time, e.g., an email network or a co-author network. The concept is similar to centrality measures in social network analysis and to HITS and PageRank for Web page ranking, but it produces a "dynamic rank": the rank of each node varies over time as the node receives messages in the network.

Email Rankings and Organizational Structure

Data: Corporate Email History (1 million emails, 21 months, 628 individuals)

[Figure: Example of Rankings over Time]

Results on KDD Challenge/Biobase Data

This prediction competition in 2005 evaluated different approaches for link prediction. The specific problem was to predict new collaborations among 300,000 medical researchers in 2002, based on co-author relations in 128,000 papers published from 1998 to 2001. The accompanying figure shows the "lift curve": the ratio of the number of true new collaborations found by our models' rankings to the number found by a random ranking. In the top 50 predictions, for example, our models predict between 40 and 45 true collaborations (versus about 3 for a random ranking).
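The lift figure quoted for the KDD Challenge/Biobase results can be made concrete with a short calculation. The Python sketch below is illustrative only and is not the evaluation code used for the competition: the function name `lift_at_k`, its parameters, and the toy author pairs are all assumptions introduced here. It counts how many of the top-k ranked pairs are true new links and divides that by the number a random ranking would be expected to find. Applied to the reported numbers, 40 to 45 true collaborations against about 3 for a random ranking corresponds to a lift of roughly 13 to 15 in the top 50.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def lift_at_k(ranked_pairs: Sequence[Pair],
              true_pairs: Iterable[Pair],
              num_candidates: int,
              k: int) -> Tuple[int, float, float]:
    """Return (hits, expected_random_hits, lift) for the top-k predictions.

    ranked_pairs   -- candidate links, best prediction first (hypothetical input)
    true_pairs     -- links that actually appeared in the future period
    num_candidates -- size of the full candidate pool the ranking was drawn from
    k              -- number of top predictions to evaluate
    """
    if not 0 < k <= len(ranked_pairs):
        raise ValueError("k must be between 1 and len(ranked_pairs)")
    # Treat links as undirected: (a, b) and (b, a) are the same pair.
    truth = {frozenset(p) for p in true_pairs}
    if not truth or num_candidates <= 0:
        raise ValueError("need at least one true pair and a positive candidate count")
    hits = sum(1 for p in ranked_pairs[:k] if frozenset(p) in truth)
    # A random ranking finds true links at the base rate len(truth) / num_candidates.
    expected_random = k * len(truth) / num_candidates
    return hits, expected_random, hits / expected_random


# Toy data, invented purely for illustration (not from the Biobase data set).
ranked = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
truth = [("b", "a"), ("c", "d"), ("e", "f")]
hits, expected, lift = lift_at_k(ranked, truth, num_candidates=10, k=4)
print(hits, expected, round(lift, 2))  # prints: 2 1.2 1.67
```

Evaluating `lift_at_k` for a range of k values and plotting the results gives a lift curve of the kind the poster describes.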
