A Visualization Model Based on Adjacency Data by Edward Condon Bruce Golden S. Lele S. Raghavan Edward Wasil Presented at Miami Beach INFORMS Conference.

Slides:



Advertisements
Similar presentations
Clustering Art & Learning the Semantics of Words and Pictures Manigantan Sethuraman.
Advertisements

Lecture 5 Graph Theory. Graphs Graphs are the most useful model with computer science such as logical design, formal languages, communication network,
Lecture 6; The Finite Element Method 1-dimensional spring systems (modified ) 1 Lecture 6; The Finite Element Method 1-dimensional spring systems.
Clustering Categorical Data The Case of Quran Verses
8.3 Representing Relations Connection Matrices Let R be a relation from A = {a 1, a 2,..., a m } to B = {b 1, b 2,..., b n }. Definition: A n m  n connection.
Introduction to Graph “theory”
Reachability as Transitive Closure
Graphs Chapter 12. Chapter Objectives  To become familiar with graph terminology and the different types of graphs  To study a Graph ADT and different.
Distributed Breadth-First Search with 2-D Partitioning Edmond Chow, Keith Henderson, Andy Yoo Lawrence Livermore National Laboratory LLNL Technical report.
STUDYING COLLEGE TEXTBOOKS AND INTERPRETING VIAUAL AND GRAPHIC AIDS
Copyright © 2010 Pearson Education InternationalChapter Designing Visual Communication.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Applied Discrete Mathematics Week 12: Trees
CS447/ECE453/SE465 Prof. Alencar University of Waterloo 1 CS447/ECE453/SE465 Software Testing Tutorial Winter 2008 Based on the tutorials by Prof. Kontogiannis,
Neural Networks for Optimization William J. Wolfe California State University Channel Islands.
Neural Networks for Optimization Bill Wolfe California State University Channel Islands.
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved Graphs.
Localization from Mere Connectivity Yi Shang (University of Missouri - Columbia); Wheeler Ruml (Palo Alto Research Center ); Ying Zhang; Markus Fromherz.
CS 261 – Data Structures Graphs. Used in a variety of applications and algorithms Graphs represent relationships or connections Superset of trees (i.e.,
Introduction to Graphs
Language Arts Technology Resources How can it help you?
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Insight Through Computing 18. Two-Dimensional Arrays Set-Up Rows and Columns Subscripting Operations Examples.
Three Algorithms for Nonlinear Dimensionality Reduction Haixuan Yang Group Meeting Jan. 011, 2005.
Hypercubes and Neural Networks bill wolfe 10/23/2005.
What is GIS Geographic Information Systems –System capable of capturing, storing, analyzing, and displaying geographical data identified according to.
MR2300: MARKETING RESEARCH PAUL TILLEY Unit 10: Basic Data Analysis.
 Limit Cycle Isolated closed trajectories Analysis is, in general, quite difficult. For nonlinear systems, two simple results exist. Limit Cycle
A Genetic Algorithm-Based Approach for Building Accurate Decision Trees by Z. Fu, Fannie Mae Bruce Golden, University of Maryland S. Lele, University of.
Matrix Completion Problems for Various Classes of P-Matrices Leslie Hogben Department of Mathematics, Iowa State University, Ames, IA 50011
CS 376b Introduction to Computer Vision 02 / 22 / 2008 Instructor: Michael Eckmann.
“Topological Index Calculator” A JavaScript application to introduce quantitative structure-property relationships (QSPR) in undergraduate organic chemistry.
Clustering Spatial Data Using Random Walk David Harel and Yehuda Koren KDD 2001.
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved Graphs.
1 Data models Vector data model Raster data model.
© Prentice Hall, 2007 Business Communication Essentials, 3eChapter Writing and Completing Reports and Proposals.
Unaddition (Subtraction)
IS201 Agenda: 10/15/2013 Do form and report exercise. Identify general guidelines for form and report design. Discuss a few key points about reports in.
Based on slides by Y. Peng University of Maryland
© Prentice Hall, 2008 Excellence in Business Communication, 8eChapter Writing Business Reports and Proposals.
A genetic approach to the automatic clustering problem Author : Lin Yu Tseng Shiueng Bien Yang Graduate : Chien-Ming Hsiao.
1 1 © 2003 Thomson  /South-Western Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
A Computational Study of Three Demon Algorithm Variants for Solving the TSP Bala Chandran, University of Maryland Bruce Golden, University of Maryland.
Some Examples of Visualization in Data Mining by Bruce Golden R. H. Smith School of Business University of Maryland Presented at CORS 2002, Toronto, June.
Matrix Completion Problems for Various Classes of P-Matrices Leslie Hogben Department of Mathematics, Iowa State University, Ames, IA 50011
COSC 2007 Data Structures II Chapter 14 Graphs I.
Graphs & Matrices Todd Cromedy & Bruce Nicometo March 30, 2004.
© 2009 Ilya O. Ryzhov 1 © 2008 Warren B. Powell 1. Optimal Learning On A Graph INFORMS Annual Meeting October 11, 2009 Ilya O. Ryzhov Warren Powell Princeton.
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
Graphs A graphs is an abstract representation of a set of objects, called vertices or nodes, where some pairs of the objects are connected by links, called.
Data Mining Course 2007 Eric Postma Clustering. Overview Three approaches to clustering 1.Minimization of reconstruction error PCA, nlPCA, k-means clustering.
Graphs Chapter 12. Chapter 12: Graphs2 Chapter Objectives To become familiar with graph terminology and the different types of graphs To study a Graph.
Graph Theory. undirected graph node: a, b, c, d, e, f edge: (a, b), (a, c), (b, c), (b, e), (c, d), (c, f), (d, e), (d, f), (e, f) subgraph.
Example 10.4 Measuring the Effects of Traditional and New Styles of Soft-Drink Cans Hypothesis Test for Other Parameters.
© Prentice Hall, 2007 Excellence in Business Communication, 7eChapter Writing Reports and Proposals.
Basic Graph Algorithms Programming Puzzles and Competitions CIS 4900 / 5920 Spring 2009.
Copyright © Curt Hill Graphs Definitions and Implementations.
Chapter 9: Graphs.
CSC321: Lecture 25: Non-linear dimensionality reduction Geoffrey Hinton.
3/13/2016 Data Mining 1 Lecture 2-1 Data Exploration: Understanding Data Phayung Meesad, Ph.D. King Mongkut’s University of Technology North Bangkok (KMUTNB)
CSC321: Extra Lecture (not on the exam) Non-linear dimensionality reduction Geoffrey Hinton.
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
Structural Properties of Networks: Introduction
Structural Properties of Networks: Introduction
Comp 245 Data Structures Graphs.
Network Science: A Short Introduction i3 Workshop
Centrality in Social Networks
Party-by-Night Problem
Graphs, Trees and Algorithms 1
TECHNICAL REPORTS WRITING
Presentation transcript:

A Visualization Model Based on Adjacency Data by Edward Condon Bruce Golden S. Lele S. Raghavan Edward Wasil Presented at Miami Beach INFORMS Conference – Nov. 2001

Focus of Paper The focus of this paper will be on a visualization project based on adjacency data (Fiske data) The paper illustrates the power of visualization Visualization generates insights and impact 1

Motivation Typically, data are provided in multidimensional format A large table where the rows represent countries and the columns represent socio-economic variables Alternatively, data may be provided in adjacency format Consumers who buy item a are likely to buy or consider buying items b, c, and d also Students who apply to college a are likely to apply to colleges b, c, and d also 2

More on adjacency If the purchase of item i results in the recommendation of item j, then item j is adjacent to item i Adjacency data for n alternatives can be summarized in an n x n adjacency matrix, A = (a ij ), where 1 if item j is adjacent to item i, and 0 otherwise Adjacency is not necessarily symmetric Motivation -- continued 3

Adjacency indicates a notion of similarity Given adjacency data w.r.t. n items or alternatives, can we display the items in a two-dimensional map? Traditional tools such as multidimensional scaling and Sammon maps work well with data in multidimensional format Can these tools work well with adjacency data? 4

Sammon Map of World Poverty Data Set (World Bank, 1992) 5

Obtaining Distances from Adjacency Data How can we use linkage information to determine distances ? 6 a b c d e items adjacent to a items adjacent to b items adjacent to c items adjacent to d

Obtaining Distances from Adjacency Data -- continued 1.Start with the n x n 0-1 asymmetric adjacency matrix 2.Convert the adjacency matrix to a directed graph Create a node for each item (n nodes) Create a directed arc from node i to node j if a ij = 1 3.Compute distance measures Each arc has a length of 1 Compute the all-pairs shortest path distance matrix D The distance from node i to node j is d ij 7

4.Modify the distance matrix D, to obtain a final distance matrix X Symmetry Disconnected components Example 1 Obtaining Distances from Adjacency Data -- continued A =

Find shortest paths between all pairs of nodes to obtain D Average d ij and d ji to arrive at a symmetric distance matrix X Example 1 -- continued 9

A and B are strongly connected components The graph below is weakly connected There are paths from A to B, but none from B to A MDS and Sammon maps require that distances be finite Example 2 10

Basic idea: simply replace all infinite distances with a large finite value, say R If R is too large The points within each strongly connected component will be pushed together in the map Within-component relationships will be difficult to see If R is too small Distinct components (e.g., A and B) may blend together in the map Ensuring Finite and Symmetric Distances 11

R must be chosen carefully (see Technical Report) This leads to a finite distance matrix D Next, we obtain the final distance matrix X where X becomes input to a Sammon map or MDS procedure Ensuring Finite and Symmetric Distances -- continued 12

Data source: The Fiske Guide to Colleges, 2000 edition Contains information on 300 colleges Approx. 750 pages Loaded with statistics and ratings For each school, its biggest overlaps are listed Overlaps: “the colleges and universities to which its applicants are also applying in greatest numbers and which thus represent its major competitors” Application: College Selection 13

Penn’s overlaps are Harvard, Princeton, Yale, Cornell, and Brown Harvard’s overlaps are Princeton, Yale, Stanford, M.I.T., and Brown Note the lack of symmetry Harvard is adjacent to Penn, but not vice versa Overlaps and the Adjacency Matrix 14

Proof of Concept Start with 300 colleges and the associated adjacency matrix From the directed graph, several strongly connected components emerge We focus on the four largest to test the concept (100 schools) Component A has 74 schools Component B has 11 southern colleges Component C has 8 mainly Ivy League colleges Component D has 7 California universities 15

Sammon Map with Each School Labeled by its Component Identifier 16

Sammon Map with Each School Labeled by its Geographical Location 17

Sammon Map with Each School Labeled by its Designation ( Public (U) or Private (R) ) 18

Sammon Map with Each School Labeled by its Cost 19

Sammon Map with Each School Labeled by its Academic Quality 20

Six Panels Showing Zoomed Views of Schools that are Neighbors of Tufts University 21 (a) Identifier (f) School name(e) Academics (d) Cost(c) Public or private (b) State

Benefits of Visualization Adjacency (overlap) data provides “local” information only E.g., which schools are Maryland’s overlaps ? With visualization, “global” information is more easily conveyed E.g., which schools are similar to Maryland ? 22

Benefits of Visualization -- continued Within group (strongly connected component) and between group relationships are displayed at same time A variety of what-if questions can be asked and answered using maps Based on this concept, a web-based DSS for college selection is easy to envision 23

Conclusions The approach represents a nice application of shortest paths to data visualization The resulting maps convey more information than is immediately available in The Fiske Guide Visualization encourages what-if analysis of the data Can be applied in other settings (e.g., web-based recommender systems) 24