A Brief Overview on Some Recent Study of Graph Data Yunkai Liu, Ph. D., Gannon University.



Outline
Graph databases vs. traditional databases
– Data structure
– Some frequently used measurements
– Overview of graph databases
Graph data in social networks
– Case study
Graph data in biology
– Case study
Graph data in other areas

What is special about graph data in applications
Basic data structure
– G = (N, E), a set of nodes N and a set of edges E; edges are sometimes also called links
Some differences and limitations
– Graphs are usually directed
– Nodes carry a large number of attribute categories
– Edges carry a limited number of attribute categories
– Adjacency matrices are rarely used; hash tables and indices are widely used instead
Example
– The social network (SN) among us
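
The storage choices above (directed edges, attribute-rich nodes, hash tables instead of an adjacency matrix) can be sketched roughly as follows. This is a minimal illustration, not the layout of any particular graph database; the class name `Graph`, its methods, and the sample people and attributes are all hypothetical.

```python
from collections import defaultdict


class Graph:
    """Directed graph G = (N, E) kept in hash tables, not an adjacency matrix."""

    def __init__(self):
        self.node_attrs = {}                # node -> dict of node attributes
        self.out_edges = defaultdict(dict)  # source -> {target: edge attributes}
        self.in_edges = defaultdict(set)    # target -> sources (reverse index)

    def add_node(self, node, **attrs):
        # Create the node if needed, then merge in any new attributes.
        self.node_attrs.setdefault(node, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        self.add_node(source)
        self.add_node(target)
        self.out_edges[source][target] = attrs
        self.in_edges[target].add(source)

    def successors(self, node):
        return set(self.out_edges.get(node, {}))

    def predecessors(self, node):
        return set(self.in_edges.get(node, set()))


# Hypothetical toy social network: many node attributes, few edge attributes.
g = Graph()
g.add_node("alice", role="organizer", field="databases")
g.add_edge("alice", "bob", kind="colleague")
g.add_edge("alice", "carol", kind="colleague")
g.add_edge("bob", "carol", kind="friend")
print(g.successors("alice"), g.predecessors("carol"))
```

Lookups by node cost roughly constant time here, while memory grows with the number of edges rather than with the square of the number of nodes, which is the usual reason for preferring this layout on sparse graphs.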

Some frequently addressed graph properties
Homophily is the tendency to relate to people with similar characteristics (status, beliefs, etc.)
– It leads to the formation of homogeneous groups (clusters) where forming relations is easier
– Extreme homogenization can act counter to innovation and idea generation (heterophily is thus desirable in some contexts)
– Homophilous ties can be strong or weak

Some frequently addressed graph properties
Transitivity is a property of ties: if there is a tie between A and B and one between B and C, then in a transitive network A and C will also be connected
– Strong ties are more often transitive than weak ties; transitivity is therefore evidence for the existence of strong ties (but not a necessary or sufficient condition)
– Transitivity and homophily together lead to the formation of cliques (fully connected clusters)
– Open question: how should a reasonable degree of transitivity be chosen in graph models?
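
One common way to put a number on transitivity is the share of connected triples A–B–C whose end points A and C are also linked. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the function name and the toy graph are illustrative only.

```python
from itertools import combinations


def transitivity(adj):
    """Fraction of connected triples (a - b - c) that are closed by an a - c tie."""
    closed = 0
    triples = 0
    for center, neighbors in adj.items():
        # Every pair of neighbors of `center` forms one triple through it.
        for a, c in combinations(sorted(neighbors), 2):
            triples += 1
            if c in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Triangle A-B-C with one extra node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(transitivity(adj))
```

A value of 1 would mean every triple is closed, as in a clique; a tree gives 0.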

Some frequently addressed graph properties
Bridges are nodes and edges that connect across groups
– They facilitate inter-group communication, increase social cohesion, and help spur innovation
– They are usually weak ties, but not every weak tie is a bridge

Some frequently addressed graph properties – Degree centrality
A node’s in-degree or out-degree is the number of links that lead into or out of the node
In an undirected graph the two are identical
Often used as a measure of a node’s degree of connectedness and hence also of its influence and/or popularity
Useful in assessing which nodes are central with respect to spreading information and influencing others in their immediate ‘neighborhood’
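
In-degree and out-degree can be counted in one pass over an edge list. The following is a small sketch that assumes edges are given as (source, target) pairs; the function name and sample edges are made up for illustration.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return in_degree, out_degree


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
in_degree, out_degree = degrees(edges)
print(dict(in_degree), dict(out_degree))
```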

Some frequently addressed graph properties – Paths
A path between two nodes is any sequence of non-repeating nodes that connects the two nodes
The shortest path between two nodes is the path that connects them with the fewest edges (its length is also called the distance between the nodes)
– All shortest paths
– K-th shortest path
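
For unweighted graphs, a shortest path can be found with breadth-first search. The sketch below assumes the graph is a dictionary mapping each node to a list of neighbors; it returns one shortest path, not all of them, and the names are illustrative.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
path = shortest_path(adj, "a", "e")
print(path, "distance:", len(path) - 1)
```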

Some frequently addressed graph properties – Betweenness centrality
The number of shortest paths that pass through a node divided by all shortest paths in the network
Sometimes normalized such that the highest value is 1
Shows which nodes are more likely to be in communication paths between other nodes
Also useful in determining points where the network would break apart
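
A sketch of betweenness for a small unweighted, undirected graph follows. It uses Brandes' accumulation scheme, which is a substitution for the plain ratio stated on the slide: for every pair of other nodes it adds the fraction of that pair's shortest paths running through the node. The function name and the four-node path graph are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends in an undirected graph.
    return {v: value / 2 for v, value in score.items()}


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(adj)
peak = max(scores.values())
normalized = {v: (value / peak if peak else 0.0) for v, value in scores.items()}
print(scores, normalized)
```

On the path a–b–c–d the two inner nodes carry all traffic between the others, so removing either one splits the network.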

Some frequently addressed graph properties – Closeness centrality
The mean length of all shortest paths from a node to all other nodes in the network (i.e. how many hops on average it takes to reach every other node)
It is a measure of reach, i.e. how long it will take to reach other nodes from a given starting node
Useful in cases where speed of information dissemination is the main concern
Lower values are better when higher speed is desirable
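
The mean hop count from one node to all others can be computed with a single breadth-first search. The sketch below follows the slide's definition, so lower is better; some libraries report the reciprocal instead, so that higher means more central. It assumes a connected, unweighted graph, and the names are illustrative.

```python
from collections import deque


def mean_distance(adj, source):
    """Average number of hops from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


# Star graph: a hub connected to three leaves.
adj = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(mean_distance(adj, "hub"), mean_distance(adj, "x"))
```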

Some frequently addressed graph properties – Eigenvector centrality
A node’s eigenvector centrality is proportional to the sum of the eigenvector centralities of all nodes directly connected to it
In other words, a node with a high eigenvector centrality is connected to other nodes with high eigenvector centrality
This is similar to how Google ranks web pages: links from highly linked-to pages count more
Useful in determining who is connected to the most connected nodes
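
Eigenvector centrality can be approximated by power iteration: repeatedly replace each node's score with the sum of its neighbors' scores and rescale. The sketch below assumes a connected undirected graph and adds each node's own score at every step (iterating with A + I instead of A), which leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs. The function name, iteration count, and toy graph are assumptions.

```python
import math


def eigenvector_centrality(adj, iterations=200):
    """Approximate eigenvector centrality by power iteration on A + I."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score: own score plus the scores of all neighbors.
        updated = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(value * value for value in updated.values()))
        score = {v: value / norm for v, value in updated.items()}
    return score


# Triangle a-b-c with one extra node d attached to c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
scores = eigenvector_centrality(adj)
print(sorted(scores, key=scores.get, reverse=True))
```

Node d has only one link, but that link goes to the best-connected node, which is why its score is not negligible.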

Other measurements
Reciprocity (degree of)
– The ratio of the number of relations which are reciprocated (i.e. there is an edge in both directions) over the total number of relations in the network
– A useful indicator of the degree of mutuality and reciprocal exchange in a network, which relates to social cohesion
– Only makes sense in directed graphs
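
The reciprocity ratio can be sketched directly from a directed edge list. The version below counts each direction of a mutual tie as one reciprocated edge and ignores self-loops when testing for reciprocation; both conventions are assumptions, and tools differ on them.

```python
def reciprocity(edges):
    """Share of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, t in edge_set if s != t and (t, s) in edge_set)
    return mutual / len(edge_set)


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(edges))
```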

Other measurements
Density
– A network’s density is the ratio of the number of edges in the network over the total number of possible edges between all pairs of nodes (which is n(n-1)/2 for an undirected graph, where n is the number of vertices)
– It is a common measure of how well connected a network is (in other words, how closely knit it is); a perfectly connected network is called a clique and has density = 1
– A directed graph with the same number of edges will have half the density of its undirected equivalent, because there are twice as many possible edges, i.e. n(n-1)
– Density is useful in comparing networks against each other, or in doing the same for different regions within a single network
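
The two density formulas can be written as one small function. This is a sketch with hypothetical names; it assumes a simple graph without self-loops or duplicate edges.

```python
def density(num_nodes, num_edges, directed=False):
    """Edges present divided by edges possible."""
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2                      # unordered pairs
    return num_edges / possible if possible else 0.0


# Same node and edge counts, read once as undirected and once as directed.
print(density(4, 3), density(4, 3, directed=True))
```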

Other measurements
Clustering
– A node’s clustering coefficient is the density of its neighborhood (i.e. the network consisting only of this node and all other nodes directly connected to it)
– The clustering coefficient for an entire network is the average of all coefficients for its nodes
– Clustering is indicative of the presence of different (sub-)communities in a network
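
A sketch of the clustering coefficient follows. It uses the common formulation in which only the ties among a node's neighbors are counted (the node itself is left out), which may differ slightly from the neighborhood density described above. Nodes with fewer than two neighbors are given a coefficient of 0 by assumption.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = sorted(adj[node])
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)


def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Triangle a-b-c with one extra node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(local_clustering(adj, "c"), average_clustering(adj))
```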

Other measurements
Average and longest distance
– The longest shortest path (distance) between any two nodes in a network is called the network’s diameter
– It also indicates how long it will take at most to reach any node in the network (sparser networks will generally have greater diameters)
– The average of all shortest paths in a network is also interesting because it indicates how far apart any two nodes will be on average (average distance)
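
Diameter and average distance both fall out of running a breadth-first search from every node. The sketch below assumes a connected, unweighted graph given as a dictionary of neighbor lists; unreachable pairs are simply skipped, which is a simplification.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_and_average(adj):
    """Return (longest shortest path, mean shortest path) over all node pairs."""
    lengths = [
        d
        for source in adj
        for target, d in bfs_distances(adj, source).items()
        if target != source
    ]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(diameter_and_average(adj))
```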

What is a graph database?
Graph databases date back to the 1970s
They have been growing fast recently due to advances in computing technology
– Some graph databases (GD) claim that they can represent millions of nodes and billions of edges
Graph databases are one family of NoSQL databases

Social Network Analysis (SNA)
News
– In early 2013, Facebook announced its new “graph search” app
Major questions
– Networks: how to represent various social networks
– Tie strength: how to identify strong/weak ties in the network
– Key players: how to identify key/central nodes in the network
– Cohesion: how to characterize a network’s structure
Major applications
– Social studies
– National security
– Micro-advertisement
– …

Some of my projects
Meth-Hunter
Graph Data Management system
Graph Data warehouse protocol

NodeXL - s

NodeXL - Facebook

Graph Metric                                  Value
Graph Type                                    Undirected
Vertices                                      67
Unique Edges                                  165
Edges With Duplicates                         0
Total Edges                                   165
Self-Loops                                    0
Reciprocated Vertex Pair Ratio                Not Applicable
Reciprocated Edge Ratio                       Not Applicable
Connected Components                          8
Single-Vertex Connected Components            0
Maximum Vertices in a Connected Component     29
Maximum Edges in a Connected Component        102
Maximum Geodesic Distance (Diameter)          4
Average Geodesic Distance
Graph Density
Modularity
NodeXL Version

Graph Data in Biology
Multiple classes of bionetwork models exist, such as metabolic, protein-gene, or protein-protein interaction networks
– Metabolic networks have metabolites as nodes and, as edges, the enzymes that facilitate a specific reaction within the body or in nature.
– Protein-gene interactions involve understanding and mapping gene expression.
– As with metabolic and gene expression networks, protein-protein interaction networks have proteins as nodes

Graph Data in Biology
The structure of a bionetwork is important for understanding nature
The analysis is similar to SNA
– Clique finding is important, and it may be related to tumors.

One case study – bionetwork alignment
Two previous models are Graemlin (General and robust alignment of multiple large interaction networks) and PHUNKEE (Pairing subgrapHs Using NetworK Environment Equivalence)
– Whereas Graemlin considers the entire network spectrum, the PHUNKEE algorithm considers only the most conserved portions between two graphs

One case study – bionetwork alignment
Graemlin has the advantage that it can align multiple networks quickly; however, all nodes and edges are considered whether or not they are similar to each other.
In contrast, PHUNKEE considers only the most conserved portions of two graphs, taking into account that insertions and deletions may occur over time. However, the algorithm is slow because it works in a step-by-step manner.

One case study – bionetwork alignment
We realized that one method is not enough to determine the relationship between two graphs, because of various factors in the data. Thus, we created a comprehensive package for pairwise graph comparison.
– The package includes two interfaces: one for global alignment and another for local alignment.
– The transitivity property is also considered, to handle missing nodes or missing edges.

The bionetworks of four species in our experiment.
Species (columns): Rattus norvegicus, Mus musculus, Saccharomyces cerevisiae, Homo sapiens
Measurements (rows): Number of Nodes, Number of Edges

The comparisons between three species and Homo sapiens.

                                                      Rattus norvegicus   Mus musculus      Saccharomyces cerevisiae
                                                      vs Homo sapiens     vs Homo sapiens   vs Homo sapiens
Number of Shared Nodes                                1124 (92.74%)       2928 (91.10%)     537 (10.94%)
Number of Shared Edges                                23233 (9.61%)       17422 (5.07%)     1308 (0.34%)
Inner Global Similarity
Outer Global Similarity
Left Global Similarity Biased on the Three Species
Left Global Similarity Biased on Homo sapiens
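
Shared-node and shared-edge counts of this kind can be sketched as set intersections. The example below is only an illustration under strong assumptions: nodes in the two networks are matched by identical labels (real cross-species data would need a mapping between species, which is not shown), edges are undirected, and percentages are taken relative to the first network. None of these choices, nor the function names and toy data, come from the presentation, and the similarity scores in the table are not reproduced here.

```python
def shared_counts(nodes_a, edges_a, nodes_b, edges_b):
    """Count nodes and undirected edges that appear in both networks."""
    def undirected(edges):
        return {frozenset(edge) for edge in edges}

    shared_nodes = set(nodes_a) & set(nodes_b)
    shared_edges = undirected(edges_a) & undirected(edges_b)
    return len(shared_nodes), len(shared_edges)


def percent(part, whole):
    return 100.0 * part / whole if whole else 0.0


# Hypothetical toy networks; labels p1..p5 stand in for matched proteins.
nodes_a = {"p1", "p2", "p3", "p4"}
edges_a = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
nodes_b = {"p1", "p2", "p3", "p5"}
edges_b = [("p2", "p1"), ("p3", "p5")]

n_shared, e_shared = shared_counts(nodes_a, edges_a, nodes_b, edges_b)
print(n_shared, percent(n_shared, len(nodes_a)))
print(e_shared, percent(e_shared, len(edges_a)))
```

The choice of denominator changes the percentages considerably when the two networks differ in size.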

A Cladogram for Rattus norvegicus, Mus musculus and Saccharomyces cerevisiae

Some weird parts
The normalization of the data is a big challenge.
It is easy to reach a wrong conclusion, for example that yeast is closer to human than mouse is.
This is just one example of graph mining in bioinformatics.

Other areas of graph data
GIS
Financial / business
– Public spending
Gaming
Some challenges of GD in CS
– Cloud apps and cloud computing
– Visualization
– Integrating with other databases