Extracting hidden information from knowledge networks Sergei Maslov Brookhaven National Laboratory, New York, USA.

Hanse Institute for Advanced Study, March 2002
Outline of the talk
- What is a knowledge network and how is it different from an ordinary graph or network?
- Knowledge networks on the internet: matching products to customers
- Knowledge networks in biology: large ensembles of interacting biomolecules
- Empirical study of correlations in the network of interacting proteins
Collaborators: Y-C. Zhang and K. Sneppen

Networks in complex systems
- A network is the backbone of a complex system
- It answers the question: who interacts with whom?
- Examples:
  - Internet and WWW
  - Interacting biomolecules (metabolic, physical, regulatory)
  - Food webs in ecosystems
  - Economics: customers and products
  - Social: people and their choice of partners

Predicting tastes of customers based on their opinions on products
- Each of us has personal tastes
- These tastes are sometimes unknown even to ourselves (hidden wants)
- Information about them is contained in our opinions on products
- Matchmaking: opinions of customers with similar tastes can be used to predict future opinions
- The internet allows this to be done on a large scale

Types of networks
[Figure: a plain network of readers and books, next to a knowledge (opinion) network in which readers carry tastes, books carry features, and edges carry opinions.]

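Storing opinions
Matrix of opinions $\Omega_{IJ}$ (rows and columns: 3 readers, then 4 books; X marks reader-reader and book-book pairs, ? marks an unknown opinion):

X X X 2 9 ? ?
X X X ? 8 ? 8
X X X ? ? 1 ?
2 ? ? X X X X
9 8 ? X X X X
? ? 1 X X X X
? 8 ? X X X X

[Figure: the same data drawn as a network of opinions between readers and books.]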

Using correlations to reconstruct customers' tastes
- Similar opinions $\Rightarrow$ similar tastes
- Simplest model:
  - Readers $\to$ $M$-dimensional vectors of tastes $\mathbf{r}_I$
  - Books $\to$ $M$-dimensional vectors of features $\mathbf{b}_J$
  - Opinions $\to$ scalar products: $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$
[Figure: network of customers and books.]
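
The scalar-product model can be made concrete with a few lines of NumPy. This is only a minimal sketch: the function name `make_opinions`, the Gaussian choice for the hidden vectors, and the sizes are assumptions made for illustration, not part of the talk.

```python
import numpy as np

def make_opinions(n_readers, n_books, m, seed=0):
    """Draw hidden taste/feature vectors and return the full opinion matrix."""
    rng = np.random.default_rng(seed)
    readers = rng.standard_normal((n_readers, m))  # row I is the taste vector r_I
    books = rng.standard_normal((n_books, m))      # row J is the feature vector b_J
    opinions = readers @ books.T                   # opinions[I, J] = r_I . b_J
    return readers, books, opinions

readers, books, opinions = make_opinions(n_readers=4, n_books=3, m=2)
print(opinions.shape)  # one opinion per (reader, book) pair
```

Because every opinion is a scalar product of $M$-dimensional vectors, the full reader-by-book matrix has rank at most $M$, which is what makes prediction of missing entries possible at all.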

Loop correlation
[Figure: a loop in the network of customers and books made of $L$ known opinions and one unknown opinion.]
- Predictive power of one such loop: $1/M^{(L-1)/2}$
- One needs many loops to completely freeze the mutual orientation of the vectors

Field theory approach
- If all components of the vectors are Gaussian and uncorrelated, the generating functional is $\det(1 + i\,\ldots)^{-M/2}$
- All irreducible correlations are proportional to $M$
- All loop correlations $= M$
- Since each $\Omega_{IJ} \sim \sqrt{M}$, the sign correlation scales as $M^{-(L-1)/2}$
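
The scaling can be checked by hand for the shortest bipartite loop. The following is a sketch under stated assumptions: unit-variance Gaussian components, a loop of two readers and two books, and three known opinions plus one unknown one ($L = 3$). The index labels are chosen here for illustration.

```latex
\begin{align}
\langle \Omega_{11}\Omega_{21}\Omega_{22}\Omega_{12} \rangle
  &= \sum_{\alpha\beta\gamma\delta}
     \langle r_1^{\alpha} r_1^{\delta} \rangle
     \langle b_1^{\alpha} b_1^{\beta} \rangle
     \langle r_2^{\beta} r_2^{\gamma} \rangle
     \langle b_2^{\gamma} b_2^{\delta} \rangle \\
  &= \sum_{\alpha\beta\gamma\delta}
     \delta_{\alpha\delta}\,\delta_{\alpha\beta}\,\delta_{\beta\gamma}\,\delta_{\gamma\delta}
   = M, \\
\langle \Omega_{IJ}^2 \rangle &= M
  \quad\Rightarrow\quad |\Omega_{IJ}| \sim \sqrt{M}, \\
\frac{\langle \Omega_{11}\Omega_{21}\Omega_{22}\Omega_{12} \rangle}{(\sqrt{M})^{L+1}}
  &= \frac{M}{M^{(L+1)/2}} = M^{-(L-1)/2}
  \;\xrightarrow{\;L=3\;}\; M^{-1}.
\end{align}
```

Each vector appears in exactly two neighbouring opinions of the loop, so the Gaussian average forces all component indices to coincide and leaves a single free sum, which gives the factor $M$.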

Main parameter: density of edges
- The larger the density of edges $p$, the easier the prediction
- At $p_1 \approx 1/N$ (where $N = N_{\mathrm{readers}} + N_{\mathrm{books}}$) macroscopic prediction becomes possible. Nodes are connected but the vectors $\mathbf{r}_I$, $\mathbf{b}_J$ are not fixed: ordinary percolation threshold
- At $p_2 \approx 2M/N > p_1$ all tastes and features ($\mathbf{r}_I$ and $\mathbf{b}_J$) can be uniquely reconstructed: rigidity percolation threshold
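
As a quick numerical illustration of the two thresholds, the sketch below evaluates the approximate formulas $p_1 \approx 1/N$ and $p_2 \approx 2M/N$. The function names, regime labels, and example sizes are hypothetical and chosen only for this example.

```python
def percolation_thresholds(n_readers, n_books, m):
    """Approximate thresholds p1 ~ 1/N and p2 ~ 2M/N with N = N_readers + N_books."""
    n = n_readers + n_books
    p1 = 1.0 / n       # ordinary percolation: nodes become connected
    p2 = 2.0 * m / n   # rigidity percolation: vectors become uniquely fixed
    return p1, p2

def regime(p, n_readers, n_books, m):
    """Classify an edge density p relative to the two thresholds."""
    p1, p2 = percolation_thresholds(n_readers, n_books, m)
    if p < p1:
        return "below p1"
    if p < p2:
        return "between p1 and p2"
    return "above p2"

p1, p2 = percolation_thresholds(n_readers=600, n_books=400, m=5)
print(p1, p2, regime(0.005, 600, 400, 5))
```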

Spectral properties of $\Omega$
- For $M < N$ the matrix $\Omega_{IJ}$ has $N-M$ zero eigenvalues and $M$ positive ones: $\Omega = R \cdot R^{+}$
- Using SVD one can "diagonalize" $R = U \cdot D \cdot V^{+}$ such that the matrices $V$ and $U$ are orthogonal, $V^{+} \cdot V = 1$, $U \cdot U^{+} = 1$, and $D$ is diagonal. Then $\Omega = U \cdot D^2 \cdot U^{+}$
- The amount of information contained in $\Omega$: $NM - M(M-1)/2 \ll N(N-1)/2$, the number of off-diagonal elements
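
These spectral statements are easy to verify numerically. The sketch below assumes a real random matrix $R$ of shape $N \times M$ (so $R^{+}$ is just the transpose) and uses NumPy's thin SVD; the sizes $N = 8$, $M = 3$ are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 3
R = rng.standard_normal((N, M))   # row I is the hidden vector of vertex I
omega = R @ R.T                   # Omega = R R^+ for real R

# Eigenvalues of the symmetric matrix Omega: M positive, N - M (numerically) zero.
eigvals = np.linalg.eigvalsh(omega)
n_positive = int(np.sum(eigvals > 1e-10))

# Thin SVD R = U D V^+ gives Omega = U D^2 U^+.
U, d, Vt = np.linalg.svd(R, full_matrices=False)
omega_from_svd = U @ np.diag(d**2) @ U.T

# Independent parameters versus number of off-diagonal elements.
info = N * M - M * (M - 1) // 2
off_diag = N * (N - 1) // 2
print(n_positive, np.allclose(omega, omega_from_svd), info, off_diag)
```

For these small sizes the two counts are comparable; the strong inequality on the slide only emerges when $N$ is much larger than $M$.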

Practical recursive algorithm for predicting unknown opinions
1. Start with $\Omega_0$, where all unknown elements are filled with a fixed value (zero in our case)
2. Diagonalize and keep only the $M$ largest eigenvalues and their eigenvectors
3. In the resulting truncated matrix $\Omega'_0$ replace all known elements with their exact values to obtain $\Omega_1$, then repeat from step 2
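
The three steps can be sketched in NumPy as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes a symmetric matrix $\Omega = R R^{T}$ (the non-bipartite case), a boolean mask of known entries, a fixed number of iterations, and noise-free data. The names `truncate_rank` and `predict_opinions` are made up for this example.

```python
import numpy as np

def truncate_rank(matrix, m):
    """Keep the m largest eigenvalues (and eigenvectors) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    top = np.argsort(vals)[-m:]                      # indices of the m largest
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T

def predict_opinions(omega_known, known_mask, m, n_iter=200):
    """Iteratively fill in the unknown entries of a rank-m symmetric matrix."""
    current = np.where(known_mask, omega_known, 0.0)            # step 1
    for _ in range(n_iter):
        truncated = truncate_rank(current, m)                   # step 2
        current = np.where(known_mask, omega_known, truncated)  # step 3
    return current

# Synthetic test: density p = 0.5 is far above p2 ~ 2M/N ~ 0.07.
rng = np.random.default_rng(0)
N, M, p = 60, 2, 0.5
R = rng.standard_normal((N, M))
omega_true = R @ R.T
upper = np.triu(rng.random((N, N)) < p, k=1)
known_mask = upper | upper.T                # symmetric mask, diagonal unknown

predicted = predict_opinions(omega_true, known_mask, M)
unknown = ~known_mask
err_start = np.abs(omega_true[unknown]).mean()   # error of the all-zero guess
err_end = np.abs(predicted[unknown] - omega_true[unknown]).mean()
print(err_start, err_end)
```

Known entries are re-imposed after every truncation, so they always keep their exact values; only the unknown entries evolve from one iteration to the next.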

Convergence of the algorithm
- Above $p_2$ the algorithm converges exponentially to the exact values of the unknown elements
- The rate of convergence scales as $(p - p_2)^2$

Reality check: sources of errors
- Customers are not rational! $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J + \epsilon_{IJ}$ (idiosyncrasy)
- Opinions are delivered to the matchmaker through a narrow channel:
  - Binary channel $S_{IJ} = \mathrm{sign}(\Omega_{IJ})$: 1 or 0 (liked or not)
  - Experience rated on a scale of 1 to 5, or 1 to 10 at best
- If the number of edges $K$ and the size $N$ are large while $M$ is small, these errors can be reduced
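
Both error sources can be simulated in a few lines. In this sketch the noise amplitude, the Gaussian form of the idiosyncratic term, and the equal-population binning used for the star ratings are all assumptions made for illustration; the slide only states that such channels exist.

```python
import numpy as np

def binary_channel(omega):
    """Liked (1) or not (0), following the slide's 1-or-0 convention."""
    return (omega > 0).astype(int)

def star_channel(omega, n_levels=5):
    """Map opinions to ratings 1..n_levels using equal-population bins (assumed rule)."""
    edges = np.quantile(omega, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(omega, edges) + 1

rng = np.random.default_rng(2)
r = rng.standard_normal((30, 3))      # reader tastes
b = rng.standard_normal((20, 3))      # book features
omega = r @ b.T + 0.3 * rng.standard_normal((30, 20))  # scalar product + idiosyncrasy

liked = binary_channel(omega)
stars = star_channel(omega)
print(liked[:2, :5])
print(stars[:2, :5])
```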

How to determine $M$?
- In real systems $M$ is not fixed: there are always finer and finer details of tastes
- Given the number of known opinions $K$, one should choose $M_{\mathrm{eff}} \approx K/(N_{\mathrm{readers}} + N_{\mathrm{books}})$ so that the system is below the second transition $p_2$ $\Rightarrow$ tastes should be determined hierarchically

Avoid overfitting
- Divide the known votes into training and test sets
- Select $M_{\mathrm{eff}}$ so as to avoid overfitting!
[Figure: two panels labeled "Reasonable fit" and "Overfit".]
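
One way to act on this advice is a plain hold-out search over candidate values of $M$. The sketch below is an assumption-laden illustration rather than the procedure used in the talk: it works on a symmetric opinion matrix, reuses an eigenvalue-truncation completion loop, holds out a random 20% of the known votes, and scores each candidate by root-mean-square error on the held-out votes. The names `complete` and `select_m_eff` are hypothetical.

```python
import numpy as np

def complete(omega, mask, m, n_iter=100):
    """Rank-m completion by repeated eigenvalue truncation and re-imposing known votes."""
    current = np.where(mask, omega, 0.0)
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(current)
        top = np.argsort(vals)[-m:]
        low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
        current = np.where(mask, omega, low_rank)
    return current

def select_m_eff(omega, known_mask, candidates, test_fraction=0.2, seed=0):
    """Pick the candidate M with the smallest error on held-out known votes."""
    rng = np.random.default_rng(seed)
    n = omega.shape[0]
    held = np.triu(known_mask & (rng.random((n, n)) < test_fraction), k=1)
    test_mask = held | held.T                 # symmetric test set
    train_mask = known_mask & ~test_mask      # remaining votes form the training set
    errors = {}
    for m in candidates:
        predicted = complete(omega, train_mask, m)
        diff = predicted[test_mask] - omega[test_mask]
        errors[m] = float(np.sqrt(np.mean(diff ** 2)))
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic noisy data with two true taste dimensions.
rng = np.random.default_rng(3)
N, M_true = 50, 2
R = rng.standard_normal((N, M_true))
noise = 0.1 * rng.standard_normal((N, N))
omega = R @ R.T + (noise + noise.T) / 2
upper = np.triu(rng.random((N, N)) < 0.4, k=1)
known = upper | upper.T

best, errors = select_m_eff(omega, known, candidates=[1, 2, 4, 8])
print(best, errors)
```

A candidate $M$ that is too large can reproduce the training votes closely while doing worse on the held-out ones, which is the overfitting the slide warns about.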

Knowledge networks in biology
- Interacting biomolecules: key and lock principle
- Matrix of interactions (binding energies): $\Omega_{IJ} = \mathbf{k}_I \cdot \mathbf{l}_J + \mathbf{l}_I \cdot \mathbf{k}_J$
- The matchmaker (a bioinformatics researcher) tries to guess yet unknown interactions based on the pattern of known ones
- Many experiments measure $S_{IJ} = \theta(\Omega_{IJ} - \Omega_{\mathrm{th}})$
[Figure: two molecules with keys $k^{(1)}$, $k^{(2)}$ and locks $l^{(1)}$, $l^{(2)}$.]
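
The key-and-lock matrix and its thresholded readout can be written down directly. In this sketch the Gaussian key and lock vectors, the sizes, and the threshold value are illustrative assumptions; only the two formulas come from the slide.

```python
import numpy as np

def binding_matrix(keys, locks):
    """Omega[I, J] = k_I . l_J + l_I . k_J, symmetric by construction."""
    cross = keys @ locks.T        # cross[I, J] = k_I . l_J
    return cross + cross.T        # cross.T[I, J] = k_J . l_I = l_I . k_J

def observed_interactions(omega, threshold):
    """Binary readout: 1 where the binding energy exceeds the threshold."""
    return (omega > threshold).astype(int)

rng = np.random.default_rng(4)
keys = rng.standard_normal((6, 2))    # row I is the key vector k_I
locks = rng.standard_normal((6, 2))   # row I is the lock vector l_I
omega = binding_matrix(keys, locks)
s = observed_interactions(omega, threshold=1.0)
print(s)
```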

Real systems
- Internet commerce: the dataset of opinions on movies collected by the Compaq Systems Research Center:
  - users entered a total of numeric ratings (* to *****) for 1628 different movies: $M_{\mathrm{eff}} \sim 40$
  - Default set for collaborative filtering research
- Biology: table of interactions between yeast proteins from the high-throughput two-hybrid experiment of Ito et al.
  - 6000 proteins (~3300 have at least one interaction partner) and 4400 known interactions
  - Binary (interact or not)
  - $M_{\mathrm{eff}} \sim 1$: too small!

Yeast protein interaction network
- Data from T. Ito et al., PNAS (2001)
- The full set contains 4549 interactions among 3278 yeast proteins
- Shown here are only nuclear proteins interacting with at least one other nuclear protein

Correlations in connectivities
- Basic design principles of the network can be revealed by comparing the frequency of a pattern in real and random networks
- $P(k_0, k_1)$: probability that nodes with connectivities $k_0$ and $k_1$ directly interact
- It should be normalized by $P_r(k_0, k_1)$, the same property in a randomized network such that:
  - Each node has the same number of neighbors (connectivity) as in the real network
  - These neighbors are randomly selected
  - The whole ensemble of random networks can be generated
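
A common way to sample such a degree-preserving random ensemble is repeated edge swapping: pick two edges a-b and c-d and rewire them to a-d and c-b unless that would create a self-loop or a duplicate edge. The slide does not spell out the procedure, so the pure-Python sketch below is an assumption about how it could be done; it expects a simple undirected graph, and the function names and toy edge list are invented for the example.

```python
import random
from collections import Counter

def randomize_edges(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a simple undirected graph by edge swaps."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:            # randomize the orientation of the second edge
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue                      # would create a self-loop or a multi-edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

def degrees(edges):
    """Number of neighbors (connectivity) of every node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
randomized = randomize_edges(edges, n_swaps=20, seed=1)
print(sorted(randomized))
```

Calling the function with different seeds yields different members of the ensemble, all sharing the degree sequence of the input network.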

Correlation profile of the protein interaction network
[Figure: two panels plotted as functions of $k_0$ and $k_1$.]
- Ratio: $P(k_0, k_1) / P_r(k_0, k_1)$
- Z-score: $Z(k_0, k_1) = \bigl(P(k_0, k_1) - P_r(k_0, k_1)\bigr) / \sigma_r(k_0, k_1)$
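
Given a table of observed counts and the same table measured on several randomized networks, both quantities follow from the ensemble mean and standard deviation. The sketch below assumes the tables are already binned by connectivity and uses tiny made-up numbers purely to show the arithmetic; `correlation_profile` is a hypothetical helper name.

```python
import numpy as np

def correlation_profile(observed, randomized):
    """Return the ratio P/P_r and the Z-score (P - P_r)/sigma_r per (k0, k1) bin."""
    randomized = np.asarray(randomized, dtype=float)
    mean_r = randomized.mean(axis=0)   # P_r: average over the random ensemble
    std_r = randomized.std(axis=0)     # sigma_r: spread over the random ensemble
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mean_r > 0, observed / mean_r, np.nan)
        z = np.where(std_r > 0, (observed - mean_r) / std_r, np.nan)
    return ratio, z

# Toy 2x2 tables: one observed network and three randomized versions of it.
observed = np.array([[2.0, 6.0], [6.0, 1.0]])
randomized = [
    [[4, 4], [4, 2]],
    [[6, 2], [2, 2]],
    [[2, 6], [6, 2]],
]
ratio, z = correlation_profile(observed, randomized)
print(ratio)
print(z)
```

Bins where the random ensemble shows no variation get a NaN Z-score instead of a division by zero.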

Correlation profile of the internet

What may it mean?
- Hubs avoid each other (as in the internet: R. Pastor-Satorras et al., Phys. Rev. Lett. (2001))
- Hubs prefer to connect to terminal ends (nodes of low connectivity)
- Specificity: the network is organized in modules clustered around individual hubs
- Stability: the number of second nearest neighbors is suppressed $\Rightarrow$ it is harder to propagate deleterious perturbations

Conclusion
- Studies of networks are similar to paleontology: learning about an organism from its backbone
- You can learn a lot about a complex system from its network!
- But not everything…

THE END

Entropy of unknown opinions
[Figure: entropy of unknown opinions versus the density of known opinions $p$, with $p_1$ and $p_2$ marked.]

How to determine $p_2$?
- $K$ known elements of an $N \times N$ matrix $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$ (where $N = N_r + N_b$)
- Approximately $N \times M$ degrees of freedom (minus $M(M-1)/2$ gauge parameters)
- For $K > MN$ all missing elements can be reconstructed $\Rightarrow$ $p_2 = K_2 / (N(N-1)/2) \approx 2M/N$
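
The counting argument can be written out in two lines. This is a sketch that assumes $M \ll N$, so the gauge correction is negligible, and that takes $K_2$ to be the number of known elements at which the count of constraints first matches the count of unknowns.

```latex
\begin{align}
K_2 &\approx NM - \frac{M(M-1)}{2} \approx NM \qquad (M \ll N), \\
p_2 &= \frac{K_2}{N(N-1)/2} \approx \frac{2NM}{N(N-1)} = \frac{2M}{N-1} \approx \frac{2M}{N}.
\end{align}
```

For example, with $N = 1000$ and $M = 5$ (illustrative values), $K_2 \approx 5000$ known opinions out of $499500$ possible pairs, so $p_2 \approx 0.01$.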

What is a knowledge network?
- An undirected graph with $N$ vertices and $K$ edges
- Each vertex has a (hidden) $M$-dimensional vector of tastes/features
- Each edge carries the scalar product (opinion) of the vectors on the vertices it connects
- The centralized matchmaker tries to guess the vectors (tastes) based on their scalar products (opinions) and to predict unknown opinions

Versions of knowledge networks
- Regular graph: every link is allowed. Example: recommending people to other people according to their areas of interest
- Bipartite graph. Example: customers to products
- Non-reciprocal opinions: each vertex has two vectors $\mathbf{d}_I$, $\mathbf{q}_I$, so that $\Omega_{IJ} = \mathbf{d}_I \cdot \mathbf{q}_J$. Example: a real matchmaker recommending men to women