Information Retrieval Lecture 8 Introduction to Information Retrieval (Manning et al. 2007) Chapter 19 For the MSc Computer Science Programme Dell Zhang.

Presentation transcript:

Information Retrieval, Lecture 8
Introduction to Information Retrieval (Manning et al. 2007), Chapter 19
For the MSc Computer Science Programme
Dell Zhang, Birkbeck, University of London
The slides are adapted from Prof. Mark Levene’s at

The Size of the Web
- Lawrence and Giles (1999): 800 million pages.
- Over 11.5 billion pages in 2005 (Google indexes over 8 billion).
- Coverage: about 40% in 1999.
- Overlap: low.
- The deep (or hidden, or invisible) web contains many times more information.

Capture Recapture (a.k.a. Mark and Recapture)
- SE1: the reported size of search engine 1.
- QSE1 and QSE2: the pages returned for a set of queries Q by the two engines.
- OVR: the overlap of QSE1 and QSE2.
- Estimate of Web size: (QSE2 x SE1) / OVR
- Rationale: OVR / QSE2 = SE1 / Web. The share of engine 2's results that engine 1 also returns estimates the share of the Web that engine 1 covers.
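The estimate above can be sketched in a few lines of Python. The function name and the numbers are made up for illustration; they are not measurements from the lecture.

```python
def estimate_web_size(se1_size, qse2_count, overlap_count):
    """Capture-recapture estimate: Web = (QSE2 x SE1) / OVR."""
    if overlap_count <= 0:
        raise ValueError("overlap must be positive to form an estimate")
    return qse2_count * se1_size / overlap_count


# Hypothetical inputs: engine 1 reports 8 billion pages; engine 2 returns
# 1000 pages for the query set, 400 of which engine 1 also returns.
estimate = estimate_web_size(se1_size=8_000_000_000,
                             qse2_count=1000,
                             overlap_count=400)
print(estimate)  # 20000000000.0, i.e. 20 billion pages
```

A smaller overlap pushes the estimate up: if the two engines rarely agree, each is assumed to cover only a small part of the Web.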

Diameter of the Web
- Compute the average shortest path between pairs of pages that have a path from one to the other.
- Broder 99: directed 16.2, undirected 6.8.
- Barabasi 99: directed, for nd.edu, 19.
- A small diameter is a characteristic of a small-world network.
- Choose a random source and destination: 75% of the time there is no directed path between them.
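A minimal sketch of this measurement, assuming the graph fits in memory as an adjacency dictionary in which every page appears as a key. It runs one breadth-first search per page, which is only feasible for small graphs; studies at Web scale have to sample. The toy graph is hypothetical.

```python
from collections import deque


def path_statistics(graph):
    """Return (average shortest path over connected ordered pairs,
    fraction of ordered pairs with no directed path)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    n = len(graph)
    ordered = n * (n - 1)
    average = total / pairs if pairs else float("nan")
    unreachable = 1 - pairs / ordered if ordered else 0.0
    return average, unreachable


# Toy directed graph: d -> a -> b -> c
toy = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
average, unreachable = path_statistics(toy)
print(average, unreachable)
```

Averaging only over pairs that are actually connected is what lets the figure stay small even when most random pairs have no directed path at all.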

Bowtie Model of the Web
Broder et al.: crawl of over 200 million pages and 1.5 billion links.
- SCC (strongly connected component): 27.5%
- IN and OUT: 21.5% each
- Tendrils and tubes: 21.5%
- Disconnected: 8%
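The bowtie regions can be recovered from reachability alone. The sketch below assumes a page known to lie in the core (here called core_node, a hypothetical parameter): pages both reachable from it and able to reach it form the SCC, pages that can only reach it form IN, and pages only reachable from it form OUT. Tendrils, tubes and disconnected pages are lumped together as OTHER.

```python
from collections import deque


def reachable(graph, start):
    """Set of nodes reachable from start by BFS (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bowtie(graph, core_node):
    """Split nodes into SCC, IN, OUT and OTHER relative to core_node."""
    # Build the reversed graph so that "can reach the core" becomes
    # "reachable from the core" along reversed links.
    reverse = {node: [] for node in graph}
    for node, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(node)
    forward = reachable(graph, core_node)
    backward = reachable(reverse, core_node)
    scc = forward & backward
    all_nodes = set(reverse)
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": all_nodes - forward - backward,
    }


# Toy graph: i -> a <-> b -> o, plus an isolated page x.
toy = {"i": ["a"], "a": ["b"], "b": ["a", "o"], "o": [], "x": []}
parts = bowtie(toy, "a")
print(parts)
```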

Link Degree Distributions
How many pages have n = 1, 2, … links?
[Figures: log-log plots of the indegree and outdegree distributions]
The log-log plots are linear!

What is a Power Law
f(i) is the proportion of objects having property i.
- E.g. f(i) = # pages, i = # inlinks
- E.g. f(i) = # sites, i = # pages
- E.g. f(i) = # sites, i = # users
- E.g. f(i) = frequency of word, i = rank of word, from most frequent to least frequent
The log-log plot shows a linear relationship (a straight line).
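The straight line follows from the usual form of a power law, f(i) = C * i^(-beta), since taking logs gives log f(i) = log C - beta * log i. The symbols C and beta are notation assumed here, not taken from the slide. A rough sketch of reading the exponent off the slope, using synthetic data generated exactly from such a law:

```python
import math


def loglog_slope(points):
    """Least-squares slope of log f(i) against log i."""
    xs = [math.log(i) for i, _ in points]
    ys = [math.log(f) for _, f in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic, noise-free data from f(i) = 1000 * i^(-2.1); the constants
# are chosen for illustration only.
points = [(i, 1000 * i ** -2.1) for i in range(1, 51)]
beta = -loglog_slope(points)
print(round(beta, 6))  # 2.1
```

On real, noisy counts a line fit like this is only a quick check, not a rigorous estimate of the exponent.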

Power Laws on the Web
Exponents in parentheses:
- Inlinks (2.1)
- Outlinks (2.72)
- Strongly connected components (2.54)
- No. of web pages in a site (2.2)
- No. of visitors to a site during a day (2.07)
- No. of links clicked by web surfers (1.5)
- PageRank (2.1)

Preferential Attachment, or The Rich Get Richer
How Power Laws Arise
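A small simulation makes the rich-get-richer effect visible. This is a simplified sketch in the spirit of the Barabási-Albert model, under the assumption that each new node adds exactly one link; the function name and parameters are illustrative.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow a graph one node at a time. Each new node links to one existing
    node chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    degree = [1, 1]     # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]  # one entry per edge endpoint, so a node of degree k
                        # appears k times and is k times as likely to be picked
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new])
    return degree


degrees = preferential_attachment(10_000)
print("max degree:", max(degrees))
print("nodes with degree 1:", degrees.count(1))
```

Most nodes keep the single link they arrived with, while a few early nodes collect a large share of the links, which is the heavy-tailed degree pattern associated with power laws.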

Scale-Free Networks vs. Classic Random Graphs

Take Home Messages
The Web Graph
- Large and Sparse: Capture Recapture
- Small World Network: 19 Degrees of Separation
- Scale Free Network: The Power Law, Rich Get Richer