The Structure of the Web. Getting to know the Web: How big is the web and how do you measure it? How many people use the web? How many use search engines?


1 The Structure of the Web

2 Getting to Know the Web How big is the web and how do you measure it? How many people use the web? How many use search engines? What is the shape of the web? How do people search for information? Can we categorize web searchers?

3 The Web Is a Graph “Map of the Internet” (1998)

4 Examples of Graphs [Figure: example graph with vertices a, b, c, d, e, f, g, h, i]

5 Graph G: Formal Definition A graph G consists of two sets G = {V, E} –A set V of vertices, or nodes (entities) –A set E ⊆ V × V of edges (relationships between entities) [Figure: example graph with vertices a through i] Example: nodes are students in a large high school; an edge joins two students who had a romantic relationship at some point during the 18-month period in which the study was conducted.
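The definition on slide 5 can be written down almost literally in code. The following is a minimal sketch, assuming a small made-up graph: the vertex names, the edges, and the helper names `is_valid_graph` and `adjacency` are illustrative and do not come from the slides.

```python
# Hypothetical example graph; vertex names and edges are illustrative only.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("c", "d")}


def is_valid_graph(vertices, edges):
    """Return True if every edge joins two known vertices (E is a subset of V x V)."""
    return all(u in vertices and v in vertices for u, v in edges)


def adjacency(vertices, edges):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


if __name__ == "__main__":
    print(is_valid_graph(V, E))
    print(sorted(adjacency(V, E)["b"]))
```

The adjacency map treats each edge as undirected, which fits the relationship example on the slide; a directed graph would record only `adj[u].add(v)`.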

6 Examples of Relationships (Graphs) [Figure: three example networks. People: Peter, Mary, Albert, linked as co-worker, friend, brothers, friend. Proteins: Protein 1, Protein 2, Protein 5, Protein 9. Movies and actors: Movie 1, Movie 2, Movie 3, Actor 1, Actor 2, Actor 3, Actor 4. N=4, L=4] Source: Network Science: Graph Theory 2012

7 Connected, Paths and Distances A connected graph has a path between each pair of distinct vertices. A path A → B is a sequence of edges from A to B. The minimum distance between A and B is the length of the shortest path. How do you measure it?

8 Computing Minimum Distances Algorithm:
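The algorithm shown on slide 8 did not survive in this transcript. One standard way to compute minimum distances in an unweighted graph is breadth-first search (BFS), sketched below; this is an assumption about what the slide presented, not a reconstruction of it. The function name `bfs_distances` and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return the minimum number of edges from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Illustrative undirected graph stored as adjacency lists; "f" is isolated.
example = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": [],
}

if __name__ == "__main__":
    print(bfs_distances(example, "a"))
```

Vertices that never appear in the result (here `"f"`) have no path from the source, so the graph is connected exactly when a search from any one vertex reaches all of them.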

9 Directed Graphs and DAGs Directed graph –Each edge is a directed edge, or an arc, or a link –Can have two arcs between a pair of vertices, one in each direction –Vertex y is adjacent to vertex x iff there is a directed edge from x to y Directed Path –A sequence of directed edges between two vertices Directed Acyclic Graph (DAG) –Directed graph that has no cycles
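To make the DAG definition concrete, here is a minimal sketch of one common acyclicity test (Kahn's algorithm, which repeatedly removes vertices with no incoming arcs). The slides do not name a method, so the choice of algorithm, the function name `is_dag`, and the sample arcs are all assumptions.

```python
from collections import deque


def is_dag(vertices, arcs):
    """Return True if the directed graph (vertices, arcs) has no directed cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for x, y in arcs:  # arc from x to y: y is adjacent to x
        successors[x].append(y)
        indegree[y] += 1

    # Start from vertices with no incoming arcs and peel them off.
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        x = queue.popleft()
        removed += 1
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)

    # Any vertex left over lies on, or downstream of, a directed cycle.
    return removed == len(vertices)


if __name__ == "__main__":
    print(is_dag(["x", "y", "z"], [("x", "y"), ("y", "z")]))
    print(is_dag(["x", "y"], [("x", "y"), ("y", "x")]))
```

The second call has one arc in each direction between the same pair of vertices, which the slide allows in a directed graph but which forms a cycle and so rules out a DAG.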

10 The WEB is a DIRECTED GRAPH

11 Structure of The Web A Web Page corresponds to a node A Hyperlink corresponds to a directed edge

12 Structure of The Web The Web is a directed graph with one large Strongly Connected Component Is there a directed path from the Univ. of X to Company Z’s Home? How about to USNews College Rankings? The other way(s) around?
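The questions on slide 12 come down to directed reachability: two pages lie in the same strongly connected component when each can be reached from the other by following hyperlinks. A minimal sketch, assuming a tiny made-up link graph; the page names `univ_x`, `usnews`, `company_z` and the links between them are invented for illustration and say nothing about the real sites.

```python
def reachable(links, start):
    """Return the set of pages reachable from start by following directed links."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def same_scc(links, p, q):
    """Return True if there is a directed path from p to q and from q to p."""
    return q in reachable(links, p) and p in reachable(links, q)


# Hypothetical link structure: page -> pages it links to.
links = {
    "univ_x": ["usnews"],
    "usnews": ["univ_x", "company_z"],
    "company_z": [],
}

if __name__ == "__main__":
    print("company_z" in reachable(links, "univ_x"))
    print(same_scc(links, "univ_x", "company_z"))
```

In this toy graph the university page can reach the company page, but not the other way around, so the two are not in the same strongly connected component.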

13 How big is the web? Number of accessible web pages – May 2005 estimate: 11.5 Billion pages Most recent estimates? ________ The deep (or hidden or invisible) web “contains 400-550 times more information” (Are they serious?) Coverage (i.e. the proportion of the web indexed) is crucial for search engines. Today, ____________ pages are indexed

14 How do you measure the size of the web? Capture-recapture method. SE1 = # of pages indexed by search engine 1. QSE2 = # of pages returned by search engine 2 for “typical” queries. OVR = # of pages returned by both search engines for typical queries. Estimate: SE1 / WWW = OVR / QSE2 => WWW = (SE1 x QSE2) / OVR [Figure: diagram of the sets WWW, SE1 and QSE2, with overlap OVR]
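The capture-recapture estimate on slide 14 is a single line of arithmetic. The sketch below plugs in made-up numbers purely to show how the formula behaves; the function name `estimate_web_size` and all three input values are hypothetical and are not measurements from the slides.

```python
def estimate_web_size(se1, qse2, ovr):
    """Capture-recapture estimate: WWW = (SE1 x QSE2) / OVR."""
    if ovr <= 0:
        raise ValueError("overlap must be positive to form an estimate")
    return se1 * qse2 / ovr


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the calculation.
    se1 = 8_000_000_000   # pages indexed by search engine 1
    qse2 = 1_000          # pages returned by search engine 2 for typical queries
    ovr = 400             # of those, pages also found by search engine 1
    print(estimate_web_size(se1, qse2, ovr))
```

With these inputs, search engine 1 covers 400 / 1000 = 40% of the sample, so the estimate scales its index size up by a factor of 2.5. A smaller overlap implies a larger web, and an overlap of zero gives no estimate at all.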

15 How hard is it to go from one page to another? Over 75% of the time there is no directed path from one random web page to another. When a directed path exists its average length is 16 clicks. When an undirected path exists its average length is 7 clicks. Short average path between pairs of nodes is characteristic of a small-world network. Kleinberg: The small-world phenomenon (we will study later)
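The two kinds of figure quoted on slide 15 (how often a directed path exists, and how long it is on average when it does) can be computed exactly on a small graph by running a breadth-first search from every page. This is only a sketch of the definitions on a toy graph; the function name `path_statistics` and the example links are assumptions, and measurements on the real web rely on sampling rather than all-pairs search.

```python
from collections import deque


def path_statistics(links):
    """Return (fraction of ordered page pairs joined by a directed path,
    average shortest-path length over those joined pairs)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    connected_pairs = 0
    total_length = 0
    for source in pages:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in links.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        connected_pairs += len(dist) - 1   # exclude the source itself
        total_length += sum(dist.values())

    ordered_pairs = len(pages) * (len(pages) - 1)
    fraction = connected_pairs / ordered_pairs if ordered_pairs else 0.0
    average = total_length / connected_pairs if connected_pairs else float("nan")
    return fraction, average


if __name__ == "__main__":
    # Hypothetical chain of three pages: a -> b -> c.
    print(path_statistics({"a": ["b"], "b": ["c"], "c": []}))
```

To get the undirected figures instead, add the reverse of every link to `links` before calling the function.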

