Searching in Graphs.

1 Searching in Graphs

2 Google: lifetime of a query
All web pages need to be in Google’s index: over 20 billion web pages, with new ones constantly being added. How can Google keep searching for new web pages?

3 Web Crawlers
First crawler: Web Wanderer from MIT, 1993, built to measure the growth of the web.
Well-known crawlers: GoogleBot, MSNBot, Slurp (from Yahoo!), Teoma (from AskJeeves).

4 Crawler Architecture
[Figure: crawler architecture. Labeled components: seed URLs, URL frontier, scheduler, load monitor, retrievers (DNS, HTTP), Internet, parser, HREFs extractor and normalizer, duplicate URL eliminator, filter, hosts, crawl metadata, citations.]

5 Web Crawler Architecture
High-level structure: start with a set of URLs, then repeatedly get web pages and scan them for outlinks.
Issues: latency of several seconds per page; DNS lookup delays; duplicate pages; “spider traps” (hyperlinks constructed to trap the crawler); crashing the server due to overload; delays in server response.
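The high-level structure on this slide can be sketched as a breadth-first traversal. This is a minimal sketch, not a production crawler: `fetch_outlinks` and `FAKE_WEB` are hypothetical stand-ins for fetching a page and extracting its links, so the example runs without network access, and none of the listed issues (latency, DNS delays, server load) are handled.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> outlinks found on that page.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["d.example"],
    "d.example": [],
}

def fetch_outlinks(url):
    """Stand-in for an HTTP fetch plus link extraction (an assumption)."""
    return FAKE_WEB.get(url, [])

def crawl(seed_urls, max_pages=1000):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seed_urls)   # URLs waiting to be fetched
    seen = set(seed_urls)         # used to skip URLs already queued
    order = []                    # pages in the order they were fetched
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_outlinks(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order
```

The `seen` set plays the role of duplicate elimination, and `max_pages` is a crude bound so the loop always terminates.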

6 Web Crawler Architecture
Spider traps: dummy links, for example a path that keeps repeating the same segments:
hatchline/hatchline/flyfactory/hatchline/flyfactory/hatchline/flyfactory/flyfactory/flyfactory/hatchline/flyfactory/hatchline/
Basic web crawl: searching a graph.
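One simple guard against traps like the repeating path on this slide is to reject paths that are very deep or that repeat a segment many times. The sketch below is only a heuristic under assumed thresholds; `looks_like_trap`, `max_depth`, and `max_repeats` are illustrative names and values, not something the slides specify.

```python
from collections import Counter

def looks_like_trap(path, max_depth=8, max_repeats=3):
    """Heuristic: flag paths that are too deep or repeat a segment too often."""
    segments = [s for s in path.split("/") if s]
    if len(segments) > max_depth:
        return True
    counts = Counter(segments)
    return any(c > max_repeats for c in counts.values())
```

A crawler could call this before adding a URL to its frontier; legitimate deep paths would be lost, which is the trade-off of any threshold-based filter.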

7 Graph Theory: Basic Definitions and Applications
Section 3.1 of [KT]

8 Connections between web links
[Figure: linked web pages. Labels: VT home page, Academics, College of Engineering, Computer Science, Sports.]

9 Road Map

10 Airline routes

11 Directed Graphs
Directed graph. G = (V, E).
V = nodes. E = edges between pairs of nodes.
Captures pairwise relationship between objects.
Graph size parameters: n = |V|, m = |E|.
Maximum number of distinct edges = O(n^2).
Edges are asymmetric: edge (1,4) but not (4,1).
Example: V = { 1, 2, 3, 4 }, E = { (1,2), (1,3), (1,4), (2,4), (4,2), (4,3) }, n = 4, m = 6.
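As a concrete illustration, the four-node example on this slide can be stored as an adjacency list. This is a minimal sketch; the helper name `adjacency_list` is an assumption, and a dictionary of lists is only one of several reasonable representations.

```python
# The example graph from the slide.
V = [1, 2, 3, 4]
E = [(1, 2), (1, 3), (1, 4), (2, 4), (4, 2), (4, 3)]

def adjacency_list(nodes, edges):
    """Map each node to the list of nodes its outgoing edges point to."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)   # directed: record only u -> v
    return adj

adj = adjacency_list(V, E)
n = len(V)   # number of nodes
m = len(E)   # number of edges
```

Here `4 in adj[1]` holds while `1 in adj[4]` does not, which is the asymmetry the slide points out.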

12 Adjacencies
In(v) = { u : (u,v) is an edge }, Indegree(v) = |In(v)|.
Out(v) = { w : (v,w) is an edge }, Outdegree(v) = |Out(v)|.
Maximum Indegree, Outdegree = O(n).
[Figure: directed graph on nodes 1, 2, 3, 4 with Outdegree(1) and Indegree(2) marked.]
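The definitions of In(v) and Out(v) translate directly into set comprehensions. The sketch below assumes the figure on this slide shows the same four-node graph as the Directed Graphs slide; the function names are illustrative.

```python
# Edge list of the four-node directed example (assumed to match the figure).
E = [(1, 2), (1, 3), (1, 4), (2, 4), (4, 2), (4, 3)]

def in_neighbors(edges, v):
    """In(v): all u such that (u, v) is an edge."""
    return {u for (u, w) in edges if w == v}

def out_neighbors(edges, v):
    """Out(v): all w such that (v, w) is an edge."""
    return {w for (u, w) in edges if u == v}

outdegree_1 = len(out_neighbors(E, 1))
indegree_2 = len(in_neighbors(E, 2))
```

Under that assumption, Out(1) = {2, 3, 4} and In(2) = {1, 4}. Scanning the whole edge list takes O(m) time per query; an adjacency list avoids the rescan.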

13 Undirected Graphs
Undirected graph. G = (V, E).
V = nodes. E = edges between pairs of nodes.
Captures symmetric pairwise relationship between objects.
Graph size parameters: n = |V|, m = |E|.
V = { 1, 2, 3, 4, 5, 6, 7, 8 }
E = { (1,2), (1,3), (2,3), (2,4), (2,5), (3,5), (3,7), (3,8), (4,5), (5,6), (7,8) }
n = 8, m = 11.
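Because the relationship is symmetric, an undirected edge is usually stored in both directions. The sketch below uses a small hypothetical four-node graph (not the slide's example) and checks a standard consequence: the degrees sum to 2m, since every edge contributes to two nodes.

```python
def undirected_adjacency(nodes, edges):
    """Map each node to the set of its neighbors, storing each edge both ways."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Hypothetical example: a triangle 1-2-3 with node 4 attached to node 3.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = undirected_adjacency(nodes, edges)
degree_sum = sum(len(neighbors) for neighbors in adj.values())
```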

14 Some Graph Applications
Graph              Nodes                  Edges
transportation     street intersections   highways
communication      computers              fiber optic cables
World Wide Web     web pages              hyperlinks
social             people                 relationships
food web           species                predator-prey
software systems   functions              function calls
scheduling         tasks                  precedence constraints
circuits           gates                  wires

15 World Wide Web
Web graph. Directed graph.
Node: web page. Edge: hyperlink from one page to another.
[Figure: example web graph on cnn.com, netscape.com, novell.com, cnnsi.com, timewarner.com, hbo.com, sorpranos.com.]

16 Ecological Food Web
Food web graph. Directed graph.
Node = species. Edge = from prey to predator.
Reference:

17 Road Map
Nodes: intersections. Edges: roads.

18 Other graphs in the real world
Airline routes. Nodes: cities. Edges: flights.
Yeast protein network. Nodes: proteins. Edges: interacting pairs.

19 Other graphs in the real world
Sexual interaction network.
High school dating network.

20 Phylogeny Trees
Phylogeny trees. Describe evolutionary history of species. Biologists draw their tree from left to right.
The phylogeny states that there was an ancestral species that gave rise to mammals and birds, but not to the other species shown in the tree (that is, mammals and birds share a common ancestor that they do not share with other species on the tree), that all animals are descended from an ancestor not shared with mushrooms, trees, and bacteria, and so on.

21 GUI Containment Hierarchy
GUI containment hierarchy. Describe organization of GUI widgets. Reference:

22 Paths and Connectivity
Def. A path in an undirected graph G = (V, E) is a sequence P of nodes v_1, v_2, …, v_{k-1}, v_k with the property that each consecutive pair v_i, v_{i+1} is joined by an edge in E.
Def. A path is simple if all nodes are distinct.
Def. An undirected graph is connected if for every pair of nodes u and v, there is a path between u and v.
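The three definitions on this slide can be checked mechanically. The sketch below assumes the graph is given as a dictionary mapping each node to the set of its neighbors; the function names and the example graph `G` are illustrative. Connectivity is tested with a breadth-first search from an arbitrary start node, which suffices because in an undirected graph reaching every node from one node implies a path between every pair.

```python
from collections import deque

def is_path(adj, seq):
    """True if every consecutive pair in seq is joined by an edge."""
    return all(v in adj[u] for u, v in zip(seq, seq[1:]))

def is_simple_path(adj, seq):
    """True if seq is a path and all of its nodes are distinct."""
    return is_path(adj, seq) and len(set(seq)) == len(seq)

def is_connected(adj):
    """True if every node is reachable from an arbitrary start node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

# Hypothetical example: triangle 1-2-3 with node 4 attached to node 3.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```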

23 Cycles
Def. A cycle is a path v_1, v_2, …, v_{k-1}, v_k in which v_1 = v_k, k > 2, and the first k-1 nodes are all distinct.
[Figure: example cycle C.]
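The cycle definition can be turned into a direct check on a node sequence. This sketch applies the slide's three conditions literally to a graph stored as a dictionary of neighbor sets; `is_cycle` and the example graph `G` are illustrative names, not part of the slides.

```python
def is_cycle(adj, seq):
    """Check the slide's conditions: a path with v_1 = v_k, k > 2,
    and the first k-1 nodes all distinct."""
    k = len(seq)
    if k <= 2 or seq[0] != seq[-1]:
        return False
    if len(set(seq[:-1])) != k - 1:
        return False
    # Every consecutive pair must be joined by an edge.
    return all(v in adj[u] for u, v in zip(seq, seq[1:]))

# Hypothetical example: triangle 1-2-3 with node 4 attached to node 3.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```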

24 Trees
Def. An undirected graph is a tree if it is connected and does not contain a cycle.
Theorem. Let G be an undirected graph on n nodes. Any two of the following statements imply the third:
(1) G is connected.
(2) G does not contain a cycle.
(3) G has n-1 edges.
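The theorem suggests a cheap tree test: instead of searching for cycles, check that the graph has n-1 edges and is connected, and the absence of cycles follows. The sketch below does exactly that; `is_tree` is an illustrative name, and the input is assumed to be a node list plus a list of undirected edges with no repeated edges or self-loops.

```python
def is_tree(nodes, edges):
    """Tree test via the theorem: n-1 edges plus connectivity implies no cycle."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from an arbitrary node to test connectivity.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)
```

Note that the edge count alone is not enough: a triangle plus an isolated node has n-1 edges but is neither connected nor acyclic.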

25 Rooted Trees
Rooted tree. Given a tree T, choose a root node r and orient each edge away from r.
Importance. Models hierarchical structure.
By rooting a tree, it's easy to see that it has n-1 edges (exactly one edge leading upward from each non-root node).
[Figure: a tree, and the same tree rooted at 1, with the root r, the parent of v, v, and a child of v labeled.]
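Rooting a tree amounts to recording, for each node, the neighbor that lies toward the root. The sketch below does this with a breadth-first traversal; `root_tree` and the example tree `T` are illustrative, and the input is assumed to already be a tree given as a dictionary of neighbor lists.

```python
from collections import deque

def root_tree(adj, root):
    """Return a parent map: parent[v] is v's neighbor on the path to the root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:   # not yet reached, so u is v's parent
                parent[v] = u
                queue.append(v)
    return parent

# Hypothetical five-node tree, rooted at node 1.
T = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
parent = root_tree(T, 1)
upward_edges = sum(1 for p in parent.values() if p is not None)
```

Each non-root node contributes exactly one upward edge, so `upward_edges` equals n-1, matching the counting argument on the slide.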

28 Binary Trees
A rooted tree in which every node has either two or zero children.
Complete binary tree: all leaf nodes are at the same level.
Number of nodes in a complete binary tree with k levels?
[Figure: binary tree on nodes 1, 2, 3, 4, 5.]
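One way to approach the question is to count level by level. Assuming the root is level 1 and every non-leaf node has two children, level i holds 2^(i-1) nodes, so k levels hold 1 + 2 + … + 2^(k-1) = 2^k - 1 nodes. The short sketch below checks the closed form against the level-by-level sum; the function name is illustrative.

```python
def complete_binary_tree_nodes(k):
    """Sum the node counts of levels 1..k, where level i has 2**(i-1) nodes."""
    return sum(2 ** level for level in range(k))

# Compare the level-by-level count with the closed form 2**k - 1.
counts_match = all(
    complete_binary_tree_nodes(k) == 2 ** k - 1 for k in range(1, 11)
)
```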

