

1 PageRank

2 A Search Engine

3 “obama”

4 PageRank Model: Final Version
The Web: a directed graph
– Vertices (pages)
– Edges (links)
[Figure: example web graph with pages a, b, c, d, e and f]
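One common way to write the final PageRank model is shown below. It is an assumption here, not taken from the slide: it uses a damping factor d (typically 0.85) and spreads the rank of pages with no outlinks evenly over all N pages. R_t(p) is the rank of page p after round t, out(q) is the set of pages that q links to, and q → p means q links to p.

```latex
R_{t+1}(p) = \frac{1-d}{N} + \frac{d}{N}\sum_{q:\,|\mathrm{out}(q)|=0} R_t(q) + d\sum_{q \to p}\frac{R_t(q)}{|\mathrm{out}(q)|}
```

The first two terms are the same for every page, so they can be computed once per iteration. The last term is the per-edge contribution. If the ranks sum to 1 in round t, they still sum to 1 in round t+1.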

5 Input Structure
41.5 million edges, 5.4 million nodes
Columns: document-with-link | document-linked
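Once links are encoded as integer OIDs (Step 1), the edge list can be loaded into the int[][] layout that Step 2 expects, where graph[i] lists the nodes that node i links to. This sketch assumes OIDs are dense, running from 0 to n-1. GraphBuilder and buildGraph are hypothetical names, not part of the lab code. The code counts out-degrees first and then fills exact-size arrays, which avoids boxed lists.

```java
import java.util.Arrays;

// Hypothetical helper (not the lab's code): builds graph[i] = targets of node i
// from parallel src/dst arrays of dense OIDs in the range 0..n-1 (assumption).
final class GraphBuilder {
    static int[][] buildGraph(int n, int[] src, int[] dst) {
        int[] outDegree = new int[n];
        for (int s : src) outDegree[s]++;                 // pass 1: count outlinks per node

        int[][] graph = new int[n][];
        for (int i = 0; i < n; i++) graph[i] = new int[outDegree[i]];

        int[] fill = new int[n];                          // next free slot for each node
        for (int e = 0; e < src.length; e++) {
            int s = src[e];
            graph[s][fill[s]++] = dst[e];                 // pass 2: place each target
        }
        return graph;
    }

    public static void main(String[] args) {
        // Edges: 0->1, 0->2, 1->2, 2->0; node 3 has no outlinks (empty array).
        int[] src = {0, 0, 1, 2};
        int[] dst = {1, 2, 2, 0};
        int[][] graph = buildGraph(4, src, dst);
        for (int i = 0; i < graph.length; i++) {
            System.out.println(i + " -> " + Arrays.toString(graph[i]));
        }
    }
}
```

At this scale each int costs 4 bytes, so the 41.5 million link targets alone take about 166 MB (41.5 million × 4 bytes). Holding the raw src and dst arrays during construction needs about 332 MB more, before per-array headers for the 5.4 million nodes, so the JVM heap (-Xmx) may need raising.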

6 Step 1. Dictionary Encode Links
Strings are difficult to fit in memory
Encode strings as OIDs (object ids = integers)
Input line: http://es.dbpedia.org/resource/Ciencia_ficción http://es.dbpedia.org/resource/Robot
Output line: 12039 52673
Dictionary:
12039 http://es.dbpedia.org/resource/Ciencia_ficción
…
52673 http://es.dbpedia.org/resource/Robot
…
OIDCompress -i [folder]/page_links_es.tsv.gz -igz -o [folder]/page_links_es.oid.gz -ogz -d [folder]/page_links_es.dict.gz -dgz
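A minimal sketch of the dictionary-encoding idea follows. It assumes tab-separated source/target lines (the input is a .tsv file) and sequential OIDs starting at 0. DictionaryEncoder is a hypothetical class for illustration, not the OIDCompress tool. The tool's actual ID assignment (e.g. 12039 and 52673 above) may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary encoder (not OIDCompress): maps each distinct string
// to a dense integer OID and keeps the reverse mapping for decoding.
final class DictionaryEncoder {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();   // OID -> original string

    // Returns the existing OID for s, or assigns the next free one.
    int encode(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = strings.size();
            ids.put(s, id);
            strings.add(s);
        }
        return id;
    }

    String decode(int oid) {
        return strings.get(oid);
    }

    int size() {
        return strings.size();
    }

    // Encodes one "source<TAB>target" line (assumed format) into "oid<TAB>oid".
    String encodeLine(String line) {
        String[] parts = line.split("\t");
        return encode(parts[0]) + "\t" + encode(parts[1]);
    }

    public static void main(String[] args) {
        DictionaryEncoder dict = new DictionaryEncoder();
        String line = "http://es.dbpedia.org/resource/Ciencia_ficción\t"
                + "http://es.dbpedia.org/resource/Robot";
        System.out.println(dict.encodeLine(line));   // source gets OID 0, target gets OID 1
        System.out.println(dict.decode(1));          // the Robot URL
    }
}
```

The in-memory map still holds every distinct URL once. That is why the dictionary is written to its own file, leaving the link file as pairs of integers.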

7 Step 2. Write PageRank Algorithm
PageRankGraph.rankGraph(int[][] graph)
int[] out = graph[i];
– out contains the nodes linked from node i
– it might be empty or null if node i doesn’t link to anything!
Two rank vectors: rank[graph.length], nextRank[graph.length]
Initial rank values set as 1d / graph.length
Run ITERS number of iterations
– compute the edge-invariant rank once per iteration (red and blue): you need to keep track of the sum of ranks of nodes with no outlinks from the previous round
– for each node (orange), split its rank[] by the number of outlinks it has, and add the result to the nextRank[] of each node it links to
– the sum of the ranks after each round should be very, very close to 1
Test on -i data/test-graph.txt -o data/test-data.txt
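The Step 2 loop can be sketched in Java as shown here, assuming a damping factor D = 0.85 and ITERS = 10 (the slide states neither). PageRankSketch is a hypothetical stand-in for PageRankGraph. Returning the final ranks as double[] is also an assumption about the lab's signature. Nodes whose out array is null or empty count as having no outlinks, and their rank is spread evenly over all nodes as part of the edge-invariant term.

```java
import java.util.Arrays;

// Hypothetical sketch of the rankGraph loop; D and ITERS are assumed values.
final class PageRankSketch {
    static final double D = 0.85;   // damping factor (assumption)
    static final int ITERS = 10;    // number of iterations (assumption)

    static double[] rankGraph(int[][] graph) {
        int n = graph.length;
        double[] rank = new double[n];
        double[] nextRank = new double[n];
        Arrays.fill(rank, 1d / n);                       // initial rank: 1/N per node

        for (int iter = 0; iter < ITERS; iter++) {
            // Sum of rank held by nodes with no outlinks in the previous round.
            double danglingSum = 0d;
            for (int i = 0; i < n; i++) {
                if (graph[i] == null || graph[i].length == 0) danglingSum += rank[i];
            }
            // Edge-invariant part, identical for every node: random jump + dangling share.
            double invariant = (1d - D) / n + D * danglingSum / n;
            Arrays.fill(nextRank, invariant);

            // Each node splits its damped rank equally among the nodes it links to.
            for (int i = 0; i < n; i++) {
                int[] out = graph[i];
                if (out == null || out.length == 0) continue;
                double share = D * rank[i] / out.length;
                for (int j : out) nextRank[j] += share;
            }

            // Swap buffers: nextRank becomes the current rank for the next round.
            double[] tmp = rank;
            rank = nextRank;
            nextRank = tmp;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1; node 3 has no outlinks (null).
        int[][] graph = { {1}, {2}, {0, 1}, null };
        double[] r = rankGraph(graph);
        double sum = 0d;
        for (double x : r) sum += x;
        System.out.println(Arrays.toString(r));
        System.out.println("sum = " + sum);              // very close to 1
    }
}
```

The invariant term puts (1 - D) plus D times the dangling rank back into the pool, and every other node passes on exactly D times its rank. The total therefore stays at 1 up to floating-point error, which is the sanity check the slide suggests.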

8 Step 3. Rank full data
Run ranking: -i [folder]/page_links_es.oid.gz -igz -o [folder]/page_ranks_es.oid.gz -ogz
Sort by rank: -i [folder]/page_ranks_es.oid.gz -igz -o [folder]/page_ranks_es_s.oid.gz -ogz
Decompress the file: -d [folder]/page_links_es.dict.gz -dgz -i [folder]/page_ranks_es_s.oid.gz -igz -n 0 -o [folder]/page_ranks_es_s.tsv.gz -ogz
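The sort and decode steps can be sketched in memory as follows, assuming a double[] of ranks indexed by OID and a String[] dictionary mapping OIDs back to URLs. TopRanks and topK are hypothetical names. The course tools perform these steps on gzipped files instead.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical in-memory version of "sort by rank" + "decode": returns the OIDs
// of the k highest-ranked nodes, highest first.
final class TopRanks {
    static int[] topK(double[] ranks, int k) {
        Integer[] order = new Integer[ranks.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Descending by rank; ties broken by smaller OID so the order is repeatable.
        Comparator<Integer> byRankDesc =
                Comparator.comparingDouble((Integer i) -> ranks[i]).reversed();
        Arrays.sort(order, byRankDesc.thenComparingInt(i -> i));

        int m = Math.min(k, order.length);
        int[] top = new int[m];
        for (int r = 0; r < m; r++) top[r] = order[r];
        return top;
    }

    public static void main(String[] args) {
        double[] ranks = {0.1, 0.4, 0.2, 0.3};
        String[] dict = {"page-a", "page-b", "page-c", "page-d"};  // hypothetical OID -> URL dictionary
        for (int oid : topK(ranks, 3)) {
            System.out.println(dict[oid] + "\t" + ranks[oid]);     // decode OID back to its string
        }
    }
}
```

Boxing 5.4 million Integer indices is wasteful. For the full data, a sort over primitive arrays or over the ranked file itself scales better.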

9 Course Marking
45% for Weekly Labs (~3% a lab!)
35% for Final Exam
20% for Small Class Project

10 Class Project
Done in pairs (except Alejandro :P)
Goal: use what you’ve learned to do something cool (basically)
Expected difficulty: more than a lab’s worth
– but from scratch / without my help!
Marked on: difficulty, appropriateness, scale, good use of techniques, presentation, coolness
– ambition is appreciated, even if you don’t succeed: feel free to bite off more than you can chew!
Process:
– pair up (default random) by Wednesday
– decide on a topic by June 9th, or let me assign one
– if you need data or get stuck, I will (try to) help out
Deliverables: 10-minute presentation (June 23rd) and 4-page report

11

