Design of a Click-tracking Network for Full-text Search Engine Group 5: Yuan Hu, Yu Ge, Youwen Gong, Zenghui Qiu and Miao Liu.


1 Design of a Click-tracking Network for Full-text Search Engine Group 5: Yuan Hu, Yu Ge, Youwen Gong, Zenghui Qiu and Miao Liu

2 Outline: Introduction; Objective; Project diagram (Web Crawling, Indexing schema); Ranking strategies (PageRank Algorithms, Neural Network, Content-Based Ranking); Software and Reference

3 Introduction. Full-text Search Engine: search on key words; rank results. What is in a Search Engine? Crawling; Indexing; Ranking results of query

4 Objective: Design a full-text search engine. Rank search results in different ways.

5 Project Diagram (figure). Components shown: Website, Crawling, Text & urls, Database, Indexing, Query Function, Click-Tracking Network, PageRank Algorithms, Content-Based Ranking, Ranked results

6 Web Crawling. Depth 1: crawl all the URL links on the main page. Depth 2: crawl all the URL links found in depth 1. Main page: …… http://en.wikipedia.org/wiki/Machine_learning http://en.wikipedia.org/wiki/Decision_tree_learning#Information_gain http://en.wikipedia.org/wiki/Machine_learning#Decision_tree_learning …… Implemented with the Python urllib2 module and the BeautifulSoup API.
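
The depth-limited crawl described on this slide can be sketched as a breadth-first traversal. This is a minimal illustration, not the group's code: it is written for Python 3 and the standard library's `html.parser` rather than urllib2 and BeautifulSoup, the `fetch` callable is a hypothetical stand-in for whatever downloads a page, and dropping `#fragment` suffixes so that one page is not crawled twice is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free http(s) links in page order."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Assumption: '#section' suffixes point at the same page, so drop them.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith("http") and absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl. Returns {url: depth} for every URL discovered.

    fetch is a hypothetical callable: fetch(url) -> HTML text.
    """
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in extract_links(url, fetch(url)):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

With `max_depth=2`, the main page is depth 0, the links on it are depth 1, and the links found on those pages are depth 2. For a live crawl, `fetch` could wrap `urllib.request.urlopen`; passing it in as a parameter keeps the traversal logic testable without network access.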

7 (Figure: URLs and links followed from the Main Page through Depth 1 and Depth 2.)

8 Schema for Basic Index. Link (Row_ID, From_ID, To_ID); Url_list (Row_ID, Url); Word_location (Url_ID, Word_ID, Location); Word_list (Row_ID, Word); Link_words (Word_ID, Link_ID). Implemented with SQLite.
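
As an illustration, the five tables on this slide could be created from Python with the standard `sqlite3` module. The sketch below is written for Python 3. Column types, the lowercase names, the three indexes, and the use of SQLite's implicit `rowid` in place of an explicit Row_ID column are all assumptions; `create_index_tables` is a hypothetical helper name.

```python
import sqlite3

# Assumption: Row_ID is SQLite's implicit rowid, so it is not declared.
SCHEMA = """
CREATE TABLE url_list      (url TEXT);
CREATE TABLE word_list     (word TEXT);
CREATE TABLE word_location (url_id INTEGER, word_id INTEGER, location INTEGER);
CREATE TABLE link          (from_id INTEGER, to_id INTEGER);
CREATE TABLE link_words    (word_id INTEGER, link_id INTEGER);
-- Assumed indexes to speed up lookups by word and by URL.
CREATE INDEX word_idx     ON word_list(word);
CREATE INDEX url_idx      ON url_list(url);
CREATE INDEX word_url_idx ON word_location(word_id);
"""


def create_index_tables(db_path=":memory:"):
    """Open (or create) the database and build the empty index tables."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


conn = create_index_tables()
cur = conn.execute("INSERT INTO url_list (url) VALUES (?)",
                   ("http://en.wikipedia.org/wiki/Machine_learning",))
first_url_id = cur.lastrowid  # the implicit rowid serves as Row_ID
```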

9 Results for Multiple-word Query. The query function returns each combination of the query words that occurs in the same URL, as a url_id followed by the word locations. Note that the url_ids returned are not yet ranked.
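
One way to obtain such rows is to join the word-location table with itself once per query word. The sketch below (Python 3, `sqlite3`) is an assumption about how the query function might work, not the group's code; `match_rows` is a hypothetical name, and the table and column names are lowercase versions of those on the schema slide.

```python
import sqlite3


def match_rows(conn, words):
    """Return (url_id, loc_0, loc_1, ...) for pages containing every word."""
    word_ids = []
    for word in words:
        row = conn.execute(
            "SELECT rowid FROM word_list WHERE word = ?", (word,)).fetchone()
        if row is None:
            return []  # an unknown word means no page can match
        word_ids.append(row[0])
    if not word_ids:
        return []
    n = len(word_ids)
    fields = ["w0.url_id"] + [f"w{i}.location" for i in range(n)]
    tables = [f"word_location w{i}" for i in range(n)]
    clauses = [f"w{i}.word_id = ?" for i in range(n)]
    clauses += [f"w{i}.url_id = w0.url_id" for i in range(1, n)]
    sql = "SELECT {} FROM {} WHERE {}".format(
        ", ".join(fields), ", ".join(tables), " AND ".join(clauses))
    return conn.execute(sql, word_ids).fetchall()


# Tiny in-memory index for demonstration (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word_list (word TEXT);
CREATE TABLE word_location (url_id INTEGER, word_id INTEGER, location INTEGER);
INSERT INTO word_list (word) VALUES ('machine'), ('learning');
INSERT INTO word_location VALUES (1, 1, 0), (1, 2, 1), (2, 1, 5),
                                 (3, 2, 0), (3, 1, 3), (3, 1, 7);
""")
rows = sorted(match_rows(conn, ["machine", "learning"]))
```

Each row lists one co-occurrence: page 2 contains only "machine" and is excluded, while page 3 appears twice because "machine" occurs at two locations.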

10 PageRank Algorithm. Developed by Larry Page at Stanford U. in 1996. It measures how important a page is: the importance of a page is calculated from all the other pages that link to it. http://www.rasch.org/rmt/rmt232a.htm

11 How to Calculate PR. PR(A) = (1 - d) + d * ( PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) + ... ). d: damping factor, 0 < d < 1, typically 0.85. PR(B), ..., PR(D), ...: PageRank value of each webpage linking to page A. L(B), ..., L(D), ...: the number of links going out of pages B, ..., D, ...

12 Example PR(A) = 0.15 + 0.85 * ( PR(B)/links(B) + PR(C)/links(C) + PR(D)/links(D) ) = 0.15 + 0.85 * ( 0.5/4 + 0.7/4 + 0.2/1 ) = 0.15 + 0.85 * ( 0.125 + 0.175 + 0.2 ) = 0.15 + 0.85 * 0.5 = 0.575
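
The same arithmetic can be checked with a few lines of Python 3. The function name `pagerank_of` and its argument layout are illustrative assumptions; the numbers are the ones from the example.

```python
def pagerank_of(inbound, damping=0.85):
    """PageRank of one page from its inbound pages.

    inbound: list of (pagerank, outgoing_link_count) pairs, one per page
    that links to the page being scored.
    """
    return (1 - damping) + damping * sum(pr / links for pr, links in inbound)


# Pages B, C and D from the example: (PR value, number of outgoing links).
pr_a = pagerank_of([(0.5, 4), (0.7, 4), (0.2, 1)])
print(round(pr_a, 3))  # 0.575
```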

13 How to Update the PR Value. If the PR values are not known to begin with, assign an initial PR value to every page, then update the values repeatedly (20 iterations). http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
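
A sketch of the iterative update in Python 3 follows. It assumes the link graph is held in a plain dictionary (the project stores links in SQLite), an initial value of 1.0 for every page, and that all pages are updated together from the previous iteration's values; none of these details is stated on the slide.

```python
def pagerank(links, damping=0.85, iterations=20, initial=1.0):
    """Iteratively estimate PageRank.

    links: {page: [pages it links to]}. Returns {page: PR value}.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # Number of distinct outgoing links per page.
    out_count = {p: len(set(links.get(p, []))) for p in pages}
    # Reverse the graph: which pages link to each page?
    inbound = {p: set() for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].add(src)

    ranks = {p: initial for p in pages}
    for _ in range(iterations):
        new_ranks = {}
        for p in pages:
            total = sum(ranks[q] / out_count[q] for q in inbound[p])
            new_ranks[p] = (1 - damping) + damping * total
        ranks = new_ranks  # assumption: synchronous update of all pages
    return ranks


ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["B"]})
```

In this small graph no page links to A, so its value settles at the minimum of 1 - d = 0.15, while B, which has two inbound links, ends up with the highest value.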

14 Results for PageRank PageRank values

15 Neural Network. Why? A neural network can make reasonable guesses about results for queries it has never seen before. Click-tracking: the weights are updated based on which search results the user clicked.

16 Neural Network. How does the neural network work? Step 1: Setting Up the Database. Step 2: Feeding Forward Activation. Step 3: Training with Backpropagation. In the figure, solid lines are strong connections and bold text marks active nodes.

17 Step 1: Setting Up the ANN Database. Create a table for the hidden layer (red box). Create two tables for the connections (green boxes).

18 Step 2: Feeding Forward Activation. Objective: activate the ANN. Take words as inputs; activate the links in the network; give outputs for URLs. Activation: hyperbolic tangent function (x-axis: total input to the node).
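
The feed-forward step can be sketched in Python 3 as below. The weights are held in nested lists instead of database tables, and setting every query-word input to 1.0 is an assumption; `feed_forward`, `w_in` and `w_out` are illustrative names.

```python
import math


def feed_forward(w_in, w_out):
    """Propagate a query through a words -> hidden -> URLs network.

    w_in[i][j]:  strength of the link from query word i to hidden node j.
    w_out[j][k]: strength of the link from hidden node j to URL k.
    Returns (input, hidden, output) activations.
    """
    n_words, n_hidden, n_urls = len(w_in), len(w_out), len(w_out[0])
    a_in = [1.0] * n_words  # assumption: each query word is fully active
    a_hidden = [
        math.tanh(sum(a_in[i] * w_in[i][j] for i in range(n_words)))
        for j in range(n_hidden)
    ]
    a_out = [
        math.tanh(sum(a_hidden[j] * w_out[j][k] for j in range(n_hidden)))
        for k in range(n_urls)
    ]
    return a_in, a_hidden, a_out


# Two query words, one hidden node, two candidate URLs (made-up weights).
_, hidden, outputs = feed_forward([[1.0], [1.0]], [[1.0, 0.0]])
```

The tanh function keeps every activation between -1 and 1, so the outputs can be read directly as relevance scores for the candidate URLs.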

19 Step 3: Training with Backpropagation. Train the network every time someone performs a search and chooses one of the links. This is the same algorithm covered in class. Learning rate = 0.5.
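
A single training step might look like the following Python 3 sketch of standard backpropagation for a tanh network. The learning rate of 0.5 comes from the slide. The target encoding (1 for the clicked URL, 0 for the others), the in-memory weight lists and the function names are assumptions.

```python
import math


def dtanh(y):
    """Slope of tanh expressed in terms of its output y."""
    return 1.0 - y * y


def forward(w_in, w_out):
    """Return (input, hidden, output) activations; all inputs are 1.0."""
    a_in = [1.0] * len(w_in)
    a_hidden = [math.tanh(sum(a_in[i] * w_in[i][j] for i in range(len(w_in))))
                for j in range(len(w_out))]
    a_out = [math.tanh(sum(a_hidden[j] * w_out[j][k] for j in range(len(w_out))))
             for k in range(len(w_out[0]))]
    return a_in, a_hidden, a_out


def train_once(w_in, w_out, clicked, rate=0.5):
    """Update the weights in place after the user clicks URL index `clicked`."""
    n_words, n_hidden, n_urls = len(w_in), len(w_out), len(w_out[0])
    a_in, a_hidden, a_out = forward(w_in, w_out)
    # Assumed targets: 1 for the clicked URL, 0 for every other URL.
    targets = [1.0 if k == clicked else 0.0 for k in range(n_urls)]
    out_deltas = [dtanh(a_out[k]) * (targets[k] - a_out[k])
                  for k in range(n_urls)]
    hidden_deltas = [
        dtanh(a_hidden[j]) * sum(out_deltas[k] * w_out[j][k] for k in range(n_urls))
        for j in range(n_hidden)
    ]
    for j in range(n_hidden):
        for k in range(n_urls):
            w_out[j][k] += rate * out_deltas[k] * a_hidden[j]
    for i in range(n_words):
        for j in range(n_hidden):
            w_in[i][j] += rate * hidden_deltas[j] * a_in[i]


# Two query words, one hidden node, three URLs; the user clicks URL 1.
w_in = [[1.0], [1.0]]
w_out = [[0.1, 0.1, 0.1]]
before = forward(w_in, w_out)[2]
for _ in range(10):
    train_once(w_in, w_out, clicked=1)
after = forward(w_in, w_out)[2]
```

After a few repetitions the output for the clicked URL rises above the others, which is the click-tracking behaviour the slides describe.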

20 Results for Neural Network. Step 1: hidden node and connection tables (From ID, To ID, Strength). Step 2: relevance of each URL for the input. Step 3: training with one query.

21 Results For Neural Network(contd) Step 3: Training with more queries

22 Content-Based Ranking. Basic idea: calculate a score based only on the query and the content of the page. Metrics: word frequency; document location; word distance.
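
The three metrics can be sketched in Python 3 as scoring functions over rows of the form (url_id, location of word 1, location of word 2, ...). The row layout, the normalization to a 0 to 1 range and all function names are assumptions made for illustration; the slide does not give the exact formulas.

```python
def normalize(scores, small_is_better=False):
    """Scale scores so that the best page gets 1.0."""
    if not scores:
        return {}
    tiny = 0.00001  # avoids division by zero
    if small_is_better:
        low = max(tiny, min(scores.values()))
        return {u: low / max(tiny, s) for u, s in scores.items()}
    high = max(scores.values()) or tiny
    return {u: s / high for u, s in scores.items()}


def frequency_score(rows):
    """More occurrences of the query words on a page score higher."""
    counts = {}
    for row in rows:
        counts[row[0]] = counts.get(row[0], 0) + 1
    return normalize(counts)


def location_score(rows):
    """Query words appearing nearer the top of a page score higher."""
    best = {}
    for row in rows:
        total = sum(row[1:])
        best[row[0]] = min(total, best.get(row[0], total))
    return normalize(best, small_is_better=True)


def distance_score(rows):
    """Query words appearing closer together score higher."""
    if not rows:
        return {}
    if len(rows[0]) <= 2:  # a single query word has no distance
        return {row[0]: 1.0 for row in rows}
    best = {}
    for row in rows:
        dist = sum(abs(row[i] - row[i - 1]) for i in range(2, len(row)))
        best[row[0]] = min(dist, best.get(row[0], dist))
    return normalize(best, small_is_better=True)


# Made-up matches for a two-word query: (url_id, loc_word1, loc_word2).
rows = [(1, 0, 1), (3, 3, 0), (3, 7, 0)]
freq = frequency_score(rows)
loc = location_score(rows)
dist = distance_score(rows)
```

Here page 3 wins on frequency because it has two matching combinations, while page 1 wins on location and distance because its words appear at the very start and next to each other. A final ranking could combine the metrics with weights.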

23 Reference: Programming Collective Intelligence, Toby Segaran; SQLite Tutorial, ZetCode; Dive into Python, Mark Pilgrim. Software: Ubuntu 11.04; Python 2.7.3; SQLite

24 Thank you.

