PAIR project progress report Yi-Ting Chou Shui-Lung Chuang Xuanhui Wang.

1 PAIR project progress report Yi-Ting Chou Shui-Lung Chuang Xuanhui Wang

2 Motivation
- Much information on the Web is distributed across many pages and unstructured.
- Web IE: extract such information and organize it into a structured format, e.g.,
  - Person (name, contact (email, phone, address), research interests, …)
  - Book (title, authors, price, ISBN, …)
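
The target records of slide 2 can be written down as typed structures. What follows is a minimal sketch, assuming Python dataclasses as the representation. The field types (e.g., `price` as a float, `research_interests` as a list) are illustrative choices, and the attributes elided as "…" on the slide are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    # Sub-attributes of "contact" as listed on the slide; None = not found yet.
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None


@dataclass
class Person:
    name: str
    contact: Contact = field(default_factory=Contact)  # nested attribute
    research_interests: list[str] = field(default_factory=list)


@dataclass
class Book:
    title: str
    authors: list[str] = field(default_factory=list)
    price: Optional[float] = None  # assumed numeric; currency not modeled
    isbn: Optional[str] = None


p = Person("Jane Doe", Contact(email="jane@example.org"))
print(p.contact.email)  # jane@example.org
```

Web IE then amounts to filling such a record from unstructured pages; every field starts empty until a value is extracted.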

3 Example
Person (name, contact (email, phone, address), research interests, …)
[Figure: example pages Page 1, Page 2, Page 3, …]

4

5 Motivation (cont.)
- Direct Web IE is very hard, e.g., because the information is distributed and unstructured.
- This project aims to provide an instance-attribute retrieval engine for this problem.
  - In this project, we focus on personal information.
  - The attribute name is given as input (e.g., contact).

6 Flow Chart
Name → Page Collector → Pages → Segment Tool → Trees
Attribute → Attribute Expansion → Attribute*
Trees + Attribute* → Retrieval → Rank List
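
The flow chart turns the given attribute into an expanded set, Attribute*, before retrieval. The slides do not say how the expansion is done. This is a minimal sketch, assuming a hand-built expansion table derived only from the Person schema of slide 2 (contact → email, phone, address). `PERSON_SCHEMA` and `expand_attribute` are hypothetical names.

```python
# Hypothetical expansion table, derived only from the Person schema on
# slide 2: contact -> (email, phone, address). A real system would likely
# need a richer resource (e.g., synonyms), which the slides do not specify.
PERSON_SCHEMA: dict[str, list[str]] = {
    "contact": ["email", "phone", "address"],
}


def expand_attribute(attribute: str, schema: dict[str, list[str]] = PERSON_SCHEMA) -> list[str]:
    """Return Attribute*: the attribute followed by all of its sub-attributes.

    Sub-attributes are expanded recursively (depth-first) and each term is
    listed once. Assumes the schema contains no cycles.
    """
    expanded = [attribute]
    for sub in schema.get(attribute, []):
        for term in expand_attribute(sub, schema):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_attribute("contact"))  # ['contact', 'email', 'phone', 'address']
print(expand_attribute("research interests"))  # ['research interests']
```

Attributes without an entry in the table pass through unchanged, so the sketch degrades to plain matching on the attribute name.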

7 Why a tree structure for page segmentation?
- The parameter that controls the size of the leaf blocks is difficult to tune.
- Our solution: score every node of the tree instead of only the leaf blocks, then select the appropriate nodes to rank.
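
Scoring every node rather than only the leaves can be sketched as follows. The `Block` class, the coverage-times-density `score`, and the toy page are all assumptions for illustration, since the slides do not specify the scoring function. The sketch shows that no leaf-size parameter is needed: an inner node such as the whole contact block can outrank both its own leaves and the root.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Block:
    """One node of a (hypothetical) page-segmentation tree.

    Leaves carry text; an inner node covers the text of every block below it.
    """
    label: str
    text: str = ""
    children: list["Block"] = field(default_factory=list)

    def tokens(self) -> list[str]:
        # Lower-cased word tokens of this node and all of its descendants.
        own = re.findall(r"\w+", self.text.lower())
        return own + [t for c in self.children for t in c.tokens()]

    def walk(self):
        # Pre-order traversal over this node and every descendant.
        yield self
        for c in self.children:
            yield from c.walk()


def score(node: Block, terms: list[str]) -> float:
    """Assumed score: term coverage times term density.

    Coverage rewards nodes containing many distinct query terms; density
    penalizes large nodes (such as the root) that bury them in other text.
    """
    toks = node.tokens()
    if not toks:
        return 0.0
    query = {t.lower() for t in terms}
    coverage = len(query & set(toks)) / len(query)
    density = sum(t in query for t in toks) / len(toks)
    return coverage * density


def rank_nodes(root: Block, terms: list[str]) -> list[tuple[float, str]]:
    """Score every node of the tree (not only the leaves), best first."""
    scored = [(score(n, terms), n.label) for n in root.walk()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy page (hypothetical content).
page = Block("root", children=[
    Block("header", "Jane Doe homepage"),
    Block("contact", children=[
        Block("email", "Email jane at example org"),
        Block("phone", "Phone 555 0100"),
        Block("address", "Address 123 Main Street"),
    ]),
    Block("research", "Research interests web information extraction"),
])
ranking = rank_nodes(page, ["contact", "email", "phone", "address"])
print(ranking[0])  # (0.1875, 'contact')
```

Here the inner `contact` node scores 0.75 × 0.25 = 0.1875, ahead of the root (0.75 × 0.15 = 0.1125) and of every single leaf, without any threshold on block size.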

8 Current Progress
[Figure: the flow chart from slide 6, repeated]

9 The main idea of the project
1. Given a person name, first identify the pages that contain information about the person (multiple such pages exist on the Web).
2. Segment each page into semantically coherent blocks.
3. Given an attribute name, identify the most relevant blocks.
4. NLP techniques can then be applied to extract noun phrases from the relevant blocks.
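
The four steps above can be wired together as a pipeline whose stages are passed in as functions. This is a minimal sketch, assuming pages and blocks are represented by their text and that each stage is supplied by the caller. `retrieve` and the toy stages in the example are hypothetical and stand in for the real page collector, segmenter, ranker, and noun-phrase extractor.

```python
from typing import Callable, Iterable

# Assumption: pages and blocks are represented simply by their text.
PageText = str
BlockText = str


def retrieve(
    name: str,
    attribute: str,
    collect_pages: Callable[[str], Iterable[PageText]],
    segment: Callable[[PageText], Iterable[BlockText]],
    relevance: Callable[[BlockText, str], float],
    extract: Callable[[BlockText], list[str]],
    top_k: int = 3,
) -> list[tuple[float, BlockText, list[str]]]:
    """Run the four steps with caller-supplied (hypothetical) stages."""
    # Step 1: pages that contain information about the person.
    pages = collect_pages(name)
    # Step 2: segment every page into blocks.
    blocks = [block for page in pages for block in segment(page)]
    # Step 3: rank the blocks by relevance to the attribute; keep the top k.
    scored = sorted(((relevance(b, attribute), b) for b in blocks),
                    key=lambda pair: pair[0], reverse=True)[:top_k]
    # Step 4: extract candidate phrases (e.g., noun phrases) from each block.
    return [(s, b, extract(b)) for s, b in scored]


# Toy stages for illustration only: "|" separates blocks, relevance is a
# substring test, and "extraction" just splits on whitespace.
result = retrieve(
    "Jane Doe",
    "email",
    collect_pages=lambda n: [f"{n} home|Email: jane@example.org", f"{n} news|Gave a talk"],
    segment=lambda p: p.split("|"),
    relevance=lambda b, a: float(a in b.lower()),
    extract=lambda b: b.split(),
    top_k=1,
)
print(result)  # [(1.0, 'Email: jane@example.org', ['Email:', 'jane@example.org'])]
```

Keeping the stages as parameters lets each one (for instance, the segmenter) be swapped without touching the rest of the flow.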

10 The progress so far
Currently, we focus on a single page.
1. Page segmentation with VIPS (Vision-based Page Segmentation) generates a tree structure for the page.
2. Given an attribute, match it to the most relevant “node” of the “tree”.
3. Present a ranked list of the relevant blocks.

11 Demo

12 The remaining tasks
1. Improve the accuracy for a single page.
2. Extend to multiple pages:
   - Input: a person name (instead of a URL) and an attribute name.
   - Output: a ranked list of blocks.
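
Extending to multiple pages requires combining the per-page ranked lists into one output list. This is a minimal sketch, assuming the block scores are comparable across pages and treating blocks with the same text (ignoring case and whitespace) as duplicates. `normalize` and `merge_rank_lists` are hypothetical names, and exact-text matching only catches near-verbatim copies.

```python
import re


def normalize(text: str) -> str:
    # Collapse case and whitespace so near-verbatim copies compare equal.
    return re.sub(r"\s+", " ", text).strip().lower()


def merge_rank_lists(per_page: dict[str, list[tuple[float, str]]]) -> list[tuple[float, str, str]]:
    """Merge per-page ranked blocks into one ranked list.

    per_page maps a page URL to (score, block_text) pairs. Duplicate blocks
    (same normalized text) are kept once, with the highest score seen.
    Returns (score, url, block_text) triples, best first.
    """
    best: dict[str, tuple[float, str, str]] = {}
    for url, ranked in per_page.items():
        for s, text in ranked:
            key = normalize(text)
            if key not in best or s > best[key][0]:
                best[key] = (s, url, text)
    return sorted(best.values(), key=lambda triple: triple[0], reverse=True)


per_page = {
    "http://a.example/": [(0.9, "Email: jane@example.org"), (0.2, "News")],
    "http://b.example/": [(0.7, "email:  jane@example.org"), (0.5, "Phone: 555 0100")],
}
print(merge_rank_lists(per_page))
# [(0.9, 'http://a.example/', 'Email: jane@example.org'),
#  (0.5, 'http://b.example/', 'Phone: 555 0100'),
#  (0.2, 'http://a.example/', 'News')]
```

Keeping the source URL with each block preserves where the information came from, which matters once the same fact appears on several pages.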

13 Issues for discussion
- Possible problems with our method, e.g., how to effectively score and rank the “nodes” of the page “tree”?
- Ways to improve and extend our method, e.g.:
  - how to combine it with NLP/named-entity extraction on the retrieved blocks;
  - how to deal with multiple pages and duplicated information.
- Suggestions for evaluating our method, e.g., a user study; anything else?
- The relation to Entity Retrieval?

