Table Structure Understanding by Sibling Page Comparison. Cui Tao, Data Extraction Group, Department of Computer Science, Brigham Young University (6/17/2015).

1 Table Structure Understanding by Sibling Page Comparison. Cui Tao, Data Extraction Group, Department of Computer Science, Brigham Young University. Supported by NSF.

2 Table Structure Understanding. Motivation: many documents contain tables, and understanding them supports data extraction, data integration, and ontology evolution. Solution: locate tables, locate table labels, locate table values, and find label/value associations.

3 Table Structure Understanding

4 Table Structure Understanding. Example label/value associations, where 1 and 2 index the table's rows: (Gene Model, 1) = F18H3.5a; (Gene Model, 2) = F18H3.5b; ...
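Associations of the form (Gene Model, 1) = F18H3.5a amount to a mapping from a (label, row index) pair to a value. A minimal Python sketch of that representation, assuming a simple grid with one header row of labels and data rows numbered from 1; the function name `label_value_pairs` and the grid layout are hypothetical, not taken from the talk:

```python
def label_value_pairs(header, rows):
    """Map (column label, 1-based row index) -> cell value.

    Hypothetical helper. Assumes a simple grid: `header` holds the column
    labels and each entry of `rows` holds one row of values in the same
    column order.
    """
    pairs = {}
    for row_index, row in enumerate(rows, start=1):
        for label, value in zip(header, row):
            pairs[(label, row_index)] = value
    return pairs


# Hypothetical one-column table mirroring the slide's example.
header = ["Gene Model"]
rows = [["F18H3.5a"], ["F18H3.5b"]]
pairs = label_value_pairs(header, rows)
print(pairs[("Gene Model", 1)])  # F18H3.5a
print(pairs[("Gene Model", 2)])  # F18H3.5b
```

Real tables with nested or multi-row headers need richer keys (for example, a tuple of several labels), which is part of what makes the association step nontrivial.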

7 Sibling Pages. Sibling pages are output pages generated in response to user queries, presenting the results in a predefined page structure; pages from the same web site therefore tend to share the same structure.

8 Problems. Identify the data-rich area and discard the irrelevant parts; find table correspondences; find mappings between table cells; find structure patterns.
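One way to picture the cell-mapping problem, under a strong simplifying assumption: if two sibling tables have already been aligned so that corresponding cells share the same (row, column) position, then cells whose text is identical on both pages are candidate labels and cells whose text differs are candidate values. The sketch below encodes only that heuristic; the function `classify_cells` and the same-shape assumption are illustrative and not necessarily the exact method of the talk.

```python
from typing import List

Grid = List[List[str]]


def classify_cells(page_a: Grid, page_b: Grid) -> List[List[str]]:
    """Tag each aligned cell 'label' (same text on both sibling pages)
    or 'value' (text differs).

    Hypothetical helper. Assumes both tables are already aligned and have
    identical shape; raises ValueError otherwise.
    """
    if len(page_a) != len(page_b) or any(
        len(ra) != len(rb) for ra, rb in zip(page_a, page_b)
    ):
        raise ValueError("tables must have the same shape")
    return [
        ["label" if a.strip() == b.strip() else "value" for a, b in zip(ra, rb)]
        for ra, rb in zip(page_a, page_b)
    ]


# Two hypothetical sibling pages from a car-ad site.
page_a = [["Make", "Honda"], ["Year", "2004"]]
page_b = [["Make", "Toyota"], ["Year", "2001"]]
print(classify_cells(page_a, page_b))
# [['label', 'value'], ['label', 'value']]
```

A value that happens to coincide on both pages would be misread as a label; comparing more sibling pages reduces that risk.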

9 HTML Table Components

10 Data Rich Area

11 Table Unnesting

12 DOM Tree

13 Simple Tree Matching. Simple Tree Matching (STM) (Yang91) computes the maximum number of matching pairs of nodes between two trees, in O(mn) time for trees with m and n nodes. [Figure: label / value]
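Simple Tree Matching can be sketched in Python with the well-known dynamic-programming recurrence, assuming trees are represented by a hypothetical `Node` class holding a tag and an ordered list of children. Two trees whose roots differ match nothing; otherwise their children are aligned much like a longest-common-subsequence problem, with each child pair weighted by its own recursive match. The result is the number of matched node pairs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: a tag name and ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the maximum number of matching node pairs between trees a and b."""
    if a.tag != b.tag:
        return 0  # differing roots: nothing below them can match
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a and first j of b.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1  # +1 counts the matched roots themselves


# Two hypothetical sibling tables whose rows have different cell counts.
t1 = Node("table", [Node("tr", [Node("td"), Node("td")]), Node("tr", [Node("td")])])
t2 = Node("table", [Node("tr", [Node("td")]), Node("tr", [Node("td"), Node("td")])])
print(simple_tree_matching(t1, t2))  # 5
```

Here the two roots, both row pairs, and one cell pair per row match, giving 5. Because a call for a node pair is made only from the call for their parents, each pair of nodes is compared at most once, which is consistent with the O(mn) bound on the slide.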

14 Table Structure Pattern

15 Table Structure Pattern

16 Experimental Results. Initial test of general pattern extraction: molecular biology, 95.6%; car ads, 100%. Dynamic adjustment: unseen structures; structure variations.