1
Data Extraction From HTML Tables Cui Tao Department of Computer Science Brigham Young University
2
Information In Tables Nowadays, a significant portion of the information on the Web is stored in tables.
3
The Ontology-Based Extraction
5
Major Problems In tables, the values and their corresponding attributes are separated, but the ontology can only extract the data when they appear together. Sometimes the attributes in the table are the values in the database, and the values in the table serve only as identifiers for those attributes. Sometimes the value in one cell of the table may supply several attribute values in the database.
6
Attribute-Value Pair Attribute: (part of the) constant/keyword rule
7
How To Solve This Problem? Put each attribute-value pair together. Try both orders.
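The pairing step can be illustrated with a minimal sketch. It assumes the table has already been parsed into a header row and data rows. The function name `pair_attributes_with_values` and the sample data are hypothetical and do not come from the presentation. Each cell is written next to its column header in both orders, so a rule that expects either order has a chance to match.

```python
def pair_attributes_with_values(headers, rows):
    """For each row, return a list of (attribute-first, value-first) strings."""
    paired_rows = []
    for row in rows:
        pairs = []
        # zip stops at the shorter sequence, so ragged rows do not raise.
        for attribute, value in zip(headers, row):
            pairs.append((f"{attribute} {value}", f"{value} {attribute}"))
        paired_rows.append(pairs)
    return paired_rows


# Hypothetical sample table (not from the slides).
headers = ["Year", "Make", "Price"]
rows = [["1998", "Ford", "$4,500"]]

result = pair_attributes_with_values(headers, rows)
for attr_first, value_first in result[0]:
    print(attr_first, "|", value_first)
```

Emitting both orders avoids having to guess whether the recognizer expects the keyword before or after the constant.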
8
More General…
9
The attributes in the table are actually values in the database… Attribute Value
10
How To Solve This Problem? Put the attribute in the file depending on the Boolean value.
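A minimal sketch of the Boolean case follows. It assumes the Boolean-valued columns are known in advance and that a small set of cell strings counts as "true". The names `TRUE_MARKERS` and `attributes_to_emit`, the marker set, and the sample row are all hypothetical. For a Boolean column the attribute name itself is written out only when the cell is true. Other columns are written as ordinary attribute-value pairs.

```python
# Assumed set of cell contents that mean "true"; real tables may use others.
TRUE_MARKERS = {"yes", "y", "true", "x", "1"}


def attributes_to_emit(headers, row, boolean_columns):
    """Return the strings to write to the output file for one table row."""
    emitted = []
    for attribute, value in zip(headers, row):
        if attribute in boolean_columns:
            # The header is the database value; keep it only if the cell is true.
            if value.strip().lower() in TRUE_MARKERS:
                emitted.append(attribute)
        else:
            emitted.append(f"{attribute} {value}")
    return emitted


# Hypothetical sample row (not from the slides).
headers = ["Model", "Air Conditioning", "Sunroof"]
row = ["Accord", "Yes", "No"]

features = attributes_to_emit(headers, row, {"Air Conditioning", "Sunroof"})
print(features)
```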
11
Value Multiple Information
12
More Problems …