Data Extraction From HTML Tables Cui Tao Department of Computer Science Brigham Young University.


1 Data Extraction From HTML Tables Cui Tao Department of Computer Science Brigham Young University

2 Information In Tables
• Nowadays, a significant portion of the information on the Web is stored in tables.

3 The Ontology-Based Extraction

4

5 Major Problems
• In tables, the values and their corresponding attributes are separated, but the ontology can extract data only when they appear together.
• Sometimes the attributes in the table are actually values in the database, and the values in the table merely identify which attributes apply.
• Sometimes the value in a single table cell supplies several attribute values in the database.

6 Attribute-Value Pair Attribute: (part of the) constant/key word rule

7 How To Solve This Problem? Put the attribute-value pair together. Try both orders.
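The pairing step on this slide can be sketched as follows. This is only an illustration, not the presenter's implementation: the function name `pair_cells` and the sample car-ad headers and row are hypothetical, and the sketch assumes the table has already been parsed into a list of header strings and a list of cell strings.

```python
def pair_cells(headers, row):
    """Place each attribute next to its value, in both orders.

    Emitting both "attribute value" and "value attribute" lets a
    keyword-plus-constant rule match whichever order it expects.
    """
    fragments = []
    for attribute, value in zip(headers, row):
        fragments.append(f"{attribute} {value}")
        fragments.append(f"{value} {attribute}")
    return fragments


# Hypothetical sample data (not taken from the slides).
headers = ["Make", "Year", "Price"]
row = ["Honda", "1998", "$4,500"]

for fragment in pair_cells(headers, row):
    print(fragment)
```

The fragments would then be handed to the extraction rules as ordinary text, so the rules themselves do not need to know the data came from a table.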

8 More General…

9
• The attributes in the table are actually values in the database…
Attribute Value

10 How To Solve This Problem?
• Put the attribute in the file depending on the Boolean value.
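One way to read this slide is: when a column's cells are Boolean flags, write the column's attribute name into the output only where the flag is true. The sketch below makes that assumption. The set of markers treated as true, the function name `emit_flagged_attributes`, and the sample feature columns are all hypothetical and do not come from the slides.

```python
# Assumption: cell contents that count as "true" for a Boolean column.
TRUE_MARKERS = {"yes", "y", "x", "true", "1"}


def emit_flagged_attributes(headers, row):
    """Return the attribute names whose cell in this row is marked true."""
    return [
        attribute
        for attribute, cell in zip(headers, row)
        if cell.strip().lower() in TRUE_MARKERS
    ]


# Hypothetical sample data (not taken from the slides).
headers = ["Air Conditioning", "Sunroof", "CD Player"]
row = ["Yes", "No", "X"]

print(emit_flagged_attributes(headers, row))
```

The emitted attribute names then play the role of values in the text that the extraction rules see.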

11 Value Multiple Information

12 More Problems …

