Extracting Semistructured Information from the Web J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo from Stanford University Presented by: Wei.

1 Extracting Semistructured Information from the Web J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo from Stanford University Presented by: Wei Mao

2 Introduction
Background
- Rapid growth of the WWW
- Semistructured data in web pages
- Difficulty of manipulating web data
One solution
- A configurable extraction program
- Extraction result in OEM
- A wrapper is used for querying

3 A detailed example: Weather table
Can we answer the query “What is the forecast for Vienna for Jan. 28, 1997?”

4 Extraction process
- HTML file
- Specification file
- Commands [ variables, source, pattern ]
- Package result into an OEM object
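The extraction process on this slide can be sketched as a small interpreter over [ variables, source, pattern ] commands. This is only an illustration under stated assumptions, not the presenters' implementation: the function names `pattern_to_regex` and `run_commands` are hypothetical, and the pattern syntax used here (`*` skips text, `#` captures text) is assumed for the sketch because the slides do not show the syntax.

```python
import re

def pattern_to_regex(pattern):
    """Translate a toy pattern into a regex.

    Assumed syntax for this sketch: '*' skips any text, '#' captures text,
    and every other character must match literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")        # skip, not captured
        elif ch == "#":
            parts.append("(.*?)")      # capture into the next variable
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)

def run_commands(commands, env):
    """Run each (variables, source, pattern) command in order.

    `source` names an entry already present in the environment; each
    captured group is bound to the corresponding variable name.
    """
    env = dict(env)  # do not mutate the caller's dictionary
    for variables, source, pattern in commands:
        match = pattern_to_regex(pattern).match(env[source])
        if match is None:
            raise ValueError(f"pattern {pattern!r} did not match {source!r}")
        for name, value in zip(variables, match.groups()):
            env[name] = value.strip()
    return env

# Hypothetical input; the real weather page is not reproduced in the slides.
page = "<html><table><tr><td>Vienna</td><td>-2/1</td></tr></table></html>"
commands = [
    (["row"], "page", "*<tr>#</tr>*"),
    (["city", "temp"], "row", "<td>#</td><td>#</td>"),
]
result = run_commands(commands, {"page": page})
```

Each command reads from a variable bound by an earlier command, so the specification narrows the HTML step by step until only the fields of interest remain.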

5 The HTML for weather table

6 A sample specification file

7 Extraction result
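As a rough illustration of what packaging an extraction result into OEM might look like, the sketch below models an OEM object as a labeled value with an object id, where the value is either atomic or a list of sub-objects. The class name `OEMObject`, the helper functions, the labels (`weather`, `city_temp`, `city`, `forecast`), and the sample values are all hypothetical placeholders, not data taken from the presentation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Union

_oids = count(1)  # simple object-id generator for this sketch

@dataclass
class OEMObject:
    label: str
    value: Union[str, List["OEMObject"]]
    oid: int = field(default_factory=lambda: next(_oids))

    @property
    def type(self):
        # A list of sub-objects makes the object complex; otherwise atomic.
        return "complex" if isinstance(self.value, list) else "string"

def children(obj, label):
    """Return the direct sub-objects of `obj` that carry `label`."""
    if isinstance(obj.value, list):
        return [c for c in obj.value if c.label == label]
    return []

def forecast_for(root, city):
    """Answer a query such as 'what is the forecast for <city>?'."""
    for entry in children(root, "city_temp"):
        names = [c.value for c in children(entry, "city")]
        if city in names:
            return [c.value for c in children(entry, "forecast")]
    return []

# Placeholder data, invented for the example.
root = OEMObject("weather", [
    OEMObject("city_temp", [
        OEMObject("city", "Vienna"),
        OEMObject("forecast", "snow"),
    ]),
    OEMObject("city_temp", [
        OEMObject("city", "Athens"),
        OEMObject("forecast", "sunny"),
    ]),
])
answer = forecast_for(root, "Vienna")
```

In the system described by the slides, such queries would go through a wrapper and a query language rather than hand-written traversal; the function above only shows the shape of the lookup.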

8 Customizing the extraction result

9 Additional capabilities
- Extract_table construct
- Case operator
- Get(url) operator
- Query the extracted result
- Use existing wrapper generation tool
- Only a simple interface is required
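The slides do not show how the Extract_table construct is written or what it returns. Purely to illustrate the general idea of pulling a table out of HTML, here is a minimal sketch built on Python's standard `html.parser` module; the class `TableRows` and the function `extract_table` are hypothetical names and are not the construct from the presentation.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed

def extract_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

# Hypothetical table, invented for the example.
sample = (
    "<table><tr><th>City</th><th>Temp</th></tr>"
    "<tr><td>Vienna</td><td>-2/1</td></tr></table>"
)
rows = extract_table(sample)
```

The sketch ignores nested tables and cell attributes; it is meant only to show how a row-and-column structure can be recovered from markup.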

10 Advantages
- Manipulate web data efficiently
- Flexible
- Easy to use
- Reuse the existing systems (OEM, Lorel, HTML parser)

11 Disadvantages
- Depends on outside input
- Requires prior knowledge of the structure of the HTML file
- Requires a specification file

