

Querying Web Data – The WebQA Approach. Authors: Sunny K. S. Lam and M. Tamer Özsu. CSI5311 presentation by Dongmei Jiang and Zhiping Duan.





Slide 2: Agenda
– Properties of Web Data
– Approaches to Web Data Searching
– WebQA Introduction
– WebQA System Architecture
– WebQA Implementation
– WebQA System Evaluation
– Conclusion

Slide 3: Web Data Searching
– Is a search engine enough?
– Is querying Web data necessary?

Slide 4: Characteristics of Web Data
Properties of Web data:
– Wide distribution and large volume
– High volatility
– Lack of structure, redundancy, and inconsistency among redundant copies
– Heterogeneous representations
– Dynamism
Difficulties of querying Web data (database perspective):
– No schema
– Limited scalability when searching the whole Web
– No well-defined Web query language

Slide 5: Web Data Searching Approaches
Information retrieval approach:
– Search engines and metasearchers
Database-oriented Web querying:
– Information integration
– Semistructured data querying
– Special Web query languages
– Question-answer

Slide 6: Question-Answer Approach
Basic principle:
– Web pages that could contain the answer to the user query are retrieved.
– The answer is extracted from these pages.
The approach relies on natural language processing (NLP) and information retrieval (IR) technologies; answers are extracted with information extraction (IE) techniques.
Example systems:
– Mulder [Kwok et al., 2001]
– WebQA [Lam & Özsu, 2002]

Slide 7: WebQA
A question-answer approach that:
– Accepts short factual queries
– Returns exact answers
Aims:
– Accept fuzziness in user queries
– Return actual answers, not URLs
– Query the entire Web and scale easily to new data sources

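Slide 8: WebQA System Architecture
Diagram (from Reference [1]) of the full system. Components: Query Parser, Semantic Cache Manager, Resource Locator/Decomposer, Complex Query Evaluation, Answer Collector, Answer Formatter, and Search Engine Cache. Sources: search engines, Web data sources, Web sites, … The user submits questions to the system and receives answers.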
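
The slide names a Semantic Cache Manager without describing it. What follows is a minimal sketch under stated assumptions, not WebQA's design: cached answers are keyed by a normalized WebQAL query (category, optional output type, and an unordered, lower-cased keyword set), so two queries that differ only in keyword order or case share one entry. A real semantic cache would also reuse answers for overlapping or contained queries; this sketch only does exact matching after normalization. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical normalized key: (category, output type or None, keyword set).
CacheKey = Tuple[str, Optional[str], FrozenSet[str]]


@dataclass(frozen=True)
class WebQALQuery:
    category: str
    output: Optional[str]
    keywords: Tuple[str, ...]

    def cache_key(self) -> CacheKey:
        # Ignore case and keyword order so equivalent queries map to one entry.
        out = self.output.lower() if self.output else None
        return (self.category.lower(), out, frozenset(k.lower() for k in self.keywords))


class SemanticCacheManager:
    """Hypothetical cache placed in front of an expensive query evaluator."""

    def __init__(self, evaluate: Callable[[WebQALQuery], List[str]]) -> None:
        self._evaluate = evaluate
        self._cache: Dict[CacheKey, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def answer(self, query: WebQALQuery) -> List[str]:
        key = query.cache_key()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: evaluate against the Web sources and remember the result.
        self.misses += 1
        answers = self._evaluate(query)
        self._cache[key] = answers
        return answers
```

In use, `SemanticCacheManager(evaluate)` wraps whatever function actually answers a WebQAL query; a repeated query with reordered keywords is then served without contacting any source.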

Slide 9: WebQA Prototype Architecture
Diagram (from Reference [1]) with three components: Query Parser (QP), Summary Retriever (SR), and Answer Extractor (AE). Other elements: the user, a Search Engine Cache, and the sources (search engines, Web data sources, Web sites, …). Labels on the data flows: valid WebQAL query, keywords, category, keywords/description, list of ranked records.
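
The prototype's data flow can be expressed as three composed stages. The sketch below is an illustration under assumptions, not WebQA's code: each component is modeled as a plain function, the names `ParsedQuery`, `RankedRecord`, and `answer_question` are hypothetical, and handing the parsed query (with its category) to the Answer Extractor is one reading of the diagram's "category" label.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ParsedQuery:
    webqal: str    # valid WebQAL query produced by the Query Parser
    category: str  # question category, e.g. "place"


@dataclass
class RankedRecord:
    source: str    # which source (search engine, database, Web site) returned it
    text: str      # record summary used for answer extraction


# Hypothetical stage signatures for QP, SR, and AE.
Parser = Callable[[str], ParsedQuery]
Retriever = Callable[[ParsedQuery], List[RankedRecord]]
Extractor = Callable[[ParsedQuery, List[RankedRecord]], List[str]]


def answer_question(question: str, parse: Parser, retrieve: Retriever,
                    extract: Extractor, top_n: int = 10) -> List[str]:
    parsed = parse(question)            # QP: NL question -> valid WebQAL query
    records = retrieve(parsed)          # SR: WebQAL query -> list of ranked records
    answers = extract(parsed, records)  # AE: ranked records -> ordered candidate answers
    return answers[:top_n]              # keep only the top answers for the user
```

Keeping the stages as interchangeable functions mirrors the slide's separation of concerns: any of the three can be replaced without touching the others.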

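Slide 10: Query Parser
User query example: "Which country produced the most computers in the world?"
WebQAL syntax: <category> [-output <output>] -keywords <keywords>
The example in WebQAL: place -output country -keywords producer most computers
Processing steps (diagram from Reference [1]): the Categorizer assigns a category to the natural-language (NL) question, the WebQAL Generator builds a WebQAL query from it, and the WebQAL Checker validates that query, producing a valid WebQAL query.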
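
The Query Parser's three steps can be sketched as follows. The WebQAL syntax and the "place" category come from the slide; the rule table, the extra "person" category, and all function names are hypothetical. Keyword selection (which turns "produced" into "producer" in the example) is not described on the slide, so keywords are passed in directly here.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical categorizer rules: (pattern on the question, category).
# A named group "noun" in a pattern becomes the -output type.
CATEGORY_RULES = [
    (re.compile(r"^\s*(?:which|what)\s+(?P<noun>country|city|continent)\b", re.IGNORECASE), "place"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "place"),
    (re.compile(r"^\s*who\b", re.IGNORECASE), "person"),
]

# <category> [-output <output>] -keywords <keywords>
WEBQAL_PATTERN = re.compile(r"(?P<category>[a-z]+)(?: -output (?P<output>\S+))? -keywords(?: \S+)+")


def categorize(question: str) -> Tuple[str, Optional[str]]:
    """Categorizer: map an NL question to (category, output type)."""
    for pattern, category in CATEGORY_RULES:
        match = pattern.match(question)
        if match:
            noun = match.groupdict().get("noun")
            return category, noun.lower() if noun else None
    return "unknown", None


def generate_webqal(category: str, output: Optional[str], keywords: List[str]) -> str:
    """WebQAL Generator: assemble the query string in the slide's syntax."""
    if not keywords:
        raise ValueError("WebQAL requires at least one keyword")
    parts = [category]
    if output:
        parts += ["-output", output]
    parts += ["-keywords", *keywords]
    return " ".join(parts)


def check_webqal(query: str) -> bool:
    """WebQAL Checker: accept only strings that follow the syntax exactly."""
    return WEBQAL_PATTERN.fullmatch(query) is not None
```

With these pieces, `categorize("Which country produced the most computers in the world?")` yields `("place", "country")`, and generating with the keywords `producer most computers` reproduces the slide's WebQAL string, which the checker accepts.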

Slide 11: Summary Retriever
Diagram (from Reference [1]): the Summary Retriever takes a WebQAL query and returns a list of ranked records. It consists of a Keyword Generator, a Source Ranker, a Record Consolidator/Ranker, and Record Retrievers, each working through a wrapper (Wrapper #1, #2, …, #N) for one source: a search engine, a remote database, or a Web site.
The Source Ranker identifies which data sources are better at answering particular types of questions. Records are ranked by source ranking first and by local ranking second.
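
Ranking by source first and local rank second amounts to sorting on a two-part key. A minimal sketch, assuming each record carries its source name and its position in that source's own results, and that the source ranking for the question type is already known. Merging duplicate texts while keeping the best-ranked copy is an assumed reading of "Record Consolidator"; all names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    source: str      # hypothetical source name, e.g. a wrapper's identifier
    local_rank: int  # position within that source's own results (1 = best)
    text: str


def consolidate_and_rank(records: List[Record], source_rank: Dict[str, int]) -> List[Record]:
    """Order by source ranking first, local ranking second; drop duplicate texts.

    source_rank maps a source to its position for this question type
    (1 = best). Sources missing from the map are placed last.
    """
    worst = len(source_rank) + 1
    ordered = sorted(records, key=lambda r: (source_rank.get(r.source, worst), r.local_rank))
    seen = set()
    result: List[Record] = []
    for rec in ordered:
        # Normalize case and whitespace so trivially different copies merge.
        key = " ".join(rec.text.lower().split())
        if key not in seen:
            seen.add(key)
            result.append(rec)  # first occurrence is the best-ranked copy
    return result
```

Because the source rank is the primary key, a record ranked second by a strong source still precedes the top record of a weaker source.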

Slide 12: Answer Extractor
Diagram (from Reference [1]): the list of ranked records passes through the Candidate Retriever, Rearrange, and Output Converter components, producing the top ten answers in user-readable form.
Each candidate is scored by how frequently the answer occurs and by the score of the rule that added it to the candidate list. The higher the score, the more likely the candidate answers the user's query; shorter answers receive higher scores.
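
The slide names three scoring factors (occurrence frequency, rule score, answer length) but not how they combine. The sketch below assumes one simple combination, frequency times the best rule score divided by answer length in words, purely for illustration; it is not WebQA's actual formula, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_candidates(extractions: Iterable[Tuple[str, float]], top_n: int = 10) -> List[Tuple[str, float]]:
    """Score candidates; each extraction is (candidate text, score of the rule that proposed it)."""
    frequency: Dict[str, int] = defaultdict(int)
    best_rule: Dict[str, float] = {}
    for answer, rule_score in extractions:
        key = " ".join(answer.split())  # normalize whitespace
        if not key:
            continue                    # ignore empty candidates
        frequency[key] += 1
        best_rule[key] = max(rule_score, best_rule[key]) if key in best_rule else rule_score
    scored = []
    for answer, freq in frequency.items():
        length = len(answer.split())
        # Hypothetical combination: more occurrences and better rules raise the
        # score; longer answers lower it.
        scored.append((answer, freq * best_rule[answer] / length))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]  # e.g. the top ten answers shown to the user
```

Any monotone combination would satisfy the slide's description; this one was chosen only because each factor's effect is easy to see.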

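Slide 13: WebQA Implementation Architecture
Diagram (from Reference [3]): Clients #1 … #N send questions to and receive answers from a Web server (JSPs, HTML pages) over HTTP. The Web server exchanges questions and answers with the QA Server as strings; the QA Server runs several QA Server Threads, which use the QA Engine to answer questions.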
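
The thread-per-request structure on the QA Server side can be sketched with Python's `threading` module. This is a hypothetical illustration: the engine is any function from a question string to an answer string (assumed thread-safe), integer client IDs stand in for connections, and the Web server and HTTP layer are omitted.

```python
import threading
from typing import Callable, Dict, List


class QAServer:
    """Hypothetical QA server: one worker thread per question, all sharing one QA engine."""

    def __init__(self, engine: Callable[[str], str]) -> None:
        self._engine = engine                    # assumed thread-safe (e.g. stateless)
        self._threads: List[threading.Thread] = []
        self._lock = threading.Lock()            # guards the shared answers dict
        self.answers: Dict[int, str] = {}

    def _serve(self, client_id: int, question: str) -> None:
        # Body of one "QA Server Thread": ask the engine, then store the answer string.
        answer = self._engine(question)
        with self._lock:
            self.answers[client_id] = answer

    def submit(self, client_id: int, question: str) -> None:
        worker = threading.Thread(target=self._serve, args=(client_id, question))
        self._threads.append(worker)
        worker.start()

    def wait(self) -> None:
        # Block until every worker thread has produced its answer.
        for worker in self._threads:
            worker.join()
```

A thread pool (for example `concurrent.futures.ThreadPoolExecutor`) would bound the number of concurrent threads; one thread per question is used here only because it matches the diagram most directly.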

Slide 14: System Evaluation
The system was evaluated on TREC-9 data, measuring two aspects: accuracy and efficiency (Reference [3]).
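
The slide does not say which accuracy measure was reported. For reference, the TREC-9 question-answering track scored systems by mean reciprocal rank (MRR): each question earns 1/r, where r is the rank of the first correct answer among the system's top five responses, or 0 if none is correct, and these scores are averaged over all questions. A minimal sketch of that computation follows; matching exact strings against a set of accepted answers is a simplification (TREC relied on human assessors), and the function name is hypothetical.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(
    ranked_answers: Sequence[Sequence[str]],
    accepted: Sequence[Set[str]],
    depth: int = 5,
) -> float:
    """Average of 1/rank of the first accepted answer within the top `depth` responses."""
    if len(ranked_answers) != len(accepted):
        raise ValueError("need one accepted-answer set per question")
    if not ranked_answers:
        return 0.0
    total = 0.0
    for answers, correct in zip(ranked_answers, accepted):
        for rank, answer in enumerate(answers[:depth], start=1):
            if answer in correct:
                total += 1.0 / rank  # only the first correct answer counts
                break
    # Questions with no correct answer in the top `depth` contribute 0.
    return total / len(ranked_answers)
```

For three questions answered correctly at ranks 1 and 3 and not at all, the score is (1 + 1/3 + 0) / 3 = 4/9.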

Slide 15: Conclusion
WebQA follows the question-answer approach:
– It takes a query as input and returns an exact answer.
– It uses NLP, IR, and IE technologies.
It is independent of data schemas.
It queries multiple Web sources:
– Search engines
– Data sources (e.g., the CIA World Factbook)
– Web sites

Slide 16: Future Work
Develop a full-fledged Web query system:
– Execution algorithms for more complex queries
– Common aggregation functions over retrieved answers
Consider other query types:
– Continuous queries, e.g., "Notify me whenever Ottawa's temperature drops below zero."
– Procedural queries, e.g., "How do I make pancakes?"
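
A continuous query such as the temperature example reduces to re-evaluating a condition on each new value and notifying only when the condition becomes true. A minimal sketch, assuming readings arrive as a stream of numbers (how they would be polled from a Web source is left out) and interpreting "drops below zero" as a crossing from at-or-above the threshold to below it; the function name is hypothetical.

```python
from typing import Callable, Iterable, Optional


def watch(readings: Iterable[float], threshold: float,
          notify: Callable[[float], None]) -> int:
    """Call notify() each time a reading crosses from >= threshold to < threshold.

    Returns the number of notifications sent.
    """
    previous: Optional[float] = None
    fired = 0
    for value in readings:
        # Fire on the transition only, not on every reading that stays below.
        dropped = value < threshold and (previous is None or previous >= threshold)
        if dropped:
            notify(value)
            fired += 1
        previous = value
    return fired
```

A first reading that is already below the threshold also triggers a notification; a different policy could suit other continuous queries.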

Slide 17: References
[1] S. Lam and M. T. Özsu. "Querying Web Data – The WebQA Approach." WISE 2002.
[2] D. Florescu, A. Levy, and A. Mendelzon. "Database Techniques for the World Wide Web: A Survey." SIGMOD Record, 27(3):59–74, 1998.
[3] M. T. Özsu. "Web Data Management – Some Issues." Course slides.




