
1 Query Expansion By: Sean McGettrick

2 What is Query Expansion?
- Query expansion is the term given when a search engine adds search terms to a user's weighted search.
- The goal is to improve precision and/or recall.
- Example: user query: “car”; expanded query: “car cars automobile automobiles auto”, etc.
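The slide's “car” example can be sketched as a simple thesaurus lookup. This is a minimal illustration only: the THESAURUS dictionary is hypothetical, and the 0.5 weight for added terms is an assumption (the slide does not say how expansion terms are weighted).

```python
# Hypothetical hand-built thesaurus (an assumption, not taken from the slides).
THESAURUS = {
    "car": ["cars", "automobile", "automobiles", "auto"],
}


def expand_query(query, thesaurus, expansion_weight=0.5):
    """Return a {term: weight} query: the user's own terms get weight 1.0,
    thesaurus terms get the (assumed) lower expansion_weight."""
    weights = {term: 1.0 for term in query.lower().split()}
    for term in list(weights):  # iterate over the original terms only
        for related in thesaurus.get(term, []):
            # If a related term was also typed by the user, keep the higher weight.
            weights[related] = max(weights.get(related, 0.0), expansion_weight)
    return weights


print(expand_query("car", THESAURUS))
# {'car': 1.0, 'cars': 0.5, 'automobile': 0.5, 'automobiles': 0.5, 'auto': 0.5}
```

Giving added terms a lower weight than the user's own terms is one possible answer to the “which terms to weight more?” question.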

3 Relevance of Query Expansion
- Query expansion is very important on the web.
- The amount of information on the web is always increasing.
  - In 1999, Google had 135 million pages. It now has over 3 billion.
- Search engine users follow specific trends with their searches:
  - They use 2-3 words.
  - They use broad search terms.
  - They do not like to expand their queries, either by refining search terms or by using Boolean operators.

4 Query Expansion Issues
- Two major issues:
  - Which terms to include?
  - Which terms to weight more?
- Concept-based vs. term-based query expansion
  - Is it better to expand based upon the individual terms in the query, or upon the overall concept of the query?

5 Classes of Query Expansion
- Manual approach: human-generated thesauri
- Interactive query expansion
- Automatic query expansion

6 Approaches to Query Expansion
- Global analysis
  - Considers all the documents in the system.
- Local analysis
  - Uses some initially retrieved documents as a source of expansion terms.
- Another classification:
  - Document-term based approach
  - Query-term based approach
  - Combined approach

7 Global Analysis
- Term clustering
- Latent semantic indexing
- Similarity thesauri
- Disadvantages:
  - Corpus-wide statistical analysis is computationally expensive.
  - Cannot address the term mismatch problem.

8 The Need For Thesauri
- It was naturally assumed that pulling words from a thesaurus would increase:
  - the number of documents retrieved;
  - possibly, precision.
- The car example: “car” vs. “car, auto, automobile, vehicle, sedan, etc.”
  - Which would retrieve the largest number of documents?
  - Is larger necessarily better?
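Whether “larger” is better can be checked with the standard precision and recall definitions. The document IDs and relevance judgments below are invented purely for illustration; they show one way expansion can raise recall while lowering precision.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: documents 1-4 are the relevant ones.
relevant = {1, 2, 3, 4}
original = {1, 2}              # results for "car"
expanded = {1, 2, 3, 5, 6, 7}  # results for "car, auto, automobile, vehicle, sedan"

print(precision_recall(original, relevant))  # (1.0, 0.5)
print(precision_recall(expanded, relevant))  # (0.5, 0.75)
```

In this made-up case the expanded query finds one more relevant document (recall rises from 0.5 to 0.75), but half of what it returns is irrelevant (precision falls from 1.0 to 0.5).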

9 Human & Automatically Generated Thesauri
- The earliest work began in the 1950s.
  - H.P. Luhn
- Thesaurofacet: a detailed list of engineering terms
- Largely used in industries such as medicine, aerospace, and other technological fields.

10 Drawbacks of Handcrafted Thesauri
- Cost
  - Development
  - Maintenance
  - Cost often outweighs benefit.
- Time
  - It often takes a long time for thesauri to develop.
  - Hard to keep up with the pace of scientific and technological development.

11 Automatically Generated Thesauri
- A global analysis method in three steps:
  1. Extract word co-occurrences.
  2. Define word similarities, based upon word co-occurrence or lexical relationships.
  3. Cluster words based upon their similarities.
- Has not proven very successful.
  - As late as 1990, many industries were still using handcrafted thesauri.
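The three steps above can be sketched on a toy corpus. This is a minimal illustration under assumed choices, not the method of any particular system: co-occurrence is counted per document, similarity is the cosine between two terms' co-occurrence vectors, and clustering is single-link with an arbitrary threshold.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Step 1: count how many documents each pair of distinct terms shares."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine(u, v):
    """Step 2: similarity of two terms' co-occurrence vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(counts, threshold):
    """Step 3: single-link clustering; join any two terms whose similarity >= threshold."""
    terms = sorted(counts)
    parent = {t: t for t in terms}

    def find(t):  # union-find root lookup with path halving
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if cosine(counts[a], counts[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


docs = ["car engine road", "automobile engine road", "banana fruit sweet", "apple fruit sweet"]
print(cluster(cooccurrence(docs), threshold=0.9))
# [['apple', 'banana'], ['automobile', 'car'], ['engine'], ['fruit'], ['road'], ['sweet']]
```

Note that “car” and “automobile” never appear in the same document here; they are grouped because they appear in the same contexts.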

12 Interactive Query Expansion
- Uses a thesaurus.
- After the initial query is submitted, the system returns a list of associated and relevant words derived from both the result set and a thesaurus.
- Useful, but more research is needed.

13 Relevance Feedback
- Local analysis + interactive.
- Significant improvement in recall and precision over early query expansion work.
- Basic process:
  1. The user creates an initial query, which returns an initial result set.
  2. The user then selects the documents that are relevant to their search.
  3. The system then re-weights and/or expands the query based upon the terms in those documents.
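The slides do not name a re-weighting formula for step 3. A common textbook choice is the Rocchio update (Ide 1971, in the works cited, studied related variants); the sketch below is a positive-only Rocchio step with plain term frequencies, and the alpha and beta values are illustrative assumptions.

```python
from collections import Counter


def rocchio_feedback(query_weights, relevant_docs, alpha=1.0, beta=0.75):
    """Positive-only Rocchio update (a swapped-in technique, not from the slides):
    q' = alpha * q + beta * (mean term-frequency vector of the relevant documents)."""
    new_query = Counter({t: alpha * w for t, w in query_weights.items()})
    for doc in relevant_docs:
        for term, tf in Counter(doc.lower().split()).items():
            new_query[term] += beta * tf / len(relevant_docs)
    return dict(new_query)


print(rocchio_feedback({"car": 1.0}, ["car auto", "auto sedan"]))
# {'car': 1.375, 'auto': 0.75, 'sedan': 0.375}
```

The original term is re-weighted (1.0 to 1.375) and new terms from the user-selected documents are added, which covers both “re-weights” and “expands” from the slide.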

14 Automatic Query Expansion
- The process of automatic query expansion using computer-generated thesauri.
- Works somewhat like pseudo-relevance feedback.

15 Pseudo-relevance Feedback
- Also known as blind feedback.
- Grew out of the problems involved in implementing relevance feedback systems.
  - Users do not like to give manual feedback to the system.

16 Pseudo-relevance Feedback Process
1. The system returns an initial set of documents.
2. The system assumes that the top n documents are relevant to the query.
3. The system takes terms from these documents to re-weight the query.
- Relies largely on the system's ability to retrieve relevant documents in the first pass.
- May lead to “query drift”.
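The process above can be sketched as follows. The first-pass ranking (count of query-term occurrences) is a toy assumption standing in for a real retrieval model, and for brevity the sketch only adds the m most frequent new terms instead of fully re-weighting the query.

```python
from collections import Counter


def initial_ranking(query_terms, docs):
    """Toy first-pass retrieval (an assumption): score = number of query-term
    occurrences; ties keep the original document order."""
    scores = [sum(doc.lower().split().count(t) for t in query_terms) for doc in docs]
    ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [i for i in ranked if scores[i] > 0]


def pseudo_relevance_feedback(query, docs, n=2, m=2):
    """Assume the top n documents are relevant and add their m most frequent new terms."""
    query_terms = query.lower().split()
    top_n = initial_ranking(query_terms, docs)[:n]
    counts = Counter()
    for i in top_n:
        counts.update(t for t in docs[i].lower().split() if t not in query_terms)
    # Most frequent first; alphabetical order breaks ties so the output is deterministic.
    best = sorted(counts, key=lambda t: (-counts[t], t))[:m]
    return query_terms + best


docs = ["car auto sedan", "car auto repair", "car loan", "banana bread"]
print(pseudo_relevance_feedback("car", docs))  # ['car', 'auto', 'repair']
```

No user judgment is involved: whatever the first pass ranks highest is trusted. If those documents are off-topic, their terms are added anyway, which is how query drift arises.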

17 Concept Based Query Expansion
- Uses terms that are close to the concept of the query rather than to individual query terms.
- Determining the concept that a query represents is hard.
- A mathematical approach was tried by Qiu, Y. and Frei, H.P. 1993. Concept Based Query Expansion. Proceedings of the 16th SIGIR.
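The difference between term-based and concept-based expansion can be sketched by scoring each candidate against the query as a whole, as a weighted sum of its similarities to all query terms. This follows the general idea of the Qiu and Frei approach in simplified form; the similarity values and candidate list are hypothetical.

```python
# Hypothetical term-term similarities, e.g. taken from a similarity thesaurus.
SIMILARITY = {
    ("apple", "fruit"): 0.8,
    ("apple", "mac"): 0.6,
    ("computer", "mac"): 0.7,
    ("computer", "laptop"): 0.6,
}


def sim(a, b):
    """Symmetric lookup; unknown pairs are treated as unrelated (0.0)."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def concept_expand(query_weights, candidates, k):
    """Rank candidates by similarity to the whole query,
    sim(q, t) = sum_i w_i * sim(q_i, t), and return the top k."""
    scores = {
        t: sum(w * sim(q, t) for q, w in query_weights.items())
        for t in candidates
        if t not in query_weights
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


print(concept_expand({"apple": 1.0, "computer": 1.0}, ["fruit", "mac", "laptop"], k=1))
# ['mac']
```

Expanding term by term would pair “apple” with its nearest neighbour “fruit” (0.8 > 0.6); scoring against the query as a whole prefers “mac” (0.6 + 0.7), which fits the concept “apple computer”.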

18 Mining for Query Expansion
- Needs a log of the queries submitted and the corresponding documents clicked by users.
- If a set of documents is often selected for the same queries, then the terms in these documents are strongly related to the terms in the queries.
- Takes advantage of the user judgment implied in the logs.
- Described in Cui, H.; Wen, J.R.; Nie, J.Y.; and Ma, W.Y. 2003. Query Expansion by Mining User Logs. IEEE Transactions on Knowledge and Data Engineering.
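The idea can be sketched as a probabilistic correlation between a query term and a document term, computed through the documents users clicked. This is a simplified sketch loosely modeled on Cui et al.: it uses plain term frequencies for P(doc term | D), whereas the paper may use a different term weighting, and the log and documents are invented.

```python
from collections import Counter, defaultdict


def term_correlations(click_log, docs):
    """Estimate P(doc term | query term) from (query, clicked_doc_id) pairs as
    the sum over clicked documents D of P(doc term | D) * P(D | query term)."""
    clicks = defaultdict(Counter)  # query term -> clicks per document
    for query, doc_id in click_log:
        for q_term in set(query.lower().split()):
            clicks[q_term][doc_id] += 1

    correlations = defaultdict(dict)
    for q_term, per_doc in clicks.items():
        total_clicks = sum(per_doc.values())
        for doc_id, n_clicks in per_doc.items():
            p_doc = n_clicks / total_clicks          # P(D | query term)
            tf = Counter(docs[doc_id].lower().split())
            doc_len = sum(tf.values())
            for d_term, count in tf.items():
                p_term = count / doc_len             # P(doc term | D)
                prev = correlations[q_term].get(d_term, 0.0)
                correlations[q_term][d_term] = prev + p_term * p_doc
    return correlations


log = [("car", "d1"), ("car", "d1"), ("car", "d2")]
docs = {"d1": "automobile sales", "d2": "automobile repair"}
corr = term_correlations(log, docs)
print(max(corr["car"], key=corr["car"].get))  # automobile
```

“automobile” never appears in any query, yet it becomes the strongest candidate expansion for “car” because it appears in every document users clicked for that query.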

19 Where To Go From Here?
- Grammatically based thesauri
  - Syntactic relationships between words
  - Words placed into classes
  - Some improvement on small document collections; failed on larger ones.
- AI searching
  - Mostly theory
- Intelligent agents
  - Could be customized to reflect the specific needs of the user.
  - The next logical step in IR, but still far from commercial use.

20 Works Cited
- Attardi, G., S. Di Marco and F. Sebastiani. 1998. Automated Generation of Category-Specific Thesauri for Interactive Query Expansion.
- Grefenstette, G. 1992. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, ed. N. Belkin, P. Ingwersen and A. M. Pejtersen: pp. 89-97. New York: ACM Press.
- Ide, E. 1971. New Experiments in Relevance Feedback. In G. Salton (ed.), The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall.
- Qiu, Y. and H.P. Frei. 1993. Concept Based Query Expansion. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval.
- Schütze, H. and J. Pedersen. 1997. A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval. Information Processing and Management 33, no. 3: pp. 307-318.
- Walker, D. 2001. Query Expansion Using Thesauri.

