1 Automatically obtain a description for a larger cluster of relevant documents
Identify terms related to query terms
- Synonyms, stemming variations, terms close to query terms
Local analysis
Use correlated terms from retrieved documents for query expansion

2 Three types of clusters
Association clusters
- Stems co-occurring frequently inside documents have a synonymity association

3 Un-normalized correlation factor: $S_{u,v} = C_{u,v}$
Normalized correlation factor
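The formulas on this slide did not survive in the transcript. As a minimal sketch, assume the usual textbook reading: $C_{u,v}$ is the sum, over the locally retrieved documents, of the product of the frequencies of stems u and v, and the normalized factor is $C_{u,v} / (C_{u,u} + C_{v,v} - C_{u,v})$. Both definitions, the function names, and the toy documents below are assumptions for illustration, not taken from the slides.

```python
from collections import Counter

def association_matrix(docs):
    """Return {(u, v): c_uv}, with c_uv = sum over docs of f_u * f_v (assumed definition)."""
    counts = [Counter(doc) for doc in docs]      # per-document stem frequencies
    stems = sorted(set().union(*counts))         # all stems seen in the local set
    return {(u, v): sum(f[u] * f[v] for f in counts)
            for u in stems for v in stems}

def normalized(c, u, v):
    """Assumed normalization: c_uv / (c_uu + c_vv - c_uv)."""
    return c[(u, v)] / (c[(u, u)] + c[(v, v)] - c[(u, v)])

# Toy local document set, already reduced to stems (hypothetical data).
docs = [
    ["query", "expand", "term"],
    ["query", "term", "term"],
    ["cluster", "stem"],
]
c = association_matrix(docs)
print(c[("query", "term")])            # un-normalized factor
print(normalized(c, "query", "term"))  # normalized factor
```

The un-normalized factor favors frequent stems, while the normalized one is bounded by 1 and reaches it when a stem is compared with itself.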

4
- Build local association clusters as follows
- Find clusters for the query terms
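The construction details are missing from the transcript. One plausible reading, sketched here under that assumption, is that the cluster for a stem u consists of the n stems with the largest correlation to u, and that the query is expanded with the clusters of its own stems. The names `local_cluster` and `expand_query`, the exclusion of u from its own cluster, and the correlation values are all hypothetical.

```python
def local_cluster(corr, u, n):
    """Return the n stems v != u with the largest corr[(u, v)]."""
    neighbors = [(v, score) for (a, v), score in corr.items()
                 if a == u and v != u]
    # Highest correlation first; ties broken alphabetically for determinism.
    neighbors.sort(key=lambda pair: (-pair[1], pair[0]))
    return [v for v, _ in neighbors[:n]]

def expand_query(query_stems, corr, n):
    """Append the size-n cluster of every query stem, skipping duplicates."""
    expanded = list(query_stems)
    for u in query_stems:
        for v in local_cluster(corr, u, n):
            if v not in expanded:
                expanded.append(v)
    return expanded

# Hypothetical correlation factors for the stem "query".
corr = {
    ("query", "query"): 2,
    ("query", "term"): 3,
    ("query", "expand"): 1,
    ("query", "cluster"): 0,
}
print(expand_query(["query"], corr, n=2))
```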

5 Metric clusters
- Consider the distance between two terms to compute their correlation factor

6 Un-normalized correlation factor: $S_{u,v} = C_{u,v}$
Normalized correlation factor
- Build local metric clusters as follows
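The metric formulas are also missing from the transcript. A rough sketch of the idea on slide 5, assuming that $C_{u,v}$ sums the inverse distance between every pair of occurrences of u and v within the same document, and that distance is the difference in token positions. Both choices are assumptions, and only the un-normalized factor is shown; the function name and documents are hypothetical.

```python
def metric_correlation(docs, u, v):
    """Sum 1/distance over all co-occurring position pairs of u and v."""
    total = 0.0
    for doc in docs:
        pos_u = [i for i, tok in enumerate(doc) if tok == u]
        pos_v = [i for i, tok in enumerate(doc) if tok == v]
        for i in pos_u:
            for j in pos_v:
                if i != j:                 # skip a token paired with itself
                    total += 1.0 / abs(i - j)
    return total                           # pairs in different docs add nothing

docs = [
    ["query", "expansion", "adds", "term"],
    ["term", "query"],
]
print(metric_correlation(docs, "query", "term"))
print(metric_correlation(docs, "query", "expansion"))
```

Stems that tend to appear next to each other score higher than stems that merely share a document, which is the distinction from association clusters.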

7 Scalar clusters
- Two stems with similar neighborhoods have some synonymity relationship

8
- A term $S_u$ is a neighbor of $S_v$ if $S_u$ belongs to a cluster (of size n) associated with $S_v$
- Neighbor stems having a synonymity relationship are not necessarily synonyms in the grammatical sense
- Union of un-normalized and normalized clusters provides a better representation of possible correlations
Metric clusters seem to perform better than pure association clusters
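To make "similar neighborhoods" on slide 7 concrete, a common approach (assumed here, since the slide's formula is not in the transcript) is to describe each stem by its vector of correlation values with all other stems and compare two stems by the cosine of the angle between those vectors. The function name and correlation values are hypothetical.

```python
import math

def scalar_correlation(corr, stems, u, v):
    """Cosine between the correlation vectors of u and v (missing pairs count as 0)."""
    vec_u = [corr.get((u, x), 0.0) for x in stems]
    vec_v = [corr.get((v, x), 0.0) for x in stems]
    dot = sum(a * b for a, b in zip(vec_u, vec_v))
    norm = (math.sqrt(sum(a * a for a in vec_u))
            * math.sqrt(sum(b * b for b in vec_v)))
    return dot / norm if norm else 0.0

stems = ["car", "auto", "fruit", "engine", "wheel"]
# Hypothetical correlation factors, e.g. from association or metric clusters.
corr = {
    ("car", "engine"): 4.0, ("car", "wheel"): 2.0,
    ("auto", "engine"): 2.0, ("auto", "wheel"): 1.0,
    ("fruit", "wheel"): 1.0,
}
print(scalar_correlation(corr, stems, "car", "auto"))
print(scalar_correlation(corr, stems, "car", "fruit"))
```

In this toy data "car" and "auto" never co-occur directly, yet they score highly because they correlate with the same stems in the same proportions.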

9 Global analysis
Expand the query using information from the whole set of documents in the collection
- Build a thesaurus-like structure
- Select terms for expansion based on their similarity to the whole query
Previous approaches, which considered individual query terms, failed to yield good results

10

11

12 Query expansion done in three steps
- Represent the query as follows

13
- Compute the similarity between each term correlated to the query terms and the whole query

14
- Expand the query with the top r ranked terms according to the similarity computed
Yields improved retrieval performance in the range of 20%
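The last two steps can be sketched as follows, assuming the thesaurus is already available as term-to-term correlations $c_{u,v}$ and that the similarity of a candidate term $k_v$ to the whole query is the weighted sum $\sum_{k_u \in q} w_{u,q} \cdot c_{u,v}$. That formula, the function name, the choice to skip terms already in the query, and all numbers are assumptions for illustration; the slides' own formulas are not in the transcript.

```python
def expand_with_whole_query(query_weights, corr, r):
    """Rank candidate terms by similarity to the whole query; return the top r."""
    scores = {}
    for (u, v), c_uv in corr.items():
        # Only query terms contribute; terms already in the query are not candidates.
        if u in query_weights and v not in query_weights:
            scores[v] = scores.get(v, 0.0) + query_weights[u] * c_uv
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:r], scores

# Hypothetical query term weights and thesaurus correlations.
query_weights = {"car": 1.0, "rental": 0.5}
corr = {
    ("car", "auto"): 0.8, ("rental", "auto"): 0.2,
    ("car", "engine"): 0.6,
    ("car", "hire"): 0.1, ("rental", "hire"): 0.9,
}
top_terms, scores = expand_with_whole_query(query_weights, corr, r=2)
print(top_terms)
```

Note that "hire" is the strongest neighbor of "rental" taken alone, but it ranks below "auto" and "engine" once the whole weighted query is considered, which is the point of scoring against the query rather than against individual terms.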
