
1 Intelligent Agents COMP 423
Xiaoying Gao, Computer Science, Victoria University of Wellington

2 Menu
Information Retrieval (IR) basics
An example
Evaluation: precision and recall

3 Information Retrieval (IR)
The field of information retrieval deals with the representation, storage, organization of, and access to information items.
Search engines: search a large collection of documents to find the ones that satisfy an information need, e.g. find relevant documents
Indexing
Document representation
Comparison with query
Evaluation/feedback

4 IR basics
Indexing
Manual indexing: using controlled vocabularies, e.g. libraries, early versions of Yahoo
Automatic indexing: an indexing program assigns keywords, phrases, or other features, e.g. words from the text of the document
Popular retrieval models
Boolean: exact match
Vector space: best match
Citation analysis models: best match
Probabilistic models: best match

5 Vector space model
Any text object can be represented by a term vector.
Example:
Doc1: 0.3, 0.1, 0.4
Doc2: 0.8, 0.5, 0.6
Query: 0.0, 0.2, 0.0
Similarity is determined by the distance between vectors in the vector space.
Vector space similarity: cosine of the angle between the two vectors
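A minimal sketch of the cosine measure applied to the vectors on this slide (plain Python; the function name is illustrative, not from the lecture):

```python
import math

def cosine(a, b):
    # Cosine of the angle between two term vectors:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

doc1 = [0.3, 0.1, 0.4]
doc2 = [0.8, 0.5, 0.6]
query = [0.0, 0.2, 0.0]

print(cosine(query, doc1))  # ~0.196
print(cosine(query, doc2))  # ~0.447, so Doc2 is the better match
```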

6 Text representation
Term weights
Term weights reflect the estimated importance of each term.
The more often a word occurs in a document, the better that term describes what the document is about.
But terms that appear in many documents in the collection are not very useful for distinguishing a relevant document from a non-relevant one.

7 Term weights: TF*IDF
Term weight: W_ij = TF_ij * IDF_j
TF: term frequency, how often term j occurs in document i
IDF: inverse document frequency, which penalizes terms that occur in many documents; a common form is IDF_j = log(N / df_j), where N is the number of documents in the collection and df_j is the number of documents containing term j
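A minimal sketch of this weight, assuming raw term counts for TF and log(N/df) for IDF (real systems use many weighting variants; tfidf_weight is an illustrative helper, not from the lecture):

```python
import math

def tfidf_weight(term, doc_tokens, all_docs):
    # TF: raw count of the term in this document.
    tf = doc_tokens.count(term)
    # IDF: log(N / df), where df is the number of documents containing
    # the term. A term in every document gets weight 0; this assumes
    # the term occurs somewhere in the collection (df > 0).
    df = sum(1 for d in all_docs if term in d)
    return tf * math.log(len(all_docs) / df)

docs = [["cat", "eat", "mouse"], ["mouse", "eat", "chocolate"]]
print(tfidf_weight("cat", docs[0], docs))  # 1 * log(2/1) ~ 0.693
```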

8 An example
Document 1: cat eat mouse, mouse eat chocolate
Document 2: cat eat mouse
Document 3: mouse eat chocolate mouse
Query: cat
What are the indexing terms?
What is the vector representation of each document?
What is the vector representation of the query?
Which document is more relevant?

9 Example
Document 1: cat eat mouse, mouse eat chocolate
Document 2: cat eat mouse
Document 3: mouse eat chocolate mouse
Query: cat
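The worked numbers on this slide did not survive the transcript, so here is a hedged reconstruction, assuming the raw-count TF and log(N/df) IDF from slide 7 (the variable names are illustrative). Note that "eat" and "mouse" occur in all three documents, so their IDF, and hence their weight, is zero:

```python
import math
from collections import Counter

docs = {
    "Doc1": "cat eat mouse mouse eat chocolate".split(),
    "Doc2": "cat eat mouse".split(),
    "Doc3": "mouse eat chocolate mouse".split(),
}
query = "cat".split()

# Document frequency of each term and IDF = log(N / df).
n = len(docs)
df = Counter(t for toks in docs.values() for t in set(toks))
idf = {t: math.log(n / d) for t, d in df.items()}
terms = sorted(df)  # fixed alphabetical term order

def vector(tokens):
    # TF*IDF vector over the shared term list (raw-count TF).
    tf = Counter(tokens)
    return [tf[t] * idf.get(t, 0.0) for t in terms]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

q = vector(query)
for name, toks in docs.items():
    print(name, round(cosine(q, vector(toks)), 3))
# Doc1 0.707, Doc2 1.0, Doc3 0.0 -> Doc2 ranks highest
```

Under these assumptions Document 2 is the most relevant: it contains "cat" and nothing else with nonzero weight, so its vector points in the same direction as the query's.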

10 Evaluation
Data collection: TREC
Evaluation criteria:
Recall: percentage of all relevant documents that are found by a search
Precision: percentage of retrieved documents that are relevant
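A minimal sketch of both measures over sets of document ids; the judgments below are hypothetical, not TREC data:

```python
def precision_recall(retrieved, relevant):
    # retrieved: set of doc ids returned by the search
    # relevant:  set of doc ids judged relevant (e.g. TREC-style qrels)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)
    recall = len(hits) / len(relevant)
    return precision, recall

# Hypothetical run: 8 retrieved, 10 relevant overall, 6 in common.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 3, 4, 5, 6, 20, 21, 22, 23}
print(precision_recall(retrieved, relevant))  # (0.75, 0.6)
```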

11 Discussion
How do we compare two IR systems?
Which is more important in Web search: precision or recall?

12 Next lecture
The PageRank paper
Discussion of Google and other search engines
Current solutions to search

