Information Retrieval Ch 23.2. Information retrieval Goal: Finding documents Search engines on the world wide web IR system characters Document collection.


1 Information Retrieval Ch 23.2

2 Information retrieval
- Goal: finding documents
- Search engines on the World Wide Web
- Characteristics of an IR system:
  - Document collection
  - Query language
  - Result set
  - Presentation of the result set

3 Evaluating IR systems
- Precision = (relevant documents in the result set) / (documents in the result set)
- Recall = (relevant documents in the result set) / (relevant documents in the collection)
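The two ratios above can be computed directly from sets of document identifiers. The following is a minimal sketch, not part of the original slides; the function name `precision_recall` and the document IDs are hypothetical, and empty sets are treated as giving a score of 0.0 by assumption.

```python
def precision_recall(result_set, relevant):
    """Return (precision, recall) for a result set, given the relevant documents."""
    result_set = set(result_set)
    relevant = set(relevant)
    hits = result_set & relevant  # relevant documents that were actually returned
    # Assumption: an empty denominator yields 0.0 rather than raising an error.
    precision = len(hits) / len(result_set) if result_set else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 of them relevant,
# and 6 relevant documents in the whole collection.
p, r = precision_recall(
    {"d1", "d2", "d3", "d4"},
    {"d1", "d2", "d3", "d7", "d8", "d9"},
)
print(p, r)
```

In this example precision is 3/4 and recall is 3/6, which shows that a system can return mostly relevant documents while still missing half of what it should have found.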

4 Presentation of result sets
- Relevance feedback: the user says which documents are relevant
- Document classification: uses a preexisting taxonomy of topics (Ch 18)
- Document clustering: a tree of categories is created from scratch (Ch 20.3)
  - Agglomerative clustering: repeatedly merge the two nearest documents (or clusters)
  - K-means clustering: assign the documents to k categories

5 K-means clustering
1. Pick k documents at random to represent the k categories.
2. Assign every document to the closest category.
3. Compute the mean of each cluster and use the k means to represent the new values of the k categories.
4. Repeat steps 2 and 3 until convergence.
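The four steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the slides' own implementation: documents are assumed to be already represented as numeric vectors, "closest" is assumed to mean smallest squared Euclidean distance, and the names `k_means`, `squared_distance`, `max_iters`, and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(docs, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k documents at random to represent the k categories.
    centers = [list(d) for d in rng.sample(docs, k)]
    assignment = None
    for _ in range(max_iters):
        # Step 2: assign every document to the closest category.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(d, centers[c]))
            for d in docs
        ]
        # Step 4: stop once the assignments no longer change (convergence).
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each category as the mean of its cluster.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical term-count vectors: two documents about one topic, two about another.
docs = [(1, 1, 0), (1, 2, 0), (0, 0, 5), (0, 1, 6)]
assignment, centers = k_means(docs, k=2)
print(assignment)
```

The cluster labels themselves depend on which documents are picked in step 1, so only the grouping is meaningful, not the label numbers.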

6 Implementing IR systems
- Lexicon
- Stop words
- Inverted file
- Vector space model
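As a rough illustration of how the first three components fit together, the sketch below builds an inverted file that maps each word to the set of documents containing it, skips stop words, and takes the lexicon to be the sorted list of indexed words. The stop list, the sample documents, whitespace tokenization, and the name `build_inverted_file` are all assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = {"the", "a", "of", "on", "and"}


def build_inverted_file(docs):
    """Map each non-stop word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # assumption: whitespace tokenization
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index


docs = {1: "the cat sat on the mat", 2: "the dog and the cat"}
index = build_inverted_file(docs)
lexicon = sorted(index)  # the words known to the system

print(lexicon)
print(index["cat"])
```

Looking up a query word in the inverted file gives its documents directly, without scanning the whole collection.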

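7 Vector Space Model
- Transform each document into a vector of term counts over the vocabulary (A, B, C): Di = ABC, Dj = BBC give Di = (1, 1, 1), Dj = (0, 2, 1)
- Measure the (Euclidean) distance between two documents: Dist(Di, Dj) = sqrt((1-0)^2 + (1-2)^2 + (1-1)^2) = sqrt(2)
- Retrieve the documents with the smallest distance to the query vector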
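The worked example on this slide can be reproduced with a few lines of Python. This is a minimal sketch under the slide's own assumptions: each character is treated as one term, the vocabulary order is (A, B, C), and the distance is Euclidean. The function names `to_vector` and `euclidean_distance` are hypothetical.

```python
import math
from collections import Counter


def to_vector(doc, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc)  # iterating a string counts its characters
    return [counts[term] for term in vocabulary]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocabulary = ["A", "B", "C"]
d_i = to_vector("ABC", vocabulary)  # [1, 1, 1]
d_j = to_vector("BBC", vocabulary)  # [0, 2, 1]
dist = euclidean_distance(d_i, d_j)  # sqrt(1 + 1 + 0) = sqrt(2)
print(d_i, d_j, dist)
```

To rank documents for a query, the query would be converted with the same `to_vector` call and the documents sorted by increasing distance.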

