
Automatic Collection “Recruiter” Shuang Song. Project Goal Given a collection, automatically suggest other items to add to the collection  Design a process.


1 Automatic Collection “Recruiter”
Shuang Song

2 Project Goal
Given a collection, automatically suggest other items to add to the collection.
- Design a process to achieve the task
- Apply different filtering algorithms
- Evaluate the result

3 The Process  Tokenization and frequency counting  New items extraction  New items filtering and ranking Query Terms Filter Collection External Source 1 2 3 Query Results Training Sets New Items

4 Filtering Algorithms
- Latent Semantic Analysis (LSA)
  - Pre-processing, no stemming
  - SVD over the term-by-document matrix
  - Pseudo-document representation of new items
- Gzip compression algorithm
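The LSA filter described above can be sketched with NumPy. The sketch assumes raw term counts (no weighting), alphabetic tokenization without stemming, and a hypothetical rank `k`; `fold_in` applies the standard LSA folding-in formula q^T U_k S_k^{-1} to place a new item in the reduced space as a pseudo-document. The function names are illustrative, not taken from the project.

```python
import re
import numpy as np

def build_term_doc_matrix(docs, vocab):
    """Raw term counts (no weighting, no stemming): rows are terms, columns are documents."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in re.findall(r"[a-z]+", doc.lower()):
            if tok in index:
                A[index[tok], j] += 1
    return A

def lsa_fit(A, k):
    """Truncated SVD: A is approximated by U_k diag(s_k) Vt_k (k largest singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Pseudo-document for a new item: q^T U_k S_k^{-1}, where q is the item's
    term-count vector over the same vocabulary as A."""
    return (q @ U_k) / s_k
```

Folding in an existing column of `A` returns that document's coordinates (the matching column of `Vt_k`), which is a quick sanity check; a new item's `q` must be built over the collection's vocabulary, so terms unseen in the collection are simply dropped.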

5 Relevance Measure - LSA
[Figure: LSA feature space showing the collection signature vector and a new item's pseudo-document vector (labeled V* and V)]
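The LSA relevance measure compares two vectors in the feature space. A minimal sketch, assuming the collection signature vector is the centroid of the collection's document vectors and that relevance is their cosine with the pseudo-document vector; both choices are assumptions, since the slide shows only the figure, and `signature_vector` and `lsa_relevance` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def signature_vector(doc_vectors):
    """Hypothetical collection signature: the centroid of the collection's
    document vectors in the LSA feature space."""
    n = len(doc_vectors)
    return [sum(col) / n for col in zip(*doc_vectors)]

def lsa_relevance(doc_vectors, pseudo_doc):
    """R_LSA under these assumptions: cosine between the collection signature
    vector and a new item's pseudo-document vector."""
    return cosine(signature_vector(doc_vectors), pseudo_doc)
```

Cosine ignores vector length, so a short abstract and a long paper pointing in the same direction of the feature space receive the same score.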

6 Relevance Measure - gzip
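The slide gives no formula for the gzip relevance measure. One common compression-based idea, sketched here as an assumption rather than the author's definition, is to measure how many bytes the collection saves when gzip compresses a new item right after it: text that repeats phrases from the collection costs little extra. `csize` and `gzip_relevance` are hypothetical names.

```python
import gzip

def csize(data: bytes) -> int:
    """Length in bytes of the gzip-compressed data."""
    return len(gzip.compress(data, compresslevel=9))

def gzip_relevance(collection_text: str, item_text: str) -> float:
    """Hypothetical R_gzip: the fraction of the item's compressed size that is
    saved when the item is compressed right after the collection text."""
    a = collection_text.encode("utf-8")
    b = item_text.encode("utf-8")
    extra = csize(a + b"\n" + b) - csize(a)  # bytes needed for b given the collection
    alone = csize(b)                         # bytes needed for b on its own
    return (alone - extra) / alone
```

Because DEFLATE only looks back 32 KB, only the last 32 KB of the collection text can influence this score; a larger collection would need chunking or a compressor with a bigger window.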

7 First Experiment – Math Forum Collection
- 19 courseware items in the collection
- 10 items in the experiment set
  - The first 5 from the Math Forum
  - The other 5 from other collections in www.smete.org

8 First Experiment Result

9 Second Experiment – Collaborative Filtering Collection
- 12 papers in the collection
- 11 items in the experiment set
  - The first 10 from Citeseer; query terms submitted: (information 284) (algorithm 250) (ratings 217) (filtering 159) (system 197) (query 149) (reputation 114) (reviewer 109) (collaborative 106) (recommendations 98)
  - The last one is the paper we read in class: “An Algorithm for Automated Rating of Reviewers”

10 Second Experiment Result

11 Second Experiment – User Study
- 6 people in my research lab participated in this study
  - 3 of them with an IR background
  - 3 of them without an IR background
- They were asked to rate the 11 items in the experiment set according to their degree of relevance to the given collection

12 Second Experiment Result – Human Rating

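13 Second Experiment Result – Another View

Document ID | LSA | gzip | Group with IR background | Group without IR background
1 | M | M | L | L
2 | H | H | H | H
3 | L | L | L | M
4 | H | L | H | M
5 | L | H | H | H
6 | H | M | H | M
7 | M | L | H | H
8 | H | L | H | H
9 | L | M | L | L
10 | M | H | H | H
11 | L | H | H | M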
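One simple way to read this table is to count, for each algorithm and each group of raters, how many of the 11 documents received the same label. The sketch assumes H, M and L are ordinal labels (presumably high, medium and low relevance); exact-match counting is an illustrative choice, not a measure used in the slides, and `RATINGS`, `agreement` and `agreement_table` are hypothetical names.

```python
# Labels transcribed from the table above, documents 1 to 11 in order.
RATINGS = {
    "LSA":    "MHLHLHMHLML",
    "gzip":   "MHLLHMLLMHH",
    "IR":     "LHLHHHHHLHH",  # group with IR background
    "non-IR": "LHMMHMHHLHM",  # group without IR background
}

def agreement(a: str, b: str) -> int:
    """Number of documents on which two raters gave exactly the same label."""
    return sum(x == y for x, y in zip(a, b))

def agreement_table(ratings):
    """Exact-match agreement of each algorithm with each human group."""
    return {
        (algo, group): agreement(ratings[algo], ratings[group])
        for algo in ("LSA", "gzip")
        for group in ("IR", "non-IR")
    }
```

Exact matching treats an H/M disagreement the same as an H/L one; a rank correlation over the ordinal labels would distinguish them.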

14 Second Experiment Result – comparison of w/o SVD and w/o weightings

15 Second Experiment – Correlation with human rating
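The slide does not say which correlation coefficient was used. As an assumption, the sketch below uses Spearman's rank correlation (the Pearson correlation of average ranks), which suits ordinal human ratings and handles ties; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman(lsa_scores, human_scores)` would compare an algorithm's relevance scores with numeric human ratings of the same items, where both lists are hypothetical inputs in document order.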

16 Second Experiment – precision and recall (cutoffs: R_LSA > 0.5 and R_gzip > 0.2)

17 Second Experiment – precision and recall (cutoffs: R_LSA > 0.4 and R_gzip > 0.17)
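Precision and recall at a cutoff can be computed as below. A minimal sketch, assuming an item counts as retrieved when its relevance score is strictly greater than the cutoff (as the ">" in the titles suggests) and that each algorithm is evaluated separately with its own cutoff; `precision_recall` is a hypothetical name, and the scores in the test are made-up values.

```python
def precision_recall(scores, relevant, cutoff):
    """scores: {item_id: relevance score}; relevant: set of item_ids judged relevant.
    Items scoring strictly above the cutoff are treated as retrieved."""
    retrieved = {item for item, s in scores.items() if s > cutoff}
    hits = len(retrieved & relevant)
    # Precision is undefined when nothing is retrieved; 0.0 is used here by convention.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Under these assumptions the first setting corresponds to `precision_recall(lsa_scores, relevant, 0.5)` and `precision_recall(gzip_scores, relevant, 0.2)`; lowering the cutoffs to 0.4 and 0.17 can only retrieve more items, so recall cannot decrease.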

18 Comparison of Two Filtering Algorithms
- gzip works well when the input documents are only abstracts, while LSA works for both abstracts and full text
- LSA captures word-association patterns and statistical importance; gzip scans for repetition only
- LSA is more computationally demanding, while gzip is simple
Effectiveness

19 To Do List and Future Work
- Accurate and trustworthy evaluation from an expert (the collection owner?)
- Extract full text and abstracts from Citeseer automatically

