Automatic Collection “Recruiter”
Shuang Song
Project Goal
- Given a collection, automatically suggest other items to add to the collection
- Design a process to achieve the task
- Apply different filtering algorithms
- Evaluate the result
The Process
1. Tokenization and frequency counting (sketched below)
2. New item extraction
3. New item filtering and ranking
[Process diagram with components: Collection, Query Terms, External Source, Query Results, New Items, Training Sets, Filter, labeled with steps 1–3]
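As an illustration of step 1, a minimal Python sketch (function and variable names are illustrative, not from the original): tokenize the collection documents, count term frequencies, and take the most frequent terms as the query submitted to the external source.

```python
import re
from collections import Counter

def extract_query_terms(collection_docs, top_k=10, stopwords=frozenset()):
    """Tokenize the collection and count term frequencies.

    Returns the top_k most frequent terms, which can be submitted as a
    query to an external source (e.g., a digital library search)."""
    counts = Counter()
    for doc in collection_docs:
        # Simple lowercase word tokenization; the actual pipeline may differ.
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in stopwords and len(t) > 2)
    return counts.most_common(top_k)

# Yields pairs such as ("information", 284), ("algorithm", 250), ...
# query_terms = extract_query_terms(collection_docs, top_k=10)
```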
Filtering Algorithms
- Latent Semantic Analysis (LSA), sketched below
  - Pre-processing, no stemming
  - SVD over the term-by-document matrix
  - Pseudo-document representation of new items
- Gzip compression algorithm
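A minimal sketch of the LSA steps listed above, assuming a scikit-learn/NumPy implementation (the rank k and all names are illustrative): build the term-by-document matrix without stemming, take its SVD, and fold new items in as pseudo-documents.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def build_lsa_space(collection_docs, k=50):
    """SVD over the term-by-document matrix of the collection (no stemming)."""
    vec = CountVectorizer(lowercase=True)                 # pre-processing, no stemming
    X = vec.fit_transform(collection_docs).T.toarray()    # terms x documents
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return vec, U[:, :k], s[:k], Vt[:k, :]

def fold_in(new_doc, vec, U_k, s_k):
    """Pseudo-document representation of a new item.
    Standard folding-in: d_hat = S_k^{-1} U_k^T d."""
    d = vec.transform([new_doc]).T.toarray().ravel()      # raw term vector
    return np.diag(1.0 / s_k) @ U_k.T @ d
```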
Relevance Measure - LSA
[Figure: LSA feature space with the collection signature vector V and the pseudo-document vector V*]
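Assuming the relevance score is the cosine between the collection signature vector V and the pseudo-document vector V* (the slide shows the two vectors but not the formula, and it does not say how the signature is formed), a sketch:

```python
import numpy as np

def lsa_relevance(v_collection, v_star):
    """Cosine similarity between the collection signature vector V and a
    pseudo-document vector V* in the LSA feature space (assumed measure;
    the slide shows the two vectors, not the formula)."""
    v, w = np.asarray(v_collection, float), np.asarray(v_star, float)
    denom = np.linalg.norm(v) * np.linalg.norm(w)
    return float(v @ w / denom) if denom else 0.0

# One plausible signature (an assumption): the centroid of the collection's
# document coordinates in the reduced space, e.g. Vt_k.mean(axis=1).
```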
Relevance Measure - gzip
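The content of this slide is not preserved in the transcript. A common compression-based relevance measure, shown here only as an assumption and not necessarily the formulation used in the talk, compares how well an item compresses with and without the collection text prepended:

```python
import gzip

def gzip_relevance(collection_text, item_text):
    """Compression-based relevance (assumed formulation).  If the item repeats
    material already in the collection, concatenating them compresses better
    than compressing the item alone."""
    c = lambda s: len(gzip.compress(s.encode("utf-8"), compresslevel=9))
    c_coll, c_item = c(collection_text), c(item_text)
    c_both = c(collection_text + " " + item_text)
    # Fraction of the item's compressed size accounted for by the collection.
    return 1.0 - (c_both - c_coll) / c_item
```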
First Experiment – Math Forum Collection
- 19 courseware items in the collection
- 10 items in the experiment set
  - The first 5 from the Math Forum
  - The other 5 from other collections on www.smete.org
First Experiment Result
Second Experiment – Collaborative Filtering Collection
- 12 papers in the collection
- 11 items in the experiment set
  - The first 10 from Citeseer
  - The last one is the paper we read in class: “An Algorithm for Automated Rating of Reviewers”
- Query terms submitted (term, frequency): information (284), algorithm (250), ratings (217), filtering (159), system (197), query (149), reputation (114), reviewer (109), collaborative (106), recommendations (98)
Second Experiment Result
Second Experiment – User Study
- 6 people in my research lab participated in this study
  - 3 of them with an IR background
  - 3 of them without an IR background
- They were asked to rate the 11 items in the experiment set according to their degree of relevance to the given collection
Second Experiment Result – Human Rating
Second Experiment Result – Another View

Document ID   LSA   gzip   Group with IR background   Group without IR background
1             M     M      L                          L
2             H     H      H                          H
3             L     L      L                          M
4             H     L      H                          M
5             L     H      H                          H
6             H     M      H                          M
7             M     L      H                          H
8             H     L      H                          H
9             L     M      L                          L
10            M     H      H                          H
11            L     H      H                          M
Second Experiment Result – Comparison of variants without SVD and without term weighting
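The weighting scheme is not stated on the slide; the sketch below uses tf-idf purely as an illustration of how the "with/without weighting" and "with/without SVD" variants differ.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

def term_doc_matrix(collection_docs, weighted=True):
    """Build the term-by-document matrix with or without term weighting.
    The weighting used in the experiment is not stated; tf-idf is only an
    illustration."""
    vec = TfidfVectorizer(lowercase=True) if weighted else CountVectorizer(lowercase=True)
    return vec, vec.fit_transform(collection_docs).T.toarray()   # terms x documents

# "Without SVD" means ranking items directly in this term space instead of
# projecting them into the reduced LSA space.
```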
Second Experiment – Correlation with human rating
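The slide does not name the correlation statistic. Since the human judgments are ordinal (H/M/L), Spearman rank correlation is one reasonable choice, shown here as an assumption:

```python
from scipy.stats import spearmanr

# H/M/L judgments mapped to numbers (an assumed encoding).
LEVEL = {"L": 0, "M": 1, "H": 2}

def correlation_with_humans(algorithm_scores, human_ratings):
    """Spearman rank correlation between algorithm relevance scores and the
    ordinal human ratings; the statistic used on the slide is not stated."""
    rho, p = spearmanr(algorithm_scores, [LEVEL[r] for r in human_ratings])
    return rho, p
```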
Second Experiment – Precision and recall (cutoff: R_LSA > 0.5 and R_gzip > 0.2)
Second Experiment – Precision and recall (cutoff: R_LSA > 0.4 and R_gzip > 0.17)
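Precision and recall at these cutoffs can be computed by thresholding the relevance scores against the set of items judged relevant; a minimal sketch (names are illustrative):

```python
def precision_recall(scores, relevant_ids, cutoff):
    """Items whose relevance score exceeds the cutoff are 'retrieved';
    precision and recall are measured against the known relevant set."""
    retrieved = {doc_id for doc_id, score in scores.items() if score > cutoff}
    true_pos = retrieved & set(relevant_ids)
    precision = len(true_pos) / len(retrieved) if retrieved else 0.0
    recall = len(true_pos) / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Example with the slide's cutoffs (scores and relevant set are placeholders):
# p_lsa, r_lsa = precision_recall(lsa_scores, relevant_ids, cutoff=0.5)
# p_gzip, r_gzip = precision_recall(gzip_scores, relevant_ids, cutoff=0.2)
```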
Comparison of the Two Filtering Algorithms
- Gzip works well when the input documents are just abstracts, while LSA works for both abstracts and full text
- LSA captures word association patterns and statistical importance; gzip scans for repetition only
- LSA is more computationally demanding, while gzip is simple
- Effectiveness
To-Do List and Future Work
- Obtain accurate and trustworthy evaluation from experts (collection owners?)
- Extract full text and abstracts from Citeseer automatically