
Overview of an Information Retrieval System: Terrier Ashish.


1 Overview of an Information Retrieval System: Terrier Ashish

2 Overview Structural view –Indexing –Retrieval Extend Setup Run

3 IR Systems Terrier –Academic/ research –Open source Lucene-Nutch –Commercial/ research –Open source

4 Terrier Being developed at the University of Glasgow Open Source OS independent: Java Easy to learn Easy to extend –modular

5 Subfolders-1 etc/ –Configuration files bin/ –Scripts to compile and run Terrier lib/ –Java libraries: jar files containing the Terrier system.

6 Subfolders-2 src/ –The Java source files, user-written plugins doc/ –Javadocs for Terrier and for extended components var/ –Index/ Index files –Results/ Results and evaluation share/ –Shared resources such as stopwords, lexicon etc.

7 Indexing

8 Tokenization Identifying words –Based on space –Handling special characters such as -, $, digits etc. –Sometimes space is not a word separator: German, Chinese –agglutinative languages: Marathi
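
A minimal sketch of the space-and-special-character tokenisation described above. The regular expression and the `tokenize` name are assumptions made for illustration, not Terrier's tokeniser; the sketch only suits space-delimited Latin-script text, and languages where space is not a reliable word separator need a different approach.

```python
import re

# Assumed rule: a token is a run of word characters, optionally joined by
# internal hyphens or apostrophes. Symbols such as '$' and ',' are dropped.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*")

def tokenize(text):
    """Split text into lowercase tokens, discarding surrounding punctuation."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]

print(tokenize("State-of-the-art IR costs $20, says Dr. Smith."))
# ['state-of-the-art', 'ir', 'costs', '20', 'says', 'dr', 'smith']
```

Keeping hyphenated words whole and keeping digits are design choices; a real system would make both configurable.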

9 Term Pipelining Stemming/ finding root –ate -> eat Stopword removal –is, was, I, in etc. Abbreviations –Dr -> Doctor Normalisation –Color Vs colour
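
The pipeline stages above can be sketched as a chain of small steps applied to each token. Everything here is hypothetical illustration: the word lists are tiny samples, `strip_suffix` is a crude stand-in for a real stemmer, and none of it is Terrier's `uk.ac.gla.terrier.terms` code. Irregular forms such as ate -> eat need a dictionary-based approach that plain suffix stripping cannot provide.

```python
STOPWORDS = {"is", "was", "i", "in", "the"}   # tiny illustrative list
ABBREVIATIONS = {"dr": "doctor"}              # expansion example from the slide
SPELLING = {"colour": "color"}                # normalise to a single variant

def strip_suffix(term):
    """Crude suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of stem remain.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def pipeline(tokens):
    """Apply stopword removal, abbreviation expansion, normalisation, stemming."""
    processed = []
    for token in tokens:
        term = token.lower()
        if term in STOPWORDS:
            continue  # stopwords never reach the index
        term = ABBREVIATIONS.get(term, term)
        term = SPELLING.get(term, term)
        processed.append(strip_suffix(term))
    return processed

print(pipeline(["Dr", "Smith", "is", "painting", "the", "colour", "charts"]))
# ['doctor', 'smith', 'paint', 'color', 'chart']
```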

10 Index – data structures Direct Index –stores the identifiers of terms that appear in each document and the corresponding frequencies. Document Index –stores information about each document, for example the document length and identifier. Inverted Index –stores the posting lists, i.e. the identifiers of the documents and their corresponding term frequencies. Lexicon –stores the collection vocabulary and the corresponding document and term frequencies.
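
To make the four structures concrete, here is a small in-memory sketch that builds all of them from already-processed terms. It is an assumption-laden toy: `build_index` is a hypothetical name, terms are kept as strings rather than integer identifiers, and nothing is compressed or written to disk as a real index would be.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """docs maps a document id to its list of already-processed terms."""
    direct = {}                    # doc id -> {term: frequency in that document}
    document_index = {}            # doc id -> document length in terms
    inverted = defaultdict(dict)   # term -> {doc id: frequency} (posting list)
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        direct[doc_id] = dict(counts)
        document_index[doc_id] = len(terms)
        for term, tf in counts.items():
            inverted[term][doc_id] = tf
    # Lexicon: term -> (document frequency, term frequency in the collection)
    lexicon = {term: (len(postings), sum(postings.values()))
               for term, postings in inverted.items()}
    return direct, document_index, dict(inverted), lexicon

docs = {
    "d1": ["president", "election", "president"],
    "d2": ["election", "result"],
}
direct, document_index, inverted, lexicon = build_index(docs)
print(inverted["president"])   # {'d1': 2}
print(lexicon["election"])     # (2, 2)
```

The direct index answers "which terms are in this document", while the inverted index answers "which documents contain this term"; retrieval mainly uses the latter.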

11 Extending the indexing process Tokenisation: –uk.ac.gla.terrier.indexing.*Document Term Pipelines: –uk.ac.gla.terrier.terms.*

12 Retrieval query Index

13 Scoring and Ranking Score: S(d_i, q_j) Documents are ranked (sorted) according to the score Presented to the user in decreasing order of S(d_i, q_j) –Scoring model e.g. TF-IDF
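
As a sketch of scoring and ranking, the score of a document for a query can be taken as the sum, over query terms, of term frequency times inverse document frequency. This is one simple TF-IDF variant chosen for illustration, not necessarily the formula Terrier's own weighting model uses; `tf_idf_rank` and the sample posting lists are hypothetical.

```python
import math

def tf_idf_rank(query_terms, inverted, num_docs):
    """Return (doc id, score) pairs sorted by decreasing score.

    inverted maps a term to its posting list {doc id: term frequency}.
    """
    scores = {}
    for term in query_terms:
        postings = inverted.get(term, {})
        if not postings:
            continue  # a term absent from the collection contributes nothing
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

inverted = {
    "president": {"d1": 2},
    "election": {"d1": 1, "d2": 1},
    "result": {"d2": 1},
}
ranking = tf_idf_rank(["president", "election"], inverted, num_docs=2)
print([doc_id for doc_id, _ in ranking])   # ['d1', 'd2']
```

In this toy collection "election" occurs in every document, so its inverse document frequency is zero and only "president" separates the two documents.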

14 Matching Process Input –Query and weighting model Output –Ranked resultset Weighting model –Hiemstra-LM Uses –Term Score Modifiers uk.ac.gla.terrier.matching.tsms –Document Score Modifiers uk.ac.gla.terrier.matching.dsms Extend –uk.ac.gla.terrier.matching.models

15 Input Corpus –Very large set of documents Topics –Queries representing user need Relevance Results –Set of judgments per query per document

16 Topic format Mumbai85B7FB3BB9.htm.txt राज्यपालांनी घेतली राष्ट्रपती, उपराष्ट्रपतींची भेट मुंबई, ता. २१ - राज्यपाल एस. एम. कृष्णा यांनी आज राष्ट्रपती प्रतिभा पाटील आणि उपराष्ट्रपती डॉ. हमीद अन्सारी यांची दिल्ली येथे भेट घेतली. राष्ट्रपती, उपराष्ट्रपतिपदी निवड झाल्याच्या पार्श् ‍ वभूमीवर राज्यपालांनी भेट घेऊन त्यांचे स्वागत केले. आज दुपारी राष्ट्रपती भवन येथे श्रीमती प्रतिभा पाटील यांची भेट घेतल्यानंतर त्यांनी हरियाना भवन येथे जाऊन उपराष्ट्रपतींची भेट घेतली.

17 Document 5 भारतीय राष्ट्रपती निवडणूक २००७ भारताच्या राष्ट्रपती निवडणूकीशी संबंधित मुद्दे व घटना. राष्ट्रपतींची निवडणूक, उमेदवारांविरूध्द केलेली / गलिच्छ राजकीय चिखलफेक आणि आपल्या निकटच्या उमेदवाराचा पराभव करून प्रतिभा पाटील ह्यांचे भारताच्या सर्वप्रथम महिला राष्ट्रपती ( अध्यक्ष ) म्हणून निवडून येणे ह्या - विषयीची माहिती संबंधित कागदपत्रात असावयास हवी.

18 Relevance judgement
13 Q0 1100019.cms.txt 0
13 Q0 1102914.cms.txt 0
13 Q0 1104294.cms.txt 0
13 Q0 1104312.cms.txt 1
13 Q0 1110418.cms.txt 0
13 Q0 1123377.cms.txt 0
13 Q0 1124813.cms.txt 1
13 Q0 1126006.cms.txt 1
Fields per line: Query-id, Q0 (iteration field, normally ignored), Document id, Relevance judgement: 0 or 1
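
A short sketch of reading judgement lines like those above into a lookup table. The `parse_qrels` helper is hypothetical and assumes four whitespace-separated fields per line (query id, an ignored second field, document id, relevance); the three sample lines are taken from the slide.

```python
from collections import defaultdict

def parse_qrels(lines):
    """Return {query id: {document id: relevance}} from qrels lines."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, relevance = parts
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)

sample = [
    "13 Q0 1100019.cms.txt 0",
    "13 Q0 1104312.cms.txt 1",
    "13 Q0 1124813.cms.txt 1",
]
qrels = parse_qrels(sample)
relevant = sorted(doc for doc, rel in qrels["13"].items() if rel > 0)
print(relevant)   # ['1104312.cms.txt', '1124813.cms.txt']
```

Evaluation then compares a ranked result list for query 13 against the set of documents judged relevant.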

19 Configuration files etc/terrier.properties –UTF-8 settings, stemmer, index name, etc. etc/trec.topic.list –Set topics/queries etc/trec.models –Set matching/retrieval model etc/trec.qrels –Set relevance judgement file path

20 Running terrier Already compiled To recompile –bin/compile.sh Setup corpus – bin/trec_setup.sh “ “ Index –bin/trec_terrier.sh -i Retrieval –bin/trec_terrier.sh -r Evaluate –bin/trec_terrier.sh -e “ ”

21 Reference http://ir.dcs.gla.ac.uk/terrier/doc/ http://ir.dcs.gla.ac.uk/wiki/Terrier

