Current and Future Research Directions. University of Tehran, Database Research Group, 1 October 2009. Abolfazl AleAhmad, Ehsan Darrudi, Hadi Amiri, Azadeh Shakery, Farhad Oroumchian.

1 Current and Future Research Directions. University of Tehran, Database Research Group, 1 October 2009. Abolfazl AleAhmad, Ehsan Darrudi, Hadi Amiri, Azadeh Shakery, Farhad Oroumchian

2 Outline: Why Persian IR; language resources for Persian; Hamshahri at CLEF 2009 (participants, results, pool analysis); future work.

3 Persian in the Middle East. Source: Internet World Stats, user population growth on the Web.

4 Why Persian IR. Data updated in June 2009 from Internet World Stats.

5 The Persian Language. Persian belongs to the Indo-European language family and is the official language of Iran, Afghanistan and Tajikistan. Its morphological analysis is comparatively difficult: the word “خبر” (news) has two plural forms, “خبرها” under Persian rules and “اخبار” under Arabic rules. Writing style also varies: for example, “می شود” and “میشود” are the same word, as are “کتابها” and “کتاب ها” (books).
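The two problems on this slide, spacing variants and Persian-vs-Arabic plurals, are usually handled by a normalizer and a stemmer. The sketch below is a minimal illustration under stated assumptions, not the group's tooling: `normalize` joins the verbal prefix می/نمی and the plural suffix ها/های to their stem (dropping any space or zero-width non-joiner, U+200C), and `conflate_plural` strips those suffixes, falling back to a hypothetical `BROKEN_PLURALS` lookup table for Arabic broken plurals such as “اخبار”, which no suffix rule can undo.

```python
import re

# Separator a writer may put between a stem and an affix:
# any whitespace or a zero-width non-joiner (U+200C).
_SEP = r"[\s\u200c]+"

# Hypothetical lookup table: Arabic "broken" plurals cannot be
# undone by suffix stripping, so they need an explicit mapping.
BROKEN_PLURALS = {"اخبار": "خبر"}


def normalize(text: str) -> str:
    """Join the prefix می/نمی and the suffix ها/های to their stem."""
    # "می شود" / "می\u200cشود" -> "میشود"
    text = re.sub(r"(^|\s)(ن?می)" + _SEP, r"\1\2", text)
    # "کتاب ها" -> "کتابها" (only when the suffix ends the word)
    text = re.sub(_SEP + r"(ها|های)(?=\s|$)", r"\1", text)
    return text


def conflate_plural(word: str) -> str:
    """Map a plural to its singular: table lookup first, then suffix stripping."""
    if word in BROKEN_PLURALS:
        return BROKEN_PLURALS[word]
    for suffix in ("های", "ها"):
        # Keep at least two characters of stem to limit over-stemming.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word
```

Joining the parts without a ZWNJ is an arbitrary but consistent choice for indexing; always inserting a ZWNJ would work equally well, as long as queries and documents go through the same function. The suffix rule is deliberately naive: it would wrongly reduce “تنها” ("alone") to “تن”, which is exactly the kind of over-stemming a real light stemmer has to guard against.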

6 Persian Test Collections. Text IR domain: Ghavanin (domain-specific); Hamshahri (news), including Hamshahri 2 (recently developed, 50 topics). Web IR domain: FWT1m (.ir Web), nearly 1 million documents. NLP domain: Bijankhan (2.7 million words).

7 Hamshahri at CLEF 2008 & … News articles of the Hamshahri newspaper from 1996 to …; … bilingual topics; 166,000+ documents. Hamshahri 2: news articles of the Hamshahri newspaper from 1996 to …; … bilingual topics; 320,000 documents (about twice as large, ~1.5 GB); richer document tags.

8 Participants
1. JHU-APL: n-gram tokenization (skip n-grams for n = 5).
2. UniNE: developed "light" and "plural" stemmers, plus blind query expansion.
3. Open Text: Savoy's stemmer and 4-grams; pool analysis (with the top 10,000 retrieved documents).
4. Qazvin IAU: Perstem for monolingual runs (precision +91%, recall +43%); a "Query Wikification" algorithm for bilingual runs.
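Two participants sidestepped stemming by indexing character n-grams: JHU-APL with skip n-grams (n = 5) and Open Text with 4-grams. The sketch below shows the basic idea only; the `_` word-boundary marker, the `*` wildcard, and the choice to skip exactly one interior character are assumptions, and JHU-APL's actual skip-gram definition may differ.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Overlapping character n-grams of a word padded with '_' boundary markers."""
    padded = f"_{word}_"
    if len(padded) <= n:
        # Word too short: index the whole padded word as one unit.
        return [padded]
    return [padded[i : i + n] for i in range(len(padded) - n + 1)]


def skip_ngrams(word: str, n: int, wildcard: str = "*") -> list[str]:
    """n-grams with one interior character replaced by a wildcard.

    Assumption: one common formulation of skip n-grams, not necessarily
    the exact variant used by JHU-APL at CLEF.
    """
    grams = []
    for gram in char_ngrams(word, n):
        if len(gram) < n:  # no full-length gram to derive skips from
            continue
        for j in range(1, n - 1):  # keep the first and last characters
            grams.append(gram[:j] + wildcard + gram[j + 1 :])
    return grams
```

For example, “کتاب” and “کتابها” share the 4-grams “_کتا” and “کتاب”, so a query with the singular can match the plural with no stemmer at all; skip n-grams extend the same idea to tolerate one differing character inside a gram.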

9 Final Results

10 Final Results

11 Pool of CLEF 2008

12 Pool of CLEF 2009

13 Pool Comparison. Quoted from: Stephen Tomlinson, "German, French, English and Persian Retrieval Experiments at CLEF 2008", and the Working Notes for the CLEF 2008 & 2009 Workshops.

14 Pool Comparison. Quoted from: Stephen Tomlinson, "German, French, English and Persian Retrieval Experiments at CLEF 2008", and the Working Notes for the CLEF 2008 & 2009 Workshops.
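Pooling builds each topic's judgment set from the union of the top-ranked documents of all submitted runs, and a pool analysis asks how much each run contributes to it. A minimal single-topic sketch follows; the function names, the `depth` parameter and the "unique relevant documents" measure are illustrative assumptions and do not reproduce the analyses quoted on these slides.

```python
def build_pool(runs: dict[str, list[str]], depth: int) -> set[str]:
    """Depth-k pool for one topic: union of every run's top-`depth` documents."""
    pool: set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def unique_relevant(
    runs: dict[str, list[str]], relevant: set[str], depth: int
) -> dict[str, int]:
    """Per run, count relevant documents in its top `depth` that no other run found there."""
    # Relevant documents each run placed within the pooling depth.
    found = {name: set(ranking[:depth]) & relevant for name, ranking in runs.items()}
    counts = {}
    for name, docs in found.items():
        # Everything the *other* runs contributed (empty set if there is only one run).
        others = set().union(*(d for other, d in found.items() if other != name))
        counts[name] = len(docs - others)
    return counts
```

Counting relevant documents that only one run found shows whether a run (or a technique such as n-gram tokenization) reaches material the rest of the pool misses; a pool that depends heavily on a few runs' unique contributions is a warning sign when the collection is later used to evaluate runs that did not take part in pooling.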

15 Future Work. Using Hamshahri 2 for CLEF 2010 (50 training topics); a campaign on the Persian Web IR collection; creation of an English-Persian parallel corpus; creation of a comparable corpus; a stemmer for the Persian language.

16 Thanks. Questions?
