
1 Q/A SYSTEM
Project by: Abdullah Alotayq, Dong Wang, Ed Pham

2 COMPONENTS
- Query Processing
- Passage Retrieval
- Answer Extraction

3 QUERY PROCESSING
Classification
- Package: Mallet
- Classifiers: MaxEnt, DecisionTree, NaiveBayes, BalancedWinnow (training sketch below)
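A minimal sketch of training and comparing these four Mallet classifiers from its command-line tools, driven from Python. The file names, the 80/20 train/test split, and the subprocess wrapper are assumptions; the deck does not show its actual invocation.

```python
# Sketch: comparing Mallet's MaxEnt, DecisionTree, NaiveBayes, and
# BalancedWinnow trainers on a labeled question file (assumed layout).
import subprocess

MALLET = "mallet-2.0.8/bin/mallet"  # path to the Mallet launcher (assumption)

# questions.txt: one instance per line, "<id> <label> <question text>"
subprocess.run([MALLET, "import-file",
                "--input", "questions.txt",
                "--output", "questions.mallet"], check=True)

for trainer in ["MaxEnt", "DecisionTree", "NaiveBayes", "BalancedWinnow"]:
    subprocess.run([MALLET, "train-classifier",
                    "--input", "questions.mallet",
                    "--trainer", trainer,
                    "--training-portion", "0.8",  # 80/20 split (assumption)
                    "--report", "test:accuracy"], check=True)
```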

4 QUERY PROCESSING
Features (illustrated in the sketch below)
- Semantic
- Morphological
- Neighboring (syntactic)
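The slides do not list the concrete features, so the following is an illustrative sketch only: one possible extractor touching each of the three feature families above, built on NLTK and WordNet. The feature names are ours, not the project's.

```python
# Illustrative extractor for semantic, morphological, and neighboring
# (syntactic) feature families (assumed features, not the deck's own).
from nltk import word_tokenize, pos_tag
from nltk.corpus import wordnet as wn
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def question_features(question):
    tokens = word_tokenize(question.lower())
    feats = {}
    # Morphological: stems of the question terms.
    for tok in tokens:
        feats["stem=" + stemmer.stem(tok)] = 1
    # Neighboring (syntactic): the wh-word plus its right neighbor.
    if len(tokens) >= 2:
        feats["first_bigram=" + tokens[0] + "_" + tokens[1]] = 1
    # Semantic: WordNet hypernym of the first noun, if any.
    for tok, tag in pos_tag(tokens):
        if tag.startswith("NN"):
            synsets = wn.synsets(tok, pos=wn.NOUN)
            if synsets and synsets[0].hypernyms():
                feats["head_hyper=" + synsets[0].hypernyms()[0].name()] = 1
            break
    return feats

print(question_features("What city hosted the 1988 Winter Olympics?"))
```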

5 QUERY PROCESSING
Stemming
- NLTK stemmer
- Trigrams: poor classification results
Named Entity Recognition (sketch below)
- NLTK NER: a pre-trained model for this task
- 6 types of NE
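A minimal sketch of NLTK's pre-trained named entity recognizer, as described above. The example question and the printed labels are illustrative, not from the slides.

```python
# Run NLTK's pre-trained NE chunker over a POS-tagged question.
import nltk

question = "When did CNN begin broadcasting from Atlanta?"
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(question)))
for subtree in tree.subtrees():
    if subtree.label() != "S":  # keep only the named-entity chunks
        print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))
# e.g.: ORGANIZATION CNN
#       GPE Atlanta
```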

6 QUERY PROCESSING
Our results:

Features      Classifier       Testing accuracy
Binary        BalancedWinnow   0.804
Binary        MaxEnt           0.78
Real-valued   BalancedWinnow   0.784
Real-valued   MaxEnt           0.758

Named Entity Recognition: testing accuracy = 0.802

7 QUERY EXPANSION
Two different methods:
- Target Concatenation: append each question's target to the end of the question.
- Deletion/Addition: delete wh-words and function words; add synonyms and hypernyms (via WordNet).

8 QUERY EXPANSION
Addition (sketch below):
- Synonyms
- Hypernyms (first ancestor only)
- Morphological variants (WordNet as thesaurus: wordnet.morphy)
This gave poor results.
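A minimal sketch of this WordNet-based addition step, assuming first-ancestor hypernyms only and wordnet.morphy for morphological normalization. The function name and example term are ours.

```python
# Expand a query term with synonyms, first-ancestor hypernyms, and its
# morphological base form, using WordNet as a thesaurus.
from nltk.corpus import wordnet as wn

def expand_term(term):
    additions = set()
    for synset in wn.synsets(term):
        # Synonyms: other lemmas of the same synset.
        additions.update(lemma.name() for lemma in synset.lemmas())
        # Hypernyms: first ancestor only.
        for hyper in synset.hypernyms()[:1]:
            additions.update(lemma.name() for lemma in hyper.lemmas())
    # Morphological variant via wordnet.morphy (base form, if any).
    base = wn.morphy(term)
    if base:
        additions.add(base)
    additions.discard(term)
    return additions

print(expand_term("invented"))  # e.g. contains 'invent', 'devise', ...
```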

9 PASSAGE RETRIEVAL
- Used Indri/Lemur.
- Ran both query reformulation/expansion approaches through the software.
- Took the top 50 documents per query.

10 PASSAGE RETRIEVAL
- Used Indri/Lemur.
- Took the top passage from each of the top 50 documents for each query.
- Query grammar: #combine[passageWIDTH:INC] (example below)
- System defaults: 120 terms, 1000-term window.
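A concrete instantiation of the grammar above, as it might appear in an IndriRunQuery parameter file. Only the #combine[passageWIDTH:INC] operator, the 120/1000 defaults, and the top-50 cutoff come from the slides; the question text and file layout are illustrative.

```xml
<parameters>
  <query>
    <number>1</number>
    <!-- passage window of width 120, increment 1000, per the slide's defaults -->
    <text>#combine[passage120:1000]( where is bali located bali )</text>
  </query>
  <count>50</count> <!-- top 50 results per query -->
</parameters>
```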

11 PASSAGE RETRIEVAL
Passage re-ranking:
- Modified the window size: 500, 1000 terms.
- Modified the number of top passages taken from the top 50 documents: 1, 5, 10, 20, 25.

12 ANSWER EXTRACTION
Stemming
- Applied to the queries.
Stopwords (see the sketch below)
- Applied to the indexing:
  - Removed all the stopwords
  - Removed all but the wh-words
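A minimal sketch of the two stopword treatments plus query-term stemming, assuming NLTK's English stopword list and a hand-rolled wh-word set (the slides do not give their exact lists):

```python
# Two stopword treatments plus query-term stemming (assumed word lists).
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "why", "which", "how"}
stemmer = PorterStemmer()

def remove_all_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def remove_all_but_wh_words(tokens):
    return [t for t in tokens if t not in STOPWORDS or t in WH_WORDS]

def stem_query(tokens):
    return [stemmer.stem(t) for t in tokens]
```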

13 ANSWER EXTRACTION
Term weighting
- Applied to the queries.
- Changed the weights of the target terms and the query terms.
- Used the query grammar to implement this (see the example after this slide).
Snippet extraction
- Used Indri's API to implement this.
- Encountered problems with a fixed snippet size (due to hardcoding).
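One way to express the target/query weighting in Indri's query grammar, using the 0.66/0.33 split evaluated on slide 20; the question and target terms here are illustrative:

```
#weight( 0.66 #combine( bali ) 0.33 #combine( where located ) )
```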

14 EVALUATION
Document ranking. Note: all results based on TREC-2004.

QE Approach             MAP
Target Concatenation    0.3223
Subtraction + WordNet   0.2381

15 EVALUATION (CONT.)
Stopwords in indexing:

Stopword treatment     MAP
No stopwords removed   0.3223
Stopwords removed      0.3262
Keeping wh-words       0.3407

16 EVALUATION (CONT.)
Passage retrieval:

QE Approach             Type      MRR
Target Concatenation    Strict
Target Concatenation    Lenient
Subtraction + WordNet   Strict
Subtraction + WordNet   Lenient

17 EVALUATION (CONT.)
Passage re-ranking: window size.

Window Size   Type      MRR
1000          Strict    0.195439095783
1000          Lenient
500           Strict
500           Lenient
100           Strict
100           Lenient

18 EVALUATION (CONT.)
Stemming on query terms (using the Porter Stemmer):

              Strict           Lenient
Non-stemmed   0.340829170969
Stemmed

19 EVALUATION (CONT.)
Snippet extraction (using Indri/Lemur with different window sizes):

Window Size   Strict           Lenient
100           0.290138907213   0.407219304738
500
1000

20 EVALUATION (CONT.)
Term weighting on queries:

Weighting                    Strict   Lenient
Balanced (no weights)
Query = .33, Target = .66
Query = .66, Target = .33
Query = .80, Target = .20

21 FINAL RESULTS
TREC 2004 (Training Data):

       Strict           Lenient
100    0.078516123253   0.132782385953
250
1000

22 FINAL RESULTS
TREC 2004 (Training Data):

       Strict            Lenient
100    0.0617858062617   0.150900599492
250
1000

23 CONCLUSIONS
Some things were helpful:
- Stemming
- Stopwords
- Window size / query grammar changes
While others weren't:
- Our attempt at query expansion
- Term weighting
We found improvement over the previous deliverable, but nothing dramatic. There is still a lot left to be desired for future work (e.g., applying other answer extraction methods).

24 FUTURE WORK
- Work more with the text snippet feature from Indri: change the code to enable different snippet sizes.
- Apply the work from query classification to our answer extraction or passage re-ranking.
- Semantic role labeling.
- Finding bad candidates.
- Using redundancy-based QA (ARANEA).
- Structure-based extraction (FrameNet).

25 SOFTWARE PACKAGES USED
- Mallet
- Indri/Lemur
- NLTK
- Porter Stemmer
- Stanford Parser, Berkeley Parser
- Self-written code

26 READINGS
- Harabagiu, S., et al. Employing Two Question Answering Systems in TREC-2005.
- Query expansion/reformulation: Kwok, Etzioni, and Weld (2001); Lin (2007); Fang (2008); Aktolga et al. (2011).
- Passage retrieval: Tiedemann et al. (2008).
- Indri/Lemur documentation.

