Deliverable #2: Question Classification Group 5 Caleb Barr Maria Alexandropoulou.

1 Deliverable #2: Question Classification
Group 5
Caleb Barr, Maria Alexandropoulou

2 Software used
– Java, to perform feature extraction
– Illinois Chunker, applied to extract chunks
– Python
  – Automating classification tasks
  – Preprocessing of data when necessary
– Mallet, for the classification task
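The slides do not say how the Python scripts passed extracted features to Mallet. One plausible route, sketched here purely as an assumption, is Mallet's one-instance-per-line text input for `import-file`, whose default layout is `name label data` on each line. `to_mallet_line` and `write_mallet_file` are hypothetical helpers, not part of Mallet.

```python
# Minimal sketch (assumption): serialise extracted features into the
# one-instance-per-line layout that Mallet's `import-file` reads by default:
# "<name> <label> <data>". Helper names are hypothetical.
from typing import Iterable, List, Tuple


def to_mallet_line(name: str, label: str, features: List[str]) -> str:
    """Build one Mallet input line; features are joined with single spaces."""
    # The default line layout splits name and label on whitespace,
    # so neither may contain whitespace itself.
    if any(ch.isspace() for ch in name + label):
        raise ValueError("name and label must not contain whitespace")
    return f"{name} {label} {' '.join(features)}"


def write_mallet_file(path: str, rows: Iterable[Tuple[str, str, List[str]]]) -> int:
    """Write (name, label, features) rows to `path`; return the row count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for name, label, features in rows:
            out.write(to_mallet_line(name, label, features) + "\n")
            count += 1
    return count
```

For example, `to_mallet_line("q1", "HUM", ["who", "who_was"])` gives `q1 HUM who who_was`. Mallet's default token pattern may drop very short tokens and punctuation, so a more permissive `--token-regex` at import time may be needed to keep features such as `is` or `?`.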

3 System Properties
– Classification algorithms
  – MaxEnt
  – NaiveBayes
– Training data: the combination of
  – Li and Roth training set 5 (5,500 questions)
  – TREC-2004
– Test data
  – Li and Roth test set (TREC 10)
  – TREC-2005.xml
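Because the training data merges two sources and the runs target coarse classes only, preprocessing has to reduce labels such as `DESC:manner` to `DESC`. A minimal sketch, assuming questions are in the Li and Roth text layout (a `COARSE:fine` label, a space, then the question). The TREC-2004 questions would first need converting into the same (label, question) pairs, which the slides do not describe. `parse_li_roth_line` and `combine_training_sets` are hypothetical names.

```python
# Minimal sketch (assumption): parse "COARSE:fine question text" lines,
# keep only the coarse label, and concatenate training sources.
from typing import Iterable, List, Tuple

COARSE_LABELS = {"ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"}


def parse_li_roth_line(line: str) -> Tuple[str, str]:
    """Return (coarse_label, question) for one labelled line."""
    label, question = line.strip().split(" ", 1)
    coarse = label.split(":", 1)[0]  # "DESC:manner" -> "DESC"
    if coarse not in COARSE_LABELS:
        raise ValueError(f"unknown coarse label: {coarse}")
    return coarse, question


def combine_training_sets(*sets: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Concatenate already-parsed (label, question) sets into one training set."""
    combined: List[Tuple[str, str]] = []
    for s in sets:
        combined.extend(s)
    return combined
```

For instance, `parse_li_roth_line("HUM:ind Who was Galileo ?")` returns `("HUM", "Who was Galileo ?")`.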

4 System Properties (cont.)
– Features extracted: we focused on syntactic features because we targeted coarse classification (following the conclusion of Li and Roth)
  – Unigrams
  – Bigrams
  – Trigrams
  – Chunks with POS tags, e.g. [NP (DT) (JJ) (NN)]
  – Head NP/VP chunks as in Li and Roth, e.g. [NP (DT the) (JJS oldest) ] in "What is the oldest profession ?"
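The feature templates above can be sketched as plain string features. This is an illustration under stated assumptions, not the group's Java extractor: n-grams are joined with `_`, chunks are given as (type, [(word, POS), ...]) pairs, and the head-chunk rule is a simplified reading of Li and Roth (the first chunk of the requested type after the chunk holding the question word). The chunking of the example question below is also assumed, chosen so the output matches the slide.

```python
# Minimal sketch (assumption): n-gram, chunk-POS and head-chunk features.
from typing import List, Tuple

Chunk = Tuple[str, List[Tuple[str, str]]]  # (chunk type, [(word, POS), ...])
WH_TAGS = {"WP", "WP$", "WDT", "WRB"}  # Penn Treebank wh-word tags


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams joined with '_', e.g. the bigram 'the_oldest'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chunk_pos_feature(chunk: Chunk) -> str:
    """Chunk with POS tags only, e.g. '[NP (DT) (JJ) (NN)]'."""
    ctype, words = chunk
    return "[" + ctype + " " + " ".join(f"({pos})" for _, pos in words) + "]"


def head_chunk_feature(chunks: List[Chunk], ctype: str) -> str:
    """First chunk of `ctype` after the question-word chunk, with words and tags."""
    start = 0
    for i, (_, words) in enumerate(chunks):
        if any(pos in WH_TAGS for _, pos in words):
            start = i + 1  # search only after the question word
            break
    for c_type, words in chunks[start:]:
        if c_type == ctype:
            inner = " ".join(f"({pos} {word})" for word, pos in words)
            return f"[{ctype} {inner} ]"
    return ""  # no chunk of that type found


example: List[Chunk] = [  # assumed chunker output for the slide's question
    ("NP", [("What", "WP")]),
    ("VP", [("is", "VBZ")]),
    ("NP", [("the", "DT"), ("oldest", "JJS")]),
    ("NP", [("profession", "NN")]),
    ("O", [("?", ".")]),
]
print(head_chunk_feature(example, "NP"))  # prints [NP (DT the) (JJS oldest) ]
```

With this chunking, the head VP feature is `[VP (VBZ is) ]`.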

5 Runs performed
– Runs were performed for all combinations of classification algorithm and feature templates, e.g. MaxEnt with Unigrams; NaiveBayes with Unigrams, Bigrams and Chunks; etc.
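Enumerating "all combinations" is mechanical once the templates are fixed. A sketch, assuming five template names (`Chunks` and `Heads` are shorthand labels chosen here) and every non-empty template subset per algorithm; the run-name style is only loosely modelled on the slides' labels.

```python
# Minimal sketch (assumption): every (algorithm, non-empty template subset) run.
from itertools import combinations
from typing import List, Tuple

ALGORITHMS = ["MaxEnt", "NaiveBayes"]
TEMPLATES = ["Unigrams", "Bigrams", "Trigrams", "Chunks", "Heads"]


def all_runs(algorithms: List[str], templates: List[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """List runs in a stable order: per algorithm, subsets by increasing size."""
    runs = []
    for algo in algorithms:
        for k in range(1, len(templates) + 1):
            for subset in combinations(templates, k):
                runs.append((algo, subset))
    return runs


def run_name(algo: str, subset: Tuple[str, ...]) -> str:
    """Label such as 'MaxEnt_UnigramsBigramsHeads'."""
    return f"{algo}_{''.join(subset)}"


print(len(all_runs(ALGORITHMS, TEMPLATES)))  # prints 62 (2 algorithms x 31 subsets)
```

With five templates there are 2^5 - 1 = 31 non-empty subsets, so 62 runs over the two algorithms, each evaluated on both test sets.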

6 Charts

7

8 Conclusions
– Maximum test accuracy
  – TREC 10: 0.892 (Unigrams + Bigrams + Heads, MaxEnt)
  – TREC 2005: 0.81758 (Unigrams + Bigrams + Heads, NaiveBayes; MaxEnt was very close)
– Adding trigrams lowered accuracy, so they are a poor feature

9 Sample confusion matrix for our best accuracy (TREC_10_MaxEnt_UnigramBigramHeads):

label        0     1     2     3     4     5  total
0 DESC     136     2     -     -     -     -    138
1 ENTY      12    76     -     2     -     4     94
2 ABBR       2     -     7     -     -     -      9
3 HUM        1     4     -    59     -     1     65
4 NUM        9     3     -     -    98     3    113
5 LOC        5     6     -     -     -    70     81
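The reported TREC 10 accuracy can be checked against this matrix: accuracy is the diagonal sum over the total. A small sketch, treating the "-" cells as 0; the per-row diagonal rate would be per-class recall if rows are the true labels, which the slide does not state explicitly.

```python
# Check of the slide's confusion matrix; '-' cells are taken as 0.
from typing import List

LABELS = ["DESC", "ENTY", "ABBR", "HUM", "NUM", "LOC"]
MATRIX = [
    [136, 2, 0, 0, 0, 0],
    [12, 76, 0, 2, 0, 4],
    [2, 0, 7, 0, 0, 0],
    [1, 4, 0, 59, 0, 1],
    [9, 3, 0, 0, 98, 3],
    [5, 6, 0, 0, 0, 70],
]


def accuracy(matrix: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def row_rates(matrix: List[List[int]]) -> List[float]:
    """Diagonal cell divided by its row total, one value per label."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


print(f"{accuracy(MATRIX):.3f}")  # prints 0.892
```

The diagonal sums to 446 of 500 questions, which matches the 0.892 reported for TREC 10.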

