Q/A System First Stage: Classification Project by: Abdullah Alotayq, Dong Wang, Ed Pham.


Query Processing
Classification Package: Mallet
Classifiers: MaxEnt, DecisionTree, C45, NaiveBayes, AdaBoost, Winnow, Balanced Winnow, Bagging Trainer, etc.

Main Techniques

Features
- Semantic
- Morphological
- Neighboring (syntactic)

Stemming
- NLTK stemmer

N-grams
- Bigrams:
- Trigrams: poor classification results (0.48, 0.478). Not a good strategy.
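The n-gram features above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes lowercased whitespace tokenization, and the helper names `ngrams` and `ngram_features` are hypothetical.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # If the question has fewer than n tokens, the range is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(question: str, n: int) -> List[str]:
    """Lowercase, split on whitespace, and join each n-gram with '_'."""
    tokens = question.lower().split()
    return ["_".join(gram) for gram in ngrams(tokens, n)]


question = "Who wrote the book"  # toy example, not from the data set
bigrams = ngram_features(question, 2)
trigrams = ngram_features(question, 3)
print(bigrams)
print(trigrams)
```

A four-word question yields only two trigrams, and longer n-grams tend to recur less often across questions, which may be one reason the trigram features scored poorly.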

NER (Named Entity Recognition)
- Used the NLTK pre-trained NER model for this task.
- 6 types of NE.

Frequencies

Type          Training data  Test data
GSP                      22          2
FACILITY                  3          0
GPE                    1203         90
PERSON                  600         35
LOCATION                 21          3
ORGANIZATION            622         42

No named entity detected
- Training data: 3533 (64.8%)
- Test data: 353 (70.6%)
-> data sparseness problem
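Coverage figures of this kind can be computed with a short script. The sketch below is an assumption about the bookkeeping, not the project's code: it takes one list of NE type labels per instance (an empty list when the tagger finds nothing), and the name `ne_coverage` and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ne_coverage(entities_per_instance: List[List[str]]) -> Tuple[Dict[str, int], int, float]:
    """Return (frequency per NE type, count with no NE, share with no NE)."""
    type_freq: Counter = Counter()
    no_entity = 0
    for types in entities_per_instance:
        if not types:
            no_entity += 1
        type_freq.update(types)
    total = len(entities_per_instance)
    share = no_entity / total if total else 0.0
    return dict(type_freq), no_entity, share


# Toy input: five instances, three of them without any detected entity.
toy = [["PERSON"], [], ["GPE", "ORGANIZATION"], [], []]
freq, none_count, none_share = ne_coverage(toy)
print(freq, none_count, f"{none_share:.1%}")
```

A high no-entity share means most instances receive no NER feature at all, which is the sparseness problem noted above.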

NER Results & Future Work
- Test data accuracy = 0.802
- We might try other NE tools that provide more NE types and cover a larger share of the training and test data.

Binary and Real Values
Testing for potential improvement. Best performing classifiers:
For binary values:
- BalancedWinnow: test data accuracy = 0.804
- MaxEnt: mean test accuracy = 0.78
For real values:
- BalancedWinnow: test data accuracy = 0.784
- MaxEnt: test data accuracy = 0.758
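The slides do not define the two encodings. One common reading, assumed here, is that binary features record only whether a token occurs while real-valued features record how often it occurs. The sketch below illustrates that reading; the function names and the `label name:value` line format are hypothetical and not necessarily what was fed to Mallet.

```python
from collections import Counter
from typing import Dict, List


def real_valued_features(tokens: List[str]) -> Dict[str, int]:
    """Map each token to its occurrence count."""
    return dict(Counter(tokens))


def binary_features(tokens: List[str]) -> Dict[str, int]:
    """Map each token to 1 if it occurs at all."""
    return {tok: 1 for tok in tokens}


def to_line(label: str, features: Dict[str, int]) -> str:
    """Serialize one instance as 'label name:value ...' (illustrative format)."""
    pairs = " ".join(f"{name}:{value}" for name, value in sorted(features.items()))
    return f"{label} {pairs}"


tokens = "what is what".split()  # toy example with a repeated token
counts = real_valued_features(tokens)
flags = binary_features(tokens)
print(to_line("DESC", counts))
print(to_line("DESC", flags))
```

Questions are short, so most tokens occur once and the two encodings often coincide; that may explain why the scores differ only slightly for most trainers.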

Data set 1

Type                  Trainer         Result
Binary                BalancedWinnow  0.804
                      DecisionTree    0.68
                      MaxEnt          0.756
                      NaiveBayes      0.546
Real Values           BalancedWinnow  0.784
                      DecisionTree    0.42
                      MaxEnt          0.758
                      NaiveBayes      0.54
NER Binary            BalancedWinnow  0.802
                      DecisionTree    0.5
                      MaxEnt          0.772
                      NaiveBayes      0.54
NER Real Values       DecisionTree    0.48
                      MaxEnt          0.768
                      NaiveBayes      0.538
Bigrams Binary        MaxEnt          0.702
                      NaiveBayes      0.624
                      BalancedWinnow  0.76
Bigrams Real Values   MaxEnt          0.698
                      NaiveBayes      0.624
                      BalancedWinnow  0.76
Trigrams Binary       NaiveBayes      0.4
                      BalancedWinnow  0.478
Trigrams Real Values  NaiveBayes      0.4
                      BalancedWinnow  0.478
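To compare feature configurations at a glance, the Data set 1 accuracies can be loaded into a dictionary and the top trainer picked per configuration. The numbers are copied from the results above; the dictionary layout and the helper `best_trainer` are illustrative choices, not part of the original project.

```python
from typing import Dict, Tuple

# Test accuracies for Data set 1, keyed by feature type and trainer.
results: Dict[str, Dict[str, float]] = {
    "Binary": {"BalancedWinnow": 0.804, "DecisionTree": 0.68,
               "MaxEnt": 0.756, "NaiveBayes": 0.546},
    "Real Values": {"BalancedWinnow": 0.784, "DecisionTree": 0.42,
                    "MaxEnt": 0.758, "NaiveBayes": 0.54},
    "NER Binary": {"BalancedWinnow": 0.802, "DecisionTree": 0.5,
                   "MaxEnt": 0.772, "NaiveBayes": 0.54},
    "NER Real Values": {"DecisionTree": 0.48, "MaxEnt": 0.768,
                        "NaiveBayes": 0.538},
    "Bigrams Binary": {"MaxEnt": 0.702, "NaiveBayes": 0.624,
                       "BalancedWinnow": 0.76},
    "Bigrams Real Values": {"MaxEnt": 0.698, "NaiveBayes": 0.624,
                            "BalancedWinnow": 0.76},
    "Trigrams Binary": {"NaiveBayes": 0.4, "BalancedWinnow": 0.478},
    "Trigrams Real Values": {"NaiveBayes": 0.4, "BalancedWinnow": 0.478},
}


def best_trainer(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (trainer, accuracy) pair with the highest accuracy."""
    return max(scores.items(), key=lambda item: item[1])


best = {feature: best_trainer(scores) for feature, scores in results.items()}
for feature, (trainer, accuracy) in best.items():
    print(f"{feature:22s} {trainer:15s} {accuracy}")
```

BalancedWinnow comes out on top in every configuration where it was reported; for NER Real Values, where no BalancedWinnow score is listed, MaxEnt leads.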

Data set 2

Type                 Trainer         Result
Binary               BalancedWinnow  0.74
                     MaxEnt          0.74
                     NaiveBayes      0.72
Real Values          BalancedWinnow  0.784
                     MaxEnt          0.75
                     NaiveBayes      0.71
Stemmed Binary       BalancedWinnow  0.78
                     MaxEnt          0.76
                     NaiveBayes      0.76
Stemmed Real Values  BalancedWinnow  0.75
                     MaxEnt          0.77

Proposed Future Improvements
- WordNet senses
- Class-specific related words

Issues
- Poor performance on some refinements.
  - Low accuracy scores: 0.42, 0.54
- Memory-consuming classifiers.
  - Some classifiers produced error messages.

Successes
- Made progress in creating the system.
- Gained hands-on experience with classifiers and NLP packages.
- Learned ways to improve classification results.

Readings That Helped
- Sanda Harabagiu et al., "Employing Two Question Answering Systems in TREC-2005".

Software Packages Used
- Mallet
- NLTK
- Porter stemmer
- Self-written code files
- Stanford Parser, Berkeley Parser
