Query Classification
The task is to assign a query to one or more predefined categories based on its topics (definition from Wikipedia).
Applications: paid placement advertising, federated search.
Challenge: queries are short and noisy.
Query Classification
To handle these challenges, (pseudo-)relevance feedback (PRF) is used to enrich the queries, but it requires sophisticated search and ranking functions.
Motivation: study the performance of query classification in the absence of PRF.
Query Classification
Another approach is to enrich queries with co-occurring terms from query logs.
For example, the query "machine learning" is strongly correlated with "machine learning algorithm" and "machine learning research" in query logs.
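A minimal sketch of log-based query enrichment, assuming a logged query counts as related when it contains every term of the input query. The function name `enrich_query`, the `top_k` parameter, and the toy log are hypothetical and are not taken from the slides.

```python
from collections import Counter

def enrich_query(query, query_log, top_k=3):
    """Append the terms that most often co-occur with `query` in the log."""
    query_terms = query.lower().split()
    counts = Counter()
    for logged in query_log:
        logged_terms = logged.lower().split()
        # Assumption: a logged query is related if it contains every query term.
        if all(t in logged_terms for t in query_terms):
            counts.update(t for t in logged_terms if t not in query_terms)
    # Rank by frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:top_k]]

# Toy query log (illustrative data only).
log = [
    "machine learning algorithm",
    "machine learning research",
    "machine learning algorithm tutorial",
    "learning to drive",
]
enriched = enrich_query("machine learning", log, top_k=2)
print(enriched)
```

The enriched term list can then be fed to the classifier in place of the original short query.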
Our Approach
Vector space model: a document is represented as a vector in which each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.
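The vector space representation can be sketched as follows, assuming whitespace tokenization and raw term counts as the non-zero values (the slides do not say which weighting was used). The helper names `build_vocabulary` and `to_vector` are hypothetical.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct term to one vector dimension."""
    terms = sorted({t for doc in documents for t in doc.lower().split()})
    return {term: i for i, term in enumerate(terms)}

def to_vector(document, vocabulary):
    """Represent a document as term counts over the vocabulary."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for term, n in counts.items():
        if term in vocabulary:  # terms outside the vocabulary are ignored
            vector[vocabulary[term]] = n
    return vector

docs = ["machine learning research", "learning to rank"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```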
Our Approach
N-gram models at the word level and the character level.
Linear SVM.
Ensemble of linear SVMs (symmetrical case), with base classifiers trained on snippets, titles, and URLs respectively.
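To illustrate the two n-gram granularities, here is a small sketch assuming lowercasing and whitespace tokenization. The function names are hypothetical, and the resulting n-grams would serve as the terms of the feature vectors passed to the SVM.

```python
def word_ngrams(text, n):
    """Contiguous sequences of n words."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Contiguous sequences of n characters (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

print(word_ngrams("machine learning research", 2))
print(char_ngrams("query", 3))
```

Character n-grams tend to be more tolerant of misspellings in short, noisy queries, while word n-grams preserve phrase-level meaning.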
Our Approach
Multi-label, multi-class problem: decompose it into binary classification problems.
Evaluation criteria: micro-precision, micro-recall, and micro-F1.
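The decomposition and the micro-averaged metrics can be sketched as below, assuming a one-vs-rest decomposition (one binary target per category) and label sets per query. The category names and example predictions are made up for illustration.

```python
def binarize(label_sets, categories):
    """One-vs-rest: build one 0/1 target list per category."""
    return {c: [1 if c in labels else 0 for labels in label_sets]
            for c in categories}

def micro_prf(true_sets, pred_sets):
    """Pool TP/FP/FN over all queries and categories, then compute P, R, F1."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative labels only (hypothetical categories).
true_sets = [{"computers", "education"}, {"sports"}, {"shopping"}]
pred_sets = [{"computers"}, {"sports", "news"}, {"shopping"}]

targets = binarize(true_sets, ["computers", "sports"])
precision, recall, f1 = micro_prf(true_sets, pred_sets)
print(targets)
print(precision, recall, f1)
```

Micro-averaging pools the counts across categories before dividing, so frequent categories weigh more heavily than rare ones.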
Results – Symmetrical Case
Results – Symmetrical Case (continued)
Results – Asymmetrical Case
Conclusion
Pseudo-relevance feedback yields the best performance, but it is a post-search strategy.
Yahoo suggested keywords achieve worse results.
Training with PRF and testing with suggested keywords falls in between, but it doesn't require searching and ranking at test time.