Named Entity Classification
Chioma Osondu & Wei Wei

Classifiers
- Decision tree
- Multinomial Naïve Bayes
- Support vector machines (SVMs)
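Of the three classifiers, multinomial Naïve Bayes is small enough to sketch without any library. The following is a minimal from-scratch version with add-one smoothing, not the authors' implementation. It assumes each phrase is already turned into a list of string features (here just lowercased tokens), and the training phrases and the `Company`/`Place` labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over lists of string features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (1.0 = Laplace / add-one)

    def fit(self, docs, labels):
        """docs: list of feature lists; labels: one class name per doc."""
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # class -> feature -> count
        self.vocab = set()
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        # Total feature occurrences per class: the denominator of P(feature | class)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, features, label):
        """log P(label) + sum of log P(feature | label) over features seen in training."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for f in features:
            if f in self.vocab:  # features unseen in training are skipped
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
        return score

    def predict(self, features):
        """Return the class with the highest log posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(features, c))


# Hypothetical toy training data: (tokens, entity class)
train = [
    (["acme", "corp."], "Company"),
    (["general", "motors", "corporation"], "Company"),
    (["new", "york"], "Place"),
    (["lake", "tahoe"], "Place"),
]
clf = MultinomialNB().fit([x for x, _ in train], [y for _, y in train])
print(clf.predict(["globex", "corp."]))  # Company
```

Working in log space avoids numeric underflow when a phrase has many features; the unseen token "globex" contributes nothing, so the decision rests on "corp.".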

Features
- Unigrams, bigrams, trigrams, and quadrigrams (4-grams)
- Specialized features, such as the number of words and the presence of numbers
- Stemmed words
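A minimal sketch of a feature extractor covering the categories on the Features slide. It assumes word (not letter) n-grams over whitespace tokens, and it uses a hypothetical suffix stripper, `crude_stem`, as a stand-in for a real stemmer such as Porter's. The feature-name prefixes (`2gram=`, `stem=`, and so on) are illustrative only.

```python
def word_ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(phrase):
    """Turn a proper-noun phrase into a list of string features."""
    tokens = phrase.lower().split()
    features = []
    for n in range(1, 5):  # unigrams through quadrigrams
        features += [f"{n}gram={g}" for g in word_ngrams(tokens, n)]
    features.append(f"num_words={len(tokens)}")
    features.append(f"has_digit={any(ch.isdigit() for ch in phrase)}")
    features += [f"stem={crude_stem(t)}" for t in tokens]
    return features


feats = extract_features("Star Wars Episode 4")
print(len(feats))  # 16
print([f for f in feats if f.startswith("stem=")])
# ['stem=star', 'stem=war', 'stem=episode', 'stem=4']
```

Encoding every feature as a `name=value` string keeps the output directly usable as input to a bag-of-features classifier such as multinomial Naïve Bayes.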

Accuracy with Tree Depth
- Accuracy does not increase with tree depth.
- Accuracy is lower than that of the Maximum Entropy Model (MEM) trained on the same feature set.

Results & Error Analysis (1)
- Features are not abstract enough: "Corp.", "Corporation", and "Inc." are really the same feature.
- Of the 599 disputed classifications, MEM got 481 right and the decision tree got 118 right (481 + 118 = 599, so on every disputed item exactly one of the two classifiers was correct).
- Not enough features are defined for the Place, Movie, and Person classes.
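One way to make the features more abstract, as the error analysis suggests, is to map surface variants to a shared class token before extracting features. This is a minimal sketch: the `TOKEN_CLASSES` table and the `<COMPANY_SUFFIX>` token are hypothetical, and similar equivalence classes would be needed for Place, Movie, and Person cues.

```python
# Hypothetical equivalence classes; the slide names only Corp., Corporation, and Inc.
TOKEN_CLASSES = {
    "corp.": "<COMPANY_SUFFIX>",
    "corp": "<COMPANY_SUFFIX>",
    "corporation": "<COMPANY_SUFFIX>",
    "inc.": "<COMPANY_SUFFIX>",
    "inc": "<COMPANY_SUFFIX>",
}


def abstract_tokens(phrase):
    """Lowercase, split on whitespace, and map known variants to a shared class token."""
    return [TOKEN_CLASSES.get(tok, tok) for tok in phrase.lower().split()]


print(abstract_tokens("Acme Corp."))          # ['acme', '<COMPANY_SUFFIX>']
print(abstract_tokens("Globex Corporation"))  # ['globex', '<COMPANY_SUFFIX>']
```

With this mapping, a classifier learns one weight for the company-suffix feature instead of splitting the evidence across three rarely seen variants.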

Results & Error Analysis (2)

Results & Error Analysis (3)

Results & Error Analysis (4)

Conclusion & Future Work
- Stemmed words are too coarse for multi-way classification.
- Accuracies above 94% can be achieved using a combination of features; see "Automatic Classification of Previously Unseen Proper Noun Phrases into Semantic Categories Using an N-Gram Letter Model" by Stephen Patel & Joseph Smarr (2001 final project).
- Combining classifiers
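The slides do not say how classifiers would be combined. Plain majority voting is one simple option, used here as an assumption: each classifier contributes one label, and ties go to the most trusted classifier (for example MEM, which won 481 of the 599 disputed cases). The classifier names `maxent`, `tree`, and `nb` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, tie_break_order):
    """Combine one predicted label per classifier by majority vote.

    predictions: dict mapping classifier name -> predicted label
    tie_break_order: classifier names, most trusted first (an assumption,
    e.g. ranked by accuracy on held-out data)
    """
    votes = Counter(predictions.values())
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Among the labels with the most votes, pick the one backed by the most trusted classifier.
    for name in tie_break_order:
        if predictions[name] in tied:
            return predictions[name]
    raise ValueError("tie_break_order must include the voting classifiers")


print(majority_vote({"maxent": "Person", "tree": "Place", "nb": "Person"},
                    ["maxent", "nb", "tree"]))  # Person
print(majority_vote({"maxent": "Movie", "tree": "Place"},
                    ["maxent", "tree"]))        # Movie
```

With only two classifiers every disagreement is a tie, so this scheme reduces to "trust the stronger model"; the vote only adds information once three or more classifiers are combined.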