Opinion Mining and Topic Categorization with Novel Term Weighting Roman Sergienko, Ph.D student Tatiana Gasanova, Ph.D student Ulm University, Germany.

Presentation transcript:

Opinion Mining and Topic Categorization with Novel Term Weighting
Roman Sergienko, Ph.D. student, and Tatiana Gasanova, Ph.D. student, Ulm University, Germany
Shaknaz Akhmedova, Ph.D. student, Siberian State Aerospace University, Krasnoyarsk, Russia

Contents
- Motivation
- Databases
- Text preprocessing methods
- The novel term weighting method
- Features selection
- Classification algorithms
- Results of numerical experiments
- Conclusions

Motivation
- The goal of this work is to evaluate how competitive the novel term weighting method is compared with the standard techniques for opinion mining and topic categorization.
- The criteria are:
  1) Macro F-measure on the test set
  2) Computational time

Databases: DEFT’07 and DEFT’08

DEFT’07 (opinion mining)
Corpus | Train size | Test size | Vocabulary | Classes
Books | 2074 | 1386 | | 0: negative, 1: neutral, 2: positive
Games | 2537 | 1694 | | 0: negative, 1: neutral, 2: positive
Debates | | | | 0: against, 1: for

DEFT’08 (topic categorization)
Corpus | Train size | Test size | Vocabulary | Classes
T1 | | | | 0: Sport, 1: Economy, 2: Art, 3: Television
T2 | | | | 0: France, 1: International, 2: Literature, 3: Science, 4: Society

The existing text preprocessing methods
- Binary preprocessing
- TF-IDF (Salton and Buckley, 1988)
- Confident Weights (Soucy and Mineau, 2005)

The novel term weighting method
- $L$: the number of classes;
- $n_i$: the number of instances of the $i$-th class;
- $N_{ji}$: the number of occurrences of the $j$-th word in all instances of the $i$-th class;
- $T_{ji} = N_{ji} / n_i$: the relative frequency of the $j$-th word in the $i$-th class;
- $R_j = \max_i T_{ji}$: the maximum relative frequency of the $j$-th word over all classes;
- $S_j = \arg\max_i T_{ji}$: the index of the class assigned to the $j$-th word.
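The quantities defined on this slide can be computed in one pass over a labeled training set. The following is a minimal sketch, not the authors' implementation: the function name `class_statistics`, the token-list input format, and the tie-breaking rule (lowest class index wins) are assumptions made for illustration. It stops at $T_{ji}$, $R_j$ and $S_j$.

```python
from collections import Counter


def class_statistics(documents, labels, num_classes):
    """Return (T, R, S) for every word seen in the training data.

    documents: list of token lists (assumed already tokenized)
    labels: list of class indices in range(num_classes)
    T[word][i] is N_ji / n_i, R[word] is max_i T_ji, S[word] is arg max_i T_ji.
    """
    n = [0] * num_classes  # n_i: number of instances per class
    N = {}                 # N_ji: occurrences of each word per class
    for tokens, label in zip(documents, labels):
        n[label] += 1
        for word, count in Counter(tokens).items():
            N.setdefault(word, [0] * num_classes)[label] += count

    T = {
        word: [c / n[i] if n[i] else 0.0 for i, c in enumerate(counts)]
        for word, counts in N.items()
    }
    R = {word: max(freqs) for word, freqs in T.items()}
    # Assumption: ties are resolved in favor of the lowest class index.
    S = {word: freqs.index(max(freqs)) for word, freqs in T.items()}
    return T, R, S


if __name__ == "__main__":
    docs = [["good", "good", "book"], ["bad", "book"], ["bad", "bad"]]
    labels = [1, 0, 0]
    T, R, S = class_statistics(docs, labels, num_classes=2)
    for word in sorted(T):
        print(word, T[word], R[word], S[word])
```

Guarding against empty classes with `if n[i] else 0.0` is a defensive choice of this sketch; the slide does not say how such a case is handled.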

Features selection
1) Calculate the relative frequency of each word in each class.
2) For each word, choose the class with the maximum relative frequency.
3) For each utterance to be classified, calculate the sum of the weights of the words assigned to each class.
4) The number of attributes equals the number of classes.
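Steps 3 and 4 above reduce every utterance to a vector with one attribute per class. The sketch below illustrates that aggregation under stated assumptions: `class_sum_features`, `weights` and `assigned_class` are hypothetical names, the weight values in the usage example are made up, each occurrence of a word contributes its weight (the slide does not say whether repeated words count once or several times), and words unseen in training are skipped.

```python
def class_sum_features(tokens, weights, assigned_class, num_classes):
    """Map one tokenized utterance to a vector of per-class weight sums.

    weights: word -> term weight (assumed to be computed beforehand)
    assigned_class: word -> index of the class assigned to that word
    """
    features = [0.0] * num_classes
    for word in tokens:
        if word in assigned_class:  # skip words not seen in training
            features[assigned_class[word]] += weights[word]
    return features


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the presentation.
    weights = {"good": 0.9, "bad": 0.8, "book": 0.1}
    assigned_class = {"good": 1, "bad": 0, "book": 1}
    utterance = ["good", "book", "unknown", "bad", "bad"]
    print(class_sum_features(utterance, weights, assigned_class, num_classes=2))
```

Because the resulting vector has only as many entries as there are classes, the classifier works with a far smaller input than a full bag-of-words representation.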

Classification algorithms

Computational effectiveness (charts: DEFT’07, DEFT’08)

The best values of F-measure
Problem | F-measure | The best known value | Term weighting method | Classification algorithm
Books | | | The novel TW | SVM
Games | | | ConfWeight | k-NN
Debates | | | ConfWeight | SVM
T1 | | | The novel TW | SVM
T2 | | | The novel TW | SVM

Comparison of ConfWeight and the novel term weighting
Problem | ConfWeight | The novel TW | Difference
Books | | |
Games | | |
Debates | | |
T1 | | |
T2 | | |

Conclusions
- The novel term weighting method achieves classification quality similar to or better than that of the ConfWeight method, while requiring the same amount of computational time as TF-IDF.