Mapping Between Taxonomies
Elena Eneva
30 Oct 2001
Advanced IR Seminar

Idea Review
The same documents organized under two different taxonomies:
– By country: German, French
– By industry: Textile, Automobile
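As a minimal illustration (with hypothetical documents and labels), the task can be pictured as predicting a document's label in the target taxonomy from its label in the source taxonomy and/or its text:

    # Hypothetical toy data: the same documents labeled under two taxonomies.
    docs = {
        "doc1": {"by_country": "German", "by_industry": "Automobile"},
        "doc2": {"by_country": "French", "by_industry": "Textile"},
    }

    # The mapping task: given a document's label in the source taxonomy
    # (and possibly its text), predict its label in the target taxonomy.
    for name, labels in docs.items():
        print(name, ":", labels["by_country"], "->", labels["by_industry"])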

Learning Algorithms
Two separate learners for the documents:
– Old doc category -> new doc category
– Doc contents -> new category
Putting the two together:
– Weighted average based on confidence (sketched below)
– Final result determined by a decision tree
One combined learner – uses both the old category and the contents as features
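A minimal sketch of the confidence-weighted combination, assuming each learner outputs a class-probability vector and taking its maximum probability as its confidence (the slides do not specify the exact confidence measure, and the probabilities below are hypothetical):

    import numpy as np

    def combine_by_confidence(p_old_cat, p_contents):
        """Average two class-probability vectors, weighting each learner
        by its confidence (here: its maximum predicted probability)."""
        w1, w2 = p_old_cat.max(), p_contents.max()
        return (w1 * p_old_cat + w2 * p_contents) / (w1 + w2)

    # Hypothetical outputs over three new categories:
    p_from_old_category = np.array([0.7, 0.2, 0.1])  # learner 1: old category -> new
    p_from_contents = np.array([0.3, 0.6, 0.1])      # learner 2: words -> new
    combined = combine_by_confidence(p_from_old_category, p_from_contents)
    print(combined, "-> predicted class:", combined.argmax())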

Data Sets
Hoovers – 4285 documents
– 28 categories
– 255 categories
Reuters 2001 documents
– Topics
– Industry categories

Current System
– Simple decision tree (C4.5) – learns probabilities of new categories based on old categories (doesn't know about documents/words)
– Naïve Bayes (rainbow) – word-based classification into the new categories (doesn't know about old categories)
– Combination (decision tree) – takes the outputs and confidences of the two and predicts the new category (see the sketch below)
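A rough end-to-end sketch of this three-part system, using scikit-learn stand-ins (DecisionTreeClassifier in place of C4.5, MultinomialNB in place of rainbow) and hypothetical toy data; in the real setup the combiner would be trained on held-out predictions rather than in-sample ones:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.preprocessing import OneHotEncoder

    # Hypothetical documents with old-taxonomy and new-taxonomy labels.
    texts = ["wool fabric export", "engine assembly plant", "cotton mill strike"]
    old_cats = [["Textile"], ["Automobile"], ["Textile"]]
    new_cats = ["French", "German", "French"]

    # Learner 1: old category -> new category (never sees the words).
    enc = OneHotEncoder()
    X_old = enc.fit_transform(old_cats)
    tree = DecisionTreeClassifier().fit(X_old, new_cats)

    # Learner 2: words -> new category (never sees the old categories).
    vec = CountVectorizer()
    X_words = vec.fit_transform(texts)
    nb = MultinomialNB().fit(X_words, new_cats)

    # Combiner: a decision tree over both learners' class probabilities
    # (their outputs and confidences), predicting the new category.
    meta = np.hstack([tree.predict_proba(X_old), nb.predict_proba(X_words)])
    combiner = DecisionTreeClassifier().fit(meta, new_cats)
    print(combiner.predict(meta))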

Current Results
[Results table: accuracy (%) under five-fold cross-validation, with train (tr) and test (te) columns for Naïve Bayes (NB), the decision tree (DT), and the combination (Comb), over the 28-category and 255-category mappings; the cell values are not recoverable from the transcript.]
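For reference, a minimal sketch of the five-fold cross-validation protocol behind such accuracy figures, again with a scikit-learn stand-in and hypothetical data:

    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical corpus: five documents per class, so each of the
    # five stratified folds holds out one document from each class.
    texts = [
        "wool fabric export", "cotton mill strike", "silk trade fair",
        "linen weaving loom", "textile dye works",
        "engine assembly plant", "diesel motor recall", "car chassis design",
        "truck gearbox factory", "sedan sales surge",
    ]
    labels = ["Textile"] * 5 + ["Automobile"] * 5

    X = CountVectorizer().fit_transform(texts)
    scores = cross_val_score(MultinomialNB(), X, labels, cv=5)
    print("per-fold accuracy:", scores, "mean: %.2f" % scores.mean())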

Work in Progress
– Naïve Bayes for 255 predicting 28 (expect higher accuracies)
– Use one classifier only, taking both kinds of features (words & old categories) – NB (see the sketch below)
– An additional single simple classifier – KNN (and SVM-Light, if there is time at the end)
– Run everything on Reuters 2001 (in addition to Hoovers)
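A minimal sketch of that planned single-classifier variant: one feature matrix holding both the word counts and the one-hot old category, fed to a single learner (Naïve Bayes here; a KNN slots in the same way). The data, as before, is hypothetical:

    import scipy.sparse as sp
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier

    texts = ["wool fabric export", "engine assembly plant", "cotton mill strike"]
    old_cats = [["Textile"], ["Automobile"], ["Textile"]]
    new_cats = ["French", "German", "French"]

    # Words and old categories side by side in one sparse matrix.
    X_words = CountVectorizer().fit_transform(texts)
    X_old = OneHotEncoder().fit_transform(old_cats)
    X = sp.hstack([X_words, X_old]).tocsr()

    for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=1)):
        print(type(clf).__name__, clf.fit(X, new_cats).predict(X))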

Comments? The end.