One-class Classification of Text Streams with Concept Drift


One-class Classification of Text Streams with Concept Drift Yang ZHANG, Xue LI, Maria Orlowska DDDM 2008 The University of Queensland Australia

Outline Motivation Related Work Framework for One-class Classification of Data Streams Learning Concept Drift under a One-class Scenario Experiment Results Future Work

Motivation State-of-the-art data stream classification algorithms: Based on fully labeled data. Impossible to label all data. Expensive to label data. User interests change over time. Difficult to apply to real-life applications.

Scenario User feedback emails arrive at the customer service section: find the feedback emails about a certain newly launched product. Build a text data stream classification system to retrieve all the ontopic feedback. Section manager behavior: Patient enough to label only a few ontopic emails. Not patient enough to label offtopic emails.

One-class Classification of Text Streams Challenges: Concept drift. Small amount of training data. No negative training data. Noisy data. Limited memory space.

Related Work Semi-supervised classification of data streams cannot cope with concept drift. [Wu&Yang, ICDMW06] Active learning for data stream classification cannot cope with concept drift caused by sudden shifts of user interests. [Fan&Huang, SDM04] [Fan&Huang, ICDM04] [Huang&Dong, IDA07] Needs multiple scans. [Klinkenberg&Joachims, ICML00]

Related Work Static approaches for data stream classification (fully labelled). [Street&Kim, KDD01] [Wang&Fan, KDD03] Dynamic approaches for data stream classification (fully labelled). [Kolter&Maloof, ICDM03] [Zhang&Jin, SIGMOD Record 06] [Zhu&Wu, ICDM04] One-class text classification. [Li&Liu, ECML05] [Liu&Dai, ICDM03] [Liu&Li, AAAI04]

Proposed Approach
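The transcript omits the figure on this slide, and the paper's actual base learner is not shown here. As an illustrative sketch only, a minimal one-class text classifier can be built from positive documents alone: learn a centroid of their term-frequency vectors and accept a new document when its cosine similarity to the centroid exceeds a threshold (the threshold value below is an assumption).

```python
# Illustrative sketch only: NOT the paper's base classifier. A one-class
# centroid model trained on positive (ontopic) documents only.
import math
from collections import Counter

def tf_vector(doc):
    """Term-frequency vector of a whitespace-tokenized document."""
    return Counter(doc.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class OneClassCentroid:
    def __init__(self, threshold=0.2):  # tunable assumption, not from the paper
        self.threshold = threshold
        self.centroid = Counter()

    def fit(self, positive_docs):
        """Sum the term frequencies of all positive documents."""
        self.centroid = Counter()
        for doc in positive_docs:
            self.centroid.update(tf_vector(doc))
        return self

    def predict(self, doc):
        """+1 = ontopic (accept), -1 = offtopic (reject)."""
        sim = cosine(tf_vector(doc), self.centroid)
        return 1 if sim >= self.threshold else -1
```

A model of this shape needs no negative labels, which matches the scenario above where the manager labels only a few ontopic emails.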

Base Classifier Selection – phenomena observed If the reader is very interested in a certain topic today, say, sports, then there is a high probability that he is also interested in sports tomorrow. If the reader is interested in a topic, say, sports, and for some reason his interests change to another topic, say, politics, then after some time there is a high probability that his interests will change back to sports again.

Base Classifier Selection – strategy The ensemble should keep some recent base classifiers. The ensemble should keep some base classifiers which represent the reader's interests in the long run.
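The selection rule stated above can be sketched as follows. This is an assumed structure, not the paper's exact algorithm: keep the k most recent base classifiers plus the m classifiers with the best accuracy accumulated over the whole stream, so the ensemble covers both the current concept and interests that may recur later.

```python
# Sketch of the stated strategy (assumed structure, not the paper's algorithm).
def select_ensemble(classifiers, k_recent=2, m_longrun=2):
    """classifiers: list of dicts with 'batch' (creation time) and
    'longrun_acc' (accuracy accumulated over the whole stream)."""
    # The k most recently created classifiers cover the current concept.
    by_recency = sorted(classifiers, key=lambda c: c["batch"], reverse=True)
    recent = by_recency[:k_recent]
    # The m best long-run performers cover recurring interests.
    by_acc = sorted(classifiers, key=lambda c: c["longrun_acc"], reverse=True)
    long_run = by_acc[:m_longrun]
    # Union, deduplicated by batch id, oldest first.
    kept = {c["batch"]: c for c in recent + long_run}
    return sorted(kept.values(), key=lambda c: c["batch"])
```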

Experiment Results Dataset: 20NewsGroup We compare the following approaches: Single Window (SW): The classifier is built on the current batch of data. Full Memory (FM): The classifier is built on the current batch of data, together with positive samples dating back to batch 0. Fixed Window (FW): The classifier is built on the samples from a fixed-size window. Ensemble (EN): The classifier is built by the algorithm proposed in this paper.
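The three baseline training-set strategies can be sketched as below, under the assumption that each batch is a list of (document, label) pairs with label +1 for ontopic (only positive labels are available in this setting). EN is the paper's ensemble approach and is not reproduced here.

```python
# Sketch of the baseline training-set strategies compared in the experiments.
def training_set(batches, strategy, window=3):
    """Return the samples a classifier is trained on after the latest batch."""
    if strategy == "SW":   # Single Window: current batch only
        return list(batches[-1])
    if strategy == "FM":   # Full Memory: current batch + all past positives
        past_pos = [s for b in batches[:-1] for s in b if s[1] == 1]
        return past_pos + list(batches[-1])
    if strategy == "FW":   # Fixed Window: the last `window` batches
        return [s for b in batches[-window:] for s in b]
    raise ValueError(strategy)
```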

Experiment Scenarios 4 groups of experiments: Experiments with concept drift caused by changing user interests. Experiments with heavy vs. gradual concept drift. Experiments with concept drift caused by both changing user interests and changing data distribution. Experiments with 5 ontopic categories.

Experiment: concept drift caused by changing user interests.

Experiment: heavy vs. gradual concept drift.

Experiment: changing user interests & data distribution. Very similar to the results observed in the first group of experiments.

Experiment: 5 ontopic categories.

Conclusion & Future Research We are the first to tackle the problem of one-class classification on streaming data, using an ensemble-based approach. Future research: Dynamic feature space One-class classification on general data streams.

Thank you! 