Learning from Positive and Unlabeled Examples
Investigator: Bing Liu, Computer Science
Prime Grant Support: National Science Foundation

Problem Statement and Motivation

Given a set of positive examples P and a set of unlabeled examples U, we want to build a classifier. The key feature of this problem is that there are no labeled negative examples, so traditional classification learning algorithms are not directly applicable.
The main motivation for studying this learning model is that many practical problems require it: labeling negative examples can be very time-consuming.

Technical Approach

We have proposed three approaches.
Two-step approach: The first step finds some reliable negative data in U. The second step uses an iterative algorithm based on naïve Bayesian classification and support vector machines (SVM) to build the final classifier.
Biased SVM: This method models the problem with a biased SVM formulation and solves it directly. A new evaluation method is also given, which allows us to tune the biased SVM parameters.
Weighted logistic regression: The problem can be regarded as a one-sided error problem, so a weighted logistic regression method is proposed.

Key Achievements and Future Goals

In (Liu et al. ICML-2002), it was shown theoretically that P and U provide sufficient information for learning, and that the problem can be posed as a constrained optimization problem.
Some of our algorithms are reported in (Liu et al. ICML-2002; Liu et al. ICDM-2003; Lee and Liu ICML-2003; Li and Liu IJCAI-2003).
Our future work will focus on two aspects:
Dealing with the problem when P is very small.
Applying it to the bioinformatics domain, where many problems require this type of learning.

[Figure: positive training data and unlabeled data are fed to a learning algorithm, which outputs a classifier.]
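The weighted logistic regression idea can be illustrated with a small sketch. It treats every unlabeled example as a negative but penalizes errors on the known positives more heavily, which is one way to read the "one-sided error" view. The weighting scheme, the function names (`train_weighted_lr`, `score`), and the toy data are assumptions made for illustration; the sketch does not reproduce the method of Lee and Liu (ICML-2003).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_lr(P, U, lr=0.5, epochs=2000):
    """Fit logistic regression with P labeled 1 and U labeled 0.

    Assumption (illustrative only): each positive example gets weight
    |U| / (|P| + |U|) and each unlabeled example gets |P| / (|P| + |U|),
    so when U is larger than P, errors on known positives cost more.
    """
    n_p, n_u = len(P), len(U)
    w_pos = n_u / (n_p + n_u)
    w_unl = n_p / (n_p + n_u)
    data = [(x, 1.0, w_pos) for x in P] + [(x, 0.0, w_unl) for x in U]
    total = sum(wt for _, _, wt in data)

    dim = len(P[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y, wt in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = wt * (pred - y)  # weighted gradient of the log loss
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Full-batch gradient step, normalized by the total weight.
        w = [wi - lr * g / total for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def score(model, x):
    """Estimated probability that x is positive."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy 1-D data (hypothetical): positives lie near +2, negatives near -2.
P = [[1.0 + 0.1 * i] for i in range(20)]            # labeled positives
hidden_pos = [[1.05 + 0.1 * i] for i in range(20)]  # positives hidden in U
true_neg = [[-3.0 + 0.05 * i] for i in range(40)]   # negatives hidden in U
U = hidden_pos + true_neg

model = train_weighted_lr(P, U)
print(score(model, [3.0]), score(model, [-3.0]))
```

On this toy set the model scores a point deep in the positive region above 0.5 and a point deep in the negative region below 0.5, even though no example was ever labeled negative. In practice the weights would need tuning, for instance with an evaluation measure that works without labeled negatives.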

Gene Expression Programming for Data Mining and Knowledge Discovery
Investigators: Peter Nelson, CS; Xin Li, CS; Chi Zhou, Motorola Inc.
Prime Grant Support: Physical Realization Research Center of Motorola Labs

Problem Statement and Motivation

Real-world data mining tasks involve large data sets, high-dimensional feature sets, and non-linear forms of hidden knowledge, and they need effective algorithms.
Gene Expression Programming (GEP) is a new evolutionary computation technique for the creation of computer programs; it is capable of producing solutions of any possible form.
Research goal: applying and enhancing the GEP algorithm to fulfill complex data mining tasks.

Technical Approach

Overview: improving the problem-solving ability of the GEP algorithm by preserving and utilizing the self-emergence of structures during its evolutionary process.
Constant creation methods for GEP: local optimization of constant coefficients, given the evolved solution structures, to speed up the learning process.
A new hierarchical genotype representation: a natural hierarchy in forming the solution and more protective genetic operations for functional components.
Dynamic substructure library: defining and reusing self-emergent substructures in the evolutionary process.

Key Achievements and Future Goals

We have finished the initial implementation of the proposed approaches.
Preliminary testing has demonstrated the feasibility and effectiveness of the implemented methods: the constant creation methods have achieved significant improvement in the fitness of the best solutions, and the dynamic substructure library helps identify meaningful building blocks that incrementally form the final solution, following a faster fitness convergence curve.
Future work includes investigation of parametric constants, exploration of higher-level emergent structures, and comprehensive benchmark studies.

Figure 1. Representations of solutions in GEP.
Genotype: sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d
Mathematical form and phenotype: [shown as images on the slide; not included in the transcript]
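The genotype string in Figure 1 can be decoded mechanically if we assume it is a K-expression in Karva notation, the breadth-first encoding used in standard GEP. The sketch below makes that assumption; the slide's own hierarchical representation may differ. The names `ARITY`, `decode_kexpression`, and `to_infix` are hypothetical helpers, not part of the investigators' implementation.

```python
from collections import deque

# Number of arguments each function symbol takes; other symbols are terminals.
ARITY = {"sqrt": 1, "+": 2, "-": 2, "*": 2, "/": 2}


def decode_kexpression(genotype):
    """Build an expression tree from a dot-separated K-expression.

    Symbols are read left to right and attached level by level
    (breadth-first): each function node takes its children from the next
    unread symbols. A node is a pair [symbol, children].
    """
    symbols = genotype.split(".")
    root = [symbols[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [symbols[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root


def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    left, right = (to_infix(child) for child in children)
    return f"({left} {symbol} {right})"


GENOTYPE = "sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d"
tree = decode_kexpression(GENOTYPE)
print(to_infix(tree))
# prints: sqrt(((a + (b * c)) * (sqrt((1 / (c - d))) * a)))
```

Under this reading the tree is the phenotype and the infix string is the mathematical form. All fifteen symbols of the genotype are consumed, so this particular gene has no unused tail.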