Relational Learning with Gaussian Processes
By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!)
Presented by Nesreen Ahmed, Nguyen Cao, Sebastian Moreno, Philip Schatz

Outline
Introduction
Relational Gaussian Processes
Applications
– Linkage prediction
– Semi-supervised learning
Experiments & Results
Conclusion & Discussion

Introduction
Many domains involve relational data:
– Web: document links
– Document categorization: citations
– Computational biology: protein interactions
Inter-relationships between instances can be informative for learning tasks
Relations reflect the network structure and enrich the way instances are correlated

Introduction
Relational information is represented by a graph G = (V, E)
– In supervised learning, the graph provides structural knowledge
– In semi-supervised learning, the graph can also be derived from the input attributes
The graph estimates the global geometric structure of the data

Gaussian Processes
A Gaussian process is a joint Gaussian distribution over the set of function values $\{f_x\}$ of any arbitrary set of $n$ instances $x$:
$$P(f) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} f^\top \Sigma^{-1} f\right)$$
where $\Sigma$ is the $n \times n$ covariance matrix with entries $\Sigma_{ij} = K(x_i, x_j)$ for a kernel function $K$.
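As an illustration of this definition, a minimal sketch in Python (my own example, not from the slides), assuming a squared-exponential kernel: build $\Sigma$ over $n$ instances and draw one joint sample of $\{f_x\}$.

```python
import numpy as np

def rbf_kernel(x, z, lengthscale=1.0):
    """Squared-exponential covariance K(x, z)."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * lengthscale**2))

# Covariance matrix over n instances, then one joint Gaussian sample of {f_x}.
X = np.linspace(-3, 3, 50).reshape(-1, 1)
Sigma = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
Sigma += 1e-8 * np.eye(len(X))  # jitter for numerical stability
f = np.random.default_rng(0).multivariate_normal(np.zeros(len(X)), Sigma)
```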

Relational Gaussian Processes
Linkages: the uncertainty in observing an edge $\varepsilon_{ij}$ induces Gaussian noise $N(0, \sigma^2)$ on the corresponding instances' function values $f_{x_i}$ and $f_{x_j}$.
[Figure: two nodes $x_i$ and $x_j$ joined by an edge $\varepsilon_{ij}$]
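To make the noise model concrete, here is a sketch of one plausible linkage likelihood under this setup: a link $\varepsilon_{ij} = 1$ is observed when the two noise-corrupted function values agree in sign, which integrates out to a product of probits. This form is an assumption reconstructed from the $N(0, \sigma^2)$ noise description above; the paper's exact likelihood may differ.

```python
import numpy as np
from scipy.stats import norm

def link_likelihood(f_i, f_j, sigma=1.0):
    """P(eps_ij = 1 | f_i, f_j): probability that the two function values,
    each corrupted by independent N(0, sigma^2) noise, share the same sign."""
    p_i = norm.cdf(np.asarray(f_i) / sigma)  # P(noisy f_i > 0)
    p_j = norm.cdf(np.asarray(f_j) / sigma)  # P(noisy f_j > 0)
    return p_i * p_j + (1.0 - p_i) * (1.0 - p_j)

print(link_likelihood(1.2, 0.8))   # high: both values likely positive
print(link_likelihood(1.2, -0.8))  # low: the signs likely disagree
```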

Relational Gaussian Processes
Approximate inference: the posterior is
$$P(f \mid \varepsilon) \propto P(f) \prod_{(i,j)} P(\varepsilon_{ij} \mid f_{x_i}, f_{x_j})$$
where $(i,j)$ runs over the set of observed undirected linkages. The EP algorithm approximates each likelihood factor $P(\varepsilon_{ij} \mid f_{x_i}, f_{x_j})$ by an un-normalized Gaussian site $\tilde{t}_{ij}(f_{x_i}, f_{x_j})$, whose site precision $\tilde{\Lambda}_{ij}$ is a 2×2 symmetric matrix.

Relational Gaussian Processes
The resulting approximate posterior is Gaussian with covariance
$$\tilde{\Sigma} = \Big(\Sigma^{-1} + \sum_{(i,j)} \Pi_{ij}\Big)^{-1}$$
where $\Pi_{ij}$ is an $n \times n$ matrix with four non-zero entries (at rows/columns $i$ and $j$) augmented from the 2×2 site precision $\tilde{\Lambda}_{ij}$.
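A sketch of how this posterior covariance could be assembled once EP has converged; `sites` maps each observed edge (i, j) to its 2×2 site precision, and the names are illustrative rather than taken from the paper.

```python
import numpy as np

def posterior_covariance(Sigma, sites):
    """Combine the prior covariance Sigma (n x n) with the EP site
    precisions: each 2x2 block Lambda_ij is scattered into an n x n
    matrix with four non-zero entries at rows/columns i and j."""
    A = np.linalg.inv(Sigma)              # prior precision
    for (i, j), Lam in sites.items():
        A[np.ix_([i, j], [i, j])] += Lam  # add the four entries
    return np.linalg.inv(A)               # Sigma~ in the slide
```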

Relational Gaussian Processes
For any finite collection of data points $X$, the set of random variables $\{f_x\}$ conditioned on $\varepsilon$ has a multivariate Gaussian distribution:
$$P(\{f_x\} \mid \varepsilon) = N(0, \tilde{K})$$
where the elements of the covariance matrix are given by evaluating the following (covariance) kernel function:
$$\tilde{K}(x, z) = K(x, z) - k_x^\top \big(\Sigma + \tilde{\Pi}^{-1}\big)^{-1} k_z$$
with $k_x = [K(x, x_1), \ldots, K(x, x_n)]^\top$ and $\tilde{\Pi} = \sum_{(i,j)} \Pi_{ij}$.
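A sketch of evaluating this data-dependent kernel at two (possibly unseen) points, reusing the illustrative names from the previous sketch. It uses the Woodbury-equivalent form $\tilde{K}(x,z) = K(x,z) - k_x^\top \Sigma^{-1}(\Sigma - \tilde{\Sigma})\Sigma^{-1} k_z$, which avoids inverting $\tilde{\Pi}$ when some nodes appear in no linkage.

```python
import numpy as np

def posterior_kernel(x, z, X, k, Sigma, Sigma_post):
    """Evaluate K~(x, z) given training inputs X, base kernel k, prior
    covariance Sigma, and Sigma_post = Sigma~ from posterior_covariance().
    Equivalent to k(x, z) - k_x^T (Sigma + Pi~^{-1})^{-1} k_z by the
    Woodbury identity."""
    k_x = np.array([k(x, xi) for xi in X])
    k_z = np.array([k(z, xi) for xi in X])
    Sinv = np.linalg.inv(Sigma)
    middle = Sinv @ (Sigma - Sigma_post) @ Sinv
    return k(x, z) - k_x @ middle @ k_z
```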

Linkage Prediction
The joint probability of the function values at two test points $x_r$ and $x_s$ is the bivariate Gaussian
$$P(f_r, f_s \mid \varepsilon) = N\!\left(0,\; \begin{bmatrix} \tilde{K}(x_r, x_r) & \tilde{K}(x_r, x_s) \\ \tilde{K}(x_s, x_r) & \tilde{K}(x_s, x_s) \end{bmatrix}\right)$$
The probability of an edge between $x_r$ and $x_s$ follows by averaging the linkage likelihood under this joint:
$$P(\varepsilon_{rs} = 1 \mid \varepsilon) = \iint P(\varepsilon_{rs} = 1 \mid f_r, f_s)\, P(f_r, f_s \mid \varepsilon)\, df_r\, df_s$$
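A sketch of estimating this integral by simple Monte Carlo, reusing the hypothetical `link_likelihood` and `posterior_kernel` helpers defined earlier (so it inherits their assumptions):

```python
import numpy as np

def edge_probability(x_r, x_s, X, k, Sigma, Sigma_post,
                     sigma=1.0, n_samples=10000, seed=0):
    """Estimate P(eps_rs = 1 | eps) by averaging the linkage likelihood
    over samples from the bivariate posterior of (f_r, f_s)."""
    K = lambda a, b: posterior_kernel(a, b, X, k, Sigma, Sigma_post)
    C = np.array([[K(x_r, x_r), K(x_r, x_s)],
                  [K(x_s, x_r), K(x_s, x_s)]])
    f = np.random.default_rng(seed).multivariate_normal(
        np.zeros(2), C, size=n_samples)
    return float(np.mean(link_likelihood(f[:, 0], f[:, 1], sigma)))
```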

Semi-Supervised Learning
[Figure: a toy data set with two labeled points (shown as 1) and the remaining points unlabeled (shown as ?)]

Semi-Supervised Learning
[Figure: the same data with linkages from a nearest-neighborhood graph, K = 1]

Semi-Supervised Learning
[Figure: the same data with linkages from a nearest-neighborhood graph, K = 2]
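In the semi-supervised setting the linkages are derived from the input attributes. A sketch of one such construction, a symmetrized K-nearest-neighbour edge set matching the K = 1, 2 illustrations above (an assumption about the exact procedure):

```python
import numpy as np

def knn_edges(X, K=2):
    """Undirected edges (i, j) connecting each point to its K nearest
    neighbours under Euclidean distance, symmetrized by sorting pairs."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)              # exclude self-links
    edges = set()
    for i in range(len(X)):
        for j in np.argsort(D[i])[:K]:
            edges.add((min(i, j), max(i, j)))
    return edges
```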

Semi-Supervised Learning
Apply the RGP to obtain the data-dependent prior $P(f \mid \varepsilon) = N(0, \tilde{\Sigma})$
Labels and function values are related through probit noise: $P(y_x \mid f_x) = \Phi(y_x f_x / \sigma)$
Applying Bayes' rule gives the posterior $P(f \mid y, \varepsilon) \propto P(f \mid \varepsilon) \prod_x P(y_x \mid f_x)$

Semi-Supervised Learning
The predictive distribution of $f$ at a test point $x_t$ is Gaussian, $P(f_t \mid y, \varepsilon) = N(\mu_t, \sigma_t^2)$
Pushing it through the probit noise yields a Bernoulli distribution for classification:
$$P(y_t = 1 \mid y, \varepsilon) = \Phi\!\left(\frac{\mu_t}{\sqrt{\sigma^2 + \sigma_t^2}}\right)$$
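A sketch of this final classification step under the same assumptions, with mu_t and var_t standing for the posterior mean and variance of $f$ at the test point:

```python
from math import sqrt
from scipy.stats import norm

def predictive_class_prob(mu_t, var_t, sigma=1.0):
    """Bernoulli class probability obtained by pushing the Gaussian
    predictive N(mu_t, var_t) through the probit label noise:
    P(y_t = 1) = Phi(mu_t / sqrt(sigma^2 + var_t))."""
    return norm.cdf(mu_t / sqrt(sigma**2 + var_t))

print(predictive_class_prob(0.0, 1.0))  # 0.5: maximally uncertain
print(predictive_class_prob(2.0, 0.1))  # confidently positive
```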

Experiments
Experimental setup:
– Kernel function: centralized kernel, i.e. a linear or Gaussian kernel shifted to the empirical mean
– Noise levels: label noise (for RGP and GPC) and edge noise = [5 : 0.05]
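A sketch of the centering step for a kernel matrix, assuming the standard feature-space centering formula (the slides only say the kernel is "shifted to the empirical mean"):

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space, i.e. shift the implicit
    feature map to its empirical mean: K_c = H K H, H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H
```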

Results
[Figure: toy problem] Samples collected from a Gaussian mixture with two components on the x-axis; two labeled samples are indicated by a diamond and a circle. K = 3; the best value of 0.4 was selected based on approximate model evidence.

Results
[Figure] The posterior covariance matrix of the RGP learnt from the data: it captures the density information of the unlabelled data. Using this posterior covariance matrix as the new prior, supervised learning is carried out; the curves represent the predictive distribution for each class.

Experiments
Real-world experiment:
– Subset of the WebKB dataset, collected from the CS departments of 4 universities; it contains pages with hyperlinks interconnecting them, classified into 7 categories (e.g. student, course, other)
– Documents are preprocessed into vectors of input attributes
– Hyperlinks are translated into undirected positive linkages: two pages are likely to be positively correlated if they are hyperlinked by the same hub page; no negative linkages are used
– Compared with GPC and LapSVM (Sindhwani et al. 2005)

Results
Two classification tasks: Student vs. non-student and Other vs. non-other
10% of the samples were randomly selected as labeled data; the selection was repeated 100 times
Linear kernel
The table shows the average AUC for predicting the labels of the unlabeled cases
[Table: average AUC ± std. dev. of GPC, LapSVM, and RGP per university (Cornell, Texas, Washington, Wisconsin) on both tasks; most numeric entries were lost in extraction — the surviving GPC "Student or Not" values are Cornell 0.825, Texas 0.899, Washington 0.839, Wisconsin 0.883]

Conclusion
A novel Bayesian framework for learning from relational data, based on Gaussian processes
The RGP provides a data-dependent covariance function for supervised learning tasks (classification)
It was also applied to semi-supervised learning tasks
The RGP requires very few labels to generalize to unseen test points
– Unlabeled data is incorporated into model selection

Discussion
The proposed framework can be extended to model:
– Directed (asymmetric) as well as undirected relations
– Multiple classes of relations
– Graphs with weighted edges
The model should be compared against a broader set of baselines
The results can be sensitive to the choice of K in the KNN graph

Thanks! Questions?