On Community Outliers and their Efficient Detection in Information Networks
Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei Han 1
1 University of Illinois, 2 IBM T. J. Watson
Presented by Debapriya Basu

2 • Determine outliers in information networks
• Compare various algorithms that do the same

3 • Examples: the Internet, social networking sites
• Nodes: characterized by feature values
• Links: represent relationships between nodes

4 • Outliers: anomalies, novelties
• Different kinds of outliers
◦ Global
◦ Contextual

5

6 • A unified model that considers both nodes and links
• Community discovery and outlier detection are related processes

7 • Treat each object as a multivariate data point
• Use K components to describe normal community behavior and one component to denote outliers
• Introduce a hidden variable z_i at each object indicating its community
• Treat the network information as a graph
• Model the graph as a hidden Markov random field (HMRF) over the z_i
• Find a local maximum of the posterior probability, i.e., a local minimum of the potential energy, of the model

8 [Model illustration] Community labels Z, node features X, link structure W; model parameters Θ, e.g., high-income community (mean 116k, std 35k) and low-income community (mean 20k, std 12k); K: number of communities
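As a concrete reading of the example above, the community parameters Θ for continuous data can be held as one mean and one standard deviation per community; this layout is only illustrative, and the same arrays are reused in the sketches on the following slides.

import numpy as np

# Running example from the slide: K = 2 income communities
means = np.array([116_000.0, 20_000.0])   # high-income, low-income community means
stds  = np.array([35_000.0, 12_000.0])    # corresponding standard deviations
K = 2                                     # number of normal communities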

9 Notation (symbol : definition)
I = {1, 2, ..., i, ..., M} : indices of the objects
V = {v_1, v_2, ..., v_M} : set of objects
S = {s_1, s_2, ..., s_M} : given attributes of the objects
W (M x M) = {w_ij} : adjacency matrix containing the weights of the links
Z = {z_1, ..., z_M} : random variables for the hidden labels of the objects
X = {x_1, ..., x_M} : random variables for the observed data
N_i (i ∈ I) : neighborhood of object v_i
1, ..., k, ..., K : indices of the normal communities
Θ = {Θ_1, Θ_2, ..., Θ_K} : random variables for the model parameters

10
◦ The random variables X are conditionally independent given their labels:
  P(X = S | Z) = Π_i P(x_i = s_i | z_i)
◦ The k-th normal community is characterized by a set of parameters Θ_k:
  P(x_i = s_i | z_i = k) = P(x_i = s_i | Θ_k)
◦ Outliers are characterized by a uniform distribution:
  P(x_i = s_i | z_i = 0) = ρ_0
◦ A Markov random field is defined over the hidden variables Z:
  P(z_i | z_{I \ {i}}) = P(z_i | z_{N_i})
◦ The equivalent Gibbs distribution is
  P(Z) = (1/H_1) exp(-U(Z)),
  where H_1 is a normalizing constant and U(Z) is the sum of clique potentials
◦ Goal: find the configuration of Z that maximizes P(X = S | Z) P(Z) for a given Θ
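To make the objective concrete, below is a minimal sketch (not the authors' code) of log P(X = S | Z) + log P(Z) up to the constant -log H_1, assuming Gaussian communities, a uniform outlier density ρ_0, and a simple Potts-style clique potential λ·w_ij·1[z_i ≠ z_j] on linked pairs of non-outlier nodes; the potential form and all names are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def log_posterior(z, s, W, means, stds, rho0, lam):
    """Unnormalized log P(X=S|Z) + log P(Z) for a candidate labeling z.
    Convention: z[i] = 0 marks an outlier, z[i] = k in 1..K a normal community."""
    loglik = 0.0
    for i, zi in enumerate(z):
        if zi == 0:                                   # outlier: uniform density rho0
            loglik += np.log(rho0)
        else:                                         # normal community: Gaussian component
            loglik += norm.logpdf(s[i], means[zi - 1], stds[zi - 1])
    # Gibbs prior -U(Z): penalize linked non-outlier pairs carrying different labels
    neg_energy = 0.0
    M = len(z)
    for i in range(M):
        for j in range(i + 1, M):
            if W[i, j] > 0 and z[i] != 0 and z[j] != 0 and z[i] != z[j]:
                neg_energy -= lam * W[i, j]
    return loglik + neg_energy

Inference (slide 14) then searches for a labeling Z that makes this quantity as large as possible while Θ is held fixed.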

11 • Continuous data
◦ modeled as a Gaussian distribution
◦ model parameters: mean, standard deviation
• Text data
◦ modeled as a multinomial distribution
◦ model parameters: probability of each word appearing in a community
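A minimal sketch of the two per-community data likelihoods just described; function and parameter names are illustrative, and the multinomial term drops the count-dependent normalizing coefficient because it does not depend on the community.

import numpy as np
from scipy.stats import norm

def gaussian_loglik(value, mu, sigma):
    # Continuous feature: one Gaussian per community (mean mu, std sigma)
    return norm.logpdf(value, loc=mu, scale=sigma)

def multinomial_loglik(word_counts, word_probs):
    # Text feature: one multinomial per community; both arguments are aligned
    # over the same vocabulary, and word_probs sums to 1 for that community
    word_counts = np.asarray(word_counts, dtype=float)
    return float(np.sum(word_counts * np.log(np.asarray(word_probs) + 1e-12)))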

12 Iterative algorithm (Θ: model parameters, Z: community labels)
Initialize Z
Repeat until convergence:
  INFERENCE: given Θ, find Z that maximizes P(Z | X)
  PARAMETER ESTIMATION: given Z, find Θ that maximizes P(X | Z)
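A compact sketch of this alternation; estimate_parameters and infer_labels are spelled out under the next two slides, initialize_labels here is just a crude random start, and the iteration cap and stopping test are illustrative choices rather than the paper's exact procedure.

import numpy as np

def initialize_labels(s, K, seed=0):
    # crude starting point: random assignment to the K normal communities
    return np.random.default_rng(seed).integers(1, K + 1, size=len(s))

def fit(s, W, K, rho0, lam, a0, max_iter=20):
    """Alternate parameter estimation and inference until the labels stop changing."""
    z = initialize_labels(s, K)
    theta = None
    for _ in range(max_iter):
        theta = estimate_parameters(s, z, K)              # given Z, fit Theta (maximize P(X|Z))
        z_new = infer_labels(s, W, theta, rho0, lam, a0)  # given Theta, choose Z (maximize P(Z|X))
        if np.array_equal(z_new, z):                      # labels stable: stop
            break
        z = z_new
    return z, theta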

13 • Calculate the model parameters
◦ by maximum likelihood estimation
• Continuous data
◦ mean: sample mean of the community
◦ standard deviation: square root of the sample variance of the community
• Text data
◦ probability of a word appearing in the community: its empirical probability
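A minimal sketch of the maximum-likelihood update for the continuous case, using the convention that z == 0 marks outliers (which are excluded from every community's estimate); the text case would analogously use smoothed empirical word frequencies per community.

import numpy as np

def estimate_parameters(s, z, K):
    """Per-community sample mean and square root of the sample variance (ddof = 0)."""
    means, stds = np.zeros(K), np.ones(K)
    for k in range(1, K + 1):
        members = s[z == k]                      # objects currently assigned to community k
        if len(members) > 0:
            means[k - 1] = members.mean()
            stds[k - 1] = members.std() + 1e-6   # small guard against zero variance
    return means, stds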

14 • Calculate the z_i values
◦ given the model parameters,
◦ iteratively update the community labels of the nodes at each step,
◦ selecting for each node the label that maximizes P(z_i | x_i, z_{N_i})
• Calculating the P(z_i | x_i, z_{N_i}) values
◦ uses both the node features and the community labels of the neighbors when z_i indicates a normal community
◦ if the probability of a node belonging to every community is low enough, label it as an outlier
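A hedged sketch of these updates: ICM-style sweeps in which each node picks the community maximizing a local score that combines its own features with its neighbors' labels, and is declared an outlier when even the best community explains it poorly. The exact outlier test and score form are illustrative assumptions, not the paper's precise rule.

import numpy as np
from scipy.stats import norm

def infer_labels(s, W, theta, rho0, lam, a0, z_init=None, n_sweeps=5):
    """Iteratively update each node's label given the model parameters."""
    means, stds = theta
    K, M = len(means), len(s)
    z = np.ones(M, dtype=int) if z_init is None else np.array(z_init, dtype=int)
    for _ in range(n_sweeps):
        for i in range(M):
            neighbors = np.nonzero(W[i])[0]
            scores = np.empty(K)
            for k in range(1, K + 1):
                data_term = norm.logpdf(s[i], means[k - 1], stds[k - 1])         # node features
                link_term = lam * np.sum(W[i, neighbors] * (z[neighbors] == k))  # neighbor labels
                scores[k - 1] = data_term + link_term
            best = int(np.argmax(scores)) + 1
            # Outlier test (assumed form): even the best community is barely better
            # than the uniform outlier density, up to the threshold a0.
            if scores[best - 1] < np.log(rho0) + a0:
                z[i] = 0
            else:
                z[i] = best
    return z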

15 • Setting the hyperparameters
◦ a_0: outlier threshold
◦ λ: confidence in the network (weight of the link structure)
◦ K: number of communities
• Initialization
◦ outliers may initially be grouped into clusters
◦ this is eventually corrected over the iterations

16 • Data generation
◦ generate continuous data from Gaussian distributions and generate labels according to the model
◦ define r: percentage of outliers, and K: number of communities
• Baseline methods
◦ GLODA: global outlier detection (based on node features only)
◦ DNODA: local outlier detection (checks the feature values of direct neighbors)
◦ CNA: partition the data into communities based on links, then conduct outlier detection within each community
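A minimal sketch of the kind of synthetic data described here, assuming K well-separated Gaussian communities, a fraction r of uniformly distributed outliers, and links that are denser within communities than across them; all constants and the link model are illustrative.

import numpy as np

def generate_synthetic(M=500, K=5, r=0.05, p_in=0.10, p_out=0.01, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.integers(1, K + 1, size=M)                 # community labels 1..K
    z[rng.random(M) < r] = 0                           # fraction r become outliers
    means = np.linspace(0.0, 10.0 * K, K)              # well-separated community means
    s = np.where(z == 0,
                 rng.uniform(means.min(), means.max(), M),       # outliers: uniform
                 rng.normal(means[np.maximum(z, 1) - 1], 1.0))   # normals: Gaussian
    W = np.zeros((M, M))                               # links: denser within communities
    for i in range(M):
        for j in range(i + 1, M):
            p = p_in if (z[i] == z[j] and z[i] != 0) else p_out
            if rng.random() < p:
                W[i, j] = W[j, i] = 1.0
    return s, W, z

The returned s, W feed the detectors above, with the true labels z kept aside for evaluation.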

17

18 • Communities
◦ data mining, artificial intelligence, database, information analysis
• Sub-network of conferences
◦ links: percentage of common authors between two conferences
◦ node features: titles of papers published at the conference
• Sub-network of authors
◦ links: co-authorship relations
◦ node features: titles of publications by the author

19 Community outliers detected: CVPR, CIKM

20 • Community outliers
• Community outlier detection
Questions?

21 References
• On Community Outliers and their Efficient Detection in Information Networks. Gao, Liang, Fan, Wang, Sun, Han.
• Outlier Detection. Irad Ben-Gal.
• Automated Detection of Outliers in Real-World Data. Last, Kandel.
• Outlier Detection for High Dimensional Data. Aggarwal, Yu.