General Graphical Model Learning Schema

1 General Graphical Model Learning Schema
After Kimmig et al. The generic score-based search loop (a sketch in code follows below):
Initialize graph G := empty.
While not converged do:
  Generate candidate graphs.
  For each candidate graph C, learn parameters θC that maximize score(C, θ, dataset).
  G := argmaxC score(C, θC, dataset).
  Check convergence criterion.
For relational data, the score must be a relational score.
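A minimal Python sketch of this loop; generate_candidates, learn_parameters, and score are hypothetical placeholders for the components a concrete system would supply, not APIs from the tutorial:

```python
# Minimal sketch of the generic graphical model learning schema.
def structure_search(dataset, generate_candidates, learn_parameters, score,
                     max_iters=100):
    best_score, best_graph, best_theta = float("-inf"), frozenset(), None  # empty graph
    for _ in range(max_iters):
        improved = False
        for c in generate_candidates(best_graph):
            theta_c = learn_parameters(c, dataset)   # maximize score over parameters
            s = score(c, theta_c, dataset)           # relational score for relational data
            if s > best_score:
                best_score, best_graph, best_theta = s, c, theta_c
                improved = True
        if not improved:                             # convergence criterion: no score gain
            break
    return best_graph, best_theta
```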

2 Tutorial on Learning Bayesian Networks for Complex Relational Data
Parameter Learning (Section 3)

3 Overview: Upgrading Parameter Learning
Extend learning concepts and algorithms designed for i.i.d. data to relational data. This is called upgrading i.i.d. learning (Van Laer and De Raedt).
Score/objective function: the random selection likelihood.
Algorithm: the fast Möbius transform.
Van Laer, W. & De Raedt, L. (2001), 'How to upgrade propositional learners to first-order logic: A case study', in 'Relational Data Mining', Springer Verlag.

4 Likelihood Function for IID Data

5 Score-based Learning for IID data
Most Bayesian network learning methods are based on a score function. The score function measures how well the network fits the observed data. Its key component is the likelihood function, which measures how likely each datapoint is according to the Bayesian network; intuitively, how well the model explains each datapoint. Schematically: Bayesian network + data table → log-likelihood, e.g. -3.5.

6 The Bayes Net Likelihood Function for IID data
For each row, compute the log-likelihood of the attribute values in that row. The log-likelihood for the table is the sum of the log-likelihoods of its rows. The relational likelihood defined later generalizes this i.i.d. case, which has only one first-order variable, using random selection semantics and the instantiation principle.

7 IID Example
Bayesian network over movie attributes, with CP-table entries:
P(Action(M.)=T) = 1
P(Drama(M.)=T | Action(M.)=T) = 1/2
P(Horror(M.)=F | ...) = 1

Data table with the joint probability PB of each row (PB is the joint probability from the Bayes net, i.e. the product of conditional probabilities from the CP tables):

Title     | Drama | Action | Horror | PB                | ln(PB)
Fargo     | T     | T      | F      | 1 x 1/2 x 1 = 1/2 | -0.69
Kill_Bill | F     | T      | F      | 1 x 1/2 x 1 = 1/2 | -0.69

In this toy data table, Action is always true and Horror is always false. Total log-likelihood score for the table = -1.38. A sketch reproducing these numbers follows below.
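A sketch of the row-wise log-likelihood on this toy table; the CP-table values are hard-coded from the example, and the column values follow the reconstruction above:

```python
import math

# Joint probability of one row under the toy Bayes net above.
def joint_prob(row):
    p_action = 1.0 if row["Action"] else 0.0       # P(Action(M.)=T) = 1
    p_drama = 0.5                                  # P(Drama=T|Action=T) = P(Drama=F|Action=T) = 1/2
    p_horror = 1.0 if not row["Horror"] else 0.0   # P(Horror(M.)=F|...) = 1
    return p_action * p_drama * p_horror

table = [
    {"Title": "Fargo",     "Drama": True,  "Action": True, "Horror": False},
    {"Title": "Kill_Bill", "Drama": False, "Action": True, "Horror": False},
]

# Table log-likelihood = sum of per-row log-likelihoods.
log_lik = sum(math.log(joint_prob(row)) for row in table)
print(round(log_lik, 2))   # -1.39; the slide's -1.38 rounds each row to -0.69 first
```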

8 Likelihood Function for Relational Data

9 Wanted: a likelihood score for relational data
Schematically: database + Bayesian network → log-likelihood score, e.g. -3.5.
Problems: multiple tables; dependent data points.
Note that a likelihood score is not necessarily normalized: likelihood function = likelihood score / normalization constant (partition function).

10 The Random Selection Likelihood Score
Randomly select a grounding/instantiation for all first-order variables in the first-order Bayesian network.
Compute the log-likelihood for the attributes of the selected grounding.
Log-likelihood score = expected log-likelihood of a random grounding (a sketch follows below).
Generalizes the IID log-likelihood, but without the independence assumption.
Schulte, O. (2011), 'A tractable pseudo-likelihood function for Bayes Nets applied to relational data', in 'SIAM SDM'.
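A generic sketch, assuming groundings are the full cross product of the given domains and that the caller supplies joint_prob; both names are illustrative, not from the tutorial:

```python
import math
from itertools import product

def random_selection_log_likelihood(domains, joint_prob):
    """Expected log-likelihood of a uniformly random grounding.

    domains: dict mapping first-order variable name -> list of individuals.
    joint_prob: function mapping a grounding (dict: variable -> individual)
    to the Bayes net joint probability of that grounding's attributes.
    """
    groundings = [dict(zip(domains, combo)) for combo in product(*domains.values())]
    return sum(math.log(joint_prob(g)) for g in groundings) / len(groundings)
```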

11 Example
Bayesian network: gender(A) → ActsIn(A,M), with parameters
P(g(A)=M) = 1/2
P(ActsIn(A,M)=T | g(A)=M) = 1/4
P(ActsIn(A,M)=T | g(A)=W) = 2/4

Groundings with joint probabilities (PB is the joint probability from the Bayes net):

A             | M         | gender(A) | ActsIn(A,M) | PB  | ln(PB)
Brad_Pitt     | Fargo     | M         | F           | 3/8 | -0.98
Brad_Pitt     | Kill_Bill | M         | F           | 3/8 | -0.98
Lucy_Liu      | Fargo     | W         | F           | 2/8 | -1.39
Lucy_Liu      | Kill_Bill | W         | T           | 2/8 | -1.39
Steve_Buscemi | Fargo     | M         | T           | 1/8 | -2.08
Steve_Buscemi | Kill_Bill | M         | F           | 3/8 | -0.98
Uma_Thurman   | Fargo     | W         | F           | 2/8 | -1.39
Uma_Thurman   | Kill_Bill | W         | T           | 2/8 | -1.39

Arithmetic mean of ln(PB) = -1.32; geometric mean of PB = exp(-1.32) ≈ 0.27.
Data + Bayesian network → random selection likelihood value.
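A sketch reproducing this example's numbers, with the actor and link data as in the table above:

```python
import math

actors = {"Brad_Pitt": "M", "Steve_Buscemi": "M", "Lucy_Liu": "W", "Uma_Thurman": "W"}
movies = ["Fargo", "Kill_Bill"]
acts_in = {("Steve_Buscemi", "Fargo"), ("Lucy_Liu", "Kill_Bill"), ("Uma_Thurman", "Kill_Bill")}

p_gender = {"M": 0.5, "W": 0.5}         # P(g(A)=M) = P(g(A)=W) = 1/2
p_acts_t = {"M": 1 / 4, "W": 2 / 4}      # P(ActsIn(A,M)=T | gender)

# One log-likelihood term per grounding (actor, movie).
logs = []
for a, g in actors.items():
    for m in movies:
        p_link = p_acts_t[g] if (a, m) in acts_in else 1 - p_acts_t[g]
        logs.append(math.log(p_gender[g] * p_link))

arith = sum(logs) / len(logs)            # expected log-likelihood of a random grounding
print(round(arith, 2))                   # -1.32 (arithmetic mean of ln(PB))
print(round(math.exp(arith), 2))         # 0.27  (geometric mean of PB)
```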

12 Observed Frequencies Maximize Random Selection Likelihood
Proposition: The random selection log-likelihood score is maximized by setting the Bayesian network parameters to the observed conditional frequencies:
P(g(A)=M) = 1/2
P(ActsIn(A,M)=T | g(A)=M) = 1/4
P(ActsIn(A,M)=T | g(A)=W) = 2/4
To compute the first conditional probability: there are 4 actor-movie pairs where the actor is male (Brad Pitt x 2 + Steve Buscemi x 2). Of those 4, there is only one where the actor appears in the movie (Buscemi in Fargo). A sketch of this computation follows below.
Schulte, O. (2011), 'A tractable pseudo-likelihood function for Bayes Nets applied to relational data', in 'SIAM SDM'.
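A sketch of estimating the parameters as observed conditional frequencies, on the same toy data:

```python
actors = {"Brad_Pitt": "M", "Steve_Buscemi": "M", "Lucy_Liu": "W", "Uma_Thurman": "W"}
movies = ["Fargo", "Kill_Bill"]
acts_in = {("Steve_Buscemi", "Fargo"), ("Lucy_Liu", "Kill_Bill"), ("Uma_Thurman", "Kill_Bill")}

# Observed conditional frequency of ActsIn=T given the actor's gender.
for gender in ("M", "W"):
    pairs = [(a, m) for a, g in actors.items() if g == gender for m in movies]
    true_links = sum((a, m) in acts_in for (a, m) in pairs)
    print(gender, true_links, "/", len(pairs))   # M 1 / 4, W 2 / 4
```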

13 Computing Maximum Likelihood Parameter Values
The parameter values that maximize the random selection likelihood

14 Computing Relational Frequencies
Need to compute a contingency table with instantiation counts. This is well researched when all relationships are true: SQL COUNT(*), virtual joins, partition function reduction. For example, a contingency table over g(A), Acts(A,M), action(M) lists the instantiation count for each combination of values (one cell on the slide: g(A)=M, Acts(A,M)=F, action(M)=T has count 3); a sketch follows below. Counting has parametrized polynomial complexity in the number of first-order variables.
Vardi, M. Y. (1995), 'On the Complexity of Bounded-Variable Queries', in 'PODS', ACM Press.
Yin, X.; Han, J.; Yang, J. & Yu, P. S. (2004), 'CrossMine: Efficient Classification Across Multiple Database Relations', in 'ICDE'.
Venugopal, D.; Sarkhel, S. & Gogate, V. (2015), 'Just Count the Satisfied Groundings: Scalable Local-Search and Sampling Based Inference in MLNs', in 'AAAI'.
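A sketch of the SQL COUNT(*) approach using sqlite3; the schema, table names, and data are illustrative assumptions, not from the tutorial:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Actor(name TEXT, gender TEXT);
    CREATE TABLE Movie(title TEXT, action TEXT);
    CREATE TABLE ActsIn(actor TEXT, movie TEXT);
    INSERT INTO Actor VALUES ('Brad_Pitt','M'), ('Steve_Buscemi','M'),
                             ('Lucy_Liu','W'), ('Uma_Thurman','W');
    INSERT INTO Movie VALUES ('Fargo','F'), ('Kill_Bill','T');
    INSERT INTO ActsIn VALUES ('Steve_Buscemi','Fargo'),
                              ('Lucy_Liu','Kill_Bill'), ('Uma_Thurman','Kill_Bill');
""")
# Counts for value combinations where the relationship is TRUE are a simple
# join + GROUP BY; FALSE combinations would need complement tables, which
# the Möbius transform (next slides) avoids materializing.
for row in conn.execute("""
        SELECT a.gender, m.action, COUNT(*)
        FROM ActsIn r JOIN Actor a ON a.name = r.actor
                      JOIN Movie m ON m.title = r.movie
        GROUP BY a.gender, m.action"""):
    print(row)   # e.g. ('M', 'F', 1) and ('W', 'T', 2)
```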

15 Single Relation Case
For a single relation, compute PD(R = F) using the 1-minus trick (Getoor et al. 2003). Example:
PD(HasRated(User,Movie) = T) = 4.27%
PD(HasRated(User,Movie) = F) = 95.73%
How to generalize to multiple relations, e.g. PD(ActsIn(Actor,Movie)=F, HasRated(User,Movie)=F)? See the sketch below and the next slides.
Getoor, L.; Friedman, N.; Koller, D. & Taskar, B. (2003), 'Learning probabilistic models of link structure', Journal of Machine Learning Research 3.
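In code, the single-relation case is one subtraction (a trivial sketch; 4.27% is the slide's example frequency):

```python
# 1-minus trick: the database frequency of a false link is 1 minus the
# frequency of a true link, so no complement table is materialized.
p_true = 0.0427           # PD(HasRated(User,Movie) = T)
p_false = 1 - p_true      # PD(HasRated(User,Movie) = F)
print(f"{p_false:.2%}")   # 95.73%
```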

16 The Möbius Extension Theorem for negated relations
For two link types R1 and R2:
Joint probabilities: one parameter for each of the four true/false combinations of R1 and R2 (p1, p2, p3, p4).
Möbius parameters: probabilities of conjunctions of positive relationships only: q4 = P(R1=T, R2=T); q3 = P(R1=T); q2 = P(R2=T); q1 = P(nothing) = 1.
The theorem: the Möbius parameters uniquely determine the joint probabilities.
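As a worked instance of the theorem (notation as above; the identities are standard inclusion-exclusion):

```latex
\begin{align*}
P(R_1{=}T,\, R_2{=}T) &= q_4 \\
P(R_1{=}T,\, R_2{=}F) &= q_3 - q_4 \\
P(R_1{=}F,\, R_2{=}T) &= q_2 - q_4 \\
P(R_1{=}F,\, R_2{=}F) &= q_1 - q_2 - q_3 + q_4
\end{align*}
```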

17 The Fast Inverse Möbius Transform
Notation: R1 = ActsIn(A,M), R2 = HasRated(U,M); J.P. = joint probability; * means 'nothing specified'. Attribute conditions are ignored and the numbers are made up.

Initial table with no false relationships (Möbius parameters):
R1 | R2 | J.P.
T  | T  | 0.2
T  | *  | 0.3
*  | T  | 0.4
*  | *  | 1

After one pass, eliminating * for R2 (subtract: P(R1, R2=F) = P(R1, *) - P(R1, R2=T)):
R1 | R2 | J.P.
T  | T  | 0.2
T  | F  | 0.1
*  | T  | 0.4
*  | F  | 0.6

After a second pass, eliminating * for R1, the table with all joint probabilities:
R1 | R2 | J.P.
T  | T  | 0.2
T  | F  | 0.1
F  | T  | 0.2
F  | F  | 0.5

Exercise: trace the method (a sketch follows below).
Kennes, R. & Smets, P. (1990), 'Computational aspects of the Möbius transformation', in 'UAI'.
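A minimal sketch of the fast inverse Möbius transform for two relationships, reproducing the tables above; keys encode (R1, R2) with True/False for T/F and "*" for unspecified:

```python
from itertools import product

# Möbius parameters: no false relationships specified.
table = {(True, True): 0.2, (True, "*"): 0.3, ("*", True): 0.4, ("*", "*"): 1.0}

for i in reversed(range(2)):            # eliminate * for R2 first, then R1, as on the slide
    new = {}
    for key, p in table.items():
        if key[i] == "*":
            # Replace * by F: P(..., Ri=F, ...) = P(..., *, ...) - P(..., Ri=T, ...)
            t_key = key[:i] + (True,) + key[i + 1:]
            new[key[:i] + (False,) + key[i + 1:]] = p - table[t_key]
        else:
            new[key] = p
    table = new

for key in product((True, False), repeat=2):
    print(key, round(table[key], 2))
# (True, True) 0.2; (True, False) 0.1; (False, True) 0.2; (False, False) 0.5
```

Each relationship variable needs one pass, so a table over n relationships is converted with n sweeps rather than by querying complement tables.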

18 Parameter Learning Time
Fast inverse Möbius transform (IMT) vs. constructing complement tables using SQL; times are in seconds. The Möbius transform is much faster.

19 Using Presence and Absence of Relationships
Find correlations between links/relationships, not just between attributes given links. For example: if a user performs a web search for an item, is it likely that the user watches a movie about the item?
Example of a Weka-interesting association rule on the Financial benchmark dataset: statement_frequency(Account) = monthly ⇒ HasLoan(Account, Loan) = true.
Qian, Z.; Schulte, O. & Sun, Y. (2014), 'Computing Multi-Relational Sufficient Statistics for Large Databases', in 'CIKM'.

20 Summary
Random selection semantics → random selection log-likelihood.
The maximizing parameter values for the random selection log-likelihood are the observed empirical frequencies; this generalizes the maximum likelihood result for IID data.
The fast Möbius transform computes database frequencies for conjunctive formulas involving any number of negative relationships.
This enables link analysis: modelling probabilistic associations that involve the presence or absence of relationships.

