
1 A. Darwiche Learning in Bayesian Networks

2 The Learning Problem
Learning splits into four cases: known structure with complete data; known structure with incomplete data; unknown structure with complete data; unknown structure with incomplete data.

3 Known Structure, Complete Data

4 Known Structure, Incomplete Data

5 Unknown Structure, Complete Data

6 Unknown Structure, Incomplete Data

7 Known Structure
For the same network structure, Method A produces one set of CPTs (CPTs A) and Method B produces another (CPTs B).

8 Known Structure
The structure plus CPTs A defines the distribution $\Pr_A$; the structure plus CPTs B defines $\Pr_B$. Which probability distribution should we choose? A common criterion is to choose the distribution that maximizes the likelihood of the data.

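9 Known Structure
The structure plus CPTs A gives $\Pr_A$, and the structure plus CPTs B gives $\Pr_B$. For data D = {d1, …, d6}:
$$\Pr_A(D) = \Pr_A(d_1) \cdots \Pr_A(d_m) = \prod_{i=1}^{m} \Pr_A(d_i) \quad \text{(likelihood of the data given } \Pr_A\text{)}$$
$$\Pr_B(D) = \Pr_B(d_1) \cdots \Pr_B(d_m) = \prod_{i=1}^{m} \Pr_B(d_i) \quad \text{(likelihood of the data given } \Pr_B\text{)}$$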
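The likelihood criterion can be made concrete with a minimal Python sketch. It uses a hypothetical two-node network A → B with binary variables. The CPT values and the six data points are invented for illustration and are not the slide's data. The chain rule gives each complete data point's probability, and the data likelihood is the product over the points, as in the formula above.

```python
import math

# Hypothetical network A -> B; both variables take values 0 or 1.
# A CPT set is (theta_a, theta_b_given_a) with
#   theta_a[a] = Pr(A = a) and theta_b_given_a[a][b] = Pr(B = b | A = a).


def point_probability(cpts, point):
    """Probability of one complete data point (a, b) via the chain rule."""
    theta_a, theta_b_given_a = cpts
    a, b = point
    return theta_a[a] * theta_b_given_a[a][b]


def likelihood(cpts, data):
    """Pr(D): product of the data points' probabilities."""
    result = 1.0
    for point in data:
        result *= point_probability(cpts, point)
    return result


def log_likelihood(cpts, data):
    """log Pr(D): a sum of logs, which avoids underflow for large m."""
    return sum(math.log(point_probability(cpts, point)) for point in data)


cpts_a = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
cpts_b = ([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]])
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]  # six invented points

print(likelihood(cpts_a, data), likelihood(cpts_b, data))
```

On this invented data, CPTs B give a higher likelihood than CPTs A (81/262144 versus 64/262144), so the criterion would prefer $\Pr_B$.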

10 Maximizing Likelihood of Data
Complete data: there is a unique set of CPTs that maximizes the likelihood of the data. Incomplete data: there is no unique set of CPTs that maximizes the likelihood of the data.


12 Known Structure, Complete Data
Data D = {d1, …, d6}. Estimated parameter:
$$\theta_{d \mid b c} = \frac{\text{number of data points } d_i \text{ with } d, b, c}{\text{number of data points } d_i \text{ with } b, c}$$

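13 Known Structure, Complete Data
Data D = {d1, …, d6}. In general, the estimated parameter for a variable X with parents U is:
$$\theta_{x \mid u} = \frac{\text{number of data points } d_i \text{ with } x, u}{\text{number of data points } d_i \text{ with } u}$$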
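With complete data, each estimate is a ratio of counts. The following minimal Python sketch assumes each data point is a dict from variable name to value. It uses hypothetical binary variables B, C, D, where D has parents B and C, and the six records are invented for illustration.

```python
from collections import Counter


def ml_estimate(data, child, parents):
    """Maximum-likelihood CPT from complete data: theta[x, u] = N(x, u) / N(u).

    Only (x, u) pairs that occur in the data get an entry; a pair with an
    observed u but an unseen x implicitly has estimate 0.
    """
    joint = Counter()
    parent_counts = Counter()
    for point in data:
        u = tuple(point[p] for p in parents)
        joint[(point[child], u)] += 1
        parent_counts[u] += 1
    return {(x, u): n / parent_counts[u] for (x, u), n in joint.items()}


# Six invented complete records over hypothetical binary variables B, C, D.
data = [
    {"B": True, "C": True, "D": True},
    {"B": True, "C": True, "D": False},
    {"B": True, "C": True, "D": True},
    {"B": False, "C": True, "D": False},
    {"B": True, "C": False, "D": True},
    {"B": False, "C": False, "D": False},
]

theta = ml_estimate(data, "D", ["B", "C"])
print(theta[(True, (True, True))])  # fraction of (b, c) records that also have d
```

Two of the three records with b and c true also have d true, so the estimate of $\theta_{d \mid bc}$ is 2/3. Parent instantiations that never occur in the data get no estimate. Handling them (for example, with smoothing) is outside this sketch.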

14 Complexity
Network with n nodes, k parameters and m data points.
- Time complexity: $O(m\,k\,n)$ (straightforward implementation)
- Space complexity: $O(k)$ (the parameter count)

15 Known Structure, Incomplete Data
EM algorithm (Expectation-Maximization):
- Initialize the CPTs to random values.
- Repeat until convergence:
  - Estimate parameters (expected counts) using the current CPTs (E-step).
  - Update the CPTs using these estimates (M-step).

16 Known Structure, Incomplete Data
Estimated parameters at iteration i+1 (using the CPTs at iteration i):
$$\theta^{i+1}_{x \mid u} = \frac{\sum_{j=1}^{m} \Pr_i(x, u \mid d_j)}{\sum_{j=1}^{m} \Pr_i(u \mid d_j)}$$
$\Pr_0$ corresponds to the initial Bayesian network (random CPTs).

17 EM Algorithm
- The likelihood of the data cannot decrease from one iteration to the next.
- The algorithm is not guaranteed to return the network that globally maximizes the likelihood of the data.
- It is guaranteed to return a local maximum; use random restarts to look for better ones.
- The algorithm is stopped when:
  - the change in likelihood becomes very small, or
  - the change in parameters becomes very small.
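The EM loop, its stopping rule and random restarts can be sketched for a hypothetical network A → B in which A is sometimes unobserved (None). This is a minimal Python illustration and not SamIam's implementation. The E-step computes the expected counts $\sum_j \Pr_i(x, u \mid d_j)$ under the current CPTs. The M-step normalizes those counts. Iteration stops when the log-likelihood gain falls below a tolerance, and the best result over several random starts is kept. The data, seeds and tolerance are all assumptions.

```python
import math
import random

# Hypothetical network A -> B, both binary (0/1). Records are (a, b);
# a may be None (missing). B is always observed in this sketch.


def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]


def random_cpts(rng):
    """Random initial CPTs (Pr_0): strictly positive, rows sum to 1."""
    theta_a = normalize([rng.random() + 0.1 for _ in range(2)])
    theta_b = [normalize([rng.random() + 0.1 for _ in range(2)]) for _ in range(2)]
    return theta_a, theta_b


def posterior_a(cpts, b):
    """Pr_i(A = a | B = b) under the current CPTs."""
    theta_a, theta_b = cpts
    return normalize([theta_a[a] * theta_b[a][b] for a in range(2)])


def log_likelihood(cpts, data):
    theta_a, theta_b = cpts
    total = 0.0
    for a, b in data:
        if a is None:  # sum out the missing value of A
            total += math.log(sum(theta_a[x] * theta_b[x][b] for x in range(2)))
        else:
            total += math.log(theta_a[a] * theta_b[a][b])
    return total


def em_step(cpts, data):
    # E-step: expected counts sum_j Pr_i(a, b | d_j)
    count_a = [0.0, 0.0]
    count_ab = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in data:
        if a is None:
            weights = posterior_a(cpts, b)
        else:
            weights = [1.0 if x == a else 0.0 for x in range(2)]
        for x in range(2):
            count_a[x] += weights[x]
            count_ab[x][b] += weights[x]
    # M-step: normalize expected counts into new CPTs
    theta_a = [c / len(data) for c in count_a]
    theta_b = [normalize(row) for row in count_ab]
    return theta_a, theta_b


def run_em(data, seed, max_iters=500, tol=1e-10):
    cpts = random_cpts(random.Random(seed))
    history = [log_likelihood(cpts, data)]
    for _ in range(max_iters):
        cpts = em_step(cpts, data)
        history.append(log_likelihood(cpts, data))
        if history[-1] - history[-2] < tol:  # change in likelihood is very small
            break
    return cpts, history


def em_with_restarts(data, seeds):
    """Run EM from several random starts; keep the highest final likelihood."""
    best = None
    for seed in seeds:
        cpts, history = run_em(data, seed)
        if best is None or history[-1] > best[1][-1]:
            best = (cpts, history)
    return best


data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (None, 0), (None, 1), (None, 1)]
cpts, history = em_with_restarts(data, seeds=[0, 1, 2])
print(cpts, history[-1])
```

The recorded `history` lets you check the slide's first property directly: each entry should be at least as large as the one before it, up to floating-point noise.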

18 Complexity
Network with n nodes, k parameters, m data points and treewidth w.
- Time complexity (per iteration): $O(m\,k\,n\,2^w)$ (straightforward implementation)
- Space complexity: $O(k + n\,2^w)$ (parameter count plus space for inference)

19 Collaborative Filtering
Collaborative filtering (CF) finds items of interest to a user based on the preferences of other, similar users.
- It assumes that human behavior is predictable.

20 Where Is It Used?
- E-commerce: recommend products based on previous purchases or click-stream behavior (e.g., Amazon.com).
- Information sites: rate items based on previous user ratings (e.g., MovieLens, Jester).

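21 CF Example
User votes on four items ("-" means no vote):

User    Item 1  Item 2  Item 3  Item 4
John    5       -       3       2
Sam     -       4       1       5
Cindy   3       -       5       -
Bob     5       1       -       -

CF fills in Bob's missing votes: 5, 1, 3.5, 1.7.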

22 Memory-Based Algorithms
Use the entire database of user ratings to make predictions:
- Find users whose voting histories are similar to the active user's.
- Use these users' votes to predict ratings for products the active user has not voted on.
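One common memory-based scheme weights other users by the Pearson correlation of their votes with the active user's votes. It then predicts a vote as the active user's mean vote plus a correlation-weighted average of the other users' deviations from their own means. The Python sketch below applies this scheme to the ratings in the CF example table, with items indexed 0 to 3. The exact weighting and normalization are assumptions, so its predictions need not match the values shown on the slide.

```python
import math

# Ratings from the CF example table: item index -> vote (unrated items absent).
ratings = {
    "John": {0: 5, 2: 3, 3: 2},
    "Sam": {1: 4, 2: 1, 3: 5},
    "Cindy": {0: 3, 2: 5},
    "Bob": {0: 5, 1: 1},
}


def mean_vote(user):
    votes = ratings[user].values()
    return sum(votes) / len(votes)


def correlation(a, i):
    """Pearson correlation over the items both users voted on (0 if undefined)."""
    common = ratings[a].keys() & ratings[i].keys()
    ma, mi = mean_vote(a), mean_vote(i)
    num = sum((ratings[a][j] - ma) * (ratings[i][j] - mi) for j in common)
    den = math.sqrt(sum((ratings[a][j] - ma) ** 2 for j in common)
                    * sum((ratings[i][j] - mi) ** 2 for j in common))
    return num / den if den > 0 else 0.0


def predict(active, item):
    """Active user's mean plus normalized, correlation-weighted deviations."""
    weighted, norm = 0.0, 0.0
    for other, votes in ratings.items():
        if other == active or item not in votes:
            continue
        w = correlation(active, other)
        weighted += w * (votes[item] - mean_vote(other))
        norm += abs(w)
    base = mean_vote(active)
    return base + weighted / norm if norm > 0 else base


print(predict("Bob", 2), predict("Bob", 3))
```

For Bob, this scheme gives about 3.33 for the third item and 1.5 for the fourth.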

23 Model-Based Algorithms
Construct a model from the vote database, then use the model to predict the active user's ratings.

24 Bayesian Clustering
Use a naïve Bayes network to model the vote database:
- m vote variables, one per title, each representing the discrete vote values.
- One "cluster" variable C, representing user personalities.

25 Naïve Bayes
[Figure: the cluster node C is the sole parent of the vote nodes V1, V2, V3, …, Vm.]


27 Inference
- Evidence: known votes $v_k$ for titles $k \in I$.
- Query: title $j$, whose vote we need to predict.

Expected value of the vote, where w is the number of possible vote values:
$$p_j = \sum_{h=1}^{w} h \cdot \Pr(v_j = h \mid v_k : k \in I)$$
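Inference in the naïve Bayes model takes two steps. First, compute the posterior over the cluster C from the known votes. Then mix the per-cluster vote distributions of title j under that posterior and take the expectation. This minimal Python sketch uses a hypothetical model with two clusters, two titles and votes in {1, 2}; all the numbers are invented.

```python
def posterior_cluster(prior, cpt, evidence):
    """Pr(c | evidence) in a naive Bayes model; evidence maps title k -> vote h."""
    scores = []
    for c, p_c in enumerate(prior):
        score = p_c
        for k, h in evidence.items():
            score *= cpt[k][c][h - 1]  # Pr(V_k = h | c); votes are 1-based
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def expected_vote(prior, cpt, evidence, j):
    """p_j = sum_h h * Pr(V_j = h | evidence), mixing over clusters."""
    post = posterior_cluster(prior, cpt, evidence)
    num_values = len(cpt[j][0])
    return sum(
        h * sum(post[c] * cpt[j][c][h - 1] for c in range(len(prior)))
        for h in range(1, num_values + 1)
    )


# Hypothetical model: 2 clusters, 2 titles, votes in {1, 2}.
prior = [0.5, 0.5]
cpt = [
    [[0.8, 0.2], [0.2, 0.8]],  # title 0: Pr(V_0 = h | c) for c = 0, 1
    [[0.9, 0.1], [0.3, 0.7]],  # title 1
]
print(expected_vote(prior, cpt, {0: 2}, 1))
```

With a vote of 2 on title 0, the posterior over clusters is (0.2, 0.8). The expected vote on title 1 is therefore 1 × 0.42 + 2 × 0.58 = 1.58.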

28 Learning
Simplified Expectation Maximization (EM) algorithm with partial data. Initialize the CPTs with random values subject to the following constraints:
$$\theta_c = \Pr(c), \qquad \sum_{c \in C} \theta_c = 1$$
$$\theta_{v_k \mid c} = \Pr(v_k \mid c), \qquad \sum_{v_k} \theta_{v_k \mid c} = 1$$
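The learning step can be sketched in Python under several assumptions. Votes are integers 1..w, and each user is a dict from title index to vote, with unvoted titles simply absent. The cluster count and the vote data are hypothetical. Initialization draws random positive numbers and normalizes them so that both constraints hold. Each EM iteration computes every user's posterior over C, accumulates expected counts and renormalizes them.

```python
import math
import random


def random_distribution(rng, size):
    """Random strictly positive values normalized to sum to 1."""
    raw = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(raw)
    return [x / total for x in raw]


def init_cpts(rng, num_clusters, num_titles, num_values):
    # theta_c = Pr(c), summing to 1 over the clusters
    prior = random_distribution(rng, num_clusters)
    # theta_{v_k|c} = Pr(v_k | c), summing to 1 over each title's vote values
    cpt = [[random_distribution(rng, num_values) for _ in range(num_clusters)]
           for _ in range(num_titles)]
    return prior, cpt


def joint_scores(prior, cpt, votes):
    """Pr(c, observed votes) for each cluster c; unvoted titles are summed out."""
    scores = []
    for c, p_c in enumerate(prior):
        s = p_c
        for k, h in votes.items():
            s *= cpt[k][c][h - 1]
        scores.append(s)
    return scores


def log_likelihood(prior, cpt, users):
    return sum(math.log(sum(joint_scores(prior, cpt, votes))) for votes in users)


def em_iteration(prior, cpt, users):
    num_clusters, num_values = len(prior), len(cpt[0][0])
    count_c = [0.0] * num_clusters
    count_vc = [[[0.0] * num_values for _ in range(num_clusters)] for _ in cpt]
    for votes in users:
        # E-step: posterior over the hidden cluster given this user's votes
        scores = joint_scores(prior, cpt, votes)
        total = sum(scores)
        for c in range(num_clusters):
            weight = scores[c] / total
            count_c[c] += weight
            for k, h in votes.items():
                count_vc[k][c][h - 1] += weight
    # M-step: renormalize the expected counts
    new_prior = [n / len(users) for n in count_c]
    new_cpt = []
    for k, title_counts in enumerate(count_vc):
        rows = []
        for c, row in enumerate(title_counts):
            row_total = sum(row)
            # a row with no expected counts keeps its previous values
            rows.append([n / row_total for n in row] if row_total > 0 else list(cpt[k][c]))
        new_cpt.append(rows)
    return new_prior, new_cpt


# Hypothetical vote database: 3 titles, votes in 1..3, some votes missing.
users = [
    {0: 1, 1: 1, 2: 1}, {0: 1, 1: 1}, {0: 3, 2: 3},
    {1: 3, 2: 3}, {0: 3, 1: 3, 2: 3}, {0: 1, 2: 1},
]
prior, cpt = init_cpts(random.Random(0), num_clusters=2, num_titles=3, num_values=3)
history = [log_likelihood(prior, cpt, users)]
for _ in range(25):
    prior, cpt = em_iteration(prior, cpt, users)
    history.append(log_likelihood(prior, cpt, users))
print(prior, history[-1])
```

The constraints hold after every iteration, not just at initialization, because the M-step renormalizes each count vector.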

29 Datasets
- MovieLens: 943 users; 1,682 titles; 100,000 votes (1..5); explicit voting.
- MS Web (website visits): 610 users; 294 titles; 8,275 votes (0, 1). Null votes are treated as 0, giving 179,340 votes (610 × 294); implicit voting.

30 Learning Curve for the MovieLens Dataset

31 Protocols
The user database is divided into an 80% training set and a 20% test set.
- One by one, select a user from the test set to be the active user.
- Predict some of their votes based on their remaining votes.
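The 80/20 split can be sketched in Python as a seeded shuffle of user IDs. This function is illustrative only. The seed and the rounding rule are assumptions, since the slide gives only the proportions.

```python
import random


def split_users(user_ids, train_fraction=0.8, seed=0):
    """Shuffle the users and split them into training and test sets."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


train, test = split_users(range(943))
for active_user in test:
    pass  # each test user in turn is the active user; some votes get predicted
print(len(train), len(test))
```

With MovieLens's 943 users, this yields 754 training users and 189 test users.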

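32 All-But-One and Given-{Two, Five, Ten}
[Figure: for each protocol, the active user's votes $I_a$ are split into evidence (e) and queried votes (Q). All-But-One queries a single vote and uses the rest as evidence; Given-Two, Given-Five and Given-Ten use two, five or ten votes as evidence and query the rest.]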
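Each protocol decides which of the active user's votes are revealed as evidence and which are queried. The Python sketch below assumes titles are chosen uniformly at random. The function name and the protocol encoding are hypothetical. A user with too few votes for a Given-N protocol would leave nothing to query, and this sketch leaves skipping such users to the caller.

```python
import random


def protocol_split(votes, protocol, rng):
    """Split an active user's votes (title -> vote) into evidence and queries.

    protocol is "all-but-one" or an integer n for Given-n (n = 2, 5 or 10).
    """
    titles = list(votes)
    rng.shuffle(titles)
    if protocol == "all-but-one":
        n_evidence = len(titles) - 1  # hide exactly one vote
    else:
        n_evidence = protocol  # reveal n votes, query the rest
    evidence = {t: votes[t] for t in titles[:n_evidence]}
    queries = {t: votes[t] for t in titles[n_evidence:]}
    return evidence, queries


rng = random.Random(0)
votes = {title: title % 5 + 1 for title in range(12)}  # invented active user
for protocol in ["all-but-one", 2, 5, 10]:
    evidence, queries = protocol_split(votes, protocol, rng)
    print(protocol, len(evidence), len(queries))
```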

33 Evaluation Metrics
- Average absolute deviation
- Ranked scoring
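The slide names the two metrics but does not define them, so the following Python sketch rests on assumptions. Average absolute deviation is taken as the mean of |predicted − actual| over the queried votes (lower is better). Ranked scoring is assumed to follow the half-life utility formulation of Breese, Heckerman and Kadie (1998). Queried titles are ranked by predicted vote, and a title at rank j contributes max(v − d, 0) / 2^((j−1)/(α−1)). The result is reported as a percentage of the utility of the ideal ranking. The default vote d and the half-life α = 5 are assumed parameters.

```python
def average_absolute_deviation(predicted, actual):
    """Mean |p - v| over the queried votes (lower is better)."""
    return sum(abs(predicted[t] - actual[t]) for t in actual) / len(actual)


def ranked_utility(order, actual, default_vote, half_life):
    """Utility of a ranked list; rank j is discounted by 2^((j-1)/(half_life-1))."""
    return sum(max(actual[t] - default_vote, 0) / 2 ** ((j - 1) / (half_life - 1))
               for j, t in enumerate(order, start=1))


def ranked_score(users, default_vote, half_life=5):
    """users: list of (predicted, actual) dicts over the same query titles.

    Returns 100 * (achieved utility) / (utility of the ideal ranking).
    """
    total, best = 0.0, 0.0
    for predicted, actual in users:
        by_prediction = sorted(actual, key=lambda t: predicted[t], reverse=True)
        by_truth = sorted(actual, key=lambda t: actual[t], reverse=True)
        total += ranked_utility(by_prediction, actual, default_vote, half_life)
        best += ranked_utility(by_truth, actual, default_vote, half_life)
    return 100.0 * total / best if best > 0 else 0.0


print(average_absolute_deviation({0: 4, 1: 2}, {0: 5, 1: 1}))
print(ranked_score([({0: 1, 1: 4}, {0: 5, 1: 1})], default_vote=3))
```

A ranking that puts the best-liked title first scores 100. Reversing it here pushes the only liked title to rank 2, which discounts its utility by 2^(1/4).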

34 Results
Experiments were run 5 times and the results averaged.
MovieLens:

Algorithm     Given-Two  Given-Five  Given-Ten  All-But-One
Correlation   1.019      0.916       0.865      0.806
VecSim        0.948      0.878       0.843      0.799
BC(9)         0.771      0.765       0.763      0.753

35 MS Web

Algorithm     Given-Two  Given-Five  Given-Ten  All-But-One
Correlation   0.105      0.0911      0.0844     0.0673
VecSim        0.1010     0.0885      0.0818     0.0675
BC(9)         0.0652     -           0.0649     0.0507

36 Computational Issues
- Prediction time: memory-based, 10 minutes per experiment; model-based, 2 minutes.
- Learning time: 20 minutes per iteration.

n: number of data points; m: number of titles; w: number of possible vote values per title; |C|: number of personality types.

Algorithm      Prediction Time   Learning Time     Space
Memory-based   O(n*m)            N/A               O(n*m)
Model-based    O(|C|*m)          O(n*m*|C|*w)      O(|C|*m*w)

37 Demo of SamIam
- Building networks: nodes, edges, CPTs
- Inference: posterior marginals, MPE, MAP
- Learning: EM
- Sensitivity engine

