CS188: Computational Models of Human Behavior

Presentation transcript:

CS188: Computational Models of Human Behavior. Introduction to graphical models. Slide credits: Kevin Murphy, Mark Paskin, Zoubin Ghahramani, and Jeff Bilmes.

Reasoning under uncertainty. In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information. For example, we might deploy a burglar alarm to detect intruders, but the sensor could also be triggered by other events, e.g., an earthquake. Probabilities quantify our uncertainty regarding the occurrence of events.

Probability spaces. A probability space represents our uncertainty regarding an experiment. It has two parts: a sample space Ω, which is the set of outcomes, and a probability measure P, which is a real-valued function on the subsets of Ω. A set of outcomes A ⊆ Ω is called an event. P(A) represents how likely it is that the experiment's actual outcome will be a member of A.

An example. If our experiment is to deploy a burglar alarm and see if it works, then there are four possible outcomes: Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}. Our choice of P has to obey these simple rules …

The three axioms of probability theory: (1) P(A) ≥ 0 for all events A; (2) P(Ω) = 1; (3) P(A ∪ B) = P(A) + P(B) for disjoint events A and B.

Some consequences of the axioms
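For reference, some standard consequences that follow directly from the three axioms (a reminder, not the slide's own list):

```latex
P(\emptyset) = 0, \qquad 0 \le P(A) \le 1
P(A^c) = 1 - P(A)
A \subseteq B \;\Rightarrow\; P(A) \le P(B)
P(A \cup B) = P(A) + P(B) - P(A \cap B)
```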

Example. Let's assign a probability to each outcome ω. These probabilities must be non-negative and sum to one:

              intruder    no intruder
  alarm         0.002        0.003
  no alarm      0.001        0.994

Conditional Probability

Marginal probability. The marginal probability is the unconditional probability P(A) of the event A; that is, the probability of A regardless of whether event B did or did not occur. For example, if there are two mutually exclusive and exhaustive events B and B′, then P(A) = P(A ∩ B) + P(A ∩ B′). This is called marginalization.

Example. If P is defined by

              intruder    no intruder
  alarm         0.002        0.003
  no alarm      0.001        0.994

then P({(intruder, alarm)} | {(intruder, alarm), (no intruder, alarm)}), the probability of an intruder given that the alarm went off, is 0.002 / (0.002 + 0.003) = 0.4.
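As a sketch of these manipulations in code (the joint table is from the example above; the code itself is illustrative and not part of the slides), marginalization and conditioning can be computed directly from the table:

```python
# Joint distribution over (alarm, intruder) from the example table.
joint = {
    ("alarm", "intruder"): 0.002,
    ("alarm", "no intruder"): 0.003,
    ("no alarm", "intruder"): 0.001,
    ("no alarm", "no intruder"): 0.994,
}

# Marginalization: P(alarm) = sum over the intruder variable.
p_alarm = sum(p for (a, i), p in joint.items() if a == "alarm")

# Conditioning: P(intruder | alarm) = P(intruder, alarm) / P(alarm).
p_intruder_given_alarm = joint[("alarm", "intruder")] / p_alarm

print(p_alarm)                  # 0.005
print(p_intruder_given_alarm)   # 0.4
```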

The product rule. The probability that A and B both happen is the probability that A happens, times the probability that B happens given that A has occurred: P(A ∩ B) = P(A) P(B | A).

The chain rule. Applying the product rule repeatedly: P(A1, A2, …, Ak) = P(A1) P(A2 | A1) P(A3 | A2, A1) … P(Ak | Ak-1, …, A1), where P(A3 | A2, A1) is shorthand for P(A3 | A2 ∩ A1).

Bayes' rule. Use the product rule both ways on P(A ∩ B): P(A ∩ B) = P(A) P(B | A) and P(A ∩ B) = P(B) P(A | B), so P(A | B) = P(B | A) P(A) / P(B).

Random variables and densities

Inference. One of the central problems of computational probability theory. Many problems can be formulated in these terms. Example: the probability that there is an intruder given that the alarm went off is p_{I|A}(true, true). Inference requires manipulating densities.

Probabilistic graphical models. A combination of graph theory and probability theory. The graph structure specifies which parts of the system are directly dependent; local functions at each node specify how the different parts interact. Bayesian networks = probabilistic graphical models based on a directed acyclic graph. Markov networks = probabilistic graphical models based on an undirected graph.

Some broad questions

Bayesian Networks. Nodes are random variables; edges represent direct dependence (no directed cycles allowed). By the chain rule, P(X1:N) = P(X1) P(X2 | X1) P(X3 | X1, X2) … = ∏_i P(Xi | X1:i-1); the conditional independencies encoded by the graph reduce this to ∏_i P(Xi | X_pa(i)), where pa(i) denotes the parents of node i. [Figure: an example DAG over nodes x1, …, x7.]

Example: water sprinkler Bayes net.
P(C, S, R, W) = P(C) P(S|C) P(R|C, S) P(W|C, S, R)   (chain rule)
             = P(C) P(S|C) P(R|C) P(W|C, S, R)        (since R ⊥ S | C)
             = P(C) P(S|C) P(R|C) P(W|S, R)           (since W ⊥ C | S, R)
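To make the factorization concrete, here is a minimal sketch of the water-sprinkler network in Python; the CPT values are assumed for illustration (they are not given in the transcript), and a query is answered by brute-force enumeration:

```python
# Water-sprinkler network: Cloudy, Sprinkler, Rain, WetGrass.
from itertools import product

P_C = {True: 0.5, False: 0.5}                      # P(C)
P_S = {True: {True: 0.1, False: 0.9},              # P(S | C): P_S[c][s]
       False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2},              # P(R | C): P_R[c][r]
       False: {True: 0.2, False: 0.8}}
P_W = {(True, True): {True: 0.99, False: 0.01},    # P(W | S, R): P_W[(s, r)][w]
       (True, False): {True: 0.9, False: 0.1},
       (False, True): {True: 0.9, False: 0.1},
       (False, False): {True: 0.0, False: 1.0}}

def joint(c, s, r, w):
    """Joint probability from the factorization P(C) P(S|C) P(R|C) P(W|S,R)."""
    return P_C[c] * P_S[c][s] * P_R[c][r] * P_W[(s, r)][w]

# Inference by enumeration: P(S = true | W = true).
num = sum(joint(c, True, r, True) for c, r in product([True, False], repeat=2))
den = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(num / den)
```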

Inference

Naïve inference

Problems with a naïve representation of the joint probability. Representation: a big table of numbers is hard to understand. Inference: computing a marginal P(Xi) takes O(2^N) time. Learning: there are O(2^N) parameters to estimate. Graphical models solve these problems by providing a structured representation of the joint. Graphs encode conditional independence properties and represent families of probability distributions that satisfy those properties.
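A small illustrative sketch (not from the slides) of the parameter-count difference between the full joint table and the Bayes-net factorization for binary variables:

```python
def full_joint_params(n):
    """A full joint over n binary variables needs 2^n - 1 free parameters."""
    return 2 ** n - 1

def bayes_net_params(parent_counts):
    """Each binary node with k binary parents needs 2^k free parameters
    (one P(X=1 | parents) entry per parent configuration)."""
    return sum(2 ** k for k in parent_counts)

# Water sprinkler: C has 0 parents, S and R have 1 each, W has 2.
print(full_joint_params(4))            # 15
print(bayes_net_params([0, 1, 1, 2]))  # 7
```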

Bayesian networks provide a compact representation of the joint probability

Conditional probabilities

Another example: medical diagnosis (classification)

Approach: build a Bayes' net and use Bayes' rule to get the class probability

A very simple Bayes’ net: Naïve Bayes

Naïve Bayes classifier for medical diagnosis
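A minimal sketch of such a classifier (the structure and all numbers below are illustrative assumptions, not the slides' model): the class variable has conditionally independent symptom children, and Bayes' rule gives the class posterior:

```python
import math

prior = {"disease": 0.01, "healthy": 0.99}   # P(C), illustrative numbers
likelihood = {                               # P(symptom_i = true | C), illustrative
    "disease": [0.9, 0.7, 0.8],
    "healthy": [0.1, 0.2, 0.05],
}

def posterior(symptoms):
    """P(C | x1..xn) ∝ P(C) * prod_i P(xi | C), computed in log space."""
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for p, x in zip(likelihood[c], symptoms):
            score += math.log(p if x else 1.0 - p)
        log_scores[c] = score
    # Normalize (Bayes' rule).
    z = sum(math.exp(s) for s in log_scores.values())
    return {c: math.exp(s) / z for c, s in log_scores.items()}

print(posterior([True, True, False]))
```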

Another commonly used Bayes’ net: Hidden Markov Model (HMM)
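A sketch of the forward algorithm for a small discrete HMM (the toy parameters below are assumed, not from the slides); it computes the probability of an observation sequence with the standard alpha recursion:

```python
import numpy as np

pi = np.array([0.6, 0.4])          # initial state distribution P(q_1)
A = np.array([[0.7, 0.3],          # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],          # B[i, k] = P(o_t = k | q_t = i)
              [0.2, 0.8]])

def forward(obs):
    """alpha[t, i] = P(o_1..o_t, q_t = i); returns P(o_1..o_T)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 0]))
```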

Conditional independence properties of Bayesian networks: chains

Conditional independence properties of Bayesian networks: common cause

Conditional independence properties of Bayesian networks: explaining away

Global Markov properties of DAGs

Bayes ball algorithm
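As a sketch of the same conditional-independence test (using the equivalent ancestral-moral-graph criterion rather than the Bayes-ball message rules themselves; the example DAG is the water-sprinkler network):

```python
def d_separated(dag, x, y, given):
    """dag maps each node to its list of parents."""
    # 1. Restrict to ancestors of the variables involved.
    relevant = set()
    stack = [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(dag[v])
    # 2. Moralize: connect co-parents, then treat all edges as undirected.
    edges = set()
    for v in relevant:
        parents = [p for p in dag[v] if p in relevant]
        edges.update((v, p) for p in parents)
        edges.update((p, q) for p in parents for q in parents if p != q)
    adj = {v: {b for a, b in edges if a == v} | {a for a, b in edges if b == v}
           for v in relevant}
    # 3. Remove the conditioning set and test reachability from x to y.
    blocked = set(given)
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False          # a connecting path exists, so not d-separated
        for w in adj[v] - seen:
            if w not in blocked:
                seen.add(w)
                stack.append(w)
    return True

dag = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}   # node -> parents
print(d_separated(dag, "S", "R", ["C"]))        # True:  S ⊥ R | C
print(d_separated(dag, "S", "R", ["C", "W"]))   # False: explaining away
```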

Example

Undirected graphical models

Parameterization

Clique potentials

Interpretation of clique potentials

Examples

Joint distribution of an undirected graphical model. P(x) = (1/Z) ∏_C ψ_C(x_C), where Z = Σ_x ∏_C ψ_C(x_C) is the partition function. Complexity scales exponentially, as 2^n for n binary random variables, if we use a naïve approach to computing the partition function.
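A minimal sketch (not from the slides) of that naïve computation: brute-force evaluation of the partition function Z for a tiny pairwise model over binary variables:

```python
from itertools import product

edges = [("a", "b"), ("b", "c"), ("c", "a")]    # a small 3-node structure
variables = ["a", "b", "c"]

def pair_potential(xi, xj):
    return 2.0 if xi == xj else 1.0             # psi(x_i, x_j) favors agreement

def unnormalized(assignment):
    p = 1.0
    for i, j in edges:
        p *= pair_potential(assignment[i], assignment[j])
    return p

# Naive partition function: sum over all 2^n joint assignments.
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in product([0, 1], repeat=len(variables)))

# Normalized probability of one configuration.
x = {"a": 1, "b": 1, "c": 0}
print(unnormalized(x) / Z)
```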

Max clique vs. sub-clique

Log-linear models

Log-linear models

Log-linear models

Summary

Summary

From directed to undirected graphs

From directed to undirected graphs

Example of moralization
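A short sketch of moralization in code (illustrative, not from the slides): marry the parents of each node, then drop edge directions; the example DAG is the water-sprinkler network:

```python
def moralize(dag):
    """dag maps each node to its list of parents; returns undirected edges."""
    edges = set()
    for child, parents in dag.items():
        for p in parents:
            edges.add(frozenset((child, p)))        # keep child-parent edges
        for i, p in enumerate(parents):             # marry all pairs of parents
            for q in parents[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

dag = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
for e in moralize(dag):
    print(sorted(e))   # C-S, C-R, S-W, R-W, plus the added S-R "moral" edge
```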

Comparing directed and undirected models

Expressive power. [Figure: example graphs over variables w, x, y, z comparing the independence structures that directed and undirected models can represent.]

Coming back to inference

Coming back to inference

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees

Belief propagation in trees
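As a sketch of sum-product message passing (illustrative potentials, not from the slides), here is belief propagation on a three-node chain, the simplest tree; the computed marginal is checked against brute-force enumeration:

```python
import numpy as np

phi = [np.array([0.7, 0.3]),      # unary potentials phi_i(x_i)
       np.array([0.5, 0.5]),
       np.array([0.2, 0.8])]
psi12 = np.array([[2.0, 1.0],     # pairwise potential psi(x1, x2)
                  [1.0, 2.0]])
psi23 = np.array([[2.0, 1.0],     # pairwise potential psi(x2, x3)
                  [1.0, 2.0]])

# Message from x1 to x2: m_{1->2}(x2) = sum_{x1} phi_1(x1) psi12(x1, x2)
m12 = phi[0] @ psi12
# Message from x3 to x2: m_{3->2}(x2) = sum_{x3} psi23(x2, x3) phi_3(x3)
m32 = psi23 @ phi[2]

belief = phi[1] * m12 * m32
belief /= belief.sum()            # marginal P(x2)
print(belief)

# Brute-force check against the full joint.
joint = np.einsum("i,j,k,ij,jk->ijk", phi[0], phi[1], phi[2], psi12, psi23)
print(joint.sum(axis=(0, 2)) / joint.sum())
```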

Learning

Parameter Estimation

Parameter Estimation

Maximum-likelihood Estimation (MLE)

Example: 1-D Gaussian
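A sketch of the 1-D Gaussian MLE on synthetic data (the data and numbers are illustrative): the maximum-likelihood estimates are the sample mean and the biased sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

mu_hat = data.mean()                          # mu_hat = (1/N) sum x_n
sigma2_hat = ((data - mu_hat) ** 2).mean()    # divides by N, not N-1

print(mu_hat, sigma2_hat)
```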

MLE for Bayes’ Net

MLE for Bayes’ Net

MLE for Bayes’ Net with Discrete Nodes
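A sketch of MLE with fully observed discrete data (the training cases below are made up for illustration): each CPT entry is a normalized count, P̂(X = x | pa(X) = u) = N(x, u) / N(u):

```python
from collections import Counter

# Fully observed training cases for the sprinkler net (illustrative data).
data = [
    {"C": 1, "S": 0, "R": 1, "W": 1},
    {"C": 1, "S": 0, "R": 1, "W": 1},
    {"C": 0, "S": 1, "R": 0, "W": 1},
    {"C": 0, "S": 0, "R": 0, "W": 0},
]

def mle_cpt(data, child, parents):
    counts = Counter((tuple(d[p] for p in parents), d[child]) for d in data)
    parent_counts = Counter(tuple(d[p] for p in parents) for d in data)
    return {(u, x): n / parent_counts[u] for (u, x), n in counts.items()}

print(mle_cpt(data, "S", ["C"]))        # e.g. P_hat(S=0 | C=1) = 1.0
print(mle_cpt(data, "W", ["S", "R"]))
```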

Parameter Estimation with Hidden Nodes. [Figure: a graphical model with hidden variables Z, Z1, …, Z6.]

Why is learning harder?

Where do hidden variables come from?

Parameter Estimation with Hidden Nodes. [Figure: model diagrams with a hidden variable z.]

EM
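As a sketch of EM on a concrete model (a two-component 1-D Gaussian mixture with made-up data; the slides' own example may differ), alternating E and M steps:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

w = np.array([0.5, 0.5])       # mixing weights
mu = np.array([-1.0, 1.0])     # initial means
var = np.array([1.0, 1.0])     # initial variances

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibilities r[n, k] = P(component k | x_n)
    r = w * gauss(x[:, None], mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from expected assignments
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)
```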

Different learning conditions:

                       Observability
  Structure            Full              Partial
  Known                Closed form       EM
  Unknown              Local search      Structural EM