1 CMSC 671 Fall 2001 Class #25-26 – Tuesday, November 27 / Thursday, November 29

2 Today’s class Neural networks Bayesian learning

3 Machine Learning: Neural and Bayesian Chapter 19 Some material adapted from lecture notes by Lise Getoor and Ron Parr

4 Neural function Brain function (thought) occurs as the result of the firing of neurons Neurons connect to each other through synapses, which propagate action potentials (electrical impulses) by releasing neurotransmitters Synapses can be excitatory (potential-increasing) or inhibitory (potential-decreasing), and have varying activation thresholds Learning occurs as a result of the synapses’ plasticity: They exhibit long-term changes in connection strength There are about 10^11 neurons and about 10^14 synapses in the human brain

5 Biology of a neuron

6 Brain structure Different areas of the brain have different functions –Some areas seem to have the same function in all humans (e.g., Broca’s region); the overall layout is generally consistent –Some areas are more plastic, and vary in their function; also, the lower-level structure and function vary greatly We don’t know how different functions are “assigned” or acquired –Partly the result of the physical layout / connection to inputs (sensors) and outputs (effectors) –Partly the result of experience (learning) We really don’t understand how this neural structure leads to what we perceive as “consciousness” or “thought” Our neural networks are not nearly as complex or intricate as the actual brain structure

7 Comparison of computing power Computers are way faster than neurons… But there are a lot more neurons than we can reasonably model in modern digital computers, and they all fire in parallel Neural networks are designed to be massively parallel The brain is effectively a billion times faster

8 Neural networks Neural networks are made up of nodes or units, connected by links Each link has an associated weight and activation level Each node has an input function (typically summing over weighted inputs), an activation function, and an output

9 Layered feed-forward network (diagram: input units → hidden units → output units)

10 Neural unit

11 “Executing” neural networks Input units are set by some exterior function (think of these as sensors), which causes their output links to be activated at the specified level Working forward through the network, the input function of each unit is applied to compute the input value –Usually this is just the weighted sum of the activation on the links feeding into this node The activation function transforms this input function into a final value –Typically this is a nonlinear function, often a sigmoid function corresponding to the “threshold” of that node
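
As a concrete illustration of this forward ("execution") pass, here is a minimal sketch in Python/NumPy for a single-hidden-layer network with sigmoid activations; the layer sizes, weights, and function names are illustrative assumptions, not part of the original slides.

    import numpy as np

    def sigmoid(z):
        # Nonlinear activation function: maps the weighted input into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, w_hidden, w_output):
        # Input function of each unit: weighted sum of incoming activations;
        # activation function: sigmoid applied to that sum
        hidden = sigmoid(w_hidden @ x)       # hidden-unit activations
        return sigmoid(w_output @ hidden)    # output-unit activations

    # Example: 3 input units, 4 hidden units, 2 output units (random weights)
    rng = np.random.default_rng(0)
    x = np.array([0.5, -1.0, 0.25])
    print(forward(x, rng.normal(size=(4, 3)), rng.normal(size=(2, 4))))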

12 Learning neural networks Backpropagation Cascade correlation: adding hidden units
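
Backpropagation itself is covered in the text; below is only a rough one-step sketch for a two-layer sigmoid network trained with squared error, showing where the "delta" (error) terms come from. The learning rate and array shapes are assumptions for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, target, w_hidden, w_output, lr=0.1):
        # Forward pass
        h = sigmoid(w_hidden @ x)
        y = sigmoid(w_output @ h)
        # Backward pass: error terms scaled by the sigmoid derivative y(1 - y)
        delta_out = (y - target) * y * (1 - y)
        delta_hid = (w_output.T @ delta_out) * h * (1 - h)
        # Gradient-descent updates on the weights
        w_output -= lr * np.outer(delta_out, h)
        w_hidden -= lr * np.outer(delta_hid, x)
        return w_hidden, w_output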

13 Learning Bayesian networks Given training set D, find the network B that best matches D –model selection –parameter estimation (diagram: Data D → Inducer → network over nodes E, B, A, C)

14 Parameter estimation Assume known structure Goal: estimate BN parameters Θ –entries in local probability models, P(X | Parents(X)) A parameterization Θ is good if it is likely to generate the observed data: L(Θ:D) = P(D | Θ) = ∏_m P(x[m] | Θ), assuming i.i.d. samples Maximum Likelihood Estimation (MLE) Principle: Choose Θ* so as to maximize L

15 Parameter estimation in BNs The likelihood decomposes according to the structure of the network → we get a separate estimation task for each parameter The MLE (maximum likelihood estimate) solution: –for each value x of a node X and each instantiation u of Parents(X): θ*_{x|u} = N(x,u) / N(u) –Just need to collect the counts N(x,u) and N(u) (the sufficient statistics) for every combination of parents and children observed in the data –MLE is equivalent to an assumption of a uniform prior over parameter values
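
A minimal sketch of this count-based MLE in Python, assuming complete data represented as a list of dictionaries; the variable names and toy data are illustrative.

    from collections import Counter

    def mle_cpt(data, child, parents):
        # theta*_{x|u} = N(x, u) / N(u), where N(.) are counts in the data
        joint, marginal = Counter(), Counter()
        for row in data:
            u = tuple(row[p] for p in parents)
            joint[(row[child], u)] += 1
            marginal[u] += 1
        return {(x, u): n / marginal[u] for (x, u), n in joint.items()}

    # Toy example: estimate P(Alarm | Earthquake, Burglary) from complete data
    data = [
        {"Earthquake": 0, "Burglary": 1, "Alarm": 1},
        {"Earthquake": 0, "Burglary": 1, "Alarm": 0},
        {"Earthquake": 0, "Burglary": 0, "Alarm": 0},
    ]
    print(mle_cpt(data, "Alarm", ["Earthquake", "Burglary"]))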

16 Sufficient statistics: Example Why are the counts sufficient? (diagram: network over Earthquake, Burglary, Alarm, Moon-phase, Light-level, with Alarm’s parents Earthquake and Burglary)

17 Model selection Goal: Select the best network structure, given the data Input: –Training data –Scoring function Output: –A network that maximizes the score

18 Structure selection: Scoring Bayesian: prior over parameters and structure –get balance between model complexity and fit to data as a byproduct Score(G:D) = log P(G|D) = log [P(D|G) P(G)] + const, where P(D|G) is the marginal likelihood and P(G) is the prior on structure Marginal likelihood just comes from our parameter estimates Prior on structure can be any measure we want; typically a function of the network complexity Same key property: Decomposability Score(structure) = Σ_i Score(family of X_i)
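
Decomposability can be made concrete with a pure log-likelihood score, which is a sum of one term per family; this is a sketch that assumes complete data (a real Bayesian or BIC score would also include the structure prior or a complexity penalty).

    import math
    from collections import Counter

    def family_score(data, child, parents):
        # Log-likelihood of one family: sum over (x, u) of N(x,u) * log(N(x,u) / N(u))
        joint, marginal = Counter(), Counter()
        for row in data:
            u = tuple(row[p] for p in parents)
            joint[(row[child], u)] += 1
            marginal[u] += 1
        return sum(n * math.log(n / marginal[u]) for (x, u), n in joint.items())

    def structure_score(data, structure):
        # Score(structure) = sum_i Score(family of X_i): the total decomposes,
        # so a local change to one family only requires re-scoring that family
        return sum(family_score(data, c, ps) for c, ps in structure.items())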

19 Heuristic search (diagram: candidate local changes to a network over B, E, A, C) –Add E → C: Δscore(C) –Delete E → A: Δscore(A) –Reverse E → A: Δscore(A)

20 Exploiting decomposability (diagram: the same candidate moves as on the previous slide, each annotated with the Δscore of the affected family) To recompute scores, only need to re-score families that changed in the last move
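
A rough sketch of the greedy hill-climbing loop that exploits this property: only the family of the changed node is re-scored when evaluating a move. Only edge additions are shown; delete and reverse moves, the acyclicity check, and a complexity-penalized score are omitted for brevity, so this is not a complete structure learner.

    def hill_climb(data, variables, family_score):
        # Start from the empty network and greedily add the single best edge
        parents = {v: [] for v in variables}
        while True:
            best_gain, best_move = 0.0, None
            for child in variables:
                base = family_score(data, child, parents[child])
                for cand in variables:
                    if cand == child or cand in parents[child]:
                        continue
                    # Only the child's family changes, so only it is re-scored
                    gain = family_score(data, child, parents[child] + [cand]) - base
                    if gain > best_gain:
                        best_gain, best_move = gain, (cand, child)
            if best_move is None:
                return parents
            u, v = best_move
            parents[v].append(u)   # apply the move: add edge u -> v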

21 Variations on a theme Known structure, fully observable: only need to do parameter estimation Unknown structure, fully observable: do heuristic search through structure space, then parameter estimation Known structure, missing values: use expectation maximization (EM) to estimate parameters Known structure, hidden variables: apply adaptive probabilistic network (APN) techniques Unknown structure, hidden variables: too hard to solve!

22 Handling missing data Suppose that in some cases, we observe earthquake, alarm, light-level, and moon-phase, but not burglary Should we throw that data away?? Idea: Guess the missing values based on the other data (diagram: network over Earthquake, Burglary, Alarm, Moon-phase, Light-level)

23 EM (expectation maximization) Guess probabilities for nodes with missing values (e.g., based on other observations) Compute the probability distribution over the missing values, given our guess Update the probabilities based on the guessed values Repeat until convergence

24 EM example Suppose we have observed Earthquake and Alarm but not Burglary for an observation on November 27 We estimate the CPTs based on the rest of the data We then estimate P(Burglary) for November 27 from those CPTs Now we recompute the CPTs as if that estimated value had been observed Repeat until convergence! (diagram: Earthquake, Burglary → Alarm)
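
A minimal EM sketch for this example, assuming binary variables, the fixed structure Burglary, Earthquake → Alarm, and that only Burglary is ever missing; the initial parameter guesses, number of iterations, and toy data are illustrative.

    def em_missing_burglary(data, n_iters=25):
        # Each row has binary 'E' and 'A' observed; 'B' is 0, 1, or None (missing)
        p_b = 0.5                                            # guess for P(B=1)
        p_a = {(b, e): 0.5 for b in (0, 1) for e in (0, 1)}  # P(A=1 | B=b, E=e)

        def lik_a(row, b):
            # P(observed alarm value | B=b, E) under the current parameters
            p = p_a[(b, row["E"])]
            return p if row["A"] == 1 else 1 - p

        for _ in range(n_iters):
            # E-step: expected value of B for every row (posterior if missing)
            w = []
            for row in data:
                if row["B"] is not None:
                    w.append(float(row["B"]))
                else:
                    num = p_b * lik_a(row, 1)
                    w.append(num / (num + (1 - p_b) * lik_a(row, 0)))
            # M-step: re-estimate the CPTs from expected (fractional) counts
            p_b = sum(w) / len(data)
            for b in (0, 1):
                for e in (0, 1):
                    wts = [wi if b == 1 else 1 - wi
                           for wi, row in zip(w, data) if row["E"] == e]
                    obs = [row["A"] for row in data if row["E"] == e]
                    if sum(wts) > 0:
                        p_a[(b, e)] = sum(wi * a for wi, a in zip(wts, obs)) / sum(wts)
        return p_b, p_a

    data = [{"E": 0, "A": 1, "B": None}, {"E": 0, "A": 0, "B": 0},
            {"E": 1, "A": 1, "B": None}, {"E": 0, "A": 1, "B": 1}]
    print(em_missing_burglary(data))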