Artificial Intelligence, Chapter 3: Neural Networks
Biointelligence Lab, School of Computer Sci. & Eng., Seoul National University

Outline
3.1 Introduction
3.2 Training Single TLUs: Gradient Descent, the Widrow-Hoff Rule, the Generalized Delta Procedure
3.3 Neural Networks: The Backpropagation Method, Derivation of the Backpropagation Learning Rule
3.4 Generalization, Accuracy, and Overfitting
3.5 Discussion

3.1 Introduction
- TLU (threshold logic unit): the basic unit of these neural networks, based on some properties of biological neurons.
- Training set: input vectors X_i (real-valued or Boolean components) paired with desired outputs d_i (associated actions: labels, classes, ...).
- Target of training: find a function f(X) that corresponds "acceptably" to the members of the training set.
- Supervised learning: labels are given along with the input vectors.

3.2 Training Single TLUs: TLU Geometry
- Training a TLU means adjusting its variable weights.
- A single TLU: the perceptron, Adaline (adaptive linear element) [Rosenblatt 1962, Widrow 1962].
- Elements of a TLU:
  - weights W = (w_1, ..., w_n)
  - threshold θ
  - output, computed from the weighted sum s = W·X: 1 if s - θ > 0, and 0 if s - θ < 0.
- The TLU separates its input space with the hyperplane W·X - θ = 0.
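To make the unit concrete, here is a minimal sketch of a single TLU in Python/NumPy; the weights, threshold, and AND-style example are illustrative assumptions, not values from the chapter.

    import numpy as np

    def tlu_output(W, X, theta):
        """Threshold logic unit: fire (1) iff the weighted sum exceeds the threshold."""
        s = np.dot(W, X)                  # weighted sum s = W . X
        return 1 if s - theta > 0 else 0

    # Illustrative example: a 2-input TLU realizing logical AND
    W = np.array([1.0, 1.0])              # assumed weights
    theta = 1.5                           # assumed threshold
    for X in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(X, "->", tlu_output(W, np.array(X, dtype=float), theta))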

Figure 3.1 TLU Geometry

Augmented Vectors
- Adopt the convention that the threshold is fixed at 0.
- Arbitrary thresholds are handled with (n + 1)-dimensional augmented vectors: W = (w_1, ..., w_n, -θ), X = (x_1, ..., x_n, 1).
- Output of the TLU: 1 if W·X ≥ 0, and 0 if W·X < 0.
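A short sketch of the augmentation trick, redefining the same assumed weights and threshold as in the previous example:

    import numpy as np

    def augment(X):
        """Append the constant component 1 so the threshold becomes an ordinary weight."""
        return np.append(X, 1.0)

    W, theta = np.array([1.0, 1.0]), 1.5     # illustrative values
    W_aug = np.append(W, -theta)             # augmented weights (w_1, ..., w_n, -theta)

    X = np.array([1.0, 1.0])
    s = np.dot(W_aug, augment(X))            # equals W.X - theta
    print(1 if s >= 0 else 0)                # TLU output with the threshold folded into the weights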

Gradient Descent Methods
- Training a TLU: minimize an error function by adjusting the weight values.
- Batch learning vs. incremental learning.
- Commonly used error function: the squared error ε = (d - f)².
- Gradient, via the chain rule: ∂ε/∂w_i = (∂ε/∂f)(∂f/∂s)(∂s/∂w_i) = -2(d - f)(∂f/∂s) x_i.
- The threshold function makes ∂f/∂s problematic (it is not differentiable); two solutions:
  - ignore the threshold function during training, i.e. take f = s;
  - replace the threshold function with a differentiable nonlinear function.

The Widrow-Hoff Procedure: First Solution
- Weight update procedure using f = s = W·X (the threshold function is ignored during training).
- Targets: data labeled 1 use d = 1, data labeled 0 use d = -1.
- Gradient: ∂ε/∂W = -2(d - f) X.
- New weight vector: W ← W + c(d - f) X, where c is the learning-rate constant.
- Widrow-Hoff (delta) rule:
  - (d - f) > 0 → the update increases s → decreases (d - f);
  - (d - f) < 0 → the update decreases s → increases (d - f).
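A minimal sketch of incremental Widrow-Hoff training on augmented vectors; the data, learning rate, and number of passes are assumptions made for illustration.

    import numpy as np

    def widrow_hoff(X_aug, d, c=0.1, passes=50):
        """Incremental Widrow-Hoff (delta) rule with the linear output f = s = W . X."""
        W = np.zeros(X_aug.shape[1])
        for _ in range(passes):
            for x, target in zip(X_aug, d):
                f = np.dot(W, x)                 # linear output (threshold ignored during training)
                W = W + c * (target - f) * x     # W <- W + c (d - f) X
        return W

    # Illustrative linearly separable data (augmented with a final 1); labels use d = +1 / -1
    X_aug = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    d = np.array([-1, -1, -1, 1], dtype=float)   # AND-like labeling, assumed for the demo
    W = widrow_hoff(X_aug, d)
    print("weights:", W, "thresholded outputs:", (X_aug @ W >= 0).astype(int))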

The Generalized Delta Procedure: Second Solution
- Use the differentiable sigmoid function f(s) = 1 / (1 + e^(-s)) in place of the threshold [Rumelhart et al. 1986].
- Since ∂f/∂s = f(1 - f), the gradient becomes ∂ε/∂W = -2(d - f) f(1 - f) X.
- Generalized delta procedure: W ← W + c(d - f) f(1 - f) X.
- Target outputs are 1 and 0; f is the output of the sigmoid unit.
- f(1 - f) = 0 when f = 0 or f = 1, so weight changes can occur only within the "fuzzy" region surrounding the hyperplane (near the point where f(s) = 1/2).
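A sketch of the generalized delta update for a single sigmoid unit; the OR-like data and the hyperparameters are again illustrative assumptions.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def generalized_delta(X_aug, d, c=0.5, passes=2000):
        """Train one sigmoid unit with W <- W + c (d - f) f (1 - f) X."""
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=X_aug.shape[1])
        for _ in range(passes):
            for x, target in zip(X_aug, d):
                f = sigmoid(np.dot(W, x))
                W = W + c * (target - f) * f * (1.0 - f) * x
        return W

    # Illustrative augmented inputs with 0/1 targets (an OR-like labeling, assumed)
    X_aug = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    d = np.array([0, 1, 1, 1], dtype=float)
    W = generalized_delta(X_aug, d)
    print(np.round(sigmoid(X_aug @ W), 2))       # outputs move toward the 0/1 targets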

Figure 3.2 A Sigmoid Function

The Error-Correction Procedure
- Uses the threshold unit's output f directly in the update, so (d - f) is 0 on correct outputs and either 1 or -1 on errors; the weights change only when the TLU is wrong.
- In the linearly separable case, W converges to a solution after a finite number of iterations.
- In the nonlinearly separable case, W never converges.
- The Widrow-Hoff and generalized delta procedures, in contrast, find minimum-squared-error solutions even when the minimum error is not zero.
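A sketch of the error-correction procedure on augmented vectors, again with assumed data and learning rate:

    import numpy as np

    def error_correction(X_aug, d, c=1.0, max_passes=100):
        """Perceptron-style rule W <- W + c (d - f) X, with f the thresholded output."""
        W = np.zeros(X_aug.shape[1])
        for _ in range(max_passes):
            errors = 0
            for x, target in zip(X_aug, d):
                f = 1.0 if np.dot(W, x) >= 0 else 0.0   # threshold unit output
                if f != target:                          # weights change only on errors
                    W = W + c * (target - f) * x
                    errors += 1
            if errors == 0:                              # converged (linearly separable case)
                break
        return W

    # Illustrative linearly separable data (augmented), labeled 0/1
    X_aug = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)
    print("weights:", error_correction(X_aug, d))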

3.3 Neural Networks: Motivation
- Need for multiple TLUs: a single TLU can represent only linearly separable functions.
- Feedforward network: no cycles.
- Recurrent network: contains cycles (treated in a later chapter).
- Layered feedforward network: the j-th layer receives input only from the (j - 1)-th layer.
- Example: Figure 3.4, a network of TLUs that implements the even-parity function.
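For instance, two-input even parity (output 1 when the number of 1s is even) can be implemented with a small network of TLUs; the particular weights below are one illustrative construction, not necessarily the wiring shown in the book's figure.

    import numpy as np

    def tlu(W, theta, X):
        return 1 if np.dot(W, X) - theta > 0 else 0

    def even_parity_2(x1, x2):
        """Two-layer TLU network: h1 fires for (1,1), h2 fires for (0,0), the output unit ORs them."""
        X = np.array([x1, x2], dtype=float)
        h1 = tlu(np.array([1.0, 1.0]), 1.5, X)      # detects both inputs on
        h2 = tlu(np.array([-1.0, -1.0]), -0.5, X)   # detects both inputs off
        return tlu(np.array([1.0, 1.0]), 0.5, np.array([h1, h2], dtype=float))  # h1 OR h2

    for x1 in (0, 1):
        for x2 in (0, 1):
            print((x1, x2), "->", even_parity_2(x1, x2))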

Notation
- Hidden units: the neurons in all but the final layer.
- Output of the j-th layer: X^(j), which is the input to the (j+1)-th layer.
- Input vector: X^(0); final output: f.
- Weight vector of the i-th sigmoid unit in the j-th layer: W_i^(j).
- Weighted sum of the i-th sigmoid unit in the j-th layer: s_i^(j).
- Number of sigmoid units in the j-th layer: m_j.
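To tie the notation to something concrete, a forward pass through a layered feedforward network of sigmoid units might look like the sketch below; the layer sizes and random weights are arbitrary assumptions.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def forward(X0, weights):
        """weights[j] holds the weight vectors W_i^(j+1) as columns; returns the final output f."""
        X = X0                                # X^(0): the input vector
        for W in weights:
            X_aug = np.append(X, 1.0)         # augmented with the constant component 1
            s = X_aug @ W                     # s_i^(j): weighted sums of this layer's units
            X = sigmoid(s)                    # X^(j): outputs of this layer, inputs to the next
        return X                              # final output f

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 4)),       # first layer: m_1 = 4 sigmoid units (2 inputs + bias)
               rng.normal(size=(5, 1))]       # output layer: 1 unit (4 hidden inputs + bias)
    print(forward(np.array([0.5, -1.0]), weights))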

Figure 3.4 A k-layer Network of Sigmoid Units

The Backpropagation Method
- Gradient with respect to W_i^(j): ∂ε/∂W_i^(j) = (∂ε/∂s_i^(j)) X^(j-1), since s_i^(j) = X^(j-1)·W_i^(j).
- Weight update: W_i^(j) ← W_i^(j) + c δ_i^(j) X^(j-1).
- Local gradient: δ_i^(j), the sensitivity of the squared error to unit i's weighted sum s_i^(j) (up to a constant factor).

Computing Weight Changes in the Final Layer
- Local gradient: δ^(k) = (d - f) f(1 - f).
- Weight update: W^(k) ← W^(k) + c (d - f) f(1 - f) X^(k-1).

Computing Changes to the Weights in Intermediate Layers
- Local gradient: δ_i^(j) now depends on all of the local gradients of the (j+1)-th layer.
- The final output f depends on s_i^(j) through the summed inputs to the sigmoids in the (j+1)-th layer.
- We therefore need to compute ∂s_l^(j+1)/∂s_i^(j).

Weight Update in Hidden Layers (cont.)
- Since s_l^(j+1) = Σ_v x_v^(j) w_{vl}^(j+1), with x_v^(j) = f(s_v^(j)):
  - for v ≠ i, the term contributes nothing to ∂s_l^(j+1)/∂s_i^(j);
  - for v = i, the term contributes f_i^(j)(1 - f_i^(j)) w_{il}^(j+1).
- Consequently, δ_i^(j) = f_i^(j)(1 - f_i^(j)) Σ_{l=1..m_{j+1}} δ_l^(j+1) w_{il}^(j+1).

Weight Update in Hidden Layers (cont.)
- Note that the equation for the local gradient is recursive.
- Backpropagation:
  - the error is propagated backward from the output layer toward the input layer;
  - the local gradients of a later layer are used in calculating the local gradients of the layer before it.

(cont'd)
- Example: training the network on the even-parity function, with input/target pairs as the training set and learning rate 1.0.
- Figure 3.6 A Network to Be Trained by Backprop
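Below is a compact sketch of backprop on two-input even parity, following the update rules above; the network size (two hidden sigmoid units), initialization, and number of epochs are assumptions made for the demo and may differ from the book's Figure 3.6 network, and backprop can occasionally stall in a local minimum from an unlucky start.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # Two-input even parity: target 1 iff the number of 1s in the input is even
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([1, 0, 0, 1], dtype=float)

    rng = np.random.default_rng(1)
    W1 = rng.normal(scale=1.0, size=(3, 2))   # hidden-layer weights (rows: x1, x2, bias)
    W2 = rng.normal(scale=1.0, size=3)        # output-unit weights (h1, h2, bias)
    c = 1.0                                   # learning rate, as in the slide's example

    for epoch in range(10000):
        for x, target in zip(X, d):
            x0 = np.append(x, 1.0)                        # augmented input X^(0)
            h = sigmoid(x0 @ W1)                          # hidden-layer outputs
            x1 = np.append(h, 1.0)                        # augmented hidden vector X^(1)
            f = sigmoid(x1 @ W2)                          # final output

            delta_out = (target - f) * f * (1.0 - f)           # final-layer local gradient
            delta_hid = h * (1.0 - h) * (delta_out * W2[:2])   # recursive hidden-layer gradients

            W2 += c * delta_out * x1                      # W <- W + c * delta * X
            W1 += c * np.outer(x0, delta_hid)

    for x, target in zip(X, d):
        f = sigmoid(np.append(sigmoid(np.append(x, 1.0) @ W1), 1.0) @ W2)
        print(x, "target", target, "output", round(float(f), 2))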

3.4 Generalization, Accuracy, and Overfitting
- Generalization ability: the network appropriately classifies vectors that are not in the training set; it is measured by accuracy.
- Curve-fitting analogy: compare the number of training input vectors with the number of degrees of freedom of the network.
- Given m data points, is an (m - 1)-degree polynomial the best model? No: it fits the points exactly but captures no underlying regularity.
- Overfitting: the extra degrees of freedom end up fitting the noise.
- Given sufficient data, Occam's razor dictates choosing the lowest-degree polynomial that adequately fits the data.
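A small sketch of this curve-fitting analogy using NumPy polynomial fits; the underlying function, noise level, and chosen degrees are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10
    x = np.linspace(0, 1, m)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=m)   # noisy samples of a smooth curve

    x_test = np.linspace(0, 1, 200)
    y_true = np.sin(2 * np.pi * x_test)

    for degree in (1, 3, m - 1):             # underfit, reasonable fit, and exact interpolation
        coeffs = np.polyfit(x, y, degree)    # the (m-1)-degree fit may warn about conditioning
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)
        print(f"degree {degree}: train MSE {train_err:.4f}, out-of-sample MSE {test_err:.4f}")
    # The (m-1)-degree polynomial drives the training error to ~0, but its
    # out-of-sample error is typically far worse than the low-degree fit's.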

Figure 3.7 Curve Fitting

3.4 (cont'd)
- Out-of-sample error rate: the error rate on data drawn from the same underlying distribution as the training set.
- Divide the available data into a training set and a validation set; usually 2/3 for training and 1/3 for validation.
- k-fold cross-validation:
  - split the data into k disjoint subsets (folds);
  - train k times, each time holding out one fold for validation and using the remaining k - 1 folds (combined) for training;
  - average the k validation error rates as the estimate of the out-of-sample error;
  - empirically, 10-fold cross-validation is preferred.
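A minimal sketch of k-fold cross-validation; the synthetic data and the nearest-mean classifier stand in for a trained network and are assumptions made for illustration.

    import numpy as np

    def kfold_error(X, y, k, train_and_eval):
        """Average validation error over k disjoint folds."""
        idx = np.random.default_rng(0).permutation(len(X))
        folds = np.array_split(idx, k)
        errors = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            errors.append(train_and_eval(X[train], y[train], X[val], y[val]))
        return float(np.mean(errors))

    def nearest_mean(X_tr, y_tr, X_va, y_va):
        """Toy stand-in for training a network: classify by the nearer class mean."""
        m0, m1 = X_tr[y_tr == 0].mean(axis=0), X_tr[y_tr == 1].mean(axis=0)
        pred = (np.linalg.norm(X_va - m1, axis=1) < np.linalg.norm(X_va - m0, axis=1)).astype(int)
        return float(np.mean(pred != y_va))

    # Illustrative two-class data
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    print("10-fold error estimate:", kfold_error(X, y, 10, nearest_mean))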

Figure 3.8 Error Versus Number of Hidden Units
Figure 3.9 Estimate of Generalization Error Versus Number of Hidden Units

3.5 Additional Readings and Discussion
- Applications: pattern recognition, automatic control, brain-function modeling. Designing and training neural networks still requires experience and experimentation.
- Major annual conferences: Neural Information Processing Systems (NIPS), International Conference on Machine Learning (ICML), Computational Learning Theory (COLT).
- Major journals: Neural Computation, IEEE Transactions on Neural Networks, Machine Learning.