Artificial Neural Networks


Artificial Neural Networks (Chapter 4)

Questions: What are ANNs? How do we train an ANN? (algorithm) What is the representational power of ANNs? (advantages and disadvantages)

What are ANNs? ------Background Consider the human brain: neuron switching time about 0.001 second; number of neurons about 10^10; connections per neuron about 10^4 to 10^5; scene recognition time about 0.1 second. This implies a great deal of parallel computation. Property of a neuron: it behaves as a thresholded unit. One motivation for ANN systems is to capture this kind of highly parallel computation based on distributed representations.

What are ANNs? -----Problems suited to ANNs: classification, voice recognition, and others.

Another view: properties of artificial neural nets (ANNs): many neuron-like threshold switching units; many weighted interconnections among units; highly parallel, distributed processing; emphasis on tuning the weights automatically.

4.1 Perceptrons

A perceptron outputs o(x1, …, xn) = sgn(w0 + w1x1 + … + wnxn). To simplify notation, set x0 = 1, so that o(x) = sgn(w · x). Learning a perceptron involves choosing values for the weights w0, …, wn. Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.

We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances. There are two ways to train a perceptron: the perceptron training rule and gradient descent.

(1). Perceptron Training Rule. Initialize each wi to a small random value in the given interval. Update each wi according to the training example: wi ← wi + Δwi, where Δwi = η(t − o)xi; here t is the target value, o is the perceptron output, and η is a small positive constant called the learning rate.
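
A minimal Python sketch of this rule (not from the slides; numpy, the ±1 target encoding, and the default η are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    X: (n_examples, n_features) inputs; t: targets in {-1, +1}.
    A constant x0 = 1 is prepended to each example for the bias weight w0.
    """
    X = np.hstack([np.ones((X.shape[0], 1)), X])        # add x0 = 1
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, X.shape[1])             # small random initial weights
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if w @ x > 0 else -1                   # thresholded perceptron output
            w += eta * (target - o) * x                  # changes w only on mistakes
    return w
```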

Representation Power of Perceptrons. A single perceptron can represent many boolean functions, such as AND, OR, NAND, and NOR, but it fails to represent XOR. E.g. g(x1, x2) = AND(x1, x2), represented by o(x1, x2) = sgn(-0.8 + 0.5x1 + 0.5x2), with inputs encoded as -1/+1:

x1   x2   -0.8 + 0.5x1 + 0.5x2   o
-1   -1          -1.8            -1
-1   +1          -0.8            -1
+1   -1          -0.8            -1
+1   +1           0.2            +1
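
The table can be checked with a few lines of throwaway Python (illustrative only):

```python
import numpy as np

w = np.array([-0.8, 0.5, 0.5])                 # w0, w1, w2 from the example
for x1 in (-1, 1):
    for x2 in (-1, 1):
        net = w @ np.array([1, x1, x2])        # x0 = 1 for the bias weight
        o = 1 if net > 0 else -1
        print(x1, x2, round(net, 1), o)        # only (+1, +1) yields o = +1
```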

Representation Power of Perceptrons (a) The perceptron training rule can be proven to converge if the training data are linearly separable and η is sufficiently small. (b) But some functions are not representable, e.g. those whose positive and negative examples are not linearly separable. (c) Every boolean function can be represented by some network of perceptrons only two levels deep.

(2). Gradient Descent. Key idea: search the hypothesis space to find the weight vector that best fits the training examples. Best fit: minimize the squared error E(w) = (1/2) Σ_d∈D (t_d − o_d)^2, where D is the set of training examples, t_d is the target output for example d, and o_d is the linear unit's output for d.

Gradient Descent. Gradient: ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]. Training rule: Δw = -η∇E(w), or component-wise Δwi = -η ∂E/∂wi.
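
Writing out the differentiation step behind the component-wise rule (the standard derivation for a linear unit o_d = w · x_d, as in Mitchell Ch. 4):

```latex
\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d\in D}(t_d-o_d)^2
  = \sum_{d\in D}(t_d-o_d)\,\frac{\partial}{\partial w_i}\bigl(t_d-\vec{w}\cdot\vec{x}_d\bigr)
  = -\sum_{d\in D}(t_d-o_d)\,x_{id}
\quad\Rightarrow\quad
\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\sum_{d\in D}(t_d-o_d)\,x_{id}
```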

Gradient Descent

Gradient Descent Algorithm. Initialize each wi to some small random value. Until the termination condition is met, Do: Initialize each Δwi to zero. For each <x, t> in the training examples, Do: input the instance x to the unit and compute the output o; for each linear unit weight wi, Do: Δwi ← Δwi + η(t − o)xi. For each linear unit weight wi, Do: wi ← wi + Δwi.
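
A compact Python sketch of this batch procedure for a single linear unit (the function name, data layout, and hyperparameters are illustrative assumptions, not from the text):

```python
import numpy as np

def gradient_descent(X, t, eta=0.05, epochs=1000):
    """Batch gradient descent (delta rule) for a linear unit o = w . x."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])        # x0 = 1 for the bias weight
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, X.shape[1])             # small random initial weights
    for _ in range(epochs):
        o = X @ w                                        # unthresholded outputs for all of D
        delta_w = eta * X.T @ (t - o)                    # accumulates eta*(t - o)*x_i over D
        w += delta_w                                     # single update after the full pass
    return w
```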

When to use gradient descent: the hypothesis space is continuously parameterized, and the error can be differentiated with respect to those parameters.

Advantages vs. disadvantages. Advantages: guaranteed to converge to the hypothesis with minimum error, given a sufficiently small learning rate η, even when the training data contain noise or are not linearly separable; for a linear unit the error surface has a single global minimum, so gradient descent converges to it. Disadvantages: convergence can sometimes be very slow; there is no guarantee of converging to the global minimum in cases where there are multiple local minima.

Incremental (Stochastic) Gradient Descent vs. Standard Gradient Descent. Standard gradient descent: do until satisfied — compute the gradient ∇E_D(w) over the entire training set D, then update w ← w − η∇E_D(w). Stochastic gradient descent: do until satisfied — for each training example d in D, compute the gradient ∇E_d(w) of the single-example error E_d(w) = (1/2)(t_d − o_d)^2 and update w ← w − η∇E_d(w).

Standard Gradient Descent vs. Stochastic Gradient Descent. Stochastic gradient descent can approximate standard gradient descent arbitrarily closely if η is made small enough; the stochastic mode can converge faster, since it updates the weights after every example rather than after a full pass over D; and stochastic gradient descent can sometimes avoid falling into local minima, because it follows the varying ∇E_d rather than the single ∇E_D.
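
In code, the only difference from the batch sketch shown earlier is where the weight update happens (again an illustrative sketch, not the text's notation):

```python
import numpy as np

def stochastic_gradient_descent(X, t, eta=0.05, epochs=1000):
    """Incremental (stochastic) version: update w after every single example."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])         # x0 = 1
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = w @ x                                    # output for this one example
            w += eta * (target - o) * x                  # gradient step on E_d alone
    return w
```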

(3). Perceptron training rule vs. gradient descent. Perceptron training rule (thresholded perceptron output): provided the examples are linearly separable, it converges to a hypothesis that perfectly classifies the training data. Gradient descent (unthresholded linear output): regardless of whether the training data are linearly separable, it converges asymptotically toward the minimum-error hypothesis.

4.2 Multilayer Networks (the slide's figures contrast a single perceptron with a multilayer network). Perceptrons can only express linear decision surfaces; we need to express a rich variety of nonlinear decision surfaces.

Sigmoid unit – a differentiable threshold unit. Sigmoid function: σ(y) = 1 / (1 + e^-y). Property: dσ(y)/dy = σ(y)(1 − σ(y)). Output: o = σ(w · x). Why do we use the sigmoid instead of a linear unit or sgn(x)? Because it is nonlinear, unlike a linear unit, yet differentiable, unlike sgn — and differentiability is what gradient descent requires.
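
The two formulas above, as a trivial Python sketch:

```python
import numpy as np

def sigmoid(y):
    """Sigmoid squashing function: sigma(y) = 1 / (1 + e^-y)."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_derivative(y):
    """The useful property: d sigma / dy = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)
```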

The Backpropagation Algorithm. The main idea of the backpropagation algorithm: compute the input and output of each unit in a forward pass, then modify the weights between pairs of units in a backward pass according to the errors.

Error definition. Batch mode: E(w) = (1/2) Σ_d∈D Σ_k∈outputs (t_kd − o_kd)^2. Individual (per-example) mode: E_d(w) = (1/2) Σ_k∈outputs (t_k − o_k)^2.

(Network diagram notation: x_ji denotes the input from unit i to unit j, so x_ji = o_i, the output of unit i; w_ji is the corresponding weight, and o_j is the output of unit j.)

Training rule for output unit weights: for each output unit k, the error term is δ_k = o_k(1 − o_k)(t_k − o_k), and each weight into k is updated by Δw_kj = η δ_k x_kj, where x_kj is the j-th input to unit k.

Training rule for hidden unit weights: for each hidden unit h, the error term is δ_h = o_h(1 − o_h) Σ_k∈outputs w_kh δ_k, i.e. the output-unit error terms δ_k are propagated back through the weights w_kh; each weight into h is then updated by Δw_hi = η δ_h x_hi.

Backpropagation Algorithm. Initialize all weights to small random numbers. Until the termination condition is met, Do: For each training example <x, t>, Do: //Propagate the input forward 1. Input the training example to the network and compute the network outputs. //Propagate the errors backward 2. For each output unit k: δ_k ← o_k(1 − o_k)(t_k − o_k). 3. For each hidden unit h: δ_h ← o_h(1 − o_h) Σ_k∈outputs w_kh δ_k. 4. Update each network weight: w_ji ← w_ji + Δw_ji, where Δw_ji = η δ_j x_ji.
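
A self-contained Python sketch of this algorithm for a network with one hidden layer of sigmoid units and stochastic (per-example) updates; the array shapes, names, and hyperparameters are our own assumptions, not the chapter's notation:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def backprop_train(X, T, n_hidden=3, eta=0.3, epochs=5000, seed=0):
    """Backpropagation for a feed-forward net with one hidden layer of sigmoid units.

    X: (n_examples, n_in) inputs; T: (n_examples, n_out) targets in (0, 1).
    Bias terms are handled by appending a constant 1 to each layer's inputs.
    """
    rng = np.random.default_rng(seed)
    W_h = rng.uniform(-0.05, 0.05, (n_hidden, X.shape[1] + 1))   # hidden-layer weights w_hi
    W_o = rng.uniform(-0.05, 0.05, (T.shape[1], n_hidden + 1))   # output-layer weights w_kh
    for _ in range(epochs):
        for x, t in zip(X, T):
            # 1. propagate the input forward
            x1 = np.append(x, 1.0)                 # constant input for the bias weight
            o_h = sigmoid(W_h @ x1)                # hidden unit outputs o_h
            h1 = np.append(o_h, 1.0)
            o_k = sigmoid(W_o @ h1)                # network outputs o_k
            # 2. error term for each output unit k
            delta_k = o_k * (1 - o_k) * (t - o_k)
            # 3. error term for each hidden unit h (errors propagated back through w_kh)
            delta_h = o_h * (1 - o_h) * (W_o[:, :-1].T @ delta_k)
            # 4. update each network weight: w_ji <- w_ji + eta * delta_j * x_ji
            W_o += eta * np.outer(delta_k, h1)
            W_h += eta * np.outer(delta_h, x1)
    return W_h, W_o
```

The per-example weight updates make this the stochastic variant discussed earlier; accumulating the Δw terms over all of D before updating would give the batch version.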

Hidden Layer Representations

Convergence and local minima. Backpropagation converges to some local minimum of the error, not necessarily to the global minimum. Training can also take thousands of iterations, so it is slow. Remedies: use stochastic gradient descent rather than standard gradient descent; since the initialization influences convergence, train multiple networks with different random initial weights over the same data and then select the best one; initialize the weights near zero, so the initial network is nearly linear and increasingly nonlinear functions become possible as training progresses; add a momentum term to speed convergence.
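
The momentum term mentioned above is usually written as follows (standard form, not shown on the slide; α is a constant with 0 ≤ α < 1 called the momentum), so that the update at iteration n depends partly on the update at iteration n−1:

```latex
\Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1)
```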

Expressive Capabilities of ANNs. Every boolean function can be represented by a network with a single hidden layer. Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer. Any function can be approximated to arbitrary accuracy by a network with two hidden layers. A network with more hidden layers can possibly achieve higher precision, but the risk of converging to a poor local minimum increases as well.

When to Consider Neural Networks: the input is high-dimensional, discrete- or real-valued; the output is discrete or real-valued, possibly a vector of values; the data may be noisy; the form of the target function is unknown; human readability of the result is unimportant.

Overfitting in ANNs

Strategies to avoid overfitting. A poor strategy: continue training until the error on the training data falls below some threshold. A better indicator: the number of iterations that produces the lowest error over a separate validation set. Store the weights that achieve the best validation error; once the currently trained weights reach a significantly higher error over the validation set than the stored weights, terminate and return the stored weights.
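
A sketch of this validation-based stopping rule in Python; train_one_epoch and validation_error are hypothetical helpers, since the slides do not define an interface:

```python
import copy

def train_with_early_stopping(net, train_data, val_data, max_epochs=10000, patience=200):
    """Keep the weights that did best on the validation set; stop once the
    current weights have been worse than the stored ones for `patience` epochs."""
    best_error = float("inf")
    best_weights = copy.deepcopy(net.weights)
    epochs_since_best = 0
    for epoch in range(max_epochs):
        net.train_one_epoch(train_data)              # hypothetical: one pass of backprop
        error = net.validation_error(val_data)       # hypothetical: error over the validation set
        if error < best_error:
            best_error = error
            best_weights = copy.deepcopy(net.weights)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:        # validation error no longer improving
                break
    net.weights = best_weights                        # restore the stored best weights
    return net
```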

Alternative Error Functions

Recurrent Networks

Thank you!