Feed-Forward Neural Networks  Speaker: 虞台文

Content Introduction Single-Layer Perceptron Networks Learning Rules for Single-Layer Perceptron Networks – Perceptron Learning Rule – Adaline Learning Rule

Feed-Forward Neural Networks Introduction

Historical Background. 1943: McCulloch and Pitts proposed the first computational model of the neuron. 1949: Hebb proposed the first learning rule. 1958: Rosenblatt's work on perceptrons. 1969: Minsky and Papert exposed the limitations of the theory. 1970s: decade of dormancy for neural networks. 1980s: neural networks return (self-organization, back-propagation algorithms, etc.).

Nervous Systems. The human brain contains ~10^11 neurons. Each neuron is connected to ~10^4 others. Some scientists have compared the brain with a "complex, nonlinear, parallel computer". The largest modern neural networks achieve complexity comparable to the nervous system of a fly.

Neurons. The main purpose of neurons is to receive, analyze, and transmit information in the form of signals (electric pulses). When a neuron sends information, we say that it "fires".

Neurons. This animation demonstrates the firing of a synapse from the pre-synaptic terminal of one neuron to the soma (cell body) of another neuron. Acting through specialized projections known as dendrites and axons, neurons carry information throughout the neural network.

A Model of Artificial Neuron (diagram): inputs x1, x2, …, xm = −1 (bias input) with weights wi1, wi2, …, wim = θi, a net-input function f(.), an activation function a(.), and output yi.
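A minimal sketch of this neuron model in Python/NumPy; the numbers, the sign activation, and the variable names are illustrative assumptions rather than anything taken from the slides.

```python
import numpy as np

def neuron_output(x, w, activation=np.sign):
    """Single artificial neuron: y_i = a(f(x)), where f is the weighted sum.

    The last component of x is fixed to -1 (bias input), so the last
    component of w plays the role of the threshold theta_i.
    """
    net = np.dot(w, x)          # f(.) -- linear combination with the threshold folded in
    return activation(net)      # a(.) -- e.g. sign for a threshold unit

# Example: two real inputs plus the fixed bias input x_m = -1
x = np.array([0.5, -1.0, -1.0])   # [x1, x2, x_m = -1]
w = np.array([2.0,  1.0,  0.5])   # [w_i1, w_i2, w_im = theta_i]
print(neuron_output(x, w))        # prints -1.0 for these illustrative numbers
```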

Feed-Forward Neural Networks. Graph representation: – nodes: neurons – arrows: signal flow directions. A neural network that does not contain cycles (feedback loops) is called a feed-forward network (or perceptron). (Diagram: inputs x1, x2, …, xm and outputs y1, y2, …, yn.)

Layered Structure (diagram): Input Layer, Hidden Layer(s), Output Layer; inputs x1, x2, …, xm and outputs y1, y2, …, yn.
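A minimal sketch of a forward pass through such a layered structure, assuming tanh activations and randomly initialized weights purely for illustration; the layer sizes are arbitrary.

```python
import numpy as np

def forward(x, layer_weights, activation=np.tanh):
    """Propagate an input pattern through successive layers (no feedback loops)."""
    a = x
    for W in layer_weights:
        a = activation(W @ a)        # each layer: weighted sums followed by an activation
    return a

rng = np.random.default_rng(0)
m, hidden, n = 3, 4, 2               # input, hidden, output sizes (illustrative)
layer_weights = [rng.normal(size=(hidden, m)),   # input layer -> hidden layer
                 rng.normal(size=(n, hidden))]   # hidden layer -> output layer
y = forward(np.array([0.2, -0.5, 1.0]), layer_weights)
print(y)                             # output pattern y = (y1, ..., yn)
```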

Knowledge and Memory (diagram as above). The output behavior of a network is determined by the weights. Weights are the memory of an NN. Knowledge is distributed across the network. A large number of nodes increases the storage "capacity", ensures that the knowledge is robust, and provides fault tolerance. New information is stored by changing the weights.

Pattern Classification (diagram: input pattern x, network, output pattern y). Function: x → y. The NN's output is used to distinguish between and recognize different input patterns. Different output patterns correspond to particular classes of input patterns. Networks with hidden layers can be used to solve more complex problems than just linear pattern classification.

Training (diagram): for a training pattern, the input xi = (xi1, xi2, …, xim) produces the output yi = (yi1, yi2, …, yin), and the training set supplies the desired output di = (di1, di2, …, din). Goal: make the outputs match the desired outputs for all training patterns.

Generalization. With proper training, a neural network may produce reasonable answers for input patterns not seen during training (generalization). Generalization is particularly useful for the analysis of noisy data (e.g., time series).

Generalization (figure: the same fit illustrated on data without noise and with noise).

Applications Pattern classification Object recognition Function approximation Data compression Time series analysis and forecast...

Feed-Forward Neural Networks Single-Layer Perceptron Networks

The Single-Layered Perceptron (diagram): inputs x1, x2, …, xm−1, xm = −1, a full set of weights w11, w12, …, wnm, and outputs y1, y2, …, yn.

Training a Single-Layered Perceptron (diagram): the same network together with desired outputs d1, d2, …, dn from a training set. Goal: adjust the weights so that the outputs y1, …, yn match the desired outputs for every training pattern.

Learning Rules (diagram and training set as above). Linear Threshold Units (LTUs): Perceptron Learning Rule. Linearly Graded Units (LGUs): Widrow-Hoff Learning Rule.

Feed-Forward Neural Networks Learning Rules for Single-Layered Perceptron Networks Perceptron Learning Rule Adaline Learning Rule

Perceptron: Linear Threshold Unit (diagram): inputs x1, x2, …, xm = −1 with weights wi1, wi2, …, wim = θi, a summation, and the sgn activation producing +1 or −1.

Perceptron: Linear Threshold Unit (continued). Goal: find weights such that the output sgn(w^T x) equals the desired value d for every training pattern.

Example (diagram): two classes in the (x1, x2) plane, Class 1 (+1) and Class 2 (−1), separated by the decision boundary g(x) = −2x1 + 2x2 + 2 = 0; the network has inputs x1, x2, x3 = −1 and output y. Goal: output +1 for Class 1 and −1 for Class 2.

Augmented input vector (diagram): append the fixed input x3 = −1 to each pattern so the threshold becomes an ordinary weight; Class 1 is labeled +1 and Class 2 is labeled −1. Goal as before.

Augmented input vector (diagram in the (x1, x2, x3) space): Class 1 points (−1, −2, −1), (−1.5, −1, −1), (−1, 0, −1); Class 2 points (1, −2, −1), (2.5, −1, −1), (2, 0, −1); the origin (0, 0, 0) is marked. Goal as before.

Augmented input vector (same diagram): the decision boundary becomes a plane that passes through the origin of the augmented input space.
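A small sketch of the augmentation trick, assuming the weights (−2, 2) and threshold −2 read off the reconstructed boundary g(x) = −2x1 + 2x2 + 2 from the earlier example; appending the fixed input x3 = −1 folds the threshold into the weight vector, so the decision surface passes through the origin.

```python
import numpy as np

def augment(x):
    """Append the fixed input x_m = -1 so the threshold becomes an ordinary weight."""
    return np.append(x, -1.0)

# Illustrative 2-D decision function g(x) = w1*x1 + w2*x2 - theta
w1, w2, theta = -2.0, 2.0, -2.0        # assumed values matching g(x) = -2*x1 + 2*x2 + 2
w_aug = np.array([w1, w2, theta])      # augmented weight vector (theta folded in)

x = np.array([-1.5, -1.0])             # one of the Class 1 points from the slide
g = w_aug @ augment(x)                 # equals w1*x1 + w2*x2 - theta
print(g, "-> Class 1" if g > 0 else "-> Class 2")   # 3.0 -> Class 1
```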

Linearly Separable vs. Linearly Non-Separable (diagrams): AND and OR are linearly separable; XOR is linearly non-separable.

Goal. Given training sets T1 ⊆ C1 and T2 ⊆ C2 with elements of the form x = (x1, x2, …, xm−1, xm)^T, where x1, x2, …, xm−1 ∈ R and xm = −1. Assume T1 and T2 are linearly separable. Find w = (w1, w2, …, wm)^T such that w^T x > 0 for x ∈ T1 and w^T x < 0 for x ∈ T2.

Goal (continued): the separating surface w^T x = 0 is a hyperplane that passes through the origin of the augmented input space.

Observation (diagram): points with d = +1 and d = −1 in the (x1, x2) plane, several candidate weight vectors w1, …, w6, and a point x. Which w's correctly classify x? What trick can be used?

Observation (diagram): a candidate w and its decision line w1x1 + w2x2 = 0, together with the point x. Is this w OK?

Observation (diagram): this w does not classify x correctly. How should we adjust w? Δw = ?

Observation (diagram): how to adjust w? Is Δw = ηx reasonable? The diagram marks the half-planes where w^T x < 0 and w^T x > 0.

Observation (diagram): another case where w misclassifies x. Is this w OK? Δw = ? Should it be +x or −x?

Perceptron Learning Rule. Upon misclassification of a training pattern (d = +1 or d = −1), define the error e = d − y: e = +2 when d = +1 and y = −1, e = −2 when d = −1 and y = +1, and e = 0 when there is no error.

Perceptron Learning Rule (continued): the error e = d − y takes the values +2, −2, or 0 (no error).

Perceptron Learning Rule: Δw = η (d − y) x, where η is the learning rate, (d − y) is the error, and x is the input.

Summary: Perceptron Learning Rule. Based on the general weight learning rule: Δw = η(d − y)x when the classification is incorrect, and Δw = 0 when it is correct.
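A minimal sketch of this rule as a training loop, using the augmented example points from the earlier slides as data; the learning rate, epoch limit, and zero initial weights are assumptions for illustration.

```python
import numpy as np

def train_perceptron(X, d, eta=0.5, max_epochs=100):
    """Perceptron learning rule on augmented inputs X (one row per pattern).

    Weights change only on misclassification: w <- w + eta * (d_k - y_k) * x_k,
    where y_k = sgn(w^T x_k) and d_k is the desired output in {+1, -1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_k, d_k in zip(X, d):
            y_k = 1.0 if w @ x_k >= 0 else -1.0
            if y_k != d_k:                 # incorrect -> adjust; correct -> leave w alone
                w += eta * (d_k - y_k) * x_k
                errors += 1
        if errors == 0:                    # every pattern classified correctly
            return w
    return w

# Augmented points (x1, x2, -1) and desired outputs from the example slides
X = np.array([[-1.0, -2.0, -1.0], [-1.5, -1.0, -1.0], [-1.0, 0.0, -1.0],
              [ 1.0, -2.0, -1.0], [ 2.5, -1.0, -1.0], [ 2.0, 0.0, -1.0]])
d = np.array([+1, +1, +1, -1, -1, -1])
print(train_perceptron(X, d))              # a weight vector separating the two classes
```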

Summary: Perceptron Learning Rule (diagram): input x produces output y, which is compared with the desired output d; the error drives the weight adjustment. Does the procedure converge?

Perceptron Convergence Theorem: if the given training set is linearly separable, the learning process will converge in a finite number of steps. Exercise: consult papers or textbooks for a proof of the theorem.
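As a quick empirical illustration of the theorem (not a proof), the train_perceptron sketch above can be run on randomly generated, linearly separable data; the data-generation scheme below is an assumption, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = rng.normal(size=3)                       # a hidden separating plane through the origin
X = np.column_stack([rng.normal(size=(200, 2)),   # 200 random 2-D points ...
                     -np.ones(200)])              # ... augmented with x3 = -1
d = np.where(X @ w_true >= 0, 1.0, -1.0)          # labels are linearly separable by construction

w = train_perceptron(X, d, max_epochs=1000)       # from the sketch above
train_errors = np.sum(np.where(X @ w >= 0, 1.0, -1.0) != d)
print("training errors:", train_errors)           # 0 once finite-step convergence is reached
```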

The Learning Scenario (diagram): training points x(1), x(2) (positive) and x(3), x(4) (negative) in the (x1, x2) plane; the set is linearly separable.

(Animation frames: starting from w0, each update moves the weight vector, w0 → w1 → w2 → w3, then w4 = w3 since no further correction is needed, ending with a final w that classifies all four points correctly.)

The Learning Scenario (final diagram, with the resulting w): the demonstration is in augmented space. Conceptually, in augmented space, we adjust the weight vector to fit the data.

Weight Space (diagram, axes w1 and w2): for a positive example x, a weight in the shaded area gives the correct classification.

Weight Space (continued, diagram): when the positive example is misclassified, the update Δw = ηx moves w toward the shaded area.

Weight Space (diagram, axes w1 and w2): for a negative example x, a weight not in the shaded area gives the correct classification.

Weight Space (continued, diagram): the corresponding update for the negative example is Δw = −ηx, which moves w out of the shaded area.

The Learning Scenario in Weight Space (diagram): the constraints imposed by x(1), x(2) (positive) and x(3), x(4) (negative), drawn in the (w1, w2) weight plane.

The Learning Scenario in Weight Space (diagram): to correctly classify the training set, the weight must move into the shaded area.

(Animation frames: successive updates move the weight step by step, w0 → w1 → w2 = w3 → w4 → … → w11, until it enters the shaded area.)

The Learning Scenario in Weight Space (final diagram, w0 through w11): conceptually, in weight space, we move the weight into the feasible region.

Feed-Forward Neural Networks Learning Rules for Single-Layered Perceptron Networks Perceptron Learning Rule Adaline Learning Rule

Adaline (Adaptive Linear Element), Widrow [1962] (diagram): inputs x1, x2, …, xm with weights wi1, wi2, …, wim feeding a linear summation unit.

Adaline (Adaptive Linear Element), continued. Goal: w^T x(k) = d(k) for every training pattern. Under what condition is the goal reachable?

LMS (Least Mean Square). Minimize the cost function (error function): E(w) = (1/2) Σk (dk − w^T x(k))^2.

Gradient Descent Algorithm (diagram: the error surface E(w) over (w1, w2) and its contour map): our goal is to go downhill by taking steps Δw.

Gradient Descent Algorithm (continued): how do we find the steepest descent direction?

Gradient Operator. Let f(w) = f(w1, w2, …, wm) be a function over R^m. Define ∇f = (∂f/∂w1, ∂f/∂w2, …, ∂f/∂wm)^T, so that for a small step Δw the change in f is df = (∇f)^T Δw.

Gradient Operator ff ww ff ww ff ww df : positive df : zero df : negative Go uphillPlainGo downhill

ff ww ff ww ff ww The Steepest Decent Direction df : positive df : zero df : negative Go uphillPlainGo downhill To minimize f, we choose  w =   f

LMS (Least Mean Square), revisited: minimize the cost function (error function) E(w) = (1/2) Σk (dk − w^T x(k))^2.

Adaline Learning Rule: minimizing the cost function E(w) by gradient descent gives the weight modification rule Δw = −η∇E(w) = η Σk (dk − w^T x(k)) x(k).

Learning Modes. Batch learning mode: accumulate the correction over the whole training set, Δw = η Σk (dk − w^T x(k)) x(k). Incremental learning mode: update after each pattern, Δw = η (dk − w^T x(k)) x(k).
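A sketch contrasting the two modes for the Adaline/LMS rule, reusing the augmented example points with assumed targets; the learning rate and epoch counts are illustrative, and both versions move toward the least-squares solution of E(w).

```python
import numpy as np

def adaline_batch(X, d, eta=0.01, epochs=500):
    """Batch mode: use the gradient of E(w) = 1/2 * sum_k (d_k - w.x_k)^2 over the whole set."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        e = d - X @ w                      # errors for all patterns at once
        w += eta * (X.T @ e)               # dw = -eta * grad E = eta * sum_k e_k * x_k
    return w

def adaline_incremental(X, d, eta=0.01, epochs=500):
    """Incremental mode: update after every pattern (Widrow-Hoff / delta rule)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            w += eta * (d_k - w @ x_k) * x_k
    return w

# Augmented example points (x1, x2, -1) with targets in {+1, -1} (assumed)
X = np.array([[-1.0, -2.0, -1.0], [-1.5, -1.0, -1.0], [-1.0, 0.0, -1.0],
              [ 1.0, -2.0, -1.0], [ 2.5, -1.0, -1.0], [ 2.0, 0.0, -1.0]])
d = np.array([+1.0, +1.0, +1.0, -1.0, -1.0, -1.0])
print(adaline_batch(X, d))
print(adaline_incremental(X, d))
```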

Comparisons

                 Perceptron Learning Rule      Adaline Learning Rule (Widrow-Hoff)
Fundamental      Hebbian assumption            Gradient descent
Convergence      In finite steps               Converges asymptotically
Constraint       Linearly separable            Linear independence