Learning Processes.

Introduction A property of primary significance in a neural network is its ability to learn from its environment and to improve its performance through learning, via iterative adjustment of synaptic weights. Learning is hard to define precisely. One definition, by Mendel and McClaren: Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

Learning The sequence of events in neural network learning: (1) the network is stimulated by the environment; (2) the network undergoes changes in its free parameters as a result of this stimulation; (3) the network responds in a new way to the environment because of the changes that have occurred in its internal structure. A prescribed set of well-defined rules for the solution of the learning problem is called a learning algorithm. The manner in which a neural network relates to its environment dictates the learning paradigm, which refers to a model of the environment operated on by the network.

Overview
- Five basic learning rules: error-correction, Hebbian, memory-based, competitive, and Boltzmann learning
- Learning paradigms: the credit assignment problem, supervised learning, unsupervised learning
- Learning tasks, memory, and adaptation
- The bias-variance tradeoff

Error Correction Learning Input $x(n)$, output $y_k(n)$, and desired response or target output $d_k(n)$. Error signal: $e_k(n) = d_k(n) - y_k(n)$. The error signal $e_k(n)$ actuates a control mechanism that gradually adjusts the synaptic weights so as to minimize the cost function (or index of performance) $\mathcal{E}(n) = \frac{1}{2} e_k^2(n)$. When the synaptic weights reach a steady state, learning is stopped.

Error Correction Learning: Delta Rule The Widrow-Hoff rule, with learning rate $\eta$: $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$. With that, we can update the weights: $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$. Note: the rule follows from taking partial derivatives of the cost function with respect to the weights, i.e., gradient descent.
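
To make the update concrete, here is a minimal sketch of delta-rule training for a single linear neuron; the function name, learning rate, and epoch count are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def delta_rule_train(X, d, eta=0.1, epochs=50):
    """Train a single linear neuron with the Widrow-Hoff (delta) rule.

    X: (N, m) array of input patterns x(n); d: (N,) desired responses d_k(n).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = w @ x              # actual response y_k(n)
            e = target - y         # error signal e_k(n) = d_k(n) - y_k(n)
            w += eta * e * x       # delta rule: eta * e_k(n) * x_j(n)
    return w
```

In practice one would also monitor $\mathcal{E}(n)$ and stop once the weights reach a steady state, as the slide describes.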

Memory-Based Learning or Lazy Learning All (or most) past experiences are explicitly stored as input-target pairs $\{(x_i, d_i)\}_{i=1}^{N}$. Given a new input $x_{test}$, the class is determined from the local neighborhood of $x_{test}$. Two ingredients: the criterion used for determining the neighborhood, and the learning rule applied to the neighborhood of the input within the set of training examples. Examples: CMAC, case-based learning.

Memory-Based Learning: Nearest Neighbor A set of instances observed so far: $X = \{x_1, x_2, \dots, x_N\}$. The nearest neighbor $x_{N'} \in X$ of $x_{test}$ satisfies $\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})$, where $d(\cdot,\cdot)$ is the Euclidean distance. $x_{test}$ is assigned the same class as $x_{N'}$. Cover and Hart (1967): the nearest-neighbor probability of error is at most twice the optimal (Bayes) probability of error, given that (1) the classified examples are independently and identically distributed, and (2) the sample size $N$ is infinitely large.
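
A minimal sketch of the nearest-neighbor rule follows; the helper name and array layout are assumptions made for illustration.

```python
import numpy as np

def nearest_neighbor_classify(X, labels, x_test):
    """Assign x_test the class of its nearest stored instance.

    X: (N, m) stored instances x_1..x_N; labels: (N,) their classes.
    """
    dists = np.linalg.norm(X - x_test, axis=1)   # Euclidean d(x_i, x_test)
    return labels[np.argmin(dists)]              # class of the nearest neighbor x_N'
```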

Hebbian Synapses Donald Hebb’s postulate of learning appeared in his book The Organization of Behavior (1949): “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” Hebbian synapse: if two neurons on either side of a synapse are activated simultaneously, the synapse is strengthened; if they are activated asynchronously, the synapse is weakened or eliminated. (The weakening clause is a later extension; it does not appear in Hebb’s original postulate.)

Hebbian Learning [Figure: network diagram with input signals flowing to output signals.] Hebb’s Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.

Hebbian Learning Using Hebb’s Law, we can express the adjustment applied to the weight $w_{ij}$ at iteration $n$: $\Delta w_{ij}(n) = F(y_j(n), x_i(n))$. As a special case, we can represent Hebb’s Law as follows (the activity product rule): $\Delta w_{ij}(n) = \alpha\, y_j(n)\, x_i(n)$, where $\alpha$ is the learning-rate parameter. Introducing a non-linear forgetting factor into Hebb’s Law gives $\Delta w_{ij}(n) = \alpha\, y_j(n)\, x_i(n) - \varphi\, y_j(n)\, w_{ij}(n)$, where $\varphi$ is the (usually small, positive) forgetting factor.
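
A minimal sketch of one Hebbian weight update with the forgetting factor; the function name and constants are illustrative assumptions.

```python
import numpy as np

def hebbian_update(w, x, y, alpha=0.1, phi=0.01):
    """Generalized activity-product update for the weights into output neuron j.

    w: (m,) weights w_ij; x: (m,) presynaptic inputs x_i; y: postsynaptic output y_j.
    """
    return w + alpha * y * x - phi * y * w   # Hebbian growth minus forgetting

w = hebbian_update(np.zeros(3), np.array([1.0, 0.5, 0.0]), y=1.0)
```

Without the forgetting term ($\varphi = 0$) the weights can grow without bound; the forgetting factor helps keep them bounded.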

Hebbian Learning Hebbian learning is time-dependent, highly local, and heavily interactive. Positively correlated activity: strengthen the synapse; negatively correlated activity: weaken it. There is strong evidence for Hebbian plasticity in the hippocampus (a brain region). Correlation vs. covariance rule: the covariance rule (Sejnowski, 1977) is $\Delta w_{ij}(n) = \eta\,(y_j(n) - \bar{y})(x_i(n) - \bar{x})$.
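
The covariance rule can be sketched in the same style as the previous block; the running means $\bar{x}$, $\bar{y}$ are assumed to be estimated elsewhere.

```python
def covariance_update(w, x, y, x_mean, y_mean, eta=0.1):
    """Sejnowski covariance rule: strengthen positively correlated activity
    and weaken negatively correlated activity (deviations from the means)."""
    return w + eta * (y - y_mean) * (x - x_mean)
```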

Competitive Learning In competitive learning, neurons compete among themselves to be activated. While in Hebbian learning several output neurons can be activated simultaneously, in competitive learning only a single output neuron is active at any time. The output neuron that wins the “competition” is called the winner-takes-all neuron. The basic idea of competitive learning was introduced in the early 1970s. In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called self-organizing feature maps, which are based on competitive learning.

Competitive Learning Competitive learning is highly suited to discovering statistically salient features (which may aid in classification). Three basic elements: (1) a set of neurons of the same type with different weight sets, so that they respond differently to a given set of inputs; (2) a limit imposed on the “strength” of each neuron; (3) a competition mechanism that chooses one winner: the winner-takes-all neuron.

Competitive Learning Inputs and weights can be seen as vectors $x$ and $w_k$; note that each weight vector belongs to a particular output neuron $k$, hence the index. The architecture is a single layer with feedforward excitatory connections and lateral inhibitory connections. Winner selection: $y_k = 1$ if $v_k > v_j$ for all $j \neq k$, and $y_k = 0$ otherwise. Limit on strength: $\sum_j w_{kj} = 1$ for all $k$. Adaptation: $\Delta w_{kj} = \eta\,(x_j - w_{kj})$ if $k$ is the winner or a neighbor of it, and $\Delta w_{kj} = 0$ otherwise. The weight vector $w_k = (w_{k1}, w_{k2}, \dots, w_{kn})$ is thus moved toward the input vector.

Competitive Learning With the adaptation rule $\Delta w_{kj} = \eta\,(x_j - w_{kj})$ applied only to the winner (or its neighbors), the weight vectors converge toward local input clusters: the network performs clustering.
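
A minimal sketch of winner-takes-all competitive learning; the function name, neuron count, and use of Euclidean distance for the competition are assumptions (for normalized vectors, the smallest distance corresponds to the largest activation $v_k = w_k \cdot x$).

```python
import numpy as np

def competitive_learning(X, k=3, eta=0.1, epochs=20, seed=0):
    """Winner-takes-all competitive learning.

    X: (N, m) input vectors; returns (k, m) weight vectors, one per output neuron.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)          # start from unit vectors
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition
            W[winner] += eta * (x - W[winner])             # move winner toward input
    return W
```

Each weight vector ends up near the centroid of one input cluster, which is the clustering behavior the slide describes.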

Boltzmann Learning A stochastic learning algorithm rooted in statistical mechanics. The network is recurrent, with binary neurons (on: $+1$, off: $-1$) and energy function $E = -\frac{1}{2} \sum_j \sum_{k,\, k \neq j} w_{kj}\, x_k x_j$. Activation: choose a random neuron $k$ and flip its state with probability (given temperature $T$) $P(x_k \to -x_k) = \frac{1}{1 + \exp(-\Delta E_k / T)}$, where $\Delta E_k$ is the change in $E$ due to the flip.
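
A sketch of one stochastic update step. Sign conventions differ between texts; here $\Delta E_k$ is taken as the energy reduction produced by the flip, so that energy-lowering flips are more probable. The function name and state encoding are illustrative assumptions.

```python
import numpy as np

def boltzmann_flip(x, W, k, T, rng):
    """Attempt to flip neuron k in a Boltzmann machine at temperature T.

    x: (n,) states in {-1, +1}; W: (n, n) symmetric weights with zero diagonal.
    Flipping x_k changes E by 2 * x_k * sum_j w_kj x_j, so the energy
    reduction is the negative of that quantity.
    """
    delta_E = -2.0 * x[k] * (W[k] @ x)                 # energy reduction from the flip
    if rng.random() < 1.0 / (1.0 + np.exp(-delta_E / T)):
        x[k] = -x[k]                                   # accept the flip
    return x
```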

Boltzmann Machine Two types of neurons: visible neurons, which can be affected by the environment, and hidden neurons, which are isolated. Two modes of operation: clamped, in which the visible neuron states are fixed by the environmental input and held constant, and free-running, in which all neurons are allowed to update their activity freely.

Boltzmann Machine: Learning Let $\rho_{kj}^+$ be the correlation of the activities of neurons $k$ and $j$ in the clamped condition, and $\rho_{kj}^-$ the correlation in the free-running condition, where $\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$; note that both $\rho_{kj}^+$ and $\rho_{kj}^-$ range in value from $-1$ to $+1$. Weight update: $\Delta w_{kj} = \eta\,(\rho_{kj}^+ - \rho_{kj}^-)$. Train the weights $w_{kj}$ with various clamping input patterns. After training is completed, present a new clamping input pattern that is a partial version of one of the known vectors. Run the network clamped on the new input (a subset of the visible neurons), and it will eventually complete the pattern (pattern completion).
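
A minimal sketch of the weight update, assuming the clamped and free-running correlations have already been estimated; names are illustrative.

```python
import numpy as np

def boltzmann_weight_update(W, rho_plus, rho_minus, eta=0.01):
    """Boltzmann learning rule: Delta w_kj = eta * (rho+_kj - rho-_kj).

    rho_plus, rho_minus: (n, n) correlations measured in the clamped and
    free-running phases; entries lie in [-1, +1].
    """
    dW = eta * (rho_plus - rho_minus)
    np.fill_diagonal(dW, 0.0)          # no self-connections
    return W + dW
```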

Learning Paradigms How neural networks relate to their environment: the credit assignment problem, learning with a teacher, and learning without a teacher.

Credit Assignment Problem How do we assign credit or blame for the overall outcome to the individual decisions made by the learning machine? In many cases, the outcome depends on a sequence of actions. Assignment of credit for outcomes to actions (the temporal credit-assignment problem): when does a particular action deserve credit? Assignment of credit for actions to internal decisions (the structural credit-assignment problem): assign credit to the internal structures of the actions generated by the system. The credit-assignment problem routinely arises in neural network learning: which neuron, which connection, should we credit or blame?

Learning Paradigms Supervised learning: the teacher has knowledge of the environment, represented as input-output examples; the environment itself is unknown to the network. The network tries to emulate the teacher gradually. Error-correction learning is one way to achieve this (error surface, gradient, steepest descent, etc.).

Learning Paradigms Learning without a teacher: no labeled examples of the function to be learned are available. Two flavors: 1) reinforcement learning and 2) unsupervised learning.

Learning Paradigms 1) Reinforcement learning: the learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index of performance.

Learning Paradigms Delayed reinforcement means that the system observes a temporal sequence of stimuli before receiving the reinforcement signal. This is difficult for two reasons: there is no teacher to provide a desired response at each step of the learning process, and the delay incurred in the generation of the primary reinforcement signal implies that the machine must solve a temporal credit-assignment problem. Reinforcement learning is closely related to dynamic programming.

Learning Paradigms 2) Unsupervised learning: there is no external teacher or critic to oversee the learning process. Provision is made for a task-independent measure of the quality of the representation that the network is required to learn, and the network forms internal representations that encode features of the input space. A competitive learning rule, such as winner-takes-all, is needed.

The Issues of Learning Tasks: Pattern Association An associative memory is a brain-like distributed memory that learns by association. Autoassociation: a neural network is required to store a set of patterns by repeatedly presenting them to the network; the network is then presented a partial description of an original stored pattern, and the task is to retrieve that particular pattern. Heteroassociation: it differs from autoassociation in that an arbitrary set of input patterns is paired with another arbitrary set of output patterns.

The Issues of Learning Tasks Let $x_k$ denote a key pattern and $y_k$ denote a memorized pattern. The pattern association is described by $x_k \to y_k$, $k = 1, 2, \dots, q$. In an autoassociative memory, $x_k = y_k$; in a heteroassociative memory, $x_k \neq y_k$. There is a storage phase and a recall phase; $q$ is a direct measure of the storage capacity.

The Issues of Learning Tasks: Classification Pattern recognition: the process whereby a received pattern/signal is assigned to one of a prescribed number of classes. Two general types: feature extraction (observation space to feature space; cf. dimensionality reduction) followed by classification (feature space to decision space), or a single step (observation space directly to decision space).

The Issues of Learning Tasks Function approximation: consider a nonlinear input-output mapping $d = f(x)$, where the vector $x$ is the input and the vector $d$ is the output. The function $f(\cdot)$ is assumed to be unknown. The requirement is to design a neural network that approximates the unknown function $f(\cdot)$ so that $\|F(x) - f(x)\| < \epsilon$ for all $x$. Applications: system identification ($d = f(x)$) and inverse system modeling ($x = f^{-1}(d)$).

The Issues of Learning Tasks Control: the controller has to invert the plant’s input-output behavior. A feedback controller adjusts the plant input $u$ so that the output of the plant $y$ tracks the reference signal $d$. Learning takes the form of free-parameter adjustment in the controller.

The Issues of Learning Tasks Filtering: estimate a quantity at time $n$ based on measurements up to time $n$. Smoothing: estimate a quantity at time $n$ based on measurements up to time $n + \alpha$ ($\alpha > 0$). Prediction: estimate a quantity at time $n + \alpha$ based on measurements up to time $n$ ($\alpha > 0$).

The Issues of Learning Tasks Blind source separation: recover the source vector $u(n)$ from the distorted signal $x(n) = A\,u(n)$ when the mixing matrix $A$ is unknown. Nonlinear prediction: given $x(n-T), x(n-2T), \dots$, compute the estimate $\hat{x}(n)$ of $x(n)$.
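
As a small illustration of how the prediction task is typically set up, here is a sketch that builds (input, target) pairs from a time series; the function name and lag count are assumptions made for illustration.

```python
import numpy as np

def delay_embedding(x, T=1, lags=3):
    """Build (input, target) pairs for prediction: from
    x(n-T), x(n-2T), ..., x(n-lags*T), predict x(n)."""
    X, y = [], []
    for n in range(lags * T, len(x)):
        X.append([x[n - i * T] for i in range(1, lags + 1)])
        y.append(x[n])
    return np.array(X), np.array(y)
```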

The Issues of Learning Tasks: Associative Memory There are $q$ pattern pairs $(x_k, y_k)$, $k = 1, 2, \dots, q$. Input (key vector): $x_k = [x_{k1}, x_{k2}, \dots, x_{km}]^t$. Output (memorized vector): $y_k = [y_{k1}, y_{k2}, \dots, y_{km}]^t$. The weights can be represented as a weight matrix (see the next slide): $y_k = W(k)\, x_k$ for $k = 1, 2, \dots, q$, i.e., $y_{ki} = \sum_{j=1}^{m} w_{ij}(k)\, x_{kj}$ for $i = 1, 2, \dots, m$, the inner product of the $i$-th row $[w_{i1}(k), w_{i2}(k), \dots, w_{im}(k)]$ with $x_k$.

Associative Memory In matrix form: $[y_{k1}, y_{k2}, \dots, y_{km}]^t = W(k)\,[x_{k1}, x_{k2}, \dots, x_{km}]^t$, where row $i$ of the $m \times m$ matrix $W(k)$ is $[w_{i1}(k), w_{i2}(k), \dots, w_{im}(k)]$. With a single $W(k)$, we can only represent one mapping $x_k \to y_k$; for all pairs $(x_k, y_k)$, $k = 1, 2, \dots, q$, we need $q$ such weight matrices: $y_k = W(k)\, x_k$. One strategy is to combine all $W(k)$ into a single memory matrix $M$ by simple summation: $M = \sum_{k=1}^{q} W(k)$. Will such a simple strategy work? That is, can we have $y_k \approx M x_k$ for $k = 1, 2, \dots, q$?

Correlation Matrix Memory With a fixed set of key vectors $x_k$, an $m \times m$ matrix can store $m$ arbitrary output vectors $y_k$. Let $x_k = [0, 0, \dots, 1, \dots, 0]^t$, where only the $k$-th element is 1 and all the rest are 0. Construct a memory matrix $M$ with each column holding one of the arbitrary output vectors: $M = [y_1, y_2, \dots, y_m]$. Then $y_k = M x_k$ for $k = 1, 2, \dots, m$. But we want the $x_k$ to be arbitrary too!

Correlation Matrix Memory With $q$ pairs $(x_k, y_k)$, $k = 1, 2, \dots, q$, we can construct a candidate memory matrix that stores all $q$ mappings as $\hat{M} = \sum_{k=1}^{q} y_k x_k^t = y_1 x_1^t + y_2 x_2^t + \dots + y_q x_q^t = Y X^t$, where $Y = [y_1, y_2, \dots, y_q]$ and $X = [x_1, x_2, \dots, x_q]$. This can be verified easily using partitioned matrices.
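
A minimal sketch of building and querying the correlation matrix memory; the function name, array layout, and orthonormal-key demo are illustrative assumptions.

```python
import numpy as np

def build_correlation_memory(X, Y):
    """M_hat = sum_k y_k x_k^T = Y X^T.

    X: (q, m) key vectors as rows; Y: (q, m) memorized vectors as rows.
    """
    return Y.T @ X  # sums the outer products y_k x_k^T

# Demo: with orthonormal keys, recall is exact (see the next slide).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthogonal matrix
X = Q.T                                       # rows x_k: orthonormal keys
Y = rng.normal(size=(4, 4))                   # arbitrary memorized vectors y_k
M = build_correlation_memory(X, Y)
assert np.allclose(M @ X[0], Y[0])            # M x_1 == y_1
```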

Correlation Matrix Memory First, consider a single term $W(k) = y_k x_k^t$ and check whether $W(k)\, x_k = y_k$: $W(k)\, x_k = y_k x_k^t x_k = y_k\,(x_k^t x_k) = c\, y_k$, where $c = x_k^t x_k$ is a scalar (the squared length of the vector $x_k$). If all $x_k$ are normalized to have length 1, then $W(k)\, x_k = y_k$ holds. Now, back to $\hat{M}$: under what condition does $\hat{M} x_j = y_j$ hold for all $j$? Assume the key vectors are normalized, $x_k^t x_k = 1$. We can decompose $\hat{M} x_j$ as $\hat{M} x_j = \sum_{k=1}^{q} y_k x_k^t x_j = y_j x_j^t x_j + \sum_{k=1,\, k \neq j}^{q} y_k x_k^t x_j = y_j + \sum_{k=1,\, k \neq j}^{q} y_k x_k^t x_j$. If all keys are mutually orthogonal, the last (noise) term vanishes and $\hat{M} x_j = y_j$.

Correlation Matrix Memory We can also ask how many items can be stored in $\hat{M}$, i.e., its capacity. The capacity is closely related to the rank of the matrix $\hat{M}$, where the rank is the number of linearly independent column vectors (or row vectors) in the matrix.

Adaptation When the environment is stationary (its statistical characteristics do not change over time), supervised learning can be used to obtain a relatively stable set of parameters. If the environment is nonstationary, the parameters need to be adapted over time on an ongoing basis (continuous learning, or learning-on-the-fly). If the signal is locally stationary (pseudostationary), the parameters can be repeatedly retrained on a small window of samples that is assumed stationary: continual training with time-ordered samples.

Probabilistic and Statistical Aspects of the Learning Process We do not have knowledge of the exact functional relationship between $X$ and $D$, so we posit a regressive model $D = f(X) + \varepsilon$. The mean value of the expectational error $\varepsilon$, given any realization of $X$, is zero, and $\varepsilon$ is uncorrelated with the regression function $f(X)$.

Probabilistic and Statistical Aspects of the Learning Process Bias/variance dilemma: the average loss decomposes as $L_{av}(f(x), F(x, \mathcal{T})) = B^2(w) + V(w)$, where $B(w) = E_{\mathcal{T}}[F(x, \mathcal{T})] - E[D \mid X = x]$ (an approximation error) and $V(w) = E_{\mathcal{T}}[(F(x, \mathcal{T}) - E_{\mathcal{T}}[F(x, \mathcal{T})])^2]$ (an estimation error). Neural networks tend to have small bias and large variance; deliberately introducing bias can reduce the variance. See other sources for details.
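
To make the decomposition tangible, here is a minimal Monte Carlo sketch that estimates $B^2(w)$ and $V(w)$ for a polynomial fit standing in for $F(x, \mathcal{T})$; the true function, noise level, and evaluation point are all illustrative assumptions.

```python
import numpy as np

def bias_variance_estimate(degree, n_trials=200, n_samples=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-d polynomial fit F(x, T).

    True regression f(x) = sin(2 pi x); each trial draws a fresh training set T.
    """
    rng = np.random.default_rng(seed)
    x_eval = 0.5                                  # evaluate B and V at one point x
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, n_samples)
        d = np.sin(2 * np.pi * x) + noise * rng.normal(size=n_samples)
        coeffs = np.polyfit(x, d, degree)         # train F(x, T) on this T
        preds.append(np.polyval(coeffs, x_eval))
    preds = np.array(preds)
    bias_sq = (preds.mean() - np.sin(2 * np.pi * x_eval)) ** 2  # B^2(w)
    variance = preds.var()                        # V(w)
    return bias_sq, variance

# Low degree -> high bias, low variance; high degree -> low bias, high variance.
```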