Artificial neural networks


Artificial neural networks Ricardo Ñanculef Alegría Universidad Técnica Federico Santa María Campus Santiago

Learning from Natural Systems Bio-inspired systems Ant colonies Genetic algorithms Artificial neural networks The power of the brain Examples: vision, text processing Other animals: dolphins, bats Neural networks are traditionally described with a bio-inspired model. Bio-inspired computing systems are those based on the observation and study of natural systems, with the aim of obtaining computing and information-processing systems that imitate them and inherit some of their essential characteristics. As the name suggests, neural networks arose as a simplified model of how the brain works; the brain has many characteristics that are desirable in a computing machine and that are not present in the traditional von Neumann model of computation. Some of these characteristics are: parallel computation (a large number of simple processors, the neurons); distributed knowledge representation (memory is not localized but integrated into the processor); the ability to learn and generalize; adaptability (we can react quickly to new contexts); robustness and fault tolerance (obtained thanks to redundancy and parallelism).

Modeling the Human Brain key functional characteristics Learning and generalization ability Continuous adaptation Robustness and fault tolerance

Modeling the Human Brain key structural characteristics Massive parallelism Distributed knowledge representation: non-localized memory Basic organization: networks of neurons (receptors → neural nets → effectors)

Modeling the Human Brain neurons

Human Brain in numbers Cerebral cortex: about 10^11 neurons (on the order of the number of stars in the Milky Way) Massive connectivity: 10^3 to 10^4 connections per neuron (about 10^15 connections in total) Response time about 10^-3 seconds; silicon chips respond in about 10^-9 seconds (a million times faster) Yet humans are more efficient than computers at computationally complex tasks. Why?

Artificial Neural Networks “A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects: knowledge is acquired by a learning process, and connection strengths between processing units are used to store the acquired knowledge.” Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, reprint 2005

Artificial Neural Networks “From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques which have been developed over several decades (…) for example, discriminant functions, logit models…” Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, reprint 2005

Artificial Neural Networks diverse applications Pattern Classification Clustering Function Approximation: Regression Time Series Forecasting Optimization Content-addressable Memory

The beginnings McCulloch and Pitts, 1943: “A logical calculus of the ideas immanent in nervous activity” First neuron model, based on simplifications of brain behavior: binary incoming signals; connection strengths (weights) attached to each incoming signal; binary response (active or inactive); an activation threshold or bias Just a few years earlier: Boolean algebra

The beginnings The model activation threshold (bias) connection weights

The beginnings These neurons can compute logical operations
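A minimal sketch in Python of a McCulloch-Pitts-style threshold unit computing AND and OR over binary inputs (the weights and thresholds shown are illustrative choices, not taken from the slides):
def mcculloch_pitts(inputs, weights, threshold):
    # The unit fires (outputs 1) when the weighted sum of its binary inputs reaches the threshold.
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# With unit weights, threshold 2 gives AND and threshold 1 gives OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", mcculloch_pitts((x1, x2), (1, 1), 2),
              "OR:", mcculloch_pitts((x1, x2), (1, 1), 1))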

The beginnings Perceptron (1958). Rosenblatt proposes the use of “layers of neurons” as a computational tool. Proposes a training algorithm. Emphasis on the learning capabilities of NNs. McCulloch and Pitts' line of work, in contrast, led towards automata theory.

The perceptron Architecture …

The perceptron Notation …

Perceptron Rule We have a set of patterns with desired responses … If used for classification, the number of neurons in the perceptron equals the number of classes

Perceptron Rule Initialize the weights and the thresholds Present a pattern vector Update the weights according to If used for classification … learning rate
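A minimal sketch of the per-pattern perceptron update, assuming targets coded as ±1 and a fixed learning rate eta (the names and defaults are illustrative):
import numpy as np

def perceptron_train(X, t, eta=0.1, epochs=20):
    # X: (n_samples, n_features) patterns; t: desired responses in {-1, +1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            y_i = 1 if (w @ x_i + b) >= 0 else -1   # present a pattern, compute the response
            if y_i != t_i:                          # correct the weights only on mistakes
                w += eta * t_i * x_i
                b += eta * t_i
    return w, b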

Separating hyperplanes

Separating hyperplanes Hyperplane: the set L of points x satisfying w^T x + b = 0 For any pair of points x1, x2 lying in L, w^T (x1 − x2) = 0 Hence, the normal vector to L is w / ‖w‖

Separating hyperplanes Signed distance of any point x to L: d(x, L) = (w^T x + b) / ‖w‖
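A small sketch of this formula, with w and b as assumed names for the hyperplane parameters:
import numpy as np

def signed_distance(x, w, b):
    # Signed distance from point x to the hyperplane {z : w @ z + b = 0};
    # it is positive on the side that the normal vector w points towards.
    return (w @ x + b) / np.linalg.norm(w)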

Separating hyperplanes Consider a two-class classification problem One class coded as +1 and the other as -1 An input is classified according to the sign of its distance to the hyperplane How to train the classifier?

Separating hyperplanes An idea: train to minimize the total distance of the misclassified inputs to the hyperplane Note this is very different from training with the quadratic loss over all the points

Gradient Descent Suppose we have to minimize f(θ) on θ, where θ is a vector (for example, the weight vector of a network) Iterate: θ ← θ − η ∇f(θ)

Stochastic Gradient Descent Suppose we have to minimize E_z[ f(θ, z) ] on θ, where z is a random variable We have samples z_1, z_2, … of z Iterate, one sample at a time: θ ← θ − η ∇_θ f(θ, z_i)
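A generic sketch of the stochastic gradient descent iteration; grad is an assumed callback returning the gradient of the per-sample loss:
import numpy as np

def sgd(grad, theta0, samples, eta=0.01, epochs=10):
    # grad(theta, z): gradient of the loss at parameter vector theta for one sample z.
    theta = np.array(theta0, dtype=float)
    for _ in range(epochs):
        np.random.shuffle(samples)          # visit the samples in random order
        for z in samples:
            theta -= eta * grad(theta, z)   # one correction per sample
    return theta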

Separating hyperplanes The quantity to minimize is D(w, b) = − Σ_{i in M} y_i (w^T x_i + b), where M is the set of misclassified inputs If M is fixed, the gradient is ∂D/∂w = − Σ_{i in M} y_i x_i and ∂D/∂b = − Σ_{i in M} y_i Stochastic gradient descent visits the misclassified inputs one at a time

Separating hyperplanes For correctly classified inputs no correction on the parameters is applied Now, note that for a misclassified input the stochastic gradient step is w ← w + η y_i x_i and b ← b + η y_i

Separating hyperplanes This is exactly the perceptron rule

Perceptron convergence theorem Theorem: If there exists a set of connection weights and an activation threshold able to separate the two classes, the perceptron algorithm will converge to some solution in a finite number of steps, independently of the initialization of the weights and bias.

Perceptron Conclusion: the perceptron rule, with two classes, is a stochastic gradient descent algorithm that minimizes the total distance of the misclassified examples to the hyperplane. With more than two classes, the perceptron uses one neuron to model each class against the others. This is the modern perspective.

Delta Rule Widrow and Hoff It considers general activation functions

Delta Rule Update the weights according to …

Delta Rule Update the weights according to …

Delta Rule Can the perceptron rule be obtained as a special case of this rule? The step function is not differentiable Note that with this algorithm all the patterns are observed before a correction is made (batch update), while with Rosenblatt's algorithm each pattern induces a correction
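A minimal sketch of one batch update of the delta rule for a single unit with a logistic activation and quadratic loss (the names and the choice of activation are assumptions for illustration):
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_epoch(X, t, w, eta=0.1):
    # Observe all patterns, accumulate the corrections, then update the weights once.
    net = X @ w                      # X: (n_samples, n_features), w: (n_features,)
    y = sigmoid(net)
    delta = (t - y) * y * (1 - y)    # (t - y) * f'(net) for the logistic activation f
    return w + eta * X.T @ delta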

Perceptrons Perceptrons and logistic regression With more than 1 neuron: each neuron has the form of a logistic model of one class against the others.

Neural Networks Death Minsky and Papert, 1969: “Perceptrons” [diagram: single-layer perceptrons with inputs x1 … xn, bias b, and outputs y = +1 / y = -1]

Neural Networks Death A perceptron cannot learn the XOR function (it is not linearly separable)

Neural Networks renaissance Idea: map the data to a feature space where the solution is linear

Neural Networks renaissance Problem: this transformation is problem dependent

Neural Networks renaissance Solution: multilayer perceptrons (FANN) More biologically plausible Internal layers learn the map

Architecture

Architecture: regression each output corresponds to a response

Architecture: classification each output corresponds to a class, such that the predicted class is the one with the largest output Training data has to be coded by 0-1 response variables (one per class)
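A small sketch of this 0-1 coding of the class labels (often called one-hot coding); the function name is illustrative:
import numpy as np

def one_hot(labels, n_classes):
    # One 0-1 response variable per class: a 1 in the column of the true class, 0 elsewhere.
    Y = np.zeros((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

# Example: one_hot([0, 2, 1], 3) -> [[1,0,0], [0,0,1], [0,1,0]]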

Universal approximation Theorem: Let σ be an admissible activation function and let K be a compact subset of R^d. Then, for any continuous function f on K and for any ε > 0, there is a one-hidden-layer network g(x) = Σ_j α_j σ(w_j^T x + b_j) such that |f(x) − g(x)| < ε for every x in K.

Universal approximation Admissible activation functions: for example, non-constant, bounded, continuous (sigmoidal) functions

Universal approximation Extensions: other output activation functions, other norms

Fitting Neural Networks The back-propagation algorithm: A generalization of the delta rule for multilayer perceptrons It is a gradient descent algorithm for the quadratic loss function

Back-propagation Gradient descent generates a sequence of approximations related as w^(t+1) = w^(t) − η ∇E(w^(t))

Back-propagation Equations … Why “back-propagation”? Because the output error is propagated backwards through the network to obtain the gradients of the hidden layers

Back-propagation Algorithm 1. Initialize the weights and the thresholds 2. For each example i, compute the outputs and the error terms 3. Update the weights according to the gradient 4. Iterate 2 and 3 until convergence

Stochastic Back-propagation 1. Initialize the weights and the thresholds 2. For each example i, compute the gradient and update the weights immediately 3. Iterate 2 until convergence
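A compact sketch of stochastic back-propagation for a network with one hidden layer of sigmoid units, linear outputs and quadratic loss (the sizes, initialization and learning rate are illustrative assumptions):
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_sgd(X, T, n_hidden=5, eta=0.1, epochs=100, seed=0):
    # X: (n_samples, n_inputs); T: (n_samples, n_outputs) desired responses.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, T.shape[1])); b2 = np.zeros(T.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):
            h = sigmoid(x @ W1 + b1)                   # forward pass, hidden layer
            y = h @ W2 + b2                            # forward pass, output layer
            d_out = y - t                              # output error (quadratic loss)
            d_hid = (d_out @ W2.T) * h * (1.0 - h)     # error propagated backwards
            W2 -= eta * np.outer(h, d_out); b2 -= eta * d_out
            W1 -= eta * np.outer(x, d_hid); b1 -= eta * d_hid
    return W1, b1, W2, b2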

Some Issues in Training NN Local Minima Architecture selection Generalization and Overfitting Other training functions

Local Minima Back-propagation is a gradient descent procedure and hence converges to any configuration of weights w such that ∇E(w) = 0 This can be a local minimum

Local Minima Starting values: usually random values near zero Note that the sigmoid function is roughly linear when the weights are near zero, so training can be seen as gradually increasing the non-linearity

Local Minima Starting values Stochastic back-propagation: the order of presentation of the examples Multiple neural networks: select the best, average the networks, average the weights, ensemble models

Local Minima Other optimization algorithms Back-propagation with momentum: a momentum term reuses part of the previous update, with momentum parameter typically 0.1-0.8
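A one-step sketch of the momentum update; mu is the momentum parameter and the names are illustrative:
import numpy as np

def momentum_step(w, grad, velocity, eta=0.01, mu=0.5):
    # The momentum term mu * velocity reuses a fraction of the previous correction.
    velocity = mu * velocity - eta * grad
    return w + velocity, velocity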

Overfitting Early stopping and validation set Divide the available data into training and validation sets. Compute the validation error rate periodically during training. Stop training when the validation error rate "starts to go up".
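A sketch of the early-stopping loop described above; train_step and validation_error are assumed callbacks (one training pass and the current validation error), and “starts to go up” is implemented with a simple patience counter:
def train_with_early_stopping(train_step, validation_error, max_epochs=1000, patience=10):
    # Stop when the validation error has not improved for `patience` consecutive epochs.
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_step()                      # one pass over the training set
        err = validation_error()          # error rate on the held-out validation set
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_err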

Overfitting Early stopping and validation set

Overfitting Regularization by weight decay Weight decay shrinks the network towards a linear (very simple!) model
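A minimal sketch of a gradient step on the weight-decay-penalized error E(w) + lam * ‖w‖^2 (the parameter names are illustrative):
import numpy as np

def weight_decay_step(w, grad_E, eta=0.01, lam=1e-3):
    # grad_E is the gradient of the unpenalized error E at w; the extra term 2*lam*w
    # shrinks the weights towards zero at every step.
    return w - eta * (grad_E + 2.0 * lam * w)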

A Closer Look at Regularization A: values of the risk B: space of functions Convergence in A doesn't guarantee convergence in B

Overfitting Regularization by weight decay Tikhonov regularization: let us consider the problem of estimating a function f by observing noisy values y_i ≈ f(x_i) Suppose we minimize the empirical risk on some space of functions H

Overfitting Regularization by weight decay It is well known that, even for continuous A, this is not true (convergence of the risk values does not imply convergence of the functions) Key regularization theorem: if H is compact, the last property holds!

Overfitting Regularization by weight decay Compactness: a subset of R^n is compact if and only if it is closed and bounded Suppose we minimize on a space H where the constraint sets are compact

Overfitting Regularization by weight decay Let Ω be a penalty function whose level sets {w : Ω(w) ≤ c} are compact Hence, under some selections of the regularization parameter, the penalized problem amounts to minimizing over a compact set Example: weight decay, Ω(w) = ‖w‖^2

A Closer Look at Weight Decay The less complicated hypothesis has a lower error rate

NN for classification Loss function: Is the quadratic loss appropriate?

NN for classification

Projection Pursuit Generalization of a 2-layer regression NN Universal approximator Good for prediction Not good for deriving interpretable models of data Basis functions (activation functions) are now “learned” from data Weights are viewed as projection directions we have to “pursue”

Projection Pursuit Output ridge functions & unit vectors Inputs

PPR: Derived Features The dot product ω^T x is the projection of the input x onto the direction ω The ridge function g(ω^T x) varies only in the direction of ω

PPR: Training Minimize the squared error Σ_i (y_i − Σ_m g_m(ω_m^T x_i))^2 Consider a single term Given ω, we derive the features v_i = ω^T x_i and smooth y against v to estimate g Given g, we minimize over ω with a Newton-like method Iterate those two steps to convergence

PPR: Newton’s Method Use derivatives to iteratively improve estimate

PPR: Newton’s Method Use derivatives to iteratively improve estimate Weighted least squares regression to hit the target
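A sketch of this Newton-like step for a single projection direction, assuming the smoothed ridge function g and its derivative g_prime are supplied (for example by a spline or local-regression smoother); all names are illustrative:
import numpy as np

def ppr_newton_step(X, y, w, g, g_prime):
    # One update of the projection direction w, given the current ridge function g.
    v = X @ w                               # current derived feature for every input
    gp = g_prime(v)                         # assumed nonzero where it is evaluated
    target = v + (y - g(v)) / gp            # adjusted target for the linearized problem
    weights = gp ** 2
    A = X * weights[:, None]                # weighted least squares to hit the target
    w_new = np.linalg.solve(X.T @ A, A.T @ target)
    return w_new / np.linalg.norm(w_new)    # keep the direction a unit vector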

PPR: Implementation Details Suggested smoothing methods Local regression Smoothing splines (ω_m, g_m) pairs added in a forward stage-wise manner Very close to ensemble methods

Conclusions Neural networks are a very general approach to both regression and classification They are an effective learning tool when prediction is the goal and an explicit description of the problem's solution is not required