CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 5: Distributed Representations Geoffrey Hinton

Localist representations
The simplest way to represent things with neural networks is to dedicate one neuron to each thing.
– Easy to understand.
– Easy to code by hand. Often used to represent inputs to a net.
– Easy to learn. This is what mixture models do: each cluster corresponds to one neuron.
– Easy to associate with other representations or responses.
But localist models are very inefficient whenever the data has componential structure.
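A minimal sketch of this scheme, with an invented vocabulary: a localist code is just a one-of-N vector, one component per thing.

```python
# Localist coding: dedicate one neuron (vector component) to each thing.
# The vocabulary here is invented for illustration.
import numpy as np

things = ["big", "yellow", "Volkswagen", "small", "red", "Ford"]
index = {name: i for i, name in enumerate(things)}

def localist(name):
    """One-of-N code: a single dedicated neuron is active."""
    v = np.zeros(len(things))
    v[index[name]] = 1.0
    return v

print(localist("yellow"))  # [0. 1. 0. 0. 0. 0.]
```

Every additional concept costs a whole new neuron, which is the inefficiency the last bullet points to.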

Examples of componential structure
Big, yellow Volkswagen:
– Do we have a neuron for this combination? Is the BYV neuron set aside in advance? Is it created on the fly? How is it related to the neurons for big and yellow and Volkswagen?
Consider a visual scene:
– It contains many different objects.
– Each object has many properties like shape, color, size, motion.
– Objects have spatial relationships to each other.

Using simultaneity to bind things together
Represent conjunctions by activating all the constituents at the same time.
– This doesn’t require connections between the constituents.
– But what if we want to represent yellow triangle and blue circle at the same time?
Maybe this explains the serial nature of consciousness.
– And maybe it doesn’t!
[Diagram: a pool of color neurons and a pool of shape neurons]
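The failure in the second bullet can be shown numerically. In this toy sketch (the two-neuron encodings are invented), the superimposed code for “yellow triangle” plus “blue circle” is identical to the code for “blue triangle” plus “yellow circle”, so the conjunctions cannot be recovered.

```python
# Binding by simultaneity: activate all constituent neurons at once.
# The encodings below are made up for illustration.
import numpy as np

colors = {"yellow": np.array([1, 0]), "blue": np.array([0, 1])}
shapes = {"triangle": np.array([1, 0]), "circle": np.array([0, 1])}

def scene(*objects):
    """Superimpose the color and shape neurons of every object."""
    c = sum(colors[col] for col, shp in objects)
    s = sum(shapes[shp] for col, shp in objects)
    return np.concatenate([c, s])

a = scene(("yellow", "triangle"), ("blue", "circle"))
b = scene(("blue", "triangle"), ("yellow", "circle"))
print(np.array_equal(a, b))  # True: the two scenes are indistinguishable
```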

Using space to bind things together
Conventional computers can bind things together by putting them into neighboring memory locations.
– This works nicely in vision. Surfaces are generally opaque, so we only get to see one thing at each location in the visual field.
If we use topographic maps for different properties, we can assume that properties at the same location belong to the same thing.
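A hedged sketch of the topographic-map idea (map sizes and contents invented): one map per property, all indexed by the same spatial location, so features stored at the same location can be attributed to the same object.

```python
# Binding by location: one topographic map per property, sharing coordinates.
import numpy as np

H, W = 4, 4  # a tiny invented "visual field"
color_map = np.full((H, W), "", dtype=object)
shape_map = np.full((H, W), "", dtype=object)

color_map[1, 2], shape_map[1, 2] = "yellow", "triangle"
color_map[3, 0], shape_map[3, 0] = "blue", "circle"

# Properties at the same location belong to the same thing:
for y in range(H):
    for x in range(W):
        if shape_map[y, x]:
            print((y, x), color_map[y, x], shape_map[y, x])
```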

The definition of “distributed representation”
Each neuron must represent something, so this must be a local representation.
“Distributed representation” means a many-to-many relationship between two types of representation (such as concepts and neurons).
– Each concept is represented by many neurons.
– Each neuron participates in the representation of many concepts.
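A tiny made-up example of the many-to-many relationship: rows are concepts, columns are neurons.

```python
# Each concept activates several neurons; each neuron serves several concepts.
import numpy as np

concepts = ["dog", "cat", "car"]
#                     n0 n1 n2 n3 n4
patterns = np.array([[1, 1, 0, 1, 0],   # dog
                     [1, 0, 1, 1, 0],   # cat
                     [0, 1, 1, 0, 1]])  # car

print(patterns.sum(axis=1))  # neurons per concept: [3 3 3]
print(patterns.sum(axis=0))  # concepts per neuron: [2 2 2 2 1]
```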

Coarse coding
Using one neuron per entity is inefficient.
– An efficient code would have each neuron active half the time. This might be inefficient for other purposes (like associating responses with representations).
Can we get accurate representations by using lots of inaccurate neurons?
– If we can, it would be very robust against hardware failure.

Coarse coding
Use three overlapping arrays of large cells to get an array of fine cells.
– If a point falls in a fine cell, code it by activating 3 coarse cells.
This is more efficient than using a neuron for each fine cell.
– It loses by needing 3 arrays.
– It wins by a factor of 3x3 per array.
– Overall it wins by a factor of 3.
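One way to make this concrete (my reading of the construction, using three grids of 3x3 cells, each shifted by one fine cell): the triple of active coarse cells picks out a unique fine cell.

```python
# Coarse coding with three shifted grids of large (3x3) cells.
def coarse_code(x, y, n_grids=3, cell=3):
    """For each shifted grid, the index of the coarse cell containing (x, y)."""
    return [((x + s) // cell, (y + s) // cell) for s in range(n_grids)]

print(coarse_code(4, 7))  # [(1, 2), (1, 2), (2, 3)]
print(coarse_code(5, 7))  # [(1, 2), (2, 2), (2, 3)] -- a different triple
# Neighboring fine cells always differ in at least one coarse cell,
# so the triple of coarse cells identifies the fine cell exactly.
```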

How efficient is coarse coding?
The efficiency depends on the dimensionality.
– In one dimension coarse coding does not help.
– In 2-D the saving in neurons is proportional to the ratio of the coarse radius to the fine radius.
– In k dimensions, by increasing the radius by a factor of r we can keep the same accuracy as with fine fields and get a saving of r^(k-1) in neurons, with no loss of accuracy (here r is the ratio of the radii of the coarse and fine fields).
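A quick numeric check of the saving formula (the exponent r^(k-1) is reconstructed from the 1-D and 2-D special cases above, so treat it as my reading rather than the slide's verbatim equation):

```python
# Saving in neurons from coarsening the fields by a factor r in k dimensions,
# holding accuracy fixed (reconstructed formula, see caveat above).
def coarse_coding_saving(r, k):
    return r ** (k - 1)

print(coarse_coding_saving(3, 1))  # 1   -- no help in one dimension
print(coarse_coding_saving(3, 2))  # 3   -- the factor-of-3 win from the previous slide
print(coarse_coding_saving(3, 6))  # 243 -- the win grows fast with dimensionality
```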

Coarse regions and fine regions use the same surface
Each binary neuron defines a boundary between the k-dimensional points that activate it and the points that don’t.
– To get lots of small regions we need a lot of boundary.
[Diagram: a fine field next to a coarse field]

Limitations of coarse coding
It achieves accuracy at the cost of resolution.
– Accuracy is defined by how much a point must be moved before the representation changes.
– Resolution is defined by how close points can be and still be distinguished in the representation. Representations can overlap and still be decoded if we allow integer activities of more than 1.
It makes it difficult to associate very different responses with similar points, because their representations overlap.
– This is useful for generalization.
The boundary effects dominate when the fields are very big.

Coarse coding in the visual system
As we get further from the retina, the receptive fields of neurons get bigger and bigger and require more complicated patterns.
– Most neuroscientists interpret this as neurons exhibiting invariance.
– But it’s also just what would be needed if neurons wanted to achieve high accuracy for properties like position, orientation, and size.
High accuracy is needed to decide if the parts of an object are in the right spatial relationship to each other.

The Effects of Brain Damage
Performance deteriorates in some unexpected ways when the brain is damaged. This can tell us a lot about how information is processed.
– Damage to the right hemisphere can cause neglect of the left half of visual space and a lack of a sense of ownership of body parts.
– Damage to parts of the infero-temporal cortex can prevent face recognition.
– Damage to other areas can destroy the perception of color or of motion.
Before brain scans, the performance deficits caused by physical damage were the main way to localize functions in the human brain, since recording from human brain cells is not usually allowed (but it can give surprising results!).

Acquired dyslexia
Occasionally, damage to the brain of an adult causes bizarre reading deficits.
– Surface dyslexics can read regular nonsense words like “mave” but mispronounce irregular words like “yacht”.
– Deep dyslexics cannot deal with nonsense words at all. They can sometimes read “yacht” correctly, but sometimes misread “yacht” as “boat”. They are also much better at concrete nouns than at abstract nouns (like “peace”) or verbs.

Some weird effects (stimulus → patient’s response)
PEACH → “apricot”
SYMPATHY → “orchestra”
HAM → “food”

The dual route theory of reading
Marshall and Newcombe proposed that there are two routes that can be separately damaged.
– Deep dyslexics have lost the phonological route and may also have damage to the semantic route.
But there are consistent peculiarities that are hard to explain this way.
[Diagram: visual features of the word (e.g. PEACH) feed two parallel routes, one via the meaning of the word and one via the sound of the word, both converging on speech production]

An advantage of neural network models
Until quite recently, nearly all the models of information processing that psychologists used were inadequate for explaining the effects of damage.
– Either they were symbol-processing models that had no direct relationship to hardware,
– or they were just vague descriptions that could not actually do the information processing.
There is no easy way to make detailed predictions of how hardware damage will affect performance in models of this type. Neural net models have several advantages:
– They actually do the required information processing rather than just describing it.
– They can be physically damaged and the effects can be observed.

A model of the semantic route
– Grapheme units: one unit for each letter/position pair.
– Intermediate units to allow a non-linear mapping.
– Sememe units: one per feature of the meaning.
– Recurrently connected clean-up units: to capture regularities among the sememes.
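A hedged sketch of this architecture in NumPy. All layer sizes are invented, the weights are random rather than trained, and the settling schedule is a guess; the point is only the wiring: graphemes feed forward through intermediate units to sememe units, and the recurrent clean-up loop then pushes the sememe pattern toward an attractor.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GRAPHEME, N_HIDDEN, N_SEMEME, N_CLEANUP = 112, 40, 68, 60  # invented sizes

W_gh = rng.normal(0, 0.1, (N_HIDDEN, N_GRAPHEME))   # grapheme -> intermediate
W_hs = rng.normal(0, 0.1, (N_SEMEME, N_HIDDEN))     # intermediate -> sememe
W_sc = rng.normal(0, 0.1, (N_CLEANUP, N_SEMEME))    # sememe -> clean-up
W_cs = rng.normal(0, 0.1, (N_SEMEME, N_CLEANUP))    # clean-up -> sememe

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def read_word(graphemes, steps=6):
    """One bottom-up pass, then let the clean-up loop settle."""
    hidden = sigmoid(W_gh @ graphemes)
    sememes = sigmoid(W_hs @ hidden)
    for _ in range(steps):
        cleanup = sigmoid(W_sc @ sememes)
        sememes = sigmoid(W_hs @ hidden + W_cs @ cleanup)
    return sememes

print(read_word(rng.integers(0, 2, N_GRAPHEME)).shape)  # (68,)
```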

The equivalence between layered, feedforward nets and recurrent nets
Assume that there is a time delay of 1 in using each connection. The recurrent net is then just a layered net that keeps reusing the same weights.
[Diagram: a recurrent net with weights w1, w2, w3, w4 unrolled into layers at time = 0, 1, 2, 3, with the same weights repeated in every layer]
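A small numeric check of this equivalence: iterating one recurrent weight matrix for four time steps produces exactly the activities of a four-layer feedforward net whose layers all share that matrix. (The tanh nonlinearity and the sizes are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 5))   # the single recurrent weight matrix
x = rng.normal(size=5)

# Recurrent net: apply the same connections at every time step.
h = x
for t in range(4):
    h = np.tanh(W @ h)

# Layered net: four layers that reuse the same weights.
f = x
for W_layer in [W, W, W, W]:
    f = np.tanh(W_layer @ f)

print(np.allclose(h, f))  # True: unrolling changes nothing
```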

What the network learns
We used recurrent back-propagation for six time steps, with the sememe vector as the desired output for the last 3 time steps.
– The network creates semantic attractors. Each word meaning is a point in semantic space and has its own basin of attraction.
– Damage to the sememe or clean-up units can change the boundaries of the attractors. This explains semantic errors: meanings fall into a neighboring attractor.
– Damage to the bottom-up input can change the initial conditions for the attractors. This explains why early damage can cause semantic errors.
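A minimal PyTorch reconstruction of this training setup (my sketch under invented sizes, not the original code): run the clean-up loop for six steps and apply the sememe target only on the last three, so the net is rewarded for settling into the right attractor rather than for hitting it instantly.

```python
import torch

n_sememes, n_cleanup = 68, 60                       # invented sizes
W_sc = torch.nn.Parameter(0.1 * torch.randn(n_cleanup, n_sememes))
W_cs = torch.nn.Parameter(0.1 * torch.randn(n_sememes, n_cleanup))
opt = torch.optim.SGD([W_sc, W_cs], lr=0.1)

bottom_up = torch.randn(n_sememes)                  # stand-in feedforward input
target = (torch.rand(n_sememes) > 0.5).float()      # stand-in sememe vector

for epoch in range(100):
    opt.zero_grad()
    s = torch.sigmoid(bottom_up)
    loss = 0.0
    for t in range(6):                              # six time steps
        c = torch.sigmoid(W_sc @ s)
        s = torch.sigmoid(bottom_up + W_cs @ c)
        if t >= 3:                                  # target on the last 3 steps only
            loss = loss + torch.nn.functional.binary_cross_entropy(s, target)
    loss.backward()
    opt.step()
```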

Sharing the work between attractors and the bottom-up pathway
Feed-forward nets prefer to produce similar outputs for similar inputs.
– Attractors can be used to make life easy for the feed-forward pathway. Damaging attractors can cause errors involving visually similar words.
– This explains why patients who make semantic errors always make some visual errors as well.
– It also explains why errors that are both visually and semantically similar are particularly frequent.
[Diagram: visual space and semantic space, with the visually similar words “cot” and “cat” mapped to nearby points]

Can very different meanings be next to each other in semantic space?
Take two random binary vectors in a high-dimensional space.
– Their scalar product depends on the fraction of the bits that are on in each vector.
– The average of two random binary vectors is much closer to them than to other random binary vectors (see the numeric check below)!
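A numeric check of the last claim, with arbitrary dimensionality and sample sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 1000
a = rng.integers(0, 2, d)
b = rng.integers(0, 2, d)
midpoint = (a + b) / 2.0                     # the average of the two vectors

others = rng.integers(0, 2, (100, d))        # unrelated random binary vectors
print(np.linalg.norm(a - midpoint))               # ~ sqrt(d/8), about 11 here
print(np.linalg.norm(others - a, axis=1).mean())  # ~ sqrt(d/2), about 22 here
```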

Fractal attractors?
The semantic space may have structure at several different scales.
– Large-scale structure represents broad categories.
– Fine-scale structure represents finer distinctions.
Severe damage could blur out all the fine structure.
– Meanings get cleaned up to the meaning of the broad category. Complex features get lost.
– May explain regression to childhood in senility?
[Diagram: broad basins for “tools” and “animals” containing finer basins for hammer, saw, drill and weasel, cat, dog]

The advantage of concrete words
We assume that concrete nouns have many more semantic features than abstract words.
– So they can benefit much more from the semantic clean-up. The right meaning can be recovered even if the bottom-up input is severely damaged.
– But severe damage to the semantic part of the network will hurt concrete nouns more, because they are more reliant on the clean-up.