
1 COGS 511 Computational Cognitive Modelling, Lecture 4: Connectionist and Dynamic Approaches (12.06.2016)

2 Related Readings
Readings: McLeod et al., Chs. 1, 5, 7; Eliasmith, "The Third Contender" (in Thagard, Ch. 13).
See also (all books available in METU Library):
Sun, R. (2008). The Cambridge Handbook of Computational Psychology, Chs. 2, 3, and 4.
Rumelhart and McClelland (1986). Parallel Distributed Processing.
Wermter and Sun (2000). Hybrid Neural Systems.
Dawson (2005). Connectionism: A Hands-on Approach, Blackwell. Also see http://www.bcp.psych.ualberta.ca/~mike/Book3/index.html
Lytton (2002). From Computer to Brain: Foundations of Computational Neuroscience.
O'Reilly and Munakata (2000). Computational Explorations in Cognitive Neuroscience.
Rolls and Treves (1998). Neural Networks and Brain Function.
Ward (2002). Dynamical Cognitive Science.
Sutton's web site on Dynamicism in Cognitive Science (see Links).
Miscellaneous books are available for studying neural networks technically.

3 Computers vs Brains
Computers: rapidly evolving; fast cycle time; storage capacity?; parallelism?; fault tolerance?; adaptiveness?
Brains: slowly evolving; slow cycle time; inherent parallelism (the 100-step constraint: time for a simple cognitive task / time for the firing of one neuron); fault tolerant; adaptive.

4 Neurons
Major cell type in the nervous system (the other: glial cells). About 50-100 billion neurons (10^11); dense connectedness (typical fan-out 10^3); layered organization. Different types of neurons. Soma (body), dendrites (small branches), axon, myelin sheath, synaptic gap (10^-3 mm). Synaptic connections exhibit plasticity.

5 Neurons (cont.)
Resting membrane potential vs. action potential ("fire!"): determined by the concentration of ions. Electrical synapses vs. chemical synapses. Excitatory vs. inhibitory. Neurotransmitters: chemicals released across the synapse, e.g. acetylcholine, dopamine, serotonin; around 30 are known. A neuron's death is final! Synaptic proliferation, pruning, graceful degradation.

6 [Figure] (McLeod et al., 1998)

7 Connectionism
Parallel Distributed Processing (PDP), Artificial Neural Networks (ANNs); Computational Neuroscience; Dynamicism. No supermodel (Unified Theory of Cognition), but rather a metatheory with an explorational perspective, or not?

8 Connectionist Emphasis on Cognition
Parallelism, gradedness, interactivity, competition, learning.

9 Comparing PDP vs Symbolic (Rumelhart and McClelland, 1986)
Knowledge storage: mainly in connection strengths (PDP) vs. a static copy of a pattern (Symbolic).
Knowledge retrieval: recreate the patterns, not necessarily exactly, with interference possible (PDP) vs. map with LTM (Symbolic).
Knowledge and processing: intimately related (PDP) vs. related but separate (Symbolic).
Learning: adjusting connections or their strengths (PDP) vs. better rule formations (Symbolic).
Representation: (usually) distributed (PDP) vs. local (Symbolic).

10 Elements of Connectionism
Computing unit: a node (neuron) as a nonlinear computing unit.
Network topology and connectivity: each node has directed, weighted connections to some other nodes; feedforward/recurrent; single-layer/multilayer.
Learning policy: supervised/unsupervised/reinforcement.
Problem representation: local/distributed/temporal.

11 Objections as Stated in the PDP Book
Objection: PDP models are too weak (Minsky and Papert's objections to perceptrons; Turing equivalence and recursion). Reply: quite changed since 1986.
Objection: behaviouristic. Reply: concerned with representation and mental processing.
Objection: at the implementational level of analysis only. Reply: PDP is at the psychological level; "macrotheories are approximations to underlying microstructure". The argument of total brain simulation: "a principled interpretation of our understanding of the brain that transfigures it into an understanding of the mind".
Objection: not biologically plausible. Reply: not enough is known from neuroscience to seriously constrain cognitive theories.
Objection: reductionist. Reply: interactional, not reductionist.

12 Basic Properties of an ANN Unit
A set of input signals. A set of real-valued weights (excitatory or inhibitory). An activation level based on an input function over the input values and weights (the inner product of the weight and input vectors, Σ w_ij · a_j). Activation/threshold function: if the output of the input function is above the threshold, send activation along the output links. A positive weight increases the positive net input to the unit it connects to; a negative weight reduces the effect of a positive net input to the unit it connects to.
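The unit described above can be sketched in a few lines of Python (the function name, threshold value, and example inputs are illustrative, not from the slides):

```python
import numpy as np

def unit_output(inputs, weights, threshold=0.0):
    """Net input is the inner product of the weight and input vectors
    (sum of w_ij * a_j); the unit sends activation (1.0) along its
    output links only if the net input exceeds the threshold."""
    net = np.dot(weights, inputs)
    return 1.0 if net > threshold else 0.0

# Two excitatory (positive) weights and one inhibitory (negative) weight:
print(unit_output([1.0, 1.0, 1.0], [0.5, 0.4, -0.3]))  # net = 0.6, above threshold
```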

13 Basic ANN Architecture [Figure] (McLeod et al., 1998)

14 Sample Activation Functions [Figure] (McLeod et al., 1998)

15 Learning
If the output units have the correct level of activity in response to the pattern of activity produced by any input in the problem domain, the model has learnt the desired task. The network is said to converge when the error becomes acceptably low. The aim of a learning rule is to find an optimal set of weights for the network connections such that the network can also correctly predict previously unseen input data (problem: overfitting/overtraining). Learning by changing network topology: optimizing the number of hidden units by genetic algorithms; optimizing connectivity by "optimal brain damage". Training vs. testing: training is done over multiple presentations of the data, called sweeps, organized into epochs.

16 Delta Rule
Perceptrons: single-layered, feedforward networks (Rosenblatt, 1950s). An output unit which has too low an activity can be corrected by increasing the weights of connections from units in the previous layer which provide positive input to it (decrease the weights if the input is negative).
Δw_ij = ε · [a_i(desired) − a_i(obtained)] · a_j
Learning rate (ε): a constant that determines how large the changes to the weights will be on any learning trial.
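A minimal perceptron trained with the delta rule on logical AND, a linearly separable problem; the function name, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def train_perceptron(patterns, targets, epsilon=0.1, epochs=50):
    """Delta rule: delta_w_ij = epsilon * (desired - obtained) * a_j.
    A bias weight is folded in as an always-on extra input."""
    X = np.hstack([patterns, np.ones((len(patterns), 1))])  # append bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            a = 1.0 if np.dot(w, x) > 0 else 0.0  # threshold activation
            w += epsilon * (t - a) * x            # delta rule update
    return w

# Logical AND is linearly separable, so a single-layer perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
preds = [1.0 if np.dot(w, np.append(x, 1.0)) > 0 else 0.0 for x in X]
print(preds)
```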

17 Delta Rule -> LMS -> Gradient Descent
netinput_i = Σ_j w_ij · a_j
δ = (t_out − a_out)
E_p = (t_out − a_out)² for an input pattern p
a_out = F(Σ_in w · a_in)
Δw = −ε · dE/dw
Δw = −ε · d[(t_out − a_out)²]/dw
Δw = −ε · d[(t_out − F(Σ_in w · a_in))²]/dw
Δw = 2 · ε · δ · F′(Σ_in w · a_in) · a_in
F′: slope of the activation function at the output unit
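The final update of the derivation can be checked numerically. This sketch (with assumed example weights and inputs) compares the analytic update 2 · ε · δ · F′(net) · a_in against a finite-difference estimate of −ε · dE/dw for a logistic output unit:

```python
import numpy as np

eps = 0.05  # learning rate (illustrative value)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def delta_w(w, a_in, t_out):
    """Analytic update from the derivation:
    dw = 2 * eps * delta * F'(net) * a_in, with delta = t_out - a_out
    and F'(net) = a_out * (1 - a_out) for the logistic function."""
    a_out = logistic(np.dot(w, a_in))
    delta = t_out - a_out
    return 2 * eps * delta * a_out * (1 - a_out) * a_in

w = np.array([0.3, -0.2])
a_in = np.array([1.0, 0.5])
t_out = 1.0
analytic = delta_w(w, a_in, t_out)

# Finite-difference estimate of -eps * dE/dw, with E = (t_out - a_out)^2
h = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += h
    wm[i] -= h
    Ep = (t_out - logistic(np.dot(wp, a_in))) ** 2
    Em = (t_out - logistic(np.dot(wm, a_in))) ** 2
    numeric[i] = -eps * (Ep - Em) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-8))
```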

18 Linear Separability [Figure] (McLeod et al., 1998)

19 Exclusive OR (XOR) Reimplemented [Figure] (McLeod et al., 1998)

20 Gradient Descent
E_p: error score for input pattern p (the square of the difference between the desired output and the obtained output). Holding all but one weight constant. Error landscape for two weights. [Figure] (McLeod et al., 1998)

21 Basic Idea of Gradient Descent
If you can calculate the slope of the curve at its current position, you can change w in the direction that reduces the error: if the slope is positive, decrease w; if negative, increase w. The slope is calculated by taking derivatives (the rate of change of E with respect to w). With more weights, the error surface has more dimensions, but you still try to minimize the error. Since the derivative of the activation function is also needed, a differentiable function (one that has a derivative at every point), such as the logistic function, is used as the activation function.
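A one-dimensional illustration of this idea: repeatedly move w against the slope of the error curve (the quadratic error curve here is an assumed example, not from the lecture):

```python
def descend(slope, w0, eps=0.1, steps=100):
    """Move w against the slope of the error curve: if the slope is
    positive, w decreases; if it is negative, w increases."""
    w = w0
    for _ in range(steps):
        w -= eps * slope(w)
    return w

# Error E(w) = (w - 3)^2 has slope 2 * (w - 3); the minimum is at w = 3
w_min = descend(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```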

22 Backpropagation
Also known as the Generalized Delta Rule, applied to multilayer perceptrons (at least one layer of hidden units). Propagate the error back to previous layers and update the weights. A hidden node is responsible for some fraction of the error in each of the output nodes to which it connects, in proportion to the strength of the connection between them.
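A minimal backpropagation sketch on XOR, the mapping a single-layer perceptron cannot learn. The network sizes, seed, and learning rate are illustrative assumptions; the key line is the one that shares each output delta back to the hidden units in proportion to the connecting weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR patterns: not linearly separable, so a hidden layer is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
eps = 0.5

def total_error():
    return np.sum((T - sigmoid(sigmoid(X @ W1) @ W2)) ** 2)

e_before = total_error()
for _ in range(2000):
    H = sigmoid(X @ W1)                    # hidden activities
    O = sigmoid(H @ W2)                    # output activities
    d_out = (T - O) * O * (1 - O)          # output deltas
    d_hid = (d_out @ W2.T) * H * (1 - H)   # error shared back via connection strengths
    W2 += eps * H.T @ d_out                # update hidden -> output weights
    W1 += eps * X.T @ d_hid                # update input -> hidden weights
e_after = total_error()
print(e_after < e_before)
```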

23 [Figure] (McLeod et al., 1998)

24 Local Minima
Backpropagation guarantees that a solution exists for any mapping problem, but it is not guaranteed to find it. This is a general AI problem for gradient descent search, with a number of fixes. [Figure] (McLeod et al., 1998)

25 Biological Plausibility of Backpropagation
Axons are unidirectional transmitters of information: how would the error signal travel back? The number of hidden units is critical. Learning rules such as the Hebbian learning rule require only local information and are unsupervised. Biologically more plausible extensions of backpropagation exist, such as the Generalized Recirculation (GeneRec) algorithm and Leabra (GeneRec + Hebbian).

26 Simple Recurrent Networks (SRNs) [Figure] (McLeod et al., 1998)

27 Recurrent Network Architectures
SRNs: fixed-weight connections from the hidden units to a set of context units, which act as a memory of hidden-unit activities and feed them back to the hidden units on the next time step. SRNs can discover sequential dependencies in the training data. The change of output over time causes the network to settle into one of several states depending on the input; those states are called attractors. Points in the space close to an attractor reach the final state more quickly.
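The SRN copy-back mechanism can be sketched as follows (the class name, sizes, and weight scales are assumptions); the context units simply store the previous hidden activities and feed them back on the next step, so the same input can produce different outputs at different points in a sequence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Elman-style simple recurrent network: context units hold a copy of
    the previous hidden activities and feed them back on the next step."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.5, (n_in, n_hidden))
        self.W_ctx = rng.normal(0, 0.5, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0, 0.5, (n_hidden, n_out))
        self.context = np.zeros(n_hidden)

    def step(self, x):
        h = sigmoid(x @ self.W_in + self.context @ self.W_ctx)
        self.context = h.copy()   # memory of hidden activities for the next step
        return sigmoid(h @ self.W_out)

net = SRN(2, 3, 1)
y1 = net.step(np.array([1.0, 0.0]))
y2 = net.step(np.array([1.0, 0.0]))  # same input, but the context now differs
print(y1, y2)
```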

28 An Attractor Space with Two Basins [Figure] (McLeod et al., 1998)

29 Advantages
Simulate reaction time by measuring the time to settle into one of the attractor states. Relatively immune to noisy input. Arbitrary mappings between input and output are allowed. Dynamic in character.

30 Arbitrary Input-Output Mappings in Attractor Networks [Figure] (McLeod et al., 1998)

31 Variety of ANNs
Hopfield networks; Adaptive Resonance Theory (ART) networks; Kohonen Self-Organizing Maps; Radial Basis Function networks; Boltzmann Machines; Support Vector Machines; and more...

32 Hybrid Neural Networks
Best of both worlds?
Unified neural architectures: rely solely on connectionist representations, but symbolic interpretation is possible.
Hybrid transformation architectures: transform symbolic representations into neural ones or vice versa, e.g. rule-extraction architectures.
Hybrid modular architectures: coexisting symbolic and neural modules; the coupling between them can be loose or tight.

33 Dynamicism
Natural cognitive systems are certain kinds of dynamical systems and are best understood from the perspective of dynamics, i.e. unambiguously described interactions of a cognizer with its environment through time. A novel set of metaphors for thinking about cognition, or real explanatory power for embodied cognition? Brains are dynamical systems, but is the dynamicist hypothesis a new paradigm?

34 Dynamical Systems Theory
Terminology: state space, trajectory, attractors, topology, bifurcation points, etc. Tools: linear and nonlinear time series analysis, chaos theory, complexity theory, relaxation oscillators.

35 Applications
Cyclical motor behaviour model of the human neonate. Olfactory bulb model: a model of the neural processing of smell in rabbits. The A-not-B error in infants: an immature concept of object permanence, or an inability to sustain a visually cued reach in a novel direction in the presence of a strong memory of previous reaches? Prediction: it should be possible to observe the error in older children.

36 Difficulties
Dimensionality and tractability of the models. Do we reject mental representations, and what are the alternatives? Internal states? Predictive power? Connectionism and dynamicism are not always easily separable, e.g. Elman's SRNs predicting word boundaries; is only the vocabulary and style of explanation different?

37 Bayes' Rule
Product rule: P(h, d) = P(h | d) P(d) = P(d | h) P(h)
Bayes' rule: P(h | d) = P(d | h) P(h) / P(d)
h: hypothesis, d: data
Useful for assessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

38 Bayes' Rule: Example
A doctor knows that the disease meningitis causes the patient to have a stiff neck, say, 50% of the time. The doctor also knows that the prior probability that a patient has meningitis is very low, 1/50,000, and the prior probability that a patient has a stiff neck is 1/20. Let m be meningitis and s be stiff neck:
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50000) / (1/20) = 0.0002
Note: the posterior probability of meningitis is still very small! We expect only one in 5000 cases with stiff neck to have meningitis. (Russell and Norvig, 2003)
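The calculation can be checked directly (a minimal sketch; the function name is illustrative):

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes' rule: P(h | d) = P(d | h) * P(h) / P(d)."""
    return likelihood * prior / evidence

# Meningitis example: P(s|m) = 0.5, P(m) = 1/50000, P(s) = 1/20
p = bayes_posterior(0.5, 1 / 50000, 1 / 20)
print(p)  # about 0.0002, i.e. one in 5000 stiff-neck cases
```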

39 Bayesian Models of Cognition
Many recent models of learning and inference: language acquisition, visual scene perception, categorization, causal relations. The available data underconstrain the inferences, so we make guesses guided by prior probabilities about which structures are most likely. Modelling at the "computational" level. Attempts at integration between connectionism and Bayesian models.

40 Lecture 5
HWs announced, to be done individually. Do not forget the Forum Activity. Problems and Evaluation in Cognitive Modelling. Reading: Gluck, Pew and Young (2005).

