EE141 1 Memory Janusz A. Starzyk Computational Intelligence Based on a course taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika

EE141 2 General remarks Memory is any persistent effect of experience. Memory is seemingly uniform, but in reality it is very differentiated: spatial, visual, aural, recognition, declarative, semantic, procedural, explicit, implicit … Here we test mechanisms, so the primary division is:  Synaptic memory (physical changes in synapses), long-term and requiring activation to have some influence on functioning.  Dynamic memory, active, temporary activations, affects current functioning.  Long-term priming, based on synaptic memory, yielding to fast modification – semantic and procedural memory are the result of slow processes.  Short-term priming, based on active memory.

EE141 3 General remarks [Diagram: memory types. Working memory, short-term memory (STM), long-term memory (LTM). Long-term memory: declarative (facts, events) and nondeclarative (manual skills, conditioning, priming; emotional, motor). Structures shown: neocortex, cerebellum, nuclei, parietal cortex, prefrontal cortex, limbic system.]

EE141 4 3 regions PC – posterior parietal cortex and motor cortex; distributed representations, spatial memory, long-term priming, associations, deductions, schemes. FC – prefrontal cortex, isolated representations, disruption control, working memory. HC – hippocampal formation, episodic memory, spatial memory, declarative memory, sparse representations, good image separation.  Slow learning, statistically relevant relationships => procedural and semantic memory, cortical; fast => episodic, HC.  Retaining active information and simultaneously accepting new information, e.g. multiplying 12*6 in your head, requires FC.

EE141 5 Slow/rapid learning A neuron learns situational probability, i.e. correlations between the desired activity and input signals; the target value of 0.7 is tracked reliably only with a small learning constant such as 0.005, while larger constants make the weight fluctuate.  Every experience is a small fragment of uncertain, potentially useful knowledge about the world => stability of one's image of the world requires slow learning, and integration leads to forgetting individual events.  Relevant new information is learned after a single exposure.  Lesions of the hippocampal formation cause anterograde amnesia.  The neuromodulation system reaches a compromise between stability and plasticity.
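The trade-off between a small and a large learning constant can be illustrated with a minimal sketch. It assumes a CPCA-style update `w += lrate * (x - w)` with the receiving unit active on every trial; the function name, the starting weight, the seed and the comparison rate of 0.1 are illustrative choices, not values taken from the course projects.

```python
import random
import statistics

def track_probability(lrate, p_active=0.7, steps=5000, seed=0):
    """Return the weight trajectory of the update w += lrate * (x - w)."""
    rng = random.Random(seed)
    w = 0.5  # arbitrary starting weight (assumption)
    history = []
    for _ in range(steps):
        # The input unit is active with probability p_active on each trial.
        x = 1.0 if rng.random() < p_active else 0.0
        # The receiving unit is assumed to be active on every trial.
        w += lrate * (x - w)
        history.append(w)
    return history

slow = track_probability(lrate=0.005)
fast = track_probability(lrate=0.1)

# Compare the second half of each run, after the initial transient.
slow_tail = slow[len(slow) // 2:]
fast_tail = fast[len(fast) // 2:]
print("slow: mean %.3f, spread %.3f" % (statistics.mean(slow_tail), statistics.pstdev(slow_tail)))
print("fast: mean %.3f, spread %.3f" % (statistics.mean(fast_tail), statistics.pstdev(fast_tail)))
```

Both runs hover around 0.7, but the weight with the larger constant keeps jumping with each individual event, which is the instability that slow learning avoids.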

EE141 6 Complementary learning systems

EE141 7 Active memory and priming Distributed overlapping representations in the PC can efficiently record information about the world, but this is not very precise and blurs with the passage of time. FC – prefrontal cortex, stores isolated representations; increases memory stability. The effects of priming are evident in people with a damaged hippocampus, cortical priming in the PC is possible. We will differentiate many forms of priming:  length (short-term, long-term),  type of information (visual, lexical),  similarity (repetition, semantic).

EE141 8 Priming Standard: word-stem completion. After reading a list of words we get a stem and must add the ending, e.g. rea---. If "reaction" was on the list earlier, then it is usually chosen. The interval of time can be about an hour, so active memory can't be responsible for this. Homophones: read, reed. Completion: "It was found that the ...eel is on the ...", where the last word is one of "orange", "wagon", "shoe", "table", is heard as "peel is on the orange", "wheel is on the wagon", "heel is on the shoe", "meal is on the table".

EE141 9 Priming model Project wt_priming.proj, Chapter 9 from ( http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Wt_Priming) View Events: the first 3 have the same input images, but different output images, in total 13 pairs x 2 outputs = 26 combinations, IA - IB Attention: we're not yet learning the AB-AC lists, just the effect of learning.

EE141 10 Exploring the model View TrainLog and evaluation of the result: similarity of the output image, summarized as a yellow line, the name of the most similar event, measured by sm_nm = binary errors in the names of the closest events, part of the result not very similar to the given: A  B. In blue both_err = 1 only if this isn't one of the two acceptable output images. Noise helps to break through impasses but it also causes a small lack of stabilisation of already-learned images.

EE141 11 Further tests Test_logs: first we will check if there are some tendencies, and then if we can teach a network to change preference after the presentation of IA and then IB. wt_update=Test, Test does one epoch, check Trial1_TextLog: ev_nm is either IA, or IB, and sm_nm is either 0 or 1, randomly. In Epoch1_TextLog we can see that there is always one of the two results, in sum 13/26, or half the time: there is no tendency. We check whether one exposure changes anything. wt_update => On_Line, learning after every event, Run Test, the frequency increases significantly to 18 and then 25 times. Conclusions: just error reduction gives mixed outputs A and B, a network without kWTA won't learn this task. The parietal cortex can be responsible for long-term priming.

EE141 12 AB-AC Learning People are able to learn two lists, word pairs A-B, and then A-C, eg. window-mind bike-trash.... and then: window-train bike-cloud without greater interference, doing well on tests for AB and AC. Networks with only error correction forget catastrophically! Interference results from using the same elements and weights to learn different associations. It's necessary to use different units, or to learn with context.
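Catastrophic interference can be reproduced in a few lines. The sketch below assumes a single-layer linear network trained with the delta rule on one-hot A items; the target vectors, learning rate and epoch count are made up for illustration and this is not the ab_ac_interference.proj model.

```python
def forward(weights, x):
    """Linear output: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train(weights, pairs, lrate=0.5, epochs=50):
    """Pure error-driven (delta rule) learning, in place."""
    for _ in range(epochs):
        for x, target in pairs:
            out = forward(weights, x)
            for j, row in enumerate(weights):
                err = target[j] - out[j]
                for i in range(len(row)):
                    row[i] += lrate * err * x[i]

def error(weights, pairs):
    """Mean summed squared error over a list of pairs."""
    total = 0.0
    for x, target in pairs:
        out = forward(weights, x)
        total += sum((t - o) ** 2 for t, o in zip(target, out))
    return total / len(pairs)

# Three A items, each paired first with a B target and later with a C target.
a_items = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
b_targets = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
c_targets = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
ab_list = list(zip(a_items, b_targets))
ac_list = list(zip(a_items, c_targets))

weights = [[0.0] * 3 for _ in range(4)]
train(weights, ab_list)
ab_error_before = error(weights, ab_list)
train(weights, ac_list)  # same input units and weights are reused
ab_error_after = error(weights, ab_list)
print(ab_error_before, ab_error_after)
```

Because the AC list drives the very same weights, the AB associations are overwritten rather than merely weakened, which is why separate units or a context input are needed.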

EE141 13 AB-AC Model Project ab_ac_interference.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_AB-AC_List_Learning) View Events_AB, Events_AC, Output: either A, or C, the context differentiates. Replication of catastrophic learning: View: Train_graph_log, red = errors, yellow = tests for AB. The test shows that after learning AC, the network forgets AB, many units in the hidden layer take part in the learning of both lists.

EE141 14 AB-AC Model hid_kwta 12=>4 to decrease the number of active elements. The test, but without changes. Increase the variance of initial values. wt_var 0.25=>0.4 Stronger influence of context fm_context 1=>1.5 Hebbian learning hebb 0.01=>0.05 Decrease the rate of learning lrate => 0.1, Batch Nothing here clearly helps but the catastrophes are less likely... Two systems of learning are clearly necessary, a fast one and a slow one – cortex and hippocampus.
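The effect of lowering hid_kwta from 12 to 4 can be pictured with a simplified k-winners-take-all function. This is a hard top-k cut, assumed here for illustration; the simulator's own kWTA computes an inhibition level instead, and the activation values below are invented.

```python
def kwta(activations, k):
    """Keep the k most active units and silence the rest (ties go to the lower index)."""
    if k <= 0:
        return [0.0] * len(activations)
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

# Hypothetical hidden-layer activations for 12 units.
hidden = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.6, 0.5, 0.05, 0.65, 0.35]

dense = kwta(hidden, 12)   # all 12 units may stay active
sparse = kwta(hidden, 4)   # only the 4 strongest units survive
active_units = [i for i, a in enumerate(sparse) if a > 0]
print(active_units)
```

Fewer active units per item means fewer shared units between the AB and AC lists, which reduces but does not eliminate overlap.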

EE141 15 Hippocampus Anatomy and connections of the structures of the hippocampal formation: signals arrive from uni- and multimodal association areas through the Entorhinal Cortex (EC).

EE141 16 More anatomy Hippocampus = king of the cortex Bidirectional connections with the entorhinal cortex: olfactory bulb, cingulate cortex, superior temporal gyrus (STG), insula, orbitofrontal cortex.

EE141 17 More anatomy Sporadic activation Representations in CA3 and CA1 are focused on specific stimuli, while in the subiculum and the entorhinal cortex they are strongly distributed.

EE141 18 Hippocampal formation Model contains structures: dentate gyrus (DG), areas CA1 and CA3, entorhinal cortex (EC). Pct Act = % of activation.

EE141 19 Separation and conjunction of images CA1 separates by conjunction of images (representations) It's also able to recreate the original activation from the EC by reversible connections  The hippocampus rapidly associates various representations of the cortex.  Creates episodic memory  Completes activations recreated from the memory and separates them into clearly distinct meanings  Sparse encoding eases the separation of meanings

EE141 20 Model of the hippocampus Project hip.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Hippocampus) Input signals enter through the entorhinal cortex (EC_in), to the dentate gyrus DG and the CA3 area, DG also influences CA3, where received signals can be completed through associations. CA3 has strong internal connections. CA1 has more distributed sparse representations => EC_out. EC: 144 el = 4*36; 1 of 4 active. DG: 625 el, CA3: 240 el, CA1: 384 el = 12 col * 32 el
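Completion through strong internal connections can be sketched with a Hopfield-style autoassociator standing in for the CA3 recurrent collaterals. This is a substitute technique chosen for brevity, not the network in hip.proj; the two 8-unit patterns and the partial cue are invented for the example.

```python
def store(patterns):
    """Hebbian outer-product weights with no self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def complete(weights, cue, steps=5):
    """Synchronously update +1/-1 units until the state stops changing."""
    state = list(cue)
    for _ in range(steps):
        new_state = []
        for i, row in enumerate(weights):
            net = sum(w * s for w, s in zip(row, state))
            if net > 0:
                new_state.append(1)
            elif net < 0:
                new_state.append(-1)
            else:
                new_state.append(state[i])  # no evidence: keep the old value
        if new_state == state:
            break
        state = new_state
    return state

episode_1 = [1, 1, 1, 1, -1, -1, -1, -1]
episode_2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = store([episode_1, episode_2])

# Partial cue: half of episode_1 is missing (0 = unknown).
cue = [1, 1, 0, 0, -1, -1, 0, 0]
recalled = complete(weights, cue)
print(recalled == episode_1)
```

The missing half of the pattern is filled in from the stored associations, which is the role the slide assigns to CA3.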

EE141 21 Exploration of the hippocampal model Learning of AB – AC associations without interference. Autoassociations: EC_in = EC_out, reversible transformations. BuildNet, View_Train_Trial_Log will show the statistics. The input includes information about the input and output images and the list. StepTrain: units chosen in the previous step have white outlines. Partial overlapping of images in EC_in, DG, CA3, CA1. Training epoch: 10 list elements + 3 test sets: AB, AC, new. View Test_Logs => text and graph log. train_updt = no_updt to the test log, Run will do 3 epochs, the results are in Text_log: 70% remembered from the AB list and 100% from the AC list. Set test_updt = no_updt and the network will finish 3 training/test epochs more rapidly. Test analysis: test_updt = Cycle_updt, Clear Trial1_1_Text_log, StepTest: we see only A + context, and we see how the image completes.

EE141 22 Further exploration Targ in Network shows what image was learned, act  targ. In TextLog, stim_er_on = proportion of units erroneously activated in EC_out, stim_er_off = proportion erroneously not activated in EC_out. In Trial_1_GraphLog we can see these two numbers after every test: for known images they're small (correct memories), for new ones they're large, with on ~0.5 and off ~0.8; the network rarely fails. To move to list AC we turn off Test_updt = Trial_updt (or no_updt) and StepTest until, in text_log, epc_ctrl changes to 1. These are events for list AC: the network does not recognize them (rmbr=0) because it hasn't learned them yet. Train_Epcs=5, train_env=Train_AC, Run and check results.
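The two error measures can be written out as a short sketch. The slide defines them only verbally, so the normalisation used here (false activations over all active output units, misses over all target units) and the 0.5 activity threshold are assumptions; the example vectors are invented.

```python
def stim_errors(act, targ, threshold=0.5):
    """Return (stim_er_on, stim_er_off) for an output pattern and its target."""
    is_on = [a > threshold for a in act]
    wanted = [t > threshold for t in targ]
    false_on = sum(1 for o, w in zip(is_on, wanted) if o and not w)
    false_off = sum(1 for o, w in zip(is_on, wanted) if w and not o)
    n_on = sum(is_on)
    n_wanted = sum(wanted)
    # Assumed denominators: active units for "on" errors, target units for "off" errors.
    er_on = false_on / n_on if n_on else 0.0
    er_off = false_off / n_wanted if n_wanted else 0.0
    return er_on, er_off

targ = [1, 1, 1, 0, 0, 0, 0, 0]
act = [1, 1, 0, 1, 0, 0, 0, 0]   # one unit missing, one unit wrongly active
er_on, er_off = stim_errors(act, targ)
print(er_on, er_off)
```

A perfectly recalled pattern gives (0.0, 0.0); both numbers grow as the output drifts away from the stored target.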

EE141 23 Summary The hippocampal model can rapidly, sequentially learn associations AB – AC without excessive interference. For this it was sufficient to use the Hebbian contrast rule, CPCA and the correct architecture. Interference results from using the same units, in CA3 it arrives at separation of identical images (representations) learned in another context. Separation of images doesn't allow associations, inferences based on similarity, efficient encoding of multidimensional information. The conjunction of images happens in CA1. This suggests a complementary role of the hippocampus, supplementing the slow learning mechanisms of the cortex. The hippocampus can remember episodes helping in spatial orientation, create conjunctive representations connecting different stimuli together quicker than the cortex.

EE141 24 Memory Memory is not uniform. 1. Weights (long-term, require activation) vs activations (short-term, already activated, can influence processing). 2. Based on weights:  the cortex shows priming but suffers from catastrophic interference;  the hippocampus can learn fast without interference, using sparse distributed representations of images. 3. Based on activation:  the cortex shows priming but  isn't good for short-term memory. 4. Cooperation of activation and memory based on weights. 5. Video: (1) short-term memory in chimpanzees, 30 sec; (2) comparison with students, 30 sec.

EE141 25 Active short-term memory Short-term priming: attention and influence on reaction speed. Apart from the duration, memory content and effects resulting from similarity are like those of long-term priming. Project act_priming.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Act_Priming). Stem completion or homophones, but without learning, only the influence of residual activation from the last stimulus. The network has learned series IA-IB. The test has a series of images and results A and B: we show it A upon output, the network responds A; now we show the image for B but only the minus phase is run (no learning), and the network's result is sometimes A, sometimes B. LoadNet, View TestLogs, Test. The correlations of previous results A and B depend on the speed of fading of activation; check the effect of act_decay 1 => 0: a tendency to stay with response A. Analyze the influence on results in test_log.

EE141 26 Active maintenance Project act_maint.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Active_Maintenance): active maintenance of information in working memory despite interference, quickly accessible, doesn't require synaptic changes. Recurrence is necessary: an attractor network with a large basin of attraction, resistant to noise. Video: remembering with delay, 30 sec. The processes of analysing environmental data don't require such networks, because they are driven by incoming information. Activation should be diverse, enabling associations and inferences; while we have external signals this will suffice, e.g. if we note on paper the results of intermediate operations. Without external activations, we have to rely on actively maintained representations in working memory, which has serious limits (Miller's famous 7 ± 2, and even 4 ± 2 for complex objects). First a model without attractors, which requires external signals; then distributed representations with shallow attractors, not very resistant to noise; finally deep but localised attractors, which prevent associations.

EE141 27 Maintenance model Project act_maint.proj. 3 objects, 3 elements (features) r.wt, View Grid_log, Run: if there is an input activation is maintained, but after removal it disperses (the network blurred...). Check influence wt_mean =0.5, wt_var = 0.1, 0.25, 0.4 Net_Type Higher_order: we add combinations of feature pairs. Defaults, Run, add noise_var=0.01, the network forgets...

EE141 28 Isolated representations Defaults to return to initial parameters. network = IsolatedNet: no connections between hidden units, but there is self-recurrence, so activation doesn't fade. Noise = 0.01 doesn't interfere, but with 0.02 maintenance sometimes breaks down. Is it worth learning to focus in spite of noise? Different task: does stimulus S(t) = S(t+2)? Parameters: input_data = MaintUpdateEnv, network Isolated, noise 0.01. Init, Run: there are two inputs, Input 1 and 2; wt_scale 1=>2 changes the strength of local connections. The network can be switched from fast updating to long-lasting maintenance. How to do this automatically? Dopamine and dynamic regulation of reward in the PFC.
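Why the strength of the self-connection decides between fading and maintenance can be seen in a one-unit sketch. It assumes a clipped linear unit with a single recurrent weight, which is far simpler than the units in act_maint.proj; the weights 0.5 and 1.0 and the input schedule are illustrative.

```python
def run_unit(self_weight, inputs):
    """Activation of one self-connected unit, clipped to the range [0, 1]."""
    act = 0.0
    trace = []
    for inp in inputs:
        act = min(1.0, max(0.0, self_weight * act + inp))
        trace.append(act)
    return trace

# Stimulus present for 3 steps, then removed for 10 steps.
inputs = [1.0] * 3 + [0.0] * 10

weak = run_unit(0.5, inputs)     # weak recurrence: activity fades after input removal
strong = run_unit(1.0, inputs)   # strong recurrence: activity is held
print(round(weak[-1], 4), strong[-1])
```

Scaling the recurrent weight therefore acts like a switch between rapid updating, where old content fades and new input takes over, and stable maintenance.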

EE141 29 Working memory The prefrontal cortex plays the central role in maintaining active working memory and has desired properties: isolated self-activating attractor networks with extensive pools. Neuroanatomy, PFC connections and microcolumns => specialized area for active memory.  A. PR – spatial.  B. PR - spatial, self-ordered tasks.  C. PR - spatial, object and verbal, self-ordered tasks and analytical thinking.  D. PR - objects, analytical thinking. Typical experiments require delayed choice and show the differences between PC, IT, which have only temporary stimulus representations, and PFC, which maintains them longer.

EE141 30 Role of dopamine Blocking of dopamine has a negative influence on working memory, and aiding it has a positive influence. Dopamine (DA) arrives from the VTA (ventral tegmental area). DA strengthens internal activations, regulating access to working memory. VTA displays such increased activity. Basal ganglia can also regulate PFC activity. TD – temporal Difference in RL

EE141 31 Working memory Project pfc_maint_updt.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_PFC_Maint_Updt) A dynamic "gate" AC is added to the network with recurrence, with learning based on temporal differences (TD). Inputs: A, B, C, D; the control signal Ignore, Store or Recall decides what to do with them. PFC is working memory; AC = adaptive critic is a reward system (dopamine) controlling information updating in the PFC; the hidden layer represents the parietal cortex; hidden 2 maps to the output (frontal cortex). AC learns to predict the next reward, modulating the strength of internal PFC connections.
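How a critic learns to predict upcoming reward can be sketched with tabular TD(0). This is a generic textbook form, not the AC unit of pfc_maint_updt.proj; the state names, the single reward of 1 delivered at Recall, the discount of 0.9 and the learning rate are all assumptions made for the example.

```python
def train_critic(sequence, episodes=500, lrate=0.2, gamma=0.9):
    """Learn a reward prediction for each state of a fixed trial sequence."""
    value = {state: 0.0 for state in sequence}
    for _ in range(episodes):
        for t, state in enumerate(sequence):
            last = t == len(sequence) - 1
            reward = 1.0 if last else 0.0  # reward only at the end of the trial
            next_value = 0.0 if last else value[sequence[t + 1]]
            delta = reward + gamma * next_value - value[state]  # TD error
            value[state] += lrate * delta
    return value

# Hypothetical trial: an item is stored, held over two delay steps, then recalled.
values = train_critic(["Store", "Delay1", "Delay2", "Recall"])
for state, v in values.items():
    print(state, round(v, 3))
```

After training, the prediction has propagated back from Recall to Store, so the TD error can signal at storage time that holding the item will pay off later.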

EE141 32 PFC Model r.wt: one-to-one connections between input, hidden layers and the PFC. AC has connections with the hidden layer and the PFC, but reverse connections AC => PFC serve only to modulate. Act, Step: we observe phases – and +, at first the activation of PFC and AC is zero, there are two + steps, first to change PFC weights, and then to set the correct signal propagation. When signal R appears (reminders), the network will not act correctly at first, the reward in AC is 0. At first the network doesn't know what's going on, learning only on Store, Ignore hidden layer 2, but sometimes noise in the PFC will cause the correct result and reward to appear. View Epoch_log, observe the change in weight of unit AC, r.wt Weights of S => AC should increase and error will decrease, the yellow line is the number of incorrect predictions of AC. View, Grid_log, Clear, act, Step. Store introduces data to the PFC, but Ignore doesn't. After Recall, PFC is zeroed.

EE141 33 A- not B Interactions between active and synaptic memory - weights have already changed but active memory is in a different state: what wins? These interactions are visible in the developing brains of children ~ 8 months (Piaget 1954), experiments done also on animals. A toy (food) is hidden in box A and after a short delay the child (animal) can remove it from there. After several repetitions in A, the toy is hidden in box B; the children keep looking in A. Active memory doesn't work in children as efficiently as synaptic memory, lesions in the area of the prefrontal cortex cause similar effects in adult and infant rhesus monkeys. Children make fewer errors looking in the direction of the place where the toy was hidden, than reaching for it. There are many interesting variants of this type of experiment and explanations on different levels.

EE141 34 Project A- not B Decision-making process model: we know that information about place and objects is divided, so this information is given on input: place A, B, C, toy T1 or T2 and cover C1 or C2. Synaptic memory is realized with the help of standard CPCA Hebbian learning, and active memory as bi-directional connections between network representations in the hidden layer. Output layers: decisions about the direction of looking and reaching. The direction of looking is always activated during each experience, reaching is activated less often, only after moving the whole set-up toward the child, so these connections will rely on weaker learning. Initial tendency: agreement of looking and reaching on A (weight 0.7). All inputs connected with hidden neurons, weight 0.3. Project a_not_b.proj. (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_A_Not_B)

EE141 35 Experiment 1 rect_ws =0.3 decides on the strength of recurrent activations in the hidden layer (working memory), changing this parameter simulates a child's development. View Events: 3 types of events, initial showing 4x, then A 2x, then B 1 x. An event has 4 temporal segments: 1) start, pretrial – boxes covered; 2) presentation, toy hidden in A; 3) expectation – toy in A; 4) choice – possible reaching. Only visible elements are active. View: Grid_log, Run performs the entire experiment, turns off display. ViewPre shows on Grid_log, A is activated ViewA shows A tests, after learning. ViewB shows B tests: the network makes an error.

EE141 36 Further experiments Activation in the hidden layer flows toward the representation associated with A. rect_ws 0.3 => 0.75 for a more mature child. Run, ViewB. Although synaptic memory didn't change, more efficient working memory enables the correct action. Try rect_ws = 0.47 and 0.50. What happens? There is no activity: hesitation? The results depend on the length of the delay; with a shorter delay there are fewer errors. Delay 3=>1. Do tests for rect_ws = 0.47 and 0.50. What happens with a very young child? rect_ws = 0.15, delay = 3: weak recurrence, weak learning for A.
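The competition between the learned habit and the active memory can be caricatured in a toy race. Everything here is an assumption made for illustration: rect_ws is treated as the fraction of the memory trace surviving each delay step, and the habit strength of 0.2 is invented. It is not the a_not_b.proj network and its numbers should not be read as model results.

```python
def choose_box(rect_ws, delay, habit_a=0.2):
    """Toy race between a decaying memory of B and a fixed habit toward A."""
    memory_b = 1.0  # the toy was just seen being hidden in B
    for _ in range(delay):
        memory_b *= rect_ws  # assumed: recurrence sets how much survives each step
    return "B" if memory_b > habit_a else "A"

for rect_ws, delay in [(0.3, 3), (0.75, 3), (0.3, 1)]:
    print(rect_ws, delay, choose_box(rect_ws, delay))
```

Even this caricature shows the two qualitative trends from the slide: stronger recurrence and shorter delays both favour the correct reach to B.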

EE141 37 Other types of memory The traditional approach to memory assumes functional, cognitive, monolithic, canonical representations in memory. From modeling, it turns out that there are many systems interacting with each other which are responsible for memory, with different characteristics, variable representations and types of information. Recognition memory: was an element of the list seen earlier? A "recognition" signal is enough, remembering is not necessary. A hippocampus model is also useful here, it allows for remembering, but this is too much – in recognition memory the central role seems to be played by the area of the perirhinal cortex. Cued recall - completion of missing information. Free recall – effects of placement on the list (best at the beginning and the end), as well as grouping (chunking) of information.

EE141 38 Learning categories Categorization in psychology: many theories. Classic experiments: Shepard et al. (1961), Nosofsky et al. (1994). Problems with an increasing degree of complexity, division into categories C1, C2, 3 binary properties: color (black/white), size (small/large), shape ( ,  ). Type I: one property defines the category. Type II: two properties, XOR, e.g. Cat A: (black, large) or (white, small), any shape. Type III-V: one property + increasingly more exceptions. Type VI: no rules, enumeration. Difficulty and learning time: Type I < II < III ~ IV ~ V < VI
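The Type I and Type II structures can be enumerated directly. In this sketch the shape names are placeholders, and the choice of color as the defining property for Type I is an assumption; the Type II rule is the one given in the text.

```python
from itertools import product

COLORS = ("black", "white")
SIZES = ("small", "large")
SHAPES = ("shape1", "shape2")  # placeholder names (assumption)

stimuli = list(product(COLORS, SIZES, SHAPES))  # all 8 objects

def type_one(color, size, shape):
    """Type I: a single property decides (color chosen here as an example)."""
    return "A" if color == "black" else "B"

def type_two(color, size, shape):
    """Type II: XOR-like rule on color and size, shape is irrelevant."""
    return "A" if (color, size) in {("black", "large"), ("white", "small")} else "B"

type_two_a = [s for s in stimuli if type_two(*s) == "A"]
black_in_a = [s for s in type_two_a if s[0] == "black"]
print(len(type_two_a), len(black_in_a))
```

In Type II each single property is split evenly between the categories, so no one-property rule works and learning is slower than for Type I.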

EE141 39 Canonical dynamic What happens in the brain while learning category definitions based on examples? Complex neurodynamics the simplest dynamics (canonical). For all logical rules, we can write corresponding equations. For type II problems, or XOR: Feature area

EE141 40 Against majority List: diseases C or R, symptoms PC, PR, I. Disease C is associated with symptoms (PC, I), disease R with (PR, I); C happens 3 times more often than R. (PC, I) => C, PC => C, I => C. Predictions "against majority" (Medin, Edelson 1988): although PC + I + PR => C (60%), PC + PR => R (60%). Neurodynamic attractor basins? PDF in areas {C, R, I, PC, PR}. Psychological interpretation (Kruschke 1996): PR carries weight because it is a distinctive symptom, although PC is more common. Activation PR + PC more often leads to result R because the gradient in direction R is greater.
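The base rates behind this design can be checked by simple counting. The sketch assumes a training set of three (PC, I) => C cases for every (PR, I) => R case, following the stated 3:1 ratio; the helper name is hypothetical.

```python
from collections import Counter

# Training cases as (symptoms, disease); the 3:1 ratio follows the slide.
cases = [({"PC", "I"}, "C")] * 3 + [({"PR", "I"}, "R")] * 1

def p_disease_given_symptom(disease, symptom):
    """Relative frequency of a disease among cases showing the symptom."""
    counts = Counter(d for symptoms, d in cases if symptom in symptoms)
    total = sum(counts.values())
    return counts[disease] / total

p_c_given_i = p_disease_given_symptom("C", "I")    # shared symptom
p_c_given_pc = p_disease_given_symptom("C", "PC")  # perfect predictor of C
p_r_given_pr = p_disease_given_symptom("R", "PR")  # perfect predictor of R
print(p_c_given_i, p_c_given_pc, p_r_given_pr)
```

PC and PR are equally reliable predictors in the training set and C is the more frequent disease, so a choice of R for the conflicting pair PC + PR goes against the base rate.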

EE141 41 Learning Point of view: neurodynamics vs. psychology. Neurodynamics: I+PC is more common => stronger synaptic connections, larger and deeper attractor basins. Psychology: symptoms I, PC are typical for C since they happen more often. Neurodynamics: to avoid attractors around I+PC leading to C, a deeper and more localized attractor around I+PR is created. Psychology: for rare disease R, symptom I is not distinct, so attention focuses on PR associated with R.

EE141 42 Testing Point of view: neurodynamics vs. psychology. Neurodynamics: activating only I leads to C since more examples of I+PC create a larger shared attractor basin than I+PR. Psychology: I => C, in accordance with expectations, more frequent stimuli I+PC are recalled more often. Neurodynamics: activation by I+PC+PR leads frequently to C, because I+PC puts the system in the middle of the large C basin and even with PR the gradients still lead to C. Psychology: I+PC+PR => C because all symptoms are present and C is more frequent (base rates again). Neurodynamics: activation by PR+PC leads more frequently to R because the attractor basin for R is deeper, and the gradient at (PR,PC) leads to R. Psychology: PC+PR => R because PR is a distinctive symptom, although PC is more common.

EE141 43 Summary  Knowledge formed in memory is constructed, dynamic, continuous, emergent.  Behavior and inhibition of knowledge are the result of dynamic information processing rather than interaction structures set at the top.  Recognition is based on the ability to differentiate earlier-learned activations from new, unknown activations.  The hippocampus ensures high-quality recognition with a high threshold guaranteeing association of earlier-learned activations.  Priming contributes to slow building of invariant representations.  Two learning mechanisms:  based on connection weights,  based on neuron activation.

EE141 44 Summary  The cortex helps recognition by priming  The cortex leads to unstimulated associations  The cortex is responsible for working memory cooperating with the hippocampus  Sequences of grouped representations are stored in long-term memory  Memory based on activation requires combining rapid updating with stable representations  The hippocampus uses sparse distributed representations for fast learning without mixing ideas  Priming memory can be long-term (based on weights) or short-term (based on activation)
