Presentation transcript: A Brain-Like Computer for Cognitive Applications: The Ersatz Brain Project. James A. Anderson, Department of Cognitive and Linguistic Sciences.

1 A Brain-Like Computer for Cognitive Applications: The Ersatz Brain Project James A. Anderson James_Anderson@brown.edu Department of Cognitive and Linguistic Sciences Brown University, Providence, RI 02912 Paul Allopenna pallopenna@aptima.com Aptima, Inc. 12 Gill Street, Suite 1400, Woburn, MA Our Goal: We want to build a first-rate, second-rate brain.

2 Participants Faculty: Jim Anderson, Cognitive Science. Gerry Guralnik, Physics. Tom Dean, Computer Science. David Sheinberg, Neuroscience. Students: Socrates Dimitriadis, Cognitive Science. Brian Merritt, Cognitive Science. Benjamin Machta, Physics. Private Industry: Paul Allopenna, Aptima, Inc. John Santini, Anteon, Inc.

3 Acknowledgements This work was supported by: A seed money grant from the Office of the Vice President for Research, Brown University. An SBIR, The Ersatz Brain Project, FA8750-05-C-0122, to Aptima, Inc. (Woburn MA), Dr. Paul Allopenna, Project Manager. Also: Early support was received from a DARPA grant to Brown University Engineering Department in the Bio/Info/Micro program, MDA972-00-1-0026.

4 Comparison of Silicon Computers and Carbon Computer Digital computers are Made from silicon Accurate (essentially no errors) Fast (nanoseconds) Execute long chains of logical operations (billions) Often irritating (because they don’t think like us).

5 Comparison of Silicon Computers and Carbon Computer Brains are Made from carbon Inaccurate (low precision, noisy) Slow (milliseconds, 10^6 times slower) Execute short chains of parallel alogical associative operations (perhaps 10 operations/second) Yet largely understandable (because they think like us).

6 Comparison of Silicon Computers and Carbon Computer Huge disadvantage for carbon: more than 10^12 in the product of speed and power. But we still do better than silicon in many perceptual skills: speech recognition, object recognition, face recognition, motor control. Implication: Cognitive “software” uses only a few, but very powerful, elementary operations.

7 Major Point Brains and computers are very different in their underlying hardware, leading to major differences in software. Computers, as the result of 60 years of evolution, are great at modeling physics. They are not great (after 50 years of trying, and largely failing) at modeling human cognition. One possible reason: inappropriate hardware leads to inappropriate software. Maybe we need something completely different: new software, new hardware, new basic operations, even new ideas about computation.

8 So Why Build a Brain-Like Computer? 1. Engineering. Computers are all special purpose devices. Many of the most important practical computer applications of the next few decades will be cognitive in nature: natural language processing, Internet search, cognitive data mining, decent human-computer interfaces, and text understanding. We claim it will be necessary to have a cortex-like architecture (either software or hardware) to run these applications efficiently.

9 2. Science : Such a system, even in simulation, becomes a powerful research tool. It leads to designing software with a particular structure to match the brain-like computer. If we capture any of the essence of the cortex, writing good programs will give insight into biology and cognitive science. If we can write good software for a vaguely brain like computer we may show we really understand something important about the brain.

10 3. Personal: It would be the ultimate cool gadget. A technological vision: In 2055 the personal computer you buy in Wal-Mart will have two CPUs with very different architectures: First, a traditional von Neumann machine that runs spreadsheets, does word processing, keeps your calendar straight, etc. What they do now. Second, a brain-like chip to handle the interface with the von Neumann machine, give you the data you need from the Web or your files (but didn't think to ask for), and be your silicon friend, guide, and confidant.

11 History: Technical Issues Many have proposed the construction of brain-like computers. These attempts usually start with massively parallel arrays of neural computing elements, elements based on biological neurons, and the layered 2-D anatomy of mammalian cerebral cortex. Such attempts have failed commercially. The early Connection Machines from Thinking Machines, Inc. (W.D. Hillis, The Connection Machine, 1987) were the most nearly successful commercially and are most like the architecture we are proposing here. Consider the extremes of computational brain models.

12 First Extreme: Biological Realism The human brain is composed of on the order of 10^10 neurons, connected together with at least 10^14 neural connections. (Probably underestimates.) Biological neurons and their connections are extremely complex electrochemical structures. The more realistic the neuron approximation, the smaller the network that can be modeled. There is good evidence that for cerebral cortex a bigger brain is a better brain. Projects that model neurons in detail are of scientific importance. But they are not large enough to simulate interesting cognition.

13 Neural Networks. The most successful brain inspired models are neural networks. They are built from simple approximations of biological neurons: nonlinear integration of many weighted inputs. Throw out all the other biological detail.

14 Neural Network Systems Units with these approximations can build systems that can be made large, can be analyzed, can be simulated, and can display complex cognitive behavior. Neural networks have been used to model (rather well) important aspects of human cognition.

15 Second Extreme: Associatively Linked Networks. The second class of brain-like computing models is a basic part of computer science: Associatively linked structures. One example of such a structure is a semantic network. Such structures underlie most of the practically successful applications of artificial intelligence.

16 Associatively Linked Networks (2) The connection between the biological nervous system and such a structure is unclear. Few believe that nodes in a semantic network correspond in any sense to single neurons. Physiology (fMRI) suggests that a complex cognitive structure – a word, for instance – gives rise to widely distributed cortical activation. Major virtue of Linked Networks: They have sparsely connected “interesting” nodes (words, concepts). In practical systems, the number of links converging on a node ranges from one or two up to a dozen or so.

17 Conventional wisdom says neurons are the basic computational units of the brain. The Ersatz Brain Project is based on a different assumption. The Network of Networks model was developed in collaboration with Jeff Sutton (Harvard Medical School, now at NSBRI). Cerebral cortex contains intermediate level structure, between neurons and an entire cortical region. Intermediate level brain structures are hard to study experimentally because they require recording from many cells simultaneously. The Ersatz Brain Approximation: The Network of Networks.

18 Network of Networks Approximation We use the Network of Networks [NofN] approximation to structure the hardware and to reduce the number of connections. We assume the basic computing units are not neurons, but small (10^4 neurons) attractor networks. Basic Network of Networks Architecture: a 2-dimensional array of modules, locally connected to neighbors.

19 Cortical Columns: Minicolumns “The basic unit of cortical operation is the minicolumn … It contains of the order of 80-100 neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of 40-50 µm in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2) VB Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4. Figure: Nissl stain of cortex in planum temporale.

20 Columns: Functional Groupings of minicolumns seem to form the physiologically observed functional columns. Best known example is orientation columns in V1. They are significantly bigger than minicolumns, typically around 0.3-0.5 mm. Mountcastle’s summation : “Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3) Cells in a column ~ (80)(100) = 8000

21 Elementary Modules The activity of the nonlinear attractor networks (modules) is dominated by their attractor states. Attractor states may be built in or acquired through learning. We approximate the activity of a module as a weighted sum of attractor states. That is: an adequate set of basis functions. Activity of Module: x = Σ c_i a_i, where the a_i are the attractor states.

22 The Single Module: BSB The attractor network we use for the individual modules is the BSB network (Anderson, 1993). It can be analyzed using the eigenvectors and eigenvalues of its local connections.
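A minimal numerical sketch of a BSB-style update may make the slide concrete. The weights, gain, and patterns below are our own toy choices, not values from the talk: the network feeds activity back through Hebbian weights and clips it to the box [-1, 1]^n, so states migrate to corners (the attractors).

```python
import numpy as np

def bsb_step(x, W, alpha=0.3, gamma=1.0):
    """One BSB update: linear feedback through W, then clip to the box."""
    return np.clip(gamma * x + alpha * (W @ x), -1.0, 1.0)

def run_bsb(x0, W, steps=50):
    x = x0.copy()
    for _ in range(steps):
        x = bsb_step(x, W)
    return x

# Hebbian (outer-product) weights storing two orthogonal corner attractors.
a1 = np.array([1.0, 1.0, -1.0, -1.0])
a2 = np.array([1.0, -1.0, 1.0, -1.0])
n = len(a1)
W = (np.outer(a1, a1) + np.outer(a2, a2)) / n

# A noisy version of a1 settles into the a1 corner of the box.
x0 = np.clip(a1 + np.array([-0.6, 0.4, 0.3, -0.2]), -1.0, 1.0)
final = run_bsb(x0, W)
```

Because a1 and a2 are eigenvectors of W with positive eigenvalue, the linear stage amplifies them and the clipping stage freezes the state at a corner, which is the eigenvector analysis mentioned on the slide.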

23 Interactions between Modules Interactions between modules are described by state interaction matrices, M. The state interaction matrix elements give the contribution of an attractor state in one module to the amplitude of an attractor state in a connected module. In the BSB linear region: x(t+1) = Σ M_i s_i + f + x(t), that is, the weighted sum from other modules, plus the input, plus the ongoing activity.

24 The Linear-Nonlinear Transition The first BSB processing stage is linear and sums influences from other modules. The second processing stage is nonlinear. This linear-to-nonlinear transition is a powerful computational tool for cognitive applications. It describes the processing path taken by many cognitive processes. A generalization from cognitive science: sensory inputs -> (categories, concepts, words). Cognitive processing moves from continuous values to discrete entities.

25 Scaling We can extend this associative model to larger scale groupings. It may become possible to suggest a natural way to bridge the gap in scale between single neurons and entire brain regions. Networks > Networks of Networks > Networks of (Networks of Networks) > Networks of (Networks of (Networks of Networks )) and so on …

26 Binding Module Patterns Together. An associative Hebbian learning event will tend to link f with g through the local connections. There is a speculative connection to the important binding problem of cognitive science and neuroscience. The larger groupings will act like a unit. Responses will be stronger to the pair f,g than to either f or g by itself.

27 Sparse Connectivity The brain is sparsely connected. (Unlike most neural nets.) A neuron in cortex may have on the order of 100,000 synapses. There are more than 10^10 neurons in the brain. Fractional connectivity is very low: 0.001%. Implications: Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly. Therefore, connections are valuable. The pattern of connection is under tight control. Short local connections are cheaper than long ones. Our approximation makes extensive use of local connections for computation.
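The fractional connectivity figure follows directly from the two numbers on the slide; a one-line check:

```python
# Back-of-the-envelope check of the fractional connectivity claim above.
synapses_per_neuron = 1e5   # "on the order of 100,000 synapses"
neurons = 1e10              # "more than 10^10 neurons"

# Fraction of all possible partners a single neuron actually contacts.
fractional_connectivity = synapses_per_neuron / neurons
percent = fractional_connectivity * 100  # 0.001 percent
```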

28 Interference Patterns We are using local transmission of (vector) patterns, not scalar activity level. We have the potential for traveling pattern waves using the local connections. Lateral information flow allows the potential for the formation of feature combinations in the interference patterns where two different patterns collide.

29 Learning the Interference Pattern The individual modules are nonlinear learning networks. We can form new attractor states when two patterns meet at a module and form an interference pattern.

30 Module Evolution Module evolution with learning: from an initial repertoire of basic attractor states to the development of specialized pattern combination states unique to the history of each module.

31 Biological Evidence

32 Biological Evidence: Columnar Organization in Inferotemporal Cortex Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex. There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

33 IT Response Clusters: Imaging Tanaka (2003) used intrinsic visual imaging of cortex. Train a video camera on the exposed cortex; cell activity can be picked up. At least a factor of ten higher resolution than fMRI. The size of a response is around the size of functional columns seen elsewhere: 300-400 microns.

34 Columns: Inferotemporal Cortex Responses of a region of IT to complex images involve discrete columns. The response to a picture of a fire extinguisher shows how regions of activity are determined. Boundaries are where the activity falls by a half. Note: some spots are roughly equally spaced.

35 Active IT Regions for a Complex Stimulus Note the large number of roughly equally distant spots (2 mm) for a familiar complex image.

36 Histogram of Distances We were able to plot histograms of distances in a number of published IT intrinsic images of complex figures. Distances computed from data in the previous figure (Dimitriadis).

37 Back-of-the-Envelope Engineering Considerations

38 Network of Networks Functional Summary. The NofN approximation assumes a two dimensional array of attractor networks. The attractor states dominate the output of the system at all levels. Interactions between different modules are approximated by interactions between their attractor states. Lateral information propagation plus nonlinear learning allows formation of new attractors at the location of interference patterns. There is a linear and a nonlinear region of operation in both single and multiple modules. The qualitative behavior of the attractor networks can be controlled by analog gain control parameters.

39 Engineering Hardware Considerations We feel that there is a size, connectivity, and computational power “sweet spot” at the level of the parameters of the network of networks model. If an elementary attractor network has 10^4 actual neurons, that network might display 50 attractor states. Each elementary network might connect to 50 others through state connection matrices. A brain-sized system might consist of 10^6 elementary units, with about 10^11 (0.1 terabyte) numbers specifying the connections. If 100 to 1000 elementary units can be placed on a chip, there would be a total of 1,000 to 10,000 chips in a cortex-sized system. These numbers are large but within the upper bounds of current technology.
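The slide's back-of-the-envelope numbers can be reproduced from its own parameters (each module-to-module link being a 50 x 50 state interaction matrix):

```python
# Sketch of the hardware arithmetic using the slide's parameter values.
modules = 10**6              # elementary units in a brain-sized system
states_per_module = 50       # attractor states per module
neighbors_per_module = 50    # modules each unit connects to

# Each link needs a full state interaction matrix: 50 x 50 numbers.
numbers_per_link = states_per_module * states_per_module
total_numbers = modules * neighbors_per_module * numbers_per_link  # ~10^11

chips_low = modules // 1000  # 1000 units per chip -> 1,000 chips
chips_high = modules // 100  # 100 units per chip  -> 10,000 chips
```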

40 Modules Function of Computational (NofN) Modules: Simulate local integration: Addition of inputs from outside, other modules. Simulate local dynamics. Communications Controller: Handle long range (i.e. not neighboring) interactions. Simpler approximations are possible: “Cellular automaton”. (Ignore local dynamics.) Approximations to dynamics.

41 Topographic Model for Information Integration

42 A Software Example: Sensor Fusion A potential application is sensor fusion. Sensor fusion means merging information from different sensors into a unified interpretation. We were involved in such a project in collaboration with Texas Instruments and Distributed Data Systems, Inc. The project was a way to do the de-interleaving problem in radar signal processing using a neural net. In a radar environment the problem is to determine how many radar emitters are present and to whom they belong. Biologically, this corresponds to the behaviorally important question, “Who is looking at me?” (To be followed, of course, by “And what am I going to do about it?”)

43 Radar A receiver for radar pulses provides several kinds of quantitative data: frequency, intensity, pulse width, angle of arrival, and time of arrival. The user of the radar system wants to know qualitative information: How many emitters? What type are they? Who owns them? Has a new emitter appeared?

44 Concepts The way we solved the problem was by using a concept forming model from cognitive science. Concepts are labels for a large class of members that may differ substantially from each other. (For example, birds, tables, furniture.) We built a system where a nonlinear network developed an attractor structure where each attractor corresponded to an emitter. That is, emitters became discrete, valid concepts.

45 Human Concepts One of the most useful computational properties of human concepts is that they often show a hierarchical structure. Examples might be: animal > bird > canary > Tweetie or artifact > motor vehicle > car > Porsche > 911. A weakness of the radar concept model is that it did not allow development of these important hierarchical structures.

46 Sensor Fusion and Information Integration with the Ersatz Brain. We can do simple sensor fusion in the Ersatz Brain. The data representation we develop is directly based on the topographic data representations used in the brain: topographic computation. Spatializing the data, that is letting it find a natural topographic organization that reflects the relationships between data values, is a technique of potential utility. We are working with relationships between values, not with the values themselves. Spatializing the problem provides a way of “programming” a parallel computer.

47 Topographic Data Representation We initially will use a simple bar code to code the value of a single parameter. The precision of this coding is low. But we don’t care about quantitative precision: we want qualitative analysis. Brains are good at qualitative analysis, poor at quantitative analysis. (Traditional computers are the opposite.) (Figure: bar codes for low, medium, and high values.)
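A toy bar-code encoder in this spirit may help; the function name, edge length, and bar width below are our own illustrative choices, not from the talk. A parameter value selects where a short block of active modules sits along one edge of the array:

```python
# Hypothetical bar-code encoder: a parameter value in [0, 1) activates a
# short "bar" of modules along one edge of the module array.
def bar_code(value, edge_len=32, bar_width=4):
    code = [0] * edge_len
    start = int(value * (edge_len - bar_width))  # low-precision placement
    for i in range(start, start + bar_width):
        code[i] = 1
    return code

low = bar_code(0.1)   # bar near one end of the edge
high = bar_code(0.9)  # bar near the other end
```

Note the deliberately low precision: nearby parameter values map to the same or overlapping bars, which is exactly the qualitative (not quantitative) coding the slide describes.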

48 Demo For our demo Ersatz Brain program, we will assume we have four parameters derived from a source. An “object” is characterized by values of these four parameters, coded as bar codes on the edges of the array of CPUs. We assume local linear transmission of patterns from module to module.

49 Each pair of input patterns gives rise to an interference pattern, a line perpendicular to the midpoint of the line between the pair of input locations.
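A small sketch of that geometry, under the slide's assumption of uniform travel speed on a square grid of modules (grid size and input points are our own illustrative choices): modules where the two traveling patterns arrive simultaneously are exactly those equidistant from the two input sites, i.e. the perpendicular bisector.

```python
import math

# Find grid modules equidistant from two edge input sites -- the
# interference line where two traveling patterns collide simultaneously.
def interference_modules(p1, p2, n=9, tol=1e-9):
    hits = []
    for x in range(n):
        for y in range(n):
            d1 = math.dist((x, y), p1)
            d2 = math.dist((x, y), p2)
            if abs(d1 - d2) < tol:
                hits.append((x, y))
    return hits

# Inputs at the same height on the left and right edges of a 9 x 9 grid:
line = interference_modules((0, 4), (8, 4))
```

For these symmetric inputs the interference line is the vertical column of modules midway between them.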

50 There are places where three or four features meet at a module. Geometry determines location. The higher-level combinations represent relations between several individual data values in the input pattern. Combinations have literally fused the spatial relations of the input data.

51 Formation of Hierarchical Concepts. This approach allows the formation of what look like hierarchical concept representations. Suppose we have three parameter values that are fixed for each object and one value that varies widely from example to example. The system develops two different types of spatial data. In the first, some high order feature combinations are fixed since the three fixed input (core) patterns never change. In the second there is a varying set of feature combinations corresponding to the details of each specific example of the object. The specific examples all contain the common core pattern.

52 Core Representation The group of coincidences in the center of the array is due to the three input values arranged around the left, top and bottom edges.

53 Left are two examples where there is a different value on the right side of the array. Note the common core pattern (above).

54 Development of A “Hierarchy” Through Spatial Localization. The coincidences due to the core (three values) and to the examples (all four values) are spatially separated. We can use the core as a representation of the examples since it is present in all of them. It acts as the higher level in a simple hierarchy: all examples contain the core. Key Point: This approach is based on relationships between parameter values and not on the values themselves.

55 Relationships are Valuable Consider:

56 Which pair is most similar?

57 Experimental Results One pair has high physical similarity to the initial stimulus; that is, one half of the figure is identical. The other pair has high relational similarity; that is, they form a pair of identical figures. Adults tend to choose relational similarity. Children tend to choose physical similarity. However, it is easy to bias adults and children toward either relational or physical similarity. Potentially a very flexible and programmable system.

58 Filtering Using Topographical Representations Now, we show how to use these ideas to do something (perhaps) useful. …

59 The Problem Develop a topographic data representation inspired by the perceptual invariances seen in human speech. Look at problems analyzing vowels in a speech signal as an example of an important class of signals. First in a series of demonstrations using the topography of data representations to do useful computation.

60 Speech Signal Basics Vowels are long duration and often stable. But still hard to analyze correctly. Problems: different speakers, accents, high variability, diphthongs, similarity between vowels, context effects, gender. The acoustic signals from a vowel are dominated by the resonances of the vocal tract, called formants. We are interested in using this problem as a test case to show the difficulties of biological signal processing. But: these are important signal types, and brains are very good with this type of data.

61 Vowel Processing Vocal tracts come in different sizes: men, women, children, Alvin the Chipmunk. Resonant peaks change their frequency as a function of vocal tract length. This frequency shift can be substantial. But it causes little problem for human speech perception. An important perceptual feature for phoneme recognition seems to be the ratios between the formant frequencies, not just the absolute values of frequency. How can we make a system respond to ratios?

62 Power Spectrum of a Steady State Vowel

63 Sound Spectrogram: Male American Words: heed, hid, head, had, hod, hawed, hood, who’d. From: P. Ladefoged (2000), A Course in Phonetics, 4th Edition, Henle.

64 Sound Spectrogram: Female American Words: heed, hid, head, had, hod, hawed, hood, who’d. From: P. Ladefoged (2000), A Course in Phonetics, 4th Edition, Henle.

65 Average Formant Frequencies (Hz) for Men, Women and Children.

                   [i]            [æ]            [u]
Men F1          267 (0.86)     664 (0.77)     307 (0.81)
Women F1        310 (1.00)     863 (1.00)     378 (1.00)
Children F1     360 (1.16)    1017 (1.18)     432 (1.14)
Men F2         2294 (0.82)    1727 (0.84)     876 (0.91)
Women F2       2783 (1.00)    2049 (1.00)     961 (1.00)
Children F2    3178 (1.14)    2334 (1.14)    1193 (1.24)
Men F3         2937 (0.89)    2420 (0.85)    2239 (0.84)
Women F3       3312 (1.00)    2832 (1.00)    2666 (1.00)
Children F3    3763 (1.14)    3336 (1.18)    3250 (1.21)

Data taken from Watrous (1991), derived originally from Peterson and Barney (1952).

66 Ratios Between Formant Frequencies for Men, Women and Children.

                     [i]      [æ]      [u]
Men F1/F2           0.12     0.38     0.35
Women F1/F2         0.11     0.42     0.39
Children F1/F2      0.11     0.43     0.36
Men F2/F3           0.78     0.71     0.39
Women F2/F3         0.84     0.72     0.36
Children F2/F3      0.84     0.70     0.37

Data taken from Watrous (1991), derived originally from Peterson and Barney (1952).
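A quick check of the point these tables make, using the [i] column of the formant averages (our own arithmetic on the published values): the absolute frequencies vary by tens of percent across speaker groups, while the F1/F2 ratio barely moves.

```python
# F1 and F2 for the vowel [i], from the formant table above (Hz).
f1 = {"men": 267, "women": 310, "children": 360}
f2 = {"men": 2294, "women": 2783, "children": 3178}

ratios = {g: f1[g] / f2[g] for g in f1}

def rel_range(vals):
    """Spread of a set of values relative to its minimum."""
    vals = list(vals)
    return (max(vals) - min(vals)) / min(vals)

spread_f2 = rel_range(f2.values())         # large: absolute F2 varies a lot
spread_ratio = rel_range(ratios.values())  # small: the ratio is nearly fixed
```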

67 Other Representation Issues There is a roughly logarithmic spatial mapping of frequency onto the surface of auditory cortex, sometimes called a tonotopic mapping. Logarithmic coding of a parameter changes multiplication by a constant into addition of a constant. A logarithmic spatial coding therefore translates all parameters multiplied by the constant by the same distance.
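The multiplication-becomes-translation property is easy to demonstrate numerically (the frequencies and scale factor below are illustrative, not from the talk):

```python
import math

# Log spatial coding: multiplying every frequency by a constant c becomes
# a uniform translation by log(c) on the map.
formants = [300.0, 1000.0, 2500.0]  # illustrative frequencies in Hz
c = 1.2                             # e.g. a shorter vocal tract

positions = [math.log(f) for f in formants]
positions_scaled = [math.log(c * f) for f in formants]

# Every point on the log map moves by exactly the same amount, log(c).
shifts = [b - a for a, b in zip(positions, positions_scaled)]
```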

68 Spatial Coding of Frequency Three data points on a map of frequency. Multiply by ‘c’: the distance moved on the map varies from point to point. Suppose instead we use the log of the data value. Now scale by ‘c’: each point moves the same amount, log c.

69 Multiple Maps fMRI-derived maps of human auditory cortex. Note at least five, probably six, maps. Some are joined at the high frequency end and some at the low frequency end. (Figure 6 from Talavage, et al., p. 1290)

70 Representational Filtering Our computational goal: enhance the representation of ratios between formant frequencies and de-emphasize the exact values of those frequencies. We wish to make a filter, using the data representation, that responds to one aspect of the input data. We suggest that brain-like computers can make use of this strategy.

71 Use the Information Integration Architecture Assume the information-integration square array of modules, with parameters fed in from the edges and a map of frequency along each edge. Assume formant frequencies are precise points. (Actually they are somewhat broad.) We start by duplicating the frequency representation along the edges of a square.

72 Simple Topographic System To Represent Relationships Simplest system: two opposing maps of frequency. Look at points equally distant between f1 on one map and f2 on the other. Shift frequency by a constant amount, Δ. The point equally distant between the new frequencies (f1+Δ) and (f2+Δ) does not move.

73 Problems Unfortunately, this desirable invariance property only holds on the center line. Two points determine a line, not a point: there are many equidistant points. What happens off the center line is more complex. Still interesting, but a triple equidistant coincidence would be much more stable.

74 Three Parameter Coincidences Assume we are interested in the more complex system where three frequency components come together at a single module at the same time. We conjecture the target module may form a new internal representation corresponding to this triple coincidence. Assume uniform transmission speed between modules. Then we look for module locations equidistant from the locations of triple sets of frequencies.
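A sketch of the search the slide describes, under its assumption of uniform transmission speed (the grid size and input locations are our own illustrative choices): scan the module array for the point most nearly equidistant from three edge inputs. For three non-collinear points this is the circumcenter.

```python
import math

# Find the module (grid point) most nearly equidistant from three input
# locations -- the candidate site of a triple coincidence.
def best_triple_module(p1, p2, p3, n=33):
    best, best_spread = None, float("inf")
    for x in range(n):
        for y in range(n):
            d = [math.dist((x, y), p) for p in (p1, p2, p3)]
            spread = max(d) - min(d)  # 0 means exactly equidistant
            if spread < best_spread:
                best, best_spread = (x, y), spread
    return best

# Three inputs on the left, top, and bottom edges of a 33 x 33 grid:
module = best_triple_module((0, 16), (16, 32), (16, 0))
```

For this symmetric arrangement the triple coincidence lands at the center of the array, 16 modules from all three inputs.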

75 Triple Coincidences

76 Construction The location of triple coincidences is a function of both the ratios of the f’s and the values of the f’s. A careful parametric study has not yet been done. But note: it now mixes frequency and ratios.

77 Multiple triple coincidence locations are present. Depending on the triple, different modules are activated. A three “formant” system has six locations corresponding to possible triples. If we shift the frequency by an amount Δ (multiplication by a constant!), the location of the triple shifts slightly. Data Representation

78 Two Different Stimuli: Selectivity of Representation The geometry of the triple coincidence points varies with the location of the inputs along the edges. A different set of frequencies will give rise to a different set of triple coincidences. Representation is selective.

79 Robust Data Representation The system is robust. Changes in the shape of the maps do not affect the qualitative results. Different spatial data arrangements work nicely. Changes in geometry have possibilities for computation. The non-square arrangement spreads out the triple coincidence points along the vertical axis.

80 The representation of a vowel is composed of multiple triple coincidences (multiple active modules). But since information can move laterally, closed loops of activity become possible. The idea has been proposed before: Hebb's cell assemblies were self-exciting neural loops that corresponded to cognitive entities: concepts. Hebb’s cell assemblies were hard to make work because of the use of scalar interconnected units. We have pattern-sensitive interconnections. Module assemblies may become a powerful feature of the Network of Networks approach. See if we can integrate relatively dense local connections to form module assemblies. Module Assemblies

81 If the modules are simultaneously active the pairwise associations forming the loop abcda can be learned through simple Hebb learning. The path closes on itself. Consider a. After traversing the linked path a>b>c>d>a, the pattern arriving at a around the loop is a constant times the pattern on a. If the constant is positive there is the potential for positive feedback if the total loop gain is greater than one. Loops
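The loop-gain condition can be stated as one multiplication; the per-link gains below are hypothetical values of our own, chosen so the product exceeds one:

```python
# Sketch of the loop-gain condition for the loop a>b>c>d>a: the pattern
# returning to module a is scaled by the product of the per-link gains,
# so sustained (self-exciting) activity needs total loop gain > 1.
link_gains = [0.9, 1.2, 1.1, 1.05]  # hypothetical per-link scale factors

loop_gain = 1.0
for g in link_gains:
    loop_gain *= g

self_sustaining = loop_gain > 1.0  # positive feedback around the loop
```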

82 Formation of Module Assemblies A single frequency pattern will give rise to multiple triple coincidences. Speculation: Assume a module assembly mechanism: Simultaneous activation can associate the active regions together for a particular pattern. Two different patterns can give rise to different module assemblies.

83 Provocative Neurobiology The behavior of the active regions under transformations (i.e. multiplication by a constant) has similarity to one of Tanaka’s observations. Tanaka shows an intrinsic imaging response in inferotemporal cortex to the image of a model head. As the head rotates there is a gradual shift of the columnar-sized region. The total movement for 180 degree rotation is about 1 mm (three or four columns). The shift seems to be smooth with rotation. Tanaka was sufficiently impressed with this result to modify his columnar model.

84 Rotating Face Representation in Inferotemporal Cortex (from Tanaka, 2003)

85 Revised Tanaka Columnar Model for Inferotemporal Cortex

86 Theme, Variations, Transformations Speculation: Cortical processing involving common continuous transformations may be working on a “theme and variations” principle. There are an infinite number of possible transformations. But the most common seem to be topographically represented by a small, physically contiguous range of locations on the surface of cortex. By far the most common transformation for a head would be rotation around the vertical axis of the head caused by different viewing angles.

87 Potential Value This is an example of an approach to signal processing for biological and cognitive signals. It applies to many important problems: for example, vision, speech, even much of cognition and information integration. Potentially interesting aspects of the algorithm: it is largely parallel. Conjecture: it should be robust. Conjecture: it may be able to handle common important transformations. Speculation: it may put information in useful form for later cognitive processing. Speculation: if many small active areas (modules) are the right form for output, then this technique may work.

88 Potential Value (2) To be done: Develop general rules for topographic geometries. Are the filter characteristics good? Over what range of values? Example: Could we develop a “pure” stable ratio filter? (Right now it is mixed.) Since we are assuming traveling waves underlying this model, what are the temporal dynamics? Does it work for real data?

89 Conclusions: Representation Topographic maps of the type we suggest can do information processing. They can act like filters, enhancing some aspects of the input pattern and suppressing others. Here, enhancing ratios of frequency components and suppressing absolute frequency values. Speculation: Their behavior may have some similarities to effects seen in cortex.

90 Sparse Neural Systems: The Ersatz Brain gets Thinner

91 Neural Networks. The most successful brain inspired models are neural networks. They are built from simple approximations of biological neurons: nonlinear integration of many weighted inputs. Throw out all the other biological detail.

92 Layers Up to now we have emphasized local, lateral interactions between cells and cortical columns. But there are also long range projections in cortex, where one large group of cells projects to another one some distance away. Traditional neural net processing is built around these projection systems and has little lateral interaction. It usually assumes full connectivity between layers. Is this correct?

93 Neural Network Systems Standard neural network is formed using: multiple layers projections between layers.

94 Most neural nets assume full connectivity between layers. A fully connected neural net uses lots of connections! A Fully Connected Network
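To make “lots of connections” concrete, here is a toy weight count for two layers, fully connected versus sparsely connected. The layer size and fan-out are illustrative assumptions, not figures from the slides:

```python
# Counting weights between two layers of n units each.
# n and the sparse fan-out below are hypothetical, for illustration only.
n = 100_000            # units per layer (assumed)
full = n * n           # fully connected: every unit contacts every unit
sparse = n * 100       # sparse: each unit contacts ~100 targets (assumed)
print(f"full:   {full:,}")    # full:   10,000,000,000
print(f"sparse: {sparse:,}")  # sparse: 10,000,000
```

Even at this modest scale, full connectivity needs a thousand times more weights than a 100-target sparse wiring.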

95 Limitation 1: Sparse Connectivity We believe that the computational strategy used by the brain is strongly determined by severe hardware limitations. Example: The brain is sparsely connected. Fractional connectivity of the brain is very low: 0.001%. Implications: Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly. Connections are valuable. The pattern of connection is under tight control. Short local connections are cheaper than long ones. But many long projections do exist and are very important.

96 In sparse coding only a few active units represent an event. “In recent years a combination of experimental, computational, and theoretical studies have pointed to the existence of a common underlying principle involved in sensory information processing, namely that information is represented by a relatively small number of simultaneously active neurons out of a large population, commonly referred to as ‘sparse coding.’” (p. 481) B. A. Olshausen and D. J. Field (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14, 481-487. Limitation 2: Sparse Coding

97 There are numerous advantages to sparse coding. Sparse coding: provides increased storage capacity in associative memories; is easy to work with computationally, and very fast (few or no network interactions); and is energy efficient. Best of all: it seems to exist! Higher levels (further from sensory inputs) show sparser coding than lower levels. Inferotemporal cortex seems to be more selective and less spontaneously active than primary areas (V1). Advantages of Sparse Coding

98 See if we can make a learning system that starts from the assumption of both sparse connectivity and sparse coding. If we use simple neural net units it doesn’t work so well. But if we use our Network of Networks approximation, it works better and makes some interesting predictions. Sparse Connectivity + Sparse Coding

99 The simplest sparse system has a single active unit connecting to a single active unit. If the potential connection does exist, simple outer-product Hebb learning can learn it easily. Not interesting. The Simplest Connection

100 A useful notion in sparse systems is the idea of a path. A path connects a sparsely coded input unit with a sparsely coded output unit. Paths have strengths just as connections do. Strengths are based on the entire path, from input to output, which may involve intermediate connections. It is easy for Hebb synaptic learning to learn paths. Paths
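One simple way to formalize a path strength is as the product of the link strengths along the route from input to output. The multiplicative rule is an assumption here (consistent with the chained-associator arithmetic on a later slide), and the numbers are arbitrary:

```python
# Path strength as the product of link strengths along the route
# from a sparsely coded input unit to a sparsely coded output unit.
def path_strength(links):
    """Multiply the strengths of the intermediate connections."""
    s = 1.0
    for w in links:
        s *= w
    return s

print(path_strength([0.9, 0.8, 0.7]))  # ~0.504
```

Note that under this rule one weak intermediate link (near zero) kills the whole path, which is why the common-path interference discussed next matters.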

101 One of many problems. Suppose there is a common portion of a path for two single active unit associations, a with d (a>b>c>d) and e with f (e>b>c>f). We cannot easily weaken or strengthen the common part of the path (b>c) because it is used in multiple associations. Interference occurs. Common Parts of a Path

102 Some speculations: If independent paths are desirable an initial construction bias would be to make available as many potential paths as possible. In a fully connected system, adding more units than contained in the input and output layers would be redundant. They would add no additional processing power. Obviously not so in sparse systems! Fact: There is a huge expansion in number of units going from retina to thalamus to cortex. In V1, a million input fibers drive 200 million V1 neurons. Make Many, Many Paths!

103 Network of Networks Approximation Single units do not work so well in sparse systems. Let us use our Network of Networks approximation and see if we can do better. Network of Networks: the basic computing units are not neurons, but small (10^4 neurons) attractor networks. Basic Network of Networks Architecture: a two-dimensional array of modules, locally connected to neighbors.

104 Interactions between Modules Interactions between modules are vector in nature, not simple scalar activity. They are described by state interaction matrices instead of simple scalar weights. We gain greater path selectivity this way.

105 Feedforward, Feedback Emphasize: Cortex is not a simple feedforward system moving “upward” from layer to layer. (Input to output) It has massive connections backwards from layer to layer, at least as dense as the forward connections. There is not a simple processing hierarchy!

106 Columnar organization is maintained in both forward and backward projections “The anatomical column acts as a functionally tuned unit and point of information collation from laterally offset regions and feedback pathways.” (p. 12) “… feedback projections from extra-striate cortex target the clusters of neurons that provide feedforward projections to the same extra-striate site. ….” (p. 22). Lund, Angelucci and Bressloff (2003). Cerebral Cortex, 12, 15-24. Columns and Their Connections

107 Return to the simplest situation for layers: Modules a and b can display two orthogonal patterns, A and C on a and B and D on b. The same pathways can learn to associate A with B and C with D. Path selectivity can overcome the limitations of scalar systems. Paths are both upward and downward. Sparse Network of Networks
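The path-selectivity claim can be checked with a small outer-product sketch. The 4-component patterns below are toy assumptions (A orthogonal to C), using NumPy:

```python
import numpy as np

# Orthogonal patterns A and C on module a; responses B and D on module b.
A = np.array([1., 0., 0., 0.])
C = np.array([0., 1., 0., 0.])
B = np.array([0., 0., 1., 1.])
D = np.array([1., 1., 0., 0.])

# One coupling matrix learns both associations via summed outer products.
W = np.outer(B, A) + np.outer(D, C)

print(W @ A)  # [0. 0. 1. 1.]  -> B
print(W @ C)  # [1. 1. 0. 0.]  -> D
```

A single pathway (one matrix W) carries both associations without interference, precisely because A and C are orthogonal; a scalar connection strength could not do this.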

108 Consider the common path situation again. We want to associate patterns on two paths, a-b-c-d and e-b-c-f, with link b-c in common. Parts of the path are physically common, but they can be functionally separated if they use different patterns. Pattern information propagating forwards and backwards can sharpen and strengthen specific paths without interfering with the strengths of other paths. Common Paths Revisited

109 Just stringing together simple associators works. For module b: change in coupling from a to b: Δ(S_ab) = η b a^T; change in coupling from c to b: Δ(T_cb) = η b c^T. For module c: change in coupling from d to c: Δ(U_dc) = η c d^T; change in coupling from b to c: Δ(T_bc) = η c b^T. (By the same rule, at module d the forward coupling from c is Δ(U_cd) = η d c^T.) If pattern a is presented at layer 1 then: pattern on d = (U_cd)(T_bc)(S_ab) a = η^3 (d c^T)(c b^T)(b a^T) a = η^3 (c^T c)(b^T b)(a^T a) d = (constant) d. Associative Learning along a Path
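The chained-associator result can be verified numerically with random patterns. The learning rate η, the pattern dimension, and the random seed below are arbitrary choices for the sketch (NumPy):

```python
import numpy as np

# Toy check of the chain a -> b -> c -> d built from Hebb outer products.
rng = np.random.default_rng(0)
eta = 0.5
a, b, c, d = (rng.standard_normal(8) for _ in range(4))

S_ab = eta * np.outer(b, a)   # coupling a -> b
T_bc = eta * np.outer(c, b)   # coupling b -> c
U_cd = eta * np.outer(d, c)   # coupling c -> d

out = U_cd @ T_bc @ S_ab @ a
const = eta**3 * (c @ c) * (b @ b) * (a @ a)
print(np.allclose(out, const * d))  # True
```

Presenting pattern a at the input indeed reproduces a scalar multiple of d at the far end of the path, as the slide's algebra predicts.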

110 Because information moves backward, forward, and sideways, closed loops are possible and likely. This has been tried before: Hebb cell assemblies were self-exciting neural loops that corresponded to cognitive entities, for example, concepts. Hebb's cell assemblies are hard to make work because they use scalar interconnected units. But module assemblies can become a powerful feature of the sparse approach: we have more selective connections. See if we can integrate relatively dense local connections with relatively sparse projections to and from other layers to form module assemblies. Module Assemblies

111 Biological Evidence: Columnar Organization in IT Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex. There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

112 Columns: Inferotemporal Cortex Responses of a region of IT to complex images involve discrete columns. The response to a picture of a fire extinguisher shows how regions of activity are determined. Boundaries are where the activity falls by a half. Note: some spots are roughly equally spaced.

113 Active IT Regions for a Complex Stimulus Note the large number of roughly equally distant spots (2 mm) for a familiar complex image.

114 Intralayer connections are sufficiently dense so that active modules a little distance apart can become associatively linked. Recurrent collaterals of cortical pyramidal cells form relatively dense projections around a pyramidal cell. The extent of lateral spread of recurrent collaterals in cortex seems to be over a circle of roughly 3 mm diameter. If we assume that a column is roughly a third of a mm wide, then there are roughly 10 columns in a square mm; a 3 mm diameter circle has an area of roughly 10 square mm; so a column projects locally to about 100 other columns. Intralayer Connections
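The back-of-the-envelope count above can be redone exactly. With π included, the 3 mm circle is nearer 7 mm² than 10, which still gives the slide's order of 100 local targets (the rounding conventions are the slide's own):

```python
import math

column_width_mm = 1 / 3                       # a column is roughly a third of a mm
columns_per_mm2 = (1 / column_width_mm) ** 2  # 9.0, i.e. "roughly 10"
spread_area_mm2 = math.pi * (3 / 2) ** 2      # ~7.1 mm^2 for a 3 mm diameter circle
local_targets = columns_per_mm2 * spread_area_mm2
print(round(local_targets))                   # 64, on the order of 100
```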

115 If the modules are simultaneously active the pairwise associations forming the loop abcda can be learned through simple Hebb learning. The path closes on itself. Consider a. After traversing the linked path a>b>c>d>a, the pattern arriving at a around the loop is a constant times the pattern on a. If the constant is positive there is the potential for positive feedback if the total loop gain is greater than one. Loops
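The gain condition can be illustrated with a scalar sketch: after each traversal of the loop a>b>c>d>a, the pattern on a comes back multiplied by a constant. The gain values and step count here are arbitrary:

```python
# Positive feedback around a closed loop occurs iff total loop gain > 1.
def iterate_loop(x0, gain, steps):
    """Return activity after repeatedly traversing a loop with the given gain."""
    x = x0
    for _ in range(steps):
        x = gain * x
    return x

print(iterate_loop(1.0, 1.1, 10))  # ~2.59: loop gain > 1, activity grows
print(iterate_loop(1.0, 0.9, 10))  # ~0.35: loop gain < 1, activity decays
```

In a real module assembly the growth would be checked by the saturating nonlinearity of the attractor networks, which this scalar sketch omits.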

116 Loops can be kept separate even with common modules. If the b pattern is different in the two loops, there is no problem. The selectivity of links will keep activities separate. Activity from one loop will not spread into the other (unlike Hebb cell assemblies). Loops with Common Modules If b is identical in the two loops b is ambiguous. There is no a priori reason to activate Loop 1, Loop 2, or both. Selective loop activation is still possible, though it requires additional assumptions to accomplish.

117 More complex connection patterns are possible. Richer interconnection patterns might have all connections learned. Ambiguous module b will receive input from d as well as a and c. A larger context would allow better loop disambiguation by increasing the coupling strength of modules. Richly Connected Loops

118 Putting It All Together: Sparse interlayer connections and dense intralayer connections work together. Once a coupled module assembly is formed, it can be linked to by other layers. The result is a dynamic, adaptive computational architecture that is both workable and interesting. Working Together

119 Two Parts … Suppose we have two such assemblies that co-occur frequently. Parts of an object say …

120 As learning continues: Groups of module assemblies bind together through Hebb associative learning. The small assemblies can act as the “sub-symbolic” substrate of cognition, and the larger assemblies as symbols and concepts. Note the many new interconnections. Make a Whole!

121 Conclusion (1) The binding process looks like compositionality. The virtues of compositionality are well known. It is a powerful and flexible way to build cognitive information processing systems. Complex mental and cognitive objects can be built from previously constructed, statistically well-designed pieces. (Like cognitive Legos.)

122 Conclusion (2) We are suggesting here a possible model for the dynamics and learning in a compositional-like system. It is built based on constraints derived from connectivity, learning, and dynamics, not as a way to do optimal information processing. Perhaps this property of cognitive systems is more like a splendid bug fix than a well-chosen computational strategy. Sparseness is an idea worth pursuing. It may be a way to organize and teach a cognitive computer.

123 Conclusions Speculation: Perhaps digital computers and humans (and brain-like computers??) are evolving toward a complementary relationship. Each computational style has its virtues: –Humans (and brain-like computers??): show flexibility, estimation, connection to the physical world. –Digital computers: show speed, logic, accuracy. Both styles of computation are valuable; there is a place for both. But their hardware is so different that brain-like coprocessors make sense. As always, software will be more difficult to build and understand than hardware.

