Arithmetic Done by Brains and Machines: The Ersatz Brain Project

James A. Anderson
Department of Cognitive and Linguistic Sciences
Brown University, Providence, RI 02912

Our Goal: We want to build a first-rate, second-rate brain.

Ersatz Participants
Faculty: Jim Anderson, Cognitive Science. Gerry Guralnik, Physics. David Sheinberg, Neuroscience.
Students: Socrates Dimitriadis, Cognitive Science. Brian Merritt, Cognitive Science.
Private Industry: Paul Allopenna, Aptima, Inc. Andrew Duchon, Aptima, Inc. John Santini, Alion, Inc.

Acknowledgements
This work was supported by:
· A seed money grant from the Office of the Vice President for Research, Brown University.
· Phase I and Phase II SBIRs, “The Ersatz Brain Project,” to Aptima, Inc. (Woburn MA), Dr. Paul Allopenna, Project Manager.
· Funding from the Air Force Research Laboratory, Rome, NY.

Comparison of Silicon Computers and Carbon Computers
Digital computers are
· Made from silicon
· Accurate (essentially no errors)
· Fast (nanoseconds)
· Able to execute long chains of logical operations (billions)
· Often irritating (because they don’t think like us).

Brains are
· Made from carbon
· Inaccurate (low precision, noisy)
· Slow (milliseconds, 10^6 times slower)
· Able to execute short chains of parallel alogical associative operations (perhaps 10 operations/second)
· Yet largely understandable (because they think like us).

Huge disadvantage for carbon: more than 10^12 in the product of speed and power. But we still do better than them in many perceptual skills: speech recognition, object recognition, face recognition, information integration, motor control. One implication: Cognitive “software” uses only a few but very powerful elementary operations.

Major Point Brains and computers are very different in their underlying hardware, leading to major differences in software. Computers, as the result of 60 years of evolution, are great at modeling physics. They are not great (after 50 years trying and largely failing) at modeling human cognition. One possible reason: inappropriate hardware leads to inappropriate software. Maybe we need something completely different: new software, new hardware, new basic operations, even new ideas about computation.

So Why Build a Brain-Like Computer?
1. Engineering. Computers are all special purpose devices. Many of the most important practical computer applications of the next few decades will be cognitive in nature:
· Natural language processing.
· Internet search.
· Cognitive data mining.
· Decent human-computer interfaces.
· Text understanding.
We claim it will be necessary to have a cortex-like architecture (either software or hardware) to run these applications efficiently.

2. Science: Such a system, even in simulation, becomes a powerful research tool. It leads to designing software with a particular structure to match the brain-like computer. If we capture any of the essence of the cortex, writing good programs will give insight into biology and cognitive science. If we can write good software for a vaguely brain like computer we may show we really understand something important about the brain.

3. Personal: It would be the ultimate cool gadget.
A technological vision: In 2057 the personal computer you buy in Wal-Mart will have two CPUs with very different architectures:
First, a traditional von Neumann machine that runs spreadsheets, does word processing, keeps your calendar straight, and so on: what computers do now.
Second, a brain-like chip to
· Handle the interface with the von Neumann machine.
· Give you the data that you need from the Web or your files (but didn’t think to ask for).
· Be your silicon friend, guide, and confidant (because you understand each other).

Ersatz Basic Assumptions

The Ersatz Brain Approximation: The Network of Networks.
Conventional wisdom says neurons are the basic computational units of the brain. The Ersatz Brain Project is based on a different approximation. The Network of Networks model was developed in collaboration with Jeff Sutton then at Harvard Medical School, now at NSBRI. Cerebral cortex contains intermediate level structure, between neurons and an entire cortical region. Intermediate level brain structures are hard to study experimentally because they require recording from many cells simultaneously.

Network of Networks Approximation
We use the Network of Networks [NofN] approximation to structure the hardware and to reduce the number of connections. We assume the basic computing units are not neurons, but small (10^4 neurons) attractor networks.
Basic Network of Networks Hardware Architecture:
· 2-dimensional array of modules
· Locally connected to neighbors

Cortical Columns: Minicolumns
“The basic unit of cortical operation is the minicolumn … It contains of the order of neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of m in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2)
VB Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4.
Figure: Nissl stain of cortex in planum temporale.

Columns: Functional
Groupings of minicolumns seem to form the physiologically observed functional columns. The best known example is the orientation columns in V1. They are significantly bigger than minicolumns, typically around mm. Mountcastle’s summation: “Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3)
Cells in a column ~ (80)(100) = 8000

The activity of the non-linear attractor networks (modules) is dominated by their attractor states.
Attractor states may be built in or acquired through learning. We approximate the activity of a module as a weighted sum of attractor states. That is: the attractor states serve as an adequate set of basis functions.
Activity of Module: $x = \sum_i c_i a_i$, where the $a_i$ are the attractor states and the $c_i$ are their weights.

Elementary Modules
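
As a small illustration of the weighted-sum approximation of module activity, the sketch below builds an activity vector from two attractor states. The four-unit states `a1` and `a2`, the coefficients, and the function name `module_activity` are assumptions chosen for illustration; real modules would have many more units and states.

```python
def module_activity(coefficients, attractor_states):
    """Return x = sum_i c_i * a_i for attractor states of equal length."""
    if len(coefficients) != len(attractor_states):
        raise ValueError("need one coefficient per attractor state")
    dim = len(attractor_states[0])
    x = [0.0] * dim
    for c, a in zip(coefficients, attractor_states):
        for k in range(dim):
            x[k] += c * a[k]
    return x


# Two hypothetical attractor states of a 4-unit module.
a1 = [1.0, 1.0, -1.0, -1.0]
a2 = [1.0, -1.0, 1.0, -1.0]

# A module that is mostly in state a1 with some contribution from a2.
x = module_activity([0.75, 0.25], [a1, a2])
print(x)
```

The coefficients play the role of amplitudes: only as many numbers as there are attractor states are needed to describe the module, rather than one number per neuron.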

The Single Module: BSB The attractor network we use for the individual modules is the BSB network (Anderson, 1993). It can be analyzed using the eigenvectors and eigenvalues of its local connections.

Interactions between Modules
Interactions between modules are described by state interaction matrices, M. The state interaction matrix elements give the contribution of an attractor state in one module to the amplitude of an attractor state in a connected module. In the BSB linear region
$x(t+1) = \sum_i M_i s_i + f\,x(t)$
where the first term is the weighted sum of input from other modules and the second term is the ongoing activity.
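
The linear-region update can be written out directly. The following is a minimal sketch, assuming each neighbouring module i supplies a state vector s_i and a state interaction matrix M_i; the 2-state example values and the names `mat_vec` and `linear_update` are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def linear_update(x, neighbor_states, interaction_matrices, f):
    """One linear-region step: x(t+1) = sum_i M_i s_i + f * x(t)."""
    new_x = [f * value for value in x]          # ongoing activity
    for M, s in zip(interaction_matrices, neighbor_states):
        contribution = mat_vec(M, s)            # input from one neighbour
        new_x = [a + b for a, b in zip(new_x, contribution)]
    return new_x


# Hypothetical module with two attractor-state amplitudes and two neighbours.
x = [1.0, 0.0]
s1, s2 = [1.0, 0.0], [0.0, 2.0]
M1 = [[0.5, 0.0], [0.0, 0.5]]
M2 = [[0.0, 0.25], [0.25, 0.0]]

x_next = linear_update(x, [s1, s2], [M1, M2], f=0.5)
print(x_next)
```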

The Linear-Nonlinear Transition
The first BSB processing stage is linear and sums influences from other modules. The second processing stage is nonlinear. This linear to nonlinear transition is a powerful computational tool for cognitive applications. It describes the processing path taken by many cognitive processes. A generalization from cognitive science: Sensory inputs → (categories, concepts, words). Cognitive processing moves from continuous values to discrete entities.
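
The move from continuous values to a discrete state can be seen in a toy version of the BSB (Brain-State-in-a-Box) dynamics. The slides do not give the update rule, so this sketch assumes the commonly described form: add weighted feedback to the state, then clip each unit to the box [-1, 1]. The stored pattern, the outer-product weights, the feedback gain `alpha`, and the starting state are all illustrative assumptions.

```python
def bsb_step(x, weights, alpha=1.0, limit=1.0):
    """One BSB step: linear feedback, then clipping to [-limit, limit]."""
    size = len(x)
    feedback = [sum(weights[i][j] * x[j] for j in range(size)) for i in range(size)]
    linear = [x[i] + alpha * feedback[i] for i in range(size)]
    return [max(-limit, min(limit, v)) for v in linear]


# Hypothetical stored pattern; the weights are its scaled outer product.
pattern = [1.0, -1.0, 1.0, -1.0]
n = len(pattern)
weights = [[pattern[i] * pattern[j] / n for j in range(n)] for i in range(n)]

# A weak, graded ("sensory") input that only resembles the pattern.
state = [0.2, -0.1, 0.3, -0.05]
for _ in range(10):
    state = bsb_step(state, weights)

print(state)
```

While the state is inside the box the system is linear and the input simply grows; once units hit the limits the state is pinned to a corner, which acts as a discrete category.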

Sparse Connectivity
The brain is sparsely connected. (Unlike most neural nets.) A neuron in cortex may have on the order of 100,000 synapses. There are more than 10^10 neurons in the brain. Fractional connectivity is very low: 0.001%.
Implications:
· Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly.
· Connections are valuable.
· The pattern of connection is under tight control.
· Short local connections are cheaper than long ones.
Our approximation makes extensive use of local connections for computation.
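
The fractional connectivity figure follows from the two numbers on the slide. A quick check, assuming (as an upper bound) that every synapse of a neuron contacts a different neuron; the variable names are illustrative:

```python
from fractions import Fraction

synapses_per_neuron = 10**5   # "on the order of 100,000 synapses"
neurons = 10**10              # "more than 10^10 neurons"

# Fraction of all neurons that a single neuron could contact.
fraction_connected = Fraction(synapses_per_neuron, neurons)
percent_connected = fraction_connected * 100

print(float(percent_connected), "percent")
```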

Biological Evidence

Biological Evidence: Columnar Organization in Inferotemporal Cortex
Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex. There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

IT Response Clusters: Imaging
Tanaka (2003) used intrinsic visual imaging of cortex. Train video camera on exposed cortex, cell activity can be picked up. At least a factor of ten higher resolution than fMRI. Size of response is around the size of functional columns seen elsewhere: microns.

Columns: Inferotemporal Cortex
Responses of a region of IT to complex images involve discrete columns. The response to a picture of a fire extinguisher shows how regions of activity are determined. Boundaries are where the activity falls by a half. Note: some spots are roughly equally spaced.

Active IT Regions for a Complex Stimulus
Note the large number of roughly equally distant spots (2 mm) for a familiar complex image.

Back-of-the-Envelope Engineering Considerations

Engineering Hardware Considerations
We feel that there is a size, connectivity, and computational power sweet spot at the level of the parameters of the network of networks model. If an elementary attractor network has 10^4 actual neurons, that network might have 50 attractor states. Each elementary network might connect to 50 others through state connection matrices. A brain-sized system might consist of 10^6 elementary units with about 10^11 (0.1-1 terabyte) numbers specifying the connections. Putting 100 to 1000 elementary units on a chip gives a total of 1,000 to 10,000 chips in a cortex-sized system. This is well within the upper bounds of current technology.
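
These back-of-the-envelope figures can be reproduced in a few lines. The sketch assumes each connection between two units is a full 50 x 50 state connection matrix and that each number takes between 1 and 8 bytes; both are assumptions made to show how the quoted orders of magnitude can arise, not details given in the slides.

```python
units = 10**6                 # elementary attractor networks
neighbours_per_unit = 50      # connections per network
states_per_unit = 50          # attractor states per network

# Assumption: one states-by-states matrix per connection.
numbers_per_connection = states_per_unit * states_per_unit
total_numbers = units * neighbours_per_unit * numbers_per_connection

# Assumption: 1 to 8 bytes of storage per number.
terabytes_low = total_numbers * 1 / 10**12
terabytes_high = total_numbers * 8 / 10**12

chips_if_1000_per_chip = units // 1000
chips_if_100_per_chip = units // 100

print(f"{total_numbers:.2e} connection numbers")
print(f"{terabytes_low} to {terabytes_high} terabytes")
print(f"{chips_if_1000_per_chip} to {chips_if_100_per_chip} chips")
```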

Modules (Ersatz Processing Units:EPUs)
Function of EPU Modules:
· Simulate local integration: addition of inputs from outside and from other modules.
· Simulate local network dynamics.
· Communications Controller: handle long range (i.e. not neighboring) interactions.
Simpler approximations are possible:
· “Cellular automaton.” (Ignore local dynamics.)
· Approximations to local dynamics.

Physical (Hardware) Module
We assume only local connections for the physical hardware. Reason: Flexible, easy to build, easy to work with.

Software Based Connectivity
Cortical data suggests more connections than just nearest neighbors exist. Simulate these with EPU module software, in the Communications Controller.

Implications
Interesting bonus from this structure: Information transmission, both local and long range, can be slow. It will take multiple steps (a long time) to move data to distant modules. But: This is a feature, not a bug!

Implications
Forces us to pay attention to the temporal aspects of module behavior:
· Communication times
· Module temporal dynamics
Note: The details of the spatial arrangement of data affect communication times. This is consistent with cortical neuroscience.
Implication: We can “program” the array by manipulating these “analog” properties to control array behavior.
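
To make the dependence of communication time on spatial arrangement concrete, here is a minimal sketch. It assumes a 2-dimensional array in which each module talks only to its four nearest neighbours and data moves one module per time step; the function name `hops` and the example coordinates are hypothetical.

```python
def hops(source, target):
    """Time steps needed to move data between two modules on the array.

    Assumes 4-neighbour local links and one link traversed per step,
    so the delay is the Manhattan distance between the two positions.
    """
    (row_a, col_a), (row_b, col_b) = source, target
    return abs(row_a - row_b) + abs(col_a - col_b)


# The same two data items placed close together or far apart on the array.
near_delay = hops((0, 0), (0, 1))
far_delay = hops((0, 0), (30, 40))

print(near_delay, far_delay)
```

Under these assumptions, items that must interact quickly should be placed near each other, which is one sense in which the layout of data acts as part of the program.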

Ersatz Programming Peculiarities
How do you make this “computer” compute? Not with logic! It is like a hybrid analog-digital computer.
Programming Techniques:
· Spatial arrangement of data on array
· Integration of data from multiple sources
· Abstraction and discrete concept formation
· Control of computation using (analog) dynamical system parameters
· Assemblies of interacting modules.
We give one example: performance of arithmetic by a simple Ersatz-like system.

Ersatz Arithmetic

Cognitive Computation: Example - Arithmetic
Brains and computers are very different in the way they do things, largely because the underlying hardware is so different. Consider a computational task that humans and computers do frequently, but by different means: Learning simple arithmetic facts

Learning the “Right Thing”
Cognition is not memory for facts (like computer data) but remembering the “right things” even if the right things are constructed from many experiences and don’t actually exist! Most (99.9%) sensory input data is discarded. (The essential process of “creative data destruction.”) What is kept are useful abstractions and transformation of the inputs.

Arithmetic
Digital computers compute the answers to problems using well-known logic based algorithms. Humans do it very differently. The human algorithm for elementary multiplication facts seems to look like: find a number that is
1. The answer to some multiplication problem (a product number) and
2. About the right size.
This is a process involving memory and estimation, not computation as traditionally understood. Next, develop advantages and disadvantages of doing it this way.

A Problem with Arithmetic
We often congratulate ourselves on the powers of the human mind. But why does this amazing structure have such trouble learning elementary arithmetic? Adults doing arithmetic are slow and make many errors. Learning the times tables takes children several years and they find it hard.

Brain Software: John von Neumann
Von Neumann: 1958, The Computer and the Brain The nervous system is a complex machine which manages to do its exceedingly complex work on a rather low level of precision. Von Neumann, as a numerical analyst, knew that errors would rapidly grow and the result would be meaningless if there were more than a few steps in the computation.

Computational Strategy
Ways to avoid problem: Use a small number of steps Use discrete (“logic-like”) operations rather than hard (“analog”) operations. Engineering rule: Digital is easy, analog is hard. Von Neumann: … Whatever language the central nervous system is using is characterized by less logical and arithmetical depth than we are normally used to. A small number of powerful operations are strung together to form a mental computation.

Teaching of Mathematics
Collaborators: Prof. Kathryn Spoehr, Dr. Susan Viscuso, and Dr. David Bennett My own interest goes back to a joint paper with Prof. Phil Davis of Brown Applied Mathematics. Point of the paper: The “Theorem-Proof” method of teaching mathematics has ruined mathematics in the 20th Century.

Reason for Ruination Real mathematicians do not think this way.
Mathematicians use a complex blend of intuition, perception, and memory to understand complex systems. Proving theorems is the last stage, to convince others that you are correct. The effects are very hard on consumers of mathematics: engineers and scientists. They say, “I don’t think like this,” and lose confidence in their intuitions.

Why is Arithmetic so Hard?
People are much worse than they should be at elementary arithmetic. Elementary arithmetic fact learning involves making the right associative links between pairs of the 10 digits to give products, sums, etc. Only a few hundred facts to learn ... Arithmetic rules are orders of magnitude less complicated than syntax in language. But: Takes years for children to learn arithmetic.

The Problem with Arithmetic
At the same time children are having trouble learning arithmetic they are knowledge sponges, learning
· Several new words a day.
· Social customs.
· Many facts in other areas.

Association In structure, arithmetic facts are simple associations.
Example: multiplication: (Multiplicand)(Multiplicand) → Product
Simple association (S-R learning) was a popular idea in the 1920s (Thorndike). Formation of arbitrary associations is the basic rationale behind flash cards. One can learn this way, but it is hard and not really with “understanding.”

Multiplication Arithmetic facts are not arbitrary associations.
They have an ambiguous structure that gives rise to associative interference. 4 x 3 = 12 4 x 4 = 16 4 x 5 = 20 Initial ‘4’ has associations with many possible products. Ambiguity causes difficulties for simple associative systems.

Number Magnitude One way to cope with ambiguity is to embed the fact in a larger context. Numbers are much more than arbitrary abstract patterns. Experiment: Which is greater? 17 or 85 Which is greater? 73 or 74

Response Time Data

Number Magnitude It takes much longer to compare 74 and 73.
When a “distance” intrudes into what should be an abstract relationship it is called a symbolic distance effect. A computer would be unlikely to show such an effect. (Subtract numbers, look at sign.)

Magnitude Coding
Key observation: We see a similar effect when sensory magnitudes are being compared. Deciding which of
· two weights is heavier,
· two lights is brighter,
· two sounds is louder,
· two numbers is bigger
displays the same reaction time pattern.

Magnitude Coding This effect and many others suggest that we have an internal representation of number that acts like a sensory magnitude. Conclusion: Instead of number being an abstract symbol, humans use a much richer coding of number containing powerful sensory and perceptual components.

Magnitude Coding Argue that this “perceptual” elaboration of number is a good thing. It Connects abstract “number” to the physical world. Provides the basis for mathematical intuition. Is perhaps responsible for the creative aspects of mathematics.

Mathematics by Adults Mathematics is the most lawful and abstract of the sciences. Real mathematicians would not crudely associate a number with a weight? Would they? In fact, they do. Consider Jacques Hadamard’s book The Psychology of Invention in the Mathematical Field. (1946)

How Experts do Mathematics
Hadamard (a world class mathematician) interviewed his peers. Conclusion: Most of them did not reason abstractly. They used
· Visualization
· Auditory imagery
· Kinesthetic imagery with imagined muscle movements
for insights into “abstract” systems. Language and formal abstract reasoning were conspicuous by their rarity.

Quotes:
“The mental pictures of the mathematicians whose answers I have received are most frequently visual, but they may also be of another kind – for example, kinetic. There can be auditive ones.”
“… practically all of (them) avoided not only the use of mental words but also the mental use of any algebraic or any precise signs … they use vague images. There are two or three exceptional cases, the most important of which is the mathematician George D Birkhoff, one of the greatest in the world, who is accustomed to visualize algebraic symbols and work with them mentally …”
Hadamard

Einstein One of Hadamard’s informants was Einstein.
“The words or the language as they are written or spoken do not seem to play any role in my mechanism of thought.” (Albert Einstein)
To Einstein, thinking involves transforming received sense images into a series of “memory pictures.” Thinking began when he found a certain picture recurring in a number of series. “… such an element becomes a concept.”

Einstein These concepts are not words but can become linked to words.
“It is by no means necessary that a concept must be connected with a sensorily cognizable and reproducible sign (a word) but when this is the case thinking becomes by means of that fact communicable.” (Albert Einstein, Autobiographical Notes.)
Therefore, the function of words and concepts is to convince others, not necessarily yourself, who had understood the system through other means.

Richard Feynman Richard Feynman was a “kinesthetic” thinker:
Feynman said to Dyson … that Einstein’s great work had sprung from physical intuition and when Einstein stopped creating it was because ‘he stopped thinking in concrete physical images and became a manipulator of equations.’ Intuition was not just visual but also auditory and kinesthetic. Those who watched Feynman in moments of intense concentration came away with a strong, even disturbing sense of the physicality of the process, as though his brain did not stop at the gray matter but extended through every muscle in his body. A Cornell dormitory neighbor opened Feynman’s door to find him rolling about on the floor beside his bed as he worked on a problem.
(James Gleick, Genius: The Life and Science of Richard Feynman)

Non-Verbal Science
Among the virtuosos of intuitive (non-verbal) science are physicists with their “gedanken experiments.” At the age of 16 Einstein performed a powerful visual thought experiment. He assumed an observer was moving alongside an electromagnetic wave. Think of a boat moving at the same speed and in the same direction as an ocean wave.

Waves: Water and Electro-Magnetic
See a stationary hill of water. See a stationary electro-magnetic field?

Waves Water wave: See a stationary hill of water.
If you traveled with the same speed and direction as an electromagnetic wave, you would see a motionless spatially varying electric and magnetic field. Einstein knew this had been looked for and never found.

Relativity
Perhaps we did not see this because it was impossible for an observer to travel at the same velocity as an electromagnetic wave. Results of this insight:
“… a paradox upon which I had already hit at the age of 16: if I pursue a beam of light with the velocity c … I should observe such a beam as a spatially oscillatory electromagnetic field at rest. However there seems to be no such thing. … One sees that in this paradox, the germ of the special relativity theory is already contained.”
Albert Einstein, Autobiographical Notes.

Visual Image of a Proof
Hadamard gives his own visual images of a proof. The proof is by contradiction.
Theorem: There is no largest prime number.
Suppose someone claims that P is the largest prime. Form the product of all the prime numbers up to P, forming a large number, N. Add one to N, giving N+1. Given this construction, all the primes up to P must give a remainder of 1 when they divide N+1.
Previously shown: All integers greater than 1 are primes or the product of primes.
Therefore, either (1) the number N+1 itself is prime or (2) it is the product of two or more primes, each larger than any in the sequence of known primes that formed N. Either way there is a prime larger than P, contradicting the claim.

I consider all primes from 2 to 11, say 2,3,5,7,11
I consider all primes from 2 to 11, say 2,3,5,7,11. I see a confused mass. I form their product, 2x3x5x7x11. N being a rather large number I imagine a point far from the confused mass. I increase that product by 1, say N+1. I see a second point a little beyond the first. That number, if not a prime, must admit of a prime divisor. … I see a place somewhere between the confused mass and the first point.

Problems These images are supposed to be universal.
In fact: Hadamard’s image is wrong for the prime 11. With all primes up to P = 11, N+1 is 2,311, which is itself prime, so the “place” in the last image is identical to the “second point.” If the largest prime used is P = 13, N+1 = 30,031, which is the product of 59 and 509. P = 13 agrees with Hadamard’s image. A visual image can be misleading! We need formal proofs to check our intuitions.
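
The two cases can be checked mechanically. The sketch below uses plain trial division; the helper names `is_prime`, `smallest_prime_factor`, and `product_plus_one` are introduced here for illustration.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def smallest_prime_factor(n):
    """Smallest prime dividing n (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def product_plus_one(p):
    """Product of all primes up to p, plus one."""
    product = 1
    for q in range(2, p + 1):
        if is_prime(q):
            product *= q
    return product + 1


for p in (11, 13):
    value = product_plus_one(p)
    print(p, value, smallest_prime_factor(value))
```

For 11 the smallest prime factor is the number itself (2,311 is prime); for 13 it is 59, which lies between the primes used and N+1, as in Hadamard's image.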

Model Makes Small Mistakes, Not Big Ones
Model used a neural network based associative system. Buzz words: non-linear, associative, dynamical system, attractor network. The magnitude representation is built into the system by assuming there is a topographic map of magnitude somewhere in the brain.

First Observation about Arithmetic Errors
Arithmetic fact errors are not random. Errors tend to be close in size to the correct answer. In the simulations, this effect is due to the presence of the magnitude code.

Second Observation: Error Values
Values of incorrect answers are not random. They are product numbers, that is, the answer to some multiplication problem. Only 8% of errors are not the answer to a multiplication problem.

Human Algorithm for Multiplication
The answer to a multiplication problem is: 1. Familiar (a product) 2. About the right size.

Human Algorithm for Multiplication
Arithmetic fact learning is a memory and estimation process. It is not really a computation!
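
The "familiar and about the right size" description can be turned into a toy procedure: take a rough magnitude estimate and return the nearest number that appears somewhere in the times table. This is only a sketch of the idea, not the neural network model used in the simulations; the table range 1 to 9, the name `recall_product`, and the example estimates are assumptions.

```python
# Every number that is the answer to some single-digit multiplication problem.
PRODUCTS = sorted({a * b for a in range(1, 10) for b in range(1, 10)})


def recall_product(estimate):
    """Return the familiar product number closest in size to the estimate.

    Ties are broken in favour of the smaller product.
    """
    return min(PRODUCTS, key=lambda p: (abs(p - estimate), p))


# Two rough magnitude estimates for 7 x 8 (true answer 56).
print(recall_product(57))   # estimate a little too high
print(recall_product(53))   # estimate a little too low
```

With an estimate of 57 the procedure recalls 56; with an estimate of 53 it recalls 54, which is wrong but is still a product number close to the correct answer, the same kind of error described for human subjects.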

Flexible and programmable
Learning facts alone doesn’t get you far. The world never looks exactly like what you learned. Heraclitus (500 BC): It is not possible to step twice into the same river. A major goal of learning is to apply past learning to new situations.

Getting Correct What you Never Learned: Comparisons
Consider number comparisons: Is 7 bigger than 9? We can be sure that children do not learn number comparisons individually. There are too many of them. About 100 single digit comparisons About 10,000 two-digit comparisons And so on.

Building a System to Perform Simple Arithmetic Operations
We have a model for arithmetic learning. Can we now make a system capable of performing some simple mathematical operations on numbers? Techniques we can use include attractor networks, differential weighting of portions of an array of units, and a specialized data representation for number. Examples of simple operations are increment, decrement, greater than, less than, round off. The current version is restricted to the digits from 1 to 10.

Bar Codes A bar code represents magnitude by position on a map.
There are ten patterns for the digits from 1 to 10. The patterns for each digit overlap slightly.
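
A bar code of this kind is easy to sketch. The slides do not give the map size or bar width, so the numbers below (bars 6 units wide whose starts are 4 units apart, on a 42-unit map) are assumptions chosen so that only neighbouring integers overlap.

```python
STEP = 4     # assumed spacing between the starts of neighbouring bars
WIDTH = 6    # assumed bar width; WIDTH > STEP makes neighbours overlap
MAP_SIZE = 9 * STEP + WIDTH   # units needed for the integers 1 to 10


def bar_code(digit):
    """Return the map activity (0 or 1 per unit) coding an integer 1..10."""
    if not 1 <= digit <= 10:
        raise ValueError("this sketch only codes the integers 1 to 10")
    start = (digit - 1) * STEP
    return [1 if start <= k < start + WIDTH else 0 for k in range(MAP_SIZE)]


def overlap(d1, d2):
    """Number of map units shared by the bars for two integers."""
    return sum(a * b for a, b in zip(bar_code(d1), bar_code(d2)))


print("".join(str(u) for u in bar_code(3)))
print("".join(str(u) for u in bar_code(4)))
print(overlap(3, 4), overlap(3, 5))
```

Position along the map carries magnitude, and the shared units make neighbouring integers look slightly alike to any system that compares patterns.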

Number Representation
Bar coding for number resembles the number line: (10)

Programming Patterns The number map is weighted by programming patterns. One pattern is used for each operation. The pattern(s) for number and for operation multiply. System dynamics gives the final answer.

Basic Arithmetic Operations
· Count up (starting number + 1)
· Count down (starting number – 1)
· Greater than: Given two digits, output the larger.
· Less than: Given two digits, output the smaller.
· Round off: Given activity at a location on the array, output the nearest integer.

Programming Pattern: Count Up/Down
Count up (starting number + 1) Count down (starting number – 1) (mirror image of count up).

Programming Pattern: Greater Than/Less Than
Greater than: Given two digits, output the larger. Less than: Given two digits, output the smaller (mirror image of the “Greater than” pattern).
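
One way to picture a programming pattern multiplying the number map is sketched below. Everything specific here is an assumption: the bar-code layout (6-unit bars, 4 units apart), a linear ramp as the "greater than" pattern, its mirror image as the "less than" pattern, and a best-match rule standing in for the attractor dynamics that produce the final answer.

```python
STEP, WIDTH = 4, 6                 # assumed bar-code layout
MAP_SIZE = 9 * STEP + WIDTH


def bar_code(digit):
    """Map activity (0 or 1 per unit) coding an integer from 1 to 10."""
    start = (digit - 1) * STEP
    return [1 if start <= k < start + WIDTH else 0 for k in range(MAP_SIZE)]


# Assumed programming patterns: a ramp favouring large magnitudes,
# and its mirror image favouring small magnitudes.
GREATER_THAN = [k + 1 for k in range(MAP_SIZE)]
LESS_THAN = GREATER_THAN[::-1]


def apply_pattern(a, b, pattern):
    """Weight the combined bar codes by a pattern and return the best match."""
    activity = [x + y for x, y in zip(bar_code(a), bar_code(b))]
    weighted = [act * w for act, w in zip(activity, pattern)]

    def match(digit):
        # Stand-in for attractor dynamics: overlap with a stored bar code.
        return sum(w * u for w, u in zip(weighted, bar_code(digit)))

    return max(range(1, 11), key=match)


def greater_than(a, b):
    return apply_pattern(a, b, GREATER_THAN)


def less_than(a, b):
    return apply_pattern(a, b, LESS_THAN)


print(greater_than(7, 9), less_than(7, 9))
```

No comparison facts are stored: the same map and the same two weighting patterns handle every pair of integers from 1 to 10.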

Programming Pattern: Round-Off
Round off: Given activity at a location on the array, output the nearest integer.

Manipulating Starting Point
We are manipulating the starting point in the attractor structure. Once the attractor structure is formed many operations can be performed without further learning. Operations are not “logical” but based on continuous mathematics. This might be considered a very simple kind of mathematical intuition.

Experimental Data: Single Digit Number Comparisons
Assume something like experimental reaction time is related to the time taken to get the answer. The Greater-Than operation shows a “symbolic distance” effect just like humans do.

Physiological Evidence
Bars overlap. Integers close in magnitude show a degree of similarity in their representations. A 2002 paper in Science showed this effect in single unit recordings in primate prefrontal cortex. Note the similarity to the symbolic distance curves.
A Nieder, DJ Freedman, EK Miller (2002). Representation of the quantity of visual items in the primate prefrontal cortex. Science 297, 1708.

Numerosity
A problem joining ‘abstract’ quantities with pattern recognition. Given a set of identical items presented in a field, report how many items there are.

Subitizing
For humans, numbers from one to about four fall in what is called the subitizing region. Subjects “know” quickly how many objects are present. Each additional item (up to 4) adds about 40 msec to the response time. In the counting region (more than 4 objects) each additional item adds around 300 msec. This figure is consistent with explicit counting. There is evidence of a strong “total activity” component to subitizing.

Number Estimation with Lateral Information Flow
The network of networks model propagates pattern information laterally. Total maximum activity gives numerosity.

Which Plate has the Most Cookies?
Segment the field by using boundary modules in attractor states. (No lateral transmission.) (Lateral interactions can be halted by interposing lines or regions.) Metacontrast

Counting Cookies
Program Counting Cookies:
1. The image is segmented.
2. The numerosity of objects in each segment is computed using activity based lateral spread.
3. The activity measure is converted into an integer by the round-off operation.
4. Integers are compared using the greater-than operator, and the largest integer is the output.
This very simple program is based on topographic and dynamic representational assumptions. Not just a toy problem: It can let you estimate the number of similar or identical objects with a largely parallel and selective algorithm.
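
The last two stages of the cookie-counting program can be sketched as ordinary code. Segmentation and lateral spread are not modelled here: the sketch simply assumes each segment (plate) has already produced one activity measure, scaled so that one object contributes about one unit. The names `round_off` and `most_cookies` and the example activities are hypothetical, and Python's `min` and `max` stand in for the round-off and greater-than operations of the array.

```python
def round_off(activity):
    """Nearest integer from 1 to 10 (ties go to the smaller integer)."""
    return min(range(1, 11), key=lambda d: (abs(d - activity), d))


def most_cookies(segment_activities):
    """Return (index of the fullest segment, its estimated object count)."""
    counts = [round_off(a) for a in segment_activities]
    best = max(range(len(counts)), key=lambda i: counts[i])
    return best, counts[best]


# Assumed activity measures for three plates after lateral spread.
plates = [2.8, 4.1, 3.3]
winner, count = most_cookies(plates)
print(winner, count)
```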

Magnitude We now see the usefulness of the “sensory” magnitude number representation. We can use magnitude to do computations like number comparisons without having to learn special cases.

Implications
We have constructed a system that acts like logic or symbol processing in a limited domain. It does so by using its connection to perception to do much of the computation. These “abstract” or “symbolic” operations display their underlying perceptual nature in effects like symbolic distance and error patterns in arithmetic.

Connect perception to abstraction and gain the power of each approach
Humans are a hybrid computer. We have a recently evolved, rather buggy ability to handle abstract quantities and symbols. (Only 100,000 years old. We have the alpha release of the intelligence software.)

Connect perception to abstraction and gain the power of each approach
We combine symbol processing with highly evolved, extremely effective sensory and perceptual systems, realized in a mammalian neocortex. (Over 500 million years old. We have a late release, high version number of the perceptual software.) The two systems cooperate and work together effectively.

Conclusions A hybrid strategy is biological:
Let a new system complement an old one. Never throw anything away. Even a little abstract processing goes a long way. Perhaps that is one reason why our species has been so successful so fast.
