Published by Adela Welch. Modified about 1 year ago.

Continuous States and Distributed Symbols: Toward a Biological Theory of Computation

Simon D. Levy
Computer Science Department
Washington and Lee University
Lexington, VA

Introduction

The classical theory of computation rests on two fundamental assumptions: states are finite, and symbols are atomic. Although automata built on these assumptions are extremely successful at solving many computational tasks, the assumptions are highly implausible for human and animal cognition. First, the signals used by the brain and other biological systems are mainly continuous, as evidenced by the widespread use of differential equations in modeling these systems. For this reason, it makes little sense to view mental states as countable, let alone finite. Second, there is very little reason to believe that mental representations involve locally stored atomic symbols. Consequently, classical pointer-based discrete structures over such symbols, and algorithms operating on such structures, are not biologically realistic. Experimental evidence instead favors a view in which the representations of entities, concepts, relations, etc., are distributed over a large number of individually meaningless elements in a way that supports similarity metrics and content-based retrieval.

Although both continuous-state computation and distributed representations have received a fair amount of research attention, it is uncommon to see them discussed together in the unconventional-computation literature (except, perhaps, as part of a general survey). In our presentation we argue that a biologically plausible theory of computation will require both a continuous-state automaton component and a distributed-memory component, much as a classical pushdown automaton uses both a finite-state automaton and a pushdown stack.

We show further that stack-like operations (PUSH and POP) over distributed representations can be performed as simple vector addition and scalar multiplication, in a way reminiscent of foreground/background effects in visual processing. This possibility suggests that "higher" mental functions like language and abstract thought might be exploiting existing neural circuitry already available for other purposes. We conclude with a simple visual example and some speculation about possible new directions and guiding principles for biologically inspired unconventional computation.

Dynamical Automata

"If we want to imitate human memory with models, we must take account of the weaknesses of the nervous system as well as its powers." D. Gabor [1]

Figure 1. A neural network implementing a dynamical automaton that accepts the language of balanced parentheses. The node labeled z holds a state variable whose initial value is 1. Nodes labeled Π compute the products of their inputs; the node labeled Σ computes their sum, weighted by the values on the edges. (Unlabeled edges have a weight of 1.0.) Node f is the modulo 2 function. The dashed line represents a feedback connection that copies f's output back to z at each input step. Node g is a threshold function that outputs 1 (accept) when its input goes above 0.75, and 0 (reject) otherwise. The left-parenthesis symbol is encoded as the vector [1 0]^T, and the right parenthesis as [0 1]^T.

Dynamical automata [2, 3, 4] are neurally inspired models designed specifically to behave like pushdown automata (PDA) recognizing infinite languages. The backpropagation algorithm [5] can be used to train dynamical automata to recognize different languages [2], or the automata can be hard-wired to recognize specific languages [4]. Either way, their functioning rests on a simple principle: given sufficient numerical precision, the state of the machine can be represented as a single number or a set of numbers. Push and pop operations can then be represented as multiplication by various constants, implemented as weighted connections in a recurrent neural network.

For example, Figure 1 illustrates a dynamical automaton that recognizes the language of balanced parentheses. This automaton starts with an empty-stack value of 1, multiplies by 0.5 on seeing a left parenthesis, and multiplies by 2 (modulo 2) on seeing a right parenthesis [2]. Input is accepted when the stack value gets sufficiently close to 1 (e.g., > 0.75). Reading (()), for instance, takes the state from 1 to 0.5, 0.25, 0.5, and back to 1 (accept), whereas an unmatched right parenthesis takes a state of 1 to 2 mod 2 = 0, from which no later multiplication can recover (reject). The stack and discrete state space of the classical PDA are thus replaced by a continuous state space having a fractal property [6]. Dynamical automata therefore provide a principle by which "infinite-state" recursion can be performed in a neural-like architecture.

Tensor Products and Related Models

Classical computation relies crucially on the ability to bind values (e.g., 3.14) to variables (x) and roles (AGENT/PATIENT) to fillers (JOHN/MARY), and to represent such bindings in a way that supports recursion. Tensor-product models are an effort to deal explicitly with role/filler, variable/value, and other binding tasks in a distributed representation. They represent fillers (and roles) as vectors of values, supporting distributed representations of arbitrary size. In the simplest formulation [7], roles are vectors of the same length as their fillers. Binding is implemented by taking the tensor (outer) product of a role vector and a filler vector, resulting in a mathematical object (a matrix) having one more dimension than the filler. Given vectors of sufficient length, each role/filler matrix will be unique. As shown in Figure 2, another crucial property of such representations is that role/filler bindings can be bundled into more complex structures through simple element-wise addition. This capability opens the door to recursion, allowing entire bundles of structure (John loves Mary) to fill roles (Bill thinks John loves Mary).

Figure 2. Binding and bundling with tensor products to represent the proposition John loves Mary. Roles are column vectors, fillers row vectors. Black squares represent nonzero values, white squares zero values.

Crucially, the original fillers of all roles must be recoverable from such bundled representations; i.e., there must be an unbinding operation as well. Figure 3 shows an example of unbinding, in which the transposed role vector LOVEE is multiplied by the bundled tensor product for John loves Mary, producing a noisy version of the original vector for the filler MARY. The noise arises because the probe also picks up every other binding in the bundle, weighted by the dot product of its role with LOVEE, which is small but not zero for random vectors. The unbinding process therefore employs a "cleanup memory" to restore the fillers to their original form. The cleanup memory can be implemented using neurally plausible Hebbian auto-association, as in a Hopfield network [8]. In such networks the original fillers are attractor basins in the network's dynamical state space.

Figure 3. Unbinding the tensor representation of John loves Mary by probing with the LOVEE role produces a noisy version of MARY.

Because each binding operation adds a dimension to the result, tensor products grow exponentially as more recursive embedding is performed: binding two length-N vectors yields N² elements, and binding that result to a further role yields N³. The solution is to collapse the bound N × N role/filler matrix back into a length-N vector. As shown in Figure 4, there are two ways of doing this. In Binary Spatter Codes (BSC [9]), only the elements along the main diagonal are kept, and the rest are discarded. In Holographic Reduced Representations (HRR [10]), the sum of each diagonal is taken, with wraparound (circular convolution) keeping the length of all diagonals equal. Both approaches use very large (> 1000-element) vectors of random values, drawn from a fixed set (bits, for BSC) or a fixed distribution (zero-mean Gaussian values, for HRR). Despite the size of the representations, both approaches are computationally efficient: they require no backpropagation or other costly iterative algorithm, and they can be computed in parallel.
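
The binding, bundling, unbinding, and cleanup steps above, along with the two fixed-length reductions, can be illustrated with a short NumPy sketch for John loves Mary. It is a minimal example under stated assumptions, not the poster's implementation: the role and filler names, the vector length N = 1024, the Gaussian elements with variance 1/N, and the hypothetical `cleanup` helper (nearest-neighbour search by cosine similarity, standing in for a Hopfield cleanup memory) are all illustrative choices. The BSC-style variant uses ±1 vectors, for which keeping only the main diagonal of the outer product amounts to an element-wise product (XOR on the underlying bits); it bundles by plain addition rather than by the majority vote usually used with spatter codes.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024  # "very large" vectors; the poster suggests more than 1000 elements


def rand_vec():
    # Assumed element distribution: i.i.d. zero-mean Gaussian, variance 1/N.
    return rng.normal(0.0, 1.0 / np.sqrt(N), N)


ROLES = {name: rand_vec() for name in ("LOVER", "LOVEE")}
FILLERS = {name: rand_vec() for name in ("JOHN", "MARY", "BILL")}


def cleanup(noisy, codebook):
    """Return the name of the codebook vector most similar (cosine) to `noisy`.
    Hypothetical stand-in for a Hopfield-style cleanup memory."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(codebook, key=lambda name: cos(noisy, codebook[name]))


# Full tensor product: role (column) times filler (row) gives an N x N matrix.
def tp_bind(role, filler):
    return np.outer(role, filler)

def tp_unbind(bundle, role):
    return role @ bundle  # transposed role vector times the bundle -> noisy filler


# HRR: circular convolution, computed with the FFT, keeps length N.
def hrr_bind(role, filler):
    return np.fft.irfft(np.fft.rfft(role) * np.fft.rfft(filler), n=N)

def hrr_unbind(bundle, role):
    # Circular correlation with the role: an approximate inverse of binding.
    return np.fft.irfft(np.conj(np.fft.rfft(role)) * np.fft.rfft(bundle), n=N)


# BSC-style: with +/-1 vectors the main diagonal of the outer product is the
# element-wise product (XOR on bits), which is its own inverse.
def bsc_bind(role, filler):
    return role * filler

bsc_unbind = bsc_bind


def john_loves_mary(bind, roles, fillers):
    # Bundling is element-wise addition of the two role/filler bindings.
    return (bind(roles["LOVER"], fillers["JOHN"])
            + bind(roles["LOVEE"], fillers["MARY"]))


BIP_ROLES = {k: np.sign(v) for k, v in ROLES.items()}
BIP_FILLERS = {k: np.sign(v) for k, v in FILLERS.items()}

tp = john_loves_mary(tp_bind, ROLES, FILLERS)
hrr = john_loves_mary(hrr_bind, ROLES, FILLERS)
bsc = john_loves_mary(bsc_bind, BIP_ROLES, BIP_FILLERS)

print(tp.shape, hrr.shape, bsc.shape)  # (1024, 1024) (1024,) (1024,)
print(cleanup(tp_unbind(tp, ROLES["LOVEE"]), FILLERS))            # MARY
print(cleanup(hrr_unbind(hrr, ROLES["LOVEE"]), FILLERS))          # MARY
print(cleanup(bsc_unbind(bsc, BIP_ROLES["LOVEE"]), BIP_FILLERS))  # MARY
```

The full tensor-product bundle occupies a 1024 × 1024 matrix, whereas the HRR and BSC bundles stay at length 1024; the price is a noisier unbinding result, which the cleanup step absorbs.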

Even in a serial implementation, the BSC approach is O(N) for a vector of length N, and the HRR approach can be implemented using the Fast Fourier Transform, which is O(N log N). The price paid is that the binding operation becomes a form of lossy compression, collapsing N² pieces of information down to N. As with the noise introduced by the unbinding operation, this noise can be dealt with by a cleanup memory.

Figure 4. Two methods for maintaining fixed dimensionality in tensor-product representations. Holographic Reduced Representation sums along the main diagonal and the off-diagonals. Binary Spatter Code uses only the elements on the main diagonal.

Stacking Distributed Representations

Consistent with an approach driven by neural plausibility, we would like a way to represent the classical automata-theoretic operations (PUSH, POP) in a neurally plausible algorithm. One way to do this is to modify the Hopfield network learning equation by attaching a strength coefficient to each vector to be learned:

$$w_{ij} = \sum_{k=1}^{p} \lambda_k \,(2x_i^k - 1)(2x_j^k - 1), \qquad i \neq j$$

Here, $w_{ij}$ is the weight to be learned between the $i$th and $j$th elements of the bit vectors $x^k$; $p$ is the number of such patterns to be learned, and $\lambda_k$ is the strength coefficient assigned to pattern $x^k$. By increasing or decreasing $\lambda_k$ over the sequence of patterns, stack-like or queue-like behavior can easily be implemented. As with a standard Hopfield network, the top pattern on the stack (or queue) can then be recovered by iterating the following equation until its output converges:

$$s_i = \sum_{j} w_{ij} u_j$$

where $u_i = 1$ initially; thereafter $u_i = 0$ if $s_i < 0$, and $u_i = 1$ otherwise. Popping the stack then corresponds to running the Hopfield unlearning algorithm on this recovered image $u$, using the additive inverse of the coefficient $\lambda_k$ associated with learning it:

$$w_{ij} \leftarrow w_{ij} - \lambda_k \,(2u_i - 1)(2u_j - 1), \qquad i \neq j$$

Because it is difficult to make visual sense of random vectors, Figure 5 illustrates the use of this "Hopfield Stack" network to encode images: specifically, 70 × 50-pixel images of three Civil War generals.
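
The PUSH, recall, and POP steps just described can be tried on toy patterns before scaling up to the 3500-bit images of Figure 5. The following is a minimal NumPy sketch under stated assumptions, not the poster's code: the class name `HopfieldStack`, the 8-bit example patterns, synchronous updates, and the weight rule $w_{ij} = \sum_k \lambda_k (2x_i^k - 1)(2x_j^k - 1)$ with zero self-connections (Hopfield's rule for 0/1 units, scaled by $\lambda_k$) are illustrative choices. Recall starts from $u_i = 1$ and thresholds $s_i = \sum_j w_{ij} u_j$ at zero; POP unlearns the recovered pattern with $-\lambda_k$.

```python
import numpy as np


class HopfieldStack:
    """Sketch of a "Hopfield stack": every pushed 0/1 pattern is stored in one
    weight matrix, scaled by a strength coefficient lambda."""

    def __init__(self, n):
        self.w = np.zeros((n, n))
        self.lambdas = []  # coefficients in push order (increasing for a stack)

    @staticmethod
    def _hebb(bits, lam):
        # lam * (2x_i - 1)(2x_j - 1), with no self-connections (zero diagonal).
        v = 2.0 * np.asarray(bits, dtype=float) - 1.0
        term = lam * np.outer(v, v)
        np.fill_diagonal(term, 0.0)
        return term

    def push(self, bits, lam):
        self.w += self._hebb(bits, lam)
        self.lambdas.append(lam)

    def top(self, max_iter=100):
        u = np.ones(len(self.w))               # u_i = 1 initially
        for _ in range(max_iter):
            s = self.w @ u                     # s_i = sum_j w_ij u_j
            new_u = np.where(s < 0, 0.0, 1.0)  # u_i = 0 if s_i < 0, else 1
            if np.array_equal(new_u, u):       # converged
                break
            u = new_u
        return u.astype(int)

    def pop(self):
        # Recover the top pattern, then unlearn it with the additive inverse
        # of the coefficient it was pushed with.
        bits = self.top()
        self.w -= self._hebb(bits, self.lambdas.pop())
        return bits


# Toy 8-bit patterns, pushed with lambda = 1.0, 2.0, 3.0 as in Figure 5.
x1 = [1, 1, 0, 0, 0, 1, 1, 1]
x2 = [0, 0, 1, 1, 1, 1, 1, 1]
x3 = [1, 1, 1, 1, 1, 1, 1, 0]
stack = HopfieldStack(8)
for bits, lam in ((x1, 1.0), (x2, 2.0), (x3, 3.0)):
    stack.push(bits, lam)
print(stack.pop().tolist())  # [1, 1, 1, 1, 1, 1, 1, 0]
print(stack.pop().tolist())  # [0, 0, 1, 1, 1, 1, 1, 1]
print(stack.pop().tolist())  # [1, 1, 0, 0, 0, 1, 1, 1]
```

The toy patterns were chosen so that recall from the all-ones start state lands on the current top of the stack at every step; for arbitrary patterns that start state is not guaranteed to converge to the top pattern. Keeping the coefficients in a list ensures that POP unlearns with the same $\lambda_k$ that PUSH used.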

Patterns are represented as bit vectors of length 3500, with a 3500 × 3500 matrix of weights implementing the stack. Contrast the stack visualization at right with a classical stack, in which only the top (last-in) object would be "visible".

Figure 5. A "Hopfield Stack" for images. The image at right shows the result of pushing the first three images in the order given, with $\lambda_1 = 1.0$, $\lambda_2 = 2.0$, $\lambda_3 = 3.0$.

Conclusions and Future Work

This presentation has shown concrete ways in which biologically motivated representations can be used to perform some of the critical operations associated with classical computation, namely tracking state and maintaining a stack or queue of recursively decomposable symbols. This approach contrasts favorably with localist representations, which merely implement classical computation with a "one neuron per symbol" approach or ignore the issue of symbol content entirely. The next obvious step would be to build a model integrating the state and stack components in a way that supports learning grammar-like mappings between recursively structured meanings and symbol sequences, for parsing and generating language. Such a model could represent a significant step toward overcoming the unnaturalness of conventional computing approaches to cognition, AI, and related fields.

References

[1] Gabor, D.: Improved holographic model of temporal recall. Nature 217 (1968)
[2] Pollack, J.: The induction of dynamical recognizers. Machine Learning 7 (1991)
[3] Moore, C.: Dynamical recognizers: Real-time language recognition by analog computers. Theoretical Computer Science 201 (1998)
[4] Tabor, W.: Dynamical automata. Technical Report Tr , Computer Science Department, Cornell University (1998)
[5] Rumelhart, D., Hinton, G., Williams, R.: Learning internal representations by error propagation. In Rumelhart, D., McClelland, J., eds.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1. MIT Press (1986)
[6] Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman and Company (1988)
[7] Smolensky, P.: Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46 (1990) 159–216
[8] Hopfield, J.: Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (1982) 2554–2558
[9] Kanerva, P.: The binary spatter code for encoding concepts at many levels. In Marinaro, M., Morasso, P., eds.: ICANN '94: Proceedings of International Conference on Artificial Neural Networks. Volume 1. Springer-Verlag, London (1994) 226–229
[10] Plate, T.A.: Holographic Reduced Representation: Distributed Representation for Cognitive Science. CSLI Publications (2003)
