Representing Representers and What They Represent

(Note change to less pretentious and more accessible title.)
Kathryn Blackmond Laskey
George Mason University
Department of Systems Engineering and Operations Research
Krasnow Institute
GMU QMind II

This talk is dedicated to the memory of journalist Danny Pearl, murdered in Pakistan in February 2002, and to the pioneering research of his father Judea Pearl. Judea Pearl’s research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents. Judea Pearl has been an inspiration to me throughout my career. Much of my work builds directly on his accomplishments. I also consider him a friend. Although prior to his abduction Danny Pearl did not consider himself a religious person, he was chosen to be murdered because of his Jewish heritage. The Jewish tradition has no official position on an afterlife, but there is a strong sense of our connection with previous generations and a belief that people live on through their impact on the lives of others. If through this dedication I am able to inspire people to apply Judea’s work in sound, principled, and effective ways to increase our ability to anticipate, detect, prevent, and respond to terrorist incidents, then I and those who are inspired by my words will help to ensure that Danny Pearl did not die in vain.

Representation
A representation consists of: a representing system; a represented system; and a mapping between the representing system and the represented system, under which important properties of the represented system correspond to features in the representation.
A conscious organism represents its environment, and possibly itself, to itself, and uses its representations to engage in adaptive behavior with respect to its environment: it senses, recognizes, plans, and acts.

[Figure: loop linking Real World, Observations, Representation, and Actions]

Science and Representation
Elements of a representation: the reality to represent; a space of possible representations of reality; a correspondence between aspects of reality and features in representation space.
Important considerations: By whom is the representation being used? For what purpose? How do we measure how good it is?
Scientists study a phenomenon by building a representation of the phenomenon, manipulating the representation, and comparing non-obvious features of the representation to corresponding features in reality. How do we study representation?

Representing Representation
[Figure: Observations and Actions; panels "Real world with real representation created by real conscious subsystem" and "Artificial world with simulated representation created by simulated conscious subsystem"]

Physics, Representation and Learning
Cross-fertilization from physics to statistics and machine learning has produced rapid progress. A recipe for creating a good learning algorithm:
1. Represent the learning problem as a physical system in which “low action” or “low free energy” maps to a good representation.
2. Simulate the physical system on a computer.
3. Let the simulation evolve according to (simulated) laws of physics.
4. Presto! Out comes a good solution to your problem.
The opposite direction: can ideas from learning theory give insights for a physics of consciousness?
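One well-known instance of this recipe is simulated annealing: treat the quantity to be minimized as an energy, run Metropolis dynamics at a temperature that slowly decreases, and read off the lowest-energy state visited. The sketch below is illustrative only; the function names, cooling schedule, and the bumpy test landscape are assumptions of this example, not something specified in the talk.

```python
import math
import random

def anneal(energy, neighbor, start, t0=2.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing: Metropolis moves at a slowly decreasing temperature.

    Low energy plays the role of a good solution.
    """
    rng = random.Random(seed)
    state, e = start, energy(start)
    best_state, best_e = state, e
    t = t0
    for _ in range(steps):
        candidate = neighbor(state, rng)
        e_new = energy(candidate)
        # Metropolis rule: always accept downhill moves; accept uphill moves with
        # probability exp(-dE / T), which lets the walk escape local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            state, e = candidate, e_new
            if e < best_e:
                best_state, best_e = state, e
        t = max(t * cooling, 1e-6)  # geometric cooling; the floor avoids dividing by zero
    return best_state, best_e

# Hypothetical test problem: a bumpy 1-D landscape on the integers 0..199.
def bumpy(x):
    return 0.01 * (x - 150) ** 2 + 3 * math.sin(x)

def step(x, rng):
    return min(199, max(0, x + rng.choice([-3, -2, -1, 1, 2, 3])))

best_x, best_e = anneal(bumpy, step, start=0)
print(best_x, round(best_e, 2))
```

The intrinsic randomness in the acceptance rule is what lets the simulated system climb out of locally but not globally optimal states; as the temperature falls, the dynamics freeze into a low-energy configuration.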

Learners and Learnable Phenomena
Good learners: loosely coupled local learners; multi-resolution representations; a bias toward simple representations; composition of elements to form complex representations; appropriate adjustment to environmental feedback; intrinsic randomness to bump out of representations that are locally but not globally optimal.
Learnable systems: repeated structure; complexity built up out of simple pieces; not too much randomness.
A system capable of self-representation must be simple enough to exhibit learnable regularities, yet complex enough to form and evolve representations of itself.

20th Century Science
[Figure: Real World, Observations, Representation, and Actions, with a question mark]
Physical reality is described by a wave function whose deterministic evolution is punctuated by “jumps.” There is no consensus on how and why the “jumps” occur, or on how consciousness interacts with the physical world.

Stapp Theory of Consciousness
In Stapp’s theory, the timing of a reduction and the choice of operator are determined by conscious choice. Efficacious conscious choice enters where physics currently lacks a theory.
Comments on Stapp’s theory: Stapp does not demand that all state vector reductions involve conscious choice. Theory and experiment verify that the macroscopic evolution of a physical system can depend on the choice and timing of reductions. The experimentally verified quantum Zeno effect is one potential mechanism by which conscious choice might operate, and Stapp argues that the quantum Zeno effect can plausibly operate under conditions occurring in brains.
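The quantum Zeno effect itself can be illustrated with textbook physics. The sketch below assumes an idealized two-level system driven by the Hamiltonian H = (ω/2)σ_x and subjected to n equally spaced, instantaneous projective measurements; it says nothing about brains and is not a model from the talk. Each measurement that finds the system in |0⟩ resets it there, so frequent measurement holds the state in place.

```python
import math

def zeno_survival(omega, total_time, n_measurements):
    """Probability that a two-level system prepared in |0> is found in |0> at every one of
    n equally spaced, instantaneous projective measurements during total_time.

    Assumes H = (omega / 2) * sigma_x, so after a free-evolution interval tau the
    amplitude to remain in |0> is cos(omega * tau / 2); the Born rule gives its square.
    Each successful measurement resets the state to |0>, so the intervals multiply.
    """
    tau = total_time / n_measurements
    p_stay_one_interval = math.cos(omega * tau / 2) ** 2
    return p_stay_one_interval ** n_measurements

# omega * T = pi: left alone, the system would flip completely from |0> to |1>.
for n in (1, 10, 100, 1000):
    print(n, zeno_survival(math.pi, 1.0, n))
```

With a single measurement at the end the system is essentially never found in |0⟩, while the survival probability approaches 1 as the measurements become more frequent: the macroscopic outcome depends on the timing of the reductions.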

Paradigm Shift in Computing
Old paradigm: algorithms running on Turing machines; deterministic; based on Boolean logic.
New paradigm: an economy of software agents executing on a physical symbol system. Agents make decisions (deterministic or stochastic) to achieve objectives. The “program” is replaced by a dynamic system evolving better solutions. The paradigm is based on decision theory, game theory, and stochastic processes.
Hardware realizations of physical symbol systems: physical systems minimize action; decision-theoretic systems maximize utility (minimize loss); a hardware realization of a physical symbol system maps action to utility.
Programming languages are replaced by specification and interaction languages: the software designer specifies goals, rewards, and information flows.
A unified theory spans sub-symbolic to cognitive levels, and the old paradigm is a limiting case of the new paradigm.

Decision Graph: An Example
Maria is visiting a friend when she suddenly begins sneezing. “Oh dear, I’m getting a cold,” she thinks. “I had better not visit Grandma.” Then she notices scratches on the furniture. She sighs in relief. “I’m not getting a cold! It’s only my cat allergy acting up!”
This is plausible inference: the evidence for cat allergy “explains away” the sneezing, so a cold is no longer needed as an explanation.
Does Maria have a “grandmother neuron”?
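Maria’s reasoning can be reproduced with a small decision graph. The structure below (Cold and Allergic Reaction both causing Sneezing; a Cat causing both the Allergic Reaction and the Scratches), every probability, and the utilities for visiting Grandma are hypothetical choices made for this sketch; the talk does not give Maria’s numbers. Inference is done by brute-force enumeration of all 2^5 worlds, which is fine at this size.

```python
from itertools import product

# Hypothetical numbers for illustration only.
P_COLD = 0.1
P_CAT = 0.2
P_ALLERGY_GIVEN_CAT = {True: 0.8, False: 0.01}
P_SCRATCH_GIVEN_CAT = {True: 0.7, False: 0.05}
# Key: (cold, allergic reaction)
P_SNEEZE_GIVEN = {(True, True): 0.99, (True, False): 0.8,
                  (False, True): 0.9, (False, False): 0.02}

NAMES = ("cold", "cat", "allergy", "scratches", "sneeze")

def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(cold, cat, allergy, scratches, sneeze):
    """Joint probability, factored along the arcs of the network."""
    return (bern(P_COLD, cold) * bern(P_CAT, cat)
            * bern(P_ALLERGY_GIVEN_CAT[cat], allergy)
            * bern(P_SCRATCH_GIVEN_CAT[cat], scratches)
            * bern(P_SNEEZE_GIVEN[(cold, allergy)], sneeze))

def prob_cold(**evidence):
    """P(cold = True | evidence) by enumeration over all 2**5 worlds."""
    num = den = 0.0
    for values in product((True, False), repeat=5):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world inconsistent with the evidence
        p = joint(**world)
        den += p
        if world["cold"]:
            num += p
    return num / den

def best_decision(p_cold, u_visit_well=10.0, u_visit_sick=-50.0, u_stay=0.0):
    """Value node: choose the action with higher expected utility (hypothetical utilities)."""
    eu_visit = (1 - p_cold) * u_visit_well + p_cold * u_visit_sick
    return "visit Grandma" if eu_visit > u_stay else "stay away"

for ev in ({}, {"sneeze": True}, {"sneeze": True, "scratches": True}):
    p = prob_cold(**ev)
    print(ev, round(p, 3), best_decision(p))
```

With these numbers, the probability of a cold rises from 0.1 to about 0.355 once Maria sneezes, then falls back to about 0.152 once she also sees scratches, so the recommended action flips from “stay away” back to “visit Grandma.”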

What Happened Under the Hood?
A decision graph is both a knowledge representation and a computational architecture. It represents knowledge about variables and their interactions as modular elements with defined interconnections, and computation can exploit this loosely coupled structure for efficiency.
Parsimony: probability distributions on 5 binary variables form a 31-dimensional space (2^5 − 1 = 31 free probabilities), whereas the probability distributions for Maria’s Bayesian network form a 9-dimensional space.
Learning about one variable affects the likelihood of other variables. Evidence “flows” along the arcs, and inference is bidirectional. Structure and probabilities can be learned as cases accumulate.
The information update operation is called Bayes rule. Bayesian inference is belief dynamics, covering both within-case evidence accumulation and cross-case learning.
Before Bayesian networks, the prevailing wisdom was that probability was intrinsically intractable without ridiculously restrictive independence assumptions.
Bayesian network models are feasible to elicit. With 100 variables and 3 states per variable (e.g., hi, med, lo), a fully general model has 3^100 (about 5 × 10^47) probabilities to elicit. In a Bayesian network with 3 parents per node there are 8,100 (100 × 3^4). We cut this down further using reusable fragments and additional node-level structure.
Bayesian network models are tractable to compute. We can run models with hundreds of variables faster than real time. We have lots of tricks to increase efficiency, but the main trick is the modular architecture, which can be exploited by both knowledge engineering (KE) and inference.
Bayesian network models can be realistically complex without becoming unmanageable.
Bayesian networks can be learned from cases. The theory for both online incremental learning and offline batch learning is well developed, and application toolkits are coming online.
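The elicitation counts above follow from simple arithmetic. This sketch assumes, as the slide does, that every node has exactly three parents; it counts both every table entry (which gives the 8,100 figure) and the smaller number of free parameters, since each row of a conditional probability table must sum to 1. The function names are illustrative.

```python
def full_joint_entries(n_vars, n_states):
    """Entries in an unrestricted joint probability table over n_vars variables."""
    return n_states ** n_vars

def bn_table_entries(n_vars, n_states, n_parents):
    """Entries in all conditional probability tables when every node has n_parents parents:
    each node has n_states ** n_parents parent configurations, each with n_states entries."""
    return n_vars * n_states ** n_parents * n_states

def bn_free_parameters(n_vars, n_states, n_parents):
    """Same tables, counting only free parameters (each row must sum to 1)."""
    return n_vars * n_states ** n_parents * (n_states - 1)

print(full_joint_entries(5, 2) - 1)         # 31: free parameters for 5 binary variables
print(f"{full_joint_entries(100, 3):.2e}")  # 5.15e+47: fully general model
print(bn_table_entries(100, 3, 3))          # 8100: Bayesian network, 3 parents per node
print(bn_free_parameters(100, 3, 3))        # 5400
```

The gap between roughly 10^47 and a few thousand is the practical payoff of restricting the model to a low-dimensional subspace through conditional independence.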

Subjective Probability
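$P_S(E \mid B)$ is the system’s degree of belief that $E$ will occur given background information $B$. In subjectivist theory there is no one “correct” probability; viewpoints vary on whether “objective probabilities” exist.
Probability as belief dynamics: if new information $N$ is added to background information $B$, then belief in $E$ changes to $P_S(E \mid B \,\&\, N)$. Probability updating follows the dynamic equation known as Bayes rule, which in odds form reads
$$\underbrace{\frac{P_S(E_1 \mid B \,\&\, N)}{P_S(E_2 \mid B \,\&\, N)}}_{\text{posterior odds ratio}} = \underbrace{\frac{P_S(E_1 \mid B)}{P_S(E_2 \mid B)}}_{\text{prior odds ratio}} \times \underbrace{\frac{P_S(N \mid E_1 \,\&\, B)}{P_S(N \mid E_2 \,\&\, B)}}_{\text{likelihood ratio}}$$
Belief in $E_1$ increases relative to $E_2$ if $N$ was more likely to co-occur with $E_1$ than with $E_2$.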
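The odds form of Bayes rule translates directly into code. This is a minimal sketch with hypothetical numbers; converting odds back to a probability additionally assumes that $E_1$ and $E_2$ are mutually exclusive and exhaustive.

```python
def posterior_odds(prior_e1, prior_e2, p_n_given_e1, p_n_given_e2):
    """Bayes rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return (prior_e1 / prior_e2) * (p_n_given_e1 / p_n_given_e2)

def odds_to_probability(odds):
    """Convert odds of E1 vs E2 to P(E1), assuming E1 and E2 are exclusive and exhaustive."""
    return odds / (1.0 + odds)

# Hypothetical numbers: E1 and E2 start equally credible, but N is three times
# as likely to co-occur with E1, so belief shifts toward E1.
odds = posterior_odds(0.5, 0.5, 0.75, 0.25)
print(odds, odds_to_probability(odds))  # 3.0 0.75
```

A likelihood ratio above 1 always raises belief in $E_1$ relative to $E_2$, and below 1 always lowers it, regardless of the prior.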

Maria’s Continuing Saga…
Variation 1: Tran is sneezing and saw scratches. Tran was recently exposed to a cold and probably is not allergy prone.
Variation 2: Tran saw scratches; Maria did not see scratches. Tran is in the room with Maria.
Variation 3: Tran and Maria are both sneezing, both allergy prone, and both saw scratches. Tran and Maria are a continent apart.

Variation 1: Add background variables to specialize the model to different individuals. This is still a “template model” with limited expressive power.

Variation 2: The decision graph has replicated sub-parts and different kinds of entities (cats and people).

But is the cat dead or alive?
Variation 3 done wrong: the Variation 2 model gets the wrong answer if Maria and Tran are not near each other and both are near cats! We need to be able to hypothesize additional cats if and when necessary.

Variation 3 Done Right (…but what a mess!)
This model gets the right answer on all the variations.

The Solution: Multi-Entity Decision Graphs
[Figure: Spatial fragment, Hypothesis Management fragment, Cats & Allergies fragment, Value fragment, Colds & Time fragment, Sneezing fragment]
Specify the model in pieces and let the computer compose them: first-order predicate calculus plus probabilities and decisions.
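The idea of specifying the model in pieces can be sketched as template fragments instantiated for every type-correct binding of entities and merged into one ground network. This is a hypothetical illustration of the composition step only, not the MEBN formalism or any existing MEBN tool: the `Fragment` class, the node names, and the `compose` function are assumptions of this example, and the local probability distributions each fragment would carry are omitted. “Hypothesizing an additional cat” corresponds here to adding an entity and recomposing.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Fragment:
    """Hypothetical template: a node pattern, its parent patterns, and entity types."""
    name: str
    node: str        # e.g. "Sneezing({p})"
    parents: tuple   # e.g. ("HasCold({p})", "AllergicReaction({p})")
    variables: dict  # placeholder -> entity type, e.g. {"p": "Person"}

def compose(fragments, entities):
    """Instantiate every fragment for every type-correct binding of its placeholders
    and merge the results into one ground network {node: set of parents}."""
    network = {}
    for frag in fragments:
        names = list(frag.variables)
        pools = [entities[frag.variables[v]] for v in names]
        for combo in product(*pools):
            binding = dict(zip(names, combo))
            node = frag.node.format(**binding)
            parents = {p.format(**binding) for p in frag.parents}
            network.setdefault(node, set()).update(parents)
            for parent in parents:
                network.setdefault(parent, set())  # root nodes get an empty parent set
    return network

FRAGMENTS = [
    Fragment("Colds", "HasCold({p})", (), {"p": "Person"}),
    Fragment("Spatial", "Near({p},{c})", (), {"p": "Person", "c": "Cat"}),
    Fragment("Cats & Allergies", "AllergicReaction({p})",
             ("AllergyProne({p})", "Near({p},{c})"), {"p": "Person", "c": "Cat"}),
    Fragment("Scratches", "SawScratches({p})", ("Near({p},{c})",),
             {"p": "Person", "c": "Cat"}),
    Fragment("Sneezing", "Sneezing({p})",
             ("HasCold({p})", "AllergicReaction({p})"), {"p": "Person"}),
]

one_cat = compose(FRAGMENTS, {"Person": ["Maria", "Tran"], "Cat": ["Cat1"]})
two_cats = compose(FRAGMENTS, {"Person": ["Maria", "Tran"], "Cat": ["Cat1", "Cat2"]})
print(sorted(two_cats["AllergicReaction(Tran)"]))
```

With a second hypothesized cat, Maria and Tran can each be near a different cat, which is exactly what the single-cat template of Variation 2 could not express.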

Representing Representation
[Figure: Observations and Actions connect the representation to the “real” world]
Representation of the “real” world:
Decisions and actions: when to take an observation; which question to ask.
Predicted outcomes: probability distribution for the next observable.
Values: accurate prediction; survival.
“Real” world: a stochastic process whose time evolution is governed by the Schrödinger equation plus “quantum jumps.” There is no good theory for the timing of reductions or for which operator is applied.

Representation of “Player’s” Choice as Decision Graph
[Figure: decision graph repeated over successive measurements i = 1, 2, with nodes Ψi, Ψi+, Ti, Ei, Vi and the measurement choice, and arcs labeled “wave function reduction,” “Schrödinger dynamics,” and “information influence”; node types: decision, chance event, value, deterministic event]
Information influences and value nodes are modeled by standard physics and described in psychological terms:
Ψi = wave function before observation
Ψi+ = wave function after observation
Mi = measurement operation
Ti = time since last measurement
Ei = current experience
Vi = value to player
“Players” choose when to cause reduction events and which operator to apply, and “players” evolve representations. This is consistent with quantum mechanics: Schrödinger evolution between reductions, and Dirac probabilities for selecting the actual experience from the possible experiences.

When to Reduce? Game-Theoretic Semantics
The player’s utility function includes the effort of applying the operator and the value of the result. Choose the reduction policy that maximizes the player’s utility. Players interact and can affect each other’s utility, and there is evolutionary pressure favoring players who “like” policies conducive to survival.
As time since the last observation increases, the probability of a “termination state” increases and fatigue decreases.
Components of uncertainty: intrinsic stochasticity (“object-level” uncertainty); lack of knowledge (“higher-order” uncertainty); approximation error (“model uncertainty”).
As players evolve more complex and more accurate models, forecasts become more accurate, higher-order uncertainty decreases, and players gain better ability to control question asking. Players can learn to share information and effort.

Direction of Time
Second law of thermodynamics: time is the direction of increasing physical entropy.
“Learning universe” hypothesis: time is the direction of increasing knowledge of players about the universe they inhabit.
Can these arrows be reconciled? Expansion of physical phase space; contraction of information phase space.

Communication
Learning can be faster when players exchange information. Communicating players exchange messages, and players can learn each other’s representations.
Efficient communication: Player 1 expresses the difference between Player 1’s knowledge and Player 2’s knowledge in the language of Player 2’s representation.
There are mixed motives for information sharing. Communication respects the laws of physics. Intrinsic randomness prevents “freezing” at local optima.

Summary
Conscious agents construct representations, and conscious agents learn better representations over time.
Common mathematics and algorithms serve both for simulating physical systems and for learning complex representations with many parameters, a high degree of conditional independence (the representation is restricted to a low-dimensional subspace of all probability distributions), and a high degree of self-similarity.
Conscious subsystems of the universe evolve to construct better representations of themselves and the world around them.

To Think About
Where is the information technology revolution heading? Silicon intelligence? Symbiotic carbon/silicon intelligence? “Earth consciousness”?
Assertion: it is important that we embed (decision-theoretically) coherent logic and decision-making rules in the hardware of the systems we build; base software architecture on decision- and game-theoretic semantics; map the software dynamics properly to the physical dynamics; and understand the semantics of the knowledge representations we construct, how they are realized in the physics, and the interface between the physics and the representation.
Questions to address: What base beliefs should we embed in hardware? What base values should we embed in hardware? How should we initialize decision rules?
It is better to have our eyes open and our minds engaged in considering the possibilities than to just let it happen to us.

Speculative Question
Can “intelligent life” be modeled as an attractor in phase space? “Players” stay alive by constructing accurate (enough) representations. Control exercised by “players” introduces nonlinearity. “Players” guide the system toward the “edge of complexity”: simple enough to learn, complex enough to evolve learners. Learners cooperate to organize into mutually beneficial societies.
Could we obtain a shadowing theorem? Unconscious Schrödinger evolution moves away from the attractor under the influence of local forces; wave function reduction brings the system back toward the attractor. A reduction event registers in the consciousness of learners and increases their knowledge. Consciously adopting policies that keep us near the attractor has survival value.

Additional Speculative Questions
“Mixture distributions” over models of different dimensions are an active research area.
Bayes rule gives rise to a “natural Occam’s razor”: a bias toward simple models (low-dimensional parameter space), with dimensions included as needed to explain observations.
Tractable approximation of complex models: algorithms imported from physics (e.g., variational methods, Markov chain Monte Carlo) are being applied to learn very complex models. Can we be modeled as MCMC samplers learning a representation of the universe we live in?
If we average over models of different dimensions, the parameter space is not a smooth manifold. Image: quantum foam. Might there be something to this image?
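The “natural Occam’s razor” shows up in a standard textbook comparison (chosen for this sketch, not taken from the talk): a fair-coin model with no free parameters versus a model with one unknown bias parameter under a uniform prior. Integrating the parameter out spreads the complex model’s prior over many possible data sets, so it is penalized unless the data actually demand the extra dimension.

```python
from math import comb, exp, log

def log_evidence_fair(heads, n):
    """log P(sequence | fair coin): a model with no free parameters."""
    return n * log(0.5)

def log_evidence_unknown_bias(heads, n):
    """log P(sequence | coin with unknown bias theta ~ Uniform(0, 1)).

    Integrating theta out gives heads! (n - heads)! / (n + 1)!,
    which equals 1 / ((n + 1) * C(n, heads)).
    """
    return -log((n + 1) * comb(n, heads))

def bayes_factor_simple_vs_complex(heads, n):
    """Ratio of marginal likelihoods; values above 1 favor the fair (lower-dimensional) model."""
    return exp(log_evidence_fair(heads, n) - log_evidence_unknown_bias(heads, n))

print(bayes_factor_simple_vs_complex(5, 10))  # balanced data: simple model preferred
print(bayes_factor_simple_vs_complex(9, 10))  # lopsided data: the extra dimension earns its keep
```

For 5 heads in 10 tosses the factor is 2772/1024 (about 2.7) in favor of the fair coin; for 9 heads it drops to 110/1024, so the one-parameter model wins. No explicit complexity penalty was added: it falls out of Bayes rule.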
