Presentation on theme: "Representing Representers and What They Represent Kathryn Blackmond Laskey George Mason University Department of Systems Engineering and Operations Research."— Presentation transcript:
Representing Representers and What They Represent Kathryn Blackmond Laskey George Mason University Department of Systems Engineering and Operations Research Krasnow Institute GMU QMind II Note change to less pretentious and more accessible title
This talk is dedicated to the memory of journalist Danny Pearl, murdered in Pakistan in February 2002, and to the pioneering research of his father Judea Pearl. Judea Pearl's research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents.
Representation A representation consists of: –A representing system –A represented system –A mapping between the representing system and the represented system Important properties of the represented system correspond to features in the representation A conscious organism –Represents its environment and possibly itself to itself –Uses its representations to engage in adaptive behavior with respect to its environment »Sense »Recognize »Plan and act
Figure labels: Observations, Actions, Real World, Representation
Science and Representation Elements of a representation –Reality to represent –Space of possible representations of reality –Correspondence between aspects of reality and features in representation space Important considerations –By whom is representation being used? –For what purpose? –How to measure how good it is? Scientists study a phenomenon by –Building a representation of the phenomenon –Manipulating the representation –Comparing non-obvious features of the representation to corresponding features in reality How do we study representation?
Representing Representation Figure labels: Observations, Actions; Real world with real representation created by real conscious subsystem; Artificial world with simulated representation created by simulated conscious subsystem
Physics, Representation and Learning Cross-fertilization from physics to statistics and machine learning has created rapid progress Recipe for creating a good learning algorithm –Represent the learning problem as a physical system in which low action or low free energy maps to good representation –Simulate the physical system on a computer –Let the simulation evolve according to (simulated) laws of physics –Presto! Out comes a good solution to your problem The opposite direction: –Can ideas from learning theory give insights for a physics of consciousness?
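The recipe on this slide can be sketched with simulated annealing, one well-known physics-inspired optimizer. This is a minimal illustration, not the method used in the talk: the one-dimensional `energy` landscape, the geometric cooling schedule, and every parameter value are assumptions chosen only to make the example small.

```python
import math
import random


def energy(x):
    # Hypothetical energy landscape (an assumption for illustration):
    # a parabola with ripples, so it has several local minima.
    return x * x + 3.0 * math.sin(5.0 * x)


def simulated_annealing(energy_fn, x0, steps=20000, t_start=5.0,
                        t_end=0.01, step_size=0.5, seed=0):
    """Minimize energy_fn by simulating a thermal random walk that cools."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = x + rng.gauss(0.0, step_size)
        e_candidate = energy_fn(candidate)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / t), so randomness can bump
        # the walker out of local minima while the system is still hot.
        if e_candidate <= e or rng.random() < math.exp(-(e_candidate - e) / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    print(simulated_annealing(energy, x0=4.0))
```

Low energy plays the role of a good representation here: the simulated system is left to evolve under (simulated) thermal dynamics, and the lowest-energy state it visits is read off as the solution.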
Learners and Learnable Phenomena Good learners –Loosely coupled local learners –Multi-resolution representations –Bias toward simple representations –Compose elements to form complex representations –Adjust appropriately to environmental feedback –Intrinsic randomness to bump out of locally but not globally optimal representations Learnable systems –Repeated structure –Complexity built up out of simple pieces –Not too much randomness A system capable of self-representation must be –Simple enough to exhibit learnable regularities –Complex enough to form and evolve representations of itself
20th Century Science Figure labels: Observations, Actions, Real World, ?, Representation Physical reality –Wave function –Deterministic evolution punctuated by jumps No consensus on –How and why jumps occur –How consciousness interacts with physical world
Stapp Theory of Consciousness Timing of reduction and choice of operator occur by conscious choice Efficacious conscious choice enters where physics currently lacks a theory Comments on Stapp theory –Stapp does not demand that all state vector reductions involve conscious choice –Theory and experiment verify that macroscopic evolution of physical system can depend on choice and timing of reductions –Experimentally verified quantum Zeno effect is one potential mechanism by which conscious choice might operate –Stapp argues that operation of quantum Zeno effect is plausible in conditions occurring in brains
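The quantum Zeno effect mentioned on this slide can be illustrated with a textbook two-level system. The sketch below assumes ideal projective measurements, equally spaced in time, on a system whose free evolution would rotate the initial state through `total_angle`. The function name is hypothetical, and the example says nothing about whether such conditions hold in brains.

```python
import math


def survival_probability(total_angle, n_measurements):
    """Probability of finding the system in its initial state at every
    one of n equally spaced projective measurements.

    Between measurements the state rotates by total_angle / n, so each
    measurement finds the initial state with probability cos^2 of that
    angle; the n outcomes multiply.
    """
    step = total_angle / n_measurements
    return math.cos(step) ** (2 * n_measurements)


if __name__ == "__main__":
    # A quarter turn (pi / 2) would fully leave the initial state
    # if the system were measured only once, at the end.
    for n in (1, 10, 100, 1000):
        print(n, survival_probability(math.pi / 2, n))
```

As the number of measurements grows, the survival probability approaches 1: frequent observation holds the system near its initial state, which is why the choice and timing of reductions can affect macroscopic evolution.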
Paradigm Shift in Computing Old paradigm: Algorithms running on Turing machines –Deterministic –Based on Boolean logic New paradigm: Economy of software agents executing on a physical symbol system –Agents make decisions (deterministic or stochastic) to achieve objectives –Program is replaced by dynamic system evolving better solutions –Based on decision theory / game theory / stochastic processes Hardware realizations of physical symbol systems –Physical systems minimize action –Decision theoretic systems maximize utility / minimize loss –Hardware realization of physical symbol system maps action to utility –Programming languages are replaced by specification / interaction languages –Software designer specifies goals, rewards and information flows –Unified theory spans sub-symbolic to cognitive levels Old paradigm is limiting case of new paradigm
Decision Graph: An Example Maria is visiting a friend when she suddenly begins sneezing. "Oh dear, I'm getting a cold," she thinks. "I had better not visit Grandma." Then she notices scratches on the furniture. She sighs in relief. "I'm not getting a cold! It's only my cat allergy acting up!" 1 Plausible inference 2 The evidence for cat allergy explains away the sneezing, and a cold is no longer needed as an explanation 3 Does Maria have a grandmother neuron?
What Happened Under the Hood? A decision graph is both a knowledge representation and a computational architecture –Represents knowledge about variables and their interactions –Modular elements with defined interconnections –Computation can exploit loosely coupled structure for efficiency –Parsimony »Probability distributions on 5 binary variables: 31-dimensional space (2^5 − 1 free parameters) »Probability distributions for Maria's Bayesian network: 9-dimensional space Learning about one variable affects likelihood of other variables –Evidence flows along the arcs –Bidirectional inference –Learn structure and probabilities as cases accumulate The information update operation is called Bayes' rule –Bayesian inference is belief dynamics –Within-case evidence accumulation –Cross-case learning
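The explaining-away pattern in Maria's story can be reproduced by brute-force enumeration over a small Bayesian network. This is a sketch under stated assumptions: the arc structure (Cat → Allergy, Cat → Scratches, Cold → Sneezing ← Allergy) and every probability below are invented for illustration and are not taken from the talk.

```python
from itertools import product

# Hypothetical probabilities (assumptions, not from the talk).
P_COLD = 0.08
P_CAT = 0.10
P_ALLERGY_GIVEN_CAT = {True: 0.80, False: 0.02}
P_SCRATCHES_GIVEN_CAT = {True: 0.60, False: 0.05}
# Keyed by (cold, allergy).
P_SNEEZE = {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.90, (False, False): 0.02}

NAMES = ("cold", "cat", "allergy", "sneeze", "scratches")


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(cold, cat, allergy, sneeze, scratches):
    """Joint probability as a product of local conditional tables."""
    return (bern(P_COLD, cold)
            * bern(P_CAT, cat)
            * bern(P_ALLERGY_GIVEN_CAT[cat], allergy)
            * bern(P_SNEEZE[(cold, allergy)], sneeze)
            * bern(P_SCRATCHES_GIVEN_CAT[cat], scratches))


def posterior(query, evidence):
    """P(query = True | evidence), summing over all 2^5 joint states."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[query]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(posterior("cold", {}))
    print(posterior("cold", {"sneeze": True}))
    print(posterior("cold", {"sneeze": True, "scratches": True}))
```

With these made-up numbers, belief in a cold rises from 0.08 to roughly 0.43 once sneezing is observed, then falls back to roughly 0.16 when the scratches are seen. Enumeration is used only for transparency; a real decision-graph engine would exploit the loosely coupled structure rather than visit every joint state.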
Subjective Probability $P_S(E|B)$ is system S's degree of belief that E will occur given background information B –In subjectivist theory there is no one correct probability –Viewpoints vary on whether objective probabilities exist Probability as belief dynamics –If new information N is added to background information B then belief in E changes to $P_S(E|B\&N)$ –Probability updating follows the dynamic equation known as Bayes' rule:
$$\underbrace{\frac{P_S(E_1|B\&N)}{P_S(E_2|B\&N)}}_{\text{posterior odds ratio}} = \underbrace{\frac{P_S(N|E_1\&B)}{P_S(N|E_2\&B)}}_{\text{likelihood ratio}} \times \underbrace{\frac{P_S(E_1|B)}{P_S(E_2|B)}}_{\text{prior odds ratio}}$$
–Belief in $E_1$ increases relative to $E_2$ if N was more likely to co-occur with $E_1$ than with $E_2$
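The odds form of Bayes' rule is easy to check numerically. The sketch below uses hypothetical function names and made-up numbers: a prior probability of 0.08 for a cold, and sneezing assumed to be nine times as likely with a cold as without one.

```python
def posterior_odds(prior_odds, likelihood_e1, likelihood_e2):
    """Posterior odds of E1 against E2 after observing N.

    likelihood_e1 is P(N | E1 & B); likelihood_e2 is P(N | E2 & B).
    Posterior odds = likelihood ratio * prior odds.
    """
    return (likelihood_e1 / likelihood_e2) * prior_odds


def odds_to_probability(odds):
    """Convert odds of E1 against E2 to P(E1).

    Valid only when E1 and E2 are mutually exclusive and exhaustive.
    """
    return odds / (1.0 + odds)


if __name__ == "__main__":
    prior = 0.08 / 0.92  # odds of cold against no cold
    post = posterior_odds(prior, likelihood_e1=0.9, likelihood_e2=0.1)
    print(post, odds_to_probability(post))
```

Because the likelihood ratio exceeds 1, belief in the cold rises relative to its alternative; a ratio below 1 would lower it, and a ratio of exactly 1 would leave the odds unchanged.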
Maria's Continuing Saga… Variation 1: –Tran is sneezing and saw scratches –Tran was recently exposed to a cold and probably is not allergy prone Variation 2: –Tran saw scratches –Maria did not see scratches –Tran is in room with Maria Variation 3: –Tran and Maria both are sneezing, are allergy prone, and saw scratches –Tran and Maria are a continent apart
Variation 1 Add background variables to specialize model to different individuals Still a template model with limited expressive power
Variation 2 Decision graph has replicated sub-parts Different kinds of entities (cats and people)
Variation 3 Done Wrong Variation 2 model gets wrong answer if Maria and Tran are not near each other and both are near cats! We need to be able to hypothesize additional cats if and when necessary But is the cat dead or alive?
Variation 3 Done Right (…but what a mess!) This model gets the right answer on all the variations
The Solution: Multi-Entity Decision Graphs Fragments: Cats & Allergies, Spatial, Hypothesis Management, Colds & Time, Value, Sneezing Specify model in pieces and let the computer compose them First-order predicate calculus plus probabilities and decisions
Representing Representation Figure labels: Observations, Actions Real World: stochastic process –Time evolution governed by Schrödinger equation plus quantum jumps –No good theory for: »Timing of reductions »Which operator is applied Representation of Real World –Decisions and actions »When to take observation »Which question to ask –Predicted outcomes »Probability distribution for next observable –Values »Accurate prediction »Survival
Representation of Player's Choice as Decision Graph Notation: ψ_i = wave function before observation; ψ_i^+ = wave function after observation; M_i = measurement operation; T_i = time since last measurement; E_i = current experience; V_i = value to player Node types: decision, chance event, value, deterministic event –Players choose when to cause reduction events and which operator to apply –Players evolve representations –Consistent with quantum mechanics »Schrödinger evolution between reductions »Dirac probabilities for selecting actual experience from possible experiences Figure labels: O1, E1, V1, T1, wave function reduction, Schrödinger dynamics, T2, O2, E2, V2, wave function reduction, information influence Described in psychological terms Information influences and value nodes are modeled by standard physics
When to Reduce? Game theoretic semantics –Player's utility function includes effort of applying operator and value of result –Choose reduction policy that maximizes player's utility –Players interact and can affect each other's utility –Evolutionary pressure for players who like policies conducive to survival As time since last observation increases –Probability of termination state increases –Fatigue decreases Components of uncertainty –Intrinsic stochasticity (object level uncertainty) –Lack of knowledge (higher order uncertainty) –Approximation error (model uncertainty) As players evolve more complex and more accurate models –Forecasts become more accurate –Less higher order uncertainty –Better ability to control question asking Players can learn to share information and effort
Direction of Time Second law of thermodynamics: time is direction of increasing physical entropy Learning universe hypothesis: time is direction of increasing knowledge of players about the universe they inhabit Can these arrows be reconciled? –Expansion of physical phase space –Contraction of information phase space
Communication Learning can be faster when players exchange information Communicating players exchange messages Players can learn each other's representations –Efficient communication: Player 1 expresses difference between Player 1's knowledge and Player 2's knowledge in language of Player 2's representation –Mixed motives for information sharing Communication respects laws of physics –Intrinsic randomness –Prevents freezing at local optima
Summary Conscious agents construct representations Conscious agents learn better representations over time Common mathematics and algorithms for –Simulating physical systems –Learning complex representations »Many parameters »High degree of conditional independence (representation is restricted to low-dimensional subspace of all probability distributions) »High degree of self-similarity Conscious subsystems of universe evolve to construct better representations of themselves and the world around them
To Think About Where is information technology revolution heading? –Silicon intelligence? –Symbiotic carbon/silicon intelligence? –Earth consciousness? Assertion: It is important that we –Embed (decision theoretically) coherent logic and decision making rules in hardware of the systems we build –Base software architecture on decision and game theoretic semantics –Map the software dynamics properly to the physical dynamics –Understand the semantics of the knowledge representations we construct –Understand how they are realized in the physics –Understand the interface between the physics and the representation Questions to address: –What base beliefs should we embed in hardware? –What base values should we embed in hardware? –How should we initialize decision rules? It is better to have our eyes open and our minds engaged in considering the possibilities than to just let it happen to us
Speculative Question Can intelligent life be modeled as attractor in phase space? –Players stay alive by constructing accurate (enough) representations –Control exercised by players introduces nonlinearity –Players guide the system toward edge of complexity »Simple enough to learn »Complex enough to evolve learners »Learners cooperate to organize into mutually beneficial societies Could we obtain a shadowing theorem? –Unconscious Schrödinger evolution moves away from the attractor under influence of local forces –Wave function reduction brings system back toward the attractor –Reduction event registers in consciousness of learners and increases their knowledge –Consciously adopting policies that keep us near attractor has survival value
Additional Speculative Questions Mixture distributions over models of different dimensions are an active research area Bayes' rule gives rise to a natural Occam's razor –Bias toward simple models (low-dimensional parameter space) –Dimensions are included as needed to explain observations Tractable approximation of complex models –Algorithms imported from physics (e.g., variational methods, Markov chain Monte Carlo) are being applied to learn very complex models –Can we be modeled as MCMC samplers learning a representation of the universe we live in? If we average over models of different dimensions the parameter space is not a smooth manifold –Image: quantum foam –Might there be something to this image?
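The natural Occam's razor in Bayes' rule can be seen in a standard coin-flip comparison. A minimal sketch, assuming two hypothetical models: a zero-parameter "fair coin" model and a one-parameter "unknown bias" model with a uniform prior on the bias. The function names are illustrative.

```python
import math


def evidence_fair_coin(heads, tails):
    """Marginal likelihood of one specific flip sequence under the
    simple model: the coin is fair, with no free parameters."""
    return 0.5 ** (heads + tails)


def evidence_unknown_bias(heads, tails):
    """Marginal likelihood of the same sequence under the complex model.

    Integrating p^heads * (1 - p)^tails over a uniform prior on p gives
    the Beta function B(heads + 1, tails + 1)
    = heads! * tails! / (heads + tails + 1)!.
    """
    n = heads + tails
    return (math.factorial(heads) * math.factorial(tails)
            / math.factorial(n + 1))


def bayes_factor_simple_over_complex(heads, tails):
    """Values above 1 favor the simple model; below 1, the complex one."""
    return evidence_fair_coin(heads, tails) / evidence_unknown_bias(heads, tails)


if __name__ == "__main__":
    print(bayes_factor_simple_over_complex(5, 5))  # balanced data
    print(bayes_factor_simple_over_complex(9, 1))  # lopsided data
```

With balanced data the simple model wins, because the complex model spreads its prior mass over biases the data do not support. With lopsided data the extra dimension earns its place, matching the slide's point that dimensions are included as needed to explain observations.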