Presentation on theme: "Representing Representers and What They Represent"— Presentation transcript:
1 Representing Representers and What They Represent
(Note change to less pretentious and more accessible title)
Kathryn Blackmond Laskey
George Mason University
Department of Systems Engineering and Operations Research
Krasnow Institute
GMU
QMind II
2 This talk is dedicated to the memory of journalist Danny Pearl, murdered in Pakistan in February 2002, and to the pioneering research of his father Judea Pearl. Judea Pearl’s research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents.
Judea Pearl has been an inspiration to me throughout my career. Much of my work builds directly on his accomplishments. I also consider him a friend. Although prior to his abduction Danny Pearl did not consider himself a religious person, he was chosen to be murdered because of his Jewish heritage. The Jewish tradition has no official position on an afterlife, but there is a strong sense of our connection with previous generations and a belief that people live on through their impact on the lives of others.
If through this dedication I am able to inspire people to apply Judea’s work in sound, principled, and effective ways to increase our ability to anticipate, detect, prevent, and respond to terrorist incidents, then I and those who are inspired by my words will help to ensure that Danny Pearl did not die in vain.
3 Representation
- A representation consists of:
  - A representing system
  - A represented system
  - A mapping between the representing system and the represented system
- Important properties of the represented system correspond to features in the representation
- A conscious organism
  - Represents its environment and possibly itself to itself
  - Uses its representations to engage in adaptive behavior with respect to its environment
    - Sense
    - Recognize
    - Plan and act
5 Science and Representation
- Elements of a representation
  - Reality to represent
  - Space of possible representations of reality
  - Correspondence between aspects of reality and features in representation space
- Important considerations
  - By whom is representation being used?
  - For what purpose?
  - How to measure how good it is?
- Scientists study a phenomenon by
  - Building a representation of the phenomenon
  - Manipulating the representation
  - Comparing non-obvious features of the representation to corresponding features in reality
- How do we study representation?
7 Representing Representation
[Diagram labels: Observations; Actions; real world with real representation created by real conscious subsystem; artificial world with simulated representation created by simulated conscious subsystem]
8 Physics, Representation and Learning
- Cross-fertilization from physics to statistics and machine learning has created rapid progress
- Recipe for creating a good learning algorithm
  - Represent the learning problem as a physical system in which “low action” or “low free energy” maps to good representation
  - Simulate the physical system on a computer
  - Let the simulation evolve according to (simulated) laws of physics
  - Presto! Out comes a good solution to your problem
- The opposite direction:
  - Can ideas from learning theory give insights for a physics of consciousness?
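One way to read the recipe on this slide is simulated annealing, sketched below. The slides do not name a specific algorithm, so the choice of annealing, the toy `energy` landscape, and every parameter (`t_start`, `t_end`, `steps`, the 0..99 state range) are assumptions made purely for illustration. "Low energy" stands in for "good representation", and the random acceptance step plays the role of the simulated physics.

```python
import math
import random


def energy(x):
    # Hypothetical rugged landscape: a broad bowl plus ripples that
    # create many local minima. Lower energy means a better solution.
    return 0.01 * (x - 70) ** 2 + 2.0 * math.sin(x)


def anneal(start, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis-style simulated annealing over the integers 0..99."""
    rng = random.Random(seed)
    x = best = start
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a move to a neighbouring state, clamped to the range.
        candidate = min(99, max(0, x + rng.choice((-1, 1))))
        delta = energy(candidate) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability, which lets the walk leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        if energy(x) < energy(best):
            best = x
    return best


best_state = anneal(start=5)
print(best_state, energy(best_state))
```

The sketch keeps the best state seen so far, so the returned state is never worse than the starting point, whatever the random seed.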
9 Learners and Learnable Phenomena
- Good learners
  - Loosely coupled local learners
  - Multi-resolution representations
  - Bias toward simple representations
  - Compose elements to form complex representations
  - Adjust appropriately to environmental feedback
  - Intrinsic randomness to bump out of locally but not globally optimal representations
- Learnable systems
  - Repeated structure
  - Complexity built up out of simple pieces
  - Not too much randomness
- A system capable of self-representation must be
  - Simple enough to exhibit learnable regularities
  - Complex enough to form and evolve representations of itself
10 20th Century Science
[Diagram labels: Observations; Real World; Representation (marked “?”); Actions]
- Physical reality
  - Wave function
  - Deterministic evolution punctuated by “jumps”
- No consensus on
  - How and why “jumps” occur
  - How consciousness interacts with physical world
11 Stapp Theory of Consciousness
- Timing of reduction and choice of operator occur by conscious choice
- Efficacious conscious choice enters where physics currently lacks a theory
- Comments on Stapp theory
  - Stapp does not demand that all state vector reductions involve conscious choice
  - Theory and experiment verify that macroscopic evolution of physical system can depend on choice and timing of reductions
  - Experimentally verified quantum Zeno effect is one potential mechanism by which conscious choice might operate
  - Stapp argues that operation of quantum Zeno effect is plausible in conditions occurring in brains
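The dependence of evolution on the timing of reductions can be illustrated with the textbook two-level version of the quantum Zeno effect. This is a minimal sketch, not a model of Stapp's proposal or of brain conditions. It assumes an idealized system that would rotate fully out of its initial state over the interval (total angle π/2) and that is interrupted by equally spaced ideal projective measurements. The function name and parameters are hypothetical.

```python
import math


def survival_probability(num_measurements, total_angle=math.pi / 2):
    """Probability that every measurement finds the initial state.

    Between measurements the state rotates by total_angle / n, so each
    measurement finds the initial state with probability cos^2 of that
    step angle; n independent successes multiply.
    """
    step_angle = total_angle / num_measurements
    return math.cos(step_angle) ** (2 * num_measurements)


for n in (1, 10, 100, 1000):
    print(n, survival_probability(n))
```

With a single measurement at the end the system has almost certainly left its initial state. As measurements become more frequent, the survival probability approaches 1, which is the sense in which the choice and timing of reductions shape the macroscopic outcome.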
12 Paradigm Shift in Computing
- Old paradigm: Algorithms running on Turing machines
  - Deterministic
  - Based on Boolean logic
- New paradigm: Economy of software agents executing on a physical symbol system
  - Agents make decisions (deterministic or stochastic) to achieve objectives
  - “Program” is replaced by dynamic system evolving better solutions
  - Based on decision theory / game theory / stochastic processes
- Hardware realizations of physical symbol systems
  - Physical systems minimize action
  - Decision theoretic systems maximize utility / minimize loss
  - Hardware realization of physical symbol system maps action to utility
- Programming languages are replaced by specification / interaction languages
  - Software designer specifies goals, rewards and information flows
- Unified theory spans sub-symbolic to cognitive levels
- Old paradigm is limiting case of new paradigm
13 Decision Graph: An Example
Maria is visiting a friend when she suddenly begins sneezing. “Oh dear, I’m getting a cold,” she thinks. “I had better not visit Grandma.” Then she notices scratches on the furniture. She sighs in relief. “I’m not getting a cold! It’s only my cat allergy acting up!”
- Plausible inference: the evidence for cat allergy “explains away” sneezing, and cold is no longer needed as an explanation
- Does Maria have a “grandmother neuron”?
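A minimal sketch of the “explaining away” pattern in Maria’s story, computed by brute-force enumeration over a five-variable network. The structure used here (cat → allergy, cat → scratches, cold and allergy → sneezing) and every probability are assumptions chosen for illustration; the slides do not give the actual network or its numbers.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
P_COLD = 0.1
P_CAT = 0.1
P_ALLERGY_GIVEN_CAT = {True: 0.8, False: 0.01}
P_SCRATCHES_GIVEN_CAT = {True: 0.6, False: 0.02}
# Keyed by (cold, allergy).
P_SNEEZE = {
    (True, True): 0.95,
    (True, False): 0.8,
    (False, True): 0.8,
    (False, False): 0.02,
}

NAMES = ("cold", "cat", "allergy", "scratches", "sneeze")


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(cold, cat, allergy, scratches, sneeze):
    """Joint probability as the product of the local distributions."""
    return (
        bern(P_COLD, cold)
        * bern(P_CAT, cat)
        * bern(P_ALLERGY_GIVEN_CAT[cat], allergy)
        * bern(P_SCRATCHES_GIVEN_CAT[cat], scratches)
        * bern(P_SNEEZE[(cold, allergy)], sneeze)
    )


def posterior(query, evidence):
    """P(query is True | evidence), summing over all 32 worlds."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[query]:
            numerator += p
    return numerator / denominator


print("prior:", posterior("cold", {}))
print("sneezing:", posterior("cold", {"sneeze": True}))
print("sneezing + scratches:",
      posterior("cold", {"sneeze": True, "scratches": True}))
```

Under these assumed numbers, belief in a cold rises when Maria sneezes and falls again once she sees the scratches, even though scratches and colds have no direct link in the graph.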
14 What Happened Under the Hood?
- A decision graph is both a knowledge representation and a computational architecture
  - Represents knowledge about variables and their interactions
  - Modular elements with defined interconnections
  - Computation can exploit loosely coupled structure for efficiency
- Parsimony
  - Probability distributions on 5 binary variables: 31-dimensional space
  - Probability distributions for Maria’s Bayesian network: 9-dimensional space
- Learning about one variable affects likelihood of other variables
  - Evidence “flows” along the arcs
  - Bidirectional inference
- Learn structure and probabilities as cases accumulate
- The information update operation is called Bayes rule
- Bayesian inference is belief dynamics
  - Within-case evidence accumulation
  - Cross-case learning
Before Bayesian networks the prevailing wisdom was that probability was intrinsically intractable without ridiculously restrictive independence assumptions.
- Bayesian network models are feasible to elicit
  - 100 variables and 3 states per variable (e.g., hi, med, lo)
  - 3^100 (about 5 × 10^47) probabilities to elicit in a fully general model
  - In a Bayesian network with 3 parents per node there are 8100 probabilities to elicit
  - We cut this down further using reusable fragments and additional node-level structure
- Bayesian network models are tractable to compute
  - We can run models with hundreds of variables faster than real time
  - We have lots of tricks to increase efficiency
  - But the main trick is the modular architecture that can be exploited by both knowledge engineering (KE) and inference
- Bayesian network models can be realistically complex without becoming unmanageable
- Bayesian networks can be learned from cases. The theory for both online incremental learning and offline batch learning is well developed and application toolkits are coming online.
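The counts quoted on this slide follow from simple arithmetic, sketched below. The function names are hypothetical. The sketch assumes every node has the same number of states and the same number of parents, and it counts table entries, as the slide’s figure of 8100 does, rather than free parameters.

```python
def joint_table_entries(num_vars, num_states):
    """Entries in a fully general joint distribution table."""
    return num_states ** num_vars


def joint_free_parameters(num_vars, num_states):
    """Dimension of the space of joint distributions (entries sum to 1)."""
    return joint_table_entries(num_vars, num_states) - 1


def network_table_entries(num_vars, num_states, num_parents):
    """Entries across all conditional probability tables when every
    node has num_parents parents: one distribution over num_states
    values for each of num_states ** num_parents parent configurations."""
    return num_vars * (num_states ** num_parents) * num_states


print(joint_free_parameters(5, 2))        # 5 binary variables
print(joint_table_entries(100, 3))        # fully general model, 3**100
print(network_table_entries(100, 3, 3))   # 3 parents per node
```

Five binary variables give 2^5 − 1 = 31 dimensions, while 100 three-state nodes with three parents each need 100 × 27 × 3 = 8100 table entries instead of 3^100.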
15 Subjective Probability
- P_S(E|B) is system’s degree of belief that E will occur given background information B
  - In subjectivist theory there is no one “correct” probability
  - Viewpoints vary on whether “objective probabilities” exist
- Probability as belief dynamics
  - If new information N is added to background information B then belief in E changes to P_S(E|B&N)
  - Probability updating follows the dynamic equation known as Bayes rule:

$$\underbrace{\frac{P_S(E_1 \mid B \,\&\, N)}{P_S(E_2 \mid B \,\&\, N)}}_{\text{posterior odds ratio}} = \underbrace{\frac{P_S(E_1 \mid B)}{P_S(E_2 \mid B)}}_{\text{prior odds ratio}} \times \underbrace{\frac{P_S(N \mid E_1 \,\&\, B)}{P_S(N \mid E_2 \,\&\, B)}}_{\text{likelihood ratio}}$$

  - Belief in E1 increases relative to E2 if N was more likely to co-occur with E1 than with E2
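The odds form of Bayes rule (posterior odds = prior odds × likelihood ratio) can be tried out numerically. This is a minimal sketch using exact fractions; the function names and the example numbers are hypothetical.

```python
from fractions import Fraction


def posterior_odds(prior_odds, p_n_given_e1, p_n_given_e2):
    """Odds of E1 against E2 after observing new information N."""
    likelihood_ratio = Fraction(p_n_given_e1) / Fraction(p_n_given_e2)
    return Fraction(prior_odds) * likelihood_ratio


def odds_to_probability(odds):
    """Probability of E1, valid only when E1 and E2 are mutually
    exclusive and exhaustive."""
    return odds / (1 + odds)


# Hypothetical numbers: E1 starts at 1:4 against E2, and N is eight
# times as likely under E1 as under E2.
prior = Fraction(1, 4)
post = posterior_odds(prior, Fraction(8, 10), Fraction(1, 10))
print(post, odds_to_probability(post))
```

Here the evidence moves the odds from 1:4 to 2:1, because N co-occurs with E1 far more often than with E2.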
16 Maria’s Continuing Saga…
- Variation 1:
  - Tran is sneezing and saw scratches
  - Tran was recently exposed to a cold and probably is not allergy prone
- Variation 2:
  - Tran saw scratches
  - Maria did not see scratches
  - Tran is in room with Maria
- Variation 3:
  - Tran and Maria both are sneezing, are allergy prone, and saw scratches
  - Tran and Maria are a continent apart
17 Variation 1
- Add background variables to specialize model to different individuals
- Still a “template model” with limited expressive power
18 Variation 2
- Decision graph has replicated sub-parts
- Different kinds of entities (cats and people)
19 Variation 3 Done Wrong
- Variation 2 model gets wrong answer if Maria and Tran are not near each other and both are near cats!
- We need to be able to hypothesize additional cats if and when necessary
- But is the cat dead or alive?
20 Variation 3 Done Right (…but what a mess!) This model gets the right answer on all the variations
21 The Solution: Multi-Entity Decision Graphs
[Diagram: Spatial Fragment; Hypothesis Management Fragment; Cats & Allergies Fragment; Value Fragment; Colds & Time Fragment; Sneezing Fragment]
- Specify model in pieces and let the computer compose them
- First-order predicate calculus plus probabilities and decisions
22 Representing Representation
[Diagram labels: Observations; Actions]
- Representation of “Real” World
  - Decisions and actions
    - When to take observation
    - Which question to ask
  - Predicted outcomes
    - Probability distribution for next observable
  - Values
    - Accurate prediction
    - Survival
- “Real” World
  - Stochastic process
  - Time evolution governed by Schrödinger equation plus “quantum jumps”
  - No good theory for:
    - Timing of reductions
    - Which operator is applied
23 Representation of “Player’s” Choice as Decision Graph
[Decision graph: nodes Ψ1, Ψ1+, O1, E1, T1, V1 and Ψ2, Ψ2+, O2, E2, T2, V2; labels “wavefunction reduction”, “Schrödinger dynamics”, “information influence”]
- Information influences and value nodes are modeled by standard physics
  - Described in psychological terms
- Notation
  - Ψi = wave function before observation
  - Ψi+ = wave function after observation
  - Mi = measurement operation
  - Ti = time since last measurement
  - Ei = current experience
  - Vi = value to player
- Node types (symbols lost in extraction): decision, chance event, value, deterministic event
- “Players” choose when to cause reduction events and which operator to apply
- “Players” evolve representations
- Consistent with quantum mechanics
  - Schrödinger evolution between reductions
  - Dirac probabilities for selecting actual experience from possible experiences
24 When to Reduce?
- Game theoretic semantics
  - Player’s utility function includes effort of applying operator and value of result
  - Choose reduction policy that maximizes player’s utility
  - Players interact and can affect each other’s utility
  - Evolutionary pressure for players who “like” policies conducive to survival
- As time since last observation increases
  - Probability of “termination state” increases
  - Fatigue decreases
- Components of uncertainty
  - Intrinsic stochasticity (“object level” uncertainty)
  - Lack of knowledge (“higher order” uncertainty)
  - Approximation error (“model uncertainty”)
- As players evolve more complex and more accurate models
  - Forecasts become more accurate
  - Less higher order uncertainty
  - Better ability to control question asking
- Players can learn to share information and effort
25 Direction of Time
- Second law of thermodynamics: time is direction of increasing physical entropy
- “Learning universe” hypothesis: time is direction of increasing knowledge of players about the universe they inhabit
- Can these arrows be reconciled?
  - Expansion of physical phase space
  - Contraction of information phase space
26 Communication
- Learning can be faster when players exchange information
- Communicating players exchange messages
  - Players can learn each other’s representations
  - Efficient communication: Player 1 expresses difference between Player 1’s knowledge and Player 2’s knowledge in language of Player 2’s representation
  - Mixed motives for information sharing
- Communication respects laws of physics
- Intrinsic randomness
  - Prevents “freezing” at local optima
27 Summary
- Conscious agents construct representations
- Conscious agents learn better representations over time
- Common mathematics and algorithms for
  - Simulating physical systems
  - Learning complex representations
    - Many parameters
    - High degree of conditional independence (representation is restricted to low-dimensional subspace of all probability distributions)
    - High degree of self-similarity
- Conscious subsystems of universe evolve to construct better representations of themselves and the world around them
28 To Think About
- Where is information technology revolution heading?
  - Silicon intelligence?
  - Symbiotic carbon/silicon intelligence?
  - “Earth consciousness”?
- Assertion: It is important that we
  - Embed (decision theoretically) coherent logic and decision making rules in hardware of the systems we build
  - Base software architecture on decision and game theoretic semantics
  - Map the software dynamics properly to the physical dynamics
  - Understand the semantics of the knowledge representations we construct
    - Understand how they are realized in the physics
    - Understand the interface between the physics and the representation
- Questions to address:
  - What base beliefs should we embed in hardware?
  - What base values should we embed in hardware?
  - How should we initialize decision rules?
- It is better to have our eyes open and our minds engaged in considering the possibilities than to just let it happen to us
29 Speculative Question
- Can “intelligent life” be modeled as attractor in phase space?
  - “Players” stay alive by constructing accurate (enough) representations
  - Control exercised by “players” introduces nonlinearity
  - “Players” guide the system toward “edge of complexity”
    - Simple enough to learn
    - Complex enough to evolve learners
  - Learners cooperate to organize into mutually beneficial societies
- Could we obtain a shadowing theorem?
  - Unconscious Schrödinger evolution moves away from the attractor under influence of local forces
  - Wave function reduction brings system back toward the attractor
  - Reduction event registers in consciousness of learners and increases their knowledge
  - Consciously adopting policies that keep us near attractor has survival value
30 Additional Speculative Questions
- “Mixture distributions” over models of different dimensions are active research area
  - Bayes rule gives rise to “natural Occam’s razor”
    - Bias toward simple models (low-dimensional parameter space)
    - Dimensions are included as needed to explain observations
  - Tractable approximation of complex models
  - Algorithms imported from physics (e.g., variational methods, Markov Chain Monte Carlo) are being applied to learn very complex models
- Can we be modeled as MCMC samplers learning a representation of the universe we live in?
- If we average over models of different dimensions the parameter space is not a smooth manifold
  - Image: quantum foam
  - Might there be something to this image?
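The “natural Occam’s razor” can be seen in a small worked case. This is a minimal sketch under assumed models that are not taken from the slides: a zero-parameter model (a fair coin) is compared with a one-parameter model (unknown bias with a uniform prior) by the probability each assigns to an observed sequence of flips. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def evidence_fixed_fair(num_flips):
    """Probability of a specific sequence under a fair coin."""
    return Fraction(1, 2 ** num_flips)


def evidence_unknown_bias(num_heads, num_flips):
    """Probability of a specific sequence with the bias integrated out
    under a uniform prior: the Beta integral k! (n - k)! / (n + 1)!."""
    num_tails = num_flips - num_heads
    return Fraction(
        factorial(num_heads) * factorial(num_tails),
        factorial(num_flips + 1),
    )


for heads in (5, 9):
    simple = evidence_fixed_fair(10)
    flexible = evidence_unknown_bias(heads, 10)
    print(heads, float(simple), float(flexible))
```

For balanced data the simpler model has the higher evidence, since the flexible model spreads its probability over many possible biases. For lopsided data the extra dimension earns its keep and the flexible model wins, with no explicit complexity penalty added by hand.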