Transfer Learning Site Visit, August 4, 2006: Report of the ISLE Team. Pat Langley, Tom Fawcett, Daniel Shapiro. Institute for the Study of Learning and Expertise.


2. Results from Year 1 for the ISLE Team

3. ISLE: Transfer in ICARUS. PI: Pat Langley

ICARUS Architecture Components:
- Conceptual inference: Icarus performs bottom-up inference from relational ground-state literals to higher-level state concepts.
- Skill execution: Icarus retrieves relevant skills for goals and executes them reactively.
- Skill learning: Icarus acquires general hierarchical reactive skills that explain/generate successful solution paths.
- Value learning: Icarus employs reinforcement learning to acquire a value function over game states using a factored state representation (a hierarchy of first-order predicates).

[Architecture diagram: perception and motor buffers connect the environment to long-term and short-term conceptual memories, a short-term goal/skill memory, and a long-term skill memory, linked by conceptual inference, skill retrieval, skill execution, problem solving, and skill learning. Long-term conceptual memory holds relational, hierarchical knowledge about concepts; conceptual inference generates beliefs from the observed environment and that knowledge; the short-term memories hold descriptions of perceived objects, inferred beliefs, and current goals and intentions; problem solving finds novel solutions for achieving goals, and skill learning acquires new skills from successful problem-solving traces.]

Testbeds: Urban Combat and GGP
- Urban Combat: a first-person real-time shooter game; the goal is to find and defuse IEDs. Addressed by learning new skills.
- GGP: logically defined, arbitrary rules of play. Addressed by learning a value function over game states.

Results: Urban Combat evaluation ongoing; GGP transfer ratio of 1.3 on TL 7, jump start of 20.
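The slide quotes a transfer ratio and a jump start for GGP. As an illustration (this is our own sketch, not ISLE's evaluation code, and uses one common definition of each metric; the program's official scoring may differ), both can be computed from paired learning curves:

```python
# Sketch of two transfer-learning metrics over learning curves.
# `transfer` and `scratch` are per-trial performance values for an agent
# that received transferred knowledge vs. one learning from scratch.

def jump_start(transfer, scratch):
    """Difference in initial (first-trial) performance."""
    return transfer[0] - scratch[0]

def transfer_ratio(transfer, scratch):
    """Ratio of areas under the two learning curves (one common definition)."""
    return sum(transfer) / sum(scratch)

# Hypothetical curves for illustration:
transfer_curve = [20, 40, 60, 80, 90]
scratch_curve = [0, 20, 40, 60, 80]
print(jump_start(transfer_curve, scratch_curve))      # 20
print(transfer_ratio(transfer_curve, scratch_curve))  # 1.45
```

A ratio above 1 indicates that transfer helped over the whole curve, while the jump start isolates the benefit at the very first trial.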

4. University of Michigan: Transfer in Soar. PI: John Laird

Problem/Objective:
- Study transfer learning using multiple online architectural learning mechanisms: chunking (EBL), reinforcement learning, semantic learning, and episodic learning.
- Determine their strengths and weaknesses, and develop reasoning strategies that maximize transfer.

Solution Approach/Accomplishments:
- Fair comparison of learning mechanisms: all use the same performance system.
- Integration and synthesis of multiple learning mechanisms and reasoning strategies on the same problem.
- Integrated Soar with the Urban Combat Testbed; three learning approaches demonstrated in UCT at transfer levels 0-2, with significant transfer.

Payoff:
- Not reliant on one mechanism; the best technique is used for a given problem, with positive interaction between methods.

[Architecture diagram: Soar's body, long-term procedural, semantic, and episodic memories, short-term memory, and decision procedure, connected to perception and action through chunking, episodic learning, semantic learning, and reinforcement learning.]

5. Northwestern University: Companions. PIs: Kenneth D. Forbus, Thomas Hinrichs

Problem/Objective:
- Extend the Companion Cognitive Systems architecture to achieve transfer learning.
- Advance analogical processing technology, and develop techniques for learning self-models.
- Test using the ETS Physics testbed.

Payoff:
- New techniques for robust near and far transfer learning; advances can be incorporated in other cognitive architectures and systems.
- Near term: Analogy Servers. Long range: the Companions architecture used in military/intelligence systems. Today's cluster is tomorrow's multi-core laptop.

Solution Approach/Accomplishments:
- Analogy approach based on how humans seem to do transfer: learning to solve problems by studying worked solutions.
- Study worked solutions to learn equations and modeling assumptions (e.g., when could something be treated as a point mass?).
- Pilot experiment: achieved transfer levels 1, 3, and 5 on an ETS-generated AP Physics test with worked solutions, sketches included.
- Years 2-3: learned strategies, encoding rules, and cases.

6. UT Arlington: Urban Combat Testbed (UCT). PIs: L. Holder, M. Youngblood, D. Cook

Vision/Goals:
- Develop the Urban Combat Testbed (UCT), a simulated, real-time, urban combat domain capable of generating tasks to evaluate transfer learning performance.
- Conduct significant human trials to evaluate human transfer learning performance.
- Disseminate UCT to the community as a benchmarking tool for cognitive performance.
- Investigate novel cognitive architectures for achieving transfer learning in Urban Combat and similar domains; achieve 70% of human transfer learning performance.

Highlights:
- The agent interface provides detailed, real-time perceptual information and command execution; the human interface provides a compelling video interface with keyboard/mouse commands.
- Develop scenarios for human and agent trials at each level of transfer; execute the trials and compare transfer learning performance.
- Investigate other approaches to transfer learning: human transfer learning, hierarchical reinforcement learning, and agent-based cognitive architectures.

Technical Details:
- UCT version 1.0 available: based on the Quake 3 Arena first-person shooter (FPS) game, enhanced to include realistic urban combat environments.
- The agent version provides an interface to game percepts and commands; the human-player version provides a standard interface as in commercial FPS games.
- Under development: a set of scenarios to evaluate different levels of transfer learning, random generation of scenarios, and the ability to log game interaction.

7. UT Arlington: Reinforcement Learning. PI: M. Huber

Technical Approach:
- Transfer of skill and concept hierarchies from training tasks to transfer tasks; transfer skills and concepts are found automatically and carry probability and value attributes.
- Transfer skills are extracted based on local system characteristics in the task domain; sub-skills are reward-independent, and transfer skills have an associated probabilistic model.
- Hierarchical concepts capture the capabilities of the skill set, the probabilistic behavior of skills, and the value attributes of the task domain.

Benefits and New Capabilities:
- The generated representation hierarchy and refinement process have bounded optimality properties: policies learned on the representation are within a bound of optimal.
- The approach provides skill and concept hierarchies for use as representations by reasoning systems.

Integration and Deliverables:
- Provides probabilistic and utility information to representation hierarchies in ICARUS; an explicit tie between reasoning structure and reinforcement learning.
- Generates new, capability-specific concepts that could serve as new predicates in Markov Logic Networks (MLNs); probabilistic attributes can facilitate fast integration into MLNs.

Integration and Delivery Milestones:
                 Year 1                     Year 2                  Year 3
  Integration:   ICARUS integration         MLN                     MLN/ICARUS
  Development:   Skill utility              Skill generalization    Skill extension
  Deliverables:  Prototype w/UCT interface  Prototype w/skill gen.  Final system
Example and Performance (Urban Combat Testbed):
- Training task: go to a flag. Transfer task: retrieve a different flag.
- Transfer from the training to the transfer task: 29 sub-skills and associated concepts; state-space reduction from 20,000 to 81 states.
- Transfer performance (transfer ratio, TR): TR 2.5 with skill transfer; TR 5 with skill and concept transfer.

Approach (skill and concept hierarchies):
- Selective, task-specific state-space construction with a hierarchical state representation.
- Task learning with skill and concept extraction: sub-skills are extracted via subgoal discovery, and concepts are learned that characterize skill capabilities.
- Transferred concepts and skills are used to construct a more abstract Bounded Parameter state representation; learning on this new, more compact representation improves learning performance.
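The state-space reduction above (20,000 concrete states down to 81 abstract ones) rests on mapping concrete states through learned concepts. A minimal sketch of that idea, with toy state variables and concepts of our own invention (not UTA's code or numbers):

```python
# Sketch of concept-based state abstraction: concrete states are mapped to
# tuples of concept truth values, collapsing a large concrete space into a
# small abstract one on which learning is much faster.
import itertools

def abstract_state(concrete_state, concepts):
    """Map a concrete state to the tuple of its concept truth values."""
    return tuple(c(concrete_state) for c in concepts)

# Two toy "concepts" over a grid state (x, y, has_flag):
concepts = [
    lambda s: s[2],             # holding the flag?
    lambda s: s[0] + s[1] < 5,  # near the start region?
]

concrete = list(itertools.product(range(100), range(100), [False, True]))
abstract = {abstract_state(s, concepts) for s in concrete}
print(len(concrete), "->", len(abstract))  # 20000 -> 4
```

Here 20,000 concrete states collapse to 4 abstract ones; with richer concept sets the abstract space grows, but it remains far smaller than the concrete one.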

8. University of Washington: Markov Logic. PI: Pedro Domingos

Problem/Objective:
- Transfer learning requires relational inference and learning, plus uncertain inference and learning.
- Markov logic provides this in a simple, general, unified framework.
- Needs: scaling to large problems; online, lifelong operation; extension to continuous data; extension to decision-making.

Approach/Accomplishments:
- Key approaches: representation mapping and statistical predicate invention.
- Accomplishments to date:
  - LazySAT: efficient use of memory (400,000 times less than WalkSAT on BibServ).
  - MC-SAT: fast mixed inference (more than 1,000 times faster than Gibbs sampling or tempering).
  - The Alchemy system; collaboration on integration with Icarus and others.

Payoff:
- Enables the highest levels of transfer: between relational structures, as opposed to surface descriptions.
- Enables transfer in the wild: noisy, rich, real-world domains, as opposed to shoehorning problems into standard machine-learning form.
- Broadly applicable AI technology; greatly increases the speed of adaptation.

[Diagram: Markov logic combines ILP, weight learning, WalkSAT, and MCMC to carry knowledge from a source domain to a target domain.]
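To make the Markov logic idea concrete, here is a toy grounding of a single weighted first-order clause (our own illustration, not Alchemy; predicate names and the weight are invented). A world's unnormalized probability is the exponential of the sum, over formulas, of weight times number of true groundings:

```python
# Toy Markov logic: one weighted clause, Friends(x,y) => (Smokes(x) <=> Smokes(y)),
# grounded over two constants. Unnormalized P(world) = exp(weight * n_true).
import math
from itertools import product

people = ["A", "B"]
world = {("Smokes", "A"): True, ("Smokes", "B"): False,
         ("Friends", "A", "B"): True, ("Friends", "B", "A"): True}

def clause_true(x, y):
    """Is the grounding Friends(x,y) => (Smokes(x) <=> Smokes(y)) satisfied?"""
    if not world.get(("Friends", x, y), False):
        return True  # false antecedent: clause trivially satisfied
    return world[("Smokes", x)] == world[("Smokes", y)]

weight = 1.5
n_true = sum(clause_true(x, y) for x, y in product(people, repeat=2))
print(n_true, math.exp(weight * n_true))
```

In this world only the (A,A) and (B,B) groundings are satisfied (n_true = 2), so violating the clause lowers, but does not zero out, the world's probability, which is exactly the "soft constraint" behavior the slide relies on.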

9. Cycorp. PI: Michael Witbrock

Problem/Objective: knowledge-based transfer learning.
- Supply background knowledge and well-encoded, logically meaningful domains and problem spaces.
- Elaborate on background knowledge and on knowledge gathered from source tasks and domains, informed by existing background knowledge in Cyc.

[Diagram: the Cyc KB (background knowledge) sits between a source testbed and a target testbed, exchanging situation, status, and queries for advice and support. It collects knowledge relevant to a task, domain, or problem; elaborates on it via inferential expansion, probabilistic weighting, and rule formation (ILP); and execution agents perform inference and supply advice, query results, and background knowledge.]

Payoff:
- Information flow among complementary learning and transfer mechanisms and approaches.
- A well-founded, mutually compatible base of assumptions and facts, which is necessary for transfer; systems can communicate observations, conclusions, skills, memories, and intentions.
- Learning can take full advantage of existing background knowledge, including knowledge from less obviously related domains and problems.
- New high-level, semantically connected knowledge within a context of existing knowledge: understanding.

Solution Approach/Accomplishments:
- Representation of the initial domains and solutions; existing knowledge relevant to the domains identified.
- Physics testing domain: encoding developed; first transfer-level problems represented in Cyc.
- Urban Combat (FPS) testbed: map space semantics defined; distribution being developed.
- Initial integration of probabilistic reasoning: the Alchemy system integrated and extended; scalability testing underway.
- Rule and skill learning underway; first automatically generated results from the evaluation domains; application of work from the BUTLER seedling.
- Outputs: new rules and skills (rule induction), new facts (automated knowledge acquisition), and expanded knowledge (inference and Markov logic).

10. Maryland/Lehigh: Hierarchical Task Networks. PIs: Dana Nau, Héctor Muñoz-Avila

Problem/Objective:
- Learn the applicability conditions of HTN methods, which tell how to decompose tasks into subtasks.
- Input: plan traces produced by an expert problem-solver, reflecting abstraction levels in the game.
- Output: methods consistent with the plan traces that can be transferred to different games.
- HTNs represent knowledge of different granularity at different levels, which facilitates transfer to different games.

Solution Approach/Accomplishments:
- Increasingly capable HTN learning algorithms: Y1, transfer levels 1-3; Y2, transfer levels 4-7; Y3, transfer levels 8-10.
- Approach: our new HDL algorithm, which can start with no prior information or with information transferred from a previous learning session.
- Accomplishments: development of the HDL algorithm, and theoretical conditions under which HDL achieves full convergence [paper at ICAPS-06].
- Experiments: even when only halfway to convergence, HDL solved more than 3/4 of the test set.

[Diagram: the HDL++ learning agent connects to the MadRTS real-time strategy game via TIELT, with a scenario generator and statistical methods to compare learning curves as the payoff for TL.]

11. Rutgers University: Relational Templates. PI: Michael Pazzani

Problem/Objective:
- Learn templates from Markov Logic Networks (MLNs), and learn MLNs from templates.
- Learn general concepts and strategies applicable across many domains, e.g., transitivity, thwarting, feigning.

Approach:
- Constrain the learning of MLN clauses.
- Create templates from MLN clauses by Least General Generalization. For example, the clauses
    SameVenue(a1,a2) v !SameVenue(a2,a3) v !SameVenue(a3,a1)
    SameTitle(a1,a2) v !SameTitle(a2,a3) v !SameTitle(a3,a1)
  generalize to the template
    P(a1,a2) v !P(a2,a3) v !P(a3,a1)

Payoff: speed-up. Clauses evaluated under each constraint:
  Constraint    Clauses evaluated
  None          320,968
  Unique        195,489
  Commutative   165,601
  Both          88,230
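The template step above (two clauses with the same argument pattern collapsing to one clause with a predicate variable P) can be sketched as follows. This is our own simplification of Least General Generalization, limited to generalizing over predicate names; the clause encoding is invented for illustration:

```python
# Each clause is a list of literals (predicate, args, negated).
# Literals in the same position across clauses that differ only in predicate
# name are generalized to a predicate variable "P".

def template(clauses):
    """All clauses must have the same length and argument pattern."""
    first = clauses[0]
    out = []
    for i, (pred, args, neg) in enumerate(first):
        preds = {c[i][0] for c in clauses}
        name = pred if len(preds) == 1 else "P"
        out.append((name, args, neg))
    return out

same_venue = [("SameVenue", ("a1", "a2"), False),
              ("SameVenue", ("a2", "a3"), True),
              ("SameVenue", ("a3", "a1"), True)]
same_title = [(p.replace("Venue", "Title"), a, n) for p, a, n in same_venue]

print(template([same_venue, same_title]))
# [('P', ('a1', 'a2'), False), ('P', ('a2', 'a3'), True), ('P', ('a3', 'a1'), True)]
```

Reading negated literals as "!", the result is exactly the slide's template P(a1,a2) v !P(a2,a3) v !P(a3,a1).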

12. UT Austin: Theory Refinement. PI: Mooney

Problem/Objective:
Develop transfer learning methods for Markov Logic Networks (MLNs) that:
- Efficiently revise the structure and parameters of knowledge learned in the source domain to fit a novel target domain.
- Automatically recover an effective mapping of the predicates in the source domain to those in the target domain.

Approach:
1. Find an effective predicate mapping.
2. Determine which parts of the source structure are still valid in the target domain and which need to be revised; annotate the source MLN accordingly.
3. Specialize only overly general clauses and generalize only overly specific ones, leaving the good ones unchanged.
4. Search for additional clauses using relational path-finding.

Summary:
- Faster learning in the target domain by efficiently transferring probabilistic relational knowledge using bottom-up theory refinement.
- Determine the appropriate predicate mapping by searching possible mappings for the most accurate one in the target domain.
- Use relational path-finding to more effectively construct new clauses in the target domain.

Experimental Results (Mihalkova & Mooney, ICML-06 TL workshop):
- Alchemy and our transfer algorithm equally improve accuracy over learning from scratch.
- Our approach significantly decreases learning time and the number of revision candidates.

13. UT Austin: Reinforcement Learning. PI: Peter Stone

Problem/Objective:
- Develop core architecture-independent, unified transfer learning technology for reinforcement learning.
- Key technical idea: transfer via inter-task mapping, a generalization of value-function-based transfer.
- Automatic discovery of the inter-task mapping: I-TAC (inter-task action correlation) and SME-QDBN (structure mapping plus qualitative dynamic Bayes networks).
- Both value-function-based and policy-based transfer; focus on results in many domains.
- Transfer of knowledge among reinforcement learning tasks within the same domain/testbed (RoboCup Soccer, GGP); compare with Icarus GGP performance.

Technical Approach:
- I-TAC (inter-task action correlation), a data-centered approach: train a classifier to map state-transition pairs to actions in the source, then use the classifier and the state mapping to obtain the action mapping.
- SME-QDBN (structure mapping plus qualitative dynamic Bayes networks), a knowledge/model-centered approach: represent action models using qualitative DBNs, with SME specialized and optimized for QDBNs using heuristic search.
- RoboCup soccer: value-function-based transfer (Sarsa learning with function approximators) and policy-based transfer (neuro-evolution, NEAT).
- GGP: value-function-based transfer, using symmetry to scale up within the same type of game and identifying game-tree features to transfer among different types of games.

[Results figure: a confusion matrix mapping 3v2 Keepaway actions (Hold, Pass 1, Pass 2) to 4v3 actions, plus GGP transfer results for I-TAC and SME-QDBN with transfer ratios (t.r.) of 4.3 to 5.8 and total transfer ratios (t.t.r.) of 73% to 88% on Connect-3 (4x4), CaptureGo (3x3), and Minichess (5x5).]

14. UT Austin: Mapping Value Functions. PI: Stone

Objective:
- The inter-task mapping can be decomposed into two parts: mappings of states (γ) and actions (β), and a transformation of the value-function representation (table-based or function approximation).
- Current work focuses on automatic discovery of the mappings of state variables and actions, via a data-oriented approach (I-TAC) and a model/knowledge-oriented approach (structure mapping).

Technical Approach: Inter-Task Action Correlation (I-TAC)
- A data-oriented approach that considers the mappings of states (γ) and actions (β) separately. Assume that γ is given; how can we learn β?
1. Collect transition data in the source domain.
2. Train a classifier C from state pairs to source actions.
3. Collect transition data in the target domain, and define β(a) as the source action a' that maximizes #{target tuples labeled a | C(γ(s1), γ(s2)) = a'}.

Technical Approach: SME-QDBN
- A model/knowledge-oriented approach using knowledge about how actions affect state variables and how state variables relate to each other.
- Use structure mapping to find similarities between the source and target tasks, discovering β and γ together.
- Representation: qualitative dynamic Bayes networks, with SME specialized and optimized for QDBNs. SME-QDBN uses heuristic search to find the mapping with the maximal score:
1. Generate local matches and calculate the conflict set for each local match.
2. Generate initial global mappings based on the immediate relations of local matches.
3. Merge global mappings that share common structures.
4. Search for a maximal global mapping with the highest score.

Results (Keepaway match scores): fraction of each target action's transitions classified as each source action.
  Target action   3v2 Hold     3v2 Pass 1   3v2 Pass 2
  3v2 Hold        382 (100%)   0 (0%)
  3v2 Pass 1      0 (0%)       330 (93%)    26 (7%)
  3v2 Pass 2      2 (<1%)      25 (8%)      297 (92%)
  4v3 Hold        227 (76%)    0 (0%)       71 (24%)
  4v3 Pass 1      1 (<1%)      174 (64%)    97 (36%)
  4v3 Pass 2      0 (0%)       133 (50%)
  4v3 Pass 3      1 (<1%)      51 (24%)     163 (76%)
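The I-TAC steps above can be sketched in a few lines. This is our own simplification (the trivial hold/pass "classifier," the identity γ, and the toy data are all invented); the real system trains a classifier on source transition data and votes over many target transitions:

```python
# Sketch of I-TAC action-mapping: label each target transition with the
# source action the classifier predicts for its (mapped) state pair, then
# take the majority vote per target action as beta(a).
from collections import Counter

def learn_beta(target_data, classify, gamma):
    """target_data: list of (s1, s2, target_action) transitions."""
    votes = {}
    for s1, s2, a_tgt in target_data:
        a_src = classify(gamma(s1), gamma(s2))
        votes.setdefault(a_tgt, Counter())[a_src] += 1
    # beta(a) = source action with the most votes for target action a
    return {a: c.most_common(1)[0][0] for a, c in votes.items()}

# Toy example: 1-D states; "hold" keeps the state, "pass" changes it.
classify = lambda s1, s2: "hold" if s1 == s2 else "pass"
gamma = lambda s: s  # identity state mapping, assumed given
target = [(0, 0, "4v3_hold"), (1, 1, "4v3_hold"),
          (0, 3, "4v3_pass"), (2, 5, "4v3_pass"), (3, 3, "4v3_pass")]
print(learn_beta(target, classify, gamma))
# {'4v3_hold': 'hold', '4v3_pass': 'pass'}
```

Note that 4v3_pass maps to "pass" despite one noisy hold-like transition, mirroring the majority percentages in the table above.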

15. UT Austin: Feature Construction. PI: Stone

Objective:
- Scale up from small to large versions of the same game: simultaneous updates of isomorphic states, exploiting symmetry to scale up RL in board games.
- Transfer between different small games: table-based learning, but transfer in feature space, with automated discovery of state features and initialization by feature matching.
- Domain: two-person, complete-information, turn-taking games.

Technical Approach:
- Verify the presence of symmetries on the smaller task (the larger task requires too much memory), then transfer the knowledge to the larger task, performing simultaneous backups for up to 8 transitions.
- Feature extraction and matching based on abstract game-tree expansion up to 2 levels.

Results (vs. random opponent):
- Connect-3 (4x4): t.r. = 4.3, t.t.r. = 73%; additional experiments on Minichess (5x5) and Othello (4x4), with t.r. = 1.66, t.t.r. = 56.7% in one condition and t.r. ~ 40, t.t.r. ~ 99% in another.

Findings:
- Limited-lookahead-based features are quick to extract and match, few in number (a manageable knowledge base), highly common and reusable, and faster than minimax lookahead against suboptimal opponents.

Future plan: abstraction, matching, transfer, and minimax lookahead.
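The "simultaneous update of isomorphic states" idea can be sketched concretely (our own toy, not the UT Austin system): whenever a board's value is updated, the same update is applied to every rotation and reflection of the board, so one experience covers up to 8 states on a square board.

```python
# Symmetry-sharing value updates for square boards (tuples of row tuples).

def symmetries(board):
    """All rotations and reflections of a square board."""
    boards, b = set(), board
    for _ in range(4):
        b = tuple(zip(*b[::-1]))               # rotate 90 degrees
        boards.add(b)
        boards.add(tuple(r[::-1] for r in b))  # and its reflection
    return boards

V = {}  # value table shared across symmetric images

def update(board, value):
    for b in symmetries(board):
        V[b] = value

board = (("X", "."), (".", "."))
update(board, 0.5)
print(len(V))  # 4: the X can sit in any of the four corners
```

On a 2x2 board with one mark the orbit has only 4 distinct images; for generic positions on larger boards the orbit has 8, which is where the "up to 8 transitions" backup factor comes from.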

16. Proposal for Year 2 from the ISLE Team

17. Changes from Initial Plans

Year 1:
- Full integration did not happen in Y1: the component systems (Icarus, Soar, Companions, LUTA, CaMeL) were developed independently and did not emerge as a single system. Ideas and/or subsystems of the component efforts will be integrated in later years.
- Little use of background knowledge in Y1. We still believe it is critical for taking full advantage of transfer opportunities, but Y1 concentrated on basic navigation and problem solving without exploiting deep semantic domain knowledge.
- Markov logic was not used in the Y1 testbed evaluations. The initial integration with ICARUS is finished, but efficiency issues advised against its use for Y1 tasks; improving efficiency is a top Y2 priority.

Year 2:
- Continuing with the three main architectures: development of the component systems will continue, and evaluation will focus on comparing and contrasting agent architectures.
- Focus on the highest transfer levels in all three testbeds: Urban Combat, Physics, and GGP.
- More interesting scientific results linked to key claims, but fewer total experimental conditions and less engineering.
- The project's management structure will change to a matrix organization.

18. Year 2 Matrix Management Structure

Technology Development (oversight: ISLE, Langley):
- ISLE (Langley): ICARUS
- UW (Domingos): Markov logic
- UW (Domingos): Alchemy
- Rutgers (Pazzani): relational templates
- UT (Mooney): theory revision
- Michigan (Laird): Soar extension
- NU (Forbus): Companions
- UT (Stone): LUTA extension
- Maryland (Nau): HTN planning
- ISLE (Konik): skill learning
- Cycorp (Witbrock): CYC integration

Experimental Evaluations (oversight: ISLE, Langley):
- ISLE (Shapiro): Urban Combat
- WSU (Holder): UC extensions
- ISLE (Stracuzzi): ICARUS on GGP
- NU (Forbus): Companions Physics
- UT (Stone): LUTA on GGP
- ISLE (Stracuzzi): GGP evaluation
- Cycorp (Matuszek): Physics evaluation
- Michigan (Laird): UC evaluation
- ISLE (Choi): UC evaluation
- ISLE (Konik): ICARUS Physics
- WSU (Holder): humans on UC

The technology work breaks down into extending Markov logic, integrating Markov logic and HTNs into ICARUS, and extending the other agent architectures. The evaluation efforts focus on GGP (external), Urban Combat (internal), and ETS Physics (external), each used with two agent architectures.

19. Expected Year 2 Products

Extended Alchemy software that includes:
- Techniques for inventing new predicates that support mapping across domains
- Methods for revising inference rules based on observed regularities (from UT Austin)
- Methods for using relational templates to learn from few instances (from Rutgers)
- The ability to access background knowledge from CYC (from Cycorp)

Extended ICARUS software that includes:
- Techniques for learning goal-oriented mappings that support transfer
- More flexible inference using Alchemy as a central module (from Washington)
- Extended methods for learning skills in adversarial contexts (from Maryland)
- Methods for combining skill learning with value learning (from UT Austin)

Extended versions of software for:
- Soar (Michigan) that supports transfer by semantic learning and chunking
- Companions (Northwestern) that supports transfer by deep structural analogy
- LUTA (UT Austin) that achieves transfer by knowledge-based feature construction

An extended Urban Combat testbed that:
- Includes a richer variety of objects, activities, and spatial settings
- Supports multi-agent coordination and multi-agent competition
- Allows tests of high-level transfer from urban military operations to search-and-rescue activities

20. Claims about Transfer Learning

Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable.
- Test: Design source/target scenarios that involve shared relational structures satisfying specified classes of transformations.
- Example: Draw source and target problems from branches of physics with established relations among statements and solutions.

Claim: Deep transfer depends on the ability to discover mappings between superficially different representations.
- Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals.
- Example: Define two games in GGP that are nearly equivalent but have no superficial relationship.

Meta-claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference.
- Test: Demonstrate deep transfer in testbeds that need these aspects of cognitive systems.
- Example: Develop transfer learning agents for Urban Combat, GGP, and Physics.

We will explore four paths to deep transfer:
- Predicate invention for representation mapping in Markov logic (Washington)
- Goal-directed solution analysis for hierarchical skill mapping (ISLE)
- Representation mapping through deep structural analogy (Northwestern)
- Semantic learning augmented with procedural chunking (Michigan)

21. ISLE Year 2 Plans for ICARUS

Unique Capabilities:
- Combine rapid analytic creation of hierarchical skills with statistical estimation of their utilities.
- Learn relational concepts that characterize the conditions under which skills achieve goals.
- Retrieve relevant skills even when the goals that index them match only incompletely.
- Acquire mappings among domain representations based on analysis of problem solution traces.
- Use these capabilities to support deep transfer.

Integration Plans:
- Replace the inference module with the Alchemy inference software (Washington).
- Augment the system with the CYC knowledge base (Cycorp).
- Incorporate HTN planning methods (Maryland).
- Add methods for learning value functions (UT Austin).

Mapping Concepts and Skills:
- ICARUS will not only learn hierarchical skills and concepts, but also how they map across different settings: source concepts to target concepts, and source skills to target skills.

Plans for Evaluation:
- We will demonstrate deep transfer in three separate testbeds with distinct characteristics: Urban Combat, problem solving in Physics, and General Game Playing.

22. Y2 Plans for ICARUS on GGP

Challenges:
- Domain independence: remove the assumption of chess-like games, and expand beyond common board games to consider puzzles and games with many players.
- Concept learning and revision: remove the assumption that domain-specific concepts will be provided; the agent must discover new concepts or revise existing ones.

Goals:
- Demonstrate discovery and transfer of structural domain knowledge, building on Y1 success with first-order concepts.
- Learn relationships among concepts to capture domain structure.
- Expand learning of relative concept utility into revision of concepts to improve their utility: generalize existing concepts to expand coverage, and specialize general concepts to improve utility.
- Derive new concepts from the game description.

New Technology: Concept Revision
1. Learn new domain-specific concepts.
2. Generalize these concepts to expand possible transfer opportunities.
3. Specialize them again in the target domain to increase utility.
Details vary, but the underlying structure is unchanged: domain-specific concept (source) -> generalized concept -> specialized concept (target).

New Technology: Concept Derivation
1. Derive basic concepts from the game description.
2. Evaluate their utility through experience.
3. Construct more complex structures by combining concepts and expanding the derivation, returning to the description and the derived concepts for further expansion.
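The concept-revision pipeline (learn, generalize, specialize) can be sketched with a toy relational concept represented as a set of conditions. All names here are hypothetical and ours, not ICARUS code; the point is only the shape of the transformation:

```python
# Generalize-then-specialize concept revision: dropping domain-specific
# conditions widens transfer opportunities; adding target-domain conditions
# restores utility in the new setting.

source_concept = {"in_a_row(3)", "piece(pawn)", "board(8x8)"}

def generalize(concept, domain_specific):
    """Drop conditions tied to the source domain."""
    return concept - domain_specific

def specialize(concept, target_conditions):
    """Add conditions tied to the target domain."""
    return concept | target_conditions

general = generalize(source_concept, {"piece(pawn)", "board(8x8)"})
target = specialize(general, {"board(4x4)"})
print(sorted(target))  # ['board(4x4)', 'in_a_row(3)']
```

The generalized concept ("three in a row," with no board or piece commitment) is what gets transferred; the specialization step rebinds it to the target game.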

23. Michigan Year 2 Plans for Soar

Soar provides extreme flexibility in every phase of transfer:
- Multiple performance methods, with task-dependent knowledge for abstraction, transformation, and instantiation.
- Multiple learning mechanisms.
- Different memories for different types of knowledge: procedural, semantic, episodic.

The general transfer cycle, from source problems to target problems:
- Perform the source task: generate behavior, sense the environment, and create an internal situational assessment.
- Identify elements that might be useful: everything, but literal (episodic); categories and structures (semantic); results of processing (chunking); explicit analysis (reflection).
- Create a general concept/skill/etc.: generalization based on multiple examples, and abstraction based on prior semantic knowledge.
- Store in memory for later recall, using the memory suited to the type of knowledge.
- Perform the target task: generate behavior, sense the environment, and create an internal situational assessment.
- Transform the current situation when expecting or searching for transfer.
- Retrieve a memory based on the transformed situation: automatic (procedural) for some memories, deliberate (semantic/episodic) for others.
- Transform or map the retrieved memory: explicitly map it to, or instantiate it for, the current situation.
- Use the transferred memory to impact behavior: control the selection of actions, or decide on a strategy or tactic.

[Architecture diagram: Soar's body, long-term procedural, semantic, and episodic memories, short-term memory, decision procedure, and appraisal detector, connected to perception and action through chunking, episodic learning, semantic learning, and reinforcement learning.]

24. Level 9 Transfer in Soar

Source: the hunted agent dies after getting trapped in a dead end; it learns the spatial configuration of the dead end and that dead ends are deadly to the hunted.
Target: the hunter tries to chase the hunted to a location it has recognized as a dead end.

Walking through the transfer cycle:
- Perform the source task: the hunted dies after getting trapped in a dead end.
- Identify elements that might be useful: death is feedback that it made a mistake.
- Create a general concept: use episodic knowledge to recall the behavior that led up to the death, analyze the spatial configuration, and use causal knowledge to determine the critical features.
- Store in memory for later recall: store the dead-end concept in semantic memory and associate the bad result with it.
- Perform the target task: as hunter, try to develop a strategy for killing the hunted.
- Transform the current situation: create an internal model of the hunted.
- Retrieve a memory based on the transformed situation: query memory ("what would be bad if I imagine myself as the hunted?") and retrieve the memory of the dead end.
- Use the transferred memory to impact behavior: search for dead ends and try to herd the hunted into them.

25. Level 10 Transfer in Soar

Source: 1v1 combat; the agent learns to pick up ammo to deny it to the enemy.
Target: fire rescue; the agent transforms this into removing gasoline near a fire.

Walking through the transfer cycle:
- Perform the source task: try to kill the enemy.
- Identify elements that might be useful: encounter a situation where it can pick up the enemy's ammo and realize that this would deny the enemy ammo.
- Create a general concept: use background knowledge to generalize to the concept "deny the enemy the resources it needs to hurt me."
- Store in memory for later recall: store the general concept in semantic memory.
- Perform the target task: as a fire rescuer, try to search the building (while avoiding dying, flames, etc.).
- Transform the current situation: analyze the situation and determine that the fire is its enemy.
- Retrieve a memory based on the transformed situation: query memory for ways to defeat an enemy and retrieve the general concept about resources.
- Use the transferred memory to impact behavior: instantiate the general concept in the current situation (resources map to air and fuel: wood, gasoline, etc.) and take actions to eliminate fuel.

26. Northwestern Year 2 Plans for Companions

Foundation: Analogical Processing
- Northwestern's technology is based on how humans seem to do transfer: by analogy and similarity, following Gentner's (1983) Structure-Mapping theory.
- Simulations of cognitive processes were engineered into components in prior DARPA research: SME (analogical matching, similarity estimation, comparison), SEQL (generalization), and MAC/FAC (similarity-based retrieval).
- Structure-mapping operations appear to be heavily used throughout human reasoning and learning. Hypothesis: human-like reasoning and learning can be achieved by making structure-mapping operations central in a cognitive architecture.

Approach:
- Extend the Companions Cognitive Systems architecture by creating and incorporating advances in analogical processing, and by developing techniques to learn self-models that help it formulate its own knowledge goals.
- Compare Companions and ICARUS in the physics testbed; help ISLE and Cycorp integrate our representations and support libraries into ICARUS, extending as necessary (e.g., sketching support).

Metrics:
- Coverage = fraction of the time an answer is generated. Accuracy = whether the answer is right (including partial credit).

Targets by year:
        Coverage             Accuracy
  Year  Near      Far        Near      Far
  1     50%       0%         50%       0%
  2     80%       50%        90%       80%
  3     90%       80%        90%

27 27 Northwestern Year 2 Plans for Far Transfer (7-10) A battery is like a pump Advice can be about appropriate analogs, mappings, analogical inferences Analogical encoding will let Companions work with more abstract advice Metamappings will guide cross-domain analogies by first matching general knowledge KB Persistent Mappings store ongoing understanding of cross-domain analogy Expanded self-modeling capabilities to improve skills and knowledge Need to study pulley problems more Need to figure out trig inverses

28 28 University of Washington Year 2 Plans Integration Evaluation Unique Contribution - Integrate into Icarus - Apply to Physics and Urban Combat - Transfer Level 8: - 60% of human performance - Transfer Levels 9-10: - 30% of human performance - Infrastructure: - Component-wise evaluation - White box evaluation Technologies Alchemy I CARUS Percepts Inferences - Representation mapping - Entities - Attributes - Relations - Ontologies - Situations - Events - Based on statistical predicate invention - Discover abstract relations, etc., & transfer - Infrastructure - Efficient inference and learning - Online, lifelong operation - Extension to continuous data - Extension to decision-making Predicate Invention Infrastructure Representation Mapping

29 29 UT Austin Year 2 Plans (Mooney)
Theory Refinement: Improve the system's ability to revise the structure of the source Markov Logic Network (MLN) to fit the target domain. Improve the efficiency of clause generalization and specialization procedures by using bottom-up search to directly identify productive changes rather than blindly searching the space of possible refinements. Improve generation of new clauses in the target domain by exploiting advanced ILP methods.
Predicate Mapping: Improve the system's ability to accurately map predicates from the source to the target domain. Use schema-mapping techniques from information integration to suggest predicate mappings by analyzing source and target data. Use lexical knowledge (e.g., WordNet) to guide matching of predicate names in source and target. Use heuristic search to improve the efficiency of finding the best overall predicate mapping.
Integration with Alchemy and ICARUS: Integrate predicate mapping and theory refinement methods into the UW Alchemy MLN software package. Integrate our transfer learning methods for MLNs into Icarus+Alchemy to provide transfer of static inferential knowledge from the source to the target domain.
Evaluation on Testbeds: In Urban Combat and other ISLE testbeds, measure the accuracy of transfer learning at making within-state inferences (using AUC) compared to learning an MLN from scratch, by adapting knowledge from source to target tasks for several levels of transfer. Measure training time of our system versus existing Alchemy to demonstrate improved efficiency. Compare an ablated version of Icarus without MLN transfer learning to the enhanced version on final testbed performance metrics and demonstrate improved performance.
Pipeline: Source Training Data → MLN Learner (Alchemy) → Source MLN → MLN Revision (with Target Training Data and Predicate Mapping) → Target MLN.
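As a rough illustration of the predicate-mapping idea, the sketch below scores candidate source-to-target predicate pairs by arity compatibility plus lexical similarity of their names (plain string similarity stands in for the WordNet knowledge the slide mentions) and searches for the best one-to-one mapping. The predicates and scoring weights are invented for illustration; the actual system uses more sophisticated heuristic search:

```python
import difflib
from itertools import permutations

# Score a candidate pairing of a source and target predicate.
# Predicates are (name, arity) pairs; an arity match earns a fixed
# bonus, and name similarity (stand-in for lexical knowledge) adds
# a fractional score.
def pair_score(src, tgt):
    name_sim = difflib.SequenceMatcher(None, src[0], tgt[0]).ratio()
    arity_bonus = 1.0 if src[1] == tgt[1] else 0.0
    return name_sim + arity_bonus

def best_mapping(source_preds, target_preds):
    """Exhaustive search over one-to-one mappings (fine for small
    sets; a real system would prune with heuristic search)."""
    best, best_score = None, float("-inf")
    for perm in permutations(target_preds, len(source_preds)):
        score = sum(pair_score(s, t) for s, t in zip(source_preds, perm))
        if score > best_score:
            best, best_score = dict(zip(source_preds, perm)), score
    return best

# Toy academic domains: the arity bonus drives the correct pairing
# even when predicate names share little surface similarity.
source = [("advisedBy", 2), ("professor", 1)]
target = [("student", 1), ("supervises", 2)]
m = best_mapping(source, target)
```

Here the binary predicate maps to the binary predicate and the unary to the unary, because arity compatibility outweighs name similarity in the invented scoring.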

30 30 Rutgers University Year 2 Plans Integration Evaluation Unique Contributions - Alchemy integration into Icarus - Apply to Physics and Urban Combat - Transfer Levels 9-10: - 50% of human performance - Infrastructure: - Component-wise evaluation: Alchemy Technologies - Learning and instantiating templates - Based on second-order learning - Discover general regularities & transfer - Entailment learning - Combining inductive and deductive learning - Discover simple rules and combine (e.g., OCCAM) - Infrastructure - Template learner for Markov Logic Networks - Deductive learner for Markov Logic Networks Templates Alchemy ICARUS Percepts Inferences MLNs Template Learning and Instantiation; Learning by entailment

31 31 Cycorp Year 2 Plans Technologies and Capabilities Problem/Objective: Knowledge-based transfer learning Supply formalized domain expertise and well-encoded, logically meaningful domains and problem spaces Elaborate on background knowledge via ILP and inference, provide advice, and extend knowledge gathered from source tasks and domains Informed by existing background knowledge in Cyc New Rules and Skills: Rule Induction via ILP New Facts: Domain & General Knowledge Expanded Knowledge: Inference, Advice, & Probabilities Technical Development Provide domain knowledge for use by Urban Combat, Physics, and GGP performers Provide inference capabilities, including query support, goal advice, and knowledge elaboration, for UCT, Physics, and GGP performers Pursue knowledge gathering and elaboration via ILP over domain and background knowledge Pursue inference speedup and results improvement via reinforcement learning of inference pathways Integration & Coordination Integrate Alchemy and other probabilistic reasoning approaches with Cyc's inference capabilities Tasks Responsibility for technical integration of Alchemy, Cyc, and other inference approaches Cyc background knowledge & inference capability Source Testbed Situation, Status, & Queries Advice, Support, & Elaboration Target Testbed Collect knowledge relevant to a task, domain, or problem Develop knowledge: Inferential expansion Probabilistic weighting Rule formation (ILP) Execution Agent(s) Perform inference; advice, query results, background, skills, and memories Responsibility for technical coordination of groups developing on the Physics testbed Payoff Information flow among complementary learning and transfer mechanisms and approaches Establish a well-founded, mutually compatible base of assumptions and facts – necessary for transfer Allow systems to communicate observations, conclusions, skills, memories, and intentions Learning can take full advantage of existing background knowledge, including knowledge from less obviously related domains and problems High-level, semantically connected knowledge, within a context of existing knowledge = understanding

32 32 Cycorp Year 2 Plans Evaluation and Integration
Coordination & Integration. Representation coverage: in each testbed, how many problems are represented? How many types of problem? What novel problem categories? How many and what types of obstacles, goals, percepts, and actions? What novel types of solution information? Accuracy: well-represented domains are critical for successful performance; accurate representations are demonstrated by successful agent evaluations.
[Diagram: the Cyc KB connected to the testbeds (Urban Combat, Physics, GGP), inference engines and approaches, and the ICARUS, Soar, and Companions architectures, exchanging queries, goals, search paths, elaborations, knowledge, analysis, and advice; domain knowledge, coordination and semantic content, background and problem representations, LTMs and follow-up queries, and large-scale ILP, knowledge seeking, and generalization flow between them.]
How many formal representations of problems and queries in different testbeds are shared by different architectures? How many inference requests, of how many types, go through a common interface? How effectively can knowledge be probabilistically qualified (as measured by cross-fold validation)?
Learning & Transfer: What novel fact-level knowledge gathered for the source is reused in the target space? How many facts, in what domains? How many rules can be obtained via ILP over gathered and domain knowledge, in what domains? What agent skills are obtainable by ILP within Cyc?
Advice-giving and query results: What appropriate, novel goals are presented? What improvement over random search can be obtained through advice?
Skills, abilities, and long-term memories: What novel abilities can agents demonstrate with knowledge and inference support? What new problems are solvable that could not be solved without that support?

33 33 Maryland/Lehigh Year 2 Plans
New Technologies: 1. Mapping between Icarus hierarchical representations and HTNs 2. Techniques for systematically extending planners to work in adversarial domains (i.e., multiple possible responses from an adversary) 3. Extensions to Icarus to learn in such domains
Solution and Evaluation: A mapping between Icarus hierarchical representations and Hierarchical Task Networks. New algorithms will provide capabilities to reason about adversaries, i.e., to learn about them and to plan against them. This will provide high-level transfer in adversarial environments via learning about abstract strategies/models of the behaviors of single adversaries or groups of adversaries in one scenario and transferring this knowledge to another scenario.
Contributions: 1. How: Generalize our planner-modification techniques to deal with adversaries 2. Work with ISLE to generalize Icarus learning to learn about adversaries 3. When: September-December 2006: develop the theory and implement the new algorithms; January-April 2007: work with the ISLE team to incorporate the algorithms into Icarus; May 2007: evaluation, using the GGP testbed for Year 2.
Capabilities: Icarus does plan abstraction by grouping actions. The groups are analogous to Hierarchical Task Network (HTN) decomposition templates (e.g., as in SHOP2). Planner-modification techniques for systematically generalizing planners to work with nondeterministic actions (i.e., multiple possible outcomes).
[Diagram: an action taken in one state leading to two possible adversary responses, alongside an action leading to two possible nondeterministic outcomes.]

34 34 UT Austin Year 2 Plans (Stone)
[Diagram: inter-task mappings over state variables (β) and actions (γ), the I-TAC and SME-QDBN components, and tic-tac-toe board examples.]
Unique Abilities: Automatic discovery of inter-task mappings. I-TAC (inter-task action correlation): train a classifier to map state-transition pairs to actions in the source; use the classifier and the state mapping to obtain the action mapping. SME-QDBN (structure mapping + qualitative dynamic Bayes nets): a knowledge/model-centered approach; represent action models using QDBNs; a specialized and optimized SME for QDBNs using heuristic search. Policy-based transfer.
Integration: Incorporate RL into Icarus and/or Soar – focus on leveraging action-value functions into generalizable planning knowledge; abstract learned RL knowledge to relational representations. ISLE team comparisons: compare value-function transfer vs. the Icarus approach in GGP.
Evaluation: Evaluate the same core algorithms in multiple domains. GGP (value-function-based): use symmetry to scale up within the same game types; game-tree features to transfer among different types of games; automatic abstraction discovery. RoboCup Soccer: value-function-based transfer (Sarsa, function approximators); policy-based transfer (neuroevolution, NEAT). Urban Combat: continued evaluation of the Year 1 effort. Core architecture-independent TL for reinforcement learning.
[Diagram: QDBN for the Hold action over Keepaway distance and angle state variables at t and t+1; the 3v2 task uses 13 inputs and 3 outputs, the 4v3 task 19 inputs and 4 outputs.]
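To illustrate the I-TAC idea in miniature: train a classifier from state transitions to actions in the source task, then classify target transitions (already projected through an assumed state mapping β) to recover a candidate action mapping γ. A nearest-centroid classifier on state deltas stands in for whatever classifier the real system uses, and the toy tasks below are invented:

```python
# Hypothetical sketch of I-TAC (inter-task action correlation).
# Transitions are (state, next_state, action) triples with tuple states.

def centroids(transitions):
    """Mean state delta per action, learned from source transitions."""
    sums, counts = {}, {}
    for s, s2, a in transitions:
        d = tuple(b - x for x, b in zip(s, s2))
        sums[a] = tuple(u + v for u, v in zip(sums.get(a, (0,) * len(d)), d))
        counts[a] = counts.get(a, 0) + 1
    return {a: tuple(v / counts[a] for v in sums[a]) for a in sums}

def classify(delta, cents):
    """Nearest source action by squared distance between deltas."""
    return min(cents, key=lambda a: sum((u - v) ** 2
                                        for u, v in zip(delta, cents[a])))

# Source: a 1-D corridor where actions move left or right.
source_trans = [((0,), (1,), "right"), ((2,), (3,), "right"),
                ((3,), (2,), "left"), ((1,), (0,), "left")]
cents = centroids(source_trans)

# Target transitions, one per target action, already projected
# through an assumed state mapping beta.
target_trans = {"fwd": ((5,), (6,)), "back": ((6,), (5,))}
gamma = {a: classify(tuple(b - x for x, b in zip(s, s2)), cents)
         for a, (s, s2) in target_trans.items()}
```

The resulting `gamma` maps each target action to the source action whose learned effect it most resembles, which is the action mapping the slide describes.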

35 35 Generalization of skills and concepts Policy homomorphisms as general skills Relational concepts and features from homomorphic mapping Relational Learning Reinforcement learning with relational concepts and generalized skills Estimation of skill/concept utility Skill utility to regulate exploration Concept utility to improve state hierarchy construction UT Arlington Year 2 Plans (Huber) Technical Approach Novel Capabilities Integration Evaluation Plans Generalization of skills and concepts, and estimation of skill and concept utilities to improve transfer Automatic definition of relevant representational concepts and utility-based guidance for efficient hierarchy construction and skill exploration in RL. Learning of generalized, parametric skills Generalized policies apply in novel situations and environments Skills have operator descriptions with utilities and probabilities Automatic generation of useful representational concepts Generation of task-relevant relations in the form of predicates Discovery of relevant feature sets and object types Automatic derivation of skill and concept utilities Concept utilities allow construction of appropriate representation Skill utilities guide exploration or guide planning Provides RL-based creation of hierarchical skills and concepts with symbolic representations, and skill and concept utilities to provide guidance on their use. 
RL-based skill learning component for use in ICARUS Learned operator representations facilitate integration of skills Learned features and concepts can augment concept hierarchy Skill and concept utilities for search and planning guidance Skill utility estimates can guide operator selection Concept utility can inform the representation investigated Development and Integration Timeline Skill generalization: Skill/concept utility: New capabilities will extend the set of transfer levels the Hierarchical RL system can address Evaluation within the Urban Combat Testbed (UCT) Application of standalone system to transfer levels 1-6 Evaluation focus on tasks with significant change of the environment and of the task objective Evaluation of performance using Transfer Ratio (TR) Target of TR values larger than 2 Evaluation of use of capabilities by evaluation of frequency and task utility of generalized skills Year 2 Year 3 Skill Hierarchy Concept Hierarchy Selective, task-specific state space construction Hierarchical state representation Task learning Skill and concept extraction Skill and concept generalization Skill and concept utility Development Integration
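One simple way to realize "skill utilities guide exploration," sketched under assumptions not in the slide: sample skills with probability proportional to a softmax of their estimated utilities, so higher-utility skills are tried more often without entirely starving the rest. The skill names, utility values, and temperature are illustrative only:

```python
import math
import random

# Utility-guided exploration: choose a skill via a softmax over
# estimated skill utilities. Lower temperature concentrates choice
# on the highest-utility skill.
def softmax_choice(utilities, temperature=1.0, rng=random):
    skills = list(utilities)
    weights = [math.exp(utilities[s] / temperature) for s in skills]
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for s, w in zip(skills, weights):
        acc += w
        if r <= acc:
            return s
    return skills[-1]

# Illustrative skill utilities (hypothetical names and values).
utilities = {"open-door": 2.0, "break-window": 0.5, "wait": -1.0}
rng = random.Random(0)  # seeded for reproducibility
picks = [softmax_choice(utilities, temperature=0.5, rng=rng)
         for _ in range(1000)]
```

With these values the high-utility skill dominates but the alternatives are still occasionally explored, which is the regulation effect the slide describes.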

36 36 Year 1 Evaluation Plans Comparison among architectures should reveal the conditions for successful transfer learning. But implementing agents that can operate in multiple testbeds takes considerable time and resources. Instead, we will develop agents within two architectures for each testbed, with only one (I CARUS ) being applied to all three of them. Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge. Urban Combat ETS Physics GGP Soar Companions LUTA I CARUS

37 37 Year 2 Evaluation Plans Comparison among architectures should reveal the conditions for successful transfer learning. But implementing agents that can operate in multiple testbeds takes considerable time and resources. Instead, we will develop agents within two architectures for each testbed, with only one (I CARUS ) being applied to all three of them. Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge. Urban Combat ETS Physics GGP Soar Companions LUTA I CARUS

38 38 Urban Combat Level 9 Transfer: Hunted to Hunter Hunted Hunter Learn that there is a path with very low visibility Learn that getting caught in a dead end is deadly Avoid path that goes near ambush places Learn to check the hidden path periodically Learn to try to trap hunted in dead end Discover a place that makes a good ambush Tactical reasoning and strategies; Symbolic and spatial representations Transfer Scenarios in Urban Combat Testbed (UCT)

39 39 Urban Combat Level 10 Transfer: 1v1 to Fire Rescue 1 vs. 1 Fire Rescue Pick up enemy ammo Avoid being seen or shot Don't get caught in a dead end Use doors, walls for protection Always have multiple exits Consume enemy resources Take advantage of terrain Always leave an out Tactical reasoning and strategies; Symbolic and spatial representations Transfer Backburn or backdraft Remove wood from fire's path Scenarios in Urban Combat Testbed (UCT)

40 40 Year 2 Plans for Urban Combat Evaluation
[Diagram: source (A) problems and target (B) problems feed an interface; experience yields performance on UCT TL0 and TL1; background knowledge (BK) plus transferred knowledge (TK) in the source supports BK in the target.]
Transfer Ratio > 30% (Y2 Go/No-Go) Source: 1v1, 2v2 Target: Fire Rescue
Transferred Knowledge (TK): tactical, terrain, resource, and spatial knowledge (e.g., ammunition, combustible, dead end, no exit)
Transfer Learning Performance: Transfer ratio (go/no-go) Demonstrate deep transfer Comparison to human trials
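The go/no-go transfer ratio can be computed from paired learning curves. A minimal sketch, assuming the ratio is defined as the relative gain in area under the learning curve with transfer versus without; the curve values below are invented:

```python
# Hypothetical transfer-ratio computation from learning curves.
# Curves are lists of per-trial performance scores.

def area(curve):
    """Area under a discretely sampled learning curve."""
    return sum(curve)

def transfer_ratio(with_transfer, without_transfer):
    """(area with transfer - area without) / area without."""
    base = area(without_transfer)
    return (area(with_transfer) - base) / base

scratch = [10, 20, 30, 40]    # learning from scratch
transfer = [30, 35, 40, 45]   # jump start from transferred knowledge
print(transfer_ratio(transfer, scratch))  # 0.5
```

Under this definition, the Year 2 go/no-go criterion "transfer ratio > 30%" would require the transfer curve's area to exceed the scratch curve's by more than 30%.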

41 41 Year 2 Plans for Urban Combat Evaluation Metrics –Hunter: Time it takes agent to kill opponent plus time penalties for health loss –Hunted: Inverse of time before opponent kills agent plus time penalties for health loss –1v1: Time to kill N opponents plus time penalties for health loss and fewer than N kills –Fire Rescue: Time to rescue ally from fire plus time penalties for health loss
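For concreteness, the 1v1 metric above (time to kill N opponents plus time penalties for health loss and for kills short of N) might be scored as follows; the penalty weights are assumptions for illustration, not values from the program:

```python
# Hypothetical scoring of the 1v1 Urban Combat metric: elapsed time
# plus penalties for health lost and for each missed kill. Lower is
# better. Penalty weights are illustrative assumptions.

def one_v_one_score(time_sec, kills, required_kills, health_lost,
                    penalty_per_health=0.5, penalty_per_missed_kill=60.0):
    missed = max(0, required_kills - kills)
    return (time_sec
            + penalty_per_health * health_lost
            + penalty_per_missed_kill * missed)

# 120 s elapsed, 1 of 2 required kills, 40 health lost:
print(one_v_one_score(time_sec=120, kills=1, required_kills=2,
                      health_lost=40))  # 200.0
```

The Hunter, Hunted, and Fire Rescue metrics on the slide have the same shape (a base time term plus additive penalties), so analogous scoring functions would differ only in the base term.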

42 42 Year 2 Plans for Urban Combat Evaluation Transferred knowledge –Hunted → Hunter (TL level 9) Spatial –Visibility, dead-ends, ambush places, terrain Tactical –Check (hunter) / seek (hunted) low-visibility areas –Trap in (hunter) / avoid (hunted) dead-ends –Seek (hunter) / avoid (hunted) ambush places –1v1 → Fire Rescue (TL level 10) Spatial –Accessibility of resources (ammunition / combustibles) –Dead-ends, exits, terrain Tactical –Consume enemy resources –Use terrain for protection –Always leave an out –Taxonomic –Types of terrain, spaces, resources, and tactics

43 43 Year 2 Plans for Urban Combat Evaluation Performance milestones –Based on TL levels 9 and 10 –Go/No Go: Transfer ratio > 30% –Demonstrate achievement of specific deep-transfer opportunities (e.g., ammunition → combustible) –Comparison to human trials Transfer ratios Deep transfer

44 Year 2 Experimental Plans for GGP
GGP Terms Game: defines the environment in which the agent operates. –Includes initial state, terminal states, transition function, goals –Score associated with goal and terminal states Match: competition between two agents in a game Scenario: source / target pairing of games –Source / target may vary in one or more ways (initial state, terminal states, goals, transitions) –Typically exactly one source and one target game
Protocol Two transfer levels: 9. Reformulating; 10. Differing Seven scenarios per level Multiple consecutive matches in each game Fixed opponent (non-learning) Players receive a score according to the goal Domain performance metric: score from the satisfied goal Domain performance goal: maximize score
Structural Transfer Level 9: Reformulating – source / target game graphs are isomorphic. The source / target game descriptions are fundamentally different: different axioms, different structural representations, but equivalent meaning (the same state-graph structure). Transfer must occur at the structural level.
Level 10: Differing – source / target game graphs share substructure corresponding to a transferable strategy. Example: Generalized Fork – opponent choices can lead to several possible successor states; the current goal value is 50; regardless of the opponent's move, the agent can reach a state with a higher goal value.
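The Level 9 condition (the same state-graph structure under different descriptions) can be checked directly for small games. A brute-force sketch, with toy games given as state sets plus directed transition edges; this is only feasible for tiny graphs, and real GGP games would need structural methods:

```python
from itertools import permutations

# Check whether two small game state graphs are isomorphic, i.e.,
# whether some renaming of states makes their transition edges match.
def isomorphic(states_a, edges_a, states_b, edges_b):
    if len(states_a) != len(states_b) or len(edges_a) != len(edges_b):
        return False
    sa, sb = sorted(states_a), sorted(states_b)
    ea = set(edges_a)
    for perm in permutations(sb):
        f = dict(zip(sa, perm))          # candidate state renaming
        if {(f[u], f[v]) for u, v in ea} == set(edges_b):
            return True
    return False

# The same 3-state chain under two different vocabularies
# (different "axioms," equivalent structure).
a = ({"start", "mid", "win"}, [("start", "mid"), ("mid", "win")])
b = ({"s0", "s1", "s2"}, [("s0", "s1"), ("s1", "s2")])
print(isomorphic(*a, *b))  # True
```

A Level 9 scenario pairs game descriptions for which this check (in principle) succeeds; Level 10 relaxes it to shared substructure rather than full isomorphism.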

45 45 Year 2 Plans for Physics Testbed All of Newtonian Dynamics –Requires calculus, higher-degree polynomials, graphs Two areas from Dynamical Analogies –Well-explored cross-domain analogies in various physical domains –Excellent venue for exploring distant transfer –Example: Domain A = linear motion; Domain B = rotational motion, thermal systems, hydraulics, electricity, …

46 46 Physics Testbed Dynamical Analogies in Detail

47 47 BACKUP SLIDES

48 48 Transfer Using Structure Mapping
Objective A model/knowledge-oriented approach Uses knowledge about –How actions affect state variables –How state variables relate to each other Use structure mapping to find similarities between source and target tasks Discover β and γ together
Technical Approach Qualitative DBNs –Dynamic Bayes networks are a structured representation for actions: an action (directly) affects a small number of state variables –Probabilities are less relevant; more qualitative properties matter: no change, increase/decrease, etc. Specialized and optimized SME for QDBNs –Fixed types of entities –How do entities match? –How are mappings evaluated? SME-QDBN uses heuristic search to find the mapping with the maximal score –Prunes with upper bounds
Algorithm 1. Generate local matches and calculate the conflict set for each local match; 2. Generate initial global mappings based on immediate relations of local matches; 3. Merge global mappings with common structures; 4. Search for a maximal global mapping with the highest score.
Results Keepaway [Diagram: QDBN for the Hold action over Keepaway distance and angle state variables at t and t+1]
Summary Works nicely for Keepaway Strong demand for domain knowledge Provides similarity measures for source and target Future work –Improve efficiency –Learn QDBN from data –Apply to GGP and Urban Combat
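A highly simplified sketch of the four algorithm steps above, under invented assumptions: local matches are type-compatible node pairs with a name-based score bonus, conflicts are matches that would reuse an already-matched node, and a greedy best-first merge stands in for the full heuristic search with upper-bound pruning:

```python
# Simplified stand-in for SME-QDBN's mapping construction.
# QDBN nodes are (name, type) pairs; types must match for a local
# match, and identical names earn a bonus (scores are assumptions).

def local_matches(src_nodes, tgt_nodes):
    """Step 1: score every type-compatible node pair."""
    return [((s, t), 1.0 + (0.5 if s[0] == t[0] else 0.0))
            for s in src_nodes for t in tgt_nodes if s[1] == t[1]]

def greedy_global_mapping(matches):
    """Steps 2-4, collapsed: grow one global mapping best-first,
    skipping any local match that conflicts (reuses a node)."""
    used_src, used_tgt, mapping = set(), set(), {}
    for (s, t), _ in sorted(matches, key=lambda item: -item[1]):
        if s not in used_src and t not in used_tgt:
            mapping[s] = t
            used_src.add(s)
            used_tgt.add(t)
    return mapping

src = [("Dist(K1,C)", "distance"), ("Hold", "action")]
tgt = [("Dist(K1,C)", "distance"), ("Pass", "action")]
m = greedy_global_mapping(local_matches(src, tgt))
```

The distance node maps to its identically named counterpart, while the action nodes pair up because they share a type; the real SME-QDBN additionally exploits the relational structure between nodes, which this sketch omits.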

49 49 Policy Transfer Using NEAT
Objective An alternative to value-function-based transfer Direct policy transfer based on the mappings of state variables (β) and actions (γ) NEAT (NeuroEvolution of Augmenting Topologies) –Uses genetic algorithms to evolve neural networks –Neural networks are used as action selectors
Results for Keepaway NEAT evolves 3v2 players (13 inputs, 3 outputs) Use a mapping (from β & γ) to transform organisms NEAT evolves 4v3 (19 inputs, 4 outputs) with the population from 3v2 t.t.r. = 80%, t.r. = 35
Results for Scheduling An autonomic computing task Task: determine in what order to process jobs Goal: maximize aggregate utility Source task: 2 job types (8 state variables, 8 actions) Target task: 4 job types (16 state variables, 16 actions) t.t.r. = 84%, t.r. = 5.8
[Plot: cost vs. episodes, learning from scratch vs. with transfer] Comparison of Sarsa and NEAT Taylor, Whiteson, & Stone (GECCO-06)
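The "transform organisms" step can be sketched as copying learned weights into the larger target network through the state-variable mapping β and action mapping γ, with unmapped new inputs initialized to zero. A plain weight matrix stands in for a NEAT genome here, and all dimensions and mappings are toy values, not the actual 13-to-19-input Keepaway transformation:

```python
# Hypothetical weight-transfer step for policy transfer: seed a
# larger network with weights from a smaller one via the input
# mapping beta and output mapping gamma.

def transfer_weights(old_w, beta, n_new_inputs, n_new_outputs, gamma):
    """old_w[i][j] is the weight from old input i to old output j.
    beta maps old input index -> new input index; gamma maps old
    output index -> new output index. Unmapped weights stay 0."""
    new_w = [[0.0] * n_new_outputs for _ in range(n_new_inputs)]
    for i, row in enumerate(old_w):
        for j, w in enumerate(row):
            new_w[beta[i]][gamma[j]] = w
    return new_w

old = [[1.0, 2.0], [3.0, 4.0]]   # toy network: 2 inputs, 2 outputs
beta = {0: 0, 1: 2}              # old inputs land at new slots 0 and 2
gamma = {0: 0, 1: 1}             # outputs keep their positions
new = transfer_weights(old, beta, 3, 3, gamma)
```

Evolution in the target task then continues from this seeded population rather than from random networks, which is the source of the jump start the results report.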

50 50 Evaluation Process Year 1: Near transfer (levels 1-6) –A = set of basic problems, B = transfer variations –Training runs include a quiz of four problems, followed by worked solutions –Experiment design worked out by ETS and NU ETS provided training + test examples for NU's research needs –Novel problems from the same templates were used for evaluation Tests were carried out on a sequestered NU cluster –5 nodes to ETS for Physics –A scripting language was developed to facilitate creation of experiments –Code frozen at the start of evaluation –Efforts to make Companions usable by others have been an important step towards making the architecture into a robust product

51 51 Good News from Evaluation With two-thirds of the data in, we are likely to achieve our 50% near-transfer goal A simple model works surprisingly well –Study a worked solution = store it –Extracted equations, modeling conditions, and sanity checks from prior problems Analogical mapping problems did arise –Some straightforward re-representation techniques may suffice for near transfer of concrete cases Most of the failures were due to limitations in parts of the system where learning currently does not take place –Provides clear examples to help drive Y2 research

52 52 Research Questions Raised by Evaluation so far Self-monitoring to learn self-models is important –25% of restructuring problems failed because of hard-wired resource bounds. A smarter system would figure out that it was making reasonable progress, and dynamically increasing the bounds would enable it to solve the problem. A really smart system would look over the pattern of activity, see a lot of redundancy, and figure out how to change its strategies to be more efficient Even black-box subsystems need to be extensible by learning –Example: ArcSineFn, ArcCosineFn left out of algebra system, causing 25% of the restructuring problems to fail. –In real world, such gaps are always possible –Cognitive systems must be adaptable enough to work around them.

53 53 Common materials developed for Physics Testbed Representational Infrastructure –Starting with ResearchCyc plus Northwestern's representations –Extended to include representation of worked solutions (NU, Cycorp, ETS) Support Libraries –Algebra package for symbolic and numerical equation solving, with Johan de Kleer (PARC, Inc.) –Units package, tightly integrated with Cycorp representations Sketching tool for creating sketches associated with worked solutions and problems –Modification of the Sketching Knowledge Entry Associate (sKEA) developed in earlier DARPA programs –Not deployed this year due to the tight evaluation schedule

54 54 Understanding Strengths and Weaknesses of Soar / ICARUS UCT stresses –Real time –Integration of reaction, decision making, and planning –Spatial reasoning Level 9 and 10 transfer stresses –Flexibility in reasoning –Mixing task performance and deliberate reflection –Using multiple learning mechanisms –Multiple strategies for transfer –Knowledge-based transfer

55 55 Michigan Year 2 Plans for Urban Combat What capabilities/technologies will you provide? –Transfer across variations at the highest levels: 9 – Reformulation, 10 – Differing –Transfer of tactical reasoning and strategies across symbolic and spatial representations –Use synergy across multiple learning mechanisms and general strategy-discovery methods –Compare and contrast across two cognitive architectures (Soar and ICARUS) How do you plan to evaluate those capabilities? –In variations of UCT scenarios and maps across very different goals: 1v1 hunter/hunted; 1v1, 2v2 combat engagements; search/rescue –How fast and how safely can they perform their tasks? How and when will they be integrated into the larger system(s)? –From day 1 they will be integrated in Soar & ICARUS – full cognitive architectures –All scenarios require complete end-to-end behavior What unique abilities will your technologies add? –Transfer across tasks requiring a combination of bottom-up, knowledge-based, and spatial reasoning –Real-time, on-line learning and transfer –Learning mechanisms that require small numbers of source trials

56 56 Example Level 10 Transfer Examples of general tactics learned from source: 1v1 engagements 1. Consume the opponent's resources –Pick up the enemy's ammo 2. Divide and conquer –Attack one enemy at a time 3. Attack from a distance –Attack with rifle at a distance 4. Minimize exposure –Take advantage of terrain 5. Always leave yourself an out 6. Sacrifice for the ultimate goal Examples of transfer to target (rescue from burning building): 1. Consume the fire's resources –Set a backburn/backfire –Remove fuel (wood) 2. Divide and conquer –Put out one fire at a time 3. Avoid getting close to the fire –Use ropes & tools to work at a distance 4. Minimize exposure to fire –Use barriers/doors 5. Always have a safe exit available 6. One fights (to the death) while the other rescues

57 57 Transfer of Spatial Knowledge Why spatial reasoning and knowledge? –Ubiquitous across all military domains, inherent to military tactics and strategy –Requires integration of symbolic and metric data Examples: –Types of surfaces that facilitate/hinder travel Not only speed travel but also are likely to provide unobstructed paths Roads and sidewalks vs. grassy areas and buildings Bridges over water –Placement of IEDs relative to structures (fixed and dynamic) Best places to search for IEDs – areas that can be ignored –Common structures and organization of buildings that aid searching (next slide) Transfer basic structure (in two-story houses, bedrooms are usually on the second floor) –Locations for attacking/defending from enemies Exposure to detection and fire Locations for setting an ambush / being ambushed Sniper positions How to outflank an opponent –Spatial organization of groups of agents Too difficult for Year 2 (maybe we can get to it in Year 3) Ability to provide cover to teammates How to search as a group How to attack and defend as a group

58 58 Combinations of Spatial and Symbolic Common spatial layouts of specific types of buildings –Movie theaters –Schools –Restaurants –Office buildings –Homes Common spatial layouts of specific types of rooms –Bathrooms –Offices –Classrooms –Theaters Correlation of signs and spatial structures –Exit signs –Street signs –Restrooms Will add graphics

