Oregon State University School of Electrical Engineering and Computer Science End-User Programming of Intelligent Learning Agents Prasad Tadepalli, Ron.

Oregon State University School of Electrical Engineering and Computer Science End-User Programming of Intelligent Learning Agents Prasad Tadepalli, Ron Metoyer, and Margaret Burnett In conjunction with the EUSES Consortium: End Users Shaping Effective Software

2 Prasad Tadepalli: Machine Learning Scaling average-reward reinforcement learning to large spaces. Relational learning. Relational learning from prior knowledge and sparse user input. Relational reinforcement learning.

3 Ron Metoyer: Computer Graphics & Animation NSF CAREER Award winner (2003). Complexities of animated content. –Creating characters for training. –Emphasis on usability and realism. Real-time simulation of evacuation dynamics for large crowds.

4 Margaret Burnett: Visual & End-User Programming Project director: EUSES Consortium (End Users Shaping Effective Software) An ITR project by Oregon State, Carnegie Mellon, Drexel, Nebraska, & Penn State. Principal architect: Forms/3, FAR end-user programming support. Co-architect: Functions for Excel users (a Microsoft Research project).

5 Motivation Task Training –Sports –Military Boston Dynamics Inc. Electronic Arts Who creates the training content?

6 Current Approaches Joystick Control: –User does all (once, not reusable). Scripting Languages –User does all (reusable program). Programming by Demonstration –User and system share. Autonomous Agents –System does all.

7 Application: Quarterback Training QBs can benefit from 3D training content. Coaches: –Do not program or animate. –Need responsive, semi-intelligent agents that perform football tasks. Agents: –Should get better over time. –Should do so with few examples. Agent behavior: –Must morph over time (different opponents).

8 End-User Programming by Demonstration Generalizing from demonstrations is still an active area of research: –Some approaches are viable under particular assumptions, but the problem is not solved. Other systems allow demonstrating only reactive behaviors. –Not used to train people in strategy. –Largely distinct from machine learning.

9 Our Approach to End-User Programming Our approach: demonstrate goals and strategies to achieve the goals. –Allows generalization and planning by agents. –Thus, suited to training: Agents can simulate both “good” characters for training (desirable strategies)... and “bad” characters (strategies we know they employ).

10 Example Goal: Get the football to Character A. –Demonstration: Start state, goal state. –Research issue: “What is relevant?” Any trees are ignorable background. Character A can be any character. The football is a unique object. [Figures: start state and goal state.]

11 Example (cont.) Strategy 1: Pass it directly. –Demonstration: Passing to A. –“What’s relevant” issues arise again. Strategy 2: Pass it to B, who passes to A. –New issue: recursiveness. (Need to learn a general strategy of “get it to someone who can get it closer to A”.)
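
The two strategies on this slide can be read as a base case and a recursive case. The following is a minimal sketch of that reading, not the authors' learned representation: the function name `pass_plan`, the 2D `positions` map, and the `max_range` passing limit are all assumptions introduced for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def pass_plan(holder, target, positions, max_range, visited=None):
    """Return the chain of characters the ball travels through, or None.

    Hypothetical model: a pass succeeds whenever the receiver is within
    max_range of the current holder.
    """
    visited = (visited or set()) | {holder}
    if holder == target:
        return [holder]
    # Strategy 1: pass it directly.
    if dist(positions[holder], positions[target]) <= max_range:
        return [holder, target]
    # Strategy 2: pass to someone in range who is closer to the target,
    # then apply the same strategy from there (the recursive step).
    here_to_target = dist(positions[holder], positions[target])
    candidates = [
        c for c in positions
        if c not in visited
        and dist(positions[holder], positions[c]) <= max_range
        and dist(positions[c], positions[target]) < here_to_target
    ]
    for c in sorted(candidates, key=lambda c: dist(positions[c], positions[target])):
        rest = pass_plan(c, target, positions, max_range, visited)
        if rest is not None:
            return [holder] + rest
    return None


field = {"QB": (0, 0), "B": (10, 0), "A": (20, 0)}
print(pass_plan("QB", "A", field, max_range=12))
```

Note that the character names never appear inside the function: "A" is just whichever character is passed as `target`, which mirrors the slide's point that Character A can be any character.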

12 Machine Learning Challenges Learning must be on-line. Users can only give a few examples. Provide a predictable model of generalization. Must include support for debugging. Must allow safety checks. Expressive representation language.

13 Strategy Languages Some high-level languages exist to express strategies, e.g., Golog, CML. Our plan: simpler rule-based languages, suitable for learning. –Starting point: our previous work on a decomposition-rule language: IF Condition(s) and Goal(s) THEN Subgoals(s1, s2, ..., sn) WHILE invariant conditions hold.
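
As a rough illustration of the IF/THEN/WHILE rule shape above, a decomposition rule could be held in a small record. This is a sketch under stated assumptions: the class `DecompositionRule`, the function `decompose`, the dictionary-of-facts state, and the fact names are hypothetical and are not taken from the authors' language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, bool]  # hypothetical: named boolean facts about the scene


@dataclass
class DecompositionRule:
    goal: str                           # the Goal the rule addresses
    condition: Callable[[State], bool]  # IF Condition(s)
    subgoals: List[str]                 # THEN Subgoals(s1, ..., sn), in order
    invariant: Callable[[State], bool]  # WHILE invariant conditions hold


def decompose(goal: str, state: State,
              rules: List[DecompositionRule]) -> Optional[List[str]]:
    """Return the subgoals of the first rule that applies, or None."""
    for rule in rules:
        if rule.goal == goal and rule.condition(state) and rule.invariant(state):
            return list(rule.subgoals)
    return None


# Hypothetical rule for the football example: relay the ball through B.
relay = DecompositionRule(
    goal="ball_at_A",
    condition=lambda s: s["B_in_range"],
    subgoals=["pass(QB, B)", "pass(B, A)"],
    invariant=lambda s: not s["ball_intercepted"],
)

print(decompose("ball_at_A", {"B_in_range": True, "ball_intercepted": False}, [relay]))
```

In this sketch the invariant is checked only when the rule is selected; an executor would also re-check it between subgoals and abandon the rule once it fails.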

14 Requirements of the Learning Algorithms Follow HCI findings: –User motivation, attention, trust. Need transparent generalization procedure, e.g., no neural nets. Treat user input as examples of high-level specification of strategy... –...and fill in the details. –User “steers” agent behaviors to correct faulty generalizations. Assertions to monitor behavior. –Provided, inferred, and propagated.

15 Learning from Exercises Generate examples automatically by searching for successful plans. Bottom-up learning of skills. –Learn how to solve simple problems first. –Compose known strategies for solving subgoals to solve more complex goals.
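
One way to picture "generate examples automatically by searching for successful plans" is a breadth-first search over a small state space, where each plan found becomes a training example. This is only a sketch: the slide does not say which search method is used, and `find_plan`, the `successors` callback, and the toy passing graph are assumptions for illustration.

```python
from collections import deque


def find_plan(start, is_goal, successors, max_depth=10):
    """Breadth-first search for a shortest action sequence reaching a goal.

    successors(state) yields (action, next_state) pairs; states must be hashable.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        if len(plan) >= max_depth:
            continue
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None


# Toy exercise: the state is whoever holds the ball; an action is a pass.
in_range = {"QB": ["B"], "B": ["A"], "A": []}


def passes(holder):
    return [((holder, receiver), receiver) for receiver in in_range[holder]]


# A simple exercise first (one pass), then a harder one that contains it.
print(find_plan("B", lambda s: s == "A", passes))
print(find_plan("QB", lambda s: s == "A", passes))
```

The second plan ends with the first one, which is the kind of overlap that bottom-up learning would exploit when composing known strategies for subgoals.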

16 Oops! That’s Not Right! Debugging by end-user programmers. –When the agents pick the right strategy but it doesn’t work right. –When the agents pick the wrong strategies. These provide negative examples to the learning component.

17 How to Support Debugging? User/system collaboration. –User helps narrow the problem. –System revises its rules and runs them on the example until the user is satisfied. Testing and Assertions –Used for quality control, but designed specifically for end users. –Assertions will be used to rule out bad generalizations.

18 Debugging (cont.) Draws from our previous work on end-user software development: –WYSIWYT testing, fault localization, and assertions. –Surprise-Explain-Reward strategy: Empirically driven research. Draws from psychology to motivate desired behaviors via surprises (to arouse curiosity).

19 Research Issues How to learn from a small number of examples? How to let the user “speak” his/her own language? How to motivate the users and earn their trust? How to facilitate debugging and maintenance in a natural way? How to make learning safe?

20 Summary: The Research Question Is it possible to empower end users to program in evolving task-training environments using machine learning and programming by demonstration?

21 (The End)

22 Leftovers

24 How to Support Debugging? User/system collaboration. Builds on our previous work: –Motivating, suggesting, and supporting... –...end-user testing, end-user fault localization, and end-user assertions.

25 Web Navigation (**possibly cut) Navigation of the web to satisfy a goal: Students trying to find an appropriate school that matches their interests and constraints. Shoppers looking for bargain purchases. Traders searching for appropriate stocks to buy and sell. In each case, the system should learn to retrieve the target information efficiently.

26 Debugging Negative examples are used to specialize over-general rules. Maintain confidences of rules based on their support among the training examples and suggest possible incorrect rules. Encourage users to enter assertions to correct errors. Verify assertions during rule evaluation and warn the user if they are not valid.
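
The first two points on this slide can be sketched with rules represented as sets of required conditions. This is an illustrative assumption, not the authors' rule language: `confidence`, `specialize`, and the feature names below are hypothetical, and the specialization heuristic (add the condition shared by all covered positives that excludes the most covered negatives) is one simple choice among many.

```python
def confidence(rule, examples):
    """Fraction of the examples covered by the rule that are positive."""
    covered = [label for feats, label in examples if rule <= feats]
    return sum(covered) / len(covered) if covered else None


def specialize(rule, examples):
    """Add one condition so that the rule excludes covered negative examples."""
    pos = [feats for feats, label in examples if label and rule <= feats]
    neg = [feats for feats, label in examples if not label and rule <= feats]
    if not pos or not neg:
        return rule  # nothing to repair, or no positives to guide the repair
    # Conditions that every covered positive example satisfies.
    common = frozenset.intersection(*map(frozenset, pos)) - rule
    best, best_excluded = None, 0
    for cond in sorted(common):  # sorted for a deterministic choice
        excluded = sum(cond not in feats for feats in neg)
        if excluded > best_excluded:
            best, best_excluded = cond, excluded
    return rule | {best} if best is not None else rule


# Hypothetical examples: (conditions true in the situation, did the pass work?)
examples = [
    (frozenset({"in_range", "receiver_open"}), True),
    (frozenset({"in_range", "receiver_open", "windy"}), True),
    (frozenset({"in_range"}), False),  # negative example from debugging
]

over_general = frozenset({"in_range"})
repaired = specialize(over_general, examples)
print(confidence(over_general, examples), sorted(repaired), confidence(repaired, examples))
```

A rule whose confidence stays low after specialization is the kind of rule the system could flag to the user as possibly incorrect.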

27 Agent Behavior Control Joystick controlled. Scripting languages. Autonomous. End-user agents: autonomous but “teachable”. –Program by interaction. –Generalize.
