
1 Reinforcement Learning of Context Models for Ubiquitous Computing. Sofia Zaidenberg, Laboratoire d'Informatique de Grenoble, PRIMA Group. Under the supervision of Patrick Reignier and James L. Crowley. PULSAR Research Meeting, 17/11/2010

2 Ambient Computing. Ubiquitous computing (ubicomp) [Weiser, 1991] [Weiser, 1994] [Weiser and Brown, 1996]

3–4 (image-only slides)

5 Ambient Computing. "Autistic" devices: independent, heterogeneous, unaware of each other. Ubiquitous systems: accompany without imposing, stay in the periphery of attention, invisible. Calm computing.

6 15 years later… Ubiquitous computing is already here [Bell and Dourish, 2007], but it is not exactly what we expected: Singapore, "the intelligent island"; U-Korea. Not seamless but messy; not invisible but flashy. Characterized by improvisation and appropriation. Engaging user experiences [Rogers, 2006].

7 New directions. Study people's habits and ways of living and create technologies based on them, not the opposite [Barton and Pierce, 2006; Pascoe et al., 2007; Taylor et al., 2007; Jose, 2008]. Redefine smart technologies [Rogers, 2006]. Our goal: provide an AmI application assisting the user in their everyday activities.

8 Our goal. 1. Perception 2. Decision

9 Proposed solution

10 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

11 Proposed system

12 Constraints. The system adapts to changes in the environment and in user preferences.

13 Example. hyperion: Reminder!

14 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

15 User study. Why a general-public user study? Weiser's original vision of calm computing revealed itself to be unsuited to current user needs. Objective: evaluate expectations and needs vis-à-vis ambient computing and its usages.

16 Terms of the study. 26 interviewed subjects, non-experts, distributed as follows:

17 Terms of the study. One-hour interviews with open discussion, supported by an interactive model. Questions about the advantages and drawbacks of a ubiquitous assistant; user ideas about interesting and useful usages.

18 Results. 44% of subjects interested, 13% conquered. Profiles of interested subjects: very busy people, experiencing cognitive overload. Learning considered a plus: more reliable system; gradual training vs. heavy configuration; simple and pleasant training (one click).

19 Results. Short learning phase. Explanations are essential. Interactions: depending on the subject; optional debriefing phase. Mistakes accepted as long as the consequences are not critical. Use control. Reveals subconscious customs. Worry about dependence.

20 Conclusions. Constraints: not a black box [Bellotti and Edwards, 2001] (intelligibility and accountability); simple, non-intrusive training; short training period and fast re-adaptation to preference changes; coherent initial behavior. Build a ubiquitous assistant based on these constraints.

21 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

22 Ubiquitous system. hyperion: Reminder! Uses the existing devices: heterogeneous, scattered. System needs: multiplatform system, distributed system, communication protocol, dynamic service discovery, easy deployment [Emonet et al., 2006].

23 Modules interconnection. Sensors / Actuators

24 Example of message exchange. hyperion asks protee: "Text2Speech on protee?" — "No…" — hyperion installs and starts Text2Speech from the bundle repository — "Text2Speech on protee?" — "Yes!"

25 Database. Contains static knowledge and the history of events and actions (to provide explanations). Centralized: queried and fed by all modules on all devices. Simplifies queries.

26 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model (Reinforcement learning; Applying reinforcement learning); Experiments and results; Conclusion

27 Reminder: our constraints. Simple training; fast learning; initial behavior consistency; lifelong learning; explanations. Supervised approach: [Brdiczka et al., 2007]

28 Reinforcement learning (RL). Markov property: the state at time t depends only on the state at time t−1.

29 Standard algorithm Q -Learning [Watkins, 1989] Updates Q -values on a new experience {state, action, next state, reward} Slow because evolution only when something happens Needs a lot of examples to learn a behavior 29PULSAR Research Meeting17/11/2010

30 Example. World model: far from the door + "go fast" = open the door.

31 DYNA architecture [Sutton, 1991]. Agent ↔ world; a DYNA switch routes the agent's action either to the real world or to the world model, each returning a state and a reward.

32 DYNA architecture. The policy is updated from real interactions with the environment and is also used and updated against the learned world model.
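The Dyna loop (real step, model update, then several simulated planning steps) can be sketched as below. The toy `ChainEnv` and all parameter values are hypothetical, added only to make the loop runnable; the thesis's environment is the real ubiquitous system.

```python
import random

class ChainEnv:
    """Toy environment (hypothetical): action 1 in state 0 reaches the
    goal with reward 1; action 0 leaves the agent in state 0."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        if self.s == 0 and a == 1:
            self.s = 1
            return 1, 1.0, True
        return 0, 0.0, False

def dyna_q(env, actions, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.9, epsilon=0.1):
    """Dyna-Q [Sutton, 1991]: each real step feeds the world model,
    then several planning steps replay stored transitions from it."""
    q, model = {}, {}

    def update(s, a, r, s2):
        best = max(q.get((s2, b), 0.0) for b in actions)
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (
            r + gamma * best - q.get((s, a), 0.0))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q.get((s, b), 0.0))
            s2, r, done = env.step(a)        # real interaction
            update(s, a, r, s2)              # direct RL update
            model[(s, a)] = (r, s2)          # world-model update
            for _ in range(planning_steps):  # simulated interactions
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

The planning steps are what make Dyna attractive for a ubiquitous assistant: many Q-value updates per (rare) real event.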

33 Global system. The environment provides perceptions that become states; the agent's policy chooses actions; rewards and examples are stored in the database, which feeds the world model; the policy is updated using both real interactions and the world model.

34 Modeling of the problem. Components: states and actions; transition model and reward model.

35 State space. Predicates: system predicates, environment predicates.

36 State space. State split: newEmail(from=directeur, to=) → notify; newEmail(from=newsletter, to=) → do not notify.
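The state split shown on this slide (one aggregated "new email" state refined into per-sender states) can be sketched as follows. The class and method names are hypothetical, added for illustration; only the predicate and argument values come from the slide.

```python
class PredicateState:
    """State described by predicate arguments; '*' is a wildcard that
    aggregates all values of an argument (hypothetical sketch)."""
    def __init__(self, name, args):
        self.name, self.args = name, dict(args)

    def covers(self, event_args):
        # A wildcard matches anything; concrete values must match exactly.
        return all(v == "*" or event_args.get(k) == v
                   for k, v in self.args.items())

    def split_on(self, key, values):
        """State split: replace the wildcard on `key` with one state per
        observed value (e.g. from=directeur vs. from=newsletter)."""
        return [PredicateState(self.name, {**self.args, key: v})
                for v in values]

# One aggregated state for all new emails...
s = PredicateState("newEmail", {"from": "*", "to": "*"})
# ...split on the sender once user feedback distinguishes senders:
s_boss, s_news = s.split_on("from", ["directeur", "newsletter"])
```

Splitting only when feedback differs keeps the state space small while still letting the assistant learn "notify for the director, not for newsletters".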

37 Modeling of the problem [Buffet, 2003]

38 Action space. Possible actions: forward a reminder to the user; notify of a new email; lock a computer screen; unlock a computer screen; pause the music playing on a computer; un-pause the music playing on a computer; do nothing.

39 Reward. Explicit reward: given through a non-intrusive user interface; problems with user rewards. Implicit reward: gathered from clues (a numerical value of lower amplitude); smoothing of the model.

40 World model (transition model and reward model). Built using supervised learning from real examples. Initialized using common sense: functional system from the beginning; initial model vs. initial Q-values [Kaelbling, 2004]; extensibility.

41 Transition model. From a starting state s1, an action or event plus modifications leads to a state s2, with a probability.

42 Supervised learning of the transition model. Database of examples {previous state, action, next state}.
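A minimal way to learn a transition model from such a database is maximum-likelihood counting over the stored triples. This is a generic sketch, not the thesis's implementation; the function name is illustrative.

```python
from collections import Counter, defaultdict

def learn_transition_model(examples):
    """Estimate P(next_state | state, action) by maximum likelihood from
    a database of (state, action, next_state) examples."""
    counts = defaultdict(Counter)
    for state, action, next_state in examples:
        counts[(state, action)][next_state] += 1
    # Normalize counts into probability distributions.
    model = {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        model[key] = {s2: n / total for s2, n in nexts.items()}
    return model
```

The resulting dictionary maps each (state, action) pair to a distribution over next states, which is exactly what Dyna's planning steps sample from.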

43 Global system (same diagram as slide 33).

44 Episode. Episode steps have two stages: select an event that modifies the state, then select an action to react to that event.

45 Episode. The RL agent interacts with the environment, the world model and the database; the world model is learned from real interactions, and Q-Learning updates the policy from each experience.

46 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

47 Experiments. General-public survey: qualitative evaluation. Quantitative evaluations in two steps: evaluation of the initial phase; evaluation of the system during normal functioning.

48 Evaluation 1: "about initial learning"

49 Evaluation 1: "about initial learning". Number of iterations per episode.

50 Evaluation 2: "interactions and learning"

51 Evaluation 2: "interactions and learning"

52 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

53 Contributions. Personalization of a ubiquitous system: without explicit specification; easy to evolve. Adaptation of indirect reinforcement learning to a real-world problem: construction of a world model; injection of initial knowledge. Deployment of a prototype.

54 Perspectives. Non-interactive analysis of data; user interactions; debriefing.

55 Conclusion. The assistant is a means of creating an ambient-intelligence application; the user is the one making it smart.

56 Thanks for your attention. Questions?

57 Bibliography
[Bellotti and Edwards, 2001] Victoria Bellotti and Keith Edwards. "Intelligibility and accountability: human considerations in context-aware systems". In Human-Computer Interaction, 2001.
[Brdiczka et al., 2007] Oliver Brdiczka, James L. Crowley and Patrick Reignier. "Learning Situation Models for Providing Context-Aware Services". In Proceedings of HCI International, 2007.
[Buffet, 2003] Olivier Buffet. "Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs". PhD thesis, Université Henri Poincaré, 2003.
[Emonet et al., 2006] Rémi Emonet, Dominique Vaufreydaz, Patrick Reignier and Julien Letessier. "O3MiSCID: an Object Oriented Opensource Middleware for Service Connection, Introspection and Discovery". In IEEE International Workshop on Services Integration in Pervasive Environments, 2006.
[Kaelbling, 2004] Leslie Pack Kaelbling. "Life-Sized Learning". Lecture at CSE Colloquia, 2004.
[Maes, 1994] Pattie Maes. "Agents that reduce work and information overload". In Commun. ACM, 1994.
[Maisonnasse, 2007] Jerome Maisonnasse, Nicolas Gourier, Patrick Reignier and James L. Crowley. "Machine awareness of attention for non-disruptive services". In HCI International, 2007.
[Moore, 1975] Gordon E. Moore. "Progress in digital integrated electronics". In Proc. IEEE International Electron Devices Meeting, 1975.

58 Bibliography
[Nonogaki and Ueda, 1991] Hajime Nonogaki and Hirotada Ueda. "FRIEND21 project: a construction of 21st century human interface". In CHI '91: Proceedings of the SIGCHI conference on Human factors in computing systems, 1991.
[Roman et al., 2002] Manuel Roman, Christopher K. Hess, Renato Cerqueira, Anand Ranganathan, Roy H. Campbell and Klara Nahrstedt. "Gaia: A Middleware Infrastructure to Enable Active Spaces". In IEEE Pervasive Computing, 2002.
[Sutton, 1991] Richard S. Sutton. "Dyna, an integrated architecture for learning, planning, and reacting". In SIGART Bull., 1991.
[Weiser, 1991] Mark Weiser. "The computer for the 21st century". In Scientific American, 1991.
[Weiser, 1994] Mark Weiser. "Some computer science issues in ubiquitous computing". In Commun. ACM, 1993.
[Weiser and Brown, 1996] Mark Weiser and John Seely Brown. "The coming age of calm technology". http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm, 1996.
[Watkins, 1989] C. J. C. H. Watkins. "Learning from Delayed Rewards". PhD thesis, University of Cambridge, 1989.

59 Ambient intelligence. 1. Perception 2. Decision

60 State of the art. FRIEND21 [Nonogaki and Ueda, 1991]; Gaia [Roman et al., 2002]; Blossom, by Sajid Sadi and Pattie Maes (http://consciousanima.net/projects/blossom/).

61 Personalization. Personalizing a complex software agent that assists the user. Two solutions [Maes, 1994]: the user specifies the behavior themselves (system too complex, laborious task, hardly evolvable); or a choice predefined by an expert (not personalized, not evolvable, the user does not master the whole system).

62 Outline. Problem statement; Learning in ubiquitous systems; General-public survey; Ubiquitous system; Reinforcement learning of the context model; Experiments and results; Conclusion

63 Ubiquitous system. hyperion: Reminder! Uses the existing devices: heterogeneous, scattered. System needs: multiplatform system, distributed system, communication protocol, dynamic service discovery, easy deployment [Emonet et al., 2006].

64 Example of message exchange. hyperion asks protee: "Text2Speech on protee?" — "No…" — hyperion installs and starts Text2Speech from the bundle repository — "Text2Speech on protee?" — "Yes!"

65 Modules interconnection. Sensors / Actuators

66 OMiSCID service

67 Definition of a state

68 Environment model. The model maps a state and an action or event to E[next state] and E[reward].

69 Reducing the state space (to speed up learning): state factorization and state splitting, using wildcards. Example Q-table entries: …entrance(isAlone=true, friendlyName=, btAddress=)… → pauseMusic, Q-value 125.3; …hasUnreadMail(from=boss, to=, subject=, body=)… → inform, Q-value 144.02; …hasUnreadMail(from=newsletter, to=, subject=, body=)… → notInform, Q-value 105.

70 The environment simulator

71 Evaluation criterion: the score. The result of RL is a Q-table; how do we know whether it is "good"? Learning is successful if the behavior matches the user's wishes, and better still if we explored a lot and have an estimate of the behavior in many states.

72 "The dashboard". Allows sending, with one click, the same events as the sensors.

73 Reward model. A set of entries specifying: constraints on certain state arguments; an action; the reward.

74 Reward model. Example entry: starting states s1, an action, and the reward −50.

75 Supervised learning of the reward model. The database contains examples {previous state, action, reward}.
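One simple way to learn such a reward model from the database is to average the observed rewards per (state, action) pair. This is a hypothetical sketch under that assumption, not the thesis's constraint-based implementation; the function name is illustrative.

```python
from collections import defaultdict

def learn_reward_model(examples):
    """Estimate E[reward | state, action] as the empirical mean over
    database examples (previous state, action, reward)."""
    sums = defaultdict(lambda: [0.0, 0])  # (state, action) -> [sum, count]
    for state, action, reward in examples:
        acc = sums[(state, action)]
        acc[0] += reward
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The averaged values play the role of the expected reward the world model returns during Dyna's simulated planning steps.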

