
1 Reinforcement Learning of Context Models for Ubiquitous Computing. Sofia Zaidenberg, Laboratoire d'Informatique de Grenoble, PRIMA Group. Under the supervision of Patrick Reignier and James L. Crowley. PULSAR Research Meeting, 17/11/2010

2 Ambient Computing. Ubiquitous computing (ubicomp) [Weiser, 1991] [Weiser, 1994] [Weiser and Brown, 1996]

3–4 (image-only slides)

5 Ambient Computing. "Autistic" devices: independent, heterogeneous, unaware of each other. Ubiquitous systems: accompany without imposing, stay in the periphery of attention, invisible. Calm computing.

6 15 years later… Ubiquitous computing is already here [Bell and Dourish, 2007], but it is not exactly what we expected: Singapore, "the intelligent island"; U-Korea. Not seamless but messy; not invisible but flashy. Characterized by improvisation and appropriation. Engaging user experiences [Rogers, 2006].

7 New directions. Study people's habits and ways of living and create technologies based on them, not the opposite [Barton and Pierce, 2006; Pascoe et al., 2007; Taylor et al., 2007; Jose, 2008]. Redefine smart technologies [Rogers, 2006]. Our goal: provide an AmI application assisting the user in their everyday activities.

8 Our goal. 1. Perception 2. Decision

9 Proposed solution

10 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

11 Proposed system

12 Constraints. The system adapts to changes in the environment and in user preferences.

13 Example. hyperion: Reminder!

14 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

15 User study. Why a general-public user study? Weiser's original vision of calm computing revealed itself to be unsuited to current user needs. Objective: evaluate expectations and needs vis-à-vis ambient computing and its usages.

16 Terms of the study. 26 interviewed subjects, non-experts, distributed as follows:

17 Terms of the study. One-hour interviews with open discussion, supported by an interactive model. Questions about the advantages and drawbacks of a ubiquitous assistant; user ideas about interesting and useful usages.

18 Results. 44% of subjects interested, 13% conquered. Profiles of interested subjects: very busy people, experiencing cognitive overload. Learning considered a plus: more reliable system; gradual training vs. heavy configuration; simple and pleasant training (one click).

19 Results. Short learning phase. Explanations are essential. Interactions: depending on the subject; optional debriefing phase. Mistakes accepted as long as the consequences are not critical. Use control. Reveals subconscious customs. Worry about dependence.

20 Conclusions. Constraints: not a black box [Bellotti and Edwards, 2001] (intelligibility and accountability); simple, non-intrusive training; short training period and fast re-adaptation to preference changes; coherent initial behavior. Build a ubiquitous assistant based on these constraints.

21 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

22 Ubiquitous system. hyperion: Reminder! Uses the existing devices: heterogeneous, scattered. System needs: multiplatform system, distributed system, communication protocol, dynamic service discovery, easy deployment [Emonet et al., 2006].

23 Modules interconnection. Sensors / Actuators

24 Example of message exchange. hyperion asks protee: "Text2Speech on protee?" — "No…" — hyperion installs and starts Text2Speech from the bundle repository — "Text2Speech on protee?" — "Yes!"

25 Database. Contains static knowledge and the history of events and actions (to provide explanations). Centralized: queried and fed by all modules on all devices. Simplifies queries.

26 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model (Reinforcement learning; Applying reinforcement learning); Experiments and results; Conclusion

27 Reminder: our constraints. Simple training; fast learning; initial behavior consistency; lifelong learning; explanations. Supervised approach: [Brdiczka et al., 2007]

28 Reinforcement learning (RL). Markov property: the state at time t depends only on the state at time t−1.

29 Standard algorithm Q -Learning [Watkins, 1989] Updates Q -values on a new experience {state, action, next state, reward} Slow because evolution only when something happens Needs a lot of examples to learn a behavior 29PULSAR Research Meeting17/11/2010

30 Example. World model: far from the door + "go fast" = open the door.

31 DYNA architecture [Sutton, 1991]. Agent ↔ world; a DYNA switch routes the agent's action either to the real world or to the world model, each returning a state and a reward.

32 DYNA architecture. The policy is updated from real interactions with the environment and is also used and updated against the learned world model.
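The Dyna loop (real step, model update, then several simulated planning steps) can be sketched as below. The toy `ChainEnv` and all parameter values are hypothetical, added only to make the loop runnable; the thesis's environment is the real ubiquitous system.

```python
import random

class ChainEnv:
    """Toy environment (hypothetical): action 1 in state 0 reaches the
    goal with reward 1; action 0 leaves the agent in state 0."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        if self.s == 0 and a == 1:
            self.s = 1
            return 1, 1.0, True
        return 0, 0.0, False

def dyna_q(env, actions, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.9, epsilon=0.1):
    """Dyna-Q [Sutton, 1991]: each real step feeds the world model,
    then several planning steps replay stored transitions from it."""
    q, model = {}, {}

    def update(s, a, r, s2):
        best = max(q.get((s2, b), 0.0) for b in actions)
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (
            r + gamma * best - q.get((s, a), 0.0))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q.get((s, b), 0.0))
            s2, r, done = env.step(a)        # real interaction
            update(s, a, r, s2)              # direct RL update
            model[(s, a)] = (r, s2)          # world-model update
            for _ in range(planning_steps):  # simulated interactions
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

The planning steps are what make Dyna attractive for a ubiquitous assistant: many Q-value updates per (rare) real event.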

33 Global system. The environment provides perceptions that become states; the agent's policy chooses actions; rewards and examples are stored in the database, which feeds the world model; the policy is updated using both real interactions and the world model.

34 Modeling of the problem. Components: states and actions; transition model and reward model.

35 State space. Predicates: system predicates, environment predicates.

36 State space. State split: newEmail(from=directeur, to=) → notify; newEmail(from=newsletter, to=) → do not notify.
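The state split shown on this slide (one aggregated "new email" state refined into per-sender states) can be sketched as follows. The class and method names are hypothetical, added for illustration; only the predicate and argument values come from the slide.

```python
class PredicateState:
    """State described by predicate arguments; '*' is a wildcard that
    aggregates all values of an argument (hypothetical sketch)."""
    def __init__(self, name, args):
        self.name, self.args = name, dict(args)

    def covers(self, event_args):
        # A wildcard matches anything; concrete values must match exactly.
        return all(v == "*" or event_args.get(k) == v
                   for k, v in self.args.items())

    def split_on(self, key, values):
        """State split: replace the wildcard on `key` with one state per
        observed value (e.g. from=directeur vs. from=newsletter)."""
        return [PredicateState(self.name, {**self.args, key: v})
                for v in values]

# One aggregated state for all new emails...
s = PredicateState("newEmail", {"from": "*", "to": "*"})
# ...split on the sender once user feedback distinguishes senders:
s_boss, s_news = s.split_on("from", ["directeur", "newsletter"])
```

Splitting only when feedback differs keeps the state space small while still letting the assistant learn "notify for the director, not for newsletters".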

37 Modeling of the problem [Buffet, 2003]

38 Action space. Possible actions: forward a reminder to the user; notify of a new email; lock a computer screen; unlock a computer screen; pause the music playing on a computer; un-pause the music playing on a computer; do nothing.

39 Reward. Explicit reward: given through a non-intrusive user interface; problems with user rewards. Implicit reward: gathered from clues (a numerical value of lower amplitude); smoothing of the model.

40 World model (transition model and reward model). Built using supervised learning from real examples. Initialized using common sense: functional system from the beginning; initial model vs. initial Q-values [Kaelbling, 2004]; extensibility.

41 Transition model. From a starting state s1, an action or event plus modifications leads to a state s2, with a probability.

42 Supervised learning of the transition model. Database of examples {previous state, action, next state}.
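A minimal way to learn a transition model from such a database is maximum-likelihood counting over the stored triples. This is a generic sketch, not the thesis's implementation; the function name is illustrative.

```python
from collections import Counter, defaultdict

def learn_transition_model(examples):
    """Estimate P(next_state | state, action) by maximum likelihood from
    a database of (state, action, next_state) examples."""
    counts = defaultdict(Counter)
    for state, action, next_state in examples:
        counts[(state, action)][next_state] += 1
    # Normalize counts into probability distributions.
    model = {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        model[key] = {s2: n / total for s2, n in nexts.items()}
    return model
```

The resulting dictionary maps each (state, action) pair to a distribution over next states, which is exactly what Dyna's planning steps sample from.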

43 Global system (same diagram as slide 33).

44 Episode. Episode steps have two stages: select an event that modifies the state, then select an action to react to that event.

45 Episode. The RL agent interacts with the environment, the world model and the database; the world model is learned from real interactions, and Q-Learning updates the policy from each experience.

46 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

47 Experiments. General-public survey: qualitative evaluation. Quantitative evaluations in two steps: evaluation of the initial phase; evaluation of the system during normal functioning.

48 Evaluation 1: "about initial learning"

49 Evaluation 1: "about initial learning". Number of iterations per episode.

50 Evaluation 2: "interactions and learning"

51 Evaluation 2: "interactions and learning"

52 Outline. Problem statement; Learning in ubiquitous systems; User study; Ubiquitous system; Reinforcement learning of a context model; Experiments and results; Conclusion

53 Contributions. Personalization of a ubiquitous system: without explicit specification; easy to evolve. Adaptation of indirect reinforcement learning to a real-world problem: construction of a world model; injection of initial knowledge. Deployment of a prototype.

54 Perspectives. Non-interactive analysis of data; user interactions; debriefing.

55 Conclusion. The assistant is a means of creating an ambient-intelligence application; the user is the one making it smart.

56 Thanks for your attention. Questions?

57 Bibliography
[Bellotti and Edwards, 2001] Victoria Bellotti and Keith Edwards. "Intelligibility and accountability: human considerations in context-aware systems". In Human-Computer Interaction, 2001.
[Brdiczka et al., 2007] Oliver Brdiczka, James L. Crowley and Patrick Reignier. "Learning Situation Models for Providing Context-Aware Services". In Proceedings of HCI International, 2007.
[Buffet, 2003] Olivier Buffet. "Une double approche modulaire de l'apprentissage par renforcement pour des agents intelligents adaptatifs". PhD thesis, Université Henri Poincaré, 2003.
[Emonet et al., 2006] Rémi Emonet, Dominique Vaufreydaz, Patrick Reignier and Julien Letessier. "O3MiSCID: an Object Oriented Opensource Middleware for Service Connection, Introspection and Discovery". In IEEE International Workshop on Services Integration in Pervasive Environments, 2006.
[Kaelbling, 2004] Leslie Pack Kaelbling. "Life-Sized Learning". Lecture at CSE Colloquia, 2004.
[Maes, 1994] Pattie Maes. "Agents that reduce work and information overload". In Commun. ACM, 1994.
[Maisonnasse, 2007] Jerome Maisonnasse, Nicolas Gourier, Patrick Reignier and James L. Crowley. "Machine awareness of attention for non-disruptive services". In HCI International, 2007.
[Moore, 1975] Gordon E. Moore. "Progress in digital integrated electronics". In Proc. IEEE International Electron Devices Meeting, 1975.

58 Bibliography
[Nonogaki and Ueda, 1991] Hajime Nonogaki and Hirotada Ueda. "FRIEND21 project: a construction of 21st century human interface". In CHI '91: Proceedings of the SIGCHI conference on Human factors in computing systems, 1991.
[Roman et al., 2002] Manuel Roman, Christopher K. Hess, Renato Cerqueira, Anand Ranganathan, Roy H. Campbell and Klara Nahrstedt. "Gaia: A Middleware Infrastructure to Enable Active Spaces". In IEEE Pervasive Computing, 2002.
[Sutton, 1991] Richard S. Sutton. "Dyna, an integrated architecture for learning, planning, and reacting". In SIGART Bull., 1991.
[Weiser, 1991] Mark Weiser. "The computer for the 21st century". In Scientific American, 1991.
[Weiser, 1994] Mark Weiser. "Some computer science issues in ubiquitous computing". In Commun. ACM, 1993.
[Weiser and Brown, 1996] Mark Weiser and John Seely Brown. "The coming age of calm technology". http://www.ubiq.com/hypertext/weiser/acmfuture2endnote.htm, 1996.
[Watkins, 1989] C. J. C. H. Watkins. "Learning from Delayed Rewards". PhD thesis, University of Cambridge, 1989.

59 Ambient intelligence. 1. Perception 2. Decision

60 State of the art. FRIEND21 [Nonogaki and Ueda, 1991]; Gaia [Roman et al., 2002]; Blossom, by Sajid Sadi and Pattie Maes (http://consciousanima.net/projects/blossom/).

61 Personalization. Personalizing a complex software agent that assists the user. Two solutions [Maes, 1994]: the user specifies the behavior themselves (system too complex, laborious task, hardly evolvable); or a choice predefined by an expert (not personalized, not evolvable, the user does not master the whole system).

62 Outline. Problem statement; Learning in ubiquitous systems; General-public survey; Ubiquitous system; Reinforcement learning of the context model; Experiments and results; Conclusion

63 Ubiquitous system. hyperion: Reminder! Uses the existing devices: heterogeneous, scattered. System needs: multiplatform system, distributed system, communication protocol, dynamic service discovery, easy deployment [Emonet et al., 2006].

64 Example of message exchange. hyperion asks protee: "Text2Speech on protee?" — "No…" — hyperion installs and starts Text2Speech from the bundle repository — "Text2Speech on protee?" — "Yes!"

65 Modules interconnection. Sensors / Actuators

66 OMiSCID service

67 Definition of a state

68 Environment model. The model maps a state and an action or event to E[next state] and E[reward].

69 Reducing the state space (to speed up learning): state factorization and state splitting, using wildcards. Example Q-table entries: …entrance(isAlone=true, friendlyName=, btAddress=)… → pauseMusic, Q-value 125.3; …hasUnreadMail(from=boss, to=, subject=, body=)… → inform, Q-value 144.02; …hasUnreadMail(from=newsletter, to=, subject=, body=)… → notInform, Q-value 105.

70 The environment simulator

71 Evaluation criterion: the score. The result of RL is a Q-table; how do we know whether it is "good"? Learning is successful if the behavior matches the user's wishes, and better still if we explored a lot and have an estimate of the behavior in many states.

72 "The dashboard". Allows sending, with one click, the same events as the sensors.

73 Reward model. A set of entries specifying: constraints on certain state arguments; an action; the reward.

74 Reward model. Example entry: starting states s1, an action, and the reward −50.

75 Supervised learning of the reward model. The database contains examples {previous state, action, reward}.
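One simple way to learn such a reward model from the database is to average the observed rewards per (state, action) pair. This is a hypothetical sketch under that assumption, not the thesis's constraint-based implementation; the function name is illustrative.

```python
from collections import defaultdict

def learn_reward_model(examples):
    """Estimate E[reward | state, action] as the empirical mean over
    database examples (previous state, action, reward)."""
    sums = defaultdict(lambda: [0.0, 0])  # (state, action) -> [sum, count]
    for state, action, reward in examples:
        acc = sums[(state, action)]
        acc[0] += reward
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The averaged values play the role of the expected reward the world model returns during Dyna's simulated planning steps.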

