
1 On Choosing An Efficient Service Selection Mechanism In Dynamic Environments by Murat Sensoy & Pinar Yolum, Bogazici University, Istanbul, Turkey

2 OUTLINE
- Introduction
- Comparison of different service selection mechanisms
- Problem statement and proposed approach
- Evaluation
- Conclusions

3 INTRODUCTION
The Service Selection Problem: we examine the problem of service selection in an e-commerce setting, where consumer agents cooperate to identify the service providers that would best satisfy their service needs.

4 INTRODUCTION
- Using Selective Ratings (SR): ratings are taken only from agents that have similar demands.
- Using Context-Aware Ratings (CAR): the context of each rating is described using an ontology, so ratings are evaluated with respect to their context.
- Using Experiences: instead of ratings, the experiences of consumers are represented using ontologies and shared. An experience represents what was demanded and what was provided in response. Two approaches use experiences: parametric classification with a Gaussian Model (GM), and Case-Based Reasoning (CBR).

5 Comparison of Service Selection Methods
Simulations: 20 service providers, 400 service consumers, repeated 10 times.
Performance measures: ratio of satisfaction (the ratio of service decisions that resulted in satisfaction) and the time required for service selection.

6 SIMULATION ENVIRONMENT
Several factors are varied in the simulations:
Variations in service demand (P_CD): each service consumer changes its demand characteristics after receiving a service, with a predefined probability denoted P_CD.
Variations in service quality (PI): with a very small probability, providers deviate from their expected behavior in favor of consumers (they produce an absolutely satisfactory service). This probability is called the probability of indeterminism (PI).
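The two probabilities above can be sketched as a single simulation step. This is an illustrative toy model, not the paper's simulation code: the `Provider`/`Consumer` classes and all names are assumptions.

```python
import random

class Provider:
    """Toy provider: serves only the demands it knows (illustrative model)."""
    def __init__(self, known):
        self.known = set(known)

    def serves(self, demand):
        return demand in self.known

class Consumer:
    def __init__(self, demand, alternatives):
        self.demand = demand
        self.alternatives = list(alternatives)

    def new_demand(self):
        return random.choice(self.alternatives)

def simulate_step(consumer, provider, p_cd=0.1, pi=0.01):
    # With probability PI, the provider deviates in the consumer's favor
    # and produces an absolutely satisfactory service.
    if random.random() < pi:
        satisfied = True
    else:
        satisfied = provider.serves(consumer.demand)
    # With probability P_CD, the consumer changes its demand characteristics
    # after receiving the service.
    if random.random() < p_cd:
        consumer.demand = consumer.new_demand()
    return satisfied
```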

7 SIMULATION ENVIRONMENT
Variations in service satisfaction: the misleading similarity factor (β) is roughly the ratio of service consumers who have similar service demands but conflicting satisfaction criteria.
Example: β = 0.5 means that half of the consumers having similar demands have conflicting satisfaction criteria. In this case, half of the ratings given for such a demand will be misleading.

8 RATIO OF SATISFACTION

Configuration                              | Bad Performance  | Good Performance
P_CD = 0, β = 0, PI = 0                    | { }              | {SR, CAR, CBR, GM}
Consumers vary their demands (P_CD > 0)    | {SR}             | {CAR, CBR, GM}
Tastes of the consumers vary (β > 0)       | {SR, CAR}        | {CBR, GM}
Indeterminism (PI > 0)                     | {CBR}            | {SR, CAR, GM}
P_CD > 0, β > 0, PI > 0                    | {SR, CAR, CBR}   | {GM}

9 TIME CONSUMPTION

Method | Average Time Consumption (msec)
SR     | 0.9
CAR    | 10.6
CBR    | 502.6
GM     |

T_SR < T_CAR < T_CBR < T_GM

10 PROBLEM
A number of different service selection methods have been briefly described. Each of these approaches has different strengths and weaknesses under different configurations of the environment. The configuration of the environment is not observable: consumers can only observe the outcomes of their service selections. How will an agent select among these methods, given its trade-offs and a partially observable environment?

11 Using Reinforcement Learning To Choose A Service Selection Mechanism Dynamically
Reinforcement learning (RL) is an ideal learning technique for enabling agents to learn the environment and thus decide which strategy to use in a particular situation. Hence, we propose to use RL for choosing a service selection mechanism in dynamic environments.

12 Basics of Reinforcement Learning
In RL, an agent interacts with the environment.

13 Basics of Reinforcement Learning
The agent partially observes the states of the environment.

14 Basics of Reinforcement Learning
The agent has a number of actions to take in a given state of the environment.

15 Basics of Reinforcement Learning
As a result of this action, the new state of the environment is observed.

16 Basics of Reinforcement Learning
… and a reward is given. The purpose of RL is to construct an optimal action policy that maximizes the total reward (i.e., finds the best service providers throughout).
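The observe/act/reward cycle described in these slides can be sketched as a generic interaction loop. The `observe`, `step`, `choose`, and `learn` method names are illustrative assumptions, not an API from the paper.

```python
def run_episode(env, agent, steps=100):
    """Generic agent-environment interaction loop (illustrative sketch).

    Assumed interface: env.observe() -> state,
    env.step(action) -> (next_state, reward).
    """
    total_reward = 0.0
    state = env.observe()
    for _ in range(steps):
        action = agent.choose(state)           # pick an action in the current state
        next_state, reward = env.step(action)  # environment responds with state + reward
        agent.learn(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward
```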

17 SERVICE SELECTION & RL
Actions are choices of one of the service selection mechanisms (e.g., choosing context-aware ratings). Rewards are computed using the result of the current service selection mechanism and the trade-offs of the agent. States of the environment are observed in terms of the consequences of the agent's actions. In order to use standard RL techniques, we need a reward function and a set of discrete states.

18 Reward Function
The reward function reflects the trade-offs of the service consumers. The reward function used in this study gives:
- a negative reward after choosing an action if there is another action whose expected ratio of satisfaction is at least 10% better than that of the chosen action;
- a negative reward if the chosen action is at least 10% slower than another action whose ratio of satisfaction is at most 1% worse than that of the chosen action.
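The two penalty conditions above can be sketched as code. The transcript does not give the reward magnitudes, so -1.0/0.0 are placeholders, and "10% better/slower" is interpreted multiplicatively; both are assumptions.

```python
def reward(chosen, R, T):
    """Sketch of the two penalty conditions (placeholder magnitudes).

    R, T: dicts mapping each mechanism name to its expected ratio of
    satisfaction and its expected time consumption, respectively.
    """
    for other in R:
        if other == chosen:
            continue
        # Another action's expected satisfaction ratio is at least 10% better.
        if R[other] >= R[chosen] * 1.10:
            return -1.0
        # The chosen action is at least 10% slower than an action whose
        # satisfaction ratio is at most 1% worse than the chosen one's.
        if T[chosen] >= T[other] * 1.10 and R[other] >= R[chosen] * 0.99:
            return -1.0
    return 0.0
```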

19 States
Although we can parameterize the environment in our simulations (using β, P_CD, and PI), in real life these parameters are not visible to consumer agents. Consumer agents observe the environment through the consequences of their actions. Therefore, states of the environment are coded using the expected ratios of satisfaction of the known service selection mechanisms (actions), i.e., (R_SR, R_CAR, R_CBR, R_GM). For example, if the consumer observes that R_SR = 0.5, R_CAR = 0.9, R_CBR = 0.7, and R_GM = 0.95, then the agent observes the state of the environment as (0.5, 0.9, 0.7, 0.95).

20 States
Different values of (R_SR, R_CAR, R_CBR, R_GM) may represent different states of the environment. This results in a continuous state space, which must be discretized in order to use standard reinforcement learning approaches. We propose to use the k-means clustering algorithm to incrementally create discrete states, each of which encapsulates a portion of the continuous state space.

21 Discretization Example
Initially there is only one state, represented by a single cluster around the agent's initial observation, (0.5, 0.7, 0.9, 0.95).

22 Discretization Example
Subsequent observations of the agent are encapsulated by this cluster.

23 Discretization Example
If the within-cluster variance exceeds a predefined threshold, a new state and a corresponding new cluster are created.

24-27 Discretization Example (figure-only slides).

28 Determination of the Current State
Given the current observation of the agent and the states with their corresponding clusters (State 1, State 2, State 3), how can we determine the current state of the environment?

29 Determination of the Current State
The Euclidean distance from the current observation to the center of each cluster is computed. The current state is the nearest state (here, State 1).

30 Determination of the Current State
Then, k-means is used to update the clusters and compute the new cluster centers. If necessary, a new cluster (and a new state) is created.
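The nearest-cluster lookup and variance-triggered split described in these slides can be sketched as follows. This is a minimal illustration: the variance threshold, the splitting rule (the newest observation seeds the new cluster), and the data layout are all assumptions not specified in the transcript.

```python
import math

def nearest_state(observation, centers):
    """Index of the cluster center closest (Euclidean distance) to the observation."""
    def dist(center):
        return math.sqrt(sum((o - c) ** 2 for o, c in zip(observation, center)))
    return min(range(len(centers)), key=lambda i: dist(centers[i]))

def update_state_space(observation, clusters, threshold=0.05):
    """Assign the observation to its nearest cluster, recompute that cluster's
    center as the mean of its members, and split off a new cluster (a new
    discrete state) when the within-cluster variance exceeds the threshold.

    clusters: list of lists of observation tuples; threshold is illustrative.
    """
    centers = [tuple(sum(xs) / len(xs) for xs in zip(*c)) for c in clusters]
    i = nearest_state(observation, centers)
    clusters[i].append(observation)
    center = tuple(sum(xs) / len(xs) for xs in zip(*clusters[i]))
    variance = sum(sum((o - m) ** 2 for o, m in zip(obs, center))
                   for obs in clusters[i]) / len(clusters[i])
    if variance > threshold:
        # Split: the newest observation seeds a new cluster / discrete state.
        clusters.append([clusters[i].pop()])
    return i
```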

31 EVALUATION
We perform several runs to evaluate the proposed approach. In the first eight runs, the environment has only one configuration throughout the simulations. In the last run, the environment changes from the 1st configuration to the 8th configuration during the simulations. Results (comparing the ratios of satisfaction R_GM and R_RL and the time consumptions T_GM and T_RL of GM and the proposed RL approach):
- In several configurations, RL achieves the same (or almost the same) performance as GM while being 114, 95, or 46 times faster.
- In one configuration, performance is 10% less than that of GM; the primary choice of RL is GM.
- When the different configurations are combined, the performance of RL is slightly less than that of GM, and RL is 32 times faster than GM.

32 CONCLUSION
Our approach allows agents to learn how to choose the most useful service selection mechanism among different alternatives in dynamic environments. Our experiments show that consumers choose the most useful service selection mechanism using the proposed approach, and that its performance does not go below the lower bound defined by the trade-offs of the consumers. As future work, we plan to enable the online addition of new service selection mechanisms. We also plan to enable agents to share their observations of the environment.


34 Comparison of Service Selection Methods (figure: performance of the methods in terms of ratio of satisfaction, per configuration of the environment)
Each approach has the same performance.

35 Comparison of Service Selection Methods
The performance of the rating-based approach decreases when consumers vary their demands.

36 Comparison of Service Selection Methods
The performances of the rating-based approach and context-aware ratings decrease when the tastes of the consumers vary significantly.

37 Comparison of Service Selection Methods
The performance of GM is high and does not change across different configurations of the environment. The performance of the CBR approach decreases when providers produce services with a little indeterminism.

38 Comparison of Service Selection Methods (figure: time consumption of the methods, in msec)
T_SR < T_CAR < T_CBR < T_GM

39 INTRODUCTION
Previous approaches to service selection are mainly based on ratings. Ratings have two major drawbacks:
- Ratings disregard the context of the service demands.
- Ratings reflect the satisfaction criteria and taste of the raters.

40 Enrich ratings with context information (Context-Aware Ratings):
Define the context using an ontology and attach it to the rating. Consumers aggregate ratings from contexts that are similar to their current context.
Example: a consumer wants to buy a book.
- Context: buying a book
- Some context-aware ratings: a positive rating for buying a book from Amazon; a negative rating for buying a bicycle from Amazon.

41 MAIN IDEA of Experiences
Ratings reflect the satisfaction criteria and taste of the raters. So, instead of negative/positive ratings: tell me about your experiences, and let me evaluate them on my own.

42 EXPERIENCES
An experience of a consumer contains:
- the service demand of the consumer,
- the identity of the selected service provider,
- the service supplied in response to the demand,
- the date of the experience,
- the commitments between the consumer and the provider, if any.
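The five fields listed above can be captured in a simple record type. The slide describes experiences as ontology-based; this plain dataclass is only a structural sketch, and the field types are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Experience:
    """Structural sketch of an experience record (field types are illustrative)."""
    demand: dict                        # service demand of the consumer
    provider_id: str                    # identity of the selected service provider
    supplied_service: dict              # what was supplied in response to the demand
    when: date                          # date of the experience
    commitments: Optional[list] = None  # commitments between the parties, if any
```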

43 MAKING SERVICE DECISIONS USING EXPERIENCES
- Modeling service providers using a multivariate Gaussian model (parametric classification)
- Case-based reasoning

44 Comparison of Service Selection Methods
If demands of the consumers do not change significantly and the tastes of consumers are similar for a specific demand, ratings are better. If consumers significantly change their demands but their tastes are similar for a specific demand, using context-aware ratings is better. If consumers significantly change both their demands and their tastes, using experiences with CBR is better. In other cases, using experiences with GM is better.
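The four rules above can be distilled into a small selector. This is only a restatement of the slide as code, not something the paper implements (the agents cannot observe these environment properties directly, which is why RL is proposed).

```python
def pick_mechanism(demands_change, tastes_differ):
    """Rule-of-thumb selector distilled from the comparison above.

    demands_change: consumers significantly change their demands.
    tastes_differ: consumers' tastes conflict for a specific demand.
    """
    if not demands_change and not tastes_differ:
        return "SR"   # plain ratings are better
    if demands_change and not tastes_differ:
        return "CAR"  # context-aware ratings are better
    if demands_change and tastes_differ:
        return "CBR"  # experiences with case-based reasoning are better
    return "GM"       # in other cases, experiences with a Gaussian model
```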

45 Basics of Reinforcement Learning
Each state has a value in terms of the maximum discounted reward expected in that state: the expected value of a state s_t is the weighted sum of the rewards received when starting in state s_t and following the current policy. Action selection at each step is based on Q-values, which reflect the goodness of the actions: the Q-value Q(s, a) is the total discounted reward that the agent would receive if it started at state s, performed action a, and behaved optimally thereafter.

46 Basics of Reinforcement Learning
The purpose of RL is to construct an optimal action policy that maximizes the total reward (i.e., finds the best service providers throughout). There are different approaches to achieve this, such as Q-Learning and SARSA. We prefer SARSA in our work because it learns rapidly, and in the early part of learning its average policy is better than those of the other RL approaches.

47 SARSA
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]
where α is the learning rate, γ is the discount factor, r_{t+1} is the reward, Q(s_t, a_t) is the old Q-value, s_{t+1} is the next state, and a_{t+1} is the action selected in the next state according to the current policy.
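The SARSA update on this slide can be written as a single function. The dict-based Q-table and the α/γ values are illustrative choices, not taken from the paper.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).

    Q is a dict keyed by (state, action); missing entries default to 0.
    alpha (learning rate) and gamma (discount factor) values are illustrative.
    """
    old = Q.get((s, a), 0.0)
    target = r + gamma * Q.get((s_next, a_next), 0.0)  # reward + discounted next Q
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```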

48 Reward Function
Terminology:
R_X = the average ratio of service decisions that resulted in satisfaction when X is used for service selection. For example, R_CAR = 0.8 means that, on average, 80% of service decisions result in satisfaction of the consumer when context-aware ratings are used.
T_X = the average time consumed when X is used for service selection.

49-51 (figure-only slides: ratios of satisfaction of GM, SR, CBR, CAR, and RL, and the time ratio T_RL / T_GM)

