On the Evaluation of Semantic Web Service Matchmaking Systems. Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades. Pervasive Computing.

On the Evaluation of Semantic Web Service Matchmaking Systems
Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory
Department of Informatics and Telecommunications, University of Athens, Greece
ECOWS '06 @ Zurich

Outline
Introduction
Problem Statement
A Generalized Fuzzy Evaluation Scheme for Service Retrieval
Experimental Results
A Pragmatic View
Conclusions

SWS Matchmaking
Matching service requests and advertisements based on their semantic annotations (expressed through ontologies)
Numerous matchmaking approaches
– Logic-, similarity- and structure-based (graph matching)
Various matched entities
– Functional service parameters (e.g., IOPE attributes)
– Non-functional parameters (e.g., QoS attributes)
Ultimate goal: more effective service discovery, based on the semantics and not just the syntax of service descriptions

Degree of Match
A value that expresses how similar two entities are with respect to some similarity metric(s)
Important feature of almost all SWS matchmaking approaches
Allows for ranking of discovered services
Example DoM set: exact, plugin, subsumes, subsumed-by, fail
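A minimal Python sketch of how an ordered DoM set supports ranking. It assumes only the ordering implied by the example set (exact best, fail worst); the class `DoM`, the function `rank` and the integer codes are hypothetical illustrations, not part of any particular matchmaker.

```python
from enum import IntEnum


class DoM(IntEnum):
    """Example DoM set, ordered from worst to best (codes are illustrative)."""
    FAIL = 0
    SUBSUMED_BY = 1
    SUBSUMES = 2
    PLUGIN = 3
    EXACT = 4


def rank(matches: dict[str, DoM]) -> list[str]:
    """Order advertisements from best to worst DoM, dropping FAIL results."""
    hits = [s for s, dom in matches.items() if dom > DoM.FAIL]
    # sorted() is stable, so services with equal DoM keep their input order
    return sorted(hits, key=lambda s: matches[s], reverse=True)


matches = {"S1": DoM.PLUGIN, "S2": DoM.FAIL, "S3": DoM.EXACT, "S4": DoM.SUBSUMES}
print(rank(matches))  # ['S3', 'S1', 'S4']
```

Only the relative order of the codes matters here; the ranking would be unchanged under any other strictly increasing numbering.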

Evaluation Basics
Most works evaluate the performance of SWS discovery (i.e., response times, scalability)
Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services)
Q: possible service requests
S: advertisements of published services
e : Q×S→W (DoM, analogous to the Retrieval Status Value (RSV) in IR)
r : Q×S→W (expert mappings)
Evaluation is the determination of how closely vector e approximates vector r
[Figure: for a request R and advertisements S1, S2, ..., Sn, the expert supplies r(R,S1), r(R,S2), ..., r(R,Sn) and the matchmaking engine supplies e(R,S1), e(R,S2), ..., e(R,Sn)]

Evaluation Schemes
W is the set of values denoting DoM (for e) or degree of relevance (for r)
W defines different evaluation schemes (EVS):
Evaluation Scheme | RSVs, e(R,S_i) | Expert Mappings, r(R,S_i)
EVS1 | Boolean | Boolean
EVS2 | Multi-valued | Multi-valued

Boolean Evaluation (EVS1)
W = {0,1}
Information Retrieval (IR) measures can be used: Precision (P_B) and Recall (R_B)
– RT: set of retrieved advertisements
– RL: set of relevant advertisements
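For reference, the standard IR definitions are P_B = |RT ∩ RL| / |RT| and R_B = |RT ∩ RL| / |RL|. The sketch below computes them from two sets of advertisement ids; returning 0.0 for an empty RT or RL is an assumption of this sketch, not a convention taken from the slides, and the example sets are made up.

```python
def boolean_precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (P_B, R_B) for one request, given RT (retrieved) and RL (relevant)."""
    hits = len(retrieved & relevant)  # |RT ∩ RL|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


rt = {"S1", "S2", "S3", "S7"}
rl = {"S1", "S3", "S4"}
p_b, r_b = boolean_precision_recall(rt, rl)
print(f"P_B = {p_b:.2f}, R_B = {r_b:.2f}")  # P_B = 0.50, R_B = 0.67
```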

Problem Statement (1/2)
Since SWS matchmaking systems have multi-valued vectors e, applying Boolean evaluation implies introducing a relevance threshold
S_i | S1 | S2 | S3 | S4 | S5 | S6 | S7
e(R,S_i) | A | B | A | D | D | C | B
e'(R,S_i), threshold = "B" | 1 | 1 | 1 | 0 | 0 | 0 | 1
Problem 1: This "Booleanization" process filters out any service semantics captured through the DoM
Problem 2: An optimal threshold value is hard to find
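The Booleanization example (threshold "B") can be reproduced in a few lines of Python. The sketch assumes, as the example implies, that the grades are ordered A (best) to D (worst) and that every grade at or above the threshold maps to 1; `GRADES` and `booleanize` are hypothetical names.

```python
GRADES = "ABCD"  # hypothetical DoM grades from the example, best first


def booleanize(e: dict[str, str], threshold: str) -> dict[str, int]:
    """Map each graded DoM to 1 if it is at least as good as the threshold, else 0."""
    cutoff = GRADES.index(threshold)
    return {s: int(GRADES.index(g) <= cutoff) for s, g in e.items()}


e = {"S1": "A", "S2": "B", "S3": "A", "S4": "D", "S5": "D", "S6": "C", "S7": "B"}
print(booleanize(e, "B"))
# {'S1': 1, 'S2': 1, 'S3': 1, 'S4': 0, 'S5': 0, 'S6': 0, 'S7': 1}
```

S1 (grade A) and S2 (grade B) both end up as 1, which is the loss of DoM information named in Problem 1; moving the threshold to "A" or "C" changes which services survive, which is Problem 2.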

Problem Statement (2/2)
Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert
Experiment
– Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2, with W = {0, 0.25, 0.5, 0.75, 1}
– Calculation of the deviation from the existing Boolean mappings
Only ~33% of the Boolean mappings agree with the multi-valued ones
~40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
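A sketch of the deviation check, assuming deviation is the absolute difference between a Boolean mapping (0 or 1) and the expert's multi-valued mapping for the same request/advertisement pair. The function name and the data below are made up for illustration; they are not the TC2 mappings.

```python
def compare_mappings(boolean: dict[str, int], graded: dict[str, float],
                     tol: float = 0.25) -> tuple[float, float]:
    """Return (fraction that agree exactly, fraction with deviation > tol)."""
    devs = [abs(boolean[s] - graded[s]) for s in boolean]
    agree = sum(d == 0 for d in devs) / len(devs)
    far = sum(d > tol for d in devs) / len(devs)
    return agree, far


# Hypothetical mappings for four advertisements of one request.
boolean = {"S1": 1, "S2": 1, "S3": 0, "S4": 0}
graded = {"S1": 1.0, "S2": 0.5, "S3": 0.25, "S4": 0.0}
print(compare_mappings(boolean, graded))  # (0.5, 0.25)
```

All values in W are exact binary fractions, so the `d > tol` comparison at the 0.25 boundary is not disturbed by floating-point rounding.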

A Generalized Fuzzy Evaluation Scheme
Such a scheme (EVS2) can provide solutions to the aforementioned problems
Main design decisions
– Expert mappings are fuzzy linguistic terms
– DoM are fuzzy sets
– Boolean measures are replaced by generalized ones
Why fuzzy modeling?
– Relevance is an "amorphic" concept (L. Zadeh), i.e., its complexity prevents its mathematical definition
– Numeric values have vague semantics
– Fuzzy linguistic variables take values from a linguistic term set, with each term represented by a fuzzy set
– Warning: fuzziness does not refer to the matchmaking process per se

Fuzzification of e and r
[Figure: membership functions over Degree of Relevance in [0, 1] for the expert terms I: Irrelevant, S: Slightly relevant, SW: Somewhat relevant, R: Relevant, V: Very relevant; and over Degree of Match in [0, 1] for the DoM terms F: FAIL, SB: SUBSUMED-BY, S: SUBSUMES, P: PLUGIN, E: EXACT]
fe : Q×S→[0,1]
fr : Q×S→[0,1]
If there is no one-to-one correspondence between the numbers of fuzzy variables in the two sets, fuzzy modifiers could be used (e.g., dilators, concentrators)
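A minimal sketch of the fuzzification step, assuming five evenly spaced triangular membership functions on [0, 1] for each term set, as the figure suggests, and assuming each term is reduced to the peak of its triangle when a single value of fe or fr is needed. The spacing, the triangle shape and the names `peaks`, `triangular`, `FR` and `FE` are assumptions, not taken from the paper.

```python
RELEVANCE_TERMS = ["I", "S", "SW", "R", "V"]  # Irrelevant ... Very relevant
DOM_TERMS = ["F", "SB", "S", "P", "E"]        # FAIL ... EXACT


def peaks(terms: list[str]) -> dict[str, float]:
    """Place the terms' peaks evenly on [0, 1] (assumed layout)."""
    step = 1.0 / (len(terms) - 1)
    return {t: i * step for i, t in enumerate(terms)}


def triangular(x: float, peak: float, half_width: float) -> float:
    """Membership of x in a triangular fuzzy set centred on peak."""
    return max(0.0, 1.0 - abs(x - peak) / half_width)


FR = peaks(RELEVANCE_TERMS)  # expert term -> value of fr
FE = peaks(DOM_TERMS)        # DoM term -> value of fe

print(FR["R"], FE["P"])  # 0.75 0.75
# 0.6 belongs partly to "SW" and partly to "R"
print(round(triangular(0.6, FR["SW"], 0.25), 2), round(triangular(0.6, FR["R"], 0.25), 2))  # 0.6 0.4
```

Taking the peak is the simplest possible defuzzification; a centroid or another method could be substituted without changing the rest of the evaluation.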

Generalized Evaluation Measures
Based on [Buell and Kraft, "Performance measurement in a fuzzy retrieval system", 1981], generalized recall (R_G) and precision (P_G) measures are defined
The cardinalities of the sets RT and RL are replaced by fuzzy set cardinalities, since these sets are now fuzzy
Note: the evaluation measures take into account all services S_i
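One way to write such measures, assuming the sigma-count (sum of memberships) as fuzzy cardinality and min as fuzzy intersection, is P_G = Σ min(fe, fr) / Σ fe and R_G = Σ min(fe, fr) / Σ fr, with the sums taken over all services S_i. This is a sketch under those assumptions, not necessarily the exact formulas of the paper or of Buell and Kraft; the function name and data are hypothetical.

```python
def generalized_precision_recall(fe: dict[str, float],
                                 fr: dict[str, float]) -> tuple[float, float]:
    """Return (P_G, R_G) for one request; missing services count as 0."""
    services = fe.keys() | fr.keys()  # the measures range over all services
    overlap = sum(min(fe.get(s, 0.0), fr.get(s, 0.0)) for s in services)
    card_rt = sum(fe.get(s, 0.0) for s in services)  # sigma-count of RT
    card_rl = sum(fr.get(s, 0.0) for s in services)  # sigma-count of RL
    precision = overlap / card_rt if card_rt else 0.0
    recall = overlap / card_rl if card_rl else 0.0
    return precision, recall


fe = {"S1": 1.0, "S2": 0.5, "S3": 0.0}
fr = {"S1": 0.75, "S2": 0.25, "S3": 1.0}
p_g, r_g = generalized_precision_recall(fe, fr)
print(f"P_G = {p_g:.2f}, R_G = {r_g:.2f}")  # P_G = 0.67, R_G = 0.50
```

With crisp values in {0, 1}, min behaves like set intersection and the sums become ordinary counts, so these expressions reduce to P_B and R_B.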

Experimental Results (1/3)
Manual assessment of fuzzy relevance in the "Education" subset of TC v2
Matchmaking engine: OWLS-MX Matcher
– Used only logic-based matching algorithms
– Threshold = FAIL
Query ID | EVS1 R_B | EVS1 P_B | EVS2 R_G | EVS2 P_G
Q15 | 77% | | |
Q16 | 60% | 92% | 87% | 96%
Q17 | 57% | 92% | 77% | 89%
Q18 | 73% | 92% | 90% | 88%
Q19 | 100% | 65% | 100% | 71%
Q20 | 80% | 71% | 95% | 72%
The difference between R_G and R_B is due to the considerable deviation between the Boolean and fuzzy expert mappings

Experimental Results (2/3)
Sensitivity of the proposed scheme
Actual case | Hypothetical case
S1 somewhat relevant/FAIL (R_G = 87%) | S1 very relevant/FAIL (R_G = 84%, all others unchanged)
S2 irrelevant/SUBSUMES (P_G = 96%) | S2 irrelevant/EXACT (P_G = 93%, all others unchanged)
Only the generalized measures are affected by "stronger" false negatives/positives
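The sensitivity argument can be illustrated with a sigma-count form of generalized recall (sum of min(fe, fr) over sum of fr, an assumed form). With hypothetical values, turning a "somewhat relevant" false negative into a "very relevant" one lowers R_G, while R_B, which only sees relevant versus not relevant, does not move. The term values 0.5 and 1.0 assume evenly spaced terms, and the cut-offs used for the crisp version are also assumptions.

```python
def generalized_recall(fe: dict[str, float], fr: dict[str, float]) -> float:
    """Sigma-count recall: sum of min(fe, fr) over sum of fr (assumed form)."""
    return sum(min(fe[s], fr[s]) for s in fr) / sum(fr.values())


def boolean_recall(fe: dict[str, float], fr: dict[str, float]) -> float:
    """Crisp recall: retrieved means fe > 0, relevant means fr > 0 (assumed cut-offs)."""
    relevant = {s for s in fr if fr[s] > 0}
    retrieved = {s for s in fe if fe[s] > 0}
    return len(relevant & retrieved) / len(relevant)


fe = {"S1": 0.0, "S2": 1.0, "S3": 0.75}      # S1 gets FAIL
actual = {"S1": 0.5, "S2": 1.0, "S3": 0.75}  # S1 somewhat relevant
hypo = {**actual, "S1": 1.0}                 # S1 very relevant, rest unchanged

print(round(generalized_recall(fe, actual), 3), round(generalized_recall(fe, hypo), 3))  # 0.778 0.636
print(boolean_recall(fe, actual) == boolean_recall(fe, hypo))  # True
```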

Experimental Results (3/3)
Similar overall behavior, but better accuracy/sensitivity, as already shown
[Chart: EVS1, EVS1 (average), EVS2, EVS2 (average)]

A Pragmatic View
A reasonable assumption: experts are not willing to provide more than Boolean mappings
Automatic fuzzification of Boolean expert mappings would be valuable
[Figure: reasoning about "relevance" (statistics, logic implications, other inference rules) turns a Boolean value (e.g., "1") into an adjusted fuzzy value (e.g., "relevant")]

A First Approach
Services are represented as concepts and form a service profile ontology
Then an inference matrix is used for adjusting the Boolean r values
[Figure: the request R and services S1, S3, S5, S6 and S7 as concepts in the service profile ontology]
Logic relation | Eq | DSup | DSub | Sib | No
Boolean value | 1 | 1 | 1 | 1 | 1
Inferred fuzzy value | V | R | R | R | SW
Logic relation | Eq | DSup | DSub | Sib | No
Boolean value | 0 | 0 | 0 | 0 | 0
Inferred fuzzy value | SW | S | S | I | I
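A sketch of the inference matrix as a lookup table. The matrix entries come from the slide; the toy ontology `PARENT`, the helpers `relation` and `adjust`, and the reading of Eq/DSup/DSub/Sib/No as equivalent, direct superclass, direct subclass, sibling and no relation are assumptions made for illustration. Since DSup and DSub yield the same values in the matrix, the direction chosen for them does not affect the result.

```python
# Inference matrix from the slide: (logic relation, Boolean r value) -> fuzzy term
INFERENCE = {
    ("Eq", 1): "V", ("DSup", 1): "R", ("DSub", 1): "R", ("Sib", 1): "R", ("No", 1): "SW",
    ("Eq", 0): "SW", ("DSup", 0): "S", ("DSub", 0): "S", ("Sib", 0): "I", ("No", 0): "I",
}

# Hypothetical service profile ontology: concept -> direct parent concept
PARENT = {"Novel": "Book", "Textbook": "Book", "Book": "Publication",
          "Journal": "Publication"}


def relation(request: str, service: str) -> str:
    """Classify the relation between request and service concepts (assumed rules)."""
    if request == service:
        return "Eq"
    if PARENT.get(request) == service:
        return "DSup"  # service is the direct superclass of the request
    if PARENT.get(service) == request:
        return "DSub"  # service is a direct subclass of the request
    if PARENT.get(request) is not None and PARENT.get(request) == PARENT.get(service):
        return "Sib"   # both concepts share the same direct parent
    return "No"


def adjust(request: str, service: str, boolean_r: int) -> str:
    """Turn a Boolean expert mapping into an inferred fuzzy term."""
    return INFERENCE[(relation(request, service), boolean_r)]


print(adjust("Textbook", "Book", 1))     # R
print(adjust("Textbook", "Novel", 0))    # I
print(adjust("Textbook", "Journal", 1))  # SW
```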

Experimental Results
The new scheme (EVS2') approximates EVS2 better than EVS1 does
Under the assumption that EVS2 is more accurate, EVS2' seems promising
[Chart: EVS1, EVS1 (average), EVS2, EVS2 (average), EVS2']

Conclusions
Service retrieval evaluation should be semantics-aware
A generalization of the current evaluation measures is deemed necessary
Fuzzy set theory may assist in this direction
However, many practical issues remain open

Thank you! Questions?
http://p-comp.di.uoa.gr
