Web Service Clustering: Building Homogeneous Service Communities
Wei Liu, Wilson Wong

2 Outline (22-Jun-05)
A brief introduction to:
– Web services
– Text mining
Web service clustering:
– The motivation
– The challenges
– The process
– The results

3 What are Web Services?
A web service is software designed to be used by other software via Internet protocols and formats. (Forrester)
Web services are self-describing components that can discover and engage other web services or applications to complete complex tasks over the Internet. (Sun Microsystems, Inc.)
Web services are loosely coupled software components delivered over the Internet via standards-based technologies like XML and SOAP. (Gartner)
A self-describing, self-contained, modular unit of application logic that provides some business functionality to other applications through an Internet connection… (…)
Web services are Internet-based, modular applications that perform a specific business task and conform to a particular technical format. (IBM)
A web service is application logic that is programmatically available, exposed using the Internet. (Microsoft)

4 The Web Service Triangle
Web services are applications accessible via the Web to be consumed by clients. Clients of a web service are usually referred to as service requesters.
Technologies standardized to support web service applications are:
– Web Services Description Language (WSDL), a W3C standard
– Simple Object Access Protocol (SOAP), a W3C standard
– Universal Description, Discovery and Integration (UDDI), an OASIS standard
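
The three corners of the triangle can be sketched as follows: a provider publishes a service description to a registry, a requester finds it there, and the requester then binds to the provider directly. This is a minimal in-memory sketch, not a UDDI or SOAP client; the class names, the example.org endpoint and the GetVerse operation are hypothetical. Note that find is a plain keyword match, which is the limitation of registry search discussed on slide 6.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """What a provider publishes (in practice a WSDL document)."""
    name: str
    endpoint: str          # hypothetical address the requester binds to
    keywords: frozenset    # lower-case terms a keyword-based registry indexes


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a UDDI-style registry."""
    entries: list = field(default_factory=list)

    def publish(self, description):
        # Provider -> registry: advertise the service description.
        self.entries.append(description)

    def find(self, keyword):
        # Requester -> registry: plain keyword lookup, no semantics.
        return [d for d in self.entries if keyword.lower() in d.keywords]


def bind(description, request):
    # Requester -> provider: a real client would send a SOAP message to
    # description.endpoint; this sketch only builds a placeholder string.
    return f"POST {description.endpoint}: {request}"


registry = Registry()
registry.publish(ServiceDescription(
    "QuranService", "http://www.example.org/ws/quran/QuranService.asmx",
    frozenset({"quran", "verse", "chapter"})))
matches = registry.find("Quran")
print(bind(matches[0], "GetVerse(1, 1)"))   # GetVerse is a hypothetical operation
```

A requester searching for "scripture" instead of "quran" would get nothing back, since the registry has no notion of related terms.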

5 What is Web Service Discovery?
Broadly defined as the act of locating a machine-processable description of a web service that may have been previously unknown and that meets certain functional criteria.
– Originated from the agent match-making paradigm (middle agents and brokers), and later moved on to UDDI [2]
– The discovery mechanisms differ according to the language used for describing the service (WSDL or OWL-S)
[2] Garofalakis, J., Panagis, Y., Sakkopoulos, E., Tsakalidis, A.: Web service discovery mechanisms: Looking for a needle in a haystack? In: International Workshop on Web Engineering, Hypermedia Development and Web Engineering Principles and Techniques: Put them in use, in conjunction with ACM Hypertext, Santa Cruz (2004)

6 Ill-fated Registry-Based Structure
Static and not scalable
– The registry can become a bottleneck
– New services have to be added through a laborious process to ensure correct categorisation, which deters people from using it
Search is keyword based
– Ontology-supported semantic search is only available for agent-based and semantic web services

7 What We Propose
Make use of the WSDL files collected by Google
Automatically cluster these files into functionally similar groups using text mining methods
– Linguistic analysis and statistical techniques combined
The resulting clusters will help service discovery by reducing the size of the haystacks

8 Challenges
Traditional information retrieval and document clustering techniques cannot be borrowed directly, because of the following observations:
– Web service files do not usually contain a sufficiently large number of words for use as index terms or features.
– Moreover, the few words present in web service files are erratic and unreliable.
– Related web pages that describe a WSDL service are also considered, using the Google API to discover web page referrals or citations. However, most WSDL files do not have related web pages that link to them. The few that do are typically examples teaching how to program in a service-oriented paradigm.
These observations are corroborated by [9].
[9] Li, Y., Liu, Y., Zhang, L., Li, G., Xie, B., Sun, J.: An exploratory study of web services on the internet. In: 2007 IEEE International Conference on Web Services (ICWS) (2007)

9 System Architecture

10 Collected WSDL File

11 Obtaining Content and Context
Content
– Parse the WSDL file for service descriptions in natural language
Context
– Relate documents by looking at parent/grandparent directories
– Tokenising, stemming
– Remove function words*
– Remove programming terms*
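
A minimal sketch of the content and context steps, assuming WSDL 1.1 files (whose natural-language descriptions sit in wsdl:documentation elements) and context taken from the directories of the file's URL. The function names and the example.org URL are hypothetical, and stemming is omitted because the Python standard library has no stemmer.

```python
import posixpath
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# WSDL 1.1 namespace (assumption: WSDL 2.0 files use a different one).
WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"


def service_descriptions(wsdl_text):
    """Content: whitespace-normalised text of every wsdl:documentation element."""
    root = ET.fromstring(wsdl_text)
    return [" ".join("".join(el.itertext()).split())
            for el in root.iter(WSDL_NS + "documentation")]


def context_directories(wsdl_url):
    """Context: parent and grandparent directories of the WSDL file's URL path."""
    parent = posixpath.dirname(urlparse(wsdl_url).path)
    return parent, posixpath.dirname(parent)


def tokenise(text):
    """Lower-case alphabetic tokens; stemming is omitted in this sketch."""
    return re.findall(r"[a-z]+", text.lower())


SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="QuranService">
  <service name="QuranService">
    <documentation>Returns verses of the Quran
      by chapter.</documentation>
  </service>
</definitions>"""
URL = "http://www.example.org/ws/quran/QuranService.wsdl"   # hypothetical location

for text in service_descriptions(SAMPLE_WSDL):
    print(tokenise(text))
print(context_directories(URL))
```

WSDL files that share a parent or grandparent directory can then be grouped into one context set before function words and programming terms are removed.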

12 Content Words vs. Function Words
One of the properties of content words is that they tend to clump, i.e. to re-occur once they have appeared [10]. By contrast, occurrences of function words tend to be independent of one another. This contrast can often be captured by how poorly the Poisson distribution models word occurrences in documents [11]: function words tend to be Poisson distributed, whereas content words deviate from it.
[10] Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA (1999)
[11] Church, K., Gale, W.: Inverse document frequency (IDF): A measure of deviations from Poisson. In: Proceedings of the ACL 3rd Workshop on Very Large Corpora (1995)
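
One standard way to make the Poisson test concrete is Church and Gale's residual IDF [11]. The slides do not state which statistic was used, so this is an illustration rather than the authors' exact formulation. With N documents, collection frequency cf_w and document frequency df_w for a word w:

```latex
\begin{aligned}
\lambda_w &= \frac{cf_w}{N}, \qquad P(k;\lambda_w) = \frac{e^{-\lambda_w}\,\lambda_w^{k}}{k!} \\
\widehat{df}_w &= N\bigl(1 - P(0;\lambda_w)\bigr) = N\bigl(1 - e^{-\lambda_w}\bigr) \\
\mathrm{RIDF}_w &= -\log_2\frac{df_w}{N} \;-\; \Bigl(-\log_2\bigl(1 - e^{-\lambda_w}\bigr)\Bigr)
\end{aligned}
```

A clumping content word appears in fewer documents than the Poisson model predicts (df_w below the predicted value), so its residual IDF is clearly positive; a Poisson-distributed function word has a residual IDF near zero.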

13 Remove Function Words
A segment of the output from content-word recognition performed on the word tokens in the web service context set of the service QuranService, using a single-parameter Poisson distribution.
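
Applied to a context set, a single-parameter Poisson test could look like the sketch below. It assumes the test is Church and Gale's residual IDF (the slide names only the single-parameter Poisson distribution); the 0.5 cut-off and the toy documents are hypothetical.

```python
import math
from collections import Counter


def residual_idf(docs):
    """Observed IDF minus the IDF predicted by a single-parameter Poisson model.

    docs: one list of tokens per document in the context set.
    """
    n = len(docs)
    cf = Counter(tok for doc in docs for tok in doc)        # collection frequency
    df = Counter(tok for doc in docs for tok in set(doc))   # document frequency
    ridf = {}
    for tok, count in cf.items():
        lam = count / n                                     # Poisson rate per document
        observed = -math.log2(df[tok] / n)
        predicted = -math.log2(1 - math.exp(-lam))          # from P(k >= 1) = 1 - e^-lam
        ridf[tok] = observed - predicted
    return ridf


def content_words(docs, threshold=0.5):
    """Tokens that clump (residual IDF above a hypothetical threshold)."""
    return {tok for tok, r in residual_idf(docs).items() if r > threshold}


# Toy context set: 'the' is spread evenly, 'quran' clumps in one document.
DOCS = [["the", "quran", "quran", "quran", "quran"],
        ["the", "verse"], ["the", "chapter"], ["the"]]
print(sorted(content_words(DOCS)))   # ['quran']
```

Words seen only once, such as "verse" here, cannot show clumping and fall below the cut-off; a real context set would need enough occurrences per word for the test to be meaningful.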

14 Remove Programming Terms
Term clustering based on the Normalised Google Distance [NGD] is used to identify clusters of programming terms, via our Tree-Traversing Ants featureless term clustering algorithm [12].
[NGD] Cilibrasi, R.L., Vitányi, P.M.B.: The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering 19(3) (2007)
[12] Wong, W., Liu, W., Bennamoun, M.: Tree-traversing ant algorithm for term clustering based on featureless similarities. Data Mining and Knowledge Discovery 15(3) (2007) 349–381
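
The Normalised Google Distance compares the page counts f(x), f(y) and f(x, y) for two terms against the total number of indexed pages N: NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))). The sketch below computes it from counts supplied by the caller; the counts are hypothetical, no search engine is queried, and the Tree-Traversing Ants clustering that consumes these distances is not reproduced.

```python
import math


def ngd(f_x, f_y, f_xy, n):
    """Normalised Google Distance from page counts (0 = the terms always co-occur).

    The logarithm base cancels in the ratio, so natural logs are used.
    """
    if f_xy == 0:
        return math.inf                      # the terms never co-occur
    log_x, log_y = math.log(f_x), math.log(f_y)
    return (max(log_x, log_y) - math.log(f_xy)) / (math.log(n) - min(log_x, log_y))


# Hypothetical page counts for two terms and their co-occurrence.
print(ngd(1000, 1000, 1000, 10**6))   # identical usage -> 0.0
print(ngd(1000, 1000, 10, 10**6))     # rarely together -> about 0.667
```

Terms such as "runtime" and "module" that frequently co-occur on programming pages would get small distances to each other, which is what lets them form a cluster of programming terms.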

15 Clustering Results for QuranService
A small oracle: runtime, webservice, developer, module, data

16 The Service Host and the Service Name
The service host is the second-level and top-level portion of the domain name (i.e. a segment of the authority part of the URI) of the host containing the WSDL file, and the service name is the name of the WSDL file.
As one may note, the four features are by no means the best or the only ones available for describing a web service. However, they are the most accessible and feasible ones to use in this case.
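
Both features can be read off the WSDL URL. The sketch below assumes "second and top-level portion" means the last two dot-separated labels of the host name, and that the service name drops the file extension; it is naive for multi-part public suffixes such as co.uk, which would need a public-suffix list. The URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def service_host(wsdl_url):
    """Second-level plus top-level domain of the host, e.g. 'example.org'."""
    host = urlparse(wsdl_url).hostname or ""
    return ".".join(host.split(".")[-2:])    # naive: ignores suffixes like co.uk


def service_name(wsdl_url):
    """Name of the WSDL file without directories or extension (an assumption)."""
    filename = posixpath.basename(urlparse(wsdl_url).path)
    return posixpath.splitext(filename)[0]


for url in ("http://www.example.org/ws/quran/QuranService.wsdl",
            "http://services.example.net/Weather.asmx?WSDL"):
    print(service_host(url), service_name(url))
```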

17 Combining the four features

18 Web Service Clusters

19 Conclusions
The paper presented techniques for automatically discovering web services with similar functionalities. We term such service clusters homogeneous service communities.
If the crawling and clustering processes run continuously, as in a typical search engine, the approach has the potential to enable self-organisation of the Web as proposed in [3].
The proposed web service clustering approach assumes no registries and can automatically and effectively reduce the search space of web services. It can therefore be seen as a precursor to web service discovery.
This paper gathers real service description files from the Web instead of working on hypothetical examples. The resulting clusters not only provide a useful glimpse of what services are out there, but also insight into the types of technologies that have proliferated in this area.
[3] Liu, W.: Trustworthy service selection and composition: reducing the entropy of service-oriented web. In: 3rd International IEEE Conference on Industrial Informatics, Perth, Australia (2005)

20 The Web Service Hype
Web services have become a new trend for doing business online.
U.S.
– 65% of companies will be working, or have been working, on web service projects
– $3 billion; 2008 – $15.8 billion
Web services help in e-business and e-commerce development.
"Just as the Web revolutionized how users talk to applications, XML transforms how applications talk to each other." (Bill Gates)
"Web services are expected to revolutionize our life in much the same way as the Internet has during the past decade or so." (Gartner)

21 Why IBM, Microsoft and SAP Stopped the UBR
The UDDI Business Registry (UBR) was part of the UDDI Project announced in September. The project goals were to define a set of specifications to enable description, discovery and integration, and to prove interoperability through operational experience.
The UBR ran for 5 years, demonstrating live, industrial-strength UDDI implementations managing over 50,000 replicated entries.

22 Is Popfly service-oriented?

23 Thank You
