10 Mashup Autocompletion – Problem Definition
Given a database of mashlets and glue patterns (GPs) and a set of mashlets selected by the user, identify and rank the GPs that link a subset of the selected mashlets.
Ranking is based on: popularity and relevance to the user query.
What would be the “ideal” GP? The most popular one that connects only the user mashlets and nothing else.
Relaxations:
- Less popular
- Connects variants of the user mashlets
- Connects a subset of the user mashlets
- Connects additional mashlets
12 Problem Abstraction
- Each glue pattern is represented as a point in a multidimensional space.
- One dimension represents the GP popularity.
- The rest: one dimension per mashlet: 1) user mashlets, 2) other mashlets.
- The goal of the algorithm is to find the top-k GPs that link the given user mashlets (the ones closest to the optimal GP).
[Figure: a simplified 3D illustration with axes m1, m2, and GP popularity; example vectors 0 0 0 0 0 0 0 0 0... and g = 0.4 0.3 0.2 0 1 0 0 1 0 1 0...]
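The abstraction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' scoring function: popularity is assumed to be normalized to [0, 1], the ideal GP is assumed to have popularity 1 and to link exactly the user mashlets, and closeness is measured with plain Euclidean distance. The names `ideal_point`, `gp_vector`, and `top_k` are hypothetical.

```python
import heapq
import math

def ideal_point(all_mashlets, user_mashlets):
    # Assumption: the ideal GP has popularity 1.0 and links exactly the user mashlets.
    return [1.0] + [1.0 if m in user_mashlets else 0.0 for m in all_mashlets]

def gp_vector(popularity, linked, all_mashlets):
    # First coordinate is the GP popularity; the rest are 0/1 flags, one per mashlet.
    return [popularity] + [1.0 if m in linked else 0.0 for m in all_mashlets]

def top_k(gps, all_mashlets, user_mashlets, k):
    # gps maps a GP name to (popularity, set of linked mashlets).
    target = ideal_point(all_mashlets, user_mashlets)

    def distance(item):
        _, (popularity, linked) = item
        return math.dist(gp_vector(popularity, linked, all_mashlets), target)

    # Assumption: Euclidean distance to the ideal point is the ranking criterion.
    return [name for name, _ in heapq.nsmallest(k, gps.items(), key=distance)]
```

With this scoring, a less popular GP that links exactly the user mashlets ranks ahead of a popular GP that links an additional mashlet, which mirrors the relaxations listed in the problem definition. This sketch scans every GP; the threshold-based algorithm in the slides exists precisely to avoid such a full scan.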
14 Problems with the algorithm
- The number of lists the algorithm accesses is very large.
- Most of the mashlet lists are unrelated to the user selection (query).
15 Data Structure
[Diagram labels: Glue Patterns, Mashlets, GP Popularity, User mashlets]
16 Algorithm
[Slide figure; recoverable labels: n, p, M, and g'[m] = 0 for n < m ≤ |M_all|]
17 Correctness of AC* - Lemma
Theorem 4.1: Algorithm AC* returns a correct solution.
The proof is based on a lemma showing that any candidate that has not been encountered by AC* has a total score lower than the threshold.
Optimality of AC*
Competing algorithms: C, the class of deterministic algorithms that operate under the same access model as AC*.
- Algorithms receive as input the lists, the monotonic function, and k.
- Algorithms can use any access order (i.e., not specifically round-robin) and any thresholding scheme, and can rely on accessed elements.
Instance optimality: AC* is instance optimal within class C if there are constants c and c0 such that for every input instance I, cost(AC*, I) ≤ c·cost(A, I) + c0 for any A ∈ C.
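To make the stopping argument concrete, here is a sketch of the generic threshold algorithm for top-k queries over sorted lists with round-robin access. It is not AC* itself, whose details are not given in these slides. Assumptions: each list maps an object to a non-negative score, an object missing from a list scores 0 there, and the aggregation function is monotone (sum by default). The name `threshold_top_k` is hypothetical.

```python
import heapq

def threshold_top_k(lists, k, agg=sum):
    # lists: one dict per dimension, mapping object -> non-negative score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # object -> total score, filled in by random access
    max_depth = max(len(s) for s in sorted_lists)
    for depth in range(max_depth):
        last = []  # score at the current depth of each list
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                last.append(score)
                if obj not in seen:
                    # Random access: look the object up in every list.
                    seen[obj] = agg(l.get(obj, 0.0) for l in lists)
            else:
                last.append(0.0)  # exhausted list: unseen objects score 0 here
        # No unseen object can score above the aggregate of the last-seen scores.
        threshold = agg(last)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early return is where the lemma's reasoning applies: once the k-th best seen score reaches the threshold, every object not yet encountered is bounded by the threshold and cannot enter the result.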
18 Calculating Popularity
Glue Pattern and Mashlet Rank
- PageRank-style algorithm
- Takes into account the popularity of mashlets and GPs, as well as the relationships between them.
[Diagram labels: M, GP]
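One way to picture a PageRank-style popularity computation is power iteration over a graph whose nodes are both mashlets and GPs, with an edge between a GP and each mashlet it links. The sketch below is a generic PageRank iteration, not the formula used in the presented system. Assumptions: edges are undirected, GP and mashlet names are distinct, every GP links at least one mashlet, and the damping factor is the conventional 0.85. The name `popularity` is hypothetical.

```python
def popularity(gp_links, damping=0.85, iterations=50):
    # gp_links: GP name -> list of mashlet names it links.
    neighbors = {}
    for gp, mashlets in gp_links.items():
        for m in mashlets:
            # Undirected edge between the GP node and the mashlet node.
            neighbors.setdefault(gp, set()).add(m)
            neighbors.setdefault(m, set()).add(gp)
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor splits its current rank evenly among its own neighbors.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank
```

In this sketch a mashlet used by many GPs accumulates rank from all of them, and a GP in turn inherits rank from the mashlets it links, so the two kinds of popularity reinforce each other.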
19 Implementation
[Architecture diagram; labels: IBM Mashup Center, WebSphere Application Server, MatchUp Algorithm, Knowledge base; steps numbered 1 to 5]
20 Experiments (synthetic dataset)
Synthetic dataset for large-scale experiments
- Generated a DB of 40k mashlets & GPs (ProgrammableWeb has 4k)
- Based on ProgrammableWeb characteristics
Experiments for synthetic dataset
- Varying # of total mashlets and GPs
- Varying k
- Varying # of user mashlets
- Varying GP complexity
21 Results (synthetic dataset)
[Plot: GP complexity = 5, varying k]
22 Results (synthetic dataset)
[Plot: GP complexity = 10, varying k]
23 Results (synthetic dataset)
[Plot: varying # of user mashlets]
24 Experiments (real dataset)
Real dataset
- Used real-life mashlets from ProgrammableWeb and IBM Mashup Center
- Scenario: development of a travel-related mashup
Experiments for quality assessment
- IBM Mashup Center as the mashup platform
- Users placed mashlets
- MatchUp offered top-10 GPs for their mashlets
- Users searched for alternatives
Results
- User satisfaction was high
- High correlation between suggestions and users’ lists
- Browsing for additional results was in general unsuccessful
- Gluing process was significantly expedited
25 Related Work
Autocompletion in many other domains
- Phrase prediction (Nandi & Jagadish, VLDB 2007)
- File locations (Myers, CHI 2000)
Web service composition
- Model for WS composition (Berardi et al., VLDB 2005)
- Optimized and customized algorithm (McIlraith and Son, KR 2002)
Mashup assembly tools
- MashMaker (Ennals & Garofalakis, SIGMOD 2007): data -> widgets
- MashupAdvisor (Elmeleegy et al., ICWS 2008): mashup -> output recommendation -> assembly to achieve this output
26 Future Work
- Infer semantic inheritance automatically
- Distributed environment
- Incorporating context and user preferences
Conclusions
- A novel autocompletion mechanism for rapid development of mashups
- Uses the collective wisdom of other users on the web
- A dedicated threshold-based top-k algorithm that reduces the search space
- PageRank-style calculation of mashlet and glue pattern popularity