Cross-library API Recommendation Using Web Search Engines
Wujie Zheng, Qirun Zhang, Michael R. Lyu
Department of Computer Science & Engineering, The Chinese University of Hong Kong (CUHK)
Sep. 8, 2011
ESEC/FSE 2011, Szeged, Hungary, Sep. 5-9, 2011
Research Problem: Cross-references of APIs
How can we find cross-references between APIs in different libraries?
Given a programming task implemented with some APIs of library S, which APIs of library T can accomplish the same task?
Related Work
Recommending APIs in the same library by analyzing library code:
Call graph: Robillard05 (Suade), Saul07 (FRAN)
Access graph: Robillard05 (Suade), Long09 (Altair)
Our work: analyze the control-flow graph to capture the significance of caller-callee linkages
Related Work (continued)
Recommending APIs in the same library by analyzing client code:
Frequent itemset mining: Li05 (PR-Miner)
Code search engines: Thummalapenta07 (PARSEWeb)
Recommending APIs in different libraries by analyzing client code:
API mapping: Zhong10 (MAM)
Our New Idea: From Code to Users' Knowledge
Beyond the code repository lies a world of users' knowledge: developers ask for and share knowledge of APIs on the Web.
Motivating Example
Developers ask for the C# equivalent of the Java "HashMap" class:
http://stackoverflow.com/questions/1273139/c-java-hashmap-equivalent
The Approach: Overview
Input: the API lists of the source library (S) and the target library (T)
Output: relevant APIs in T for each API in S
The Approach: Web Query Construction and Search
Web query construction: the source API name plus the target library's name
Querying Web search engines: through Google's Web search services
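A minimal sketch of the query step, assuming a query is simply the source API name followed by the target library name. The search engine is abstracted as a hypothetical `search` callable that returns result texts; `build_query`, `query_all`, and `SearchFn` are illustrative names, and no real Google client API is assumed.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical search interface: query string -> texts of the returned results.
# Any real Web search service could be wrapped to fit this signature.
SearchFn = Callable[[str], List[str]]

def build_query(source_api: str, target_library: str) -> str:
    """Form a Web query from a source API name plus the target library's name."""
    return f"{source_api} {target_library}"

def query_all(source_apis: Iterable[str], target_library: str,
              search: SearchFn) -> Dict[str, List[str]]:
    """Issue one query per source API and keep the returned result texts."""
    return {api: search(build_query(api, target_library)) for api in source_apis}
```

For example, `build_query("HashMap", ".NET")` yields the query `HashMap .NET`. Injecting `search` as a parameter keeps the sketch testable with a fake engine.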
The Approach: Mining API Candidates
Dictionary-based approach: requires the API list of the target library as input
For each source API, count the frequency of each target API in its Web search results
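A sketch of dictionary-based counting, assuming target APIs are matched as exact, case-sensitive simple names (qualified names are split at dots by the identifier pattern). `mine_candidates` and `IDENT` are hypothetical names for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Identifier-like tokens; "System.Collections.Hashtable" splits into three tokens.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def mine_candidates(results: Iterable[str], target_apis: Set[str]) -> Counter:
    """Count how often each known target API name occurs in the search results."""
    counts = Counter()
    for text in results:
        for token in IDENT.findall(text):
            if token in target_apis:  # the dictionary: only known target APIs count
                counts[token] += 1
    return counts
```

Because only names from the target API list are counted, unrelated words in the result pages are ignored without any parsing.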
The Approach: Ranking with Respect to the Whole Library
Goal: reduce the noise from target APIs that are common across many queries
For a source API Q and a target API T:
tf(T, Q): the frequency of T in Q's search results
idf(T): the logarithm of the total number of source APIs divided by the number of source APIs whose search results contain T
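A sketch of the ranking, assuming tf and idf are combined as the usual tf-idf product, score(T, Q) = tf(T, Q) × idf(T), with idf(T) = log(N / n_T), where N is the number of source APIs and n_T the number whose results contain T. The natural logarithm and the function name `rank_candidates` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def rank_candidates(tf: Dict[str, Counter]) -> Dict[str, List[Tuple[str, float]]]:
    """Rank target APIs for every source API Q by tf(T, Q) * idf(T).

    tf maps each source API Q to a Counter of target-API frequencies
    in Q's search results.
    """
    n_source = len(tf)
    # df[T]: number of source APIs whose search results contain T at least once.
    df = Counter()
    for counts in tf.values():
        df.update(t for t, c in counts.items() if c > 0)
    # idf(T) = log(total source APIs / source APIs whose results contain T).
    idf = {t: math.log(n_source / d) for t, d in df.items()}
    ranked = {}
    for q, counts in tf.items():
        scores = [(t, c * idf[t]) for t, c in counts.items() if c > 0]
        # Highest score first; ties broken alphabetically for determinism.
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        ranked[q] = scores
    return ranked
```

A target API that appears in the results of every source API (for instance `Object`) gets idf = log(1) = 0 and sinks to the bottom, which is how the whole-library view suppresses common APIs.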
Preliminary Results: From JDK to .NET
Discussions
Web query construction: extract keywords from a source API's documentation
API candidate extraction: analyze the source code; use templates (regular expressions)
API usage pattern mapping: move from recommending single APIs to recommending API lists (sequences)
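One way template-based extraction might look, sketched under stated assumptions: the two templates below are hypothetical, and they assume .NET naming conventions (type and method names start with an uppercase letter). Unlike the dictionary-based approach, they need no target API list. `CALL`, `NEW`, and `extract_candidates` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical template 1: qualified calls such as "map.Add(" -> "Add".
CALL = re.compile(r"\b[A-Za-z_]\w*\.([A-Z]\w*)\s*\(")
# Hypothetical template 2: constructor calls such as "new Dictionary<...>(" -> "Dictionary".
NEW = re.compile(r"\bnew\s+([A-Z]\w*)")

def extract_candidates(results: Iterable[str]) -> Counter:
    """Count API-like names matched by the templates in code found in search results."""
    counts = Counter()
    for text in results:
        counts.update(NEW.findall(text))   # constructed types
        counts.update(CALL.findall(text))  # invoked methods
    return counts
```

Templates trade precision for coverage: they can surface APIs missing from a target list, but they also match user-defined names, so a ranking step such as tf-idf remains useful afterwards.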
Q&A Thanks!