MAPO: Mining and Recommending API Usage Patterns

Presentation transcript:

MAPO: Mining and Recommending API Usage Patterns. Authors: Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei (ECOOP 2009). Presented by: Aabhas Bhatia

Motivating Examples: Reading a file. Pre-written APIs exist in every language. You can simply use them, but you need to look them up first.

Create HTTP request: API keys encoded

Add file to body of HTTP request

The Problem: APIs exist for reuse, but they are complex and difficult to reuse. How should different API methods be combined?

Search for code snippets? Why not look at existing code snippets? Which one is relevant? More APIs to call? Look at more snippets?

Solution: MAPO (Mining API usage Patterns from Open source repositories). Offline mining (scalable??). Real-time recommendations.

Approach: Code extraction, pattern mining, recommendation

Overview

Code Analyzer: Extracts API usage information from code snippets. Organizes the information by method.

Code Analyzer - parsing code: Looks at the following locations for API method calls: super constructor calls, class cast expressions, method calls, and class instance creations.

Code Analyzer - selecting API sequences

Code Analyzer - selecting API sequences: Choose a subset of sequences that covers all API calls. Avoid method overweight and common-path overweight. Greedy approach: select the longest sequence first. Recursively inline the developer’s own methods.
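
The greedy selection described on this slide can be sketched as follows. This is a minimal illustration, not MAPO's actual implementation: the function name `select_sequences` and the sample call names are hypothetical, and the "longest first" rule is applied with a simple one-pass sort.

```python
def select_sequences(candidates):
    """Greedily pick call sequences, longest first, until every API call is covered."""
    # Collect every API call that appears in any candidate sequence.
    uncovered = set()
    for seq in candidates:
        uncovered.update(seq)

    selected = []
    # sorted() is stable, so sequences of equal length keep their input order.
    for seq in sorted(candidates, key=len, reverse=True):
        if uncovered & set(seq):
            # This sequence contributes at least one call not yet covered.
            selected.append(seq)
            uncovered -= set(seq)
        if not uncovered:
            break
    return selected


# Hypothetical call sequences extracted from the paths of one method.
paths = [
    ["File.new", "FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["File.new", "FileReader.new", "BufferedReader.new", "BufferedReader.close"],
    ["File.new", "File.exists"],
]
chosen = select_sequences(paths)
```

Here the second path adds no new API call once the first is chosen, so it is skipped. That is one way to keep a method with many near-identical paths from dominating the mined data.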

API usage miner: Groups API usage information into clusters. Mines patterns from each cluster.

API usage miner - clustering: A precursor to mining. Reduces the total number of scenarios by clustering similar ones, because different API usage scenarios might interfere with each other during mining.
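
The slides do not name the clustering algorithm, so the sketch below substitutes a simple threshold-based grouping to show the idea of separating usage scenarios before mining. The `cluster` and `overlap` functions, the threshold of 0.5, and the sample call names are all assumptions made for illustration.

```python
def cluster(items, similarity, threshold):
    """Put each item into the first cluster whose representative is similar enough."""
    clusters = []
    for item in items:
        for group in clusters:
            # The first member of a group acts as its representative.
            if similarity(group[0], item) >= threshold:
                group.append(item)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([item])
    return clusters


def overlap(a, b):
    """Placeholder similarity: shared calls divided by all calls (set overlap)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


scenarios = [
    ["File.new", "FileReader.new", "BufferedReader.readLine"],
    ["File.new", "FileReader.new", "BufferedReader.close"],
    ["HttpClient.new", "HttpClient.execute"],
]
groups = cluster(scenarios, overlap, threshold=0.5)
```

The two file-reading scenarios end up together and the HTTP scenario stays separate, so mining each group independently keeps unrelated usages from mixing.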

Clustering - similarity metrics: Uses the average of three metrics (equal weights??): class name, method name, and called API methods.

Similarity metrics - names

Similarity metrics - called API methods: Scenarios may implement a similar feature but call different sets of API methods. For two sequences s1 and s2:
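
The formulas on these slides did not survive in the transcript. The sketch below is one plausible reading, assuming that names are compared by the words they share after splitting on camel case, that called API methods are compared by set overlap, and that the three scores are averaged with equal weights. All function names and sample data are hypothetical.

```python
import re


def split_words(name):
    """Split a camelCase or PascalCase identifier into a set of lowercase words."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)}


def name_similarity(a, b):
    """Assumed metric: shared words divided by all distinct words in both names."""
    wa, wb = split_words(a), split_words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def api_similarity(s1, s2):
    """Assumed metric: shared API methods divided by all distinct API methods."""
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def scenario_similarity(x, y):
    """Equal-weight average of class-name, method-name and called-API similarity."""
    parts = [
        name_similarity(x["class"], y["class"]),
        name_similarity(x["method"], y["method"]),
        api_similarity(x["calls"], y["calls"]),
    ]
    return sum(parts) / len(parts)


x = {"class": "FileReaderDemo", "method": "readFile",
     "calls": ["File.new", "FileReader.new"]}
y = {"class": "FileWriterDemo", "method": "writeFile",
     "calls": ["File.new", "FileWriter.new"]}
score = scenario_similarity(x, y)
```

With these inputs the class names share two of four distinct words (0.5), while the method names and the call sets each share one of three (about 0.33), so the averaged score is roughly 0.39.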

API usage miner - mining from each cluster: Mines frequent sequences of API method calls. Sequences with support greater than a threshold are reported(??)
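
To make "support above a threshold" concrete, here is a small prefix-projection miner in the style of PrefixSpan. It is a sketch and not necessarily the miner MAPO uses. The function name, the sample sequences, the 0.6 threshold, and the inclusive comparison (support >= threshold) are assumptions.

```python
def frequent_sequences(sequences, min_support):
    """Return {pattern: support} for every subsequence whose support >= min_support.

    Support is the fraction of input sequences that contain the pattern
    as a (not necessarily contiguous) subsequence.
    """
    total = len(sequences)
    results = {}

    def grow(prefix, projected):
        # Count in how many remaining suffixes each item occurs at least once.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count / total < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count / total
            # Project: keep what follows the first occurrence of the item.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_projected)

    grow((), [list(s) for s in sequences])
    return results


cluster_sequences = [
    ["File.new", "FileReader.new", "BufferedReader.readLine", "BufferedReader.close"],
    ["File.new", "FileReader.new", "BufferedReader.close"],
    ["File.new", "File.exists"],
]
patterns = frequent_sequences(cluster_sequences, min_support=0.6)
```

In this example `("File.new", "FileReader.new", "BufferedReader.close")` appears in two of three sequences and is reported, while `("File.new", "File.exists")` appears in only one and is dropped.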

Recommender: Recommends API usage patterns upon request. An Eclipse plugin ranks and shows code snippets.

Evaluation: Experimental study - is MAPO effective? Empirical study - can MAPO lead to fewer bugs?

Experimental Study - setup: Mined 20 projects that use Eclipse’s Graphical Editing Framework (GEF). Support value 0.7. Filtered out clusters not calling any GEF APIs. 93 patterns, 157 API sequences, 856 API methods explored.

Experimental Study - quantitative comparison: Results were compared against code snippet search tools: Strathcona, and Google Code Search with limited search scope(??). 13 tasks were created from a book on GEF.

Experimental Study - results

Experimental Study - design decisions validation: greedy subsequence selection, inlining, clustering

Empirical Study - setup: Goal: as few bugs as possible for tasks completed within 1 hour. Tasks were implemented by the authors, and code was then removed to form an incomplete code base(??). Users could refer to MAPO, Google, and Strathcona.

Empirical Study - results

Related Work: Recommending code snippets, mining API properties

Recommending Code Snippets: Strathcona locates a set of relevant code snippets from a repository. Prospector synthesizes solution jungloids from a jungloid query. XSnippet extends Prospector and adds additional queries, ranking heuristics, and mining algorithms to query a repository for relevant code snippets. PARSEWeb mines snippets from Google Code Search to find solution jungloids.

Mining API properties: Mining association rules among software artifacts: properties are mined without any temporal information, whereas MAPO mines more complicated API usage patterns. Mining frequent call sequences from API client code or traces: MAPO mines patterns related to multiple APIs and takes programming context into consideration.

Conclusion: Mined patterns. Easier to find snippets. Fewer snippets to look at. Reduction in bugs. Eclipse IDE integration.

Discussion: Design decisions, experiments

Design Decisions: Equal weights for the similarity metrics? What about uncommon and security APIs? What happens when an API is upgraded? State-of-the-art clustering schemes? Threshold for the support score?

Experiments: Limited Google Code Search over 20 repos. Scalability issues with offline mining. Skeletons of the tasks were written by the authors. Can I actually use this when writing code from scratch?

Questions? Thanks!