1 PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.

Slides:



Advertisements
Similar presentations
Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers I am Raphael Hoffmann and this is joint work with James Fogarty.
Advertisements

Copyright © SoftTree Technologies, Inc. DB Tuning Expert.
Spelling Correction for Search Engine Queries Bruno Martins, Mario J. Silva In Proceedings of EsTAL-04, España for Natural Language Processing Presenter:
Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.
Lecture 1 Introduction to the ABAP Workbench
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.
Dialogue – Driven Intranet Search Suma Adindla School of Computer Science & Electronic Engineering 8th LANGUAGE & COMPUTATION DAY 2009.
Finding Code to Reuse Kerry Chang Human-Computer Interaction Institute Carnegie Mellon University D: Human Aspects of Software Development (HASD)
Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection Panagiotis G. Ipeirotis Luis Gravano Computer Science Department Columbia.
Automatically Extracting and Verifying Design Patterns in Java Code James Norris Ruchika Agrawal Computer Science Department Stanford University {jcn,
J. Chen, O. R. Zaiane and R. Goebel An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities.
Mining Jungloids to Cure Programmer Headaches Dave Mandelin, Ras BodikUC Berkeley Doug KimelmanIBM.
Problem: Extracting attribute set for classes (Eg: Price, Creator, Genre for class ‘Video Games’) Why?  Attributes are used to extract templates which.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Chapter 5: Information Retrieval and Web Search
OOSE 01/17 Institute of Computer Science and Information Engineering, National Cheng Kung University Member:Q 薛弘志 P 蔡文豪 F 周詩御.
Systems Analysis – Analyzing Requirements.  Analyzing requirement stage identifies user information needs and new systems requirements  IS dev team.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University ICSE 2003 Java.
Avalanche Internet Data Management System. Presentation plan 1. The problem to be solved 2. Description of the software needed 3. The solution 4. Avalanche.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Kinds of.
Behavior-based Spyware Detection By Engin Kirda and Christopher Kruegel Secure Systems Lab Technical University Vienna Greg Banks, Giovanni Vigna, and.
Tao Xie Automated Software Engineering Group Department of Computer Science North Carolina State University
Graph Data Management Lab, School of Computer Science gdm.fudan.edu.cn XMLSnippet: A Coding Assistant for XML Configuration Snippet.
Improving Programmer Productivity via Mining Program Source Code Tao Xie Department of Computer Science North Carolina State University
Implicit An Agent-Based Recommendation System for Web Search Presented by Shaun McQuaker Presentation based on paper Implicit:
©2010 John Wiley and Sons Chapter 12 Research Methods in Human-Computer Interaction Chapter 12- Automated Data Collection.
Hipikat: A Project Memory for Software Development The CISC 864 Analysis By Lionel Marks.
CS266 Software Reverse Engineering (SRE) Reversing and Patching Java Bytecode Teodoro (Ted) Cipresso,
Samad Paydar Web Technology Lab. Ferdowsi University of Mashhad 10 th August 2011.
Mining Software Data: Code Tao Xie University of Illinois at Urbana-Champaign
Chapter 6: Information Retrieval and Web Search
Yazd University, Electrical and Computer Engineering Department Course Title: Advanced Software Engineering By: Mohammad Ali Zare Chahooki 1 Machine Learning.
Debug Concern Navigator Masaru Shiozuka(Kyushu Institute of Technology, Japan) Naoyasu Ubayashi(Kyushu University, Japan) Yasutaka Kamei(Kyushu University,
Workshop on Software Product Archiving and Retrieving System Takeo KASUBUCHI Hiroshi IGAKI Hajimu IIDA Ken’ichi MATUMOTO Nara Institute of Science and.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Graph Data Management Lab, School of Computer Science gdm.fudan.edu.cn Luyiqi Locus based alignment storage.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of.
Computer Science Automated Software Engineering Research ( Mining Exception-Handling Rules as Conditional Association.
Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Exploiting Code Search Engines to Improve Programmer Productivity and Quality Suresh Thummalapenta Advisor: Dr. Tao Xie Department of Computer Science.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Design and Implementation of a Rationale-Based Analysis Tool (RAT) Diploma thesis from Timo Wolf Design and Realization of a Tool for Linking Source Code.
A code-centric cluster-based approach for searching online support forums for programmers Christopher Scaffidi, Christopher Chambers, Sheela Surisetty.
Service Reliability Engineering The Chinese University of Hong Kong
Recommending Adaptive Changes for Framework Evolution Barthélémy Dagenais and Martin P. Robillard ICSE08 Dec 4 th, 2008 Presented by EJ Park.
1 Recommendation Systems for Code Reuse Tao Xie Department of Computer Science North Carolina State University Raleigh, USA.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Graph Indexing From managing and mining graph data.
哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
CAR-Miner: Mining Exception-Handling Rules as Sequence Association Rules Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
2014 Semantic-based Code and Documentation Search Engine Reshma Thumma Oct 10,2014 #GHC
1 API Recommendation Wujie Zheng
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
XSnippet: Mining For Sample Code Naiyana Tansalarak and Kajal Claypool Presented by: Shan Li CISC864.
CSC 241: Introduction to Computer Science I
Experience Report: System Log Analysis for Anomaly Detection
Big Data is a Big Deal!.
Ruru Yue1, Na Meng2, Qianxiang Wang1 1Peking University 2Virginia Tech
Chapter 12: Automated data collection methods
Cross-library API Recommendation Using Web Search Engines
Data Mining Chapter 6 Search Engines
Chapter 1 Introduction(1.1)
Code search & recommendation engines
Panagiotis G. Ipeirotis Luis Gravano
Precise Condition Synthesis for Program Repair
MAPO: Mining and Recommending API Usage Patterns
Don’t just listen to music; listen to people
CSC 241: Introduction to Computer Science I
Presentation transcript:

1 PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina State University Raleigh, USA Automated Software Engineering 2007

2 2 Motivation Programmers commonly reuse APIs of existing frameworks or libraries –Advantages: Low cost and high efficiency of development –Challenges: Complexity and lack of documentation Frame works

3 Problem While reusing APIs of existing open source frameworks or libraries, programmers often –know what type of object they need –but do not know how to write code for getting that object Query: “Source  Destination” How to use these APIs?

4 Example Task from Eclipse Programming Task: How to parse code in a dirty editor? Query: IEditorPart -> ICompilationUnit PARSEWeb Solution: IEditorPart iep =... IEditorInput editorInp = iep.getEditorInput(); IWorkingCopyManager wcm = JavaUI.getWorkingCopyManager(); ICompilationUnit icu = wcm.getWorkingCopy(editorInp); Difficulties: a. Needs an instance of IWorkingCopyManager b. Needs to invoke a static method of JavaUI for getting the preceding instance

5 Example Task from Eclipse Programming Task: How to parse code in a dirty editor of Eclipse? ? Query: “IEditorPart -> ICompilationUnit” Open Source Projects on web 1 2 N … … Extract MIS 1 MIS 2... … MIS k *MIS: Method-Invocation sequence, FMIS: Frequent MIS FMIS 1 FMIS 2 … FMIS n Suggest Mine

6 PARSEWeb Overview Code Downloader Code Search Engine Open Source Projects Local Source Code Repository Code Analyzer Method Invocation Sequences Sequence Post processor Clustered Method Invocation Sequences Query Splitter Final Method Invocation Sequences Query Phase 1 Phase 2 Phase 3

7 PARSEWeb : Code Downloader Code Downloader Code Search Engine Open Source Projects Local Source Code Repository Code Analyzer Method Invocation Sequences Sequence Post processor Clustered Method Invocation Sequences Query Splitter Final Method Invocation Sequences Query Phase 1 Phase 2 Phase 3

8 8 Code Downloader Accepts queries of form [Source  Destination] Interacts with a code search engine to gather related code samples Stores gathered code samples (source files) in a repository, referred as “Local Source Code Repository”

9 PARSEWeb: Code Analyzer Code Downloader Code Search Engine Open Source Repositories Local Source Code Repository Code Analyzer Method Invocation Sequences Sequence Post processor Clustered Method Invocation Sequences Query Splitter Final Method Invocation Sequences Query Phase 1 Phase 2 Phase 3

10 Code Analyzer Collect [Source  Destination] method sequences by analyzing code samples statically –Deal with local method calls by inlining methods –Deal with conditionals/loops by traversing control flow graphs Resolve types in code samples –Challenges: Gathered code samples are not compilable –Solutions: heuristics are developed

11 Type Heuristics How do I get the return type of createQueueSession method? Example 1: QueueConnection connect; QueueSession session = connect.createQueueSession(false,int) Example 2: public QueueSession test() {... return connect.createQueueSession(false,int); } 11

12 PARSEWeb: Sequence Postprocessor Code Downloader Code Search Engine Open Source Repositories Local Source Code Repository Code Analyzer Method Invocation Sequences Sequence Post processor Clustered Method Invocation Sequences Query Splitter Final Method Invocation Sequences Query Phase 1 Phase 2 Phase 3

13 Sequence Postprocessor Candidate sequences produced by the code analyzer may be too many Solutions: Cluster similar sequences –Clustering heuristics are developed Rank sequences –Ranking heuristics are developed

14 Clustering Heuristics Method-invocation sequences with the same set of statements can be considered similar, although the statements are in different order. e.g., '' '' and '' '' Method-invocation sequences with minor differences measured by an attribute cluster precision value can be considered similar. e.g., '' '' and '' '' can be considered similar under cluster precision value one

15 Ranking Heuristics Heuristic 1: Higher frequency -> Higher rank Heuristic 2: Shorter length -> Higher rank

16 PARSEWeb: Query Splitter Code Downloader Code Search Engine Open Source Repositories Local Source Code Repository Code Analyzer Method Invocation Sequences Sequence Post Processor Clustered Method Invocation Sequences Query Splitter Final Method Invocation Sequences Query Phase 1 Phase 2 Phase 3

17 Query Splitter Lack of code samples that give candidate method-invocation sequences in the results of code search engines –Required method-invocation sequences are split among different source files Solution: –Split the user query into multiple queries –Compose the results for each split query

18 Query Splitting Example 1. User query: “org.eclipse.jface.viewers.IStructuredSelection->java.io.ObjectInputStream” Results: None 2. Query: “java.io.ObjectInputStream” Results: 3. Most used immediate sources are: java.io.InputStream, java.io.ByteArrayInputStream, java.io.FileInputStream 3. Three Queries to be fired: “org.eclipse.jface.viewers.IStructuredSelection-> java.io.InputStream” Results: 1 “org.eclipse.jface.viewers.IStructuredSelection-> java.io.ByteArrayInputStream” Results: 5 “org.eclipse.jface.viewers.IStructuredSelection-> java.io.FileInputStream” Results: None

19 Eclipse Plugin

20 Evaluations Real Programming Problems: To address problems posted in developer forums Real Projects: To show that solutions recommended by PARSEWeb are –available in real projects –better than solutions recommended by related tools PROSPECTOR, Strathcona, and Google Code Search averagely

21 Real Programming Problems Jakarta BCEL user forum, 2001 Problem : “How to disassemble java byte code” Query : “Code  Instruction” Solution Sequence: FileName:2_RepMIStubGenerator.java MethodName: isWriteMethod Rank:1 NumberOfOccurrences:1 Code,getCode() ReturnType:#UNKNOWN# CONSTRUCTOR,InstructionList(#UNKNOWN#) ReturnType:InstructionList InstructionList,getInstructions() ReturnType:Instruction Solution Sample Code : Code code; InstructionList il = new InstructionList(code.getCode()); Instruction[] ins = il.getInstructions();

22 Real Programming Problems Dev 2 Dev Newsgroups, 2006 Problem : “how to connect db by sessionBean” Query : javax.naming.InitialContext  java.sql.Connection Solution Sequence : FileName:3 AddressBean.java MethodName:getNextUniqueKey Rank:1 NumberOfOccurrences:34 javax.naming.InitialContext,lookup(java.lang.String) ReturnType:javax.sql.DataSource javax.sql.DataSource,getConnection() ReturnType:java.sql.Connection

23 Real Project: Logic Source File: LogicEditor.java SUMMARY-> PARSEWeb: 8/10, Prospector: 6/10, Strathcona: 5/10

24 Comparison with Prospector 12 specific programming tasks taken from XSnippet approach. SUMMARY-> PARSEWeb: 11/12, Prospector: 7/12

25 Comparison with Other Tools Percentage of tasks successfully completed by PARSEWeb, Prospector, and XSnippet

26 Significance of Internal Techniques *Legend: Method inline: Method inlining Post Process: Sequence Post Processor Query Split: Query Splitter

27 Related Work Jungloid mining: Helping to navigate the API jungle [Mandelin et al. PLDI 05] Using structural context to recommend source code examples [Holmes and Murphy ICSE 05] XSnippet: Mining for sample code [Sahavechaphan and Claypool OOPSLA 06] MAPO: Mining API usages from open source repositories [Xie and Pei MSR 06]

28 Future Work Interacts with a single code search engine, i.e., Google code search –Plan to interact with other code search engines Requires both Source and Destination objects –Plan to infer Source from the code context Type heuristics cannot infer types in a few scenarios Eg: QueueSession session = factory.createQueueConnection(). createQueueSession(false, Session.AUTO);  Cannot infer the return type of createQueueConnection and the receiver type of createQueueSession

29 Conclusion An approach that tries to address problems faced by programmers in reusing APIs of existing frameworks or libraries by accepting queries of the form “Source  Destination “ and by suggesting frequent method-invocation sequences as solutions Addressed two problems posted in developer forums and compared with existing related tools Prospector and Strathcona

30 T. Xie Mining Program Source Code Questions?