Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Raphael Hoffmann, James Fogarty, Daniel S. Weld, University of Washington

1 Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Raphael Hoffmann, James Fogarty, Daniel S. Weld
University of Washington, Seattle
UIST 2007

2 Programmers Use Search
– To identify an API
– To seek information about an API
– To find examples of how to use an API
Example task: Programmatically output an Acrobat PDF file in Java.

3 Example: General Web Search Interface

4 Example: Code-Specific Web Search Interface …

5 Problems
Information is dispersed: tutorials, API itself, documentation, pages with samples
Difficult and time-consuming to …
– locate required pieces,
– get an overview of alternatives,
– judge relevance and quality of results,
– understand dependencies.
Many page visits required

6 With Assieme we …
– designed a new Web search interface
– developed the needed inference

7 Outline
Motivation
What Programmers Search For
The Assieme Search Engine
– Inferring Implicit References
– Using Implicit References for Scoring
Evaluation of Inference & User Study
Discussion & Conclusion

8 Six Learning Barriers Faced by Programmers (Ko et al. 04)
– Design barriers: What to do?
– Selection barriers: What to use?
– Coordination barriers: How to combine?
– Use barriers: How to use?
– Understanding barriers: What is wrong?
– Information barriers: How to check?

9 Examining Programmer Web Queries
Objective
– See what programmers search for
Dataset
– 15 million queries and click-through data
– Random sample of MSN queries from 05/06
Procedure
– Extract query sessions containing java: 2,529 sessions
– Manually inspect the queries and define regex filters
– Build an informal taxonomy of query sessions

10 Examining Programmer Web Queries

11 Categories of query sessions (share, example query, learning barrier)
– Descriptive: 64.1 %, e.g. "java JSP current date" (Selection barrier)
– Contain package, type or member name: 35.9 %, e.g. "java SimpleDateFormat" (Use barrier)
– Contain terms like example, using, sample code: 17.9 %, e.g. "using currentdate in jsp" (Coordination barrier)
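
The categories above were produced with hand-written regex filters. The sketch below is a rough illustration of that kind of filter, not the study's code. QueryCategory, QueryClassifier, the patterns, and the tiny KNOWN_API_NAMES list are all hypothetical. The first two shares add up to 100 %, so the sketch treats those two categories as complementary and lets the example-term category overlap them. That reading is an assumption.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical categories mirroring the slide; not the study's actual filters.
enum QueryCategory { DESCRIPTIVE, CONTAINS_API_NAME, CONTAINS_EXAMPLE_TERM }

class QueryClassifier {
    // Assumed stand-in for a real index of package, type and member names.
    private static final List<String> KNOWN_API_NAMES =
        List.of("java.util", "HashMap", "SimpleDateFormat", "Socket");

    // Terms that signal a request for example code.
    private static final Pattern EXAMPLE_TERMS =
        Pattern.compile("\\b(examples?|using|sample code)\\b", Pattern.CASE_INSENSITIVE);

    static Set<QueryCategory> classify(String query) {
        Set<QueryCategory> result = EnumSet.noneOf(QueryCategory.class);
        boolean namesApi = false;
        for (String name : KNOWN_API_NAMES) {
            // Whole-word, case-insensitive match; Pattern.quote escapes the dots in package names.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b", Pattern.CASE_INSENSITIVE);
            if (p.matcher(query).find()) {
                namesApi = true;
                break;
            }
        }
        // In this sketch every query is either descriptive or names an API element ...
        result.add(namesApi ? QueryCategory.CONTAINS_API_NAME : QueryCategory.DESCRIPTIVE);
        // ... and may additionally contain example-seeking terms.
        if (EXAMPLE_TERMS.matcher(query).find()) {
            result.add(QueryCategory.CONTAINS_EXAMPLE_TERM);
        }
        return result;
    }
}
```

For example, classify("java SimpleDateFormat") yields only CONTAINS_API_NAME. The query "using currentdate in jsp" is both DESCRIPTIVE and CONTAINS_EXAMPLE_TERM.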

12 Assieme
Screenshot annotations: example code; documentation; required libraries; relevance indicated by # uses; summaries show referenced types; links to related info

13 Challenges
How to put the right information on the interface?
– Get all programming-related data
– Interpret data and infer relationships

14 Outline
Motivation
What Programmers Search For
The Assieme Search Engine
– Inferring Implicit References
– Using Implicit References for Scoring
Evaluation of Inference & User Study
Discussion & Conclusion

15 Assieme's Data
… is crawled using existing search engines:
– Pages with code examples (~2,360,000): queried Google on java ±import ±class …
– JAR files (~79,000): downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net
– JavaDoc pages (~480,000): queried Google on overview-tree.html …

16 The Assieme Search Engine
… infers 2 kinds of implicit references among JAR files, JavaDoc pages, and pages with code examples:
– Uses of packages, types and members
– Matches of packages, types and members

17 Extracting Code Samples
Challenges: unclear segmentation; code in a different language (C++); distracting terms … in code; line numbers

18 Extracting Code Samples
– Remove HTML tags, but preserve line breaks
– Remove some distractors by heuristics
– Launch an (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)
A simple example (after removing the line numbers "1:" to "5:"):
    import java.util.*;
    class c {
        HashMap m = new HashMap();
        void f() { m.clear(); }
    }
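
The first two steps can be sketched as plain string processing. This is a hypothetical illustration only. The regular expressions, the choice of which tags count as line breaks, and the rule that a leading "N:" or "N " line number is removed only when every non-blank line carries one are all assumptions. The error-tolerant Java parsing step is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cleanup heuristics (class and method names are illustrative).
class SampleCleaner {
    // Tags that end a visual line in a browser; they become '\n' before other tags are dropped.
    private static final Pattern BREAK_TAGS =
        Pattern.compile("(?i)<\\s*(br|/p|/div|/li|/tr)\\b[^>]*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    // A leading line number such as "1:" or "12 " followed by the code itself.
    private static final Pattern LINE_NUMBER = Pattern.compile("^\\s*\\d+(?::\\s?|\\s)(.*)$");

    /** Removes HTML markup but keeps line structure; decodes a few common entities. */
    static String stripHtml(String html) {
        String text = BREAK_TAGS.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        return text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&"); // decode &amp; last to avoid double-decoding
    }

    /** Strips line numbers, but only if every non-blank line starts with one. */
    static String stripLineNumbers(String code) {
        List<String> out = new ArrayList<>();
        for (String line : code.split("\n", -1)) {
            if (line.isBlank()) {
                out.add(line);
                continue;
            }
            Matcher m = LINE_NUMBER.matcher(line);
            if (!m.matches()) {
                return code; // some line is unnumbered: leave the text untouched
            }
            out.add(m.group(1));
        }
        return String.join("\n", out);
    }
}
```

stripLineNumbers leaves a block alone if any non-blank line lacks a number. A single ordinary code line that happens to start with a digit therefore does not trigger stripping.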

19 Resolving External Code References
Naïve approach of finding term matches does not work:
    1 import java.util.*;
    2 class c {
    3     HashMap m = new HashMap();
    4     void f() { m.clear(); }
    5 }
The reference java.util.HashMap.clear() on line 4 is only detectable by considering several lines.
Solution: use the compiler to identify unresolved names
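
The compiler-based idea can be sketched with the JDK's standard javax.tools API. This is an illustrative assumption, not Assieme's implementation. The snippet is compiled with only the JDK on the classpath, and every error message is returned. The sketch relies on javac's "cannot find symbol" errors naming the missing class or member. Turning those messages into qualified names is left out, and UnresolvedNames and compileErrors are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Sketch: compile a snippet with no extra libraries on the classpath and
// collect javac's error messages, which name the symbols it could not resolve.
class UnresolvedNames {
    /** An in-memory source file, so the snippet never has to be written to disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns the messages of all compile errors for the given top-level class source. */
    static List<String> compileErrors(String className, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("requires a JDK, not just a JRE");
        }
        Path outDir;
        try {
            outDir = Files.createTempDirectory("snippet-classes"); // keep .class output out of the cwd
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        javac.getTask(null, null, diagnostics,
                List.of("-d", outDir.toString(), "-proc:none"),
                null, List.of(new StringSource(className, code))).call();
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add(d.getMessage(Locale.ROOT));
            }
        }
        return errors;
    }
}
```

For instance, compileErrors("c", "class c { Foo f; }") reports a "cannot find symbol" error naming the hypothetical class Foo. The HashMap sample compiles cleanly because java.util is part of the JDK. The temporary output directories are not cleaned up in this sketch.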

20 Resolving External Code References
– Index packages/types/members in JAR files (e.g. java.util.HashMap.clear(), java.util.HashMap, …)
– Compile the code sample and collect the unresolved names
– Look up the unresolved names in the index
– Greedily pick the best JARs; utility function: # covered references (and JAR popularity)
– Put the chosen JARs on the classpath (compile & lookup)
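
The lookup-and-select step reads like a greedy set cover. The hypothetical sketch below indexes the types in a JAR with java.util.jar.JarFile and then repeatedly picks the JAR that covers the most still-unresolved names. It indexes only types, whereas the slide also indexes packages and members. Using popularity purely as a tie-breaker is an assumption about how the utility function combines the two factors. JarSelector and its methods are illustrative names.

```java
import java.util.*;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch of indexing JARs and greedily choosing them; not Assieme's code.
class JarSelector {
    /** Maps a class-file entry such as "java/util/HashMap.class" to "java.util.HashMap". */
    static Optional<String> typeName(String entryName) {
        if (!entryName.endsWith(".class") || entryName.endsWith("module-info.class")) {
            return Optional.empty();
        }
        String binary = entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
        // Skip anonymous/local classes such as Outer$1; nested classes become Outer.Inner.
        if (binary.matches(".*\\$\\d.*")) {
            return Optional.empty();
        }
        return Optional.of(binary.replace('$', '.'));
    }

    /** Lists the type names contained in a JAR file (members are omitted in this sketch). */
    static Set<String> typesIn(JarFile jar) {
        Set<String> types = new TreeSet<>();
        for (JarEntry e : Collections.list(jar.entries())) {
            typeName(e.getName()).ifPresent(types::add);
        }
        return types;
    }

    /**
     * Greedy set cover: repeatedly take the JAR that covers the most still-unresolved
     * names, breaking ties by popularity, until no JAR covers anything new.
     */
    static List<String> pickJars(Set<String> unresolved,
                                 Map<String, Set<String>> namesByJar,
                                 Map<String, Integer> popularity) {
        Set<String> remaining = new HashSet<>(unresolved);
        List<String> chosen = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestCover = 0;
            int bestPop = -1;
            for (Map.Entry<String, Set<String>> jar : namesByJar.entrySet()) {
                if (chosen.contains(jar.getKey())) continue;
                Set<String> covered = new HashSet<>(jar.getValue());
                covered.retainAll(remaining);
                int pop = popularity.getOrDefault(jar.getKey(), 0);
                if (covered.size() > bestCover
                        || (covered.size() == bestCover && bestCover > 0 && pop > bestPop)) {
                    best = jar.getKey();
                    bestCover = covered.size();
                    bestPop = pop;
                }
            }
            if (best == null) break; // the remaining names are not in any indexed JAR
            chosen.add(best);
            remaining.removeAll(namesByJar.get(best));
        }
        return chosen;
    }
}
```

When two JARs tie on both coverage and popularity, the choice falls back to HashMap iteration order. A real system would want a deterministic final tie-breaker.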

21 Scoring
Existing techniques …
– Docs modeled as weighted term frequencies
– Hypertext link analysis (PageRank)
… do not work well for code, because:
– JAR files (binary code) provide no context
– Source code contains few relevant keywords
– Structure in code is important for relevance

22 Using Implicit References to Improve Scoring
Assieme exploits structure on Web pages:
– HTML hyperlinks
– structure in code (code references)

23 Scoring
– APIs (packages/types/members)
– Web pages

24 Scoring
APIs
– Use text on doc pages and on pages with code samples that reference the API (~ anchor text)
– Weight APIs by # incoming refs (~ PageRank)
Web Pages
– Use fully qualified references (java.util.HashMap) and adjust term weights
– Filter pages by references
– Favor pages with accompanying text
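
The API-side ideas can be illustrated with a toy ranking function. The formula is an assumption, not Assieme's. It sums a dampened term frequency over the API's name plus the text of referencing pages (the anchor-text analogy). It then scales that sum by log(1 + incoming references), a crude stand-in for the PageRank analogy. Splitting qualified names at the dots lets a query like "hashmap" match java.util.HashMap. This is one simple option, not necessarily how Assieme adjusted term weights. ApiScorer and Api are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical ranking sketch; the scoring formula is an assumption, not Assieme's.
class ApiScorer {
    /** An API element together with the evidence pointing at it. */
    static final class Api {
        final String qualifiedName;          // e.g. "java.util.HashMap"
        final List<String> referencingTexts; // text from doc pages and code-sample pages that reference it
        final int incomingRefs;              // number of pages referencing it

        Api(String qualifiedName, List<String> referencingTexts, int incomingRefs) {
            this.qualifiedName = qualifiedName;
            this.referencingTexts = referencingTexts;
            this.incomingRefs = incomingRefs;
        }
    }

    /** Lower-cased alphanumeric tokens; "java.util.HashMap" becomes [java, util, hashmap]. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Dampened term-frequency match, scaled by log(1 + incoming references). */
    static double score(String query, Api api) {
        Map<String, Integer> termCounts = new HashMap<>();
        List<String> docTokens = new ArrayList<>(tokens(api.qualifiedName));
        for (String text : api.referencingTexts) docTokens.addAll(tokens(text));
        for (String t : docTokens) termCounts.merge(t, 1, Integer::sum);

        double textScore = 0.0;
        for (String q : new HashSet<>(tokens(query))) {
            textScore += Math.log(1 + termCounts.getOrDefault(q, 0));
        }
        return textScore * Math.log(1 + api.incomingRefs);
    }

    /** Sorts APIs by descending score. */
    static List<Api> rank(String query, List<Api> apis) {
        List<Api> sorted = new ArrayList<>(apis);
        sorted.sort(Comparator.comparingDouble((Api a) -> score(query, a)).reversed());
        return sorted;
    }
}
```

With equal text evidence, the more frequently referenced API ranks higher. An API with no incoming references scores zero under this particular formula.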

25 Outline
Motivation
What Programmers Search For
The Assieme Search Engine
– Inferring Implicit References
– Using Implicit References for Scoring
Evaluation of Inference & User Study
Discussion & Conclusion

26 Evaluating Code Extraction and Reference Resolution
… on 350 hand-labeled pages from Assieme's data
Code Extraction
– Recall 96.9%, Precision 50.1% (76.7% after filtering out pages without references)
– False positives: C, C#, JavaScript, PHP, FishEye/diff pages
Reference Resolution
– Recall 89.6%, Precision 86.5%
– False positives: FishEye and diff pages
– False negatives: incomplete code samples
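
For reference, the standard set-based definitions of these two metrics are sketched below. What counts as one item (an extracted code sample, a resolved reference) is an assumption here, and the item identifiers in any usage are hypothetical. ExtractionMetrics is an illustrative name.

```java
import java.util.HashSet;
import java.util.Set;

// Standard set-based precision and recall (illustrative helper, not the authors' code).
class ExtractionMetrics {
    /** Fraction of predicted items that are correct. */
    static double precision(Set<String> predicted, Set<String> gold) {
        if (predicted.isEmpty()) return 0.0;
        return (double) intersectionSize(predicted, gold) / predicted.size();
    }

    /** Fraction of gold-standard items that were found. */
    static double recall(Set<String> predicted, Set<String> gold) {
        if (gold.isEmpty()) return 0.0;
        return (double) intersectionSize(predicted, gold) / gold.size();
    }

    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```

Dropping predictions can only keep recall the same or lower it. Dropping mostly false positives raises precision, which is consistent with the jump from 50.1% to 76.7% once pages without references are filtered out.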

27 User Study: Assieme vs. Google vs. Google Code Search
Design
– 40 search tasks based on queries in the logs, e.g. the query "socket java" became the task "Write a basic server that communicates using Sockets"
– Find code samples (and required libraries)
– 4 blocks of 10 tasks: 1 for training + 1 per interface
Participants
– 9 (under-)graduate students in Computer Science

28 User Study – Task Time
F(1,258) = 5.74, p = .017 *
F(1,258) = 1.91, p = .17
* significant

29 User Study – Solution Quality
Rating scale: 0 = seriously flawed; .5 = generally good but fell short in a critical regard; 1 = fairly complete
F(1,258) = 55.5, p < .0001 *
F(1,258) = 6.29, p = .013 *

30 User Study – # Queries Issued
F(1,259) = 9.77, p = .002 *
F(1,259) = 6.85, p = .001 *

31 Outline
Motivation
What Programmers Search For
The Assieme Search Engine
– Inferring Implicit References
– Using Implicit References for Scoring
Evaluation of Inference & User Study
Discussion & Conclusion

32 Assieme – a novel Web search interface
– Programmers obtain better solutions, using fewer queries, in the same amount of time
– Using Google, subjects visited 3.3 pages/task; using Assieme, only 0.27 pages, but 4.3 previews
– The ability to quickly view code samples changed participants' strategies

33 Thank You
Raphael Hoffmann, James Fogarty, Daniel S. Weld
Computer Science & Engineering, University of Washington
This material is based upon work supported by the National Science Foundation under grant IIS , by the Office of Naval Research under grant N , SRI International under CALO grant and the Washington Research Foundation / TJ Cable Professorship.

34 Search is fundamental in modern user interfaces
– Visualizing search results [Paek et al. 04]
– Finding personal information [Cutrell et al. 06]
– Augmenting structured sites [Huynh et al. 06]
– Summarizing search sessions [Dontcheva et al. 06]
– Invoking commands in programs [Little et al. 06]

35 User Study - Feedback
