Published by Frank Lester. Modified over 6 years ago.
1
Lori Pollock
Professor, CIS
Program Analysis, Software Development & Maintenance Tools, Optimizing Compilers
’81 B.S. CS and Econ, Allegheny
’81-’86 PhD in CS, U of Pittsburgh
Married Mark
’86-’ Assistant Prof, Rice U
Lauren ’88; Lindsay ’90
’ Assistant, Associate, Full Prof, UD CIS
Matt ’95
Today: Lauren & Lindsay here at UD, Matt 14, 3 PhD students and a few undergraduate researchers
2
What I do here at UD
Research: Software Engineering and Compilation Lab (Hiperspace), 213 Smith Hall
Collaborations: Vijay Shanker (UD CIS), Terry Harvey (UD CIS), Lisa Marvel (Army Research Lab), Martin Swany (UD CIS), Guang Gao (UD ECE)
Funding: Primarily NSF grants; previously some Army funding
Graduate Teaching:
- CISC 672 Compilers
- CISC 673 Program Analysis and Transformations
- CISC 879 Software Testing and Maintenance
- CISC 879 Software Tools and Environments
Undergraduates:
- Programming XO laptops for local middle school teachers
- Study abroad programs
3
What I do outside UD
- Associate Editor, Transactions on Software Engineering and Methodology (TOSEM)
- Computing Research Association (CRA)’s Committee on the Status of Women in Computing Research (CRA-W)
- Mentoring: speaker at mentoring workshops for undergrads, grads, assistant and associate profs, and industry lab researchers
- Program committees, conference organization, NSF panels, paper reviews, … (typical of university researchers)
4
PhD Students in Training
PhD: Antony Danalis, Emily Gibson Hill, Giri Sridhara
Masters: Divya Muppaneni
Current Undergraduates: Eric Enslen, Sana Malik, Katie Baldwin
5
Recently Completed PhD 2007-08
David Shepherd: Postdoc, Startup
Sara Sprenkle: Assistant Prof, Washington & Lee U
Mike Jochen: Assistant Prof, East Stroudsburg U
Ben Breech: Postdoc, NASA
6
Overview: Research Projects
Foundation: Program Analysis, Compiler Technology
- Natural Language Analysis of Programs (Emily, Giri)
- Testing Web Applications (collaboration with Sara Sprenkle)
- Runtime Test Generation via Dynamic Compilers (Ben)
- Optimization of Cluster Parallel Programs (Antony)
Spectrum: Software Tools … Testing … Compilers … Parallel Computing
7
Optimizing Cluster Parallel Programs
Research Problem: How can scientific codes be scaled to a cluster of many CPUs?
Major Challenge: Communication costs
Approach and Contributions: An integrated system to hide communication latency
- Surveyor: collects “knowledge” of the cluster
- Compiler: analyzes dependencies and transforms the code to create maximal communication/computation overlap
- Communication Library: a companion library to MPI
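To make "communication/computation overlap" concrete, here is a minimal sketch in Java. It is not the ASPhALT transformation and uses no MPI: the transfer is simulated with a sleep, and `exchangeBoundary`, `step`, and the array names are hypothetical. It only illustrates the pattern the compiler aims to create: start the communication early, do independent work, and wait at the first point where the received data is needed.

```java
import java.util.concurrent.CompletableFuture;

class OverlapSketch {
    /** Stand-in for a network transfer (assumption: real code would post a nonblocking MPI call). */
    static double[] exchangeBoundary(double[] boundary) {
        try {
            Thread.sleep(50); // simulated communication latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return boundary.clone();
    }

    static double sum(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s;
    }

    static double step(double[] interior, double[] boundary) {
        // 1. Start the communication as early as the dependencies allow.
        CompletableFuture<double[]> incoming =
            CompletableFuture.supplyAsync(() -> exchangeBoundary(boundary));
        // 2. Do the computation that does not depend on the incoming data.
        double interiorSum = sum(interior);
        // 3. Wait only where the received data is first used.
        return interiorSum + sum(incoming.join());
    }

    public static void main(String[] args) {
        System.out.println(step(new double[] {1, 2, 3}, new double[] {10, 20}));
    }
}
```

The latency is hidden only to the extent that step 2 takes as long as the transfer, which is why the slide pairs the compiler with a Surveyor that measures the cluster.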
8
ASPhALT: Automatic System for Parallel AppLication Transformations
Contribution: FIRST to cluster-optimize MPI codes
9
Testing Web Applications
Web Application Structure: Client (HTML), Server (Java), Database (MySQL)
Diagram labels: Browser, Front End, Back End
Combination of:
- Stand-alone applications
- GUIs and database applications
- Distributed applications
Numerous technologies and components
10
Traditional Software Testing Process
[Diagram: Application Specification, Application Representation, Implementation, Test Case Generator, Test Cases, Expected Results, Actual Results, Oracle, Pass/Fail]
Callout: Hard to obtain when testing web applications!!
Alternative shown: User-session-based Testing with a Replay Tool
11
User-session-based Testing Process
Example user requests: register.jsp?name=ss&pass=tst, login.jsp?name=ss&pass=tst, logout.jsp
[Diagram: Users send requests to the deployed Beta Web Application (v.0.9); the Log of User Requests yields User Sessions; user-session-based testing creates and reduces Test Cases; the Replay Tool runs them against the Web Application Implementation (v.1.0); the Oracle compares Actual Results with Expected Results to give Pass/Fail]
12
Maintenance Testing for Web Applications
Research Problem: How can we exploit user session logging for testing of web applications after initial deployment, with minimal tester effort?
Contributions: Scalable, practical, automated structural testing framework for web applications
* Test case generation
* Test suite reduction
* Test oracles
* Test coverage criteria in terms of URLs, parameters, values
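A rough illustration of test suite reduction under a URL coverage criterion, written in Java. This is a sketch under stated assumptions, not the framework from the slides: each user session is a list of logged request strings, the coverage requirement is the request path without its parameters, and the reduction is a plain greedy set cover. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SessionReductionSketch {
    /** Coverage requirement for one logged request: the path without its parameters. */
    static String baseUrl(String request) {
        int q = request.indexOf('?');
        return q < 0 ? request : request.substring(0, q);
    }

    static Set<String> urlsCovered(List<String> session) {
        Set<String> urls = new LinkedHashSet<>();
        for (String request : session) urls.add(baseUrl(request));
        return urls;
    }

    /** Greedy reduction: repeatedly keep the session covering the most still-uncovered URLs. */
    static List<List<String>> reduce(List<List<String>> sessions) {
        Set<String> uncovered = new LinkedHashSet<>();
        for (List<String> s : sessions) uncovered.addAll(urlsCovered(s));
        List<List<String>> kept = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            List<String> best = null;
            int bestGain = 0;
            for (List<String> s : sessions) {
                Set<String> gain = urlsCovered(s);
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    best = s;
                    bestGain = gain.size();
                }
            }
            kept.add(best); // never null: every uncovered URL came from some session
            uncovered.removeAll(urlsCovered(best));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> sessions = List.of(
            List.of("register.jsp?name=ss&pass=tst", "login.jsp?name=ss&pass=tst", "logout.jsp"),
            List.of("login.jsp?name=ab&pass=xy", "logout.jsp"),
            List.of("login.jsp?name=cd&pass=zz"));
        System.out.println(reduce(sessions).size() + " of " + sessions.size() + " sessions kept");
    }
}
```

Swapping `baseUrl` for a function that also keeps parameter names or values would give the finer-grained criteria listed above, at the cost of a larger reduced suite.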
13
Analyzing the Names in Software
Research Problem: % software costs are in reading and navigating large software systems to fix bugs and add new features. Can we help by automating search, navigation, and location of relevant code?
Key: Programmers leave clues of their intent as they choose names.
Focus on actions:
- Correspond to verbs
- Verbs need a direct object
- Phrases are more useful
Proposed Approach: Develop, extend, and apply natural-language-based analysis to identifier names and comments
Contribution: Aid understanding, debugging, maintenance, development
14
Our Research Focus and Impact
Software Maintenance Tools: Exploration, Understanding, …
NLPA: helps build these software maintenance tools
Natural Language Analysis:
- Word relations (synonyms, antonyms, …)
- Part-of-speech tagging
- Abbreviations …
- Word splitting
15
How do SE tool users search code now?
Iterative Query Refinement & Search Process:
1. User formulates query
2. Query executed by search method over the source code
3. User views search results and determines their relevance
4. Repeat as necessary: (re)formulate query
Our focus: Natural language (NL) queries (Google example): “compile report” vs. “compil*report”
Reason for NL vs. RE: NL queries are more flexible than queries such as regular expressions (REs), and in previous work we’ve shown that keyword-style queries are superior for users who are new to a software system. Note the difference between developers and users (a tool user is likely not the original developer).
User faces 2 challenges:
- Decide what query words to search for
- Determine whether the results are relevant
16
Our Contextual Search Process
[Diagram: User, (Re)formulate Query, Hierarchical Query, Search Method, Search Results, Determine Relevance of Results, Source Code, Information Extraction Process, NL Phrase Mapping, Partial Phrase Matching]
17
Another of our Tools: Dora the Program Explorer*
Inputs to Dora:
- Natural language query: maintenance request, expert knowledge, query expansion
- Program structure representation: currently the call graph, with a seed starting point
Output: the relevant neighborhood, i.e., the subgraph relevant to the query
Before I give you the details of how we identify the relevant neighborhood, let me give you some background to put the problem into context.
* Dora comes from exploradora, the Spanish word for a female explorer.
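The idea of growing a relevant neighborhood from a seed can be sketched in Java as follows. The slide does not say how Dora scores relevance, so this sketch substitutes a deliberately naive assumption: a callee is relevant if its name contains a query word. The call graph, method names, and `isRelevant` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NeighborhoodSketch {
    /** Hypothetical relevance test: the method name contains at least one query word. */
    static boolean isRelevant(String method, Set<String> queryWords) {
        String lower = method.toLowerCase();
        for (String w : queryWords) {
            if (lower.contains(w.toLowerCase())) return true;
        }
        return false;
    }

    /** Walk call edges outward from the seed, keeping only methods judged relevant. */
    static Set<String> relevantNeighborhood(Map<String, List<String>> callGraph,
                                            String seed, Set<String> queryWords) {
        Set<String> kept = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        kept.add(seed);
        work.add(seed);
        while (!work.isEmpty()) {
            String caller = work.remove();
            for (String callee : callGraph.getOrDefault(caller, List.of())) {
                if (!kept.contains(callee) && isRelevant(callee, queryWords)) {
                    kept.add(callee);
                    work.add(callee); // explore further only through relevant methods
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callGraph = Map.of(
            "compileReport", List.of("loadReportTemplate", "logMessage"),
            "loadReportTemplate", List.of("parseReportFile", "openStream"),
            "logMessage", List.of("formatDate"));
        // prints [compileReport, loadReportTemplate, parseReportFile]
        System.out.println(relevantNeighborhood(callGraph, "compileReport", Set.of("report")));
    }
}
```

Pruning at irrelevant methods keeps the subgraph small, but it also means relevant code reachable only through an irrelevant caller is missed; a real tool would need a richer relevance measure than substring matching.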
18
Illustrating some issues: Extracting Clues from Signatures
Steps:
1. Split name into words
2. Part-of-speech tag method name
3. Chunk method name
4. Identify verb and direct object (DO)

Example method:
    public UserList getUserListFromFile( String path ) throws IOException {
        try {
            File tmpFile = new File( path );
            return parseFile(tmpFile);
        } catch( java.io.IOException e ) {
            // Re-throw with the path included so the failing file is identifiable
            throw new IOException( "UserList format issue " + path + " file " + e );
        }
    }

POS tag: get<verb> User<adj> List<noun> From<prep> File<noun>
Chunk: get<verb phrase> User List<noun phrase> From File<prep phrase>
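The first and last steps on this slide can be approximated in a few lines of Java. This is a simplified sketch, not the analysis used in the research: it assumes camelCase names, assumes the first word is the verb, and uses a small hard-coded preposition list (hypothetical) in place of a real part-of-speech tagger and chunker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SignatureClues {
    // Hypothetical, tiny stand-in for a real part-of-speech tagger.
    static final Set<String> PREPOSITIONS = Set.of("from", "to", "in", "on", "for", "with", "by");

    /** Split a camelCase identifier into lowercase words. */
    static List<String> splitCamelCase(String name) {
        List<String> words = new ArrayList<>();
        // Break where a lowercase letter or digit meets an uppercase letter,
        // and before the last capital of an acronym followed by lowercase.
        for (String w : name.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
            if (!w.isEmpty()) words.add(w.toLowerCase());
        }
        return words;
    }

    /** Assume the first word is the verb; the direct object runs up to the first preposition. */
    static String[] verbAndDirectObject(String methodName) {
        List<String> words = splitCamelCase(methodName);
        if (words.isEmpty()) return new String[] { "", "" };
        List<String> directObject = new ArrayList<>();
        for (String w : words.subList(1, words.size())) {
            if (PREPOSITIONS.contains(w)) break;
            directObject.add(w);
        }
        return new String[] { words.get(0), String.join(" ", directObject) };
    }

    public static void main(String[] args) {
        String[] clue = verbAndDirectObject("getUserListFromFile");
        System.out.println(clue[0] + " | " + clue[1]); // prints: get | user list
    }
}
```

The first-word-is-verb assumption breaks on names such as `userListSize`, which is one reason tagging and chunking appear as separate steps.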
19
Developing Basic NL Analyses
Software Maintenance Tools: Search, Exploration, Understanding, …
NLPA: helps build these software maintenance tools
Natural Language Analysis:
- Word relations (synonyms, antonyms, …)
- Part-of-speech tagging
- Abbreviations …
- Word splitting
20
Automatic Abbreviation Expansion
Goal: Don’t miss relevant code that uses abbreviations
Problem: Given a code segment, identify character sequences that are short forms and determine their long forms
Steps:
1. Split identifiers
2. Identify non-dictionary words
3. Determine long form
Identifying character sequences, or tokens, in code boils down to splitting the identifiers. The hardest cases are identifiers with no explicit word boundary. If they are not properly split, abbreviations will be missed, such as in string length.
Approach: Mine expansions from code [MSR 08]
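One ingredient of mining expansions from code is deciding whether a nearby word could be the long form of a short form. The Java sketch below shows a simple version of that check; it is an assumption for illustration, not the algorithm from the cited work. A word qualifies if it starts with the same letter and contains the short form's letters in order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AbbreviationSketch {
    /** True if shortForm's letters appear in longForm in order, starting at the first letter. */
    static boolean couldAbbreviate(String shortForm, String longForm) {
        String s = shortForm.toLowerCase();
        String l = longForm.toLowerCase();
        if (s.isEmpty() || s.length() >= l.length() || s.charAt(0) != l.charAt(0)) return false;
        int matched = 0;
        for (int j = 0; j < l.length() && matched < s.length(); j++) {
            if (s.charAt(matched) == l.charAt(j)) matched++;
        }
        return matched == s.length();
    }

    /** Candidate long forms drawn from words seen nearby (method body, comments, ...). */
    static List<String> candidates(String shortForm, List<String> nearbyWords) {
        List<String> out = new ArrayList<>();
        for (String w : nearbyWords) {
            if (couldAbbreviate(shortForm, w) && !out.contains(w)) out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> nearby = List.of("read", "the", "configuration", "file", "and", "count", "entries");
        System.out.println(candidates("cfg", nearby)); // prints [configuration]
    }
}
```

Because the candidates come from the surrounding code rather than a fixed table, the same short form can resolve differently in different files.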
21
Issues with a Simple Dictionary Approach
The simplest approach is a manually created dictionary of common short forms in code. However, this has a couple of drawbacks.
Approach: Manually create a lookup table of common abbreviations in code
Drawbacks:
- Vocabulary evolves over time, so the table must be maintained
- The same abbreviation can have different expansions depending on domain AND context
Example: cfg → Control Flow Graph? Context-Free Grammar? configuration? configure?
22
What have we learned overall?
Evaluation studies indicate: Natural language analysis has far more potential to improve software maintenance tools than we initially believed
Existing technology falls short: Synonyms, collocations, morphology, word frequencies, part-of-speech tagging, AOIG
Keys to further success:
- Improve recall
- Extract additional NL clues
23
What are we doing now?
Emily:
- Developing a general software word usage model (SWUM)
- Implementing SWUM
- Evaluating for search
Giri:
- Developing techniques for automatic comment generation
- Which statements to include in summary content?
- How to generate phrases for that content?
- Providing feedback on SWUM refinement
Divya:
- Analyzing specific Java statement structures
- Developing templates for comment phrase generation
24
Overview: Research Projects
Foundation: Program Analysis, Compiler Technology
- Natural Language Analysis of Programs (Emily, Giri)
- Testing Web Applications (collaboration with Sara Sprenkle)
- Optimization of Cluster Parallel Programs (Antony)
- NEW… Natural Language Analysis of Parallel Programs
Spectrum: Software Tools … Testing … Compilers … Parallel Computing