1 Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University CoxR: Open Source Development History Search System Makoto Matsushita, Kei Sasaki, and Katsuro Inoue Osaka University

2 2005/12/17, Supporting Knowledge Collaboration in Software Development. Contents: Background (open-source software development; repository analysis system “CoxR”). Supporting Dynamic Communication System. Future research interests.

3 Open Source Software Development. Open and parallel software development: anybody can join the party at any time; developers live all over the world. [Diagram: developers, CVS, email archives, and GNATS. Developers exchange source code, manuals, and email; bug reports and requests for feature enhancements are submitted, and requests lead to fixes.]

4 Reusing repositories. System repositories hold valuable information, such as the evolution histories of products and information about each developer: the processes applied to products, and knowledge of requirements and design. Analyzing and reusing these contents may help reduce the time and effort of software development as a whole: reusing ways of fixing bugs; understanding a project that one is about to join; reusing (parts of) products or components. However, reusing these contents presents some difficulties…

5 Problem 1: weak links between systems. Where can I find what I want? User: “It seems that the ‘bktr’ driver has a bug, so I’d like to fix it…” [Diagram: the user facing CVS, the email archive, and GNATS. Items scattered across the systems: a proposed fix for the bktr driver, discussions on the bktr driver, source code fixes, and files that also need to be changed.]

6 Problem 2: interests may vary. Even when the problem is the same, a solution applied in the past does not suit everyone: knowledge and processes vary among developers; information needs vary over time. Problem: there is a bug in the bktr driver. “I’d like to find the authorities on the graphics driver.” “I’d like to have a new version of the bktr driver.” “Maybe similar bugs appeared in other drivers, so I’ll search for them.”

7 Objective. Analyze the past processes and histories kept in existing systems to help developers search, understand, and reuse such processes. Model the information in these systems as a “development community”, using CVS, email, and GNATS. Propose an approach for extracting information from the development community. Build a prototype of the proposed approach.

8 Topics. Step 1: Modeling information. Step 2: Information extraction algorithm. Step 3: System implementation.

9 Model elements. People: developers registered in the CVS, email archive, and GNATS databases. Knowledge: the contents of CVS, email, and GNATS. [Diagram: CVS, email archives, and GNATS combined into an integrated model.]

10 Extracting people/knowledge. [Diagram: fields extracted from CVS, E-mail, and GNATS. Base: file path, revision #, tag, date; Message-Id:, Date:, file path; PR #, date last modified. Knowledge: source code, comments, modification; Subject:, body; category, bug class, description, fix, audit-trail, status. People: developer, contributor; From:, To:, Cc:; Originator, Responsible.]

11 People/Knowledge network. We assume that the network has three types of edges: People-Knowledge, People-People, and Knowledge-Knowledge. [Diagram: Development Community.]

12 Extracting network edges (1/2). People-Knowledge edge: people and knowledge elements that appear in the same CVS, email, and GNATS information. People-People edge: people who appear in the same CVS, email, and GNATS information; people subscribed to the same lists; people working on the same directory.

13 Extracting network edges (2/2). Knowledge-Knowledge edge. Directly connected: revision histories of the same file; files in the same directory; files modified at the same time; email threads; email/PR IDs. Similar knowledge: source code; keywords; base/modification information in GNATS.
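
The people/knowledge network described on the last few slides can be pictured as a small undirected graph. The following is a minimal sketch, not CoxR's implementation: the class names `Node` and `DevelopmentCommunity`, the `reason` strings, and the in-memory representation are all assumptions made for illustration. Only the two node kinds, the three edge types, and the sample names (fjoe, bktr_core.c, bktr_card.c) come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str    # "people" or "knowledge"
    source: str  # "cvs", "email", or "gnats"
    name: str


class DevelopmentCommunity:
    """Undirected graph of people and knowledge elements (illustrative only)."""

    _EDGE_NAMES = {
        ("knowledge", "knowledge"): "Knowledge-Knowledge",
        ("knowledge", "people"): "People-Knowledge",
        ("people", "people"): "People-People",
    }

    def __init__(self):
        self._adj = defaultdict(set)

    def add_edge(self, a, b, reason):
        # Store the edge in both directions, tagged with why it exists.
        self._adj[a].add((b, reason))
        self._adj[b].add((a, reason))

    @classmethod
    def edge_type(cls, a, b):
        # The edge type is determined by the kinds of its two endpoints.
        return cls._EDGE_NAMES[tuple(sorted((a.kind, b.kind)))]

    def neighbors(self, node, edge_type=None):
        # Return (neighbor, reason) pairs, optionally filtered by edge type.
        pairs = sorted(self._adj.get(node, ()), key=lambda p: (p[0].name, p[1]))
        return [
            (other, reason)
            for other, reason in pairs
            if edge_type is None or self.edge_type(node, other) == edge_type
        ]


# Sample data taken from the bktr walkthrough in the slides.
community = DevelopmentCommunity()
fjoe = Node("people", "cvs", "fjoe")
core = Node("knowledge", "cvs", "bktr_core.c")
card = Node("knowledge", "cvs", "bktr_card.c")
community.add_edge(fjoe, core, "same CVS information")
community.add_edge(core, card, "modified at the same time")

for other, reason in community.neighbors(core):
    print(other.name, "|", DevelopmentCommunity.edge_type(core, other), "|", reason)
```

Tagging each edge with the rule that produced it keeps the reason for a link visible when a user later browses related elements.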

14 Topics. Step 1: Modeling information. Step 2: Information extraction algorithm (finding a small network that matches the user’s input). Step 3: System implementation.

15 Topic community. Topic = reusable process and information. The elements related to a topic can be defined as a sub-network of the development community. The topic community may differ for each user. [Diagram: a topic community inside the development community, with experts on this area and patches.]

16 Topic community extraction (1/6). Select the initial knowledge elements: 1. Assume that a topic is given by a user. 2. Extract knowledge matched to the topic. 3. Select the initial knowledge elements. Topic inputs: code fragments; directory/file name; mailing list name; bug class/description; keywords; date. Search results for keyword “bktr”: CVS: bktr_core.c 1.20, comment: fix register error. GNATS: description: fix bktr option error (2000), “I found that there is a register error in the bktr driver while watching TV with the fxtv program…”. E-mail: subject: bktr module unloading (2002).

17 Topic community extraction (2/6). Select the initial knowledge elements: 1. Assume that a topic is given by a user. 2. Extract knowledge matched to the topic. 3. Select the initial knowledge elements. Select bktr_core.c. User: “It seems that bktr_core.c rev. 1.20 is good.” Candidates: CVS: bktr_core.c 1.20, comment: fix register error. E-mail: subject: bktr module unloading (2002). GNATS: description: fix bktr option error (2000).
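
Steps 1 to 3 above amount to a keyword lookup over the knowledge elements of all three sources, followed by a user choice. The following is a minimal sketch under simplifying assumptions: a case-insensitive substring match stands in for whatever matching CoxR actually performs, `KnowledgeElement` and `search_topic` are hypothetical names, and the last list entry is an invented non-matching record added only to show filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeElement:
    source: str  # "CVS", "GNATS", or "E-mail"
    title: str
    text: str


def search_topic(elements, keyword):
    """Return the elements whose title or text contains the keyword."""
    needle = keyword.lower()
    return [
        e for e in elements
        if needle in e.title.lower() or needle in e.text.lower()
    ]


elements = [
    KnowledgeElement("CVS", "bktr_core.c 1.20", "fix register error"),
    KnowledgeElement("GNATS", "fix bktr option error (2000)",
                     "register error on bktr driver while watching TV"),
    KnowledgeElement("E-mail", "bktr module unloading (2002)", ""),
    # Hypothetical record that should not match the topic.
    KnowledgeElement("CVS", "unrelated.c 1.1", "whitespace cleanup"),
]

hits = search_topic(elements, "bktr")
initial = hits[0]  # the user picks the CVS hit as the initial knowledge element
for e in hits:
    print(e.source, ":", e.title)
```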

18 Topic community extraction (3/6). 4. Show related people and knowledge using the network. 5. The user selects appropriate elements again. User: “I’d like to know the people working on bktr_core.c.” Search related elements. Search results for bktr_core.c: developer: fjoe; contributor: roger; contributor: phk.

19 Topic community extraction (4/6). 4. Show related people and knowledge using the network. 5. The user selects appropriate elements again. Select fjoe. User: “Hmm, fjoe is the actual developer, so I want to know more about him.” Candidates for bktr_core.c: developer: fjoe; contributor: roger; contributor: phk.

20 Topic community extraction (5/6). “Search and select elements” is repeated. User: “OK, are there any other elements from when fjoe changed bktr_core.c…” Search related elements. Search results for developer fjoe and bktr_core.c: changed at the same time: bktr_card.c; variables changed in yuv422_pro().

21 Topic community extraction (6/6). “Search and select elements” is repeated. Search related elements, tracking the GNATS elements that discuss bktr_card.c. Search results: developer: fjoe; changed at the same time: bktr_card.c; email commenting on the change; variables changed in yuv422_pro(); PR:41437 causes a register error; GNATS PR:41437 (closed), description: problems in bktr_card.c:yuv422_pro(). Topic community: the user finally gets information about the changes to bktr_card.c, which helps to fix the register error. [Diagram elements: user, bktr_core.c.]
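
The six extraction slides describe a loop: show the elements related to the current selection, let the user pick, and repeat until nothing more is selected. The following is a minimal sketch of that loop, not the CoxR algorithm itself. The function name `extract_topic_community`, the `related` dictionary, and the `choose` callback that stands in for the interactive user are assumptions; the relations are hand-copied from the bktr walkthrough.

```python
def extract_topic_community(related, start, choose):
    """Grow a topic community by repeated "search and select" rounds.

    related: dict mapping an element to the elements linked to it
    start:   the initial knowledge element
    choose:  callback that receives the candidate list and returns the
             elements the user selects (an empty list ends the loop)
    """
    community = [start]
    frontier = [start]
    while frontier:
        # Search: collect unseen elements related to the latest selections.
        candidates = []
        for element in frontier:
            for other in related.get(element, []):
                if other not in community and other not in candidates:
                    candidates.append(other)
        # Select: keep only what the user picks from the candidates shown.
        selected = [c for c in choose(candidates) if c in candidates]
        community.extend(selected)
        frontier = selected
    return community


# Relations copied by hand from the walkthrough in the slides.
related = {
    "bktr_core.c": ["fjoe", "roger", "phk"],
    "fjoe": ["bktr_card.c", "bktr_core.c"],
    "bktr_card.c": ["PR:41437", "fjoe"],
}

# Scripted stand-in for the user's choices at each round.
wanted = {"fjoe", "bktr_card.c", "PR:41437"}
topic_community = extract_topic_community(
    related, "bktr_core.c", lambda candidates: [c for c in candidates if c in wanted]
)
print(topic_community)
```

Because the user filters candidates at every round, two users starting from the same element can end with different topic communities, which matches the claim that a topic community may differ for each user.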

22 Topics. Step 1: Modeling information. Step 2: Information extraction algorithm. Step 3: System implementation (CoxR: a web-based system using FreeBSD data).

23 CoxR implementation. Uses FreeBSD development data from 1994 to 2004. CVS: FreeBSD CVS repository (57822 files and 618186 revisions in total). E-mail: “Committed changes” mailing lists (213723 in total). BTS: FreeBSD GNATS PRs (82350 in total). System development environment: CPU: Pentium 4 1.5GHz; RAM: 512MB (SDRAM); OS: Debian GNU/Linux; system size: about 10000 LOC.

24 System overview. [Diagram: components: user, Web Server, CoxR-C, System Control, Information Extraction, Relation extraction, History DB (Knowledge, People), and Relation DB (Knowledge-Knowledge, People-Knowledge, and People-People relations); sources: E-mail, GNATS, and CVS. The user sends topic words and selections and receives matched people/knowledge as search results. Legend: data extraction flow, relation extraction flow, information search flow.]

25 System evaluation. Purpose: confirm that CoxR provides useful information to developers through appropriate search results. Process: announce CoxR on the ‘freebsd-hackers’ and ‘freebsd-current’ mailing lists, which are mainly for FreeBSD developers; trace users’ behavior with the web server’s log. Evaluation period: Jan/31/2005 to Feb/21/2005. Total users: 79 (31 unique users).

26 Initial knowledge selection. Unfortunately, not all users selected knowledge from the topic search results: 18 out of 31 users selected initial knowledge. Maybe the others were just trying out the CoxR search, or the search results were not good enough for them. Type of information selected: CVS: 12; E-mail: 4; GNATS: 2. Average number of selections: 4 per topic (min 1, max 9).

27 Topic community search. Users actually searched for a topic community: 12 out of 18. They tended to search for related people and knowledge within the same subsystem. Average network traversal: 2 times (People-People: 1; People-Knowledge: 8; Knowledge-Knowledge: 13).

28 Discussions. Initial knowledge selections: 56% of search results would lead to valuable information; “search by keyword, then search by developer names and/or date” is a typical search pattern. Topic community selection: 67% of the users who found initial knowledge elements successfully found their own topic community; they tended to trace the Knowledge-Knowledge and People-Knowledge edges of the development network.

29 Conclusion. “CoxR”, a search system for open-source software development: CVS, email, and GNATS; development network and topic community; evaluation with the help of real developers. Keywords may have their own information costs: easy to find important keywords; links between similar keywords. Developer roles: easy to find people by their roles. Reuse topic communities found by others: they can serve as suggestions for finding a topic community.

30 Fin

31 CoxR. [Diagram: system architecture. The user sends a query word (source code, commit log keyword, time, file name, or developer name) to CoxR (Web Server) and receives search results (file name, developer name, time) together with related files and data (CVS information, e-mail information, developer name, time, topics). Components: CGI-Main, Lexical analysis tool, Token compare tool (token similarity), Data Display, Record System, CoDS, SPxR, DB Create tool, CVS info Create tool, E-mail info Create tool, Fusion info Create tool. Databases: Code DB, CVS Info DB, E-mail Info DB, Fusion info DB. Sources: CVS Repository and E-mail Archive.]

32 Example case. Sending a password; password attack. Source code for sending a password, which needs improvement:
    if (i != 0)
        error("Permission denied, please try again.");
    password = read_passphrase(prompt, 0);
    packet_start(SSH_CMSG_AUTH_PASSWORD);
    packet_put_string(password, strlen(password));
    memset(password, 0, strlen(password));
    xfree(password);
    packet_send();
    packet_write_wait();

33 Searching the repositories. Search by: directory structure and list archive; keywords in source code; file name and repository. The user submits the code fragment to identify similar code:
    if (i != 0)
        error("Permission denied, please try again.");
    password = read_passphrase(prompt, 0);
    packet_start(SSH_CMSG_AUTH_PASSWORD);
    packet_put_string(password, strlen(password));
    memset(password, 0, strlen(password));
    xfree(password);
    packet_send();
    packet_write_wait();

34 Searching similar code. After matching files are detected, identify which are most important in this case, using logs and diffs. Log: “pad passwords sent to not give hints to the length”. Developer green committed here at 2001/03/20 02:06:40. packet_put_string() is changed to ssh_put_password(). There is evidence of an improvement, but it is hard to understand what actually changed. Next: understanding with related information.
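
The slides mention a lexical analysis tool and a token compare tool but do not describe how similarity is computed. The following is a minimal sketch of one possible token-based comparison, under clearly stated assumptions: the regular-expression tokenizer is a rough stand-in for a real C lexer, and `difflib.SequenceMatcher` from the Python standard library is swapped in for whatever comparison CoxR uses. The names `tokenize` and `token_similarity` are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Rough C-like tokenizer: string literals, identifiers, integers,
# then any single punctuation character.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|[^\s\w]')


def tokenize(code):
    """Split a code fragment into a flat list of tokens."""
    return TOKEN.findall(code)


def token_similarity(a, b):
    """Similarity in [0, 1] between the token sequences of two fragments."""
    return SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False).ratio()


query = "packet_put_string(password, strlen(password));"
candidates = [
    "ssh_put_password(password);",  # the changed call from the slides
    "return 0;",                    # hypothetical unrelated line
]

# Rank the candidates by similarity to the query fragment.
ranked = sorted(candidates, key=lambda c: token_similarity(query, c), reverse=True)
for c in ranked:
    print(round(token_similarity(query, c), 2), c)
```

Comparing token sequences rather than raw text makes the match insensitive to whitespace and layout, which is why a lexical analysis step usually precedes the comparison.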

35 Searching related information. Display the change history of the file. Search files and emails from the same time as this commit. Search files and emails by the same developer.
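
The two related-information searches on this slide reduce to filters over commit and email records. The following is a minimal sketch, assuming a flat list of records: the `Record` type, the function names, the five-minute window, and every file name and the second developer name are hypothetical. Only the developer name green and the timestamp 2001/03/20 02:06:40 come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    kind: str        # "commit" or "email"
    name: str        # file name or mail subject (hypothetical values below)
    developer: str
    when: datetime


def same_time(records, anchor, window=timedelta(minutes=5)):
    """Records within `window` of the anchor (window size is an assumption)."""
    return [r for r in records if r != anchor and abs(r.when - anchor.when) <= window]


def same_developer(records, anchor):
    """Records by the developer who made the anchor commit."""
    return [r for r in records if r != anchor and r.developer == anchor.developer]


anchor = Record("commit", "changed_file.c", "green", datetime(2001, 3, 20, 2, 6, 40))
records = [
    anchor,
    Record("commit", "other_file.c", "green", datetime(2001, 3, 20, 2, 6, 40)),
    Record("email", "commit notification", "green", datetime(2001, 3, 20, 2, 7, 0)),
    Record("commit", "later_file.c", "green", datetime(2001, 4, 1, 12, 0, 0)),
    Record("commit", "unrelated.c", "someone_else", datetime(2001, 3, 20, 2, 8, 0)),
]

# Combining both filters narrows the results to the same change set.
both = [r for r in same_time(records, anchor) if r in same_developer(records, anchor)]
print([r.name for r in both])
```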

36 Search by revision histories. Search differences between revisions. Search files by the same developer and from the same time.

37 Search by development time. Detailed information on this file. Email information.

38 Search by keyword “openssh”. Combining search results makes it easy to find what we need.

39 Search similar information. Files committed at the same time (2001/03/20 02:06:40) and by the same developer (green). Definition of ssh_put_password(). CoxR finds the actual source code that hides the password packet length. Search for live source code from related source files.

40 Solutions. Search “how to fix”; understand the changes; reuse it later. Code the user started from:
    if (i != 0)
        error("Permission denied, please try again.");
    password = read_passphrase(prompt, 0);
    packet_start(SSH_CMSG_AUTH_PASSWORD);
    packet_put_string(password, strlen(password));
    memset(password, 0, strlen(password));
    xfree(password);
    packet_send();
    packet_write_wait();
Code found by the search:
    void
    ssh_put_password(char *password)
    {
        int size;
        char *padded;

        size = roundup(strlen(password) + 1, 32);
        padded = xmalloc(size);
        memset(padded, 0, size);
        strlcpy(padded, password, size);
        packet_put_string(padded, size);
        memset(padded, 0, size);
        xfree(padded);
    }
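
The fix works because `roundup(strlen(password) + 1, 32)` makes the transmitted string a multiple of 32 bytes, so its length no longer reveals the exact password length. The following is a small sketch of that arithmetic only, written in Python as a stand-in for the C expression: `roundup` mirrors the usual BSD macro `((x + y - 1) / y) * y`, and `padded_size` is a hypothetical helper name.

```python
def roundup(x, y):
    """Round x up to the next multiple of y (integer arithmetic)."""
    return ((x + y - 1) // y) * y


def padded_size(password, block=32):
    """Bytes sent for a password: its length plus a NUL byte, rounded up to a block."""
    return roundup(len(password.encode()) + 1, block)


# Passwords of different lengths map to the same transmitted size.
for candidate in ["a", "secret", "x" * 31, "x" * 32]:
    print(len(candidate), "->", padded_size(candidate))
```

With a block of 32, every password of up to 31 bytes is sent as exactly 32 bytes, and a 32-byte password moves to the next block of 64.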

41 Discussions. Search similar code: shows the actual changes. Search related information: helps in understanding how the security hole was fixed. It is easy to detect what we need, since any kind of information, including keywords, time, developer name, and code fragments, can be used. It is easy to understand search results because related information is easy to find; this helps to grasp not only “what” changed but also “why” the change happened.

42 Concluding remarks. Implemented “CoxR”, a search system for both CVS revisions and email archives. Using actual open-source development data, CoxR provides an easy and quick way to search for useful information on software development. Broader experimentation. Improvements to the search method (multiple searches at one time). Information scoring (define the “importance/relation level” of each piece of information).