
1 A Preliminary Study on Impact of Software Licenses on Copy-and-Paste Reuse
Yu Kashima†, Yasuhiro Hayase††, Norihiro Yoshida†††, Yuki Manabe†, Katsuro Inoue†
†: Osaka University ††: Toyo University †††: Nara Institute of Science and Technology
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

2 Software Reuse
Purpose of software reuse
–Development of reliable software
–Increasing software productivity
We focus on Copy-and-Paste (CnP)
–A basic method of software reuse

3 Open Source Software and Licenses
Open Source Software (OSS)
–Derivative works of OSS products may be distributed
–The amount of reusable source code is growing as the number of OSS products grows
OSS licenses
–Many kinds of licenses have been designed to meet the various intents of developers
–Each OSS license has different conditions
–Reuse is also restricted by the licenses

4 Representative OSS Licenses
3-clause BSD License (BSD3)
–A derivative work must retain the copyright notices, the list of conditions, and the disclaimer of warranties
Apache License Version 2 (Apachev2)
–A derivative work must retain the copyright, patent, trademark, and attribution notices
GNU General Public License Version 2 (GPLv2)
–A derivative work must be distributed under GPLv2
LicenseName code ≡ source code distributed under LicenseName
–Ex. BSD3 code ≡ source code distributed under BSD3

5 CnP between Files under Different Licenses
If a developer reuses source code:
–Both the license of the reused code and the license of the code under development must be satisfied simultaneously
–Distribution of the code under development is prohibited if the CnP makes it impossible to satisfy both licenses
[Figure: CnP between BSD3 code and GPLv2 code; CnP between Apachev2 code and GPLv2 code]

6 Impact of License on CnP
Hypothesis
–The characteristics of source code reuse depend on the license of the code
  Frequency of CnP
  Kinds of licenses used by source code developed by CnP
To our knowledge, there are no quantitative studies of CnP reuse from the viewpoint of software licenses
We investigate actual OSS to test this hypothesis

7 Experiment
A quantitative experiment was performed on a small data set
Purpose
–Confirming our hypothesis
–Investigating the scalability of our method
Overview
–Investigating the number of CnP instances for each license
–Code clone detection is used for CnP detection
  A code clone is a code fragment that is similar to another code fragment
  Code clones are typically generated by CnP

8 Method of Experiment
Step1. License detection
Step2. Code clone detection
Step3. Counting code clones
–Code fragments are grouped by their license
[Figure: source files of Application X and Application Y, each labeled License A, License B, or Unknown]
License      #Code Fragments
License A    10
License B    3
…            …

9 Step1. License Detection
Ninka [1] is used to detect the licenses of source files
–It analyzes the license description in each source file
–It detects licenses with high precision
Files whose licenses Ninka fails to detect are excluded
–Files that contain no license description or an unknown license description
[1] D. M. German, Y. Manabe and K. Inoue: “A sentence-matching method for automatic license identification of source code files”, ASE 2010, pp. 437–446 (2010)
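The exclusion rule in Step1 can be sketched as follows. This is a minimal illustration and not Ninka's interface: the dictionary `detected` and the labels `"NONE"` and `"UNKNOWN"` are assumptions standing in for whatever a license detector reports when a file has no license description or an unrecognized one.

```python
# Hypothetical labels for "no license description" and "unknown license
# description"; a real detector may report these cases differently.
UNDETECTED = {"NONE", "UNKNOWN"}


def keep_files_with_known_license(detected):
    """Return only the files whose license was detected.

    `detected` maps a file path to the license label reported for it.
    """
    return {path: lic for path, lic in detected.items() if lic not in UNDETECTED}


# Illustrative input; the paths and labels are made up for this sketch.
detected = {
    "x/Foo.java": "BSD3",
    "x/Bar.java": "NONE",
    "y/Baz.java": "GPLv2+",
    "y/Qux.java": "UNKNOWN",
}
known = keep_files_with_known_license(detected)
```

Only the files kept here would be passed on to code clone detection.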

10 Step2. Code Clone Detection
CCFinder [2] is used to extract code clones across different applications
–We assume that CnP within an application will not cause license problems
Filtering
–Code clones generated by something other than CnP are excluded
  Ex. getters/setters, variable declarations
The directions of CnP are not determined
[Figure: Applications X, Y, and Z containing code under Licenses A, B, and C; clones labeled CnP, Getter/Setter, and Variable Declarations]
[2] T. Kamiya, S. Kusumoto and K. Inoue: “CCFinder: A multilinguistic token-based code clone detection system for large scale source code”, IEEE Transactions on Software Engineering, 28, pp. 654–670 (2002)
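The restriction to clones across different applications might look like the sketch below. It does not use CCFinder's output format: representing a clone set as a list of `(application, license)` pairs is an assumption made for illustration, and the getter/setter and variable-declaration filtering is not attempted here.

```python
def clone_sets_across_applications(clone_sets):
    """Keep clone sets whose fragments come from at least two applications.

    Each clone set is a list of (application, license) pairs, one pair per
    code fragment. This representation is hypothetical.
    """
    return [cs for cs in clone_sets if len({app for app, _lic in cs}) >= 2]


clone_sets = [
    # Fragments in two applications: kept.
    [("Application X", "License A"), ("Application Y", "License B")],
    # Both fragments in the same application: dropped.
    [("Application X", "License A"), ("Application X", "License A")],
]
cross = clone_sets_across_applications(clone_sets)
```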

11 Step3. Counting Code Clones (1/2)
Repeat the following steps for each target license
1. Select a license as the analysis target
2. Extract the clone sets that include code under that license
  A clone set is a set of code clones similar to each other
3. Count the code fragments in the extracted clone sets, grouped by their license
[Figure: fragments in Applications X, Y, and Z having CnP relations to License A code]
License      #Code Fragments
License A    2
License B    1
License C    2

12 Step3. Counting Code Clones (2/2)
A clone set includes both the original code fragments and the code fragments generated by CnP
→ Counting the code fragments in clone sets approximates counting the number of CnP instances
We count the CnP to and from code fragments under the target license
Although this table includes CnP in both directions, it is sufficient for a brief overview
[Figure: fragments in Applications X, Y, and Z having CnP relations to License A code]
License      #Code Fragments
License A    2
License B    1
License C    2
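The three counting steps can be sketched as below. This is a minimal illustration under the same assumed representation (a clone set is a list of `(application, license)` pairs); the example clone sets are made up so that the counts match the License A table on the slide.

```python
from collections import Counter


def count_fragments_by_license(clone_sets, target_license):
    """Count fragments, grouped by license, over the clone sets that
    contain at least one fragment under `target_license`."""
    counts = Counter()
    for clone_set in clone_sets:
        licenses = [lic for _app, lic in clone_set]
        if target_license in licenses:  # step 2: clone set includes target code
            counts.update(licenses)     # step 3: count fragments by license
    return counts


# Hypothetical clone sets, chosen to reproduce the example table.
clone_sets = [
    [("Application X", "License A"), ("Application Y", "License B")],
    [("Application X", "License A"), ("Application Z", "License C"),
     ("Application Z", "License C")],
    # No License A fragment, so this set is ignored for target License A.
    [("Application Y", "License B"), ("Application Z", "License C")],
]
counts = count_fragments_by_license(clone_sets, "License A")
```

Because the direction of CnP is unknown, the counts include fragments copied from the target license as well as fragments copied into it, as the slide notes.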

13 Analyzed Code
Java files (.java) in the Debian GNU/Linux main section
Reasons for selecting this target
–It consists of code under various licenses
–It can be analyzed by both Ninka and CCFinder
–It is of a feasible scale for this experiment
#Packages    452
#Files       77,452
LOC          8,530,896

14 License Distribution in Analyzed Code
[Figure: number of files (#Files) under each license]

15 Result (BSD3)
Result of counting the code fragments in clone sets that include BSD3 fragments, grouped by their license
License                     #Fragments    Percentage
BSD3                        613           92%
GPLv                                      %
Apachev2                    16            2.4%
LesserGPL                                 %
GPLv2,ClassPathException    1             0.15%
LesserGPL                                 %
The table shows how often each license is used by code fragments having a CnP relationship to BSD3 fragments
BSD3 code is mostly reused by BSD3 code

16 Result (Apachev2)
License           #Fragments    Percentage
Apachev                         %
Apachev                         %
LesserGPL                       %
MPLv                            %
BSD3              29            1.5%
MX4JLicensev                    %
GPLv                            %
LibraryGPL                      %
MPLv                            %
MITX11noNotice    2             0.10%
Public Domain     1             0.050%
Subversion                      %
EPLv                            %
CnP between Apachev2 code fragments accounts for a large percentage
Code formerly under Apachev1.1 has had its license changed to Apachev2

17 Result (GPLv2+)
License                               #Fragments    Percentage
GPLv                                                %
GPLnoVersion,GPLv2+,LinkException     225           41%
BSD3                                  28            5.1%
LibraryGPLv                                         %
Apachev2                              4             0.73%
LesserGPLv                                          %
CnP within GPLv2+ code accounts for the highest percentage
“GPLnoVersion, GPLv2+, LinkException” also has a high percentage
–“GPLnoVersion, GPLv2+, LinkException” code is reused by GPLv2+ code
[Figure: CnP between “GPLnoVersion, GPLv2+, LinkException” code and GPLv2+ code]

18 #Files and #Fragments under Each License
[Table: #Fragments, #Files, and #Fragments / #Files for BSD3, Apachev2, and GPLv2+]
If “#Fragments / #Files” of a license is large, code under that license is copy-and-pasted frequently
The frequency of CnP per file: BSD3 > Apachev2 > GPLv2+
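The per-file frequency is a simple ratio, sketched below. The licenses `License P`, `License Q`, and `License R` and all numbers are placeholders invented for this illustration; they are not the values measured in the study.

```python
def fragments_per_file(n_fragments, n_files):
    """Return #Fragments / #Files for each license."""
    return {lic: n_fragments[lic] / n_files[lic] for lic in n_fragments}


# Placeholder numbers for hypothetical licenses; not results from the study.
n_fragments = {"License P": 300, "License Q": 400, "License R": 100}
n_files = {"License P": 100, "License Q": 400, "License R": 200}

ratios = fragments_per_file(n_fragments, n_files)
# Licenses ordered from most to least frequently copy-and-pasted per file.
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

Normalizing by the number of files matters because a license with many files can have many fragments in clone sets even when each file is rarely copied.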

19 Summary of the Results
Common characteristic of all licenses
–CnP within code distributed under the same license, or under licenses designed by the same organization, accounts for the majority
  CnP might happen mostly within an organization
Apachev2 has CnP relations to various licenses
–Apachev2 has the largest number of files
–The conditions of Apachev2 are more relaxed than those of GPLv2+
The frequency of CnP per file: BSD3 > Apachev2 > GPLv2+

20 Threats to Validity
The results are insufficient to generalize to OSS as a whole
–The analysis target is small → We plan a large-scale analysis
–Only Java files were analyzed
  Java has a short history, hence Java files are copy-and-pasted less than files in other languages → We plan an analysis of C/C++ files
Overlapping code fragments may be counted separately
–The number of overlapping code fragments might be small
[Figure: Fragment A overlapping Fragment B]

21 Scalability of the Investigation Method
This method can be applied to a large target, because each step can scale
–License detection
  Ninka can analyze files in linear time
–Code clone detection
  There are more scalable tools than CCFinder, such as CCFinderX and D-CCFinder
–Counting code clones
  This process did not take a long time

22 Conclusion
A preliminary study of the impact of licenses on CnP was performed
–Java files in the Debian GNU/Linux main section were analyzed
CnP happens mostly within code distributed under the same license or under licenses designed by the same organization
The frequency of CnP per file
–BSD3 > Apachev2 > GPLv2+
Our method can be applied to a large target

23 Future Work
Large-scale experiment
Investigating whether code fragments are copy-and-pasted mostly within an organization
Detecting the direction of CnP

