Empirical Studies on License Compliance and Copyright Inconsistency Risks in Open Source Software Shi QIU.

Presentation on theme: "Empirical Studies on License Compliance and Copyright Inconsistency Risks in Open Source Software Shi QIU."— Presentation transcript:

1 Empirical Studies on License Compliance and Copyright Inconsistency Risks in Open Source Software
Shi QIU

2 Introduction
Open source license: describes the terms and conditions under which OSS is used, modified, and shared.
Copyright: software copyright is a special case of copyright that prevents the unauthorized copying of software.

3 Definition
The situation in which the license of an OSS package is not compatible with the license of its dependency [1].
Copyleft licenses: e.g. GPL-2.0, GPL-3.0, LGPL-2.1.
Example: Package A (MIT License) depends on Package B (GPL-2.0 License). Package A must be under GPL-2.0 as well!
[1] Daniel German and Massimiliano Di Penta. A method for open source license compliance of Java applications. IEEE Software, Vol. 29, No. 3, pp. 58–63, 2012.

4 Problems
OSS ecosystems consist of software projects that are developed and evolve together in a shared environment.
1. Direct risk
[Diagram: Package6 (version 1.0.1, MIT); Package2 (version 1.0.4, GPL-2.0)]
2. Indirect risk
[Diagram: Package3 (version 1.0.1, MIT); Package4 (version 2.0.1, MIT); Package5 (version 1.2.1, GPL-3.0)]
3. Self risk
[Diagram: Package6 (version 1.0.2, MIT) containing File1 (GPL-2.0) and File2 (MIT)]

5 Research Questions
RQ1: What is the proportion of packages with license compliance risk?
RQ2: Is the reuse of packages licensed under a copyleft license more likely to cause license compliance risk?
RQ3: Does transitive dependency have an impact on the occurrence of license compliance risk?
RQ4: What are the characteristics of license compliance risk at file level?
Data collection

6 Method
1. Build the license dictionary
GPL-2, GPLv2, GPL 2, GNU GPL-2.0, GPL version 2, … are all mapped to GPL-2.0.
2. Build the software evolutionary dataset
Name: package7
Version   License   Dependency (version)
1.0.1     MIT       package8 (1.0.1), package9 (2.3.1)
1.0.2               package8 (1.0.2)
1.1.0     GPL-2.0   package9 (2.4.0), package10 (1.0.1)
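The license dictionary in step 1 can be pictured as a map from spelling variants to one canonical identifier. The following is a minimal sketch under that assumption, not the method used in the study: the names LICENSE_ALIASES and normalize_license are hypothetical, and the table holds only the variants listed on the slide.

```python
import re

# Hypothetical alias table: only the spelling variants shown on the slide.
LICENSE_ALIASES = {
    "gpl-2": "GPL-2.0",
    "gplv2": "GPL-2.0",
    "gpl 2": "GPL-2.0",
    "gnu gpl-2.0": "GPL-2.0",
    "gpl version 2": "GPL-2.0",
    "gpl-2.0": "GPL-2.0",
}


def normalize_license(raw):
    """Return the canonical identifier for a license string, or None if unknown."""
    # Lowercase, trim, and collapse runs of whitespace before the lookup.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return LICENSE_ALIASES.get(key)
```

Returning None for an unknown string keeps unrecognized licenses visible instead of silently guessing a match.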

7 Method
3. Build the license compatibility dataset
Covers 19 popular licenses: MIT, GPL-2.0, Apache-2.0, …
Example pair: Package1 (version 1.0.1, MIT) and Package2 (version 1.0.4, GPL-2.0) [2]
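One way to picture the license compatibility dataset is as a lookup over (package license, dependency license) pairs. The sketch below is an assumption about its shape, not the actual dataset: INCOMPATIBLE and is_compatible are hypothetical names, and the two entries are only the pairs that the slides themselves report as risky.

```python
# Illustrative entries only: (package license, dependency license) pairs
# that the slides report as risky. The real dataset covers 19 licenses.
INCOMPATIBLE = {
    ("MIT", "GPL-2.0"),
    ("MIT", "GPL-3.0"),
}


def is_compatible(package_license, dependency_license):
    """Assumption of this sketch: a pair not listed as incompatible is compatible."""
    return (package_license, dependency_license) not in INCOMPATIBLE
```

The pair is ordered, so the direction of reuse matters: a lookup for an MIT package with a GPL-2.0 dependency is a different entry from a GPL-2.0 package with an MIT dependency.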

8 Method
4. Detect direct and indirect risk
Inputs: software evolutionary dataset, license compatibility dataset
Name: Package1, Version: 1.0.1, License: MIT
Name: Package2, Version: 1.2.1, License: MIT
Name: Package3, Version: 2.0.1, License: GPL-2.0
Name: Package4, Version: 1.2.3, License: GPL-3.0
Report
Name: Package1
License: MIT
Direct risks: Package4 (GPL-3.0)
Indirect risks: Package3 (GPL-2.0)
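The detection step can be sketched as a breadth-first walk over the dependency graph that checks each reachable package against the compatibility data. This is a minimal illustration, not the study's implementation: the licenses come from the slide, but the dependency edges are an assumption (the arrows of the diagram did not survive in the transcript), and detect_risks, DEPENDS_ON, and INCOMPATIBLE are hypothetical names.

```python
from collections import deque

# Licenses as shown on the slide.
LICENSES = {
    "Package1": "MIT",
    "Package2": "MIT",
    "Package3": "GPL-2.0",
    "Package4": "GPL-3.0",
}
# Assumed edges, chosen so that the result matches the report on the slide.
DEPENDS_ON = {
    "Package1": ["Package2", "Package4"],
    "Package2": ["Package3"],
    "Package3": [],
    "Package4": [],
}
# Illustrative (package license, dependency license) pairs.
INCOMPATIBLE = {("MIT", "GPL-2.0"), ("MIT", "GPL-3.0")}


def detect_risks(root):
    """Return (direct, indirect) lists of dependencies incompatible with root."""
    direct, indirect = [], []
    seen = {root}
    queue = deque((dep, 1) for dep in DEPENDS_ON.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in seen:
            continue  # already handled at a shallower or equal depth
        seen.add(name)
        if (LICENSES[root], LICENSES[name]) in INCOMPATIBLE:
            (direct if depth == 1 else indirect).append(name)
        queue.extend((dep, depth + 1) for dep in DEPENDS_ON.get(name, []))
    return direct, indirect
```

Breadth-first order visits each package at its shallowest depth first, so a package that is reachable both directly and transitively is reported as a direct risk.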

9 Method
5. Detect self risk
Input: license compatibility dataset
Name: Package1, Version: 1.0.2, License: MIT
Files: File1 (GPL-2.0), File2 (MIT)
Report
Name: Package1
License: MIT
Self risks: File1 (GPL-2.0)
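Self risk detection can be sketched as comparing each file's license with the license declared for the package. The sketch assumes per-file licenses have already been identified; detect_self_risk and INCOMPATIBLE are hypothetical names, and the pairs are illustrative only.

```python
# Illustrative (package license, file license) pairs treated as incompatible.
INCOMPATIBLE = {("MIT", "GPL-2.0"), ("MIT", "GPL-3.0")}


def detect_self_risk(package_license, file_licenses):
    """Return the files whose license is incompatible with the package license."""
    return [
        path
        for path, file_license in file_licenses.items()
        if (package_license, file_license) in INCOMPATIBLE
    ]
```

For the package on the slide, calling detect_self_risk("MIT", {"File1": "GPL-2.0", "File2": "MIT"}) flags only File1.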

10 Proportion of Risky Packages
RQ1: What is the proportion of packages with license compliance risk?
Result: 2,704 of 419,708 packages are detected as having direct or indirect dependency risk, a proportion of only 0.644%. We define these packages as risky packages.
Answer: Very few packages in npm have license compliance risk.
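The reported proportion follows directly from the two counts in the result. A short check, using only the numbers given on the slide (the variable names are illustrative):

```python
risky_packages = 2_704    # packages with direct or indirect dependency risk
total_packages = 419_708  # all npm packages examined

proportion = risky_packages / total_packages * 100
print(f"{proportion:.3f}%")  # prints 0.644%
```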

11 An Example A real example of risky packages
Packages in the diagram: cstar (MIT), commander (GPL-2.0), graceful-readlink (MIT), mucbuc-filebase (ISC), walk-json (MIT), travejs (GPL-2.0), inject-json (MIT).
The commander and travejs packages are not compatible with the cstar package.

12 Risk of Copyleft License
RQ2: Is the reuse of packages licensed under a copyleft license more likely to cause license compliance risk?
Result: In npm, 4,067 packages include at least one package licensed under the selected copyleft licenses in their dependency chain. Among them, 2,704 packages are detected as risky packages. The proportion is 66.49%.
Answer: Yes, reuse of packages licensed under a copyleft license is more likely to cause license compliance risk.

13 Impact of Transitive Dependency
RQ3: Does transitive dependency have an impact on the occurrence of license compliance risk?
Result: [Figure: Direct Dependency, Indirect Dependency]
Answer: Yes, it does. Direct or indirect dependency risk tends to occur at shallow levels of transitive dependency.

14 Self Risk
RQ4: What are the characteristics of license compliance risk at file level?
Result: 964 of the 2,704 risky packages are detected as having self risk as well. The proportion is 35.65% (964 / 2,704). Of the 9,679,468 source code files in the 2,704 risky packages, only 291,340 files are detected. The proportion is 3.01%.
Answer: Packages with direct or indirect dependency risk have a high possibility of having self risk as well. The source code files causing compliance risk make up only a small part of all source code files of a package.

15 Conclusion
- A method to detect license compliance risk and an empirical study on npm.
- A method to detect copyright inconsistency risk and an evolutionary study on the Linux kernel.
Future Work
- More OSS ecosystems
- A web service for developers

