Licensing is Software Too: Achievements and Challenges (and how this relates to code provenance)
Massimiliano Di Penta, University of Sannio, Italy


1 Licensing is Software Too: Achievements and Challenges (and how this relates to code provenance)
Massimiliano Di Penta, University of Sannio, Italy
dipenta@unisannio.it
http://www.rcost.unisannio.it/mdipenta

2 Acknowledgements
- Daniel M. Germán, Univ. Victoria, Canada
- Julius Davies, Univ. Victoria, Canada
- Giuliano Antoniol, Ecole Polyt. Montréal, Canada
- Yann-Gaël Guéhéneuc, Ecole Polyt. Montréal, Canada

3 Reusing Open Source Software
- When developing a software system, we try (if possible) not to reinvent the wheel
- Components, libraries, and source code snippets are out there, ready to be reused
- Code search engines are becoming popular
- Open source code modification and redistribution are governed by
  - Software licenses
  - Copyright statements
- Everything is contained in a licensing block…

4 What does a licensing statement contain?
    /* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
    /* ***** BEGIN LICENSE BLOCK *****
     * Version: MPL 1.1/GPL 2.0/LGPL 2.1
     *
     * The contents of this file are subject to the Mozilla Public License Version
     * 1.1 (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     * http://www.mozilla.org/MPL/
     ….
     * Portions created by the Initial Developer are Copyright (C) 2002
     * the Initial Developer. All Rights Reserved.
     *
     * Contributor(s):
     * Brian Ryner
     ….
     * decision by deleting the provisions above and replace them with the notice
     * and other provisions required by the GPL or the LGPL. If you do not delete
     * the provisions above, a recipient may use your version of this file under
     * the terms of any one of the MPL, the GPL or the LGPL.
     *
     * ***** END LICENSE BLOCK ***** */
    #include "nsXULAppAPI.h"
    #ifdef XP_WIN
    #include
Callouts on the header: License (MPL+GPL+LGPL), Copyright statement, Copyright year, Contributor

5 Restrictive vs. permissive licenses
- Restrictive (aka copyleft or reciprocal)
  - Modified software must be made available under terms similar to those of the original
  - Example: GPL
- Permissive
  - Modifications/enhancements may remain proprietary
  - Distribution of source code or binary is permitted, provided that
    - the copyright notice and/or liability disclaimers are retained
    - contributor names are not used to imply endorsement
  - Examples: Berkeley Software Distribution (BSD), Apache Software License, MIT

6 FOSS development teams care! (source: Debian)
I am in the process of trying to prepare 0.8.0 for Debian GNU/Linux
I have started going over the copyright/license headers. In src/celeste many files are missing copyright information. Most of these are files imported with minimal changes from Gabor API http://www.kung-foo.tv/gaborapi.php or libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
The attached patch adds copyright and license statements to these files.[1] Please apply and update the headers (adding copyright holders) if you make substantial changes.
thanks, cu andreas
[1] I have doublechecked with Gabor API's upstream author Adriaan Tijsseling that files like ContrastFilter.cpp are Copyright (c) Adriaan Tijsseling and licensed under GPLv2+, although the original headers just say: Original Author: Yasunobu Honma Modifications by: Adriaan Tijsseling (AGT)

7 Conjectures
- Since licenses determine the way software can be composed and redistributed
  - They may change/evolve like any other part of the software
  - They might be subject to bugs too
    - See our ICPC 2010 paper about how to identify licensing incompatibilities
  - They might determine the success/failure of a software project
- Code provenance and licenses:
  - Licenses constrain source code migration between projects
  - Code provenance might be useful to determine the licensing of closed components

8 Licenses influence the software lifetime
- OpenBSD founder and project leader Theo de Raadt removed a security software package called IP-Filter [written by Darren Reed] after its author changed its license. Stephen Shankland, CNET News, 2001/05/30.
- Licenses evolve as software does
  - Failing to account for that could cause copyright infringements
  - Decisions on license changes have an impact, like other decisions on software evolution
- Little attention so far from the scientific community
Need for methods and tools to audit licensing and its changes

9 Example: Java
- Until November 2006, the license of Java JDK v1.2 said: “Except as specifically authorized in any Supplemental License Terms, you may not make copies of Software, other than a single copy of Software for archival purposes”
  - This disallowed the inclusion of Java in Linux distributions
- Java 5.0 released under the GPL v2 with the CLASSPATH exception:
  - Java could be modified/updated under the GPL v2
  - Java programs could be released under any license as long as they satisfy the conditions stated in the CLASSPATH exception
Changing the license of a system can promote and ease the distribution and reuse of a software system

10 Example: Mono
- Framework produced by Novell to support the .NET API on non-Microsoft operating systems
- Initially distributed under the GPL v2
  - Potential problem when running .NET systems
    - Considered derivative works of Mono
    - Required to be also released under the GPL v2
- Mono developers changed its license to MIT/X11
- The change was also required by HP for its participation in the project
A change to a more permissive license may increase the size of the community of contributors to a FOSS system

11 Example: QT
- First released under a non-open source but free license, called the FreeQT License, and a commercial license
- QT became the basis for KDE
- QT v2.0 was released under a new license, the Q Public License
  - Incompatible with the GPL
  - GNOME project started as a QT-free alternative to KDE
  - Harmony project started as a GPL replacement of QT
- Trolltech changed the license of QT v3 to the GPL v2
  - The Harmony project was abandoned
Changing the license of a FOSS system towards a more permissive one might cause the abandonment of a competing system

12 Example: MySQL
- In 2004, MySQL AB changed the license of its client libraries from LGPL v2.1 to GPL v2
  - To prevent industrial companies from using the libraries within proprietary products
- Unintended consequences:
  - PHP systems were no longer able to connect to MySQL
  - The PHP license is incompatible with the GPL v2
- MySQL addressed this problem by adding the MySQL FOSS License Exception to the GPL v2
Changing the license of a FOSS system might have unintended/undesirable consequences for its legitimate users

13 Empirical Study
- Goal: analyze licensing evolution
- Purpose: investigate how developers change licensing statements
- Context: CVS/SVN repositories of
  - ArgoUML, Eclipse-JDT, the FreeBSD and OpenBSD kernels, Mozilla, Samba

14 Research Questions
- RQ1: To what extent are files changing their licenses?
- RQ2: How are copyright years changed in licensing statements?
- RQ3: Who are the contributors of a software project and how do they change?

15 Licensing Analysis Method – Extracting licensing statements
[Slide shows the Mozilla source file header from slide 4, from the opening comment through the END LICENSE BLOCK marker and the first #include lines; the licensing statement is extracted from it.]
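
The extraction step on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions, not the tooling used in the study: it assumes C-style comments, assumes the licensing statement sits in the comments at the very top of the file, and the function name `extract_licensing_statement` is hypothetical.

```python
import re

# Assumption: the licensing statement lives in the run of /* ... */ or //
# comments that opens the file, before any code.
LEADING_COMMENTS = re.compile(r"\A(?:\s*(?:/\*.*?\*/|//[^\n]*))+", re.DOTALL)


def extract_licensing_statement(source: str) -> str:
    """Return the text of the leading comments with comment decoration removed."""
    match = LEADING_COMMENTS.match(source)
    if not match:
        return ""
    cleaned_lines = []
    for line in match.group(0).splitlines():
        # Drop a leading "/*", "*" or "//" and a trailing "*/".
        line = re.sub(r"^\s*(/\*+|\*+/|\*+|//+)\s?", "", line)
        line = re.sub(r"\s*\*+/\s*$", "", line)
        if line.strip():
            cleaned_lines.append(line.rstrip())
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    example = (
        "/* ***** BEGIN LICENSE BLOCK *****\n"
        " * Version: MPL 1.1/GPL 2.0/LGPL 2.1\n"
        " *\n"
        " * ***** END LICENSE BLOCK ***** */\n"
        '#include "nsXULAppAPI.h"\n'
    )
    print(extract_licensing_statement(example))
```

Files in other languages (shell, Python, XML) would need their own comment syntax, which this sketch does not cover.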

16 Licensing Analysis Method – Classifying licenses
- FOSSology [Gobeille, MSR 2008]: detects licenses using the Binary Symbolic Alignment Matrix (bSAM)
- Ninka [German et al., ASE 2010]: uses a pattern-matching approach
Example: the Mozilla license header shown on slide 4 is classified as MPL 1.1/GPL 2.0/LGPL 2.1

17 Licensing Analysis Method – Identifying changes in copyright years
- Mining references to years in licensing statements…
Example (from the Mozilla license header shown on slide 4): Copyright (C) 2002
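
Mining year references can be sketched with a regular expression over the lines that mention a copyright. This is a minimal illustration, not the study's implementation; the function names are hypothetical, and treating a range such as 1998-2001 as covering every year in between is an assumption of the sketch.

```python
import re

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
YEAR_RANGE = re.compile(r"\b((?:19|20)\d{2})\s*-\s*((?:19|20)\d{2})\b")


def copyright_years(statement: str) -> set:
    """Collect the years mentioned on lines that contain the word 'copyright'."""
    years = set()
    for line in statement.splitlines():
        if "copyright" not in line.lower():
            continue
        # Assumption: "1998-2001" stands for every year from 1998 to 2001.
        for first, last in YEAR_RANGE.findall(line):
            years.update(range(int(first), int(last) + 1))
        years.update(int(year) for year in YEAR.findall(line))
    return years


def year_changes(old_statement: str, new_statement: str):
    """Return (years added, years removed) between two revisions of a statement."""
    old, new = copyright_years(old_statement), copyright_years(new_statement)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    print(year_changes("Copyright (C) 2002 Foo", "Copyright (C) 2002, 2004 Foo"))
```

Restricting the search to copyright lines avoids picking up unrelated numbers such as version strings elsewhere in the header.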

18 Licensing Analysis Method – Identifying contributor names
- Mining emails, plus various patterns
  - Copyright … year name
  - Contributor(s) …
- Names are mapped to committers, whenever possible
Example (from the Mozilla license header shown on slide 4): Contributor(s): Brian Ryner ….
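
The two patterns named on this slide ("Copyright … year name" and "Contributor(s) …") plus email mining can be sketched as follows. The regular expressions, the function name `extract_contributors`, and the rule that a contributor list ends at the first empty comment line are all assumptions of this sketch; the mapping to committers is not modeled.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# "Copyright (C) 2002 Some Name": one or more years followed by a name.
COPYRIGHT = re.compile(
    r"copyright\s*(?:\(c\))?\s*(?:(?:19|20)\d{2}[\s,-]*)+(?:by\s+)?(?P<name>[^<\n.]+)",
    re.IGNORECASE,
)


def extract_contributors(statement: str):
    """Return (names, emails) found in a licensing statement."""
    names, emails = [], []
    in_contributor_list = False
    for raw in statement.splitlines():
        line = raw.strip(" \t*/")  # drop comment decoration
        emails.extend(EMAIL.findall(line))
        match = COPYRIGHT.search(line)
        if match:
            names.append(match.group("name").strip())
            in_contributor_list = False
            continue
        if line.lower().startswith("contributor"):
            in_contributor_list = True
            line = line.partition(":")[2]  # keep any name on the same line
        elif not line:
            in_contributor_list = False  # assumption: blank line ends the list
        if in_contributor_list:
            name = EMAIL.sub("", line).strip(" <>()")
            if name:
                names.append(name)
    return names, emails


if __name__ == "__main__":
    header = (
        " * Copyright (C) 2002 Jane Doe <jane@example.org>\n"  # hypothetical
        " *\n"
        " * Contributor(s):\n"
        " *   Brian Ryner\n"
        " *\n"
    )
    print(extract_contributors(header))
```

The example addresses and the name Jane Doe are made up; only "Brian Ryner" appears in the header on the slide.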

19 RQ1: Most relevant license changes
System | From | To | Type | Count
Eclipse-JDT | Common Public License v1.0 | Eclipse Public License v1.0 | CHANGE | 2394
Eclipse-JDT | Common Public License v0.5 | Common Public License v1.0 | UPDATE | 808
Mozilla | NPL | 'NPL v1.1'-style+GPL v2+LGPL v2.1 | DUAL | 2914
Mozilla | NPL | 'Dual MPL GPL'-style+MPL | DUAL | 1274
Mozilla | 'Dual MPL GPL'-style+MPL | NPL | BUG | 1194
- Licensing updated as new licenses were developed
- Eclipse JDT: CPL 0.5 → CPL 1.0 → EPL 1.0
  - IBM has relinquished control of licenses to the Eclipse Foundation
- Mozilla: NPL → MPL + GPL (+ LGPL)
  - NPL allowed Netscape 6 to be released as a proprietary system
  - MPL only allows redistributing the source code under the MPL
  - Multiple licenses to deal with incompatibilities
  - Files wrongly changed to NPL (bug #98089)

20 RQ1: Most relevant license changes
System | From | To | Type | Count
FreeBSD | BSD UCRegents (4-cl BSD) | 'BSD UCRegents'-style (4-cl BSD) | UPDATE | 491
FreeBSD | 'BSD UCRegents'-style (4-cl BSD) | 'INRIA-OSL'-style (3-cl BSD) | UPDATE | 300
OpenBSD | 'BSD UCRegents'-style (4-cl BSD) | 'INRIA-OSL'-style (3-cl BSD) | UPDATE | 964
OpenBSD | BSD UCRegents (4-cl BSD) | 'BSD UCRegents'-style (4-cl BSD) | UPDATE | 414
- FreeBSD and OpenBSD are more eclectic than other projects
- Moving from the 4-clause BSD license to the more permissive 3-clause and 2-clause BSD licenses

21 RQ1: Most relevant license changes
System | From | To | Type | Count
ArgoUML | None | 'Free with copyright clause'-style + 'UC Regents free with copyright clause'-style | ADD | 127
Samba | None | GPL v2 | ADD | 15
- ArgoUML and Samba kept the same licenses over the analyzed time span
- Change is from None to a simple license
  - Authors realized the importance of including a license

22 RQ2: How and why were copyright years changed?
- Files whose copyright years were updated underwent a significantly higher number of changes than others
  - When developers perform substantial changes to a file, they also update copyright years
  - Required by copyright regulations
  - Lack of updates with substantial changes would allow an infringer to claim “innocent infringement”
- Commits explicitly targeted to copyright years
  - “Updated copyrights”
  - “Updated copyrights to 2004”

23 RQ3: When do contributors change?
- Changes where contributor names are added are significantly bigger than other changes
  - Contributors are often added when they make substantial changes
- Contributor names are important assets in source code
  - Like the signature on a picture
- However…
  - contributors can change over time
  - there is no standard way of reporting them
  - there is no clear rule on when one should become a contributor
- Their presence can have legal implications

24 Licenses Influence Code Migration

25 Free (software) as a bird…
- As birds migrate differently during different seasons…
  - Code might have a preferential migration direction
- Given two systems
  - e.g. FreeBSD and Linux
- We find the same code in both systems
- Three scenarios:
  - Migration FreeBSD → Linux
  - Migration Linux → FreeBSD
  - Migration third-party → FreeBSD, Linux

26 Licenses in the kernels
- Let’s consider 3 Unix kernels
  - Linux, FreeBSD, and OpenBSD
- Linux: mainly GPL (65%) with some (25%) “promoted” to GPL by L. Torvalds
  - A few files (35) have two licenses
- FreeBSD: 75% of the files have a BSD license
  - Some files with a corporate license (Intel)
  - Some with MIT
  - Others with multiple licenses (BSD and GPL, MIT and GPL, BSD and Educational)
- OpenBSD: similar to FreeBSD

27 Sibling(s) Origin
- Identify siblings between systems using clone detection
  - CCFinderX, with >100 tokens as threshold, plus other heuristics
- Trace siblings back into the past: their code fragments in the same files
  - Again clone detection, comparing the sibling fragment with previous file revisions
  - The revision where the fragment disappears marks its origin
- Take the older of the two as the true origin
[Figure: siblings (cloned fragments) between Sys 1 – File i and Sys 2 – File j, with the migration direction]
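
The trace-back idea on this slide can be sketched as follows. This is only an illustration under strong assumptions: a naive exact token-sequence match stands in for CCFinderX, the >100-token threshold and the "other heuristics" are not modeled, each file history is assumed to be a list of (date, text) pairs sorted newest first, and all function names are hypothetical.

```python
import re
from datetime import date


def tokens(code: str) -> list:
    """Very rough tokenizer: identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def contains_fragment(revision_text: str, fragment: str) -> bool:
    """True if the fragment's token sequence occurs in the revision."""
    hay, needle = tokens(revision_text), tokens(fragment)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def first_appearance(history, fragment):
    """Walk back through revisions (newest first) until the fragment disappears."""
    origin = None
    for when, text in history:
        if not contains_fragment(text, fragment):
            break
        origin = when
    return origin


def migration_direction(name_a, history_a, name_b, history_b, fragment):
    """Return (source, destination), taking the older first appearance as origin."""
    a = first_appearance(history_a, fragment)
    b = first_appearance(history_b, fragment)
    if a is None or b is None or a == b:
        return None  # cannot decide
    return (name_a, name_b) if a < b else (name_b, name_a)


if __name__ == "__main__":
    frag = "int add(int a, int b) { return a + b; }"
    sys1 = [(date(2003, 1, 1), frag), (date(2000, 5, 1), frag), (date(1999, 1, 1), "int x;")]
    sys2 = [(date(2004, 1, 1), frag), (date(2002, 6, 1), frag), (date(2001, 1, 1), "")]
    print(migration_direction("Sys 1", sys1, "Sys 2", sys2, frag))
```

Equal first-appearance dates are left undecided here; they may point to the third-party scenario, where both systems imported the code from elsewhere.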

28 Code Migration and Licenses
FreeBSD | Linux | Files
BSD | GPL | 8
BSD | MIT | 2
BSD | None | 2
Corporate | BSD+GPL | 89
GPL | None | 1
Phrase | BSD+GPL | 1
X.Net+BSD | MIT | 1

Linux | FreeBSD | Files
BSD+GPL | Corporate | 8
GPL | BSD | 17
GPL | BSD+GPL | 1
GPL | CPL+BSD+GPL | 1
MIT | BSD | 1
MIT+GPL | None | 2
 | BSD | 1
Phrase+GPL | MIT | 2

OpenBSD | Linux | Files
BSD | BSD+GPL | 1
BSD | MIT | 2
BSD | Unknown | 1
BSD+GPL | GPL | 1
BSD+Phrase | Phrase+GPL | 1
MIT | GPL | 23

Annotations on the slide: "After Jan 1, 2002: nothing before" and "Before Jan 1, 2002: almost nothing after"

29 Discussion
- Siblings have a preferential flow
  - Initially from BSD(s) to Linux: frequent
  - Today from Linux to FreeBSD: less frequent
  - This is due to licenses, but also to the systems' level of development
- Companies directly contribute code to different kernels
  - See Intel drivers with dual licenses
  - In this case, code migrates from a third party towards Linux and FreeBSD

30 Identifying licenses of jar archives

31 Motivations
- Very often, Java open source software is distributed in jar archives
  - See http://mvnrepository.com/
- Problem: the jar might not contain licensing info
  - Under what conditions can we integrate the component?
  - The jar might not be legally usable
- Even if it comes from open source code, we might not find exactly the same jar

32 Search-driven approach
- Extracting info from the class bytecode
  - Class and package names… or a fingerprint…
  - We use the ASM library (http://asm.ow2.org/)
- Querying Google Code Search
  - Using the fully qualified class name
  - Using the package only
  - Query performed using the Google Code API (http://code.google.com/apis/gdata/)
- If the same class is not found, its license is inferred from those of classes belonging to the same package
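
The fallback rule in the last bullet can be sketched without any search service. In this illustration a plain dictionary stands in for the code search results, the majority vote among classes of the same package is an assumption (the slide does not say how the package's licenses are combined), and the names `infer_license` and `package_of` are hypothetical.

```python
from collections import Counter


def package_of(qualified_name: str) -> str:
    """'org.example.codec.Base64' -> 'org.example.codec'."""
    return qualified_name.rpartition(".")[0]


def infer_license(qualified_name: str, index: dict):
    """Return (license, how), where how is 'found', 'inferred' or 'unknown'.

    `index` maps fully qualified class names to licenses and stands in for
    the results of a code search (assumption of this sketch).
    """
    if qualified_name in index:
        return index[qualified_name], "found"
    package = package_of(qualified_name)
    votes = Counter(
        license_name
        for name, license_name in index.items()
        if package_of(name) == package
    )
    if not votes:
        return None, "unknown"
    # Assumption: take the most frequent license among the package's classes.
    return votes.most_common(1)[0][0], "inferred"


if __name__ == "__main__":
    index = {  # hypothetical search results
        "org.example.codec.Base64": "Apache 2.0",
        "org.example.codec.Hex": "Apache 2.0",
        "org.example.util.Strings": "BSD",
    }
    print(infer_license("org.example.codec.Missing", index))
```

Keeping the 'found' and 'inferred' outcomes separate mirrors the way the results are reported separately for found and inferred licenses.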

33 Google Code Search Output

34 % of correct classifications
- Found license:
  - Min. 29% (commons.codec), avg. 82%, median 89.5%
- Inferred licenses:
  - Min. 62% (JLayer 1.0), avg. 95%, median 100%
- The inferring heuristic is significantly better in terms of both completeness and precision

35 Incorrect classifications
- Most of them are between LGPL and GPL and between BSD and Apache
- commons-codec: mismatch between Apache and BSD
  - Files licensed under the Apache v1.1
  - Derived from the BSD
- JLayer: mismatch between GPL and LGPL
  - Same inferred licenses in both releases (0.4 and 1.0)
  - However, JLayer moved from GPL to LGPL from release 0.4 to release 1.0

36 Conclusions
- We proposed a code analysis method to support lawyers as well as software engineers
- We studied how licensing statements are used and evolve
  - License type, copyright year, contributors
- Main findings:
  - Licenses influence project outcomes
  - Licenses influence code migration
  - Projects move towards more permissive licenses
  - Copyright years and contributor names are updated to preserve rights on new code

37 Licensing and code provenance
- Licensing influences the direction in which code flows from one system to another
  - Often code flows in the direction of more permissive licenses…
  - …but many other factors influence how code flows
- Search-driven approaches can be adopted to determine what code a closed component comes from
  - And thus its licensing…
  - Issues related to the capabilities of the code search tools

38 Thank you!

39 References
- Daniel M. Germán, Jens H. Weber-Jahnke, Massimiliano Di Penta: Lawful Software Engineering. Proceedings of FoSER: Working Conference on the Future of Software Engineering Research, November 2010, Santa Fe, USA, ACM
- Daniel M. Germán, Massimiliano Di Penta, Julius Davies: Understanding and Auditing the Licensing of Open Source Software Distributions. ICPC 2010: 84-93
- Massimiliano Di Penta, Daniel M. Germán, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of the evolution of software licensing. ICSE 2010: 145-154
- Massimiliano Di Penta, Daniel M. Germán, Giuliano Antoniol: Identifying licensing of jar archives using a code-search approach. MSR 2010: 151-160
- Massimiliano Di Penta, Daniel M. Germán: Who are Source Code Contributors and How do they Change? WCRE 2009: 11-20
- Daniel M. Germán, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol: Code siblings: Technical and legal implications of copying code between applications. MSR 2009: 81-90
- Daniel M. Germán, Yuki Manabe, Katsuro Inoue: A sentence-matching method for automatic license identification of source code files. ASE 2010: 437-446
- Daniel M. Germán, Ahmed E. Hassan: License integration patterns: Addressing license mismatches in component-based development. ICSE 2009: 188-198
- Robert Gobeille: The FOSSology project. MSR 2008: 47-50

