1 Licensing is Software Too: Achievements and Challenges (and how this relates to code provenance) Massimiliano Di Penta University of Sannio, Italy

2 Acknowledgements
- Daniel M. Germán, Univ. Victoria, Canada
- Julius Davies, Univ. Victoria, Canada
- Giuliano Antoniol, Ecole Polyt. Montréal, Canada
- Yann-Gaël Guéhéneuc, Ecole Polyt. Montréal, Canada

3 Reusing Open Source Software
- When developing a software system, we try (if possible) not to reinvent the wheel
- Components, libraries, and source code snippets are out there, ready to be reused
- Code search engines are becoming popular
- Open source code modification and redistribution are governed by
  - Software licenses
  - Copyright statements
- All of this is contained in a licensing block…

4 What does a licensing statement contain?
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * ….
 * Portions created by the Initial Developer are Copyright (C) 2002
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 * Brian Ryner ….
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */
#include "nsXULAppAPI.h"
#ifdef XP_WIN
#include
Callouts on the slide: License (MPL+GPL+LGPL), Copyright statement, Copyright year, Contributor

5 Restrictive vs. permissive licenses
- Restrictive (aka copyleft or reciprocal)
  - Changed software must be made available under terms similar to those of the original
  - Example: GPL
- Permissive
  - Modifications/enhancements may remain proprietary
  - Distribution of source code or binary permitted
    - Provided copyright notices and/or liability disclaimers are retained
    - Contributor names do not imply endorsement
  - Examples: Berkeley Software Distribution (BSD), Apache Software License, MIT

6 FOSS development teams care! (source: Debian)
I am in the process of trying to prepare for Debian GNU/Linux I have started going over the copyright/license headers. In src/celeste many files are missing copyright information. Most of these are files imported with minimal changes from Gabor API or libsvm
The attached patch adds copyright and license statements to these files.[1] Please apply and update the headers (adding copyright holders) if you make substantial changes.
thanks, cu andreas
[1] I have doublechecked with Gabor API's upstream author Adriaan Tijsseling that files like ContrastFilter.cpp are Copyright (c) Adriaan Tijsseling and licensed under GPLv2+, although the original headers just say: Original Author: Yasunobu Honma Modifications by: Adriaan Tijsseling (AGT)

7 Conjectures
- Since licenses determine the way software can be composed and redistributed
  - They may change/evolve like any other part of the software
  - They might be subject to bugs too
    - See our ICPC 2010 paper about how to identify licensing incompatibilities
  - They might determine the success/failure of a software project
- Code provenance and licenses:
  - Licenses constrain source code migration between projects
  - Code provenance might be useful to determine the licensing of closed components

8 Licenses influence the software lifetime
- OpenBSD founder and project leader Theo de Raadt removed a security software package called IP-Filter [written by Darren Reed] after its author changed its license. Stephen Shankland, CNET News, 2001/05/30.
- Licenses evolve as software does
  - Failing to account for that could cause copyright infringements
  - Decisions on license changes have an impact, like other decisions on software evolution
- Little attention so far from the scientific community
Need for methods and tools to audit licensing and its changes

9 Example: Java
- Until November 2006, the license of Java JDK v1.2 said: “Except as specifically authorized in any Supplemental License Terms, you may not make copies of Software, other than a single copy of Software for archival purposes”
  - This disallowed the inclusion of Java in Linux distributions
- Java 5.0 released under the GPL v2 with the CLASSPATH exception:
  - Java could be modified/updated under the GPL v2
  - Java programs could be released under any license as long as they satisfy the conditions stated in the CLASSPATH exception
Changing the license of a system can promote and ease the distribution and reuse of a software system

10 Example: Mono
- Framework produced by Novell to support the .Net API on non-Microsoft operating systems
- Initially distributed under the GPL v2
  - Potential problem when running .Net systems
    - Considered derivative works of Mono
    - Required to be also released under the GPL v2
- Mono developers changed its license to MIT/X11
  - The change was also required by HP for its participation in the project
A change to a more permissive license may increase the size of the community of contributors to a FOSS system

11 Example: QT
- First released under a non-open source but free license, called the FreeQT License, and a commercial license
- QT became the basis for KDE
- QT v2.0 was released under a new license, the Q Public License
  - Incompatible with the GPL
  - GNOME project started as a QT-free alternative to KDE
  - Harmony project started as a GPL replacement of QT
- Trolltech changed the license of QT v3 to the GPL v2
  - The Harmony project was abandoned
Changing the license of a FOSS system towards a more permissive one might cause the abandonment of a competing system

12 Example: MySQL
- In 2004, MySQL AB changed the license of its client libraries from LGPL v2.1 to GPL v2
  - To prevent industrial companies from using the libraries within proprietary products
- Unintended consequences:
  - PHP systems were no longer able to connect to MySQL
  - The PHP license is incompatible with the GPL v2
- MySQL addressed this problem by adding the MySQL FOSS License Exception to the GPL v2
Changing the license of a FOSS system might have unintended/undesirable consequences for its legitimate users

13 Empirical Study
- Goal: analyze licensing evolution
- Purpose: investigate how developers change licensing statements
- Context: CVS/SVN repositories of
  - ArgoUML, Eclipse-JDT, the FreeBSD and the OpenBSD kernels, Mozilla, Samba

14 Research Questions
- RQ1: To what extent are files changing their licenses?
- RQ2: How are copyright years changed in licensing statements?
- RQ3: Who are the contributors of a software project and how do they change?

15 Licensing Analysis Method – Extracting Licensing statements
[Example on the slide: the Mozilla license block from slide 4, from the Mode line through END LICENSE BLOCK and the first #include lines]
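
The extraction step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the licensing statement is made of the C-style block comments at the top of a file; the function name `extract_licensing_statement` and the `max_chars` cut-off are hypothetical and not taken from the talk.

```python
import re

# Matches C-style block comments. A simplification: line comments (// or #)
# and comment markers inside string literals are not handled.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def extract_licensing_statement(source, max_chars=4000):
    """Return the text of the block comments that open the file."""
    header = source[:max_chars]
    parts = []
    pos = 0
    for match in BLOCK_COMMENT.finditer(header):
        # Stop at the first comment that is preceded by actual code.
        if header[pos:match.start()].strip():
            break
        body = match.group(0)[2:-2]
        # Drop the leading " * " decoration from each comment line.
        lines = [re.sub(r"^\s*\*+\s?", "", line).rstrip()
                 for line in body.splitlines()]
        parts.append("\n".join(lines).strip())
        pos = match.end()
    return "\n".join(part for part in parts if part)
```

Comments that appear after the first line of code are ignored, on the assumption that licensing text sits in the file header.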

16 Licensing Analysis Method – Classifying licenses
- FoSSology [Gobeille, MSR 2008]: detects licenses using the Binary Symbolic Alignment Matrix (bSAM)
- Ninka [German et al., ASE 2010]: uses a pattern-matching approach
[Example on the slide: the Mozilla license block from slide 4, ending at "Contributor(s): Brian Ryner ….", classified as MPL 1.1/GPL 2.0/LGPL 2.1]
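
To make the idea of pattern-based classification concrete, here is a toy sketch. It is not Ninka's or FOSSology's algorithm: the three rules in `LICENSE_RULES` are hypothetical and far smaller than any real knowledge base, and the names `normalize` and `classify` are introduced only for this example.

```python
import re

# Hypothetical, deliberately tiny rule set: (license name, pattern).
# Patterns are matched against lower-cased, whitespace-normalized text.
LICENSE_RULES = [
    ("MPL 1.1", re.compile(r"mozilla public license version 1\.1")),
    ("GPL v2", re.compile(
        r"gnu general public license.{0,80}?version 2(?![.\d])")),
    ("Apache 2.0", re.compile(r"apache license,? version 2\.0")),
]


def normalize(statement):
    """Lower-case the text and collapse whitespace and comment stars."""
    return re.sub(r"[\s*]+", " ", statement.lower()).strip()


def classify(statement):
    """Return the names of all rules that match the licensing statement."""
    text = normalize(statement)
    return [name for name, rule in LICENSE_RULES if rule.search(text)]
```

A statement can match several rules, which is how a multi-licensed file would be reported; an empty list means no rule matched.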

17 Licensing Analysis Method – Identifying changes in copyright years
- Mining references to years in licensing…
[Example on the slide: the Mozilla license block from slide 4, ending at "Contributor(s): Brian Ryner …."]
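
Mining year references could look roughly like the sketch below. The regular expressions and the function names `copyright_years` and `year_changes` are assumptions made for illustration; the talk does not specify the patterns actually used.

```python
import re

# Only lines that look like copyright lines are inspected (an assumption).
COPYRIGHT_LINE = re.compile(r"copyright|\(c\)|©", re.IGNORECASE)
# A single year, or a range such as 2002-2004.
YEAR_OR_RANGE = re.compile(
    r"\b(19[6-9]\d|20\d\d)(?:\s*-\s*(19[6-9]\d|20\d\d))?\b")


def copyright_years(statement):
    """Return the set of years mentioned on copyright lines."""
    years = set()
    for line in statement.splitlines():
        if not COPYRIGHT_LINE.search(line):
            continue
        for match in YEAR_OR_RANGE.finditer(line):
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else start
            years.update(range(start, end + 1))
    return years


def year_changes(old_statement, new_statement):
    """Return (years added, years removed) between two revisions."""
    before = copyright_years(old_statement)
    after = copyright_years(new_statement)
    return after - before, before - after
```

Comparing the year sets of consecutive revisions of a file gives the copyright-year changes to study.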

18 Licensing Analysis Method – Identifying contributor names
- Mining e-mail addresses, plus various patterns
  - Copyright … year name
  - Contributor(s) …
- Names are mapped to committers whenever possible
[Example on the slide: the Mozilla license block from slide 4, ending at "Contributor(s): Brian Ryner …."]
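
The two patterns named on this slide ("Copyright … year name" and "Contributor(s) …") can be sketched as follows. The regular expressions, the function `contributor_names`, and the rule that a blank comment line ends the contributor list are all assumptions for illustration; mapping names to committers is not covered.

```python
import re

# "Copyright (C) 2002 Some Name": the name follows the year(s).
COPYRIGHT_NAME = re.compile(
    r"copyright\s*(?:\(c\)|©)?\s*\d{4}(?:\s*[-,]\s*\d{4})*"
    r"[\s,]+(?P<name>[^\n<(]+)",
    re.IGNORECASE,
)
# "Contributor(s):" optionally followed by a name on the same line.
CONTRIBUTORS_HEADER = re.compile(
    r"contributor\(s\)\s*:\s*(?P<rest>.*)", re.IGNORECASE)
EMAIL = re.compile(r"<[^>\s]+@[^>\s]+>")


def _clean(name):
    """Remove an e-mail address in angle brackets and stray punctuation."""
    return EMAIL.sub("", name).strip(" \t,.;")


def contributor_names(statement):
    """Return names found via the copyright and contributor patterns."""
    names = []
    in_list = False
    for line in statement.splitlines():
        # Strip comment decoration such as " * " or "# ".
        text = re.sub(r"^[\s/*#]+", "", line).strip()
        match = COPYRIGHT_NAME.search(text)
        if match:
            names.append(_clean(match.group("name")))
            in_list = False
            continue
        header = CONTRIBUTORS_HEADER.search(text)
        if header:
            in_list = True
            text = header.group("rest").strip()
            if not text:
                continue
        if in_list:
            if not text:
                in_list = False  # a blank comment line ends the list
                continue
            names.append(_clean(text))
    return names
```

Trailing text on a copyright line (for example "All rights reserved") would be captured as part of the name, so a real extractor needs further filtering.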

19 RQ1: Most relevant license changes
System | From | To | Type | #
Eclipse-JDT | Common Public License v1.0 | Eclipse Public License v1.0 | CHANGE | 2394
Eclipse-JDT | Common Public License v0.5 | Common Public License v1.0 | UPDATE | 808
Mozilla | NPL | 'NPL v1.1'-style+GPL v2+LGPL v2.1 | DUAL | 2914
Mozilla | NPL | 'Dual MPL GPL'-style+MPL | DUAL | 1274
Mozilla | 'Dual MPL GPL'-style+MPL | NPL | BUG | 1194
Licensing updated as new licenses were developed
- Eclipse JDT: CPL 0.5 → CPL 1.0 → EPL 1.0
  - IBM has relinquished control of licenses to the Eclipse Foundation
- Mozilla: NPL → MPL + GPL (+ LGPL)
  - NPL allowed Netscape 6 to be released as a proprietary system
  - MPL only allows redistributing the source code under the MPL
  - Multiple licenses to deal with incompatibilities
  - Files wrongly changed to NPL (bug #98089)

20 RQ1: Most relevant license changes
System | From | To | Type | #
FreeBSD | BSD UCRegents (4-cl BSD) | 'BSD UCRegents'-style (4-cl BSD) | UPDATE | 491
FreeBSD | 'BSD UCRegents'-style (4-cl BSD) | 'INRIA-OSL'-style (3-cl BSD) | UPDATE | 300
OpenBSD | 'BSD UCRegents'-style (4-cl BSD) | 'INRIA-OSL'-style (3-cl BSD) | UPDATE | 964
OpenBSD | BSD UCRegents (4-cl BSD) | 'BSD UCRegents'-style (4-cl BSD) | UPDATE | 414
- FreeBSD and OpenBSD are more eclectic than other projects
- Moving from the 4-clause BSD license to the more permissive 3-clause and 2-clause BSD licenses

21 RQ1: Most relevant license changes
System | From | To | Type | #
ArgoUML | None | 'Free with copyright clause'-style + 'UC Regents free with copyright clause'-style | ADD | 127
Samba | None | GPL v2 | ADD | 15
- ArgoUML and Samba kept the same licenses over the analyzed time span
- The change is from None to a simple license
  - Authors realized the importance of including a license
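
Tables such as those for RQ1 can be produced by counting license transitions across file histories. The sketch below assumes each file's history is already available as a list of license labels, oldest revision first; the function name `license_transitions` and the sample file names are hypothetical, and the talk's own counting procedure may differ.

```python
from collections import Counter


def license_transitions(histories):
    """Count (old license, new license) pairs over consecutive revisions.

    histories maps a file path to its license labels, oldest first.
    """
    counts = Counter()
    for labels in histories.values():
        for old, new in zip(labels, labels[1:]):
            if old != new:
                counts[(old, new)] += 1
    return counts


# Usage with made-up histories:
histories = {
    "a.java": ["CPL v0.5", "CPL v0.5", "CPL v1.0", "EPL v1.0"],
    "b.java": ["CPL v1.0", "EPL v1.0"],
    "c.c": ["None", "GPL v2"],
}
for (old, new), count in license_transitions(histories).most_common():
    print(f"{old} -> {new}: {count}")
```

Labelling a transition as CHANGE, UPDATE, DUAL, ADD or BUG requires knowledge of the licenses involved and is left out of this sketch.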

22 RQ2: How and why were copyright years changed?
- Files whose copyright years were updated underwent a significantly higher number of changes than others
  - When developers perform substantial changes to a file, they also update copyright years
  - Required by copyright regulations
  - Lack of updates with substantial changes would allow an infringer to claim “innocent infringement”
- Commits explicitly targeted at copyright years
  - “Updated copyrights”
  - “Updated copyrights to 2004”

23 RQ3: When do contributors change?
- Changes in which contributor names are added are significantly bigger than other changes
  - Contributors are often added when they make substantial changes
- Contributor names are important assets in source code
  - Like the signature on a picture
- However…
  - Contributors can change over time
  - There is no standard way of reporting them
  - There is no clear rule on when one should become a contributor
- Their presence can have legal implications

24 Licenses Influence Code Migration

25 Free (software) as a bird…
- As birds migrate differently during different seasons…
  - Code might have a preferential migration direction
- Given two systems
  - e.g. FreeBSD and Linux
- We find the same code in both systems
- Three scenarios:
  - Migration FreeBSD → Linux
  - Migration Linux → FreeBSD
  - Migration third party → FreeBSD, Linux

26 Licenses in the kernels
- Let’s consider 3 Unix kernels
  - Linux, FreeBSD, and OpenBSD
- Linux: mainly GPL (65%), with some files (25%) “promoted” to GPL by L. Torvalds
  - A few files (35) have two licenses
- FreeBSD: 75% of the files have a BSD license
  - Some files with corporate licenses (Intel)
  - Some with MIT
  - Others with multiple licenses (BSD and GPL, MIT and GPL, BSD and Educational)
- OpenBSD: similar to FreeBSD

27 Sibling(s) Origin
- Identify siblings between systems using clone detection
  - CCFinderX, with >100 tokens as threshold, plus other heuristics
- Trace siblings back into the past, following their code fragments in the same files
  - Again with clone detection, comparing the sibling fragment with previous file revisions
  - When the fragments disappear, we have found their origins
- Take the older of the two as the true origin
[Figure: Sys 1 – File i and Sys 2 – File j, linked as siblings by cloned fragments, with the migration direction]
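
The back-tracing idea can be sketched as below. This is only an approximation under loud assumptions: an exact token-sequence match stands in for CCFinderX clone detection, each file history is given as a list of (date, content) pairs with the oldest first, and all function names are hypothetical.

```python
import re
from datetime import date


def tokens(code):
    """Split code into identifiers, numbers and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def contains_fragment(code, fragment):
    """Stand-in for clone detection: exact token-sequence containment."""
    hay, needle = tokens(code), tokens(fragment)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle
                         for i in range(len(hay) - n + 1))


def first_appearance(revisions, fragment):
    """Walk back from the latest revision until the fragment disappears.

    revisions is a list of (date, content) pairs, oldest first.
    Returns the date of the oldest revision in the unbroken run of
    revisions containing the fragment, or None if the latest lacks it.
    """
    origin = None
    for when, content in reversed(revisions):
        if not contains_fragment(content, fragment):
            break
        origin = when
    return origin


def migration_direction(name_a, revs_a, name_b, revs_b, fragment):
    """Return (source system, target system), or None if undecidable."""
    a = first_appearance(revs_a, fragment)
    b = first_appearance(revs_b, fragment)
    if a is None or b is None or a == b:
        return None
    return (name_a, name_b) if a < b else (name_b, name_a)
```

The system in which the fragment appeared first is taken as the origin, mirroring the "take the older of the two" rule on the slide; a third-party origin cannot be detected this way.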

28 Code Migration and Licenses
FreeBSD | Linux | Files
BSD | GPL | 8
BSD | MIT | 2
BSD | None | 2
Corporate | BSD+GPL | 89
GPL | None | 1
Phrase | BSD+GPL | 1
X.Net+BSD | MIT | 1

Linux | FreeBSD | Files
BSD+GPL | Corporate | 8
GPL | BSD | 17
GPL | BSD+GPL | 1
GPL | CPL+BSD+GPL | 1
MIT | BSD | 1
MIT+GPL | None | 2
 | BSD | 1
Phrase+GPL | MIT | 2

OpenBSD | Linux | Files
BSD | BSD+GPL | 1
BSD | MIT | 2
BSD | Unknown | 1
BSD+GPL | GPL | 1
BSD+Phrase | Phrase+GPL | 1
MIT | GPL | 23

Callouts on the slide: "After Jan 1, 2002. Nothing before" and "Before Jan 1, 2002. Almost nothing after"

29 Discussion
- Siblings have a preferential flow
  - Initially from the BSDs to Linux – frequent
  - Today from Linux to FreeBSD – less frequent
- This is due to licenses, but also to the level of development of each system
- Companies directly contribute code to different kernels – see the Intel drivers with dual licenses
  - In this case, code migrates from a third party towards Linux and FreeBSD

30 Identifying licenses of jar archives

31 Motivations
- Very often, Java open source software is distributed in jar archives
- Problem: the jar might not contain licensing info
  - Under what conditions can we integrate the component?
  - It might not be legal to use the jar
- Even if it comes from open source code, we might not find exactly the same jar

32 Search-driven approach
- Extracting info from the class bytecode
  - Class and package names… or a fingerprint…
  - We use the ASM library (http://asm.ow2.org/)
- Querying Google Code Search
  - Using the fully qualified class name
  - Using the package only
  - Query performed using the Google Code API (http://code.google.com/apis/gdata/)
- If the same class is not found, its license is inferred from those of classes belonging to the same package
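
The fallback to package-level information can be sketched as follows. The code search itself is not modelled: `found` stands for whatever licenses the search returned per fully qualified class name. Choosing the most frequent license in the package is an assumption of this sketch, since the slide does not say how the licenses of sibling classes are combined; the class names in the usage example are made up.

```python
from collections import Counter


def package_of(class_name):
    """Return the package part of a fully qualified class name."""
    return class_name.rpartition(".")[0]


def infer_licenses(class_names, found):
    """Assign a license to every class in a jar.

    found maps fully qualified class names to the license returned by
    the code search. Classes without a hit get the most frequent license
    among found classes of the same package (an assumed heuristic),
    or None when the package has no hits at all.
    """
    votes_by_package = {}
    for name, license_name in found.items():
        votes = votes_by_package.setdefault(package_of(name), Counter())
        votes[license_name] += 1

    result = {}
    for name in class_names:
        if name in found:
            result[name] = found[name]
        else:
            votes = votes_by_package.get(package_of(name))
            result[name] = votes.most_common(1)[0][0] if votes else None
    return result


# Usage with made-up class names:
found = {
    "org.example.codec.Base64": "Apache v2",
    "org.example.codec.Hex": "Apache v2",
    "org.example.codec.Legacy": "BSD",
}
jar_classes = list(found) + ["org.example.codec.Digest",
                             "org.example.other.Util"]
print(infer_licenses(jar_classes, found))
```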

33 Google Code Search Output

34 % of correct classifications
- Found licenses:
  - Min. 29% (commons.codec), avg. 82%, median 89.5%
- Inferred licenses:
  - Min. 62% (JLayer 1.0), avg. 95%, median 100%
- The inference heuristic is significantly better in terms of both completeness and precision

35 Incorrect classifications
- Most of them are between LGPL and GPL and between BSD and Apache
- commons-codec: mismatch between Apache and BSD
  - Files licensed under the Apache v 1.1
  - Derived from the BSD
- JLayer: mismatch between GPL and LGPL
  - Same inferred licenses in both releases (0.4 and 1.0)
  - However, JLayer moved from GPL to LGPL between release 0.4 and release 1.0

36 Conclusions
- We proposed a code analysis method to support lawyers as well as software engineers
- We studied how licensing statements are used and evolve
  - License type, copyright year, contributors
- Main findings:
  - Licenses influence project outcomes
  - Licenses influence code migration
  - Projects move towards more permissive licenses
  - Copyright years and contributor names are updated to preserve rights on new code

37 Licensing and code provenance
- Licensing influences the direction in which code flows from one system to another
  - Often code flows in the direction of more permissive licenses…
  - …but there are many other factors influencing how code flows
- Search-driven approaches can be adopted to determine what code a closed component comes from
  - And thus its licensing…
  - Issues related to the capabilities of the code search tools

38 Thank you!

39 References
- Daniel M. Germán, Jens H. Weber-Jahnke, Massimiliano Di Penta: Lawful Software Engineering. Proceedings of FoSER: Working Conference on the Future of Software Engineering Research, Santa Fe, USA, November 2010, ACM
- Daniel M. Germán, Massimiliano Di Penta, Julius Davies: Understanding and Auditing the Licensing of Open Source Software Distributions. ICPC 2010
- Massimiliano Di Penta, Daniel M. Germán, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of the evolution of software licensing. ICSE 2010
- Massimiliano Di Penta, Daniel M. Germán, Giuliano Antoniol: Identifying licensing of jar archives using a code-search approach. MSR 2010
- Massimiliano Di Penta, Daniel M. Germán: Who are Source Code Contributors and How do they Change? WCRE 2009
- Daniel M. Germán, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol: Code siblings: Technical and legal implications of copying code between applications. MSR 2009
- Daniel M. Germán, Yuki Manabe, Katsuro Inoue: A sentence-matching method for automatic license identification of source code files. ASE 2010
- Daniel M. Germán, Ahmed E. Hassan: License integration patterns: Addressing license mismatches in component-based development. ICSE 2009
- Robert Gobeille: The FOSSology project. MSR 2008: 47-50

