Copyright and Mass-Digitization: The strategic importance of data-mining. Matthew Sag, Professor of Law, Loyola University of Chicago. email@example.com www.matthewsag.com
Abbreviated Time Line
– 2004: Google library project begins
– 2005: Class action suit filed by Authors Guild (among others)
– 2008 & 2009: Settlement proposed, objections follow, settlement revised
– 2011 (March): Settlement rejected
– 2011 (September): Authors Guild v. HathiTrust filed
– 2012 (August): Oral argument in Authors Guild v. HathiTrust
– 2012 (October): Judge Baer rules against the plaintiffs in Authors Guild v. HathiTrust. Library digitization (ADA + data) is fair use.
– 2013 (July): Second Circuit tells Judge Chin: no class certification without addressing the fair use issue
– 2013 (September): Oral argument on fair use in Authors Guild v. Google
The strategic importance of text-mining: Different kinds of digitization programs raise different legal issues and bring in different stakeholders.
The Many Faces of Library/Archive Digitization
– Preservation
– Data production and analysis*: searching books, testing search algorithms, computational linguistics, automated translation, natural language processing, macro-analysis of text
– A platform for display and distribution of individual works
– Disabled access*
– Scholarly access
– General access
Strategic Considerations Library digitization for data production and analysis Significant academic and commercial constituency (not just Google!) Strong normative appeal Obvious orphan works problem Justifies digitizing entire collections Even if some other uses are too much, no all-copyright owner class action possible
The Legal Argument #1 Metadata – facts about the work – does not infringe the rights of the copyright owner. – This is not usually contested, but it's important to make sure everyone understands the reasons why metadata can't infringe. Those reasons are … Idea-expression distinction Merger doctrine Metadata is not substantially similar to the underlying text Facts about the work don't originate with the author
Legal Argument #2 A copying process that only produces metadata does not infringe. Intermediate non-expressive use is either (a) not copying in the relevant sense or (b) fair use. The distinction between expressive and nonexpressive parts of works is well recognized (no copyright in a phone book, etc.). The same distinction should be made in relation to potential acts of infringement. Intermediate non-expressive uses don't communicate the author's original expression to the public. No expressive substitution, no infringement
Application to Fair Use Sect. 107 Factors (1) purpose and character: Like transformative uses, a nonexpressive use poses no risk of expressive substitution (2) nature of the work … not much use (3) Amount and Substantiality: Like transformative uses, because there is no expressive substitution in a nonexpressive use, the amount of copying is qualitatively insignificant. (4) Market effect: Like transformative uses, a nonexpressive use poses no risk of expressive substitution, thus no cognizable market effect.
Legal Argument #3 Non-expressive use does not harm copyright owners and has great social value
"The United States is" versus "The United States are", 1780–1900
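A minimal sketch of the kind of non-expressive use behind the "is" versus "are" comparison: the text is read by a program, but only counts come out. The corpus, the function name `usage_counts`, and the regular expression are assumptions made up for illustration; this is not the method used to produce the chart on the slide.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs invented for illustration.
TOY_CORPUS = [
    (1795, "The United States are at peace. The United States are free."),
    (1850, "The United States are divided, yet the United States is growing."),
    (1895, "The United States is a nation. The United States is strong."),
]

# Matches "United States is" or "United States are", case-insensitively.
PATTERN = re.compile(r"\bunited states (is|are)\b", re.IGNORECASE)


def usage_counts(corpus):
    """Return {year: Counter of 'is'/'are'}; only counts leave this function."""
    counts = {}
    for year, text in corpus:
        verbs = (match.group(1).lower() for match in PATTERN.finditer(text))
        counts.setdefault(year, Counter()).update(verbs)
    return counts


if __name__ == "__main__":
    for year, tally in sorted(usage_counts(TOY_CORPUS).items()):
        print(year, "is:", tally["is"], "are:", tally["are"])
```

The output is metadata in the sense of Legal Argument #1: facts about the texts (how often each verb form follows the phrase), with none of the authors' original expression reproduced.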
American Slavery in American, English, and Irish Literature, 1800-1899. Matthew Jockers, Macroanalysis: Digital Methods for Literary History (2013) Proportion of Irish Literature with a topic of slavery spikes ~ 1860-65
Importance of the Digital Humanities Brief Focused attention on digitization for the sake of data Demonstrated importance Disentangled it from other issues Not just a Google issue, Not just an internet issue, Not just a research/scholarship issue Powerful examples tied directly to the understanding of literature » In case making the Internet work through caching and search was not enough for you!
Quotes from HathiTrust judgment … I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants' MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the ADA. – The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining. (brief cited) – … metadata and text mining, which "could actually enhance the market for the underlying work, by causing researchers to revisit the original work and reexamine it in more detail" (brief quoted)
Impact of the Digital Humanities Amicus Brief Three for the price of one: Authors Guild v. HathiTrust (district court), Authors Guild v. Google (district court), Authors Guild v. HathiTrust (court of appeals) Over 100 signatories! Discussed with approval in HathiTrust. The "United States is/are" example made its way into the judgment in HathiTrust last year and into oral argument in Google Books this week!
Some Concluding Thoughts Specific legal issues vary by jurisdiction fair use, fair dealing, legislative reform Underlying policy questions are global Idea-expression distinction The promise of big data and problem of orphan works Challenge for libraries and archives is making courts/decision makers understand the broader consequences
Action Items Commercial and non-commercial digitizers need to work together and defend everyone's right to non-expressive use Digital Humanities, Linguistics, Comp. Sci., Libraries Search providers, plagiarism and copyright infringement detection tools, music identification tools, reverse engineering Advantage of flexible limitations and exceptions Without reform, other nations cede ground to the U.S. as the data engine of the world.
Abbreviated Issues Summary
Issue | Status | Case | Notes
Preservation | Still open, but court unconvinced | v. HathiTrust |
Orphan works display | Still open, not ripe | v. HathiTrust | Trove (Australia); Best practices
Disability access | Digitization ok | v. HathiTrust | On appeal
Data mining | Digitization ok | v. HathiTrust | All but given up in v. Google
Library copies as quid pro quo | Still open | v. Google | Easier now underlying use is fair use
Making/retaining excessive copies | Still open | v. Google |
Snippet display | Still open | v. Google |
Standing, remedies, class action … | Mixed | v. HathiTrust, v. Google |
Further Reading Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don't Let Copyright Block Data Mining, 490 Nature 29-30 (October 4, 2012)
Further Reading Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012) Matthew Sag, Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009)