1 Technical Aspects Relating to Copyright and the Web Brian Kelly UK Web Focus UKOLN University of Bath Bath, BA2 7AY

This presentation contains images of copyrighted resources. Any copyright holder who wishes their images removed should contact the author.

2 Contents
Introduction
Problems
– Caching / Mirroring Issues
– Performance Issues
– Indexing Issues
– Client Issues
Possible Solutions
– Applications
– Protocol Developments
Conclusions
Further Information

3 What Do We Want?
A Quicker, More Reliable Web
– We all know how slow the web can be
Protection For Our Intellectual Property
– As web authors / developers we want to protect our intellectual property
Sensible Ways of Including Resources
– Avoiding delays and bureaucracy
Sensible Copyright Statements on our Pages
– Avoiding statements which show no understanding of web technologies
Clarification of Responsibilities
– Is it my responsibility or the University's?

4 lis-elib Discussion
Posting to lis-elib on 22 April 1997:
"… A strange copyright statement at the URL has been brought to my attention. It states that readers are not authorised to: (a) Alter the material in any way. (b) View or print the HTML source code."
(a) sounds reasonable. Is it?
(b) sounds unreasonable. Is it?
36 responses in the last week of April
The statement also prohibits storage of pages for more than 30 days

5 Caching
What Is It?
– Storing resources so that network load is reduced if a resource is requested again
How?
– On the server
– On the client
Is it Important? Yes!
– JISC is funding a national caching infrastructure
– Much development work is going into making web protocols cache-aware
– Essential to conserve scarce network bandwidth
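
The idea of caching described above can be sketched in a few lines of Python. This is only an illustration: the class name `SimpleCache`, the `fetch` callable and the `network_requests` counter are assumptions made for the example, not part of any browser or proxy.

```python
class SimpleCache:
    """Illustrative cache: keeps a copy of each resource, keyed by URL."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable that retrieves a URL over the network
        self._fetch = fetch
        self._store = {}
        self.network_requests = 0

    def get(self, url):
        # Only go to the network if no copy has been stored yet
        if url not in self._store:
            self.network_requests += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


# Usage: the second request for the same URL is served from the stored copy
cache = SimpleCache(lambda url: "contents of " + url)
first = cache.get("http://www.example.org/index.html")
second = cache.get("http://www.example.org/index.html")
```

Note that the stored copy is exactly what a restrictive copyright statement might object to, even though the user never asked for a copy to be made.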

6 Memory Caches
Clients can store resources in:
– Disk caches
– Memory caches
Contents of memory caches are lost when the PC is rebooted
Probably no copyright implications

7 Client Side Disk Caches
Client side disk caches can be viewed and manipulated:
– about:cache URL in Netscape
– Options menu in Internet Explorer
Author's example:
– 158 files (HTML, GIF, CGI, …) in IE cache
– 267 files in Netscape cache

8 Management of Client Caches
Who Manages the Client-side Cache?
The End User?
– Is the end user responsible for removing files after 30 days?
– What happens if the end user is on holiday?
The Local Computing Service?
– For centrally managed systems, the computer service could automate deletion of files
The Browser Developer?
– Is Microsoft / Netscape responsible?
The Information Provider?
– Is the information provider responsible for providing expiry dates?
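
As a rough sketch of how a computing service might automate deletion, the function below picks out cache entries stored more than 30 days ago. The function name `expired_entries`, the dictionary layout and the example dates are assumptions for illustration; real browser caches keep their own index formats.

```python
from datetime import datetime, timedelta


def expired_entries(entries, now, max_age_days=30):
    """Return the names of cache entries stored more than max_age_days ago.

    entries maps a cached file name to the datetime it was stored
    (a hypothetical layout chosen for this example).
    """
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, stored_at in entries.items() if stored_at < cutoff]


# Usage: one file is older than 30 days on 1 June 1997, the other is not
entries = {
    "a.html": datetime(1997, 4, 22),
    "b.gif": datetime(1997, 5, 20),
}
to_delete = expired_entries(entries, now=datetime(1997, 6, 1))
```

A scheduled job could then remove the files named in `to_delete`, which answers the "on holiday" question for centrally managed machines only.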

9 Server Side Caches
Server-side caching (proxy caching):
– Provides caching on behalf of clients in an institution
– Proxy servers are needed in some institutions which run firewalls; proxy caches are then a free spin-off
– Proxy logs provide information on requests for external resources
– Legal / privacy issues are associated with access to proxy logs (e.g. looking for porn)
– Need for guidelines for cache administrators?

10 National Caching Infrastructure
JISC is funding a national caching infrastructure:
– Initially hosted at University of Kent
– Subsequently distributed between Kent and Leeds
– New service managed by Manchester / Loughborough
How do nationally funded caches relate to copyright legislation?
See cache.html for paper on caching and copyright

11 Off-Line Browsers
Offline browsers are useful for:
– Giving presentations when no network is available
– Reducing online charges
– Retrieving resources when network / server load is low
– Ensuring consistent retrieval time
But:
– Do they breach copyright?
– Does the user breach copyright?

12 WebWhacker
WebWhacker is a well-known off-line browser:
"WebWhacker, the ultimate offline browser. This powerful tool allows you to save Web pages - including text, graphics and HTML links - directly to your hard drive, so you can view them offline at highly accelerated speeds"

13 Pointers to External Images
Is inclusion of pointers to external images a breach of copyright?
BKelly/uniras94/uk_univ_logos.html
Can inclusion of external images sometimes be regarded as "fair use"?

14 Configurable Views
Recent developments (e.g. Apple's HotSauce) enable end users to configure their view of a web site
Hypothetical example: in my view of an American sports information gateway:
– Soccer should be called Football
– Baseball should be a sub-category of Rounders

15 Dynamic HTML
Recent developments with HTML (code name Cougar) provide greater control over HTML resources:
– Document Object Model (DOM) provides an API for HTML elements
– Client scripting proposal defines a mechanism for embedding client-side scripts
– Style sheets proposal describes layout control

16 Internet Explorer 4.0
Internet Explorer 4.0 implements various Cougar facilities:
– Objects (text, images) can be positioned outside the browser window and move based on a client event (e.g. mouse click)
– Appearance of HTML elements can be altered dynamically
workshop/author/dhtml/

17 Internet Explorer 4.0
The content of HTML elements can be changed dynamically:
– Changing the visibility of objects
– Generating new content
Example - Dynamic ToC
[Figure: a table of contents (Introduction, General Information, Further Details) which expands to show "What is Cougar?" and "What is DOM?" under Introduction. On MouseOver, JavaScript code sets the visibility of the expanded list (ToC2) to on.]

18 Document Object Model
With DOM it may be possible to change the appearance of other people's pages
[Figure: two framed views, each with a frame labelled "My frame" reading "My frame page lets me change view of other pages". Original page: "University of Bath. This is the home page for Bath University. Copyright - no unauthorised alterations permitted". Altered view: "University of Bathwater. This is not the home page for Bath University. Copyright - sucks".]
Note that IE's implementation of DOM restricts DOM access to pages in the same domain

19 Virtual Documents
What are the copyright implications for virtual documents?
Merseyworld provides access to Internet standards using a CGI program to retrieve a document and format it (using frames and embedded text / graphics)
control/techwatch.html
idshow?draft-abela-ulm-01.txt

20 Embedding Materials
The paper on Bookmarking Service for Organizing and Sharing URLs described WebTagger, which uses a proxy service to add buttons for bookmarking at the top of pages
Is this "altering the appearance of a copyrighted document"?
HyperNews/get/PAPER189.html

21 Bookmarks
Bookmarks: collection of facts (addresses) of Internet resources
Developments in bookmarking software:
– Check for changes
– Collaborative 'bookmarks' (cf Firefly)
No harm in using bookmarks?

22 Including Search Boxes
Can you include a form from another web page?
TSStanley/search/ alts.htm

23 Example - TotalNews
TotalNews has links to over 1,000 news sources
TotalNews uses frames to surround material from other sites with its ads
".. What TotalNews is doing is a trademark and copyright violation"
Bruce Keller, law firm representing CNN, Wall Street Journal, ...

24 Robots
Robots (crawlers, spiders):
– Software which retrieves web resources for indexing, auditing or validation purposes
Do robots breach copyright?
The robots protocol can be used to instruct robots not to retrieve a resource. However, the protocol lacks granularity.
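
The robots protocol can be tried out with Python's standard `urllib.robotparser` module. The sketch below is illustrative only: the rules in `ROBOTS_TXT`, the robot name `ExampleBot` and the `www.example.org` URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved robot asks before retrieving each URL
private_ok = parser.can_fetch("ExampleBot", "http://www.example.org/private/report.html")
public_ok = parser.can_fetch("ExampleBot", "http://www.example.org/public/index.html")
```

The rules only match robot names and URL path prefixes, and compliance is voluntary. There is no way to say "index but do not store" or "retrieve for validation only", which is one sense in which the protocol lacks granularity.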

25 Solutions
So far we've looked at examples which illustrate:
– How copyright may appear to be breached by mainstream technical developments
– That formulating a copyright statement may conflict with other desirable aims
Now let's look at possible solutions:
– Applications
– Protocol developments

26 Applications
Access can be managed by an application:
– Project Acorn provides access to PDF documents. Documents are "encrypted" and the application's printing and copying functions disabled. Printing controlled by form
Limitations:
– Application specific
– Proprietary format
See also:
– Project SCOPE: read only documents, authorised printing. See http://www. TechUnit.html
– CACTUS - ECMS http://acorn.

27 Applications - WebReferee
WebReferee is a web server utility designed to eliminate unauthorized references to content on your site.
Note that this can also be done using Apache server configuration options
[Figure: demonstration page reading "Today's Top Story... a picture of a lake... Heh, heh, heh... we're such good pirates we've stolen this from Maximized Software's demo site. Image source: X"]
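
One common way to block unauthorised references is to inspect the HTTP Referer header before serving an image. The Python sketch below shows that general idea only; it is not a description of how WebReferee itself works, and the function name `is_allowed_referer` and the example hosts are assumptions.

```python
from urllib.parse import urlparse


def is_allowed_referer(referer, allowed_hosts):
    """Decide whether a request for an image should be served.

    referer is the value of the HTTP Referer header (or None if absent);
    allowed_hosts is a set of lower-case host names permitted to embed
    the image. Both are hypothetical inputs for this sketch.
    """
    if not referer:
        # Assumption: requests without a Referer are served, since
        # some browsers and proxies omit the header
        return True
    host = urlparse(referer).hostname
    return host is not None and host.lower() in allowed_hosts


# Usage: our own pages may embed the image, another site's page may not
allowed = {"www.example.org"}
own_page = is_allowed_referer("http://www.example.org/news.html", allowed)
other_site = is_allowed_referer("http://pirate.example.net/top-story.html", allowed)
```

The Referer header is supplied by the client and can be omitted or forged, so a check like this discourages casual inclusion rather than preventing copying.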

28 Applications - Image Guardian
Image Guardian protects images by using a Java applet
See http://demo. ImageGuardian.htm

29 Watermarks
Various work is being carried out on providing digital watermarks
Several applications for use with images, video and sound:
– SureSign - see http://www.highwatersignum.com/ProdInfo.html
Research papers describe use with other applications:
– Watermar/watermar.html
– dig_wtr/dig_watr.html

30 Protocol Developments
HTTP:
– Cache-control header allows client or server to transmit directives, typically to override default caching algorithms
– Cache-control: private - resource must not be cached by a shared cache
– Cache-control: no-cache - a cached copy must not be reused without revalidating it with the origin server (no-store forbids storing it at all)
HTML:
– Define expiry dates in HEAD
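
To show how a cache might act on these directives, here is a minimal Python sketch. The function names are made up for the example, and the parsing is simplified (it does not handle quoted arguments containing commas). It follows the HTTP/1.1 reading in which `private` bars shared caches, `no-store` bars storage, and `no-cache` requires revalidation before reuse.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive; arguments are optional
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(value):
    """A shared (proxy) cache must not store private or no-store responses."""
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives


def must_revalidate_before_reuse(value):
    """no-cache: a stored copy may only be reused after revalidation."""
    return "no-cache" in parse_cache_control(value)


# Usage: a response marked private should stay out of a proxy cache
proxy_can_keep = shared_cache_may_store("private, max-age=600")
```

An information provider who objects to proxy caching can therefore say so in a form that caches are expected to honour, rather than in a copyright statement that software never reads.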

31 WD-htmllink Proposal
The WD-htmllink spec:
– Defines relationships between web resources (e.g. Next, Previous, Parent, Help)
– Can also define pages which can be indexed by robots; for example, a page can be marked so that robots are prohibited from indexing it and following its links
See WD-htmllink.html
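
The markup example from the slide is not preserved here. A widely used way of expressing this instruction is a robots META element with NOINDEX and NOFOLLOW values, and the Python sketch below assumes that form; the class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Usage: a robot would check the page before indexing or following links
PAGE = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'
found = robots_directives(PAGE)
may_index = "noindex" not in found
may_follow = "nofollow" not in found
```

Unlike a site-wide robots file, an instruction of this kind is set page by page by the author, which addresses some of the granularity problem.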

32 Seeking Permission
Why not email copyright holders:
"I would like to include an image of your web page in a presentation I'm giving. The presentation is for academic purposes. If you don't reply within 2 weeks I will assume that you have granted permission"
Issues:
– Finding the copyright holder
– Does the "2 weeks" clause have any legal status?
– Obtaining a timely response
– Length of any acknowledgements

33 Technological or Legal Problem?
Whose problem is copyright?
Copyright law exists. It is the responsibility of:
– Protocol developers
– Software developers
– Information providers
– End users
to ensure they do not breach copyright (when writing protocols, software or web pages)
Copyright law is no longer appropriate in its current form. It is the responsibility of legislators to redraft legislation

34 What Do We Want From Copyright Legislation?
Is copyright:
– A means to "encourage learned men to compose and write useful books" - i.e. the function of copyright is the promotion of learning?
– A mechanism for publishers to maintain revenue?
What do we want from our copyright statements?
Technical solutions are available - how much time and effort do we want to spend?

35 A Copyright Statement
ACM: "Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted with or without fee provided that copies are not made or distributed for profit ... and that copies bear this notice and full citation on the first page …"
How about adding:
– Permission is granted to make use of caching, off-line browsing and similar technologies for the purpose of enhancing web performance only
– Small parts of this resource may be included in other documents for non-commercial use
– Read the online FAQ for further details
– Fill in the online form to submit other copyright requests
– Permission is granted to link to this resource

36 Further Information
Copyright Resources:
– copyproj.html
– copyright.html
– index.html
Technologies (Watermarking):
– ~hartung/watermarkinglinks.html
Caching

37 Further Information
W3C
eLib
Software
Applications Software

