Anonymizing Web Transaction Logs to Ensure Privacy and Increase Usability Paul A. Soderdahl University of Iowa Libraries ILA/ACRL Spring 2003, Dubuque,

Presentation transcript:

Anonymizing Web Transaction Logs to Ensure Privacy and Increase Usability Paul A. Soderdahl University of Iowa Libraries ILA/ACRL Spring 2003, Dubuque, IA May 2, 2003

Description of the Problem Iowa Code §22.7 The following public records shall be kept confidential… 13. The records of a library which, by themselves or when examined with other public records, would reveal the identity of the library patron checking out or requesting an item or information from the library….

Description of the Problem University Libraries User Privacy Policy The Libraries will not reveal the identities of individual users nor reveal the information sources or services they consult unless required by law…. The Libraries from time to time may aggregate and retain user data for a reasonable period of time in order to investigate the use or value of resources and services. It will, however, neither collect nor retain information identifying individuals except during the period when and only for the purpose that such record is necessary to furnish a specific service.

Description of the Problem University Libraries User Privacy Policy Publicly Accessible Digital Information Systems The Libraries’ computer-based access systems (e.g., InfoHawk or various digital information systems) frequently track or "log" the actions performed by users of those systems. Transaction level logging that can be tied to individuals may be kept intact for a limited period of time for trouble-shooting and problem resolution related to system functions and service transactions. During the period this information is retained, it is held in confidence and is not shared with third parties unless required by law….

Description of the Problem University Libraries User Privacy Policy Publicly Accessible Digital Information Systems When the information is no longer useful, by a reasonable standard, for resolving problems, the Libraries may aggregate and retain anonymized user data in order to investigate the use or value of resources and services. Information regarding individual identities (or the source of the transaction) will be removed. Original transaction logging information that has been processed in this way will be destroyed and care taken to ensure that backups or other inadvertently stored forms of the data are not retained.

Description of the Problem Web transaction log data
1. Date and time
2. Client IP address
3. Client username (if logged in)
4. HTTP method used (usually GET/POST)
5. URL requested
6. Parameters passed to URL (everything after the question mark)
7. HTTP status code
8. # of bytes server sends to client
9. # of bytes client sends to server
10. Length of transaction
11. Client software used
12. Any cookie client passed to server
13. URL of referring page
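To make the field list concrete, here is a minimal Python sketch of reading such a log. It assumes the servers write the W3C Extended Log File Format, where a `#Fields:` directive names the space-separated columns; the slides do not state the format, and the sample lines and the function name `parse_w3c_log` are invented for illustration.

```python
def parse_w3c_log(lines):
    """Yield one dict per request line, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives start with '#'; only #Fields changes how we parse.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines rather than guess at their layout
        yield dict(zip(fields, values))


# Hypothetical sample; the IP address comes from a documentation-only range.
SAMPLE = [
    "#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status",
    "2003-05-02 10:15:00 192.0.2.17 - GET /index.htm - 200",
]
records = list(parse_w3c_log(SAMPLE))
```

In this layout the client IP address (`c-ip`), the username (`cs-username`), and any cookie are the fields most likely to point back to an individual.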

Sample Entry Server: Library Explorer :21: GET /ch1/subjectsearch/p_medicine.htm explorer.lib.uiowa.edu Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0;+.NET+CLR ) - stand&hl=en&lr=&ie=UTF-8&start=10&sa=N

Sample Entry Server: PURL :26: GET /wiley/BioEssays purl.lib.uiowa.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+98) - www/ejrnl.html

Sample Entry Server: Intranet :50: soderdhl GET /infohawk/Staffdirectory.htm Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) -

Let’s Get Real How much can you tell from ? So what that a would-be terrorist looked at our overdue fines? Are web transaction logs that sexy?

Web Usage Reports Does the individual workstation information provide any valuable data? What do we really want to know about our web server usage?

Web Usage Reports H.I.T.S. – How Idiots Track Success Look for trends Document activity

Web Usage Reports Visitor profiles –On campus vs. off campus, etc. –Public workstation usage Reduce noise –Automated tests –Robots
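A rough Python sketch of the two goals on this slide: counting visitors by profile and dropping robot traffic. The campus network range and the robot markers are placeholders, not values from the presentation, and `profile` is a hypothetical helper name.

```python
import ipaddress
from collections import Counter

# Hypothetical values: a documentation-only range stands in for campus networks,
# and a few common substrings stand in for a real robot list.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ROBOT_MARKERS = ("bot", "crawler", "spider")


def profile(hits):
    """Count hits as on campus or off campus, skipping robot traffic.

    hits is an iterable of (client_ip, user_agent) pairs.
    """
    counts = Counter()
    for ip, user_agent in hits:
        agent = user_agent.lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # noise: do not count automated clients
        addr = ipaddress.ip_address(ip)
        on_campus = any(addr in net for net in CAMPUS_NETWORKS)
        counts["on campus" if on_campus else "off campus"] += 1
    return counts


summary = profile([
    ("192.0.2.5", "Mozilla/4.0"),
    ("203.0.113.9", "Mozilla/4.0"),
    ("203.0.113.10", "Googlebot/2.1"),
])
```

Matching on the user-agent string only catches robots that identify themselves; automated tests would need their own marker or a known source address.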

Web Usage Reports (diagram of visitor categories: proxy, library public workstations, residence halls, computing labs, dial-up, ISU, UNI, ICPL, AOL, robots, charlotte, on campus, UI affiliate, off campus)

IP Translation At end of each month, modify log file Strip out IP address information Replace with pseudo-DNS –Fake domain names based on interest
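The monthly rewrite could look roughly like the Python sketch below. It is an illustration under assumptions, not the script used at Iowa: it assumes W3C Extended format logs with a `c-ip` column, and `anonymize_lines` and the `translate` callback are hypothetical names.

```python
def anonymize_lines(lines, translate):
    """Return log lines with the c-ip column replaced by translate(ip).

    Directive lines (starting with '#') pass through unchanged. A data line
    seen before a #Fields directive naming c-ip raises ValueError, so that
    no line is written out with its IP address intact.
    """
    ip_index = None
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            names = line[len("#Fields:"):].split()
            ip_index = names.index("c-ip") if "c-ip" in names else None
            out.append(line)
        elif line.startswith("#") or not line:
            out.append(line)
        else:
            parts = line.split(" ")
            if ip_index is None or ip_index >= len(parts):
                raise ValueError("cannot locate c-ip column in: " + line)
            parts[ip_index] = translate(parts[ip_index])
            out.append(" ".join(parts))
    return out


# Hypothetical input; the address is from a documentation-only range.
original = [
    "#Fields: date time c-ip cs-method cs-uri-stem",
    "2003-05-02 10:15:00 192.0.2.17 GET /index.htm",
]
cleaned = anonymize_lines(original, lambda ip: "anonymous.unknown.com")
```

This sketch touches only the IP column. Other identifying fields from the log, such as the username or cookie, would need the same treatment before the original file is destroyed.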

IP Translation
Instead of DNS lookup:
-> mac107.civil.northwestern.edu
-> dhcp80ff991b.dynamic.uiowa.edu
Replace with pseudo-DNS:
-> mnpub07.lib-public-uiowa.edu
* -> anonymous.itc-uiowa.edu
*.*.*.* -> anonymous.unknown.com
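One way to picture the translation table is as an ordered list of address patterns, checked from most specific to least. The Python sketch below uses the pseudo-DNS names from this slide, but the IP ranges are invented (documentation-only addresses), since the real addresses do not survive in the transcript.

```python
import ipaddress

# Hypothetical table: (CIDR pattern, pseudo-DNS name), most specific first.
TABLE = [
    ("192.0.2.7/32", "mnpub07.lib-public-uiowa.edu"),   # one named public workstation
    ("198.51.100.0/24", "anonymous.itc-uiowa.edu"),     # a whole lab subnet
]
DEFAULT_NAME = "anonymous.unknown.com"  # plays the role of the *.*.*.* catch-all


def pseudo_dns(ip, table=TABLE):
    """Return the pseudo-DNS name for an IP address, or the catch-all name."""
    addr = ipaddress.ip_address(ip)
    for cidr, name in table:
        if addr in ipaddress.ip_network(cidr):
            return name
    return DEFAULT_NAME
```

For example, `pseudo_dns("198.51.100.44")` falls in the second range and yields `anonymous.itc-uiowa.edu`, so the report shows the category of the visitor without the workstation address.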

Examples Sample report before IP translation Sample report after IP translation

Examples

IP translation table Perl script

Weaknesses Lose distinctions: .edu, .gov, .com, .mil Lose foreign country usage

Improvements #1: Pseudo-DNS based on interest –lib-public-uiowa.edu –residence-rooms-uiowa.edu #2: DNS lookup and re-anonymize –anonymous.mil –anonymous.uk
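Improvement #2 might be sketched in Python as follows: resolve the address, then keep only the top-level domain. This is an assumption about how "re-anonymize" would work, not the presenter's code; `reanonymize` and its injectable `resolve` parameter are hypothetical, and the default resolver uses the standard `socket.gethostbyaddr` call.

```python
import socket

UNKNOWN = "anonymous.unknown.com"


def reanonymize(ip, resolve=None):
    """Reverse-resolve an IP address, then discard everything but the TLD."""
    if resolve is None:
        # Default: a real reverse DNS lookup (needs network access).
        resolve = lambda address: socket.gethostbyaddr(address)[0]
    try:
        host = resolve(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return UNKNOWN
    labels = host.rstrip(".").split(".")
    tld = labels[-1].lower()
    if len(labels) < 2 or not tld.isalpha():
        return UNKNOWN  # no usable domain suffix
    return "anonymous." + tld
```

With a stub resolver that returns `mac107.civil.northwestern.edu`, the result is `anonymous.edu`; a host under `.uk` becomes `anonymous.uk`. This keeps the .edu/.gov/.com/.mil and country distinctions listed under Weaknesses while still dropping the host name.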

Questions