
1 Anonymizing Web Transaction Logs to Ensure Privacy and Increase Usability Paul A. Soderdahl University of Iowa Libraries ILA/ACRL Spring 2003, Dubuque, IA May 2, 2003

2 Description of the Problem Iowa Code §22.7 The following public records shall be kept confidential… 13. The records of a library which, by themselves or when examined with other public records, would reveal the identity of the library patron checking out or requesting an item or information from the library….

3 Description of the Problem University Libraries User Privacy Policy The Libraries will not reveal the identities of individual users nor reveal the information sources or services they consult unless required by law…. The Libraries from time to time may aggregate and retain user data for a reasonable period of time in order to investigate the use or value of resources and services. It will, however, neither collect nor retain information identifying individuals except during the period when and only for the purpose that such record is necessary to furnish a specific service.

4 Description of the Problem University Libraries User Privacy Policy Publicly Accessible Digital Information Systems The Libraries’ computer-based access systems (e.g., InfoHawk or various digital information systems) frequently track or "log" the actions performed by users of those systems. Transaction level logging that can be tied to individuals may be kept intact for a limited period of time for trouble-shooting and problem resolution related to system functions and service transactions. During the period this information is retained, it is held in confidence and is not shared with third parties unless required by law….

5 Description of the Problem University Libraries User Privacy Policy Publicly Accessible Digital Information Systems When the information is no longer useful, by a reasonable standard, for resolving problems, the Libraries may aggregate and retain anonymized user data in order to investigate the use or value of resources and services. Information regarding individual identities (or the source of the transaction) will be removed. Original transaction logging information that has been processed in this way will be destroyed and care taken to ensure that backups or other inadvertently stored forms of the data are not retained.

6 Description of the Problem
Web transaction log data
1. Date and time
2. Client IP address
3. Client username (if logged in)
4. HTTP method used (usually GET/POST)
5. URL requested
6. Parameters passed to URL (everything after the question mark)
7. HTTP status code
8. # of bytes server sends to client
9. # of bytes client sends to server
10. Length of transaction
11. Client software used
12. Any cookie client passed to server
13. URL of referring page

7 Sample Entry
Server: Library Explorer
2002-07-01 15:21:17 129.105.86.107 - GET /ch1/subjectsearch/p_medicine.htm - 200 22726 495 16 explorer.lib.uiowa.edu Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0;+.NET+CLR+1.0.3705) - http://www.google.com/search?q=dictionary+stand&hl=en&lr=&ie=UTF-8&start=10&sa=N
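As a rough illustration of how the sample entry maps onto the field list from slide 6, the following Python sketch splits the entry on whitespace and names each field. The field names are invented for this example, and the `host` column is an assumption read off the sample itself, since it is not in the 13-item list. The sample string has the line-wrap breaks in the user agent and referrer joined back together.

```python
from typing import Dict

# Illustrative field names; the order follows the sample entry on the slide.
# "host" is assumed from the sample and is not in the slide's 13-item list.
FIELDS = [
    "date", "time", "client_ip", "username", "method", "url", "query",
    "status", "bytes_sent", "bytes_received", "time_taken", "host",
    "user_agent", "cookie", "referrer",
]


def parse_entry(line: str) -> Dict[str, str]:
    """Split one whitespace-delimited log entry into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


SAMPLE = (
    "2002-07-01 15:21:17 129.105.86.107 - GET "
    "/ch1/subjectsearch/p_medicine.htm - 200 22726 495 16 "
    "explorer.lib.uiowa.edu "
    "Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0;+.NET+CLR+1.0.3705) "
    "- http://www.google.com/search?q=dictionary+stand&hl=en&lr=&ie=UTF-8&start=10&sa=N"
)

entry = parse_entry(SAMPLE)
print(entry["client_ip"], entry["status"], entry["referrer"])
```

A hyphen in a field (username, query, cookie) means the value was empty for that request. Not every server logs the same columns; the Intranet sample, for instance, has no host column, so `FIELDS` would need adjusting per server.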

8 Sample Entry
Server: PURL
2002-05-21 18:26:21 128.255.153.27 - GET /wiley/BioEssays - 304 140 508 516 purl.lib.uiowa.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+98) - http://www.lib.uiowa.edu/hardin-www/ejrnl.html

9 Sample Entry
Server: Intranet
2002-02-13 19:50:57 128.255.52.248 soderdhl GET /infohawk/Staffdirectory.htm - 200 105590 430 109 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) - http://intranet.lib.uiowa.edu/infohawk/

10 Let’s Get Real
How much can you tell from 128.255.52.248?
So what if a would-be terrorist looked at our overdue fines?
Are web transaction logs that sexy?

11 Web Usage Reports Does the individual workstation information provide any valuable data? What do we really want to know about our web server usage?

12 Web Usage Reports
H.I.T.S. – How Idiots Track Success
Look for trends
Document activity

13 Web Usage Reports
Visitor profiles
– On campus vs. off campus, etc.
– Public workstation usage
Reduce noise
– Automated tests
– Robots

14 Web Usage Reports
[Diagram of visitor categories. Labels: proxy, library public workstations, residence halls, computing labs, dial-up, ISU, UNI, ICPL, AOL, robots, charlotte; groupings: on campus, UI affiliate, off campus]

15 IP Translation
At end of each month, modify log file
Strip out IP address information
Replace with pseudo-DNS
– Fake domain names based on interest
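The monthly rewrite described on this slide could look something like the Python sketch below. It is not the presenter's Perl script; it only illustrates the idea of replacing the client IP field in each entry with whatever a translation function returns. The field position (`IP_FIELD`), the handling of `#` header lines, and the function names are assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator

# Assumed position of the client IP: third space-delimited field,
# as in the sample entries (date, time, client IP, ...).
IP_FIELD = 2


def anonymize_lines(lines: Iterable[str],
                    translate: Callable[[str], str]) -> Iterator[str]:
    """Yield each log line with its client IP replaced by translate(ip)."""
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split(" ")
        # Pass through header lines (if the log format has them) and lines
        # too short to contain an IP field.
        if line.startswith("#") or len(parts) <= IP_FIELD:
            yield line
            continue
        parts[IP_FIELD] = translate(parts[IP_FIELD])
        yield " ".join(parts)


# Usage with a trivial translator that hides every address.
log = ["2002-05-21 18:26:21 128.255.153.27 - GET /wiley/BioEssays - 304 140 508 516"]
out = list(anonymize_lines(log, lambda ip: "anonymous.unknown.com"))
print(out[0])
```

In practice the output would be written to a new file and the original log destroyed, in line with the privacy policy quoted earlier.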

16 IP Translation
Instead of DNS lookup:
129.105.86.107 → mac107.civil.northwestern.edu
128.255.153.27 → dhcp80ff991b.dynamic.uiowa.edu
Replace with pseudo-DNS:
128.255.53.89 → mnpub07.lib-public-uiowa.edu
128.255.104.* → anonymous.itc-uiowa.edu
*.*.*.* → anonymous.unknown.com
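One way to picture the pseudo-DNS lookup is as an ordered table of IP patterns, checked from most to least specific. The Python sketch below uses only the three rows shown on the slide; the real translation table is not reproduced here, and the function name and the use of shell-style wildcard matching are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Ordered from most to least specific; the first matching pattern wins.
# These three rows come from the slide; a real table would be much longer.
TRANSLATION_TABLE = [
    ("128.255.53.89", "mnpub07.lib-public-uiowa.edu"),
    ("128.255.104.*", "anonymous.itc-uiowa.edu"),
    ("*.*.*.*", "anonymous.unknown.com"),
]


def pseudo_dns(ip: str) -> str:
    """Return the pseudo-domain name for an IP address."""
    for pattern, name in TRANSLATION_TABLE:
        if fnmatchcase(ip, pattern):
            return name
    # Only reached if the input is not a dotted address at all.
    return "anonymous.unknown.com"


print(pseudo_dns("128.255.53.89"))
print(pseudo_dns("128.255.104.17"))
print(pseudo_dns("129.105.86.107"))
```

Because the catch-all row comes last, a known public workstation keeps a recognizable name while every unlisted address collapses into a single anonymous bucket.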

17 Examples
Sample report before IP translation
Sample report after IP translation

18 Examples


20 IP translation table Perl script

21 Weaknesses
Lose distinctions: .edu, .gov, .com, .mil
Lose foreign country usage

22 Improvements
#1: Pseudo-DNS based on interest
– lib-public-uiowa.edu
– residence-rooms-uiowa.edu
#2: DNS lookup and re-anonymize
– anonymous.mil
– anonymous.uk
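Improvement #2 might be sketched as follows: once a real hostname has been obtained for an address (for example via Python's `socket.gethostbyaddr`, not called here so the example stays offline), keep only its top-level domain and discard everything else. The function name and the `anonymous.unknown` fallback for unresolved addresses are hypothetical choices for this example, not taken from the presentation.

```python
def reanonymize(hostname: str) -> str:
    """Reduce a resolved hostname to 'anonymous.<top-level domain>'."""
    labels = hostname.rstrip(".").lower().split(".")
    tld = labels[-1]
    # A single label, or a numeric last label, means we do not have a
    # usable domain name (e.g. the lookup failed and this is still an IP).
    if len(labels) < 2 or tld.isdigit():
        return "anonymous.unknown"  # hypothetical fallback name
    return f"anonymous.{tld}"


print(reanonymize("mac107.civil.northwestern.edu"))
print(reanonymize("host.example.co.uk"))
print(reanonymize("129.105.86.107"))
```

This keeps the .edu/.gov/.com/.mil and country-level distinctions listed under Weaknesses while still dropping the part of the name that identifies a machine.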

23 Questions paul-soderdahl@uiowa.edu

