
Please do not reuse these slides without prior permission from scott@scottnicholson.com. Copyright 2004 Scott Nicholson, Syracuse University School of Information Studies

1 Striking a Balance: Bibliomining and Privacy
Scott Nicholson, Assistant Professor
Syracuse University School of Information Studies
http://bibliomining.org
scott@scottnicholson.com

2 What is Bibliomining?
Bibliomining is the combination of bibliometrics and data mining, applied to the data produced during the operation of libraries (physical and digital).

3 What is Bibliomining?
• Application of advanced analysis tools to data produced by libraries
• May include:
  - Data mining
  - Bibliometrics (patterns in scholarship)
  - Online analytical processing (OLAP)
  - Other statistical techniques

4 Goals of Bibliomining
• Improved decision-making through better understanding of:
  - Patron behavior
  - Library staff behavior
  - Behavior of outside organizations
• Can provide justification for:
  - Library management policies and decisions
  - Acquisitions and ILL source selection
  - Collection development decisions
  - Use of library services (funding bodies)

5 Steps in Bibliomining
• Determine areas of focus
  - Prediction vs. description
• Determine data source needs
  - Internal and external
• Gather data
• Create data warehouse
• Select appropriate analysis tools
• Create and test models / create reports
• Analyze results

6 Creating the data warehouse
• A data warehouse is a collection of cleaned and anonymized data in a relational database, and a point for queries
• Outside of the operational systems
• Connects disparate data sources into an easily accessible database
• Can be built one time or updated on a regular basis
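A minimal sketch of the idea of a warehouse that sits outside the operational system and holds only cleaned fields. The record layout, the table name circulation_fact, and the helper names are all hypothetical and chosen for illustration; the slides do not prescribe a schema or a database product. SQLite is used here only because it ships with Python.

```python
import sqlite3

# Hypothetical operational records; every field name here is an assumption.
CIRCULATION = [
    {"patron_id": "P001", "patron_name": "A. Reader", "dept": "History",
     "item_class": "DS", "checkout_month": "2004-03"},
    {"patron_id": "P002", "patron_name": "B. Borrower", "dept": "History",
     "item_class": "DS", "checkout_month": "2004-03"},
    {"patron_id": "P003", "patron_name": "C. Scholar", "dept": "Chemistry",
     "item_class": "QD", "checkout_month": "2004-04"},
]

# Only these fields are copied; patron_id and patron_name never reach the warehouse.
WAREHOUSE_COLUMNS = ("dept", "item_class", "checkout_month")


def build_warehouse(records, conn):
    """Copy the non-identifying fields of each record into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS circulation_fact "
        "(dept TEXT, item_class TEXT, checkout_month TEXT)"
    )
    rows = [tuple(rec[col] for col in WAREHOUSE_COLUMNS) for rec in records]
    conn.executemany("INSERT INTO circulation_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def checkouts_by_dept(conn):
    """Example query against the warehouse: checkout counts per department."""
    cur = conn.execute(
        "SELECT dept, COUNT(*) FROM circulation_fact GROUP BY dept ORDER BY dept"
    )
    return cur.fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # a separate database, not the ILS
    build_warehouse(CIRCULATION, warehouse)
    print(checkouts_by_dept(warehouse))
```

Selecting columns by an explicit allow-list, rather than deleting known PII columns, means a field added later to the operational system is left out of the warehouse unless someone deliberately adds it.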

7 Steps in the Warehousing Process
• Identify fields of interest
• Determine fields that contain personally identifiable information (PII)
• Determine combinations of fields that create PII (dept. + level + gender)
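One possible way to look for combinations of fields that create PII is to count how many records share each combination of values and flag the combinations held by very few people. The sketch below assumes hypothetical field names (dept, level, gender, taken from the slide's example) and an arbitrary threshold k; the slides do not specify a threshold or a method.

```python
from collections import Counter


def small_groups(records, fields, k=5):
    """Return value combinations of `fields` shared by fewer than k records.

    k is an assumed threshold for illustration, not a value from the slides.
    """
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical patron records: five similar patrons and one distinctive one.
    patrons = [
        {"dept": "History", "level": "Undergrad", "gender": "F"} for _ in range(5)
    ] + [{"dept": "Physics", "level": "Faculty", "gender": "F"}]

    # No single value of gender is rare in this sample...
    print(small_groups(patrons, ("gender",)))
    # ...but the three fields together single out one person.
    print(small_groups(patrons, ("dept", "level", "gender")))
```

Any combination this check reports would be a candidate for discarding a field or merging categories before the data goes into the warehouse.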

8 Methods for dealing with Personally Identifiable Information
• Use codes, IDs for matching and discard

9 Codes for PII
• One typical suggestion: code the PII fields, and then record the codes in the database
  - Appropriate for other parties
• Do not use a reversible encoding procedure to encode variables. This does not protect patrons’ information from an investigation.
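A sketch of one non-reversible coding approach, under the assumption that the code only needs to link records belonging to the same patron: each patron ID is replaced by a random code, and the table that maps IDs to codes is never stored. The function name, the patron_id and patron_code field names, and the code length are hypothetical.

```python
import secrets


def pseudonymize(records, id_field="patron_id"):
    """Replace the ID field with a random code that is not derived from the ID.

    The lookup table lives only inside this call. Once it is gone, the codes
    cannot be mapped back to patrons, but records for the same patron still
    share a code and can be matched to each other.
    """
    lookup = {}  # original ID -> random code; deliberately never saved
    cleaned = []
    for rec in records:
        original = rec[id_field]
        if original not in lookup:
            lookup[original] = secrets.token_hex(8)  # 16 random hex characters
        coded = {key: value for key, value in rec.items() if key != id_field}
        coded["patron_code"] = lookup[original]
        cleaned.append(coded)
    return cleaned


if __name__ == "__main__":
    loans = [
        {"patron_id": "P1", "item": "a"},
        {"patron_id": "P2", "item": "b"},
        {"patron_id": "P1", "item": "c"},
    ]
    for row in pseudonymize(loans):
        print(row)
```

Because the codes are random rather than computed from the ID, there is no key or formula that could later be used to recover the original values. The trade-off is that data loaded in a later batch cannot be linked to earlier codes unless the lookup is retained, which would reintroduce the risk.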

10 Coding and not discarding
• Use a code when there is some aspect of the ID that is important
  - Example: IP addresses
• Think about the use of the field, and code appropriately
• Do not generate the code from the original; rather, use other methods for the code that capture key information
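One reading of the IP address example is that the only aspect worth keeping is where the request came from, so the address can be replaced by a coarse location category that cannot be traced back to a single machine. The sketch below takes that reading as an assumption. The network ranges are reserved documentation addresses standing in for a library's real ranges, and the category labels are hypothetical.

```python
import ipaddress

# Hypothetical ranges (reserved documentation networks), not real library networks.
LOCATION_NETWORKS = {
    "in_library": ipaddress.ip_network("192.0.2.0/24"),
    "on_campus": ipaddress.ip_network("198.51.100.0/24"),
}


def code_ip(address):
    """Return a coarse location label for an address; the address is not kept."""
    ip = ipaddress.ip_address(address)
    for label, network in LOCATION_NETWORKS.items():
        if ip in network:
            return label
    return "off_campus"


if __name__ == "__main__":
    for addr in ("192.0.2.17", "198.51.100.5", "203.0.113.9"):
        print(addr, "->", code_ip(addr))
```

Only the label would be stored in the warehouse. Whether three categories are enough depends on what the field is used for, which is the question the slide asks the library to consider first.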

11 Methods for dealing with Personally Identifiable Information
• Use for matching and discard

12 Dealing with categories
Make sure that combinations of categories don’t identify an individual.

13 Dealing with Textual data
• Digital reference transactions
• Easy to deal with the metadata
• Hard to deal with the text
  - Manual cleaning of PII
  - Natural Language Processing research
• Similar problem with deidentification of medical records
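To illustrate why the text is the hard part, here is a deliberately simple sketch that masks two kinds of patterned PII, email addresses and US-style phone numbers, in a reference transcript. The patterns and placeholder tokens are assumptions made for this example. Names, addresses, and other free-form details are not caught by patterns like these, which is why manual cleaning or more advanced language processing is still needed.

```python
import re

# Simplified patterns for illustration; real transcripts need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def scrub(text):
    """Mask email addresses and phone numbers; everything else is left alone."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    question = "Call me at 315-555-0123 or write to jane@example.edu."
    print(scrub(question))
```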

14 People to Involve
• Institutional Research Board (IRB)
• Legal counsel
  - Ensures you are following state laws for library data
• Library administration / Board
• Patrons
  - If there are policies, follow them
  - If there are not, create them

15 Benefits to creating the Data Warehouse
• Cleaned resource, ready for analysis
  - Outside of operational system
• Use for regular reports and research
• Forces library to examine the life of data
  - Are there backup tapes created?
  - How long are backups kept?

16 Striking a Balance
• A well-designed data warehouse strikes the balance between protecting privacy and maintaining a data-based history

17 For more information
• About bibliomining: http://bibliomining.com
• About an active data warehouse project: http://metrics.library.upenn.edu/prototype/datafarm/
• About this presentation: http://bibliomining.com/nicholson
  - “The Bibliomining process: Data warehousing and data mining for library decision-making”

