
1 Sensitive Information Sweep Using Cornell’s Spider Wyman Miles, Cornell University Kerry Havens, University of Colorado at Boulder Steve Lovaas, Colorado State University

2 Overview Quick Background The Technical Problem (Kerry) The Organizational Problem (Steve) Spider (Wyman) Summary & Questions

3 What is “Sensitive Information”? A Growing Concern A Moving Target SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,… Data in Context – Aggregation

4 Why Are We All Here? The Front Page! CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year. Identity theft is the fastest growing crime in the U.S. By far the biggest culprit? Lost or stolen computers.

5 Regulations, Standards, & Laws Federal – HIPAA, FERPA, SarbOx, GLB,… Identity Theft Protection Act? State – Many states passing identity theft protection laws; New York & Colorado have state CISO Industry – PCI DSS

6 The Technical Problem: Finding sensitive information in a haystack Kerry Havens University of Colorado at Boulder

7 SSN Remediation At CU-Boulder, SSNs were used as a student identifier before 2004. House Bill 03-1175 was approved in 2003, requiring institutions to change this method to ensure the privacy of a student’s social security number. CU-Boulder started issuing student IDs to new students in July 2004 and converting SSNs to SIDs in 2005.

8 Where the data is not stored File type exclusions – fine tuning –Binary files where the data cannot be read –Received input from community for fine tuning False positives –International telephone numbers –Examples for web form validation Why is the department webpage asking for SSNs?
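
The file type exclusions on this slide can be sketched as a directory walk with an extension skip list. This is a minimal illustration, not Spider's implementation: the function name `candidate_files` and the extensions in `SKIP_EXTENSIONS` are hypothetical and would need tuning for a local environment.

```python
import os

# Hypothetical skip list of binary formats where the data cannot be read as text.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".png", ".zip", ".iso"}


def candidate_files(root, skip_extensions=SKIP_EXTENSIONS):
    """Yield paths under root whose extension is not on the skip list."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so photo.JPG is skipped too.
            ext = os.path.splitext(name)[1].lower()
            if ext not in skip_extensions:
                yield os.path.join(dirpath, name)
```

Files with no extension are kept as candidates here, since a skip list only excludes what it names.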

9 OS and File Encoding Problems HTML encoding problems Representations (pictures) of sensitive data are not found –Examples include PDF Searching a UNIX filesystem –Preparing the file before searching for private data –For example, using strings to extract text from text/binary hybrids like .doc or .xls
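
The file-preparation step can be approximated in a few lines. The sketch below imitates the basic behavior of the UNIX `strings` utility by pulling runs of printable ASCII out of raw bytes; the function name `extract_strings` and the default minimum length of 4 are assumptions made for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # Printable ASCII is space (0x20) through tilde (0x7e); tabs are kept too.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

The extracted strings can then be passed to the regular expression search. Text stored in a 16-bit encoding would not be found by this ASCII-only pass.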

10 Where the data is stored Typical file types of discovered data –Gradebooks –Course web pages –Homework assignments –Travel authorization forms –Personal financial documents –Email

11 Regular Expressions Returns too much data: /\d{3}-\d{2}-\d{4}/ Searching for environment specific data in the hope that common data will lead us to more data: /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4} | (52[1-4]|65[0-3])\d{6})\b/ State specific information can be found at
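
To see why the first pattern returns too much data, the two expressions can be run side by side. This is a sketch under two assumptions: the spaces around the top-level `|` on the slide are treated as layout and dropped, and the sample strings are invented for illustration.

```python
import re

# The broad pattern from the slide: any 3-2-4 digit group separated by dashes.
BROAD = re.compile(r"\d{3}-\d{2}-\d{4}")

# The narrowed pattern from the slide, with the layout spaces removed.
NARROW = re.compile(
    r"\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b"
)

# Invented sample lines, not real data.
SAMPLES = [
    "SSN 123-45-6789",
    "ID 900-12-3456",
    "Part 12345-67-89012",
    "521123456",
]


def compare(samples):
    """Return (text, broad_hit, narrow_hit) for each sample line."""
    return [
        (text, bool(BROAD.search(text)), bool(NARROW.search(text)))
        for text in samples
    ]
```

On these samples the broad pattern flags the first three lines, including the ID and part number, while the narrowed pattern flags only the first and last: the word boundaries and first-digit restriction reject the two false positives, and the prefix alternative catches the undelimited number.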

12 Regular Expressions Let’s dissect this… /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4} | (52[1-4]|65[0-3])\d{6})\b/

13 Regular Expressions Let’s dissect this… /\b( [0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4} | (52[1-4]|65[0-3])\d{6} )\b/ Boundary

14 Regular Expressions Let’s dissect this… /\b( [0-7] \d{2}[-|\s]\d{2}[-|\s]\d{4} | (52[1-4]|65[0-3])\d{6})\b/ First acceptable digit

15 Regular Expressions Let’s dissect this… /\b([0-7] \d{2} [-|\s] \d{2} [-|\s] \d{4} | (52[1-4]|65[0-3]) \d{6} )\b/ 2, 4, or 6 digits in a row

16 Regular Expressions Let’s dissect this… /\b([0-7]\d{2} [-|\s] \d{2} [-|\s] \d{4} | (52[1-4]|65[0-3])\d{6})\b/ Delimited by dash or space

17 Regular Expressions Let’s dissect this… /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4} | (52[1-4]|65[0-3]) \d{6})\b/ Colorado specific prefix, not delimited
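
The dissection above can be collected into one annotated pattern. This sketch uses Python's `re.VERBOSE` flag so each piece carries its own comment. One deliberate deviation from the slides: inside a character class `|` is a literal pipe, so `[-|\s]` also accepts `|` as a delimiter; the sketch uses `[-\s]` instead. The name `find_candidates` is hypothetical.

```python
import re

SSN_PATTERN = re.compile(
    r"""
    \b                        # boundary: do not start inside a longer token
    (
        [0-7]\d{2}            # first acceptable digit, then two more digits
        [-\s]                 # delimited by dash or space
        \d{2}                 # two digits
        [-\s]                 # delimited by dash or space
        \d{4}                 # four digits
      |
        (?:52[1-4]|65[0-3])   # Colorado-specific prefix, not delimited
        \d{6}                 # remaining six digits
    )
    \b                        # boundary: do not end inside a longer token
    """,
    re.VERBOSE,
)


def find_candidates(text):
    """Return every substring of text that matches the annotated pattern."""
    return [m.group(1) for m in SSN_PATTERN.finditer(text)]
```

A site outside Colorado would replace the prefix alternative with values appropriate to its own environment.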

18 CU Experiences Pitfalls –Users’ interpretations of the log file –Fine tuning file extension exceptions and regular expressions Recommendations –Keep current environment in mind

19 The Organizational Problem: a really big haystack Steve Lovaas Network Security Manager Colorado State University

20 Organizational Vision Support from the top –Cabinet-level committee driving the project –Spurred by headlines and state mandates –VP for IT who really gets security Campus PR campaign –Web site –Public meetings Tied SSN purge to the rollout of a new CSUID in Fall 2006

21 Using Resources Project Constraints –Tight timeline –No budget –Not a trivial programming project Buy / Build / Leverage tools? Goal: 100% coverage vs. Best Effort Spider chosen for Windows, Linux, Mac Manual searching on AIX, mainframe

22 Ultimate Responsibility Original thought: deans / dept. heads Revised edition: individual employees Developed a personal attestation form for every employee to sign, submitted in bulk by colleges More work for central IT Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT

23 Individual Attestation Form Every employee 2 choices: –I don’t interact with SSNs in the course of my job –SSNs in all electronic files under my control have been removed or encrypted VP for IT must approve exceptions

24 CSU Experiences Pitfalls –Beta tool for a live project requires quick response and careful management of user expectations & acceptance –Careful of deadlines, it’s a lot of work! Recommendations –Don’t do this kind of project without active support from the very top –Anticipate the need for analysis/parsing tools –Have a supported encryption solution for exceptions

25 Cornell Spider Wyman Miles Sr. Security Engineer Cornell University

26 A Brief History of Spider Early 2005, scan Web for SSNs Later, scan disk images for SSNs/CCNs March 2006, debut at BU Security Camp April 2006, Educause, demand for a Windows version Version 1.0 in May, 2.0 in June

27 A Brief History, II June 2006, major feedback from Steve: bug reports, tests, feature requests Engine developed that same month: internal incident response OSX Spider Sept 2006 Windows Spider rewrite April 2007, GPL release of all Spiders

28 Current Spider SSN, SIN, CCN, NINO discovery in many file types Various data type validators Web scanning, back to its roots Scan for data in unallocated space Faster. More readable source
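
The slides do not say which validators Spider applies. As one illustration of what a data type validator can look like, the sketch below implements the Luhn checksum commonly used to screen candidate credit card numbers; treating it as representative of Spider's own checks is an assumption, and the name `luhn_valid` is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

Running a check like this on each regular expression hit discards many digit strings that merely look like card numbers, which cuts down the false positives a reviewer has to read.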

29 Various Spiders Windows Spider, aka Spider3 OSX Spider Engine, general UNIX spider LinSpider, our oldest version Spider Simple: Windows Spider preconfigured to skip noisy files

30 Future Spider Feature set convergence between Engine, OSX, Windows Community Development Possible I2 hosting of distribution and documentation More documentation! Client-Server model revisited

31 Spider Log

32 Spider at Cornell Incident response: a compromise has happened, what was at risk? Pre-emptive –Dan Elswit, CALS Security Officer

33 Spider in CIT CIT abandoned SSNs a few years ago, but they remain Tech support uses Spider Simple to discover lurking SSNs Manual process

34 Athletics Spider Simple Unique log names to network share Centralized analysis

35 Spider Downloads

36 Summary Purging sensitive information is something we’re going to have to get good at Get support from the highest levels Tune regular expressions and file/ext skip lists for your environment Anticipate parsing needs, exceptions New Spider features, more users, broader OS support Spider also for ongoing support, forensics

37 Questions? Wyman Miles: – Kerry Havens: –Kerry.Havens@Colorado.EDU Steve Lovaas: –Steven.Lovaas@ColoState.EDU The Spider users’ list: –
