Presentation is loading. Please wait.

Presentation is loading. Please wait.

National Software Reference Library

Similar presentations


Presentation on theme: "National Software Reference Library"— Presentation transcript:

1 National Software Reference Library
Douglas White Information Technology Laboratory July 2004 July 9, 2004

2 Introduction The National Software Reference Library is:
A physical collection of over 5,000 software packages on secured shelves A database of file “fingerprints” (or “hashes”) and additional information to uniquely identify each file on the shelves A Reference Data Set (RDS) extracted from the database onto CD, used by law enforcement, investigators, researchers, others “sanity check” slide, what the NSRL is in general The NSRL is three items – software on shelves, information in a database, and the RDS CD. July 9, 2004

3 Use of the NSRL Eliminate as many known files as possible from the examination process using automated means Discover expected file name with unknown contents Identify origins of files Look for malicious files, e.g., hacker tools Identify duplicate files Provide rigorously verified data for forensic investigations If you find a NOTEPAD.EXE with a hash that doesn’t match the RDS hash, investigate it. If a computer has a FOOBAR.O file with a hash that matches the RDS, and the only application with FOOBAR.O is a hacker tool, you may have probable cause/intent grounds. July 9, 2004

4 How Did the NSRL Start? Law Enforcement needed software hashes that could be used in investigations and in court. Source must be unbiased - NIST is a neutral organization Data produced must be of the highest quality Data must be traceable and repeatable There must be a repository of original software NIST provides an open rigorous process This was the situation when NIST was chosen to perform the NSRL work in 2001. “no unbiased organizations” – made it very hard to argue computer forensics issues in court. If XXX organization may have a conflict of interest with YYY tool… “data traceable” – collections of hashes were not methodically gathered, many were swapped around via , touched by many hands Traceability – hash sets existed but could not be duplicated from the original sources Capabilities – some tools used MD5 but not SHA, some used SHA but not MD5, some used only CRC32… July 9, 2004

5 NSRL Software Collection
Balance of most popular (encountered often) and most desired (pirated often) Currently 32 languages, used internationally Software is purchased commercially Software is donated under non-use policy List of contents available on website We try to strike a balance of what “popular” means. We buy some of the software, we try to acquire old versions of software, and some is donated by organizations. Donated software is not installed for daily use – it is secured in the NSRL shelves. We are always glad to take donations of software. You may have older versions of software you don’t need – we will take it. We want anything we don’t have, and you can see what we have by visiting our website July 9, 2004

6 NSRL Software Database
Information to uniquely identify every file on every piece of media in every application Database schema is available on website 4,200 Bytes per application 750 Bytes per file Total database size now 20 GB for 5,000 applications with 31,900,000 files The bulk of the work in the NSRL project is spent in populating the NSRL database. The database design is open, anyone can see the schema on the website. Just to give a feel for the size, here are some numbers. It takes 4,200 bytes to record information about each app – address, OS, vendor, etc. Each file is uniquely identified by 750 bytes. July 9, 2004

7 NSRL Reference Data Set
The Reference Data Set (RDS) is a selection of information from the NSRL database Allows positive identification of manufacturer, product, operating system, version, file name from file “signature” Data format available for forensic tool developers Published quarterly, free redistribution Possible to publish critical data out of regular schedule; in February 2004 NSRL supplied 500,000 Arabic file signatures to FBI & DoD $90 a year for 4 releases – price covers duplication and mailing. Mention the free redistribution policy. Mention we give hundreds away for free, when conferences ask. Mention we have the previous quarter’s release on the website. July 9, 2004

8 RDS Field Use Concept ANALYSIS PROGRAM RDS UNKNOWN FILES FILES
Disk Drive ANALYSIS PROGRAM KNOWN FILES the RDS would be available in the field or laboratory. A third-party program is executed on the subject disk to generate file profiles and to compare these file profiles with those in the RDS. If a file profile from the subject drive matches the RDS, then that file is a known file and can be removed from further investigation. Generally, this results in 40 to 95 percent of subject files being discarded, which amounts to a significant number of saved hours in the investigation. The ones that do not match are those highlighted for further investigation as “unknown” files. The 40-95% is for commercial software. Disk copy/image made by a tool whose properties you know (CFTT) Trash can = no need to investigate. RDS July 9, 2004

9 RDS Field Use Example You are looking for sensitive facility maps on a computer which is running Windows 2000. Windows 2000 operating system software contains images which are known gifs, icons, jpeg files e.g., By using the RDS and an analysis program the investigator would not have to look at these files to complete his investigation. July 9, 2004

10 Hashes Like a person’s fingerprint
Uniquely identifies the file based on contents You can’t create the file from the hash Primary hash value used is Secure Hash Algorithm (SHA-1) specified in FIPS 180-1, a 160-bit hashing algorithm 1045 combinations of 160-bit values “Computationally infeasible” to find two different files less than 264 bits in size producing the same SHA-1 264 bits is one million terabytes [note – skip this depending on the audience] SHA-1 URL: 160 bits -> 40 char hex string 2^64 bits is one million terabytes – about 10^19 bits July 9, 2004

11 Hash Examples Filename Bytes SHA-1
NT4\ALPHA\notepad.exe F1F284D5D757039DEC1C44A05AC148B9D204E467 NT4\I386\notepad.exe C4E15A C61548A981A4AC BE37 NT4\MIPS\notepad.exe E4DBBA665E FE5E E69 NT4\PPC\notepad.exe BB7AF0E4DD565ED75DEB492D8C17B1BFD3FB23 WINNT31.WKS\I386\notepad.exe E0849CF327709FC46B705EEAB5E57380F5B1F67 WINNT31.SRV\I386\notepad.exe E0849CF327709FC46B705EEAB5E57380F5B1F67 You can see 4 different SHA-1 values for “notepad.exe” on different platforms running NT4. Compare that with the I386 “notepad.exe” that remains the same on NT31 workstation and server. I’ve done 2 small (less than 1024 bit) SHA-1 hashes by hand, and each took me 30 minutes. July 9, 2004

12 NSRL & National Archives and Records Administration
Use hashing process on non-classified Presidential materials Identify application files Identify duplicate files Access to older installed software July 9, 2004

13 NSRL & Voting Systems Needs
Determine that software used during elections is the expected software Tested, certified version is definitively identifiable Same during distribution, installation, setup, or use “Chain of custody” Transparency The NSRL methodology is in the public domain, available for inspection Jurisdictions can share knowledge with each other July 9, 2004

14 EAC & NSRL Can verify that operating system file contents have not been modified Can verify that application file contents have not been modified Can verify that known static sections of files have not been modified At 866MHz, SHA-1 of 50MB takes ~5 sec. , MD5 of 50MB takes ~4 sec. July 9, 2004

15 Voting Research Issues
Working with software companies to get access to software Distribution vs. installation hashes If there is any setup after the hashes are made, how do you know what changes are valid? Possible/practical to have on-location, time-of-certification hashing? Verification within time/ space/ security constraints Time/space/security - looking at use of Bloom filters to allow fast filtering of file signatures, given a ceiling on distribution size and need for read-only media July 9, 2004

16 Discussion Questions about the NSRL
Discussion of the NSRL and Voting Systems July 9, 2004

17 Contact Douglas White Software Diagnostics and Conformance Testing
Information Technology Laboratory Telephone: Web: [use whatever is appropriate] Douglas White has worked at the National Institute of Standards and Technology since His experience has covered distributed systems, distributed databases and telecommunication protocols. He has written programs in many areas, including real time biomonitoring, real time video processing, web site/database integration, system administration scripts and network monitoring scripts. He holds both a B.A and M.S. in computer science from Hood College. July 9, 2004


Download ppt "National Software Reference Library"

Similar presentations


Ads by Google