Tools for identifying duplicate files and known software files


Tools for identifying duplicate files and known software files
Creighton Barrett
Digital Archivist, Dalhousie University Archives
BitCurator User Forum, Northwestern University
April 27-28, 2017

Tools
- FSlint utility (included in BitCurator environment)
- Forensic Toolkit (FTK) “Flag Duplicates” processing option
- FTK Known File Filter (KFF)
- National Software Reference Library datasets (e.g., Reference Data Set)

2017 BitCurator User Forum - Tools for identifying duplicate files and known software files

FSlint (finds file system “lint”)
- Duplicates
- Installed packages
- Bad names
- Name clashes
- Temp files
- Bad symlinks
- Bad IDs
- Empty directories
- Non stripped binaries
- Redundant whitespace

FSlint – Duplicates
Multiple scans to maximize efficiency and minimize false positives:
- Filters out files of different sizes
- Checks to ensure files are not hard-linked
- Checks first 4k bytes and filters out files with unique MD5
- Checks entire file and filters out files with unique MD5
- Checks remaining files and filters out files with unique SHA-1 (to avoid MD5 collisions)
Image source (BitCurator wiki): https://wiki.bitcurator.net/index.php?title=Identify_and_delete_duplicate_files
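The staged filtering described on this slide can be sketched in Python. This is an illustration of the general technique only, not FSlint's actual code; the function names (`find_duplicates`, `_digest`, `_regroup`) are hypothetical, and keeping one path per hard-link set is an assumption about how "not hard-linked" is handled.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path, algorithm, limit=None):
    """Hash a file; if limit is set, hash only the first `limit` bytes."""
    h = hashlib.new(algorithm)
    remaining = limit
    with open(path, "rb") as f:
        while True:
            size = 65536 if remaining is None else min(65536, remaining)
            if size == 0:
                break
            chunk = f.read(size)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def _regroup(groups, key):
    """Split each candidate group by key(path); drop files left on their own."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(g for g in buckets.values() if len(g) > 1)
    return out


def find_duplicates(root):
    """Return groups of paths under root whose contents appear identical."""
    paths, seen = [], set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            ident = (st.st_dev, st.st_ino)  # hard links share device + inode
            if ident in seen:
                continue  # assumption: keep one path per hard-link set
            seen.add(ident)
            paths.append(path)
    groups = _regroup([paths], os.path.getsize)                          # 1. size
    groups = _regroup(groups, lambda p: _digest(p, "md5", limit=4096))   # 2. first 4k
    groups = _regroup(groups, lambda p: _digest(p, "md5"))               # 3. full MD5
    groups = _regroup(groups, lambda p: _digest(p, "sha1"))              # 4. full SHA-1
    return [sorted(g) for g in groups]
```

Each stage only re-examines files that survived the previous one, which is why the cheap checks (size, first 4k bytes) come before the full-file hashes.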

FTK – Flag Duplicates
Simpler process than FSlint, still a powerful feature:
- Checks entire file and generates MD5
- Assigns primary status to first instance of each MD5
- Assigns secondary status to subsequent instances of each MD5
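The primary/secondary assignment can be illustrated with a short Python sketch. It is not FTK's implementation: the function name `flag_duplicates` is hypothetical, and "first instance" is assumed here to mean first in input order, which the slide does not specify.

```python
def flag_duplicates(records):
    """Label each item "Primary" or "Secondary" by MD5.

    records: iterable of (identifier, md5_hex) pairs.
    Assumption: the first pair seen for a given MD5 is the primary copy.
    """
    seen = set()
    status = {}
    for identifier, md5 in records:
        key = md5.lower()
        if key in seen:
            status[identifier] = "Secondary"
        else:
            seen.add(key)
            status[identifier] = "Primary"
    return status


# Illustrative input: two paths share one MD5, a third is unique.
example = flag_duplicates([
    ("laptop1/report.doc", "9e107d9d372bb6826bd81d3542a419d6"),
    ("backup/report.doc", "9E107D9D372BB6826BD81D3542A419D6"),
    ("laptop1/notes.txt", "e4d909c290d0fb1ca068ffaddf22cbd0"),
])
```

Because the choice of primary depends on processing order, which copy is treated as primary is arbitrary unless the input is ordered deliberately.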

[Slide image: FTK – Flag Duplicates]

NSRL Reference Data Set (RDS)
Image source (NSRL): https://www.nsrl.nist.gov/Documents/Data-Formats-of-the-NSRL-Reference-Data-Set-16.pdf

NSRL Reference Data Set (RDS)
- Hashsets and metadata used in file identification
- Data can be used in third-party digital forensics tools
- RDS is updated four times each year
- As of v2.55, RDS is partitioned into four divisions:
  - Modern – applications created in or after 2000
  - Legacy – applications created in or before 1999
  - Android – Mobile apps for the Android OS
  - iOS – Mobile apps for iOS
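As a rough illustration of how an RDS hashset might be loaded for lookups, the sketch below reads MD5 values from a comma-separated file. It assumes the RDS 2.x `NSRLFile.txt` layout, with a quoted header row that includes an `MD5` column; check the NSRL data formats document for the release in use. The function name `load_rds_md5s` is hypothetical, and the sample row is made up for illustration (the hashes are those of an empty file; the product and OS codes are placeholders).

```python
import csv
import io


def load_rds_md5s(text_file):
    """Collect the MD5 column of an RDS-style CSV into an upper-case set.

    Assumption: the first row is a header containing an "MD5" field.
    """
    reader = csv.DictReader(text_file)
    return {row["MD5"].upper() for row in reader if row.get("MD5")}


# Illustrative two-line file, not real NSRL data.
SAMPLE = io.StringIO(
    '"SHA-1","MD5","CRC32","FileName","FileSize",'
    '"ProductCode","OpSystemCode","SpecialCode"\n'
    '"DA39A3EE5E6B4B0D3255BFEF95601890AFD80709",'
    '"D41D8CD98F00B204E9800998ECF8427E","00000000",'
    '"empty.txt",0,1,"WIN",""\n'
)
known = load_rds_md5s(SAMPLE)
```

For a real RDS file, pass an open text file instead of `SAMPLE`. A full RDS holds tens of millions of rows, so a plain in-memory set may be impractical on small machines.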

FTK – Known File Filter (KFF)
- KFF data – hash values of known files that are compared against files in an FTK case
- KFF data can come from pre-configured libraries (e.g., NSRL RDS, DHS, ICE) or custom libraries
- FTK ships with a version of NSRL RDS bifurcated into “Ignore” and “Alert” libraries
- KFF Server – used to process KFF data against evidence in an FTK case
- KFF Import Utility – used to import and index KFF data

FTK – Known File Filter (KFF)
- Used to eliminate known software and OS files in a forensic case OR highlight presence of “alert” files
- Used to “profile” someone before getting into case analysis (e.g., can help detect the usage of forensics programs such as drive cleaners)
- Custom libraries can be used to find known files (especially in IP theft cases)
- Custom libraries could be used to flag duplicates across multiple acquisitions from same donor
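The ignore/alert idea can be sketched as a simple set lookup. This is not how the KFF Server works internally; the names `kff_status` and `filter_case` are hypothetical, the "No match" label is invented for the sketch, and letting "Alert" take precedence over "Ignore" is an assumption.

```python
def kff_status(md5, ignore_hashes, alert_hashes):
    """Classify one MD5 against two hash libraries (sets of upper-case hex)."""
    key = md5.upper()
    if key in alert_hashes:  # assumption: alert wins if a hash is in both
        return "Alert"
    if key in ignore_hashes:
        return "Ignore"
    return "No match"        # hypothetical label for unmatched files


def filter_case(files, ignore_hashes, alert_hashes):
    """Split (name, md5) pairs into files to review and files that raise alerts."""
    to_review, alerts = [], []
    for name, md5 in files:
        status = kff_status(md5, ignore_hashes, alert_hashes)
        if status == "Alert":
            alerts.append(name)
        if status != "Ignore":
            to_review.append(name)
    return to_review, alerts


# Illustrative libraries and case files; the hash values are placeholders.
ignore = {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}
alert = {"BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"}
case = [
    ("notepad.exe", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"),
    ("wiper.exe", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"),
    ("letter.doc", "cccccccccccccccccccccccccccccccc"),
]
review, flagged = filter_case(case, ignore, alert)
```

A custom library built from a donor's earlier acquisitions could be passed as `ignore_hashes` in the same way, so that files already held are set aside in later accessions.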

Other tools to work with NSRL RDS
- nsrlsvr - https://github.com/rjhansen/nsrlsvr/
  - Keeps track of 40+ million hash values in an in-memory dataset to facilitate fast user queries
  - Supports custom libraries (“local corpus”)
- nsrllookup - https://rjhansen.github.io/nsrllookup/
  - Command-line application
  - Works with tools like hashdeep: http://md5deep.sourceforge.net/
- National Software Reference Library - MD5/SHA1/File Name search - http://nsrl.hashsets.com

Bill Freedman fonds filtered in FTK

| Filter | Description | # of files | Size |
|---|---|---|---|
| Unfiltered | All files in case | 26,651,084 | 3,568 GB |
| Primary status | Duplicate File indicator IS “Primary” | 731,417 | 83.48 GB |
| Secondary status | Duplicate File indicator IS “Secondary” | 16,569,218 | 271.5 GB |
| KFF Ignore | Match all files where KFF status IS “Ignore” | 2,548,119 | 44.29 GB |
| No KFF Ignore | Match all files where KFF status IS NOT “Ignore” + KFF status IS “Not checked” | 24,102,965 | 3,524 GB |
| Primary status + No KFF Ignore | Match all files where duplicate file indicator IS “Primary” + KFF status IS NOT “Ignore” | 626,351 | 71.95 GB |
| Actual files + Primary status + No KFF Ignore | Match all disk-bound files where duplicate file indicator IS “Primary” + KFF status IS NOT “Ignore” | 103,412 | 61.81 GB |

Bill Freedman fonds:
- Acquired in 2016
- Hybrid collection
- 18 boxes of textual records
- 5000+ original slides
- Five laptops
- One desktop
- Two external hard drives…
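To put the table in proportion, the short calculation below expresses each filter's result as a share of the unfiltered case. The figures are copied from the table; the variable and function names are hypothetical.

```python
# (filter name, number of files, size in GB), taken from the table above
ROWS = [
    ("Unfiltered", 26_651_084, 3568.0),
    ("Primary status", 731_417, 83.48),
    ("Secondary status", 16_569_218, 271.5),
    ("KFF Ignore", 2_548_119, 44.29),
    ("No KFF Ignore", 24_102_965, 3524.0),
    ("Primary status + No KFF Ignore", 626_351, 71.95),
    ("Actual files + Primary status + No KFF Ignore", 103_412, 61.81),
]


def shares(rows):
    """Return {filter: (percent of files, percent of size)} relative to row 0."""
    _, total_files, total_gb = rows[0]
    return {
        name: (100.0 * files / total_files, 100.0 * gb / total_gb)
        for name, files, gb in rows
    }


result = shares(ROWS)
for name, (pct_files, pct_size) in result.items():
    print(f"{name}: {pct_files:.2f}% of files, {pct_size:.2f}% of size")
```

On these figures, the most restrictive filter leaves well under 1% of the files in the case and under 2% of its size.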

Questions
- Does it matter which duplicate file is selected for preservation?
- What if there are MD5 matches with different file names or extensions?
- Can queries against NSRL RDS be incorporated into BitCurator workflows?
- Could provenance-based libraries of “known file” hashes be incorporated into BitCurator workflows?
- Can repositories share provenance-based hash libraries (expose our “local corpus” of MD5s…)?