Digital Forensics Dr. Bhavani Thuraisingham The University of Texas at Dallas Evidence Correlation November 4, 2008.

Papers to discuss
- Forensic feature extraction and cross-drive analysis
- A correlation method for establishing provenance of timestamps in digital evidence
- md5bloom: Forensic file system hashing revisited (OPTIONAL)
- Identifying almost identical files using context triggered piecewise hashing (OPTIONAL)

Abstract of Paper 1

This paper introduces Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA), two new approaches for analyzing large data sets of disk images and other forensic data. FFE uses a variety of lexigraphic techniques for extracting information from bulk data; CDA uses statistical techniques for correlating this information within a single disk image and across multiple disk images. An architecture for these techniques is presented that consists of five discrete steps: imaging, feature extraction, first-order cross-drive analysis, cross-drive correlation, and report generation. CDA was used to analyze 750 images of drives acquired on the secondary market; it automatically identified drives containing a high concentration of confidential financial records as well as clusters of drives that came from the same organization. FFE and CDA are promising techniques for prioritizing work and automatically identifying members of social networks under investigation. The authors believe the techniques are likely to have other uses as well.

Outline
- Introduction
- Forensic feature extraction
- Single-drive analysis
- Cross-drive analysis
- Implementation
- Directions

Introduction: Why?
- Improper prioritization. In these days of cheap storage and fast computers, the critical resource to be optimized is the attention of the examiner or analyst. Today, work is not prioritized based on the information that the drive contains.
- Lost opportunities for data correlation. Because each drive is examined independently, there is no opportunity to automatically "connect the dots" in a large case involving multiple storage devices.
- Improper emphasis on document recovery. Because today's forensic tools are based on document recovery, they have taught examiners, analysts, and customers to be primarily concerned with obtaining documents.

Feature extraction
- An email address extractor, which can recognize RFC822-style addresses.
- An email Message-ID extractor.
- An email Subject: extractor.
- A date extractor, which can extract date and time stamps in a variety of formats.
- A cookie extractor, which can identify cookies from the Set-Cookie: header in web page cache files.
- A US Social Security number extractor, which identifies the patterns ###-##-#### and ######### when preceded by the letters SSN and an optional colon.
- A credit card number extractor.
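
A minimal sketch of such extractors, assuming simplified regular expressions; the pattern set and function name here are illustrative, and the paper's actual scanner is far more elaborate:

```python
import re

# Hypothetical, simplified patterns echoing the extractors listed above.
PATTERNS = {
    # RFC822-style email address (simplified)
    "email": re.compile(rb"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    # US SSN: ###-##-#### preceded by "SSN" and an optional colon
    "ssn": re.compile(rb"SSN:?\s*(\d{3}-\d{2}-\d{4})"),
    # Date header as it appears in email/HTTP traffic (one of many formats)
    "date": re.compile(rb"Date:\s*([^\r\n]+)"),
}

def extract_features(bulk_data: bytes):
    """Scan raw bulk data and yield (feature_name, matched_bytes) pairs."""
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(bulk_data):
            # Report the capture group if the pattern has one, else the match
            yield name, m.group(1) if pattern.groups else m.group(0)
```

Because the patterns operate on bytes, the same extractors can be run over raw disk images without first decoding them.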

Single-drive analysis
- Extracted features can be used to speed initial analysis and answer specific questions about a drive image.
- The authors have successfully used extracted features for drive image attribution and to build a tool that scans disks to report the likely existence of information that should have been destroyed under the Fair and Accurate Credit Transactions Act.
- Drive attribution: an analyst might encounter a hard drive and wish to determine to whom that drive previously belonged. For example, the drive might have been purchased on eBay and the analyst might be attempting to return it to its previous owner.
- A powerful technique for making this determination is to create a histogram of the addresses on the drive (as returned by the address feature extractor).
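
The histogram idea can be sketched in a few lines; the function name and input format are illustrative:

```python
from collections import Counter

def address_histogram(addresses):
    """Rank extracted addresses by frequency. In the attribution scenario
    above, the most frequent addresses on a drive are strong candidates
    for the drive's previous owner."""
    return Counter(addresses).most_common()
```

For example, `address_histogram(["owner@x.com", "owner@x.com", "friend@y.com"])` puts `("owner@x.com", 2)` first.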

Cross-drive analysis (CDA)
- Cross-drive analysis is the term the authors coined to describe forensic analysis of a data set that spans multiple drives.
- The fundamental theory of cross-drive analysis is that data gleaned from multiple drives can improve the forensic analysis of a drive in question, both when the multiple drives are related to the drive in question and when they are not.
- Two forms of CDA: first order, in which the results of a feature extractor are compared across multiple drives, an O(n) operation; and second order, in which the results are correlated, an O(n^2) operation.
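
A toy sketch of the two forms, assuming each drive's extracted features are held as a set; the names are illustrative, not the paper's implementation:

```python
from itertools import combinations

def first_order_cda(drive_features, predicate):
    # O(n): test each drive's feature set against a predicate independently
    return [d for d, feats in drive_features.items() if predicate(feats)]

def second_order_cda(drive_features):
    # O(n^2): score every pair of drives by the features they have in common
    return {(a, b): len(drive_features[a] & drive_features[b])
            for a, b in combinations(drive_features, 2)}
```

A high pairwise score suggests drives that came from the same organization or user, which is how the paper clusters related drives.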

Implementation
1. Collected disks are imaged into a single AFF file. (AFF is the Advanced Forensic Format, a file format for disk images that contains all of the data accession information, such as the drive's manufacturer and serial number, as well as the disk contents.)
2. The afxml program is used to extract drive metadata from the AFF file and build an entry in the SQL database.
3. Strings are extracted with an AFF-aware program in three passes: one for 8-bit characters, one for 16-bit characters in lsb format, and one for 16-bit characters in msb format.
4. Feature extractors run over the string files and write their results to feature files.
5. Extracted features from newly ingested drives are run against a watch list; hits are reported to the human operator.
6. The feature files are read by indexers, which build indexes in the SQL server of the identified features.

Implementation (continued)
7. A multi-drive correlation is run to see if the newly accessioned drive contains features in common with any drives on a drive watch list.
8. A user interface allows multiple analysts to simultaneously interact with the database, schedule new correlations to be run in batch mode, or view individual sectors or recovered files from the drive images stored on the file server.
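
The two 16-bit passes of step 3 can be sketched as follows; this is an illustrative simplification, not the AFF-aware tool itself:

```python
import re

def extract_strings_16bit(raw: bytes, order: str = "little", min_len: int = 4):
    """One of the 16-bit string-extraction passes described above: decode
    the buffer as 16-bit characters in the given byte order ('little' for
    lsb, 'big' for msb) and keep runs of printable ASCII characters."""
    codec = "utf-16-le" if order == "little" else "utf-16-be"
    text = raw.decode(codec, errors="replace")
    return re.findall(r"[\x20-\x7e]{%d,}" % min_len, text)
```

Running the same buffer through both byte orders catches strings regardless of how the original system stored its wide characters.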

Directions
- Improve feature extraction
- Improve the algorithms
- Develop end-to-end systems

Abstract of Paper 2

Establishing the time at which a particular event happened is a fundamental concern when relating cause and effect in any forensic investigation. Reliance on computer-generated timestamps for correlating events is complicated by uncertainty as to clock skew and drift, environmental factors such as location and local time zone offsets, as well as human factors such as clock tampering. Establishing that a particular computer's temporal behavior was consistent during its operation remains a challenge. The contributions of this paper are both a description of assumptions commonly made regarding the behavior of clocks in computers, and empirical results demonstrating that real-world behavior diverges from the idealized or assumed behavior. The authors present an approach for inferring the temporal behavior of a particular computer over a range of time by correlating commonly available local machine timestamps with another source of timestamps. They show that a general characterization of the passage of time may be inferred from an analysis of commonly available browser records.

Outline
- Introduction
- Factors to consider
- Drifting clocks
- Identifying computer timescales by correlation with corroborating sources
- Directions

Introduction
- Timestamps are increasingly used to relate events which happen in the digital realm to each other and to events which happen in the physical realm, helping to establish cause and effect.
- A difficulty with timestamps is how to interpret and relate the timestamps generated by separate computer clocks when they are not synchronized.
- Current approaches to inferring the real-world interpretation of timestamps assume idealized models of computer clocks.
- There is uncertainty about the behavior of a suspect computer's clock before seizure.
- The authors explore two themes related to this uncertainty:
  - They investigate whether it is reasonable to assume uniform behavior of computer clocks over time, and test these assumptions by attempting to characterize how computer clocks behave in the wild.
  - They investigate the feasibility of automatically identifying the local time on a computer by correlating timestamps embedded in digital evidence with corroborative time sources.

Factors
- Computer timekeeping
- Real-time synchronization
- Factors affecting timekeeping accuracy
  - Clock configuration
  - Tampering
  - Synchronization protocol
  - Misinterpretation
- Usage of timestamps in forensics

Drifting clock behavior
- Enumerate the main factors influencing the temporal behavior of a computer's clock, and then attempt to experimentally validate whether one can make informed assumptions about such behavior.
- The authors do this by empirically studying the temporal behavior of a network of computers found in the wild.
- The subject of the case study is a network of machines in active use by a small business. The network is a Windows 2000 domain, consisting of one Windows 2000 server and a mix of Windows XP and 2000 workstations.
- The goal is to observe the temporal behavior. To observe it, the authors constructed a simple service that logs both the system time of a host computer and the civil time for the location.
- The program samples both sources of time and logs the results to a file. The logging program was deployed on all workstations and the server.
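
A minimal sketch of such a logging service, assuming the reference civil time is supplied by a trusted external source (here passed in as a parameter rather than queried over NTP); the function name and log format are illustrative:

```python
from datetime import datetime, timezone

def log_clock_sample(logfile: str, reference_time_utc: datetime) -> float:
    """Append one (system time, reference time) sample to the log and
    return the observed skew in seconds (positive = system clock ahead)."""
    system_now = datetime.now(timezone.utc)
    skew = (system_now - reference_time_utc).total_seconds()
    with open(logfile, "a") as f:
        f.write(f"{system_now.isoformat()}\t"
                f"{reference_time_utc.isoformat()}\t{skew:+.3f}\n")
    return skew
```

Sampling at regular intervals yields a time series of skew values from which drift (the rate of change of skew) can be estimated.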

Correlation
- An automated approach which correlates time-stamped events found on a suspect computer with time-stamped events from a more reliable, corroborating source.
- Web browser records are increasingly employed as evidence in investigations, and are a rich source of time-stamped data.
- The techniques implemented are the click-stream correlation algorithm and the non-cached correlation algorithm.
- The authors compare the results of both algorithms.
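
One simple way to realize the correlation idea (not the paper's click-stream or non-cached algorithms, which are more involved) is to estimate the suspect clock's offset as the median difference between paired local and corroborating timestamps:

```python
import statistics

def estimate_clock_offset(local_ts, reference_ts):
    """Estimate the suspect clock's offset (in the same units as the
    inputs) as the median pairwise difference between matched events;
    the median is robust to a few mismatched or tampered pairs."""
    diffs = [local - ref for local, ref in zip(local_ts, reference_ts)]
    return statistics.median(diffs)
```

For browser evidence, the local timestamps would come from cache entries and the reference timestamps from server-supplied Date headers for the same requests.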

Directions
- Need to determine whether the conditions and assumptions of the experiments are realistic
- What are the most appropriate correlation algorithms?
- Need to integrate with clock synchronization algorithms

Abstract of Paper 3 (OPTIONAL)

Hashing is a fundamental tool in digital forensic analysis, used both to ensure data integrity and to efficiently identify known data objects. The authors' objective is to leverage advanced hashing techniques in order to improve the efficiency and scalability of digital forensic analysis. They explore the use of Bloom filters as a means to efficiently aggregate and search hashing information. They present md5bloom, a Bloom filter manipulation tool that can be incorporated into forensic practice, along with example uses and experimental results.

Outline
- Introduction
- Bloom filters
- Applications
- Directions

Introduction
- The goal is to pick from a set of forensic images the one(s) that are most like (or perhaps most unlike) a particular target.
- This problem comes up in a number of different variations, such as comparing the target with previous/related cases, or determining the relationships among targets in a larger investigation.
- The goal is to get a high-level picture that will guide the following in-depth inquiry.
- Already-existing problems of scale in digital forensic tools are further multiplied by the number of targets, which explains the fact that in other forensic areas comparison with other cases is routine and massive, whereas in digital forensics it is the exception.
- An example is object versioning detection: the need to detect a particular version of an object, not just the target object.

Introduction (continued)
- One need is a way to store a set of hashes representing the different components of a composite object, as opposed to a single hash.
- For example, hashing the individual routines of libraries or executables would enable fine-grained detection of changes (e.g., only a fraction of the code changes from version to version).
- The problem is that storing more hashes presents a scalability problem even for targets of modest sizes.
- Therefore, the authors propose the use of Bloom filters as an efficient way to store and query large sets of hashes.

Bloom filters
- A Bloom filter B is a representation of a set S = {s1, ..., sn} of n elements from a universe (of possible values) U. The filter consists of an array of m bits, initially all set to 0.
- The ratio r = m/n is a key design element and is usually fixed for a particular application.
- To represent the set elements, the filter uses k independent hash functions h1, ..., hk, with range {0, ..., m-1}. All hash functions are assumed to be independent and to map elements from U uniformly over the range of the function.
- md5bloom: the authors have a prototype stream-oriented Bloom filter implementation called md5bloom.
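
The definition above can be sketched directly. MD5-derived bit positions are used here to echo md5bloom, though any uniform hash family works; the class and its interface are illustrative:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter matching the definition above: an array of m
    bits and k hash positions per element, derived here from salted MD5
    digests."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits, initially all 0

    def _positions(self, item: bytes):
        # Derive k pseudo-independent positions in {0, ..., m-1}
        for i in range(self.k):
            digest = hashlib.md5(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        # No false negatives; false positives occur with small probability
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

With r = m/n and k chosen appropriately, membership queries never produce false negatives and false positives occur only with a small, tunable probability, which is what makes the filter a compact substitute for storing every hash.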

Applications of Bloom filters in security
- Spafford (1992) was one of the first to use Bloom filters to support computer security.
- The OPUS system by Spafford uses a Bloom filter that efficiently encodes a wordlist containing poor password choices to help users choose strong passwords.
- Bellovin and Cheswick present a scheme for selectively sharing data while maintaining privacy. Through the use of encrypted Bloom filters, they allow parties to perform searches against each other's document sets without revealing the specific details of the queries. The system supports query restrictions to limit the set of allowed queries.
- Aguilera et al. discuss the use of Bloom filters to enhance security in a network-attached disks (NADs) infrastructure.
- The authors use Bloom filters to detect hash tampering.

Directions
- Cryptography is a key application for detecting evidence tampering
- Bloom filters are one approach for detecting hash tampering
- Need to compare different cryptographic algorithms
- Relation to correlation needs to be determined

Abstract of Paper 4 (OPTIONAL)

Homologous files share identical sets of bits in the same order. Because such files are not completely identical, traditional techniques such as cryptographic hashing cannot be used to identify them. This paper introduces a new technique for constructing hash signatures by combining a number of traditional hashes whose boundaries are determined by the context of the input. These signatures can be used to identify modified versions of known files even if data has been inserted, modified, or deleted in the new files. The description of this method is followed by a brief analysis of its performance and some sample applications to computer forensics.

Outline
- Introduction
- Piecewise hashing
- The spamsum algorithm
- Directions

Introduction
- This paper describes a method for using a context-triggered rolling hash in combination with a traditional hashing algorithm to identify known files that have had data inserted, modified, or deleted.
- First, the authors examine how cryptographic hashes are currently used by forensic examiners to identify known files and what weaknesses exist with such hashes.
- Next, the concept of piecewise hashing is introduced.
- Finally, a rolling hash algorithm that produces a pseudo-random output based only on the current context of an input is described.
- By using the rolling hash to set the boundaries for the traditional piecewise hashes, the authors create a Context Triggered Piecewise Hash (CTPH).

Piecewise hashing
- Piecewise hashing uses an arbitrary hashing algorithm to create many checksums for a file instead of just one. Rather than generating a single hash for the entire file, a hash is generated for many discrete fixed-size segments of the file. For example, one hash is generated for the first 512 bytes of input, another hash for the next 512 bytes, and so on.
- A rolling hash algorithm produces a pseudo-random value based only on the current context of the input. The rolling hash works by maintaining a state based solely on the last few bytes from the input. Each byte is added to the state as it is processed and removed from the state after a set number of other bytes have been processed.
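
The rolling-state update can be sketched with a simple windowed sum; real rolling hashes such as spamsum's keep a richer state, but the O(1) add/remove update per byte is the same idea, and the names here are illustrative:

```python
def rolling_window_sum(data: bytes, window: int = 3):
    """Simplified rolling hash: the state is the sum of the last `window`
    bytes. Each new byte is added to the state, and the byte leaving the
    window is subtracted, so each update is O(1)."""
    state = 0
    values = []
    for i, byte in enumerate(data):
        state += byte
        if i >= window:
            state -= data[i - window]  # drop the byte leaving the window
        values.append(state)
    return values
```

Because the state depends only on the last few bytes, an insertion or deletion early in the file perturbs the rolling values only locally, which is the property CTPH exploits.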

Spamsum
- Spamsum, a spam detection tool written by Dr. Andrew Tridgell, can identify emails that are similar but not identical to samples of known spam. The spamsum algorithm was in turn based upon the rsync checksum, also by Dr. Tridgell.
- The spamsum algorithm uses FNV hashes for the traditional hashes, which produce a 32-bit output for any input. In spamsum, Dr. Tridgell further reduced the FNV hash by recording only a base64 encoding of the six least significant bits (LS6B) of each hash value.
- The algorithm for the rolling hash was inspired by the Adler32 checksum.
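
Combining the pieces gives a toy CTPH in the spirit described above: a rolling window value triggers the piece boundaries, each piece gets a 32-bit FNV-1a hash, and only a base64 character of the six least significant bits is kept. The window and trigger parameters are illustrative, not spamsum's actual block-size selection:

```python
FNV_PRIME, FNV_OFFSET = 0x01000193, 0x811c9dc5
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def fnv1a(piece: bytes) -> int:
    """32-bit FNV-1a hash of a piece (the 'traditional' hash)."""
    h = FNV_OFFSET
    for b in piece:
        h = ((h ^ b) * FNV_PRIME) & 0xffffffff
    return h

def ctph(data: bytes, window: int = 7, trigger: int = 64) -> str:
    """Toy context-triggered piecewise hash: a rolling sum over the last
    `window` bytes chooses the piece boundaries; each piece contributes
    one base64 character of the LS6B of its FNV-1a hash."""
    signature, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        if rolling % trigger == trigger - 1:  # context trigger: end a piece
            signature.append(B64[fnv1a(data[start:i + 1]) & 0x3f])
            start = i + 1
    signature.append(B64[fnv1a(data[start:]) & 0x3f])  # final piece
    return "".join(signature)
```

Because boundaries are chosen by content rather than by fixed offsets, inserting or deleting bytes changes only the characters for the affected pieces, leaving the rest of the signature comparable between file versions.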

Directions
- Many applications in altered document matching and partial file matching
- Improvements to the hash algorithms
- Performance studies