Presentation on theme: "Forensic Analysis of Database Tampering"— Presentation transcript:
1 Forensic Analysis of Database Tampering. Kyriacos Pavlou and Richard T. Snodgrass, Computer Science Department, The University of Arizona
2 Introduction. The problem: how to systematically perform forensic analysis on a compromised database. Recent federal laws (HIPAA, the Sarbanes-Oxley Act, etc.) and incidents of corporate collusion mandate audit log security. Snodgrass et al. [VLDB04] showed how to detect database tampering. Approach: hash the data manipulated by transactions using a cryptographically strong hash function, notarize the hash, and periodically validate. Forensic analysis aims to ascertain: when the intrusion transpired; what data was altered; who the intruder is; why the intrusion occurred.
3 Outline. Tamper Detection. Forensic Analysis: the corruption diagram; types of corruption events. Forensic Algorithms: three algorithms; forensic strength. Future Work.
4 Tamper Detection. Several related ideas allow tamper detection. The DBMS can maintain an audit log in the background, as an append-only transaction-time table. The data modified by a transaction can be cryptographically hashed to produce a secure one-way hash of the transaction. The hash value is notarized with an external notarization service; once notarized, the hash value cannot change. Implementation optimizations: opportunistic hashing; transaction ordering list; linked hashing. The latest hash value is a hash of all the changes made to the database since database creation.
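The hash, notarize, and validate cycle described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: SHA-256 stands in for the unspecified "cryptographically strong hash function", and `ToyNotary`, `link_hash`, and `chain_hash` are hypothetical names introduced here. A real notarization service would be an external party, not an in-process object.

```python
import hashlib


def link_hash(prev_hash: bytes, transaction: bytes) -> bytes:
    """Fold one transaction into the running hash (linked hashing)."""
    return hashlib.sha256(prev_hash + transaction).digest()


def chain_hash(transactions) -> bytes:
    """Hash of all changes since database creation, built link by link."""
    h = hashlib.sha256(b"").digest()  # assumed starting value for an empty log
    for tx in transactions:
        h = link_hash(h, tx)
    return h


class ToyNotary:
    """Hypothetical stand-in for an external notarization service."""

    def __init__(self):
        self._store = {}

    def notarize(self, hash_value: bytes) -> int:
        """Record the hash value and hand back a notary ID."""
        notary_id = len(self._store)
        self._store[notary_id] = hash_value
        return notary_id

    def validate(self, notary_id: int, hash_value: bytes) -> bool:
        """Single-bit answer: does the rehash match what was notarized?"""
        return self._store[notary_id] == hash_value


# Normal processing: hash the audit log and notarize the result.
log = [b"insert A", b"update B", b"insert C"]
notary = ToyNotary()
nid = notary.notarize(chain_hash(log))

# Validation: rehash the stored data and ask the notary to compare.
ok_before = notary.validate(nid, chain_hash(log))
tampered = [b"insert A", b"update B (forged)", b"insert C"]
ok_after = notary.validate(nid, chain_hash(tampered))
```

In this sketch `ok_before` is `True` and `ok_after` is `False`: changing any earlier transaction changes every later link, so a single comparison of the latest hash value exposes the tampering.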
5 Tamper Detection. Two phases: Normal Processing and Validation. The validation result is a single bit. [Figure: normal processing (transactions, hashing, hash value, notary ID) and validation (transactions, rehash, hash value, notary ID, result).]
6 Definitions. Corruption Event (CE): any event that corrupts data and compromises the database (intrusion, human intervention, bug). Corruption time (t_c): the actual time instant at which a CE occurred. Validation Event (VE): validation of the audit log by the Notarization Service (NS). Time of VE (t_v): the time instant at which a VE occurred. Validation Failure vs. Validation Success: the NS’s answer to a query for a particular hash value, denoting tampering or the lack thereof, respectively. Notarization Event (NE): the notarization of a document by the NS. Time of NE (t_n): the time instant at which an NE occurred.
7 Definitions (cont'd). Forensic analysis involves the following. Temporal detection: determination of t_c. Spatial detection: determination of “where,” i.e., the location in the database of the data affected in a CE. This data is termed the corruption locus data (l_c). In fact, we try to ascertain the locus time (t_l), the time instant at which l_c was originally stored (the transaction commit time). Note that a CE can have many l_c's, termed a multi-locus CE, or only one l_c, termed a single-locus CE.
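8 The Corruption Diagram. [Figure: corruption diagram with axes Where and When, showing notarization events NE0 to NE3 connected by links and validation event VE1 = TRUE. NE: Notarization Event; VE: Validation Event.]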
10 Forensic Analysis. If a corruption is detected, the forensic analyzer springs into action. The analyzer tries to ascertain a corruption region: the bounds on the uncertainty of the “where” and “when” of the corruption.
11 Notarization and Validation Intervals. Non-aligned validation just delays detection of tampering. Validation factor V: I_V = V · I_N, where I_N is the notarization interval and I_V is the validation interval.
12 Analyzing Timestamp Corruption. So far we have considered data-only CEs. We now examine the case where the timestamps of the tuples are changed. [Table: CE types Data-only, Backdating, and Postdating crossed with Retroactive and Introactive; one combination is marked ×.]
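13 Monochromatic Algorithm. [Figure: corruption diagram with axes Where and When, showing notarization events NE0 to NE6, VE1 = TRUE, and VE2 = FALSE, after which forensic analysis begins; hash chain results T F F F F; the CE is marked at t_c (time of corruption) and t_l (place of corruption, commit time).] Corruption region: captures the uncertainty as to the position of the CE.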
14 Monochromatic Algorithm. Central insight: the data can be rehashed by the validator and checked. Corruption region bounds: I_V × I_N. The area depends solely on the two intervals. Cannot handle CEs involving timestamp corruption.
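The idea of rehashing and checking can be illustrated with a small Python sketch. It rests on assumptions not stated on the slide: one cumulative hash is notarized per notarization interval, interval k covers commit times from k·I_N to (k+1)·I_N, and the validation preceding the failed one succeeded. The function names are hypothetical, and the binary search is one plausible way to locate the first mismatch, not necessarily the authors' procedure.

```python
import hashlib


def cumulative_hashes(intervals):
    """Return the running hash after each notarization interval.

    Each value covers every transaction up to and including that interval,
    mimicking a cumulative (linked) hash chain.
    """
    h = hashlib.sha256(b"").digest()
    hashes = []
    for batch in intervals:
        for tx in batch:
            h = hashlib.sha256(h + tx).digest()
        hashes.append(h)
    return hashes


def first_failed_interval(notarized, current_intervals):
    """Binary-search for the earliest interval whose rehash mismatches.

    Assumes one notarized hash per interval. Because the hashes are
    cumulative, the comparison results form the pattern T ... T F ... F.
    Returns len(notarized) when nothing mismatches.
    """
    recomputed = cumulative_hashes(current_intervals)
    lo, hi = 0, len(notarized)
    while lo < hi:
        mid = (lo + hi) // 2
        if recomputed[mid] == notarized[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo


def corruption_region(k, i_n, t_fail, i_v):
    """Bounds on (where, when), assuming interval k spans k*i_n to (k+1)*i_n
    and the validation at t_fail - i_v succeeded."""
    where = (k * i_n, (k + 1) * i_n)
    when = (t_fail - i_v, t_fail)
    return where, when


original = [[b"t1"], [b"t2"], [b"t3"], [b"t4"]]
notarized = cumulative_hashes(original)
tampered = [[b"t1"], [b"t2"], [b"t3-forged"], [b"t4"]]

k = first_failed_interval(notarized, tampered)
region = corruption_region(k, i_n=2, t_fail=8, i_v=4)
```

With these toy values the search returns `k = 2` and the region is `((4, 6), (4, 8))`: a width of I_N = 2 in the "where" dimension and I_V = 4 in the "when" dimension, so the area depends only on the two intervals.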
15 The RGB Forensic Algorithm. [Figure: corruption diagram with axes Where and When for a postdating CE, with I_V = 4 days and I_N = 2 days; notarization events NE0 to NE8; notarization of Red at VE1 = TRUE and VE3 = TRUE; notarization of Blue and Green at VE2 = TRUE; VE4 = FALSE, after which forensic analysis begins; the CE is marked with t_c, t_l, and t_p.] t_p: postdating time.
16 The RGB Forensic Algorithm. Introduction of RGB partial hash chains: allows the bounding of both t_l and t_p; incurs extra NS cost. Each of the two corruption regions is bounded by I_V × I_N. We would like to reduce the area of the corruption regions.
17 The Polychromatic Algorithm. [Figure: corruption diagram with axes Where and When for a backdating CE, with I_V = 4 days, I_N = 2 days, and desired uncertainty = 1 day; notarization events NE0 to NE8; notarization of 2 Reds at VE1 = TRUE and VE3 = TRUE; notarization of 2 Blues and 1 Green at VE2 = TRUE; VE4 = FALSE, after which forensic analysis begins; the CE is marked with t_c, t_l, and t_b.] t_b: backdating time. Uncertainty can be arbitrarily shrunk via a logarithmic number of red and blue hash chains.
18 The Polychromatic Algorithm. Introduction of extra partial hash chains: reduces the uncertainty of the corruption region; incurs additional NS cost. Uncertainty can be arbitrarily shrunk via a logarithmic number of red and blue hash chains. Hence, the width is no longer dependent on I_V and I_N.
19 Forensic Strength. Components: work of forensic analysis; region-area of the CE; width of the postdating/backdating uncertainty. Inverse Forensic Strength: IFS(D, I_N, V) = (NumNotarizes(D, I_N, V) + ForensicAnalysis(D, I_N, V)) · RegionArea(I_N, V) · UncertaintyWidth(D, I_N), where V = I_V / I_N is the validation factor and D is the number of days before the first validation failure. We assume that D >> I_N. Monochromatic: O(V · D^2 · I_N). RGB: O(V · D · I_N^2). Polychromatic: O((V + lg I_N) · D).
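To get a feel for how the three bounds compare, the asymptotic expressions can be evaluated for sample parameters. The sketch below is only illustrative: it drops the constant factors hidden by the O-notation, reads lg as log base 2, and uses hypothetical function names. The values I_V = 4 days and I_N = 2 days are taken from the RGB slide; D = 365 days is an arbitrary assumption.

```python
import math


def validation_factor(i_v, i_n):
    """V = I_V / I_N, as defined on the slide."""
    return i_v / i_n


def ifs_monochromatic(d, i_n, v):
    """Leading term of O(V * D^2 * I_N), constants ignored."""
    return v * d**2 * i_n


def ifs_rgb(d, i_n, v):
    """Leading term of O(V * D * I_N^2), constants ignored."""
    return v * d * i_n**2


def ifs_polychromatic(d, i_n, v):
    """Leading term of O((V + lg I_N) * D), constants ignored."""
    return (v + math.log2(i_n)) * d


d, i_n, i_v = 365, 2, 4          # D is an assumed value; I_N and I_V in days
v = validation_factor(i_v, i_n)  # 2.0

mono = ifs_monochromatic(d, i_n, v)
rgb = ifs_rgb(d, i_n, v)
poly = ifs_polychromatic(d, i_n, v)
```

Because the measure is an inverse strength, smaller is better. For these sample values the ordering is `poly < rgb < mono` (1095, 2920, and 532900 respectively), which matches the slide's progression from Monochromatic to RGB to Polychromatic when D >> I_N.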
20 Future Work. Develop a stronger lower bound for this problem. Accommodate multi-locus and complex CEs. Differentiate postdating and backdating CEs. Implement forensic analysis in the validator. Consider the interaction between the transaction-time storage manager and the underlying WORM storage.
21 Summary. We have presented a means of performing forensic analysis. We have introduced a graphical representation to visualize CEs, termed the corruption diagram. We have designed three forensic algorithms: Monochromatic, RGB, and Polychromatic.
22 Acknowledgements. NSF grants IIS , IIS and EIA and a grant from Microsoft provided partial support for this work.