Watermarking Relational Databases Rakesh Agrawal and Jerry Kiernan.


Why Watermark Databases
- Watermark: a pattern intentionally introduced into the data
  - hard to occur by chance
  - hard to find => hard to destroy (robust against malicious attack)
- Increasing use of databases in applications beyond "behind-the-firewall data processing", involving data publication
- Data providers require technical solutions to deter data theft and assert ownership of pirated copies

Assumption
- The value of the database is significantly reduced if all of the k least significant bits of an attribute are dropped or perturbed, but it is acceptable to perturb a small number of attribute values
- Datasets from many data publishers satisfy this assumption (it is acceptable to trade off a small decrease in quality to assert ownership)
  - Tables of parametric specifications (mechanical, electrical, electronic, chemical, etc.), surveys (geological, climatic, etc.), life sciences (e.g. gene expressions)
  - Historical precedent: logarithm tables, astronomical ephemerides, H.P.
- Inappropriate dataset: online bank balances

Desiderata
- Detectability
  - Using a subset of the tuples and attributes
- Robustness
  - Updates and malicious attacks
- Incremental Updatability
  - On tuple insert/update/delete
- Imperceptibility
  - Hard to infer the presence of a watermark
- Blind System
  - Detection requires neither the original data nor the watermark
- Key-Based System
  - Algorithm is public
  - Security resides in the choice of secret key

Related Work
- Images [BGM95, HG98, M98, DR00]
- Audio [BTH96]
- Text [M94]
- Software [CT00]

Relational data is different from multimedia data

Multimedia Object
- Consists of a large number of bits, with considerable redundancy => watermark has a large cover to hide in.
- Relative spatial/temporal positioning of the various pieces of an object does not change.
- Portions of an object cannot be dropped or replaced arbitrarily without causing perceptual changes in the object.

Database Relation
- Consists of tuples, each of which represents a separate object => watermark needs to be spread over these separate objects.
- Tuples of a relation constitute a set, with no implied ordering between them.
- A pirate can easily drop some tuples/attributes or substitute them with tuples/attributes from other relations.

Need watermarking techniques designed to take into account the special characteristics of relational data.

Techniques
- Introduce watermarks across a fraction of the tuples in a database relation
- Detect the watermark by retrieving a subset of the tuples
- Use statistical hypothesis testing to locate the watermark even in the presence of updates to the data

Message Authentication Code
- h = H(M), where H is a one-way hash function and M is a message
  - Given M, easy to compute h
  - Given h, hard to compute M
  - Given M, hard to find M' ≠ M such that H(M) = H(M')
- MD5 and SHA are good choices for H
- A MAC is a one-way hash function that depends on a key K
- We use: F(r.P) = H(K ∘ H(K ∘ r.P)), where r.P is the primary key of tuple r, and ∘ is concatenation
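The nested construction above can be sketched in a few lines of Python. This is only an illustration: SHA-256 stands in for H (the slide says only "MD5 and SHA"), and the byte encoding of the key and primary key, as well as the decision to return the digest as an integer, are assumptions made here.

```python
import hashlib


def H(data: bytes) -> bytes:
    """One-way hash. SHA-256 is an assumed choice; the slide names only MD5 and SHA."""
    return hashlib.sha256(data).digest()


def F(key: bytes, primary_key: bytes) -> int:
    """F(r.P) = H(K ∘ H(K ∘ r.P)), returned as a non-negative integer."""
    inner = H(key + primary_key)                   # H(K ∘ r.P)
    return int.from_bytes(H(key + inner), "big")   # H(K ∘ inner)
```

Returning an integer makes it convenient to take the value modulo small parameters when deciding which tuple, attribute, and bit to mark.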

Watermarking Algorithm
- Determine the attribute(s) to be watermarked, the gap, and the LSBs
- For each tuple r, compute the MAC:
  - Establish whether r falls into a gap
  - Select the attribute to be marked
  - Determine the bit position to contain the mark
  - Compute the mark's value
  - Update the attribute's value to reflect the watermark, if necessary
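The steps above could be sketched as follows. The slide does not say how each decision is derived from the MAC, so the `mark_plan` helper, its successive `divmod` decoding, the SHA-256 hash, and the dict-based row layout are all hypothetical choices made for illustration.

```python
import hashlib


def mac(key: bytes, pk: str) -> int:
    """F(r.P) = H(K ∘ H(K ∘ r.P)) as an integer; SHA-256 is an assumed choice of H."""
    inner = hashlib.sha256(key + pk.encode()).digest()
    return int.from_bytes(hashlib.sha256(key + inner).digest(), "big")


def mark_plan(key, pk, gap, num_attrs, num_lsb):
    """Return None if the tuple falls into a gap, else (attr_index, bit_index, mark_bit).

    Hypothetical decoding: successive divmods peel separate decisions off the MAC.
    """
    m, r = divmod(mac(key, pk), gap)
    if r != 0:                       # roughly 1 in `gap` tuples is selected
        return None
    m, attr_index = divmod(m, num_attrs)
    m, bit_index = divmod(m, num_lsb)
    return attr_index, bit_index, m & 1


def watermark(rows, key, gap, attrs, num_lsb):
    """Return marked copies of rows (dicts with 'pk' and non-negative int attributes)."""
    out = []
    for row in rows:
        new_row = dict(row)          # leave the caller's rows untouched
        plan = mark_plan(key, row["pk"], gap, len(attrs), num_lsb)
        if plan is not None:
            attr_index, bit_index, mark_bit = plan
            name = attrs[attr_index]
            cleared = row[name] & ~(1 << bit_index)
            # No visible change when the bit already equals the mark.
            new_row[name] = cleared | (mark_bit << bit_index)
        out.append(new_row)
    return out
```

Because every decision depends only on the key and the tuple's primary key, marking the same tuple twice gives the same result, and a single inserted or updated tuple can be marked without touching the rest of the relation.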

Technique

Before watermarking:
       A1         A2         A3         A4
PK1    011001100  110000100  100011100  110000101
PK2    101000111  010101111  111010110  100110011
PK3    110001110  100010101  000010101  101010010
PK4    111000010  010000010  [missing]  111110010
PK5    110011001  010000011  100011001  110000110

After watermarking:
       A1         A2         A3         A4
PK1    011001100  110000100  100011100  110000101
PK2    101000111  010101111  111010110  100110011
PK3    110001110  100010101  000010101  101010000
PK4    111000010  010001010  010000010  111110010
PK5    110011001  010000111  100011001  110000110

Annotations on the slide:
- PK5: not selected because in gap
- B2 of A1 selected for PK1: value not changed because Mark = 1
- Value changed: Mark = 1

Without the Private Key, the Watermark is Hard to Destroy
An attacker who lacks the key does not know:
- Which tuples contain a mark
- Which attribute got marked
- Which bit position got marked
- The expected value of a mark

Detection Algorithm
- Locate suspicious data and extract a sample that might contain the watermark
- For each tuple r, compute the MAC:
  - If r doesn't fall into a gap, extract the mark bit value
- Count the number of Bernoulli trials and the number of successes
- Apply statistical analysis to establish the presence of the watermark
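A minimal sketch of the counting step, under the same kind of assumptions as any illustration of the embedding side: the `mark_plan` decoding of the MAC, the SHA-256 hash, and the dict-based rows are hypothetical and must simply match whatever the embedder used. The function returns the number of trials and successes; the statistical test is then applied to those two counts.

```python
import hashlib


def mac(key: bytes, pk: str) -> int:
    """F(r.P) = H(K ∘ H(K ∘ r.P)) as an integer; SHA-256 is an assumed choice of H."""
    inner = hashlib.sha256(key + pk.encode()).digest()
    return int.from_bytes(hashlib.sha256(key + inner).digest(), "big")


def mark_plan(key, pk, gap, num_attrs, num_lsb):
    """Hypothetical decoding: None if in a gap, else (attr_index, bit_index, mark_bit)."""
    m, r = divmod(mac(key, pk), gap)
    if r != 0:
        return None
    m, attr_index = divmod(m, num_attrs)
    m, bit_index = divmod(m, num_lsb)
    return attr_index, bit_index, m & 1


def count_matches(rows, key, gap, attrs, num_lsb):
    """Return (trials, successes) over the tuples that do not fall into a gap."""
    trials = successes = 0
    for row in rows:
        plan = mark_plan(key, row["pk"], gap, len(attrs), num_lsb)
        if plan is None:
            continue                              # tuple carries no mark
        attr_index, bit_index, mark_bit = plan
        trials += 1
        found = (row[attrs[attr_index]] >> bit_index) & 1
        if found == mark_bit:
            successes += 1
    return trials, successes
```

Note that the function needs only the secret key and the suspicious rows, which is what makes the scheme blind: neither the original relation nor a stored copy of the watermark is consulted.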

Extensions to the Algorithm
- Relations with no primary keys
- Null values

Evaluation
- Analysis
- Experiments
  - Forest Cover Type dataset from UCI repository

Attacks
- Bit attacks
  - Randomize, zero-out, bit flipping, rounding, translation
- Subset attack
  - Select subset of tuples and attributes
- Mix-and-match attack
  - Combine data from multiple sources
- Additive attack
  - Insert new watermark over existing watermark
- Invertibility attack
  - Counterfeit watermark
- Benign updates

Cumulative Binomial Probability Distribution

$$b(k; n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}$$

$$B(k; n, p) = \sum_{i=k}^{n} b(i; n, p)$$
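For small n, the two quantities can be evaluated directly. The sketch below is a straightforward transcription of the definitions; using p = 1/2 in the example assumes that, in unmarked data, an examined bit matches the expected mark by chance half of the time.

```python
from math import comb


def b(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def B(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n Bernoulli trials."""
    return sum(b(i, n, p) for i in range(k, n + 1))
```

For example, B(9; 10, 1/2) = 11/1024 ≈ 0.0107, so 9 matches out of 10 would not quite reach a 0.01 significance level, while B(10; 10, 1/2) = 1/1024 would.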

Parameters and Defaults
- Number of tuples: 1 million
- Number of marked attributes: 1
- Number of least significant bits: 1
- Fraction of tuples marked: 1/1000
- Significance level for hypothesis test: 0.01

Proportion of correctly marked tuples required for detectability
- The proportion of correctly marked tuples needed for detectability decreases as the number of marks increases
- For 1M tuples and 10% of tuples marked, that proportion is < 51%
- Illustrates the tolerance of the watermark to updates
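The trend can be reproduced numerically. The sketch below assumes that, without the watermark, each examined bit matches with probability 1/2, and finds the smallest number of matches whose upper binomial tail falls below the significance level. The function names are illustrative, and the log-space evaluation via `lgamma` is just one way to avoid overflow for large n.

```python
from math import exp, lgamma, log


def log_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of exactly k successes in n trials."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def min_matches(n: int, alpha: float, p: float = 0.5) -> int:
    """Smallest t with P[X >= t] < alpha for X ~ Binomial(n, p).

    Returns n + 1 when even n matches out of n would not be significant.
    """
    tail = 0.0
    for k in range(n, -1, -1):       # grow the upper tail downward from k = n
        tail += exp(log_pmf(k, n, p))
        if tail >= alpha:
            return k + 1
    return 0
```

With 100,000 marks (10% of 1M tuples) and a significance level of 0.01, `min_matches(100_000, 0.01) / 100_000` comes out just above one half, consistent with the "< 51%" figure above; with only 1,000 marks the required proportion is noticeably higher.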

Proportion of correctly marked tuples needed for decreasing alpha
- The data can tolerate a large number of updates while maintaining detectability with high confidence

Excess Error in an Attack
- The attacker can be forced to make orders of magnitude more errors than the owner, making the attacker's data economically much less attractive than the owner's

Samples in Which the Watermark Could be Detected When the Attacker has Dropped Tuples
- Watermark detected in a subset of the tuples of a watermarked relation
- Selectivity gives the sample size
- Each experiment repeated 100 times
- Results show the percentage of trials in which the watermark could be detected

Samples in Which the Watermark was Detected When the Attacker has Dropped Some Attributes
- Watermark detected in a subset of the attributes and tuples of a watermarked relation
- Watermark spread across 10 attributes
- Selectivity gives the sample size
- Each experiment repeated 100 times
- Results show the percentage of trials in which the watermark could be detected

Mix-and-Match Attack
- Minimum fraction of tuples from the watermarked relation needed for detectability
- N is the relation size
- N x f = tuples from marked relation
- N x (1 - f) = tuples from other relations
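As a rough worked bound (a sketch under stated assumptions, not taken from the slides): suppose the detector examines ω tuples that do not fall into a gap, that the fraction f of them drawn from the marked relation all still carry correct marks, and that tuples from other relations match by chance with probability 1/2. Let τ be the minimum number of matches needed at the chosen significance level; the symbols ω and τ are introduced here only for illustration. Then the expected number of matches, and the resulting condition on f, are:

```latex
E[\text{matches}] = \omega f + \tfrac{1}{2}\,\omega\,(1 - f) = \tfrac{1}{2}\,\omega\,(1 + f)

\tfrac{1}{2}\,\omega\,(1 + f) \ge \tau
\quad\Longleftrightarrow\quad
f \ge \frac{2\tau}{\omega} - 1
```

Since τ/ω approaches 1/2 as the number of marks grows, this bound suggests that the required fraction f shrinks for larger relations.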

Summary
- Provided desiderata for a system for watermarking database relations
- First watermarking algorithm for database relations
- No dependence on tuple ordering
- Robust against attacks
- Watermark can be incrementally updated
- Requires neither the original relation nor the watermark for detection

Future Work
- Watermarking extensions to handle non-numeric attributes
- New algorithms for fingerprinting to track multiple sources of piracy
