 # Hash Function. What are hash functions? Just a method of compressing strings – E.g., H : {0,1}*  {0,1} 160 – Input is called “message”, output is “digest”

## Presentation on theme: "Hash Function. What are hash functions? Just a method of compressing strings – E.g., H : {0,1}*  {0,1} 160 – Input is called “message”, output is “digest”"— Presentation transcript:

Hash Function

What are hash functions? Just a method of compressing strings – E.g., H : {0,1}*  {0,1} 160 – Input is called “message”, output is “digest” Why would you want to do this? – Short, fixed-size better than long, variable-size True also for non-crypto hash functions – Digest can be added for redundancy – Digest hides possible structure in message

Typically using Merkle-Damgård iteration: 1.Start from a “compression function” – h: {0,1} b+n  {0,1} n 2.Iterate it How are they built? h c =160 bits |M|=b=512 bits d=h(c,M) = 160 bits hhhh … M1M1 M2M2 M L-1 MLML IV=d 0 d1d1 d2d2 d L-1 dLdL d=H(M) But not always…

The Merkle-Damgard iterated construction Thm: h collision resistant ⇒ H collision resistant Can we use H(.) to directly build a MAC? No,Given H( k ll m) can compute H( k ll m ll PB ll w ) for any w. hhh mmmm ll PB h IV (fixed) H(m)

What are they good for? “Modern, collision resistant hash functions were designed to create small, fixed size message digests so that a digest could act as a proxy for a possibly very large variable length message in a digital signature algorithm, such as RSA or DSA. These hash functions have since been widely used for many other “ancillary” applications, including hash-based message authentication codes, pseudo random number generators, and key derivation functions.”

Important Properties of Cryptographic Hash Functions First pre-image resistance Second pre-image resistance Strong collision resistance Efficient

First pre-image resistance: Given y in Y, it is “computationally infeasible” to compute a value x in X such that h(x) = y Hard to invert Why we need this property? – If hash function is invertible, then it is not useful for crypto applications n E.g., password – authentication will be broken

Second pre-image resistance: Given x in X, it is “computationally infeasible” to compute a different value x’ in X with x!=x’ such that h(x) = h(x’) Weak collision-resistance (sometimes also called target collision resistance) Why we need this property? – If hash function is not weak collision-resistant, then it is not useful for crypto applications n E.g., password authentication will be broken because even if the attacker doesn’t get your actual password, he can still get another string of bits that “collides with” and is as good as your actual password

Strong collision-resistance: It is “computationally infeasible” to find distinct inputs x, x’ with x!=x’ such that h(x) = h(x’) Why we need this property? – Usability of hash functions – Security of hash-then-sign schemes Given h, attacker computes m,m’ such that h(m) = h(m’) Attacker gives m to Alice to hash-then-sign Alice produces Attacker replaces it with and claims Alice signed it Strong collision resistance implies weak collision resistance

Two variants:  when drawing elements randomly (with replacement) from a set of N elements, with high probability a repeated element will be encountered after ~sqrt(N) selections  if we have a set of N elements, and we randomly select two subsets of size ~sqrt(N) each, then with high probability, the intersection of the two subsets will not be empty These facts have a profound impact on the design of hash functions (and other cryptographic algorithms and protocols)!

Birthday Paradox  Given a set of N elements, from which we draw k elements randomly (with replacement). What is the probability of encountering at least one repeating element?  first, compute the probability of no repetition: – the first element x1 can be anything – when choosing the second element x2, the probability of x2 ≠ x1 is 1-1/N – when choosing x3, the probability of x3 ≠ x2 and x3 ≠ x1 is 1-2/N – … – when choosing the k-th element, the probability of no repetition is 1-(k-1)/N – the probability of no repetition is (1 - 1/N)(1 - 2/N)…(1 – (k-1)/N) – when x is small, (1-x) ≈ e-x – (1 - 1/N)(1 - 2/N)…(1 – (k-1)/N) = e-1/Ne-2/N … e-(k-1)/N = e-k(k-1)/2N  the probability of at least one repetition after k drawing is 1 – e-k(k-1)/2N

Birthday Paradox How many drawings do you need, if you want the probability of at least one repetition to be ε ?  solve the following for k: ε = 1 – e-k(k-1)/2N k(k-1) = 2N ln(1/1-ε) k ≈ sqrt(2N ln(1/1-ε))  examples: ε = ½ -> k ≈ 1.177 sqrt(N) ε = ¾ -> k ≈ 1.665 sqrt(N) ε = 0.9 -> k ≈ 2.146 sqrt(N)  origin of the name “birthday paradox”: – elements are dates in a year (N = 365) – among 1.177 sqrt(365) ≈ 23 randomly selected people, there will be at least two that have the same birthday with probability ½

Choosing the output size of hash function good hash functions can be modeled as follows: – given a hash value y, the probability that a randomly chosen input x maps to y is ~2-n – the probability that two randomly chosen inputs x and x’ map into the same hash value is also ~2-n Æ n should be at least 64, but 80 is even better  birthday attacks – among ~sqrt(2n) = 2n/2 randomly chosen messages, with high probability there will be a collision pair – it is easier to find collisions than to find preimages or 2nd preimages for a given hash value - in order to resist birthday attacks, n should be at least 128, but 160 is even better

Standardized method: HMAC (Hash-MAC) Most widely used MAC on the Internet. H: hash function. example: SHA-256 ; output is 256 bits Building a MAC out of a hash function: HMAC: S( k, m ) = H ( k  opad ll H( k  ipad ll m ) )

HMAC in pictures Similar to the NMAC PRF. main difference: the two keys k 1, k 2 are dependent hh mmm ll PB h h tag > > > h kipad IV (fixed) > > IV (fixed) h > kopad

Warning: verification timing attacks [L’09] Example: Keyczar crypto library (Python) [simplified] def Verify(key, msg, sig_bytes): return HMAC(key, msg) == sig_bytes The problem: ‘==‘ implemented as a byte-by- byte comparison Comparator returns false when first inequality found

Warning: verification timing attacks [L’09] Timing attack: to compute tag for target message m do: Step 1: Query server with random tag Step 2: Loop over all possible first bytes and query server. stop when verification takes a little longer than in step 1 Step 3: repeat for all tag bytes until valid tag found m, tag k accept or reject target msg m

Make string comparator always take same time (Python) : return false if sig_bytes has wrong length result = 0 for x, y in zip( HMAC(key,msg), sig_bytes): result |= ord(x) ^ ord(y) return result == 0 Or def Verify(key, msg, sig_bytes): mac = HMAC(key, msg) return HMAC(key, mac) == HMAC(key, sig_bytes)

Lesson Don’t implement crypto yourself !

Download ppt "Hash Function. What are hash functions? Just a method of compressing strings – E.g., H : {0,1}*  {0,1} 160 – Input is called “message”, output is “digest”"

Similar presentations