Martin Novotný, Andy Rupp Ruhr University Bochum

Martin Novotný, Andy Rupp Ruhr University Bochum
Application of FPGA Design: Design Challenges for Implementing Realtime A5/1 Attack with Precomputation Tables Martin Novotný, Andy Rupp Ruhr University Bochum

Outline A5/1 cipher Time-Memory Trade-off Tables
Original Hellman Approach Distinguished points TMTO with multiple data Rainbow tables Thin-rainbow tables Architecture of the A5/1 TMTO engine – table generation Implementation results

A5/1 Cipher A5/1 A5/1 Encrypts GSM communication
GSM communication organized in frames 1 frame = 114 bits in each direction Stream cipher produces the keystream KS being xored with the plaintext P to form ciphertext C C = P  KS A5/1 P C A5/1 KS

Architecture of A5/1 Cipher
3 linear feedback shift registers (LFSRs) LFSRs irregularly clocked the register is clocked iff its clocking bit (yellow) is equal to the majority of all 3 clocking bits  at least 2 registers are clocked in each cycle

Algorithm of A5/1 Reset all 3 registers
(Initialization) Load 64 bits of key K + 22 bits of frame number FN into 3 registers K and FN xored bit-by-bit to the least significant bits registers clocked regularly (Warm-up) Clock for 100 cycles and discard the output registers clocked irregularly (Execution) Clock for 228 cycles, generate bits (for each direction) Repeat for the next frame

Cryptanalysis of stream ciphers with known plaintext
From the ciphertext C and known plaintext P compute keystream KS: KS = P  C Keystream KS is a function of: key K: KS = f(K) internal state L: KS = g(L) (internal state = content of all registers) Reset all 3 registers (Initialization) Load 64 bits of key K + 22 bits of frame number FN into 3 registers (Warm-up) Clock for 100 cycles and discard the output (Execution) Clock for 228 cycles, generate bits (for each direction) We can skip Initialization and Warm-up!!!

Cryptanalysis of A5/1 For (known) keystream KS find the internal state L When L found, track the A5/1 cipher back through Warm-up phase and Initialization to get the key K. Reset all 3 registers (Initialization) Load 64 bits of key K + 22 bits of frame number FN into 3 registers (Warm-up) Clock for 100 cycles and discard the output (Execution) Clock for 228 cycles, generate bits (for each direction) L

Cryptanalysis of A5/1 Internal state L has 64 bits
 we need (at least) 64 bits of keystream KS One A5/1 frame has 114 bits  we can make samples KSi It is sufficient to find any Li KS3 L3 KS2 L2 KS1 L1 KS0 L0

Two extreme approaches
Brute force attack Check all combinations of a key K online time T = N = 2k memory M = 1 Table lookup (For a given plaintext P) All pairs key-ciphertext {Ki, Ci} precomputed and stored (sorted by C) Online phase: Look-up C in the table (and find K) time T = 1 memory M = N = 2k

Time-Memory Trade-Off (Hellman, 1981)
Compromises the above two extreme approaches Precomputation phase: For a given plaintext P: precompute (ideally all) pairs key-ciphertext {Ki, Ci}; store only some of them in the table. Online phase: Perform some computations; lookup the table and find the key K. time T = N2/3 memory M = N2/3

Precomputation (offline) phase
Idea: Encryption function E is a pseudo-random function C = EK(P) Pairs {Ki, Ci} organized in chains Ci is used to create a key Ki+1 for the next step E is pseudo-random  we perform a pseudo-random walk in the keyspace E K C P plaintext P is the same P E K1 C1 SP = 1234 Start Point P E R EP Kt Ct B05B 8EC0 End Point f C2 P E R K3 28DF 7A3D R K2 R – reduction function (DES: C has 64 bits, K has 56 bits) f – step function f(x) = R(Ex(P))

Idea: Encryption function E is a pseudo-random function C = EK(P) Pairs {Ki, Ci} organized in chains Ci is used to create a key Ki+1 for the next step E is pseudo-random  we perform a pseudo-random walk in the keyspace E K C P plaintext P is the same Start Point End Point P E R f K1 EP K2 K3 Kt C1 C2 Ct SP = 1234 7A3D 28DF B05B 8EC0 R – reduction function (DES: C has 64 bits, K has 56 bits) f – step function f(x) = R(Ex(P))

1234 SP1 = k10 f k11 f k12 f … f k1t-1 f k1t = EP1 8EC0 1235 SP2 = k20 f k21 f k22 f … f k2t-1 f k2t = EP2 2A1B 1236 SP3 = k30 f k31 f k32 f … f k3t-1 f k3t = EP3 4D3C … … … 9999 SPm = km0 f km1 f km2 f … f kmt-1 f kmt = EPm 02E3 m chains with a fixed length t generated Only pairs {SPi, EPi} stored (sorted by EP)  reducing memory requirements P E R f SPj = kj0 kjt = EPj kj1 kj2 kjt-1

 Online phase E Given C. (and P) … try to find K, such that C = EK(P)
= EPi ? Lookup:  SPi R C y1 E K f

 Online phase Given C. (and P) … try to find K, such that C = EK(P)
= EPi ? Lookup: R C y1

Online phase Given C. (and P) … try to find K, such that C = EK(P) y1

= EPi ? Lookup: R C y1 f y2

Online phase Given C. (and P) … try to find K, such that C = EK(P) C

= EPi ? Lookup: R C y1 f y2 y3

Online phase Given C. (and P) … try to find K, such that C = EK(P) C

 Online phase Given C. (and P) … try to find K, such that C = EK(P)
= EPi ? Lookup:  R C y1 f y2 y3 y4

 Online phase E Given C. (and P) … try to find K, such that C = EK(P)
= EPi ? Lookup:  SPi R C y1 f y2 y3 y4 f K

Birthday paradox problem
m chains of fixed length t generated R is not bijective ⇒ some kij collide. Collisions yield in chain merges or in cycles in chains Matrix stopping rule: Hellman proved that it is not worth to increase number of chains m or length of chain t beyond the point at which m × t2 = N (the coverage of keyspace does not increase too much then)

Birthday paradox problem
Matrix stopping rule: m × t2 = N Recommendation: To use r tables, each with different reduction (re-randomization) function R Since also N = m  t  r, then r = t Hellman recommends m = t = r = N1/3 SP1 … … … … … EP1 SP2 … … … … … EP2 SP3 … … … … … EP3 … … … SP200 … … … ... EP200

Hellman TMTO – Complexity
Precomputation phase Precomputation time PT = m  t  r = N (e.g. 260) Memory M = m  r = N2/3 (e.g. 240 ) Online phase Memory M = N2/3 Online time T = t  r = t2 = N2/3 (e.g. 240 ) Table accesses TA = T = N2/3 (e.g. 240 )

Hellman TMTO – Complexity
Precomputation phase Precomputation time PT = m  t  r = N (e.g. 260) Memory M = m  r = N2/3 (e.g. 240 ) Online phase Memory M = N2/3 Online time T = t  r = t2 = N2/3 (e.g. 240 ) Table accesses TA = T = N2/3 (e.g. 240 ) 34 years (1 disk access ~ 1 ms)

Distinguished points (DP) (Rivest, ????)
Slight modification of original Hellman method Goal: To reduce the number of table accesses TA (in Hellman TA = N2/3) Distinguished point is a point of a certain property (e.g. 20 most significant bits are equal to 0).

Distinguished Points (DP) Precomputation phase
Chains are generated until the distinguished point (DP) is reached if the chain exceeds maximum length tmax, then it is discarded and the next chain is generated the chain is also discarded if the DP has been reached, but the chain is too short tmin (to have better coverage) Triples {SPj, EPj, lj} stored, sorted by EP (lj is a length of the chain) 1234 SP1 = k10 f k11 f … … … … … … … f k1u = EP1 0EC0 1235 SP2 = k20 f k21 f … … f k2v = EP2 0A1B 1236 SP3 = k30 f k31 f … … … … … f k3w = EP3 043C … … … 9999 SPm = km0 f km1 f … … … f kmz = EPm 02E3 chains have different lengths End Points are DP

Distinguished Points (DP) Online phase
There is 1 distinguished point per chain – the End Point End Point  Distinguished Point Algorithm: compute yi+1 = f(yi) iteratively until the DP is reached (or the maximum length tmax is exceeded) then lookup (just once per table) (if tmax is exceeded, do not lookup at all) Advantages Table accesses TA = r = N1/3 (c.f. TA = t  r = N2/3 in original Hellman) Chain loops are not possible E f = EPi ? Lookup:  SPi R C y1 f y2 y3 y4 043C f K

TMTO with multiple data (Biryukov & Shamir, 2000)
Important for stream ciphers: To reveal an internal state Li having k bits we need only k bits of a keystream KSi Having D data samples of the ciphertext C (or the keystream KS) we have D times more chances to find the key K (or the internal state L)  We calculate r/D tables only  we reduce the precomputation time PT and the memory M × online time T and #table access TA remain unchanged KS3 L3 KS2 L2 KS1 L1 KS0 L0

TMTO with multiple data A5/1
1 frame: 114 bits Internal state: 64 bits  114 – = 51 data samples from 1 frame (each sample has 64 bits) D = 51 We calculate D times less tables ( save memory, save time) KS0 L0 KS1 L1 KS2 L2 KS3 L3

Rainbow tables (Oechslin, 2003)
Idea: to use different reduction/re-randomization function Ri in each step of chain generation, hence the step functions are: f1 f2 f3 … ft-1 ft P E R1 f1 R2 f2 Rt ft SPj = kj0 kjt = EPj kj1 kj2 xjt-1 Online phase: Compute y1 = Rt(C), compare with EPs, if no match, then Compute y2 = ft(Rt-1(C)), compare with EPs, if no match, then Compute y3 = ft(ft-1(Rt-2(C))), compare with EPs, if no match, then …

Rainbow tables Just one table (or only several tables) generated,
m = N2/3 (t reduction functions used ⇒ the table can be t times longer), t = N1/3 Advantages chain loops impossible point collisions lead to chain merges only if the equal points appear in the same position of the chain online time T about ½ of the online time of original Hellman (for single data) number of table accesses the same like for the Hellman+DP method (for single data) Disadvantages Inferior to the Hellman+DP method in the case of multiple data (D > 1) (online time T and the number of table accesses TA are D-times greater)

f1 f2 f3 … fS-1 fS f1 f2 f3 … fS-1 fS … … … f1 f2 f3 … fS-1 fS
Thin-rainbow tables The way to cope with the rainbow tables when having multiple data The sequence of S different reduction functions fi is applied k-times periodically in order to create a chain: f1 f2 f3 … fS-1 fS f1 f2 f3 … fS-1 fS … … … f1 f2 f3 … fS-1 fS Chain length t = S × k 1st 2nd kth

Thin-rainbow tables + DP (to reduce # table accesses TA)
DP criterion is checked after each fS f1 f2 f3 … fS-1 fS f1 f2 f3 … fS-1 fS … … … f1 f2 f3 … fS-1 fS We store only chains for which kmin < k < kmax 1st 2nd kth DP ? DP ? DP ? DP ?

Candidates for implementation (in case of multiple data, D>1)
Hellman + DP DP-criterion checked after each step-function f Thin-rainbow + DP DP-criterion checked after fS only  simpler HW, better time/area product Both have the same precomputation complexity Both have comparable online time T and # table accesses TA

Thin-rainbow tables + DP (to reduce # table accesses TA)
DP criterion is checked after each fS f1 f2 f3 … fS-1 fS f1 f2 f3 … fS-1 fS … … … f1 f2 f3 … fS-1 fS We store only chains for which kmin < k < kmax 1st 2nd kth DP ? DP ? DP ? DP ?

Implementation choices
Pipeline? Array of small computing elements?

Slice – FPGAs Basic Building Block
Look-up Table Flip- flop Look-up Table Flip- flop Implements combinational logic (any logic function of 4 variables) It is RAM 16x1 (holds the truth-table)

LUT – configuration choices
Look-up Table Flip- flop Look-up Table Flip- flop Can be configured as: LUT (function generator) RAM 16x1 SRL16 (upto 16-bit shift register)

Implementation choices
Pipeline? All A5/1 bits should have been accessible in parallel max. 240 A5/1 units (64x FF/unit) (and no control unit, …) Array of small computing elements? LFSRs can be implemented using SRL16 (1 LUT config. as up to 16-bit shift register) max. 480 A5/1 units (8x SRL16 + 5x FF/unit) (enough FFs for control unit, …)

A5/1 TMTO basic element Calculates one chain Two-stroke mode:
core #1 generates keystream, core #2 is loaded core #2 generates keystream, core #1 is loaded

First, the start point SPj is loaded to core #1
A5/1 TMTO basic element First, the start point SPj is loaded to core #1

A5/1 TMTO basic element … then one rainbow period f1f2f3 … fS-1fS is performed … In odd steps: Core #1 generates keystream … ... that is re-randomized … … and loaded to core # 2 as a new internal state

A5/1 TMTO basic element … then one rainbow period f1f2f3 … fS-1fS is performed … … and loaded to core # 1 as a new internal state ... that is re-randomized … In even steps: Core #2 generates keystream …

A5/1 TMTO basic element After application of fS:
the result is shifted out to check the DP-criterion

A5/1 TMTO engine – table generation (in 1 FPGA)
234 TMTO elements  234 chains computed in parallel in Spartan FPGA TMTO elements share the DP-checker

Loading Startpoints 1234

Loading Startpoints 1234 1235

Loading Startpoints 1234 1235 1236

1st Rainbow Sequence 1234 1235 1236

1st Rainbow Sequence 7A3D 802B 41C3

1st Rainbow Sequence 27B3 4C81 05A1

1st Rainbow Sequence 5AB7 44DC 820F

1st Rainbow Sequence 654C 05B5 82A1

1st Rainbow Sequence 57A2 91D6 120B

1st Rainbow Sequence 1283 AB45 5A1B

1st Rainbow Sequence 987B 651E 420B

1st Rainbow Sequence 1A56 02BA 8ACD

Evaluation (DP checking) 1A56 02BA 8ACD

Evaluation (DP checking) 1A56 02BA 8ACD 

Evaluation (DP checking) 02BA 8ACD 1A56  1237 1235

Evaluation (DP checking) 8ACD 1A56 1237  02BA 1235

Evaluation (DP checking) 1A56 1237 8ACD 02BA 1235

2nd Rainbow Sequence … 1A56 1237 8ACD 02BA 1235

Data error detection Data MUST be correct
Errors may appear during the data transfer via COPA bus (120 FPGAs sharing the same bus) Hamming code (72, 64) TED (triple error detection) detects 99.19% quadruple errors (detects also all errors of 5, 6, 7, 9, 10, 11, … bits) If an error appears, the data are discarded Hamming encoding COPA bus

Implementation results
COPACOBANA is able to perform up to 236 (~69 billion) step-functions fi per second 234 TMTO elements/FPGA 120 FPGAs maximum frequency fmax = 156 MHz one step-function takes 64 clock cycles 234 × 120 × 156106 / 64  236

Parameter choices chains computed m rainbow sequence S DP criterion d [bits] #seq. in chain k precomp. time PT [days] disk usage DU [TB] # data samples: D = 64 online time OT [s] table accesses TA success ratio SR 241 215 5 [23 , 26] 337.5 7.49 27.8 221 0.86 239 [23 , 27] 95.4 3.25 36.3 0.67 240 214 [24 , 27] 4.85 10.9 220 0.63 84.4 7.04 7.0 0.60 3.48 [24 , 26] 5.06 8.5 0.55 237 6 [24 , 28] 47.7 0.79 73.5 0.42

Martin Novotný, Andy Rupp Ruhr University Bochum

Similar presentations

Presentation on theme: "Martin Novotný, Andy Rupp Ruhr University Bochum"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Martin Novotný, Andy Rupp Ruhr University Bochum

Similar presentations

Presentation on theme: "Martin Novotný, Andy Rupp Ruhr University Bochum"— Presentation transcript:

Similar presentations

About project

Feedback