Parallel Analysis of the Rijndael Block Cipher


Presentation on theme: "Parallel Analysis of the Rijndael Block Cipher"— Presentation transcript:

1 Parallel Analysis of the Rijndael Block Cipher
Philip Brisk, Adam Kaplan, Majid Sarrafzadeh
Embedded & Reconfigurable Systems Lab, Computer Science Department
IASTED-PDCS November, 2003

2 Outline
Introduction
Background Material
Analysis of the Rijndael Cipher
Concluding Remarks

3 Parallel Models of Computation and Cryptography
Achieving optimal performance of cryptographic algorithms is imperative!
Goal: Understand how to accelerate cryptographic algorithms by studying them under parallel models of computation.

4 What can we Learn from Parallel Models of Computation?
Identification of performance bottlenecks.
How to design efficient cryptographic hardware.
Techniques to improve future algorithms.

5 Outline: Background Material
Introduction
Background Material
    Cost Model
    Prefix Sum Computation
Analysis of the Rijndael Cipher
Concluding Remarks

6 Cost Model
n : problem size
t(n) : number of steps
p(n) = N > 1 : number of processors
c(n) : cost
s(n) : speedup

7 Cost Optimality
Cost ≡ the number of steps executed collectively by all processors.
An algorithm is cost-optimal on a parallel model of computation if:
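The cost quantities can be made concrete with a small sketch. It assumes the usual textbook formulas, which this transcript does not preserve: cost c(n) = p(n) x t(n) and speedup s(n) = (sequential steps) / (parallel steps). It also assumes the usual reading of cost-optimality, namely that the parallel cost stays within a constant factor of the sequential step count. The function names and the numbers are hypothetical.

```python
def cost(processors: int, steps: int) -> int:
    """Steps executed collectively by all processors: c(n) = p(n) * t(n)."""
    return processors * steps


def speedup(sequential_steps: int, parallel_steps: int) -> float:
    """Ratio of sequential to parallel step counts: s(n)."""
    return sequential_steps / parallel_steps


# Hypothetical algorithm: 16 steps on one processor,
# or 4 steps when spread over 16 processors.
sequential_cost = cost(1, 16)
parallel_cost = cost(16, 4)

print("sequential cost:", sequential_cost)
print("parallel cost:  ", parallel_cost)
print("speedup:        ", speedup(16, 4))
```

In this made-up case the parallel version is four times faster but collectively performs four times as many steps, which is the kind of gap a cost-optimality argument has to close.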

8 Prefix Sum Computation
P – a set of N processors: {P1, …, PN}
Processor Pi holds a value ai.
For each processor Pi, compute the sum Si = a1 + a2 + … + ai.
Algorithm (with S0 = 0):
for i = 1 to N
    Si = ai + Si-1
Addition can be generalized to any binary associative operation.
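The sequential loop above can be sketched in Python as follows. The function name `prefix_scan` and the `op` parameter are hypothetical; passing a different associative operation (here XOR) illustrates the generalization mentioned on the slide.

```python
import operator


def prefix_scan(values, op=operator.add):
    """Sequential prefix computation: result[i] = values[0] op ... op values[i]."""
    result = []
    for i, value in enumerate(values):
        if i == 0:
            result.append(value)
        else:
            # S_i = S_(i-1) op a_i
            result.append(op(result[-1], value))
    return result


print(prefix_scan([3, 6, 1, 4]))                # [3, 9, 10, 14]
print(prefix_scan([3, 6, 1, 4], operator.xor))  # [3, 5, 4, 0]
```

The loop takes N steps on one processor, which is the sequential baseline that a parallel version is measured against.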

9-14 Prefix Sum Computation
Meijer and Akl [1987] described a solution using a binary tree of processors.
[Figure, built up over six slides: a binary tree of processors with leaf values 3, 6, 1, 4. The partial sums 9 and 5 appear inside the tree, and the leaves finish with the prefix sums 3, 9, 10, 14.]
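A tree-structured prefix computation can be sketched as below. This is a generic recursive pairwise scan, not a reproduction of Meijer and Akl's exact procedure: adjacent values are combined on the way up, and the results are distributed on the way down. Each list built per level stands in for one parallel step, so the depth is about log2 N. The function name `tree_scan` and the power-of-two restriction are simplifying assumptions.

```python
import operator


def tree_scan(values, op=operator.add):
    """Prefix computation by recursive pairing (input length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)

    # Upward step: combine adjacent pairs (all pairs could run in parallel).
    pair_sums = [op(values[i], values[i + 1]) for i in range(0, n, 2)]

    # Recurse on the half-sized problem, one level up the tree.
    scanned = tree_scan(pair_sums, op)

    # Downward step: odd positions take the scanned pair result directly,
    # even positions combine the previous pair result with their own value.
    out = [None] * n
    out[0] = values[0]
    for i in range(n // 2):
        out[2 * i + 1] = scanned[i]
        if i > 0:
            out[2 * i] = op(scanned[i - 1], values[2 * i])
    return out


print(tree_scan([3, 6, 1, 4]))  # [3, 9, 10, 14]
```

For the input 3, 6, 1, 4 the pair results are 9 and 5, and the output matches the prefix sums 3, 9, 10, 14 shown on the slides.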

15 A Cost-Optimal Prefix Sum
To achieve cost optimality:

16 Outline: Analysis of the Rijndael Cipher
Introduction
Background Material
Analysis of the Rijndael Cipher
Concluding Remarks

17 The Rijndael Cipher
The cipher iterates in a series of rounds.
Each round requires a key.
Using the same key every round is not secure.
Providing a sequence of keys as an input is unreasonable.
A key schedule uses the original key to compute a new key for each round.

18 The Rijndael Cipher
Key Schedule
    Key Expansion: expands the original key analogously to prefix-sum computation.
    Round Key Selection: divides the expanded key between the rounds of the cipher.
Round Transformation
    4 sub-transformations applied during each round: ByteSub, ShiftRow, MixColumn, AddRoundKey.

19 The Rijndael Cipher: Parameters
Nb – Block Length (# of 32-bit columns in the state)
Nk – Key Length (# of 32-bit words in the key)
Nr – Number of Rounds
The key and state are represented as 2-dimensional arrays of bytes.

20 Representation of the State
The state is represented by a 4 x Nb array of bytes (Nb = 4, 6, or 8). For Nb = 4:
a0,0 a0,1 a0,2 a0,3
a1,0 a1,1 a1,2 a1,3
a2,0 a2,1 a2,2 a2,3
a3,0 a3,1 a3,2 a3,3
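A minimal sketch of the state layout and the parameter relationship follows. It relies on two conventions from the Rijndael specification rather than from the slides: input bytes fill the state column by column, and the round count is Nr = max(Nb, Nk) + 6, with Nb and Nk counted in 32-bit words. The function names are hypothetical.

```python
def bytes_to_state(block: bytes):
    """Arrange a block into a 4 x Nb byte array, filling column by column."""
    if len(block) % 4 or len(block) // 4 not in (4, 6, 8):
        raise ValueError("block must be 16, 24, or 32 bytes")
    nb = len(block) // 4
    return [[block[row + 4 * col] for col in range(nb)] for row in range(4)]


def number_of_rounds(nb: int, nk: int) -> int:
    """Nr as a function of block and key length (both in 32-bit words)."""
    if nb not in (4, 6, 8) or nk not in (4, 6, 8):
        raise ValueError("Nb and Nk must be 4, 6, or 8")
    return max(nb, nk) + 6


state = bytes_to_state(bytes(range(16)))
for row in state:
    print(row)
print("Nr for Nb=4, Nk=4:", number_of_rounds(4, 4))
```

With Nb = 4 this gives the 10, 12, and 14 rounds of the three AES key sizes.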

21-24 The ByteSub Transformation
Apply an S-Box (an 8-bit lookup table) to every byte in the state: bi,j = S-Box(ai,j).
[Figure: each byte ai,j of the input state passes through the S-Box to give byte bi,j of the output state.]
1 processor: t(n) = O(Nb)
4 x Nb processors: t(n) = O(1)
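ByteSub can be sketched as a table lookup per byte. The slides do not say how the table is built, so the sketch assumes the standard Rijndael S-Box construction (multiplicative inverse in GF(2^8) followed by an affine map) and generates the table rather than hard-coding it. The helper names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product


def build_sbox():
    """Build the 256-entry S-Box: field inverse, then the affine transformation."""
    sbox = []
    for x in range(256):
        # Brute-force inverse; 0 has no inverse and maps to 0 by convention.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def byte_sub(state):
    """Apply the S-Box to every byte; each lookup is independent of the others."""
    return [[SBOX[byte] for byte in row] for row in state]


print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```

Because no lookup depends on another, one processor per byte brings the step count down to O(1), as the slide states.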

25-27 The Shift-Row Transformation
Cyclically shift each row of the state by a constant.
[Figure, Nb = 4: row 0 is unchanged; rows 1, 2, and 3 are rotated left by 1, 2, and 3 bytes.]
1 processor: t(n) = O(Nb)
4 x Nb processors: t(n) = O(1)
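The row shift can be sketched with list slicing. The default offsets (0, 1, 2, 3) match the Nb = 4 layout in the figure; the Rijndael specification uses (0, 1, 3, 4) for Nb = 8, which can be passed explicitly. The function name and the `offsets` parameter are hypothetical.

```python
def shift_row(state, offsets=(0, 1, 2, 3)):
    """Rotate row r of the state left by offsets[r] positions."""
    return [row[k:] + row[:k] for row, k in zip(state, offsets)]


state = [
    ["a00", "a01", "a02", "a03"],
    ["a10", "a11", "a12", "a13"],
    ["a20", "a21", "a22", "a23"],
    ["a30", "a31", "a32", "a33"],
]
for row in shift_row(state):
    print(row)
```

Each output byte is a copy of exactly one input byte, so with one processor per byte the whole transformation is a single step.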

28-31 The Mix-Column Transformation
Apply a 4x4 byte matrix to each column in the state.
[Figure: column (a0,j, a1,j, a2,j, a3,j) of the input state is multiplied by the 4x4 byte matrix to give column (b0,j, b1,j, b2,j, b3,j) of the output state.]
1 processor: t(n) = O(Nb)
O(Nb) processors: t(n) = O(1)
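The column mixing can be sketched as follows. The slides only mention a 4x4 byte matrix, so the sketch assumes the standard Rijndael matrix with rows (2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2) over GF(2^8). The helper names are hypothetical.

```python
def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mix_single_column(col):
    """Multiply one 4-byte column by the fixed matrix; 3*x is xtime(x) ^ x."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_column(state):
    """Apply mix_single_column to every column; columns are independent."""
    nb = len(state[0])
    cols = [mix_single_column([state[r][c] for r in range(4)]) for c in range(nb)]
    return [[cols[c][r] for c in range(nb)] for r in range(4)]


print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
```

Each column is a constant amount of work and no column depends on another, which is why O(Nb) processors suffice for an O(1) step count.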

32-34 The Add-Round-Key Transformation
XOR each state byte with the corresponding key byte: bi,j = ai,j XOR ki,j.
[Figure: State XOR Key gives the new State, byte by byte.]
1 processor: t(n) = O(Nb)
4 x Nb processors: t(n) = O(1)
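A byte-wise sketch of this step is shown below, assuming the state and the round key are both 4 x Nb arrays of bytes. The function name is hypothetical.

```python
def add_round_key(state, round_key):
    """XOR each state byte with the round-key byte at the same position."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]


state = [[0x0F, 0x00], [0xAA, 0x55]]
key = [[0xF0, 0xFF], [0xAA, 0x0F]]
mixed = add_round_key(state, key)
print(mixed)                      # [[255, 255], [0, 90]]
print(add_round_key(mixed, key))  # XOR with the same key restores the state
```

Applying the same key twice returns the original state, which is why the same step also serves for decryption.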

35 The Round Transformation
For i = 1 to Nr – 1
    State ← ByteSub(State)
    State ← ShiftRow(State)
    State ← MixColumn(State)
    State ← AddRoundKey(State, Key)
Final Round (MixColumn is omitted):
    State ← ByteSub(State)
    State ← ShiftRow(State)
    State ← AddRoundKey(State, Key)

36 The Round Transformation
Sequential Model: p(n) = 1, t(n) = O(Nb x Nr)
Fully Parallel Model: p(n) = O(Nb), t(n) = O(Nr), s(n) = O(Nb), c(n) = O(Nb x Nr)
We have achieved cost-optimality!
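The speedup and cost figures for the fully parallel model can be checked with a short derivation. It assumes the standard definitions c(n) = p(n) t(n) and s(n) = t_seq(n) / t_par(n), which the transcript does not preserve, and treats each of the four sub-transformations as O(1) steps per round on O(Nb) processors.

```latex
\begin{align*}
t_{\text{par}}(n) &= N_r \cdot O(1) = O(N_r), \\
c(n) &= p(n)\, t_{\text{par}}(n) = O(N_b) \cdot O(N_r) = O(N_b N_r), \\
s(n) &= \frac{t_{\text{seq}}(n)}{t_{\text{par}}(n)} = \frac{O(N_b N_r)}{O(N_r)} = O(N_b).
\end{align*}
```

The cost matches the sequential step count O(Nb x Nr), which is the sense in which the round transformation is cost-optimal.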

37 Key Expansion Algorithm
For j = 0 to Nk - 1
    W[j] = (Key[4j], Key[4j+1], Key[4j+2], Key[4j+3])
For j = Nk to Nb x (Nr+1) - 1
    temp = W[j-1]
    if (j % Nk == 0)
        temp = SubByte(RotByte(temp)) XOR Rcon[j/Nk]
    else if (Nk > 6 && j % Nk == 4)
        temp = SubByte(temp)
    W[j] = W[j-Nk] XOR temp
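A runnable sketch of the key expansion is given below. It follows the 0-indexed convention of the Rijndael specification, stores each word W[j] as a 32-bit integer, and generates the S-Box and round constants instead of hard-coding them. It also assumes Nr = max(Nb, Nk) + 6. The helper names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product


def build_sbox():
    """S-Box: multiplicative inverse in GF(2^8) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word: int) -> int:
    """Apply the S-Box to each of the four bytes of a word (SubByte)."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def rot_word(word: int) -> int:
    """Rotate a word left by one byte (RotByte)."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


def key_expansion(key: bytes, nb: int = 4):
    """Expand the cipher key into Nb * (Nr + 1) words."""
    nk = len(key) // 4
    nr = max(nb, nk) + 6
    w = [int.from_bytes(key[4 * j:4 * j + 4], "big") for j in range(nk)]
    rcon = 0x01  # round constant, doubled in GF(2^8) after each use
    for j in range(nk, nb * (nr + 1)):
        temp = w[j - 1]
        if j % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and j % nk == 4:
            temp = sub_word(temp)
        w.append(w[j - nk] ^ temp)
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), hex(words[4]), hex(words[43]))
```

The key used here is the 128-bit example key from FIPS-197, so the printed words can be compared against the expansion listed in that standard.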

38 Key Expansion Algorithm on a Uniprocessor (Sequential) Machine
Basic Algorithm Structure:
For j = 0 to Nk - 1 { … }: Nk iterations
For j = Nk to Nb x (Nr+1) - 1 { … }: Nb x (Nr + 1) - Nk iterations
Total: Nb x (Nr + 1) iterations
1 processor: t(n) = O(Nb x Nr)

39-41 Key Expansion Algorithm on a Parallel Machine
The loop-carried dependence appears to render the algorithm impossible to parallelize…
For j = Nk to Nb x (Nr+1) - 1
    temp = W[j-1]
    W[j] = W[j-Nk] XOR temp
… Observe that XOR is a binary associative operation.
This algorithm is simply a variant of Prefix Sum with XOR instead of +.
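One way to read the prefix-sum claim is sketched below for a single group of Nk consecutive words, assuming Nk <= 6 so that only the first word of the group receives the nonlinear SubByte/RotByte/Rcon term. That term is passed in as `g`, and the new group is then the running XOR of the previous group's words. The function names are hypothetical, and `itertools.accumulate` stands in for whatever parallel scan a real machine would use.

```python
from itertools import accumulate
from operator import xor


def next_group_sequential(prev, g):
    """Direct form of the recurrence W[j] = W[j-Nk] XOR temp for one group."""
    new = []
    for i, prev_word in enumerate(prev):
        temp = g if i == 0 else new[i - 1]
        new.append(prev_word ^ temp)
    return new


def next_group_scan(prev, g):
    """Same group written as a prefix XOR over the previous group's words."""
    seeded = [prev[0] ^ g] + list(prev[1:])
    return list(accumulate(seeded, xor))


# First four key words of the FIPS-197 example key, and the transformed
# value of W[3] (SubByte(RotByte(W[3])) XOR Rcon[1]) for that key.
prev = [0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C]
g = 0x8B84EB01

print([hex(w) for w in next_group_scan(prev, g)])
```

The sketch covers one group only: `g` depends on the last word of the previous group, so consecutive groups are still linked through that nonlinear term.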

42 Key Expansion Algorithm
To compute the prefix sum cost-optimally:

43 Round Key Selection
Words W[Nb x i] through W[Nb x (i+1) – 1] are chosen to be the key for round i.
Can be interleaved with the Key Expansion phase with no additional overhead.
[Figure: the expanded key split into W[0..Nb-1], W[Nb..2Nb-1], …, W[NbNr..Nb(Nr+1)-1].]
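Round key selection amounts to slicing the expanded key, as this short sketch shows. It assumes `w` is the list of expanded key words and that rounds are numbered from 0. The function name is hypothetical, and the word values are placeholders rather than real key material.

```python
def round_key(w, i, nb=4):
    """Return the Nb words W[Nb*i] .. W[Nb*(i+1) - 1] used in round i."""
    return w[nb * i:nb * (i + 1)]


# Placeholder expanded key for Nb = 4, Nr = 10: 44 words labelled 0..43.
w = list(range(44))
print(round_key(w, 0))   # [0, 1, 2, 3]
print(round_key(w, 10))  # [40, 41, 42, 43]
```

Since the slice for round i only needs words that the expansion has already produced, selection can proceed alongside expansion, consistent with the "no additional overhead" remark.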

44 Key Schedule
Sequential Algorithm
Parallel (Prefix-Sum) Algorithm

45 The Rijndael Cipher: Sequential Model
Key Schedule
Round Transformation
Overall

46 The Rijndael Cipher: Parallel Model
Key Schedule
Round Transformation

47 The Rijndael Cipher: Parallel Model
Altogether
This model does NOT yield a cost-optimal solution!

48 Achieving Cost Optimality with a Parallel Model of Computation
Reduce the number of processors from
The Round Transformation requires time
The Key Schedule requires time

49 Achieving Cost Optimality
Final Results:
Speedup and Cost:

50 Summary of Results
Fastest Model
Cost-Optimal Model

51 Outline: Concluding Remarks
Introduction
Background Material
Analysis of the Rijndael Cipher
Concluding Remarks

52 Concluding Remarks
First theoretical study of the parallelism inherent in the Rijndael AES.
The fastest parallel model was not cost-optimal; some acceleration was sacrificed in order to achieve cost-optimality.

