Lecture 2B
RTL Design Methodology: Transition from Pseudocode & Interface to a Corresponding Block Diagram
Structure of a Typical Digital System
[Block diagram: Datapath (Execution Unit) with Data Inputs and Data Outputs; Controller (Control Unit) with Control & Status Inputs and Control & Status Outputs; Control Signals and Status Signals between Controller and Datapath.]
Hardware Design with RTL VHDL
[Flow diagram: Pseudocode and Interface; Datapath: Block diagram, then VHDL code; Controller: ASM chart, then VHDL code.]
Steps of the Design Process Introduced in Class Today
1. Text description
2. Interface
3. Pseudocode
4. Block diagram of the Datapath
5. Interface divided into Datapath and Controller
6. ASM chart of the Controller
7. RTL VHDL code of the Datapath, Controller, and Top-Level Unit
8. Testbench for the Datapath, Controller, and Top-Level Unit
9. Functional simulation and debugging
10. Synthesis and post-synthesis simulation
11. Implementation and timing simulation
12. Experimental testing using FPGA board
Class Exercise 2 CIPHER
Pseudocode
Split input I into four words, I3, I2, I1, I0, each w bits in size
A = I3; B = I2; C = I1; D = I0
B = B + S[0]
D = D + S[1]
for i = 1 to r do
{
    T = (B*(2B + 1)) <<< k
    U = (D*(2D + 1)) <<< k
    A = ((A ⊕ T) <<< U) + S[2i]
    C = ((C ⊕ U) <<< T) + S[2i + 1]
    (A, B, C, D) = (B, C, D, A)
}
A = A + S[2r + 2]
C = C + S[2r + 3]
O = (A, B, C, D)
Notation
w: word size, e.g., w=8 (constant)
k: log2(w) (constant)
A, B, C, D, U, T: w-bit variables
I3, I2, I1, I0: four w-bit words of the input I
r: number of rounds (constant)
O: output of the size of 4w bits
S[j]: 2r+4 round keys stored in two RAMs. Each key is a w-bit word. The first RAM stores values of S[j=2i], i.e., only round keys with even indices. The second RAM stores values of S[j=2i+1], i.e., only round keys with odd indices.
Operations
⊕ : XOR
+ : addition modulo 2^w
– : subtraction modulo 2^w
* : multiplication modulo 2^w
X <<< Y : rotation of X to the left by the number of positions given in Y
X >>> Y : rotation of X to the right by the number of positions given in Y
(A, B, C, D) : concatenation of A, B, C, and D
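The pseudocode, notation, and operations above can be checked against a small software reference model, which is also handy for generating expected values for a testbench. The following is a minimal sketch in Python, not the course's reference implementation; the function names rotl and cipher_encrypt are hypothetical, and it assumes that w is a power of two and that a rotation amount is taken modulo w.

```python
def rotl(x, n, w):
    """Rotate the w-bit value x left by n positions (n is reduced modulo w)."""
    n %= w
    mask = (1 << w) - 1
    return ((x << n) | (x >> (w - n))) & mask


def cipher_encrypt(I, S, r, w=8):
    """Software model of the CIPHER pseudocode.

    I: tuple (I3, I2, I1, I0) of w-bit words.
    S: list of 2r+4 round keys, each a w-bit word.
    Returns the tuple (A, B, C, D) that forms the output O.
    """
    mask = (1 << w) - 1
    k = w.bit_length() - 1            # k = log2(w), assuming w is a power of two
    A, B, C, D = I
    B = (B + S[0]) & mask             # addition modulo 2^w
    D = (D + S[1]) & mask
    for i in range(1, r + 1):
        T = rotl((B * (2 * B + 1)) & mask, k, w)
        U = rotl((D * (2 * D + 1)) & mask, k, w)
        A = (rotl(A ^ T, U, w) + S[2 * i]) & mask
        C = (rotl(C ^ U, T, w) + S[2 * i + 1]) & mask
        A, B, C, D = B, C, D, A       # (A, B, C, D) = (B, C, D, A)
    A = (A + S[2 * r + 2]) & mask
    C = (C + S[2 * r + 3]) & mask
    return (A, B, C, D)


# Illustrative call with arbitrary values: w = 8, r = 2, so 2r+4 = 8 round keys.
print(cipher_encrypt((0x12, 0x34, 0x56, 0x78), [1, 2, 3, 4, 5, 6, 7, 8], r=2))
```

Every intermediate result is masked to w bits, which mirrors the fixed-width registers of the eventual datapath.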
Circuit Interface
[Block symbol: CIPHER. Ports: clk, reset, I (4w bits), write_I, Sj, j (m bits), write_Sj, O (4w bits), DONE.]
Interface Table
Note: m is the size of the index j, in bits. It is the minimum integer such that 2^m − 1 ≥ 2r+3.
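The width m can be computed directly from r. The sketch below assumes the note is read as "the smallest m with 2^m − 1 ≥ 2r+3" (the comparison symbol is not legible in the slide, so this reading is an assumption based on j having to address indices 0 through 2r+3); the name index_width is hypothetical.

```python
def index_width(r):
    """Smallest m such that 2**m - 1 >= 2*r + 3 (largest key index is 2r+3)."""
    m = 0
    while (1 << m) - 1 < 2 * r + 3:
        m += 1
    return m


# Illustrative values of r
for r in (1, 2, 3, 20):
    print(r, index_width(r))
```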
Protocol (1)
An external circuit first loads all round keys S[0], S[1], S[2], …, S[2r+2], S[2r+3] to the two internal memories of the CIPHER unit. The first memory stores values of S[j=2i], i.e., only round keys with even indices. The second memory stores values of S[j=2i+1], i.e., only round keys with odd indices. Loading round keys is performed using inputs: Sj, j, write_Sj, clk. Then the external circuit loads an input block I to the CIPHER unit, using inputs: I, write_I, clk. After the input block I is loaded to the CIPHER unit, the encryption starts automatically.
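The split of round keys between the two memories can be modeled in a few lines. This is a behavioral sketch only: the function name load_round_keys is hypothetical, and using the upper bits of j as the address inside each memory (with the least significant bit selecting the memory) is an assumption, not something the protocol states.

```python
def load_round_keys(keys):
    """Model writing S[0..2r+3], one key per clock cycle, into two memories."""
    half = len(keys) // 2
    even_ram = [0] * half              # holds S[2i]
    odd_ram = [0] * half               # holds S[2i+1]
    for j, Sj in enumerate(keys):      # one write_Sj pulse per key
        addr = j >> 1                  # assumed address: j without its LSB
        if j & 1:
            odd_ram[addr] = Sj
        else:
            even_ram[addr] = Sj
    return even_ram, odd_ram


# Illustrative keys for r = 2 (2r+4 = 8 keys)
even_ram, odd_ram = load_round_keys([100, 101, 102, 103, 104, 105, 106, 107])
print(even_ram, odd_ram)
```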
Protocol (2) When the encryption is completed, signal DONE becomes active, and the output O changes to the new value of the ciphertext. The output O keeps the last value of the ciphertext at the output, until the next encryption is completed. Before the first encryption is completed, this output should be equal to zero.
Assumptions
• 2r+4 clock cycles are used to load round keys to internal RAMs
• one round of the main for loop of the pseudocode executes in one clock cycle
• you can access only one position of each internal memory of round keys per clock cycle
As a result, the encryption of a single input block I should last r+2 clock cycles.
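One schedule consistent with these assumptions reads one even-indexed and one odd-indexed key per clock cycle: S[0] and S[1] for the initial additions, S[2i] and S[2i+1] for round i, and S[2r+2] and S[2r+3] for the final additions, giving r+2 cycles in total. The sketch below only enumerates that schedule; the function names are hypothetical, and the assignment of the first and last cycles to the additions outside the loop is an assumption.

```python
def key_load_cycles(r):
    """Clock cycles needed to load all 2r+4 round keys, one key per cycle."""
    return 2 * r + 4


def key_reads_per_cycle(r):
    """Per encryption cycle, the (even, odd) key indices read from the two RAMs."""
    return [(2 * c, 2 * c + 1) for c in range(r + 2)]


# Illustrative value r = 3
print(key_load_cycles(3))
print(key_reads_per_cycle(3))
```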
Typical Internal Structure of a Secret-Key Block Cipher
[Flowchart: Initial Transformation using Round Key[0]; i:=1; Cipher Round using Round Key[i], then i:=i+1, repeated #rounds times (loop test: i<#rounds?); Final Transformation using Round Key[#rounds+1].]
Basic iterative architecture (no Initial Transformation, no Final Transformation)
[Block diagram: input, multiplexer, register, combinational logic of one round with round key, output.]
Basic architecture: Timing
[Timing diagram: CLK; blocks P1, P2, P3 at IN; blocks C1, C2 at OUT; marked interval: #rounds · clock_period.]
Primary parameters of hardware implementations of secret-key block ciphers
Latency: time to encrypt/decrypt a single block of data.
Throughput: number of bits encrypted/decrypted in a unit of time.
[Diagrams: for latency, a single block Mi passes through encryption/decryption to give Ci; for throughput, a sequence of blocks Mi, Mi+1, Mi+2 gives Ci, Ci+1, Ci+2.]
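For the basic iterative architecture, the two parameters can be estimated with simple arithmetic. The sketch below assumes one round per clock cycle, so latency is #rounds · clock_period, and assumes a new block starts only after the previous one finishes, so throughput is block size divided by latency. The numbers (10 rounds, 128-bit block, 20 ns clock) are illustrative, and the function names are hypothetical.

```python
def iterative_latency_ns(rounds, clock_period_ns):
    """Latency of the basic iterative architecture: one round per clock cycle."""
    return rounds * clock_period_ns


def iterative_throughput_gbit_per_s(block_size_bits, rounds, clock_period_ns):
    """One block completes per latency; bits per nanosecond equals Gbit/s."""
    return block_size_bits / iterative_latency_ns(rounds, clock_period_ns)


latency = iterative_latency_ns(rounds=10, clock_period_ns=20)
rate = iterative_throughput_gbit_per_s(block_size_bits=128, rounds=10, clock_period_ns=20)
print(latency, "ns;", rate, "Gbit/s")
```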
Advanced Encryption Standard AES
AES Encryption
AES Decryption
Basic Iterative Architecture of AES (Encryption and Decryption) Data input round key Encryption circuit Decryption circuit R1 SubBytes & InvSubBytes ShiftRows InvShiftRows round key MixColumns round key round key InvMixColumns Data output
Initial Transformation Data input Initial Transformation round key Encryption input Decryption input round key round key Encryption round Decryption round Encryption feedback output Encryption final output Decryption final output Decryption feedback output Data output
Top Level Block Diagram control input key input interface Control unit key scheduling encryption/decryption memory of round keys output interface output
Modes of Operation
Block vs. stream ciphers
Block cipher: blocks M1, M2, …, Mn are encrypted under key K to C1, C2, …, Cn
Ci = fK(Mi)
Every block of ciphertext is a function of only one corresponding block of plaintext.
Stream cipher: blocks m1, m2, …, mn are encrypted under key K, using an internal state IS, to c1, c2, …, cn
ci = fK(mi, ISi)
ISi+1 = gK(mi, ISi)
Every block of ciphertext is a function of the current block of plaintext and the current internal state of the cipher.
Typical stream cipher
[Diagram: Sender and Receiver each run a Pseudorandom Key Generator seeded with the same key and initialization vector (seed), producing the same keystream ki. Sender: plaintext mi combined with ki gives ciphertext ci. Receiver: ciphertext ci combined with ki gives plaintext mi.]
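The sender/receiver symmetry of the diagram can be illustrated with a toy model. This sketch assumes the keystream is combined with the data by XOR, and it uses Python's random.Random as a stand-in pseudorandom key generator; that generator is not cryptographically secure and is used here for illustration only. The names keystream and stream_xor are hypothetical.

```python
import random


def keystream(key, iv, n):
    """Toy pseudorandom key generator: n keystream bytes from (key, iv).

    random.Random is NOT cryptographically secure; illustration only.
    """
    gen = random.Random(f"{key}:{iv}")
    return [gen.getrandbits(8) for _ in range(n)]


def stream_xor(key, iv, data):
    """Sender and receiver run the same operation: data XOR keystream."""
    ks = keystream(key, iv, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


ciphertext = stream_xor(key=1234, iv=42, data=b"plaintext")
recovered = stream_xor(key=1234, iv=42, data=ciphertext)
print(recovered)
```

Because both sides derive the same keystream from the same key and seed, applying the XOR twice returns the original plaintext.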
Standard modes of operation of block ciphers
Block cipher: ECB mode
Block cipher turned into a stream cipher: Counter mode, CFB mode, CBC mode
ECB (Electronic CodeBook) mode
Electronic CodeBook Mode – ECB Encryption
[Diagram: each plaintext block M1, M2, M3, ..., MN-1, MN is encrypted independently by E under the same key K, giving C1, C2, C3, ..., CN-1, CN.]
Ci = EK(Mi) for i=1..N
Electronic CodeBook Mode – ECB Decryption
[Diagram: each ciphertext block C1, C2, C3, ..., CN-1, CN is decrypted independently by D under the same key K, giving M1, M2, M3, ..., MN-1, MN.]
Mi = DK(Ci) for i=1..N
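ECB can be sketched with any invertible block function. The example below uses a hypothetical 16-bit toy cipher (E and D are placeholders invented for illustration, not AES and not secure) to show that each block is processed independently, so equal plaintext blocks give equal ciphertext blocks. The key K is assumed to fit in 16 bits.

```python
MASK = 0xFFFF  # toy 16-bit block size (assumption for illustration)


def rotl16(x, n):
    return ((x << n) | (x >> (16 - n))) & MASK


def rotr16(x, n):
    return ((x >> n) | (x << (16 - n))) & MASK


def E(K, m):
    """Toy invertible block cipher (hypothetical, not secure)."""
    return rotl16((m + K) & MASK, 3) ^ K


def D(K, c):
    """Inverse of the toy cipher E."""
    return (rotr16(c ^ K, 3) - K) & MASK


def ecb_encrypt(K, M):
    """Ci = EK(Mi) for every block, independently."""
    return [E(K, m) for m in M]


def ecb_decrypt(K, C):
    """Mi = DK(Ci) for every block, independently."""
    return [D(K, c) for c in C]


C = ecb_encrypt(0x1234, [1, 2, 1, 0xFFFF])
print(C[0] == C[2], ecb_decrypt(0x1234, C))
```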
Counter Mode
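Counter Mode - CTR Encryption
[Diagram: counter values IV, IV+1, IV+2, ..., IV+N-2, IV+N-1 are each encrypted by E under key K, giving keystream blocks k1, k2, k3, ..., kN-1, kN, which are XORed with m1, m2, m3, ..., mN-1, mN to give c1, c2, c3, ..., cN-1, cN.]
ci = mi ⊕ ki
ki = EK(IV+i-1) for i=1..N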
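Counter Mode - CTR Decryption
[Diagram: counter values IV, IV+1, IV+2, ..., IV+N-2, IV+N-1 are each encrypted by E under key K, giving keystream blocks k1, k2, k3, ..., kN-1, kN, which are XORed with c1, c2, c3, ..., cN-1, cN to give m1, m2, m3, ..., mN-1, mN.]
mi = ci ⊕ ki
ki = EK(IV+i-1) for i=1..N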
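Counter Mode – CTR (simplified block diagram)
[Block diagram, shown for both encryption and decryption: a counter initialized with IV holds ISi, which feeds input IN of E under key K; output OUT is XORed with mi to give ci (encryption) or with ci to give mi (decryption).]
IS1 = IV
ci = EK(ISi) ⊕ mi
ISi+1 = ISi + 1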
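Counter mode can be sketched as a single function that serves for both encryption and decryption, since only E is ever used. The example reads the slide formulas as ki = EK(IV+i-1) and ci = mi ⊕ ki (the XOR is assumed where the operator is not legible). E is a hypothetical 16-bit toy cipher invented for illustration, not AES and not secure, and wrapping the counter modulo 2^16 is an assumption.

```python
MASK = 0xFFFF  # toy 16-bit block size (assumption for illustration)


def rotl16(x, n):
    return ((x << n) | (x >> (16 - n))) & MASK


def E(K, m):
    """Toy block cipher (hypothetical, not secure); K is assumed to be 16 bits."""
    return rotl16((m + K) & MASK, 3) ^ K


def ctr_crypt(K, IV, blocks):
    """Encrypts or decrypts: out_i = in_i XOR EK(IV + i - 1), i = 1..N."""
    out = []
    for i, x in enumerate(blocks, start=1):
        k_i = E(K, (IV + i - 1) & MASK)   # keystream block from the counter
        out.append(x ^ k_i)
    return out


M = [0x0001, 0x0001, 0xBEEF]
C = ctr_crypt(0x1234, 0x00FF, M)
print(ctr_crypt(0x1234, 0x00FF, C) == M)
```

Each keystream block depends only on the counter value, which is why the blocks can be computed independently of one another.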
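Counter Mode – Potential for Parallel Processing
[Diagram: counter values IV, IV+1, IV+2, ..., IV+N-1, IV+N are encrypted by independent E units; the results are XORed with plaintext blocks M0, M1, M2, ..., MN-1, MN to give the corresponding ciphertext blocks.]
Ci = Mi ⊕ AES(IV+i) for i=0..N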
Increasing speed by parallel processing
[Diagram: six Encryption/decryption units operating in parallel.]
Increasing speed using pipelining
[Diagram: two ciphers, Cipher 1 and Cipher 2, with their rounds (round 1, round 2, ..., up to round 10 and round 16) as pipeline stages; target clock period, e.g., 20 ns.]
Throughput = block size / target_clock_period
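The effect of pipelining on the two parameters can be sketched numerically. The code assumes a full pipeline with one round per stage, so a new block enters every clock cycle (throughput = block size / target_clock_period) while each block still takes one cycle per stage to pass through. The 128-bit block size and 10 stages are illustrative values, and the function names are hypothetical.

```python
def pipelined_throughput_gbit_per_s(block_size_bits, target_clock_period_ns):
    """One block leaves the pipeline every clock cycle; bits/ns equals Gbit/s."""
    return block_size_bits / target_clock_period_ns


def pipelined_latency_ns(stages, target_clock_period_ns):
    """A single block still passes through every stage, one stage per cycle."""
    return stages * target_clock_period_ns


print(pipelined_throughput_gbit_per_s(128, 20), "Gbit/s")
print(pipelined_latency_ns(10, 20), "ns")
```

Under these assumptions pipelining raises throughput without reducing the latency of an individual block.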
CFB (Cipher FeedBack) Mode
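Cipher Feedback Mode - CFB Encryption
[Diagram: IV feeds the first E; each keystream block k1, k2, k3, ..., kN-1, kN is XORed with m1, m2, m3, ..., mN-1, mN to give c1, c2, c3, ..., cN-1, cN, and each ciphertext block feeds the next E.]
ci = mi ⊕ ki
ki = EK(ci-1) for i=1..N, and c0 = IV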
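Cipher Feedback Mode - CFB Decryption
[Diagram: IV feeds the first E; each keystream block k1, k2, k3, ..., kN-1, kN is XORed with c1, c2, c3, ..., cN-1, cN to give m1, m2, m3, ..., mN-1, mN, and each ciphertext block feeds the next E.]
mi = ci ⊕ ki
ki = EK(ci-1) for i=1..N, and c0 = IV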
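Cipher Feedback Mode – CFB (simplified block diagram)
[Block diagram, shown for both encryption and decryption: a register initialized with IV holds ISi, which feeds input IN of E under key K; output OUT is XORed with mi to give ci (encryption) or with ci to give mi (decryption); ci is fed back into the register.]
IS1 = IV
ci = EK(ISi) ⊕ mi
ISi+1 = ci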
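The CFB feedback loop can be sketched as follows, reading the slide formulas as ki = EK(ci-1), ci = mi ⊕ ki, and c0 = IV (the XOR is assumed where the operator is not legible). E is a hypothetical 16-bit toy cipher invented for illustration, not AES and not secure. Note that decryption also uses E, never its inverse.

```python
MASK = 0xFFFF  # toy 16-bit block size (assumption for illustration)


def rotl16(x, n):
    return ((x << n) | (x >> (16 - n))) & MASK


def E(K, m):
    """Toy block cipher (hypothetical, not secure); K is assumed to be 16 bits."""
    return rotl16((m + K) & MASK, 3) ^ K


def cfb_encrypt(K, IV, M):
    """ci = mi XOR EK(c(i-1)), with c0 = IV."""
    C, prev = [], IV
    for m in M:
        c = m ^ E(K, prev)
        C.append(c)
        prev = c                  # register is loaded with the new ciphertext
    return C


def cfb_decrypt(K, IV, C):
    """mi = ci XOR EK(c(i-1)), with c0 = IV."""
    M, prev = [], IV
    for c in C:
        M.append(c ^ E(K, prev))
        prev = c                  # register is loaded with the received ciphertext
    return M


M = [0x0005, 0x0005, 0xCAFE]
C = cfb_encrypt(0x1234, 0x00FF, M)
print(cfb_decrypt(0x1234, 0x00FF, C) == M)
```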
CBC (Cipher Block Chaining) Mode
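Cipher Block Chaining Mode - CBC Encryption
[Diagram: each plaintext block m1, m2, m3, ..., mN-1, mN is XORed with the previous ciphertext block (IV for the first block) and then encrypted by E, giving c1, c2, c3, ..., cN-1, cN.]
ci = EK(mi ⊕ ci-1) for i=1..N
c0 = IV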
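Cipher Block Chaining Mode - CBC Decryption
[Diagram: each ciphertext block c1, c2, c3, ..., cN-1, cN is decrypted by D and then XORed with the previous ciphertext block (IV for the first block), giving m1, m2, m3, ..., mN-1, mN.]
mi = DK(ci) ⊕ ci-1 for i=1..N
c0 = IV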
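CBC encryption and decryption can be sketched side by side, reading the chaining operation as XOR: ci = EK(mi ⊕ ci-1) and mi = DK(ci) ⊕ ci-1, with c0 = IV. E and D form a hypothetical 16-bit toy cipher invented for illustration, not AES and not secure. Unlike the counter and CFB sketches, decryption here needs the inverse function D.

```python
MASK = 0xFFFF  # toy 16-bit block size (assumption for illustration)


def rotl16(x, n):
    return ((x << n) | (x >> (16 - n))) & MASK


def rotr16(x, n):
    return ((x >> n) | (x << (16 - n))) & MASK


def E(K, m):
    """Toy invertible block cipher (hypothetical, not secure)."""
    return rotl16((m + K) & MASK, 3) ^ K


def D(K, c):
    """Inverse of the toy cipher E."""
    return (rotr16(c ^ K, 3) - K) & MASK


def cbc_encrypt(K, IV, M):
    """ci = EK(mi XOR c(i-1)), with c0 = IV."""
    C, prev = [], IV
    for m in M:
        prev = E(K, m ^ prev)
        C.append(prev)
    return C


def cbc_decrypt(K, IV, C):
    """mi = DK(ci) XOR c(i-1), with c0 = IV."""
    M, prev = [], IV
    for c in C:
        M.append(D(K, c) ^ prev)
        prev = c
    return M


M = [0x0005, 0x0005, 0xCAFE]
C = cbc_encrypt(0x1234, 0x00FF, M)
print(C[0] != C[1], cbc_decrypt(0x1234, 0x00FF, C) == M)
```

Encryption is inherently sequential because each block waits for the previous ciphertext, whereas decryption of block i needs only ci and ci-1.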