1 Performance Investigations Hannes Tschofenig, Manuel Pégourié-Gonnard 25 th March 2015.

Slides:



Advertisements
Similar presentations
Doc.: IEEE /1012r0 Submission September 2009 Dan Harkins, Aruba NetworksSlide 1 Suite-B Compliance for a Mesh Network Date: Authors:
Advertisements

Web security: SSL and TLS
Spring 2000CS 4611 Security Outline Encryption Algorithms Authentication Protocols Message Integrity Protocols Key Distribution Firewalls.
Internet and Intranet Protocols and Applications Lecture 9a: Secure Sockets Layer (SSL) March, 2004 Arthur Goldberg Computer Science Department New York.
TLS Introduction 14.2 TLS Record Protocol 14.3 TLS Handshake Protocol 14.4 Summary.
SSL CS772 Fall Secure Socket layer Design Goals: SSLv2) SSL should work well with the main web protocols such as HTTP. Confidentiality is the top.
An Introduction to Secure Sockets Layer (SSL). Overview Types of encryption SSL History Design Goals Protocol Problems Competing Technologies.
Efficient Public Key Infrastructure Implementation in Wireless Sensor Networks Wireless Communication and Sensor Computing, ICWCSC International.
Transport Layer Security (TLS) Protocol Introduction to networks and communications(CS555) Prof : Dr Kurt maly Student:Abhinav y.
Cryptography and Network Security
Information Security Principles & Applications Topic 4: Message Authentication 虞慧群
1 Digital Signatures & Authentication Protocols. 2 Digital Signatures have looked at message authentication –but does not address issues of lack of trust.
Small(er) Footprint for TLS Implementations Hannes Tschofenig Smart Object Security workshop, March 2012, Paris.
Web Security for Network and System Administrators1 Chapter 4 Encryption.
Session 5 Hash functions and digital signatures. Contents Hash functions – Definition – Requirements – Construction – Security – Applications 2/44.
Lesson Title: Introduction to Cryptography Dale R. Thompson Computer Science and Computer Engineering Dept. University of Arkansas
Cryptography and Network Security Chapter 17
FIT3105 Smart card based authentication and identity management Lecture 4.
Dr Alejandra Flores-Mosri Message Authentication Internet Management & Security 06 Learning outcomes At the end of this session, you should be able to:
November 1, 2006Sarah Wahl / Graduate Student UCCS1 Public Key Infrastructure By Sarah Wahl.
Mar 5, 2002Mårten Trolin1 Previous lecture More on hash functions Digital signatures Message Authentication Codes Padding.
Secure Hashing and DSS Sultan Almuhammadi ICS 454 Principles of Cryptography.
How cryptography is used to secure web services Josh Benaloh Cryptographer Microsoft Research.
Fall 2010/Lecture 311 CS 426 (Fall 2010) Public Key Encryption and Digital Signatures.
Cryptography and Network Security Chapter 11 Fourth Edition by William Stallings Lecture slides by Lawrie Brown/Mod. & S. Kondakci.
ACE – Design Considerations Corinna Schmitt IETF ACE WG meeting July 23,
Cryptography1 CPSC 3730 Cryptography Chapter 11, 12 Message Authentication and Hash Functions.
Feb 19, 2002Mårten Trolin1 Previous lecture Practical things about the course. Example of cryptosystem — substitution cipher. Symmetric vs. asymmetric.
TrustPort Public Key Infrastructure. Keep It Secure Table of contents  Security of electronic communications  Using asymmetric cryptography.
CSE 597E Fall 2001 PennState University1 Digital Signature Schemes Presented By: Munaiza Matin.
Network Security Essentials Fifth Edition by William Stallings Fifth Edition by William Stallings.
Chapter 8.  Cryptography is the science of keeping information secure in terms of confidentiality and integrity.  Cryptography is also referred to as.
CN8816: Network Security1 Confidentiality, Integrity & Authentication Confidentiality - Symmetric Key Encryption Data Integrity – MD-5, SHA and HMAC Public/Private.
1 Chapter 4 Encryption. 2 Objectives In this chapter, you will: Learn the basics of encryption technology Recognize popular symmetric encryption algorithms.
Cryptography and Network Security Chapter 11 Fifth Edition by William Stallings Lecture slides by Lawrie Brown.
Bob can sign a message using a digital signature generation algorithm
© Neeraj Suri EU-NSF ICT March 2006 DEWSNet Dependable Embedded Wired/Wireless Networks MUET Jamshoro Computer Security: Principles and Practice Slides.
Lecture slides prepared for “Computer Security: Principles and Practice”, 2/e, by William Stallings and Lawrie Brown, Chapter 21 “Public-Key Cryptography.
Understanding Cryptography – A Textbook for Students and Practitioners by Christof Paar and Jan Pelzl Chapter 10 – Digital Signatures.
Crypto Bro Rigby. History
Cosc 4765 SSL/TLS and VPN. SSL and TLS We can apply this generally, but also from a prospective of web services. Multi-layered: –S-http (secure http),
Key Management Workshop November 1-2, Cryptographic Algorithms, Keys, and other Keying Material  Approved cryptographic algorithms  Security.
Introduction to Secure Sockets Layer (SSL) Protocol Based on:
PORTING A NETWORK CRYPTOGRAPHIC SERVICE TO THE RMC2000 : A CASE STUDY IN EMBEDDED SOFTWARE DEVELOPMENT.
Cryptography and Network Security (CS435) Part Fourteen (Web Security)
Web Security : Secure Socket Layer Secure Electronic Transaction.
Chapter 21 Public-Key Cryptography and Message Authentication.
Computer Security: Principles and Practice First Edition by William Stallings and Lawrie Brown Lecture slides by Lawrie Brown Chapter 2 – Cryptographic.
Cryptography and Network Security Chapter 13 Fourth Edition by William Stallings.
1 Number Theory and Advanced Cryptography 6. Digital Signature Chih-Hung Wang Sept Part I: Introduction to Number Theory Part II: Advanced Cryptography.
Public Key Algorithms Lesson Introduction ●Modular arithmetic ●RSA ●Diffie-Hellman.
Faster Implementation of Modular Exponentiation in JavaScript
By Sandeep Gadi 12/20/  Design choices for securing a system affect performance, scalability and usability. There is usually a tradeoff between.
SMUCSE 5349/7349 SSL/TLS. SMUCSE 5349/7349 Layers of Security.
IPSec and TLS Lesson Introduction ●IPSec and the Internet key exchange protocol ●Transport layer security protocol.
Azam Supervisor : Prof. Raj Jain
A Cross-Protocol Attack on the TLSProtocol Nikos Mavrogiannopoulos, Frederik Vercauteren, VesselinVelichkov, Bart Preneel. Presented by: Nitin Subramanian.
1 SSL/TLS. 2 Web security Security requirements Secrecy to prevent eavesdroppers to learn sensitive information Entity authentication Message authentication.
Public-Key encryption structure First publicly proposed by Diffie and Hellman in 1976First publicly proposed by Diffie and Hellman in 1976 Based on mathematical.
@Yuan Xue CS 285 Network Security Secure Socket Layer Yuan Xue Fall 2013.
Cryptography CSS 329 Lecture 13:SSL.
The Federal Information Processing Standards (FIPS) Encryption Suite Sean Smith COSC
Page 1 of 17 M. Ufuk Caglayan, CmpE 476 Spring 2000, SSL and SET Notes, March 29, 2000 CmpE 476 Spring 2000 Notes on SSL and SET Dr. M. Ufuk Caglayan Department.
CS 465 TLS Last Updated: Oct 31, 2017.
The Secure Sockets Layer (SSL) Protocol
Public Key Infrastructure
Diffie-Hellman Key Exchange
Digital Signature Standard (DSS)
Measuring the Performance and Energy Cost of Cryptography in IoT Devices Hannes Tschofenig, Arm.
Presentation transcript:

1 Performance Investigations Hannes Tschofenig, Manuel Pégourié-Gonnard 25 th March 2015

2 Motivation  In we tried to provide guidance for the use of DTLS (TLS) when used in IoT deployments and included performance data to help understand the design tradeoffs.draft-ietf-lwig-tls-minimal  Later, work in the IETF DICE was started with the profile draft, which offers detailed guidance concerning credential types, communication patterns. It also indicates which extensions to use or not to use.profile draft  Goal of is to offer performance data based on the recommendations in the profile draft.draft-ietf-lwig-tls-minimal  This presentation is about the current status of gathering performance data for later inclusion into the document.draft-ietf-lwig-tls-minimal

3 Performance Data  This is the data we want:  Flash code size  Message size / Communication Overhead  CPU performance  Energy consumption  RAM usage  Also allows us to judge the improvements of various extensions and gives engineers a rough idea what to expect when planning to use DTLS/TLS in an IoT product.  offers preliminary data aboutdraft-ietf-lwig-tls-minimal-01  Code size of various basic building blocks (data from one stack only)  Memory (RAM/flash) (pre-shared secret credential only)  Communication overhead (high level only)

4  Goal of the authors: Determine performance of asymmetric cryptography on ARM-based processors.  Next slides explains  Assumptions for the measurements,  ARM processors used for the measurements,  Development boards used,  Actual performance data, and  Comparison with other algorithms. Overview

5  Main focus of the measurements so far was on  raw crypto (and not on protocol exchanges)  ECC rather than RSA  Different ECC curves  Run-time performance (not energy consumption, RAM usage, code size)  No hardware acceleration was used.  Used open source software; code based on PolarSSL/mbed TLS stack.  No hardware-based random number generator in the development platform was used  Not fit for real deployment. Assumptions

6 ARM Cortex-M Processors Lowest cost Low power Lowest power Outstanding energy efficiency Performance efficiency Feature rich connectivity Digital Signal Control (DSC) Processor with DSP Accelerated SIMD Floating point (FP) Processors use the 32-bit RISC architecture Recently released; Best performance Processors used in the performance tests

7 Prototyping Boards used in Performance Tests  ST Nucleo F401RE (STM32F401RET6)  ARM Cortex-M4 CPU with FPU at 84MHz  512KB Flash, 96KB SRAM  ST Nucleo F103 (STM32F103RBT6)  ARM Cortex-M4 CPU with FPU at 72MHz  128KB Flash, 20KB SRAM  ST Nucleo L152RE (STM32L152RET6)  ARM Cortex-M3 CPU at 32MHz  512 KBytes Flash, 80KB RAM  ST Nucleo F091 (STM32F091RCT6)  ARM Cortex-M0 CPU at 48MHz  256 KBytes Flash, 32KB RAM  NXP LPC1768  ARM Cortex-M3 CPU at 96MHz  512KB Flash, 32KB RAM  Freescale FRDM-KL25Z  ARM Cortex-M0+ CPU at 48MHz  128KB Flash, 16KB RAM ST Nucleo LPC1768 FRDM-KL25Z

8 ECC Curves  NIST curves: secp521r1, secp384r1, secp256r1, secp224r1, secp192r1  “Koblitz curves”: secp256k1, secp224k1, secp192k1  Brainpool curves: brainpoolP512r1, brainpoolP384r1, brainpoolP256r1  Curve25519 (only preliminary results).  Note that FIPS186-4 refers to secp192r1 as P-192, secp224r1 as P-224, secp256r1 as P-256, secp384r1 as P-384, and secp521r1 as P-521.

9 Optimizations  NIST Optimization  Utilizes special structure of NIST chosen curves.  Appendix 1 of  Longer version in FIPS PUB 186-4:   Relevant configuration parameter: POLARSSL_ECP_NIST_OPTIM  Fixed Point Optimization:  Pre-computes points  Described in  Relevant configuration parameter: POLARSSL_ECP_FIXED_POINT_OPTIM  Window:  Technique for more efficient exponentation  Sliding window technique described in  Relevant configuration parameter: POLARSSL_ECP_WINDOW_SIZE (min=2, max=7).

10 ECDSA, ECDHE, and ECDH  Elliptic Curve Digital Signature Algorithm (ECDSA) is the elliptic curve variant of the Digital Signature Algorithm (DSA) or, as it is sometimes called, the Digital Signature Standard (DSS).  It is used in TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 ciphersuite recommended in CoAP (and consequently also in the DTLS profile draft).  ECDSA, like DSA, has the property that poor randomness used during signature generation can compromise the long-term signing key.  For this reason the deterministic variant of (EC)DSA (RFC 6979) is implemented, which uses the private key as a source or “entropy” to seed a PRNG.  Note: None of the prototyping boards listed in the slide deck provide true random number generation.  CoAP recommends this ciphersuite TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 that makes use of the Ephemeral Elliptic Curve Diffie-Hellman (ECDHE).  The Elliptic Curve Diffie-Hellman (ECDH) is only used for comparison purposes in this slide deck but not used in the recommended ciphersuites.

11 Key Length SymmetricECCDH/DSA/RSA  Tradeoff between security and performance.  Values based on recommendations from RFC  [I-D.ietf-uta-tls-bcp] recommends at least 112 bits symmetric keys.I-D.ietf-uta-tls-bcp  A 2013 ENISA report states that an 80bit symmetric key is sufficient for legacy applications but recommends 128 bits for new systems.

12 Observations: Performance Figures  ECDSA signature operation is faster than ECDSA verify operation.  Brainpool curves are slower than NIST curves because Brainpool curves use random primes.  ECC key sizes above 256 bits are substantially slower than ECC curves with key size 192, 224, and 256.  ECDH is only slightly faster than ECDHE (when fixed point optimization is enabled).  CPU speed has a significant impact on the performance.  The performance of symmetric key cryptography (keyed hash functions, encryption functions) is neglectable.

13 Observations: Optimizations  NIST curve optimization provides substantial benefit for NIST secp*r1 curves.  Fixed point optimization has a significant influence on the performance.  There is a performance – RAM usage tradeoff: increased performance comes at the expense of additional RAM usage.  ECC library increases code size but also requires a fair amount of RAM for optimizations (for most curves).

14 ECC Performance of the Cortex M3/M4

15 NIST curves: secp521r1, secp384r1, secp256r1, secp224r1, secp192r1 Koblitz curves: secp256k1, secp224k1, secp192k1 Performance of various NIST/Koblitz ECC Curves

16 Performance difference between signature vs. verify For comparison: secp192r1 (signature) needs 66msec. For comparison: secp256r1 (signature) needs 122msec.

17 For comparison: Secp256r1 (signature) needs 122msec. Performance of Brainpool Curves

18 For comparison: Secp256r1 (verify) needs 458msec. Performance of Brainpool Curves

19 For comparison: secp192r1 (signature, W=7) needs 66msec. For comparison: secp521r1 (signature, W=7) needs 351msec. Performance impact of the “window” parameter

20 The Performance Impact of the NIST Optimization secp192r1 (ECDHE): 5986 msec (F401RE, optimization disabled) vs. 638 msec (optimization enabled)

21 ECC Performance of the Cortex M0/M0+

22 ECDHE Performance of the KL25Z

23 ECDSA Performance of the KL25Z

24 + FP optimization enabled

25 + FP optimization enabled

26 + FP optimization enabled

27 CPU Speed Impact

28 Performance of ECDHE: L152RE vs. LPC1768 secp192r1 (ECDHE): 1155 msec (L152RE) vs. 229 msec (LPC1768) L152RE: Cortex-M3 with 32MHz LPC1768: Cortex-M3 with 96MHz NIST optimization enabled. Fixed-point speed-up enabled.

29 Performance Comparison: Prototyping Boards

30 Curve25519 (Warning: Preliminary Results)

31 Notes: The Curve25519-mbedtls implementation uses a generic libary. Hence, the special properties of Curve25519 are not utilized. Curve25519 has very low RAM requirements (~1 Kbyte only). Curve25519-donna is based on the Google implementation. Improvements for M0/M0+ are likely since the code has not been tailored to the architecture. Question: Is Curve25519 a way to get ECC on M0/M0+?

32 The Power of Assembly Optimizations  Example: micro-ecc library   Written in C, with optional inline assembly for ARM and Thumb platforms.  LPC1114 at 48MHz (ARM Cortex-M0) ECDH time (ms)secp192r1secp256r1 LPC STM32F091604, ECDSA verify time (ms)secp192r1Secp256r1 LPC STM32F  Performance improvement between 200 and 300 %

33 RAM Usage

34 What was measured?  Heap using a custom memory allocation handler (instead of malloc).  Memory allocated on the stack was not measured (but it is negligible).  Measurement was done on a Linux PC (rather than on the embedded device itself for convenience reasons).  Two aspects investigated:  Memory impact caused by different window parameter changes.  Memory impact caused by FP performance optimization.

35

36

37 Note: NIST optimization enabled in both cases since it does not have an impact on the heap usage.

38 Summary  To enable certain optimizations sufficient RAM is needed. A tradeoff decision between RAM and speed.  Optimizations pays off.  This slide shows heap usage (NIST optimization enabled).

39 Using ~50 % more RAM increases the performance by a factor 8 or more.

40 Applying Results to TLS/DTLS

41 Raw Public Keys with TLS_ECDHE_ECDSA_*  TLS / DTLS client needs to perform the following computations: 1. Client verifies the signature covering the Server Key Exchange message that contains the server's ephemeral ECDH public key (and the corresponding elliptic curve domain parameters). 2. Client computes ECDHE. 3. Client creates signature over the Client Key Exchange message containing the client's ephemeral ECDH public key (and the corresponding elliptic curve domain parameters).  Summary:  1 x ECDSA verification for step (1)  1 x ECDHE computation for step (2)  1 x ECDSA signature for step (3)  Example (LPC1768, secp224r1, W=7, FP and NIST optimization enabled)  329msec (ECDSA verification)  303 msec (ECDHE computation)  85 msec (ECDSA signature) Total: 717 msec

42 Applying Results to TLS/DTLS Certificates with TLS_ECDHE_ECDSA_* 1 x ECDSA verification for server certificate CA Certificate CA Certificate Server Certificate CA Certificate CA Certificate Intermediate CA Certificate Server Certificate CA Certificate CA Certificate 1 st Intermediate CA Certificate Server Certificate 2 nd Intermediate CA Certificate Same as with raw public key plus (assuming no OCSP and certs are signed with ECC certificates) 1 x ECDSA verification for Intermediate CA certificate 1 x ECDSA verification for server certificate 1 x ECDSA verification for 1 st Intermediate CA certificate 1 x ECDSA verification for 2 nd Intermediate CA certificate 1 x ECDSA verification for server certificate

43 Symmetric Key Cryptography

44 Symmetric Key Cryptography  Secure Hash Algorithm (SHA) creates a fixed length fingerprint based on an arbitrarily long input. The output length of the fingerprint is determined by the hash function itself. For example, SHA256 produces an output of 256 bits.  Advanced Encryption Standard (AES) is an encryption algorithm, which has a fixed block size of 128 bits, and a key size of 128, 192, or 256 bits.  A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.  Examples of modes of operation: CCM, GCM, CBC.  Test relevant information:  SHA computes a hash over a buffer with a length of 1024 bytes.  AES-CBC: 1024 input bytes are encrypted. No integrity protection is used. IV size is 16 bytes.  AES-CCM and AES-GCM: 1024 input bytes are encrypted and integrity protected. No additional data is used. In this version of the test a 12 bytes nonce value is used together with the input data. In addition to the encrypted data a 16 byte tag value is produced.

45 Symmetric Key Crypto: Performance of the KL25Z

46 Symmetric Key Crypto: Performance of the LPC1768

47 Conclusion  ECC requires performance-demanding computations. Those take time.  What an acceptable delay is depends on the application.  Many applications only need to run public key cryptographic operations during the initial (session) setup phase and infrequently afterwards.  With session resumption DTLS/TLS uses symmetric key cryptography most of the time (which is lightning fast).  Detailed performance figures depend on the enabled performance optimizations (and indirectly the available RAM size), the key size, the type of curve, and CPU speed.  Choosing the microprocessor based on the expected usage environment is important.

48 Next Steps  Collecting performance data on IoT devices is time-consuming. We would appreciate help.  In particular, we need  Verification of the gathered data  Data from other crypto libraries  Further tests (energy efficiency, complete DTLS/TLS handshake data, data about various extensions, more data for Curve25519, etc.).  We plan to update accordingly.draft-ietf-lwig-tls-minimal