CMSIS-NN: Efficient Neural Network kernels for Arm Cortex-M CPUs


1 CMSIS-NN: Efficient Neural Network kernels for Arm Cortex-M CPUs
Felix Johnny, Maintainer, CMSIS-NN
Dec 2nd, 2019

2 CMSIS: Pathway to the Arm ecosystem
Cortex Microcontroller Software Interface Standard
Consistent, generic, and standardized software building blocks
Available for all Cortex-M and Cortex-A5, A7, A9 processors
Open source: public development on GitHub
6,000+ devices supported with CMSIS
Used in many projects: > 1,200,000 source files public on GitHub
Device Family Packs: > 3,000,000 pack downloads in the past 6 months

CMSIS (Cortex Microcontroller Software Interface Standard) is a collection of API definitions, libraries, utilities, and methods that simplify and accelerate the creation of microcontroller applications. CMSIS is provided free of charge by Arm (and the contributors) under the permissive Apache 2.0 license, and the software components can be used in any open source or commercial project.
Users (software programmers) benefit from an RTOS, DSP libraries, consistent access to peripherals, and debug visibility. SiPs (device vendors) have a clear process to deploy support for new devices along with hardware abstraction layers and software libraries. Device support is delivered in Device Family Packs (each typically supporting a complete family of microcontrollers) and can be used with several mainstream tools, including Arm Keil MDK and the new DS-MDK.
CMSIS is supported by all leading toolchains and allows SiPs to focus on creating the device rather than on establishing contacts with the members of the large Arm ecosystem. That is the reason for our headline: CMSIS, the pathway to the Arm ecosystem!

3 CMSIS 5
Consistent software framework for Arm Cortex-M and Cortex-A5/A7/A9 based systems
Block diagram, software components:
Application code
CMSIS-RTOS: Real-time execution
CMSIS-NN: Machine learning
CMSIS-DSP: Signal processing
CMSIS-Driver: Middleware interface
CMSIS-CORE: Processor core and peripheral access
Peripheral HAL: Device specific
Access Filter (MPU, SAU)
CMSIS-Zone: System partitioning
CMSIS-SVD: Peripheral description
CMSIS-DAP: Debug access
CMSIS-Pack
Block diagram, system-on-chip: Arm® Cortex® processor, specialized peripherals, communication peripherals, CoreSight™ debug logic
Block diagram, tools: Debugger (µVision Debugger)

Important additions to CMSIS: support for Arm Cortex-M23/M33 and Cortex-A5/A7/A9 cores. Keil RTX 5 is now the kernel for Mbed OS. It uses the CMSIS-RTOS API v2 natively and has some enhanced features, such as dynamic and static object creation. We have a FreeRTOS port for the v2 API to get more traction from the wide FreeRTOS user base.

4 Why target Arm Cortex-M CPUs?
Accelerate deployment of Machine Learning on edge devices
Enable Machine Learning on more than 47 billion shipped units
Networks with lower memory footprints and fewer MACs are available now:
MobileNet V2: 3.38 MB parameters, ~307 million MACs
Person Detect (TFL): 250 kBytes, ~7 million MACs
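To make the MAC (multiply-accumulate) figures above more concrete, here is a minimal sketch of how MACs are commonly counted per layer. The function names and the layer dimensions used with them are illustrative assumptions, not values taken from MobileNet V2 or the person detection model, and a depth multiplier of 1 is assumed for the depthwise case.

```c
#include <stdint.h>

/* Standard 2D convolution: every output element needs
 * kernel_h * kernel_w * in_ch multiply-accumulates. */
static uint64_t conv2d_macs(uint32_t out_h, uint32_t out_w, uint32_t out_ch,
                            uint32_t kernel_h, uint32_t kernel_w,
                            uint32_t in_ch)
{
    return (uint64_t)out_h * out_w * out_ch * kernel_h * kernel_w * in_ch;
}

/* Depthwise 2D convolution (depth multiplier of 1 assumed): each output
 * channel reads only its own input channel, so in_ch drops out. */
static uint64_t depthwise_conv2d_macs(uint32_t out_h, uint32_t out_w,
                                      uint32_t channels,
                                      uint32_t kernel_h, uint32_t kernel_w)
{
    return (uint64_t)out_h * out_w * channels * kernel_h * kernel_w;
}

/* Fully connected layer: one MAC per weight. */
static uint64_t fully_connected_macs(uint32_t in_features,
                                     uint32_t out_features)
{
    return (uint64_t)in_features * out_features;
}
```

Summing these counts over all layers of a network gives a total of the kind quoted on the slide.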

5 Arm Cortex-M CPU and CMSIS-NN
Given a core (Cortex-M0, Cortex-M3, Cortex-M33, …), the relevant question to ask: is it SIMD* capable?
No: Cortex-M0, Cortex-M3, Cortex-M23, Cortex-M33 …
Yes: Cortex-M7, Cortex-M4, Cortex-M33 …
* Single Instruction Multiple Data
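Cortex-M33 appears in both lists, which is consistent with the DSP extension being an optional feature on that core. One way to answer the SIMD question at compile time is to test the ACLE feature macros `__ARM_FEATURE_DSP` and `__ARM_FEATURE_MVE`. The sketch below is an assumption about how such a check could look; the enum and function names are hypothetical and are not CMSIS-NN API.

```c
/* Hypothetical names for illustration only; not part of CMSIS-NN. */
typedef enum {
    KERNEL_PATH_SCALAR = 0, /* no SIMD: e.g. Cortex-M0, Cortex-M3          */
    KERNEL_PATH_DSP    = 1, /* DSP extension: e.g. Cortex-M4, Cortex-M7    */
    KERNEL_PATH_MVE    = 2  /* M-Profile Vector Extension (Helium)         */
} kernel_path_t;

/* Resolved entirely by the preprocessor from the compiler's target flags. */
static kernel_path_t select_kernel_path(void)
{
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
    return KERNEL_PATH_MVE;     /* bit 0 set: integer MVE available */
#elif defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
    return KERNEL_PATH_DSP;
#else
    return KERNEL_PATH_SCALAR;  /* also the result on a non-Arm host */
#endif
}

static const char *kernel_path_name(kernel_path_t path)
{
    switch (path) {
    case KERNEL_PATH_MVE: return "MVE";
    case KERNEL_PATH_DSP: return "DSP";
    default:              return "scalar";
    }
}
```

Because the choice is made by the preprocessor, only one code path ends up in the binary, which matters on devices with small flash.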

6 CMSIS-NN – How do we go about optimizations?
What do we optimize first?
CONV_2D
DEPTHWISE_CONV_2D
FULLY_CONNECTED
Optimized for SIMD and non-SIMD
What next?
Kernel support for Arm Helium Technology: M-Profile Vector Extension (MVE)

7 Components of an optimized kernel
Optimization technique where possible
Usually it is some form of memory reorganization
Optimization using intrinsics
Nudge the compiler to use SIMD instructions
Provides similar performance results across different optimization levels
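The two components above can be sketched on an int8 dot product: four int8 values are reorganized into 16-bit lanes, then consumed by a dual 16-bit multiply-accumulate. On a DSP-capable Cortex-M these steps map to the SXTB16 and SMLAD instructions, available as intrinsics; the `*_emul` helpers below are portable stand-ins written for this illustration so the sketch also runs on a host. All function names here are hypothetical, and this is not the CMSIS-NN source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend the low 8 bits of b and return them as a 16-bit lane. */
static uint32_t sext8_lane(uint32_t b)
{
    int32_t v = (int32_t)((b & 0xFFu) ^ 0x80u) - 0x80;
    return (uint32_t)(uint16_t)v;
}

/* Stand-in for SXTB16: sign-extend bytes 0 and 2 into two 16-bit lanes. */
static uint32_t sxtb16_emul(uint32_t x)
{
    return (sext8_lane(x >> 16) << 16) | sext8_lane(x);
}

/* Rotate right by 8 so that bytes 1 and 3 move to positions 0 and 2. */
static uint32_t ror8(uint32_t x)
{
    return (x >> 8) | (x << 24);
}

/* Read one 16-bit lane of x as a signed value. */
static int32_t lane16(uint32_t x, unsigned shift)
{
    return (int32_t)(((x >> shift) & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Stand-in for SMLAD: acc + lo(x)*lo(y) + hi(x)*hi(y).
 * Accumulator overflow is not handled in this sketch. */
static int32_t smlad_emul(uint32_t x, uint32_t y, int32_t acc)
{
    return acc + lane16(x, 0) * lane16(y, 0) + lane16(x, 16) * lane16(y, 16);
}

/* Plain scalar reference. */
static int32_t dot_s8_ref(const int8_t *a, const int8_t *b, size_t len)
{
    int32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Four elements per iteration: reorganize, then two dual MACs. */
static int32_t dot_s8_dual_mac(const int8_t *a, const int8_t *b, size_t len)
{
    int32_t acc = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);  /* one 32-bit load per operand */
        memcpy(&wb, b + i, sizeof wb);
        acc = smlad_emul(sxtb16_emul(wa), sxtb16_emul(wb), acc);
        acc = smlad_emul(sxtb16_emul(ror8(wa)), sxtb16_emul(ror8(wb)), acc);
    }
    for (; i < len; i++) {              /* leftover elements */
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Both operands are reorganized the same way, so the lane order does not affect the sum and the result matches the scalar reference exactly.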

8 CMSIS-NN & TensorFlow Lite for Microcontrollers
Access to optimized kernels through TFL micro
Collaboration with the TFL micro team
Support for optimized, bit-exact int8 kernels
Fallback on reference kernels when an optimization is not available
Software stack, top to bottom: Application; TensorFlow Lite micro runtime; reference kernels and CMSIS-NN optimized kernels; Arm Cortex-M CPU

9 Arm Mbed Enabled: one way to execute TFL micro demos and tests
List of Cortex-M4 and Cortex-M7 boards
Why use Mbed Enabled boards?
Low threshold to prototype projects on Cortex-M CPUs
Models and kernel tests from TFL micro can be run on Mbed Enabled boards

10 Reference kernel and CMSIS-NN build using Mbed
Clone the TensorFlow repository from GitHub
Generate a project in TFL micro, with or without optimized kernels:
    make -f tensorflow/lite/experimental/micro/tools/make/Makefile generate_<use_case_name>_mbed_project TAGS=cmsis-nn
Write your application
Compile and flash:
    mbed compile -DARM_MATH_DSP -DARM_MATH_LOOPUNROLL -f -m auto -t GCC_ARM --profile release

11 Generate project with ‘TAGS=cmsis-nn’
Optimized versions of kernels are selected when available. The reference kernels are used as the fallback option.
Flowchart, starting from All Ops:
Include reference implementation for op
CMSIS-NN for op? If yes: include optimized implementation for op. If no: continue.
Remaining op? If yes: repeat for the next op. If no: generate project.
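The selection rule in the flowchart can be sketched as a small table lookup. In the slides the choice is made when the project is generated, so the run-time table below is only an illustration of the rule, not of the Makefile logic. The struct, the stub kernels, and `HYPOTHETICAL_OP` are assumptions made for this sketch; the three named ops are the ones listed earlier as optimized first.

```c
#include <stddef.h>

/* Hypothetical types for illustration; not TFL micro or CMSIS-NN API. */
typedef enum { IMPL_REFERENCE = 0, IMPL_OPTIMIZED = 1 } impl_kind;
typedef impl_kind (*kernel_fn)(void);

typedef struct {
    const char *op_name;
    kernel_fn   reference;  /* always present                        */
    kernel_fn   optimized;  /* NULL when no optimized version exists */
} op_entry;

/* Stub kernels that only report which implementation ran. */
static impl_kind run_reference(void) { return IMPL_REFERENCE; }
static impl_kind run_optimized(void) { return IMPL_OPTIMIZED; }

static const op_entry all_ops[] = {
    { "CONV_2D",           run_reference, run_optimized },
    { "DEPTHWISE_CONV_2D", run_reference, run_optimized },
    { "FULLY_CONNECTED",   run_reference, run_optimized },
    { "HYPOTHETICAL_OP",   run_reference, NULL          }, /* made-up op */
};
#define NUM_OPS (sizeof(all_ops) / sizeof(all_ops[0]))

/* "CMSIS-NN for op?" Yes: optimized kernel. No: reference fallback. */
static kernel_fn select_kernel(const op_entry *op)
{
    return (op->optimized != NULL) ? op->optimized : op->reference;
}

/* "Remaining op?" loop: visit every op, count the optimized ones. */
static size_t count_optimized(const op_entry *ops, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (select_kernel(&ops[i]) == ops[i].optimized) {
            count++;
        }
    }
    return count;
}
```

Keeping a reference entry for every op means a model still runs when only some of its ops have optimized kernels.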

12 Question the performance numbers
int8 person detect model from TFL micro
Sensationalizing the result: 17x performance boost with CMSIS-NN kernels
The complete picture (ST NUCLEO-F746ZG, Cortex-M7, GCC)
Table columns: Mbed profile; Optimization option; Ref kernels (cycles); CMSIS-NN optimized kernels (cycles); Speedup (ref/CMSIS-NN)
The cycle counts are not preserved in this transcript; the remaining columns read:
Mbed profile        Optimization option   Speedup (ref/CMSIS-NN)
release             Os                    7x
                    Ofast                 2x
                    Og                    8x
                    O2
Nothing specified   -                     17x

13 Useful links
CMSIS GitHub: https://github.com/ARM-software/CMSIS_5
Supported operators in CMSIS: software/CMSIS_5/tree/develop/CMSIS/NN
Person Detection example: micro/examples/person_detection
My GitHub ID: felix-johnny

14 Demo or Questions

16 Micro Architectures and ML Kernels

17 Kernel selection – reference vs CMSIS-NN optimized

