Published by Nickolas Melton. Modified over 8 years ago.
Project Summary Report (Final)
Subject: Configurable System-On-Chip For Deep Learning Neural Networks
Performed by: Yotam Platner & Merav Natanson
Instructor: Guy Revach
High Speed Digital Systems Laboratory, Department of Electrical Engineering
Technion - Israel Institute of Technology
Winter semester 5775 (2014-2015)
Abstract
The project's goal is to implement an SoC architecture that can compute a wide variety of Deep Learning algorithms based on Convolutional Neural Networks in a fast, dynamic and configurable way. It includes:
- Analysis of the common factors of different Deep Learning algorithms that use Convolutional Neural Networks, with a specific focus on the LeNet-5 algorithm for digit recognition, which is the primary test case for the system.
- Architecture of an FPGA co-processor with a configurable number of execution units and neuron units, which can be programmed by a software system to perform the operations of multiple layers of a Deep Learning algorithm.
- High-level architecture of a software API in C++, designed to run on an embedded Linux OS. The API enables a user to operate the co-processor.
- Analysis of the system's performance for different user configuration options.
- Functional simulation (in VHDL) of the complex parts of the co-processor.
System description
The implementation is centered on the concept of execution units:
- An execution unit includes the logic needed to execute a single layer of a neural network algorithm (convolution, sub-sampling or fully connected).
- The execution unit pulls data, weights and configuration from an internal memory of the FPGA (written by the software side), and operates calculation units called neurons.
- Several execution units and neurons operate in parallel to achieve high throughput for the whole algorithm.
- The software API manages the co-processor through a configuration block for each layer and through memory allocation.
- The software performs the final calculation stages on the results and gives the user a simple way to operate the system.
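The per-layer configuration block described above could look roughly like the following C++ sketch. The slides do not give the actual layout, so every name here (`LayerType`, `LayerConfig`, `output_side`, and all fields) is a hypothetical illustration. The size rule assumes stride-1 convolution without padding and non-overlapping sub-sampling windows, as in LeNet-5.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layer kinds, matching the three layer types named in the slides.
enum class LayerType : std::uint8_t { Convolution, SubSampling, FullyConnected };

// Hypothetical configuration block that the software side would write into
// FPGA internal memory for one layer. Field names and widths are assumptions.
struct LayerConfig {
    LayerType     type;
    std::uint16_t in_width;     // input feature-map width
    std::uint16_t in_height;    // input feature-map height
    std::uint16_t in_maps;      // number of input feature maps
    std::uint16_t out_maps;     // number of output feature maps
    std::uint16_t kernel;       // side of the convolution kernel or sub-sampling window
    std::uint32_t data_addr;    // where the execution unit reads input data
    std::uint32_t weight_addr;  // where it reads weights
    std::uint32_t result_addr;  // where it writes results
};

// Output side length along one spatial dimension (assumed rules, see lead-in).
inline std::uint16_t output_side(LayerType type, std::uint16_t in_side,
                                 std::uint16_t kernel) {
    switch (type) {
    case LayerType::Convolution:
        assert(kernel > 0 && kernel <= in_side);
        return static_cast<std::uint16_t>(in_side - kernel + 1);
    case LayerType::SubSampling:
        assert(kernel > 0 && kernel <= in_side);
        return static_cast<std::uint16_t>(in_side / kernel);
    case LayerType::FullyConnected:
        return 1;  // each output neuron produces a single value
    }
    return 0;  // unreachable for valid enum values
}
```

For the first LeNet-5 layer (32x32 input, 5x5 kernels, six output maps), `output_side(LayerType::Convolution, 32, 5)` yields 28, and the following 2x2 sub-sampling layer reduces that to 14.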
Specification
Hardware:
- Xilinx Zynq Z-7020 System-On-Chip
- Dual-core ARM Cortex-A9 CPU
- 2 NEON SIMD engines
- Programmable logic equivalent to an Artix-7 FPGA with 85K logic cells
- 560 KB Block RAM
- 220 DSP blocks (18x25 multipliers)
Software:
- Operating system: Embedded Linux (PetaLinux)
- Xilinx Linux drivers for the AXI CDMA controller and the Ethernet adapter
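As a rough illustration of how the DSP budget bounds throughput, the sketch below counts the multiply-accumulate (MAC) operations of a convolution layer and divides by the number of multipliers working in parallel. It is an idealized estimate rather than the project's own performance analysis: it assumes one MAC per DSP block per cycle, stride-1 convolution without padding, every output map connected to every input map, and no memory or control stalls. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// MAC count of one convolution layer under the assumptions stated above.
inline std::uint64_t conv_macs(std::uint32_t in_side, std::uint32_t kernel,
                               std::uint32_t in_maps, std::uint32_t out_maps) {
    assert(kernel > 0 && kernel <= in_side);
    const std::uint64_t out_side = in_side - kernel + 1;
    return static_cast<std::uint64_t>(out_maps) * out_side * out_side *
           in_maps * kernel * kernel;
}

// Lower bound on cycles when `parallel_macs` multipliers are all kept busy.
inline std::uint64_t ideal_cycles(std::uint64_t macs, std::uint32_t parallel_macs) {
    assert(parallel_macs > 0);
    return (macs + parallel_macs - 1) / parallel_macs;  // ceiling division
}
```

For the first LeNet-5 convolution layer (32x32 input, one input map, six 5x5 kernels), `conv_macs(32, 5, 1, 6)` gives 117,600 MACs. If all 220 DSP blocks could be used as multipliers, `ideal_cycles(117600, 220)` gives a floor of 535 cycles. A real configuration would use fewer multipliers and lose cycles to memory access.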
System Block Diagram (figure not reproduced)
FPGA Block Diagram (figure not reproduced)