Final Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets 309326767 Maxim Zavodchik 310623772.

Presentation transcript:

Final Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets Maxim Zavodchik

Project Objectives
- Implementing a neural network on FPGA:
  - Creating a modular design
  - Implementing in software (Matlab)
  - Creating a PC interface
- Performance analysis:
  - Area on chip
  - Interconnections
  - Speed vs. software implementation
  - Frequency
  - Cost

Project’s Part A Objectives
- Implementing a single neuron in VHDL.
- Researching the EDK environment, integrating the design into it and running it on the FPGA.
- Implementing the feed-forward calculation.
- Implementing the learning in Matlab.
- Building a Graphical User Interface for friendly communication with the system.

Testing Application
- A single neuron can separate two regions with a straight line.
- A multi-layered network is needed to recognize an image.
- Implementing the AND/OR functions:
[Figure: the input points (0,0), (0,1), (1,0), (1,1), plotted once for the AND function and once for the OR function]
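
The linear-separation claim above can be illustrated with a short sketch. It is not the project's code: the function name `neuron` and the weight and bias values are hand-picked assumptions, chosen so that the line x1 + x2 = 1.5 isolates (1,1) for AND and the line x1 + x2 = 0.5 isolates (0,0) for OR.

```python
def neuron(inputs, weights, bias):
    """Threshold neuron: returns 1 when the weighted sum is positive."""
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if v > 0 else 0


POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hypothetical, hand-picked parameters (not taken from the slides).
AND_WEIGHTS, AND_BIAS = (1.0, 1.0), -1.5
OR_WEIGHTS, OR_BIAS = (1.0, 1.0), -0.5

and_out = [neuron(p, AND_WEIGHTS, AND_BIAS) for p in POINTS]
or_out = [neuron(p, OR_WEIGHTS, OR_BIAS) for p in POINTS]

print(and_out)  # [0, 0, 0, 1]
print(or_out)   # [0, 1, 1, 1]
```

Both functions need only one neuron because a single straight line splits the four points correctly in each case.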

Learning in Matlab
- Implementing a NN using the logsig() activation function and the 'traingdx' training algorithm.
- Providing a truth table for the binary functions AND/OR as a training set.

    % Build the NN
    temp = size(inputs_vec);          % number of inputs = number of rows
    in_range = zeros(temp(1),2);      % each input ranges over [0, 1]
    in_range(:,2) = 1;
    net = newff(in_range,[1],{'logsig'},'traingdx');   % one logsig neuron
    % Train the NN
    net.TrainParam.epochs = epochs;   % maximum number of epochs
    net.TrainParam.goal = error;      % target training error
    net = train(net,inputs_vec,target_vec);

Sigmoid function: logsig(v) = 1 / (1 + e^(-v))
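
For readers without the Matlab toolbox, the training step can be approximated as below. This is a minimal sketch, not the slides' method: it swaps 'traingdx' (gradient descent with momentum and an adaptive learning rate) for plain batch gradient descent on the squared error. The function names, the learning rate and the epoch count are all assumptions.

```python
import math


def logsig(v):
    """Logistic sigmoid, the same function as Matlab's logsig()."""
    return 1.0 / (1.0 + math.exp(-v))


def train_neuron(samples, targets, epochs=5000, lr=1.0):
    """Plain batch gradient descent on squared error for one logsig neuron."""
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, t in zip(samples, targets):
            y = logsig(sum(xi * wi for xi, wi in zip(x, w)) + b)
            delta = (y - t) * y * (1.0 - y)  # derivative of 0.5*(y-t)^2 w.r.t. v
            for i in range(n):
                grad_w[i] += delta * x[i]
            grad_b += delta
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def predict(x, w, b):
    """Threshold the sigmoid output at 0.5 to get a binary answer."""
    return 1 if logsig(sum(xi * wi for xi, wi in zip(x, w)) + b) > 0.5 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_w, and_b = train_neuron(INPUTS, [0, 0, 0, 1])
or_w, or_b = train_neuron(INPUTS, [0, 1, 1, 1])

and_pred = [predict(x, and_w, and_b) for x in INPUTS]
or_pred = [predict(x, or_w, or_b) for x in INPUTS]
print(and_pred, or_pred)
```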

Hardware Description
Xilinx ML310 Development Board:
- RS232 standard: FPGA UART
  - Transmission rate is 115,200 bits/sec at best
- Virtex-II Pro XC2VP30 FPGA
  - 2 PowerPC 405 cores ([...] MHz)
  - 2,448 Kbytes of BRAM memories
  - [...] x 18-bit multipliers
  - 30,816 logic cells
  - Up to 111,232 internal registers
  - Up to 111,232 LUTs
- 256 MB DDR DIMM

System Interface
Inputs:
- Binary number (up to 1024 bits)
- Weights, 13 bits wide, in fixed-point representation:
  - 1 sign bit
  - 4 integer bits
  - 8 fraction bits
- Sigmoid function values, 8 bits wide
Outputs:
- Two bits: the neuron's binary result for the input number, or failure detection.
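
The 13-bit weight format (1 sign bit, 4 integer bits, 8 fraction bits) can be sketched as follows. The slides do not say how negative values are encoded, so two's complement is assumed here. The helper names and the saturation on overflow are likewise assumptions.

```python
FRAC_BITS = 8
TOTAL_BITS = 13  # 1 sign + 4 integer + 8 fraction bits
MAX_RAW = (1 << (TOTAL_BITS - 1)) - 1   # 4095  -> +15.99609375
MIN_RAW = -(1 << (TOTAL_BITS - 1))      # -4096 -> -16.0


def to_fixed(value):
    """Quantize a real weight to a signed 13-bit integer, saturating on overflow."""
    raw = int(round(value * (1 << FRAC_BITS)))
    return max(MIN_RAW, min(MAX_RAW, raw))


def from_fixed(raw):
    """Convert the signed 13-bit integer back to a real value."""
    return raw / (1 << FRAC_BITS)


def to_bits(raw):
    """13-character bit string (two's complement, assumed encoding)."""
    return format(raw & ((1 << TOTAL_BITS) - 1), "013b")


print(to_fixed(1.5), to_bits(to_fixed(1.5)))   # 384 0000110000000
print(from_fixed(to_fixed(-2.25)))             # -2.25
```

With 8 fraction bits the resolution is 1/256, so any weight is stored with an error of at most 1/512 unless it saturates.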

System Description
[Block diagram: PowerPC, weights memory, input memory, sigmoid memory and Single Neuron on the PLB; UART on the OPB; PLB2OPB bridge between the two buses]

EDK Integration
- The PPC writes the BRAMs and controls the Single Neuron through the PLB.
- The Single Neuron is connected to the PLB as a user core via IPIF.
- Memories:
  - PORT1: connected to the PLB via IPIF
  - PORT2: connected directly to the Single Neuron
- The UART (serial port) is connected to the OPB.

Control Flow
[Flow chart states: IDLE, Get Weights, Get Sigmoid, Get Inputs, Load Input number, Load Bias, Wait for loading Bias, Load decision values, Calculate φ(.), Calculate output bits, Send the result to user]

Architecture – Single Neuron
- Multiplier: 1 x 13 bits
- Accumulator: 13 bits wide
- FSM controller
- Bias/Min/Max/Inputs_num registers
- Comparator
[Block diagram labels: X[i], W[i], MULT, Accumulator, REG, Bias Weight, v, logsig(v), Comparator, Min Decision, Max Decision, Y]
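
A behavioural sketch of this datapath may help. It uses floating point rather than the 13-bit hardware arithmetic. The comparator rule is an assumption, since the slide's comparator figure did not survive: an output at or above the maximum decision value reads as "1", at or below the minimum as "0", and anything in between as "failure". The function name, the weights and the bias values are hypothetical; the two decision values are those quoted on the simulation slide.

```python
import math


def single_neuron(x_bits, weights, bias, min_dec, max_dec):
    """Accumulate, apply logsig, then compare against the decision values."""
    acc = bias
    for x, w in zip(x_bits, weights):
        if x:          # 1-bit input: the 1x13 multiply just gates the weight
            acc += w
    y = 1.0 / (1.0 + math.exp(-acc))
    if y >= max_dec:
        return y, "1"
    if y <= min_dec:
        return y, "0"
    return y, "failure"  # output falls between the two decision values


MIN_DEC, MAX_DEC = 0.3789, 0.6211     # values from the simulation slide
WEIGHTS = [2.0, 2.0, 2.0, 2.0]        # hypothetical 4-input AND weights

_, all_ones = single_neuron([1, 1, 1, 1], WEIGHTS, -7.0, MIN_DEC, MAX_DEC)
_, three_ones = single_neuron([1, 1, 1, 0], WEIGHTS, -7.0, MIN_DEC, MAX_DEC)
_, undecided = single_neuron([1, 1, 1, 0], WEIGHTS, -6.0, MIN_DEC, MAX_DEC)
print(all_ones, three_ones, undecided)  # 1 0 failure
```

The third call shows why a failure band is useful: with a bias of -6.0 the sum for three set inputs is exactly 0, the sigmoid output is 0.5, and the neuron declines to answer.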

Architecture – Memories (1)
- 2-port BRAMs with separate clocks.
- Specially sized BRAMs generated by the Xilinx Core Generator.
- Wrapped by a VHDL SRAM controller.
- Inputs memory:
  - Up to 1024 binary bits (1 Kbyte)

Architecture – Memories (2)
- Weights memory:
  - 1024 * 13 bits = 13,312 bits = 1,664 bytes (~1.6 Kbyte)
- Bias weight:
  - 1 register for the output layer (13 bits wide)
- Sigmoid memory (2 Kbyte):
  - Values outside the range [-4,4] are mapped to 0 or 1
  - Memory block quantizing the sigmoid values:
    - 11-bit input representing values in [-4,4]
    - 8-bit output representing values in [0,1]
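
The sigmoid memory can be modelled as a lookup table. The sketch below rests on assumptions the slides do not confirm. The 11-bit address is taken to be the accumulator value in steps of 1/256, covering -4 up to 4 - 1/256. The 8-bit output is taken to be the sigmoid value scaled by 256 and clamped to 255. All names are hypothetical.

```python
import math

ADDR_BITS = 11   # 2**11 = 2048 entries
OUT_BITS = 8     # one byte per entry
FRAC_BITS = 8    # assumed step of 1/256 between neighbouring entries


def build_sigmoid_lut():
    """Tabulate logsig over [-4, 4) as 8-bit codes."""
    half = 1 << (ADDR_BITS - 1)
    top = (1 << OUT_BITS) - 1
    lut = []
    for addr in range(1 << ADDR_BITS):
        v = (addr - half) / (1 << FRAC_BITS)
        y = 1.0 / (1.0 + math.exp(-v))
        lut.append(min(top, int(y * (1 << OUT_BITS))))
    return lut


def sigmoid_lookup(lut, v):
    """Out-of-range inputs map to the extreme codes, as on the slide."""
    if v < -4.0:
        return 0
    if v >= 4.0:
        return (1 << OUT_BITS) - 1
    addr = int(math.floor(v * (1 << FRAC_BITS))) + (1 << (ADDR_BITS - 1))
    return lut[addr]


lut = build_sigmoid_lut()
size_bytes = len(lut) * OUT_BITS // 8
print(len(lut), size_bytes)   # 2048 2048
```

The table size agrees with the 2 Kbyte figure on the slide: 2048 entries of one byte each.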

Simulation (1)
- Single Neuron VHDL simulation
- Application: AND function with 4 inputs
  - Minimum decision value: 0.3789
  - Maximum decision value: 0.6211
- 3 pipeline stages: Memories, Mult, Accumulator

Simulation (2)
- Result:
  - Sigmoid answer: 9F (hexadecimal)
- The “ready” signal is asserted when done.
- Latency: 14 + |Inputs| - 1 [clocks]
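
The latency formula is easy to evaluate for the two input sizes mentioned in the slides. The function name below is an assumption; the formula itself is the one stated above.

```python
def neuron_latency_clocks(num_inputs):
    """Latency in clock cycles: 14 + |Inputs| - 1."""
    if num_inputs < 1:
        raise ValueError("the neuron needs at least one input")
    return 14 + num_inputs - 1


print(neuron_latency_clocks(4))     # 17   (the simulated 4-input AND)
print(neuron_latency_clocks(1024))  # 1037 (the maximum input length)
```

The fixed part of the latency is 13 clocks, after which each input adds one clock, which is consistent with a pipelined multiply-accumulate path.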

Software
- The PPC’s program controls the whole flow.
- The PPC writes control words and reads result words on the PLB as 64 bits of data.
- Control/result word structure:

Single Neuron, from CPU:
  Bits     Field
  0        load_w0
  1        rst
  2        start
  3        w0_ready
  4        load_min_val
  5        load_max_val
  6        load_inputs_num
  7-19     w0 / min_val / max_val / inputs_number
  20-63    "0"

Single Neuron, to CPU:
  Bits     Field
  0-1      y
  2        ready
  3        w0_rd
  4-63     "0"

Memories:
  Field          Bits (Sigmoid)    Bits (W / X)
  USER_wr_a      0                 0
  USER_addr_a    1-11              1-10
  USER_dout_a    12-19             11-24 / 11
  "0"            20-63             25-63 / 12-63
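
Packing and unpacking the Single Neuron words can be sketched as below. The slides give bit positions but not the numbering direction. The sketch assumes the PowerPC convention in which bit 0 is the most significant bit of the 64-bit word. The function names are hypothetical and this is not the project's PPC program.

```python
WORD_BITS = 64
FLAG_BITS = {
    "load_w0": 0, "rst": 1, "start": 2, "w0_ready": 3,
    "load_min_val": 4, "load_max_val": 5, "load_inputs_num": 6,
}
VALUE_LAST_BIT = 19    # value field occupies bits 7-19 (13 bits)
VALUE_MASK = 0x1FFF


def _shift(last_bit):
    """Left shift for a field ending at last_bit when bit 0 is the MSB (assumed)."""
    return WORD_BITS - 1 - last_bit


def pack_control_word(value=0, **flags):
    """Build the 64-bit word the CPU writes to the Single Neuron."""
    word = (value & VALUE_MASK) << _shift(VALUE_LAST_BIT)
    for name, enabled in flags.items():
        if name not in FLAG_BITS:
            raise KeyError(name)
        if enabled:
            word |= 1 << _shift(FLAG_BITS[name])
    return word


def unpack_result_word(word):
    """Split the 64-bit word the CPU reads back: y (bits 0-1), ready, w0_rd."""
    return {
        "y": (word >> _shift(1)) & 0b11,
        "ready": (word >> _shift(2)) & 1,
        "w0_rd": (word >> _shift(3)) & 1,
    }


start_word = pack_control_word(start=True)
bias_word = pack_control_word(value=0x1FFF, load_w0=True)
result = unpack_result_word((0b10 << 62) | (1 << 61))
print(hex(start_word), hex(bias_word), result)
```

If the bus in fact numbers bits from the least significant end, only `_shift` needs to change.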

Graphical User Interface
- Building a Graphical User Interface for friendly communication between the user and the system.
- Implemented in Matlab 6.1.
- The GUI enables:
  - Choosing a function to be implemented.
  - Defining the maximum error, the number of epochs and the decision values.
  - Choosing the length of the binary input vector.
  - Simulating the neuron for an input vector.

Project’s Part B Objectives
- Creating a multi-layered network to classify a digit.
- Implementing a modular system:
  - Number of neurons in the hidden layer varies from 2 to 10.
  - Number of sub-networks.

Project’s Part B Objectives (Cont.)
- Implementing a parallel system:
  - Dividing a complex fully-connected network into sub-networks.
  - 10 sub-networks running concurrently.
  - Up to 10 neurons run concurrently in each sub-network.
  - Up to 5 inputs are calculated together, depending on the number of neurons in the hidden layer.
  - Parallel calculation of the output layer.