Self-Organizing Learning Array (SOLAR)
New learning algorithm
–Multi-layer structure and on-line learning
–Local and sparse interconnections
–Entropy-based self-organized learning
Superior performance
–Parallel computing organization
–Low power dissipation
–Efficient communication
–High chip utilization rate
Potential to be a leading technology in machine learning
–Paves the way to machine intelligence application areas, including pattern recognition, intelligent control, signal processing, robotics, and biological research
DARPA: Cognitive Information Processing Technology
Wanted: a machine that
–Can reason, using substantial amounts of knowledge
–Can learn from its experiences so that its performance improves with knowledge and experience
–Can explain itself and can accept direction
–Is aware of its own behavior and reflects on its own capabilities
–Responds in a robust manner to a surprise
Self-Organizing Learning ARray (SOLAR)
Dowling, 1998, p. 17
Here, the terms represent the probability of each class, the attribute probability, and the joint probability, respectively.
Self-organizing Principle
Neuron self-organization includes:
–Selection of inputs
–Choosing the transformation function
–Setting the threshold
–Providing output probabilities
–Setting output control
Self-Organizing Process and Neuron Structure
Self-organizing Process: Matlab Simulation
[Figure panels: initial interconnection; learning process]
Synthetic Data Classification
Credit Card Data Set
Columns: Method, Error Rate
Methods compared: Cal, SOLAR, Itrule, Discrim, Logdisc, DIPOL, CART, RBF, CASTLE, NaiveBay, Backprop, C, SMART, Baytree, k-NN, NewID, LVQ, ALLOC, Quadisc, Default
Kohonen: Failed
[Figure: SOLAR self-organizing structure]
SW/HW Codesign of SOLAR
[Diagram labels: software running on a PC; PCI bus; hardware board; Virtex XCV800 FPGA; JTAG programming; dynamic configuration]
Cosimulation - What and Why?
Cosimulation
–Simulation of heterogeneous systems whose hardware and software components interact
Benefits of cosimulation
–Verifying correct functionality of the target even before hardware is built
–Profiling the dynamic behavior
–Identifying the performance bottleneck
–Preventing problems such as over-design or under-design related to system integration
–Reducing system development cost and cycle time
Traditional Cosimulation Environment
–A software process: written in a high-level language, such as C/C++
–A simulation process of the hardware model: written in a hardware description language, such as VHDL
–Inter-process communication (IPC) routines: connect the hardware process and the software process
[Diagram labels: Software Model (C program); IPC routines; IPC; foreign IPC procedures; Hardware Model (VHDL); two simulators]
Traditional Cosimulation
To perform cosimulation, two simulators must be combined and complex IPC must be developed. These IPC routines are error-prone because they have to handle various data formats and processed signals. When the focus is on the hardware part in particular, the software part should be minimal and the HW/SW communication simple and reliable.
SOLAR Cosimulation
–A software process: written in behavioral VHDL, which is not synthesizable
–A hardware process: written in RTL VHDL, which is synthesizable
–HW/SW communication: FSM and FIFOs
[Diagram labels: Software Model (behavioral VHDL); FSM and FIFOs; Hardware Model (RTL VHDL); one simulator]
SOLAR Cosimulation
To perform SOLAR cosimulation, a single VHDL simulator is used, so complex, error-prone IPC is avoided, and data formats and other problems can be handled easily. The interface between HW and SW is implemented by several FIFOs controlled by an FSM, which is simple, reliable, and easily modified. File I/O functions simplify the design of the software part when the focus is on the hardware implementation.
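The FSM-and-FIFO interface described here can be illustrated with a small software model. The Python sketch below is not the SOLAR VHDL design: the state names (IDLE, LOAD, RUN, DRAIN), the class `FifoInterface`, and the doubling function that stands in for the hardware are all hypothetical choices made for illustration.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    IDLE = auto()    # waiting for the software side to queue data
    LOAD = auto()    # moving words from the input FIFO into memory
    RUN = auto()     # hardware stand-in processes the loaded words
    DRAIN = auto()   # results are pushed to the output FIFO


class FifoInterface:
    """Toy HW/SW interface: two FIFOs sequenced by a small FSM (assumed design)."""

    def __init__(self, process):
        self.in_fifo = deque()    # software -> hardware
        self.out_fifo = deque()   # hardware -> software
        self.memory = []          # stand-in for on-chip memory
        self.results = []
        self.process = process    # hypothetical hardware function
        self.state = State.IDLE

    def step(self):
        """Advance the FSM by one clock tick."""
        if self.state is State.IDLE:
            if self.in_fifo:
                self.state = State.LOAD
        elif self.state is State.LOAD:
            if self.in_fifo:
                self.memory.append(self.in_fifo.popleft())
            else:
                self.state = State.RUN
        elif self.state is State.RUN:
            self.results = [self.process(w) for w in self.memory]
            self.memory = []
            self.state = State.DRAIN
        elif self.state is State.DRAIN:
            if self.results:
                self.out_fifo.append(self.results.pop(0))
            else:
                self.state = State.IDLE


def run_until_idle(iface, max_ticks=1000):
    """Software side: tick the interface until the FSM returns to IDLE."""
    iface.step()
    ticks = 1
    while iface.state is not State.IDLE and ticks < max_ticks:
        iface.step()
        ticks += 1
    return ticks


iface = FifoInterface(process=lambda w: w * 2)
iface.in_fifo.extend([1, 2, 3])
ticks = run_until_idle(iface)
print(list(iface.out_fifo), ticks)
```

Because both sides live in one process and exchange data only through the two queues, no inter-process communication layer is needed, which mirrors the motivation given for the single-simulator approach.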
Co-simulation System Decomposition
–System architecture modelling (behavioral VHDL): Main, Initialization, File I/O, SOLAR Training, Over (No/Yes)
–Interface modeling (RTL VHDL): Input FIFO, Output FIFO, FSM, Interface Control
–Self-organizing learning architecture (structural VHDL): OP, EBE, REG, FIFO, MEM
SW Organization VHDL Model
All functions and signal variables in the packages are shared, and program execution is functionally interleaved. The lower-level package describes system input and output and the initialization and update of the memory elements in the network. The higher-level packages encapsulate new system functions built on the functions described by the lower-level packages. The highest-level function, which represents the software part of the overall system, implements system organization and management.
Single Neuron’s Hardware Architecture
Figure 4: Single neuron’s learning architecture
[Block labels: MAIN CONTROLLER; REG CTRL; FIFO/DMA CTRL; 1024x32 FIFO; INTERFACE; OP; ALU; D; R; M]
Interface Process
[Timing diagram between SW and HW; labeled events: configuration, send data, receive data, conf done, start, wait command, send command, over, read registers, dma request, …]
Interface Modeling
Figure 5: Interface modeling using FSM & FIFO
[Diagram labels: Software (behavioral VHDL); Interface; Hardware (structural VHDL); FIFOs; memory module; Ctrl; class; training; other]
Interface Simulation Small Training Data Set
System Synchronized Work
[Timeline: software work, hardware work, and interface work over time]
Incremental Prototyping
Overall system design can be accelerated by replacing each HW subcomponent with real hardware once it has been successfully simulated.
[Diagram over time t: HW functions completely defined and prototyped; HW function VHDL-simulated (incremental part)]
EBE Simulation
Main procedures:
1. Sending data from software to chip memory
2. Triggering the start signal
3. ALU calculation for all data
4. Moving calculated results to intermediate memory
5. Threshold scanning & ID calculation
6. Updating the intermediate values; data movement if the current ID is optimal
7. Repeating from 3 to 6 until all functions are scanned
8. Sending data from chip to software
In this simulation waveform, the signals “Opt_Threshold” and “ID” represent the optimal threshold and the corresponding information index deficiency for this particular training neuron in its learning subspace.
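The scan over functions and thresholds can be sketched in software. The Python example below is an illustration under assumptions, not the EBE hardware algorithm: the slides do not give the formula for the information index deficiency, so `deficiency` uses a normalized conditional entropy as a stand-in, and the candidate functions (`identity`, `parity`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def deficiency(values, labels, threshold):
    """Assumed ID measure: H(class | side of threshold) / H(class)."""
    below = [y for v, y in zip(values, labels) if v < threshold]
    above = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part) for part in (below, above) if part)
    total = entropy(labels)
    return cond / total if total > 0 else 0.0


def scan(data, labels, functions):
    """For every candidate function, scan thresholds and keep the lowest ID."""
    best = None  # (ID, function name, threshold)
    for name, fn in functions.items():
        values = [fn(x) for x in data]             # ALU calculation for all data
        for threshold in sorted(set(values)):      # threshold scanning
            current = deficiency(values, labels, threshold)
            if best is None or current < best[0]:  # keep only if optimal so far
                best = (current, name, threshold)
    return best


# Toy subspace: six samples, two classes, two hypothetical functions.
data = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]
functions = {"identity": lambda x: x, "parity": lambda x: x % 2}
best_id, best_fn, opt_threshold = scan(data, labels, functions)
print(best_fn, opt_threshold, best_id)
```

The inner loop corresponds to steps 5 and 6 and the outer loop to the repetition over functions in step 7. A lower value means the threshold separates the classes better under the assumed measure.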
EBE Prototyping
SOLAR Training
Map onto Virtex (57.8% logic, 60.3% route)
–Minimum period: ns (Maximum frequency: MHz)
–Minimum input arrival time before clock: ns
–Maximum output required time after clock: ns
Run Time
Main operation               CLK number per data item
PC data to in-chip memory    38
ALU calculation              16
Threshold scanning           4
ID calculation               7
Memory data movement         1
In-chip FIFO to PC           1
x7 functions
For instance, a particular neuron has 1024 subspace data.
–PC to chip: 38x1024 = 38912 CLKs
–ALU calculation: 16x1024 = 16384 CLKs
–Threshold scan & ID calculation (maximum): (4x1024+7)x1024 = 4201472 CLKs
–Data movement (maximum): 1x1024 = 1024 CLKs
–Chip to PC: 1x1024 = 1024 CLKs
–Other (starting sequence, wait, handshaking, etc.): 20x1024 = 20480 CLKs
–Total: ( )x = CLKs
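The per-stage cycle counts for the 1024-item example can be reproduced with a few lines of Python. This is a sketch only: the stage names are hypothetical labels, and the share computed at the end assumes a single pass through every stage, which is an assumption for illustration and not the total given on the slide.

```python
N = 1024  # subspace data items for the example neuron

# Clock cycles per data item, taken from the run-time figures.
per_item = {
    "pc_to_chip": 38,
    "alu": 16,
    "data_movement": 1,
    "chip_to_pc": 1,
    "other": 20,
}

cycles = {stage: k * N for stage, k in per_item.items()}

# Worst case for threshold scan and ID calculation: (4*N + 7) cycles,
# repeated N times, as in the expression (4x1024+7)x1024.
cycles["scan_and_id_max"] = (4 * N + 7) * N

# Assumption: one pass through every stage, used only to see which stage dominates.
dominant = max(cycles, key=cycles.get)
share = cycles[dominant] / sum(cycles.values())

for stage, count in cycles.items():
    print(f"{stage:16s} {count:>8d} CLKs")
print(f"dominant stage: {dominant} ({share:.1%} of a single pass)")
```

Under that single-pass assumption, the threshold scan and ID calculation account for nearly all cycles, because they are the only stage whose cost grows with the square of the number of data items.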
Future Work - System SOLAR
SOLAR will grow
–Chip (Virtex XCV1000): 1 million gates
–Board (6 chips, 2x3): 6 million gates
–Rack (4 boards, 1x4): 24 million gates
–System (16 cabinets, 4x4): half a billion gates