Mahdi Nazemi, Shahin Nazarian, and Massoud Pedram July 10, 2017


1 High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis
Mahdi Nazemi, Shahin Nazarian, and Massoud Pedram
July 10, 2017

2 Outline
Independent Component Analysis (ICA)
Motivations for Using ICA
Equivariant Adaptive Separation via Independence (EASI) Algorithm
Proposed Algorithm and Hardware Implementation
Results

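3 Independent Component Analysis (ICA)
ICA can be defined as estimation of a generative model:
$\boldsymbol{x}_{m \times 1} = \boldsymbol{A}_{m \times n}\, \boldsymbol{s}_{n \times 1}, \quad m \ge n$
  $\boldsymbol{x}$: observed random variables
  $\boldsymbol{A}$: mixing matrix
  $\boldsymbol{s}$: independent components (ICs)
Objective: estimate both mixing coefficients and ICs
Another variation:
$\boldsymbol{y}_{n \times 1} = \boldsymbol{B}_{n \times m}\, \boldsymbol{x}_{m \times 1}$
  $\boldsymbol{B}$: separation matrix
  $\boldsymbol{y}$: estimates of ICs
ICA allows feature extraction, i.e., to keep features that explain the essential structure of the data.
Cocktail party problem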
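The two models on this slide, $\boldsymbol{x} = \boldsymbol{A}\boldsymbol{s}$ and $\boldsymbol{y} = \boldsymbol{B}\boldsymbol{x}$, can be illustrated with a minimal NumPy sketch. The sizes, the uniform source distribution, and the variable names are assumptions chosen for illustration. The separation matrix is taken as the pseudo-inverse of the mixing matrix, which is possible only because this toy example knows $\boldsymbol{A}$; in ICA proper, $\boldsymbol{B}$ has to be estimated from the observations alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m observed variables, n independent components, m >= n.
m, n, num_samples = 4, 2, 1000

# s: independent components, one column per sample (uniform sources assumed).
s = rng.uniform(-1.0, 1.0, size=(n, num_samples))

# A: mixing matrix (unknown in a real ICA problem).
A = rng.standard_normal((m, n))

# x = A s: the observed mixtures.
x = A @ s

# B: separation matrix. Here we cheat and use the pseudo-inverse of A,
# which recovers s exactly when A has full column rank.
B = np.linalg.pinv(A)

# y = B x: estimates of the independent components.
y = B @ x

print("max |y - s| =", np.max(np.abs(y - s)))
```

In the cocktail party setting, each row of `x` would be one microphone signal and each row of `s` one speaker.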

5 Bayesian Neural Networks (BNNs)
Inputs, weights, or outputs follow a probability distribution function
Sampling dependent features is complicated and computationally expensive
ICA finds independent components ⇒ simplifies sampling
RNGs for sampling: Wallace, Ziggurat

6 ICA for Dimensionality Reduction
Preprocessing step for transforming the original problem into a smaller problem suitable for hardware implementation
MNIST dataset
  Input features: 784 → 32 (~25x ↓)
  Accuracy: 0.23% ↓
ICA not only reduces redundancy, but also yields a problem better suited to hardware

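8 EASI Algorithm
Equivariant
Adaptive
Stochastic Gradient Descent (SGD) optimization
Combines whitening and separation
Only requires addition and multiplication
$\boldsymbol{x}_k$: observations ($m$)
$\boldsymbol{B}_k$: separation matrix
$\boldsymbol{y}_k$: output features ($n$)
Stages: S1, S2, S3, S4 (input: $\boldsymbol{x}_k$)
$\boldsymbol{y}_k = \boldsymbol{B}_k \boldsymbol{x}_k$
Nonlinearity: $\boldsymbol{g}(\boldsymbol{y}_k) = \boldsymbol{y}_k^3$
$\boldsymbol{H} = \boldsymbol{I} - \boldsymbol{y}_k \boldsymbol{y}_k^T + \boldsymbol{g}(\boldsymbol{y}_k)\, \boldsymbol{y}_k^T - \boldsymbol{y}_k\, \boldsymbol{g}(\boldsymbol{y}_k)^T$
$\boldsymbol{B}_{k+1} = \boldsymbol{B}_k - \mu \boldsymbol{H} \boldsymbol{B}_k$
Repeat until convergence (loop-carried dependency)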
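A single SGD update of this kind can be sketched in NumPy as follows. The function names `easi_h` and `easi_sgd_step`, the step size, and the toy data are assumptions made for illustration, not part of the presentation. One further assumption to note: the sketch writes the whitening term as $\boldsymbol{y}\boldsymbol{y}^T - \boldsymbol{I}$, the sign convention of the original EASI formulation by Cardoso and Laheld, whereas the slide text reads $\boldsymbol{I} - \boldsymbol{y}\boldsymbol{y}^T$. The loop at the bottom also makes the loop-carried dependency visible: each iteration needs the `B` produced by the previous one.

```python
import numpy as np


def easi_h(y):
    """Return the n x n matrix H for one output vector y, with g(y) = y^3."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)  # column vector (n, 1)
    g = y ** 3                                     # elementwise cubic nonlinearity
    n = y.shape[0]
    # ASSUMPTION: whitening term is y y^T - I (Cardoso and Laheld convention);
    # the slide text shows I - y y^T.
    return y @ y.T - np.eye(n) + g @ y.T - y @ g.T


def easi_sgd_step(B, x, mu):
    """One update B_{k+1} = B_k - mu * H * B_k; returns (B_next, y)."""
    y = B @ x              # output features for this sample
    H = easi_h(y)
    return B - mu * (H @ B), y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, mu = 4, 2, 0.01                      # hypothetical sizes and step size
    A = rng.standard_normal((m, n))            # toy mixing matrix
    B = 0.1 * rng.standard_normal((n, m))      # random initial separation matrix
    for _ in range(5000):
        # Unit-variance uniform sources (an assumption for this toy run).
        s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
        # Loop-carried dependency: this step uses the B from the previous step.
        B, y = easi_sgd_step(B, A @ s, mu)
    print("B @ A =\n", B @ A)
```

Only additions and multiplications appear in the update, which matches the slide's point about suitability for hardware.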

9 Hardware Implementation

10 Shortcomings of Existing Implementations
Clock frequency/throughput is low
  Each training sample has to wait for the immediately preceding sample to update the model parameters
Clock frequency or throughput decreases as $m$ and $n$ increase [Meyer-Baese, SPIE ’15]

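12 Proposed Algorithm
Stages: S1, S2, S3, S4, S5 (input: $\boldsymbol{x}_k^p$)
$\boldsymbol{x}_k$: observations ($m$)
$\boldsymbol{B}_k$: separating matrix
$\boldsymbol{y}_k$: output features ($n$)
$k$: index of mini-batch
$p$: index of training sample within a mini-batch
$P$: mini-batch size
Initialize $\boldsymbol{B}_0^0$ randomly
At the beginning of each mini-batch:
  initialize $p$ to 0
  initialize $\hat{\boldsymbol{H}}_k^p$ to a zero matrix
$\boldsymbol{y}_k^p = \boldsymbol{B}_k^p \boldsymbol{x}_k^p$
Nonlinearity: $\boldsymbol{g}(\boldsymbol{y}_k^p) = (\boldsymbol{y}_k^p)^3$
$\boldsymbol{H}_k^p = \boldsymbol{I} - \boldsymbol{y}_k^p (\boldsymbol{y}_k^p)^T + \boldsymbol{g}(\boldsymbol{y}_k^p)\, (\boldsymbol{y}_k^p)^T - \boldsymbol{y}_k^p\, \boldsymbol{g}(\boldsymbol{y}_k^p)^T$
$\hat{\boldsymbol{H}}_k^p = \begin{cases} \gamma\, \hat{\boldsymbol{H}}_{k-1}^{P} + \mu \boldsymbol{H}_k^p, & p = 0 \\ \beta\, \hat{\boldsymbol{H}}_k^{p-1} + \mu \boldsymbol{H}_k^p, & 0 < p < P \end{cases}$
$\boldsymbol{B}_k^{p+1} = \boldsymbol{B}_k^p - \hat{\boldsymbol{H}}_k^p \boldsymbol{B}_k^p$
increment $p$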
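The mini-batch recurrence on this slide can be sketched in software as below. This is one reading of the slide under several stated assumptions: the names `easi_h` and `easi_smbgd` are hypothetical; the accumulated matrix starts at zero before the first mini-batch and is then carried across mini-batches with weight $\gamma$; the whitening term is written as $\boldsymbol{y}\boldsymbol{y}^T - \boldsymbol{I}$ (the Cardoso and Laheld sign convention), whereas the slide text reads $\boldsymbol{I} - \boldsymbol{y}\boldsymbol{y}^T$; and the S1 to S5 hardware pipeline, with its stage latencies, is not modeled at all.

```python
import numpy as np


def easi_h(y):
    """Per-sample matrix H with cubic nonlinearity g(y) = y^3."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    g = y ** 3
    n = y.shape[0]
    # ASSUMPTION: y y^T - I sign convention (slide text shows I - y y^T).
    return y @ y.T - np.eye(n) + g @ y.T - y @ g.T


def easi_smbgd(X, B, mu, beta, gamma, P):
    """Run the mini-batch recurrence over the columns of X.

    X: (m, N) array of observations, one sample per column.
    B: (n, m) initial separating matrix.
    mu: step size; beta: within-batch weight; gamma: across-batch weight.
    P: mini-batch size. Trailing samples that do not fill a batch are ignored.
    """
    n = B.shape[0]
    H_hat = np.zeros((n, n))           # accumulated update, zero before batch 0
    num_batches = X.shape[1] // P
    for k in range(num_batches):       # k: index of mini-batch
        for p in range(P):             # p: index of sample within the mini-batch
            x = X[:, k * P + p]
            y = B @ x
            H = easi_h(y)
            # p == 0 reuses the last accumulated matrix of the previous
            # mini-batch (weight gamma); otherwise the previous sample's (beta).
            weight = gamma if p == 0 else beta
            H_hat = weight * H_hat + mu * H
            B = B - H_hat @ B          # mu is already folded into H_hat
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 4, 2
    A = rng.standard_normal((m, n))
    S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 4000))
    B0 = 0.1 * rng.standard_normal((n, m))
    B = easi_smbgd(A @ S, B0, mu=0.005, beta=0.5, gamma=0.5, P=4)
    print("B @ A =\n", B @ A)
```

With `beta = gamma = 0` the accumulated matrix reduces to `mu * H`, so the sketch falls back to the plain per-sample SGD update.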

13 Hardware Implementation
equations for critical path delay and throughput

15 FPGA Implementation
32-bit floating point variables and operations
$m = 4$ and $n = 2$
~11x increase in clock frequency
~149x increase in throughput
~23x increase in number of registers
Clock frequency is independent of $m$ and $n$
Metric: EASI with SGD / EASI with SMBGD
  Clock frequency (MHz): 4.81 / 55.17
  Throughput (MIPS): 717.21
  Adaptive Logic Modules (ALMs): 12731 / 10350
  DSPs (Multipliers): 42
  Registers (bits): 160 / 3648
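The quoted clock-frequency and register ratios can be checked directly from the values on this slide. The short sketch below does only that arithmetic; the variable names are illustrative. The throughput ratio is not recomputed because only one throughput value appears in this transcript.

```python
# Values copied from the results on this slide (SGD first, SMBGD second).
clock_mhz_sgd, clock_mhz_smbgd = 4.81, 55.17
registers_sgd, registers_smbgd = 160, 3648
alms_sgd, alms_smbgd = 12731, 10350

clock_ratio = clock_mhz_smbgd / clock_mhz_sgd      # quoted as ~11x
register_ratio = registers_smbgd / registers_sgd   # quoted as ~23x
alm_ratio = alms_smbgd / alms_sgd                  # below 1: fewer ALMs with SMBGD

print(f"clock frequency ratio: {clock_ratio:.2f}x")
print(f"register ratio:        {register_ratio:.2f}x")
print(f"ALM ratio:             {alm_ratio:.2f}x")
```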

16 Q&A
During Poster Session Later Today

