Reza Yazdani Albert Segura José-María Arnau Antonio González


1 An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition

2 Automatic Speech Recognition (ASR)
Reza Yazdani An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition

3 ASR Requirements
Voice-based user interfaces for mobile devices require a large vocabulary, speaker independence, high accuracy, real-time performance, and energy efficiency.

4 ASR Solutions
General-purpose platforms

5 Outline
Motivation; Automatic Speech Recognition; Accelerated ASR System; Memory Subsystem Optimizations (Prefetcher, Bandwidth Reduction); Experimental Results; Conclusions

6 Automatic Speech Recognition
State-of-the-art ASR system: a hybrid DNN + HMM model.
Pipeline: sound signal → feature extraction → likelihood computation (DNN, on the GPU) → graph search → speech (words).

7 Graph Search
During training, a graph generator combines the acoustic model, the dictionary, and the language model into a single Weighted Finite-State Transducer (WFST). The Viterbi search then explores this WFST to find the most likely sequence of words.

8 Viterbi Search
A simple example of a WFST for detecting two words: "three" and "two".
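To make the example concrete, here is a minimal Python sketch of a toy WFST for the two words, stored in a compressed (CSR-style) layout in which each state's outgoing arcs are contiguous. The phoneme labels (th r iy, t uw), the state numbering, the weights, and the names `Arc`, `to_csr`, and `out_arcs` are all hypothetical; the slide's actual graph and the accelerator's real memory layout may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    dest: int      # destination state
    ilabel: str    # input label: phoneme consumed by this arc
    olabel: str    # output label: word emitted ("" means epsilon, no word)
    weight: float  # transition probability


# Hypothetical toy WFST recognizing "three" (th r iy) and "two" (t uw).
# State 0 is the start state; states 3 and 5 are final and have no arcs.
ARCS_BY_STATE = {
    0: [Arc(1, "th", "", 0.5), Arc(4, "t", "", 0.5)],
    1: [Arc(2, "r", "", 1.0)],
    2: [Arc(3, "iy", "THREE", 1.0)],
    3: [],
    4: [Arc(5, "uw", "TWO", 1.0)],
    5: [],
}


def to_csr(arcs_by_state):
    """Flatten per-state arc lists into one arc array plus per-state offsets."""
    first_arc = []  # index of each state's first arc in `arcs`
    arcs = []
    for state in sorted(arcs_by_state):
        first_arc.append(len(arcs))
        arcs.extend(arcs_by_state[state])
    first_arc.append(len(arcs))  # sentinel: end of the last state's arcs
    return first_arc, arcs


def out_arcs(first_arc, arcs, state):
    """Return the outgoing arcs of `state` from the CSR layout."""
    return arcs[first_arc[state]:first_arc[state + 1]]


first_arc, arcs = to_csr(ARCS_BY_STATE)
print(first_arc)                                          # [0, 2, 3, 4, 4, 5, 5]
print([a.ilabel for a in out_arcs(first_arc, arcs, 0)])  # ['th', 't']
```

With this layout a state's arcs are found from two offsets, which is why the number of arcs touched per frame, rather than the number of states, dominates memory traffic.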

9 Viterbi Search
[Figure: Viterbi search on the example WFST across frames 0 to 3. Starting from a likelihood of 1.0, token scores are propagated frame by frame, unlikely paths are pruned, and the surviving path recognizes THREE.]

11 Accelerated ASR System

12 Accelerator’s Architecture
The Viterbi accelerator reads the WFST and the acoustic scores from main memory and builds a dynamic search graph, expanding the tokens of frame i into the tokens of frame i+1.
On average, less than 1% of the states are active on each frame evaluation.
Solution: a hash table that maps each active state ID to its token information.
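A hash table fits this access pattern because only a tiny, changing subset of states is active. The following Python sketch assumes a fixed-capacity, open-addressing table with linear probing, keyed by state ID, that keeps the best score per state (Viterbi recombination). The class name `TokenHashTable`, the capacity, and the modulo hash are hypothetical and not taken from the accelerator's design. One table would hold frame i's tokens while a second collects frame i+1's.

```python
class TokenHashTable:
    """Fixed-capacity open-addressing hash table: state ID -> best token score.

    Hypothetical sketch: capacity, modulo hash, and linear probing are
    illustrative choices. Entries are never deleted within a frame, so
    plain linear probing needs no tombstones.
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.keys = [None] * capacity   # state IDs (None marks an empty slot)
        self.scores = [0.0] * capacity  # best likelihood reaching each state
        self.size = 0

    def _find(self, state_id):
        """Slot holding state_id, else the first empty slot, else None (full)."""
        i = state_id % self.capacity
        for _ in range(self.capacity):
            if self.keys[i] is None or self.keys[i] == state_id:
                return i
            i = (i + 1) % self.capacity  # linear probing on collision
        return None

    def update(self, state_id, score):
        """Insert a token, or keep the better one if the state is already active."""
        i = self._find(state_id)
        if i is None:
            raise OverflowError("token table is full")
        if self.keys[i] is None:
            self.keys[i] = state_id
            self.scores[i] = score
            self.size += 1
        elif score > self.scores[i]:
            self.scores[i] = score  # Viterbi: keep only the most likely path

    def get(self, state_id):
        """Best score for state_id, or None if the state is not active."""
        i = self._find(state_id)
        if i is None or self.keys[i] is None:
            return None
        return self.scores[i]


table = TokenHashTable(capacity=8)
table.update(6, 0.025)
table.update(14, 0.012)  # 14 % 8 == 6: collides with state 6, probes to slot 7
table.update(6, 0.9)     # state 6 reached again: keep the better score
print(table.get(6), table.get(14), table.size)  # 0.9 0.012 2
```

Because the table is sized for the active states rather than for all states, its storage stays small even though the WFST itself is large.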

14 Potential Improvement
With perfect caches and hash tables, the speedup with respect to the baseline architecture is a 94.6% improvement.
Challenge: the large memory footprint (34 million arcs).

15 Hardware Prefetching
The search dynamically accesses a small, sparsely distributed subset of arcs: on average, 25K out of 34M arcs.
Conventional prefetchers are inefficient: graph search exhibits an unpredictable access pattern, and pruning unlikely paths makes it even less predictable.
Our proposed scheme is based on decoupled access-execute: all memory addresses are deterministic after pruning, so memory requests can be issued well in advance.
High accuracy: addresses are computed rather than predicted.
Timeliness: a reorder buffer avoids early evictions.
Result: 94% speedup with a negligible area overhead of 0.05%.
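The point that addresses are computed rather than predicted can be sketched in Python. Once pruning has fixed the surviving states, an access stage can enumerate every arc address and issue requests ahead of the execute stage. The CSR offsets, the 16-byte arc size, the base address, the fixed memory latency, and the `lookahead` parameter are all assumptions; the in-order queue only stands in for the reorder buffer and does not model cache evictions.

```python
from collections import deque


def arc_addresses(surviving_states, first_arc, arc_size=16, base=0x1000):
    """Access stage: after pruning, compute (not predict) every arc address."""
    for state in surviving_states:
        for arc in range(first_arc[state], first_arc[state + 1]):
            yield base + arc * arc_size


def simulate(addresses, latency=20, lookahead=8):
    """Cycles needed to consume `addresses` in order with up to `lookahead`
    requests in flight (lookahead=1 models fetching each arc on demand)."""
    pending = deque(addresses)  # addresses not yet requested
    in_flight = deque()         # (address, ready_cycle), kept in program order
    cycle = 0
    while pending or in_flight:
        # Access side: issue requests ahead of the execute side.
        while pending and len(in_flight) < lookahead:
            in_flight.append((pending.popleft(), cycle + latency))
        # Execute side: wait for the oldest arc's data, then spend 1 cycle on it.
        _, ready = in_flight.popleft()
        cycle = max(cycle, ready) + 1
    return cycle


first_arc = [0, 2, 3, 4, 4, 5, 5]  # hypothetical CSR offsets for a toy WFST
addrs = list(arc_addresses([0, 1, 2, 4], first_arc))
print(addrs)                                                    # [4096, 4112, 4128, 4144, 4160]
print(simulate(addrs, lookahead=1), simulate(addrs, lookahead=8))  # 105 25
```

In this toy model, issuing all five requests up front hides most of the memory latency. A conventional prefetcher could not do the same here, because the surviving states are not known until pruning finishes.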

17 Bandwidth Reduction
97% of dynamically expanded states have fewer than 16 arcs.
A novel technique for directly computing arc addresses: changing the memory layout of the WFST dataset avoids the memory accesses that fetch each state's data.
Result: 20% memory bandwidth saving at a negligible cost of 0.02% area increase.
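One way to compute arc addresses directly can be sketched under stated assumptions. Renumber the states so that states with the same arc count are contiguous, keep one small table entry per arc count below 16, and derive a state's first-arc index arithmetically instead of reading its state record. The function names, the `MAX_DIRECT` constant, and the single overflow group for states with 16 or more arcs are hypothetical. This illustrates the idea and is not necessarily the authors' exact layout.

```python
from bisect import bisect_right

MAX_DIRECT = 16  # states with fewer arcs than this get computed addresses


def build_layout(num_arcs):
    """Renumber states, grouping them by arc count (capped at MAX_DIRECT).

    num_arcs[s] is the arc count of original state s. Returns
    (order, group_start, group_first_arc, group_key), where order[new] = old.
    """
    key = lambda s: min(num_arcs[s], MAX_DIRECT)  # large states share one group
    order = sorted(range(len(num_arcs)), key=key)
    group_start, group_first_arc, group_key = [], [], []
    arc = 0
    for new, old in enumerate(order):
        if not group_key or group_key[-1] != key(old):
            group_start.append(new)    # first renumbered state of the group
            group_first_arc.append(arc)  # index of that state's first arc
            group_key.append(key(old))
        arc += num_arcs[old]
    return order, group_start, group_first_arc, group_key


def first_arc_index(state, group_start, group_first_arc, group_key):
    """First-arc index of a renumbered state, or None if memory must be read."""
    g = bisect_right(group_start, state) - 1  # group containing `state`
    k = group_key[g]
    if k >= MAX_DIRECT:
        return None  # large state: fall back to fetching its record from memory
    return group_first_arc[g] + (state - group_start[g]) * k


num_arcs = [2, 1, 20, 1, 2, 3]  # hypothetical arc counts, by original state ID
order, starts, first_arcs, keys = build_layout(num_arcs)
print(order)                                         # [1, 3, 0, 4, 5, 2]
print(first_arc_index(3, starts, first_arcs, keys))  # 4
print(first_arc_index(5, starts, first_arcs, keys))  # None (20 arcs)
```

In hardware, the `bisect_right` lookup would correspond to a few parallel comparisons against the group boundaries. Arcs would also need their destination state IDs rewritten to the new numbering, which this sketch omits.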

19 Evaluation Methodology
Timing of the Viterbi accelerator: estimated with a cycle-accurate simulator, which also reports execution and activity factors.
Logic components: RTL Verilog model, used to determine the design frequency.
Memory components: modeled with CACTI to obtain cache and memory latencies.
Power model: CACTI for memory and caches; Synopsys Design Compiler for logic.
Technology node: 28 nm.

20 Experimental Results
111.47x speedup; 16.7x speedup; 1185x reduction.

22 Conclusion
Viterbi search is the main bottleneck in ASR systems.
General-purpose solutions are not real-time for large speech models and have high energy consumption.
We designed an accelerator tailored for the Viterbi search that is more energy-efficient by orders of magnitude.
The memory subsystem is the main challenge of ASR; we address it with an arc prefetcher and memory bandwidth reduction.
Result: 1.7x faster than an NVIDIA GTX 980, with 287x less energy.


