Energy efficient calculations of text similarity measure on FPGA-accelerated computing platforms Michał Karwatowski 1,2, Paweł Russek 1,2, Maciej Wielgosz.

Presentation transcript:

Energy efficient calculations of text similarity measure on FPGA-accelerated computing platforms. Michał Karwatowski 1,2, Paweł Russek 1,2, Maciej Wielgosz 1,2, Sebastian Koryciak 1,2, Kazimierz Wiatr 1,2. 1 AGH University of Science and Technology, al. Mickiewicza 30, Kraków; 2 ACK Cyfronet AGH, ul. Nawojki 11, Kraków. PPAM Kraków

Agenda: Energy consumption in data centers; Text processing; Low energy FPGA cluster; Experiments; Results; Conclusions and future work

Energy consumption in data centers: HUGE energy consumption; Complex algorithms require computing power; Text processing; Use different hardware

Text similarity calculation: VSM; TF-IDF; Cosine similarity

Vector Space Model

Term Frequency – Inverse Document Frequency weighting scheme
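
The weighting scheme named on this slide is commonly defined as a term's relative frequency in a document multiplied by the logarithm of its inverse document frequency. A minimal Python sketch, assuming that textbook variant (the exact formula shown on the slide is not in the transcript; the function name `tf_idf` and the sample documents are illustrative only):

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} dict per document.

    docs is a list of token lists. Assumed variant:
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Illustrative corpus, not taken from the presentation.
docs = [["fpga", "energy", "fpga"], ["energy", "text"], ["text", "similarity"]]
vectors = tf_idf(docs)
```

Each document becomes a sparse vector in the Vector Space Model, with zero weight implied for every term that is absent from it.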

Cosine similarity measure
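
Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. A minimal software sketch for sparse vectors, assuming they are stored as `{term_id: weight}` dictionaries (the storage format and the function name are assumptions; this is not the FPGA implementation described in the talk):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {term_id: weight} dicts."""
    # Iterate over the shorter vector so the dot product touches fewer entries.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for empty or all-zero vectors.
    return dot / (norm_a * norm_b)


# Illustrative vectors: dot = 9, norms = 5 and 3.
similarity = cosine_similarity({1: 3.0, 2: 4.0}, {1: 3.0})
```

Only terms present in both vectors contribute to the dot product, which is why sparse text vectors make the comparison cheap relative to the dictionary size.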

Text comparison

ZedBoard: Dual-core ARM Cortex-A9, 667 MHz; 512 MB RAM connected to PS; FPGA XC7Z020, 85k logic cells, 140 block RAMs

Cluster

Hadoop

VC707: Intel Core i MHz; 12 GB RAM; FPGA VX485T, 485k logic cells, 1030 block RAMs; PCIe Gen2x8

Experiment scheme

Runtime for 1 – 8 vectors

Runtime for 1 – 32 vectors

Zynq energy consumption W4.35 W

Virtex energy consumption W180 W

Average energy consumption [uJ]
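
An average energy figure in microjoules can be derived from measured power and runtime, since energy is power multiplied by time. The sketch below assumes the average is taken per vector comparison, which the transcript does not state. The 4.35 W value is the one that survives on the Zynq slide; the runtime and comparison count are hypothetical placeholders, not measurements from the talk.

```python
def energy_per_comparison_uj(power_w, runtime_s, n_comparisons):
    """Average energy per comparison in microjoules (1 J = 1e6 uJ)."""
    if n_comparisons <= 0:
        raise ValueError("n_comparisons must be positive")
    return power_w * runtime_s / n_comparisons * 1e6


# 4.35 W appears on the Zynq slide; 2.0 s and 1,000,000 comparisons are hypothetical.
example_uj = energy_per_comparison_uj(4.35, 2.0, 1_000_000)
```

A platform that draws more power can still come out ahead on this metric if it finishes the same comparisons proportionally faster.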

Resource utilization

Conclusions: Speedup achieved: Zynq 11.7 times faster, Virtex 10.5 times faster. Energy consumption: Zynq 10.8 times lower, Virtex 12.9 times lower.

Work in progress: 32 internal channels in Zynq; 192 internal channels in Virtex; Database in DDR3 memory

Questions