
© University of North Carolina at Charlotte. Chapter 9: Green Computing Platforms for Biomedical Systems. Vinay Vijendra Kumar Lakshmi, Ashish Panday, Arindam.


1 © University of North Carolina at Charlotte
Chapter 9: Green Computing Platforms for Biomedical Systems
Vinay Vijendra Kumar Lakshmi, Ashish Panday, Arindam Mukherjee, and Bharat S Joshi
University of North Carolina at Charlotte
HANDBOOK ON GREEN INFORMATION AND COMMUNICATION SYSTEMS

2 Overview
- Green Computing in Biomedical Field
- Survey of Green Computing Platforms
- Analysis of Popular Biomedical Applications
- Design Framework for Biomedical Embedded Processors
- Survey of Simulation Tools for Design Space Exploration
- Development and Characterization of Benchmark Suite
- Design Space Exploration and Optimization Techniques of Embedded Microarchitectures
- Conclusion
- Future Research Areas

3 Green Computing in Biomedical Field
Computing in biomedical systems can be classified into three categories:
- Implantable device level
- Portable/embedded platform level
- Server level

4 Characteristics of Biomedical Systems
- Power consumption
- Renewable energy resource (energy harvesting)
- Heat dissipation
- Minimizing area
- Cost
- Performance

5 Survey of Green Computing Platforms: Implantable Devices
- Implantable devices monitor the physiological parameters of the human body.
- Examples: pacemakers, cardioverter-defibrillators, cochlear implants.
- Most implantable devices are inactive most of the time and activate in response to a stimulus from the body.
Figure: Configuration of a brain implant or brain-machine interface (BMI)

6 Embedded Platforms
- Physiological monitoring systems
- Recognition systems
- Wearable ultra-low-power biomedical signal processor, CoolBio

7 Power Management in Intel ATOM
ATOM includes a power management control block, a power management block, a clock synthesizer, and a few programmable registers. Together they reduce noise, achieve low quiescent current, and switch voltage and frequency dynamically in real time between multiple performance modes, varying the core operating voltage and processor speed to save power and improve performance.
Figure: Power management in Intel ATOM
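The benefit of switching voltage and frequency together can be illustrated with the standard first-order CMOS approximation, dynamic power ≈ activity × C × V² × f. This relation is a textbook model and is not taken from the slide; the capacitance and the two operating points below are hypothetical values chosen only for illustration, not ATOM specifications.

```python
def dynamic_power_watts(capacitance_farads, voltage_volts, frequency_hz, activity=1.0):
    """First-order CMOS dynamic power estimate: activity * C * V^2 * f."""
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz


# Hypothetical performance modes as (core voltage in V, clock in Hz).
# These numbers are assumptions for illustration, not Intel data.
MODES = {
    "high_performance": (1.1, 1.6e9),
    "low_power": (0.8, 0.8e9),
}

# Assumed effective switched capacitance (hypothetical).
SWITCHED_CAPACITANCE_F = 1e-9

power_by_mode = {
    name: dynamic_power_watts(SWITCHED_CAPACITANCE_F, volts, hertz)
    for name, (volts, hertz) in MODES.items()
}

# Fraction of high-performance dynamic power drawn in the low-power mode.
low_to_high_ratio = power_by_mode["low_power"] / power_by_mode["high_performance"]

if __name__ == "__main__":
    for name, watts in power_by_mode.items():
        print(f"{name}: {watts:.3f} W")
    print(f"low/high ratio: {low_to_high_ratio:.3f}")
```

Because voltage enters quadratically, lowering voltage along with frequency saves more power than lowering frequency alone, which is why the two are usually scaled together.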

8 Servers
The Oracle WebLogic Server 11g software was used to demonstrate the performance of the Avitek Medical Records sample application. A configuration using SPARC T3-1B and SPARC Enterprise M5000 servers from Oracle showed excellent scaling across different configurations and doubled the performance of the previous-generation SPARC blade.

Server | Processor | Memory | Maximum TPS
SPARC T3-1B | 1 x SPARC T3, 1.65 GHz, 16 cores | 128 GB | 28,156
SPARC T3-1B | 1 x SPARC T3, 1.65 GHz, 8 cores | 128 GB | 14,030
Sun Blade T6320 | 1 x UltraSPARC T2, 1.4 GHz, 8 cores | 64 GB | 13,386
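The scaling and "doubling" claims can be checked directly from the maximum TPS figures on this slide. The short sketch below only does that arithmetic; the variable and function names are illustrative.

```python
# (server, cores, maximum TPS) as listed on the slide.
RESULTS = [
    ("SPARC T3-1B", 16, 28156),
    ("SPARC T3-1B", 8, 14030),
    ("Sun Blade T6320", 8, 13386),
]


def tps_ratio(new_tps, old_tps):
    """Throughput ratio of one configuration over another."""
    return new_tps / old_tps


t3_16_cores = RESULTS[0][2]
t3_8_cores = RESULTS[1][2]
t2_8_cores = RESULTS[2][2]

# Scaling from 8 to 16 cores on the same SPARC T3-1B blade.
core_scaling = tps_ratio(t3_16_cores, t3_8_cores)

# 16-core SPARC T3-1B versus the previous-generation Sun Blade T6320.
generation_gain = tps_ratio(t3_16_cores, t2_8_cores)

# Throughput per core for each configuration.
tps_per_core = [(name, cores, tps / cores) for name, cores, tps in RESULTS]

if __name__ == "__main__":
    print(f"8 -> 16 cores: {core_scaling:.2f}x")
    print(f"T3-1B (16 cores) vs T6320: {generation_gain:.2f}x")
    for name, cores, per_core in tps_per_core:
        print(f"{name} ({cores} cores): {per_core:.2f} TPS per core")
```

Doubling the core count roughly doubles throughput, and the 16-core blade delivers a little more than twice the TPS of the previous-generation blade.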

9 Cell Processor

10 Analysis of Biomedical Applications
Figure: Flowchart for choosing the algorithm-architecture combination best suited for an application

11 Pairwise Correlation
Another way to interpret the PPMCC:
X = {x_1, x_2, x_3, ..., x_n}
Y = {y_1, y_2, y_3, ..., y_n}
r : coefficient of correlation
Cov(X,Y) : covariance of X and Y
S_X : standard deviation of X
S_Y : standard deviation of Y
µ_X : expectation of X
µ_Y : expectation of Y
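The formula itself did not survive in this transcript. For reference, the standard definition of the Pearson product-moment correlation coefficient, written with the symbols defined on this slide, is given below; it is the textbook form and is assumed, not confirmed, to match what the slide showed.

```latex
r = \frac{\mathrm{Cov}(X, Y)}{S_X \, S_Y}
  = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{S_X \, S_Y}
  = \frac{\sum_{k=1}^{n} (x_k - \mu_X)(y_k - \mu_Y)}
         {\sqrt{\sum_{k=1}^{n} (x_k - \mu_X)^2} \; \sqrt{\sum_{k=1}^{n} (y_k - \mu_Y)^2}}
```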

12 Pairwise Correlation (continued)
i, j : i-th, j-th channel, where 1 ≤ i, j ≤ m
x(i,k), x(j,k) : k-th sample from the i-th, j-th channel, where 1 ≤ i, j ≤ m, i ≠ j and 1 ≤ k ≤ n
r(i,j) : correlation coefficient between the i-th and j-th channel, where 1 ≤ i, j ≤ m
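A minimal serial sketch of pairwise correlation over m channels of n samples each, using the standard Pearson formula. The function names are illustrative and the indices are 0-based, unlike the 1-based notation on the slide; this is not the authors' benchmark code.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sample lists."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("channels must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_x * var_y)


def pairwise_correlation(channels):
    """Return {(i, j): r(i, j)} for every channel pair with i < j."""
    return {
        (i, j): pearson_r(channels[i], channels[j])
        for i, j in combinations(range(len(channels)), 2)
    }


if __name__ == "__main__":
    demo = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for pair, r in pairwise_correlation(demo).items():
        print(pair, round(r, 3))
```

Only pairs with i < j are computed because r(i,j) = r(j,i), which halves the work relative to a full m × m matrix.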

13 Choosing initial algorithm and architecture
Initially the PWC is written in serial fashion for the Intel Xeon dual-core processor. Running VTune gives the following statistics.

Table 1: Performance of serial code on Intel Xeon Dual Core processor
Code | CPI | L1I_MISS % | L1D_MISS % | L2_MISS %
Serial code | 0.84 | 22.98 | 91.54 | 60.77

The code is then parallelised with OpenMP and analysed again, giving the improved values below.

Table 4.3: Performance of OpenMP code on Intel Xeon Dual Core processor
Code | CPI | L1I_MISS % | L1D_MISS % | L2_MISS %
Parallel code (OMP) | 0.67 | 27.84 | 89.23 | 25.67

Implementation on Cell using the Ring Algorithm gives a speed-up of approximately 56 compared with the serial version on the Intel Xeon.
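The slide does not describe the Ring Algorithm itself. The sketch below shows one common ring-style schedule for all-pairs work, offered only as an assumption about the general idea: channels are split into one block per worker, each block is passed around a ring, and at each step a worker correlates its own block with the visiting one. All names here are hypothetical and nothing is specific to the Cell processor.

```python
from itertools import combinations


def ring_schedule(num_workers):
    """List of steps; each step is a list of (owner_block, visiting_block)."""
    steps = []
    for shift in range(num_workers // 2 + 1):
        step = []
        for owner in range(num_workers):
            visiting = (owner + shift) % num_workers
            # With an even worker count the half-way shift would visit each
            # block pair twice, so only the lower-numbered owner keeps it.
            if shift * 2 == num_workers and owner >= visiting:
                continue
            step.append((owner, visiting))
        steps.append(step)
    return steps


def split_blocks(num_channels, num_workers):
    """Assign channels to workers round-robin (an arbitrary choice)."""
    return [list(range(w, num_channels, num_workers)) for w in range(num_workers)]


def channel_pairs(num_channels, num_workers):
    """Every channel pair produced by the ring schedule, in schedule order."""
    blocks = split_blocks(num_channels, num_workers)
    pairs = []
    for step in ring_schedule(num_workers):
        for owner, visiting in step:
            if owner == visiting:
                # Shift 0: pairs inside the worker's own block.
                pairs.extend(combinations(blocks[owner], 2))
            else:
                # Later shifts: own block against the visiting block.
                pairs.extend((i, j) for i in blocks[owner] for j in blocks[visiting])
    return pairs


if __name__ == "__main__":
    for step_index, step in enumerate(ring_schedule(4)):
        print(step_index, step)
```

In this scheme every unordered channel pair is handled exactly once, and within a step the workers operate on disjoint block pairs, so they can run in parallel.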

14 Design Framework for Biomedical Embedded Processors
Figure: Design flow for biomedical embedded processors

15 Survey of Simulation Tools for Design Space Exploration
Tools compared: MV5, M5, CASPER, Sesc
Features compared: full-system simulation, system-call emulation, I/O disk, ISA, emulated thread API, category, IO core, multithreaded core, OOO core, SIMD core

Feature | MV5 | M5 | CASPER | Sesc
ISA | Alpha | Various | Sparc | Mips
Category | Event driven | Cycle driven | Trace driven | Event driven

16 Development and Characterization of Benchmark Suite
A good multicore benchmark will identify bottlenecks in the multicore system design, including memory and I/O bottlenecks, computational bottlenecks, and real-time bottlenecks*. In addition, a good multicore benchmark will identify synchronization problems where code and data blocks are split, distributed to various compute engines for processing, and then the results are reassembled.
* S. Gal-On, M. Levy, S. Leibson, "How to Survive the Quest for a Useful Multicore Benchmark", ECN Magazine, Dec 2009

17 Performance analysis of the benchmark
Analysis of PWC on various simulator tools: CASPER
Figure: CPI per core on CASPER (CPI vs. D$ size in bytes)
Figure: Average power per core on CASPER (average power in uW vs. D$ size in bytes)

18 MV5 Simulation
Analysis of the parallel version of the code (per-CPU results) on MV5 with various configurations

Configuration | Frequency | Number of SIMD CPUs | Number of OOO CPUs | No. of HW+SW threads | Benchmark used | Host memory usage | Simulation time (seconds)
fractal_smp | 1 GHz | 4 | 0 | 64+2 | FILTER | 1.217 MB | 0.019065
Fractal_smp | 1 GHz | 4 | 0 | 64+2 | PPPC | 1.207 MB | 0.001364
Config_hetero | 1 GHz | 2 | 2 | 32+2 | FILTER PPPC | 2.234 MB | 0.070888
Config_hetero | 1 GHz | 4 | 4 | 32+2 | FILTER PPPC | 2.255 MB | 1050.42

Configuration | Total energy of CPU (mJ) | Total leakage energy of CPU (mJ) | Clock active energy (uJ) | Total cache energy (mJ) | D$ miss rate | I$ miss rate | Floating ALU active energy (mJ) | Integer ALU active energy (mJ)
Fractal smp on FILTER | 26.358100 | 1.713209 | 0.000956 | 2.186035 | 0.257 | 0.162 | 1.0877987 | 1.785665
Fractal smp on PPPC | 0.010644 | 0.002118 | 0.000188 | 0.003291 | 2.195 | 0.078 | 0.0004001 | 0.000844
Config_hetero FILTER + PPPC | 29.918695 | 1.543702 | 0.000292 | 12.097728 | 2.876 | 0.018 | 2.241064 | 4.005570
Config_hetero FILTER + PPPC | 32.2689733 | 4.1982526 | 0.000182 | 0.0081944 | 1.747 | 0.000001 | 1.0877992 | 1.7854455

19 Design Space Exploration and Optimization Techniques of Embedded Microarchitectures
Different approaches used for design space exploration of multicore processor architectures, and optimization algorithms:
- Artificial Neural Networks (ANN)
- Fast Genetic Algorithms (used in CASPER)
- Genetically Programmed Response Surfaces (GPRS, used on MV5)
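To make the genetic-algorithm approach concrete, here is a plain textbook genetic algorithm searching a tiny design space. It is not CASPER's fast genetic algorithm: the design parameters, their value ranges, and `toy_cost` are all hypothetical stand-ins for what a simulator run would provide.

```python
import random

# Hypothetical design space: every value below is made up for illustration.
CACHE_SIZES_KB = [4, 8, 16, 32, 64]
CORE_COUNTS = [1, 2, 4, 8]
FREQS_MHZ = [200, 400, 800, 1000]
OPTIONS = (CACHE_SIZES_KB, CORE_COUNTS, FREQS_MHZ)


def toy_cost(design):
    """Made-up energy-like cost; a real flow would query a simulator."""
    cache_kb, cores, freq_mhz = design
    runtime = 1000.0 / (cores * freq_mhz) * (1.0 + 8.0 / cache_kb)
    power = cores * (0.001 * freq_mhz + 0.01 * cache_kb) + 0.5
    return runtime * power


def random_design(rng):
    return tuple(rng.choice(values) for values in OPTIONS)


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from one parent at random."""
    return tuple(rng.choice(genes) for genes in zip(parent_a, parent_b))


def mutate(design, rng, rate=0.2):
    """Resample each gene from its allowed values with probability rate."""
    return tuple(
        rng.choice(values) if rng.random() < rate else gene
        for gene, values in zip(design, OPTIONS)
    )


def explore(generations=20, population_size=12, seed=0):
    """Return the lowest-cost design found."""
    rng = random.Random(seed)
    population = [random_design(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=toy_cost)
        # Elitism: the better half survives unchanged into the next generation.
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return min(population, key=toy_cost)


if __name__ == "__main__":
    best = explore()
    print("best design (cache KB, cores, MHz):", best, "cost:", round(toy_cost(best), 4))
```

Because the better half of each generation is carried over unchanged, the best cost found never gets worse from one generation to the next.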

20 Conclusion
- Methodologies for the characterization of biomedical applications for ultra-low-energy, low-heat embedded implantable devices, as well as for low-power, high-performance embedded computing platforms.
- The PWC benchmark has a computational complexity of O(mn²) and achieved a CPI of 0.67 and an L2 cache miss percentage of 25.67 on the Intel Xeon dual-core processor.
- An outline of the procedure to follow for design space exploration of processor microarchitectures using existing simulation tools and optimizers.
- A heterogeneous configuration with two IO and two OOO cores consumes less energy per CPU (29.918 mJ) than a homogeneous configuration in MV5's Alpha architecture simulation.

21 Future Research Areas
- Development of new and better instruction set architectures (ISAs).
- Corresponding cross-compilers to generate optimized executables for the simulators.
- Upgrading existing simulation platforms to support full-system mode with real-time kernel libraries, to account for the latency and throughput of real-life applications.
- Development of advanced real-time operating systems and scheduling algorithms that schedule applications across heterogeneous cores to meet hard real-time constraints.

22 Thanks for your attention!

