Presentation transcript:

Software Acceleration in Hybrid Systems
Xiaoqiao (XQ) Meng, IBM T.J. Watson Research Center, May 4, 2011

Hybrid Systems
A distributed system consisting of heterogeneous computing architectures, e.g., x86, PowerPC, ARM, GPU, FPGA. In a broader sense, a hybrid system also includes heterogeneous storage, I/O, and communication devices.
Advantages:
- Complementary strengths and weaknesses
- Cost-effective: mixing low-end and high-end computing platforms makes the system elastic. E.g., the TianHe supercomputer pairs Intel Xeon CPUs with AMD GPUs to achieve high floating-point performance in an economical manner
Use cases:
- General-purpose computing systems
- Fit-for-purpose appliances
(Presenter notes: Intel plus GPU as an example; HPC is a good example; GPUs are used in TianHe, and RoadRunner likewise combined Cell processors with x86 CPUs.)
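A minimal data-model sketch of such a system is shown below. The platform names, relative cost-per-MIPS figures, and core counts are illustrative assumptions, not values from the slides.

```java
// Illustrative model of a hybrid system: a pool of heterogeneous platforms,
// each with assumed attributes (relative cost per MIPS, core count), plus the
// work units that will later be distributed across them.
import java.util.List;

enum Architecture { X86, POWER, Z, ARM, GPU, FPGA }

record Platform(String name, Architecture arch, double costPerMips, int cores) { }

record WorkUnit(String id, double mipsDemand, int threads) { }

public class HybridSystem {
    public static void main(String[] args) {
        List<Platform> platforms = List.of(
            new Platform("z-frame",  Architecture.Z,     8.0, 16),
            new Platform("p7-node",  Architecture.POWER, 3.0, 32),
            new Platform("x86-node", Architecture.X86,   1.0, 24),
            new Platform("gpu-node", Architecture.GPU,   0.5, 2048));
        platforms.forEach(System.out::println);
    }
}
```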

Example: the Zing platform from Azul
Zing: an elastic Java runtime platform. The Zing Java Platform consists of four main components:
- a 100% Java-compatible JVM that installs and launches like any other commercial JDK/JVM;
- a highly optimized runtime platform (i.e., kernel) packaged as an easy-to-install virtual application;
- an integrated, application-aware resource management and process controller; and
- a true, zero-overhead, always-on production-time diagnostic and tuning tool integrated into the Zing JVM and appliance.

Leveraging heterogeneity
The essence of software design in hybrid systems: it is possible to leverage the heterogeneity to accelerate software performance.
- Scenario 1: migrate compute-intensive work units from CPUs with expensive MIPS to CPUs with cheap MIPS
- Scenario 2: place multi-threaded work units on multi-core CPUs with good SMT support
A sketch of these placement scenarios follows below.
(Presenter notes: Scenarios 1 and 3 are sufficient; we talk about the TianHe HPC example. Remove Scenario 2.)
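The following sketch illustrates the two scenarios above as simple placement rules. The platform attributes, thresholds, and example work units are assumptions made for illustration; they are not part of the original design.

```java
// Sketch of the two placement scenarios: scenario 1 sends compute-intensive,
// single-threaded work to the cheapest-MIPS platform; scenario 2 places
// multi-threaded work on an SMT-capable platform with many cores.
import java.util.Comparator;
import java.util.List;

record Platform(String name, double costPerMips, boolean smt, int cores) { }
record WorkUnit(String id, double mipsDemand, int threads) { }

public class PlacementScenarios {
    static Platform place(WorkUnit u, List<Platform> pool) {
        if (u.threads() > 1) {
            // Scenario 2: prefer SMT-capable platforms with the most cores.
            return pool.stream().filter(Platform::smt)
                       .max(Comparator.comparingInt(Platform::cores)).orElseThrow();
        }
        // Scenario 1: compute-intensive single-threaded work goes to cheap MIPS.
        return pool.stream()
                   .min(Comparator.comparingDouble(Platform::costPerMips)).orElseThrow();
    }

    public static void main(String[] args) {
        List<Platform> pool = List.of(
            new Platform("z",   8.0, false, 16),
            new Platform("p7",  3.0, true,  32),
            new Platform("x86", 1.0, true,  24));
        System.out.println(place(new WorkUnit("fft", 5_000, 1), pool).name()); // x86
        System.out.println(place(new WorkUnit("web",   200, 8), pool).name()); // p7
    }
}
```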

Challenges
- How to detect the acceleration potential?
- What software technology can realize that potential?

Affinity-based Workload Distribution
Goal: in hybrid systems, distribute the workload at run time by leveraging affinity.
- Hybrid system: a computing system that combines general-purpose and special-purpose machines
- Affinity: a work unit is more efficiently processed on one platform than on another
- Analytically determine the workload distribution that optimizes performance
Technical challenges:
- Identify appropriate application/system metrics to characterize the workload
- Workload characteristics change over time
- Recognize the affinity of workload to platforms at runtime
- Conflict between affinity and load balancing
A greedy assignment illustrating the affinity/load-balancing tradeoff is sketched below.
(Presenter note: use concrete examples such as PDF transactions and DB2 queries.)
[Figure: work units flow through application and system monitoring into a Workload Distribution Analyzer, which dispatches them across Z, Power, x86, GPU, FPGA, and SAN resources.]
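A hedged sketch of the analytical distribution step: each work unit carries an affinity score per platform, and a greedy pass trades that score off against the platform's current load. The affinity values, the 0.5 balancing weight, and the platform names are assumptions for illustration.

```java
// Greedy affinity-vs-load-balancing assignment sketch.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AffinityDistributor {
    // affinity.get(unit).get(platform) in [0,1]: how efficiently that platform runs the unit.
    static Map<String, String> distribute(Map<String, Map<String, Double>> affinity,
                                          List<String> platforms, double balanceWeight) {
        Map<String, Double> load = new HashMap<>();
        platforms.forEach(p -> load.put(p, 0.0));
        Map<String, String> placement = new HashMap<>();
        for (var e : affinity.entrySet()) {
            String best = platforms.stream()
                .max((a, b) -> Double.compare(score(e.getValue(), load, a, balanceWeight),
                                              score(e.getValue(), load, b, balanceWeight)))
                .orElseThrow();
            placement.put(e.getKey(), best);
            load.merge(best, 1.0, Double::sum);  // count one more unit on the chosen platform
        }
        return placement;
    }

    // Affinity reward minus a penalty proportional to the platform's current load.
    static double score(Map<String, Double> aff, Map<String, Double> load,
                        String platform, double w) {
        return aff.getOrDefault(platform, 0.0) - w * load.get(platform);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> affinity = Map.of(
            "pdf-render", Map.of("z", 0.3, "power", 0.9, "x86", 0.6),
            "db2-query",  Map.of("z", 0.9, "power", 0.5, "x86", 0.4));
        System.out.println(distribute(affinity, List.of("z", "power", "x86"), 0.5));
    }
}
```

In this toy run the PDF work unit lands on Power and the DB2 query stays on z, matching the concrete examples mentioned in the presenter note.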

Example from Netezza

Workload Distribution Analyzer
Detect re-distributable work units:
- For Java workloads, a localizable JNI call is a re-distributable work unit
Use domain knowledge and machine learning techniques to determine the affinity of work units:
- Domain knowledge: e.g., Power generally has better decimal floating-point support than Z, so a math module has better affinity to Power than to Z
- Machine learning techniques:
  - Affinity clustering: work units referencing similar data are clustered together to be executed on the same hardware (see the clustering sketch below)
  - Supervised learning: by learning from work units' affinity to Power, infer whether other work units have affinity to Power
  (Presenter note: Una-May's ML approach might be a very good fit.)
Determine the workload re-distribution strategy that optimizes performance:
- Principle of data locality: e.g., if a work unit has much of its input data on Z, it should remain on Z instead of migrating to Power
- Tradeoff between locality and overall load balancing: respect data locality while maintaining load balance across the hybrid system
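The affinity-clustering idea might look like the following sketch, which groups work units by the overlap in the data sets they reference (Jaccard similarity) so that each cluster can be placed on the same hardware. The 0.5 similarity threshold and the example data references are assumptions.

```java
// Affinity clustering sketch: work units that reference overlapping data sets
// are grouped together for co-placement.
import java.util.*;

public class AffinityClustering {
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Single-pass clustering: attach a unit to the first cluster whose seed shares enough data.
    static List<List<String>> cluster(Map<String, Set<String>> dataRefs, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        List<Set<String>> seeds = new ArrayList<>();
        for (var e : dataRefs.entrySet()) {
            int hit = -1;
            for (int i = 0; i < seeds.size(); i++)
                if (jaccard(seeds.get(i), e.getValue()) >= threshold) { hit = i; break; }
            if (hit < 0) { seeds.add(e.getValue()); clusters.add(new ArrayList<>()); hit = seeds.size() - 1; }
            clusters.get(hit).add(e.getKey());
        }
        return clusters;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = new LinkedHashMap<>();
        refs.put("query-A", Set.of("orders", "customers"));
        refs.put("query-B", Set.of("orders", "customers", "inventory"));
        refs.put("batch-C", Set.of("payroll"));
        System.out.println(cluster(refs, 0.5));  // [[query-A, query-B], [batch-C]]
    }
}
```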

Existing simple practice for affinity-based workload distribution
Workloads that benefit from being on z:
- Workloads that access a significant amount of z data; if they are running on Power or x86, they can be moved to z, where z data can be provided to the application much more efficiently
- Workloads that require strong single-threaded performance (z has the fastest threads in the industry)
- Workloads that require strong security and unique quality of service
Workloads that benefit from being on Power7/massively multi-threaded platforms:
- Long-running applications with little or minimal data requirements
- Multiple parallel threads that benefit from SMT on Power
- Threads that require a significant amount of main memory
- Threads that require memory bandwidth
- Analytics algorithms that are computation-intensive (e.g., floating-point operations)
A rule-based sketch of these heuristics follows below.
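A minimal rule-based sketch of the heuristics above. The workload attribute names, thresholds, rule ordering, and the x86 fallback are illustrative assumptions rather than a documented policy.

```java
// Rule-based placement encoding this slide's heuristics: data-heavy,
// single-thread-bound, or security-sensitive work favors z; long-running,
// highly threaded, memory- or floating-point-heavy work favors Power7.
record Workload(boolean accessesZData, boolean singleThreadBound, boolean securitySensitive,
                int parallelThreads, double memoryGb, boolean floatingPointHeavy) { }

public class PlacementRules {
    static String place(Workload w) {
        if (w.accessesZData() || w.singleThreadBound() || w.securitySensitive()) return "z";
        if (w.parallelThreads() > 4 || w.memoryGb() > 64 || w.floatingPointHeavy()) return "Power7";
        return "x86";  // assumed default when no rule fires
    }

    public static void main(String[] args) {
        System.out.println(place(new Workload(true,  false, false,  1,   4, false))); // z
        System.out.println(place(new Workload(false, false, false, 16, 128, true)));  // Power7
    }
}
```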

Building a hybrid system with diversified benchmarks
- Design and build a hybrid system testbed that includes the major computing architectures
- Extend from computation-intensive, HPC-type benchmarks to a wider range of benchmarks, including:
  - Web server
  - Java client/server (various Java middleware)
  - Mail server
  - Network file systems
  - Etc.
- Initially, we may focus on multi-tier benchmarks and study performance acceleration within one tier
(Presenter notes: caching systems; short-response-time requests; focus on work units within one tier.)

Technical challenges
- Automatically inferring the dependency graph: derive the input/output data flow and infer dependencies among work units
- Online learning of efficiency for independent work units: quantify affinity/stickiness, and consider security as an extra constraint
- Combining the dependency graph and efficiency learning to infer the acceleration ratio (sketched below)
- Lightweight, cross-platform measurement tools
- Work with legacy applications without much code change
(Presenter notes: mention machine learning under "online learning of efficiency"; expand bullet 3 on how to infer the acceleration ratio; add the objective "we are looking for a run-time system".)
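One way the "infer the acceleration ratio" step could be approximated is sketched below: given a dependency DAG of work units, baseline run times, and estimated per-unit speedups on the platforms they would migrate to, compare critical-path lengths before and after migration. The graph shape, times, and speedups are invented for illustration.

```java
// Acceleration-ratio sketch: critical path of a work-unit DAG before vs. after
// applying per-unit speedup estimates.
import java.util.*;

public class AccelerationRatio {
    // deps.get(u) = units that must finish before u starts; topoOrder is a valid topological order.
    static double criticalPath(List<String> topoOrder, Map<String, List<String>> deps,
                               Map<String, Double> time) {
        Map<String, Double> finish = new HashMap<>();
        double longest = 0.0;
        for (String u : topoOrder) {
            double start = deps.getOrDefault(u, List.of()).stream()
                               .mapToDouble(finish::get).max().orElse(0.0);
            finish.put(u, start + time.get(u));
            longest = Math.max(longest, finish.get(u));
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> order = List.of("parse", "analyze", "render");
        Map<String, List<String>> deps = Map.of("analyze", List.of("parse"),
                                                "render",  List.of("analyze"));
        Map<String, Double> baseline = Map.of("parse", 2.0, "analyze", 8.0, "render", 1.0);
        Map<String, Double> speedup  = Map.of("parse", 1.0, "analyze", 4.0, "render", 1.0);

        Map<String, Double> accelerated = new HashMap<>();
        baseline.forEach((u, t) -> accelerated.put(u, t / speedup.get(u)));

        double before = criticalPath(order, deps, baseline);
        double after  = criticalPath(order, deps, accelerated);
        System.out.printf("acceleration ratio = %.2fx%n", before / after);  // 2.20x
    }
}
```

Only the accelerated unit on the critical path contributes, so the overall ratio is bounded in the same Amdahl-style way the slide's bullet implies.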

WebSphere WLM