Machine Learning @ AMD — Ajit Mathews, Corp. VP Software Development, ML Software Engineering — 3:00 PM
Open Source Foundation for Machine Learning
- Machine Learning Apps / Applications: latest machine learning frameworks — Caffe 2, PyTorch, TensorFlow; exchange format: ONNX
- Middleware and Libraries: MIOpen; optimized math libraries (BLAS, FFT, RNG); RCCL; Eigen; MIVisionX
- ROCm Platform (fully open source): OpenMP, HIP, OpenCL™, Python
- Devices: GPU, CPU, APU, DLA
- Docker and Kubernetes support; up-streamed for Linux kernel distributions
AMD Confidential
Open Source Foundation for Machine Learning (notes): ML apps use open source machine learning frameworks. Low-level software components are abstracted, so CUDA is not a factor. Publicly available for machine learning experts to try out on our hardware.
Machine Learning Frameworks: TensorFlow
- Supports: Vision (CNNs, GANs), Translate (RNNs, LSTMs, Transformer), Reinforcement Learning
- High-performance FP16/FP32 training with up to 8 GPUs per node
- v1.13.1 and v2.0-alpha available today as a Docker container (https://hub.docker.com/r/rocm/tensorflow) or as a Python pip wheel (https://pypi.org/project/tensorflow-rocm/)
- Matching AMD versions are available within days of each official release
Notes: TensorFlow v1.13.1 is released today; this is the latest stable TensorFlow version.
Machine Learning Frameworks: PyTorch
- AMD-related changes have been upstreamed: https://github.com/pytorch/pytorch
- High-performance FP16/FP32 training with up to 8 GPUs per node
- Available today as a Docker container (or build from source): https://hub.docker.com/r/rocm/pytorch
- ROCm is an official build target for PyTorch CI, which ensures continuous testing and minimal regressions
- Supports all Torch-Vision models, PyTorch Translate, all PyTorch examples, and Gloo (https://github.com/facebookincubator/gloo/tree/master/gloo)
Inference with OpenVX™ — Mike Schmit, Director of Software Engineering, ML Computer Vision — 3:00 – 5:30 PM
MIVisionX = OpenVX™ with Tools/Libraries (high-level summary)
- Conformant OpenVX™ 1.0.1, open source (MIT license)
- Neural net extensions with optimized MIOpen libraries
- Model compiler / model optimizer
- OpenCV interop
- Radeon Loom 360° stitching library
- WinML for Windows
- Utilities: ADAT (AMD Dataset Analysis Tool), RunVX (command-line OpenVX interpreter), GDF (OpenVX scripting language & debugger), LoomShell (360° image scripting language & debugger)
Introduction to the MIVisionX Toolkit
- The MIVisionX toolkit is a comprehensive set of computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit.
- AMD OpenVX is delivered as open source with MIVisionX.
- Primarily targeted at applications requiring a combination of machine learning inference and computer vision or image/video processing.
- Includes a model compiler for converting and optimizing a pretrained model from existing formats such as Caffe, NNEF, and ONNX to an OpenVX backend.
- After compilation, MIVisionX generates an optimized library, specific to that backend, to run inference along with vision pre- and post-processing modules.
- For inference deployment, lightweight, dedicated APIs optimized for AMD hardware are preferable to heavyweight frameworks.
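The compile-then-deploy pattern above can be sketched in Python. This is an illustrative sketch only — the function names (`compile_model`, `run_inference`) and the stand-in computations are hypothetical, not the real MIVisionX API; they model the two phases (offline model compilation, then deployment with pre/post-processing) described in the bullets.

```python
# Hypothetical sketch of the MIVisionX two-phase flow (names are illustrative).

def compile_model(model_path, source_format):
    """Phase 1 (offline): convert a pretrained Caffe/NNEF/ONNX model
    into an optimized library for the chosen OpenVX backend."""
    assert source_format in ("caffe", "nnef", "onnx")
    # The real tool emits a backend-specific optimized library;
    # here we model the result as a plain dict.
    return {"model": model_path, "backend": "openvx", "optimized": True}

def run_inference(compiled, image):
    """Phase 2 (deploy): pre-process, run the compiled network, post-process."""
    preprocessed = [p / 255.0 for p in image]      # vision pre-processing
    score = sum(preprocessed) / len(preprocessed)  # stand-in for the network
    return {"top1": round(score, 3)}               # vision post-processing

lib = compile_model("resnet50.onnx", "onnx")
print(run_inference(lib, [64, 128, 192]))
```

The point of the split is that the heavyweight conversion and optimization happen once, ahead of time, while the deployed application links only against the lightweight generated library.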
Neural Network Deployment Options
- Training happens in the frameworks; the trained network is exported via ONNX to the MIVisionX model compiler / optimizer.
- The compiled application can then be deployed four ways: on the OpenVX run-time & libraries, on an OpenVX binary run-time & libraries, on the WinML run-time & libraries, or on future target system(s).
AMD ML Software Stack with ROCm
- Machine Learning Apps: data platform tools, MIVisionX apps, latest machine learning frameworks and exchange formats
- Middleware and Libraries: MIOpen; optimized math & communication libraries (BLAS, FFT, RNG; RCCL); Eigen; MIVisionX
- ROCm Platform (fully open source): OpenMP, HIP, OpenCL™, Python
- Devices: GPU, CPU, APU, future accelerators
- Docker and Kubernetes support; up-streamed for Linux kernel distributions
ROCm = Radeon Open Compute platform; HIP = Heterogeneous-compute Interface for Portability
Tutorial Examples
- Tutorial #1: Image Classification with ONNX
- Tutorial #2: Object Detection with Caffe
- Tutorial #3: Image Classification with NNEF
- Tutorial #4: Object Detection with multi-stream HW video decode
Not all tutorials may be presented, depending on the time available.
Links:
https://github.com/kiritigowda/MIVisionX-Inference-Tutorial#mivisionx-inference-tutorial
https://github.com/rrawther/MIVisionX-OpenVX-Tutorial
Tutorial Systems and Room Setup
- AMD Developer Cloud server: AMD EPYC™ + Radeon Instinct™ MI25
- AMD Ryzen™ 7 + Radeon™ Vega VII
- AMD Ryzen Threadripper™ + Radeon™ Vega 10
- Laptops connected through a WiFi router
See printed instructions to get connected now.
Tutorial Example 1: Image Classification Using a Pre-Trained ONNX Model
Tutorial Example 2: Object Detection Using Pre-Trained Caffe model
Tutorial Example 3: Image Classification Using Pre-Trained NNEF model
mv_objdetect Using 4 Video Streams
This example decodes four video streams simultaneously with the amd_media_decoder OpenVX node, runs inference on all four streams, and visualizes the results using OpenCV.
Inference Server Demo
Setup phase:
1. Choose model & parameters → model compilation (status reported)
2. Choose dataset (image database)
Inference execution (up to 8 MI25 or MI60 GPUs):
2a. Image decode → 2b. Execution across multiple GPUs (GPU #0 … GPU #3 …)
3. View results
Letters A–G mark the critical-path flow; the numbers show the complete setup-and-inference sequence.
Bytes Processed (per 1000 images) — partial results shown; full results reported
A. Client: read HDD — 15 MB/sec (assume 10:1 compression); example capacities: HDD 100–200 MB/sec, SATA III SSD 550 MB/sec, NVMe ~2 GB/sec
B. Client: transmit — 15 MB/sec; link speeds from 1 Gbps (125 MB/sec) up to 100 Gbps
C. Server: JPEG decode — 150 MB/sec, and 600 MB/sec best case with no resize; 32 cores / 64 threads
D. Copy: PCIe to GPU — 600 MB/sec; PCIe 3.0 is 16 GB/sec for x16
E. GPU: inference — 1000 images; 600–900 images/sec per GPU for ResNet-50 FP32
F. Server: send results — 1000 × 64 bytes; 1 Gbps (125 MB/sec) up to 100 Gbps
G. Client: display results — N/A
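A back-of-envelope check of where the bottleneck sits, using the example capacities above. The specific inputs are assumptions for illustration: a SATA III SSD, a 1 Gbps link, 15 MB of compressed JPEGs and 600 MB of decoded pixels per 1000 images, and a mid-range 750 images/sec per GPU for ResNet-50 FP32.

```python
# Seconds spent per 1000 images at each pipeline stage (assumed example numbers).
stages = {
    "A read HDD (SATA III SSD)": 15 / 550,    # 15 MB at 550 MB/sec
    "B network (1 Gbps)":        15 / 125,    # 15 MB at 125 MB/sec
    "D PCIe 3.0 x16 copy":       0.600 / 16,  # 0.6 GB at 16 GB/sec
    "E GPU inference (1 GPU)":   1000 / 750,  # 1000 images at 750 images/sec
}
for name, sec in sorted(stages.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {sec:.3f} s per 1000 images")
```

Single-GPU inference dominates by roughly an order of magnitude, which is the motivation for spreading execution across up to 8 GPUs; with 8 GPUs the inference time falls to the same order as the 1 Gbps transmit time, so the network link starts to matter.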
Disclaimers and Attributions
The information contained herein is for informational purposes only, and is subject to change without notice. Timelines, roadmaps, and/or product release dates shown in these slides are plans only and subject to change. "Polaris", "Vega", "Radeon Vega", "Navi", "Zen" and "Naples" are codenames for AMD architectures, and are not product names. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
©2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Ryzen, Threadripper, EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
BACK UP