
1 Intel Strategy on AI: Improving CERN Deep Learning Workflows and Models Using Intel Optimized Solutions
Vilen Jumutcs, Solutions Architect, ML/DL Expert, Intel Corp (EMEA)

2 The Next Big Wave of Computing: AI Compute Cycles Will Grow 12X by 2020
Data deluge. Compute breakthrough. Innovation surge. From mainframes to standards-based servers to cloud computing, AI is the next big wave of computing, and Intel uniquely has the experience to fuel the AI computing era, because we've successfully led previous major computing transformations.

What is Intel's forecast for this market? While only 7% of server sales in 2016 were used for artificial intelligence workloads, AI is the fastest-growing data center workload, and we foresee ~12X growth in demand by 2020 in the datacenter alone, not to mention significant usage in edge devices (cars, phones, cameras, etc.). Within that 7% last year, 60% of servers were used for conventional machine learning while the remaining 40% were used for deep learning. Among servers used for conventional machine learning, 97% used Intel Xeon processors to handle the computations, 2% used other architectures, and 1% used Intel processors paired with GPUs. Among servers used for deep learning, 91% use just Intel Xeon processors, 7% use Intel Xeon processors paired with GPUs, and 2% use other architectures altogether.

What do others have to say about the AI market? A Gartner survey of more than 2,500 CIOs revealed that spending on intelligence is the top business investment priority in all types of organizations. The opportunity is so significant that Andrew Ng, Chief Scientist at Baidu Research and a luminary in AI, believes there's more risk in waiting than jumping in. He said, "In the past, a lot of S&P 500 CEOs wished they had started thinking sooner than they did about their Internet strategy. I think five years from now there will be a number of S&P 500 CEOs that will wish they'd started thinking earlier about their AI strategy."

But haven't we heard the promise of AI before, as early as the 1960s, and again in the 1980s, when a lot of research went into AI? Yes, but the timing wasn't right back then, and three things have brought AI to life today:

Compute breakthrough: Paved by Moore's Law, compute capability and architectural innovation have progressed to the point where we've crossed the threshold required to support the intense demands of machine intelligence. For example, the concept of deep learning through artificial neural networks has existed for at least 20 years, but not until the past few years have computing advancements enabled the practical application of these intensive algorithms, thanks to greater accuracy and speed.

Data deluge: Our world of smart and connected devices has unleashed a data deluge, as the Internet of Things (IoT) joins apps in generating continuous streams of structured and unstructured data. The IoT will include a projected 200 billion smart-and-connected devices by 2020, and the data produced is expected to double every two years to total 40 zettabytes (40 trillion gigabytes). These vast data stores are required to train many AI algorithms and are ripe to be mined for fresh insights.

Innovation surge: Of course, compute power and data are not enough on their own. The road to AI is also being driven by a surge of innovation that has pushed us over the tipping point from research to mainstream use. Each new AI algorithmic innovation and use case opens more eyes to the power of AI, leading more innovators to join the community and stimulating an ever-increasing demand for the technology.
Neural network innovations in the 1990s renewed research into AI, but it was accuracy breakthroughs in both speech recognition and image recognition, in 2009 and 2012 respectively, that proved to be catalysts for today’s surge of innovation. In last year’s ImageNet Computer Vision contest, a neural network–based application even outperformed a human. As we progress, a plethora of unsolved AI challenges will continue to attract researchers and innovators around the world. Source: Intel

3 Intel AI Approach: Connectionists, Symbolists, Evolutionaries, Bayesians, Analogizers
"One note does not make a symphony; one artist does not make an orchestra..." – Matshona Dhliwayo, Philosopher

Connectionist: models inspired by neural networks. Symbolist: rules and models using logical reasoning. Evolutionaries: models inspired by Darwinian evolution. Bayesians: models using probabilistic inference. Analogizers: reason from similar cases.

In any orchestra, you need the woodwind section and the percussion, and you need the first chair violin. But you don't talk about a great piece of music as being defined by the timing of one percussionist, a harmony of the brass section, or even multiple pieces together. It is about what the ensemble can do as a harmonious whole that results in a truly masterful performance that can move people. Similarly for AI, truly human-like learning and thinking machines reach beyond current trends in engineering, both in what they learn and how they learn it, combining the strengths of recent neural network advances with the best of other fundamental approaches to cognitive computing. This is the pathway to truly intelligent machines.

At Intel, our strategy is to be the AI 'conductor' that brings together the very best of each approach to deliver a sum that is greater than its individual parts. It is not about just deep learning, or another algorithm type. It's not about the individual tools, standards, solutions or platforms. It's about what you can do differently than traditional compute when you have all of those elements at your disposal and combine them in emergent ways. Intelligence is a set of properties that emerges when technologies achieve a learning loop leveraging multiple frameworks. These frameworks, or "tribes" of learners as described in Pedro Domingos' book The Master Algorithm, combine to deliver increasingly intelligent machines that will eventually lead to general intelligence or 'strong AI':

Connectionists – origin in neuroscience; attempting to reverse engineer the brain through deep (learning) neural networks. Luminaries: Geoff Hinton, Yann LeCun, Yoshua Bengio. Players: Google, Facebook, Microsoft, AWS, Apple, IBM, Clarifai, Numenta, Nervana, Movidius, Vicarious, nnaisense, Nvidia, Altera, Micron, Qualcomm, MIT, more.

Symbolists – origin in logic/philosophy; learning is the inverse of deduction, working backward to fill gaps. Luminaries: Tom Mitchell, Steve Muggleton, Ross Quinlan. Players: IBM Watson, Cycorp, Google, CMU's NELL, ExpertSystem Semantic Intelligence, more.

Evolutionaries – origin in evolutionary biology; evolution is the source of all life on Earth, so isn't that what we should simulate? Luminaries: John Holland, John Koza, Hod Lipson. Players: Nutonian, Sentient, PSIBERNETIX, Hod Lipson's evolutionary robotics, more.

Analogizers – origin in psychology; reasoning based on similarity, anomalies, trends, etc. Luminaries: Peter Hart, Vladimir Vapnik, Douglas Hofstadter. Players: IBM Watson, Saffron, Netflix, more.

Bayesians – origin in statistics; true believers in the power of Bayes' theorem to perform probabilistic inference. Luminaries: Judea Pearl, David Heckerman, Michael Jordan. Players: SAS, Vicarious, Google, more.

*The five tribes of learners as described in The Master Algorithm by researcher Pedro Domingos

4 Artificial Intelligence @ Intel
Experiences powered by machine/deep learning, reasoning systems and computer vision; things & devices connected through the cloud and data center; accelerant technologies spanning programmable solutions, tools & standards, memory/storage, networking and 5G communications.

Unleash Your Potential with Intel's Complete AI Portfolio: Intel has the most dense and power-efficient transistor technology on the planet, the experience of successfully driving several major computing transformations, and the complete portfolio required to unleash the full potential of AI. At Intel, our vision is that if it is smart and connected, it is best with Intel.

Intel is committed to AI and is making major investments across technology, training, resources and R&D to advance AI for business and society. We have a commitment to our partners, the industry as a whole and our global society to accelerate AI development, deliver end-to-end solutions, and lead the next generation of computing transformations. Within our industry, only Intel can make and deliver upon this commitment because of our comprehensive technology portfolio, an unparalleled portfolio developed through acquisition and innovation.

Intel uniquely offers a range of compute solutions for machine learning in the data center, from general purpose (Xeon, Xeon Phi) to targeted silicon (FPGA and Nervana technology). At the edge, Intel also offers a portfolio of processors (Core, Atom, Joule, etc.) that utilize common intelligent APIs for distributed and collaborative intelligence.

We acquired recognized AI leader Nervana Systems to accelerate training time, which is a critical phase of the AI development cycle, initially from days to hours and on the path from hours to minutes. The technology innovations from Nervana will be optimized specifically for neural networks to deliver the highest performance for deep learning, as well as unprecedented compute density with high-bandwidth interconnect for seamless model parallelism. We expect Nervana's technologies to produce a breakthrough 100-fold increase in performance in the next three years to train complex neural networks, enabling data scientists to solve their biggest AI challenges faster.

The Saffron cognitive platform leverages associative and machine learning techniques for memory-based reasoning and transparent analysis of multi-sourced, sparse, dynamic data. This technology is also particularly well-suited to small devices, making intelligent local analytics possible across IoT and helping advance state-of-the-art collaborative AI.

Our pending acquisition of the leading-edge computer vision powerhouse Movidius will give us a well-rounded presence for technologies at the edge. Embedded computer vision is increasingly important, and Intel has a complete solution, with Intel® RealSense™ cameras seeing in 3D as the "eyes" of a device, Intel CPUs as the "brain," and Movidius Myriad 2 vision processors as the "visual cortex." These vision processing units (VPUs) deliver performance with an equally important low power and thermal footprint.

Going forward, we expect to see intelligence built into every device, with data center functions distributed across a continuum from the edge to the cloud, and real-time communication occurring over high-speed 5G networks. At this juncture, we will have both the infrastructure and the algorithms to create more advanced AI with the ability to think independently, and our deep collaborations on the emerging 5G standards will make that possible.

5 Full Intel AI Portfolio: Unleash Potential
Experiences: unleash potential
Tools: Intel® Deep Learning SDK, Intel® Computer Vision SDK, Movidius Neural Compute Stick, full E2E tooling
Frameworks: MLlib, BigDL, Intel distributions, Intel® Nervana™ Graph*, Movidius MvTensor Library
Libraries: Associative Memory Base, Intel® DAAL, Intel® MKL, MKL-DNN, Intel® MLSL
Hardware: Compute (Lake Crest*), Memory & Storage, Networking, Visual Intelligence
*Coming 2017

Intel has the best-in-class hardware portfolio for machine & deep learning, but software is critical to unleashing the full compute potential. To that end, Intel offers an optimized software stack in order to deliver game-changing AI applications. At the lowest level, Intel optimized primitive functions that are used across a wide array of machine & deep learning frameworks and solutions, including the Math Kernel Library (MKL), Data Analytics Acceleration Library (DAAL), and the Intel Python distribution. At the framework level, Intel is committed to optimizing the most popular analytics, machine & deep learning frameworks. As for tools, Intel offers the Deep Learning SDK to accelerate deep learning training & deployment, the Saffron Natural Intelligence Platform for reasoning systems, and we are a key contributor to the Trusted Analytics Platform for classic machine learning and data analytics. The vertical separator delineates libraries/frameworks/tools that are machine & deep learning centric (left) from those that are memory-based reasoning (right).

6 Intel® Nervana™ Portfolio
Common Architecture for Machine & Deep Learning:
- Intel® Xeon® Processors: most widely deployed machine learning platform (>97%*)
- Intel® Xeon Phi™ Processors: higher performance machine learning, general purpose
- Intel® Xeon® Processor + FPGA: breakthrough deep learning inference & workload flexibility (targeted acceleration)
- Intel® Xeon® Processor + Lake Crest: best-in-class neural network training performance (targeted acceleration)

Let's take a closer look, beginning at the lowest layer: hardware.
- Intel® Xeon® Processors: optimized for a wide variety of datacenter workloads; flexible infrastructure with low TCO
- Intel® Xeon Phi™ Processors: for HPC & enterprises running scale-out, highly parallel, memory-intensive apps
- Intel® Xeon® Processor + FPGA: reconfigurable accelerator for enhanced inference needs and flexible workload acceleration
- Intel® Xeon® Processor + Lake Crest: accelerator for unprecedented training compute density in deep learning centric environments

Intel offers the most comprehensive, flexible and performance-optimized portfolio of products for machine & deep learning in the datacenter, including Intel® Xeon® processors and Intel® Xeon Phi™ processors (next generation codenamed Knights Mill, coming 2017) for general purpose infrastructures, as well as FPGAs and Nervana technology (codenamed Lake Crest, coming 2017) for workload-optimized environments. In the next few slides, we'll dive deeper into each product line in our machine & deep learning datacenter portfolio.

*Intel® Xeon® processors are used in 97% of servers that are running machine learning workloads today (Source: Intel)

7 Intel® Nervana™ Portfolio (Detail)
Training:
- Batch: train machine learning models across a diverse set of dense and sparse data (Xeon or Xeon Phi)
- Many batch models: train large deep neural networks, and train large models as fast as possible (Xeon Phi, or Lake Crest *future*)

Inference:
- Batch: infer billions of data samples at a time and feed applications within ~1 day (Xeon+FPGA or Xeon Phi; the FPGA is an option for higher throughput/watt)
- Stream: infer deep data streams with low latency in order to take action within milliseconds (Xeon+FPGA; the FPGA is required for low latency)
- Edge: inference in power-constrained environments (Movidius, or another Intel® edge processor)

TRAINING
BATCH – customers who periodically retrain compute-intensive deep neural networks in batch mode (e.g. Johns Hopkins retraining medical imaging models with new patient scans+outcomes). Xeon: dynamically scale out and train numerous model runs in parallel on flexible infrastructure with low TCO. Xeon Phi: dynamically scale out and train numerous model runs in parallel, with greater performance on flexible HPC Xeon Phi infrastructure.
MANY BATCH – customers doing R&D which requires iterative training of compute-intensive deep neural networks (e.g. Google DeepMind). Xeon Phi: dynamically scale out and train numerous model runs in parallel, with good performance on flexible HPC Xeon Phi infrastructure. Xeon+Lake Crest (future): unparalleled deep learning training performance on an application-specific accelerator.

INFERENCE
BATCH – customers performing inference in batch mode do not require real-time answers; they infer millions to billions of data samples at a time and feed applications within ~1 day (e.g. Google Photos image tagging, Amazon Alexa, Google autofill search terms). Xeon+FPGA: dynamically scale out and run inference in parallel on flexible infrastructure with low TCO; the FPGA delivers higher performance/watt and offers fast workload switching for flexibility. Xeon Phi: dynamically scale out and run inference in parallel, with greater performance on flexible HPC Xeon Phi infrastructure.
STREAM – latency-sensitive customers performing inference in stream mode require near real-time answers, inferring one data stream at a time and taking action within milliseconds (e.g. Google search, Tencent ad platform, Facebook image tagging). Xeon+FPGA: dynamically scale out and run inference in parallel on flexible infrastructure with low TCO; the FPGA delivers low latency for the fastest inference.
EDGE – customers performing inference at the edge in power-constrained environments (e.g. drone vision). Movidius: Vision Processing Units (VPUs) deliver accelerated computer vision processing with a ~1W footprint. Other: many other Intel technologies are applicable at the edge, including FPGA, Xeon, Core, Atom, Quark, etc.

8 Roadmap: Intel® Nervana™ Platform
Shipping today → coming 2017 → future:
- Intel® Xeon® processor: Broadwell → Skylake, +FPGA → TBA
- Intel® Xeon Phi™ processor: Knights Landing → Knights Mill → TBA
- Altera FPGA (targeted acceleration): Arria 10 FPGA → Canyon Vista → TBA
- Crest family (Nervana, targeted acceleration): Lake Crest → TBA

Intel is introducing new products in each category in 2017:
- Next-generation Intel® Xeon® processor codenamed Skylake (general purpose)
- Next-generation Intel® Xeon Phi™ processor codenamed Knights Mill (general purpose with deep learning optimizations)
- Deep learning inference accelerator codenamed Canyon Vista, based on the Arria 10 FPGA (reconfigurable)
- Nervana technology codenamed Lake Crest (workload optimized for deep learning)

9 Knights Mill Performance Projections: Projected Application Performance
Knights Mill relative performance, normalized to a 1.0 baseline of an Intel® Xeon Phi™ 7250 (lower is better). KNM is expected to deliver better deep learning time to train: up to 74% faster than the Xeon Phi™ 7250, estimated on 8 nodes.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured or estimated as of May. Configuration details: see slide 14.

10 AI Software

11 AI Inside the Intel® Nervana™ Portfolio
Experiences and platforms: Intel® Nervana™ Cloud & Appliance, Intel® Computer Vision SDK, Movidius Fathom, Intel® Nervana™ DL Studio
Frameworks: MLlib, BigDL, Intel® Data Analytics Acceleration Library (DAAL), Intel® Nervana™ Graph*
Libraries: Intel Python Distribution, Intel® Math Kernel Library (MKL, MKL-DNN), and more
Hardware: Compute, Memory & Storage, Networking
*Future

Intel has the full stack of end-to-end AI products for building AI solutions at enterprise scale. It has the best-in-class hardware portfolio for machine & deep learning, but software is critical to unleashing the full compute potential. To that end, Intel offers an optimized software stack in order to deliver game-changing AI applications. At the lowest level, Intel optimized primitive functions that are used across a wide array of machine & deep learning frameworks and solutions, including the Math Kernel Library (MKL), Data Analytics Acceleration Library (DAAL), and the Intel Python distribution. At the framework level, Intel is committed to optimizing the most popular analytics, machine & deep learning frameworks. As for tools, Intel offers the Deep Learning SDK to accelerate deep learning training & deployment, the Saffron Natural Intelligence Platform for reasoning systems, and we are a key contributor to the Trusted Analytics Platform for classic machine learning and data analytics.

Note: the vertical line that splits the frameworks and libraries rows indicates a split between deep learning (left side) and conventional machine learning (right side).

12 Deep Learning Software – A Many-to-Many Problem
DL software is a complex many-to-many problem, with many frameworks on one side and many hardware targets on the other; it is further complicated if you want to optimally place your workload across a heterogeneous environment.

13 Nervana Graph

14 Intel® Nervana™ Graph (coming soon)
High-Performance Execution Graph for Neural Networks. Intel® Nervana™ Graph enables optimizations that are applicable across multiple HW targets:
- Efficient buffer allocation
- Training vs. inference optimizations
- Efficient scaling across multiple nodes
- Efficient partitioning of subgraphs
- Compounding of ops

The stack: customer solutions and Neon solutions at the top; customer models and Neon models; customer algorithms and Neon deep learning functions (hardware agnostic); Intel® Nervana™ Graph; hardware-specific transformers (MKL-DNN) at the bottom. The Intel® Nervana™ Graph will scale performance across hundreds of machine and deep learning frameworks.

Various deep learning frameworks generate a set and flow of functions required to execute a given deep learning topology. This set of functions is known as a graph. The Intel® Nervana™ Graph optimizes these graphs: it takes a graph as input and finds the optimal set of functions, buffer allocations, and data layouts to execute the functions in the graph. For example, it may combine a group of graph functions into one function. The heterogeneous graph transformer divides the graph across the hardware available. Work assigned to Intel Xeon, Intel Xeon Phi, or Intel FPGAs goes to an IA graph transformer, which performs hardware-specific graph optimizations and uses Intel's MKL-DNN for hardware-specific lower-level instructions. Work assigned to Lake Crest uses the Argon graph transformer to perform hardware-specific graph optimizations and converts operations into Lake Crest ISA. Work assigned to GPUs uses a GPU graph transformer, which performs hardware-specific graph optimizations and lower-level instructions optimized for GPUs.
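To make the front-end/transformer split concrete, here is a minimal sketch following the walkthrough published with the open-source ngraph-python repository; module paths and call signatures are taken from that project and may differ between releases, so treat this as illustrative rather than definitive:

import ngraph as ng
import ngraph.transformers as ngt

# Front end: build a tiny op graph (a scalar placeholder plus a constant).
x = ng.placeholder(())
x_plus_one = x + 1

# Back end: the transformer compiles the graph for a target (CPU by default),
# performing the buffer-allocation and op-fusion passes described above.
transformer = ngt.make_transformer()
plus_one = transformer.computation(x_plus_one, x)

print(plus_one(1))  # expected output: 2.0

A framework front end (such as neon) emits a graph like this, and swapping the transformer retargets the same graph to different hardware without touching the model code.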

15 Benchmarks
A holistic AI portfolio is also crucial, because in automated driving, as in so many other industries, many disparate systems and capabilities must work seamlessly together in order to deliver a safer and more enjoyable commute. From car to cloud, the most scalable automated driving platforms are built on Intel. Intel has an unmatched portfolio of AI technologies, and we are applying them to deliver the most scalable and secure platform for automated driving, enabling industry innovators to build rapidly and pursue countless design iterations across brands and fleets.

Scalable, powerful in-vehicle computing: Fulfill the widest range of market needs with a scalable architecture for in-vehicle computing. Intel delivers incredibly high performance per watt with plenty of headroom to grow for ADAS and the software-defined cockpit. Intel® silicon and software support better human-machine interface (HMI) designs that build trust between driver and vehicle and provide advanced virtualization and graphics capabilities.

Next-generation network connectivity: Be prepared to harness the next generation of connectivity for over-the-air updates, high-definition maps, and vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication. Intel's automotive-grade wireless communications solutions support today's state-of-the-art technology, while paving a path toward future 5G solutions.

High-performance data center and cloud: Ensure a high-performance data center with cloud capabilities that meets the demands of the new transportation value chain, from artificial intelligence (AI) to fleet management to data mining. Intel® Data Center technologies can scale to meet the demands of a range of data analytics workloads while accelerating the time to train deep learning models for AI, both key for highly and fully autonomous driving.

16 Deep Learning Optimizations (since 01/2016)
Gain for SKX (Skylake) Platinum cores at 2.5GHz with optimized SW vs. HSW E5-2699v3, 18 cores at 2.3GHz, with un-optimized SW: 298X and 223X. Configuration details on slide 30.

Given that total AI performance is about more than the raw TFLOP performance of the accelerators being used, Intel has invested significantly in software optimizations for the end frameworks that developers/researchers use, as well as the topologies that execute the various workload types for DL. Taken together, Intel is delivering massive generational improvements in total system performance. This graph shows generational comparisons of raw training performance on the previous-generation CPU vs. the new Skylake with optimized software, in some cases delivering over a 200x reduction in the time to train.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #

Intel Confidential - Internal Use Only

17 Neon + ngraph Solution Example

18 Conditional GAN: code chunks
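The slide shows only code; for context, the objective a conditional GAN trains is the standard formulation from Mirza & Osindero (2014), which the slide does not state explicitly: both networks are conditioned on the label y, with the discriminator D maximizing and the generator G minimizing

\min_G \max_D \; V(D,G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]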
# Assumed imports for the open-source Intel Nervana graph (ngraph) and its
# neon frontend; exact module paths and names may differ across releases.
import ngraph as ng
from ngraph.frontends.neon import (Affine, Sequential, XavierInit,
                                   Rectlin, Logistic, Tanh)

relu = Rectlin()  # the slide's `relu` activation (neon frontend name assumed)

# batch size (args is parsed from the command line elsewhere in the example)
args.batch_size = 128

# batch axis
N = ng.make_axis(name='N', length=args.batch_size)

# input data placeholders (train_set is the MNIST iterator defined elsewhere);
# moved up so that inputs['label'] exists before it is used below
inputs = train_set.make_placeholders()

# discriminator network
disc_layers = [Affine(nout=128, weight_init=XavierInit(), bias_init=0,
                      activation=relu, batch_norm=True),
               Affine(nout=1, weight_init=XavierInit(), bias_init=0,
                      activation=Logistic())]
discriminator = Sequential(disc_layers, name="Discriminator")

# image placeholder: 784 = 28x28 flattened MNIST pixels
L = ng.make_axis(name='L', length=784)
image_axes = ng.make_axes([L, N])
image = ng.placeholder(axes=image_axes)

# generator network
gen_layers = [Affine(nout=128, weight_init=XavierInit(), bias_init=0,
                     activation=relu, batch_norm=True),
              Affine(nout=784, weight_init=XavierInit(), bias_init=0,
                     activation=Tanh())]
generator = Sequential(gen_layers, name="Generator")

# noise placeholder for the noise source
noisel = 100
noise_axis = ng.make_axis(name='M', length=noisel)
z_ax = ng.make_axes([noise_axis, N])
z = ng.placeholder(axes=z_ax)

# labels placeholder for conditioning; the stray ng.concat_along_axis
# references in the original slide mark where the one-hot labels are
# concatenated onto the image and noise inputs for conditioning
Y = ng.make_axis(name='Y', length=10)
# label_axes = ng.make_axes([Y, N])
labels = ng.one_hot(inputs['label'], axis=Y)
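The slide's chunk stops before the loss terms. As an illustration of the two losses the objective above decomposes into, here is a library-agnostic NumPy sketch; this is not the presenter's ngraph code (the ngraph version would build the same expressions symbolically, e.g. with ng.cross_entropy_binary):

import numpy as np

def bce(p, t, eps=1e-7):
    # binary cross-entropy between predictions p and targets t
    p = np.clip(p, eps, 1 - eps)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())

# toy discriminator scores for a batch of real and generated images
d_real = np.array([0.9, 0.8, 0.7])
d_fake = np.array([0.2, 0.4, 0.1])

# discriminator: score real images as 1 and generated images as 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
# generator: fool the discriminator into scoring fakes as 1
g_loss = bce(d_fake, np.ones_like(d_fake))
print(d_loss, g_loss)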

19 MNIST Conditional GAN with ngraph – Output Examples

20

21 Summary: Get Started with #IntelAI Today!
Artificial intelligence (AI), the next big wave in computing, is an increasingly important source of competitive advantage that is already transforming industries. Today is the ideal time to begin integrating AI into your products, services and business processes. Intel has the complete AI portfolio, world-class silicon, and experience from successfully driving previous major computing transformations.

Get started with #IntelAI today! Use Intel's performance-optimized libraries & frameworks, contact your Intel representative for help and POC opportunities, and find out more at software.intel.com/ai.

22

23 Legal Notices & Disclaimers
This document contains information on products, services and/or processes in development.  All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings.  Circumstances will vary.  Intel does not guarantee any costs or cost reduction. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation.

24 Additional Info

25 Artificial Intelligence Opportunity
AI is the fastest growing data center workload. In 2016, 7% of servers ran AI workloads; of those, 60% ran classic machine learning and 40% ran deep learning. Classic machine learning servers: 97% Intel Architecture (IA), 1% IA + GPU, 2% other. Deep learning servers: 91% IA, 7% IA + GPU, 2% other. Source: amalgamation of Intel data, analyst data and Intel analysis. Intel Confidential

26 Artificial Intelligence Plan
Bringing the HPC Strategy to AI: Intel® Nervana™ Portfolio
[Chart: Top 500 share of FLOPs over time by architecture (Xeon, Intel® Xeon Phi™, Nvidia*), with markers for the Nvidia and Xeon Phi introductions, through November '16]
- Xeon (Broadwell shipping today; Skylake coming 2017): most widely deployed machine learning solution
- Intel® Xeon Phi™ (Knights Mill coming 2017): high performance, classic machine learning
- Broadwell + Arria 10 (shipping today): programmable, low-latency inference
- Lake Crest (SDVs coming 2017): best-in-class neural network performance
Intel Confidential

27 Intel® Deep Learning SDK
BETA Available Now: Accelerate Deep Learning Development. For developers looking to accelerate deep learning model design, training & deployment:
- FREE for data scientists and software developers to develop, train & deploy deep learning
- Simplify installation of Intel-optimized frameworks and libraries
- Increase productivity through a simple and highly visual interface
- Enhance deployment through model compression and normalization
- Facilitate integration with the full software stack via an inference engine
software.intel.com/deep-learning-sdk

The Intel® Deep Learning SDK is a set of tools for data scientists and software developers to develop, train, and deploy deep learning solutions. The SDK encompasses a training tool and a deployment tool that can be used separately or together in a complete deep learning workflow. The Training Tool allows data scientists to easily prepare training data, design models, and train models with automated experiments and advanced visualizations. It also simplifies the installation and usage of popular deep learning frameworks optimized for Intel platforms. The Deployment Tool enables developers to optimize trained deep learning models through model compression and weight quantization, which are tailored to end-point device characteristics. It also delivers a unified API to integrate the inference with application logic. The technical preview is now available at software.intel.com/deep-learning-sdk.

28 BigDL: Bringing Deep Learning to Big Data
BETA: Now; GOLD: Q1'17. Bringing Deep Learning to Big Data, for developers looking to run deep learning on Hadoop/Spark due to familiarity or analytics use.
- Open-sourced deep learning library for Apache Spark*; makes deep learning more accessible to big data users and data scientists
- Feature parity with popular DL frameworks like Caffe, Torch, TensorFlow, etc.
- Easy customer and developer experience: run deep learning applications as standard Spark programs, on top of existing Spark/Hadoop clusters (no cluster change)
- High performance powered by Intel MKL and multi-threaded programming; efficient scale-out leveraging the Spark architecture
[Diagram: BigDL alongside the Spark stack: Spark Core, SQL, SparkR, Streaming, MLlib, GraphX, ML Pipeline, DataFrame]
github.com/intel-analytics/BigDL

BigDL is a distributed deep learning library for Apache Spark. It was developed inside Intel and contributed back to the community; it was open sourced in December 2016. With BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters. As Spark is the leading framework for distributed ML, the addition of deep learning to the super-popular Spark framework is important, because it allows Spark developers to perform a range of data analysis tasks, including data wrangling, interactive queries, and stream processing, within a single framework. That helps avoid the complexity inherent in using multiple frameworks and libraries.

Three important features offered by BigDL are (1) rich deep learning support, (2) high single-node Xeon performance and, last but not least, (3) efficient scale-out leveraging the Spark architecture. Let's click down on each of these:
1. Rich deep learning support. Modeled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high-level neural networks; in addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL.
2. High single-node Xeon performance. To achieve high performance, BigDL uses Intel MKL and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-box open source Caffe, Torch or TensorFlow on a single-node Xeon.
3. Efficient scale-out leveraging the Spark architecture. BigDL can efficiently scale out to perform data analytics at "big data scale" by leveraging the Apache Spark architecture.

The typical BigDL users want to (1) analyze "big data" using deep learning on the same Hadoop/Spark cluster where the data is stored, (2) add deep learning functionality (either training or prediction) to their big data (Spark) programs and/or workflows, and (3) leverage existing Hadoop/Spark clusters to run their deep learning applications, which can then be dynamically shared with other workloads (e.g., ETL, data warehouse, feature engineering, classical machine learning, graph analytics, etc.). With this unified platform, customers can eliminate a large volume of unnecessary dataset transfers between separate systems, eliminate separate HW clusters (e.g. CPU and GPU clusters) and move toward a single CPU cluster, and reduce system complexity and the latency of end-to-end learning. Ultimately, customers can achieve better scale, higher resource utilization, ease of use/development, and better TCO.

Is there a list of features that BigDL has parity with in Caffe & Torch?
>> Whatever model can be built with Caffe or Torch can be built with BigDL and scaled out to clusters.
Can you run Caffe or Torch trained models out of the box?
>> You can write a Spark program using the BigDL API to load Caffe or Torch trained models.
Caffe and Torch are declining in popularity, while TensorFlow and MXNet are growing rapidly, so what is the BigDL feature roadmap going forward relative to those two leaders?
>> At a high level, any algorithm you can build in one framework can be built with BigDL; from that standpoint we have functional parity with most DL frameworks.
Since TensorFlow now has Spark support, why should a new user choose BigDL instead?
>> If you are referring to TensorFrames, you cannot do distributed training on TensorFrames. If referring to TensorFlow on Spark, it is important to remember that it is coarse-grained at the job level and HDFS file level and cannot do seamless integration of the big data analytics pipeline. Also, the OpenMP native optimizations in popular DL frameworks like Caffe & TensorFlow conflict with the Java threading model used in Spark, which results in lower performance than BigDL.
How does BigDL's performance compare to other leading frameworks on common topologies?
>> For IA, single node, we achieve ~80% of Intel Caffe for GoogLeNet v1 and v2; for multi-node (up to 16 nodes) we are comparable.
>> For GPU, single-node Haswell is ~70-80% of BVLC Caffe (K40) and Broadwell is ~70-80% of BVLC Caffe (K80).
How do you pronounce BigDL? "Big deal"
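To ground the "standard Spark programs" point, here is a hedged sketch of a small BigDL training job in Python; the module paths and signatures follow the open-source BigDL 0.x releases and may differ in later versions, and the data here is synthetic:

import numpy as np
from pyspark import SparkContext
from bigdl.util.common import init_engine, create_spark_conf, Sample
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

sc = SparkContext(conf=create_spark_conf())
init_engine()  # initializes BigDL on the existing Spark cluster

# Toy training data as an RDD of BigDL Samples (feature, label);
# ClassNLLCriterion expects 1-based labels
train_rdd = sc.parallelize(range(1024)).map(
    lambda i: Sample.from_ndarray(np.random.rand(784),
                                  np.array([float(i % 10 + 1)])))

# A small MLP defined with BigDL layers (Torch-style, as the notes say)
model = Sequential().add(Linear(784, 128)).add(ReLU()) \
                    .add(Linear(128, 10)).add(LogSoftMax())

optimizer = Optimizer(model=model,
                      training_rdd=train_rdd,
                      criterion=ClassNLLCriterion(),
                      optim_method=SGD(learningrate=0.01),
                      end_trigger=MaxEpoch(2),
                      batch_size=128)
trained_model = optimizer.optimize()  # runs as ordinary Spark jobs

Because the whole job is just a Spark program, it schedules onto the same cluster as ETL or SQL workloads, which is the "no cluster change" claim above.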

29 Intel Distribution for Python
Advancing Python Performance Closer to Native Speeds, for developers using the most popular and fastest growing programming language for AI.
- Easy, out-of-the-box access to high-performance Python: prebuilt and optimized for numerical computing, data analytics and HPC; a drop-in replacement for your existing Python (no code changes required)
- Drive performance with multiple optimization techniques: accelerated NumPy/SciPy/scikit-learn with Intel® MKL; data analytics with pyDAAL; enhanced thread scheduling with TBB; Jupyter* Notebook interface, Numba, Cython; scale easily with optimized mpi4py and Jupyter notebooks
- Faster access to the latest optimizations for Intel architecture: the distribution and individual optimized packages are available through conda and Anaconda Cloud; optimizations are upstreamed back to the main Python trunk
software.intel.com/intel-distribution-for-python
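As a quick way to see the drop-in claim in action, the sketch below (plain NumPy, nothing Intel-specific in the API) checks which BLAS backend NumPy was built against and times a large matrix multiply; under the Intel distribution the config output lists MKL:

import time
import numpy as np

np.show_config()  # with the Intel distribution, the BLAS/LAPACK entries list MKL

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

start = time.time()
c = a @ b  # dispatched to the BLAS backend (MKL in the Intel distribution)
print("2000x2000 matmul: %.3f s" % (time.time() - start))

Because the acceleration lives behind NumPy's standard interface, the same script runs unchanged on any Python; only the timing differs.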

30 Intel® MKL-DNN: Math Kernel Library for Deep Neural Networks
BETA Now Available! For developers of deep learning frameworks featuring optimized performance on Intel hardware.
Distribution details: open source, Apache 2.0 license. Common DNN APIs across all Intel hardware. Rapid release cycles, iterated with the DL community, to best support industry framework integration. Highly vectorized & threaded for maximal performance, based on the popular Intel® MKL library.
Primitives covered:
- Direct 2D convolution
- Local response normalization (LRN)
- Rectified linear unit neuron activation (ReLU)
- Maximum pooling
- Inner product
github.com/01org/mkl-dnn

Intel® MKL-DNN (Math Kernel Library for Deep Neural Networks) is highly optimized using industry-leading techniques and low-level assembly code where appropriate. The API has been developed with feedback from and interaction with the major framework owners, and as an open source project it will track new and emerging trends in these frameworks. Intel is using this internally for our work in optimizing industry frameworks, as well as supporting the industry in their optimizations.

31 Intel® Machine Learning Scaling Library (MLSL)
Scaling Deep Learning to 32 Nodes and Beyond. BETA Now Available! For maximum deep learning scale-out performance on Intel® architecture:
- Deep learning abstraction of a message-passing implementation
- Built on top of MPI; allows other communication libraries to be used as well
- Optimized to drive scalability of communication patterns
- Works across various interconnects: Intel® Omni-Path Architecture, InfiniBand, and Ethernet
- Common API to support deep learning frameworks (Caffe, Theano, Torch, etc.)
[Diagram: forward and back propagation across layers 1..N, exchanging data with Allreduce, Alltoall, Reduce-Scatter and Allgather collectives]
github.com/01org/MLSL/releases

The Intel® Machine Learning Scaling Library (Intel® MLSL) is a collection of communication primitives and building blocks to scale deep learning framework performance over a cluster deployment. It is intended for deep learning framework developers and optimizers. For example, a framework developer calls functions to distribute Caffe training compute across an Intel® Xeon Phi™ cluster.
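MLSL's own API is not shown on the slide; as an illustration of the core pattern it abstracts (summing a layer's gradients across all nodes during back-propagation), here is the equivalent collective written directly against MPI with mpi4py. MLSL layers scheduling and communication/compute overlap on top of this basic step. Run with e.g. mpirun -n 4 python allreduce_demo.py:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each rank computes local gradients for a layer on its shard of the batch
local_grad = np.random.rand(1024).astype(np.float32)

# Sum gradients across all ranks in place; afterwards every rank holds the
# global gradient, which is exactly the data-parallel training step
comm.Allreduce(MPI.IN_PLACE, local_grad, op=MPI.SUM)
local_grad /= comm.Get_size()  # average rather than sum

print("rank %d, mean grad %.4f" % (comm.Get_rank(), local_grad.mean()))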

