Discussion: Cloud Computing for an AI First Future

Presentation on theme: "Discussion: Cloud Computing for an AI First Future"— Presentation transcript:

1 Discussion: Cloud Computing for an AI First Future
13th Cloud Control Workshop, June 13-15, 2018, Skåvsjöholm in the Stockholm Archipelago
Geoffrey Fox, June 14, 2018, Department of Intelligent Systems Engineering
Work with Judy Qiu, Supun Kamburugamuva, Shantenu Jha, Kannan Govindarajan, Pulasthi Wickramasinghe, Gurhan Gunduz, Ahmet Uyar

2 Cloud Computing for an AI First Future
Artificial Intelligence is a dominant disruptive technology affecting all our activities, including business, education, research, and society; several companies have announced "AI First" strategies. The AI disruption is typically associated with big data coming from the edge, from repositories, or from sophisticated scientific instruments such as telescopes, light sources and gene sequencers. AI First requires mammoth computing resources such as clouds, supercomputers, hyperscale systems and their distributed integration. AI First clouds are related to High Performance Computing (HPC) -- Cloud or Big Data integration/convergence. This panel could examine the driving applications and their implications for hardware and software infrastructure, and the types of hardware suitable for AI First clouds.

3 Summit Supercomputer Oak Ridge
Architecture: 4,608 compute servers, each containing two 22-core IBM Power9 processors and six NVIDIA Tesla V100 GPU accelerators
Power: 15 MW. Storage: 250 PB. Interconnect: InfiniBand between nodes, NVLink within nodes
Note the AI First framing: Summit was promoted with the adjective "smartest", not just fastest
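As a rough back-of-the-envelope check of what these node counts add up to (a minimal sketch; the per-GPU figure of about 7 TFLOPS double precision for a Tesla V100 is an assumption, not something stated on the slide):

```python
# Rough aggregate-resource arithmetic for Summit, from the node counts above.
nodes = 4608
gpus_per_node = 6
cpu_cores_per_node = 2 * 22              # two 22-core Power9 processors per node

total_gpus = nodes * gpus_per_node        # 27,648 V100 GPUs
total_cpu_cores = nodes * cpu_cores_per_node

v100_fp64_tflops = 7.0                    # assumed per-GPU double-precision peak
peak_pflops = total_gpus * v100_fp64_tflops / 1000

print(f"{total_gpus:,} GPUs, {total_cpu_cores:,} CPU cores, "
      f"~{peak_pflops:.0f} PFLOPS GPU double-precision peak")
```

This lands on the order of 200 PFLOPS of GPU peak, consistent with the scale the slide is describing.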

4 Big Data and Simulation: Difficulty in Parallelism vs. Size of Synchronization Constraints
[Diagram: problem classes arranged by two characteristics, difficulty in parallelism / size of synchronization constraints (loosely coupled to tightly coupled) and size of disk I/O. Loosely coupled: pleasingly parallel (often independent events), MapReduce as in scalable databases (the current major Big Data category), parameter sweep simulations; these run on commodity clouds. Tightly coupled: global machine learning (e.g. parallel clustering), deep learning, LDA, graph analytics (e.g. subgraph mining), unstructured and structured adaptive sparse problems with linear algebra at the core (often not sparse), and the largest scale simulations; these need HPC clouds/supercomputers with accelerators, high performance interconnect and fast memory access, up to exascale supercomputers. These are just two problem characteristics; there is also data/compute distribution, as seen in grid/edge computing.]
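To make the synchronization distinction in the diagram concrete, here is a minimal mpi4py sketch (not from the slide; assumes mpi4py and NumPy are available): a pleasingly parallel step needs no communication, while a global machine learning step such as parallel clustering or deep learning ends every iteration with a collective, which is exactly where interconnect quality matters.

```python
# Minimal sketch contrasting the two ends of the diagram (assumes mpi4py + numpy).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Pleasingly parallel: each rank processes its own events, no synchronization.
local_events = np.random.rand(1_000_000)
local_result = local_events.sum()          # independent work per rank

# Tightly coupled (global ML, e.g. parallel clustering / deep learning):
# every iteration ends in a collective, so latency and bandwidth dominate scaling.
local_gradient = np.random.rand(1024)
global_gradient = np.empty_like(local_gradient)
comm.Allreduce(local_gradient, global_gradient, op=MPI.SUM)

if rank == 0:
    print(f"{size} ranks: local sums done, global gradient synchronized")
```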

5 Discussion at Workshop I
Now moving to an AI First future. The discussion was animated, with agreement that:
The current big data focus is to the left of the previous slide, with for example Flink and Spark having special thrusts in workflow and SQL.
Over the next five years, we can expect a greater focus of the big data community on the right side, with deep learning, LDA and graph analytics having features quite similar to classic HPC problems that require high performance communication.
There was also agreement that an "AI First" cloud was a plausible vision.
However, there was significant disagreement as to how to realize an "AI First" cloud and whether HPC was relevant.
It was felt that clouds would always choose a "sweet spot" too far from HPC choices for the HPC community to be able to use them.
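For context on the "SQL special thrust" mentioned above, a minimal PySpark sketch (the input file, table and column names are hypothetical, used only for illustration):

```python
# Minimal PySpark example of the SQL-oriented big data workload discussed above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-thrust-example").getOrCreate()

events = spark.read.json("events.json")     # hypothetical input data
events.createOrReplaceTempView("events")

# Declarative SQL query; Spark plans and runs the distributed execution.
daily = spark.sql("""
    SELECT device_id, COUNT(*) AS n_events
    FROM events
    GROUP BY device_id
    ORDER BY n_events DESC
""")
daily.show(10)
```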

6 Discussion at Workshop II
The slow rate of change in the HPC community, with many legacy applications using technologies like Fortran, was noted. Would Cray survive in a "converged world"?
The storage differences between HDFS style in clouds and Lustre in HPC were noted. However, S3 (AWS) is nearer to the Lustre architecture (with storage separate from compute) than to HDFS.
The interconnect difference between Ethernet (clouds) and InfiniBand (HPC) was noted. The HPC community could surely adapt to the chosen public cloud communication technology, as this does not affect the programming model, just details of implementation.
If one wants to run graph analytics on public clouds, one will need excellent communication, as the irregular structure and low compute/communication ratio of big data graphs probably make them harder to implement efficiently than big simulation problems.
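The storage point above largely comes down to where the data path points; a minimal sketch (bucket and paths are hypothetical) shows the same Spark code against HDFS-style co-located storage and S3-style storage that is separate from compute:

```python
# Same analytics code, different storage architectures (paths are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-example").getOrCreate()

# HDFS: storage co-located with the compute nodes (classic cloud big data stack).
df_hdfs = spark.read.parquet("hdfs:///data/experiment/run1")

# S3: object storage disaggregated from compute, closer to the Lustre model
# (requires the hadoop-aws / s3a connector on the classpath).
df_s3 = spark.read.parquet("s3a://example-bucket/data/experiment/run1")

print(df_hdfs.count(), df_s3.count())
```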

7 Discussion at Workshop III
Another point of view was that the movement to the right of slide 4 offers a natural chance for convergence of HPC and public cloud hardware and software.
The expected importance of accelerators in both big data and HPC was stressed. Those accelerators could differ (TPU vs. GPU), but the same accelerators could drive both HPC and public clouds.
Probably one would always use specialized machines (supercomputers) for the really large jobs, with say 100,000 cores or more, as it is not easy to get co-location at that level on public clouds.
Note the NASA evaluation of public clouds for workloads on NASA supercomputers (PDF).
Note the BDEC meeting on the following slide; ask Geoffrey Fox for more information.
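One way the "different accelerators, same code" point plays out in practice is through device-portable frameworks; the following is a minimal JAX sketch (not from the slides) in which the identical jitted function runs on whatever accelerator the platform provides, e.g. a GPU on an HPC node or a TPU on a public cloud:

```python
# Minimal JAX sketch: identical code targets CPU, GPU, or TPU backends.
import jax
import jax.numpy as jnp

@jax.jit
def dense_step(w, x):
    # A representative accelerator-friendly kernel (matrix multiply + nonlinearity).
    return jnp.tanh(x @ w)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (1024, 1024))
x = jax.random.normal(key, (256, 1024))

y = dense_step(w, x)
print("backend:", jax.default_backend(), "devices:", jax.devices(), "output:", y.shape)
```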

8 Big Data and Extreme-scale Computing (BDEC) http://www.exascale
BDEC "Pathways to Convergence" report.
Next meeting: October 3-5, 2018, Bloomington, Indiana, USA. October 3 is an evening reception; the meeting focus is "Defining application requirements for a data intensive computing continuum".
Later meetings: February, Kobe, Japan (national infrastructure visions); Europe (exploring alternative platform architectures); Q4 2019, USA (vendor/provider perspectives); Q2 2020, Europe (focus to be determined); Q3-4 2020, final meeting, Asia (write report).

