Digital Science Center

Presentation transcript:

Digital Science Center
Geoffrey C. Fox, David Crandall, Judy Qiu, Gregor von Laszewski, Fugang Wang, Badi' Abdul-Wahid, Saliya Ekanayake, Supun Kamburugamuva, Jerome Mitchell, Bingjing Zhang
School of Informatics and Computing, Indiana University

Digital Science Center Research Areas (green in the original poster implies HPC integration):
- Digital Science Center Facilities
- RaPyDLI Deep Learning Environment
- SPIDAL Scalable Data Analytics Library and applications, including ice sheets
- MIDAS Big Data Software
- Big Data Ogres Classification and Benchmarks
- CloudIOT Internet of Things Environment
- Cloudmesh Cloud and Bare-Metal Automation
- XSEDE TAS Monitoring of citations and system metrics
- Data Science Education with MOOCs

Scientific Impact Metrics
We developed a software framework and process to evaluate scientific impact for XSEDE. We calculate and track various scientific impact metrics of XSEDE, Blue Waters, and NCAR. In a recently conducted peer-comparison study we showed how well XSEDE and its support services (ECSS) perform, judging by citations received. To obtain this result we retrieved and processed millions of data entries from multiple sources in various formats.

          # Publications   Rank (average)   Rank (median)   # Citations (average)   # Citations (median)
  XSEDE   2,349            61               65               26                       11
  Peers   168,422          49               48               13                        5

Ice Layer Detection Algorithm
The polar science community has built radars capable of surveying the polar ice sheets and, as a result, has collected terabytes of data; its repository grows each year as signal processing techniques improve and the cost of hard drives decreases, enabling a new generation of high-resolution ice thickness and accumulation maps. Manually extracting layers from an enormous corpus of ice thickness and accumulation data is time-consuming and requires sparse hand-selection, so developing image processing techniques that automatically aid the discovery of knowledge is of high importance.

Data Analytics for IoT Devices in the Cloud
We developed a framework to bring data from IoT devices to a cloud environment for real-time data analysis. The framework consists of data collection nodes near the devices, publish-subscribe brokers that bring the data to the cloud, and Apache Storm coupled with other batch processing engines for data processing in the cloud. Our data pipeline is Robot → Gateway → Message Brokers → Apache Storm. Simultaneous Localization and Mapping (SLAM) is an example application built on top of our framework, in which we exploit parallel data processing to speed up the expensive SLAM computation.
[Figure: robot and the map of the environment it produces]

Parallel Sparse LDA
Original LDA (orange) compared to LDA exploiting sparseness (blue). Note that the data analytics make use of InfiniBand, i.e., they are limited by communication! The Java code runs under Harp, a Hadoop-plus-HPC plugin. Corpus: 3,775,554 Wikipedia documents; vocabulary: 1 million words; topics: 10,000. BR II is the Big Red II supercomputer with a Cray Gemini interconnect; Juliet is a Haswell cluster with Intel (switch) and Mellanox (node) InfiniBand (not optimized).
[Figures: Harp LDA on BR II; Harp LDA on Juliet (36-core nodes)]

High Performance Data Analytics with Java + MPI on Multicore HPC Clusters
We find it challenging to achieve high performance on HPC clusters for big data problems. We approach this with Java and MPI, and improve performance further using Java memory maps and off-heap data structures. We achieve zero intra-node messaging, zero garbage collection, and a minimal memory footprint. We present performance results from a recent Intel Haswell HPC cluster with 3,456 cores in total.
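The Scientific Impact Metrics panel above reduces a publication set to averages and medians of citation counts. A minimal sketch of that aggregation, using made-up citation counts rather than the real XSEDE data:

```python
from statistics import mean, median

# Hypothetical citation counts for a small publication set; the real
# study aggregated millions of entries for XSEDE and its peers.
citations = [3, 11, 0, 42, 11, 7]

metrics = {
    "publications": len(citations),           # size of the publication set
    "citations_avg": mean(citations),         # average citations per paper
    "citations_median": median(citations),    # median citations per paper
}
print(metrics)
```

The same per-paper aggregation, applied separately to the XSEDE and peer publication sets, yields the columns of the comparison table.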

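The IoT pipeline described above (Robot → Gateway → Message Brokers → Apache Storm) can be sketched in miniature. This is a stand-in only: a thread-safe queue plays the role of the publish-subscribe broker and a consumer thread plays the Storm topology; the function names are invented for illustration.

```python
import queue
import threading

broker = queue.Queue()   # stand-in for the publish-subscribe broker
results = []

def gateway_publish(readings):
    """Gateway node near the device: forward robot readings to the broker."""
    for r in readings:
        broker.put(r)
    broker.put(None)     # sentinel: no more data

def stream_processor():
    """Cloud-side processing stage (stand-in for the Storm topology)."""
    while True:
        msg = broker.get()
        if msg is None:
            break
        # A SLAM update would run here; we just scale the reading.
        results.append(msg * 2)

consumer = threading.Thread(target=stream_processor)
consumer.start()
gateway_publish([1, 2, 3])   # robot readings entering the pipeline
consumer.join()
print(results)               # [2, 4, 6]
```

Decoupling the producer from the consumer through the broker is what lets the real framework swap in multiple brokers and parallel Storm workers without changing the gateway side.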
Digital Science Center
Geoffrey C. Fox, David Crandall, Judy Qiu, Gregor von Laszewski, Fugang Wang, Badi' Abdul-Wahid, Saliya Ekanayake, Supun Kamburugamuva, Jerome Mitchell, Bingjing Zhang
School of Informatics and Computing, Indiana University

DSC Computing Systems: FutureSystems
FutureSystems is a mixed HPC and cloud environment. OpenStack has been in service since the Cactus release in 2011; we currently operate two OpenStack clouds, Juno and Kilo, with roughly 60 compute nodes in total.

We just installed a 128-node Haswell-based system (Juliet): 128 GB of memory per node, substantial conventional disk per node (8 TB) plus PCI-based SSD, InfiniBand with SR-IOV, and 24- and 36-core nodes (3,456 cores in total). We are working with SDSC on the NSF XSEDE Comet system (Haswell, 47,776 cores). Older machines include India (128 nodes, 1,024 cores), Bravo (16 nodes, 128 cores), Delta (16 nodes, 192 cores), Echo (16 nodes, 192 cores), and Tempest (32 nodes, 768 cores) with large memory, large disk, and GPUs. The systems are optimized for cloud research and large-scale data analytics, exploring storage models and algorithms, and we build technology to support high-performance virtual clusters.
[Photos: Juliet, India, Tempest, Bravo-Delta-Echo]

Usage Data of OpenStack
1,060,442 hours of wall time and 14,318 VM instances launched in total (between 01/01/2015 and 10/31/2015), an average of 63 VMs launched per day. 70% of instances used tiny or small server sizes and 30% used medium or (x)large.

Use of FutureSystems: Courses
Total to date: 75 course projects from 31 distinct institutions since 2010.
SP15 v594 (online). Topic: cloud computing, big data. Audience: 38 globally distributed GE employees. Projects: deployments of big data platforms and analysis of data sets.
FA15, FA14 (online) [in progress]. Topic: big data. Audience: 139 students (on site and remote). Projects: analysis of sports, Amazon movie reviews, stock market, and Twitter data.

Undergraduate Student Research
Developing Cloudmesh for cloud environment management; REU and on-site students.

Rapid Prototyping HPC Environment for Deep Learning Infrastructure
We are developing an on-ramp to deep learning that utilizes HPC and GPU resources. It serves as an aggregate for modules related to deep learning. We address a multitude of issues including deployment, access, and integration into HPC environments. Simple interfaces are provided that allow easy reusability. We are working with our partners on a performance-optimized convolution kernel that will be able to utilize state-of-the-art GPUs, including AMD. Data management will be available as part of deep learning workflows.

Virtual Clusters with SDSC Comet
Comet is a new petascale supercomputer designed to transform advanced scientific computing by expanding access and capacity among traditional as well as non-traditional research domains. Comet will be capable of an overall peak performance of nearly two petaflops, or two quadrillion operations per second. We are working with SDSC to deliver an easy-to-use service that allows users to request virtual clusters that are close to the hardware; advanced users will be able to request such clusters, manage them, and deploy their own software stacks on them. Comet is the first virtualized HPC cluster and delivers a significantly increased level of computing capacity and customizability to support data-enabled science and engineering at the campus, regional, and national levels, in turn supporting the entire science and engineering enterprise, including education as well as research. IU is currently working on delivering easy-to-use client interfaces, leveraging experience from our Cloudmesh software.
[Figures: threads vs. processes on 24-core Juliet nodes, for 48 and 96 nodes and for 24, 48, and 96 nodes]

Cloudmesh for Managing Virtual Clusters
Cloudmesh is an important component designed to deliver a software-defined system encompassing virtualized and bare-metal infrastructure, networks, applications, systems, and platform software, with the unifying goal of providing Cloud Testbeds as a Service (CTaaS). Cloudmesh federates a number of resources from academia and industry, including the existing FutureSystems, Amazon Web Services, Azure, and HP Cloud.
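Cloudmesh's federation idea, one interface dispatching to many cloud providers, can be illustrated with a toy provider registry. The provider names follow the text above, but the registry API here is invented for illustration and is not the real Cloudmesh interface.

```python
# Toy illustration of federating multiple clouds behind one interface.
# The class and method names are hypothetical, not the Cloudmesh API.

class Provider:
    """One federated cloud; a real driver would call EC2, Nova, etc."""
    def __init__(self, name):
        self.name = name

    def boot(self, image):
        # Placeholder for a provider-specific API call.
        return f"{self.name}: booted {image}"

class Federation:
    """Dispatches requests to registered providers by name."""
    def __init__(self):
        self.providers = {}

    def register(self, provider):
        self.providers[provider.name] = provider

    def boot(self, cloud, image):
        return self.providers[cloud].boot(image)

fed = Federation()
for name in ("futuresystems", "aws", "azure", "hp"):
    fed.register(Provider(name))

print(fed.boot("aws", "ubuntu-14.04"))
```

The point of the sketch is the single dispatch layer: user-facing tooling talks to the federation, and per-provider differences stay inside each driver.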