Digital Science Center

Presentation transcript:

Digital Science Center
Geoffrey C. Fox, David Crandall, Judy Qiu, Gregor von Laszewski, Fugang Wang, Badi' Abdul-Wahid, Saliya Ekanayake, Supun Kamburugamuva, Jerome Mitchell, Bingjing Zhang
School of Informatics and Computing, Indiana University

Digital Science Center Research Areas (green in the original poster implies HPC integration):
- Digital Science Center Facilities
- RaPyDLI Deep Learning Environment
- SPIDAL Scalable Data Analytics Library and applications, including ice sheets
- MIDAS Big Data Software
- Big Data Ogres Classification and Benchmarks
- CloudIOT Internet of Things Environment
- Cloudmesh Cloud and Bare-Metal Automation
- XSEDE TAS Monitoring of citations and system metrics
- Data Science Education with MOOCs

Scientific Impact Metrics
We developed a software framework and process to evaluate scientific impact for XSEDE. We calculate and track various scientific impact metrics of XSEDE, Blue Waters, and NCAR. In a recently conducted peer-comparison study we showed how well XSEDE and its support services (ECSS) perform, judging by citations received. To obtain this result we retrieved and processed millions of data entries from multiple sources in various formats.

          # Publications   Rank (average)   Rank (median)   # Citations (average)   # Citations (median)
  XSEDE   2,349            61               65               26                       11
  Peers   168,422          49               48               13                        5

Ice Layer Detection Algorithm
The polar science community has built radars capable of surveying the polar ice sheets and, as a result, has collected terabytes of data; its repository grows each year as signal processing techniques improve and the cost of hard drives decreases, enabling a new generation of high-resolution ice thickness and accumulation maps. Manually extracting layers from an enormous corpus of ice thickness and accumulation data is time-consuming and requires sparse hand-selection, so developing image processing techniques that automatically aid the discovery of knowledge is of high importance.

Data Analytics for IoT Devices in the Cloud
We developed a framework to bring data from IoT devices to a cloud environment for real-time data analysis. The framework consists of data collection nodes near the devices, publish-subscribe brokers that bring the data to the cloud, and Apache Storm coupled with other batch processing engines for data processing in the cloud. Our data pipeline is Robot → Gateway → Message Brokers → Apache Storm. Simultaneous Localization and Mapping (SLAM) is an example application built on top of our framework, in which we exploit parallel data processing to speed up the expensive SLAM computation.
[Figure: robot and the map of the environment it produces]

Parallel Sparse LDA
Original LDA (orange) compared to LDA exploiting sparseness (blue). Note that the data analytics make use of InfiniBand, i.e., they are limited by communication! The Java code runs under Harp, a Hadoop-plus-HPC plugin. Corpus: 3,775,554 Wikipedia documents; vocabulary: 1 million words; topics: 10,000. BR II is the Big Red II supercomputer with a Cray Gemini interconnect; Juliet is a Haswell cluster with Intel (switch) and Mellanox (node) InfiniBand (not optimized).
[Figures: Harp LDA on BR II; Harp LDA on Juliet (36-core nodes)]

High Performance Data Analytics with Java + MPI on Multicore HPC Clusters
We find it challenging to achieve high performance on HPC clusters for big data problems. We approach this with Java and MPI, and improve performance further using Java memory maps and off-heap data structures. We achieve zero intra-node messaging, zero garbage collection, and a minimal memory footprint. We present performance results from a recent Intel Haswell HPC cluster with 3,456 cores in total.
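The Scientific Impact Metrics panel above reduces a publication set to averages and medians of citation counts. A minimal sketch of that aggregation, using made-up citation counts rather than the real XSEDE data:

```python
from statistics import mean, median

# Hypothetical citation counts for a small publication set; the real
# study aggregated millions of entries for XSEDE and its peers.
citations = [3, 11, 0, 42, 11, 7]

metrics = {
    "publications": len(citations),           # size of the publication set
    "citations_avg": mean(citations),         # average citations per paper
    "citations_median": median(citations),    # median citations per paper
}
print(metrics)
```

The same per-paper aggregation, applied separately to the XSEDE and peer publication sets, yields the columns of the comparison table.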

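The IoT pipeline described above (Robot → Gateway → Message Brokers → Apache Storm) can be sketched in miniature. This is a stand-in only: a thread-safe queue plays the role of the publish-subscribe broker and a consumer thread plays the Storm topology; the function names are invented for illustration.

```python
import queue
import threading

broker = queue.Queue()   # stand-in for the publish-subscribe broker
results = []

def gateway_publish(readings):
    """Gateway node near the device: forward robot readings to the broker."""
    for r in readings:
        broker.put(r)
    broker.put(None)     # sentinel: no more data

def stream_processor():
    """Cloud-side processing stage (stand-in for the Storm topology)."""
    while True:
        msg = broker.get()
        if msg is None:
            break
        # A SLAM update would run here; we just scale the reading.
        results.append(msg * 2)

consumer = threading.Thread(target=stream_processor)
consumer.start()
gateway_publish([1, 2, 3])   # robot readings entering the pipeline
consumer.join()
print(results)               # [2, 4, 6]
```

Decoupling the producer from the consumer through the broker is what lets the real framework swap in multiple brokers and parallel Storm workers without changing the gateway side.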
Digital Science Center
Geoffrey C. Fox, David Crandall, Judy Qiu, Gregor von Laszewski, Fugang Wang, Badi' Abdul-Wahid, Saliya Ekanayake, Supun Kamburugamuva, Jerome Mitchell, Bingjing Zhang
School of Informatics and Computing, Indiana University

DSC Computing Systems: FutureSystems
FutureSystems is a mixed HPC and cloud environment. OpenStack has been in service since the Cactus release in 2011; we currently operate two OpenStack clouds, Juno and Kilo, with roughly 60 compute nodes in total.

We just installed a 128-node Haswell-based system (Juliet): 128 GB of memory per node, substantial conventional disk per node (8 TB) plus PCI-based SSD, InfiniBand with SR-IOV, and 24- and 36-core nodes (3,456 cores in total). We are working with SDSC on the NSF XSEDE Comet system (Haswell, 47,776 cores). Older machines include India (128 nodes, 1,024 cores), Bravo (16 nodes, 128 cores), Delta (16 nodes, 192 cores), Echo (16 nodes, 192 cores), and Tempest (32 nodes, 768 cores) with large memory, large disk, and GPUs. The systems are optimized for cloud research and large-scale data analytics, exploring storage models and algorithms, and we build technology to support high-performance virtual clusters.
[Photos: Juliet, India, Tempest, Bravo-Delta-Echo]

Usage Data of OpenStack
1,060,442 hours of wall time and 14,318 VM instances launched in total (between 01/01/2015 and 10/31/2015), an average of 63 VMs launched per day. 70% of instances used tiny or small server sizes and 30% used medium or (x)large.

Use of FutureSystems: Courses
Total to date: 75 course projects from 31 distinct institutions since 2010.
SP15 v594 (online). Topic: cloud computing, big data. Audience: 38 globally distributed GE employees. Projects: deployments of big data platforms and analysis of data sets.
FA15, FA14 (online) [in progress]. Topic: big data. Audience: 139 students (on site and remote). Projects: analysis of sports, Amazon movie reviews, stock market, and Twitter data.

Undergraduate Student Research
Developing Cloudmesh for cloud environment management; REU and on-site students.

Rapid Prototyping HPC Environment for Deep Learning Infrastructure
We are developing an on-ramp to deep learning that utilizes HPC and GPU resources. It serves as an aggregate for modules related to deep learning. We address a multitude of issues including deployment, access, and integration into HPC environments. Simple interfaces are provided that allow easy reusability. We are working with our partners on a performance-optimized convolution kernel that will be able to utilize state-of-the-art GPUs, including AMD. Data management will be available as part of deep learning workflows.

Virtual Clusters with SDSC Comet
Comet is a new petascale supercomputer designed to transform advanced scientific computing by expanding access and capacity among traditional as well as non-traditional research domains. Comet will be capable of an overall peak performance of nearly two petaflops, or two quadrillion operations per second. We are working with SDSC to deliver an easy-to-use service that allows users to request virtual clusters that are close to the hardware; advanced users will be able to request such clusters, manage them, and deploy their own software stacks on them. Comet is the first virtualized HPC cluster and delivers a significantly increased level of computing capacity and customizability to support data-enabled science and engineering at the campus, regional, and national levels, in turn supporting the entire science and engineering enterprise, including education as well as research. IU is currently working on delivering easy-to-use client interfaces, leveraging experience from our Cloudmesh software.
[Figures: threads vs. processes on 24-core Juliet nodes, for 48 and 96 nodes and for 24, 48, and 96 nodes]

Cloudmesh for Managing Virtual Clusters
Cloudmesh is an important component designed to deliver a software-defined system encompassing virtualized and bare-metal infrastructure, networks, applications, systems, and platform software, with the unifying goal of providing Cloud Testbeds as a Service (CTaaS). Cloudmesh federates a number of resources from academia and industry, including the existing FutureSystems, Amazon Web Services, Azure, and HP Cloud.
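Cloudmesh's federation idea, one interface dispatching to many cloud providers, can be illustrated with a toy provider registry. The provider names follow the text above, but the registry API here is invented for illustration and is not the real Cloudmesh interface.

```python
# Toy illustration of federating multiple clouds behind one interface.
# The class and method names are hypothetical, not the Cloudmesh API.

class Provider:
    """One federated cloud; a real driver would call EC2, Nova, etc."""
    def __init__(self, name):
        self.name = name

    def boot(self, image):
        # Placeholder for a provider-specific API call.
        return f"{self.name}: booted {image}"

class Federation:
    """Dispatches requests to registered providers by name."""
    def __init__(self):
        self.providers = {}

    def register(self, provider):
        self.providers[provider.name] = provider

    def boot(self, cloud, image):
        return self.providers[cloud].boot(image)

fed = Federation()
for name in ("futuresystems", "aws", "azure", "hp"):
    fed.register(Provider(name))

print(fed.boot("aws", "ubuntu-14.04"))
```

The point of the sketch is the single dispatch layer: user-facing tooling talks to the federation, and per-provider differences stay inside each driver.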