Geoffrey Fox High-Performance Big Data Computing: International, National, and Local initiatives COLLABORATORS China and IU: Fudan University, SICE, OVPR.

Presentation transcript:

Geoffrey Fox
High-Performance Big Data Computing: International, National, and Local Initiatives

COLLABORATORS
China and IU: Fudan University, SICE, OVPR
SPIDAL: NIST, Intel, Arizona State University, Rutgers University, Stony Brook University, Universities of Kansas and Utah, Virginia Tech
BDECng: Europe, China, Japan, DoE, NSF, Tennessee, Northwestern, Iowa, Johns Hopkins ……..
Local: Judy Qiu, Martin Swany, David Crandall, UITS, Precision Health

Middleware and High Performance Analytics Libraries for Scalable Data Science

Seven universities are collaborating on an ecosystem for scalable applications with the performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack.

Cross-cutting high-performance data-analysis libraries: SPIDAL (Scalable Parallel Interoperable Data Analytics Library) will support new programming and execution models for data-intensive analysis in a wide range of science and engineering applications.

The technology is driven by major application data challenges in seven different communities: Biomolecular Simulations, Network and Computational Social Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science, and Pathology Informatics.

Indiana University:
David Crandall: working with Kansas on Image Processing for Polar Science Remote Sensing.
Judy Qiu: Harp-DAAL data analytics developed with Intel.
Twister2: a set of middleware components to support batch or streaming data capabilities familiar from Apache Hadoop, Spark, Heron and Flink but with high performance. Twister2 covers bulk synchronous and dataflow communication; task management as in Mesos, Yarn and Kubernetes; dataflow graph execution models; launching of the Harp-DAAL library; streaming and repository data access interfaces; in-memory databases; and fault tolerance at dataflow nodes.

[Figure: K-means execution time versus number of centers]
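The bulk synchronous communication pattern named on this slide can be illustrated with K-means, the benchmark shown in the figure. The following is a minimal sketch in plain Python, not the Twister2 or Harp-DAAL API: the function names `local_step`, `allreduce`, and `kmeans` are hypothetical, the partitions are processed sequentially where a real runtime would run them in parallel, and `allreduce` is only a stand-in for a collective operation across workers.

```python
# Illustrative sketch only: plain Python, NOT the Twister2 or Harp-DAAL API.
# All function names here are hypothetical.
from typing import List, Tuple

Point = Tuple[float, float]
Partial = Tuple[List[List[float]], List[int]]  # per-center sums and counts


def local_step(points: List[Point], centers: List[Point]) -> Partial:
    """Compute phase on one data partition: assign each point to its
    nearest center and accumulate per-center coordinate sums and counts."""
    k = len(centers)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for x, y in points:
        nearest = min(
            range(k),
            key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
        )
        sums[nearest][0] += x
        sums[nearest][1] += y
        counts[nearest] += 1
    return sums, counts


def allreduce(partials: List[Partial]) -> Partial:
    """Stand-in for a collective allreduce: element-wise sum of the
    partial results produced by every partition."""
    k = len(partials[0][1])
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in partials:
        for c in range(k):
            sums[c][0] += part_sums[c][0]
            sums[c][1] += part_sums[c][1]
            counts[c] += part_counts[c]
    return sums, counts


def kmeans(partitions: List[List[Point]], centers: List[Point],
           iterations: int = 10) -> List[Point]:
    """Run a fixed number of supersteps: local compute, then a global
    reduction that acts as the synchronization point."""
    for _ in range(iterations):
        # A parallel runtime would execute these local steps concurrently.
        partials = [local_step(part, centers) for part in partitions]
        sums, counts = allreduce(partials)
        # Keep the old center when no point was assigned to it.
        centers = [
            (sums[c][0] / counts[c], sums[c][1] / counts[c])
            if counts[c] else centers[c]
            for c in range(len(centers))
        ]
    return centers
```

Calling `kmeans([[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]], [(0.0, 0.0), (10.0, 10.0)])` treats each inner list as one worker's partition. Only the small per-center sums and counts cross partition boundaries in each iteration, which is why the cost of the reduction grows with the number of centers rather than the number of points.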

BigData Exascale BDECng Vision: Data Center to Edge

The nature of scientific and engineering research is changing and is challenging the high-performance computing (HPC) community to adapt and connect to a broader community. Research increasingly depends on an integrated ecosystem of sensors and instruments, big data analytics and machine learning, and computational science. Concurrently, rich commercial cloud computing and data analytic services have also emerged, creating new opportunities and organizational perspectives for scientific discovery.

Sensors and instruments now span an enormous range, from a small number of large-scale scientific instruments to billions of edge/IoT devices, and produce massive amounts of real-time, streaming data. HPC, parallel computing, and data analytics pervade all elements of this continuum, often requiring closed loop integration and computational adaptation.

The BDEC community is engaged in a shaping strategy and process that builds on the collective expertise of an expanded community: new application domains, infrastructure providers, and the technology community.

First BDECng meeting: Indiana University Bloomington, October 3-5, 2018

Fudan University Kickoff Meeting, Wednesday April 11
HPSA18: High-Performance Systems and Analytics for Big Data Workshop: A path to future Artificial General Intelligence

Rick Van Kooten (OVPR), Judy Qiu (IU), and X. Sean Wang (Fudan): Introduction
Linton Ward (IBM): Building the cognitive platform to accelerate innovation (Keynote 1)
Nathan Greeneltch (Intel): Learn Faster with Intel Data Analytics Acceleration Library (DAAL)
Takuya Araki (NEC): SX-Aurora TSUBASA and its application to machine learning
Anthony Skjellum (UTC): Adding HPC to Apache Spark
Andrew Younge (Sandia): Supporting High Performance Analysts with System Software for Virtualized Supercomputing
Anil Vullikanti (Virginia Tech): Finding Trees and Anomalous Subgraphs in Parallel
Albert Jonathan (Univ. of Minnesota): Geo-Distributed Clouds For Data Analytics
Tony Hey (UK STFC): Big Scientific Data and Data Science (Keynote 2)
Piotr Luszczek (Univ. of Tennessee): HPC Autotuning Techniques for Computational Kernels in Data Analytics
Scott Michael (UITS): IU Big Data Infrastructure
Wo Chang (NIST): NIST PWG Big Data Reference Architecture for HPC and Analytics
X. Sean Wang (Fudan University): Overview including Astronomy (SKA) Data Analysis
Weihua Zhang (Fudan): Eunomia: Scaling Concurrent Search Trees under Contention Using HTM
Martin Swany (IU): Hardware-Accelerated Network Microservices for Big Data and Extreme Scale Computing
Geoffrey Fox (IU): High Performance Big Data Computing in the Digital Science Center
Panel Session (Chair: Dennis Gannon) on 5 year challenges in HPSA

THANK YOU

PRESENTER CONTACT INFO
gcf@iu.edu
Digital Science Center: https://www.dsc.soic.indiana.edu/
Big Data Exascale: http://www.exascale.org/bdec/
SPIDAL Collaboration: http://www.spidal.org/
HPSA18: http://ipcc.soic.iu.edu/HPSA/