Towards large-scale parallel simulated packings of ellipsoids with OpenMP and HyperFlow Monika Bargieł 1, Łukasz Szczygłowski 1, Radosław Trzcionkowski.


Towards large-scale parallel simulated packings of ellipsoids with OpenMP and HyperFlow
Monika Bargieł 1, Łukasz Szczygłowski 1, Radosław Trzcionkowski 1, Maciej Malawski 1,2
1 Department of Computer Science, 2 Academic Computer Centre Cyfronet
AGH University of Science and Technology
CGW 15, Kraków, 28 October 2015

Outline
- Packing of ellipsoids: problem description
- Parallelization requirements
- Thread-level parallelization with OpenMP
- Task-level parallelism using HyperFlow
- Workflow execution using pilot jobs
- Experiments on the Zeus cluster and results

Packing of ellipsoids – problem description
Our goal is to obtain the highest possible packing fraction of ellipsoids of different shapes (axis ratios) while still preserving the randomness of the bed, in the sense of both position and spatial orientation. For this purpose the force-biased algorithm was adapted.

Algorithm description
One time step of the calculation:
- calculate the 'forces' between pairs of overlapping particles, proportional to the size of the overlap,
- move (and possibly rotate) the particles according to the resultant 'forces',
- reduce the particles' diameters slightly (a small reduction rate increases the density AND the execution time).

Requirements for parallelization
Single simulation:
- computing overlapping regions between ellipsoids
- in-house developed C++ code
- multiple nested loops
Task-level parallelism (parameter study):
- need to execute multiple simulations
- vary particle shape, rotation factor, etc.
- repeated runs to gather better statistics

Parallelization with OpenMP
Sequential version: 8 minutes
Using parallel for:

  #pragma omp parallel for private(ipart) schedule(static)
  for (ipart = 0; ipart < No_parts; ipart++) { ... }

- forces() method: down to 3 min 38 sec
- motion() method: down to 3 min 24 sec
- force_all() method: down to 3 min 23 sec

OpenMP speedup
Parallel speedup on a single node for a system of molecules
Zeus cluster node: 2 x 6-core Intel Xeon L5640 processors, 12 cores total
Intel compiler

HyperFlow - introduction
- Simple high-level workflow description + low-level programming capabilities for advanced developers
- Skilled programmers can be as productive as in any mainstream programming language
- Lightweight, non-invasive workflow deployment model that can be applied to various cloud platforms / infrastructures
- Processes = workflow activities
  – connected through signals (ins and outs)
  – can be mapped to commands OR JavaScript functions (Node.js)
- Simple JSON format, easy to generate
- Supports large-scale workflows

  {
    "processes": [
      {
        "name": "ComputeStats",
        "ins": ["Data"],
        "outs": ["Statistics"],
        "config": {
          "command": {
            "executable": "cstats.sh",
            "args": "$Data_filename"
          }
        }
      },
      {
        "name": "PlotChart",
        "ins": ["Statistics"],
        "outs": ["Charts"],
        "function": "plotCharts.js"
      }
    ],
    "signals": [ { "name": "Data" }, ... ]
  }

Parameter-study workflow
- Command tasks: call external executables
- Function tasks: evaluated in the workflow engine

Workflow generator using template
- Generation of the Cartesian product of parameters
- Configurable repetition for averaging

  var ifCoord = [ "1" ];
  var numberOfParticles = [ "1000" ];
  var numberOfSpecies = [ "1" ];
  var forceScalingFactor = [ "0.1" ];
  var rotationScalingFactor = [ "3.00" ];
  var diameterIncreasingFactor = [ "0.01" ];
  var cellsX = [ "20", "40" ];
  …
  var diameterOfParts3 = [ "1.0" ];
  var numberOfLinesPerPage = [ "56" ];
  var numberOfStepsBetweenPrintouts = [ "100" ];
  var numberOfStepsBetweenCoord = [ " " ];
  var numberOfStepsBetweenRotations = [ "1" ];
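The core of such a generator can be sketched as below. The cartesian() helper is hypothetical and illustrates only the expansion step; the actual generator additionally emits a HyperFlow process entry for each combination.

```javascript
// Hypothetical helper: expands a parameter space (name -> list of values)
// into the Cartesian product of all value combinations.
function cartesian(paramSpace) {
  // Fold each parameter into the running list of combinations.
  return Object.entries(paramSpace).reduce(
    (combos, [name, values]) =>
      combos.flatMap(c => values.map(v => ({ ...c, [name]: v }))),
    [{}]
  );
}

const paramSpace = {
  numberOfParticles: ["1000"],
  forceScalingFactor: ["0.1"],
  cellsX: ["20", "40"],          // only this parameter varies here
};

const combos = cartesian(paramSpace);
console.log(combos.length);      // 1 * 1 * 2 = 2 simulation tasks
```

With the repetition-for-averaging option, each combination would simply be duplicated the configured number of times before tasks are emitted.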

Setup on Zeus using pilot jobs
- Master node executed as an interactive job
- Worker nodes submitted as batch jobs
- Parallel file system used for data exchange
- PBS scripts for submission
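A hedged sketch of how such pilot-job scripts could be generated; the script names, the worker command, and the resource settings are assumptions for illustration, not the actual Zeus setup.

```shell
#!/bin/sh
# Sketch: generate one PBS job script per pilot. Each pilot reserves a full
# 12-core node and starts a worker that pulls tasks from the master node.
NUM_PILOTS=3            # 40 pilot jobs in the actual experiment
WALLTIME=03:00:00       # 3 hours each, as on the slide

i=1
while [ "$i" -le "$NUM_PILOTS" ]; do
  cat > "pilot_$i.pbs" <<EOF
#PBS -l nodes=1:ppn=12
#PBS -l walltime=$WALLTIME
cd \$PBS_O_WORKDIR
# Hypothetical worker command: connects back to the HyperFlow master.
./hyperflow-worker --master \$MASTER_HOST
EOF
  i=$((i + 1))
done

# Submission step, commented out so the sketch runs anywhere:
# for f in pilot_*.pbs; do qsub "$f"; done
```

Each generated file would be handed to qsub; the interactive master job then only waits for workers to connect and feeds them tasks via the shared file system.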

Sample workflow execution
- Experiment with 40 computing tasks
- 40 pilot jobs submitted: 12 cores, 3 hours each
- Max 10 jobs running concurrently
- Total time < 2 hours
- Mean task execution time: 18 minutes
- Total: 144 core-hours

Density packing results
Semiaxes 1 : a^β : a, with β = 0 (prolate) and β = 1 (oblate).
Particles with a given axis ratio have a unique random packing density.
Ellipsoids with a ≈ 1.7 and β ≈ 0.5 can be packed the densest.

Conclusions
- Packing of ellipsoids proved to be a good application to parallelize
- Hybrid parallel model used:
  – OpenMP within a single node
  – HyperFlow for the large-scale workflow
- New deployment model of HyperFlow with pilot jobs tested
- 5000 CPU hours on Zeus consumed so far

Future work
- More large-scale runs
- Better automation of pilot-job management
- Generalization of the parameter-study workflow
  – support for sensitivity analysis
- Deployment and tests on other infrastructures
  – clouds, containers
- Other parallelization options
  – GPU, CUDA

References
1. Bartosz Balis, "HyperFlow: A model of computation, programming approach and enactment engine for complex distributed workflows", Future Generation Computer Systems, Volume 55, February 2016.
2. Mościński, J., Bargieł, M., "C-language program for simulation of irregular close packing of hard spheres", Computer Physics Communications 64 (1991).