Large Eddy Simulation of two-phase flow combustion in gas turbines: Predicting extreme combustion processes in real engines
Isabelle d’Ast, CERFACS

CERFACS
– Around 130 people in Toulouse (south-west France).
– Goal: develop and improve numerical simulation methods and advanced scientific computing for real applications (CFD, climate, electromagnetism).
– 30 to 40 A-class publications per year (international journals).
– 10 PhDs per year.
– Collaborations with industry and academia (France, Germany, Spain, USA, Italy).

Scientific problem: the prediction of extinction in an industrial burner
– In an industrial burner, because of fast changes in operating conditions, the fuel mass flux can vary much faster than the air mass flux; engine extinction must be avoided.
– Extinction is an unsteady phenomenon that has been widely studied in academic configurations, but very little in complex industrial burners.
– Purpose of the project: perform Large Eddy Simulation (LES) in an industrial combustion chamber to understand the mechanisms of extinction and to evaluate the capability of LES to predict the extinction limits accurately.
[Figures: annular combustion chamber and a single combustion chamber sector, showing the air + fuel injection, air injection and outlet; unstructured mesh of ~9M cells.]

B) Science Lesson
The AVBP code:
– 3D compressible Navier-Stokes equations:
  Two-phase flows (Eulerian / Lagrangian liquid phase)
  Reactive flows
  Real thermodynamics (perfect and transcritical gases)
  Moving meshes (piston engines)
– Large Eddy Simulation:
  Scales larger than the mesh cells are fully resolved.
  Scales smaller than the mesh cells are modelled via a sub-grid stress tensor model (Smagorinsky / WALE), sketched below.
– Unstructured grids: important for complex geometries.
– Explicit schemes (Taylor-Galerkin / Lax-Wendroff).
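For readers who want the equation behind the "sub-grid stress tensor" bullet, here is a compact sketch of the standard Smagorinsky closure (constant-density form for brevity; the WALE variant builds the eddy viscosity differently, and the constant C_s shown is generic, not AVBP's setting):

```latex
% Filtered momentum equation: the unresolved stress \tau_{ij} must be modelled.
\frac{\partial \bar{u}_i}{\partial t}
  + \frac{\partial (\bar{u}_i \bar{u}_j)}{\partial x_j}
  = -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial x_i}
  + \frac{\partial}{\partial x_j}\bigl( 2\nu \bar{S}_{ij} - \tau_{ij} \bigr)

% Smagorinsky eddy-viscosity closure, with \Delta the local filter (mesh) size:
\tau_{ij} - \tfrac{1}{3}\tau_{kk}\,\delta_{ij} = -2\,\nu_t\,\bar{S}_{ij},
\qquad
\nu_t = (C_s \Delta)^2\,|\bar{S}|,
\qquad
|\bar{S}| = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}}
```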

C) Parallel Programming Model
– MPI code written in Fortran 77.
– Library requirements:
  ParMETIS (partitioning)
  (p)HDF5 (I/O)
  LAPACK
– The code runs on any x86 / POWER / SPARC machine on the market so far (BullX, Blue Gene/P, Cray XT5, POWER6, SGI Altix).
– Currently migrating to Fortran 90 (validation underway).
– Introduction of OpenMP and OmpSs for fine-grain threading in progress (see the sketch below).
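As an illustration of the fine-grain threading direction mentioned above, here is a minimal hybrid MPI + OpenMP sketch in Fortran 90. The loop and variable names are hypothetical, not AVBP code: a cell loop is threaded with OpenMP inside each MPI rank, and a single global reduction collects the result.

```fortran
program hybrid_sketch
  use mpi
  implicit none
  integer, parameter :: ncell = 1000000        ! hypothetical local cell count
  integer :: ierr, rank, provided, i
  double precision :: local_sum, global_sum
  double precision, allocatable :: residual(:)

  ! Ask for threaded MPI: OpenMP threads compute, the master thread communicates.
  call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  allocate(residual(ncell))
  residual = 1.0d0

  ! Fine-grain threading of a cell loop inside each MPI rank.
  local_sum = 0.0d0
!$omp parallel do reduction(+:local_sum)
  do i = 1, ncell
     local_sum = local_sum + residual(i)**2
  end do
!$omp end parallel do

  ! One global reduction across ranks instead of per-thread communication.
  call MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'global residual norm^2 =', global_sum
  deallocate(residual)
  call MPI_Finalize(ierr)
end program hybrid_sketch
```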

E) I/O Patterns and Strategy
Two categories of output I/O:
– Small binary files (~100 MB; one file written by the master for progress monitoring).
– Large HDF5 files: single file only, written by the master (standard serial HDF5); a collective pHDF5 file is under study (parallel I/O handled via pHDF5 only), but performance is erratic and variable.
– Multiple master/slave I/O (a subset of ranks has I/O responsibilities): one file per master (roughly 1/100 of the core count) under study; sketch-code performance is encouraging (see the sketch below).
– Average HDF5 file size: 2 GB, depending on the mesh size (max today 15 GB per file, one file per dumped time step, usually ~200 for a converged simulation).
Input I/O: 2 large HDF5 files.
– Sequential master read.
– Buffered alltoall / alltoallv redistribution under validation.
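A minimal sketch of the "multiple master" grouping idea (assumed grouping factor of ~1 I/O rank per 100 compute ranks, hypothetical names, not the AVBP implementation): ranks are split into groups with MPI_Comm_split; rank 0 of each group would then gather its group's partition data and write one file per group.

```fortran
program io_groups
  use mpi
  implicit none
  integer, parameter :: ranks_per_master = 100   ! assumed factor (slide: ~1/100 of core count)
  integer :: ierr, world_rank, world_size
  integer :: color, group_comm, group_rank, group_size

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)

  ! Ranks with the same color share one I/O master and one output file.
  color = world_rank / ranks_per_master
  call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, group_comm, ierr)
  call MPI_Comm_rank(group_comm, group_rank, ierr)
  call MPI_Comm_size(group_comm, group_size, ierr)

  ! Rank 0 of each group acts as the I/O master: it would MPI_Gatherv the
  ! group's partition data and write one HDF5 file named after 'color'.
  if (group_rank == 0) then
     print '(a,i0,a,i0,a)', 'group ', color, ': master collects from ', group_size, ' ranks'
  end if

  call MPI_Comm_free(group_comm, ierr)
  call MPI_Finalize(ierr)
end program io_groups
```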

F) Visualization and Analysis
Visualization uses 2 methods:
– Translation of selected datasets to EnSight / FieldView / Tecplot formats; relies on the parallelisation of these tools.
– XDMF format: XML indexing of the HDF5 file and direct read via ParaView / EnSight (no translation).
"Advanced user" methods available (not tested on INTREPID yet):
– Single HDF5 file written in block format (per partition).
– Indexed via XDMF.
– Read and post-processed in parallel directly via pvserver (ParaView) on the cluster, automatically generating JPEG images.
Full migration to XDMF planned for the 3rd quarter. Generalisation of pvserver.

G) Performance
Performance analysis with:
– Scalasca
– TAU
– Paraver / Dyninst
Current bottlenecks:
– Master/slave scheme.
– Extreme usage of allreduce: over 100 calls per iteration.
– Hand-coded collective communications instead of alltoall / broadcast.
– Cache misses: the adaptive cache loop is not implemented for nodes (only for cells).
– Pure MPI implementation (instead of hybrid mode).
Current status and future plans for improving performance:
– Parallelisation of the preprocessing task: sketch done, 2 h -> 3 min, max memory 15 GB versus 50 MB. Replacement of the current master/slave scheme in the 3rd quarter.
– Buffered MPI_Reduce switch underway on the current version: 20% gain per iteration at 1024 cores (see the sketch below). Strong-scaling performance to be studied.
– OpenMP / OmpSs implementation to reduce communications.
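To illustrate the "buffered reduce" change, a sketch of the general pattern (illustrative quantities, not AVBP's actual monitored variables): instead of one MPI_Allreduce per scalar, several per-iteration diagnostics are packed into one array and reduced in a single call.

```fortran
! Sketch: packing several per-iteration diagnostics into one reduction
! (hypothetical names; AVBP's monitored quantities differ).
subroutine buffered_reduction(mass_local, enthalpy_local, heat_release_local, totals, comm)
  use mpi
  implicit none
  double precision, intent(in)  :: mass_local, enthalpy_local, heat_release_local
  double precision, intent(out) :: totals(3)
  integer, intent(in) :: comm
  double precision :: sendbuf(3)
  integer :: ierr

  ! Before: one MPI_Allreduce per scalar (100+ calls per iteration on the slide).
  ! After : pack the scalars and reduce them in a single call.
  sendbuf(1) = mass_local
  sendbuf(2) = enthalpy_local
  sendbuf(3) = heat_release_local
  call MPI_Allreduce(sendbuf, totals, 3, MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
end subroutine buffered_reduction
```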

H) Tools
How do you debug your code?
– Compiler flags: "-g -fbounds-check -Wuninitialized -O -ftrapv -fimplicit-none -fno-automatic -Wunused"
– gdb / DDT
Current status and future plans for improved tool integration and support:
– A debug verbosity level is included in the next code release.

I) Status and Scalability
How does your application scale now?
– 92% scalability up to 8 racks on Blue Gene/P (dual mode).
Target: 128k cores by the end of 2012.
– Currently 60% efficiency on 64k cores.

I) Status and Scalability
What are our top pains?
– 1. Scalable I/O.
– 2. Blocking allreduce.
– 3. Scalable post-processing.
What did you change to achieve current scalability?
– Buffered asynchronous partition communications (Irecv/Isend), previously one Irecv/Send per dataset (see the sketch below).
Current status and future plans for improving scalability:
– Switch to ParMETIS 4 for improved performance and larger datasets.
– PT-Scotch? (Zoltan?)
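A sketch of the buffered nonblocking exchange described above (hypothetical buffer layout and neighbour list, not AVBP code): all datasets destined for one neighbour partition are packed into a single buffer, then one Irecv/Isend pair per neighbour is posted and completed with a single Waitall.

```fortran
! Sketch of a buffered halo exchange: one Irecv/Isend pair per neighbour
! instead of one pair per dataset (names and sizes are hypothetical).
subroutine buffered_exchange(nneigh, neighbours, sendbuf, recvbuf, bufsize, comm)
  use mpi
  implicit none
  integer, intent(in) :: nneigh, bufsize, comm
  integer, intent(in) :: neighbours(nneigh)
  double precision, intent(in)  :: sendbuf(bufsize, nneigh)   ! all datasets packed per neighbour
  double precision, intent(out) :: recvbuf(bufsize, nneigh)
  integer :: requests(2*nneigh), ierr, n
  integer, parameter :: tag = 101

  ! Post all receives first, then all sends (first-element passing keeps
  ! the buffers contiguous for the nonblocking calls).
  do n = 1, nneigh
     call MPI_Irecv(recvbuf(1, n), bufsize, MPI_DOUBLE_PRECISION, &
                    neighbours(n), tag, comm, requests(n), ierr)
  end do
  do n = 1, nneigh
     call MPI_Isend(sendbuf(1, n), bufsize, MPI_DOUBLE_PRECISION, &
                    neighbours(n), tag, comm, requests(nneigh + n), ierr)
  end do

  ! Computation on interior cells could overlap here before completion.
  call MPI_Waitall(2*nneigh, requests, MPI_STATUSES_IGNORE, ierr)
end subroutine buffered_exchange
```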

J) Roadmap
Where will your science take you over the next 2 years?
– Currently we are able to predict instabilities, extinction and ignition of gas turbines.
– Switch to larger problems and safety concerns: fires in buildings (submitted for consideration for 2013).
What do you hope to learn / discover?
– Understanding flame propagation inside buildings/furnaces will greatly improve prediction models, and safety standards can be adapted accordingly.
– Even larger datasets: in 2013 the expected I/O is 40 GB per snapshot. Need to improve the workflow (fully parallel post-processing) and scalable I/O.