Enhance CMAQ Performance to Meet Future Challenges: I/O Aspect
David Wong
AMAD, EPA
October 20, 2009

Motivation
Performance degrades as the number of processors increases.
Using a large number of processors is inevitable.
This work is overdue.

Background
CMAQ performs I/O through IOAPI_3.
IOAPI_3 operates in serial mode.
PARIO (PARallel IO) was created over 10 years ago on top of IOAPI_3 to provide parallel I/O functionality.
Current I/O design: any processor can perform reads, but only the I/O processor can perform writes.
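
The read/write split in the current design can be sketched as follows. This is a minimal, hypothetical illustration, not PARIO or IOAPI_3 code: Python threads stand in for MPI processes, a list stands in for the input file, and the names io_writer, worker and run are invented for the example. Every worker reads its own input independently, but every write passes through one queue drained by a single writer, which is where writes are serialized.

```python
import queue
import threading

SENTINEL = None  # tells the writer thread that all workers are done


def io_writer(write_queue, output):
    """Hypothetical single I/O processor: the only place where writes happen."""
    while True:
        item = write_queue.get()
        if item is SENTINEL:
            break
        rank, step, data = item
        output.append((step, rank, data))  # stands in for a file write


def worker(rank, shared_input, write_queue, nsteps):
    """Any worker reads input directly, but must hand its writes to the I/O processor."""
    for step in range(nsteps):
        value = shared_input[step]               # read: done independently by each worker
        result = value * (rank + 1)              # placeholder for the model computation
        write_queue.put((rank, step, result))    # write: funneled to the single writer


def run(nworkers=4, nsteps=3):
    shared_input = list(range(10, 10 + nsteps))  # stands in for an input file
    write_queue = queue.Queue()
    output = []
    writer = threading.Thread(target=io_writer, args=(write_queue, output))
    writer.start()
    workers = [
        threading.Thread(target=worker, args=(r, shared_input, write_queue, nsteps))
        for r in range(nworkers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    write_queue.put(SENTINEL)  # only after all workers have queued their writes
    writer.join()
    return sorted(output)      # (step, rank, value) records in a deterministic order


if __name__ == "__main__":
    for record in run():
        print(record)
```

Adding workers adds readers in parallel, but the single writer still has to drain every record one at a time.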

Observations
Read operation overhead
Write operation synchronization cost

Summary
CMAQ's performance degradation comes from read operations as well as from the synchronization cost of write operations.
This implies that the current I/O design is not “suitable” for running with a large number of processors.
AQF's I/O design does show better performance.
The DFIO approach takes care of the degradation problem.
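
Why a synchronization cost can make runs slower as processors are added can be shown with a toy cost model. This is purely hypothetical: the formula and the parameter values (compute_work, per_proc_sync, write_cost) are invented for illustration and are not CMAQ measurements. It assumes that computation divides evenly across processors while the single I/O processor must synchronize with each processor once per output step.

```python
def step_time(nprocs, compute_work=1000.0, per_proc_sync=2.0, write_cost=5.0):
    """Hypothetical time per output step under a single-writer I/O design.

    compute_work / nprocs   : computation shrinks as processors are added
    per_proc_sync * nprocs  : the I/O processor synchronizes with every processor
    write_cost              : the I/O processor's own write, independent of nprocs
    """
    return compute_work / nprocs + per_proc_sync * nprocs + write_cost


def best_nprocs(max_procs, **params):
    """Processor count in 1..max_procs that minimizes the toy step time."""
    return min(range(1, max_procs + 1), key=lambda p: step_time(p, **params))


if __name__ == "__main__":
    for p in (4, 16, 32, 64, 128):
        print(f"{p:4d} processors: {step_time(p):8.2f} time units per step")
    print("best processor count:", best_nprocs(128))
```

With these made-up parameters the step time bottoms out at 22 processors and then rises again, because the serialized synchronization term grows linearly while the compute term keeps shrinking.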

Next steps
Pick a better configuration to compare AQF and CMAQ performance.
Determine the actual I/O performance using the real model code.
PnetCDF
Multicore architecture

THANK YOU!