Computational Challenges in Partial Wave Analysis: Comparison of New and Old PWA Software Design and Performance Issues
Maciej Swat, Indiana University


Outline
- 3π PWA overview
- Computational challenges in Partial Wave Analysis
- Comparison of new and old PWA software design - performance issues

Maciej Swat, Indiana University
Recent progress in 3π Partial Wave Analysis of E852 data

PWA basics - isobar model, 3π

Reaction: π⁻ p → X p, X → 3π (through an isobar), with s = (p_π + p_p)² and t = (p_p - p_X)².
The intensity measured by experiment is a coherent sum over partial waves:
    I(τ) = | Σ_b V_b A_b(τ) |²
- The decay amplitudes A_b are built from Clebsch-Gordan coefficients, Wigner D^l_mm' functions and, usually, Breit-Wigner propagators for the isobars; we know how to calculate them.
- The production amplitudes V_b are what we are looking for.
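As a small illustration of that coherent sum (a sketch only, not code from the E852/IU software; the wave count and all numerical values are invented), the intensity for a single event can be evaluated like this:

```cpp
// Sketch only (not E852/IU code): coherent isobar-model intensity
//   I(tau) = | sum_b V_b * A_b(tau) |^2
// for a single event; wave count and all numbers are invented.
#include <complex>
#include <cstddef>
#include <iostream>
#include <vector>

using cplx = std::complex<double>;

// Production amplitudes V (fit parameters) and decay amplitudes A for one event.
double intensity(const std::vector<cplx>& V, const std::vector<cplx>& A)
{
    cplx sum(0.0, 0.0);
    for (std::size_t b = 0; b < V.size(); ++b)
        sum += V[b] * A[b];      // coherent sum over partial waves
    return std::norm(sum);       // |...|^2
}

int main()
{
    const std::vector<cplx> V = { {1.0, 0.2}, {0.3, -0.5} };   // two made-up waves
    const std::vector<cplx> A = { {0.8, 0.1}, {-0.4, 0.6} };   // one event's decay amplitudes
    std::cout << "I = " << intensity(V, A) << "\n";
    return 0;
}
```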

3π PWA results

[Figure: intensity vs. 3π mass, events per 0.025 GeV/c², shown for all waves and for the a2(1320), a1 and π2(1670) waves.]

Typical PWA fit involves:
- ~1-3M events per 't' bin
- ~5-10 't' bins
- ~80 mass bins
- ~30-40 waves

PWA implementation - Normalization Integrals in the isobar model

Input samples: data file with 10M experimental events; raw Monte Carlo file with 150M events; accepted Monte Carlo file with 40M events.

For each event we have to:
1. Find the helicity-frame decay angles Ω₁, Ω₂.
2. Calculate the decay amplitudes (involves Wigner D functions, Clebsch-Gordan coefficients and Breit-Wigner propagators). These amplitudes are model dependent; in particular, the time to calculate a single decay amplitude depends on the model.
3. Accumulate the normalization integrals over the Monte Carlo events (see the formula below).
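Step 3 refers to the normalization integrals of the isobar model. In their usual form (the symbols Ψ, τ and N_gen here are generic notation, not taken from these slides), each pair of waves (b, b') is summed over the accepted Monte Carlo events and divided by the number of generated events, so that the acceptance is folded in:

```latex
\Psi_{bb'} \;=\; \frac{1}{N_{\mathrm{gen}}}\sum_{i \,\in\, \mathrm{accepted\ MC}} A_b(\tau_i)\, A^{*}_{b'}(\tau_i)
```

One pass over the 40M accepted events therefore updates every (b, b') pair once per event; this is the cost the following slides set out to distribute.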

Data sets: OLD vs NEW

OLD: all data events are processed into decay amplitudes that are written to disk storage (all decay amplitudes for all events), and the normalization integrals are then filled from the stored amplitude files.

NEW: the master distributes the data across slave PC nodes; the slaves calculate the amplitudes "on the fly" and evaluate partial contributions to the normalization integrals, and the master gathers the partial results (NI's).
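The NEW path can be sketched with a few MPI calls. The code below is an assumption about how such a scheme might look, not the actual E852/IU software; the amplitude stub, wave count and event count are invented. Each node accumulates a partial normalization-integral matrix from its strided share of Monte Carlo events, and MPI_Reduce sums the partial matrices on the master.

```cpp
// MPI sketch of the NEW scheme (an assumption, not the actual E852/IU code):
// every node computes decay amplitudes on the fly for its share of MC events,
// accumulates a partial normalization-integral matrix, and the master sums them.
#include <mpi.h>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Stub standing in for the model-dependent decay-amplitude calculation
// (Wigner D functions, CG coefficients, Breit-Wigner propagators).
std::vector<cplx> decayAmplitudes(std::size_t nWaves, long event)
{
    std::vector<cplx> A(nWaves);
    for (std::size_t b = 0; b < nWaves; ++b)
        A[b] = cplx(std::cos(0.001 * event + b), std::sin(0.001 * event - b));
    return A;
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const std::size_t nWaves = 30;      // ~30-40 waves in a typical fit
    const long nMCEvents = 100000;      // stand-in for the accepted MC sample
    std::vector<cplx> partial(nWaves * nWaves, cplx(0.0, 0.0));

    // Strided share of the MC events for this node; amplitudes are never stored.
    for (long i = rank; i < nMCEvents; i += size) {
        const std::vector<cplx> A = decayAmplitudes(nWaves, i);
        for (std::size_t b = 0; b < nWaves; ++b)
            for (std::size_t bp = 0; bp < nWaves; ++bp)
                partial[b * nWaves + bp] += A[b] * std::conj(A[bp]);
    }

    // Master gathers the sum of the partial NI matrices.
    // std::complex<double> is two adjacent doubles, so reduce 2*N doubles.
    std::vector<cplx> total(nWaves * nWaves);
    MPI_Reduce(reinterpret_cast<double*>(partial.data()),
               reinterpret_cast<double*>(total.data()),
               static_cast<int>(2 * partial.size()),
               MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

Only the nWaves x nWaves partial sums cross the network at the end, so nothing like the ~300 GB of stored amplitude files from the OLD scheme is needed.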

Performance comparison (assume 10 't' bins)

OLD (MC files: 150M raw + 40M accepted; data files: 10M events):
- Calculate masses, angles, invariants, amplitudes. Store the amplitudes. Fill the normalization-integral tables.
- Total time: ~150 hours of computer time.
- One has to handle ~5,000 files and ~300 GB of disk space.
- Every time a data cut is changed, another 50 hours of computer time is required.
- Most of the time is spent doing input/output operations.

NEW (MC files: 150M raw + 40M accepted; data files: 10M events):
- Calculate masses, angles, invariants, amplitudes. Fill the normalization-integral tables.
- Total time: ~45 minutes of computer time.
- One has to handle ~15 files and ~30 MB of disk space.
- Every time a data cut is changed, another 10 minutes of CPU time is required.
- Input/output operations are reduced to the necessary minimum.

PWA fits: OLD vs NEW

OLD: Each mass bin (bin 0, bin 1, ..., bin n) is fitted separately from its own decay-amplitude files; the final parameters from bin k are used as the starting parameters for bin k+1, and the decay amplitudes have to be re-read for every bin.

NEW: The decay amplitudes for bins 0 - n are distributed over the slaves. At every iteration of the minimization routine the master sends the fitted parameters, the slaves calculate their likelihood contributions and send the results back to the master, which gathers them. MINUIT runs only on the master. (A sketch of one such iteration is shown below.)
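The sketch below shows one such iteration under assumed structure (it is not the actual fitter code; the event values, parameters and helper functions are invented): the master broadcasts the current production amplitudes, each slave evaluates the log-likelihood contribution of its local data events, and MPI_Reduce sums the contributions on the master, where MINUIT runs.

```cpp
// Sketch of one fit iteration in the NEW fitter (an assumed structure, not the
// actual code): master broadcasts the fit parameters, slaves sum ln I(tau_i)
// over their local data events, MPI_Reduce collects the total on the master.
#include <mpi.h>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdio>
#include <vector>

using cplx = std::complex<double>;

// ln I(tau) = ln | sum_b V_b * A_b(tau) |^2 for one event.
double logIntensity(const std::vector<cplx>& V, const std::vector<cplx>& A)
{
    cplx sum(0.0, 0.0);
    for (std::size_t b = 0; b < V.size(); ++b)
        sum += V[b] * A[b];
    return std::log(std::norm(sum));
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // This node's slice of the data events (decay amplitudes; values invented).
    const std::vector<std::vector<cplx>> localEvents = {
        { {0.8, 0.1}, {-0.4, 0.6} },
        { {0.2, 0.9}, { 0.7, -0.3} } };

    // In the real fitter this is driven by MINUIT on the master at every
    // iteration; here a single iteration with made-up parameters is shown.
    std::vector<cplx> V = { {1.0, 0.2}, {0.3, -0.5} };
    MPI_Bcast(reinterpret_cast<double*>(V.data()),
              static_cast<int>(2 * V.size()), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double localLnL = 0.0;
    for (const auto& A : localEvents)
        localLnL += logIntensity(V, A);

    double totalLnL = 0.0;
    MPI_Reduce(&localLnL, &totalLnL, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)   // MINUIT would add the normalization term and use -lnL
        std::printf("data term of -lnL = %f\n", -totalLnL);

    MPI_Finalize();
    return 0;
}
```

Each iteration moves only the parameter vector out and one double per node back, but this happens at every MINUIT call, which is why the next slide cautions that a slow master-slave connection can degrade the fitter's performance.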

PWA fitter features comparison

OLD - only two types of fits are possible as of now:
1. Bins are fitted independently of each other - fast, can use multiple CPUs. ~2 hours per 't' bin.
2. Fit with parameter boot-strapping - has to be done on a single machine, slow. ~30 hours per 't' bin.

NEW - the new fitter is scalable and one can do the following types of fits:
1. Bins are fitted independently of each other - fast, can use multiple CPUs. ~1-2 hours per 't' bin.
2. Fit with parameter boot-strapping - has to be done on a single machine, slow (hours per 't' bin).
3. Mass-dependent PWA, or any other type of fit that requires fitting the entire data sample at once (without dividing the data set into bins).

CAUTION: A slow network connection between master and slaves can significantly degrade the performance of the PWA fitter.

Tips, Tricks, Conclusions

Ways to achieve good performance:
1. Write parallel code using e.g. the Message Passing Interface (MPI) - a simple and easy-to-use library.
2. Avoid reading from secondary storage.
3. Compute things once and avoid redundancy.

Until now we have used only the isobar model. We also have to study other models; a lot of progress was made in the 1970s.
Try to represent the data in a model-independent fashion - study moments.
Theoretical analysis - choice-of-model issues.