A Data/Detector Characterization Pipeline (What is it and why we need one)
Soumya D. Mohanty, AEI
January 18, 2001

Outline of the talk
– Functions of a pipeline
– A walk through a candidate pipeline
– Requirements: issues
– Proposal for a plan of work

The functions of a pipeline
Why have one?
– Understanding a new feature or establishing confidence in a detection will require a fair amount of manual (human-intensive) work.
– The large data rate (main plus auxiliary channels) means that an automated tool that helps focus our attention is essential.
Definition: an automated tool that points out “interesting” segments.
– Not meant for data from the detector commissioning stage.
– Types: data/detector characterization; data preparation or conditioning.
– It may not be possible to separate the design processes for these types cleanly.
– By-products: routine, uninteresting information (data summaries) to support data mining tasks.
Open issue: what is interesting?
– An automated tool requires a precise definition of interesting features.
– Examples: a change in the PSD, transients, a change in cross-couplings, …
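To make “interesting” precise enough to automate, each criterion has to become a number compared against a threshold. The sketch below is a minimal illustration, assuming two hypothetical criteria taken from the examples above: a segment whose variance departs from the median segment variance by more than a fixed factor (a crude broadband stand-in for a change in the PSD), and a segment containing a sample more than k robust standard deviations from its median (a transient). The function names, thresholds and synthetic data are all assumptions, not part of the proposed design.

```python
import random
import statistics


def robust_sigma(samples):
    """Estimate a Gaussian standard deviation from the median absolute deviation."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    return 1.4826 * mad  # MAD-to-sigma factor for Gaussian noise


def flag_interesting(data, seg_len, power_factor=2.0, k_transient=6.0):
    """Return {segment_index: [reasons]} for segments judged interesting.

    Hypothetical criteria, for illustration only:
      'power'     - segment variance differs from the median segment variance
                    by more than power_factor (crude stand-in for a PSD change)
      'transient' - a sample lies more than k_transient robust sigmas
                    (estimated from all the data) from the segment median
    """
    segments = [data[i:i + seg_len] for i in range(0, len(data) - seg_len + 1, seg_len)]
    variances = [statistics.pvariance(seg) for seg in segments]
    ref_var = statistics.median(variances)
    sigma = robust_sigma(data)
    flags = {}
    for idx, (seg, var) in enumerate(zip(segments, variances)):
        reasons = []
        if var > power_factor * ref_var or var < ref_var / power_factor:
            reasons.append("power")
        med = statistics.median(seg)
        if max(abs(s - med) for s in seg) > k_transient * sigma:
            reasons.append("transient")
        if reasons:
            flags[idx] = reasons
    return flags


def make_demo_data(seed=1, seg_len=256, n_segs=10):
    """Synthetic data: unit Gaussian noise, one louder segment, one spike."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(seg_len * n_segs)]
    for i in range(3 * seg_len, 4 * seg_len):
        data[i] *= 3.0                 # segment 3: noise power up by a factor of 9
    data[7 * seg_len + 100] += 12.0    # segment 7: a single-sample transient
    return data


if __name__ == "__main__":
    print(flag_interesting(make_demo_data(), seg_len=256))
```

The demo data contain one louder segment (index 3) and one single-sample spike (index 7), so those two segments should be the ones flagged. Every threshold here is a free parameter, which is exactly why the definition of “interesting” remains an open issue.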

Pipeline: not just a sum of its parts
Simple example:
– A transient test is characterized without studying the effect of line noise on it, or its effect on line noise.
– A line removal tool is characterized without studying the effect of transients on it, or its effect on transients.
– When real data are passed through the line removal tool followed by the transient test, the result will differ from that of the transient test followed by line removal.
Other “cross-couplings” can exist that affect the overall performance of a pipeline.
Computational costs need not be a simple sum of the parts.
Pipeline design and characterization will involve more than the study of tools in isolation.
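The order dependence in the simple example can be reproduced with two toy stages. In this sketch (assumptions: a single line at a known frequency completing a whole number of cycles in the record, a MAD-based threshold standing in for the transient test, and synthetic data), the transient test run on the raw data estimates its noise level with the line still present, so its threshold is inflated and the spike is missed; run after line removal, the same test flags it.

```python
import math
import random
import statistics


def remove_line(data, cycles):
    """Subtract a sinusoid that completes a whole number of cycles over the record.

    Projecting onto cos/sin gives the exact least-squares amplitudes in that case.
    """
    n = len(data)
    w = 2 * math.pi * cycles / n
    a = 2 / n * sum(x * math.cos(w * i) for i, x in enumerate(data))
    b = 2 / n * sum(x * math.sin(w * i) for i, x in enumerate(data))
    return [x - a * math.cos(w * i) - b * math.sin(w * i) for i, x in enumerate(data)]


def transient_test(data, k=5.0):
    """Flag indices deviating by more than k robust (MAD-based) sigmas."""
    med = statistics.median(data)
    sigma = 1.4826 * statistics.median([abs(x - med) for x in data])
    return [i for i, x in enumerate(data) if abs(x - med) > k * sigma]


def demo(seed=2):
    rng = random.Random(seed)
    n, cycles = 1000, 50
    w = 2 * math.pi * cycles / n
    data = [5.0 * math.cos(w * i) + rng.gauss(0.0, 1.0) for i in range(n)]
    data[400] += 12.0  # transient riding on the line
    before = transient_test(data)                      # line still present
    after = transient_test(remove_line(data, cycles))  # line removed first
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("flagged before line removal:", before)
    print("flagged after line removal:", after)
```

The converse coupling also exists: the transient slightly biases the projection in `remove_line`, so characterizing either tool alone misses part of the combined behavior.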

Analyzing pipeline performance
Basic criteria: the pipeline should not make too many mistakes (false alarms); on the other hand, it should not lose interesting segments.
– Extremely reliable statistical characterization will be required.
Open issue: metrics for pipeline performance (or pipeline calibration).
– A metric must include: false alarm and detection rates, dependence on a priori modeling of the data, computational costs, …
– For a data preparation pipeline: calibrate by injecting GW signals into the input.
– For a data/detector characterization pipeline: ?
Bottom line: a lot of experience with simulated and real data is required.
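One way to see what calibrating by injection involves is a small Monte Carlo estimate of the two basic numbers, false alarm probability and detection efficiency. This is a sketch only: the detector (a maximum-amplitude threshold), the sine-Gaussian waveform standing in for an injected GW signal, and the white Gaussian noise are all hypothetical choices, and the noise model is precisely the kind of a priori modeling the metric has to account for.

```python
import math
import random


def max_abs_detector(segment, threshold):
    """Toy detection statistic: an event if any sample exceeds the threshold."""
    return max(abs(x) for x in segment) > threshold


def sine_gaussian(n, center, amp, freq, tau):
    """Hypothetical burst waveform standing in for an injected GW signal.

    freq is in cycles per sample and tau in samples.
    """
    return [amp * math.exp(-((i - center) / tau) ** 2)
            * math.sin(2 * math.pi * freq * (i - center)) for i in range(n)]


def calibrate(detector, n_trials, seg_len, amp, threshold, seed=0):
    """Monte Carlo estimates of false alarm probability and detection efficiency."""
    rng = random.Random(seed)
    signal = sine_gaussian(seg_len, seg_len // 2, amp, freq=0.1, tau=8.0)
    false_alarms = detections = 0
    for _ in range(n_trials):
        # Assumed noise model: white, unit-variance Gaussian.
        noise = [rng.gauss(0.0, 1.0) for _ in range(seg_len)]
        false_alarms += int(detector(noise, threshold))
        fresh = [rng.gauss(0.0, 1.0) for _ in range(seg_len)]
        injected = [x + s for x, s in zip(fresh, signal)]
        detections += int(detector(injected, threshold))
    return false_alarms / n_trials, detections / n_trials


if __name__ == "__main__":
    fa, eff = calibrate(max_abs_detector, n_trials=200, seg_len=256, amp=10.0, threshold=4.5)
    print(f"false alarm probability ~ {fa:.3f}, detection efficiency ~ {eff:.3f}")
```

Sweeping `amp` traces out an efficiency curve, and sweeping `threshold` trades false alarms against detections; with real data the noise-only trials would have to come from the data themselves rather than from an assumed Gaussian model.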

A candidate pipeline
Design status: at the stage of a blueprint that can be implemented.
– Several new tools that need to be developed have been identified (e.g., a line removal method that is unaffected by transients is needed).
– The blueprint is concrete enough to begin computational cost studies and statistical characterization studies.
Origins
– The word “pipeline” has been used on several occasions (e.g., the LSC Data Analysis White Paper), but this is the first concrete design.
– 1999: SDM was commissioned to design one as part of the 40m/TAMA coincidence analysis project.
Important: a pipeline will affect the planning of other data analysis components.
– Examples: software/hardware environment, user interfaces, a sophisticated database or simple sequential files, interfaces to the DAQ, …
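The talk does not say how a line removal method unaffected by transients would work. One common approach, shown here as an assumption rather than the planned tool, is iterative outlier rejection: fit the line by least squares, exclude samples whose residuals exceed k robust standard deviations, and refit. The function names, the known line frequency and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_line(data, w, use):
    """Least-squares fit of a*cos(w n) + b*sin(w n) using only samples with use[n] True."""
    scc = sss = scs = sxc = sxs = 0.0
    for n, (x, keep) in enumerate(zip(data, use)):
        if not keep:
            continue
        c, s = math.cos(w * n), math.sin(w * n)
        scc += c * c
        sss += s * s
        scs += c * s
        sxc += x * c
        sxs += x * s
    # Solve the 2x2 normal equations [[scc, scs], [scs, sss]] @ [a, b] = [sxc, sxs].
    det = scc * sss - scs * scs
    a = (sxc * sss - sxs * scs) / det
    b = (sxs * scc - sxc * scs) / det
    return a, b


def robust_line_fit(data, cycles, k=5.0, iterations=3):
    """Fit the line, drop samples with large residuals (MAD-based), and refit."""
    n = len(data)
    w = 2 * math.pi * cycles / n
    use = [True] * n
    for _ in range(iterations):
        a, b = fit_line(data, w, use)
        resid = [x - a * math.cos(w * i) - b * math.sin(w * i) for i, x in enumerate(data)]
        med = statistics.median(resid)
        sigma = 1.4826 * statistics.median([abs(r - med) for r in resid])
        use = [abs(r - med) <= k * sigma for r in resid]
    return fit_line(data, w, use)


def demo(seed=3):
    rng = random.Random(seed)
    n, cycles = 1000, 50
    w = 2 * math.pi * cycles / n
    data = [5.0 * math.cos(w * i) + rng.gauss(0.0, 1.0) for i in range(n)]
    data[0] += 500.0  # a loud transient where cos(w * 0) = 1
    naive = fit_line(data, w, [True] * n)
    robust = robust_line_fit(data, cycles)
    return naive, robust


if __name__ == "__main__":
    print("naive (a, b):", demo()[0])
    print("robust (a, b):", demo()[1])
```

With the spike placed where the cosine term peaks, the naive fit overestimates the cosine amplitude by about 2 × 500 / 1000 = 1, while the robust fit stays close to the true value of 5.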

Data/Detector Characterization Pipeline

Requirements: issues
Computing
– Should work online.
– Memory requirements might be non-trivial if database access overheads turn out to be large.
Implementation language and environment
– Within LDAS (adapted to GEO)? Language: C++
– TRIANA? Language: Java
– DMT? VEGA? Language: C++
Database
– Not an issue confined to this pipeline alone.
– The need depends on what kinds of data mining tasks will be required.
– Examples: (1) collect data containing a particular type of transient; (2) store information about new types of features.
Others
– Many ideas and guidelines from users are required for the design phase.
– The code writing and testing phase will be manpower intensive.

Proposal for a plan of work (fastest)
– Almost all components are available in MATLAB.
– Use sequential files instead of a relational database.
– Implement as a large MATLAB program.
– Come up with some performance metrics.
– Test against simulated and some real data.
– If there is a coincidence run with LIGO, aim to produce X hours of characterized data using this MATLAB code.
– In the meantime, work on related issues and requirements definition.
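Sequential files can still support the data mining examples listed under Requirements, as long as each record is self-describing. A minimal sketch, assuming a JSON-lines format and hypothetical field names (`gps_start`, `duration`, `features`), and written in Python rather than the MATLAB proposed in the plan: one summary record is appended per segment, and a query such as “collect data with a particular type of transient” becomes a linear scan.

```python
import io
import json


def append_summary(stream, gps_start, duration, features):
    """Append one segment summary as a single JSON line (hypothetical record layout)."""
    record = {"gps_start": gps_start, "duration": duration, "features": features}
    stream.write(json.dumps(record) + "\n")


def segments_with(stream, feature_type):
    """Scan the file sequentially; yield start times of segments with that feature type."""
    for line in stream:
        record = json.loads(line)
        if any(f["type"] == feature_type for f in record["features"]):
            yield record["gps_start"]


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for a real file on disk
    append_summary(buf, 100, 16, [{"type": "transient", "snr": 8.2}])
    append_summary(buf, 116, 16, [])
    append_summary(buf, 132, 16, [{"type": "line", "freq_hz": 50.0}, {"type": "transient", "snr": 5.1}])
    buf.seek(0)
    print(list(segments_with(buf, "transient")))  # prints [100, 132]
```

Because each feature is a free-form dictionary, new feature types can be stored without changing a schema; the cost is that every query reads the whole file, which is the trade-off against a relational database.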

Conclusions
– The large amount of data makes a pipeline necessary to direct our attention to where it is really required.
– Pipeline design and characterization require more than listing tools and studying them in isolation.
– Designing a pipeline can identify missing features.
– A concrete design now exists.
– Several candidate pipelines must be generated and compared.
– What is interesting? Guidelines, ideas and experience with real data are required to evolve an answer.