Published by Kyle Walsh. Modified over 11 years ago.
1
December 5, 2012, MASTODONS meeting
Laboratoire de l'Accélérateur Linéaire (LAL) / CNRS & Université Paris Sud
Laboratoire de Recherche en Informatique (LRI) / CNRS & INRIA & Université Paris Sud
Laboratoire de l'Informatique du Parallélisme (LIP) / CNRS & INRIA & ENS Lyon & UCB Lyon
PI: Balázs Kégl (LAL), presented by Cécile Germain (LRI)
DeePhy (Data in physics): Large-scale data storage, data management, and data analysis for next-generation particle physics experiments
2
Balázs is at NIPS
3
Cécile Germain / LRI, MASTODONS DeePhy
The collaboration
[Diagram: LAL (AppStat, Auger, ATLAS, ILC/CALICE, Service Informatique), LRI (TAO) and LIP (AVALON), grouped by role: CompSci / Statistics, Experimental Physics, Engineering support]
4
The participants
Laboratoire de l'Accélérateur Linéaire (LAL):
- Rémi Bardenet (PhD student, AppStat)
- Djalel Benbouzid (PhD student, AppStat)
- François-David Collin (IR CNRS, SI)
- Laurent Duflot (CR CNRS, ATLAS)
- Diego Garcia Gamez (postdoc, Auger)
- Michel Jouvin (IR CNRS, SI)
- Balázs Kégl (CR CNRS, AppStat & Auger)
- Oleg Lodygensky (IR CNRS, SI)
- Roman Poeschl (DR CNRS, ILC)
- David Rousseau (DR CNRS, ATLAS)
Laboratoire de Recherche en Informatique (LRI):
- Cécile Germain (Professor, UPSud, TAO)
- Tristan Glatard (CR CNRS, Laboratoire CREATIS)
- Michèle Sebag (DR CNRS, TAO)
Laboratoire de l'Informatique du Parallélisme (LIP):
- Simon Delamare (IR CNRS)
- Gilles Fedak (CR INRIA)
- Laurent Lefèvre (CR INRIA)
5
Projects
- ANR Siminole: 2010-2014, LAL/LRI/Telecom ParisTech. Large-scale simulation-based probabilistic inference, optimization, and discriminative learning with applications in experimental physics
- ANR MapReduce: ??
- MRM Grille Paris Sud: 2010-2014, LAL/LRI
- FP7 EDGI/DEGISCO: ??
...
6
Meetings
- Regular (phone) meetings between the PIs
- Project meeting, November 23
- Clouds pour le Calcul Scientifique (Clouds for Scientific Computing): November 27-28, 2012, LAL, Orsay
- The First International Workshop on BigData in Science: Systems, Infrastructures and Applications (BIGDATA'2013): October 3-4, 2013, ICPP, Lyon, France
7
Budget 2012
[Table: budget per laboratory, columns LAL, LRI, LIP]
8
Mission of AppStat@LAL
- Motivate fundamental research by real applications
- Bring state-of-the-art analysis techniques to experimental physics
[Diagram: Computational Statistics and High-energy physics, linked by data analysis methodology, real data, and motivation]
Where we are in 2012
9
Triggers and lean classifiers
D. Benbouzid, R. Busa-Fekete, and B. Kégl, "Fast classification using sparse decision DAGs," International Conference on Machine Learning (ICML), 2012.
[Images: the telescope image; the JEM-EUSO telescope on the ISS]
- Trigger = fast classifier: signal vs. background.
- Boosting works well but produces slow classifiers.
- We designed an algorithm that builds Markov decision DAGs (MDDAG) using reinforcement learning.
- MDDAG can be reused in any problem with test-time constraints (object detection, web page ranking) and can replace classical cascade designs.
- It leads to interesting research questions (sparsity, representation, deep learning).
Where we are in 2012
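The test-time saving that motivates MDDAG can be illustrated with a much simpler rule. The sketch below is not MDDAG, which learns skip/evaluate/quit decisions with reinforcement learning. It is an exact early-exit rule for a boosted ensemble of decision stumps, stored in a hypothetical `(alpha, feature, threshold, sign)` format. Evaluation stops as soon as the remaining total weight can no longer flip the sign of the partial score.

```python
from typing import Sequence

# Hypothetical ensemble format: (alpha, feature index, threshold, sign) per stump.
Stump = tuple[float, int, float, int]


def stump_vote(stump: Stump, x: Sequence[float]) -> float:
    """Weighted vote of one decision stump: +alpha or -alpha."""
    alpha, j, thr, sign = stump
    return alpha * (sign if x[j] > thr else -sign)


def full_predict(ensemble: list[Stump], x: Sequence[float]) -> int:
    """Classical boosting decision: sign of the sum of all weighted votes."""
    return 1 if sum(stump_vote(s, x) for s in ensemble) > 0 else -1


def early_exit_predict(ensemble: list[Stump], x: Sequence[float]) -> tuple[int, int]:
    """Same decision as full_predict, but stop once the remaining total weight
    cannot change the sign of the partial score.
    Returns (label, number of stumps evaluated)."""
    remaining = sum(alpha for alpha, _, _, _ in ensemble)
    score = 0.0
    for k, stump in enumerate(ensemble, start=1):
        score += stump_vote(stump, x)
        remaining -= stump[0]
        if abs(score) > remaining:
            return (1 if score > 0 else -1), k
    return (1 if score > 0 else -1), len(ensemble)


# Toy ensemble (hypothetical), sorted by decreasing alpha so early exits are more likely.
ENSEMBLE: list[Stump] = [(1.0, 0, 0.5, 1), (0.8, 1, 0.5, 1), (0.5, 0, 0.2, 1), (0.25, 1, 0.8, 1)]
```

For example, `early_exit_predict(ENSEMBLE, [0.9, 0.9])` returns `(1, 2)`: after two stumps the partial score 1.8 exceeds the remaining weight 0.75, so the last two stumps are never evaluated. A learned policy such as MDDAG can go further by also skipping stumps, at the price of no longer matching the full ensemble exactly.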
10
Adaptive Metropolis for mixture signals
R. Bardenet, O. Cappé, G. Fort, and B. Kégl, "Adaptive Metropolis with online relabeling," International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[Images: the Auger tank signal; cosmic rays and the Pierre Auger Observatory]
- Classical adaptive Metropolis was suboptimal because of symmetries in the posterior (label switching).
- We designed adaptive Metropolis with online relabeling (AMOR), which works well on our problem.
- AMOR extends to any problem involving inference on parametrized mixture signals.
Where we are in 2012
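A minimal sketch of the AMOR idea follows. It assumes a target that is invariant to permuting k components of d parameters each. After every Metropolis step, the state is relabeled to the permutation closest, in Mahalanobis distance, to the running mean. The running mean and covariance that drive the proposal are then updated with the relabeled point. The brute-force search over all k! permutations, the 1/(t+1) step size, and the toy two-mode target are simplifying assumptions. This is not the authors' implementation.

```python
import itertools

import numpy as np


def relabel(x, mean, cov_inv, k, d):
    """Permute the k blocks (size d) of x to the labeling closest to the
    running mean in Mahalanobis distance (brute force over k! permutations)."""
    blocks = x.reshape(k, d)
    best, best_dist = x, np.inf
    for perm in itertools.permutations(range(k)):
        cand = blocks[list(perm)].ravel()
        diff = cand - mean
        dist = diff @ cov_inv @ diff
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


def amor(log_target, x0, k, d, n_iter=5000, seed=0):
    """Random-walk adaptive Metropolis with an online relabeling step.
    Assumes log_target is invariant to permuting the k components."""
    rng = np.random.default_rng(seed)
    dim = x0.size
    jitter = 1e-6 * np.eye(dim)
    scale = 2.38 ** 2 / dim                   # classical adaptive Metropolis scaling
    x = x0.astype(float)
    lp = log_target(x)
    mean, cov = x.copy(), np.eye(dim)
    samples = np.empty((n_iter, dim))
    for t in range(1, n_iter + 1):
        prop = rng.multivariate_normal(x, scale * cov + jitter)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        # Relabel; lp is unchanged because the target is permutation invariant.
        x = relabel(x, mean, np.linalg.inv(cov + jitter), k, d)
        gamma = 1.0 / (t + 1)                 # decreasing adaptation step
        diff = x - mean
        mean = mean + gamma * diff
        cov = cov + gamma * (np.outer(diff, diff) - cov)
        samples[t - 1] = x
    return samples


def log_target(x):
    """Hypothetical symmetric toy target: two unit Gaussians at (0, 4) and (4, 0)."""
    a = -0.5 * np.sum((x - np.array([0.0, 4.0])) ** 2)
    b = -0.5 * np.sum((x - np.array([4.0, 0.0])) ** 2)
    return np.logaddexp(a, b)


samples = amor(log_target, np.array([0.0, 4.0]), k=2, d=1)
```

Without the relabeling step, a chain that visits both modes would average them into a meaningless mean near (2, 2). With it, the samples stay in one consistent labeling.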
11
The INRIA Saclay project-team TAO
- 5 senior researchers or faculty members
- 9 junior members
- 25 PhD students, postdocs, and engineers
- Computer Go: 3 gold medals, 2010
- CMA-ES: from the theory of stochastic optimization to applications with EADS, IFP, PSA, SIMINOLE
- Autonomics + e-Science: Grid Observatory, Green Computing Observatory with EGI
12
The Grid Observatory
Digital curation of the behavioural data of the EGI grid: observe and publish (www.grid-observatory.org).
Specific challenges for analysis:
- Unusual, extreme statistics: which metrics? Are our systems stationary? (In fact, no.)
- Optimisation, autonomics.
- How to build the knowledge? There is no gold standard, and experts are too rare.
- Infer latent causes, e.g. analyze traces as text files, and more.
- Build credible benchmarks and models, e.g. piecewise AR instead of long-range dependence.
[Diagram: autonomic manager loop (Monitor, Analyze, Plan, Execute, Knowledge) attached to a managed element through ES]
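A minimal sketch of the piecewise-AR idea follows. It assumes an AR(1) model per segment and breakpoints that are already known; detecting the breakpoints is a separate problem and is not shown. The function names and the synthetic series are hypothetical, not Grid Observatory tools. Each segment gets its own least-squares fit of x[t] = c + phi * x[t-1] + noise.

```python
import numpy as np


def fit_ar1(segment):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise on one segment.
    Returns (c, phi, residual standard deviation)."""
    y = segment[1:]
    X = np.column_stack([np.ones(len(y)), segment[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ np.array([c, phi])
    return c, phi, resid.std()


def fit_piecewise_ar1(series, breakpoints):
    """Fit an independent AR(1) model on each segment between known breakpoints."""
    edges = [0, *breakpoints, len(series)]
    return [fit_ar1(series[a:b]) for a, b in zip(edges[:-1], edges[1:])]


def simulate(phis, seg_len, seed=0):
    """Synthetic non-stationary series: consecutive AR(1) segments with unit noise."""
    rng = np.random.default_rng(seed)
    x = [0.0]
    for phi in phis:
        for _ in range(seg_len):
            x.append(phi * x[-1] + rng.standard_normal())
    return np.array(x[1:])


series = simulate([0.2, 0.9], seg_len=2000)
params = fit_piecewise_ar1(series, breakpoints=[2000])
```

Each segment is short-memory on its own. The switch between a weakly correlated regime and a strongly correlated one is what a single global model might misread as long-range dependence.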
13
Failure: it's not a bug, it's a feature
[D. Feng, C. Germain, and T. Glatard, "Distributed Monitoring with Collaborative Prediction," 12th IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12), 2012]
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
[Diagram: services BDII, LFC, SE, VOMS, CE on nodes N1-N5 behind a firewall; service/hardware dependencies]
Dependency matrix (probes × services):
          ce-hd  bdii  lfc  voms  …
lcg-cr      1     1     1    1    …
nmap        1     1     0    0    …
srm-ls      1     0     1    1    …
Collaborative prediction is more efficient than detection/diagnosis.
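The dependency-matrix view can be sketched as follows. The sketch assumes that a 1 in the matrix means the probe relies on the service. It also assumes a single-fault, boolean-OR model chosen for illustration: one failed service makes every dependent probe fail. On this matrix the naive diagnosis function cannot separate `lfc` from `voms`. This illustrates why inverting probe results into a diagnosis can be ambiguous.

```python
# Dependency matrix from the slide: probe -> services it relies on (the 1 entries).
DEPENDENCY: dict[str, set[str]] = {
    "lcg-cr": {"ce-hd", "bdii", "lfc", "voms"},
    "nmap": {"ce-hd", "bdii"},
    "srm-ls": {"ce-hd", "lfc", "voms"},
}


def predicted_failing_probes(failed_services: set[str]) -> set[str]:
    """A probe is predicted to fail if any service it depends on has failed."""
    return {probe for probe, deps in DEPENDENCY.items() if deps & failed_services}


def candidate_culprits(failing_probes: set[str]) -> set[str]:
    """Naive single-fault diagnosis: services used by every failing probe
    and by no passing probe."""
    if not failing_probes:
        return set()
    passing = set(DEPENDENCY) - failing_probes
    shared = set.intersection(*(DEPENDENCY[p] for p in failing_probes))
    used_by_passing = set().union(*(DEPENDENCY[p] for p in passing))
    return shared - used_by_passing
```

For example, if `lfc` fails, `lcg-cr` and `srm-ls` fail. Given only those two failing probes, the diagnosis returns both `lfc` and `voms` as suspects.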
14
Failure: it's not a bug, it's a feature
[D. Feng, C. Germain, and T. Glatard, "Distributed Monitoring with Collaborative Prediction," 12th IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12), 2012]
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
[Diagram: services BDII, LFC, SE, VOMS, CE on nodes N1-N5 behind a firewall; probe matrix, e.g. probe lcg-cp]
Collaborative prediction is more efficient than detection/diagnosis.
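Collaborative prediction treats the probe matrix as a partially observed matrix. It fills in the missing outcomes from a low-rank model. The sketch below uses plain alternating least squares with ridge regularization on ±1 outcomes, and assumes that rows are nodes and columns are probes. It is a generic matrix-completion stand-in, not necessarily the method of Feng et al. The toy matrix is hypothetical.

```python
import numpy as np


def complete(M, mask, rank=2, lam=0.1, n_iter=50, seed=0):
    """Low-rank completion of M (observed where mask is True) by alternating
    ridge-regularized least squares on the row and column factors."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):                 # update node factors from observed probes
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ M[i, mask[i]])
        for j in range(m):                 # update probe factors from observed nodes
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ M[mask[:, j], j])
    return U @ V.T


# Toy outcomes (+1 success, -1 failure): nodes 3-5 cannot run probes 2 and 3.
M = np.array([[1, 1, 1, 1]] * 3 + [[1, 1, -1, -1]] * 3, dtype=float)
mask = np.ones_like(M, dtype=bool)
mask[1, 2] = False                         # unobserved: node 1, probe 2
mask[4, 3] = False                         # unobserved: node 4, probe 3
pred = complete(M, mask)
```

The sign of `pred` at the unobserved entries is the predicted probe outcome. It is inferred from nodes and probes that behave alike, without running the missing probes.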
15
BigData and BigComputation: scientific challenges for the next 4-5 years
The LHC:
- largest: about 27 km circumference
- highest energy: 7 TeV
- coldest: 1.9 K
- emptiest: 10⁻¹³ atm
The EGEE/EGI grid:
- largest: 200K CPUs, 15 PB
- most distributed: 250 sites
- busiest: 1000 jobs/day
The ATLAS collaboration:
- largest: 3000 scientists
- widest: 38 countries and 174 institutions
16
BigData and BigComputation: scientific challenges for the next 4-5 years
Opportunities
- We can apply computationally expensive analysis techniques we could not have dreamed of ten years ago.
- The Google cat: a deep learning system running on 16K cores for three days, watching 10M YouTube stills [Le et al., ICML 2012].
- Large-scale parallel Markov chain Monte Carlo (MCMC), or likelihood-free, simulation-based approximate Bayesian computation (ABC) [Tavaré et al., 1997].
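Likelihood-free, simulation-based inference can be sketched with the simplest ABC scheme, rejection ABC. Draw parameters from the prior, simulate data, and keep only the draws whose simulated data lie within a tolerance of the observations under a chosen summary statistic. The toy problem, the sample-mean summary statistic, and the tolerance below are illustrative assumptions. The loop iterations are independent, which is why the method scales naturally on large computing resources.

```python
import numpy as np


def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated data fall within eps
    of the observations. No likelihood evaluation is needed."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):                  # independent draws: trivially parallel
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian from 50 points.
data_rng = np.random.default_rng(42)
observed = data_rng.normal(2.0, 1.0, size=50)
posterior = abc_rejection(
    observed,
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=50),
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    distance=lambda a, b: abs(a.mean() - b.mean()),   # summary statistic: sample mean
    eps=0.1,
    n_draws=20000,
)
```

Shrinking `eps` makes the accepted sample a better approximation of the posterior, at the cost of more simulations per accepted draw.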
17
BigData and BigComputation: scientific challenges for the next 4-5 years
Challenges
- Large-scale machine learning is a new paradigm:
  - statistical overfitting is no longer the main danger; underfitting is;
  - optimization time interferes with the approximation-estimation trade-off, so mediocre optimization algorithms such as stochastic gradient descent become competitive [Bottou and Bousquet, 2008, 2011].
- The algorithmic toolbox shrinks considerably: it is often better to use a large data set and a suboptimal but fast algorithm than a small data set and an optimal but expensive technique.
- Data management and workflow become crucial, often more important than optimizing the single-thread technique.
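Here is a minimal sketch of the kind of "mediocre" but cheap optimizer the slide refers to: plain stochastic gradient descent on the logistic loss. It processes one example per step with a constant learning rate. The synthetic data, learning rate, and number of epochs are illustrative assumptions.

```python
import numpy as np


def sgd_logistic(X, y, lr=0.1, epochs=5, seed=0):
    """Plain SGD on the logistic loss, one example per step; labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # predicted P(y = 1)
            g = p - y[i]                                  # d(loss)/d(score)
            w -= lr * g * X[i]
            b -= lr * g
    return w, b


rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = sgd_logistic(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
```

Each step touches a single example, so the cost per update does not grow with the data set size. This is the property that makes SGD competitive when data, not optimization accuracy, is the bottleneck.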
18
Data as a communication tool between Physics and CS
- A well-designed challenge can attract a large number of professional data miners.
- It is tricky: data cleaning and formatting, defining a standardized problem, an evaluation metric, scripting, a web interface, marketing. It requires the collaboration of physicists, computer scientists, and engineers.
- The prize is worth it: an objective evaluation of a large number of techniques on a given problem.
What we will do in 2013
19
The Higgs boson challenge
- Observation at more than 5 sigma of a particle whose properties are compatible with those of the Higgs boson (Phys. Lett. B 716 (2012) 1-29).
- Spectrum filtered from a few billion recorded proton-proton collisions.
- Poor resolution.
- The signal peak sits next to a large, broad background peak.
- The task now is to establish that the new particle really is the expected Higgs boson by detecting it in other, more difficult channels.
[Figure: ATLAS spectrum, Higgs signal]
What we will do in 2013
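To make "more than 5 sigma" concrete, the significance can be converted to a p-value. Under the one-sided convention commonly used for discovery claims, the p-value is the probability that a standard normal variable exceeds 5. The helper name below is hypothetical.

```python
import math


def one_sided_p_value(z: float) -> float:
    """Tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p5 = one_sided_p_value(5.0)   # about 2.87e-7, roughly 1 in 3.5 million
```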
20
The Higgs boson challenge
Challenge 1: estimating the mass of the Higgs boson candidate as precisely as possible (at the moment: 20%) within a fixed CPU time (0.1 s per event)
- Monte Carlo integration in 5-6 dimensions with constraints
- Adaptive importance sampling, adaptive MCMC, etc.
What we will do in 2013
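Adaptive importance sampling, one of the techniques listed for Challenge 1, can be sketched as follows. Draw from a Gaussian proposal, weight each point by target/proposal, and refit the proposal's mean and diagonal variance to the weighted sample. This is a generic sketch in the style of population Monte Carlo. It uses a toy 5-dimensional Gaussian integrand with the known answer (2π)^(5/2). It does not model the physics constraints or the 0.1 s per-event budget.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, one value per row of x."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)


def adaptive_is(log_f, dim, n=5000, n_iter=5, seed=0):
    """Estimate the integral of exp(log_f) by importance sampling, refitting
    the Gaussian proposal to the weighted sample after each iteration."""
    rng = np.random.default_rng(seed)
    mean, var = np.zeros(dim), np.full(dim, 9.0)      # broad initial proposal
    z_hat = 0.0
    for _ in range(n_iter):
        x = mean + np.sqrt(var) * rng.standard_normal((n, dim))
        log_w = log_f(x) - log_gauss_diag(x, mean, var)
        w = np.exp(log_w - log_w.max())               # stabilized weights
        z_hat = np.exp(log_w.max()) * w.mean()        # integral estimate
        wn = w / w.sum()
        mean = wn @ x                                 # weighted moment refit
        var = wn @ (x - mean) ** 2 + 1e-6
    return z_hat


# Toy integrand (hypothetical): unnormalized unit Gaussian centred at (1, ..., 1).
z = adaptive_is(lambda x: -0.5 * np.sum((x - 1.0) ** 2, axis=1), dim=5)
```

The first iterations spend samples finding where the integrand lives. Later iterations, with a well-matched proposal, give nearly constant weights and therefore low-variance estimates.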
21
The Higgs boson challenge
Challenge 2: detecting the Higgs boson
- Standard classification problem with ~100 features and two classes: signal vs. background
- Features are constructed manually
- AdaBoost, SVM, neural networks, etc.
Challenge 3: unsupervised feature extraction
- Two-class classification on raw input: a variable-size list of particles
- Targeting the deep learning community
What we will do in 2013
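AdaBoost is one of the methods listed for Challenge 2. A compact sketch of discrete AdaBoost with decision stumps follows. The two-feature toy data set is hypothetical and stands in for the ~100 hand-built features. The exhaustive stump search is fine for a sketch but too slow for real data volumes.

```python
import numpy as np


def fit_stump(X, y, w):
    """Exhaustive search for the stump (feature, threshold, sign) with the
    smallest weighted error; labels y in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost: reweight examples toward the ones the stumps get wrong."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)        # stump weight
        pred = np.where(X[:, j] > thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)               # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble


def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)


rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
y = np.where((X[:, 0] > 0.5) & (X[:, 1] > 0.3), 1, -1)   # toy "signal" region
model = adaboost(X, y)
train_accuracy = np.mean(predict(model, X) == y)
```

The resulting ensemble is a weighted vote over all stumps. Evaluating every stump on every event is the test-time cost that the trigger work earlier in the deck aims to reduce.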
22
1. Scientific challenges of the project over 4 or 5 years, and any refinement of these challenges since the projects were submitted
2. Project organization, collaborative working arrangements, schedule of the technical meetings in 2012
3. First scientific results obtained or, where applicable, identification of the first research directions explored, and positioning with respect to the state of the art
4. Scientific objectives for 2013, giving a more detailed view than in point 1
5. 1 slide on the use of the budget allocated to the project