December 5, 2012, MASTODONS meeting. Laboratoire de l'Accélérateur Linéaire (LAL) / CNRS & Université Paris Sud, Laboratoire de la Recherche en Informatique.

Presentation transcript:

December 5, 2012, MASTODONS meeting
DeePhy. Data in physics: large-scale data storage, data management, and data analysis for next-generation particle physics experiments
Laboratoire de l'Accélérateur Linéaire (LAL) / CNRS & Université Paris Sud
Laboratoire de la Recherche en Informatique (LRI) / CNRS & INRIA & Université Paris Sud
Laboratoire de l'Informatique du Parallélisme (LIP) / CNRS & INRIA & ENS Lyon & UCB Lyon
PI: Balázs Kégl (LAL), presented by Cécile Germain (LRI)

Balázs is at NIPS

Cécile Germain / LRI, MASTODONS DeePhy
The collaboration
[Diagram labels: LAL, LRI, LIP; AppStat, Auger, TAO, ATLAS, ILC/Calice, Service Informatique, AVALON; CompSci / Statistics, Experimental Physics, Engineering support]

The participants
Laboratoire de l'Accélérateur Linéaire (LAL): Rémi Bardenet (PhD student, AppStat), Djalel Benbouzid (PhD student, AppStat), François-David Collin (IR CNRS, SI), Laurent Duflot (CR CNRS, ATLAS), Diego Garcia Gamez (postdoc, Auger), Michel Jouvin (IR CNRS, SI), Balázs Kégl (CR CNRS, AppStat & Auger), Oleg Lodygensky (IR CNRS, SI), Roman Poeschl (DR CNRS, ILC), David Rousseau (DR CNRS, ATLAS)
Laboratoire de la Recherche en Informatique (LRI): Cécile Germain (Professor, UPSud, TAO), Tristan Glatard (CR CNRS, Laboratoire CREATIS), Michèle Sebag (DR CNRS, TAO)
Laboratoire de l'Informatique du Parallélisme (LIP): Simon Delamare (IR CNRS), Gilles Fedak (CR INRIA), Laurent Lefèvre (CR INRIA)

Projects
ANR Siminole (LAL/LRI/Telecom ParisTech): large-scale simulation-based probabilistic inference, optimization, and discriminative learning with applications in experimental physics
ANR MapReduce: ??
MRM Grille Paris Sud (LAL/LRI)
FP7 EDGI/DEGISCO: ??...

Meetings
Regular (phone) meetings between the PIs
Project meeting: November 23
Clouds pour le Calcul Scientifique: November 27-28, 2012
The First International Workshop on BigData in Science: Systems, Infrastructures and Applications (BIGDATA'2013): October 3-4, 2013, ICPP, Lyon, France

Budget 2012: LAL, LRI, LIP

Mission (Where we are in 2012)
Motivate fundamental research by real applications.
Bring state-of-the-art analysis techniques to experimental physics.
[Diagram labels: Computational Statistics, High-energy physics, Data analysis methodology, Real data, Motivation]

Triggers and lean classifiers (Where we are in 2012)
D. Benbouzid, R. Busa-Fekete, and B. Kégl, Fast classification using sparse decision DAGs, International Conference on Machine Learning (ICML), 2012
[Figures: the telescope image; the JEM EUSO telescope on the ISS]
Trigger = fast classifier: signal vs. background.
Boosting works well but produces slow classifiers.
We designed a Markov decision graph (MDDAG) algorithm using reinforcement learning.
MDDAG can be reused in any problem constrained at test time (object detection, web page ranking) and can replace classical cascade designs.
It leads to interesting research questions (sparsity, representation, deep learning).
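
To illustrate why a boosted classifier is slow at test time and how evaluation can stop early, here is a minimal sketch. It is not the MDDAG algorithm from the cited paper: it is a much simpler early-stopping rule over weighted decision stumps, and the function name, the stump format (feature index, cut, weight), and the threshold are assumptions made for illustration.

```python
def early_exit_score(x, stumps, threshold=0.0):
    """Evaluate weighted stumps in order and stop once the decision is fixed.

    stumps: list of (feature_index, cut, weight) with weight > 0 (hypothetical
    format). Each stump votes +weight if x[feature_index] > cut, else -weight.
    Returns (label, number_of_stumps_evaluated).
    """
    remaining = sum(weight for _, _, weight in stumps)  # total vote still to come
    score = 0.0
    used = 0
    for feature, cut, weight in stumps:
        score += weight if x[feature] > cut else -weight
        remaining -= weight
        used += 1
        # The unevaluated stumps can move the score by at most `remaining`,
        # so the sign of (score - threshold) can no longer change.
        if abs(score - threshold) > remaining:
            break
    label = 1 if score > threshold else -1
    return label, used


stumps = [(0, 0.5, 2.0), (1, 0.5, 0.5), (2, 0.5, 0.5)]
print(early_exit_score([1.0, 0.0, 0.0], stumps))  # (1, 1): one stump suffices
```

The label is the same as with full evaluation; only the number of stumps evaluated changes, which is the quantity a trigger has to keep small.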

Adaptive Metropolis for mixture signals (Where we are in 2012)
R. Bardenet, O. Cappé, G. Fort, and B. Kégl, Adaptive Metropolis with online relabeling, in International Conference on Artificial Intelligence and Statistics (AISTATS),
[Figures: the Auger tank signal; cosmic rays and the Pierre Auger observatory]
Classical adaptive Metropolis was suboptimal due to symmetries (label switching).
We designed adaptive Metropolis with online relabeling (AMOR), which works well on our problem.
AMOR extends to any problem involving inference on parametrized mixture signals.
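
For readers unfamiliar with the baseline, the following is a minimal one-dimensional sketch of classical adaptive Metropolis, in which the proposal width follows the running variance of the chain. It does not include the online relabeling step that defines AMOR, and the function name, parameter names, and default values are assumptions chosen for illustration.

```python
import math
import random


def adaptive_metropolis(log_target, x0, n_steps, initial_step=1.0,
                        adapt_after=100, seed=0):
    """1-D random-walk Metropolis whose proposal width adapts to the chain."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    mean, m2, n = x0, 0.0, 1  # Welford running mean and sum of squared deviations
    chain = []
    for _ in range(n_steps):
        if n > adapt_after:
            # Usual 2.38^2 / d scaling of the empirical variance, here with d = 1;
            # the small constant keeps the step strictly positive.
            step = math.sqrt(2.38 ** 2 * m2 / (n - 1) + 1e-6)
        else:
            step = initial_step  # fixed width during the initial period
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        # Update the running statistics with the current state.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        chain.append(x)
    return chain


# Example: sample a unit-variance Gaussian centred at 3.
chain = adaptive_metropolis(lambda x: -0.5 * (x - 3.0) ** 2, x0=0.0, n_steps=20000)
```

On a mixture target with exchangeable components, the running statistics of such a chain blend the symmetric modes together, which is the label-switching problem the slide refers to.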

The TAO project-team at INRIA-Saclay
5 senior researchers or faculty members, 9 juniors, 25 PhD students, post-docs and engineers
Computer-Go: 3 gold medals, 2010
CMA-ES: from the theory of stochastic optimization to applications with EADS, IFP, PSA, SIMINOLE
Autonomics + e-Science: Grid Observatory, Green Computing Observatory with EGI

The Grid Observatory
Digital curation of the behavioural data of the EGI grid: observe and publish.
Specific challenges for analysis:
Unusual, extreme statistics: which metrics? Are our systems stationary? (In fact, no.)
Optimisation, autonomics.
How to build the knowledge? There is no gold standard, and experts are too few.
Infer latent causes, e.g. analyze traces as text files, and more.
Build credible benchmarks and models, e.g. piecewise AR instead of long-range dependence.
[Diagram: Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) and Managed Element, each labeled ES]

Failure: it's not a bug, it's a feature
[D. Feng, C. Germain and T. Glatard. Distributed Monitoring with Collaborative Prediction. In "12th IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12)", 2012]
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable"
[Diagram: grid services (BDII, LFC, SE, VOMS, CE) and nodes N1 to N5 behind a firewall; axis label: Service/hardware]
Dependency matrix:
          ce-hd  bdii  lfc  voms  …
lcg-cr      1     1     1    1
nmap        1     1     0    0
srm-ls      1     0     1    1
…
Collaborative prediction is more efficient than detection/diagnosis.
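
The idea of predicting unobserved probe outcomes from the observed ones can be sketched as a small matrix-completion problem. The code below is an illustration only: it uses a plain low-rank factorization fitted by stochastic gradient descent, which is not claimed to be the method of the cited paper, and all function names, parameters, and the toy data are assumptions.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit observed[(i, j)] ~ dot(U[i], V[j]) by stochastic gradient descent."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            err = value - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted outcome for probe i on service j; the sign gives success or failure."""
    return sum(a * b for a, b in zip(U[i], V[j]))


# Toy data: outcomes are +1 (success) or -1 (failure); three cells are not probed.
row_sign = [1, 1, -1, -1]
col_sign = [1, -1, 1, -1, 1]
full = {(i, j): float(r * c)
        for i, r in enumerate(row_sign) for j, c in enumerate(col_sign)}
hidden = [(0, 1), (2, 3), (3, 0)]
observed = {cell: v for cell, v in full.items() if cell not in hidden}
U, V = factorize(observed, n_rows=4, n_cols=5, rank=1)
```

In this toy setting, only the observed cells are probed and the remaining outcomes are inferred, which is the sense in which prediction can save monitoring effort.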

Failure: it's not a bug, it's a feature (continued)
[Same diagram as the previous slide, with a probe matrix in place of the dependency matrix; probe shown: lcg-cp]
Collaborative prediction is more efficient than detection/diagnosis.

BigData and BigComputation: scientific challenges for the next 4-5 years
The LHC: largest: 26 km; highest energy: 7 TeV; coldest: 1.9 K; emptiest: (check because 13 atm is very high pressure)
The EGEE/EGI grid: largest: 200K CPUs, 15 PB; most distributed: 250 sites; busiest: 1000 jobs/day
The ATLAS collaboration: largest: 3000 scientists; widest: 38 countries and 174 institutions

BigData and BigComputation: scientific challenges for the next 4-5 years
Opportunities
We can apply computationally expensive analysis techniques we could not dream of ten years ago.
The Google cat: a deep learning technique running on 16K cores for three days, watching 10M YouTube stills [Le et al., ICML'12].
Large-scale parallel Markov chain Monte Carlo, or likelihood-free, simulation-based approximate Bayesian computation [Tavaré et al., 1997].
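
As an illustration of likelihood-free, simulation-based inference, here is a minimal sketch of rejection-based approximate Bayesian computation (ABC): parameters are drawn from the prior, data are simulated, and a draw is kept only if a summary of the simulated data is close to that of the observed data. The function and parameter names, the Gaussian toy simulator, and the tolerance are assumptions made for this example.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, summary, tolerance,
                  n_draws, seed=0):
    """Keep prior draws whose simulated summary lands within `tolerance`."""
    rng = random.Random(seed)
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                    # draw a parameter from the prior
        fake = simulate(theta, len(observed), rng)   # run the simulator, no likelihood
        if abs(summary(fake) - target) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy problem: infer the mean of a unit-variance Gaussian from 50 observations.
data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
accepted = abc_rejection(
    observed,
    simulate=lambda theta, n, rng: [rng.gauss(theta, 1.0) for _ in range(n)],
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    summary=statistics.fmean,
    tolerance=0.2,
    n_draws=5000,
)
```

Every draw is independent of the others, so the loop parallelizes trivially across cores, which is what makes the approach attractive on large computing infrastructures.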

BigData and BigComputation: scientific challenges for the next 4-5 years
Challenges
Large-scale machine learning is a new paradigm: statistical overfitting is no longer a danger, underfitting is; optimization time interferes with the approximation-estimation trade-off; mediocre optimization algorithms like stochastic gradient descent become competitive [Bottou-Bousquet, 2008, 2011].
The algorithmic toolbox shrinks considerably: it is often better to use a large data set and a suboptimal but fast algorithm than a small data set and an optimal but expensive technique.
Data management and workflow become crucial, often more important than optimizing the single-thread technique.

Data as a communication tool between Physics and CS (What we will do in 2013)
A well-designed challenge can attract a large number of professional data miners.
Tricky: data cleaning and formatting, defining a standardized problem, the evaluation metric, scripting, the web interface, and marketing require the collaboration of physicists, computer scientists, and engineers.
The payoff is worth it: objective evaluation of a large number of techniques on a given problem.

The Higgs boson challenge (What we will do in 2013)
Observation at more than 5 sigma of a particle whose properties are compatible with those of the Higgs boson, Phys. Lett. B 716 (2012) 1-29.
Spectrum filtered from a few billion recorded proton-proton collisions.
Poor resolution.
Signal peak next to a broad and large background peak.
The task now is to establish that the new particle really is the expected Higgs boson by detecting it in other, more difficult channels.
[Figure labels: Atlas, Signal, Higgs]

The Higgs boson challenge (What we will do in 2013)
Challenge 1: estimating the mass of the Higgs boson candidate as precisely as possible (at the moment: 20%) within a fixed CPU time (0.1 s per event).
Monte Carlo integration in 5-6 dimensions with constraints.
Adaptive importance sampling, adaptive MCMC, etc.
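
To make the integration task concrete, the following is a minimal sketch of Monte Carlo integration by importance sampling in five dimensions, with a constraint expressed as an indicator inside the integrand. It uses a fixed Gaussian proposal rather than an adaptive one, and the integrand, proposal scale, and function name are assumptions chosen for illustration, not the physics integrand of the challenge.

```python
import math
import random


def importance_integral(f, dim, scale, n_samples, seed=0):
    """Estimate the integral of f over R^dim with an isotropic Gaussian proposal."""
    rng = random.Random(seed)
    log_norm = 0.5 * dim * math.log(2.0 * math.pi * scale ** 2)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, scale) for _ in range(dim)]
        log_q = -sum(v * v for v in x) / (2.0 * scale ** 2) - log_norm
        total += f(x) / math.exp(log_q)  # weight each sample by 1 / q(x)
    return total / n_samples


def bump(x):
    """Toy integrand: unnormalized standard Gaussian, integral (2*pi)**(dim/2)."""
    return math.exp(-0.5 * sum(v * v for v in x))


def constrained_bump(x):
    """Same integrand restricted to the half-space x[0] > 0."""
    return bump(x) if x[0] > 0.0 else 0.0


estimate = importance_integral(bump, dim=5, scale=1.5, n_samples=20000)
half = importance_integral(constrained_bump, dim=5, scale=1.5, n_samples=20000)
```

An adaptive variant would re-estimate the proposal (here only `scale`) from the weighted samples between rounds; under a fixed CPU budget per event, the quality of the proposal determines how many samples are needed.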

The Higgs boson challenge (What we will do in 2013)
Challenge 2: detecting the Higgs boson. A standard classification problem with ~100 features and two classes: signal vs. background. Features are constructed manually. AdaBoost, SVM, neural networks, etc.
Challenge 3: unsupervised feature extraction. Two-class classification on raw input: a variable-size list of particles. Targets the deep learning community.
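
As a reference point for Challenge 2, here is a minimal sketch of discrete AdaBoost with decision stumps on a two-class problem with labels +1 (signal) and -1 (background). It is a textbook version written for readability rather than speed, and the function names, model format, and toy data set are assumptions for illustration only.

```python
import math


def train_adaboost(X, y, n_rounds):
    """Discrete AdaBoost over stumps; returns a list of (alpha, feature, cut, sign)."""
    n, dim = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for f in range(dim):
            for cut in sorted({row[f] for row in X}):
                for sign in (1, -1):
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if (sign if row[f] > cut else -sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, cut, sign)
        err, f, cut, sign = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard the logarithm
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, f, cut, sign))
        # Increase the weight of misclassified points, then renormalize.
        w = [wi * math.exp(-alpha * yi * (sign if row[f] > cut else -sign))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (sign if x[f] > cut else -sign)
                for alpha, f, cut, sign in model)
    return 1 if score > 0 else -1


# Toy data: one feature, signal only inside an interval (no single stump can fit it).
X = [[0.05 + 0.1 * i] for i in range(10)]
y = [1 if 0.3 < row[0] < 0.7 else -1 for row in X]
model = train_adaboost(X, y, n_rounds=50)
```

Each prediction evaluates every stump in the model, which ties back to the test-time cost of boosted classifiers noted on the trigger slide.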

Scientific challenges of the project over 4 to 5 years, and any refinement of these challenges since the projects were submitted. Project organization, collaborative working arrangements, and schedule of the technical meetings for 2012. First scientific results obtained or, where applicable, identification of the first research directions explored and their positioning with respect to the state of the art. Scientific objectives for 2013, giving a more detailed view than that of point 1. One slide on the use of the budget allocated to the project.