SkimSlimService: ENABLING NEW WAYS (Ilija Vukotic, 2/18/13)


1 SkimSlimService ENABLING NEW WAYS

2 Problems of Current Analysis Model 2/18/13 ILIJA VUKOTIC
• Unsustainable in the long run (higher luminosity, no faster CPUs).
• Physicists get no feedback on the resources they use.
• Long running times.
• Only a very small percentage of people want to, or know how to, optimize their code.
• IT people are not happy when someone submits 10k jobs that run at 1% efficiency for days and produce 10k files of 100 MB each.
• Huge load on the people doing DPD production; frequent errors, slow turnaround.
• Nobody wants to care about DS sizes, registrations, DDM transfers, approvals.
• This is the moment to make changes.

3 (R)evolution of ATLAS data formats
• Original plan (6 y. ago): ESD < 500 kB/ev, 1k br.; AOD < 100 kB/ev, 500 br. Tools: Athena used for everything.
• 4 y. ago: ESD < 1500 kB/ev, 8k br.; AOD < 500 kB/ev, 4k br. Tools: Athena + ARA.
• 3 y. ago: ESD < 1800 kB/ev, 10k br.; AOD < 1000 kB/ev, 7k br.; D3PD < 20 kB/ev, 500-7k br. Tools: Athena + ARA + ROOT.
• Today: ESD < 1800 kB/ev; AOD < 1000 kB/ev; D3PD < 200 kB/ev. Tools: Athena + ARA + ROOT + Mana + RootCore + Event…
• Proposals for future: ESD < 1800 kB/ev; AOD < 1000 kB/ev; Godzilla D3PDs, Structured D3PDs, D3PD. Tools: Athena + ARA + ROOT + Mana + RootCore + Event… TAG ?!

4 Problems with ATLAS data formats
• ESDs
  ◦ Large and kept only for a short time.
  ◦ Used only for special studies.
• AOD
  ◦ Too large, needs Athena/ARA/Mana, slow to start up, nobody made it user friendly.
• D3PD
  ◦ A lot of them.
  ◦ Flat format.
  ◦ Too large (in sum much larger than AOD).
  ◦ Expensive to produce and store.
  ◦ Inefficient to read.
  ◦ Could be reduced by at least 60%, but nobody knows who needs what.
  ◦ Effectively usable only from grid jobs.
• Skim/slim D3PD
  ◦ Takes up to a week to produce on the grid.
  ◦ People make them larger than necessary to avoid doing it twice.
  ◦ Files are usually too small for efficient transport and storage, thus requiring merging that can't be done on the grid.

5 What does a physicist want?
• Full freedom to do analysis.
• In the language of their choice.
• Not to be forced to use complex frameworks with hundreds of libraries, 20 min compilations, etc.
• Not to be forced to think about computing farms, queues, data transfers, job efficiency, …
• To get results in no time.

6 Idea
• Let a small number of highly experienced physicists, together with IT staff, handle the big data. They can do it efficiently.
• Move the majority of physicists away from 100 TB scale data to ~100 GB data.
• That is sufficiently small for transport: you can analyze it anywhere, even on your laptop.
• However inefficient your code is, you won't use too many resources and will get results back in a reasonable time.

7 How would it work
• Use FAX to access all the data without the overhead of staging.
• Use optimally situated replicas. (Possible optimization: production D3PDs preplaced at just several sites, maybe even just one.)
• Physicists request a skim/slim through a web service.
• Could add a few variables on the fly.
• Produced datasets are registered in the name of the requester.
• Delivered to the site requested.
• All in 1-2 hours. This is essential: only then will people skim/slim down to just the variables they need, without thinking "what if I forget something I'll need?".
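The difference between skimming (dropping events) and slimming (dropping branches), plus adding a variable on the fly, can be illustrated with a minimal pure-Python sketch. It is not the service's code: the event layout, branch names and the `skim_slim` helper are all hypothetical, and real D3PDs would be read with ROOT rather than as lists of dictionaries.

```python
# Toy "D3PD": each event is a dict of branch name -> value.
# All branch names and values here are made up for illustration.
events = [
    {"run": 1, "el_pt": 42.0, "mu_pt": 5.0, "met": 30.0},
    {"run": 1, "el_pt": 12.0, "mu_pt": 55.0, "met": 8.0},
    {"run": 2, "el_pt": 61.0, "mu_pt": 3.0, "met": 45.0},
]


def skim(events, selection):
    """Skimming: keep only the events that pass the selection."""
    return [ev for ev in events if selection(ev)]


def slim(events, branches):
    """Slimming: keep only the requested branches in every event."""
    return [{name: ev[name] for name in branches} for ev in events]


def skim_slim(events, selection, branches, extra=None):
    """Skim, then slim, optionally adding variables computed on the fly.

    `extra` maps a new branch name to a function of the full event.
    """
    extra = extra or {}
    output = []
    for ev in skim(events, selection):
        row = {name: ev[name] for name in branches}
        for name, func in extra.items():
            row[name] = func(ev)  # computed before the input branches are dropped
        output.append(row)
    return output


result = skim_slim(
    events,
    selection=lambda ev: ev["el_pt"] > 25.0,
    branches=["run", "el_pt"],
    extra={"el_pt_plus_met": lambda ev: ev["el_pt"] + ev["met"]},
)
```

Here `result` holds two events with three branches each, instead of three events with four branches.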

8 Would it work?
• A couple of hundred dedicated cores, freed up from all the inefficient personal slims/skims run with prun.
• Highly optimized code.
• As we know which branches (variables) people are using, we know what is useless in the original D3PDs, so we can produce them much smaller.
• If a bug is found in D3PD production, no new global redistribution is needed. Some problems can even be fixed in place without a new production.
• If we find it useful, we can split/merge/reorganize D3PDs without anyone noticing.
• We could later even go for a completely different underlying big data format: Godzilla D3PDs, merged AOD/D3PD, Hadoop!
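How request logs could reveal which branches nobody uses can be sketched as follows. This is only an illustration under assumptions: the request log format, the branch names and the contents of `all_branches` are hypothetical, not taken from the service.

```python
from collections import Counter

# Hypothetical log: the list of branches asked for in each skim/slim request.
requests = [
    ["el_pt", "el_eta", "met"],
    ["el_pt", "mu_pt"],
    ["el_pt", "met"],
]

# Hypothetical full branch list of the production D3PD.
all_branches = {"el_pt", "el_eta", "mu_pt", "met", "jet_n", "trk_d0"}

# Count in how many requests each branch appears (once per request).
usage = Counter(branch for req in requests for branch in set(req))

# Branches that no request ever asked for are candidates for removal.
unused = sorted(all_branches - set(usage))

for branch, count in usage.most_common():
    print(f"{branch}: requested {count} time(s)")
print("never requested:", unused)
```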

9 SkimSlimService
• Web site at CERN: gets requests, shows their status.
• Handmade server (1): receives web queries, collects info on datasets, files, trees, branches.
• OracleDB at CERN: stores requests, splits them into tasks, serves as a backend for the web site.
• Executor at UC3 (1): gets tasks from the DB, creates and submits condor SkimSlim jobs (2), makes and registers the resulting DS (3).
(1) We have no dedicated resources for this. I used UC3, but any queue that has cvmfs will suffice.
(2) A modified version of filter-and-merge.py is used.
(3) Currently under my name, as I don't have a production role.
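The step where a request is split into tasks for the executor might look roughly like the sketch below. The slides do not describe the actual schema or the splitting rule, so the `Task` fields, the fixed files-per-task rule and the file names are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One unit of work for the executor (hypothetical fields)."""
    request_id: int
    files: list
    status: str = "queued"


def split_into_tasks(request_id, files, files_per_task):
    """Split a request's input files into tasks of at most files_per_task files."""
    if files_per_task < 1:
        raise ValueError("files_per_task must be at least 1")
    return [
        Task(request_id, files[start:start + files_per_task])
        for start in range(0, len(files), files_per_task)
    ]


# Hypothetical input: seven files belonging to request 42.
input_files = [f"file_{i}.root" for i in range(7)]
tasks = split_into_tasks(42, input_files, files_per_task=3)
```

With seven files and three files per task, this yields three tasks holding 3, 3 and 1 files, each of which an executor could turn into one condor job.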

10 http://ivukotic.web.cern.ch/ivukotic/SSS/index.asp

11 Test runs results
• Used the datasets, skim and slim code of our largest user. Worst case scenario.
• All of the SMWZ 2012 data and MC: 185 TB -> 10 TB (300 branches).
• Missing in FAX: 24 datasets (~3.5%).

Dataset list        Datasets
data.Egamma.txt     284
data.Muons.txt      288
mc.Alpgen.txt       63
mc.Herwig.txt       3
mc.Pythia8.txt      28
mc.Sherpa.txt       19
mc.all.txt          9
Total               694
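The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below only uses figures quoted above; reading the per-list counts as numbers of datasets is an assumption, supported by 24 out of 694 being about 3.5%.

```python
# Per-list counts as quoted on the slide (assumed to be dataset counts).
datasets_per_list = {
    "data.Egamma.txt": 284,
    "data.Muons.txt": 288,
    "mc.Alpgen.txt": 63,
    "mc.Herwig.txt": 3,
    "mc.Pythia8.txt": 28,
    "mc.Sherpa.txt": 19,
    "mc.all.txt": 9,
}

total = sum(datasets_per_list.values())
missing_in_fax = 24
missing_percent = 100 * missing_in_fax / total

input_tb, output_tb = 185, 10
reduction_factor = input_tb / output_tb

print(f"total datasets: {total}")
print(f"missing in FAX: {missing_percent:.1f}%")
print(f"size reduction: x{reduction_factor:.1f}")
```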

12 Test runs results
• CPU efficiency: when data is local ~ 0.75%, with remote data between 10 and 50% (6.25 MB/s gives 100% eff.).
• All of SMWZ requires 8600 CPU hours. Can be done in 2 hours by pooling unused resources.
• Could have one service in the EU and one in the US to avoid transoceanic traffic.
• It is easy to deploy the service on anything that mounts CVMFS (UC3, UCT3, UCT2, OSG, EC2).
• On EC2: ~$500 assuming small instances; ~$100 with micro instances and spot pricing. But result delivery costs ~$1k (10 TB * $0.12/GB).
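A back-of-the-envelope sketch of what these figures imply is given below. The input numbers come from the slide; the decimal TB-to-GB conversion and the linear efficiency model (efficiency proportional to read rate, capped at 100% at 6.25 MB/s) are assumptions, and the function names are hypothetical.

```python
CPU_HOURS_SMWZ = 8600        # from the slide
OUTPUT_TB = 10               # size of the skimmed/slimmed output
EGRESS_USD_PER_GB = 0.12     # delivery price quoted on the slide
FULL_EFFICIENCY_MB_S = 6.25  # read rate quoted as giving 100% CPU efficiency


def cores_needed(cpu_hours, wall_hours):
    """Cores required to finish cpu_hours of work within wall_hours."""
    return cpu_hours / wall_hours


def wall_hours_needed(cpu_hours, cores):
    """Wall-clock hours needed when a fixed number of cores is available."""
    return cpu_hours / cores


def delivery_cost_usd(output_tb, usd_per_gb, gb_per_tb=1000):
    """Cost of shipping the output, assuming 1 TB = 1000 GB."""
    return output_tb * gb_per_tb * usd_per_gb


def cpu_efficiency(read_mb_s, full_mb_s=FULL_EFFICIENCY_MB_S):
    """Assumed linear model: efficiency grows with read rate, capped at 1."""
    return min(1.0, read_mb_s / full_mb_s)


print(cores_needed(CPU_HOURS_SMWZ, 2))          # cores for a 2-hour turnaround
print(wall_hours_needed(CPU_HOURS_SMWZ, 200))   # hours on 200 dedicated cores
print(delivery_cost_usd(OUTPUT_TB, EGRESS_USD_PER_GB))
print(cpu_efficiency(3.125))
```

Under these assumptions, a 2-hour turnaround needs about 4300 cores, 200 dedicated cores would take about 43 hours, and delivery comes to roughly $1200, in line with the ~$1k quoted.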

13 Conclusion
• Produced a fully functional system you may use now.
• To be done:
  ◦ Polish it
  ◦ Market it
  ◦ Push it politically (essential)

14 Reserve

15 What is FAX?
• A number of ATLAS sites made their storage accessible from outside using the xRootD protocol (1).
• Has a mechanism that gets you a file if it exists anywhere in the federation.
• All kinds of sites: xrootd, dCache, dpm, lustre, gpfs.
• Read only.
• Need a grid proxy to use it.
• Instructions: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingFAXforEndUsers
[Diagram: redirectors at global and regional (EU, UK) level, with endpoints such as AGLT2, MWT2, SLAC, Oxford, QMUL. Legend: Redirector, Endpoint.]
(1) CMS has a very similar system they call AAA.
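The idea of getting a file if it exists anywhere in the federation can be modelled with a toy redirector tree. This is a conceptual sketch only, not the xRootD implementation: the classes, the exact tree shape, the name `regional-1` and the file names are hypothetical, and only the site names are taken from the slide.

```python
class Endpoint:
    """A storage site that holds some set of files."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def locate(self, lfn):
        return self.name if lfn in self.files else None


class Redirector:
    """Knows its children (endpoints or lower redirectors) and asks them in turn."""

    def __init__(self, name, children):
        self.name = name
        self.children = children
        self.parent = None
        for child in children:
            if isinstance(child, Redirector):
                child.parent = self

    def locate(self, lfn):
        for child in self.children:
            found = child.locate(lfn)
            if found is not None:
                return found
        return None


def find(lfn, start):
    """Ask the nearest redirector first, then move up towards the global one."""
    node = start
    while node is not None:
        found = node.locate(lfn)
        if found is not None:
            return found
        node = node.parent
    return None


# Hypothetical federation layout and file placement.
regional = Redirector("regional-1", [
    Endpoint("AGLT2", {"a.root"}),
    Endpoint("MWT2", {"b.root"}),
    Endpoint("SLAC", set()),
])
uk = Redirector("UK", [
    Endpoint("Oxford", {"c.root"}),
    Endpoint("QMUL", {"a.root"}),
])
global_redirector = Redirector("global", [regional, uk])

print(find("c.root", regional))   # not in the local region, found via global
print(find("a.root", uk))         # found within the UK region
print(find("missing.root", uk))   # exists nowhere in the federation
```

A client starting at `regional` still gets `c.root`, because the lookup falls back to the global redirector, which reaches the UK endpoints.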

16 FAX today
• We want all the T1s and T2s included.
• Adding new sites weekly. Currently 31.
• Sites: AGLT2, BNL-ATLAS, BU_ATLAS_TIER2, CERN-PROD, DESY-HH, INFN-FRASCATI, INFN-NAPOLI-ATLAS, INFN-ROMA1, JINR-LCG2, LRZ-LMU, MPPMU, MWT2, OU_OCHEP_SWT2, PRAGUELCG2, RAL-LCG2, RU-PROTVINO-IHEP, SWT2_CPB, UKI-LT2-QMUL, UKI-NORTHGRID-LANCS-HEP, UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP, UKI-SCOTGRID-ECDF, UKI-SCOTGRID-GLASGOW, UKI-SOUTHGRID-CAM-HEP, UKI-SOUTHGRID-OX-HEP, WT2, WUPPERTALPROD, GRIF-LAL, GRIF-IRFU, GRIF-LPNHE, IN2P3-LAPP

17 Does it work?
YES!*
*For the most part. But there is a lot of redundancy in the system: we have ~2.5 copies of popular datasets.

18 What is it good for?
• Failover if a grid job has a problem with an input file
  ◦ IT: fewer failed jobs
  ◦ Physicist: fewer failed jobs
• Diskless Tier2
  ◦ IT: easier upgrades, more availability
  ◦ Physicist: more CPU resources
• Diskless Tier3s
  ◦ IT: simpler and cheaper
  ◦ Physicist: more CPU resources
• Enables storage sharing between nearby sites
  ◦ Physicist: effectively more disk space
  ◦ Less data movement
  ◦ GlobalLFN simplifies scripts
• Easily spin up more workers
  ◦ University queues
  ◦ Amazon, Google, Microsoft clouds
• Have full info
  ◦ Optimize applications
  ◦ Who is reading what
  ◦ How efficiently

19 How does it work?
• Quite a complex system
• A lot of people involved
• A lot of development
• Takes time to deploy
• Takes time to work out the kinks

20 What can I do today?
• Access data on T2 disks: localgroupdisk, userdisk, …
• If a file is not there, the job won't fail; the file will come from elsewhere.
• I can run jobs at uct2/uct3 and access data anywhere in FAX.
• Use frun:
  ◦ If you have data processed at 10 sites all over the world
  ◦ Want to merge them
  ◦ Want to submit jobs where queues are short

21 Full Dress Rehearsal
• A week of stress testing all of the FAX endpoints.
• While we have continuous monitoring of standard user accesses (ROOT, xrdcp), to stress the system one has to submit jobs to the grid.
• Submitting realistic jobs manually and automatically.
• Had more problems with the tests than with FAX:
  ◦ Late distribution of the test dataset to endpoints (TB size datasets)
  ◦ High load due to winter conferences did not help
  ◦ Jobs running on a grid node are an entirely different game due to the limited proxy they use
  ◦ Found and addressed a number of issues
  ◦ New voms libraries developed
  ◦ Settings at several sites corrected
  ◦ New pilot version
• Conclusion: we broke nothing (storages, lfcs, links, servers, monitoring). As soon as all observed problems are fixed, we'll hit harder.

22 FAX – remaining to be done
• Near future:
  ◦ Further expansion: next in line are the French and Spanish clouds
  ◦ Improving robustness of all the elements of the system
  ◦ Improving documentation, giving tutorials, user support
• Months:
  ◦ Move to Rucio
  ◦ Optimization: making the network smart so it provides the fastest transfers
  ◦ Integration with other network services

23 Foogle.com
New internet search engine! Say NO to IE, Firefox, Chrome! Terminal based! From the inventors of WWW!
• Simple to use:
  ◦ Learn a few simple things (shell scripts, pbs/condor macros, python, root and c++, LaTeX, …)
  ◦ Write a few hundred pages of code
  ◦ Process crawler data and rewrite it in a new way. Move it.
  ◦ Rewrite the original format to a new, different one. Rewrite again. Move it.
  ◦ Code to find the page
  ◦ Compile your page to ps/pdf
  ◦ Show!
• Shown alongside:
  ◦ RAW -> ESD
  ◦ ESD -> AOD
  ◦ AOD -> D3PD
  ◦ D3PD -> slimmed D3PD
  ◦ Slimmed one to Ntuple for final analysis
  ◦ Final analysis

