
1 Site Report – US CMS T2 Workshop
Samir Cury, on behalf of the T2_BR_UERJ Team

2

3 Servers' hardware profile
SuperMicro machines
– 2 x Intel Xeon dual core @ 2.0 GHz
– 4 GB RAM
– RAID 1 – 120 GB HDs

4 Nodes hardware profile (40)
Dell PowerEdge 2950
– 2 x Intel Xeon quad core @ 2.33 GHz
– 16 GB RAM
– RAID 0 – 6 x 1 TB hard drives
CE resources
– 8 batch slots
– 66.5 HS06
– 2 GB RAM / slot
SE resources
– 5.8 TB usable for dCache or Hadoop
Private network only

5 Nodes hardware profile (2+5)
Dell R710
– 2 are Xen servers, not worker nodes
– 2 x Intel Xeon quad core @ 2.4 GHz
– 16 GB RAM
– RAID 0 – 6 x 2 TB hard drives
CE
– 8 batch slots (or more?)
– 124.41 HS06
– 2 GB RAM / slot
SE
– 11.8 TB for dCache or Hadoop
Private network only

6 First-phase nodes profile (82)
SuperMicro server
– 2 x Intel Xeon single core @ 2.66 GHz
– 2 GB RAM
– 500 GB hard drive & 40 GB hard drive
CE resources
– Not used: old CPUs & low RAM per node
SE resources
– 500 GB per node

7 Plans for the future – Hardware
Buying 5 more Dell R710
Deploying 5 R710 when the disks arrive
– 80 more cores
– 120 TB more storage
– 1244 HS06 more
Totals (cross-checked in the sketch below)
– CE: 40 PE 2950 + 10 R710 = 400 cores || 3.9 kHS06
– SE: 240 + 120 + 45 = 405 TB
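These totals follow from the per-node figures on the earlier slides; a quick sketch of the arithmetic (note the SE total counts raw disk, 6 x 1 TB and 6 x 2 TB per node, rather than the usable 5.8/11.8 TB figures, and the slide rounds the first-phase disks up to 45 TB):

```python
# Cross-check of the totals above, using per-node figures from the
# earlier slides. The SE total matches raw disk capacity, not the
# usable figures; the first-phase nodes contribute ~0.54 TB each
# (500 GB + 40 GB drives).
pe2950, r710, phase1 = 40, 10, 82
slots = (pe2950 + r710) * 8                       # 400 batch slots
hs06 = pe2950 * 66.5 + r710 * 124.41              # ~3904 HS06, i.e. ~3.9 kHS06
raw_tb = pe2950 * 6 + r710 * 12 + phase1 * 0.54   # 240 + 120 + ~44 ~= 405 TB
print(slots, round(hs06, 1), round(raw_tb))
```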

8 Software profile – CE
OS: CentOS 5.3, 64-bit
2 OSG gatekeepers
– Both running OSG 1.2.x
– Maintenance tasks eased by redundancy – fewer downtimes (see the availability sketch below)
GUMS 1.2.15
Condor 7.0.3 for job scheduling
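The redundancy point is simply that either gatekeeper can be drained for maintenance while the other keeps accepting jobs. A minimal sketch of a reachability check, assuming the standard Globus GRAM port; the hostnames are hypothetical placeholders, not the real UERJ machines:

```python
#!/usr/bin/env python
"""Sketch: verify that at least one of the redundant OSG gatekeepers
answers on the Globus GRAM port. Hostnames are made-up placeholders."""
import socket

GATEKEEPERS = ["ce01.hepgrid.uerj.br", "ce02.hepgrid.uerj.br"]  # hypothetical
GRAM_PORT = 2119  # standard Globus gatekeeper port


def is_up(host, port=GRAM_PORT, timeout=5):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except (socket.error, OSError):
        return False


if __name__ == "__main__":
    states = {gk: is_up(gk) for gk in GATEKEEPERS}
    print(states)
    if not any(states.values()):
        print("both gatekeepers down -- declare a downtime")
```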

9 Software profile – SE
OS: CentOS 4.7, 32-bit
dCache 1.8
– 4 GridFTP servers
PNFS 1.8
PhEDEx 3.2.0

10 Plans for the future: Software/Network
SE migration
– Right now we use dCache/PNFS
– We plan to migrate to BeStMan/Hadoop
– Some effort has already produced results
– Next steps: add the new nodes to the Hadoop SE, migrate the data, and test in a real production environment, with jobs and users accessing it (one possible verification step is sketched below)
Network improvement
– RNP (our network provider) plans to deliver a 10 Gbps link to us before the next SuperComputing conference.
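For the data-migration step, one way to gain confidence is to compare file sizes and Adler-32 checksums (the checksum type dCache records by default) between the old and new namespaces. A minimal sketch, assuming both namespaces are POSIX-mounted (PNFS over NFS, HDFS via FUSE); the mount points are hypothetical placeholders:

```python
#!/usr/bin/env python
"""Sketch of verifying migrated files between the dCache/PNFS SE and
the new Hadoop SE, assuming both are POSIX-mounted. Paths are
hypothetical placeholders."""
import os
import zlib

OLD_ROOT = "/pnfs/hepgrid.uerj.br/data/cms"  # hypothetical NFS mount
NEW_ROOT = "/mnt/hadoop/cms"                 # hypothetical FUSE mount


def adler32(path, chunk=1 << 20):
    """Adler-32 of a file, read in 1 MB chunks."""
    value = 1  # Adler-32 initial value
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                return value & 0xffffffff
            value = zlib.adler32(data, value)


def verify(relpath):
    """Compare size and checksum of one file in both namespaces."""
    old = os.path.join(OLD_ROOT, relpath)
    new = os.path.join(NEW_ROOT, relpath)
    ok = (os.path.getsize(old) == os.path.getsize(new)
          and adler32(old) == adler32(new))
    print("%s %s" % ("OK " if ok else "BAD", relpath))
    return ok
```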

11 T2 analysis model & associated physics groups
We have reserved 30 TB for each of these groups:
– Forward Physics
– B-Physics
Studying the possibility of reserving space for Exotica
The group has several MSc & PhD students who have worked on CMS analysis for a long time
– These users get good support
Some Grid users submit jobs, occasionally run into trouble, and give up without asking for support

12 Developments
Condor mechanism based on suspension, to give priority to a small pool of important users (illustrated below):
– 1 pair of batch slots per core
– When a priority user's job arrives, it pauses the normal job on the paired batch slot
– Once the priority job finishes and vacates the slot, the paired job automatically resumes
– Documentation can be made available to anyone interested
– Developed by Diego Gomes
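The actual mechanism lives in the Condor startd policy (paired slots with SUSPEND/CONTINUE expressions). Purely to illustrate the idea in another form, here is a hypothetical external poller that suspends ordinary jobs on any node where a whitelisted user's job is running and resumes them elsewhere; the user list is made up, and condor_suspend/condor_continue ship with Condor releases newer than the 7.0.3 quoted above, so treat this as a sketch, not the team's implementation:

```python
#!/usr/bin/env python
"""Illustration only: the real UERJ mechanism is a Condor startd
policy (paired slots with SUSPEND/CONTINUE expressions), not an
external script. Assumes condor_q, condor_suspend and condor_continue
are on PATH."""
import subprocess

PRIORITY_USERS = {"alice", "bob"}  # hypothetical whitelist


def running_jobs():
    """Yield (job_id, owner, host) for every running job in the pool."""
    out = subprocess.check_output(
        ["condor_q", "-allusers", "-constraint", "JobStatus == 2",
         "-format", "%d.", "ClusterId",
         "-format", "%d ", "ProcId",
         "-format", "%s ", "Owner",
         "-format", "%s\n", "RemoteHost"])
    for line in out.decode().splitlines():
        parts = line.split()
        if len(parts) == 3:
            # RemoteHost looks like "slot1@node01.example"; keep the node
            yield parts[0], parts[1], parts[2].split("@")[-1]


def main():
    jobs = list(running_jobs())
    # Nodes where a priority user's job is currently running
    busy = {host for _, owner, host in jobs if owner in PRIORITY_USERS}
    for job_id, owner, host in jobs:
        if owner in PRIORITY_USERS:
            continue
        action = "condor_suspend" if host in busy else "condor_continue"
        subprocess.call([action, job_id])


if __name__ == "__main__":
    main()
```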

13 Developments
Condor4Web – web interface to visualize the Condor queue
– Shows grid DNs – useful for Grid users who want to know how their jobs are being scheduled inside the site
– http://monitor.hepgrid.uerj.br/condor
– Available at http://condor4web.sourceforge.net
– Still has much room to evolve, but it already works (a toy version of the idea is sketched below)
– Developed by Samir
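Condor4Web itself is the code at the SourceForge link above. Purely as a toy illustration of the idea, the sketch below dumps the queue, including grid certificate DNs, as an HTML table; x509userproxysubject is the standard job-ad attribute holding the proxy DN, while the field layout (and the assumption that every job carries a DN) is invented for the example:

```python
#!/usr/bin/env python
"""Toy illustration of a Condor4Web-style view: render the Condor
queue, including grid certificate DNs, as an HTML table. Not the
actual Condor4Web code (see http://condor4web.sourceforge.net)."""
import subprocess
from xml.sax.saxutils import escape

# (printf format, job-ad attribute) pairs. Assumes every job has a
# proxy DN; jobs without one would make lines run together.
FIELDS = [("%d.", "ClusterId"), ("%d|", "ProcId"),
          ("%s|", "Owner"), ("%d|", "JobStatus"),
          ("%s\n", "x509userproxysubject")]


def queue_rows():
    """Return one [job, owner, status, dn] list per queued job."""
    cmd = ["condor_q", "-allusers"]
    for fmt, attr in FIELDS:
        cmd += ["-format", fmt, attr]
    out = subprocess.check_output(cmd).decode()
    return [line.split("|") for line in out.splitlines() if line]


def to_html(rows):
    header = "<tr><th>Job</th><th>Owner</th><th>Status</th><th>Grid DN</th></tr>"
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % escape(col) for col in row)
        for row in rows)
    return "<table>%s%s</table>" % (header, body)


if __name__ == "__main__":
    print(to_html(queue_rows()))
```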

14 CMS Centre @ UERJ
In January 2009, during LISHEP, we inaugurated a small CMS control room at UERJ:

15 Shifts @ CMS Centre
Our computing team has participated in tutorials, and we now have four potential CSP shifters

16 CMS Centre (quick) profile
Hardware
– 4 Dell workstations with 22” monitors
– 2 x 47” TVs
– Polycom SoundStation
Software
– All conferences, including those with the other CMS Centres, are held via EVO

17 Cluster & Team
– Alberto Santoro (General supervisor)
– Andre Sznajder (Project coordinator)
– Eduardo Revoredo (Hardware coordinator)
– Jose Afonso (Software coordinator)
– Samir Cury (Site admin)
– Fabiana Fortes (Site admin)
– Douglas Milanez (Trainee)
– Raul Matos (Trainee)

18 2009/2010 goals
In 2009 we worked mostly on:
– Getting rid of infrastructure problems
  – Electrical insufficiency
  – Air conditioning
  – Many downtimes were due to these; they are solved now
– Besides those problems:
  – Running official production on small workflows
  – Doing private production & analysis for local and Grid users
2010 goals:
– Use the new hardware and infrastructure for a more reliable site
– Run heavier workflows and increase participation and presence in official production

19 Thanks!
We want to formally thank Fermilab, USCMS and OSG for their financial help in bringing a UERJ representative here. We also want to thank USCMS for this very useful meeting.

20 Questions? Comments?

