NL Cloud Meeting, 5 April 2011 Israel ATLAS Tier2 Status 1 Israel ATLAS TIER-2 Status April 2011 Lorne Levinson.


1 Israel ATLAS TIER-2 Status, April 2011
Lorne Levinson
NL Cloud Meeting, 5 April 2011

2 Israel HEP community
ATLAS is the only LHC experiment in which we participate
  – also Phenix (Heavy Ion @BNL), ILC, ZEUS
  – Israel is “1.35% of ATLAS” (MoU pledge, authors, common fund)
  – 25-30 people doing physics analysis
3 sites:
  – Tel Aviv University, Tel Aviv (1956), a university
  – The Technion Israel Institute of Technology, Haifa (1924), a university
  – Weizmann Institute of Science, Rehovot (1934), a research institute (Biology, Chemistry, Physics, Math & CS) with a graduate school (no undergrads)
longest travel is Weizmann to Technion: 2 hours office-to-office

3 Organization
we are a distributed Tier2/Tier3
each site combines Tier2 and Tier3 resources in the same cluster
  – all resources shared flexibly between T2 and T3 (Lustre/Storm)
single management and budget, single purchasing
three sites as identical as possible
Steering Committee for overall policy
Management & Operations team for the three sites
stable funding approved until 2012

4 Storage
Continues to be the biggest reliability issue.
Our hardware is now stable:
  – replaced DDN 6620’s with DDN 9900: fully redundant, 300 disk slots, 8x8Gb/s FC ports → 5GB/s
  – two Lustre “OSS” servers
  – WI servers with 10Gb/s to cluster; TAU and Technion will install 10G in April
Gave up on Thumpers+Lustre and Thumpers+iSCSI+Lustre.
  – We NFS mount Thumpers with Solaris+ZFS for extra "archive" storage, home directories or /opt/exp_soft
Lustre + Storm: the problem is that the Storm team does not test new Storm releases on Lustre
  – Storm-Lustre community must solve this

5 Storm/Lustre
Storm allows LCG SRM storage and our local global file name space to share the same physical storage.
  – No rigid boundary
  – Jobs in cluster can do Linux file I/O to read SRM files
Storm can run over Lustre (open source) or GPFS (IBM)
Lustre:
  – Object Storage Targets serve (stripes of) file data
  – Meta-Data Server holds directories
  – redundant failover of MDS’s will soon be supported
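
How Object Storage Targets each serve stripes of a file can be sketched in a few lines. This is a minimal illustration that assumes plain round-robin (RAID-0 style) striping; the function name, the 1 MiB stripe size and the stripe count of 4 are illustrative assumptions, not the configuration of the three sites.

```python
def stripe_location(offset, stripe_size=1 << 20, stripe_count=4):
    """Map a byte offset in a file to (object index, offset inside that object).

    Assumes round-robin striping. stripe_size and stripe_count are
    illustrative values, not the sites' actual settings.
    """
    stripe_number = offset // stripe_size        # which stripe-sized chunk of the file
    object_index = stripe_number % stripe_count  # which of the file's objects holds it
    # Full rounds already written to that object, plus the position inside the chunk
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return object_index, object_offset


if __name__ == "__main__":
    for off in (0, 1 << 20, 5 * (1 << 20) + 10):
        print(off, stripe_location(off))
```

Consecutive chunks land on different objects, so a large sequential read can be served by several storage targets in parallel.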

6 Storage – installed SRM + local capacity (net TB)

                    TAU    Technion   Weizmann   Total
2010                240    192        288         720
2011 purchase        96    144        144         384
Total 2011          336    336        432        1104
Heavy Ion 3Q2011                       48        1152
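
The totals in the storage table can be cross-checked with simple arithmetic. In this sketch the Weizmann purchase is derived from the row total rather than read directly, and the variable names are illustrative.

```python
# Net TB per site, taken from the storage table
capacity_2010 = {"TAU": 240, "Technion": 192, "Weizmann": 288}
purchase_2011 = {"TAU": 96, "Technion": 144}  # Weizmann cell is inferred below
purchase_total = 384

# Infer the Weizmann purchase from the row total
purchase_2011["Weizmann"] = purchase_total - sum(purchase_2011.values())

# Per-site and overall capacity after the 2011 purchase
total_2011 = {site: capacity_2010[site] + purchase_2011[site] for site in capacity_2010}
grand_total_2011 = sum(total_2011.values())

# Heavy Ion addition listed for 3Q2011
with_heavy_ion = grand_total_2011 + 48

print(total_2011, grand_total_2011, with_heavy_ion)
```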

7 Group disks
We are hosting four ATLASGROUPDISK areas
  – Muon performance (Technion)
  – Top (Weizmann)
  – Heavy Ion (Weizmann)
  – Standard Model (TAU) (empty)

8 CPU
Last purchase was dual Intel E5520 quad core
May delivery purchase is dual Intel X5650 hex-core
  – again 4 motherboards per 2U box with redundant power supply

cores    Tel Aviv   Technion   Weizmann   Total
Now      192        272        448         944
May      336        464        640        1440

We benefit a lot from other groups placing some of their cores in our cluster:
  * Weizmann: ATLAS+Phenix/Heavy-Ion, HEP Theory, Condensed matter
  * Technion: HEP Theory and Bio-informatics
  * TAU includes: HEP Theory
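
As a rough sanity check, the core counts can be related to the 2U box layout. This sketch assumes that the whole increase between "Now" and "May" at each site comes from the new dual X5650 boxes and that only physical cores are counted; neither assumption is stated on the slide.

```python
MOTHERBOARDS_PER_BOX = 4  # per 2U box, from the slide
SOCKETS_PER_BOARD = 2     # dual-socket boards
CORES_PER_CPU = 6         # Intel X5650 is hex-core

cores_per_box = MOTHERBOARDS_PER_BOX * SOCKETS_PER_BOARD * CORES_PER_CPU

cores_now = {"Tel Aviv": 192, "Technion": 272, "Weizmann": 448}
cores_may = {"Tel Aviv": 336, "Technion": 464, "Weizmann": 640}

# Assumption: every added core sits in one of the new 2U boxes
boxes_added = {
    site: (cores_may[site] - cores_now[site]) / cores_per_box
    for site in cores_now
}

print(cores_per_box, boxes_added)
```

Under these assumptions each site's increase is a whole number of boxes.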

9 Services nodes
Virtualize most services
Two 8-core servers, 48GB
Failover
Easier management
  – VM images
  – Roll-back
  – Image sharing
  – Easier testing: temp machines
May delivery of HW
Deciding among: VMware, Xen, Citrix, KVM
SE not included

Service                                            Where
gLite CE                                           per site
gLite site-BDII                                    per site
gLite MON                                          per site
gLite APEL                                         per site
ELOG electronic log book                           WI
Zenoss fabric monitoring                           per site
LDAP, DNS, DHCP, syslog                            per site
Frontier DB cache                                  per site
VOMS (for Israel)                                  TAU
gLite WMS, LB (for Israel)                         WI
gLite myproxy (for Israel)                         WI
gLite Top-BDII (for Israel)                        WI
gLite NAGIOS for Israel grid service monitoring    WI
Mantis issue tracker                               Tech
Managers’ Wiki pages                               Tech

10 Networking
Our networking is not good
Geant connection is 2 x 1.5G (subscribed on 2 x 2.5G infrastructure)
“Political” limits: TAU 500M, Technion 350M, WI 400M
  – Because a 1G line is shared with institute traffic and the shared router is not really able to do 1G duplex
We suspect that the gross mismatch with SARA/NIKHEF’s 10G causes failed connections due to dropped packets.
  – Lowering the # of files & streams to avoid dropped packets leaves us with even worse net BW
Expensive because it is an undersea fiber and one (Italian) company owns the fibers.
  – An Israeli competitor is installing another fiber now
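
To put the per-site limits in perspective, the best-case time to move one terabyte at each limit can be estimated. This is an idealized sketch: it assumes the link is fully and continuously used, decimal units (1 TB = 10^12 bytes), and no protocol overhead or retransmissions, so real transfers would take longer. The function name is illustrative.

```python
def transfer_hours(terabytes, megabits_per_second):
    """Ideal time in hours to move data at a sustained rate (decimal units, no overhead)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


# Per-site limits in Mb/s, from the slide
site_limits_mbps = {"TAU": 500, "Technion": 350, "WI": 400}

for site, limit in site_limits_mbps.items():
    print(f"{site}: {transfer_hours(1, limit):.1f} h per TB")
```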

11 Networking [figure]

12 GEANT [figure]

13 Networking plans
May 2011(?): Increase international connection from 3Gb/s to 4Gb/s.
  – 5G might be possible later this year, but not budgeted.
Replace old routers at entrances to institutes with 10G-capable equipment.
  – This should increase our throughput and reliability and allow us to actually use a major share of the 1G BW to the sites
Negotiating 10G academic backbone
Could have 10G to Geant in spring 2012

14 SAM/NAGIOS
Our NGI did not take on the SAM/NAGIOS monitoring responsibility
After the new NAGIOS tests replaced the SAM tests, we received no alerts on failed tests. This was a severe problem.
Finally, in December, it was agreed among EGI, our NGI and us that we would deploy a NAGIOS test service for Israel until our NGI succeeds in doing so.
  – The only functioning grid sites in Israel are our 3 ATLAS sites
Our NAGIOS service was up and running in January.

15 Upcoming work
Deploy Zenoss fabric and service monitor on all three clusters
  – currently in test at Weizmann
Deploy Puppet configuration system on all three clusters
  – We gave up on Quattor after finally getting it to run; it was clear that it was unsustainable
  – Currently used for worker nodes at Weizmann
  – Needs to include gLite nodes
Virtualization of services (excl. SE)
Address Storm “untested new version” problem

16 End

