NorthGrid status
Alessandra Forti
GridPP12, Brunel, 1 February 2005

NorthGrid status
Sites Summary
Posts
Communication
Security
  –Cooling systems
  –University security procedures
Storage
Misc
Conclusions

Lancaster
Old cluster: 52 CPUs, 31 kSI2k
LCG-2_3_0
Still RedHat 7.3
New cluster: expected next week
  –409 kSI2k, 80 TB
Location: HEP department
Security: university firewall
Installation: kickstart+manual

Liverpool
120 nodes, LCG-2_2_0 until mid-December
Complained that nobody was using it
  –Questioned the effort of adding more nodes if they are not needed
Cooling system breakdown
  –Hasn't recovered yet
Location: HEP department
Security: university firewall + machine firewall
Installation: kickstart+manual

Manchester
Old cluster: 45 kSI
LCG version 2_3_0
RedHat 7.3
  –Upgrade to SL in the next 2 weeks
New cluster: tender sent out this week
Location: now HEP department
  –In future, the university computing centre
Security: university firewall (basically open)
Installation: kickstart+manual

Sheffield
New cluster: 160 CPUs
  –240 kSI2k, 4 TB of storage
LCG-2_3_0
SL3.0.3
Location: university computing centre
Security: university firewall
Installation: kickstart+manual

Communication
Technical board meetings now happen once a month
Sys admins are more responsive to emails, but not to mailing lists
They don't have much time to follow what is happening
  –Posts are not yet in place to help

Posts
4.5 FTE to fill between sites
  –Lancaster: still looking
  –Liverpool: still looking
  –Manchester: interviewing
  –Sheffield: filled
No common strategy between universities on this. Perhaps needed?
  –Posts should work together

Resources security problems: cooling system
Clusters are partly at HEP departments, partly in university computing facilities
This doesn't seem to affect the outcome of cooling system failures
  –Liverpool and Sheffield both suffered a serious one
  –Manchester had to upgrade just to add a few machines
There is no protection against this kind of failure. They happen.

Resources security problems: setup and procedures
Sites don't know their universities' security policies and procedures
  –They'll find out
They know about university firewalls because they have to ask to open ports
  –Lately also to keep them closed: a better LCG port table would be useful
Some don't have a central logging system
Some don't have active monitoring
Coordination/cooperation needed?

Storage management problems
Storage hardware varies between sites
  –From a few hundred GB to 80 TB
Sites don't know LCG storage requirements for Tier2s: volatile or permanent
  –Prefer volatile (= no backup requirements)
Only classic SE
  –They are difficult to manage
  –Default configuration doesn't help
  –Quotas ignored by Information system
  –CERN says modifying classic SE code is not worth the effort
New DPM/SRM out soon...

Misc
There are no local monitoring tools
  –Coordination/cooperation needed?
GIIS, GOC and Ganglia are still used to check own system
RGMA working only in Manchester, and not all the time
  –Difficult to understand where the problem is
APEL and accounting need more publicity
  –Especially if a backup of old logs is necessary