PHENIX CC-J Updates - Preparation For Opening - N. Hayashi / RIKEN, May 10, 2000, PHENIX Computing Meeting @BNL


1 PHENIX CC-J Updates - Preparation For Opening -
N. Hayashi / RIKEN
May 10, 2000
PHENIX Computing Meeting @BNL

2 CC-J Organization & Construction
Organization
– Director: M. Ishihara
– Planning and Coordination Office (PCO):
    Manager: T. Ichihara
    Technical manager: Y. Watanabe
    Scientific program coordinator: H. Hamagaki, H. En’yo
    Computer scientist: N. Hayashi, S. Sawada, S. Yokkaichi
    CC-J liaison at BNL: N. Saito
Hardware & System Software Implementation
– T. Ichihara: manager, network
– Y. Watanabe: HPSS, data duplication facility
– N. Hayashi: network, linux, job control
– S. Sawada (KEK): PBS
– S. Yokkaichi (RIKEN): monitor, Objectivity <--- Full Time
– H. Hamagaki (CNS, Tokyo): AFS
Contributor
– Y. Goto (RBRC), K. Homma (Hiroshima)

3 Plan and Current Status of the CC-J
[Timeline chart, 1998 to 2002]
Chart labels: RBRC (BNL); RIKEN Wako; PHENIX Exp. at RHIC; R&D for CC-J; Prototype of CPU farms; Data Duplication facility; CC-J construction; CC-J front-end at BNL
Phases: Phase 1 (1/3 scale), Phase 2 (2/3 scale), Phase 3 (Full scale)
Milestones
– CC-J Working Group formed (Oct. 1998)
– CC-J review at BNL (Dec. 1998)
– HPSS Software/Hardware Installation (March 1999) (Supplementary Budget)
– CC-J starts operation at 1/3 scale (April 2000)
– Full scale CC-J (Mar. 2002)

4 Current System Configuration
Upper part moved to new room
New hardware
– Two additional AltaClusters
    96 CPUs in total
    MHz x CPUs: 450x32, 600x32, 700x32
– Additional 1.6 TB disk
    3.2 TB in total
    will be operational in 2 weeks
[Diagram labels: New Building; 3.2 TB RAID Disk]
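As a rough illustration of what the mixed-clock farm amounts to, the sketch below expresses the 96 CPUs in units of one 450 MHz CPU. The assumption that throughput scales linearly with clock speed is ours, not the slide's, and the names `FARM` and `equivalent_cpus` are made up for this example.

```python
# Farm composition from the slide: (clock in MHz, number of CPUs).
FARM = [(450, 32), (600, 32), (700, 32)]
REFERENCE_MHZ = 450  # assumption: normalize to the slowest CPU type


def total_cpus(farm):
    """Plain CPU count, regardless of clock speed."""
    return sum(count for _, count in farm)


def equivalent_cpus(farm, reference_mhz=REFERENCE_MHZ):
    """Capacity in units of one reference CPU, assuming throughput ~ clock."""
    return sum(mhz * count for mhz, count in farm) / reference_mhz


print("CPUs in total:", total_cpus(FARM))
print("450 MHz equivalents: %.1f" % equivalent_cpus(FARM))
```

Real jobs rarely scale exactly with clock speed (memory and I/O matter too), so the result is only an upper-level estimate.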

5 New CC-J Room
Moving the system (April 13)
Enough space to extend to triple scale
New hot hardware arrived, but needs to cool down

6 New HPSS Room
Moved during the New Year’s holiday

7 Linux Upgrade
All Linux nodes upgraded to RH6.1
Use KickStart and post installation procedure
– KickStart is the “Automated installation” feature provided by RedHat.
– Takes ~10 min. per node
– Applied to ap01~ap32; ap33~ap48 were installed the normal way
New kernel 2.2.14 + NFSv3 patch

8 Performance
PISA & Response Chain
– Au-Au central collision, 200 events
– up to 320MB of memory in response chain!!!
NFS performance test with ‘bonnie -s 2047’
– 2GB read/write
– Write 6.6 MB/s; Read 10.1 MB/s
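To put the NFS numbers in perspective, this small sketch converts the measured rates into the wall-clock time needed to move the 2047 MB test file. It only divides size by rate; the function name is illustrative and the calculation ignores caching and protocol overhead.

```python
FILE_SIZE_MB = 2047   # from 'bonnie -s 2047'
WRITE_MB_PER_S = 6.6  # measured NFS write rate
READ_MB_PER_S = 10.1  # measured NFS read rate


def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move size_mb at a constant rate."""
    return size_mb / rate_mb_per_s


print("write: %.0f s" % transfer_seconds(FILE_SIZE_MB, WRITE_MB_PER_S))
print("read:  %.0f s" % transfer_seconds(FILE_SIZE_MB, READ_MB_PER_S))
```

At these rates the write pass takes a bit over five minutes and the read pass a bit over three.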

9 Batch Queue System
PBS
– So far: server on Sun & Linux clients
    One client crash blocks the whole queuing system (scheduler)
    The development team is aware of the deadlock problem.
    Protocol change (TCP -> UDP) may solve it.
– What we’re trying: both server and client on Linux
    It looks OK. But keep monitoring...
LSF option?
– Cost?
– We may do an evaluation...

10 HPSS Status
Acceptance Test - Feb. 21 ~ 29, 2000
http://ccjsun.riken.go.jp/ccj/doc/HPSS/PRR/index.html
– Single file write/read: 10~12MB/s (Sun), 5~7MB/s (Linux)
– NB! Sun: through Gigabit, disk I/O is the bottleneck; Linux: 100BaseT
– Multiple files write/read simultaneously: >50MB/s
– pget/pput: 48 hours continuous operation
– Migration performance: >850GB/day
    Performance degraded when two tape drives worked simultaneously on one mover
    Due to PCI bus collision. Solved quickly.
HPSS Readiness Review - Mar. 15, 2000
– RIKEN, IBM Japan & IBM Houston Team
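The migration figure is quoted per day while the file transfer rates are quoted per second. The sketch below converts between the two so they can be compared; it assumes decimal units (1 GB = 1000 MB), which the slide does not specify, and the helper names are ours.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_GB = 1000  # assumption: decimal units


def gb_per_day_to_mb_per_s(gb_per_day):
    """Convert a daily volume into an average sustained rate."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


def mb_per_s_to_gb_per_day(mb_per_s):
    """Convert a sustained rate into a daily volume."""
    return mb_per_s * SECONDS_PER_DAY / MB_PER_GB


# Migration: >850 GB/day corresponds to a bit under 10 MB/s on average.
print("%.1f MB/s" % gb_per_day_to_mb_per_s(850))

# Single-file rate on Sun (10~12 MB/s) expressed as a daily volume.
print("%.0f to %.0f GB/day" % (mb_per_s_to_gb_per_day(10),
                               mb_per_s_to_gb_per_day(12)))
```

Under this assumption the migration rate is of the same order as one sustained single-file stream on the Sun side.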

11 HPSS Status (cont’d)
Random access tape
– Staging time: about 250~300 sec for 1.2GB file
Additional drives needed
– RedWood drive is going to be discontinued
– Next generation of drive announced in fall
    High Capacity (RedWood type) or Fast (Eagle type)
[Plot: Staging Time (sec), axis from 0 to 400]
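The staging time can be turned into an effective throughput per staged file, as sketched here. This is an end-to-end figure, so it presumably includes tape mount and positioning time as well as the actual read; decimal units (1.2 GB = 1200 MB) are our assumption.

```python
FILE_SIZE_MB = 1200               # 1.2 GB file, assuming 1 GB = 1000 MB
STAGING_SECONDS = (250, 300)      # observed range from the slide


def effective_mb_per_s(size_mb, seconds):
    """End-to-end rate: file size divided by total staging time."""
    return size_mb / seconds


fastest = effective_mb_per_s(FILE_SIZE_MB, min(STAGING_SECONDS))
slowest = effective_mb_per_s(FILE_SIZE_MB, max(STAGING_SECONDS))
print("effective staging rate: %.1f to %.1f MB/s" % (slowest, fastest))
```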

12 Schedule
Duplication facility
– Transfer data by RedWood tape
– DST to CC-J and MC data from CC-J to RCF
– Y. Watanabe & S. Yokkaichi work with RCF (May 15 ~ June 22)
CC-J Opening
– Target day: next month, hopefully next Core Week
    Delayed about a month for further preparation

13 Usage Guideline (Draft)
User account
– PWG simulator
– Regional scientists (PHENIX-J & PHENIX-Asia)
– Spin Physics
Disk quota
– PWG simulation
    Assigned based on a proposal
– CC-J user
    4GB/5GB (soft/hard limit) for “home”
    20GB/25GB for “work”
System backup
– At least initially, “home” directory only
No “rftp”
– HPSS will not be opened for private archiving
– Technical issue for the implementation
– Comparable feature is under consideration
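A minimal sketch of how the draft soft/hard limits could be read, assuming the conventional Unix quota meaning: the soft limit may be exceeded temporarily, the hard limit may not. The slide does not spell out the enforcement policy, so `quota_status` and its return strings are purely illustrative.

```python
# Draft CC-J user quotas from the slide: area -> (soft limit, hard limit) in GB.
QUOTAS_GB = {"home": (4, 5), "work": (20, 25)}


def quota_status(area, used_gb):
    """Classify usage against the draft limits (hypothetical helper)."""
    soft, hard = QUOTAS_GB[area]
    if used_gb >= hard:
        return "at hard limit"    # assumption: no further writes allowed
    if used_gb > soft:
        return "over soft limit"  # assumption: warning / grace period
    return "ok"


print(quota_status("home", 3.0))
print(quota_status("home", 4.5))
print(quota_status("work", 25.0))
```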

14 Summary
New room and new hardware
Configuration and test underway
HPSS “ready to use”
Batch queuing system, Duplication facility, Objectivity practice are important
Working on a usage guideline
Opening announcement --- next Core Week

15 Issues for a big simulation project
Run number server and its database
Run script
– not working yet
Simulation coordination
– How does it work?
    Retracted configuration run?
    sqrt{s}=140 GeV run?
Libraries packaging?
– To avoid problems related to AFS or network

16 VRDC-J
http://ccjsun.riken.go.jp/~hayashi/vrdc.html
PISA/Response chain production
– rather high failure rate
    “auau cent”, “pp cent Photon”
– all data sunk into HPSS
    /home/phnxsink/vrdcdataj/hijing/cent_arm/…./pythia/muon_arm/...

17 VRDC-J (cont’d)
– Total CPU time: 590 CPU*day (450MHz equiv.); 516 CPU*day (raw)
– Total PRDF size: 1.8TB (uncompressed); >14 GB (gzipped)
– PISA output size: 280GB

18 VRDC-J (cont’d)
Statistics per event
Additional 1M-event run for pp minimum bias
– 250 events/run: ~15 min/run
– PISA ZFATAL failure: 11 runs out of 4,000 runs
– PISA BIMCT stop: 1 run
– PHOOL segmentation violation: 144 runs
    NB: All of these happened on the same node, ap10. It could be specific to this node.
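The run statistics above can be cross-checked with a few lines of arithmetic. The sketch assumes that all three failure counts refer to the same 4,000 runs and that ~15 min/run holds on average; neither is stated explicitly on the slide.

```python
EVENTS = 1_000_000
EVENTS_PER_RUN = 250
MINUTES_PER_RUN = 15  # approximate, from the slide

# Failed runs by cause, as listed on the slide.
FAILED_RUNS = {
    "PISA ZFATAL": 11,
    "PISA BIMCT": 1,
    "PHOOL segmentation violation": 144,  # all on node ap10
}

runs = EVENTS // EVENTS_PER_RUN
cpu_hours = runs * MINUTES_PER_RUN / 60

failed_total = sum(FAILED_RUNS.values())
failed_without_ap10 = failed_total - FAILED_RUNS["PHOOL segmentation violation"]

print("runs:", runs)
print("CPU time: %.0f hours" % cpu_hours)
print("failure rate, all causes: %.1f%%" % (100 * failed_total / runs))
print("failure rate, excluding ap10: %.1f%%" % (100 * failed_without_ap10 / runs))
```

If the ap10 failures are indeed node-specific, the remaining failure rate is roughly an order of magnitude smaller than the overall one.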

