1 HIGUCHI Takeo, Department of Physics, Faculty of Science, University of Tokyo, representing the dBASF Development Team. Distributed BELLE Analysis Framework (BELLE/CHEP2000)

2 Introduction to the B Factory at KEK
KEK-B accelerator: KEK-B is an e+e- asymmetric-energy collider (3.5 GeV/c for positrons, 8.0 GeV/c for electrons). The design luminosity is 1.0 x 10^34 cm^-2 s^-1; KEK-B is now operated at ~5.0 x 10^32 cm^-2 s^-1.
BELLE experiment: the goal of the BELLE experiment is to study CP violation in B meson decays. The experiment is in progress at KEK.

3 BELLE Detector

4 BELLE Detector
SVD: precise vertex detection
CDC: track momentum reconstruction and particle ID with dE/dx
ACC: aerogel Cherenkov counter for particle ID
TOF: particle ID and trigger
ECL: electromagnetic calorimeter for e- and γ reconstruction
KLM: muon and K_L detection
EFC: electromagnetic calorimeter for luminosity measurement

5 Current Event Reconstruction
Computing environment: event reconstruction is performed on 8 SMP machines (UltraEnterprise x 7 servers equipped with 28 CPUs), with a total CPU power of 1,200 SPECint95; the CPUs are shared with user analysis jobs. MC production is done on a PC farm (P3 500 MHz x 4 x 16).
Reconstruction speed: 15 Hz/server, or 70 Hz/server with L4, at 5.0 x 10^32 cm^-2 s^-1.

6 Necessity for System Upgrade
In the future we will have more luminosity: 200 Hz after L4 at 1.0 x 10^34 cm^-2 s^-1. The data size may also increase, possibly due to background. This leads to a lack of computing power: we need 10 times the current computing power once DST reproduction and user analysis activities are taken into account.

7 Next Computing System
Low-cost solution: we will build a new computing farm with sufficient computing power. The computing servers will consist of ~50 units of 4-CPU PC servers running Linux and ~50 units of 4-CPU SPARC servers running Solaris. The total CPU power will be 12,000 SPECint95.

8 Configuration of Next System
[Diagram: Sun I/O servers and PC servers interconnected through a Gigabit switch and 100Base-T switch/hub; a tape library attached to the I/O servers (tape I/O: 24 MB/s); an FS file server.]

9 Current Analysis Framework
BELLE AnalysiS Framework (B.A.S.F.): B.A.S.F. supports event-by-event parallel processing on SMP machines while hiding the parallel-processing nature from users. B.A.S.F. is currently in wide use within BELLE, from DST production to user analysis. We are developing an extension to B.A.S.F. that utilizes many PC servers connected via network, to be used in the next computing system. A sketch of the event-by-event parallel-processing idea follows.
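B.A.S.F. itself is a C++ framework whose API is not shown in this talk; the following is a minimal Python sketch, under that caveat, of the core idea that a user writes a per-event module while the framework hides the parallelism. All names (user_module, read_events) are invented for illustration.

from multiprocessing import Pool

def user_module(event):
    # A user's analysis module sees one event at a time and never deals
    # with parallelism explicitly.
    return {"event": event["event"], "ntracks": len(event["tracks"])}

def read_events(n):
    # Stand-in for the event input stream (a tape or disk file in reality).
    for i in range(n):
        yield {"event": i, "tracks": list(range(i % 5))}

if __name__ == "__main__":
    # The framework spawns worker processes and feeds them events one by one,
    # much as an SMP B.A.S.F. session uses its CPUs.
    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(user_module, read_events(100)):
            pass  # here the framework would accumulate histograms / output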

10 New Analysis Framework
The new framework should provide: event-by-event parallel processing capability over the network; resource-usage optimization (maximize the total CPU usage and draw the maximum I/O rate from the tape servers); the capability to handle purposes other than DST production (user analysis, Monte Carlo simulation, or anything else); and applicability to parallel processing at university sites.
dBASF (Distributed B.A.S.F.) is the resulting super-framework for B.A.S.F.

11 Link of dBASF Servers
[Diagram: a Job Client and a Resource manager linked to B.A.S.F. and I/O processes on SPARC and PC servers; the B.A.S.F. daemons report resource usage, the Resource manager dynamically changes the node allocation, and the client initiates/terminates B.A.S.F.]

12 Communication among Servers
Functionality: call a function on a remote node by sending a message; shared memory extended over the network.
Implementation: NSM (Network Shared Memory), a home-grown product originally used for the BELLE DAQ, based on TCP and UDP. A sketch of the message-driven remote call pattern follows.
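The NSM interface itself is not given in the slides; the sketch below only illustrates, with invented names (register, serve, remote_call) and plain TCP, the general pattern of triggering a named function on a remote node by sending it a message.

import json, socket, threading, time

HANDLERS = {}

def register(name, fn):
    HANDLERS[name] = fn

def serve(port):
    # Listen for messages and dispatch each one to the named handler function.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        msg = json.loads(conn.makefile().readline())
        reply = HANDLERS[msg["fn"]](*msg["args"])
        conn.sendall((json.dumps(reply) + "\n").encode())
        conn.close()

def remote_call(host, port, fn, *args):
    # Send a message naming the function and its arguments; wait for the reply.
    with socket.create_connection((host, port)) as c:
        c.sendall((json.dumps({"fn": fn, "args": list(args)}) + "\n").encode())
        return json.loads(c.makefile().readline())

if __name__ == "__main__":
    register("start_basf", lambda script, ncpu: {"status": "started", "ncpu": ncpu})
    threading.Thread(target=serve, args=(9090,), daemon=True).start()
    time.sleep(0.2)  # give the listener a moment to bind
    print(remote_call("localhost", 9090, "start_basf", "dst_prod.basf", 4))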

13 Components of dBASF
dBASF Client (user interface): accepts from the user a B.A.S.F. execution script and the number of CPUs to be allocated for the analysis. It asks the Resource manager to allocate B.A.S.F. daemons, and the Resource manager returns the allocated nodes. It then initiates B.A.S.F. execution on those nodes and waits for completion; the B.A.S.F. daemons notify it when the job ends. A sketch of this sequence follows.
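A rough sketch of that sequence, with hypothetical stand-ins (request_allocation, start_basf, wait_for_completion) in place of the real NSM-based messages:

def request_allocation(ncpu):
    # Stand-in for asking the Resource manager; here we simply fabricate node
    # names, assuming 4 CPUs per node.
    return ["pc%02d" % i for i in range(max(1, ncpu // 4))]

def start_basf(node, script):
    print("initiating B.A.S.F. on %s with script %s" % (node, script))

def wait_for_completion(nodes):
    # In reality the client blocks until every B.A.S.F. daemon signals job end.
    print("waiting for %d nodes ... done" % len(nodes))

def run_dbasf_job(script_path, ncpu):
    nodes = request_allocation(ncpu)    # 1. ask the Resource manager for daemons
    for node in nodes:                  # 2. initiate B.A.S.F. on each allocated node
        start_basf(node, script_path)
    wait_for_completion(nodes)          # 3. wait until the daemons report completion

if __name__ == "__main__":
    run_dbasf_job("dst_production.basf", ncpu=16)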

14 Components of dBASF
Resource Manager: collects resource usage (CPU load and network traffic rate) from the B.A.S.F. daemons through NSM shared memory, and monitors idling B.A.S.F. daemons of each dBASF session. It increases or decreases the number of allocated B.A.S.F. daemons dynamically whenever a better assignment is discovered; a small sketch follows.
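A minimal sketch of how such a manager might spot idling daemons from the collected usage data; the data layout and the 20% idle threshold are assumptions, and the real figures live in NSM shared memory.

from dataclasses import dataclass

@dataclass
class NodeUsage:
    node: str
    cpu_load: float   # fraction of CPU in use, 0.0 .. 1.0
    net_rate: float   # network traffic rate in MB/s

def idling_nodes(usages, cpu_threshold=0.2):
    # Daemons whose CPUs are mostly idle are candidates for reassignment
    # to another dBASF session.
    return [u.node for u in usages if u.cpu_load < cpu_threshold]

if __name__ == "__main__":
    session = [NodeUsage("pc01", 0.95, 8.0),
               NodeUsage("pc02", 0.10, 1.0),
               NodeUsage("pc03", 0.92, 7.5)]
    print("idling daemons:", idling_nodes(session))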

15 Components of dBASF
B.A.S.F. Daemon: runs on each computing server. It accepts an 'initiation request' from the dBASF client and forks B.A.S.F. processes, and it reports resource usage to the Resource manager through NSM shared memory. A sketch of the fork step follows.
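A minimal sketch of the fork step, assuming a Unix host (as on the Linux/Solaris farm) and a 'basf' executable on the path; the command line is hypothetical.

import os, sys

def spawn_basf(script_path):
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with a B.A.S.F. process.
        try:
            os.execvp("basf", ["basf", script_path])
        finally:
            os._exit(1)   # only reached if the exec fails
    return pid            # parent (the daemon) keeps the child's pid

if __name__ == "__main__":
    # Normally triggered by an initiation request arriving over NSM.
    child = spawn_basf("dst_production.basf")
    print("forked B.A.S.F. process", child, file=sys.stderr)
    os.waitpid(child, 0)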

16 Components of dBASF
I/O Daemon: reads tapes or disk files and distributes the events over the network to the B.A.S.F. processes running on each node, then collects the processed data back over the network and writes them to tape or disk. In the case of Monte Carlo event generation, the event-generator output is distributed to the B.A.S.F. processes running the detector simulation. A sketch of the distribution loop follows.
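A minimal sketch of the distribution loop; the round-robin policy and the stubbed network send are assumptions, since the slides state only that events are shipped to the nodes over the network.

import itertools

def read_events(path):
    # Stand-in for reading raw events from tape or a disk file.
    for i in range(10):
        yield "event-%d from %s" % (i, path)

def send_to_node(node, event):
    # In reality: a TCP/IP transfer to the B.A.S.F. process on that node.
    print("%s <- %s" % (node, event))

def distribute(path, nodes):
    # Cycle through the worker nodes, handing out one event at a time.
    for node, event in zip(itertools.cycle(nodes), read_events(path)):
        send_to_node(node, event)

if __name__ == "__main__":
    distribute("/dev/rmt/0", ["pc01", "pc02", "pc03"])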

17 Components of dBASF
Miscellaneous servers: a Histogram server merges the histogram data accumulated on each node, and an Output server collects the standard output of each node and saves it to a file. A sketch of the histogram merge follows.
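A minimal sketch of the histogram merge, assuming identically binned histograms collected from each node; the dictionary layout is invented for illustration.

def merge_histograms(per_node_histos):
    # Bin-by-bin addition of same-named histograms from every node.
    merged = {}
    for histos in per_node_histos:
        for name, bins in histos.items():
            if name not in merged:
                merged[name] = [0] * len(bins)
            merged[name] = [a + b for a, b in zip(merged[name], bins)]
    return merged

if __name__ == "__main__":
    node1 = {"ntracks": [5, 9, 2], "pt": [1, 0, 4]}
    node2 = {"ntracks": [3, 1, 7], "pt": [2, 2, 2]}
    print(merge_histograms([node1, node2]))  # {'ntracks': [8, 10, 9], 'pt': [3, 2, 6]}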

18 Resource Management
Best performance: achieved when the total I/O rate is at its maximum with the minimum number of CPUs.
Dynamic load balancing: if a job is CPU bound, increase the number of computing servers so that the I/O speed reaches its maximum; if it is I/O bound, decrease the number of computing servers so as not to change the I/O speed.

19 Resource Management
Load balancing: when n_now CPUs are assigned to a job, the best number of CPUs to assign, n_new, is given by the expression shown on the slide.
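The expression itself did not survive the transcription. Purely as a hedged reconstruction from the rule on the previous slide (draw the maximum I/O rate with as few CPUs as possible), one plausible form, with R_now the I/O rate currently drawn by the n_now CPUs and R_max the maximum rate the I/O servers can deliver, would be

\[ n_{\mathrm{new}} = n_{\mathrm{now}} \times \frac{R_{\mathrm{max}}}{R_{\mathrm{now}}} \]

capped by the number of CPUs actually available; this is an assumption rather than the slide's own expression.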

20 Resource Management
[Diagram: B.A.S.F. daemons report their resource usage to the Resource manager; when the current allocation is found not to be the best one, nodes are increased or decreased and the Job Client initiates or terminates B.A.S.F. on them accordingly.]

21 Data Flow
[Diagram: an I/O process on a SPARC server sends raw data over TCP/IP to B.A.S.F. on the PC servers and receives the processed data back; histogram and STDOUT data are collected from the PC servers.]

22 Status
A system test is in progress on the BELLE PC farm, which consists of 16 units of P3 550 MHz x 4 servers. The node-to-node communication framework has been developed and is being tested, and the resource-management algorithm is under study. Basic speed tests of network data transfer have been finished: FastEther (point-to-point and 1-to-n) and GigabitEther (point-to-point and 1-to-n). The new computing system will be available in March 2001.

23 Summary
We will build a computing farm of 12,000 SPECint95 with PC Linux and Solaris servers to solve the computing-power shortage we are facing. We have begun to develop a management scheme for this computing system by extending the current analysis framework. We have developed the communication framework and are studying the resource-management algorithm.

