
1 The BTeV Trigger A Model Application for the Developers of Real Time and Embedded Technologies Joel N. Butler, Fermilab Workshop on High Performance, Fault-Adaptive Large Scale Real-Time Systems Vanderbilt University Nov. 14-15, 2002

2 What’s BTeV An experiment at the Fermilab Tevatron Collider to study the matter-antimatter asymmetry in decays of particles containing the b-quark, a quark about 5 times the mass of the proton which decays with a mean lifetime of 1.5 picoseconds (10⁻¹² s). When produced at high energies, Einstein’s time dilation allows these particles to travel a few millimeters from their production point before they decay. BTeV has enough tracking precision to reconstruct both the interaction vertex and the decay vertex, and can therefore isolate and study the decays of b-particles. The goal is to study the asymmetry (difference in rate) between the decays of b-particles and b-antiparticles. Without these kinds of asymmetries, all matter would have found an antimatter particle to annihilate with into pure energy, and there would be no matter excess to form the universe! While I care about this problem, the main point here is that in attacking it, we’ve had to develop some hardware and software which, viewed abstractly, might be useful to YOU.

3 What’s a Trigger A Trigger is a FILTER. It selects some high energy data – in the form of records of individual collisions – to “save” and consigns the rest to oblivion. The BTeV trigger is a filter with a vengeance, involving thousands of computers operating in parallel to make sophisticated selections at 7.6 MHz.
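In software terms, a trigger is nothing more than a YES/NO predicate applied to a stream of event records. A minimal sketch – the event fields and the cut here are invented placeholders, not BTeV’s actual selection:

```python
# A trigger is a YES/NO predicate over an event stream.
# Event shape and cut are illustrative placeholders only.

def trigger(event):
    """Return True to save the event, False to consign it to oblivion."""
    return event["n_detached_tracks"] >= 2  # placeholder selection

events = [{"id": i, "n_detached_tracks": i % 4} for i in range(8)]
saved = [e for e in events if trigger(e)]
print(f"kept {len(saved)} of {len(events)} events")
```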

4 What’s a Model? Webster’s Third New International Dictionary has 14 definitions, among which are:
–A structural design or pattern
–A person or thing regarded as worthy of imitation
–A person or thing that serves as a pattern or source of inspiration for an artist or writer
–One who is employed to display clothes or appear in displays of other merchandise

5 Outline
–What a trigger does in high energy physics experiments and why triggers started out “highly specialized”
–How triggers got less “specialized”
–The BTeV trigger as a real implementation
–The BTeV trigger as a model or abstraction
–What needs to be done to exploit the model

6 The What and Why of Triggers
–High energy collisions are single events, usually the result of a “beam projectile” colliding with either a nucleus in a solid target (fixed target experiment) or a projectile in a second beam coming from the opposite direction (colliding beam experiment).
–Most high energy physics events are “ordinary”, i.e. “understood” (sometimes that means not really understood, but at least “familiar”).
–One is usually looking for “rare” events.
–For one reason or another, shifting over time, it has been impossible to record “every” event.
–In HEP, a “trigger system” is the collection of hardware and associated software used to select (YES/NO) which events are recorded to an archival medium, and therefore available to analysis, and which are discarded, i.e. lost forever.

7 Some Features
–It must run in quasi-real time: decisions must keep up with the rate of interactions coming from the experiment. The BTeV trigger makes a decision every 132 ns on average.
–It is “mission critical”: a defective trigger can throw out the “good” events (as well as, or instead of, the bad).
–It must be well understood: a malfunction in the trigger can create selection biases that make it very hard to extract information from the signal events.
–We worry that it can be inherently physics “biased”: if you set it up to be efficient at selecting what you are looking for, how will you ever find the unexpected?
–You would not use a “trigger” if you could record and subsequently analyze every event that was produced. The need for a trigger is always due to the scarcity of some resource. Only 1 collision in 500 has a b-quark.

8 A High Energy Collision
–There is one of these every 132 ns (7.6 MHz).
–Events have vastly different numbers of produced particles, and so a big variation in the number of struck channels.
–The detector response is faster than the 132 ns interval between consecutive events.
–For this detector, there are 25 million channels, but only a few thousand have signals from tracks passing through in any given event.
–The total event – all detectors – is about 200 kBytes, for a raw, sparsified rate of 1.5 Terabytes/second.
–Runs last for about 8 hours each and go on 24 x 7.
[Image: event display from the 30-station pixel detector]
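The quoted raw rate follows directly from the event size and the crossing rate; a quick back-of-the-envelope check:

```python
# Check of the raw data rate: event size times crossing rate.
event_size_bytes = 200e3     # ~200 kBytes per event, all detectors
crossing_rate_hz = 7.6e6     # one crossing every 132 ns
rate = event_size_bytes * crossing_rate_hz
print(f"raw rate ~ {rate / 1e12:.2f} TB/s")  # ~1.5 TB/s
```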

9 Various Limitations
–Detector deadtime: early detectors needed to be put into a sensitive state and were “fired” or “triggered” when there was an interaction. While they were “recovering”, other interactions were missed, so you only wanted to “trigger” on good events.
–Trigger deadtime: sometimes data taking was impossible while the trigger was making up its mind.
–Readout deadtime: sometimes the detector could not accept data while it was being read out.
–Storage limitations: sometimes archiving could not keep up and, without affordable buffering, events were lost.
Most of these limitations can now be avoided thanks to improvements in the speed and price of electronics, and to pipelining and buffering using cheap disk and memory, leaving…

10 Data Storage/Data Access Limitations
–A typical experiment works very hard to get down to 200 MBytes of output per second.
–In a typical run of 1 year, accounting for scheduled downtime, accidental downtime, deadtime, etc., this results in 2 PBytes/year.
–Typically, the output of the computations done on this data triples this number, to of order 6-8 PBytes per year.
–Not just affording the storage, but also being able to access such large datasets, is a final limitation.
–To achieve this, the BTeV trigger must select, at most, 1 event out of 2000; the CMS and ATLAS triggers (CERN LHC) must select 1 “event” out of 40,000!
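These figures are mutually consistent if one assumes the common HEP rule of thumb of roughly 10^7 live seconds per operating year – an assumption of this sketch, not a number from the slide:

```python
# Yearly storage at 200 MB/s, and the rejection the trigger must deliver.
output_rate = 200e6        # bytes/s to archive
live_seconds = 1e7         # assumed live time per year (~1/3 of a calendar year)
print(f"storage ~ {output_rate * live_seconds / 1e15:.0f} PB/year")   # ~2 PB

crossing_rate_hz = 7.6e6   # BTeV input rate
accepted_rate_hz = 4e3     # BTeV trigger output (see the block diagram below)
print(f"rejection ~ 1 in {crossing_rate_hz / accepted_rate_hz:.0f}")  # ~1 in 1900
```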

11 Early Systems
–Early systems used the simplest aspects of a collision for the trigger: energy deposition in a few detectors.
–Later, they began to use more complicated quantities, such as the total transverse energy – Σᵢ Eᵢ × sin θᵢ, a sum of energy deposits weighted by polar angle – which could be computed on specialized hardware boards with weighting schemes or ALUs.
–Since this usually took longer and caused more deadtime, only events which passed the simple hardware trigger – now called “Level 1” – were sent to this new subsystem, “Level 2”.
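For concreteness, the Level 2 quantity is just a weighted sum over calorimeter deposits; a toy version with made-up values:

```python
# Total transverse energy: sum of deposits weighted by polar angle.
import math

deposits = [(12.0, 0.3), (5.5, 1.2), (8.1, 2.0)]  # (E_i in GeV, theta_i in rad)
et_total = sum(e * math.sin(theta) for e, theta in deposits)
print(f"total E_T = {et_total:.2f} GeV")
```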

12 Enter Computing – at Level 3 Once microprocessors became available, it became the practice to add a “Level 3” to the trigger hierarchy, which used FARMS or CLUSTERS of general purpose processors to do much more sophisticated computations, but on the relatively small number of events passing Levels 1 and 2. In HEP, these were an outgrowth of the use of FARMS or CLUSTERS to exploit the “embarrassingly trivial parallelism” of OFFLINE analysis – i.e. each event is an (almost) independent analysis problem as far as event reconstruction goes.
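That “embarrassingly trivial parallelism” is just a worker pool over independent events; a sketch of the pattern (not BTeV’s actual framework):

```python
# Level 3 farm pattern: each event is an independent reconstruction job.
from multiprocessing import Pool

def reconstruct(event):
    # Stand-in for full event reconstruction; a toy computation here.
    return sum(hit * hit for hit in event)

if __name__ == "__main__":
    events = [list(range(i, i + 100)) for i in range(1000)]
    with Pool() as pool:
        results = pool.map(reconstruct, events)  # farm events to workers
    print(f"reconstructed {len(results)} events")
```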

13 Computing Invades Level 2! Within a very few years, general purpose computing was appearing in subcomponents of the Level 2 triggers and even, in a few cases, for specialized purposes in Level 1. At the same time, FPGAs, PLAs, associative memories, etc. were beginning to blur the distinction between computers and combinational logic.

14 Rationale for Trigger Hierarchy
–Collision rate = R1 = input rate to Level 1. The average decision time is 1/R1. If the L1 accept fraction is f1, the output rate is f1 × R1.
–The input rate to Level 2 is f1 × R1, so the average decision time is 1/(f1 × R1). If the L2 accept fraction is f2, the output rate is R1 × f1 × f2.
–The input rate to Level 3 is R1 × f1 × f2, with decision time 1/(R1 × f1 × f2). The Level 3 accept fraction, f3, is usually set by storage considerations.
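Plugging in the BTeV figures quoted on the following slides (7.6 MHz input, ~100× reduction at Level 1, ~20× at Level 2/3) shows how each stage buys the next one more decision time; a sketch:

```python
# Rate cascade through the trigger hierarchy, with BTeV-like numbers.
R1 = 7.6e6                   # collision rate into Level 1 (Hz)
accepts = [1 / 100, 1 / 20]  # accept fractions: f1, then the combined f2*f3

rate = R1
for level, f in enumerate(accepts, start=1):
    print(f"stage {level}: {1e6 / rate:.3f} us/decision, "
          f"accept {f:.3f} -> {f * rate:,.0f} Hz out")
    rate *= f
# Final output ~3,800 Hz, the ~4 kHz on the block diagram.
```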

15 BTeV: Every Event Gets Computed
–The final step is to extend computing to all aspects of the trigger.
–The different levels are now distinguished mainly by the complexity of the algorithm, although at present there are still minor differences in the hardware at the various levels, strictly due to cost considerations.

16 BTeV Spectrometer

17 The BTeV Level 1 Vertex Trigger Key Points:
–This is made possible by a vertex detector with excellent spatial resolution, fast readout, low occupancy, and 3-D space points.
–A heavily pipelined and parallel processing architecture using inexpensive processing nodes optimized for specific tasks – ~3000 processors (DSPs).
–Sufficient memory (~1 Terabyte) to buffer the event data while calculations are carried out.
The trigger will reconstruct every beam crossing and look for TOPOLOGICAL evidence of a B decaying downstream of the primary vertex. Runs at 7.6 MHz!
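The ~1 Terabyte of buffer memory translates directly into a latency budget via Little’s law: capacity divided by input rate gives the average time a crossing can wait while it is processed. A back-of-the-envelope check, assuming the full raw rate flows into the buffer:

```python
# Little's law: average time in system = buffer capacity / input rate.
buffer_bytes = 1e12     # ~1 TB of event-buffer memory
input_rate = 1.5e12     # bytes/s entering the trigger
budget_s = buffer_bytes / input_rate
print(f"average latency budget ~ {budget_s * 1e3:.0f} ms")  # ~670 ms
print(f"crossings in flight ~ {budget_s * 7.6e6:,.0f}")     # millions
```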

18 BTeV trigger block diagram
[Block diagram: 7.6 MHz of crossings at 1.5 TB/s enter Level 1 (rate reduction ~100×); Level 2/3 reduces the rate a further ~20×, leaving 4 kHz at ~800 MB/s, written out at 200 MB/s after 4× compression.]

19 Level 1 vertex trigger architecture
[Diagram: 30-station pixel detector → FPGA segment trackers → merge → switch (sort by crossing number) → track/vertex farm (~2500 processors) → trigger decision to Global Level 1.]
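The “sort by crossing number” switch is essentially an event builder: fragments produced in parallel by the segment trackers are regrouped by crossing before any farm node sees them. A minimal sketch of that regrouping – the fragment format and tracker count are invented for illustration:

```python
# Regroup segment-tracker fragments by crossing number.
from collections import defaultdict

N_TRACKERS = 4  # illustrative; the real system has many parallel trackers

def build_crossings(fragments):
    """fragments: (crossing_number, tracker_id, data) tuples in arbitrary
    arrival order. Yields each crossing once all its pieces are present."""
    pending = defaultdict(dict)
    for crossing, tracker, data in fragments:
        pending[crossing][tracker] = data
        if len(pending[crossing]) == N_TRACKERS:
            yield crossing, pending.pop(crossing)

# Fragments arrive interleaved across crossings and trackers:
arrivals = [(c, t, f"seg{c}.{t}") for t in range(N_TRACKERS) for c in range(3)]
for crossing, data in build_crossings(arrivals):
    print(f"crossing {crossing} complete: {sorted(data)}")
```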

20 L1 vertex trigger algorithm
Generate a Level-1 accept if 2 or more “detached” tracks in the BTeV pixel detector satisfy cuts on p_T² (in (GeV/c)²) and on detachment from the primary vertex (in cm).
[Flow chart: execute trigger]
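In code, the accept decision is a count of tracks passing the cuts. A sketch with placeholder cut values (the slide’s actual numbers are in the figure), where b is a track’s impact parameter with respect to the primary vertex and sigma_b its uncertainty:

```python
# Sketch of the L1 accept logic. All three cut values are placeholders,
# not BTeV's published cuts.
PT2_CUT = 0.25      # minimum pT^2 in (GeV/c)^2 -- placeholder
SIG_CUT = 3.0       # minimum detachment significance b/sigma_b -- placeholder
B_MAX = 2.0         # maximum impact parameter in cm -- placeholder

def is_detached(track):
    return (track["pt2"] > PT2_CUT
            and track["b"] / track["sigma_b"] > SIG_CUT
            and track["b"] < B_MAX)

def level1_accept(tracks):
    return sum(1 for t in tracks if is_detached(t)) >= 2

tracks = [{"pt2": 0.5, "b": 0.10, "sigma_b": 0.01},
          {"pt2": 0.9, "b": 0.20, "sigma_b": 0.02},
          {"pt2": 0.3, "b": 0.001, "sigma_b": 0.01}]
print(level1_accept(tracks))  # True: two tracks pass the placeholder cuts
```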

21 Pixel L1 Trigger Finds the primary vertex, identifies tracks which miss it, and calculates the significance of detachment, b/σ(b).
[Plots: trigger efficiency vs. the impact parameter cut N (in units of σ), for N = 1, 2, 3, 4 – ~1% efficiency for minimum bias events vs. ~74% for B_s → D_s K.]

22 The Level 2/3 Trigger
–The Level 1 trigger rejects 99% of the events while retaining nearly 75% of all useful b-quark events.
–The Level 2 and 3 trigger consists of a farm of 2500 Linux processors, which do a complete analysis – almost equivalent to the full offline analysis – using every piece of information available.
–This system applies a sophisticated set of “physics filters” to achieve a further rejection of 95%, while retaining 90% of the useful b-quark events which survived Level 1.

23 The Abstraction – A Selection Engine
–So far this looks pretty specialized.
–But almost all the pieces are “commodity devices”, and all are “programmable”.
–The only BTeV-specific part is where two data “substreams” – the pixel detector and the muon detector – are picked off and routed to the Level 1 trigger.
–Let’s abstract this by having a “new data arrival notifier” and a “data extractor”.

24 The Generalized Data Selection Engine
[Block diagram: Data Generation → Level 1 Filter → Level 2 Filter → … → Level N Filter → Persistent Storage, with transient or persistent storage buffering data between the filter stages.]
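As an abstraction, the engine is a chain of progressively more expensive, more selective filters over a buffered stream; records that survive every stage reach persistent storage. A sketch of that shape (the stage predicates are illustrative):

```python
# Generalized selection engine: a cascade of filters, cheapest first.
def selection_engine(stream, filters):
    """Yield records that pass every filter stage in order."""
    for record in stream:
        if all(passes(record) for passes in filters):
            yield record  # survivor -> persistent storage

level1 = lambda r: r["energy"] > 10.0      # cheap, fast cut
level2 = lambda r: len(r["tracks"]) >= 2   # costlier, runs on fewer records

stream = ({"energy": e, "tracks": list(range(e % 5))} for e in range(100))
kept = list(selection_engine(stream, [level1, level2]))
print(f"archived {len(kept)} of 100 records")
```

Note that all() short-circuits, so the expensive stage only ever runs on records the cheap stage accepted – exactly the rationale for the trigger hierarchy.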

25 Data? From sensors:
–HEP, nuclear physics
–Space science, astrophysics observations
–Earth science, geology
Communications streams, data mining:
–Email, Internet traffic
–Written, graphic, verbal files
Pattern matching…

26 What Software is Needed for Something This Complex to Work?
–A toolkit of parallelization software to split the computations among levels and among many computers at each level. THERE ARE SEVERAL TOOLKITS TO DO THIS.
–A toolkit of fault-tolerant, fault-adaptive software to make sure all is working, and that data is not getting lost or miscalculated: CPU and network problems. THERE IS NOTHING AVAILABLE TO DO THIS AT THE SCALE OF THOUSANDS OF PROCESSORS!!!!
–In real-time cases, software to check that the apparatus is working, which is another class of fault. This must be handled at the application level but can use many of the elements of the toolkits above. DITTO!!!!
–All this must be made to look “simple” to an operator or analyst.

27 Fault Tolerance
–The trigger is working on many beam crossings at once. To achieve high utilization of all processors, it makes decisions as quickly as possible. There is no fixed latency, and events do not emerge in the same time-ordered sequence in which they entered the system.
–Keeping the trigger system going, and being sure it is making the right decisions, is a very demanding problem: 6000-12,000 processing elements – FPGAs, DSPs, and commercial Linux processors.
–We have to write a lot of sophisticated fault-tolerant, fault-adaptive software.
–We are joined by a team of computer scientists who specialize in fault-tolerant computing, under an award of $5M over 5 years from the US NSF.
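With no fixed latency, “is the system healthy?” becomes bookkeeping: every crossing that enters must eventually produce a decision, and anything in flight too long signals a fault. A minimal sketch of that accounting – the timeout and interfaces are invented for illustration, not the actual fault-management software:

```python
# Detect lost events when decisions return out of order, with no
# fixed latency. Timeout and structure are illustrative only.
import time

class CrossingTracker:
    TIMEOUT_S = 5.0  # placeholder: far beyond any sane decision latency

    def __init__(self):
        self.in_flight = {}  # crossing number -> entry timestamp

    def entered(self, crossing):
        self.in_flight[crossing] = time.monotonic()

    def decided(self, crossing):
        self.in_flight.pop(crossing)  # KeyError = duplicate/unknown decision

    def stuck(self):
        now = time.monotonic()
        return [c for c, t in self.in_flight.items()
                if now - t > self.TIMEOUT_S]

tracker = CrossingTracker()
for c in range(5):
    tracker.entered(c)
for c in (3, 0, 4):               # decisions arrive out of order -- fine
    tracker.decided(c)
print(sorted(tracker.in_flight))  # crossings 1 and 2 still pending
```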

28 [Architecture diagram, spanning hard real-time (L1/DSP) to soft real-time (L2,3/RISC):
–Each L1 DSP node runs trigger algorithms plus a local operations manager and local fault manager on ARMOR/RTOS; each L2/3 RISC node runs the same stack on ARMOR/Linux.
–Nodes connect through logical data networks and logical control networks to region operations and fault managers, with a global operations manager and global fault manager above them.
–A runtime design-and-analysis layer feeds the system: system models, reliability analysis, diagnosability analysis, performance simulation, resource synthesis, algorithm fault behavior, reconfiguration behavior, interface synthesis, and feedback modeling, plus an experiment interface.]

29 Conclusion
–We believe/hope that many applications can use this kind of system – if not in detail, then at least its abstraction.
–We believe that the “fault-adaptive, fault-tolerant layer” is a key issue that will make it safe for non-experts, such as operators, to use.
–We hope that you will help us identify promising application areas. We expect that these areas will have new requirements or different concerns/emphases than HEP. This is your chance to influence the R&D!

