LHCb upgrade Workshop, Oxford, 07.12.2010 Xavier Gremaud (EPFL, Switzerland)

1 LHCb upgrade Workshop, Oxford, 07.12.2010 Xavier Gremaud (EPFL, Switzerland)

2
- Data flow
- Input data format
- Time reordering
- Clusterization
- Output format
- Conclusion

3
- Split the GBT data into 2 x 40b
- Time reordering
- Reconstruct the Super Pixel Packet (SPP), 80b wide
- Linker 0: assemble data from 2 SPP data streams
- Clusterization + ToT correction (subtraction; maybe lookup-table-based calibration) (two identical blocks)
- Data from two column processors

4
- Linker 1: assemble data from 3 GBT, 64b -> 128b
- Linker 2: assemble data from 2x3 GBT, 128b -> 256b
- Linker 3: assemble data from 2x2x3 GBT, 256b -> 512b
- Linker 4: assemble data from 2x2x2x3 GBT, 512b
- MEP assembly (note: the average event is only 2 to 4 512-bit words long)
- External memory, 2 x 256b
- Ethernet framer, 512b
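The fan-in of the linker stages can be checked with a few lines of Python. This is only a sketch of the counting, not of the firmware; the function name `linker_fan_in` is hypothetical, and it assumes that every stage after Linker 1 merges two outputs of the previous stage, as the 2x3, 2x2x3 and 2x2x2x3 labels suggest.

```python
def linker_fan_in(first_stage_links=3, stages=4):
    """Return the number of GBT links covered after each linker stage.

    Assumption: Linker 1 merges `first_stage_links` GBT links and every
    later stage merges two outputs of the stage before it.
    """
    links = first_stage_links
    covered = []
    for _ in range(stages):
        covered.append(links)
        links *= 2
    return covered


fan_in = linker_fan_in()
for stage, n_links in enumerate(fan_in, start=1):
    print(f"Linker {stage}: data from {n_links} GBT links")
```

Under that assumption the last stage covers 24 GBT links, which agrees with the 24 links quoted elsewhere in the talk.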

5
- For 1 link: 80b / 25ns = 3.2 Gb/s
- For 24 links: 77 Gb/s
- The 80b-wide GBT word is divided into two 40b data streams, which are filled by the column processor (fixed position in the 80b data word).
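The two rates follow directly from the word width and the 25ns period. A short Python check, with constant names chosen here for illustration:

```python
WORD_BITS = 80       # GBT word width, from the slide
BX_PERIOD_NS = 25    # one GBT word every 25 ns, from the slide
N_LINKS = 24         # number of GBT links, from the slide


def link_rate_gbps(word_bits=WORD_BITS, period_ns=BX_PERIOD_NS):
    """Bits per nanosecond is numerically equal to Gb/s."""
    return word_bits / period_ns


per_link = link_rate_gbps()
total = per_link * N_LINKS
print(f"per link: {per_link:.1f} Gb/s, {N_LINKS} links: {total:.1f} Gb/s")
```

The total comes out at 76.8 Gb/s, which the slide rounds to 77 Gb/s.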

6
- The RAM space is divided into 512 equally sized memory blocks (space is reserved for data arriving in random order).
- The RAM location is defined by the LSBs of the BxID (BCNT).
Note: the total memory space required is the maximum time delay allowed multiplied by the maximum event size allowed (space has to be reserved for every event!).
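The addressing scheme can be modelled in software as follows. This is a behavioural sketch only, not the VHDL design: the class and function names are hypothetical, and the slot size of 8 words is an assumption taken from the buffer configuration discussed on the following slide.

```python
N_SLOTS = 512          # equally sized memory blocks, from the slide
WORDS_PER_EVENT = 8    # assumed slot size in words


def slot_index(bcnt):
    """Select the slot from the LSBs of the bunch counter (9 bits for 512 slots)."""
    return bcnt & (N_SLOTS - 1)


class ReorderBuffer:
    """Software model of a time reorder buffer with one fixed slot per event."""

    def __init__(self):
        self.ram = [None] * (N_SLOTS * WORDS_PER_EVENT)
        self.fill = [0] * N_SLOTS   # words currently stored in each slot

    def write(self, bcnt, word):
        """Store a word that may arrive out of time order; False if the slot is full."""
        slot = slot_index(bcnt)
        if self.fill[slot] >= WORDS_PER_EVENT:
            return False
        self.ram[slot * WORDS_PER_EVENT + self.fill[slot]] = word
        self.fill[slot] += 1
        return True

    def read_event(self, bcnt):
        """Read out all words of one event and free its slot."""
        slot = slot_index(bcnt)
        base = slot * WORDS_PER_EVENT
        words = self.ram[base:base + self.fill[slot]]
        self.fill[slot] = 0
        return words


buf = ReorderBuffer()
buf.write(5, "a")
buf.write(3, "b")   # arrives after a word of event 5
buf.write(5, "c")
print(buf.read_event(3), buf.read_event(5))
```

Because a slot is reserved for every event whether or not it is used, the memory cost is the number of slots times the slot size, as the note on the slide says.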

7
- In the current FPGA, the EP4SGX530 (largest Altera Stratix IV device), «only» 64 x 144 kbit memory blocks are available.
- Choosing a time reorder buffer 512 events deep with an event size of 8 words occupies 48 memory blocks (maximum size reached!).
- Note: no other large memories are required for the other processing steps.
Conclusion:
- Each GBT link is restricted to 8 SPPs (Super Pixel Packets), each smaller than 64 bits.
- For the total pixel chip, the maximum number of SPPs is 5 x 8 = 40 per event.
- Time reordering is possible for up to 512 - 16 = 496 events.
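One possible reading that reproduces the 48 memory blocks quoted above is sketched below. The assumptions are not stated on the slide and are labelled in the code: each block holds 144 kbit, each buffer word is 64 bits wide, there is one buffer per GBT link, and there are 24 links.

```python
import math

BLOCK_BITS = 144 * 1024   # assumption: one memory block holds 144 kbit
BLOCKS_AVAILABLE = 64     # from the slide
EVENTS = 512              # buffer depth in events, from the slide
WORDS_PER_EVENT = 8       # event size in words, from the slide
WORD_BITS = 64            # assumption: one 64-bit word per SPP
N_LINKS = 24              # assumption: one reorder buffer per GBT link


def blocks_needed(events=EVENTS, words=WORDS_PER_EVENT, word_bits=WORD_BITS,
                  n_links=N_LINKS, block_bits=BLOCK_BITS):
    """Whole memory blocks needed if each link gets its own buffer."""
    bits_per_link = events * words * word_bits
    blocks_per_link = math.ceil(bits_per_link / block_bits)
    return blocks_per_link * n_links


used = blocks_needed()
print(f"{used} of {BLOCKS_AVAILABLE} blocks used")
```

With these assumptions each link needs 256 kbit, which rounds up to two blocks, so the 24 links use 48 of the 64 blocks. Doubling either the depth or the event size would exceed the device.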

8
- Clusterization requires splitting up the SPP format (for example, two isolated pixels can be in the same SPP)!
- The most obvious approach to clusterization is to use one seeding pixel and search for possible neighbours.
- It is very difficult to form "perfect" clusters: the average time per cluster is limited to 25ns if done in a pipeline, otherwise 25ns for the complete event!
- The 16b seeding hit address is reconstructed from the 12b address, the 4b row header and the 4b hitmap.
- An additional link source ID is required to identify data from the 24 different GBT links (+5 bits).

9
- The principal goal of the clusterization is data reduction; "perfect" clustering as in the TELL1 is no longer possible. Additional processing in a CPU is required to finish:
  - Forming clusters across the boundaries of GBT links
  - Combining separated clusters
  - Forming clusters for events with too high a pixel count (see illustration on the next slide)
- The cluster shape depends on the seeding hit, which is the first hit. One "normal" cluster can be split into two clusters.

10
- To pipeline the cluster search, only one cluster is formed per pipeline step.
- One pipeline step takes 25ns (200 to 300 MHz processing frequency).
- On average, the hottest region has "only" 2 to 4 pixels per event and per GBT (10 to 20 pixels per chip)!
- The cluster search is performed by searching for neighbours of the first hit in the data. Each consecutive pipeline stage has the identical function.
- The total number of clusters that can be formed is limited by the number of pipeline stages.
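The seeded search and its stage limit can be illustrated with a small software model. This is a sketch, not the proposed firmware: the function names are hypothetical, hits are modelled as (row, column) pairs, and "neighbour" is assumed to mean any of the eight pixels touching the seed, which the slides do not specify.

```python
def cluster_stage(hits):
    """One pipeline stage: the first hit is the seed, its direct neighbours join it."""
    if not hits:
        return [], []
    seed_row, seed_col = hits[0]
    cluster, rest = [], []
    for row, col in hits:
        # Assumption: 8-neighbourhood around the seed (the seed itself included).
        if abs(row - seed_row) <= 1 and abs(col - seed_col) <= 1:
            cluster.append((row, col))
        else:
            rest.append((row, col))
    return cluster, rest


def cluster_pipeline(hits, n_stages):
    """Chain identical stages; hits left after the last stage stay unclustered."""
    clusters = []
    remaining = list(hits)
    for _ in range(n_stages):
        cluster, remaining = cluster_stage(remaining)
        if not cluster:
            break
        clusters.append(cluster)
    return clusters, remaining


# Three hits in a row: the result depends on which hit comes first.
print(cluster_pipeline([(0, 0), (0, 1), (0, 2)], n_stages=4))
print(cluster_pipeline([(0, 1), (0, 0), (0, 2)], n_stages=4))
```

In the first call the seed is an end pixel, so the far end is not a neighbour of the seed and the row is split into two clusters. In the second call the seed is the middle pixel and a single cluster is formed. Hits that remain after the last stage are the ones that would need further processing in a CPU.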

11
- The cluster size is restricted to a multiple of bytes! (Data processing on the FPGA, and also on the CPU, becomes very difficult otherwise.)
- The expected data reduction from clustering, assuming 50% 1-hit and 50% 2-hit clusters, is of the order of 14%.

Hits    With clusterization    Without          Data reduction
1       29b => 32b             25b => 32b       0%
2       36b => 40b             50b => 56b       28.5%
3       43b => 48b             75b => 80b       40.0%
4       50b => 56b             100b => 104b     46.1%
5       57b => 64b             125b => 128b     50.0%
6       68b => 72b             150b => 156b     53.8%

Q: Is it worthwhile doing "not perfect" clustering for a 14% data reduction?
Q: Does the CPU benefit from such clusters?
Q: Does anybody know another feasible clustering approach?
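The reduction column can be recomputed from the padded sizes in the table. The sketch below takes the padded bit counts from the slide as given and assumes that the quoted 14% is the plain average of the 1-hit and 2-hit reductions; the names `TABLE` and `reduction` are hypothetical.

```python
# (hits, padded bits with clusterization, padded bits without), from the slide
TABLE = [
    (1, 32, 32),
    (2, 40, 56),
    (3, 48, 80),
    (4, 56, 104),
    (5, 64, 128),
    (6, 72, 156),
]


def reduction(with_bits, without_bits):
    """Fraction of bits saved by sending a cluster instead of individual hits."""
    return (without_bits - with_bits) / without_bits


reductions = {hits: reduction(w, wo) for hits, w, wo in TABLE}
for hits, value in reductions.items():
    print(f"{hits} hits: {100 * value:.1f}%")

# Assumed mix: half of the clusters have 1 hit, half have 2 hits.
mix = 0.5 * reductions[1] + 0.5 * reductions[2]
print(f"50/50 mix of 1-hit and 2-hit clusters: {100 * mix:.1f}%")
```

The average of 0% and about 28.6% is about 14.3%, consistent with the figure on the slide.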

12
- After the 24 links are linked together, the data are put in the MEP format to reduce the data before the DDR3 SDRAM.
- The BCNT appears only once per event, so a small data reduction can be expected (-12 bits).

13
- The real challenge of the data processing is not to spend more than 25ns per event! Pipelining is required everywhere!
- Time reordering for 512 events reaches the limit of the FPGA internal memory.
- ToT calculation from BCNT and timestamp is no problem. Calibration per pixel is impossible!
- No more real data reduction (zero suppression) as in the TELL1.
- Small reduction from removing the BCNT (-12 bits / SPP)
- Small increase from the source ID (+5 bits / cluster)
- Small decrease from clustering (-14%)
- Largest reduction due to not fully loaded GBT links from the pixel chips furthest from the beam.
- Long-time average reduction due to empty bunch crossings.

14
- Very wide buses require large multiplexers for padding (e.g. a 512-bit bus with byte padding requires a multiplexer of 512 x 64, i.e. 32K connections). Maybe at some stage in the processing the padding has to be reduced to a 32-bit minimal size.
- Is clusterization useful and fast enough? Some tests with real data and a distribution of the cluster sizes are needed.
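The 32K figure can be reproduced under one plausible reading: every output bit of the bus must be able to select from every aligned offset at the padding granularity. The function name `mux_connections` and this interpretation are assumptions, not taken from the slides.

```python
def mux_connections(bus_bits, granularity_bits):
    """Connections needed if each bus bit selects among all aligned offsets."""
    positions = bus_bits // granularity_bits   # possible start offsets on the bus
    return bus_bits * positions


byte_padding = mux_connections(512, 8)     # 512 bits x 64 byte positions
word_padding = mux_connections(512, 32)    # 512 bits x 16 word positions
print(byte_padding, word_padding)
```

Under this reading, going from byte padding to 32-bit padding cuts the multiplexer by a factor of four, from 32768 to 8192 connections.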

15
- Implementation of the processing, including clustering, in VHDL
- Simulation of the processing with MC data
- Place and route of the design to get a better idea of the possible processing frequency and resource management

