# Belle II DAQ and possibility of SRO

S. YAMADA (KEK)

## Introduction





#### The Belle II experiment

➤ Search for new physics beyond the Standard Model (SM) via high-precision measurements with high-statistics samples of B/D/tau decays.

Measure rare decays and search for discrepancies from the SM

➤ The precision of some Belle/BaBar results showing tension with the SM is statistically limited.

- Precise Measurement of Unitarity Triangle
- Study of exotic hadrons

etc...

#### SuperKEKB accelerator

Target integrated luminosity: 50 ab<sup>-1</sup>
(cf. 1 ab<sup>-1</sup> at the Belle experiment)









## History of Belle II operation

#### <u> Phase I : (2016 Feb.-Jun.)</u>

Accelerator commissioning without the final focusing magnets and without the Belle II detector

#### Phase II : (2018 Feb.-Jul.)

- Accelerator commissioning and physics run
- with the Belle II detector except for vertex subdetectors

#### Phase III : (2019 Mar.-2022 Jul.)

Physics run with the full Belle II detector

#### LS1 (Long Shutdown 1) : (2022 Jul.-2024 Jan.)

Various upgrades and improvements of both the accelerator and detector

#### Run 2 : (2024 Feb.-)

Resume beam operation






## Belle II DAQ system









## Belle II Trigger and timing distribution(TTD) system



- Tree structure of distribution modules (FTSW)
  - Distributes trigger and timing signals to the FEE (and the readout system)
  - The readout system receives trigger information, which is used for a consistency check against the data from the FEE
  - Collects busy and error signals from the FEE and the readout system in order to pause trigger distribution
  - Custom protocol with a data rate of 254 Mbps









## BUSY handshake between Trigger-Timing Distribution(TTD) and readout system



- To avoid buffer overflow in a readout board, the board sends a BUSY signal to the FTSW (Fast Timing SWitch) module.
- The FEE continues sending data.
- When the FIFO on the readout board becomes almost full, the BUSY signal is sent to the FTSW module.
- The FTSW module then stops sending triggers (-> dead time, but no overflow).
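
The handshake described above can be illustrated with a toy simulation. This is only a sketch, not the FTSW or readout-board firmware: the function name `simulate_busy_handshake`, the FIFO depth, the almost-full threshold, and the trigger and drain probabilities are all assumptions chosen for illustration. In-flight data from the FEE is not modelled; the gap between `almost_full` and `fifo_depth` stands in for the margin that would absorb it.

```python
import random

def simulate_busy_handshake(n_ticks=10_000, trigger_prob=0.5, drain_prob=0.3,
                            fifo_depth=64, almost_full=56, seed=1):
    """Toy model of the BUSY handshake (all parameters are hypothetical).

    Each tick, a trigger candidate arrives with probability trigger_prob.
    If BUSY is asserted, the trigger is vetoed (dead time); otherwise one
    event fragment is pushed into the readout-board FIFO. Independently,
    the FIFO is drained by one fragment with probability drain_prob.
    """
    rng = random.Random(seed)
    occupancy = 0
    accepted = vetoed = max_occupancy = 0
    for _ in range(n_ticks):
        busy = occupancy >= almost_full        # readout board -> FTSW
        if rng.random() < trigger_prob:
            if busy:
                vetoed += 1                    # FTSW holds the trigger back
            else:
                accepted += 1
                occupancy += 1                 # FEE data for this trigger arrives
        max_occupancy = max(max_occupancy, occupancy)
        if occupancy > 0 and rng.random() < drain_prob:
            occupancy -= 1                     # downstream readout empties the FIFO
    return {"accepted": accepted, "vetoed": vetoed,
            "max_occupancy": max_occupancy,
            "overflow": max_occupancy > fifo_depth}

result = simulate_busy_handshake()
print(result)
```

With the trigger probability above the drain probability, the FIFO fills up to the almost-full threshold and triggers start to be vetoed, which is the dead time mentioned above; the occupancy never exceeds the FIFO depth.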






#### Readout protocol from frontend electronics

#### Belle2Link : (D. Sun et al., Physics Procedia, Volume 37, 2012, pp. 1933-1939)

Unified high-speed link connecting the Front-End Electronics (FEE) and the DAQ system for signal and data transmission, based on RocketIO

FEE side : Functions for the interface (I/F) with the FEE and the Trigger Timing Distribution, implemented on an FPGA

DAQ side : FPGA on a readout board as a data receiver







## Upgrade of the Belle II readout system in LS1

Motivation

- Difficulty in maintenance throughout the entire Belle II experiment period
- Remove the bottlenecks of the old COPPER readout board (CPU, PCI bus, Gigabit Ethernet, etc.)



## Event-building scheme in the firmware of readout board

- Before LS1, event-building was performed in the on-chip memory of the PCIe40 FPGA.
- A new scheme that uses PC server memory for event-building was developed during LS1.
  - The larger memory leaves more room to wait for data from the FEEs before the buffer fills up.



- Throughput up to the ROPC (FEE -> PCIe40 -> ROPC software): 3.4 GB/s
  - Note: data were not sent to the HLT in this measurement.
- In a high-rate test including data transfer to the HLT, 800 MB/s per ROPC at 32 kHz was achieved.
  - Trigger holdoff is the current bottleneck.
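
As a rough cross-check, and assuming the 800 MB/s figure is an average over the whole test, the two numbers quoted for the high-rate test imply an average data size per event on each ROPC of about 25 kB:

$$
\frac{800\ \mathrm{MB/s}}{32\ \mathrm{kHz}} = 25\ \mathrm{kB\ per\ event\ per\ ROPC}
$$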





## <u>High level trigger</u>

#### **Functions**

1. Event reconstruction from data of all detectors except for PXD
2. Reconstruction software developed for offline analysis is also used in HLT
3. For trigger selection, physics event selection is applied (rate reduction: 1/3)
4. ROI information from reconstructed SVD tracks is fed to PXD for data size reduction

Reconstruction for online event selection requires large CPU power $\rightarrow$ parallel processing on a large number of CPUs




#### Possibility of streaming readout @ Belle II





## Motivations for SRO at Belle II

#### L1 Trigger Menu for Low Multiplicity Physics BELLE2-NOTE-PH-2015-011

| Processes              | T1:2trk | T2:1trk1mu | T3:1mu | T4:1trk1c | T1:bbc | T2:3g | T3:3t | Combine |
|------------------------|---------|------------|--------|-----------|--------|-------|-------|---------|
| $B^0 \overline{B}{}^0$ | -       | 96.5       | 50.0   | 82.9      | 44.8   | 93.4  | 99.4  | > 99.9  |
| $B^+B^-$               | -       | 96.5       | 51.7   | 84.1      | 46.2   | 92.6  | 99.5  | > 99.9  |
| ccbar                  |         | 96.8       | 65.9   | 89.4      | 52.1   | 84.8  | 98.0  | > 99.9  |
| uds                    | -       | 96.5       | 68.0   | 89.1      | 50.0   | 81.1  | 97.2  | > 99.9  |

TABLE VIII: Efficiencies and Cross section after triggers

- Efficiency for hadronic events is already very high in Belle II
- What kinds of events can benefit from triggerless DAQ / software trigger?
  - Low-multiplicity events with lower energy
  - Displaced vertices

#### **Single Photon Search**

- Search for a massive dark photon, A', which mixes with the Standard Model photon.
- The detector signature is a single initial-state-radiation photon.





- Single photon trigger is crucial:
  - Maintaining an acceptable rate is challenging due to beam-induced backgrounds







DESY. Savino Longo (savino.longo@desy.de)

## What needs to be done in triggerless DAQ : data-format

#### Data format

- No event number
  - Instead, the time counter from the FTSW will be used (timestamp)
- It would be good to have both a coarse timestamp and a fine timestamp
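
One way to picture a coarse plus fine timestamp is a single integer word with two bit fields. The sketch below is purely illustrative: the class `Timestamp`, the 11-bit width of the fine field, and the packing order are assumptions, not the Belle II data format.

```python
from dataclasses import dataclass

# Hypothetical field width; the real format is not defined in these slides.
FINE_BITS = 11                      # fine counter: clock ticks within one coarse period
FINE_MASK = (1 << FINE_BITS) - 1

@dataclass(frozen=True, order=True)
class Timestamp:
    coarse: int   # slow counter: number of elapsed coarse periods
    fine: int     # fast counter within the current coarse period

    def __post_init__(self):
        if self.coarse < 0 or not 0 <= self.fine <= FINE_MASK:
            raise ValueError("timestamp field out of range")

    def pack(self) -> int:
        """Pack both counters into one integer word (coarse in the upper bits)."""
        return (self.coarse << FINE_BITS) | self.fine

    @classmethod
    def unpack(cls, word: int) -> "Timestamp":
        """Recover the two counters from a packed word."""
        return cls(coarse=word >> FINE_BITS, fine=word & FINE_MASK)

ts = Timestamp(coarse=123456, fine=789)
word = ts.pack()
print(hex(word), Timestamp.unpack(word))
```

Because the coarse counter sits in the upper bits, comparing packed words gives the same ordering as comparing `(coarse, fine)` pairs, which is convenient for time sorting.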



## What needs to be done in triggerless DAQ : sorting

- Time sorting
  - Instead of event-building, data from different FEEs need to be sorted in time for the software trigger.



This part is probably also resource-consuming. Using FPGAs or GPUs for the sorting would be useful. It could affect the load on the software trigger.
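
In software, time sorting of several streams can be sketched as a k-way merge. The example below is a CPU-only illustration using Python's standard `heapq.merge`, not an FPGA or GPU implementation; the function name `merge_streams` and the `(timestamp, source, payload)` tuple layout are assumptions, and each input stream is assumed to be already time-ordered.

```python
import heapq

def merge_streams(*streams):
    """Merge per-FEE hit streams into one time-ordered stream.

    Each stream is an iterable of (timestamp, source, payload) tuples that is
    assumed to be already sorted by timestamp. The merge is lazy, so it can
    run on streams that are still being produced.
    """
    return heapq.merge(*streams, key=lambda hit: hit[0])

# Illustrative hits only; timestamps are arbitrary counter values.
cdc = [(100, "CDC", b"\x01"), (250, "CDC", b"\x02"), (900, "CDC", b"\x03")]
ecl = [(120, "ECL", b"\x0a"), (260, "ECL", b"\x0b")]
klm = [(90, "KLM", b"\xff")]

merged = list(merge_streams(cdc, ecl, klm))
print([(t, src) for t, src, _ in merged])
```

A merge of k sorted streams costs O(N log k) for N hits, which is cheaper than a full sort, but it relies on each link delivering its data in time order.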





## What needs to be done in triggerless DAQ (3)

- Online event selection
- Setting the trigger window is complicated
- Check every bunch crossing?
  - LHCb: max. 40 MHz (= 25 ns)
  - Belle II: 254 MHz (= 4 ns)
    - Shorter than the timing resolution of some sub-detectors
- Set a certain trigger window that overlaps with its neighbours


The time-window size for each sub-detector depends on the timing resolution of its sensors.
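
The idea of overlapping trigger windows can be sketched as follows. The window width, the overlap, and the helper names `overlapping_windows` and `hits_per_window` are illustrative assumptions; in practice each sub-detector would need its own window size, as noted above.

```python
def overlapping_windows(t_start, t_end, width, overlap):
    """Yield (lo, hi) windows of the given width; neighbours share `overlap`."""
    if not 0 <= overlap < width:
        raise ValueError("need 0 <= overlap < width")
    step = width - overlap
    lo = t_start
    while lo < t_end:
        yield lo, lo + width
        lo += step

def hits_per_window(hit_times, windows):
    """Assign each hit to every window [lo, hi) that contains it."""
    return {w: [t for t in hit_times if w[0] <= t < w[1]] for w in windows}

# Times in ns; width and overlap are illustrative values only.
windows = list(overlapping_windows(0, 2000, width=1000, overlap=200))
grouped = hits_per_window([50, 850, 950, 1700], windows)
for window, hits in grouped.items():
    print(window, hits)
```

A hit that falls in the overlap region is seen by both neighbouring windows, so an event straddling a window boundary is not split, at the price of duplicate candidates that the software trigger has to resolve.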





<u>Three bottlenecks for triggerless DAQ</u>



#### Since an estimate for triggerless DAQ is not straightforward, a 100 kHz trigger rate is considered...

|       | # of ROPC (COPPER) | Data flow of the current system at 30 kHz, by just scaling data from May 18, 2021 [MB/s] | 100 kHz (3.3 x 30 kHz) [MB/s] | # of ROPC (PCIe40) | 100 kHz per PCIe40 [MB/s]† |
|-------|--------------------|------------------------------------------------------------------------------------------|-------------------------------|--------------------|----------------------------|
| SVD   | 9                  | 3640                                                                                     | 12133                         | 5                  | 2427                       |
| CDC   | 9                  | 613                                                                                      | 2043                          | 7                  | 292                        |
| TOP   | 3                  | 208                                                                                      | 693                           | 2                  | 347                        |
| ARICH | 6                  | 375                                                                                      | 1250                          | 2                  | 625                        |
| ECL   | 10                 | 601                                                                                      | 2003                          | 3                  | 668                        |
| KLM   | 3                  | 44.5                                                                                     | 148                           | 1                  | 148                        |
| TRG   | 3                  | 137                                                                                      | 457                           | 1                  | 457                        |
| Total | 43                 | 5619                                                                                     | 18728                         | 21                 |                            |



- Event size will increase with luminosity.
  - In the table, only the increase in SVD event size is taken into account (a factor of 3.6 at the design luminosity).
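
The scaled columns of the table can be reproduced with a few lines of arithmetic. The sketch below copies the 30 kHz data flow and the PCIe40 counts from the table and assumes, as the table header does, that the data flow scales linearly with the trigger rate; the function name `scale_dataflow` is hypothetical.

```python
# Data flow at 30 kHz [MB/s] and number of PCIe40 ROPCs, copied from the table.
DATAFLOW_30KHZ = {"SVD": 3640, "CDC": 613, "TOP": 208, "ARICH": 375,
                  "ECL": 601, "KLM": 44.5, "TRG": 137}
N_PCIE40 = {"SVD": 5, "CDC": 7, "TOP": 2, "ARICH": 2, "ECL": 3, "KLM": 1, "TRG": 1}

def scale_dataflow(rate_khz, base_khz=30.0):
    """Linearly scale the 30 kHz data flow to another trigger rate.

    Returns {detector: (total MB/s, MB/s per PCIe40)}.
    """
    factor = rate_khz / base_khz
    return {det: (flow * factor, flow * factor / N_PCIE40[det])
            for det, flow in DATAFLOW_30KHZ.items()}

scaled = scale_dataflow(100)
for det, (total, per_board) in scaled.items():
    print(f"{det:6s} {total:8.0f} MB/s  {per_board:7.0f} MB/s per PCIe40")
print(f"Total  {sum(total for total, _ in scaled.values()):8.0f} MB/s")
```

Under this scaling, SVD is the only sub-detector whose per-PCIe40 figure exceeds 1 GB/s.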



#### (2) + (3) + (4): PCIe40 firmware on ROPC



## Network bandwidth






#### The current performance estimation with 13 HLT units

#### Performance estimation of HLT reconstruction software



#### with 13 units + release 07

- Thanks to the new units and tuning of the reconstruction software, a processing capability of 20 kHz is expected with 13 HLT units.
- However, this part would still be one of the main bottlenecks if we adopted triggerless DAQ.



#### Schedule of upgrade of HLT units (# of CPU cores)

- Together with the tuning of the reconstruction software, the number of HLT units will be increased.
- The system is scalable because event-building is done before the HLT.
  - With more HLT units, different events can be processed in parallel.
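
To get a feeling for what this scalability implies, the sketch below extrapolates the 20 kHz with 13 units figure quoted earlier to other input rates. It assumes strictly linear scaling with units of the same capacity as today's, which is a simplification (newer units may have more CPU cores); the function name `hlt_units_needed` is hypothetical.

```python
import math

def hlt_units_needed(target_khz, ref_units=13, ref_khz=20.0):
    """Number of HLT units for a target input rate, assuming linear scaling."""
    return math.ceil(target_khz * ref_units / ref_khz)

for rate in (20, 30, 100):
    print(f"{rate:4d} kHz -> {hlt_units_needed(rate)} units")
```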



## Status of each sub-detector's front-end electronics

#### Pixel detector(PXD)

- Rolling-shutter readout: one frame takes 20 us, i.e. 50 kHz (this is "trigger-less")
- Data are continuously sampled and written to FEE memory
- Classic "triggering":
  - The Level-1 trigger is fed to the FEE and tells it which part of the memory to send out
  - All data can (in principle) be read out at 50 kHz (continuous readout), but the bandwidth of the FEE is the limit.
    - Capable of handling up to 1.5-2% occupancy of the pixel sensors (0.1% in the 2021 run)

Data structure in FEE to extract events with L1 trigger info.






#### Silicon Vertex Detector(SVD)

- The APV25 chip on the silicon strip sensor provides sampled analog signals for 128 channels
- Data transmission from the APV25 to the FEE takes 26.5 us for 6 samples (0.19 us window at a 32 MHz sampling clock).
  - Only 0.7% of the waveform could be sent if the Level-1 trigger were removed.
- There is a plan to replace both the PXD and the SVD with pixel sensors in SuperKEKB Long Shutdown 2 (around 2028, but not yet determined)
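
The 0.7% figure quoted for the SVD follows directly from the numbers given for the APV25 readout: six samples at 32 MHz span about 0.19 us, while shipping them out takes 26.5 us, so in continuous operation only that fraction of the time could be covered:

$$
t_\mathrm{window} = \frac{6}{32\ \mathrm{MHz}} \approx 0.19\ \mu\mathrm{s},
\qquad
\frac{t_\mathrm{window}}{t_\mathrm{transfer}} = \frac{0.19\ \mu\mathrm{s}}{26.5\ \mu\mathrm{s}} \approx 0.7\%
$$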

#### **Central Drift Chamber(CDC)**

- Hit rate: around 80 kHz per wire is expected in the inner layers
  - 48 ch per FEE board
- Data size of 1 hit: ch ID, ADC, TDC -> about 10 bytes/hit
  - 1 board x 48 ch => 40 MB/s
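
The per-board figure for the CDC is the product of the three numbers listed, taking the inner-layer hit rate for all 48 channels of a board:

$$
80\ \mathrm{kHz/ch} \times 48\ \mathrm{ch} \times 10\ \mathrm{bytes/hit} \approx 38\ \mathrm{MB/s} \approx 0.3\ \mathrm{Gbps}
$$

which rounds to the 40 MB/s per board quoted for the CDC.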

#### Current FEE

Limitation of the current FEE

- Throughput: 1 Gbps via SFP
- acceptable latency: <8 us

Upgraded FEE (in LS2?)

Possible future improvements

- Throughput: 4 x 10 Gbps







#### **Time of Propagation detector for PID(TOP)**

- Micro-channel-plate PMTs (MCP-PMTs) are used for Cherenkov photon detection.
- Hit rate of the PMT due to beam background: 3 MHz
- 128 ch per link to a readout board.

Current hit rate: trigger rate 10 kHz, ~30 hits/slot $\Rightarrow$ 300 kHz digitization of hits

Triggerless scheme: 3 MHz/PMT, 32 PMTs/slot $\Rightarrow$ 96 MHz digitization of hits

Processing speed in FEE could be the bottleneck
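
Putting the two TOP digitization rates side by side gives the factor by which the FEE processing would have to speed up in a triggerless scheme. This is simple arithmetic on the numbers quoted for the TOP, not a separate measurement:

$$
\frac{3\ \mathrm{MHz/PMT} \times 32\ \mathrm{PMTs/slot}}{10\ \mathrm{kHz} \times 30\ \mathrm{hits/slot}}
= \frac{96\ \mathrm{MHz}}{300\ \mathrm{kHz}} = 320
$$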

#### Aerogel Ring Imaging Cherenkov detector(ARICH)

- HAPDs (Hybrid Avalanche Photo-Detectors) are used to detect Cherenkov rings produced in the aerogel.
- Only hit information (0 or 1) is sent downstream.
- The limitation is 10 triggers per 26.4 us, each with a 500 ns time window
- → 380 kHz is the limit

→ 500 ns x 380 kHz = 19% of the time can be covered. This throughput is the limitation.

Since most of the data are overhead, a change of data format could reduce the required throughput.

CH Data in Suppressed Mode



\* If a ch has no hit data (Zero), its data is not transmitted.



#### **Electromagnetic Calorimeter(ECL)**

- The front-end electronics (ShaperDSP) board samples the waveform at 2 MHz, and a waveform fit is performed to obtain timing and energy.
- This data processing on the FEE board is one of the bottlenecks of the ECL readout.

Figure: ShaperDSP block diagram. Labels: FPGA, processor, ADC data cycle buffer, TRG FIFO, 290 us (16 events), ADC data (16 ch.)

#### Klong and muon detector(KLM)

The sub-detector consists of Resistive Plate Chambers (RPCs) and scintillator bars

**Throughput** : The cumulative hit rate is around O(1) MHz. With 1 hit = 8 bytes, spread over 32 links, the data volume is not so large.

**Bottleneck** : Digitization and waveform sampling is the bottleneck for scintillator waveform readout: 24 us/event -> about 40 kHz trigger rate

Most likely, a "triggerless" mode would use trigger bits only (no waveform) or would require new FEE.





## Partial SRO for Belle II

- Since each sub-detector has its own FEE hardware and firmware, replacing all of them at the same time would require a lot of upgrade work.
- A realistic option would be to start by adding a second data path for streaming readout to some sub-detectors (e.g. ECL+CDC for low-multiplicity and displaced-vertex events).



#### <u>Summary</u>

In the Belle II experiment, the DAQ system needs to handle the data flow at a luminosity a few tens of times higher than that of the former Belle experiment.

- It has been operational in physics runs since 2019, and we are currently in Long Shutdown 1. The next physics run will start in February 2024.
- Readout system
  - The old COPPER-board-based system was recently replaced with PCIe40, and the throughput has improved.
- High-level trigger
  - CPUs are being added to increase processing power. Currently, a 20 kHz trigger rate can be processed.
- Currently, the Belle II DAQ relies on a hardware trigger to select events. For SRO at Belle II, the throughput of the FEE and the HLT must be improved. For the FEEs, a staged approach is realistic: keep the L1-triggered data path and add streaming readout for some sub-detectors.



