# SRO FEE development in Japan

#### Outline

- Key technologies for trigger-less DAQ system
- Implementation to front-end electronics
- Beam test in RCNP and J-PARC
- Future prospect
- Summary

KEK IPNS Ryotaro Honda

# Toward trigger-less data-streaming DAQ system



# Development policy

#### Scalability

- Should be scalable from a single FEE node (stand-alone) to a few thousand FEE nodes.
  - Mbps → Tbps
- Not only for actual physics experiments but also for small setup, e.g., test bench.

#### Simple to use and maintain

- Usable by a small experiment that has no DAQ team of its own.

#### Standardization

- We would like to promote standardization of developed items under SPADI-alliance
- A DAQ system for many experiments




# Key technologies for high-resolution streaming TDC



# Tapped-delay-line based FPGA TDC

Timing interpolation technique using tapped-delay-line (TDL)



[Figure: tapped-delay-line schematic; labels: clock signal, hit input]

Take a snapshot of a pulse propagating along the TDL

TDC implemented in AMD Kintex-7 FPGA

- Use CARRY4 element
- 192 taps corresponding to the vertical size of a clock region.

Timing resolution: 15 ps ( $\sigma$ )
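The snapshot idea can be sketched in a few lines. This is a minimal illustration, not the firmware: the snapshot is modelled as a list of tap states, the fine count is the number of taps the hit pulse has already passed, and `TAP_DELAY_PS` is a hypothetical uniform tap delay (real TDLs have non-uniform bins and need calibration, which is omitted here).

```python
from typing import Sequence

N_TAPS = 192          # from the slide: taps in one delay line
TAP_DELAY_PS = 10.0   # hypothetical uniform delay per tap (not from the slides)


def fine_count(snapshot: Sequence[int]) -> int:
    """Count the leading run of 1s, i.e. the taps the pulse has passed."""
    count = 0
    for bit in snapshot:
        if bit != 1:
            break
        count += 1
    return count


def fine_time_ps(snapshot: Sequence[int], tap_delay_ps: float = TAP_DELAY_PS) -> float:
    """Convert the fine count to a time, assuming every tap has the same delay."""
    return fine_count(snapshot) * tap_delay_ps


# A pulse that has travelled through 50 of the 192 taps at the clock edge.
example = [1] * 50 + [0] * (N_TAPS - 50)
print(fine_count(example), fine_time_ps(example))
```

A production decoder would also have to tolerate bubble errors in the thermometer code; the leading-run count above simply stops at the first 0.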



# Data collection relying only on timestamps

#### Consider the case for J-PARC slow extraction



We need continuous timing measurement over 2 s (the spill duration of J-PARC slow extraction).

- Required dynamic range: $\sim 10^{10}$ (1 ns TDC case)

Introduce the **heartbeat method**: a technique to reconstruct the time without a long time stamp.

- Insert a special data word periodically
- 16-bit heartbeat counter generates this period ⇔ 16-bit coarse count
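A minimal sketch of how the time could be rebuilt on the software side. The word layout here is hypothetical (tuples instead of real data words), and a 1 ns coarse tick is assumed: each hit carries only its 16-bit coarse count, and the number of heartbeat delimiters seen so far supplies the upper bits.

```python
HEARTBEAT_BITS = 16                      # from the slide
TICKS_PER_FRAME = 1 << HEARTBEAT_BITS    # coarse ticks between two heartbeats
TICK_NS = 1                              # assumption: 1 ns per coarse count


def reconstruct_times_ns(stream):
    """Rebuild absolute hit times from a stream of hypothetical words.

    Each word is either ("hb",), a heartbeat delimiter, or
    ("hit", coarse), a hit with its 16-bit coarse count.
    """
    frame = 0
    times = []
    for word in stream:
        if word[0] == "hb":
            frame += 1                   # a delimiter closes the current frame
        else:
            coarse = word[1]
            times.append((frame * TICKS_PER_FRAME + coarse) * TICK_NS)
    return times


def frames_in(duration_ns: int) -> int:
    """Number of heartbeat frames needed to cover a duration (ceiling division)."""
    ticks = duration_ns // TICK_NS
    return -(-ticks // TICKS_PER_FRAME)


stream = [("hit", 100), ("hb",), ("hit", 5), ("hb",), ("hb",), ("hit", 65535)]
print(reconstruct_times_ns(stream))      # [100, 65541, 262143]
print(frames_in(2_000_000_000))          # 30518 frames for a 2 s spill
```

Under these assumptions the hit words stay short, and the long time base lives only in the count of delimiters.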



## Simplified block diagram of Str-TDC



# MIKUMARI link

Clock/data distribution system based on the modulated clock signal (CDCM) using a single transmission line



#### Functions

- Frequency synchronization by transmitting a modulated clock signal
- Data communication based on a data frame
- Pulse transmission with fixed latency

#### Features

- It can be implemented in a regular I/O bank and does not depend on high-speed serial transceivers such as the MGTs in AMD FPGAs.
- Any PLL can serve as the clock recovery circuit.



#### Core technology: clock-duty-cycle-modulation

Adopting clock-duty-cycle-modulation (CDCM) as a core technology

- CDCM is a clock-centric type modulation.
- Data bits are embedded in the trailing edges of the clock signal.



- Modulated clock can be directly input to PLLs. Every PLL will be a clock recovery circuit.
- When a PLL with a zero-delay mode is used for clock recovery, the skew of the output clock with respect to the input modulated clock is adjusted automatically.
  - No phase uncertainty. (It exists in a CDR circuit due to clock frequency division.)
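The modulation principle can be illustrated with a toy model. Everything numeric here is an assumption for illustration: one clock period is split into `N_SUB` serializer samples, one data bit is carried per period, and the two duty cycles are 40 % and 60 %. The point is that only the trailing edge moves, so the rising edges stay strictly periodic and a PLL can lock to them.

```python
N_SUB = 10                     # hypothetical serializer samples per clock period
HIGH_SAMPLES = {0: 4, 1: 6}    # hypothetical duty cycles: bit 0 -> 40 %, bit 1 -> 60 %


def encode(bits):
    """Build the modulated clock: the rising edge is fixed, the trailing edge carries data."""
    wave = []
    for b in bits:
        high = HIGH_SAMPLES[b]
        wave.extend([1] * high + [0] * (N_SUB - high))
    return wave


def decode(wave):
    """Recover one bit per period from the width of the high phase."""
    bits = []
    for i in range(0, len(wave), N_SUB):
        period = wave[i:i + N_SUB]
        bits.append(1 if sum(period) > N_SUB // 2 else 0)
    return bits


def rising_edges(wave):
    """Sample indices where the waveform goes from 0 to 1 (index 0 counts if high)."""
    return [i for i in range(len(wave))
            if wave[i] == 1 and (i == 0 or wave[i - 1] == 0)]


data = [1, 0, 0, 1]
wave = encode(data)
print(decode(wave))            # [1, 0, 0, 1]
print(rising_edges(wave))      # [0, 10, 20, 30]
```

The actual CDCM scheme and its IOSERDES patterns are described in the references cited on the slide; this sketch only shows why the clock information survives the modulation.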

#### CDCM by IOSERDES of Xilinx FPGA



[1] D. Calvet, IEEE TNS 67(8), 1912 (2020)

[2] R. Honda, IEEE TNS 70(6), 1102 (2023)

# Simplified block diagram of Str-TDC



# Implementation to electronics

# Hardware

[Figure: photograph of the AMANEQ board; labels: SFP+ (up to 10G), small mezzanine for I/O extension, XC7K160T-2FFG, mezzanine slot (compatible with HUL), DDR3-SDRAM (DDR3-800, 2 Gb), main input (diff. signals), power (DC 30-35 V)]

A main electronics for network oriented trigger-less data acquisition system (AMANEQ)

- Kintex-7 with speed grade -2
  - Transceiver bandwidth up to 10 Gbps
  - Can implement SiTCP-XG
- Has two mezzanine slots
  - Mount HUL mezzanine HR-TDC
  - Mount DCR mezzanine for DC readout
- Has a jitter cleaner (CDCE62002)
- DDR3-SDRAM as a de-randomizer
  - DDR3-1333 with 16-bit bus width.
  - 2 Gb
- Powered by an external DC 30-35 V power supply

# Hardware for streaming high-resolution TDC



#### HUL/AMANEQ mezzanine HR-TDC



32 ch tapped-delay-line (TDL) based HR-TDC

- Input IO std.: LVDS (ECL is not supported)
- TDL consists of a CARRY4 primitive chain.
- Both leading/trailing edges
- Intrinsic resolution: 15 ps ( $\sigma$ )

# Streaming HR-TDC on AMANEQ



Modulated clock from upstream (distributor) module

#### AMANEQ Str-HRTDC

- Input: 64 ch (32+32)
- Timing resolution: $\sim$ 25 ps ($\sigma$)
  - Synchronization precision of MIKUMARI is included
- Data link: 10GbE
  - TCP/IP provided by SiTCP-XG
- Operation mode
  - Stand alone mode
  - Sync with MIKUMARI system

# HR-TDC timing resolution



# Use case for beam experiments

# Tested in RCNP and J-PARC





#### Tested at RCNP Grand Raiden

- Cyclotron
  - Primary beams
- Test experiment/Physics experiment
  - $\sim$  0.2 Gbps in total

#### Tested at Hadron facility, K1.8BR beam line

- Synchrotron
  - Secondary hadron beams
    - π, K, p
  - Slow extraction
- Test experiment
  - $\sim$  2 Gbps in total

# Pictures of test exp. at J-PARC

**Server PC** 



# Picture of AMANEQs





#### **Detectors**

- Read by Str-LRTDC (1 ns)
  - BDC: 900 ch + KLDC: 512 ch
  - SFT: 384 ch
- Read by Str-HRTDC
  - TOF, MRPC, Timing counters: 62 ch

#### DAQ system

- FEE: AMANEQ x20 (Streaming TDC x17 + MIKUMARI x3)
  - Str-LRTDC AMANEQ x15
  - Str-HRTDC AMANEQ x2
- NestDAQ software running on a PC

A compact system was achieved.

# Analysis result: TOF histogram for timing counters



# Future prospect

# Implementation of SRO to other FEEs

#### So far, we have developed...

- Streaming high-resolution TDC (25 ps)
- Streaming low-resolution TDC (1 ns)

The same techniques, the heartbeat method and MIKUMARI, will be implemented in FEEs developed under the SPADI-alliance (and Open-It).





# Summary

#### So far, we have developed several key technologies for the general-purpose DAQ system

- Heartbeat method to overcome the problem of a long counter
- Simple and light-weight synchronization technology: MIKUMARI
- Next-generation DAQ software framework: NestDAQ
  - Semi-automatic topology generation based on a service registry using Redis

#### Implementation of Streaming HR-TDC into the AMANEQ board

- The TDL-based HR-TDC is offloaded to the mezzanine card
  - 64 (32+32) ch input per AMANEQ using two HR-TDC mezzanine cards
- $\sim$ 25 ps timing resolution including the synchronization precision.
- Data communication by 10GbE

#### The developed Str-HRTDC was tested at the RCNP Grand Raiden and the J-PARC hadron facility

- Grand Raiden at RCNP
- K1.8BR beam line at J-PARC

Full-streaming readout and software-based event selection were successfully performed.

#### Future prospect

- Implement the streaming function in other FEEs developed under the SPADI-alliance

# Toward trigger-less data-streaming DAQ system



# Data collection relying only on timestamps

#### Structure of TDC data

Fine count (n-bit) + Coarse count (m-bit)

Fine count

- Provided by timing interpolation techniques
  - Sampling by multi-phase clock signals
  - Tapped-delay-line
- Independent of the data acquisition function

Coarse count

- Generated by the DAQ function driven by the system (base) clock signal.

In conventional TDC

- The bit-length depends on the record length, since it is a relative timing measurement with respect to the trigger.
  - E.g., ring-buffer length ⇔ coarse count bit-length

#### In streaming TDC

- No local timing reference. The counter is free running, but what length is necessary?

#### FEE data format

In conventional TDC

- Data header (trailer) exists as the boundary of the data block.
  - It may contain the data size, event number, flags, etc.

#### In streaming TDC

- No local data block (an event) exists. Another data format must be introduced.

#### Data transfer speed via SiTCP-XG

Test setup



#### Throughput of DDR3-SDRAM



Data bus of SDRAM is bi-directional. Memory operation is determined by command.

#### Tested write/read pattern.





Obtained throughput (reference value)

- DDR3-800: ~4.8 Gbps (6.4 Gbps)
- DDR3-1333: ~7.9 Gbps (10.66 Gbps)

\*\*\*Measured with access to the same memory bank. The overhead is larger when the bank address changes.
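The quoted numbers can be cross-checked with a short calculation. The interpretation of the "reference value" is an assumption here: it matches half of the peak rate of a 16-bit DDR3 bus (the bus width given for the AMANEQ memory), which is what each direction gets if writes and reads share the bidirectional bus equally. The efficiency is then simply measured over reference.

```python
BUS_WIDTH_BITS = 16    # from the AMANEQ hardware slide

# name: (transfer rate in MT/s, measured throughput in Gbps from the slide)
MEASURED = {
    "DDR3-800": (800, 4.8),
    "DDR3-1333": (1333, 7.9),
}


def peak_gbps(mtps: float) -> float:
    """Peak bus rate: transfers per second times bus width."""
    return mtps * 1e6 * BUS_WIDTH_BITS / 1e9


def reference_gbps(mtps: float) -> float:
    """Assumption: half of the peak, since write and read share one bus."""
    return peak_gbps(mtps) / 2


for name, (mtps, measured) in MEASURED.items():
    ref = reference_gbps(mtps)
    print(f"{name}: reference {ref:.2f} Gbps, efficiency {measured / ref:.0%}")
```

With this reading, both speed grades reach roughly three quarters of the reference value.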

# Streaming LR-TDC on AMANEQ



Modulated clock from upstream (distributor) module

#### AMANEQ Str-LRTDC

- Input: 128 ch
  - Mezzanine cards are changed
- Timing precision: 1 ns
  - (Resolution is better than 1 ns)
- Data link: 1GbE
  - TCP/IP provided by SiTCP
- Operation mode
  - Stand alone mode
  - Sync with MIKUMARI system



- Synchronize 20 AMANEQs using the MIKUMARI network
- Read out the data from the FEEs with a server PC