

# **Outline**

# Introduction

- Who we are

# FPGA design

- Autoencoder
- Parallelizing decision trees
- HLS trees  $\rightarrow$  VHDL trees

# Thoughts on SRO

- Data compression
- Anomaly detection

# More info

- Code structure & git
- Slides & video tutorials








University of Pittsburgh

### Undergraduate students





Source: http://cern.ch/twiki/pub/Atlas/TDAQSpeakersCommitteeCommonReferences/tdaqFullNew2017.pdf





### 1. Classification

parallel cuts using HLS

| | Classification paper | Regression paper |
|---|---|---|
| Publisher | Published by IOP Publishing for Sissa Medialab | Published by IOP Publishing for Sissa Medialab |
| Dates | Received: April 9, 2021<br>Accepted: June 29, 2021<br>Published: August 4, 2021 | Received: July 13, 2022<br>Accepted: August 23, 2022<br>Published: September 27, 2022 |
| Title | Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics | Nanosecond machine learning regression with deep boosted decision trees in FPGA for high energy physics |
| Authors | T.M. Hong,* B.T. Carlson, B.R. Eubanks, S.T. Racz, S.T. Roche, J. Stelzer and D.C. Stumpp | B.T. Carlson,<sup>a,b</sup> Q. Bayer,<sup>b</sup> T.M. Hong<sup>b,*</sup> and S.T. Roche<sup>b</sup> |
| Affiliations | Department of Physics and Astronomy, University of Pittsburgh,<br>100 Allen Hall, 3941 O'Hara St., Pittsburgh, PA 15260, U.S.A. | <sup>a</sup> Department of Physics and Engineering, Westmont College,<br>955 La Paz Road, Santa Barbara, CA 93108, U.S.A.<br><sup>b</sup> Department of Physics and Astronomy, University of Pittsburgh,<br>100 Allen Hall, 3941 O'Hara St., Pittsburgh, PA 15260, U.S.A. |
| E-mail | tmhong@pitt.edu | tmhong@pitt.edu |
| Abstract | We present a novel implementation of classification using the machine learning/artificial intelligence method called boosted decision trees (BDT) on field programmable gate arrays (FPGA). The firmware implementation of binary classification requiring 100 training trees with a maximum depth of 4 using four input variables gives a latency value of about 10 ns, independent of the clock speed from 100 to 320 MHz in our setup. The low timing values are achieved by restructuring the BDT layout and reconfiguring its parameters. The FPGA resource utilization is also kept low at a range from 0.01% to 0.2% in our setup. A software package called *fwXmachina* achieves this implementation. Our intended user is an expert in custom electronics-based trigger systems in high energy physics experiments or anyone that needs decisions at the lowest latency values for real-time event classification. Two problems from high energy physics are considered, in the separation of electrons vs. photons and in the selection of vector boson fusion-produced Higgs bosons vs. the rejection of the multijet processes. | We present a novel application of the machine learning / artificial intelligence method called boosted decision trees to estimate physical quantities on field programmable gate arrays (FPGA). The software package *fwXmachina* features a new architecture called parallel decision paths that allows for deep decision trees with arbitrary number of input variables. It also features a new optimization scheme to use different numbers of bits for each input variable, which produces optimal physics results and ultraefficient FPGA resource utilization. Problems in high energy physics of proton collisions at the Large Hadron Collider (LHC) are considered. Estimation of missing transverse momentum ($E_{\rm T}^{\rm miss}$) at the first level trigger system at the High Luminosity LHC (HL-LHC) experiments, with a simplified detector modeled by Delphes, is used to benchmark and characterize the firmware performance. The firmware implementation with a maximum depth of up to 10 using eight input variables of 16-bit precision gives a latency value of $O(10)$ ns, independent of the clock speed, and $O(0.1)$% of the available FPGA resources without using digital signal processors. |
| Keywords | Digital electronic circuits; Trigger algorithms; Trigger concepts and systems (hardware and software); Data reduction methods | Data reduction methods; Digital electronic circuits; Trigger algorithms; Trigger concepts and systems (hardware and software) |
| arXiv ePrint | 2104.03408 | 2207.05602 |
| Copyright and DOI | © 2021 IOP Publishing Ltd and Sissa Medialab<br>https://doi.org/10.1088/1748-0221/16/08/P08016 | © 2022 IOP Publishing Ltd and Sissa Medialab<br>https://doi.org/10.1088/1748-0221/17/09/P09039 |

*Corresponding author.

#### Hong et al. JINST 16, P08016 (2021) http://doi.org/10.1088/1748-0221/16/08/P08016

### 2. Regression

parallel paths using HLS

#### Nanosecond machine learning regression with deep boosted decision trees in FPGA for high energy physics

B.T. Carlson,<sup>a,b</sup> Q. Bayer,<sup>b</sup> T.M. Hong<sup>b,\*</sup> and S.T. Roche<sup>b</sup>

<sup>a</sup>Department of Physics and Engineering, Westmont College, 955 La Paz Road, Santa Barbara, CA 93108, U.S.A. <sup>b</sup>Department of Physics and Astronomy, University of Pittsburgh, 100 Allen Hall, 3941 O'Hara St., Pittsburgh, PA 15260, U.S.A.

E-mail: tmhong@pitt.edu

Received: July 13, 2022. Accepted: August 23, 2022. Published: September 27, 2022.

#### Abstract

We present a novel application of the machine learning / artificial intelligence method called boosted decision trees to estimate physical quantities on field programmable gate arrays (FPGA). The software package FWXMACHINA features a new architecture called parallel decision paths that allows for deep decision trees with arbitrary number of input variables. It also features a new optimization scheme to use different numbers of bits for each input variable, which produces optimal physics results and ultraefficient FPGA resource utilization. Problems in high energy physics of proton collisions at the Large Hadron Collider (LHC) are considered. Estimation of missing transverse momentum ($E_T^{miss}$) at the first level trigger system at the High Luminosity LHC (HL-LHC) experiments, with a simplified detector modeled by Delphes, is used to benchmark and characterize the firmware performance. The firmware implementation with a maximum depth of up to 10 using eight input variables of 16-bit precision gives a latency value of O(10) ns, independent of the clock speed, and O(0.1)% of the available FPGA resources without using digital signal processors.

Keywords: Data reduction methods; Digital electronic circuits; Trigger algorithms; Trigger concepts and systems (hardware and software)

arXiv ePrint: 2207.05602

\*Corresponding author.

#### Carlson et al. JINST 17, P09039 (2022) http://doi.org/10.1088/1748-0221/17/09/P09039

### 3. Autoencoder

in-house training, bypass latent space

#### Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays

Nature Communications

S. T. Roche, Q. Bayer, B. T. Carlson, W. C. Ouligian, P. Serhiayenka, J. Stelzer & T. M. Hong

Received: 23 May 2023. Accepted: 9 April 2024. Published online: 25 April 2024.

We present an interpretable implementation of the autoencoding algorithm, used as an anomaly detector, built with a forest of deep decision trees on FPGA, field programmable gate arrays. Scenarios at the Large Hadron Collider at CERN are considered, for which the autoencoder is trained using known physical processes of the Standard Model. The design is then deployed in real-time trigger systems for anomaly detection of unknown physical processes, such as the detection of rare exotic decays of the Higgs boson. The inference is made with a latency value of 30 ns at percent-level resource usage using the Xilinx Virtex UltraScale+ VU9P FPGA.

### 4. Hardware trees

faster & more efficient, no more HLS

PITT-PACC-2409-v1

#### Nanosecond hardware regression trees in FPGA at the LHC

P. Serhiayenka<sup>a</sup>, S. T. Roche<sup>a,b</sup>, B. T. Carlson<sup>a,c</sup>, and T. M. Hong<sup>\*a</sup>

<sup>a</sup>Department of Physics and Astronomy, University of Pittsburgh <sup>b</sup>School of Medicine, Saint Louis University <sup>c</sup>Department of Physics and Engineering, Westmont College

September 20, 2024

#### Abstract

We present a generic parallel implementation of the decision tree-based machine learning (ML) method in hardware description language (HDL) on field programmable gate arrays (FPGA). A regression problem in high energy physics at the Large Hadron Collider is considered: the estimation of the magnitude of missing transverse momentum using boosted decision trees (BDT). A forest of twenty decision trees each with a maximum depth of 10 using eight input variables of 16-bit precision is executed with a latency of about 10 ns using O(0.1%) resources on Xilinx UltraScale+ VU9P—approximately ten times faster and five times smaller compared to similar designs using high level synthesis (HLS)—without the use of digital signal processors (DSP) while eliminating the use of block RAM (BRAM). We also demonstrate a potential application in the estimation of muon momentum for ATLAS RPC at HL-LHC.

Keywords: Data processing methods, Data reduction methods, Digital electronic circuits, Trigger algorithms, and Trigger concepts and systems (hardware and software).

\*Corresponding author, tmhong@pitt.edu

Roche et al. Nat. Comm. 15 (2024) 3527 https://arxiv.org/abs/2304.03836



Serhiayenka et al. Submitted to NIM-A [2409.20506]

Focus today

# Outline

Introduction

• Who we are

## FPGA design

- Autoencoder
- Parallelizing decision trees
- HLS trees → VHDL trees

## Thoughts on SRO

- Data compression
- Anomaly detection

### More info

- Relevant papers from US
- Where to find code, tutorials



| Depth i              | Depth ii              | Depth iii              | Decision path                                                                | Path # |
|----------------------|-----------------------|------------------------|------------------------------------------------------------------------------|--------|
| not(q <sub>i</sub> ) | not(q <sub>ii</sub> ) | N/A                    | not(q <sub>i</sub> ) and not(q <sub>ii</sub> )                               | 0      |
| q <sub>i</sub>       | N/A                   | N/A                    | q <sub>i</sub>                                                               | 1      |
| not(q <sub>i</sub> ) | q <sub>ii</sub>       | not(q <sub>iii</sub> ) | not(q <sub>i</sub> ) and q <sub>ii</sub> and not(q <sub>iii</sub> )          | 2      |
| not(q <sub>i</sub> ) | q <sub>ii</sub>       | q <sub>iii</sub>       | $not(\boldsymbol{q}_i)$ and $\boldsymbol{q}_{ii}$ and $\boldsymbol{q}_{iii}$ | 3      |

Figure: example with D=6, N<sub>bins</sub>=57.



# Paper 3: Autoencoder intro



### Example: handwritten numbers

• Teach it 0, 1, 2, 3, 4 with a sample (doesn't know about 9!)

784 variables (8-bit) → 1 variable (20-bit) → 784 variables (8-bit): 300x compression
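
As a quick check of the quoted compression factor, assuming the 784 variables of 8 bits are the input and the single 20-bit variable is the compressed representation:

$$
\frac{784 \times 8\ \text{bits}}{1 \times 20\ \text{bits}} = \frac{6272}{20} \approx 314 \approx 300\times
$$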

### Details

- Input-output distance is relatively small = good compression
- Input-output distance is relatively large = bad compression

# Tree autoencoder, what?!

### NN AE

- Training is a black box, done offline
- Latent space is complex
- FPGA version simplified for anomaly at CMS

From CMS Public Note, DP-2023/079

From CMS Machine Learning Group https://cms-ml.github.io/documentation/training/autoencoders.html

### Tree AE

- Training is sampling of 1d pdfs
- Latent space is simple / interpretable
- FPGA version can optionally skip latent space



Image from https://medium.com/@rushikesh.shende/autoencoders-variationalautoencoders-vae-and-β-vae-ceba9998773d



# Paper 3: Training dev'd in-house



Train by sampling 1d projections

- Encoding: Event  $\rightarrow$  which bin it's in
- Decoding returns "reconstruction point"
  - Decoding: Bin  $\rightarrow$  median of the training data in bin
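
A minimal sketch of how such a tree autoencoder could be built is given below. It assumes that "sampling 1d projections" means picking a variable and drawing the cut from the training values of that variable; the actual in-house training may differ in its details. The functions `build_tree`, `encode`, and `decode` are hypothetical names used only for illustration.

```python
import random
import statistics


def build_tree(points, depth, rng):
    """Recursively split the training points; each leaf is one bin."""
    if depth == 0 or len(points) < 2:
        n_var = len(points[0])
        # Reconstruction point: per-variable median of the training data in the bin.
        median = tuple(statistics.median(p[v] for p in points) for v in range(n_var))
        return {"leaf": median}
    var = rng.randrange(len(points[0]))
    # Drawing a training value of the chosen variable samples its 1d distribution.
    cut = rng.choice(points)[var]
    left = [p for p in points if p[var] < cut]
    right = [p for p in points if p[var] >= cut]
    if not left or not right:
        return build_tree(points, 0, rng)  # cannot split further: make a leaf
    return {
        "var": var,
        "cut": cut,
        "left": build_tree(left, depth - 1, rng),
        "right": build_tree(right, depth - 1, rng),
    }


def encode(tree, x):
    """Event -> which bin it is in, expressed as the sequence of decisions."""
    path, node = [], tree
    while "leaf" not in node:
        go_right = x[node["var"]] >= node["cut"]
        path.append(go_right)
        node = node["right"] if go_right else node["left"]
    return tuple(path)


def decode(tree, path):
    """Bin -> median of the training data in that bin."""
    node = tree
    for go_right in path:
        node = node["right"] if go_right else node["left"]
    return node["leaf"]


rng = random.Random(0)
training = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
tree = build_tree(training, depth=3, rng=rng)
event = training[0]
print(event, "->", decode(tree, encode(tree, event)))
```

Dense regions of the training data are cut more often, so they end up in smaller bins than sparse regions.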



# Paper 3: AE to anomaly detector



### How does this detect anomalies?

- Define: Distance between input and output = anomaly score
- Non-anomaly
  - Input is similar to training data
  - Will likely land in a small bin → close to the reconstruction point
- Anomaly
  - Input is not similar to training data
  - Will likely land in a large bin → far from the reconstruction point
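
The argument above can be illustrated with a one-variable toy. The bin edges, the reconstruction points, and the use of the L1 distance are all assumptions made for this sketch, not values from the paper.

```python
# Hypothetical 1d bins: (low edge, high edge, reconstruction point).
# Two small bins where training data is dense, one large bin where it is sparse.
BINS = [(0.0, 1.0, 0.5), (1.0, 2.0, 1.5), (2.0, 100.0, 2.4)]


def reconstruct(x):
    """Encode and decode in one step: return the reconstruction point of x's bin."""
    for low, high, point in BINS:
        if low <= x[0] < high:
            return (point,)
    raise ValueError("input outside the binned range")


def anomaly_score(x, reconstruction):
    """Distance between input and output (L1 distance is an assumption)."""
    return sum(abs(a - b) for a, b in zip(x, reconstruction))


typical = (0.6,)     # similar to training data, lands in a small bin
anomalous = (60.0,)  # unlike training data, lands in the large bin

print(anomaly_score(typical, reconstruct(typical)))
print(anomaly_score(anomalous, reconstruct(anomalous)))
```

The typical input cannot be far from its reconstruction point because its bin is small, whereas the anomalous input can be.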



## Paper 3: Toy dataset (2 variables)



Figure: anomaly score for the toy dataset.

# Paper 3: Skip latent space



### Don't need latent space in firmware

Closer look at what it means to encode



• Skip the encoding & decoding



## FWXMACHINA

# Logic flow

- Left-to-right data flow (see right)
- Realized that we can bypass the latent space!
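
One way to read the bypass is that encoding followed by decoding is a composition of two lookups, and the composition can be replaced by a single lookup from input to output. The sketch below shows the equivalence on a one-variable toy. The cuts and reconstruction points are hypothetical, and the real firmware operates on a forest with many variables.

```python
CUTS = [1.0, 2.0]           # hypothetical cut values
POINTS = [0.5, 1.5, 2.4]    # hypothetical reconstruction point of each bin


def encode(x):
    """Input -> bin index (the latent space)."""
    return sum(x >= c for c in CUTS)


def decode(bin_index):
    """Bin index -> reconstruction point."""
    return POINTS[bin_index]


def score_via_latent(x):
    """Two steps: go through the latent space, then compare."""
    return abs(x - decode(encode(x)))


def score_direct(x):
    """One step: each decision path returns its distance directly."""
    if x < 1.0:
        return abs(x - 0.5)
    if x < 2.0:
        return abs(x - 1.5)
    return abs(x - 2.4)


for x in [0.2, 1.0, 1.9, 7.0]:
    print(x, score_via_latent(x), score_direct(x))
```

The bin index never needs to exist as a signal, since only the final distance is used downstream.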









Skip this slide

# Paper 3: $H_{125} \rightarrow a_{10} a_{70} \rightarrow \gamma \gamma b \bar{b}$



### Inputs

- Sample
  - MadGraph5\_aMC 2.9.5
  - Hadronization + shower: Pythia8
  - Detector: Delphes 3.5.0, CMS
- Variables
  - 8 inputs: jets, photons, ΔR




### Results

- Compare
  - vs. 3 kHz Run-2 ATLAS rate
- Better
  - 3x gain in signal

Skip this slide






# Paper 3: Compare with hls4ml



### LHC anomaly detection dataset [Sci Data 9, 118]

- Background
  - $W \rightarrow \ell\nu$, $Z \rightarrow \ell\ell$, multijet, ttbar
- Signal
  - 4 BSM scenarios
- Input variables
  - 54 variables
  - p<sub>T</sub>, η, φ of the 4 leading μ, 4 leading e, 10 leading jets, MET
  - See distributions on the right
- Sample selection
  - Require  $\geq$ 1 lepton w/ p<sub>T</sub> > 23 GeV
  - (L1 will already save these...)



### Paper 3: vs. hls4ml



Figure: ROC curve, signal efficiency (TPR), for $h^0 \rightarrow \tau\tau$, LQ $\rightarrow b\tau$, $h^+ \rightarrow \tau\nu$, and $A \rightarrow 4\ell$. Dataset: Govorkova et al. Method: fwX AE, V=56.

### Works well

- Physics (plots)
- FPGA (table)

### Comparison

- hls4ml NN-AE [Nature Mach. Intell. 4 (2022) 154–161]
- Physics: comparable AUC
- FPGA results

### Key take-away:

This result uses HLS trees. VHDL trees are projected to be faster and smaller (next slides).

|             | hls4ml  | fwX (this) |
|-------------|---------|------------|
| Clock speed | 200 MHz | 200 MHz    |
| Latency     | 80 ns   | 30 ns      |
| Interval    | 5 ns    | 5 ns       |
| FF          | 0.5%    | 0.6%       |
| LUT         | 3%      | 9%         |
| DSP         | 1%      | 0.8%       |
| BRAM        | 0.3%    | 0          |

Figure: distribution of events (unit norm.) vs. anomaly score Δ (×10<sup>-3</sup>) for SM; ROC axis label: SM acceptance (FPR).

Dataset: Govorkova et al., Sci. Data 9, no. 118 (2022). Method: fwX AE, V=56, no. of trees T=30, max depth D=4.

# **Papers**



### 1. Classification

parallel cuts using HLS

### 2. Regression

parallel paths using HLS

### 3. Autoencoder

in-house training, bypassing latent space

### 4. Hardware trees

faster & more efficient, no more HLS

Hong et al. JINST 16, P08016 (2021) http://doi.org/10.1088/1748-0221/16/08/P08016

Carlson et al. JINST 17, P09039 (2022) http://doi.org/10.1088/1748-0221/17/09/P09039

Roche et al. Nat. Comm. 15 (2024) 3527 https://arxiv.org/abs/2304.03836





Serhiayenka et al. Submitted to NIM-A [2409.20506]

Focus today

# Paper 2: Estimation



### Regression (using BDT)

- Toy problem in 1-d
- Train / test on f(x) = sin(x) + Gaussian(x)
- For sample of x: y = f(x) in 16 bits
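
A sketch of how such a toy sample could be generated is shown below. It assumes that "Gaussian(x)" denotes a Gaussian bump in x rather than random noise, and it assumes a linear mapping of y onto unsigned 16-bit integers. The range of x, the bump parameters, and the helper names are all hypothetical.

```python
import math
import random


def f(x, mu=5.0, sigma=1.0):
    """Toy target. Assumption: Gaussian(x) is a bump centered at mu, not noise."""
    return math.sin(x) + math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def to_uint16(y, y_min, y_max):
    """Map y in [y_min, y_max] linearly onto the integers 0..65535."""
    scaled = (y - y_min) / (y_max - y_min)
    return max(0, min(65535, round(scaled * 65535)))


rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]  # sample of x (range assumed)
ys = [f(x) for x in xs]
y_min, y_max = min(ys), max(ys)
targets = [to_uint16(y, y_min, y_max) for y in ys]  # y = f(x) in 16 bits

print(xs[0], ys[0], targets[0])
```

The integer targets are what a BDT regressor would be trained and tested on in this setup.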








# Paper 2: Parallel paths

- Example
  - 2d toy dataset, say  $x = p_T$  and y = eta for some SM sample





Table 3: Benchmark configuration and the FPGA cost. Three groups of information are given. The top-most group defines the FPGA setup. The second group defines the ML training used for the MET problem and the Nanosecond Optimization. The third group gives the actual results measured on the FPGA for four tree-depth combinations of 40-5, 40-6, 20-7, and 10-8.

| Parameter                        | Value                             | Comments                         |
|----------------------------------|-----------------------------------|----------------------------------|
| FPGA setup                       |                                   |                                  |
| Chip family                      | Xilinx Virtex Ultrascale+         |                                  |
| Chip model                       | xcvu9p-flga2104-2L-e              |                                  |
| Vivado version                   | 2019.2                            |                                  |
| Synthesis type                   | C synthesis                       |                                  |
| HLS or RTL                       | HLS                               | HLS interface pragma: None       |
| Clock speed                      | 320 MHz                           | Clock period is 3.125 ns         |
| ML training configuration & Nanosecond Optimization configuration |                        |                                  |
| ML training method               | Boosted decision tree             | Regression, Adaptive boosting    |
| No. of input variables           | 8                                 |                                  |
| BIN ENGINE type                  | DEEP DECISION TREE ENGINE (DDTE)  |                                  |
| No. of bits for all variables    | 16 bits for each                  | binary integers                  |
| FPGA cost for 40 trees, 5 depth  |                                   |                                  |
| Latency                          | 6 clock ticks                     | 18.75 ns                         |
| Look up tables                   | 1675 out of 1 182 240             | 0.1% of available                |
| Flip flops                       | 1460 out of 2 364 480             | < 0.1% of available              |
| FPGA cost for 40 trees, 6 depth  |                                   |                                  |
| Latency                          | 9 clock ticks                     | 28.125 ns                        |
| Look up tables                   | 4566 out of 1 182 240             | 0.4% of available                |
| Flip flops                       | 2516 out of 2 364 480             | 0.1% of available                |
| FPGA cost for 20 trees, 7 depth  |                                   |                                  |
| Latency                          | 15 clock ticks                    | 46.875 ns                        |
| Look up tables                   | 4568 out of 1 182 240             | 0.4% of available                |
| Flip flops                       | 2697 out of 2 364 480             | 0.1% of available                |
| Block RAM                        | 4.5 out of 4320                   | 0.1% of available                |
| FPGA cost for 10 trees, 8 depth  |                                   |                                  |
| Latency                          | 21 clock ticks                    | 65.625 ns                        |
| Look up tables                   | 2556 out of 1 182 240             | 0.2% of available                |
| Flip flops                       | 2299 out of 2 364 480             | 0.1% of available                |
| Block RAM                        | 5 out of 4320                     | 0.1% of available                |
| Common values for the above configurations |                         |                                  |
| Interval                         | 1 clock tick                      | 3.125 ns                         |
| Block RAM                        | 0 out of 4320                     | If not listed above              |
| Ultra RAM                        | 0 out of 960                      | Same for all trees and all depth |
| Digital signal processors        | 0 out of 6840                     | Same for all trees and all depth |
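
The latency and utilization columns of Table 3 follow from simple arithmetic: clock ticks times the clock period, and used resources divided by available resources. The short sketch below reproduces a few of the entries. The helper names are hypothetical, and the inputs are copied from the table.

```python
CLOCK_MHZ = 320
PERIOD_NS = 1000 / CLOCK_MHZ  # 3.125 ns per clock tick


def latency_ns(ticks):
    """Latency in ns for a given number of clock ticks at 320 MHz."""
    return ticks * PERIOD_NS


def utilization_percent(used, available):
    """Fraction of an FPGA resource in use, in percent."""
    return 100 * used / available


# (configuration, latency in clock ticks, look up tables used)
rows = [
    ("40 trees, depth 5", 6, 1675),
    ("40 trees, depth 6", 9, 4566),
    ("20 trees, depth 7", 15, 4568),
    ("10 trees, depth 8", 21, 2556),
]
for label, ticks, luts in rows:
    lut_pct = utilization_percent(luts, 1_182_240)
    print(f"{label}: {latency_ns(ticks)} ns, LUT {lut_pct:.1f}%")
```

With HLS, the latency grows with tree depth through the number of clock ticks, which is the cost that the VHDL design later targets.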

### Key idea:

### Can implement deep trees






# Summary

- Python to write VHDL
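
To give a flavor of what "Python to write VHDL" can mean, here is a minimal, hypothetical generator that emits the comparison and path logic of one small tree as VHDL text. The function `write_vhdl`, the entity name, the port names, and the cut values are all assumptions made for illustration; this is not the FWXMACHINA code or its output.

```python
def write_vhdl(n_var, cuts, paths, n_bits=16, name="tree_paths"):
    """Return VHDL source for the parallel decision paths of one tree.

    cuts:  list of (variable index, integer threshold)
    paths: list of paths, each a list of (cut index, required outcome)
    """
    ports = [f"    x{v} : in unsigned({n_bits - 1} downto 0);" for v in range(n_var)]
    ports.append(f"    path : out std_logic_vector({len(paths) - 1} downto 0)")
    signals = ", ".join(f"q{i}" for i in range(len(cuts)))
    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "use ieee.numeric_std.all;",
        "",
        f"entity {name} is",
        "  port (",
        *ports,
        "  );",
        f"end entity {name};",
        "",
        f"architecture rtl of {name} is",
        f"  signal {signals} : std_logic;",
        "begin",
    ]
    # One concurrent comparison per cut.
    for i, (var, threshold) in enumerate(cuts):
        lines.append(
            f"  q{i} <= '1' when x{var} > to_unsigned({threshold}, {n_bits}) else '0';"
        )
    # One AND expression per decision path.
    for p, terms in enumerate(paths):
        expr = " and ".join(f"q{i}" if want else f"(not q{i})" for i, want in terms)
        lines.append(f"  path({p}) <= {expr};")
    lines.append("end architecture rtl;")
    return "\n".join(lines)


cuts = [(0, 32768), (1, 20000), (0, 10000)]
paths = [
    [(0, False), (1, False)],
    [(0, True)],
    [(0, False), (1, True), (2, False)],
    [(0, False), (1, True), (2, True)],
]
vhdl = write_vhdl(2, cuts, paths)
print(vhdl)
```

Generating the HDL from a script keeps the firmware in step with the trained model, since retraining only changes the lists passed to the generator.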

Table 1: FPGA results and comparison with Refs. [7, 8, 11]. All results in the table use the same FPGA model Xilinx Ultrascale+ VU9P (vu9p-flgb2104-2L-e) with the following available resources: 1.2 M LUT, 2.4 M FF, 6.8 k DSP, and 4.3 k BRAM. Effective depth *d* is defined so that  $2^d = N_{\text{bin}}/N_{\text{tree}}$.

| <u> </u>          | 5 1 . (?            | 0 1 .63          | r miss     | •               | rmiss                                 | •              |            |            |
|-------------------|---------------------|------------------|------------|-----------------|---------------------------------------|----------------|------------|------------|
| Goal              | 5 classif'n         |                  | 1          |                 | 1                                     |                | ••••       |            |
| Reference         | [11]                | [7]              | [8]        | • • • • • • • • | This pape                             | er             | •••••      | •••••      |
| Setup             |                     |                  |            |                 |                                       |                |            |            |
| Design            | VHDL                | HLS              | HLS        | HLS             | VHDL                                  | VHDL           | VIDL       | VHDL       |
| Sum strategy      | -                   | -                | -          | -               | pipeline                              | combin.        | combin.    | pipeline   |
| Parallelize       | -                   | cutwise          | pathwise   | pathwise        | pathwise                              | pathwise       | pathwise   | pathwise   |
| Clock (MHz)       | 250                 | 320              | 320        | 320             | 320                                   | 320            | 200        | 320        |
| Bit precision     | fixed <sub>18</sub> | int <sub>8</sub> | $int_{16}$ | $int_{16}$      | $int_{16}$                            | $int_{16}$     | $int_{16}$ | $int_{16}$ |
| $N_{\rm var}$     | 16                  | 4                | 8          | 8               | 8                                     | 8              | 8          | 8          |
| N <sub>tree</sub> | 100                 | 100              | 40         | 10              | 40                                    | 10             | 20         | 100        |
| Max. depth D      | 4                   | 4                | 6          | 8               | 6                                     | 8              | 10         | 12         |
| $N_{\rm bin}$     | -                   | -                | 1.7 k      | 1.4 k           | 1.7 k                                 | 1.4 k          | 2.9 k      | 15.7 k     |
| Effective depth d | -                   | -                | 5.4        | 7.2             | 5.4                                   | 7.2            | 7.2        | 7.3        |
| Notable           |                     |                  | identical  | identical       | identical                             | identical      | slower clock | larger forest |
| Results           |                     |                  |            |                 |                                       |                |            |            |
| LUT               | 96 k                | 1 k              | 6.4 k      | 75 k            | 5.1 k                                 | 10 k           | 15.5 k     | 38 k       |
| FF                | 43 k                | 0.1 k            | 35 k       | 24 k            | 1.6 k                                 | 4.7 k          | 6.6 k      | 19.4 k     |
| DSP               | 0                   | 2                | 0          | 0               | 0                                     | 0              | 0          | 0          |
| BRAM              | 0                   | 5.5              | 0          | 10              | 0                                     | 0              | 0          | 0          |
| URAM              | -                   | 0                | 0          | 0               | 0                                     | 0              | 0          | 0          |
| Latency (ns)      | 52 ns               | 9.375 ns         | 38 ns      | 119 ns          | 25 ns                                 | 19 ns          | 10 ns      | 28 ns      |
| " (tick)          | 13                  | 3                | 12         | 38              | 8                                     | 6              | 2          | 9          |
| Interval (tick)   | 1                   | 1                | 1          | 1               | 1                                     | 1              | 1          | 1          |
| Notable           |                     |                  | benchmark  | benchmark       |                                       |                | in abstract |            |

### Results

- 5x smaller
- 10x faster
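
The two latency rows of the comparison table are tied together by the clock frequency: latency in ns equals the tick count divided by the clock in GHz. A minimal Python check of a few columns, with values read off the table (the function name `latency_ns` is ours, for illustration only):

```python
def latency_ns(ticks: int, clock_mhz: float) -> float:
    """Convert a latency in clock ticks to nanoseconds."""
    return ticks * 1000.0 / clock_mhz

# (ticks, clock in MHz) pairs taken from the comparison table
print(latency_ns(13, 250))  # 52.0   -> VHDL design of Ref. [11]
print(latency_ns(3, 320))   # 9.375  -> HLS cutwise design of Ref. [7]
print(latency_ns(2, 200))   # 10.0   -> VHDL design with the slower clock
```

The remaining columns follow the same relation after rounding, e.g. 8 ticks at 320 MHz gives 25 ns.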



### Test case

• Mock-up ATLAS RPC for Phase-2

R. Ospanov, C. Feng, W. Dong, W. Feng, and S. Yang, *Development of FPGA-based neural network regression models for the ATLAS Phase-II barrel muon trigger upgrade*, Eur. Phys. J. Web of Conf. **251**, 04031 (2021).



# Outline

## Introduction

• Who we are

## FPGA design

- Autoencoder
- Parallelizing decision trees
- HLS trees → VHDL trees

## Thoughts on SRO

- Data compression
- Anomaly detection

## More info

- Relevant papers from us
- Where to find code, tutorials

# Thoughts on data compression



# MNIST example shows capability

- Input space: 784 variables of 8 bits = 6272 bits
- Latent space: 1 variable of 20 bits = 20 bits
- Compression = 314x
- Physics compression
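
The 314x figure is the ratio of input bits to latent bits, $V \cdot N / (T \cdot D)$, where a tree of maximum depth $D$ has at most $2^D$ bins, so its bin number fits in $D$ bits. A short Python sketch of the arithmetic (the function and argument names are ours, not part of fwX):

```python
def compression_factor(n_vars: int, bits_per_var: int, n_trees: int, max_depth: int) -> float:
    """Ratio of input bits (V * N) to latent bits (T * D)."""
    input_bits = n_vars * bits_per_var   # 784 * 8 = 6272 for MNIST
    latent_bits = n_trees * max_depth    # 1 * 20 = 20 for one depth-20 tree
    return input_bits / latent_bits

factor = compression_factor(n_vars=784, bits_per_var=8, n_trees=1, max_depth=20)
print(round(factor))  # 314
```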

Looking for collaborators

### Interpretability

Learning is based on transparent density estimation of the input variable space

The representative coordinates of a bin are the median values of the training sample in that bin

Latent space data is the bin number

Train on the fly? Sample 1D histograms

Looking for collaborators
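
The three statements above (bins from the training density, median as representative point, bin number as latent data) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the fwX training code: the splitting rule (cut the widest variable at its median) is a placeholder, and the names `build` and `encode` are hypothetical.

```python
import numpy as np

def build(x, depth, leaves):
    """Grow one tree; returns a node (tuple) or a bin number (int)."""
    def make_leaf():
        # Representative point of the bin: per-variable median of its training sample
        leaves.append(np.median(x, axis=0))
        return len(leaves) - 1
    if depth == 0 or len(x) < 2:
        return make_leaf()
    var = int(np.argmax(x.max(axis=0) - x.min(axis=0)))  # placeholder rule: widest variable
    cut = float(np.median(x[:, var]))
    lo, hi = x[x[:, var] <= cut], x[x[:, var] > cut]
    if len(hi) == 0:  # all values equal on this variable: stop splitting
        return make_leaf()
    return (var, cut, build(lo, depth - 1, leaves), build(hi, depth - 1, leaves))

def encode(node, point):
    """Walk the cuts; the latent value is the bin number of the leaf reached."""
    while isinstance(node, tuple):
        var, cut, lo, hi = node
        node = lo if point[var] <= cut else hi
    return node

# Toy training sample with two variables and a depth-1 tree (two bins)
train = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
leaves = []
tree = build(train, depth=1, leaves=leaves)

bin_number = encode(tree, [0.2, 0.3])  # latent data: bin 0
x_hat = leaves[bin_number]             # decoded as the bin median (0.5, 0.5)
print(bin_number, x_hat)
```

Decoding is a table lookup from bin number to stored median, which is what makes the latent space directly interpretable.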





## Regional data tx? (JINST in preparation)



Regional compression



Block diagram: the modified Deep Decision Tree Engine (DDTE) is split up into two parts.

- Transmitter (on-detector electronics): the input data $x$ are tapped from the bus into the bin engines Engine$_0$, ..., Engine$_{T-1}$ (encoder), and the resulting latent data $w_0, \ldots, w_{T-1}$ are packed for transmission.
- Receiver (off-detector electronics): the latent data are unpacked and decoded by the lookup tables LUT$_0$, ..., LUT$_{T-1}$ into $\hat{y}_0, \ldots, \hat{y}_{T-1}$ (decoder); a vector sum in the reconstruction processor merges them into the decoded data, $\hat{x} = \sum_t \hat{y}_t$, since $\hat{y}_t = \hat{x}_t / T$.
- The electronics are labeled "e.g., ASIC" and "e.g., FPGA" in the diagram.
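
The merge step in the block diagram, $\hat{x} = \sum_t \hat{y}_t$ with $\hat{y}_t = \hat{x}_t / T$, amounts to averaging the per-tree reconstructions while leaving only a sum on the receiver side. A small Python sketch of that reading, with made-up numbers for $T = 3$ trees and two variables (all values are hypothetical):

```python
import numpy as np

T = 3  # number of trees (hypothetical)

# Hypothetical per-tree reconstructions x_hat_t of one event with two variables
x_hat_t = np.array([[27.0, 24.0],
                    [30.0, 21.0],
                    [24.0, 30.0]])

# Each lookup table stores its reconstruction pre-scaled by 1/T ...
y_hat_t = x_hat_t / T

# ... so the receiver only needs a vector sum to obtain the forest average
x_hat = y_hat_t.sum(axis=0)
print(x_hat)  # [27. 25.]
```

Pre-scaling inside the lookup tables is our assumption about why the diagram writes $\hat{y}_t = \hat{x}_t / T$; it keeps division out of the receiver logic.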

Key question: How to achieve dynamic compression

# Use anomaly detection? (PRD in preparation) TM Hong

### Prototype study with Prof. B. Carlson (Westmont College)

Look at jets at LHC pileup=200, sum energy in 4 rings around seed



- Train DT autoencoder on pileup jets
- Hard scatter jets are anomalous with respect to pileup
- Compression would depend on anomaly
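
One way to read the last bullet is that the anomaly score decides how many bits to transmit. The Python sketch below is purely illustrative: the Gaussian sample stands in for pileup-jet ring energies, the score is taken to be the absolute distance between input and reconstruction (an assumption), and the threshold and bit counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(50.0, 5.0, size=2000)  # stand-in for pileup-jet energies

# One-variable binned autoencoder: 8 equal-population bins, median per bin
edges = np.quantile(train, np.linspace(0, 1, 9))[1:-1]  # 7 interior cuts
bins = np.digitize(train, edges)                        # bin numbers 0..7
medians = np.array([np.median(train[bins == b]) for b in range(8)])

def anomaly_score(x):
    """Distance between the input and its reconstruction (assumed metric)."""
    x_hat = medians[np.digitize(x, edges)]
    return abs(x - x_hat)

def bits_to_send(x, threshold=10.0):
    """Hypothetical policy: 3-bit bin number if typical, raw 8 bits if anomalous."""
    return 8 if anomaly_score(x) > threshold else 3

print(bits_to_send(50.0), bits_to_send(120.0))
```

A value near the bulk of the training sample is sent as a bin number, while a value far outside it keeps its full precision.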



# **Python-based code**

# Availability

- <u>gitlab.com/PittHongGroup/fwX</u>: parallel cuts (paper 1)
- Shared by email request:
  - parallel paths (paper 2)
  - autoencoder (paper 3)
  - hardware tree (paper 4)

## Licensing

- Will share for "Non-Commercial, Educational and Research Purposes"
- For commercial use, contact Univ. of Pittsburgh Innovation Institute
- See EULA for details


# **Git structure**





- Xconfig: creates model configuration (tutorial, part 1)
- Xfirmware: writes HLS or VHDL (tutorial, part 2)
- Vivado: synthesize & testbench (tutorial, part 3)

# More info

### Start page

• <u>fwx.pitt.edu</u>





Information regarding the fwX project will be available on this page. This project is developed by members of the Hong Group in the Department of Physics and Astronomy and collaborators.

#### What is fwX

• Its full name is "firmware ex machina," a play on the Latin / Greek phrase deus ex machina / θεὸς ἐκ μηχανῆς. Since it's a mouthful to say, we refer to it as fwX.

• It is a software package to design nanosecond implementations of machine learning / artificial intelligence algorithms on FPGA for use in high energy physics.

#### Some figures



Caption: Illustrative example of ★coder (star-coder) as two visual representations of the same decision tree. Deep decision tree (left) rendered as the decision tree grid (center) and implemented by the parallel decision paths (right). The two-depth deep decision tree (DDT) is the encoder (step 1), shown as a conventional binary split diagram; the latent space is the bin number (step 2); the latent space data is decoded using the decision tree grid (DTG) (step 3); and the simultaneous encoding and decoding with the ★coder architecture (right) is represented by the parallel decision paths (PDP) of Ref. [79]. The DTG is the visualization as a grid of partitions in $V$-dimensional space. In this example, the input $x = (55, 70)$ yields the output $\hat{x} = (27, 25)$ without needing to explicitly produce the latent layer.

Demonstration of the decision tree-based autoencoder and of data transmission / anomaly detection using the MNIST dataset, which is a set of images of handwritten numbers converted to 28 × 28 pixels, or a 784-length input vector, $V = 784$, with $N = 8$ bits per pixel. The ML training is done on 15k images of handwritten 0 to 4, but not 5 to 9, on one tree, $T = 1$, at a maximum depth of $D = 20$. The output is a 784-length vector with 8 bits per pixel. The data compression / decompression factor, the ratio of input-output bits to the latent space dimensions, is $V \cdot N/(T \cdot D) = 784 \cdot 8/(1 \cdot 20)$.

### Tutorial

SMARTHEP Edge ML School 9/24/24

Slides

indico.cern.ch/event/1405026/contributions/6103378/

Videos on synthesizing & test bench

indico.cern.ch/event/1405026/contributions/6103386/

[Screenshot: Vivado behavioral simulation (`sim_1`, `ae_testbench`) of `fwX_ae_behavioral_tb.vhd`, run for 1000 ns. The waveform shows the 8-bit inputs `event0` to `event3`, the expected value `expectedDistTemp`, the outputs `ap_return` and `vector_output`, and the address `addr`.]

[Screenshot of the fwX web page: datasets, code, and talks / posters.]

Paper 3 (anomaly detection, decision tree autoencoder)

- Datasets
  - *fwXmachina example: Anomaly detection*, Mendeley Data, doi: 10.17632/s6985kscs.1 (2023-04-11). This sample is used in v1 of the paper draft [arXiv:2304.03836v1].
  - *fwXmachina example: Anomaly detection for two photons and two jets*, Mendeley Data, doi: 10.17632/44t976dyrj.1 (2024-02-05). This sample is used in the final version of the paper.
- Code
  - Python: Available upon request
  - IP testbench: *Xilinx inputs for nanosecond anomaly detection with decision trees*, http://d-scholarship.pitt.edu/id/eprint/4443… (2023-04-23). This testbench is used in v1 of the paper draft [arXiv:2304.03836v1].
  - IP testbench: *Xilinx inputs for nanosecond anomaly detection with decision trees for two photons and two jets*, http://d-scholarship.pitt.edu/id/eprint/45784 (2024-02-01). This testbench is used in the final version of the paper.

Paper 4 (application in ATLAS Upgrade): no datasets or code listed.

Talks / Posters

| #  | Date       | Type: Title | Venue / Link | Speaker |
|----|------------|-------------|--------------|---------|
| 1  | 2021-05-24 | Talk: Comparisons to hls4ml's boosted decision tree results | Phenomenology Symposium, Pheno 2021, indico | T.M. Hong |
| 2  | 2021-06-06 | Poster: Nanosecond machine learning with BDT for high energy physics | Virtual HEP conference on Run4@LHC, Offshell 2021, indico | B.T. Carlson |
| 3  | 2021-07-13 | Talk: Nanosecond machine learning with BDT for high energy physics | Division of Particles and Fields (DPF) in the American Physical Society (APS), indico | B.T. Carlson |
| 4  | 2021-09-28 | Seminar: Invisible Higgs decays & trigger challenges at the LHC | University of Geneva, Switzerland | T.M. Hong |
| 5  | 2021-10-18 | Talk: Presentation of fwX BDT | Int'l Conf. on Accelerator and Large Experimental Physics Control Systems, ICALEPCS 2021, indico | S.T. Roche |
| 6  | 2021-10-22 | … learning in real-time triggers at the LHC: A discussion on Machine learning, Boosted …, Real-time trigger, and ML on FPGA | Department of Physics, University of Tennessee, Knoxville | T.M. Hong |
| 7  | 2021-10-20 | Poster: Presentation of fwX BDT | IEEE Nuclear Science Symposium and Medical Imaging Conference, 2021 IEEE NSS MIC, link | S.T. Racz |
| 8  | 2021-12-04 | Talk: fwXmachina part 1: Classification with boosted decision trees on FPGA for L1 trigger | PIKIMO 11, indico | T.M. Hong |
| 9  | 2023-05-12 | Talk: Comparisons of fwX's BDT to hls4ml's neural network results | Phenomenology Symposium, Pheno 2023 | S.T. Roche |
| 10 | 2023-09-25 | Talk: Decision tree autoencoder anomaly detection on FPGA at L1 triggers | Fast Machine Learning for Science Workshop 2023, indico | T.M. Hong |

# Conclusion

# Introduction

Papers

# FPGA design

- Decision tree autoencoder
- Parallel decision trees in VHDL

# Thoughts on SRO

- Transparent interpretation
- 30 ns data compression
- 30 ns anomaly detection















# to collaboration




### Backup

# **Papers**



#### 1. Classification

- Parallel cuts using HLS
- "Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics"
- Hong et al. JINST 16, P08016 (2021) http://doi.org/10.1088/1748-0221/16/08/P08016, arXiv:2104.03408

#### 2. Regression

- Parallel paths using HLS
- "Nanosecond machine learning regression with deep boosted decision trees in FPGA for high energy physics"
- Carlson et al. JINST 17, P09039 (2022) http://doi.org/10.1088/1748-0221/17/09/P09039, arXiv:2207.05602

#### 3. Autoencoder

- In-house training, bypassing latent space
- "Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays"
- Roche et al. Nat. Comm. **15** (2024) 3527 https://arxiv.org/abs/2304.03836

#### 4. Hardware trees

- Faster & more efficient, no more HLS
- "Nanosecond hardware regression trees in FPGA at the LHC", PITT-PACC-2409-v1
- Serhiayenka et al. Submitted to NIM-A [2409.20506]

#### Paper 1: Parallelize cuts



2d plane: x<sub>a</sub> vs. x<sub>b</sub>









#### Key idea:

Forest can be merged prior to firmware implementation
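A minimal Python sketch of the merging idea, assuming trees whose cuts are axis-aligned thresholds. The tuple-based tree layout and the names `score`, `merge_forest`, and `evaluate` are hypothetical and are not the fwXmachina interface. The thresholds of all trees define a grid of bins, and the summed forest score is computed once per bin ahead of time, so evaluation needs only a bin search and one table read.

```python
from bisect import bisect_right
from itertools import product

# A tree is either a leaf score (float) or a node:
# (variable_index, threshold, subtree_if_below, subtree_if_at_or_above)

def score(tree, x):
    """Walk one tree cut by cut and return its leaf score."""
    while isinstance(tree, tuple):
        var, thr, below, above = tree
        tree = below if x[var] < thr else above
    return tree

def collect_cuts(tree, cuts):
    """Gather every threshold used by a tree, per input variable."""
    if isinstance(tree, tuple):
        var, thr, below, above = tree
        cuts[var].add(thr)
        collect_cuts(below, cuts)
        collect_cuts(above, cuts)

def representative(edges, i):
    """Return a point inside bin i, where bin i holds edges[i-1] <= x < edges[i]."""
    if i == 0:
        return edges[0] - 1.0 if edges else 0.0
    return edges[i - 1]

def merge_forest(forest, n_vars):
    """Merge a forest into sorted cuts per variable and one summed score per bin."""
    cuts = [set() for _ in range(n_vars)]
    for tree in forest:
        collect_cuts(tree, cuts)
    edges = [sorted(c) for c in cuts]
    table = {}
    for bin_idx in product(*(range(len(e) + 1) for e in edges)):
        point = [representative(e, i) for e, i in zip(edges, bin_idx)]
        table[bin_idx] = sum(score(t, point) for t in forest)
    return edges, table

def evaluate(edges, table, x):
    """Evaluate the merged forest: find the bin, then read its stored score."""
    bin_idx = tuple(bisect_right(e, v) for e, v in zip(edges, x))
    return table[bin_idx]
```

The table grows as the product of the number of bins per variable, so this picture suits a small number of input variables and shallow trees.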

### Paper 1: Block diagram





# Paper 1: Look up bin engine







Search for the bin where the data point lives
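One way to picture a look-up based bin search, sketched in Python under the assumption that each input is an unsigned integer of small bit width. The helper names `build_bin_lut` and `find_bin` and the threshold values are hypothetical. The bin index for every possible input value is precomputed, so that the search at run time is a single table read per variable.

```python
from bisect import bisect_right

def build_bin_lut(thresholds, n_bits):
    """Precompute the bin index for every possible unsigned n_bits input value."""
    edges = sorted(thresholds)
    return [bisect_right(edges, value) for value in range(1 << n_bits)]

def find_bin(luts, x):
    """Return the bin index of each variable using one table read per variable."""
    return tuple(lut[value] for lut, value in zip(luts, x))

# Hypothetical 8-bit inputs x_a and x_b with made-up cut values
lut_a = build_bin_lut([64, 128], n_bits=8)
lut_b = build_bin_lut([100], n_bits=8)
print(find_bin([lut_a, lut_b], (10, 200)))
```

The table size is 2^n_bits entries per variable, which is why such a scheme trades memory for a fixed, short search time.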

### Paper 1: Bit shifting






#### Paper 1: Scaling





# Paper 1: vs. hls4ml family






# **Test bench setup**



#### Philosophy

- Every training ships with test vectors
- Every design creates its own testbench
- Performance values from implementation, not estimates



#### Estimates vs. actual



#### Compared

• Estimated usage / latency vs. actual usage / latency

Table 12: FPGA cost verification against physical FPGA. Comparison of the FPGA cost using the bitstream on the FPGA (actual) with the simulated timing using co-simulation and the estimated resources using Vivado HLS (estimated). The actual-to-estimated ratios are given as R. Two FPGA choices and three clock speeds are considered; the 320 MHz group of columns represents the benchmark clock. For all other configurable parameters, see table 1. The timing values are reported in units of clock ticks. The Xilinx Vivado versions used for the actual and estimated columns are noted. For the ratios, "1" signifies no difference.

| Parameter                   | Benchmark FPGA, 320 MHz   | Benchmark FPGA, 200 MHz   | Benchmark FPGA, 100 MHz   | Smaller FPGA, 100 MHz |
|-----------------------------|---------------------------|---------------------------|---------------------------|-----------------------|
| Family                      | Xilinx Virtex UltraScale+ | Xilinx Virtex UltraScale+ | Xilinx Virtex UltraScale+ | Xilinx Artix-7        |
| Model                       | xcvu9p-flga2104-2L-e      | xcvu9p-flga2104-2L-e      | xcvu9p-flga2104-2L-e      | xc7z020-clg400-1      |
| Period                      | 3.125 ns                  | 5 ns                      | 10 ns                     | 10 ns                 |
| Vivado (actual, estimated)  | 2019.2, 2019.2            | 2018.2, 2018.2            | 2018.2, 2018.2            | 2019.1, 2019.2        |
| FPGA cost                   | actual / estim. = R       | actual / estim. = R       | actual / estim. = R       | actual / estim. = R   |
| Latency                     | 3 / 3 = 1                 | 2 / 2 = 1                 | 1 / 1 = 1                 | 4 / 4 = 1             |
| Interval                    | 1 / 1 = 1                 | 1 / 1 = 1                 | 1 / 1 = 1                 | 1 / 1 = 1             |
| LUT                         | 717 / 1903 = 0.4          | 717 / 4015 = 0.2          | 717 / 4007 = 0.2          | 482 / 3572 = 0.1      |
| FF                          | 147 / 138 = 1.1           | 147 / 113 = 1.3           | 147 / 2 = 73              | 245 / 362 = 0.7       |
| BRAM                        | 5.5 / 8 = 0.7             | 5.5 / 15 = 0.4            | 5.5 / 15 = 0.4            | 7.5 / 15 = 0.5        |
| URAM                        | 0 / 0 = 1                 | 0 / 0 = 1                 | 0 / 0 = 1                 | NA / NA = NA          |
| DSP                         | 2 / 0 = NA                | 2 / 2 = 1                 | 2 / 2 = 1                 | 2 / 2 = 1             |

Not always 1
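The ratios in Table 12 can be recomputed from the actual and estimated values. A short Python check for the 320 MHz benchmark column follows; the numbers are taken from the table, while the dictionary layout and the `ratio` helper are only illustrative.

```python
# (actual, estimated) values for the 320 MHz benchmark column of Table 12
cost_320mhz = {
    "Latency": (3, 3),
    "Interval": (1, 1),
    "LUT": (717, 1903),
    "FF": (147, 138),
    "BRAM": (5.5, 8),
    "URAM": (0, 0),
    "DSP": (2, 0),
}

def ratio(actual, estimated):
    """Actual-to-estimated ratio R, rounded to one decimal place.

    Equal values give 1.0 (no difference); a zero estimate with a
    nonzero actual value has no defined ratio and gives None.
    """
    if actual == estimated:
        return 1.0
    if estimated == 0:
        return None
    return round(actual / estimated, 1)

for name, (actual, estimated) in cost_320mhz.items():
    print(f"{name:8s} R = {ratio(actual, estimated)}")
```

Timing (latency, interval) matches the estimate, while the resource ratios for LUT, FF, and BRAM differ from 1, which is the reason for quoting values from the implemented design.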

### FW testbench w/ IP available



#### http://d-scholarship.pitt.edu/45784/

#### Screenshots in the document

#### Autoencoder Firmware Testbench Tutorial

Please download Vivado 2019.2 at the following link, if you do not currently have it: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/archive.html

#### **Before Beginning**

Before beginning, please make sure that you have (and know the location of) the autoencoder IP folder, and the VHDL testbench files:

| Name               | Date modified     | Type        | Size |
|--------------------|-------------------|-------------|------|
| autoencoder8var_ip | 2/7/2024 1:30 PM  | File folder |      |
| tb_vhd_files       | 2/8/2024 11:50 AM | File folder |      |

#### **Creating New Project in Vivado**

Open Vivado 2019.2 and select "create new Project." On the following pop-up, select "next," and you will be prompted to name the project. Name the project as you wish and choose a location to store it. Keep clicking "next" until you reach a page that prompts you to select the part/board. For this tutorial, we will be using the Virtex UltraScale+ VCU118 board. After you have selected your part or board, keep clicking "next" until you have reached the end of the setup pages.

[Screenshot: Vivado "New Project" wizard, "Default Part" page, "Boards" tab. Searching for "vcu118" gives one match: Virtex UltraScale+ VCU118 Evaluation Platform, vendor xilinx.com, file version 2.3, part xcvu9p-flga2104-2L-e.]







#### Autoencoder intro



#### Example: handwritten numbers

• Teach it about the number 4



#### Corresponding data set

| Image | Pixel 1 | Pixel 2 | ... | Pixel 300 | ... | Pixel 783 | Pixel 784 |
|-------|---------|---------|-----|-----------|-----|-----------|-----------|
| 1     | 0       | 0       | ... | 240       | ... | 0         | 0         |
| 2     | 0       | 1       | ... | 255       | ... | 0         | 0         |
| ...   | ...     | ...     | ... | ...       | ... | ...       | ...       |
| 500k  | 0       | 0       | ... | 231       | ... | 0         | 0         |

#### Details

• Each pixel in the data set is unrelated to the others


## FWXMACHINA

# Logic flow

- Left-to-right data flow (see right)
- Realized that we can bypass the latent space!
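A rough Python sketch of what bypassing the latent space could look like for a single decision tree. It assumes, as an illustration and not as a description of the fwXmachina firmware, that the encoder maps an input to a bin, the decoder maps the bin back to a stored reconstruction point, and the anomaly score is the L1 distance between input and reconstruction. The tree, its cut values, and its reconstruction points are all made up.

```python
# Nodes are (variable_index, threshold, below, above);
# leaves are (bin_id, reconstruction_point). All values are hypothetical.
TREE = (0, 50,
        (1, 30, (0, (20, 10)), (1, (25, 60))),
        (1, 70, (2, (80, 40)), (3, (75, 90))))

def find_leaf(tree, x):
    """Follow the cuts until a leaf is reached."""
    while len(tree) == 4:
        var, thr, below, above = tree
        tree = below if x[var] < thr else above
    return tree

def encode(tree, x):
    """Encoder: input -> latent representation (the bin id)."""
    return find_leaf(tree, x)[0]

def decode(tree, bin_id):
    """Decoder: bin id -> reconstruction point stored in that leaf."""
    if len(tree) == 2:
        return tree[1] if tree[0] == bin_id else None
    found = decode(tree[2], bin_id)
    return found if found is not None else decode(tree[3], bin_id)

def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def score_via_latent(tree, x):
    """Two-step path: encode to the latent space, decode, then compare."""
    return l1_distance(x, decode(tree, encode(tree, x)))

def score_direct(tree, x):
    """One-step path: the leaf already holds the reconstruction, so skip the bin id."""
    _, point = find_leaf(tree, x)
    return l1_distance(x, point)

print(score_via_latent(TREE, (22, 12)), score_direct(TREE, (22, 12)))
```

Both paths give the same score, but the direct one never materializes the latent value, which is the sense in which the latent space is bypassed in this sketch.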







# **Machine learning**

Focus on the most popular use cases in HEP

#### Supervised classification

- Neural networks & Boosted decision trees
- Others (SVM, kNN, Matrix element, etc.)

#### Structural similarities: NN & BDT

- Step function boundary
- Fuzzy boundary

#### Use cases

- Regression
- Classification S vs. B
- Anomaly detection B vs. not-B

Not covered

Focus of this section

Previous slides

If time

Later slides



Will discuss other approaches (estimation, unsupervised) after intro

# **Neural networks basics**

From Bruce Denby, *Tutorial on Neural Network Applications in High Energy Physics: A 1992 Perspective*, FERMILAB-CONF-92/121-E





Sum of step functions can approximate the desired contour





The contour is converted to the final step function
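A small Python sketch of the two stages, assuming a two-input network with Heaviside step activations. The four hidden units, their weights, and the output threshold are made up for illustration: each hidden unit is a step across one straight line, their sum counts how many half-planes contain the point, and the output step keeps only the region where all of them agree (here a square).

```python
def step(z):
    """Heaviside step function."""
    return 1 if z >= 0 else 0

# Each hidden unit is (w_a, w_b, bias) and fires when w_a*x_a + w_b*x_b + bias >= 0
HIDDEN = [
    (1, 0, 1),    # x_a >= -1
    (-1, 0, 1),   # x_a <= 1
    (0, 1, 1),    # x_b >= -1
    (0, -1, 1),   # x_b <= 1
]

def hidden_sum(x_a, x_b):
    """Sum of step functions: number of half-planes that contain the point."""
    return sum(step(w_a * x_a + w_b * x_b + b) for w_a, w_b, b in HIDDEN)

def classify(x_a, x_b):
    """Output unit: a final step that fires only if every hidden unit fired."""
    return step(hidden_sum(x_a, x_b) - len(HIDDEN))

print(classify(0, 0), classify(2, 0))
```

Adding more hidden units with other orientations would let the enclosed region follow a less regular contour.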

# **Activation function**

Fuzzy boundary using a function





Activation fn gives users a handle to control true / false positive rates
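To illustrate the handle, here is a toy Python sketch assuming a sigmoid activation on the network output. The pre-activation values for signal and background are invented for the example, and `rates` is a hypothetical helper. Moving the threshold on the sigmoid output trades true positives against false positives.

```python
import math

def sigmoid(z):
    """Smooth (fuzzy) replacement for the step function."""
    return 1.0 / (1.0 + math.exp(-z))

def rates(signal_z, background_z, threshold):
    """True and false positive rates for a cut on the sigmoid output."""
    tpr = sum(sigmoid(z) >= threshold for z in signal_z) / len(signal_z)
    fpr = sum(sigmoid(z) >= threshold for z in background_z) / len(background_z)
    return tpr, fpr

# Invented pre-activation values, for illustration only
SIGNAL_Z = [2.0, 1.0, 0.5, -0.5]
BACKGROUND_Z = [-2.0, -1.0, 0.2, -0.3]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, rates(SIGNAL_Z, BACKGROUND_Z, threshold))
```

A looser threshold accepts more signal and more background; a tighter one rejects more of both.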

# **Decision tree basics**

And how it achieves the same result as NN



#### Step function for 2d





# Flip book





#### Unit gaussians of two variables









#### **Binary classification**






#### **Binary classification**


tree1 depth1

# tree1 depth2







#### **Binary classification**









#### **Binary classification**

#### tree1 depth4

# tree1 depth4



tree1 depth8



#### Draws diagonal

# Depth 2

vary trees



































#### becomes very blurry

# Put it together on one slide





Sweet spot depends on the physics problem

# Forest of decision trees

Fuzzy boundary by averaging step functions





Forest of decision trees provides the gradient
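A toy one-dimensional Python sketch of this effect, assuming each tree is a single cut (a stump) and the forest output is the plain average of the tree outputs. The ten cut positions are made up. Each stump is a hard step, but their average rises in small increments and so approximates a smooth ramp.

```python
def stump(threshold):
    """A depth-1 tree in one variable: a hard step at the threshold."""
    return lambda x: 1.0 if x >= threshold else 0.0

def forest_average(stumps, x):
    """Average of the step functions of all trees in the forest."""
    return sum(s(x) for s in stumps) / len(stumps)

# Ten hypothetical stumps with cuts at 0.0, 0.1, ..., 0.9
STUMPS = [stump(i / 10) for i in range(10)]

for x in (-1.0, 0.25, 0.45, 2.0):
    print(x, forest_average(STUMPS, x))
```

With more trees the increments become finer, so the averaged output plays a role similar to a smooth activation function.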

# **Activation function**

Fuzzy boundary using a function





#### Different approach, but same result