> - s8: did you make these plots?

These came from David Ruth! Though I did prepare most of the code that went into making them.

> - what exactly does “uncalibrated” mean? Is this at the EM scale?

Yep! That's correct!

> - why do the 2 GeV neutrons show up at 5 GeV in the LD reconstruction?

At this point, that's still not understood. Olaiya had started looking into it, and the last I heard from him on this topic, he had noticed that the reported neutron energy (what you get from edm4hep::MCParticle::getEnergy()) is systematically higher than what was actually thrown. The deviation increases as a function of thrown energy and, based on some rough calculations, doesn't line up with what I would expect if it were just a difference between throwing kinetic energy vs. reporting total energy. A good check would be to train on the kinetic energy we know we threw (e.g. 2 GeV) rather than relying on what edm4hep::MCParticle reports.
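If it's useful, here's a minimal sketch of that check, assuming the campaign output is read back as a podio/edm4hep ROOT file with the usual "MCParticles" collection. The file name is a placeholder, and the reader class has changed name across podio releases (ROOTFrameReader in older ones, ROOTReader in newer ones). It just recomputes the thrown neutrons' total and kinetic energies from the stored momentum and mass and compares them to what getEnergy() reports:

```cpp
// Sketch: cross-check edm4hep::MCParticle::getEnergy() against the total and
// kinetic energies recomputed from the stored momentum and mass.
#include <cmath>
#include <cstddef>
#include <iostream>
#include "podio/Frame.h"
#include "podio/ROOTFrameReader.h"
#include "edm4hep/MCParticleCollection.h"

int main() {
  podio::ROOTFrameReader reader;
  reader.openFile("forward_neutrons.edm4hep.root");  // placeholder file name

  for (std::size_t iEvt = 0; iEvt < reader.getEntries("events"); ++iEvt) {
    const podio::Frame event(reader.readNextEntry("events"));
    const auto& particles = event.get<edm4hep::MCParticleCollection>("MCParticles");

    for (const auto& par : particles) {
      // only look at the thrown (generator status 1) neutrons
      if (par.getGeneratorStatus() != 1 || par.getPDG() != 2112) continue;

      const auto   mom  = par.getMomentum();
      const double pMag = std::sqrt(mom.x * mom.x + mom.y * mom.y + mom.z * mom.z);
      const double mass = par.getMass();
      const double eTot = std::sqrt(pMag * pMag + mass * mass);  // total energy from (p, m)
      const double eKin = eTot - mass;                           // kinetic energy

      std::cout << "reported E = " << par.getEnergy()
                << ", recomputed E_tot = " << eTot
                << ", E_kin = " << eKin << " GeV\n";
    }
  }
  return 0;
}
```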
> - why does LD and MLP give such a different response at low E, where the performance really matters: 5 vs 1 GeV?

It's not immediately clear to me why they would give such different responses, but it could be a consequence of an LD model not being adequate for the problem at hand. As I understand it, the LD model is effectively trying to group points into clusters (particle energies in our case) using statistical properties of the training data in feature space (lead BHCal and BIC cluster energies, positions, etc.). But if there isn't enough separation in the means of our training data (which could be especially true at lower energies), it might have trouble separating the data into discrete groups.

Another aspect could be inadequate training data: I would expect the performance of both models to improve as you increase the statistics used to train (though that has diminishing returns past some point) and as you increase the number of kinematic points you train on (which I suspect is far more important). We trained the models and made these plots using the campaign output, but only at the 4 kinematic points shown on the plots, and the campaign output might not have a granular enough set of kinematic points to appropriately train the models (especially at low energies). A more prudent approach (and a good check) would be to train the models on a hand-run simulation with a very granular spread of kinematic points (emphasizing the 1 - 5 GeV range), and then use the trained models to process specific kinematic points from the campaign data. Depending on how the LD model performs with that training set, you may want to drop it completely.

Lastly, it should be pointed out that the MLP shown in these plots is the one implemented in TMVA, which David identified as being severely limited in the hyperparameters available to tune. The MLP David implemented in PyTorch should be far more adequate, and I would recommend using that going forward. (A minimal sketch of how the TMVA LD and MLP could be retrained side by side on a more granular sample is at the end of this note.)

> - why is the resolution so different (15% vs 115%)?

I would suspect it's for similar reasons, but this time with the MLP struggling with the data. If you look at figs. 8.190 and 8.192, you can see that even though the MLP curves have better linearity, the calibrated energy distributions themselves look pretty funny, which makes it seem to me like the MLP model might be overfitting the data. In contrast, the LD does produce nice, roughly Gaussian distributions. So this could be a trade-off between the two models: the MLP architecture might be too complicated for the problem and is getting skewed by the variance of the data, while the LD is, by definition, extremely simple, so the variance isn't an issue but it struggles to correctly classify the data. Again, my suggestion would be to try a much more granular training set and to compare the performance of both against David's PyTorch MLP.

> - like this, one cannot trust the calibrated energy at all, I think.

Fair! I think the plots handily demonstrate that there are clear issues that need to be resolved first. So I would suggest: again, checking the performance with a more granular simulation and folding David's MLP into the comparisons; and having somebody manually calculate rough calibration factors. The second point, I think, would go a long way toward establishing trust in the ML techniques and cross-checking them (a rough sketch of what that could look like is also at the end of this note). I sketched out some possible ways of going about this on slide 4 of this presentation: https://indico.bnl.gov/event/28139/contributions/107601/attachments/62022/106426/BHCalSoftwarePriorities.d16m5y2025.v2.pdf

> - what does this look like for simple reconstruction (correct for sampling fraction, h/e, with clustering)? Or is that the uncalibrated?

The uncalibrated curves correct for the sampling fraction and include clustering, but they don't correct for h/e; that's part of what the ML is intended to do! But I agree: I would very much like to see what the curves look like with a "simpler" approach (e.g. using the manual calibration mentioned above).

> - s10: this is Andrew’s work for BSM physics. But we really need something for standard ePIC physics. That should look for a MIP signal in each tile, to make use of the individual tile read-out. Can we discuss a way how to get there?
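As promised above, here is a minimal sketch of how the TMVA LD and MLP regressions could be retrained side by side on a hand-run, more granular sample, so their responses can be compared on equal footing. It is only a sketch under assumptions: the input file, tree, and branch names (training.root, ntForCalib, eSumBHCal, eSumBIC, ePar) are placeholders for whatever the flat training ntuple actually contains, and the option strings are the stock ones from the TMVA regression tutorial rather than tuned settings.

```cpp
// Sketch: book the TMVA LD and MLP regressions on the same inputs/target.
#include <TFile.h>
#include <TTree.h>
#include <TMVA/DataLoader.h>
#include <TMVA/Factory.h>
#include <TMVA/Types.h>

void TrainBHCalRegression() {
  TFile* input  = TFile::Open("training.root");            // placeholder
  TTree* train  = (TTree*) input->Get("ntForCalib");       // placeholder
  TFile* output = TFile::Open("tmva_bhcal_regression.root", "RECREATE");

  TMVA::Factory    factory("BHCalRegression", output, "!V:!Silent:AnalysisType=Regression");
  TMVA::DataLoader loader("bhcal_dataset");

  // features: summed BHCal and BIC cluster energies (add positions etc. as needed)
  loader.AddVariable("eSumBHCal", 'F');
  loader.AddVariable("eSumBIC",   'F');
  // regression target: the thrown (kinetic) energy we know we generated
  loader.AddTarget("ePar");

  loader.AddRegressionTree(train, 1.0);
  loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents:!V");

  // book both methods with the same inputs so they can be compared directly
  factory.BookMethod(&loader, TMVA::Types::kLD,  "LD",  "!H:!V");
  factory.BookMethod(&loader, TMVA::Types::kMLP, "MLP",
                     "!H:!V:VarTransform=Norm:NeuronType=tanh:HiddenLayers=N+5:NCycles=10000");

  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();
  output->Close();
}
```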
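And on the manual calibration factors: one rough way to go about it (again just a sketch, with placeholder file and branch names) is to fit a Gaussian to the summed, sampling-fraction-corrected cluster energy at each thrown energy and take E_thrown / mu as the factor. Since the factor is derived per thrown energy, it would also fold in the average h/e correction at that energy, and the resulting numbers would give a simple benchmark to hold the ML results against.

```cpp
// Sketch: rough by-hand calibration factors as a cross-check of the ML.
#include <cstdio>
#include <TF1.h>
#include <TFile.h>
#include <TH1D.h>
#include <TTree.h>

void MakeRoughCalibFactors() {
  TFile* input = TFile::Open("single_neutrons.root");  // placeholder
  TTree* tree  = (TTree*) input->Get("ntForCalib");    // placeholder

  const double thrownEnergies[] = {1., 2., 5., 10.};   // placeholder list of thrown energies [GeV]
  for (const double ePar : thrownEnergies) {
    TH1D hist("hist", ";E_{reco} [GeV]", 100, 0., 2. * ePar);

    // select events thrown at this energy (allowing for float rounding)
    tree->Draw("eSumBHCal + eSumBIC >> hist",
               Form("abs(ePar - %f) < 0.1", ePar), "goff");

    hist.Fit("gaus", "Q");  // quiet Gaussian fit
    TF1* fit = hist.GetFunction("gaus");
    if (!fit) continue;

    const double mu    = fit->GetParameter(1);
    const double sigma = fit->GetParameter(2);
    printf("E_thrown = %.1f GeV: <E_reco> = %.2f GeV, sigma = %.2f GeV, factor = %.2f\n",
           ePar, mu, sigma, ePar / mu);
  }
}
```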