This is cool - there’s some similar work here (https://arxiv.org/pdf/2402.01571) which uses spiking neural networks (essentially Dirac pulses). I think the next step for this would be to learn a tonal embedding of the source alongside the event embedding, so that you don’t have to rely on physically modelled priors. There’s some interesting work on guitar amp tone modelling that’s already doing this: https://zenodo.org/records/14877373
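To make the split concrete: the idea is two parallel representations of the same signal, one per-frame embedding for *when* events happen and one time-pooled embedding for *what* the source sounds like. Here's a minimal NumPy sketch with random projections standing in for learned encoders; all names and dimensions are illustrative assumptions, not from either linked paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: illustrative only.
N_FRAMES, N_BINS = 100, 64   # spectrogram frames x frequency bins
D_EVENT, D_TONE = 8, 16      # embedding sizes for events and tone

# Toy "encoders": random linear projections in place of trained networks.
W_event = rng.normal(size=(N_BINS, D_EVENT))
W_tone = rng.normal(size=(N_BINS, D_TONE))

def encode(spec):
    """Split a spectrogram into a per-frame event embedding (timing)
    and a single time-pooled tonal embedding (source character)."""
    event = spec @ W_event                 # (N_FRAMES, D_EVENT): per-frame
    tone = (spec @ W_tone).mean(axis=0)    # (D_TONE,): pooled over time
    return event, tone

spec = rng.random((N_FRAMES, N_BINS))
event, tone = encode(spec)
print(event.shape, tone.shape)  # (100, 8) (16,)
```

In a real system both projections would be trained jointly, so the tonal embedding absorbs the source characteristics that a physically modelled prior would otherwise have to supply.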
This is a hypothesis put forward by Gerald Langner in the last chapter of “The Neural Code of Pitch and Harmony” (2015). I personally think he was on to something, but sadly he died in 2016 before he could promote the work.
I’m the author of the high-resolution guitar model posted in a comment above. I have a drum transcription model that I’m getting ready for release soon, which should be state of the art for this task. I’ll try to update this thread when it’s out.