

Zimtohrli: A New Psychoacoustic Perceptual Metric for Audio Compression

Dave_Rosenthal 11 days

A few comments:

- My understanding is that a gammachirp is the established filter to use for an auditory filter bank. Is there any reason you chose an elliptical filter instead?

- I didn't look too closely, but it seems like you are analyzing the output of the filter bank as real numbers. I highly recommend you convolve with a complex representation of the filter and keep all of the math in the complex domain until you collapse to loudness.

- I wouldn't bucket into discrete 100 Hz time slices; instead, just convolve the temporal masking function with the filter bank output at its full time resolution.

- You might want to add a volume normalization step, so that the final metric is the Zimtohrli distance between A and B*x minimized over x, where x is a free variable for volume. Otherwise, a perceptual codec that just tends to make things a bit quieter might get a bad score.

- For Fletcher-Munson, I assume you are just using a curve at a high-ish volume? If so, good :)

- I'm not sure how you are spacing the filter bank center frequencies relative to ERB size, but I'd recommend oversampling by a factor of 2-3 (that is, a few filters per ERB).
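As a rough illustration of the ERB-spacing suggestion, here is a minimal Python sketch that places filter bank center frequencies evenly on the ERB-rate scale with several filters per ERB. It assumes the Glasberg & Moore (1990) ERB formulas; the function names and the 50 Hz to 16 kHz range are hypothetical choices for illustration, not anything taken from Zimtohrli itself.

```python
import numpy as np

def erb_rate(f_hz):
    """ERB-number (Cam) scale, Glasberg & Moore (1990): 21.4 * log10(1 + 0.00437 f)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_rate_to_hz(cams):
    """Inverse of erb_rate: convert ERB-number back to frequency in Hz."""
    return (10.0 ** (np.asarray(cams, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def center_frequencies(f_min, f_max, filters_per_erb=3):
    """Center frequencies evenly spaced on the ERB-rate scale.

    filters_per_erb > 1 oversamples the scale, so neighbouring filters
    are spaced a fraction of an ERB apart (hypothetical helper).
    """
    lo, hi = erb_rate(f_min), erb_rate(f_max)
    n = int(np.floor((hi - lo) * filters_per_erb)) + 1
    return erb_rate_to_hz(lo + np.arange(n) / filters_per_erb)

# Example: roughly three filters per ERB between 50 Hz and 16 kHz.
cf = center_frequencies(50.0, 16000.0, filters_per_erb=3)
spacing_in_erbs = np.diff(cf) / erb_bandwidth(cf[:-1])
```

With `filters_per_erb=3`, adjacent centers end up about a third of an ERB apart at every frequency, which corresponds to the 2-3x oversampling the comment recommends.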

Apologies if any of these are off base; I just took a quick look.
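The volume-normalization suggestion amounts to a one-dimensional minimization over a gain x applied to B. The sketch below is written under stated assumptions: it takes a generic `distance(a, b)` callable, and `mse_distance` is a hypothetical stand-in (plain mean squared error), not the Zimtohrli metric. Searching over gain in dB within ±20 dB is likewise an arbitrary illustrative choice; the search itself uses SciPy's `minimize_scalar` with the bounded method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mse_distance(a, b):
    """Hypothetical stand-in for a perceptual distance; NOT Zimtohrli."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def gain_normalized_distance(a, b, distance=mse_distance, max_db=20.0):
    """Minimize distance(a, x * b) over a free gain x.

    The gain is searched in dB within [-max_db, +max_db] so the search is
    symmetric for louder and quieter versions of b.
    Returns (minimized distance, best linear gain).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def objective(gain_db):
        return distance(a, b * 10.0 ** (gain_db / 20.0))

    res = minimize_scalar(objective, bounds=(-max_db, max_db), method="bounded")
    return res.fun, 10.0 ** (res.x / 20.0)

# Example: B is simply A at half amplitude (about 6 dB quieter).
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
a = np.sin(2.0 * np.pi * 440.0 * t)
d, g = gain_normalized_distance(a, 0.5 * a)  # g ≈ 2.0, d ≈ 0
```

Without the normalization, the stand-in distance between `a` and `0.5 * a` is clearly nonzero; after minimizing over gain it drops to essentially zero, which is the behavior the comment asks for so that a codec is not penalized merely for a level change.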

givinguflac 11 days

I looked through the deeper explanation and found this interesting:

“Performing a simple experiment where we have 5 separate components:

- 1000 Hz sine probe at 57 dB SPL
- 750 Hz sine masker A at 71 dB SPL
- 800 Hz sine masker B at 71 dB SPL
- 850 Hz sine masker C at 67 dB SPL
- 900 Hz sine masker D at 65 dB SPL

I record the following data.

When playing probe + masker A through D individually, I experience the probe approximately as intensely as a 1000 Hz tone at 53 dB SPL. When playing probe + all maskers, I experience the probe approximately as intensely as a 1000 Hz tone at 48 dB SPL.”

I would be very interested in learning more about their testing methodology, and especially their hardware setup.

Is the perceiver a trained listener? Are they using headphones or speakers or some other transducer method?

It's awfully difficult to say that perceived SPL is equivalent across different frequency ranges, even as a trained listener, especially given how much frequency response differs between listening setups.

The average user has no chance; hence my curiosity about their specific credentials, considering they're building an entirely new perceptual model on that basis.

Thoreandan 11 days

Interesting, if hard to understand.

It would be nice to see ELI5 explanations for items like this, akin to Monty's 'A Digital Media Primer for Geeks' ( https://people.xiph.org/~xiphmont/demo/#:~:text=Xiph )

formerly_proven 11 days

I'm guessing the name is meant to allude to cinnamon pig ears (https://en.wikipedia.org/wiki/Palmier).

DoctorOetker 11 days

Are there any associated scientific articles and/or datasets that back up the experimental claim/insinuation of matching JNDs or perceptual differences?

Is this a proposal without experimental verification?