Immediately after switching pages, it works with CSR.
Please reload your browser to see how it works.
Combine that with algorithms that can translate English into phonetic tokens with relatively high accuracy, and you have speech synthesis. Make the dictionary big enough, add enough finesse, plus a few hundred rules about transitioning from phoneme to phoneme, and it produces relatively understandable speech.
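To make that concrete, here is a minimal sketch of the dictionary-plus-rules text-to-phoneme step. The names (LEXICON, RULES, word_to_phonemes) and the ARPAbet-style symbols are illustrative, and real rule sets (such as the 1976 NRL letter-to-sound rules) are context-sensitive and number in the hundreds, not the dozen shown here:

```python
# Toy sketch of the classic dictionary-plus-rules text-to-phoneme step.
# All names and rules here are made up for illustration.

# Exception dictionary for words the generic rules would mispronounce.
LEXICON = {
    "one": ["W", "AH", "N"],
    "of":  ["AH", "V"],
}

# Ordered letter-to-sound rules; digraphs come first so the longest
# matching grapheme wins.
RULES = [
    ("ch", ["CH"]), ("sh", ["SH"]), ("th", ["TH"]),
    ("ee", ["IY"]), ("oo", ["UW"]),
    ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]),
    ("o", ["AA"]), ("u", ["AH"]),
    ("k", ["K"]), ("n", ["N"]), ("s", ["S"]), ("t", ["T"]),
]

def word_to_phonemes(word: str) -> list[str]:
    """Dictionary lookup first, then left-to-right rule matching."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    i = 0
    while i < len(word):
        for grapheme, phones in RULES:
            if word.startswith(grapheme, i):
                phonemes.extend(phones)
                i += len(grapheme)
                break
        else:
            i += 1  # no rule matched: drop the letter silently
    return phonemes

print(word_to_phonemes("sheet"))  # ['SH', 'IY', 'T']
print(word_to_phonemes("one"))   # ['W', 'AH', 'N'], via the dictionary
```

A real formant synthesizer would then feed this phoneme string through the transition rules mentioned above, interpolating formant targets between neighboring phonemes; that coarticulation step is what this sketch leaves out.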
Part of me feels that we are losing something by moving away from these classic approaches to AI. It used to be that, to teach a machine to speak or translate, the designer of the system had to understand how language worked. Sometimes these models percolated back into broader thinking about language: formant synthesis ended up inspiring some ideas about how the brain recognizes phonemes. (Or maybe that worked in both directions.) It was thought that further advances would come from better theories about language, better abstractions. Deep learning has produced far better systems than the classic approach, but those systems offer little in the way of understanding or simplification.
If this is just how it is, the "more than one hundred languages" claim is a bit suspect.
I switched to it in early childhood, at a time when human-sounding synthesizers were notoriously slow and noticeably unresponsive, and I haven't found anything better since. I used Vocalizer for a while, which is what iOS and macOS ship with, but when third-party synthesizer support was added I switched right back.
However, I wasn't satisfied with the speech quality, so now I'm using RHVoice. RHVoice seems to produce more natural, human-sounding output to me.
https://www.youtube.com/watch?v=493xbPIQBSU