Immediately after switching pages, it works with CSR.
Please reload your browser to see how it works.
Combine that with algorithms that can translate English into phonetic tokens with relatively high accuracy, and you have speech synthesis. Make the dictionary big enough, add enough finesse, plus a few hundred rules about transitioning from phoneme to phoneme, and it produces relatively understandable speech.
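To make that concrete, here is a minimal sketch of the dictionary-plus-rules text-to-phoneme step. The names (LEXICON, RULES, word_to_phonemes) and the ARPAbet-style symbols are illustrative, and real rule sets (such as the 1976 NRL letter-to-sound rules) are context-sensitive and number in the hundreds, not the dozen shown here:

```python
# Toy sketch of the classic dictionary-plus-rules text-to-phoneme step.
# All names and rules here are made up for illustration.

# Exception dictionary for words the generic rules would mispronounce.
LEXICON = {
    "one": ["W", "AH", "N"],
    "of":  ["AH", "V"],
}

# Ordered letter-to-sound rules; digraphs come first so the longest
# matching grapheme wins.
RULES = [
    ("ch", ["CH"]), ("sh", ["SH"]), ("th", ["TH"]),
    ("ee", ["IY"]), ("oo", ["UW"]),
    ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]),
    ("o", ["AA"]), ("u", ["AH"]),
    ("k", ["K"]), ("n", ["N"]), ("s", ["S"]), ("t", ["T"]),
]

def word_to_phonemes(word: str) -> list[str]:
    """Dictionary lookup first, then left-to-right rule matching."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    i = 0
    while i < len(word):
        for grapheme, phones in RULES:
            if word.startswith(grapheme, i):
                phonemes.extend(phones)
                i += len(grapheme)
                break
        else:
            i += 1  # no rule matched: drop the letter silently
    return phonemes

print(word_to_phonemes("sheet"))  # ['SH', 'IY', 'T']
print(word_to_phonemes("one"))   # ['W', 'AH', 'N'], via the dictionary
```

A real formant synthesizer would then feed this phoneme string through the transition rules mentioned above, interpolating formant targets between neighboring phonemes; that coarticulation step is what this sketch leaves out.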
Part of me feels that we are losing something by moving away from these classic approaches to AI. It used to be that, to teach a machine to speak or translate, the designer of the system had to understand how language worked. Sometimes these models percolated back into broader thinking about language: formant synthesis ended up inspiring some ideas about how the brain recognizes phonemes. (Or maybe that worked in both directions.) It was thought that further advances would come from better theories about language, better abstractions. Deep learning has produced far better systems than the classic approach, but those systems offer little in the way of understanding or simplification.
If this is just how it is, the "more than one hundred languages" claim is a bit suspect.
I switched to it in early childhood, at a time when human-sounding synthesizers were notoriously slow and noticeably unresponsive, and I haven't found anything better since. I used Vocalizer for a while, which is what iOS and macOS ship with, but when third-party synthesizer support was added I switched right back.
However, I wasn't satisfied with the speech quality, so now I'm using RHVoice. RHVoice seems to produce more natural, human-sounding output to me.
https://www.youtube.com/watch?v=493xbPIQBSU