Editor in Chief
- Jan 5, 2011
For folks curious about how Google created the new, more human-sounding voice for Google Now, the above video is worth a gander. It's quite fascinating to see the complexities of human voice communication, and the video does a good job of breaking some of it down.
The gist of it is that the basic units of sound that make up every word are gathered into a huge library of phonemes, phones, and diphones. Scientists have studied thousands of samples of spoken dialogue from voice actors to help nail down the way these different sounds are created naturally from word to word. This helps them create a voice synthesis engine that doesn't need to play back a series of recordings, but can instead simulate speech in real time.
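To make the idea of a diphone a bit more concrete, here's a tiny Python sketch. It is only an illustration and not how Google's engine works: the function name `to_diphones`, the `_` silence marker, and the phoneme spelling of "cat" are all made up for this example. It just shows how a string of sounds can be broken into overlapping sound-to-sound transitions, which is the kind of unit such a library would hold.

```python
def to_diphones(phonemes):
    """Pair each sound with its neighbor, padding both ends with silence ('_').

    Hypothetical helper for illustration; not taken from any real
    speech synthesis library.
    """
    padded = ["_"] + list(phonemes) + ["_"]
    # Each diphone spans the transition from one sound into the next.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Assumed, simplified phoneme spelling of the word "cat".
print(to_diphones(["k", "ae", "t"]))  # ['_-k', 'k-ae', 'ae-t', 't-_']
```

Because each unit covers the handoff between two sounds rather than a single sound, the tricky transitions are what get captured, which is likely why they matter so much for natural-sounding speech.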
This is basically the way real people do it, using the organic tools given to us by nature. Of course, there's still a ways to go before it's perfected. Although Google has clearly improved the voice characteristics of its synthesis engine, it is still working on the prosody and intonation, meaning the rhythm, stress, and pitch of speech.
The video makes some of these terms clearer, so it's definitely worth a watch when you have the time.