Wednesday, April 16, 2008

Text To Speech

I like to have at least a passing knowledge of many modern computer technologies. One that I have always found quite interesting is synthetic digital voices, and the most obvious use for them is having your computer read text to you.

The first digital voice I remember hearing was way back when I was somewhere between eight and ten years old. It was on a PBS show called 3-2-1 Contact. It amazed me that a computer could replicate the speech of one of the guys on the show. The program took the words he spoke and repeated them over and over until it had a digital version of the speech, just as someone who hears a phrase in a foreign language might repeat it to themselves until they have a close approximation. Granted, it was a bit hard on the ears and lacked any sort of inflection or depth, but it was pretty much what one would expect from a computerized voice in the '80s.

Sometime in the last ten years, very basic programs that could read text aloud started coming with the Windows and Mac operating systems. Such programs are referred to as Text to Speech (TTS) software. Of course, this is significantly different from what I saw in my childhood, because the computer now has to figure out how to sound out a word without hearing a sample of it. Honestly, I don't think the resulting voice was much better than the one I heard on 3-2-1 Contact. What I find interesting is that the research behind creating a digital voice was done 25 years ago, yet it was only finally implemented 10 to 15 years later.

Of course, digital voices have done nothing but improve over the last 10 years. Last year, on a whim, I decided to check out the best voices available, and the top contender seemed to be an AT&T TTS voice named Crystal. I compared it to the basic voice that comes with Windows and the Mac OS, and there was a world of difference. Listening to the basic voice is a bit painful and its inflections are very weird, but the Crystal voice isn't so bad. Don't get me wrong: the voice is quite obviously digital and, again, some of its inflections and pauses can still be weird, but it is at least tolerable. I find it akin to listening to someone with a mild foreign accent, only the accent is digitized English.

Advances come much more quickly these days, and on another whim, I checked for the latest voices a couple of days ago. I came across another set of voices from a company called NeoSpeech. I suspect these voices were around last year and I missed them, but since I only found them this time around, I can claim them as new advances in my own personal knowledge base. I must say they are a clear step up from the AT&T voices. I checked out two of them, Paul and Kate, and not only are they improvements over Crystal, they take up only a third of the hard drive space: Crystal weighs in at about 700MB to install, while these two are about 233MB each.

I don't use TTS software very often, mostly because I can read just fine on my own; these programs are primarily developed for people with poor eyesight or dyslexia. Also, reading something to oneself is much faster than having someone, or some program, read it to you. However, I do find all three to be decent computerized voices, and I use them from time to time when I want to get through something but also need to do something else that doesn't require my brain. Perhaps 95% of the time that I use TTS software, I am cleaning my apartment.

Hope you enjoyed your little technology lesson!

1 comment:

John said...

Pat, I don't know if you look back on these old posts, but I am doing some catch-up and had to weigh in on this. I think TTS is great for reading back your own work. The problem with reading your own work is that you are familiar with it, so you are likely to miss something because you get comfortable and lax. Using TTS makes it easier to pick up mistakes, in my opinion.