The ESS system is protected by a dozen or so patents, so the details remain secret, but basically it goes like this. They start out by making a high-quality recording of the words they want to use, in a voice they feel is appropriate. (For example, for an educational program based on Kipling's The Jungle Book they used an Indian student of Dr. Mozer's.) They then digitize the sound (convert it from analog, tape-type sound to "1"s and "0"s that the computer can read) and, using a minicomputer, crunch it down to one hundredth of its original size. This crunching is the heart of their system. It takes a considerable amount of effort to decide which information can be thrown away and which is essential to the sound. The original information usually involves about 10,000 complete sound samples per second; the finished product uses between 90 and 625 bytes per second.

On the Commodore 64, they normally use a rate of 375 bytes per second or less, so it's possible to pack quite a lot of speech into a program.
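
To get a feel for what those rates mean, here is a rough back-of-the-envelope sketch in Python. It is not ESS's method (that remains secret); it only works through the arithmetic. The sample size of one byte and the 16 KB speech budget are my own assumptions for illustration, and the function names are made up for this example.

```python
# Figures quoted in the article.
SAMPLES_PER_SECOND = 10_000   # original digitized recording
ESS_RATES = (90, 375, 625)    # compressed bytes per second

# Assumption (not stated in the article): one byte per original sample.
BYTES_PER_SAMPLE = 1


def compression_ratio(compressed_rate, bytes_per_sample=BYTES_PER_SAMPLE):
    """How many times smaller the compressed stream is than the raw one."""
    raw_rate = SAMPLES_PER_SECOND * bytes_per_sample
    return raw_rate / compressed_rate


def seconds_of_speech(budget_bytes, compressed_rate):
    """Seconds of speech that fit in a given memory budget."""
    return budget_bytes / compressed_rate


if __name__ == "__main__":
    for rate in ESS_RATES:
        ratio = compression_ratio(rate)
        print(f"{rate} bytes/s: about {ratio:.0f} times smaller than raw")
    # Hypothetical budget: 16 KB of memory set aside for speech.
    seconds = seconds_of_speech(16 * 1024, 375)
    print(f"16 KB holds about {seconds:.0f} seconds at 375 bytes/s")
```

Under these assumptions, the lowest rate corresponds roughly to the "one hundredth" figure, while the higher rates trade memory for quality.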

To play back the speech on the Commodore 64, ESS uses the machine's own sound device, the SID chip, but in quite an unusual way. All of the registers of SID are shut down except the volume control, which is varied up and down to recreate the original waveform.

Since the volume control has only 16 possible settings, the resulting sound can never be as good as that of an ordinary tape deck, which is capable of infinite variation, but the technique does produce easily intelligible speech.
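
The effect of having only 16 settings can be sketched in a few lines of Python. This is an illustration of 16-level quantization in general, not ESS's playback routine: the 440 Hz test tone, the 8,000 samples-per-second rate, and the function names are all assumptions chosen for the example, and nothing here touches real SID hardware.

```python
import math

LEVELS = 16  # the volume control's 16 settings (0 to 15)


def quantize(sample, levels=LEVELS):
    """Map a sample in [-1.0, 1.0] to the nearest level 0..levels-1."""
    clamped = max(-1.0, min(1.0, sample))
    return round((clamped + 1.0) / 2.0 * (levels - 1))


def reconstruct(level, levels=LEVELS):
    """Map a level back into the [-1.0, 1.0] range."""
    return level / (levels - 1) * 2.0 - 1.0


def sine_wave(frequency_hz, sample_rate_hz, count):
    """Generate `count` samples of a sine tone (a stand-in for speech)."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


if __name__ == "__main__":
    wave = sine_wave(440, 8000, 64)          # hypothetical test tone
    levels = [quantize(s) for s in wave]     # values a volume control could take
    worst = max(abs(s - reconstruct(q)) for s, q in zip(wave, levels))
    print(levels[:16])
    print(f"worst-case error: {worst:.3f}")
    print(f"step between settings: {2 / (LEVELS - 1):.3f}")
```

Each sample can be off by up to half a step, which is the coarseness a listener hears as noise; a tape deck has no such steps.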

ESS's technology can reproduce the accents and inflections of the original speaker quite accurately, like those of the Indian speaker in The Jungle Book, or can change them as needed so that the same vocabulary can produce a human and a robot voice.
Kipling's Jungle Book by Fisher-Price

Kennedy Approach by Microprose
All of this technology is pretty impressive, but it's up to the software companies to put it to use. I asked George Geary of MicroProse Software, publisher of Kennedy Approach, an air traffic control simulation, why MicroProse had decided to use speech synthesis in their program, and his answer was simple and to the point: "To enhance game play." The voice from the airport control tower (you) alternates with the voices from the various airplanes in giving and receiving instructions and really does add a considerable amount of realism to the simulation. Listen carefully, and you will notice that the voices of the different pilots are pitched differently - a subtle touch, but I found that even before I was aware that the voices were different, my ear knew the difference.

MicroProse, which has its speech digitizing done by ESS, is so happy with the effect of speech in Kennedy Approach that it is currently adding a male and a female voice to Solo Flight so that it can re-release an enhanced version. The company does plan to limit its use of speech synthesis to programs where the game play itself will be enhanced by the electronic voice.

Other uses of synthesized speech are more whimsical. No one would argue that speech is a necessary part of Ghostbusters, but it certainly adds a distinctive and humorous touch. According to Brad Fregger, Director of Software Development at Activision, they wanted to "give the game the same feeling as the movie", and voice was one way of accomplishing this.

Activision considers voice to be "The icing on the cake - we wouldn't leave out the eggs in order to have the icing", but in this case there was room for both. Personally, I'm glad - what other game says, "He slimed me", when I miss?

Likewise, the voices in Jump Jet and Impossible Mission, while adding to the enjoyment and character of the software, are not essential to the game.

Robert Botch, Epyx's Vice President of Marketing, said speech was put into Impossible Mission "to add something extra - some realism"; the cry that occurs as your character falls through one of the holes in the floor is certainly realistic enough.

A more serious use of speech synthesis is in educational programs. According to Todd Mozer, this is the area where ESS expects to see the greatest use of electronic voices in the future. He said, "There have been a lot of studies done about the effectiveness of speech in learning and the results have been extremely positive. Children will sit in front of a computer longer if it's giving them verbal feedback, and it provides a much more effective mechanism for teaching. I would expect that to be a realm where speech takes off." ESS has already produced speech for several educational programs including Talking Teacher by Imagic and Cave of the Word Wizard by Timeworks.
Cave Of The Word Wizard by Timeworks