Published on 08/25/2006

Introduction

Talking Software for the Commodore 64
By Tom Jeffries
Talking games have grabbed their fair share of chart toppers this year. Our American sound expert, Tom Jeffries, went to Berkeley, in California, to talk to the freelance sound specialists who put the chat into Ghostbusters, Impossible Mission, Beach-Head II, and Kennedy Approach.
The latest thing in game software today is speech synthesis. So many games on the market these days use synthesized voices that I decided to find out who was responsible for all this digital eloquence, why software companies are finding it worthwhile to include speech in their programs, and where it is all headed.
Back in the days when computers were enormous, expensive machines available only to people in large universities and corporations, the intellectual challenge of playing a game with a machine had to take the place of advanced features like graphics or sound. Computer time and memory space were far too expensive to fill up with such frills, so mainframe games were (and are) usually text-only.
Home computers changed all that. Techno-freaks being what they are, it didn't take long before people started demanding arcade-style graphics on home computers, so special chips were added and large amounts of memory were set aside just for graphics.
Sound also got attention. At first outboard devices were required to create an audible output, but soon ways were found to incorporate sound capability into computers. The Apple II and IBM PC both use one of the earliest and simplest forms of onboard sound: a speaker driven by a series of on/off pulses sent by writing to a particular memory location. Programmers have created some amazingly complex sounds, including speech, using this primitive hardware.
As home computers progressed, both the graphics and sound capabilities got better and better. There has been a consistent push for greater realism in game play.
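The on/off speaker scheme described above can be pictured as a stream of ones and zeros, one per time step. The following is only an illustrative sketch in Python, not code from any of the machines mentioned; the function name `speaker_states` and the chosen rates are assumptions made for the example.

```python
def speaker_states(frequency_hz, sample_rate_hz, count):
    """Return a list of 0/1 speaker positions approximating a square wave.

    Hypothetical helper: the speaker flips every half cycle of the tone,
    so the state is the parity of the number of half cycles elapsed.
    """
    states = []
    for n in range(count):
        half_cycles_elapsed = (2 * frequency_hz * n) // sample_rate_hz
        states.append(half_cycles_elapsed % 2)
    return states


if __name__ == "__main__":
    # A 1,000 Hz tone stepped 8,000 times per second: four steps off, four on.
    print(speaker_states(1000, 8000, 16))
```

A real program of the period would flip the speaker with carefully timed machine code rather than build a list, but the pattern of pulses is the same idea.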
It Talks
[Image: Ghostbusters by Activision]
So it won't surprise you that more and more computer games are including the sound of human (and non-human) voices. Pick up a copy of Kennedy Approach, Impossible Mission, Beach-Head II, Jump Jet or Ghostbusters and you'll see what I mean. Not only is speech synthesis being used very widely, but the quality is amazingly clear and improving all the time. Good speech synthesis is very difficult; it didn't surprise me that most software houses have someone else do it for them. It did surprise me, however, to find out that all of the above-mentioned games except for Jump Jet had their speech provided by one company: Electronic Speech Systems of Berkeley, California. Since I only live a few miles away, it seemed like a good idea to run up there and see if I could find out the secret of their success.
ESS started in 1970 when Todd Mozer's father, Dr. Forrest Mozer, a space physicist at the University of California at Berkeley, developed a technique for speech synthesis based on playing back a digitized voice. It had been assumed previously that this approach would use a prohibitive amount of memory, but Dr. Mozer found ways to encode the data and reduce its size by as much as a hundredfold. Other approaches rely on creating an elaborate mathematical model of the human voice, requiring either a special dedicated speech chip or a very fast, powerful (and expensive) central processor, and producing a rather mechanical sounding voice. Dr. Mozer's algorithm keeps the natural inflections of the human voice and, in current implementations, can use any microprocessor.
[Image: Professor Forrest Mozer]
At first Dr. Mozer concentrated on hardware implementations of his ideas. His technology was used in the first talking calculator for the blind and in a speech chip produced by National Semiconductor. As the limitations of this approach became clear, he and his associates began to concentrate on ways to synthesize speech in software with little or no added hardware, which led to the techniques used to reproduce the incredible laugh in Ghostbusters. Currently ESS, in addition to providing blood-curdling sounds for computer games, is producing speech synthesis products for major electronic equipment manufacturers. They've just finished a product for AT&T that will ring you up in case of a fire or burglary at your house when you are away and tell you what the problem is; they are working with a major automobile manufacturer on a system that will tell you if your oil is low, and will tell you or your mechanic what the problem is when you break down. Wow!

How It's Done

The ESS system is protected by a dozen or so patents so the details remain secret, but basically it goes like this. They start out by making a high quality recording of the words they want to use, with a voice they feel is appropriate. (For example, for an educational program based on Kipling's The Jungle Book they used an Indian student of Dr. Mozer's.) They then digitize the sound (convert it from analog tape-type sound to "1"s and "0"s that the computer can read) and, using a mini-computer, crunch the recording down to one hundredth of its original size. This crunching is the heart of their system. It takes a considerable amount of effort to decide which information can be thrown away and which is essential to the sound. The original information usually involves about 10,000 complete sound samples per second; the finished product uses between 90 and 625 bytes per second.
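The figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the article does not say how large each raw sample is, so one byte per sample is assumed here, and the function names are invented for the example.

```python
RAW_SAMPLES_PER_SECOND = 10_000  # figure quoted in the article
BYTES_PER_RAW_SAMPLE = 1         # assumption: the article does not give a sample size


def compression_ratio(encoded_bytes_per_second):
    """How many times smaller the encoded stream is than the raw one."""
    raw_bytes_per_second = RAW_SAMPLES_PER_SECOND * BYTES_PER_RAW_SAMPLE
    return raw_bytes_per_second / encoded_bytes_per_second


def seconds_of_speech(memory_bytes, encoded_bytes_per_second):
    """How long a given amount of memory lasts at a given encoded rate."""
    return memory_bytes / encoded_bytes_per_second


if __name__ == "__main__":
    for rate in (625, 90):
        print(rate, "bytes/s:", round(compression_ratio(rate), 1), "times smaller,",
              round(seconds_of_speech(8192, rate), 1), "seconds in 8K")
```

Under that one-byte assumption the raw stream is 10,000 bytes per second, so 625 bytes per second is a 16-fold reduction and 90 bytes per second is roughly 111-fold, which is in line with the hundredfold figure.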
On the Commodore 64, they normally use a rate of 375 bytes per second or less, so it's possible to pack quite a lot of speech into a program. To play back the speech on the Commodore 64, ESS uses the machine's own sound device, the SID chip, but in quite an unusual way. All of the registers of SID are shut down except the volume control, which is varied up and down to recreate the original waveform. Since there are only 16 possible settings, the resulting sound can never be as good as an ordinary tape deck, which has the capability of infinite variation, but they do produce easily intelligible speech. ESS's technology can reproduce the accents and inflections of the original speaker quite accurately, like the Indian speaker in The Jungle Book, or can change them as needed so that the same vocabulary can produce a human and a robot voice.
[Image: Kipling's Jungle Book by Fisher-Price]
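To see what "only 16 possible settings" means in practice, here is a rough Python sketch of squeezing a waveform into the SID's sixteen volume steps. It is not ESS's patented method, which remains secret; it shows only the final quantizing step. The function name `to_volume_levels` is invented for the example, and the constant records the C64 address of the SID volume register for reference without being used by the code.

```python
import math

SID_VOLUME_REGISTER = 0xD418  # C64 address whose low four bits set SID master volume
LEVELS = 16                   # four bits give settings 0 to 15


def to_volume_levels(samples):
    """Map samples in the range -1.0..1.0 onto volume settings 0..15.

    Hypothetical helper: out-of-range samples are clipped, then each
    sample is rounded to the nearest of the sixteen available steps.
    """
    levels = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        levels.append(int((clipped + 1.0) / 2.0 * (LEVELS - 1) + 0.5))
    return levels


if __name__ == "__main__":
    # One cycle of a sine wave, eight steps long.
    tone = [math.sin(2 * math.pi * n / 8) for n in range(8)]
    print(to_volume_levels(tone))
```

On the real machine each resulting value would be written to the volume register at a steady rate, and the coarse steps are why the result cannot match a tape deck.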
Kennedy Approach
[Image: Kennedy Approach by MicroProse]
All of this technology is pretty impressive, but it's up to the software companies to put it to use. I asked George Geary of MicroProse Software, publisher of Kennedy Approach, an air traffic control simulation, why MicroProse had decided to use speech synthesis in their program, and his answer was simple and to the point: "To enhance game play." The voice from the airport control tower (you) alternates with the voices from the various airplanes in giving and receiving instructions and really does add a considerable amount of realism to the simulation. Listen carefully, and you will notice that the voices of the different pilots are pitched differently - a subtle touch, but I found that even before I was aware that the voices were different, my ear knew the difference. MicroProse, which has its speech digitizing done by ESS, is so happy with the effect of speech in Kennedy Approach that it is currently adding a male and a female voice to Solo Flight so that they can re-release an enhanced version. They do plan to limit their use of speech synthesis to programs where the game play itself will be enhanced by the electronic voice.
Other uses of synthesized speech are more whimsical. No one would argue that speech is a necessary part of Ghostbusters, but it certainly adds a distinctive and humorous touch. According to Brad Fregger, Director of Software Development at Activision, they wanted to "give the game the same feeling as the movie", and voice was one way of accomplishing this. Activision considers voice to be "The icing on the cake - we wouldn't leave out the eggs in order to have the icing", but in this case there was room for both. Personally, I'm glad - what other game says, "He slimed me", when I miss? Likewise, the voices in Jump Jet and Impossible Mission, while adding to the enjoyment and character of the software, are not essential to the game. Robert Botch, Epyx's Vice President of Marketing, said speech was put into Impossible Mission "to add something extra - some realism"; the cry that occurs as your character falls through one of the holes in the floor is certainly realistic enough.
A more serious use of speech synthesis is in educational programs. According to Todd Mozer, this is the area where ESS expects to see the greatest use of electronic voices in the future. He said, "There have been a lot of studies done about the effectiveness of speech in learning and the results have been extremely positive. Children will sit in front of a computer longer if it's giving them verbal feedback, and it provides a much more effective mechanism for teaching. I would expect that to be a realm where speech takes off." ESS has already produced speech for several educational programs including Talking Teacher by Imagic and Cave of the Word Wizard by Timeworks.
[Image: Cave Of The Word Wizard by Timeworks]

The Future

What's the next step in the never-ending battle for greater realism and higher sales? The experts were nearly unanimous: before too long computers will be able to understand and respond to your speech. Speech recognition is extremely difficult to accomplish because of the complexities of the English language and the variations between voices, but several systems have been developed, including the Covox Voicemaster system for the Commodore 64. Mozer thinks that eventually computer manufacturers may include speech recognition capabilities as a part of the computer. It sounds like fun to me: I can think of quite a few things to say to that ghost that slimed me in Ghostbusters.
[Image: Voicemaster System by Covox]
With built-in speech synthesis and speech recognition, you and your Commodore can sit down for a heart-to-heart chat or, more realistically, you will be able to use your home computer, with a modem, as an intelligent telephone answering machine. Not surprisingly, ESS is just putting the finishing touches to a system which does exactly that. If there is any doubt about whether synthesized speech is here to stay, check into the specifications for Commodore's new wonder machine, the Amiga. Speech synthesis is built into the Amiga, and software companies are rushing to put it to use. So get used to hearing your computer talk back.
Where Are They Now?
Professor Forrest Mozer still works for the University of California, Berkeley, now in the role of Associate Director of Space Sciences. Professor Mozer has been responsible for pioneering work in the area of electric field measurements and space plasma, for which he was awarded the EGU Hannes Alfvén Medal in 2004. He has continued to maintain his roots in sound research and was involved in the Mars Microphone project to record sounds from the surface of Mars! His son Todd Mozer continued his father's breakthrough work in speech synthesis technology and now runs the global company Sensory Inc., which researches and manufactures products for advanced speech recognition. Todd has appeared on television shows such as Tech Closeup, based in California, USA, discussing his company's line of products, and numerous articles have been published online about Sensory Inc.
[Image: Todd Mozer - Sensory Inc.]
Article reproduced from Commodore User magazine, October 1985 edition. Although all text appears unchanged, the Where Are They Now? section is an addition to the original article. Some photographs or images have been added or modified for aesthetic purposes. Thank you to the following websites which were used for sourcing some images that appear in this article: Tech Closeup, Ebay, SPRG