If you wait at an airport, you will often hear an announcement like this: "Hello, passengers on flight CA1831 from Beijing to Shanghai, please note that your plane will take off soon. Please take your luggage and board at Gate 15." After a while, the same voice repeats the announcement with a different flight number, different departure and arrival cities, and a different boarding gate. Who is the announcer? Isn't it tiresome for her to say the same sentence so many times?
In fact, most of these announcements are synthesized by computers. The operator only needs to type in a passage of text, and the computer automatically turns it into a human voice. To announce a different flight or station, just change the corresponding words in the text. To switch to a male voice or a child's voice, just select a different voice from the voice library. The computer can keep doing this job for as long as needed.
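The idea of changing only a few words in a fixed announcement can be sketched in a few lines of Python. This is a minimal illustration that produces only the text to be spoken; the template wording and the function name `build_announcement` are assumptions made for this example, and the step that turns the text into sound is left out.

```python
# Hypothetical announcement template: only the fields in braces change.
TEMPLATE = (
    "Hello, passengers on flight {flight} from {origin} to {destination}, "
    "please note that your plane will take off soon. "
    "Please take your luggage and board at Gate {gate}."
)


def build_announcement(flight, origin, destination, gate):
    """Fill the fixed template with the details of one flight."""
    return TEMPLATE.format(
        flight=flight, origin=origin, destination=destination, gate=gate
    )


if __name__ == "__main__":
    # The operator changes only the data, never the sentence itself.
    print(build_announcement("CA1831", "Beijing", "Shanghai", 15))
```

The resulting string would then be handed to whatever speech synthesizer and voice the operator has selected.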
This work of turning text into speech is called speech synthesis. The earliest speech synthesis technology was very simple: it played back pre-recorded words one by one. But the result sounded very mechanical, like a child who has just learned to speak. More importantly, the computer could only say words that had been recorded in advance, so the approach was far too inflexible.
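The word-by-word approach and its limitation can be illustrated with a small Python sketch. The clip table `RECORDED_CLIPS` and its file names are invented for this example; a real system would play the audio files rather than just list them.

```python
# Hypothetical table of pre-recorded words and their audio clips.
RECORDED_CLIPS = {
    "flight": "flight.wav",
    "gate": "gate.wav",
    "15": "15.wav",
    "boarding": "boarding.wav",
}


def clips_for(sentence):
    """Return the clips to play, in order, one per word of the sentence."""
    clips = []
    for word in sentence.lower().split():
        if word not in RECORDED_CLIPS:
            # Nothing was recorded for this word, so it cannot be spoken.
            raise KeyError(f"no recording for {word!r}")
        clips.append(RECORDED_CLIPS[word])
    return clips


if __name__ == "__main__":
    print(clips_for("Boarding gate 15"))
    try:
        clips_for("Boarding gate 16")
    except KeyError as error:
        print("Cannot speak:", error)
```

Any word missing from the table stops the whole sentence, which is exactly the inflexibility described above.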
People later realized that since speech can be written down phonetically, for example in pinyin composed of initials (the opening consonants) and finals (the vowel parts), why not let computers "learn" pinyin? Once the computer knows how to pronounce and combine each initial and final, all that is needed is software that converts written characters into pinyin the system can recognize, and the computer can then say anything, whether in a dialect, Mandarin, or a foreign language. When the software can also adjust tone and intonation according to the subtle relationships between neighboring words, the computer will not only pronounce every word clearly and correctly, but also sound "emotional".
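The pinyin idea can be sketched in Python as follows. This is a simplified illustration, not how any particular synthesizer works: the four-character `LEXICON` is a toy stand-in for a real pronunciation dictionary, tones are written as trailing digits, and the only tone change handled is a pairwise version of the Mandarin third-tone rule (a third tone directly before another third tone is read as a second tone).

```python
# Pinyin initials; two-letter initials come first so "sh" wins over "s".
# "y" and "w" are treated as initials here for simplicity.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

# Toy character-to-pinyin dictionary (an assumption for this example).
LEXICON = {"上": "shang4", "海": "hai3", "你": "ni3", "好": "hao3"}


def split_syllable(syllable):
    """Split e.g. 'shang4' into (initial, final, tone) = ('sh', 'ang', 4)."""
    tone = 0
    if syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # syllable with no initial, such as "an"


def apply_third_tone_change(units):
    """Simplified rule: tone 3 directly before another tone 3 becomes tone 2."""
    result = []
    for i, (initial, final, tone) in enumerate(units):
        if tone == 3 and i + 1 < len(units) and units[i + 1][2] == 3:
            tone = 2
        result.append((initial, final, tone))
    return result


def text_to_units(text):
    """Convert characters into the sound units a synthesizer would combine."""
    units = [split_syllable(LEXICON[char]) for char in text]
    return apply_third_tone_change(units)


if __name__ == "__main__":
    print(text_to_units("上海"))
    print(text_to_units("你好"))
```

Because every syllable is built from a small set of initials and finals, the computer needs recordings or models only for those units, not for every possible word.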
These functions require a powerful computer. In the early days, speech synthesis was used only by companies with large computer systems, to produce various pre-recorded prompts. Fortunately, with the development of computer software and hardware, work that once required a large computer system can now be handled by an ordinary personal computer. As a result, more people who need it can enjoy the benefits of speech synthesis.
For example, the open-source screen reader NonVisual Desktop Access (NVDA) can turn the text displayed on the screen into sound, helping people with visual impairments "see" the computer. The reading software on some handheld devices can read novels aloud directly. Advances in microelectronics have further shrunk the entire hardware and software of a speech synthesizer onto a single chip, so that devices such as electronic dictionaries and MP3 players can "speak".