Electronics Technology


Processing power of voice recognition technologies requires enhancement for continuous speech recognition

5 October 2005 Electronics Technology

Considered unviable until recently, the realtime speech recognition technology now used in voice portals consumes immense processing power. The computation-intensive Hidden Markov Model (HMM) technology of the mid-1980s improved the ability of voice recognition devices to identify relationships between words, and ultimately led to the development of powerful speech recognition applications.
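To illustrate why HMM-based recognition is computation-intensive, here is a minimal sketch of the standard forward algorithm. The algorithm scores one observation sequence against one HMM. This is a generic textbook illustration, not the method of any engine mentioned in this article. The function name and the log-probability inputs are assumptions. Each frame costs O(N²) operations for N states, so an utterance of T frames costs O(T·N²), and a recogniser must evaluate many such models.

```python
import numpy as np

def forward_log_likelihood(log_pi, log_A, log_B):
    """Return log P(observations | HMM) via the forward algorithm (illustrative sketch).

    log_pi: (N,) initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(state j | state i)
    log_B:  (T, N) per-frame emission log-likelihoods, log p(o_t | state)
    """
    T, N = log_B.shape
    alpha = log_pi + log_B[0]  # log alpha_0(j) for every state j
    for t in range(1, T):
        # For each state j, sum over all N predecessors i: O(N^2) work per frame.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[t]
    # Sum over the final states to get the total sequence likelihood.
    return np.logaddexp.reduce(alpha)
```

The calculation is done in the log domain so that long utterances do not underflow to zero.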

For systems to understand and respond to continuous speech, manufacturers must provide a large amount of processing power, which is difficult to achieve at reasonable cost. When users speak at natural speed, it becomes difficult to associate specific sounds with particular words. Because users usually do not pause between words, processing naturally spoken phrases in realtime is tricky.
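The segmentation problem can be made concrete with a toy example. The sketch below takes an unsegmented string, in which letters stand in for recognised sounds, and a small hypothetical lexicon. It then uses memoised dynamic programming to list every way the string can be split into words. Both the string representation and the lexicon are illustrative assumptions. Even this tiny case is ambiguous, which is why a continuous-speech recogniser must weigh competing word boundaries instead of relying on pauses.

```python
from functools import lru_cache

def segmentations(utterance, lexicon):
    """Return every way to split an unsegmented string into lexicon words."""
    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string completes one valid segmentation.
        if start == len(utterance):
            return [[]]
        results = []
        # Try every possible next word boundary after `start`.
        for end in range(start + 1, len(utterance) + 1):
            word = utterance[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)
```

For example, `segmentations("together", {"to", "get", "her", "together"})` returns both `["to", "get", "her"]` and `["together"]`.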

"Predominantly software-only engines demand more processing power than can be provided by traditional digital signal processing boards," notes VR Yoges, of Frost & Sullivan. "These boards are used in interactive voice response (IVR) systems, and they need additional processors to supplement the IVR processing power as well as to support and manage the system."

Nortel's speech-processing platform integrates speech technologies into its range of media processing server (MPS) platforms. MPS systems configured with additional speech servers reduce the response time of a voice recognition solution. The speech server is a speech-processing platform within an IVR/media processing platform, offering choice, investment protection and scalability. The advanced system software developed on this platform integrates with industry-standard components to offer the advantages of an open-architecture system.

"The design employs high-performance processors that plug into a separate resource subsystem integrated into the core operating architecture of the IVR/media server platform," says Yoges. "This approach provides a cost-effective and scalable resource for running advanced speech recognition and analysis."

Voice recognition systems also need to make allowances for the different pronunciations and intonations of the same word by different speakers. Interpreting this speech variability has led to the development of complex pattern analysis. Apart from accents, voice recognition systems have trouble filtering out background noise, especially on calls made from mobile phones. Although better microphones have partly remedied this issue, wind, murmurs and music still have to be properly isolated from the voice.
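Background noise is commonly quantified as a signal-to-noise ratio (SNR) in decibels, 10·log10(P_speech / P_noise), where P is the mean signal power. The sketch below is a minimal illustration. It assumes that separate speech and noise sample arrays are available; in a real call they are mixed together and must be estimated. The function name is hypothetical.

```python
import numpy as np

def snr_db(speech, noise):
    """SNR in decibels: 10 * log10(mean speech power / mean noise power)."""
    p_speech = np.mean(np.square(speech))  # mean power of the speech samples
    p_noise = np.mean(np.square(noise))    # mean power of the noise samples
    return 10.0 * np.log10(p_speech / p_noise)
```

A noise amplitude one tenth that of the speech gives a power ratio of 100, or about 20 dB.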

To address these concerns, ScanSoft introduced the OpenSpeech Recogniser (OSR), a speech recognition solution for telephony applications. A prominent feature of this solution is that it lets applications understand a wide range of words and phrases without requiring highly complex grammar rules.

Innovations in automatic speech recognition (ASR), along with new techniques for handling missing or unreliable data, aim to cope with noisy backgrounds while relying on models of clean speech. Such models can substantially improve recognition performance. This missing-data approach to robust ASR works on the premise that when speech is one of several sound sources, recognition is still possible using the spectral-temporal regions that remain uncorrupted.
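One common way to exploit the uncorrupted regions is marginalisation. When an acoustic model uses diagonal-covariance Gaussians, unreliable feature components can be integrated out, which amounts to scoring only the reliable components. The sketch below assumes a single diagonal Gaussian and a boolean reliability mask supplied by the caller. It illustrates the general missing-data idea rather than the exact systems described in this article, and it omits refinements such as bounded marginalisation.

```python
import numpy as np

def marginal_log_likelihood(x, mask, mean, var):
    """Log-likelihood of feature vector x under a diagonal Gaussian,
    using only the components flagged reliable in `mask` (marginalisation).

    x, mean, var: (D,) arrays; mask: (D,) booleans, True = reliable.
    For a diagonal Gaussian, integrating out an unreliable component
    simply removes its term from the sum below.
    """
    r = np.asarray(mask, dtype=bool)
    diff = x[r] - mean[r]
    return -0.5 * np.sum(np.log(2.0 * np.pi * var[r]) + diff ** 2 / var[r])
```

A badly corrupted component, such as a value of 100 where the model expects 0, has no effect on the score once it is masked out.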

Because spectral features are sensitive to gender differences, it is easy to analyse the differences between what the models have learnt about male and female speech patterns. The grammar constrains the recognition hypotheses and determines whether a sequence of male or female models is used.

Researchers at the University of Sheffield discussed four system variants. They derived discrete signal-to-noise ratio (SNR) masks from estimates of local SNR. The first 10 frames of the spectral amplitudes were averaged to form a stationary noise estimate, and subtracting this estimate from the noisy signal gave an estimate of the clean signal.
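The noise-estimation steps described here can be sketched as follows. The sketch assumes a (frames × channels) array of spectral amplitudes whose first 10 frames contain no speech. The function name, the 7 dB threshold and the small flooring constant are illustrative assumptions, not parameters reported by the Sheffield researchers. Points whose local SNR exceeds the threshold are marked reliable (True), and everything else is treated as missing.

```python
import numpy as np

def discrete_snr_mask(noisy, n_noise_frames=10, threshold_db=7.0):
    """Binary reliability mask from local SNR estimates (illustrative sketch).

    noisy: (T, F) spectral amplitudes, frames x frequency channels.
    Assumes the first n_noise_frames are speech-free; threshold_db is a
    hypothetical value chosen for illustration.
    """
    # Stationary noise estimate: average of the opening frames, per channel.
    noise = noisy[:n_noise_frames].mean(axis=0)
    # Spectral subtraction, floored to keep amplitudes positive for the log.
    clean = np.maximum(noisy - noise, 1e-10)
    # Local SNR in dB for amplitudes (20 * log10 of the amplitude ratio).
    local_snr_db = 20.0 * np.log10(clean / np.maximum(noise, 1e-10))
    # Discrete mask: True where the estimated speech dominates the noise.
    return local_snr_db > threshold_db, clean
```

Raising `threshold_db` keeps fewer points but makes the points that are kept more trustworthy.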

"The high threshold here offers a safety margin, reducing the impact of the errors introduced by a poor fit," observes Yoges. "Soft SNR masks, in contrast, have a fuzzy interpretation, allowing more points to be let through without the damage caused by admitting points where noise outweighs speech."
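A soft mask replaces the hard threshold with a graded reliability value between 0 and 1. A logistic (sigmoid) function of local SNR is one common choice. The sketch below assumes that form. Its `centre_db` and `slope` parameters are hypothetical values that would need tuning; they are not figures from the study.

```python
import numpy as np

def soft_snr_mask(local_snr_db, centre_db=0.0, slope=1.0):
    """Soft reliability mask: a sigmoid of local SNR (dB) mapped into [0, 1].

    centre_db is the SNR that maps to 0.5; slope controls how sharply the
    mask moves from unreliable (near 0) to reliable (near 1).
    Both are assumed, illustrative parameters.
    """
    return 1.0 / (1.0 + np.exp(-slope * (np.asarray(local_snr_db) - centre_db)))
```

As `slope` grows, the soft mask approaches a discrete mask thresholded at `centre_db`.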

Readers interested in further information on the analysis of advances in voice recognition technology may contact Magdalena Oberland, [email protected], with their details.




