Automatic Speech Recognition (ASR) in IVR Systems
- On November 20, 2016
- ASR, IVR
Background
The most important role of an IVR system is to handle incoming calls efficiently, allowing the organization to improve its service while saving time and money on processing those calls. To improve ROI and increase customer satisfaction, organizations need an efficient IVR that quickly gathers the necessary information from the caller and routes the call to the most appropriate destination.
Getting Inputs from the Caller
So how can the IVR system gather information quickly and efficiently? A common method of accepting input from the caller is via DTMF tones, the tones generated by pressing keys on the phone keypad. This is a simple and robust technique, but it is very cumbersome. Everyone is familiar with the annoying experience of listening to a long menu with multiple choices and then "falling" into an even longer submenu.
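To make the cost of nested menus concrete, here is a minimal sketch of DTMF routing through a menu tree. The menu contents, the destination names, and the `route` helper are all hypothetical and are not taken from any particular IVR product.

```python
# Hypothetical two-level menu: keys are DTMF digits, string leaves are destinations.
MENU = {
    "1": {"1": "billing/balance", "2": "billing/payments"},
    "2": {"1": "support/internet", "2": "support/phone"},
    "0": "operator",
}


def route(digits, menu=MENU):
    """Follow DTMF digits through the menu tree.

    Returns the destination name, or None if the digits are invalid
    or stop at a submenu instead of a destination.
    """
    node = menu
    for digit in digits:
        # A digit is only valid while we are still inside a (sub)menu.
        if not isinstance(node, dict) or digit not in node:
            return None
        node = node[digit]
    return node if isinstance(node, str) else None


print(route("21"))  # caller pressed 2, then 1
print(route("2"))   # caller is still inside the support submenu
```

Every extra menu level costs the caller another prompt to listen to and another key press, which is exactly the delay a speech interface tries to remove.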
Since an IVR needs to communicate over the phone with humans and not with machines, the most natural method would be for the IVR system to establish a speech channel rather than a data channel. Automatic Speech Recognition (ASR) systems were created for exactly this purpose. So, if this is such a good concept, why isn't everybody using it all the time?
Practical Implementation Issues with ASR
A practical concern with ASR systems is inaccurate interpretation of the caller's speech. Three main factors reduce the accuracy of ASR systems:
- Ambient noise. In many cases the caller is calling from a noisy environment, and the ambient noise reduces the accuracy of speech recognition engines. You can read more on cancelling ambient noise.
- Accent. Not all ASR systems can interpret different accents reliably. Vendors of ASR systems keep improving their engines, sometimes by adding vertical solutions, i.e. solutions that are tuned for specific market segments.
- Network quality. Low network quality causes interruptions in the audio signal that impair speech recognition.
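Of the three factors above, network quality is the easiest to quantify. As a rough sketch, interruptions can be estimated from gaps in the sequence numbers of received audio packets. The function name is hypothetical, and the code assumes plain increasing integers, ignoring the 16-bit wraparound that real RTP sequence numbers have.

```python
def packet_loss_ratio(seq_numbers):
    """Fraction of packets missing between the lowest and highest
    sequence number seen. Duplicates and reordering are tolerated.

    Assumption: sequence numbers do not wrap around.
    """
    received = set(seq_numbers)
    if not received:
        return 0.0
    expected = max(received) - min(received) + 1
    return (expected - len(received)) / expected


# Packets 4, 7 and 8 never arrived.
print(packet_loss_ratio([1, 2, 3, 5, 6, 9, 10]))
```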
Practical Ways for Improving ASR Accuracy
Improving Audio Signal
Integrate a solution that actively analyzes the audio signal and improves it before it reaches the speech recognition engine. This solution should take care of issues such as ambient noise and network jitter. A separate post elaborates on factors that affect VoIP quality, such as bandwidth and quality of service (QoS).
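As an illustration of what "analyzing the audio signal" can mean in practice, the sketch below estimates the signal-to-noise ratio (SNR) of a call from two sample segments: one containing speech and one containing only background noise. The function names and the 15 dB threshold are assumptions chosen for the example, not values recommended by any ASR vendor.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of PCM samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels."""
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if noise_rms == 0:
        return math.inf   # no measurable noise
    if speech_rms == 0:
        return -math.inf  # no measurable speech
    return 20 * math.log10(speech_rms / noise_rms)


def is_clean_enough(speech, noise, min_snr_db=15.0):
    """True if the audio meets the (hypothetical) minimum SNR for recognition."""
    return snr_db(speech, noise) >= min_snr_db


speech = [1000, -1000] * 4  # toy speech segment
noise = [100, -100] * 4     # toy noise-only segment
print(snr_db(speech, noise), is_clean_enough(speech, noise))
```

A check like this only measures the problem. Actually removing the noise requires a dedicated noise-reduction stage in front of the recognizer.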
Keep Monitoring
Remember, this is never a one-time effort. You must invest in the right technology and tools to constantly monitor the quality of the network and the audio signal and to alert you when issues arise. Many problems are intermittent, so you should have 24×7 monitoring.
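A minimal sketch of such continuous monitoring is shown below. It keeps a rolling window of recent measurements and raises alerts when the averages cross a threshold. The class name, window size, and thresholds are hypothetical, and a production system would feed in real measurements and send alerts to an on-call channel instead of returning strings.

```python
from collections import deque


class QualityMonitor:
    """Rolling-window monitor that flags sustained quality degradation."""

    def __init__(self, window=5, max_loss=0.05, min_snr_db=15.0):
        # deque(maxlen=...) silently drops the oldest measurement when full.
        self.loss = deque(maxlen=window)
        self.snr = deque(maxlen=window)
        self.max_loss = max_loss
        self.min_snr_db = min_snr_db

    def record(self, loss_ratio, snr_db):
        """Add one measurement; return a list of alert messages (empty if healthy)."""
        self.loss.append(loss_ratio)
        self.snr.append(snr_db)
        avg_loss = sum(self.loss) / len(self.loss)
        avg_snr = sum(self.snr) / len(self.snr)
        alerts = []
        if avg_loss > self.max_loss:
            alerts.append(f"packet loss {avg_loss:.1%} above {self.max_loss:.1%}")
        if avg_snr < self.min_snr_db:
            alerts.append(f"SNR {avg_snr:.1f} dB below {self.min_snr_db:.1f} dB")
        return alerts


monitor = QualityMonitor(window=2)
for loss, snr in [(0.0, 30.0), (0.2, 30.0), (0.0, 5.0), (0.0, 5.0)]:
    for alert in monitor.record(loss, snr):
        print("ALERT:", alert)
```

Averaging over a window rather than reacting to single samples avoids alerting on one bad packet, while still catching the intermittent problems that only show up from time to time.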