Background human voices
- On August 22, 2021
- human noise, Noise Cancellation, voice cancellation
There are many cases in which the background noise in phone calls includes human voices. This can happen, for example, in an office where other people are talking nearby. An extreme case is a crowded call center, but human noise is not limited to the office: it can also appear in other locations, such as at home or in a coffee shop. In this post we discuss whether and how a noise reduction application can keep the voice of the primary speaker while removing the voices of everyone else in the background.
Let’s look at the simple case of one microphone and no a priori knowledge about the primary speaker. In this case, the noise reduction application will sometimes hear the primary speaker, sometimes hear other people in the background, and sometimes hear all the voices mixed together. How can it decide which voice belongs to the primary speaker and which voices belong to the people in the background? They are all human voices.
Can we rely on volume? Unfortunately, volume is not a good indicator, since it can fail in both directions, producing both false positives and false negatives: it might attribute a loud background voice to the primary speaker, and conversely it might treat the quiet parts of the primary speaker’s voice as background voices. As a result, relying on volume to separate the human voices is not robust enough for a production application.
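To see why a volume rule misfires, here is a minimal sketch of a frame-energy classifier. The function names, threshold, and the synthetic "frames" are hypothetical illustrations, not real audio or a real product algorithm: a quiet primary speaker falls below the threshold while a loud background voice exceeds it.

```python
import math

def rms(frame):
    """Root-mean-square energy of a list of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def classify_by_volume(frame, threshold=0.3):
    """Naive rule: a frame louder than the threshold is 'primary'."""
    return "primary" if rms(frame) >= threshold else "background"

# Hypothetical 20 ms frames at 8 kHz (amplitudes are illustrative):
quiet_primary = [0.1 * math.sin(2 * math.pi * 220 * t / 8000) for t in range(160)]
loud_background = [0.8 * math.sin(2 * math.pi * 180 * t / 8000) for t in range(160)]

print(classify_by_volume(quiet_primary))    # "background" -- false negative
print(classify_by_volume(loud_background))  # "primary"    -- false positive
```

Both misclassifications happen with the same threshold, which is exactly the two-sided failure described above: no single volume cutoff separates the speakers.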
Can we rely on some indication of distance? Unfortunately, most common microphones are designed to capture all sounds regardless of their distance or direction. Therefore, distance does not play a significant role in how the voices are captured by the microphone.
Can we learn the voice of the primary speaker during the call? In this approach, we learn features of the primary speaker’s voice during the call and use these features to eliminate any voice that does not match them. This could have been an excellent approach, but unfortunately it falls under the category of “Which came first, the chicken or the egg?”. In other words, the noise reduction application might wrongly learn the features of the voices in the background. For example, if the primary speaker stops talking for a few seconds and only a background voice is heard, the software might learn the features of the background voice on the fly; when the primary speaker starts talking again, his or her voice will be treated as noise, since it does not match the learned features.
OK, so what can we do to enable a software noise reduction application to effectively remove ambient human voices? There are two approaches. The first is to use more than a single microphone in order to identify the physical location of each voice. This technique is used by mobile phones, which, as you know, are equipped with multiple microphones, and is also available in our Noise Firewall products. In an office environment, we use the multiple phones that already exist in the office to build a noise map and identify the physical location of each human voice. For stand-alone scenarios, a secondary USB microphone can serve the same purpose. This type of noise removal is called Reference Based Noise Reduction (a.k.a. RNR).
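The core idea of using a reference microphone can be sketched in a few lines. This is a hypothetical one-tap illustration, not the Noise Firewall implementation: real RNR systems use adaptive multi-tap filters, but even a single least-squares gain shows how a second capture of the background voice lets us subtract it from the primary microphone.

```python
import math

def remove_reference_noise(primary_mic, reference_mic):
    """Sketch of reference-based noise reduction (RNR): estimate how
    strongly the reference signal leaks into the primary microphone
    (one least-squares gain) and subtract the scaled reference."""
    num = sum(p * r for p, r in zip(primary_mic, reference_mic))
    den = sum(r * r for r in reference_mic) or 1.0
    gain = num / den
    return [p - gain * r for p, r in zip(primary_mic, reference_mic)]

# Synthetic example: the primary mic hears the speech plus a scaled
# copy of the background voice that the reference mic hears cleanly.
n = 400
speech = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
babble = [math.sin(2 * math.pi * 23 * t / n) for t in range(n)]
primary = [s + 0.6 * b for s, b in zip(speech, babble)]

cleaned = remove_reference_noise(primary, babble)
# 'cleaned' closely matches 'speech': the background voice is gone.
```

In practice the reference microphone also picks up some of the primary speaker and the leakage path is frequency dependent, which is why production systems replace the single gain with an adaptive filter.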
The second approach is to provide the software with prior information about the voice of the primary speaker. Basically, you train the software on the primary speaker’s voice before asking it to clean noisy calls. This type of noise removal is called Profile Based Noise Reduction (a.k.a. PNR).
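To contrast PNR with the flawed on-the-fly learning discussed earlier, here is a minimal sketch under the same hypothetical scalar-feature assumption (real profiles are built from rich spectral embeddings, and the function names are illustrative). Because the profile is enrolled before the call on known-clean speech, the decision no longer depends on who happens to be talking at any moment.

```python
def enroll(training_features):
    """Build a speaker profile from a clean recording made before the call."""
    return sum(training_features) / len(training_features)

def is_primary(feature, profile, tolerance=0.15):
    """Keep a frame only if its voice feature matches the enrolled profile."""
    return abs(feature - profile) <= tolerance

# Enrollment happens offline, on clean speech from the primary speaker:
profile = enroll([0.29, 0.31, 0.30, 0.28, 0.32])

# During the call, a pause filled with background talk cannot corrupt
# the profile, because the profile is never updated mid-call:
print(is_primary(0.30, profile))  # True  -> primary speaker, kept
print(is_primary(0.70, profile))  # False -> background voice, removed
```

The trade-off is the up-front training step: PNR only works for speakers who have enrolled a profile in advance, whereas RNR works for anyone but requires extra microphones.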