Voice Activity Detection’s Role in Better Conference Calls


Conference rooms often lack proper acoustic treatment or quality equipment, which can make conference calls a challenge. Pervasive background noise, quiet voices, and unintelligible speech are just a few things that can go awry when talking with clients or colleagues. One way to address these issues and enhance conference calls is through voice activity detection.

What is Voice Activity Detection?

Voice activity detection is a technology that detects human speech even when background noise is present. Products such as the YVC-1000, YVC-330, and YVC-200 use voice activity detection to identify speech more accurately in the audio signals picked up by their microphones. It can also help save network bandwidth and computation, because it avoids needlessly encoding silence in Voice over Internet Protocol (VoIP) applications.
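To make the idea concrete, here is a minimal sketch of the simplest form of voice activity detection: splitting audio into short frames and flagging the ones whose level exceeds a threshold. This is a generic energy-based approach, not Yamaha's implementation; the function names, frame length, and threshold are assumptions chosen for illustration. On its own, an energy threshold cannot tell a voice from any other loud sound.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def energy_vad(samples, frame_len=160, threshold=0.02):
    """Return one True/False flag per full frame.

    frame_len and threshold are illustrative values, not tuned ones.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) > threshold)
    return flags

# Synthetic input: a quiet hum, then a louder tone standing in for speech.
RATE = 8000
quiet = [0.005 * math.sin(2 * math.pi * 50 * n / RATE) for n in range(800)]
loud = [0.3 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]

flags = energy_vad(quiet + loud)
frames_to_encode = sum(flags)  # only flagged frames would go to the codec
```

In a VoIP setting, the frames flagged False are the ones a sender could skip encoding, which is where the bandwidth and computation savings come from.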

How Does Voice Activity Detection Work?

Voice activity detection plays an important role alongside these three signal-processing capabilities:
  • Noise reduction
  • Automatic tracking
  • Automatic gain control

Voice Activity Detection and Noise Reduction

Noise reduction is a sound-processing function that detects steady background noises, such as air conditioning units, and minimizes or eliminates them from the sound pickup signals. Conventional commercial systems do an adequate job of reducing constant noises, but only when there are no voices. These systems might misidentify steady human voices, such as a drawn-out “umm,” as noise components and eliminate them.

By contrast, the YVC-1000 leverages Human Voice Activity Detection (HVAD) to achieve much better noise reduction and a higher signal-to-noise ratio than conventional commercial systems. Yamaha’s HVAD can filter steady noises not only from the background but also throughout the speech bandwidth range.
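One common way speech detection can help noise reduction is by freezing the noise estimate while someone is talking, so a sustained “umm” is not absorbed into the noise floor. The sketch below illustrates that idea on per-frame levels only. It is an assumption-laden simplification, not a description of HVAD: the function names and the alpha and floor parameters are hypothetical, and real systems typically work per frequency band.

```python
def track_noise_floor(frame_levels, speech_flags, alpha=0.5):
    """Smooth the noise-floor estimate, updating only on non-speech frames."""
    noise = None
    history = []
    for level, is_speech in zip(frame_levels, speech_flags):
        if not is_speech:
            # First noise frame seeds the estimate; later ones are averaged in.
            noise = level if noise is None else alpha * noise + (1 - alpha) * level
        history.append(noise)  # held unchanged during speech
    return history

def suppression_gain(level, noise, floor=0.1):
    """Gain between floor and 1 that shrinks frames close to the noise floor."""
    if noise is None or level <= 0:
        return 1.0
    return max(floor, 1.0 - noise / level)

# Frame levels: two noise frames, two speech frames, one noise frame.
levels = [0.02, 0.04, 0.20, 0.20, 0.02]
flags = [False, False, True, True, False]

floor_history = track_noise_floor(levels, flags)
gains = [suppression_gain(l, n) for l, n in zip(levels, floor_history)]
```

With these inputs the estimate stays put across the two speech frames, so they pass through with only mild attenuation, while the noise-only frames are attenuated much more heavily.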

Voice Activity Detection and Automatic Tracking

Automatic tracking is a sound-processing capability that detects a speaker’s location within a room and latches onto that voice, making it an effective solution in noisy conference rooms. The YVC-1000, for example, picks up the audio source location using its microphone’s array control function, which features three microphone elements. The HVAD technology embedded in the YVC-1000 dramatically enhances the accuracy of detecting the speaker’s location. Once sound sources are located, the technology determines whether or not they are human voices. HVAD then uses those results to identify the locations of any steady noises or isolated sounds, which minimizes inaccurate identifications. So, even if there’s a fan blowing or someone’s shuffling papers, the microphones will not lose track of the speaker’s location.
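As a rough illustration of how a microphone array can locate a sound, the sketch below estimates the arrival-time difference between two microphones by cross-correlation and only updates that estimate on frames flagged as speech. It is a textbook technique under simplifying assumptions (two microphones instead of three, a synthetic signal, a speech flag supplied from elsewhere), not the YVC-1000's actual method; all names are hypothetical.

```python
import math

def best_lag(mic_a, mic_b, max_lag):
    """Lag in samples at which mic_b best matches mic_a.

    A positive result means the sound reached mic_b later than mic_a.
    """
    best, best_score = 0, float("-inf")
    n = len(mic_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def update_direction(current_lag, mic_a, mic_b, is_speech, max_lag=8):
    """Keep the previous estimate unless the frame was flagged as speech."""
    if not is_speech:
        return current_lag
    return best_lag(mic_a, mic_b, max_lag)

# Synthetic burst that reaches the second microphone 3 samples later.
source = [math.sin(0.3 * n) * math.exp(-0.05 * n) for n in range(64)]
mic_a = source
mic_b = [0.0] * 3 + source[:-3]

held = update_direction(0, mic_a, mic_b, is_speech=False)    # estimate unchanged
tracked = update_direction(0, mic_a, mic_b, is_speech=True)  # estimate updated
```

The arrival-time difference maps to a direction once the microphone spacing is known. Gating the update on the speech flag is what keeps a fan or rustling paper from pulling the estimate away from the talker.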

Voice Activity Detection and Automatic Gain Control

Automatic gain control is a function that automatically adjusts and normalizes the level of a speaker’s voice. It compensates for people who speak softly, talk too loudly, or are farther away from the microphone. Conventional commercial systems struggle to distinguish quiet voices from noise, making it difficult to raise the volume of those voices. However, products like the YVC-100 utilize HVAD to increase the accuracy of voice determinations and distinguish between human voices and steady noises. As a result, automatic gain control remains stable and the voice sent to the far end stays at a consistent level for participants.
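The stabilizing effect described above can be sketched as a gain that adapts only while speech is detected and is held otherwise. This is a simplified illustration rather than Yamaha's algorithm: the function name, the target level, the step size, and the gain cap are all assumed values, and the speech flags are taken as given.

```python
def vad_gated_agc(frame_levels, speech_flags, target=0.1, step=0.5, max_gain=8.0):
    """Return the gain applied to each frame.

    The gain moves toward target / level only on speech frames and is held
    on all other frames, so quiet background noise is not boosted further.
    """
    gain = 1.0
    gains = []
    for level, is_speech in zip(frame_levels, speech_flags):
        if is_speech and level > 0:
            desired = min(max_gain, target / level)
            gain += step * (desired - gain)  # move part of the way each frame
        gains.append(gain)
    return gains

# A soft talker, a pause containing only faint noise, then a loud talker.
levels = [0.025, 0.025, 0.005, 0.005, 0.4]
flags = [True, True, False, False, True]

agc_gains = vad_gated_agc(levels, flags)
```

Here the gain rises for the soft talker, stays put during the pause, and drops for the loud talker. Without the speech gate, the faint noise in the pause would drive the gain up toward its cap, which is the instability the paragraph describes.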

Are you interested in taking your conference calls to the next level? Check out our Product Finder to pick the right equipment for your needs.



HVAD in the YVC-330: