You walk into the conference room more prepared than ever to lead a compelling discussion with your most important stakeholders. The video works fine. But moments into your introduction, a remote conference participant interrupts. He didn’t understand what you said. The background noise is pronounced and the voice is so unclear it’s difficult to understand his question.
In the workplace, communication, especially verbal communication, must be clear and concise because so much can go wrong if an important detail is missed or lost in translation. IT departments are on the hook to provide solutions that allow employees to communicate effectively with people located far away. In an age when consumer devices and applications make it easy for people to connect anytime using text, audio and video, the workforce expects the same from business solutions. And companies are responding. Experts predict the unified communications (UC) market will grow at an 18.5% compound annual growth rate (CAGR) from 2015 to 2020.
A lot of research has been conducted on business communications and even with the hype around video conferencing, the findings are universally clear: The fundamental requirement for effective collaboration is effective audio. Without audio, video is two-way surveillance, and content sharing is an electronic billboard. While both have value, it's not in communicating complex messages. Video and content can significantly enhance communication, but they are not enough on their own. Meeting attendees might tolerate a choppy video connection, but if there is no sound or it’s breaking up, there is no meeting.
Sound, Sound, Sound
When it comes to conferencing, the considerations need to be sound, sound, sound, content and video—in that order. The first three translate to room sound, equipment sound and infrastructure sound. Get them wrong, and the last two are far less important because the experience is already poor for the users. And of course, overlaying everything is ease of use; simplicity outweighs functionality.
The justification for great audio is actually biological in nature. We have to work so much harder to understand someone when the sound quality is poor or choppy, and that work takes up brain power that needs to be devoted to understanding and absorbing the message, not just discerning the words. This unnatural, unnecessary work causes the listener to become fatigued during calls, and ultimately, less mentally alert during and after the call.
With the right mix of audio solutions and the right environment, conference sound can be as clear and intelligible as two people talking to each other in the same room. So how does one achieve realistic, natural sound? Think of the conversation in three parts: sound capture, transmission and reproduction. Sound capture and reproduction depend on the environment and the equipment used in it: the two or more rooms, the devices and the people in the conversation. Sound transmission is the infrastructure that transports the sound between the environments.
When deploying an AV solution, use a room or space that doesn’t introduce excessive background noise or add strange artifacts such as echo or reverberation. Choose an enclosed space (i.e., a room) if possible; if not, choose a space where the surrounding noise is limited. Once an environment has been chosen, maximize its audio potential.
For example, add wall and window treatments to prevent unwanted reflections, echoes or reverberations from those surfaces. Simple treatments such as partially or fully closed blinds and uneven surfaces created by objects like books in book cases can be used, as well as advanced treatments such as dedicated acoustic panels. When the environment is noisy or if the sound is distorted by reflections, participants will struggle to listen.
When choosing the audio solution, select a product designed for the environment. (Hint: personal audio devices are not designed to fill a room with sound effectively, and oversize solutions played quietly simply cost too much.) Find a product or product family designed to reproduce the full spectrum of the human voice. The traditional telephone network (Public Switched Telephone Network or PSTN) was never designed to deliver the full sound of the human voice. It was actually designed to squeeze as much voice traffic as possible down long haul cables that were limited in capacity and expensive to construct. Unfortunately, we are so used to standard telephone audio that we often accept that it is of adequate quality. It’s not.
Nearly all modern UC systems use wideband audio (a range of sound comparable to the full human vocal range) and can transmit rich audio data with high fidelity over a digital network. Some are starting to use ultra-wideband audio, which may sound esoteric but really means the full range of sound a human can produce and hear. Either wideband or ultra-wideband can sound far better than the telephone network when all the parts are considered.
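To make the difference between telephone and wideband audio concrete, here is a minimal Python sketch. The frequency bands are approximate, commonly quoted figures and are assumptions, not numbers from this article. The function name is hypothetical. The sketch applies the Nyquist criterion: a digital system must sample at no less than twice the highest frequency it wants to preserve.

```python
# Approximate, commonly quoted voice bands in Hz (assumed for illustration,
# not taken from the article).
BANDS_HZ = {
    "narrowband (telephone)": (300, 3400),
    "wideband": (50, 7000),
}


def min_sample_rate(upper_hz: int) -> int:
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * upper_hz


for name, (low, high) in BANDS_HZ.items():
    print(f"{name}: {high - low} Hz of audio bandwidth, "
          f"needs at least {min_sample_rate(high)} samples per second")
```

Under these assumed figures, wideband carries roughly twice the audio bandwidth of a telephone call, which is one reason it can sound so much more natural.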
So why do so many modern UC solutions sound little better than the telephone networks they replace? For many people, the UC infrastructure is the solution, but look at the other two parts of the puzzle: capture and reproduction. It’s so easy to overlook the human interfaces of the UC infrastructure by using sub-standard equipment or by placing the equipment in a terrible acoustic environment. Think of the old adage, "garbage in, garbage out." In essence, that means sound will never be any clearer than it is when it leaves the speaker's mouth, and each step in the chain needs to maintain that standard if the sound entering the listener's ear is to be of the same quality. If the goal is realistic, natural, face-to-face quality sound, then a 1986 speakerphone on a 2016 UC network cannot do the job.

After Audio, Content & Video
Once the audio chain is sorted out, it's time to consider how to share content. The challenge here is different: Success depends less on how content is presented from a technological perspective and more on ease of use. Research indicates that meetings involving technology can take up to 15 minutes to start. This is a real cost to the companies involved in lost working hours. Eliminating that delay goes a long way toward enabling an effective meeting. Any form of content sharing has to be simple and consistent with user expectations, whether they’re expecting to connect a PC by a wire to a display or share using the UC infrastructure. Applying those same principles to the experience in the meeting room is critical.
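To put that start-up delay in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 15-minute figure comes from the research mentioned above, and it is an upper bound ("up to"). The attendee count, the meeting frequency and the function name are hypothetical, chosen purely for illustration.

```python
def lost_hours(delay_minutes: float, attendees: int, meetings: int) -> float:
    """Person-hours lost when each meeting starts late by delay_minutes."""
    return delay_minutes * attendees * meetings / 60


# Hypothetical team: 6 attendees, 4 technology-assisted meetings per week,
# each delayed by the worst-case 15 minutes.
weekly = lost_hours(delay_minutes=15, attendees=6, meetings=4)
print(f"Person-hours lost per week: {weekly:.1f}")  # 6.0
```

Even for this small hypothetical team, the worst case adds up to most of a working day every week.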
Finally, consider video. When done well, video enhances a meeting. Participants remain more focused and engaged. Facial expressions and body language provide an important extra layer of human interaction to improve comprehension and understanding. With that in mind, it's key to ensure that the equipment can capture and display those expressions. Video and audio have a lot in common: capture, transmission and reproduction.
- Is the camera right for the room—size, magnification, color fidelity and lighting?
- Are the resolution and field of view adequate to cover the room? That is, is the camera designed for a room of this size, or is it just a high-resolution personal webcam?
- Is the screen large enough, accurate enough, of a high enough resolution and bright enough (but not too bright) to faithfully reproduce the image?
- When looking at the video image, can the user on the far end see a sufficient level of detail to make sense of the images and are the facial expressions recognizable?
- Are the remote people almost life-sized when on screen?
- Is the network adequate to transport the video traffic reliably?
Video requires bandwidth to work well, and that means the network has to be correctly sized and optimized to allow the best video quality. Like audio, video technology has evolved significantly over the years. Easy-to-use solutions can be found that are both network-ready and properly designed to enrich the meeting experience, at a price far more affordable than the telepresence solutions of the past. Modern video codecs can dynamically compensate for network variances and maintain very high quality images. Remember "garbage in, garbage out": video adds value only when capture, transmission and reproduction are all appropriate.
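As a minimal sketch of what "correctly sized" can mean, the Python below estimates the bandwidth needed for a number of simultaneous video calls and compares it with the available link. Every number here is a hypothetical assumption: the per-call bitrate, the 25% headroom and the link capacity are placeholders, and the function names are illustrative. Real figures should come from the equipment vendor and from measurements on the actual network.

```python
def required_mbps(calls: int, mbps_per_call: float, headroom: float = 0.25) -> float:
    """Bandwidth needed for concurrent video calls, plus spare headroom."""
    return calls * mbps_per_call * (1 + headroom)


def link_is_adequate(link_mbps: float, calls: int, mbps_per_call: float) -> bool:
    """True if the link can carry the calls plus the default headroom."""
    return required_mbps(calls, mbps_per_call) <= link_mbps


# Hypothetical office: 10 simultaneous calls at an assumed 2 Mbps each.
print(required_mbps(10, 2.0))           # 25.0
print(link_is_adequate(20.0, 10, 2.0))  # False
print(link_is_adequate(50.0, 10, 2.0))  # True
```

The headroom term is a simple design choice that leaves room for other traffic and for peaks. A codec that adapts to network conditions may cope with less, but at the expense of image quality.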
Evaluating the Choices
One of the biggest challenges in selecting an audio solution is that it's difficult to know how useful the equipment will be. Without a demonstration or test in a realistic setting, it's hard to gauge the sound quality one can expect. Video quality is more obvious: Both the images on the screen and the resolution specifications tell the story. No one in an IT department would confuse the performance of an old VGA monitor with that of a modern 1080p LCD flat screen. The LCD wins hands down on paper and in the real world. Judging audio quality is harder because we’ve been conditioned by the telephone network, which does not provide realistic, natural sound. It’s often difficult to tell who we’re talking with or to get the full experience of the message. But because we’ve been accustomed to inadequate audio quality on remote calls for so long, many of us don’t realize it doesn’t have to be that way. Face-to-face quality communication is now achievable within a realistic budget and we should expect it.
The challenge is selecting the products that meet that expectation. Product specifications will clearly indicate whether a product was designed for the telephone network or for a UC environment (e.g., wideband audio). However, how well these capabilities have been implemented by the manufacturer is impossible to discern from the specifications, and even price provides little guidance.
Listening to recorded sound files to evaluate conference phones tells only part of the story. Too often the sound files are played back on a PC, and the speakers used for most PC audio are insufficient for the important sound quality differences to be heard. The only way to know for sure is to test the products; the best approach is to test two or more of them in situ. That way the performance of the microphones and speakers can be confirmed, along with how well the system works in the room in which it will be installed. Only a true demonstration will reveal how the equipment handles issues like room noise, reflections and echoes, audio pickup from people located at different distances from the microphones, and the ability of the system's speakers to fill the room comfortably.
Think about listening to someone shouting to be heard, as opposed to talking calmly. The former sounds harsh, the latter comfortable. With that in mind, it’s obvious why a personal speakerphone will not provide a good experience when being asked to fill a room with sound. When used for one or two people the phone can “talk” and probably works well. When used as a conference phone it has to “shout” to be heard.
Putting the AV in IT
With wireless communication protocols and online administration, modern UC systems have removed many installation and technical challenges once common in AV implementations. The lines are blurring between AV and IT, and more and more AV responsibilities are shifting to IT departments. As AV departments disappear, so do the dedicated experts who know how to make sound and video work well in a given space. At the same time, UC introduces new challenges. The use of standardized PC connections (USB, Bluetooth, etc.) has allowed many manufacturers to enter the UC market with consumer-grade or highly specified personal equipment, both of which translate to video and audio capabilities ill-suited to a conference room environment. The AV team knew this; the IT team may still be learning it.
Amidst constant change, one thing stays the same: Users can recognize a poor experience. They are just as likely to swamp the IT help desk with issues today as they always have been, especially as personal video conferencing solutions (Skype, Hangouts, FaceTime, etc.) set expectations for workplace communications.
Business users know, and thus expect, that video should work well. And if they’re used to a quality Bluetooth headset, they also know audio can be realistic and natural sounding. The good news is that many of those consumers are the IT folks responsible for deploying modern AV systems today. So if they keep their expectations high, recognize that personal and conference room use are different, and hold their providers and equipment manufacturers accountable, they can deploy remarkable solutions at a fraction of the cost of the systems available to the AV department only a handful of years ago.