By Chris "Dr. CT" Bajorek
Q: Our company is planning to bring up a VoIP network between our six main corporate sites -- four in the U.S. and two overseas. Current plans are for purchasing the VoIP gateway options for our name-brand enterprise PBXs. Fortunately, we have the same PBX vendor for all sites in the U.S. but a different PBX vendor in our overseas offices. We are most concerned about speech quality issues when we turn up these new connections. I was warned that using VoIP will result in lower quality than what we are used to unless I "adequately plan." I was also warned about "multi-hop calls." I would appreciate any explanations and advice.
-Harold G, New York
A: I have good and bad news for you, Harold. The bad news is that, indeed, without verifying a few things before you start purchasing VoIP components for your existing PBXs, you could be in for some hefty "field experience." Things may not work exactly the way you want, meaning that you could suffer from a variety of VoIP network performance problems. The good news is this: With an adequate awareness of the issues you are likely to encounter, you can turn this exercise into something with a fairly predictable end result. More good news: So many companies are in the process of converting to VoIP that VoIP equipment vendors who service the enterprise business segment are getting pretty good at it.
Speech quality has been, and continues to be, the primary concern of businesses that wish to transition to VoIP. This concern is well-founded. While the cost savings can yield payback times of one year or less, too many VoIP installations still are unnecessarily saddled with poor speech quality. My experience at CT Labs tells me that while there are still telecom companies coming out with their very first VoIP product offering, most VoIP products being sold today are using relatively mature, debugged cores. This directly translates into better speech quality.
Here's an abbreviated list of issues to address before you start filling out purchase orders for VoIP equipment.
Analyze busy-hour loads at each site. No transition plan should be attempted without first knowing what kind of call traffic you will be experiencing between sites. Obviously, since the VoIP equipment will most likely be carrying only long distance calls, those are the calls to analyze. Busy-hour studies determine how many ports of call-handling equipment you will need to satisfy your call-blocking goals. Knowing this will allow you to purchase the right number of VoIP gateway ports and PBX trunk lines to support them at each site. This will also allow you to adequately plan for the right amount of IP, ATM, or frame relay bandwidth between gateways. You may want to employ a knowledgeable voice traffic engineer for this step if you are unfamiliar with phone traffic analysis principles.
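One common way to turn a busy-hour study into a port count is the Erlang B model, a standard traffic-engineering formula. The column does not name a model, so treat it as an assumption here. The sketch below is a rough illustration only: the function names and the sample busy-hour figures are hypothetical, and a real study should use your own measured traffic.

```python
# Sketch only: Erlang B is an assumed model, and the traffic figures are made up.

def erlang_b(traffic_erlangs: float, ports: int) -> float:
    """Blocking probability for a given offered load and port count."""
    blocking = 1.0  # with zero ports, every call is blocked
    for n in range(1, ports + 1):
        # Standard Erlang B recurrence, numerically stable for large port counts
        blocking = (traffic_erlangs * blocking) / (n + traffic_erlangs * blocking)
    return blocking


def ports_needed(traffic_erlangs: float, target_blocking: float) -> int:
    """Smallest number of ports whose blocking is at or below the target."""
    ports = 0
    while erlang_b(traffic_erlangs, ports) > target_blocking:
        ports += 1
    return ports


# Hypothetical busy hour: 120 inter-site calls averaging 5 minutes each.
# 600 call-minutes in a 60-minute hour is 10 erlangs of offered traffic.
busy_hour_erlangs = 120 * 5 / 60
print(ports_needed(busy_hour_erlangs, target_blocking=0.01))
```

The same port count sizes both the gateway ports and the PBX trunks behind them, which is why the busy-hour figure comes first.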
Verify site VoIP interoperability. Since you will be dealing with a different PBX vendor overseas, who may want to recommend a different brand of VoIP gateway, you will have to verify interoperability between gateway manufacturers. My preference: Select a single gateway vendor that can handle all sites. While it is true that standards-based gateways are supposed to interoperate using standards like H.323, in practice this is not guaranteed and requires an extra verification step in your deployment plans.
Voice compression issues. Perhaps the biggest influence on VoIP voice quality is your choice of codec (coder-decoder), or vocoder. In general, the greater the compression, the greater the potential for lower speech quality. Pulse code modulation (PCM) is the 64 kbps encoding method most often used in telecom networks. While it provides toll-quality voice communications, it also uses the most data bandwidth. Selecting a codec for your network thus becomes a trade-off between speech quality and bandwidth consumption. Here's a summary of several popular compression methods (see chart):
The data rates given show the positive impact on bandwidth requirements when compression is used. Even when using ADPCM, a 2-to-1 reduction in data bandwidth is realized. With G.723.1, a better than 10-to-1 bandwidth savings can be had at a cost of somewhat less fidelity. The MOS, or mean opinion score, is a subjective method of evaluating perceived speech quality using live listeners. Any score above 4 is toll quality; anything less than 3 sounds "artificial."
These scores are for the pure compression method - that is, without other network degradation factors added. The delay values given show the amount of processing delay that is intrinsic to the compression method. In other words, it takes this much time for the gateway DSPs and firmware to take incoming speech and generate the compressed data stream that represents that segment - or frame - of speech.
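The bandwidth ratios quoted above are simple division against the 64 kbps PCM rate. As a quick check, the sketch below assumes ADPCM at 32 kbps (implied by the 2-to-1 figure) and the two standard G.723.1 rates of 6.3 and 5.3 kbps. These rates are filled in for illustration, not copied from the chart.

```python
# Sketch only: codec rates below are assumptions, not values taken from the chart.

PCM_KBPS = 64.0  # uncompressed PCM rate given in the text

codec_kbps = {
    "ADPCM (32 kbps assumed)": 32.0,
    "G.723.1 (6.3 kbps)": 6.3,
    "G.723.1 (5.3 kbps)": 5.3,
}


def compression_ratio(kbps: float) -> float:
    """How many times smaller the codec stream is than 64 kbps PCM."""
    return PCM_KBPS / kbps


for name, kbps in codec_kbps.items():
    print(f"{name}: {compression_ratio(kbps):.1f}-to-1")
```

These ratios cover the voice payload only. Packet headers add overhead on top, so the savings seen on the wire are smaller.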
Managing delay issues. Total end-to-end delays can severely degrade speech quality. Long delays cause speakers to revert to a "you talk, then I'll talk" mode, which is highly unnatural and disturbing. Long delays also exacerbate echo problems: callers hear a delayed echo of what they just said, which is just as disruptive to a call's continuity. From empirical tests, one-way delays of less than about 150 ms provide acceptable speech quality. Between 150 and 400 ms, delays begin to interfere with conversations and cause noticeable degradation in the perceived quality of the connection. Anything greater than 400 ms is simply unacceptable for general planning purposes. (These delay recommendations have been formally published by the ITU, in Recommendation G.114, so you can hang your hat on them.)
In addition to compression processing, several other factors lengthen delay. Serialization is the time it takes gateways to "clock" the serial data stream out onto the network; higher-bandwidth pipes add less serialization delay. Packetization is the time spent holding voice data samples until a single packet's worth has accumulated. Queuing is a router-based delay: the waiting time before a packet can be sent onto the network, which grows when routers are overloaded. Jitter buffering smooths out the inter-packet arrival times so the audio regeneration process can feed properly timed, periodic samples into the decompression subsystem of the gateway channel processor.
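A delay budget built from these components is just a sum compared against the 150 ms and 400 ms thresholds. The sketch below shows the bookkeeping. Every millisecond value in it is a hypothetical placeholder, as are the function names and the 64 kbps link; substitute measured or vendor-supplied figures for each path.

```python
# Sketch only: all delay values are hypothetical placeholders, not measurements.

def serialization_delay_ms(packet_bytes: int, link_bps: int) -> float:
    """Time to clock one packet onto a link of the given speed."""
    return packet_bytes * 8 * 1000 / link_bps


def classify_delay(total_ms: float) -> str:
    """Apply the planning thresholds from the text (150 ms and 400 ms)."""
    if total_ms < 150:
        return "acceptable"
    if total_ms <= 400:
        return "degraded"
    return "unacceptable"


# One hypothetical path between two gateways over a 64 kbps link.
budget_ms = {
    "compression": 30.0,
    "packetization": 20.0,
    "serialization": serialization_delay_ms(packet_bytes=60, link_bps=64_000),
    "queuing": 10.0,
    "jitter buffer": 40.0,
}

total_ms = sum(budget_ms.values())
print(f"total: {total_ms} ms -> {classify_delay(total_ms)}")
```

Keeping one such table per site-to-site path makes it easy to see which component to attack first when a path runs over budget.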
You need to add up all the delays once you have a roadmap of each VoIP data path and see how close you are to 150 ms. Some parameters can be adjusted to minimize delays, although these always come with tradeoffs. For example, most gateways allow you to adjust frame packing, which defines the number of voice frames (usually each 10-15 ms in length) that get packed into a single IP packet. Typically, you can pack from one to three frames of voice data into an IP packet.
As you decrease the number of frames in a single IP packet, the packetization delay will shorten (a good thing). However, this imposes extra IP header overhead for a given phone call data stream, which in turn requires more bandwidth (a bad thing). Another benefit of lower frame packing: lost packets due to degraded IP conditions will cause less disruption in speech, since there are fewer lost milliseconds of speech in a single (lost) packet. If you expect to encounter significant packet loss (> 1%) on your data network, single frame packing may be the best way to go.
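The frame-packing trade-off can be put into numbers. The sketch below assumes a 10 ms, 10-byte voice frame (an 8 kbps codec) and the usual 40 bytes of IPv4, UDP, and RTP headers per packet. These values are assumptions for illustration; link-layer overhead is ignored, and your gateway's frame size may differ.

```python
# Sketch only: frame size and header sizes are assumptions; layer-2 overhead is ignored.

HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, no options
FRAME_MS = 10               # assumed voice frame duration
FRAME_BYTES = 10            # assumed frame size (8 kbps codec)


def packet_stats(frames_per_packet: int) -> dict:
    """Bandwidth and loss exposure for a given frame-packing setting."""
    speech_ms = frames_per_packet * FRAME_MS
    packet_bytes = HEADER_BYTES + frames_per_packet * FRAME_BYTES
    return {
        "packetization_delay_ms": speech_ms,       # time spent filling the packet
        "speech_lost_per_packet_ms": speech_ms,    # gap heard if the packet is dropped
        "bandwidth_bps": packet_bytes * 8 * 1000 / speech_ms,
        "header_share": HEADER_BYTES / packet_bytes,
    }


for frames in (1, 2, 3):
    stats = packet_stats(frames)
    print(
        f"{frames} frame(s): {stats['bandwidth_bps'] / 1000:.1f} kbps, "
        f"{stats['packetization_delay_ms']} ms delay, "
        f"{stats['header_share']:.0%} header overhead"
    )
```

Under these assumptions, single-frame packing carries mostly headers, so the lower delay and smaller loss gaps come at a steep bandwidth price.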