Media Processing: Servers and High-Density Boards

What looks like revolution is revealed as simple evolution: Monolithic VRUs are 'decomposing' into softswitches, gateway/controllers, and media servers. The benefits: carriers can ramp up services faster, and new services can be profitable with fewer users.

by Andy Green and Robert Richardson

Back in the old days of analog and TDM (i.e., five minutes ago), voice apps were typically built on VRUs (Voice Response Units) - big PCs or Unix boxes containing line interface, switch matrix, and DSP boards. Though VRUs look monolithic (and typically scale that way), they're really not: From fairly early in the evolution of computer telephony, most voice application platforms drew a clear dividing line between upstairs and downstairs - between "what the application did" and "what was happening on the cards." As CT grew more sophisticated, board makers like Dialogic (now Intel) and NMS engineered more and more powerful media processing, switching and call-handling facilities into their boards, and made these accessible to applications through increasingly abstract APIs and resource-management middleware. At the same time, they worked with standards bodies and application providers to implement, stabilize, and promote the separation of "application computing" and "switching and media processing" at the hardware level, by creating switching buses like PEB, MVIP, SCSA, and H.100/110.

They had two goals in mind. First, they wanted to simplify voice application programming by walling off call-progress-bound, interrupt-driven, realtime telephony arcana (network interface, switching, and media processing) from more ethereal application code. Second, they anticipated a time when telephony apps would need to get bigger: a single application controlling transactions on thousands, or tens of thousands, of ports. To do this, you have to segregate application processing from "telephony stuff" in both software and hardware, so that resources dedicated to each can scale independently.


Time marches on. The Internet grows up, IP telephony emerges. And we have a great new idea - wow! - voice apps are even easier to write, and voice application components are even more independently scalable, shareable, and manageable if you separate them physically and replace the separate 'signaling' and 'media' buses with a single, packet-based, high-speed IP network.

The new plan for voice apps is remarkably simple and elegant. The app itself - high level - runs on an application server, which talks to a softswitch, managing call control and network signaling. The softswitch talks (via MGCP and SIP) to gateway controllers and gateways - bringing calls and signaling in off the PSTN and reoriginating them as IP. These calls - along with calls originating on the IP side (i.e., from IP telephones, softphones, etc.) - connect across the IP network to "media servers" which handle the heavy lifting of VoiceXML-interpreting, announcement- and voice-prompt-playing, user-input-getting, speech-rec, text-to-speech, call content recording, conferencing, and many other possible tasks.

One vendor's breakdown of core chores for media servers is represented in the chart. The stuff done by a media server is (at least, theoretically) distinct from signal processing tasks better done at the network edge: in gateways, IADs, and similar devices. It makes sense, for example, to interpret DTMF at the gateway and convert to unambiguous (and reliably transmissible) signals for the app, rather than trying to compress and ferry MF tones across the IP network for (possibly fallible) detection on the media server. Likewise, media servers at the network core aren't ideally positioned to do fax-tone detection, PSTN-side call-progress analysis, or other 'gateway plane' signal processing.
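As a rough illustration of what "unambiguous signals" can look like on the wire, here is a minimal sketch of one widely used approach, the RTP telephone-event payload from RFC 2833 (later RFC 4733). The article does not say which mechanism any particular gateway uses, so treat the choice of RFC 2833 as an assumption; the function names are hypothetical.

```python
import struct
from typing import Tuple

# DTMF event codes in RFC 2833 / RFC 4733: 0-9 -> 0-9, * -> 10, # -> 11, A-D -> 12-15.
DTMF_DIGITS = "0123456789*#ABCD"
DTMF_EVENTS = {ch: code for code, ch in enumerate(DTMF_DIGITS)}


def encode_dtmf_event(digit: str, duration: int, volume: int = 10,
                      end: bool = False) -> bytes:
    """Pack one 4-byte telephone-event payload for a DTMF digit.

    duration is in RTP timestamp units (8000 per second for 8 kHz audio).
    volume is the tone level expressed as -dBm0, in the range 0..63.
    """
    if digit not in DTMF_EVENTS:
        raise ValueError(f"not a DTMF digit: {digit!r}")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E (end) bit, reserved bit left at 0, then 6-bit volume.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags, duration)


def decode_dtmf_event(payload: bytes) -> Tuple[str, bool, int, int]:
    """Unpack a telephone-event payload into (digit, end, volume, duration)."""
    event, flags, duration = struct.unpack("!BBH", payload)
    if event >= len(DTMF_DIGITS):
        raise ValueError(f"event {event} is not a DTMF digit")
    return DTMF_DIGITS[event], bool(flags & 0x80), flags & 0x3F, duration
```

With this scheme the media server reads the digit straight out of the packet rather than running tone detection on compressed audio, which is the point the paragraph above makes.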

Nevertheless, as our roundup will reveal, the general-purpose media servers now coming available can be optioned out to do most or all of these 'peripheral' tasks, too, should they be necessary. And several, indeed, contain TDM line interface components in addition to a high-speed network interface, and are able to terminate and serve both PSTN and IP call traffic. Why buck the trend to segregate 'gateway' and 'core' media processing functions from one another? Because most traffic is still coming off the PSTN, and will be for the near-term future. Peter Gavalakis, marketing manager for public network products within Intel's Network Processing Group, says, "there's absolutely no question that every carrier we've talked to wants a hybrid device. That doesn't mean they aren't going to have other media servers that hide behind gateways, but they all want to leverage the existing infrastructure."

As IP traffic becomes more common, the role of the gateway and the media server will become more sharply delineated, and substantial cost-savings will result. Even today's media servers - while expensive - offer economies of scale and efficiencies of resource utilization that lead to a lower per-port cost than older, TDM-based enhanced service platforms. Brett Azuma, senior vice president of marketing at IP Unity, says that the cost for deploying low-density boxes was prohibitive, requiring carriers to mass-deploy services to reach the break-even ROI hurdle of around 5% market penetration. With media servers, the break-even number is "fractions of a percentage," making them a very attractive equipment investment.


The best way to build this sort of resource box is an open question, or you might even say a question of openness.

Companies that have been founded just to build this sort of box are almost uniformly engineering proprietary, purpose-built systems. In this scenario, you have a rackmount chassis with custom-designed hot-swappable blades that include a great deal of DSP horsepower. Effectively, each blade operates as its own computer, typically running a real-time OS. This is connected to a switch fabric on the backplane (in some cases proprietary, in other cases it's switched Gigabit Ethernet).

"I fundamentally don't believe you can build a truly carrier-class product using open components," says Joel Hughes, CEO of SnowShore Networks. "Unless you as a single entity, a manufacturer, control not just the hardware components but also the low-level embedded software from the OS through every board you plug into the system, you really can't failover calls and provide the reliability and throughput." And even board makers concede that, past a certain density of channels, this is true.

Says Intel's Gavalakis: "Will using off-the-shelf cPCI boards get you the kinds of densities that Cognitronics or SnowShore or Convedia get? No. But it's a heck of a lot cheaper." You can use IP-capable DSP boards right off the shelf now, in other words, instead of building it from scratch. You might stack up a number of separate CPU/board combos in a rack and connect them with switched Gigabit Ethernet. And, with the help of the PICMG 2.16 standard defining switched Ethernet on the cPCI backplane, board vendors like NMS and Motorola Computer Group have done just that. They each have capable media servers - NMS's HearSay and MCG's MXP - built from their own stock of 2.16 compliant cards for communicating over a PSB (Packet Switching Backplane). Both products meet the needs of large enterprises and carriers, especially providers looking to deploy high-value, consumer-oriented speech services.

In the enterprise space, where reliability requirements are less stringent, Verascape shows that it is possible to build media servers from off-the-shelf generic computing components. Its VeraServ links Intel 1U Pentium III servers (running signal processing code natively!), using Gigabit Ethernet as the media bus.

Gavalakis confirms what we're seeing in the market, noting that as standards evolve, you not only get higher densities supported in standard busses, you also gain capabilities for handling failover scenarios even when using a stack of standard pizza boxes hooked up by Ethernet. "We're working on delivering middleware that helps solve some of those problems for our telecom equipment manufacturer customers. They can basically write the application and this middleware will take care of the availability. In some cases, carriers will need the five nines and be willing to pay for it. But in other cases, there are ways to engineer reliability into the system outside of the box."

Media Servers Rounded Up


A 20-person startup, aTelo (Arlington, VA - 703-276-9000) is unabashedly pro-software, and its media server is built to run on whatever Pentium-based hardware a buyer wants to use. There's no proprietary DSP hardware, not even off-the-shelf voice processing boards. On a dual-Pentium III pizza-box platform (running at 1 GHz), the company reports it can handle 200 channels (jumping to 400 channels midyear with 3 GHz chips). The benchmark is derived from supporting its ProVoice voicemail application, not from a minimalist announcements-only test. The company says a CPU approach works very well for all but N-way conferencing and high-throughput transcoding work, where DSPs are to be preferred.

The aTelo software expects to receive SIP requests and to gain access to various files on other servers via HTTP. The interface for defining what happens when a SIP request is made is a custom (but Visual Basic-like) script that provides access to components offering services like ASR, TTS, T.38 fax, DTMF, and so on. Developers can load only the components they need. Where different packages provide overlapping features with different strengths (different speech packages for different languages, for instance), it's possible to load multiple components, so there's some real flexibility. Though aTelo believes its script language has more capability and handles features that VoiceXML doesn't provide, it also supports (by translating on the fly) standard VoiceXML apps. At present, aTelo offers an enterprise version of the media server, which allows multiple servers to be grouped under a supervising SIP proxy server. Later in the year there are plans for a clustered version that will present multiple servers to the network as a single IP address.


Cognitronics (Danbury, CT - 203-830-3400) is different from other media server box makers in that it's not a startup, not by a long shot. The company has been around since 1961, when it began making automatic number announcers for Western Electric. In the analog world, when a Class 5 switch needs an announcement made, there's a fair chance that the box that actually makes that announcement was made by Cognitronics.

From there, it's a natural step to making an announcement box for the IP world. Rather than abandon TDM outright, Cognitronics aids gradual migration for its carrier customers by making hybrid media servers. You can mix and match cards that talk TDM and IP.

At present, the Cognitronics media server offering is called Cognitronics Exchange, or CX. It comes in four flavors, beginning with the small CX500, which includes one T1/E1 board and a 100 Mb/s Ethernet LAN. Dual LAN support and a VoIP Media Interface board will be available later this year. Scaling up through the CX1000 and CX3000, one arrives at the CX4000, the current top of the Cognitronics line, a 5U chassis with up to six processing boards (horizontally mounted for sidelong airflow that doesn't channel heat up the rack to the next component). Options for those slots include VoIP, E1/T1 (with a quad E1/T1 version of the board on the way), and ATM (OC3).


For its CMS-6000, Convedia (Vancouver, BC - 604-918-6300) made the investment in proprietary DSP boards, packet switched backplane, N+1 redundancy, and system software. The ROI is a carrier-class 13U "purpose built" media server that scales up to 18,000 ports. Acting purely as a slave in a softswitch architecture, controlled by MGCP and MEGACO, the CMS-6000 devotes itself purely to media processing. The 6000's DSPs perform the basic media functions (voice detection, DTMF detection, mixing, etc.) on behalf of service logic running on remote app servers (or softswitches) for conferencing, IVR, announcements, and network notifications. With its recent support of SIP as a device control protocol, the 6000's onboard software accepts "invites" as well and can now be controlled by SIP-compliant endpoints.

The performance of the CMS-6000 is astonishing: The DSP horsepower and high-bandwidth backplane allow, on paper, for all 18,000 ports to be engaged simultaneously in 900 separate conferences of 20 callers each. Try doing that with off-the-shelf components! While the CMS-6000 does not currently support TTS/ASR as a core function, instead delegating them to external servers, plans are under way to program them into the 6000.
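The on-paper figure is simple port arithmetic, sketched below. This only counts ports; it assumes every caller occupies exactly one port and says nothing about DSP load per mix, which is why it is an upper bound and not a measured result. The function names are hypothetical.

```python
CMS_6000_PORTS = 18_000  # maximum port count quoted for the CMS-6000


def ports_required(conferences: int, callers_per_conference: int) -> int:
    """Ports consumed when every caller occupies exactly one port."""
    return conferences * callers_per_conference


def max_conferences(total_ports: int, callers_per_conference: int) -> int:
    """Largest number of equal-sized conferences that fit in total_ports."""
    if callers_per_conference <= 0:
        raise ValueError("a conference needs at least one caller")
    return total_ports // callers_per_conference


# The article's example: 900 conferences of 20 callers fill the chassis exactly.
print(ports_required(900, 20) == CMS_6000_PORTS)
print(max_conferences(CMS_6000_PORTS, 20))
```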


Like some other media server makers, IP Unity (Milpitas, CA - 408-957-0800) saw that it couldn't build its 16,000-port platform from components, so it built one from scratch. Proprietary DSP boards networked by an ATM backplane (2.5 Gbps) let its 13U Harmony6000 achieve impressive specs. As a conference bridge, it can mix together 5,000 three-way calls - that's simultaneous media activity on all ports! DSP resources can be allocated ahead of time so that the MIPS are available to handle specific apps. IP Unity is working on dynamic resource allocation, which would allow the 6000 to handle spikes in media load. Carrier-class credentials are earned in the reliability department as well. Media cards rest on a NEBS-compliant chassis with redundant controllers. DSP chips are organized into smaller subsets, or farms - there are 256 - each with its own signaling message processor. If one farm's message processor should fail, the other farms continue on. It supports the usual standards-based control protocols - H.323, MGCP, MEGACO - and the box's 16 100BaseT Ethernet interfaces give it a fat pipe to the IP cloud. ATM interfaces (OC-3 and OC-12) are being planned.


Four separate packet backplanes linking a 21-slot cPCI chassis, no TDM-based H.110 bus. These words describe not only the MXP from Motorola Computer Group (Tempe, AZ - 800-759-1107), but perhaps also suggest what future media processing platforms from OEMs may look like. To build a media server from the MXP, one could slide in any of the current batch of PICMG 2.16-compliant boards, which would communicate media and data over the MXP's dual star topology Ethernet network. If, though, you're a wireless carrier and need to put this box in the ATM core, you could instead tap into the MXP's fully meshed, non-shared ATM switching fabric (scales to 700 Gbps), avoiding performance-degrading translation to IP. The mesh supports Frame Relay as well. Then there's Intel's IPMI (Intelligent Platform Management Interface), standardized as PICMG 2.9, which is used to monitor the health of the MXP chassis. Using IPMI, administrators remotely manage the box over a LAN or serial connection. The fourth backplane is Motorola's own Fiber Channel network. Intended as a high-speed pathway to networked storage devices, the Channel would be ideal, say, for delivering text from a remote disk file that a media server has requested for its network announcements. To help OEMs take advantage of the MXP, Motorola has partnered with ZYNX Networks for PICMG 2.16-compliant Ethernet switches, NetPlane Systems for its control plane (ATM, IP, and Frame Relay) middleware, and C-Port for its C-5T network processor blades.


Deciding to bring it all together under one roof, NMS (Framingham, MA - 800-533-6120) has delivered a turnkey application server and media/speech processing platform, HearSay, built from its own CG 6000/6500C cards, plus best-of-breed hardware and software from partners like Sun and Oracle. Targeted for carrier customers who don't want to assemble media components themselves, HearSay combines a VoIP gateway, media server, speech server (TTS/ASR), and Sparc-based application servers. Released originally in 2000, the HearSay started life as a voice portal platform, on which companies like BeVocal deployed their VoiceXML browsers. Made significantly more powerful and turnkey, it now scales to 10,000 ports. NMS's PacketMedia software wraps standards-based control protocols (H.323, MGCP, MEGACO, and SIP) around the CG series card, letting HearSay become part of a softswitch-controlled architecture. More on HearSay next issue in our feature on enhanced service platforms.


SnowShore Networks' (Chelmsford, MA - 978-367-8400) N20 is a 10U chassis with a 25GB switch-fabric backplane. It's proprietary equipment, NEBS-3, with hot-swappable blades, redundancy throughout the system, and redundant 48VDC power feeds. Minimum density is 500 simultaneous sessions, scalable in 200-session units right up through a maximum of 20,000 sessions in a chassis.

The N20 is primarily designed for IP, but will also support ATM interfaces. When the unit, now in beta, reaches its Q2 general availability, it will support Gigabit Ethernet and OC12 interfaces (the beta units are running GigE). Virtually everything about the N20 runs at wire speed. Even with media transcoding in place, latency from packet arrival at the ingress to the transcoded packet leaving the box is less than five milliseconds.

As for what the box does, SnowShore views its media server as tackling three separate groups of processing functionality: IP, media, and application. IP is about providing Ethernet interfaces and supporting basic protocols like RTP, much as you'd expect. The media processing is the bread and butter stuff: conferencing, recording, tone processing, compression, transcoding, fax, speech rec, TTS, and streaming audio. The application processing capabilities don't provide actual application logic, but instead provide building blocks for supporting various (mostly Internet-style) interfaces, including SIP, VoiceXML, MGCP, HTTP, NFS, FTP, SMTP, IMAP, and RTSP. Each of these three groups of functionality is implemented on a different blade, so the three can be mixed in different proportions, depending on how a carrier needs to scale each capability.


Still largely in stealth mode, Thinkengine Networks Inc. (Marlboro, MA - 508-597-0401) plans to announce what it calls its Advanced Media Server toward the middle of this year. Details about the server are still somewhat under wraps, but the basic product is a 3U box built out of proprietary components, capable of addressing 672 simultaneous ports (a DS3's worth) either as native IP or via ISDN (or a mixture of both). The box supplies all the voice services a carrier might want, plus an interesting feature not found in other boxes - the ability to listen for a keyword on all active ports, to allow for in-call services no matter what's going on within a call.


Designed as customer premises equipment for the enterprise market, Verascape's (Oakbrook Terrace, IL - 847-919-0873) VeraServ is another un-DSP server. As with aTelo (but sold as a software/hardware bundle), the TTS and ASR processing are on general purpose Intel 1 GHz Pentium 1U Servers, which are up to the task of handling native signal processing software.

These Intel 1Us are piled into a VeraServ seven-foot rack, networked together by two "departmental" IP switches (really Cisco's Catalyst 2948G). There's a separate media gateway module (engineered with AudioCodes cards) to do the TDM-IP translation function. Verascape says that by dedicating 14 Intel Servers, running its Speech Server app software, they can process 672 simultaneous speech streams (48 per server!), from any of 2,688 TDM channels funneled in by four DS3s.

The system is sold turnkey, with built-in VoiceXML browsers as part of the Speech Server app. TTS requires separate dedicated Intel 1Us, and the ratio here is 14 channels per Pentium. Neater still is that its Vivance architecture uses SIP to bind it all together, so there's nothing preventing companies from swapping in their own SIP-compliant gateways or media cards or deploying multiple VeraServ racks.
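The VeraServ sizing figures can be checked with a few lines of arithmetic. The sketch below takes a DS3 as 28 T1s of 24 channels each (the 672-port "DS3's worth" used earlier in this roundup) and uses the per-server ratios Verascape quotes; the helper name is hypothetical, and real sizing would also depend on codec and vocabulary load.

```python
import math

CHANNELS_PER_T1 = 24
T1S_PER_DS3 = 28
CHANNELS_PER_DS3 = CHANNELS_PER_T1 * T1S_PER_DS3  # 672 channels in one DS3

ASR_STREAMS_PER_SERVER = 48   # speech streams per 1U server, as quoted
TTS_CHANNELS_PER_SERVER = 14  # TTS channels per 1U server, as quoted


def servers_needed(streams: int, per_server: int) -> int:
    """Smallest number of servers that can carry the given stream count."""
    return math.ceil(streams / per_server)


tdm_channels = 4 * CHANNELS_PER_DS3                       # four DS3s of TDM input
asr_servers = servers_needed(CHANNELS_PER_DS3, ASR_STREAMS_PER_SERVER)
tts_servers = servers_needed(CHANNELS_PER_DS3, TTS_CHANNELS_PER_SERVER)

print(tdm_channels, asr_servers, tts_servers)
```

On these numbers the rack recognizes speech on one channel in four at any moment, and giving every one of those 672 streams simultaneous TTS would take far more servers than the recognition side does.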


Today's generalized media servers have important precedent in telco central offices. Look at a typical CO's Class 5 switch installation and you'll find there's a purpose-built box hooked onto it. The operation of these "legacy media server" boxes is refreshingly simple. The switch connects a call that needs a message played to an available port on the box, sends an in-band signal (the nature of which will vary based on whose Class 5 switch is doing the signaling), and the box plays the message over the line.

High-Density Media Boards

Whether high-density media processing boards are embedded in single-purpose media devices, VoIP gateways, or softswitch-managed media servers, the underlying DSP firmware presents the same set of media processing primitives. In the past, developers accessed these low-level functions strictly through the board's native APIs, creating non-portable media applications, perhaps for a carrier-grade "enhanced service" platform. Today, though, the programming can be accomplished using standards-based device-control protocols (MGCP, MEGACO and H.323) or through peer-to-peer SIP.

Outer interfaces may have changed, but the media boards' core media functions remain fairly constant, though driven by more powerful DSP hardware. While each vendor may enumerate their boards' media capabilities differently, we believe the core functions can be categorized as follows: DTMF detection (including call progress tones), fax detection (T.38), audio mixing, transcoding, ASR, TTS, record-playback, echo cancellation, voice detection, silence detection, and packetization. Boards that end up in media servers typically have runtime firmware for the first eight (see the chart), delegating the remaining functions to the gateway.
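The split described above can be written down directly. This sketch simply follows the order in which the functions are listed and cuts after the eighth; treating that ordering as the dividing line is our reading of the paragraph, and the names are hypothetical.

```python
# Core media functions, in the order listed in the text above.
CORE_FUNCTIONS = [
    "DTMF detection",
    "fax detection (T.38)",
    "audio mixing",
    "transcoding",
    "ASR",
    "TTS",
    "record-playback",
    "echo cancellation",
    "voice detection",
    "silence detection",
    "packetization",
]

MEDIA_SERVER_FUNCTION_COUNT = 8  # "the first eight" stay on the media server board


def split_functions(functions, on_media_server):
    """Return (media_server_functions, gateway_functions) by list position."""
    return functions[:on_media_server], functions[on_media_server:]


media_server, gateway = split_functions(CORE_FUNCTIONS, MEDIA_SERVER_FUNCTION_COUNT)
print("media server:", media_server)
print("gateway:", gateway)
```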


Media boards don't necessarily have to be multi-functional building blocks, loaded with firmware to handle several different media tasks simultaneously. In the better-to-do-one-task-well school is a new cPCI card in Amtelco's (McFarland, WI - 608-838-4194) XDS Infinity Series. Its 512-port Infinity Conferencing board devotes itself exclusively to audio mixing. Rather than develop the firmware from scratch, Amtelco has taken the same code that has proven itself in its older 128-port H.100/MVIP conference cards and ported it over to a higher-density H.110-compliant one.
