With organisations putting the business critical application of voice, plus video and storage in the foreseeable future, onto converged networks, it's no longer enough to scope out the network, connect up the hardware, switch it on and hope for the best. Constant tuning is necessary to keep everything running at optimum performance, especially when VoIP is involved, reports Darren Baguley.
Modern networks are highly complex environments and with all of an organisation's communications focused on the one piece of infrastructure, the consequences of something going wrong are greater than ever before. As a result, companies are increasingly monitoring and tuning their networks to get it right the first time and also to maintain peak performance and identify problems.
"The days when an IT manager could plan a network on the back of an envelope are long gone," says Gartner vice president, research enterprise networks Asia Pacific, Geoff Johnson. "Basically, the vendor or a consultant, or a network tool vendor if you're a large enterprise and already have something in place, should be doing a network assessment, ie, you need to measure it first. It's not just bandwidth either; latency, jitter and delay are very important with VoIP because you're working real time and you're up against the normal phone service which has none of those problems.
"Once the network is installed, the network operator, or the organisation's consultants or vendors need to be taking periodic looks at the network. Lots of unexpected, multi-factorial things happen when you get traffic mixing in networks. Traffic can get bogged down causing unreliable performance and you can throw bandwidth at it but it doesn't do anything about it. By doing this an organisation will learn a lot about how its applications do and don't really work."
According to Johnson, the core problem is that voice and video are latency-sensitive. If there's more than 150 ms of delay on a VoIP call, call quality suffers, so voice needs to punch straight through, whereas data can be buffered. To avoid these sorts of problems, two things are critical. "The network needs to have a policy manager to recognise voice and video traffic separate from data and it has to have a specifiable Quality of Service (QoS), so you need multiprotocol label switching (MPLS) or Asynchronous Transfer Mode (ATM) in the WAN and some QoS mechanisms in the LAN. And the bottom line is that if the network hardware is two years old it will need software upgrades and if it's older than two years it will need firmware upgrades as well," says Johnson.
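Both the delay and the jitter Johnson mentions can be measured from packet timestamps. The sketch below is illustrative only: it assumes each packet is logged as a (send time, receive time) pair in milliseconds from synchronised clocks, uses the smoothed interarrival jitter estimator described in RFC 3550, and treats the 150 ms figure as a one-way budget. The function names and sample values are hypothetical.

```python
DELAY_BUDGET_MS = 150.0  # figure quoted in the article; assumed here to be one-way


def interarrival_jitter(packets):
    """Smoothed interarrival jitter in ms (RFC 3550 style estimator).

    packets: iterable of (send_ms, recv_ms) tuples in arrival order.
    """
    jitter = 0.0
    prev_transit = None
    for send_ms, recv_ms in packets:
        transit = recv_ms - send_ms
        if prev_transit is not None:
            # Difference in transit time between consecutive packets.
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as in RFC 3550.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


def worst_delay(packets):
    """Largest one-way delay in ms (needs synchronised clocks)."""
    return max(recv_ms - send_ms for send_ms, recv_ms in packets)


def within_budget(packets, budget_ms=DELAY_BUDGET_MS):
    """True if no packet exceeded the delay budget."""
    return worst_delay(packets) <= budget_ms


# Hypothetical samples: a steady stream, and one with a late packet.
steady = [(0, 40), (20, 60), (40, 80)]
bursty = [(0, 40), (20, 200)]
```

On the steady stream the jitter estimate stays at zero and the call is inside the budget; the late packet in the second stream pushes the worst delay to 180 ms and fails the check.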
It's not just the network hardware that makes a difference either, says ADC Krone general manager, Rob Milne. If companies have made compromises with the network cabling installation, it doesn't matter how fast the switches or routers are at each end of the network link. "If the cabling link is too poor or noisy to handle the bandwidth required, the network will slow down and some parts may stop altogether," says Milne. "There are two major cabling parameters that dictate network performance: near end crosstalk (NEXT) and return loss (RL). If NEXT is too high from either inside the cable or from influences outside the cable then the data signals will be swamped by the unwanted noise. RL increases when unmatched components are coupled together to form the network channel. This usually occurs when low quality patch cords are installed to an otherwise high performing link. If the RL is too high because of all the small reflections from the mismatches, the unwanted noise increases."
Compuware's marketing director Asia-Pacific, Peter Pritchard, agrees with Johnson that the issue has become far more complex with the advent of VoIP. "A lot of networks weren't designed to handle voice, they were designed simply to move data around and if it had to retry a couple of times it didn't matter that much. All of a sudden organisations are doing VoIP and network attached storage (NAS) and that becomes a problem. With VoIP, it's got to be real time or the quality of the voice itself suffers or it just stops working. With storage, because you're moving vast amounts of data around, if you start getting a lot of retries it causes bottlenecks on the network."
A major part of the problem, says Pritchard, is the very reliability of the Public Switched Telephone Network (PSTN) that VoIP is replacing and the expectations that result from that. "The question is, is what you're supplying as good as the plain telephone system? People joke about copper in the ground but it is one of the most reliable systems for communication the world has ever known and it has set really high expectations for VoIP. It's not new technology to users - voice has been around for over 100 years and they expect it to just work." When it doesn't work, the results can be quite spectacular. One company lost power to its computer room. None of its users could access email or make a phone call, so managers had to walk around asking staff to use their mobiles as there was no way to get a message out to everyone.
When an organisation strikes problems with its network and begins the monitoring and tuning process, it usually finds out a lot about its network that it didn't know. "A lot of companies find literally dozens of applications the IT department didn't know about," says Juniper Networks ANZ's systems engineering manager, Roger Geerts. "Peer-to-peer applications and internet radio are notorious users of bandwidth but so are normal internet sessions. TCP by its nature tries to get as much bandwidth as it can for itself so if there are too many internet browser sessions up at the one time it can cause problems; especially as some of those sessions are likely to be non-work-related."
One company found that voice worked perfectly except at the end of each month. In this case the problem was somewhat obvious - the finance department was closing the books and doing massive uploads and downloads, which saturated the network - but that's not always the case. According to Pritchard it is vital that an organisation be able to track network traffic by application. "One of our mining company customers was experiencing a network slowdown every lunch time. Eventually it turned out that the mine head operators were playing Doom over the network with the miners down in the shaft. One of the weapons, the BFG 9000, fires bullets at a tremendous rate and each bullet is an Ethernet packet which totally hosed the network. Without tracking traffic by application it becomes hard to work out what the problem is. Then you've got to measure and monitor that you've so many MB of VoIP traffic going past a certain point at a certain time, it can also measure the jitter limit, delay limit, lost packets and most importantly for users, the mean opinion score (MOS), which is a measure of voice quality and warn you if things are out of specification."
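A rough sketch of the two ideas in Pritchard's comments, tallying traffic by application and warning when the MOS drops out of specification, might look like the following. It is not how any particular monitoring product works. It assumes flow records arrive as (application, bytes) pairs and that a monitoring tool already supplies an E-model R-factor for each call; the R-to-MOS conversion follows the mapping published in ITU-T G.107, while the alert threshold of 3.6 and all names are hypothetical.

```python
from collections import Counter

MOS_ALERT_THRESHOLD = 3.6  # hypothetical "out of specification" level


def bytes_by_application(flows):
    """Sum bytes per application from (app_name, byte_count) records."""
    totals = Counter()
    for app, nbytes in flows:
        totals[app] += nbytes
    return dict(totals)


def mos_from_r(r):
    """Convert an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def voice_out_of_spec(r, threshold=MOS_ALERT_THRESHOLD):
    """True if the estimated MOS falls below the alert threshold."""
    return mos_from_r(r) < threshold


# Hypothetical flow records for one lunch hour.
flows = [("voip", 1000), ("doom", 5000), ("voip", 500)]
totals = bytes_by_application(flows)
top_talker = max(totals, key=totals.get)
```

With these made-up records the game traffic shows up as the top talker, which is the kind of visibility the mining company lacked until it tracked traffic by application.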
Another problem that organisations face with network tuning and optimising is that sometimes throwing bandwidth at a problem doesn't change anything. "We had a customer who threw a lot of bandwidth at a response time problem and it didn't do a thing because it was a latency issue, not a bandwidth issue, and with something like that, without tools, organisations can't see what is going on," says Pritchard. "It wasn't the amount of data going through the network, it was the number of calls to the database. It didn't do a database call once, it would take up to 50 database calls for a given transaction. In the head office, over a 100 Mbps/Gigabit infrastructure that didn't matter but when you're running out to a remote site in Central Queensland and there's milliseconds of latency on hundreds of database calls - all of a sudden users had 30 second response times."
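The arithmetic behind a latency-bound transaction is easy to sketch. The model below is a simplification that assumes database calls are made one after another, each costing one full round trip, and that the payload is small. All figures (300 round trips, 100 ms round-trip time, 200 KB of data, a 2 Mbps link) are made up for illustration and are not taken from Pritchard's customer.

```python
def response_time_s(round_trips, rtt_ms, payload_bytes, bandwidth_mbps):
    """Estimate response time for a chatty transaction, in seconds.

    Assumes sequential calls (one round trip each) plus the time to
    push the payload through the link at the given bandwidth.
    """
    latency_s = round_trips * rtt_ms / 1000.0
    transfer_s = payload_bytes * 8 / (bandwidth_mbps * 1_000_000)
    return latency_s + transfer_s


# Illustrative figures only.
baseline = response_time_s(300, 100, 200_000, 2)         # remote site as-is
more_bandwidth = response_time_s(300, 100, 200_000, 20)  # ten times the link
fewer_calls = response_time_s(30, 100, 200_000, 2)       # application tuned
```

Under these assumptions, a tenfold bandwidth increase takes the estimate from roughly 30.8 seconds to roughly 30.1 seconds, while cutting the number of round trips by a factor of ten brings it down to about 3.8 seconds.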
In that case, a simple tune of the application solved the problem, but conventional wisdom is that if there's a response time issue, get another T3 link. The problem this attitude creates is that if it doesn't fix the problem, the IT manager has just spent $50,000 or more for no result. Some organisations are starting to take a more sophisticated approach to network management and won't put an application on the network until it's been through a testing process. This can help solve problems in several ways. Through testing, the IT manager knows what the impact of the network is on the application and vice versa. It will also indicate how much bandwidth the application is likely to use - networks are happy at 70 percent utilisation but a dog at 80 percent, and not testing a new application could put an organisation over that threshold without warning. Testing is also likely to flag any application conflicts, as some applications don't play nicely together due to conflicting address spaces or similar headers or data formats, which means that packets get routed the wrong way. These problems tend to stem more from poor implementation of applications than from the applications themselves.
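The 70 and 80 percent rule of thumb lends itself to a simple pre-deployment check. The sketch below is an assumption-laden illustration rather than an established method: it treats the two percentages as utilisation thresholds on a single link and assumes the new application's bandwidth demand has already been measured in testing. The function name, labels and sample figures are hypothetical.

```python
def utilisation_status(capacity_mbps, current_mbps, new_app_mbps,
                       comfortable=0.70, congested=0.80):
    """Project link utilisation after adding an application.

    Returns (projected_fraction, label) where label is one of
    "ok", "watch" or "congested", using the article's 70/80 percent
    rule of thumb as default thresholds.
    """
    projected = (current_mbps + new_app_mbps) / capacity_mbps
    if projected >= congested:
        label = "congested"
    elif projected > comfortable:
        label = "watch"
    else:
        label = "ok"
    return projected, label


# Hypothetical 100 Mbps link carrying 62 Mbps, new application needs 12 Mbps.
projected, label = utilisation_status(100, 62, 12)
```

Here the projected load lands between the two thresholds, so the link would still work but has little headroom left for the next application.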
All of the people that Voice&Data spoke to for this feature agreed that implementing QoS is the most vital step an organisation can take to ensure that all the applications on the network get the bandwidth they need to run properly. With QoS in place, users can play Doom during their lunch hour and the IT department knows that it will be restricted to the bandwidth left over from business critical applications such as voice and databases. Part of the problem, however, is that a lot of organisations don't have the IT infrastructure to successfully implement QoS.
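The idea that lunchtime Doom only gets the bandwidth left over can be shown with a toy strict-priority model. This is a deliberate simplification: real QoS relies on packet marking and queuing in routers and switches, not on a single allocation function. The traffic classes, their order and the demand figures below are all hypothetical.

```python
def allocate(capacity_mbps, demands):
    """Strict-priority allocation of link capacity.

    demands: list of (class_name, demand_mbps), highest priority first.
    Each class gets what it asks for until the link is exhausted;
    lower-priority classes share whatever remains, in order.
    """
    remaining = capacity_mbps
    grants = {}
    for name, demand in demands:
        grant = min(demand, remaining)
        grants[name] = grant
        remaining -= grant
    return grants


# Hypothetical lunchtime load on a 100 Mbps link.
grants = allocate(100, [("voice", 10), ("database", 60), ("doom", 50)])
```

In this example voice and database traffic receive their full demand, and the game is held to the 30 Mbps that remains.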
"A lot of our clients, especially the smaller ones, don't have the information systems bandwidth or experience to continually make sure the QoS parameters are set right through all their routers and switches," says Integrated Research's product manager, IP telephony products, Kailem Anderson. "They just have one big fat pipe, although once they get outside their network and onto their carrier they have QoS. Big enterprises such as the banks and government departments have the staff and the knowledge to do QoS but often it's just a case of something goes wrong, they don't know what and there's no real QoS within the organisation. Once something does happen that might provide the impetus to give a couple of applications priority through the network, the organisation still has to find the problem to fix it."
Cisco's general manager for convergence, Peter Hughes, believes that not only is there a technological aspect to network optimisation but there's also the people perspective. "One of the questions I ask a CIO who is looking at implementing a converged infrastructure is whether he or she has converged the company's voice, data, directory and database management people into a converged unit group? If an organisation hasn't, there are potential issues. Someone says the organisation is going to implement a converged IP telephony system and the data and voice guys each take a step in opposite directions and start lobbing hand grenades at each other. That's very much a reality, so look at the infrastructure but also look at the company organisation and processes."