Call centre managers, struggling to cope with ever-increasing levels of traffic, have been eyeing speech recognition systems for years.
Desperate to improve customer service levels while keeping a lid on operational costs, they've keenly watched the technology evolve to the point where it can reliably handle many calls that would once have required human intervention.
During the past six months, a range of high-profile speech recognition implementations have gone live. Telstra, banking and insurance firm Suncorp and government agency Centrelink are among a growing number of organisations that have adopted the technology as the basis for critical communications channels for their customers.
According to research firm Gartner, shipments of speech recognition software rose by 23 per cent in 2003 compared with the previous year, and similar rates of growth are expected each year until at least 2007.
Industry experts point to two factors that have combined to push voice recognition firmly into mainstream use: an increase in recognition accuracy and a move by vendors to embrace open standards.
Early users of voice recognition systems were often frustrated by the systems' inability to understand anything but the most carefully spoken words. Limited vocabularies and a requirement to speak slowly and distinctly made most verbal exchanges stilted. Often it was simpler to press 'zero' and speak to a real person.
But during the past 12 months, advances in the algorithms used by speech recognition engines have allowed them to become much better at recognising and understanding normal conversational speech.
Nortel Networks Asia Pacific voice portfolio director Chris Luxford says that in the past, voice recognition systems were constrained by small word dictionaries and recognition rates of around 80 per cent, nowhere near good enough for serious commercial applications.
"We are now seeing accuracy rates well into the high 90 per cent levels," says Luxford. "Most recognition engines have vocabularies of more than 30,000 words, which easily covers most of the dialogues or interactions that need to take place."
Another advance that is making the technology more appealing is natural language understanding (NLU). NLU allows groups of words and phrases to be used to communicate a more complex request. For example, a bank customer could say "give me my account balance and my last five transactions". The system would identify both questions and deliver the required information.
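To make the bank example concrete, here is a minimal Python sketch of splitting one spoken request into separate intents. It assumes the speech has already been transcribed to text, and it uses hand-written keyword patterns purely for illustration. Commercial NLU engines rely on statistical models trained on real caller data. The intent names (`get_balance`, `get_transactions`) and the `parse_request` function are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabulary for spoken numbers; illustration only.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def parse_request(utterance: str) -> list:
    """Return every (hypothetical) banking intent found in one transcribed request.

    Keyword patterns stand in for a trained NLU model. Intents are checked
    in a fixed order, not the order in which the caller spoke them.
    """
    text = utterance.lower()
    intents = []
    if re.search(r"\bbalance\b", text):
        intents.append(Intent("get_balance"))
    match = re.search(r"\blast (\w+) transactions?\b", text)
    if match:
        word = match.group(1)
        count = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
        if count is not None:
            intents.append(Intent("get_transactions", {"count": count}))
    return intents


print(parse_request("Give me my account balance and my last five transactions"))
```

Both requests in the example sentence come back as separate intents, each of which a real system would pass to its own back-end query.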
Leading the field in the speech recognition software space are developers Nuance and ScanSoft. Both companies provide the recognition engines that sit at the heart of major speech services.
ScanSoft's Asia Pacific regional director, Peter Chidiac, says massive investments in research and development have led to the creation of engines that can deal with speech in continuous chunks as well as different accents and pronunciations.
"Systems have to be able to cope with people asking for things in very different ways," he says. "As well as a dictionary of words, we gather statistics on how likely people are to say certain things in response to certain questions. This also helps to increase accuracy."
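The statistics Chidiac describes can be thought of as a prior probability on each possible answer to a given prompt, combined with the acoustic evidence in the usual Bayesian way: choose the phrase that maximises log P(audio | phrase) + log P(phrase | prompt). The Python sketch below illustrates that general idea only and is not ScanSoft's method. The prompt name, phrases and probabilities are all invented.

```python
import math

# Hypothetical prompt-conditioned priors: how often callers gave each answer
# to this question, as might be counted from earlier call logs.
PRIORS = {
    "which_account": {
        "cheque account": 0.60,
        "savings account": 0.35,
        "check a count": 0.05,
    },
}
UNSEEN_PRIOR = 1e-4  # floor for phrases never observed after this prompt


def best_hypothesis(prompt_id: str, acoustic_likelihoods: dict) -> str:
    """Choose the phrase maximising log P(audio | phrase) + log P(phrase | prompt)."""
    priors = PRIORS.get(prompt_id, {})

    def score(phrase: str) -> float:
        prior = priors.get(phrase, UNSEEN_PRIOR)
        return math.log(acoustic_likelihoods[phrase]) + math.log(prior)

    return max(acoustic_likelihoods, key=score)


# The acoustics slightly favour the odd phrase, but the prior overrides it.
print(best_hypothesis("which_account", {"cheque account": 0.30, "check a count": 0.35}))
```

When the acoustic evidence is strong the prior matters less; when two phrases sound alike, knowing what callers usually say at that point in the dialogue tips the decision.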
The rise of open standards is also credited with the recent jump in the popularity of voice recognition technology.
"One of the biggest barriers for companies has been the cost of these systems," says Luxford. "Historically it has been a very expensive thing to get into, but the move to open standards is changing this."
Rather than developing bespoke applications for each implementation, companies can now choose standard building blocks and assemble them as required. Beyond the customer interface, standards are also making it much easier to link into existing back-end systems such as customer relationship management applications and databases.
The two key standards at work are voice extensible markup language (VXML) and call control extensible markup language (CCXML).
Just as HTML is used to mark up content for a web page, VXML is used to mark up voice dialogues: the prompts a caller hears and the spoken responses the system listens for. In its simplest form, VXML can be used to retrieve information from a database in response to a spoken request. For example, "tell me my credit card balance" can cause the system to find the relevant figure and then read it out to the customer.
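The credit card example can be sketched from the server side. A VoiceXML platform fetches its dialogue pages over HTTP from a web application, so the application's job is to look up the figure and return a page whose `<prompt>` the platform reads aloud. The Python below builds such a VoiceXML 2.0 page with the standard library's ElementTree. The `BALANCES` table, the form id and the `balance_response` function are hypothetical stand-ins for a real back-end.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Serialise VoiceXML elements in the default namespace (xmlns="...").
ET.register_namespace("", VXML_NS)

# Hypothetical stand-in for the bank's back-end database.
BALANCES = {"4111": 1234.56}


def _tag(name: str) -> str:
    return f"{{{VXML_NS}}}{name}"


def balance_response(account_id: str) -> str:
    """Build a VoiceXML 2.0 page that reads a card balance back to the caller."""
    vxml = ET.Element(_tag("vxml"), version="2.0")
    form = ET.SubElement(vxml, _tag("form"), id="balance")
    block = ET.SubElement(form, _tag("block"))
    prompt = ET.SubElement(block, _tag("prompt"))
    amount = BALANCES.get(account_id)
    if amount is None:
        prompt.text = "Sorry, I could not find that card."
    else:
        prompt.text = f"Your credit card balance is {amount:.2f} dollars."
    return ET.tostring(vxml, encoding="unicode")


print(balance_response("4111"))
```

Keeping the database lookup in ordinary web-application code, and only the dialogue in VXML, is what lets the same back-end serve both web and voice channels.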
A newer standard just emerging on to the market, CCXML extends the capabilities of voice recognition systems even further.
"In all voice recognition environments, you want people to be able to speak to the system and then have that system deliver the information they want," says Nortel's Luxford. "VXML does this, but you also have to have a way to control the call."
For example, if someone asks to speak to an agent, the platform needs a way to route the call to a human or perhaps to another IVR (interactive voice response) system in a different department.
Luxford says such things have usually been handled using proprietary call control technology, but moving to a standards-based environment means lower development costs.
"Widespread adoption of standards will also stimulate the development of a larger range of software applications, which will further drive take-up of voice recognition technology," he says.
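The routing decision in Luxford's agent example amounts to mapping what the caller said onto a destination. Below is a minimal Python sketch of that decision only. The destination names and keyword lists are hypothetical, and in a standards-based platform the actual transfer would be carried out by call-control logic such as a CCXML document rather than by this function.

```python
import re
from typing import Optional

# Hypothetical destinations for illustration.
AGENT_QUEUE = "agent-queue"
DEPARTMENT_IVRS = {"billing": "ivr-billing", "faults": "ivr-faults"}
AGENT_WORDS = {"agent", "operator", "person", "human"}


def choose_destination(utterance: str) -> Optional[str]:
    """Return where to transfer the call, or None to stay in the current dialogue."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if words & AGENT_WORDS:
        return AGENT_QUEUE
    for department, ivr in DEPARTMENT_IVRS.items():
        if department in words:
            return ivr
    return None


print(choose_destination("Can I speak to a real person?"))
```

Separating "where should this call go" from "how is it transferred" mirrors the split between the dialogue (VXML) and call control (CCXML).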
While the underlying technologies driving voice recognition systems have evolved significantly during the past few years, the principles behind designing systems that work well and are accepted by customers have not changed.
Jane Curtain, speech recognition expert with integration firm Dimension Data, says the most important part of any successful system is the user interface.
"People just hate pushing buttons," she says. "They have to map a concept to a number and this might not fit the concept they have in their mind. People want to be able to use natural language."
She says most people phoning call centres would prefer to speak to a real person, but a cleverly designed speech recognition application can be the next best thing.
Curtain points to Telstra's new "one number, one voice" system that greets callers seeking information on a variety of the carrier's products and services.
"People can respond using their own language," says Curtain. "The system can then interpret what their enquiry is and direct them down the appropriate path. This is where speech recognition is heading."
When it comes to designing the interface that greets callers, Curtain says one of the most common traps is trying to replicate the menu structure of older systems that offered callers numbered options. Instead, designers should try to determine the most likely questions and requests people will make and design their system around them.
"You have to gather a lot of evidence about how people tend to phrase their questions," says Curtain. "We try to have the system do exactly what a call centre agent would do. If you do anything less than that you are actually going a little bit backward in terms of development."
Another example of a successful implementation is Centrelink's natural language speech recognition system.
The agency needed an efficient way to automate the process by which customers report details of their casual employment. Faced with a massive increase in call volumes to its call centres around the country, Centrelink opted for a voice recognition system that can guide callers through giving this information without the need for human intervention.
Piloted in early 2003, the system went live late in the year and is now handling more than 30 per cent of calls.
The Centrelink system authenticates callers by using a combination of their customer number and a PIN. Once they are identified, the system accesses the agency's central database and checks what information the caller is required to give. Through a series of prompts, the caller is told what details are required and when future payments will be made.
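The flow described here (identify the caller, look up what information they must give, then prompt for each item) can be sketched in a few lines of Python. Everything in it is hypothetical: the record layout, the sample customer, the payment day and the function names are invented, and the salted PBKDF2 hash is simply a reasonable way to store PINs, not a description of Centrelink's system.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_pin(pin: str, salt: bytes) -> bytes:
    # Store only a salted, deliberately slow hash of each PIN, never the PIN itself.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)


_salt = os.urandom(16)
# Hypothetical stand-in for the agency's central customer database.
CUSTOMERS = {
    "123456789": {
        "salt": _salt,
        "pin_hash": hash_pin("4321", _salt),
        "required": ["employer name", "hours worked", "gross earnings"],
        "next_payment": "Thursday",
    },
}


def authenticate(customer_number: str, pin: str) -> Optional[dict]:
    """Return the customer's record if the number and PIN match, else None."""
    record = CUSTOMERS.get(customer_number)
    if record is None:
        return None
    candidate = hash_pin(pin, record["salt"])
    # compare_digest's running time does not depend on where the hashes differ.
    if not hmac.compare_digest(candidate, record["pin_hash"]):
        return None
    return record


def prompts_for(record: dict) -> list:
    """Build the sequence of prompts the caller hears once identified."""
    prompts = [f"Please tell me your {item}." for item in record["required"]]
    prompts.append(f"Your next payment will be made on {record['next_payment']}.")
    return prompts


record = authenticate("123456789", "4321")
if record is not None:
    for line in prompts_for(record):
        print(line)
```

Because the prompts are generated from whatever the database says is required, the same dialogue adapts to each caller without a separate script per case.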
Following the success of the project, the agency is now looking at ways to introduce speech technology into other areas of its operations.
Other popular applications of speech recognition technology in the corporate world include the reading back of personal information, voice-controlled message bank services and phone directories.
Some vendors predict an increasing number of companies will do away with switchboard operators, relying on voice systems to direct incoming callers to the appropriate department or staff member.
Nortel's Luxford says this concept can be taken even further with the creation of a personal voice portal. "Carriers around the world are looking for new applications to drive revenues and this is a good example of what they could offer customers," he says.
A personalised voice portal is a single number a subscriber can call, regardless of their location, to access a diverse range of services and information. For example, you could dial the number and say "call my boss". The portal would then dial a pre-programmed phone number, or perhaps ask "office or mobile phone?".
"You could call and say 'send flowers to my wife' and the portal would use details already on file to order, arrange delivery and make payment. These are the kind of value-added services you will see being offered."
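The "call my boss" exchange can be sketched as a lookup in the subscriber's stored contacts: dial straight away when only one number is on file, otherwise ask which one. The Python below assumes the request has already been recognised as text. The contact book, phone numbers and function name are hypothetical.

```python
# Hypothetical contact book held by the portal for one subscriber.
CONTACTS = {
    "boss": {"office": "02 5550 1234", "mobile": "0491 570 156"},
    "wife": {"mobile": "0491 570 157"},
}


def handle_call_request(utterance: str) -> str:
    """Turn 'call my <contact>' into a dial action or a follow-up question."""
    words = utterance.lower().strip(" .?!").split()
    if len(words) < 3 or words[:2] != ["call", "my"]:
        return "Sorry, who would you like to call?"
    contact = " ".join(words[2:])
    numbers = CONTACTS.get(contact)
    if numbers is None:
        return f"I don't have a number for your {contact}."
    if len(numbers) == 1:
        (only_number,) = numbers.values()
        return f"Dialling {only_number}"
    # Several numbers on file: ask which one, e.g. "office or mobile phone?"
    return " or ".join(numbers) + " phone?"


print(handle_call_request("call my boss"))
print(handle_call_request("Call my wife."))
```

The follow-up question only appears when the stored details are ambiguous, which keeps the common case to a single spoken command.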
Another emerging area is the concept of authentication through voice prints. Recognition software analyses the frequencies of a caller's voice and creates an encrypted voice print.
When the person calls again, their voice is compared to the print on file to determine whether they are one and the same.
ScanSoft's Chidiac says the result is a very reliable and accurate way of identifying people using just their voice.
"Banks are very interested in this technology as an additional way of protecting personal information while making access easier," he says. "You could rely on this solely, but typically they will use it in conjunction with another item, such as an account number.
"As this technology matures, in the future people will just start speaking and the system will recognise them and authorise them."
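The enrol-then-compare idea behind voice prints can be illustrated with a deliberately crude Python sketch: the "print" is the average magnitude spectrum of a recording, and a later call is accepted if its spectrum is close enough, by cosine similarity, to the enrolled one. This is not how ScanSoft's or any commercial engine works. Real speaker verification uses far richer acoustic features and statistical models, and the encryption of the stored print is not shown. The frame size, threshold and function names are assumptions for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame: list) -> list:
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def voice_print(samples: list, frame_size: int = 64) -> list:
    """Average the magnitude spectrum over consecutive non-overlapping frames."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    frames = [samples[s:s + frame_size] for s in starts]
    if not frames:
        raise ValueError("recording shorter than one frame")
    profile = [0.0] * (frame_size // 2)
    for frame in frames:
        for k, magnitude in enumerate(magnitude_spectrum(frame)):
            profile[k] += magnitude / len(frames)
    return profile


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_speaker(enrolled: list, candidate: list, threshold: float = 0.9) -> bool:
    """Accept the caller if the new print is close enough to the enrolled one."""
    return cosine_similarity(enrolled, candidate) >= threshold
```

Cosine similarity ignores overall loudness, so a caller speaking more softly on a later call is not rejected for that reason alone; the threshold trades false acceptances against false rejections.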
As more companies consider using speech recognition technology, Telstra has spotted a market for outsourced services.
Telstra's managing director for voice solutions, Louis Dupe, says the carrier is offering a range of services designed to remove much of the complexity for companies keen to use the technology.
"This technology has come of age and we see a growing demand from our clients," he says.
Telstra offers customers a choice of a fully hosted service or the ability to select specific applications that Telstra will manage at a customer premises.
Improved accuracy and reliability and reduced implementation costs are making speech recognition technology an option for an increasing number of organisations. The days of waiting impatiently in a queue to speak with a call centre agent may soon be gone for good.