Phone the Web

Overview | Phone the Web resource library | People

Papers and Articles


Speech Interfaces - Natural language, conversational interfaces, voice recognition.

Speech Recognition

Interactive Voice Response (IVR) Systems - Interactive applications for the telephone.

VoiceXML - Internet markup language for creating voice applications with speech recognition.

Voice Portals - Telephone access to the internet.

SMS Text Messaging - Short text messages over the GSM mobile telephone network.

3rd Generation (3G) Mobile Telephones - Next generation phones with a fast data connection.


Information Architecture - Organizing information, web site structure, knowledge systems.

WAP - Low bandwidth internet service for mobile phones.

Speech Interfaces
Conversational Interfaces

Lai, Jennifer
Discusses speech-recognition technology and its impact on labor productivity. Promise of text-to-speech and speech-synthesis technologies in transforming computers into virtual assistants; Use of speech technology in a variety of applications; Inefficiency of speech alone to fulfill the function of traditional input modalities.
My Voice Is Your Command.

Discusses the developments in voice technology and its implications for marketing and sales executives. Benefits of the technology; Obstacle to voice recognition's growing ubiquity.
Natural language dialogue for personalized interaction

Wlodek Zadrozny, M. Budzikowska, J. Chai, N. Kambhatla, S. Levesque and N. Nicolov
Technologies that successfully recognize and react to spoken or typed words are key to true personalization. Front- and back-end systems must respond in accord, and one solution may be found somewhere in the middle(ware).
Natural Language Technology in Precision Content Retrieval

Jacek Ambroziak, William A. Woods
This paper describes a new approach to information access that combines techniques from natural language processing and knowledge representation with a new technique for relevance estimation and passage retrieval. Unlike many attempts to combine natural language processing with information retrieval, these results show significant benefit from using linguistic knowledge. Subsumption technology is used to automatically integrate syntactic, semantic, and morphological relationships among concepts that occur in the material, and to organize them into a structured conceptual taxonomy that is efficiently usable by retrieval algorithms and also effective for browsing.
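A subsumption taxonomy helps retrieval because a query for a general concept can also find passages about the more specific concepts it subsumes. The toy sketch below illustrates this by indexing each passage under its own concepts and under every concept that subsumes them. The taxonomy, passages and function names are hypothetical, and the taxonomy is written by hand. The paper's system builds its taxonomy automatically from syntactic, semantic and morphological relationships, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical hand-written taxonomy: concept -> list of more general concepts.
PARENTS = {
    "car washing": ["automobile cleaning"],
    "automobile cleaning": ["cleaning"],
    "window cleaning": ["cleaning"],
}

def generalizations(concept):
    """Return the concept plus every concept that subsumes it, transitively."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def build_index(passages):
    """Index each passage under its own concepts and all their generalizations."""
    index = defaultdict(set)
    for passage_id, concepts in passages.items():
        for concept in concepts:
            for general in generalizations(concept):
                index[general].add(passage_id)
    return index

passages = {
    "p1": ["car washing"],
    "p2": ["window cleaning"],
    "p3": ["engine repair"],
}
index = build_index(passages)
print(sorted(index["cleaning"]))             # ['p1', 'p2']
print(sorted(index["automobile cleaning"]))  # ['p1']
```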
Natural Spoken Dialogue Systems for Telephony Applications

Boyce, Susan J
Examines the use of natural spoken dialogue systems for telephony applications. Design of natural-language devices to carry out specific tasks; Ability of these devices to handle breakdowns in recognition; General components of a spoken natural dialogue system; Comparison between these systems and other alternatives for telephony applications.
On natural language call routing

Lee, Chin-Hui; Carpenter, Bob; Chou, Wu; Chu-Carroll, Jennifer; Reichl, Wolfgang; Saad, Antoine; Zhou, Qiru
Automated call routing is the process of associating a user's request with the desired destination. Although some of the call routing functions can often be accomplished through the use of a touch-tone menu in an interactive voice response system, the interaction between the user and such a system is typically very limited. It is therefore desirable to have a call routing system that takes natural language spoken inputs from the user and asks for additional information to complete the user's request as a human agent would. In this paper we present a recent study on natural language call routing and discuss the capabilities and limitations of current technologies.
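Vector-based routing is one common way to build a router like the one described above. Each destination is represented by a term vector built from example requests, and the caller's utterance is sent to the destination whose vector is most similar. The sketch below assumes that approach and is not presented as the paper's system. The destinations, example requests and the `margin` threshold are hypothetical. When the two best scores are too close, the router returns `None`, which stands in for the system asking a follow-up question as a human agent would.

```python
import math
import re
from collections import Counter

# Hypothetical destinations and example caller requests used to train the router.
EXAMPLES = {
    "billing": ["question about my bill", "I was charged twice on my bill"],
    "repairs": ["my line is dead", "there is noise on my line"],
    "sales": ["I want to order a new phone line", "sign up for a new service"],
}

def vectorize(text):
    """Bag-of-words term counts for lowercased text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Each destination's vector is the sum of its example vectors.
ROUTES = {dest: sum((vectorize(e) for e in examples), Counter())
          for dest, examples in EXAMPLES.items()}

def route(utterance, margin=0.1):
    """Return the best destination, or None when the caller should be asked to clarify."""
    v = vectorize(utterance)
    scores = sorted(((cosine(v, vec), dest) for dest, vec in ROUTES.items()), reverse=True)
    (best_score, best), (second_score, _) = scores[0], scores[1]
    if best_score == 0 or best_score - second_score < margin:
        return None
    return best

print(route("I have a question about my bill"))  # billing
print(route("my line is dead"))                   # repairs
print(route("hello"))                             # None
```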
Speech Interfaces from an Evolutionary Perspective

Clifford Nass, Li Gong
How does the human brain react when confronted by a talking computer? Answers from psychological research and its design implications help define the limits of what computers should say and how they might say it.
Users' Conceptions of Voice-Operated Information Services

Weegels, M.F.
When users interact with a voice-operated service, they bring along their habits as well as their expectations from experience with human-human dialogues, with the domain, and with other systems and services. In addition, users' expectations are further shaped while using a system. The present study explores the extent to which user-system interaction, and in particular difficulties in the interaction, are affected by users' expectations and (mis)conceptions of the service, and how these expectations evolve during use. In an exploratory study, twenty subjects queried two different train travel information services. A semi-structured interview was held on subjects' dialogues with the systems, by replaying the recordings together with the subjects. In interacting with voice-operated services, users appear to draw from various sources of experience. Users' misconceptions and misunderstandings of the system lead to various problems in interaction, such as undesired travel suggestions and irritation. The implications for the design of voice-operated services are discussed.
Speech Recognition
(Un)Naturally Speaking.

Reviews the Dragon NaturallySpeaking Preferred v.5 voice-recognition computer software by Dragon Systems. Key features; Pros and cons; Cost; Recommendation.
Error Detection in Spoken Human-Machine Interaction

Krahmer, E.; Swerts, M.; Theune, M.; Weegels, M.
Given the state of the art of current language and speech technology, errors are unavoidable in present-day spoken dialogue systems. Therefore, one of the main concerns in dialogue design is how to decide whether or not the system has understood the user correctly. In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. The goals of the present study are therefore twofold: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions and how informative these signals are, and (ii) to explore the possibilities of spotting errors automatically and on-line. To reach these goals, we first perform a descriptive analysis, followed by experiments with memory-based machine learning techniques. It appears that people systematically use negative/marked cues when there are communication problems. The experiments using memory-based machine learning techniques suggest that it may be possible to spot errors automatically and on-line with high accuracy, in particular when focussing on combinations of cues. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.
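Memory-based learning classifies a new case by comparing it with stored training examples. The minimal k-nearest-neighbour sketch below spots errors from cues in the user's reply to a verification question. It assumes hypothetical binary cue features and made-up training data. The feature set, the data and `k` are illustrative only and do not reproduce the study's features or results.

```python
from collections import Counter

# Hypothetical cues in the user's reply to a verification question:
# (contains "no", repeats or corrects information, unusually long, marked word order)
TRAINING = [
    ((1, 1, 1, 0), "problem"),
    ((1, 1, 0, 1), "problem"),
    ((0, 1, 1, 1), "problem"),
    ((0, 0, 0, 0), "ok"),
    ((0, 0, 1, 0), "ok"),
    ((1, 0, 0, 0), "ok"),
]

def overlap_distance(a, b):
    """Count the features on which two cue vectors differ (the overlap metric)."""
    return sum(x != y for x, y in zip(a, b))

def classify(cues, k=3):
    """Label a new turn by majority vote among its k most similar stored examples."""
    neighbours = sorted(TRAINING, key=lambda ex: overlap_distance(cues, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(classify((1, 1, 1, 1)))  # problem
print(classify((0, 0, 0, 0)))  # ok
```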
Improving Speech Recognition Accuracy for Small Vocabulary Applications in Adverse Environments

Thambiratnam, D.; Sridharan, S.
The problem of improving the accuracy of small vocabulary isolated word speaker dependent speech recognition under adverse conditions such as factory environments is considered. A new approach to solving this problem, by using Output Probability Distributions (OPDs), is presented. OPDs improve the system performance by modelling inter-word relationships, something that a standard maximum likelihood (ML) technique fails to do. The system was tested using the TI46 database, corrupted with the NOISEX-92 database, as well as in a real-world factory environment, and achieved good results.
Multilinguality in voice activated information services: The P502 EURESCOM project

Azevedo, J.; Beires, N.; Charpentier, F.; Farrell, M.; Johnston, D.; LeFlour, E.; Micca, G.; Militello, S.; Schroeder, K.
The paper describes the multilingual system developed within the framework of the P502 EURESCOM project. The system described provides information about major telephone services available in the UK, Germany, France, Italy and Portugal in five languages. We present the results of a number of experiments carried out in the five countries, aiming to answer some fundamental questions concerned with the exploitation of a multilingual service. Both technological and interface design issues have been investigated and several alternatives have been tested. We compared speech recognition accuracy and successful transaction completion rates of GSM and PSTN networks, and evaluated cross-country and cross-language effects. Using a new methodological approach to assessment, a powerful predictive model was developed. This model allowed users' subjective ratings to be predicted from objective measurements. The results showed that an average Transaction Success rate of more than 92% was obtained when speech recognizers exhibiting good Word Recognition Accuracy were coupled to suitable dialogue interfaces in the IVR system.(Author abstract)
Taming Recognition Errors with a Multimodal Interface

Sharon Oviatt
More modes are better than one when it comes to comprehending human speech, especially when speakers are accented or interacting in noisy natural environments.
The Limits of Speech Recognition

Ben Shneiderman
To improve speech recognition applications, designers must understand acoustic memory and prosody.
Interactive Voice Response (IVR) Systems
A Computer? Funny, You Don't Sound Like One.

Discusses voice technology and its use in automated-response telephone systems. Replacement of human customer service representatives with computers; Development of voice recognition software by Christopher Kotelly, an expert in the field of computerised voice systems; Speculation on the future of voice systems.
Voice Power!

Comments on the voice-recognition add-on service for wireless telephones from Qwest Communications International Wireless. Description of the technology; Topics provided by the browser; Features the service lacks.
VoiceXML
Four Technologies That Will Shape the Net.

Presents an outlook for the impact of new technologies on the Internet industry as of October 2000. Integration of voice-recognition technology with internet browsers; Details of the Bluetooth technology standard for short-range wireless communication; Advantages of peer-to-peer file sharing technology for Internet users; Effects of extensible markup language (XML) on the Internet's versatility. AN: 3603459
I dream of Genie.

Focuses on VoiceGenie Technologies Inc., a computer software company, and its work with the VoiceXML programming language. Hope of VoiceGenie that voice portals will allow people to access information from the Internet using their telephones; Expected market for voice-enabling services in the United States; Details of how voice portals work: speech-recognition software translates the caller's voice, and the result is transmitted over the Internet to access data.
Lucent deploys VoiceXML.

Reports the launching of Speech Server of Lucent Technologies Inc. Capabilities of the server; Companies that support the VoiceXML specification.
PCs Get Ready To Speak--And Listen.

Offers ideas about text-to-speech, natural-language processing and the future of keyboard- and mouse-free computing. Discussion on computers that can listen to spoken commands and communicate through natural-sounding synthetic human speech; Contribution of telephony applications to the use of synthetic speech; Background of VoiceXML technology.
Speech-Enabled Services Using TelePortal Software and VoiceXML.

Ball, Thomas; Bonnewell, Veta; Danielsen, Peter; Mataga, Peter; Rehor, Kenneth
TelePortal(TM) software, which resides on a speech-enabled telephony platform, brings the advantages of the World Wide Web to advanced speech recognition telephone services. In response to an incoming call, this software retrieves a dialogue specification document from a Web server, interprets it to collect input from a caller, and submits the input to a (possibly different) Web server, which processes the input and may continue the call by returning another dialogue specification document. The TelePortal architecture includes a browser (to retrieve and cache Web content), a set of interpreters (to process documents), and a set of platform interfaces (to allow the interpreters to control the speech and telephony resources of the host platform). Using the Web to retrieve dialogue documents and to process the input they collect creates a new business opportunity for network operators and third-party application developers. Interactive voice response (IVR) services, which may be made available from a standard wireline or wireless telephone, are easily programmed using the emerging Voice Extensible Markup Language (VoiceXML(*)) standard. TelePortal software is being integrated into several Lucent platforms. We present examples of the new network IVR opportunities that this software provides for one of these--the platform of the intelligent network. [ABSTRACT FROM AUTHOR]
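The call flow described above can be sketched as a small interpreter loop: fetch a dialogue document, collect the caller's input, and submit it to a server that may return the next document. This is a hypothetical illustration, not TelePortal itself. The documents use a simplified VoiceXML-like subset: `form`, `field`, `prompt`, `block`, and `submit` with `next` and `namelist`, without the VoiceXML namespace. The `web_server` function and the example.com URLs stand in for real HTTP requests. The `ask` callback stands in for speech recognition, and the server's reply is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical first dialogue document, simplified VoiceXML-like markup (no namespace).
START_DOCUMENT = """<vxml version="2.0">
  <form>
    <field name="city"><prompt>Which city?</prompt></field>
    <block><submit next="http://example.com/weather" namelist="city"/></block>
  </form>
</vxml>"""

def web_server(url, params):
    """Stand-in for an HTTP fetch or submit; returns the next dialogue document."""
    if url == "http://example.com/start.vxml":
        return START_DOCUMENT
    if url == "http://example.com/weather":
        # A real server would look the forecast up; this reply is a placeholder.
        return ('<vxml version="2.0"><form><block><prompt>'
                f'Forecast requested for {params["city"]}.'
                '</prompt></block></form></vxml>')
    raise ValueError(f"unknown URL: {url}")

def run_call(start_url, ask, say):
    """Interpret dialogue documents until one finishes without a <submit>."""
    url, params = start_url, {}
    while url is not None:
        form = ET.fromstring(web_server(url, params)).find("form")
        values, url = {}, None
        for item in form:
            if item.tag == "field":
                # Play the field's prompt and store the recognized answer.
                values[item.get("name")] = ask(item.findtext("prompt", default=""))
            elif item.tag == "block":
                for action in item:
                    if action.tag == "prompt":
                        say(action.text)
                    elif action.tag == "submit":
                        # Send only the variables listed in namelist to the next URL.
                        url = action.get("next")
                        params = {name: values[name] for name in action.get("namelist", "").split()}

transcript = []

def fake_caller(question):
    """Stands in for speech recognition: always answers 'Paris'."""
    transcript.append(question)
    return "Paris"

run_call("http://example.com/start.vxml", ask=fake_caller, say=transcript.append)
print(transcript)  # ['Which city?', 'Forecast requested for Paris.']
```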
The power of voice

Orubeondo, Ana
Deals with the emerging Voice Extensible Markup Language (VoiceXML) technologies for Internet applications in businesses. Driving force behind the technology; VoiceXML virtues; Peculiarities of the technology.
The XML Revolution

XML by itself is just a simple text format; but together with all the ways it's being used to share structured information, it's a revolution that promises to make the Web a whole lot smarter.
VoiceXML for Web-based distributed conversational applications

Bruce Lucas
VoiceXML replaces the familiar HTML interpreter (Web browser) with a VoiceXML interpreter and the mouse and keyboard with the human voice.
Wireless Voice Interface Nears.

Reports that third-generation telecommunications technology, which will enable users to access graphics and text from wireless devices by voice, is gaining popularity in the United States. Plans of several carriers to offer said service before the end of 2001; Participation of VoiceXML Forum in wireless voice-prompted Web access.
XML, Java, and the future of the Web

The extraordinary growth of the World Wide Web has been fueled by the ability it gives authors to easily and cheaply distribute electronic documents to an international audience. As Web documents have become larger and more complex, however, Web content providers have begun to experience the limitations of a medium that does not provide the extensibility, structure, and data checking needed for large-scale commercial publishing. The ability of Java applets to embed powerful data manipulation capabilities in Web clients makes even clearer the limitations of current methods for the transmittal of document data.
Voice Portals
European wireless portal use to boom

There will be about 16.7 million wireless portal users in Europe's top 15 markets by the end of 2000, according to a new study released by The Strategis Group.
Mobilising the Internet

A November 1999 research report from Cyber Dialogue, an Internet database marketing firm, warned e-commerce companies that they were going to have to work harder in the future: the stampede onto the Internet has slowed in the U.S. The survey cites three constraints to growth. First, it takes money to get connected, and many of those off-line simply can't afford Internet access. Second, a third of American adults believe that they have no need for the Internet and have no intention of getting on-line. Third, 27.7 million Americans have tried the Internet--and dropped it; the number is triple that measured in 1997.
Phone-based Web Services

Fifty percent of U.S. households still don't have computers, but they all have telephones. And some intriguing new voice-based services are set to make those phones--as well as countless millions of cell phones--useful in new ways.
Services via mobility portals

Ralph, D.; Shephard, C. G
This paper examines the importance of mobility portal services and the technologies that will be essential in delivering content over next generation network technologies. A discussion of some examples of the different mobile portals currently available highlights their limitations and sets the direction for future services. Finally, proposed developments are presented which will extend the functionality available at the mobile portal through the improvements in terminal capability and the evolution of protocols for delivery of content over a mobile network.
Sisl: Several Interfaces, Single Logic

Ball, Thomas; Colby, Christopher; Danielsen, Peter; Jagadeesan, Lalita Jategaonkar; Jagadeesan, Radha; Läufer, Konstantin; Mataga, Peter; Rehor, Kenneth
Modern interactive services such as information and e-commerce services are becoming increasingly more flexible in the types of user interfaces they support. These interfaces incorporate automatic speech recognition and natural language understanding and include graphical user interfaces on the desktop and web-based interfaces using applets and HTML forms. To what extent can the user interface software be decoupled from the service logic software (the code that defines the essential function of a service)? Decoupling of user interface from service logic directly impacts the flexibility of services, or how easy they are to modify and extend. To explore these issues, we have developed Sisl, an architecture and domain-specific language for designing and implementing interactive services with multiple user interfaces. A key principle underlying Sisl is that all user interfaces to a service share the same service logic. Sisl provides a clean separation between the service logic and the software for a variety of interfaces, including Java applets, HTML pages, speech-based natural language dialogue, and telephone-based voice access. Sisl uses an event-based model of services that allows service providers to support interchangeable user interfaces (or add new ones) to a single consistent source of service logic and data. As part of a collaboration between research and development, Sisl is being used to prototype a new generation of call processing services for a Lucent Technologies switching product.
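Sisl's key principle is that every user interface drives the same service logic, and this can be sketched in a few lines. In this hypothetical example, the class, event and function names are invented for illustration and are not Sisl's API. The service logic only consumes named events and reports which ones it still needs. A form-style interface supplies all inputs at once, while a dialogue-style interface prompts for one pending input at a time.

```python
from dataclasses import dataclass, field

@dataclass
class TransferLogic:
    """Hypothetical service logic for a funds-transfer service. It knows nothing
    about user interfaces: it consumes named events and reports what it still needs."""
    required: tuple = ("from_account", "to_account", "amount")
    received: dict = field(default_factory=dict)

    def handle(self, event, value):
        if event not in self.required:
            raise ValueError(f"unexpected event: {event}")
        self.received[event] = value

    def pending(self):
        return [e for e in self.required if e not in self.received]

    def result(self):
        if self.pending():
            return None
        r = self.received
        return f"Transferred {r['amount']} from {r['from_account']} to {r['to_account']}"

def form_interface(logic, form_data):
    """Web-form style interface: all inputs arrive together, in any order."""
    for name, value in form_data.items():
        logic.handle(name, value)
    return logic.result()

def dialogue_interface(logic, answer):
    """Dialogue style interface: prompts for one pending input at a time.
    `answer` stands in for recognizing the caller's spoken reply."""
    while logic.pending():
        event = logic.pending()[0]
        logic.handle(event, answer(f"Please say the {event.replace('_', ' ')}."))
    return logic.result()

replies = {
    "Please say the from account.": "savings",
    "Please say the to account.": "checking",
    "Please say the amount.": "50",
}
print(form_interface(TransferLogic(), {"amount": "50", "from_account": "savings", "to_account": "checking"}))
print(dialogue_interface(TransferLogic(), replies.get))
# Both print: Transferred 50 from savings to checking
```

To add another interface, you write another driver for `TransferLogic` and leave the logic itself unchanged. This is the kind of decoupling the abstract argues for.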
UPnP, Jini, Salutation

With the world of specialized information appliances poised to take over the technology landscape in the coming years, coordination between devices has become a serious research issue. A number of architectures addressing mobile and specialized devices have emerged recently. These architectures are essentially coordination frameworks that propose certain ways and means of device interaction with the ultimate aim of simple, seamless and scalable device interoperability.
Voice Portals and VoiceXML, Part 1

Voice Portals and Voice Extensible Markup Language (VoiceXML) are valuable technologies that will soon have a big impact on the wireless industry. Voice Portals are advanced, voice-activated, natural-language interfaces that permit voice access to Internet-hosted content in much the same way that Web browsers and Wireless Application Protocol (WAP) microbrowsers do.
Voice portals: Ready for Prime Time

Voice portals are in their infancy, but when they're at their best, they allow you to navigate sites by voice alone, using both voice-recognition and voice-synthesis techniques. At their worst, they're frustrating and can have comical foibles. For instance, one site interpreted a cough as a request for the weather in Beirut.

Discusses advantages and disadvantages of web telephones. Ability to receive electronic mail and access the web from the phones; Number of people using the phones in the United States; Difficulty of writing e-mails with the phone; Expectation that voice-recognition technology will improve the phones.

Discusses possible changes in the telephone resulting from bringing voice-recognition technology, mobile telephony and the Internet together. Goals of Mike McCue, Chief Executive Officer (CEO) of Tellme Networks, including bundling many basic services into telephone use; Problems with voice portals, which are unable to work as well as many Web sites; Improvement in voice-recognition technology as of November, 2000.
SMS Text Messaging
Automated stock price delivery system based on the GSM short message service

Friel, Dermot; Kilmartin, Liam
The development of value-added services based upon the GSM standard is becoming increasingly important to both network operators and the subscribers to such networks. This paper outlines a system capable of delivering 'real time' stock price information (e.g. price, volume etc.) from Internet based stock price servers to a variety of subscriber types. The main delivery mechanism for this information is the GSM short messaging service (SMS) but the system is also equipped with an interactive voice response (IVR) system in order to support an additional automated speech prompt based delivery mechanism. The system also supports the use of GSM equipped Windows CE based palmtop computers and personal digital assistants, as subscriber terminals for the system, by means of a suite of software for accessing and managing the stock price information delivered to these terminals by the GSM SMS delivery mechanism.
Global system for mobile communications short message service

Peersman, Guillaume; Cvetkovic, Srba; Griffiths, Paul; Spear, Hugh
This tutorial presents an overview of the Global System for Mobile Communications Short Message Service from the viewpoint of implementing new telematic services. The SMS offers the users of GSM networks the ability to exchange alphanumeric messages up to the limit of 160 characters. The tutorial is motivated by an acute absence of research publications in this field. The information gathered in the tutorial was required considering the increasing potential SMS offers for integration with existing messaging services and its ability to offer a successful replacement for the transmission control and Internet protocols as far as low-bandwidth-demanding applications are concerned. Initially, the tutorial gives an overview of the building blocks of GSM networks (the mobile station, base station, and network subsystem) and then emphasizes the SMS network and protocol architecture. The most widely used protocols for message submission are then introduced (text-based, SMS2000, ETSI 0705, TAP) and compared in terms of features provided and flexibility to handle extended alphabets or two-way messaging. Finally the tutorial outlines a summary of current and future issues for further development and research in the light of novel features for submission protocols and telematic services.
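The 160-character limit above applies to a single message in the GSM 7-bit default alphabet. Longer texts are commonly sent as concatenated SMS. In that case a user-data header takes up room in each part, which leaves 153 characters per part. This figure comes from the GSM/3GPP SMS specification, not from the tutorial. The minimal splitting sketch below assumes every character is in the basic 7-bit alphabet. It ignores two complications: extension-table characters such as `€` count as two, and non-GSM characters force 16-bit UCS-2 encoding with lower limits.

```python
SINGLE_LIMIT = 160   # characters (7-bit septets) in one SMS
SEGMENT_LIMIT = 153  # characters per part once a concatenation header is added

def split_sms(text):
    """Split text into SMS-sized parts, assuming only basic GSM 7-bit characters."""
    if len(text) <= SINGLE_LIMIT:
        return [text]
    return [text[i:i + SEGMENT_LIMIT] for i in range(0, len(text), SEGMENT_LIMIT)]

parts = split_sms("x" * 400)
print([len(p) for p in parts])  # [153, 153, 94]
```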
Integration of SMS with voice based technology

Peersman, G.; Cvetkovic, S. R.; Smythe, C.; Spear, H.; Griffiths, P.
This paper presents one of the major aspects of a pan-European project called GAIA (Generic Architecture for Information Availability). The aim of GAIA is to set up an electronic commerce demonstrator based on a supplier independent architecture, operating in a non-monopolistic supply chain, and capable of supporting the search, retrieval and request for delivery of information, goods and services, as well as the delivery of information in the digital domain. In addition, the demonstrator will address the all-electronic and secure management of payment and subsequent royalty distribution in the following three sectors: music, publishing and technical data. This paper presents the achievements of the early trials undertaken as part of the project focused specifically on the role of GSM (Global System for Mobile Communications) and SMS (the GSM Short Message Service) as the underlying technology for GAIA compliant requests and delivery notification. This study examines ways in which SMS could be used as a transport medium to underpin the information gateways, as GAIA's contribution towards the development of the universal mailbox concept.
iSMS: An integration platform for short message service and IP networks

Rao, H. C. -H.; Chang, D. -F.; Lin, Y. -B
This article describes iSMS, a platform that integrates IP networks with the Short Message Service in mobile telephone systems. iSMS provides a generic gateway for creating and hosting wireless data services for mobile stations. Our approach does not require any modification to the mobile telephone system architecture. The iSMS system can be quickly developed and operated by a third party or end user without involvement of mobile equipment manufacturers and telecom operators. Based on the iSMS platform, we illustrate services such as e-mail delivery/forwarding, Web access (e.g., stock and train schedule query) and handset music services. The iSMS platform and the services have been implemented for GSM networks. With iSMS, users are able to use standard GSM handsets to access wireless Internet services, while other approaches like Wireless Application Protocol and SIM Toolkit services require function-enabled MSs.
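An iSMS-style gateway maps the text of an incoming short message to a service on the IP side, then sends the result back as a short message. The sketch below assumes a hypothetical command syntax, a keyword followed by arguments, and uses hypothetical service names. The handlers are placeholders for the e-mail, Web-query and other services the article describes. The real iSMS interfaces are not shown.

```python
MAX_SMS = 160  # reply length limit for a single short message

def make_gateway(services):
    """Build a handler mapping an incoming SMS body to a reply SMS body.
    `services` maps a keyword to a callable that takes the remaining words."""
    def handle(sms_body):
        words = sms_body.split()
        if not words:
            return "Empty request. Send HELP for a list of services."
        keyword, args = words[0].upper(), words[1:]
        if keyword == "HELP":
            return "Services: " + ", ".join(sorted(services))
        service = services.get(keyword)
        if service is None:
            return f"Unknown service {keyword}. Send HELP."
        return service(args)[:MAX_SMS]  # truncate so the reply fits one message
    return handle

def stock_placeholder(args):
    # Hypothetical: a real gateway would query an Internet stock-price server here.
    if not args:
        return "Usage: STOCK <symbol>"
    return f"{args[0].upper()}: quote lookup not implemented in this sketch"

gateway = make_gateway({"ECHO": lambda args: " ".join(args), "STOCK": stock_placeholder})
print(gateway("help"))            # Services: ECHO, STOCK
print(gateway("echo hello GSM"))  # hello GSM
print(gateway("weather"))         # Unknown service WEATHER. Send HELP.
```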
WAP
ZDNet article on WAP

The WAP Forum said yesterday that it has solved the security problems of the Wireless Application Protocol with the release of its next-generation specifications.
3rd Generation (3G) Mobile Telephones
3G products-what will the technology enable?

Harmer, J. A.; Friel, C. D
Following the phenomenal success of second generation global system for mobile communications (GSM) systems, the world has turned its attention to third generation mobile systems (3G). New radio spectrum has been allocated for these networks and over the last 12 months there has been global activity to license this spectrum. Licences have been awarded in a number of ways including 'beauty contests' and (often costly) auctions. Technologists have joined forces to specify the standards for 3G. This paper describes the drivers for 3G and the commercial model that is emerging. The benefits that 3G technology will provide for business and consumer products are identified. A component-based approach to application and product development is described, based on the important value-added features of mobile systems. Finally, mobile terminals, billing and payment, and customer care are also considered as they are vital to the overall customer experience.
Technology
Communications Chameleons

Multipurpose communications systems will be the links of tomorrow's wireless computer networks
Sun 'Cunning device'

New appliances for accessing the Web are emerging every day, but service providers and computer companies must think creatively if they want to make them fly
Information Architecture
Conventions for Knowledge Representation via RDF

The Resource Description Framework [RDF] provides a basic model to describe relationships between objects. Ultimately, it is intended to permit the representation, combination and processing of most kinds of metadata from Web-accessible documents or databases. However, except for representing simple metadata, its current XML-based syntax [RDF syntax] and the set of basic classes that have been defined [RDF schema] are insufficient. To make extensions, the users are required to declare new classes in schemas or import schemas from other users. The problem is that similar/identical classes or features will probably be introduced by various users via different names or used in different ways, and this prevents the comparison, reuse and combination of the metadata. To maximize the reuse of metadata, we propose some lexical, structural and semantic conventions, inspired by various knowledge representation projects. These conventions would have to be agreed on and completed by the W3C committee.
DAML Rules

A semantic markup language for web resources that builds on XML and RDF.
Knowledge-Based Access to Databases

Semantic data models for database systems provide powerful tools to assist database administrators in designing and maintaining schemas, but provide little or no direct support for users of the database. Some research has been done on mapping user models of a domain to the underlying database using semantic schemas. Little has been done, however, on mapping conceptually meaningful data structures to a database lacking a semantic schema, or to a multi-database system that lacks a consistent semantic schema. We argue for the appropriateness of a knowledge representation language for describing the database schema, user data structures, and the mapping between them; present a problem domain in which an existing relational database without a semantic schema must be accessed by a knowledge-based application; and describe our implementation of a system that provides access to a relational database from a KL/ONE-style knowledge representation language. With this background, we highlight recently-added capabilities of the implementation, and provide detailed examples.
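The general idea is to answer queries about conceptually meaningful structures from a plain relational database. One way to sketch this is to compile each concept into SQL. In this hypothetical example, each concept names a more general parent and adds column restrictions, loosely in the spirit of KL-ONE-style definitions. The concept names, the `staff` table and its columns are invented, and the sketch is far simpler than the implementation the paper describes. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical concept definitions: each concept names a more general parent
# and adds (column, operator, value) restrictions; the root names the table.
CONCEPTS = {
    "Employee": {"table": "staff", "restrictions": []},
    "Manager": {"parent": "Employee", "restrictions": [("role", "=", "manager")]},
    "SeniorManager": {"parent": "Manager", "restrictions": [("years", ">=", 10)]},
}

def compile_concept(name):
    """Translate a concept into SQL, inheriting every ancestor's restrictions."""
    restrictions = []
    while True:
        spec = CONCEPTS[name]
        restrictions = spec["restrictions"] + restrictions
        if "parent" not in spec:
            table = spec["table"]
            break
        name = spec["parent"]
    # Table, column and operator names come from the trusted mapping above;
    # values are passed separately as bound parameters.
    sql = f"SELECT name FROM {table}"
    if restrictions:
        sql += " WHERE " + " AND ".join(f"{col} {op} ?" for col, op, _ in restrictions)
    return sql, [value for _, _, value in restrictions]

def instances(conn, concept):
    """Return the sorted names of all rows that satisfy the concept."""
    sql, params = compile_concept(concept)
    return sorted(row[0] for row in conn.execute(sql, params))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, role TEXT, years INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [("Ada", "manager", 12), ("Ben", "manager", 3), ("Cy", "engineer", 15)])
print(compile_concept("SeniorManager"))
# ('SELECT name FROM staff WHERE role = ? AND years >= ?', ['manager', 10])
print(instances(conn, "SeniorManager"))  # ['Ada']
print(instances(conn, "Manager"))        # ['Ada', 'Ben']
```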

