Publications catalog - books
Chinese Spoken Language Processing: 5th International Symposium, ISCSLP 2006, Singapore, December 13-16, 2006, Proceedings
Qiang Huo ; Bin Ma ; Eng-Siong Chng ; Haizhou Li (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Data Mining and Knowledge Discovery; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-49665-6
Electronic ISBN
978-3-540-49666-3
Publisher
Springer Nature
Country of publication
China
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11939993_61
Automatic Detection of Tone Mispronunciation in Mandarin
Li Zhang; Chao Huang; Min Chu; Frank Soong; Xianda Zhang; Yudong Chen
In this paper we present our study on detecting tone mispronunciations in Mandarin. Both template and HMM approaches are investigated. Schematic templates of pitch contours are shown to be impractical because pitch range varies widely between speakers and even within a single speaker. Statistical hidden Markov models (HMMs) are used to generate a Goodness of Pronunciation (GOP) score, which is compared against an optimized threshold for detection. To deal with the discontinuity of F0 in speech, multi-space distribution (MSD) modeling is used to build the corresponding HMMs. Under an MSD-HMM framework, the detection performance of different choices of features, HMM types and GOP measures is evaluated.
- Human Language Acquisition, Development and Learning | Pp. 590-601
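The thresholded GOP detection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a generic, duration-normalized log-likelihood-ratio form of GOP; the function names, the per-tone log-likelihoods, and the threshold value are hypothetical, and the paper's MSD-HMM models and its specific GOP measures are not reproduced here.

```python
from typing import Dict


def gop_score(log_liks: Dict[str, float], target_tone: str, n_frames: int) -> float:
    """Duration-normalized log-likelihood ratio between the target tone model
    and the best-scoring tone model (0.0 when the target is itself the best)."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    best = max(log_liks.values())
    return (log_liks[target_tone] - best) / n_frames


def is_mispronounced(log_liks: Dict[str, float], target_tone: str,
                     n_frames: int, threshold: float = -0.5) -> bool:
    """Flag a syllable when its GOP score falls below the threshold.
    The default threshold is an arbitrary placeholder, not a tuned value."""
    return gop_score(log_liks, target_tone, n_frames) < threshold


# Hypothetical per-tone log-likelihoods for one syllable spanning 20 frames.
example = {"tone1": -120.0, "tone2": -100.0, "tone3": -130.0, "tone4": -125.0}
print(gop_score(example, "tone2", 20))
print(is_mispronounced(example, "tone1", 20))
```

In practice the threshold would be tuned on labeled data, which is what the abstract refers to as an optimized threshold.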
doi: 10.1007/11939993_62
Towards Automatic Tone Correction in Non-native Mandarin
Mitchell Peabody; Stephanie Seneff
Feedback is an important part of foreign language learning and of computer-aided language learning (CALL) systems. For pronunciation tutoring, one way to give feedback is to provide examples of correct speech for the student to imitate. However, this may be frustrating if a student is unable to completely match the example speech. This research advances towards providing feedback using a student’s own voice. Using the case of an American learning Mandarin Chinese, the differences between native and non-native pronunciations of Mandarin tone are highlighted, and a method for correcting tone errors is presented, which uses pitch transformation techniques to alter student tone productions while maintaining other voice characteristics.
- Human Language Acquisition, Development and Learning | Pp. 602-613
doi: 10.1007/11939993_63
A Corpus-Based Approach for Cooperative Response Generation in a Dialog System
Zhiyong Wu; Helen Meng; Hui Ning; Sam C. Tse
This paper presents a corpus-based approach for cooperative response generation in a spoken dialog system for the Hong Kong tourism domain. A corpus with 3874 requests and responses is collected using a Wizard-of-Oz framework. The corpus then undergoes a regularization process that simplifies the interactions to ease subsequent modeling. A semi-automatic process is developed to annotate each utterance in the dialog turns in terms of its key concepts (KC), task goal (TG) and dialog acts (DA). TG and DA characterize the informational goal and communicative goal of the utterance respectively. The annotation procedure is integrated with a dialog modeling heuristic and a discourse inheritance strategy to generate a semantic abstraction (SA), in the form of {TG, DA, KC}, for each user request and system response in the dialog. Semantic transitions, i.e. {TG, DA, KC}→{TG, DA, KC}, may hence be directly derived from the corpus as rules for response message planning. Related verbalization methods may also be derived from the corpus and used as templates for response message realization. All the rules and templates are stored externally in a human-readable text file, which makes the system easy to extend. Evaluation of this corpus-based approach shows that 83% of the generated responses are coherent with the user's request and that qualitative rating achieves a score of 4.0 on a five-point Likert scale.
- Spoken and Multimodal Dialog Systems | Pp. 614-626
doi: 10.1007/11939993_64
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion
Lei Xie; Helen Meng; Zhi-Qiang Liu
This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. With the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm is adopted to generate mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to objectively evaluate the proposed talking face. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
- Spoken and Multimodal Dialog Systems | Pp. 627-639
doi: 10.1007/11939993_65
The Implementation of Service Enabling with Spoken Language of a Multi-modal System Ozone
Sen Zhang; Yves Laprie
In this paper we describe the architecture and key issues of the service enabling layer of Ozone, a multi-modal system oriented toward new technologies and services for emerging nomadic societies. The main objective of the Ozone system is to offer a generic framework to enable consumer-oriented Ambient-Intelligence applications. As a large multi-modal system, Ozone consists of many functional modules, and spoken language plays an important role in making the system easy to use. Hence, we present the design principle of the system architecture, the service enabling layer, and the spoken language processing techniques used in multi-modal interaction.
- Spoken and Multimodal Dialog Systems | Pp. 640-647
doi: 10.1007/11939993_66
Spoken Correction for Chinese Text Entry
Bo-June Paul Hsu; James Glass
With an average of 17 Chinese characters per phonetic syllable, correcting conversion errors with current phonetic input method editors (IMEs) is often painstaking and time-consuming. We explore the application of spoken character description as a correction interface for Chinese text entry, in part motivated by the common practice of describing Chinese characters in names for self-introductions. In this work, we analyze typical character descriptions, extend a commercial IME with a spoken correction interface, and evaluate the resulting system in a user study. Preliminary results suggest that although correcting IME conversion errors with spoken character descriptions may not be more effective than traditional techniques for everyone, nearly all users see the potential benefit of such a system and would recommend it to friends.
- Spoken and Multimodal Dialog Systems | Pp. 648-659
doi: 10.1007/11939993_67
Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models
Yi-Ting Chen; Suhan Yu; Hsin-Min Wang; Berlin Chen
The purpose of extractive summarization is to automatically select indicative sentences, passages, or paragraphs from an original document according to a certain target summarization ratio, and then sequence them to form a concise summary. In this paper, in contrast to conventional approaches, our objective is to deal with the extractive summarization problem under a probabilistic modeling framework. We investigate the use of the hidden Markov model (HMM) for spoken document summarization, in which each sentence of a spoken document is treated as an HMM for generating the document, and the sentences are ranked and selected according to their likelihoods. In addition, the relevance model (RM) of each sentence, estimated from a contemporary text collection, is integrated with the HMM model to improve the representation of the sentence model. The experiments were performed on Chinese broadcast news compiled in Taiwan. The proposed approach achieves noticeable performance gains over conventional summarization approaches.
- Speech Data Mining and Document Retrieval | Pp. 660-671
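The likelihood-based sentence ranking described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' system: it assumes each sentence is a unigram model interpolated with a background model, and it uses the document itself as the background, which is an assumption made only to keep the example self-contained. The relevance model (RM) component is not implemented, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def sentence_log_likelihood(doc_tokens: List[str], sent_tokens: List[str],
                            background: Counter, lam: float = 0.5) -> float:
    """log P(document | sentence) under an interpolated unigram model.
    lam must be below 1.0 so that unseen words keep non-zero probability."""
    sent_counts = Counter(sent_tokens)
    sent_len = len(sent_tokens)
    bg_len = sum(background.values())
    total = 0.0
    for word, count in Counter(doc_tokens).items():
        p_sent = sent_counts[word] / sent_len if sent_len else 0.0
        p_bg = background[word] / bg_len
        total += count * math.log(lam * p_sent + (1.0 - lam) * p_bg)
    return total


def rank_sentences(sentences: List[List[str]], lam: float = 0.5) -> List[int]:
    """Return sentence indices ordered from most to least likely to
    generate the whole document."""
    doc_tokens = [w for s in sentences for w in s]
    background = Counter(doc_tokens)  # assumption: document as background
    scores = [sentence_log_likelihood(doc_tokens, s, background, lam)
              for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


# Toy tokenized document; the first sentence covers more of the document.
toy_document = [["budget", "vote"], ["hello"]]
print(rank_sentences(toy_document))
```

A summary would then take the top-ranked sentences until the target summarization ratio is reached and present them in their original order.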
doi: 10.1007/11939993_68
Meeting Segmentation Using Two-Layer Cascaded Subband Filters
Manuel Giuliani; Tin Lay Nwe; Haizhou Li
The extraction of information from recorded meetings is a very important yet challenging task. The problem lies in the inability of speech recognition systems to be applied directly to meeting speech data, mainly because meeting participants speak concurrently and head-mounted microphones record more than just their wearers’ utterances: crosstalk from their neighbours is inevitably recorded as well. As a result, a degree of preprocessing of these recordings is needed. The current work presents an approach to segment meetings into four audio classes, including crosstalk and silence. For this purpose, we propose Two-Layer Cascaded Subband Filters, which are spread according to the pitch and formant frequency scales. These filters are able to detect the presence or absence of pitch and formants in an audio signal. In addition, the filters can determine how many pitches and formants are present in an audio signal based on the output subband energies. Experiments conducted on the ICSI meeting corpus show that although an overall recognition rate of up to 57% was achieved, rates for crosstalk and silence classes are as high as 80%. This indicates the positive effect and potential of this subband feature in meeting segmentation tasks.
- Speech Data Mining and Document Retrieval | Pp. 672-682
doi: 10.1007/11939993_69
A Multi-layered Summarization System for Multi-media Archives by Understanding and Structuring of Chinese Spoken Documents
Lin-shan Lee; Sheng-yi Kong; Yi-cheng Pan; Yi-sheng Fu; Yu-tsun Huang; Chien-Chih Wang
Multi-media archives are very difficult to show on a screen, and very difficult to retrieve and browse. It is therefore important to develop technologies to summarize the entire archives in the network content to help the user in browsing and retrieval. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of the above issues: (1) Automatic Generation of Titles and Summaries for each of the spoken documents, such that the spoken documents become much easier to browse, (2) Global Semantic Structuring of the entire spoken document archive, offering the user a global picture of the semantic structure of the archive, and (3) Query-based Local Semantic Structuring for the subset of the spoken documents retrieved by the user’s query, giving the user the detailed semantic structure of the spoken documents relevant to that query. Probabilistic Latent Semantic Analysis (PLSA) is found to be helpful. This paper presents an initial prototype system for Chinese archives with the functions mentioned above, in which a broadcast news archive in Mandarin Chinese is taken as the example archive.
- Speech Data Mining and Document Retrieval | Pp. 683-692
doi: 10.1007/11939993_70
Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities
Devon Li; Wai-Kit Lo; Helen Meng
Story segmentation plays a critical role in spoken document processing. Spoken documents often come in a continuous audio stream without explicit boundaries related to stories or topics. It is important to be able to automatically segment these audio streams into coherent units. This work is an initial attempt to make use of informative lexical terms (or key terms) in recognition transcripts of Chinese spoken documents for story segmentation, because changes in the distribution of informative terms are generally associated with story changes and topic shifts. Our methods of informative lexical term extraction include the extraction of POS-tagged nouns, as well as a named entity identifier that extracts Chinese person names, transliterated person names, and location and organization names. We also adopted a lexical chaining approach that links up sentences that are lexically “coherent” with each other. This leads to the definition of a lexical chain score that is used to hypothesize story boundaries. We conducted experiments on the recognition transcripts of the TDT2 Voice of America Mandarin speech corpus. We compared several methods of story segmentation, including the use of pauses, the use of lexical chains of all lexical entries in the recognition transcripts, the use of lexical chains of nouns tagged by a part-of-speech tagger, and the use of lexical chains of extracted named entities. Lexical chains of informative terms, namely POS-tagged nouns and named entities, were found to give comparable performance (F-measures of 0.71 and 0.73 respectively), which is superior to the use of all lexical entries (F-measure of 0.69).
- Speech Data Mining and Document Retrieval | Pp. 693-703
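The idea of scoring story boundaries with lexical chains, as outlined in the abstract above, can be sketched roughly as follows. The abstract does not give the exact chain score, so this example assumes a common formulation: a chain links repeated occurrences of a term that lie within a maximum sentence gap, and a boundary scores one point for every chain that ends just before it or starts just after it. The function name, the `max_gap` parameter, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def chain_boundary_scores(sentence_terms: List[Set[str]], max_gap: int = 2) -> List[int]:
    """scores[i] rates the boundary between sentence i and sentence i + 1:
    chains ending at sentence i plus chains starting at sentence i + 1."""
    n = len(sentence_terms)
    positions: Dict[str, List[int]] = {}
    for idx, terms in enumerate(sentence_terms):
        for term in terms:
            positions.setdefault(term, []).append(idx)

    starts = [0] * n
    ends = [0] * n
    for occurrences in positions.values():
        chain_start = prev = occurrences[0]
        for idx in occurrences[1:]:
            if idx - prev > max_gap:
                # Gap too large: close the current chain and open a new one.
                starts[chain_start] += 1
                ends[prev] += 1
                chain_start = idx
            prev = idx
        starts[chain_start] += 1
        ends[prev] += 1

    return [ends[i] + starts[i + 1] for i in range(n - 1)]


# Toy transcript: informative terms per sentence, with a topic change
# between the second and third sentences.
toy_terms = [{"election", "vote"}, {"vote", "election"},
             {"storm", "flood"}, {"flood", "storm"}]
print(chain_boundary_scores(toy_terms))
```

Boundaries with the highest scores would be hypothesized as story changes; restricting `sentence_terms` to nouns or named entities corresponds to the informative-term variants compared in the abstract.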