Publications catalog - books
Chinese Spoken Language Processing: 5th International Symposium, ISCSLP 2006, Singapore, December 13-16, 2006, Proceedings
Qiang Huo ; Bin Ma ; Eng-Siong Chng ; Haizhou Li (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Data Mining and Knowledge Discovery; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-49665-6
Electronic ISBN
978-3-540-49666-3
Publisher
Springer Nature
Country of publication
China
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11939993_1
Interactive Computer Aids for Acquiring Proficiency in Mandarin
Stephanie Seneff
It is widely recognized that one of the best ways to learn a foreign language is through spoken dialogue with a native speaker. However, this is not a practical method in the classroom due to the one-to-one student/teacher ratio it implies. A potential solution to this problem is to rely on computer spoken dialogue systems to role play a conversational partner. This paper describes several multilingual dialogue systems specifically designed to address this need. Students can engage in dialogue with the computer either over the telephone or through audio/typed input at a Web page. Several different domains are being developed, in which a student’s conversational interaction is assisted by a software agent functioning as a “tutor” which can provide them with translation assistance at any time. Thus, two recognizers are running in parallel, one for English and one for Chinese. Some of the research issues surrounding high-quality spoken language translation and dialogue interaction with a non-native speaker are discussed.
- Plenary | Pp. 1-12
doi: 10.1007/11939993_2
The Affective and Pragmatic Coding of Prosody
Klaus R. Scherer
Prosody or intonation is a prime carrier of affective information, a function that has often been neglected in speech research. Most work on prosody has been informed by linguistic models of sentence intonation that focus on accent structure and which are based on widely differing theoretical assumptions. Human speech production includes both prosodic coding of emotions, such as anger or happiness, and pragmatic intonations, such as interrogative or affirmative modes, as part of the language codes. The differentiation between these two types of prosody still presents a major problem to speech researchers. It is argued that this distinction becomes more feasible when it is acknowledged that these two types of prosody are differently affected by the so-called “push” and “pull” effects. Push effects, influenced by psychophysiological activation, strongly affect emotional prosody, whereas pull effects, influenced by cultural rules of expression, predominantly affect intonation or pragmatic prosody, even though both processes influence all prosodic production. The push-pull distinction implies that biological marking (push) is directly externalized in motor expression, whereas pull effects (based on socio-cultural norms or desirable, esteemed reference persons) will require the shaping of the expression to conform to these models. Given that the underlying biological processes are likely to be dependent on both the idiosyncratic nature of the individual and the specific nature of the situation, we would expect relatively strong inter-individual differences in the expressive patterns resulting from push effects. This is not the case for pull effects. Here, because of the very nature of the models that pull the expression, we would expect a very high degree of symbolization and conventionalization, in other words comparatively few and small individual differences.
With respect to cross-cultural comparison, we would expect the opposite: very few differences between cultures for push effects, large differences for pull effects.
- Plenary | Pp. 13-14
doi: 10.1007/11939993_3
Challenges in Machine Translation
Franz Josef Och
In recent years there has been an enormous boom in MT research. There has been not only an increase in the number of research groups in the field and in the amount of funding, but there is now also optimism for the future of the field and for achieving even better quality. The major reason for this change has been a paradigm shift away from linguistic/rule-based methods towards empirical/data-driven methods in MT. This has been made possible by the availability of large amounts of training data and large computational resources. This paradigm shift towards empirical methods has fundamentally changed the way MT research is done. The field faces new challenges. For achieving optimal MT quality, we want to train models on as much data as possible, ideally language models trained on hundreds of billions of words and translation models trained on hundreds of millions of words. Doing that requires very large computational resources, a corresponding software infrastructure, and a focus on systems building and engineering. In addition to discussing those challenges in MT research, the talk will also give specific examples of how some of the data challenges are being dealt with at Google Research.
- Plenary | Pp. 15-15
doi: 10.1007/11939993_4
Automatic Indexing and Retrieval of Large Broadcast News Video Collections – The TRECVID Experience
Tat-Seng Chua
Most existing operational systems rely purely on automatic speech recognition (ASR) text as the basis for news video indexing and retrieval. While current research shows that ASR text has been the most influential component, results of large scale news video processing experiments indicate that the use of other modality features and external information sources such as the Web is essential in various situations. This talk reviews the frameworks and machine learning techniques used to fuse the ASR text with multi-modal and multi-source information to tackle the challenging problems of story segmentation, concept detection and retrieval in broadcast news video. This paper also points the way towards the development of scalable technology to process large news video archives.
- Plenary | Pp. 16-16
doi: 10.1007/11939993_5
An HMM-Based Approach to Flexible Speech Synthesis
Keiichi Tokuda
The increasing availability of large speech databases makes it possible to construct speech synthesis systems, which are referred to as corpus-based, data-driven, speaker-driven, or trainable approaches, by applying statistical learning algorithms. These systems, which can be automatically trained, not only generate natural and high quality synthetic speech but also can reproduce voice characteristics of the original speaker. This talk presents one of these approaches, HMM-based speech synthesis. The basic idea of the approach is very simple: just train HMMs (hidden Markov models) and generate speech directly from them. To realize such a speech synthesis system, however, we need some tricks: algorithms for speech parameter generation from HMMs and a mel-cepstrum based vocoding technique are reviewed, and an approach to simultaneous modeling of phonetic and prosodic parameters (spectrum, F0, and duration) is also presented. The main feature of the system is the use of dynamic features: by inclusion of dynamic coefficients in the feature vector, the speech parameter sequence generated in synthesis is constrained to be realistic, as defined by the parameters of the HMMs. The attraction of this approach is that voice characteristics of synthesized speech can easily be changed by transforming HMM parameters. Indeed, it has been shown that we can change voice characteristics of synthetic speech by applying a speaker adaptation technique which has been used in speech recognition systems. The relationship between the HMM-based approach and other concatenative speech synthesis approaches is also discussed. In the talk, not only the technical description but also recent results and demos will be presented.
- Tutorial | Pp. 17-17
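As a rough illustration of what a "dynamic coefficient" in the feature vector means in the abstract above, the sketch below appends a first-order delta to each frame of a single parameter track. The window weights (-0.5, 0, 0.5), the edge handling, and the function name `delta_features` are assumptions chosen for illustration; they are not taken from the talk, and this is not the parameter generation algorithm itself.

```python
def delta_features(static, window=(-0.5, 0.0, 0.5)):
    """Pair each static value with a first-order dynamic (delta) coefficient.

    `static` is one parameter track (for example, one value per frame).
    The window weights are an assumed, commonly used choice.
    Edge frames reuse the first or last value of the track.
    """
    n = len(static)
    half = len(window) // 2
    out = []
    for t in range(n):
        delta = 0.0
        for k, w in enumerate(window):
            # Clamp the neighbouring frame index to the valid range.
            idx = min(max(t + k - half, 0), n - 1)
            delta += w * static[idx]
        out.append((static[t], delta))
    return out


track = [1.0, 2.0, 4.0, 7.0]
features = delta_features(track)
print(features)
```

Because each delta depends on neighbouring frames, a model of both the static and the delta values ties consecutive frames together, which is the intuition behind the constraint on the generated sequence.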
doi: 10.1007/11939993_6
Text Information Extraction and Retrieval
Hang Li
Every day people spend much time on creating, processing, and accessing information. In fact, most of the information exists in the form of "text", contained in books, emails, web pages, newspaper articles, blogs, and reports. How to help people quickly find information in text data and how to help people discover new knowledge from text data have become enormously important issues. Many research efforts have been made on text information extraction, retrieval, and mining, and significant progress has been made in recent years. A large number of new methods have been proposed, and many systems have been developed and put into practical use. This tutorial is aimed at giving an overview of two central topics of the area, namely Information Extraction (IE) and Information Retrieval (IR). Important technologies for them will be introduced. Specifically, models for IE such as the Maximum Entropy Markov Model and Conditional Random Fields will be explained. Models for IR such as the Language Model and Learning to Rank will be described. A brief survey of recent work on both IE and IR will be given. Finally, some recent work on the combined use of IE and IR technologies will also be introduced.
- Tutorial | Pp. 18-18
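The abstract above names the language model approach to IR without giving details. One common form is query likelihood scoring, sketched below with Jelinek-Mercer smoothing. The smoothing method, the weight `lam=0.5`, the toy documents, and the function name `query_log_likelihood` are all assumptions made for this illustration, not content from the tutorial.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Score a document by log P(query | document language model).

    Each term probability mixes the document estimate with the collection
    estimate (Jelinek-Mercer smoothing, an assumed choice); `lam` is the
    weight given to the collection model.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection, invented for the example.
docs = {
    "d1": "speech recognition for mandarin speech".split(),
    "d2": "text retrieval and text mining".split(),
}
collection = [term for doc in docs.values() for term in doc]
query = "text retrieval".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
print(ranking)
```

Smoothing with the collection model keeps a document from receiving zero probability just because it lacks one query term.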
doi: 10.1007/11939993_7
Mechanisms of Question Intonation in Mandarin
Jiahong Yuan
This study investigates mechanisms of question intonation in Mandarin Chinese. Three mechanisms of question intonation have been proposed: an overall higher phrase curve, higher strengths of sentence final tones, and a tone-dependent mechanism that flattens the falling slope of the final falling tone and steepens the rising slope of the final rising tone. The phrase curve and strength mechanisms were revealed by a computational modeling study and verified by the acoustic analyses as well as the perception experiments. The tone-dependent mechanism was suggested by a result from the perceptual study: question intonation is easier to identify if the sentence-final tone is falling whereas it is harder to identify if the sentence-final tone is rising, and was revealed by the acoustic analyses on the final Tone2 and Tone4.
- Topics in Speech Science | Pp. 19-30
doi: 10.1007/11939993_8
Comparison of Perceived Prosodic Boundaries and Global Characteristics of Voice Fundamental Frequency Contours in Mandarin Speech
Wentao Gu; Keikichi Hirose; Hiroya Fujisaki
Although there have been many studies on the prosodic structure of spoken Mandarin as well as many proposals for labeling the prosody of spoken Mandarin, the labeling of prosodic boundaries in all the existing annotation systems relies on auditory perception, and lacks a direct relation to the acoustic process of prosody generation. Besides, perception-based annotation cannot ensure a high degree of consistency and reliability. In the present study, we investigate the phrasing of spoken Mandarin from the production point of view, by using an acoustic model for generating F0 contours. The relationship between perceived prosodic boundaries at various layers and phrase commands derived from the model-based analysis of F0 contours is then revealed. The results indicate that a perception-based prosody labeling system cannot describe the prosodic structure as accurately as the model for F0 contour generation.
- Topics in Speech Science | Pp. 31-42
doi: 10.1007/11939993_9
Linguistic Markings of Units in Spontaneous Mandarin
Shu-Chuan Tseng
Spontaneous speech is produced and probably also perceived in some kinds of units. This paper applies perceptually defined intonation units to segment spontaneous Mandarin data. The main aim is to examine spontaneous data to see whether linguistic cues that mark the unit boundaries exist. If the production of spontaneous speech is a kind of concatenation of these "chunks", we can deepen our understanding of human language processing, and the related knowledge about the boundary markings can be applied to improve language models used for automatic speech recognizers. Our results clearly show that discourse items and repair resumptions, which are typical phenomena in spontaneous speech, are mostly located at the boundaries of intonation units. Moreover, temporal marking of items at unit boundaries is empirically identified through a series of analyses making use of segmentation of intonation units and measurements of syllable durations.
- Topics in Speech Science | Pp. 43-54
doi: 10.1007/11939993_10
Phonetic and Phonological Analysis of Focal Accents of Disyllabic Words in Standard Chinese
Yuan Jia; Ziyu Xiong; Aijun Li
The article investigates the phonetic and phonological properties of focal accents conveyed by disyllabic focused words with various tonal combinations in Standard Chinese. Phonetically, the effect of focal accents upon f0 resides in two aspects: the manner and the condition of focal accents. Phonologically, the main concern is the distribution of focal accents. Acoustic and perceptual experiments and the underlying tonal targets of focused constituents are employed in both phonetic realization and phonological analysis. The major findings are that: f0 ranges of focused words are expanded as the H tones of both focused syllables are raised; the f0 of the post-focus syllables is clearly compressed, in that the H tones of Tone1 and Tone2 are lowered; the realization of accents is closely related to the tonal target of the focused words; specifically, accents influence the acoustic performances of tones; furthermore, the combination of H/L determines the distribution of accents.
- Topics in Speech Science | Pp. 55-66