Publications catalog - books

Natural Language Processing: IJCNLP 2004: First International Joint Conference, Hainan Island, China, March 22-24, 2004, Revised Selected Papers

Keh-Yih Su ; Jun’ichi Tsujii ; Jong-Hyeok Lee ; Oi Yee Kwong (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Language Translation and Linguistics; Information Storage and Retrieval; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing

Availability

Detected institution: Not detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-24475-2

Electronic ISBN

978-3-540-30211-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

© Springer-Verlag Berlin/Heidelberg 2005

Table of contents

Automatic Learning of Parallel Dependency Treelet Pairs

Yuan Ding; Martha Palmer

Induction of synchronous grammars from empirical data has long been an unsolved problem, even though generative synchronous grammars are, in theory, well suited to the machine translation task. This is mainly due to pervasive structural divergences between languages. This paper presents a statistical approach that learns dependency structure mappings from parallel corpora. The new algorithm automatically learns parallel dependency treelet pairs from loosely matched, non-isomorphic dependency trees while keeping computational complexity polynomial in the length of the sentences. A set of heuristics is introduced and specifically optimized for parallel treelet learning using Minimum Error Rate training.

- Machine Translation and Multilinguality | Pp. 233-243
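
As a concrete illustration of the treelet idea, the following Python sketch pairs one-level dependency treelets (a head word plus its direct dependents) across a word alignment. The tree encoding, alignment format, and consistency check are illustrative assumptions; the paper's algorithm handles loosely matched, non-isomorphic trees with much richer heuristics.

    from collections import defaultdict

    def children(heads):
        # heads[i] is the index of token i's head; -1 marks the root
        kids = defaultdict(list)
        for i, h in enumerate(heads):
            if h >= 0:
                kids[h].append(i)
        return kids

    def treelet_pairs(src_heads, tgt_heads, alignment):
        # Pair treelets whose head words are aligned and whose dependents
        # all land under the aligned target head (a consistency check).
        align = dict(alignment)          # source index -> target index (1-to-1 here)
        src_kids = children(src_heads)
        tgt_kids = children(tgt_heads)
        pairs = []
        for s_head, s_deps in src_kids.items():
            t_head = align.get(s_head)
            if t_head is None:
                continue
            t_deps = set(tgt_kids.get(t_head, []))
            mapped = [align[d] for d in s_deps if d in align]
            if mapped and all(m in t_deps for m in mapped):
                pairs.append(((s_head, tuple(s_deps)), (t_head, tuple(sorted(mapped)))))
        return pairs

    # Toy example: two 2-token sentences with swapped head direction.
    print(treelet_pairs([-1, 0], [1, -1], [(0, 1), (1, 0)]))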

Selecting Prosody Parameters for Unit Selection Based Chinese TTS

Minghui Dong; Kim-Teng Lua; Jun Xu

In the unit selection approach to text-to-speech, each unit is described by a set of parameters. However, which parameters effectively express the prosody of speech remains an open question. In this paper, we propose an approach to determining prosody parameters for unit selection-based speech synthesis. We are concerned with how well prosody parameters can describe tones and prosodic breaks in Chinese speech. First, we define and evaluate a set of parameters. Then, we cluster the parameters and select a representative parameter from each cluster. Finally, the parameters are evaluated in a real TTS system. Experiments show that the selected parameters help to improve speech quality.

- NLP Software and Application | Pp. 272-279
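
The cluster-and-pick-a-representative step lends itself to a short sketch. In the Python below, the Pearson-correlation measure, the greedy clustering, the 0.8 threshold, and the parameter names are all illustrative assumptions rather than the paper's actual choices.

    import math, random

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else 0.0

    def select_representatives(params, threshold=0.8):
        # Greedy clustering: each unassigned parameter seeds a cluster and
        # absorbs the remaining ones correlated with it above the threshold;
        # the seed is kept as the cluster's representative.
        names = list(params)
        chosen, used = [], set()
        for name in names:
            if name in used:
                continue
            chosen.append(name)
            used.add(name)
            for other in names:
                if other not in used and abs(pearson(params[name], params[other])) >= threshold:
                    used.add(other)
        return chosen

    random.seed(0)
    f0 = [random.gauss(200, 30) for _ in range(100)]
    params = {
        "mean_f0": f0,
        "log_mean_f0": [math.log(v) for v in f0],   # nearly redundant with mean_f0
        "duration": [random.gauss(80, 10) for _ in range(100)],
    }
    print(select_representatives(params))   # e.g. ['mean_f0', 'duration']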

Automatic Genre Detection of Web Documents

Chul Su Lim; Kong Joo Lee; Gil Chang Kim

A genre or style is a view of documents different from a subject or topic, and it is another criterion by which documents can be classified. There have been several studies on detecting the genre of textual documents, but only a few of them deal with web documents. In this paper we suggest sets of features for detecting the genres of web documents. Web documents differ from plain textual documents in that they contain URLs and HTML tags within the pages. We introduce features specific to web documents, extracted from URLs and HTML tags. Experimental results allow us to evaluate the features' characteristics and performance.

- NLP Software and Application | Pp. 310-319
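
A minimal sketch of what URL- and tag-derived features might look like, assuming a simple bag-of-features representation; the concrete feature set here is hypothetical, not the one evaluated in the paper.

    import re
    from collections import Counter
    from urllib.parse import urlparse

    def genre_features(url, html):
        feats = Counter()
        parsed = urlparse(url)
        # URL features: path depth, file extension, and path/query tokens
        path = parsed.path
        feats["url_depth"] = path.count("/")
        ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        feats[f"url_ext={ext}"] = 1
        for tok in re.findall(r"[a-z]+", (path + "?" + parsed.query).lower()):
            feats[f"url_tok={tok}"] += 1
        # HTML tag features: raw counts of each opening tag
        for tag in re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html):
            feats[f"tag={tag.lower()}"] += 1
        return feats

    html = "<html><body><form><input><input></form><a href='x'>more</a></body></html>"
    print(genre_features("http://example.com/board/list.php?page=2", html))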

Statistical Substring Reduction in Linear Time

Xueqiang Lü; Le Zhang; Junfeng Hu

We study the problem of efficiently removing equal-frequency n-gram substrings from an n-gram set, an operation formally called Statistical Substring Reduction (SSR). SSR is a useful operation in corpus-based multi-word unit research and in the new word identification task of oriental language processing. We present a new SSR algorithm that has linear time (O(n)) complexity, and prove its equivalence with the traditional O(n²) algorithm. In particular, using experimental results from several corpora of different sizes, we show that it is possible to achieve performance close to that theoretically predicted for this task. Even on a small corpus the new algorithm is several orders of magnitude faster than the O(n²) one. These results show that our algorithm is reliable and efficient, and is therefore an appropriate choice for large-scale corpus processing.

- NLP Software and Application | Pp. 320-327
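
The SSR operation itself is easy to state in code. The sketch below is the straightforward O(n²)-style reference implementation: it drops any n-gram whose frequency equals that of a longer n-gram containing it. The paper's contribution, the linear-time algorithm, is not reproduced here.

    def is_sub(shorter, longer):
        # True if `shorter` occurs as a contiguous subsequence of `longer`
        k = len(shorter)
        return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))

    def ssr(ngram_freqs):
        # ngram_freqs: dict mapping an n-gram (tuple of tokens) to its count
        items = list(ngram_freqs.items())
        kept = {}
        for gram, freq in items:
            absorbed = any(
                len(other) > len(gram) and ofreq == freq and is_sub(gram, other)
                for other, ofreq in items
            )
            if not absorbed:
                kept[gram] = freq
        return kept

    freqs = {("hong",): 10, ("kong",): 12, ("hong", "kong"): 10}
    print(ssr(freqs))   # ("hong",) is removed: same count as ("hong", "kong")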

Influence of WSD on Cross-Language Information Retrieval

In-Su Kang; Seung-Hoon Na; Jong-Hyeok Lee

Translation ambiguity is a major problem in dictionary-based cross-language information retrieval. This paper proposes a statistical word sense disambiguation (WSD) approach for translation ambiguity resolution. Then, with respect to CLIR effectiveness, the pure effect of the disambiguation module is explored on two issues: the contribution of the disambiguation weight to target term weighting, and the influence of WSD performance on CLIR retrieval effectiveness. In our investigation, we do not use pre-translation or post-translation methods, so as to exclude any mixing of effects on CLIR.

- Semantic Disambiguation | Pp. 358-366
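
One of the two issues, folding the disambiguation weight into target term weighting, can be sketched as follows. The bilingual dictionary, WSD scores, and normalized combination below are illustrative assumptions, not the paper's actual scheme.

    def weighted_translations(query_terms, bilingual_dict, wsd_score):
        # Each translation candidate contributes to the target-term weights
        # in proportion to the WSD score of its source sense.
        weights = {}
        for term in query_terms:
            candidates = bilingual_dict.get(term, [])
            total = sum(wsd_score(term, c) for c in candidates) or 1.0
            for c in candidates:
                # normalized disambiguation weight, accumulated over query terms
                weights[c] = weights.get(c, 0.0) + wsd_score(term, c) / total
        return weights

    bilingual_dict = {"bank": ["banco", "orilla"]}
    scores = {("bank", "banco"): 0.9, ("bank", "orilla"): 0.1}
    print(weighted_translations(["bank"], bilingual_dict, lambda t, c: scores[(t, c)]))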

Resolution of Modifier-Head Relation Gaps Using Automatically Extracted Metonymic Expressions

Yoji Kiyota; Sadao Kurohashi; Fuyuko Kido

This paper proposes a method for extracting metonymic expressions and their interpretative expressions from corpora, and its application to the full-parsing-based matching method of a QA system. An evaluation showed that 79% of the extracted interpretations were correct, and an experiment using test sets indicated that introducing the metonymic expressions significantly improved performance.

- Semantic Disambiguation | Pp. 367-376

Improving Word Sense Disambiguation by Pseudo-samples

Xiaojie Wang; Yuji Matsumoto

Data sparseness is a major problem in word sense disambiguation. Automatic sample acquisition and smoothing are two ways that have been explored to alleviate the influence of data sparseness. In this paper, we consider a combination of these two methods. First, we propose a pattern-based way to acquire pseudo samples; we then estimate conditional probabilities for variables by combining the pseudo data set with the sense-tagged data set. By using this combined estimation, we establish an appropriate leverage between the two different data sets, which is vital for achieving the best performance. Experiments show that our approach brings significant improvement for Chinese word sense disambiguation.

- Semantic Disambiguation | Pp. 386-395
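
The combined estimation can be illustrated with a simple linear interpolation of the two conditional probability estimates. The interpolation weight and the linear form are assumptions for the sketch, not necessarily the paper's exact estimator.

    from collections import Counter

    def cond_prob(pairs):
        # P(sense | feature) estimated from (feature, sense) pairs
        joint = Counter(pairs)
        marginal = Counter(f for f, _ in pairs)
        return lambda f, s: joint[(f, s)] / marginal[f] if marginal[f] else 0.0

    def combined(tagged_pairs, pseudo_pairs, lam=0.7):
        # lam is the leverage between the small sense-tagged set and the
        # large pseudo set; it would be tuned on held-out data.
        p_tag, p_psd = cond_prob(tagged_pairs), cond_prob(pseudo_pairs)
        return lambda f, s: lam * p_tag(f, s) + (1 - lam) * p_psd(f, s)

    tagged = [("river", "bank/shore"), ("money", "bank/finance")]
    pseudo = [("river", "bank/shore")] * 9 + [("river", "bank/finance")]
    p = combined(tagged, pseudo)
    print(p("river", "bank/shore"))   # 0.7 * 1.0 + 0.3 * 0.9 = 0.97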

A Collaborative Ability Measurement for Co-training

Dan Shen; Jie Zhang; Jian Su; Guodong Zhou; Chew-Lim Tan

This paper explores the collaborative ability of the co-training algorithm. We propose a new measurement (CA) that represents the collaborative ability of co-training classifiers, based on the overlapping proportion between certain and uncertain instances. The CA measurement indicates whether two classifiers can co-train effectively. We give a theoretical analysis of CA values for co-training with an independent feature split, with a random feature split, and without a feature split. The experiments justify our analysis. We also explore two variations of the general co-training algorithm and analyze them using the CA measurement.

- Statistical Models and Machine Learning for NLP | Pp. 436-445
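
A sketch of a CA-style measurement: over unlabeled data, count how often one classifier is confident exactly where the other is not. The confidence threshold and the precise formula are illustrative; the paper defines CA formally.

    def ca_measure(conf1, conf2, threshold=0.8):
        # conf1, conf2: per-instance confidence scores of the two classifiers
        n = len(conf1)
        certain1 = {i for i, c in enumerate(conf1) if c >= threshold}
        certain2 = {i for i, c in enumerate(conf2) if c >= threshold}
        uncertain1 = set(range(n)) - certain1
        uncertain2 = set(range(n)) - certain2
        # instances where one view can confidently label what the other cannot
        complementary = (certain1 & uncertain2) | (certain2 & uncertain1)
        return len(complementary) / n if n else 0.0

    c1 = [0.9, 0.9, 0.3, 0.2, 0.95]
    c2 = [0.4, 0.9, 0.9, 0.3, 0.2]
    print(ca_measure(c1, c2))   # 3 of 5 instances are complementary -> 0.6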

Flexible Margin Selection for Reranking with Full Pairwise Samples

Libin Shen; Aravind K. Joshi

Perceptron-like large margin algorithms are introduced for experiments with various margin selections. Compared to previous perceptron reranking algorithms, the new algorithms use full pairwise samples and allow us to search for margins in a larger space. Our experimental results on the data set of [1] show that a perceptron-like ordinal regression algorithm with uneven margins can achieve Recall/Precision of 89.5/90.0 on section 23 of the Penn Treebank. Our result on margin selection can be employed in other large margin machine learning algorithms as well as in other NLP tasks.

- Statistical Models and Machine Learning for NLP | Pp. 446-455
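
The pairwise training scheme can be sketched as a perceptron that updates whenever a margin, scaled here by the quality gap between the two candidates, is violated. The margin schedule and feature format are illustrative assumptions.

    def train_reranker(nbest_lists, dims, epochs=10, lr=0.1):
        # nbest_lists: lists of (feature_vector, quality_score) candidates,
        # where quality_score measures closeness to gold (e.g. F1).
        w = [0.0] * dims
        for _ in range(epochs):
            for cands in nbest_lists:
                for fb, qb in cands:
                    for fw, qw in cands:
                        if qb <= qw:
                            continue        # use only pairs where fb outranks fw
                        margin = qb - qw    # larger quality gap -> larger margin
                        score = sum(wi * (a - b) for wi, a, b in zip(w, fb, fw))
                        if score < margin:  # margin violated: perceptron update
                            for i in range(dims):
                                w[i] += lr * (fb[i] - fw[i])
        return w

    # Toy data: feature 0 correlates with quality, feature 1 is noise.
    nbest = [[((1.0, 0.3), 0.9), ((0.2, 0.8), 0.5), ((0.1, 0.1), 0.2)]]
    print(train_reranker(nbest, dims=2))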

NTPC: N-fold Templated Piped Correction

Dekai Wu; Grace Ngai; Marine Carpuat

We describe a broadly applicable, conservative error-correcting model, N-fold Templated Piped Correction or NTPC (“nitpick”), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches actually reduce accuracy more than they improve it, NTPC comes with little risk of accidentally degrading performance. NTPC is particularly well suited to natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kinds of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models like boosting. We also give evidence that the various extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.

- Statistical Models and Machine Learning for NLP | Pp. 476-486
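
A rough sketch of the NTPC pipeline: collect the base model's held-out (n-fold) predictions, learn template-based correction rules from its residual errors, keep only rules that are never contradicted, and pipe test-time output through them. The single-feature templates and reliability criterion below are illustrative simplifications.

    from collections import defaultdict

    def learn_rules(heldout, min_count=2):
        # heldout: (features, base_prediction, gold_label) triples collected
        # from the n held-out folds of the base model
        stats = defaultdict(lambda: defaultdict(int))
        for feats, pred, gold in heldout:
            for key, val in feats.items():        # one template per feature here
                stats[(key, val, pred)][gold] += 1
        rules = {}
        for ctx, outcomes in stats.items():
            label, n = max(outcomes.items(), key=lambda kv: kv[1])
            # conservative: keep a rule only if it changes the base prediction
            # and was never contradicted in the held-out data
            if len(outcomes) == 1 and n >= min_count and label != ctx[2]:
                rules[ctx] = label
        return rules

    def correct(feats, pred, rules):
        for key, val in feats.items():
            fixed = rules.get((key, val, pred))
            if fixed is not None:
                return fixed
        return pred

    heldout = [({"suffix": "ing"}, "NN", "VBG"),
               ({"suffix": "ing"}, "NN", "VBG"),
               ({"suffix": "ed"}, "VBD", "VBD")]
    rules = learn_rules(heldout)
    print(correct({"suffix": "ing"}, "NN", rules))   # -> "VBG"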