Publications catalog - books



Natural Language Processing: IJCNLP 2004: First International Joint Conference, Hainan Island, China, March 22-24, 2004, Revised Selected Papers

Keh-Yih Su ; Jun’ichi Tsujii ; Jong-Hyeok Lee ; Oi Yee Kwong (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Language Translation and Linguistics; Information Storage and Retrieval; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing

Availability

Detected institution: Not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-24475-2

Electronic ISBN

978-3-540-30211-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin/Heidelberg 2005

Table of contents

Tagging Complex NEs with MaxEnt Models: Layered Structures Versus Extended Tagset

Deyi Xiong; Hongkui Yu; Qun Liu

The paper discusses two policies for recognizing NEs with complex structures using maximum entropy models. One policy is to develop cascaded MaxEnt models at different levels. The other is to design more detailed tags, informed by human knowledge, to represent complex structures. Experiments on Chinese organization name recognition indicate that layered structures result in more accurate models, while extended tags do not lead to the expected positive results. We empirically show that the {} tag set is the best tag set for NE recognition with MaxEnt models.

- Taggers, Chunkers, Shallow Parsers | Pp. 537-544

A Nearest-Neighbor Method for Resolving PP-Attachment Ambiguity

Shaojun Zhao; Dekang Lin

We present a nearest-neighbor algorithm for resolving prepositional phrase attachment ambiguities. Its performance is significantly higher than that of previous corpus-based methods for PP-attachment that do not rely on manually constructed knowledge bases. We also show that the PP-attachment task provides a way to evaluate methods for computing distributional word similarities. Our experiments indicate that the cosine of pointwise mutual information vectors is a significantly better similarity measure than several other commonly used measures.

- Taggers, Chunkers, Shallow Parsers | Pp. 545-554
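The similarity measure the abstract highlights can be illustrated in a few lines. Below is a minimal sketch, assuming toy (word, context-feature) co-occurrence counts rather than the paper's corpus data: each word gets a vector of pointwise mutual information scores over its context features, and similarity is the cosine of those vectors.

```python
import math
from collections import Counter

# Illustrative (word, context_feature) co-occurrence pairs; not the paper's data.
pairs = [
    ("eat", "food"), ("eat", "fork"), ("eat", "table"),
    ("drink", "food"), ("drink", "cup"), ("drink", "table"),
    ("read", "book"), ("read", "table"),
]

word_count = Counter(w for w, _ in pairs)
feat_count = Counter(f for _, f in pairs)
pair_count = Counter(pairs)
total = len(pairs)

def pmi_vector(word):
    """Pointwise mutual information of a word with each of its context features."""
    vec = {}
    for (w, f), c in pair_count.items():
        if w == word:
            p_wf = c / total
            p_w = word_count[w] / total
            p_f = feat_count[f] / total
            vec[f] = math.log(p_wf / (p_w * p_f))
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sim = cosine(pmi_vector("eat"), pmi_vector("drink"))
```

On this toy data, "eat" and "drink" share context features and come out more similar than "eat" and "read", which is the behavior a distributional measure should show.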

Collecting Evaluative Expressions for Opinion Extraction

Nozomi Kobayashi; Kentaro Inui; Yuji Matsumoto; Kenji Tateishi; Toshikazu Fukushima

Automatic extraction of human opinions from Web documents has been receiving increasing interest. To automate the process of opinion extraction, a collection of evaluative expressions such as “ is comfortable” would be useful. However, manually creating an exhaustive list of such expressions for many domains can be costly, because they tend to be domain-dependent. Motivated by this, we explore ways to accelerate the process of collecting evaluative expressions by applying a text mining technique. This paper proposes a semi-automatic method that uses particular co-occurrence patterns of evaluated subjects, focused attributes, and value expressions.

- Text Mining | Pp. 596-605
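The idea of harvesting candidates from co-occurrence patterns of subjects, attributes, and values can be sketched with a single illustrative pattern. The cue pattern and the review sentences below are hypothetical stand-ins, not the paper's actual pattern set or data.

```python
import re

# Hypothetical review sentences for illustration.
sentences = [
    "This camera's battery life is excellent.",
    "The phone's screen is sharp.",
    "My laptop's keyboard is mushy.",
]

# One toy co-occurrence pattern: "<subject>'s <attribute> is <value>."
PATTERN = re.compile(
    r"(?:this|the|my)\s+(\w+)'s\s+([\w ]+?)\s+is\s+(\w+)\.",
    re.IGNORECASE,
)

def extract(sentence):
    """Return a (subject, attribute, value) triple if the cue pattern matches."""
    m = PATTERN.search(sentence)
    if m:
        subject, attribute, value = m.groups()
        return subject.lower(), attribute.lower(), value.lower()
    return None

triples = [t for t in map(extract, sentences) if t]
```

A human annotator would then review such candidate triples, which is what makes the overall method semi-automatic.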

A Study of Semi-discrete Matrix Decomposition for LSI in Automated Text Categorization

Wang Qiang; Wang XiaoLong; Guan Yi

This paper proposes the use of Latent Semantic Indexing (LSI), computed with the semi-discrete matrix decomposition (SDD) method, for text categorization. The SDD algorithm is a recent solution to LSI that can achieve similar performance at a much lower storage cost. Here, LSI is used for text categorization by constructing new category features as combinations or transformations of the original features. In experiments on a Chinese Library Classification data set, we compare accuracy against a classifier based on k-Nearest Neighbors (k-NN), and the results show that k-NN based on LSI is sometimes significantly better. Much future work remains, but the results indicate that LSI is a promising technique for text categorization.

- Text Mining | Pp. 606-615
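The k-NN classification stage of this setup can be sketched as follows. Note the deliberate simplification: the paper's LSI/SDD projection step is omitted here, and raw bag-of-words term vectors are compared directly; the training snippets and labels are toy examples.

```python
import math
from collections import Counter

# Toy labeled corpus; in the paper, documents would first be projected
# into an LSI space via semi-discrete decomposition before comparison.
train = [
    ("the parser builds syntax trees", "nlp"),
    ("tagging words with a tagger", "nlp"),
    ("stocks rose and markets rallied", "finance"),
    ("bond prices and interest rates", "finance"),
]

def vec(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(text, k=3):
    """Label a document by majority vote among its k nearest training documents."""
    sims = sorted(((cosine(vec(text), vec(doc)), label) for doc, label in train),
                  reverse=True)
    top = [label for _, label in sims[:k]]
    return Counter(top).most_common(1)[0][0]
```

With an LSI projection in place, the same cosine/k-NN machinery would operate on the reduced vectors instead of raw term counts.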

Systematic Construction of Hierarchical Classifier in SVM-Based Text Categorization

Yongwook Yoon; Changki Lee; Gary Geunbae Lee

In text categorization, classifying over a hierarchy of classes gives better results than classifying without one. In current environments, where large numbers of documents are divided into subgroups with a hierarchy between them, a hierarchical classification method is more natural and appropriate. We introduce a new internal-node evaluation scheme that is very helpful during the development of a hierarchical classifier. We also show that a hierarchical classifier constructed with this measure yields better classification performance, especially when applied to classification tasks with a large hierarchy depth.

- Text Mining | Pp. 616-625
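Top-down hierarchical classification of the kind described above can be sketched simply: each internal node routes a document to one of its children, and leaves are the final categories. The keyword-overlap "classifiers" and the category tree below are toy stand-ins for the per-node SVMs of the paper.

```python
# Toy category hierarchy; each internal node carries a stand-in classifier
# (keyword sets) that decides which child a document is routed to.
hierarchy = {
    "root":    {"children": ["science", "sports"],
                "keywords": {"science": {"atom", "genome"},
                             "sports": {"goal", "match"}}},
    "science": {"children": ["physics", "biology"],
                "keywords": {"physics": {"atom", "quark"},
                             "biology": {"genome", "cell"}}},
    "sports":  {"children": [], "keywords": {}},
    "physics": {"children": [], "keywords": {}},
    "biology": {"children": [], "keywords": {}},
}

def classify(doc, node="root"):
    """Descend the hierarchy, letting each node's classifier pick one child."""
    info = hierarchy[node]
    if not info["children"]:
        return node  # reached a leaf category
    words = set(doc.split())
    # Route to the child whose keyword set overlaps the document most.
    best = max(info["children"], key=lambda c: len(words & info["keywords"][c]))
    return classify(doc, best)
```

An internal-node evaluation scheme such as the paper's would score each routing decision separately, so errors can be localized to a particular node rather than only observed at the leaves.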

Chinese Treebanks and Grammar Extraction

Keh-Jiann Chen; Yu-Ming Hsieh

Preparing a knowledge bank is a very difficult task. In this paper, we discuss knowledge extraction from the manually examined Sinica Treebank. Categorical information, word-to-word relations, word collocations, new syntactic patterns, and sentence structures are obtained. A search system for Chinese sentence structures was developed in this study: using pre-extracted data and SQL commands, the system answers users’ queries efficiently. We also analyze the extracted grammars to study the trade-offs between the granularity of the grammar rules and their coverage and ambiguity, which indicates how large a treebank must be for the purpose of grammar extraction. Finally, we analyze the trade-offs between grammar coverage and ambiguity by parsing with grammar rules of different granularity.

- Theories and Formalisms for Morphology, Syntax and Semantics | Pp. 655-663
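The core of grammar extraction from a treebank is mechanical: every internal node of a bracketed tree contributes one rule "parent -> children", and rule counts across the corpus give the extracted grammar. A minimal sketch, using toy s-expression trees rather than Sinica Treebank data:

```python
from collections import Counter

def parse(sexpr):
    """Parse a bracketed tree string like '(S (NP ...) (VP ...))' into
    a nested (label, children) structure; leaf words become (word, [])."""
    tokens = sexpr.replace("(", " ( ").replace(")", " ) ").split()
    def walk(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = walk(i)
                children.append(child)
            else:
                children.append((tokens[i], []))  # leaf word
                i += 1
        return (label, children), i + 1
    return walk(0)[0]

def collect(tree, counts):
    """Count one rule per internal node: label -> tuple of child labels."""
    label, children = tree
    if children:
        counts[(label, tuple(c[0] for c in children))] += 1
        for c in children:
            collect(c, counts)

counts = Counter()
for t in ["(S (NP (N dogs)) (VP (V bark)))",
          "(S (NP (N cats)) (VP (V sleep)))"]:
    collect(parse(t), counts)
```

The granularity/coverage trade-off the abstract mentions then amounts to choosing how much detail the node labels carry: finer labels yield more rules with lower counts and lower coverage, coarser labels yield fewer, more ambiguous rules.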

Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank

Yusuke Miyao; Takashi Ninomiya; Jun’ichi Tsujii

This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially specified derivation trees of HPSG. Lexical entries are then automatically extracted from the annotated corpus by inversely applying HPSG schemata to the partially specified derivation trees. The predefined HPSG schemata ensure that the acquired lexicon conforms to the theoretical formulation of HPSG. Experimental results show that this approach enabled us to develop a robust HPSG grammar at a small cost.

- Theories and Formalisms for Morphology, Syntax and Semantics | Pp. 684-693

Chinese New Word Finding Using Character-Based Parsing Model

Yao Meng; Hao Yu; Fumihito Nishino

New word finding is a difficult and indispensable task in Chinese segmentation. Traditional methods use string statistics to identify new words in a large-scale corpus, but they are neither convenient nor powerful enough to describe words’ internal and external structural regularities, and they are even less effective when new words occur at very low frequency in the corpus. In this paper, we present a novel method that uses parsing information to find new words. A character-level PCFG model is trained on the People’s Daily corpus and the Penn Chinese Treebank. Characters are fed into the character parsing system, and words are determined automatically from the parse tree. Our method describes word-building rules over full sentences and takes advantage of rich context to find new words. This is especially effective for identifying occasional or rarely used words, which usually occur at low frequency. Preliminary experiments indicate that our method can substantially improve the precision and recall of new word finding.

- Word Segmentation | Pp. 733-742

Annotation of Gene Products in the Literature with Gene Ontology Terms Using Syntactic Dependencies

Jung-jae Kim; Jong C. Park

We present a method for automatically annotating gene products in the literature with terms from the Gene Ontology (GO), which provides a dynamic but controlled vocabulary. Although GO is well organized, with lexical relations such as synonymy, ‘is-a’, and ‘part-of’ among its terms, GO terms show quite a high degree of morphological and syntactic variation in the literature. As opposed to previous approaches that considered only restricted kinds of term variation, our method also uncovers the syntactic dependencies between gene product names and ontological terms in order to deal with real-world syntactic variation, based on the observation that the component words of an ontological term usually appear in a sentence with established patterns of syntactic dependencies.

- Thematic Session: Text Mining in Biomedicine | Pp. 787-796

Mining Biomedical Abstracts: What’s in a Term?

Goran Nenadic; Irena Spasic; Sophia Ananiadou

In this paper we present a study of terminology usage in the biomedical literature, with the main aim of indicating phenomena that can help automatic term recognition in the domain. Our analysis is based on the terminology appearing in the Genia corpus. We analyse the usage of biomedical terms and their variants (namely inflectional and orthographic alternatives, terms with prepositions, coordinated terms, etc.), showing the variability and dynamic nature of terms used in biomedical abstracts. Term coordination and terms containing prepositions are analysed in detail. We also show that there is a discrepancy between terms used in the literature and terms listed in controlled dictionaries. In addition, we briefly evaluate the effectiveness of incorporating treatment of different types of term variation into an automatic term recognition system.

- Thematic Session: Text Mining in Biomedicine | Pp. 797-806
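One small piece of the term-variation problem, orthographic variants, can be sketched as a normalization step that conflates case, hyphen, and spacing differences. This handles only orthographic variants; inflectional and coordinated variants, which the paper also analyses, are not covered, and the example term is merely a familiar biomedical name used for illustration.

```python
import re

def normalize(term):
    """Conflate simple orthographic variants: case, hyphens, and spacing."""
    term = term.lower()
    term = re.sub(r"[-\s]+", " ", term).strip()
    return term

# Three surface forms of the same term collapse to one normalized key.
variants = ["NF-kappa B", "NF kappa B", "nf-kappa b"]
normalized = {normalize(v) for v in variants}
```

A term recognition system would use such normalized keys when matching literature mentions against a controlled dictionary, reducing the literature-versus-dictionary discrepancy the abstract describes.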