Publication catalogue - books



Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM' 05 Conference held in Gdansk, Poland, June 13-16, 2005

Mieczysław A. Kłopotek ; Sławomir T. Wierzchoń ; Krzysztof Trojanowski (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25056-2

Electronic ISBN

978-3-540-32392-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

A Neural Network Based Morphological Analyser of the Natural Language

Piotr Jędrzejowicz; Jakub Strychowski

The paper proposes a morphological analyser, supported by a neural network, that inflects words written in Polish. The approach can also be applied to other languages. The main task of the analyser is to create base forms from the analysed word forms. Another objective is to provide grammatical information about the analysed form. Computational experiments confirm that the proposed neural network based morphological analyser fulfils both objectives. Common words are inflected with a very high quality of nearly 99.9%. Other words, such as geographical names and people’s names, are inflected with a quality reaching 93.3% thanks to the incorporation of the neural network.

Part II - Regular Sessions: Computational Linguistics | Pp. 199-208
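The abstract does not describe the network's inputs or outputs. One common way to frame base-form generation is as classification over suffix transformations, where each form/base pair is reduced to "strip k characters, then append s". The Python sketch below learns such transformations from example pairs and, for words outside the lexicon, applies the most frequent transformation seen with the longest matching suffix. That longest-suffix lookup is a simple stand-in for the neural network, not the authors' method; the class and function names and the Polish training pairs are illustrative assumptions.

```python
from collections import Counter, defaultdict


def transformation(form, base):
    """Return (chars to strip from form, string to append) that turns form into base."""
    i = 0
    # Length of the longest common prefix of form and base.
    while i < min(len(form), len(base)) and form[i] == base[i]:
        i += 1
    return len(form) - i, base[i:]


def apply_transformation(form, rule):
    strip, append = rule
    return form[: len(form) - strip] + append


class SuffixLemmatizer:
    """Hypothetical analyser: exact lexicon lookup, then longest-suffix fallback
    (standing in for the neural network used in the paper)."""

    def __init__(self, max_suffix=4):
        self.max_suffix = max_suffix
        self.lexicon = {}
        self.by_suffix = defaultdict(Counter)  # suffix -> counts of transformations

    def train(self, pairs):
        for form, base in pairs:
            self.lexicon[form] = base
            rule = transformation(form, base)
            for k in range(1, min(self.max_suffix, len(form)) + 1):
                self.by_suffix[form[-k:]][rule] += 1

    def lemmatize(self, form):
        if form in self.lexicon:
            return self.lexicon[form]
        # Try the longest known suffix first; use its most frequent transformation.
        for k in range(min(self.max_suffix, len(form)), 0, -1):
            counts = self.by_suffix.get(form[-k:])
            if counts:
                return apply_transformation(form, counts.most_common(1)[0][0])
        return form  # nothing known about this ending: leave the form unchanged
```

After training on the pairs ("kota", "kot"), ("domu", "dom"), ("lasu", "las") and ("nosa", "nos"), the unseen form "katalogu" is reduced to "katalog" via the suffix "u". Stem alternations such as "stołu" to "stół" are also expressible, as the transformation (3, "ół").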

An Oracle Grammar-Rule Learning Based on Syntactic Knowledge

Minghu Jiang; Huiying Cai; Dafan Liu; Junzhen Wang

In this paper, we put forward two algorithms for learning Chinese oracle-bone grammar rules: automatic learning of rule sets and error-emended learning from the corpus. By analysing the syntactic knowledge of oracle-bone inscriptions, an error-emended learning method is used to construct a rule base of oracle-bone phrase structures. The method combines linguists’ introspection and generalization with corpora to capture rules and describe them with formalized symbols. Using part-of-speech and contextual information together, our experimental results show that the learning and expression are effective for the oracle-bone grammar rule base.

Part II - Regular Sessions: Computational Linguistics | Pp. 209-218
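The abstract does not give the error-emended algorithm in detail. As one well-known instance of error-driven rule learning, the Python sketch below follows transformation-based learning in the style of Brill: start from a baseline tagging, then greedily add the rule "change tag A to B when the previous tag is C" that removes the most errors against a gold-annotated corpus. This is a swapped-in technique, not the authors' method, and the toy English sentence, tag names, and helper names are hypothetical.

```python
from itertools import product


def apply_rule(tags, rule):
    """Rule (frm, to, prev): retag frm as to wherever the previous tag is prev."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked on the input tags, so changes do not cascade.
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out


def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(words, gold, lexicon, max_rules=5):
    """Greedy error-driven learning of contextual retagging rules."""
    tags = [lexicon[w] for w in words]  # baseline: one default tag per word
    tagset = sorted(set(gold) | set(tags))
    rules = []
    for _ in range(max_rules):
        best, best_err = None, errors(tags, gold)
        for rule in product(tagset, repeat=3):
            if rule[0] == rule[1]:
                continue  # a rule must actually change the tag
            err = errors(apply_rule(tags, rule), gold)
            if err < best_err:
                best, best_err = rule, err
        if best is None:  # no rule reduces the error any further
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules, tags
```

On the toy sentence "the can rusts I can swim", with "can" defaulting to AUX, the learner finds the single rule ("AUX", "N", "DET"), which fixes the first "can" and leaves the second untouched.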

Speech Interface for Internet Service “Yellow Pages”

Alexey Karpov; Andrey Ronzhin

The paper describes a new automatic service for telephone directory inquiries that works without a human operator. We propose to use an automatic speech recognition system to answer users’ queries. Information about firms and organizations can be taken from the electronic catalogue “Yellow Pages of Saint-Petersburg”, which is available on the Internet and regularly updated with new information. Developing such a system for Russian poses significant problems connected with the complex word-formation structure of the Russian language. The developed speech recognition system SIRIUS has one additional level of Russian language representation: the morphemic level. As a result, the vocabulary size and the speech processing time are significantly decreased. The paper also presents the dialogue model for voice control of the electronic catalogue “Yellow Pages”. The first experimental results suggest good prospects for developing a large-vocabulary telephone speech recognizer for Russian.

Part II - Regular Sessions: Computational Linguistics | Pp. 219-228
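The abstract does not say how SIRIUS segments words into morphemes. The Python sketch below only illustrates why a morphemic level shrinks the recognition vocabulary: it splits each word form into a stem and an ending from a small hypothetical list of Russian noun endings (longest match first) and compares the number of distinct word forms with the number of distinct units. The ending list, the function names, and the example words are assumptions; real morphological segmentation of Russian is considerably harder.

```python
# Hypothetical, illustrative subset of Russian noun endings.
ENDINGS = ("ами", "ах", "ов", "ом", "ы", "а", "у", "е")


def split_morphemes(word, endings=ENDINGS):
    """Split word into (stem, ending), trying longer endings first."""
    for ending in sorted(endings, key=len, reverse=True):
        # Require a non-empty stem so short words are not consumed entirely.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending
    return word, ""


def vocab_sizes(words):
    """Return (distinct word forms, distinct stem/ending units)."""
    word_vocab = set(words)
    unit_vocab = set()
    for w in word_vocab:
        stem, ending = split_morphemes(w)
        unit_vocab.add(stem)
        if ending:
            unit_vocab.add(ending)
    return len(word_vocab), len(unit_vocab)


words = ["банк", "банка", "банку", "банком", "банками",
         "завод", "завода", "заводу", "заводом", "заводами"]
print(vocab_sizes(words))
```

With five case forms each of "банк" (bank) and "завод" (factory), ten word forms reduce to six units: two stems and four shared endings. The saving grows with the number of stems, because the endings are shared across them.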

Automatic Information Classifier Using Rhetorical Structure Theory

Hassan Mathkour; Ameur Touir; Waleed Al-Sanie

Information classification aims to protect documents from disclosure. Information is classified according to its critical semantics. The decision to classify a portion of a document as ‘secret’ depends on the effect its disclosure would have on the organization for which the document was written. However, understanding the semantics of a document is not an easy task. Rhetorical structure theory (RST) is one of the leading theories developed for this purpose. In this paper, we explain a technique for classifying information using RST.

Part II - Regular Sessions: Computational Linguistics | Pp. 229-236

Rule-Based Medical Content Extraction and Classification

Agnieszka Mykowiecka; Anna Kupść; Małgorzata Marciniak

We present the final version of a system for automatic content extraction from Polish medical data. The system combines general IE techniques with external post-processing. The extracted data are normalized and linked to a simplified ontology. They are then automatically grouped into more complex structures representing medical reports.

Part II - Regular Sessions: Computational Linguistics | Pp. 237-245

A Rule-Based Tagger for Polish Based on Genetic Algorithm

Maciej Piasecki; Bartłomiej Gaweł

The paper proposes an approach to constructing a rule-based morphosyntactic tagger for Polish. The core of the tagger consists of modules of rules (classification systems) acquired from the IPI PAN corpus by applying genetic algorithms. Each module specializes in decisions concerning a different part of a tag (a structure of attributes). The acquired rules are combined with hand-written linguistic rules and with memory-based rules, also acquired from the corpus. The paper also presents the construction of the tagger and experiments concerning its properties.

Part II - Regular Sessions: Computational Linguistics | Pp. 247-255
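The abstract does not give the rule encoding or the genetic operators. The Python sketch below shows the general shape of acquiring one contextual rule with a genetic algorithm. A rule is a tuple of conditions on context attributes (or a wildcard), and its fitness is the number of training cases it decides correctly. The population then evolves by tournament selection, uniform crossover, mutation, and elitism. The attribute layout (previous and next part of speech), the binary decision, the function names, and all parameter values are hypothetical.

```python
import random

WILDCARD = "*"


def fires(rule, context):
    """A rule fires when every non-wildcard condition matches the context."""
    return all(cond == WILDCARD or cond == value for cond, value in zip(rule, context))


def fitness(rule, examples):
    """Number of training cases decided correctly (rule fires <=> label is True)."""
    return sum(fires(rule, ctx) == label for ctx, label in examples)


def evolve(examples, values, pop_size=20, generations=30, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    n_attrs = len(examples[0][0])
    alleles = list(values) + [WILDCARD]
    pop = [tuple(rng.choice(alleles) for _ in range(n_attrs)) for _ in range(pop_size - 1)]
    pop.append((WILDCARD,) * n_attrs)  # seed the population with the most general rule
    for _ in range(generations):
        scored = sorted(pop, key=lambda r: fitness(r, examples), reverse=True)
        next_pop = [scored[0]]  # elitism: the best rule survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (best of three random rules).
            a = max(rng.sample(pop, 3), key=lambda r: fitness(r, examples))
            b = max(rng.sample(pop, 3), key=lambda r: fitness(r, examples))
            # Uniform crossover, then per-gene mutation.
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))
            child = tuple(rng.choice(alleles) if rng.random() < p_mut else g for g in child)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda r: fitness(r, examples))


examples = [
    (("prep", "adj"), True), (("prep", "subst"), True), (("verb", "subst"), False),
    (("subst", "adj"), False), (("prep", "verb"), True), (("adj", "subst"), False),
]
values = sorted({v for ctx, _ in examples for v in ctx})
best = evolve(examples, values)
```

Because of elitism and the seeded general rule, the returned rule is never worse than ("*", "*"). On this toy data the ideal rule ("prep", "*") decides all six cases correctly.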

On Some Clustering Algorithms for Document Maps Creation

Krzysztof Ciesielski; Michał Dramiński; Mieczysław A. Kłopotek; Mariusz Kujawiak; Sławomir T. Wierzchoń

In this research paper we point out the need to redesign the WebSOM document map creation algorithm. We argue that SOM clustering should be preceded by identifying the major topics of the document collection. Furthermore, SOM clustering should be preceded by a pre-clustering process that creates groups of more strongly related documents; the groups, not the individual documents, should be the subject of SOM clustering. We propose appropriate algorithms and report the improvements achieved.

Part III - Regular Sessions: Search Engines | Pp. 259-268
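The abstract does not specify the pre-clustering or topic-identification algorithms. The Python sketch below illustrates only the pipeline order it argues for. Documents (as feature vectors) are first grouped by a simple leader-clustering pass, and a one-dimensional self-organizing map is then trained on the group centroids rather than on individual documents. Leader clustering, the distance threshold, the linear node initialisation, the learning-rate and radius schedules, and the helper names are all assumptions; a real document map would normally be two-dimensional.

```python
import math
import random


def pre_cluster(docs, threshold):
    """Leader clustering: each document joins the first group whose leader is within threshold."""
    groups = []
    for doc in docs:
        for group in groups:
            if math.dist(group[0], doc) <= threshold:
                group.append(doc)
                break
        else:  # no leader close enough: the document starts a new group
            groups.append([doc])
    return groups


def centroid(vectors):
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]


def train_som(inputs, n_nodes, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM (a chain of n_nodes >= 2 nodes) on the input vectors."""
    rng = random.Random(seed)
    lo = [min(xs) for xs in zip(*inputs)]
    hi = [max(xs) for xs in zip(*inputs)]
    # Deterministic initialisation: nodes spread evenly between lo and hi.
    nodes = [[a + (b - a) * j / (n_nodes - 1) for a, b in zip(lo, hi)] for j in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.01)  # neighbourhood shrinks linearly
        for x in rng.sample(inputs, len(inputs)):  # shuffled pass over the inputs
            bmu = min(range(n_nodes), key=lambda j: math.dist(nodes[j], x))
            for j in range(n_nodes):
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                nodes[j] = [w + lr * h * (xi - w) for w, xi in zip(nodes[j], x)]
    return nodes


def map_groups(groups, nodes):
    """Index of the best-matching map node for each group's centroid."""
    return [min(range(len(nodes)), key=lambda j: math.dist(nodes[j], centroid(g)))
            for g in groups]


docs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
groups = pre_cluster(docs, threshold=0.3)
nodes = train_som([centroid(g) for g in groups], n_nodes=4)
print(map_groups(groups, nodes))
```

The map is trained on two centroids instead of six documents, which is the cost saving the pre-clustering step is meant to provide. Each document inherits the map position of its group.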

Filtering Web Documents for a Thematic Warehouse Case Study: eDot a Food Risk Data Warehouse (extended)

Amar-Djalil Mezaour

Ordinary sources, such as databases and general-purpose document collections, seem insufficient and inadequate to meet the needs and requirements of a new generation of warehouses: thematic data warehouses. Since more and more thematic data is available online, the Web can be considered a useful data source for populating thematic data warehouses. To do so, the warehouse data supplier must be able to filter heterogeneous Web content and keep only the documents corresponding to the warehouse topic. Building efficient automatic tools that characterize Web documents dealing with a given theme is therefore essential for addressing the warehouse data acquisition problem. In this paper, we present our filtering approach, implemented in an automatic tool called “”. This tool filters crawled documents to keep only those dealing with food risk. These documents are then stored in a thematic warehouse called “”. Our filtering approach is based on “”, a declarative Web query language that improves the expressive power of keyword-based queries.

Part III - Regular Sessions: Search Engines | Pp. 269-278
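The name of the query language is missing from this abstract, so its syntax cannot be shown. As a hypothetical illustration of a query that is more expressive than a bag of keywords, the Python sketch below requires each keyword to appear inside a particular HTML element (for example the title). It uses only the standard library `html.parser` module. The query format, the helper names, and the sample page are assumptions, not the tool's actual design.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementText(HTMLParser):
    """Collect lowercased text grouped by the innermost enclosing element."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack:
            tag = self.stack[-1]
            self.text[tag] = self.text.get(tag, "") + " " + data.lower()


def matches(html, query):
    """query is a list of (element, keyword) pairs; all of them must match."""
    parser = ElementText()
    parser.feed(html)
    parser.close()
    return all(keyword.lower() in parser.text.get(element, "") for element, keyword in query)


page = ("<html><head><title>Salmonella risk in poultry</title></head>"
        "<body><p>Food safety report.</p></body></html>")
print(matches(page, [("title", "salmonella"), ("p", "food")]))
```

A plain keyword query for "food" would accept this page regardless of where the word occurs. The structured query can demand it in the title, which rejects pages that only mention the topic in passing.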

Data Merging in Life Science Data Integration Systems

Tadeusz Pankowski; Ela Hunt

An index-driven integration system provides access to a multitude of data sources by using pre-compiled indexes that cover the content of these sources. Such a scenario is especially attractive in life science applications, which integrate data from hundreds of valuable, carefully maintained databases. A key bottleneck in building such systems is data merging: partial answers obtained from different data sources must be merged, and the problem of overlapping data must be solved. In response to a query, a redundancy-free answer should be constructed. In this paper we propose a formal foundation for merging XML-like data and discuss indexing support for data merging.

Part III - Regular Sessions: Search Engines | Pp. 279-288
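The abstract does not give the formal merge operation. The Python sketch below shows one simple reading of redundancy-free merging for XML-like data held as nested dictionaries. Elements are matched by key, with repeated elements keyed by a hypothetical record identifier. Atomic lists are merged as sets, identical leaves are kept once, and conflicting leaves raise an error rather than being silently overwritten. The data layout, the identifiers, and the `merge` helper are assumptions for illustration.

```python
def merge(a, b):
    """Merge two XML-like trees (nested dicts, lists of atoms, leaves) without duplicates."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            # Elements with the same key describe the same thing: merge them recursively.
            out[key] = merge(out[key], value) if key in out else value
        return out
    if isinstance(a, list) and isinstance(b, list):
        out = list(a)
        for item in b:
            if item not in out:  # drop values already present (overlap between sources)
                out.append(item)
        return out
    if a == b:
        return a  # identical leaf from both sources: keep one copy
    raise ValueError(f"conflicting values: {a!r} vs {b!r}")


source1 = {"protein": {"acc1": {"name": "Hemoglobin alpha", "organism": "Homo sapiens"}}}
source2 = {"protein": {"acc1": {"name": "Hemoglobin alpha", "keywords": ["oxygen transport"]},
                       "acc2": {"name": "Hemoglobin beta"}}}
merged = merge(source1, source2)
```

Record "acc1" appears in both partial answers; the merged result contains it once, with the union of its fields. Raising on conflicts is a deliberate design choice here: a real system would need a policy (source priority, provenance annotations) for disagreeing values.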

Approximation Quality of the RBS Ranking Algorithm

Marcin Sydow

The RBS algorithm is a novel link-based algorithm for ranking search engine results. RBS may be viewed as an extension of PageRank with parameterized modeling of the “back button”. RBS is based on the “random surfer with back step” model [7], just as PageRank is based on the simpler “random surfer” model [4]. To scale to the real Web, RBS computes only a fast probabilistic approximation of the ranking induced by the “random surfer with back step” model [6].

In this paper we experimentally measure the quality of this approximation using a high-quality synthetic Web evolution model [5] that we implemented ourselves.

The results demonstrate that RBS is a very good approximation to the “ideal” ranking. Furthermore, as the experiment shows, RBS clearly outperforms PageRank in “back step” modeling even if we try to parameterize the latter.

Part III - Regular Sessions: Search Engines | Pp. 289-296
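The back-step model is not spelled out in this abstract, so the Python sketch below covers only the two pieces it builds on. The first is PageRank computed by power iteration over the plain "random surfer", with damping factor 0.85 and uniform redistribution of rank from pages without out-links, both conventional assumptions. The second is Kendall's tau, one common way to score how closely an approximate ranking agrees with a reference ranking; the paper's own quality measure may differ. A truncated power iteration stands in here for a fast approximation; it is not RBS.

```python
from itertools import combinations


def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on {page: [out-linked pages]}."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank of dangling pages (no out-links) is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score dicts over the same (at least two) pages."""
    pages = sorted(scores_a)
    concordant = discordant = 0
    for p, q in combinations(pages, 2):
        s = (scores_a[p] - scores_a[q]) * (scores_b[p] - scores_b[q])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(pages) * (len(pages) - 1) / 2
    return (concordant - discordant) / n_pairs


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
exact = pagerank(links)
approx = pagerank(links, max_iter=1)  # crude stand-in for a fast approximation
print(kendall_tau(approx, exact))
```

On this three-page graph, page C collects rank from both A and B and ends up first. Even a single power-iteration step already orders the pages as the converged ranking does, so the tau score is 1.0. On realistic graphs, the interesting question, as in the paper, is how close a cheap approximation gets to the ideal ordering.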