Publication catalog - books
Aspects of Automatic Text Analysis
Alexander Mehler; Reinhard Köhler
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Appl. Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics); Document Preparation and Text Processing; Language Translation and Linguistics; Pattern Recognition
Availability

Detected institution | Publication year | Browse | Download | Request
---|---|---|---|---
Not detected | 2007 | SpringerLink | |
Information
Resource type:
books
Print ISBN
978-3-540-37520-3
Electronic ISBN
978-3-540-37522-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer 2007
Subject coverage
Table of contents
Inferring Meaning: Text, Technology and Questions of Induction
Michael Stubbs
Corpus linguists, including lexicographers, use methods which are often called ‘inductive’. That is, they study large corpora, or large data sets (such as word-frequency lists) derived from these corpora, in order to identify patterns in the data. There is detailed discussion of a few statistical techniques (e.g. for identifying significant collocations), but little general discussion of the combination of automatic and intuitive methods which are used to make significant generalizations. It might be thought that, if linguists draw generalizations from large data sets, then they would generally agree about the resulting analyses, and certainly corpus work often reaches a remarkably broad consensus across different studies. Findings from one corpus are regularly corroborated by studies of other, independent corpora, and partly automated or computer-assisted analysis has led to major progress in the study of semantic and pragmatic data.
Part IV - Corpus Linguistic and Text Technological Modeling | Pp. 233-253
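One family of statistical techniques for identifying significant collocations is association scoring over adjacent word pairs. A minimal sketch using pointwise mutual information (the toy corpus and the `min_count` threshold are illustrative assumptions, not taken from the chapter):

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:          # ignore rare pairs: PMI is unstable there
            continue
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

corpus = ("strong tea is strong tea not powerful tea "
          "powerful computer not strong computer").split()
for pair, score in sorted(pmi_bigrams(corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In practice a significance test such as log-likelihood ratio is often preferred over raw PMI, since PMI overweights rare events.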
Linguistic Information Modeling: From Kilivila Verb Morphology to RelaxNG
Dieter Metzing; Jens Pönninghaus
We will explore the role of an advanced type of document grammar, RelaxNG, in the context of different approaches to the formalization of linguistic regularities based on corpora and XML annotations. Our domain of exploration will be Kilivila verb morphology. The following topics will be focused on: Which kinds of regularities in the domain can be expressed given the formal limitations of document grammars, i.e. tree grammars? Which linguistic analyses may be taken as a basis for document grammar development? In which way can a document grammar be sensitive to properties of annotations and raw data (document validation and data validation)? Which kinds of formalization may be helpful in the (semi-automatic) development of a document grammar in the case explored? In the first part we will consider aspects of Kilivila verb morphology from the point of view of linguistic analyses. In the second part different strategies for the development of a RelaxNG-based document grammar will be examined.
Part IV - Corpus Linguistic and Text Technological Modeling | Pp. 255-276
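The formal limitation mentioned above, that document grammars are essentially tree grammars, can be illustrated with a toy validator that checks each element's children against a regular content model, the core mechanism that RelaxNG generalizes. The tag names and morpheme classes below are illustrative assumptions, not the chapter's actual Kilivila schema:

```python
import re
import xml.etree.ElementTree as ET

# Toy content models in the spirit of a document grammar: each element's
# allowed sequence of children is a regular expression over child tag names.
CONTENT_MODELS = {
    "verb": r"(prefix )*stem( suffix)*",
    "prefix": r"",   # leaf elements: no element children allowed
    "stem": r"",
    "suffix": r"",
}

def validate(elem):
    """Recursively check elem and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:                       # unknown element type
        return False
    children = " ".join(child.tag for child in elem)
    if not re.fullmatch(model, children):   # content model violated
        return False
    return all(validate(child) for child in elem)

doc = ET.fromstring(
    "<verb><prefix>ku</prefix><stem>kau</stem><suffix>si</suffix></verb>")
print(validate(doc))   # True
bad = ET.fromstring("<verb><suffix>si</suffix></verb>")
print(validate(bad))   # False: a stem is required
```

A real RelaxNG schema adds interleaving, co-occurrence constraints between attributes and content, and datatype checks on the raw text, which is exactly where the validation questions raised in the abstract become interesting.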
Affix Discovery by Means of Corpora: Experiments for Spanish, Czech, Ralámuli and Chuj
Alfonso Medina-Urrea
Although the focus on morpheme-discovery techniques originated within those linguistic schools which inherited from Franz Boas the concern for the unknown languages of the New World, automatic, unsupervised morphological segmentation remains a field of interest for the computational processing and engineering of natural languages, as well as for the plain exercise of getting to know them intimately.
Part IV - Corpus Linguistic and Text Technological Modeling | Pp. 277-299
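A crude baseline for unsupervised affix discovery is to rank candidate suffixes by how many distinct stems they combine with. The sketch below only illustrates the general idea; it is not Medina-Urrea's method (which relies on measures such as affix economy and entropy), and the toy Spanish-flavoured lexicon is an assumption:

```python
from collections import defaultdict

def discover_suffixes(words, max_len=4, min_stems=2):
    """Rank candidate suffixes by the number of distinct stems they
    attach to -- a simple unsupervised segmentation baseline."""
    stems = defaultdict(set)
    for w in words:
        # consider every split of w into stem + suffix (non-empty stem)
        for k in range(1, min(max_len, len(w) - 1) + 1):
            stems[w[-k:]].add(w[:-k])
    ranked = {s: len(st) for s, st in stems.items() if len(st) >= min_stems}
    return sorted(ranked.items(), key=lambda kv: -kv[1])

words = ["cantar", "cantamos", "canto", "hablar", "hablamos", "hablo",
         "comemos", "como"]
print(discover_suffixes(words)[:5])
```

Even on this tiny lexicon the inflectional endings -o, -amos and -ar surface near the top; real systems refine such counts with entropy at the stem/affix boundary and with the economy of the resulting paradigms.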
Licensing Strategies in Natural Language Processing
Jürgen Rolshoven
This article discusses strategies for licensing within the framework of generative grammar, as well as their application in the LPS linguistic processing system. LPS is a linguistic programming language developed at the Institute for Linguistic Data Processing at the University of Cologne. It is also a computer system which employs this language for natural language processing, in particular for machine translation. In the introduction we give a brief sketch of the development of formal grammar and derive the idea of licensing from this development. We also describe the generation of structures in linguistic processing systems by means of the object-oriented linguistic programming language LPS. The third part discusses optimization strategies based on the competition of variant structures evaluated by means of licensing. The concluding fourth part discusses licensing specifically as a topic of computational linguistics, with the aim of determining whether it belongs to the domain of performance or of competence.
Part IV - Corpus Linguistic and Text Technological Modeling | Pp. 301-320
The Surface of Argumentation and the Role of Subordinating Conjunctions
Winfried Lenders
In modern language processing systems such as Machine Translation (MT), Information Extraction (IE) or Information Retrieval (IR) systems, the classification of texts and the declaration of the domain of a particular text may help to optimize the system’s performance. However, in the future, in most of these systems the identification of text type and text classification may be done by statistical methods in combination with some structural analysis (tagging, parsing, lemmatization). It may be interesting to see whether there are structural indicators for particular classes of texts, as has been postulated by previous discourse theories. Statistical methods primarily use characteristics of the vocabulary of a text, such as the conditional probability that a certain element belongs to a certain class, neighbourhood measures, entropy, etc., in order to identify the most probable class or domain of a text. These tools perform much better when a text contains special terminology or special language. They do not use any structural criteria, which are – from the linguist’s point of view – most interesting and relevant for a typological classification. It seems, however, that up to now we have relatively poor knowledge of the characteristic differences and structural similarities of texts and registers (cf. Biber et al. [2, p. 106]). This could only be explored by an exhaustive and multidimensional exploration of large text corpora [2]. The reason why such explorations are not yet available is that we still lack sufficiently robust tools for automatic structural analysis.
Part V - Text Categorization and Classification | Pp. 323-337
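The vocabulary-based statistical classification described above, conditional probabilities of class membership given the words of a text, can be sketched as a multinomial naive Bayes classifier. The training texts and class labels are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing: assigns a text the
    class c maximizing log P(c) + sum over words w of log P(w | c)."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.prior = Counter(labels)                 # class frequencies
        self.word_counts = defaultdict(Counter)      # per-class word counts
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        n = sum(self.prior.values())
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.prior[c] / n)
            total = sum(self.word_counts[c].values())
            for w in doc.split():
                # add-one smoothing over the shared vocabulary
                lp += math.log((self.word_counts[c][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = NaiveBayes().fit(
    ["the verb agrees with the subject", "parse trees and grammars",
     "stock prices fell sharply", "markets rallied on earnings"],
    ["linguistics", "linguistics", "finance", "finance"])
print(nb.predict("grammars of the verb"))   # expected: linguistics
```

Note that the features here are purely lexical, exactly the limitation the abstract points out: no structural criterion (tagging, parsing) enters the decision.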
Computing with Words for Text Categorization
Janusz Kacprzyk; Slawomir Zadrożny
We discuss the use of some elements of Zadeh’s computing with words and perceptions paradigm (cf. Zadeh and Kacprzyk [37, 38]) for the formulation and solution of the automatic text document categorization problem. This problem is constantly gaining importance and popularity in view of the fast proliferation of textual information available on the Internet. The main issues addressed are document representation and classification. The use of fuzzy logic for both problems has already been studied in some depth, though for the latter, i.e. classification, mainly in a more general context. Our approach is based mainly on the use of usuality qualification in the computing with words and perceptions paradigm, which is technically handled by Zadeh’s classic calculus of linguistically quantified propositions [36]. Moreover, we employ results related to fuzzy (linguistic) queries in information retrieval, in particular various interpretations of the weights of query terms. The methods developed are illustrated with an example based on a well-known text corpus.
Part V - Text Categorization and Classification | Pp. 339-362
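Zadeh's calculus of linguistically quantified propositions evaluates a statement such as "most documents match the query" by applying the quantifier's membership function to the mean membership degree of the items. A minimal sketch; the piecewise-linear shape of `mu_most` and the matching degrees are illustrative assumptions:

```python
def mu_most(r):
    """Membership function for the relative quantifier 'most':
    0 below a proportion of 0.3, 1 above 0.8, linear in between
    (an illustrative shape, not the chapter's calibration)."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5

def truth_of_quantified(memberships, quantifier=mu_most):
    """Truth of 'Q of the items satisfy P' for a relative quantifier Q:
    Q applied to the average membership degree (Zadeh's calculus)."""
    r = sum(memberships) / len(memberships)
    return quantifier(r)

# Degrees to which five documents match a query term (illustrative):
match_degrees = [0.9, 0.8, 0.7, 0.4, 0.6]
print(round(truth_of_quantified(match_degrees), 2))   # 0.76
```

The same machinery supports usuality qualification ("usually, documents of this class contain term t"), which is how the chapter connects computing with words to categorization.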
Neural Networks, Fuzzy Models and Dynamic Logic
Leonid I. Perlovsky
The paper discusses possible relationships between computational intelligence, known mechanisms of the mind, semiotics, and computational linguistics. Mathematical mechanisms of concepts, emotions, and goals are described as a part of information processing in the mind and are related to language and thought processes in which an event (signals from the surrounding world, a text corpus, or inside the mind) is understood as a concept. Previous attempts in artificial intelligence at describing thought processes are briefly reviewed and their fundamental (mathematical) limitations are analyzed. The role of emotional signals in overcoming these past limitations is emphasized. The paper describes mathematical mechanisms of concepts applicable to sensory signals and linguistics; they are based on measures of similarity between models and signals. Linguistic similarities are discussed that can utilize various structures and rules proposed in the computational linguistics literature. A hierarchical structure of the proposed method is capable of learning and recognizing concepts from textual data, from the level of words up to sentences, groups of sentences, and towards large bodies of text. I briefly discuss the role of concepts as a mechanism unifying thinking and language, and their possible role in language acquisition. A thought process is related to the semiotic notions of signs and symbols. It is further related to understanding, imagination, intuition, and other processes in the mind. The paper briefly discusses relationships between the mind and brain and applications to understanding-based search engines.
Part V - Text Categorization and Classification | Pp. 363-386
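The similarity-maximizing matching between models and signals can be sketched as an iteration that softly associates each signal with each parametric model and then re-estimates the models from their weighted signals. This one-dimensional Gaussian toy is only in the spirit of dynamic logic's vague-to-crisp matching, not Perlovsky's full formulation; all numbers are illustrative:

```python
import math

def dl_step(signals, means, sigma):
    """One dynamic-logic-style iteration: compute fuzzy associations
    f(m|n) between signals and models from a Gaussian similarity,
    then re-estimate each model parameter from its weighted signals."""
    assoc = []
    for x in signals:
        sims = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
        z = sum(sims)
        assoc.append([s / z for s in sims])   # normalized associations
    new_means = []
    for j in range(len(means)):
        w = [assoc[i][j] for i in range(len(signals))]
        new_means.append(sum(wi * x for wi, x in zip(w, signals)) / sum(w))
    return new_means

signals = [0.9, 1.1, 1.0, 4.9, 5.1, 5.0]   # two underlying "concepts"
means = [0.0, 6.0]                          # vague initial models
for _ in range(20):                         # associations sharpen as models fit
    means = dl_step(signals, means, sigma=1.0)
print([round(m, 2) for m in means])         # roughly [1.0, 5.0]
```

In the full method the vagueness parameter (here `sigma`) itself shrinks during the iteration, which is what lets initially vague models lock crisply onto concepts; the same scheme is proposed for matching linguistic models to text.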
A Cognitive Systems Approach to Automatic Text Analysis
Gert Rickheit; Hans Strohner
By taking cognitive aspects into account, some shortcomings of traditional accounts of automatic text analysis can be avoided. In particular, at least the aspects of world knowledge, the interaction between text and reader, and the impact of the communicative situation should be included. With regard to verbal information, a cognitive system is able to process a text by relating the text information to world knowledge and situational demands. On the basis of this interaction, the system produces inferences, which may lead to text analysis, text evaluation and communicative responses. As a core component of automatic text analysis, we present a cognitive theory of inference building. According to this theory, textual inferences are the product of an intimate interaction of verbal input and world knowledge in certain contexts. Without such inferential abilities, automatic text analysis is severely restricted. In support of this claim, we present some examples from various research projects.
Part VI - Cognitive Modeling | Pp. 389-399
System Theoretical Research on Language and Communication: The Extended Experimental-Simulative Method
Hans-Jürgen Eikmeyer; Walther Kindt; Hans Strohner
The following contribution presents experiences gained with a system theoretical methodology within the framework of the Collaborative Research Center (CRC 360) at Bielefeld University. The starting point for this methodology is, on the one hand, the conviction that theoretically and empirically grounded research on the complex subject of natural language communication needs a systematic and interdisciplinary integration of methods. On the other hand, this kind of integration is possible only on the basis of a system theoretical conception of linguistics, which combines the predominant structural analytical approach with a procedural analytical approach. The experts from the CRC called this a change in paradigms.
Part VI - Cognitive Modeling | Pp. 401-417
The Dimensionality of Text and Picture and the Cross-Cultural Organization of Semiotic Complexes
Wolfgang Wildgen
The distinction between picture and text involves a set of basic semiotic challenges. First, pictures are linked in their production to the motoricity of the hands and in their reception to the eye and the visual cortex. Language in its basic form, spoken language, is linked in its production to the motoricity of the human vocal apparatus (from the vocal cords to the lips) and in its perception to the ear and the auditory cortex. The dynamics of these four subsystems, and moreover the coordination of the pairs of subsystems in production and reception, define the baseline of any comparison of picture and text. The fact that written texts map the characteristics of spoken texts onto the dynamics of hands and eyes (to abbreviate the more complete description above) points to the fact that transitions between the two basic modalities have been achieved over the last millennia. If we take the abstract signs of the Palaeolithic as a point of departure (cf. Wildgen [32, 34]), this (cultural) evolution has been running for the last 30,000 years. An even deeper evolutionary opposition contrasts manual/facial sign languages with spoken language. The origin of human language after the proto-language of Homo erectus was basically a dominance shift from a slower and less rich system, at least partially based on visual/motor articulations, to a much quicker and richer system of phonetic/auditory articulation (cf. Wildgen [33]). We have no direct knowledge of the sign language of Homo erectus, but we may guess the characteristics of such a manually based language if we consider modern signed languages. Due to the use of the manual/visual mode, they show, in spite of being constructed in parallel to existing phonetic languages, characteristic deviations (cf. Emmorey [5], and Liddell [10]). The most characteristic differences concern the diversity of parameters and the relevance of gradient subsystems.
As [24] summarizes, the major parameter of spoken language is the recombinant system based on phonetic quality.
Part VII - Visual Systems Modeling | Pp. 421-442