Biblioteca Electrónica de Ciencia y Tecnología - Publications catalogue

Year of publication: 2006. Available on SpringerLink.

Introduction

Natalia Andrienko; Gennady Andrienko

The method of lexical chains is introduced for the first time to generate summaries from Chinese texts. The algorithm that computes lexical chains based on the HowNet knowledge database is modified to improve performance and to suit Chinese summarization. Moreover, the construction rules for lexical chains are extended, and relationships among more lexical items are used. The algorithm first constructs lexical chains; strong chains are then identified, and significant sentences are extracted from the text to generate the summary. Evaluation results show a notable improvement in both precision and recall compared with the original system.

Pp. 1-16

doi: 10.1007/3-540-31190-4_2

Data

Natalia Andrienko; Gennady Andrienko

Data represent results of the observation or measurement of phenomena. By means of data analysis, people can study these phenomena. Data analysis can be regarded as seeking answers to various questions regarding the phenomena. These questions, or, in other words, data analysis tasks, are the focus of our attention. In this chapter, we attempt to develop a general view of data, which will help us to understand what data analysis tasks are potentially possible.

We distinguish two types of components of data, referrers and attributes, which can also be called independent and dependent variables. A dataset can be viewed on an abstract level as a correspondence between references, i.e. values of the referrers, and characteristics, i.e. values of the attributes. Here are a few examples:

As may be seen from the last example, a dataset may contain several referrers. The second example shows that a dataset may contain any number of attributes.

The examples demonstrate the three most important types of referrers:

The term “population” is used in an abstract sense to mean a group of any items, irrespective of their nature.

We introduce a general view of a dataset structure as a function (in the mathematical sense) defining the correspondence between the references and the characteristics.
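The functional view can be illustrated with a small Python sketch. It is only an illustration under assumed data: the referrers (a place and a year), the attributes (population and share of children), the place names, and all values are hypothetical and do not come from the book.

```python
from typing import Dict, NamedTuple, Tuple

# Hypothetical referrers: a place (space) and a year (time).
Reference = Tuple[str, int]


class Characteristic(NamedTuple):
    # Hypothetical attributes recorded for each reference.
    population: int
    share_children: float


# The dataset as a function, stored extensionally: every reference
# maps to exactly one characteristic (a combination of attribute values).
dataset: Dict[Reference, Characteristic] = {
    ("Avon", 2000): Characteristic(1200, 0.21),
    ("Avon", 2001): Characteristic(1250, 0.20),
    ("Brook", 2000): Characteristic(800, 0.25),
    ("Brook", 2001): Characteristic(790, 0.26),
}


def f(place: str, year: int) -> Characteristic:
    """Return the characteristic that corresponds to one reference."""
    return dataset[(place, year)]
```

Calling `f("Avon", 2001)` returns the characteristic recorded for that place and year. Using a dictionary keyed by a tuple reflects the idea that a dataset may have several referrers and any number of attributes.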

Pp. 17-46

doi: 10.1007/3-540-31190-4_3

Tasks

Natalia Andrienko; Gennady Andrienko

In this chapter, we use the metaphor of a mathematical function to identify the types of tasks (questions) involved in exploratory data analysis. A task is viewed as consisting of two parts: the target, i.e. what information needs to be obtained, and the constraints, i.e. what conditions this information needs to fulfil. The target and constraints can be understood as unknown and known (specified) information, respectively; the goal is to find the initially unknown information corresponding to the specified information.

Our task typology has its origin in the ideas expressed by Jacques Bertin (Bertin 1967/1983). Like Bertin, we distinguish tasks according to the level of data analysis (“reading level”, in Bertin’s terms) but additionally take into account the division of data components into referrers and attributes:

The tasks are further divided according to the target (“question type”, in Bertin’s terms), i.e. what is the unknown information that needs to be found. At the elementary level, the target may be one or more characteristics (attribute values) or one or more references (referrer values). For example:

It is important that, when a task involves several references, each of them is dealt with individually.
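The split between target and constraints at the elementary level can be sketched in Python as two small query functions. This is a hedged illustration, not the authors' formalism: the dataset, the function names `characteristics_for` and `references_where`, and the values are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical dataset: day (reference) -> closing price (characteristic).
prices: Dict[str, float] = {
    "Mon": 10.0, "Tue": 10.5, "Wed": 9.8, "Thu": 10.5, "Fri": 11.2,
}


def characteristics_for(refs: List[str]) -> Dict[str, float]:
    """Constraint: the given references. Target: their characteristics.

    Each reference is looked up individually, not as a set.
    """
    return {r: prices[r] for r in refs}


def references_where(condition: Callable[[float], bool]) -> List[str]:
    """Constraint: a condition on the characteristic. Target: the references."""
    return [r for r, value in prices.items() if condition(value)]
```

For instance, `characteristics_for(["Mon", "Wed"])` answers "what were the prices on Monday and Wednesday?", while `references_where(lambda v: v > 10.4)` answers "on which days was the price above 10.4?".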

We have extended Bertin’s ideas by explicitly considering the possible relations between references and between characteristics. Relations may also appear in a task target or may be used in task constraints. For example:

At the synoptic level of analysis, we introduce the notion of a “behaviour”: the set of all characteristics corresponding to a given reference (sub)set, considered in its entirety and in its particular organisation with respect to the reference (sub)set. The behaviour is a generalisation of such notions as distributions, variations, and trends; for example, the variation of the proportions of children over the whole country or the trend in a stock price over a week.
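A behaviour can be sketched in Python as the characteristics of a reference subset taken together and kept in the order of the references. The sketch below is an assumption-laden illustration: the price data, the function names `behaviour` and `overall_trend`, and the crude first-versus-last comparison are hypothetical simplifications, not a method described in the book.

```python
from typing import Dict, List, Tuple

# Hypothetical dataset: day index (reference) -> closing price.
prices: Dict[int, float] = {1: 10.0, 2: 10.5, 3: 9.8, 4: 10.5, 5: 11.2}


def behaviour(ref_subset: List[int]) -> List[Tuple[int, float]]:
    """All characteristics for a reference subset, ordered by reference
    so that their organisation with respect to the subset is preserved."""
    return [(r, prices[r]) for r in sorted(ref_subset)]


def overall_trend(b: List[Tuple[int, float]]) -> str:
    """Crude synoptic summary: compare the last value with the first."""
    first, last = b[0][1], b[-1][1]
    if last > first:
        return "increasing"
    if last < first:
        return "decreasing"
    return "flat"


week = behaviour([1, 2, 3, 4, 5])
```

Here `week` is treated as a single object, and `overall_trend(week)` describes it as a whole instead of listing its individual values, which is the difference between a synoptic and an elementary view.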

Synoptic tasks involve reference (sub)sets, behaviours, and relations between (sub)sets or between behaviours. Here are a few examples:

Elementary tasks play a marginal role in exploratory data analysis, as compared with synoptic tasks. Among synoptic tasks, the most challenging are tasks of finding significant connections between phenomena, such as cause-effect relations or structural links, and of finding the principles of the internal organisation, functioning, and development of a single phenomenon. We call such tasks “connectional”.

The main purpose of our task typology is to evaluate the existing tools and techniques for EDA in terms of their suitability for different tasks and to try to derive operational general principles for tool selection and tool design.

Pp. 47-161

doi: 10.1007/3-540-31190-4_4

Tools

Natalia Andrienko; Gennady Andrienko

In this chapter, we make an inventory of the tools suitable for supporting exploratory data analysis. Our major point is that the primary tool for analysis is the human imaginative mind, and that all other tools are supplementary. Only the human mind actually does the analysis; the other tools supply it with the necessary material, appropriately prepared and presented. The most appropriate form for the presentation of such material is visual, since the mind, as most scientists tend to agree, operates predominantly with images.

The techniques and software tools usable in exploratory data analysis are currently very numerous, and new tools continue to appear. It would be completely unfeasible to survey all of them. Therefore, we have tried instead to set out the major tool categories and describe the key functions and properties of each category. The resulting classification looks as follows:

We divide the visual expressive means into display dimensions and visual, or retinal, variables. Display dimensions provide a set of positions within a display at which graphical elements, or marks, can be placed. Retinal variables represent various properties of the marks: shape, size, colour, texture, orientation, etc. In addition to the visual dimensions of a display, such as width, height, or depth, we consider also the display time, which can be used, for example, in animated presentations.

In exploratory data analysis, it is usually not enough to use a single tool. Various tools need to be combined. We consider two basic modes of tool combination, sequential and concurrent, and discuss the various mechanisms used for tool combination. Visualisation is an essential component of any tool ensemble. Initial data visualisation is used in order to understand what tools should be used for further work. Results produced by any non-visual tool need to be visualised so that the analyst can see and interpret them.

Throughout this chapter, we provide many examples of various tools. Even when discussing non-visual tools such as data manipulation or computational methods, we use visualisation intensively to illustrate the examples. Readers can easily note that we have taken every opportunity to stress the great role of visualisation in exploratory data analysis. At the beginning of the chapter, we make an attempt to substantiate the importance of visualisation.

Pp. 163-459

doi: 10.1007/3-540-31190-4_5

Principles

Natalia Andrienko; Gennady Andrienko

In this chapter, we describe the major principles that are used in exploring data and in choosing tools for this purpose. We have extracted these principles from our experience, by inspecting our usual approaches and choices when we receive new data that need to be analysed. However, these principles correspond very well to ideas expressed by other researchers in the areas of visualisation, data analysis, systems analysis, and cognitive psychology. This certifies the principles as generic, relevant not only to our particular way of handling data but also to some fundamental processes involved in exploration, reasoning, and understanding.

To show where the principles come from, we present a view of the process of data exploration as a combination of top-down and bottom-up procedures, i.e. analysis and synthesis. At the beginning, an explorer has the most general task: to characterise and explain the overall behaviour of the characteristics over the entire reference set. In the course of the exploration, this general task is decomposed into subtasks of various types. We illustrate this view by several examples, in which datasets with different structures are considered and the exploration procedures outlined. These examples demonstrate that the major instrument of exploratory analysis is the human mind, equipped with appropriate visual displays of the data, which provide an object for the explorer’s observations and food for his/her thought.

The great role of visualisation is also pronounced in our presentation of the principles. We introduce ten general principles of EDA:

In our presentation of these principles, we indicate what types of exploratory tasks they are relevant to and what categories of tools can support their implementation. By using examples of various kinds of data, we demonstrate how the principles can be implemented and what can be gained from this.

At the end of the chapter, we put all the principles into the overall context of exploratory data analysis, viewed as the systematic decomposition of the general task “characterise and explain the overall behaviour of the characteristics over the reference set”. We consider four general cases of analysis, depending on the peculiarities of the data:

Each successive case refers to the cases previously described, as the original task is decomposed and turned into a sequence of simpler operations dealing with subsets and slices of the data.

The cases are summarised compactly in the form of tables, which list the actions performed and specify the types of exploratory subtasks involved, the appropriate tool categories, and the relevant principles. We regard this as a summary of the major results of our study. We indicate the ways in which these results may be used by data explorers and by designers and developers of instruments for EDA. We also present an example of an application of the suggested generic scheme of data analysis to the exploration of a particular dataset.

Pp. 461-633

doi: 10.1007/3-540-31190-4_6

Conclusion

Natalia Andrienko; Gennady Andrienko

Pp. 635-637

Publications catalogue - books

Exploratory Analysis of Spatial and Temporal Data: A Systematic Approach

Natalia Andrienko; Gennady Andrienko

Subject coverage

Computer and information sciences
