Publications catalog - books



Privacy in Statistical Databases: CENEX-SDC Project International Conference, PSD 2006, Rome, Italy, December 13-15, 2006, Proceedings

Josep Domingo-Ferrer; Luisa Franconi (eds.)

Conference: International Conference on Privacy in Statistical Databases (PSD). Rome, Italy. December 13-15, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Data Encryption; Database Management; Probability and Statistics in Computer Science; Computers and Society; Legal Aspects of Computing; Artificial Intelligence (incl. Robotics)

Availability

Detected institution: Not detected
Year of publication: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-49330-3

Electronic ISBN

978-3-540-49332-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Improving Individual Risk Estimators

Loredana Di Consiglio; Silvia Polettini

The release of survey microdata files requires a preliminary assessment of the disclosure risk of the data. Record-level risk measures can be useful for “local” protection (e.g. partially synthetic data [21], or local suppression [25]), and are also used in [22] and [16] to produce global risk measures [13] useful to assess data release. While different proposals for estimating such risk measures are available in the literature, so far only a few attempts have targeted the evaluation of the statistical properties of these estimators. In this paper we pursue a simulation study that aims to evaluate the statistical properties of risk estimators. Besides presenting results about the Benedetti-Franconi individual risk estimator (see [11]), we also propose a strategy to produce improved risk estimates, and assess the latter by simulation.

The problem of estimating per-record reidentification risk shares many similarities with that of small area estimation (see [19]): we propose to introduce external information, arising from a previous census, into risk estimation. To achieve this we consider a simple strategy, namely the Structure Preserving Estimation (SPREE) of Purcell and Kish [18], and show by simulation that this procedure provides better estimates of the individual risk of reidentification disclosure, especially for records whose risk is high.

- Utility and Risk in Microdata Protection | Pp. 243-256
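
As an aside on the abstract above: individual risk measures are typically built from frequencies of key-variable combinations. The minimal Python sketch below computes only the naive estimator 1/f_k from sample frequencies, which treats the sample as the population and therefore overstates risk; the Benedetti-Franconi estimator studied in the paper instead infers unobserved population frequencies. All data and names here are illustrative.

    from collections import Counter

    def naive_record_risk(key_values):
        # Naive reidentification risk: 1 / (sample frequency of the record's
        # key combination). Sample-unique records get the maximal risk 1.0.
        freq = Counter(key_values)
        return [1.0 / freq[k] for k in key_values]

    # Each tuple is one record's combination of key variables (sex, age, area).
    keys = [("F", 34, "A"), ("M", 51, "B"), ("F", 34, "A"), ("M", 29, "C")]
    print(naive_record_risk(keys))   # -> [0.5, 1.0, 0.5, 1.0]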

Single-Database Private Information Retrieval Schemes: Overview, Performance Study, and Usage with Statistical Databases

Carlos Aguilar Melchor; Yves Deswarte

This paper presents an overview of current single-database private information retrieval (PIR) schemes and proposes to explore the usage of these protocols with statistical databases. The proximity of this research field to that of Oblivious Transfer, and the different performance measures used over the last few years, have resulted in re-discoveries and contradictory performance comparisons across publications. The contribution of this paper is twofold. First, we present the different schemes through the innovations they have brought to this field of research, which gives a global view of its evolution since the first of these schemes was presented by Kushilevitz and Ostrovsky in 1997. We know of no other survey of the current PIR protocols. We also compare the most representative of these schemes with a single set of communication performance measures. Compared to using global communication cost as a single measure, we assert that this set simplifies the evaluation of the cost of using PIR and reveals the scheme best adapted to each situation. We conclude this overview and performance study by introducing some important issues resulting from PIR usage with statistical databases and highlighting some directions for further research.

- Protocols for Private Computation | Pp. 257-265
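
To give a feel for the protocol family surveyed, here is a purely didactic single-database PIR round built on the additive homomorphism of a toy Paillier cryptosystem (small hard-coded primes, query size linear in the database size). It is a sketch of the general homomorphic approach, not any particular scheme compared in the paper; real schemes use much larger keys and achieve sublinear communication.

    import random
    from math import gcd

    # Toy Paillier keypair: two known small primes (real keys are >= 2048 bits).
    p, q = 104729, 1299709
    n = p * q
    n2 = n * n
    lam = (p - 1) * (q - 1)   # a multiple of Carmichael's lambda(n)
    g = n + 1

    def enc(m):
        r = random.randrange(2, n)
        while gcd(r, n) != 1:
            r = random.randrange(2, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def dec(c):
        u = pow(c, lam, n2)
        return (((u - 1) // n) * pow(lam, -1, n)) % n   # pow(x, -1, m): Python 3.8+

    def pir_query(index, size):
        # Encrypted selection vector: Enc(1) at the wanted index, Enc(0) elsewhere.
        # The ciphertexts are indistinguishable, so the server never learns the index.
        return [enc(1 if i == index else 0) for i in range(size)]

    def pir_answer(query, db):
        ans = 1
        for ci, xi in zip(query, db):
            ans = (ans * pow(ci, xi, n2)) % n2   # homomorphically Enc(sum b_i * x_i)
        return ans

    db = [42, 7, 99, 13]
    print(dec(pir_answer(pir_query(2, len(db)), db)))   # -> 99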

Privacy-Preserving Data Set Union

Alberto Maria Segre; Andrew Wildenberg; Veronica Vieland; Ying Zhang

This paper describes a cryptographic protocol for merging two or more data sets without divulging the records' identifying information; technically, the protocol computes a privacy-preserving set union. Applications for this protocol arise, for example, in data analysis for biomedical application areas, where identifying fields (e.g. patient names) are protected by governmental privacy regulations or by institutional research board policies.

- Protocols for Private Computation | Pp. 266-276
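
The paper's protocol is cryptographic; as a loose illustration of the goal (not the authors' construction), the sketch below merges two sites' identifier sets under a shared keyed hash, so duplicate patients collapse to one entry without raw names being exchanged. The shared key is an assumption of this sketch, and any key holder could still test guessed names by hashing them, which is exactly the weakness a true private set union protocol avoids.

    import hashlib
    import hmac
    import secrets

    # Assumption of this sketch: the sites agree on a secret key out of band.
    shared_key = secrets.token_bytes(32)

    def pseudonyms(identifiers, key):
        # Replace each identifier by a keyed hash (HMAC-SHA256) before sharing.
        return {hmac.new(key, s.encode(), hashlib.sha256).hexdigest()
                for s in identifiers}

    site_a = {"alice", "bob"}
    site_b = {"bob", "carol"}

    # The union is taken over pseudonyms: "bob" collapses to a single entry,
    # yet the merging party never sees raw names.
    print(len(pseudonyms(site_a, shared_key) | pseudonyms(site_b, shared_key)))  # -> 3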

“Secure” Log-Linear and Logistic Regression Analysis of Distributed Databases

Stephen E. Fienberg; William J. Fulp; Aleksandra B. Slavkovic; Tracey A. Wrobel

The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and logistic regression over partitioned databases, while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting.

- Protocols for Private Computation | Pp. 277-290
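
The "secure summation" building block mentioned above fits in a few lines. The sketch below shows the classic ring protocol under a semi-honest assumption: the initiator masks the running total with a random value, so every intermediate sum is uniformly distributed and reveals nothing about individual inputs. The counts stand in for one cell of a contingency table split across agencies and are illustrative.

    import random

    MOD = 2 ** 64   # modulus comfortably larger than any true total

    def secure_sum(private_counts):
        # Initiator adds a random mask; each party adds its private count to
        # the masked running total it receives (here simulated by a loop);
        # the initiator removes the mask at the end.
        mask = random.randrange(MOD)
        running = mask
        for c in private_counts:
            running = (running + c) % MOD
        return (running - mask) % MOD

    # One cell count held by each of three agencies:
    print(secure_sum([120, 45, 310]))   # -> 475

From such securely pooled margins, each party can then fit the log-linear or logistic model locally without ever seeing another party's raw counts.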

Measuring the Impact of Data Protection Techniques on Data Utility: Evidence from the Survey of Consumer Finances

Arthur Kennickell; Julia Lane

Despite the fact that much empirical economic research is based on public-use data files, the debate on the impact of disclosure protection on data quality has largely been conducted among statisticians and computer scientists. Remarkably, economists have shown very little interest in this subject, which has potentially profound implications for research. Without input from such subject-matter experts, statistical agencies may make decisions that unnecessarily obstruct analysis. This paper examines the impact of the application of disclosure protection techniques on a survey that is heavily used by both economists and policy-makers: the Survey of Consumer Finances. It evaluates the ability of different approaches to convey information about changes in data utility to subject matter experts.

- Case Studies | Pp. 291-303
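
One simple way to convey changes in data utility to subject-matter experts is to report how key estimates move after masking. The sketch below uses a crude relative-change measure on the mean and standard deviation; both the measure and the data are illustrative, not the paper's methodology or the Survey of Consumer Finances data.

    import statistics

    def utility_loss(original, masked):
        # Relative shift in mean and standard deviation after disclosure protection.
        mo, mm = statistics.mean(original), statistics.mean(masked)
        so, sm = statistics.stdev(original), statistics.stdev(masked)
        return {"mean_shift": abs(mm - mo) / abs(mo),
                "sd_shift": abs(sm - so) / so}

    incomes = [41000, 52500, 38200, 120000, 64300]
    masked = [43000, 50000, 40000, 110000, 66000]   # e.g. after noise/rounding
    print(utility_loss(incomes, masked))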

Protecting the Confidentiality of Survey Tabular Data by Adding Noise to the Underlying Microdata: Application to the Commodity Flow Survey

Paul Massell; Laura Zayatz; Jeremy Funk

The Commodity Flow Survey (CFS) produces data on the movement of goods in the United States. The data from the CFS are used by analysts for transportation modeling, planning and decision-making. Cell suppression has been used over the years to protect responding companies’ values in CFS data. Data users, especially transportation modelers, would like to have access to data tables that do not have missing data due to suppression. To meet this need, we are testing the application of a noise protection method (Evans et al. [3]) that involves adding noise to the underlying CFS microdata prior to tabulation to protect sensitive cells in CFS tables released to the public. Initial findings of this research have been positive. This paper describes detailed analyses that may be performed to evaluate the effectiveness of the noise protection.

- Case Studies | Pp. 304-317
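
The essence of the noise approach is that perturbation happens in the microdata, before tabulation, so published tables need no suppressed cells. A minimal sketch of that idea follows; the multiplier range and the shipment records are illustrative assumptions, not the CFS production settings.

    import random
    from collections import defaultdict

    def perturb(value, low=0.05, high=0.15):
        # Multiplicative noise bounded away from 1: inflate or deflate each
        # respondent's value by 5-15%, direction chosen at random.
        return value * (1 + random.choice([-1, 1]) * random.uniform(low, high))

    # shipments: (origin-destination cell, tons shipped)
    shipments = [("TX->CA", 120.0), ("TX->CA", 80.0), ("NY->FL", 300.0)]

    table = defaultdict(float)
    for cell, tons in shipments:
        table[cell] += perturb(tons)   # noise added BEFORE tabulation

    print(dict(table))                 # every cell publishable; none suppressed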

Italian Household Expenditure Survey: A Proposal for Data Dissemination

Mario Trottini; Luisa Franconi; Silvia Polettini

In this paper we define a proposal for an alternative data dissemination strategy for the Italian Household Expenditure Survey (HES). The proposal starts from a partition of the set of users into groups that are homogeneous in terms of needs, types of statistical analyses, and access to external information. Such a partition allows the release of different data products that are hierarchical in information content and that may be protected using different data disclosure limitation methods. A new masking procedure that combines Microaggregation and Data Swapping is proposed to preserve sampling weights.

- Case Studies | Pp. 318-333
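
To fix ideas about the data-swapping ingredient of the proposed masking procedure (a microaggregation sketch follows the R software entry below), here is a minimal rank-swapping example. The fixed adjacent-rank pairing is a simplification of real rank swapping, which draws each partner within a bounded rank window; it is not the authors' combined procedure, which also preserves sampling weights.

    def rank_swap(values):
        # Sort records by value and swap each pair of adjacent ranks.
        # The marginal distribution is preserved exactly; record linkage
        # between swapped values and their original records is broken.
        order = sorted(range(len(values)), key=values.__getitem__)
        out = list(values)
        for a, b in zip(order[::2], order[1::2]):
            out[a], out[b] = out[b], out[a]
        return out

    expenditures = [310, 150, 920, 415, 280, 505]
    print(rank_swap(expenditures))   # same multiset of values, reassigned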

The ARGUS Software in CENEX

Anco Hundepool

In this paper we will give an overview of the CENEX project and concentrate on the current state of affairs with respect to the ARGUS software twins. The CENEX (Centre of Excellence) is a new initiative by Eurostat. The main idea behind the CENEX concept is to join the forces of the national NSIs and together bring their skills to a higher level. The CENEX on Statistical Disclosure Control is a first pilot CENEX project, aiming both at testing the feasibility of the CENEX idea and at working on SDC. This project will make a start on writing a handbook on SDC after taking an inventory, and will extend the ARGUS software with an emphasis on issues of practical use. Within this CENEX we will organise the transfer of technology via courses, a website and this conference. Finally, a roadmap for future work will hopefully lead to a follow-up CENEX.

In this paper we will summarise this CENEX project and give a short overview of the current versions of ARGUS.

- Software | Pp. 334-346

Software Development for SDC in R

M. Templ

The production of scientific-use files from economic microdata is a major problem. Many common methods change the data in a way that leaves the univariate distribution of each variable almost unchanged relative to the original data; the multivariate structure of the data, however, is often ruined.

Which methods are suitable depends strongly on the underlying data. What is needed is a program system with which one can apply different methods and evaluate and compare results from different algorithms in a flexible way. Using methods for protecting microdata as an exploratory data analysis tool requires a powerful program system able to present the results in a number of easy-to-grasp graphics. For this purpose, some of the most popular procedures for anonymising microdata are implemented in a flexible R package. The R system provides flexible data import/export facilities and advanced development tools for building such disclosure control software.

In addition to algorithms existing in other software (the MDAV algorithm for microaggregation, ...), some new algorithms for anonymising microdata are implemented, e.g. a fast algorithm for microaggregation based on a projection pursuit approach. This algorithm outperforms other existing algorithms on most real data.

For all these algorithms/methods, print, summary and plot methods, as well as methods for validation, are implemented.

In the field of economics, suppression of cells in marginal tables is likely the most popular method for statistical agencies to protect tables. The use of linear programming for cell suppression seems to be the best way of protecting tables, including hierarchical tables.

Several R packages for various fields of disclosure control are currently under development. Because of R's integrated online help with examples ready to be executed, the application of disclosure control can be learned easily even with little previous knowledge.

- Software | Pp. 347-359
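
The MDAV microaggregation mentioned above can be illustrated in its simplest univariate form: sort the records, form groups of k, and replace each value by its group mean, so every published value is shared by at least k records. The sketch below is that univariate simplification (written in Python rather than the package's R), not the multivariate MDAV algorithm itself.

    def microaggregate(values, k=3):
        # Sort record indices by value, form groups of k (folding a short
        # tail into the previous group), and replace each value by its
        # group mean, so each published value is shared by >= k records.
        order = sorted(range(len(values)), key=values.__getitem__)
        groups = [order[i:i + k] for i in range(0, len(order), k)]
        if len(groups) > 1 and len(groups[-1]) < k:
            groups[-2].extend(groups.pop())
        out = list(values)
        for grp in groups:
            mean = sum(values[i] for i in grp) / len(grp)
            for i in grp:
                out[i] = mean
        return out

    print(microaggregate([12, 95, 14, 88, 13, 90, 50], k=3))
    # -> [13, 80.75, 13, 80.75, 13, 80.75, 80.75]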

On Secure e-Health Systems

Milan Marković

This paper is devoted to e-healthcare security systems based on modern security mechanisms and Public Key Infrastructure (PKI) systems. We argue that only a general and multi-layered security infrastructure can cope with possible attacks on e-healthcare systems. We evaluate security mechanisms on the application, transport and network layers of the ISO/OSI reference model. These mechanisms include confidentiality protection based on symmetric cryptographic algorithms and digital signature technology based on asymmetric algorithms for authentication, integrity protection and non-repudiation. Strong user authentication procedures based on smart cards, digital certificates and PKI systems are especially emphasized. We give a brief description of smart cards, HSMs and the main components of PKI systems, emphasizing the Certification Authority and its role in establishing cryptographically unique identities of valid system users based on X.509 digital certificates. Emerging e-healthcare systems and possible appropriate security mechanisms based on the proposed Generic CA model are analyzed.

- Software | Pp. 360-374
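
The signature mechanisms described above (authentication, integrity, non-repudiation) map directly onto standard crypto libraries. A minimal sketch using the third-party Python 'cryptography' package follows, with ECDSA over P-256; the X.509 certificate and CA machinery the paper emphasizes is omitted here, and the sample health record is invented.

    # pip install cryptography  (the standard library has no public-key crypto)
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.exceptions import InvalidSignature

    # In the paper's PKI setting, this key pair would be bound to a user
    # identity by an X.509 certificate issued by the Certification Authority.
    private_key = ec.generate_private_key(ec.SECP256R1())
    record = b"patient 4711: HbA1c 6.1% (2006-12-01)"

    signature = private_key.sign(record, ec.ECDSA(hashes.SHA256()))

    try:
        private_key.public_key().verify(signature, record,
                                        ec.ECDSA(hashes.SHA256()))
        print("integrity and origin verified")
    except InvalidSignature:
        print("record was altered or not signed by this key")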