Publications catalogue - books



Privacy in Statistical Databases: CENEX-SDC Project International Conference, PSD 2006, Rome, Italy, December 13-15, 2006, Proceedings

Josep Domingo-Ferrer; Luisa Franconi (eds.)

Conference: International Conference on Privacy in Statistical Databases (PSD). Rome, Italy. December 13, 2006 - December 15, 2006

Abstract/description – provided by the publisher

Not available.

Keywords – provided by the publisher

Data Encryption; Database Management; Probability and Statistics in Computer Science; Computers and Society; Legal Aspects of Computing; Artificial Intelligence (incl. Robotics)

Availability

Institution detected: not detected
Year of publication: 2006
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-49330-3

Electronic ISBN

978-3-540-49332-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

A Fixed Structure Learning Automaton Micro-aggregation Technique for Secure Statistical Databases

Ebaa Fayyoumi; B. John Oommen

We consider the problem of securing statistical databases and, more specifically, the micro-aggregation technique, which coalesces the individual records in the micro-data file into groups or classes and, on being queried, reports, for all individual values, the aggregated means of the corresponding group. This problem is known to be NP-hard and has been tackled using many heuristic solutions. In this paper we present the first reported Learning Automaton based solution to the micro-aggregation problem. The scheme modifies a fixed-structure learning automaton to solve the micro-aggregation problem. The scheme has been implemented, rigorously tested and evaluated for different real and simulated data sets. The results clearly demonstrate the applicability of learning automata to the micro-aggregation problem, yielding a solution with lower information loss than the best available heuristic methods for micro-aggregation.

- Methods for Microdata Protection | Pp. 114-128
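A short sketch may help make the technique concrete. The following is a minimal univariate micro-aggregation example in Python; it is not the authors' learning-automaton scheme. Records are sorted, cut into consecutive groups of at least k elements, and each value is released as its group mean. The group size and the test data are illustrative assumptions.

    import numpy as np

    def microaggregate(values, k=3):
        """Naive univariate micro-aggregation: sort the records, cut the
        sorted sequence into consecutive groups of at least k elements,
        and release each value as the mean of its group."""
        n = len(values)
        order = np.argsort(values)
        masked = np.empty(n, dtype=float)
        for start in range(0, n, k):
            if n - start < 2 * k:          # fold the remainder into one last group
                group = order[start:]
            else:
                group = order[start:start + k]
            masked[group] = values[group].mean()
            if n - start < 2 * k:
                break
        return masked

    rng = np.random.default_rng(0)
    x = rng.normal(50, 10, size=20)
    x_masked = microaggregate(x, k=3)
    # Information loss: sum of squares lost to aggregation, relative to total
    sse = float(((x - x_masked) ** 2).sum())
    sst = float(((x - x.mean()) ** 2).sum())
    print(f"information loss SSE/SST = {sse / sst:.3f}")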

Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation

Josep Domingo-Ferrer; Francesc Sebé

Microaggregation is a special clustering problem where the goal is to cluster a set of points into groups of at least k points in such a way that groups are as homogeneous as possible. Microaggregation arises in connection with anonymization of statistical databases for privacy protection (k-anonymity), where points are assimilated to database records. A usual group homogeneity criterion is minimization of the within-groups sum of squares (SSE). For multivariate points, optimal microaggregation, with minimum SSE, has been shown to be NP-hard. Recently, a polynomial-time approximation heuristic with a bounded approximation factor has been proposed (previous heuristics in the literature offered no approximation bounds). The special case k=2 (2-microaggregation) is interesting in privacy protection scenarios with neither internal intruders nor outliers, because information loss is lower: smaller groups imply smaller information loss. For 2-microaggregation the existing general approximation can only guarantee a 54-approximation. We give here a new polynomial-time heuristic whose SSE is at most twice the minimum (a 2-approximation).

- Methods for Microdata Protection | Pp. 129-138
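As an illustration of the 2-microaggregation problem itself, not of the paper's 2-approximation heuristic, the sketch below greedily pairs each record with its nearest unassigned neighbour, releases pair centroids, and reports the resulting within-groups SSE. The greedy pairing rule and the data are assumptions.

    import numpy as np

    def greedy_2_microaggregation(points):
        """Pair each unassigned point with its nearest unassigned
        neighbour and release the pair centroid (illustration only;
        greedy pairing carries no approximation guarantee)."""
        n = len(points)
        assert n % 2 == 0, "sketch assumes an even number of records"
        unassigned = set(range(n))
        released = np.empty_like(points, dtype=float)
        sse = 0.0
        while unassigned:
            i = unassigned.pop()
            j = min(unassigned, key=lambda m: np.linalg.norm(points[i] - points[m]))
            unassigned.remove(j)
            centroid = (points[i] + points[j]) / 2.0
            released[i] = released[j] = centroid
            sse += np.linalg.norm(points[i] - centroid) ** 2
            sse += np.linalg.norm(points[j] - centroid) ** 2
        return released, sse

    rng = np.random.default_rng(1)
    data = rng.normal(size=(10, 2))        # 10 bivariate records
    masked, sse = greedy_2_microaggregation(data)
    print(f"within-groups SSE = {sse:.3f}")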

Using the Jackknife Method to Produce Safe Plots of Microdata

Jobst Heitzig

We discuss several methods for producing plots of uni- and bivariate distributions of confidential numeric microdata so that no single value is disclosed even in the presence of detailed additional knowledge, using the jackknife method of confidentiality protection. For histograms (as for frequency tables) this is similar to adding white noise of constant amplitude to all frequencies. Decreasing the bin size and smoothing, which leads to kernel density estimation in the limit, gives more informative plots that need less noise for protection. Detail can be increased by choosing the bandwidth locally. Smoothing the noise as well (i.e. using correlated noise) gives a further visual improvement. Additional protection comes from robustifying the kernel density estimator or plotting only classified densities, as in contour plots.

- Methods for Microdata Protection | Pp. 139-151
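Both kinds of protected plots can be sketched briefly. In the Python sketch below, white noise of constant amplitude is added to histogram bin counts, and smoothed (correlated) noise is added to a kernel density estimate; the noise amplitude, noise scale and rule-of-thumb bandwidth are illustrative assumptions rather than values from the paper.

    import numpy as np

    rng = np.random.default_rng(2)
    income = rng.lognormal(mean=10, sigma=0.5, size=500)   # confidential microdata

    # Protected histogram: add white noise of constant amplitude to all bin counts
    counts, edges = np.histogram(income, bins=30)
    amp = 2.0                                              # assumed protection level
    safe_counts = np.clip(counts + rng.uniform(-amp, amp, size=counts.shape), 0, None)

    # Protected kernel density estimate: smooth curve, then smoothed (correlated) noise
    grid = np.linspace(edges[0], edges[-1], 200)
    bandwidth = income.std() * len(income) ** -0.2         # rule-of-thumb bandwidth
    kernels = np.exp(-0.5 * ((grid[:, None] - income[None, :]) / bandwidth) ** 2)
    density = kernels.sum(axis=1) / (len(income) * bandwidth * np.sqrt(2 * np.pi))
    rough = rng.normal(0, 0.05 * density.max(), size=grid.shape)
    correlated = np.convolve(rough, np.ones(9) / 9, mode="same")  # smooth the noise too
    safe_density = np.clip(density + correlated, 0, None)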

Combining Blanking and Noise Addition as a Data Disclosure Limitation Method

Anton Flossmann; Sandra Lechner

Statistical disclosure limitation is widely used by data-collecting institutions to provide safe individual data. In this paper, we propose to combine two separate disclosure limitation techniques, blanking and the addition of independent noise, in order to protect the original data. The proposed approach yields a decrease in the probability of re-identifying/disclosing the individual information, and can be applied to linear as well as nonlinear regression models.

We show how to combine the blanking method and the measurement error method, and how to estimate the model by the combination of the Simulation-Extrapolation (SIMEX) approach proposed by [4] and the Inverse Probability Weighting (IPW) approach going back to [8]. We produce Monte-Carlo evidence on how the reduction of data quality can be minimized by this masking procedure.

- Methods for Microdata Protection | Pp. 152-163
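The masking step itself is easy to sketch (the SIMEX/IPW estimation stage is beyond a short example): a random share of values is blanked and independent Gaussian noise is added to the rest. The blanking rate and noise standard deviation below are assumed, not the paper's settings.

    import numpy as np

    def blank_and_perturb(x, blank_rate=0.1, noise_sd=0.5, seed=3):
        """Combined masking: blank a random share of the values (set them
        to NaN) and add independent Gaussian noise to the remaining ones."""
        rng = np.random.default_rng(seed)
        masked = x.astype(float)
        blanked = rng.random(x.shape) < blank_rate
        masked[blanked] = np.nan
        masked[~blanked] += rng.normal(0.0, noise_sd, size=(~blanked).sum())
        return masked, blanked

    earnings = np.random.default_rng(4).lognormal(8, 1, size=1000)
    released, blanked = blank_and_perturb(earnings)
    print(f"{blanked.mean():.1%} of records blanked; the rest carry Gaussian noise")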

Why Swap When You Can Shuffle? A Comparison of the Proximity Swap and Data Shuffle for Numeric Data

Krish Muralidhar; Rathindra Sarathy; Ramesh Dandekar

The rank-based proximity swap has been suggested as a data-masking mechanism for numerical data. Recently, more sophisticated procedures for masking numerical data, based on the concept of “shuffling” the data, have been proposed. In this study, we compare and contrast the performance of the swapping and shuffling procedures. The results indicate that the shuffling procedures perform better than data swapping in terms of both data utility and disclosure risk.

- Methods for Microdata Protection | Pp. 164-176
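A sketch of the swap side of the comparison: values are visited in rank order and each not-yet-swapped value is exchanged with a randomly chosen close-ranked, not-yet-swapped value. The window parameter is an assumption; the model-based shuffling procedures compared in the paper are not reproduced here.

    import numpy as np

    def rank_swap(x, window=3, seed=5):
        """Rank-based proximity swap sketch: walk through the records in
        rank order and swap each unswapped value with a randomly chosen
        unswapped value whose rank lies within a small window."""
        rng = np.random.default_rng(seed)
        order = np.argsort(x)
        out = x.astype(float)
        done = np.zeros(len(x), dtype=bool)
        for r, i in enumerate(order):
            if done[i]:
                continue
            candidates = [j for j in order[r + 1:r + 1 + window] if not done[j]]
            if not candidates:
                continue                        # last record may stay unswapped
            j = candidates[rng.integers(len(candidates))]
            out[i], out[j] = out[j], out[i]
            done[i] = done[j] = True
        return out

    x = np.random.default_rng(6).normal(100, 15, size=12)
    print(np.round(x, 1))
    print(np.round(rank_swap(x), 1))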

Adjusting Survey Weights When Altering Identifying Design Variables Via Synthetic Data

Robin Mitra; Jerome P. Reiter

Statistical agencies alter values of identifiers to protect respondents’ confidentiality. When these identifiers are survey design variables, leaving the original survey weights on the file can be a disclosure risk. Additionally, the original weights may not correspond to the altered values, which impacts the quality of design-based (weighted) inferences. In this paper, we discuss some strategies for altering survey weights when altering design variables. We do so in the context of simulating identifiers from probability distributions, i.e. partially synthetic data. Using simulation studies, we illustrate aspects of the quality of inferences based on the different strategies.

- Methods for Microdata Protection | Pp. 177-188
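A toy illustration of why altered design variables and original weights can clash, and of one possible adjustment; this is a generic sketch, not necessarily one of the paper's strategies, and the strata and sampling rates are assumed.

    import numpy as np

    rng = np.random.default_rng(7)

    # Toy stratified design: the design weight of a record is the inverse
    # of its stratum's inclusion probability.
    incl_prob = {"A": 0.10, "B": 0.25, "C": 0.50}      # assumed sampling rates
    strata = rng.choice(list(incl_prob), p=[0.5, 0.3, 0.2], size=1000)
    weights = np.array([1.0 / incl_prob[s] for s in strata])

    # Partially synthetic release: draw each record's stratum from the
    # empirical stratum distribution instead of releasing the real one.
    labels, counts = np.unique(strata, return_counts=True)
    synthetic = rng.choice(labels, p=counts / counts.sum(), size=len(strata))

    # One possible weight strategy: release weights consistent with the
    # synthetic design variable rather than with the original one.
    synthetic_weights = np.array([1.0 / incl_prob[s] for s in synthetic])
    print(f"original total weight:  {weights.sum():.0f}")
    print(f"synthetic total weight: {synthetic_weights.sum():.0f}")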

Risk, Utility and PRAM

Peter-Paul de Wolf

PRAM (Post Randomization Method) is a disclosure control method for microdata, introduced in 1997. Unfortunately, PRAM has not yet been applied extensively by statistical agencies in protecting their microdata. This is partly due to the fact that little knowledge is available on the effect of PRAM on disclosure control as well as on the loss of information it induces.

In this paper, we try to make up for this lack of knowledge by supplying some empirical information on the behaviour of PRAM. To this end, some basic measures of loss of information and of disclosure risk are introduced. PRAM is then applied to one specific microdata file of over 6 million records, using several models in applying the procedure.

- Utility and Risk in Microdata Protection | Pp. 189-204
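PRAM itself is easy to state in code: each record's category is replaced by a draw from the row of a Markov transition matrix corresponding to its current value. The three-category variable and the transition matrix below are assumed for illustration; the dominant diagonal governs the trade-off between disclosure risk and information loss.

    import numpy as np

    def pram(categories, levels, transition, seed=8):
        """Post Randomization Method: replace every record's category by
        a draw from the transition-matrix row of its current category."""
        rng = np.random.default_rng(seed)
        index = {c: i for i, c in enumerate(levels)}
        return [rng.choice(levels, p=transition[index[c]]) for c in categories]

    levels = ["employed", "unemployed", "inactive"]
    P = np.array([[0.90, 0.05, 0.05],      # rows sum to 1
                  [0.10, 0.85, 0.05],
                  [0.05, 0.05, 0.90]])
    data = ["employed"] * 6 + ["unemployed"] * 3 + ["inactive"] * 3
    print(pram(data, levels, P))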

Distance Based Re-identification for Time Series, Analysis of Distances

Jordi Nin; Vicenç Torra

Record linkage is a technique for linking records from different files or databases that correspond to the same entity. Standard record linkage methods need the files to have some variables in common. Typically, variables are either numerical or categorical. These variables are the basis for permitting such linkage.

In this paper we study the problem that arises when the files to be linked consist of numerical time series instead of numerical variables. We study some extensions of distance-based record linkage that take advantage of this kind of data.

- Utility and Risk in Microdata Protection | Pp. 205-216
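A minimal sketch of distance-based record linkage extended to time series, assuming whole-series Euclidean distance as the linkage criterion (one of several distances one might analyse): each masked series is linked to the closest series in the other file.

    import numpy as np

    def link_by_distance(file_a, file_b):
        """Link each series in file A to the closest series in file B
        under the Euclidean distance between whole series."""
        links = []
        for i, series in enumerate(file_a):
            d = np.linalg.norm(file_b - series, axis=1)  # distance to every B series
            links.append((i, int(d.argmin())))
        return links

    rng = np.random.default_rng(9)
    original = rng.normal(size=(5, 24))                  # 5 series, 24 time points
    protected = original + rng.normal(0, 0.3, size=original.shape)  # masked release
    print(link_by_distance(protected, original))         # re-identification attempt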

Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk

Guy Lebanon; Monica Scannapieco; Mohamed R. Fouad; Elisa Bertino

An important issue any organization or individual has to face when managing data containing sensitive information is the risk that can be incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data by using additional information that may be available, for example, from other data sources. To date, however, no comprehensive approach exists to quantify such risks. In this paper we develop a framework, based on statistical decision theory, to assess the relationship between the disclosed data and the resulting privacy risk. We relate our framework to the k-anonymity disclosure method; we make the assumptions behind k-anonymity explicit, quantify them, and extend them in several natural directions.

- Utility and Risk in Microdata Protection | Pp. 217-232
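For reference, the k-anonymity property the framework builds on can be checked in a few lines: a released table is k-anonymous when every combination of quasi-identifier values occurs at least k times. The table and quasi-identifiers below are illustrative.

    from collections import Counter

    def k_anonymity(records, quasi_identifiers):
        """Smallest equivalence-class size over the quasi-identifier
        combination: the table is k-anonymous for the returned k."""
        keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
        return min(Counter(keys).values())

    table = [
        {"age": "30-39", "zip": "120**", "disease": "flu"},
        {"age": "30-39", "zip": "120**", "disease": "asthma"},
        {"age": "40-49", "zip": "130**", "disease": "flu"},
        {"age": "40-49", "zip": "130**", "disease": "diabetes"},
    ]
    print(k_anonymity(table, ["age", "zip"]))   # -> 2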

Using Mahalanobis Distance-Based Record Linkage for Disclosure Risk Assessment

Vicenç Torra; John M. Abowd; Josep Domingo-Ferrer

Distance-based record linkage (DBRL) is a common approach to empirically assessing the disclosure risk in SDC-protected microdata. Usually, the Euclidean distance is used. In this paper, we explore the potential advantages of using the Mahalanobis distance for DBRL. We illustrate our point for partially synthetic microdata and show that, in some cases, Mahalanobis DBRL can yield a very high re-identification percentage, far superior to the one offered by other record linkage methods.

- Utility and Risk in Microdata Protection | Pp. 233-242
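A minimal sketch of Mahalanobis distance-based record linkage used as a re-identification experiment, with assumed synthetic data: each protected record is linked to the nearest original record under the Mahalanobis distance derived from the original file's covariance matrix.

    import numpy as np

    def mahalanobis_link(protected, original):
        """Link each protected record to the nearest original record
        under the Mahalanobis distance (covariance from the original file)."""
        cov_inv = np.linalg.inv(np.cov(original, rowvar=False))
        links = []
        for row in protected:
            diff = original - row
            d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances
            links.append(int(d2.argmin()))
        return links

    rng = np.random.default_rng(10)
    original = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=50)
    protected = original + rng.normal(0, 0.2, size=original.shape)
    matches = mahalanobis_link(protected, original)
    hit_rate = np.mean([m == i for i, m in enumerate(matches)])
    print(f"re-identification rate: {hit_rate:.0%}")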