DIFFERENTIALLY PRIVATE OUTLIER DETECTION IN A COLLABORATIVE ENVIRONMENT

Int J Coop Inf Syst. 2018 Sep;27(3):1850005. doi: 10.1142/S0218843018500053. Epub 2018 Jul 3.

Abstract

Outlier detection is one of the most important data analytics tasks and is used in numerous applications and domains. Its goal is to find abnormal entities that differ significantly from the remaining data. Often the underlying data is distributed across different organizations. Outlier detection performed locally at each organization yields less accurate results than outlier detection performed collaboratively over the combined data. However, the data cannot easily be integrated into a single database because of privacy and legal concerns. In this paper, we address precisely this problem. We first define privacy in the context of collaborative outlier detection. We then develop a novel method to find outliers in both horizontally partitioned and vertically partitioned categorical data in a privacy-preserving manner. Our method is based on a scalable outlier detection technique that uses attribute value frequencies. We provide an end-to-end privacy guarantee by combining the differential privacy model with secure multiparty computation techniques. Experiments on real data show that our proposed technique is both effective and efficient.
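
To make the idea of frequency-based scoring over partitioned data concrete, the following is a minimal sketch and not the method of the paper. It assumes the commonly used attribute value frequency (AVF) score, in which a record's score is the average count of its attribute values and low scores indicate likely outliers. It further assumes a horizontal partition between two parties, a publicly known value domain per attribute, and Laplace noise with scale m/epsilon added to each party's local counts (m being the number of attributes). All function names, the toy records, and the noise calibration are hypothetical. The noisy counts are summed in the clear here, whereas the abstract states that secure multiparty computation is used for the collaborative step, and Python's random module is not suitable for a production privacy mechanism.

```python
import random
from collections import Counter

def local_counts(records):
    """Per-attribute value counts for one party's records."""
    m = len(records[0])
    return [Counter(r[j] for r in records) for j in range(m)]

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, domains, epsilon, rng):
    """Perturb every count in the declared domain (assumed scale: m / epsilon)."""
    scale = len(counts) / epsilon
    return [{v: c[v] + laplace(scale, rng) for v in dom}
            for c, dom in zip(counts, domains)]

def merge(tables):
    """Sum per-attribute count tables from all parties (done in the clear here)."""
    merged = [dict() for _ in tables[0]]
    for table in tables:
        for j, col in enumerate(table):
            for v, c in col.items():
                merged[j][v] = merged[j].get(v, 0.0) + c
    return merged

def avf_scores(records, freq):
    """AVF score: mean frequency of a record's attribute values (low = outlying)."""
    m = len(freq)
    return [sum(freq[j].get(r[j], 0.0) for j in range(m)) / m for r in records]

# Hypothetical toy data: two parties holding different records, same attributes.
party_a = [("red", "small"), ("red", "small"), ("red", "large")]
party_b = [("red", "small"), ("blue", "large")]
domains = [{"red", "blue"}, {"small", "large"}]

rng = random.Random(0)
noisy = merge([noisy_counts(local_counts(p), domains, epsilon=1.0, rng=rng)
               for p in (party_a, party_b)])
print(avf_scores(party_a + party_b, noisy))
```

With exact (noise-free) counts, the record ("blue", "large") receives the lowest score in this toy data set, since both of its values are comparatively rare. With noise added, the ranking on such a tiny data set can change, which illustrates the trade-off between privacy and accuracy that any such scheme has to manage.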

Keywords: Distributed Data; Outlier Detection; Privacy.