On Differentially Private Frequent Itemset Mining

Chen Zeng; Jeffrey F Naughton; Jin-Yi Cai

doi:10.14778/2428536.2428539

Proceedings VLDB Endowment. 2012 Nov 1;6(1):25-36. doi: 10.14778/2428536.2428539.

Authors

Chen Zeng¹, Jeffrey F Naughton, Jin-Yi Cai

Affiliation

¹ Department of Computer Science, University of Wisconsin-Madison, Madison, WI, 53706.

Abstract

We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly, we investigate an approach that begins by truncating long transactions, trading off errors introduced by the truncation against those introduced by the noise added to guarantee privacy. Experimental results over standard benchmark databases show that truncating is indeed effective. Our algorithm solves the "classical" frequent itemset mining problem, in which the goal is to find all itemsets whose support exceeds a threshold. Related work has proposed differentially private algorithms for the top-k itemset mining problem ("find the k most frequent itemsets"). An experimental comparison with those algorithms shows that our algorithm achieves a better F-score unless k is small.
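
The trade-off between truncation error and noise can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a public item universe, a naive truncation rule (keep the smallest items in sorted order), a cap on itemset size, and the standard Laplace mechanism applied to every candidate support. All function and parameter names (`truncate`, `sensitivity`, `noisy_frequent_itemsets`, `max_len`, `max_size`) are hypothetical. The point it shows is that a shorter transaction limit lowers the sensitivity, and therefore the noise scale, at the cost of dropping items.

```python
import math
import random
from itertools import combinations


def truncate(transaction, max_len):
    """Keep at most max_len items. The rule used here (smallest items in
    sorted order) is an arbitrary illustrative choice, not the paper's."""
    return frozenset(sorted(transaction)[:max_len])


def sensitivity(max_len, max_size):
    """How many itemsets of size 1..max_size a single truncated transaction
    can contain, i.e. how many supports one transaction can change by 1."""
    return sum(math.comb(max_len, i) for i in range(1, max_size + 1))


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_frequent_itemsets(db, items, threshold, epsilon, max_len, max_size, rng):
    """Return {itemset: noisy support} for candidates whose noisy support
    reaches the (absolute) threshold. Assumes `items` is a public universe."""
    truncated = [truncate(t, max_len) for t in db]
    scale = sensitivity(max_len, max_size) / epsilon
    result = {}
    for size in range(1, max_size + 1):
        # Noise is added to every candidate, including those with zero support.
        for cand in combinations(sorted(items), size):
            support = sum(1 for t in truncated if set(cand) <= t)
            noisy = support + laplace(scale, rng)
            if noisy >= threshold:
                result[cand] = noisy
    return result


if __name__ == "__main__":
    db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 4}]
    rng = random.Random(0)
    print(noisy_frequent_itemsets(db, {1, 2, 3, 4}, 2.5, 1.0, 3, 2, rng))
```

In this sketch, lowering `max_len` from 3 to 2 reduces the sensitivity for `max_size=2` from 6 to 3, halving the noise scale, but it may discard items from the longer transactions. Enumerating every candidate is only practical for small item universes.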

Grants and funding

R01 LM011028/LM/NLM NIH HHS/United States