Title




Approximation of frequentness probability of itemsets in uncertain data
 
Author




 
Abstract




Mining frequent itemsets from transactional datasets is a well known problem with good algorithmic solutions. Most of these algorithms assume that the input data is free from errors. Real data, however, is often affected by noise. Such noise can be represented by uncertain datasets in which each item has an existence probability. Recently, Bernecker et al. (2009) proposed the frequentness probability; i.e., the probability that a given itemset is frequent, to select itemsets in an uncertain database. A dynamic programming approach to evaluate this measure was given as well. We argue, however, that for the setting of Bernecker et al. (2009), that assumes independence between the items, already wellknown statistical tools exist. We show how the frequentness probability can be approximated extremely accurately using a form of the central limit theorem. We experimentally evaluated our approximation and compared it to the dynamic programming approach. The evaluation shows that our approximation method is extremely accurate even for very small databases while at the same time it has much lower memory overhead and computation time. 
 
Language




English
 
Source (book)




Proceedings of the IEEE International Conference on Data Mining (ICDM), 2010
 
Publication




Los Alamitos, Calif.
:
IEEE Computer Society
,
2010
 
Volume/pages




p. ?
 
