Title
|
|
|
|
Finding robust itemsets under subsampling
| |
Author
|
|
|
|
| |
Abstract
|
|
|
|
Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets. |
| |
Language
|
|
|
|
English
| |
Source (journal)
|
|
|
|
ACM transactions on database systems. - New York, N.Y., 1976, currens
| |
Publication
|
|
|
|
New York, N.Y.
:
2014
| |
ISSN
|
|
|
|
0362-5915
1557-4644
[online]
| |
DOI
|
|
|
|
10.1145/2656261
| |
Volume/pages
|
|
|
|
39
:3
(2014)
, 27 p.
| |
Article Reference
|
|
|
|
20
| |
ISI
|
|
|
|
000343423500003
| |
Medium
|
|
|
|
E-only publicatie
| |
Full text (Publisher's DOI)
|
|
|
|
| |
Full text (publisher's version - intranet only)
|
|
|
|
| |
|