Depth-first non-derivable itemset mining

Calders, Toon; Goethals, Bart

Title

Author

Calders, Toon

Goethals, Bart

Abstract

Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the support of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, to find all non-derivable itemsets was proposed. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI with an order of magnitude.

Language

English

Source (book)

Proceedings of the SIAM International Conference on Data Mining

Publication

Philadelphia, Pa : Siam , 2005

ISBN

978-0-89871-593-4

Volume/pages

(2005) , p. 250-261

ISI

000289491000023

Faculty/Department				Faculty of Sciences. Mathematics and Computer Science

Research group				ADReM Data Lab (ADReM)
Publication type				P1 Proceeding

Subject				Computer. Automation

Affiliation				Publications with a UAntwerp address

Web of Science

View record in Web of Science®

View citing articles in Web of Science®

Identifier

Creation

08.10.2008

Last edited

04.03.2024

To cite this reference

https://hdl.handle.net/10067/552100151162165141