KRIMP : mining itemsets that compress

Vreeken, Jilles; van Leeuwen, Matthijs; Siebes, Arno

doi:10.1007/S10618-010-0202-X

Abstract

One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints lead to an explosion in the number of returned patterns. This is caused by large groups of patterns essentially describing the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is the set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned; a dramatic reduction, up to seven orders of magnitude, in the number of frequent itemsets. These selections, called code tables, are of high quality. This is shown with compression ratios, swap-randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.

Language

English

Source (journal)

Data mining and knowledge discovery. - Boston, Mass., 1997, currens

Publication

Boston, Mass. : 2011

ISSN

1384-5810 [print]

1573-756X [online]

Volume/pages

23:1 (2011), p. 169-214

ISI

000289106000005

Full text (Publisher's DOI)

https://doi.org/10.1007/S10618-010-0202-X

Full text (publisher's version - intranet only)

https://repository.uantwerpen.be/docman/iruaauth/cbc566/4382898.pdf

Faculty/Department

Faculty of Sciences. Mathematics and Computer Science

Research group

ADReM Data Lab (ADReM)

Publication type

A1 Journal article

Subject

Computer. Automation

Affiliation

Publications with a UAntwerp address

Identifier

Creation

19.02.2013

Last edited

15.11.2022

To cite this reference

https://hdl.handle.net/10067/1055820151162165141