Title
Summarizing categorical data by clustering attributes
Author
Faculty/Department
Faculty of Sciences. Mathematics and Computer Science
Publication type
article
Publication
Boston, Mass.
Subject
Computer. Automation
Source (journal)
Data mining and knowledge discovery. - Boston, Mass.
Volume/pages
26 (2013):1, p. 130-173
ISSN
1384-5810
ISI
000313116400005
Carrier
E
Target language
English (eng)
Full text (Publisher's DOI)
Affiliation
University of Antwerp
Abstract
For a book, its title and abstract provide a good first impression of what to expect from it. For a database, obtaining a good first impression is typically not so straightforward. While low-order statistics provide only very limited insight, downright mining the data rapidly yields too much detail for such a quick glance. In this paper we propose a middle ground and introduce a parameter-free method for constructing high-quality descriptive summaries of binary and categorical data. Our approach builds a summary by clustering attributes that strongly correlate, and uses the Minimum Description Length principle to identify the best clustering, without requiring a distance measure between attributes. Besides providing a practical overview of which attributes interact most strongly, these summaries can also be used as surrogates for the data and can easily be queried. Extensive experimentation shows that our method discovers high-quality results: correlated attributes are correctly grouped, which is verified both objectively and subjectively. As an example of using our models as surrogates for the data, we show that we can quickly and accurately query the estimated supports of frequent generalized itemsets.
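The abstract does not spell out the encoding, so the Python sketch below is only an illustration of the general idea under stated assumptions. It scores a partition of attributes with a simplified, hypothetical two-part MDL cost (model bits to list each cluster's observed value combinations and counts, plus Shannon-optimal data bits). It then greedily merges the pair of clusters that lowers the total cost most, stopping when no merge helps, so no distance measure between attributes is needed. This is not the paper's code-table encoding or search. The support query multiplies per-cluster frequencies, i.e. it assumes clusters are independent, and it handles only plain attribute=value conjunctions, not the generalized itemsets mentioned above. All function names (`cluster_cost`, `cluster_attributes`, `build_summary`, `estimate_support`) are hypothetical.

```python
from collections import Counter
from math import log2


def _projection_counts(rows, cluster):
    # Count how often each combination of values occurs on the cluster's attributes.
    return Counter(tuple(row[a] for a in cluster) for row in rows)


def cluster_cost(rows, cluster, domain_sizes):
    """Simplified two-part MDL cost in bits (hypothetical, not the paper's encoding).

    Model: list every observed value combination (log2 of each attribute's
    domain size) plus its count (log2 n). Data: Shannon codes -log2(count / n)."""
    n = len(rows)
    counts = _projection_counts(rows, cluster)
    per_entry = sum(log2(domain_sizes[a]) for a in cluster) + log2(n)
    model_bits = len(counts) * per_entry
    data_bits = -sum(c * log2(c / n) for c in counts.values())
    return model_bits + data_bits


def cluster_attributes(rows, attributes):
    """Greedy bottom-up clustering: repeatedly merge the pair of clusters whose
    merge lowers the total cost most; stop when no merge lowers it."""
    domain_sizes = {a: len({row[a] for row in rows}) for a in attributes}
    clusters = [(a,) for a in attributes]
    costs = {c: cluster_cost(rows, c, domain_sizes) for c in clusters}
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                gain = (costs[clusters[i]] + costs[clusters[j]]
                        - cluster_cost(rows, merged, domain_sizes))
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:  # no merge shortens the description
            break
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        costs[merged] = cluster_cost(rows, merged, domain_sizes)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def build_summary(rows, clusters):
    # The summary keeps only per-cluster value-combination counts, not the rows.
    return {c: _projection_counts(rows, c) for c in clusters}, len(rows)


def estimate_support(summary, n, itemset):
    """Estimate relative support of {attribute: value}, assuming the clusters
    are independent (product of per-cluster frequencies)."""
    estimate = 1.0
    for cluster, counts in summary.items():
        items = {a: v for a, v in itemset.items() if a in cluster}
        if not items:
            continue
        matching = sum(c for combo, c in counts.items()
                       if all(combo[cluster.index(a)] == v for a, v in items.items()))
        estimate *= matching / n
    return estimate


if __name__ == "__main__":
    # Toy data: B always equals A, while C varies independently of both.
    rows = [dict(A=a, B=a, C=c) for a in (0, 1) for c in (0, 1)] * 2
    clusters = cluster_attributes(rows, ["A", "B", "C"])
    summary, n = build_summary(rows, clusters)
    print(clusters, estimate_support(summary, n, {"A": 1, "B": 0}))
```

On this toy data the sketch groups A with B and leaves C on its own. Because A and B share a cluster, the summary estimates the support of {A=1, B=0} as 0, whereas treating all three attributes as independent would estimate it as 0.25.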
E-info
https://repository.uantwerpen.be/docman/iruaauth/c8cc45/3013631.pdf