Title
MDL4BMF: Minimum Description Length for Boolean Matrix Factorization MDL4BMF: Minimum Description Length for Boolean Matrix Factorization
Author
Faculty/Department
Faculty of Sciences. Mathematics and Computer Science
Publication type
article
Publication
New York :ACM ,
Subject
Computer. Automation
Source (journal)
ACM Transactions on knowledge discovery from data. - New York
Volume/pages
8(2014) :4 , 31 p.
ISSN
1556-4681
1556-4681
Article Reference
18
Carrier
E-only publicatie
Target language
English (eng)
Full text (Publishers DOI)
Affiliation
University of Antwerp
Abstract
Matrix factorizations-where a given data matrix is approximated by a product of two or more factor matrices-are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the "model order selection problem" of determining the proper rank of the factorization, that is, to answer where fine-grained structure stops, and where noise starts. Boolean Matrix Factorization (BMF)-where data, factors, and matrix product are Boolean-has in recent years received increased attention from the data mining community. The technique has desirable properties, such as high interpretability and natural sparsity. Yet, so far no method for selecting the correct model order for BMF has been available. In this article, we propose the use of the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits; for example, it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general-making it applicable for any BMF algorithm. We discuss how to construct an appropriate encoding: starting from a simple and intuitive approach, we arrive at a highly efficient data-to-model-based encoding for BMF. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
E-info
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000344352800002&DestLinkType=RelatedRecords&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000344352800002&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000344352800002&DestLinkType=CitingArticles&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
Handle