Title
Including high-cardinality attributes in predictive models a case study in churn prediction in the energy sector
Author
Faculty/Department
Faculty of Applied Economics
Publication type
article
Publication
Amsterdam ,
Subject
Economics
Mathematics
Computer. Automation
Source (journal)
Decision support systems. - Amsterdam
Volume/pages
72(2015) , p. 72-81
ISSN
0167-9236
ISI
000351792600007
Carrier
E
Target language
English (eng)
Full text (Publishers DOI)
Affiliation
University of Antwerp
Abstract
High-cardinality attributes are categorical attributes that contain a very large number of distinct values, like for example: family names, zip codes or bankaccount numbers. Within a predictive modeling setting, such features could be highly informative as it might be useful to know that people live in the same village or pay with the same bankaccount number. Despite of this notable and intuitive advantage, high-cardinality attributes are rarely used in predictive modeling. The main reason for this is that including these attributes by using traditional transformation methods is either impossible due to anonymisation of the data (when using semantic grouping of the values) or will vastly increase the dimensionality of the data set (when using dummy encoding), thereby making it difficult or even impossible for most classification techniques to build prediction models. The main contributions of this work are (1) the introduction of several possible transformation functions coming from different domains and contexts, that allow to include high-cardinality features in predictive models. (2) Using a unique data set of a large energy company with more than 1 million customers, we show that adding such features indeed improves the predictive performance of the model significantly. Moreover, (3) we empirically demonstrate that having more data leads to better prediction models, which is not observed for traditional data. As such, we also contribute to the area of big data analytics. Keywords data mining; predictive modeling; high-cardinality attributes; churn prediction
E-info
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000351792600007&DestLinkType=RelatedRecords&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000351792600007&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000351792600007&DestLinkType=CitingArticles&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
Handle