Publication
Title
Customs fraud detection : assessing the value of behavioural and high-cardinality data under the imbalanced learning issue
Author
Abstract
In this customs fraud detection application, we analyse a unique data set of 9,624,124 records resulting from a collaboration with the Belgian customs administration. The administration faces increasing levels of international trade, which puts pressure on regulatory control. Governments therefore rely on data mining to focus their limited resources on the most likely fraud cases. The literature on data mining for customs fraud detection falls short in two main respects, both of which are addressed in this paper. (1) Behavioural and high-cardinality data types are neglected due to a lack of methodology to include them. We demonstrate that such fine-grained features (e.g. the specific entities such as consignee, consignor and declarant, and the commodities involved in a declaration) are very predictive. (2) Studies in the tax domain most often use standard learning algorithms in their fraud detection applications. However, customs data are highly imbalanced, and this poses challenges for many inducers. We present a new EasyEnsemble method that integrates a support vector machine base learner in a confidence-rated boosting algorithm. This results in a fast and scalable learner that drastically improves predictive performance over the base application of a support vector machine. The results of our proposed framework reveal high AUC and lift values that translate into an immediate impact on the customs fraud detection domain through improved retrieval of tax losses and enhanced deterrence.
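The abstract names EasyEnsemble as the remedy for class imbalance but does not spell out its mechanics. As a rough illustration only, the sketch below shows the undersampling step generally associated with EasyEnsemble: each subset keeps every minority (fraud) record and adds an equally sized random draw from the majority class. The function name `balanced_subsets`, the toy labels and the seed are assumptions made for this example; the support vector machine base learner and the confidence-rated boosting step described in the paper are not reproduced here.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets lists of record indices.

    Each list holds all minority records (label 1) plus an equally sized
    random sample, drawn without replacement, of majority records (label 0).
    Hypothetical helper, not the authors' implementation.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if len(minority) > len(majority):
        raise ValueError("label 1 is expected to be the minority class")

    subsets = []
    for _ in range(n_subsets):
        # A fresh draw per subset, so different subsets see different
        # parts of the majority class.
        sampled = rng.sample(majority, len(minority))
        subsets.append(minority + sampled)
    return subsets


# Toy data (invented): 3 fraud cases among 20 declarations.
labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
subsets = balanced_subsets(labels, n_subsets=4, seed=42)
for subset in subsets:
    print(sorted(subset))
```

In a full pipeline, one base learner would be trained on each balanced subset and the resulting scores combined; how the paper combines them is not stated in this record.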
Language
English
Source (journal)
Pattern analysis and applications. - London, 1998, currens
Publication
London : 2020
ISSN
1433-7541 [print]
1433-755X [online]
DOI
10.1007/S10044-019-00852-W
Volume/pages
23 (2020) , p. 1457-1477
ISI
000493491100002
UAntwerpen
Faculty/Department
Research group
Project info
Data mining for tax fraud detection.
Digitalisation and Tax (DigiTax).
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Creation 04.11.2019
Last edited 09.10.2023