Publication
Title
Data mining techniques for software effort estimation: a comparative study
Author
Abstract
A predictive model must be both accurate and comprehensible to inspire confidence in a business setting. Previous studies have assessed both aspects in the context of software effort estimation, but they have reached no unequivocal conclusion as to which technique is most suitable. This study addresses the issue by reporting the results of a large-scale benchmarking study. The techniques under consideration include tree/rule-based models such as M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayer perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, feature subset selection using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that selecting a subset of highly predictive attributes, such as project size and development- and environment-related attributes, typically yields a significant increase in estimation accuracy.
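As an illustration of the best-performing approach named in the abstract, the following is a minimal sketch of ordinary least squares regression on log-transformed data. It is not the study's implementation: it assumes a single predictor (project size), the function names `fit_log_ols` and `predict_effort` are hypothetical, and the data points are synthetic values generated for the example, not results from the paper.

```python
import math


def fit_log_ols(sizes, efforts):
    """Fit log(effort) = a + b * log(size) by ordinary least squares.

    Hypothetical helper: one predictor only, closed-form solution.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope in log-log space
    a = mean_y - b * mean_x    # intercept in log-log space
    return a, b


def predict_effort(a, b, size):
    """Back-transform the log-scale prediction to the original effort scale."""
    return math.exp(a + b * math.log(size))


# Synthetic data (an assumption for illustration): effort = 2 * size^1.1.
sizes = [10.0, 20.0, 40.0, 80.0]
efforts = [2.0 * s ** 1.1 for s in sizes]

a, b = fit_log_ols(sizes, efforts)
estimate = predict_effort(a, b, 30.0)
```

Because the synthetic data follow an exact power law, the fitted slope `b` recovers the exponent and `exp(a)` recovers the multiplier. On real project data the fit would be approximate, and additional attributes would call for multiple regression.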
Language
English
Source (journal)
IEEE transactions on software engineering. - New York, N.Y.
Publication
New York, N.Y. : 2012
ISSN
0098-5589
Volume/pages
38:2(2012), p. 375-397
ISI
000301915200009
Full text (Publishers DOI)
UAntwerpen
Faculty/Department
Research group
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Record
Identification
Creation 11.01.2011
Last edited 07.04.2017