Publication
Title
De-identification of clinical free text in Dutch with limited training data : a case study
Author
Abstract
In order to analyse the information present in medical records while maintaining patient privacy, there is a basic need for techniques to automatically de-identify the free text in these records. This paper presents a machine learning de-identification system for clinical free text in Dutch, relying on best practices from the state of the art in de-identification of English-language texts. We combine string and pattern matching features with machine learning algorithms and compare the performance of three experimental setups using Support Vector Machines and Random Forests on a limited data set of one hundred manually obfuscated texts provided by Antwerp University Hospital (UZA). The setup with the best balance between precision and recall during development was tested on an unseen set of raw clinical texts and evaluated manually at the hospital site.
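The abstract does not list the exact features used, so the following is only a minimal sketch of the general idea, assuming token-level classification: each token gets a feature dictionary that mixes string matching (a lookup in a name list) with pattern matching (regular expressions for dates, phone numbers and digit runs), plus simple context features. The name list `FIRST_NAMES`, the patterns in `PATTERNS` and the label names `NAME`/`DATE` are hypothetical illustrations, not taken from the paper. A vectorizer such as scikit-learn's `DictVectorizer` could turn these dictionaries into input for an SVM or Random Forest; that step is not shown. The sketch also computes token-level precision, recall and F1, the measures the abstract uses to compare setups.

```python
import re

# Hypothetical gazetteer of Dutch first names (illustrative only, not from the paper).
FIRST_NAMES = {"jan", "els", "pieter", "sofie"}

# Hypothetical regular expressions for identifiers that often appear in clinical notes.
PATTERNS = {
    "date": re.compile(r"^\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}$"),
    "phone": re.compile(r"^0\d{1,2}[/ -]?\d{6,7}$"),
    "digits": re.compile(r"^\d+$"),
}


def token_features(tokens, i):
    """Feature dict for token i: string-matching, pattern-matching and context features."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "in_name_list": tok.lower() in FIRST_NAMES,  # string (gazetteer) match
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    for name, pattern in PATTERNS.items():
        feats["match_" + name] = bool(pattern.match(tok))  # pattern (regex) match
    return feats


def precision_recall_f1(gold, predicted, outside="O"):
    """Token-level precision, recall and F1 over the identifier (non-`outside`) labels."""
    # A prediction counts as correct only if it is an identifier label equal to the gold label.
    tp = sum(1 for g, p in zip(gold, predicted) if p != outside and p == g)
    fp = sum(1 for g, p in zip(gold, predicted) if p != outside and p != g)
    fn = sum(1 for g, p in zip(gold, predicted) if g != outside and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, in the tokens `["Patient", "Jan", "gezien", "op", "12/03/2013"]` the feature dictionary for `Jan` has `in_name_list` set, and the one for the date token has `match_date` set. Because a token predicted with the wrong identifier type counts as both a false positive and a false negative, this scoring is stricter than a simple "identifier vs. not identifier" split.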
Language
English
Source (book)
Workshop on NLP for Medicine and Biology
Publication
S.l. : 2013
Volume/pages
p. 18-23
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Identification
Creation 30.01.2015
Last edited 31.01.2015