Data modelling in corpus linguistics: how low may we go?
Velzen, van, Marjolein H.
Faculty of Pharmaceutical, Biomedical and Veterinary Sciences. Biomedical Sciences
Neurochemistry and behaviour
Cortex. - Milano
(2013), p. 1-10
University of Antwerp
Corpus linguistics allows researchers to process millions of words. However, the more words we analyse, i.e., the more data we acquire, the more urgent the call for correct data interpretation becomes. In recent years, a number of studies have attempted to profile the linguistic decline of some prolific authors, linking this decline to pathological conditions such as Alzheimer's Disease (AD). However, in line with the nature of the (literary) work that was analysed, numbers alone do not suffice to tell the story. The one and only objective of using statistical methods for the analysis of research data is to tell a story: what happened, when, and how. In the present study we describe a computerised but individualised approach to linguistic analysis. We propose a unifying approach, firmly grounded in Information Theory, that, independently of the specific parameter being investigated, guarantees a robust model of the temporal dynamics of an author's linguistic richness over his or her lifetime. We applied this methodology to six renowned authors with an active writing life of four decades or more: Iris Murdoch, Gerard Reve, Hugo Claus, Agatha Christie, P.D. James, and Harry Mulisch. The first three were diagnosed with probable Alzheimer's Disease, confirmed post-mortem for Iris Murdoch; the same condition was hypothesised for Agatha Christie. Our analysis reveals different patterns of evolution in lexical richness, which in turn plausibly correlate with the authors' different conditions.