Speech technology-based assessment of phoneme intelligibility in dysarthria
Faculty of Medicine and Health Sciences
International journal of language and communication disorders. - London, p. 716-730
University of Antwerp
Background: Currently, clinicians mainly rely on perceptual judgements to assess the intelligibility of dysarthric speech. Although often highly reliable, this procedure is subjective and influenced by many intrinsic variables. Certain benefits can therefore be expected from a speech technology-based intelligibility assessment. Previous attempts to develop an automated intelligibility assessment mainly relied on automatic speech recognition (ASR) systems that were trained to recognize the speech of persons without known impairments. In this paper, automatic speech alignment (ASA) systems are used instead. In addition, previous attempts only made use of phonemic features (PMF). However, since articulation is an important contributing factor to the intelligibility of dysarthric speech and since phonological features (PLF) are shared by multiple phonemes, phonological features may be more appropriate to characterize and identify dysarthric phonemes.
Aims: To investigate the reliability of objective phoneme intelligibility scores obtained by three types of intelligibility models: models using only phonemic features yielded by an automated speech aligner (PMF models), models using only phonological features (PLF models), and models using a combination of phonemic and phonological features (PMF + PLF models).
Methods & Procedures: Correlations were calculated between the objective phoneme intelligibility scores of 60 dysarthric speakers and the corresponding perceptual phoneme intelligibility scores obtained by a standardized perceptual phoneme intelligibility assessment.
Outcomes & Results: The correlations between the objective and perceptual intelligibility scores are 0.793 for the PMF models, 0.828 for the PLF models and 0.943 for the PMF + PLF models. The features selected to obtain such high correlations can be divided into six main subgroups: (1) vowel-related phonemic and phonological features, (2) lateral-related features, (3) silence-related features, (4) fricative-related features, (5) velar-related features and (6) plosive-related features.
Conclusions & Implications: The phoneme intelligibility scores of dysarthric speakers obtained by the three investigated intelligibility model types are reliable. The highest correlation between the perceptual and objective intelligibility scores was found for models combining phonemic and phonological features. The intelligibility scoring system is now ready to be implemented in a clinical tool.
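As an illustration of the evaluation described under Methods & Procedures, the following minimal sketch computes a correlation between objective and perceptual intelligibility scores. It assumes a Pearson product-moment correlation, which the abstract does not specify; the function name `pearson_r` and all score values are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists.

    Assumption: the abstract only says "correlations were calculated";
    Pearson's r is used here purely for illustration.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (percent phonemes correct) for six speakers.
# These values are invented for the example only.
perceptual = [35.0, 52.0, 61.0, 74.0, 88.0, 93.0]  # listener-based scores
objective = [40.1, 49.5, 65.2, 70.8, 85.0, 95.3]   # model-predicted scores

r = pearson_r(objective, perceptual)
print(f"r = {r:.3f}")
```

In this reading, a value of r close to 1 means that the model ranks and spaces the speakers much as the perceptual assessment does, which is the sense in which the three model types are compared above.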