Large-scale training data enhances silent speech decoding with around-ear EEG
Masakazu Inoue, Eri Hatakeyama, Yuya Kita, Shuntaro Sasai
Silent speech decoding (SSD) offers a potential communication alternative for individuals with impaired vocalization. However, conventional multi-electrode electroencephalography (EEG) and facial electromyography (EMG) systems require cumbersome preparation and are unsuitable for daily use. This study evaluates the practicality of SSD using a wearable around-ear EEG device, focusing on data scaling, cross-subject transfer, vocabulary extensibility, and online decoding performance. We collected 72 hours of around-ear EEG from 24 healthy participants and one individual with incomplete locked-in syndrome (LIS) during silent, vocalized, and attempted speech, and integrated these recordings with prior EMG and high-density EEG datasets, yielding 282.4 hours of training data in total. Using a 64-word classification task as the evaluation benchmark, we assessed: (1) whether larger datasets improve around-ear EEG–based SSD, (2) whether healthy-participant data can supplement the limited LIS-participant data despite articulatory differences, (3) transferability to unseen vocabulary, and (4) online user-interface performance. Large-scale EEG/EMG data improved SSD accuracy for both the healthy participants and the LIS participant. Training on the heterogeneous dataset achieved 56.6% accuracy for healthy users and 47.3% for the LIS participant. Fine-tuning this decoder on a new vocabulary increased accuracy by 22 percentage points relative to training from scratch. Regression analysis showed that, for decoding in the LIS participant, LIS-participant data carried approximately four times the weight of healthy-participant data, quantifying the relative value of each data source for SSD. Online experiments achieved top-1/top-5 accuracies of 47.2%/76.0% for healthy users and 26.5%/49.1% for the LIS participant. These results indicate that a lightweight, commercially feasible around-ear EEG device can enable practical SSD, including online operation, when combined with large-scale healthy-participant data. Moreover, models trained on a 64-word vocabulary facilitate decoding of a new vocabulary, providing a path toward SSD systems that require minimal LIS-participant data. This study advances non-invasive silent speech decoding toward systems suitable for everyday communication.
https://doi.org/10.1088/1741-2552/ae54d0