Large-scale training data enhances silent speech decoding with around-ear EEG
Masakazu Inoue, Eri Hatakeyama, Yuya Kita, Shuntaro Sasai
Silent speech decoding (SSD) offers a potential communication alternative for individuals with impaired vocalization. However, conventional multi-electrode electroencephalography (EEG) and facial electromyography (EMG) systems require cumbersome preparation and are unsuitable for daily use. This study evaluates the practicality of SSD using a wearable around-ear EEG device, focusing on data scaling, cross-subject transfer, vocabulary extensibility, and online decoding performance. We collected 72 hours of around-ear EEG from 24 healthy participants and one individual with incomplete locked-in syndrome (LIS) during silent, vocalized, and attempted speech, and integrated these recordings with prior EMG and high-density EEG datasets, yielding 282.4 hours of training data in total. Using a 64-word classification task as the evaluation benchmark, we assessed: (1) whether larger datasets improve around-ear EEG–based SSD, (2) whether healthy-participant data can supplement the limited LIS-participant data despite articulatory differences, (3) transferability to unseen vocabulary, and (4) online user-interface performance. Large-scale EEG/EMG data improved SSD accuracy for both the healthy participants and the LIS participant. Training on the heterogeneous dataset achieved 56.6% accuracy for healthy users and 47.3% for the LIS participant. Fine-tuning this decoder on a new vocabulary increased accuracy by 22 percentage points relative to training from scratch. Regression analysis showed that, for decoding in the LIS participant, LIS-participant data carried approximately four times the weight of healthy-participant data, quantifying the relative value of each data source for SSD. Online experiments achieved top-1/top-5 accuracies of 47.2%/76.0% for healthy users and 26.5%/49.1% for the LIS participant. These results indicate that a lightweight, commercially feasible around-ear EEG device can enable practical SSD, including online operation, when combined with large-scale healthy-participant data. Moreover, models trained on a 64-word vocabulary facilitate decoding of a new vocabulary, providing a path toward SSD systems that require minimal LIS-participant data. This study advances non-invasive silent speech decoding toward systems suitable for everyday communication.
https://doi.org/10.1088/1741-2552/ae54d0