Research Outcomes

Selected research results obtained using this dataset, demonstrating the current state of the art in non-invasive EEG/EMG-based speech decoding.

Result 01:

Brain-Gmail Interface

Dataset used

5-Word Isolated Speech EEG-EMG Dataset (Dataset 1)

Five color words were decoded in real time from 128-channel ultra-high-density EEG to drive end-to-end BCI-controlled Gmail navigation: participants selected emails to open, chose ChatGPT-generated reply candidates, and confirmed sending, using only the EEG signals recorded while they spoke the color words.

Three speech conditions were tested at different audio levels:

Speech condition    Audio level             Avg. decoding accuracy
Overt               ~60 dB                  0.407
Minimally overt     ~40 dB                  0.521
Covert (silent)     background noise only   0.212

Online decoding accuracy (5-class, chance = 20%) averaged 52.1% for minimally overt speech, which was sufficient to complete the full email workflow.
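The workflow above can be pictured as a simple command mapping: each decoded color word triggers one navigation action in the email client. The sketch below is purely illustrative; the word-to-action assignments and state handling are assumptions, not the study's actual interface logic.

```python
# Hypothetical sketch: mapping decoded color words to Gmail navigation
# actions. The specific word->action pairs are illustrative assumptions.
ACTIONS = {
    "red":    "next_item",
    "blue":   "previous_item",
    "green":  "select",        # e.g. open email / choose reply candidate
    "yellow": "back",
    "black":  "confirm_send",
}

def step(decoded_word, history):
    """Advance the (hypothetical) email workflow by one decoded command."""
    action = ACTIONS[decoded_word]
    return action, history + [action]

# Example session: open an email, move to a reply candidate, pick it, send.
history = []
for word in ["green", "red", "green", "black"]:
    _, history = step(word, history)
print(history)  # ['select', 'next_item', 'select', 'confirm_send']
```

Because the decoder is 5-class, any richer interface has to be built from such small command sets, which is why the workflow is staged into select/confirm steps.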

Result 02:

Scaling Law in EEG-based Speech Decoding

Dataset used

All datasets combined (heterogeneous electrode configurations)

A central finding across our work is that EEG/EMG-based speech decoding accuracy improves monotonically with training data volume — a scaling law analogous to those observed in large language models.

This scaling law has two major practical implications, each validated experimentally:

Cross-device generalizability. The scaling relationship is robust to hardware heterogeneity. Models trained on pooled data from g.Pangolin (128 ch), g.Scarabeo (64 ch), and eego™sports (63 ch) consistently outperform single-device models, indicating that data volume matters more than device uniformity. This finding directly motivates the JapanEEG dataset's multi-configuration design philosophy.

Population-scale pretraining benefits individual users. Using compact around-ear EEG (cEEGrid electrodes) with an OpenBCI Cyton + Daisy Board, we decoded spoken words from a 64-word vocabulary across three speech conditions (silent, vocalized, and attempted speech). Training on data from 24 healthy participants instead of a single patient produced a ~40 percentage-point improvement in decoding accuracy (~0.05 → ~0.45), demonstrating that large-scale population data substantially improves individual decoding performance — including for patients with speech impairments.
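The population-pretraining idea can be sketched with a toy nearest-centroid decoder on synthetic data: word templates are estimated from many participants' pooled trials, then used to classify a held-out user. Vocabulary size, feature dimension, and noise levels below are illustrative assumptions, not values from the study.

```python
# Toy sketch (synthetic data): estimate per-word templates from a
# population of participants, then decode a new user's trials by
# nearest centroid. All sizes and noise scales are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_words, n_subjects, dim = 8, 24, 16

# Ground-truth word "signatures" shared across people
signatures = rng.normal(size=(n_words, dim))

def trials(n_per_word, noise):
    X = np.repeat(signatures, n_per_word, axis=0)
    y = np.repeat(np.arange(n_words), n_per_word)
    return X + rng.normal(scale=noise, size=X.shape), y

# Pool 5 trials/word from each of 24 participants to estimate centroids
Xp, yp = trials(5 * n_subjects, noise=1.0)
centroids = np.stack([Xp[yp == w].mean(axis=0) for w in range(n_words)])

# Classify a held-out user's trials against the population centroids
Xt, yt = trials(10, noise=1.0)
pred = np.argmin(((Xt[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
acc = (pred == yt).mean()
print("accuracy:", acc)
```

With only a single user's handful of trials, the centroid estimates would be far noisier; pooling 24 participants stabilizes them, which is the mechanism behind the reported ~0.05 → ~0.45 jump.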

Together, these results establish a clear and predictable relationship between data accumulation and BCI performance: more data yields better decoding, regardless of the recording setup.
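A scaling relationship of this kind is typically checked by fitting a power law on log-transformed data. The sketch below uses synthetic placeholder points, not measurements from this work, to show the fitting procedure.

```python
# Minimal sketch of fitting a power-law scaling curve (synthetic data,
# not measurements from this work): err ~ a * hours^b, with b < 0.
import numpy as np

hours = np.array([1, 2, 4, 8, 16, 32])                # training data volume
err = np.array([0.80, 0.72, 0.65, 0.58, 0.52, 0.47])  # decoding error rate

# log err = log a + b * log hours  ->  linear fit in log-log space
b, log_a = np.polyfit(np.log(hours), np.log(err), 1)
print(f"exponent b = {b:.3f}")        # negative: error falls as data grows

# Extrapolate the fitted law to 64 hours of data
pred_err = np.exp(log_a) * 64 ** b
print(f"predicted error at 64 h: {pred_err:.3f}")
```

The practical value of such a fit is predictability: it lets one estimate how much additional recording time is needed to reach a target accuracy before collecting the data.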

Result 03:

Electrode-Configuration-Agnostic Speech Decoding Model

Dataset used

All datasets combined (heterogeneous electrode configurations)

We propose an EEG-based speech decoding model that is agnostic to electrode configuration. Unlike conventional approaches, which require a fixed electrode placement, this model adapts to arbitrary montages via a spatial positional encoder trained on 3D electrode coordinates.

The architecture (11 Conformer layers) takes EEG/EMG tokens as input and outputs an audio latent representation, a mora probability sequence, and word probabilities. Four spatial integration variants were evaluated:

  • (a) Global average pooling — baseline
  • (b) Electrode-specific — per-electrode weighting
  • (c) Subject-specific — per-subject weighting
  • (d) On-the-fly kernel — transformer-based spatial encoding from 3D coordinates (best generalization)
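The coordinate-based idea behind variant (d) can be sketched in a few lines: each electrode's 3D position is embedded (here with a simple sinusoidal encoding, an assumption, not the paper's exact design), and per-channel features are pooled with attention weights derived from those embeddings, so the output size is independent of the montage.

```python
# Minimal numpy sketch of coordinate-driven spatial pooling. The
# sinusoidal encoding, dimensions, and single learned query vector are
# illustrative assumptions, not the proposed model's exact design.
import numpy as np

def coord_encoding(xyz, d_model=16):
    """Sinusoidal features of each electrode's (x, y, z) position."""
    freqs = 2.0 ** np.arange(d_model // 6)           # per-axis frequencies
    ang = xyz[:, :, None] * freqs[None, None, :]     # (n_ch, 3, n_freq)
    feats = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return feats.reshape(xyz.shape[0], -1)           # (n_ch, feat_dim)

def spatial_pool(chan_feats, xyz, query):
    """Attention-pool per-channel features using coordinate embeddings."""
    pe = coord_encoding(xyz)                         # keys from positions
    scores = pe @ query / np.sqrt(pe.shape[1])
    w = np.exp(scores - scores.max()); w /= w.sum()  # softmax over channels
    return w @ chan_feats                            # montage-size-agnostic

rng = np.random.default_rng(0)
query = rng.normal(size=coord_encoding(rng.normal(size=(1, 3))).shape[1])

for n_ch in (63, 64, 128):                           # heterogeneous montages
    xyz = rng.normal(size=(n_ch, 3))                 # electrode coordinates
    feats = rng.normal(size=(n_ch, 32))              # per-channel features
    out = spatial_pool(feats, xyz, query)
    print(n_ch, "->", out.shape)                     # same output size
```

Because the pooling weights are computed from coordinates rather than channel indices, 63-, 64-, and 128-channel recordings all map to the same fixed-size representation, which is what allows them to be mixed in one training corpus.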

This enables seamless integration of recordings from different labs and devices into a single training corpus, directly supporting the JapanEEG dataset's multi-configuration design.