Data
Specifications
This project provides openly accessible multimodal EEG/EMG recordings for brain-computer interface research. All datasets are hosted on OpenNeuro and associated with peer-reviewed publications.
General Overview
Summary of the entire project data structure, supported modalities, and publication mappings.
Audio
Microphone recordings synchronized with EEG, downsampled to 16 kHz. Enables audio-EEG cross-modal analysis and ground truth transcription.
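Since the audio is synchronized with the EEG, an EEG sample index can be converted to the matching audio sample index by going through time in seconds. The following is a minimal sketch, not part of the released tooling: the helper name is hypothetical, and it assumes that sample 0 of both streams marks the same instant. The EEG rate is passed in explicitly because it differs between datasets.

```python
AUDIO_RATE_HZ = 16_000  # audio rate after downsampling, from the description above


def eeg_to_audio_index(eeg_index: int, eeg_rate_hz: float,
                       audio_rate_hz: float = AUDIO_RATE_HZ) -> int:
    """Map an EEG sample index to the nearest audio sample index.

    Assumes sample 0 of both streams marks the same instant (hypothetical).
    """
    seconds = eeg_index / eeg_rate_hz
    return round(seconds * audio_rate_hz)


# 320 EEG samples at 256 Hz correspond to 1.25 s of signal.
print(eeg_to_audio_index(320, eeg_rate_hz=256))  # 20000
```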
EMG (Electromyography)
Bipolar facial EMG (orbicularis oris upper/lower, EOG) synchronized with EEG. Used for myoelectric artifact removal and joint EEG/EMG decoding models.
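One common way to use a synchronized EMG channel for artifact removal is linear regression: estimate how strongly the EMG reference leaks into an EEG channel and subtract that contribution. The sketch below illustrates this generic technique for a single EEG channel and a single EMG reference using only the standard library. It is not necessarily the method used in the related publication, and the function name is hypothetical.

```python
def regress_out(eeg: list[float], emg: list[float]) -> list[float]:
    """Remove the least-squares projection of one EMG channel from one EEG channel.

    Both channels are mean-centered first; the returned signal is also centered.
    """
    if not eeg or len(eeg) != len(emg):
        raise ValueError("channels must be non-empty and equally long")
    n = len(eeg)
    eeg_mean = sum(eeg) / n
    emg_mean = sum(emg) / n
    eeg_c = [x - eeg_mean for x in eeg]
    emg_c = [x - emg_mean for x in emg]
    power = sum(v * v for v in emg_c)
    # Regression weight: how strongly the EMG reference leaks into the EEG channel.
    beta = sum(a * b for a, b in zip(eeg_c, emg_c)) / power if power else 0.0
    return [a - beta * b for a, b in zip(eeg_c, emg_c)]


emg = [1.0, -1.0, 2.0, -2.0]
eeg = [2.0 * v + 0.5 for v in emg]  # toy EEG: pure EMG leakage plus an offset
cleaned = regress_out(eeg, emg)     # all values are (numerically) zero
```

With real recordings the same idea extends to several reference channels via multivariate least squares, at the cost of also removing any neural activity that correlates with the EMG.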
| Dataset | Task | Device | EEG channels | Duration | Related publication |
|---|---|---|---|---|---|
| uhd-speech | Overt / Min-overt / Covert | g.Pangolin | 128 ch | ~113 h | Sasai et al. bioRxiv 2024 |
| silent-speech-word | Silent speech (word) | eego sports | 63 ch | 16.1 h | Sasai et al. bioRxiv 2024 |
| silent-speech-sentence | Silent speech (sentence) | eego sports | 63 ch | 13.6 h | Sasai et al. bioRxiv 2024 |
| vocalized-eego | Vocalized speech | eego sports | 63 ch | 49.2 h | Sasai et al. bioRxiv 2024 |
| vocalized-scarabeo | Vocalized speech | g.SCARABEO | 62 ch | 27.7 h | Sasai et al. bioRxiv 2024 |
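For scripting against the overview, the table can be mirrored as a small in-memory catalog. This is an illustrative sketch only: the `Dataset` class and helper names are hypothetical, the values are copied from the table above, and the duration of `uhd-speech` is approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    device: str
    eeg_channels: int
    hours: float


# Values copied from the overview table; 113 h for uhd-speech is approximate.
CATALOG = [
    Dataset("uhd-speech", "g.Pangolin", 128, 113.0),
    Dataset("silent-speech-word", "eego sports", 63, 16.1),
    Dataset("silent-speech-sentence", "eego sports", 63, 13.6),
    Dataset("vocalized-eego", "eego sports", 63, 49.2),
    Dataset("vocalized-scarabeo", "g.SCARABEO", 62, 27.7),
]


def total_hours(datasets: list[Dataset]) -> float:
    """Sum the recording duration over a list of datasets."""
    return sum(d.hours for d in datasets)


def by_device(device: str) -> list[Dataset]:
    """Return all catalog entries recorded with the given device."""
    return [d for d in CATALOG if d.device == device]


print(f"{total_hours(CATALOG):.1f} h in total")                  # about 219.6 h
print(f"{total_hours(by_device('eego sports')):.1f} h on eego sports")  # 78.9 h
```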
DATASET 01:
5-Word Isolated Speech EEG-EMG Dataset
Participants spoke the same word five times consecutively following a 3-count auditory cue, across three speech production modes.
Protocol
| Item | Value |
|---|---|
| Participants | 3 healthy adults (male, age 29–36) |
| Speech conditions | Overt / Minimally overt / Covert (imagined) |
| Vocabulary | 5 color words (Japanese) |
| Task structure | 5 repetitions per word following a 3-count cue, 1.25 sec/repetition |
| EEG device | g.Pangolin (g.tec, Austria) |
| EEG channels | 128 (8 sheets × 16 electrodes) |
| Sampling rate | 256 Hz (online) |
| Electrode placement | Left-hemisphere language areas |
| EMG | EOG + upper/lower orbicularis oris (bipolar) |
| Ethics | Shiba Palace Clinic Ethics Review Committee; Declaration of Helsinki |
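The task structure implies a fixed epoch length: at 256 Hz, one 1.25 s repetition spans 320 samples, so the five repetitions of a word cover 1600 samples. The sketch below splits one single-channel trial into its repetitions. It assumes that the trial has already been cut so that it starts at the onset of the first repetition (after the 3-count cue); how that onset is marked in the released files is not specified here, and the function name is hypothetical.

```python
SAMPLING_RATE_HZ = 256  # from the protocol table
REPETITION_S = 1.25     # duration of one repetition
N_REPETITIONS = 5       # repetitions per word


def split_repetitions(trial, rate_hz=SAMPLING_RATE_HZ,
                      repetition_s=REPETITION_S, n_repetitions=N_REPETITIONS):
    """Split one trial (a sequence of samples) into equally long repetitions.

    Assumes the trial begins at the onset of the first repetition.
    """
    samples_per_rep = round(rate_hz * repetition_s)  # 320 samples at 256 Hz
    needed = samples_per_rep * n_repetitions
    if len(trial) < needed:
        raise ValueError(f"trial has {len(trial)} samples, need at least {needed}")
    return [trial[i * samples_per_rep:(i + 1) * samples_per_rep]
            for i in range(n_repetitions)]


trial = list(range(1600))          # placeholder samples for one channel
epochs = split_repetitions(trial)
print(len(epochs), len(epochs[0]))  # 5 320
```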
Offline Decoding Performance
(EEGNet, for reference)
| Condition | Avg. accuracy (3 subjects) |
|---|---|
| Overt | 94.6% |
| Minimally overt | 94.9% |
| Covert | 91.1% |
Real-world Application
Using this dataset, we demonstrated a real-time EEG-based Gmail control interface: participants navigated their inbox and generated AI-assisted email replies via decoding of 5 color words (green, magenta, orange, violet, yellow).
Access
DATASET 02:
Open-Vocabulary Sentence Reading EEG-EMG Dataset
Participants read sentences from novels, text-based TV games, and the JSUT corpus aloud at natural speed, producing long-form speech.
Protocol
| Item | Value |
|---|---|
| Participants | 3 healthy adults (male, age 22–44) |
| Speech conditions | Overt |
| Vocabulary | Open (Japanese) |
| Task structure | Continuous speech (read sentences from on-screen texts) |
| EEG device | g.Pangolin, g.SCARABEO (g.tec, Austria), eego sports (ANT Neuro) |
| EEG channels | 128 (g.Pangolin), 64 (g.SCARABEO), 63 (eego sports) |
| EMG channels | 5 ch (healthy: cheek, upper/lower mouth, chin, throat; patient: above/below eye) |
| Sampling rate | 1200 Hz (g.Pangolin, g.SCARABEO), 1024 Hz (eego sports) |
| Electrode placement | Left-hemisphere (g.Pangolin), whole-brain (g.SCARABEO, eego sports) |
| EMG | EOG + upper/lower orbicularis oris (bipolar) |
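Because the devices record at different rates, pooling recordings across devices usually means resampling to a common rate first. The sketch below derives the integer up/down factors for rational resampling from the native rates in the table. The target rate of 256 Hz is an assumption made for illustration (it matches the online rate of Dataset 01), and the names are hypothetical.

```python
from fractions import Fraction

# Native sampling rates from the protocol table.
NATIVE_RATES_HZ = {"g.Pangolin": 1200, "g.SCARABEO": 1200, "eego sports": 1024}
TARGET_RATE_HZ = 256  # assumed common rate, not prescribed by the dataset


def resample_factors(native_hz: int, target_hz: int) -> tuple[int, int]:
    """Return (up, down) such that native_hz * up / down == target_hz."""
    ratio = Fraction(target_hz, native_hz)  # reduced to lowest terms automatically
    return ratio.numerator, ratio.denominator


FACTORS = {device: resample_factors(rate, TARGET_RATE_HZ)
           for device, rate in NATIVE_RATES_HZ.items()}
print(FACTORS["g.Pangolin"])   # (16, 75)
print(FACTORS["eego sports"])  # (1, 4)
```

The resulting pair can be passed as the `up` and `down` arguments of a polyphase resampler such as `scipy.signal.resample_poly`, which applies an anti-aliasing filter as part of the rate change.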
Access
DATASET 03:
64-Class Word/Sentence Reading EEG-EMG Dataset
(TBD)