Moonshot Goal 1 Project
Large-scale Speech EEG Database for BCI
Enabling silent speech decoding with multimodal EEG/EMG recordings.
Open data and pre-trained models for the global brain-computer interface research community.
Decoding Accuracy
Silent speech word classification, healthy participants (N=8)
Participants
8 healthy adults
+ 1 patient
+ calibration data
Hours of EEG / EMG
Across 3 devices
Comprehensive EEG Datasets
Structured for immediate use in machine learning pipelines.
Includes multimodal recordings synchronized with high-density EEG.
DATASET 01:
5-Word Isolated Speech EEG-EMG Dataset
EEG recordings from three participants performing overt, minimally overt, and covert production of five Japanese color words. EEG was recorded at 256 Hz using a 128-channel g.Pangolin system over left-hemisphere language areas, with concurrent EMG from the orbicularis oris.
DATASET 02:
Open-Vocabulary Sentence Reading EEG-EMG Dataset
EEG/EMG recordings from three long-term participants reading sentences aloud at a natural pace from novels, text-based video games, and the JSUT corpus. EEG was recorded at 1200/1024 Hz using g.Pangolin (128 ch), g.Scarabeo (64 ch), and eegosports (63 ch) systems, with 3-channel facial EMG/EOG. Electrode placement ranged from left-hemisphere to whole-brain coverage, depending on the device.
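Because the recordings above come at different sampling rates, a downstream pipeline will often bring them to one common rate first. The following is a minimal sketch of that step, not part of the published tooling: the helper name `resample_to` and the 256 Hz target (chosen here only because it matches Dataset 01) are assumptions for illustration.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def resample_to(eeg, fs_in, fs_out):
    """Resample an array shaped (channels, samples) from fs_in to fs_out (integer Hz).

    Hypothetical helper: polyphase resampling with SciPy's default
    anti-aliasing FIR filter.
    """
    ratio = Fraction(fs_out, fs_in)  # e.g. 256/1200 reduces to 16/75
    return resample_poly(eeg, ratio.numerator, ratio.denominator, axis=-1)


# One second of placeholder data per sampling rate (zeros stand in for EEG).
rec_1200 = np.zeros((128, 1200))  # 128 channels at 1200 Hz
rec_1024 = np.zeros((64, 1024))   # 64 channels at 1024 Hz

common_a = resample_to(rec_1200, 1200, 256)
common_b = resample_to(rec_1024, 1024, 256)

# A 10 Hz sine sampled at 1200 Hz, used to check that the waveform survives.
t_in = np.arange(1200) / 1200
sine_256 = resample_to(np.sin(2 * np.pi * 10 * t_in)[None, :], 1200, 256)[0]
```

Polyphase resampling is used here because the rate ratios are rational; samples near the edges of each recording are affected by the filter and may need to be trimmed.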
DATASET 03:
64-Class Word/Sentence Reading EEG-EMG Dataset
EEG/EMG recordings from participants reading words/sentences aloud.
ANALYSIS PLATFORM
ArKairos: Open EEG Analysis Platform
Features
An open-source software platform for reproducible EEG and EMG data processing. ArKairos provides pre-configured Docker containers and a Python SDK compatible with PyTorch and TensorFlow, allowing researchers worldwide to reproduce our decoding pipelines and build on top of them.
- Pre-configured Docker environments for immediate setup
- Python SDK for data loading, streaming, and preprocessing
- PyTorch & TensorFlow integration
- Standardized preprocessing pipelines (notch filter, CAR, bandpass, adaptive EMG removal)
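The preprocessing steps named in the last item can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the ArKairos implementation: the function names, the 50 Hz line frequency, the 1-45 Hz passband, and the use of a normalized LMS filter for the adaptive EMG step are all choices made for this example.

```python
import numpy as np
from scipy import signal


def preprocess_eeg(eeg, fs, line_freq=50.0, band=(1.0, 45.0)):
    """Notch -> CAR -> bandpass for an array shaped (channels, samples).

    line_freq and band are assumed defaults, not values from the project.
    """
    # Notch filter: suppress power-line interference at line_freq.
    b, a = signal.iirnotch(w0=line_freq, Q=30.0, fs=fs)
    x = signal.filtfilt(b, a, eeg, axis=-1)
    # Common average reference (CAR): subtract the across-channel mean
    # from every channel at each time point.
    x = x - x.mean(axis=0, keepdims=True)
    # Zero-phase Butterworth bandpass, run as second-order sections.
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x, axis=-1)


def nlms_remove_reference(channel, reference, n_taps=8, mu=0.1, eps=1e-8):
    """Subtract the part of `channel` that is linearly predictable from
    `reference` (for example an EMG channel), using normalized LMS."""
    channel = np.asarray(channel, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    cleaned = channel.copy()
    for n in range(n_taps - 1, channel.size):
        window = reference[n - n_taps + 1:n + 1][::-1]  # newest sample first
        error = channel[n] - weights @ window           # residual after removal
        weights += mu * error * window / (window @ window + eps)
        cleaned[n] = error
    return cleaned


# Synthetic demo: 4 channels, 2 s at 256 Hz, 10 Hz activity plus 50 Hz hum.
fs = 256
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
amps = np.array([[1.0], [2.0], [3.0], [4.0]])
eeg = amps * np.sin(2 * np.pi * 10 * t) + 5.0 * np.sin(2 * np.pi * 50 * t)
clean = preprocess_eeg(eeg, fs)

# Adaptive removal: white noise stands in for an EMG reference channel.
emg = rng.standard_normal(t.size)
target = np.sin(2 * np.pi * 10 * t)
contaminated = target + 0.8 * emg
denoised = nlms_remove_reference(contaminated, emg)
```

Zero-phase filtering (`filtfilt`, `sosfiltfilt`) avoids shifting waveforms in time, which suits offline analysis. A streaming pipeline would need causal filters instead, and the line frequency should be set to match the recording site.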