Moonshot Goal 1 Project

Large-scale Speech EEG Database for BCI

Enabling silent speech decoding with multimodal EEG/EMG recordings.
Open data and pre-trained models for the global brain-computer interface research community.

Decoding Accuracy

95.3%

Silent speech word classification, healthy participants (N=8)

Participants

8 healthy adults + 1 patient + calibration data

Hours of EEG / EMG

650+

Across 3 devices

DATASET 01:

5-Word Isolated Speech EEG-EMG Dataset

EEG recordings from three participants performing overt, minimally overt, and covert production of five Japanese color words. EEG was recorded at 256 Hz using a 128-channel g.Pangolin system over left-hemisphere language areas, with concurrent EMG from the orbicularis oris.

128ch EEG | Overt / Covert | 3 Participants | g.Pangolin
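
To make the recording parameters above concrete, here is a minimal sketch of how a single trial from this dataset could be described in Python. The sampling rate, channel count, vocabulary size, and the three speech modes come from the description; the `Trial` class, its field names, and the epoch duration are hypothetical and are not the released file schema.

```python
from dataclasses import dataclass
from enum import Enum

SAMPLING_RATE_HZ = 256  # stated in the dataset description
N_EEG_CHANNELS = 128    # stated in the dataset description
N_WORDS = 5             # five Japanese color words


class SpeechMode(Enum):
    OVERT = "overt"
    MINIMALLY_OVERT = "minimally_overt"
    COVERT = "covert"


@dataclass(frozen=True)
class Trial:
    """One isolated-word trial (hypothetical layout, not the released schema)."""

    participant: int   # index of the participant
    mode: SpeechMode   # how the word was produced
    word_id: int       # 0..4, index into the five-word vocabulary
    duration_s: float  # assumed epoch length in seconds

    def __post_init__(self):
        if not 0 <= self.word_id < N_WORDS:
            raise ValueError(f"word_id must be in 0..{N_WORDS - 1}")

    def eeg_shape(self):
        """Return (channels, samples) of the EEG epoch for this trial."""
        return (N_EEG_CHANNELS, round(self.duration_s * SAMPLING_RATE_HZ))
```

Under these assumptions, a 2-second covert trial would hold a 128 x 512 EEG array.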


DATASET 02:

Open-Vocabulary Sentence Reading EEG-EMG Dataset

EEG/EMG recordings from three long-term participants reading sentences aloud at natural speed from novels, text-based TV games, and the JSUT corpus. EEG was recorded at 1200 Hz or 1024 Hz, depending on the device, using g.Pangolin (128 ch), g.Scarabeo (64 ch), and eegosports (63 ch) systems, with 3-channel facial EMG/EOG. Electrode placement ranged from left-hemisphere to whole-brain coverage depending on the device.

EEG + EMG
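
Because the recordings come at two sampling rates, pooling them usually means resampling to a common rate first. The sketch below only works out the integer up/down factors and the resulting length for a rational resampler; the choice of 1024 Hz as the common rate is an assumption for illustration, and the function names are hypothetical.

```python
from math import gcd


def resample_factors(fs_in, fs_out):
    """Return the smallest integer (up, down) pair with fs_in * up / down == fs_out."""
    g = gcd(fs_in, fs_out)
    return fs_out // g, fs_in // g


def resampled_length(n_samples, fs_in, fs_out):
    """Number of output samples after rational resampling, rounded up."""
    up, down = resample_factors(fs_in, fs_out)
    return -((-n_samples * up) // down)  # ceiling division on integers
```

For example, `resample_factors(1200, 1024)` gives `(64, 75)`, so one second of 1200 Hz data (1200 samples) maps to 1024 samples. Factors of this kind are what polyphase resamplers such as SciPy's `resample_poly` take as `up` and `down`.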


DATASET 03:

64-Class Word/Sentence Reading EEG-EMG Dataset

EEG/EMG recordings from participants reading words/sentences aloud.


Features

ArKairos is an open-source software platform for reproducible EEG and EMG data processing. It provides pre-configured Docker containers and a Python SDK compatible with PyTorch and TensorFlow, so that researchers worldwide can reproduce our decoding pipelines and build on them.

Pre-configured Docker environments for immediate setup
Python SDK for data loading, streaming, and preprocessing
PyTorch & TensorFlow integration
Standardized preprocessing pipelines (notch filter, CAR, bandpass, adaptive EMG removal)
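
As a rough illustration of two of the preprocessing steps named above, the following sketch implements common average referencing (CAR) and a second-order IIR notch filter using only the Python standard library. It is not the ArKairos implementation: the function names, the 50 Hz line frequency, and the quality factor `q=30` are assumptions, and the bandpass and adaptive EMG removal steps are not shown. A real pipeline would normally use an array library rather than nested lists.

```python
import math


def common_average_reference(data):
    """Subtract the across-channel mean from every sample.

    data: list of channels, each a list of samples of equal length.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    means = [sum(ch[t] for ch in data) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in data]


def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch coefficients (audio-EQ cookbook form), normalised so a[0] == 1.

    f0: frequency to remove in Hz (assumed 50 Hz power line in the usage below).
    fs: sampling rate in Hz.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_filter(b, a, x):
    """Filter one channel with a second-order section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def preprocess(data, fs, line_freq=50.0):
    """Apply CAR, then the notch filter, to every channel."""
    b, a = notch_coefficients(line_freq, fs)
    return [biquad_filter(b, a, ch) for ch in common_average_reference(data)]
```

With `fs=256`, a pure 50 Hz sine is suppressed once the filter's start-up transient has decayed, while a 10 Hz component passes almost unchanged. The filter here is causal and therefore introduces phase delay; offline analyses often use zero-phase filtering instead.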
