Support reading comment text/content from Nihon Kohden EEG files #13633

@eulerleibniz

Describe the new feature or enhancement

Description

Nihon Kohden EEG files contain comments/annotations with textual and image content, but currently this information is not accessible when reading the data with MNE.

I previously raised this question on the MNE Discourse forum, where the limitation was discussed:
https://mne.discourse.group/t/how-to-read-the-nihon-kohden-eeg-files-comments-content-of-the-comment/11680

At the moment, MNE can detect the presence/timing of comments but does not expose the actual text content of those comments to the user.

Why this matters

The comment text in Nihon Kohden recordings often contains clinically and experimentally relevant metadata, such as:

  • Seizure/event types and full descriptions added by experts
  • Manual annotations added by nurses during recording

Losing this information during import leaves downstream analysis incomplete, especially for machine learning, and forces users to rely on external vendor software.

Current behavior

Given code like this:

import mne 
print(mne.__version__)
raw = mne.io.read_raw_nihon("FJ00231Z.EEG", preload=False) # OR preload=True
for ann in raw.annotations:
    print(ann)

This outputs the annotations as expected, but it does not read the content of the comments. Here is sample output from the code:

1.11.0 # MNE version
Loading FJ00231Z.EEG
Found 21E file, reading channel names.
Reading header from Path\To\File\EEG2100\FJ00231Z.EEG # EEG 2100 Device
Found PNT file, reading metadata.
Found LOG file, reading events.

OrderedDict({'onset': np.float64(11768.0), 'duration': np.float64(0.0), 'description': np.str_('eye close'), 'orig_time': datetime.datetime(2025, 12, 27, 9, 49, 31, tzinfo=datetime.timezone.utc), 'extras': {}})

OrderedDict({'onset': np.float64(13307.568), 'duration': np.float64(0.0), 'description': np.str_('P_COMMENT'), 'orig_time': datetime.datetime(2025, 12, 27, 9, 49, 31, tzinfo=datetime.timezone.utc), 'extras': {}})

Annotations with 'description': np.str_('P_COMMENT') correspond to comments, such as the one shown in the image attached to the original issue, and their content is currently not readable. In summary:

  • Nihon Kohden EEG files can be read
  • Annotations can be read
  • Comment timings are available
  • All comments are shown as P_COMMENT, and their text/content is not accessible

Expected behavior

At a minimum, the comment text should be parsed and exposed, ideally as:

  • Annotations.description, or
  • an entry in the extras field of the annotation, ideally as a dict
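
As a rough illustration of the second option, the sketch below attaches comment text to P_COMMENT entries using plain dicts shaped like the annotation output above, so it does not depend on MNE. It assumes, without confirmation from the file format, that the Nth P_COMMENT annotation corresponds to the Nth parsed comment. The function name attach_comment_text and the comment_text key are hypothetical and not part of MNE.

```python
from typing import Any


def attach_comment_text(
    annotations: list[dict[str, Any]], comment_texts: list[str]
) -> list[dict[str, Any]]:
    """Return copies of the annotations with comment text stored in 'extras'.

    ASSUMPTION: P_COMMENT annotations and comment texts appear in the same
    order, so they can be paired positionally.
    """
    texts = iter(comment_texts)
    result = []
    for ann in annotations:
        # Copy so the caller's annotations are left untouched.
        new_ann = dict(ann)
        new_ann["extras"] = dict(ann.get("extras", {}))
        if ann["description"] == "P_COMMENT":
            text = next(texts, None)  # None when comments run out
            if text is not None:
                new_ann["extras"]["comment_text"] = text
        result.append(new_ann)
    return result


annotations = [
    {"onset": 11768.0, "duration": 0.0, "description": "eye close", "extras": {}},
    {"onset": 13307.568, "duration": 0.0, "description": "P_COMMENT", "extras": {}},
]
# "example comment text" is a made-up placeholder, not data from a real file.
enriched = attach_comment_text(annotations, ["example comment text"])
for ann in enriched:
    print(ann["description"], ann["extras"])
```

Keeping the description as P_COMMENT and putting the text in extras would leave existing code that filters on P_COMMENT working, whereas replacing the description would make the text visible wherever annotations are already displayed.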

Describe your proposed implementation

Basic code snippet

Here is the code I currently use to read the comments from a given .CMT file. It should eventually be integrated into mne.io.read_raw_nihon, but it is not complete yet, so this can wait.

import re
import string
from dataclasses import dataclass


@dataclass
class Comment:
    timestamp: int
    text: str


TS_RE = re.compile(rb"(\d{20})")  # Each comment is preceded by a 20-digit timestamp

# The .CMT file contains many NULL and control characters; only printable ASCII is kept
PRINTABLE = set(bytes(string.printable, "ascii"))


def clean_bytes(b: bytes) -> str:
    # Keep printable ASCII (including space and newline); replace everything else with a space.
    # NOT TESTED WITH COMMENTS INCLUDING IMAGE LINKS
    cleaned_byte = bytes(c if c in PRINTABLE else ord(" ") for c in b)
    cleaned_str = cleaned_byte.decode("ascii", errors="ignore").strip()
    # At least 10 control characters are included before the text, so skip them.
    cleaned_str = cleaned_str[10:].lstrip()
    return cleaned_str


def parse_cmt(path: str) -> list[Comment]:
    # Read the whole file as bytes and close the handle promptly.
    with open(path, "rb") as f:
        data = f.read()
    matches = list(TS_RE.finditer(data))
    records = []
    for i, m in enumerate(matches):
        ts = int(m.group(1).decode("ascii"))
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)

        raw_text = data[start:end]
        text = clean_bytes(raw_text)

        if text:
            records.append(Comment(timestamp=ts, text=text))

    return records


records = parse_cmt("ROOT_PATH/NKT/EEG2100/FJ00231Z.CMT")
for r in records:
    print("📝")
    print(r.timestamp)
    print(r.text)
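
The splitting logic can be sanity-checked without a real recording. The sketch below is a minimal, self-contained variant that works on an in-memory byte string. The sample bytes are synthetic and only mimic the "20-digit timestamp followed by padded text" pattern described above; they are not taken from an actual .CMT file. The 10-character prefix skip is deliberately left out here because the synthetic data has no such prefix, and split_comments is a hypothetical helper name.

```python
import re
import string

TS_RE = re.compile(rb"(\d{20})")
PRINTABLE = set(string.printable.encode("ascii"))


def split_comments(data: bytes) -> list[tuple[int, str]]:
    """Split a byte blob into (timestamp, text) pairs at 20-digit runs."""
    matches = list(TS_RE.finditer(data))
    out = []
    for i, m in enumerate(matches):
        # Each chunk runs from the end of this timestamp to the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(data)
        chunk = data[m.end():end]
        # Replace non-printable bytes with spaces, then trim the padding.
        text = bytes(c if c in PRINTABLE else 32 for c in chunk).decode("ascii")
        text = text.strip()
        if text:
            out.append((int(m.group(1)), text))
    return out


# SYNTHETIC data: neutral timestamps and NUL padding, not a real file layout.
sample = (
    b"\x00\x01"
    + b"00000000000000000001" + b"\x00\x00first note\x00\x00"
    + b"00000000000000000002" + b"\x00second note\x00"
)
for ts, text in split_comments(sample):
    print(ts, text)
```

One caveat this makes visible: any run of 20 or more digits inside a comment's own text would be mistaken for a timestamp, so a proper reader would need the actual record layout rather than a regex.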

Limitations:

  • Tested only on EEG 2100 devices
  • It is just a workaround to start from
  • Nihon Kohden comments can contain a color code, background transparency, and even a reference image. None of these are read here, as I have never seen experts actually use them.

Describe possible alternatives

No alternatives currently

Additional context

No response
