
Support Google Keep JSON Import, Note Deduplication, and XML/Encoding Fixes #13

Open: schwegler wants to merge 2 commits into mgks:main from schwegler:feat/keep-json-support

schwegler commented Jun 14, 2026


Overview

This PR adds support for importing Google Keep's native .json export format alongside the existing HTML format. It also deduplicates notes (preferring the JSON file over the HTML file when a Google Takeout archive contains both), maps native tags and checkboxes to their ENEX equivalents, and fixes a character-encoding bug in the zip-extraction worker that corrupted emoji and tags.


Key Changes

1. Google Keep JSON Import Support & Auto-Deduplication

  • Format Detection Upgrade (src/config/formats.js):
    • Extended detectFormat to look for folder-structure indicators (e.g., Keep/ directories) or standard Takeout index files such as archive_browser.html, so Google Keep archives are auto-detected more reliably.
    • Accepts both .html and .json files as Google Keep notes.
  • Deduplication & Filtering (src/main.js):
    • When rendering the file list or computing selection state, the application now maps each file path to its note to identify duplicates.
    • If a note exists as both a .json file and an .html file, the UI shows only the .json version, so the same note is not imported twice.
    • Explicitly excludes system artifacts such as archive_browser.html.
  • JSON Parser (src/main.js):
    • Added parseKeepJson(content), which maps the native Google Keep schema onto internal note objects:
      • Converts checklist items (listContent) into HTML checkboxes.
      • Escapes HTML special characters in text notes and converts newlines to <br/> tags.
      • Maps labels to tags, attachments to binary file paths, and microsecond timestamps (createdTimestampUsec, userEditedTimestampUsec) to ISO 8601 strings.
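The detection and deduplication rules above can be sketched as two small path helpers. This is a minimal illustration under stated assumptions, not the code in detectFormat or src/main.js: looksLikeKeepArchive and dedupeKeepNotes are hypothetical names, and "same note" is assumed to mean "same path once the .json/.html extension is removed".

```javascript
// Hypothetical helpers illustrating the detection and dedup rules described above.
// Not the PR's actual detectFormat / main.js code; names are illustrative only.
const TAKEOUT_INDEX = "archive_browser.html";

// True if any path sits inside a "Keep/" folder or is the Takeout index page.
function looksLikeKeepArchive(paths) {
  return paths.some((p) => {
    const parts = p.split("/");
    return parts.includes("Keep") || parts[parts.length - 1] === TAKEOUT_INDEX;
  });
}

// Keeps one file per note. Assumption: files for the same note share a path
// once the .json/.html extension is stripped. A .json file always wins.
function dedupeKeepNotes(paths) {
  const byNote = new Map(); // base path -> chosen file path
  for (const p of paths) {
    if (p.split("/").pop() === TAKEOUT_INDEX) continue; // system artifact, not a note
    const match = /^(.*)\.(json|html)$/i.exec(p);
    if (!match) continue; // not a Keep note file
    const [, base, ext] = match;
    // First file seen is kept unless a .json file arrives for the same note.
    if (!byNote.has(base) || ext.toLowerCase() === "json") byNote.set(base, p);
  }
  return [...byNote.values()];
}

const files = [
  "Takeout/archive_browser.html",
  "Takeout/Keep/Groceries.html",
  "Takeout/Keep/Groceries.json",
  "Takeout/Keep/Ideas.html",
];
console.log(looksLikeKeepArchive(files)); // true
console.log(dedupeKeepNotes(files)); // Groceries.json and Ideas.html only
```

Keying a Map by the extensionless path keeps the check to a single pass over the archive listing, and the order of the returned list follows the order in which notes first appear.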
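A sketch of the JSON-to-note mapping, assuming the Takeout field names listed above plus listContent[].text / isChecked, labels[].name, and attachments[].filePath. The output object shape (title, content, tags, attachments, created, updated) is hypothetical; the real parseKeepJson may differ.

```javascript
// Hypothetical sketch of a Keep JSON -> note mapping; not the PR's actual parseKeepJson.
// Assumed input fields: title, textContent, listContent[].{text,isChecked},
// labels[].name, attachments[].filePath, createdTimestampUsec, userEditedTimestampUsec.

function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Keep timestamps are microseconds since the Unix epoch; Date expects milliseconds.
function usecToIso(usec) {
  if (usec === undefined || usec === null) return null;
  return new Date(Math.floor(Number(usec) / 1000)).toISOString();
}

function parseKeepJson(content) {
  const data = JSON.parse(content);
  let body;
  if (Array.isArray(data.listContent)) {
    // Checklist note: one checkbox per item, preserving the checked state.
    body = data.listContent
      .map((item) => {
        const box = item.isChecked
          ? '<input type="checkbox" checked="true"/>'
          : '<input type="checkbox"/>';
        return `<div>${box}${escapeHtml(item.text || "")}</div>`;
      })
      .join("");
  } else {
    // Text note: escape markup first, then turn newlines into <br/>.
    body = escapeHtml(data.textContent || "").replace(/\r?\n/g, "<br/>");
  }
  return {
    title: data.title || "",
    content: body,
    tags: (data.labels || []).map((label) => label.name),
    attachments: (data.attachments || []).map((a) => a.filePath),
    created: usecToIso(data.createdTimestampUsec),
    updated: usecToIso(data.userEditedTimestampUsec),
  };
}

const note = parseKeepJson(JSON.stringify({
  title: "Groceries",
  listContent: [{ text: "Milk & eggs", isChecked: true }, { text: "Bread", isChecked: false }],
  labels: [{ name: "Home" }],
  createdTimestampUsec: 1718323200000000,
}));
console.log(note.created); // 2024-06-14T00:00:00.000Z
```

Escaping before inserting <br/> matters: doing it the other way round would turn the generated line breaks into literal "&lt;br/&gt;" text.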

2. ENEX Export Refinement (src/main.js)

  • Checkbox Conversion: Converts HTML checkbox inputs (<input type="checkbox" checked="true"/> and <input type="checkbox"/>) into valid Evernote <en-todo checked="true"/> and <en-todo/> elements inside note bodies.
  • Tag Preservation: Escapes XML special characters in tag names (&, <, >, ", ') so the generated ENEX document stays valid, and appends each tag as a standard <tag>Tag Name</tag> element.
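Both ENEX mappings can be sketched with two string transforms. This is a minimal illustration, not the PR's code: checkboxesToEnTodo, enexTags, and escapeXml are hypothetical names, and the checkbox match is regex-based, which assumes the simple, machine-generated markup shown above rather than arbitrary HTML.

```javascript
// Hypothetical sketch of the ENEX body/tag mapping; names are illustrative only.

// Escape the five XML special characters so tag text cannot break the document.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Replace each <input type="checkbox" ...> with an ENEX to-do element.
// As in HTML, the presence of a "checked" attribute marks the item as done.
function checkboxesToEnTodo(html) {
  return html.replace(/<input\b[^>]*\btype=["']checkbox["'][^>]*>/gi, (tag) =>
    /\bchecked\b/i.test(tag) ? '<en-todo checked="true"/>' : "<en-todo/>"
  );
}

// Emit one <tag> element per note tag, with its text escaped.
function enexTags(tags) {
  return tags.map((t) => `<tag>${escapeXml(t)}</tag>`).join("");
}

console.log(checkboxesToEnTodo('<div><input type="checkbox" checked="true"/>Milk</div>'));
// <div><en-todo checked="true"/>Milk</div>
console.log(enexTags(["R&D", "Home"]));
// <tag>R&amp;D</tag><tag>Home</tag>
```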

3. Worker UTF-8 Character Encoding Fix (src/modules/worker.js)

  • Changed the zip-extraction logic to read text entries as raw bytes (JSZip's uint8array output) instead of using JSZip's string extraction.
  • Decodes those bytes with TextDecoder('utf-8'), so emoji, non-Latin characters, and custom tags are parsed without corruption.
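A minimal sketch of the byte-level read, assuming the worker holds a JSZip entry object (e.g. from zip.file(name)). entry.async("uint8array") is part of JSZip's public API; readTextEntry and fakeEntry are hypothetical names used only for illustration, and the stand-in entry lets the sketch run without JSZip installed.

```javascript
// Sketch of the worker's UTF-8 fix; readTextEntry is a hypothetical name.
// A non-streaming TextDecoder can be reused safely across calls.
// By default it also drops a leading UTF-8 byte-order mark.
const utf8 = new TextDecoder("utf-8");

// Read a zip entry as raw bytes, then decode them explicitly as UTF-8.
async function readTextEntry(zipEntry) {
  const bytes = await zipEntry.async("uint8array");
  return utf8.decode(bytes);
}

// Stand-in for a JSZip entry, for illustration only.
const fakeEntry = {
  async: async (type) => {
    if (type !== "uint8array") throw new Error(`unexpected type ${type}`);
    return new TextEncoder().encode("✨ Note Title 🏷️");
  },
};

readTextEntry(fakeEntry).then((text) => console.log(text)); // ✨ Note Title 🏷️
```

Keeping the decode step explicit means the character encoding is chosen in one visible place rather than inside the zip library's string conversion.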

Files Changed

  • src/config/formats.js: Enhanced detectFormat logic to recognize Keep directories and batch structures.
  • src/main.js: Integrated parseKeepJson, deduplicated the list and selection logic, and added <en-todo> and <tag> mappings.
  • src/modules/worker.js: Replaced JSZip string conversion with manual UTF-8 TextDecoder buffer parsing.

Verification & Testing

  • Deduplication Check: Uploaded a Google Takeout zip containing both JSON and HTML copies of the same notes and verified that only the JSON version appears in the import list.
  • Format Check: Verified that checkbox lists from Keep JSON become interactive checklists in the ENEX output and in Evernote.
  • Encoding Validation: Imported notes containing emoji and special characters (e.g. ✨ Note Title 🏷️) and confirmed that they render without corruption.

schwegler changed the title from "Feat/keep json support" to "Support Google Keep JSON Import, Note Deduplication, and XML/Encoding Fixes" on Jun 14, 2026