Overview
Implement a parser for Rich Text Format (RTF) documents.
Parent Epic
Part of #91 - Document & Office Format Awareness
Description
Parse RTF control words and text to extract document content and metadata.
Implementation Details
- RTF is plain text organized into nested groups delimited by { and }
- Parse control words (\keyword)
- Extract plain text content
- Handle encoded characters (\'hh hex escapes, where hh is a byte in the document's code page)
- Parse the document info group (\info)
- Skip embedded binary data (\binN, which is followed by N raw bytes)
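A minimal sketch of the tokenizer and plain-text pass outlined in the list above. The issue does not fix a language; Python is used here for illustration. The token tuples, the names `tokenize`, `extract_text` and `SKIP_DESTINATIONS`, and the default cp1252 code page are all assumptions. A fuller parser would read the code page from \ansicpgN and decode \uN Unicode escapes; this sketch ignores \uN and keeps the fallback characters that follow it.

```python
import re
from typing import Iterator, Optional, Tuple

# Control word: backslash, 1-32 ASCII letters, optional signed numeric parameter,
# and an optional single space delimiter that belongs to the control word.
_CONTROL_WORD = re.compile(rb"\\([a-zA-Z]{1,32})(-?\d{1,10})? ?")

# Hypothetical, non-exhaustive set of destinations holding metadata or
# embedded data rather than body text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "object"}


def tokenize(data: bytes) -> Iterator[Tuple]:
    """Yield ("open",), ("close",), ("word", name, param), ("symbol", ch),
    ("byte", value) and ("text", raw_bytes) tokens from raw RTF bytes."""
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == 0x7B:  # '{' starts a group
            yield ("open",)
            i += 1
        elif c == 0x7D:  # '}' ends a group
            yield ("close",)
            i += 1
        elif c == 0x5C:  # backslash starts a control word or control symbol
            m = _CONTROL_WORD.match(data, i)
            if m:
                name = m.group(1).decode("ascii")
                param = int(m.group(2).decode("ascii")) if m.group(2) else None
                i = m.end()
                if name == "bin":
                    # \binN: the next N bytes are raw binary data; skip them unread.
                    i += max(param or 0, 0)
                    continue
                yield ("word", name, param)
            elif data[i + 1:i + 2] == b"'":
                # \'hh: one byte, written as two hex digits, in the document code page.
                try:
                    value: Optional[int] = int(data[i + 2:i + 4], 16)
                except ValueError:
                    value = None  # malformed escape: drop it
                if value is not None:
                    yield ("byte", value)
                i += 4
            else:
                yield ("symbol", data[i + 1:i + 2].decode("latin-1"))
                i += 2
        elif c in (0x0D, 0x0A):
            i += 1  # bare CR/LF carry no content in RTF
        else:
            j = i
            while j < n and data[j] not in b"{}\\\r\n":
                j += 1
            yield ("text", data[i:j])
            i = j


def extract_text(data: bytes, codepage: str = "cp1252") -> str:
    """Return body text, skipping ignorable ({\\* ...}) and metadata groups."""
    out = bytearray()
    saved = []  # 'skipping' state to restore when each group closes
    skipping = False
    first_in_group = False
    for tok in tokenize(data):
        kind = tok[0]
        if kind == "open":
            saved.append(skipping)
            first_in_group = True
            continue
        if kind == "close":
            skipping = saved.pop() if saved else False
            first_in_group = False
            continue
        at_start, first_in_group = first_in_group, False
        if at_start and (tok == ("symbol", "*")
                         or (kind == "word" and tok[1] in SKIP_DESTINATIONS)):
            skipping = True
        if skipping:
            continue
        if kind == "word" and tok[1] in ("par", "line"):
            out += b"\n"
        elif kind == "word" and tok[1] == "tab":
            out += b"\t"
        elif kind == "symbol" and tok[1] in ("\\", "{", "}"):
            out += tok[1].encode("ascii")  # escaped literal character
        elif kind == "byte":
            out.append(tok[1])
        elif kind == "text":
            out += tok[1]
    # Decode once at the end so multi-byte code pages are reassembled correctly.
    return out.decode(codepage, errors="replace")
```

For example, `extract_text(rb"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Caf\'e9\par Done}")` yields the body text with the font table dropped and \'e9 decoded via cp1252. Decoding the whole byte buffer at the end, rather than each \'hh byte on its own, keeps double-byte code pages intact.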
String Sources
- Document info (title, author, subject)
- Plain text content
- Font names (\fonttbl)
- Style names (\stylesheet)
- Hyperlinks (HYPERLINK field instructions in \fldinst)
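The string sources listed above can be gathered from a tree built out of RTF groups. The Python sketch below is one way to do it under stated assumptions: text is decoded as cp1252, only the title/author/subject info fields are read, font and style entries are split on their terminating ";", and URLs are taken from the quoted argument of HYPERLINK field instructions. The names `parse`, `first_word`, `group_text` and `extract_strings` are hypothetical.

```python
import re
from typing import Any, Dict, Iterator, Optional

# One RTF token per match; every input position matches some alternative.
_TOKEN = re.compile(
    rb"\\([a-zA-Z]{1,32})(-?\d{1,10})? ?"  # 1-2: control word + optional parameter
    rb"|\\'([0-9a-fA-F]{2})"              # 3: \'hh hex escape
    rb"|\\(.?)"                             # 4: control symbol such as \* or \{
    rb"|([{}])"                             # 5: group delimiter
    rb"|([^\\{}\r\n]+)"                     # 6: run of plain text
    rb"|[\r\n]",                            # bare line breaks carry no content
    re.S,
)
INFO_FIELDS = {"title", "author", "subject"}
HYPERLINK_RE = re.compile(r'HYPERLINK\s+"([^"]+)"')


def parse(data: bytes) -> list:
    """Return nested lists: one list per group, holding ("word", name, param),
    ("sym", ch) and ("text", bytes) items plus subgroup lists."""
    root: list = []
    stack = [root]
    pos = 0
    while pos < len(data):
        m = _TOKEN.match(data, pos)
        pos = m.end()
        word, param, hexbyte, sym, brace, text = m.groups()
        if word is not None:
            value = int(param.decode("ascii")) if param else None
            if word == b"bin":
                pos += max(value or 0, 0)  # skip N raw bytes of binary data
            else:
                stack[-1].append(("word", word.decode("ascii"), value))
        elif hexbyte is not None:
            stack[-1].append(("text", bytes([int(hexbyte, 16)])))
        elif sym is not None:
            if sym in (b"\\", b"{", b"}"):
                stack[-1].append(("text", sym))  # escaped literal character
            else:
                stack[-1].append(("sym", sym.decode("latin-1")))
        elif brace == b"{":
            group: list = []
            stack[-1].append(group)
            stack.append(group)
        elif brace == b"}" and len(stack) > 1:
            stack.pop()
        elif text is not None:
            stack[-1].append(("text", text))
    return root


def first_word(group: list) -> Optional[str]:
    """Destination name: the group's first control word, past any \\* marker."""
    for item in group:
        if item == ("sym", "*"):
            continue
        return item[1] if isinstance(item, tuple) and item[0] == "word" else None
    return None


def group_text(group: list, codepage: str = "cp1252") -> str:
    """Concatenate text in a group and its subgroups, skipping {\\* ...} subgroups."""
    buf = bytearray()

    def walk(g: list) -> None:
        for item in g:
            if isinstance(item, list):
                if not (item and item[0] == ("sym", "*")):
                    walk(item)
            elif item[0] == "text":
                buf.extend(item[1])

    walk(group)
    return buf.decode(codepage, errors="replace")


def iter_groups(group: list) -> Iterator[list]:
    for item in group:
        if isinstance(item, list):
            yield item
            yield from iter_groups(item)


def extract_strings(data: bytes) -> Dict[str, Any]:
    result: Dict[str, Any] = {"info": {}, "fonts": [], "styles": [], "hyperlinks": []}
    for g in iter_groups(parse(data)):
        name = first_word(g)
        if name == "info":
            for sub in g:
                field = first_word(sub) if isinstance(sub, list) else None
                if field in INFO_FIELDS:
                    result["info"][field] = group_text(sub).strip()
        elif name in ("fonttbl", "stylesheet"):
            # Entries end with ';' whether or not each one sits in its own group.
            names = [s.strip() for s in group_text(g).split(";")]
            result["fonts" if name == "fonttbl" else "styles"].extend(s for s in names if s)
        elif name == "fldinst":
            m = HYPERLINK_RE.search(group_text(g))
            if m:
                result["hyperlinks"].append(m.group(1))
    return result
```

Splitting on ";" over the whole table handles both grouped entries ({\f0 Arial;}) and ungrouped ones (\f0 Arial;\f1 Courier;), and skipping {\* ...} subgroups keeps alternates such as \panose data out of font names.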
Acceptance Criteria
Related
Project: #76