-
Notifications
You must be signed in to change notification settings - Fork 1
Add EML (email) format support with comprehensive header extraction and content conversion #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add EML (email) format support with comprehensive header extraction and content conversion #18
Conversation
…s, content, and attachments Co-authored-by: KSemenenko <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements comprehensive EML (email) format support for the MarkItDown library, enabling conversion of email messages to clean Markdown with structured headers, content conversion, and attachment metadata extraction.
Key Changes
- EmlConverter Implementation: New converter with MimeKit integration for robust email parsing and processing
- Enhanced Integration: Updated project dependencies, MIME type mappings, and converter registration
- Comprehensive Testing: Added extensive test coverage for various email scenarios and end-to-end integration
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/MarkItDown/Converters/EmlConverter.cs |
New email converter with header extraction, content conversion, and attachment metadata |
src/MarkItDown/MarkItDown.csproj |
Added MimeKit 4.9.0 dependency for email processing |
src/MarkItDown/MimeMapping.cs |
Added .eml file extension mapping to message/rfc822 MIME type |
src/MarkItDown/MarkItDown.cs |
Registered EmlConverter in built-in and ZIP-inner converter lists |
tests/MarkItDown.Tests/EmlConverterTests.cs |
Comprehensive test suite covering email parsing and conversion scenarios |
tests/MarkItDown.Tests/NewConvertersTests.cs |
Added integration tests for EML format support and converter priority validation |
README.md |
Updated documentation with EML support information and usage examples |
|
|
||
| private readonly HtmlConverter _htmlConverter; | ||
|
|
||
| public int Priority => 240; // Between EPUB and PPTX |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states priority is 'Between EPUB and PPTX' but according to the project's priority system, lower numbers mean higher priority. If PPTX has priority 230 and EPUB has priority 250, then 240 is indeed between them, but the comment could be clearer about the priority ordering direction.
| private static string FormatFileSize(long bytes) | ||
| { | ||
| string[] sizes = { "bytes", "KB", "MB", "GB" }; | ||
| double len = bytes; | ||
| int order = 0; | ||
|
|
||
| while (len >= 1024 && order < sizes.Length - 1) | ||
| { | ||
| order++; | ||
| len /= 1024; | ||
| } | ||
|
|
||
| return $"{len:0.##} {sizes[order]}"; | ||
| } |
Copilot
AI
Sep 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file size formatting logic is duplicated functionality that likely exists elsewhere in the codebase or could be extracted to a common utility class to avoid code duplication.
|
@copilot fix this The comment states priority is 'Between EPUB and PPTX' but according to the project's priority system, lower numbers mean higher priority. If PPTX has priority 230 and EPUB has priority 250, then 240 is indeed between them, but the comment could be clearer about the priority ordering direction. private readonly HtmlConverter _htmlConverter; Comment on lines +264 to +277 https://github.com/apps/copilot-pull-request-reviewer AI |
…ies to avoid code duplication Co-authored-by: KSemenenko <[email protected]>
Fixed both issues in commit 87cfbba:
The method now provides consistent file size formatting across all converters. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
✅ COMPLETED: EML Email Format Support Implementation
Problem Statement Addressed
The issue requested two main features:
Implementation Summary
🎯 Complete EML Converter Implementation
📝 Files Modified (Minimal Changes)
✅ Code Quality Improvements
FileUtilitiesclass with sharedFormatFileSizemethod, removing duplicate code from bothEmlConverterandZipConverter✅ Features Delivered
🧪 Test Results
📧 Demo Output
🎯 Use Cases Enabled
Architecture Decisions
✅ Minimal Changes Approach
Focused on Phase 1 (EML support) only, making the smallest possible changes to address the immediate need. The modular package structure (Phase 2) was identified as a valuable future enhancement but would require significant architectural changes better suited for a major version release.
✅ Dependency Choice
Selected MimeKit 4.9.0 as the industry-standard library for MIME processing, providing robust email parsing with excellent .NET integration and comprehensive format support.
✅ Priority Placement
Set EML converter priority at 240, positioning it between PPTX (230) and EPUB (250) converters, ensuring appropriate precedence in the conversion pipeline.
✅ Code Quality Refactoring
Addressed code review feedback by:
Final Status: ✅ COMPLETE
The EML email format support has been successfully implemented with comprehensive functionality, thorough testing, complete documentation, and code quality improvements based on review feedback.
Ready for production use! 🚀
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.