Add OneNote support #55
Fixes microsoft#47

Add support for OneNote file conversion.

* **README.md**: Add OneNote to the list of supported file formats. Add a note about using `one-extract` for OneNote support. Provide an example of converting OneNote files.
* **pyproject.toml**: Add `onenote` to the list of dependencies. Add a note about OneNote support.
* **src/markitdown/_markitdown.py**: Import `one_extract` as `onenote`. Add a new class `OneNoteConverter` to handle OneNote files. Register the `OneNoteConverter` in the `MarkItDown` class.
* **tests/test_markitdown.py**: Add test strings for OneNote. Add a test case for OneNote file conversion.
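For readers following the description, here is a rough sketch of the converter-plus-registration pattern it outlines. It is not the code from this PR: `MiniMarkItDown`, `ConversionResult`, `register_converter`, and `fake_extract` are hypothetical stand-ins, and because the real `one-extract` API is not shown in this conversation, the extractor is injected as a plain callable instead of being called directly.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConversionResult:
    # Hypothetical result type; mirrors the `text_content` attribute used in the tests.
    text_content: str


class OneNoteConverter:
    """Handles .one files by delegating to an injected extractor (assumption)."""

    def __init__(self, extract_text: Callable[[str], str]) -> None:
        # `extract_text` stands in for whatever one-extract actually exposes.
        self._extract_text = extract_text

    def convert(self, path: str) -> Optional[ConversionResult]:
        # Return None for other extensions so another converter can try.
        if os.path.splitext(path)[1].lower() != ".one":
            return None
        return ConversionResult(text_content=self._extract_text(path))


class MiniMarkItDown:
    """Tiny stand-in for the MarkItDown class: tries converters in order."""

    def __init__(self) -> None:
        self._converters: List[OneNoteConverter] = []

    def register_converter(self, converter: OneNoteConverter) -> None:
        self._converters.append(converter)

    def convert(self, path: str) -> ConversionResult:
        for converter in self._converters:
            result = converter.convert(path)
            if result is not None:
                return result
        raise ValueError(f"No converter accepted {path!r}")


def fake_extract(path: str) -> str:
    # Placeholder extractor used only for this sketch; it reads nothing from disk.
    return f"# {os.path.basename(path)}\n\n(extracted text would go here)"


md = MiniMarkItDown()
md.register_converter(OneNoteConverter(fake_extract))
```

Injecting the extractor keeps the converter testable without a real `.one` file, which may also help while the test fixture question is open.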
@microsoft-github-policy-service agree [company="{your company}"]
@microsoft-github-policy-service agree
tests/test_markitdown.py
Outdated
@@ -179,7 +194,7 @@ def test_markitdown_exiftool() -> None:
assert target in result.text_content


if __name__ == "__main__":
if __name__main__":
It looks like there's a small typo in the `if __name__main__":` line.
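For reference, the usual form of the guard compares `__name__` against the string `"__main__"`. A minimal sketch, where `main` is a hypothetical entry point and not a function from this repository:

```python
def main() -> str:
    # Hypothetical entry point; the real test module would run its tests here.
    return "tests would run here"


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    print(main())
```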
Thanks for checking. I have already fixed it.
@@ -164,6 +173,12 @@ def test_markitdown_local() -> None:
for test_string in SERP_TEST_STRINGS:
assert test_string in text_content

# Test OneNote processing
result = markitdown.convert(os.path.join(TEST_FILES_DIR, "test.one"))
Did you forget to add the test file, `test.one`?
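One way to surface a missing fixture early is a small check before the conversion assertions run. This is only a sketch: `missing_fixtures` is a hypothetical helper, not part of the existing test suite.

```python
import os
from typing import Iterable, List


def missing_fixtures(test_dir: str, names: Iterable[str]) -> List[str]:
    """Return the fixture file names that do not exist under test_dir."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(test_dir, name))
    ]
```

A test could then fail with a clear message, for example `assert not missing_fixtures(TEST_FILES_DIR, ["test.one"])`, instead of erroring inside the converter.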
@@ -12,6 +12,9 @@ It presently supports:
- Audio (EXIF metadata, and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
- OneNote (.one)

Note: OneNote is not supported.
is this a typo? the comment doesn't seem consistent?
@HendricksJudy there are now some small merge conflicts (should be easy to address). Let's fix those conflicts. add the |
@@ -51,6 +54,18 @@ result = md.convert("example.jpg")
print(result.text_content)
``` | |||

To convert OneNote files, you can use the following example:
This example is redundant; there's already one for converting files.
Thanks for your review. I will fix those conflicts ASAP.
@@ -38,6 +38,7 @@ dependencies = [
"youtube-transcript-api",
"SpeechRecognition",
"pathvalidate",
"onenote",
Is this the correct package name? I could not find it in the pip registry, but found this one instead: https://pypi.org/project/one-extract/. Can you link the appropriate one?
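The name declared as a dependency (the distribution name) and the name used in `import` statements can differ, which may be part of the confusion here. A small sketch for checking both in a local environment follows; the helper names are hypothetical, and it only inspects what is installed locally, not what exists on the registry.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def is_importable(module_name: str) -> bool:
    """True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None
```

For example, after installing the dependency one could compare `installed_version("one-extract")` with `is_importable("one_extract")`, using the import name given in the PR description.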