After converting a PDF file to Word, some images in the Word document are rotated #346

@yyyang91

### Description of the bug

The attached PDF file, 彼得与狼.pdf, triggers the issue.

How the page renders in Adobe Reader:

Image

How the converted document renders in Office Word:

Image

### How to reproduce the bug

The following code will convert the PDF file to a Word file.

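from pdf2docx import Converter


def pdf_to_word(pdf_path, word_path):
    # Convert every page of the PDF into a Word document.
    cv = Converter(pdf_path)
    try:
        cv.convert(word_path)
    finally:
        # Release the PDF even if the conversion fails.
        cv.close()
    print("success")


if __name__ == "__main__":
    pdf_to_word("彼得与狼.pdf", "彼得与狼.docx")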

I found that in `ImagesExtractor.extract_images`, the branch that handles normal (non-intersecting) images seems to account only for the rotation of the page, not for the rotation of the image itself.
[ImagesExtractor.py](https://github.com/ArtifexSoftware/pdf2docx/blob/master/pdf2docx/image/ImagesExtractor.py)
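
To illustrate what "the rotation of the image itself" means here, below is a minimal sketch of how a per-image rotation could be read from the image's placement matrix. It is not pdf2docx's code and not a proposed patch. The helper name `image_rotation_degrees` is hypothetical, and the sketch assumes the matrix is available as six numbers `(a, b, c, d, e, f)`, for example from the `transform` entry that PyMuPDF's `Page.get_image_info()` reports. Whether a positive angle means clockwise or counterclockwise depends on the coordinate system the matrix is expressed in.

```python
import math


def image_rotation_degrees(transform):
    """Return the rotation encoded in a 2D affine matrix, in degrees within [0, 360).

    `transform` is assumed to be a sequence (a, b, c, d, e, f). The unit x vector
    is mapped to (a, b), so the angle of that vector is the rotation; scaling and
    translation do not affect it.
    """
    a, b = transform[0], transform[1]
    angle = math.degrees(math.atan2(b, a))
    # Round to whole degrees and normalise negative angles into [0, 360).
    return round(angle) % 360


if __name__ == "__main__":
    # An unrotated placement and a quarter-turn placement (hypothetical values).
    print(image_rotation_degrees((1, 0, 0, 1, 0, 0)))
    print(image_rotation_degrees((0, 200, -100, 0, 50, 50)))
```

If an angle obtained this way is non-zero for an image while the page rotation is zero, that would match the symptom described above: the image is placed rotated in the PDF, but the extracted bitmap is inserted into Word unrotated.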

### pdf2docx version

0.5.8

### Operating system

Windows

### Python version

3.8
