Windows only: pdfquery locks the opened PDF file #75

Open
iconberg opened this issue May 10, 2019 · 1 comment

@iconberg

I'm trying to open PDF files, query data from them, and then use that data to rename each file.
On Windows the rename fails because the file is still locked.
On Linux the same code works.

I can't tell whether the lock comes from pdfquery itself or from another module that pdfquery uses.

import os
import pdfquery


def is_pdf(file):
    if os.path.splitext(file.lower())[1] == '.pdf':
        return True


pdf_files = os.listdir('./pages')
for pdf_file in filter(is_pdf, pdf_files):
    print(pdf_file)
    pdf = pdfquery.PDFQuery(os.path.join('pages', pdf_file))
    pdf.load()
    for e in pdf.tree.iter():
        text = e.text
        if text:
            text = text.replace(' ', '')
            if text[0:7] == '4002629':
                #del pdf
                os.rename(os.path.join('pages', pdf_file),
                          '{}.pdf'.format(text))
                break

Error on Windows:

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\PDFs_aufbereiten\pdf_pages_rename.py", line 22, in <module>
    os.rename(os.path.join('pages', pdf_file), '{}.pdf'.format(text))
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'pages\\xxxxxxxxxxxxxxxxxxxx.pdf' -> 'xxxxxxxxxxxxx.pdf'

The same code works on Linux.
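
For context, the platform difference can be reproduced without pdfquery. The sketch below is only an illustration: the helper name `rename_while_open` and the file names are made up for this example. It renames a file while a handle from Python's built-in `open()` is still open. On Windows this typically raises `PermissionError` (WinError 32), while on Linux the rename succeeds. Whether pdfquery or one of its dependencies keeps such a handle open is not confirmed here.

```python
import os
import tempfile


def rename_while_open():
    """Try to rename a file while a read handle on it is still open.

    Returns True if the rename succeeded, False on PermissionError.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'before.pdf')
        dst = os.path.join(tmp, 'after.pdf')
        with open(src, 'wb') as f:
            f.write(b'%PDF-1.4\n')

        # Deliberately keep a handle open, as a parser might do internally.
        handle = open(src, 'rb')
        try:
            os.rename(src, dst)
            return True
        except PermissionError:
            return False
        finally:
            # Close before the temporary directory is cleaned up.
            handle.close()


if __name__ == '__main__':
    print('rename succeeded while file was open:', rename_while_open())
```

If this prints `False` on the Windows machine, the rename failure is consistent with an open handle rather than with anything specific to the PDF content.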

@iconberg
Author

Workaround: open and close the file in your own code and pass the file object to pdfquery.PDFQuery instead of the path (thanks to nedbat):

import os
import pdfquery


def is_pdf(file):
    # True only for names ending in .pdf (case-insensitive).
    return os.path.splitext(file.lower())[1] == '.pdf'


# Collect the renames first and apply them only after every file is closed.
rename_files = []
pdf_files = os.listdir('./pages')
for pdf_file in filter(is_pdf, pdf_files):
    print(pdf_file)
    # Open the file ourselves so the handle is closed when the block ends.
    with open(os.path.join('pages', pdf_file), 'rb') as myfile:
        pdf = pdfquery.PDFQuery(myfile)
        pdf.load()
        for e in pdf.tree.iter():
            text = e.text
            if text:
                text = text.replace(' ', '')
                if text[0:7] == '4002629':
                    rename_files.append(
                        (pdf_file, '{}.pdf'.format(text))
                    )
                    break

for oldname, newname in rename_files:
    os.rename(os.path.join('pages', oldname),
              os.path.join('pages', newname))
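
The same pattern can be written as a small reusable helper. This is a sketch, not part of pdfquery: `rename_after_close` and its `find_new_name` callback are hypothetical names introduced here. The callback receives the open file object (it could wrap `pdfquery.PDFQuery(fh)` and `load()` as above) and returns the new file name, or `None` to leave the file alone. The file is always closed before any rename happens.

```python
import os


def rename_after_close(directory, find_new_name):
    """Rename PDFs in `directory` once their handles have been closed.

    `find_new_name` is a caller-supplied function (an assumption of this
    sketch) that takes an open binary file object and returns the new
    file name, or None if the file should not be renamed.
    """
    planned = []
    for name in sorted(os.listdir(directory)):
        if os.path.splitext(name.lower())[1] != '.pdf':
            continue
        # The with-block guarantees the handle is closed before renaming.
        with open(os.path.join(directory, name), 'rb') as fh:
            new_name = find_new_name(fh)
        if new_name:
            planned.append((name, new_name))

    # Apply the renames only after every file has been closed.
    for old, new in planned:
        os.rename(os.path.join(directory, old),
                  os.path.join(directory, new))
    return planned
```

Deferring the renames until after the loop mirrors the workaround above and avoids changing the directory contents while it is still being processed.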
