How to convert the response result into Markdown? #4

BUJIDAOVS · 2024-06-02T11:21:56Z

I'm calling the API from Python and have the result of response.json(). How can I turn it into a .md file?

adithya-s-k · 2024-06-02T17:46:51Z

I have added a notebook example showing how to call the API and save the output as Markdown.

Please take a look:
https://github.com/adithya-s-k/marker-api/blob/master/examples/invoke.ipynb
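For a version without a notebook, here is a minimal stdlib-only sketch. It assumes `response.json()` returns a list with one object per uploaded PDF, each with a `markdown` string and an `images` dict mapping file names to base64-encoded data (the shape the script later in this thread relies on). The function name `save_markdown` and the `document_<n>.md` output names are hypothetical.

```python
import base64
from pathlib import Path


def save_markdown(data, out_dir):
    """Write each converted document in `data` to out_dir, images to out_dir/images.

    Assumption: `data` is the parsed JSON, a list of dicts with a "markdown"
    string and an "images" dict of {file name: base64 string}.
    """
    out_dir = Path(out_dir)
    image_dir = out_dir / "images"
    image_dir.mkdir(parents=True, exist_ok=True)  # also creates out_dir
    written = []
    for i, item in enumerate(data):
        markdown = item.get("markdown", "")
        for name, b64 in item.get("images", {}).items():
            # Decode the base64 payload and store the raw image bytes.
            (image_dir / name).write_bytes(base64.b64decode(b64))
            # Point links at the images/ subfolder; "/" works on every OS.
            markdown = markdown.replace(f"]({name})", f"](images/{name})")
        # Hypothetical naming scheme: one document_<n>.md per PDF.
        md_path = out_dir / f"document_{i}.md"
        md_path.write_text(markdown, encoding="utf-8")
        written.append(md_path)
    return written
```

Usage: `save_markdown(response.json(), "output")` after a successful request.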

BUJIDAOVS · 2024-06-02T18:04:14Z

Thank you! I also wrote similar code myself; here it is for everyone's reference.

```python
import base64
import os

import requests

def request_api(url, pdf_file_path):
    # Upload the PDF as multipart form data under the "pdf_files" field.
    with open(pdf_file_path, 'rb') as pdf_file:
        pdf_content = pdf_file.read()
    files = {'pdf_files': (os.path.basename(pdf_file_path), pdf_content, 'application/pdf')}
    params = {'extract_images': True}
    response = requests.post(url, files=files, params=params)
    return response

def convert_pdf_to_markdown(pdf_file_path, response):
    # Output layout: <pdf dir>/<pdf name>/<pdf name>.md plus an images/ subfolder.
    base_name = os.path.splitext(os.path.basename(pdf_file_path))[0]
    md_dir = os.path.join(os.path.dirname(pdf_file_path), base_name)
    image_dir = os.path.join(md_dir, "images")
    os.makedirs(image_dir, exist_ok=True)  # also creates md_dir

    data = response.json()
    markdown_text = data[0].get("markdown", "Markdown text not found.")
    images = data[0].get("images", {})

    for image_name, image_data in images.items():
        # Images arrive base64-encoded; decode and save them next to the .md file.
        image_path = os.path.join(image_dir, image_name)
        with open(image_path, "wb") as f:
            f.write(base64.b64decode(image_data))

        # Use a forward slash so the link also works when the script runs on Windows.
        markdown_text = markdown_text.replace(f"![{image_name}]({image_name})", f"![{image_name}](images/{image_name})")

    markdown_file = os.path.join(md_dir, base_name + ".md")
    with open(markdown_file, "w", encoding="utf-8") as f:
        f.write(markdown_text)

def main():
    url = "http://127.0.0.1:17915/convert"
    pdf_file_path = "your.pdf"
    response = request_api(url, pdf_file_path)
    print('Status:', response.status_code)
    response.raise_for_status()  # stop here instead of failing later on a non-JSON error body
    convert_pdf_to_markdown(pdf_file_path, response)
    print('Finished')

if __name__ == "__main__":
    main()
```
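The `str.replace` call in the script only rewrites a link whose alt text is exactly the image file name. If the converter emits different alt text, matching on the link target alone is more forgiving. A sketch using a regex (the helper name `relink_images` and the `subdir` parameter are hypothetical):

```python
import re


def relink_images(markdown, image_names, subdir="images"):
    """Prefix the target of every Markdown image link that points at a saved image."""
    names = set(image_names)
    # Matches "![alt](" (group 1) followed by the target up to ")" or whitespace (group 2),
    # so an optional "title" after the target is left untouched.
    pattern = re.compile(r'(!\[[^\]]*\]\()([^)\s]+)')

    def fix(match):
        target = match.group(2)
        if target in names:
            return f"{match.group(1)}{subdir}/{target}"
        return match.group(0)  # not one of our images: leave the link as-is

    return pattern.sub(fix, markdown)
```

In `convert_pdf_to_markdown`, it could replace the per-image `replace` call with a single `markdown_text = relink_images(markdown_text, images.keys())` after the loop.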
