Explored PyPDF2 3.0.1 (helps extract data from PDF files and convert it into flow HTML) #34

Gayathrijonnalagadda · 2023-01-27T04:15:49Z

Gayathrijonnalagadda
Jan 27, 2023
Collaborator

You need to install the PyPDF2 and pdfreader libraries via pip before running this code, by running `pip install pypdf2` and `pip install pdfreader` in your terminal or command line. Only PyPDF2 is imported by the code; pdfreader is installed alongside it but not used here.
This code opens the "example.pdf" file, creates a PDF reader object to read it, and then iterates through each page of the PDF. For each page, it extracts the text using the extract_text() method and writes that text to an HTML file using the write() method. The HTML file is named "exmp_0.html" for the first page, "exmp_1.html" for the second page, and so on.
This is a very basic example: it only extracts the text from the PDF (both old and new versions of IPCC reports) and writes it to an HTML file. **It will not preserve the formatting, layout, or images of the original PDF.**

!pip install PyPDF2
!pip install Pdfreader
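
After installing, it can help to confirm which versions actually ended up in the environment, since this post was written against PyPDF2 3.0.1 and method names differ between releases. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of PyPDF2 or pdfreader.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages installed above (names as given to pip).
for name in ("PyPDF2", "pdfreader"):
    found = installed_version(name)
    print(name, found if found is not None else "not installed")
```

If PyPDF2 shows up as "not installed", the pip commands above probably ran in a different environment from the one running the notebook.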

import PyPDF2

# Open the PDF file in binary mode
with open('example.pdf', 'rb') as file:
    # Create a PDF reader object
    pdf_reader = PyPDF2.PdfReader(file)

    # Iterate through each page, numbering from 0
    for page_num, page in enumerate(pdf_reader.pages):
        # Extract the text from the page
        text = page.extract_text()

        # Write the text to an HTML file, one file per page
        with open(f'exmp_{page_num}.html', 'w', encoding='utf-8') as html_file:
            html_file.write(text)
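
The code above writes the raw extracted text into a file with an .html extension, so the result has no HTML structure, and characters such as `<` or `&` in the text could confuse a browser. One possible refinement is sketched below, using only the standard library: it escapes the text and wraps it in a minimal HTML document. The function name `text_to_html` and the rule of treating blank lines as paragraph breaks are assumptions made for this illustration, not something PyPDF2 provides.

```python
import html


def text_to_html(text, title="Extracted page"):
    """Wrap plain text in a minimal HTML document.

    Blocks separated by a blank line become <p> elements; single line
    breaks inside a block become <br>. All text is HTML-escaped.
    """
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    paragraphs = []
    for block in blocks:
        escaped = html.escape(block).replace("\n", "<br>\n")
        paragraphs.append("<p>" + escaped + "</p>")
    body = "\n".join(paragraphs)
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8"><title>' + html.escape(title) + "</title></head>",
        "<body>",
        body,
        "</body>",
        "</html>",
    ]) + "\n"


# Example with a short made-up string standing in for one page of extracted text.
print(text_to_html("Line one\nLine two\n\nA second block with x < y", title="Page 0"))
```

To use it with the loop above, `html_file.write(text)` would become `html_file.write(text_to_html(text, title=f'Page {page_num}'))`. This still does not recover the layout or images of the PDF; it only makes the output valid HTML.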
