Feature summary
Support generating PDF without page breaks
Feature description
I got this working after some messy changes and wanted to capture it until I can create a clean merge request. The idea is to make the resume one complete, continuous document rather than treating it like printed pages. This could help external systems process the data and give actual paper printing more flexibility.
Note: I pulled these options out to play with them because they were conflicting with the CSS. I ended up commenting out all print-specific CSS rules and focused on getting the output to match the CSS as rendered in the browser instead.
The following code shows how you can get the rendered height of the page in inches and the DPI. I found that if you set a margin in the options passed to driver.execute_cdp_cmd("Page.printToPDF", options), the resume would still be partially split across 2 pages. By removing the margins and relying on the rendered size in the browser, we know the PDF contains exactly what is rendered in the browser, where it can be easily inspected.
import time
import urllib.parse


def HTML_to_PDF(html_content, driver, options=DEFAULT_OPTIONS):
    """
    Convert an HTML string to a PDF and return the PDF as a base64 string.

    :param html_content: String containing the HTML to convert.
    :param driver: Selenium WebDriver instance.
    :param options: Options passed to the CDP Page.printToPDF command.
    :return: Base64 string of the generated PDF.
    :raises ValueError: If the HTML input is not a valid string.
    :raises RuntimeError: If the WebDriver raises an exception.
    """
    # Validate the HTML content
    if not isinstance(html_content, str) or not html_content.strip():
        raise ValueError("The HTML content must be a non-empty string.")

    # Encode the HTML as a data URL
    encoded_html = urllib.parse.quote(html_content)
    data_url = f"data:text/html;charset=utf-8,{encoded_html}"

    try:
        driver.get(data_url)
        # Wait for the page to load completely
        time.sleep(2)  # May need to be increased for complex HTML

        bounding_box = driver.execute_script(
            "return document.body.getBoundingClientRect();"
        )
        # Extract the height (in pixels) from the bounding box
        height_in_pixels = bounding_box["height"]

        dpi = driver.execute_script("""
            // Create a div element with a width of 1 inch
            var div = document.createElement('div');
            div.style.width = '1in';
            div.style.height = '1in';
            div.style.position = 'absolute';
            div.style.visibility = 'hidden';
            document.body.appendChild(div);
            // Measure the offsetWidth (in pixels)
            var dpi = div.offsetWidth;
            // Clean up by removing the element
            document.body.removeChild(div);
            return dpi;
        """)

        content_height_in_inches = height_in_pixels / dpi
        # Copy the options so the shared default dict is not mutated between calls
        options = {**options, "paperHeight": content_height_in_inches}

        # Run the CDP command to print the page to PDF
        pdf_base64 = driver.execute_cdp_cmd("Page.printToPDF", options)
        return pdf_base64["data"]
    except Exception as e:
        logger.error(f"A WebDriver exception occurred: {e}")
        raise RuntimeError(f"A WebDriver exception occurred: {e}") from e
The options I used that worked well with the default template (with print styles commented out) were:
{
    "printBackground": True,
    "landscape": False,
    "paperWidth": 7.5,
    "paperHeight": 11.5,  # Height of the paper in inches; overwritten with the rendered content height
    "marginTop": 0,  # Use the margins from the stylesheet to keep things consistent
    "marginBottom": 0,
    "marginLeft": 0,
    "marginRight": 0,
    "displayHeaderFooter": False,  # Display headers and footers
    "preferCSSPageSize": False,  # Prefer CSS page size
    "generateDocumentOutline": False,  # Generate a document outline
    "generateTaggedPDF": True,  # Generate a tagged PDF
    "transferMode": "ReturnAsBase64",  # Return the PDF as a base64 string
}
The result is a perfectly contained PDF that represents the rendered resume (body element) in the browser.
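The sizing step described above can be sketched in isolation. This is only a rough illustration: the helper names single_page_options and save_pdf are hypothetical and not part of the project, and the default of 96 pixels per inch assumes the CSS definition of 1in (which is what the 1in probe div measures). The first helper derives paperHeight from the rendered pixel height without touching the shared defaults; the second decodes the base64 string returned by Page.printToPDF and writes it to disk.

```python
import base64
from pathlib import Path


def single_page_options(base_options, height_in_pixels, dpi=96.0):
    """Return a copy of base_options whose paperHeight fits the rendered content."""
    if height_in_pixels <= 0 or dpi <= 0:
        raise ValueError("height_in_pixels and dpi must be positive")
    options = dict(base_options)  # copy so the caller's defaults stay untouched
    options["paperHeight"] = height_in_pixels / dpi  # pixels -> inches
    return options


def save_pdf(pdf_base64, path):
    """Decode a base64 PDF string, write it to path, and return the byte count."""
    pdf_bytes = base64.b64decode(pdf_base64)
    Path(path).write_bytes(pdf_bytes)
    return len(pdf_bytes)
```

With helpers like these, the value returned by HTML_to_PDF could be passed straight to save_pdf, and the computed options can be inspected before anything is sent to the browser.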
Notes:
One thing I struggled with is that the current architecture makes it extremely difficult to pause at a breakpoint on the rendered resume and rebuild it from the already generated LLM HTML output with different CSS and HTML-to-PDF options. The resume generation should probably be broken up so that it is easier to work on the rendered/PDF output.
Consider storing the LLM output to a file so that we can re-render the identical content with different templates, or with changes to the templates, without having to go back through the LLM.
I think I also needed to make this change to the body style in the default stylesheet to make sure that the body's max width was actually what was specified including the padding.
See the example resume output, resume_base.pdf (this includes an extra field I added called personal_statement, which is another feature I think needs to be added, or there should be some way to customize the output without having to hard-code a lot of changes).
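The note above about storing the LLM output could look something like the following sketch. Everything here is an assumption for illustration: the names cache_html and load_cached_html, the cache directory, and the hash-based file name are hypothetical and not part of the existing code. The idea is only that the generated HTML is written to disk once and can then be re-rendered with different PDF options without another LLM call.

```python
import hashlib
from pathlib import Path


def cache_html(html_content, cache_dir="llm_output_cache"):
    """Write generated HTML to disk and return the path of the cached file."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Key by content hash so identical output always maps to the same file
    digest = hashlib.sha256(html_content.encode("utf-8")).hexdigest()[:16]
    path = directory / f"resume_{digest}.html"
    path.write_text(html_content, encoding="utf-8")
    return path


def load_cached_html(path):
    """Read previously cached HTML back for re-rendering."""
    return Path(path).read_text(encoding="utf-8")
```

The string returned by load_cached_html could then be passed to HTML_to_PDF again with a different options dict.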
Motivation
I struggled with getting the output PDF to match what I saw in the browser. I realized that we were using print-specific settings that make it hard to troubleshoot differences between what was rendered and what was output.
Alternatives considered
No response
Additional context
No response