Switch to Docutils HTML5 writer instead of the HTML4 one #3004

Open
3 tasks done
qookei opened this issue May 7, 2022 · 7 comments · May be fixed by #3432

@qookei

qookei commented May 7, 2022

  • I have searched the issues (including closed ones) and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.
  • I am willing to lend a hand to help implement this feature.

Feature Request

Pelican has used the Docutils HTML4 writer for about 11 years at this point, and I think it's time to put it to rest. Since 0.13, Docutils includes an HTML5 writer called html5_polyglot, which generates (X)HTML5, with correct semantic tags for article sections, etc.

As far as I can tell at a glance, all that is needed is just swapping out the import. I have done so in a custom plugin, by subclassing pelican.readers.RstReader and changing the writer_class (along with the translator) and it appears to work just fine.

Note that this is a breaking change in some aspects. For example, themes that add styles for div.section will break as now those are replaced with actual section tags (they still have their ids though).
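For theme authors who want to check whether they are affected, here is a minimal sketch of a stylesheet scan. It is not part of Pelican or Docutils; the helper names `find_div_section_rules` and `migrate_selectors` are hypothetical, and the rewrite assumes a plain `section` selector is an acceptable replacement for `div.section` in your theme (note that it would also match any `section` elements your own templates emit).

```python
import re

# Matches the `div.section` selector that themes written for the HTML4
# writer's markup may rely on. The lookarounds avoid matching longer names
# such as `div.section-title` or `.my-div.section`.
DIV_SECTION = re.compile(r"(?<![\w-])div\.section(?![\w-])")


def find_div_section_rules(css: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that use `div.section`."""
    return [
        (number, line.strip())
        for number, line in enumerate(css.splitlines(), start=1)
        if DIV_SECTION.search(line)
    ]


def migrate_selectors(css: str) -> str:
    """Rewrite `div.section` to `section` (assumed to be equivalent enough)."""
    return DIV_SECTION.sub("section", css)


if __name__ == "__main__":
    # Made-up stylesheet, used only for illustration.
    sample = (
        "div.section { margin-top: 2em; }\n"
        "article div.section > h2 { color: #333; }\n"
        "div.section-title { font-weight: bold; }\n"
    )
    for number, line in find_div_section_rules(sample):
        print(f"line {number}: {line}")
    print(migrate_selectors(sample))
```

Rules that target sections by id should not need this, since the ids are kept.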

Also worth noting: while issue #959 is similar, it is not related, because at that time Docutils had no HTML5 support.

@stale

stale bot commented Jul 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your participation and understanding.

@stale stale bot added the stale Marked for closure due to inactivity label Jul 10, 2022
@justinmayer justinmayer removed the stale Marked for closure due to inactivity label Jul 21, 2022
@justinmayer
Member

justinmayer commented Aug 3, 2023

I apologize for the delay in responding to this issue, which has been raised before in #2549. Given that some years have passed since that issue was closed, perhaps it is time to revisit this topic, particularly because according to the Docutils release notes, it seems html5 will soon become the default writer.

It occurs to me that maybe the Docutils writer could also be a configurable setting?

@qookei: Given the concerns raised in #2549 regarding whether Docutils produces valid HTML5, can you run the output through some validators and confirm that the current released version of Docutils can produce valid HTML5?

@justinmayer
Member

@qookei: Any thoughts on my comment above?

@qookei
Author

qookei commented Aug 27, 2023

Oh apologies, I looked at your response on my phone, but forgot to respond later.

I tested with the skeleton generated by pelican-quickstart, a simple RST file, and the following plugin to generate HTML5 output: https://gist.github.com/qookei/c302b95ff716257b424e877f3dceb502 (more or less a copy of the existing HTML4 code in Pelican). For the resulting post HTML page, the W3C markup validator reports the following:
[image: W3C markup validator results]

This seems fine to me, and the reported warnings can be fixed by changing the theme's templates (HTML generated with the theme I wrote validates with no messages).

EDIT: The above warnings are also reported with the default HTML4 writer.
I also ran a larger RST file through it, and that doesn't show any extra warnings or errors either.

@ceyusa

ceyusa commented Dec 1, 2024

I've just changed

--- a/pelican/readers.py
+++ b/pelican/readers.py
@@ -11,7 +11,7 @@ import docutils
 import docutils.core
 import docutils.io
 from docutils.parsers.rst.languages import get_language as get_docutils_lang
-from docutils.writers.html4css1 import HTMLTranslator, Writer
+from docutils.writers.html5_polyglot import HTMLTranslator, Writer
 
 from pelican import rstdirectives  # NOQA
 from pelican.cache import FileStampDataCacher

And my blog built correctly (I checked some random pages with https://validator.w3.org/). Obviously it needs a lot more work to get it merged (a lot of tests need to be updated).

@justinmayer
Member

If someone would like to submit a pull request that implements the needed changes for a switch to the Docutils HTML5 writer, I would be happy to include it in a subsequent major Pelican release.

@nkr0

nkr0 commented Dec 2, 2024

@justinmayer PR #3432

@nkr0 nkr0 linked a pull request Dec 2, 2024 that will close this issue