Include HTML headers in TOC #538

Merged on Dec 1, 2023 (6 commits)

Crozzers
Contributor

This PR closes #537 by adding the capability to parse <h[1-6]> tags and include their IDs in the table of contents.

It works by matching HTML header tags, checking for an id= attribute and then adding an entry to the TOC. If the tag does not have an ID, a new one is generated and inserted into the HTML.
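The matching step described above can be sketched roughly as follows. This is a standalone illustration, not the code from this PR: `HEADER_RE`, `ID_RE`, `slugify` and `collect_html_headers` are hypothetical names, and the slug rule is a simplified stand-in for however markdown2 actually generates IDs.

```python
import re

# Hypothetical pattern: an <h1>..<h6> element with optional attributes.
HEADER_RE = re.compile(r'<h([1-6])((?:\s[^>]*)?)>(.*?)</h\1>',
                       re.IGNORECASE | re.DOTALL)
# An id attribute preceded by whitespace, so that e.g. data-id is not matched.
ID_RE = re.compile(r'''\sid\s*=\s*["']([^"']+)["']''', re.IGNORECASE)


def slugify(text):
    """Illustrative slug helper (assumption: not markdown2's own ID generator)."""
    text = re.sub(r'<[^>]+>', '', text)  # drop any nested tags
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


def collect_html_headers(html):
    """Return (html_with_ids, toc); toc is a list of (level, id, text) tuples."""
    toc = []

    def _sub(match):
        level = int(match.group(1))
        attrs, inner = match.group(2), match.group(3)
        found = ID_RE.search(attrs)
        if found:
            # The tag already has an ID: reuse it and leave the tag untouched.
            header_id = found.group(1)
            tag = match.group(0)
        else:
            # No ID: generate one and insert it into the opening tag.
            header_id = slugify(inner)
            tag = '<h%d%s id="%s">%s</h%d>' % (level, attrs, header_id, inner, level)
        toc.append((level, header_id, re.sub(r'<[^>]+>', '', inner).strip()))
        return tag

    return HEADER_RE.sub(_sub, html), toc
```

For example, `collect_html_headers('<h2>Getting Started</h2>')` would rewrite the tag with a generated `id` and record one level-2 TOC entry.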

Since HTML content is hashed before markdown headers are processed, this step intercepts headers in _hash_html_block_sub before they get hashed. As a result, HTML headers are added to the TOC ahead of the markdown headers, so entries end up out of order. To fix this, I added a function that sorts the TOC by order of appearance.
It works by taking each TOC header entry, searching the text for that header, and using the index at which it is found as the sort position.
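A minimal sketch of that sorting idea, assuming TOC entries are `(level, id, text)` tuples and that every header in the final HTML carries a quoted `id` attribute. The function name and entry shape are hypothetical and not taken from the PR's diff.

```python
import re


def sort_toc_by_appearance(toc, html):
    """Sort (level, id, text) entries by where each header first appears in html."""
    def position(entry):
        level, header_id, _text = entry
        # Look for the opening tag of this header, identified by level and id.
        pattern = r'''<h%d[^>]*\sid=["']%s["']''' % (level, re.escape(header_id))
        match = re.search(pattern, html)
        # Entries that cannot be located sort last instead of raising.
        return match.start() if match else len(html)

    return sorted(toc, key=position)
```

Because `sorted` is stable, entries that cannot be found keep their relative order at the end of the list.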

This new header behaviour is disabled by default and can be enabled using the new header-ids options dict.

# new API
markdown2.markdown(text, extras={'header-ids': {'mixed': True, 'prefix': 'my-prefix'}})
# old API
markdown2.markdown(text, extras={'header-ids': 'my-prefix'})  # converted to {'prefix': 'my-prefix'} in __init__
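The backwards-compatible conversion mentioned in the comment above could look something like this. It is only a sketch of the idea: `normalize_header_ids_option` is a hypothetical helper, and the handling of `None` is an assumption rather than something stated in this PR.

```python
def normalize_header_ids_option(value):
    """Coerce a header-ids extra value into the new options-dict form."""
    if value is None:
        # Assumption: the extra was enabled without any options.
        return {}
    if isinstance(value, str):
        # Old API: a bare string is treated as the ID prefix.
        return {'prefix': value}
    # New API: already a mapping; copy it so the caller's dict is not mutated.
    return dict(value)
```

With this shape, both calls shown above end up with a dict, and the old string form is equivalent to passing `{'prefix': 'my-prefix'}`.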

@nicholasserra nicholasserra merged commit efd824f into trentm:master Dec 1, 2023
18 checks passed
@nicholasserra
Collaborator

Thank you! I'm gonna do a release next week.

Successfully merging this pull request may close these issues.

HTML headers not included in TOC