refactor: add the contentlayer to html-backend #1040
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates: Wonderful, this rule succeeded. When test data is updated, we require two reviewers.
It would also be necessary to rebase onto main to resolve the merge conflict and to pick up the latest commit, which includes another doc.add_text(...) call. Also, please make sure the PR title follows the conventional commit format.
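As a rough illustration of what a conventional commit title looks like, here is a minimal Python sketch of a check against the simplified header pattern from the Conventional Commits 1.0.0 summary (`<type>[optional scope][!]: <description>`). The function name `is_conventional` is hypothetical, and it is an assumption that any alphabetic type is accepted; the repository's merge-protection rule may restrict types to a fixed list.

```python
import re

# Simplified Conventional Commits header: <type>[(scope)][!]: <description>
# Assumption: any alphabetic type is accepted; the repository's actual rule
# may restrict types to a fixed list (feat, fix, refactor, ...).
CONVENTIONAL_TITLE = re.compile(r"[A-Za-z]+(\([^()\s]+\))?!?: \S.*")


def is_conventional(title: str) -> bool:
    """Return True if the whole title matches the simplified header pattern."""
    return CONVENTIONAL_TITLE.fullmatch(title) is not None


if __name__ == "__main__":
    for t in ["refactor: add the contentlayer to html-backend", "Add content layer"]:
        print(f"{t!r}: {is_conventional(t)}")
```

The title of this PR passes the check, while a plain sentence such as "Add content layer" does not, because it lacks the `type: ` prefix.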
Another concern: this code puts everything that appears before the first <h1> tag (if there is one) into the furniture. That may be too strong an assumption. For instance, the example reported in issue #1019 has no <h1> tag at all.
Instead, we could put everything in the body container (we only parse the <body> tag anyway), except for specific tags like footnote or aside (which the backend does not support yet anyway).
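The alternative proposed above can be sketched with Python's standard-library `html.parser`: every text run inside <body> goes to the body layer unless it sits under one of a few excluded tags. This is only an illustration, not the backend's code. The `Layer` enum, the `collect` helper, and the `FURNITURE_TAGS` set are hypothetical stand-ins (the tag set just mirrors the footnote/aside examples from the comment), and void-element handling is a simplification.

```python
from enum import Enum
from html.parser import HTMLParser


class Layer(Enum):
    # Hypothetical stand-in for the document's content layers.
    BODY = "body"
    FURNITURE = "furniture"


# Assumption: tags kept out of the body, taken from the examples in the review.
FURNITURE_TAGS = {"aside", "footnote"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class LayerCollector(HTMLParser):
    """Collect (tag, text, layer) triples for text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.items = []  # collected (tag, text, layer) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or "body" not in self.stack:
            return  # skip whitespace and anything outside <body>
        in_furniture = bool(FURNITURE_TAGS & set(self.stack))
        layer = Layer.FURNITURE if in_furniture else Layer.BODY
        self.items.append((self.stack[-1], text, layer))


def collect(html):
    parser = LayerCollector()
    parser.feed(html)
    parser.close()
    return parser.items
```

With this rule, a page without any <h1> (as in issue #1019) still ends up entirely in the body layer, and only the explicitly excluded tags are moved to furniture.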
Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
Branch updated from e96ed30 to 70e6b94.
In case an HTML document does not have any header tag, all parsed items are placed in the DoclingDocument's body content layer. HTML paragraphs ('p' tags) are parsed as text items with the paragraph label. Update the test ground truth according to the changes above.
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
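The two rules in this commit message can be sketched as small Python functions over an ordered list of tag names. This assumes that "header tag" means any of <h1> to <h6> and that, when a header is present, the earlier rule from the review (content before the first header goes to furniture) still applies; both are assumptions. The function names are hypothetical, plain strings stand in for the DoclingDocument content layers and labels, and only the 'p' to paragraph mapping comes from the commit itself.

```python
# Assumption: "header tag" means any heading level, <h1> to <h6>.
HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def assign_layers(tags):
    """Map an ordered list of tag names to content-layer names.

    If no header tag is present, every item goes to the body layer.
    Otherwise (assumed pre-existing rule), items before the first header
    go to furniture, and the header and everything after it go to body.
    """
    if not any(tag in HEADER_TAGS for tag in tags):
        return ["body"] * len(tags)
    layers = []
    seen_header = False
    for tag in tags:
        seen_header = seen_header or tag in HEADER_TAGS
        layers.append("body" if seen_header else "furniture")
    return layers


def label_for(tag):
    """Return a text-item label for a tag name."""
    # From the commit: 'p' tags become paragraph items.
    # The generic "text" fallback is a hypothetical placeholder.
    return "paragraph" if tag == "p" else "text"
```

For a page made only of paragraphs, `assign_layers(["p", "p"])` places both items in the body, which is the case the commit fixes.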
@PeterStaar-IBM I have made some changes on the branch.
Branch updated from 93d380b to 3ea31b6.
New feature
Checklist: