WIP HTML -> guided navigation conversion #262
base: develop
Looking at the results, here are a few early comments:
|
Updated input:
<!doctype html>
<html xmlns:epub="http://www.idpf.org/2007/ops"><!-- lang="en" xml:lang="en" -->
<body>
<p xml:lang="fr">Paragraphe avec image: <img src="src/image.jpg" alt="A cool image" /></p>
<p xml:lang="fr">Paragraphe avec image #1 <img src="src/image.jpg" alt="A cool image" /> et #2 <img src="src/image.jpg" alt="A second cool image" />!</p>
<p xml:lang="fr"><img src="src/image.jpg" alt="The coolest image" /> et <img src="src/image.jpg" alt="The boring image" /></p>
<p>A paragraph with: <img src="src/image.jpg" alt="A cool image" /><em xml:lang="fr">est cool!</em></p>
<p><i>Simple paragraph</i></p>
<p>This job requires a certain <em xml:lang="fr">savoir faire</em> that can only be acquired over time.</p>
<p>This is a paragraph <b>with some very-<em>strong</em> bold</b> text!</p>
<p>Just<br />testing<br>some<br /> breaks! And useless <span>elements</span>...</p>
<div>
<span id="pg04" role="doc-pagebreak" epub:type="pagebreak" title="4"/>
<p>And the next pagebreak is in the middle <span id="pg05" role="doc-pagebreak" epub:type="pagebreak" title="4"/> of a sentence.</p>
</div>
<section role="doc-chapter" epub:type="chapter">
<h1>Title of the chapter</h1>
</section>
<ul>
<li>First item</li>
<li>Second item</li>
<li>Third item</li>
</ul>
<p aria-hidden="true">Hidden <b>text!</b> <img src="with_image.jpg" />...</p>
<p aria-hidden="true">More Hidden text</p>
<p aria-hidden="true">More Hidden text</p>
<img src="image1.avif" alt="Alternative text using the alt attribute">
<span role="img" aria-label="Rating: 4 out of 5 stars">
<span>★</span>
<span>★</span>
<span>★</span>
<span>★</span>
<span>☆</span>
</span>
<figure aria-labelledby="cat-caption">
<pre>
/\_/\
( o.o )
^
</pre>
<figcaption id="cat-caption">
ASCII Art of a cat face
</figcaption>
</figure>
</body>
</html>
Output:
{
"guided": [
{
"children": [
{
"children": [
{
"text": {
"language": "fr",
"plain": "Paragraphe avec image:"
}
},
{
"description": "A cool image",
"imgref": "src/image.jpg",
"role": [
"image"
]
}
],
"role": [
"paragraph"
]
},
{
"children": [
{
"text": {
"language": "fr",
"plain": "Paragraphe avec image #1"
}
},
{
"description": "A cool image",
"imgref": "src/image.jpg",
"role": [
"image"
]
},
{
"text": {
"language": "fr",
"plain": "et #2"
}
},
{
"description": "A second cool image",
"imgref": "src/image.jpg",
"role": [
"image"
]
},
{
"text": {
"language": "fr",
"plain": "!"
}
}
],
"role": [
"paragraph"
]
},
{
"children": [
{
"description": "The coolest image",
"imgref": "src/image.jpg",
"role": [
"image"
]
},
{
"text": {
"language": "fr",
"plain": "et"
}
},
{
"description": "The boring image",
"imgref": "src/image.jpg",
"role": [
"image"
]
}
],
"role": [
"paragraph"
]
},
{
"children": [
{
"text": "A paragraph with:"
},
{
"description": "A cool image",
"imgref": "src/image.jpg",
"role": [
"image"
]
},
{
"text": {
"ssml": "<emphasis xml:lang=\"fr\">est cool!</emphasis>"
}
}
],
"role": [
"paragraph"
]
},
{
"role": [
"paragraph"
],
"text": {
"ssml": "<emphasis level=\"reduced\">Simple paragraph</emphasis>"
}
},
{
"role": [
"paragraph"
],
"text": {
"ssml": "<emphasis>This job requires a certain </emphasis><lang xml:lang=\"fr\">savoir faire</lang> that can only be acquired over time."
}
},
{
"role": [
"paragraph"
],
"text": {
"ssml": "<emphasis>This is a paragraph </emphasis><emphasis>with some very-</emphasis><emphasis>strong</emphasis> bold text!"
}
},
{
"role": [
"paragraph"
],
"text": {
"ssml": "Just<break/>testing<break/>some<break/> breaks! And useless elements..."
}
},
{
"children": [
{
"children": [
{
"role": [
"paragraph"
],
"text": "And the next pagebreak is in the middle of a sentence."
}
],
"role": [
"pagebreak"
]
}
]
},
{
"children": [
{
"level": 1,
"role": [
"heading"
],
"text": "Title of the chapter"
}
],
"role": [
"chapter"
]
},
{
"children": [
{
"role": [
"listItem"
],
"text": "First item"
},
{
"role": [
"listItem"
],
"text": "Second item"
},
{
"role": [
"listItem"
],
"text": "Third item"
}
],
"role": [
"list"
]
},
{
"description": "Alternative text using the alt attribute",
"imgref": "image1.avif",
"role": [
"image"
]
},
{
"description": "Rating: 4 out of 5 stars",
"role": [
"image"
]
},
{
"description": "ASCII Art of a cat face",
"role": [
"figure"
]
}
]
}
]
} |
Notes:
|
Looking better overall. I still notice objects with just
Given the very large number of
The examples with an image in the middle of a sentence also make me wonder if we shouldn't have an approach similar to pagebreaks and notes, where we use a custom SSML tag instead of breaking up text into multiple objects. This would apply to
If we go back to this example:
<p xml:lang="fr">Paragraphe avec image: <img src="src/image.jpg" alt="A cool image" /></p>
The output should look like this:
{
"role": ["paragraph"],
"text": {
"language": "fr",
"ssml": "Paragraphe avec image: <readium:image id=\"image1\" />",
"children": [
{
"role": ["image"],
"id": "image1",
"imgref": "src/image.jpg",
"description": "A cool image"
}
]
}
} |
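To make the proposal above concrete, here is a minimal Python sketch of the merge step, not the converter's actual code. It takes a paragraph in the shape currently generated (a list of text and image children) and collapses it into a single text object whose SSML references each image through a `<readium:image>` placeholder. The function name `merge_inline_images`, the per-paragraph `imageN` id scheme, joining fragments with a single space, and taking the first language found are all assumptions made for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def merge_inline_images(paragraph):
    """Collapse text/image children into one SSML text object (hypothetical helper)."""
    parts = []      # SSML fragments, in document order
    images = []     # image objects referenced from the SSML
    language = None

    for child in paragraph.get("children", []):
        if "image" in child.get("role", []):
            # Assumption: ids are numbered per paragraph, starting at image1.
            image_id = "image%d" % (len(images) + 1)
            image = {"role": ["image"], "id": image_id}
            for key in ("imgref", "description"):
                if key in child:
                    image[key] = child[key]
            images.append(image)
            parts.append("<readium:image id=%s />" % quoteattr(image_id))
            continue

        text = child.get("text", "")
        if isinstance(text, dict):
            # Assumption: the first language seen applies to the whole paragraph.
            language = language or text.get("language")
            if "ssml" in text:
                parts.append(text["ssml"])
            else:
                parts.append(escape(text.get("plain", "")))
        else:
            parts.append(escape(text))

    merged = {}
    if language:
        merged["language"] = language
    # Assumption: whitespace was trimmed upstream, so fragments are re-joined
    # with a single space.
    merged["ssml"] = " ".join(parts)
    merged["children"] = images
    return {"role": paragraph["role"], "text": merged}
```

Fed the first paragraph of the generated output, this yields the object shown above. Joining with a space does put a stray space before trailing punctuation such as the `!` in the second paragraph, so a real implementation would need to preserve the original whitespace instead.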
For further contextualization, I think that we should include
For example, if we add
{
"role": ["body"],
"textref": "chapter.xhtml",
"children": []
}
To further help with an implementation optimized for search and/or highlighting, we could also go beyond that and provide this information per node with fragments such as:
For example, a paragraph with
{
"role": ["paragraph"],
"textref": "chapter.xhtml#par1"
} |
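As a rough illustration of the `textref` idea, the following Python sketch walks a chapter's markup and emits a `body` object carrying the resource `textref`, plus one child per recognized element with a fragment `textref` when available. It assumes the fragment is simply the element's `id` attribute, uses a hypothetical tag-to-role table (`ROLES`), and flattens nesting for brevity. None of these names come from the PR.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-role mapping, limited to roles seen in the output above.
ROLES = {"p": "paragraph", "h1": "heading", "li": "listItem"}


class TextrefCollector(HTMLParser):
    """Collect guided navigation nodes with a textref (illustrative only)."""

    def __init__(self, href):
        super().__init__()
        self.href = href
        self.body = {"role": ["body"], "textref": href, "children": []}

    def handle_starttag(self, tag, attrs):
        role = ROLES.get(tag)
        if role is None:
            return
        node = {"role": [role]}
        element_id = dict(attrs).get("id")
        if element_id:
            # Assumption: the fragment identifier is the element's id.
            node["textref"] = "%s#%s" % (self.href, element_id)
        # Simplification: children are appended flat, ignoring nesting.
        self.body["children"].append(node)


def build_body(href, markup):
    collector = TextrefCollector(href)
    collector.feed(markup)
    collector.close()
    return collector.body
```

Elements without an `id` get no `textref` here. Whether the converter should fall back to another fragment scheme in that case is left open.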
@GoobyTheBOI any thoughts on this based on your own work? |
Work in progress. Given the following input:
the following guided nav doc is generated: