Total progression in a publication for locators #90

HadrienGardeur · 2019-04-04T15:45:28Z

HadrienGardeur
Apr 4, 2019
Maintainer

Total progression as a % in a publication is one of the most common ways to indicate progression to a user:

it's often displayed in the bookshelf of the user for each book either as text or a visual element
many apps also provide this information in the reading UI (either as text or as a timeline)
it can be relevant as contextual information in bookmarks, highlights/annotations, search results or a TOC

Currently, the only way to calculate this information with a locator is indirect:

in EPUB, you can calculate a % based on the position of a locator, as long as you also know the total number of positions
in PDF, this can be calculated using the page number in a fragment, as long as you also know the total number of pages
in an audiobook, this can be calculated using the media fragment and checking this against the duration of each item in the readingOrder
in a CBZ, this can be calculated by comparing the index of the resource in the readingOrder with the total number of resources in the readingOrder
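
As a rough illustration of the PDF and CBZ cases listed above, here is a minimal sketch. The function names are hypothetical and not part of any Readium API. It assumes a simple `page=N` fragment for PDF and a 1-based number divided by the total, so the last page reads as 100%; an implementation could just as well prefer a 0-based, start-of-page convention.

```javascript
// Hypothetical helper: 1-based item number divided by the total (assumed convention).
function ratio(number, total) {
  if (!Number.isInteger(total) || total <= 0) {
    throw new RangeError("total must be a positive integer");
  }
  if (!Number.isInteger(number) || number < 1 || number > total) {
    throw new RangeError("number must be in [1, total]");
  }
  return number / total;
}

// PDF: assumes a fragment of the form "page=3" with 1-based page numbers.
function pdfProgression(fragment, totalPages) {
  const match = /^page=(\d+)$/.exec(fragment);
  if (!match) throw new Error(`unsupported fragment: ${fragment}`);
  return ratio(Number(match[1]), totalPages);
}

// CBZ: index of the resource in the readingOrder (array of { href } objects).
function cbzProgression(href, readingOrder) {
  const index = readingOrder.findIndex((link) => link.href === href);
  if (index === -1) throw new Error(`href not in readingOrder: ${href}`);
  return ratio(index + 1, readingOrder.length);
}
```

Under these assumptions, page 3 of a 10-page PDF gives 0.3, and the second of four images in a CBZ gives 0.5.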

Given the universal nature of a % in a publication, its usefulness for sorting locators and its wide usage in reading apps, I would like to introduce totalProgression as a new location in the Locator model.

I'm including a few contributors that have their own opinion on this:

@ullstrm since this has been implemented on iOS by BookBeat in their own app
@mmenu-mantano who implemented something similar at Mantano
@llemeurfr for the timeline in Readium Desktop
@aferditamuriqi for her work on the Android side and on the position list

HadrienGardeur · 2019-04-09T10:31:56Z

HadrienGardeur
Apr 9, 2019
Maintainer Author

Any thoughts on this before our next call tomorrow?

I know for example that @mmenu-mantano was suggesting a calculation based on bytes rather than characters (as implemented in Bookari), in order to handle the calculation even without decrypting the resources in the reading order.
I'd love to hear how different this is from what @ullstrm has done for BookBeat or if other implementers think that it could work well.

cc @ullstrm @mmenu-mantano @danielweck @llemeurfr @aferditamuriqi

mickael-menu-mantano · 2019-04-09T16:52:47Z

mickael-menu-mantano
Apr 9, 2019

Using bytes instead of characters is a good lightweight solution, because it's simple and doesn't require caching the information. It could be a nice first implementation to provide this feature quickly before refining it.

For unencrypted books, just get the length metadata from the ZIP entries. For encrypted books, the unencrypted length is part of the encryption.xml file.

To get the current position, just take the progression percentage that we already have and apply it to the document's length in bytes. Here again, it's an approximation, but it's good enough for most use cases and doesn't require determining the number of characters elapsed up to the visible page, so it's much simpler.
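
A minimal sketch of this byte-weighted approximation could look like the following. The function name and the `{ href, length }` shape are assumptions made for illustration, with `length` standing for the unencrypted size in bytes of each resource in the reading order.

```javascript
// Hypothetical helper, not an existing API.
// resources: array of { href, length } in reading order, length in bytes.
// progression: progression inside the current resource, between 0 and 1.
function totalProgressionFromBytes(resources, href, progression) {
  const index = resources.findIndex((r) => r.href === href);
  if (index === -1) throw new Error(`href not in readingOrder: ${href}`);

  const totalBytes = resources.reduce((sum, r) => sum + r.length, 0);
  if (totalBytes === 0) return 0;

  // Bytes of every resource that comes before the current one.
  const bytesBefore = resources
    .slice(0, index)
    .reduce((sum, r) => sum + r.length, 0);

  // Apply the in-resource progression to the current resource's byte length.
  const clamped = Math.min(Math.max(progression, 0), 1);
  return (bytesBefore + clamped * resources[index].length) / totalBytes;
}
```

For example, halfway through a 3000-byte resource that follows a 1000-byte one, in a publication of 8000 bytes overall, this returns 0.3125.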

Most of the time, it fits the progression quite well, especially for text-heavy books. The amount of HTML tags is proportional to the overall length, so it doesn't weigh much in the calculation.

Now, this technique doesn't take the height of pictures into account, so it might not be the best solution for every book.

mickael-menu-mantano · 2019-04-09T16:57:39Z

mickael-menu-mantano
Apr 9, 2019

I realized that this issue is specifically about a total percentage. But I think requiring Locator.position for all formats, with a way to get the total number of positions is a better alternative to a total progression percentage:

it's trivial to get the total progression from it
you can display the position / total in the user interface
you can display the size of the book in relation to others to get an idea of its length, like the Kindle does

A related question is: how do we expose the total number of pages (positions) in the book? This information is important to make sense of the current position in the library, and currently we don't have any way to expose it (not part of the RWPM nor the Locator models).

Adding a total progression would allow calculating the total number of pages from the position, but it kind of feels dirty?

I guess it depends on the navigator, because the position might be calculated by it. But creating a navigator to get the document's page count on book importation is not the best experience.

HadrienGardeur · 2019-04-09T19:16:27Z

HadrienGardeur
Apr 9, 2019
Maintainer Author

Here's my personal take on this: we'll need format-specific rules for calculating totalProgression.

EPUB

if we have a position list, we could simply calculate things this way: currentPosition / totalPositions = totalProgression
in the case of fully FXL publications, calculating positions and totalProgression is very straightforward: each resource in the readingOrder corresponds to a position
if we don't have a position list, we can fall back to bytes as explained by @mmenu-mantano

PDF & CBZ

PDF and CBZ files are fairly similar to fully FXL publications, since it's very easy to calculate positions
LCPDF files are slightly more difficult to handle, but we already need to handle this for positions

Audiobooks

for audiobooks, we always know the progression in seconds in a resource and the duration of each resource in the readingOrder
we can therefore calculate the totalProgression using the duration of the previous resources in the readingOrder, the progression in the current resource and the total duration of the audiobook
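
The audiobook calculation described above can be sketched as follows. This is only an illustration: the function names are hypothetical, durations are assumed to be in seconds, and the fragment parser only accepts the simplest `t=<seconds>` form of a media fragment.

```javascript
// Simplified parser: accepts only "t=<seconds>", e.g. "t=90" or "t=12.5".
function parseTimeFragment(fragment) {
  const match = /^t=(\d+(?:\.\d+)?)$/.exec(fragment);
  if (!match) throw new Error(`unsupported fragment: ${fragment}`);
  return Number(match[1]);
}

// readingOrder: array of { href, duration } with duration in seconds (assumed shape).
function audiobookTotalProgression(readingOrder, href, fragment) {
  const index = readingOrder.findIndex((link) => link.href === href);
  if (index === -1) throw new Error(`href not in readingOrder: ${href}`);

  const total = readingOrder.reduce((sum, link) => sum + link.duration, 0);
  if (total <= 0) throw new RangeError("total duration must be positive");

  // Duration of all the resources that come before the current one.
  const before = readingOrder
    .slice(0, index)
    .reduce((sum, link) => sum + link.duration, 0);

  // Never count more than the duration of the current resource.
  const seconds = Math.min(parseTimeFragment(fragment), readingOrder[index].duration);
  return (before + seconds) / total;
}
```

With three tracks of 600, 1200 and 600 seconds, being 600 seconds into the second track gives 0.5.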

To answer @mmenu-mantano's questions:

> A related question is: how do we expose the total number of pages (positions) in the book?

There's a specific page for that in our architecture repo, but it definitely needs to be updated.

> I realized that this issue is specifically about a total percentage. But I think requiring Locator.position for all formats, with a way to get the total number of positions is a better alternative to a total progression percentage

This is possible but with some tweaks:

each Locator needs to be useful on its own, so for an equivalent of totalProgression we would need each Locator to contain both position and totalPositions
for an audiobook, the concept of position isn't currently used; we would need to turn position and totalPositions into floats that accept 0 as a minimum value, in order to express timestamps in seconds

JayPanoz · 2019-04-11T14:07:53Z

JayPanoz
Apr 11, 2019
Collaborator

Alright so that has just landed in my “feature requirements” list this morning – how timely!

(a.k.a. “We really like this thing Apple Books and Kindle are doing with progression bars, would it be possible to have it?” 😬)

The following are just a couple of thoughts and notes that I’m mentioning in case they’re worth mentioning, nothing strongly-opinionated – so disclaimer: if it sounds like I may be playing devil’s advocate here and there, it’s probably because there are underlying questions I failed to write in this form, and explanations would help make it clearer in my mind.

First and foremost, I absolutely agree with what Hadrien said in his latest comment as we’ve already had those discussions internally for some other stuff – more on this in a few paragraphs.

Reflow + no hint is most probably the (only?) outcast in all of EPUB.

> But creating a navigator to get the document's page count on book importation is not the best experience.

This I guess can be illustrated by the “iBooks’ method of doing it” and yeah, if that’s the case, I’ve encountered occurrences in which it takes an awful lot of time (like 30 seconds – 1 minute) with the current webview being not interactive for the first 5 to 10 seconds.

I’m very sensitive to this because it has very quickly turned into one of my biggest concerns, cloudreader-wise: “simple and fast” are like the biggest piece of feedback we get from users and publishers – tbh, I would never have thought users care that much about speed and it was mind-blowing at first – so I must now live with the goal of not making it noticeably slower. 😬 In other words, I’m very open to good-enough solutions.

So we encountered total progression already, but for something else (server-side). To be transparent, we can preprocess files, and already extract the number of words in an XHTML file, but decided to not use it in the reader at runtime – it’s used later, server-side.

There are various reasons for that, but the main one is we obviously wanted to align with others/architecture.

So I may have a question about bytes – this is where I might be playing devil’s advocate, so sorry in advance, just being curious and not trying to imply I prefer words, because words have issues in and of themselves, if only because I don’t have a universal definition of “word” that works in all languages.

My conception of bytes is highly impacted by some JavaScript (a.k.a. “ECMAScript in browsers”) details, i.e. I’m constantly told to be very cautious with those when I encounter strings, because 1 char ≠ 1 byte (it can be 2, 3 or 4 for CJK, emoji and whatnot).

Quick examples for the curious:

"a".length
→ 1
new Blob(["a"]).size
→ 1

"é".length
→ 1
new Blob(["é"]).size
→ 2

"退".length
→ 1
new Blob(["退"]).size
→ 3

"😛".length
→ 2
new Blob(["😛"]).size
→ 4

My understanding is that this won’t make any difference, correct? Again please feel very free to correct me or explain in detail how I’m wrong, like I said this kinda is my daily relationship with those, so I may have trouble moving this conception to another context.

My other question I guess would be about HTML tags, as I encounter more and more markup with EPUB 3. EPUB 2 was pretty simple in this regard, but in EPUB 3 you may have additional attributes like epub:type and role, etc., and sometimes there is a lot more markup than textContent in a file’s body.

But again, I guess it doesn’t make any difference?

As you can see, those are very naïve notes that come to my mind, and I prefer to share them right now in case they are significant. If not, all is good as far as I’m concerned.

HadrienGardeur · 2019-04-11T14:23:05Z

HadrienGardeur
Apr 11, 2019
Maintainer Author

@JayPanoz

> So we encountered total progression already, but for something else (server-side). To be transparent, we can preprocess files, and already extract the number of words in an XHTML file, but decided to not use it in the reader at runtime – it’s used later, server-side.

My recommendation is to send this to the Web app using the Positions List API.

Once you have the full list of positions in memory, it's very easy to use it:

calculate the progression in the current resource
match your current href and progression to identify which position(s) (could be more than one) are currently displayed to the user
display the position to the user in your UI or calculate the corresponding totalProgression
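
The three steps above can be sketched as follows. The shape of each entry (`href` plus `locations.position` and `locations.progression`) and the function names are assumptions made for illustration. This simplified version returns a single position, namely the last one that starts at or before the current progression, even though more than one position may be visible on screen.

```javascript
// positions: array of { href, locations: { position, progression } } in reading order,
// where progression is the point in the resource at which the position starts.
function currentPosition(positions, href, progression) {
  let current = null;
  for (const p of positions) {
    if (p.href === href && p.locations.progression <= progression) {
      current = p; // keep the last position starting at or before the progression
    }
  }
  if (current === null) throw new Error(`no position found for ${href}`);
  return current;
}

// currentPosition / totalPositions = totalProgression
function totalProgressionFromPosition(position, totalPositions) {
  return position / totalPositions;
}
```

With four positions split evenly over two resources, a progression of 0.7 in the first resource maps to position 2, i.e. a totalProgression of 0.5.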

Building a timeline should be every bit as easy.

JayPanoz · 2019-04-11T14:38:53Z

JayPanoz
Apr 11, 2019
Collaborator

@HadrienGardeur ah thanks for the pointer.

Question then, I guess, would be internationalisation and all this stuff.

If say bytes make it a lot easier to handle some languages because we have to, for instance, redefine what a word is – effectively turning the language into an exception/another case – then bytes would be the obvious choice.

That and I don’t know if we’ll always have the luxury to preprocess files.

I’ve been burnt by i18n once (ReadiumCSS) so as you can see, I prefer to be pretty cautious about that now. :-)

HadrienGardeur · 2019-04-11T14:50:05Z

HadrienGardeur
Apr 11, 2019
Maintainer Author

> If say bytes make it a lot easier to handle some languages because we have to, for instance, redefine what a word is – effectively turning the language into an exception/another case – then bytes would be the obvious choice.

In that case, you'd either need to communicate the length of each resource in bytes in the manifest (requires some work on the backend) or send a HEAD request for every resource (not ideal).

JayPanoz · 2019-04-11T15:14:57Z

JayPanoz
Apr 11, 2019
Collaborator

In both cases, the streamer is consequently adding those “hints” for totalPositions then, right?

I’m wondering whether that doesn’t answer

> how do we expose the total number of pages (positions) in the book?

To some extent.

mickael-menu-mantano · 2019-04-12T13:36:32Z

mickael-menu-mantano
Apr 12, 2019

Regarding the bytes/characters mismatch, it's actually not such a problem. The bytes in this case are really not a substitute for the characters but more of a metric showing how much each resource weighs in the publication's overall reading time / size. So this is really not a precise measure, but in practice I think it's a good enough and efficient way to display a progress bar in the book, and to determine page numbers.

Also, it actually takes image sizes into account to some extent, contrary to what I said earlier. Because we are using the progression percentage (derived from the web view) to calculate the page number in a given resource, the page number progression will count the images in the chapter. However, images don't weigh much in the chapter's bytes, so if a chapter has a lot of images and very little text, it will be seen as a small chapter in the overall publication.

> This I guess can be illustrated by the “iBooks’ method of doing it” and yeah, if that’s the case, I’ve encountered occurrences in which it takes an awful lot of time (like 30 seconds – 1 minute) with the current webview being not interactive for the first 5 to 10 seconds.

Actually this is different: in this case, iBooks is paginating all the resources using the current appearance settings to figure out the exact total number of pages of a reflowable resource. It's not very efficient because you have to recalculate everything every time the user changes the font size, margins, etc. In our case, the positions won't represent "physical/screen" page numbers but will arbitrarily split the resources into virtual pages (by bytes, or by characters).

What I was talking about is that, as an integrator, I would want the total number of pages to be available directly in Publication during the book importation, instead of having to create an EPUB navigator (that will be discarded right away instead of presenting the book to the user) to calculate the total page count.

JayPanoz · 2019-04-12T15:14:23Z

JayPanoz
Apr 12, 2019
Collaborator

@mmenu-mantano Ah thanks for the details and clarification – somewhat felt there was a difference there but was too late as I already posted. And yeah agree with the last paragraph entirely. :-)

HadrienGardeur · 2019-04-12T19:27:41Z

HadrienGardeur
Apr 12, 2019
Maintainer Author

> What I was talking about is that, as an integrator, I would want the total number of pages to be available directly in Publication during the book importation, instead of having to create an EPUB navigator (that will be discarded right away instead of presenting the book to the user) to calculate the total page count.

+1 on that but I think there are two different things to consider:

on mobile, this will probably be handled asynchronously and end up in the in-memory model
on the Web, I think that a separate URL and a request are still the best way to handle this (cf Positions List)

llemeurfr · 2019-04-17T15:27:12Z

llemeurfr
Apr 17, 2019
Maintainer

My take: if the size of each resource is present in the publication object, we can write a helper function that computes the totalProgression from the size (in bytes) of each unencrypted resource + the local progression. In this case, we can consider storing totalProgression redundant.

Another advantage of using the size in bytes of each resource rather than the number of positions in each resource is that it is more in sync with what we'll have to do for audiobooks (see Hadrien's comment above).

Note that as each "position" is a block of 1024 bytes, using bytes or positions is only a matter of precision.
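
To make the relationship between bytes and positions concrete, here is a hedged sketch that derives a list of positions from resource sizes, one position per 1024-byte block. The function names and the `{ href, length }` shape are hypothetical, and giving every resource at least one position (even an empty one) is an extra assumption of this sketch.

```javascript
const BYTES_PER_POSITION = 1024;

// Number of 1024-byte blocks in a resource, with a minimum of one (assumption).
function positionCount(lengthInBytes) {
  return Math.max(1, Math.ceil(lengthInBytes / BYTES_PER_POSITION));
}

// resources: array of { href, length } in reading order, length in bytes.
function buildPositions(resources) {
  const positions = [];
  for (const { href, length } of resources) {
    const count = positionCount(length);
    for (let i = 0; i < count; i++) {
      positions.push({
        href,
        locations: {
          position: positions.length + 1, // 1-based, across the whole publication
          progression: i / count, // where this block starts in its resource
        },
      });
    }
  }
  return positions;
}
```

A 2048-byte resource followed by a 1500-byte one yields four positions, two per resource, so a position here is simply a coarser unit than a byte.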

mickael-menu-mantano · 2019-04-18T08:22:06Z

mickael-menu-mantano
Apr 18, 2019

For PDF and CBZ, the size in bytes wouldn't be used to calculate the totalProgression. If we have the total number of pages (not obvious for LCPDF), then the helper can calculate the progression differently depending on the format.
However, without a unified piece of metadata to store for calculating the total progression, we need to have the Publication opened to calculate it, which could limit the possibilities outside of the navigator.

I think it might be worth adding the totalProgression to the Locator in the end, since there's nothing that can be used universally for all formats (unless we allow floats for position)

dryize · 2021-01-03T17:23:38Z

dryize
Jan 3, 2021

What happened to this idea? Seems like it was abandoned. Any reason for that?

1 reply

mickael-menu Jan 3, 2021
Collaborator

It's already implemented in both iOS and Android, on the develop branches. Do you have a problem using it?
