Total progression in a publication for locators #90
Replies: 15 comments 1 reply
-
Any thoughts on this before our next call tomorrow? I know, for example, that @mmenu-mantano was suggesting a calculation based on bytes rather than characters (as implemented in Bookari), so that the calculation can be done without decrypting the resources in the reading order. cc @ullstrm @mmenu-mantano @danielweck @llemeurfr @aferditamuriqi
-
Using bytes instead of characters is a good lightweight solution, because it's simple and doesn't require caching the information. It could be a nice first implementation to provide this feature quickly before refining it. For unencrypted books, just get the length metadata from the ZIP entries. For encrypted books, the unencrypted length is part of the encryption.xml file. To get the current position, just take the progression percentage that we already have and apply it to the document's byte length. Here again, it's an approximation, but it is good enough for most use cases and doesn't require determining the number of characters elapsed before the visible page, so it's much simpler. Most of the time, it fits the progression quite well, especially for text-heavy books. The amount of HTML tags is proportional to the overall length, so it doesn't weigh much in the calculation. However, this technique doesn't take into account the height of pictures, so it might not be the best solution for every book.
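As a rough sketch of the approach described above: the `ResourceEntry` shape and its field names are hypothetical (they do not come from an existing Readium API). The idea is to take the unencrypted length declared in encryption.xml when there is one, fall back to the uncompressed length of the ZIP entry otherwise, and apply the in-resource progression to that length.

```typescript
// Hypothetical shape: these field names are illustrative, not a Readium API.
interface ResourceEntry {
  href: string;
  // Uncompressed size reported by the ZIP entry, in bytes.
  zipLength: number;
  // Unencrypted length declared in encryption.xml, when the resource is encrypted.
  originalLength?: number;
}

// Byte length used as the "weight" of the resource.
function byteLength(entry: ResourceEntry): number {
  return entry.originalLength !== undefined ? entry.originalLength : entry.zipLength;
}

// Approximate byte offset of the reader inside a resource, given the
// progression (0..1) already reported by the navigator.
function approximateByteOffset(entry: ResourceEntry, progression: number): number {
  const clamped = Math.min(1, Math.max(0, progression));
  return Math.round(clamped * byteLength(entry));
}
```

Since the offset is derived from the progression rather than measured, it is only as precise as the progression itself.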
-
I realized that this issue is specifically about a total percentage. But I think requiring
A related question is: how do we expose the total number of pages (positions) in the book? This information is important to make sense of the current position in the library, and currently we don't have any way to expose it (it is part of neither the RWPM nor the Locator model). Adding a total progression would make it possible to calculate the total number of pages from the position, but it kind of feels dirty? I guess it depends on the navigator, because the position might be calculated by it. But creating a navigator just to get the document's page count when importing a book is not the best experience.
-
Here's my personal take on this: we'll need format specific rules for calculating EPUB
PDF & CBZ
Audiobooks
To answer @mmenu-mantano's questions:
There's a specific page for that in our architecture repo, but it definitely needs to be updated.
This is possible but with some tweaks:
-
Alright so that has just landed in my “feature requirements” list this morning – how timely! (a.k.a. “We really like this thing Apple Books and Kindle are doing with progression bars, would it be possible to have it?” 😬) The following are just a couple of thoughts and notes that I’m mentioning just in case they’re worth mentioning, nothing strongly-opinionated – so disclaimer: if it sounds like I may be playing devil’s advocate here and there, it’s probably because there are underlying questions I failed to write in this form, and explanations would help make it clearer in my mind. First and foremost, I absolutely agree with what Hadrien said in his latest comment as we’ve already had those discussions internally for some other stuff – more on this in a few paragraphs. Reflow + no hint is most probably the (only?) outcast in all of EPUB.
This I guess can be illustrated by the “iBooks’ method of doing it” and yeah, if that’s the case, I’ve encountered occurrences in which it takes an awful lot of time (like 30 seconds – 1 minute) with the current webview being not interactive for the first 5 to 10 seconds. I’m very sensitive to this because it has very quickly turned into one of my biggest concerns, cloudreader-wise: “simple and fast” are like the biggest piece of feedback we get from users and publishers – tbh, I would never have thought users care that much about speed and it was mind-blowing at first – so I must now live with the goal of not making it noticeably slower. 😬 In other words, I’m very open to good-enough solutions. So we encountered total progression already, but for something else (server-side). To be transparent, we can preprocess files, and already extract the number of words in an XHTML file, but decided to not use it in the reader at runtime – it’s used later, server-side. There are various reasons for that, but the main one is we obviously wanted to align with others/architecture. So I may have a question about bytes – this is where I might be playing devil’s advocate, so sorry in advance, just being curious and not trying to imply I prefer words, because words have issues in and of themselves, if only because I don’t have a universal definition of “word” that works in all languages. My conception of bytes is highly impacted by some JavaScript (a.k.a. “EcmaScript in browsers”) details i.e. I’m constantly told to be very cautious with those when I encounter strings, because 1 char ≠ 1 byte (can be 2, 3, 4 for CJK or emoji, and whatnot). Quick examples for the curious:
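A minimal sketch of that kind of mismatch, with illustrative sample strings and assuming a runtime that provides `TextEncoder` (a modern browser or Node.js):

```typescript
// UTF-16 code units, which is what String.prototype.length reports.
function codeUnits(s: string): number {
  return s.length;
}

// Unicode code points (the string iterator walks code points).
function codePoints(s: string): number {
  return Array.from(s).length;
}

// Number of bytes once the string is encoded as UTF-8.
function utf8Bytes(s: string): number {
  return new TextEncoder().encode(s).length;
}

for (const s of ["a", "\u00e9", "日", "😬"]) {
  console.log(s, codeUnits(s), codePoints(s), utf8Bytes(s));
}
// a  1 1 1
// é  1 1 2
// 日 1 1 3
// 😬 2 1 4
```
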
My understanding is that this won’t make any difference, correct? Again please feel very free to correct me or explain in detail how I’m wrong, like I said this kinda is my daily relationship with those, so I may have trouble moving this conception to another context. My other question I guess would be about HTML tags as I encountered more and more markup with EPUB3. EPUB2 was pretty simple in this regard but with EPUB3, you may have additional attributes like But again, I guess it doesn’t make any difference? As you can see those are very naïve notes that come to my mind and I prefer to share them right now in case they are significant. If not, all is good as far as I’m concerned.
-
My recommendation is to send this to the Web app using the Positions List API. Once you have the full list of positions in memory, it's very easy to use it:
Building a timeline should be every bit as easy.
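To illustrate what using an in-memory positions list could look like, here is a minimal sketch. The `Locator` shape below is an assumption that only models the fields used here, and `currentPosition` / `pageLabel` are hypothetical helper names.

```typescript
// Assumed locator shape; only the fields used below are modelled.
interface Locator {
  href: string;
  locations: { position?: number; progression?: number; totalProgression?: number };
}

// Returns the last position of `href` whose progression is <= the current
// progression, falling back to the first position of that resource.
function currentPosition(positions: Locator[], href: string, progression: number): Locator | undefined {
  const inResource = positions.filter((p) => p.href === href);
  let match: Locator | undefined = inResource[0];
  for (const p of inResource) {
    if ((p.locations.progression ?? 0) <= progression) match = p;
  }
  return match;
}

// "12 of 340"-style label; the total is simply the length of the list.
function pageLabel(positions: Locator[], current: Locator | undefined): string {
  const position = current?.locations.position;
  return position === undefined ? "" : `${position} of ${positions.length}`;
}
```

A timeline can be drawn the same way, by placing each entry of the list along a bar according to its position.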
-
@HadrienGardeur ah thanks for the pointer. Question then, I guess, would be internationalisation and all this stuff. If, say, bytes make it a lot easier to handle some languages because we have to, for instance, redefine what a word is – effectively turning the language into an exception/another case – then bytes would be the obvious choice. That and I don’t know if we’ll always have the luxury to preprocess files. I’ve been burnt by i18n once (ReadiumCSS) so as you can see, I prefer to be pretty cautious about that now. :-)
-
In that case, you'd either need to communicate the length of each resource in bytes in the manifest (requires some work on the backend) or send a HEAD request for every resource (not ideal).
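For reference, the HEAD-request option could be sketched like this. `HeadFn` and `contentLengths` are hypothetical names; the function takes a fetch-like callback so that it can be faked in tests, and it assumes the server actually returns a `Content-Length` header.

```typescript
// Minimal view of what is needed from a fetch-like function.
type HeadFn = (url: string, init: { method: "HEAD" }) => Promise<{
  ok: boolean;
  headers: { get(name: string): string | null };
}>;

// Resolves the Content-Length of each URL, or undefined when the request
// fails or the server does not report a usable length.
async function contentLengths(urls: string[], head: HeadFn): Promise<Map<string, number | undefined>> {
  const result = new Map<string, number | undefined>();
  await Promise.all(
    urls.map(async (url) => {
      const response = await head(url, { method: "HEAD" });
      const raw = response.ok ? response.headers.get("Content-Length") : null;
      const length = raw === null ? NaN : Number.parseInt(raw, 10);
      result.set(url, Number.isNaN(length) ? undefined : length);
    })
  );
  return result;
}
```

In a browser, the global `fetch` can be passed as `head`. This still costs one round trip per resource, and when the server compresses responses the reported length is likely to be the compressed size rather than the size of the resource itself.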
-
In both cases, the streamer is consequently adding those “hints” for I’m wondering whether that doesn’t answer
To some extent.
-
Regarding the bytes/characters mismatch, it's actually not much of a problem. The bytes in this case are not really a substitute for the characters, but more of a metric showing how much each resource weighs in the publication's overall reading time / size. So this is not a precise measure, but in practice I think it's a good-enough and efficient way to display a progress bar for the book and to determine page numbers. Also, it actually takes image sizes into account to some extent, contrary to what I said earlier. Because we are using the progression percentage (derived from the web view) to calculate the page number in a given resource, the page numbers within a chapter will account for its images. However, images don't weigh much in the chapter's bytes, so if a chapter has a lot of images and very little text, it will be seen as a small chapter in the overall publication.
Actually this is different: in this case, iBooks is paginating all the resources using the current appearance settings to figure out the exact total number of pages of a reflowable resource. It's not very efficient because you have to recalculate everything every time the user changes the font size, margins, etc. In our case, the positions won't represent "physical/screen" page numbers but will arbitrarily split the resources into virtual pages (by bytes, or by characters). What I was talking about is that, as an integrator, I would want the total number of pages to be available directly in
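A minimal sketch of splitting resources into virtual pages by bytes, independent of any appearance setting. The `Resource` and `Position` shapes and the `bytesPerPosition` parameter are assumptions made for illustration.

```typescript
// Hypothetical shapes used only for this sketch.
interface Resource {
  href: string;
  byteLength: number;
}

interface Position {
  href: string;
  position: number; // 1-based index across the whole publication
  progression: number; // 0..1 within the resource
  totalProgression: number; // 0..1 within the publication
}

function buildPositions(readingOrder: Resource[], bytesPerPosition: number): Position[] {
  const totalBytes = readingOrder.reduce((sum, r) => sum + r.byteLength, 0);
  const positions: Position[] = [];
  let bytesBefore = 0;
  for (const resource of readingOrder) {
    // Every resource gets at least one position, even if it is empty.
    const count = Math.max(1, Math.ceil(resource.byteLength / bytesPerPosition));
    for (let i = 0; i < count; i++) {
      const progression = i / count;
      positions.push({
        href: resource.href,
        position: positions.length + 1,
        progression,
        totalProgression:
          totalBytes > 0 ? (bytesBefore + progression * resource.byteLength) / totalBytes : 0,
      });
    }
    bytesBefore += resource.byteLength;
  }
  return positions;
}
```

Because the split only depends on resource sizes, the list can be computed once when the book is imported and never needs to be recalculated when the user changes the font size or margins.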
-
@mmenu-mantano Ah, thanks for the details and clarification. I somewhat felt there was a difference there, but it was too late as I had already posted. And yeah, I agree with the last paragraph entirely. :-)
-
+1 on that but I think there are two different things to consider:
-
My take: if the size of each resource is present in the publication object, we can write a helper function that computes the totalProgression from the size (in bytes) of each unencrypted resource + the local progression. In this case, we can consider storing totalProgression redundant. Another advantage of using the size in bytes of each resource rather than the number of positions in each resource is that it is more in sync with what we'll have to do for audiobooks (see Hadrien's comment above). Note that as each "position" is a block of 1024 bytes, using bytes or positions is only a matter of precision.
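Such a helper could look roughly like the sketch below. It assumes each reading order item exposes its unencrypted size under a hypothetical `byteLength` field, and the function name simply mirrors the proposed `totalProgression` value.

```typescript
// Hypothetical shape: a reading order item carrying its unencrypted size.
interface SizedResource {
  href: string;
  byteLength: number;
}

// totalProgression = (bytes of all previous resources
//                     + local progression * bytes of the current resource)
//                    / bytes of the whole reading order
function totalProgression(
  readingOrder: SizedResource[],
  href: string,
  progression: number
): number | undefined {
  let totalBytes = 0;
  let bytesBefore = 0;
  let current: SizedResource | undefined;
  for (const resource of readingOrder) {
    if (current === undefined) {
      if (resource.href === href) current = resource;
      else bytesBefore += resource.byteLength;
    }
    totalBytes += resource.byteLength;
  }
  // Unknown resource or empty publication: nothing meaningful to return.
  if (current === undefined || totalBytes === 0) return undefined;
  const clamped = Math.min(1, Math.max(0, progression));
  return (bytesBefore + clamped * current.byteLength) / totalBytes;
}
```

For example, with resources of 1000 and 3000 bytes, being halfway through the second one gives (1000 + 1500) / 4000 = 0.625.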
-
For PDF and CBZ the size in bytes wouldn't be used to calculate the I think it might be worth adding the
-
What happened to this idea? Seems like it was abandoned. Any reason for that?
-
Total progression as a % in a publication is one of the most common ways to indicate progression to a user.
Currently, the only way to calculate this information with a locator is indirect:
- the `position` of a locator, as long as you also know the total number of positions
- the locator's resource in the `readingOrder`, plus the total number of resources in the `readingOrder`
Given the universal nature of a % in a publication, its usefulness for sorting locators and its wide usage in reading apps, I would like to introduce `totalProgression` as a new `location` in the Locator model.
I'm including a few contributors that have their own opinion on this: