Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPDSParser.parse is always returning true #1099

Open
BPerlakiH opened this issue Jul 12, 2024 · 8 comments · May be fixed by #1100
Open

OPDSParser.parse is always returning true #1099

BPerlakiH opened this issue Jul 12, 2024 · 8 comments · May be fixed by #1100
Assignees
Labels
Milestone

Comments

@BPerlakiH
Copy link

BPerlakiH commented Jul 12, 2024

Updated:

On the Apple reader side, we would like to be rest assured, that when invalid data is passed into the OPDSParser.parse(data:) function, we can get a false value back,
so we can handle it gracefully. We already had a unit test on the reader side, but so far it was disabled.

Currently the libkiwix function never returns false:

libkiwix/src/manager.cpp

Lines 167 to 179 in ece4096

bool Manager::readOpds(const std::string& content, const std::string& urlHost)
{
pugi::xml_document doc;
pugi::xml_parse_result result
= doc.load_buffer((void*)content.data(), content.size());
if (result) {
this->parseOpdsDom(doc, urlHost);
return true;
}
return false;
}

Whereas on compiled libkiwix header files, I see this information, which is misleading:

  /**
   * Load a library content stored in a OPDS stream.
   *
   * @param content The content of the OPDS stream.
   * @param readOnly Set if the library path could be overwritten later with
   *                 updated content.
   * @param libraryPath The library path (used to resolve relative path)
   * @return True if the content has been properly parsed.
   */
  bool readOpds(const std::string& content, const std::string& urlHost);

So we either need to return False in case of invalid content, or if that's not possible, update the header file to reflect this special case.

@kelson42
Copy link
Collaborator

kelson42 commented Jul 15, 2024

@BPerlakiH Which exception is thrown? And what exactly is wrong in the OPDS stream to trigger this exception?

@veloman-yunkan
Copy link
Collaborator

I think that the bug description is not formulated correctly. I believe that it should rather state that Manager::readOpds() is not fool-proof against malformed input with an example of such input provided. The fix should then come with one or more unit-tests covering such scenarios.

@mgautierfr
Copy link
Member

From the screenshot in kiwix/kiwix-apple#851, it seems this is a allocation error (not enough memory ? to much memory allocated ?) inside icu73 (called by kiwix::removeAccents).
This may be us doing strange things when trying to remove accents somehow.

But we need to fix this issue first. This is not related to this method always returning true (even if we have to fix this anyway).

@mgautierfr
Copy link
Member

I have open the issue #1101 about the memory allocation error in kiwix::removeAccents.

I keep this issue open about the method always returning true.

@kelson42
Copy link
Collaborator

kelson42 commented Jul 19, 2024

@BPerlakiH Can you please open the corresponding issue lin this repo to clearly state what does not work?

@BPerlakiH
Copy link
Author

BPerlakiH commented Jul 20, 2024

Sorry, I also think that the description was misleading. I did update it, to reflect a more narrow scope: the actual return value of this specific libkiwix function, and how it is described in the compiled header file.

@kelson42
Copy link
Collaborator

kelson42 commented Jul 26, 2024

@veloman-yunkan @mgautierfr We need your feedback here. In a way or the other the libkiwix user would benefit to know if the parsing went well or not. The boolean returns seems appropriate for that and this is what the documentation implies. Documentation and testset around this function should probably be updated.

@kelson42 kelson42 changed the title Manger readOps is always returning true OPDSParser.parse is always returning true Jul 26, 2024
@kelson42 kelson42 assigned veloman-yunkan and unassigned BPerlakiH Jul 26, 2024
@veloman-yunkan
Copy link
Collaborator

Currently the libkiwix function never returns false:

libkiwix/src/manager.cpp

Lines 167 to 179 in ece4096

bool Manager::readOpds(const std::string& content, const std::string& urlHost)
{
pugi::xml_document doc;
pugi::xml_parse_result result
= doc.load_buffer((void*)content.data(), content.size());
if (result) {
this->parseOpdsDom(doc, urlHost);
return true;
}
return false;
}

The code proves the opposite - Manager::readOpds() returns false in case of malformed (syntactically incorrect) XML. Yet, it does return true if the XML content does not represent valid or reasonable OPDS data, but before fixing anything here we have to define how it should behave in those cases. For example, what should it return when

  1. the XML is a KML file rather than OPDS;
  2. the XML is an OPDS response from a new API endpoint added in a more recent version of kiwix-server;
  3. the XML is an OPDS response generated for example by a malicious or buggy kiwix-server that contains e.g. a string where a number is expected, or contains duplicate entries with the same book ID but different attributes (title, category, etc).
  4. when certain attributes in the OPDS are obviously wrong (e.g. a date string reading "2020-15-48T34:71:95") or unreasonable (a date pointing to 100 years in the future).
  5. and so on.

@kelson42 kelson42 modified the milestones: 14.0.0, 14.1.0 Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants