Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure choices/option parent is set correctly #750

Closed
wants to merge 1 commit into from

Conversation

ukanga
Copy link
Contributor

@ukanga ukanga commented Jan 27, 2025

Closes #749

Why is this the best possible solution? Were any other approaches considered?

Restores the previous behaviour where traversing choices always point to the parent they are assigned to.
The deep copy is at the last possible place when the choices are assigned to a question before the parent is set.

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

@lognaturel
Copy link
Contributor

Thanks! Can you please check the import order for the linter?

@lindsay-stevens
Copy link
Contributor

Sorry but I'm strongly against this change. Many performance and correctness issues have been due to dict-like behaviour in pyxform and this change propagates that assumption, making it more difficult to transition to classes. Unnecessarily performing copies or deepcopies of dicts, either actual dicts (e.g. the builder module) or dict-like objects (e.g. SurveyElements), is the kind of thing I've worked hard to systematically eliminate over the past few months. In relation to issue #749 this is about internals not exposed in xls2xform, and it doesn't affect XForm conversion correctness.

Anyway, navigating SurveyElements via parent/child for questions and choices will not be compatible usage soon. I identified this parent reference issue a few weeks ago and it is addressed in #746 (mainly commit 6927767) by using an Itemset class and changing the SurveyElement structures to remove the parent-child relationship for Questions and Choices. In other words, a choice list doesn't really "belong" to a Question in any sense that is meaningful to generating the XForm: a choice belongs to an Itemset and an Itemset belongs to Survey; and one or more Questions can reference each Itemset. In the context of the generated XForm, this corresponds to the change from choices being copied into the question and always being in the body with the select, to the current behaviour of model-level secondary instance elements for choices with references to those instances from selects (except for some weirdness to accommodate the deprecated search appearance). A similar duplication/unnecessary reference should soon be removed from xls2json add_choices_info_to_question which has been there for about 18 months for the backwards compatibility of external code, but slows down every XForm conversion and complicates the internal code.

External code that wants to use these pyxform internals should reference the choices on the Survey.choices dict using the MultipleChoiceQuestion.itemset name as the key, instead of assuming a copy of the choices exists in MultipleChoiceQuestion.children or .choices; and do the equivalent if using the internal dict structure emitted from workbook_to_json, or any SurveyElement.to_json_dict.

@ukanga
Copy link
Contributor Author

ukanga commented Jan 28, 2025

@lindsay-stevens Can we then remove the parent reference that currently exists with choices? The reference exists, hence the assumption that it is broken, but if the expected behaviour is that it should not be aware of the parent, then we should make it so.

@ukanga ukanga force-pushed the fix-option-parent branch from d433a7b to 6c4970b Compare January 28, 2025 09:54
@ukanga
Copy link
Contributor Author

ukanga commented Jan 28, 2025

@lindsay-stevens Are we not meant to use the to_json() and to_json_dict() functions?
Is there a plan to eliminate these functions at some point?

I use them so I do not have to reread the XLSForm file, specifically when making minor server-side modifications to the survey, such as setting a version.

@lindsay-stevens
Copy link
Contributor

Can we then remove the parent reference that currently exists with choices?

I have opened #751 for this. It makes sense but it's not a simple refactor, and should come after #746 is merged.

Are we not meant to use the to_json() and to_json_dict() functions?

No immediate plans to remove, but these serialisation functions seem to be only used in tests, the SurveyElement.__eq__ usage has no hits when running the test suite, the functions aren't accessible from xls2xform, and I'm not sure if the result dict object can be fed back in to pyxform for it to really qualify as a serialisation. So I think they could be candidates for removal. During the recent refactoring efforts I have spent a while making sure they still work according to the tests so from a maintenance perspective it may be better to remove. Reading the supported (rectangular) XLSForm formats is faster now and realistically that's how most users will serialise their XLSForm definitions.

@ukanga
Copy link
Contributor Author

ukanga commented Jan 28, 2025

@lindsay-stevens, how do you access the (rectangular) XLSForm formats? Assuming you have an Excel file representing the XLSForm, you plan not to reread it once you process it but would want to persist a representation of it and make some modifications.

I mainly use pyxform as a library, and such representation is quite helpful to the functionality we have built using this library. Losing such changes will result in a need for a tremendous rewrite of an 11-year-old codebase.

@lindsay-stevens
Copy link
Contributor

By rectangular XLSForm formats I'm referring to XLSX, XLS, CSV, and MD. Is this input data not retained by your app? Why cannot the modifications be done in-memory on the SurveyElement structures rather than their dict copies? What kinds of modifications are being made? Where and how is the intermediate representation persisted? How is the representation conversion resumed e.g. where in the pipeline is this modified data fed back in to pyxform?

I don't know about your use case but I hope you can appreciate that it will be nearly impossible to non-disruptively maintain pyxform against unknown requirements placed on its internal data structures and behaviour. I think the solution will involve discussion about what exactly your app is doing, what it's needs and assumptions are, and how that might fit into an explicitly supported use case for pyxform. In the meantime to_json_dict still works but if they are to be retained I think the above should be resolved.

@ukanga
Copy link
Contributor Author

ukanga commented Jan 28, 2025

The output of survey.to_json() is persisted in the DB. Later on, days/weeks/months, the user can make form title and form version changes without re-uploading an updated XLSForm (xlsx) file. This avoids the round trip of the user downloading the XLSForm they uploaded, making modifications to it, and then re-uploading.

Similar changes might be done when changing or adding an public key to make the survey use encryption.

@ukanga ukanga closed this Jan 28, 2025
@ukanga ukanga mentioned this pull request Jan 28, 2025
4 tasks
@lognaturel
Copy link
Contributor

My apologies, I took a quick look at this and didn't think through the implications and the broader work in progress!

As far as I can tell, there was never an effort to document or clarify what should be intended as a stable interface to pyxform beyond the XML output for clients. As @lindsay-stevens describes, that has lead to different systems making different assumptions around what should be considered public vs private. As we work to improve performance and make some aspects of the system easier to evolve, we're bound to break some integrators, and we're trying to signal that with major release bumps and discussion in PRs as they come in. Please join those conversations!

@ukanga did you close this because you've found a satisfactory solution for now? Given that you are updating specific, well-known fields, maybe you don't need pyxform to make those changes? Or maybe a public interface specifically to make those changes could be provided? I'm not seeing the connection between choices and the needs you've mentioned, though!

@ukanga
Copy link
Contributor Author

ukanga commented Jan 29, 2025

@lognaturel I found a different way to handle it, I think. I am still identifying and fixing tests on onadata to accommodate the changes introduced with release https://github.com/XLSForm/pyxform/releases/tag/v3.0.0. This has been manageable and was almost done.

However, with the proposed changes on #746, I don't think we can use pyxform as a library the way we have been using it. I hope there is a way to persist the new structure other than the XML and the source XLSForm file since reprocessing those is overkill for our use case.

For now, I will pin us to https://github.com/XLSForm/pyxform/releases/tag/v3.0.0 + 6524d17 which is not as chaotic for us for the time being. We will wait and see what
#746 does and then handle the changes when that is tagged and released.

Do consider the intermediate structure pyxform has as a useful part of the library, not just the final XForm XML.

@lognaturel
Copy link
Contributor

lognaturel commented Jan 29, 2025

This has been manageable and was almost done.

Thanks, that's great to hear! Do keep us posted on any other issues you may run into.

identifying and fixing tests on onadata

Should some of those tests be in pyxform instead? A big challenge we have in trying to improve pyxform is that everything is public and none of the expectations for integration are documented. That makes refactoring safely quite difficult. Maybe we could identify some desired integration points and make sure they're well-documented and tested.

For now, I will pin us to https://github.com/XLSForm/pyxform/releases/tag/v3.0.0 + 6524d17 which is not as chaotic for us for the time being

That sounds great. @lindsay-stevens are syncing up shortly and will discuss how to help accommodate your usage. Do you actually use OMK still? My understanding was that the app had been long deprecated and that select_one_from_map in ODK Collect replaces most desired usage.

@ukanga
Copy link
Contributor Author

ukanga commented Jan 29, 2025

That sounds great. @lindsay-stevens are syncing up shortly and will discuss how to help accommodate your usage. Do you actually use OMK still? My understanding was that the app had been long deprecated and that select_one_from_map in ODK Collect replaces most desired usage.

It's okay to remove OMK. I am not aware of any current usage. However, we do have plenty of data collected with OSM. None of it is active anymore, but I may need to check. There are no reservations about removing the functionality; the use case evolved, and the select_one_from_map is only one way of the different ways it evolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Choices have reference to the last parent in the event two or more questions use the same choices
3 participants