Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate unknown when an explicit flag is set, in the form unknown=...|PROPAGATE #1634

Closed

Conversation

sirosen
Copy link
Contributor

@sirosen sirosen commented Jul 17, 2020

This is based upon the work in #1490 , but with some significant modifications.
In retrospect, it might have been simpler to base it on #1429 , or just do the work from scratch, but I'm not sure.
closes #1428, #1429, #1490, and #1575

The biggest change from #1490 or #1429 I've made is that the behavior is never implied by certain usages -- it's always explicit. There's no distinction between unknown being passed at load time, to Nested, at schema instantiation, or via schema options. Instead, all of these contexts will support propagate_unknown=True, which turns on propagation.
This approach might not be all that elegant -- it somewhat bloats the public interface -- but it is guaranteed to be backwards compatible. Because the behavior is opt-in, nobody should have behavior change on them unexpectedly with this update.

Quick note: I wasn't sure if it is desirable to allow propagate_unknown in all of these contexts, especially Nested. I included it for now, but would be open to removing it.

propagate_unknown as written does not respect setting propagate_unknown=False in a nested schema. It's not something I considered particularly important (since the whole point is to "turn the feature on") but I can look at tweaking that if it's likely to be confusing.

One important bit of this is that I wanted propagate_unknown to understand the difference between unknown=RAISE being set because it's the default, and someone writing fields.Nested(MySchema(unknown=RAISE)), in which case they're asking for it explicitly. To handle this, __init__ checks before setting unknown. If no value was passed, in addition to setting unknown=RAISE, it will set a new value, auto_unknown=True.
During loading, whenever propagate_unknown=True and auto_unknown=False, the unknown value for the nested schema in question will be respected. Furthermore, that value will start to propagate. The same happens, though without use of auto_unknown, wherever fields.Nested.unknown is set.

Reading this comment on #1429, my understanding was that handling these kinds of scenarios was considered to be necessary. With the auto_unknown tracking and behavior, it's possible to support cases like this one:

class D(Schema):
    x = fields.Str()
class C(Schema):
    x = fields.Str()
class B(Schema):
    d = fields.Nested(D(unknown=EXCLUDE))
    c = fields.Nested(C)
class A(Schema):
    b1 = fields.Nested(B(unknown=INCLUDE))
    b2 = fields.Nested(B())

A(propagate_unknown=True, unknown=RAISE).load(...)

In the above, everything should "percolate down" in a relatively intuitive way. Importantly, that top-level value for unknown=RAISE doesn't override the use of unknown=INCLUDE and unknown=EXCLUDE in descendant schemas.

I've added a few simple tests, but I haven't aimed to be completely comprehensive because there are too many scenarios to think about. It's hard to tell what's important to test this way and what is not. If there's something I ought to be testing which is missing, just say that word and I'm happy to add more test cases.


This is my first time touching marshmallow, and I'm changing the interfaces for Schema, fields.Nested, and schema opts. So I would not be at all surprised if I've missed something significant -- please just let me know if so.

@sirosen
Copy link
Contributor Author

sirosen commented Aug 11, 2020

@sloria, could I push for your (or another marshmallow maintainer's) feedback on this approach? The implementation aside, I'm not sure if this is the way to solve this problem.

@sloria
Copy link
Member

sloria commented Aug 12, 2020

Apologies for the delay. Unfortunately I won't be able to give this the focused attention this needs in the next week or so--lots of work and non-work commitments coming up fast.

I thought we'd decided against this behavior in past proposals:

An instance method argument is not the only way to define schema meta options in bulk. Practical solutions were linked to in #1490. Maximizing convenience does not seem to be a strong reason to bake this functionality into the API since the work arounds are simple and the use case is obscure.

#1575 (comment)

I think this proposed change would make it too easy for developers to implicitly opt into undesired behavior. I don't think there is any precedent for method args behaving this much differently than their Meta counterparts, which would make it unexpected behavior for most developers.

#1428 (comment)

And potential solutions requiring no changes to marshmallow:

@deckar01
Copy link
Member

This does work around some of the issues with past proposals, but it illustrates how much API surface area has to be added to get this behavior. It takes significantly less code to use a factory pattern, and it covers more complex use cases.

def get_schema(unknown=None):
   class D(Schema):
       x = fields.Str()
   class C(Schema):
       x = fields.Str()
   class B(Schema):
       d = fields.Nested(D(unknown=unknown or EXCLUDE))
       c = fields.Nested(C)
   class A(Schema):
       b1 = fields.Nested(B(unknown=unknown or INCLUDE))
       b2 = fields.Nested(B())
  return A

A = get_schema(unknown=RAISE)
A().load(...)

@sirosen
Copy link
Contributor Author

sirosen commented Aug 12, 2020

The factory pattern or a custom base schema probably covers all usage. I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE or INCLUDE on a whole tree, they specify schema.load(unknown=...), and then they are surprised by the result not being what they wanted.

I thought we'd decided against this behavior in past proposals

Not having participated in the past threads, the fact that other PRs for this have been left open led me to believe the door was being intentionally left open to a better approach.

Even that aside, users are somewhat consistently surprised by the fact that unknown doesn't automatically propagate down. IMO, it merits a second look (at least, since #980) on that account alone.

it illustrates how much API surface area has to be added to get this behavior.

I take your comment to also mean "changing the API this much makes things worse / more confusing". Which is something with which I agree.

I can work to smooth out the public API by making the values for unknown into specialized objects.
Usage like unknown=EXCLUDE | PROPAGATE would be nice, from a user-perspective. The API doesn't change at all that way (if we consider repr(EXCLUDE) to be non-public, which I would). I didn't bother originally because it's more work to support, but I think it provides a better API.

sample of an or-able UnknownParam class
class UnknownParam:
    def __init__(self, value=None, propagate=False):
        self.value = value
        self.propagate = propagate

    def __or__(self, other):
        return UnknownParam(
            self.value or other.value, self.propagate or other.propagate
        )

    def __str__(self):
        if self.value:
            return self.value
        if self.propagate:
            return "propagate"
        return "null"

    def __repr__(self):
        return "UnknownParam(value={!r}, propagate={!r})".format(
            self.value, self.propagate
        )

    def is_good(self):
        return self.value is not None


EXCLUDE = UnknownParam("exclude")
INCLUDE = UnknownParam("include")
RAISE = UnknownParam("raise")
PROPAGATE = UnknownParam(propagate=True)

I'm happy to rework this existing PR to use that, if it would stand a chance at being accepted.

As far as I'm concerned, that's just sugar -- which I'm happy to work on, since the public API is important! -- but I'd just be doing the same thing internally. I don't want to do that work unless we agree that allowing users to opt-in to propagation like this is desirable in the first place.


If no functional change is going to be made, I think something explicit needs to be added somewhere in the docs. I would be inclined to phrase it positively, in terms of how to apply unknown=EXCLUDE or unknown=INCLUDE to a tree of schemas.
For example, I would be happy to add to the relevant quickstart section in a new PR.

suggested addition to quickstart docs
Setting `unknown` for all schemas in a tree
-----

Note that the `unknown` setting does not automatically propagate to nested schemas.
To set `unknown` for an entire schema tree, use a shared base class.

For example:

class ExcludingSchema(Schema):
    class Meta:
        unknown = EXCLUDE

class C(ExcludingSchema):
    x = fields.Str()
class B(ExcludingSchema):
    c = fields.Nested(C)
class A(ExcludingSchema):
    b = fields.Nested(B)

A().load({"b": {"c": {"x": "foo", "y": "bar"}, "d": "baz"}})
# => {"b": {"c": {"x": "foo"}}}

@deckar01
Copy link
Member

I take your comment to also mean "changing the API this much makes things worse / more confusing".

To clarify: I think the number of method arguments added is disproportionate to the benefit it provides. I generally promote providing all the data needed to satisfy special use cases rather than implementing them in the core. A leaner core is better for performance and reduces the complexity of maintaining and contributing to the library.

I think the 99% case for users wanting this is this simple: a user wants to set EXCLUDE or INCLUDE on a whole tree

I think this can be done without modifying the core. If you are open to solving this use case with a utility and think a convenient interface for this functionality would be useful to other users in the community, would you consider publishing it as a package? I think that would be a good addition to the "Fields and Custom Schemas" section of the Ecosystem page of the wiki.

@sirosen
Copy link
Contributor Author

sirosen commented Aug 20, 2020

It's possible to do this in a separate lib, but if it's not going into the core, then the solutions of using a custom base schema or writing a schema builder are much better.

I think I misunderstood the situation -- with the existing PRs and issues still open, I thought this was something where I could close out a series of issues. It seems like none of these are actually wanted. Can they all be closed, along with this?

If it's useful, I can open a new PR with docs as I suggested. That way, the unknown documentation will state, up-front, how to solve this use-case. And that should help anyone being surprised by the behavior.

@sirosen sirosen force-pushed the conditional-propagate-unknown branch from 105702c to a990005 Compare September 4, 2020 19:48
@sirosen sirosen changed the title Propagate unknown when an explicit flag, propagate_unknown, is set Propagate unknown when an explicit flag is set, in the form unknown=...|PROPAGATE Sep 4, 2020
mahenzon and others added 6 commits September 4, 2020 19:49
This adds a new option, `propagate_unknown` which defaults to False.

It controls the behavior of `unknown` with respect to nested
structures. Anywhere that `unknown` can be set, `propagate_unknown`
can be set. That means it can be applied to a schema instance, a load
call, schema.Meta, or to fields.Nested .

When set, nested deserialize calls will get the same value for
`unknown` which their parent call got and they will receive
`propagate_unknown=True`.

The new flag is completely opt-in and therefore backwards
compatible with any current usages of marshmallow.
Once you opt in to this behavior on a schema, you don't need to
worry about making sure it's set by nested schemas that you use.

In the name of simplicity, this sacrifices a bit of flexibility. A
schema with `propagate_unknown=True, unknown=...` will override the
`unknown` settings on any of its child schemas.

Tests cover usage as a schema instantiation arg and as a load arg
for some simple data structures.
propagate_unknown will still traverse any series of nested documents,
meaning that once you set propagate_unknown=True, it is true for the
whole schema structure.

However, this introduces tracking for whether or not `unknown` was set
explicitly. If `unknown=RAISE` is set because no value was specified,
we will set a new flag on the schema, `auto_unknown=True`.

propagate_unknown now has the following behavior:
- if the nested schema has auto_unknown=False, use the current value
  for `unknown` in the nested `load` call
- if a nested field has its `unknown` attribute set, use that in place
  of any value sent via `propagate_unknown`

Effectively, this means that if you set `unknown` explicitly anywhere
in a nested schema structure, it will propagate downwards from that
point.

Combined with the fact that propagate_unknown=True propagates
downwards across all schema barriers, including if
`propagate_unknown=False` is set explicitly somewhere, this could be
confusing. However, because the idea is for `propagate_unknown=True`
to eventually be the only supported behavior for marshmallow,
this is acceptable as a limitation.

auto_unknown is an attribute of schema opts and of schema instances,
with the same kind of inheritance behavior as other fields.
The changelog entry, including credit to prior related work, covers
the closed issues and describes how `propagate_unknown` is expected to
behave.

Inline documentation attempts to strike a balance between clarify and
brevity.

closes marshmallow-code#1428, marshmallow-code#1429, marshmallow-code#1490, marshmallow-code#1575
Rather than having a new parameter which allows `propagate_unknown`
behavior, convert this into an option which can be set on the
``unknown`` parameter itself.

Drawing from other interfaces which present flags as or-able
values (traditionally bitmasks), express the setting of this option as
`x | PROPAGATE` for any supported `x`.

This makes it harder to propagate the value of `propagate` separately
from the rest of the value of `unknown`. For simplicity, change the
behavior here to suit this implementation.
In order to simplify, the value of `unknown` will not be respected
in child schemas if `PROPAGATE` is used. This removes any need for the
`auto_unknown` tracking, so there's less complexity added to schemas
in order to implement this behavior.

Adjust tests and changelog to match.

Also make minor tweaks to ensure UnknownParam usage is consistent and
keep the diff with the dev branch smaller.
@sirosen sirosen force-pushed the conditional-propagate-unknown branch from a990005 to 499d608 Compare September 4, 2020 19:50
@sirosen
Copy link
Contributor Author

sirosen commented Sep 4, 2020

I don't want to nag about this issue, but I also feel that the first draft of this put it on poor footing by adding a lot more API surface and including an inessential feature in the auto_unknown behavior.

I've refactored it to add PROPAGATE with the bitwise-or based interface proposed above, and to remove the auto_unknown behavior entirely in order to simplify. I'm not sure if unknown="raise" is considered valid and worth supporting or not, so I included support for it.

I'm still not strongly attached to making the change, but I think this is the simplest/smallest version of it I can imagine which won't break any normal usages today.

@sirosen

This comment has been minimized.

@wanderrful
Copy link

Completely agreed, sounds good! +1 will greatly appreciate having Propagate as an option.

@caio-vinicius
Copy link

caio-vinicius commented Apr 4, 2022

For the people looking how to solve the issue, this helps me:

"user": fields.Nested(                  
    {"id": fields.Int(required=True),   
     "login": fields.Str(required=True),
     "url": fields.Str()                
    },                                  
    required=True,                      
    unknown=EXCLUDE),                   

Adding unknown=EXCLUDE inside the fields.Nested to make sure that the Nested field ignore other data coming from the request.

@sirosen
Copy link
Contributor Author

sirosen commented Jan 1, 2025

I didn't think this is going to be merged, and keeping it open is just noise. Not sure why I didn't close it before.

I think it would be best if the other PRs for propagation of unknown were also closed.

@sirosen sirosen closed this Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants