Skip to content

[Avro] Accept dict with only 'type': 'null' as representation of null #2109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Tishj
Copy link

@Tishj Tishj commented Jun 17, 2025

This PR makes it so that a different serialization of the null type is accepted, this variation is produced by avro-c (apache/avro's c lang implementation)

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense, but i cant find a corresponding mention in the avro spec

cc @Fokko to chime in here

@@ -171,11 +171,11 @@ def _resolve_union(
# This means that null has to come first:
# https://avro.apache.org/docs/current/spec.html
# type of the default value must match the first element of the union.
if "null" != avro_types[0]:
raise TypeError("Only null-unions are supported")
if avro_types[0] != "null" and avro_types[0] != {'type': 'null'}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a test for this in

def test_resolve_union() -> None:
with pytest.raises(TypeError) as exc_info:
AvroSchemaConversion()._resolve_union(["null", "string", "long"])
assert "Non-optional types aren't part of the Iceberg specification" in str(exc_info.value)

@Tishj
Copy link
Author

Tishj commented Jun 17, 2025

The spec seems to indicate that this is valid:
https://avro.apache.org/docs/1.11.1/specification/

Right under Schema Declaration

{"type": "typeName" ...attributes...}

where typeName is either a primitive or derived type name, as defined below

List of primitive types

Primitive Types
The set of primitive type names is:

null: no value
...

And then below the list:

Primitive type names are also defined type names. Thus, for example, the schema “string” is equivalent to:
{"type": "string"}

Which is why "null" is shorthand for {"type": "null"}

@kevinjqliu
Copy link
Contributor

ty that makes sense, thanks for the pointer

@kevinjqliu
Copy link
Contributor

Looks like theres a linter issue, could you run make lint locally?

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the syntax looks a bit weird, I agree that this is valid:

image

I agree with @kevinjqliu that a test would be a very good idea, since we're moving this part to rust somewhere in the not-so-distant future :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants