Skip to content

fix: Update push_data and user_data annotation with JsonSerializable instead of Any#1889

Open
Mantisus wants to merge 3 commits into
apify:masterfrom
Mantisus:up-json-serializavle-typing
Open

fix: Update push_data and user_data annotation with JsonSerializable instead of Any#1889
Mantisus wants to merge 3 commits into
apify:masterfrom
Mantisus:up-json-serializavle-typing

Conversation

@Mantisus
Copy link
Copy Markdown
Collaborator

Description

  • Improved annotation for arguments that accept JSON data by replacing implicit Any with explicit JsonSerializable type for push_data and user_data parameters.

Issues

@Mantisus Mantisus requested review from janbuchar and vdusek May 11, 2026 15:52
@Mantisus Mantisus self-assigned this May 11, 2026
Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please try using this type:

JsonSerializable = dict[str, 'JsonSerializable'] | list['JsonSerializable'] | str | int | float | bool | None
"""Recursive type for JSON-serializable values - primitives plus objects and arrays with JSON-serializable contents.

Based on the definition discussed in https://github.com/python/typing/issues/182.
"""

All major type checkers support recursive types now, so we can finally type this correctly. I recently made the same change in the API client as well - https://github.com/apify/apify-client-python/blob/master/src/apify_client/_types.py#L30C1-L34C4.

Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of a better JsonSerializable type should be that it lets us type its various usages more accurately and with less complexity. Here are a few examples:

Comment thread src/crawlee/_utils/file.py
Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py
Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py
@Mantisus Mantisus requested a review from vdusek May 13, 2026 21:49
Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return value of iterate_items is still AsyncIterator[dict[str, Any]] in the FileSystemDatasetClient, SqlDatasetClient, and RedisDatasetClient. Is that intended?

In Redis, there is also a relevant cast.

Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see data field in the DatasetItemDb:

data: Mapped[list[dict[str, Any]] | dict[str, Any]] = mapped_column(JsonField, nullable=False)

=>

data: Mapped[dict[str, JsonSerializable]] = mapped_column(JsonField, nullable=False)

Comment thread src/crawlee/_types.py
Comment on lines -30 to 32
# Workaround for https://github.com/pydantic/pydantic/issues/9445
J = TypeVar('J', bound='JsonSerializable')
JsonSerializable = list[J] | dict[str, J] | str | bool | int | float | None
JsonSerializable = dict[str, 'JsonSerializable'] | list['JsonSerializable'] | str | int | float | bool | None
else:
from pydantic import JsonValue as JsonSerializable
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering whether now we can't just use Pydantic's type for type checking as well, have you tried it? 🙂

yield {}

@staticmethod
def _is_list_of_items(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we can name it "is sequence"? instead of "is list" 🙂

self._id = id or crypto_random_object_id(length=10)
self._max_age = max_age
self._user_data = user_data or {}
self._user_data: dict[str, JsonSerializable] = dict(user_data) if user_data is not None else {}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no?

Suggested change
self._user_data: dict[str, JsonSerializable] = dict(user_data) if user_data is not None else {}
self._user_data: MutableMapping[str, JsonSerializable] = dict(user_data) if user_data is not None else {}

id: Annotated[str, Field(alias='id')]
max_age: Annotated[timedelta, Field(alias='maxAge')]
user_data: Annotated[dict, Field(alias='userData')]
user_data: Annotated[dict[str, JsonSerializable], Field(alias='userData')]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping / MutableMapping no?

attribute: str = 'href',
label: str | None = None,
user_data: dict | None = None,
user_data: dict[str, JsonSerializable] | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping / MutableMapping no?

attribute: str | None = None,
label: str | None = None,
user_data: dict[str, Any] | None = None,
user_data: dict[str, JsonSerializable] | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping / MutableMapping no?

attribute: str = 'href',
label: str | None = None,
user_data: dict[str, Any] | None = None,
user_data: dict[str, JsonSerializable] | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping / MutableMapping no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update push_data annotations to use JsonSerializable type

3 participants