
[BUG] Frame-level view aggregations exceed MongoDB pipeline size limit, causing OperationFailure #5453

Open
Emelian opened this issue Jan 31, 2025 · 2 comments
Labels
bug Bug fixes

Comments

@Emelian

Emelian commented Jan 31, 2025

Describe the problem

The error consistently appears with datasets that have extensive frame data.

When using filters that require frame data (exists, match_frames, and others), FiftyOne attempts to aggregate all frame documents from the dataset. If there is a large amount of frame data, MongoDB throws an error about exceeding its 104857600-byte (100 MB) limit. This makes any operation on such a view (e.g., len(view)) fail, and the issue cannot be resolved at the user-code level.

Traceback (most recent call last):
  File "/builds/analytics/networks/datasets/src/annotator.py", line 90, in main
    logger.info(f"Annotating {len(samples)} videos without {label_field} field")
                              ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fiftyone/core/view.py", line 102, in __len__
    return self.count()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fiftyone/core/collections.py", line 7748, in count
    return self._make_and_aggregate(make, field_or_expr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fiftyone/core/collections.py", line 10419, in _make_and_aggregate
    return self.aggregate(make(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fiftyone/core/collections.py", line 10112, in aggregate
    _results = foo.aggregate(self._dataset._sample_collection, pipelines)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fiftyone/core/odm/database.py", line 346, in aggregate
    result = collection.aggregate(pipelines[0], allowDiskUse=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/collection.py", line 2937, in aggregate
    return self._aggregate(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/_csot.py", line 120, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/collection.py", line 2845, in _aggregate
    return self._database.client._retryable_read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/mongo_client.py", line 1863, in _retryable_read
    return self._retry_internal(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/_csot.py", line 120, in csot_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/mongo_client.py", line 1830, in _retry_internal
    ).run()
      ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/mongo_client.py", line 2554, in run
    return self._read() if self._is_read else self._write()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/mongo_client.py", line 2697, in _read
    return self._func(self._session, self._server, conn, read_pref)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/aggregation.py", line 164, in get_cursor
    result = conn.command(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/helpers.py", line 45, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/pool.py", line 538, in command
    return command(
           ^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pymongo/synchronous/network.py", line 218, in command
    helpers_shared._check_command_response(
  File "/usr/local/lib/python3.12/dist-packages/pymongo/helpers_shared.py", line 247, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: PlanExecutor error during aggregation :: caused by :: Total size of documents in frames.samples.678b5d532a5565037d9de50e matching pipeline's $lookup stage exceeds 104857600 bytes, full error: {'ok': 0.0, 'errmsg': "PlanExecutor error during aggregation :: caused by :: Total size of documents in frames.samples.678b5d532a5565037d9de50e matching pipeline's $lookup stage exceeds 104857600 bytes", 'code': 4568, 'codeName': 'Location4568', '$clusterTime': {'clusterTime': Timestamp(1738175668, 25), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1738175668, 25)}

This is a simplified example; in practice, the filters may vary. The key factor is that the dataset has a large amount of frame data.
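As a rough sanity check, the dataset counts reported in the "Extra info" section below suggest how little per-frame budget the limit leaves. This is a back-of-envelope sketch only; exactly how MongoDB accounts the limit per $lookup invocation depends on the server version:

```python
# Back-of-envelope check using the dataset counts reported in "Extra info"
# (illustrative only; the limit applies to documents matched by a $lookup stage)
limit_bytes = 104_857_600  # the 100 MB limit from the error message
num_videos = 108
num_frames = 928_747

frames_per_video = num_frames / num_videos          # ~8,600 frames per video
budget_per_frame = limit_bytes / frames_per_video   # ~12 KB per frame document

print(round(frames_per_video), round(budget_per_frame))
```

With two label fields and nearly a million labels, frame documents can easily exceed a per-frame budget of that order once several samples' frames are looked up together.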

System Information

  • OS Platform and Distribution: Linux
  • Python version: 3.12
  • FiftyOne version: 1.3.0
  • FiftyOne installed from: pip

Extra info

Total count of videos in dataset: 108
Total count of frames in dataset: 928747
Total count of labels (in 2 label fields) in dataset: 950372

@Emelian Emelian added the bug Bug fixes label Jan 31, 2025
@Emelian Emelian changed the title [BUG] [BUG] Frame-level view aggregations exceed MongoDB pipeline size limit, causing OperationFailure Feb 1, 2025
@smartnet-club

smartnet-club commented Feb 1, 2025

I can confirm that this error is reproducible on video datasets with a sufficiently large number of detections.

The code:

samples = dataset.match_frames(F("yolov8x-world-test").is_null())
len(samples)

produces mongo request:

      "pipeline": [
        {
          "$lookup": {
            "from": "frames.samples.672b50348af9bde1a02499b0",
            "let": {"sample_id": "$_id"},
            "pipeline": [
              {
                "$match": {"$expr": {"$eq": ["$$sample_id", "$_sample_id"]}}
              },
              {"$sort": {"frame_number": 1}}
            ],
            "as": "frames"}
        },
        {
          "$addFields": {"frames": {"$filter": {"input": "$frames", "as": "this", "cond": {"$not": {"$gt": ["$$this.yolov8x-world-test"]}}}}}
        },
        {
          "$match": {"$expr": {"$gt": [{"$size": {"$ifNull": ["$frames"]}}]}}
        },
        {"$count": "count"}
      ]

The $lookup stage of the pipeline aggregates all frame documents from frames.samples.672b50348af9bde1a02499b0 into each sample document of samples.672b50348af9bde1a02499b0, which is what triggers the size limit.
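To make the pipeline's behavior concrete, here is a minimal in-memory sketch of what those stages do, using plain Python dicts in place of MongoDB documents (the field name "label" and the sample data are illustrative, not from the dataset above):

```python
# Plain-Python simulation of the aggregation pipeline's stages
samples = [{"_id": 1}, {"_id": 2}]
frames = [
    {"_sample_id": 1, "frame_number": 1, "label": None},
    {"_sample_id": 1, "frame_number": 2, "label": "car"},
    {"_sample_id": 2, "frame_number": 1, "label": "dog"},
]

# $lookup: embed every matching frame document into its parent sample,
# sorted by frame_number -- this is the step that can blow past the 100 MB limit
for sample in samples:
    sample["frames"] = sorted(
        (f for f in frames if f["_sample_id"] == sample["_id"]),
        key=lambda f: f["frame_number"],
    )

# $addFields + $filter: keep only frames where the label field is unset
for sample in samples:
    sample["frames"] = [f for f in sample["frames"] if f["label"] is None]

# $match + $count: count samples that have at least one matching frame
count = sum(1 for s in samples if len(s["frames"]) > 0)
print(count)
```

Note that the full frame documents are materialized into each sample before the filter runs, so the memory cost is paid even for frames that are ultimately discarded.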

@benjaminpkane
Contributor

Hi @smartnet-club. I am working on a proposal that aims to fix this. This week, hopefully.
