[Bug]: Video upload to wandb broken since 2.4.0 #2055

OliverUrbann · 2024-12-13T10:34:46Z

🐛 Bug

With stable_baselines3 2.3.2 on Python 3.11, the unit test below uploads videos to W&B successfully. With 2.4.0, it fails.

To Reproduce

import unittest
import time
import os
import gymnasium as gym
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv
import wandb
from wandb import Api
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

class TestWandbVideoUpload(unittest.TestCase):
    def test_video_upload(self):
        env_id = "CartPole-v1"
        video_folder = "videos"
        video_length = 100

        vec_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

        obs = vec_env.reset()

        run = wandb.init(
            project="test",
            sync_tensorboard=True,  # Automatically upload SB3's TensorBoard metrics
            monitor_gym=True,       # Automatically upload agent playing videos
            # save_code=True,       # Optional
        )

        # Record the video starting at the first step
        vec_env = VecVideoRecorder(
            vec_env,
            video_folder,
            record_video_trigger=lambda x: x == 0,
            video_length=video_length,
            name_prefix=f"agent-{env_id}"
        )

        vec_env.reset()

        model = PPO("MlpPolicy", vec_env, verbose=1, tensorboard_log=f"runs/{run.id}")
        model.learn(
            total_timesteps=5000,
            callback=WandbCallback(
                model_save_path=f"tmp/models/{run.id}",
                verbose=2,
            ),
        )
        run.finish()

        # Give some time for the upload (adjust depending on connection speed)
        time.sleep(30)

        # Use the wandb API to check the run
        api = Api()
        # If you're logged into a different W&B account or using an organization, adjust 'entity' accordingly
        run_path = f"{run.entity}/{run.project}/{run.id}"
        run_api = api.run(run_path)

        # Retrieve a list of all files in the run
        files = run_api.files()
        file_names = [f.name for f in files]

        # Check if a video file is present
        video_files = [name for name in file_names if name.endswith('.mp4')]

        self.assertTrue(len(video_files) > 0, "The video was not uploaded to wandb.")

        # Optional: Print the uploaded video files
        print("Uploaded video files:", video_files)

        # Clean up
        vec_env.close()
        wandb.finish()

if __name__ == '__main__':
    unittest.main()

Relevant log output / Error message

No response

System Info

OS: Linux-5.15.0-124-generic-x86_64-with-glibc2.35 # 134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024
Python: 3.11.0rc1
Stable-Baselines3: 2.4.0
PyTorch: 2.5.1+cu124
GPU Enabled: False
Numpy: 1.26.4
Cloudpickle: 3.1.0
Gymnasium: 0.29.1

Checklist

My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
I have checked that there is no similar issue in the repo
I have read the documentation
I have provided a minimal and working example to reproduce the bug
I've used the markdown code blocks for both code and stack traces.

araffin · 2024-12-13T10:57:02Z

Hello,
could you provide the error message too?

OliverUrbann · 2024-12-13T11:01:00Z

Sure, here is the log downloaded from wandb produced by the provided script:

Using cpu device
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
wandb: WARNING Found log directory outside of given root_logdir, dropping given root_logdir for event file in ../tmp/tests/runs/jnkaujln/PPO_1

MoviePy - Done !
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
Logging to ../tmp/tests/runs/jnkaujln/PPO_1
Saving video to /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
                                                                        

MoviePy - Done !
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
-----------------------------
| time/              |      |
|    fps             | 1479 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1418        |
|    iterations           | 2           |
|    time_elapsed         | 2           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.009116981 |
|    clip_fraction        | 0.111       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.686      |
|    explained_variance   | 0.00189     |
|    learning_rate        | 0.0003      |
|    loss                 | 8.96        |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0165     |
|    value_loss           | 51.3        |
-----------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1393         |
|    iterations           | 3            |
|    time_elapsed         | 4            |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 0.0094210915 |
|    clip_fraction        | 0.0634       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.667       |
|    explained_variance   | 0.0815       |
|    learning_rate        | 0.0003       |
|    loss                 | 13.8         |
|    n_updates            | 20           |
|    policy_gradient_loss | -0.0178      |
|    value_loss           | 33.9         |
------------------------------------------

So I don't actually see a relevant error. The log message saying the video is ready is correct: there is a working video file, but it is not uploaded.

araffin · 2024-12-13T11:15:53Z

Do you see any difference in the filenames/logs compared to SB3<2.4.0 ?

EDIT: what is your wandb version? The latest should be wandb==0.19.1.

OliverUrbann · 2024-12-13T12:07:36Z

It is the latest.

Package                      Version
---------------------------- --------------
absl-py                      2.1.0
annotated-types              0.7.0
anyio                        4.7.0
argon2-cffi                  23.1.0
argon2-cffi-bindings         21.2.0
arrow                        1.3.0
asttokens                    3.0.0
astunparse                   1.6.3
async-lru                    2.0.4
attrs                        24.2.0
babel                        2.16.0
beautifulsoup4               4.12.3
bleach                       6.2.0
blinker                      1.4
cachetools                   5.5.0
certifi                      2024.8.30
cffi                         1.17.1
charset-normalizer           3.4.0
click                        8.1.7
cloudpickle                  3.1.0
coloredlogs                  15.0.1
comm                         0.2.2
contourpy                    1.3.1
cryptography                 3.4.8
cycler                       0.12.1
dbus-python                  1.2.18
debugpy                      1.8.10
decorator                    4.4.2
defusedxml                   0.7.1
distro                       1.7.0
distro-info                  1.1+ubuntu0.2
dm-tree                      0.1.8
docker-pycreds               0.4.0
executing                    2.1.0
Farama-Notifications         0.0.4
fastjsonschema               2.21.1
filelock                     3.16.1
flatbuffers                  24.3.25
fonttools                    4.55.3
fqdn                         1.5.1
fsspec                       2024.10.0
gast                         0.6.0
gitdb                        4.0.11
GitPython                    3.1.43
google-auth                  2.36.0
google-auth-oauthlib         1.2.1
google-pasta                 0.2.0
grpcio                       1.68.1
gymnasium                    0.29.1
h11                          0.14.0
h5py                         3.12.1
httpcore                     1.0.7
httplib2                     0.20.2
httpx                        0.28.1
humanfriendly                10.0
idna                         3.10
imageio                      2.36.1
imageio-ffmpeg               0.5.1
importlib-metadata           4.6.4
iniconfig                    2.0.0
ipykernel                    6.29.5
ipython                      8.30.0
ipywidgets                   8.1.5
isoduration                  20.11.0
jedi                         0.19.2
jeepney                      0.7.1
Jinja2                       3.1.4
json5                        0.10.0
jsonpointer                  3.0.0
jsonschema                   4.23.0
jsonschema-specifications    2024.10.1
jupyter                      1.1.1
jupyter_client               8.6.3
jupyter-console              6.6.3
jupyter_core                 5.7.2
jupyter-events               0.10.0
jupyter-lsp                  2.2.5
jupyter_server               2.14.2
jupyter_server_terminals     0.5.3
jupyterlab                   4.3.3
jupyterlab_pygments          0.3.0
jupyterlab_server            2.27.3
jupyterlab_widgets           3.0.13
keras                        2.15.0
keyring                      23.5.0
kiwisolver                   1.4.7
launchpadlib                 1.10.16
lazr.restfulclient           0.14.4
lazr.uri                     1.0.6
libclang                     18.1.1
Markdown                     3.7
MarkupSafe                   3.0.2
matplotlib                   3.9.3
matplotlib-inline            0.1.7
mistune                      3.0.2
ml-dtypes                    0.2.0
more-itertools               8.10.0
moviepy                      2.1.1
mpmath                       1.3.0
nbclient                     0.10.1
nbconvert                    7.16.4
nbformat                     5.10.4
nest-asyncio                 1.6.0
networkx                     3.4.2
notebook                     7.3.1
notebook_shim                0.2.4
numpy                        1.26.4
nvidia-cublas-cu12           12.4.5.8
nvidia-cuda-cupti-cu12       12.4.127
nvidia-cuda-nvrtc-cu12       12.4.127
nvidia-cuda-runtime-cu12     12.4.127
nvidia-cudnn-cu12            9.1.0.70
nvidia-cufft-cu12            11.2.1.3
nvidia-curand-cu12           10.3.5.147
nvidia-cusolver-cu12         11.6.1.9
nvidia-cusparse-cu12         12.3.1.170
nvidia-nccl-cu12             2.21.5
nvidia-nvjitlink-cu12        12.4.127
nvidia-nvtx-cu12             12.4.127
oauthlib                     3.2.0
onnx                         1.15.0
onnx-tf                      1.10.0
onnxruntime                  1.17.1
opt_einsum                   3.4.0
overrides                    7.7.0
packaging                    24.2
pandas                       2.2.3
pandocfilters                1.5.1
parso                        0.8.4
pexpect                      4.9.0
pillow                       10.4.0
pip                          22.0.2
platformdirs                 4.3.6
pluggy                       1.5.0
proglog                      0.1.10
prometheus_client            0.21.1
prompt_toolkit               3.0.48
protobuf                     4.25.5
psutil                       6.1.0
ptyprocess                   0.7.0
pure_eval                    0.2.3
pyasn1                       0.6.1
pyasn1_modules               0.4.1
pycparser                    2.22
pydantic                     2.10.3
pydantic_core                2.27.1
pygame                       2.6.1
Pygments                     2.18.0
PyGObject                    3.42.1
PyJWT                        2.3.0
pyparsing                    2.4.7
pytest                       8.3.4
python-apt                   2.4.0+ubuntu4
python-dateutil              2.9.0.post0
python-dotenv                1.0.1
python-json-logger           2.0.7
pytz                         2024.2
PyVirtualDisplay             3.0
PyYAML                       6.0.2
pyzbar                       0.1.9
pyzmq                        26.2.0
referencing                  0.35.1
requests                     2.32.3
requests-oauthlib            2.0.0
rfc3339-validator            0.1.4
rfc3986-validator            0.1.1
rpds-py                      0.22.3
rsa                          4.9
scipy                        1.14.1
SecretStorage                3.3.1
Send2Trash                   1.8.3
sentry-sdk                   2.19.2
setproctitle                 1.3.4
setuptools                   59.6.0
six                          1.16.0
smmap                        5.0.1
sniffio                      1.3.1
soupsieve                    2.6
stable_baselines3            2.3.2
stack-data                   0.6.3
sympy                        1.13.1
tensorboard                  2.15.2
tensorboard-data-server      0.7.2
tensorflow                   2.15.0
tensorflow-addons            0.23.0
tensorflow-estimator         2.15.0
tensorflow-io-gcs-filesystem 0.37.1
tensorflow-probability       0.23.0
termcolor                    2.5.0
terminado                    0.18.1
tinycss2                     1.4.0
torch                        2.5.1
tornado                      6.4.2
tqdm                         4.67.1
traitlets                    5.14.3
triton                       3.1.0
typeguard                    2.13.3
types-python-dateutil        2.9.0.20241206
typing_extensions            4.12.2
tzdata                       2024.2
unattended-upgrades          0.1
uri-template                 1.3.0
urllib3                      2.2.3
wadllib                      1.3.6
wandb                        0.19.1
wcwidth                      0.2.13
webcolors                    24.11.1
webencodings                 0.5.1
websocket-client             1.8.0
Werkzeug                     3.1.3
wheel                        0.37.1
widgetsnbextension           4.0.13
wrapt                        1.14.1
zipp                         1.0.0

And here is the output of a successful run:

Using cpu device
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
wandb: WARNING Found log directory outside of given root_logdir, dropping given root_logdir for event file in ../tmp/tests/runs/khcb9wj0/PPO_1

MoviePy - Done !
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
Logging to ../tmp/tests/runs/khcb9wj0/PPO_1
Saving video to /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
                                                                        

MoviePy - Done !
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
-----------------------------
| time/              |      |
|    fps             | 1414 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 1299         |
|    iterations           | 2            |
|    time_elapsed         | 3            |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0090919435 |
|    clip_fraction        | 0.119        |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.685       |
|    explained_variance   | 0.0122       |
|    learning_rate        | 0.0003       |
|    loss                 | 7.24         |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.0188      |
|    value_loss           | 49.9         |
------------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1270        |
|    iterations           | 3           |
|    time_elapsed         | 4           |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.009359399 |
|    clip_fraction        | 0.0527      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.666      |
|    explained_variance   | 0.112       |
|    learning_rate        | 0.0003      |
|    loss                 | 13.9        |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.0165     |
|    value_loss           | 33.7        |
-----------------------------------------

curtiscjohnson · 2024-12-18T21:28:29Z

I'm also experiencing this issue after a recent upgrade to v2.4.0.

araffin · 2024-12-18T21:56:40Z

Might be related to #2061.

Help is welcome to solve the issue =)

araffin · 2024-12-20T14:38:02Z

@OliverUrbann could you try with #2063?
It might solve your issue.

OliverUrbann · 2024-12-20T16:23:12Z

Thx! However, it still fails. Just to double check:

pip install git+https://github.com/DLR-RM/stable-baselines3.git@fix/video-record
...
pip list | grep stable 
stable-baselines3            2.5.0a1

And here is the test output:

wandb: Currently logged in as:.... Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.1
wandb: Run data is saved locally in ...
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run swift-bush-15
wandb: ⭐️ View project at ...
wandb: 🚀 View run at ...
error: XDG_RUNTIME_DIR not set in the environment.
Using cpu device
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4

MoviePy - Done !                                            
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
wandb: WARNING Found log directory outside of given root_logdir, dropping given root_logdir for event file in ../tmp/tests/runs/3fwsvh8f/PPO_1
Logging to ../tmp/tests/runs/3fwsvh8f/PPO_1
Saving video to /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
MoviePy - Building video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4.
MoviePy - Writing video /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4

MoviePy - Done !                                                        
MoviePy - video ready /home/devil/tmp/tests/videos/agent-CartPole-v1-step-0-to-step-100.mp4
-----------------------------
| time/              |      |
|    fps             | 1470 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1417        |
|    iterations           | 2           |
|    time_elapsed         | 2           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.008298077 |
|    clip_fraction        | 0.0771      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.687      |
|    explained_variance   | 0.000942    |
|    learning_rate        | 0.0003      |
|    loss                 | 7.24        |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0115     |
|    value_loss           | 47.4        |
-----------------------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 1403        |
|    iterations           | 3           |
|    time_elapsed         | 4           |
|    total_timesteps      | 6144        |
| train/                  |             |
|    approx_kl            | 0.010035685 |
|    clip_fraction        | 0.0683      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.668      |
|    explained_variance   | 0.0891      |
|    learning_rate        | 0.0003      |
|    loss                 | 14.5        |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.0176     |
|    value_loss           | 35.1        |
-----------------------------------------
wandb: updating run config
wandb:                                                                                
wandb: 
wandb: Run history:
wandb:                global_step ▁▅▅▅▅▅▅▅▅▅▅██████████
wandb:                   time/fps █▂▁
wandb:            train/approx_kl ▁█
wandb:        train/clip_fraction █▁
wandb:           train/clip_range ▁▁
wandb:         train/entropy_loss ▁█
wandb:   train/explained_variance ▁█
wandb:        train/learning_rate ▁▁
wandb:                 train/loss ▁█
wandb: train/policy_gradient_loss █▁
wandb:           train/value_loss █▁
wandb: 
wandb: Run summary:
wandb:                global_step 6144
wandb:                   time/fps 1403
wandb:            train/approx_kl 0.01004
wandb:        train/clip_fraction 0.06831
wandb:           train/clip_range 0.2
wandb:         train/entropy_loss -0.66834
wandb:   train/explained_variance 0.08907
wandb:        train/learning_rate 0.0003
wandb:                 train/loss 14.49088
wandb: train/policy_gradient_loss -0.01757
wandb:           train/value_loss 35.06071
wandb: 
wandb: 🚀 View run swift-bush-15 at: ...
wandb: ⭐️ View project at: ...
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ...
True
FAIL

======================================================================
FAIL: test_video_upload (test_video.TestWandbVideoUpload.test_video_upload)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/devil/MoToFlex/tests/test_video.py", line 65, in test_video_upload
    self.assertTrue(len(video_files) > 0, "The video was not uploaded to wandb.")
AssertionError: False is not true : The video was not uploaded to wandb.

----------------------------------------------------------------------
Ran 1 test in 45.509s

FAILED (failures=1)
Finished running tests!

Also checked 2.3.2 again, and it still works.

araffin · 2024-12-20T16:48:45Z

thanks for trying =)
I've dug deeper into the issue and I think I found the root cause.

The problem comes from the W&B client: https://github.com/wandb/wandb/blob/8dd25cab52da3603022e75322c847de4def21b1c/wandb/integration/gym/__init__.py#L68

With Gymnasium v1.0, the previous recorder was removed (see wandb/wandb#7047 and #1837). To stay compatible with both Gymnasium v0.29.1 and v1.0, SB3 no longer uses the gym recorder class, which the W&B client monkey-patched to upload videos.
Long story short, the W&B client/callback has to be updated.
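
To illustrate why the upload silently stops, here is a minimal sketch of the monkey-patching mechanism described above. All names (`LegacyRecorder`, `NewRecorder`, `patch_close`, `uploaded`) are hypothetical stand-ins for illustration, not the actual W&B or SB3 classes:

```python
class LegacyRecorder:
    """Hypothetical stand-in for the old gym video recorder class."""

    def __init__(self, path):
        self.path = path

    def close(self):
        pass  # the real recorder would finalize the video file here


class NewRecorder:
    """Hypothetical stand-in for a recorder that bypasses the old class."""

    def __init__(self, path):
        self.path = path

    def close(self):
        pass


def patch_close(recorder_cls, on_close):
    """Wrap recorder_cls.close so that on_close(path) runs after each close."""
    orig_close = recorder_cls.close

    def close(self):
        orig_close(self)
        on_close(self.path)

    recorder_cls.close = close


uploaded = []  # stands in for "videos sent to the tracking service"
patch_close(LegacyRecorder, uploaded.append)

LegacyRecorder("videos/old.mp4").close()  # hook fires
NewRecorder("videos/new.mp4").close()     # hook never fires, no error either

print(uploaded)  # ['videos/old.mp4']
```

Only the patched class triggers the hook, so a recorder that no longer goes through it writes a valid file but nothing is uploaded and no error is raised, which matches the logs above.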

EDIT: in the meantime you can manually call wandb.log(): https://github.com/wandb/wandb/blob/8dd25cab52da3603022e75322c847de4def21b1c/wandb/integration/gym/__init__.py#L80
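
A minimal sketch of that workaround, assuming the videos are written as `.mp4` files into the recorder's video folder. `wandb.log` and `wandb.Video` are existing W&B calls; the helper name `log_recorded_videos`, its parameters, and the `"videos"` key are assumptions made for this example:

```python
import glob
import os
from typing import Callable, List, Optional, Set


def log_recorded_videos(
    video_folder: str,
    already_logged: Set[str],
    log_fn: Optional[Callable[[dict], None]] = None,
    make_video: Optional[Callable[[str], object]] = None,
    key: str = "videos",  # assumed key name, pick whatever suits your dashboard
) -> List[str]:
    """Log every not-yet-logged .mp4 in video_folder and return the new paths."""
    if log_fn is None or make_video is None:
        # Imported lazily so the helper can be tested without an active W&B run.
        import wandb

        log_fn = log_fn or wandb.log
        make_video = make_video or wandb.Video

    new_paths = []
    for path in sorted(glob.glob(os.path.join(video_folder, "*.mp4"))):
        if path in already_logged:
            continue
        log_fn({key: make_video(path)})
        already_logged.add(path)
        new_paths.append(path)
    return new_paths
```

In the test above, this would be called as `log_recorded_videos(video_folder, set())` after `model.learn(...)` and before `run.finish()`, since logging needs an active run.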

OliverUrbann added the "bug" label on Dec 13, 2024.

araffin added the "more information needed" label on Dec 13, 2024.

curtiscjohnson mentioned this issue on Dec 18, 2024 in "[Bug]: VecVideoRecorder overwrites previous video at each save" (#2061, closed).
