Avoid unnecessary writes when modifying non-media fields#6208
Avoid unnecessary writes when modifying non-media fields#6208weiqianwang123 wants to merge 1 commit intobeetbox:masterfrom
Conversation
This patch prevents Item.write() from performing a mediafile.save() when no writable media tag fields have changed. Without this change, commands such as 'beet modify -a onplayer=true' trigger unnecessary file writes, updating mtimes and causing significant slowdowns. Fix verified with a tiny mp3 test case. Signed-off-by: Qianwei Wang <qweiw@umich.edu> Signed-off-by: weiqianwang123 <1416740298@qq.com>
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Loading
original = Item.from_path(self.path)unconditionally intry_syncadds extra I/O even whenwriteis False; consider only constructingoriginalwhenwriteis True and short‑circuiting earlier if not. - Using
Item.from_path(self.path)to detect changes may throw if the file is missing or unreadable, whereastry_writepreviously handled its own errors; consider wrapping this in similar error handling or falling back to writing when comparison fails. - Instead of reconstructing the original item from disk for
_media_tags_changed, consider comparing against the pre‑modify database state (e.g., viaself._db.get_item(self.id)) to avoid extra file reads and keep the change detection purely metadata‑based.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Loading `original = Item.from_path(self.path)` unconditionally in `try_sync` adds extra I/O even when `write` is False; consider only constructing `original` when `write` is True and short‑circuiting earlier if not.
- Using `Item.from_path(self.path)` to detect changes may throw if the file is missing or unreadable, whereas `try_write` previously handled its own errors; consider wrapping this in similar error handling or falling back to writing when comparison fails.
- Instead of reconstructing the original item from disk for `_media_tags_changed`, consider comparing against the pre‑modify database state (e.g., via `self._db.get_item(self.id)`) to avoid extra file reads and keep the change detection purely metadata‑based.
## Individual Comments
### Comment 1
<location> `beets/library/models.py:1016` </location>
<code_context>
(conditionally).
"""
- if write:
+ original = Item.from_path(self.path)
+
+ # only write tags if media tags changed
</code_context>
<issue_to_address>
**issue:** Handle failures when loading `original` via `Item.from_path` to avoid breaking `try_sync` on IO/tagging errors.
Because `try_sync` now calls `Item.from_path(self.path)`, it can fail due to transient or unrelated IO/tagging issues (missing file, permissions, corrupt metadata) and abort the entire sync. Consider wrapping `Item.from_path` in a try/except and, on failure, either skip the change-detection optimization (always write) or log and continue, so `try_sync` degrades gracefully instead of raising unexpectedly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| (conditionally). | ||
| """ | ||
| if write: | ||
| original = Item.from_path(self.path) |
There was a problem hiding this comment.
issue: Handle failures when loading original via Item.from_path to avoid breaking try_sync on IO/tagging errors.
Because try_sync now calls Item.from_path(self.path), it can fail due to transient or unrelated IO/tagging issues (missing file, permissions, corrupt metadata) and abort the entire sync. Consider wrapping Item.from_path in a try/except and, on failure, either skip the change-detection optimization (always write) or log and continue, so try_sync degrades gracefully instead of raising unexpectedly.
There was a problem hiding this comment.
Pull request overview
This PR aims to optimize the modify command by avoiding unnecessary file writes when only non-media fields (custom/flexible database fields) are changed. The goal is to reduce I/O operations, prevent unnecessary mtime updates, and improve performance significantly (from 25 seconds to <1 second for 200 batch modifications).
Key changes:
- Added
_media_tags_changed()method to compare media tag fields between two Item instances - Modified
try_sync()to conditionally skip file writes when only database-only fields are modified - Introduced logic to detect whether media tag fields have actually changed before writing to disk
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| (conditionally). | ||
| """ | ||
| if write: | ||
| original = Item.from_path(self.path) |
There was a problem hiding this comment.
Item.from_path() can raise a ReadError exception if the file is unreadable, missing, or corrupted. This exception is not caught, which would cause try_sync() to fail unexpectedly.
This is particularly problematic because:
- The method is called
try_sync(), implying it should handle errors gracefully - File corruption or deletion could occur between modification and sync
- The original code path didn't read from disk at this point
Consider wrapping this in a try-except block, or using a different approach that doesn't require re-reading the file.
| original = Item.from_path(self.path) | |
| try: | |
| original = Item.from_path(self.path) | |
| except ReadError as exc: | |
| log.error("{}", exc) | |
| return False |
| original = Item.from_path(self.path) | ||
|
|
||
| # only write tags if media tags changed | ||
| if write and self._media_tags_changed(original, self): | ||
| self.try_write() |
There was a problem hiding this comment.
The new optimization logic in try_sync() lacks test coverage. Consider adding tests to verify:
- Skipping writes when only flexible/custom fields are modified (e.g.,
onplayer=true) - Performing writes when media tag fields are modified (e.g.,
title="new title") - Handling edge cases like missing or corrupted files when checking for changes
- Performance improvement is achieved (optional: benchmark test)
The existing WriteTest class in test/test_library.py would be a good place to add these tests.
| return False | ||
|
|
||
| def _media_tags_changed(self, original, modified): | ||
| for f in self._media_tag_fields: # this is defined in the class already |
There was a problem hiding this comment.
The _media_tags_changed method lacks documentation. Consider adding a docstring to explain:
- What the method does
- The parameters
originalandmodified(expected to be Item instances) - Return value (boolean indicating if any media tag fields differ)
Additionally, the inline comment "this is defined in the class already" is unnecessary and can be removed.
| for f in self._media_tag_fields: # this is defined in the class already | |
| """ | |
| Determine if any media tag fields differ between two Item instances. | |
| Parameters: | |
| original (Item): The original Item instance. | |
| modified (Item): The modified Item instance. | |
| Returns: | |
| bool: True if any media tag fields differ, False otherwise. | |
| """ | |
| for f in self._media_tag_fields: |
| (conditionally). | ||
| """ | ||
| if write: | ||
| original = Item.from_path(self.path) |
There was a problem hiding this comment.
Calling Item.from_path() here defeats the performance optimization. from_path() creates a new Item and calls read(), which opens and parses the entire media file using MediaFile(syspath(read_path)). This is the expensive I/O operation the PR aims to avoid.
Instead, consider tracking which fields have been modified since the last read/write. For example:
- Track dirty fields in a set when fields are modified via
__setitem__ - Check if any dirty fields intersect with
_media_tag_fields - Only write if media tag fields have been modified
Alternatively, compare self against the in-memory state before any modifications were made, rather than re-reading from disk.
|
Hi! Thanks for the PR. It looks like the current approach is causing side effects that break the current test suite, possibly due to an indentation error on this line. A test for a change like this would also be good, especially since it's dealing with core code, to make sure we're not making the performance issues worse. |
Summary
Some beets commands such as:
beet modify -a onplayer=true queen
update only custom fields that are not mapped to media tags.
However, the current implementation still performs MediaFile.save(), causing:
unnecessary file writes
updated mtimes
20–30 second delays on large libraries
slow iPod sync / alternative device workflows
Root Cause
Item.write() filters _media_fields, but it always writes even if:
none of the media-tag fields changed
only flexible/custom database fields changed
Fix
Before writing, compute whether any media-tag-backed field changed.
If no media tag differences exist, skip the write entirely.
This preserves existing behavior for:
imports
replaygain
zero / scrub
metadata updates
Testing
A 0.5s sine-wave mp3 was generated via ffmpeg:
ffmpeg -f lavfi -i sine=frequency=440:duration=0.5 tiny.mp3
beet import tiny.mp3
Then:
Should NOT write to file
beet modify -a onplayer=true tiny
Should write to file
beet modify title="new title" tiny
Screenshots attached.
Performance improvement
Before: 25 seconds
After: <1 second
(repeated 200× batch modify)
DCO
All commits signed-off.
Related Issue: #5668