fix(file): Truncate filenames with excessively long 'extensions' #12025

Open
DEVELOPER-DEEVEN wants to merge 5 commits into Significant-Gravitas:dev from DEVELOPER-DEEVEN:fix/filename-truncation

@DEVELOPER-DEEVEN

Fixes an issue where filenames with no dots until the end (or with massive extensions) bypassed the truncation logic, causing OSError [Errno 36]. Extension preservation is now limited to 20 characters.

@DEVELOPER-DEEVEN DEVELOPER-DEEVEN requested a review from a team as a code owner February 9, 2026 18:03
@DEVELOPER-DEEVEN DEVELOPER-DEEVEN requested review from Bentlybro and Pwuts and removed request for a team February 9, 2026 18:03
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot added the platform/backend AutoGPT Platform - Back end label Feb 9, 2026
@github-actions github-actions bot changed the base branch from master to dev February 9, 2026 18:03
@github-actions github-actions bot added the size/m label Feb 9, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

Walkthrough

The filename sanitization now preserves an extension only if its length is 20 characters or fewer; extensions longer than 20 characters are not preserved and the filename is truncated to MAX_FILENAME_LENGTH. A docstring wording tweak was made in store_media_file.

Changes

Cohort / File(s) | Summary
Filename Sanitization Logic (autogpt_platform/backend/backend/util/file.py) | When truncating long filenames, preserve the extension only if its length ≤ 20 characters; if >20, truncate the entire filename to MAX_FILENAME_LENGTH. Minor docstring wording tweak in store_media_file regarding sending content to external APIs.
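
The behaviour summarized above can be sketched as follows. This is a minimal illustration, not the code from the PR: the body of sanitize_filename is not shown in this thread, so the helper name truncate_filename is hypothetical, MAX_EXTENSION_LENGTH is the constant name proposed later in this review, and MAX_FILENAME_LENGTH = 200 is taken from the diff quoted in the review.

```python
# Illustrative sketch only; the real logic lives in backend/util/file.py.
MAX_FILENAME_LENGTH = 200   # value shown in the review's suggested diff
MAX_EXTENSION_LENGTH = 20   # threshold described in the PR (name is assumed)


def truncate_filename(sanitized: str) -> str:
    """Truncate an already-sanitized name, keeping only short extensions."""
    if len(sanitized) <= MAX_FILENAME_LENGTH:
        return sanitized

    # Split on the last dot; `dot` is "" when the name contains no dot at all.
    stem, dot, ext = sanitized.rpartition(".")

    if dot and len(ext) <= MAX_EXTENSION_LENGTH:
        # Short, plausible extension: keep it and shorten the stem instead.
        keep = MAX_FILENAME_LENGTH - len(ext) - 1  # 1 for the dot
        return stem[:keep] + "." + ext

    # No dot, or a pseudo-extension that is too long: cut the whole name.
    return sanitized[:MAX_FILENAME_LENGTH]
```

With these assumptions, a 300-character stem followed by ".txt" comes back as 200 characters still ending in ".txt", while a name such as "x." followed by 300 characters is simply cut to its first 200 characters.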

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A filename hopped with dots and flair,
I count twenty tails and trim with care,
If the tail’s too long, I snip it straight,
Short and tidy — neat as fate! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately and specifically describes the main change: fixing filename truncation to handle filenames with excessively long extensions by limiting extension preservation.
Description check | ✅ Passed | The description is clearly related to the changeset, explaining the bug being fixed (filenames bypassing truncation logic) and the solution (limiting extension preservation to 20 chars).
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

No actionable comments were generated in the recent review. 🎉

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5f1361 and 1850438.

📒 Files selected for processing (1)
  • autogpt_platform/backend/backend/util/file.py
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

  • autogpt_platform/backend/backend/util/file.py
autogpt_platform/backend/**/*.{py,txt}

📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

  • autogpt_platform/backend/backend/util/file.py
autogpt_platform/backend/backend/**/*.py

📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

  • autogpt_platform/backend/backend/util/file.py
autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/util/file.py
🧠 Learnings (1)
📚 Learning: 2026-02-04T16:50:20.508Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-02-04T16:50:20.508Z
Learning: Applies to autogpt_platform/backend/backend/blocks/*.py : When working with files in blocks, use `store_media_file()` from `backend.util.file` with appropriate `return_format` parameter: `for_local_processing` for local tools, `for_external_api` for external APIs, `for_block_output` for block outputs

Applied to files:

  • autogpt_platform/backend/backend/util/file.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Seer Code Review
  • GitHub Check: types
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Check PR Status
🔇 Additional comments (2)
autogpt_platform/backend/backend/util/file.py (2)

73-84: Clean fix for the long-extension bypass.

The logic correctly handles the edge case where a trailing pseudo-extension exceeds a reasonable length. The 20-character threshold is sensible given that real file extensions rarely exceed ~10 characters.

One minor suggestion: consider extracting 20 into a named constant (e.g., MAX_EXTENSION_LENGTH) next to MAX_FILENAME_LENGTH for discoverability and self-documentation.

Optional: extract magic number
 # Maximum filename length (conservative limit for most filesystems)
 MAX_FILENAME_LENGTH = 200
+# Maximum extension length to preserve during truncation
+MAX_EXTENSION_LENGTH = 20
-            if len(ext) <= 20:
+            if len(ext) <= MAX_EXTENSION_LENGTH:

136-136: LGTM — minor docstring clarification.

@CLAassistant

CLAassistant commented Feb 9, 2026

CLA assistant check
All committers have signed the CLA.

@github-project-automation github-project-automation bot moved this from 🆕 Needs initial review to 👍🏼 Mergeable in AutoGPT development kanban Feb 13, 2026
@@ -71,11 +71,15 @@ def sanitize_filename(filename: str) -> str:

# Truncate if too long
if len(sanitized) > MAX_FILENAME_LENGTH:

Bug: The sanitize_filename() function checks filename length in characters, not bytes. This can cause an OSError for filenames with multi-byte characters that exceed filesystem byte limits.
Severity: HIGH

Suggested Fix

Modify the sanitize_filename() function to check the byte length of the filename after encoding it to UTF-8. The truncation logic should ensure that the final, encoded filename does not exceed the filesystem's byte limit (e.g., 255 bytes).
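
One way the suggested byte-based check could look is sketched below. It is an assumption-laden illustration rather than a proposed patch: the helper name truncate_utf8 and the constant MAX_FILENAME_BYTES are hypothetical, and 255 bytes is the example limit mentioned above, not a value read from the target filesystem.

```python
# Hypothetical helper; not part of backend/util/file.py.
MAX_FILENAME_BYTES = 255  # assumed per-component byte limit (example value)


def truncate_utf8(name: str, max_bytes: int = MAX_FILENAME_BYTES) -> str:
    """Shorten `name` so its UTF-8 encoding fits within `max_bytes`."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name

    # Slicing bytes may cut a multi-byte character in half;
    # errors="ignore" drops the incomplete trailing sequence on decode.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

This sketch does not preserve the extension; combining it with the extension rule from this PR would mean budgeting bytes for the encoded extension before shortening the stem.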

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: autogpt_platform/backend/backend/util/file.py#L73

Potential issue: The `sanitize_filename()` function at line 73 validates filename length
by character count using `len()`, but most filesystems enforce a byte-limit (e.g., 255
bytes). A filename containing multi-byte UTF-8 characters (like emojis or CJK
characters) can pass the character-based check (e.g., < 200 characters) but still exceed
the filesystem's byte limit. This will cause an `OSError: [Errno 36] File name too long`
when the application attempts to write the file to disk in functions like
`store_media_file()`, leading to a crash of the operation.

Labels

platform/backend AutoGPT Platform - Back end size/m

Projects

Status: 👍🏼 Mergeable

3 participants