-
Notifications
You must be signed in to change notification settings - Fork 10
feat: add KNOWLEDGE_ALLOW_PDF environment variable #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 1.x
Are you sure you want to change the base?
Conversation
WalkthroughAdds a runtime-configurable PDF support flag, wiring it through config, routes, docs loading, and service initialization. Updates backend upload filters and URL import to reject PDFs when disabled. Expands the frontend Knowledge tab with a full toast system, enhanced search and URL import UI, and improved PDF viewing. Documentation/version bumped. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant UI as Frontend UI
participant R as Routes (upload)
participant C as Config (isPdfAllowed)
participant S as KnowledgeService
U->>UI: Upload file / URL
UI->>R: POST /knowledge/upload (file or URL)
R->>C: isPdfAllowed(runtime)
alt PDF disabled
R-->>UI: 403 PDF_DISABLED (JSON/text)
UI->>UI: Show toast "PDF uploads disabled"
else PDF allowed
R->>S: Process upload
S-->>R: Success
R-->>UI: 200 OK
UI->>UI: Show success toast
end
sequenceDiagram
autonumber
participant KS as KnowledgeService
participant DL as Docs Loader
participant C as Config (isPdfAllowed)
KS->>DL: loadDocsFromPath(..., runtime)
DL->>C: isPdfAllowed(runtime)
alt PDF disabled
DL->>DL: Skip .pdf files, log warning
else PDF allowed
DL->>DL: Load all supported files
end
DL-->>KS: Loaded documents (filtered)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@BugBot review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/service.ts (1)
323-341: Add service‑level PDF guard (defense‑in‑depth).Routes/docs‑loader enforce the flag, but direct service calls can bypass it. Guard here to uphold the security invariant.
-import { validateModelConfig } from './config'; +import { validateModelConfig, isPdfAllowed } from './config';let documentContentToStore: string; const isPdfFile = contentType === 'application/pdf' || originalFilename.toLowerCase().endsWith('.pdf'); if (isPdfFile) { + // Enforce PDF policy at the service layer too (defense-in-depth) + if (!isPdfAllowed(this.runtime)) { + const err: any = new Error('PDF uploads are disabled by policy'); + err.code = 'PDF_DISABLED'; + err.status = 403; + throw err; + } // For PDFs: extract text for fragments but store original base64 in main document try { fileBuffer = Buffer.from(content, 'base64');If you’d prefer the check earlier, you can also add a fast‑fail in addKnowledge before processDocument.
🧹 Nitpick comments (9)
package.json (1)
81-232: Expose KNOWLEDGE_ALLOW_PDF in agentConfig.pluginParameters.Add this boolean so ops/UIs can discover/configure it (default true).
"KNOWLEDGE_PATH": { "type": "string", "description": "Filesystem path where the knowledge loader searches for documents. Overrides the default ./docs directory.", "required": false, "default": "./docs", "sensitive": false }, + "KNOWLEDGE_ALLOW_PDF": { + "type": "boolean", + "description": "Allow PDF uploads and processing. Set to false to block PDFs (403).", + "required": false, + "default": true, + "sensitive": false + }, "LOAD_DOCS_ON_STARTUP": { "type": "boolean", "description": "Controls whether the plugin should automatically load documents from the docs folder when the agent starts. Any value other than the string 'false' enables loading.", "required": false, "default": true, "sensitive": false }src/frontend/ui/knowledge-tab.tsx (6)
91-147: Improve toast accessibility (screen readers).Add ARIA roles/live regions so toasts announce properly.
- return ( - <div className="fixed top-4 right-4 z-[9999] flex flex-col gap-2 w-full max-w-md pointer-events-none"> + return ( + <div + className="fixed top-4 right-4 z-[9999] flex flex-col gap-2 w-full max-w-md pointer-events-none" + role="region" + aria-live="polite" + aria-label="Notifications" + > {toastState.toasts.map((toast) => ( - <div + <div key={toast.id} className={cn( 'p-4 rounded-lg shadow-2xl border pointer-events-auto', 'animate-in slide-in-from-top-5 fade-in duration-300', 'bg-card text-card-foreground border-border', toast.variant === 'destructive' && 'bg-destructive text-destructive-foreground border-destructive' )} + role={toast.variant === 'destructive' ? 'alert' : 'status'} >
818-839: Add 403/PDF-disabled fallback for non-JSON error bodies (URL import).If server returns 403 without JSON, show the PDF-disabled message.
- if (!result.ok) { + if (!result.ok) { // Try to parse error response as JSON let errorMessage = `Upload failed: ${result.statusText}`; try { const contentType = result.headers.get('content-type'); if (contentType && contentType.includes('application/json')) { const errorData = await result.json(); if (errorData.error?.message) { errorMessage = errorData.error.message; } else if (errorData.error?.code === 'PDF_DISABLED') { errorMessage = 'PDF uploads are currently disabled. Please contact your administrator to enable PDF support.'; } } else { const errorText = await result.text(); if (errorText) { errorMessage = errorText; } } } catch (e) { // If parsing fails, keep the default error message console.error('Error parsing URL upload error response:', e); } + if (result.status === 403 && !/pdf uploads are currently disabled/i.test(errorMessage)) { + errorMessage = + 'PDF uploads are currently disabled. Please contact your administrator to enable PDF support.'; + } throw new Error(errorMessage); }
891-914: Mirror 403/PDF-disabled fallback for file uploads.Same handling as URL path for consistency.
- if (!response.ok) { + if (!response.ok) { // Try to parse error response as JSON let errorMessage = `Upload failed: ${response.statusText}`; try { const contentType = response.headers.get('content-type'); if (contentType && contentType.includes('application/json')) { const errorData = await response.json(); if (errorData.error?.message) { errorMessage = errorData.error.message; } else if (errorData.error?.code === 'PDF_DISABLED') { errorMessage = 'PDF uploads are currently disabled. Please contact your administrator to enable PDF support.'; } } else { const errorText = await response.text(); if (errorText) { errorMessage = errorText; } } } catch (e) { // If parsing fails, keep the default error message console.error('Error parsing upload error response:', e); } + if (response.status === 403 && !/pdf uploads are currently disabled/i.test(errorMessage)) { + errorMessage = + 'PDF uploads are currently disabled. Please contact your administrator to enable PDF support.'; + } throw new Error(errorMessage); }
1536-1543: Hide .pdf from accept list when PDFs are disabled (UX).Read a flag (e.g., window.ELIZA_CONFIG.knowledgeAllowPdf) and omit .pdf from the accept string to avoid misleading users. Backward-compatible if the flag isn’t present.
1768-1791: Sandbox the PDF iframe for defense‑in‑depth.Add sandbox/referrer policy to isolate the embedded viewer.
- <iframe + <iframe src={pdfDataUrl} className="w-full border-0 shadow-md" style={{ height: '90vh', maxWidth: '1200px', backgroundColor: 'var(--background)', }} + sandbox="" + referrerPolicy="no-referrer" title="PDF Document" onError={() => { console.error('Failed to load PDF in iframe'); }} />
224-338: Minor: normalize extension handling in getCorrectMimeType.You already lowercase ext; drop the uppercase 'R' entry and consider consolidating with a Set for readability.
README.md (1)
68-69: Clarify runtime behavior when PDFs are disabled.Note that blocked attempts return HTTP 403 with JSON code PDF_DISABLED to aid operators and UI handling.
Also applies to: 142-146
src/routes.ts (1)
303-311: Consider checking content type before fetching URL content.The current implementation fetches the content from the URL (line 276) before checking if it's a PDF (line 304). This means the system downloads the PDF content before rejecting it.
While this works correctly, it could be optimized by:
- Sending a HEAD request first to check the content type
- Only fetching the content if the type is allowed
However, this optimization adds complexity and may not be necessary if most URLs are valid. The current approach is acceptable.
If you'd like to optimize this, here's a suggested approach:
logger.debug(`[Document Processor] 🌐 Fetching content from URL: ${fileUrl}`); + // Optional: Check content type with HEAD request first + if (!isPdfAllowed(runtime)) { + try { + const headResponse = await fetch(fileUrl, { method: 'HEAD' }); + const contentType = headResponse.headers.get('content-type'); + if (contentType === 'application/pdf') { + logger.warn( + `[Document Processor] ❌ PDF upload rejected (PDF support disabled): ${fileUrl}` + ); + throw new Error( + 'PDF file uploads are disabled. Contact administrator to enable PDF support.' + ); + } + } catch (headError) { + // If HEAD fails, continue with normal fetch + logger.debug(`[Document Processor] HEAD request failed, proceeding with fetch`); + } + } + // Fetch the content from the URL const { content, contentType: fetchedContentType } = await fetchUrlContent(fileUrl);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (8)
README.md(2 hunks)package.json(1 hunks)src/config.ts(1 hunks)src/docs-loader.ts(4 hunks)src/frontend/ui/knowledge-tab.tsx(5 hunks)src/index.ts(1 hunks)src/routes.ts(5 hunks)src/service.ts(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (13)
package.json (1)
4-4: Version bump looks good.src/index.ts (1)
6-6: Types‑only import cleanup is correct.src/service.ts (1)
67-73: Good: runtime passed to loadDocsFromPath.Enables runtime‑aware PDF gating during startup loads.
src/config.ts (2)
5-10: LGTM! Backward-compatible default parameter.The updated signature with an optional
defaultValueparameter is well-designed. The fallback logic correctly returns thedefaultValuewhen the input isundefinedornull, maintaining the function's robustness.
12-27: LGTM! Runtime-aware PDF control with safe defaults.The
isPdfAllowedfunction properly checks runtime settings first, then falls back to environment variables, with a sensible default oftrue(enabled). This maintains backward compatibility—existing deployments continue allowing PDFs unless explicitly disabled.src/docs-loader.ts (4)
7-7: LGTM! Import added for PDF control.The import of
isPdfAllowedis correctly placed and aligns with the new PDF support feature.
37-43: LGTM! Runtime parameter properly added.The function signature now accepts an optional
runtimeparameter, enabling runtime-aware PDF filtering. This change aligns with the service-level integration mentioned in the AI summary.
53-57: LGTM! Clear warning when PDF support is disabled.The check for PDF support and corresponding warning message properly inform operators when PDFs will be skipped during document loading.
91-95: LGTM! PDF files correctly skipped when disabled.The conditional skip logic is correct—when a PDF file is encountered and PDF support is disabled, it's silently skipped with appropriate debug logging. The use of
continueensures the loop proceeds to the next file.src/routes.ts (4)
8-8: LGTM! Import added for PDF control.The import of
isPdfAllowedis correctly placed and enables runtime PDF filtering in the routes.
16-34: LGTM! Dynamic MIME type filtering based on PDF support.The implementation correctly:
- Initializes
allowedMimeTypeswith all supported types- Checks PDF support via
isPdfAllowed(runtime)- Filters out
'application/pdf'when disabled- Logs an appropriate warning
This ensures PDFs are blocked at the Multer level when support is disabled.
42-51: LGTM! User-friendly error message for blocked PDFs.The conditional error message properly distinguishes between:
- PDF uploads when PDF support is disabled (clear, actionable message)
- Other unsupported file types (generic message with allowed types)
The error messaging is user-friendly and avoids exposing technical details.
1239-1252: LGTM! Error differentiation works and the PDF-disabled message is consistent across the codebase.
Consider extracting the literal into a constant and adding a unit test to guard theerr.message.includes('PDF file uploads are disabled')logic against future changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addresses security concerns around PDF file format while maintaining backward
compatibility. When KNOWLEDGE_ALLOW_PDF=false is set, all PDF uploads are
blocked with user-friendly error messages.
Note
Adds KNOWLEDGE_ALLOW_PDF to toggle PDF support (default enabled), blocks PDFs across loaders/uploads when disabled, improves frontend toasts and error handling, and updates docs.
KNOWLEDGE_ALLOW_PDFenv with helperisPdfAllowed(default: true); refineparseBooleanEnv.loadDocsFromPath; service updated to use it.PDF_DISABLED(403)..pdffiles when PDF support is disabled.KNOWLEDGE_ALLOW_PDF.1.5.12.Written by Cursor Bugbot for commit ffc0fc1. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
New Features
Documentation
Chores