-
Notifications
You must be signed in to change notification settings - Fork 68
Add In-App PDF Preview for KA Citations #72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add react-pdf for PDF rendering in a side drawer - Create backend proxy (/api/files/volumes) for authenticated PDF fetching - Parse UC PDF links and extract page/text fragment from URL hash - Show filename, page number, and cited text in tooltip - Include download and "Open in Catalog" buttons in preview header 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Replace @react-pdf-viewer with react-pdf (more actively maintained) - Add manual text highlighting for cited text fragments - Only highlight on the initial page from citation - Change backend endpoint to POST /api/files/databricks-file - Use official Databricks Files API path (/api/2.0/fs/files/) - Fetch PDF via POST with path in body, return blob URL - Add proper error handling for streams to prevent server crashes - Add CSS styles for highlighted text spans 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Create tests/routes/files.test.ts with 15 test cases covering: - Request validation (missing/invalid path) - File type handling (PDF, JSON, TXT, images, unknown) - Databricks API integration (404, 403, 500 errors) - Streaming and header forwarding - Path normalization and edge cases - Add Databricks Files API mock handlers to api-mock-handlers.ts - All tests pass (15 passed, 1 skipped) - Skip unauthenticated test due to test environment auth defaults
- Add collapsible section in PDF preview sheet header to show original cited text - Display cited text in a styled box with proper whitespace preservation - Add toggle functionality with chevron icons (defaults to expanded) - Only shows when highlightText prop is provided - Helps users understand what specific content was referenced in citations
- Uninstalled @react-pdf-viewer/highlight package (no longer needed) - Removed pdf-highlight.css file and import - Added vite-plugin-static-copy to bundle PDF.js worker file - Updated Vite config to copy worker from node_modules to /assets/ - Changed worker source from CDN to bundled path for offline support The PDF.js worker is now bundled with the application instead of being loaded from unpkg CDN, improving reliability and removing external dependencies. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Remove unused highlighting code that was accidentally left in the PDFViewer component. Highlighting will be added in a separate PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Ensure reliable worker file path in vite-plugin-static-copy by adding pdfjs-dist as an explicit dependency instead of relying on it being hoisted from react-pdf's node_modules. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add right padding to header buttons to prevent overlap with the sheet's close button - Fix "Open in Catalog" link to use full workspace URL instead of relative path 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Show the page number in the cited text section header to help users understand which page the citation is from. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
| <button | ||
| type="button" | ||
| onClick={handleClick} | ||
| className="cursor-pointer rounded-md bg-muted-foreground px-2 py-0 font-medium text-zinc-200 underline" | ||
| > | ||
| {children} | ||
| </button> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Howcome we are not using the button component that we use elsewhere?
import { Button } from '@/components/ui/button';
| const [error, setError] = useState<PDFError | null>(null); | ||
| const [retryKey, setRetryKey] = useState(0); | ||
| const [blobUrl, setBlobUrl] = useState<string | null>(null); | ||
| const [isLoading, setIsLoading] = useState(false); | ||
| const [showCitedText, setShowCitedText] = useState(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth considering using a reducer instead of this many states.
Seems like the first four states form an implicit state machine:
- Idle → Loading → Success (with blobUrl) | Error
- Error → Retry → Loading
Several places update multiple states together:
- loadPdf start: setIsLoading(true), setError(null)
- loadPdf success: setBlobUrl(url), (implicitly setIsLoading(false))
- loadPdf error: setError(...), setIsLoading(false)
- handleRetry: setError(null), setBlobUrl(null), setRetryKey(prev + 1)
A reducer would make these transitions explicit and prevent impossible states (e.g., having both error and blobUrl set simultaneously).
| return ( | ||
| <Sheet open={open} onOpenChange={handleOpenChange}> | ||
| <SheetContent | ||
| className="flex w-[70vw] max-w-4xl flex-col gap-0 p-0 sm:max-w-4xl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont we have the same class name twice?
| const message = err instanceof Error ? err.message.toLowerCase() : ''; | ||
| if (message.includes('404')) { | ||
| setError({ type: 'NotFoundError' }); | ||
| } else if (message.includes('403')) { | ||
| setError({ type: 'PermissionError' }); | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of string matching on errors. It would be better if our fetchDatabricksFile threw typed errors :/
| ); | ||
|
|
||
| // Fetch the PDF when the sheet opens | ||
| useEffect(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit confusing with the two separate useEffects that manage blob cleanup. First one revokes currentBlobUrl, second one also revokes it when the sheet closes. Might make sense to consider consolidation even if a double-revoke isn't a big problem
| for (const entry of entries) { | ||
| setContainerWidth(entry.contentRect.width); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would we actually ever have multiple entries?
| } else if (hash.startsWith(':~:text=')) { | ||
| // Handle case where hash only contains text fragment without page | ||
| textFragment = decodeURIComponent(hash.substring(':~:text='.length)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only triggers when hash is exactly :~:text= (empty text fragment) because the regex .+ requires at least one character, right? would this ever occur in practice? should we throw an error something instead?
| const filePath = `${match[1]}/${match[2]}`; | ||
|
|
||
| // Keep the original URL for reference (e.g., for "Open in Catalog" link) | ||
| const originalUrl = new URL(href, window.location.origin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this the exact same code as line 24?
| if (!response.ok) { | ||
| const errorData = await response.json().catch(() => ({})); | ||
| const message = errorData.error || response.statusText; | ||
| throw new Error(`${response.status}: ${message}`); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could throw typed errors here
| const { path: filePath } = req.body as { path?: string }; | ||
|
|
||
| if (!filePath || typeof filePath !== 'string') { | ||
| return res | ||
| .status(400) | ||
| .json({ error: 'File path is required in request body' }); | ||
| } | ||
|
|
||
| // Get Databricks host - prefer cached CLI host, fall back to env | ||
| let hostUrl = getCachedCliHost(); | ||
| if (!hostUrl) { | ||
| hostUrl = getHostUrl(); | ||
| } | ||
|
|
||
| // Construct the Databricks Files API URL | ||
| // Path should start with /Volumes/ for Unity Catalog volumes | ||
| // See: https://docs.databricks.com/api/workspace/files/download | ||
| const normalizedPath = filePath.startsWith('/') | ||
| ? filePath | ||
| : `/${filePath}`; | ||
| const databricksUrl = `${hostUrl}/api/2.0/fs/files${normalizedPath}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the comment says the path should start with volumes, wouldn't it make sense to validate it?
Something like
if (!filePath.match(/^\/?(Volumes\/[^.][^/]*\/[^/]+\/[^/]+\/.+)$/)) {
return res.status(400).json({ error: 'Invalid file path format' });
}
TLDR
Adds first-class support for KA citations with PDF files. The highlighted text and page number is now shown on hover, and the PDF can be previewed in app by clicking on it
chat-template-ka-citation-support.mov
Summary
authentication
Features
Technical Details
Files Changed
Test Plan