New approach to multimodal document ingestion (#2558)
* Prepare change for multimodal, rm old vision approach stuff
* Add LLM-based media describer
* Prepdocs progress
* Fix media description with OpenAI
* More prepdocs improvements for image handling
* Store bbox as list of pixel floats, add storage container just for extracted images
* Getting image citations almost working
* More progress on multimodal approach
* Update more tests
* Fix up more app tests
* Add test for upload_document_image
* Add media describer and embeddings tests
* Fix tests for vision, work on vectorizer
* Add font, rename multimodal doc
* Update links to multimodal
* Fix import
* Doc fixes
* Fix f-string syntax
* Markdown lint issues
* mypy fixes and reasoning fixes
* Rename vision variables, fix mypy
* Mypy fixes
* Fix all mypy issues
* Fixes to sidebar so that it all fits
* Fixes to sidebar so that it all fits
* Integrated vectorization and user upload work
* Progress on user upload support
* changes needed for user upload
* Update tests
* Integrated vectorization progress
* Fix tests
* Use ImageEmbeddings client directly
* Change frontend for vector fields
* Use boolean parameters in the backend as well, for vector fields
* Updated translations
* Change frontend for LLM inputs
* Change from LLM inputs to booleans
* Working on tests
* Blob manager improvements/tests
* Change to a global client that we close in lifespan
* Add latest int vect changes
* Update the tests
* Add as_bytes option
* Mypy fixes
* Mypy fixes
* More mypy fixes
* More mypy fixes
* Address more TODOs
* Fix E2E tests
* Add more tests for blobmanager
* Markdown fix, more coverage
* Fix broken MD link
* Increase coverage
* Increase test coverage
* Add diff-cover step to python test
* Fix diff-cover action
* Fetch origin main for diff-cover
* Increase test coverage
* More tests, Windows check
* Better copilot instructions
* Updated merge
* Revert integrated vectorization changes, using a different strategy
* Remove unused identity in int vect
* Refactor get_sources_content to return DataPoints
* Fix multimodal bicep variable error
* Fix prepdocs to properly close async clients
* better CSS for image URLs and images in Thought Process and Supporting Content
* Revert logging level to WARNING as before
* Update text splitter chunking logic and add full test coverage
* Use single token char
* Fix mypy error
* Add some helper functions and modules to improve code clarity for textsplitter
* Update splitting algorithm with better overlap algorithm, rename SplitPage to Chunk
* markdown issues
* Revise multimodal doc to be clearer
* Rephrase fragment shift to be more grokkable
* Reword duplicate part of textsplitter doc
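Several of the commits above rework the text splitter's overlap logic and rename `SplitPage` to `Chunk`. As a rough illustration only (this is not the repo's actual algorithm; the `Chunk` fields and the `max_chars`/`overlap` parameters are illustrative), overlap-based chunking can be sketched as:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page_num: int
    text: str


def split_text(text: str, page_num: int = 0, max_chars: int = 1000, overlap: int = 100) -> list[Chunk]:
    """Split text into chunks of at most max_chars characters, where each
    chunk begins `overlap` characters before the previous chunk ended."""
    if len(text) <= max_chars:
        return [Chunk(page_num, text)]
    chunks = []
    step = max_chars - overlap  # advance by less than max_chars to create overlap
    for start in range(0, len(text), step):
        chunks.append(Chunk(page_num, text[start:start + max_chars]))
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence cut at a chunk boundary is likely to appear whole in the next chunk, which helps retrieval at the cost of some duplicated tokens.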
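One commit above switches to "a global client that we close in lifespan". The general pattern, sketched here with a stand-in client class (the real app presumably uses async SDK clients such as Azure or OpenAI ones; names are illustrative), is to create the client once at startup and close it exactly once at shutdown rather than per request:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAsyncClient:
    """Stand-in for an async SDK client (illustrative only)."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


CLIENT = None  # module-level "global" client shared by request handlers


@asynccontextmanager
async def lifespan():
    global CLIENT
    # Create the client once at application startup...
    CLIENT = FakeAsyncClient()
    try:
        yield CLIENT
    finally:
        # ...and close it exactly once at shutdown, avoiding leaked sessions.
        await CLIENT.close()


async def main():
    async with lifespan() as client:
        assert CLIENT is client and not client.closed
    assert CLIENT.closed


asyncio.run(main())
```

ASGI frameworks expose hooks for exactly this (e.g. startup/shutdown or lifespan events), so the context manager body maps onto those callbacks.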
`.github/chatmodes/fixer.chatmode.md` (5 additions, 0 deletions)
```diff
@@ -26,7 +26,12 @@ You MUST check task output readiness before debugging, testing, or declaring wor
 - If watchers seem stuck or output stops updating, stop the tasks and run the "Development" task again.
 - To interact with a running application, use the Playwright MCP server. If testing login, you will need to navigate to 'localhost' instead of '127.0.0.1' since that's the URL allowed by the Entra application.

+## Running Python scripts
+
+If you are running Python scripts that depend on installed requirements, you must run them using the virtual environment in `.venv`.
+
 ## Committing the change

 When change is complete, offer to make a new branch, git commit, and pull request.
+(DO NOT check out a new branch unless explicitly confirmed - sometimes user is already in a branch)
 Make sure the PR follows the PULL_REQUEST_TEMPLATE.md format, with all sections filled out and appropriate checkboxes checked.
```
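The new "Running Python scripts" guidance amounts to invoking the project's virtual-environment interpreter directly instead of the system Python. A minimal sketch (the venv creation step is only needed once; the script path would be whatever script you are running):

```shell
# Create the virtual environment if it does not already exist
python3 -m venv .venv
# Run scripts with the venv's interpreter so installed requirements are visible
.venv/bin/python -c "import sys; print(sys.prefix)"
# On Windows the interpreter lives at .venv\Scripts\python instead
```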
`README.md` (3 additions, 3 deletions)
```diff
@@ -61,7 +61,7 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
 - Renders citations and thought process for each answer
 - Includes settings directly in the UI to tweak the behavior and experiment with options
 - Integrates Azure AI Search for indexing and retrieval of documents, with support for [many document formats](/docs/data_ingestion.md#supported-document-formats) as well as [integrated vectorization](/docs/data_ingestion.md#overview-of-integrated-vectorization)
-- Optional usage of [GPT-4 with vision](/docs/gpt4v.md) to reason over image-heavy documents
+- Optional usage of [multimodal models](/docs/multimodal.md) to reason over image-heavy documents
 - Optional addition of [speech input/output](/docs/deploy_features.md#enabling-speech-inputoutput) for accessibility
 - Optional automation of [user login and data access](/docs/login_and_acl.md) via Microsoft Entra
 - Performance tracing and monitoring with Application Insights
@@ -92,7 +92,7 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/e3490de2
 - Azure AI Search: Basic tier, 1 replica, free level of semantic search. Pricing per hour. [Pricing](https://azure.microsoft.com/pricing/details/search/)
 - Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
 - Azure Cosmos DB: Only provisioned if you enabled [chat history with Cosmos DB](docs/deploy_features.md#enabling-persistent-chat-history-with-azure-cosmos-db). Serverless tier. Pricing per request unit and storage. [Pricing](https://azure.microsoft.com/pricing/details/cosmos-db/)
-- Azure AI Vision: Only provisioned if you enabled [GPT-4 with vision](docs/gpt4v.md). Pricing per 1K transactions. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/)
+- Azure AI Vision: Only provisioned if you enabled [multimodal approach](docs/multimodal.md). Pricing per 1K transactions. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/)
 - Azure AI Content Understanding: Only provisioned if you enabled [media description](docs/deploy_features.md#enabling-media-description-with-azure-content-understanding). Pricing per 1K images. [Pricing](https://azure.microsoft.com/pricing/details/content-understanding/)
 - Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
@@ -255,7 +255,7 @@ You can find extensive documentation in the [docs](docs/README.md) folder:
```