Adds Azure OpenAI endpoint compatibility and Azure DevOps repo support #178
base: main
Conversation
Hello @Barneyjm, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces significant enhancements by adding compatibility for Azure OpenAI endpoints and enabling support for processing repositories hosted on Azure DevOps. The changes involve integrating a new Azure OpenAI client, implementing specific logic to handle Azure DevOps' unique URL structures and authentication for cloning and file retrieval, updating the application's configuration to prioritize Azure OpenAI as the default provider, and modifying both the backend API routing and the frontend UI to seamlessly support Azure DevOps repositories.
Highlights
- Azure OpenAI Integration: A new `AzureOpenAIClient` class is added to provide a dedicated client for interacting with Azure OpenAI services, supporting both chat completions and embeddings, including handling streaming responses, multimodal inputs, and model-specific API versions.
- Azure DevOps Repository Support: New backend functions are introduced in `azuredevops_client.py` to specifically handle fetching file content (using the Azure DevOps REST API via curl) and cloning repositories (using git clone with PAT authentication) from Azure DevOps, addressing the platform's distinct repository structure and URL formats.
- API Routing and Frontend Handling for Azure DevOps: The FastAPI application (`api.py`) gains new routes to manage Azure DevOps URLs, including a catch-all route (`/{organization}/{project}/{repository}`) that processes the repository and redirects the frontend to a simplified URL (`/{project}/{repository}`) containing the file tree and README data as query parameters. The frontend (`src/app/[owner]/[repo]/page.tsx`, `src/app/page.tsx`) is updated to recognize and correctly parse Azure DevOps URLs, handle this redirection, and display the appropriate repository information and icon.
- Configuration and Default Provider Changes: Configuration files (`config.py`, `embedder.json`, `generator.json`) are updated to include the new `AzureOpenAIClient`, set Azure OpenAI as the default provider for both embeddings and generation, and define supported models and their specific API versions.
- Data Pipeline Enhancements: The data processing pipeline (`data_pipeline.py`) is modified to incorporate the new Azure DevOps cloning and file content retrieval logic, handle URL decoding for Azure DevOps paths (including spaces in project names), generate unique local paths for cloned repositories based on URL hashes, and include extended file attributes in the processed documents.
- Improved Retriever Handling: The RAG implementation (`api/rag.py`) is updated to explicitly pass the `top_k` parameter to the FAISS retriever and includes more robust handling and logging for the structure of retrieved documents, providing a fallback mechanism if expected attributes are missing.
- Multilingual UI Updates: Translation files (`src/messages/*.json`) are updated across multiple languages (English, Spanish, Japanese, Korean, Vietnamese, Chinese) to include 'Azure DevOps' in descriptions and input placeholders.
Changelog
- api/api.py
  - Added imports for URL parsing (`quote`) and redirection (`RedirectResponse`).
  - Created and mounted a separate `api_router` at the `/api` prefix.
  - Moved existing `/api/wiki_cache` endpoints (`GET`, `POST`, `DELETE`) to the new `api_router`.
  - Added a simplified route `/{project}/{repository}` to handle Azure DevOps URLs with the file tree/readme in query params.
  - Added a catch-all route `/{organization}/{project}/{repository}` to process Azure DevOps URLs, download the repo, and redirect to the simplified route.
  - Added new API endpoints `/api/azuredevops/structure` and `/api/repo/structure` for fetching repository structure, specifically supporting Azure DevOps.
  - Moved the `/api/processed_projects` endpoint to the `api_router`.
- api/azure_openai_client.py
  - Added a new file implementing the `AzureOpenAIClient` class.
  - Provides integration with Azure OpenAI chat completion and embedding APIs.
  - Handles API key and endpoint configuration via environment variables.
  - Supports model-specific API versions.
  - Includes logic for handling multimodal inputs (images).
  - Implements synchronous and asynchronous API calls.
  - Parses chat completion and embedding responses into Adalflow formats.
  - Includes methods for converting standard inputs to API-specific kwargs.
- api/azuredevops_client.py
  - Added a new file with utility functions for Azure DevOps.
  - Implemented `get_azuredevops_file_content` to fetch file content using the Azure DevOps REST API (via curl).
  - Implemented `clone_azuredevops_repo` to clone repositories using git clone, handling PAT authentication and URL encoding for project names with spaces.
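As a rough illustration of the file-content helper described above, the request targets the documented Azure DevOps Git Items REST API. The helper name and parameters below are assumptions for the sketch, not the PR's exact code (which shells out to curl):

```python
# Hypothetical sketch of building the Azure DevOps Items API URL that
# get_azuredevops_file_content would fetch. URL-encoding handles spaces
# in organization/project names, one of the pain points this PR addresses.
from urllib.parse import quote


def azuredevops_items_url(organization: str, project: str, repository: str,
                          file_path: str, api_version: str = "7.1") -> str:
    """Build the REST URL for fetching a file's raw content."""
    return (
        f"https://dev.azure.com/{quote(organization)}/{quote(project)}"
        f"/_apis/git/repositories/{quote(repository)}/items"
        f"?path={quote(file_path)}&api-version={api_version}"
    )


url = azuredevops_items_url("my org", "my project", "repo", "/README.md")
```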
- api/config.py
  - Imported `AzureOpenAIClient`.
  - Added environment variable checks for `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` (including backward compatibility for `AZURE_OPENAI_API_BASE`).
  - Added `AzureOpenAIClient` to the `MODEL_CLIENT_CLASSES` dictionary.
  - Updated `get_model_config` to handle the 'azure' provider, mapping 'model' to 'deployment_id' and passing `model_api_versions`.
- api/config/embedder.json
  - Changed the default 'embedder' client to `AzureOpenAIClient` using `text-embedding-ada-002`.
  - Renamed the previous default 'embedder' section to 'embedder_openai'.
- api/config/generator.json
  - Changed the `default_provider` from 'google' to 'azure'.
  - Added a new 'azure' provider configuration with `AzureOpenAIClient`.
  - Defined `gpt-4o` as the default model for the 'azure' provider.
  - Included configurations for several GPT models under the 'azure' provider.
  - Added `model_api_versions` mapping for specific Azure OpenAI models.
- api/data_pipeline.py
  - Added a `get_file_attributes` function to extract extended file attributes.
  - Modified `download_repo` to generate a unique local path based on the repository URL hash.
  - Updated `download_repo` to handle Azure DevOps cloning specifically, including URL formatting for git.
  - Changed `download_repo` to return the local path of the cloned repository.
  - Updated `process_repository` to include extracted file attributes in document metadata.
  - Modified `get_file_content` to support the 'azure' repository type, calling `get_azuredevops_file_content` and handling URL decoding.
  - Updated `_create_repo` to correctly parse Azure DevOps repository names from URLs.
  - Adjusted `_create_repo` logic to use the local path returned by `download_repo` and copy contents if needed.
- api/rag.py
  - Passed the `top_k` parameter directly to the FAISS retriever initialization.
  - Added logging and improved handling for the structure of retrieved documents in the `call` method, checking for `doc_indices` and providing a fallback.
- api/websocket_wiki.py
  - Imported `AzureOpenAIClient`.
  - Added checks for Azure OpenAI environment variables and an availability flag.
  - Changed the default provider in the `ChatCompletionRequest` model to 'azure'.
  - Added logic to use `AzureOpenAIClient` when the requested provider is 'azure', formatting messages for the Azure OpenAI API.
  - Updated the fallback logic for model generation to prioritize 'azure' if available, then 'google', then 'openai'.
  - Slightly modified the system prompts (content appears unrelated to Azure/Azure DevOps).
- next.config.ts
  - Added a rewrite rule to forward the Azure DevOps URL pattern `/:organization/:project/:repository` to the backend server.
- src/app/[owner]/[repo]/page.tsx
  - Added console logging for debugging.
  - Added a check for missing file tree data in `determineWikiStructure`.
  - Updated the prompt text sent to the backend to specify the repository type (GitHub, GitLab, Bitbucket, Azure DevOps).
  - Implemented logic to handle Azure DevOps URLs: check for file tree/readme in query parameters or redirect to the backend's catch-all route.
  - Adjusted the extraction of the owner name for Azure DevOps to use only the organization part.
  - Added an Azure DevOps icon to the repository header.
- src/app/page.tsx
  - Added 'azure' to the possible types for the `selectedPlatform` state.
  - Updated the `parseRepositoryInput` function to detect and parse Azure DevOps URLs, extracting organization, project, and repository, and handling URL decoding.
  - Modified the logic for appending query parameters to the URL, specifically handling encoding and adding the `repo_url` parameter for Azure DevOps.
- src/components/ConfigurationModal.tsx
  - Added 'azure' to the possible types for the `selectedPlatform` prop.
  - Added a button for selecting 'Azure DevOps' as a platform option.
- src/messages/en.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
- src/messages/es.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
- src/messages/ja.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
- src/messages/kr.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
- src/messages/vi.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
- src/messages/zh.json
- Updated descriptions and placeholders to include 'Azure DevOps'.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Azure clouds now join the flow,
DevOps repos start to grow.
Code and docs, a wiki bright,
AI helps with all its might.
Footnotes

1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request introduces significant enhancements by adding compatibility for Azure OpenAI and support for Azure DevOps repositories. The changes are quite extensive, especially in `api/api.py` and the new client files. Overall, the additions seem well-considered to expand the platform's capabilities.
I've identified a few areas for improvement, primarily concerning consistency in path handling for downloaded repositories, import statement locations, and a potential mismatch in parameter naming for the Azure OpenAI client. There are also opportunities to refactor some duplicated logic in the API routes for Azure DevOps.
The frontend changes to support Azure DevOps URLs and UI elements look good. The persona change in the system prompts for `websocket_wiki.py` is a notable shift and should be confirmed as intended for the entire application's scope.
Summary of Findings
- Consistency in Repository Path Handling: The `download_repo` function in `data_pipeline.py` now uses hash-based paths, which is good. However, `DatabaseManager._create_repo` copies this to a name-based path. Consider using the hash-based path consistently to avoid potential collisions and an extra copy step.
- Python Import Locations: Several Python files (`api/api.py`, `api/azuredevops_client.py`, `api/data_pipeline.py`) have imports inside functions. These should be moved to the top of their respective files per PEP 8.
- AzureOpenAIClient Parameter Mismatch: In `api/websocket_wiki.py`, `api_kwargs` for `AzureOpenAIClient.acall` uses a `model` key. The client expects `deployment_id`. This should be corrected to pass `deployment_id`.
- URL Parsing Logic: In `api/api.py` (specifically `get_azuredevops_structure`), the Azure DevOps URL parsing length check (`len(url_parts) < 6`) might be too lenient and could be `len(url_parts) < 7` for robustness.
- System Prompt Persona Change: The system prompts in `api/websocket_wiki.py` have shifted from a code analysis persona to an RFP/insurance analyst. This is a significant change and should be verified as the intended scope for the entire application.
Merge Readiness
This pull request introduces valuable new features for Azure OpenAI and Azure DevOps support. However, there are a few critical and high-severity issues, particularly the parameter mismatch for the AzureOpenAIClient and potential inconsistencies in repository path handling, that should be addressed before merging. Additionally, standardizing import locations would improve code quality.
I am unable to approve this pull request myself. Please ensure these changes are reviewed and approved by other maintainers after addressing the feedback.
```python
api_kwargs = {
    "messages": messages,
    "model": request.model or "gpt-4",  # Ensure model is included
    "stream": True,
    "temperature": model_config.get("temperature", 0.7),
    "top_p": model_config.get("top_p", 0.8)
}
```
When constructing `api_kwargs` for `AzureOpenAIClient.acall`, the `model` key is used (e.g., `"model": request.model or "gpt-4"`).

However, `AzureOpenAIClient` seems designed to receive `deployment_id` in `api_kwargs` (as handled by its `convert_inputs_to_api_kwargs` method and checked in `call`/`acall`). The `AzureOpenAIClient` then internally maps this `deployment_id` to the `model` parameter if required by the underlying OpenAI SDK.

Since `convert_inputs_to_api_kwargs` is not called here, should `api_kwargs` be constructed with `"deployment_id": request.model or "gpt-4"` instead, to align with `AzureOpenAIClient`'s expectation? This seems to apply to the fallback logic as well (e.g., line 833-839).
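A minimal sketch of the suggested fix, assuming the client's `deployment_id` convention described above (the helper name here is hypothetical, not part of the PR):

```python
# Build kwargs the way AzureOpenAIClient expects: pass "deployment_id"
# (which the client maps to the SDK's "model" parameter internally)
# rather than passing "model" directly.
def build_azure_api_kwargs(messages, model=None, temperature=0.7, top_p=0.8):
    """Construct api_kwargs for the Azure client: deployment_id, not model."""
    return {
        "messages": messages,
        "deployment_id": model or "gpt-4",  # client maps this to the SDK's model param
        "stream": True,
        "temperature": temperature,
        "top_p": top_p,
    }


kwargs = build_azure_api_kwargs([{"role": "user", "content": "hi"}], model="gpt-4o")
```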
```python
if len(url_parts) < 6:
    raise HTTPException(status_code=400, detail=f"Invalid Azure DevOps repository URL: {repo_url}")
```
For an Azure DevOps URL like `https://dev.azure.com/{organization}/{project}/_git/{repository}`, splitting by `/` results in 7 parts (e.g., `['https:', '', 'dev.azure.com', 'org', 'proj', '_git', 'repo']`).

The current check `if len(url_parts) < 6:` might be too lenient. If `len(url_parts)` is 6 (e.g., the repository name is missing), `url_parts[-1]` would incorrectly point to `_git`.

Should this check be `if len(url_parts) < 7:` to ensure all expected segments, including the repository name, are present?
```python
def download_repo(repo_url: str, type: str = "github", access_token: str = None) -> str:
    """
    Downloads a repository to a local directory.

    Args:
        repo_url (str): URL of the repository to download
        type (str): Type of repository (github, gitlab, bitbucket, azure)
        access_token (str, optional): Personal access token for private repositories

    Returns:
        str: Path to the local directory containing the repository
    """
```
The `download_repo` function's signature has changed to generate `local_path` internally using a hash of `repo_url` and return this path. This is a good improvement for avoiding path collisions.

However, in `DatabaseManager._create_repo` (lines 874-892), the downloaded content from this hash-based `repo_path` is then copied to `save_repo_dir` (which is name-based, e.g., `~/.adalflow/repos/myrepo`). This copy operation might be inefficient and reintroduces the risk of name collisions if `repo_name` isn't globally unique (e.g., different Azure DevOps organizations having a repo with the same name).

Would it be more straightforward and robust to use the hash-based path (returned by `download_repo`) consistently as the primary storage and processing directory, thus eliminating the copy step and the potential for name-based collisions?
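A hedged sketch of the hash-based path scheme being discussed (assumed shape, not the PR's exact code; the base directory mirrors the `~/.adalflow/repos` example above):

```python
# Derive a unique, collision-resistant local path from the repository URL,
# so two repos with the same name in different organizations never clash.
import hashlib
import os


def repo_local_path(repo_url: str, base_dir: str = "~/.adalflow/repos") -> str:
    """Return a per-URL directory path keyed by a short hash of the URL."""
    url_hash = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:12]
    return os.path.join(os.path.expanduser(base_dir), url_hash)


a = repo_local_path("https://dev.azure.com/orgA/proj/_git/myrepo")
b = repo_local_path("https://dev.azure.com/orgB/proj/_git/myrepo")
assert a != b  # same repo name, different organizations -> different paths
```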
Not sure if we'll want to implement this here, but it's a reasonable suggestion. The hash was implemented to help with weird characters in Azure DevOps repos.
Love your contribution, but there are some conflicts that need to be resolved. Also, there's a new change on how we define the embedder that just got merged in. Can you follow the
/gemini review
Code Review
The pull request introduces Azure OpenAI endpoint compatibility and Azure DevOps repository support, enhancing the application's versatility. The changes span multiple files, incorporating new functionalities and configurations. Overall, the code appears well-structured, but there are areas where improvements can be made to enhance clarity, maintainability, and error handling.
Summary of Findings
- Verbose curl output: The code uses verbose curl output for debugging, which is helpful during development but should be removed or disabled in production to avoid exposing sensitive information.
- Token sanitization: The code sanitizes error messages to remove tokens, but the current implementation might not be robust enough to handle all cases. Consider using a regular expression to handle cases where the token might be surrounded by other characters.
- Azure DevOps URL handling: The code handles Azure DevOps URLs with spaces in project names, but the logic is complex and could be simplified or made more robust by using a dedicated library for URL parsing and manipulation.
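On the token-sanitization finding, one possible regex-based approach (an assumption, not the PR's code) is to redact the entire userinfo section of any URL in the message, which also covers URL-encoded tokens the plain substring replace would miss:

```python
# Redact anything between "://" and "@" (the userinfo section) so credentials
# never leak into logs, even when the exact token string isn't known or the
# token was URL-encoded before being embedded in the clone URL.
import re


def redact_url_credentials(message: str) -> str:
    """Replace URL userinfo (user:token) with [REDACTED]."""
    return re.sub(r"(://)[^/@\s]+(@)", r"\1[REDACTED]\2", message)


msg = "fatal: unable to access 'https://org:s3cret%2Ftoken@dev.azure.com/proj'"
print(redact_url_credentials(msg))
# -> fatal: unable to access 'https://[REDACTED]@dev.azure.com/proj'
```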
Merge Readiness
The pull request introduces valuable new features and appears to be generally well-structured. However, before merging, it's recommended to address the medium and high severity issues to ensure code quality and security. I am unable to approve this pull request, and recommend that others review and approve this code before merging.
```python
if organization.startswith('.well-known') or organization.startswith('_next') or organization.startswith('api'):
    raise HTTPException(status_code=404, detail="Resource not found")
```
```python
if not access_token and repo_type != 'github':  # GitHub can work without a token for public repos
    logger.warning(f"{repo_type.upper()}_TOKEN not found in environment variables")
```
```python
# Return a redirect response
return RedirectResponse(url=redirect_url, status_code=302)
```
```python
if 'dimensions' in embedding_api_kwargs:
    dimensions = embedding_api_kwargs.pop('dimensions')
    log.info(f"Removed dimensions parameter ({dimensions}) as it's not supported by {model}")
```
```python
api_version="2024-02-01",  # API version compatible with text-embedding-3 models
azure_endpoint=self.base_url
)
```
```python
error_msg = e.stderr
# Sanitize error message to remove any tokens
if access_token and access_token in error_msg:
    error_msg = error_msg.replace(access_token, "[REDACTED]")
```
```python
clone_url = urlunparse((parsed.scheme, f"{organization}:{access_token}@{parsed.netloc}", path, '', '', ''))

# Log the URL format (without exposing the token)
sanitized_url = urlunparse((parsed.scheme, f"{organization}:***TOKEN***@{parsed.netloc}", path, '', '', ''))
logger.info(f"Azure DevOps clone URL format (sanitized): {sanitized_url}")
```
It's good practice to sanitize the URL to avoid exposing the token in logs. However, the current implementation only replaces the token with `***TOKEN***` in the log message. Consider a more robust method to prevent accidental exposure of the token, such as not logging the URL at all or logging a hash of the token instead.
```typescript
window.location.href = requestUrl;
return; // Stop execution here as we're redirecting
```
Do you mind providing steps to test, and having Azure DevOps repo users try this out?
Sure! Though there's no way for me to provide a public azdo repo, to my knowledge.
Azure DevOps integration is difficult due to how the repositories are structured and how the Azure DevOps APIs seem to work. It requires a dedicated API route to handle these repos.
If you'd like, I can probably split these additions up into one for Azure OpenAI and one for Azure DevOps.