Skip to content

Adds Azure OpenAI endpoint compatibility and Azure DevOps repo support #178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

Barneyjm
Copy link

@Barneyjm Barneyjm commented Jun 2, 2025

Azure DevOps integration is difficult due to how the repositories are structured/how the Azure DevOps APIs seem to work. It requires a specific API route to handle these repos specifically.

If you'd like, i can probably split these additions up into one for AzureOpenAI and one for Azure DevOps.

James Barney added 2 commits June 2, 2025 13:53
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Barneyjm, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces significant enhancements by adding compatibility for Azure OpenAI endpoints and enabling support for processing repositories hosted on Azure DevOps. The changes involve integrating a new Azure OpenAI client, implementing specific logic to handle Azure DevOps' unique URL structures and authentication for cloning and file retrieval, updating the application's configuration to prioritize Azure OpenAI as the default provider, and modifying both the backend API routing and the frontend UI to seamlessly support Azure DevOps repositories.

Highlights

  • Azure OpenAI Integration: A new AzureOpenAIClient class is added to provide a dedicated client for interacting with Azure OpenAI services, supporting both chat completions and embeddings, including handling streaming responses, multimodal inputs, and model-specific API versions.
  • Azure DevOps Repository Support: New backend functions are introduced in azuredevops_client.py to specifically handle fetching file content (using the Azure DevOps REST API via curl) and cloning repositories (using git clone with PAT authentication) from Azure DevOps, addressing the platform's distinct repository structure and URL formats.
  • API Routing and Frontend Handling for Azure DevOps: The FastAPI application (api.py) gains new routes to manage Azure DevOps URLs, including a catch-all route (/{organization}/{project}/{repository}) that processes the repository and redirects the frontend to a simplified URL (/{project}/{repository}) containing the file tree and README data as query parameters. The frontend (src/app/[owner]/[repo]/page.tsx, src/app/page.tsx) is updated to recognize and correctly parse Azure DevOps URLs, handle this redirection, and display the appropriate repository information and icon.
  • Configuration and Default Provider Changes: Configuration files (config.py, embedder.json, generator.json) are updated to include the new AzureOpenAIClient, set Azure OpenAI as the default provider for both embeddings and generation, and define supported models and their specific API versions.
  • Data Pipeline Enhancements: The data processing pipeline (data_pipeline.py) is modified to incorporate the new Azure DevOps cloning and file content retrieval logic, handle URL decoding for Azure DevOps paths (including spaces in project names), generate unique local paths for cloned repositories based on URL hashes, and include extended file attributes in the processed documents.
  • Improved Retriever Handling: The RAG implementation (api/rag.py) is updated to explicitly pass the top_k parameter to the FAISS retriever and includes more robust handling and logging for the structure of retrieved documents, providing a fallback mechanism if expected attributes are missing.
  • Multilingual UI Updates: Translation files (src/messages/*.json) are updated across multiple languages (English, Spanish, Japanese, Korean, Vietnamese, Chinese) to include 'Azure DevOps' in descriptions and input placeholders.

Changelog

Click here to see the changelog
  • api/api.py
    • Added imports for URL parsing (quote) and redirection (RedirectResponse).
    • Created and mounted a separate api_router at the /api prefix.
    • Moved existing /api/wiki_cache endpoints (GET, POST, DELETE) to the new api_router.
    • Added a simplified route /{project}/{repository} to handle Azure DevOps URLs with file tree/readme in query params.
    • Added a catch-all route /{organization}/{project}/{repository} to process Azure DevOps URLs, download the repo, and redirect to the simplified route.
    • Added new API endpoints /api/azuredevops/structure and /api/repo/structure for fetching repository structure, specifically supporting Azure DevOps.
    • Moved the /api/processed_projects endpoint to the api_router.
  • api/azure_openai_client.py
    • Added a new file implementing the AzureOpenAIClient class.
    • Provides integration with Azure OpenAI chat completion and embedding APIs.
    • Handles API key and endpoint configuration via environment variables.
    • Supports model-specific API versions.
    • Includes logic for handling multimodal inputs (images).
    • Implements synchronous and asynchronous API calls.
    • Parses chat completion and embedding responses into Adalflow formats.
    • Includes methods for converting standard inputs to API-specific kwargs.
  • api/azuredevops_client.py
    • Added a new file with utility functions for Azure DevOps.
    • Implemented get_azuredevops_file_content to fetch file content using the Azure DevOps REST API (via curl).
    • Implemented clone_azuredevops_repo to clone repositories using git clone, handling PAT authentication and URL encoding for project names with spaces.
  • api/config.py
    • Imported AzureOpenAIClient.
    • Added environment variable checks for AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT (including backward compatibility for AZURE_OPENAI_API_BASE).
    • Added AzureOpenAIClient to the MODEL_CLIENT_CLASSES dictionary.
    • Updated get_model_config to handle the 'azure' provider, mapping 'model' to 'deployment_id' and passing model_api_versions.
  • api/config/embedder.json
    • Changed the default 'embedder' client to AzureOpenAIClient using text-embedding-ada-002.
    • Renamed the previous default 'embedder' section to 'embedder_openai'.
  • api/config/generator.json
    • Changed the default_provider from 'google' to 'azure'.
    • Added a new 'azure' provider configuration with AzureOpenAIClient.
    • Defined gpt-4o as the default model for the 'azure' provider.
    • Included configurations for several GPT models under the 'azure' provider.
    • Added model_api_versions mapping for specific Azure OpenAI models.
  • api/data_pipeline.py
    • Added get_file_attributes function to extract extended file attributes.
    • Modified download_repo to generate a unique local path based on the repository URL hash.
    • Updated download_repo to handle Azure DevOps cloning specifically, including URL formatting for git.
    • Changed download_repo to return the local path of the cloned repository.
    • Updated process_repository to include extracted file attributes in document metadata.
    • Modified get_file_content to support the 'azure' repository type, calling get_azuredevops_file_content and handling URL decoding.
    • Updated _create_repo to correctly parse Azure DevOps repository names from URLs.
    • Adjusted _create_repo logic to use the local path returned by download_repo and copy contents if needed.
  • api/rag.py
    • Passed the top_k parameter directly to the FAISS retriever initialization.
    • Added logging and improved handling for the structure of retrieved documents in the call method, checking for doc_indices and providing a fallback.
  • api/websocket_wiki.py
    • Imported AzureOpenAIClient.
    • Added checks for Azure OpenAI environment variables and an availability flag.
    • Changed the default provider in the ChatCompletionRequest model to 'azure'.
    • Added logic to use AzureOpenAIClient when the requested provider is 'azure', formatting messages for the Azure OpenAI API.
    • Updated the fallback logic for model generation to prioritize 'azure' if available, then 'google', then 'openai'.
    • Slightly modified the system prompts (content appears unrelated to Azure/Azure DevOps).
  • next.config.ts
    • Added a rewrite rule to forward the Azure DevOps URL pattern /:organization/:project/:repository to the backend server.
  • src/app/[owner]/[repo]/page.tsx
    • Added console logging for debugging.
    • Added a check for missing file tree data in determineWikiStructure.
    • Updated the prompt text sent to the backend to specify the repository type (GitHub, GitLab, Bitbucket, Azure DevOps).
    • Implemented logic to handle Azure DevOps URLs: check for file tree/readme in query parameters or redirect to the backend's catch-all route.
    • Adjusted the extraction of the owner name for Azure DevOps to use only the organization part.
    • Added an Azure DevOps icon to the repository header.
  • src/app/page.tsx
    • Added 'azure' to the possible types for the selectedPlatform state.
    • Updated the parseRepositoryInput function to detect and parse Azure DevOps URLs, extracting organization, project, and repository, and handling URL decoding.
    • Modified the logic for appending query parameters to the URL, specifically handling encoding and adding the repo_url parameter for Azure DevOps.
  • src/components/ConfigurationModal.tsx
    • Added 'azure' to the possible types for the selectedPlatform prop.
    • Added a button for selecting 'Azure DevOps' as a platform option.
  • src/messages/en.json
    • Updated descriptions and placeholders to include 'Azure DevOps'.
  • src/messages/es.json
    • Updated descriptions and placeholders to include 'Azure DevOps'.
  • src/messages/ja.json
    • Updated descriptions and placeholders to include 'Azure DevOps'.
  • src/messages/kr.json
    • Updated descriptions and placeholders to include 'Azure Devops'.
  • src/messages/vi.json
    • Updated descriptions and placeholders to include 'Azure Devops'.
  • src/messages/zh.json
    • Updated descriptions and placeholders to include 'Azure DevOps'.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Azure clouds now join the flow,
DevOps repos start to grow.
Code and docs, a wiki bright,
AI helps with all its might.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant enhancements by adding compatibility for Azure OpenAI and support for Azure DevOps repositories. The changes are quite extensive, especially in api/api.py and the new client files. Overall, the additions seem well-considered to expand the platform's capabilities.

I've identified a few areas for improvement, primarily concerning consistency in path handling for downloaded repositories, import statement locations, and a potential mismatch in parameter naming for the Azure OpenAI client. There are also opportunities to refactor some duplicated logic in the API routes for Azure DevOps.

The frontend changes to support Azure DevOps URLs and UI elements look good. The persona change in the system prompts for websocket_wiki.py is a notable shift and should be confirmed if it's intended for the entire application's scope.

Summary of Findings

  • Consistency in Repository Path Handling: The download_repo function in data_pipeline.py now uses hash-based paths, which is good. However, DatabaseManager._create_repo copies this to a name-based path. Consider using the hash-based path consistently to avoid potential collisions and an extra copy step.
  • Python Import Locations: Several Python files (api/api.py, api/azuredevops_client.py, api/data_pipeline.py) have imports inside functions. These should be moved to the top of their respective files per PEP 8.
  • AzureOpenAIClient Parameter Mismatch: In api/websocket_wiki.py, api_kwargs for AzureOpenAIClient.acall uses a model key. The client expects deployment_id. This should be corrected to pass deployment_id.
  • URL Parsing Logic: In api/api.py (specifically get_azuredevops_structure), the Azure DevOps URL parsing logic for length check (len(url_parts) < 6) might be too lenient and could be len(url_parts) < 7 for robustness.
  • System Prompt Persona Change: The system prompts in api/websocket_wiki.py have shifted from a code analysis persona to an RFP/insurance analyst. This is a significant change and should be verified if it's the intended scope for the entire application.

Merge Readiness

This pull request introduces valuable new features for Azure OpenAI and Azure DevOps support. However, there are a few critical and high-severity issues, particularly the parameter mismatch for the AzureOpenAIClient and potential inconsistencies in repository path handling, that should be addressed before merging. Additionally, standardizing import locations would improve code quality.

I am unable to approve this pull request myself. Please ensure these changes are reviewed and approved by other maintainers after addressing the feedback.

Comment on lines +571 to +577
api_kwargs = {
"messages": messages,
"model": request.model or "gpt-4", # Ensure model is included
"stream": True,
"temperature": model_config.get("temperature", 0.7),
"top_p": model_config.get("top_p", 0.8)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When constructing api_kwargs for AzureOpenAIClient.acall, the model key is used (e.g., "model": request.model or "gpt-4").
However, AzureOpenAIClient seems designed to receive deployment_id in api_kwargs (as handled by its convert_inputs_to_api_kwargs method and checked in call/acall). The AzureOpenAIClient then internally maps this deployment_id to the model parameter if required by the underlying OpenAI SDK.

Since convert_inputs_to_api_kwargs is not called here, should api_kwargs be constructed with "deployment_id": request.model or "gpt-4" instead, to align with AzureOpenAIClient's expectation? This seems to apply to the fallback logic as well (e.g., line 833-839).

Comment on lines +706 to +707
if len(url_parts) < 6:
raise HTTPException(status_code=400, detail=f"Invalid Azure DevOps repository URL: {repo_url}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For an Azure DevOps URL like https://dev.azure.com/{organization}/{project}/_git/{repository}, splitting by / results in 7 parts (e.g., ['https:', '', 'dev.azure.com', 'org', 'proj', '_git', 'repo']).

The current check if len(url_parts) < 6: might be too lenient. If len(url_parts) is 6 (e.g., repository name missing), url_parts[-1] would incorrectly point to _git.

Should this check be if len(url_parts) < 7: to ensure all expected segments, including the repository name, are present?

Comment on lines +103 to 114
def download_repo(repo_url: str, type: str = "github", access_token: str = None) -> str:
"""
Downloads a Git repository (GitHub, GitLab, or Bitbucket) to a specified local path.
Downloads a repository to a local directory.

Args:
repo_url (str): The URL of the Git repository to clone.
local_path (str): The local directory where the repository will be cloned.
access_token (str, optional): Access token for private repositories.
repo_url (str): URL of the repository to download
type (str): Type of repository (github, gitlab, bitbucket, azure)
access_token (str, optional): Personal access token for private repositories

Returns:
str: The output message from the `git` command.
str: Path to the local directory containing the repository
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The download_repo function's signature has changed to generate local_path internally using a hash of repo_url and return this path. This is a good improvement for avoiding path collisions.

However, in DatabaseManager._create_repo (lines 874-892), the downloaded content from this hash-based repo_path is then copied to save_repo_dir (which is name-based, e.g., ~/.adalflow/repos/myrepo). This copy operation might be inefficient and reintroduces the risk of name collisions if repo_name isn't globally unique (e.g., different Azure DevOps organizations having a repo with the same name).

Would it be more straightforward and robust to use the hash-based path (returned by download_repo) consistently as the primary storage and processing directory, thus eliminating the copy step and the potential for name-based collisions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we'll want to implement this here, but it's a reasonable suggestion. the hash was implemented to help with weird characters in azure devops repos

@sng-asyncfunc
Copy link
Contributor

Love your contribution, but there are some conflicts that needed resolve.

Also, there's a new changes on how we define embedder that just got merged in:
#169

Can you follow the embedder.json.bak format to specify the azureClient example ? I am trying to avoid touching embedder.json to avoid pull conflicts for others.

@Barneyjm
Copy link
Author

Barneyjm commented Jun 3, 2025

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces Azure OpenAI endpoint compatibility and Azure DevOps repository support, enhancing the application's versatility. The changes span multiple files, incorporating new functionalities and configurations. Overall, the code appears well-structured, but there are areas where improvements can be made to enhance clarity, maintainability, and error handling.

Summary of Findings

  • Verbose curl output: The code uses verbose curl output for debugging, which is helpful during development but should be removed or disabled in production to avoid exposing sensitive information.
  • Token sanitization: The code sanitizes error messages to remove tokens, but the current implementation might not be robust enough to handle all cases. Consider using a regular expression to handle cases where the token might be surrounded by other characters.
  • Azure DevOps URL handling: The code handles Azure DevOps URLs with spaces in project names, but the logic is complex and could be simplified or made more robust by using a dedicated library for URL parsing and manipulation.

Merge Readiness

The pull request introduces valuable new features and appears to be generally well-structured. However, before merging, it's recommended to address the medium and high severity issues to ensure code quality and security. I am unable to approve this pull request, and recommend that others review and approve this code before merging.

Comment on lines +571 to +572
if organization.startswith('.well-known') or organization.startswith('_next') or organization.startswith('api'):
raise HTTPException(status_code=404, detail="Resource not found")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This check skips well-known paths, special paths, and API routes. It's good to have, but consider adding a comment explaining why these specific paths are being skipped to improve maintainability.

Comment on lines +609 to +610
if not access_token and repo_type != 'github': # GitHub can work without a token for public repos
logger.warning(f"{repo_type.upper()}_TOKEN not found in environment variables")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using a more descriptive log level than warning here. Since the application can still function (albeit without token authentication), info or medium might be more appropriate. Also, consider adding a link to documentation on how to set the token, to help the user.

Comment on lines +663 to +665

# Return a redirect response
return RedirectResponse(url=redirect_url, status_code=302)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider using urljoin from urllib.parse to construct the redirect URL. This can help ensure that the URL is correctly formatted, especially when dealing with different base URLs and path structures.

Comment on lines +341 to +343
if 'dimensions' in embedding_api_kwargs:
dimensions = embedding_api_kwargs.pop('dimensions')
log.info(f"Removed dimensions parameter ({dimensions}) as it's not supported by {model}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding a link to the Azure OpenAI documentation explaining why the dimensions parameter is not supported by text-embedding-ada-002. This will help users understand the limitations of this model.

Comment on lines +351 to +353
api_version="2024-02-01", # API version compatible with text-embedding-3 models
azure_endpoint=self.base_url
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding a comment explaining why the API version is being set to 2024-02-01 for text-embedding-3 models. This will help future developers understand the reasoning behind this change.

Comment on lines +180 to +184
error_msg = e.stderr
# Sanitize error message to remove any tokens
if access_token and access_token in error_msg:
error_msg = error_msg.replace(access_token, "[REDACTED]")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to sanitize the error message to remove tokens. However, the current implementation only replaces the token if it's an exact match. Consider using a regular expression to handle cases where the token might be surrounded by other characters.

Comment on lines +162 to +166
clone_url = urlunparse((parsed.scheme, f"{organization}:{access_token}@{parsed.netloc}", path, '', '', ''))

# Log the URL format (without exposing the token)
sanitized_url = urlunparse((parsed.scheme, f"{organization}:***TOKEN***@{parsed.netloc}", path, '', '', ''))
logger.info(f"Azure DevOps clone URL format (sanitized): {sanitized_url}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to sanitize the URL to avoid exposing the token in logs. However, the current implementation only replaces the token with ***TOKEN*** in the log message. Consider using a more robust method to prevent accidental exposure of the token, such as not logging the URL at all or using a hash of the token instead.

Comment on lines +1258 to +1259
window.location.href = requestUrl;
return; // Stop execution here as we're redirecting

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding a comment explaining why the window is being redirected to the catch-all route. This will help future developers understand the purpose of this redirection.

@sng-asyncfunc
Copy link
Contributor

Do you mind to provide steps to test? and have Azure Devops Repo users try this out?

@Barneyjm
Copy link
Author

Barneyjm commented Jun 6, 2025

Sure! Though there's no way for me to provide a public azdo repo, to my knowledge.

  1. add your various Azure Open AI endpoint variables, ie AZURE_OPENAI_API_BASE=https://your_deployment.openai.azure.com/ More here: https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Ckeyless%2Ctypescript-keyless%2Centra&pivots=programming-language-python#environment-variables
  2. add your Azure OpenAI embedder to the config files including your embedding model deployment name embedder['model_kwargs']['model']
"embedder": {
    "client_class": "AzureOpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  }
  1. generate wiki and select azure + your deployed models
image 4. enjoy: image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants