Collaborator

@Hugo-Pare Hugo-Pare commented Nov 12, 2025

Summary

This PR upgrades Solace Agent Mesh to Google ADK 1.18.0 and implements automatic database schema migrations, enhanced artifact version management, and improved error handling for SQL operations.

Key Changes

1. Automatic Database Migration System

Added automatic Alembic-based migrations that run on agent startup to ensure database schema compatibility with Google ADK.

New Files:

  • src/solace_agent_mesh/agent/adk/schema_migration.py - Programmatic migration runner
  • src/solace_agent_mesh/agent/adk/alembic/ - Migration directory with version history
  • src/solace_agent_mesh/agent/adk/alembic.ini - Alembic configuration
  • src/solace_agent_mesh/agent/adk/alembic/README - Migration documentation with ADK resources
  • Makefile - Streamlines building the project, running tests, etc.

Modified:

  • src/solace_agent_mesh/agent/adk/services.py - Auto-executes migrations on SQL session service initialization

Features:

  • Migrations run automatically on agent startup when using SQL session services
  • Uses Google's official Alembic approach for schema detection and updates
  • Version-controlled migration history
  • Safe updates: only adds missing columns, never deletes data
  • Supports SQLite, PostgreSQL, and MySQL

Reference: Based on Google ADK's official migration example
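
The "only adds missing columns, never deletes data" behaviour can be illustrated with a minimal sketch. This is not the PR's Alembic-based runner: it uses the standard-library sqlite3 module directly, and the table name, column names, and the helper add_missing_columns are hypothetical, chosen only to show an additive schema update.

```python
import sqlite3


def add_missing_columns(conn: sqlite3.Connection, table: str, expected: dict[str, str]) -> list[str]:
    """Add any column in `expected` that `table` lacks; never drop or alter existing ones.

    `table` and the column names are assumed to be trusted constants,
    since identifiers cannot be bound as SQL parameters.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in expected.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Hypothetical "old" schema with one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO sessions VALUES ('s1', '{}')")

# "custom_metadata" is an illustrative column name, not taken from ADK.
added = add_missing_columns(conn, "sessions", {"state": "TEXT", "custom_metadata": "TEXT"})
rows = conn.execute("SELECT id, state, custom_metadata FROM sessions").fetchall()
added_again = add_missing_columns(conn, "sessions", {"state": "TEXT", "custom_metadata": "TEXT"})
```

Running the helper a second time is a no-op, which is the property that makes it safe to run a migration step on every startup.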

2. Enhanced Artifact Version Management

Implemented new ADK 1.18.0 artifact methods for improved version tracking and metadata access.

Modified:

  • src/solace_agent_mesh/agent/adk/artifacts/filesystem_artifact_service.py
  • src/solace_agent_mesh/agent/adk/artifacts/s3_artifact_service.py

New Methods:

  • list_artifact_versions() - Lists all versions with complete metadata (version, URI, MIME type, creation time)
  • get_artifact_version() - Retrieves metadata for a specific artifact version

Benefits:

  • Full artifact version history tracking
  • Enhanced metadata access without loading full artifact content
  • Better support for artifact management workflows
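
As a rough sketch of what metadata-only version lookup can look like on a filesystem backend, assuming each version is stored as an integer-named file inside the artifact directory: VersionInfo, list_versions, and get_version below are hypothetical stand-ins, not the ADK ArtifactVersion type or the methods added in this PR.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionInfo:  # hypothetical stand-in for ADK's ArtifactVersion
    version: int
    path: str
    size_bytes: int


def list_versions(artifact_dir: str) -> list[VersionInfo]:
    """Return metadata for every integer-named version file, oldest first."""
    if not os.path.isdir(artifact_dir):
        return []
    infos = []
    for name in os.listdir(artifact_dir):
        if not name.isdigit():  # skip anything that is not a version file
            continue
        path = os.path.join(artifact_dir, name)
        # Only file metadata is read here; the artifact content is never loaded.
        infos.append(VersionInfo(int(name), path, os.path.getsize(path)))
    return sorted(infos, key=lambda v: v.version)


def get_version(artifact_dir: str, version: Optional[int] = None) -> Optional[VersionInfo]:
    """Return the requested version, or the latest one when `version` is None."""
    infos = list_versions(artifact_dir)
    if not infos:
        return None
    if version is None:
        return infos[-1]
    return next((v for v in infos if v.version == version), None)


with tempfile.TemporaryDirectory() as demo_dir:
    for n, payload in enumerate([b"a", b"bb", b"ccc"]):
        with open(os.path.join(demo_dir, str(n)), "wb") as fh:
            fh.write(payload)
    all_versions = [v.version for v in list_versions(demo_dir)]
    latest = get_version(demo_dir)
    missing = get_version(demo_dir, version=7)
```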

3. Long-Running Tool Support

Enhanced ADK flow handling to properly support long-running tools and prevent premature flow termination.

Modified:

  • src/solace_agent_mesh/agent/sac/patch_adk.py

Changes:

  • Updated patch_run_async() to detect and handle long-running tool calls
  • Prevents flow from breaking when tools indicate they're still processing
  • Maintains compatibility with existing tool execution patterns

4. Dependency Upgrades

Updated core Google dependencies to latest stable versions:

Modified:

  • pyproject.toml

Upgrades:

  • google-adk: 1.18.0 (from 1.7.0)
  • google-genai: 1.49.0
  • google-cloud-aiplatform: 1.126.1
  • google-cloud-storage: 3.5.0

Also updated the testing dependencies, pinning fastmcp for compatibility.

Benefits:

  • Access to latest ADK features and improvements
  • Enhanced LLM response handling
  • Improved storage compatibility
  • Security updates and bug fixes

5. Improved SQL Error Handling

Enhanced error handling for database operations to provide better user feedback.

Modified:

  • src/solace_agent_mesh/agent/protocol/event_handlers.py

Improvements:

  • Catches OperationalError exceptions from SQLAlchemy

  • Differentiates between schema errors and general database errors

  • Provides user-friendly error messages:

    • Schema errors: Directs users to contact administrator for migrations
    • General errors: Suggests retry or contacting support
  • Properly NACKs failed messages with appropriate error responses
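
One way to picture the schema-versus-general split is a small classifier over the error text. This is only a sketch under assumptions: the substrings in SCHEMA_HINTS and the helper describe_db_error are hypothetical, and the standard-library sqlite3.OperationalError stands in for the SQLAlchemy exception the PR actually catches.

```python
import sqlite3

# Hypothetical substrings suggesting the database schema is out of date.
SCHEMA_HINTS = ("no such column", "no such table", "unknown column", "does not exist")


def describe_db_error(exc: Exception) -> str:
    """Map a database error to a user-facing message."""
    text = str(exc).lower()
    if any(hint in text for hint in SCHEMA_HINTS):
        return "Database schema is out of date. Please contact your administrator to run migrations."
    return "A database error occurred. Please retry, or contact support if the problem persists."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT)")
try:
    # Selecting a column that does not exist triggers a schema-style error.
    conn.execute("SELECT missing_col FROM sessions")
except sqlite3.OperationalError as exc:
    schema_msg = describe_db_error(exc)

general_msg = describe_db_error(sqlite3.OperationalError("database is locked"))
```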

Migration Guide

For Existing Deployments

Automatic Migration: When agents start with SQL session services, the migration system automatically:

  1. Detects missing database columns
  2. Applies necessary schema updates
  3. Logs migration progress

No manual intervention required - migrations run seamlessly on startup.

For Development

To manually generate new migrations when ADK releases schema changes:

cd src/solace_agent_mesh/agent/adk
alembic revision --autogenerate -m "Update for ADK x.x.x"

Testing

make test-setup
make test-all 
  • ✅ Tested migration from ADK 1.7.0 to 1.18.0
  • ✅ Verified artifact version management with filesystem and S3 backends
  • ✅ Confirmed long-running tool support with peer agent calls
  • ✅ Validated SQL error handling with schema mismatches

Breaking Changes

ADK 1.18.0 introduced a resumability feature with the following API changes
in src/google/adk/runners.py:

1. Added new parameter: invocation_id (Optional[str]) for resuming interrupted invocations
2. Changed new_message from REQUIRED to OPTIONAL to support resumption without new input
3. Added strict validation: "if not invocation_id and not new_message: raise ValueError"

As a result, a caller can no longer pass None as the content without also supplying an invocation_id. To respect the new contract, the tool result content is now passed as the new message rather than being written to the session.

Relevant change:

new_tool_response_content = adk_types.Content(role="tool", parts=new_response_parts)
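
The validation rule quoted above can be mirrored in a few lines to show which argument combinations are accepted. check_run_args is a hypothetical stand-in, not the ADK runner: the returned labels and the error message are illustrative, and a plain dict takes the place of adk_types.Content.

```python
from typing import Optional


def check_run_args(new_message: Optional[object], invocation_id: Optional[str]) -> str:
    """Apply the quoted rule: at least one of the two arguments must be provided."""
    if not invocation_id and not new_message:
        raise ValueError("Either new_message or invocation_id must be provided.")
    # Illustrative labels only; the real runner does not return these.
    return "resume" if invocation_id and not new_message else "new_input"


# A plain dict standing in for adk_types.Content(role="tool", parts=...).
tool_response = {"role": "tool", "parts": ["result"]}

mode_with_message = check_run_args(tool_response, None)
mode_resume = check_run_args(None, "inv-123")

try:
    check_run_args(None, None)
    rejected = False
except ValueError:
    rejected = True
```

Passing the tool result as the new message satisfies the check without needing an invocation_id, which matches the approach described in the PR text.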

Documentation

  • Added comprehensive README in src/solace_agent_mesh/agent/adk/alembic/README
  • Includes links to Google ADK migration examples and documentation
  • Documents manual migration procedures for development use

Contributor

@enavitan enavitan left a comment

Looks good, left a few comments below.

There is quite a lot of logic there; I would suggest adding unit test coverage for it, e.g. version, metadata, and info lookup/ordering/extraction.

artifact_dir = self._get_artifact_dir(app_name, user_id, session_id, filename)
artifact_versions = []

if not await asyncio.to_thread(os.path.isdir, artifact_dir):
Contributor

@enavitan enavitan Nov 12, 2025

I understand the intent: we push I/O calls to a dedicated thread and let the main thread continue with the in-memory work.
WDYT if we just define this entire method body as blocking and then have this here:

result = await asyncio.to_thread(slow_code)

This way the dedicated thread does the whole job while the main loop waits for the result. That should make it a little more readable IMO. Up to you.
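
A minimal sketch of that suggestion, assuming the method only needs to scan a directory for integer-named version files: the names _scan_versions_blocking and scan_versions are hypothetical, and the body is simplified compared with the real service method.

```python
import asyncio
import os
import tempfile


def _scan_versions_blocking(artifact_dir: str) -> list[int]:
    """All filesystem access happens here, in plain synchronous code."""
    if not os.path.isdir(artifact_dir):
        return []
    return sorted(int(name) for name in os.listdir(artifact_dir) if name.isdigit())


async def scan_versions(artifact_dir: str) -> list[int]:
    # One hop to a worker thread; the event loop stays free while it runs.
    return await asyncio.to_thread(_scan_versions_blocking, artifact_dir)


with tempfile.TemporaryDirectory() as demo_dir:
    for n in (0, 1, 2):
        open(os.path.join(demo_dir, str(n)), "w").close()
    found = asyncio.run(scan_versions(demo_dir))

# The temporary directory no longer exists here, so the scan returns nothing.
not_found = asyncio.run(scan_versions(os.path.join(demo_dir, "gone")))
```

The trade-off is that the whole body runs in one worker thread, so the synchronous helper is easy to read and unit test, at the cost of not interleaving individual I/O calls with other coroutine work.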

session_id: str,
version: int | None = None,
) -> ArtifactVersion | None:
"""Gets the metadata for a specific version of an artifact."""
Contributor

This should probably state: Gets the ArtifactVersion for the supplied version, or the latest ArtifactVersion if no version is supplied.

for page in pages:
    for obj in page.get("Contents", []):
        parts = obj["Key"].split("/")
        if len(parts) >= 5:  # scope/user/session_or_user/filename/version
Contributor

I'd invert the condition here to minimize the nested block, i.e.:
if len(parts) < 5:  # guard check: skip this key
    continue

....
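
For illustration, here is how the guard-clause form might read in a self-contained helper. versions_from_keys is hypothetical, and taking the version from the last path segment is an assumption based on the scope/user/session_or_user/filename/version comment in the diff.

```python
def versions_from_keys(keys: list[str]) -> list[int]:
    """Extract numeric versions from object keys, skipping malformed ones early."""
    versions = []
    for key in keys:
        parts = key.split("/")
        if len(parts) < 5:  # guard: not scope/user/session_or_user/filename/version
            continue
        try:
            versions.append(int(parts[-1]))
        except ValueError:
            continue  # last segment is not a numeric version
    return sorted(versions)


demo_keys = [
    "app/u1/s1/report.csv/1",
    "app/u1/s1/report.csv/0",
    "app/u1/s1",               # too short, skipped by the guard
    "app/u1/s1/report.csv/x",  # non-numeric version, skipped
]
parsed = versions_from_keys(demo_keys)
```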

    artifact_versions.append(artifact_version)

except (ValueError, ClientError) as e:
    logger.warning(
Contributor

This should probably be an error?

session_id: str,
version: int | None = None,
) -> ArtifactVersion | None:
"""Gets the metadata for a specific version of an artifact."""
Contributor

Probably: Gets the ArtifactVersion for the supplied version, or the latest one if no version is supplied.

    component.handle_error(e, Event(EventType.MESSAGE, message))
    return None

except OperationalError as e:
Contributor

This is one complex method. Let's extract this particular exception case into a dedicated method. WDYT?

handle_db_error(...)

@enavitan enavitan requested a review from gregmeldrum November 20, 2025 17:53
@sonarqube-solacecloud

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 70%)

See analysis details on SonarQube

@enavitan enavitan marked this pull request as ready for review November 24, 2025 21:13
@cyrus2281 cyrus2281 requested a review from Copilot November 25, 2025 20:59
Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades Solace Agent Mesh to Google ADK 1.18.0 and implements automatic database schema migrations, enhanced artifact version management, and improved error handling for SQL operations. The upgrade addresses breaking changes in ADK's resumability API and adds backward compatibility for artifact storage implementations.

Key changes:

  • Automatic Alembic-based database migrations that run on agent startup
  • New artifact version management methods (list_artifact_versions, get_artifact_version) across all artifact services
  • Updated ADK flow handling to support long-running tools and prevent premature termination

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Summary per file:

  • pyproject.toml - Updates core Google dependencies to ADK 1.18.0 and related packages, adds test dependencies
  • src/solace_agent_mesh/agent/adk/services.py - Adds migration execution on SQL session service initialization and implements artifact version methods
  • src/solace_agent_mesh/agent/adk/schema_migration.py - New migration runner that executes Alembic migrations programmatically
  • src/solace_agent_mesh/agent/adk/artifacts/filesystem_artifact_service.py - Implements new artifact version management methods for filesystem storage
  • src/solace_agent_mesh/agent/adk/artifacts/s3_artifact_service.py - Implements new artifact version management methods for S3 storage
  • src/solace_agent_mesh/agent/sac/component.py - Updates ADK runner invocation to handle breaking changes in resumability API
  • src/solace_agent_mesh/agent/sac/patch_adk.py - Updates event filtering to use new ADK 1.18.0 methods
  • src/solace_agent_mesh/agent/protocol/event_handlers.py - Adds specialized error handling for SQL operational errors
  • tests/sam-test-infrastructure/src/sam_test_infrastructure/artifact_service/service.py - Updates test artifact service to support new version management methods and 3-tuple storage format
  • tests/unit/agent/adk/models/test_lite_llm_caching.py - Skips test incompatible with ADK 1.18.0 requirements
  • Makefile - Adds development workflow automation commands
  • src/solace_agent_mesh/agent/adk/alembic/* - New Alembic configuration and migration files for ADK schema compatibility

Comment on lines +1 to +2
import google.adk.sessions.database_session_service
"""ADK session DB upgrade
Copilot AI Nov 25, 2025

The import statement should be moved after the module docstring. In Python, the docstring should be the first statement in the module, followed by imports.
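
The effect can be checked with the standard-library ast module: a string literal that follows an import is just an expression statement, so the module ends up with no docstring. The two source strings below are small illustrative modules, not the actual migration file.

```python
import ast

# Same two statements, in the two possible orders.
import_first = 'import os\n"""ADK session DB upgrade"""\n'
docstring_first = '"""ADK session DB upgrade"""\nimport os\n'

# ast.get_docstring only looks at the first statement of the module body.
doc_when_import_first = ast.get_docstring(ast.parse(import_first))
doc_when_docstring_first = ast.get_docstring(ast.parse(docstring_first))
```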

Collaborator

This ☝🏻

Collaborator

@cyrus2281 cyrus2281 left a comment

Nothing major stood out to me.
Extensive testing is needed to ensure backward compatibility.

@cyrus2281
Collaborator

The title should not be chore; it should at least be a fix or feat.

5 participants