
Conversation

@ayanray089 (Contributor) commented Oct 9, 2025

Description

This PR adds a MongoDB Atlas Memory tool that provides semantic memory management, using MongoDB Atlas as the storage backend and vector embeddings for semantic search.

Key Features:

  • Semantic Search: Automatic embedding generation using Amazon Bedrock Titan for vector similarity search
  • Memory Management: Store, retrieve, list, get, and delete memory operations
  • Index Management: Automatic vector search index creation with proper configuration
  • Namespace Support: Organize memories by namespace for multi-user scenarios
  • Pagination: Support for paginated results in list and retrieve operations
  • Error Handling: Comprehensive error handling with clear error messages

Implementation Details:

  • Uses MongoDB Atlas Vector Search with $vectorSearch aggregation pipeline
  • Supports Amazon Bedrock Titan v2 embeddings (1024 dimensions, cosine similarity)
  • Includes namespace-based data isolation for multi-tenant scenarios
  • Implements pagination support with next_token (skip/limit pattern)
  • Follows the same action-based interface as elasticsearch_memory (record, retrieve, list, get, delete); a usage sketch follows below
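
For orientation, here is a minimal sketch of how that action-based interface is used, modeled on the README examples quoted later in this thread. The import paths, the namespace/query/max_results parameter names, and the environment-based connection configuration are assumptions, not the final API.

```python
from strands import Agent  # assumed import path for the Agent class
from strands_tools import mongodb_memory  # assumed module-level export of the tool

# Assumes the Atlas connection settings are supplied via environment variables,
# as the commit notes mention; the exact variable names are not shown in this PR.
agent = Agent(tools=[mongodb_memory])

# Store a memory; an embedding is generated with Bedrock Titan behind the scenes.
agent.tool.mongodb_memory(
    action="record",
    content="User prefers vegetarian pizza with extra cheese",
    metadata={"category": "food_preferences", "type": "dietary"},
    namespace="user_123",
)

# Retrieve memories by semantic similarity.
agent.tool.mongodb_memory(
    action="retrieve",
    query="What food does the user like?",
    namespace="user_123",
    max_results=5,
)
```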

Related Issues

Documentation PR

Type of Change

New Tool

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

  • All 27 unit tests pass successfully

  • Integration tests with real MongoDB Atlas cluster verified functionality

  • Code formatting and linting checks pass

  • No breaking changes to existing functionality

Test Coverage:

  • Complete unit test suite with 27 test functions covering all CRUD operations
  • Mocking of MongoDB client and Bedrock client for isolated testing
  • Error handling scenarios, pagination, namespaces, and metadata handling
  • Integration test with real MongoDB Atlas credentials (kept local)

Checklist

  • I have read the CONTRIBUTING document

  • I have added any necessary tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Ayan Ray added 3 commits October 9, 2025 11:51
- Implement mongodb_memory.py following elasticsearch_memory.py patterns
- Add MongoDB Atlas vector search with Amazon Bedrock Titan v2 embeddings
- Support all CRUD operations: record, retrieve, list, get, delete
- Include namespace-based data isolation and pagination
- Add comprehensive unit tests (27 tests) with full coverage
- Update pyproject.toml with pymongo optional dependency
- Graceful error handling for vector index creation
- Production-ready with proper logging and validation
- Implement mongodb_memory.py following elasticsearch_memory.py design patterns
- Add MongoDB Atlas Vector Search with $vectorSearch aggregation pipeline
- Support Amazon Bedrock Titan v2 embeddings (1024 dimensions, cosine similarity)
- Include namespace-based data isolation for multi-tenant scenarios
- Add pagination support with next_token (skip/limit pattern)
- Comprehensive error handling and logging
- Add 27 unit tests covering all CRUD operations and edge cases
- Create detailed documentation following elasticsearch_memory_tool.md pattern
- Update README.md with MongoDB Atlas Memory usage examples
- Add environment variables configuration for MongoDB Atlas
- Add mongodb_memory optional dependency to pyproject.toml
- Integration test file contains sensitive credentials and should remain local only
DEFAULT_VECTOR_INDEX_NAME = "vector_index"


def _ensure_vector_search_index(collection, index_name: str = DEFAULT_VECTOR_INDEX_NAME):

@JackYPCOnline (Contributor) commented Oct 11, 2025:
For each parameter, we should specify a type annotation.
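
A sketch of what the annotated signature could look like, assuming the first parameter is a pymongo Collection (the actual type is not shown in the quoted snippet):

```python
from pymongo.collection import Collection

DEFAULT_VECTOR_INDEX_NAME = "vector_index"  # as defined above


def _ensure_vector_search_index(
    collection: Collection,
    index_name: str = DEFAULT_VECTOR_INDEX_NAME,
) -> None:
    """Create the Atlas vector search index if it does not already exist."""
    ...
```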

logger.info(f"Created vector search index: {index_name}")

# Wait a moment for index to be ready
import time

Contributor:
Why not move this import to the top of the module?

# Don't raise exception - allow the tool to work without vector search


def _generate_embedding(bedrock_runtime, text: str, embedding_model: str) -> List[float]:

Contributor:
Same here, and in the remaining functions.


Contributor:
Also functions shouldn't receive infrastructure dependencies as parameters
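
For context, here is a self-contained sketch of how a Titan v2 embedding call typically looks with boto3, with the client created inside the helper rather than passed in (addressing the dependency point above). The model ID shown is the standard Titan v2 identifier; the tool's actual defaults and region handling are assumptions.

```python
import json
from typing import List

import boto3


def generate_embedding(text: str, model_id: str = "amazon.titan-embed-text-v2:0") -> List[float]:
    """Return a 1024-dimensional embedding for `text` from Amazon Bedrock Titan v2."""
    bedrock_runtime = boto3.client("bedrock-runtime")  # region/credentials come from the environment
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```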

skip_count = int(next_token) if next_token else 0

# Query for memories in namespace
cursor = (

Contributor:
Question: is this naming consistent with the other memory tools?

# Query for memories in namespace
cursor = (
collection.find(
{"namespace": namespace}, {"memory_id": 1, "content": 1, "timestamp": 1, "metadata": 1, "_id": 0}

Contributor:
Why are these numbers hardcoded?
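
For what it's worth, the 1/0 values in that second argument are MongoDB projection flags (1 includes a field, 0 excludes it) rather than tunable numbers; together with skip/limit they implement the next_token pagination described in the PR. A sketch of that pattern, with the sort key being an assumption:

```python
skip_count = int(next_token) if next_token else 0

cursor = (
    collection.find(
        {"namespace": namespace},
        # Projection flags: 1 = include the field, 0 = exclude it (drop Mongo's internal _id).
        {"memory_id": 1, "content": 1, "timestamp": 1, "metadata": 1, "_id": 0},
    )
    .sort("timestamp", -1)  # newest first (assumed ordering)
    .skip(skip_count)       # resume where the previous page stopped
    .limit(max_results)     # page size
)
memories = list(cursor)

# Hand back a next_token only if this page was full and more results may exist.
next_token = str(skip_count + len(memories)) if len(memories) == max_results else None
```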

README.md Outdated
action="record",
content="User prefers vegetarian pizza with extra cheese",
metadata={"category": "food_preferences", "type": "dietary"},
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority",

Member:
I believe cluster_uri can be defined upfront (using class-based init) so that agents have no access to potentially sensitive information like the cluster password.

# Wait a moment for index to be ready
import time

time.sleep(2)

Member:
I think we can remove the time.sleep(2) from here ^^

"index": index_name,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": max_results * 3,

Member:
Is there a specific reason we're gathering x3 results?
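
For reference, the oversampling appears in the $vectorSearch stage roughly as below: numCandidates is how many approximate-nearest-neighbor candidates Atlas examines before returning the top limit hits, so the 3x factor is a recall/latency trade-off chosen by this tool rather than anything Atlas requires. The namespace filter and projection shown are assumptions.

```python
pipeline = [
    {
        "$vectorSearch": {
            "index": index_name,
            "path": "embedding",
            "queryVector": query_embedding,
            # Examine more ANN candidates than we return to improve recall;
            # the 3x oversampling factor is this tool's choice, not an Atlas default.
            "numCandidates": max_results * 3,
            "limit": max_results,
            "filter": {"namespace": namespace},  # requires namespace to be a filter field in the index
        }
    },
    {
        "$project": {
            "_id": 0,
            "memory_id": 1,
            "content": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    },
]
results = list(collection.aggregate(pipeline))
```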

)
return {
"status": "success",
"content": [{"text": f"Memories retrieved successfully: {json.dumps(response, default=str)}"}],

Member:
We can place the JSON under a "content": [{"json": ...}] block instead of serializing it into text here ^^
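
A sketch of the suggested shape, assuming the tool result format accepts a json content block alongside text as indicated above:

```python
# Inside the retrieve handler, instead of json.dumps-ing the payload into the text block:
return {
    "status": "success",
    "content": [
        {"text": "Memories retrieved successfully"},
        {"json": response},  # structured payload instead of a serialized string
    ],
}
```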

- Implement complete MongoDB Atlas memory tool with vector search capabilities
- Add semantic search using Amazon Bedrock Titan embeddings (1024 dimensions)
- Support full CRUD operations: record, retrieve, list, get, delete
- Add namespace support for multi-user memory isolation
- Include environment variable configuration support
- Add security features including connection string masking
- Implement JSON response format optimization
- Add comprehensive test suite with 27 test cases covering all functionality
- Follow same design principles as existing Elasticsearch memory tool

@ayanray089 (Contributor Author) commented:
@JackYPCOnline, @cagataycali Addressed all the feedback in the latest commit

)
# Create agent with secure tool usage
agent = Agent(tools=[memory_tool.mongodb_memory])

Contributor:
Should be: mongodb_memory

from strands_tools.mongodb_memory import MongoDBMemoryTool
# RECOMMENDED: Secure class-based approach (credentials hidden from agents)
memory_tool = MongoDBMemoryTool(

Contributor:
This example is different from the docs. Please keep all docs consistent.
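
A sketch of how that class-based example could continue so the agent never sees the connection string; the constructor parameters and environment-variable name shown here are hypothetical:

```python
import os

from strands import Agent  # assumed import path
from strands_tools.mongodb_memory import MongoDBMemoryTool

# The URI (with credentials) is read once from the environment and bound to the
# tool instance up front, so it never appears in an agent-visible tool call.
memory_tool = MongoDBMemoryTool(
    cluster_uri=os.environ["MONGODB_ATLAS_URI"],  # hypothetical variable name
    database_name="memories",                     # assumed constructor parameter
    collection_name="agent_memories",             # assumed constructor parameter
)

agent = Agent(tools=[memory_tool.mongodb_memory])
```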


try:
# Pattern to match mongodb+srv://username:password@host/...
import re

Contributor:
Move this import to the top.
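
For context, the masking that block implements looks roughly like the following, with `re` imported at module level as requested; the exact pattern is an assumption, since only its comment is quoted here.

```python
import re


def _mask_connection_string(uri: str) -> str:
    """Hide the password in a mongodb+srv:// URI before logging or echoing it."""
    # mongodb+srv://user:[email protected]/ -> mongodb+srv://user:***@cluster0.mongodb.net/
    return re.sub(r"(mongodb\+srv://[^:/@]+:)[^@]+(@)", r"\1***\2", uri)
```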

result = agent.tool.mongodb_memory(
action="record",
content="User prefers vegetarian pizza with extra cheese",
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority",

Contributor:
actually we don't need ?retryWrites=true&w=majority

## Prerequisites

1. **MongoDB Atlas**: You need a MongoDB Atlas cluster with:
- Connection URI (mongodb+srv format)

Contributor:
Assuming the user has no MongoDB knowledge, we should point them to where this URI can be found.

- Fix documentation examples to use consistent class-based approach
- Remove unnecessary query parameters from connection string examples
- Add comprehensive MongoDB Atlas connection URI guidance
- Add explanatory code comments for numCandidates usage
- Ensure all examples follow the same pattern throughout documentation
- All 27 unit tests continue to pass
- Tested with real MongoDB Atlas credentials successfully

@ayanray089 (Contributor Author) commented:
@JackYPCOnline Addressed the latest feedback

JackYPCOnline previously approved these changes Oct 21, 2025
- Resolve conflicts in README.md by merging MongoDB Atlas Memory Tool and Retrieve Tool sections
- Resolve conflicts in pyproject.toml by including both elasticsearch_memory and mongodb_memory dependencies

@JackYPCOnline (Contributor) left a comment:
Please review all files thoroughly to ensure code quality

README.md Outdated
action="record",
content="User prefers vegetarian pizza with extra cheese",
metadata={"category": "food_preferences", "type": "dietary"},
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority",

Contributor:
We don't need ?retryWrites=true&w=majority here and in all other places.


Contributor Author:
Done

README.md Outdated
| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| RETRIEVE_ENABLE_METADATA_DEFAULT | Default setting for enabling metadata in retrieve tool responses | false |
>>>>>>> origin/main

Contributor:
Please remove (leftover merge-conflict marker).


Contributor Author:
Done


Member:
Looks like the conflict resolution is not done here ^^

Suggested change: remove the ">>>>>>> origin/main" line.

…MongoDB connection strings

- Remove ?retryWrites=true&w=majority from all MongoDB connection string examples in README.md
- Clean up connection string format to follow best practices
- Addresses review comment about unnecessary query parameters

@ayanray089 (Contributor Author) commented:
@JackYPCOnline Addressed latest feedback

README.md Outdated
| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| RETRIEVE_ENABLE_METADATA_DEFAULT | Default setting for enabling metadata in retrieve tool responses | false |
>>>>>>> origin/main

Member:
Suggested change: remove the ">>>>>>> origin/main" line.

Can we remove this too ^^

response = _record_memory(collection, bedrock_runtime, namespace, self._embedding_model, content, metadata)
return {
"status": "success",
"content": [{"text": f"Memory stored successfully: {json.dumps(response, default=str)}"}],

@cagataycali (Member) commented Oct 23, 2025:
Can we return the JSON as a content block?

"content" blocks supports "json",

"content": [{"text": f"Memory stored successfully: {json.dumps(response, default=str)}"}],

"content": [{"text": "Memory stored successfully"}, {"json": response}],

raise MongoDBEmbeddingError(f"Embedding generation failed: {str(e)}") from e


def _truncate_content(content: str, max_length: int = MAX_CONTENT_LENGTH) -> str:

Member:
Looks like the default MAX_CONTENT_LENGTH is set to 30, which will be pretty short for the retrieved context.

- Increased MAX_RESPONSE_SIZE from 1,000 to 70,000 characters
- Increased MAX_CONTENT_LENGTH from 8,000 to 12,000 characters
- Increased MAX_MEMORIES_IN_RESPONSE from 2 to 5 memories
- Fixed unit tests to correctly access JSON from response content
- All 27 unit tests now passing

@ayanray089 (Contributor Author) commented:
@JackYPCOnline @cagataycali Addressed the latest comments
