
⚡️ Speed up method AstraDBVectorStoreComponent.reset_collection_list by 12% in PR #6048 (bugfix-dev-astradb) #6085

Open: wants to merge 9 commits into base: main


codeflash-ai bot commented Feb 3, 2025

⚡️ This pull request contains optimizations for PR #6048

If you approve this dependent PR, these changes will be merged into the original PR branch bugfix-dev-astradb.

This PR will be automatically closed if the original PR is merged.


📄 12% speedup (1.12x) for AstraDBVectorStoreComponent.reset_collection_list in src/backend/base/langflow/components/vectorstores/astradb.py

⏱️ Runtime: 417 milliseconds → 373 milliseconds (best of 5 runs)
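The reported percentages appear consistent with speedup = old runtime / new runtime - 1 (417 / 373 - 1 ≈ 0.118, shown as 12%). That formula is inferred from the numbers rather than documented here. The sketch below computes it and shows one way to take a "best of 5" timing with Python's `timeit.repeat`, assuming "best of 5 runs" means the minimum of five timings; `speedup` and `best_of` are hypothetical helper names.

```python
import timeit
from typing import Callable


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Relative speedup as old/new - 1, so 0.12 means 12% faster."""
    return old_seconds / new_seconds - 1.0


def best_of(func: Callable[[], object], runs: int = 5) -> float:
    """Time one call of func `runs` times and keep the fastest (assumed meaning of 'best of N')."""
    return min(timeit.repeat(func, number=1, repeat=runs))


# Runtimes reported in this PR, in milliseconds.
print(f"{speedup(417, 373):.0%}")  # reset_collection_list: 12%
print(f"{speedup(408, 336):.0%}")  # _initialize_collection_options: 21%
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks because slower runs are usually caused by interference from other processes, not by the code under test.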

📝 Explanation and details

This optimization improves runtime by reducing how often certain functions are called and by avoiding redundant computation: collection data is not retrieved more than once, and objects that have already been created are reused where possible.

Changes made:

  1. Caching the keyspace: the keyspace value is stored in a local variable (keyspace), so get_keyspace() is called once instead of repeatedly.
  2. Metadata extraction helper: instead of extracting metadata directly inside the list comprehension, a helper function, get_collection_metadata, builds each collection's metadata so the same data is not fetched again.
  3. Single pass for options and metadata: options and metadata are generated together in one loop rather than by separate comprehensions.

These changes reduce the number of function calls and loop iterations, and they also improve the readability and maintainability of the code.
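The three changes listed above can be illustrated with a small, self-contained sketch. The actual diff to astradb.py is not reproduced in this description, so `CollectionListSketch`, `FakeDatabase`, `make_collection`, and the zero record count are hypothetical stand-ins, and passing `keyspace` to `list_collections` is an assumption. The `build_config` layout (`options`, plus `options_metadata` entries with `records`, `provider`, `icon`, `model`) follows the generated regression tests in this PR.

```python
from types import SimpleNamespace
from typing import Any, Optional


class FakeDatabase:
    """Hypothetical stand-in for the Astra DB database object."""

    def __init__(self, collections: list) -> None:
        self._collections = collections

    def list_collections(self, keyspace: Optional[str] = None) -> list:
        # A real client would scope the listing to the keyspace; this fake ignores it.
        return list(self._collections)


class CollectionListSketch:
    """Hypothetical component illustrating the three listed optimizations."""

    def __init__(self, database: FakeDatabase) -> None:
        self._database = database
        self.keyspace_calls = 0  # counts get_keyspace() calls for illustration

    def get_keyspace(self) -> str:
        self.keyspace_calls += 1
        return "default_keyspace"

    def collection_data(self, collection_name: str, database: Any) -> int:
        return 0  # hypothetical placeholder for a per-collection record count

    def reset_collection_list(self, build_config: dict) -> dict:
        database = self._database
        # 1. Cache the keyspace: call get_keyspace() once and reuse the value.
        keyspace = self.get_keyspace()

        # 2. Helper that reads each collection's vector service exactly once.
        def get_collection_metadata(col: Any) -> dict:
            vector = col.options.vector
            service = vector.service if vector is not None else None
            return {
                "records": self.collection_data(collection_name=col.name, database=database),
                "provider": service.provider if service is not None else None,
                "icon": "",
                "model": service.model_name if service is not None else None,
            }

        # 3. Build options and metadata together in a single pass.
        options, metadata = [], []
        for col in database.list_collections(keyspace=keyspace):
            options.append(col.name)
            metadata.append(get_collection_metadata(col))

        build_config["collection_name"]["options"] = options
        build_config["collection_name"]["options_metadata"] = metadata
        return build_config


def make_collection(name: str, provider: Optional[str], model: Optional[str]) -> SimpleNamespace:
    """Build a fake collection shaped like col.options.vector.service.*"""
    service = SimpleNamespace(provider=provider, model_name=model)
    return SimpleNamespace(name=name, options=SimpleNamespace(vector=SimpleNamespace(service=service)))


collections = [
    make_collection("docs", "example_provider", "example-model"),
    SimpleNamespace(name="legacy", options=SimpleNamespace(vector=None)),  # no vector metadata
]
component = CollectionListSketch(FakeDatabase(collections))
config = component.reset_collection_list({"collection_name": {"options": [], "value": ""}})
print(config["collection_name"]["options"])  # ['docs', 'legacy']
print(component.keyspace_calls)  # 1
```

The `vector is not None` guard mirrors the `test_missing_metadata` case, where a collection without vector options is still listed with `provider` and `model` set to None.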

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 19 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | undefined |
🌀 Generated Regression Tests Details
from unittest.mock import MagicMock

# imports
import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

@pytest.fixture
def mock_component():
    component = AstraDBVectorStoreComponent()
    component.get_database_object = MagicMock()
    component.get_keyspace = MagicMock(return_value="test_keyspace")
    component.collection_data = MagicMock(return_value=[])
    return component

def test_reset_collection_list_empty_db(mock_component):
    # Setup empty database mock
    mock_component.get_database_object().list_collections.return_value = []
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_reset_collection_list_single_collection(mock_component):
    # Setup database mock with one collection
    mock_collection = MagicMock()
    mock_collection.name = "collection1"
    mock_collection.options.vector.service.provider = "provider1"
    mock_collection.options.vector.service.model_name = "model1"
    mock_component.get_database_object().list_collections.return_value = [mock_collection]
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_reset_collection_list_multiple_collections(mock_component):
    # Setup database mock with multiple collections
    mock_collection1 = MagicMock()
    mock_collection1.name = "collection1"
    mock_collection1.options.vector.service.provider = "provider1"
    mock_collection1.options.vector.service.model_name = "model1"
    
    mock_collection2 = MagicMock()
    mock_collection2.name = "collection2"
    mock_collection2.options.vector.service.provider = "provider2"
    mock_collection2.options.vector.service.model_name = "model2"
    
    mock_component.get_database_object().list_collections.return_value = [mock_collection1, mock_collection2]
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_initialize_collection_options_no_api_endpoint(mock_component):
    # Setup database mock
    mock_component.get_database_object().list_collections.return_value = []
    
    # Call _initialize_collection_options without api_endpoint
    result = mock_component._initialize_collection_options()

def test_initialize_collection_options_invalid_api_endpoint(mock_component):
    # Setup database mock to raise an error for invalid api_endpoint
    mock_component.get_database_object.side_effect = Exception("Invalid API Endpoint")
    
    # Call _initialize_collection_options with invalid api_endpoint
    with pytest.raises(Exception, match="Invalid API Endpoint"):
        mock_component._initialize_collection_options(api_endpoint="invalid_endpoint")

def test_reset_collection_list_db_connection_failure(mock_component):
    # Setup database mock to simulate connection failure
    mock_component.get_database_object.side_effect = Exception("Database connection failed")
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    with pytest.raises(Exception, match="Database connection failed"):
        mock_component.reset_collection_list(build_config)

def test_reset_collection_list_invalid_build_config(mock_component):
    # Setup database mock
    mock_component.get_database_object().list_collections.return_value = []
    build_config = {"invalid_key": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    with pytest.raises(KeyError):
        mock_component.reset_collection_list(build_config)

def test_reset_collection_list_mutation(mock_component):
    # Setup database mock
    mock_collection = MagicMock()
    mock_collection.name = "collection1"
    mock_collection.options.vector.service.provider = "provider1"
    mock_collection.options.vector.service.model_name = "model1"
    mock_component.get_database_object().list_collections.return_value = [mock_collection]
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_reset_collection_list_reset_selected(mock_component):
    # Setup database mock
    mock_collection = MagicMock()
    mock_collection.name = "collection1"
    mock_collection.options.vector.service.provider = "provider1"
    mock_collection.options.vector.service.model_name = "model1"
    mock_component.get_database_object().list_collections.return_value = [mock_collection]
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "collection1"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_reset_collection_list_large_data(mock_component):
    # Setup database mock with a large number of collections
    mock_collections = []
    for i in range(1000):
        mock_collection = MagicMock()
        mock_collection.name = f"collection{i}"
        mock_collection.options.vector.service.provider = f"provider{i}"
        mock_collection.options.vector.service.model_name = f"model{i}"
        mock_collections.append(mock_collection)
    mock_component.get_database_object().list_collections.return_value = mock_collections
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)

def test_reset_collection_list_complex_metadata(mock_component):
    # Setup database mock with collections having complex metadata
    mock_collection = MagicMock()
    mock_collection.name = "complex_collection"
    mock_collection.options.vector.service.provider = "complex_provider"
    mock_collection.options.vector.service.model_name = "complex_model"
    mock_collection.options.vector.service.extra_metadata = {"key": "value"}
    mock_component.get_database_object().list_collections.return_value = [mock_collection]
    build_config = {"collection_name": {"options": [], "options_metadata": [], "value": "test"}}
    
    # Call reset_collection_list
    codeflash_output = mock_component.reset_collection_list(build_config)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from unittest.mock import MagicMock

# imports
import pytest  # used for our unit tests
# function to test
from langflow.base.vectorstores.model import LCVectorStoreComponent
from langflow.components.vectorstores.astradb import \
    AstraDBVectorStoreComponent

# unit tests

@pytest.fixture
def mock_component():
    component = AstraDBVectorStoreComponent()
    component.get_database_object = MagicMock()
    component.get_keyspace = MagicMock(return_value="test_keyspace")
    component.collection_data = MagicMock(return_value={"data": "test_data"})
    return component

def test_basic_functionality(mock_component):
    # Mock database and collections
    mock_database = MagicMock()
    mock_collection = MagicMock()
    mock_collection.name = "test_collection"
    mock_collection.options.vector.service.provider = "test_provider"
    mock_collection.options.vector.service.model_name = "test_model"
    mock_database.list_collections = MagicMock(return_value=[mock_collection])
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": ["test_collection"],
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": "test_provider",
                "icon": "",
                "model": "test_model"
            }],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_empty_database(mock_component):
    # Mock database with no collections
    mock_database = MagicMock()
    mock_database.list_collections = MagicMock(return_value=[])
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": [],
            "options_metadata": [],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_single_collection(mock_component):
    # Mock database with one collection
    mock_database = MagicMock()
    mock_collection = MagicMock()
    mock_collection.name = "single_collection"
    mock_collection.options.vector.service.provider = None
    mock_collection.options.vector.service.model_name = None
    mock_database.list_collections = MagicMock(return_value=[mock_collection])
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": ["single_collection"],
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": None,
                "icon": "",
                "model": None
            }],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_missing_metadata(mock_component):
    # Mock database with collections missing metadata
    mock_database = MagicMock()
    mock_collection = MagicMock()
    mock_collection.name = "collection_without_metadata"
    mock_collection.options.vector = None
    mock_database.list_collections = MagicMock(return_value=[mock_collection])
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": ["collection_without_metadata"],
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": None,
                "icon": "",
                "model": None
            }],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_large_number_of_collections(mock_component):
    # Mock database with a large number of collections
    mock_database = MagicMock()
    mock_collections = []
    for i in range(1000):
        mock_collection = MagicMock()
        mock_collection.name = f"collection_{i}"
        mock_collection.options.vector.service.provider = "test_provider"
        mock_collection.options.vector.service.model_name = "test_model"
        mock_collections.append(mock_collection)
    mock_database.list_collections = MagicMock(return_value=mock_collections)
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": [f"collection_{i}" for i in range(1000)],
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": "test_provider",
                "icon": "",
                "model": "test_model"
            } for i in range(1000)],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_special_characters_in_collection_names(mock_component):
    # Mock database with collections having special characters in names
    mock_database = MagicMock()
    special_names = ["test@collection", "collection with spaces", "unicode_测试"]
    mock_collections = []
    for name in special_names:
        mock_collection = MagicMock()
        mock_collection.name = name
        mock_collection.options.vector.service.provider = "test_provider"
        mock_collection.options.vector.service.model_name = "test_model"
        mock_collections.append(mock_collection)
    mock_database.list_collections = MagicMock(return_value=mock_collections)
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    expected_build_config = {
        "collection_name": {
            "options": special_names,
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": "test_provider",
                "icon": "",
                "model": "test_model"
            } for _ in special_names],
            "value": ""
        }
    }

    codeflash_output = mock_component.reset_collection_list(build_config)

def test_invalid_build_config_structure(mock_component):
    # Test with invalid build config structure
    build_config = {
        "invalid_key": {
            "options": [],
            "value": ""
        }
    }

    with pytest.raises(KeyError):
        mock_component.reset_collection_list(build_config)

def test_null_api_endpoint(mock_component):
    # Test with null API endpoint
    mock_database = MagicMock()
    mock_collection = MagicMock()
    mock_collection.name = "test_collection"
    mock_collection.options.vector.service.provider = "test_provider"
    mock_collection.options.vector.service.model_name = "test_model"
    mock_database.list_collections = MagicMock(return_value=[mock_collection])
    mock_component.get_database_object.return_value = mock_database

    build_config = {
        "collection_name": {
            "options": [],
            "value": ""
        }
    }

    mock_component._initialize_collection_options(api_endpoint=None)
    codeflash_output = mock_component.reset_collection_list(build_config)
    
    expected_build_config = {
        "collection_name": {
            "options": ["test_collection"],
            "options_metadata": [{
                "records": {"data": "test_data"},
                "provider": "test_provider",
                "icon": "",
                "model": "test_model"
            }],
            "value": ""
        }
    }
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
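The comment above says `codeflash_output` is used to check that the original and optimized code produce the same output. Codeflash's own comparison harness is not shown in this PR; the following is a minimal sketch of that kind of differential check, assuming "same output" means equal return values and equal inputs after the call. `outputs_match`, `original`, and `optimized` are hypothetical names.

```python
import copy
from typing import Any, Callable


def outputs_match(
    original: Callable[[dict], Any],
    optimized: Callable[[dict], Any],
    build_config: dict,
) -> bool:
    """Run both versions on independent copies of one input and compare results and mutated inputs."""
    config_a = copy.deepcopy(build_config)
    config_b = copy.deepcopy(build_config)
    result_a = original(config_a)
    result_b = optimized(config_b)
    return result_a == result_b and config_a == config_b


def original(cfg: dict) -> dict:
    # Two separate comprehensions, in the style of the pre-optimization code.
    names = [f"c{i}" for i in range(3)]
    cfg["collection_name"]["options"] = names
    cfg["collection_name"]["options_metadata"] = [{"model": None} for _ in names]
    return cfg


def optimized(cfg: dict) -> dict:
    # One loop that fills both lists, in the style of the optimized code.
    options, metadata = [], []
    for i in range(3):
        options.append(f"c{i}")
        metadata.append({"model": None})
    cfg["collection_name"]["options"] = options
    cfg["collection_name"]["options_metadata"] = metadata
    return cfg


base = {"collection_name": {"options": [], "value": ""}}
print(outputs_match(original, optimized, base))  # True
```

Deep-copying matters because `reset_collection_list` appears to update `build_config` in place (the tests' expected values suggest this); if both runs shared one dict, the second run would start from the first run's results.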

Codeflash

erichare and others added 8 commits January 31, 2025 11:31
… improve code structure and readability

📝 (duck_duck_go_search_run.py): update DuckDuckGoSearchComponent with new display name, description, and documentation URL
📝 (duck_duck_go_search_run.py): update DuckDuckGoSearchComponent inputs with additional information and tool mode
📝 (duck_duck_go_search_run.py): update DuckDuckGoSearchComponent outputs with new output methods and display names
📝 (duck_duck_go_search_run.py): update DuckDuckGoSearchComponent methods to improve clarity and functionality
…` by 12% in PR #6048 (`bugfix-dev-astradb`)

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label Feb 3, 2025
dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and python (Pull requests that update Python code) labels Feb 3, 2025
codeflash-ai bot added a commit that referenced this pull request Feb 3, 2025
…n_options` by 21% in PR #6085 (`codeflash/optimize-pr6048-2025-02-03T13.54.58`)

Here's the optimized version.

codeflash-ai bot commented Feb 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 21% speedup (1.21x) for AstraDBVectorStoreComponent._initialize_collection_options in src/backend/base/langflow/components/vectorstores/astradb.py

⏱️ Runtime: 408 milliseconds → 336 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch codeflash/optimize-pr6048-2025-02-03T13.54.58).

Base automatically changed from bugfix-dev-astradb to main February 3, 2025 15:53
dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:M label Feb 3, 2025
Labels: ⚡️ codeflash, python, size:L