Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do we insert variables into tool docstring? #271

Closed
off6atomic opened this issue May 23, 2024 · 4 comments
Closed

How do we insert variables into tool docstring? #271

off6atomic opened this issue May 23, 2024 · 4 comments
Labels
Question Further information is needed

Comments

@off6atomic
Copy link
Contributor

off6atomic commented May 23, 2024

Question

This is how we insert arbitrary variables into tool description from the official OpenAI cookbook guide:

tools = [
    {
        "type": "function",
        "function": {
            "name": "ask_database",
            "description": "Use this function to answer user questions about music. Input should be a fully formed SQL query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": f"""
                                SQL query extracting info to answer the user's question.
                                SQL should be written using this database schema:
                                {database_schema_string}
                                The query should be returned in plain text, not in JSON.
                                """,
                    }
                },
                "required": ["query"],
            },
        }
    }
]

Notice that we are inserting the database schema into the function specification. This will be important for the model to know about.

How can we achieve same behavior in Mirascope?

Goal

I'm trying to inject schema of a Postgres table into my docstring so that the model can interpret list of values being returned from the function correctly.

@off6atomic off6atomic added the Question Further information is needed label May 23, 2024
@ashishpatel26
Copy link

To achieve similar functionality in Mirascope, where you want to inject the schema of a PostgreSQL table into your docstring so that the model can interpret the list of values being returned from the function correctly, you would follow a process similar to what you described but tailored to Mirascope's capabilities and syntax. Since Mirascope doesn't have a direct equivalent to the OpenAI cookbook guide's method of embedding dynamic content within docstrings, you'll need to manually construct your docstring to include the necessary details about the database schema.

Here's a step-by-step approach to accomplish this:

  1. Retrieve the Database Schema: First, you need to retrieve the schema of the PostgreSQL table you're interested in. You can do this programmatically using SQL queries or through database management tools like pgAdmin or DBeaver.

  2. Convert the Schema to a String: Once you have the schema, convert it into a string format that can be easily embedded into your docstring. This might involve formatting the schema in a way that clearly outlines the columns, their types, and any constraints they may have.

  3. Embed the Schema in Your Docstring: When defining your function in Mirascope, manually embed the schema string into the docstring of your function definition. Ensure that the schema is formatted in a clear and understandable manner for anyone reading the documentation or for the model interpreting the function.

Here's an example of how you might define such a function in Mirascope, assuming you've retrieved and converted your PostgreSQL table schema into a string named postgres_table_schema:

def get_user_data(user_id):
    """
    Retrieve detailed information about a user based on their ID.

    :param user_id: The ID of the user to retrieve information for.
    :return: A dictionary containing the user's information, including fields like name, email, and registration date.

    Note: The following schema describes the structure of the data returned by this function:

    {postgres_table_schema}

    Example return value:
    {
        "id": 123,
        "name": "John Doe",
        "email": "[email protected]",
        "registration_date": "2024-01-01"
    }
    """
    # Function implementation here

In this example, {postgres_table_schema} is a placeholder for the actual schema string you've prepared. Replace it with the actual schema string when documenting your function.

Remember, the key to successfully injecting schema information into your docstring lies in accurately representing the schema in a readable format and ensuring that the schema string is correctly embedded into the docstring. This approach requires manual intervention but allows for precise control over the information presented to the model and users of your function.

@off6atomic
Copy link
Contributor Author

off6atomic commented May 23, 2024

@ashishpatel26 Manual injection is OK but it's not ideal solution in my case. The database schema could be updated frequently. It's going to be cumbersome to have to update the docstring every time your schema changes.

Also if possible, please instruct your ChatGPT to be a bit more concise. If all it's going to say is to inject data manually, it could do that in 3 sentences max.

Appreciate your help though.

@willbakst
Copy link
Contributor

@off6atomic this is really interesting. I've thought a little bit about this:

  1. I filed an issue to enable using the raw schema. This would unblock any issues with using functions/tools since you would then be able to always just write the raw schema if necessary (similar to how we let you write your own messages array in case our prompt template parser doesn't support something).
  2. I'm trying to think through how we would best support this. It's tough because the function conversion relies on parsing the docstring, which is difficult to make dynamic. We generate the schema from the tool class definition, which similarly makes it difficult to make dynamic. As a first step, we should allow for the raw schema (point 1) to unblock while I try to figure out the best path forward here. For now, I've filed another feature request issue so we can better track this there.

In the meantime, you can also always just insert the dynamic string into the prompt_template and reference the tool. This likely won't be as good as inserting it directly into the tool definition, but it should still (hopefully) work while we work towards resolving the two issues I posted.

Thanks for bringing this up!

@willbakst
Copy link
Contributor

I'm closing this issue as all related future work will be tracked in #276 and #278 (alongside #294)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question Further information is needed
Projects
None yet
Development

No branches or pull requests

3 participants