
Text-to-SQL proof of concept #5788

Status: Open · wants to merge 51 commits into main
Conversation

@ad-elias commented Jun 9, 2024

Added:

  • An "Ask AI" command to the command menu.
  • A simple GraphQL resolver that converts the user's question into a relevant SQL query using an LLM, runs the query, and returns the result (a rough sketch of this flow follows below).
[Screenshot: 2024-06-09 at 20:53]

No security concerns have been addressed; this is only a proof of concept and is not intended to be enabled in production.

All changes are behind a feature flag called IS_ASK_AI_ENABLED.
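
For illustration, here is a minimal sketch of the question-to-SQL flow described above, using LangChain's ChatOpenAI. The `WorkspaceDb` interface, function names, and prompt wording are assumptions for the sketch, not the PR's actual API:

```ts
import { ChatOpenAI } from '@langchain/openai';
import { PromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

// Hypothetical stand-in for the workspace data source used in the PR.
interface WorkspaceDb {
  getTableInfo(workspaceId: string): Promise<string>;
  executeRawQuery(workspaceId: string, sql: string): Promise<unknown[]>;
}

export async function askAI(
  db: WorkspaceDb,
  workspaceId: string,
  question: string,
): Promise<unknown[]> {
  const prompt = PromptTemplate.fromTemplate(
    'Given this PostgreSQL schema:\n{schema}\n\n' +
      'Write one SQL query answering: {question}\nReturn only the SQL.',
  );
  const llm = new ChatOpenAI({ modelName: 'gpt-4o', temperature: 0 });
  const chain = prompt.pipe(llm).pipe(new StringOutputParser());

  // Generate the SQL from the schema description plus the user's question.
  const sqlQuery = await chain.invoke({
    schema: await db.getTableInfo(workspaceId),
    question,
  });

  // PoC only: executing LLM-generated SQL directly is exactly the
  // unaddressed security concern noted above.
  return db.executeRawQuery(workspaceId, sqlQuery);
}
```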


github-actions bot commented Jun 9, 2024

Warnings
⚠️ Changes were made to the environment variables, but not to the documentation - Please review your changes and check if a change needs to be documented!

Welcome!

Hello there, congrats on your first PR! We're excited to have you contributing to this project.
By submitting your Pull Request, you acknowledge that you agree with the terms of our Contributor License Agreement.

TODOs/FIXMEs:

  • `// TODO Icon: IconSparkles,` in packages/twenty-front/src/modules/command-menu/components/CommandMenu.tsx

Generated by 🚫 dangerJS against 8100e21

@greptile-apps bot left a comment

PR Summary

  • Added "Ask AI" command to command menu
  • Introduced Text-to-SQL GraphQL resolver and service
  • Added langchain dependency and updated typeorm
  • Integrated new feature flag IS_ASK_AI_ENABLED
  • Updated core engine module to include TextToSQLModule

@FelixMalfait (Member) commented

😍 amazing! Looking forward to reviewing this first proof of concept!

@FelixMalfait (Member) left a comment

That's great work! It's 100% in the right direction.
I tried it locally but the results were disappointing: every query I tried returned `Error executing raw query for workspace ...: syntax error at or near ...`. Any idea why? I haven't dug into it or even printed the intermediate SQL result to investigate yet.

package.json (outdated; thread resolved)
const workspaceSchemaName =
  this.workspaceDataSourceService.getSchemaName(workspaceId);

const workspaceDataSource =
Member:

Maybe we should hook into the WorkspaceQueryRunner service, which already has similar code?
As of now WorkspaceQueryRunner forces query execution through graphql.resolve, but with a bit of refactoring we could rely on SET search_path etc. and avoid duplicating that logic. I'm not 100% sure I understand every single line, but it seems similar to what you did here.
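
A rough sketch of the SET search_path approach with TypeORM's query runner, to make the suggestion concrete. The function name and identifier quoting are assumptions, not the existing WorkspaceQueryRunner code:

```ts
import { DataSource } from 'typeorm';

// Scope a raw query to a workspace schema via search_path instead of
// duplicating the schema-resolution logic.
async function queryInWorkspaceSchema(
  dataSource: DataSource,
  schemaName: string, // e.g. the value returned by getSchemaName(workspaceId)
  sql: string,
): Promise<unknown[]> {
  const queryRunner = dataSource.createQueryRunner();
  try {
    await queryRunner.connect();
    // Schema names cannot be parameterized in Postgres, so quote the
    // identifier; in real code the name must be validated or escaped.
    await queryRunner.query(`SET search_path TO "${schemaName}"`);
    return await queryRunner.query(sql);
  } finally {
    await queryRunner.release();
  }
}
```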

@greptile-apps bot left a comment

PR Summary

(updates since last review)

  • Introduced 'Ask AI' feature to convert user questions into SQL queries using LLM
  • Added new components and hooks for 'Ask AI' in the right drawer
  • Updated command menu to include 'Ask AI' command
  • Modified existing components to support new feature
  • Added new environment variables and dependencies for OpenAI and Langfuse integration

@@ -0,0 +1,95 @@
import { Injectable } from '@nestjs/common';

import { ChatOpenAI } from '@langchain/openai';
Member:

I understand why we would eventually not want to use function calling, to avoid coupling too much with OpenAI. But JSON mode can only be a positive?

ad-elias (Author):

I'll check this out if non-SQL responses from the LLM keep being an issue after iterating on the prompt!
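
For reference, a minimal sketch of enabling OpenAI's JSON mode through LangChain call options, should it turn out to be needed here (the expected response shape is an assumption):

```ts
import { ChatOpenAI } from '@langchain/openai';

// With JSON mode the model is constrained to emit valid JSON, e.g.
// {"sql": "SELECT ..."}, instead of prose wrapped around the query.
// Note: OpenAI requires the word "JSON" to appear somewhere in the
// prompt when response_format is json_object.
const jsonModeLlm = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0,
}).bind({
  response_format: { type: 'json_object' },
});
```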


const sqlQuery = await sqlQueryGeneratorChain.invoke(
  {
    schema: await db.getTableInfo(),
Member:

As discussed via Discord, you can look into workspaceCacheVersion; the GraphQL and SQL schemas can both be synced and share the same version/invalidation.
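
A sketch of how the schema lookup could be keyed on that version, assuming workspaceCacheVersion is available as a string. The in-memory Map is illustrative only; the real cache layer may differ:

```ts
// Reuse one cached schema description per (workspace, cache version) so
// getTableInfo() is not recomputed on every question; a version bump
// naturally invalidates the old entry.
const schemaCache = new Map<string, string>();

async function getCachedTableInfo(
  workspaceId: string,
  cacheVersion: string, // assumed to come from workspaceCacheVersion
  loadTableInfo: () => Promise<string>,
): Promise<string> {
  const key = `${workspaceId}:${cacheVersion}`;
  const cached = schemaCache.get(key);
  if (cached !== undefined) return cached;

  const tableInfo = await loadTableInfo();
  schemaCache.set(key, tableInfo);
  return tableInfo;
}
```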

@FelixMalfait (Member) left a comment

Great work! Some minor comments that could be addressed later, except for getRecordMetadataById, where I feel you're headed in the wrong direction (although I may not have understood it very well).

  recordMetadataById={data.getAskAI.recordMetadataById}
/>
) : (
  'Invalid SQL query.'
Member:

When the query is invalid, we should probably give the user an option to view the error.
E.g. I tried this and it was frustrating not to be able to see the error, as the cause wasn't obvious to me (v2 to add to the roadmap: let the AI try to fix the error).
[Screenshot: 2024-06-23 at 10:59]
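
One way to surface the error, sketched as a discriminated result union the resolver could return. All type and field names here are hypothetical, not the PR's actual GraphQL schema:

```ts
// Hypothetical result shape letting the UI render the failure detail
// instead of a bare 'Invalid SQL query.' string.
type AskAIResult =
  | { __typename: 'AskAIQuerySuccess'; sqlQuery: string; rows: unknown[] }
  | { __typename: 'AskAIQueryError'; sqlQuery: string; errorMessage: string };

function renderResult(result: AskAIResult): string {
  return result.__typename === 'AskAIQueryError'
    ? `Query failed: ${result.errorMessage}\nSQL: ${result.sqlQuery}`
    : `${result.rows.length} rows`;
}
```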

  promptTemplateName: string,
): Promise<PromptTemplate<PromptTemplateInput<T>, any>> {
  const filePath = join(
    resolveAbsolutePath('.llm-prompt-templates'),
Member:

I understand why you've made this choice and it's a good one. But subjectively I would prefer to keep the repository root clean: just put it within the /llm-prompt-template/ folder. Maybe create per-driver folders (/file/file.driver.ts, /langfuse/langfuse.driver.ts) and put the templates in /llm-prompt-template/drivers/file/templates/*.txt. Thanks!

ad-elias (Author):

I moved the prompt to a variable and deleted LLMPromptTemplateDriver for now.

It's probably not as useful as I thought to store the prompt template in a separate file or in Langfuse, because the most frequently changed and most important parts of the prompt will be dynamically generated (and thus not editable in Langfuse).
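
The "prompt as a variable" approach might look like this; the wording is illustrative, not the PR's actual prompt:

```ts
import { PromptTemplate } from '@langchain/core/prompts';

// Static instructions live in code; the frequently changing parts
// ({schema}, {question}) are injected dynamically per request.
const SQL_GENERATION_PROMPT = PromptTemplate.fromTemplate(
  `You are a PostgreSQL expert. Given the schema below, write a single
SQL query that answers the question. Return only the SQL.

Schema:
{schema}

Question: {question}`,
);
```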

  return recordMetadataById;
}

async query(
Member:

One challenge is pagination, i.e. how to handle a large number of results. It feels like there's a divergence between the UI and the CSV download.

Member:

It could be, for example: show the count and limit the view to 30 results (because we enrich the data, which is costly and poorly optimized), but include all results when downloading the CSV, because we don't enrich those?
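
Sketched out, that split could look like the following; the function names and the subquery wrapping are assumptions:

```ts
const VIEW_RESULT_LIMIT = 30;

// In-app view: report the full count but only fetch (and later enrich)
// the first 30 rows. A CSV export would run the query unbounded and
// skip enrichment entirely.
async function runForView(
  runQuery: (sql: string) => Promise<Record<string, unknown>[]>,
  sql: string,
) {
  const [{ total }] = await runQuery(
    `SELECT COUNT(*) AS total FROM (${sql}) AS q`,
  );
  const rows = await runQuery(
    `SELECT * FROM (${sql}) AS q LIMIT ${VIEW_RESULT_LIMIT}`,
  );
  return { total, rows };
}
```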
