[SIP] Proposal for Agentic Dashboard and Chart Summarization using LLM #32408

Open
ved-kashyap-samsung opened this issue Feb 27, 2025 · 3 comments
Labels
change:backend Requires changing the backend change:frontend Requires changing the frontend design:proposal Design proposals sip Superset Improvement Proposal

Comments

@ved-kashyap-samsung
Contributor

Motivation

The goal of this proposal is to introduce a new feature into Apache Superset that leverages Large Language Models (LLMs) to provide advanced dashboard and chart summarization capabilities. This feature aims to enhance user experience by enabling natural language query support, automated summarization, and intelligent chart selection based on user queries. The proposed feature will also reduce dependency on less accurate NL-to-SQL conversion models by directly utilizing LLMs for query processing.

Proposed Change

Overview

We propose the integration of an LLM-based agentic architecture into Superset to enable the following capabilities:

  1. Natural language query support for dashboards and charts.
  2. Intelligent selection of appropriate charts based on natural language queries.
  3. Automated summarization of chart data and dashboards.
  4. Automated text-based reporting based on predefined KPIs and schedules.
  5. Graceful handling of scenarios where relevant charts are not found for a given query.

Implementation Details

  1. LLM Integration:

    • Integrate an LLM capable of understanding and processing natural language queries.
    • Develop an agent-based system in which the LLM can perform a sequence of actions: selecting relevant charts based on the user query, fetching the relevant SQL queries from existing APIs, applying filters, running the final SQL query and retrieving the results, and finally summarizing the results and generating insights.
  2. Natural Language Query Support:

    • Add input fields at both dashboard and chart levels to support natural language queries.
    • Implement backend services to process these queries using the LLM.
  3. Chart and Dashboard Summarization:

    • Provide summarization options in the chart menu based on the loaded data.
    • Implement automated text-based reporting for dashboards using cron jobs for predefined KPIs.
  4. Intelligent Chart Selection:

    • Develop mechanisms for LLMs to pick the correct chart based on chart names or associated metadata.
    • Ensure graceful handling when no relevant charts are found for a query.
  5. Feature Flags:

    • Gate the feature behind feature flags so that users can opt in on a per-user basis.
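
As a rough illustration of items 1 and 4 above, the selection-and-summarization flow might look like the following minimal Python sketch. Everything in it is hypothetical: `Chart`, `select_chart`, and `answer_query` are illustrative names rather than Superset APIs, a simple keyword-overlap score stands in for LLM-based chart selection so the example runs without a model, and `run_query` and `llm` are placeholder callables for whichever existing data API and model client would be used.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chart:
    """Hypothetical stand-in for a chart's name and associated metadata."""
    id: int
    name: str
    description: str = ""


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_chart(query: str, charts: list[Chart], min_overlap: int = 1) -> Optional[Chart]:
    """Pick the chart whose name/description shares the most words with the query.

    This keyword overlap is only a placeholder for LLM-based selection.
    Returns None when no chart reaches the minimum overlap.
    """
    query_tokens = _tokens(query)
    best, best_score = None, 0
    for chart in charts:
        score = len(query_tokens & _tokens(f"{chart.name} {chart.description}"))
        if score > best_score:
            best, best_score = chart, score
    return best if best_score >= min_overlap else None


def answer_query(
    query: str,
    charts: list[Chart],
    run_query: Callable[[Chart], list[dict]],  # assumed: returns the chart's result rows
    llm: Callable[[str], str],                 # assumed: prompt in, text out
) -> dict:
    """Select a chart, fetch its data, and ask the LLM for a summary."""
    chart = select_chart(query, charts)
    if chart is None:
        # Graceful handling: no relevant chart was found for the query.
        return {
            "status": "no_match",
            "chart_id": None,
            "summary": "No relevant chart was found for this query.",
        }
    rows = run_query(chart)
    prompt = (
        f"Question: {query}\nChart: {chart.name}\nData: {rows}\n"
        "Summarize the key findings."
    )
    return {"status": "ok", "chart_id": chart.id, "summary": llm(prompt)}
```

Returning an explicit `no_match` status, rather than raising an error or letting the model guess, is one way to keep the "no relevant chart" case visible to the frontend.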

Mockups and Screenshots

Mockups and screenshots will be added here once the design phase is complete.

New or Changed Public Interfaces

  1. REST Endpoints:

    • New endpoints for processing natural language queries and returning summarized results.
  2. React Components:

    • New input fields for natural language queries at the dashboard and chart levels.
    • Updated chart menu with summarization options.
  3. Configuration:

    • Configuration options for enabling/disabling the feature using feature flags.
  4. CLI Changes:

    • New CLI commands for managing LLM-related configurations and feature flags.
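
To make the interface and configuration items above more concrete, here is a hedged sketch of a per-user opt-in check together with a request/response shape that such an endpoint might use. The flag name `LLM_SUMMARIZATION`, the payload fields, and all function names are assumptions made for illustration; this SIP does not define them, and no endpoint path is implied.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Hypothetical flag name; the SIP does not specify one.
FLAG_NAME = "LLM_SUMMARIZATION"


@dataclass
class SummaryRequest:
    """Assumed request payload for a natural language query."""
    query: str
    dashboard_id: Optional[int] = None
    chart_id: Optional[int] = None


@dataclass
class SummaryResponse:
    """Assumed response payload."""
    status: str  # "ok", "disabled", or "invalid"
    summary: str = ""


def is_enabled_for(user_id: int, global_flags: dict[str, bool], opted_in_users: set[int]) -> bool:
    """The feature is on only if the global flag is set AND the user opted in."""
    return global_flags.get(FLAG_NAME, False) and user_id in opted_in_users


def handle_summary_request(
    user_id: int,
    request: SummaryRequest,
    global_flags: dict[str, bool],
    opted_in_users: set[int],
    summarize: Callable[[SummaryRequest], str],  # assumed: wraps the LLM pipeline
) -> dict:
    """Validate the request and return a JSON-serializable response dict."""
    if not is_enabled_for(user_id, global_flags, opted_in_users):
        return asdict(SummaryResponse(status="disabled"))
    # Require a non-empty query and a dashboard or chart to run it against.
    if not request.query.strip() or (
        request.dashboard_id is None and request.chart_id is None
    ):
        return asdict(SummaryResponse(status="invalid"))
    return asdict(SummaryResponse(status="ok", summary=summarize(request)))
```

In Superset, the global switch would typically live in the `FEATURE_FLAGS` dictionary in `superset_config.py`; where the per-user opt-in state is stored is left open here.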

New dependencies

  1. LLM Libraries:

    • We will integrate with existing LLM tooling, for example by loading a model such as Meta-Llama-3-8B through Hugging Face.
    • Ensure that the licenses of all added libraries and models are compatible with the Apache License v2.0.
  2. Other Dependencies:

    • Additional Python packages for natural language processing and machine learning (e.g., NLTK, spaCy).

Migration Plan and Compatibility

  1. Database Migrations:

    • No database migrations are required for this feature.
  2. Compatibility:

    • Ensure that existing dashboards and charts continue to function without any changes.
    • Provide a seamless upgrade path with clear documentation on enabling and using the new feature.
  3. Deprecation Strategy:

    • Allow the new feature to coexist with existing NL-to-SQL conversion models during a deprecation period.
    • Provide clear documentation and migration guides for users transitioning to the new system.

Rejected Alternatives

  1. Enhancing Existing NL-to-SQL Models:

    • While enhancing existing NL-to-SQL models could improve accuracy, it would require significant effort in model training and fine-tuning. The LLM-based approach offers a more flexible and scalable solution.
  2. Rule-Based Systems:

    • Rule-based systems lack the flexibility to handle the wide variety of natural language queries effectively. LLMs provide a more robust solution by understanding context and intent.

By integrating LLM-based agentic architecture into Superset, we can significantly enhance the user experience with advanced natural language processing capabilities, making it easier for users to interact with their data and generate insights.


This SIP is now open for discussion. Please subscribe and provide your feedback here.

@ved-kashyap-samsung ved-kashyap-samsung added the sip Superset Improvement Proposal label Feb 27, 2025
@dosubot dosubot bot added change:backend Requires changing the backend change:frontend Requires changing the frontend design:proposal Design proposals labels Feb 27, 2025
@hainenber
Contributor

This seems to be a duplicate of SIP-140. Aside from that, I'm a bit skeptical of this benefit:

... provide advanced dashboard and chart summarization capabilities ...

Isn't the point of having a dashboard and/or chart to summarize raw data into a human-readable visualization already?

@ved-kashyap-samsung
Contributor Author

This seems to be a duplicate of SIP-140. Aside from that, I'm a bit skeptical of this benefit:

... provide advanced dashboard and chart summarization capabilities ...

Isn't the point of having a dashboard and/or chart to summarize raw data into a human-readable visualization already?

This is not a duplicate of the SIP you mentioned; that approach is covered under Rejected Alternatives.

Also, textual insights on charts and graphs provide several benefits:

Enhanced Understanding

  1. Clearer interpretation: Textual insights help users quickly grasp the meaning behind the data, reducing confusion and misinterpretation.
  2. Contextualization: Additional text provides context, enabling users to understand the relevance and significance of the data.

Improved Decision-Making

  1. Actionable recommendations: Textual insights can offer concrete suggestions or recommendations based on the data, facilitating informed decision-making.
  2. Data-driven storytelling: By combining data with narrative, users can better comprehend complex information and make data-driven decisions.

Increased Efficiency

  1. Reduced analysis time: Textual insights save users time and effort by providing a concise summary of key findings.
  2. Simplified communication: Clear and concise textual insights facilitate communication among stakeholders, ensuring everyone is on the same page.

Anomaly Detection

  1. Highlighting unusual patterns: Textual insights can automatically identify and highlight unusual patterns or anomalies in the data, enabling users to quickly investigate and address potential issues.

Better Accessibility

  1. Accessibility for non-technical users: Textual insights make complex data more accessible to non-technical stakeholders, promoting a broader understanding of the information.
  2. Assistive technology compatibility: Text-based insights can be more easily interpreted by assistive technologies, such as screen readers.

Enhanced Engagement

  1. Engaging storytelling: Textual insights can present data in a more engaging and narrative-driven format, capturing users' attention and encouraging exploration.
  2. Personalization: Tailored textual insights can address specific user needs, increasing engagement and relevance.

@rusackas
Member

Hi @ved-kashyap-samsung - thank you for the clarification. There have been a handful of discussions around this in various fora. Are you already on Slack? If you can DM me there, perhaps some of the Superset committers/PMC members can join a thread, discuss some ideas/proposals (and lessons learned building similar features on forks), and find a good way to make this work well with the open-source project. Then we can bring those ideas back to this (more official) discussion to gather consensus.

3 participants