Automatic data analysis/visualisation - Auto-analyst feature #191

Open
jjuritzno10 opened this issue Apr 8, 2024 · 0 comments
Labels
enhancement New feature or request


jjuritzno10 commented Apr 8, 2024

Is your feature request related to a problem? Please describe.

Civil servants often have to make sense of data and graphs, but some don't have the quantitative, Excel or coding skills to manipulate data or make visualisations. Non-quantitative civil servants need to rely on analysts to produce the visualisations and analysis they want, and this can take a while, even for quite simple requests.

This feature would help non-technical civil servants interact with datasets and produce analysis using natural language, getting them to a 'first draft' much faster and lowering the barrier to entry for technical work.

Describe the solution you'd like

Redbox could integrate and build upon LIDA, 'Automatic Generation of Visualizations and Infographics using Large Language Models' (https://github.com/microsoft/lida), as a component. This would allow users to chat with datasets and produce visualisations using natural language.

LIDA is quite sophisticated and gets us most of the way to a functional prototype, but it would need tweaking and tuning to better target civil service use cases.

Below is an example of how a user might interact with the auto-analyst feature:

  1. The user uploads a dataset.
  2. Auto-analyst automatically detects the dataset's content using LIDA.
  3. Auto-analyst prompts the user for additional context about the dataset, such as what the rows and columns represent and any formatting issues.
  4. Auto-analyst discusses the dataset with the user to establish the question they want answered, suggesting concrete example questions where the user doesn't provide a clear, precise one.
  5. Auto-analyst generates visualisations based on the context provided.
  6. Auto-analyst explains the visualisations to the user.
  7. The user and auto-analyst discuss the visualisations and analysis.
  8. Auto-analyst generates new visualisations based on the feedback. Return to step 5.
  9. The user downloads the visualisations, the analysis and the code used to generate the results.
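As a rough illustration of step 2, here is the kind of per-column summary an auto-analyst might build before asking the user for context in step 3. This is a stand-alone, standard-library Python sketch and not LIDA's API or how LIDA implements its own summariser; the function names (`summarize_csv`, `infer_type`) and the summary fields are hypothetical.

```python
import csv
import io
from datetime import date
from typing import Dict, List


def _is_number(value: str) -> bool:
    try:
        float(value.replace(",", ""))  # tolerate thousands separators such as "1,234"
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)  # only ISO dates like "2024-01-05" are recognised
        return True
    except ValueError:
        return False


def infer_type(values: List[str]) -> str:
    """Return 'number', 'date', 'string' or 'empty' for a column's raw values."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    if all(_is_number(v) for v in non_empty):
        return "number"
    if all(_is_iso_date(v) for v in non_empty):
        return "date"
    return "string"


def summarize_csv(text: str, n_samples: int = 3) -> Dict[str, object]:
    """Build a small per-column summary of a CSV that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {"n_rows": 0, "columns": []}
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        values = [row[i] if i < len(row) else "" for row in body]  # pad short rows
        non_empty = [v.strip() for v in values if v.strip()]
        columns.append({
            "name": name,
            "type": infer_type(values),
            "missing": len(values) - len(non_empty),
            "unique": len(set(non_empty)),
            # first few distinct values, in the order they appear
            "samples": list(dict.fromkeys(non_empty))[:n_samples],
        })
    return {"n_rows": len(body), "columns": columns}
```

A summary like this could be shown back to the user in step 3 so they can correct or annotate it, and then passed to the language model as context for steps 4 and 5.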

Describe alternatives you've considered

Automatic data visualisation and dashboard products like Tableau and Power BI go some way towards providing the features above, but as far as I know they are designed with technical users in mind. The feature proposed here is as low-code as possible and is targeted at non-technical users.

Additional context

Future features could include:

  • Dashboard creation.
  • A UI for exploring, annotating and cleaning datasets.
  • Linking other user-uploaded documents into the auto-analysis process. Example: a user has uploaded documents about universal basic income, then uploads a similar-looking dataset. Auto-analyst draws on the context in those documents when generating visualisations or discussing the data with the user.
  • Automatic data reformatting and cleaning. LIDA works best on very simple tables, but a lot of ONS-published data comes in more complicated formats (multiple tables per Excel sheet and multiple sheets per Excel workbook). Automatically detecting and parsing tables would improve usability.
  • Linking data to ONS sources, for example by adding search over sources such as https://www.ons.gov.uk/
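For the automatic table detection idea, one simple heuristic is to treat fully blank rows as boundaries between tables and a row with a single non-empty cell at the top of a block as that table's title. The sketch below assumes the sheet has already been read into a grid of cell values (for example with openpyxl's `Worksheet.iter_rows(values_only=True)`, which yields `None` for empty cells); `split_tables` and `clean_row` are hypothetical helpers. Real ONS workbooks (merged header cells, footnotes, tables placed side by side) would need more robust handling than this.

```python
from typing import Dict, List, Optional, Sequence

Row = List[str]


def clean_row(row: Sequence[object]) -> Row:
    """Convert cells to stripped strings (None -> '') and drop trailing empty cells."""
    cells = ["" if c is None else str(c).strip() for c in row]
    while cells and not cells[-1]:
        cells.pop()
    return cells


def split_tables(grid: Sequence[Sequence[object]]) -> List[Dict[str, object]]:
    """Split one sheet into tables, using fully blank rows as boundaries.

    If a block starts with a row holding exactly one non-empty cell, that cell
    is treated as the table's title, a common layout in published spreadsheets.
    """
    blocks: List[List[Row]] = []
    current: List[Row] = []
    for raw in grid:
        row = clean_row(raw)
        if not any(row):  # blank row: close the current block, if any
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)

    tables: List[Dict[str, object]] = []
    for block in blocks:
        title: Optional[str] = None
        first_cells = [c for c in block[0] if c]
        if len(block) > 1 and len(first_cells) == 1:
            title, block = first_cells[0], block[1:]
        tables.append({"title": title, "rows": block})
    return tables
```

Each detected table could then be summarised and passed to LIDA separately, rather than handing it the whole sheet.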
lmwilkigov added the enhancement (New feature or request) label on Apr 11, 2024