chore(docs): create architecture page #28481

Merged
merged 13 commits into from
Jul 16, 2024

Conversation

sfirke

@sfirke sfirke commented May 13, 2024

SUMMARY

  • Adds an "architecture" page summarizing the components of a Superset installation. A visual diagram should eventually go here too. Please feel free to edit/add/cut.
  • Reorders other installation pages to follow this one
  • Copyediting of Configuring Superset page

@github-actions github-actions bot added the doc Namespace | Anything related to documentation label May 13, 2024
@sfirke

sfirke commented May 14, 2024

This is ready for review.

A Superset installation is made up of these components:
1. The Superset application itself
2. A metadata database to store Superset's data about users, charts, dashboards, etc.
3. A caching layer (optional, but necessary for some features)
Member


This is called out twice, in "components" and "optional component". I think it's a great idea to separate the section, but let's have each component on just one side.

@mistercrunch

mistercrunch commented May 15, 2024

hey! - I'm hoping it doesn't come across as a hostile takeover of this PR, but I threw your pages into GPT and added some comments/input in the form of bullets, and this is what came out:


Architecture

This documentation outlines the architecture of Apache Superset, with a primary focus on the backend components before introducing the frontend. This organization provides a thorough understanding of how Superset operates from data handling to user interface.

Superset Backend

The backend of Superset is composed of several critical components designed to manage data, execute tasks, and maintain the overall functionality of the system.

Core Backend Components

Web Application [Python/Flask]:

  • Description: Serves static assets and handles API requests from the Superset frontend.
  • Role: Acts as the primary communication hub for frontend interactions and immediate query processing.

Metadata Database [Required]:

  • Description: Stores all of Superset’s essential assets such as dashboards, charts, user configurations, and logs.
  • Supported Technologies: PostgreSQL or MySQL (recommended), other SQLAlchemy-supported OLTP databases
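As a rough illustration of the metadata database item above, a `superset_config.py` could point Superset at PostgreSQL through `SQLALCHEMY_DATABASE_URI`. This is only a sketch: the helper function is hypothetical, and the user, password, host, and database name are placeholders.

```python
from urllib.parse import quote_plus


def metadata_db_uri(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a SQLAlchemy URI for a PostgreSQL metadata database (hypothetical helper)."""
    # Percent-encode the password so characters like "@" or "/" do not break the URI.
    return f"postgresql+psycopg2://{user}:{quote_plus(password)}@{host}:{port}/{db}"


# Placeholder values; substitute your own connection details.
SQLALCHEMY_DATABASE_URI = metadata_db_uri(
    "superset", "p@ss/word", "localhost", 5432, "superset"
)
```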

Optional Backend Components

Asynchronous Workers [Pytjhon/Celery]:

  • Description: Manages tasks that are too long or intense for a typical web request cycle.
  • Enabled Features:
    • Asynchronous query executions in SQL Lab.
    • Scheduled generation of alerts and reports.
    • Creation of dashboard thumbnails.
  • Dependencies: Requires a message queue (e.g., Redis, RabbitMQ).
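To make the worker and message queue relationship concrete, here is a minimal sketch of how Celery might be wired up in `superset_config.py`, assuming a local Redis instance as both broker and result backend. The URLs are placeholders, and a real deployment would likely list more task modules.

```python
# Sketch of a superset_config.py fragment; the Redis URLs are placeholders.
class CeleryConfig:
    # Message queue that carries tasks from the web application to the workers.
    broker_url = "redis://localhost:6379/0"
    # Where workers store task state and results.
    result_backend = "redis://localhost:6379/1"
    # Modules containing the tasks the workers should register.
    imports = ("superset.sql_lab",)


# Superset reads the Celery settings from this name.
CELERY_CONFIG = CeleryConfig
```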

Caching Layer:

  • Description: Enhances performance by caching query results and frequently accessed data.
  • Technologies: Primarily Redis, with support for other caching systems.
  • Enabled Features:
    • Accelerated access to repeated queries.
    • Improved responsiveness of the application.
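The caching layer is configured through Flask-Caching style dictionaries. The sketch below assumes Redis on localhost. The `redis_cache` helper, key prefixes, timeouts, and Redis database numbers are illustrative choices, not recommendations.

```python
def redis_cache(prefix: str, timeout_seconds: int, db: int) -> dict:
    """Return a Flask-Caching config dict for a Redis backend (hypothetical helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout_seconds,
        "CACHE_KEY_PREFIX": prefix,
        # Placeholder URL; point this at your own Redis instance.
        "CACHE_REDIS_URL": f"redis://localhost:6379/{db}",
    }


# Cache for Superset's own metadata lookups.
CACHE_CONFIG = redis_cache("superset_metadata_", 300, 2)
# Cache for chart query results, kept longer than metadata.
DATA_CACHE_CONFIG = redis_cache("superset_data_", 86400, 3)
```

Using separate key prefixes keeps the two caches distinguishable even when they share one Redis server.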

Logging Interfaces

  • Standard Output and Error Logs: Essential for debugging and monitoring application health.
  • StatsD/Metrics Collection: Enables real-time aggregation of performance metrics.
  • Analytics Logging: Rich structured logs that provide insights into user behaviors and application usage patterns. Typically sent to a stream and landed in a data warehouse.
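For readers unfamiliar with StatsD, the sketch below shows what a counter metric looks like on the wire and how it could be sent over UDP using only the standard library. It is not Superset's own stats logger; the function names, the `superset` prefix, and the metric key are hypothetical.

```python
import socket


def statsd_counter(prefix: str, key: str, value: int = 1) -> bytes:
    """Format a StatsD counter line: <name>:<value>|c."""
    return f"{prefix}.{key}:{value}|c".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric datagram; UDP is fire-and-forget, so no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric name, for illustration only.
packet = statsd_counter("superset", "dashboard.load")
```

Calling `send_metric(packet)` would emit the counter to a StatsD agent listening on the default port, if one is running.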

Other Common Infrastructure components:

  • Load Balancers/API Gateway: Distributes incoming traffic across multiple servers to enhance availability and manage traffic peaks.
  • Observability/Alerting: Provides monitoring, error tracking, and real-time alerts to maintain performance and uptime.
  • WSGI Server (e.g., Gunicorn in async mode):
  • Database Drivers: Enables communication between Superset and its databases, crucial for operational data querying and management.
  • Orchestration (e.g., Kubernetes): Automates deployment, scaling, and management of containerized applications, ensuring robust service availability.
  • Additional Security Measures: Implements network security, data encryption, and access controls to safeguard data and comply with regulations.

Superset Frontend

The frontend of Superset is a sophisticated web client built using modern web technologies to facilitate interactive data visualization.

Core Technologies:

  • React: Forms the foundation of the frontend, offering a responsive and dynamic user interface.
  • antd: Utilized for designing the visual components and layout of the UI, providing a consistent and professional aesthetic.
  • Plugin Architecture: Allows for the extension of visualization capabilities through custom plugins, enhancing the flexibility and functionality of visual data representation.

Functionality:

  • Communicates with the backend via the Superset API, enabling users to manage and visualize data efficiently.
  • Supports extensive customization and extension through community-developed plugins and themes.
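The same API the frontend talks to can be exercised from a script. The sketch below only builds the HTTP requests, assuming the `/api/v1/security/login` and `/api/v1/chart/` endpoints and a database-backed login. The base URL and credentials are placeholders, and the endpoint details should be checked against your instance's API documentation.

```python
import json
from urllib.request import Request


def login_request(base_url: str, username: str, password: str) -> Request:
    """Build a POST request that asks the API for an access token."""
    body = json.dumps(
        {"username": username, "password": password, "provider": "db", "refresh": True}
    ).encode("utf-8")
    return Request(
        f"{base_url}/api/v1/security/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_charts_request(base_url: str, access_token: str) -> Request:
    """Build a GET request that lists charts, authenticated with a bearer token."""
    return Request(
        f"{base_url}/api/v1/chart/",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder base URL and credentials.
req = login_request("http://localhost:8088", "admin", "admin")
```

Passing either request to `urllib.request.urlopen` against a running instance would perform the actual call.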

@rusackas

GitHub needs the "face melt" emoji as a reaction.

Is it possible to "land and expand" here? My trust issues with GPT aside (did it really say "Pytjhon?") I wonder if we can merge a first iteration, then divide/expand/remove/elaborate as needed from there, rather than go deep here. We can feed it to GPT as we go for consolidation/clarification/organization.

@mistercrunch

I did a fair amount of prompt input and editing to get to that, but the main thing is the structure of the docs (backend/frontend) and mentioning the technologies used (and technology choices) in different areas.


@rusackas rusackas left a comment


We can circle back and make this list link to the descriptions of each component below, but let's just merge this thing and dial it in from there.

@rusackas

From here, we can update the installation page to provide a little table (or similar) about how (or if) each installation method installs these components.

@sfirke sfirke merged commit e90a9b3 into master Jul 16, 2024
34 checks passed
@sfirke sfirke deleted the sfirke-add-arch-page branch July 16, 2024 17:51
eschutho pushed a commit that referenced this pull request Jul 24, 2024
Labels
doc Namespace | Anything related to documentation size/M

3 participants