*: add analytics service docs #22556
Draft: lilin90 wants to merge 7 commits into pingcap:feature/preview-cloud-lake from lilin90:cloud-lake-dir
- 1ca2c31 *: add analytics service docs (lilin90)
- 91ff961 Update doc (lilin90)
- 228ce3f Update relative reference links (lilin90)
- 8263d59 Update docs links and variables (lilin90)
- 9cce75c Update file names and links (lilin90)
- 69e65b7 Update format and fix typo (lilin90)
- f8bbc1c Fix links and update file names (lilin90)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| title: TiDB Cloud Lake Documentation | ||
| hide_sidebar: true | ||
| hide_commit: true | ||
| summary: TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations. | ||
| --- | ||
|
|
||
| <LearningPathContainer platform="tidb-cloud" title="TiDB Cloud Lake" subTitle="TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations."> | ||
|
|
||
| </LearningPathContainer> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| --- | ||
| title: Access Control | ||
| --- | ||
|
|
||
| Databend incorporates both [Role-Based Access Control (RBAC)](https://en.wikipedia.org/wiki/Role-based_access_control) and [Discretionary Access Control (DAC)](https://en.wikipedia.org/wiki/Discretionary_access_control) models for its access control functionality. When a user accesses a data object in Databend, they must be granted appropriate privileges or roles, or they need to have ownership of the data object. A data object can refer to various elements, such as a database, table, view, stage, or UDF. | ||
|
|
||
|  | ||
|
|
||
| | Concept | Description | | ||
| |-----------|------------------------------------------------------------| | ||
| | Privileges | Privileges play a crucial role when interacting with data objects in Databend. These permissions, such as read, write, and execute, provide precise control over user actions, ensuring alignment with user requirements and maintaining data security. | | ||
| | Roles | Roles simplify access control. Roles are predefined sets of privileges assigned to users, streamlining permission management. Administrators can categorize users based on responsibilities, granting permissions efficiently without individual configurations. | | ||
| | Ownership | Ownership is a specialized privilege for controlling data access. When a user owns a data object, they have the highest control level, dictating access permissions. This straightforward ownership model empowers users to manage their data, controlling who can access or modify it within the Databend environment. | | ||
|
|
||
| This guide describes the related concepts and provides instructions on how to manage access control in Databend: | ||
|
|
||
| - [Privileges](/tidb-cloud-lake/guides/privileges.md) | ||
| - [Roles](/tidb-cloud-lake/guides/roles.md) | ||
| - [Ownership](/tidb-cloud-lake/guides/ownership.md) | ||
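
Reviewer note: the three concepts in the table above could be illustrated with a short SQL sketch. The role `analyst`, user `alice`, database `sales_db`, and table `orders` are hypothetical names, not part of this PR. The statements follow the general Databend `GRANT` forms as I understand them, so check them against the linked Privileges, Roles, and Ownership guides before adopting them.

```sql
-- Hypothetical objects: role `analyst`, user `alice`, database `sales_db`.

-- Roles: bundle privileges once, then assign the bundle to users.
CREATE ROLE analyst;

-- Privileges: allow read-only access to every table in sales_db.
GRANT SELECT ON sales_db.* TO ROLE analyst;

-- Assign the role to a user (the password is a placeholder).
CREATE USER alice IDENTIFIED BY 'ChangeMe123';
GRANT ROLE analyst TO alice;

-- Ownership: transfer full control of one table to the role.
-- Assumption: ownership is granted to roles, not directly to users.
GRANT OWNERSHIP ON sales_db.orders TO ROLE analyst;

-- Inspect what the user can do.
SHOW GRANTS FOR alice;
```

Granting to a role rather than to each user keeps later permission changes in one place, which is the point the Roles row makes.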
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,148 @@ | ||
| --- | ||
| title: Aggregating Index | ||
| --- | ||
|
|
||
| # Aggregating Index: Precomputed Results for Instant Analytics | ||
|
|
||
| Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations. | ||
|
|
||
| ## What Problem Does It Solve? | ||
|
|
||
| Analytical queries on large datasets face significant performance challenges: | ||
|
|
||
| | Problem | Impact | Aggregating Index Solution | | ||
| |---------|--------|---------------------------| | ||
| | **Full Table Scans** | SUM, COUNT, MIN, MAX queries scan millions of rows | Read precomputed results instantly | | ||
| | **Repeated Calculations** | Same aggregations computed over and over | Store results once, reuse many times | | ||
| | **Slow Dashboard Queries** | Analytics dashboards take minutes to load | Sub-second response for common metrics | | ||
| | **High Compute Costs** | Heavy aggregation workloads consume resources | Minimal compute for cached results | | ||
| | **Poor User Experience** | Users wait for reports and analytics | Instant results for business intelligence | | ||
|
|
||
| **Example**: Consider a sales analytics query, `SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US'`, on a 100M-row table. Without an aggregating index, it scans all US sales records. With an aggregating index, it reads the precomputed results instead. | ||
|
|
||
| ## How It Works | ||
|
|
||
| 1. **Index Creation** → Define aggregation queries to precompute | ||
| 2. **Result Storage** → Databend stores aggregated results in optimized blocks | ||
|
||
| 3. **Query Matching** → Incoming queries automatically use precomputed results | ||
| 4. **Automatic Updates** → Results refresh when underlying data changes | ||
|
|
||
| ## Quick Setup | ||
|
|
||
| ```sql | ||
| -- Create table with sample data | ||
| CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT); | ||
|
|
||
| -- Create aggregating index for common analytics | ||
| CREATE AGGREGATING INDEX sales_summary AS | ||
| SELECT region, SUM(revenue), COUNT(*), AVG(quantity) | ||
| FROM sales | ||
| GROUP BY region; | ||
|
|
||
| -- Refresh the index (manual mode) | ||
| REFRESH AGGREGATING INDEX sales_summary; | ||
|
|
||
| -- Verify the index is used | ||
| EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region; | ||
| ``` | ||
|
|
||
| ## Supported Operations | ||
|
|
||
| | ✅ Supported | ❌ Not Supported | | ||
| |-------------|-----------------| | ||
| | SUM, COUNT, MIN, MAX, AVG | Window Functions | | ||
| | GROUP BY clauses | GROUPING SETS | | ||
| | WHERE filters | ORDER BY, LIMIT | | ||
| | Simple aggregations | Complex subqueries | | ||
|
|
||
| ## Refresh Strategies | ||
|
|
||
| | Strategy | When to Use | Configuration | | ||
| |----------|-------------|---------------| | ||
| | **Automatic (SYNC)** | Real-time analytics, small datasets | `CREATE AGGREGATING INDEX ... SYNC` | | ||
| | **Manual** | Large datasets, batch processing | `CREATE AGGREGATING INDEX ...` (default) | | ||
| | **Background (Cloud)** | Production workloads | Automatic in Databend Cloud | | ||
|
|
||
| ### Automatic vs Manual Refresh | ||
|
|
||
| ```sql | ||
| -- Automatic refresh (updates with every data change) | ||
| CREATE AGGREGATING INDEX auto_summary AS | ||
| SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC; | ||
|
|
||
| -- Manual refresh (update on demand) | ||
| CREATE AGGREGATING INDEX manual_summary AS | ||
| SELECT region, SUM(revenue) FROM sales GROUP BY region; | ||
|
|
||
| REFRESH AGGREGATING INDEX manual_summary; | ||
| ``` | ||
|
|
||
| ## Performance Example | ||
|
|
||
| This example shows how to confirm that a query is answered from an aggregating index rather than from the raw data: | ||
|
|
||
| ```sql | ||
| -- Prepare data | ||
| CREATE TABLE agg(a int, b int, c int); | ||
| INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5); | ||
|
|
||
| -- Create an aggregating index | ||
| CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg; | ||
|
|
||
| -- Refresh the aggregating index | ||
| REFRESH AGGREGATING INDEX my_agg_index; | ||
|
|
||
| -- Verify if the aggregating index works | ||
| EXPLAIN SELECT MIN(a), MAX(c) FROM agg; | ||
|
|
||
| -- Key indicators in the execution plan: | ||
| -- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg] | ||
| -- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]] | ||
| -- This shows the query uses precomputed results instead of scanning raw data | ||
| ``` | ||
|
|
||
| ## Best Practices | ||
|
|
||
| | Practice | Benefit | | ||
| |----------|---------| | ||
| | **Index Common Queries** | Focus on frequently executed analytics | | ||
| | **Use Manual Refresh** | Better control over update timing | | ||
| | **Monitor Index Usage** | Use EXPLAIN to verify index utilization | | ||
| | **Clean Up Unused Indexes** | Remove indexes that aren't being used | | ||
| | **Match Query Patterns** | Index filters should match actual queries | | ||
|
|
||
| ## Management Commands | ||
|
|
||
| | Command | Purpose | | ||
| |---------|---------| | ||
| | `CREATE AGGREGATING INDEX` | Create new aggregating index | | ||
| | `REFRESH AGGREGATING INDEX` | Update index with latest data | | ||
| | `DROP AGGREGATING INDEX` | Remove index (use VACUUM TABLE to clean storage) | | ||
| | `SHOW AGGREGATING INDEXES` | List all indexes | | ||
|
|
||
| ## Important Notes | ||
|
|
||
| :::tip | ||
| **When to Use Aggregating Indexes:** | ||
| - Frequent analytical queries (dashboards, reports) | ||
| - Large datasets with repeated aggregations | ||
| - Stable query patterns | ||
| - Performance-critical applications | ||
|
|
||
| **When NOT to Use:** | ||
| - Frequently changing data | ||
| - One-time analytical queries | ||
| - Simple queries on small tables | ||
| ::: | ||
|
|
||
| ## Configuration | ||
|
|
||
| ```sql | ||
| -- Enable/disable aggregating index feature | ||
| SET enable_aggregating_index_scan = 1; -- Enable (default) | ||
| SET enable_aggregating_index_scan = 0; -- Disable | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| *Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.* | ||
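
Reviewer note: the filtered query from the "What Problem Does It Solve?" example is never actually indexed in this page. A minimal sketch of that case, reusing the `sales` table from Quick Setup, might look like the following. The index name `us_sales_summary` is hypothetical, and the cleanup step follows the Management Commands table rather than anything I have run.

```sql
-- Index the exact filter and aggregates used by the dashboard query.
CREATE AGGREGATING INDEX us_sales_summary AS
SELECT SUM(revenue), COUNT(*)
FROM sales
WHERE region = 'US';

-- Populate the index (manual refresh mode).
REFRESH AGGREGATING INDEX us_sales_summary;

-- Check the plan for an "aggregating index" entry, as in the Performance Example.
EXPLAIN SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US';

-- Clean up when the query pattern is retired.
DROP AGGREGATING INDEX us_sales_summary;
VACUUM TABLE sales;  -- per the Management Commands table, reclaims index storage
```

Because the index filter matches the query filter exactly, this also illustrates the "Match Query Patterns" row in Best Practices.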
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| # AI & ML Integration | ||
|
|
||
| Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language. | ||
|
|
||
| ## External Functions - The Recommended Approach | ||
|
|
||
| External functions enable you to connect your data with custom AI/ML infrastructure, providing maximum flexibility and performance for AI workloads. | ||
|
|
||
| | Feature | Benefits | | ||
| |---------|----------| | ||
| | **Custom Models** | Use any open-source or proprietary AI/ML models | | ||
| | **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference | | ||
| | **Data Privacy** | Keep your data within your infrastructure | | ||
| | **Scalability** | Independent scaling and resource optimization | | ||
| | **Flexibility** | Support for any programming language and ML framework | | ||
|
|
||
| ## MCP Server - Natural Language Data Interaction | ||
|
|
||
| The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools. | ||
|
|
||
| | Feature | Benefits | | ||
| |---------|----------| | ||
| | **Natural Language** | Query your data using plain English | | ||
| | **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents | | ||
| | **Real-time Analysis** | Get instant insights from your data | | ||
|
|
||
| ## Getting Started | ||
|
|
||
| **[External Functions Guide](/tidb-cloud-lake/guides/external-ai-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance | ||
|
|
||
| **[MCP Server Guide](/tidb-cloud-lake/guides/mcp-server.md)** - Build a conversational BI tool using mcp-databend and natural language queries | ||
|
|
||
| **[MCP Client Integration](/tidb-cloud-lake/guides/mcp-client-integration.md)** - Configure generic MCP clients (like Codex) to connect to Databend |
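
Reviewer note: the External Functions section has no example of what registering one looks like. A rough sketch follows, assuming a UDF server is already running. The address `http://0.0.0.0:8815`, the handler name `sentiment`, and the `reviews` table are all hypothetical, and the exact `CREATE FUNCTION` clauses should be checked against the External Functions Guide linked above.

```sql
-- Assumption: an external UDF server at this address exposes a handler named 'sentiment'.
CREATE FUNCTION sentiment AS (VARCHAR) RETURNS FLOAT
LANGUAGE python
HANDLER = 'sentiment'
ADDRESS = 'http://0.0.0.0:8815';

-- Call the external function like any scalar function.
-- `reviews` and `review_text` are hypothetical names.
SELECT review_text, sentiment(review_text) AS score
FROM reviews
LIMIT 10;
```

The model runs on the external server, so the data-privacy and GPU points in the feature table depend on where that server is deployed.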
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| --- | ||
| title: AI-Powered Features | ||
| --- | ||
|
|
||
| import SearchSVG from '@site/static/img/icon/search.svg' | ||
| import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse' | ||
| import AITip from '@site/docs/fragment/ai-tip.md' | ||
|
|
||
| <LanguageFileParse | ||
| cn={<AITip />} | ||
| /> | ||
|
|
||
| Databend Cloud includes AI-powered features that let you ask for help in natural language. These features are enabled by default; to disable them, go to **Manage** > **Settings**. | ||
|
|
||
| ### AI Chat for Assistance | ||
|
|
||
| AI Chat enables natural language interactions, allowing for intuitive information retrieval and streamlined problem-solving. | ||
|
|
||
| To launch AI Chat: | ||
|
|
||
| 1. Click the magnifying glass icon <SearchSVG/> located in the sidebar to open the search box. | ||
|
|
||
| 2. Switch to the **Chat** tab. | ||
|
|
||
| 3. Enter your question. | ||
|
|
||
|  | ||
|
|
||
| ### AI-Powered SQL Assistant | ||
|
|
||
| AI assistance is available for editing SQL statements within worksheets. You don't need to write your SQL from scratch — AI can generate it for you. | ||
|
|
||
| To involve AI when editing a SQL statement, simply type "/" at the beginning of a new line and input your query, like "return current time": | ||
|
|
||
|  | ||
|
|
||
| You can also get AI assistance for an existing SQL statement. To do so, highlight your SQL and click **Edit** to specify your desired changes or request further help. Alternatively, click **Chat** to engage in a conversation with AI for more comprehensive support. | ||
|
|
||
|  |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| --- | ||
| title: Audit Trail | ||
| --- | ||
|
|
||
| import EEFeature from '@site/src/components/EEFeature'; | ||
|
|
||
| <EEFeature featureName='AUDIT TRAIL'/> | ||
|
|
||
| Databend system history tables automatically capture detailed records of database activities, providing a complete audit trail for compliance and security monitoring. | ||
|
|
||
| The audit trail covers the following user activities: | ||
| - **Query execution** - Complete SQL execution audit trail (`query_history`) | ||
| - **Data access** - Database object access and modifications (`access_history`) | ||
| - **Authentication** - Login attempts and session tracking (`login_history`) | ||
|
|
||
| ## Available Audit Tables | ||
|
|
||
| Databend provides system history tables that capture different aspects of database activity. The following three are used for auditing: | ||
|
|
||
| | Table | Purpose | Key Use Cases | | ||
| |-------|---------|---------------| | ||
| | [query_history](/tidb-cloud-lake/sql/system-history-query-history.md) | Complete SQL execution audit trail | Performance monitoring, security auditing, compliance reporting | | ||
| | [access_history](/tidb-cloud-lake/sql/system-history-access-history.md) | Database object access and modifications | Data lineage tracking, compliance auditing, change management | | ||
| | [login_history](/tidb-cloud-lake/sql/system-history-login-history.md) | Authentication attempts and sessions | Security monitoring, failed login detection, access pattern analysis | | ||
|
|
||
| ## Audit Use Cases & Examples | ||
|
|
||
| ### Security Monitoring | ||
|
|
||
| **Monitor Failed Login Attempts** | ||
|
|
||
| Track authentication failures to identify potential security threats and unauthorized access attempts. | ||
|
|
||
| ```sql | ||
| -- Check for failed login attempts (security audit) | ||
| SELECT event_time, user_name, client_ip, error_message | ||
| FROM system_history.login_history | ||
| WHERE event_type = 'LoginFailed' | ||
| ORDER BY event_time DESC; | ||
| ``` | ||
|
|
||
| Example output: | ||
| ``` | ||
| event_time: 2025-06-03 06:07:32.512021 | ||
| user_name: root1 | ||
| client_ip: 127.0.0.1:62050 | ||
| error_message: UnknownUser. Code: 2201, Text = User 'root1'@'%' does not exist. | ||
| ``` | ||
|
|
||
| ### Compliance Reporting | ||
|
|
||
| **Track Database Schema Changes** | ||
|
|
||
| Monitor DDL operations for compliance and change management requirements. | ||
|
|
||
| ```sql | ||
| -- Audit DDL operations (compliance tracking) | ||
| SELECT query_id, query_start, user_name, object_modified_by_ddl | ||
| FROM system_history.access_history | ||
| WHERE object_modified_by_ddl != '[]' | ||
| ORDER BY query_start DESC; | ||
| ``` | ||
|
|
||
| Example output for a `CREATE TABLE` operation: | ||
| ``` | ||
| query_id: c2c1c7be-cee4-4868-a28e-8862b122c365 | ||
| query_start: 2025-06-12 03:31:19.042128 | ||
| user_name: root | ||
| object_modified_by_ddl: [{"object_domain":"Table","object_name":"default.default.t","operation_type":"Create"}] | ||
| ``` | ||
|
|
||
| **Audit Data Access Patterns** | ||
|
|
||
| Track who accessed what data and when for compliance and data governance. | ||
|
|
||
| ```sql | ||
| -- Track data access for compliance | ||
| SELECT query_id, query_start, user_name, base_objects_accessed | ||
| FROM system_history.access_history | ||
| WHERE base_objects_accessed != '[]' | ||
| ORDER BY query_start DESC; | ||
| ``` | ||
|
|
||
| ### Operational Monitoring | ||
|
|
||
| **Complete Query Execution Audit** | ||
|
|
||
| Maintain comprehensive records of all SQL operations with user and timing information. | ||
|
|
||
| ```sql | ||
| -- Complete query audit with user and timing information | ||
| SELECT query_id, sql_user, query_text, query_start_time, query_duration_ms, client_address | ||
| FROM system_history.query_history | ||
| WHERE event_date >= TODAY() - INTERVAL 7 DAY | ||
| ORDER BY query_start_time DESC; | ||
| ``` | ||
|
|
||
| Example output: | ||
| ``` | ||
| query_id: 4e1f50a9-bce2-45cc-86e4-c7a36b9b8d43 | ||
| sql_user: root | ||
| query_text: SELECT * FROM t | ||
| query_start_time: 2025-06-12 03:31:35.041725 | ||
| query_duration_ms: 94 | ||
| client_address: 127.0.0.1 | ||
| ``` | ||
|
|
||
| For detailed information about each audit table and their specific fields, see the [System History Tables](/tidb-cloud-lake/sql/system-history-tables.md) reference documentation. |
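
Reviewer note: the failed-login query lists individual events, but security monitoring usually needs a per-user summary. A minimal sketch that uses only the columns already shown in this page (`user_name`, `event_time`, `event_type`) follows. The threshold of 5 is an arbitrary illustrative value.

```sql
-- Summarize failed logins per user to surface repeated failures.
SELECT user_name,
       COUNT(*)        AS failed_attempts,
       MAX(event_time) AS last_attempt
FROM system_history.login_history
WHERE event_type = 'LoginFailed'
GROUP BY user_name
HAVING COUNT(*) >= 5          -- illustrative threshold, tune as needed
ORDER BY failed_attempts DESC;
```

Grouping by `client_ip` as well would split the counts, because the example output shows that column carrying a port number (`127.0.0.1:62050`).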