1,167 changes: 1,167 additions & 0 deletions TOC-tidb-cloud-lake.md

10 changes: 10 additions & 0 deletions tidb-cloud-lake/_index.md
@@ -0,0 +1,10 @@
---
title: TiDB Cloud Lake Documentation
hide_sidebar: true
hide_commit: true
summary: TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations.
---

<LearningPathContainer platform="tidb-cloud" title="TiDB Cloud Lake" subTitle="TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations.">

</LearningPathContainer>
19 changes: 19 additions & 0 deletions tidb-cloud-lake/guides/access-control.md
@@ -0,0 +1,19 @@
---
title: Access Control
---

Databend incorporates both [Role-Based Access Control (RBAC)](https://en.wikipedia.org/wiki/Role-based_access_control) and [Discretionary Access Control (DAC)](https://en.wikipedia.org/wiki/Discretionary_access_control) models for its access control functionality. When a user accesses a data object in Databend, they must be granted appropriate privileges or roles, or they need to have ownership of the data object. A data object can refer to various elements, such as a database, table, view, stage, or UDF.

![Access control in Databend](/img/guides/access-control-1.png)

| Concept | Description |
|-----------|------------------------------------------------------------|
| Privileges | Privileges play a crucial role when interacting with data objects in Databend. These permissions, such as read, write, and execute, provide precise control over user actions, ensuring alignment with user requirements and maintaining data security. |
| Roles | Roles simplify access control. Roles are predefined sets of privileges assigned to users, streamlining permission management. Administrators can categorize users based on responsibilities, granting permissions efficiently without individual configurations. |
| Ownership | Ownership is a specialized privilege for controlling data access. When a user owns a data object, they have the highest control level, dictating access permissions. This straightforward ownership model empowers users to manage their data, controlling who can access or modify it within the Databend environment. |

This guide describes the related concepts and provides instructions on how to manage access control in Databend:

- [Privileges](/tidb-cloud-lake/guides/privileges.md)
- [Roles](/tidb-cloud-lake/guides/roles.md)
- [Ownership](/tidb-cloud-lake/guides/ownership.md)
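
As a quick orientation, the three concepts above map to SQL roughly as follows. This is a hedged sketch with hypothetical user, role, and table names; see the linked guides for the exact syntax and supported options.

```sql
-- Privilege: grant a specific permission on a data object to a user
GRANT SELECT ON default.orders TO analyst1;

-- Role: bundle privileges once, then assign the role to users
CREATE ROLE analyst;
GRANT SELECT ON default.* TO ROLE analyst;
GRANT ROLE analyst TO analyst1;

-- Ownership: give a role full control over a data object
GRANT OWNERSHIP ON default.orders TO ROLE analyst;
```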
148 changes: 148 additions & 0 deletions tidb-cloud-lake/guides/aggregating-index.md
@@ -0,0 +1,148 @@
---
title: Aggregating Index
---

# Aggregating Index: Precomputed Results for Instant Analytics

Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations.

## What Problem Does It Solve?

Analytical queries on large datasets face significant performance challenges:

| Problem | Impact | Aggregating Index Solution |
|---------|--------|---------------------------|
| **Full Table Scans** | SUM, COUNT, MIN, MAX queries scan millions of rows | Read precomputed results instantly |
| **Repeated Calculations** | Same aggregations computed over and over | Store results once, reuse many times |
| **Slow Dashboard Queries** | Analytics dashboards take minutes to load | Sub-second response for common metrics |
| **High Compute Costs** | Heavy aggregation workloads consume resources | Minimal compute for cached results |
| **Poor User Experience** | Users wait for reports and analytics | Instant results for business intelligence |

**Example**: A sales analytics query `SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US'` on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly.

## How It Works

1. **Index Creation** → Define aggregation queries to precompute
2. **Result Storage** → Databend stores aggregated results in optimized blocks
3. **Query Matching** → Incoming queries automatically use precomputed results
4. **Automatic Updates** → Results refresh when underlying data changes

## Quick Setup

```sql
-- Create table with sample data
CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT);

-- Create aggregating index for common analytics
CREATE AGGREGATING INDEX sales_summary AS
SELECT region, SUM(revenue), COUNT(*), AVG(quantity)
FROM sales
GROUP BY region;

-- Refresh the index (manual mode)
REFRESH AGGREGATING INDEX sales_summary;

-- Verify the index is used
EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;
```

## Supported Operations

| ✅ Supported | ❌ Not Supported |
|-------------|-----------------|
| SUM, COUNT, MIN, MAX, AVG | Window Functions |
| GROUP BY clauses | GROUPING SETS |
| WHERE filters | ORDER BY, LIMIT |
| Simple aggregations | Complex subqueries |
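
To illustrate the boundary, the first query below fits the supported pattern, while the second uses a window function and therefore cannot be served by an aggregating index (a sketch reusing the `sales` table from above):

```sql
-- Can be served by an aggregating index:
-- simple aggregation with GROUP BY and a WHERE filter
SELECT region, SUM(revenue)
FROM sales
WHERE region = 'US'
GROUP BY region;

-- Cannot be served by an aggregating index:
-- window functions are not supported
SELECT region, SUM(revenue) OVER (PARTITION BY region)
FROM sales;
```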

## Refresh Strategies

| Strategy | When to Use | Configuration |
|----------|-------------|---------------|
| **Automatic (SYNC)** | Real-time analytics, small datasets | `CREATE AGGREGATING INDEX ... SYNC` |
| **Manual** | Large datasets, batch processing | `CREATE AGGREGATING INDEX ...` (default) |
| **Background (Cloud)** | Production workloads | Automatic in Databend Cloud |

### Automatic vs Manual Refresh

```sql
-- Automatic refresh (updates with every data change)
CREATE AGGREGATING INDEX auto_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC;

-- Manual refresh (update on demand)
CREATE AGGREGATING INDEX manual_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region;

REFRESH AGGREGATING INDEX manual_summary;
```

## Performance Example

This example shows how the optimizer rewrites a query to read precomputed results instead of scanning raw data:

```sql
-- Prepare data
CREATE TABLE agg(a int, b int, c int);
INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5);

-- Create an aggregating index
CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;

-- Refresh the aggregating index
REFRESH AGGREGATING INDEX my_agg_index;

-- Verify if the aggregating index works
EXPLAIN SELECT MIN(a), MAX(c) FROM agg;

-- Key indicators in the execution plan:
-- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg]
-- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]]
-- This shows the query uses precomputed results instead of scanning raw data
```

## Best Practices

| Practice | Benefit |
|----------|---------|
| **Index Common Queries** | Focus on frequently executed analytics |
| **Use Manual Refresh** | Better control over update timing |
| **Monitor Index Usage** | Use EXPLAIN to verify index utilization |
| **Clean Up Unused Indexes** | Remove indexes that aren't being used |
| **Match Query Patterns** | Index filters should match actual queries |

## Management Commands

| Command | Purpose |
|---------|---------|
| `CREATE AGGREGATING INDEX` | Create new aggregating index |
| `REFRESH AGGREGATING INDEX` | Update index with latest data |
| `DROP AGGREGATING INDEX` | Remove index (use VACUUM TABLE to clean storage) |
| `SHOW AGGREGATING INDEXES` | List all indexes |
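
Putting the commands together, a typical cleanup pass might look like this (a sketch using the `sales_summary` index and `sales` table from the Quick Setup above):

```sql
-- List existing aggregating indexes to find unused ones
SHOW AGGREGATING INDEXES;

-- Drop an index that is no longer needed
DROP AGGREGATING INDEX sales_summary;

-- Reclaim the storage the dropped index occupied
VACUUM TABLE sales;
```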

## Important Notes

:::tip
**When to Use Aggregating Indexes:**
- Frequent analytical queries (dashboards, reports)
- Large datasets with repeated aggregations
- Stable query patterns
- Performance-critical applications

**When NOT to Use:**
- Frequently changing data
- One-time analytical queries
- Simple queries on small tables
:::

## Configuration

```sql
-- Enable/disable aggregating index feature
SET enable_aggregating_index_scan = 1; -- Enable (default)
SET enable_aggregating_index_scan = 0; -- Disable
```

---

*Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.*
33 changes: 33 additions & 0 deletions tidb-cloud-lake/guides/ai-ml-integration.md
@@ -0,0 +1,33 @@
# AI & ML Integration

Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language.

## External Functions - The Recommended Approach

External functions enable you to connect your data with custom AI/ML infrastructure, providing maximum flexibility and performance for AI workloads.

| Feature | Benefits |
|---------|----------|
| **Custom Models** | Use any open-source or proprietary AI/ML models |
| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference |
| **Data Privacy** | Keep your data within your infrastructure |
| **Scalability** | Independent scaling and resource optimization |
| **Flexibility** | Support for any programming language and ML framework |
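
As a rough sketch of how an external function is declared and called, the example below registers a UDF backed by your own inference server. The function name, handler, and server address are hypothetical; consult the External Functions guide for the exact syntax.

```sql
-- Register an external UDF served by your own AI/ML infrastructure
CREATE FUNCTION embed_text (VARCHAR)
    RETURNS ARRAY(FLOAT32)
    LANGUAGE python
    HANDLER = 'embed_text'
    ADDRESS = 'https://your-udf-server:8815';

-- Call it like any built-in function
SELECT embed_text('hello world');
```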

## MCP Server - Natural Language Data Interaction

The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools.

| Feature | Benefits |
|---------|----------|
| **Natural Language** | Query your data using plain English |
| **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents |
| **Real-time Analysis** | Get instant insights from your data |

## Getting Started

**[External Functions Guide](/tidb-cloud-lake/guides/external-ai-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance

**[MCP Server Guide](/tidb-cloud-lake/guides/mcp-server.md)** - Build a conversational BI tool using mcp-databend and natural language queries

**[MCP Client Integration](/tidb-cloud-lake/guides/mcp-client-integration.md)** - Configure generic MCP clients (like Codex) to connect to Databend
39 changes: 39 additions & 0 deletions tidb-cloud-lake/guides/ai-powered-features.md
@@ -0,0 +1,39 @@
---
title: AI-Powered Features
---

import SearchSVG from '@site/static/img/icon/search.svg'
import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse'
import AITip from '@site/docs/fragment/ai-tip.md'

<LanguageFileParse
cn={<AITip />}
/>

Databend Cloud includes AI-powered features that let you get help, assistance, and solutions through natural language conversations. These features are enabled by default, but you can disable them by navigating to **Manage** > **Settings**.

### AI Chat for Assistance

AI Chat enables natural language interactions, allowing for intuitive information retrieval and streamlined problem-solving.

To launch AI Chat:

1. Click the magnifying glass icon <SearchSVG/> located in the sidebar to open the search box.

2. Switch to the **Chat** tab.

3. Enter your question.

![AI Chat](@site/static/img/documents/worksheet/ai-chat.gif)

### AI-Powered SQL Assistant

AI assistance is available for editing SQL statements within worksheets. You don't need to write your SQL from scratch — AI can generate it for you.

To involve AI when editing a SQL statement, simply type "/" at the beginning of a new line and input your query, like "return current time":

![Generating SQL with AI in a worksheet](@site/static/img/documents/worksheet/ai-worksheet-1.gif)

You can also get AI assistance for an existing SQL statement. To do so, highlight your SQL and click **Edit** to specify your desired changes or request further help. Alternatively, click **Chat** to engage in a conversation with AI for more comprehensive support.

![Editing an existing SQL statement with AI](@site/static/img/documents/worksheet/ai-worksheet-2.gif)
108 changes: 108 additions & 0 deletions tidb-cloud-lake/guides/audit-trail.md
@@ -0,0 +1,108 @@
---
title: Audit Trail
---

import EEFeature from '@site/src/components/EEFeature';

<EEFeature featureName='AUDIT TRAIL'/>

Databend system history tables automatically capture detailed records of database activities, providing a complete audit trail for compliance and security monitoring.

These tables let you audit:
- **Query execution** - Complete SQL execution audit trail (`query_history`)
- **Data access** - Database object access and modifications (`access_history`)
- **Authentication** - Login attempts and session tracking (`login_history`)

## Available Audit Tables

Databend provides three system history tables that capture different aspects of database activity:

| Table | Purpose | Key Use Cases |
|-------|---------|---------------|
| [query_history](/tidb-cloud-lake/sql/system-history-query-history.md) | Complete SQL execution audit trail | Performance monitoring, security auditing, compliance reporting |
| [access_history](/tidb-cloud-lake/sql/system-history-access-history.md) | Database object access and modifications | Data lineage tracking, compliance auditing, change management |
| [login_history](/tidb-cloud-lake/sql/system-history-login-history.md) | Authentication attempts and sessions | Security monitoring, failed login detection, access pattern analysis |

## Audit Use Cases & Examples

### Security Monitoring

**Monitor Failed Login Attempts**

Track authentication failures to identify potential security threats and unauthorized access attempts.

```sql
-- Check for failed login attempts (security audit)
SELECT event_time, user_name, client_ip, error_message
FROM system_history.login_history
WHERE event_type = 'LoginFailed'
ORDER BY event_time DESC;
```

Example output:
```
event_time: 2025-06-03 06:07:32.512021
user_name: root1
client_ip: 127.0.0.1:62050
error_message: UnknownUser. Code: 2201, Text = User 'root1'@'%' does not exist.
```

### Compliance Reporting

**Track Database Schema Changes**

Monitor DDL operations for compliance and change management requirements.

```sql
-- Audit DDL operations (compliance tracking)
SELECT query_id, query_start, user_name, object_modified_by_ddl
FROM system_history.access_history
WHERE object_modified_by_ddl != '[]'
ORDER BY query_start DESC;
```

Example for `CREATE TABLE` operation:
```
query_id: c2c1c7be-cee4-4868-a28e-8862b122c365
query_start: 2025-06-12 03:31:19.042128
user_name: root
object_modified_by_ddl: [{"object_domain":"Table","object_name":"default.default.t","operation_type":"Create"}]
```

**Audit Data Access Patterns**

Track who accessed what data and when for compliance and data governance.

```sql
-- Track data access for compliance
SELECT query_id, query_start, user_name, base_objects_accessed
FROM system_history.access_history
WHERE base_objects_accessed != '[]'
ORDER BY query_start DESC;
```

### Operational Monitoring

**Complete Query Execution Audit**

Maintain comprehensive records of all SQL operations with user and timing information.

```sql
-- Complete query audit with user and timing information
SELECT query_id, sql_user, query_text, query_start_time, query_duration_ms, client_address
FROM system_history.query_history
WHERE event_date >= TODAY() - INTERVAL 7 DAY
ORDER BY query_start_time DESC;
```

Example output:
```
query_id: 4e1f50a9-bce2-45cc-86e4-c7a36b9b8d43
sql_user: root
query_text: SELECT * FROM t
query_start_time: 2025-06-12 03:31:35.041725
query_duration_ms: 94
client_address: 127.0.0.1
```

For detailed information about each audit table and their specific fields, see the [System History Tables](/tidb-cloud-lake/sql/system-history-tables.md) reference documentation.