1,167 changes: 1,167 additions & 0 deletions TOC-tidb-cloud-lake.md

10 changes: 10 additions & 0 deletions tidb-cloud-lake/_index.md
@@ -0,0 +1,10 @@
---
title: TiDB Cloud Lake Documentation
hide_sidebar: true
hide_commit: true
summary: TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations.
---

<LearningPathContainer platform="tidb-cloud" title="TiDB Cloud Lake" subTitle="TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations.">

</LearningPathContainer>
19 changes: 19 additions & 0 deletions tidb-cloud-lake/guides/access-control.md
@@ -0,0 +1,19 @@
---
title: Access Control
---

Databend incorporates both [Role-Based Access Control (RBAC)](https://en.wikipedia.org/wiki/Role-based_access_control) and [Discretionary Access Control (DAC)](https://en.wikipedia.org/wiki/Discretionary_access_control) models for its access control functionality. When a user accesses a data object in Databend, they must be granted appropriate privileges or roles, or they need to have ownership of the data object. A data object can refer to various elements, such as a database, table, view, stage, or UDF.

![Access control in Databend](/img/guides/access-control-1.png)

| Concept | Description |
|-----------|------------------------------------------------------------|
| Privileges | Privileges play a crucial role when interacting with data objects in Databend. These permissions, such as read, write, and execute, provide precise control over user actions, ensuring alignment with user requirements and maintaining data security. |
| Roles | Roles simplify access control. Roles are predefined sets of privileges assigned to users, streamlining permission management. Administrators can categorize users based on responsibilities, granting permissions efficiently without individual configurations. |
| Ownership | Ownership is a specialized privilege for controlling data access. When a user owns a data object, they have the highest control level, dictating access permissions. This straightforward ownership model empowers users to manage their data, controlling who can access or modify it within the Databend environment. |

This guide describes the related concepts and provides instructions on how to manage access control in Databend:

- [Privileges](/tidb-cloud-lake/guides/privileges.md)
- [Roles](/tidb-cloud-lake/guides/roles.md)
- [Ownership](/tidb-cloud-lake/guides/ownership.md)
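
As a quick orientation, the three concepts above map to SQL roughly as follows. This is a hedged sketch with hypothetical user, role, and table names; see the linked guides for the exact syntax and supported options.

```sql
-- Privilege: grant a specific permission on a data object to a user
GRANT SELECT ON default.orders TO analyst1;

-- Role: bundle privileges once, then assign the role to users
CREATE ROLE analyst;
GRANT SELECT ON default.* TO ROLE analyst;
GRANT ROLE analyst TO analyst1;

-- Ownership: give a role full control over a data object
GRANT OWNERSHIP ON default.orders TO ROLE analyst;
```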
148 changes: 148 additions & 0 deletions tidb-cloud-lake/guides/aggregating-index.md
@@ -0,0 +1,148 @@
---
title: Aggregating Index
---

# Aggregating Index: Precomputed Results for Instant Analytics

Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations.

## What Problem Does It Solve?

Analytical queries on large datasets face significant performance challenges:

| Problem | Impact | Aggregating Index Solution |
|---------|--------|---------------------------|
| **Full Table Scans** | SUM, COUNT, MIN, MAX queries scan millions of rows | Read precomputed results instantly |
| **Repeated Calculations** | Same aggregations computed over and over | Store results once, reuse many times |
| **Slow Dashboard Queries** | Analytics dashboards take minutes to load | Sub-second response for common metrics |
| **High Compute Costs** | Heavy aggregation workloads consume resources | Minimal compute for cached results |
| **Poor User Experience** | Users wait for reports and analytics | Instant results for business intelligence |

**Example**: A sales analytics query `SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US'` on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly.

## How It Works

1. **Index Creation** → Define aggregation queries to precompute
2. **Result Storage** → Databend stores aggregated results in optimized blocks
3. **Query Matching** → Incoming queries automatically use precomputed results
4. **Automatic Updates** → Results refresh when underlying data changes

## Quick Setup

```sql
-- Create table with sample data
CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT);

-- Create aggregating index for common analytics
CREATE AGGREGATING INDEX sales_summary AS
SELECT region, SUM(revenue), COUNT(*), AVG(quantity)
FROM sales
GROUP BY region;

-- Refresh the index (manual mode)
REFRESH AGGREGATING INDEX sales_summary;

-- Verify the index is used
EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;
```

## Supported Operations

| ✅ Supported | ❌ Not Supported |
|-------------|-----------------|
| SUM, COUNT, MIN, MAX, AVG | Window Functions |
| GROUP BY clauses | GROUPING SETS |
| WHERE filters | ORDER BY, LIMIT |
| Simple aggregations | Complex subqueries |
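
To illustrate the boundary, the first query below fits the supported pattern, while the second uses a window function and therefore cannot be served by an aggregating index (a sketch reusing the `sales` table from above):

```sql
-- Can be served by an aggregating index:
-- simple aggregation with GROUP BY and a WHERE filter
SELECT region, SUM(revenue)
FROM sales
WHERE region = 'US'
GROUP BY region;

-- Cannot be served by an aggregating index:
-- window functions are not supported
SELECT region, SUM(revenue) OVER (PARTITION BY region)
FROM sales;
```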

## Refresh Strategies

| Strategy | When to Use | Configuration |
|----------|-------------|---------------|
| **Automatic (SYNC)** | Real-time analytics, small datasets | `CREATE AGGREGATING INDEX ... SYNC` |
| **Manual** | Large datasets, batch processing | `CREATE AGGREGATING INDEX ...` (default) |
| **Background (Cloud)** | Production workloads | Automatic in Databend Cloud |

### Automatic vs Manual Refresh

```sql
-- Automatic refresh (updates with every data change)
CREATE AGGREGATING INDEX auto_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC;

-- Manual refresh (update on demand)
CREATE AGGREGATING INDEX manual_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region;

REFRESH AGGREGATING INDEX manual_summary;
```

## Performance Example

This example shows how the optimizer rewrites a query to read precomputed results instead of scanning raw data:

```sql
-- Prepare data
CREATE TABLE agg(a int, b int, c int);
INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5);

-- Create an aggregating index
CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;

-- Refresh the aggregating index
REFRESH AGGREGATING INDEX my_agg_index;

-- Verify if the aggregating index works
EXPLAIN SELECT MIN(a), MAX(c) FROM agg;

-- Key indicators in the execution plan:
-- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg]
-- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]]
-- This shows the query uses precomputed results instead of scanning raw data
```

## Best Practices

| Practice | Benefit |
|----------|---------|
| **Index Common Queries** | Focus on frequently executed analytics |
| **Use Manual Refresh** | Better control over update timing |
| **Monitor Index Usage** | Use EXPLAIN to verify index utilization |
| **Clean Up Unused Indexes** | Remove indexes that aren't being used |
| **Match Query Patterns** | Index filters should match actual queries |

## Management Commands

| Command | Purpose |
|---------|---------|
| `CREATE AGGREGATING INDEX` | Create new aggregating index |
| `REFRESH AGGREGATING INDEX` | Update index with latest data |
| `DROP AGGREGATING INDEX` | Remove index (use VACUUM TABLE to clean storage) |
| `SHOW AGGREGATING INDEXES` | List all indexes |
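
Putting the commands together, a typical cleanup pass might look like this (a sketch using the `sales_summary` index and `sales` table from the Quick Setup above):

```sql
-- List existing aggregating indexes to find unused ones
SHOW AGGREGATING INDEXES;

-- Drop an index that is no longer needed
DROP AGGREGATING INDEX sales_summary;

-- Reclaim the storage the dropped index occupied
VACUUM TABLE sales;
```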

## Important Notes

:::tip
**When to Use Aggregating Indexes:**
- Frequent analytical queries (dashboards, reports)
- Large datasets with repeated aggregations
- Stable query patterns
- Performance-critical applications

**When NOT to Use:**
- Frequently changing data
- One-time analytical queries
- Simple queries on small tables
:::

## Configuration

```sql
-- Enable/disable aggregating index feature
SET enable_aggregating_index_scan = 1; -- Enable (default)
SET enable_aggregating_index_scan = 0; -- Disable
```

---

*Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.*
33 changes: 33 additions & 0 deletions tidb-cloud-lake/guides/ai-ml-integration.md
@@ -0,0 +1,33 @@
# AI & ML Integration

Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language.

## External Functions - The Recommended Approach

External functions enable you to connect your data with custom AI/ML infrastructure, providing maximum flexibility and performance for AI workloads.

| Feature | Benefits |
|---------|----------|
| **Custom Models** | Use any open-source or proprietary AI/ML models |
| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference |
| **Data Privacy** | Keep your data within your infrastructure |
| **Scalability** | Independent scaling and resource optimization |
| **Flexibility** | Support for any programming language and ML framework |
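
As a rough sketch of how an external function is declared and called, the example below registers a UDF backed by your own inference server. The function name, handler, and server address are hypothetical; consult the External Functions guide for the exact syntax.

```sql
-- Register an external UDF served by your own AI/ML infrastructure
CREATE FUNCTION embed_text (VARCHAR)
    RETURNS ARRAY(FLOAT32)
    LANGUAGE python
    HANDLER = 'embed_text'
    ADDRESS = 'https://your-udf-server:8815';

-- Call it like any built-in function
SELECT embed_text('hello world');
```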

## MCP Server - Natural Language Data Interaction

The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools.

| Feature | Benefits |
|---------|----------|
| **Natural Language** | Query your data using plain English |
| **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents |
| **Real-time Analysis** | Get instant insights from your data |

## Getting Started

**[External Functions Guide](/tidb-cloud-lake/guides/external-ai-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance

**[MCP Server Guide](/tidb-cloud-lake/guides/mcp-server.md)** - Build a conversational BI tool using mcp-databend and natural language queries

**[MCP Client Integration](/tidb-cloud-lake/guides/mcp-client-integration.md)** - Configure generic MCP clients (like Codex) to connect to Databend
39 changes: 39 additions & 0 deletions tidb-cloud-lake/guides/ai-powered-features.md
@@ -0,0 +1,39 @@
---
title: AI-Powered Features
---

import SearchSVG from '@site/static/img/icon/search.svg'
import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse'
import AITip from '@site/docs/fragment/ai-tip.md'

<LanguageFileParse
cn={<AITip />}
/>

Databend Cloud includes AI-powered features that let you get help, assistance, and solutions through natural language conversations. These features are enabled by default, but you can disable them by navigating to **Manage** > **Settings**.

### AI Chat for Assistance

AI Chat enables natural language interactions, allowing for intuitive information retrieval and streamlined problem-solving.

To launch AI Chat:

1. Click the magnifying glass icon <SearchSVG/> located in the sidebar to open the search box.

2. Switch to the **Chat** tab.

3. Enter your question.

![AI Chat](@site/static/img/documents/worksheet/ai-chat.gif)

### AI-Powered SQL Assistant

AI assistance is available for editing SQL statements within worksheets. You don't need to write your SQL from scratch — AI can generate it for you.

To involve AI when editing a SQL statement, simply type "/" at the beginning of a new line and input your query, like "return current time":

![Generating SQL with AI in a worksheet](@site/static/img/documents/worksheet/ai-worksheet-1.gif)

You can also get AI assistance for an existing SQL statement. To do so, highlight your SQL and click **Edit** to specify your desired changes or request further help. Alternatively, click **Chat** to engage in a conversation with AI for more comprehensive support.

![Editing an existing SQL statement with AI](@site/static/img/documents/worksheet/ai-worksheet-2.gif)
108 changes: 108 additions & 0 deletions tidb-cloud-lake/guides/audit-trail.md
@@ -0,0 +1,108 @@
---
title: Audit Trail
---

import EEFeature from '@site/src/components/EEFeature';

<EEFeature featureName='AUDIT TRAIL'/>

Databend system history tables automatically capture detailed records of database activities, providing a complete audit trail for compliance and security monitoring.

These tables let you audit:
- **Query execution** - Complete SQL execution audit trail (`query_history`)
- **Data access** - Database object access and modifications (`access_history`)
- **Authentication** - Login attempts and session tracking (`login_history`)

## Available Audit Tables

Databend provides three system history tables that capture different aspects of database activity:

| Table | Purpose | Key Use Cases |
|-------|---------|---------------|
| [query_history](/tidb-cloud-lake/sql/system-history-query-history.md) | Complete SQL execution audit trail | Performance monitoring, security auditing, compliance reporting |
| [access_history](/tidb-cloud-lake/sql/system-history-access-history.md) | Database object access and modifications | Data lineage tracking, compliance auditing, change management |
| [login_history](/tidb-cloud-lake/sql/system-history-login-history.md) | Authentication attempts and sessions | Security monitoring, failed login detection, access pattern analysis |

## Audit Use Cases & Examples

### Security Monitoring

**Monitor Failed Login Attempts**

Track authentication failures to identify potential security threats and unauthorized access attempts.

```sql
-- Check for failed login attempts (security audit)
SELECT event_time, user_name, client_ip, error_message
FROM system_history.login_history
WHERE event_type = 'LoginFailed'
ORDER BY event_time DESC;
```

Example output:
```
event_time: 2025-06-03 06:07:32.512021
user_name: root1
client_ip: 127.0.0.1:62050
error_message: UnknownUser. Code: 2201, Text = User 'root1'@'%' does not exist.
```

### Compliance Reporting

**Track Database Schema Changes**

Monitor DDL operations for compliance and change management requirements.

```sql
-- Audit DDL operations (compliance tracking)
SELECT query_id, query_start, user_name, object_modified_by_ddl
FROM system_history.access_history
WHERE object_modified_by_ddl != '[]'
ORDER BY query_start DESC;
```

Example for `CREATE TABLE` operation:
```
query_id: c2c1c7be-cee4-4868-a28e-8862b122c365
query_start: 2025-06-12 03:31:19.042128
user_name: root
object_modified_by_ddl: [{"object_domain":"Table","object_name":"default.default.t","operation_type":"Create"}]
```

**Audit Data Access Patterns**

Track who accessed what data and when for compliance and data governance.

```sql
-- Track data access for compliance
SELECT query_id, query_start, user_name, base_objects_accessed
FROM system_history.access_history
WHERE base_objects_accessed != '[]'
ORDER BY query_start DESC;
```

### Operational Monitoring

**Complete Query Execution Audit**

Maintain comprehensive records of all SQL operations with user and timing information.

```sql
-- Complete query audit with user and timing information
SELECT query_id, sql_user, query_text, query_start_time, query_duration_ms, client_address
FROM system_history.query_history
WHERE event_date >= TODAY() - INTERVAL 7 DAY
ORDER BY query_start_time DESC;
```

Example output:
```
query_id: 4e1f50a9-bce2-45cc-86e4-c7a36b9b8d43
sql_user: root
query_text: SELECT * FROM t
query_start_time: 2025-06-12 03:31:35.041725
query_duration_ms: 94
client_address: 127.0.0.1
```

For detailed information about each audit table and their specific fields, see the [System History Tables](/tidb-cloud-lake/sql/system-history-tables.md) reference documentation.