diff --git a/TOC-tidb-cloud-lake.md b/TOC-tidb-cloud-lake.md new file mode 100644 index 0000000000000..f7157cf6e84d4 --- /dev/null +++ b/TOC-tidb-cloud-lake.md @@ -0,0 +1,1167 @@ + + + +# Table of Contents + +## Get Started + +- [Overview](/tidb-cloud-lake/lake-overview.md) +- [Quick Start](/tidb-cloud-lake/lake-quick-start.md) + +## Guides + +- Resources + - [Dashboards](/tidb-cloud-lake/guides/dashboards.md) + - [Organization & Members](/tidb-cloud-lake/guides/organization-members.md) + - [Warehouses](/tidb-cloud-lake/guides/warehouse.md) + - [Worksheets](/tidb-cloud-lake/guides/worksheet.md) +- Administration + - [AI-Powered Features](/tidb-cloud-lake/guides/ai-powered-features.md) + - [Manage Costs](/tidb-cloud-lake/guides/manage-costs.md) + - [Track Metrics with Prometheus](/tidb-cloud-lake/guides/track-metrics.md) + - [Monitor Usage](/tidb-cloud-lake/guides/monitor-usage.md) +- Security + - [Authenticate with AWS IAM Role](/tidb-cloud-lake/guides/authenticate-with-aws-iam-role.md) + - [Connecting to Databend Cloud with AWS PrivateLink](/tidb-cloud-lake/guides/connecting-to-databend-cloud-with-aws-privatelink.md) +- Integrations + - [Data Integration Overview](/tidb-cloud-lake/guides/data-integration-overview.md) + - [Integrate with MySQL](/tidb-cloud-lake/guides/integrate-with-mysql.md) + - [Integrate with Amazon S3](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md) +- Connect + - [Overview](/tidb-cloud-lake/guides/connect-to-databend.md) + - SQL Clients + - [BendSQL](/tidb-cloud-lake/guides/bendsql.md) + - [DBeaver](/tidb-cloud-lake/guides/dbeaver.md) + - Drivers + - [Golang](/tidb-cloud-lake/guides/connect-using-golang.md) + - [Java](/tidb-cloud-lake/guides/connect-using-java.md) + - [Node.js](/tidb-cloud-lake/guides/connect-using-node-js.md) + - [Python](/tidb-cloud-lake/guides/connect-using-python.md) + - [Rust](/tidb-cloud-lake/guides/connect-using-rust.md) + - Visualization + - [Grafana](/tidb-cloud-lake/guides/grafana.md) + - [Tableau](/tidb-cloud-lake/guides/tableau.md) + - [Superset](/tidb-cloud-lake/guides/superset.md) + - [Metabase](/tidb-cloud-lake/guides/metabase.md) + - [Deepnote](/tidb-cloud-lake/guides/deepnote.md) + - [Jupyter Notebook](/tidb-cloud-lake/guides/jupyter-notebook.md) + - [MindsDB](/tidb-cloud-lake/guides/mindsdb.md) + - [Redash](/tidb-cloud-lake/guides/redash.md) +- Data Loading + - Work with Stages + - [Stage Overview](/tidb-cloud-lake/guides/what-is-stage.md) + - [Upload to Stage](/tidb-cloud-lake/guides/upload-to-stage.md) + - Load from Files + - [Load from Stage](/tidb-cloud-lake/guides/load-from-stage.md) + - [Load from Bucket](/tidb-cloud-lake/guides/load-from-bucket.md) + - [Load from Local File](/tidb-cloud-lake/guides/load-from-local-file.md) + - [Load from Remote File](/tidb-cloud-lake/guides/load-from-remote-file.md) + - Load with Platforms + - [Load with Addax](/tidb-cloud-lake/guides/load-with-addax.md) + - [Load with Airbyte](/tidb-cloud-lake/guides/load-with-airbyte.md) + - [Load with DataX](/tidb-cloud-lake/guides/load-with-datax.md) + - [Load with dbt](/tidb-cloud-lake/guides/load-with-dbt.md) + - [Load with Debezium](/tidb-cloud-lake/guides/load-with-debezium.md) + - [Load with Flink CDC](/tidb-cloud-lake/guides/load-with-flink-cdc.md) + - [Load with Kafka](/tidb-cloud-lake/guides/load-with-kafka.md) + - [Load with Tapdata](/tidb-cloud-lake/guides/load-with-tapdata.md) + - [Load with Vector](/tidb-cloud-lake/guides/load-with-vector.md) + - Load Semi-structured Formats + - 
[Overview](/tidb-cloud-lake/guides/load-semi-structured-formats.md) + - [Load Parquet](/tidb-cloud-lake/guides/load-parquet.md) + - [Load CSV](/tidb-cloud-lake/guides/load-csv.md) + - [Load TSV](/tidb-cloud-lake/guides/load-tsv.md) + - [Load NDJSON](/tidb-cloud-lake/guides/load-ndjson.md) + - [Load ORC](/tidb-cloud-lake/guides/load-orc.md) + - [Load Avro](/tidb-cloud-lake/guides/load-avro.md) + - Query & Transform + - [Overview](/tidb-cloud-lake/guides/query-stage.md) + - [Query Parquet Files](/tidb-cloud-lake/guides/query-parquet-files-in-stage.md) + - [Query CSV Files](/tidb-cloud-lake/guides/query-csv-files-in-stage.md) + - [Query TSV Files](/tidb-cloud-lake/guides/query-tsv-files-in-stage.md) + - [Query NDJSON Files](/tidb-cloud-lake/guides/query-ndjson-files-in-stage.md) + - [Query Avro Files](/tidb-cloud-lake/guides/query-avro-files-in-stage.md) + - [Query Staged ORC Files](/tidb-cloud-lake/guides/query-staged-orc-files-in-stage.md) + - [Transform Data on Load](/tidb-cloud-lake/guides/transform-data-on-load.md) + - Continuous Data Pipelines + - [Overview](/tidb-cloud-lake/guides/continuous-data-pipelines.md) + - [Track and Transform Data via Streams](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md) + - [Automate Data Loading with Tasks](/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md) +- Data Unloading + - [Unload Parquet File](/tidb-cloud-lake/guides/unload-parquet-file.md) + - [Unload CSV File](/tidb-cloud-lake/guides/unload-csv-file.md) + - [Unload TSV File](/tidb-cloud-lake/guides/unload-tsv-file.md) + - [Unload NDJSON File](/tidb-cloud-lake/guides/unload-ndjson-file.md) + - [Unload Data from Databend](/tidb-cloud-lake/guides/unload-data-from-databend.md) +- AI and ML Integration + - [Overview](/tidb-cloud-lake/guides/ai-ml-integration.md) + - [External AI Functions](/tidb-cloud-lake/guides/external-ai-functions.md) + - [MCP Server](/tidb-cloud-lake/guides/mcp-server.md) + - [MCP Client Integration](/tidb-cloud-lake/guides/mcp-client-integration.md) +- Multimodal Data Analytics + - [Overview](/tidb-cloud-lake/guides/multimodal-data-analytics.md) + - [SQL Analytics](/tidb-cloud-lake/guides/sql-analytics.md) + - [JSON & Search](/tidb-cloud-lake/guides/json-search.md) + - [Vector Search](/tidb-cloud-lake/guides/vector-search.md) + - [Geo Analytics](/tidb-cloud-lake/guides/geo-analytics.md) + - [Lakehouse ETL](/tidb-cloud-lake/guides/lakehouse-etl.md) +- Performance Optimization + - [Overview](/tidb-cloud-lake/guides/performance-optimization.md) + - [Cluster Key](/tidb-cloud-lake/guides/cluster-key-performance.md) + - [Virtual Column](/tidb-cloud-lake/guides/virtual-column.md) + - [Aggregating Index](/tidb-cloud-lake/guides/aggregating-index.md) + - [Full-Text Index](/tidb-cloud-lake/guides/full-text-index.md) + - [Ngram Index](/tidb-cloud-lake/guides/ngram-index.md) + - [Query Result Cache](/tidb-cloud-lake/guides/query-result-cache.md) +- Security & Reliability + - [Overview](/tidb-cloud-lake/guides/security-reliability.md) + - Access Control + - [Overview](/tidb-cloud-lake/guides/access-control.md) + - [Privileges](/tidb-cloud-lake/guides/privileges.md) + - [Roles](/tidb-cloud-lake/guides/roles.md) + - [Ownership](/tidb-cloud-lake/guides/ownership.md) + - [Audit Trail](/tidb-cloud-lake/guides/audit-trail.md) + - [Fail-Safe](/tidb-cloud-lake/guides/fail-safe.md) + - [Masking Policy](/tidb-cloud-lake/guides/masking-policy.md) + - [Network Policy](/tidb-cloud-lake/guides/network-policy.md) + - [Password Policy](/tidb-cloud-lake/guides/password-policy.md) + 
- [Recovery from Operational Errors](/tidb-cloud-lake/guides/recovery-from-operational-errors.md) +- Data Management + - [Overview](/tidb-cloud-lake/guides/data-management.md) + - [Data Lifecycle](/tidb-cloud-lake/guides/data-lifecycle.md) + - [Data Recovery](/tidb-cloud-lake/guides/data-recovery.md) + - [Data Protection](/tidb-cloud-lake/guides/data-protection.md) + - [Data Purge and Recycle](/tidb-cloud-lake/guides/data-purge-and-recycle.md) + +## Tutorials + +- Ingest & Stream Data + - [Access MySQL & Redis via Dictionaries](/tidb-cloud-lake/tutorials/access-mysql-redis-via-dictionaries.md) + - [Ingest JSON Logs with Vector (Cloud)](/tidb-cloud-lake/tutorials/ingest-json-logs-with-vector-cloud.md) + - [Ingest Kafka with Bend Ingest](/tidb-cloud-lake/tutorials/ingest-kafka-with-bend-ingest.md) + - [Ingest Kafka with Kafka Connect](/tidb-cloud-lake/tutorials/ingest-kafka-with-kafka-connect.md) + - [Inspect Metadata](/tidb-cloud-lake/tutorials/inspect-metadata.md) +- Migrate Data + - [Overview](/tidb-cloud-lake/tutorials/data-migration-overview.md) + - [Migrate MySQL with Addax (Batch)](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-addax.md) + - [Migrate MySQL with DataX (Batch)](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-datax.md) + - [Migrate MySQL with bend-archiver (Batch)](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-bend-archiver.md) + - [Migrate MySQL with Debezium (CDC)](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-debezium.md) + - [Migrate MySQL with Flink CDC](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-flink-cdc.md) + - [Migrate MySQL with Kafka Connect (CDC)](/tidb-cloud-lake/tutorials/migrate-from-mysql-with-kafka-connect.md) + - [Migrate from Snowflake to Databend](/tidb-cloud-lake/tutorials/migrate-from-snowflake.md) +- Operate & Recover + - [Backup & Restore with BendSave](/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md) +- Cloud Operations + - [AWS Billing](/tidb-cloud-lake/tutorials/aws-billing.md) + - [Dashboard Tour](/tidb-cloud-lake/tutorials/dashboard-tour.md) + - [Data Sharing via ATTACH TABLE](/tidb-cloud-lake/tutorials/data-sharing-via-attach-table.md) + +## Reference + +- SQL Reference + - [Overview](/tidb-cloud-lake/sql/sql-reference-overview.md) + - SQL General + - Data Types + - [Overview](/tidb-cloud-lake/sql/data-types.md) + - [Array](/tidb-cloud-lake/sql/array.md) + - [Binary](/tidb-cloud-lake/sql/binary.md) + - [Bitmap](/tidb-cloud-lake/sql/bitmap.md) + - [Boolean](/tidb-cloud-lake/sql/boolean.md) + - [Date & Time](/tidb-cloud-lake/sql/date-time.md) + - [Decimal](/tidb-cloud-lake/sql/decimal.md) + - [Geospatial](/tidb-cloud-lake/sql/geospatial.md) + - [Interval](/tidb-cloud-lake/sql/interval.md) + - [Map](/tidb-cloud-lake/sql/map.md) + - [Numeric](/tidb-cloud-lake/sql/numeric.md) + - [String](/tidb-cloud-lake/sql/string.md) + - [Tuple](/tidb-cloud-lake/sql/tuple.md) + - [Variant](/tidb-cloud-lake/sql/variant.md) + - [Vector](/tidb-cloud-lake/sql/vector.md) + - Information Schema + - [Information_Schema Tables](/tidb-cloud-lake/sql/information-schema-tables.md) + - [information_schema.columns](/tidb-cloud-lake/sql/information-schema-columns.md) + - [information_schema.keywords](/tidb-cloud-lake/sql/information-schema-keywords.md) + - [information_schema.schemata](/tidb-cloud-lake/sql/information-schema-schemata.md) + - [information_schema.tables](/tidb-cloud-lake/sql/information-schema-tables-sql.md) + - [information_schema.views](/tidb-cloud-lake/sql/information-schema-views.md) + - Table Engines + - 
[Overview](/tidb-cloud-lake/sql/table-engines.md) + - [Fuse Engine Tables](/tidb-cloud-lake/sql/fuse-engine-tables.md) + - [Apache Iceberg™ Tables](/tidb-cloud-lake/sql/apache-icebergtm-tables.md) + - [Apache Hive Tables](/tidb-cloud-lake/sql/apache-hive-tables.md) + - [Delta Lake Engine](/tidb-cloud-lake/sql/delta-lake-engine.md) + - System Tables + - [Overview](/tidb-cloud-lake/sql/system-tables.md) + - [system.build_options](/tidb-cloud-lake/sql/system-build-options.md) + - [system.caches](/tidb-cloud-lake/sql/system-caches.md) + - [system.clusters](/tidb-cloud-lake/sql/system-clusters.md) + - [system.columns](/tidb-cloud-lake/sql/system-columns.md) + - [system.configs](/tidb-cloud-lake/sql/system-configs.md) + - [system.contributors](/tidb-cloud-lake/sql/system-contributors.md) + - [system.copy_history](/tidb-cloud-lake/sql/system-copy-history.md) + - [system.credits](/tidb-cloud-lake/sql/system-credits.md) + - [system.databases](/tidb-cloud-lake/sql/system-databases.md) + - [system.databases_with_history](/tidb-cloud-lake/sql/system-databases-with-history.md) + - [system.functions](/tidb-cloud-lake/sql/system-functions.md) + - [system.indexes](/tidb-cloud-lake/sql/system-indexes.md) + - [system.locks](/tidb-cloud-lake/sql/system-locks.md) + - [system.metrics](/tidb-cloud-lake/sql/system-metrics.md) + - [system.numbers](/tidb-cloud-lake/sql/system-numbers.md) + - [system.query_cache](/tidb-cloud-lake/sql/system-query-cache.md) + - [system.query_log](/tidb-cloud-lake/sql/system-query-log.md) + - [system.settings](/tidb-cloud-lake/sql/system-settings.md) + - [system.streams](/tidb-cloud-lake/sql/system-streams.md) + - [system.table_functions](/tidb-cloud-lake/sql/system-table-functions.md) + - [system.tables](/tidb-cloud-lake/sql/system-tables-sql.md) + - [system.tables_with_history](/tidb-cloud-lake/sql/system-tables-with-history.md) + - [system.temp_files](/tidb-cloud-lake/sql/system-temp-files.md) + - [system.temporary_tables](/tidb-cloud-lake/sql/system-temporary-tables.md) + - [system.user_functions](/tidb-cloud-lake/sql/system-user-functions.md) + - [system.views](/tidb-cloud-lake/sql/system-views.md) + - [system.virtual_columns](/tidb-cloud-lake/sql/system-virtual-columns.md) + - System History Tables + - [Overview](/tidb-cloud-lake/sql/system-history-tables.md) + - [system_history.access_history](/tidb-cloud-lake/sql/system-history-access-history.md) + - [system_history.log_history](/tidb-cloud-lake/sql/system-history-log-history.md) + - [system_history.login_history](/tidb-cloud-lake/sql/system-history-login-history.md) + - [system_history.profile_history](/tidb-cloud-lake/sql/system-history-profile-history.md) + - [system_history.query_history](/tidb-cloud-lake/sql/system-history-query-history.md) + - [SQL Identifiers](/tidb-cloud-lake/sql/sql-identifiers.md) + - [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md) + - [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md) + - [SQL Dialects & Conformance](/tidb-cloud-lake/sql/sql-dialects-conformance.md) + - SQL Statements + - [Overview](/tidb-cloud-lake/sql/sql-statements-reference.md) + - DDL Statements + - [DDL Overview](/tidb-cloud-lake/sql/ddl.md) + - Database + - [Overview](/tidb-cloud-lake/sql/ddl-database-overview.md) + - [RENAME DATABASE](/tidb-cloud-lake/sql/rename-database.md) + - [CREATE DATABASE](/tidb-cloud-lake/sql/create-database.md) + - [DROP DATABASE](/tidb-cloud-lake/sql/drop-database.md) + - [USE DATABASE](/tidb-cloud-lake/sql/use-database.md) + - [SHOW CREATE 
DATABASE](/tidb-cloud-lake/sql/show-create-database.md) + - [SHOW DATABASES](/tidb-cloud-lake/sql/show-databases.md) + - [SHOW DROP DATABASES](/tidb-cloud-lake/sql/show-drop-databases.md) + - [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) + - Table + - [Overview](/tidb-cloud-lake/sql/ddl-table-overview.md) + - [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md) + - [CREATE EXTERNAL TABLE](/tidb-cloud-lake/sql/create-external-table.md) + - [CREATE TEMP TABLE](/tidb-cloud-lake/sql/create-temp-table.md) + - [CREATE TRANSIENT TABLE](/tidb-cloud-lake/sql/create-transient-table.md) + - [DROP TABLE](/tidb-cloud-lake/sql/drop-table.md) + - [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md) + - [RENAME TABLE](/tidb-cloud-lake/sql/rename-table.md) + - [TRUNCATE TABLE](/tidb-cloud-lake/sql/truncate-table.md) + - [DESCRIBE TABLE](/tidb-cloud-lake/sql/describe-table.md) + - [OPTIMIZE TABLE](/tidb-cloud-lake/sql/optimize-table.md) + - [FLASHBACK TABLE](/tidb-cloud-lake/sql/flashback-table.md) + - [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md) + - [VACUUM DROP TABLE](/tidb-cloud-lake/sql/vacuum-drop-table.md) + - [VACUUM TABLE](/tidb-cloud-lake/sql/vacuum-table.md) + - [ATTACH TABLE](/tidb-cloud-lake/sql/attach-table.md) + - [SHOW CREATE TABLE](/tidb-cloud-lake/sql/show-create-table.md) + - [SHOW DROP TABLES](/tidb-cloud-lake/sql/show-drop-tables.md) + - [SHOW FIELDS](/tidb-cloud-lake/sql/show-fields.md) + - [SHOW COLUMNS](/tidb-cloud-lake/sql/show-columns.md) + - [SHOW STATISTICS](/tidb-cloud-lake/sql/show-statistics.md) + - [SHOW TABLE STATUS](/tidb-cloud-lake/sql/show-table-status.md) + - [SHOW TABLES](/tidb-cloud-lake/sql/show-tables.md) + - View + - [Overview](/tidb-cloud-lake/sql/ddl-view-overview.md) + - [ALTER VIEW](/tidb-cloud-lake/sql/alter-view.md) + - [CREATE VIEW](/tidb-cloud-lake/sql/create-view.md) + - [DROP VIEW](/tidb-cloud-lake/sql/drop-view.md) + - [DESC VIEW](/tidb-cloud-lake/sql/desc-view.md) + - [SHOW VIEWS](/tidb-cloud-lake/sql/show-views.md) + - User & Role + - [Overview](/tidb-cloud-lake/sql/user-role.md) + - [CREATE USER](/tidb-cloud-lake/sql/create-user.md) + - [DESC USER](/tidb-cloud-lake/sql/desc-user.md) + - [DROP USER](/tidb-cloud-lake/sql/drop-user.md) + - [SHOW USERS](/tidb-cloud-lake/sql/show-users.md) + - [ALTER USER](/tidb-cloud-lake/sql/alter-user.md) + - [CREATE ROLE](/tidb-cloud-lake/sql/create-role.md) + - [SET SECONDARY ROLES](/tidb-cloud-lake/sql/set-secondary-roles.md) + - [SET ROLE](/tidb-cloud-lake/sql/set-role.md) + - [SHOW ROLES](/tidb-cloud-lake/sql/show-roles.md) + - [DROP ROLE](/tidb-cloud-lake/sql/drop-role.md) + - [GRANT](/tidb-cloud-lake/sql/grant.md) + - [REVOKE](/tidb-cloud-lake/sql/revoke.md) + - [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md) + - Stage + - [Overview](/tidb-cloud-lake/sql/stage.md) + - [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) + - [DROP STAGE](/tidb-cloud-lake/sql/drop-stage.md) + - [DESC STAGE](/tidb-cloud-lake/sql/desc-stage.md) + - [LIST STAGE FILES](/tidb-cloud-lake/sql/list-stage-files.md) + - [REMOVE STAGE FILES](/tidb-cloud-lake/sql/remove-stage-files.md) + - [SHOW STAGES](/tidb-cloud-lake/sql/show-stages.md) + - [PRESIGN](/tidb-cloud-lake/sql/presign.md) + - Sequence + - [Overview](/tidb-cloud-lake/sql/sequence.md) + - [CREATE SEQUENCE](/tidb-cloud-lake/sql/create-sequence.md) + - [DESC SEQUENCE](/tidb-cloud-lake/sql/desc-sequence.md) + - [DROP SEQUENCE](/tidb-cloud-lake/sql/drop-sequence.md) + - [SHOW SEQUENCES](/tidb-cloud-lake/sql/show-sequences.md) + - Stream + - 
[Overview](/tidb-cloud-lake/sql/stream.md) + - [CREATE STREAM](/tidb-cloud-lake/sql/create-stream.md) + - [DESC STREAM](/tidb-cloud-lake/sql/desc-stream.md) + - [DROP STREAM](/tidb-cloud-lake/sql/drop-stream.md) + - [SHOW STREAMS](/tidb-cloud-lake/sql/show-streams.md) + - Task + - [Overview](/tidb-cloud-lake/sql/task.md) + - [CREATE TASK](/tidb-cloud-lake/sql/create-task.md) + - [ALTER TASK](/tidb-cloud-lake/sql/alter-task.md) + - [DROP TASK](/tidb-cloud-lake/sql/drop-task.md) + - [EXECUTE TASK](/tidb-cloud-lake/sql/execute-task.md) + - [SHOW TASKS](/tidb-cloud-lake/sql/show-tasks.md) + - [TASK ERROR NOTIFICATION PAYLOAD](/tidb-cloud-lake/sql/task-error-notification-payload.md) + - Notification + - [Overview](/tidb-cloud-lake/sql/notification.md) + - [CREATE NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/create-notification-integration.md) + - [ALTER NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/alter-notification-integration.md) + - [DROP NOTIFICATION INTEGRATION](/tidb-cloud-lake/sql/drop-notification-integration.md) + - Connection + - [Overview](/tidb-cloud-lake/sql/connection.md) + - [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) + - [DESC CONNECTION](/tidb-cloud-lake/sql/desc-connection.md) + - [DROP CONNECTION](/tidb-cloud-lake/sql/drop-connection.md) + - [SHOW CONNECTIONS](/tidb-cloud-lake/sql/show-connections.md) + - File Format + - [Overview](/tidb-cloud-lake/sql/file-format.md) + - [CREATE FILE FORMAT](/tidb-cloud-lake/sql/create-file-format.md) + - [DROP FILE FORMAT](/tidb-cloud-lake/sql/drop-file-format.md) + - [SHOW FILE FORMATS](/tidb-cloud-lake/sql/show-file-formats.md) + - Cluster Key + - [ALTER CLUSTER KEY](/tidb-cloud-lake/sql/alter-cluster-key.md) + - [DROP CLUSTER KEY](/tidb-cloud-lake/sql/drop-cluster-key.md) + - [RECLUSTER TABLE](/tidb-cloud-lake/sql/recluster-table.md) + - [SET CLUSTER KEY](/tidb-cloud-lake/sql/set-cluster-key.md) + - [Cluster Key](/tidb-cloud-lake/sql/cluster-key.md) + - Aggregating Index + - [CREATE AGGREGATING INDEX](/tidb-cloud-lake/sql/create-aggregating-index.md) + - [DROP AGGREGATING INDEX](/tidb-cloud-lake/sql/drop-aggregating-index.md) + - [Aggregating Index](/tidb-cloud-lake/sql/aggregating-index.md) + - [REFRESH AGGREGATING INDEX](/tidb-cloud-lake/sql/refresh-aggregating-index.md) + - Inverted Index + - [CREATE INVERTED INDEX](/tidb-cloud-lake/sql/create-inverted-index.md) + - [DROP INVERTED INDEX](/tidb-cloud-lake/sql/drop-inverted-index.md) + - [Inverted Index](/tidb-cloud-lake/sql/inverted-index.md) + - [REFRESH INVERTED INDEX](/tidb-cloud-lake/sql/refresh-inverted-index.md) + - Ngram Index + - [CREATE NGRAM INDEX](/tidb-cloud-lake/sql/create-ngram-index.md) + - [DROP NGRAM INDEX](/tidb-cloud-lake/sql/drop-ngram-index.md) + - [Ngram Index](/tidb-cloud-lake/sql/ngram-index.md) + - [REFRESH NGRAM INDEX](/tidb-cloud-lake/sql/refresh-ngram-index.md) + - Vector Index + - [CREATE VECTOR INDEX](/tidb-cloud-lake/sql/create-vector-index.md) + - [DROP VECTOR INDEX](/tidb-cloud-lake/sql/drop-vector-index.md) + - [Vector Index](/tidb-cloud-lake/sql/vector-index.md) + - [REFRESH VECTOR INDEX](/tidb-cloud-lake/sql/refresh-vector-index.md) + - Virtual Column + - [Virtual Column](/tidb-cloud-lake/sql/virtual-column.md) + - [REFRESH VIRTUAL COLUMN](/tidb-cloud-lake/sql/refresh-virtual-column.md) + - [SHOW VIRTUAL COLUMNS](/tidb-cloud-lake/sql/show-virtual-columns.md) + - User-Defined Function + - [ALTER FUNCTION](/tidb-cloud-lake/sql/alter-function.md) + - [CREATE AGGREGATE FUNCTION](/tidb-cloud-lake/sql/create-aggregate-function.md) + 
- [CREATE SCALAR FUNCTION](/tidb-cloud-lake/sql/create-scalar-function.md) + - [CREATE TABLE FUNCTION](/tidb-cloud-lake/sql/create-table-function.md) + - [DROP FUNCTION](/tidb-cloud-lake/sql/drop-function.md) + - [SHOW USER FUNCTIONS](/tidb-cloud-lake/sql/show-user-functions.md) + - [User-Defined Function](/tidb-cloud-lake/sql/user-defined-function.md) + - External Function + - [ALTER FUNCTION](/tidb-cloud-lake/sql/alter-function-sql.md) + - [CREATE FUNCTION](/tidb-cloud-lake/sql/create-function.md) + - [DROP FUNCTION](/tidb-cloud-lake/sql/drop-function-sql.md) + - [External Function](/tidb-cloud-lake/sql/external-function.md) + - Masking Policy + - [Overview](/tidb-cloud-lake/sql/masking-policy-sql.md) + - [CREATE MASKING POLICY](/tidb-cloud-lake/sql/create-masking-policy.md) + - [DESC MASKING POLICY](/tidb-cloud-lake/sql/desc-masking-policy.md) + - [DROP MASKING POLICY](/tidb-cloud-lake/sql/drop-masking-policy.md) + - Network Policy + - [ALTER NETWORK POLICY](/tidb-cloud-lake/sql/alter-network-policy.md) + - [CREATE NETWORK POLICY](/tidb-cloud-lake/sql/create-network-policy.md) + - [DESC NETWORK POLICY](/tidb-cloud-lake/sql/desc-network-policy.md) + - [DROP NETWORK POLICY](/tidb-cloud-lake/sql/drop-network-policy.md) + - [SHOW NETWORK POLICIES](/tidb-cloud-lake/sql/show-network-policies.md) + - [Network Policy](/tidb-cloud-lake/sql/network-policy.md) + - Password Policy + - [ALTER PASSWORD POLICY](/tidb-cloud-lake/sql/alter-password-policy.md) + - [CREATE PASSWORD POLICY](/tidb-cloud-lake/sql/create-password-policy.md) + - [DESC PASSWORD POLICY](/tidb-cloud-lake/sql/desc-password-policy.md) + - [DROP PASSWORD POLICY](/tidb-cloud-lake/sql/drop-password-policy.md) + - [Password Policy](/tidb-cloud-lake/sql/password-policy.md) + - [SHOW PASSWORD POLICIES](/tidb-cloud-lake/sql/show-password-policies.md) + - Transaction + - [BEGIN](/tidb-cloud-lake/sql/begin.md) + - [COMMIT](/tidb-cloud-lake/sql/commit.md) + - [Transaction](/tidb-cloud-lake/sql/transaction.md) + - [ROLLBACK](/tidb-cloud-lake/sql/rollback.md) + - [SHOW LOCKS](/tidb-cloud-lake/sql/show-locks.md) + - Variable + - [SQL Variables](/tidb-cloud-lake/sql/sql-variables.md) + - [SET VARIABLE](/tidb-cloud-lake/sql/set-variable.md) + - [SHOW VARIABLES](/tidb-cloud-lake/sql/show-variables.md) + - [UNSET VARIABLE](/tidb-cloud-lake/sql/unset-variable.md) + - Stored Procedure + - [CALL PROCEDURE](/tidb-cloud-lake/sql/call-procedure.md) + - [CREATE PROCEDURE](/tidb-cloud-lake/sql/create-procedure.md) + - [DESC PROCEDURE](/tidb-cloud-lake/sql/desc-procedure.md) + - [DROP PROCEDURE](/tidb-cloud-lake/sql/drop-procedure.md) + - [Stored Procedure](/tidb-cloud-lake/sql/stored-procedure.md) + - [SHOW PROCEDURES](/tidb-cloud-lake/sql/show-procedures.md) + - Warehouse + - [ALTER WAREHOUSE](/tidb-cloud-lake/sql/alter-warehouse.md) + - [CREATE WAREHOUSE](/tidb-cloud-lake/sql/create-warehouse.md) + - [DROP WAREHOUSE](/tidb-cloud-lake/sql/drop-warehouse.md) + - [Warehouse](/tidb-cloud-lake/sql/warehouse.md) + - [QUERY_HISTORY](/tidb-cloud-lake/sql/query-history.md) + - [SHOW WAREHOUSES](/tidb-cloud-lake/sql/show-warehouses.md) + - [USE WAREHOUSE](/tidb-cloud-lake/sql/use-warehouse.md) + - Workload Group + - [ALTER WORKLOAD GROUP](/tidb-cloud-lake/sql/alter-workload-group.md) + - [CREATE WORKLOAD GROUP](/tidb-cloud-lake/sql/create-workload-group.md) + - [DROP WORKLOAD GROUP](/tidb-cloud-lake/sql/drop-workload-group.md) + - [Workload Group](/tidb-cloud-lake/sql/workload-group.md) + - [RENAME WORKLOAD GROUP](/tidb-cloud-lake/sql/rename-workload-group.md) + - 
[SHOW WORKLOAD GROUPS](/tidb-cloud-lake/sql/show-workload-groups.md) + - DML Commands + - [DML Overview](/tidb-cloud-lake/sql/dml.md) + - [COPY INTO \<location\>](/tidb-cloud-lake/sql/copy-into-location.md) + - [COPY INTO \<table\>](/tidb-cloud-lake/sql/copy-into-table.md) + - [DELETE](/tidb-cloud-lake/sql/delete.md) + - [INSERT](/tidb-cloud-lake/sql/insert.md) + - [INSERT (multi-table)](/tidb-cloud-lake/sql/insert-multi-table.md) + - [MERGE](/tidb-cloud-lake/sql/merge.md) + - [REPLACE](/tidb-cloud-lake/sql/replace.md) + - [UPDATE](/tidb-cloud-lake/sql/update.md) + - Query Syntax + - [Overview](/tidb-cloud-lake/sql/query-syntax.md) + - [SELECT](/tidb-cloud-lake/sql/select.md) + - [AT](/tidb-cloud-lake/sql/at.md) + - [JOIN](/tidb-cloud-lake/sql/join.md) + - [PIVOT](/tidb-cloud-lake/sql/pivot.md) + - [UNPIVOT](/tidb-cloud-lake/sql/unpivot.md) + - [GROUP BY](/tidb-cloud-lake/sql/group-by.md) + - [CHANGES](/tidb-cloud-lake/sql/changes.md) + - [QUALIFY](/tidb-cloud-lake/sql/qualify.md) + - [SETTINGS Clause](/tidb-cloud-lake/sql/settings-clause.md) + - [TOP](/tidb-cloud-lake/sql/top.md) + - [VALUES](/tidb-cloud-lake/sql/values.md) + - [WITH Clause](/tidb-cloud-lake/sql/clause.md) + - [WITH CONSUME](/tidb-cloud-lake/sql/with-consume.md) + - [WITH Stream Hints](/tidb-cloud-lake/sql/stream-hints.md) + - Query Operators + - [Arithmetic Operators](/tidb-cloud-lake/sql/arithmetic-operators.md) + - [Comparison Operators](/tidb-cloud-lake/sql/comparison-operators.md) + - [Query Operators](/tidb-cloud-lake/sql/query-operators.md) + - [JSON Operators](/tidb-cloud-lake/sql/json-operators.md) + - [Logical Operators](/tidb-cloud-lake/sql/logical-operators.md) + - [Set Operators](/tidb-cloud-lake/sql/set-operators.md) + - [Subquery Operators](/tidb-cloud-lake/sql/subquery-operators.md) + - Explain Commands + - [Overview](/tidb-cloud-lake/sql/explain-commands.md) + - [EXPLAIN](/tidb-cloud-lake/sql/explain.md) + - [EXPLAIN ANALYZE](/tidb-cloud-lake/sql/explain-analyze.md) + - [EXPLAIN ANALYZE GRAPHICAL](/tidb-cloud-lake/sql/explain-analyze-graphical.md) + - [EXPLAIN AST](/tidb-cloud-lake/sql/explain-ast.md) + - [EXPLAIN PERF](/tidb-cloud-lake/sql/explain-perf.md) + - [EXPLAIN RAW](/tidb-cloud-lake/sql/explain-raw.md) + - [EXPLAIN SYNTAX](/tidb-cloud-lake/sql/explain-syntax.md) + - Administration Commands + - [Overview](/tidb-cloud-lake/sql/administration-commands.md) + - [KILL](/tidb-cloud-lake/sql/kill.md) + - [SET](/tidb-cloud-lake/sql/set.md) + - [UNSET](/tidb-cloud-lake/sql/unset.md) + - [SET_VAR](/tidb-cloud-lake/sql/set-var.md) + - [SHOW SETTINGS](/tidb-cloud-lake/sql/show-settings.md) + - [SHOW FUNCTIONS](/tidb-cloud-lake/sql/show-functions.md) + - [SHOW USER FUNCTIONS](/tidb-cloud-lake/sql/show-user-functions-sql.md) + - [SHOW TABLE FUNCTIONS](/tidb-cloud-lake/sql/show-table-functions.md) + - [SHOW PROCESSLIST](/tidb-cloud-lake/sql/show-processlist.md) + - [SHOW METRICS](/tidb-cloud-lake/sql/show-metrics.md) + - [VACUUM DROP TABLE](/tidb-cloud-lake/sql/vacuum-drop-table-sql.md) + - [VACUUM TABLE](/tidb-cloud-lake/sql/vacuum-table-sql.md) + - [VACUUM TEMPORARY FILES](/tidb-cloud-lake/sql/vacuum-temporary-files.md) + - [EXECUTE IMMEDIATE](/tidb-cloud-lake/sql/execute-immediate.md) + - [SYSTEM FLUSH PRIVILEGES](/tidb-cloud-lake/sql/system-flush-privileges.md) + - [SYSTEM ENABLE / DISABLE EXCEPTION_BACKTRACE](/tidb-cloud-lake/sql/system-enable-disable-exception-backtrace.md) + - [SHOW INDEXES](/tidb-cloud-lake/sql/show-indexes.md) + - SQL Functions + - [SQL Function Reference](/tidb-cloud-lake/sql/sql-function-reference.md) + - 
Bitmap Functions + - [BITMAP_AND](/tidb-cloud-lake/sql/bitmap-sql.md) + - [BITMAP_AND_COUNT](/tidb-cloud-lake/sql/bitmap-count.md) + - [BITMAP_AND_NOT](/tidb-cloud-lake/sql/bitmap-not.md) + - [BITMAP_CARDINALITY](/tidb-cloud-lake/sql/bitmap-cardinality.md) + - [BITMAP_CONTAINS](/tidb-cloud-lake/sql/bitmap-contains.md) + - [BITMAP_COUNT](/tidb-cloud-lake/sql/bitmap-count-sql.md) + - [BITMAP_HAS_ALL](/tidb-cloud-lake/sql/bitmap-has-all.md) + - [BITMAP_HAS_ANY](/tidb-cloud-lake/sql/bitmap-has-any.md) + - [BITMAP_INTERSECT](/tidb-cloud-lake/sql/bitmap-intersect.md) + - [BITMAP_MAX](/tidb-cloud-lake/sql/bitmap-max.md) + - [BITMAP_MIN](/tidb-cloud-lake/sql/bitmap-min.md) + - [BITMAP_NOT](/tidb-cloud-lake/sql/bitmap-not-sql.md) + - [BITMAP_NOT_COUNT](/tidb-cloud-lake/sql/bitmap-not-count.md) + - [BITMAP_OR](/tidb-cloud-lake/sql/bitmap-or.md) + - [BITMAP_OR_COUNT](/tidb-cloud-lake/sql/bitmap-or-count.md) + - [BITMAP_SUBSET_IN_RANGE](/tidb-cloud-lake/sql/bitmap-subset-range.md) + - [BITMAP_SUBSET_LIMIT](/tidb-cloud-lake/sql/bitmap-subset-limit.md) + - [BITMAP_TO_ARRAY](/tidb-cloud-lake/sql/bitmap-array.md) + - [BITMAP_UNION](/tidb-cloud-lake/sql/bitmap-union.md) + - [BITMAP_XOR](/tidb-cloud-lake/sql/bitmap-xor.md) + - [BITMAP_XOR_COUNT](/tidb-cloud-lake/sql/bitmap-xor-count.md) + - [Bitmap Functions](/tidb-cloud-lake/sql/bitmap-functions.md) + - [INTERSECT_COUNT](/tidb-cloud-lake/sql/intersect-count.md) + - [SUB_BITMAP](/tidb-cloud-lake/sql/sub-bitmap.md) + - Conversion Functions + - [BUILD_BITMAP](/tidb-cloud-lake/sql/build-bitmap.md) + - [CAST::](/tidb-cloud-lake/sql/cast.md) + - [Conversion Functions](/tidb-cloud-lake/sql/conversion-functions.md) + - [TO_BINARY](/tidb-cloud-lake/sql/to-binary.md) + - [TO_BITMAP](/tidb-cloud-lake/sql/to-bitmap.md) + - [TO_BOOLEAN](/tidb-cloud-lake/sql/to-boolean.md) + - [TO_DECIMAL](/tidb-cloud-lake/sql/to-decimal.md) + - [TO_FLOAT32](/tidb-cloud-lake/sql/to-float32.md) + - [TO_FLOAT64](/tidb-cloud-lake/sql/to-float64.md) + - [TO_HEX](/tidb-cloud-lake/sql/to-hex.md) + - [TO_INT16](/tidb-cloud-lake/sql/to-int16.md) + - [TO_INT32](/tidb-cloud-lake/sql/to-int32.md) + - [TO_INT64](/tidb-cloud-lake/sql/to-int64.md) + - [TO_INT8](/tidb-cloud-lake/sql/to-int8.md) + - [TO_STRING](/tidb-cloud-lake/sql/to-string.md) + - [TO_TEXT](/tidb-cloud-lake/sql/to-text.md) + - [TO_UINT16](/tidb-cloud-lake/sql/to-uint16.md) + - [TO_UINT32](/tidb-cloud-lake/sql/to-uint32.md) + - [TO_UINT64](/tidb-cloud-lake/sql/to-uint64.md) + - [TO_UINT8](/tidb-cloud-lake/sql/to-uint8.md) + - [TO_VARCHAR](/tidb-cloud-lake/sql/to-varchar.md) + - [TO_VARIANT](/tidb-cloud-lake/sql/to-variant.md) + - [TRY_CAST](/tidb-cloud-lake/sql/try-cast.md) + - [TRY_TO_BINARY](/tidb-cloud-lake/sql/try-to-binary.md) + - Conditional Functions + - [[ NOT ] BETWEEN](/tidb-cloud-lake/sql/not-between.md) + - [CASE](/tidb-cloud-lake/sql/case.md) + - [COALESCE](/tidb-cloud-lake/sql/coalesce.md) + - [DECODE](/tidb-cloud-lake/sql/decode.md) + - [ERROR_OR](/tidb-cloud-lake/sql/error.md) + - [GREATEST](/tidb-cloud-lake/sql/greatest.md) + - [GREATEST_IGNORE_NULLS](/tidb-cloud-lake/sql/greatest-ignore-nulls.md) + - [IF](/tidb-cloud-lake/sql/if.md) + - [IFF](/tidb-cloud-lake/sql/iff.md) + - [IFNULL](/tidb-cloud-lake/sql/ifnull.md) + - [[ NOT ] IN](/tidb-cloud-lake/sql/not.md) + - [Conditional Functions](/tidb-cloud-lake/sql/conditional-functions.md) + - [IS [ NOT ] DISTINCT FROM](/tidb-cloud-lake/sql/is-not-distinct.md) + - [IS_ERROR](/tidb-cloud-lake/sql/is-error.md) + - [IS_NOT_ERROR](/tidb-cloud-lake/sql/is-not-error.md) + - 
[IS_NOT_NULL](/tidb-cloud-lake/sql/is-not-null.md) + - [IS_NULL](/tidb-cloud-lake/sql/is-null.md) + - [LEAST](/tidb-cloud-lake/sql/least.md) + - [LEAST_IGNORE_NULLS](/tidb-cloud-lake/sql/least-ignore-nulls.md) + - [NULLIF](/tidb-cloud-lake/sql/nullif.md) + - [NVL](/tidb-cloud-lake/sql/nvl.md) + - [NVL2](/tidb-cloud-lake/sql/nvl2.md) + - Numeric Functions + - [Overview](/tidb-cloud-lake/sql/numeric-functions.md) + - [ABS](/tidb-cloud-lake/sql/abs.md) + - [ACOS](/tidb-cloud-lake/sql/acos.md) + - [ADD](/tidb-cloud-lake/sql/add.md) + - [ASIN](/tidb-cloud-lake/sql/asin.md) + - [ATAN](/tidb-cloud-lake/sql/atan.md) + - [ATAN2](/tidb-cloud-lake/sql/atan-sql.md) + - [CBRT](/tidb-cloud-lake/sql/cbrt.md) + - [CEIL](/tidb-cloud-lake/sql/ceil.md) + - [CEILING](/tidb-cloud-lake/sql/ceiling.md) + - [COS](/tidb-cloud-lake/sql/cos.md) + - [COT](/tidb-cloud-lake/sql/cot.md) + - [CRC32](/tidb-cloud-lake/sql/crc.md) + - [DEGREES](/tidb-cloud-lake/sql/degrees.md) + - [DIV](/tidb-cloud-lake/sql/div.md) + - [DIV0](/tidb-cloud-lake/sql/div0.md) + - [DIVNULL](/tidb-cloud-lake/sql/divnull.md) + - [EXP](/tidb-cloud-lake/sql/exp.md) + - [FACTORIAL](/tidb-cloud-lake/sql/factorial.md) + - [FLOOR](/tidb-cloud-lake/sql/floor.md) + - [INTDIV](/tidb-cloud-lake/sql/intdiv.md) + - [LN](/tidb-cloud-lake/sql/ln.md) + - [LOG10](/tidb-cloud-lake/sql/log.md) + - [LOG2](/tidb-cloud-lake/sql/log-sql.md) + - [LOG(b, x)](/tidb-cloud-lake/sql/log-b-x.md) + - [LOG(x)](/tidb-cloud-lake/sql/log-x.md) + - [MINUS](/tidb-cloud-lake/sql/minus.md) + - [MOD](/tidb-cloud-lake/sql/mod.md) + - [MODULO](/tidb-cloud-lake/sql/modulo.md) + - [MULTIPLY](/tidb-cloud-lake/sql/multiply.md) + - [NEG](/tidb-cloud-lake/sql/neg.md) + - [NEGATE](/tidb-cloud-lake/sql/negate.md) + - [PI](/tidb-cloud-lake/sql/pi.md) + - [PLUS](/tidb-cloud-lake/sql/plus.md) + - [POW](/tidb-cloud-lake/sql/pow.md) + - [POWER](/tidb-cloud-lake/sql/power.md) + - [RADIANS](/tidb-cloud-lake/sql/radians.md) + - [RAND()](/tidb-cloud-lake/sql/rand.md) + - [RAND(n)](/tidb-cloud-lake/sql/rand-n.md) + - [ROUND](/tidb-cloud-lake/sql/round.md) + - [SIGN](/tidb-cloud-lake/sql/sign.md) + - [SIN](/tidb-cloud-lake/sql/sin.md) + - [SQRT](/tidb-cloud-lake/sql/sqrt.md) + - [SUBTRACT](/tidb-cloud-lake/sql/subtract.md) + - [TAN](/tidb-cloud-lake/sql/tan.md) + - [TRUNC](/tidb-cloud-lake/sql/trunc.md) + - [TRUNCATE](/tidb-cloud-lake/sql/truncate.md) + - Date & Time Functions + - [Overview](/tidb-cloud-lake/sql/date-time-functions.md) + - [ADD_MONTHS](/tidb-cloud-lake/sql/add-months.md) + - [ADD TIME INTERVAL](/tidb-cloud-lake/sql/add-time-interval.md) + - [AGE](/tidb-cloud-lake/sql/age.md) + - [CONVERT_TIMEZONE](/tidb-cloud-lake/sql/convert-timezone.md) + - [CURRENT_TIMESTAMP](/tidb-cloud-lake/sql/current-timestamp.md) + - [DATE](/tidb-cloud-lake/sql/date.md) + - [DATE_ADD](/tidb-cloud-lake/sql/date-add.md) + - [DATE_BETWEEN](/tidb-cloud-lake/sql/date-between.md) + - [DATE_DIFF](/tidb-cloud-lake/sql/date-diff.md) + - [DATE_FORMAT](/tidb-cloud-lake/sql/date-format.md) + - [DATE_PART](/tidb-cloud-lake/sql/date-part.md) + - [DATE_SUB](/tidb-cloud-lake/sql/date-sub.md) + - [DATE_TRUNC](/tidb-cloud-lake/sql/date-trunc.md) + - [DAY](/tidb-cloud-lake/sql/day.md) + - [EXTRACT](/tidb-cloud-lake/sql/extract.md) + - [LAST_DAY](/tidb-cloud-lake/sql/last-day.md) + - [MILLENNIUM](/tidb-cloud-lake/sql/millennium.md) + - [MONTH](/tidb-cloud-lake/sql/month.md) + - [MONTHS_BETWEEN](/tidb-cloud-lake/sql/months-between.md) + - [NEXT_DAY](/tidb-cloud-lake/sql/next-day.md) + - [NOW](/tidb-cloud-lake/sql/now.md) + - 
[PREVIOUS_DAY](/tidb-cloud-lake/sql/previous-day.md) + - [QUARTER](/tidb-cloud-lake/sql/quarter.md) + - [STR_TO_DATE](/tidb-cloud-lake/sql/str-date.md) + - [STR_TO_TIMESTAMP](/tidb-cloud-lake/sql/str-timestamp.md) + - [SUBTRACT TIME INTERVAL](/tidb-cloud-lake/sql/subtract-time-interval.md) + - [TIME_SLICE](/tidb-cloud-lake/sql/time-slice.md) + - [TIME_SLOT](/tidb-cloud-lake/sql/time-slot.md) + - [TIMESTAMP_DIFF](/tidb-cloud-lake/sql/timestamp-diff.md) + - [TIMEZONE](/tidb-cloud-lake/sql/timezone.md) + - [TO_DATE](/tidb-cloud-lake/sql/to-date.md) + - [TO_DATETIME](/tidb-cloud-lake/sql/datetime.md) + - [TO_DAY_OF_MONTH](/tidb-cloud-lake/sql/day-month.md) + - [TO_DAY_OF_WEEK](/tidb-cloud-lake/sql/day-week.md) + - [TO_DAY_OF_YEAR](/tidb-cloud-lake/sql/day-year.md) + - [TO_HOUR](/tidb-cloud-lake/sql/hour.md) + - [TO_MINUTE](/tidb-cloud-lake/sql/minute.md) + - [TO_MONDAY](/tidb-cloud-lake/sql/monday.md) + - [TO_MONTH](/tidb-cloud-lake/sql/to-month.md) + - [TO_QUARTER](/tidb-cloud-lake/sql/to-quarter.md) + - [TO_SECOND](/tidb-cloud-lake/sql/second.md) + - [TO_START_OF_DAY](/tidb-cloud-lake/sql/start-day.md) + - [TO_START_OF_FIFTEEN_MINUTES](/tidb-cloud-lake/sql/start-fifteen-minutes.md) + - [TO_START_OF_FIVE_MINUTES](/tidb-cloud-lake/sql/start-five-minutes.md) + - [TO_START_OF_HOUR](/tidb-cloud-lake/sql/start-hour.md) + - [TO_START_OF_ISO_YEAR](/tidb-cloud-lake/sql/start-iso-year.md) + - [TO_START_OF_MINUTE](/tidb-cloud-lake/sql/start-minute.md) + - [TO_START_OF_MONTH](/tidb-cloud-lake/sql/start-month.md) + - [TO_START_OF_QUARTER](/tidb-cloud-lake/sql/start-quarter.md) + - [TO_START_OF_SECOND](/tidb-cloud-lake/sql/start-second.md) + - [TO_START_OF_TEN_MINUTES](/tidb-cloud-lake/sql/start-ten-minutes.md) + - [TO_START_OF_WEEK](/tidb-cloud-lake/sql/start-week.md) + - [TO_START_OF_YEAR](/tidb-cloud-lake/sql/start-year.md) + - [TO_TIMESTAMP](/tidb-cloud-lake/sql/to-timestamp.md) + - [TO_TIMESTAMP_TZ](/tidb-cloud-lake/sql/timestamp-tz.md) + - [TO_UNIX_TIMESTAMP](/tidb-cloud-lake/sql/unix-timestamp.md) + - [TO_WEEK_OF_YEAR](/tidb-cloud-lake/sql/to-week-of-year.md) + - [TO_YEAR](/tidb-cloud-lake/sql/to-year.md) + - [TO_YYYYMM](/tidb-cloud-lake/sql/yyyymm.md) + - [TO_YYYYMMDD](/tidb-cloud-lake/sql/yyyymmdd.md) + - [TO_YYYYMMDDHH](/tidb-cloud-lake/sql/yyyymmddhh.md) + - [TO_YYYYMMDDHHMMSS](/tidb-cloud-lake/sql/yyyymmddhhmmss.md) + - [TODAY](/tidb-cloud-lake/sql/today.md) + - [TOMORROW](/tidb-cloud-lake/sql/tomorrow.md) + - [TRUNC](/tidb-cloud-lake/sql/trunc-sql.md) + - [TRY_TO_DATETIME](/tidb-cloud-lake/sql/try-datetime.md) + - [TRY_TO_TIMESTAMP](/tidb-cloud-lake/sql/try-timestamp.md) + - [WEEK](/tidb-cloud-lake/sql/week.md) + - [WEEKOFYEAR](/tidb-cloud-lake/sql/weekofyear.md) + - [YEAR](/tidb-cloud-lake/sql/year.md) + - [YEARWEEK](/tidb-cloud-lake/sql/yearweek.md) + - [YESTERDAY](/tidb-cloud-lake/sql/yesterday.md) + - Interval Functions + - [Overview](/tidb-cloud-lake/sql/interval-functions.md) + - [EPOCH](/tidb-cloud-lake/sql/epoch.md) + - [TO_CENTURIES](/tidb-cloud-lake/sql/to-centuries.md) + - [TO_DAYS](/tidb-cloud-lake/sql/days.md) + - [TO_DECADES](/tidb-cloud-lake/sql/decades.md) + - [TO_HOURS](/tidb-cloud-lake/sql/hours.md) + - [TO_MICROSECONDS](/tidb-cloud-lake/sql/microseconds.md) + - [TO_MILLENNIA](/tidb-cloud-lake/sql/millennia.md) + - [TO_MILLISECONDS](/tidb-cloud-lake/sql/milliseconds.md) + - [TO_MINUTES](/tidb-cloud-lake/sql/minutes.md) + - [TO_MONTHS](/tidb-cloud-lake/sql/months.md) + - [TO_SECONDS](/tidb-cloud-lake/sql/seconds.md) + - [TO_WEEKS](/tidb-cloud-lake/sql/weeks.md) + - 
[TO_YEARS](/tidb-cloud-lake/sql/years.md) + - String Functions + - [ASCII](/tidb-cloud-lake/sql/ascii.md) + - [BIN](/tidb-cloud-lake/sql/bin.md) + - [BIT_LENGTH](/tidb-cloud-lake/sql/bit-length.md) + - [CHAR](/tidb-cloud-lake/sql/char.md) + - [CHAR_LENGTH](/tidb-cloud-lake/sql/char-length.md) + - [CHARACTER_LENGTH](/tidb-cloud-lake/sql/character-length.md) + - [CONCAT](/tidb-cloud-lake/sql/concat.md) + - [CONCAT_WS](/tidb-cloud-lake/sql/concat-ws.md) + - [FROM_BASE64](/tidb-cloud-lake/sql/from-base64.md) + - [FROM_HEX](/tidb-cloud-lake/sql/from-hex.md) + - [GLOB](/tidb-cloud-lake/sql/glob.md) + - [HEX](/tidb-cloud-lake/sql/hex-functions.md) + - [String Functions](/tidb-cloud-lake/sql/string-functions.md) + - [INSERT](/tidb-cloud-lake/sql/insert-sql.md) + - [INSTR](/tidb-cloud-lake/sql/instr.md) + - [JARO_WINKLER](/tidb-cloud-lake/sql/jaro-winkler.md) + - [LCASE](/tidb-cloud-lake/sql/lcase.md) + - [LEFT](/tidb-cloud-lake/sql/left.md) + - [LENGTH](/tidb-cloud-lake/sql/length.md) + - [LENGTH_UTF8](/tidb-cloud-lake/sql/length-utf8.md) + - [LIKE](/tidb-cloud-lake/sql/like.md) + - [LOCATE](/tidb-cloud-lake/sql/locate.md) + - [LOWER](/tidb-cloud-lake/sql/lower.md) + - [LPAD](/tidb-cloud-lake/sql/lpad.md) + - [LTRIM](/tidb-cloud-lake/sql/ltrim.md) + - [MID](/tidb-cloud-lake/sql/mid.md) + - [NOT LIKE](/tidb-cloud-lake/sql/not-like.md) + - [NOT REGEXP](/tidb-cloud-lake/sql/not-regexp.md) + - [NOT RLIKE](/tidb-cloud-lake/sql/not-rlike.md) + - [OCT](/tidb-cloud-lake/sql/oct.md) + - [OCTET_LENGTH](/tidb-cloud-lake/sql/octet-length.md) + - [ORD](/tidb-cloud-lake/sql/ord.md) + - [POSITION](/tidb-cloud-lake/sql/position.md) + - [QUOTE](/tidb-cloud-lake/sql/quote.md) + - [REGEXP](/tidb-cloud-lake/sql/regexp.md) + - [REGEXP_INSTR](/tidb-cloud-lake/sql/regexp-instr.md) + - [REGEXP_LIKE](/tidb-cloud-lake/sql/regexp-like.md) + - [REGEXP_REPLACE](/tidb-cloud-lake/sql/regexp-replace.md) + - [REGEXP_SPLIT_TO_ARRAY](/tidb-cloud-lake/sql/regexp-split-array.md) + - [REGEXP_SPLIT_TO_TABLE](/tidb-cloud-lake/sql/regexp-split-table.md) + - [REGEXP_SUBSTR](/tidb-cloud-lake/sql/regexp-substr.md) + - [REPEAT](/tidb-cloud-lake/sql/repeat.md) + - [REPLACE](/tidb-cloud-lake/sql/replace-sql.md) + - [REVERSE](/tidb-cloud-lake/sql/reverse.md) + - [RIGHT](/tidb-cloud-lake/sql/right.md) + - [RLIKE](/tidb-cloud-lake/sql/rlike.md) + - [RPAD](/tidb-cloud-lake/sql/rpad.md) + - [RTRIM](/tidb-cloud-lake/sql/rtrim.md) + - [SOUNDEX](/tidb-cloud-lake/sql/soundex.md) + - [SOUNDS LIKE](/tidb-cloud-lake/sql/sounds-like.md) + - [SPACE](/tidb-cloud-lake/sql/space.md) + - [SPLIT](/tidb-cloud-lake/sql/split.md) + - [SPLIT_PART](/tidb-cloud-lake/sql/split-part.md) + - [STRCMP](/tidb-cloud-lake/sql/strcmp.md) + - [SUBSTR](/tidb-cloud-lake/sql/substr.md) + - [SUBSTRING](/tidb-cloud-lake/sql/substring.md) + - [TO_BASE64](/tidb-cloud-lake/sql/base-sql.md) + - [TRANSLATE](/tidb-cloud-lake/sql/translate.md) + - [TRIM](/tidb-cloud-lake/sql/trim.md) + - [TRIM_BOTH](/tidb-cloud-lake/sql/trim-both.md) + - [TRIM_LEADING](/tidb-cloud-lake/sql/trim-leading.md) + - [TRIM_TRAILING](/tidb-cloud-lake/sql/trim-trailing.md) + - [UCASE](/tidb-cloud-lake/sql/ucase.md) + - [UNHEX](/tidb-cloud-lake/sql/unhex.md) + - [UPPER](/tidb-cloud-lake/sql/upper.md) + - Aggregate Functions + - [Overview](/tidb-cloud-lake/sql/aggregate-functions.md) + - [ANY_VALUE](/tidb-cloud-lake/sql/any-value.md) + - [APPROX_COUNT_DISTINCT](/tidb-cloud-lake/sql/approx-count-distinct.md) + - [ARG_MAX](/tidb-cloud-lake/sql/arg-max.md) + - [ARG_MIN](/tidb-cloud-lake/sql/arg-min.md) + - 
[ARRAY_AGG](/tidb-cloud-lake/sql/array-agg.md) + - [AVG](/tidb-cloud-lake/sql/avg.md) + - [AVG_IF](/tidb-cloud-lake/sql/avg-if.md) + - [bool_and](/tidb-cloud-lake/sql/bool.md) + - [bool_or](/tidb-cloud-lake/sql/bool-sql.md) + - [COUNT](/tidb-cloud-lake/sql/count.md) + - [COUNT_DISTINCT](/tidb-cloud-lake/sql/count-distinct.md) + - [COUNT_IF](/tidb-cloud-lake/sql/count-if.md) + - [COVAR_POP](/tidb-cloud-lake/sql/covar-pop.md) + - [COVAR_SAMP](/tidb-cloud-lake/sql/covar-samp.md) + - [GROUP_ARRAY_MOVING_AVG](/tidb-cloud-lake/sql/group-array-moving-avg.md) + - [GROUP_ARRAY_MOVING_SUM](/tidb-cloud-lake/sql/group-array-moving-sum.md) + - [GROUP_CONCAT](/tidb-cloud-lake/sql/group-concat.md) + - [HISTOGRAM](/tidb-cloud-lake/sql/histogram.md) + - [JSON_ARRAY_AGG](/tidb-cloud-lake/sql/json-array-agg.md) + - [JSON_OBJECT_AGG](/tidb-cloud-lake/sql/json-object-agg.md) + - [KURTOSIS](/tidb-cloud-lake/sql/kurtosis.md) + - [LISTAGG](/tidb-cloud-lake/sql/listagg.md) + - [MARKOV_TRAIN](/tidb-cloud-lake/sql/markov-train.md) + - [MAX](/tidb-cloud-lake/sql/max.md) + - [MAX_IF](/tidb-cloud-lake/sql/max-if.md) + - [MEDIAN](/tidb-cloud-lake/sql/median.md) + - [MEDIAN_TDIGEST](/tidb-cloud-lake/sql/median-tdigest.md) + - [MIN](/tidb-cloud-lake/sql/min.md) + - [MIN_IF](/tidb-cloud-lake/sql/min-if.md) + - [MODE](/tidb-cloud-lake/sql/mode.md) + - [QUANTILE_CONT](/tidb-cloud-lake/sql/quantile-cont.md) + - [QUANTILE_DISC](/tidb-cloud-lake/sql/quantile-disc.md) + - [QUANTILE_TDIGEST](/tidb-cloud-lake/sql/quantile-tdigest.md) + - [QUANTILE_TDIGEST_WEIGHTED](/tidb-cloud-lake/sql/quantile-tdigest-weighted.md) + - [RETENTION](/tidb-cloud-lake/sql/retention.md) + - [SKEWNESS](/tidb-cloud-lake/sql/skewness.md) + - [STDDEV_POP](/tidb-cloud-lake/sql/stddev-pop.md) + - [STDDEV_SAMP](/tidb-cloud-lake/sql/stddev-samp.md) + - [STRING_AGG](/tidb-cloud-lake/sql/string-agg.md) + - [SUM](/tidb-cloud-lake/sql/sum.md) + - [SUM_IF](/tidb-cloud-lake/sql/sum-if.md) + - [VAR_POP](/tidb-cloud-lake/sql/var-pop.md) + - [VAR_SAMP](/tidb-cloud-lake/sql/var-samp.md) + - [VARIANCE_POP](/tidb-cloud-lake/sql/variance-pop.md) + - [VARIANCE_SAMP](/tidb-cloud-lake/sql/variance-samp.md) + - [WINDOW_FUNNEL](/tidb-cloud-lake/sql/window-funnel.md) + - Window Functions + - [Overview](/tidb-cloud-lake/sql/window-functions.md) + - [CUME_DIST](/tidb-cloud-lake/sql/cume-dist.md) + - [DENSE_RANK](/tidb-cloud-lake/sql/dense-rank.md) + - [FIRST](/tidb-cloud-lake/sql/first.md) + - [FIRST_VALUE](/tidb-cloud-lake/sql/first-value.md) + - [LAG](/tidb-cloud-lake/sql/lag.md) + - [LAST](/tidb-cloud-lake/sql/last.md) + - [LAST_VALUE](/tidb-cloud-lake/sql/last-value.md) + - [LEAD](/tidb-cloud-lake/sql/lead.md) + - [NTH_VALUE](/tidb-cloud-lake/sql/nth-value.md) + - [NTILE](/tidb-cloud-lake/sql/ntile.md) + - [PERCENT_RANK](/tidb-cloud-lake/sql/percent-rank.md) + - [RANGE BETWEEN](/tidb-cloud-lake/sql/range-between.md) + - [RANK](/tidb-cloud-lake/sql/rank.md) + - [ROW_NUMBER](/tidb-cloud-lake/sql/row-number.md) + - [ROWS BETWEEN](/tidb-cloud-lake/sql/rows-between.md) + - Geospatial Functions + - [Overview](/tidb-cloud-lake/sql/geospatial-functions.md) + - [GEO_TO_H3](/tidb-cloud-lake/sql/geo-to-h3.md) + - [GEOHASH_DECODE](/tidb-cloud-lake/sql/geohash-decode.md) + - [GEOHASH_ENCODE](/tidb-cloud-lake/sql/geohash-encode.md) + - [H3_CELL_AREA_M2](/tidb-cloud-lake/sql/h3-cell-area-m2.md) + - [H3_CELL_AREA_RADS2](/tidb-cloud-lake/sql/h3-cell-area-rads2.md) + - [H3_DISTANCE](/tidb-cloud-lake/sql/h3-distance.md) + - [H3_EDGE_ANGLE](/tidb-cloud-lake/sql/h3-edge-angle.md) + - 
[H3_EDGE_LENGTH_KM](/tidb-cloud-lake/sql/h3-edge-length-km.md) + - [H3_EDGE_LENGTH_M](/tidb-cloud-lake/sql/h3-edge-length-m.md) + - [H3_EXACT_EDGE_LENGTH_KM](/tidb-cloud-lake/sql/h3-exact-edge-length-km.md) + - [H3_EXACT_EDGE_LENGTH_M](/tidb-cloud-lake/sql/h3-exact-edge-length-m.md) + - [H3_EXACT_EDGE_LENGTH_RADS](/tidb-cloud-lake/sql/h3-exact-edge-length-rads.md) + - [H3_GET_BASE_CELL](/tidb-cloud-lake/sql/h3-get-base-cell.md) + - [H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE](/tidb-cloud-lake/sql/h3-get-destination-index-unidirectional-edge.md) + - [H3_GET_FACES](/tidb-cloud-lake/sql/h3-get-faces.md) + - [H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE](/tidb-cloud-lake/sql/h3-get-indexes-unidirectional-edge.md) + - [H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE](/tidb-cloud-lake/sql/h3-get-origin-index-unidirectional-edge.md) + - [H3_GET_RESOLUTION](/tidb-cloud-lake/sql/h3-get-resolution.md) + - [H3_GET_UNIDIRECTIONAL_EDGE](/tidb-cloud-lake/sql/h3-get-unidirectional-edge.md) + - [H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY](/tidb-cloud-lake/sql/h3-get-unidirectional-edge-boundary.md) + - [H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON](/tidb-cloud-lake/sql/h3-get-unidirectional-edges-hexagon.md) + - [H3_HEX_AREA_KM2](/tidb-cloud-lake/sql/h3-hex-area-km2.md) + - [H3_HEX_AREA_M2](/tidb-cloud-lake/sql/h3-hex-area-m2.md) + - [H3_HEX_RING](/tidb-cloud-lake/sql/h3-hex-ring.md) + - [H3_INDEXES_ARE_NEIGHBORS](/tidb-cloud-lake/sql/h3-indexes-are-neighbors.md) + - [H3_IS_PENTAGON](/tidb-cloud-lake/sql/h3-is-pentagon.md) + - [H3_IS_RES_CLASS_III](/tidb-cloud-lake/sql/h3-is-res-class-iii.md) + - [H3_IS_VALID](/tidb-cloud-lake/sql/h3-is-valid.md) + - [H3_K_RING](/tidb-cloud-lake/sql/h3-k-ring.md) + - [H3_LINE](/tidb-cloud-lake/sql/h3-line.md) + - [H3_NUM_HEXAGONS](/tidb-cloud-lake/sql/h3-num-hexagons.md) + - [H3_TO_CENTER_CHILD](/tidb-cloud-lake/sql/h3-to-center-child.md) + - [H3_TO_CHILDREN](/tidb-cloud-lake/sql/h3-to-children.md) + - [H3_TO_GEO](/tidb-cloud-lake/sql/h3-to-geo.md) + - [H3_TO_GEO_BOUNDARY](/tidb-cloud-lake/sql/h3-to-geo-boundary.md) + - [H3_TO_PARENT](/tidb-cloud-lake/sql/h3-to-parent.md) + - [H3_TO_STRING](/tidb-cloud-lake/sql/h3-to-string.md) + - [H3_UNIDIRECTIONAL_EDGE_IS_VALID](/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md) + - [HAVERSINE](/tidb-cloud-lake/sql/haversine.md) + - [POINT_IN_POLYGON](/tidb-cloud-lake/sql/point-polygon.md) + - [ST_AREA](/tidb-cloud-lake/sql/st-area.md) + - [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) + - [ST_ASEWKB](/tidb-cloud-lake/sql/st-asewkb.md) + - [ST_ASEWKT](/tidb-cloud-lake/sql/st-asewkt.md) + - [ST_ASGEOJSON](/tidb-cloud-lake/sql/st-asgeojson.md) + - [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) + - [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md) + - [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md) + - [ST_CONTAINS](/tidb-cloud-lake/sql/st-contains.md) + - [ST_DIMENSION](/tidb-cloud-lake/sql/st-dimension.md) + - [ST_DISTANCE](/tidb-cloud-lake/sql/st-distance.md) + - [ST_ENDPOINT](/tidb-cloud-lake/sql/st-endpoint.md) + - [ST_GEOGETRYFROMWKB](/tidb-cloud-lake/sql/st-geogetryfromwkb.md) + - [ST_GEOGFROMEWKB](/tidb-cloud-lake/sql/st-geogfromewkb.md) + - [ST_GEOGFROMGEOHASH](/tidb-cloud-lake/sql/st-geogfromgeohash.md) + - [ST_GEOGFROMTEXT](/tidb-cloud-lake/sql/st-geogfromtext.md) + - [ST_GEOGFROMWKB](/tidb-cloud-lake/sql/st-geogfromwkb.md) + - [ST_GEOGFROMWKT](/tidb-cloud-lake/sql/st-geogfromwkt.md) + - [ST_GEOGPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geogpointfromgeohash.md) + - [ST_GEOGRAPHYFROMEWKT](/tidb-cloud-lake/sql/st-geographyfromewkt.md) + - 
[ST_GEOGRAPHYFROMTEXT](/tidb-cloud-lake/sql/st-geographyfromtext.md) + - [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md) + - [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md) + - [ST_GEOHASH](/tidb-cloud-lake/sql/st-geohash.md) + - [ST_GEOM_POINT](/tidb-cloud-lake/sql/st-geom-point.md) + - [ST_GEOMETRYFROMEWKB](/tidb-cloud-lake/sql/st-geometryfromewkb.md) + - [ST_GEOMETRYFROMEWKT](/tidb-cloud-lake/sql/st-geometryfromewkt.md) + - [ST_GEOMETRYFROMTEXT](/tidb-cloud-lake/sql/st-geometryfromtext.md) + - [ST_GEOMETRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md) + - [ST_GEOMETRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md) + - [ST_GEOMFROMEWKB](/tidb-cloud-lake/sql/st-geomfromewkb.md) + - [ST_GEOMFROMEWKT](/tidb-cloud-lake/sql/st-geomfromewkt.md) + - [ST_GEOMFROMGEOHASH](/tidb-cloud-lake/sql/st-geomfromgeohash.md) + - [ST_GEOMFROMTEXT](/tidb-cloud-lake/sql/st-geomfromtext.md) + - [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md) + - [ST_GEOMFROMWKT](/tidb-cloud-lake/sql/st-geomfromwkt.md) + - [ST_GEOMPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geompointfromgeohash.md) + - [ST_LENGTH](/tidb-cloud-lake/sql/st-length.md) + - [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) + - [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md) + - [ST_MAKELINE](/tidb-cloud-lake/sql/st-makeline.md) + - [ST_MAKEPOINT](/tidb-cloud-lake/sql/st-makepoint.md) + - [ST_MAKEPOLYGON](/tidb-cloud-lake/sql/st-makepolygon.md) + - [ST_NPOINTS](/tidb-cloud-lake/sql/st-npoints.md) + - [ST_NUMPOINTS](/tidb-cloud-lake/sql/st-numpoints.md) + - [ST_POINT](/tidb-cloud-lake/sql/st-point.md) + - [ST_POINTN](/tidb-cloud-lake/sql/st-pointn.md) + - [ST_POLYGON](/tidb-cloud-lake/sql/st-polygon.md) + - [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) + - [ST_SRID](/tidb-cloud-lake/sql/st-srid.md) + - [ST_STARTPOINT](/tidb-cloud-lake/sql/st-startpoint.md) + - [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) + - [ST_X](/tidb-cloud-lake/sql/st-x.md) + - [ST_XMAX](/tidb-cloud-lake/sql/st-xmax.md) + - [ST_XMIN](/tidb-cloud-lake/sql/st-xmin.md) + - [ST_Y](/tidb-cloud-lake/sql/st-y.md) + - [ST_YMAX](/tidb-cloud-lake/sql/st-ymax.md) + - [ST_YMIN](/tidb-cloud-lake/sql/st-ymin.md) + - [STRING_TO_H3](/tidb-cloud-lake/sql/string-to-h3.md) + - [TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) + - [TO_GEOMETRY](/tidb-cloud-lake/sql/geometry.md) + - [TO_STRING](/tidb-cloud-lake/sql/string-geospatial.md) + - Full-Text Search Functions + - [Overview](/tidb-cloud-lake/sql/full-text-search-functions.md) + - [MATCH](/tidb-cloud-lake/sql/match.md) + - [QUERY](/tidb-cloud-lake/sql/query.md) + - [SCORE](/tidb-cloud-lake/sql/score.md) + - Structured & Semi-Structured + - [Structured & Semi-Structured Functions](/tidb-cloud-lake/sql/structured-semi-structured-functions.md) + - JSON Functions + - [CHECK_JSON](/tidb-cloud-lake/sql/check-json.md) + - [GET](/tidb-cloud-lake/sql/get.md) + - [GET_BY_KEYPATH](/tidb-cloud-lake/sql/get-by-keypath.md) + - [GET_IGNORE_CASE](/tidb-cloud-lake/sql/get-ignore-case.md) + - [GET_PATH](/tidb-cloud-lake/sql/get-path.md) + - [JSON Functions](/tidb-cloud-lake/sql/json-functions.md) + - [JQ](/tidb-cloud-lake/sql/jq.md) + - [JSON_ARRAY_ELEMENTS](/tidb-cloud-lake/sql/json-array-elements.md) + - [JSON_CONTAINS_IN_LEFT](/tidb-cloud-lake/sql/json-contains-left.md) + - [JSON_EACH](/tidb-cloud-lake/sql/json-each.md) + - [JSON_EXISTS_KEY](/tidb-cloud-lake/sql/json-exists-key.md) + - [JSON_EXTRACT_PATH_TEXT](/tidb-cloud-lake/sql/json-extract-path-text.md) + - 
[JSON_PATH_EXISTS](/tidb-cloud-lake/sql/json-path-exists.md) + - [JSON_PATH_MATCH](/tidb-cloud-lake/sql/json-path-match.md) + - [JSON_PATH_QUERY](/tidb-cloud-lake/sql/json-path-query.md) + - [JSON_PATH_QUERY_ARRAY](/tidb-cloud-lake/sql/json-path-query-array.md) + - [JSON_PATH_QUERY_FIRST](/tidb-cloud-lake/sql/json-path-query-first.md) + - [JSON_PRETTY](/tidb-cloud-lake/sql/json-pretty.md) + - [JSON_TO_STRING](/tidb-cloud-lake/sql/json-to-string.md) + - [JSON_TYPEOF](/tidb-cloud-lake/sql/json-typeof.md) + - [PARSE_JSON](/tidb-cloud-lake/sql/parse-json.md) + - [STRIP_NULL_VALUE](/tidb-cloud-lake/sql/strip-null-value.md) + - Array Functions + - [Overview](/tidb-cloud-lake/sql/array-functions.md) + - [ARRAY](/tidb-cloud-lake/sql/array-sql.md) + - [ARRAY_AGGREGATE](/tidb-cloud-lake/sql/array-aggregate.md) + - [ARRAY_ANY](/tidb-cloud-lake/sql/array-any.md) + - [ARRAY_APPEND](/tidb-cloud-lake/sql/array-append.md) + - [ARRAY_APPROX_COUNT_DISTINCT](/tidb-cloud-lake/sql/array-approx-count-distinct.md) + - [ARRAY_AVG](/tidb-cloud-lake/sql/array-avg.md) + - [ARRAY_COMPACT](/tidb-cloud-lake/sql/array-compact.md) + - [ARRAY_CONCAT](/tidb-cloud-lake/sql/array-concat.md) + - [ARRAY_CONSTRUCT](/tidb-cloud-lake/sql/array-construct.md) + - [ARRAY_CONTAINS](/tidb-cloud-lake/sql/array-contains.md) + - [ARRAY_COUNT](/tidb-cloud-lake/sql/array-count.md) + - [ARRAY_DISTINCT](/tidb-cloud-lake/sql/array-distinct.md) + - [ARRAY_EXCEPT](/tidb-cloud-lake/sql/array-except.md) + - [ARRAY_FILTER](/tidb-cloud-lake/sql/array-filter.md) + - [ARRAY_FLATTEN](/tidb-cloud-lake/sql/array-flatten.md) + - [ARRAY_GENERATE_RANGE](/tidb-cloud-lake/sql/array-generate-range.md) + - [ARRAY_GET](/tidb-cloud-lake/sql/array-get.md) + - [ARRAY_INDEXOF](/tidb-cloud-lake/sql/array-indexof.md) + - [ARRAY_INSERT](/tidb-cloud-lake/sql/array-insert.md) + - [ARRAY_INTERSECTION](/tidb-cloud-lake/sql/array-intersection.md) + - [ARRAY_KURTOSIS](/tidb-cloud-lake/sql/array-kurtosis.md) + - [ARRAY_MAX](/tidb-cloud-lake/sql/array-max.md) + - [ARRAY_MEDIAN](/tidb-cloud-lake/sql/array-median.md) + - [ARRAY_MIN](/tidb-cloud-lake/sql/array-min.md) + - [ARRAY_OVERLAP](/tidb-cloud-lake/sql/array-overlap.md) + - [ARRAY_PREPEND](/tidb-cloud-lake/sql/array-prepend.md) + - [ARRAY_REDUCE](/tidb-cloud-lake/sql/array-reduce.md) + - [ARRAY_REMOVE](/tidb-cloud-lake/sql/array-remove.md) + - [ARRAY_REMOVE_FIRST](/tidb-cloud-lake/sql/array-remove-first.md) + - [ARRAY_REMOVE_LAST](/tidb-cloud-lake/sql/array-remove-last.md) + - [ARRAY_REVERSE](/tidb-cloud-lake/sql/array-reverse.md) + - [ARRAY_SIZE](/tidb-cloud-lake/sql/array-size.md) + - [ARRAY_SKEWNESS](/tidb-cloud-lake/sql/array-skewness.md) + - [ARRAY_SLICE](/tidb-cloud-lake/sql/array-slice.md) + - [ARRAY_SORT](/tidb-cloud-lake/sql/array-sort.md) + - [ARRAY_STDDEV_POP](/tidb-cloud-lake/sql/array-stddev-pop.md) + - [ARRAY_STDDEV_SAMP](/tidb-cloud-lake/sql/array-stddev-samp.md) + - [ARRAY_SUM](/tidb-cloud-lake/sql/array-sum.md) + - [ARRAY_TO_STRING](/tidb-cloud-lake/sql/array-to-string.md) + - [JSON_ARRAY_TRANSFORM](/tidb-cloud-lake/sql/json-array-transform.md) + - [ARRAY_UNIQUE](/tidb-cloud-lake/sql/array-unique.md) + - [ARRAYS_ZIP](/tidb-cloud-lake/sql/arrays-zip.md) + - [CONTAINS](/tidb-cloud-lake/sql/contains.md) + - [GET](/tidb-cloud-lake/sql/get-sql.md) + - [RANGE](/tidb-cloud-lake/sql/range.md) + - [SLICE](/tidb-cloud-lake/sql/slice.md) + - [UNNEST](/tidb-cloud-lake/sql/unnest.md) + - Object Functions + - [Object Functions](/tidb-cloud-lake/sql/object-functions.md) + - 
[OBJECT_CONSTRUCT](/tidb-cloud-lake/sql/object-construct.md) + - [OBJECT_CONSTRUCT_KEEP_NULL](/tidb-cloud-lake/sql/object-construct-keep-null.md) + - [OBJECT_DELETE](/tidb-cloud-lake/sql/object-delete.md) + - [OBJECT_INSERT](/tidb-cloud-lake/sql/object-insert.md) + - [OBJECT_KEYS](/tidb-cloud-lake/sql/object-keys.md) + - [OBJECT_PICK](/tidb-cloud-lake/sql/object-pick.md) + - Map Functions + - [Map Functions](/tidb-cloud-lake/sql/map-functions.md) + - [MAP_CAT](/tidb-cloud-lake/sql/map-cat.md) + - [MAP_CONTAINS_KEY](/tidb-cloud-lake/sql/map-contains-key.md) + - [MAP_DELETE](/tidb-cloud-lake/sql/map-delete.md) + - [MAP_FILTER](/tidb-cloud-lake/sql/map-filter.md) + - [MAP_INSERT](/tidb-cloud-lake/sql/map-insert.md) + - [MAP_KEYS](/tidb-cloud-lake/sql/map-keys.md) + - [MAP_PICK](/tidb-cloud-lake/sql/map-pick.md) + - [MAP_SIZE](/tidb-cloud-lake/sql/map-size.md) + - [MAP_TRANSFORM_KEYS](/tidb-cloud-lake/sql/map-transform-keys.md) + - [MAP_TRANSFORM_VALUES](/tidb-cloud-lake/sql/map-transform-values.md) + - [MAP_VALUES](/tidb-cloud-lake/sql/map-values.md) + - Type Conversion + - [AS_ARRAY](/tidb-cloud-lake/sql/as-array.md) + - [AS_BINARY](/tidb-cloud-lake/sql/as-binary.md) + - [AS_BOOLEAN](/tidb-cloud-lake/sql/as-boolean.md) + - [AS_DATE](/tidb-cloud-lake/sql/as-date.md) + - [AS_DECIMAL](/tidb-cloud-lake/sql/as-decimal.md) + - [AS_FLOAT](/tidb-cloud-lake/sql/as-float.md) + - [AS_INTEGER](/tidb-cloud-lake/sql/as-integer.md) + - [AS_OBJECT](/tidb-cloud-lake/sql/as-object.md) + - [AS_STRING](/tidb-cloud-lake/sql/as-string.md) + - [Type Conversion Functions](/tidb-cloud-lake/sql/type-conversion-functions.md) + - Type Predicate + - [Type Predicate Functions](/tidb-cloud-lake/sql/type-predicate-functions.md) + - [IS_ARRAY](/tidb-cloud-lake/sql/is-array.md) + - [IS_BOOLEAN](/tidb-cloud-lake/sql/is-boolean.md) + - [IS_FLOAT](/tidb-cloud-lake/sql/is-float.md) + - [IS_INTEGER](/tidb-cloud-lake/sql/is-integer.md) + - [IS_NULL_VALUE](/tidb-cloud-lake/sql/is-null-value.md) + - [IS_OBJECT](/tidb-cloud-lake/sql/is-object.md) + - [IS_STRING](/tidb-cloud-lake/sql/is-string.md) + - Vector Functions + - [COSINE_DISTANCE](/tidb-cloud-lake/sql/cosine-distance.md) + - [L2_DISTANCE](/tidb-cloud-lake/sql/l2-distance.md) + - [L1_DISTANCE](/tidb-cloud-lake/sql/l1-distance.md) + - [INNER_PRODUCT](/tidb-cloud-lake/sql/inner-product.md) + - [VECTOR_DIMS](/tidb-cloud-lake/sql/vector-dims.md) + - [VECTOR_NORM](/tidb-cloud-lake/sql/vector-norm.md) + - [Vector Functions](/tidb-cloud-lake/sql/vector-functions.md) + - Hash Functions + - [BLAKE3](/tidb-cloud-lake/sql/blake.md) + - [CITY64WITHSEED](/tidb-cloud-lake/sql/city-withseed.md) + - [Hash Functions](/tidb-cloud-lake/sql/hash-functions.md) + - [MD5](/tidb-cloud-lake/sql/md.md) + - [SHA](/tidb-cloud-lake/sql/sha.md) + - [SHA1](/tidb-cloud-lake/sql/sha-sql.md) + - [SHA2](/tidb-cloud-lake/sql/sha-functions.md) + - [SIPHASH](/tidb-cloud-lake/sql/siphash.md) + - [SIPHASH64](/tidb-cloud-lake/sql/siphash-sql.md) + - [XXHASH32](/tidb-cloud-lake/sql/xxhash.md) + - [XXHASH64](/tidb-cloud-lake/sql/xxhash-sql.md) + - UUID Functions + - [GEN_RANDOM_UUID](/tidb-cloud-lake/sql/gen-random-uuid.md) + - [UUID Functions](/tidb-cloud-lake/sql/uuid-functions.md) + - [UUID](/tidb-cloud-lake/sql/uuid.md) + - IP Address Functions + - [IP Address Functions](/tidb-cloud-lake/sql/ip-address-functions.md) + - [INET_ATON](/tidb-cloud-lake/sql/inet-aton.md) + - [INET_NTOA](/tidb-cloud-lake/sql/inet-ntoa.md) + - [IPV4_NUM_TO_STRING](/tidb-cloud-lake/sql/ipv-num-string.md) + - 
[IPV4_STRING_TO_NUM](/tidb-cloud-lake/sql/ipv-string-num.md) + - [TRY_INET_ATON](/tidb-cloud-lake/sql/try-inet-aton.md) + - [TRY_INET_NTOA](/tidb-cloud-lake/sql/try-inet-ntoa.md) + - [TRY_IPV4_NUM_TO_STRING](/tidb-cloud-lake/sql/try-ipv-num-string.md) + - [TRY_IPV4_STRING_TO_NUM](/tidb-cloud-lake/sql/try-ipv-string-num.md) + - Context Functions + - [CONNECTION_ID](/tidb-cloud-lake/sql/connection-id.md) + - [CURRENT_CATALOG](/tidb-cloud-lake/sql/current-catalog.md) + - [CURRENT_USER](/tidb-cloud-lake/sql/current-user.md) + - [DATABASE](/tidb-cloud-lake/sql/database-sql.md) + - [Context Functions](/tidb-cloud-lake/sql/context-functions.md) + - [LAST_QUERY_ID](/tidb-cloud-lake/sql/last-query-id.md) + - [VERSION](/tidb-cloud-lake/sql/version.md) + - System Functions + - [CLUSTERING_INFORMATION](/tidb-cloud-lake/sql/clustering-information.md) + - [FUSE_BLOCK](/tidb-cloud-lake/sql/fuse-block.md) + - [FUSE_COLUMN](/tidb-cloud-lake/sql/fuse-column.md) + - [FUSE_ENCODING](/tidb-cloud-lake/sql/fuse-encoding.md) + - [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) + - [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) + - [FUSE_STATISTIC](/tidb-cloud-lake/sql/fuse-statistic.md) + - [FUSE_TIME_TRAVEL_SIZE](/tidb-cloud-lake/sql/fuse-time-travel-size.md) + - [FUSE_VIRTUAL_COLUMN](/tidb-cloud-lake/sql/fuse-virtual-column.md) + - [System Functions](/tidb-cloud-lake/sql/system-functions-sql.md) + - Table Functions + - [Overview](/tidb-cloud-lake/sql/table-functions.md) + - [INFER_SCHEMA](/tidb-cloud-lake/sql/infer-schema.md) + - [INSPECT_PARQUET](/tidb-cloud-lake/sql/inspect-parquet.md) + - [LIST_STAGE](/tidb-cloud-lake/sql/list-stage.md) + - [GENERATE_SERIES](/tidb-cloud-lake/sql/generate-series.md) + - [FLATTEN](/tidb-cloud-lake/sql/flatten.md) + - [SYSTEM$FUSE_AMEND](/tidb-cloud-lake/sql/system-fuse-amend.md) + - [FUSE_VACUUM_TEMPORARY_TABLE](/tidb-cloud-lake/sql/fuse-vacuum-temporary-table.md) + - [ICEBERG_MANIFEST](/tidb-cloud-lake/sql/iceberg-manifest.md) + - [ICEBERG_SNAPSHOT](/tidb-cloud-lake/sql/iceberg-snapshot.md) + - [POLICY_REFERENCES](/tidb-cloud-lake/sql/policy-references.md) + - [READ_FILE](/tidb-cloud-lake/sql/read-file.md) + - [RESULT_SCAN](/tidb-cloud-lake/sql/result-scan.md) + - [SHOW_GRANTS](/tidb-cloud-lake/sql/show-grants-sql.md) + - [SHOW_VARIABLES](/tidb-cloud-lake/sql/show-variables-sql.md) + - [STREAM_STATUS](/tidb-cloud-lake/sql/stream-status.md) + - [TASK_HISTORY](/tidb-cloud-lake/sql/task-history.md) + - Sequence Functions + - [Sequence Functions](/tidb-cloud-lake/sql/sequence-functions.md) + - [NEXTVAL](/tidb-cloud-lake/sql/nextval.md) + - Data Anonymization Functions + - [FEISTEL_OBFUSCATE](/tidb-cloud-lake/sql/feistel-obfuscate.md) + - [Data Anonymization Functions](/tidb-cloud-lake/sql/data-anonymization-functions.md) + - [MARKOV_GENERATE](/tidb-cloud-lake/sql/markov-generate.md) + - [OBFUSCATE](/tidb-cloud-lake/sql/obfuscate.md) + - Test Functions + - [Test Functions](/tidb-cloud-lake/sql/test-functions.md) + - [SLEEP](/tidb-cloud-lake/sql/sleep.md) + - Other Functions + - [ASSUME_NOT_NULL](/tidb-cloud-lake/sql/assume-not-null.md) + - [EXISTS](/tidb-cloud-lake/sql/exists.md) + - [GROUPING](/tidb-cloud-lake/sql/grouping.md) + - [HUMANIZE_NUMBER](/tidb-cloud-lake/sql/humanize-number.md) + - [HUMANIZE_SIZE](/tidb-cloud-lake/sql/humanize-size.md) + - [Other Functions](/tidb-cloud-lake/sql/other-functions.md) + - [REMOVE_NULLABLE](/tidb-cloud-lake/sql/remove-nullable.md) + - [TO_NULLABLE](/tidb-cloud-lake/sql/nullable.md) + - [TYPEOF](/tidb-cloud-lake/sql/typeof.md) + 
- [Stored Procedure & Scripting](/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md)
+- General Reference
+  - [Architecture](/tidb-cloud-lake/guides/tidb-cloud-lake-architecture.md)
+  - Implementation Deep Dives
+    - [Fuse Engine](/tidb-cloud-lake/guides/how-fuse-engine-works.md)
+    - [Optimizer](/tidb-cloud-lake/guides/how-optimizer-works.md)
+    - [JSON](/tidb-cloud-lake/guides/how-json-variant-works.md)
+    - [Data Sharing](/tidb-cloud-lake/guides/how-data-sharing-works.md)
+  - [Compliance & Security](/tidb-cloud-lake/guides/compliance-security.md)
+  - [Editions](/tidb-cloud-lake/guides/editions.md)
+  - [Platforms & Regions](/tidb-cloud-lake/guides/platforms-regions.md)
+  - [Pricing & Billing](/tidb-cloud-lake/guides/pricing-billing.md)
+  - [Support Services](/tidb-cloud-lake/guides/support-services.md)
diff --git a/tidb-cloud-lake/_index.md b/tidb-cloud-lake/_index.md
new file mode 100644
index 0000000000000..8cb374c278b43
--- /dev/null
+++ b/tidb-cloud-lake/_index.md
@@ -0,0 +1,10 @@
+---
+title: TiDB Cloud Lake Documentation
+hide_sidebar: true
+hide_commit: true
+summary: TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations.
+---
+
+
+
+
diff --git a/tidb-cloud-lake/guides/access-control.md b/tidb-cloud-lake/guides/access-control.md
new file mode 100644
index 0000000000000..016bb297916ed
--- /dev/null
+++ b/tidb-cloud-lake/guides/access-control.md
@@ -0,0 +1,19 @@
+---
+title: Access Control
+---
+
+Databend incorporates both [Role-Based Access Control (RBAC)](https://en.wikipedia.org/wiki/Role-based_access_control) and [Discretionary Access Control (DAC)](https://en.wikipedia.org/wiki/Discretionary_access_control) models for its access control functionality. When a user accesses a data object in Databend, they must be granted appropriate privileges or roles, or they need to have ownership of the data object. A data object can refer to various elements, such as a database, table, view, stage, or UDF.
+
+![Alt text](/img/guides/access-control-1.png)
+
+| Concept | Description |
+|-----------|------------------------------------------------------------|
+| Privileges | Privileges play a crucial role when interacting with data objects in Databend. These permissions, such as read, write, and execute, provide precise control over user actions, ensuring alignment with user requirements and maintaining data security. |
+| Roles | Roles simplify access control. Roles are predefined sets of privileges assigned to users, streamlining permission management. Administrators can categorize users based on responsibilities, granting permissions efficiently without individual configurations. |
+| Ownership | Ownership is a specialized privilege for controlling data access. When a user owns a data object, they have the highest control level, dictating access permissions. This straightforward ownership model empowers users to manage their data, controlling who can access or modify it within the Databend environment.
| + +This guide describes the related concepts and provides instructions on how to manage access control in Databend: + +- [Privileges](/tidb-cloud-lake/guides/privileges.md) +- [Roles](/tidb-cloud-lake/guides/roles.md) +- [Ownership](/tidb-cloud-lake/guides/ownership.md) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/aggregating-index.md b/tidb-cloud-lake/guides/aggregating-index.md new file mode 100644 index 0000000000000..103ecc1fa8c74 --- /dev/null +++ b/tidb-cloud-lake/guides/aggregating-index.md @@ -0,0 +1,148 @@ +--- +title: Aggregating Index +--- + +# Aggregating Index: Precomputed Results for Instant Analytics + +Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations. + +## What Problem Does It Solve? + +Analytical queries on large datasets face significant performance challenges: + +| Problem | Impact | Aggregating Index Solution | +|---------|--------|---------------------------| +| **Full Table Scans** | SUM, COUNT, MIN, MAX queries scan millions of rows | Read precomputed results instantly | +| **Repeated Calculations** | Same aggregations computed over and over | Store results once, reuse many times | +| **Slow Dashboard Queries** | Analytics dashboards take minutes to load | Sub-second response for common metrics | +| **High Compute Costs** | Heavy aggregation workloads consume resources | Minimal compute for cached results | +| **Poor User Experience** | Users wait for reports and analytics | Instant results for business intelligence | + +**Example**: A sales analytics query `SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US'` on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly. + +## How It Works + +1. **Index Creation** → Define aggregation queries to precompute +2. **Result Storage** → Databend stores aggregated results in optimized blocks +3. **Query Matching** → Incoming queries automatically use precomputed results +4. **Automatic Updates** → Results refresh when underlying data changes + +## Quick Setup + +```sql +-- Create table with sample data +CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT); + +-- Create aggregating index for common analytics +CREATE AGGREGATING INDEX sales_summary AS +SELECT region, SUM(revenue), COUNT(*), AVG(quantity) +FROM sales +GROUP BY region; + +-- Refresh the index (manual mode) +REFRESH AGGREGATING INDEX sales_summary; + +-- Verify the index is used +EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region; +``` + +## Supported Operations + +| ✅ Supported | ❌ Not Supported | +|-------------|-----------------| +| SUM, COUNT, MIN, MAX, AVG | Window Functions | +| GROUP BY clauses | GROUPING SETS | +| WHERE filters | ORDER BY, LIMIT | +| Simple aggregations | Complex subqueries | + +## Refresh Strategies + +| Strategy | When to Use | Configuration | +|----------|-------------|---------------| +| **Automatic (SYNC)** | Real-time analytics, small datasets | `CREATE AGGREGATING INDEX ... 
SYNC` | +| **Manual** | Large datasets, batch processing | `CREATE AGGREGATING INDEX ...` (default) | +| **Background (Cloud)** | Production workloads | Automatic in Databend Cloud | + +### Automatic vs Manual Refresh + +```sql +-- Automatic refresh (updates with every data change) +CREATE AGGREGATING INDEX auto_summary AS +SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC; + +-- Manual refresh (update on demand) +CREATE AGGREGATING INDEX manual_summary AS +SELECT region, SUM(revenue) FROM sales GROUP BY region; + +REFRESH AGGREGATING INDEX manual_summary; +``` + +## Performance Example + +This example shows the dramatic performance improvement: + +```sql +-- Prepare data +CREATE TABLE agg(a int, b int, c int); +INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5); + +-- Create an aggregating index +CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg; + +-- Refresh the aggregating index +REFRESH AGGREGATING INDEX my_agg_index; + +-- Verify if the aggregating index works +EXPLAIN SELECT MIN(a), MAX(c) FROM agg; + +-- Key indicators in the execution plan: +-- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg] +-- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]] +-- This shows the query uses precomputed results instead of scanning raw data +``` + +## Best Practices + +| Practice | Benefit | +|----------|---------| +| **Index Common Queries** | Focus on frequently executed analytics | +| **Use Manual Refresh** | Better control over update timing | +| **Monitor Index Usage** | Use EXPLAIN to verify index utilization | +| **Clean Up Unused Indexes** | Remove indexes that aren't being used | +| **Match Query Patterns** | Index filters should match actual queries | + +## Management Commands + +| Command | Purpose | +|---------|---------| +| `CREATE AGGREGATING INDEX` | Create new aggregating index | +| `REFRESH AGGREGATING INDEX` | Update index with latest data | +| `DROP AGGREGATING INDEX` | Remove index (use VACUUM TABLE to clean storage) | +| `SHOW AGGREGATING INDEXES` | List all indexes | + +## Important Notes + +:::tip +**When to Use Aggregating Indexes:** +- Frequent analytical queries (dashboards, reports) +- Large datasets with repeated aggregations +- Stable query patterns +- Performance-critical applications + +**When NOT to Use:** +- Frequently changing data +- One-time analytical queries +- Simple queries on small tables +::: + +## Configuration + +```sql +-- Enable/disable aggregating index feature +SET enable_aggregating_index_scan = 1; -- Enable (default) +SET enable_aggregating_index_scan = 0; -- Disable +``` + +--- + +*Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.* diff --git a/tidb-cloud-lake/guides/ai-ml-integration.md b/tidb-cloud-lake/guides/ai-ml-integration.md new file mode 100644 index 0000000000000..2473c23fcfced --- /dev/null +++ b/tidb-cloud-lake/guides/ai-ml-integration.md @@ -0,0 +1,33 @@ +# AI & ML Integration + +Databend enables powerful AI and ML capabilities through two complementary approaches: build custom AI functions with your own infrastructure, or create conversational data experiences using natural language. + +## External Functions - The Recommended Approach + +External functions enable you to connect your data with custom AI/ML infrastructure, providing maximum flexibility and performance for AI workloads. 
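+
+In practice the pattern looks roughly like the sketch below: you deploy a UDF server next to your model, register it as an external function, and then call it from SQL like any built-in function. This is illustrative only — the function name, signature, endpoint, and the `reviews` table are placeholders; see the [External Functions Guide](/tidb-cloud-lake/guides/external-ai-functions.md) linked below for the authoritative syntax and server setup.
+
+```sql
+-- Illustrative sketch only: names, types, and the server address are placeholders.
+-- An external UDF server (for example, one wrapping your ML model) listens at ADDRESS.
+CREATE FUNCTION ai_sentiment (STRING) RETURNS FLOAT
+    LANGUAGE python HANDLER = 'ai_sentiment'
+    ADDRESS = 'https://udf.example.internal:8815';
+
+-- Once registered, the function is called like any other SQL function
+-- (the `reviews` table here is hypothetical).
+SELECT comment, ai_sentiment(comment) AS sentiment
+FROM reviews
+LIMIT 10;
+```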
+ +| Feature | Benefits | +|---------|----------| +| **Custom Models** | Use any open-source or proprietary AI/ML models | +| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference | +| **Data Privacy** | Keep your data within your infrastructure | +| **Scalability** | Independent scaling and resource optimization | +| **Flexibility** | Support for any programming language and ML framework | + +## MCP Server - Natural Language Data Interaction + +The Model Context Protocol (MCP) server enables AI assistants to interact with your Databend database using natural language, perfect for building conversational BI tools. + +| Feature | Benefits | +|---------|----------| +| **Natural Language** | Query your data using plain English | +| **AI Assistant Integration** | Works with Claude, ChatGPT, and custom agents | +| **Real-time Analysis** | Get instant insights from your data | + +## Getting Started + +**[External Functions Guide](/tidb-cloud-lake/guides/external-ai-functions.md)** - Learn how to create and deploy custom AI functions with practical examples and implementation guidance + +**[MCP Server Guide](/tidb-cloud-lake/guides/mcp-server.md)** - Build a conversational BI tool using mcp-databend and natural language queries + +**[MCP Client Integration](/tidb-cloud-lake/guides/mcp-client-integration.md)** - Configure generic MCP clients (like Codex) to connect to Databend diff --git a/tidb-cloud-lake/guides/ai-powered-features.md b/tidb-cloud-lake/guides/ai-powered-features.md new file mode 100644 index 0000000000000..a0bb44c717963 --- /dev/null +++ b/tidb-cloud-lake/guides/ai-powered-features.md @@ -0,0 +1,39 @@ +--- +title: AI-Powered Features +--- + +import SearchSVG from '@site/static/img/icon/search.svg' +import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse' +import AITip from '@site/docs/fragment/ai-tip.md' + +} +/> + +With the inclusion of AI-powered features, Databend Cloud allows you to engage in natural language conversations to receive help, assistance, and solutions. These AI-powered features are enabled by default, but you can disable them if desired by navigating to **Manage** > **Settings**. + +### AI Chat for Assistance + +AI Chat enables natural language interactions, allowing for intuitive information retrieval and streamlined problem-solving. + +To launch an AI-Chat: + +1. Click the magnifying glass icon located in the sidebar to open the search box. + +2. Switch to the **Chat** tab. + +3. Enter your question. + +![Alt text](@site/static/img/documents/worksheet/ai-chat.gif) + +### AI-Powered SQL Assistant + +AI assistance is available for editing SQL statements within worksheets. You don't need to write your SQL from scratch — AI can generate it for you. + +To involve AI when editing a SQL statement, simply type "/" at the beginning of a new line and input your query, like "return current time": + +![Alt text](@site/static/img/documents/worksheet/ai-worksheet-1.gif) + +You can also get AI assistance for an existing SQL statement. To do so, highlight your SQL and click **Edit** to specify your desired changes or request further help. Alternatively, click **Chat** to engage in a conversation with AI for more comprehensive support. 
+
+![Alt text](@site/static/img/documents/worksheet/ai-worksheet-2.gif)
diff --git a/tidb-cloud-lake/guides/audit-trail.md b/tidb-cloud-lake/guides/audit-trail.md
new file mode 100644
index 0000000000000..8cef0f4d9fca2
--- /dev/null
+++ b/tidb-cloud-lake/guides/audit-trail.md
@@ -0,0 +1,108 @@
+---
+title: Audit Trail
+---
+
+import EEFeature from '@site/src/components/EEFeature';
+
+
+
+Databend system history tables automatically capture detailed records of database activities, providing a complete audit trail for compliance and security monitoring.
+
+These history tables allow you to audit:
+- **Query execution** - Complete SQL execution audit trail (`query_history`)
+- **Data access** - Database object access and modifications (`access_history`)
+- **Authentication** - Login attempts and session tracking (`login_history`)
+
+## Available Audit Tables
+
+Databend provides the following system history tables, each capturing a different aspect of database activity:
+
+| Table | Purpose | Key Use Cases |
+|-------|---------|---------------|
+| [query_history](/tidb-cloud-lake/sql/system-history-query-history.md) | Complete SQL execution audit trail | Performance monitoring, security auditing, compliance reporting |
+| [access_history](/tidb-cloud-lake/sql/system-history-access-history.md) | Database object access and modifications | Data lineage tracking, compliance auditing, change management |
+| [login_history](/tidb-cloud-lake/sql/system-history-login-history.md) | Authentication attempts and sessions | Security monitoring, failed login detection, access pattern analysis |
+
+## Audit Use Cases & Examples
+
+### Security Monitoring
+
+**Monitor Failed Login Attempts**
+
+Track authentication failures to identify potential security threats and unauthorized access attempts.
+
+```sql
+-- Check for failed login attempts (security audit)
+SELECT event_time, user_name, client_ip, error_message
+FROM system_history.login_history
+WHERE event_type = 'LoginFailed'
+ORDER BY event_time DESC;
+```
+
+Example output:
+```
+event_time: 2025-06-03 06:07:32.512021
+user_name: root1
+client_ip: 127.0.0.1:62050
+error_message: UnknownUser. Code: 2201, Text = User 'root1'@'%' does not exist.
+```
+
+### Compliance Reporting
+
+**Track Database Schema Changes**
+
+Monitor DDL operations for compliance and change management requirements.
+
+```sql
+-- Audit DDL operations (compliance tracking)
+SELECT query_id, query_start, user_name, object_modified_by_ddl
+FROM system_history.access_history
+WHERE object_modified_by_ddl != '[]'
+ORDER BY query_start DESC;
+```
+
+Example for `CREATE TABLE` operation:
+```
+query_id: c2c1c7be-cee4-4868-a28e-8862b122c365
+query_start: 2025-06-12 03:31:19.042128
+user_name: root
+object_modified_by_ddl: [{"object_domain":"Table","object_name":"default.default.t","operation_type":"Create"}]
+```
+
+**Audit Data Access Patterns**
+
+Track who accessed what data and when for compliance and data governance.
+
+```sql
+-- Track data access for compliance
+SELECT query_id, query_start, user_name, base_objects_accessed
+FROM system_history.access_history
+WHERE base_objects_accessed != '[]'
+ORDER BY query_start DESC;
+```
+
+### Operational Monitoring
+
+**Complete Query Execution Audit**
+
+Maintain comprehensive records of all SQL operations with user and timing information.
+
+```sql
+-- Complete query audit with user and timing information
+SELECT query_id, sql_user, query_text, query_start_time, query_duration_ms, client_address
+FROM system_history.query_history
+WHERE event_date >= TODAY() - INTERVAL 7 DAY
+ORDER BY query_start_time DESC;
+```
+
+Example output:
+```
+query_id: 4e1f50a9-bce2-45cc-86e4-c7a36b9b8d43
+sql_user: root
+query_text: SELECT * FROM t
+query_start_time: 2025-06-12 03:31:35.041725
+query_duration_ms: 94
+client_address: 127.0.0.1
+```
+
+For detailed information about each audit table and its specific fields, see the [System History Tables](/tidb-cloud-lake/sql/system-history-tables.md) reference documentation.
diff --git a/tidb-cloud-lake/guides/authenticate-with-aws-iam-role.md b/tidb-cloud-lake/guides/authenticate-with-aws-iam-role.md
new file mode 100644
index 0000000000000..54769a79559a4
--- /dev/null
+++ b/tidb-cloud-lake/guides/authenticate-with-aws-iam-role.md
@@ -0,0 +1,101 @@
+---
+title: "Authenticate with AWS IAM Role"
+sidebar_label: "AWS IAM Role"
+---
+
+# Authenticate with AWS IAM Role
+
+Cloud-native identity delegation (AWS IAM Role, Azure Managed Identity, Google Service Account federation, etc.) lets Databend Cloud obtain short-lived credentials to your object storage without ever handling raw access keys. That keeps data plane access inside your cloud provider's control plane while you retain ownership of every permission.
+
+## IAM role benefits
+
+- No static keys: temporary credentials eliminate long-lived secrets to rotate or leak.
+- Least privilege: fine-grained policies restrict Databend Cloud to only the buckets and actions you approve.
+- Central governance: continue auditing and revoking access through your existing IAM workflows.
+- Automated rotation: the cloud provider refreshes tokens, so integrations keep working when teams change.
+
+## How it works
+
+After Databend Cloud support shares the trusted principal information for your organization, you create an IAM role/identity in your cloud account, attach a policy that allows the object storage operations you need (for example reading a set of buckets), and configure the trust policy so only Databend Cloud can assume the role with a unique external ID. Databend Cloud then assumes that role on demand, uses the temporary credentials to access your storage, and the credentials expire automatically when the session ends.
+
+## Use IAM role
+
+1. Raise a support ticket to get the IAM role ARN for your Databend Cloud organization:
+
+   For example: `arn:aws:iam::123456789012:role/xxxxxxx/tnabcdefg/xxxxxxx-tnabcdefg`
+
+2. Go to the AWS Console:
+
+   https://us-east-2.console.aws.amazon.com/iam/home?region=us-east-2#/policies
+
+   Click `Create policy`, select the `JSON` policy editor, and input the policy document for S3 bucket access:
+
+   ```json
+   {
+     "Version": "2012-10-17",
+     "Statement": [
+       {
+         "Effect": "Allow",
+         "Action": "s3:ListBucket",
+         "Resource": "arn:aws:s3:::test-bucket-123"
+       },
+       {
+         "Effect": "Allow",
+         "Action": "s3:*Object",
+         "Resource": "arn:aws:s3:::test-bucket-123/*"
+       }
+     ]
+   }
+   ```
+
+   Click `Next`, and input the policy name: `databend-test`, and click `Create policy`
+
+3. Go to the AWS Console:
+
+   https://us-east-2.console.aws.amazon.com/iam/home?region=us-east-2#/roles
+
+   Click `Create role`, and select `Custom trust policy` in `Trusted entity type`:
+
+   ![Create Role](/img/cloud/iam/aws/create-role.png)
+
+   Input the trust policy document:
+
+   ```json
+   {
+     "Version": "2012-10-17",
+     "Statement": [
+       {
+         "Effect": "Allow",
+         "Principal": {
+           "AWS": "arn:aws:iam::123456789012:role/xxxxxxx/tnabcdefg/xxxxxxx-tnabcdefg"
+         },
+         "Condition": {
+           "StringEquals": {
+             "sts:ExternalId": "my-external-id-123"
+           }
+         },
+         "Action": "sts:AssumeRole"
+       }
+     ]
+   }
+   ```
+
+   Click `Next`, and select the previously created policy: `databend-test`
+
+   Click `Next`, and input the role name: `databend-test`
+
+   Click `View Role`, and record the role ARN: `arn:aws:iam::987654321987:role/databend-test`
+
+4. Run the following SQL statements in a Databend Cloud worksheet or `BendSQL`:
+
+   ```sql
+   CREATE CONNECTION databend_test STORAGE_TYPE = 's3' ROLE_ARN = 'arn:aws:iam::987654321987:role/databend-test' EXTERNAL_ID = 'my-external-id-123';
+
+   CREATE STAGE databend_test URL = 's3://test-bucket-123' CONNECTION = (CONNECTION_NAME = 'databend_test');
+
+   SELECT * FROM @databend_test/test.parquet LIMIT 1;
+   ```
+
+:::info
+Congratulations! You can now access your own AWS S3 buckets in Databend Cloud with an IAM Role.
+:::
diff --git a/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md b/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md
new file mode 100644
index 0000000000000..bb3426aad20a7
--- /dev/null
+++ b/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md
@@ -0,0 +1,215 @@
+---
+title: Automating Data Loading with Tasks
+sidebar_label: Task
+---
+
+Tasks wrap SQL so Databend can run it for you on a schedule or when a condition is met. Keep the following knobs in mind when you define one with [CREATE TASK](/tidb-cloud-lake/sql/create-task.md):
+
+![alt text](/img/load/task.png)
+
+- **Name & warehouse** – every task needs a warehouse.
+  ```sql
+  CREATE TASK ingest_orders
+    WAREHOUSE = 'etl_wh'
+    AS SELECT 1;
+  ```
+- **Trigger** – fixed interval, CRON, or `AFTER another_task`.
+  ```sql
+  CREATE TASK mytask
+    WAREHOUSE = 'default'
+    SCHEDULE = 2 MINUTE
+    AS ...;
+  ```
+- **Guards** – only run when a predicate is true.
+  ```sql
+  CREATE TASK mytask
+    WAREHOUSE = 'default'
+    WHEN STREAM_STATUS('mystream') = TRUE
+    AS ...;
+  ```
+- **Error handling** – pause after N failures or send notifications.
+  ```sql
+  CREATE TASK mytask
+    WAREHOUSE = 'default'
+    SUSPEND_TASK_AFTER_NUM_FAILURES = 3
+    AS ...;
+  ```
+- **SQL payload** – whatever you place after `AS` is what the task executes.
+  ```sql
+  CREATE TASK bump_age
+    WAREHOUSE = 'default'
+    SCHEDULE = USING CRON '0 0 1 1 * *' 'UTC'
+    AS UPDATE employees SET age = age + 1;
+  ```
+
+## Example 1: Scheduled Copy
+
+Continuously generate sensor data, land it as Parquet, and load it into a table. Replace `'etl_wh_small'` with **your** warehouse name in every `CREATE/ALTER TASK` statement.
+
+### Step 1. Prepare demo objects
+
+```sql
+-- Create a playground schema and target table
+CREATE DATABASE IF NOT EXISTS task_demo;
+USE task_demo;
+
+CREATE OR REPLACE TABLE sensor_events (
+    event_time TIMESTAMP,
+    sensor_id INT,
+    temperature DOUBLE,
+    humidity DOUBLE
+);
+
+-- Stage that will store the generated Parquet files
+CREATE OR REPLACE STAGE sensor_events_stage;
+```
+
+### Step 2. Task 1 — Generate files
+
+`task_generate_data` writes 100 random readings to the stage once per minute.
Each execution produces a fresh Parquet file that downstream consumers can ingest. + +```sql +CREATE OR REPLACE TASK task_generate_data + WAREHOUSE = 'etl_wh_small' -- replace with your warehouse + SCHEDULE = 1 MINUTE +AS +COPY INTO @sensor_events_stage +FROM ( + SELECT + NOW() AS event_time, + number AS sensor_id, + 20 + RAND() * 5 AS temperature, + 60 + RAND() * 10 AS humidity + FROM numbers(100) +) +FILE_FORMAT = (TYPE = PARQUET); +``` + +### Step 3. Task 2 — Load the files + +`task_consume_data` scans the stage on the same cadence and copies every newly generated Parquet file into the `sensor_events` table. The `PURGE = TRUE` clause cleans up files that were already ingested. + +```sql +CREATE OR REPLACE TASK task_consume_data + WAREHOUSE = 'etl_wh_small' -- replace with your warehouse + SCHEDULE = 1 MINUTE +AS +COPY INTO sensor_events +FROM @sensor_events_stage +PATTERN = '.*[.]parquet' +FILE_FORMAT = (TYPE = PARQUET) +PURGE = TRUE; +``` + +### Step 4. Resume tasks + +```sql +ALTER TASK task_generate_data RESUME; +ALTER TASK task_consume_data RESUME; +``` + +Both tasks start in a suspended state until you resume them. Expect the first files and copies to happen within the next minute. + +### Step 5. Monitor the pipeline + +```sql +-- Confirm that the tasks are running +SHOW TASKS LIKE 'task_%'; + +-- Inspect files on the stage (should shrink as PURGE removes processed files) +LIST @sensor_events_stage; + +-- Check the ingested rows +SELECT * +FROM sensor_events +ORDER BY event_time DESC +LIMIT 5; + +-- Review recent executions for troubleshooting +SELECT * +FROM task_history('task_consume_data', 5); + +-- Change configuration later if needed +ALTER TASK task_consume_data + SCHEDULE = 30 SECOND, + WAREHOUSE = 'etl_wh_medium'; -- replace with your warehouse +``` + +You can suspend either task with `ALTER TASK ... SUSPEND` when you finish testing. + +### Step 6. Update tasks + +You can change schedules, warehouses, or even the SQL payload without dropping the task: + +```sql +-- Tweak the schedule and warehouse +ALTER TASK task_consume_data + SCHEDULE = 30 SECOND, + WAREHOUSE = 'etl_wh_medium'; -- replace with your warehouse + +-- Update the SQL payload (replace the existing body) +ALTER TASK task_consume_data + AS +COPY INTO sensor_events +FROM @sensor_events_stage +FILE_FORMAT = (TYPE = PARQUET); + +-- Resume after edits (tasks suspend when their SQL changes) +ALTER TASK task_consume_data RESUME; + +-- Review execution history for verification +SELECT * +FROM task_history('task_consume_data', 5) +ORDER BY completed_time DESC; +``` + +`TASK_HISTORY` returns status, timing, and query IDs, making it easy to double-check changes. + +## Example 2: Stream-Triggered Merge + +Use `WHEN STREAM_STATUS(...)` to fire only when a stream has new rows. Reuse the `sensor_events` table from Example 1. + +### Step 1. Create stream + latest table + +```sql +-- Create a stream on the sensor table (Standard mode to capture every mutation) +CREATE OR REPLACE STREAM sensor_events_stream + ON TABLE sensor_events + APPEND_ONLY = false; + +-- Target table that keeps only the latest copy of each row +CREATE OR REPLACE TABLE sensor_events_latest AS +SELECT * +FROM sensor_events +WHERE 1 = 0; +``` + +### Step 2. 
Create the conditional task + +```sql +CREATE OR REPLACE TASK task_stream_merge + WAREHOUSE = 'etl_wh_small' -- replace with your warehouse + SCHEDULE = 1 MINUTE + WHEN STREAM_STATUS('task_demo.sensor_events_stream') = TRUE +AS +INSERT INTO sensor_events_latest +SELECT * +FROM sensor_events_stream; + +ALTER TASK task_stream_merge RESUME; +``` + +### Step 3. Verify the behavior + +```sql +SELECT * +FROM sensor_events_latest +ORDER BY event_time DESC +LIMIT 5; + +SELECT * +FROM task_history('task_stream_merge', 5); +``` + +The task fires only when `STREAM_STATUS('.')` returns `TRUE`. Always prefix the stream with its database (for example `task_demo.sensor_events_stream`) so the task can resolve it regardless of the current schema, and use your own warehouse name in every `CREATE/ALTER TASK`. + diff --git a/tidb-cloud-lake/guides/bendsql.md b/tidb-cloud-lake/guides/bendsql.md new file mode 100644 index 0000000000000..93976521a5798 --- /dev/null +++ b/tidb-cloud-lake/guides/bendsql.md @@ -0,0 +1,706 @@ +--- +title: BendSQL +--- + +[BendSQL](https://github.com/databendlabs/bendsql) is a command line tool that has been designed specifically for Databend. It allows users to establish a connection with Databend and execute queries directly from a CLI window. + +BendSQL is particularly useful for those who prefer a command line interface and need to work with Databend on a regular basis. With BendSQL, users can easily and efficiently manage their databases, tables, and data, and perform a wide range of queries and operations with ease. + +## Installing BendSQL + +BendSQL offers multiple installation options to suit different platforms and preferences. Choose your preferred method from the sections below or download the installation package from the [BendSQL release page](https://github.com/databendlabs/bendsql/releases) to install it manually. + +### Shell Script + +BendSQL provides a convenient Shell script for installation. You can choose between two options: + +#### Default Installation + +Install BendSQL to the user's home directory (~/.bendsql): + +```bash +curl -fsSL https://repo.databend.com/install/bendsql.sh | bash +``` + +```bash title='Example:' +# highlight-next-line +curl -fsSL https://repo.databend.com/install/bendsql.sh | bash + + B E N D S Q L + Installer + +-------------------------------------------------------------------------------- +Website: https://databend.com +Docs: https://docs.databend.com +Github: https://github.com/databendlabs/bendsql +-------------------------------------------------------------------------------- + +>>> We'll be installing BendSQL via a pre-built archive at https://repo.databend.com/bendsql/v0.22.2/ +>>> Ready to proceed? (y/n) + +>>> Please enter y or n. +>>> y + +-------------------------------------------------------------------------------- + +>>> Downloading BendSQL via https://repo.databend.com/bendsql/v0.22.2/bendsql-aarch64-apple-darwin.tar.gz ✓ +>>> Unpacking archive to /Users/eric/.bendsql ... ✓ +>>> Adding BendSQL path to /Users/eric/.zprofile ✓ +>>> Adding BendSQL path to /Users/eric/.profile ✓ +>>> Install succeeded! 
🚀 +>>> To start BendSQL: + + bendsql --help + +>>> More information at https://github.com/databendlabs/bendsql +``` + +#### Custom Installation with `--prefix` + +Install BendSQL to a specified directory (e.g., /usr/local): + +```bash +curl -fsSL https://repo.databend.com/install/bendsql.sh | bash -s -- -y --prefix /usr/local +``` + +```bash title='Example:' +# highlight-next-line +curl -fsSL https://repo.databend.com/install/bendsql.sh | bash -s -- -y --prefix /usr/local + B E N D S Q L + Installer + +-------------------------------------------------------------------------------- +Website: https://databend.com +Docs: https://docs.databend.com +Github: https://github.com/databendlabs/bendsql +-------------------------------------------------------------------------------- + +>>> Downloading BendSQL via https://repo.databend.com/bendsql/v0.22.2/bendsql-aarch64-apple-darwin.tar.gz ✓ +>>> Unpacking archive to /usr/local ... ✓ +>>> Install succeeded! 🚀 +>>> To start BendSQL: + + bendsql --help + +>>> More information at https://github.com/databendlabs/bendsql +``` + +### Homebrew (for macOS) + +BendSQL can be easily installed on macOS using Homebrew with a simple command: + +```bash +brew install databendcloud/homebrew-tap/bendsql +``` + +### Apt (for Ubuntu/Debian) + +On Ubuntu and Debian systems, BendSQL can be installed via the Apt package manager. Choose the appropriate instructions based on the distribution version. + +#### DEB822-STYLE format (Ubuntu-22.04/Debian-12 and later) + +```bash +sudo curl -L -o /etc/apt/sources.list.d/databend.sources https://repo.databend.com/deb/databend.sources +``` + +#### Old format (Ubuntu-20.04/Debian-11 and earlier) + +```bash +sudo curl -L -o /usr/share/keyrings/databend-keyring.gpg https://repo.databend.com/deb/databend.gpg +sudo curl -L -o /etc/apt/sources.list.d/databend.list https://repo.databend.com/deb/databend.list +``` + +Finally, update the package list and install BendSQL: + +```bash +sudo apt update +sudo apt install bendsql +``` + +### Cargo (Rust Package Manager) + +To install BendSQL using Cargo, utilize the `cargo-binstall` tool or build from source using the provided command. + +:::note +Before installing with Cargo, make sure you have the full Rust toolchain and the `cargo` command installed on your computer. If you don't, follow the installation guide at [https://rustup.rs/](https://rustup.rs/). +::: + +**Using cargo-binstall** + +Please refer to [Cargo B(inary)Install - Installation](https://github.com/cargo-bins/cargo-binstall#installation) to install `cargo-binstall` and enable the `cargo binstall ` subcommand. + +```bash +cargo binstall bendsql +``` + +**Building from Source** + +When building from source, some dependencies may involve compiling C/C++ code. Ensure that you have the GCC/G++ or Clang toolchain installed on your computer. + +```bash +cargo install bendsql +``` + +## User Authentication + +If you are connecting to a self-hosted Databend instance, you can use the admin users specified in the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file, or you can connect using an SQL user created with the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) command. + +For connections to Databend Cloud, you can use the default `cloudapp` user or an SQL user created with the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) command. 
Please note that the user account you use to log in to the [Databend Cloud console](https://app.databend.com) cannot be used for connecting to Databend Cloud. + +## Connecting with BendSQL + +BendSQL allows you to connect to both Databend Cloud and self-hosted Databend instances. + +### Customize Connections with a DSN + +A DSN (Data Source Name) is a simple yet powerful way to configure and manage your Databend connection in BendSQL using a single URI-style string. This method allows you to embed your credentials and connection settings directly into your environment, streamlining the connection process. + +#### DSN Format and Parameters + +```bash title='DSN Format' +databend[+flight]://user[:password]@host[:port]/[database][?sslmode=disable][&arg1=value1] +``` + +| Common DSN Parameters | Description | +|-----------------------|--------------------------------------| +| `tenant` | Tenant ID, Databend Cloud only. | +| `warehouse` | Warehouse name, Databend Cloud only. | +| `sslmode` | Set to `disable` if not using TLS. | +| `tls_ca_file` | Custom root CA certificate path. | +| `connect_timeout` | Connect timeout in seconds. | + +| RestAPI Client Parameters | Description | +|-----------------------------|-------------------------------------------------------------------------------------------------------------------------------| +| `wait_time_secs` | Request wait time for page, default is `1`. | +| `max_rows_in_buffer` | Maximum rows for page buffer. | +| `max_rows_per_page` | Maximum response rows for a single page. | +| `page_request_timeout_secs` | Timeout for a single page request, default is `30`. | +| `presign` | Enable presign for data loading. Options: `auto`, `detect`, `on`, `off`. Default is `auto` (only enabled for Databend Cloud). | + +| FlightSQL Client Parameters | Description | +|-----------------------------|----------------------------------------------------------------------| +| `query_timeout` | Query timeout in seconds. | +| `tcp_nodelay` | Defaults to `true`. | +| `tcp_keepalive` | TCP keepalive in seconds (default is `3600`, set to `0` to disable). | +| `http2_keep_alive_interval` | Keep-alive interval in seconds, default is `300`. | +| `keep_alive_timeout` | Keep-alive timeout in seconds, default is `20`. | +| `keep_alive_while_idle` | Defaults to `true`. | + +#### DSN Examples + +```bash +# Local connection using HTTP API with presign detection +databend://root:@localhost:8000/?sslmode=disable&presign=detect + +# Databend Cloud connection with tenant and warehouse info +databend://user1:password1@tnxxxx--default.gw.aws-us-east-2.default.databend.com:443/benchmark?enable_dphyp=1 + +# Local connection using FlightSQL API +databend+flight://root:@localhost:8900/database1?connect_timeout=10 +``` + +### Connect to Databend Cloud + +The best practice for connecting to Databend Cloud is to obtain your DSN from Databend Cloud and export it as an environment variable. To obtain your DSN: + +1. Log in to Databend Cloud and click **Connect** on the **Overview** page. + +2. Select the database and warehouse you want to connect to. + +3. Your DSN will be automatically generated in the **Examples** section. Below the DSN, you'll find a BendSQL snippet that exports the DSN as an environment variable named `BENDSQL_DSN` and launches BendSQL with the correct configuration. You can copy and paste it directly into your terminal. 
+ + ```bash title='Example' + export BENDSQL_DSN="databend://cloudapp:******@tn3ftqihs.gw.aws-us-east-2.default.databend.com:443/information_schema?warehouse=small-xy2t" + bendsql + ``` + +### Connect to Self-hosted Databend + +You can connect to a self-hosted Databend instance using either BendSQL command-line arguments or a DSN. + +#### Option 1: Use BendSQL Arguments + +```bash +bendsql --host --port --user --password --database +``` + +This example connects to a Databend instance running locally on port `8000` using `eric` as the user: + +```bash title='Example' +bendsql --host 127.0.0.1 --port 8000 --user eric --password abc123 +``` + +#### Option 2: Use a DSN + +You can also define the connection using a DSN and export it as the `BENDSQL_DSN` environment variable: + +```bash title='Example' +export BENDSQL_DSN="databend://eric:abc123@localhost:8000/?sslmode=disable" +bendsql +``` + +## BendSQL Settings + +BendSQL provides a range of settings that allow you to define how query results are presented: + +| Setting | Description | +| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `display_pretty_sql` | When set to `true`, SQL queries will be formatted in a visually appealing manner, making them easier to read and understand. | +| `prompt` | The prompt displayed in the command line interface, typically indicating the user, warehouse, and database being accessed. | +| `progress_color` | Specifies the color used for progress indicators, such as when executing queries that take some time to complete. | +| `show_progress` | When set to `true`, progress indicators will be displayed to show the progress of long-running queries or operations. | +| `show_stats` | If `true`, query statistics such as execution time, rows read, and bytes processed will be displayed after executing each query. | +| `max_display_rows` | Sets the maximum number of rows that will be displayed in the output of a query result. | +| `max_col_width` | Sets the maximum width in characters of each column's display rendering. A value smaller than 3 disables the limit. | +| `max_width` | Sets the maximum width in characters of the entire display output. A value of 0 defaults to the width of the terminal window. | +| `output_format` | Sets the format used to display query results (`table`, `csv`, `tsv`, `null`). | +| `expand` | Controls whether the output of a query is displayed as individual records or in a tabular format. Available values: `on`, `off`, and `auto`. | +| `multi_line` | Determines whether multi-line input for SQL queries is allowed. When set to `true`, queries can span multiple lines for better readability. | +| `replace_newline` | Specifies whether newline characters in the output of query results should be replaced with spaces. This can prevent unintended line breaks in the display. | + +For details of each setting, please refer to the reference information below: + +#### `display_pretty_sql` + +The `display_pretty_sql` setting controls whether SQL queries are displayed in a visually formatted manner or not. When set to `false`, as in the first query below, SQL queries are not formatted for visual appeal. In contrast, when set to `true`, as in the second query, SQL queries are formatted in a visually appealing manner, making them easier to read and understand. 
+ +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set display_pretty_sql false +root@localhost:8000/default> SELECT TO_STRING(ST_ASGEOJSON(ST_GEOMETRYFROMWKT('SRID=4326;LINESTRING(400000 6000000, 401000 6010000)'))) AS pipeline_geojson; +┌─────────────────────────────────────────────────────────────────────────┐ +│ pipeline_geojson │ +│ String │ +├─────────────────────────────────────────────────────────────────────────┤ +│ {"coordinates":[[400000,6000000],[401000,6010000]],"type":"LineString"} │ +└─────────────────────────────────────────────────────────────────────────┘ +1 row read in 0.063 sec. Processed 1 row, 1 B (15.76 rows/s, 15 B/s) + +// highlight-next-line +root@localhost:8000/default> !set display_pretty_sql true +root@localhost:8000/default> SELECT TO_STRING(ST_ASGEOJSON(ST_GEOMETRYFROMWKT('SRID=4326;LINESTRING(400000 6000000, 401000 6010000)'))) AS pipeline_geojson; + +SELECT + TO_STRING( + ST_ASGEOJSON( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) + ) AS pipeline_geojson + +┌─────────────────────────────────────────────────────────────────────────┐ +│ pipeline_geojson │ +│ String │ +├─────────────────────────────────────────────────────────────────────────┤ +│ {"coordinates":[[400000,6000000],[401000,6010000]],"type":"LineString"} │ +└─────────────────────────────────────────────────────────────────────────┘ +1 row read in 0.087 sec. Processed 1 row, 1 B (11.44 rows/s, 11 B/s) +``` + +#### `prompt` + +The `prompt` setting controls the format of the command line interface prompt. In the example below, it was initially set to display the user and warehouse (`{user}@{warehouse}`). After updating it to `{user}@{warehouse}/{database}`, the prompt now includes the user, warehouse, and database. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set prompt {user}@{warehouse} +root@localhost:8000 !configs +Settings { + display_pretty_sql: true, + prompt: "{user}@{warehouse}", + progress_color: "cyan", + show_progress: true, + show_stats: true, + max_display_rows: 40, + max_col_width: 1048576, + max_width: 1048576, + output_format: Table, + quote_style: Necessary, + expand: Off, + time: None, + multi_line: true, + replace_newline: true, +} +// highlight-next-line +root@localhost:8000 !set prompt {user}@{warehouse}/{database} +root@localhost:8000/default +``` + +#### `progress_color` + +The `progress_color` setting controls the color used for progress indicators during query execution. In this example, the color has been set to `blue`: + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set progress_color blue +``` + +#### `show_progress` + +When set to `true`, progress information is displayed during the execution of a query. The progress information includes the number of rows processed, the total number of rows in the query, the processing speed in rows per second, the amount of memory processed, and the processing speed in memory per second. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set show_progress true +root@localhost:8000/default> select * from numbers(1000000000000000); +⠁ [00:00:08] Processing 18.02 million/1 quadrillion (2.21 million rows/s), 137.50 MiB/7.11 PiB (16.88 MiB/s) ░ +``` + +#### `show_stats` + +The `show_stats` setting controls whether query statistics are displayed after executing each query. 
When set to `false`, as the first query in the example below, query statistics are not displayed. In contrast, when set to `true`, as in the second query, query statistics such as execution time, rows read, and bytes processed are displayed after executing each query. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set show_stats false +root@localhost:8000/default> select now(); +┌────────────────────────────┐ +│ now() │ +│ Timestamp │ +├────────────────────────────┤ +│ 2024-04-23 23:27:11.538673 │ +└────────────────────────────┘ +// highlight-next-line +root@localhost:8000/default> !set show_stats true +root@localhost:8000/default> select now(); +┌────────────────────────────┐ +│ now() │ +│ Timestamp │ +├────────────────────────────┤ +│ 2024-04-23 23:49:04.754296 │ +└────────────────────────────┘ +1 row read in 0.045 sec. Processed 1 row, 1 B (22.26 rows/s, 22 B/s) +``` + +#### `max_display_rows` + +The `max_display_rows` setting controls the maximum number of rows displayed in the output of a query result. When set to `5` in the example below, only up to 5 rows are displayed in the query result. The remaining rows are indicated with (5 shown). + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set max_display_rows 5 +root@localhost:8000/default> SELECT * FROM system.configs; +┌──────────────────────────────────────────────────────┐ +│ group │ name │ value │ description │ +│ String │ String │ String │ String │ +├───────────┼──────────────────┼─────────┼─────────────┤ +│ query │ tenant_id │ default │ │ +│ query │ cluster_id │ default │ │ +│ query │ num_cpus │ 0 │ │ +│ · │ · │ · │ · │ +│ · │ · │ · │ · │ +│ · │ · │ · │ · │ +│ storage │ cos.endpoint_url │ │ │ +│ storage │ cos.root │ │ │ +│ 176 rows │ │ │ │ +│ (5 shown) │ │ │ │ +└──────────────────────────────────────────────────────┘ +176 rows read in 0.059 sec. Processed 176 rows, 10.36 KiB (2.98 thousand rows/s, 175.46 KiB/s) +``` + +#### `max_col_width` & `max_width` + +The settings `max_col_width` and `max_width` specify the maximum permitted width in characters for individual columns and the entire display output, respectively. The following example sets column display width to 10 characters and the entire display width to 100 characters: + +```sql title='Example:' +// highlight-next-line +root@localhost:8000/default> .max_col_width 10 +// highlight-next-line +root@localhost:8000/default> .max_width 100 +root@localhost:8000/default> select * from system.settings; +┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ value │ default │ range │ level │ description │ type │ +│ String │ String │ String │ String │ String │ String │ String │ +├────────────┼─────────┼─────────┼──────────┼─────────┼───────────────────────────────────┼────────┤ +│ acquire... │ 15 │ 15 │ None │ DEFAULT │ Sets the maximum timeout in se... │ UInt64 │ +│ aggrega... │ 0 │ 0 │ None │ DEFAULT │ Sets the maximum amount of mem... │ UInt64 │ +│ aggrega... │ 0 │ 0 │ [0, 100] │ DEFAULT │ Sets the maximum memory ratio ... │ UInt64 │ +│ auto_co... │ 50 │ 50 │ None │ DEFAULT │ Threshold for triggering auto ... │ UInt64 │ +│ collation │ utf8 │ utf8 │ ["utf8"] │ DEFAULT │ Sets the character collation. ... │ String │ +│ · │ · │ · │ · │ · │ · │ · │ +│ · │ · │ · │ · │ · │ · │ · │ +│ · │ · │ · │ · │ · │ · │ · │ +│ storage... │ 1048576 │ 1048576 │ None │ DEFAULT │ Sets the byte size of the buff... │ UInt64 │ +│ table_l... 
│ 10 │ 10 │ None │ DEFAULT │ Sets the seconds that the tabl... │ UInt64 │ +│ timezone │ UTC │ UTC │ None │ DEFAULT │ Sets the timezone. │ String │ +│ unquote... │ 0 │ 0 │ None │ DEFAULT │ Determines whether Databend tr... │ UInt64 │ +│ use_par... │ 0 │ 0 │ [0, 1] │ DEFAULT │ This setting is deprecated │ UInt64 │ +│ 96 rows │ │ │ │ │ │ │ +│ (10 shown) │ │ │ │ │ │ │ +└──────────────────────────────────────────────────────────────────────────────────────────────────┘ +96 rows read in 0.040 sec. Processed 96 rows, 16.52 KiB (2.38 thousand rows/s, 410.18 KiB/s) +``` + +#### `output_format` + +By setting the `output_format` to `table`, `csv`, `tsv`, or `null`, you can control the format of the query result. The `table` format presents the result in a tabular format with column headers, while the `csv` and `tsv` formats provide comma-separated values and tab-separated values respectively, and the `null` format suppresses the output formatting altogether. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set output_format table +root@localhost:8000/default> show users; +┌────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ is_configured │ default_role │ disabled │ +│ String │ String │ String │ String │ String │ Boolean │ +├────────┼──────────┼─────────────┼───────────────┼───────────────┼──────────┤ +│ root │ % │ no_password │ YES │ account_admin │ false │ +└────────────────────────────────────────────────────────────────────────────┘ +1 row read in 0.032 sec. Processed 1 row, 113 B (31.02 rows/s, 3.42 KiB/s) + +// highlight-next-line +root@localhost:8000/default> !set output_format csv +root@localhost:8000/default> show users; +root,%,no_password,YES,account_admin,false +1 row read in 0.062 sec. Processed 1 row, 113 B (16.03 rows/s, 1.77 KiB/s) + +// highlight-next-line +root@localhost:8000/default> !set output_format tsv +root@localhost:8000/default> show users; +root % no_password YES account_admin false +1 row read in 0.076 sec. Processed 1 row, 113 B (13.16 rows/s, 1.45 KiB/s) + +// highlight-next-line +root@localhost:8000/default> !set output_format null +root@localhost:8000/default> show users; +1 row read in 0.036 sec. Processed 1 row, 113 B (28.1 rows/s, 3.10 KiB/s) +``` + +#### `expand` + +The `expand` setting controls whether the output of a query is displayed as individual records or in a tabular format. When the `expand` setting is set to `auto`, the system automatically determines how to display the output based on the number of rows returned by the query. If the query returns only one row, the output is displayed as a single record. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set expand on +root@localhost:8000/default> show users; +-[ RECORD 1 ]----------------------------------- + name: root + hostname: % + auth_type: no_password +is_configured: YES + default_role: account_admin + disabled: false + +1 row read in 0.055 sec. 
Processed 1 row, 113 B (18.34 rows/s, 2.02 KiB/s) + +// highlight-next-line +root@localhost:8000/default> !set expand off +root@localhost:8000/default> show users; +┌────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ is_configured │ default_role │ disabled │ +│ String │ String │ String │ String │ String │ Boolean │ +├────────┼──────────┼─────────────┼───────────────┼───────────────┼──────────┤ +│ root │ % │ no_password │ YES │ account_admin │ false │ +└────────────────────────────────────────────────────────────────────────────┘ +1 row read in 0.046 sec. Processed 1 row, 113 B (21.62 rows/s, 2.39 KiB/s) + +// highlight-next-line +root@localhost:8000/default> !set expand auto +root@localhost:8000/default> show users; +-[ RECORD 1 ]----------------------------------- + name: root + hostname: % + auth_type: no_password +is_configured: YES + default_role: account_admin + disabled: false + +1 row read in 0.037 sec. Processed 1 row, 113 B (26.75 rows/s, 2.95 KiB/s) +``` + +#### `multi_line` + +When the `multi_line` setting is set to `true`, allowing input to be entered across multiple lines. As a result, the SQL query is entered with each clause on a separate line for improved readability and organization. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set multi_line true; +root@localhost:8000/default> SELECT * +> FROM system.configs; +┌──────────────────────────────────────────────────────┐ +│ group │ name │ value │ description │ +│ String │ String │ String │ String │ +├───────────┼──────────────────┼─────────┼─────────────┤ +│ query │ tenant_id │ default │ │ +│ query │ cluster_id │ default │ │ +│ query │ num_cpus │ 0 │ │ +│ · │ · │ · │ · │ +│ · │ · │ · │ · │ +│ · │ · │ · │ · │ +│ storage │ cos.endpoint_url │ │ │ +│ storage │ cos.root │ │ │ +│ 176 rows │ │ │ │ +│ (5 shown) │ │ │ │ +└──────────────────────────────────────────────────────┘ +176 rows read in 0.060 sec. Processed 176 rows, 10.36 KiB (2.91 thousand rows/s, 171.39 KiB/s) +``` + +#### `replace_newline` + +The `replace_newline` setting determines whether newline characters (\n) are replaced with the literal string (\\n) in the output. In the example below, the `replace_newline` setting is set to `true`. As a result, when the string 'Hello\nWorld' is selected, the newline character (\n) is replaced with the literal string (\\n). So, instead of displaying the newline character, the output displays 'Hello\nWorld' as 'Hello\\nWorld': + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !set replace_newline true +root@localhost:8000/default> SELECT 'Hello\nWorld' AS message; +┌──────────────┐ +│ message │ +│ String │ +├──────────────┤ +│ Hello\nWorld │ +└──────────────┘ +1 row read in 0.056 sec. Processed 1 row, 1 B (18 rows/s, 17 B/s) + +// highlight-next-line +root@localhost:8000/default> !set replace_newline false; +root@localhost:8000/default> SELECT 'Hello\nWorld' AS message; +┌─────────┐ +│ message │ +│ String │ +├─────────┤ +│ Hello │ +│ World │ +└─────────┘ +1 row read in 0.067 sec. Processed 1 row, 1 B (14.87 rows/s, 14 B/s) +``` + +### Configuring BendSQL Settings + +You have the following options to configure a BendSQL setting: + +- Use the `!set ` command. For more information, see [Utility Commands](#utility-commands). + +- Add and configure a setting in the configuration file `~/.config/bendsql/config.toml`. To do so, open the file and add your setting under the `[settings]` section. 
The following example sets the `max_display_rows` to 10 and `max_width` to 100: + +```toml title='Example:' +... +[settings] +max_display_rows = 10 +max_width = 100 +... +``` + +- Configure a setting at runtime by launching BendSQL and then specifying the setting in the format `. `. Please note that settings configured in this way only take effect in the current session. + +```shell title='Example:' +root@localhost:8000/default> .max_display_rows 10 +root@localhost:8000/default> .max_width 100 +``` + +## Utility Commands + +BendSQL provides users with a variety of commands to streamline their workflow and customize their experience. Here's an overview of the commands available in BendSQL: + +| Command | Description | +| ------------------------ | ---------------------------------- | +| `!exit` | Exits BendSQL. | +| `!quit` | Exits BendSQL. | +| `!configs` | Displays current BendSQL settings. | +| `!set ` | Modifies a BendSQL setting. | +| `!source ` | Executes a SQL file. | + +For examples of each command, please refer to the reference information below: + +#### `!exit` + +Disconnects from Databend and exits BendSQL. + +```shell title='Example:' +➜ ~ bendsql +Welcome to BendSQL 0.17.0-homebrew. +Connecting to localhost:8000 as user root. +Connected to Databend Query v1.2.427-nightly-b1b622d406(rust-1.77.0-nightly-2024-04-20T22:12:35.318382488Z) + +// highlight-next-line +root@localhost:8000/default> !exit +Bye~ +``` + +#### `!quit` + +Disconnects from Databend and exits BendSQL. + +```shell title='Example:' +➜ ~ bendsql +Welcome to BendSQL 0.17.0-homebrew. +Connecting to localhost:8000 as user root. +Connected to Databend Query v1.2.427-nightly-b1b622d406(rust-1.77.0-nightly-2024-04-20T22:12:35.318382488Z) + +// highlight-next-line +root@localhost:8000/default> !quit +Bye~ +➜ ~ +``` + +#### `!configs` + +Displays the current BendSQL settings. + +```shell title='Example:' +// highlight-next-line +root@localhost:8000/default> !configs +Settings { + display_pretty_sql: true, + prompt: "{user}@{warehouse}/{database}> ", + progress_color: "cyan", + show_progress: true, + show_stats: true, + max_display_rows: 40, + max_col_width: 1048576, + max_width: 1048576, + output_format: Table, + quote_style: Necessary, + expand: Off, + time: None, + multi_line: true, + replace_newline: true, +} +``` + +#### `!set ` + +Modifies a BendSQL setting. + +```shell title='Example:' +root@localhost:8000/default> !set display_pretty_sql false +``` + +#### `!source ` + +Executes a SQL file. + +```shell title='Example:' +➜ ~ more ./desktop/test.sql +CREATE TABLE test_table ( + id INT, + name VARCHAR(50) +); + +INSERT INTO test_table (id, name) VALUES (1, 'Alice'); +INSERT INTO test_table (id, name) VALUES (2, 'Bob'); +INSERT INTO test_table (id, name) VALUES (3, 'Charlie'); +➜ ~ bendsql +Welcome to BendSQL 0.17.0-homebrew. +Connecting to localhost:8000 as user root. +Connected to Databend Query v1.2.427-nightly-b1b622d406(rust-1.77.0-nightly-2024-04-20T22:12:35.318382488Z) + +// highlight-next-line +root@localhost:8000/default> !source ./desktop/test.sql +root@localhost:8000/default> SELECT * FROM test_table; + +SELECT + * +FROM + test_table + +┌────────────────────────────────────┐ +│ id │ name │ +│ Nullable(Int32) │ Nullable(String) │ +├─────────────────┼──────────────────┤ +│ 1 │ Alice │ +│ 2 │ Bob │ +│ 3 │ Charlie │ +└────────────────────────────────────┘ +3 rows read in 0.064 sec. 
Processed 3 rows, 81 B (46.79 rows/s, 1.23 KiB/s) +``` diff --git a/tidb-cloud-lake/guides/cluster-key-performance.md b/tidb-cloud-lake/guides/cluster-key-performance.md new file mode 100644 index 0000000000000..8aed1dc06277d --- /dev/null +++ b/tidb-cloud-lake/guides/cluster-key-performance.md @@ -0,0 +1,158 @@ +--- +title: Cluster Key +--- + +# Cluster Key: Automatic Data Organization for Query Acceleration + +Cluster keys provide automatic data organization to dramatically improve query performance on large tables. Databend seamlessly and continually manages all clustering operations in the background - you simply define the cluster key and Databend handles the rest. + +## What Problem Does It Solve? + +Large tables without proper organization create significant performance and maintenance challenges: + +| Problem | Impact | Automatic Clustering Solution | +|---------|--------|------------------------------| +| **Full Table Scans** | Queries read entire tables even for filtered data | Automatically organize data, read only relevant blocks | +| **Random Data Access** | Similar data scattered across storage | Continuously group related data together | +| **Slow Filter Queries** | WHERE clauses scan unnecessary rows | Auto-skip irrelevant blocks entirely | +| **High I/O Costs** | Reading massive amounts of unused data | Minimize data transfer automatically | +| **Manual Maintenance** | Need to monitor and manually re-cluster tables | Zero maintenance - automatic background optimization | +| **Resource Management** | Must allocate compute for clustering operations | Databend handles all clustering resources automatically | + +**Example**: An e-commerce table with millions of products. Without clustering, querying `WHERE category IN ('Electronics', 'Computers')` must scan all product categories. With automatic clustering by category, Databend continuously groups Electronics and Computers products together, scanning only 2 blocks instead of 1000+ blocks. + +## Benefits of Automatic Clustering + +**Ease-of-Maintenance**: Databend eliminates the need for: +- Monitoring the state of clustered tables +- Manually triggering re-clustering operations +- Designating compute resources for clustering +- Scheduling maintenance windows + +**How it Works**: After you define a cluster key, Databend automatically: +- Monitors table changes from DML operations +- Evaluates when tables would benefit from re-clustering +- Performs background clustering optimization +- Maintains optimal data organization continuously + +All you need to do is define a clustering key for each table (if appropriate) and Databend manages all future maintenance automatically. + +## How It Works + +Cluster keys organize data into storage blocks (Parquet files) based on specified columns: + +![Cluster Key Visualization](/img/sql/clustered.png) + +1. **Data Organization** → Similar values grouped into adjacent blocks +2. **Metadata Creation** → Block-to-value mappings stored for fast lookup +3. **Query Optimization** → Only relevant blocks read during queries +4. 
**Performance Boost** → Fewer rows scanned, faster results + +## Quick Setup + +```sql +-- Create table with cluster key +CREATE TABLE sales ( + order_id INT, + order_date TIMESTAMP, + region VARCHAR, + amount DECIMAL +) CLUSTER BY (region); + +-- Or add cluster key to existing table +ALTER TABLE sales CLUSTER BY (region, order_date); +``` + +## Choosing the Right Cluster Key + +Select columns based on your most common query filters: + +| Query Pattern | Recommended Cluster Key | Example | +|---------------|------------------------|---------| +| Filter by single column | That column | `CLUSTER BY (region)` | +| Filter by multiple columns | Multiple columns | `CLUSTER BY (region, category)` | +| Date range queries | Date/timestamp columns | `CLUSTER BY (order_date)` | +| High cardinality columns | Use expressions to reduce values | `CLUSTER BY (DATE(created_at))` | + +### Good vs Bad Cluster Keys + +| ✅ Good Choices | ❌ Poor Choices | +|----------------|----------------| +| Frequently filtered columns | Rarely used columns | +| Medium cardinality (100-10K values) | Boolean columns (too few values) | +| Date/time columns | Unique ID columns (too many values) | +| Region, category, status | Random or hash columns | + +## Monitoring Performance + +```sql +-- Check clustering effectiveness +SELECT * FROM clustering_information('database_name', 'table_name'); + +-- Key metrics to watch: +-- average_depth: Lower is better (< 2 ideal) +-- average_overlaps: Lower is better +-- block_depth_histogram: More blocks at depth 1-2 +``` + +## When to Re-cluster + +Tables become disorganized over time with data changes: + +```sql +-- Check if re-clustering is needed +SELECT IF(average_depth > 2 * LEAST(GREATEST(total_block_count * 0.001, 1), 16), + 'Re-cluster needed', + 'Clustering is good') +FROM clustering_information('your_database', 'your_table'); + +-- Re-cluster the table +ALTER TABLE your_table RECLUSTER; +``` + +## Performance Tuning + +### Custom Block Size +Adjust block size for better performance: + +```sql +-- Smaller blocks = fewer rows per query +ALTER TABLE sales SET OPTIONS( + ROW_PER_BLOCK = 100000, + BLOCK_SIZE_THRESHOLD = 52428800 +); +``` + +### Automatic Re-clustering +- `COPY INTO` and `REPLACE INTO` automatically trigger re-clustering +- Monitor clustering metrics regularly +- Re-cluster when `average_depth` becomes too high + +## Best Practices + +| Practice | Benefit | +|----------|---------| +| **Start Simple** | Use single-column cluster keys first | +| **Monitor Metrics** | Check clustering_information regularly | +| **Test Performance** | Measure query speed before/after clustering | +| **Re-cluster Periodically** | Maintain clustering after data changes | +| **Consider Costs** | Clustering consumes compute resources | + +## Important Notes + +:::tip +**When to Use Cluster Keys:** +- Large tables (millions+ rows) +- Slow query performance +- Frequent filter-based queries +- Analytical workloads + +**When NOT to Use:** +- Small tables +- Random access patterns +- Frequently changing data +::: + +--- + +*Cluster keys are most effective on large, frequently queried tables with predictable filter patterns. 
Start with your most common WHERE clause columns.* diff --git a/tidb-cloud-lake/guides/compliance-security.md b/tidb-cloud-lake/guides/compliance-security.md new file mode 100644 index 0000000000000..9fca2df23a66f --- /dev/null +++ b/tidb-cloud-lake/guides/compliance-security.md @@ -0,0 +1,64 @@ +--- +title: Compliance & Security +--- + +# Databend Security Design + +Databend Cloud is built with security at its core, providing comprehensive protection for your data through multiple security layers, encryption standards, and compliance certifications. + +## Security + +Databend Cloud implements multiple security layers to protect your data and control access to your resources: + +### Access Control + +Databend uses a comprehensive access control system that combines: + +- **Role-Based Access Control (RBAC)**: Manages permissions through roles assigned to users +- **Discretionary Access Control (DAC)**: Allows resource owners to directly grant permissions + +### Data Protection + +**Masking Policy** +Protects sensitive data by controlling how it's displayed to different users, helping you comply with privacy regulations while still allowing authorized access. + +**Network Policy** +Controls which IP addresses can connect to your Databend resources, allowing you to restrict access to specific networks or locations. + +**Password Policy** +Enforces strong passwords with customizable requirements for length, complexity, and rotation to prevent unauthorized access. + +### Secure Connectivity + +**AWS PrivateLink** +Enables private connections between your VPC and Databend Cloud without exposing traffic to the public internet. Currently available on AWS only. + +## Encryption + +### TLS 1.2 + +We provide end-to-end encryption for all communication. All customer data flows are solely over HTTPS. Connections are encrypted using TLS 1.2 from clients through to the Databend API gateway, ensuring: + +- Data confidentiality during transit +- Protection against man-in-the-middle attacks +- Secure client-server communication + +### Storage Encryption + +Databend Enterprise supports server-side encryption in Object Storage Service (OSS). This feature enables you to enhance data security and privacy by activating server-side encryption for data stored in OSS. You can choose the encryption method that best suits your needs: + +- AES-256 encryption +- Customer-managed keys (CMK) +- Hardware security module (HSM) integration options + +## Compliance + +At Databend, we prioritize data security and privacy, and have achieved key compliances that validate our commitment to protecting your data. Our security practices are regularly audited by independent third parties to ensure we meet the highest industry standards. + +### SOC 2 Type II + +We have successfully attained SOC 2 Type II compliance, validated by independent auditors. This certification confirms that our systems adhere to the American Institute of Certified Public Accountants (AICPA) trust service criteria for security, availability, processing integrity, confidentiality, and privacy. We continuously monitor and enhance our operational controls to maintain this standard. + +### GDPR + +Databend adheres to the General Data Protection Regulation (GDPR), the European Union's regulation designed to protect individuals' privacy and personal data. Our compliance includes strict data privacy enforcement, robust encryption, and regular privacy audits to ensure the rights and data privacy of our users across the EU are protected. 
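+
+To make the access and data protection controls described above concrete, here is a minimal, illustrative sketch of how such policies are typically declared in SQL. The policy names, IP range, role check, and password thresholds are hypothetical examples rather than recommended values:
+
+```sql
+-- Limit connections to a trusted network range (values are placeholders)
+CREATE NETWORK POLICY office_only
+    ALLOWED_IP_LIST = ('192.168.10.0/24');
+
+-- Mask a sensitive column for everyone except a privileged role
+CREATE MASKING POLICY email_mask
+    AS (val STRING) RETURNS STRING ->
+    CASE WHEN CURRENT_ROLE() = 'account_admin' THEN val ELSE '*****' END;
+
+-- Enforce password length and rotation
+CREATE PASSWORD POLICY strong_pw
+    PASSWORD_MIN_LENGTH = 12
+    PASSWORD_MAX_AGE_DAYS = 90;
+```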
diff --git a/tidb-cloud-lake/guides/connect-to-databend.md b/tidb-cloud-lake/guides/connect-to-databend.md new file mode 100644 index 0000000000000..8ccf9b6d92e29 --- /dev/null +++ b/tidb-cloud-lake/guides/connect-to-databend.md @@ -0,0 +1,56 @@ +--- +title: Connect to Databend +--- + +Databend supports multiple connection methods to suit different use cases. All options below work with both **Databend Cloud** and **self-hosted Databend**. + +## Quick Selection + +| I want to... | Recommended | +|-------------|-------------| +| Run SQL queries interactively | **BendSQL** (CLI) or **DBeaver** (GUI) | +| Build an application | Language-specific **Driver** | +| Create dashboards & reports | **BI/Visualization Tools** | + +## Connection Strings + +| Deployment | Format | +|------------|--------| +| **Databend Cloud** | `databend://:@.gw..default.databend.com:443/?warehouse=` | +| **Self-Hosted** | `databend://:@:/` | + +:::tip Getting Your Connection String +- **Databend Cloud**: Log in → Click **Connect** → Copy the generated DSN +- **Self-Hosted**: Use your server address with the configured user credentials +::: + +## SQL Clients + +| Tool | Type | Best For | +|------|------|----------| +| [BendSQL](/tidb-cloud-lake/guides/bendsql.md) | CLI | Developers, Scripting, Automation | +| [DBeaver](/tidb-cloud-lake/guides/dbeaver.md) | GUI | Data Analysis, Visual Query Building | + +## Drivers + +| Language | Guide | Use Case | +|----------|-------|----------| +| Go | [Golang Driver](/tidb-cloud-lake/guides/connect-using-golang.md) | Backend services, microservices | +| Python | [Python Connector](/tidb-cloud-lake/guides/connect-using-python.md) | Data science, analytics, ML | +| Node.js | [Node.js Driver](/tidb-cloud-lake/guides/connect-using-node-js.md) | Web applications | +| Java | [JDBC Driver](/tidb-cloud-lake/guides/connect-using-java.md) | Enterprise applications | +| Rust | [Rust Driver](/tidb-cloud-lake/guides/connect-using-rust.md) | System programming | + +## Visualization Tools + +| Tool | Type | +|------|------| +| [Grafana](/tidb-cloud-lake/guides/grafana.md) | Monitoring & Dashboards | +| [Tableau](/tidb-cloud-lake/guides/tableau.md) | Business Intelligence | +| [Superset](/tidb-cloud-lake/guides/superset.md) | Data Exploration | +| [Metabase](/tidb-cloud-lake/guides/metabase.md) | Self-Service BI | +| [Jupyter](/tidb-cloud-lake/guides/jupyter-notebook.md) | Data Science Notebooks | +| [Deepnote](/tidb-cloud-lake/guides/deepnote.md) | Collaborative Notebooks | +| [MindsDB](/tidb-cloud-lake/guides/mindsdb.md) | ML Platform | +| [Redash](/tidb-cloud-lake/guides/redash.md) | SQL-Based Dashboards | + diff --git a/tidb-cloud-lake/guides/connect-using-golang.md b/tidb-cloud-lake/guides/connect-using-golang.md new file mode 100644 index 0000000000000..063b3b97b5b7f --- /dev/null +++ b/tidb-cloud-lake/guides/connect-using-golang.md @@ -0,0 +1,8 @@ +--- +title: 'Connect to Databend Using Golang' +sidebar_label: 'Golang' +--- + +import ComponentContent from '../../../developer/00-drivers/00-golang.md'; + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/connect-using-java.md b/tidb-cloud-lake/guides/connect-using-java.md new file mode 100644 index 0000000000000..acf125eebea8c --- /dev/null +++ b/tidb-cloud-lake/guides/connect-using-java.md @@ -0,0 +1,8 @@ +--- +title: 'Connect to Databend Using Java' +sidebar_label: 'Java' +--- + +import ComponentContent from '../../../developer/00-drivers/03-jdbc.md'; + + \ No newline at end of file diff --git 
a/tidb-cloud-lake/guides/connect-using-node-js.md b/tidb-cloud-lake/guides/connect-using-node-js.md new file mode 100644 index 0000000000000..4c3fb81965b76 --- /dev/null +++ b/tidb-cloud-lake/guides/connect-using-node-js.md @@ -0,0 +1,8 @@ +--- +title: 'Connect to Databend Using Node.js' +sidebar_label: 'Node.js' +--- + +import ComponentContent from '../../../developer/00-drivers/02-nodejs.md'; + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/connect-using-python.md b/tidb-cloud-lake/guides/connect-using-python.md new file mode 100644 index 0000000000000..16304a8c1ae14 --- /dev/null +++ b/tidb-cloud-lake/guides/connect-using-python.md @@ -0,0 +1,8 @@ +--- +title: 'Connect to Databend Using Python' +sidebar_label: 'Python' +--- + +import ComponentContent from '../../../developer/00-drivers/01-python.md'; + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/connect-using-rust.md b/tidb-cloud-lake/guides/connect-using-rust.md new file mode 100644 index 0000000000000..385f5d9e23106 --- /dev/null +++ b/tidb-cloud-lake/guides/connect-using-rust.md @@ -0,0 +1,8 @@ +--- +title: 'Connect to Databend Using Rust' +sidebar_label: 'Rust' +--- + +import ComponentContent from '../../../developer/00-drivers/04-rust.md'; + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/connecting-to-databend-cloud-with-aws-privatelink.md b/tidb-cloud-lake/guides/connecting-to-databend-cloud-with-aws-privatelink.md new file mode 100644 index 0000000000000..83a8c15bba911 --- /dev/null +++ b/tidb-cloud-lake/guides/connecting-to-databend-cloud-with-aws-privatelink.md @@ -0,0 +1,85 @@ +--- +title: "Connecting to Databend Cloud with AWS PrivateLink" +sidebar_label: "AWS PrivateLink" +--- + +# Connecting to Databend Cloud with AWS PrivateLink + +PrivateLink-style private endpoints offered by major clouds (AWS PrivateLink, Azure Private Link, Google Private Service Connect, etc.) let you reach Databend Cloud through private IP addresses inside your own network boundary, so no traffic has to traverse the public internet. That keeps your datasets, credentials, and admin actions on the provider's backbone and aligned with the network policies you already operate. + +## Benefits + +- Network isolation: traffic never leaves your VPC/VPN boundary, removing exposure to public endpoints. +- Compliance ready: easier to satisfy internal audits and industry requirements that forbid internet egress. +- Stable performance: traffic follows the cloud provider backbone instead of unpredictable internet routes. +- Simplified controls: reuse your existing security groups, route tables, and monitoring to govern access. + +## How it works + +After Databend Cloud approves the cloud account or project you plan to connect, you create a private endpoint that points to the Databend PrivateLink service for your region. The cloud provider automatically allocates private IP addresses and, once private DNS is enabled, your Databend Cloud domains resolve to those addresses so every session stays on the secure, private path. + +## How to setup AWS PrivateLink + +1. Provide the AWS account ID you are planning to connect to Databend Cloud: + + For example: `123456789012` + +2. Verify your VPC settings + + ![VPC Settings](/img/cloud/privatelink/aws/vpc-settings.png) + + Ensure `Enable DNS resolution` and `Enable DNS hostnames` are checked. + +3. 
Wait for cloud admin adding your account to whitelist, and get a service name for the cluster to connect to: + + For example: `com.amazonaws.vpce.us-east-2.vpce-svc-0123456789abcdef0` + +4. Prepare a security group with tcp 443 port open: + + ![Security Group](/img/cloud/privatelink/aws/security-group.png) + +5. Goto AWS Console: + + https://us-east-2.console.aws.amazon.com/vpcconsole/home?region=us-east-2#Endpoints: + + Click `Create endpoint`: + + ![Create Endpoint Button](/img/cloud/privatelink/aws/create-endpoint-1.png) + + ![Create Endpoint Sheet](/img/cloud/privatelink/aws/create-endpoint-2.png) + + Select the previously created security group `HTTPS` + + ![Create Endpoint SG](/img/cloud/privatelink/aws/create-endpoint-3.png) + + ![Create Endpoint Done](/img/cloud/privatelink/aws/create-endpoint-4.png) + +6. Wait for cloud admin approving your connect request: + + ![Request](/img/cloud/privatelink/aws/request.png) + +7. Wait for the PrivateLink creation: + + ![Creation](/img/cloud/privatelink/aws/creation.png) + +8. Modify private DNS name setting: + + ![DNS Menu](/img/cloud/privatelink/aws/dns-1.png) + + Enable private DNS names: + + ![DNS Sheet](/img/cloud/privatelink/aws/dns-2.png) + + Wait for changes to apply. + +9. Verify accessing Databend Cloud via PrivateLink: + + ![Verify DNS](/img/cloud/privatelink/aws/verify-1.png) + + ![Verify Response](/img/cloud/privatelink/aws/verify-2.png) + + Gateway domain is resolved to VPC internal IP address. + +:::info +Congratulations! You have successfully connected to Databend Cloud with AWS PrivateLink. +::: diff --git a/tidb-cloud-lake/guides/continuous-data-pipelines.md b/tidb-cloud-lake/guides/continuous-data-pipelines.md new file mode 100644 index 0000000000000..66ff3c994e6de --- /dev/null +++ b/tidb-cloud-lake/guides/continuous-data-pipelines.md @@ -0,0 +1,25 @@ +--- +title: Continuous Data Pipelines +--- + +Build end-to-end change data capture (CDC) flows in Databend with two primitives: + +- **Streams** capture every INSERT/UPDATE/DELETE until you consume them. +- **Tasks** run SQL on a schedule or when a stream reports new rows. + +## Quick Navigation + +- [Example 1: Append-Only Stream Copy](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md#example-1-append-only-stream-copy) – capture inserts and consume them into another table. +- [Example 2: Standard Stream Updates](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md#example-2-standard-stream-updates) – see how updates/deletes appear and why only one consumer can drain a stream. +- [Example 3: Incremental Stream Metrics](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md#example-3-incremental-stream-metrics) – join multiple streams with `WITH CONSUME` to compute deltas batch by batch. +- [Example 1: Scheduled Copy Task](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md#example-1-scheduled-copy) – generate and load files with two recurring tasks. +- [Example 2: Stream-Triggered Merge](/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md#example-2-stream-triggered-merge) – fire a task only when `STREAM_STATUS` is true. + +## Why CDC in Databend + +- **Lightweight** – streams keep the latest change set without duplicating full tables. +- **Transactional** – stream consumption succeeds or rolls back with your SQL statement. +- **Incremental** – rerun the same query with `WITH CONSUME` to process only new rows. +- **Schedulable** – tasks let you automate the copy, merge, or alert logic you already expressed in SQL. 
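+
+The stream-plus-task pattern above can be sketched end to end in a few statements. The following is a minimal, illustrative example only — the warehouse, table, stream, and task names (`my_wh`, `raw_events`, `events_clean`, `raw_events_stream`, `consume_events`) are placeholders rather than objects that exist in your account:
+
+```sql
+-- Source table, a target table, and a stream that records every change to the source
+CREATE TABLE raw_events (id INT, payload VARCHAR);
+CREATE TABLE events_clean (id INT, payload VARCHAR);
+CREATE STREAM raw_events_stream ON TABLE raw_events;
+
+-- A task that wakes up every minute but only runs when the stream holds unconsumed rows
+CREATE TASK consume_events
+  WAREHOUSE = 'my_wh'
+  SCHEDULE = 1 MINUTE
+  WHEN STREAM_STATUS('default.raw_events_stream') = TRUE
+  AS
+  INSERT INTO events_clean SELECT id, payload FROM raw_events_stream;
+
+-- Newly created tasks typically start suspended; resume it to start the pipeline
+ALTER TASK consume_events RESUME;
+```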
+ +Dive into the stream examples first, then combine them with tasks to automate your pipeline. diff --git a/tidb-cloud-lake/guides/dashboards.md b/tidb-cloud-lake/guides/dashboards.md new file mode 100644 index 0000000000000..2d70e2fafc2bd --- /dev/null +++ b/tidb-cloud-lake/guides/dashboards.md @@ -0,0 +1,64 @@ +--- +title: Dashboards +--- +import StepsWrap from '@site/src/components/StepsWrap'; +import StepContent from '@site/src/components/Steps/step-content'; +import EllipsisSVG from '@site/static/img/icon/ellipsis.svg'; + +Dashboards are employed to present query results through a variety of chart types, including **scorecards**, **pie charts**, **bar charts**, and **line charts**. These charts are generated from the query results. You have the option to create a chart based on the query result after executing a query in a worksheet. Refreshing a dashboard allows you to re-execute the queries corresponding to the charts, thereby updating the charts with the latest results. + +![Alt text](@site/static/img/documents/dashboard/dashboard.png) + +## Creating a Dashboard + +In Databend Cloud, you can create multiple dashboards as needed. A dashboard can contain one or multiple charts. Each individual chart corresponds to a specific query result, yet it can be integrated into multiple dashboards. + +**To create a dashboard**: + +1. In a worksheet, run a query for which you intend to generate a chart using the query result. + +2. In the result area, click on the **Chart** tab. + +![Alt text](@site/static/img/documents/dashboard/chart-btn.png) + +3. On the **Chart** tab, choose a chart type from the dropdown menu on the right. Next, specify the data and customize the chart's appearance using the options found on the **Data** and **Style** tabs below the dropdown list. + +Please note that these aggregation functions assist in summarizing and revealing valuable patterns from the raw data in query results. The available functions for aggregation vary based on the distinct data types and the chart types you select. + + +| Function | Description | +|----------------------|----------------------------------------------------------------| +| None | No alteration is applied to the data. | +| Count | Calculates the number of records for the field in the query results (except the records containing NULL and '' values). | +| Min | Computes the minimum value within the query results. | +| Max | Computes the maximum value within the query results. | +| Median | Calculates the median value within the query results. | +| Sum | Calculates the sum of numerical values within the query results. | +| Average | Computes the average value of numerical data within the query results. | +| Mode | Identifies the most frequently occurring value within the query results. | + +4. Return to the Databend Cloud homepage and select **Dashboards** in the left navigation menu, then click **New Dashboard**. + +5. In the new dashboard, click on **Add Chart**. Drag and drop the chart from the left pane onto the dashboard. If you have multiple charts available in the left pane, feel free to drag as many as you need. + +:::note +After generating a chart from the query results in a worksheet, please avoid running other queries in the same worksheet, as doing so might result in the chart becoming unavailable on the dashboard. +::: + +## Sharing a Dashboard + +You can share your dashboards with everyone in your organization or specific individuals. 
To do so, click the ellipse button on the dashboard you want to share, then select **Share**. + +![alt text](@site/static/img/documents/dashboard/dashboard-share.png) + +When sharing a dashboard, you can choose one of the following permission levels to control how others can access it: + +- **Read Only**: View the dashboard but cannot make changes or run queries to retrieve the latest results. +- **Execute**: Run queries to retrieve the latest results or interact with the dashboard without modifying it. +- **Edit**: Modify the dashboard, including changing queries and how the dashboard reflects the results. + +To view the dashboards shared with you by others, click **Dashboards** in the sidebar, then click the **Shared with Me** tab on the right. + +## Tutorials + +- [Dashboarding COVID-19 Data](/tutorials/cloud-ops/dashboard) diff --git a/tidb-cloud-lake/guides/data-integration-overview.md b/tidb-cloud-lake/guides/data-integration-overview.md new file mode 100644 index 0000000000000..c1a4e59236bf1 --- /dev/null +++ b/tidb-cloud-lake/guides/data-integration-overview.md @@ -0,0 +1,78 @@ +--- +title: Data Integration +--- + +# Data Integration Overview + +The Data Integration feature in Databend Cloud enables you to load data from external sources into Databend through a visual, no-code interface. You can create data sources, configure integration tasks, and monitor synchronization — all from the Databend Cloud console. + +## Supported Data Sources + +| Data Source | Description | +| -------------------- | ---------------------------------------------------------------------------------------- | +| [MySQL](/tidb-cloud-lake/guides/integrate-with-mysql.md) | Sync data from MySQL databases with support for Snapshot, CDC, and Snapshot + CDC modes. | +| [Amazon S3](/tidb-cloud-lake/guides/integrate-with-amazon-s3.md) | Import files from Amazon S3 buckets with support for CSV, Parquet, and NDJSON formats. | + +## Key Concepts + +### Data Source + +A data source represents a connection to an external system. It stores the credentials and connection details needed to access the source data. Once configured, a data source can be reused across multiple integration tasks. + +Databend Cloud currently supports two types of data sources: + +- **MySQL - Credentials**: Connection to a MySQL database (host, port, username, password, database). +- **AWS - Credentials**: Connection to Amazon S3 (Access Key and Secret Key). + +### Integration Task + +An integration task defines how data flows from a source to a target table in Databend. Each task specifies the source configuration, target warehouse and table, and operational parameters specific to the data source type. + +## Managing Data Sources + +![Data Sources Overview](/img/cloud/dataintegration/databendcloud-dataintegration-datasource-overview.png) + +To manage data sources, navigate to **Data** > **Data Sources** from the left sidebar. From this page you can: + +- View all configured data sources +- Create new data sources +- Edit or delete existing data sources +- Test connectivity to verify credentials + +:::tip +It is recommended to always test the connection before saving a data source. This helps catch common issues such as incorrect credentials or network restrictions early. +::: + +## Managing Tasks + +### Starting and Stopping Tasks + +After creation, a task is in a **Stopped** state. To begin data synchronization, click the **Start** button on the task. 
+ +![Task List](/img/cloud/dataintegration/dataintegration-task-list-with-action-button.png) + +To stop a running task, click the **Stop** button. The task will gracefully shut down and save its progress. + +### Task Status + +The Data Integration page displays all tasks with their current status: + +| Status | Description | +| ------- | ----------------------------- | +| Running | Task is actively syncing data | +| Stopped | Task is not running | +| Failed | Task encountered an error | + +### Viewing Run History + +Click on a task to view its execution history. The run history includes: + +- Execution start and end times +- Number of rows synced +- Error details (if any) + +![Run History](/img/cloud/dataintegration/dataintegration-run-history-page.png) + +## Video Tour + + diff --git a/tidb-cloud-lake/guides/data-lifecycle.md b/tidb-cloud-lake/guides/data-lifecycle.md new file mode 100644 index 0000000000000..56e181da41da9 --- /dev/null +++ b/tidb-cloud-lake/guides/data-lifecycle.md @@ -0,0 +1,76 @@ +--- +title: Data Lifecycle in Databend +sidebar_label: Data Lifecycle +--- + +Databend supports familiar Data Definition Language (DDL) and Data Manipulation Language (DML) commands, making it easy for you to manage your database. Whether you're organizing, storing, querying, modifying, or deleting data, Databend follows the same industry standards you're accustomed to. + +## Databend Objects + +Databend supports the following objects to create and modify them: + +- Database +- Table +- External Table +- Stream +- View +- Index +- Stage +- File Format +- Connection +- User Defined Function (UDF) +- External Function +- User +- Role +- Grants +- Warehouse +- Task + +## Organizing Data + +Arrange your data in databases and tables. + +Key Commands: + +- [`CREATE DATABASE`](/tidb-cloud-lake/sql/create-database.md): To create a new database. +- [`ALTER DATABASE`](/tidb-cloud-lake/sql/rename-database.md): To modify an existing database. +- [`CREATE TABLE`](/tidb-cloud-lake/sql/create-table.md): To create a new table. +- [`ALTER TABLE`](/tidb-cloud-lake/sql/alter-table.md): To modify an existing table. + +## Storing Data + +Directly add data to your tables. Databend also allows importing data from external files into its tables. + +Key Commands: + +- [`INSERT`](/tidb-cloud-lake/sql/insert.md): To add data to a table. +- [`COPY INTO
`](/tidb-cloud-lake/sql/copy-into-table.md): To bring in data from an external file. + +## Querying Data + +After your data is in the tables, use `SELECT` to look at and analyze it. + +Key Command: + +- [`SELECT`](/tidb-cloud-lake/sql/select.md): To get data from a table. + +## Working with Data + +Once your data is in Databend, you can update, replace, merge, or delete it as needed. + +Key Commands: + +- [`UPDATE`](/tidb-cloud-lake/sql/update.md): To change data in a table. +- [`REPLACE`](/tidb-cloud-lake/sql/replace.md): To replace existing data. +- [`MERGE`](/tidb-cloud-lake/sql/merge.md): To seamlessly insert, update, and delete by comparing data between main and source tables or subqueries. +- [`DELETE`](/tidb-cloud-lake/sql/delete.md): To remove data from a table. + +## Removing Data + +Databend allows you to delete specific data or entire tables and databases. + +Key Commands: + +- [`TRUNCATE TABLE`](/tidb-cloud-lake/sql/truncate-table.md): To clear a table without deleting its structure. +- [`DROP TABLE`](/tidb-cloud-lake/sql/drop-table.md): To remove a table. +- [`DROP DATABASE`](/tidb-cloud-lake/sql/drop-database.md): To delete a database. diff --git a/tidb-cloud-lake/guides/data-management.md b/tidb-cloud-lake/guides/data-management.md new file mode 100644 index 0000000000000..8108e2274d041 --- /dev/null +++ b/tidb-cloud-lake/guides/data-management.md @@ -0,0 +1,12 @@ +--- +title: Data Management +--- + +# Data Management + +| Category | Description | Key Features | Common Operations | +|----------|-------------|--------------|------------------| +| **[Data Lifecycle](/tidb-cloud-lake/guides/data-lifecycle.md)** | Create and manage objects | • Database & Table
• External Tables<br/>• Streams & Views<br/>• Indexes & Stages | • CREATE/DROP/ALTER<br/>• SHOW TABLES<br/>• DESCRIBE TABLE |
+| **[Data Recovery](/tidb-cloud-lake/guides/data-recovery.md)** | Access and restore past data | • Time Travel<br/>• Flashback Tables<br/>• Backup & Restore<br/>• AT & UNDROP | • SELECT ... AT<br/>• FLASHBACK TABLE<br/>• BENDSAVE BACKUP |
+| **[Data Protection](/tidb-cloud-lake/guides/data-protection.md)** | Secure access and prevent loss | • Network Policies<br/>• Access Control<br/>• Time Travel & Fail-safe<br/>• Data Encryption | • NETWORK POLICY<br/>• GRANT/REVOKE<br/>• USER/ROLE |
+| **[Data Recycle](/tidb-cloud-lake/guides/data-purge-and-recycle.md)** | Free up storage space | • VACUUM Commands<br/>• Retention Policies<br/>• Orphan File Cleanup<br/>• Temporary File Management | • VACUUM TABLE<br/>• VACUUM DROP TABLE<br/>
• DATA_RETENTION_TIME | diff --git a/tidb-cloud-lake/guides/data-protection.md b/tidb-cloud-lake/guides/data-protection.md new file mode 100644 index 0000000000000..f74ccf6f43e6d --- /dev/null +++ b/tidb-cloud-lake/guides/data-protection.md @@ -0,0 +1,18 @@ +--- +title: Data Protection in Databend Cloud +sidebar_label: Data Protection +--- + +Databend Cloud's Continuous Data Protection (CDP) offers easy-to-use features to keep your data safe from mistakes, harmful actions, and software problems. It makes sure your data can always be gotten back, even if it's changed, lost, or damaged by accident or on purpose. + +### CDP Features in Databend Cloud +- [Network Policies](/tidb-cloud-lake/guides/network-policy.md) + - Set who can access Databend Cloud based on their internet address. Helps keep your data safe. + +- [Access Control](/tidb-cloud-lake/guides/access-control.md) + - Decide who can see or use different parts of Databend Cloud. Keeps things organized and secure. + +- [Time Travel & Fail-safe](/tidb-cloud-lake/guides/data-recovery.md) + - Get back old or lost data. + - Time Travel lets you look at and bring back past data. + - Fail-safe is for big emergencies, used by Databend Cloud to recover data when there's a serious problem. diff --git a/tidb-cloud-lake/guides/data-purge-and-recycle.md b/tidb-cloud-lake/guides/data-purge-and-recycle.md new file mode 100644 index 0000000000000..fbc766969a99b --- /dev/null +++ b/tidb-cloud-lake/guides/data-purge-and-recycle.md @@ -0,0 +1,133 @@ +--- +title: Data Purge and Recycle +sidebar_label: Data Recycle +--- + +## Overview + +In Databend, data is not immediately deleted when you run `DROP`, `TRUNCATE`, or `DELETE` commands. This enables Databend's time travel feature, allowing you to access previous states of your data. However, this approach means that storage space is not automatically freed up after these operations. + +``` +Before DELETE: After DELETE: After VACUUM: ++----------------+ +----------------+ +----------------+ +| Current Data | | New Version | | Current Data | +| | | (After DELETE) | | (After DELETE) | ++----------------+ +----------------+ +----------------+ +| Historical Data| | Historical Data| | | +| (Time Travel) | | (Original Data)| | | ++----------------+ +----------------+ +----------------+ + Storage not freed Storage freed +``` + +## VACUUM Commands and Cleanup Scope + +Databend provides three VACUUM commands with **different cleanup scopes**. Understanding what each command cleans is crucial for data management. + +``` +VACUUM DROP TABLE +├── Target: Dropped tables (after DROP TABLE command) +├── S3 Storage: ✅ Removes ALL data (files, segments, blocks, indexes, statistics) +├── Meta Service: ✅ Removes ALL metadata (schema, permissions, records) +└── Result: Complete table removal - CANNOT be recovered + +VACUUM TABLE +├── Target: Historical data and orphan files for active tables +├── S3 Storage: ✅ Removes old snapshots, orphan segments/blocks, indexes/stats +├── Meta Service: ❌ Preserves table structure and current metadata +└── Result: Table stays active, only history cleaned + +VACUUM TEMPORARY FILES +├── Target: Temporary spill files from queries (joins, sorts, aggregates) +├── S3 Storage: ✅ Removes temp files from crashed/interrupted queries +├── Meta Service: ❌ No metadata (temp files don't have any) +└── Result: Storage cleanup only, rarely needed +``` + +--- + +> **🚨 Critical**: Only `VACUUM DROP TABLE` affects the meta service. Other commands only clean storage files. 
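+
+One practical consequence of the scopes above is worth sketching before the command reference: once `VACUUM TABLE` has removed historical snapshots, Time Travel can no longer reach them. The sequence below is a minimal, illustrative example — the table name and the one-hour offset are placeholders:
+
+```sql
+-- Time Travel still works while the historical snapshots exist
+SELECT COUNT(*) FROM my_table AT (OFFSET => -3600);   -- table state about one hour ago
+
+-- Preview what would be removed, then reclaim the storage
+VACUUM TABLE my_table DRY RUN SUMMARY;
+VACUUM TABLE my_table;
+
+-- After the vacuum, only history within DATA_RETENTION_TIME_IN_DAYS remains reachable;
+-- anything older is physically gone and cannot be queried with AT
+```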
+ +## Using VACUUM Commands + +The VACUUM command family is the primary method for cleaning data in Databend. + +### VACUUM DROP TABLE + +Permanently removes dropped tables from both storage and metadata. + +```sql +VACUUM DROP TABLE [FROM ] [DRY RUN [SUMMARY]] [LIMIT ]; +``` + +**Options:** +- `FROM `: Restrict to a specific database +- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them +- `LIMIT `: Limit the number of files to be vacuumed + +**Examples:** + +```sql +-- Preview files that would be removed +VACUUM DROP TABLE DRY RUN; + +-- Preview summary of files that would be removed +VACUUM DROP TABLE DRY RUN SUMMARY; + +-- Remove dropped tables from the "default" database +VACUUM DROP TABLE FROM default; + +-- Remove up to 1000 files from dropped tables +VACUUM DROP TABLE LIMIT 1000; +``` + +### VACUUM TABLE + +Removes historical data and orphan files for active tables (storage-only cleanup). + +```sql +VACUUM TABLE [DRY RUN [SUMMARY]]; +``` + +**Options:** +- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them + +**Examples:** + +```sql +-- Preview files that would be removed +VACUUM TABLE my_table DRY RUN; + +-- Preview summary of files that would be removed +VACUUM TABLE my_table DRY RUN SUMMARY; + +-- Remove historical data from my_table +VACUUM TABLE my_table; +``` + +### VACUUM TEMPORARY FILES + +Removes temporary spill files created during query execution. + +```sql +VACUUM TEMPORARY FILES; +``` + +> **Note**: Rarely needed during normal operation since Databend automatically handles cleanup. Manual cleanup is typically only required when Databend crashes during query execution. + +## Adjusting Data Retention Time + +The VACUUM commands remove data files older than the `DATA_RETENTION_TIME_IN_DAYS` setting. By default, Databend retains historical data for 1 day (24 hours). You can adjust this setting: + +```sql +-- Change retention period to 2 days +SET GLOBAL DATA_RETENTION_TIME_IN_DAYS = 2; + +-- Check current retention setting +SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS'; +``` + +| Edition | Default Retention | Maximum Retention | +| ---------------------------------------- | ----------------- | ---------------- | +| Databend Community & Enterprise Editions | 1 day (24 hours) | 90 days | +| Databend Cloud (Personal) | 1 day (24 hours) | 1 day (24 hours) | +| Databend Cloud (Business) | 1 day (24 hours) | 90 days | diff --git a/tidb-cloud-lake/guides/data-recovery.md b/tidb-cloud-lake/guides/data-recovery.md new file mode 100644 index 0000000000000..a91d84b969d52 --- /dev/null +++ b/tidb-cloud-lake/guides/data-recovery.md @@ -0,0 +1,142 @@ +--- +title: Data Recovery +--- +import EEFeature from '@site/src/components/EEFeature'; + + + +This topic explains how to back up and restore data in Databend. + +## Time Travel: Easy Access to Past Data + +With Databend Time Travel, you can revisit and retrieve data from the past, even if it's been altered or removed. It's perfect for: + +- **Getting Back Deleted Data:** Helps you get back important things like tables, databases that were deleted, whether by accident or on purpose. + +- **Copying and Saving Past Data:** Lets you copy and save important data from earlier times. + +- **Looking at Past Data Use:** Makes it easier to see how data was used or changed at certain times. + +### Main Uses of Time Travel + +- **Access Past Data**: Look at data from the past, even if it has been changed or deleted. 
+- **Recover Lost Data**: Bring back tables and databases that were deleted using the [FLASHBACK TABLE](/tidb-cloud-lake/sql/flashback-table.md) command. + +### Time Travel SQL Extensions + +- **SQL Extensions for Time Travel:** Use special SQL clauses like [`AT`](/tidb-cloud-lake/sql/at.md) in SELECT statements and CREATE commands to specify the exact point in history you want to access. +- **Revive Deleted Data:** Use the `UNDROP` command for tables, databases. + +### Setting the Data Retention Period + +- **Personal Edition**: Choose between no retention (0 days) or the default of **1 day**. +- **Business Edition and Higher**: + - For temporary data: Set to 0 or the default of **1 day**. + - For permanent data: Choose any period from **0 to 90 days**. + +:::info Note + +Setting a retention period of 0 days means Time Travel won't be available for that data. + +::: + +### Adjusting Data Retention Time + +Change the data keeping time with the `DATA_RETENTION_TIME_IN_DAYS` setting, which is usually 1 day. This decides how long to keep old data. + +## Fail-safe: Extra Protection for Your Data + +Fail-safe in Databend Cloud is an additional safety feature, different from Time Travel. It's designed to protect your data in case of system issues or security incidents. + +### How Fail-safe Works + +Fail-safe offers a fixed 7-day recovery window after the Time Travel period ends. + +Fail-safe includes: + +- **MetaData Recovery:** Uses versioning in the meta-service to recover deleted tables. +- **Data Recovery:** Uses AWS S3's versioning to save data that's been changed or deleted. + +:::caution Attention + +- Fail-safe is an emergency service, not user-configurable, provided by Databend Cloud. +- It should be used only after other recovery methods don't work. +- Not intended for regular historical data access beyond the Time Travel period. +- For restoring data after big problems, and you can't set it up yourself. +- Recovery times can vary from a few hours to several days, depending on the situation. + +::: + + +## BendSave + +BendSave is a command-line tool for backing up and restoring both metadata and actual data files in Databend. It stores backups in S3-compatible object storage, making it ideal for disaster recovery. + +### Downloading BendSave + +The BendSave binary is distributed as part of the [Databend release packages](https://github.com/databendlabs/databend/releases). + +To download: + +1. Go to the latest [Databend Releases](https://github.com/databendlabs/databend/releases). + +2. Select the release that matches your currently running `databend-query` version. + +3. Download and extract the release package. + +4. Inside the extracted archive, locate the **bin** directory and find the **databend-bendsave** binary. + +### Command Reference + +To back up the metadata of a Databend cluster: + +```bash +databend-bendsave backup \ + --from \ + --to +``` + +| Parameter | Description | +|-----------|-------------------------------------------------------------------------| +| from | Path to the `databend-query.toml` configuration file. | +| to | Backup destination, e.g.,`s3://backup?endpoint=http://127.0.0.1:9900&access_key_id=xxx&secret_access_key=xxx`.
- It is recommended to use environment variables such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to provide credentials.| + +To restore the metadata to a Databend cluster: + +```bash +databend-bendsave restore \ + --from \ + --to-query \ + --to-meta \ + --confirm +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------| +| from | Backup source path. | +| to-query | Path to the restored `databend-query.toml` configuration file. | +| to-meta | Path to the restored `databend-meta.toml` configuration file. | +| confirm | Required flag to confirm restoration and avoid accidental overwrites. | + +#### Examples + +```bash +export AWS_ACCESS_KEY_ID=minioadmin +export AWS_SECRET_ACCESS_KEY=minioadmin + +# Backup +./databend-bendsave backup \ + --from ../configs/databend-query.toml \ + --to 's3://backupbucket?endpoint=http://127.0.0.1:9000/®ion=us-east-1' + +# Restore +./databend-bendsave restore \ + --from "s3://backupbucket?endpoint=http://127.0.0.1:9000/®ion=us-east-1" \ + --to-query ../configs/databend-query.toml \ + --to-meta ../configs/databend-meta.toml \ + --confirm +``` + +### Tutorials + +- [Backing Up and Restoring Data with BendSave](/tidb-cloud-lake/tutorials/backup-restore-with-bendsave.md) diff --git a/tidb-cloud-lake/guides/dbeaver.md b/tidb-cloud-lake/guides/dbeaver.md new file mode 100644 index 0000000000000..92e75614316fb --- /dev/null +++ b/tidb-cloud-lake/guides/dbeaver.md @@ -0,0 +1,107 @@ +--- +title: DBeaver +--- + +import StepsWrap from '@site/src/components/StepsWrap'; +import StepContent from '@site/src/components/Steps/step-content'; + +[DBeaver](https://dbeaver.com/) supports connecting to Databend using a built-in driver categorized under **Analytical**, available starting from **version 24.3.1**. + +![](@site/static/img/connect/dbeaver.png) + +## Prerequisites + +- DBeaver 24.3.1 or later version installed +- For self-hosted Databend: [Docker](https://www.docker.com/) installed (if using Docker deployment) + +## User Authentication + +If you are connecting to a self-hosted Databend instance, you can use the admin users specified in the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file, or you can connect using an SQL user created with the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) command. + +For connections to Databend Cloud, you can use the default `cloudapp` user or an SQL user created with the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) command. Please note that the user account you use to log in to the [Databend Cloud console](https://app.databend.com) cannot be used for connecting to Databend Cloud. + +## Connecting to Self-Hosted Databend + + + + +### Start Databend (Docker) + +Run the following command to launch a Databend instance: + +:::note +If no custom values for `QUERY_DEFAULT_USER` or `QUERY_DEFAULT_PASSWORD` are specified when starting the container, a default `root` user will be created with no password. +::: + +```bash +docker run -d --name databend \ + -p 3307:3307 -p 8000:8000 -p 8124:8124 -p 8900:8900 \ + datafuselabs/databend:nightly +``` + + + + +### Configure Connection + +1. In DBeaver, go to **Database** > **New Database Connection** to open the connection wizard, then select **Databend** under the **Analytical** category. + +![alt text](@site/static/img/connect/dbeaver-analytical.png) + +2. Enter `root` for the **Username** (or your configured username). 
+ +![alt text](@site/static/img/connect/dbeaver-user-root.png) + +3. Click **Test Connection** to verify the connection. If this is your first time connecting to Databend, you will be prompted to download the driver. Click **Download** to proceed. + +![alt text](@site/static/img/connect/dbeaver-download-driver.png) + +Once the download is complete, the test connection should succeed: + +![alt text](@site/static/img/connect/dbeaver-success.png) + + + + +## Connecting to Databend Cloud + + + + +### Obtain Connection Information + +Log in to Databend Cloud to obtain connection information. For more information, see [Connecting to a Warehouse](/tidb-cloud-lake/guides/warehouse.md#connecting). + +![alt text](@site/static/img/connect/dbeaver-connect-info.png) + +:::note +If your `user` or `password` contains special characters, you need to provide them separately in the corresponding fields (e.g., the `Username` and `Password` fields in DBeaver). In this case, Databend will handle the necessary encoding for you. However, if you're providing the credentials together (e.g., as `user:password`), you must ensure that the entire string is properly encoded before use. +::: + + + + +### Configure Connection + +1. In DBeaver, go to **Database** > **New Database Connection** to open the connection wizard, then select **Databend** under the **Analytical** category. + +![alt text](@site/static/img/connect/dbeaver-analytical.png) + +2. In the **Main** tab, enter the **Host**, **Port**, **Username**, and **Password** based on the connection information obtained in the previous step. + +![alt text](@site/static/img/connect/dbeaver-main-tab.png) + +3. In the **Driver properties** tab, enter the **Warehouse** name based on the connection information obtained in the previous step. + +![alt text](@site/static/img/connect/dbeaver-driver-properties.png) + +4. In the **SSL** tab, select the **Use SSL** checkbox. + +![alt text](@site/static/img/connect/dbeaver-use-ssl.png) + +5. Click **Test Connection** to verify the connection. If this is your first time connecting to Databend, you will be prompted to download the driver. Click **Download** to proceed. Once the download is complete, the test connection should succeed: + +![alt text](@site/static/img/connect/dbeaver-cloud-success.png) + + + diff --git a/tidb-cloud-lake/guides/deepnote.md b/tidb-cloud-lake/guides/deepnote.md new file mode 100644 index 0000000000000..88761f972370d --- /dev/null +++ b/tidb-cloud-lake/guides/deepnote.md @@ -0,0 +1,44 @@ +--- +title: Deepnote +sidebar_position: 5 +--- + +[Deepnote](https://deepnote.com) allows you to easily work on your data science projects, together in real-time and in one place with your friends and colleagues; helping you turn your ideas and analyses into products faster. Deepnote is built for the browser so you can use it across any platform (Windows, Mac, Linux or Chromebook). No downloads required, with updates shipped to you daily. All changes are instantly saved. + +Both Databend and Databend Cloud support integration with Deepnote, requiring a secure connection. When integrating with Databend, please note that the default port is `8124`. + +## Tutorial: Integrating with Deepnote + +This tutorial guides you through the process of integrating Databend Cloud with Deepnote. + +### Step 1. Set up Environment + +Make sure you can log in to your Databend Cloud account and obtain the connection information for a warehouse. 
For more details, see [Connecting to a Warehouse](/tidb-cloud-lake/guides/warehouse.md#connecting). + +### Step 2. Connect to Databend Cloud + +1. Sign in to Deepnote, or create an account if you don't have one. + +2. Click **+** to the right of **INTEGRATIONS** in the left sidebar, then select **ClickHouse**. + +![Alt text](/img/integration/11.png) + +3. Complete the fields with your connection information. + +| Parameter | Description | +| ---------------- | ---------------------------------- | +| Integration name | For example, `Databend` | +| Host name | Obtain from connection information | +| Port | `443` | +| Username | `cloudapp` | +| Password | Obtain from connection information | + +4. Create a notebook. + +5. In the notebook, navigate to the **SQL** section, and then choose the connection you previously created. + +![Alt text](/img/integration/13.png) + +You're all set! Refer to the Deepnote documentation for how to work with the tool. + +![Alt text](/img/integration/15.png) diff --git a/tidb-cloud-lake/guides/editions.md b/tidb-cloud-lake/guides/editions.md new file mode 100644 index 0000000000000..d3a273bdf1085 --- /dev/null +++ b/tidb-cloud-lake/guides/editions.md @@ -0,0 +1,132 @@ +--- +title: Editions +--- + +Databend Cloud comes in three editions: **Personal**, **Business**, and **Dedicated**, that you can choose from to serve a wide range of needs and ensure optimal performance for different use cases. + +For the pricing information, see [Pricing & Billing](/tidb-cloud-lake/guides/pricing-billing.md). For the detailed feature list among these editions, see [Feature Lists](#feature-lists). + +## Feature Lists + +The following are feature lists of Databend Cloud among editions: + +#### Release Management + + + +#### Security & Governance + +Extended Time Travel.', '', '90 days', '90 days'], +['Column-level Security to apply masking policies to columns in tables or views.', '✓', '✓', '✓'], +['Audit the user access history through the Account Usage ACCESS_HISTORY view.', '✓', '✓', '✓'], +['Support for private connectivity to the Databend Cloud service using AWS PrivateLink.', '', '✓', '✓'], +['Dedicated metadata store and pool of compute resources (used in virtual warehouses).', '', '', '✓'], +]} +/> + +#### Compute Resource + + + +#### SQL Support + +creating table with external location.', '✓', '✓', '✓'], +['Supports for ATTACH TABLE.', '✓', '✓', '✓'], +]} +/> + +#### Interfaces & Tools + + + +#### Data Import & Export + + + +#### Data Pipelines + + + +#### Customer Support + +Response to non-severity-1 issues in hours.', '8h', '4h', '1h'], +]} +/> diff --git a/tidb-cloud-lake/guides/external-ai-functions.md b/tidb-cloud-lake/guides/external-ai-functions.md new file mode 100644 index 0000000000000..808a4f38652b9 --- /dev/null +++ b/tidb-cloud-lake/guides/external-ai-functions.md @@ -0,0 +1,77 @@ +--- +title: External AI Functions +--- + +# External AI Functions + +Build powerful AI/ML capabilities by connecting Databend with your own infrastructure. External functions let you deploy custom models, leverage GPU acceleration, and integrate with any ML framework while keeping your data secure. 
+ +## Key Capabilities + +| Feature | Benefits | +|---------|----------| +| **Custom Models** | Use any open-source or proprietary AI/ML models | +| **GPU Acceleration** | Deploy on GPU-equipped machines for faster inference | +| **Data Privacy** | Keep your data within your infrastructure | +| **Scalability** | Independent scaling and resource optimization | +| **Flexibility** | Support for any programming language and ML framework | + +## How It Works + +1. **Create AI Server**: Build your AI/ML server using Python and [databend-udf](https://pypi.org/project/databend-udf) +2. **Register Function**: Connect your server to Databend with `CREATE FUNCTION` +3. **Use in SQL**: Call your custom AI functions directly in SQL queries + +## Example: Text Embedding Function + +```python +# Simple embedding UDF server demo +from databend_udf import udf, UDFServer +from sentence_transformers import SentenceTransformer + +# Load pre-trained model +model = SentenceTransformer('all-mpnet-base-v2') # 768-dimensional vectors + +@udf( + input_types=["STRING"], + result_type="ARRAY(FLOAT)", +) +def ai_embed_768(inputs: list[str], headers) -> list[list[float]]: + """Generate 768-dimensional embeddings for input texts""" + try: + # Process inputs in a single batch + embeddings = model.encode(inputs) + # Convert to list format + return [embedding.tolist() for embedding in embeddings] + except Exception as e: + print(f"Error generating embeddings: {e}") + # Return empty lists in case of error + return [[] for _ in inputs] + +if __name__ == '__main__': + print("Starting embedding UDF server on port 8815...") + server = UDFServer("0.0.0.0:8815") + server.add_function(ai_embed_768) + server.serve() +``` + +```sql +-- Register the external function in Databend +CREATE OR REPLACE FUNCTION ai_embed_768 (STRING) + RETURNS ARRAY(FLOAT) + LANGUAGE PYTHON + HANDLER = 'ai_embed_768' + ADDRESS = 'https://your-ml-server.example.com'; + +-- Use the custom embedding in queries +SELECT + id, + title, + cosine_distance( + ai_embed_768(content), + ai_embed_768('machine learning techniques') + ) AS similarity +FROM articles +ORDER BY similarity ASC +LIMIT 5; +``` diff --git a/tidb-cloud-lake/guides/fail-safe.md b/tidb-cloud-lake/guides/fail-safe.md new file mode 100644 index 0000000000000..fa322378fd96a --- /dev/null +++ b/tidb-cloud-lake/guides/fail-safe.md @@ -0,0 +1,74 @@ +--- +title: Fail-Safe +--- +import IndexOverviewList from '@site/src/components/IndexOverviewList'; +import EEFeature from '@site/src/components/EEFeature'; + + + +Fail-Safe refers to mechanisms aimed at recovering lost or accidentally deleted data from object storage. + +- Storage Compatibility: Currently, Fail-Safe supports only S3-compatible storage types. +- Bucket Versioning: For Fail-Safe to work, bucket versioning must be enabled. Note that data created before enabling versioning *cannot* be recovered using this method. + +### Implementing Fail-Safe + +Databend offers the [SYSTEM$FUSE_AMEND](/tidb-cloud-lake/sql/system-fuse-amend.md) table function to enable Fail-Safe recovery. This function lets you restore data from an S3-compatible storage bucket when bucket versioning is enabled. + +### Usage Example + +Below is a step-by-step example of using the [SYSTEM$FUSE_AMEND](/tidb-cloud-lake/sql/system-fuse-amend.md) function to recover a table's data from S3: + +1. Enable versioning for the bucket `databend-doc`. + +![alt text](../../../../static/img/guides/bucket-versioning.png) + +2. 
Create an external table, storing the table data in the `fail-safe` folder in the `databend-doc` bucket. + +```sql +CREATE TABLE t(a INT) +'s3://databend-doc/fail-safe/' +CONNECTION = (access_key_id ='' secret_access_key =''); + +-- Insert sample data +INSERT INTO t VALUES (1), (2), (3); +``` + +If you open the `fail-safe` folder in the bucket now, you can see the data is already there: + +![alt text](../../../../static/img/guides/bucket-versioning-2.png) + +3. Simulate data loss by deleting all the sub-folders and their files in the `fail-safe` folder. + +![alt text](../../../../static/img/guides/bucket-versioning-3.png) + +4. Attempting to query the table after removal will result in an error: + +```sql +SELECT * FROM t; + +error: APIError: ResponseError with 3001: NotFound (persistent) at read, context: { uri: https://s3.us-east-2.amazonaws.com/databend-doc/fail-safe/1/1502/_b/3f84d636dc6c40508720d1cde20d4f3b_v2.parquet, response: Parts { status: 404, version: HTTP/1.1, headers: {"x-amz-request-id": "FYSJNZX1X16T91HN", "x-amz-id-2": "EI+NQjyRlSk8jlU64EASKodjvOkzuAlhZ1CYo0nIenzOH6DP7t6mMWh7raj4mUiOxW18NQesxmA=", "x-amz-delete-marker": "true", "x-amz-version-id": "ngecunzFP0pir0ysXlbR_eJafaTPl1oh", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Mon, 09 Sep 2024 02:01:57 GMT", "server": "AmazonS3"} }, service: s3, path: 1/1502/_b/3f84d636dc6c40508720d1cde20d4f3b_v2.parquet, range: 4-47 } => S3Error { code: "NoSuchKey", message: "The specified key does not exist.", resource: "", request_id: "FYSJNZX1X16T91HN" } +``` + +5. Recover the table data using system$fuse_amend: + +```sql +CALL system$fuse_amend('default', 't'); + +-[ RECORD 1 ]----------------------------------- +result: Ok +``` + +6. Verify that the table data is back: + +```sql +SELECT * FROM t; + +┌─────────────────┐ +│ a │ +├─────────────────┤ +│ 1 │ +│ 2 │ +│ 3 │ +└─────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/full-text-index.md b/tidb-cloud-lake/guides/full-text-index.md new file mode 100644 index 0000000000000..7ea380566276c --- /dev/null +++ b/tidb-cloud-lake/guides/full-text-index.md @@ -0,0 +1,279 @@ +--- +title: Full-Text Index +--- + +:::info +Looking for a hands-on walkthrough? See [JSON & Search Guide](/tidb-cloud-lake/guides/json-search.md). +::: + +# Full-Text Index: Automatic Lightning-Fast Text Search + +Full-text indexes (inverted indexes) automatically enable lightning-fast text searches across large document collections by mapping terms to documents, eliminating the need for slow table scans. + +## What Problem Does It Solve? + +Text search operations on large datasets face significant performance challenges: + +| Problem | Impact | Full-Text Index Solution | +|---------|--------|-------------------------| +| **Slow LIKE Queries** | `WHERE content LIKE '%keyword%'` scans entire tables | Direct term lookup, skip irrelevant documents | +| **Full Table Scans** | Every text search reads all rows | Read only documents containing search terms | +| **Poor Search Experience** | Users wait seconds/minutes for search results | Sub-second search response times | +| **Limited Search Capabilities** | Basic pattern matching only | Advanced features: fuzzy search, relevance scoring | +| **High Resource Usage** | Text searches consume excessive CPU/memory | Minimal resources for indexed searches | + +**Example**: Searching for "kubernetes error" in 10M log entries. Without full-text index, it scans all 10M rows. 
With full-text index, it directly finds the ~1000 matching documents instantly. + +## How It Works + +Full-text indexes create an inverted mapping from terms to documents: + +| Term | Document IDs | +|------|-------------| +| "kubernetes" | 101, 205, 1847 | +| "error" | 101, 892, 1847 | +| "pod" | 205, 1847, 2901 | + +When searching for "kubernetes error", the index finds documents containing both terms (101, 1847) without scanning the entire table. + +## Quick Setup + +```sql +-- Create table with text content +CREATE TABLE logs(id INT, message TEXT, timestamp TIMESTAMP); + +-- Create full-text index - automatically indexes new data +CREATE INVERTED INDEX logs_message_idx ON logs(message); + +-- One-time refresh needed only for existing data before index creation +REFRESH INVERTED INDEX logs_message_idx ON logs; + +-- Search using MATCH function - fully automatic optimization +SELECT * FROM logs WHERE MATCH(message, 'error kubernetes'); +``` + +**Automatic Index Management**: +- **New Data**: Automatically indexed as it's inserted - no manual action needed +- **Existing Data**: One-time refresh required only for data that existed before index creation +- **Ongoing Maintenance**: Databend automatically maintains optimal search performance + +## Search Functions + +| Function | Purpose | Example | +|----------|---------|---------| +| `MATCH(column, 'terms')` | Basic text search | `MATCH(content, 'database performance')` | +| `QUERY('column:terms')` | Advanced query syntax | `QUERY('title:"full text" AND content:search')` | +| `SCORE()` | Relevance scoring | `SELECT *, SCORE() FROM docs WHERE MATCH(...)` | + +## Advanced Search Features + +### Fuzzy Search +```sql +-- Find documents even with typos (fuzziness=1 allows 1 character difference) +SELECT * FROM logs WHERE MATCH(message, 'kubernetes', 'fuzziness=1'); +``` + +### Relevance Scoring +```sql +-- Get results with relevance scores, filter by minimum score +SELECT id, message, SCORE() as relevance +FROM logs +WHERE MATCH(message, 'critical error') AND SCORE() > 0.5 +ORDER BY SCORE() DESC; +``` + +### Complex Queries +```sql +-- Advanced query syntax with boolean operators +SELECT * FROM docs WHERE QUERY('title:"user guide" AND content:(tutorial OR example)'); +``` + +## Complete Example + +This example demonstrates creating a full-text search index on Kubernetes log data and searching using various functions: + +```sql +-- Create a table with a computed column +CREATE TABLE k8s_logs ( + event_id INT, + event_data VARIANT, + event_timestamp TIMESTAMP, + event_message VARCHAR AS (event_data['message']::VARCHAR) STORED +); + +-- Create an inverted index on the "event_message" column +CREATE INVERTED INDEX event_message_fulltext ON k8s_logs(event_message); + +-- Insert comprehensive sample data +INSERT INTO k8s_logs (event_id, event_data, event_timestamp) +VALUES + (1, + PARSE_JSON('{ + "message": "Pod scheduled", + "object_type": "Pod", + "name": "frontend-1", + "namespace": "production", + "node": "node-01", + "status": "Scheduled" + }'), + '2024-04-08T08:00:00Z'); + +INSERT INTO k8s_logs (event_id, event_data, event_timestamp) +VALUES + (2, + PARSE_JSON('{ + "message": "Deployment scaled", + "object_type": "Deployment", + "name": "backend", + "namespace": "development", + "replicas": 3 + }'), + '2024-04-08T09:15:00Z'); + +INSERT INTO k8s_logs (event_id, event_data, event_timestamp) +VALUES + (3, + PARSE_JSON('{ + "message": "Node condition changed", + "object_type": "Node", + "name": "node-02", + "condition": "Ready", + "status": "True" + 
}'), + '2024-04-08T10:30:00Z'); + +INSERT INTO k8s_logs (event_id, event_data, event_timestamp) +VALUES + (4, + PARSE_JSON('{ + "message": "ConfigMap updated", + "object_type": "ConfigMap", + "name": "app-config", + "namespace": "default", + "change": "data update" + }'), + '2024-04-08T11:45:00Z'); + +INSERT INTO k8s_logs (event_id, event_data, event_timestamp) +VALUES + (5, + PARSE_JSON('{ + "message": "PersistentVolume claim created", + "object_type": "PVC", + "name": "storage-claim", + "namespace": "storage", + "status": "Bound", + "volume": "pv-logs" + }'), + '2024-04-08T12:00:00Z'); + +-- Basic search for events containing "PersistentVolume" +SELECT + event_id, + event_message +FROM + k8s_logs +WHERE + MATCH(event_message, 'PersistentVolume'); + +-[ RECORD 1 ]----------------------------------- + event_id: 5 +event_message: PersistentVolume claim created + +-- Verify index usage with EXPLAIN +EXPLAIN SELECT event_id, event_message FROM k8s_logs WHERE MATCH(event_message, 'PersistentVolume'); + +-[ EXPLAIN ]----------------------------------- +Filter +├── output columns: [k8s_logs.event_id (#0), k8s_logs.event_message (#3)] +├── filters: [k8s_logs._search_matched (#4)] +├── estimated rows: 5.00 +└── TableScan + ├── table: default.default.k8s_logs + ├── output columns: [event_id (#0), event_message (#3), _search_matched (#4)] + ├── read rows: 1 + ├── read size: < 1 KiB + ├── partitions total: 5 + ├── partitions scanned: 1 + ├── pruning stats: [segments: , blocks: ] + ├── push downs: [filters: [k8s_logs._search_matched (#4)], limit: NONE] + └── estimated rows: 5.00 + +-- Advanced search with relevance scoring +SELECT + event_id, + event_message, + event_timestamp, + SCORE() +FROM + k8s_logs +WHERE + SCORE() > 0.5 + AND QUERY('event_message:"PersistentVolume claim created"'); + +-[ RECORD 1 ]----------------------------------- + event_id: 5 + event_message: PersistentVolume claim created +event_timestamp: 2024-04-08 12:00:00 + score(): 0.86304635 + +-- Fuzzy search example (handles typos) +SELECT + event_id, event_message, event_timestamp +FROM + k8s_logs +WHERE + match('event_message', 'PersistentVolume claim create', 'fuzziness=1'); + +-[ RECORD 1 ]----------------------------------- + event_id: 5 + event_message: PersistentVolume claim created +event_timestamp: 2024-04-08 12:00:00 +``` + +**Key Points from the Example:** +- `inverted pruning: 5 to 1` shows the index reduced blocks scanned from 5 to 1 +- Relevance scoring helps rank results by match quality +- Fuzzy search finds results even with typos ("create" vs "created") + +## Best Practices + +| Practice | Benefit | +|----------|---------| +| **Index Frequently Searched Columns** | Focus on columns used in search queries | +| **Use MATCH Instead of LIKE** | Leverage automatic index performance | +| **Monitor Index Usage** | Use EXPLAIN to verify index utilization | +| **Consider Multiple Indexes** | Different columns can have separate indexes | + +## Essential Commands + +| Command | Purpose | When to Use | +|---------|---------|-------------| +| `CREATE INVERTED INDEX name ON table(column)` | Create new full-text index | Initial setup - automatic for new data | +| `REFRESH INVERTED INDEX name ON table` | Index existing data | One-time only for pre-existing data | +| `DROP INVERTED INDEX name ON table` | Remove index | When index no longer needed | + +## Important Notes + +:::tip +**When to Use Full-Text Indexes:** +- Large text datasets (documents, logs, comments) +- Frequent text search operations +- Need for advanced search 
features (fuzzy, scoring) +- Performance-critical search applications + +**When NOT to Use:** +- Small text datasets +- Exact string matching only +- Infrequent search operations +::: + +## Index Limitations + +- Each column can only be in one inverted index +- Requires refresh after data insertion (if data existed before index creation) +- Uses additional storage space for index data + +--- + +*Full-text indexes are essential for applications requiring fast, sophisticated text search capabilities across large document collections.* diff --git a/tidb-cloud-lake/guides/geo-analytics.md b/tidb-cloud-lake/guides/geo-analytics.md new file mode 100644 index 0000000000000..eddf78b801380 --- /dev/null +++ b/tidb-cloud-lake/guides/geo-analytics.md @@ -0,0 +1,246 @@ +--- +title: Geo Analytics +--- + +> **Scenario:** CityDrive records precise GPS positioning and distance-to-signal for every flagged frame. This geospatial data originates from the dash-cam's GPS module and is precisely aligned with the timestamps of video keyframes. Ops teams can answer "where did this happen?" purely in SQL. + +`frame_geo_points` and `signal_contact_points` share the same `video_id`/`frame_id` keys as the rest of the guide, so you can move from SQL metrics to maps without copying data. + +## 1. Create Location Tables +If you followed the JSON guide, these tables already exist. The snippet below shows their structure plus a few Shenzhen samples. + +```sql +CREATE OR REPLACE TABLE frame_geo_points ( + video_id STRING, + frame_id STRING, + position_wgs84 GEOMETRY, + solution_grade INT, + source_system STRING, + created_at TIMESTAMP +); + +INSERT INTO frame_geo_points VALUES + ('VID-20250101-001','FRAME-0101',TO_GEOMETRY('SRID=4326;POINT(114.0579123456789 22.543123456789)'),104,'fusion_gnss','2025-01-01 08:15:21'), + ('VID-20250101-001','FRAME-0102',TO_GEOMETRY('SRID=4326;POINT(114.0610987654321 22.546098765432)'),104,'fusion_gnss','2025-01-01 08:33:54'), + ('VID-20250101-002','FRAME-0201',TO_GEOMETRY('SRID=4326;POINT(114.104012345678 22.559456789012)'),104,'fusion_gnss','2025-01-01 11:12:02'), + ('VID-20250102-001','FRAME-0301',TO_GEOMETRY('SRID=4326;POINT(114.082265432109 22.53687654321)'),104,'fusion_gnss','2025-01-02 09:44:18'), + ('VID-20250103-001','FRAME-0401',TO_GEOMETRY('SRID=4326;POINT(114.119501234567 22.544365432101)'),104,'fusion_gnss','2025-01-03 21:18:07'); + +CREATE OR REPLACE TABLE signal_contact_points ( + node_id STRING, + signal_position GEOMETRY, + video_id STRING, + frame_id STRING, + frame_position GEOMETRY, + distance_m DOUBLE, + created_at TIMESTAMP +); + +INSERT INTO signal_contact_points VALUES + ('SIG-0001', TO_GEOMETRY('SRID=4326;POINT(114.058500123456 22.543800654321)'), 'VID-20250101-001', 'FRAME-0101', TO_GEOMETRY('SRID=4326;POINT(114.0579123456789 22.543123456789)'), 0.012345, '2025-01-01 08:15:30'), + ('SIG-0002', TO_GEOMETRY('SRID=4326;POINT(114.118900987654 22.544800123456)'), 'VID-20250103-001', 'FRAME-0401', TO_GEOMETRY('SRID=4326;POINT(114.119501234567 22.544365432101)'), 0.008765, '2025-01-03 21:18:20'); + +-- Frames and JSON tables these queries join against (same rows as SQL & Search guides). 
+CREATE OR REPLACE TABLE frame_events ( + frame_id STRING, + video_id STRING, + frame_index INT, + collected_at TIMESTAMP, + event_tag STRING, + risk_score DOUBLE, + speed_kmh DOUBLE +); + +INSERT INTO frame_events VALUES + ('FRAME-0101', 'VID-20250101-001', 125, '2025-01-01 08:15:21', 'hard_brake', 0.81, 32.4), + ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8), + ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1), + ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6), + ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9), + ('FRAME-0501', 'VID-MISSING-001', 10, '2025-01-04 10:00:00', 'sensor_fault', 0.25, 15.0); + +CREATE OR REPLACE TABLE frame_metadata_catalog ( + doc_id STRING, + meta_json VARIANT, + captured_at TIMESTAMP, + INVERTED INDEX idx_meta_json (meta_json) +); + +INSERT INTO frame_metadata_catalog VALUES + ('FRAME-0101', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":32.4},"detections":{"objects":[{"type":"vehicle","confidence":0.88},{"type":"brake_light","confidence":0.64}]},"media_meta":{"tagging":{"labels":["hard_brake","rain","downtown_loop"]}}}'), '2025-01-01 08:15:21'), + ('FRAME-0102', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":24.8},"detections":{"objects":[{"type":"pedestrian","confidence":0.92},{"type":"bike","confidence":0.35}]},"media_meta":{"tagging":{"labels":["pedestrian","swerve","crosswalk"]}}}'), '2025-01-01 08:33:54'), + ('FRAME-0201', PARSE_JSON('{"scene":{"weather_code":"overcast","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":48.1},"detections":{"objects":[{"type":"lane_merge","confidence":0.74},{"type":"vehicle","confidence":0.41}]},"media_meta":{"tagging":{"labels":["lane_merge","urban"]}}}'), '2025-01-01 11:12:02'), + ('FRAME-0301', PARSE_JSON('{"scene":{"weather_code":"clear","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":52.6},"detections":{"objects":[{"type":"vehicle","confidence":0.82},{"type":"hard_brake","confidence":0.59}]},"media_meta":{"tagging":{"labels":["hard_brake","highway"]}}}'), '2025-01-02 09:44:18'), + ('FRAME-0401', PARSE_JSON('{"scene":{"weather_code":"lightfog","lighting":"night"},"camera":{"sensor_view":"rear"},"vehicle":{"speed_kmh":38.9},"detections":{"objects":[{"type":"traffic_light","confidence":0.78},{"type":"vehicle","confidence":0.36}]},"media_meta":{"tagging":{"labels":["night_lowlight","traffic_light"]}}}'), '2025-01-03 21:18:07'); +``` + +Docs: [Geospatial types](/tidb-cloud-lake/sql/geospatial.md). + +--- + +## 2. Spatial Filters +Measure how far each frame was from a key downtown coordinate or check whether it falls inside a polygon. Convert to SRID 3857 when you need meter-level distances. 
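+Every query in this section leans on the same idiom: `ST_TRANSFORM` both geometries to SRID 3857, then call `ST_DISTANCE` so the result comes back in metres rather than degrees. In isolation the sketch looks like this (the two points are the HQ reference used below and the FRAME-0101 sample from section 1):
+
+```sql
+-- Sketch only: metre-level distance between two WGS84 points
+SELECT ST_DISTANCE(
+         ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0600 22.5450)'), 3857),
+         ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0579123456789 22.543123456789)'), 3857)
+       ) AS meters_from_hq;
+```
+
+For FRAME-0101 this is the ≈324 m figure that appears in the sample output below; the full query simply applies the same expression as both a projected column and a `WHERE` filter, joined back to `frame_events` for context.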
+ +```sql +SELECT l.frame_id, + l.video_id, + f.event_tag, + ST_DISTANCE( + ST_TRANSFORM(l.position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0600 22.5450)'), 3857) + ) AS meters_from_hq +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) +WHERE ST_DISTANCE( + ST_TRANSFORM(l.position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0600 22.5450)'), 3857) + ) <= 400 +ORDER BY meters_from_hq; +``` + +Sample output: + +``` +frame_id | video_id | event_tag | meters_from_hq +FRAME-0102| VID-20250101-001 | pedestrian | 180.277138577 +FRAME-0101| VID-20250101-001 | hard_brake | 324.291965923 +``` + +Tip: add `ST_ASTEXT(l.geom)` while debugging or switch to [`HAVERSINE`](/tidb-cloud-lake/sql/geospatial-functions.md#trigonometric-distance-functions) for great-circle math. + +```sql +WITH school_zone AS ( + SELECT TO_GEOMETRY('SRID=4326;POLYGON(( + 114.0505 22.5500, + 114.0630 22.5500, + 114.0630 22.5420, + 114.0505 22.5420, + 114.0505 22.5500 + ))') AS poly +) +SELECT l.frame_id, + l.video_id, + f.event_tag +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) +CROSS JOIN school_zone +WHERE ST_CONTAINS(poly, l.position_wgs84); +``` + +Sample output: + +``` +frame_id | video_id | event_tag +FRAME-0101| VID-20250101-001 | hard_brake +FRAME-0102| VID-20250101-001 | pedestrian +``` + +--- + +## 3. Hex Aggregations +Aggregate risky frames into hexagonal buckets for dashboards. + +```sql +SELECT GEO_TO_H3(ST_X(position_wgs84), ST_Y(position_wgs84), 8) AS h3_cell, + COUNT(*) AS frame_count, + AVG(f.risk_score) AS avg_risk +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) +GROUP BY h3_cell +ORDER BY avg_risk DESC; +``` + +Sample output: + +``` +h3_cell | frame_count | avg_risk +613635011200942079| 1 | 0.81 +613635011532292095| 1 | 0.74 +613635011238690815| 1 | 0.67 +613635015391051775| 1 | 0.63 +613635011309993983| 1 | 0.59 +``` + +Docs: [H3 functions](/tidb-cloud-lake/sql/geospatial-functions.md#h3-indexing--conversion). + +--- + +## 4. Traffic Context +Join `signal_contact_points` and `frame_geo_points` to validate stored metrics, or blend spatial predicates with JSON search. + +```sql +SELECT t.node_id, + t.video_id, + t.frame_id, + ST_DISTANCE(t.signal_position, t.frame_position) AS recomputed_distance, + t.distance_m AS stored_distance, + l.source_system +FROM signal_contact_points AS t +JOIN frame_geo_points AS l USING (frame_id) +WHERE t.distance_m < 0.03 -- roughly < 30 meters depending on SRID +ORDER BY t.distance_m; +``` + +Sample output: + +``` +node_id | video_id | frame_id | recomputed_distance | stored_distance | source_system +SIG-0002| VID-20250103-001 | FRAME-0401| 0.000741116 | 0.008765 | fusion_gnss +SIG-0001| VID-20250101-001 | FRAME-0101| 0.000896705 | 0.012345 | fusion_gnss +``` + +```sql +WITH near_junction AS ( + SELECT frame_id + FROM frame_geo_points + WHERE ST_DISTANCE( + ST_TRANSFORM(position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0830 22.5370)'), 3857) + ) <= 200 +) +SELECT f.frame_id, + f.event_tag, + meta.meta_json['media_meta']['tagging']['labels'] AS labels +FROM near_junction nj +JOIN frame_events AS f USING (frame_id) +JOIN frame_metadata_catalog AS meta + ON meta.doc_id = nj.frame_id +WHERE QUERY('meta_json.media_meta.tagging.labels:hard_brake'); +``` + +Sample output: + +``` +frame_id | event_tag | labels +FRAME-0301| hard_brake | ["hard_brake","highway"] +``` + +This pattern lets you filter by geography first, then apply JSON search to the surviving frames. 
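+Note that `recomputed_distance` in the signal-contact query above is a planar distance in the raw coordinate units (degrees, since both columns are SRID 4326), so it should not be read as metres. To recompute it in metres, reuse the transform idiom from section 2; a sketch over the existing columns:
+
+```sql
+-- Sketch: recompute the signal-to-frame distance in metres instead of degrees
+SELECT t.node_id,
+       t.frame_id,
+       ST_DISTANCE(
+         ST_TRANSFORM(t.signal_position, 3857),
+         ST_TRANSFORM(t.frame_position, 3857)
+       ) AS recomputed_distance_m
+FROM signal_contact_points AS t
+ORDER BY recomputed_distance_m;
+```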
+ +--- + +## 5. Publish a Heatmap View +Expose the geo heatmap to BI or GIS tools without re-running heavy SQL. + +```sql +CREATE OR REPLACE VIEW v_citydrive_geo_heatmap AS +SELECT GEO_TO_H3(ST_X(position_wgs84), ST_Y(position_wgs84), 7) AS h3_cell, + COUNT(*) AS frames, + AVG(f.risk_score) AS avg_risk +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) +GROUP BY h3_cell; +``` + +Sample output: + +``` +h3_cell | frames | avg_risk +609131411584057343| 1 | 0.81 +609131411919601663| 1 | 0.74 +609131411617611775| 1 | 0.67 +609131415778361343| 1 | 0.63 +609131411684720639| 1 | 0.59 +``` + +Databend now serves vector, text, and spatial queries off the exact same `video_id`, so investigation teams never have to reconcile separate pipelines. diff --git a/tidb-cloud-lake/guides/grafana.md b/tidb-cloud-lake/guides/grafana.md new file mode 100644 index 0000000000000..dde4bbe787019 --- /dev/null +++ b/tidb-cloud-lake/guides/grafana.md @@ -0,0 +1,191 @@ +--- +title: Grafana +sidebar_position: 1 +--- + +[Grafana](https://grafana.com/) is a monitoring dashboard system, which is an open-source monitoring tool developed by Grafana Labs. It can greatly simplify the complexity of monitoring by allowing us to provide the data to be monitored, and it generates various visualizations. Additionally, it has an alarm function that sends notifications when there is an issue with the system. + +Databend Cloud and Databend can integrate with Grafana in two ways: + +- **Loki Protocol (Recommended for Databend Cloud)**: Use Grafana's built-in Loki data source to connect to Databend Cloud via Loki-compatible API endpoints. +- **Custom Plugin**: Use the [Grafana Databend Data Source Plugin](https://github.com/databendlabs/grafana-databend-datasource) for direct SQL access. + +## Using Loki Protocol (Recommended) + +Databend Cloud provides a Loki-compatible API that allows you to use Grafana's native Loki data source without installing additional plugins. This is the recommended approach for most use cases. + +:::note +The Loki protocol feature requires activation. Please contact support to enable this feature for your account. +::: + +### Step 1. Configure Table + +Before connecting to Grafana, configure your Databend Cloud table for log data visualization. Below are two recommended schema types: + +#### Loki Schema + +This schema stores labels as a VARIANT/MAP alongside the log body: + +```sql +CREATE TABLE logs ( + `timestamp` TIMESTAMP NOT NULL, + `labels` VARIANT NOT NULL, + `line` STRING NOT NULL, + `stream_hash` UInt64 NOT NULL AS (city64withseed(labels, 0)) STORED +) CLUSTER BY (to_start_of_hour(timestamp), stream_hash); + +CREATE INVERTED INDEX logs_line_idx ON logs(line); +REFRESH INVERTED INDEX logs_line_idx ON logs; +``` + +- `timestamp`: log event timestamp +- `labels`: VARIANT storing serialized Loki labels +- `line`: raw log line +- `stream_hash`: computed hash for clustering + +#### Flat Schema + +This schema uses a wide table where each attribute is a separate column: + +```sql +CREATE TABLE nginx_logs ( + `agent` STRING, + `client` STRING, + `host` STRING, + `path` STRING, + `request` STRING, + `status` INT, + `timestamp` TIMESTAMP NOT NULL +) CLUSTER BY (to_start_of_hour(timestamp), host, status); + +CREATE INVERTED INDEX nginx_request_idx ON nginx_logs(request); +REFRESH INVERTED INDEX nginx_request_idx ON nginx_logs; +``` + +Every column except the timestamp and line column becomes a LogQL label. + +![Configure Table](/img/connect/grafana-configure-table.png) + +### Step 2. 
Get Connection Information + +1. Log in to your Databend Cloud account. + +2. On the dashboard, click **Connect** to view the connection information. Note down: + - **Host**: The warehouse endpoint (e.g., `tnxxxxxxx.gw.aws-us-east-2.default.databend.com`) + - **User**: Your username (typically `cloudapp`) + - **Password**: Your password or API key + - **Database**: The database name containing your log table + - **Warehouse**: The warehouse name + +![Get Connection Info](/img/connect/grafana-get-connect-info.png) + +For detailed information on obtaining connection details, see [Connecting to a Warehouse](/tidb-cloud-lake/guides/warehouse.md#connecting). + +### Step 3. Configure Grafana Data Source + +1. In Grafana, navigate to **Connections** > **Data sources** > **Add data source**. + +2. Search for and select **Loki**. + +3. Configure the basic settings: + - **Name**: Give your data source a descriptive name (e.g., "Databend Cloud Logs") + - **URL**: Enter `https://` using the host from Step 2 + +![Configure Loki Data Source - Basic](/img/connect/grafana-configure-loki-datasource-basic.png) + +4. Configure authentication: + - Enable **Basic auth** under the Authentication section + - **User**: Enter your username (typically `cloudapp`) + - **Password**: Enter your password or API key + +5. Add custom HTTP headers. Under **Custom HTTP Headers**, add the following: + - **Header**: `X-Databend-Warehouse`, **Value**: Your warehouse name + - **Header**: `X-Databend-Database`, **Value**: Your database name + - **Header**: `X-Databend-Table`, **Value**: Your table name + +![Configure Loki Data Source - Headers](/img/connect/grafana-configure-loki-datasource-header.png) + +6. Click **Save & test** to verify the connection. + +![Configure Loki Data Source - Complete](/img/connect/grafana-configure-loki-datasource-complete.png) + +### Step 4. Test Queries + +1. Navigate to **Explore** in Grafana. + +2. Select your Databend Cloud Loki data source. + +3. Use LogQL queries to visualize your data. For example: + - `{service="api"}` - Filter logs by service label + - `{level="error"}` - Show only error-level logs + - `{service="api"} |= "timeout"` - Search for specific text in logs + - `count_over_time({status="500"}[5m])` - Count errors over time + +4. Customize the visualization as needed using Grafana's panel options. + +![Test Loki Query with Explore](/img/connect/grafana-test-loki-query-with-explore.png) + +## Using Custom Plugin (Alternative) + +For advanced use cases requiring direct SQL access or when working with self-hosted Databend, you can use the Grafana Databend Data Source Plugin. + +### Step 1. Set up Environment + +Before you start, ensure you have: + +- Grafana installed. Refer to the official installation guide: [https://grafana.com/docs/grafana/latest/setup-grafana/installation](https://grafana.com/docs/grafana/latest/setup-grafana/installation) +- Databend Cloud access with connection information for a warehouse (see [Connecting to a Warehouse](/tidb-cloud-lake/guides/warehouse.md#connecting)) + +### Step 2. Modify Grafana Configuration + +Add the following lines to your `grafana.ini` file: + +```ini +[plugins] +allow_loading_unsigned_plugins = databend-datasource +``` + +### Step 3. Install the Grafana Databend Data Source Plugin + +1. Find the latest release on [GitHub Release](https://github.com/databendlabs/grafana-databend-datasource/releases). + +2. 
Get the download URL for the plugin zip package, for example, `https://github.com/databendlabs/grafana-databend-datasource/releases/download/v1.0.2/databend-datasource-1.0.2.zip`. + +3. Get the Grafana plugins folder and unzip the downloaded zip package into it: + +```shell +curl -fLo /tmp/grafana-databend-datasource.zip https://github.com/databendlabs/grafana-databend-datasource/releases/download/v1.0.2/databend-datasource-1.0.2.zip +unzip /tmp/grafana-databend-datasource.zip -d /var/lib/grafana/plugins +rm /tmp/grafana-databend-datasource.zip +``` + +4. Restart Grafana to load the plugin. + +5. Navigate to the **Plugins** page in the Grafana UI, for example, `http://localhost:3000/plugins`, and ensure the plugin is installed. + +![Plugins](/img/integration/grafana-plugins.png) +![Plugin detail](/img/integration/grafana-plugin-detail.png) + +### Step 4. Configure Data Source + +1. Go to the `Add new connection` page, for example, `http://localhost:3000/connections/add-new-connection?search=databend`, search for `databend`, and select it. + +2. Click **Add new data source** on the top right corner of the page. + +3. Input the `DSN` field for your Databend instance. For example: + - Self-hosted: `databend://root:@localhost:8000?sslmode=disable` + - Databend Cloud: `databend://cloudapp:******@tnxxxxxxx.gw.aws-us-east-2.default.databend.com:443/default?warehouse=xsmall-fsta` + +4. Optionally, input the `SQL User Password` field to override the password in the `DSN` field. + +5. Click **Save & test**. If the page displays "Data source is working", the data source has been successfully created. + +### Step 5. Test Queries + +1. Create a new dashboard and add a panel. + +2. Select your Databend data source. + +3. Write SQL queries to retrieve and visualize your data. + +4. Configure the panel visualization options as needed. diff --git a/tidb-cloud-lake/guides/how-data-sharing-works.md b/tidb-cloud-lake/guides/how-data-sharing-works.md new file mode 100644 index 0000000000000..d6de1040ee5be --- /dev/null +++ b/tidb-cloud-lake/guides/how-data-sharing-works.md @@ -0,0 +1,135 @@ +--- +title: How Databend Data Sharing Works +--- + +## What is Data Sharing? + +Different teams need different parts of the same data. Traditional solutions copy data multiple times - expensive and hard to maintain. + +Databend's **[ATTACH TABLE](/tidb-cloud-lake/sql/attach-table.md)** solves this elegantly: create multiple "views" of the same data without copying it. This leverages Databend's **true compute-storage separation** - whether using cloud storage or on-premise object storage: **store once, access everywhere**. + +Think of ATTACH TABLE like computer shortcuts - they point to the original file without duplicating it. + +``` + Object Storage (S3, MinIO, Azure, etc.) + ┌─────────────┐ + │ Your Data │ + └──────┬──────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Marketing │ │ Finance │ │ Sales │ +│ Team View │ │ Team View │ │ Team View │ +└─────────────┘ └─────────────┘ └─────────────┘ +``` + +## How to Use ATTACH TABLE + +**Step 1: Find your data location** +```sql +SELECT snapshot_location FROM FUSE_SNAPSHOT('default', 'company_sales'); +-- Result: 1/23351/_ss/... 
→ Data at s3://your-bucket/1/23351/ +``` + +**Step 2: Create team-specific views** +```sql +-- Marketing: Customer behavior analysis +ATTACH TABLE marketing_view (customer_id, product, amount, order_date) +'s3://your-bucket/1/23351/' CONNECTION = (ACCESS_KEY_ID = 'xxx', SECRET_ACCESS_KEY = 'yyy'); + +-- Finance: Revenue tracking +ATTACH TABLE finance_view (order_id, amount, profit, order_date) +'s3://your-bucket/1/23351/' CONNECTION = (ACCESS_KEY_ID = 'xxx', SECRET_ACCESS_KEY = 'yyy'); + +-- HR: Employee info without salaries +ATTACH TABLE hr_employees (employee_id, name, department) +'s3://data/1/23351/' CONNECTION = (...); + +-- Development: Production structure without sensitive data +ATTACH TABLE dev_customers (customer_id, country, created_date) +'s3://data/1/23351/' CONNECTION = (...); +``` + +**Step 3: Query independently** +```sql +-- Marketing analyzes trends +SELECT product, COUNT(*) FROM marketing_view GROUP BY product; + +-- Finance tracks profit +SELECT order_date, SUM(profit) FROM finance_view GROUP BY order_date; +``` + +## Key Benefits + +**Real-Time Updates**: When source data changes, all attached tables see it instantly +```sql +INSERT INTO company_sales VALUES (1001, 501, 'Laptop', 1299.99, 299.99, 'user@email.com', '2025-01-20'); +SELECT COUNT(*) FROM marketing_view WHERE order_date = '2024-01-20'; -- Returns: 1 +``` + +**Column-Level Security**: Teams only see what they need - Marketing can't see profit, Finance can't see customer emails + +**Strong Consistency**: Never read partial updates, always see complete snapshots - perfect for financial reporting and compliance + +**Full Performance**: All indexes work automatically, same speed as regular tables + +## Why This Matters + +| Traditional Approach | Databend ATTACH TABLE | +|---------------------|----------------------| +| Multiple data copies | Single copy shared by all | +| ETL delays, sync issues | Real-time, always current | +| Complex maintenance | Zero maintenance | +| More copies = more security risk | Fine-grained column access | +| Slower due to data movement | Full optimization on original data | + +## How It Works Under the Hood + +``` +Query: SELECT product, SUM(amount) FROM marketing_view GROUP BY product + +┌─────────────────────────────────────────────────────────────────┐ +│ Query Execution Flow │ +└─────────────────────────────────────────────────────────────────┘ + + User Query + │ + ▼ +┌───────────────────┐ ┌─────────────────────────────────────┐ +│ 1. Read Snapshot │───►│ s3://bucket/1/23351/_ss/ │ +│ Metadata │ │ Get current table state │ +└───────────────────┘ └─────────────────────────────────────┘ + │ + ▼ +┌───────────────────┐ ┌─────────────────────────────────────┐ +│ 2. Apply Column │───►│ Filter: customer_id, product, │ +│ Filter │ │ amount, order_date │ +└───────────────────┘ └─────────────────────────────────────┘ + │ + ▼ +┌───────────────────┐ ┌─────────────────────────────────────┐ +│ 3. Check Stats & │───►│ • Segment min/max values │ +│ Indexes │ │ • Bloom filters │ +└───────────────────┘ │ • Aggregate indexes │ + │ └─────────────────────────────────────┘ + ▼ +┌───────────────────┐ ┌─────────────────────────────────────┐ +│ 4. Smart Data │───►│ Skip irrelevant blocks │ +│ Fetching │ │ Download only needed data from _b/ │ +└───────────────────┘ └─────────────────────────────────────┘ + │ + ▼ +┌───────────────────┐ ┌─────────────────────────────────────┐ +│ 5. 
Local │───►│ Full optimization & parallelism │ +│ Execution │ │ Process with all available indexes │ +└───────────────────┘ └─────────────────────────────────────┘ + │ + ▼ + Results: Product sales summary +``` + +Multiple Databend clusters can execute this flow simultaneously without coordination - true compute-storage separation in action. + +ATTACH TABLE represents a fundamental shift: **from copying data for each use case to one copy with many views**. Whether in cloud or on-premise environments, Databend's architecture enables powerful, efficient data sharing while maintaining enterprise-grade consistency and security. diff --git a/tidb-cloud-lake/guides/how-fuse-engine-works.md b/tidb-cloud-lake/guides/how-fuse-engine-works.md new file mode 100644 index 0000000000000..cd908ae0f0b0e --- /dev/null +++ b/tidb-cloud-lake/guides/how-fuse-engine-works.md @@ -0,0 +1,231 @@ +--- +title: How Fuse Engine Works +--- + +## Fuse Engine + +Fuse Engine is Databend's core storage engine, optimized for managing **petabyte-scale** data efficiently on **cloud object storage**. By default, tables created in Databend automatically use this engine (`ENGINE=FUSE`). Inspired by Git, its snapshot-based design enables powerful data versioning (like Time Travel) and provides **high query performance** through advanced pruning and indexing. + +This document explains its core concepts and how it works. + + +## Core Concepts + +Fuse Engine organizes data using three core structures, mirroring Git: + +* **Snapshots (Like Git Commits):** Immutable references defining the table's state at a point in time by pointing to specific Segments. Enables Time Travel. +* **Segments (Like Git Trees):** Collections of Blocks with summary statistics used for fast data skipping (pruning). Can be shared across Snapshots. +* **Blocks (Like Git Blobs):** Immutable data files (Parquet format) holding the actual rows and detailed column-level statistics for fine-grained pruning. + + +``` + Table HEAD + │ + ▼ + ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ + │ SEGMENT A │◄────│ SNAPSHOT 2 │────►│ SEGMENT B │ + │ │ │ Previous: │ │ │ + └───────┬───────┘ │ SNAPSHOT 1 │ └───────┬───────┘ + │ └───────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌───────────────┐ │ + │ │ SNAPSHOT 1 │ │ + │ │ │ │ + │ └───────────────┘ │ + │ │ + ▼ ▼ + ┌───────────────┐ ┌───────────────┐ + │ BLOCK 1 │ │ BLOCK 2 │ + │ (cloud.txt) │ │(warehouse.txt)│ + └───────────────┘ └───────────────┘ +``` + + + +## How Writing Works + +When you add data to a table, Fuse Engine creates a chain of objects. 
Let's walk through this process step by step: + +### Step 1: Create a table + +```sql +CREATE TABLE git(file VARCHAR, content VARCHAR); +``` + +At this point, the table exists but contains no data: + +``` +(Empty table with no data) +``` + +### Step 2: Insert first data + +```sql +INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud'); +``` + +After the first insert, Fuse Engine creates the initial snapshot, segment, and block: + +``` + Table HEAD + │ + ▼ + ┌───────────────┐ + │ SNAPSHOT 1 │ + │ │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ SEGMENT A │ + │ │ + └───────┬───────┘ + │ + ▼ + ┌───────────────┐ + │ BLOCK 1 │ + │ (cloud.txt) │ + └───────────────┘ +``` + +### Step 3: Insert more data + +```sql +INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse'); +``` + +When we insert more data, Fuse Engine creates a new snapshot that references both the original segment and a new segment: + +``` + Table HEAD + │ + ▼ + ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ + │ SEGMENT A │◄────│ SNAPSHOT 2 │────►│ SEGMENT B │ + │ │ │ Previous: │ │ │ + └───────┬───────┘ │ SNAPSHOT 1 │ └───────┬───────┘ + │ └───────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌───────────────┐ │ + │ │ SNAPSHOT 1 │ │ + │ │ │ │ + │ └───────────────┘ │ + │ │ + ▼ ▼ + ┌───────────────┐ ┌───────────────┐ + │ BLOCK 1 │ │ BLOCK 2 │ + │ (cloud.txt) │ │(warehouse.txt)│ + └───────────────┘ └───────────────┘ +``` + +## How Reading Works + +When you query data, Fuse Engine uses smart pruning to find your data efficiently: + +``` +Query: SELECT * FROM git WHERE file = 'cloud.txt'; + + Table HEAD + │ + ▼ + ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ + │ SEGMENT A │◄────│ SNAPSHOT 2 │────►│ SEGMENT B │ + │ CHECK │ │ │ │ CHECK │ + └───────┬───────┘ └───────────────┘ └───────────────┘ + │ ✗ + │ (Skip - doesn't contain + │ 'cloud.txt') + ▼ + ┌───────────────┐ + │ BLOCK 1 │ + │ CHECK │ + └───────┬───────┘ + │ + │ ✓ (Contains 'cloud.txt') + ▼ + Read this block +``` + +### Smart Pruning Process + +``` +┌─────────────────────────────────────────┐ +│ Query: WHERE file = 'cloud.txt' │ +└─────────────────┬───────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Check SEGMENT A │ +│ Min file value: 'cloud.txt' │ +│ Max file value: 'cloud.txt' │ +│ │ +│ Result: ✓ Might contain 'cloud.txt' │ +└─────────────────┬───────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Check SEGMENT B │ +│ Min file value: 'warehouse.txt' │ +│ Max file value: 'warehouse.txt' │ +│ │ +│ Result: ✗ Cannot contain 'cloud.txt' │ +└─────────────────┬───────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Check BLOCK 1 in SEGMENT A │ +│ Min file value: 'cloud.txt' │ +│ Max file value: 'cloud.txt' │ +│ │ +│ Result: ✓ Contains 'cloud.txt' │ +└─────────────────┬───────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Read only BLOCK 1 │ +└─────────────────────────────────────────┘ +``` + +## Snapshot-Based Features + +Fuse Engine's snapshot architecture enables powerful data management capabilities: + +### Time Travel + +Query data as it existed at any point in time. Enables data branching, tagging, and governance with complete audit trails and error recovery. + +### Zero-Copy Schema Evolution + +Modify your table's structure (add columns, drop columns, rename, change types) **without rewriting any underlying data files**. + +- Changes are metadata-only operations recorded in new Snapshots. 
+- This is instantaneous, requires no downtime, and avoids costly data migration tasks. Older data remains accessible with its original schema. + + +## Advanced Indexing for Query Acceleration (Fuse Engine) + +Beyond basic block/segment pruning using statistics, Fuse Engine offers specialized secondary indexes to further accelerate specific query patterns: + +| Index Type | Brief Description | Accelerates Queries Like... | Example Query Snippet | +| :------------------ | :-------------------------------------------------------- | :-------------------------------------------------- | :-------------------------------------- | +| **Aggregate Index** | Pre-computes aggregate results for specified groups | Faster `COUNT`, `SUM`, `AVG`... + `GROUP BY` | `SELECT COUNT(*)... GROUP BY city` | +| **Full-Text Index** | Inverted index for fast keyword search within text | Text search using `MATCH` (e.g., logs) | `WHERE MATCH(log_entry, 'error')` | +| **JSON Index** | Indexes specific paths/keys within JSON documents | Filtering on specific JSON paths/values | `WHERE event_data:user.id = 123` | +| **Bloom Filter Index** | Probabilistic check to quickly skip non-matching blocks | Fast point lookups (`=`) & `IN` list filtering | `WHERE user_id = 'xyz'` | + + + +## Comparison: Databend Fuse Engine vs. Apache Iceberg + +_**Note:** This comparison focuses specifically on **table format features**. As Databend's native table format, Fuse evolves, aiming to improve **usability and performance**. Features shown are current; expect changes._ + +| Feature | Apache Iceberg | Databend Fuse Engine | +| :---------------------- | :--------------------------------- | :----------------------------------- | +| **Metadata Structure** | Manifest Lists -> Manifest Files -> Data Files | **Snapshot** -> Segments -> Blocks | +| **Statistics Levels** | File-level (+Partition) | **Multi-level** (Snapshot, Segment, Block) → Finer pruning | +| **Pruning Power** | Good (File/Partition stats) | **Excellent** (Multi-level stats + Secondary indexes) | +| **Schema Evolution** | Supported (Metadata change) | **Zero-Copy** (Metadata-only, Instant) | +| **Data Clustering** | Sorting (On write) | **Automatic** Optimization (Background) | +| **Streaming Support** | Basic streaming ingestion | **Advanced Incremental** (Insert/Update tracking) | \ No newline at end of file diff --git a/tidb-cloud-lake/guides/how-json-variant-works.md b/tidb-cloud-lake/guides/how-json-variant-works.md new file mode 100644 index 0000000000000..66f7be844d8a5 --- /dev/null +++ b/tidb-cloud-lake/guides/how-json-variant-works.md @@ -0,0 +1,191 @@ +--- +title: "How Databend JSON (Variant) Works" +sidebar_label: "How Databend JSON Works" +--- + +See also: + +- [Variant Data Type](/tidb-cloud-lake/sql/variant.md) +- [Semi-Structured Functions](/tidb-cloud-lake/guides/load-semi-structured-formats.md) + +Databend reimagines JSON analytics by pairing a native binary layout with automatic JSON indexing so semi-structured data behaves like first-class columns. + +## Why Variant Matters + +Databend keeps JSON flexible while delivering MPP speed: you ingest documents as-is, query with familiar SQL, and the engine stitches together the performance story behind the scenes. Two pillars make it possible: + +- A compact **JSONB** layout keeps types visible to the execution engine. +- Automatic **virtual columns**—Databend’s JSON indexes—surface hot paths without manual work. 
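+A minimal sketch of those two pillars in practice (the `orders` table and values here are illustrative, matching the running example used later in this guide):
+
+```sql
+-- Ingest a document as-is into a VARIANT column, then query nested paths in SQL
+CREATE TABLE orders (id BIGINT, data VARIANT);
+
+INSERT INTO orders
+VALUES (1, PARSE_JSON('{"user": {"id": 123, "profile": {"name": "Ada"}}, "items": [{"price": 19.9}]}'));
+
+SELECT data['user']['id'] AS user_id,
+       data['items'][0]['price'] AS first_item_price
+FROM orders;
+```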
+ +From storage to queries, the rest of this guide follows how those two ideas turn a raw JSON payload (think `orders.data`) into optimised, typed columns. + +## JSON Storage Layout + +Databend stores Variant values in JSONB, a binary format optimised for analytics. In practice this means: + +- **Typed storage** – numbers, booleans, timestamps, and decimals remain native, so comparisons stay binary-safe. +- **Predictable layout** – fields carry length prefixes and canonical key order, eliminating reparsing overhead. +- **Zero-copy access** – operators read JSONB buffers directly during scans and sorts instead of rebuilding JSON text. + +Every Variant column keeps the raw JSONB document for fidelity. When paths like `data['user']['id']` show up repeatedly, Databend tucks them into typed sidecar columns ready for pushdown. + +## Automatic JSON Index Generation + +When new data lands in Databend, a lightweight indexing pipeline immediately scans the JSON blocks to discover hot paths worth materialising as virtual columns—Databend’s built-in JSON indexes. + +### Ingestion Flow + +Databend inspects the incoming batch and converts recurring access patterns into typed columns: + +``` +┌───────────────────────────────────────────────┐ +│ Variant Ingestion Flow │ +├──────────────┬────────────────────────────────┤ +│ Sample Rows │ Peek at the first rows in block │ +│ Detect Paths │ Keep stable leaf key paths │ +│ Infer Types │ Pick native column types │ +│ Materialize │ Write values to virtual Parquet │ +│ Register │ Attach metadata to base column │ +└──────────────┴────────────────────────────────┘ +``` + +### Lightweight by Design + +The pipeline relies on a handful of lightweight heuristics: + +``` +┌─────────────────────────────┬──────────────────────────────────────────────┐ +│ Step │ Heuristic │ +├─────────────────────────────┼──────────────────────────────────────────────┤ +│ Sampling │ Inspect only the first 10 rows of each block │ +│ Null & non-leaf filtering │ Skip paths dominated by NULL or pointing to │ +│ │ objects/arrays │ +│ Stability check │ Promote only leaf paths that stay consistent │ +│ │ across the sample (max 1,000 per block) │ +│ Deduplication │ Use hashing to avoid analysing the same path │ +│ │ repeatedly │ +│ Fallback │ Keep the original JSONB document when no │ +│ │ candidate survives │ +└─────────────────────────────┴──────────────────────────────────────────────┘ +``` + +The result: you load JSON once, and recurring patterns quietly turn into optimised, typed columns with no DDL and no tuning. + +### Virtual Columns Are Automatic JSON Indexes + +In this context, a “virtual column” is simply **Databend’s JSON index**. The ingestion flow decides whether a path such as `data['items'][0]['price']` is stable enough, infers a native type, and writes those values to a columnar sidecar with metadata—no DDL, no knobs. Nested JSON remains in compact JSONB form, while primitive paths become native numbers, strings, or booleans. + +``` +Raw JSON block ──(auto sampling)──▶ Candidate paths ──(stable?)──▶ JSON index +``` + +Instead of building a separate B-tree, Databend snapshots the values for a JSON path into a columnar structure: + +``` +JSON Path ───────────▶ Virtual Column (typed values + stats + location) +``` + +During queries the planner can jump directly to those pre-extracted values, just like hitting an index, while still falling back to the full JSON if an index entry is missing. 
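+For example, both of the following queries are valid; the first touches a stable, frequently queried path that is a strong candidate for a virtual column, while the second reads an ad-hoc path and simply falls back to the raw JSONB document (the paths are illustrative):
+
+```sql
+-- Likely served from the promoted (virtual) column
+SELECT data['user']['id'] FROM orders;
+
+-- No promoted column for this one-off path: evaluated against the stored JSONB
+SELECT data['debug']['trace'] FROM orders;
+```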
+ +### JSON Index Metadata + +Metadata stored alongside each block summarises the extra columns: + +``` +┌────────────────────────────┬───────────────────────┐ +│ Virtual Column Metadata │ Example │ +├────────────────────────────┼───────────────────────┤ +│ Column Id & JSON Path │ v123 -> ['user']['id'] │ +│ Type Code │ UInt64 / String │ +│ Byte Offset & Length │ Where values live │ +│ Row Count │ Matches base block │ +│ Statistics │ Min / Max / NDV │ +└────────────────────────────┴───────────────────────┘ +``` + +The writer packages these details into the table snapshot and stores the sidecar alongside the main block. Each entry remembers the JSON path, native type, byte offsets, and statistics so Databend can jump straight to the extracted values—or fall back to the original JSON—on demand. + +## Query Execution with JSON Indexes + +Once the indexes exist, the read path reduces to three quick decisions: + +``` +┌──────────────┐ rewrite paths ┌────────────────────┐ +│ SQL Planner │------------------>│ Virtual Column Map │ +└──────┬───────┘ └─────────┬──────────┘ + │ pushdown request │ per-block check + ▼ ▼ +┌──────────────┐ has virtual? ┌────────────────────┐ +│ Fuse Storage │----------------->│ Virtual File Read │ +└──────┬───────┘ │ └─────────┬──────────┘ + │ no └------------------┘ fallback + ▼ +┌──────────────┐ +│ JSONB Reader │ +└──────┬───────┘ + ▼ +┌──────────────┐ +│ Query Output │ +└──────────────┘ +``` + +- During planning, Databend rewrites calls such as `get_by_keypath` into direct virtual-column reads whenever metadata says an index exists. +- Storage hits the virtual column when it exists and reads only that Parquet slice, and it can even skip the original JSON column when every requested path is indexed. +- Otherwise it falls back to evaluating `get_by_keypath` on the JSONB column, keeping semantics intact. +- Filters, projections, and statistics operate on native types instead of reparsing JSON strings. + +Behind the scenes Databend keeps track of which JSON path produced each virtual column, so it knows exactly when the raw document can be skipped and when it needs to re-open it. + +## Working with Variant Data + +With indexing handled behind the scenes, you interact with Variant columns using familiar syntax and functions. + +### Inspect Virtual Columns + +Use [`SHOW VIRTUAL COLUMNS`](/tidb-cloud-lake/sql/show-virtual-columns.md) to list the automatically generated virtual columns for a table when you want to verify what JSON paths Databend has materialised. + +### Access Syntax + +Databend understands both Snowflake-style and PostgreSQL-style selectors; whichever style you prefer, the engine routes them through the same key-path parser and reuses the JSON indexes. 
Continuing with an `orders` example, you can reach nested fields like this: + +```sql title="Snowflake-style examples" +SELECT data['user']['profile']['name'], + data:user:profile.settings.theme, + data['items'][0]['price'] +FROM orders; +``` + +```sql title="PostgreSQL-style examples" +SELECT data->'user'->'profile'->>'name', + data#>>'{user,profile,settings,theme}', + data @> '{"user":{"id":123}}' +FROM orders; +``` + +### Function Highlights + +Beyond path accessors, Databend ships a rich Variant toolkit: + +- **Parsing & casting**: `parse_json`, `try_parse_json`, `to_variant`, `to_jsonb_binary` +- **Navigation & projection**: `get_path`, `get_by_keypath`, `flatten`, arrow (`->`, `->>`), path (`#>`, `#>>`) and containment operators (`@>`, `?`) +- **Modification**: `object_insert`, `object_remove_keys`, concatenation (`||`), `array_*` helpers +- **Analytics**: `json_extract_keys`, `json_length`, `jsonb_array_elements`, aggregates such as `json_array_agg` + +All functions operate directly on JSONB buffers inside the vectorised engine. + +## Performance Characteristics + +- Internal benchmarks vs. raw JSON scanning: + - Single-path lookups: **≈3× faster**, **≈26×** less data scanned. + - Multi-path projections: **≈1.4× faster**, **≈5.5×** less data read. + - Predicate pushdown composes with bloom/inverted indexes to prune blocks. +- The steadier the JSON shape, the more paths qualify for indexing. + +## Databend Advantages for Variant Data + +- **Snowflake-compatible surface area** – Bring existing queries and UDFs over intact. +- **Native JSONB execution** – Typed encoding plus vectorised operators avoid string shuffling. +- **Automatic JSON indexes** – Sampling, metadata, and pushdown make semi-structured data feel structured. +- **Operational efficiency** – Virtual blocks share lifecycle tooling with regular Fuse blocks, keeping storage and compute predictable. + +With automatic JSON indexing, Databend narrows the gap between flexible documents and high-performance analytics—semi-structured data becomes a first-class citizen in your warehouse. diff --git a/tidb-cloud-lake/guides/how-optimizer-works.md b/tidb-cloud-lake/guides/how-optimizer-works.md new file mode 100644 index 0000000000000..dee69a42787dd --- /dev/null +++ b/tidb-cloud-lake/guides/how-optimizer-works.md @@ -0,0 +1,249 @@ +--- +title: How Databend Optimizer Works +--- + +Databend's query optimizer orchestrates a series of transformations that turn SQL text into an executable plan. The optimizer builds an abstract representation of the query, enriches it with real-time statistics, applies rule-based rewrites, explores join alternatives, and finally picks the cheapest physical operators. + +The same optimizer pipeline powers analytic reporting, JSON search, vector retrieval, and geospatial search—**Databend maintains one optimizer that understands every data type it stores.** + +## What Makes Databend’s Optimizer Tick + +- Statistics stay up to date automatically: when data is written, Databend immediately maintains row counts, value ranges, and NDVs, so the optimizer can use fresh information for selectivity, join ordering, and costing without any manual maintenance. +- Shape first, cost second: the pipeline decorrelates, pushes predicates/limits, and splits aggregates before global search, shrinking the space and moving work to storage. +- DP + Cascades together: DPhpy finds good join orders; a memo‑driven Cascades pass selects the cheapest physical operators over the same SExpr memo. 
+- Distribution‑aware by design: planning decides local vs distributed and rewrites broadcasts into key‑based shuffles to avoid hotspots. + +## Example Query + +We’ll use the following analytics query and show how each stage transforms it. + +```sql +WITH recent_orders AS ( + SELECT * + FROM orders + WHERE order_date >= DATE_TRUNC('month', today()) - INTERVAL '3' MONTH + AND fulfillment_status <> 'CANCELLED' +) +SELECT c.region, + COUNT(*) AS order_count, + COUNT(o.id) AS row_count, + COUNT(DISTINCT o.product_id) AS product_count, + MIN(o.total_amount) AS min_amount, + AVG(o.total_amount) AS avg_amount +FROM recent_orders o +JOIN customers c ON o.customer_id = c.id +LEFT JOIN products p ON o.product_id = p.id +WHERE c.status = 'ACTIVE' + AND o.total_amount > 0 + AND p.is_active = TRUE + AND EXISTS ( + SELECT 1 + FROM support_tickets t + WHERE t.customer_id = c.id + AND t.created_at > DATE_TRUNC('month', today()) - INTERVAL '1' MONTH + ) +GROUP BY c.region +HAVING COUNT(*) > 100 +ORDER BY order_count DESC +LIMIT 10; +``` + +## Phase 1: Prep & Stats + +Phase 1 makes the query easy to reason about and equips it with the data needed for costing. On our example, the optimizer performs these concrete steps: + +### 1. Flatten the subquery + +Turn the `EXISTS (...)` check into a regular join so the rest of the pipeline sees a single join tree. + +``` +# Before (correlated) +customers ─┐ + ├─ JOIN ─ orders +support ───┘ │ + └─ EXISTS (references customers) + +# After (semi-join) +customers ─┐ +support ───┴─ SEMI JOIN ─ orders +``` + +Equivalent SQL (semantics preserved): + +```sql +FROM ( + SELECT * + FROM orders + WHERE order_date >= DATE_TRUNC('month', today()) - INTERVAL '3' MONTH + AND fulfillment_status <> 'CANCELLED' +) o +JOIN customers c ON o.customer_id = c.id +LEFT JOIN products p ON o.product_id = p.id +JOIN ( + SELECT DISTINCT customer_id + FROM support_tickets + WHERE created_at > DATE_TRUNC('month', today()) - INTERVAL '1' MONTH +) t ON t.customer_id = c.id +``` + +### 2. Check metadata shortcuts + +If an aggregate such as `MIN(o.total_amount)` has no filtering, the optimizer fetches it from table statistics instead of scanning: + +``` +-- Conceptual replacement when no filters apply +SELECT MIN(total_amount) +FROM orders + +# becomes + +SELECT table_stats.min_total_amount +``` + +In our query filters apply, so we keep the real computation. + +### 3. Attach statistics + +During planning, Databend collects row counts, value ranges, and distinct counts for the scanned tables. No SQL changes, but later selectivity and cost estimates stay accurate without any `ANALYZE` jobs. + +### 4. Normalize aggregates + +After statistics are attached, the optimizer rewrites counters it can share. `COUNT(o.id)` becomes `COUNT(*)`, so the engine maintains a single counter for both usages. Only the SELECT list changes: + +```sql +SELECT c.region, + COUNT(*) AS order_count, + COUNT(*) AS row_count, -- was COUNT(o.id) + COUNT(DISTINCT o.product_id) AS product_count, + MIN(o.total_amount) AS min_amount, + AVG(o.total_amount) AS avg_amount +... +``` + +## Phase 2: Refine the Logic + +Phase 2 runs targeted rewrites that keep only the work we truly need: + +### 1. Push filters/limits down + +``` +# Before +Filter (o.total_amount > 0) +└─ Scan (recent_orders) + +# After +Scan (recent_orders, pushdown_predicates=[total_amount > 0]) +``` + +Sorting with a limit also tightens up: + +``` +# Before +Limit (10) +└─ Sort (order_count DESC) + └─ Join (...) 
+ +# After +Sort (order_count DESC) +└─ Limit (10) + └─ Join (...) +``` + +### 2. Drop redundancies + +``` +# Before +Filter (1 = 1 AND c.status = 'ACTIVE') +└─ ... + +# After +Filter (c.status = 'ACTIVE') +└─ ... +``` + +### 3. Split aggregates + +``` +# Before +Aggregate (COUNT/AVG) +└─ Scan (recent_orders) + +# After +Aggregate (final) +└─ Aggregate (partial) + └─ Scan (recent_orders) +``` + +Partial aggregates run close to the data, then a single final step merges the results. + +### 4. Push filters into the CTE + +Predicates that reference only CTE columns are pushed inside the definition of `recent_orders`, shrinking the data before joins: + +```sql +WITH recent_orders AS ( + SELECT * + FROM orders + WHERE order_date >= DATE_TRUNC('month', today()) - INTERVAL '3' MONTH + AND fulfillment_status <> 'CANCELLED' + AND total_amount > 0 -- pushed from outer query +) +``` + +## Phase 3: Cost & Physical Plan + +With a tidy logical plan and fresh statistics, the optimizer makes three decisions: + +### 1. Choose the join order + +A statistics-guided dynamic program (`DPhpyOptimizer`) evaluates join permutations. It prefers building hash tables on the smaller filtered tables (`customers`, `products`, `support_tickets`) while the large fact table (`recent_orders`) probes them: + +``` + customers products + \ / + HASH JOIN (build) + | + recent_orders (probe) + | + SEMI JOIN support_tickets +``` + +### 2. Tighten join semantics + +Rule-based passes adjust joins discovered above. + +#### a. Turn safe LEFT joins into INNER joins + +Our query starts with `LEFT JOIN products p`, but the predicate `p.is_active = TRUE` guarantees we only keep rows with a matching product. The optimizer flips the join type: + +``` +# Before +recent_orders ──⊗── products (LEFT) + filter: p.is_active = TRUE + +# After +recent_orders ──⋈── products (INNER) +``` + +#### b. Drop duplicate predicates + +If a join condition repeats (for example `o.customer_id = c.id` listed twice), `DeduplicateJoinConditionOptimizer` keeps just one copy so the executor evaluates it once. + +#### c. Optionally swap join sides + +If join reordering remains enabled, `CommuteJoin` can flip the join inputs so the optimizer aligns with the desired build/probe orientation (for example, making sure the smaller table builds the hash table or matching a distribution strategy): + +``` +# Before # After (smaller table builds) +customers ──⋈── recent_orders recent_orders ──⋈── customers +``` + +### 3. Pick the physical plan and distribution + +`CascadesOptimizer` picks between hash, merge, or nested-loop implementations using Databend’s cost model. The pipeline also decides whether the plan should remain local; if a warehouse cluster is available and joins are large, broadcast exchanges are rewritten into hash shuffles so work spreads evenly. Final cleanups drop redundant projections and unused CTEs. + +## Observability + +- `EXPLAIN` shows the final optimized plan. +- `EXPLAIN PIPELINE` reveals the execution topology. +- `SET enable_optimizer_trace = 1` records every optimizer step in the query log. diff --git a/tidb-cloud-lake/guides/integrate-with-amazon-s3.md b/tidb-cloud-lake/guides/integrate-with-amazon-s3.md new file mode 100644 index 0000000000000..20e871f22e0cf --- /dev/null +++ b/tidb-cloud-lake/guides/integrate-with-amazon-s3.md @@ -0,0 +1,146 @@ +--- +title: Amazon S3 +--- + +The Amazon S3 data integration enables you to import files from S3 buckets into Databend. 
It supports CSV, Parquet, and NDJSON file formats, with options for one-time imports or continuous ingestion that automatically polls for new files. + +## Supported File Formats + +| Format | Description | +|---------|--------------------------------------------------------------------| +| CSV | Comma-separated values with configurable delimiters and headers | +| Parquet | Columnar storage format, efficient for analytical workloads | +| NDJSON | Newline-delimited JSON, one JSON object per line | + +## Creating an S3 Data Source + +1. Navigate to **Data** > **Data Sources** and click **Create Data Source**. + +2. Select **AWS - Credentials** as the service type, and fill in the credentials: + +| Field | Required | Description | +|----------------|----------|--------------------------------------| +| **Name** | Yes | A descriptive name for this data source | +| **Access Key** | Yes | AWS Access Key ID | +| **Secret Key** | Yes | AWS Secret Access Key | + +![Create S3 Data Source](/img/cloud/dataintegration/create-s3-datasource.png) + +3. Click **Test Connectivity** to verify the credentials. If the test succeeds, click **OK** to save the data source. + +:::tip +The AWS credentials must have read access to the target S3 bucket. If you plan to use the **Clean Up Original Files** option, write and delete permissions are also required. +::: + +## Creating an S3 Integration Task + +### Step 1: Basic Info + +1. Navigate to **Data** > **Data Integration** and click **Create Task**. + +2. Select an S3 data source, then configure the basic settings: + +| Field | Required | Description | +|--------------------|----------|--------------------------------------------------------------------------------------------------| +| **Data Source** | Yes | Select an existing AWS data source from the dropdown | +| **Name** | Yes | A name for this integration task | +| **File Path** | Yes | S3 URI with optional wildcard pattern (e.g., `s3://mybucket/data/2025-*.csv`) | +| **File Type** | Auto | Auto-detected from file extension. Supported: CSV, Parquet, NDJSON | + +![Create S3 Task - Basic Info](/img/cloud/dataintegration/create-s3-task-basic-info.png) + +#### CSV Options + +When the file type is CSV, additional options are available: + +| Field | Default | Description | +|----------------------|---------|----------------------------------------------------------------| +| **Record Delimiter** | `\n` | Line separator. Options: `\n`, `\r`, `\r\n` | +| **Field Delimiter** | `,` | Column separator. Supports custom values | +| **Has Header** | Yes | Whether the first row contains column names. If disabled, columns are auto-named as `c1`, `c2`, `c3`, etc. | + +#### File Path Patterns + +The file path supports wildcard patterns for matching multiple files: + +``` +s3://mybucket/data/2025-*.csv # All CSV files starting with "2025-" +s3://mybucket/logs/*.parquet # All Parquet files in the logs directory +s3://mybucket/events/data.ndjson # A single specific file +``` + +### Step 2: Preview Data + +After configuring the basic settings, click **Next** to preview the source data. + +![S3 Preview Data](/img/cloud/dataintegration/s3-task-preview-step.png) + +The system reads the first matching file and displays: +- Sample data with column names and types +- A list of matching files (up to 25 files) with their sizes + +:::note +Files larger than 10GB are skipped during preview. Only the first 25 matching files are displayed. 
+::: + +### Step 3: Set Target Table + +Configure the destination in Databend: + +| Field | Description | +|---------------------|--------------------------------------------------------------------| +| **Warehouse** | Select the target Databend Cloud warehouse for running the import | +| **Target Database** | Choose the target database in Databend | +| **Target Table** | The table name in Databend | + +![S3 Set Target Table](/img/cloud/dataintegration/s3-task-set-target-table.png) + +The system auto-detects columns from the source files. You can review and edit column names and types before proceeding. + +#### Ingestion Options + +| Option | Default | Description | +|------------------------------|----------|--------------------------------------------------------------------------------------------------| +| **Continuous Ingestion** | On | When enabled, the system periodically (every 30 seconds) polls the S3 path and imports new files | +| **Error Handling** | Abort | **Abort**: Stop on first error. **Continue**: Skip failed rows and continue importing | +| **Clean Up Original Files** | Off | When enabled, deletes source files from S3 after successful import | +| **Allow Duplicate Imports** | Off | When enabled, allows re-importing files that have already been imported | + +:::tip +Enable **Continuous Ingestion** when new files are regularly added to the S3 path and you want them automatically loaded into Databend. Disable it for one-time imports. +::: + +Click **Create** to finalize the integration task. + +## Task Behavior + +| Continuous Ingestion | Behavior | +|----------------------|---------------------------------------------------------------------------------------------------| +| On | Runs continuously, polling S3 every 30 seconds for new files and importing them automatically. | +| Off | Imports matching files once and stops. Already-imported files are skipped unless **Allow Duplicate Imports** is enabled. | + +## Advanced Configuration + +### Continuous Ingestion + +When enabled, the task runs as a long-lived process that periodically scans the S3 path for new files. Each cycle: + +1. Lists objects matching the file path pattern +2. Identifies new files not yet imported +3. Imports new files into the target table using `COPY INTO` +4. Records import results in the task history + +This is useful for data pipelines where upstream systems continuously write new files to S3. + +### Error Handling + +- **Abort** (default): The import stops at the first error encountered. Use this when data quality is critical and you want to investigate any issues before proceeding. +- **Continue**: Skips rows that cause errors and continues importing the remaining data. Use this when partial imports are acceptable and you want to maximize data throughput. + +### Clean Up Original Files (PURGE) + +When enabled, source files are deleted from S3 after they are successfully imported into Databend. This helps manage storage costs and prevents reprocessing. Ensure your AWS credentials have `s3:DeleteObject` permission on the target bucket. + +### Allow Duplicate Imports (FORCE) + +By default, the system tracks which files have been imported and skips them in subsequent runs. Enabling this option forces re-import of all matching files, regardless of whether they have been previously imported. This is useful when you need to reload data after schema changes or data corrections. 
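To relate these options back to SQL, here is a minimal, illustrative sketch of the kind of `COPY INTO` statement such a task issues under the hood (the database, table, bucket, and credentials below are placeholders, not values generated by the service):

```sql
-- Illustrative sketch only: PURGE maps to "Clean Up Original Files",
-- FORCE maps to "Allow Duplicate Imports".
COPY INTO my_database.my_events
FROM 's3://mybucket/data/'
CONNECTION = (
    ACCESS_KEY_ID = '<access-key>',
    SECRET_ACCESS_KEY = '<secret-key>'
)
PATTERN = '2025-.*[.]csv'
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1)
PURGE = TRUE
FORCE = TRUE;
```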
\ No newline at end of file diff --git a/tidb-cloud-lake/guides/integrate-with-mysql.md b/tidb-cloud-lake/guides/integrate-with-mysql.md new file mode 100644 index 0000000000000..50572d9073df9 --- /dev/null +++ b/tidb-cloud-lake/guides/integrate-with-mysql.md @@ -0,0 +1,205 @@ +--- +title: MySQL +--- + +The MySQL data integration enables you to sync data from MySQL databases into Databend in real-time, with support for full snapshot loads, continuous Change Data Capture (CDC), or a combination of both. + +## Sync Modes + +| Sync Mode | Description | +|----------------|--------------------------------------------------------------------------------------------------------------| +| Snapshot | Performs a one-time full data load from the source table. Ideal for initial data migration or periodic bulk imports. | +| CDC Only | Continuously captures real-time changes (inserts, updates, deletes) from MySQL binlog. Requires a primary key for merge operations. | +| Snapshot + CDC | First performs a full snapshot, then seamlessly transitions to continuous CDC. Recommended for most use cases. | + +## Prerequisites + +Before setting up MySQL data integration, ensure your MySQL instance meets the following requirements: + +### Enable Binlog + +MySQL binlog must be enabled with ROW format for CDC and Snapshot + CDC modes: + +```ini title='my.cnf' +[mysqld] +server-id=1 +log-bin=mysql-bin +binlog-format=ROW +binlog-row-image=FULL +``` + +After modifying the configuration, restart MySQL for the changes to take effect. + +### Create a Dedicated User (Recommended) + +Create a MySQL user with the necessary permissions for data replication: + +```sql +CREATE USER 'databend_cdc'@'%' IDENTIFIED BY 'your_password'; +GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'databend_cdc'@'%'; +FLUSH PRIVILEGES; +``` + +### Network Access + +Ensure the MySQL instance is accessible from Databend Cloud. Check your firewall rules and security groups to allow inbound connections on the MySQL port. + +## Creating a MySQL Data Source + +1. Navigate to **Data** > **Data Sources** and click **Create Data Source**. + +2. Select **MySQL - Credentials** as the service type, and fill in the connection details: + +| Field | Required | Description | +|-----------------|----------|-----------------------------------------------------------------------------| +| **Name** | Yes | A descriptive name for this data source | +| **Hostname** | Yes | MySQL server hostname or IP address | +| **Port Number** | Yes | MySQL server port (default: 3306) | +| **DB Username** | Yes | MySQL user with replication permissions | +| **DB Password** | Yes | Password for the MySQL user | +| **Database Name** | Yes | The source database name | +| **DB Charset** | No | Character set (default: utf8mb4) | +| **Server ID** | No | Unique binlog replication identifier. Auto-generated if not provided | + +![Create MySQL Data Source](/img/cloud/dataintegration/databendcloud-dataintegration-create-mysql-source.png) + +3. Click **Test Connectivity** to verify the connection. If the test succeeds, click **OK** to save the data source. + +## Creating a MySQL Integration Task + +### Step 1: Basic Info + +1. Navigate to **Data** > **Data Integration** and click **Create Task**. + +![Data Integration Page](/img/cloud/dataintegration/dataintegration-page-with-create-button.png) + +2. 
Configure the basic settings: + +| Field | Required | Description | +|----------------------------|-------------|--------------------------------------------------------------------------------------------------| +| **Data Source** | Yes | Select an existing MySQL data source from the dropdown | +| **Name** | Yes | A name for this integration task | +| **Source Database** | — | Automatically displayed based on the selected data source | +| **Source Table** | Yes | Select the table to sync from the MySQL database | +| **Sync Mode** | Yes | Choose from **Snapshot**, **CDC Only**, or **Snapshot + CDC** | +| **Primary Key** | Conditional | The unique identifier column for merge operations. Required for CDC Only and Snapshot + CDC modes | +| **Sync Interval** | Yes | Interval (in seconds) between write operations (default: 3) | +| **Batch Size** | No | Number of rows per batch | +| **Allow Delete** | No | Whether to permit DELETE operations in CDC. Available for CDC Only and Snapshot + CDC modes | + +![Create Task - Basic Info](/img/cloud/dataintegration/create-mysql-task-step1-basic-info.png) + +#### Snapshot Mode Options + +When using **Snapshot** mode, additional options are available: + +- **Snapshot WHERE Condition**: A SQL WHERE clause to filter data during the snapshot (e.g., `created_at > '2024-01-01'`). This allows you to load only a subset of the source data. + +- **Archive Schedule**: Enable periodic archiving to automatically run snapshots on a recurring schedule. When enabled, the following fields appear: + +| Field | Description | +|---------------------|--------------------------------------------------------------------------| +| **Cron Expression** | Schedule in cron format (e.g., `0 1 * * *` for daily at 1:00 AM) | +| **Timezone** | Timezone for the schedule (default: UTC) | +| **Mode** | Archive frequency — **Daily**, **Weekly**, or **Monthly** | +| **Time Column** | The time-based column used for archive partitioning (e.g., `created_at`) | + +### Step 2: Preview Data + +After configuring the basic settings, click **Next** to preview the source data. + +![Preview Data](/img/cloud/dataintegration/create-mysql-task-preview-data-step.png) + +The system fetches a sample row from the selected MySQL table and displays the column names and data types. Review the data to ensure the correct table and columns are selected before proceeding. + +### Step 3: Set Target Table + +Configure the destination in Databend: + +| Field | Description | +|---------------------|--------------------------------------------------------------------| +| **Warehouse** | Select the target Databend Cloud warehouse for running the sync | +| **Target Database** | Choose the target database in Databend | +| **Target Table** | The table name in Databend (defaults to the source table name) | + +![Set Target Table](/img/cloud/dataintegration/dataintegration-mysql-set-target-table.png) + +The system automatically maps source columns to the target table schema. Review the column mappings, then click **Create** to finalize the integration task. + +## Task Behavior by Sync Mode + +| Sync Mode | Behavior | +|----------------|---------------------------------------------------------------------------------------------------| +| Snapshot | Runs once and automatically stops after the full data load is complete. | +| CDC Only | Runs continuously, capturing real-time changes until manually stopped. | +| Snapshot + CDC | Completes the initial snapshot first, then transitions to continuous CDC until manually stopped. 
| + +For CDC tasks, the current binlog position is saved as a checkpoint when stopped, allowing the task to resume from where it left off when restarted. + +## Sync Mode Details + +### Snapshot + +Snapshot mode performs a one-time full read of the source table and loads all data into the target table in Databend. + +**Use cases:** +- Initial data migration from MySQL to Databend +- Periodic full data refresh +- One-time data imports with WHERE condition filtering + +**Features:** +- Supports WHERE condition filtering to load a subset of data +- Supports periodic archive scheduling for recurring snapshots +- Task automatically stops after completion + +### CDC (Change Data Capture) + +CDC mode continuously monitors the MySQL binlog and captures real-time row-level changes (INSERT, UPDATE, DELETE) from the source table. + +**Use cases:** +- Real-time data replication +- Keeping Databend in sync with operational MySQL databases +- Event-driven data pipelines + +**How it works:** + +1. Connects to MySQL binlog using a unique server ID +2. Captures row-level changes in real-time +3. Writes changes to a raw staging table in Databend +4. Periodically merges changes into the target table using the primary key +5. Saves checkpoint (binlog position) for crash recovery + +:::note +CDC mode requires MySQL binlog to be enabled with ROW format, and a primary key (unique column) must be specified. The MySQL user must have `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges. +::: + +### Snapshot + CDC + +This mode combines both approaches: it first performs a full snapshot of the source table, then seamlessly transitions to CDC mode for continuous change capture. This is the recommended mode for most data integration scenarios, as it ensures a complete initial data load followed by ongoing real-time synchronization. + +## Advanced Configuration + +### Primary Key + +The primary key specifies the unique identifier column used for MERGE operations during CDC. When a change event is captured, Databend uses this key to determine whether to insert a new row or update an existing one. Typically, this should be the primary key of the source table. + +### Sync Interval + +The sync interval (in seconds) controls how frequently captured changes are merged into the target table. A shorter interval provides lower latency but may increase resource usage. The default value of 3 seconds is suitable for most workloads. + +### Batch Size + +Controls the number of rows processed per batch during data loading. Adjusting this value can help optimize throughput for large tables. Leave empty to use the system default. + +### Allow Delete + +When enabled (default for CDC modes), DELETE operations captured from MySQL binlog are applied to the target table in Databend. When disabled, deletes are ignored, and the target table retains all historical records. This is useful for scenarios where you want to maintain a complete audit trail. + +### Archive Schedule + +For Snapshot mode, you can configure periodic archiving to automatically run snapshots on a recurring schedule. This is useful for scenarios where you need regular data refreshes without continuous CDC overhead. 
+ +- **Cron Expression**: Standard cron format for scheduling (e.g., `0 1 * * *` for daily at 1:00 AM) +- **Mode**: Choose **Daily**, **Weekly**, or **Monthly** archiving +- **Time Column**: Specify the column used for time-based partitioning (e.g., `created_at`) +- **Timezone**: Set the timezone for the schedule (default: UTC) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/json-search.md b/tidb-cloud-lake/guides/json-search.md new file mode 100644 index 0000000000000..7d2430db8e198 --- /dev/null +++ b/tidb-cloud-lake/guides/json-search.md @@ -0,0 +1,128 @@ +--- +title: JSON & Search +--- + +> **Scenario:** CityDrive attaches a metadata JSON payload to every extracted frame. This JSON data is extracted from video keyframes by background tools, containing rich unstructured information like scene recognition and object detection. We need to filter this JSON in Databend with Elasticsearch-style syntax without replicating it to an external system. JSON without copying it out of Databend. + +Databend keeps these heterogeneous signals in one warehouse. Inverted indexes power Elasticsearch-style search on VARIANT columns, bitmap tables summarize label coverage, vector indexes answer similarity lookups, and native GEOMETRY columns support spatial filters. + +## 1. Create the Metadata Table +Store one JSON payload per frame so every search runs against the same structure. + +```sql +CREATE DATABASE IF NOT EXISTS video_unified_demo; +USE video_unified_demo; + +CREATE OR REPLACE TABLE frame_metadata_catalog ( + doc_id STRING, + meta_json VARIANT, + captured_at TIMESTAMP, + INVERTED INDEX idx_meta_json (meta_json) +); + +-- Sample rows for the queries below. +INSERT INTO frame_metadata_catalog VALUES + ('FRAME-0101', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":32.4},"detections":{"objects":[{"type":"vehicle","confidence":0.88},{"type":"brake_light","confidence":0.64}]},"media_meta":{"tagging":{"labels":["hard_brake","rain","downtown_loop"]}}}'), '2025-01-01 08:15:21'), + ('FRAME-0102', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":24.8},"detections":{"objects":[{"type":"pedestrian","confidence":0.92},{"type":"bike","confidence":0.35}]},"media_meta":{"tagging":{"labels":["pedestrian","swerve","crosswalk"]}}}'), '2025-01-01 08:33:54'), + ('FRAME-0201', PARSE_JSON('{"scene":{"weather_code":"overcast","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":48.1},"detections":{"objects":[{"type":"lane_merge","confidence":0.74},{"type":"vehicle","confidence":0.41}]},"media_meta":{"tagging":{"labels":["lane_merge","urban"]}}}'), '2025-01-01 11:12:02'), + ('FRAME-0301', PARSE_JSON('{"scene":{"weather_code":"clear","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":52.6},"detections":{"objects":[{"type":"vehicle","confidence":0.82},{"type":"hard_brake","confidence":0.59}]},"media_meta":{"tagging":{"labels":["hard_brake","highway"]}}}'), '2025-01-02 09:44:18'), + ('FRAME-0401', PARSE_JSON('{"scene":{"weather_code":"lightfog","lighting":"night"},"camera":{"sensor_view":"rear"},"vehicle":{"speed_kmh":38.9},"detections":{"objects":[{"type":"traffic_light","confidence":0.78},{"type":"vehicle","confidence":0.36}]},"media_meta":{"tagging":{"labels":["night_lowlight","traffic_light"]}}}'), '2025-01-03 21:18:07'); +``` + +> Need multimodal data (vector embeddings, GPS trails, tag bitmaps)? 
Grab the schemas from the [Vector](/tidb-cloud-lake/guides/vector-search.md) and [Geo](/tidb-cloud-lake/guides/geo-analytics.md) guides so you can combine them with the search results shown here. + +## 2. Search Patterns with `QUERY()` +### Array Match +```sql +SELECT doc_id, + captured_at, + meta_json['detections'] AS detections +FROM frame_metadata_catalog +WHERE QUERY('meta_json.detections.objects.type:pedestrian') +ORDER BY captured_at DESC +LIMIT 5; +``` + +Sample output: + +``` +doc_id | captured_at | detections +FRAME-0102 | 2025-01-01 08:33:54 | {"objects":[{"confidence":0.92,"type":"pedestrian"},{"confidence":0.35,"type":"bike"}]} +``` + +### Boolean AND +```sql +SELECT doc_id, captured_at +FROM frame_metadata_catalog +WHERE QUERY('meta_json.scene.weather_code:rain + AND meta_json.camera.sensor_view:roof') +ORDER BY captured_at; +``` + +Sample output: + +``` +doc_id | captured_at +FRAME-0101 | 2025-01-01 08:15:21 +FRAME-0102 | 2025-01-01 08:33:54 +``` + +### Boolean OR / List +```sql +SELECT doc_id, + meta_json['media_meta']['tagging']['labels'] AS labels +FROM frame_metadata_catalog +WHERE QUERY('meta_json.media_meta.tagging.labels:(hard_brake OR swerve OR lane_merge)') +ORDER BY captured_at DESC +LIMIT 10; +``` + +Sample output: + +``` +doc_id | labels +FRAME-0301 | ["hard_brake","highway"] +FRAME-0201 | ["lane_merge","urban"] +FRAME-0102 | ["pedestrian","swerve","crosswalk"] +FRAME-0101 | ["hard_brake","rain","downtown_loop"] +``` + +### Numeric Ranges +```sql +SELECT doc_id, + meta_json['vehicle']['speed_kmh']::DOUBLE AS speed +FROM frame_metadata_catalog +WHERE QUERY('meta_json.vehicle.speed_kmh:{30 TO 80}') +ORDER BY speed DESC +LIMIT 10; +``` + +Sample output: + +``` +doc_id | speed +FRAME-0301 | 52.6 +FRAME-0201 | 48.1 +FRAME-0401 | 38.9 +FRAME-0101 | 32.4 +``` + +### Boosting +```sql +SELECT doc_id, + SCORE() AS relevance +FROM frame_metadata_catalog +WHERE QUERY('meta_json.scene.weather_code:rain AND (meta_json.media_meta.tagging.labels:hard_brake^2 OR meta_json.media_meta.tagging.labels:swerve)') +ORDER BY relevance DESC +LIMIT 8; +``` + +Sample output: + +``` +doc_id | relevance +FRAME-0101 | 7.0161 +FRAME-0102 | 3.6252 +``` + +`QUERY()` follows Elasticsearch semantics (boolean logic, ranges, boosts, lists). `SCORE()` exposes the Elasticsearch relevance so you can re-rank results inside SQL. See [Search functions](/tidb-cloud-lake/sql/full-text-search-functions.md) for the full operator list. diff --git a/tidb-cloud-lake/guides/jupyter-notebook.md b/tidb-cloud-lake/guides/jupyter-notebook.md new file mode 100644 index 0000000000000..30b2724378f25 --- /dev/null +++ b/tidb-cloud-lake/guides/jupyter-notebook.md @@ -0,0 +1,304 @@ +--- +title: Jupyter Notebook +sidebar_position: 6 +--- + +[Jupyter Notebook](https://jupyter.org) is a web-based interactive application that enables you to create notebook documents that feature live code, interactive plots, widgets, equations, images, etc., and share these documents easily. It is also quite versatile as it can support many programming languages via kernels such as Julia, Python, Ruby, Scala, Haskell, and R. + +With the SQLAlchemy library in Python or [ipython-sql](https://github.com/catherinedevlin/ipython-sql), you can establish a connection to Databend and Databend Cloud within a Jupyter Notebook, allowing you to execute queries and visualize your data from Databend directly in the notebook. 
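For a quick sense of what that connection looks like, here is a minimal sketch using SQLAlchemy (it assumes the `databend-sqlalchemy` dialect is installed and reuses the `user1`/`abc123` account created in the tutorials below; adjust host, port, and credentials for your deployment):

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical local deployment; swap in your own connection details.
engine = create_engine("databend://user1:abc123@localhost:8000/default")

# Run a query and load the result into a DataFrame for plotting or inspection.
df = pd.read_sql("SELECT number FROM numbers(5)", engine)
print(df)
```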
+ +Alternatively, you can run SQL queries in Python using the [Databend Python Binding](https://pypi.org/project/databend/) library, allowing you to harness DataBend's capabilities directly within your local Python environment or online services like Jupyter Notebook and Google Colab without the need to deploy a separate DataBend instance. + +## Tutorial-1: Integrating Databend with Jupyter Notebook using SQLAlchemy + +In this tutorial, you will first deploy a local Databend instance and Jupyter Notebook, and then run a sample notebook to connect to your local Databend through the SQLAlchemy library, as well as write and visualize data within the notebook. + +Before you start, make sure you have completed the following tasks: + +- You have [Python](https://www.python.org/) installed on your system. +- Download the sample notebook [databend.ipynb](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/integration/databend.ipynb) to a local folder. + +### Step 1. Deploy Databend + +1. Follow the [Deployment Guide](/guides/self-hosted) to deploy a local Databend. +2. Create a SQL user in Databend. You will use this account to connect to Databend in Jupyter Notebook. + +```sql +CREATE ROLE user1_role; +GRANT ALL ON *.* TO ROLE user1_role; +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'user1_role'; +GRANT ROLE user1_role TO user1; +``` + +### Step 2. Deploy Jupyter Notebook + +1. Install Jupyter Notebook with pip: + +```shell +pip install notebook +``` + +2. Install dependencies with pip: + +```shell +pip install sqlalchemy +pip install pandas +pip install pymysql +``` + +### Step 3. Run Sample Notebook + +1. Run the command below to start Jupyter Notebook: + +```shell +jupyter notebook +``` + +This will start up Jupyter and your default browser should start (or open a new tab) to the following URL: http://localhost:8888/tree + +![Alt text](/img/integration/notebook-tree.png) + +2. On the **Files** tab, navigate to the sample notebook you downloaded and open it. + +3. In the sample notebook, run the cells sequentially. By doing so, you create a table containing 5 rows in your local Databend, and visualize the data with a bar chart. + +![Alt text](/img/integration/integration-gui-jupyter.png) + +## Tutorial-2: Integrating Databend with Jupyter Notebook using ipython-sql + +In this tutorial, you will first deploy a local Databend instance and Jupyter Notebook, and then run a sample notebook to connect to your local Databend through [ipython-sql](https://github.com/catherinedevlin/ipython-sql), as well as write and visualize data within the notebook. + +Before you start, ensure that you have [Python](https://www.python.org/) installed on your system. + +### Step 1. Deploy Databend + +1. Follow the [Deployment Guide](https://docs.databend.com/guides/self-hosted) to deploy a local Databend. +2. Create a SQL user in Databend. You will use this account to connect to Databend in Jupyter Notebook. + +```sql +CREATE ROLE user1_role; +GRANT ALL ON *.* TO ROLE user1_role; +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'user1_role'; +GRANT ROLE user1_role TO user1; +``` + +### Step 2. Deploy Jupyter Notebook + +1. Install Jupyter Notebook with pip: + +```shell +pip install notebook +``` + +2. Install dependencies with pip: + +:::note +To proceed with this tutorial, you'll need a version of SQLAlchemy that is below 2.0. Please be aware that in SQLAlchemy 2.0 and later versions, the result.DataFrame() method has been deprecated and is no longer available. 
Instead, you can use the pandas library to directly create a DataFrame from query results and perform plotting. +::: + +```shell +pip install ipython-sql databend-sqlalchemy +pip install sqlalchemy +``` + +### Step 3. Create and Connect a Notebook to Databend + +1. Run the command below to start Jupyter Notebook: + +```shell +jupyter notebook +``` + +This will start up Jupyter and your default browser should start (or open a new tab) to the following URL: http://localhost:8888/tree + +![Alt text](/img/integration/notebook-tree.png) + +2. Select **New** > **Python 3** to create a notebook. + +3. Run the following code sequentially in separate cells. By doing so, you create a table containing 5 rows in your local Databend, and visualize the data with a bar chart. + +```python title='In [1]:' +%load_ext sql +``` + +```sql title='In [2]:' +%%sql databend://user1:abc123@localhost:8000/default +create table if not exists user(created_at Date, count Int32); +insert into user values('2022-04-01', 5); +insert into user values('2022-04-01', 3); +insert into user values('2022-04-03', 4); +insert into user values('2022-04-03', 1); +insert into user values('2022-04-04', 10); +``` + +```python title='In [3]:' +result = %sql select created_at as date, count(*) as count from user group by created_at; +result +``` + +```python title='In [4]:' +%matplotlib inline + +df = result.DataFrame() +df.plot.bar(x='date', y='count') +``` + +You can now see a bar chart on the notebook: + +![Alt text](/img/integration/jupyter-ipython-sql.png) + +## Tutorial-3: Integrating Databend with Jupyter Notebook with Python Binding Library + +In this tutorial, you will first deploy a local Databend instance and Jupyter Notebook, and then run queries in a notebook through the [Databend Python Binding](https://pypi.org/project/databend/) library, as well as write and visualize data within the notebook. + +Before you start, ensure that you have [Python](https://www.python.org/) installed on your system. + +### Step 1. Deploy Jupyter Notebook + +1. Install Jupyter Notebook with pip: + +```shell +pip install notebook +``` + +2. Install dependencies with pip: + +```shell +pip install databend +pip install matplotlib +``` + +### Step 2. Create a Notebook + +1. Run the command below to start Jupyter Notebook: + +```shell +jupyter notebook +``` + +This will start up Jupyter and your default browser should start (or open a new tab) to the following URL: http://localhost:8888/tree + +![Alt text](/img/integration/notebook-tree.png) + +2. Select **New** > **Python 3** to create a notebook. + +3. 
Run the following code sequentially in separate cells: + +```python title='In [1]:' +# Import the necessary libraries +from databend import SessionContext + +# Create a DataBend session +ctx = SessionContext() +``` + +```python title='In [2]:' +# Create a table in DataBend +ctx.sql("CREATE TABLE IF NOT EXISTS user (created_at Date, count Int32)") +``` + +```python title='In [3]:' +# Insert multiple rows of data into the table +ctx.sql("INSERT INTO user VALUES ('2022-04-01', 5), ('2022-04-01', 3), ('2022-04-03', 4), ('2022-04-03', 1), ('2022-04-04', 10)") +``` + +```python title='In [4]:' +# Execute a query +result = ctx.sql("SELECT created_at as date, count(*) as count FROM user GROUP BY created_at") + +# Display the query result +result.show() +``` + +```python title='In [5]:' +# Import libraries for data visualization +import matplotlib.pyplot as plt + +# Convert the query result to a Pandas DataFrame +df = result.to_pandas() +``` + +```python title='In [6]:' +# Create a bar chart to visualize the data +df.plot.bar(x='date', y='count') +plt.show() +``` + +You can now see a bar chart on the notebook: + +![Alt text](/img/integration/localhost_8888_notebooks_Untitled.ipynb.png) + +## Tutorial-4: Integrating Databend Cloud with Jupyter Notebook using ipython-sql + +In this tutorial, you will first obtain connection information from Databend Cloud and deploy Jupyter Notebook, then create and connect a notebook to Databend Cloud through [ipython-sql](https://github.com/catherinedevlin/ipython-sql), as well as write and visualize data within the notebook. + +Before you start, ensure that you have [Python](https://www.python.org/) installed on your system. + +### Step 1. Obtain Connection Information + +Obtain the connection information from Databend Cloud. For how to do that, refer to [Connecting to a Warehouse](/tidb-cloud-lake/guides/warehouse.md#connecting). + +### Step 2. Deploy Jupyter Notebook + +1. Install Jupyter Notebook with pip: + +```shell +pip install notebook +``` + +2. Install dependencies with pip: + +```shell +pip install ipython-sql databend-sqlalchemy +pip install sqlalchemy +``` + +### Step 3. Create and Connect a Notebook to Databend Cloud + +1. Run the command below to start Jupyter Notebook: + +```shell +jupyter notebook +``` + +This will start up Jupyter and your default browser should start (or open a new tab) to the following URL: http://localhost:8888/tree + +![Alt text](@site/static/img/documents/pricing-billing/notebook-tree.png) + +2. Select **New** > **Python 3** to create a notebook. + +3. Run the following code sequentially in separate cells. By doing so, you create a table containing 5 rows in Databend Cloud, and visualize the data with a bar chart. 
```python
from sqlalchemy import create_engine, text
from sqlalchemy.engine.base import Connection, Engine
import databend_sqlalchemy
import matplotlib.pyplot as plt
import pandas as pd
```

```python
engine = create_engine(f"databend://cloudapp:@:443/default?secure=true")
connection = engine.connect()
```

```python
connection.execute('create table if not exists user(created_at Date, count Int32);')
connection.execute("insert into user values('2022-04-01', 5);")
connection.execute("insert into user values('2022-04-01', 3);")
connection.execute("insert into user values('2022-04-03', 4);")
connection.execute("insert into user values('2022-04-03', 1);")
connection.execute("insert into user values('2022-04-04', 10);")
result=connection.execute('select created_at as date, count(*) as count from user group by created_at;')
```

```python
rows = result.fetchall()
df = pd.DataFrame(rows, columns=result.keys())
df.plot.bar(x='date', y='count')
plt.show()
```

You can now see a bar chart on the notebook:

![Alt text](@site/static/img/documents/BI/jupyter-bar.png)
diff --git a/tidb-cloud-lake/guides/lakehouse-etl.md b/tidb-cloud-lake/guides/lakehouse-etl.md new file mode 100644 index 0000000000000..c5b109c7639ba --- /dev/null +++ b/tidb-cloud-lake/guides/lakehouse-etl.md @@ -0,0 +1,224 @@
---
title: Lakehouse ETL
---

> **Scenario:** CityDrive's data engineering team exports every batch of dash-cam data as Parquet (videos, frame events, metadata JSON, embeddings, GPS traces, traffic light distances). These Parquet files aggregate all multimodal signals extracted from the raw video streams, forming the foundation of the warehouse. They want a single COPY pipeline to refresh the shared tables in Databend.

The loading loop is straightforward:

```
Object storage → STAGE → COPY INTO tables → (optional) STREAMS/TASKS
```

Adjust the bucket path or format to match your environment, then paste the commands below. The syntax mirrors the data loading guides.

---

## 1. Create a Stage
Point a reusable stage at the bucket that holds the CityDrive exports. Swap the credentials/URL for your own account; Parquet is used here, but any supported format works with a different `FILE_FORMAT`.

```sql
CREATE OR REPLACE CONNECTION citydrive_s3
  STORAGE_TYPE = 's3'
  ACCESS_KEY_ID = ''
  SECRET_ACCESS_KEY = '';

CREATE OR REPLACE STAGE citydrive_stage
  URL = 's3://citydrive-lakehouse/raw/'
  CONNECTION = (CONNECTION_NAME = 'citydrive_s3')
  FILE_FORMAT = (TYPE = 'PARQUET');
```

> [!IMPORTANT]
> Replace the placeholder AWS keys and bucket URL with real values from your environment. Without valid credentials, `LIST`, `SELECT ... FROM @citydrive_stage`, and `COPY INTO` statements will fail with `InvalidAccessKeyId`/403 errors from S3.

Quick sanity check:

```sql
LIST @citydrive_stage/videos/;
LIST @citydrive_stage/frame-events/;
LIST @citydrive_stage/manifests/;
LIST @citydrive_stage/frame-embeddings/;
LIST @citydrive_stage/frame-locations/;
LIST @citydrive_stage/traffic-lights/;
```

---

## 2. Peek at the Files
Use a `SELECT` against the stage to confirm schema and sample rows before loading.

```sql
SELECT *
FROM @citydrive_stage/videos/capture_date=2025-01-01/videos.parquet
LIMIT 5;

SELECT *
FROM @citydrive_stage/frame-events/batch_2025_01_01.parquet
LIMIT 5;
```

Databend infers the format from the stage definition, so no extra options are required here.

---

## 3. 
COPY INTO the Unified Tables +Each export maps to one of the shared tables used across the guides. Inline casts keep schemas consistent even if upstream ordering changes. + +### `citydrive_videos` +```sql +COPY INTO citydrive_videos (video_id, vehicle_id, capture_date, route_name, weather, camera_source, duration_sec) +FROM ( + SELECT video_id::STRING, + vehicle_id::STRING, + capture_date::DATE, + route_name::STRING, + weather::STRING, + camera_source::STRING, + duration_sec::INT + FROM @citydrive_stage/videos/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `frame_events` +```sql +COPY INTO frame_events (frame_id, video_id, frame_index, collected_at, event_tag, risk_score, speed_kmh) +FROM ( + SELECT frame_id::STRING, + video_id::STRING, + frame_index::INT, + collected_at::TIMESTAMP, + event_tag::STRING, + risk_score::DOUBLE, + speed_kmh::DOUBLE + FROM @citydrive_stage/frame-events/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `frame_metadata_catalog` +```sql +COPY INTO frame_metadata_catalog (doc_id, meta_json, captured_at) +FROM ( + SELECT doc_id::STRING, + meta_json::VARIANT, + captured_at::TIMESTAMP + FROM @citydrive_stage/manifests/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `frame_embeddings` +```sql +COPY INTO frame_embeddings (frame_id, video_id, sensor_view, embedding, encoder_build, created_at) +FROM ( + SELECT frame_id::STRING, + video_id::STRING, + sensor_view::STRING, + embedding::VECTOR(768), -- replace with your actual dimension + encoder_build::STRING, + created_at::TIMESTAMP + FROM @citydrive_stage/frame-embeddings/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `frame_geo_points` +```sql +COPY INTO frame_geo_points (video_id, frame_id, position_wgs84, solution_grade, source_system, created_at) +FROM ( + SELECT video_id::STRING, + frame_id::STRING, + position_wgs84::GEOMETRY, + solution_grade::INT, + source_system::STRING, + created_at::TIMESTAMP + FROM @citydrive_stage/frame-locations/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `signal_contact_points` +```sql +COPY INTO signal_contact_points (node_id, signal_position, video_id, frame_id, frame_position, distance_m, created_at) +FROM ( + SELECT node_id::STRING, + signal_position::GEOMETRY, + video_id::STRING, + frame_id::STRING, + frame_position::GEOMETRY, + distance_m::DOUBLE, + created_at::TIMESTAMP + FROM @citydrive_stage/traffic-lights/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +After this step, every downstream workload—SQL analytics, Elasticsearch `QUERY()`, vector similarity, geospatial filters—reads the exact same data. + +--- + +## 4. Streams for Incremental Reactions (Optional) +Use streams when you want downstream jobs to consume only the rows added since the last batch. + +```sql +CREATE OR REPLACE STREAM frame_events_stream ON TABLE frame_events; + +SELECT * FROM frame_events_stream; -- shows newly copied rows +-- …process rows… +SELECT * FROM frame_events_stream WITH CONSUME; -- advance the offset +``` + +`WITH CONSUME` ensures the stream cursor moves forward after the rows are handled. Reference: [Streams](/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md). + +--- + +## 5. Tasks for Scheduled Loads (Optional) +Tasks run **one SQL statement** on a schedule. Create lightweight tasks per table or wrap the logic in a stored procedure if you prefer one entry point. 
+ +```sql +CREATE OR REPLACE TASK task_load_citydrive_videos + WAREHOUSE = 'default' + SCHEDULE = 10 MINUTE +AS + COPY INTO citydrive_videos (video_id, vehicle_id, capture_date, route_name, weather, camera_source, duration_sec) + FROM ( + SELECT video_id::STRING, + vehicle_id::STRING, + capture_date::DATE, + route_name::STRING, + weather::STRING, + camera_source::STRING, + duration_sec::INT + FROM @citydrive_stage/videos/ + ) + FILE_FORMAT = (TYPE = 'PARQUET'); + +ALTER TASK task_load_citydrive_videos RESUME; + +CREATE OR REPLACE TASK task_load_frame_events + WAREHOUSE = 'default' + SCHEDULE = 10 MINUTE + AS + COPY INTO frame_events (frame_id, video_id, frame_index, collected_at, event_tag, risk_score, speed_kmh) + FROM ( + SELECT frame_id::STRING, + video_id::STRING, + frame_index::INT, + collected_at::TIMESTAMP, + event_tag::STRING, + risk_score::DOUBLE, + speed_kmh::DOUBLE + FROM @citydrive_stage/frame-events/ + ) + FILE_FORMAT = (TYPE = 'PARQUET'); + +ALTER TASK task_load_frame_events RESUME; +``` + +Add more tasks for `frame_metadata_catalog`, embeddings, or GPS data using the same pattern. Full options: [Tasks](/tidb-cloud-lake/guides/automate-data-loading-with-tasks.md). + +--- + +Once these jobs run, every guide in the Unified Workloads series reads from the same CityDrive tables—no extra ETL layers, no duplicate storage. diff --git a/tidb-cloud-lake/guides/load-avro.md b/tidb-cloud-lake/guides/load-avro.md new file mode 100644 index 0000000000000..b98901fe50811 --- /dev/null +++ b/tidb-cloud-lake/guides/load-avro.md @@ -0,0 +1,110 @@ +--- +title: Loading Avro into Databend +sidebar_label: Avro +--- + +## What is Avro? + +[Apache Avro™](https://avro.apache.org/) is the leading serialization format for record data, and first choice for streaming data pipelines. + +## Loading Avro File + +The common syntax for loading AVRO file is as follows: + +```sql +COPY INTO [.] + FROM { internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +FILE_FORMAT = (TYPE = AVRO) +``` + +- For more Avro file format options, refer to [Avro File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#avro-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Avro Data into Databend from Remote HTTP URL + +In this tutorial, you will create a table in Databend using an Avro schema and load Avro data directly from a GitHub-hosted `.avro` file via HTTPS. + +### Step 1: Review the Avro Schema + +Before creating a table in Databend, let’s take a quick look at the Avro schema we’re working with: [userdata.avsc](https://github.com/Teradata/kylo/blob/master/samples/sample-data/avro/userdata.avsc). This schema defines a record named `User` with 13 fields, mostly of type string, along with `int` and `float`. 
+ +```json +{ + "type": "record", + "name": "User", + "fields": [ + {"name": "registration_dttm", "type": "string"}, + {"name": "id", "type": "int"}, + {"name": "first_name", "type": "string"}, + {"name": "last_name", "type": "string"}, + {"name": "email", "type": "string"}, + {"name": "gender", "type": "string"}, + {"name": "ip_address", "type": "string"}, + {"name": "cc", "type": "string"}, + {"name": "country", "type": "string"}, + {"name": "birthdate", "type": "string"}, + {"name": "salary", "type": "float"}, + {"name": "title", "type": "string"}, + {"name": "comments", "type": "string"} + ] +} +``` + +### Step 2: Create a Table in Databend + +Create a table that matches the structure defined in the schema: + +```sql +CREATE TABLE userdata ( + registration_dttm STRING, + id INT, + first_name STRING, + last_name STRING, + email STRING, + gender STRING, + ip_address STRING, + cc VARIANT, + country STRING, + birthdate STRING, + salary FLOAT, + title STRING, + comments STRING +); +``` + +### Step 3: Load Data from a Remote HTTPS URL + +```sql +COPY INTO userdata +FROM 'https://raw.githubusercontent.com/Teradata/kylo/master/samples/sample-data/avro/userdata1.avro' +FILE_FORMAT = (type = avro); +``` + +```sql +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├──────────────────────────────────────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ Teradata/kylo/master/samples/sample-data/avro/userdata1.avro │ 1000 │ 0 │ NULL │ NULL │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4: Query the Data + +You can now explore the data you just imported: + +```sql +SELECT id, first_name, email, salary FROM userdata LIMIT 5; +``` + +```sql +┌───────────────────────────────────────────────────────────────────────────────────┐ +│ id │ first_name │ email │ salary │ +├─────────────────┼──────────────────┼──────────────────────────┼───────────────────┤ +│ 1 │ Amanda │ ajordan0@com.com │ 49756.53 │ +│ 2 │ Albert │ afreeman1@is.gd │ 150280.17 │ +│ 3 │ Evelyn │ emorgan2@altervista.org │ 144972.52 │ +│ 4 │ Denise │ driley3@gmpg.org │ 90263.05 │ +│ 5 │ Carlos │ cburns4@miitbeian.gov.cn │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-csv.md b/tidb-cloud-lake/guides/load-csv.md new file mode 100644 index 0000000000000..4afc900ea0b2a --- /dev/null +++ b/tidb-cloud-lake/guides/load-csv.md @@ -0,0 +1,134 @@ +--- +title: Loading CSV into Databend +sidebar_label: CSV +--- + +## What is CSV? + +CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. CSV files are plain text files that contain data in a tabular format, where each row is represented on a new line, and columns are separated by a delimiter. + +The following example shows a CSV file with two records: + +```text +Title_0,Author_0 +Title_1,Author_1 +``` + +## Loading CSV File + +The common syntax for loading CSV file is as follows: + +```sql +COPY INTO [.] 
+FROM { userStage | internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +[ FILE_FORMAT = ( + TYPE = CSV, + RECORD_DELIMITER = '', + FIELD_DELIMITER = '', + SKIP_HEADER = , + COMPRESSION = AUTO +) ] +``` + +- For more CSV file format options, refer to [CSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#csv-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Data from CSV Files + +### Step 1. Create an Internal Stage + +Create an internal stage to store the CSV files. + +```sql +CREATE STAGE my_csv_stage; +``` + +### Step 2. Create CSV files + +Generate a CSV file using these SQL statements: + +```sql +COPY INTO @my_csv_stage +FROM ( + SELECT + 'Title_' || CAST(number AS VARCHAR) AS title, + 'Author_' || CAST(number AS VARCHAR) AS author + FROM numbers(100000) +) + FILE_FORMAT = (TYPE = CSV, COMPRESSION = gzip) +; +``` + +Verify the creation of the CSV file: + +```sql +LIST @my_csv_stage; +``` + +Result: + +```text +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├────────────────────────────────────────────────────────────────┼────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ data_4bb7f864-f5f2-41e8-a442-68c2a709be5a_0000_00000000.csv.gz │ 483110 │ "0c8e28daed524468269e44ac13d2f463" │ 2023-12-26 11:37:21.000 +0000 │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 3: Create Target Table + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR +); +``` + +### Step 4. Copying Directly from CSV + +To directly copy data into your table from CSV files, use the following SQL command: + +```sql +COPY INTO books +FROM @my_csv_stage +PATTERN = '.*[.]csv.gz' +FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 0, -- Skip the first line if it is a header, here we don't have a header + COMPRESSION = AUTO +); +``` + +Result: + +```text +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├────────────────────────────────────────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ data_4bb7f864-f5f2-41e8-a442-68c2a709be5a_0000_00000000.csv.gz │ 100000 │ 0 │ NULL │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4 (Option). Using SELECT to Copy Data + +For more control, like transforming data while copying, use the SELECT statement. Learn more at [`SELECT from CSV`](/tidb-cloud-lake/guides/query-csv-files-in-stage.md). 
+ +```sql +COPY INTO books (title, author) +FROM ( + SELECT $1, $2 + FROM @my_csv_stage +) +PATTERN = '.*[.]csv.gz' +FILE_FORMAT = ( + TYPE = 'CSV', + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 0, -- Skip the first line if it is a header, here we don't have a header + COMPRESSION = 'AUTO' +); +``` diff --git a/tidb-cloud-lake/guides/load-from-bucket.md b/tidb-cloud-lake/guides/load-from-bucket.md new file mode 100644 index 0000000000000..54fefbbcfd149 --- /dev/null +++ b/tidb-cloud-lake/guides/load-from-bucket.md @@ -0,0 +1,75 @@ +--- +title: Loading from Bucket +sidebar_label: Bucket +--- + +When data files are stored in an object storage bucket, such as Amazon S3, it is possible to load them directly into Databend using the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md). + +![image](/img/load/load-data-from-s3.jpeg) + +This tutorial uses Amazon S3 bucket as an example and offers a detailed, step-by-step guide to help you effectively navigate the process of loading data from files stored in a bucket. + +## Tutorial: Loading from Amazon S3 Bucket + +### Before You Begin + +Before you start, make sure you have completed the following tasks: + +1. Download and save the sample file [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet) to a local folder. The file contains two records: + +```text title='books.parquet' +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +2. Create a bucket in Amazon S3 and upload the sample file to the bucket. For how to do that, refer to these links: + +- Creating a bucket: https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html +- Uploading objects: https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html + +For this tutorial, a bucket named **databend-toronto** was created in the region **US East (Ohio)** (ID: us-east-2). + +![Alt text](/img/load/toronto-bucket.png) + +### Step 1. Create Target Table + +Create a table with the following SQL statements in Databend: + +```sql +USE default; +CREATE TABLE books +( + title VARCHAR, + author VARCHAR, + date VARCHAR +); +``` + +### Step 2. Copy Data into Table + +1. Load data into the target table with the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command: + +```sql +COPY INTO books +FROM 's3://databend-toronto/' +CONNECTION = ( + ACCESS_KEY_ID = '', + SECRET_ACCESS_KEY = '' +) +PATTERN = '.*[.]parquet' +FILE_FORMAT = ( + TYPE = 'PARQUET' +); +``` + +2. Check the loaded data: + +```sql +SELECT * FROM books; + +--- +title |author |date| +----------------------------+-------------------+----+ +Transaction Processing |Jim Gray |1992| +Readings in Database Systems|Michael Stonebraker|2004| +``` diff --git a/tidb-cloud-lake/guides/load-from-local-file.md b/tidb-cloud-lake/guides/load-from-local-file.md new file mode 100644 index 0000000000000..2d1774ba39e27 --- /dev/null +++ b/tidb-cloud-lake/guides/load-from-local-file.md @@ -0,0 +1,166 @@ +--- +title: Loading from Local File +sidebar_label: Local +--- + +Uploading your local data files to a stage or bucket before loading them into Databend can be unnecessary. 
Instead, you can use [BendSQL](/tidb-cloud-lake/guides/bendsql.md), the Databend native CLI tool, to directly import the data. This simplifies the workflow and can save you storage fees. + +Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md). + +You can also load local files into tables programmatically using JDBC or Python drivers. + +## Load Methods + +There are two methods to load data from local files: + +1. **Stage**: Upload the local file to an internal stage, then copy data from the staged file into the table. File upload occurs either through databend-query or using a presigned URL, depending on the `presigned_url_disabled` connection option (default: `false`). +2. **Streaming**: Load the file directly into the table during upload. Use this method when the file is too large to store as a single object in your object storage. + +## Tutorial 1 - Load from a Local File + +This tutorial uses a CSV file as an example to demonstrate how to import data into Databend using [BendSQL](/tidb-cloud-lake/guides/bendsql.md) from a local source. + +### Before You Begin + +Download and save the sample file [books.csv](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv) to a local folder. The file contains two records: + +```text title='books.csv' +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +### Step 1. Create Database and Table + +```shell +❯ bendsql +root@localhost:8000/default> CREATE DATABASE book_db; + +root@localhost:8000/default> USE book_db; + +root@localhost:8000/book_db> CREATE TABLE books +( + title VARCHAR, + author VARCHAR, + date VARCHAR +); + +CREATE TABLE books ( + title VARCHAR, + author VARCHAR, + date VARCHAR +) +``` + +### Step 2. Load Data into Table + +Send loading data request with the following command: + +```shell +❯ bendsql --query='INSERT INTO book_db.books from @_databend_load file_format=(type=csv)' --data=@books.csv +``` + +- The `@_databend_load` is a placeholder representing local file data. +- The [file_format clause](/tidb-cloud-lake/sql/input-output-file-formats.md) uses the same syntax as the COPY command. 
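For instance, if the local file were a gzip-compressed CSV with a header row, the same `file_format` clause could carry additional options; the options below mirror the CSV format options used elsewhere in these guides and are shown as an assumption rather than a required form:

```shell
❯ bendsql --query='INSERT INTO book_db.books from @_databend_load file_format=(type=csv, skip_header=1, compression=gzip)' --data=@books.csv.gz
```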
Alternatively, use a Python script:

```python
import databend_driver

# The DSN must be a plain string; a trailing comma would turn it into a tuple.
dsn = "databend://root:@localhost:8000/?sslmode=disable"
client = databend_driver.BlockingDatabendClient(dsn)
conn = client.get_conn()
query = "INSERT INTO book_db.books from @_databend_load file_format=(type=csv)"
progress = conn.load_file(query, "books.csv")
conn.close()
```

Or use Java code:

```java
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import com.databend.jdbc.DatabendConnection;

String url = "jdbc:databend://localhost:8000";
File file = new File("books.csv");
try (FileInputStream fileInputStream = new FileInputStream(file);
     Connection connection = DriverManager.getConnection(url, "databend", "databend");
     Statement statement = connection.createStatement()) {
    // Unwrap the Databend-specific connection to access the load API.
    DatabendConnection databendConnection = connection.unwrap(DatabendConnection.class);
    String sql = "insert into book_db.books from @_databend_load file_format=(type=csv)";
    int nUpdate = databendConnection.loadStreamToTable(sql, fileInputStream, file.length(), DatabendConnection.LoadMethod.Stage);
}
```

:::note
Be sure that you can connect to Databend's backend object storage directly from your local BendSQL. If not, specify the `--set presigned_url_disabled=1` option to disable the presigned URL feature.
:::

### Step 3. Verify Loaded Data

```shell
root@localhost:8000/book_db> SELECT * FROM books;

┌───────────────────────────────────────────────────────────────────────┐
│ title │ author │ date │
│ Nullable(String) │ Nullable(String) │ Nullable(String) │
├──────────────────────────────┼─────────────────────┼──────────────────┤
│ Transaction Processing │ Jim Gray │ 1992 │
│ Readings in Database Systems │ Michael Stonebraker │ 2004 │
└───────────────────────────────────────────────────────────────────────┘
```

## Tutorial 2 - Load into Specified Columns

In [Tutorial 1](#tutorial-1---load-from-a-local-file), you created a table containing three columns that exactly match the data in the sample file. You can also load data into specified columns of a table, so the table does not need to have exactly the same columns as the data being loaded, as long as the specified columns match. This tutorial shows how to do that.

### Before You Begin

Before you start this tutorial, make sure you have completed [Tutorial 1](#tutorial-1---load-from-a-local-file).

### Step 1. Create Table

Create a table including an extra column named "comments" compared to the table "books":

```shell
root@localhost:8000/book_db> CREATE TABLE bookcomments
(
    title VARCHAR,
    author VARCHAR,
    comments VARCHAR,
    date VARCHAR
);

CREATE TABLE bookcomments (
  title VARCHAR,
  author VARCHAR,
  comments VARCHAR,
  date VARCHAR
)
```

### Step 2. Load Data into Table

Send loading data request with the following command:

```shell
❯ bendsql --query='INSERT INTO book_db.bookcomments(title,author,date) from @_databend_load file_format=(type=csv)' --data=@books.csv
```

Notice that the `query` part above specifies the columns (title, author, and date) to match the loaded data.

### Step 3. 
Verify Loaded Data + +```shell +root@localhost:8000/book_db> SELECT * FROM bookcomments; + +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ title │ author │ comments │ date │ +│ Nullable(String) │ Nullable(String) │ Nullable(String) │ Nullable(String) │ +├──────────────────────────────┼─────────────────────┼──────────────────┼──────────────────┤ +│ Transaction Processing │ Jim Gray │ NULL │ 1992 │ +│ Readings in Database Systems │ Michael Stonebraker │ NULL │ 2004 │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/guides/load-from-remote-file.md b/tidb-cloud-lake/guides/load-from-remote-file.md new file mode 100644 index 0000000000000..805556d2e8382 --- /dev/null +++ b/tidb-cloud-lake/guides/load-from-remote-file.md @@ -0,0 +1,76 @@ +--- +title: Loading from Remote File +sidebar_label: Remote +--- + +To load data from remote files into Databend, the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command can be used. This command allows you to copy data from a variety of sources, including remote files, into Databend with ease. With COPY INTO, you can specify the source file location, file format, and other relevant parameters to tailor the import process to your needs. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md). + +## Loading with Glob Patterns + +Databend facilitates the loading of data from remote files through the use of glob patterns. These patterns allow for efficient and flexible data import from multiple files that follow a specific naming convention. Databend supports the following glob patterns: + +### Set Pattern + +The set pattern in glob expressions enables matching any one of the characters within a set. For example, consider files named `data_file_a.csv`, `data_file_b.csv`, and `data_file_c.csv`. Utilize the set pattern to load data from all three files: + +```sql +COPY INTO your_table +FROM 'https://your-remote-location/data_file_{a,b,c}.csv' ... +``` + +### Range Pattern + +When dealing with files named `data_file_001.csv`, `data_file_002.csv`, and `data_file_003.csv`, the range pattern becomes useful. Load data from this series of files using the range pattern like this: + +```sql +COPY INTO your_table +FROM 'https://your-remote-location/data_file_[001-003].csv' ... +``` + +## Tutorial - Load from a Remote File + +This tutorial demonstrates how to import data into Databend from a remote CSV file. The sample file [books.csv](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv) contains two records: + +```text title='books.csv' +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +### Step 1. Create Table + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR, + date VARCHAR +); +``` + +### Step 2. Load Data into Table + +```sql +COPY INTO books +FROM 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv' +FILE_FORMAT = ( + TYPE = 'CSV', + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 0 +); +``` + +### Step 3. 
Verify Loaded Data + +```sql +SELECT * FROM books; +``` + +```text title='Result:' +┌──────────────────────────────────┬─────────────────────┬───────┐ +│ title │ author │ date │ +├──────────────────────────────────┼─────────────────────┼───────┤ +│ Transaction Processing │ Jim Gray │ 1992 │ +│ Readings in Database Systems │ Michael Stonebraker │ 2004 │ +└──────────────────────────────────┴─────────────────────┴───────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-from-stage.md b/tidb-cloud-lake/guides/load-from-stage.md new file mode 100644 index 0000000000000..e87964515bb5e --- /dev/null +++ b/tidb-cloud-lake/guides/load-from-stage.md @@ -0,0 +1,231 @@ +--- +title: Loading from Stage +sidebar_label: Stage +--- + +Databend enables you to easily import data from files uploaded to either the user stage or an internal/external stage. To do so, you can first upload the files to a stage using [BendSQL](/tidb-cloud-lake/guides/bendsql.md), and then employ the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command to load the data from the staged file. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md). + +![image](/img/load/load-data-from-stage.jpeg) + +The following tutorials offer a detailed, step-by-step guide to help you effectively navigate the process of loading data from files in a stage. + +## Before You Begin + +Before you start, make sure you have completed the following tasks: + +- Download and save the sample file [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet) to a local folder. The file contains two records: + +```text +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +- Create a table with the following SQL statements in Databend: + +```sql +USE default; +CREATE TABLE books +( + title VARCHAR, + author VARCHAR, + date VARCHAR +); +``` + +## Tutorial 1: Loading from User Stage + +Follow this tutorial to upload the sample file to the user stage and load data from the staged file into Databend. + +### Step 1: Upload Sample File + +1. Upload the sample file using [BendSQL](/tidb-cloud-lake/guides/bendsql.md): + +```sql +root@localhost:8000/default> PUT fs:///Users/eric/Documents/books.parquet @~ + +┌───────────────────────────────────────────────┐ +│ file │ status │ +│ String │ String │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +2. Verify the staged file: + +```sql +LIST @~; + +name |size|md5 |last_modified |creator| +-------------+----+----------------------------------+-----------------------------+-------+ +books.parquet| 998|"88432bf90aadb79073682988b39d461c"|2023-06-27 16:03:51.000 +0000| | +``` + +### Step 2. Copy Data into Table + +1. Load data into the target table with the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command: + +```sql +COPY INTO books FROM @~ files=('books.parquet') FILE_FORMAT = (TYPE = PARQUET); +``` + +2. 
Verify the loaded data: + +```sql +SELECT * FROM books; + +--- +title |author |date| +----------------------------+-------------------+----+ +Transaction Processing |Jim Gray |1992| +Readings in Database Systems|Michael Stonebraker|2004| +``` + +## Tutorial 2: Loading from Internal Stage + +Follow this tutorial to upload the sample file to an internal stage and load data from the staged file into Databend. + +### Step 1. Create an Internal Stage + +1. Create an internal stage with the [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) command: + +```sql +CREATE STAGE my_internal_stage; +``` +2. Verify the created stage: + +```sql +SHOW STAGES; + +name |stage_type|number_of_files|creator |comment| +-----------------+----------+---------------+----------+-------+ +my_internal_stage|Internal | 0|'root'@'%'| | +``` + +### Step 2: Upload Sample File + +1. Upload the sample file using [BendSQL](/tidb-cloud-lake/guides/bendsql.md): + +```sql +root@localhost:8000/default> CREATE STAGE my_internal_stage; + +root@localhost:8000/default> PUT fs:///Users/eric/Documents/books.parquet @my_internal_stage + +┌───────────────────────────────────────────────┐ +│ file │ status │ +│ String │ String │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +2. Verify the staged file: + +```sql +LIST @my_internal_stage; + +name |size |md5 |last_modified |creator| +-----------------------------------+------+----------------------------------+-----------------------------+-------+ +books.parquet | 998|"88432bf90aadb79073682988b39d461c"|2023-06-28 02:32:15.000 +0000| | +``` + +### Step 3. Copy Data into Table + +1. Load data into the target table with the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command: + +```sql +COPY INTO books +FROM @my_internal_stage +FILES = ('books.parquet') +FILE_FORMAT = ( + TYPE = 'PARQUET' +); +``` +2. Verify the loaded data: + +```sql +SELECT * FROM books; + +--- +title |author |date| +----------------------------+-------------------+----+ +Transaction Processing |Jim Gray |1992| +Readings in Database Systems|Michael Stonebraker|2004| +``` + +## Tutorial 3: Loading from External Stage + +Follow this tutorial to upload the sample file to an external stage and load data from the staged file into Databend. + +### Step 1. Create an External Stage + +1. Create an external stage with the [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) command: + +```sql +CREATE STAGE my_external_stage + URL = 's3://databend' + CONNECTION = ( + ENDPOINT_URL = 'http://127.0.0.1:9000', + ACCESS_KEY_ID = 'ROOTUSER', + SECRET_ACCESS_KEY = 'CHANGEME123' + ); +``` + +2. Verify the created stage: + +```sql +SHOW STAGES; + +name |stage_type|number_of_files|creator |comment| +-----------------+----------+---------------+------------------+-------+ +my_external_stage|External | |'root'@'%'| | +``` + +### Step 2: Upload Sample File + +1. Upload the sample file using [BendSQL](/tidb-cloud-lake/guides/bendsql.md): + +```sql +root@localhost:8000/default> PUT fs:///Users/eric/Documents/books.parquet @my_external_stage + +┌───────────────────────────────────────────────┐ +│ file │ status │ +│ String │ String │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +2. 
Verify the staged file: + +```sql +LIST @my_external_stage; + +name |size|md5 |last_modified |creator| +-------------+----+----------------------------------+-----------------------------+-------+ +books.parquet| 998|"88432bf90aadb79073682988b39d461c"|2023-06-28 04:13:15.178 +0000| | +``` + +### Step 3. Copy Data into Table + +1. Load data into the target table with the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command: + +```sql +COPY INTO books +FROM @my_external_stage +FILES = ('books.parquet') +FILE_FORMAT = ( + TYPE = 'PARQUET' +); +``` +2. Verify the loaded data: + +```sql +SELECT * FROM books; + +--- +title |author |date| +----------------------------+-------------------+----+ +Transaction Processing |Jim Gray |1992| +Readings in Database Systems|Michael Stonebraker|2004| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-ndjson.md b/tidb-cloud-lake/guides/load-ndjson.md new file mode 100644 index 0000000000000..c334da94cdfc0 --- /dev/null +++ b/tidb-cloud-lake/guides/load-ndjson.md @@ -0,0 +1,123 @@ +--- +title: Loading NDJSON into Databend +sidebar_label: NDJSON +--- + +## What is NDJSON? + +NDJSON is built on top of JSON, and it is a strict subset of JSON. Each line must contain a separate, self-contained valid JSON object. + +The following example shows a NDJSON file with two JSON objects: + +```text +{"title":"Title_0","author":"Author_0"} +{"title":"Title_1","author":"Author_1"} +``` + +## Loading NDJSON File + +The common syntax for loading NDJSON file is as follows: + +```sql +COPY INTO [.] +FROM { userStage | internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +[ FILE_FORMAT = ( + TYPE = NDJSON, + COMPRESSION = AUTO +) ] +``` + +- For more NDJSON file format options, refer to [NDJSON File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#ndjson-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Data from NDJSON Files + +### Step 1. Create an Internal Stage + +Create an internal stage to store the NDJSON files. + +```sql +CREATE STAGE my_ndjson_stage; +``` + +### Step 2. Create NDJSON files + +Generate a NDJSON file using these SQL statements: + +```sql +COPY INTO @my_ndjson_stage +FROM ( + SELECT + 'Title_' || CAST(number AS VARCHAR) AS title, + 'Author_' || CAST(number AS VARCHAR) AS author + FROM numbers(100000) +) + FILE_FORMAT = (TYPE = NDJSON) +; +``` + +Verify the creation of the NDJSON file: + +```sql +LIST @my_ndjson_stage; +``` + +Result: + +```text +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├────────────────────────────────────────────────────────────────┼─────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ data_b3d94fad-3052-42e4-b090-26409e88c7b9_0000_00000000.ndjson │ 4777780 │ "d1cc98fefc3e3aa0649cade880d754aa" │ 2023-12-26 12:15:59.000 +0000 │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 3: Create Target Table + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR +); +``` + +### Step 4. 
Copying Directly from NDJSON + +To directly copy data into your table from NDJSON files, use the following SQL command: + +```sql +COPY INTO books +FROM @my_ndjson_stage +PATTERN = '.*[.]ndjson' +FILE_FORMAT = ( + TYPE = NDJSON +); +``` + +Result: + +```text +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├────────────────────────────────────────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ data_b3d94fad-3052-42e4-b090-26409e88c7b9_0000_00000000.ndjson │ 100000 │ 0 │ NULL │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4 (Option). Using SELECT to Copy Data + +For more control, like transforming data while copying, use the SELECT statement. Learn more at [`SELECT from NDJSON`](../04-transform/03-querying-ndjson.md). + +```sql +COPY INTO books(title, author) +FROM ( + SELECT $1:title, $1:author + FROM @my_ndjson_stage +) +PATTERN = '.*[.]ndjson' +FILE_FORMAT = ( + TYPE = NDJSON +); +``` diff --git a/tidb-cloud-lake/guides/load-orc.md b/tidb-cloud-lake/guides/load-orc.md new file mode 100644 index 0000000000000..53e14f1267d21 --- /dev/null +++ b/tidb-cloud-lake/guides/load-orc.md @@ -0,0 +1,141 @@ +--- +title: Loading ORC into Databend +sidebar_label: ORC +--- + +## What is ORC? + +ORC (Optimized Row Columnar) is a columnar storage format commonly used in data analytics. + +## Loading ORC File + +The common syntax for loading ORC file is as follows: + +```sql +COPY INTO [.] + FROM { internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +FILE_FORMAT = (TYPE = ORC) +``` + +- For more ORC file format options, refer to [ORC File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#orc-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Data from ORC Files + +This tutorial demonstrates how to load data from ORC files stored in an S3 bucket into a Databend table. + +### Step 1. Create an External Stage + +Create an external stage which points to the ORC files in the S3 bucket. 
+ +```sql +CREATE OR REPLACE CONNECTION aws_s3 + STORAGE_TYPE='s3' + ACCESS_KEY_ID='your-ak' + SECRET_ACCESS_KEY='your-sk'; + +CREATE OR REPLACE STAGE orc_data_stage + URL='s3://wizardbend/databend-doc/sample-data/orc/' + CONNECTION=(CONNECTION_NAME='aws_s3'); +``` + +List the files in the stage: + +```sql +LIST @orc_data_stage; +``` + +Result: + +```text + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├───────────────┼────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ README.txt │ 494 │ "72529dd37b12faf08b090f941507a4f4" │ 2024-06-05 03:05:02.000 +0000 │ NULL │ +│ userdata1.orc │ 47448 │ "1595b4de335ac1825af2b846e82fbf48" │ 2024-06-05 03:05:36.000 +0000 │ NULL │ +│ userdata2.orc │ 46545 │ "8a8a1db8475a46365fcb3bcf773fa703" │ 2024-06-05 03:06:47.000 +0000 │ NULL │ +│ userdata3.orc │ 47159 │ "fb8a92554f90c9385388bd91eb1a25f1" │ 2024-06-05 03:12:52.000 +0000 │ NULL │ +│ userdata4.orc │ 47219 │ "222b1fbde459fd9233f5da5613dbcfa1" │ 2024-06-05 03:13:05.000 +0000 │ NULL │ +│ userdata5.orc │ 47206 │ "f12d768b5d210f488dcf55ed86ceaca6" │ 2024-06-05 03:13:16.000 +0000 │ NULL │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 2: Querying the Stage Files + +Create a file format for ORC and query the stage to view the data and schema. + +```sql +-- Create a ORC file format +CREATE OR REPLACE FILE FORMAT orc_ff TYPE = 'ORC'; + + +SELECT * +FROM @orc_data_stage ( + FILE_FORMAT => 'orc_ff', + PATTERN => '.*[.]orc' +) t +LIMIT 10; +``` + +Result: + +```text +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ _col0 │ _col1 │ _col2 │ _col3 │ _col4 │ _col5 │ _col6 │ _col7 │ _col8 │ _col9 │ _col10 │ _col11 │ _col12 │ +├─────────────────────┼─────────────────┼──────────────────┼──────────────────┼──────────────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────────┼──────────────────┼───────────────────┼──────────────────────────┼──────────────────┤ +│ 2016-02-03 07:55:29 │ 1 │ Amanda │ Jordan │ ajordan0@com.com │ Female │ 1.197.201.2 │ 6759521864920116 │ Indonesia │ 3/8/1971 │ 49756.53 │ Internal Auditor │ 1E+02 │ +│ 2016-02-03 17:04:03 │ 2 │ Albert │ Freeman │ afreeman1@is.gd │ Male │ 218.111.175.34 │ │ Canada │ 1/16/1968 │ 150280.17 │ Accountant IV │ │ +│ 2016-02-03 01:09:31 │ 3 │ Evelyn │ Morgan │ emorgan2@altervista.org │ Female │ 7.161.136.94 │ 6767119071901597 │ Russia │ 2/1/1960 │ 144972.51 │ Structural Engineer │ │ +│ 2016-02-03 00:36:21 │ 4 │ Denise │ Riley │ driley3@gmpg.org │ Female │ 140.35.109.83 │ 3576031598965625 │ China │ 4/8/1997 │ 90263.05 │ Senior Cost Accountant │ │ +│ 2016-02-03 05:05:31 │ 5 │ Carlos │ Burns │ cburns4@miitbeian.gov.cn │ │ 169.113.235.40 │ 5602256255204850 │ South Africa │ │ NULL │ │ │ +│ 2016-02-03 07:22:34 │ 6 │ Kathryn │ White │ kwhite5@google.com │ Female │ 195.131.81.179 │ 3583136326049310 │ Indonesia │ 2/25/1983 │ 69227.11 │ Account Executive │ │ +│ 2016-02-03 08:33:08 │ 7 │ Samuel │ Holmes │ sholmes6@foxnews.com │ Male │ 232.234.81.197 │ 3582641366974690 │ Portugal │ 12/18/1987 │ 14247.62 │ Senior Financial Analyst │ │ +│ 2016-02-03 06:47:06 │ 8 │ Harry │ 
Howell │ hhowell7@eepurl.com │ Male │ 91.235.51.73 │ │ Bosnia and Herzegovina │ 3/1/1962 │ 186469.43 │ Web Developer IV │ │ +│ 2016-02-03 03:52:53 │ 9 │ Jose │ Foster │ jfoster8@yelp.com │ Male │ 132.31.53.61 │ │ South Korea │ 3/27/1992 │ 231067.84 │ Software Test Engineer I │ 1E+02 │ +│ 2016-02-03 18:29:47 │ 10 │ Emily │ Stewart │ estewart9@opensource.org │ Female │ 143.28.251.245 │ 3574254110301671 │ Nigeria │ 1/28/1997 │ 27234.28 │ Health Coach IV │ │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4: Create Target Table + +Create a target table in Databend to store the data from the ORC files. We choose some of the columns from the ORC files to create the table. + +```sql +CREATE OR REPLACE TABLE orc_test_table ( + firstname STRING, + lastname STRING, + email STRING, + gender STRING, + country STRING +); +``` + +### Step 5. Using SELECT to Copy Data + +Copy the data from the ORC files in the external stage into the target table. + +```sql +COPY INTO orc_test_table +FROM ( + SELECT _col2, _col3, _col4, _col5, _col8 + FROM @orc_data_stage +) +PATTERN = '.*[.]orc' +FILE_FORMAT = (TYPE = ORC); +``` + +Result: + +```text +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├───────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ userdata1.orc │ 1000 │ 0 │ NULL │ NULL │ +│ userdata2.orc │ 1000 │ 0 │ NULL │ NULL │ +│ userdata3.orc │ 1000 │ 0 │ NULL │ NULL │ +│ userdata4.orc │ 1000 │ 0 │ NULL │ NULL │ +│ userdata5.orc │ 1000 │ 0 │ NULL │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/guides/load-parquet.md b/tidb-cloud-lake/guides/load-parquet.md new file mode 100644 index 0000000000000..e8c03987071a0 --- /dev/null +++ b/tidb-cloud-lake/guides/load-parquet.md @@ -0,0 +1,113 @@ +--- +title: Loading Parquet into Databend +sidebar_label: Parquet +--- + +## What is Parquet? + +Parquet is a columnar storage format commonly used in data analytics. It is designed to support complex data structures, and it is efficient for processing large datasets. + +Parquet file is most friendly to Databend. It is recommended to use Parquet file as the data source for Databend. + +## Loading Parquet File + +The common syntax for loading Parquet file is as follows: + +```sql +COPY INTO [.] + FROM { internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +FILE_FORMAT = (TYPE = PARQUET) +``` + +- For more Parquet file format options, refer to [Parquet File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#parquet-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Data from Parquet Files + +### Step 1. Create an Internal Stage + +Create an internal stage to store the Parquet files. + +```sql +CREATE STAGE my_parquet_stage; +``` + +### Step 2. 
Create Parquet files + +Generate a Parquet file using these SQL statements: + +```sql +COPY INTO @my_parquet_stage +FROM ( + SELECT + 'Title_' || CAST(number AS VARCHAR) AS title, + 'Author_' || CAST(number AS VARCHAR) AS author + FROM numbers(100000) +) + FILE_FORMAT = (TYPE = PARQUET); +``` + +Verify the creation of the Parquet file: + +```sql +LIST @my_parquet_stage; +``` + +Result: + +```text + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├─────────────────────────────────────────────────────────────────┼────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ data_3890e0b1-0233-422c-b506-3a4501602f28_0000_00000000.parquet │ 65443 │ "ab4631846ca8a2beed6a48be75d2acac" │ 2023-12-26 10:28:18.000 +0000 │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +More details about unload data to stage can be found in [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md). + +### Step 3: Create Target Table + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR +); +``` + +### Step 4. Copying Directly from Parquet + +To directly copy data into your table from Parquet files, use the following SQL command: + +```sql +COPY INTO books + FROM @my_parquet_stage + PATTERN = '.*[.]parquet' + FILE_FORMAT = (TYPE = PARQUET); +``` + +Result: + +```text +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├─────────────────────────────────────────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ data_3890e0b1-0233-422c-b506-3a4501602f28_0000_00000000.parquet │ 100000 │ 0 │ NULL │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4 (Option). Using SELECT to Copy Data + +For more control, like transforming data while copying, use the SELECT statement. Learn more at [`SELECT from Parquet`](../04-transform/00-querying-parquet.md) + +```sql +COPY INTO books (title, author) +FROM ( + SELECT title, author + FROM @my_parquet_stage +) +PATTERN = '.*[.]parquet' +FILE_FORMAT = (TYPE = PARQUET); +``` diff --git a/tidb-cloud-lake/guides/load-semi-structured-formats.md b/tidb-cloud-lake/guides/load-semi-structured-formats.md new file mode 100644 index 0000000000000..0558809665412 --- /dev/null +++ b/tidb-cloud-lake/guides/load-semi-structured-formats.md @@ -0,0 +1,18 @@ +--- +title: Loading Semi-structured Formats +--- + +## What is Semi-structured Data? + +Semi-structured data contains tags or markers to separate semantic elements while not conforming to rigid database structures. Databend efficiently loads these formats using the `COPY INTO` command, with optional on-the-fly data transformation. 
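+
+For example, a single `COPY INTO` statement can pull staged files of any supported format into a table. The following sketch is illustrative only; it assumes a table named `books` and Parquet files already uploaded to a stage named `my_stage`:
+
+```sql
+-- Illustrative only: the stage name, table name, and file pattern are placeholders
+COPY INTO books
+FROM @my_stage
+PATTERN = '.*[.]parquet'
+FILE_FORMAT = (TYPE = PARQUET);
+```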
+ +## Supported File Formats + +| File Format | Description | Guide | +| ----------- | ----------- | ----- | +| **Parquet** | Efficient columnar storage format | [Loading Parquet](/tidb-cloud-lake/guides/load-parquet.md) | +| **CSV** | Comma-separated values | [Loading CSV](/tidb-cloud-lake/guides/load-csv.md) | +| **TSV** | Tab-separated values | [Loading TSV](/tidb-cloud-lake/guides/load-tsv.md) | +| **NDJSON** | Newline-delimited JSON | [Loading NDJSON](/tidb-cloud-lake/guides/load-ndjson.md) | +| **ORC** | Optimized Row Columnar format | [Loading ORC](/tidb-cloud-lake/guides/load-orc.md) | +| **Avro** | Row-based format with schema definition | [Loading Avro](/tidb-cloud-lake/guides/load-avro.md) | diff --git a/tidb-cloud-lake/guides/load-tsv.md b/tidb-cloud-lake/guides/load-tsv.md new file mode 100644 index 0000000000000..2ba1f3210b9a2 --- /dev/null +++ b/tidb-cloud-lake/guides/load-tsv.md @@ -0,0 +1,127 @@ +--- +title: Loading TSV into Databend +sidebar_label: TSV +--- + +## What is TSV? + +TSV (Tab Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. The TSV file format is very similar to CSV, the records are separated by newlines, and each field is separated by a tab. +The following example shows a TSV file with two records: + +```text +Title_0 Author_0 +Title_1 Author_1 +``` + +## Loading TSV File + +The common syntax for loading TSV file is as follows: + +```sql +COPY INTO [.] +FROM { userStage | internalStage | externalStage | externalLocation } +[ PATTERN = '' ] +[ FILE_FORMAT = ( + TYPE = TSV, + SKIP_HEADER = , + COMPRESSION = AUTO +) ] +``` + +- For more TSV file format options, refer to [TSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#tsv-options). +- For more COPY INTO table options, refer to [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). + +## Tutorial: Loading Data from TSV Files + +### Step 1. Create an Internal Stage + +Create an internal stage to store the TSV files. + +```sql +CREATE STAGE my_tsv_stage; +``` + +### Step 2. Create TSV files + +Generate a TSV file using these SQL statements: + +```sql +COPY INTO @my_tsv_stage +FROM ( + SELECT + 'Title_' || CAST(number AS VARCHAR) AS title, + 'Author_' || CAST(number AS VARCHAR) AS author + FROM numbers(100000) +) + FILE_FORMAT = (TYPE = TSV) +; +``` + +Verify the creation of the TSV file: + +```sql +LIST @my_tsv_stage; +``` + +Result: + +```text +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├─────────────────────────────────────────────────────────────┼─────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ data_7413d5d0-f992-4d92-b28e-0e501d66bdc1_0000_00000000.tsv │ 2477780 │ "a906769144de7aa6a0056a86ddae97d2" │ 2023-12-26 11:56:19.000 +0000 │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 3: Create Target Table + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR +); +``` + +### Step 4. 
Copying Directly from TSV + +To directly copy data into your table from TSV files, use the following SQL command: + +```sql +COPY INTO books +FROM @my_tsv_stage +PATTERN = '.*[.]tsv' +FILE_FORMAT = ( + TYPE = TSV, + SKIP_HEADER = 0, -- Skip the first line if it is a header, here we don't have a header + COMPRESSION = AUTO +); +``` + +Result: + +```text +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├─────────────────────────────────────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ data_7413d5d0-f992-4d92-b28e-0e501d66bdc1_0000_00000000.tsv │ 100000 │ 0 │ NULL │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4 (Option). Using SELECT to Copy Data + +For more control, like transforming data while copying, use the SELECT statement. Learn more at [`SELECT from TSV`](../04-transform/02-querying-tsv.md). + +```sql +COPY INTO books (title, author) +FROM ( + SELECT $1, $2 + FROM @my_tsv_stage +) +PATTERN = '.*[.]tsv' +FILE_FORMAT = ( + TYPE = 'TSV', + SKIP_HEADER = 0, -- Skip the first line if it is a header, here we don't have a header + COMPRESSION = 'AUTO' +); +``` diff --git a/tidb-cloud-lake/guides/load-with-addax.md b/tidb-cloud-lake/guides/load-with-addax.md new file mode 100644 index 0000000000000..301d85f66e1df --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-addax.md @@ -0,0 +1,15 @@ +--- +title: Addax +--- + +[Addax](https://github.com/wgzhao/Addax), originally derived from Alibaba's [DataX](https://github.com/alibaba/DataX), is a versatile open-source ETL (Extract, Transform, Load) tool. It excels at seamlessly transferring data between diverse RDBMS (Relational Database Management Systems) and NoSQL databases, making it an optimal solution for efficient data migration. + +For information about the system requirements, download, and deployment steps for Addax, refer to Addax's [Getting Started Guide](https://github.com/wgzhao/Addax#getting-started). The guide provides detailed instructions and guidelines for setting up and using Addax. + +### DatabendReader & DatabendWriter + +DatabendReader and DatabendWriter are integrated plugins of Addax, allowing seamless integration with Databend. The DatabendReader plugin enables reading data from Databend. Databend provides compatibility with the MySQL client protocol, so you can also use the [MySQLReader](https://wgzhao.github.io/Addax/develop/reader/mysqlreader/) plugin to retrieve data from Databend. For more information about DatabendReader, see https://wgzhao.github.io/Addax/develop/reader/databendreader/ + +### Tutorials + +- [Migrating from MySQL with Addax](/tutorials/migrate/migrating-from-mysql-with-addax) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-with-airbyte.md b/tidb-cloud-lake/guides/load-with-airbyte.md new file mode 100644 index 0000000000000..49ed70cfce438 --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-airbyte.md @@ -0,0 +1,87 @@ +--- +title: Airbyte +--- + +


+
+## What is [Airbyte](https://airbyte.com/)?
+
+- Airbyte is an open-source data integration platform that syncs data from applications, APIs, and databases to data warehouses, lakes, and databases.
+- You can load data from any Airbyte source into Databend.
+
+Databend currently provides an experimental Airbyte destination that allows you to send data from your Airbyte sources to Databend.
+
+**NOTE**:
+
+Only the `append` mode is currently implemented, which means the destination only appends data to the table and never overwrites, updates, or deletes existing data.
+In addition, your Databend destination is assumed to use **S3-compatible** storage, because presigned URLs are used to copy data from the Databend stage into the table.
+
+To check whether your backend supports the integration, run the following commands:
+
+```sql
+CREATE STAGE IF NOT EXISTS airbyte_stage FILE_FORMAT = (TYPE = CSV);
+PRESIGN UPLOAD @airbyte_stage/test.csv;
+```
+
+If you get an error like `Code: 501, Text = Presign is not supported`, you cannot use the integration.
+Please read [this](../../20-self-hosted/02-deployment/01-non-production/00-deploying-local.md) for how to use S3 as a storage backend.
+
+## Create a Databend User
+
+Connect to the Databend server with a MySQL client:
+
+```shell
+mysql -h127.0.0.1 -uroot -P3307
+```
+
+Create a user:
+
+```sql
+CREATE ROLE airbyte_role;
+CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'airbyte_role';
+```
+
+Create a database:
+
+```sql
+CREATE DATABASE airbyte;
+```
+
+Grant privileges to the role and assign it to the user:
+
+```sql
+GRANT ALL PRIVILEGES ON airbyte.* TO ROLE airbyte_role;
+GRANT ROLE airbyte_role TO user1;
+```
+
+## Configure Airbyte
+
+To use Databend with Airbyte, add the customized Databend connector to your Airbyte instance.
+You can add the destination on the Settings -> Destinations -> Custom Destinations -> Add a Custom Destination page.
+The custom destination image is `datafuselabs/destination-databend:alpha`.
+


+
+## Setup Databend destination
+
+**Note**:
+
+You should have a Databend instance running and accessible from your Airbyte instance.
+If you run Airbyte locally with Docker Compose, the containers cannot reach services on your host's localhost network.
+You may use a tool such as [ngrok](https://ngrok.com/) to tunnel your service (**NEVER** expose it this way in a production environment).
+


+ +## Test your integration + +You could use Faker source to test your integration, after sync completed, you could run the following command to see expected uploaded data. + +```sql +select * from default._airbyte_raw_users limit 5; +``` diff --git a/tidb-cloud-lake/guides/load-with-datax.md b/tidb-cloud-lake/guides/load-with-datax.md new file mode 100644 index 0000000000000..9163fab6d167d --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-datax.md @@ -0,0 +1,23 @@ +--- +title: DataX +--- + +[DataX](https://github.com/alibaba/DataX) is an open-source data integration tool developed by Alibaba. It is designed to efficiently and reliably transfer data between various data storage systems and platforms, such as relational databases, big data platforms, and cloud storage services. DataX supports a wide range of data sources and data sinks, including but not limited to MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, HBase, MongoDB, and more. + +:::tip +[Apache DolphinScheduler](https://dolphinscheduler.apache.org/) now has added support for Databend as a data source. This enhancement enables you to leverage DolphinScheduler for managing DataX tasks and effortlessly load data from MySQL to Databend. +::: + +For information about the system requirements, download, and deployment steps for DataX, refer to DataX's [Quick Start Guide](https://github.com/alibaba/DataX/blob/master/userGuid.md). The guide provides detailed instructions and guidelines for setting up and using DataX. + +### DatabendWriter + +DatabendWriter is an integrated plugin of DataX, which means it comes pre-installed and does not require any manual installation. It acts as a seamless connector that enables the effortless transfer of data from other databases to Databend. With DatabendWriter, you can leverage the capabilities of DataX to efficiently load data from various databases into Databend. + +DatabendWriter supports two operational modes: INSERT (default) and REPLACE. In INSERT Mode, new data is added while conflicts with existing records are prevented to maintain data integrity. On the other hand, the REPLACE Mode prioritizes data consistency by replacing existing records with newer data in case of conflicts. + +If you need more information about DatabendWriter and its functionalities, you can refer to the documentation available at https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md + +### Tutorials + +- [Migrating from MySQL with DataX](/tutorials/migrate/migrating-from-mysql-with-datax) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-with-dbt.md b/tidb-cloud-lake/guides/load-with-dbt.md new file mode 100644 index 0000000000000..b7a1beb7eb670 --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-dbt.md @@ -0,0 +1,51 @@ +--- +title: dbt +--- + +[dbt](https://www.getdbt.com/) is a transformation workflow that helps you get more work done while producing higher quality results. You can use dbt to modularize and centralize your analytics code, while also providing your data team with guardrails typically found in software engineering workflows. Collaborate on data models, version them, and test and document your queries before safely deploying them to production, with monitoring and visibility. + +[dbt-databend-cloud](https://github.com/databendcloud/dbt-databend) is a plugin developed by Databend with the primary goal of enabling smooth integration between dbt and Databend. 
By utilizing this plugin, you can seamlessly perform data modeling, transformation, and cleansing tasks using dbt and conveniently load the output into Databend. The table below illustrates the level of support that the dbt-databend-cloud plugin offers for commonly used features in dbt: + +| Feature | Supported ? | +|----------------------------- |----------- | +| Table Materialization | Yes | +| View Materialization | Yes | +| Incremental Materialization | Yes | +| Ephemeral Materialization | No | +| Seeds | Yes | +| Sources | Yes | +| Custom Data Tests | Yes | +| Docs Generate | Yes | +| Snapshots | No | +| Connection Retry | Yes | + +## Installing dbt-databend-cloud + +Installing the dbt-databend-cloud plugin has been streamlined for your convenience, as it now includes dbt as a required dependency. To effortlessly set up both dbt and the dbt-databend-cloud plugin, run the following command: + +```shell +pip3 install dbt-databend-cloud +``` + +However, if you prefer to install dbt separately, you can refer to the official dbt installation guide for detailed instructions. + +## Tutorial: Run dbt Project jaffle_shop + +If you're new to dbt, Databend recommends completing the official dbt tutorial available at https://github.com/dbt-labs/jaffle_shop. Before you start, follow [Installing dbt-databend-cloud](#installing-dbt-databend-cloud) to install dbt and dbt-databend-cloud. + +This tutorial provides a sample dbt project called "jaffle_shop," offering hands-on experience with the dbt tool. By configuring the default global profile (~/.dbt/profiles.yml) with the necessary information to connect to your Databend instance, the project will generate tables and views defined in the dbt models directly in your Databend database. Here's an example of the file profiles.yml that connects to a local Databend instance: + +```yml title="~/.dbt/profiles.yml" +jaffle_shop_databend: + target: dev + outputs: + dev: + type: databend + host: 127.0.0.1 + port: 8000 + schema: sjh_dbt + user: databend + pass: ******** +``` + +If you're using Databend Cloud, you can refer to this [Wiki page](https://github.com/databendcloud/dbt-databend/wiki/How-to-use-dbt-with-Databend-Cloud) for step-by-step instructions on how to run the jaffle_shop dbt project. \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-with-debezium.md b/tidb-cloud-lake/guides/load-with-debezium.md new file mode 100644 index 0000000000000..d186e8108075f --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-debezium.md @@ -0,0 +1,94 @@ +--- +title: Debezium +--- + +[Debezium](https://debezium.io/) is a set of distributed services to capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a change event stream, and applications simply read these streams to see the change events in the same order in which they occurred. + +[debezium-server-databend](https://github.com/databendcloud/debezium-server-databend) is a lightweight CDC tool developed by Databend, based on Debezium Engine. Its purpose is to capture real-time changes in relational databases and deliver them as event streams to ultimately write the data into the target database Databend. This tool provides a simple way to monitor and capture database changes, transforming them into consumable events without the need for large data infrastructures like Flink, Kafka, or Spark. 
+ +### Installing debezium-server-databend + +debezium-server-databend can be installed independently without the need for installing Debezium beforehand. Once you have decided to install debezium-server-databend, you have two options available. The first one is to install it from source by downloading the source code and building it yourself. Alternatively, you can opt for a more straightforward installation process using Docker. + +#### Installing debezium-server-databend from Source + +Before you start, make sure JDK 11 and Maven are installed on your system. + +1. Clone the project: + +```bash +git clone https://github.com/databendcloud/debezium-server-databend.git +``` + +2. Change into the project's root directory: + +```bash +cd debezium-server-databend +``` + +3. Build and package debezium server: + +```go +mvn -Passembly -Dmaven.test.skip package +``` + +4. Once the build is completed, unzip the server distribution package: + +```bash +unzip debezium-server-databend-dist/target/debezium-server-databend-dist*.zip -d databendDist +``` + +5. Enter the extracted folder: + +```bash +cd databendDist +``` + +6. Create a file named _application.properties_ in the _conf_ folder with the content in the sample [here](https://github.com/databendcloud/debezium-server-databend/blob/main/debezium-server-databend-dist/src/main/resources/distro/conf/application.properties.example), and modify the configurations according to your specific requirements. For description of the available parameters, see this [page](https://github.com/databendcloud/debezium-server-databend/blob/main/docs/docs.md). + +```bash +nano conf/application.properties +``` + +7. Use the provided script to start the tool: + +```bash +bash run.sh +``` + +#### Installing debezium-server-databend with Docker + +Before you start, make sure Docker and Docker Compose are installed on your system. + +1. Create a file named _application.properties_ in the _conf_ folder with the content in the sample [here](https://github.com/databendcloud/debezium-server-databend/blob/main/debezium-server-databend-dist/src/main/resources/distro/conf/application.properties.example), and modify the configurations according to your specific requirements. For description of the available Databend parameters, see this [page](https://github.com/databendcloud/debezium-server-databend/blob/main/docs/docs.md). + +```bash +nano conf/application.properties +``` + +2. Create a file named _docker-compose.yml_ with the following content: + +```dockerfile +version: '2.1' +services: + debezium: + image: ghcr.io/databendcloud/debezium-server-databend:pr-2 + ports: + - "8080:8080" + - "8083:8083" + volumes: + - $PWD/conf:/app/conf + - $PWD/data:/app/data +``` + +3. Open a terminal or command-line interface and navigate to the directory containing the _docker-compose.yml_ file. + +4. Use the following command to start the tool: + +```bash +docker-compose up -d +``` + +### Tutorials + +- [Migrating from MySQL with Debezium](/tutorials/migrate/migrating-from-mysql-with-debezium) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/load-with-flink-cdc.md b/tidb-cloud-lake/guides/load-with-flink-cdc.md new file mode 100644 index 0000000000000..4bf9c22bbc084 --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-flink-cdc.md @@ -0,0 +1,5 @@ +--- +title: Flink CDC +--- + +See [Flink CDC](/tidb-cloud-lake/tutorials/migrate-mysql-with-flink-cdc.md). 
diff --git a/tidb-cloud-lake/guides/load-with-kafka.md b/tidb-cloud-lake/guides/load-with-kafka.md new file mode 100644 index 0000000000000..20d1ba46706bc --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-kafka.md @@ -0,0 +1,34 @@ +--- +title: Kafka +--- + +[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform that allows you to publish and subscribe to streams of records. It is designed to handle high-throughput, fault-tolerant, and real-time data feeds. Kafka enables seamless communication between various applications, making it an ideal choice for building data pipelines and streaming data processing applications. + +Databend provides the following plugins and tools for data ingestion from Kafka topics: + +- [databend-kafka-connect](#databend-kafka-connect) +- [bend-ingest-kafka](#bend-ingest-kafka) + +## databend-kafka-connect + +The [databend-kafka-connect](https://github.com/databendcloud/databend-kafka-connect) is a Kafka Connect sink connector plugin designed specifically for Databend. This plugin enables seamless data transfer from Kafka topics directly into Databend tables, allowing for real-time data ingestion with minimal configuration. Key features of databend-kafka-connect include: + +- Automatically creates tables in Databend based on the data schema. +- Supports both **Append Only** and **Upsert** write modes. +- Automatically adjusts the schema of Databend tables as the structure of incoming data changes. + +To download databend-kafka-connect and learn more about the plugin, visit the [GitHub repository](https://github.com/databendcloud/databend-kafka-connect) and refer to the README for detailed instructions. + +## bend-ingest-kafka + +[bend-ingest-kafka](https://github.com/databendcloud/bend-ingest-kafka) is a high-performance data ingestion tool specifically designed to efficiently load data from Kafka topics into Databend tables. It supports two primary modes of operation: JSON Transform Mode and Raw Mode, catering to different data ingestion requirements. Key features of bend-ingest-kafka include: + +- Supports two modes: **JSON Transform Mode**, which maps Kafka JSON data directly to Databend tables based on the data schema, and **Raw Mode**, which ingests raw Kafka data while capturing complete Kafka record metadata. +- Provides configurable batch processing settings for size and interval, ensuring efficient and scalable data ingestion. + +To download bend-ingest-kafka and learn more about the tool, visit the [GitHub repository](https://github.com/databendcloud/bend-ingest-kafka) and refer to the README for detailed instructions. + +## Tutorials + +- [Loading from Kafka with bend-ingest-kafka](/tutorials/ingest-and-stream/kafka-bend-ingest-kafka) +- [Loading from Kafka with databend-kafka-connect](/tutorials/ingest-and-stream/kafka-databend-kafka-connect) diff --git a/tidb-cloud-lake/guides/load-with-tapdata.md b/tidb-cloud-lake/guides/load-with-tapdata.md new file mode 100644 index 0000000000000..d4835fdb24c4f --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-tapdata.md @@ -0,0 +1,31 @@ +--- +title: Tapdata +--- + +[Tapdata](https://tapdata.net) is a platform-oriented product designed for data services, aimed at helping enterprises break down multiple data silos, achieve rapid data delivery, and enhance data transfer efficiency through real-time data synchronization. 
We also support the creation of tasks through a low-code approach, making it easy to create tasks with a simple drag-and-drop of nodes, effectively reducing development complexity and shortening project deployment cycles. + +Databend is one of the data sources supported by Tapdata. You can use Tapdata to synchronize data from other platforms to Databend, using Databend as the **destination** for data migration/synchronization. + +![Alt text](@site/static/img/documents_cn/getting-started/tapdata-databend.png) + +## Integrating with Tapdata Cloud + +To establish a connection with Databend Cloud and set it as the synchronization destination in [Tapdata Cloud](https://tapdata.net/tapdata-cloud.html), you need to complete the following steps: + +### Step 1: Deploy Tapdata Agent + +Tapdata Agent is a key program in data synchronization, data heterogeneity, and data development scenarios. Given the high real-time requirements for data flow in these scenarios, deploying Tapdata Agent in your local environment ensures optimal performance based on low-latency local networks to guarantee real-time data flow. + +For Tapdata Agent download and installation instructions, please refer to [Step 1: Provision TapData - Tapdata Cloud](https://docs.tapdata.io/faq/agent-installation). + +### Step 2: Create Connections + +You need to establish a connection for each of the data source and data destination for data synchronization. For example, if you want to synchronize data from MySQL to Databend Cloud, you need to create two connections on Tapdata Cloud—one connecting to MySQL and the other to Databend Cloud. Follow the steps outlined in [Step 2: Connect Data Sources](https://docs.tapdata.io/connectors/) for creating connections. + +Here is an example of connecting to Databend Cloud: + +![Alt text](@site/static/img/documents_cn/getting-started/tapdata-connect.png) + +### Step 3: Create Data Replication Tasks + +Once connections to the data source and Databend Cloud are established, you can begin data synchronization by creating data replication tasks. Refer to [Create a Data Replication Task](https://docs.tapdata.io/data-replication/create-task/) for the operational steps. diff --git a/tidb-cloud-lake/guides/load-with-vector.md b/tidb-cloud-lake/guides/load-with-vector.md new file mode 100644 index 0000000000000..d7fce37e5eefd --- /dev/null +++ b/tidb-cloud-lake/guides/load-with-vector.md @@ -0,0 +1,423 @@ +--- +title: Vector +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +[Vector](https://vector.dev/) is a high-performance observability data pipeline that puts organizations in control of their observability data. Collect, transform, and route all your logs, metrics, and traces to any vendors you want today and any other vendors you may want tomorrow. Vector enables dramatic cost reduction, novel data enrichment, and data security where you need it, not where is most convenient for your vendors. Open source and up to 10x faster than every alternative. + +Vector natively supports delivering data to [Databend as a sink](https://vector.dev/docs/reference/configuration/sinks/databend/), this means that Vector can send data to Databend for storage or further processing. Databend acts as the destination for the data collected and processed by Vector. By configuring Vector to use Databend as a sink, you can seamlessly transfer data from Vector to Databend, enabling efficient data analysis, storage, and retrieval. 
+ +## Integrating with Vector + +To integrate Databend with Vector, start by creating an SQL account in Databend and assigning appropriate permissions. This account will be used for communication and data transfer between Vector and Databend. Then, in the Vector configuration, set up Databend as a Sink. + +### Step 1: Creating an SQL User in Databend + +For instructions on how to create a SQL user in Databend and grant appropriate privileges, see [Create User](/tidb-cloud-lake/sql/create-user.md). Here's an example of creating a user named *user1* with the password *abc123*: + +```sql +CREATE ROLE vector_role; +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'vector_role'; + +CREATE DATABASE nginx; + +GRANT INSERT ON nginx.* TO ROLE vector_role; +GRANT ROLE vector_role TO user1; +``` + +### Step 2: Configure Databend as a Sink in Vector + +In this step, configure Databend as a sink in Vector by specifying the necessary settings such as the input source, compression, database, endpoint, table, and authentication credentials (username and password) for Databend integration. The following is a simple example of configuring Databend as a sink. For a comprehensive list of configuration parameters, refer to the Vector documentation at https://vector.dev/docs/reference/configuration/sinks/databend/ + +```toml title='vector.toml' +... + +[sinks.databend_sink] +type = "databend" +inputs = [ "my-source-or-transform-id" ] # input source +compression = "none" +database = "nginx" #Your database +endpoint = "http://localhost:8000" +table = "mytable" #Your table + +... + +[sinks.databend_sink.auth] +strategy = "basic" +// highlight-next-line +user = "user1" #Databend username +// highlight-next-line +password = "abc123" #Databend password + +... +``` + +## Nginx Access Log Example + +### Step 1. Deploy Databend + +#### 1.1 Install Databend + +Follow the [Docker and Local Deployments](../../20-self-hosted/02-deployment/01-non-production/00-deploying-local.md) guide to deploy a local Databend, or deploy a warehouse in the Databend Cloud. + +#### 1.2 Create a Database and a Table + +```sql +CREATE DATABASE nginx; +``` + +```sql +CREATE TABLE nginx.access_logs ( + `timestamp` TIMESTAMP, + `remote_addr` VARCHAR, + `remote_port` INT, + `request_method` VARCHAR, + `request_uri` VARCHAR, + `server_protocol` VARCHAR, + `status` INT, + `bytes_sent` INT, + `http_referrer` VARCHAR, + `http_user_agent` VARCHAR, + `upstream_addr` VARCHAR, + `scheme` VARCHAR, + `gzip_ratio` VARCHAR, + `request_length` INT, + `request_time` FLOAT, + `ssl_protocol` VARCHAR, + `upstream_response_time` VARCHAR +); +``` + +#### 1.3 Create a User for Vector Auth + +Create a user: + +```sql +CREATE ROLE vector_role; +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'vector_role'; +``` + +Grant privileges to the role and assign it to the user: + +```sql +GRANT INSERT ON nginx.* TO ROLE vector_role; +GRANT ROLE vector_role TO user1; +``` + +### Step 2. Deploy Nginx + +#### 2.1 Install Nginx + +If you haven't install Nginx, please refer to [How to Install Nginx](https://www.nginx.com/resources/wiki/start/topics/tutorials/install/). 
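+
+For example, on a Debian or Ubuntu host the installation might look like the following sketch (package names and service commands vary by platform):
+
+```shell
+# Example for Debian/Ubuntu; adjust for your platform
+sudo apt-get update
+sudo apt-get install -y nginx
+sudo systemctl enable --now nginx
+```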
+ +#### 2.2 Configure Nginx + +```shell title='nginx.conf' +user www-data; +worker_processes 4; +pid /var/run/nginx.pid; + +events { + worker_connections 768; +} + +http { + ## + # Logging Settings + ## + log_format upstream '$remote_addr "$time_local" $host "$request_method $request_uri $server_protocol" $status $bytes_sent "$http_referrer" "$http_user_agent" $remote_port $upstream_addr $scheme $gzip_ratio $request_length $request_time $ssl_protocol "$upstream_response_time"'; + + access_log /var/log/nginx/access.log upstream; + error_log /var/log/nginx/error.log; + + include /etc/nginx/conf.d/*.conf; + include /etc/nginx/sites-enabled/*; +} +``` +This is how the log message looks: +```text +::1 "09/Apr/2022:11:13:39 +0800" localhost "GET /?xx HTTP/1.1" 304 189 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" 50758 - http - 1202 0.000 - "-" +``` + +Use the new `nginx.conf` replace your Nginx configuration and restart the Nginx server. + +### Step 3. Deploy Vector + +#### 3.1 Install Vector + +You can [install Vector](https://vector.dev/docs/setup/installation/) with the installation script: + +```shell +curl --proto '=https' --tlsv1.2 -sSf https://sh.vector.dev | bash +``` + +#### 3.2 Configure Vector + +```toml title='vector.toml' +[sources.nginx_access_log] +type = "file" +// highlight-next-line +include = ["/var/log/nginx/access.log"] +file_key = "file" +max_read_bytes = 10240 + +[transforms.nginx_access_log_parser] +type = "remap" +inputs = ["nginx_access_log"] +drop_on_error = true + +// highlight-next-line +#nginx log_format upstream '$remote_addr "$time_local" $host "$request_method $request_uri $server_protocol" $status $bytes_sent "$http_referrer" "$http_user_agent" $remote_port $upstream_addr $scheme $gzip_ratio $request_length $request_time $ssl_protocol "$upstream_response_time"'; + +source = """ + parsed_log, err = parse_regex(.message, r'^(?P\\S+) \ +\"(?P\\S+ \\S+)\" \ +(?P\\S+) \ +\"(?P\\S+) (?P.+) (?PHTTP/\\S+)\" \ +(?P\\d+) \ +(?P\\d+) \ +\"(?P.*)\" \ +\"(?P.*)\" \ +(?P\\d+) \ +(?P.+) \ +(?P\\S+) \ +(?P\\S+) \ +(?P\\d+) \ +(?P\\S+) \ +(?P\\S+) \ +\"(?P.+)\"$') + if err != null { + log("Unable to parse access log: " + string!(.message), level: "warn") + abort + } + . = merge(., parsed_log) + .timestamp = parse_timestamp!(.time_local, format: "%d/%b/%Y:%H:%M:%S %z") + .timestamp = format_timestamp!(.timestamp, format: "%F %X") + + # Convert from string into integer. + .remote_port, err = to_int(.remote_port) + if err != null { + log("Unable to parse access log: " + string!(.remote_port), level: "warn") + abort + } + + # Convert from string into integer. + .status, err = to_int(.status) + if err != null { + log("Unable to parse access log: " + string!(.status), level: "warn") + abort + } + + # Convert from string into integer. + .bytes_sent, err = to_int(.bytes_sent) + if err != null { + log("Unable to parse access log: " + string!(.bytes_sent), level: "warn") + abort + } + + # Convert from string into integer. + .request_length, err = to_int(.request_length) + if err != null { + log("Unable to parse access log: " + string!(.request_length), level: "warn") + abort + } + + # Convert from string into float. 
+ .request_time, err = to_float(.request_time) + if err != null { + log("Unable to parse access log: " + string!(.request_time), level: "warn") + abort + } + """ + + +[sinks.nginx_access_log_to_databend] + type = "databend" + inputs = ["nginx_access_log_parser"] + // highlight-next-line + database = "nginx" #Your database + // highlight-next-line + table = "access_logs" #Your table + // highlight-next-line + endpoint = "http://localhost:8000/" + compression = "gzip" + + +[sinks.nginx_access_log_to_databend.auth] + strategy = "basic" + // highlight-next-line + user = "user1" #Databend username + // highlight-next-line + password = "abc123" #Databend password + +[[tests]] +name = "extract fields from access log" + +[[tests.inputs]] +insert_at = "nginx_access_log_parser" +type = "raw" +// highlight-next-line +value = '::1 "09/Apr/2022:11:13:39 +0800" localhost "GET /?xx HTTP/1.1" 304 189 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" 50758 - http - 1202 0.000 - "-"' + +[[tests.outputs]] +extract_from = "nginx_access_log_parser" + +[[tests.outputs.conditions]] +type = "vrl" +source = """ + assert_eq!(.remote_addr, "::1") + assert_eq!(.time_local, "09/Apr/2022:11:13:39 +0800") + assert_eq!(.timestamp, "2022-04-09 03:13:39") + assert_eq!(.request_method, "GET") + assert_eq!(.request_uri, "/?xx") + assert_eq!(.server_protocol, "HTTP/1.1") + assert_eq!(.status, 304) + assert_eq!(.bytes_sent, 189) + assert_eq!(.http_referrer, "-") + assert_eq!(.http_user_agent, "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36") + assert_eq!(.remote_port, 50758) + assert_eq!(.upstream_addr, "-") + assert_eq!(.scheme, "http") + assert_eq!(.gzip_ratio, "-") + assert_eq!(.request_length, 1202) + assert_eq!(.request_time, 0.000) + assert_eq!(.ssl_protocol, "-") + assert_eq!(.upstream_response_time, "-") + """ + +[[tests]] +name = "no event from wrong access log" +no_outputs_from = ["nginx_access_log_parser"] + +[[tests.inputs]] +insert_at = "nginx_access_log_parser" +type = "raw" +value = 'I am not access log' +``` + +#### 3.3 Validate Configuration + +Check the `nginx_access_log_parser` transform works or not: + +```shell +vector test ./vector.toml +``` + +If it works, the output is: + +```shell +Running tests +test extract fields from access log ... passed +2022-04-09T04:03:09.704557Z WARN transform{component_kind="transform" component_id=nginx_access_log_parser component_type=remap component_name=nginx_access_log_parser}: vrl_stdlib::log: "Unable to parse access log: I am not access log" internal_log_rate_secs=1 vrl_position=479 +test no event from wrong access log ... passed +``` + +#### 3.4 Run Vector + +```shell +vector -c ./vector.toml +``` + +### Step 4. 
Analyze Nginx Log in Databend + +#### 4.1 Generate logs + +Reload the home page at `http://localhost/xx/yy?mm=nn` many times, or using the [wrk](https://github.com/wg/wrk) HTTP benchmarking tool to generate a large amount Nginx logs quickly: + +```shell +wrk -t12 -c400 -d30s http://localhost +``` + +#### 4.2 Analyze Nginx Access Logs in Databend + +- __Top 10 Request Status__ + +```sql +SELECT count() AS count, status FROM nginx.access_logs GROUP BY status LIMIT 10; + ++-----------+--------+ +| count | status | ++-----------+--------+ +| 106218701 | 404 | ++-----------+--------+ +``` + +- __Top 10 Request Methods__ + +```sql +SELECT count() AS count, request_method FROM nginx.access_logs GROUP BY request_method LIMIT 10; + ++-----------+----------------+ +| count | request_method | ++-----------+----------------+ +| 106218701 | GET | ++-----------+----------------+ +``` + +- __Top 10 Request IPs__ + +```sql +SELECT count(*) AS count, remote_addr AS client FROM nginx.access_logs GROUP BY client ORDER BY count DESC LIMIT 10; + ++----------+-----------+ +| count | client | ++----------+-----------+ +| 98231357 | 127.0.0.1 | +| 2 | ::1 | ++----------+-----------+ +``` + +- __Top 10 Request Pages__ + +```sql +SELECT count(*) AS count, request_uri AS uri FROM nginx.access_logs GROUP BY uri ORDER BY count DESC LIMIT 10; + ++----------+--------------------+ +| count | uri | ++----------+--------------------+ +| 60645174 | /db/abc | +| 41727953 | /a/b/c/d/e/f/d | +| 199852 | /index.html | +| 2 | /xx/yy | ++----------+--------------------+ +``` + + +- __Top 10 HTTP 404 Pages__ + +```sql +SELECT count_if(status=404) AS count, request_uri AS uri FROM nginx.access_logs GROUP BY uri ORDER BY count DESC LIMIT 10; + ++----------+--------------------+ +| count | uri | ++----------+--------------------+ +| 64290894 | /db/abc | +| 41727953 | /a/b/c/d/e/f/d | +| 199852 | /index.html | +| 2 | /xx/yy | ++----------+--------------------+ +``` + +- __Top 10 Requests__ + +```sql +SELECT count(*) AS count, request_uri AS request FROM nginx.access_logs GROUP BY request ORDER BY count DESC LIMIT 10; + ++--------+-----------------------------------------------------------------------------------------------------+ +| count | request | ++--------+-----------------------------------------------------------------------------------------------------+ +| 199852 | /index.html HTTP/1.0 | +| 1000 | /db/abc?good=iphone&uuid=9329836906 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=17967444396 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=16399821384 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=17033481055 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=17769945743 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=15414263117 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=18945218607 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=19889051988 HTTP/1.1 | +| 900 | /miaosha/i/miaosha?goodsRandomName=0e67e331-c521-406a-b705-64e557c4c06c&mobile=15249667263 HTTP/1.1 | ++--------+-----------------------------------------------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/guides/manage-costs.md b/tidb-cloud-lake/guides/manage-costs.md 
new file mode 100644 index 0000000000000..8b388e02d2c6a --- /dev/null +++ b/tidb-cloud-lake/guides/manage-costs.md @@ -0,0 +1,43 @@ +--- +title: Managing Costs +--- + +You can view the current month's consumption and billing history for your organization under **Manage** > **Billing**. + +## Limiting Your Costs + +For admin users, Databend Cloud offers the option to set a spending limit for their organization. This allows administrators to control the maximum amount of money spent on the platform. To do this, go to the homepage and click on **Activate Spending Limit**. On the next page, you can turn on the **Enable Spending Limit** button and specify the maximum monthly spending allowed for your organization. + +:::note +The spending limit you set will apply to each calendar month. For instance, if you set a limit on August 10th, it will be in effect for the entire month of August, from the 1st to the 31st. +::: + +When you set up a spending limit, you need to decide what action Databend Cloud should take when the limit is reached. Currently, there are two options: + +- **Suspend Service**: Your warehouses will not function until the current month ends or you set a higher limit. + +- **Send Notifications Only**: The administrators of your organization will receive email notifications as spending limits are approached. Your warehouses can continue to function properly. + +For the "Send Notifications Only" option, Databend Cloud will send email notifications to administrators based on the following frequency cycle: + +| Spending Range | Notification Frequency | +|---------------- |------------------------ | +| 80% - 90% | Every three days | +| 90% - 100% | Every three days | +| 100% or above | Every three days | + +## Granting Access to Finance Personnel + +To facilitate the work of your finance team while ensuring data security, you can create a role named `billing` within Databend Cloud. This role will be specifically tailored to provide access only to billing-related information. + +```sql +CREATE ROLE billing; +``` + +When inviting finance personnel to your organization, assign them this `billing` role. + +![alt text](../../../../../static/img/documents/pricing-billing/billing-role.png) + +Once they log in to Databend Cloud, they will have restricted access, limited to only the billing page, with all other business-related pages hidden from view. This approach helps to safeguard sensitive data by restricting unnecessary access to other parts of your Databend Cloud environment. + +![alt text](../../../../../static/img/documents/pricing-billing/billing-only-view.png) \ No newline at end of file diff --git a/tidb-cloud-lake/guides/masking-policy.md b/tidb-cloud-lake/guides/masking-policy.md new file mode 100644 index 0000000000000..949fefe0b767f --- /dev/null +++ b/tidb-cloud-lake/guides/masking-policy.md @@ -0,0 +1,170 @@ +--- +title: Masking Policy +--- +import IndexOverviewList from '@site/src/components/IndexOverviewList'; +import EEFeature from '@site/src/components/EEFeature'; + + + +Masking policies protect sensitive data by dynamically transforming column values during query execution. They enable role-based access to confidential information—authorized users see actual data, while others see masked values. + +## How Masking Works + +Policies transform column data at query time, usually based on the caller’s role. 
+ +**Managers see actual values** +```sql +id | email | +---|-----------------| + 2 | eric@example.com| + 1 | sue@example.com | +``` + +**Other roles see masked values** +```sql +id | email | +---|----------| + 2 | *********| + 1 | *********| +``` + +### Key Traits + +- **Query-time** – transformations only occur during SELECTs. +- **Role-aware** – expressions can reference `current_role()` or any condition. +- **Column-scoped** – attach a policy per column; reuse across tables. +- **Non-destructive** – stored data never changes. + +## End-to-End Workflow + +Follow this streamlined sequence to introduce masking on a column. + +### 1. Create the target table + +```sql +CREATE TABLE user_info (id INT, email STRING NOT NULL); +``` + +### 2. Define the masking policy + +```sql +CREATE MASKING POLICY email_mask +AS (val STRING) +RETURNS STRING -> +CASE + WHEN current_role() IN ('MANAGERS') THEN val + ELSE '*********' +END; +``` + +### 3. Attach the policy + +```sql +ALTER TABLE user_info MODIFY COLUMN email SET MASKING POLICY email_mask; +``` + +### 4. Insert and query data + +```sql +INSERT INTO user_info VALUES (1, 'user@example.com'); +SELECT * FROM user_info; +``` + +**Result** + +```sql +id | email +---|---------- + 1 | ********* +``` + +## Read vs Write Behavior + +Masking policies affect read paths only. Write statements always handle true values so applications can store and modify accurate data. + +```sql +-- Write original data +INSERT INTO user_info VALUES (2, 'admin@example.com'); + +-- Read masked data +SELECT * FROM user_info WHERE id = 2; +``` + +**Result** + +```sql +id | email +---|---------- + 2 | ********* +``` + +## Managing Policies + +### DESCRIBE MASKING POLICY + +View metadata, including creation time, signature, and definition. + +```sql +DESCRIBE MASKING POLICY email_mask; +``` + +**Result** + +```sql +Name | Created On | Signature | Return Type | Body | Comment +-----------+-----------------------------+--------------+-------------+----------------------------------------------------------+--------- +email_mask | 2025-11-19 09:49:10.949 UTC | (val STRING) | STRING | CASE WHEN current_role() IN('MANAGERS') THEN val ELSE... | +``` + +### DROP MASKING POLICY + +Remove a policy definition you no longer need. + +```sql +DROP MASKING POLICY [IF EXISTS] email_mask; +``` + +### Detach from a column + +```sql +ALTER TABLE user_info MODIFY COLUMN email UNSET MASKING POLICY; +``` + +## Conditional Masking + +Use the `USING` clause to reference additional columns when the masking logic depends on other values. + +```sql +CREATE MASKING POLICY vip_mask +AS (val STRING, is_vip BOOLEAN) +RETURNS STRING -> +CASE + WHEN is_vip = true THEN val + ELSE '*********' +END; + +ALTER TABLE user_info MODIFY COLUMN email SET MASKING POLICY vip_mask USING (email, is_vip); +INSERT INTO user_info (id, email, is_vip) +VALUES (1, 'vip@example.com', true), (2, 'normal@example.com', false); +SELECT * FROM user_info; +``` + +**Result** + +```sql +id | email | is_vip +---|--------------------|------- + 1 | vip@example.com | true + 2 | ********* | false +``` + +## Privileges & References + +- Grant `CREATE MASKING POLICY` on `*.*` to any role responsible for creating or replacing policies; the creator automatically owns the policy. +- Grant the global `APPLY MASKING POLICY` privilege or `APPLY ON MASKING POLICY ` to roles that attach or detach policies via `ALTER TABLE`. +- Audit access with `SHOW GRANTS ON MASKING POLICY `. 
+- Additional references: + - [User & Role](/tidb-cloud-lake/sql/user-role.md) + - [CREATE MASKING POLICY](/tidb-cloud-lake/sql/create-masking-policy.md) + - [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#column-operations) + - [Masking Policy Commands](/tidb-cloud-lake/sql/masking-policy-sql.md) diff --git a/tidb-cloud-lake/guides/mcp-client-integration.md b/tidb-cloud-lake/guides/mcp-client-integration.md new file mode 100644 index 0000000000000..6854ab9804148 --- /dev/null +++ b/tidb-cloud-lake/guides/mcp-client-integration.md @@ -0,0 +1,205 @@ +--- +title: MCP Client Integration +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +# MCP Client Integration + +## Overview + +[Databend MCP](https://github.com/databendlabs/mcp-databend) connects AI assistants to Databend via Model Context Protocol. Works with Claude Code, Codex, Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. + +**What you can do:** + +- Generate complex SQL queries based on your requirements +- Create and manage scheduled data pipeline tasks +- Explore database schemas and validate query syntax +- Build ETL workflows with COPY, MERGE, and Stage operations + +For example: _"Create a scheduled task that copies parquet files from @my_stage to the orders table every minute, and verify it's running correctly."_ + +## Installation + +### 1. Get Databend Connection + +We recommend using **Databend Cloud** for the best experience. + +1. Log in to [Databend Cloud](https://app.databend.com). +2. Click **Use with AI Tools** in the navigation bar. +3. Select regular connection information (Host, User, Password, etc.). +4. Copy your DSN, which looks like: + `databend://user:pwd@host:443/database?warehouse=warehouse_name` + +![Use with AI Tools](@site/static/img/connect/ai-tools.png) + +### 2. 
Configure Your MCP Client + + + + + +```bash +codex mcp add databend \ + --env DATABEND_DSN='databend://user:password@host:port/database?warehouse=your_warehouse' \ + --env SAFE_MODE='false' \ + -- uv tool run mcp-databend +``` + +Or add to `~/.codex/config.toml`: + +```toml +[mcp_servers.databend] +command = "uv" +args = ["tool", "run", "mcp-databend"] + +[mcp_servers.databend.env] +DATABEND_DSN = "databend://user:password@host:port/database?warehouse=your_warehouse" +SAFE_MODE = "false" +``` + + + + + +```bash +claude mcp add databend \ + --env DATABEND_DSN='databend://user:password@host:port/database?warehouse=your_warehouse' \ + --env SAFE_MODE='false' \ + -- uv tool run mcp-databend +``` + + + + + +Add to `~/.gemini/settings.json`: + +```json +{ + "mcpServers": { + "databend": { + "command": "uv", + "args": ["tool", "run", "mcp-databend"], + "env": { + "DATABEND_DSN": "databend://user:password@host:port/database?warehouse=your_warehouse", + "SAFE_MODE": "false" + } + } + } +} +``` + + + + + +Add to `~/.cursor/mcp.json`: + +```json +{ + "mcpServers": { + "databend": { + "command": "uv", + "args": ["tool", "run", "mcp-databend"], + "env": { + "DATABEND_DSN": "databend://user:password@host:port/database?warehouse=your_warehouse", + "SAFE_MODE": "false" + } + } + } +} +``` + + + + + +Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%/Claude/claude_desktop_config.json` (Windows): + +```json +{ + "mcpServers": { + "databend": { + "command": "uv", + "args": ["tool", "run", "mcp-databend"], + "env": { + "DATABEND_DSN": "databend://user:password@host:port/database?warehouse=your_warehouse", + "SAFE_MODE": "false" + } + } + } +} +``` + + + + + +Add to `.vscode/mcp.json`: + +```json +{ + "mcpServers": { + "databend": { + "command": "uv", + "args": ["tool", "run", "mcp-databend"], + "env": { + "DATABEND_DSN": "databend://user:password@host:port/database?warehouse=your_warehouse", + "SAFE_MODE": "false" + } + } + } +} +``` + + + + + +```bash +export DATABEND_DSN="databend://user:password@host:port/database?warehouse=your_warehouse" +export SAFE_MODE="false" + +uv tool run mcp-databend +``` + + + + + +## Available Tools + +### Database Operations + +| Tool | Description | +| ---------------- | ------------------------------------------------ | +| `execute_sql` | Execute SQL queries with timeout protection | +| `show_databases` | List all databases | +| `show_tables` | List tables in a database (with optional filter) | +| `describe_table` | Get table schema information | + +### Stage Management + +| Tool | Description | +| ------------------ | ------------------------------------------ | +| `show_stages` | List all available stages | +| `list_stage_files` | List files in a specific stage | +| `create_stage` | Create a new stage with connection support | + +### Connection Management + +| Tool | Description | +| ------------------ | ------------------------------ | +| `show_connections` | List all available connections | + +## Configuration + +| Variable | Description | Default | +| ------------------------ | ------------------------------------------------------- | -------- | +| `DATABEND_DSN` | Connection string | Required | +| `SAFE_MODE` | Block dangerous SQL operations (`DROP`, `DELETE`, etc.) | `true` | +| `DATABEND_QUERY_TIMEOUT` | Query timeout in seconds | `300` | + +For more details on building conversational BI tools, see [MCP Server Guide](02-mcp.md). 
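+
+If a client cannot connect, it is usually worth checking the DSN by itself first. The following is only a rough sketch — it assumes BendSQL is installed locally and that your BendSQL version supports passing a DSN and a one-off query on the command line:
+
+```bash
+# Hedged example: test the same DSN you will hand to the MCP client.
+export DATABEND_DSN="databend://user:password@host:443/database?warehouse=your_warehouse"
+bendsql --dsn "$DATABEND_DSN" --query "SELECT VERSION();"
+```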
diff --git a/tidb-cloud-lake/guides/mcp-server.md b/tidb-cloud-lake/guides/mcp-server.md new file mode 100644 index 0000000000000..cf1020c3e7464 --- /dev/null +++ b/tidb-cloud-lake/guides/mcp-server.md @@ -0,0 +1,238 @@ +import DetailsWrap from '@site/src/components/DetailsWrap'; + +# MCP Server for Databend + +[mcp-databend](https://github.com/databendlabs/mcp-databend) is an MCP (Model Context Protocol) server that enables AI assistants to interact directly with your Databend database using natural language. + +## What mcp-databend Can Do + +- **execute_sql** - Execute SQL queries with timeout protection +- **show_databases** - List all available databases +- **show_tables** - List tables in a database (with optional filter) +- **describe_table** - Get detailed table schema information + +## Build a ChatBI Tool + +This tutorial shows you how to build a conversational Business Intelligence tool using mcp-databend and the Agno framework. You'll create a local agent that can answer data questions in natural language. + +![Databend MCP ChatBI](@site/static/img/connect/databend-mcp-chatbi.png) + +## Prerequisites + +Before getting started, you'll need: + +1. **Databend Database** - Either [Databend Cloud](https://app.databend.com) (free tier available) or a self-hosted instance +2. **DeepSeek API Key** - Get your key from [https://platform.deepseek.com/api_keys](https://platform.deepseek.com/api_keys) + +## Step-by-Step Tutorial + +### Step 1: Setup Databend Connection + +If you don't already have a Databend database: + +1. **Sign up for [Databend Cloud](https://app.databend.com)** (free tier available) +2. **Create a warehouse and database** +3. **Get your connection string** from the console + +For detailed DSN format and examples, see [Connection String Documentation](https://docs.databend.com/developer/drivers/#connection-string-dsn). + +| Deployment | Connection String Example | +| ------------------ | ------------------------------------------------------------- | +| **Databend Cloud** | `databend://user:pwd@host:443/database?warehouse=wh` | +| **Self-hosted** | `databend://user:pwd@localhost:8000/database?sslmode=disable` | + +### Step 2: Setup API Keys and Environment + +Set up your API key and database connection: + +```bash +# Set your DeepSeek API key +export DEEPSEEK_API_KEY="your-deepseek-api-key" + +# Set your Databend connection string +export DATABEND_DSN="your-databend-connection-string" +``` + +### Step 3: Install Dependencies + +Create a virtual environment and install the required packages: + +```bash +# Create virtual environment +python3 -m venv .venv +source .venv/bin/activate + +# Install packages +pip install packaging openai agno sqlalchemy fastapi mcp-databend +``` + +### Step 4: Create ChatBI Agent + +Now create your ChatBI agent that uses mcp-databend to interact with your database. 
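+
+Before writing any code, you can optionally confirm that the package installed in Step 3 is importable. This is just a quick check, assuming the package exposes the `mcp_databend` module that the agent launches below:
+
+```bash
+# Optional sanity check for the MCP server module used by agent.py.
+python -c "import mcp_databend" && echo "mcp-databend is importable"
+```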
+ +Create a file `agent.py`: + +```python +from contextlib import asynccontextmanager +import os +import logging +import sys + +from agno.agent import Agent +from agno.playground import Playground +from agno.storage.sqlite import SqliteStorage +from agno.tools.mcp import MCPTools +from agno.models.deepseek import DeepSeek +from fastapi import FastAPI + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +def check_env_vars(): + required = { + "DATABEND_DSN": "https://docs.databend.com/developer/drivers/#connection-string-dsn", + "DEEPSEEK_API_KEY": "https://platform.deepseek.com/api_keys" + } + + missing = [var for var in required if not os.getenv(var)] + + if missing: + print("❌ Missing environment variables:") + for var in missing: + print(f" • {var}: {required[var]}") + print("\nExample: export DATABEND_DSN='...' DEEPSEEK_API_KEY='...'") + sys.exit(1) + + print("✅ Environment variables OK") + +check_env_vars() + +class DatabendTool: + def __init__(self): + self.mcp = None + self.dsn = os.getenv("DATABEND_DSN") + + def create(self): + env = os.environ.copy() + env["DATABEND_DSN"] = self.dsn + self.mcp = MCPTools( + command="python -m mcp_databend", + env=env, + timeout_seconds=300 + ) + return self.mcp + + async def init(self): + try: + await self.mcp.connect() + logger.info("✓ Connected to Databend") + return True + except Exception as e: + logger.error(f"✗ Databend connection failed: {e}") + return False + +databend = DatabendTool() + +agent = Agent( + name="ChatBI", + model=DeepSeek(), + tools=[], + instructions=[ + "You are ChatBI - a Business Intelligence assistant for Databend.", + "Help users explore and analyze their data using natural language.", + "Always start by exploring available databases and tables.", + "Format query results in clear, readable tables.", + "Provide insights and explanations with your analysis." + ], + storage=SqliteStorage(table_name="chatbi", db_file="chatbi.db"), + add_datetime_to_instructions=True, + add_history_to_messages=True, + num_history_responses=5, + markdown=True, + show_tool_calls=True, +) + +@asynccontextmanager +async def lifespan(app: FastAPI): + tool = databend.create() + if not await databend.init(): + logger.error("Failed to initialize Databend") + raise RuntimeError("Databend connection failed") + + agent.tools.append(tool) + logger.info("ChatBI initialized successfully") + + yield + + if databend.mcp: + await databend.mcp.close() + +playground = Playground( + agents=[agent], + name="ChatBI with Databend", + description="Business Intelligence Assistant powered by Databend" +) + +app = playground.get_app(lifespan=lifespan) + +if __name__ == "__main__": + print("🤖 Starting MCP Server for Databend") + print("Open http://localhost:7777 to start chatting!") + playground.serve(app="agent:app", host="127.0.0.1", port=7777) + +``` + +### Step 5: Start Your ChatBI Agent + +Run your agent to start the local server: + +```bash +python agent.py +``` + +You should see: + +``` +✅ Environment variables OK +🤖 Starting MCP Server for Databend +Open http://localhost:7777 to start chatting! +INFO Starting playground on http://127.0.0.1:7777 +INFO: Started server process [189851] +INFO: Waiting for application startup. +INFO:agent:✓ Connected to Databend +INFO:agent:ChatBI initialized successfully +INFO: Application startup complete. 
+INFO: Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit) +``` + +### Step 6: Setup Web Interface + +For a better user experience, you can set up Agno's web interface: + +```bash +# Create the Agent UI +npx create-agent-ui@latest + +# Enter 'y' when prompted, then run: +cd agent-ui && npm run dev +``` + +**Connect to Your Agent:** + +1. Open [http://localhost:3000](http://localhost:3000) +2. Select "localhost:7777" as your endpoint +3. Start asking questions about your data! + +**Try These Queries:** + +- "Show me all databases" +- "What tables do I have?" +- "Describe the structure of my tables" +- "Run a query to show sample data" + +## Resources + +- **GitHub Repository**: [databendlabs/mcp-databend](https://github.com/databendlabs/mcp-databend) +- **PyPI Package**: [mcp-databend](https://pypi.org/project/mcp-databend) +- **Agno Framework**: [Agno MCP](https://docs.agno.com/tools/mcp/mcp) +- **Agent UI**: [Agent UI](https://docs.agno.com/agent-ui/introduction) diff --git a/tidb-cloud-lake/guides/metabase.md b/tidb-cloud-lake/guides/metabase.md new file mode 100644 index 0000000000000..f1ce48655804b --- /dev/null +++ b/tidb-cloud-lake/guides/metabase.md @@ -0,0 +1,88 @@ +--- +title: Metabase +sidebar_position: 4 +--- + +[Metabase](https://www.metabase.com/) is an open-source business intelligence platform. You can use Metabase to ask questions about your data, or embed Metabase in your app to let your customers explore their data on their own. + +Databend provides a JDBC driver named [Metabase Databend Driver](https://github.com/databendcloud/metabase-databend-driver/releases/latest), enabling you to connect to Metabase and dashboard your data in Databend / Databend Cloud. For more information about the Metabase Databend Driver, refer to https://github.com/databendcloud/metabase-databend-driver + +## Downloading & Installing Metabase Databend Driver + +To download and install the Metabase Databend Driver: + +1. Create a folder named **plugins** in the directory where the file **metabase.jar** is stored. + +```bash +$ ls +metabase.jar +$ mkdir plugins +``` + +2. [Download](https://github.com/databendcloud/metabase-databend-driver/releases/latest) the Metabase Databend Driver, then save it in the **plugins** folder. + +3. To start Metabase, run the following command: + +```bash +java -jar metabase.jar +``` + +## Tutorial: Integrating with Metabase + +This tutorial guides you through the process of integrating Databend / Databend Cloud with Metabase using the Metabase Databend Driver. + +### Step 1. Set up Environment + +To follow along, you'll need to install Metabase with Docker. Before you begin, make sure that Docker is installed on your system. + +For this tutorial, you can integrate either with Databend or Databend Cloud: + +- If you choose to integrate with a local Databend instance, follow the [Deployment Guide](/guides/self-hosted) to deploy it if you don't have one already. +- If you prefer to integrate with Databend Cloud, make sure you can log in to your account and obtain the connection information for a warehouse. For more details, see [Connecting to a Warehouse](/guides/cloud/resources/warehouses#connecting). + +### Step 2. Deploy Metabase + +Follow these steps to install and deploy Metabase with Docker: + +1. Pull the latest Docker image of Metabase from the Docker Hub registry. + +```bash +docker pull metabase/metabase +``` + +2. Deploy Metabase. + +```bash +docker run -d -p 3000:3000 --name metabase metabase/metabase +``` + +3. 
[Download](https://github.com/databendcloud/metabase-databend-driver/releases/latest) the Metabase Databend Driver, then import it to the **plugins** folder of the Metabase container in Docker. + +![Alt text](/img/integration/add2plugins.gif) + +4. Restart the Metabase container. + +### Step 3. Connect to Metabase + +1. Open your web browser, and go to http://localhost:3000/. + +2. Complete the initial sign-up process. Select **I'll add my data later** in step 3. + +![Alt text](/img/integration/add-later.png) + +3. Click on the **gear** icon in the top right, and navigate to **Admin settings** > **Databases** > **Add a database** to create a connection: + +| Parameter | Databend | Databend Cloud | +| ----------------------------- | ---------------------- | ---------------------------------- | +| Database type | `Databend` | `Databend` | +| Host | `host.docker.internal` | Obtain from connection information | +| Port | `8000` | `443` | +| Username | For example, `root` | `cloudapp` | +| Password | Enter your password | Obtain from connection information | +| Use a secure connection (SSL) | Toggle off | Toggle on | + +4. Click **Save changes**, then click **Exit admin**. + +You're all set! You can now start creating a query and building a dashboard. For more information, please refer to the Metabase documentation: https://www.metabase.com/docs/latest/index.html + +![Alt text](/img/integration/allset.png) diff --git a/tidb-cloud-lake/guides/mindsdb.md b/tidb-cloud-lake/guides/mindsdb.md new file mode 100644 index 0000000000000..6e4297bb9331c --- /dev/null +++ b/tidb-cloud-lake/guides/mindsdb.md @@ -0,0 +1,199 @@ +--- +title: MindsDB +sidebar_position: 7 +--- + +Data that lives in your database is a valuable asset. [MindsDB](https://mindsdb.com/) enables you to use your data and make forecasts. It speeds up the ML development process by bringing machine learning into the database. With MindsDB, you can build, train, optimize, and deploy your ML models without the need for other platforms. + +Both Databend and Databend Cloud can integrate with MindsDB as a data source, which brings Machine Learning capabilities into Databend. The following tutorials show you how to integrate with MindsDB and make data forecasts, using the [Air Pollution in Seoul](https://www.kaggle.com/datasets/bappekim/air-pollution-in-seoul) dataset as an example. + +## Tutorial-1: Integrating Databend with MindsDB + +Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to https://docs.mindsdb.com/quickstart#1-create-a-mindsdb-cloud-account-or-install-mindsdb-locally + +### Step 1. Load Dataset into Databend + +Run the following SQL statements to create a table in the database `default` and load the [Air Pollution in Seoul](https://www.kaggle.com/datasets/bappekim/air-pollution-in-seoul) dataset using the COPY INTO command: + +```sql +CREATE TABLE pollution_measurement( + MeasurementDate Timestamp, + StationCode String, + Address String, + Latitude double, + Longitude double, + SO2 double, + NO2 double, + O3 double, + CO double, + PM10 double, + PM25 double +); +COPY INTO pollution_measurement FROM 'https://datasets.databend.org/AirPolutionSeoul/Measurement_summary.csv' file_format=(type='CSV' skip_header=1); +``` + +### Step 2. Connect MindsDB to Databend + +1. 
Copy and paste the following SQL statements to the MindsDB Cloud Editor, and click **Run**: + +```sql +CREATE DATABASE databend_datasource +WITH engine='databend', +parameters={ + "protocol": "https", + "user": "", + "port": 8000, + "password": "", + "host": "", + "database": "default" +}; +``` + +:::tip +The SQL statements above connect the database `default` in Databend to your MindsDB Cloud account. For explanations about the parameters, refer to https://docs.mindsdb.com/data-integrations/all-data-integrations#databend +::: + +2. In the MindsDB Cloud Editor, run the following SQL statements to verify the integration: + +```sql +SELECT * FROM databend_datasource.pollution_measurement LIMIT 10; +``` + +![Alt text](/img/integration/mindsdb-verify.png) + +### Step 3. Create a Predictor + +In the MindsDB Cloud Editor, run the following SQL statements to create a predictor: + +```sql +CREATE PREDICTOR airq_predictor +FROM databend_datasource (SELECT * FROM pollution_measurement LIMIT 50) +PREDICT so2; +``` + +Now the predictor will begin training. You can check the status with the following query: + +```sql +SELECT * +FROM mindsdb.models +WHERE name='airq_predictor'; +``` + +:::note +The status of the model must be `complete` before you can start making predictions. +::: + +### Step 4. Make Predictions + +In the MindsDB Cloud Editor, run the following SQL statements to predict the concentration of SO2: + +```sql +SELECT + SO2 AS predicted, + SO2_confidence AS confidence, + SO2_explain AS info +FROM mindsdb.airq_predictor +WHERE (NO2 = 0.005) + AND (CO = 1.2) + AND (PM10 = 5) +``` + +Output: + +![Alt text](/img/integration/mindsdb-predict.png) + +## Tutorial-2: Integrating Databend Cloud with MindsDB + +Before you start, install a local MindsDB or sign up an account for MindsDB Cloud. This tutorial uses MindsDB Cloud. For more information about how to install a local MindsDB, refer to https://docs.mindsdb.com/quickstart#1-create-a-mindsdb-cloud-account-or-install-mindsdb-locally + +### Step 1. Load Dataset into Databend Cloud + +Open a worksheet in Databend Cloud, and run the following SQL statements to create a table in the database `default` and load the [Air Pollution in Seoul](https://www.kaggle.com/datasets/bappekim/air-pollution-in-seoul) dataset using the COPY INTO command: + +```sql +CREATE TABLE pollution_measurement( + MeasurementDate Timestamp, + StationCode String, + Address String, + Latitude double, + Longitude double, + SO2 double, + NO2 double, + O3 double, + CO double, + PM10 double, + PM25 double +); + +COPY INTO pollution_measurement FROM 'https://repo.databend.com/AirPolutionSeoul/Measurement_summary.csv' file_format=(type='CSV' skip_header=1); +``` + +### Step 2. Connect MindsDB to Databend Cloud + +1. Copy and paste the following SQL statements to the MindsDB Cloud Editor, and click **Run**: + +```sql +CREATE DATABASE databend_datasource +WITH engine='databend', +parameters={ + "protocol": "https", + "user": "cloudapp", + "port": 443, + "password": "", + "host": "", + "database": "default" +}; +``` + +:::tip +The SQL statements above connect the database `default` in Databend Cloud to your MindsDB Cloud account. The parameter values can be obtained from the connection information of your warehouse. For more information, see [Connecting to a Warehouse](/guides/cloud/resources/warehouses#connecting). For explanations about the parameters, refer to https://docs.mindsdb.com/data-integrations/all-data-integrations#databend +::: + +2. 
In the MindsDB Cloud Editor, run the following SQL statements to verify the integration: + +```sql +SELECT * FROM databend_datasource.pollution_measurement LIMIT 10; +``` + +![Alt text](@site/static/img/documents/BI/mindsdb-verify.png) + +### Step 3. Create a Predictor + +In the MindsDB Cloud Editor, run the following SQL statements to create a predictor: + +```sql +CREATE PREDICTOR airq_predictor +FROM databend_datasource (SELECT * FROM pollution_measurement LIMIT 50) +PREDICT so2; +``` + +Now the predictor will begin training. You can check the status with the following query: + +```sql +SELECT * +FROM mindsdb.models +WHERE name='airq_predictor'; +``` + +:::note +The status of the model must be `complete` before you can start making predictions. +::: + +### Step 4. Make Predictions + +In the MindsDB Cloud Editor, run the following SQL statements to predict the concentration of SO2: + +```sql +SELECT + SO2 AS predicted, + SO2_confidence AS confidence, + SO2_explain AS info +FROM mindsdb.airq_predictor +WHERE (NO2 = 0.005) + AND (CO = 1.2) + AND (PM10 = 5) +``` + +Output: + +![Alt text](@site/static/img/documents/BI/mindsdb-predict.png) diff --git a/tidb-cloud-lake/guides/monitor-usage.md b/tidb-cloud-lake/guides/monitor-usage.md new file mode 100644 index 0000000000000..8a7806e989253 --- /dev/null +++ b/tidb-cloud-lake/guides/monitor-usage.md @@ -0,0 +1,38 @@ +--- +title: "Monitoring Usage" +--- + +Databend Cloud provides monitoring functionality to help you gain a comprehensive understanding of your and your organization members' usage on the platform. To access the **Monitor** page, click **Monitor** in the sidebar menu on the homepage. The page includes the following tabs: + +- [Metrics](#metrics) +- [SQL History](#sql-history) +- [Task History](#task-history) +- [Audit](#audit): Visible to `account_admin` users only. + +## Metrics + +The **Metrics** tab presents charts that visually illustrate usage statistics for the following metrics, covering data from the past hour, day, or week: + +- Storage Size +- SQL Query Count +- Session Connections +- Data Scanned / Written +- Warehouse Status +- Rows Scanned / Written + +## SQL History + +The **SQL History** tab displays a list of SQL statements that have been executed by all users within your organization. By clicking **Filter** at the top of the list, you can filter records by multiple dimensions. + +Clicking a record on the **SQL History** page reveals detailed information on how Databend Cloud executed the SQL statement, providing access to the following tabs: + +- **Query Details**: Includes Query State (success or failure), Rows Scanned, Warehouse, Bytes Scanned, Start Time, End Time, and Handler Type. +- **Query Profile**: Illustrates how the SQL statement was executed. + +## Task History + +The **Task History** tab offers a comprehensive log of all executed tasks within your organization, enabling users to review task settings and monitor their status. + +## Audit + +The **Audit** tab records the operation logs of all organization members, including the operation type, operation time, IP address, and the account of the operator. By clicking **Filter** at the top of the list, you can filter records by multiple dimensions. 
\ No newline at end of file diff --git a/tidb-cloud-lake/guides/multimodal-data-analytics.md b/tidb-cloud-lake/guides/multimodal-data-analytics.md new file mode 100644 index 0000000000000..72ee5b0bba3b2 --- /dev/null +++ b/tidb-cloud-lake/guides/multimodal-data-analytics.md @@ -0,0 +1,17 @@ +--- +title: Multimodal Data Analytics +--- + +CityDrive Intelligence records video of every drive. Background processing tools split the video stream into keyframe images, extracting rich multimodal information from each image and storing it by `video_id`. These signals include relational metadata, JSON manifests, behavior tags, vector embeddings, and GPS traces. + +This guide set shows how Databend keeps all those workloads in one warehouse—no copy jobs, no extra search cluster. + +| Guide | What it covers | +|-------|----------------| +| [SQL Analytics](/tidb-cloud-lake/guides/sql-analytics.md) | Base tables, filters, joins, windows, aggregating indexes | +| [JSON & Search](/tidb-cloud-lake/guides/json-search.md) | Load `frame_metadata_catalog`, run Elasticsearch `QUERY()`, link bitmap tags | +| [Vector Search](./02-vector-db.md) | Persist embeddings, run cosine search, join risk metrics | +| [Geo Analytics](/tidb-cloud-lake/guides/geo-analytics.md) | Use `GEOMETRY`, distance/polygon filters, traffic-light joins | +| [Lakehouse ETL](/tidb-cloud-lake/guides/lakehouse-etl.md) | Stage once, `COPY INTO` shared tables, add streams/tasks | + +Walk through them in order to see how the same identifiers flow from classic SQL to text search, vector, geo, and ETL—everything grounded in a single CityDrive scenario. diff --git a/tidb-cloud-lake/guides/network-policy.md b/tidb-cloud-lake/guides/network-policy.md new file mode 100644 index 0000000000000..de72a55f4666e --- /dev/null +++ b/tidb-cloud-lake/guides/network-policy.md @@ -0,0 +1,91 @@ +--- +title: Network Policy +--- + +Network policies control who can log in to Databend based on the client IP. Even if the credentials are correct, a connection request is rejected when its IP does not satisfy the policy, giving you an extra security layer beyond username and password. + +## How It Works + +- `ALLOWED_IP_LIST` accepts single IPv4 addresses or CIDR blocks such as `10.0.0.0/24`. Only addresses in the list are allowed to log in. +- `BLOCKED_IP_LIST` (optional) lets you carve out explicit deny rules from the allowed ranges. Databend checks the blocked list first, so an IP that exists in both lists is still denied. +- A user can reference at most one network policy at a time, but the same policy can be shared across many users for easier management. +- If the server cannot determine a client IP or the IP does not match the allowed list, Databend immediately returns `AuthenticateFailure`. + +## End-to-End Example + +The following walkthrough covers the typical lifecycle: create a policy, attach it to users, confirm its status, update it centrally, and finally detach and drop it. + +### 1. Create and Inspect a Policy + +```sql +CREATE NETWORK POLICY corp_vpn_policy + ALLOWED_IP_LIST=('10.1.0.0/16', '172.16.8.12/32') + BLOCKED_IP_LIST=('10.1.10.25') + COMMENT='Only VPN ranges'; + +SHOW NETWORK POLICIES; + +Name |Allowed Ip List |Blocked Ip List|Comment | +----------------+--------------------------+---------------+-----------------+ +corp_vpn_policy |10.1.0.0/16,172.16.8.12/32|10.1.10.25 |Only VPN ranges | +``` + +### 2. Attach the Policy to Users + +```sql +CREATE USER alice IDENTIFIED BY 'Str0ngPass!' 
WITH SET NETWORK POLICY='corp_vpn_policy'; +CREATE USER bob IDENTIFIED BY 'An0therPass!'; + +-- Apply the policy to an existing user +ALTER USER bob WITH SET NETWORK POLICY='corp_vpn_policy'; +``` + +### 3. Verify Enforcement + +```sql +DESC USER alice; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ default_role │ roles │ disabled │ network_policy │ password_policy │ must_change_password │ +├────────┼──────────┼──────────────────────┼──────────────┼───────┼──────────┼───────────────────┼─────────────────┼──────────────────────┤ +│ alice │ % │ double_sha1_password │ │ │ false │ corp_vpn_policy │ │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +DESC NETWORK POLICY corp_vpn_policy; + +Name |Allowed Ip List |Blocked Ip List|Comment | +----------------+--------------------------+---------------+----------------+ +corp_vpn_policy |10.1.0.0/16,172.16.8.12/32|10.1.10.25 |Only VPN ranges | +``` + +### 4. Update and Reuse the Policy + +Use [ALTER NETWORK POLICY](/tidb-cloud-lake/sql/network-policy.md) to adjust the allowed or blocked IPs without touching each user: + +```sql +ALTER NETWORK POLICY corp_vpn_policy + SET ALLOWED_IP_LIST=('10.1.0.0/16', '10.2.0.0/16') + BLOCKED_IP_LIST=('10.1.10.25', '10.2.5.5') + COMMENT='VPN + DR site'; + +DESC NETWORK POLICY corp_vpn_policy; + +Name |Allowed Ip List |Blocked Ip List |Comment | +----------------+----------------------------+-------------------------+-----------------+ +corp_vpn_policy |10.1.0.0/16,10.2.0.0/16 |10.1.10.25,10.2.5.5 |VPN + DR site | +``` + +Every user referencing the policy automatically picks up the new IP ranges. + +### 5. Detach and Clean Up + +```sql +ALTER USER bob WITH UNSET NETWORK POLICY; +DROP NETWORK POLICY corp_vpn_policy; +``` + +Confirm that no users depend on the policy before dropping it; otherwise, their logins will fail. + +--- + +For full syntax details, see the [Network Policy SQL reference](/tidb-cloud-lake/sql/network-policy.md), which covers `CREATE`, `ALTER`, `SHOW`, `DESC`, and `DROP`. diff --git a/tidb-cloud-lake/guides/ngram-index.md b/tidb-cloud-lake/guides/ngram-index.md new file mode 100644 index 0000000000000..693bcecd630df --- /dev/null +++ b/tidb-cloud-lake/guides/ngram-index.md @@ -0,0 +1,185 @@ +--- +title: Ngram Index +--- + +# Ngram Index: Fast Pattern Matching for LIKE Queries + +Ngram indexes accelerate pattern matching queries using the `LIKE` operator with wildcards (`%`), enabling fast substring searches without full table scans. + +## What Problem Does It Solve? + +Pattern matching with `LIKE` queries faces significant performance challenges on large datasets: + +| Problem | Impact | Ngram Index Solution | +|---------|--------|---------------------| +| **Slow Wildcard Searches** | `WHERE content LIKE '%keyword%'` scans entire tables | Pre-filter data blocks using n-gram segments | +| **Full Table Scans** | Every pattern search reads all rows | Read only relevant data blocks containing patterns | +| **Poor Search Performance** | Users wait long for substring search results | Sub-second pattern matching response times | +| **Ineffective Traditional Indexes** | B-tree indexes can't optimize middle wildcards | Character-level indexing handles any wildcard position | + +**Example**: Searching for `'%error log%'` in 10M log entries. Without ngram index, it scans all 10M rows. 
With ngram index, it pre-filters to ~1000 relevant blocks instantly. + +## Ngram vs Full-Text Index: When to Use Which? + +| Feature | Ngram Index | Full-Text Index | +|---------|-------------|-----------------| +| **Primary Use Case** | Pattern matching with `LIKE '%pattern%'` | Semantic text search with `MATCH()` | +| **Search Type** | Exact substring matching | Word-based search with relevance | +| **Query Syntax** | `WHERE column LIKE '%text%'` | `WHERE MATCH(column, 'text')` | +| **Advanced Features** | Case-insensitive matching | Fuzzy search, relevance scoring, boolean operators | +| **Performance Focus** | Accelerate existing LIKE queries | Replace LIKE with advanced search functions | +| **Best For** | Log analysis, code search, exact pattern matching | Document search, content discovery, search engines | + +**Choose Ngram Index when:** +- You have existing `LIKE '%pattern%'` queries to optimize +- Need exact substring matching (case-insensitive) +- Working with structured data like logs, codes, or IDs +- Want to improve performance without changing query syntax + +**Choose Full-Text Index when:** +- Building search functionality for documents or content +- Need fuzzy search, relevance scoring, or complex queries +- Working with natural language text +- Want advanced search capabilities beyond simple pattern matching + +## How Ngram Index Works + +Ngram indexes break text into overlapping character substrings (n-grams) for fast pattern lookup: + +**Example with `gram_size = 3`:** +```text +Input: "The quick brown" +N-grams: "The", "he ", "e q", " qu", "qui", "uic", "ick", "ck ", "k b", " br", "bro", "row", "own" +``` + +**Query Processing:** +```sql +SELECT * FROM t WHERE content LIKE '%quick br%' +``` +1. Pattern `'quick br'` is tokenized into n-grams: "qui", "uic", "ick", "ck ", "k b", " br" +2. Index filters data blocks containing these n-grams +3. 
Full `LIKE` filter applied only to pre-filtered blocks + +:::note **Important Limitations** +- Pattern must be at least `gram_size` characters long (short patterns like `'%yo%'` with `gram_size=3` won't use the index) +- Matches are case-insensitive ("FOO" matches "foo", "Foo", "fOo") +- Only works with `LIKE` operator, not with other pattern matching functions +::: + +## Quick Setup + +```sql +-- Create table with text content +CREATE TABLE logs(id INT, message STRING); + +-- Create ngram index with 3-character segments +CREATE NGRAM INDEX logs_message_idx ON logs(message) gram_size = 3; + +-- Insert data (automatically indexed) +INSERT INTO logs VALUES (1, 'Application error occurred'); + +-- Search using LIKE - automatically optimized +SELECT * FROM logs WHERE message LIKE '%error%'; +``` + +## Complete Example + +This example demonstrates creating an ngram index for log analysis and verifying its performance benefits: + +```sql +-- Create table for application logs +CREATE TABLE t_articles ( + id INT, + content STRING +); + +-- Create ngram index with 3-character segments +CREATE NGRAM INDEX ngram_idx_content +ON t_articles(content) +gram_size = 3; + +-- Verify index creation +SHOW INDEXES; +``` + +```sql +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ type │ original │ definition │ created_on │ updated_on │ +├───────────────────┼────────┼──────────┼──────────────────────────────────┼────────────────────────────┼─────────────────────┤ +│ ngram_idx_content │ NGRAM │ │ t_articles(content)gram_size='3' │ 2025-05-13 01:02:58.598409 │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +```sql +-- Insert test data: 995 irrelevant rows + 5 target rows +INSERT INTO t_articles +SELECT number, CONCAT('Random text number ', number) +FROM numbers(995); + +INSERT INTO t_articles VALUES + (1001, 'The silence was deep and complete'), + (1002, 'They walked in silence through the woods'), + (1003, 'Silence fell over the room'), + (1004, 'A moment of silence was observed'), + (1005, 'In silence, they understood each other'); + +-- Search with pattern matching +SELECT id, content FROM t_articles WHERE content LIKE '%silence%'; + +-- Verify index usage +EXPLAIN SELECT id, content FROM t_articles WHERE content LIKE '%silence%'; +``` + +**Performance Results:** +```sql +-[ EXPLAIN ]----------------------------------- +TableScan +├── table: default.default.t_articles +├── output columns: [id (#0), content (#1)] +├── read rows: 5 +├── read size: < 1 KiB +├── partitions total: 2 +├── partitions scanned: 1 +├── pruning stats: [segments: , blocks: ] +├── push downs: [filters: [is_true(like(t_articles.content (#1), '%silence%'))], limit: NONE] +└── estimated rows: 15.62 +``` + +**Key Performance Indicator:** `bloom pruning: 2 to 1` shows the ngram index successfully filtered out 50% of data blocks before scanning. 
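+
+To see the limitations from the note above in practice, you can reuse the same `t_articles` table. This is only a quick sketch — the exact pruning statistics will depend on how your data is distributed across blocks:
+
+```sql
+-- Matching is case-insensitive: this finds the same 'silence' rows as before.
+SELECT id, content FROM t_articles WHERE content LIKE '%SILENCE%';
+
+-- The pattern is shorter than gram_size (3), so the ngram index cannot help
+-- and the query falls back to scanning all blocks.
+EXPLAIN SELECT id, content FROM t_articles WHERE content LIKE '%si%';
+```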
+ +## Best Practices + +| Practice | Benefit | +|----------|---------| +| **Choose Appropriate gram_size** | `gram_size=3` works well for most cases; larger values for longer patterns | +| **Index Frequently Searched Columns** | Focus on columns used in `LIKE '%pattern%'` queries | +| **Monitor Index Usage** | Use `EXPLAIN` to verify `bloom pruning` statistics | +| **Consider Pattern Length** | Ensure search patterns are at least `gram_size` characters long | + +## Essential Commands + +For complete command reference, see [Ngram Index](/tidb-cloud-lake/sql/ngram-index.md). + +| Command | Purpose | +|----------------------------------------------------------|----------------------------------------------| +| `CREATE NGRAM INDEX name ON table(column) gram_size = N` | Create ngram index with N-character segments | +| `SHOW INDEXES` | List all indexes including ngram indexes | +| `REFRESH NGRAM INDEX name ON table` | Refresh ngram index | +| `DROP NGRAM INDEX name ON table` | Remove ngram index | + +:::tip **When to Use Ngram Indexes** +**Ideal for:** +- Log analysis and monitoring systems +- Code search and pattern matching +- Product catalog searches +- Any application with frequent `LIKE '%pattern%'` queries + +**Not recommended for:** +- Short pattern searches (less than `gram_size` characters) +- Exact string matching (use equality comparison instead) +- Complex text search requirements (use Full-Text Index instead) +::: + +--- + +*Ngram indexes are essential for applications requiring fast pattern matching with `LIKE` queries on large text datasets.* diff --git a/tidb-cloud-lake/guides/organization-members.md b/tidb-cloud-lake/guides/organization-members.md new file mode 100644 index 0000000000000..6d1d20da5bd07 --- /dev/null +++ b/tidb-cloud-lake/guides/organization-members.md @@ -0,0 +1,59 @@ +--- +title: Organization & Members +--- + +This topic explains the concept of an organization and its members in Databend Cloud. + +## Understanding Organization + +Organization is an essential concept in Databend Cloud. All the users, databases, warehouses, and other objects in Databend Cloud are associated with an organization. An organization is a group for managing users and their resources. + +In an organization of Databend Cloud, data and resources are shared among all users of the organization. Users can collaborate with each other to manage and analyze the organization's data effectively by taking advantage of the cloud-native features. + +Please note that data is not shared across organizations, and organizations cannot be combined either if your company owns multiple organizations in Databend Cloud. + +### Creating an Organization + +When you provide an organization name during the signup process, you create an organization in Databend Cloud with your account as an Admin account. You will also need to select a pricing plan, a cloud provider, and a region for the new organization. For more information, see [Getting Started](../01-getting-started.md). + +![](@site/static/img/documents/getting-started/01.jpg) + +:::tip +If you're invited by a user who already belongs to an organization in Databend Cloud, the textbox will show that organization's name. In this case, you cannot create another organization. 
+
+:::
+
+### Switching to Another Organization
+
+If you're a Databend Cloud user who has accepted invitations from multiple organizations, you can switch between these organizations by clicking on the organization name in the top left corner of the page and selecting the organization you want to switch to.
+
+![Alt text](@site/static/img/documents/overview/switch-org.gif)
+
+## Managing Members
+
+To view all the members in your organization, go to **Admin** > **Users & Roles**. This page provides a list of all members, including their email addresses, roles, join times, and last active times. If you're an `account_admin`, you can also change a member's role or remove a member from your organization.
+
+- The roles listed show the roles assigned to users when they were invited. While these roles can be changed on the page, they cannot be revoked using SQL. However, you can grant additional roles, or grant privileges to roles and assign them to users based on their email addresses. These user accounts, identified by their email addresses, can also function as SQL users in Databend Cloud. Example:
+
+```sql
+GRANT SELECT ON *.* TO ROLE writer;
+GRANT ROLE writer TO 'eric@databend.com';
+```
+
+- The page does not display users created using SQL. To view the SQL users that have been created (see [**CREATE USER**](/tidb-cloud-lake/sql/create-user.md) and [**CREATE ROLE**](/tidb-cloud-lake/sql/create-role.md)), use the [SHOW USERS](/tidb-cloud-lake/sql/show-users.md) command.
+
+### Inviting New Members
+
+To invite a new member to your organization, navigate to the **Admin** > **Users & Roles** page and click on **Invite New Member**. In the dialog box that appears, enter the user's email address and select a role from the list. This list includes built-in roles and any custom roles created for your organization. For more information about the roles, see [Roles](/tidb-cloud-lake/guides/roles.md).
+
+An invitation email will be sent to the invited user. Inside the email, there will be a link that the user can click on to initiate the signup process.
+
+![Alt text](@site/static/img/documents/overview/invite.png)
+
+![Alt text](@site/static/img/documents/overview/invite2.png)
+
+:::note
+
+- Inviting new members to the organization is a privilege restricted to the `account_admin` role only.
+
+- If your organization is under the Trial Plan, it permits a maximum of one user. In such a case, you won't have the capability to extend invitations to additional members.
+  :::
diff --git a/tidb-cloud-lake/guides/ownership.md b/tidb-cloud-lake/guides/ownership.md
new file mode 100644
index 0000000000000..62cccde74505a
--- /dev/null
+++ b/tidb-cloud-lake/guides/ownership.md
@@ -0,0 +1,67 @@
+---
+title: Ownership
+---
+
+Ownership is a specialized privilege that signifies the exclusive rights and responsibilities a role holds over a specific data object (currently including a database, table, UDF, and stage) within Databend.
+
+## Granting Ownership
+
+An object's ownership is automatically granted to the role of the user who creates it and can be transferred between roles using the [GRANT](/tidb-cloud-lake/sql/grant.md) command:
+
+- Granting ownership of an object to a new role transfers full ownership to the new role, removing it from the previous role. For example, if Role A initially owns a table and you grant ownership to Role B, Role B will become the new owner, and Role A will no longer have ownership rights to that table.
+- Granting ownership to the built-in role `public` is not recommended for security reasons. If a user is in the `public` role when creating a object, then all users will have ownership of the object because each user has the `public` role by default. Databend recommends creating and assigning customized roles to users instead of using the `public` role for clarified ownership management. For information about the built-in roles, see [Built-in Roles](/tidb-cloud-lake/guides/roles.md). +- Ownership cannot be granted for tables in the `default` database, as it is owned by the built-in role `account_admin`. + +## Revoking Ownership Not Allowed + +Revoking ownership is *not* supported because every object must have an owner. + +- If an object is dropped, it will not retain its ownership by the original role. If the object is restored (if possible), ownership will not be automatically reassigned, and an `account_admin` will need to manually reassign ownership to a role. +- If a role that owns an object is deleted, an `account_admin` can transfer ownership of the object to another role. + +## Examples + +To grant ownership to a role, use the [GRANT](/tidb-cloud-lake/sql/grant.md) command. These examples demonstrate granting ownership of different database objects to the role 'data_owner': + +```sql +-- Grant ownership of all tables in the 'finance_data' database to the role 'data_owner' +GRANT OWNERSHIP ON finance_data.* TO ROLE 'data_owner'; + +-- Grant ownership of the table 'transactions' in the 'finance_data' schema to the role 'data_owner' +GRANT OWNERSHIP ON finance_data.transactions TO ROLE 'data_owner'; + +-- Grant ownership of the stage 'ingestion_stage' to the role 'data_owner' +GRANT OWNERSHIP ON STAGE ingestion_stage TO ROLE 'data_owner'; + +-- Grant ownership of the user-defined function 'calculate_profit' to the role 'data_owner' +GRANT OWNERSHIP ON UDF calculate_profit TO ROLE 'data_owner'; +``` + +This example demonstrates the establishment of role-based ownership in Databend. Administrators create a role 'role1' and assign it to user 'u1'. Permissions to create tables in the 'db' schema are granted to 'role1'. Consequently, when 'u1' logs in, they possess the privileges of 'role1', allowing them to create and own tables under 'db'. However, access to tables not owned by 'role1' is restricted, as evidenced by the failed query on 'db.t_old_exists'. + +```sql +-- Admin creates roles and assigns roles to corresponding users +CREATE ROLE role1; +CREATE USER u1 IDENTIFIED BY '123' WITH DEFAULT ROLE 'role1'; +GRANT CREATE ON db.* TO ROLE role1; +GRANT ROLE role1 TO u1; + +-- After u1 logs into Databend, role1 has been granted to u1, so u1 can create and own tables under db: +u1> CREATE TABLE db.t(id INT); +u1> INSERT INTO db.t VALUES(1); +u1> SELECT * FROM db.t; +u1> SELECT * FROM db.t_old_exists; -- Failed because the owner of this table is not role1 +``` + +This example shows how to let a user create databases that are owned only by their role, so other users cannot see them unless explicitly granted access: + +```sql +CREATE ROLE part1_role; +GRANT CREATE DATABASE ON *.* TO ROLE part1_role; +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT ROLE 'part1_role'; +GRANT ROLE part1_role TO user1; + +-- When user1 creates a database, ownership is assigned to part1_role. +-- Other users will not be able to see or access that database unless +-- privileges or ownership are granted to their roles. 
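+
+-- Illustrative follow-up (object names are assumed, not part of the original example):
+-- user1> CREATE DATABASE part1_db;                          -- ownership goes to part1_role
+-- admin> GRANT SELECT ON part1_db.* TO ROLE another_role;   -- explicitly share read access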
+``` diff --git a/tidb-cloud-lake/guides/password-policy.md b/tidb-cloud-lake/guides/password-policy.md new file mode 100644 index 0000000000000..dc9ee314e01a8 --- /dev/null +++ b/tidb-cloud-lake/guides/password-policy.md @@ -0,0 +1,111 @@ +--- +title: Password Policy +--- + +Password policies define how strong a Databend password must be (length, characters, history, retry limits, and more) and how often it can change. They add predictable guardrails around every `CREATE USER` and password change. For the full list of attributes, see [Password Policy Attributes](/tidb-cloud-lake/sql/create-password-policy.md#password-policy-attributes). + +## How It Works + +- SQL users start with no password policy. Assign one either when creating the user (`CREATE USER ... WITH SET PASSWORD POLICY`) or later via [ALTER USER](/tidb-cloud-lake/sql/alter-user.md). Policies do **not** apply to admin accounts declared in [`databend-query.toml`](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml). +- Whenever a managed user sets or changes a password, Databend validates the complexity rules (length and character mix) and, for password changes, enforces minimum age and password history. +- On login, Databend also tracks failed attempts and lockouts based on `PASSWORD_MAX_RETRIES`/`PASSWORD_LOCKOUT_TIME_MINS`, and it flags expired passwords after `PASSWORD_MAX_AGE_DAYS`. Expired users can log in only to change their password. + +:::note +Users normally cannot change their own password unless they have the built-in `account-admin` role. An `account-admin` can run `ALTER USER ... IDENTIFIED BY ...` to rotate passwords for anyone. +::: + +## End-to-End Example + +This walkthrough creates dedicated policies for administrators and analysts, binds them to users, and shows how to revise or remove them later. + +### 1. Create Policies and Inspect Them + +```sql +CREATE PASSWORD POLICY dba_policy + PASSWORD_MIN_LENGTH = 12 + PASSWORD_MAX_LENGTH = 18 + PASSWORD_MIN_UPPER_CASE_CHARS = 2 + PASSWORD_MIN_LOWER_CASE_CHARS = 2 + PASSWORD_MIN_NUMERIC_CHARS = 2 + PASSWORD_MIN_SPECIAL_CHARS = 1 + PASSWORD_MIN_AGE_DAYS = 1 + PASSWORD_MAX_AGE_DAYS = 45 + PASSWORD_MAX_RETRIES = 3 + PASSWORD_LOCKOUT_TIME_MINS = 30 + PASSWORD_HISTORY = 5 + COMMENT='Strict controls for DBAs'; + +CREATE PASSWORD POLICY analyst_policy + COMMENT='Defaults for analysts'; + +SHOW PASSWORD POLICIES; + +┌─────────────────┬───────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ comment │ options │ +├─────────────────┼───────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ analyst_policy │ Defaults for analysts │ MIN_LENGTH=8, MAX_LENGTH=256, MIN_UPPER_CASE_CHARS=1, MIN_LOWER_CASE_CHARS=1, MIN_NUMERIC_CHARS=1, MIN_SPECIAL_CHARS=0, ... HISTORY=0 │ +│ dba_policy │ Strict controls for DBAs │ MIN_LENGTH=12, MAX_LENGTH=18, MIN_UPPER_CASE_CHARS=2, MIN_LOWER_CASE_CHARS=2, MIN_NUMERIC_CHARS=2, MIN_SPECIAL_CHARS=1, ... HISTORY=5 │ +└─────────────────┴───────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### 2. Attach the Policy to Users + +```sql +CREATE USER dba_jane IDENTIFIED BY 'Str0ngPass123!' 
WITH SET PASSWORD POLICY='dba_policy'; + +CREATE USER analyst_mike IDENTIFIED BY 'Abc12345' + WITH SET PASSWORD POLICY='analyst_policy'; + +CREATE USER analyst_zoe IDENTIFIED BY 'Byt3Crush!'; +ALTER USER analyst_zoe WITH SET PASSWORD POLICY='analyst_policy'; +``` + +### 3. Verify the Assignments + +```sql +DESC USER dba_jane; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ default_role │ roles │ disabled │ network_policy │ password_policy │ must_change_password │ +├─────────┼──────────┼──────────────────────┼──────────────┼───────┼──────────┼────────────────┼─────────────────┼──────────────────────┤ +│ dba_jane│ % │ double_sha1_password │ │ │ false │ │ dba_policy │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +DESC PASSWORD POLICY dba_policy; + +Name |Comment |Options +-----------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------+ +dba_policy |Strict controls for DBAs |MIN_LENGTH=12,MAX_LENGTH=18,MIN_UPPER_CASE_CHARS=2,MIN_LOWER_CASE_CHARS=2,MIN_NUMERIC_CHARS=2,MIN_SPECIAL_CHARS=1,...,HISTORY=5 | +``` + +### 4. Update a Policy Centrally + +Use [ALTER PASSWORD POLICY](/tidb-cloud-lake/sql/alter-password-policy.md) to tighten rules without touching each user: + +```sql +ALTER PASSWORD POLICY analyst_policy SET + PASSWORD_MIN_SPECIAL_CHARS = 1 + PASSWORD_MAX_AGE_DAYS = 60 + COMMENT='Analysts need specials now'; + +DESC PASSWORD POLICY analyst_policy; + +Name |Comment |Options +---------------+-----------------------------+------------------------------------------------------------------------------------------------------------------------+ +analyst_policy |Analysts need specials now |MIN_LENGTH=8,MAX_LENGTH=256,MIN_UPPER_CASE_CHARS=1,MIN_LOWER_CASE_CHARS=1,MIN_NUMERIC_CHARS=1,MIN_SPECIAL_CHARS=1,... | +``` + +Every user referencing `analyst_policy` now inherits the stricter password mix and expiry window automatically. + +### 5. Detach and Clean Up + +```sql +ALTER USER analyst_zoe WITH UNSET PASSWORD POLICY; +DROP PASSWORD POLICY analyst_policy; +``` + +Databend prevents you from dropping a policy that is still in use; unset it from all users before running `DROP PASSWORD POLICY`. + +--- + +For full syntax, see the [Password Policy SQL reference](/tidb-cloud-lake/sql/password-policy.md), which covers `CREATE`, `ALTER`, `SHOW`, `DESC`, and `DROP`. diff --git a/tidb-cloud-lake/guides/performance-optimization.md b/tidb-cloud-lake/guides/performance-optimization.md new file mode 100644 index 0000000000000..dca150139ccbf --- /dev/null +++ b/tidb-cloud-lake/guides/performance-optimization.md @@ -0,0 +1,27 @@ +--- +title: Performance Optimization +--- + +Databend primarily accelerates query performance through **various indexing technologies**, including data clustering, result caching, and specialized indexes, helping you significantly improve query response times. 
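+
+For example, you can cluster a table on a frequently filtered column at creation time and turn on result caching for the current session. The sketch below is illustrative only; the table and column names are assumed:
+
+```sql
+-- Cluster the table on the column most queries filter by (illustrative names)
+CREATE TABLE sales (order_id INT, order_date DATE, amount DOUBLE) CLUSTER BY (order_date);
+
+-- Reuse results of identical queries within the current session
+SET enable_query_result_cache = 1;
+```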
+ +## Optimization Features + +| Feature | Purpose | When to Use | +|---------|---------|------------| +| [**Cluster Key**](/tidb-cloud-lake/sql/cluster-key.md) | Automatically organize data physically for optimal query performance | When you have large tables with frequent filtering on specific columns, especially time-series or categorical data | +| [**Query Result Cache**](/tidb-cloud-lake/guides/query-result-cache.md) | Automatically store and reuse results of identical queries | When your applications run the same analytical queries repeatedly, such as in dashboards or scheduled reports | +| [**Virtual Column**](/tidb-cloud-lake/guides/virtual-column.md) | Automatically accelerate access to fields within JSON/VARIANT data | When you frequently query specific paths within semi-structured data and need sub-second response times | +| [**Aggregating Index**](/tidb-cloud-lake/guides/aggregating-index.md) | Precompute and store common aggregation results | When your analytical workloads frequently run SUM, COUNT, AVG queries on large datasets | +| [**Full-Text Index**](/tidb-cloud-lake/guides/full-text-index.md) | Enable lightning-fast semantic text search capabilities | When you need advanced text search functionality like relevance scoring and fuzzy matching | +| [**Ngram Index**](/tidb-cloud-lake/guides/ngram-index.md) | Accelerate pattern matching with wildcards | When your queries use LIKE operators with wildcards (especially '%keyword%') on large text columns | + +## Feature Availability + +| Feature | Community | Enterprise | Cloud | +|---------|-----------|------------|-------| +| Cluster Key | ✅ | ✅ | ✅ | +| Query Result Cache | ✅ | ✅ | ✅ | +| Virtual Column | ❌ | ✅ | ✅ | +| Aggregating Index | ✅ | ✅ | ✅ | +| Full-Text Index | ✅ | ✅ | ✅ | +| Ngram Index | ✅ | ✅ | ✅ | diff --git a/tidb-cloud-lake/guides/platforms-regions.md b/tidb-cloud-lake/guides/platforms-regions.md new file mode 100644 index 0000000000000..4a5eef0bb1a5e --- /dev/null +++ b/tidb-cloud-lake/guides/platforms-regions.md @@ -0,0 +1,12 @@ +--- +title: Platforms & Regions +--- + +import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse' +import PlatformsCN from '@site/docs/fragment/01-platforms-cn.md' +import PlatformsEN from '@site/docs/fragment/01-platforms-en.md' + +} +cn={} +/> diff --git a/tidb-cloud-lake/guides/pricing-billing.md b/tidb-cloud-lake/guides/pricing-billing.md new file mode 100644 index 0000000000000..82b3385bebce1 --- /dev/null +++ b/tidb-cloud-lake/guides/pricing-billing.md @@ -0,0 +1,12 @@ +--- +title: Pricing & Billing +--- + +import LanguageFileParse from '@site/src/components/LanguageDocs/file-parse' +import PricingEN from '@site/docs/fragment/03-pricing-en.md' +import PricingCN from '@site/docs/fragment/03-pricing-cn.md' + +} +cn={} +/> diff --git a/tidb-cloud-lake/guides/privileges.md b/tidb-cloud-lake/guides/privileges.md new file mode 100644 index 0000000000000..234ca1ed63d47 --- /dev/null +++ b/tidb-cloud-lake/guides/privileges.md @@ -0,0 +1,252 @@ +--- +title: Privileges +--- + +A privilege is a permission to perform an action. Users must have specific privileges to execute particular actions within Databend. For example, when querying a table, a user needs `SELECT` privileges to the table. Similarly, to read a dataset within a stage, the user must possess `READ` privileges. + +In Databend, privileges are granted to roles. Users receive privileges through the roles assigned to them. 
+ +![Alt text](/img/guides/access-control-2.png) + +## Managing Privileges + +To manage privileges for a role, use the following commands: + +- [GRANT](/tidb-cloud-lake/sql/grant.md) +- [REVOKE](/tidb-cloud-lake/sql/revoke.md) +- [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md) + +### Granting Privileges to Roles + +To grant a privilege, create a role, grant the privilege to the role, and then grant that role to users who need it. In the following example, a new role named 'writer' is created and granted all privileges on objects in the 'default' schema. Subsequently, 'david' is created as a new user with the password 'abc123', and the 'writer' role is granted to 'david'. Finally, the granted privileges for 'writer' are shown. + +```sql title='Example:' +-- Create a new role named 'writer' +CREATE ROLE writer; + +-- Grant all privileges on all objects in the 'default' schema to the role 'writer' +GRANT ALL ON default.* TO ROLE writer; + +-- Create a new user named 'david' with the password 'abc123' and set the default role +CREATE USER david IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'writer'; + +-- Grant the role 'writer' to the user 'david' +GRANT ROLE writer TO david; + +-- Show the granted privileges for the role 'writer' +SHOW GRANTS FOR ROLE writer; + +┌───────────────────────────────────────────────────────┐ +│ Grants │ +├───────────────────────────────────────────────────────┤ +│ GRANT ALL ON 'default'.'default'.* TO ROLE 'writer' │ +└───────────────────────────────────────────────────────┘ +``` + +### Revoking Privileges from Roles + +In the context of access control, privileges are revoked from roles. In the following example, we revoke all privileges on all objects in the 'default' schema from role 'writer', and then we display the granted privileges for role 'writer': + +```sql title='Example (Continued):' +-- Revoke all privileges on all objects in the 'default' schema from role 'writer' +REVOKE ALL ON default.* FROM ROLE writer; + +-- Show the granted privileges for the role 'writer' +SHOW GRANTS FOR ROLE writer; +``` + + +## Access Control Privileges + +Databend offers a range of privileges that allow you to exercise fine-grained control over your database objects. Databend privileges can be categorized into the following types: + +- Global privileges: This set of privileges includes privileges that apply to the entire database management system, rather than specific objects within the system. Global privileges grant actions that affect the overall functionality and administration of the database, such as creating or deleting databases, managing users and roles, and modifying system-level settings. For which privileges are included, see [Global Privileges](#global-privileges). + +- Object-specific privileges: Object-specific privileges come with different sets and each one applies to a specific database object. 
This includes: + - [Table Privileges](#table-privileges) + - [View Privileges](#view-privileges) + - [Database Privileges](#database-privileges) + - [Session Policy Privileges](#session-policy-privileges) + - [Stage Privileges](#stage-privileges) + - [UDF Privileges](#udf-privileges) + - [Sequence Privileges](#sequence-privileges) + - [Connection Privileges](#connection-privileges) + - [Procedure Privileges](#procedure-privileges) + - [Catalog Privileges](#catalog-privileges) + - [Share Privileges](#share-privileges) + +### All Privileges + +| Privilege | Object Type | Description | +|:------------------|:------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------| +| ALL | All | Grants all the privileges for the specified object type. | +| APPLY MASKING POLICY | Global, Masking Policy | Attaches, detaches, describes, or drops masking policies. When granted on *.*, the grantee can manage any masking policy. | +| APPLY ROW ACCESS POLICY | Global, Row Access Policy | Adds or removes row access policies from tables and allows DESCRIBE/DROP operations on any policy. When granted on *.*, the grantee can manage every row access policy. | +| ALTER | Global, Database, Table, View | Alters a database, table, user or UDF. | +| CREATE | Global, Table | Creates a table or UDF. | +| CREATE DATABASE | Global | Creates a database or UDF. | +| CREATE WAREHOUSE | Global | Creates a warehouse. | +| CREATE CONNECTION | Global | Creates a connection. | +| CREATE SEQUENCE | Global | Creates a sequence. | +| CREATE PROCEDURE | PROCEDURE | Creates a procedure. | +| CREATE MASKING POLICY | Global | Creates a masking policy. | +| CREATE ROW ACCESS POLICY | Global | Creates a row access policy. | +| DELETE | Table | Deletes or truncates rows in a table. | +| DROP | Global, Database, Table, View | Drops a database, table, view or UDF. Undrops a table. | +| INSERT | Table | Inserts rows into a table. | +| SELECT | Database, Table | Selects rows from a table. Shows or uses a database. | +| UPDATE | Table | Updates rows in a table. | +| GRANT | Global | Grants / revokes privileges to / from a role. | +| SUPER | Global, Table | Kills a query. Sets global configs. Optimizes a table. Analyzes a table. Operates a stage(Lists stages. Creates, Drops a stage), catalog or share. | +| USAGE | Global | Synonym for “no privileges”. | +| CREATE ROLE | Global | Creates a role. | +| DROP ROLE | Global | Drops a role. | +| CREATE USER | Global | Creates a SQL user. | +| DROP USER | Global | Drops a SQL user. | +| WRITE | Stage | Write into a stage. | +| READ | Stage | Read a stage. | +| USAGE | UDF | Use udf. | +| ACCESS CONNECTION | CONNECTION | Access connection. | +| ACCESS SEQUENCE | SEQUENCE | Access sequence. | +| ACCESS PROCEDURE | PROCEDURE | Access procedure. | + +### Global Privileges + +| Privilege | Description | +|:------------------|:------------------------------------------------------------------------------------------------------------------| +| ALL | Grants all the privileges for the specified object type. | +| ALTER | Adds or drops a table column. Alters a cluster key. Re-clusters a table. | +| CREATEROLE | Creates a role. | +| CREAT DATABASE | Creates a DATABASE. | +| CREATE WAREHOUSE | Creates a WAREHOUSE. | +| CREATE CONNECTION | Creates a CONNECTION. | +| DROPUSER | Drops a user. | +| CREATEUSER | Creates a user. | +| DROPROLE | Drops a role. | +| SUPER | Kills a query. 
Sets or unsets a setting. Operates a stage, catalog or share. Calls a function. COPY INTO a stage. | +| USAGE | Connects to a databend query only. | +| CREATE | Creates a UDF. | +| DROP | Drops a UDF. | +| ALTER | Alters a UDF. Alters a SQL user. | + +### Table Privileges + +| Privilege | Description | +|:----------|:-----------------------------------------------------------------------------------------------------------------| +| ALL | Grants all the privileges for the specified object type. | +| ALTER | Adds or drops a table column. Alters a cluster key. Re-clusters a table. | +| CREATE | Creates a table. | +| DELETE | Deletes rows in a table. Truncates a table. | +| DROP | Drops or undrops a table. Restores the recent version of a dropped table. | +| INSERT | Inserts rows into a table. COPY INTO a table. | +| SELECT | Selects rows from a table. SHOW CREATE a table. DESCRIBE a table. | +| UPDATE | Updates rows in a table. | +| SUPER | Optimizes or analyzes a table. | +| OWNERSHIP | Grants full control over a database. Only a single role can hold this privilege on a specific object at a time. | + +### View Privileges + +| Privilege | Description | +|:----------|:-----------------------------------------------------------------------| +| ALL | Grants all the privileges for the specified object type | +| ALTER | Creates or drops a view. Alters the existing view using another QUERY. | +| DROP | Drops a view. | + +### Database Privileges + +Please note that you can use the [USE DATABASE](/tidb-cloud-lake/sql/use-database.md) command to specify a database once you have any of the following privileges to the database or any privilege to a table in the database. + +| Privilege | Description | +|:----------|:-----------------------------------------------------------------------------------------------------------------| +| ALTER | Renames a database. | +| DROP | Drops or undrops a database. Restores the recent version of a dropped database. | +| SELECT | SHOW CREATE a database. | +| OWNERSHIP | Grants full control over a database. Only a single role can hold this privilege on a specific object at a time. | +| USAGE | Allows entering a database using `USE `, without granting access to any contained objects. | + +> Note: +> +> 1. If a role owns a database, the role can access all the tables in the database. + + +### Session Policy Privileges + +| Privilege | Description | +| :-- | :-- | +| SUPER | Kills a query. Sets or unsets a setting. | +| ALL | Grants all the privileges for the specified object type. | + +### Stage Privileges + +| Privilege | Description | +|:----------|:--------------------------------------------------------------------------------------------------------------| +| WRITE | Write into a stage. For example, copy into a stage, presign upload or removes a stage | +| READ | Read a stage. For example, list stage, query stage, copy into table from stage, presign download | +| ALL | Grants READ, WRITE privileges for the specified object type. | +| OWNERSHIP | Grants full control over a stage. Only a single role can hold this privilege on a specific object at a time. | + +> Note: +> +> 1. Don't check external location auth. + +### UDF Privileges + +| Privilege | Description | +|:----------|:------------------------------------------------------------------------------------------------------------| +| USAGE | Can use UDF. For example, copy into a stage, presign upload | +| ALL | Grants READ, WRITE privileges for the specified object type. 
| +| OWNERSHIP | Grants full control over a UDF. Only a single role can hold this privilege on a specific object at a time. | + +> Note: +> +> 1. Don't check the udf auth if it's already be constantly folded. +> 2. Don't check the udf auth if it's a value in insert. + +### Catalog Privileges + +| Privilege | Description | +|:----------|:---------------------------------------------------------| +| SUPER | SHOW CREATE catalog. Creates or drops a catalog. | +| ALL | Grants all the privileges for the specified object type. | + +### Connection Privileges + +| Privilege | Description | +|:------------------|:-------------------------------------------------------------------------------------------------------------------| +| Access Connection | Can access Connection. | +| ALL | Grants Access Connection privileges for the specified object type. | +| OWNERSHIP | Grants full control over a Connection. Only a single role can hold this privilege on a specific object at a time. | + +### Sequence Privileges + +| Privilege | Description | +|:----------------|:-----------------------------------------------------------------------------------------------------------------| +| Access Sequence | Can access Sequence.(e.g. Drop,Desc) | +| ALL | Grants Access Sequence privileges for the specified object type. | +| OWNERSHIP | Grants full control over a Sequence. Only a single role can hold this privilege on a specific object at a time. | + +### Procedure Privileges + +| Privilege | Description | +|:-----------------|:------------------------------------------------------------------------------------------------------------------| +| Access Procedure | Can access Procedure.(e.g. Drop,Call,Desc) | +| ALL | Grants Access Procedure privileges for the specified object type. | +| OWNERSHIP | Grants full control over a Procedure. Only a single role can hold this privilege on a specific object at a time. | + +### Masking Policy Privileges + +In addition to the global `CREATE MASKING POLICY` and `APPLY MASKING POLICY` privileges, you can grant access to individual masking policies: + +| Privilege | Description | +|:----------|:--------------------------------------------------------------------------------------------------------------------------------------| +| APPLY | Attaches or detaches the masking policy from columns, and allows DESC/DROP operations on the policy. | +| OWNERSHIP | Grants full control over a masking policy. Databend grants OWNERSHIP to the role that creates the policy and revokes it automatically when the policy is dropped. | + +### Row Access Policy Privileges + +Row access policies share the same governance model. Beyond the global `CREATE ROW ACCESS POLICY` and `APPLY ROW ACCESS POLICY` privileges, grant access per policy when needed: + +| Privilege | Description | +|:----------|:---------------------------------------------------------------------------------------------------------------------------------------------------| +| APPLY | Adds or removes the row access policy from tables and allows DESC/DROP operations on the policy. | +| OWNERSHIP | Grants full control over a row access policy. Databend grants OWNERSHIP to the creator role and revokes it automatically when the policy is dropped. 
| diff --git a/tidb-cloud-lake/guides/query-avro-files-in-stage.md b/tidb-cloud-lake/guides/query-avro-files-in-stage.md new file mode 100644 index 0000000000000..44a3bd928009e --- /dev/null +++ b/tidb-cloud-lake/guides/query-avro-files-in-stage.md @@ -0,0 +1,97 @@ +--- +title: Querying Avro Files in Stage +sidebar_label: Avro +--- + +## Syntax: + +- [Query rows as Variants](/tidb-cloud-lake/guides/query-stage.md#query-rows-as-variants) +- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata) + +## Avro Querying Features Overview + +Databend provides comprehensive support for querying Avro files directly from stages. This allows for flexible data exploration and transformation without needing to load the data into a table first. + +* **Variant Representation**: Each row in an Avro file is treated as a variant, referenced by `$1`. This allows for flexible access to nested structures within the Avro data. +* **Type Mapping**: Each Avro type is mapped to a corresponding variant type in Databend. +* **Metadata Access**: You can access metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER` for additional context about the source file and row. + +## Tutorial + +This tutorial demonstrates how to query Avro files stored in a stage. + +### Step 1. Prepare an Avro File + +Consider an Avro file with the following schema named `user`: + +```json +{ + "type": "record", + "name": "user", + "fields": [ + { + "name": "id", + "type": "long" + }, + { + "name": "name", + "type": "string" + } + ] +} +``` + +### Step 2. Create an External Stage + +Create an external stage with your own S3 bucket and credentials where your Avro files are stored. + +```sql +CREATE STAGE avro_query_stage +URL = 's3://load/avro/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 3. Query Avro Files + +#### Basic Query + +Query Avro files directly from a stage: + +```sql +SELECT + CAST($1:id AS INT) AS id, + $1:name AS name +FROM @avro_query_stage +( + FILE_FORMAT => 'AVRO', + PATTERN => '.*[.]avro' +); +``` + +### Query with Metadata + +Query Avro files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`: + +```sql +SELECT + METADATA$FILENAME, + METADATA$FILE_ROW_NUMBER, + CAST($1:id AS INT) AS id, + $1:name AS name +FROM @avro_query_stage +( + FILE_FORMAT => 'AVRO', + PATTERN => '.*[.]avro' +); +``` + +## Type Mapping to Variant + +Variants in Databend are stored as JSONB. While most Avro types map straightforwardly, some special considerations apply: + +* **Time Types**: `TimeMillis` and `TimeMicros` are mapped to `INT64` as JSONB does not have a native Time type. Users should be aware of the original type when processing these values. +* **Decimal Types**: Decimals are loaded as `DECIMAL128` or `DECIMAL256`. An error may occur if the precision exceeds the supported limits. +* **Enum Types**: Avro `ENUM` types are mapped to `STRING` values in Databend. diff --git a/tidb-cloud-lake/guides/query-csv-files-in-stage.md b/tidb-cloud-lake/guides/query-csv-files-in-stage.md new file mode 100644 index 0000000000000..34c1bf643c16d --- /dev/null +++ b/tidb-cloud-lake/guides/query-csv-files-in-stage.md @@ -0,0 +1,73 @@ +--- +title: Querying CSV Files in Stage +sidebar_label: CSV +--- + +## Syntax: + +- [Query columns by position](/tidb-cloud-lake/guides/query-stage.md#query-columns-by-position) +- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata) + +## Tutorial + +### Step 1. 
Create an External Stage + +Create an external stage with your own S3 bucket and credentials where your CSV files are stored. +```sql +CREATE STAGE csv_query_stage +URL = 's3://load/csv/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom CSV File Format + +```sql +CREATE FILE FORMAT csv_query_format + TYPE = CSV, + RECORD_DELIMITER = '\n', + FIELD_DELIMITER = ',', + COMPRESSION = AUTO, + SKIP_HEADER = 1; -- Skip first line when querying if the CSV file has header +``` + +- More CSV file format options refer to [CSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#csv-options) + +### Step 3. Query CSV Files + +```sql +SELECT $1, $2, $3 +FROM @csv_query_stage +( + FILE_FORMAT => 'csv_query_format', + PATTERN => '.*[.]csv' +); +``` + +If the CSV files is compressed with gzip, we can use the following query: + +```sql +SELECT $1, $2, $3 +FROM @csv_query_stage +( + FILE_FORMAT => 'csv_query_format', + PATTERN => '.*[.]csv[.]gz' +); +``` +### Query with Metadata + +Query CSV files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`: + +```sql +SELECT + METADATA$FILENAME, + METADATA$FILE_ROW_NUMBER, + $1, $2, $3 +FROM @csv_query_stage +( + FILE_FORMAT => 'csv_query_format', + PATTERN => '.*[.]csv' +); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/query-ndjson-files-in-stage.md b/tidb-cloud-lake/guides/query-ndjson-files-in-stage.md new file mode 100644 index 0000000000000..172a6bbba6f81 --- /dev/null +++ b/tidb-cloud-lake/guides/query-ndjson-files-in-stage.md @@ -0,0 +1,91 @@ +--- +title: Querying NDJSON Files in Stage +sidebar_label: NDJSON +--- + +In Databend, you can directly query NDJSON files stored in stages without first loading the data into tables. This approach is particularly useful for data exploration, ETL processing, and ad-hoc analysis scenarios. + +## What is NDJSON? + +NDJSON (Newline Delimited JSON) is a JSON-based file format where each line contains a complete and valid JSON object. This format is especially well-suited for streaming data processing and big data analytics. + +**Example NDJSON file content:** +```json +{"id": 1, "title": "Database Fundamentals", "author": "John Doe", "price": 45.50, "category": "Technology"} +{"id": 2, "title": "Machine Learning in Practice", "author": "Jane Smith", "price": 68.00, "category": "AI"} +{"id": 3, "title": "Web Development Guide", "author": "Mike Johnson", "price": 52.30, "category": "Frontend"} +``` + +**Advantages of NDJSON:** +- **Stream-friendly**: Can be parsed line by line without loading entire file into memory +- **Big data compatible**: Widely used in log files, data exports, and ETL pipelines +- **Easy to process**: Each line is an independent JSON object, enabling parallel processing + +## Syntax + +- [Query rows as Variants](/tidb-cloud-lake/guides/query-stage.md#query-rows-as-variants) +- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata) + +## Tutorial + +### Step 1. Create an External Stage + +Create an external stage with your own S3 bucket and credentials where your NDJSON files are stored. +```sql +CREATE STAGE ndjson_query_stage +URL = 's3://load/ndjson/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. 
Create Custom NDJSON File Format + +```sql +CREATE FILE FORMAT ndjson_query_format + TYPE = NDJSON, + COMPRESSION = AUTO; +``` + +- More NDJSON file format options refer to [NDJSON File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#ndjson-options) + +### Step 3. Query NDJSON Files + +Now you can query the NDJSON files directly from the stage. This example extracts the `title` and `author` fields from each JSON object: + +```sql +SELECT $1:title, $1:author +FROM @ndjson_query_stage +( + FILE_FORMAT => 'ndjson_query_format', + PATTERN => '.*[.]ndjson' +); +``` + +**Explanation:** +- `$1:title` and `$1:author`: Extract specific fields from the JSON object. The `$1` represents the entire JSON object as a variant, and `:field_name` accesses individual fields +- `@ndjson_query_stage`: References the external stage created in Step 1 +- `FILE_FORMAT => 'ndjson_query_format'`: Uses the custom file format defined in Step 2 +- `PATTERN => '.*[.]ndjson'`: Regex pattern that matches all files ending with `.ndjson` + +### Querying Compressed Files + +If the NDJSON files are compressed with gzip, modify the pattern to match compressed files: + +```sql +SELECT $1:title, $1:author +FROM @ndjson_query_stage +( + FILE_FORMAT => 'ndjson_query_format', + PATTERN => '.*[.]ndjson[.]gz' +); +``` + +**Key difference:** The pattern `.*[.]ndjson[.]gz` matches files ending with `.ndjson.gz`. Databend automatically decompresses gzip files during query execution thanks to the `COMPRESSION = AUTO` setting in the file format. + +## Related Documentation + +- [Loading NDJSON Files](/tidb-cloud-lake/guides/load-ndjson.md) - How to load NDJSON data into tables +- [NDJSON File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#ndjson-options) - Complete NDJSON format configuration +- [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) - Managing external and internal stages \ No newline at end of file diff --git a/tidb-cloud-lake/guides/query-parquet-files-in-stage.md b/tidb-cloud-lake/guides/query-parquet-files-in-stage.md new file mode 100644 index 0000000000000..01b8ee4350861 --- /dev/null +++ b/tidb-cloud-lake/guides/query-parquet-files-in-stage.md @@ -0,0 +1,74 @@ +--- +title: Querying Parquet Files in Stage +sidebar_label: Parquet +--- + + +## Syntax: + +- [Query rows as Variants](/tidb-cloud-lake/guides/query-stage.md#query-rows-as-variants) +- [Query columns by name](/tidb-cloud-lake/guides/query-stage.md#query-columns-by-name) +- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata) + +## Tutorial + +### Step 1. Create an External Stage + +Create an external stage with your own S3 bucket and credentials where your Parquet files are stored. +```sql +CREATE STAGE parquet_query_stage +URL = 's3://load/parquet/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom Parquet File Format + +```sql +CREATE FILE FORMAT parquet_query_format TYPE = PARQUET; +``` +- More Parquet file format options refer to [Parquet File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#parquet-options) + +### Step 3. 
Query Parquet Files + +query with colum names: + +```sql +SELECT * +FROM @parquet_query_stage +( + FILE_FORMAT => 'parquet_query_format', + PATTERN => '.*[.]parquet' +); +``` + +query with path expressions: + + +```sql +SELECT $1 +FROM @parquet_query_stage +( + FILE_FORMAT => 'parquet_query_format', + PATTERN => '.*[.]parquet' +); +``` + + +### Query with Metadata + +Query Parquet files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`: + +```sql +SELECT + METADATA$FILENAME, + METADATA$FILE_ROW_NUMBER, + * +FROM @parquet_query_stage +( + FILE_FORMAT => 'parquet_query_format', + PATTERN => '.*[.]parquet' +); +``` diff --git a/tidb-cloud-lake/guides/query-result-cache.md b/tidb-cloud-lake/guides/query-result-cache.md new file mode 100644 index 0000000000000..35e71142beb17 --- /dev/null +++ b/tidb-cloud-lake/guides/query-result-cache.md @@ -0,0 +1,110 @@ +--- +title: Query Result Cache +--- + +Databend caches and persists the query results for every executed query when enabled. This can be used to great effect to dramatically reduce the time it takes to get an answer. + +## Cache Usage Conditions + +Query results are reused from cache only when **all** conditions are satisfied: + +| Condition | Requirement | +|-----------|-------------| +| **Cache Enabled** | `enable_query_result_cache = 1` in current session | +| **Identical Query** | Query text must match exactly (case-sensitive) | +| **Execution Time** | Original query runtime ≥ `query_result_cache_min_execute_secs` | +| **Result Size** | Cached result ≤ `query_result_cache_max_bytes` | +| **TTL Valid** | Cache age < `query_result_cache_ttl_secs` | +| **Data Consistency** | Table data unchanged since caching (unless `query_result_cache_allow_inconsistent = 1`) | +| **Session Scope** | Cache is session-specific | + +:::note Automatic Cache Invalidation +By default (`query_result_cache_allow_inconsistent = 0`), cached results are automatically invalidated when underlying table data changes. This ensures data consistency but may reduce cache effectiveness in frequently updated tables. +::: + +## Quick Start + +Enable query result caching in your session: + +```sql +-- Enable query result cache +SET enable_query_result_cache = 1; + +-- Optional: Cache all queries (including fast ones) +SET query_result_cache_min_execute_secs = 0; +``` + +## Configuration Settings + +| Setting | Default | Description | +|---------|---------|-------------| +| `enable_query_result_cache` | 0 | Enables/disables query result caching | +| `query_result_cache_allow_inconsistent` | 0 | Allow cached results even if underlying data changed | +| `query_result_cache_max_bytes` | 1048576 | Maximum size (bytes) for a single cached result | +| `query_result_cache_min_execute_secs` | 1 | Minimum execution time before caching | +| `query_result_cache_ttl_secs` | 300 | Cache expiration time (5 minutes) | + +## Performance Example + +This example demonstrates caching a TPC-H Q1 query: + +### 1. Enable Caching +```sql +SET enable_query_result_cache = 1; +SET query_result_cache_min_execute_secs = 0; +``` + +### 2. 
First Execution (No Cache) +```sql +SELECT + l_returnflag, + l_linestatus, + sum(l_quantity) as sum_qty, + sum(l_extendedprice) as sum_base_price, + sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, + sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, + avg(l_quantity) as avg_qty, + avg(l_extendedprice) as avg_price, + avg(l_discount) as avg_disc, + count(*) as count_order +FROM lineitem +WHERE l_shipdate <= add_days(to_date('1998-12-01'), -90) +GROUP BY l_returnflag, l_linestatus +ORDER BY l_returnflag, l_linestatus; +``` + +**Result**: 4 rows in **21.492 seconds** (600M rows processed) + +### 3. Verify Cache Entry +```sql +SELECT sql, query_id, result_size, num_rows FROM system.query_cache; +``` + +### 4. Second Execution (From Cache) +Run the same query again. + +**Result**: 4 rows in **0.164 seconds** (0 rows processed) + +## Cache Management + +### Monitor Cache Usage +```sql +SELECT * FROM system.query_cache; +``` + +### Access Cached Results +```sql +SELECT * FROM RESULT_SCAN(LAST_QUERY_ID()); +``` + +### Cache Lifecycle +Cached results are automatically removed when: +- **TTL expires** (default: 5 minutes) +- **Result size exceeds limit** (default: 1MB) +- **Session ends** (cache is session-scoped) +- **Underlying data changes** (automatic invalidation for consistency) +- **Table structure changes** (schema modifications invalidate cache) + +:::note Session Scope +Query result cache is session-scoped. Each session maintains its own cache that's automatically cleaned up when the session ends. +::: diff --git a/tidb-cloud-lake/guides/query-stage.md b/tidb-cloud-lake/guides/query-stage.md new file mode 100644 index 0000000000000..cbc67412f76f7 --- /dev/null +++ b/tidb-cloud-lake/guides/query-stage.md @@ -0,0 +1,156 @@ +--- +title: Querying & Transforming +slug: querying-stage +--- + +Databend enables direct querying of staged files without loading data into tables first. Query files from any stage type (user, internal, external) or directly from object storage and HTTPS URLs. Ideal for data inspection, validation, and transformation before or after loading. + +## Syntax + +query only + +```sql +SELECT { + [.] [, [.] ...] -- Query columns by name + | [.]$ [, [.]$ ...] -- Query columns by position + | [.]$1[:] [, [.]$1[:] ...] -- Query rows as Variants +} +FROM {@[/] | ''} -- stage table function + [( -- stage table function parameters + [], + [ PATTERN => ''], + [ FILE_FORMAT => 'CSV | TSV | NDJSON | PARQUET | ORC | Avro | '], + [ FILES => ( '' [ , '' ... ])], + [ CASE_SENSITIVE => true | false ] + )] + [] +``` + +copy with transform + +```sql +COPY INTO [.] [ ( [ , ... ] ) ] + FROM ( + SELECT { + [.] [, [.] ...] -- Query columns by name + | [.]$ [, [.]$ ...] -- Query columns by position + | [.]$1[:] [, [.]$1[:] ...] -- Query rows as Variants + } ] + FROM {@[/] | ''} + ) +[ FILES = ( '' [ , '' ] [ , ... ] ) ] +[ PATTERN = '' ] +[ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] + ) ] +[ copyOptions ] +``` +:::info Note + +compared the two syntaxes +- Same `Select List` +- Same ` FROM {@[/] | ''}` +- diff parameters: + - query use `table function parameters`, i.e. `( => , ...)` + - transform use Options at the end of [Copy into table](/tidb-cloud-lake/sql/copy-into-table.md) + +::: + + +## FROM Clause + +the FROM Clause use similar syntax of `Table Function`. Like ordinary tables, table `alias` can be used when join with other tables. 
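+
+For example, a staged file can be given an alias and joined with a regular table. This is only a sketch; the stage, file format, table, and column names below are assumed:
+
+```sql
+-- Join rows from staged NDJSON files with an existing table (illustrative names)
+SELECT t.id, t.name, s.$1:score AS score
+FROM @scores_stage (FILE_FORMAT => 'NDJSON', PATTERN => '.*[.]ndjson') AS s
+JOIN users t ON t.id = s.$1:id;
+```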
+
+table function parameters:
+
+| Parameter               | Description                                             |
+|-------------------------|---------------------------------------------------------|
+| `FILE_FORMAT`           | File format type (CSV, TSV, NDJSON, PARQUET, ORC, Avro) |
+| `PATTERN`               | Regex pattern to filter files                           |
+| `FILES`                 | Explicit list of files to query                         |
+| `CASE_SENSITIVE`        | Column name case sensitivity (Parquet only)             |
+| `connection_parameters` | External storage connection details                     |
+
+## Query File Data
+
+The select list supports three syntaxes; only one may be used in a query, with no mixing.
+
+### Query rows as Variants
+
+- Supported File Formats: NDJSON, AVRO, Parquet, ORC
+
+:::info Note
+
+Currently, for Parquet and ORC, `Query rows as Variants` is slower than `Query columns by name`, and the two methods cannot be mixed in one query.
+
+:::
+
+Syntax:
+
+```sql
+SELECT [<alias>.]$1[:<column>] [, [<alias>.]$1[:<column>] ...]
+```
+
+- Example: `SELECT $1:id, $1:name FROM ...`
+- Table Schema: `($1: Variant)`, i.e. a single column of Variant type, where each Variant value represents a whole row
+- Notes:
+  - The type of a path expression such as `$1:column` is also Variant. It is automatically cast to native types when used in expressions or loaded into a destination table column; you may still want to cast explicitly (e.g., `CAST($1:id AS INT)`) to make type-specific operations clearer.
+
+### Query columns by name
+
+- Supported File Formats: Parquet, ORC
+
+```sql
+SELECT [<alias>.]<column> [, [<alias>.]<column> ...]
+```
+
+- Example: `SELECT id, name FROM ...`
+- Table Schema: columns mapped from the Parquet or ORC file schema
+- Notes:
+  - All files are required to have the same Parquet/ORC schema; otherwise, an error is returned
+
+### Query columns by Position
+
+- Supported File Formats: CSV, TSV
+
+```sql
+SELECT [<alias>.]$<col_position> [, [<alias>.]$<col_position> ...]
+```
+
+- Example: `SELECT $1, $2 FROM ...`
+- Table Schema: columns of type `VARCHAR NULL`
+- Notes:
+  - `<col_position>` starts from 1
+
+## Query Metadata
+
+You can also include file metadata in your queries, which is useful for tracking data lineage and debugging:
+
+```sql
+SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, $1
+FROM @ndjson_query_stage
+(
+    FILE_FORMAT => 'ndjson_query_format',
+    PATTERN => '.*[.]ndjson'
+);
+```
+
+The following file-level metadata fields are available for the supported file formats:
+
+| File Metadata              | Type    | Description                                       |
+| -------------------------- | ------- | ------------------------------------------------- |
+| `METADATA$FILENAME`        | VARCHAR | The path of the file from which the row was read  |
+| `METADATA$FILE_ROW_NUMBER` | INT     | The row number within the file (starting from 0)  |
+
+**Use cases:**
+
+- **Data lineage**: Track which source file contributed each record
+- **Debugging**: Identify problematic records by file and line number
+- **Incremental processing**: Process only specific files or ranges within files
+
+## Tutorials by File Formats
+
+- [Querying Parquet Files](/tidb-cloud-lake/guides/query-parquet-files-in-stage.md)
+- [Querying ORC Files](/tidb-cloud-lake/guides/query-staged-orc-files-in-stage.md)
+- [Querying NDJSON Files](/tidb-cloud-lake/guides/query-ndjson-files-in-stage.md)
+- [Querying Avro Files](/tidb-cloud-lake/guides/query-avro-files-in-stage.md)
+- [Querying CSV Files](/tidb-cloud-lake/guides/query-csv-files-in-stage.md)
+- [Querying TSV Files](/tidb-cloud-lake/guides/query-tsv-files-in-stage.md)
diff --git a/tidb-cloud-lake/guides/query-staged-orc-files-in-stage.md b/tidb-cloud-lake/guides/query-staged-orc-files-in-stage.md
new file mode 100644
index 0000000000000..789a8922550d6
--- /dev/null
+++ b/tidb-cloud-lake/guides/query-staged-orc-files-in-stage.md
@@ -0,0 +1,110 @@
+---
+title: Querying Staged ORC Files in Stage
+sidebar_label: ORC
+---
+import StepsWrap from '@site/src/components/StepsWrap';
+import StepContent from '@site/src/components/Steps/step-content';
+
+## Syntax
+
+- [Query rows as Variants](/tidb-cloud-lake/guides/query-stage.md#query-rows-as-variants)
+- [Query columns by name](/tidb-cloud-lake/guides/query-stage.md#query-columns-by-name)
+- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata)
+
+## Tutorial
+
+In this tutorial, we will walk you through the process of downloading the Iris dataset in ORC format, uploading it to an Amazon S3 bucket, creating an external stage, and querying the data directly from the ORC file.
+
+### Download Iris Dataset
+
+Download the iris dataset from https://github.com/tensorflow/io/raw/master/tests/test_orc/iris.orc, then upload it to your Amazon S3 bucket.
+
+The iris dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. It has 4 attributes: (1) sepal length, (2) sepal width, (3) petal length, (4) petal width, and the last column contains the class label.
+
+### Create External Stage
+
+Create an external stage with your Amazon S3 bucket where your iris dataset file is stored.
+ +```sql +CREATE STAGE orc_query_stage + URL = 's3://databend-doc' + CONNECTION = ( + ACCESS_KEY_ID = '', + SECRET_ACCESS_KEY = '' + ); +``` + + + + +### Query ORC File + +query with columns + +```sql +SELECT * +FROM @orc_query_stage +( + FILE_FORMAT => 'orc', + PATTERN => '.*[.]orc' +); + +┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ sepal_length │ sepal_width │ petal_length │ petal_width │ species │ +├───────────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────┤ +│ 5.1 │ 3.5 │ 1.4 │ 0.2 │ setosa │ +│ · │ · │ · │ · │ · │ +│ 5.9 │ 3 │ 5.1 │ 1.8 │ virginica │ +└──────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +query with path expressions: + +```sql +SELECT $1 +FROM @orc_query_stage +( + FILE_FORMAT => 'orc', + PATTERN => '.*[.]orc' + +); +``` + +You can also query the remote ORC file directly: + +```sql +SELECT + * +FROM + 'https://github.com/tensorflow/io/raw/master/tests/test_orc/iris.orc' (file_format => 'orc'); +``` + + + + +### Query with Metadata + +Query ORC files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`: + + + +```sql +SELECT + METADATA$FILENAME, + METADATA$FILE_ROW_NUMBER, + * +FROM @orc_query_stage +( + FILE_FORMAT => 'orc', + PATTERN => '.*[.]orc' +); +``` + + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/query-tsv-files-in-stage.md b/tidb-cloud-lake/guides/query-tsv-files-in-stage.md new file mode 100644 index 0000000000000..e2f1a34574bdc --- /dev/null +++ b/tidb-cloud-lake/guides/query-tsv-files-in-stage.md @@ -0,0 +1,73 @@ +--- +title: Querying TSV Files in Stage +sidebar_label: TSV +--- + +## Syntax: + +- [Query columns by position](/tidb-cloud-lake/guides/query-stage.md#query-columns-by-position) +- [Query Metadata](/tidb-cloud-lake/guides/query-stage.md#query-metadata) + + +## Tutorial + +### Step 1. Create an External Stage + +Create an external stage with your own S3 bucket and credentials where your TSV files are stored. +```sql +CREATE STAGE tsv_query_stage +URL = 's3://load/tsv/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom TSV File Format + +```sql +CREATE FILE FORMAT tsv_query_format + TYPE = TSV, + RECORD_DELIMITER = '\n', + FIELD_DELIMITER = ',', + COMPRESSION = AUTO; +``` + +- More TSV file format options refer to [TSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#tsv-options) + +### Step 3. 
Query TSV Files + +```sql +SELECT $1, $2, $3 +FROM @tsv_query_stage +( + FILE_FORMAT => 'tsv_query_format', + PATTERN => '.*[.]tsv' +); +``` + +If the TSV files is compressed with gzip, we can use the following query: + +```sql +SELECT $1, $2, $3 +FROM @tsv_query_stage +( + FILE_FORMAT => 'tsv_query_format', + PATTERN => '.*[.]tsv[.]gz' +); +``` +### Query with Metadata + +Query TSV files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`: + +```sql +SELECT + METADATA$FILENAME, + METADATA$FILE_ROW_NUMBER, + $1, $2, $3 +FROM @tsv_query_stage +( + FILE_FORMAT => 'tsv_query_format', + PATTERN => '.*[.]tsv' +); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/guides/recovery-from-operational-errors.md b/tidb-cloud-lake/guides/recovery-from-operational-errors.md new file mode 100644 index 0000000000000..fd9440e89e078 --- /dev/null +++ b/tidb-cloud-lake/guides/recovery-from-operational-errors.md @@ -0,0 +1,200 @@ +--- +title: Recovery from Operational Errors +--- +import IndexOverviewList from '@site/src/components/IndexOverviewList'; + +# Recovery from Operational Errors + +This guide provides step-by-step instructions for recovering from common operational errors in Databend. + +## Introduction + +Databend can help you recover from these common operational errors: +- **Accidentally dropped databases** +- **Accidentally dropped tables** +- **Incorrect data modifications (UPDATE/DELETE operations)** +- **Accidentally truncated tables** +- **Data loading mistakes** +- **Schema evolution rollbacks** (reverting table structure changes) +- **Dropped columns or constraints** + +These recovery capabilities are powered by Databend's FUSE engine with its Git-like storage design, which maintains snapshots of your data at different points in time. + +## Recovery Scenarios and Solutions + +### Scenario: Accidentally Dropped Database + +If you've accidentally dropped a database, you can restore it using the `UNDROP DATABASE` command: + +1. Identify the dropped database: + + ```sql + SHOW DROP DATABASES LIKE '%sales_data%'; + ``` + +2. Restore the dropped database: + + ```sql + UNDROP DATABASE sales_data; + ``` + +3. Verify the database has been restored: + + ```sql + SHOW DATABASES; + ``` + +4. Restore ownership (if needed): + + ```sql + GRANT OWNERSHIP on sales_data.* to ROLE ; + ``` + +**Important**: A dropped database can only be restored within the retention period (default is 24 hours). + +For more details, see [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) and [SHOW DROP DATABASES](/tidb-cloud-lake/sql/show-drop-databases.md). + +### Scenario: Accidentally Dropped Table + +If you've accidentally dropped a table, you can restore it using the `UNDROP TABLE` command: + +1. Identify the dropped table: + + ```sql + SHOW DROP TABLES LIKE '%order%'; + ``` + +2. Restore the dropped table: + + ```sql + UNDROP TABLE sales_data.orders; + ``` + +3. Verify the table has been restored: + + ```sql + SHOW TABLES FROM sales_data; + ``` + +4. Restore ownership (if needed): + + ```sql + GRANT OWNERSHIP on sales_data.orders to ROLE ; + ``` + +**Important**: A dropped table can only be restored within the retention period (default is 24 hours). + +For more details, see [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md) and [SHOW DROP TABLES](/tidb-cloud-lake/sql/show-drop-tables.md). 
+ +### Scenario: Incorrect Data Updates or Deletions + +If you've accidentally modified or deleted data in a table, you can restore it to a previous state using the `FLASHBACK TABLE` command: + +1. Identify the snapshot ID or timestamp before the incorrect operation: + + ```sql + SELECT * FROM fuse_snapshot('sales_data', 'orders'); + ``` + + ```text + snapshot_id: c5c538d6b8bc42f483eefbddd000af7d + snapshot_location: 29356/44446/_ss/c5c538d6b8bc42f483eefbddd000af7d_v2.json + format_version: 2 + previous_snapshot_id: NULL + [... ...] + timestamp: 2023-04-19 04:20:25.062854 + ``` + +2. Restore the table to the previous state: + + ```sql + -- Using snapshot ID + ALTER TABLE sales_data.orders FLASHBACK TO (SNAPSHOT => 'c5c538d6b8bc42f483eefbddd000af7d'); + + -- Or using timestamp + ALTER TABLE sales_data.orders FLASHBACK TO (TIMESTAMP => '2023-04-19 04:20:25.062854'::TIMESTAMP); + ``` + +3. Verify the data has been restored: + + ```sql + SELECT * FROM sales_data.orders LIMIT 3; + ``` + +**Important**: Flashback operations are only possible for existing tables and within the retention period. + +For more details, see [FLASHBACK TABLE](/tidb-cloud-lake/sql/flashback-table.md). + +### Scenario: Schema Evolution Rollbacks +If you've made unwanted changes to a table's structure, you can revert to the previous schema: + +1. Create a table and add some data: + + ```sql + CREATE OR REPLACE TABLE customers (id INT, name VARCHAR, email VARCHAR); + INSERT INTO customers VALUES (1, 'John', 'john@example.com'); + ``` + +2. Make schema changes: + + ```sql + ALTER TABLE customers ADD COLUMN phone VARCHAR; + DESC customers; + ``` + +Output: +```text +┌─────────┬─────────┬──────┬─────────┬─────────┐ +│ Field │ Type │ Null │ Default │ Extra │ +├─────────┼─────────┼──────┼─────────┼─────────┤ +│ id │ INT │ YES │ NULL │ │ +│ name │ VARCHAR │ YES │ NULL │ │ +│ email │ VARCHAR │ YES │ NULL │ │ +│ phone │ VARCHAR │ YES │ NULL │ │ +└─────────┴─────────┴──────┴─────────┴─────────┘ +``` + +3. Find the snapshot ID from before the schema change: + + ```sql + SELECT * FROM fuse_snapshot('default', 'customers'); + ``` + + Output: + + ```text + snapshot_id: 01963cefafbb785ea393501d2e84a425 timestamp: 2025-04-16 04:51:03.227000 previous_snapshot_id: 01963ce9cc29735b87886a08d3ca7e2f + snapshot_id: 01963ce9cc29735b87886a08d3ca7e2f timestamp: 2025-04-16 04:44:37.289000 previous_snapshot_id: NULL + ``` + +4. Revert to the previous schema (using the earlier snapshot): + + ```sql + ALTER TABLE customers FLASHBACK TO (SNAPSHOT => '01963ce9cc29735b87886a08d3ca7e2f'); + ``` + +5. Verify the schema has been restored: + + ```sql + DESC customers; + ``` + + Output: + ```text + ┌─────────┬─────────┬──────┬─────────┬─────────┐ + │ Field │ Type │ Null │ Default │ Extra │ + ├─────────┼─────────┼──────┼─────────┼─────────┤ + │ id │ INT │ YES │ NULL │ │ + │ name │ VARCHAR │ YES │ NULL │ │ + │ email │ VARCHAR │ YES │ NULL │ │ + └─────────┴─────────┴──────┴─────────┴─────────┘ + ``` + +## Important Considerations and Limitations + +- **Time Constraints**: Recovery only works within the retention period (default: 24 hours). +- **Name Conflicts**: Cannot undrop if an object with the same name exists — [rename database](/tidb-cloud-lake/sql/rename-database.md) or [rename table](/tidb-cloud-lake/sql/rename-table.md) first. +- **Ownership**: Ownership isn't automatically restored—manually grant it after recovery. +- **Transient Tables**: Flashback doesn't work for transient tables (no snapshots stored). 
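+
+Before attempting any of the recovery steps above, it can help to confirm how long your retention window actually is. A minimal check, assuming the `data_retention_time_in_days` setting is available in your deployment:
+
+```sql
+-- Show the current retention window (in days) that bounds UNDROP and FLASHBACK
+SHOW SETTINGS LIKE 'data_retention_time_in_days';
+```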
+ +**For Emergency Situations**: Facing critical data loss? Contact Databend Support immediately for help. diff --git a/tidb-cloud-lake/guides/redash.md b/tidb-cloud-lake/guides/redash.md new file mode 100644 index 0000000000000..ed293b74c06d2 --- /dev/null +++ b/tidb-cloud-lake/guides/redash.md @@ -0,0 +1,167 @@ +--- +title: Redash +sidebar_position: 8 +--- + +[Redash](https://redash.io/) is designed to enable anyone, regardless of the level of technical sophistication, to harness the power of data big and small. SQL users leverage Redash to explore, query, visualize, and share data from any data sources. Their work in turn enables anybody in their organization to use the data. Every day, millions of users at thousands of organizations around the world use Redash to develop insights and make data-driven decisions. + +Both Databend and Databend Cloud can integrate with Redash as a data source. The following tutorials guide you through deploying and integrating Redash. + +## Tutorial-1: Integrating Databend with Redash + +In this tutorial, you'll deploy a local Databend and install Redash with Docker. Before you start, ensure that you have Docker installed. + +### Step 1. Deploy Databend + +Follow the [Deployment Guide](/guides/self-hosted) to deploy a local Databend. + +### Step 2. Deploy Redash + +The steps below describe how to deploy Redash with Docker. + +1. Clone the Redash repository first, and then create an .env file with the following commands: + +```shell +git clone https://github.com/getredash/redash.git +cd redash +touch .env && echo REDASH_COOKIE_SECRET=111 > .env +``` + +2. Install dependencies and build the frontend project: + +:::note +This requires Node.js version between 14.16.0 and 17.0.0. To install Node.js, for example, version 14.16.1: + +```shell +# Install nvm +brew install nvm +export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")" +[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" +# Install and switch to Node.js 14.16.1 +nvm install 14.16.1 +nvm use 14.16.1 +``` + +::: + +```shell +cd viz-lib & yarn install +cd .. +yarn install +yarn build +``` + +3. Build the server and initialize the database dependencies before starting Redash in Docker Compose: + +```shell +docker-compose build server +docker-compose run --rm server create_db +``` + +4. Start Redash: + +```shell +docker-compose up +``` + +### Step 3. Add Databend as a Data Source + +1. Sign up for Redash by completing the initial process at http://localhost:5000 in your web browser. + +2. Select `Databend` from the list on **Settings** > **New Data Source**. + +![Alt text](/img/integration/redash-select.png) + +3. Configure your Databend data source. + + - Username: `root`. No password is required if you log into a local instance of Databend with `root`. + - Host: `host.docker.internal` + - Port: `8000` + - Database: `default` + - Secure: Enable this option if you enabled HTTPS on your Databend server. + +![Alt text](/img/integration/redash-cfg-local.png) + +4. Click **Create**, then **Test Connection** to see check if the connection is successful. + +You're all set! You can now write a query and add your visualizations. For more information, refer to the Redash Getting Started guide: https://redash.io/help/user-guide/getting-started#2-Write-A-Query + +## Tutorial-2: Integrating Databend Cloud with Redash + +In this tutorial, you'll install Redash with Docker. Before you start, ensure that you have Docker installed. + +### Step 1. 
Obtain Connection Information + +Obtain the connection information from Databend Cloud. For how to do that, refer to [Connecting to a Warehouse](/guides/cloud/resources/warehouses#connecting). + +### Step 2. Deploy Redash + +The steps below describe how to deploy Redash with Docker. + +1. Clone the Redash repository first, and then create an .env file with the following commands: + +```shell +git clone https://github.com/getredash/redash.git +cd redash +touch .env && echo REDASH_COOKIE_SECRET=111 > .env +``` + +2. Install dependencies and build the frontend project: + +:::note +This requires Node.js version between 14.16.0 and 17.0.0. To install Node.js, for example, version 14.16.1: + +```shell +# Install nvm +brew install nvm +export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")" +[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" +# Install and switch to Node.js 14.16.1 +nvm install 14.16.1 +nvm use 14.16.1 +``` + +::: + +```shell +cd viz-lib & yarn install +cd .. +yarn install +yarn build +``` + +3. Build the server and initialize the database dependencies before starting Redash in Docker Compose: + +```shell +docker-compose build server +docker-compose run --rm server create_db +``` + +4. Start Redash: + +```shell +docker-compose up +``` + +### Step 3. Add Databend Cloud as a Data Source + +1. Sign up for Redash by completing the initial process at http://localhost:5000 in your web browser. + +2. Select `Databend` from the list on **Settings** > **New Data Source**. + +![Alt text](@site/static/img/documents/BI/redash-select.png) + +3. Configure your Databend data source. + + - Username: `cloudapp`. + - Password: Copy and paste your password generated in Databend Cloud. + - Host: Copy and paste your host address generated in Databend Cloud. + - Port: `443` + - Database: `default` + - Secure: Enable this option. + +![Alt text](@site/static/img/documents/BI/redash-cfg-cloud.png) + +4. Click **Create**, then **Test Connection** to see check if the connection is successful. + +You're all set! You can now write a query and add your visualizations. For more information, refer to the Redash Getting Started guide: https://redash.io/help/user-guide/getting-started#2-Write-A-Query diff --git a/tidb-cloud-lake/guides/roles.md b/tidb-cloud-lake/guides/roles.md new file mode 100644 index 0000000000000..53750ae89d8bc --- /dev/null +++ b/tidb-cloud-lake/guides/roles.md @@ -0,0 +1,373 @@ +--- +title: Roles +--- + +Roles in Databend play a pivotal role in simplifying the management of permissions. When multiple users require the same set of privileges, granting privileges individually can be cumbersome. Roles provide a solution by allowing the assignment of a set of privileges to a role, which can then be easily assigned to multiple users. + +![Alt text](/img/guides/access-control-3.png) + +## Inheriting Roles & Establishing Hierarchy + +Role granting enables one role to inherit permissions and responsibilities from another. This contributes to the creation of a flexible hierarchy, similar to the organizational structure, where two [Built-in Roles](#built-in-roles) exist: the highest being `account-admin` and the lowest being `public`. + +Consider the scenario where three roles are created: *manager*, *engineer*, and *intern*. In this example, the role of *intern* is granted to the engineer *role*. Consequently, the *engineer* not only possesses their own set of privileges but also inherits those associated with the *intern* role. 
Extending this hierarchy further, if the *engineer* role is granted to the *manager*, the *manager* now acquires both the inherent privileges of the *engineer* and the *intern* roles. + +![Alt text](/img/guides/access-control-4.png) + +## Built-in Roles + +Databend comes with the following built-in roles: + +| Built-in Role | Description | +|---------------|----------------------------------------------------------------------------------------------------------------------------------------| +| account-admin | Possesses all privileges, serves as the parent role for all other roles, and enables seamless switching to any role within the tenant. | +| public | Inherits no permissions, considers all roles as its parent roles, and allows any role to switch to the public role. | + +To assign the `account-admin` role to a user in Databend Cloud, select the role when inviting the user. You can also assign the role to a user after they join. If you're using Databend Community Edition or Enterprise Edition, configure an `account-admin` user during deployment first, and then assign the role to other users if needed. For more information about configuring admin users, see [Configuring Admin Users](../../20-self-hosted/04-references/admin-users.md). + +## Setting Default Role + +When a user is granted multiple roles, you can use the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) or [ALTER USER](/tidb-cloud-lake/sql/alter-user.md) commands to set a default role for that user. The default role determines the role automatically assigned to the user at the beginning of a session: + +```sql title='Example:' +-- Show existing roles in the system +SHOW ROLES; + +┌───────────────────────────────────────────────────────────┐ +│ name │ inherited_roles │ is_current │ is_default │ +├───────────────┼─────────────────┼────────────┼────────────┤ +│ account_admin │ 0 │ true │ true │ +│ public │ 0 │ false │ false │ +│ writer │ 0 │ false │ false │ +└───────────────────────────────────────────────────────────┘ + +-- Create a user 'eric' with the password 'abc123' and set 'writer' as the default role +CREATE USER eric IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'writer'; + +-- Grant the 'account_admin' role to the user 'eric' +GRANT ROLE account_admin TO eric; + +-- Set 'account_admin' as the default role for user 'eric' +ALTER USER eric WITH DEFAULT_ROLE = 'account_admin'; +``` + +- Users have the flexibility to switch to other roles within a session using the [SET ROLE](/tidb-cloud-lake/sql/set-role.md) command. +- A user can examine their current role and view all the roles granted to them by using the [SHOW ROLES](/tidb-cloud-lake/sql/show-roles.md) command. +- If you don't explicitly set a default role for a user, Databend will default to using the built-in role `public` as the default role. + +## Active Role & Secondary Roles + +A user can be granted multiple roles in Databend. These roles are categorized into an active role and secondary roles: + +- The active role is the user's currently active primary role for the session, which can be set using the [SET ROLE](/tidb-cloud-lake/sql/set-role.md) command. + +- Secondary roles are additional roles that provide extra permissions and are active by default. Users can activate or deactivate secondary roles with the [SET SECONDARY ROLES](/tidb-cloud-lake/sql/set-secondary-roles.md) command to temporarily adjust their permission scope. 
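
For example, a user who has been granted several roles can adjust the scope within a session like this (a minimal sketch; the `writer` role matches the examples on this page):

```sql
-- Switch the active (primary) role for the current session
SET ROLE writer;

-- Restrict the session to the active role only
SET SECONDARY ROLES NONE;

-- Re-activate all granted roles as secondary roles (the default behavior)
SET SECONDARY ROLES ALL;
```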
+ +## Billing Role + +In addition to the standard built-in roles, you can create a custom role named `billing` in Databend Cloud to cater specifically to the needs of finance personnel. The role `billing` provides access only to billing-related information, ensuring that finance personnel can view necessary financial data without exposure to other business-related pages. + +To set up and use the role `billing`, you can create it using the following command: + +```sql +CREATE ROLE billing; +``` +The role name is case-insensitive, so `billing` and `Billing` are considered the same. For detailed steps on setting and assigning the role `billing`, see [Granting Access to Finance Personnel](/guides/cloud/administration/costs#granting-access-to-finance-personnel). + +## Usage Examples (Basic) + +This example showcases role-based permission management. Initially, a 'writer' role is created and granted privileges. Subsequently, these privileges are assigned to the user 'eric', who inherits them. Lastly, the permissions are revoked from the role, demonstrating their impact on the user's privileges. + +```sql title='Example:' +-- Create a new role named 'writer' +CREATE ROLE writer; + +-- Grant all privileges on all objects in the 'default' schema to the role 'writer' +GRANT ALL ON default.* TO ROLE writer; + +-- Create a new user named 'eric' with the password 'abc123' and set the default role +CREATE USER eric IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'writer'; + +-- Grant the role 'writer' to the user 'eric' +GRANT ROLE writer TO eric; + +-- Show the granted privileges for the role 'writer' +SHOW GRANTS FOR ROLE writer; + +┌───────────────────────────────────────────────────────┐ +│ Grants │ +├───────────────────────────────────────────────────────┤ +│ GRANT ALL ON 'default'.'default'.* TO ROLE 'writer' │ +└───────────────────────────────────────────────────────┘ + +-- Revoke all privileges on all objects in the 'default' schema from role 'writer' +REVOKE ALL ON default.* FROM ROLE writer; + +-- Show the granted privileges for the role 'writer' +-- No privileges are displayed as they have been revoked from the role +SHOW GRANTS FOR ROLE writer; +``` + +## Business-Aligned Role Model + +Align roles to business systems so each domain can access only its own data, and cross-domain access is granted through collaboration roles. + +### Reference Architecture + +```text + ┌──────────────┐ + │ identity │ + │ account │ + └──────┬───────┘ + │ users/permissions + v +┌──────────────┐ products ┌──────────────┐ settlement ┌──────────────┐ +│ marketing │─────────────>│ commerce │─────────────>│ payment │ +│ growth │ │ orders │ │ settlement │ +└──────┬───────┘ └──────┬───────┘ └──────┬───────┘ + │ │ fulfillment │ accounting + │ v v + │ ┌──────────────┐ ┌──────────────┐ + │ │ fulfillment │ │ finance │ + │ │ logistics │ │ accounting │ + │ └──────────────┘ └──────────────┘ + │ + │ support/feedback + v +┌──────────────┐ +│ support │ +│ tickets │ +└──────────────┘ + + ^ risk monitoring/policies + │ +┌──────────────┐ +│ risk │ +│ fraud │ +└──────────────┘ +``` + +### Role Conventions + +- `_owner`: owns all objects in the domain +- `_rw`: write access for pipelines and engineers +- `_ro`: read-only access for analysts +- Databases: `_raw`, `_mart` +- Stages: `stage__ingest` + +### Ownership Behavior + +Objects are owned by the role that is active when they are created. Ensure you `SET ROLE _owner` before creating objects. For details, see [Ownership](/tidb-cloud-lake/guides/ownership.md). 
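
Before provisioning a domain, it is worth confirming which role is active so new objects end up owned by the intended owner role; a minimal sketch (the role name follows the example below):

```sql
-- Activate the owner role for the domain you are about to provision
SET ROLE commerce_owner;

-- Verify it is the current role (check the is_current column)
SHOW ROLES;
```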
+ +### Usage Examples (Business Domains) + +```sql title='Example:' +-- 1) Business system roles +CREATE ROLE identity_owner; +CREATE ROLE identity_rw; +CREATE ROLE identity_ro; + +CREATE ROLE commerce_owner; +CREATE ROLE commerce_rw; +CREATE ROLE commerce_ro; + +CREATE ROLE payment_owner; +CREATE ROLE payment_rw; +CREATE ROLE payment_ro; + +CREATE ROLE fulfillment_owner; +CREATE ROLE fulfillment_rw; +CREATE ROLE fulfillment_ro; + +CREATE ROLE marketing_owner; +CREATE ROLE marketing_rw; +CREATE ROLE marketing_ro; + +CREATE ROLE finance_owner; +CREATE ROLE finance_rw; +CREATE ROLE finance_ro; + +CREATE ROLE support_owner; +CREATE ROLE support_rw; +CREATE ROLE support_ro; + +CREATE ROLE risk_owner; +CREATE ROLE risk_rw; +CREATE ROLE risk_ro; + +-- 2) Business system resources +CREATE DATABASE identity_raw; +CREATE DATABASE identity_mart; +CREATE STAGE stage_identity_ingest; + +CREATE DATABASE commerce_raw; +CREATE DATABASE commerce_mart; +CREATE STAGE stage_commerce_ingest; + +CREATE DATABASE payment_raw; +CREATE DATABASE payment_mart; +CREATE STAGE stage_payment_ingest; + +CREATE DATABASE fulfillment_raw; +CREATE DATABASE fulfillment_mart; +CREATE STAGE stage_fulfillment_ingest; + +CREATE DATABASE marketing_raw; +CREATE DATABASE marketing_mart; +CREATE STAGE stage_marketing_ingest; + +CREATE DATABASE finance_raw; +CREATE DATABASE finance_mart; +CREATE STAGE stage_finance_ingest; + +CREATE DATABASE support_raw; +CREATE DATABASE support_mart; +CREATE STAGE stage_support_ingest; + +CREATE DATABASE risk_raw; +CREATE DATABASE risk_mart; +CREATE STAGE stage_risk_ingest; + +-- 3) Ownership assigned to owner roles +GRANT OWNERSHIP ON identity_raw.* TO ROLE identity_owner; +GRANT OWNERSHIP ON identity_mart.* TO ROLE identity_owner; +GRANT OWNERSHIP ON STAGE stage_identity_ingest TO ROLE identity_owner; + +GRANT OWNERSHIP ON commerce_raw.* TO ROLE commerce_owner; +GRANT OWNERSHIP ON commerce_mart.* TO ROLE commerce_owner; +GRANT OWNERSHIP ON STAGE stage_commerce_ingest TO ROLE commerce_owner; + +GRANT OWNERSHIP ON payment_raw.* TO ROLE payment_owner; +GRANT OWNERSHIP ON payment_mart.* TO ROLE payment_owner; +GRANT OWNERSHIP ON STAGE stage_payment_ingest TO ROLE payment_owner; + +GRANT OWNERSHIP ON fulfillment_raw.* TO ROLE fulfillment_owner; +GRANT OWNERSHIP ON fulfillment_mart.* TO ROLE fulfillment_owner; +GRANT OWNERSHIP ON STAGE stage_fulfillment_ingest TO ROLE fulfillment_owner; + +GRANT OWNERSHIP ON marketing_raw.* TO ROLE marketing_owner; +GRANT OWNERSHIP ON marketing_mart.* TO ROLE marketing_owner; +GRANT OWNERSHIP ON STAGE stage_marketing_ingest TO ROLE marketing_owner; + +GRANT OWNERSHIP ON finance_raw.* TO ROLE finance_owner; +GRANT OWNERSHIP ON finance_mart.* TO ROLE finance_owner; +GRANT OWNERSHIP ON STAGE stage_finance_ingest TO ROLE finance_owner; + +GRANT OWNERSHIP ON support_raw.* TO ROLE support_owner; +GRANT OWNERSHIP ON support_mart.* TO ROLE support_owner; +GRANT OWNERSHIP ON STAGE stage_support_ingest TO ROLE support_owner; + +GRANT OWNERSHIP ON risk_raw.* TO ROLE risk_owner; +GRANT OWNERSHIP ON risk_mart.* TO ROLE risk_owner; +GRANT OWNERSHIP ON STAGE stage_risk_ingest TO ROLE risk_owner; + +-- 4) Read/write separation inside each domain +GRANT USAGE ON identity_raw.* TO ROLE identity_rw; +GRANT SELECT ON identity_raw.* TO ROLE identity_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON identity_mart.* TO ROLE identity_rw; +GRANT USAGE ON identity_mart.* TO ROLE identity_ro; +GRANT SELECT ON identity_mart.* TO ROLE identity_ro; +GRANT READ, WRITE ON STAGE 
stage_identity_ingest TO ROLE identity_rw; + +GRANT USAGE ON commerce_raw.* TO ROLE commerce_rw; +GRANT SELECT ON commerce_raw.* TO ROLE commerce_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON commerce_mart.* TO ROLE commerce_rw; +GRANT USAGE ON commerce_mart.* TO ROLE commerce_ro; +GRANT SELECT ON commerce_mart.* TO ROLE commerce_ro; +GRANT READ, WRITE ON STAGE stage_commerce_ingest TO ROLE commerce_rw; + +GRANT USAGE ON payment_raw.* TO ROLE payment_rw; +GRANT SELECT ON payment_raw.* TO ROLE payment_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON payment_mart.* TO ROLE payment_rw; +GRANT USAGE ON payment_mart.* TO ROLE payment_ro; +GRANT SELECT ON payment_mart.* TO ROLE payment_ro; +GRANT READ, WRITE ON STAGE stage_payment_ingest TO ROLE payment_rw; + +GRANT USAGE ON fulfillment_raw.* TO ROLE fulfillment_rw; +GRANT SELECT ON fulfillment_raw.* TO ROLE fulfillment_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON fulfillment_mart.* TO ROLE fulfillment_rw; +GRANT USAGE ON fulfillment_mart.* TO ROLE fulfillment_ro; +GRANT SELECT ON fulfillment_mart.* TO ROLE fulfillment_ro; +GRANT READ, WRITE ON STAGE stage_fulfillment_ingest TO ROLE fulfillment_rw; + +GRANT USAGE ON marketing_raw.* TO ROLE marketing_rw; +GRANT SELECT ON marketing_raw.* TO ROLE marketing_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON marketing_mart.* TO ROLE marketing_rw; +GRANT USAGE ON marketing_mart.* TO ROLE marketing_ro; +GRANT SELECT ON marketing_mart.* TO ROLE marketing_ro; +GRANT READ, WRITE ON STAGE stage_marketing_ingest TO ROLE marketing_rw; + +GRANT USAGE ON finance_raw.* TO ROLE finance_rw; +GRANT SELECT ON finance_raw.* TO ROLE finance_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON finance_mart.* TO ROLE finance_rw; +GRANT USAGE ON finance_mart.* TO ROLE finance_ro; +GRANT SELECT ON finance_mart.* TO ROLE finance_ro; +GRANT READ, WRITE ON STAGE stage_finance_ingest TO ROLE finance_rw; + +GRANT USAGE ON support_raw.* TO ROLE support_rw; +GRANT SELECT ON support_raw.* TO ROLE support_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON support_mart.* TO ROLE support_rw; +GRANT USAGE ON support_mart.* TO ROLE support_ro; +GRANT SELECT ON support_mart.* TO ROLE support_ro; +GRANT READ, WRITE ON STAGE stage_support_ingest TO ROLE support_rw; + +GRANT USAGE ON risk_raw.* TO ROLE risk_rw; +GRANT SELECT ON risk_raw.* TO ROLE risk_rw; +GRANT CREATE, INSERT, UPDATE, DELETE, ALTER, DROP ON risk_mart.* TO ROLE risk_rw; +GRANT USAGE ON risk_mart.* TO ROLE risk_ro; +GRANT SELECT ON risk_mart.* TO ROLE risk_ro; +GRANT READ, WRITE ON STAGE stage_risk_ingest TO ROLE risk_rw; + +-- 5) Ownership assigned at creation time +SET ROLE commerce_owner; +CREATE TABLE commerce_mart.orders ( + order_id STRING, + user_id STRING, + order_ts TIMESTAMP, + amount DECIMAL(18, 2) +); + +SET ROLE payment_owner; +CREATE TABLE payment_mart.transactions ( + transaction_id STRING, + order_id STRING, + user_id STRING, + transaction_ts TIMESTAMP, + amount DECIMAL(18, 2) +); + +SET ROLE identity_owner; +CREATE TABLE identity_mart.users ( + user_id STRING, + email STRING, + created_at TIMESTAMP +); + +-- 6) Collaboration roles aligned with the architecture +CREATE ROLE collab_marketing_commerce; +GRANT SELECT ON commerce_mart.orders TO ROLE collab_marketing_commerce; +GRANT ROLE collab_marketing_commerce TO ROLE marketing_ro; + +CREATE ROLE collab_fulfillment_commerce; +GRANT SELECT ON commerce_mart.orders TO ROLE collab_fulfillment_commerce; +GRANT ROLE collab_fulfillment_commerce TO ROLE 
fulfillment_ro; + +CREATE ROLE collab_payment_commerce; +GRANT SELECT ON commerce_mart.orders TO ROLE collab_payment_commerce; +GRANT ROLE collab_payment_commerce TO ROLE payment_ro; + +CREATE ROLE collab_finance_payment; +GRANT SELECT ON payment_mart.transactions TO ROLE collab_finance_payment; +GRANT ROLE collab_finance_payment TO ROLE finance_ro; + +CREATE ROLE collab_support_core; +GRANT SELECT ON commerce_mart.orders TO ROLE collab_support_core; +GRANT SELECT ON payment_mart.transactions TO ROLE collab_support_core; +GRANT ROLE collab_support_core TO ROLE support_ro; + +CREATE ROLE collab_risk_core; +GRANT SELECT ON identity_mart.users TO ROLE collab_risk_core; +GRANT SELECT ON commerce_mart.orders TO ROLE collab_risk_core; +GRANT SELECT ON payment_mart.transactions TO ROLE collab_risk_core; +GRANT ROLE collab_risk_core TO ROLE risk_ro; +``` diff --git a/tidb-cloud-lake/guides/security-reliability.md b/tidb-cloud-lake/guides/security-reliability.md new file mode 100644 index 0000000000000..1925938cee1cb --- /dev/null +++ b/tidb-cloud-lake/guides/security-reliability.md @@ -0,0 +1,15 @@ +--- +title: Security & Reliability +--- + +Databend offers **enterprise-grade security and reliability features** that safeguard your data throughout its lifecycle. From controlling who can access your data to protecting against network threats and recovering from operational errors, Databend's **multi-layered security approach** helps you maintain data integrity, compliance, and business continuity. + +| Security Feature | Purpose | When to Use | +|-----------------|---------|------------| +| [**Access Control**](/tidb-cloud-lake/guides/access-control.md) | Manage user permissions | When you need to control data access with role-based security and object ownership | +| [**Audit Trail**](/tidb-cloud-lake/guides/audit-trail.md) | Track database activities | When you need comprehensive audit trails for security monitoring, compliance, and performance analysis | +| [**Network Policy**](/tidb-cloud-lake/guides/network-policy.md) | Restrict network access | When you want to limit connections to specific IP ranges even with valid credentials | +| [**Password Policy**](/tidb-cloud-lake/guides/password-policy.md) | Set password requirements | When you need to enforce password complexity, rotation, and account lockout rules | +| [**Masking Policy**](/tidb-cloud-lake/guides/masking-policy.md) | Hide sensitive data | When you need to protect confidential data while still allowing authorized access | +| [**Fail-Safe**](/tidb-cloud-lake/guides/fail-safe.md) | Prevent data loss | When you need to recover accidentally deleted data from S3-compatible storage | +| [**Recovery from Errors**](/tidb-cloud-lake/guides/recovery-from-operational-errors.md) | Fix operational mistakes | When you need to recover from dropped databases/tables or incorrect data modifications | diff --git a/tidb-cloud-lake/guides/sql-analytics.md b/tidb-cloud-lake/guides/sql-analytics.md new file mode 100644 index 0000000000000..16d468576e353 --- /dev/null +++ b/tidb-cloud-lake/guides/sql-analytics.md @@ -0,0 +1,387 @@ +--- +title: SQL Analytics +--- + +> **Scenario:** CityDrive stages all dash-cam records into shared relational tables. This relational data (e.g., video metadata, event tags) is extracted by background processing pipelines from keyframes of the raw dash-cam video. Analysts can then filter, join, and aggregate on the same `video_id` / `frame_id` pairs used by all downstream workloads. 
+ +This walkthrough models the relational side of that catalog and highlights practical SQL building blocks. The sample IDs here appear again in the JSON, vector, geo, and ETL guides. + +## 1. Create the Base Tables +`citydrive_videos` stores clip metadata, while `frame_events` records the interesting frames pulled from each clip. + +```sql +CREATE OR REPLACE TABLE citydrive_videos ( + video_id STRING, + vehicle_id STRING, + capture_date DATE, + route_name STRING, + weather STRING, + camera_source STRING, + duration_sec INT +); + +CREATE OR REPLACE TABLE frame_events ( + frame_id STRING, + video_id STRING, + frame_index INT, + collected_at TIMESTAMP, + event_tag STRING, + risk_score DOUBLE, + speed_kmh DOUBLE +); + +INSERT INTO citydrive_videos VALUES + ('VID-20250101-001', 'VEH-21', '2025-01-01', 'Downtown Loop', 'Rain', 'roof_cam', 3580), + ('VID-20250101-002', 'VEH-05', '2025-01-01', 'Port Perimeter', 'Overcast', 'front_cam',4020), + ('VID-20250102-001', 'VEH-21', '2025-01-02', 'Airport Connector', 'Clear', 'front_cam',3655), + ('VID-20250103-001', 'VEH-11', '2025-01-03', 'CBD Night Sweep', 'LightFog', 'rear_cam', 3310); + +INSERT INTO frame_events VALUES + ('FRAME-0101', 'VID-20250101-001', 125, '2025-01-01 08:15:21', 'hard_brake', 0.81, 32.4), + ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8), + ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1), + ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6), + ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9), + -- Deliberate orphan to illustrate NOT EXISTS + ('FRAME-0501', 'VID-MISSING-001', 10, '2025-01-04 10:00:00', 'sensor_fault', 0.25, 15.0); + +-- Needed for the JOIN patterns below; same schema as the JSON & Search guide. 
+CREATE OR REPLACE TABLE frame_metadata_catalog ( + doc_id STRING, + meta_json VARIANT, + captured_at TIMESTAMP, + INVERTED INDEX idx_meta_json (meta_json) +); + +INSERT INTO frame_metadata_catalog VALUES + ('FRAME-0101', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":32.4},"detections":{"objects":[{"type":"vehicle","confidence":0.88},{"type":"brake_light","confidence":0.64}]},"media_meta":{"tagging":{"labels":["hard_brake","rain","downtown_loop"]}}}'), '2025-01-01 08:15:21'), + ('FRAME-0102', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":24.8},"detections":{"objects":[{"type":"pedestrian","confidence":0.92},{"type":"bike","confidence":0.35}]},"media_meta":{"tagging":{"labels":["pedestrian","swerve","crosswalk"]}}}'), '2025-01-01 08:33:54'), + ('FRAME-0201', PARSE_JSON('{"scene":{"weather_code":"overcast","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":48.1},"detections":{"objects":[{"type":"lane_merge","confidence":0.74},{"type":"vehicle","confidence":0.41}]},"media_meta":{"tagging":{"labels":["lane_merge","urban"]}}}'), '2025-01-01 11:12:02'), + ('FRAME-0301', PARSE_JSON('{"scene":{"weather_code":"clear","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":52.6},"detections":{"objects":[{"type":"vehicle","confidence":0.82},{"type":"hard_brake","confidence":0.59}]},"media_meta":{"tagging":{"labels":["hard_brake","highway"]}}}'), '2025-01-02 09:44:18'), + ('FRAME-0401', PARSE_JSON('{"scene":{"weather_code":"lightfog","lighting":"night"},"camera":{"sensor_view":"rear"},"vehicle":{"speed_kmh":38.9},"detections":{"objects":[{"type":"traffic_light","confidence":0.78},{"type":"vehicle","confidence":0.36}]},"media_meta":{"tagging":{"labels":["night_lowlight","traffic_light"]}}}'), '2025-01-03 21:18:07'); +``` + +Docs: [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md), [INSERT](/tidb-cloud-lake/sql/insert.md). + +--- + +## 2. Filter the Working Set +Keep investigations focused on the Jan 1–3 snapshot from the seed data so the demo always returns rows. + +```sql +WITH recent_videos AS ( + SELECT * + FROM citydrive_videos + WHERE capture_date >= '2025-01-01' + AND capture_date < '2025-01-04' +) +SELECT v.video_id, + v.route_name, + v.weather, + COUNT(f.frame_id) AS flagged_frames +FROM recent_videos v +LEFT JOIN frame_events f USING (video_id) +GROUP BY v.video_id, v.route_name, v.weather +ORDER BY flagged_frames DESC; +``` + +Docs: [DATEADD](/tidb-cloud-lake/sql/date-add.md), [GROUP BY](/tidb-cloud-lake/sql/select.md#group-by-clause). + +Sample output: + +``` +video_id | route_name | weather | flagged_frames +VID-20250101-001| Downtown Loop | Rain | 2 +VID-20250101-002| Port Perimeter | Overcast | 1 +VID-20250102-001| Airport Connector | Clear | 1 +VID-20250103-001| CBD Night Sweep | LightFog | 1 +``` + +--- + +## 3. 
JOIN Patterns +### INNER JOIN for frame context +```sql +SELECT f.frame_id, + f.event_tag, + f.risk_score, + v.route_name, + v.camera_source +FROM frame_events AS f +JOIN citydrive_videos AS v USING (video_id) +ORDER BY f.collected_at; +``` + +Sample output: + +``` +frame_id | event_tag | risk_score | route_name | camera_source +FRAME-0101| hard_brake | 0.81 | Downtown Loop | roof_cam +FRAME-0102| pedestrian | 0.67 | Downtown Loop | roof_cam +FRAME-0201| lane_merge | 0.74 | Port Perimeter | front_cam +FRAME-0301| hard_brake | 0.59 | Airport Connector | front_cam +FRAME-0401| night_lowlight | 0.63 | CBD Night Sweep | rear_cam +``` + +### Anti join QA +```sql +SELECT frame_id +FROM frame_events f +WHERE NOT EXISTS ( + SELECT 1 + FROM citydrive_videos v + WHERE v.video_id = f.video_id +); +``` + +Sample output: + +``` +frame_id +FRAME-0501 +``` + +### LATERAL FLATTEN for nested detections +```sql +SELECT f.frame_id, + obj.value['type']::STRING AS detected_type, + obj.value['confidence']::DOUBLE AS confidence +FROM frame_events AS f +JOIN frame_metadata_catalog AS meta ON meta.doc_id = f.frame_id, + LATERAL FLATTEN(input => meta.meta_json['detections']['objects']) AS obj +WHERE f.event_tag = 'pedestrian' +ORDER BY confidence DESC; +``` + +Sample output: + +``` +frame_id | detected_type | confidence +FRAME-0102| pedestrian | 0.92 +FRAME-0102| bike | 0.35 +``` + +Docs: [JOIN](/tidb-cloud-lake/sql/join.md), [FLATTEN](/tidb-cloud-lake/sql/flatten.md). + +--- + +## 4. Aggregations for Fleet KPIs +### Behaviour by route +```sql +SELECT v.route_name, + f.event_tag, + COUNT(*) AS occurrences, + AVG(f.risk_score) AS avg_risk +FROM frame_events f +JOIN citydrive_videos v USING (video_id) +GROUP BY v.route_name, f.event_tag +ORDER BY avg_risk DESC, occurrences DESC; +``` + +Sample output: + +``` +route_name | event_tag | occurrences | avg_risk +Downtown Loop | hard_brake | 1 | 0.81 +Port Perimeter | lane_merge | 1 | 0.74 +Downtown Loop | pedestrian | 1 | 0.67 +CBD Night Sweep | night_lowlight | 1 | 0.63 +Airport Connector | hard_brake | 1 | 0.59 +``` + +### ROLLUP totals +```sql +SELECT v.route_name, + f.event_tag, + COUNT(*) AS occurrences +FROM frame_events f +JOIN citydrive_videos v USING (video_id) +GROUP BY ROLLUP(v.route_name, f.event_tag) +ORDER BY v.route_name NULLS LAST, f.event_tag; +``` + +Sample output (first 6 rows): + +``` +route_name | event_tag | occurrences +Airport Connector | hard_brake | 1 +Airport Connector | NULL | 1 +CBD Night Sweep | night_lowlight | 1 +CBD Night Sweep | NULL | 1 +Downtown Loop | hard_brake | 1 +Downtown Loop | pedestrian | 1 +... (total rows: 10) +``` + +### CUBE for route × weather coverage +```sql +SELECT v.route_name, + v.weather, + COUNT(DISTINCT v.video_id) AS videos +FROM citydrive_videos v +GROUP BY CUBE(v.route_name, v.weather) +ORDER BY v.route_name NULLS LAST, v.weather NULLS LAST; +``` + +Sample output (first 6 rows): + +``` +route_name | weather | videos +Airport Connector | Clear | 1 +Airport Connector | NULL | 1 +CBD Night Sweep | LightFog | 1 +CBD Night Sweep | NULL | 1 +Downtown Loop | Rain | 1 +Downtown Loop | NULL | 1 +... (total rows: 13) +``` + +--- + +## 5. 
Window Functions +### Running risk per video +```sql +WITH ordered_events AS ( + SELECT video_id, collected_at, risk_score + FROM frame_events +) +SELECT video_id, + collected_at, + risk_score, + SUM(risk_score) OVER ( + PARTITION BY video_id + ORDER BY collected_at + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS cumulative_risk +FROM ordered_events +ORDER BY video_id, collected_at; +``` + +Sample output (first 6 rows): + +``` +video_id | collected_at | risk_score | cumulative_risk +VID-20250101-001| 2025-01-01 08:15:21 | 0.81 | 0.81 +VID-20250101-001| 2025-01-01 08:33:54 | 0.67 | 1.48 +VID-20250101-002| 2025-01-01 11:12:02 | 0.74 | 0.74 +VID-20250102-001| 2025-01-02 09:44:18 | 0.59 | 0.59 +VID-20250103-001| 2025-01-03 21:18:07 | 0.63 | 0.63 +VID-MISSING-001 | 2025-01-04 10:00:00 | 0.25 | 0.25 +``` + +### Rolling average over recent frames +```sql +SELECT video_id, + frame_id, + frame_index, + risk_score, + AVG(risk_score) OVER ( + PARTITION BY video_id + ORDER BY frame_index + ROWS BETWEEN 3 PRECEDING AND CURRENT ROW + ) AS rolling_avg_risk +FROM frame_events +ORDER BY video_id, frame_index; +``` + +Sample output (first 6 rows): + +``` +video_id | frame_id | frame_index | risk_score | rolling_avg_risk +VID-20250101-001| FRAME-0101 | 125 | 0.81 | 0.81 +VID-20250101-001| FRAME-0102 | 416 | 0.67 | 0.74 +VID-20250101-002| FRAME-0201 | 298 | 0.74 | 0.74 +VID-20250102-001| FRAME-0301 | 188 | 0.59 | 0.59 +VID-20250103-001| FRAME-0401 | 522 | 0.63 | 0.63 +VID-MISSING-001 | FRAME-0501 | 10 | 0.25 | 0.25 +``` + +Docs: [Window functions](/tidb-cloud-lake/sql/window-functions.md). + +--- + +## 6. Aggregating Index Boost +Persist frequently used summaries for dashboards. + +```sql +CREATE OR REPLACE AGGREGATING INDEX idx_video_event_summary +AS +SELECT video_id, + event_tag, + COUNT(*) AS event_count, + AVG(risk_score) AS avg_risk +FROM frame_events +GROUP BY video_id, event_tag; +``` + +When analysts rerun a familiar KPI, the optimizer serves it from the index: + +```sql +SELECT v.route_name, + e.event_tag, + COUNT(*) AS event_count, + AVG(e.risk_score) AS avg_risk +FROM frame_events e +JOIN citydrive_videos v USING (video_id) +WHERE v.capture_date >= '2025-01-01' +GROUP BY v.route_name, e.event_tag +ORDER BY avg_risk DESC; +``` + +Sample output: + +``` +route_name | event_tag | event_count | avg_risk +Downtown Loop | hard_brake | 1 | 0.81 +Port Perimeter | lane_merge | 1 | 0.74 +Downtown Loop | pedestrian | 1 | 0.67 +CBD Night Sweep | night_lowlight | 1 | 0.63 +Airport Connector | hard_brake | 1 | 0.59 +``` + +Docs: [Aggregating Index](/tidb-cloud-lake/guides/aggregating-index.md) and [EXPLAIN](/tidb-cloud-lake/sql/explain.md). + +--- + +## 7. Stored Procedure Automation +Wrap the logic so scheduled jobs always produce the same report. 
+ +```sql +CREATE OR REPLACE PROCEDURE citydrive_route_report(days_back UINT8) +RETURNS TABLE(route_name STRING, event_tag STRING, event_count BIGINT, avg_risk DOUBLE) +LANGUAGE SQL +AS +$$ +BEGIN + RETURN TABLE ( + SELECT v.route_name, + e.event_tag, + COUNT(*) AS event_count, + AVG(e.risk_score) AS avg_risk + FROM frame_events e + JOIN citydrive_videos v USING (video_id) + WHERE v.capture_date >= DATEADD('day', -:days_back, DATE '2025-01-04') + GROUP BY v.route_name, e.event_tag + ); +END; +$$; + +CALL PROCEDURE citydrive_route_report(30); +``` + +Sample output: + +``` +route_name | event_tag | event_count | avg_risk +Downtown Loop | hard_brake | 1 | 0.81 +CBD Night Sweep | night_lowlight | 1 | 0.63 +Downtown Loop | pedestrian | 1 | 0.67 +Airport Connector | hard_brake | 1 | 0.59 +Port Perimeter | lane_merge | 1 | 0.74 +``` + +Stored procedures can be triggered manually, via [TASKS](/tidb-cloud-lake/sql/task.md), or from orchestration tools. + +--- + +With these tables and patterns in place, the rest of the CityDrive guides can reference the exact same `video_id` keys—`frame_metadata_catalog` for JSON search, frame embeddings for similarity, GPS locations for geo queries, and a single ETL path to keep them synchronized. diff --git a/tidb-cloud-lake/guides/stage-overview.md b/tidb-cloud-lake/guides/stage-overview.md new file mode 100644 index 0000000000000..39475532f5388 --- /dev/null +++ b/tidb-cloud-lake/guides/stage-overview.md @@ -0,0 +1,87 @@ +--- +title: Stage Overview +--- + +In Databend, a stage is a virtual location where data files reside. Files in a stage can be queried directly or loaded into a table. Alternatively, you can unload data from a table into a stage as a file. The beauty of using a stage is that you can access it for data loading and unloading as conveniently as you would with folders on your computer. Just as when you put a file in a folder, you don't necessarily need to know its exact location on your hard disk. When accessing a file in a stage, you only need to specify the stage name and the file name, such as `@mystage/mydatafile.csv`, rather than specifying its location in the bucket of your object storage. Similar to folders on your computer, you can create as many stages as you need in Databend. However, it's important to note that a stage cannot contain another stage. Each stage operates independently and does not encompass other stages. + +Utilizing a stage for loading data also improves the efficiency of uploading, managing, and filtering your data files. With [BendSQL](/tidb-cloud-lake/guides/bendsql.md), you can easily upload or download files to or from a stage using a single command. When loading data into Databend, you can directly specify a stage in the COPY INTO command, allowing the command to read and even filter data files from that stage. Similarly, when exporting data from Databend, you can dump your data files into a stage. + +## Stage Types + +Based on the actual storage location and accessibility, stages can be categorized into these types: Internal Stage, External Stage, and User Stage. 
The following table summarizes the characteristics of different stage types in Databend, including their storage locations, accessibility, and recommended usage scenarios: + +| | User Stage | Internal Stage | External Stage | +|----------------------|------------------------------------|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------| +| **Storage Location** | Internal object storage (Databend) | Internal object storage (Databend) | External object storage (e.g., S3, Azure) | +| **Creation Method** | Automatically created | Manually created via: `CREATE STAGE stage_name;` | Manually created via: `CREATE STAGE stage_name` `'s3://bucket/prefix/'` `CONNECTION=(endpoint_url='x', ...);` | +| **Access Control** | Only accessible by the user | Can be shared with other users or roles | Can be shared with other users or roles | +| **Drop Stage** | Not allowed | Deletes the stage and clears files in it | Deletes only the stage; files in the external location are retained | +| **File Upload** | Must upload files to Databend | Must upload files to Databend | No upload needed; used to read or unload data from/to external storage | +| **Usage Scenario** | Personal/private data | Team/shared data | External data integration or unloading | +| **Path Format** | `@~/` | `@stage_name/` | `@stage_name/` | +### Internal Stage + +Files in an internal stage are actually stored in the object storage where Databend resides. An internal stage is accessible to all users within your organization, allowing each user to utilize the stage for their data loading or export tasks. Similar to creating a folder, specifying a name is necessary when creating a stage. Below is an example of creating an internal stage with the [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) command: + +```sql +-- Create an internal stage named my_internal_stage +CREATE STAGE my_internal_stage; +``` + +### External Stage + +An external stage enables you to specify an object storage location outside of where Databend resides. For instance, if you have datasets in a Google Cloud Storage container, you can create an external stage using that container. When creating an external stage, you must provide connection information for Databend to connect to the external location. + +Below is an example of creating an external stage. Let's say you have datasets in an Amazon S3 bucket named `databend-doc`: + +![alt text](/img/guides/external-stage.png) + +You can create an external stage with the [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) command to connect Databend to that bucket: + +```sql +-- Create an external stage named my_external_stage +CREATE STAGE my_external_stage + URL = 's3://databend-doc' + CONNECTION = ( + ACCESS_KEY_ID = '', + SECRET_ACCESS_KEY = '' + ); +``` + +Once the external stage is created, you can access the datasets from Databend. 
For example, to list the files: + +```sql +LIST @my_external_stage; + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├───────────────┼────────┼────────────────────────────────────┼───────────────────────────────┼──────────────────┤ +│ Inventory.csv │ 57585 │ "0cd02fb636a22ba9f4ae4d24555a7d68" │ 2024-03-17 21:22:38.000 +0000 │ NULL │ +│ Products.csv │ 42987 │ "570e5cbf6a4b6e7e9a258094192f4784" │ 2024-03-17 21:22:38.000 +0000 │ NULL │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +Please note that the external storage must be one of the object storage solutions supported by Databend. The [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) command page provides examples on how to specify connection information for commonly used object storage solutions. + +### User Stage + +The user stage can be considered a special type of internal stage: Files in the user stage are stored in the object storage where Databend resides but cannot be accessed by other users. Each user has their own user stage out-of-the-box, and you do not need to create or name your user stage before use. Additionally, you cannot remove your user stage. + +The user stage can serve as a convenient repository for your data files that do not need to be shared with others. To access your user stage, use `@~`. For example, to list all the files in your stage: + +```sql +LIST @~; +``` + +## Managing Stages + +Databend provides a variety of commands to assist you in managing stages and the files staged within them: + +| Command | Description | Applies to User Stage | Applies to Internal Stage | Applies to External Stage | +| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------- | ------------------------- | ------------------------- | +| [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) | Creates an internal or external stage. | No | Yes | Yes | +| [DROP STAGE](/tidb-cloud-lake/sql/drop-stage.md) | Removes an internal or external stage. | No | Yes | Yes | +| [DESC STAGE](/tidb-cloud-lake/sql/desc-stage.md) | Shows the properties of an internal or external stage. | No | Yes | Yes | +| [LIST](/tidb-cloud-lake/sql/list-stage-files.md) | Returns a list of the staged files in a stage. Alternatively, the table function [LIST_STAGE](/tidb-cloud-lake/sql/list-stage.md) offers similar functionality with added flexibility to obtain specific file information | Yes | Yes | Yes | +| [REMOVE](/tidb-cloud-lake/sql/remove-stage-files.md) | Removes staged files from a stage. | Yes | Yes | Yes | +| [SHOW STAGES](/tidb-cloud-lake/sql/show-stages.md) | Returns a list of the created internal and external stages. 
| No | Yes | Yes | diff --git a/tidb-cloud-lake/guides/superset.md b/tidb-cloud-lake/guides/superset.md new file mode 100644 index 0000000000000..d1d81575a87ee --- /dev/null +++ b/tidb-cloud-lake/guides/superset.md @@ -0,0 +1,98 @@ +--- +title: Superset +sidebar_position: 3 +--- +import StepsWrap from '@site/src/components/StepsWrap'; +import StepContent from '@site/src/components/Steps/step-content'; + +[Superset](https://superset.apache.org/) is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. + +Databend integrates with Superset through two Python libraries: [databend-py](https://github.com/databendcloud/databend-py) and [databend-sqlalchemy](https://github.com/databendcloud/databend-sqlalchemy). To achieve this integration, you need to build a custom Docker image based on the official Superset Docker image, incorporating these two libraries. `databend-py` acts as the Python client library for Databend, allowing Superset to communicate with the Databend server, execute queries, and retrieve results through its API. Concurrently, `databend-sqlalchemy` serves as the SQLAlchemy adapter for Databend, enabling Superset to integrate with Databend using the SQLAlchemy framework. It facilitates the conversion of SQL queries generated by Superset into a format understandable by Databend through SQLAlchemy's interface. + +## Tutorial: Integrating with Superset + +This tutorial guides you through the process of integrating Databend Cloud with Superset. + + + + +### Building Superset Image + +These steps involve creating a customized Superset Docker image with Databend integration: + +1. Starts with the official Superset Docker image as the foundational base. Edit the Dockerfile, permissions are elevated to install essential packages. + +```shell title='Dockerfile' +FROM apache/superset +# Switching to root to install the required packages +USER root +RUN pip install databend-py +RUN pip install databend-sqlalchemy +# Switching back to using the `superset` user +USER superset +``` + +2. Build a Docker image with the tag "superset-databend:v0.0.1" using the current directory as the build context. + +```shell +docker build -t superset-databend:v0.0.1 . +``` + +3. Run a Docker container using the "superset-databend:v0.0.1" image. + +```shell +docker run -d -p 8080:8088 -e "SUPERSET_SECRET_KEY=" --name superset --platform linux/x86_64 superset-databend:v0.0.1 +``` + + + + +### Setting Up Superset + +1. Create an administrator user. + +```shell +docker exec -it superset superset fab create-admin \ + --username admin \ + --firstname Superset \ + --lastname Admin \ + --email admin@superset.com \ + --password admin +``` + +2. Apply any necessary database migrations to ensure that the Superset database schema is up to date. + +```shell +docker exec -it superset superset db upgrade +``` + +3. Initializes Superset. + +```shell +docker exec -it superset superset init +``` + + + + + +### Connecting to Databend Cloud + +1. Navigate to [http://localhost:8080/login/]( http://localhost:8080/login/) and use the credentials `admin/admin` for the username and password to log in. + +2. Select **Settings** > **Data** > **Connect Database** to open the connection wizard. + +![Alt text](/img/integration/superset-connect-db.png) + +3. Select `Other` from the list of supported databases. + +![Alt text](/img/integration/superset-select-other.png) + +4. 
On the **BASIC** tab, set a display name, for example, `Databend`, and then enter the URI to connect to Databend Cloud. The URI follows the format: `databend://`, where` ` corresponds to the host field in your warehouse's connection information. For information on how to obtain the connection details, refer to [Connecting to a Warehouse](/guides/cloud/resources/warehouses#connecting-to-a-warehouse). + +![Alt text](/img/integration/superset-uri.png) + +5. Click **TEST CONNECTION**, which should result in a popup message saying, "Connection looks good!". + + + \ No newline at end of file diff --git a/tidb-cloud-lake/guides/support-services.md b/tidb-cloud-lake/guides/support-services.md new file mode 100644 index 0000000000000..ed1958adc8d5e --- /dev/null +++ b/tidb-cloud-lake/guides/support-services.md @@ -0,0 +1,66 @@ +--- +title: Support Services +--- + +import LanguageDocs from '@site/src/components/LanguageDocs'; + +# Databend Cloud Support Services + +Databend provides comprehensive Support Services for our Databend Cloud users and customers. Our objective is to deliver exceptional support that reflects the Databend product's core values – high performance, ease of use, and fast, high-quality results. + + + +## Getting Support + +You can access support through multiple channels: + +- **Cloud Console**: Log in to the Databend Cloud console and select **Support → Create New Ticket** from the menu options to open a new support case and view the status of your submitted cases. +- **Status Page**: Subscribe to our [status page](https://status.databend.com) to get notified quickly about any incidents affecting our platform. +- **Documentation**: Browse our comprehensive [documentation](https://docs.databend.com) for guides, tutorials, and reference materials. + +## Support Service Levels + +Databend Cloud offers different levels of support based on your subscription tier: + +| Feature | Personal | Business | Dedicated | +| ------------------------------------------------------ | -------- | -------- | --------- | +| Logging and tracking support tickets | ✓ | ✓ | ✓ | +| 4/7 coverage and response window for Severity 1 issues | 24 hours | 8 hours | 4 hours | +| Response to non-severity-1 issues | 48 hours | 24 hours | 8 hours | + +### Severity Levels + +- **Severity 1**: Critical issue halting business operations with no workaround +- **Severity 2**: Major functionality impacted with significant performance degradation +- **Severity 3**: Partial loss of functionality with minor business impact +- **Severity 4**: General questions, recommendations, or feature requests + +:::note +Please note that only Business and Dedicated customers have a Service Level Agreement (SLA) on Support Incidents. If you are using the Personal edition – while we will try to answer your question as soon as possible, we'd encourage you to also explore our Community resources: + +- [Databend Community Slack Channel](https://link.databend.com/join-slack) +- [GitHub Discussions](https://github.com/databendlabs/databend/discussions) + ::: + +## Enterprise Support + +For customers with mission-critical deployments, our Dedicated edition offers enhanced support options including: + +- Priority response times for all severity levels +- Dedicated support engineer +- Proactive monitoring and issue resolution +- Regular health checks and optimization recommendations + +Contact our [sales team](https://www.databend.com/contact-us/) to learn more about our enterprise support offerings. 
diff --git a/tidb-cloud-lake/guides/tableau.md b/tidb-cloud-lake/guides/tableau.md new file mode 100644 index 0000000000000..95908d95ccdb1 --- /dev/null +++ b/tidb-cloud-lake/guides/tableau.md @@ -0,0 +1,155 @@ +--- +title: Tableau +sidebar_position: 2 +--- + +[Tableau](https://www.tableau.com/) is a visual analytics platform transforming the way we use data to solve problems—empowering people and organizations to make the most of their data. By leveraging the [databend-jdbc driver](https://github.com/databendcloud/databend-jdbc) (version 0.3.4 or higher), both Databend and Databend Cloud can integrate with Tableau, enabling seamless data access and efficient analysis. It is important to note that for optimal compatibility, it is advisable to use Tableau version 2022.3 or higher to avoid potential compatibility issues. + +Databend currently provides two integration methods with Tableau. The first approach utilizes the Other Databases (JDBC) interface within Tableau and is applicable to both Databend and Databend Cloud. The second method recommends using the [databend-tableau-connector-jdbc](https://github.com/databendcloud/databend-tableau-connector-jdbc) connector specifically developed by Databend for optimal connectivity with Databend. + +The `databend-tableau-connector-jdbc` connector offers faster performance through its JDBC driver, especially when creating Extracts, and is easier to install as a cross-platform jar file, eliminating platform-specific compilations. It allows you to fine-tune SQL queries for standard Tableau functionality, including multiple JOINS and working with Sets, and provides a user-friendly connection dialog for a seamless integration experience. + +## Tutorial-1: Integrating with Databend (through Other Databases (JDBC) Interface) + +In this tutorial, you'll deploy and integrate a local Databend with [Tableau Desktop](https://www.tableau.com/products/desktop). Before you start, [download](https://www.tableau.com/products/desktop/download) Tableau Desktop and follow the on-screen instructions to complete the installation. + +### Step 1. Deploy Databend + +1. Follow the [Local and Docker Deployments](../../20-self-hosted/02-deployment/01-non-production/00-deploying-local.md) guide to deploy a local Databend. +2. Create a SQL user in Databend. You will use this account to connect to Databend in Tableau Desktop. + +```sql +CREATE ROLE tableau_role; +GRANT ALL ON *.* TO ROLE tableau_role; +CREATE USER tableau IDENTIFIED BY 'tableau' WITH DEFAULT_ROLE = 'tableau_role'; +GRANT ROLE tableau_role TO tableau; +``` + +### Step 2. Install databend-jdbc + +1. Download the databend-jdbc driver (version 0.3.4 or higher) from the Maven Central Repository at https://repo1.maven.org/maven2/com/databend/databend-jdbc/ + +2. To install the databend-jdbc driver, move the jar file (for example, databend-jdbc-0.3.4.jar) to Tableau's driver folder. Tableau's driver folder varies depending on the operating system: + +| Operating System | Tableau's Driver Folder | +| ---------------- | -------------------------------- | +| MacOS | ~/Library/Tableau/Drivers | +| Windows | C:\Program Files\Tableau\Drivers | + +### Step 3. Connect to Databend + +1. Launch Tableau Desktop and select **Other Database (JDBC)** in the sidebar. This opens a window as follows: + +![Alt text](/img/integration/tableau-1.png) + +2. In the window that opens, provide the connection information and click **Sign In**. 
+ +| Parameter | Description | For This Tutorial | +| --------- | -------------------------------------------------------------------- | -------------------------------------------------------- | +| URL | Format: `jdbc:databend://{user}:{password}@{host}:{port}/{database}` | `jdbc:databend://tableau:tableau@127.0.0.1:8000/default` | +| Dialect | Select "MySQL" for SQL dialect. | MySQL | +| Username | SQL user for connecting to Databend | tableau | +| Password | SQL user for connecting to Databend | tableau | + +3. When the Tableau workbook opens, select the database, schema, and tables that you want to query. For this tutorial, select _default_ for both **Database** and **Schema**. + +![Alt text](/img/integration/tableau-2.png) + +You're all set! You can now drag tables to the work area to start your query and further analysis. + +## Tutorial-2: Integrating with Databend (through databend-tableau-connector-jdbc Connector) + +In this tutorial, you'll deploy and integrate a local Databend with [Tableau Desktop](https://www.tableau.com/products/desktop). Before you start, [download](https://www.tableau.com/products/desktop/download) Tableau Desktop and follow the on-screen instructions to complete the installation. + +### Step 1. Deploy Databend + +1. Follow the [Local and Docker Deployments](../../20-self-hosted/02-deployment/01-non-production/00-deploying-local.md) guide to deploy a local Databend. +2. Create a SQL user in Databend. You will use this account to connect to Databend in Tableau Desktop. + +```sql +CREATE ROLE tableau_role; +GRANT ALL ON *.* TO ROLE tableau_role; +CREATE USER tableau IDENTIFIED BY 'tableau' WITH DEFAULT_ROLE = 'tableau_role'; +GRANT ROLE tableau_role TO tableau; +``` + +### Step 2. Install databend-jdbc + +1. Download the databend-jdbc driver (version 0.3.4 or higher) from the Maven Central Repository at https://repo1.maven.org/maven2/com/databend/databend-jdbc/ + +2. To install the databend-jdbc driver, move the jar file (for example, databend-jdbc-0.3.4.jar) to Tableau's driver folder. Tableau's driver folder varies depending on the operating system: + +| Operating System | Tableau's Driver Folder | +| ---------------- | -------------------------------- | +| MacOS | ~/Library/Tableau/Drivers | +| Windows | C:\Program Files\Tableau\Drivers | + +### Step 3. Install databend-tableau-connector-jdbc Connector + +1. Download the latest **databend_jdbc.taco** file from the connector's [Releases](https://github.com/databendcloud/databend-tableau-connector-jdbc/releases) page, and save it to the Tableau's connector folder: + +| Operating System | Tableau's Connector Folder | +| ---------------- | ------------------------------------------------------------------ | +| MacOS | ~/Documents/My Tableau Repository/Connectors | +| Windows | C:\Users\[Windows User]\Documents\My Tableau Repository\Connectors | + +2. Start Tableau Desktop with signature verification disabled. If you are on macOS, open Terminal and enter the following command: + +```shell +/Applications/Tableau\ Desktop\ 2023.2.app/Contents/MacOS/Tableau -DDisableVerifyConnectorPluginSignature=true +``` + +### Step 4. Connect to Databend + +1. In Tableau Desktop, select **Databend JDBC by Databend, Inc.** on **To a Server** > **More...**. + +![Alt text](/img/integration/tableau-connector-1.png) + +2. In the window that opens, provide the connection information and click **Sign In**. + +![Alt text](/img/integration/tableau-connector-2.png) + +3. 
Select a database, then you can drag tables to the work area to start your query and further analysis. + +![Alt text](/img/integration/tableau-connector-3.png) + +## Tutorial 3: Integrating with Databend Cloud + +In this tutorial, you'll integrate Databend Cloud with [Tableau Desktop](https://www.tableau.com/products/desktop). Before you start, [download](https://www.tableau.com/products/desktop/download) Tableau Desktop and follow the on-screen instructions to complete the installation. + +### Step 1. Obtain Connection Information + +Obtain the connection information from Databend Cloud. For how to do that, refer to [Connecting to a Warehouse](/guides/cloud/resources/warehouses#connecting). + +### Step 2. Install databend-jdbc + +1. Download the databend-jdbc driver (version 0.3.4 or higher) from the Maven Central Repository at https://repo1.maven.org/maven2/com/databend/databend-jdbc/ + +2. To install the databend-jdbc driver, move the jar file (for example, databend-jdbc-0.3.4.jar) to Tableau's driver folder. Tableau's driver folder varies depending on the operating system: + +| Operating System | Tableau's Driver Folder | +| ---------------- | -------------------------------- | +| MacOS | ~/Library/Tableau/Drivers | +| Windows | C:\Program Files\Tableau\Drivers | +| Linux | /opt/tableau/tableau_driver/jdbc | + +### Step 3. Connect to Databend Cloud + +1. Launch Tableau Desktop and select **Other Database (JDBC)** in the sidebar. This opens a window as follows: + +![Alt text](@site/static/img/documents/BI/tableau-1.png) + +2. In the window, provide the connection information you obtained in [Step 1](#step-1-obtain-connection-information) and click **Sign In**. + +| Parameter | Description | For This Tutorial | +| --------- | -------------------------------------------------------------------- | -------------------------------------------------------------------------- | +| URL | Format: `jdbc:databend://{user}:{password}@{host}:{port}/{database}` | `jdbc:databend://cloudapp:@https://:443/default` | +| Dialect | Select "MySQL" for SQL dialect. | MySQL | +| Username | SQL user for connecting to Databend Cloud | cloudapp | +| Password | SQL user for connecting to Databend Cloud | Your password | + +3. When the Tableau workbook opens, select the database, schema, and tables that you want to query. For this tutorial, select _default_ for both **Database** and **Schema**. + +![Alt text](@site/static/img/documents/BI/tableau-2.png) + +You're all set! You can now drag tables to the work area to start your query and further analysis. diff --git a/tidb-cloud-lake/guides/tidb-cloud-lake-architecture.md b/tidb-cloud-lake/guides/tidb-cloud-lake-architecture.md new file mode 100644 index 0000000000000..b711ec907d815 --- /dev/null +++ b/tidb-cloud-lake/guides/tidb-cloud-lake-architecture.md @@ -0,0 +1,45 @@ +--- +title: Databend Cloud Architecture +sidebar_label: Architecture +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +![Alt text](@site/static/img/documents/overview/2.png) + + + + +The metadata service is a multi-tenant service that stores the metadata of each tenant in Databend Cloud in a highly available Raft cluster. 
This metadata includes:

- Table schemas: the field structure and storage location of each table, which inform query planning and provide the atomicity guarantee for writes at the storage layer;
- Cluster management: when a tenant's cluster starts, its instances register themselves in the metadata service, which runs health checks on them to keep the cluster healthy as a whole;
- Security management: users, roles, and granted privileges, which keep authentication and authorization for data access secure and reliable.

The complete separation of storage and compute gives Databend Cloud unique computational elasticity.

Each tenant in Databend Cloud can have multiple compute clusters (warehouses), each with exclusive computing resources that are released automatically after more than one minute of inactivity to reduce usage costs.

Within a compute cluster, queries are executed by the high-performance Databend engine. Each query passes through several submodules:

- Planner: after parsing the SQL statement, the planner combines operators (such as Projection, Filter, and Limit) into a query plan based on the query type.
- Optimizer: the Databend engine provides a rule-based and cost-based optimizer framework that implements optimizations such as predicate pushdown, join reordering, and scan pruning, greatly accelerating queries.
- Processors: Databend implements a pipeline execution engine that combines push and pull models. It composes the physical execution of a query into a series of pipelines, dynamically adapts the pipeline configuration based on runtime information about the query task, and uses a vectorized expression evaluation framework to make full use of the CPU.

In addition, Databend Cloud can dynamically add or remove nodes in a cluster as the query workload changes, making computing faster and more cost-effective.

The storage layer of Databend Cloud is based on FuseEngine, which is designed and optimized for inexpensive object storage. FuseEngine organizes data efficiently around the properties of object storage, allowing for high-throughput data ingestion and retrieval.

FuseEngine compresses data in a columnar format and stores it in object storage, which significantly reduces data volume and storage costs.

In addition to storing data files, FuseEngine also generates index information, including MinMax indexes, Bloom filter indexes, and others. These indexes reduce I/O and CPU consumption during query execution, greatly improving query performance.

diff --git a/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md b/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md
new file mode 100644
index 0000000000000..d58c5b6a7d78f
--- /dev/null
+++ b/tidb-cloud-lake/guides/track-and-transform-data-via-streams.md
@@ -0,0 +1,241 @@
---
title: Tracking and Transforming Data via Streams
sidebar_label: Stream
---

import StepsWrap from '@site/src/components/StepsWrap';
import StepContent from '@site/src/components/Steps/step-content';

A stream in Databend is an always-on change table: every committed INSERT, UPDATE, or DELETE is captured until you consume it. This page stays lean: a quick overview first, then hands-on examples with real outputs so you can see streams in action.
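Creating a stream is a single DDL statement on an existing table. The snippet below is a minimal sketch with placeholder names (`my_table`, `my_stream`, `my_std_stream`); the examples later on this page walk through the full workflow.

```sql
-- Track INSERTs on an existing table (APPEND_ONLY defaults to true)
CREATE STREAM my_stream ON TABLE my_table;

-- Also capture UPDATE and DELETE events
CREATE STREAM my_std_stream ON TABLE my_table APPEND_ONLY = false;
```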
+ +## Stream Overview + +- Streams don’t duplicate table storage; they list the latest change for each affected row until you consume it. +- Consumption (task, INSERT ... SELECT, `WITH CONSUME`, etc.) clears the stream while keeping it ready for new data. +- `APPEND_ONLY` defaults to `true`; set `APPEND_ONLY = false` only when you must capture UPDATE/DELETE events. + +| Mode | Captures | Typical use | +| --- | --- | --- | +| Standard (`APPEND_ONLY = false`) | INSERT + UPDATE + DELETE, collapsed to the latest state per row. | Slowly changing dimensions, compliance audits. | +| Append-Only (`APPEND_ONLY = true`, default) | INSERT only. | Append-only fact/event ingestion. | + +## Example 1: Append-Only Stream + +Run the statements below in any Databend deployment (Cloud worksheet or local) to see how the default append-only mode captures and consumes inserts. + +### 1. Create table and stream + +```sql +CREATE OR REPLACE TABLE sensor_readings ( + sensor_id INT, + temperature DOUBLE +); + +-- APPEND_ONLY defaults to true, so no extra clause is required. +CREATE OR REPLACE STREAM sensor_readings_stream + ON TABLE sensor_readings; +``` + +### 2. Insert rows and preview + +```sql +INSERT INTO sensor_readings VALUES (1, 21.5), (2, 19.7); + +SELECT sensor_id, temperature, change$action, change$is_update +FROM sensor_readings_stream; +``` + +Output: + +``` +┌────────────┬───────────────┬───────────────┬──────────────────┐ +│ sensor_id │ temperature │ change$action │ change$is_update │ +├────────────┼───────────────┼───────────────┼──────────────────┤ +│ 1 │ 21.5 │ INSERT │ false │ +│ 2 │ 19.7 │ INSERT │ false │ +└────────────┴───────────────┴───────────────┴──────────────────┘ +``` + +### 3. Consume (optional) + +```sql +SELECT sensor_id, temperature +FROM sensor_readings_stream WITH CONSUME; + +SELECT * FROM sensor_readings_stream; -- now empty +``` + +`WITH CONSUME` reads the stream once and clears the delta so the next round can capture fresh INSERTs. + +## Example 2: Standard Stream (Updates & Deletes) + +Switch to Standard mode when you must react to every mutation, including UPDATE or DELETE. + +### 1. Create a Standard stream + +```sql +CREATE OR REPLACE STREAM sensor_readings_stream_std + ON TABLE sensor_readings + APPEND_ONLY = false; +``` + +### 2. Mutate rows and compare + +```sql +DELETE FROM sensor_readings WHERE sensor_id = 1; -- remove old reading +INSERT INTO sensor_readings VALUES (1, 22); -- same sensor, new value +DELETE FROM sensor_readings WHERE sensor_id = 2; -- pure deletion +INSERT INTO sensor_readings VALUES (3, 18.5); -- brand-new sensor + +SELECT * FROM sensor_readings_stream; -- still empty (Append-Only ignores non-inserts) + +SELECT sensor_id, temperature, change$action, change$is_update +FROM sensor_readings_stream_std +ORDER BY change$row_id; +``` + +Output: + +``` +┌────────────┬───────────────┬───────────────┬──────────────────┐ +│ sensor_id │ temperature │ change$action │ change$is_update │ +├────────────┼───────────────┼───────────────┼──────────────────┤ +│ 1 │ 21.5 │ DELETE │ true │ +│ 1 │ 22 │ INSERT │ true │ +│ 2 │ 19.7 │ DELETE │ false │ +│ 3 │ 18.5 │ INSERT │ false │ +└────────────┴───────────────┴───────────────┴──────────────────┘ +``` + + Standard streams capture each change with context: updates show up as DELETE+INSERT on the same `sensor_id`, while standalone deletions/insertions appear individually. Append-Only streams stay empty because they track inserts only. 
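If you want to persist the captured changes instead of just inspecting them, drain the Standard stream into a downstream table. This sketch builds on the tables above; the target table name `sensor_readings_audit` is illustrative, and, as noted in the workflow notes later on this page, the stream is cleared once the INSERT commits.

```sql
-- Illustrative audit table for the captured change records
CREATE OR REPLACE TABLE sensor_readings_audit (
    sensor_id INT,
    temperature DOUBLE,
    action VARCHAR,
    is_update BOOLEAN
);

-- Consuming the stream inside an INSERT clears it when the statement commits
INSERT INTO sensor_readings_audit
SELECT sensor_id, temperature, change$action, change$is_update
FROM sensor_readings_stream_std;

SELECT * FROM sensor_readings_stream_std; -- now empty
```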
+ +## Example 3: Incremental Stream Join + +Join multiple append-only streams to produce incremental KPIs. Because Databend streams keep new rows until they are consumed, you can run the same query after each load. Every execution drains only the new rows via [`WITH CONSUME`](/tidb-cloud-lake/sql/consume.md), so updates that arrive at different times are still matched on the next iteration. + +### 1. Create tables and streams + +```sql +CREATE OR REPLACE TABLE customers ( + customer_id INT, + segment VARCHAR, + city VARCHAR +); + +CREATE OR REPLACE TABLE orders ( + order_id INT, + customer_id INT, + amount DOUBLE +); + +CREATE OR REPLACE STREAM customers_stream ON TABLE customers; +CREATE OR REPLACE STREAM orders_stream ON TABLE orders; +``` + +### 2. Load the first batch + +```sql +INSERT INTO customers VALUES + (101, 'VIP', 'Seattle'), + (102, 'Standard', 'Austin'), + (103, 'VIP', 'Austin'); + +INSERT INTO orders VALUES + (5001, 101, 199.0), + (5002, 101, 59.0), + (5003, 102, 89.0); +``` + +### 3. Run the first incremental query + +```sql +WITH + orders_delta AS ( + SELECT customer_id, amount + FROM orders_stream WITH CONSUME + ), + customers_delta AS ( + SELECT customer_id, segment + FROM customers_stream WITH CONSUME + ) +SELECT + o.customer_id, + c.segment, + SUM(o.amount) AS incremental_sales +FROM orders_delta AS o +JOIN customers_delta AS c + ON o.customer_id = c.customer_id +GROUP BY o.customer_id, c.segment +ORDER BY o.customer_id; +``` + +``` +┌──────────────┬───────────┬────────────────────┐ +│ customer_id │ segment │ incremental_sales │ +├──────────────┼───────────┼────────────────────┤ +│ 101 │ VIP │ 258.0 │ +│ 102 │ Standard │ 89.0 │ +└──────────────┴───────────┴────────────────────┘ +``` + +The streams are now empty. When more rows arrive, the same query will capture only the new data. + +### 4. Run again after the next batch + +```sql +-- New data arrives later +INSERT INTO customers VALUES (104, 'Standard', 'Denver'); +INSERT INTO orders VALUES + (5004, 101, 40.0), + (5005, 104, 120.0); + +-- Same incremental query as before +WITH + orders_delta AS ( + SELECT customer_id, amount + FROM orders_stream WITH CONSUME + ), + customers_delta AS ( + SELECT customer_id, segment + FROM customers_stream WITH CONSUME + ) +SELECT + o.customer_id, + c.segment, + SUM(o.amount) AS incremental_sales +FROM orders_delta AS o +JOIN customers_delta AS c + ON o.customer_id = c.customer_id +GROUP BY o.customer_id, c.segment +ORDER BY o.customer_id; +``` + +``` +┌──────────────┬───────────┬────────────────────┐ +│ customer_id │ segment │ incremental_sales │ +├──────────────┼───────────┼────────────────────┤ +│ 101 │ VIP │ 40.0 │ +│ 104 │ Standard │ 120.0 │ +└──────────────┴───────────┴────────────────────┘ +``` + +Rows stay in each stream until `WITH CONSUME` runs, so inserts that arrive at different times are still matched on the next run. Leave the streams unconsumed when you expect more related rows, then rerun the query to pick up the incremental delta. + +## Stream Workflow Notes + +**Consumption** +- Streams are drained inside a transaction: `INSERT INTO target SELECT ... FROM stream` empties the stream only when the statement commits. +- Only one consumer can succeed at a time; other concurrent statements roll back. + +**Modes** +- Append-Only streams capture INSERTs only and are ideal for append-heavy workloads. +- Standard streams emit updates and deletes as long as you consume them; late-arriving updates remain until the next run. 
+ +**Hidden Columns** +- Streams expose `change$action`, `change$is_update`, and `change$row_id`; use them to understand how Databend recorded each row. +- Base tables gain `_origin_version`, `_origin_block_id`, `_origin_block_row_num` for debugging row provenance. + +**Integrations** +- Pair streams with tasks using `task_history('', )` for scheduled incremental loads. +- Use [`WITH CONSUME`](/tidb-cloud-lake/sql/task.md) when you want to drain only the latest delta. + diff --git a/tidb-cloud-lake/guides/track-metrics.md b/tidb-cloud-lake/guides/track-metrics.md new file mode 100644 index 0000000000000..03868092d703b --- /dev/null +++ b/tidb-cloud-lake/guides/track-metrics.md @@ -0,0 +1,156 @@ +--- +title: Tracking Metrics with Prometheus +--- + +import StepsWrap from '@site/src/components/StepsWrap'; +import StepContent from '@site/src/components/Steps/step-content'; + +[Prometheus](https://prometheus.io/) offers a robust solution for real-time monitoring, empowering you to track critical metrics and maintain system stability effectively. This topic guides you through the steps to integrate Prometheus with Databend Cloud and provides an overview of the available metrics. + +:::note +Tracking metrics with Prometheus is only available for Databend Cloud users on the Business and Dedicated plans. +::: + +## Integrating with Prometheus + +Follow these steps to set up a Prometheus instance with Docker and integrate it with Databend Cloud: + + + + +### Prerequisites + +- To start tracking metrics, ensure that metrics are enabled for your Databend Cloud tenant. To enable this feature, submit a support ticket in Databend Cloud by navigating to **Support** > **Create New Ticket** and requesting metrics activation for your tenant. + +- This procedure explains how to set up a Prometheus instance using Docker. Ensure that the Docker Engine is installed on your machine before proceeding. + + + + +### Prepare a SQL User + +Create a dedicated SQL user in Databend Cloud for Prometheus to access metrics. For example, you can create a SQL user named `metrics` with the password `metrics_password` using the following SQL statement: + +```sql +CREATE USER metrics IDENTIFIED BY 'metrics_password'; +``` + + + + +### Start Prometheus Using Docker + +1. On your local machine, create a file named **prometheus.yml** to configure Prometheus for scraping metrics from Databend Cloud. Use the following template: + +```yaml title='prometheus.yml' +scrape_configs: + - job_name: databend-cloud + scheme: https + metrics_path: /metrics + basic_auth: + username: + password: + scrape_interval: 10s + scrape_timeout: 3s + static_configs: + - targets: + - + labels: # Optional + tenant: + platform: + region: +``` + +| Placeholder | Description | Example | +| ------------------- | ------------------------------------------------ | ------------------------------------------------- | +| `` | The username for the SQL user. | `metrics` | +| `` | The secure password for the SQL user. | `metrics_password` | +| `` | The endpoint URL for your Databend Cloud tenant. | `tnxxxxxxx.gw.aws-us-east-2.default.databend.com` | +| `` | Your tenant's unique identifier. | `tnxxxxxxx` | +| `` | The cloud platform hosting the tenant. | `aws` | +| `` | The region where the tenant is hosted. | `us-east-2` | + +2. Start Prometheus with the following command (replace `` with the full path to your **prometheus.yml** file): + +```bash +docker run -d \ + --name prometheus \ + -p 9090:9090 \ + -v :/etc/prometheus/prometheus.yml \ + prom/prometheus +``` + +3. 
Open Prometheus in your browser at `http://localhost:9090`, navigate to **Status** > **Target health**, and confirm that the `databend-cloud` target is listed with a status of `UP`.

![alt text](../../../../../static/img/documents/warehouses/metrics-1.png)

You're all set! You can now query your tenant metrics directly from Prometheus. For example, try querying `databend_cloud_warehouse_status`:

![alt text](../../../../../static/img/documents/warehouses/metrics-2.png)

## Available Metrics List

Note that all metrics are prefixed with `databend_cloud_`.

:::note
The metrics are in alpha state and may change over time. We recommend that you monitor the metrics closely and adjust your monitoring setup accordingly.
:::

### Query Metrics

The following is a list of query metrics available in Databend Cloud:

| Name                 | Type    | Labels           | Description                          |
| -------------------- | ------- | ---------------- | ------------------------------------ |
| query_count          | Counter | tenant,warehouse | Query counts made by clients         |
| query_errors         | Counter | tenant,warehouse | Query error counts made by clients   |
| query_request_bytes  | Counter | tenant,warehouse | Query request bytes from clients     |
| query_response_bytes | Counter | tenant,warehouse | Query response bytes sent to clients |

### Storage Metrics

The following is a list of storage metrics available in Databend Cloud:

| Name                          | Type  | Labels | Description                                             |
| ----------------------------- | ----- | ------ | ------------------------------------------------------- |
| storage_total_size            | Gauge | tenant | Total size of the backend object storage                |
| storage_staged_size           | Gauge | tenant | Total size of staged files in backend object storage    |
| storage_table_compressed_size | Gauge | tenant | Total size of current tables in backend object storage  |
| storage_non_current_size      | Gauge | tenant | Total size of non-current objects in backend storage    |

### Warehouse Metrics

The following is a list of warehouse metrics available in Databend Cloud:

| Name                             | Type    | Labels                       | Description                                          |
| -------------------------------- | ------- | ---------------------------- | ---------------------------------------------------- |
| warehouse_status                 | Gauge   | tenant,warehouse,size,status | Flag for warehouse status (Suspended, Running, etc.) |
| warehouse_connections            | Gauge   | tenant,warehouse             | Current session count                                |
| warehouse_queries_queued         | Gauge   | tenant,warehouse             | Queries currently waiting in the queue               |
| warehouse_queries_running        | Gauge   | tenant,warehouse             | Queries currently running                            |
| warehouse_queries_start_total    | Counter | tenant,warehouse             | Total queries started                                |
| warehouse_queries_failed_total   | Counter | tenant,warehouse             | Total queries failed                                 |
| warehouse_queries_success_total  | Counter | tenant,warehouse             | Total queries succeeded                              |
| warehouse_storage_requests_total | Counter | tenant,warehouse,scheme,op   | Request count to backend storage                     |
| warehouse_storage_requests_bytes | Counter | tenant,warehouse,scheme,op   | Request bytes from backend storage                   |
| warehouse_data_scan_rows         | Counter | tenant,warehouse             | Data rows scanned from backend storage               |
| warehouse_data_write_rows        | Counter | tenant,warehouse             | Data rows written to backend storage                 |

### Task Metrics

The following is a list of task metrics available in Databend Cloud:

| Name                            | Type    | Labels          | Description                      |
| ------------------------------- | ------- | --------------- | -------------------------------- |
| task_scheduled_total            | Counter | tenant,task     | Total scheduled tasks            |
| task_query_requests_total       | Counter | tenant,task     | Query requests for tasks         |
| task_run_skipped_total          | Counter | tenant,task     | Skipped task runs                |
| task_accessor_requests_total    | Counter | tenant,function | Accessor requests for tasks      |
| task_notification_success_total | Counter | tenant          | Successful task notifications    |
| task_notification_errors_total  | Counter | tenant          | Task notification errors         |
| task_running_duration_seconds   | Counter | tenant,task     | Task running duration in seconds |
| task_running                    | Counter | tenant,task     | Running tasks                    |
| task_scheduled_timestamp        | Counter | tenant,task     | Scheduled task timestamps        |

diff --git a/tidb-cloud-lake/guides/transform-data-on-load.md b/tidb-cloud-lake/guides/transform-data-on-load.md
new file mode 100644
index 0000000000000..0196bee966774
--- /dev/null
+++ b/tidb-cloud-lake/guides/transform-data-on-load.md
@@ -0,0 +1,250 @@
---
title: Transforming Data on Load
---

Databend's `COPY INTO` command allows data transformation during the loading process. This streamlines your ETL pipeline by incorporating basic transformations, eliminating the need for temporary tables.

See [Querying & Transforming](/tidb-cloud-lake/guides/query-stage.md) for syntax.

Key transformations you can perform:

- **Loading a subset of data columns**: Selectively import specific columns.
- **Reordering columns**: Change column order during load.
- **Converting datatypes**: Ensure consistency and compatibility.
- **Performing arithmetic operations**: Generate new derived data.
- **Loading data to a table with additional columns**: Map and insert data into existing structures.

## Tutorials

These tutorials demonstrate data transformation during loading. Each example shows loading from a staged file.
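The general shape of a transforming load is a `COPY INTO <table>` whose source is a SELECT over a stage. The statement below is a minimal sketch with placeholder names (`target_table`, `@my_stage`); the tutorials that follow use concrete tables and a real staged file.

```sql
COPY INTO target_table
FROM (
    SELECT
        t.col_a,          -- keep only the columns you need
        UPPER(t.col_b),   -- convert or reshape values
        t.col_c + 1       -- derive new values arithmetically
    FROM @my_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';
```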
+ +### Before You Begin + +Create a stage and generate a sample Parquet file: + +```sql +CREATE STAGE my_parquet_stage; +COPY INTO @my_parquet_stage +FROM ( + SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id, + 'Name_' || CAST(number AS VARCHAR) AS name, + 20 + MOD(number, 23) AS age, + DATE_ADD('day', MOD(number, 60), '2022-01-01') AS onboarded + FROM numbers(10) +) +FILE_FORMAT = (TYPE = PARQUET); +``` + +Query the staged sample file: + +```sql +SELECT * FROM @my_parquet_stage; +``` + +Result: + +``` +┌───────────────────────────────────────┐ +│ id │ name │ age │ onboarded │ +├────────┼────────┼────────┼────────────┤ +│ 1 │ Name_0 │ 20 │ 2022-01-01 │ +│ 2 │ Name_5 │ 25 │ 2022-01-06 │ +│ 3 │ Name_1 │ 21 │ 2022-01-02 │ +│ 4 │ Name_6 │ 26 │ 2022-01-07 │ +│ 5 │ Name_7 │ 27 │ 2022-01-08 │ +│ 6 │ Name_2 │ 22 │ 2022-01-03 │ +│ 7 │ Name_8 │ 28 │ 2022-01-09 │ +│ 8 │ Name_3 │ 23 │ 2022-01-04 │ +│ 9 │ Name_4 │ 24 │ 2022-01-05 │ +│ 10 │ Name_9 │ 29 │ 2022-01-10 │ +└───────────────────────────────────────┘ +``` + +### Tutorial 1 - Loading a Subset of Data Columns + +Load data into a table with fewer columns than the source file (e.g., excluding 'age'). + +```sql +CREATE TABLE employees_no_age ( + id INT, + name VARCHAR, + onboarded timestamp +); + +COPY INTO employees_no_age +FROM ( + SELECT t.id, + t.name, + t.onboarded + FROM @my_parquet_stage t +) +FILE_FORMAT = (TYPE = PARQUET) +PATTERN = '.*parquet'; + +SELECT * FROM employees_no_age; +``` + +Result (first 3 rows): + +``` +┌──────────────────────────────────────────────────────────┐ +│ id │ name │ onboarded │ +├─────────────────┼──────────────────┼─────────────────────┤ +│ 1 │ Name_0 │ 2022-01-01 00:00:00 │ +│ 2 │ Name_5 │ 2022-01-06 00:00:00 │ +│ 3 │ Name_1 │ 2022-01-02 00:00:00 │ +└──────────────────────────────────────────────────────────┘ +``` + +### Tutorial 2 - Reordering Columns During Load + +Load data into a table with columns in a different order (e.g., 'age' before 'name'). + +```sql +CREATE TABLE employees_new_order ( + id INT, + age INT, + name VARCHAR, + onboarded timestamp +); + +COPY INTO employees_new_order +FROM ( + SELECT + t.id, + t.age, + t.name, + t.onboarded + FROM @my_parquet_stage t +) +FILE_FORMAT = (TYPE = PARQUET) +PATTERN = '.*parquet'; + +SELECT * FROM employees_new_order; +``` +Result (first 3 rows): + +``` +┌────────────────────────────────────────────────────────────────────────────┐ +│ id │ age │ name │ onboarded │ +├─────────────────┼─────────────────┼──────────────────┼─────────────────────┤ +│ 1 │ 20 │ Name_0 │ 2022-01-01 00:00:00 │ +│ 2 │ 25 │ Name_5 │ 2022-01-06 00:00:00 │ +│ 3 │ 21 │ Name_1 │ 2022-01-02 00:00:00 │ +└────────────────────────────────────────────────────────────────────────────┘ +``` + +### Tutorial 3 - Converting Datatypes During Load + +Load data and convert a column's datatype (e.g., 'onboarded' to `DATE`). 
+ +```sql +CREATE TABLE employees_date ( + id INT, + name VARCHAR, + age INT, + onboarded date +); + +COPY INTO employees_date +FROM ( + SELECT + t.id, + t.name, + t.age, + to_date(t.onboarded) + FROM @my_parquet_stage t +) +FILE_FORMAT = (TYPE = PARQUET) +PATTERN = '.*parquet'; + +SELECT * FROM employees_date; +``` +Result (first 3 rows): + +``` +┌───────────────────────────────────────────────────────────────────────┐ +│ id │ name │ age │ onboarded │ +├─────────────────┼──────────────────┼─────────────────┼────────────────┤ +│ 1 │ Name_0 │ 20 │ 2022-01-01 │ +│ 2 │ Name_5 │ 25 │ 2022-01-06 │ +│ 3 │ Name_1 │ 21 │ 2022-01-02 │ +└───────────────────────────────────────────────────────────────────────┘ +``` + +### Tutorial 4 - Performing Arithmetic Operations During Load + +Load data and perform arithmetic operations (e.g., increment 'age' by 1). + +```sql +CREATE TABLE employees_new_age ( + id INT, + name VARCHAR, + age INT, + onboarded timestamp +); + +COPY INTO employees_new_age +FROM ( + SELECT + t.id, + t.name, + t.age + 1, + t.onboarded + FROM @my_parquet_stage t +) +FILE_FORMAT = (TYPE = PARQUET) +PATTERN = '.*parquet'; + +SELECT * FROM employees_new_age; +``` +Result (first 3 rows): + +``` +┌────────────────────────────────────────────────────────────────────────────┐ +│ id │ name │ age │ onboarded │ +├─────────────────┼──────────────────┼─────────────────┼─────────────────────┤ +│ 1 │ Name_0 │ 21 │ 2022-01-01 00:00:00 │ +│ 2 │ Name_5 │ 26 │ 2022-01-06 00:00:00 │ +│ 3 │ Name_1 │ 22 │ 2022-01-02 00:00:00 │ +└────────────────────────────────────────────────────────────────────────────┘ +``` + +### Tutorial 5 - Loading to a Table with Additional Columns + +Load data into a table that has more columns than the source file. + +```sql +CREATE TABLE employees_plus ( + id INT, + name VARCHAR, + age INT, + onboarded timestamp, + lastday timestamp +); + +COPY INTO employees_plus (id, name, age, onboarded) +FROM ( + SELECT + t.id, + t.name, + t.age, + t.onboarded + FROM @my_parquet_stage t +) +FILE_FORMAT = (TYPE = PARQUET) +PATTERN = '.*parquet'; + +SELECT * FROM employees_plus; +``` +Result (first 3 rows): + +``` +┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ id │ name │ age │ onboarded │ lastday │ +├─────────────────┼──────────────────┼─────────────────┼─────────────────────┼─────────────────────┤ +│ 1 │ Name_0 │ 20 │ 2022-01-01 00:00:00 │ NULL │ +│ 2 │ Name_5 │ 25 │ 2022-01-06 00:00:00 │ NULL │ +│ 3 │ Name_1 │ 21 │ 2022-01-02 00:00:00 │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/guides/unload-csv-file.md b/tidb-cloud-lake/guides/unload-csv-file.md new file mode 100644 index 0000000000000..aa82c87675e3d --- /dev/null +++ b/tidb-cloud-lake/guides/unload-csv-file.md @@ -0,0 +1,93 @@ +--- +title: Unloading CSV File +--- + +## Unloading CSV File + +Syntax: + +```sql +COPY INTO { internalStage | externalStage | externalLocation } +FROM { [.] 
| ( ) } +FILE_FORMAT = ( + TYPE = CSV, + RECORD_DELIMITER = '', + FIELD_DELIMITER = '', + COMPRESSION = gzip, + OUTPUT_HEADER = true -- Unload with header +) +[MAX_FILE_SIZE = ] +[DETAILED_OUTPUT = true | false] +``` + +- More CSV options refer to [CSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#csv-options) +- Unloading into multiple files use the [MAX_FILE_SIZE Copy Option](/tidb-cloud-lake/sql/copy-into-location.md#copyoptions) +- More details about the syntax can be found in [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md) + +## Tutorial + +### Step 1. Create an External Stage + +```sql +CREATE STAGE csv_unload_stage +URL = 's3://unload/csv/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom CSV File Format + +```sql +CREATE FILE FORMAT csv_unload_format + TYPE = CSV, + RECORD_DELIMITER = '\n', + FIELD_DELIMITER = ',', + COMPRESSION = gzip, -- Unload with gzip compression + OUTPUT_HEADER = true, -- Unload with header + SKIP_HEADER = 1; -- Only for loading, skip first line when querying if the CSV file has header +``` + +### Step 3. Unload into CSV File + +```sql +COPY INTO @csv_unload_stage +FROM ( + SELECT * + FROM generate_series(1, 100) +) +FILE_FORMAT = (FORMAT_NAME = 'csv_unload_format') +DETAILED_OUTPUT = true; +``` + +Result: + +```text +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ file_name │ file_size │ row_count │ +├──────────────────────────────────────────────────────────────────┼───────────┼───────────┤ +│ data_c8382216-0a04-4920-9eca-7b5debe3eed6_0000_00000000.csv.gz │ 187 │ 100 │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4. Verify the Unloaded CSV Files + +```sql +SELECT COUNT($1) +FROM @csv_unload_stage +( + FILE_FORMAT => 'csv_unload_format', + PATTERN => '.*[.]csv[.]gz' +); +``` + +Result: + +```text +┌───────────┐ +│ count($1) │ +├───────────┤ +│ 100 │ +└───────────┘ +``` diff --git a/tidb-cloud-lake/guides/unload-data-from-databend.md b/tidb-cloud-lake/guides/unload-data-from-databend.md new file mode 100644 index 0000000000000..457ebfec204b5 --- /dev/null +++ b/tidb-cloud-lake/guides/unload-data-from-databend.md @@ -0,0 +1,25 @@ +--- +title: Unload Data from Databend +slug: /unload-data +--- + +Databend's `COPY INTO` command exports data to various file formats and storage locations with flexible formatting options. 
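At its core, an unload is a single `COPY INTO <location>` statement that writes the result of a query (or a whole table) to a stage or an external location in the format you choose. The statement below is a minimal sketch with placeholder names (`@my_stage`, `my_table`); the per-format pages listed below cover the full set of options.

```sql
-- Unload an entire table to a named stage as Parquet files
COPY INTO @my_stage
FROM my_table
FILE_FORMAT = (TYPE = PARQUET)
DETAILED_OUTPUT = true; -- report one row per file written
```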
## Supported File Formats

| Format | Example Syntax | Primary Use Case |
|--------|---------------|------------------|
| [**Unload Parquet File**](/guides/unload-data/unload-parquet) | `FILE_FORMAT = (TYPE = PARQUET)` | Analytics workloads, efficient storage |
| [**Unload CSV File**](/guides/unload-data/unload-csv) | `FILE_FORMAT = (TYPE = CSV)` | Data exchange, universal compatibility |
| [**Unload TSV File**](/guides/unload-data/unload-tsv) | `FILE_FORMAT = (TYPE = TSV)` | Tab-delimited tabular data |
| [**Unload NDJSON File**](/guides/unload-data/unload-ndjson) | `FILE_FORMAT = (TYPE = NDJSON)` | Semi-structured data, flexible schemas |

## Storage Destinations

| Destination | Example | When to Use |
|-------------|---------|-------------|
| **Named Stage** | `COPY INTO @my_stage FROM my_table` | For repeated exports to the same location |
| **S3-Compatible Storage** | `COPY INTO 's3://bucket/path/' FROM my_table` | Amazon S3 and S3-compatible object storage |

diff --git a/tidb-cloud-lake/guides/unload-ndjson-file.md b/tidb-cloud-lake/guides/unload-ndjson-file.md
new file mode 100644
index 0000000000000..d42f88c9f1874
--- /dev/null
+++ b/tidb-cloud-lake/guides/unload-ndjson-file.md
@@ -0,0 +1,87 @@
---
title: Unloading NDJSON File
---

## Unloading NDJSON File

Syntax:

```sql
COPY INTO { internalStage | externalStage | externalLocation }
FROM { [.] | ( ) }
FILE_FORMAT = (
    TYPE = NDJSON,
    COMPRESSION = gzip
)
[MAX_FILE_SIZE = ]
[DETAILED_OUTPUT = true | false]
```

- For more NDJSON options, see [NDJSON File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#ndjson-options)
- To unload into multiple files, use the [MAX_FILE_SIZE Copy Option](/tidb-cloud-lake/sql/copy-into-location.md#copyoptions)
- For more details about the syntax, see [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md)

## Tutorial

### Step 1. Create an External Stage

```sql
CREATE STAGE ndjson_unload_stage
URL = 's3://unload/ndjson/'
CONNECTION = (
    ACCESS_KEY_ID = ''
    SECRET_ACCESS_KEY = ''
);
```

### Step 2. Create Custom NDJSON File Format

```sql
CREATE FILE FORMAT ndjson_unload_format
    TYPE = NDJSON,
    COMPRESSION = gzip; -- Unload with gzip compression
```

### Step 3. Unload into NDJSON File

```sql
COPY INTO @ndjson_unload_stage
FROM (
    SELECT *
    FROM generate_series(1, 100)
)
FILE_FORMAT = (FORMAT_NAME = 'ndjson_unload_format')
DETAILED_OUTPUT = true;
```

Result:

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ file_name                                                           │ file_size │ row_count │
├─────────────────────────────────────────────────────────────────────┼───────────┼───────────┤
│ data_068976e5-2072-4ad8-9887-16fb9129ed80_0000_00000000.ndjson.gz   │       263 │       100 │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
```

### Step 4. 
Verify the Unloaded NDJSON Files + +```sql +SELECT COUNT($1) +FROM @ndjson_unload_stage +( + FILE_FORMAT => 'ndjson_unload_format', + PATTERN => '.*[.]ndjson[.]gz' +); +``` + +Result: + +```text +┌───────────┐ +│ count($1) │ +├───────────┤ +│ 100 │ +└───────────┘ +``` diff --git a/tidb-cloud-lake/guides/unload-parquet-file.md b/tidb-cloud-lake/guides/unload-parquet-file.md new file mode 100644 index 0000000000000..9d1fea0c4c8d4 --- /dev/null +++ b/tidb-cloud-lake/guides/unload-parquet-file.md @@ -0,0 +1,84 @@ +--- +title: Unloading Parquet File +--- + +## Unloading Parquet File + +Syntax: + +```sql +COPY INTO {internalStage | externalStage | externalLocation} +FROM { [.] | ( ) } +FILE_FORMAT = (TYPE = PARQUET) +[MAX_FILE_SIZE = ] +[DETAILED_OUTPUT = true | false] +``` + +- More Parquet options refer to [Parquet File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#parquet-options) +- Unloading into multiple files use the [MAX_FILE_SIZE Copy Option](/tidb-cloud-lake/sql/copy-into-location.md#copyoptions) +- More details about the syntax can be found in [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md) + +## Tutorial + +### Step 1. Create an External Stage + +```sql +CREATE STAGE parquet_unload_stage +URL = 's3://unload/parquet/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom Parquet File Format + +```sql +CREATE FILE FORMAT parquet_unload_format + TYPE = PARQUET + ; +``` + +### Step 3. Unload into Parquet File + +```sql +COPY INTO @parquet_unload_stage +FROM ( + SELECT * + FROM generate_series(1, 100) +) +FILE_FORMAT = (FORMAT_NAME = 'parquet_unload_format') +DETAILED_OUTPUT = true; +``` + +Result: + +```text +┌───────────────────────────────────────────────────────────────────────────────────────────┐ +│ file_name │ file_size │ row_count │ +│ String │ UInt64 │ UInt64 │ +├───────────────────────────────────────────────────────────────────┼───────────┼───────────┤ +│ data_a3760513-78a8-4a89-8f92-b1a17e0a61b6_0000_00000000.parquet │ 445 │ 100 │ +└───────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4. Verify the Unloaded Parquet Files + +```sql +SELECT COUNT($1) +FROM @parquet_unload_stage +( + FILE_FORMAT => 'parquet_unload_format', + PATTERN => '.*[.]parquet' +); +``` + +Result: + +```text +┌───────────┐ +│ count($1) │ +├───────────┤ +│ 100 │ +└───────────┘ +``` diff --git a/tidb-cloud-lake/guides/unload-tsv-file.md b/tidb-cloud-lake/guides/unload-tsv-file.md new file mode 100644 index 0000000000000..9527cf1e5a0de --- /dev/null +++ b/tidb-cloud-lake/guides/unload-tsv-file.md @@ -0,0 +1,89 @@ +--- +title: Unloading TSV File +--- + +## Unloading TSV File + +Syntax: + +```sql +COPY INTO { internalStage | externalStage | externalLocation } +FROM { [.] | ( ) } +FILE_FORMAT = ( + TYPE = TSV, + RECORD_DELIMITER = '', + FIELD_DELIMITER = '', + COMPRESSION = gzip, + OUTPUT_HEADER = true -- Unload with header +) +[MAX_FILE_SIZE = ] +[DETAILED_OUTPUT = true | false] +``` + +- More TSV options refer to [TSV File Format Options](/tidb-cloud-lake/sql/input-output-file-formats.md#tsv-options) +- Unloading into multiple files use the [MAX_FILE_SIZE Copy Option](/tidb-cloud-lake/sql/copy-into-location.md#copyoptions) +- More details about the syntax can be found in [COPY INTO location](/tidb-cloud-lake/sql/copy-into-location.md) + +## Tutorial + +### Step 1. 
Create an External Stage + +```sql +CREATE STAGE tsv_unload_stage +URL = 's3://unload/tsv/' +CONNECTION = ( + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' +); +``` + +### Step 2. Create Custom TSV File Format + +```sql +CREATE FILE FORMAT tsv_unload_format + TYPE = TSV, + COMPRESSION = gzip; -- Unload with gzip compression +``` + +### Step 3. Unload into TSV File + +```sql +COPY INTO @tsv_unload_stage +FROM ( + SELECT * + FROM generate_series(1, 100) +) +FILE_FORMAT = (FORMAT_NAME = 'tsv_unload_format') +DETAILED_OUTPUT = true; +``` + +Result: + +```text +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ file_name │ file_size │ row_count │ +├──────────────────────────────────────────────────────────────────┼───────────┼───────────┤ +│ data_99e8f5c8-79d6-43d8-80d7-13e3f4c91dd5_0002_00000000.tsv.gz │ 160 │ 100 │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Step 4. Verify the Unloaded TSV Files + +``` +SELECT COUNT($1) +FROM @tsv_unload_stage +( + FILE_FORMAT => 'tsv_unload_format', + PATTERN => '.*[.]tsv[.]gz' +); +``` + +Result: + +```text +┌───────────┐ +│ count($1) │ +├───────────┤ +│ 100 │ +└───────────┘ +``` diff --git a/tidb-cloud-lake/guides/upload-to-stage.md b/tidb-cloud-lake/guides/upload-to-stage.md new file mode 100644 index 0000000000000..fa607173384c1 --- /dev/null +++ b/tidb-cloud-lake/guides/upload-to-stage.md @@ -0,0 +1,401 @@ +--- +title: Uploading to Stage +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +Databend recommends two file upload methods for stages: [PRESIGN](/tidb-cloud-lake/sql/presign.md) and PUT/GET commands. These methods enable direct data transfer between the client and your storage, eliminating intermediaries and resulting in cost savings by reducing traffic between Databend and your storage. + +![Alt text](/img/load/staging-file.png) + +The PRESIGN method generates a time-limited URL with a signature, which clients can use to securely initiate file uploads. This URL grants temporary access to the designated stage, allowing clients to directly transfer data without relying on Databend servers for the entire process, enhancing both security and efficiency. + +If you're using [BendSQL](/tidb-cloud-lake/guides/bendsql.md) to manage files in a stage, you can use the PUT command for uploading files and the GET command for downloading files. + +- The GET command currently can only download all files in a stage, not individual ones. +- These commands are exclusive to BendSQL and the GET command will not function when Databend uses the file system as the storage backend. + +### Uploading with Presigned URL + +The following examples demonstrate how to upload a sample file ([books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet)) to the user stage, an internal stage, and an external stage with presigned URLs. + + + + + +```sql +PRESIGN UPLOAD @~/books.parquet; +``` + +Result: + +``` +┌────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Name │ Value │ +├────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ method │ PUT │ +│ headers│ {"host":"s3.us-east-2.amazonaws.com"} │ +│ url │ https://s3.us-east-2.amazonaws.com/databend-toronto/stage/user/root/books.parquet?X-Amz-Algorithm... 
│ +└────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +```shell +curl -X PUT -T books.parquet "https://s3.us-east-2.amazonaws.com/databend-toronto/stage/user/root/books.parquet?X-Amz-Algorithm=... ... +``` + +Check the staged file: + +```sql +LIST @~; +``` + +Result: + +``` +┌───────────────┬──────┬──────────────────────────────────────┬─────────────────────────────────┬─────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├───────────────┼──────┼──────────────────────────────────────┼─────────────────────────────────┼─────────┤ +│ books.parquet │ 998 │ 88432bf90aadb79073682988b39d461c │ 2023-06-27 16:03:51.000 +0000 │ │ +└───────────────┴──────┴──────────────────────────────────────┴─────────────────────────────────┴─────────┘ +``` + + + + + +```sql +CREATE STAGE my_internal_stage; +``` + +```sql +PRESIGN UPLOAD @my_internal_stage/books.parquet; +``` + +Result: + +``` +┌─────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Name │ Value │ +├─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ method │ PUT │ +│ headers │ {"host":"s3.us-east-2.amazonaws.com"} │ +│ url │ https://s3.us-east-2.amazonaws.com/databend-toronto/stage/internal/my_internal_stage/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230628%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230628T022951Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=9cfcdf3b3554280211f88629d60358c6d6e6a5e49cd83146f1daea7dfe37f5c1 │ +└─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +```shell +curl -X PUT -T books.parquet "https://s3.us-east-2.amazonaws.com/databend-toronto/stage/internal/my_internal_stage/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIASTQNLUZWP2UY2HSN%2F20230628%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20230628T022951Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=9cfcdf3b3554280211f88629d60358c6d6e6a5e49cd83146f1daea7dfe37f5c1" +``` + +Check the staged file: + +```sql +LIST @my_internal_stage; +``` + +Result: + +``` +┌──────────────────────────────────┬───────┬──────────────────────────────────────┬─────────────────────────────────┬─────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├──────────────────────────────────┼───────┼──────────────────────────────────────┼─────────────────────────────────┼─────────┤ +│ books.parquet │ 998 │ "88432bf90aadb79073682988b39d461c" │ 2023-06-28 02:32:15.000 +0000 │ │ 
+└──────────────────────────────────┴───────┴──────────────────────────────────────┴─────────────────────────────────┴─────────┘ +``` + + + + +```sql +CREATE STAGE my_external_stage +URL = 's3://databend' +CONNECTION = ( + ENDPOINT_URL = 'http://127.0.0.1:9000', + ACCESS_KEY_ID = 'ROOTUSER', + SECRET_ACCESS_KEY = 'CHANGEME123' +); +``` + +```sql +PRESIGN UPLOAD @my_external_stage/books.parquet; +``` + +Result: + +```` +┌─────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Name │ Value │ +├─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ method │ PUT │ +│ headers │ {"host":"127.0.0.1:9000"} │ +│ url │ http://127.0.0.1:9000/databend/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ROOTUSER%2F20230628%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230628T040959Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature= │ +└─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +```shell +curl -X PUT -T books.parquet "http://127.0.0.1:9000/databend/books.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ROOTUSER%2F20230628%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230628T040959Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=" +```` + +Check the staged file: + +```sql +LIST @my_external_stage; +``` + +Result: + +``` +┌───────────────┬──────┬──────────────────────────────────────┬─────────────────────────────────┬─────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├───────────────┼──────┼──────────────────────────────────────┼─────────────────────────────────┼─────────┤ +│ books.parquet │ 998 │ "88432bf90aadb79073682988b39d461c" │ 2023-06-28 04:13:15.178 +0000 │ │ +└───────────────┴──────┴──────────────────────────────────────┴─────────────────────────────────┴─────────┘ +``` + + + + +### Uploading with PUT Command + +The following examples demonstrate how to use BendSQL to upload a sample file ([books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet)) to the user stage, an internal stage, and an external stage with the PUT command. + + + + + +```sql +PUT fs:///Users/eric/Documents/books.parquet @~ +``` + +Result: + +``` +┌───────────────────────────────────────────────┐ +│ file │ status │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +Check the staged file: + +```sql +LIST @~; +``` + +Result: + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ ··· │ last_modified │ creator │ +├───────────────┼────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ 998 │ ... │ 2023-09-04 03:27:... 
│ NULL │ +└────────────────────────────────────────────────────────────────────────┘ +``` + + + + + +```sql +CREATE STAGE my_internal_stage; +``` + +```sql +PUT fs:///Users/eric/Documents/books.parquet @my_internal_stage; +``` + +Result: + +``` +┌───────────────────────────────────────────────┐ +│ file │ status │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +Check the staged file: + +```sql +LIST @my_internal_stage; +``` + +Result: + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ ··· │ last_modified │ creator │ +├───────────────┼────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ 998 │ ... │ 2023-09-04 03:32:... │ NULL │ +└────────────────────────────────────────────────────────────────────────┘ +``` + + + + +``` +CREATE STAGE my_external_stage + URL = 's3://databend' + CONNECTION = ( + ENDPOINT_URL = 'http://127.0.0.1:9000', + ACCESS_KEY_ID = 'ROOTUSER', + SECRET_ACCESS_KEY = 'CHANGEME123' + ); +``` + +```sql +PUT fs:///Users/eric/Documents/books.parquet @my_external_stage +``` + +Result: + +``` +┌───────────────────────────────────────────────┐ +│ file │ status │ +├─────────────────────────────────────┼─────────┤ +│ /Users/eric/Documents/books.parquet │ SUCCESS │ +└───────────────────────────────────────────────┘ +``` + +Check the staged file: + +```sql +LIST @my_external_stage; +``` + +Result: + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ name │ ··· │ last_modified │ creator │ +├──────────────────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ ... │ 2023-09-04 03:37:... │ NULL │ +└──────────────────────────────────────────────────────────────────────┘ +``` + + + + +### Uploading a Directory with PUT Command + +You can also upload multiple files from a directory using the PUT command with wildcards. This is useful when you need to stage a large number of files at once. + +```sql +PUT fs:///home/ubuntu/datas/event_data/*.parquet @your_stage; +``` + +Result: + +``` +┌───────────────────────────────────────────────────────┐ +│ file │status │ +├─────────────────────────────────────────────┼─────────┤ +│ /home/ubuntu/datas/event_data/file1.parquet │ SUCCESS │ +│ /home/ubuntu/datas/event_data/file2.parquet │ SUCCESS │ +│ /home/ubuntu/datas/event_data/file3.parquet │ SUCCESS │ +└───────────────────────────────────────────────────────┘ +``` + +### Downloading with GET Command + +The following examples demonstrate how to use BendSQL to download a sample file ([books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet)) from the user stage, an internal stage, and an external stage with the GET command. + + + + + +```sql +LIST @~; +``` + +Result: + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ ··· │ last_modified │ creator │ +├───────────────┼────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ 998 │ ... │ 2023-09-04 03:27:... 
│ NULL │ +└────────────────────────────────────────────────────────────────────────┘ +``` + +```sql +GET @~/ fs:///Users/eric/Downloads/fromStage/; +``` + +Result: + +``` +┌─────────────────────────────────────────────────────────┐ +│ file │ status │ +├───────────────────────────────────────────────┼─────────┤ +│ /Users/eric/Downloads/fromStage/books.parquet │ SUCCESS │ +└─────────────────────────────────────────────────────────┘ +``` + + + + + +```sql +LIST @my_internal_stage; +``` + +Result: + +``` +┌────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ ··· │ last_modified │ creator │ +├───────────────┼────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ 998 │ ... │ 2023-09-04 03:32:... │ NULL │ +└────────────────────────────────────────────────────────────────────────┘ +``` + +```sql +GET @my_internal_stage/ fs:///Users/eric/Downloads/fromStage/; +``` + +Result: + +``` +┌─────────────────────────────────────────────────────────┐ +│ file │ status │ +├───────────────────────────────────────────────┼─────────┤ +│ /Users/eric/Downloads/fromStage/books.parquet │ SUCCESS │ +└─────────────────────────────────────────────────────────┘ +``` + + + + +```sql + +LIST @my_external_stage; + +``` + +Result: + +``` +┌──────────────────────────────────────────────────────────────────────┐ +│ name │ ··· │ last_modified │ creator │ +├──────────────────────┼─────┼──────────────────────┼──────────────────┤ +│ books.parquet │ ... │ 2023-09-04 03:37:... │ NULL │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +```sql +GET @my_external_stage/ fs:///Users/eric/Downloads/fromStage/; +``` + +Result: + +``` +┌─────────────────────────────────────────────────────────┐ +│ file │ status │ +├───────────────────────────────────────────────┼─────────┤ +│ /Users/eric/Downloads/fromStage/books.parquet │ SUCCESS │ +└─────────────────────────────────────────────────────────┘ +``` + + + diff --git a/tidb-cloud-lake/guides/vector-search.md b/tidb-cloud-lake/guides/vector-search.md new file mode 100644 index 0000000000000..93e34e54febea --- /dev/null +++ b/tidb-cloud-lake/guides/vector-search.md @@ -0,0 +1,182 @@ +--- +title: Vector Search +--- + +> **Scenario:** CityDrive keeps embeddings for every frame directly in Databend. These vector embeddings are the result of AI models inferencing on video keyframes to capture visual semantic features. Semantic similarity search ("find frames that look like this") can run alongside traditional SQL analytics—no separate vector service required. + +The `frame_embeddings` table shares the same `frame_id` keys as `frame_events`, `frame_metadata_catalog`, and `frame_geo_points`, which keeps semantic search and classic SQL glued together. + +## 1. Prepare the Embedding Table +Production models tend to emit 512–1536 dimensions. The example below uses 512 so you can copy it straight into a demo cluster without changing the DDL. + +```sql +CREATE OR REPLACE TABLE frame_embeddings ( + frame_id STRING, + video_id STRING, + sensor_view STRING, + embedding VECTOR(512), + encoder_build STRING, + created_at TIMESTAMP, + VECTOR INDEX idx_frame_embeddings(embedding) distance='cosine' +); + +-- SQL UDF: build 512 dims via ARRAY_AGG + window frame; tutorial placeholder only. 
+CREATE OR REPLACE FUNCTION demo_random_vector(seed STRING) +RETURNS TABLE(embedding VECTOR(512)) +AS $$ +SELECT CAST( + ARRAY_AGG(rand_val) OVER ( + PARTITION BY seed + ORDER BY seq + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + ) + AS VECTOR(512) + ) AS embedding +FROM ( + SELECT seed, + dims.number AS seq, + (RAND() * 0.2 - 0.1)::FLOAT AS rand_val + FROM numbers(512) AS dims +) vals +QUALIFY ROW_NUMBER() OVER (PARTITION BY seed ORDER BY seq) = 1; +$$; + +INSERT INTO frame_embeddings (frame_id, video_id, sensor_view, embedding, encoder_build, created_at) +SELECT 'FRAME-0101', 'VID-20250101-001', 'roof_cam', embedding, 'clip-lite-v1', '2025-01-01 08:15:21' +FROM demo_random_vector('FRAME-0101') +UNION ALL +SELECT 'FRAME-0102', 'VID-20250101-001', 'roof_cam', embedding, 'clip-lite-v1', '2025-01-01 08:33:54' +FROM demo_random_vector('FRAME-0102') +UNION ALL +SELECT 'FRAME-0201', 'VID-20250101-002', 'front_cam', embedding, 'night-fusion-v2', '2025-01-01 11:12:02' +FROM demo_random_vector('FRAME-0201') +UNION ALL +SELECT 'FRAME-0401', 'VID-20250103-001', 'rear_cam', embedding, 'night-fusion-v2', '2025-01-03 21:18:07' +FROM demo_random_vector('FRAME-0401'); +``` + +> This array generator is just to keep the tutorial self-contained. Replace it with real embeddings from your model in production. + +If you haven’t run the SQL Analytics guide yet, create the supporting `frame_events` table and seed the same sample rows the vector walkthrough joins against: + +```sql +CREATE OR REPLACE TABLE frame_events ( + frame_id STRING, + video_id STRING, + frame_index INT, + collected_at TIMESTAMP, + event_tag STRING, + risk_score DOUBLE, + speed_kmh DOUBLE +); + +INSERT INTO frame_events VALUES + ('FRAME-0101', 'VID-20250101-001', 125, '2025-01-01 08:15:21', 'hard_brake', 0.81, 32.4), + ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8), + ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1), + ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6), + ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9), + ('FRAME-0501', 'VID-MISSING-001', 10, '2025-01-04 10:00:00', 'sensor_fault', 0.25, 15.0); +``` + +Docs: [Vector type](/tidb-cloud-lake/sql/vector.md) and [Vector index](/tidb-cloud-lake/sql/vector.md#vector-indexing). + +--- + +## 2. Run Cosine Search +Pull the embedding from one frame and let the HNSW index return the closest neighbours. + +```sql +WITH query_embedding AS ( + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0101' +) +SELECT e.frame_id, + e.video_id, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance +FROM frame_embeddings AS e +CROSS JOIN query_embedding AS q +ORDER BY distance +LIMIT 3; +``` + +Sample output: + +``` +frame_id | video_id | distance +FRAME-0101| VID-20250101-001 | 0.0000 +FRAME-0201| VID-20250101-002 | 0.9801 +FRAME-0102| VID-20250101-001 | 0.9842 +``` + +Lower distance = more similar. The `VECTOR INDEX` keeps latency low even with millions of frames. + +Add traditional predicates (route, video, sensor view) before or after the vector comparison to narrow the candidate set. 
+ +```sql +WITH query_embedding AS ( + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0201' +) +SELECT e.frame_id, + e.sensor_view, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance +FROM frame_embeddings AS e +CROSS JOIN query_embedding AS q +WHERE e.sensor_view = 'rear_cam' +ORDER BY distance +LIMIT 5; +``` + +Sample output: + +``` +frame_id | sensor_view | distance +FRAME-0401| rear_cam | 1.0537 +``` + +The optimizer still uses the vector index while honoring the `sensor_view` filter. + +--- + +## 3. Enrich Similar Frames +Materialize the top matches, then enrich them with `frame_events` for downstream analytics. + +```sql +WITH query_embedding AS ( + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0102' + ), + similar_frames AS ( + SELECT frame_id, + video_id, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance + FROM frame_embeddings e + CROSS JOIN query_embedding q + ORDER BY distance + LIMIT 5 + ) +SELECT sf.frame_id, + sf.video_id, + fe.event_tag, + fe.risk_score, + sf.distance +FROM similar_frames sf +LEFT JOIN frame_events fe USING (frame_id) +ORDER BY sf.distance; +``` + +Sample output: + +``` +frame_id | video_id | event_tag | risk_score | distance +FRAME-0102| VID-20250101-001 | pedestrian | 0.67 | 0.0000 +FRAME-0201| VID-20250101-002 | lane_merge | 0.74 | 0.9802 +FRAME-0101| VID-20250101-001 | hard_brake | 0.81 | 0.9842 +FRAME-0401| VID-20250103-001 | night_lowlight | 0.63 | 1.0020 +``` + +Because the embeddings live next to relational tables, you can pivot from “frames that look alike” to “frames that also had `hard_brake` tags, specific weather, or JSON detections” without exporting data to another service. diff --git a/tidb-cloud-lake/guides/virtual-column.md b/tidb-cloud-lake/guides/virtual-column.md new file mode 100644 index 0000000000000..a8938ad529171 --- /dev/null +++ b/tidb-cloud-lake/guides/virtual-column.md @@ -0,0 +1,164 @@ +--- +title: Virtual Column +--- + +# Virtual Column: Automatic Acceleration for JSON Data + +import EEFeature from '@site/src/components/EEFeature'; + + + + +Virtual columns automatically accelerate queries on semi-structured data stored in [VARIANT](/tidb-cloud-lake/sql/variant.md) columns. This feature provides **zero-configuration performance optimization** for JSON data access. + +## What Problem Does It Solve? + +When querying JSON data, traditional databases must parse the entire JSON structure every time you access a nested field. This creates performance bottlenecks: + +| Problem | Impact | Virtual Column Solution | +|---------|--------|------------------------| +| **Query Latency** | Complex JSON queries take seconds | Sub-second response times | +| **Excessive Data Reading** | Must read entire JSON documents even for single fields | Read only the specific fields needed | +| **Slow JSON Parsing** | Every query re-parses entire JSON documents | Pre-materialized fields for instant access | +| **High CPU Usage** | JSON traversal consumes processing power | Direct column reads like regular data | +| **Memory Overhead** | Loading full JSON structures into memory | Only load needed fields | + +**Example Scenario**: An e-commerce analytics table with product data in JSON format. Without virtual columns, querying `product_data['category']` across millions of rows requires parsing every JSON document. With virtual columns, it becomes a direct column lookup. + +## How It Works Automatically + +1. **Data Ingestion** → Databend analyzes JSON structure in VARIANT columns +2. 
**Smart Detection** → System identifies frequently accessed nested fields +3. **Background Optimization** → Virtual columns are created automatically +4. **Query Acceleration** → Queries automatically use optimized paths + +![Virtual Column Workflow](/img/sql/virtual-column.png) + +## Configuration + +Virtual columns are enabled by default starting from v1.2.832 and require no additional configuration. + +## Complete Example + +This example demonstrates automatic virtual column creation and performance benefits: + +```sql +-- Create a table named 'test' with columns 'id' and 'val' of type Variant. +CREATE TABLE test(id int, val variant); + +-- Insert sample records into the 'test' table with Variant data. +INSERT INTO + test +VALUES + ( + 1, + '{"id":1,"name":"databend","tags":["powerful","fast"],"pricings":[{"type":"Standard","price":"Pay as you go"},{"type":"Enterprise","price":"Custom"}]}' + ), + ( + 2, + '{"id":2,"name":"databricks","tags":["scalable","flexible"],"pricings":[{"type":"Free","price":"Trial"},{"type":"Premium","price":"Subscription"}]}' + ), + ( + 3, + '{"id":3,"name":"snowflake","tags":["cloud-native","secure"],"pricings":[{"type":"Basic","price":"Pay per second"},{"type":"Enterprise","price":"Annual"}]}' + ), + ( + 4, + '{"id":4,"name":"redshift","tags":["reliable","scalable"],"pricings":[{"type":"On-Demand","price":"Pay per usage"},{"type":"Reserved","price":"1 year contract"}]}' + ), + ( + 5, + '{"id":5,"name":"bigquery","tags":["innovative","cost-efficient"],"pricings":[{"type":"Flat Rate","price":"Monthly"},{"type":"Flex","price":"Per query"}]}' + ); + +INSERT INTO test SELECT * FROM test; +INSERT INTO test SELECT * FROM test; +INSERT INTO test SELECT * FROM test; +INSERT INTO test SELECT * FROM test; +INSERT INTO test SELECT * FROM test; + +-- Explain the query execution plan for selecting specific fields from the table. +EXPLAIN +SELECT + val ['name'], + val ['tags'] [0], + val ['pricings'] [0] ['type'] +FROM + test; + +-[ EXPLAIN ]----------------------------------- +Exchange +├── output columns: [test.val['name'] (#3), test.val['pricings'][0]['type'] (#5), test.val['tags'][0] (#8)] +├── exchange type: Merge +└── TableScan + ├── table: default.default.test + ├── output columns: [val['name'] (#3), val['pricings'][0]['type'] (#5), val['tags'][0] (#8)] + ├── read rows: 160 + ├── read size: 1.69 KiB + ├── partitions total: 6 + ├── partitions scanned: 6 + ├── pruning stats: [segments: , blocks: ] + ├── push downs: [filters: [], limit: NONE] + ├── virtual columns: [val['name'], val['pricings'][0]['type'], val['tags'][0]] + └── estimated rows: 160.00 + +-- Explain the query execution plan for selecting only the 'name' field from the table. +EXPLAIN +SELECT + val ['name'] +FROM + test; + +-[ EXPLAIN ]----------------------------------- +Exchange +├── output columns: [test.val['name'] (#2)] +├── exchange type: Merge +└── TableScan + ├── table: default.book_db.test + ├── output columns: [val['name'] (#2)] + ├── read rows: 160 + ├── read size: < 1 KiB + ├── partitions total: 16 + ├── partitions scanned: 16 + ├── pruning stats: [segments: , blocks: ] + ├── push downs: [filters: [], limit: NONE] + ├── virtual columns: [val['name']] + └── estimated rows: 160.00 + +-- Display all the auto generated virtual columns. 
+SHOW VIRTUAL COLUMNS WHERE table='test'; + +╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ database │ table │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │ +│ String │ String │ String │ UInt32 │ String │ String │ +├──────────┼────────┼───────────────┼───────────────────┼──────────────────────────┼─────────────────────┤ +│ default │ test │ val │ 3000000000 │ ['id'] │ UInt64 │ +│ default │ test │ val │ 3000000001 │ ['name'] │ String │ +│ default │ test │ val │ 3000000002 │ ['pricings'][0]['price'] │ String │ +│ default │ test │ val │ 3000000003 │ ['pricings'][0]['type'] │ String │ +│ default │ test │ val │ 3000000004 │ ['pricings'][1]['price'] │ String │ +│ default │ test │ val │ 3000000005 │ ['pricings'][1]['type'] │ String │ +│ default │ test │ val │ 3000000006 │ ['tags'][0] │ String │ +│ default │ test │ val │ 3000000007 │ ['tags'][1] │ String │ +╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` + +## Monitoring Commands + +| Command | Purpose | +|---------|---------| +| [`SHOW VIRTUAL COLUMNS`](/tidb-cloud-lake/sql/show-virtual-columns.md) | View automatically created virtual columns | +| [`REFRESH VIRTUAL COLUMN`](/tidb-cloud-lake/sql/refresh-virtual-column.md) | Manually refresh virtual columns | +| [`FUSE_VIRTUAL_COLUMN`](/tidb-cloud-lake/sql/fuse-virtual-column.md) | View virtual column metadata | + +## Performance Results + +Virtual columns typically provide: +- **5-10x faster** JSON field access +- **Automatic optimization** without query changes +- **Reduced resource consumption** during query processing +- **Transparent acceleration** for existing applications + +--- + +*Virtual columns work automatically in the background—Databend optimizes your JSON queries with zero configuration.* diff --git a/tidb-cloud-lake/guides/warehouse.md b/tidb-cloud-lake/guides/warehouse.md new file mode 100644 index 0000000000000..2a0616e3f237e --- /dev/null +++ b/tidb-cloud-lake/guides/warehouse.md @@ -0,0 +1,209 @@ +--- +title: Warehouses +--- + +import PlaySVG from '@site/static/img/icon/play.svg' +import SuspendSVG from '@site/static/img/icon/suspend.svg' +import CheckboxSVG from '@site/static/img/icon/checkbox.svg' +import EllipsisSVG from '@site/static/img/icon/ellipsis.svg' +import { Button } from 'antd' + +The warehouse is an essential component of Databend Cloud. A warehouse represents a set of compute resources including CPU, memory, and local caches. You must run a warehouse to perform SQL tasks such as: + +- Querying data with the SELECT statement +- Modifying data with the INSERT, UPDATE, or DELETE statement +- Loading data into a table with the COPY INTO command + +Running a warehouse incurs expenses. For more information, see [Warehouse Pricing](/guides/cloud/overview/pricing#warehouse-pricing). + +## Warehouse Sizes + +In Databend Cloud, warehouses are available in various sizes, each defined by the maximum number of concurrent queries it can handle. When creating a warehouse, you can choose from the following sizes: + +| Size | Recommended Use Cases | +| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | +| XSmall | Best for simple tasks like testing or running light queries. Suitable for small datasets (around 50GB). | +| Small | Great for running regular reports and moderate workloads. 
Suitable for medium-sized datasets (around 200GB). | +| Medium | Ideal for teams handling more complex queries and higher concurrency. Suitable for larger datasets (around 1TB). | +| Large | Perfect for organizations running many concurrent queries. Suitable for large datasets (around 5TB). | +| XLarge | Built for enterprise-scale workloads with high concurrency. Suitable for very large datasets (over 10TB). | +| nXLarge | Available for n=2,3,4,5,6. [Contact Us](https://www.databend.com/contact-us/) | +| Multi-Cluster Scaling | Automatically scales out and scales in to match your workload, providing the most cost-efficient way to improve concurrency based on your needs. | + +To choose the appropriate warehouse size, Databend recommends starting with a smaller size. Smaller warehouses may take longer to execute SQL tasks compared to medium or large ones. If you find that query execution is taking too long (for example, several minutes), consider scaling up to a medium or large warehouse for faster results. + +## Managing Warehouses {#managing} + +An organization can have as many warehouses as needed. The **Warehouses** page displays all the warehouses in your organization and allows you to manage them. Please note that only `account_admin` can create or delete a warehouse. + +### Suspending / Resuming Warehouses + +A suspended warehouse does not consume any credits. You can manually suspend or resume a warehouse by clicking the suspend or resume button on the warehouse. However, a warehouse can automatically suspend or resume in the following scenarios: + +- A warehouse can automatically suspend if there is no activity, based on its auto-suspend setting. +- When you select a suspended warehouse to perform a SQL task, the warehouse will automatically resume. + +### Performing Bulk Operations + +You can perform bulk operations on warehouses, including bulk restart, bulk suspend, bulk resume, and bulk delete. To do so, select the warehouses for bulk operations by checking the checkboxes in the warehouse list, and then click the ellipsis button for the desired operation. + +![alt text](@site/static/img/cloud/bulk.gif) + +### Best Practices + +To effectively manage your warehouses and ensure optimal performance and cost-efficiency, consider the following best practices. These guidelines will help you size, organize, and fine-tune your warehouses for various workloads and environments: + +- **Choose the Right Size** + + - For **development & testing**, use smaller warehouses (XSmall, Small). + - For **production**, opt for larger warehouses (Medium, Large, XLarge). + +- **Separate Warehouses** + + - Use separate warehouses for **data loading** and **query execution**. + - Create distinct warehouses for **development**, **testing**, and **production** environments. + +- **Data Loading Tips** + + - Smaller warehouses (Small, Medium) are suitable for data loading. + - Optimize file size and the number of files to enhance performance. + +- **Optimize for Cost & Performance** + + - Avoid running simple queries like `SELECT 1` to minimize credit usage. + - Use bulk loading (`COPY`) rather than individual `INSERT` statements. + - Monitor long-running queries and optimize them to improve performance. + +- **Auto-Suspend** + + - Enable auto-suspend to save credits when the warehouse is idle. + +- **Disable Auto-Suspend for Frequent Queries** + + - Keep warehouses active for frequent or repetitive queries to maintain cache and avoid delays.
+ +- **Use Auto-Scaling (Business & Dedicated Plans Only)** + + - Multi-cluster scaling automatically adjusts resources based on workload demand. + +- **Monitor & Adjust Usage** + - Regularly review warehouse usage and resize as needed to balance cost and performance. + +## Warehouse Access Control + +Databend Cloud allows you to manage warehouse access with role-based controls by assigning a specific role to a warehouse, so only users with that role can access the warehouse. + +:::note +Warehouse access control is _not_ enabled out of the box. To enable it, go to **Support** > **Create New Ticket** and submit a request. +::: + +To assign a role to a warehouse, select the desired role in the **Advanced Options** during the warehouse creation or modification process: + +![alt text](@site/static/img/documents/warehouses/warehouse-role.png) + +- The two [Built-in Roles](/tidb-cloud-lake/guides/roles.md#built-in-roles) are available for selection, and you can also create additional roles using the [CREATE ROLE](/tidb-cloud-lake/sql/create-role.md) command. For more information about Databend roles, see [Roles](/tidb-cloud-lake/guides/roles.md). +- Warehouses without an assigned role default to the `public` role, allowing access to all users. +- You can grant a role to a user (Databend Cloud login email or SQL user) using the [GRANT](/tidb-cloud-lake/sql/grant.md) command, or, alternatively, assign a role when inviting the user to your organization. For more information, see [Inviting New Members](organization.md#inviting-new-members). This example grants the role `manager` to the user with the email `name@example.com`, allowing access to any warehouse assigned to the `manager` role: + + ```sql title='Examples:' + GRANT ROLE manager to 'name@example.com'; + ``` + +## Multi-Cluster Warehouses + +A multi-cluster warehouse automatically adjusts compute resources by adding or removing clusters based on workload demand. It ensures high concurrency and performance while optimizing cost by scaling up or down as needed. + +:::note +Multi-Cluster is only available for Databend Cloud users on the Business and Dedicated plans. +::: + +### How it Works + +By default, a warehouse consists of a single cluster of compute resources, which can handle a maximum number of concurrent queries depending on its size. When Multi-Cluster is enabled for a warehouse, it allows multiple clusters (as defined by the **Max Clusters** setting) to be dynamically added to handle workloads that exceed the capacity of a single cluster. + +When the number of concurrent queries exceeds the capacity of your warehouse, an additional cluster is added to handle the extra load. If the demand continues to grow, more clusters are added one by one. As query demand decreases, clusters with no activity for longer than the **Auto Suspend** duration are automatically shut down. + +![alt text](@site/static/img/cloud/multi-cluster-how-it-works.png) + +### Enabling Multi-Cluster + +You can enable Multi-Cluster for a warehouse when you create it and set the maximum number of clusters that the warehouse can scale up to. Please note that if Multi-Cluster is enabled for a warehouse, the **Auto Suspend** duration must be set to at least 15 minutes. + +![alt text](@site/static/img/cloud/multi-cluster.png) + +### Cost Calculation + +Multi-Cluster Warehouses are billed based on the number of active clusters used during specific time intervals. 
+ +For example, for an XSmall Warehouse priced at $1 per hour, if one cluster is actively used from 13:00 to 14:00 and two clusters are actively used from 14:00 to 15:00, the total cost incurred from 13:00 to 15:00 is $3 ((1 cluster × 1 hour × $1) + (2 clusters × 1 hour × $1)). + +## Connecting to a Warehouse {#connecting} + +Connecting to a warehouse provides the compute resources required to run queries and analyze data within Databend Cloud. This connection is necessary when accessing Databend Cloud from your applications or SQL clients. + +### Connection Methods + +Databend Cloud supports multiple connection methods to meet your specific needs. For detailed connection instructions, see the [SQL Clients documentation](/guides/connect). + +#### SQL Clients & Tools + +| Client | Type | Best For | Key Features | +| ------------------------------------------ | --------------- | ----------------------------- | ----------------------------------------------------- | +| **[BendSQL](/tidb-cloud-lake/guides/bendsql.md)** | Command Line | Developers, Scripts | Native CLI, Rich formatting, Multiple install options | +| **[DBeaver](/guides/connect/sql-clients/jdbc)** | GUI Application | Data Analysis, Visual Queries | Built-in driver, Cross-platform, Query builder | + +#### Developer Drivers + +| Language | Driver | Use Case | Documentation | +| ----------- | ----------------- | ----------------------- | ------------------------------------------------------ | +| **Go** | Golang Driver | Backend Applications | [Golang Guide](/guides/connect/drivers/golang) | +| **Python** | Python Connector | Data Science, Analytics | [Python Guide](/guides/connect/drivers/python) | +| **Node.js** | JavaScript Driver | Web Applications | [Node.js Guide](/guides/connect/drivers/nodejs) | +| **Java** | JDBC Driver | Enterprise Applications | [JDBC Guide](/guides/connect/drivers/java) | +| **Rust** | Rust Driver | System Programming | [Rust Guide](/guides/connect/drivers/rust) | + +### Obtaining Connection Information + +To obtain the connection information for a warehouse: + +1. Click **Connect** on the **Overview** page. +2. Select the database and warehouse you wish to connect to. The connection information will update based on your selection. +3. The connection details include a SQL user named `cloudapp` with a randomly generated password. Databend Cloud does not store this password. Be sure to copy and save it securely. If you forget the password, click **Reset** to generate a new one. 
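Once you have these details, you can plug them into any client. The following is a minimal sketch of connecting with BendSQL by exporting the generated DSN as an environment variable; the placeholder values are illustrative and follow the connection string format described in the next section:

```bash
# Set the DSN generated by Databend Cloud (placeholder values are illustrative)
export BENDSQL_DSN="databend://cloudapp:<password>@<tenant>.gw.<region>.default.databend.com:443/<database>?warehouse=<warehouse>"

# Start an interactive session using the DSN from the environment
bendsql
```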
+ +![alt text](@site/static/img/documents/warehouses/databend_cloud_dsn.gif) + +### Connection String Format + +Databend Cloud automatically generates your connection string when you click **Connect**: + +``` +databend://<user>:<password>@<tenant>.gw.<region>.default.databend.com:443/<database>?warehouse=<warehouse> +``` + +Where: + +- `<user>`: Default is `cloudapp` +- `<password>`: Click **Reset** to view or change +- `<tenant>`, `<region>`: Your account information (shown in the connection details) +- `<database>`: Selected database (shown in the connection details) +- `<warehouse>`: Selected warehouse (shown in the connection details) + +### Creating SQL Users for Warehouse Access + +Besides the default `cloudapp` user, you can create additional SQL users for better security and access control: + +```sql +-- Create a role with database access +CREATE ROLE warehouse_user1_role; +GRANT ALL ON my_database.* TO ROLE warehouse_user1_role; + +-- Create a new SQL user and assign the role +CREATE USER warehouse_user1 IDENTIFIED BY 'StrongPassword123' WITH DEFAULT_ROLE = 'warehouse_user1_role'; +GRANT ROLE warehouse_user1_role TO warehouse_user1; +``` + +For more details, see the [CREATE USER](/tidb-cloud-lake/sql/create-user.md) and [GRANT](/tidb-cloud-lake/sql/grant.md) documentation. + +### Connection Security + +All connections to Databend Cloud warehouses use TLS encryption by default. For enterprise users requiring additional security, [AWS PrivateLink](/guides/cloud/security/private-link) is available to establish private connections between your VPC and Databend Cloud. diff --git a/tidb-cloud-lake/guides/worksheet.md b/tidb-cloud-lake/guides/worksheet.md new file mode 100644 index 0000000000000..97d5d99e9eb24 --- /dev/null +++ b/tidb-cloud-lake/guides/worksheet.md @@ -0,0 +1,58 @@ +--- +title: Worksheets +--- + +import DbSVG from '@site/static/img/icon/database.svg' +import RoleSVG from '@site/static/img/icon/role.svg' +import WarehouseSVG from '@site/static/img/icon/warehouse.svg' +import EllipsisSVG from '@site/static/img/icon/ellipsis.svg' + +Worksheets in Databend Cloud are used to organize, run, and save SQL statements. They can also be shared with others in your organization. + +## Creating a Worksheet + +To create a new worksheet, click on **Worksheets** in the sidebar and select **New Worksheet**. + +If your SQL statements are already saved in an SQL file, you can also create a worksheet directly from the file. To do so, click the ellipsis icon to the right of **New Worksheet**, then select **Create from SQL File**. + +## Editing and Running SQL Statements + +To edit and run an SQL statement: + +1. Click on the database icon above the SQL editor and select the database you want to query. +2. Click on the user icon above the SQL editor and choose a role to use. The dropdown list will display all the roles you have been granted, along with any child roles under your roles in the hierarchy. For more information about the role hierarchy, see [Inheriting Roles & Establishing Hierarchy](/tidb-cloud-lake/guides/roles.md#inheriting-roles--establishing-hierarchy). + +3. Edit the SQL statement in the SQL editor. +4. Click on the warehouse icon under the SQL editor and select a warehouse from the list. +5. Click **Run Script**. + +The query result shows in the output area. You can click **Export** to save the whole result to a CSV file, or select one or multiple cells in the output area and press Command + C (on Mac) or Ctrl + C (on Windows) to copy them to your clipboard. + +:::tip + +- Multiple SQL statements in a single API call are not supported.
Ensure that each SQL query in the worksheet ends with a single semicolon (;). +- To make it easier for you to edit SQL statements, you can select a table in the database list and click the "..." button next to it. Then, follow the menu prompts to choose to copy the table name or all column names to the SQL input area on the right in one click. + +- If you enter multiple statements in the SQL input area, Databend Cloud will only execute the statement where the cursor is located. You can move the cursor to execute other statements. Additionally, you can use keyboard shortcuts: Ctrl + Enter (Windows) or Command + Enter (Mac) to execute the current statement, and Ctrl + Shift + Enter (Windows) or Command + Shift + Enter (Mac) to execute all statements. + ::: + +## Sharing a Worksheet + +You can share your worksheets with everyone in your organization or specific individuals. To do so, click **Share** in the worksheet you want to share, or click **Share this Folder** to share a worksheet folder. + +![Alt text](@site/static/img/documents/worksheet/share.png) + +In the dialog box that appears, select the sharing scope. You can copy and share the link with the intended recipients, who will also receive an email notification. Please note that if you choose the **Designated Members** scope, recipients must click the link you share for the sharing to be successful. + +- To view the worksheets shared with you by others, click **Worksheets** in the sidebar, then click the **Shared with Me** tab on the right. +- When you share a worksheet with others, they can execute the SQL statements in it if they have the necessary permissions, but they won't be able to make any edits to the statements. + +## Exporting Query Results + +Databend Cloud provides the ability to export query results. However, this feature requires the organization Owner to grant the **EXPORT** permission to team members. For data security purposes, this feature is disabled by default. + +![Alt text](@site/static/img/documents/worksheet/download.png) + +If you need to use this feature, please contact your organization Owner to enable the permission (**Admin** > **Users & Roles**): + +![Alt text](@site/static/img/documents/worksheet/export.png) diff --git a/tidb-cloud-lake/lake-overview.md b/tidb-cloud-lake/lake-overview.md new file mode 100644 index 0000000000000..b56ef6e18ed11 --- /dev/null +++ b/tidb-cloud-lake/lake-overview.md @@ -0,0 +1,30 @@ +--- +title: TiDB Cloud Lake Overview +summary: TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations. +--- + +# TiDB Cloud Lake Overview + +TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations. It separates compute and storage, enabling flexible compute provisioning based on workload needs. + +## At a Glance + +| | | +|---|---| +| ⚡ **< 500ms** cold start | 🔐 **SOC 2 Type II** certified | +| 💰 **> 50%** cost savings vs Snowflake | 📊 **99.95%** uptime SLA | + +## Why Databend Cloud? + +| | | +|---|---| +| **Serverless** | No clusters to manage. Compute scales to zero when idle. | +| **Storage-Compute Separation** | Scale independently. Hot/warm/cold data tiering. | +| **Enterprise Security** | RBAC, data masking, encryption, PrivateLink, audit logs. | +| **Snowflake Compatible** | Familiar SQL syntax. Smooth migration. | + +## Get Started + +1. 
[**Sign Up**](cloud/getting-started) — Create your account in minutes +2. [**Explore**](cloud/resources/) — Warehouses, Worksheets, Dashboards +3. [**Connect**](/guides/connect) — BendSQL, Python, Go, Java, Node.js \ No newline at end of file diff --git a/tidb-cloud-lake/lake-quick-start.md b/tidb-cloud-lake/lake-quick-start.md new file mode 100644 index 0000000000000..b79af0f1e83c3 --- /dev/null +++ b/tidb-cloud-lake/lake-quick-start.md @@ -0,0 +1,8 @@ +--- +title: TiDB Cloud Lake Quick Start +summary: Get started with TiDB Cloud Lake. +--- + +# TiDB Cloud Lake Quick Start + +TiDB Cloud Lake is a cloud-native data warehouse service focused on analytics workloads that scales elastically and supports ANSI SQL and multi-modal data operations. This document guides you through an easy way to get started with TiDB Cloud Lake. \ No newline at end of file diff --git a/tidb-cloud-lake/sql/abs.md b/tidb-cloud-lake/sql/abs.md new file mode 100644 index 0000000000000..4792d7fcc5373 --- /dev/null +++ b/tidb-cloud-lake/sql/abs.md @@ -0,0 +1,23 @@ +--- +title: ABS +--- + +Returns the absolute value of `x`. + +## Syntax + +```sql +ABS( <x> ) +``` + +## Examples + +```sql +SELECT ABS(-5); + +┌────────────┐ +│ abs((- 5)) │ +├────────────┤ +│ 5 │ +└────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/acos.md b/tidb-cloud-lake/sql/acos.md new file mode 100644 index 0000000000000..bd34885a49b2c --- /dev/null +++ b/tidb-cloud-lake/sql/acos.md @@ -0,0 +1,23 @@ +--- +title: ACOS +--- + +Returns the arc cosine of `x`, that is, the value whose cosine is `x`. Returns NULL if `x` is not in the range -1 to 1. + +## Syntax + +```sql +ACOS( <x> ) +``` + +## Examples + +```sql +SELECT ACOS(1); + +┌─────────┐ +│ acos(1) │ +├─────────┤ +│ 0 │ +└─────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/add-months.md b/tidb-cloud-lake/sql/add-months.md new file mode 100644 index 0000000000000..f6ef3d5dd7c16 --- /dev/null +++ b/tidb-cloud-lake/sql/add-months.md @@ -0,0 +1,90 @@ +--- +title: ADD_MONTHS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The add_months() function adds a specified number of months to a given date or timestamp. + +If the input date is month-end or exceeds the resulting month’s days, the result is adjusted to the last day of the new month. Otherwise, the original day is preserved.
+ +## Syntax + +```sql +ADD_MONTHS( <date_or_timestamp>, <num_months> ) +``` + +| Parameter | Description | +|----------------------|-----------------------------------------------------------------------------| +| `<date_or_timestamp>` | The starting date or timestamp to which months will be added | +| `<num_months>` | The integer number of months to add (can be negative to subtract months) | + +## Return Type + +Returns a TIMESTAMP or DATE type + +## Examples + +### Basic Month Addition +```sql +SELECT ADD_MONTHS('2023-01-15'::DATE, 3); +├───────────────────────────────────┤ +│ 2023-04-15 │ +╰───────────────────────────────────╯ +``` + +### Subtracting Months +```sql +SELECT ADD_MONTHS('2023-06-20'::DATE, -4); +├─────────────────────────────────────┤ +│ 2023-02-20 │ +╰─────────────────────────────────────╯ +``` + +### Month-End Adjustment +```sql +SELECT ADD_MONTHS('2023-01-31'::DATE, 1); +├───────────────────────────────────┤ +│ 2023-02-28 │ +╰───────────────────────────────────╯ +``` + +### With Timestamp Preservation +```sql +SELECT ADD_MONTHS('2023-03-15 14:30:00'::TIMESTAMP, 5); +├─────────────────────────────────────────────────┤ +│ 2023-08-15 14:30:00.000000 │ +╰─────────────────────────────────────────────────╯ +``` + +### With last day of month +```sql +CREATE TABLE contracts ( + id INT, + sign_date DATE, + duration_months INT +); + +INSERT INTO contracts VALUES + (1, '2023-01-15', 12), + (2, '2024-02-28', 6), + (3, '2023-11-30', 3); + +SELECT + id, + sign_date, + ADD_MONTHS(sign_date, duration_months) AS end_date +FROM contracts; +├─────────────────┼────────────────┼────────────────┤ +│ 1 │ 2023-01-15 │ 2024-01-15 │ +│ 2 │ 2024-02-28 │ 2024-08-28 │ +│ 3 │ 2023-11-30 │ 2024-02-29 │ +╰───────────────────────────────────────────────────╯ + +``` + +## See Also + +- [DATE_ADD](/tidb-cloud-lake/sql/date-add.md): Alternative function for adding specific time intervals +- [DATE_SUB](/tidb-cloud-lake/sql/date-sub.md): Function for subtracting time intervals diff --git a/tidb-cloud-lake/sql/add-time-interval.md b/tidb-cloud-lake/sql/add-time-interval.md new file mode 100644 index 0000000000000..7c5934e4176ce --- /dev/null +++ b/tidb-cloud-lake/sql/add-time-interval.md @@ -0,0 +1,83 @@ +--- +title: ADD TIME INTERVAL +description: Add time interval function +title_includes: add_years, add_quarters, add_months, add_days, add_hours, add_minutes, add_seconds +--- + +Adds a time interval to a date or timestamp and returns a result of DATE or TIMESTAMP type. + +## Syntax + +```sql +ADD_YEARS( <date_or_timestamp>, <n> ) +ADD_QUARTERS( <date_or_timestamp>, <n> ) +ADD_MONTHS( <date_or_timestamp>, <n> ) +ADD_DAYS( <date_or_timestamp>, <n> ) +ADD_HOURS( <date_or_timestamp>, <n> ) +ADD_MINUTES( <date_or_timestamp>, <n> ) +ADD_SECONDS( <date_or_timestamp>, <n> ) +``` + +## Return Type + +`DATE` or `TIMESTAMP`, depending on the input.
+ +## Examples + +```sql +SELECT to_date(18875), add_years(to_date(18875), 2); + +┌───────────────────────────────────────────────┐ +│ to_date(18875) │ add_years(to_date(18875), 2) │ +├────────────────┼──────────────────────────────┤ +│ 2021-09-05 │ 2023-09-05 │ +└───────────────────────────────────────────────┘ + +SELECT to_date(18875), add_quarters(to_date(18875), 2); + +┌──────────────────────────────────────────────────┐ +│ to_date(18875) │ add_quarters(to_date(18875), 2) │ +├────────────────┼─────────────────────────────────┤ +│ 2021-09-05 │ 2022-03-05 │ +└──────────────────────────────────────────────────┘ + +SELECT to_date(18875), add_months(to_date(18875), 2); + +┌────────────────────────────────────────────────┐ +│ to_date(18875) │ add_months(to_date(18875), 2) │ +├────────────────┼───────────────────────────────┤ +│ 2021-09-05 │ 2021-11-05 │ +└────────────────────────────────────────────────┘ + +SELECT to_date(18875), add_days(to_date(18875), 2); + +┌──────────────────────────────────────────────┐ +│ to_date(18875) │ add_days(to_date(18875), 2) │ +├────────────────┼─────────────────────────────┤ +│ 2021-09-05 │ 2021-09-07 │ +└──────────────────────────────────────────────┘ + +SELECT to_datetime(1630833797), add_hours(to_datetime(1630833797), 2); + +┌─────────────────────────────────────────────────────────────────┐ +│ to_datetime(1630833797) │ add_hours(to_datetime(1630833797), 2) │ +├─────────────────────────┼───────────────────────────────────────┤ +│ 2021-09-05 09:23:17 │ 2021-09-05 11:23:17 │ +└─────────────────────────────────────────────────────────────────┘ + +SELECT to_datetime(1630833797), add_minutes(to_datetime(1630833797), 2); + +┌───────────────────────────────────────────────────────────────────┐ +│ to_datetime(1630833797) │ add_minutes(to_datetime(1630833797), 2) │ +├─────────────────────────┼─────────────────────────────────────────┤ +│ 2021-09-05 09:23:17 │ 2021-09-05 09:25:17 │ +└───────────────────────────────────────────────────────────────────┘ + +SELECT to_datetime(1630833797), add_seconds(to_datetime(1630833797), 2); + +┌───────────────────────────────────────────────────────────────────┐ +│ to_datetime(1630833797) │ add_seconds(to_datetime(1630833797), 2) │ +├─────────────────────────┼─────────────────────────────────────────┤ +│ 2021-09-05 09:23:17 │ 2021-09-05 09:23:19 │ +└───────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/add.md b/tidb-cloud-lake/sql/add.md new file mode 100644 index 0000000000000..8da044061fede --- /dev/null +++ b/tidb-cloud-lake/sql/add.md @@ -0,0 +1,5 @@ +--- +title: ADD +--- + +Alias for [PLUS](/tidb-cloud-lake/sql/plus.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/administration-commands.md b/tidb-cloud-lake/sql/administration-commands.md new file mode 100644 index 0000000000000..eb5e68a1966c1 --- /dev/null +++ b/tidb-cloud-lake/sql/administration-commands.md @@ -0,0 +1,52 @@ +--- +title: Administration Commands +--- + +This page provides reference information for the system administration commands in Databend. 
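As a quick, hedged sketch of how a few of these commands fit together in practice (the setting name is only an example; check SHOW SETTINGS for the values available in your deployment):

```sql
-- List active queries and connections on the current tenant
SHOW PROCESSLIST;

-- Inspect a setting, change it globally, then revert it to its default
SHOW SETTINGS LIKE '%max_threads%';
SET GLOBAL max_threads = 8;
UNSET max_threads;
```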
+ +## System Monitoring + +| Command | Description | +|---------|-------------| +| **[SHOW PROCESSLIST](/tidb-cloud-lake/sql/show-processlist.md)** | Display active queries and connections | +| **[SHOW METRICS](/tidb-cloud-lake/sql/show-metrics.md)** | View system performance metrics | +| **[KILL](/tidb-cloud-lake/sql/kill.md)** | Terminate running queries or connections | +| **[RUST BACKTRACE](rust-backtrace.md)** | Debug Rust stack traces | + +## Access Control + +| Command | Description | +|---------|-------------| +| **[FLUSH PRIVILEGES](/tidb-cloud-lake/guides/privileges.md)** | Force every query node to reload role and privilege metadata | + +## Configuration Management + +| Command | Description | +|---------|-------------| +| **[SET](02-set-global.md)** | Set global configuration parameters | +| **[UNSET](/tidb-cloud-lake/sql/unset.md)** | Remove configuration settings | +| **[SET VARIABLE](/tidb-cloud-lake/sql/set-var.md)** | Manage user-defined variables | +| **[SHOW SETTINGS](/tidb-cloud-lake/sql/show-settings.md)** | Display current system settings | + +## Function Management + +| Command | Description | +|---------|-------------| +| **[SHOW FUNCTIONS](/tidb-cloud-lake/sql/show-functions.md)** | List built-in functions | +| **[SHOW USER FUNCTIONS](/tidb-cloud-lake/sql/show-user-functions.md)** | List user-defined functions | +| **[SHOW TABLE FUNCTIONS](/tidb-cloud-lake/sql/show-table-functions.md)** | List table-valued functions | + +## Storage Maintenance + +| Command | Description | +|---------|-------------| +| **[VACUUM TABLE](/tidb-cloud-lake/sql/vacuum-table.md)** | Reclaim storage space from tables | +| **[VACUUM DROP TABLE](/tidb-cloud-lake/sql/vacuum-drop-table.md)** | Clean up dropped table data | +| **[VACUUM TEMP FILES](09-vacuum-temp-files.md)** | Remove temporary files | +| **[SHOW INDEXES](/tidb-cloud-lake/sql/show-indexes.md)** | Display table indexes | + +## Dynamic Execution + +| Command | Description | +|---------|-------------| +| **[EXECUTE IMMEDIATE](/tidb-cloud-lake/sql/execute-immediate.md)** | Execute dynamically constructed SQL statements | diff --git a/tidb-cloud-lake/sql/age.md b/tidb-cloud-lake/sql/age.md new file mode 100644 index 0000000000000..5a99ff78788d2 --- /dev/null +++ b/tidb-cloud-lake/sql/age.md @@ -0,0 +1,89 @@ +--- +title: AGE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The age() function calculates the difference between two timestamps or the difference between a timestamp and the current date and time. + +## Syntax + +```sql +AGE(, ) +``` + +| Parameter | Description | +|----------------------|-----------------------------------------------------------------------------| +| `` | The ending timestamp | +| `` | The starting timestamp | + +## Return Type + +Returns an INTERVAL type + +## Calculation Logic + +The function calculates: +1. Full year differences (accounting for leap years) +2. Remaining month differences (considering varying month lengths) +3. Remaining day differences (including time components) + +Negative intervals are returned when `` is earlier than ``. 
+ +## Examples + +### Basic Age Calculation +```sql +SELECT AGE('2023-03-15'::TIMESTAMP, '2020-01-20'::TIMESTAMP); +├─────────────────────────┤ +│ 3 years 1 month 26 days │ +╰─────────────────────────╯ +``` + +### Reverse Chronology +```sql +SELECT AGE('2018-12-25'::TIMESTAMP, '2022-05-10'::TIMESTAMP); +├─────────────────────────────┤ +│ -3 years -4 months -16 days │ +╰─────────────────────────────╯ +``` + +### With Time Components +```sql +SELECT AGE('2023-02-28 14:00:00'::TIMESTAMP, '2023-02-27 08:30:00'::TIMESTAMP); +├───────────────┤ +│ 1 day 5:30:00 │ +╰───────────────╯ +``` + +### Table Data Processing +```sql +CREATE TABLE projects ( + name String, + start_date TIMESTAMP, + end_date TIMESTAMP +); + +INSERT INTO projects VALUES + ('Alpha', '2020-06-01', '2023-09-30'), + ('Beta', '2022-01-15', '2022-11-01'); + +SELECT + name, + AGE(end_date, start_date) AS duration +FROM projects; +╭─────────────────────────────────────────────╮ +│ name │ duration │ +│ Nullable(String) │ Nullable(Interval) │ +├──────────────────┼──────────────────────────┤ +│ Alpha │ 3 years 3 months 29 days │ +│ Beta │ 9 months 17 days │ +╰─────────────────────────────────────────────╯ +``` + + +## See Also + +- [DATE_DIFF](/tidb-cloud-lake/sql/date-diff.md): Alternative function for calculating specific time unit differences + diff --git a/tidb-cloud-lake/sql/aggregate-functions.md b/tidb-cloud-lake/sql/aggregate-functions.md new file mode 100644 index 0000000000000..c25034807aa5a --- /dev/null +++ b/tidb-cloud-lake/sql/aggregate-functions.md @@ -0,0 +1,97 @@ +--- +title: 'Aggregate Functions' +--- + +This page provides a comprehensive overview of aggregate functions in Databend, organized by functionality for easy reference. + +## Basic Aggregation + +| Function | Description | Example | +|----------|-------------|---------| +| [COUNT](/tidb-cloud-lake/sql/count.md) | Counts the number of rows or non-NULL values | `COUNT(*)` → `10` | +| [COUNT_DISTINCT](/tidb-cloud-lake/sql/count-distinct.md) | Counts distinct values | `COUNT(DISTINCT city)` → `5` | +| [APPROX_COUNT_DISTINCT](/tidb-cloud-lake/sql/approx-count-distinct.md) | Approximates count of distinct values | `APPROX_COUNT_DISTINCT(user_id)` → `9955` | +| [SUM](/tidb-cloud-lake/sql/sum.md) | Calculates the sum of values | `SUM(sales)` → `1250.75` | +| [AVG](/tidb-cloud-lake/sql/avg.md) | Calculates the average of values | `AVG(temperature)` → `72.5` | +| [MIN](/tidb-cloud-lake/sql/min.md) | Returns the minimum value | `MIN(price)` → `9.99` | +| [MAX](/tidb-cloud-lake/sql/max.md) | Returns the maximum value | `MAX(price)` → `99.99` | +| [ANY_VALUE](/tidb-cloud-lake/sql/any-value.md) | Returns any value from the group | `ANY_VALUE(status)` → `'active'` | + +## Conditional Aggregation + +| Function | Description | Example | +|----------|-------------|---------| +| [COUNT_IF](/tidb-cloud-lake/sql/count-if.md) | Counts rows that match a condition | `COUNT_IF(price > 100)` → `5` | +| [SUM_IF](/tidb-cloud-lake/sql/sum-if.md) | Sums values that match a condition | `SUM_IF(amount, status = 'completed')` → `750.25` | +| [AVG_IF](/tidb-cloud-lake/sql/avg-if.md) | Averages values that match a condition | `AVG_IF(score, passed = true)` → `85.6` | +| [MIN_IF](/tidb-cloud-lake/sql/min-if.md) | Returns minimum where condition is true | `MIN_IF(temp, location = 'outside')` → `45.2` | +| [MAX_IF](/tidb-cloud-lake/sql/max-if.md) | Returns maximum where condition is true | `MAX_IF(speed, vehicle = 'car')` → `120.5` | + +## Statistical Functions + +| Function | Description | 
Example | +|----------|-------------|---------| +| [VAR_POP](/tidb-cloud-lake/sql/var-pop.md) / [VARIANCE_POP](/tidb-cloud-lake/sql/variance-pop.md) | Population variance | `VAR_POP(height)` → `10.25` | +| [VAR_SAMP](/tidb-cloud-lake/sql/var-samp.md) / [VARIANCE_SAMP](/tidb-cloud-lake/sql/variance-samp.md) | Sample variance | `VAR_SAMP(height)` → `12.3` | +| [STDDEV_POP](/tidb-cloud-lake/sql/stddev-pop.md) | Population standard deviation | `STDDEV_POP(height)` → `3.2` | +| [STDDEV_SAMP](/tidb-cloud-lake/sql/stddev-samp.md) | Sample standard deviation | `STDDEV_SAMP(height)` → `3.5` | +| [COVAR_POP](/tidb-cloud-lake/sql/covar-pop.md) | Population covariance | `COVAR_POP(x, y)` → `2.5` | +| [COVAR_SAMP](/tidb-cloud-lake/sql/covar-samp.md) | Sample covariance | `COVAR_SAMP(x, y)` → `2.7` | +| [KURTOSIS](/tidb-cloud-lake/sql/kurtosis.md) | Measures peakedness of distribution | `KURTOSIS(values)` → `2.1` | +| [SKEWNESS](/tidb-cloud-lake/sql/skewness.md) | Measures asymmetry of distribution | `SKEWNESS(values)` → `0.2` | + +## Percentile and Distribution + +| Function | Description | Example | +|----------|-------------|---------| +| [MEDIAN](/tidb-cloud-lake/sql/median.md) | Calculates the median value | `MEDIAN(response_time)` → `125` | +| [MODE](/tidb-cloud-lake/sql/mode.md) | Returns the most frequent value | `MODE(category)` → `'electronics'` | +| [QUANTILE_CONT](/tidb-cloud-lake/sql/quantile-cont.md) | Continuous interpolation quantile | `QUANTILE_CONT(0.95)(response_time)` → `350.5` | +| [QUANTILE_DISC](/tidb-cloud-lake/sql/quantile-disc.md) | Discrete quantile | `QUANTILE_DISC(0.5)(age)` → `35` | +| [QUANTILE_TDIGEST](/tidb-cloud-lake/sql/quantile-tdigest.md) | Approximate quantile using t-digest | `QUANTILE_TDIGEST(0.9)(values)` → `95.2` | +| [QUANTILE_TDIGEST_WEIGHTED](/tidb-cloud-lake/sql/quantile-tdigest-weighted.md) | Weighted t-digest quantile | `QUANTILE_TDIGEST_WEIGHTED(0.5)(values, weights)` → `50.5` | +| [MEDIAN_TDIGEST](/tidb-cloud-lake/sql/median-tdigest.md) | Approximate median using t-digest | `MEDIAN_TDIGEST(response_time)` → `124.5` | +| [HISTOGRAM](/tidb-cloud-lake/sql/histogram.md) | Creates histogram buckets | `HISTOGRAM(10)(values)` → `[{...}]` | + +## Array and Collection Aggregation + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_AGG](/tidb-cloud-lake/sql/array-agg.md) | Collects values into an array | `ARRAY_AGG(product)` → `['A', 'B', 'C']` | +| [GROUP_ARRAY_MOVING_AVG](/tidb-cloud-lake/sql/group-array-moving-avg.md) | Moving average over array | `GROUP_ARRAY_MOVING_AVG(3)(values)` → `[null, null, 3.0, 6.0, 9.0]` | +| [GROUP_ARRAY_MOVING_SUM](/tidb-cloud-lake/sql/group-array-moving-sum.md) | Moving sum over array | `GROUP_ARRAY_MOVING_SUM(2)(values)` → `[null, 3, 7, 11, 15]` | + +## String Aggregation + +| Function | Description | Example | +|----------|-------------|---------| +| [GROUP_CONCAT](/tidb-cloud-lake/sql/group-concat.md) | Concatenates values with separator | `GROUP_CONCAT(city, ', ')` → `'New York, London, Tokyo'` | +| [STRING_AGG](/tidb-cloud-lake/sql/string-agg.md) | Concatenates strings with separator | `STRING_AGG(tag, ',')` → `'red,green,blue'` | +| [LISTAGG](/tidb-cloud-lake/sql/listagg.md) | Concatenates values with separator | `LISTAGG(name, ', ')` → `'Alice, Bob, Charlie'` | + +## JSON Aggregation + +| Function | Description | Example | +|----------|-------------|---------| +| [JSON_ARRAY_AGG](/tidb-cloud-lake/sql/json-array-agg.md) | Aggregates values as JSON array | `JSON_ARRAY_AGG(name)` → 
`'["Alice", "Bob", "Charlie"]'` | +| [JSON_OBJECT_AGG](/tidb-cloud-lake/sql/json-object-agg.md) | Creates JSON object from key-value pairs | `JSON_OBJECT_AGG(name, score)` → `'{"Alice": 95, "Bob": 87}'` | + +## Argument Selection + +| Function | Description | Example | +|----------|-------------|---------| +| [ARG_MAX](/tidb-cloud-lake/sql/arg-max.md) | Returns value of expr1 at maximum expr2 | `ARG_MAX(name, score)` → `'Alice'` | +| [ARG_MIN](/tidb-cloud-lake/sql/arg-min.md) | Returns value of expr1 at minimum expr2 | `ARG_MIN(name, score)` → `'Charlie'` | + +## Funnel Analysis + +| Function | Description | Example | +|----------|-------------|---------| +| [RETENTION](/tidb-cloud-lake/sql/retention.md) | Calculates retention rates | `RETENTION(action = 'signup', action = 'purchase')` → `[100, 40]` | +| [WINDOWFUNNEL](/tidb-cloud-lake/sql/window-funnel.md) | Searches for event sequences within time window | `WINDOWFUNNEL(1800)(timestamp, event='view', event='click', event='purchase')` → `2` | + +## Anonymization + +| Function | Description | Example | +|----------|-------------|---------| +| [MARKOV_TRAIN](/tidb-cloud-lake/sql/markov-train.md) | train markov model | `MARKOV_TRAIN(address)` | diff --git a/tidb-cloud-lake/sql/aggregating-index.md b/tidb-cloud-lake/sql/aggregating-index.md new file mode 100644 index 0000000000000..ce3dcad1667f4 --- /dev/null +++ b/tidb-cloud-lake/sql/aggregating-index.md @@ -0,0 +1,20 @@ +--- +title: Aggregating Index +--- +This page provides a comprehensive overview of aggregating index operations in Databend, organized by functionality for easy reference. + +## Aggregating Index Management + +| Command | Description | +|---------|-------------| +| [CREATE AGGREGATING INDEX](/tidb-cloud-lake/sql/create-aggregating-index.md) | Creates a new aggregating index for a table | +| [DROP AGGREGATING INDEX](/tidb-cloud-lake/sql/drop-aggregating-index.md) | Removes an aggregating index | +| [REFRESH AGGREGATING INDEX](/tidb-cloud-lake/sql/refresh-aggregating-index.md) | Updates an aggregating index with the latest data | + +## Related Topics + +- [Aggregating Index](/tidb-cloud-lake/guides/aggregating-index.md) + +:::note +Aggregating indexes in Databend are used to improve the performance of aggregate queries by pre-computing and storing aggregate results. +::: diff --git a/tidb-cloud-lake/sql/alter-cluster-key.md b/tidb-cloud-lake/sql/alter-cluster-key.md new file mode 100644 index 0000000000000..e1f3c6544987c --- /dev/null +++ b/tidb-cloud-lake/sql/alter-cluster-key.md @@ -0,0 +1,38 @@ +--- +title: ALTER CLUSTER KEY +sidebar_position: 3 +--- + +Changes the cluster key for a table. + +See also: +[DROP CLUSTER KEY](/tidb-cloud-lake/sql/drop-cluster-key.md) + +## Syntax + +```sql +ALTER TABLE [ IF EXISTS ] CLUSTER BY ( [ , ... 
] ) +``` + +## Examples + +```sql +-- Create table +CREATE TABLE IF NOT EXISTS playground(a int, b int); + +-- Add cluster key by columns +ALTER TABLE playground CLUSTER BY(b,a); + +INSERT INTO playground VALUES(0,3),(1,1); +INSERT INTO playground VALUES(1,3),(2,1); +INSERT INTO playground VALUES(4,4); + +SELECT * FROM playground ORDER BY b,a; +SELECT * FROM clustering_information('db1','playground'); + +-- Delete cluster key +ALTER TABLE playground DROP CLUSTER KEY; + +-- Add cluster key by expressions +ALTER TABLE playground CLUSTER BY(rand()+a); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-function-sql.md b/tidb-cloud-lake/sql/alter-function-sql.md new file mode 100644 index 0000000000000..6aa7646bbfbb1 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-function-sql.md @@ -0,0 +1,40 @@ +--- +title: ALTER FUNCTION +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alters an external function. + +## Syntax + +```sql +ALTER FUNCTION [ IF NOT EXISTS ] + AS ( ) RETURNS LANGUAGE + HANDLER = '' ADDRESS = '' + [DESC=''] +``` + +| Parameter | Description | +|-----------------------|---------------------------------------------------------------------------------------------------| +| `` | The name of the function. | +| `` | The lambda expression or code snippet defining the function's behavior. | +| `DESC=''` | Description of the UDF.| +| `<`| A list of input parameter names. Separated by comma.| +| `<`| A list of input parameter types. Separated by comma.| +| `` | The return type of the function. | +| `LANGUAGE` | Specifies the language used to write the function. Available values: `python`. | +| `HANDLER = ''` | Specifies the name of the function's handler. | +| `ADDRESS = ''` | Specifies the address of the UDF server. | + +## Examples + +```sql +-- Create an external function +CREATE FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd' ADDRESS = 'http://0.0.0.0:8815'; + +-- Modify the handler of the external function +ALTER FUNCTION gcd (INT, INT) RETURNS INT LANGUAGE python HANDLER = 'gcd_new' ADDRESS = 'http://0.0.0.0:8815'; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-function.md b/tidb-cloud-lake/sql/alter-function.md new file mode 100644 index 0000000000000..076f074c16878 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-function.md @@ -0,0 +1,98 @@ +--- +title: ALTER FUNCTION +sidebar_position: 5 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alters a user-defined function. Supports all function types: Scalar SQL, Tabular SQL, and Embedded functions. 
+ +## Syntax + +### For Scalar SQL Functions +```sql +ALTER FUNCTION [ IF EXISTS ] + ( [] ) + RETURNS + AS $$ $$ + [ DESC='' ] +``` + +### For Tabular SQL Functions +```sql +ALTER FUNCTION [ IF EXISTS ] + ( [] ) + RETURNS TABLE ( ) + AS $$ $$ + [ DESC='' ] +``` + +### For Embedded Functions +```sql +ALTER FUNCTION [ IF EXISTS ] + ( [] ) + RETURNS + LANGUAGE + [IMPORTS = ('', ...)] + [PACKAGES = ('', ...)] + HANDLER = '' + AS $$ $$ + [ DESC='' ] +``` + +## Examples + +### Altering Scalar SQL Function +```sql +-- Create a scalar function +CREATE FUNCTION calculate_tax(income DECIMAL) +RETURNS DECIMAL +AS $$ income * 0.2 $$; + +-- Modify the function to use progressive tax rate +ALTER FUNCTION calculate_tax(income DECIMAL) +RETURNS DECIMAL +AS $$ + CASE + WHEN income <= 50000 THEN income * 0.15 + ELSE income * 0.25 + END +$$; +``` + +### Altering Tabular SQL Function +```sql +-- Create a table function +CREATE FUNCTION get_employees() +RETURNS TABLE (id INT, name VARCHAR(100)) +AS $$ SELECT id, name FROM employees $$; + +-- Modify to include department and salary +ALTER FUNCTION get_employees() +RETURNS TABLE (id INT, name VARCHAR(100), department VARCHAR(100), salary DECIMAL) +AS $$ SELECT id, name, department, salary FROM employees $$; +``` + +### Altering Embedded Function +```sql +-- Create a Python function +CREATE FUNCTION simple_calc(x INT) +RETURNS INT +LANGUAGE python +HANDLER = 'calc' +AS $$ +def calc(x): + return x * 2 +$$; + +-- Modify to use a different calculation +ALTER FUNCTION simple_calc(x INT) +RETURNS INT +LANGUAGE python +HANDLER = 'calc' +AS $$ +def calc(x): + return x * 3 + 1 +$$; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-network-policy.md b/tidb-cloud-lake/sql/alter-network-policy.md new file mode 100644 index 0000000000000..c930477b7898f --- /dev/null +++ b/tidb-cloud-lake/sql/alter-network-policy.md @@ -0,0 +1,41 @@ +--- +title: ALTER NETWORK POLICY +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Modifies an existing network policy in Databend. + +## Syntax + +```sql +ALTER NETWORK POLICY [ IF EXISTS ] + SET [ ALLOWED_IP_LIST = ('allowed_ip1', 'allowed_ip2', ...) ] + [ BLOCKED_IP_LIST = ('blocked_ip1', 'blocked_ip2', ...) ] + [ COMMENT = 'comment' ] +``` + +| Parameter | Description | +|----------------- |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| policy_name | Specifies the name of the network policy to be modified. | +| ALLOWED_IP_LIST | Specifies a comma-separated list of allowed IP address ranges to update for the policy. This overwrites the existing allowed IP address list with the new one provided. | +| BLOCKED_IP_LIST | Specifies a comma-separated list of blocked IP address ranges to update for the policy. This overwrites the existing blocked IP address list with the new one provided. If this parameter is set to an empty list (), it removes all blocked IP address restrictions. | +| COMMENT | An optional parameter used to update the description or comment associated with the network policy. | + +:::note +This command provides the flexibility to update either the allowed IP list or the blocked IP list, while leaving the other list unchanged. Both ALLOWED_IP_LIST and BLOCKED_IP_LIST are optional parameters. 
+::: + +## Examples + +```sql +-- Modify the network policy test_policy to change the blocked IP address list from ('192.168.1.99') to ('192.168.1.10'): +ALTER NETWORK POLICY test_policy SET BLOCKED_IP_LIST=('192.168.1.10') + +-- Update the network policy test_policy to allow IP address ranges ('192.168.10.0', '192.168.20.0') and remove any blocked IP address restrictions. Also, change the comment to 'new comment': + +ALTER NETWORK POLICY test_policy SET ALLOWED_IP_LIST=('192.168.10.0', '192.168.20.0') BLOCKED_IP_LIST=() COMMENT='new comment' +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-notification-integration.md b/tidb-cloud-lake/sql/alter-notification-integration.md new file mode 100644 index 0000000000000..e3bff43adf6bb --- /dev/null +++ b/tidb-cloud-lake/sql/alter-notification-integration.md @@ -0,0 +1,46 @@ +--- +title: ALTER NOTIFICATION INTEGRATION +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alter the settings of a named notification integration that can be used to send notifications to external messaging services. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax +### Webhook Notification + +```sql +ALTER NOTIFICATION INTEGRATION [ IF NOT EXISTS ] SET + [ ENABLED = TRUE | FALSE ] + [ WEBHOOK = ( url = , method = , authorization_header = ) ] + [ COMMENT = '' ] +``` + +| Required Parameters | Description | +|---------------------|-------------| +| name | The name of the notification integration. This is a mandatory field. | + + +| Optional Parameters [(Webhook)](#webhook-notification) | Description | +|---------------------|-------------| +| enabled | Whether the notification integration is enabled. | +| url | The URL of the webhook. | +| method | The HTTP method to use when sending the webhook. default is `GET`| +| authorization_header| The authorization header to use when sending the webhook. | +| comment | A comment to associate with the notification integration. | + +## Examples + +### Webhook Notification + +```sql +ALTER NOTIFICATION INTEGRATION SampleNotification SET enabled = true +``` + +This example enables the notification integration named `SampleNotification`. + + diff --git a/tidb-cloud-lake/sql/alter-password-policy.md b/tidb-cloud-lake/sql/alter-password-policy.md new file mode 100644 index 0000000000000..5e3b6ce45f11c --- /dev/null +++ b/tidb-cloud-lake/sql/alter-password-policy.md @@ -0,0 +1,59 @@ +--- +title: ALTER PASSWORD POLICY +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Modifies an existing password policy in Databend. 
+ +## Syntax + +```sql +-- Modify existing password policy attributes +ALTER PASSWORD POLICY [ IF EXISTS ] SET + [ PASSWORD_MIN_LENGTH = ] + [ PASSWORD_MAX_LENGTH = ] + [ PASSWORD_MIN_UPPER_CASE_CHARS = ] + [ PASSWORD_MIN_LOWER_CASE_CHARS = ] + [ PASSWORD_MIN_NUMERIC_CHARS = ] + [ PASSWORD_MIN_SPECIAL_CHARS = ] + [ PASSWORD_MIN_AGE_DAYS = ] + [ PASSWORD_MAX_AGE_DAYS = ] + [ PASSWORD_MAX_RETRIES = ] + [ PASSWORD_LOCKOUT_TIME_MINS = ] + [ PASSWORD_HISTORY = ] + [ COMMENT = '' ] + +-- Remove specific password policy attributes +ALTER PASSWORD POLICY [ IF EXISTS ] UNSET + [ PASSWORD_MIN_LENGTH ] + [ PASSWORD_MAX_LENGTH ] + [ PASSWORD_MIN_UPPER_CASE_CHARS ] + [ PASSWORD_MIN_LOWER_CASE_CHARS ] + [ PASSWORD_MIN_NUMERIC_CHARS ] + [ PASSWORD_MIN_SPECIAL_CHARS ] + [ PASSWORD_MIN_AGE_DAYS ] + [ PASSWORD_MAX_AGE_DAYS ] + [ PASSWORD_MAX_RETRIES ] + [ PASSWORD_LOCKOUT_TIME_MINS ] + [ PASSWORD_HISTORY ] + [ COMMENT ] +``` + +For detailed descriptions of the password policy attributes, see [Password Policy Attributes](/tidb-cloud-lake/sql/create-password-policy.md#password-policy-attributes). + +## Examples + +This example creates a password policy named 'SecureLogin' with a minimum password length requirement set to 10 characters, later updated to allow passwords between 10 and 16 characters: + +```sql +CREATE PASSWORD POLICY SecureLogin + PASSWORD_MIN_LENGTH = 10; + + +ALTER PASSWORD POLICY SecureLogin SET + PASSWORD_MIN_LENGTH = 10 + PASSWORD_MAX_LENGTH = 16; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/alter-table.md b/tidb-cloud-lake/sql/alter-table.md new file mode 100644 index 0000000000000..defe4f8b0242e --- /dev/null +++ b/tidb-cloud-lake/sql/alter-table.md @@ -0,0 +1,403 @@ +--- +title: ALTER TABLE +sidebar_position: 4 +slug: /sql-commands/ddl/table/alter-table +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + +Use `ALTER TABLE` to modify the structure and properties of an existing table, including its columns, comment, storage options, external connection, or even swapping metadata with another table. The subsections below cover each supported capability. + +## Column Operations {#column-operations} + + + +Modify a table by adding, converting, renaming, changing, or removing columns. + +### Syntax + +```sql +-- Add a column to the end of the table +ALTER TABLE [ IF EXISTS ] [ . ] +ADD [ COLUMN ] [ NOT NULL | NULL ] [ DEFAULT ] + +-- Add a column to a specified position +ALTER TABLE [ IF EXISTS ] [ . ] +ADD [ COLUMN ] [ NOT NULL | NULL ] [ DEFAULT ] [ FIRST | AFTER ] + +-- Add a virtual computed column +ALTER TABLE [ IF EXISTS ] [ . ] +ADD [ COLUMN ] AS () VIRTUAL + +-- Convert a stored computed column to a regular column +ALTER TABLE [ IF EXISTS ] [ . ] +MODIFY [ COLUMN ] DROP STORED + +-- Rename a column +ALTER TABLE [ IF EXISTS ] [ . ] +RENAME [ COLUMN ] TO + +-- Change data type +ALTER TABLE [ IF EXISTS ] [ . ] +MODIFY [ COLUMN ] [ DEFAULT ] + [ , [ COLUMN ] [ DEFAULT ] ] + ... + +-- Change comment +ALTER TABLE [ IF EXISTS ] [ . ] +MODIFY [ COLUMN ] [ COMMENT '' ] +[ , [ COLUMN ] [ COMMENT '' ] ] +... + +-- Set / Unset masking policy for a column +ALTER TABLE [ IF EXISTS ] [ . ] +MODIFY [ COLUMN ] SET MASKING POLICY + [ USING ( [ , ... ] ) ] + +ALTER TABLE [ IF EXISTS ] [ . ] +MODIFY [ COLUMN ] UNSET MASKING POLICY + +-- Remove a column +ALTER TABLE [ IF EXISTS ] [ . 
] +DROP [ COLUMN ] +``` + +:::note +- Only a constant value can be accepted as a default value when adding or modifying a column. If a non-constant expression is used, an error will occur. +- Adding a stored computed column with ALTER TABLE is not supported yet. +- When you change the data type of a table's columns, there's a risk of conversion errors. For example, if you try to convert a column with text (String) to numbers (Float), it might cause problems. +- When you set a masking policy for a column, make sure that the data type (refer to the parameter *arg_type_to_mask* in the syntax of [CREATE MASKING POLICY](../12-mask-policy/create-mask-policy.md)) defined in the policy matches the column. +- Use the optional `USING` clause when the policy definition expects additional parameters. List the column mapped to each policy argument in order; the first argument always represents the column being masked. +- If you include `USING`, provide at least the masked column plus any additional columns needed by the policy. The first identifier in `USING (...)` must match the column being modified. +- Masking policies can only be attached to regular tables. Views, streams, and temporary tables do not allow `SET MASKING POLICY`. +- A column can belong to at most one security policy (masking or row-level). Remove the existing policy before attaching a new one. +- Attaching, detaching, describing, or dropping a masking policy requires the global `APPLY MASKING POLICY` privilege or APPLY/OWNERSHIP on the specific masking policy. +- Adding, removing, describing, or dropping a row access policy requires the global `APPLY ROW ACCESS POLICY` privilege or APPLY/OWNERSHIP on that policy. +::: + +:::caution +You must `ALTER TABLE ... MODIFY COLUMN
UNSET MASKING POLICY` before changing the column definition or dropping the column; otherwise the statement fails because the column is still protected by a security policy. +::: + +### Examples + +#### Example 1: Adding, Renaming, and Removing a Column + +This example illustrates the creation of a table called "default.users" with columns 'username', 'email', and 'age'. It showcases the addition of columns 'id' and 'middle_name' with various constraints. The example also demonstrates the renaming and subsequent removal of the "age" column. + +```sql +-- Create a table +CREATE TABLE default.users ( + username VARCHAR(50) NOT NULL, + email VARCHAR(255), + age INT +); + +-- Add a column to the end of the table +ALTER TABLE default.users +ADD COLUMN business_email VARCHAR(255) NOT NULL DEFAULT 'example@example.com'; + +DESC default.users; + +Field |Type |Null|Default |Extra| +--------------+-------+----+---------------------+-----+ +username |VARCHAR|NO |'' | | +email |VARCHAR|YES |NULL | | +age |INT |YES |NULL | | +business_email|VARCHAR|NO |'example@example.com'| | + +-- Add a column to the beginning of the table +ALTER TABLE default.users +ADD COLUMN id int NOT NULL FIRST; + +DESC default.users; + +Field |Type |Null|Default |Extra| +--------------+-------+----+---------------------+-----+ +id |INT |NO |0 | | +username |VARCHAR|NO |'' | | +email |VARCHAR|YES |NULL | | +age |INT |YES |NULL | | +business_email|VARCHAR|NO |'example@example.com'| | + +-- Add a column after the column 'username' +ALTER TABLE default.users +ADD COLUMN middle_name VARCHAR(50) NULL AFTER username; + +DESC default.users; + +Field |Type |Null|Default |Extra| +--------------+-------+----+---------------------+-----+ +id |INT |NO |0 | | +username |VARCHAR|NO |'' | | +middle_name |VARCHAR|YES |NULL | | +email |VARCHAR|YES |NULL | | +age |INT |YES |NULL | | +business_email|VARCHAR|NO |'example@example.com'| | + +-- Rename a column +ALTER TABLE default.users +RENAME COLUMN age TO new_age; + +DESC default.users; + +Field |Type |Null|Default |Extra| +--------------+-------+----+---------------------+-----+ +id |INT |NO |0 | | +username |VARCHAR|NO |'' | | +middle_name |VARCHAR|YES |NULL | | +email |VARCHAR|YES |NULL | | +new_age |INT |YES |NULL | | +business_email|VARCHAR|NO |'example@example.com'| | + +-- Remove a column +ALTER TABLE default.users +DROP COLUMN new_age; + +DESC default.users; + +Field |Type |Null|Default |Extra| +--------------+-------+----+---------------------+-----+ +id |INT |NO |0 | | +username |VARCHAR|NO |'' | | +middle_name |VARCHAR|YES |NULL | | +email |VARCHAR|YES |NULL | | +``` + +#### Example 2: Modify Columns and Masking Policies + +```sql +-- Change column types and defaults +ALTER TABLE users +MODIFY COLUMN age BIGINT DEFAULT 18, + COLUMN email VARCHAR(320) DEFAULT ''; + +-- Add masking policy that expects extra arguments +ALTER TABLE users +MODIFY COLUMN email SET MASKING POLICY pii_email USING (email, username); + +-- To drop or alter the column, remove the policy first +ALTER TABLE users +MODIFY COLUMN email UNSET MASKING POLICY; +``` + +## Table Comment {#table-comment} + +Modifies the comment of a table. If the table does not have a comment yet, this command adds the specified comment to the table. + +### Syntax + +```sql +ALTER TABLE [ IF EXISTS ] [ . 
] +COMMENT = '' +``` + +### Examples + +```sql +-- Create a table with a comment +CREATE TABLE t(id INT) COMMENT ='original-comment'; + +SHOW CREATE TABLE t; + +┌──────────────────────────────────────────────────────────────────────────────────────┐ +│ Table │ Create Table │ +├────────┼─────────────────────────────────────────────────────────────────────────────┤ +│ t │ CREATE TABLE t (\n id INT NULL\n) ENGINE=FUSE COMMENT = 'original-comment' │ +└──────────────────────────────────────────────────────────────────────────────────────┘ + +-- Modify the comment +ALTER TABLE t COMMENT = 'new-comment'; + +SHOW CREATE TABLE t; + +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ Table │ Create Table │ +├────────┼────────────────────────────────────────────────────────────────────────┤ +│ t │ CREATE TABLE t (\n id INT NULL\n) ENGINE=FUSE COMMENT = 'new-comment' │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +```sql +-- Create a table without comment +CREATE TABLE t(id INT); + +-- Add a comment later +ALTER TABLE t COMMENT = 'new-comment'; +``` + +## Fuse Engine Options {#fuse-engine-options} + +Sets or unsets [Fuse Engine options](/tidb-cloud-lake/sql/table-engines.md#fuse-engine-options) for a table. + +### Syntax + +```sql +-- Set Fuse Engine options +ALTER TABLE [ . ] SET OPTIONS () + +-- Unset Fuse Engine options, reverting them to their default values +ALTER TABLE [ . ] UNSET OPTIONS () +``` + +Only the following Fuse Engine options can be unset: + +- `block_per_segment` +- `block_size_threshold` +- `data_retention_period_in_hours` +- `data_retention_num_snapshots_to_keep` +- `row_avg_depth_threshold` +- `row_per_block` +- `row_per_page` + +### Examples + +```sql +CREATE TABLE fuse_table (a int); + +SET hide_options_in_show_create_table=0; + +-- Show current options +SHOW CREATE TABLE fuse_table; + +-- Change Fuse options +ALTER TABLE fuse_table SET OPTIONS (block_per_segment = 500, data_retention_period_in_hours = 240); + +-- Show updated options +SHOW CREATE TABLE fuse_table; +``` + +```sql +-- Limit snapshots and enable auto vacuum +CREATE OR REPLACE TABLE t(c INT); +ALTER TABLE t SET OPTIONS(data_retention_num_snapshots_to_keep = 1); +SET enable_auto_vacuum = 1; +INSERT INTO t VALUES(1); +INSERT INTO t VALUES(2); +INSERT INTO t VALUES(3); + +-- Revert options to defaults +ALTER TABLE fuse_table UNSET OPTIONS (block_per_segment, data_retention_period_in_hours); +``` + +## External Table Connection {#external-table-connection} + +Updates the connection settings for an external table. Only credential-related fields (`access_key_id`, `secret_access_key`, `role_arn`) are applied when the command runs. Other properties such as `bucket`, `region`, or `root` remain unchanged. + +### Syntax + +```sql +ALTER TABLE [ . ] CONNECTION = ( connection_name = '' ) +``` + +| Parameter | Description | Required | +|-----------|-------------|----------| +| connection_name | Name of the connection to be used for the external table. The connection must already exist in the system. | Yes | + +This command is particularly useful when credentials need to be rotated or when IAM roles change. The specified connection must already exist before it can be used with this command. 
+ +**Security best practices** + +When working with external tables, AWS IAM roles provide significant security advantages over access keys: + +- No stored credentials: Eliminates the need to store access keys in your configuration +- Automatic rotation: Handles credential rotation automatically +- Fine-grained control: Allows for more precise access control + +To use IAM roles with Databend Cloud, see [Authenticate with AWS IAM Role](/guides/cloud/security/iam-role). + +### Examples + +```sql +-- Create connections +CREATE CONNECTION external_table_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +CREATE CONNECTION external_table_conn_new + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Create an external table using the first connection +CREATE OR REPLACE TABLE external_table_test ( + id INT, + name VARCHAR, + age INT +) +'s3://testbucket/13_fuse_external_table/' +CONNECTION=(connection_name = 'external_table_conn'); + +-- Update to use the new connection +ALTER TABLE external_table_test CONNECTION=( connection_name = 'external_table_conn_new' ); +``` + +```sql +-- Migrate to IAM role authentication +CREATE CONNECTION s3_access_key_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +CREATE TABLE sales_data ( + order_id INT, + product_name VARCHAR, + quantity INT +) +'s3://sales-bucket/data/' +CONNECTION=(connection_name = 's3_access_key_conn'); + +CREATE CONNECTION s3_role_conn + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::123456789012:role/databend-access'; + +ALTER TABLE sales_data CONNECTION=( connection_name = 's3_role_conn' ); +``` + +## Swap Tables {#swap-tables} + +Swaps all table metadata and data between two tables atomically in a single transaction. This operation exchanges the table schemas, including all columns, constraints, and data, effectively making each table take on the identity of the other. + +### Syntax + +```sql +ALTER TABLE [ IF EXISTS ] SWAP WITH +``` + +| Parameter | Description | +|----------------------|------------------------------------------------| +| `source_table_name` | The name of the first table to swap | +| `target_table_name` | The name of the second table to swap with | + +### Usage Notes + +- Only available for Fuse Engine tables. External tables, system tables, and other non-Fuse tables are not supported. +- Temporary tables cannot be swapped with permanent or transient tables. +- The current role must be the owner of both tables to perform the swap operation. +- Both tables must be in the same database. Cross-database swapping is not supported. +- The swap operation is atomic. Either both tables are swapped successfully, or neither is changed. +- All data and metadata are preserved during the swap. No data is lost or modified. 
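+
+Because the swap is atomic, a common pattern is to rebuild a staging copy of a table and publish it in a single step. The following is a minimal sketch; the table names and sample data are illustrative only:
+
+```sql
+-- Serving table and a staging copy with the same schema
+CREATE OR REPLACE TABLE daily_report(report_date DATE, total INT);
+CREATE OR REPLACE TABLE daily_report_staging(report_date DATE, total INT);
+
+-- Rebuild the staging copy with fresh data
+INSERT INTO daily_report_staging VALUES ('2025-01-01', 42);
+
+-- Publish: queries against daily_report now see the refreshed data
+ALTER TABLE daily_report_staging SWAP WITH daily_report;
+```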
+ +### Examples + +```sql +-- Create two tables with different schemas +CREATE OR REPLACE TABLE t1(a1 INT, a2 VARCHAR, a3 DATE); +CREATE OR REPLACE TABLE t2(b1 VARCHAR); + +-- Check table schemas before swap +DESC t1; +DESC t2; + +-- Swap the tables +ALTER TABLE t1 SWAP WITH t2; + +-- After swapping, t1 now has t2's schema, and t2 has t1's schema +DESC t1; +DESC t2; +``` diff --git a/tidb-cloud-lake/sql/alter-task.md b/tidb-cloud-lake/sql/alter-task.md new file mode 100644 index 0000000000000..c04589dd54f3e --- /dev/null +++ b/tidb-cloud-lake/sql/alter-task.md @@ -0,0 +1,83 @@ +--- +title: ALTER TASK +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The ALTER TASK statement is used to modify an existing task. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax + +```sql +--- suspend or resume a task +ALTER TASK [ IF EXISTS ] RESUME | SUSPEND + +--- change task settings +ALTER TASK [ IF EXISTS ] SET + [ WAREHOUSE = ] + [ SCHEDULE = { MINUTE | SECOND | USING CRON } ] + [ SUSPEND_TASK_AFTER_NUM_FAILURES = ] + [ = [ , = ... ] ] + [ COMMENT = ] + +--- change task SQL +ALTER TASK [ IF EXISTS ] MODIFY AS + +--- modify DAG when condition and after condition +ALTER TASK [ IF EXISTS ] REMOVE AFTER | ADD AFTER +--- allow to change condition for task execution +ALTER TASK [ IF EXISTS ] MODIFY WHEN +``` + +| Parameter | Description | +|----------------------------------|------------------------------------------------------------------------------------------------------| +| IF EXISTS | Optional. If specified, the task will only be altered if a task of the same name already exists. | +| name | The name of the task. This is a mandatory field. | +| RESUME \| SUSPEND | Resume or suspend the task. | +| SET | Change task settings. details parameter descriptions could be found on see [Create Task](/tidb-cloud-lake/sql/create-task.md). | +| MODIFY AS | Change task SQL. | +| REMOVE AFTER | Remove predecessor task from the task dag, task would become a standalone task or a root task if no predecessor tasks left. | +| ADD AFTER | Add predecessor task to the task dag. | +| MODIFY WHEN | Change the condition for task execution. | + +## Examples + +```sql +ALTER TASK IF EXISTS mytask SUSPEND; +``` +This command suspends the task named mytask if it exists. + +```sql +ALTER TASK IF EXISTS mytask SET + WAREHOUSE = 'new_warehouse' + SCHEDULE = USING CRON '0 12 * * * *' 'UTC'; +``` +This example alters the mytask task, changing its warehouse to new_warehouse and updating its schedule to run daily at noon UTC. + +```sql +ALTER TASK IF EXISTS mytask MODIFY +AS +INSERT INTO new_table SELECT * FROM source_table; +``` +Here, the SQL statement executed by mytask is changed to insert data from source_table into new_table. + +```sql +ALTER TASK mytaskchild MODIFY WHEN STREAM_STATUS('stream3') = False; +``` +In this example, we are modifying the mytaskchild task to change its WHEN condition. The task will now only run if the STREAM_STATUS function for 'stream3' evaluates to False. This means the task will execute when 'stream3' does not contain change data. + +```sql +ALTER TASK MyTask1 ADD AFTER 'task2'; +``` +In this example, we are adding dependencies to the MyTask1 task. It will now run after the successful completion of both 'task2' and 'task3'. This creates a dependency relationship in a Directed Acyclic Graph (DAG) of tasks. 
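+
+Error-handling settings can be adjusted with the same SET form shown in the syntax above. The following sketch suspends mytask after three consecutive failed runs; the threshold value is illustrative:
+
+```sql
+ALTER TASK IF EXISTS mytask SET
+  SUSPEND_TASK_AFTER_NUM_FAILURES = 3;
+```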
+ +```sql +ALTER TASK MyTask1 REMOVE AFTER 'task2'; +``` +Here, we are removing a specific dependency for the MyTask1 task. It will no longer run after 'task2'. This can be useful if you want to modify the task's dependencies within a DAG of tasks. + diff --git a/tidb-cloud-lake/sql/alter-user.md b/tidb-cloud-lake/sql/alter-user.md new file mode 100644 index 0000000000000..18f45c4dd7726 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-user.md @@ -0,0 +1,151 @@ +--- +title: ALTER USER +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Modifies a user account, including: + +- Changing the user's password and authentication type. +- Setting or unsetting a password policy. +- Setting or unsetting a network policy. +- Setting or modifying the default role. If it is not explicitly set, Databend will default to using the built-in role `public` as the default role. + +## Syntax + +```sql +-- Modify password / authentication type +ALTER USER IDENTIFIED [ WITH auth_type ] BY '' [ WITH MUST_CHANGE_PASSWORD = true | false ] + +-- Require user to modify password at next login +ALTER USER WITH MUST_CHANGE_PASSWORD = true + +-- Modify password for currently logged-in user +ALTER USER USER() IDENTIFIED BY '' + +-- Set password policy +ALTER USER WITH SET PASSWORD POLICY = '' + +-- Unset password policy +ALTER USER WITH UNSET PASSWORD POLICY + +-- Set network policy +ALTER USER WITH SET NETWORK POLICY = '' + +-- Unset network policy +ALTER USER WITH UNSET NETWORK POLICY + +-- Set default role +ALTER USER WITH DEFAULT_ROLE = '' + +-- Enable or disable user +ALTER USER WITH DISABLED = true | false + +-- Set workload group +ALTER USER WITH SET WORKLOAD GROUP = '' + +-- Unset workload group +ALTER USER WITH UNSET WORKLOAD GROUP +``` + +- *auth_type* can be `double_sha1_password` (default), `sha256_password` or `no_password`. +- When `MUST_CHANGE_PASSWORD` is set to `true`, the user must change their password at the next login. Please note that this takes effect only for users who have never changed their password since their account was created. If a user has ever changed their password themselves, then they do not need to change it again. +- When you set a default role for a user using [CREATE USER](/tidb-cloud-lake/sql/create-user.md) or ALTER USER, Databend does not verify the role's existence or automatically grant the role to the user. You must explicitly grant the role to the user for the role to take effect. +- `DISABLED` allows you to enable or disable a user. Disabled users cannot log in to Databend until they are enabled. Click [here](/tidb-cloud-lake/sql/create-user.md#example-5-creating-user-in-disabled-state) to see an example. 
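+
+For instance, a minimal sketch that temporarily disables an account and then restores access (it assumes a user named user1 already exists):
+
+```sql
+ALTER USER user1 WITH DISABLED = true;   -- user1 can no longer log in
+ALTER USER user1 WITH DISABLED = false;  -- user1 can log in again
+```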
+ + +## Examples + +### Example 1: Changing Password & Authentication Type + +```sql +CREATE USER user1 IDENTIFIED BY 'abc123'; + +SHOW USERS; ++-----------+----------+----------------------+---------------+ +| name | hostname | auth_type | is_configured | ++-----------+----------+----------------------+---------------+ +| user1 | % | double_sha1_password | NO | ++-----------+----------+----------------------+---------------+ + +ALTER USER user1 IDENTIFIED WITH sha256_password BY '123abc'; + +SHOW USERS; ++-------+----------+-----------------+---------------+ +| name | hostname | auth_type | is_configured | ++-------+----------+-----------------+---------------+ +| user1 | % | sha256_password | NO | ++-------+----------+-----------------+---------------+ + +ALTER USER 'user1' IDENTIFIED WITH no_password; + +show users; ++-------+----------+-------------+---------------+ +| name | hostname | auth_type | is_configured | ++-------+----------+-------------+---------------+ +| user1 | % | no_password | NO | ++-------+----------+-------------+---------------+ +``` + +### Example 2: Setting & Unsetting Network Policy + +```sql +SHOW NETWORK POLICIES; + +Name |Allowed Ip List |Blocked Ip List|Comment | +------------+-------------------------+---------------+-----------+ +test_policy |192.168.10.0,192.168.20.0| |new comment| +test_policy1|192.168.100.0/24 | | | + +CREATE USER user1 IDENTIFIED BY 'abc123'; + +ALTER USER user1 WITH SET NETWORK POLICY='test_policy'; + +ALTER USER user1 WITH SET NETWORK POLICY='test_policy1'; + +ALTER USER user1 WITH UNSET NETWORK POLICY; +``` + +### Example 3: Setting Default Role + +1. Create a user named "user1" and set the default role as "writer": + +```sql title='Connect as user "root":' + +CREATE USER user1 IDENTIFIED BY 'abc123'; + +GRANT ROLE developer TO user1; + +GRANT ROLE writer TO user1; + +ALTER USER user1 WITH DEFAULT_ROLE = 'writer'; +``` + +2. Verify the default role of user "user1" using the [SHOW ROLES](/tidb-cloud-lake/sql/show-roles.md) command: + +```sql title='Connect as user "user1":' +eric@Erics-iMac ~ % bendsql --user user1 --password abc123 +show roles; +┌───────────────────────────────────────────────────────┐ +│ name │ inherited_roles │ is_current │ is_default │ +│ String │ UInt64 │ Boolean │ Boolean │ +├───────────┼─────────────────┼────────────┼────────────┤ +│ developer │ 0 │ false │ false │ +│ public │ 0 │ false │ false │ +│ writer │ 0 │ true │ true │ +└───────────────────────────────────────────────────────┘ +``` + +### Example 2: Setting & Unsetting Workload Group + +```sql +CREATE USER user1 IDENTIFIED BY 'abc123'; + +ALTER USER user1 WITH SET WORKLOAD GROUP='wg'; + +ALTER USER user1 WITH SET WORKLOAD GROUP='wg1'; + +ALTER USER user1 WITH UNSET WORKLOAD GROUP; +``` diff --git a/tidb-cloud-lake/sql/alter-view.md b/tidb-cloud-lake/sql/alter-view.md new file mode 100644 index 0000000000000..1b3b2c8de49d2 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-view.md @@ -0,0 +1,38 @@ +--- +title: ALTER VIEW +sidebar_position: 2 +--- + +Alter the existing view by using another `QUERY`. + +## Syntax + +```sql +ALTER VIEW [ . ]view_name [ (, ...) 
] AS SELECT query +``` + +## Examples + +```sql +CREATE VIEW tmp_view AS SELECT number % 3 AS a, avg(number) FROM numbers(1000) GROUP BY a ORDER BY a; + +SELECT * FROM tmp_view; ++------+-------------+ +| a | avg(number) | ++------+-------------+ +| 0 | 499.5 | +| 1 | 499.0 | +| 2 | 500.0 | ++------+-------------+ + +ALTER VIEW tmp_view(c1) AS SELECT * from numbers(3); + +SELECT * FROM tmp_view; ++------+ +| c1 | ++------+ +| 0 | +| 1 | +| 2 | ++------+ +``` diff --git a/tidb-cloud-lake/sql/alter-warehouse.md b/tidb-cloud-lake/sql/alter-warehouse.md new file mode 100644 index 0000000000000..34799c1ef8266 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-warehouse.md @@ -0,0 +1,95 @@ +--- +title: ALTER WAREHOUSE +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Suspends, resumes, or modifies settings of an existing warehouse. + +## Syntax + +```sql +-- Suspend or resume a warehouse +ALTER WAREHOUSE { SUSPEND | RESUME } + +-- Modify warehouse settings +ALTER WAREHOUSE + SET [ warehouse_size = ] + [ auto_suspend = ] + [ auto_resume = ] + [ max_cluster_count = ] + [ min_cluster_count = ] + [ comment = '' ] + +ALTER WAREHOUSE SET TAG = '' [ , = '' ... ] + +ALTER WAREHOUSE UNSET TAG [ , ... ] + +ALTER WAREHOUSE RENAME TO +``` + +| Parameter | Description | +| --------- | ---------------------------------------------------------------------------- | +| `SUSPEND` | Immediately suspends the warehouse. | +| `RESUME` | Immediately resumes the warehouse. | +| `SET` | Modifies one or more warehouse options. Unspecified fields remain unchanged. | + +## Options + +The `SET` clause accepts the same options as [CREATE WAREHOUSE](/tidb-cloud-lake/sql/create-warehouse.md): + +| Option | Type / Values | Description | +| ------------------- | ------------------------------------------------------------------- | -------------------------------------------------------------------- | +| `WAREHOUSE_SIZE` | `XSmall`, `Small`, `Medium`, `Large`, `XLarge`, `2XLarge`–`6XLarge` | Changes compute size. | +| `AUTO_SUSPEND` | `NULL`, `0`, or ≥300 seconds | Idle timeout before automatic suspend. `NULL` disables auto-suspend. | +| `AUTO_RESUME` | Boolean | Controls whether incoming queries wake the warehouse automatically. | +| `MAX_CLUSTER_COUNT` | `NULL` or non-negative integer | Upper bound for auto-scaling clusters. | +| `MIN_CLUSTER_COUNT` | `NULL` or non-negative integer | Lower bound for auto-scaling clusters. | +| `COMMENT` | String | Free-form text description. | + +- `NULL` is valid for numeric options to reset them to `0`. +- Supplying `SET` with no options raises an error. +- `SET TAG` adds or updates one or more tags. Multiple tags can be set in a single statement separated by commas. +- `UNSET TAG` removes one or more tags by their keys. Non-existent tag keys are silently ignored. +- `RENAME TO` requires the warehouse to be suspended and uses the same naming rules as `CREATE`. 
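+
+Renaming follows the suspend-first rule above. A minimal sketch, using a hypothetical new name:
+
+```sql
+-- The warehouse must be suspended before it can be renamed
+ALTER WAREHOUSE my_wh SUSPEND;
+ALTER WAREHOUSE my_wh RENAME TO reporting_wh;
+```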
+ +## Examples + +Suspend a warehouse: + +```sql +ALTER WAREHOUSE my_wh SUSPEND; +``` + +Resume a warehouse: + +```sql +ALTER WAREHOUSE my_wh RESUME; +``` + +Modify warehouse settings: + +```sql +ALTER WAREHOUSE my_wh + SET warehouse_size = Large + auto_resume = TRUE + comment = 'Serving tier'; +``` + +Disable auto-suspend: + +```sql +ALTER WAREHOUSE my_wh SET auto_suspend = NULL; +``` + +Manage tags: + +```sql +ALTER WAREHOUSE wh_hot SET TAG environment = 'production'; +ALTER WAREHOUSE wh_hot SET TAG environment = 'staging', owner = 'john', cost_center = 'eng'; +ALTER WAREHOUSE wh_hot UNSET TAG environment; +ALTER WAREHOUSE wh_hot UNSET TAG environment, owner, cost_center; +``` diff --git a/tidb-cloud-lake/sql/alter-workload-group.md b/tidb-cloud-lake/sql/alter-workload-group.md new file mode 100644 index 0000000000000..97a07f3051574 --- /dev/null +++ b/tidb-cloud-lake/sql/alter-workload-group.md @@ -0,0 +1,32 @@ +--- +title: ALTER WORKLOAD GROUP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Update a workload group with specified quota settings. + +## Syntax + +```sql +ALTER WORKLOAD GROUP +[SET cpu_quota = '', query_timeout = ''] +``` + +## Parameters + +| Parameter | Type | Required | Default | Description | +|------------------------|----------|----------|--------------|-----------------------------------------------------------------------------| +| `cpu_quota` | string | No | (unlimited) | CPU resource quota as percentage string (e.g. `"20%"`) | +| `query_timeout` | duration | No | (unlimited) | Query timeout duration (units: `s`/`sec`=seconds, `m`/`min`=minutes, `h`/`hour`=hours, `d`/`day`=days, `ms`=milliseconds, unitless=seconds) | +| `memory_quota` | string or integer | No | (unlimited) | Maximum memory usage limit for workload group (percentage or absolute value) | +| `max_concurrency` | integer | No | (unlimited) | Maximum concurrency number for workload group | +| `query_queued_timeout` | duration | No | (unlimited) | Maximum queuing wait time when workload group exceeds max concurrency (units: `s`/`sec`=seconds, `m`/`min`=minutes, `h`/`hour`=hours, `d`/`day`=days, `ms`=milliseconds, unitless=seconds) | + +## Examples + +```sql +ALTER WORKLOAD GROUP analytics SET cpu_quota = '20%', query_timeout = '10m'; +``` + diff --git a/tidb-cloud-lake/sql/any-value.md b/tidb-cloud-lake/sql/any-value.md new file mode 100644 index 0000000000000..24445aa0271d3 --- /dev/null +++ b/tidb-cloud-lake/sql/any-value.md @@ -0,0 +1,79 @@ +--- +title: ANY_VALUE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Aggregate function. + +The `ANY_VALUE()` function returns an arbitrary non-NULL value from the input expression. It's used in `GROUP BY` queries when you need to select a column that isn't grouped or aggregated. + +> **Alias:** `ANY()` returns the same result as `ANY_VALUE()` and remains available for compatibility. + +## Syntax + +```sql +ANY_VALUE() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | Any expression | + +## Return Type + +The type of ``. If all values are NULL, the return value is NULL. + +:::note +- `ANY_VALUE()` is non-deterministic and may return different values across executions. +- For predictable results, use `MIN()` or `MAX()` instead. 
+::: + +## Example + +**Sample Data:** +```sql +CREATE TABLE sales ( + region VARCHAR, + manager VARCHAR, + sales_amount DECIMAL(10, 2) +); + +INSERT INTO sales VALUES + ('North', 'Alice', 15000.00), + ('North', 'Alice', 12000.00), + ('South', 'Bob', 20000.00); +``` + +**Problem:** This query fails because `manager` isn't in GROUP BY: +```sql +SELECT region, manager, SUM(sales_amount) -- ❌ Error +FROM sales GROUP BY region; +``` + +**Old approach:** Add `manager` to GROUP BY, but this creates more groups than needed and hurts performance: +```sql +SELECT region, manager, SUM(sales_amount) +FROM sales GROUP BY region, manager; -- ❌ Poor performance due to extra grouping +``` + +**Better solution:** Use `ANY_VALUE()` to select the manager: +```sql +SELECT + region, + ANY_VALUE(manager) AS manager, -- ✅ Works + SUM(sales_amount) AS total_sales +FROM sales +GROUP BY region; +``` + +**Result:** +```text +| region | manager | total_sales | +|--------|---------|-------------| +| North | Alice | 27000.00 | +| South | Bob | 20000.00 | +``` diff --git a/tidb-cloud-lake/sql/apache-hive-tables.md b/tidb-cloud-lake/sql/apache-hive-tables.md new file mode 100644 index 0000000000000..df6598d4bd9eb --- /dev/null +++ b/tidb-cloud-lake/sql/apache-hive-tables.md @@ -0,0 +1,75 @@ +--- +id: hive +title: Apache Hive Tables +sidebar_label: Apache Hive Tables +slug: /sql-reference/table-engines/hive +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend can query data that is cataloged by Apache Hive without copying it. Register the Hive Metastore as a Databend catalog, point to the object storage that holds the table data, and then query the tables as if they were native Databend objects. + +## Quick Start + +1. **Register the Hive Metastore** + + ```sql + CREATE CATALOG hive_prod + TYPE = HIVE + CONNECTION = ( + METASTORE_ADDRESS = '127.0.0.1:9083' + URL = 's3://lakehouse/' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' + ); + ``` + +2. **Explore the catalog** + + ```sql + USE CATALOG hive_prod; + SHOW DATABASES; + SHOW TABLES FROM tpch; + ``` + +3. **Query Hive tables** + + ```sql + SELECT l_orderkey, SUM(l_extendedprice) AS revenue + FROM tpch.lineitem + GROUP BY l_orderkey + ORDER BY revenue DESC + LIMIT 10; + ``` + +## Keep Metadata Fresh + +Hive schemas or partitions can change outside of Databend. Refresh Databend’s cached metadata when that happens: + +```sql +ALTER TABLE tpch.lineitem REFRESH CACHE; +``` + +## Data Type Mapping + +Databend automatically converts Hive primitive types to their closest native equivalents when queries run: + +| Hive Type | Databend Type | +| --------- | ------------- | +| `BOOLEAN` | [BOOLEAN](/tidb-cloud-lake/sql/boolean.md) | +| `TINYINT`, `SMALLINT`, `INT`, `BIGINT` | [Integer types](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | +| `FLOAT`, `DOUBLE` | [Floating-point types](/tidb-cloud-lake/sql/numeric.md#floating-point-data-types) | +| `DECIMAL(p,s)` | [DECIMAL](/tidb-cloud-lake/sql/decimal.md) | +| `STRING`, `VARCHAR`, `CHAR` | [STRING](/tidb-cloud-lake/sql/string.md) | +| `DATE`, `TIMESTAMP` | [DATETIME](/tidb-cloud-lake/sql/datetime.md) | +| `ARRAY` | [ARRAY](/tidb-cloud-lake/sql/array.md) | +| `MAP` | [MAP](/tidb-cloud-lake/sql/map.md) | + +Nested structures such as `STRUCT` are surfaced through the [VARIANT](/tidb-cloud-lake/sql/variant.md) type. + +## Notes and Limitations + +- Hive catalogs are **read-only** in Databend (writes must happen through Hive-compatible engines). 
+- Access to the underlying object storage is required; configure credentials by using [connection parameters](/tidb-cloud-lake/sql/connection-parameters.md). +- Use `ALTER TABLE ... REFRESH CACHE` whenever table layout changes (for example, new partitions) to keep query results up to date. diff --git a/tidb-cloud-lake/sql/apache-icebergtm-tables.md b/tidb-cloud-lake/sql/apache-icebergtm-tables.md new file mode 100644 index 0000000000000..96da01762dbf8 --- /dev/null +++ b/tidb-cloud-lake/sql/apache-icebergtm-tables.md @@ -0,0 +1,563 @@ +--- +id: iceberg +title: Apache Iceberg™ Tables +sidebar_label: Apache Iceberg™ Tables +slug: /sql-reference/table-engines/iceberg +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend supports the integration of an [Apache Iceberg™](https://iceberg.apache.org/) catalog, enhancing its compatibility and versatility for data management and analytics. This extends Databend's capabilities by seamlessly incorporating the powerful metadata and storage management capabilities of Apache Iceberg™ into the platform. + +## Quick Start with Iceberg + +If you want to quickly try out Iceberg and experiment with table operations locally, a [Docker-based starter project](https://github.com/databendlabs/iceberg-quick-start) is available. This setup allows you to: + +- Run Spark with Iceberg support +- Use a REST catalog (Iceberg REST Fixture) +- Simulate an S3-compatible object store using MinIO +- Load sample TPC-H data into Iceberg tables for query testing + +### Prerequisites + +Before you start, make sure Docker and Docker Compose are installed on your system. + +### Start Iceberg Environment + +```bash +git clone https://github.com/databendlabs/iceberg-quick-start.git +cd iceberg-quick-start +docker compose up -d +``` + +This will start the following services: + +- `spark-iceberg`: Spark 3.4 with Iceberg +- `rest`: Iceberg REST Catalog +- `minio`: S3-compatible object store +- `mc`: MinIO client (for setting up the bucket) + +```bash +WARN[0000] /Users/eric/iceberg-quick-start/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion +[+] Running 5/5 + ✔ Network iceberg-quick-start_iceberg_net Created 0.0s + ✔ Container iceberg-rest-test Started 0.4s + ✔ Container minio Started 0.4s + ✔ Container mc Started 0.6s + ✔ Container spark-iceberg S... 0.7s +``` + +### Load TPC-H Data via Spark Shell + +Run the following command to generate and load sample TPC-H data into the Iceberg tables: + +```bash +docker exec spark-iceberg bash /home/iceberg/load_tpch.sh +``` + +```bash +Collecting duckdb + Downloading duckdb-1.2.2-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (18.7 MB) + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.7/18.7 MB 5.8 MB/s eta 0:00:00 +Requirement already satisfied: pyspark in /opt/spark/python (3.5.5) +Collecting py4j==0.10.9.7 + Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB) + ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 5.9 MB/s eta 0:00:00 +Installing collected packages: py4j, duckdb +Successfully installed duckdb-1.2.2 py4j-0.10.9.7 + +[notice] A new release of pip is available: 23.0.1 -> 25.1.1 +[notice] To update, run: pip install --upgrade pip +Setting default log level to "WARN". +To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). +25/05/07 12:17:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable +25/05/07 12:17:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. +[2025-05-07 12:17:18] [INFO] Starting TPC-H data generation and loading process +[2025-05-07 12:17:18] [INFO] Configuration: Scale Factor=1, Data Dir=/home/iceberg/data/tpch_1 +[2025-05-07 12:17:18] [INFO] Generating TPC-H data with DuckDB (Scale Factor: 1) +[2025-05-07 12:17:27] [INFO] Generated 8 Parquet files in /home/iceberg/data/tpch_1 +[2025-05-07 12:17:28] [INFO] Loading data into Iceberg catalog +[2025-05-07 12:17:33] [INFO] Created Iceberg table: demo.tpch.part from part.parquet +[2025-05-07 12:17:33] [INFO] Created Iceberg table: demo.tpch.region from region.parquet +[2025-05-07 12:17:33] [INFO] Created Iceberg table: demo.tpch.supplier from supplier.parquet +[2025-05-07 12:17:35] [INFO] Created Iceberg table: demo.tpch.orders from orders.parquet +[2025-05-07 12:17:35] [INFO] Created Iceberg table: demo.tpch.nation from nation.parquet +[2025-05-07 12:17:40] [INFO] Created Iceberg table: demo.tpch.lineitem from lineitem.parquet +[2025-05-07 12:17:40] [INFO] Created Iceberg table: demo.tpch.partsupp from partsupp.parquet +[2025-05-07 12:17:41] [INFO] Created Iceberg table: demo.tpch.customer from customer.parquet ++---------+---------+-----------+ +|namespace|tableName|isTemporary| ++---------+---------+-----------+ +| tpch| customer| false| +| tpch| lineitem| false| +| tpch| nation| false| +| tpch| orders| false| +| tpch| part| false| +| tpch| partsupp| false| +| tpch| region| false| +| tpch| supplier| false| ++---------+---------+-----------+ + +[2025-05-07 12:17:42] [SUCCESS] TPCH data generation and loading completed successfully +``` + +### Query Data in Databend + +Once the TPC-H tables are loaded, you can query the data in Databend: + +1. Launch Databend in Docker: + +```bash +docker network create iceberg_net +``` + +```bash +docker run -d \ + --name databend \ + --network iceberg_net \ + -p 3307:3307 \ + -p 8000:8000 \ + -p 8124:8124 \ + -p 8900:8900 \ + datafuselabs/databend +``` + +2. Connect to Databend using BendSQL first, and then create an Iceberg catalog: + +```bash +bendsql +``` + +```bash +Welcome to BendSQL 0.24.1-f1f7de0(2024-12-04T12:31:18.526234000Z). +Connecting to localhost:8000 as user root. +Connected to Databend Query v1.2.725-8d073f6b7a(rust-1.88.0-nightly-2025-04-21T11:49:03.577976082Z) +Loaded 1436 auto complete keywords from server. +Started web server at 127.0.0.1:8080 +``` + +```sql +CREATE CATALOG iceberg TYPE = ICEBERG CONNECTION = ( + TYPE = 'rest' + ADDRESS = 'http://host.docker.internal:8181' + warehouse = 's3://warehouse/wh/' + "s3.endpoint" = 'http://host.docker.internal:9000' + "s3.access-key-id" = 'admin' + "s3.secret-access-key" = 'password' + "s3.region" = 'us-east-1' +); +``` + +3. Use the newly created catalog: + +```sql +USE CATALOG iceberg; +``` + +4. Show available databases: + +```sql +SHOW DATABASES; +``` + +```sql +╭──────────────────────╮ +│ databases_in_iceberg │ +│ String │ +├──────────────────────┤ +│ tpch │ +╰──────────────────────╯ +``` + +5. 
Run a sample query to aggregate TPC-H data: + +```bash +SELECT + l_returnflag, + l_linestatus, + SUM(l_quantity) AS sum_qty, + SUM(l_extendedprice) AS sum_base_price, + SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price, + SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, + AVG(l_quantity) AS avg_qty, + AVG(l_extendedprice) AS avg_price, + AVG(l_discount) AS avg_disc, + COUNT(*) AS count_order +FROM + iceberg.tpch.lineitem +GROUP BY + l_returnflag, + l_linestatus +ORDER BY + l_returnflag, + l_linestatus; +``` + +```sql +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │ +│ Nullable(String) │ Nullable(String) │ Nullable(Decimal(38, 2)) │ Nullable(Decimal(38, 2)) │ Nullable(Decimal(38, 4)) │ Nullable(Decimal(38, 6)) │ Nullable(Decimal(38, 8)) │ Nullable(Decimal(38, 8)) │ Nullable(Decimal(38, 8)) │ UInt64 │ +├──────────────────┼──────────────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┼──────────────────────────┼─────────────┤ +│ A │ F │ 37734107.00 │ 56586554400.73 │ 53758257134.8700 │ 55909065222.827692 │ 25.52200585 │ 38273.12973462 │ 0.04998530 │ 1478493 │ +│ N │ F │ 991417.00 │ 1487504710.38 │ 1413082168.0541 │ 1469649223.194375 │ 25.51647192 │ 38284.46776085 │ 0.05009343 │ 38854 │ +│ N │ O │ 76633518.00 │ 114935210409.19 │ 109189591897.4720 │ 113561024263.013782 │ 25.50201964 │ 38248.01560906 │ 0.05000026 │ 3004998 │ +│ R │ F │ 37719753.00 │ 56568041380.90 │ 53741292684.6040 │ 55889619119.831932 │ 25.50579361 │ 38250.85462610 │ 0.05000941 │ 1478870 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +## Datatype Mapping + +This table maps data types between Apache Iceberg™ and Databend. Please note that Databend does not currently support Iceberg data types that are not listed in the table. 
+ +| Apache Iceberg™ | Databend | +| ------------------------------------------- | ------------------------------------------------------------------------ | +| BOOLEAN | [BOOLEAN](/tidb-cloud-lake/sql/boolean.md) | +| INT | [INT32](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | +| LONG | [INT64](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | +| DATE | [DATE](/tidb-cloud-lake/sql/datetime.md) | +| TIMESTAMP/TIMESTAMPZ | [TIMESTAMP](/tidb-cloud-lake/sql/datetime.md) | +| FLOAT | [FLOAT](/tidb-cloud-lake/sql/numeric.md#floating-point-data-types) | +| DOUBLE | [DOUBLE](/tidb-cloud-lake/sql/numeric.md#floating-point-data-type) | +| STRING/BINARY | [STRING](/tidb-cloud-lake/sql/string.md) | +| DECIMAL | [DECIMAL](/tidb-cloud-lake/sql/decimal.md) | +| ARRAY<TYPE> | [ARRAY](/tidb-cloud-lake/sql/array.md), supports nesting | +| MAP<KEYTYPE, VALUETYPE> | [MAP](/tidb-cloud-lake/sql/map.md) | +| STRUCT<COL1: TYPE1, COL2: TYPE2, ...> | [TUPLE](/tidb-cloud-lake/sql/tuple.md) | +| LIST | [ARRAY](/tidb-cloud-lake/sql/array.md) | + +## Managing Catalogs + +Databend provides you the following commands to manage catalogs: + +- [CREATE CATALOG](#create-catalog) +- [SHOW CREATE CATALOG](#show-create-catalog) +- [SHOW CATALOGS](#show-catalogs) +- [USE CATALOG](#use-catalog) + +### CREATE CATALOG + +Defines and establishes a new catalog in the Databend query engine. + +#### Syntax + +```sql +CREATE CATALOG +TYPE=ICEBERG +CONNECTION=( + TYPE='' + ADDRESS='
' + WAREHOUSE='' + ""='' + ""='' + ... +); +``` + +| Parameter | Required? | Description | +| ---------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `` | Yes | The name of the catalog you want to create. | +| `TYPE` | Yes | Specifies the catalog type. For Apache Iceberg™, set to `ICEBERG`. | +| `CONNECTION` | Yes | The connection parameters for the Iceberg catalog. | +| `TYPE` (inside `CONNECTION`) | Yes | The connection type. For Iceberg, it is typically set to `rest` for REST-based connection. | +| `ADDRESS` | Yes | The address or URL of the Iceberg service (e.g., `http://127.0.0.1:8181`). | +| `WAREHOUSE` | Yes | The location of the Iceberg warehouse, usually an S3 bucket or compatible object storage system. | +| `` | Yes | Connection parameters to establish connections with external storage. The required parameters vary based on the specific storage service and authentication methods. See the table below for a full list of the available parameters. | + +| Connection Parameter | Description | +| --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | +| `s3.endpoint` | S3 endpoint. | +| `s3.access-key-id` | S3 access key ID. | +| `s3.secret-access-key` | S3 secret access key. | +| `s3.session-token` | S3 session token, required when using temporary credentials. | +| `s3.region` | S3 region. | +| `client.region` | Region to use for the S3 client, takes precedence over `s3.region`. | +| `s3.path-style-access` | S3 Path Style Access. | +| `s3.sse.type` | S3 Server-Side Encryption (SSE) type. | +| `s3.sse.key` | S3 SSE key. If encryption type is `kms`, this is a KMS Key ID. If encryption type is `custom`, this is a base-64 AES256 symmetric key. | +| `s3.sse.md5` | S3 SSE MD5 checksum. | +| `client.assume-role.arn` | ARN of the IAM role to assume instead of using the default credential chain. | +| `client.assume-role.external-id` | Optional external ID used to assume an IAM role. | +| `client.assume-role.session-name` | Optional session name used to assume an IAM role. | +| `s3.allow-anonymous` | Option to allow anonymous access (e.g., for public buckets/folders). | +| `s3.disable-ec2-metadata` | Option to disable loading credentials from EC2 metadata (typically used with `s3.allow-anonymous`). | +| `s3.disable-config-load` | Option to disable loading configuration from config files and environment variables. | + +### Catalog Types + +Databend supports four types of Iceberg catalogs: + +- REST Catalog + +REST catalog uses a RESTful API approach to interact with Iceberg tables. + +```sql +CREATE CATALOG iceberg_rest TYPE = ICEBERG CONNECTION = ( + TYPE = 'rest' + ADDRESS = 'http://localhost:8181' + warehouse = 's3://warehouse/demo/' + "s3.endpoint" = 'http://localhost:9000' + "s3.access-key-id" = 'admin' + "s3.secret-access-key" = 'password' + "s3.region" = 'us-east-1' +) + +- AWS Glue Catalog +For Glue catalogs, the configuration includes both Glue service parameters and storage (S3) parameters. The Glue service parameters appear first, followed by the S3 storage parameters (prefixed with "s3."). 
+ +```sql +CREATE CATALOG iceberg_glue TYPE = ICEBERG CONNECTION = ( + TYPE = 'glue' + ADDRESS = 'http://localhost:5000' + warehouse = 's3a://warehouse/glue/' + "aws_access_key_id" = 'my_access_id' + "aws_secret_access_key" = 'my_secret_key' + "region_name" = 'us-east-1' + "s3.endpoint" = 'http://localhost:9000' + "s3.access-key-id" = 'admin' + "s3.secret-access-key" = 'password' + "s3.region" = 'us-east-1' +) +``` + +- Storage Catalog (S3Tables Catalog) + +The Storage catalog requires a table_bucket_arn parameter. Unlike other buckets, S3Tables bucket is not a physical bucket, but a virtual bucket that is managed by S3Tables. You cannot directly access the bucket with a path like `s3://{bucket_name}/{file_path}`. All operations are performed with respect to the bucket ARN. + +Properties Parameters +The following properties are available for the catalog: + +``` +profile_name: The name of the AWS profile to use. +region_name: The AWS region to use. +aws_access_key_id: The AWS access key ID to use. +aws_secret_access_key: The AWS secret access key to use. +aws_session_token: The AWS session token to use. +``` + +```sql +CREATE CATALOG iceberg_storage TYPE = ICEBERG CONNECTION = ( + TYPE = 'storage' + ADDRESS = 'http://localhost:9111' + "table_bucket_arn" = 'my-bucket' + -- Additional properties as needed +) +``` + +- Hive Catalog (HMS Catalog) + +The Hive catalog requires an ADDRESS parameter, which is the address of the Hive metastore. It also requires a warehouse parameter, which is the location of the Iceberg warehouse, usually an S3 bucket or compatible object storage system. + +```sql +CREATE CATALOG iceberg_hms TYPE = ICEBERG CONNECTION = ( + TYPE = 'hive' + ADDRESS = '192.168.10.111:9083' + warehouse = 's3a://warehouse/hive/' + "s3.endpoint" = 'http://localhost:9000' + "s3.access-key-id" = 'admin' + "s3.secret-access-key" = 'password' + "s3.region" = 'us-east-1' +) +``` + +### SHOW CREATE CATALOG + +Returns the detailed configuration of a specified catalog, including its type and storage parameters. + +#### Syntax + +```sql +SHOW CREATE CATALOG ; +``` + +### SHOW CATALOGS + +Shows all the created catalogs. + +#### Syntax + +```sql +SHOW CATALOGS [LIKE ''] +``` + +### USE CATALOG + +Switches the current session to the specified catalog. + +#### Syntax + +```sql +USE CATALOG +``` + +## Caching Iceberg Catalog + +Databend offers a Catalog Metadata Cache specifically designed for Iceberg catalogs. When a query is executed on an Iceberg table for the first time, the metadata is cached in memory. By default, this cache remains valid for 10 minutes, after which it is asynchronously refreshed. This ensures that queries on Iceberg tables are faster by avoiding repeated metadata retrieval. + +If you need fresh metadata, you can manually refresh the cache using the following commands: + +```sql +USE CATALOG iceberg; +ALTER DATABASE tpch REFRESH CACHE; -- Refresh metadata cache for the tpch database +ALTER TABLE tpch.lineitem REFRESH CACHE; -- Refresh metadata cache for the lineitem table +``` + +If you prefer not to use the metadata cache, you can disable it entirely by configuring the `iceberg_table_meta_count` setting to `0` in the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file: + +```toml +... +# Cache config. +[cache] +... +iceberg_table_meta_count = 0 +... +``` + +In addition to metadata caching, Databend also supports table data caching for Iceberg catalog tables, similar to Fuse tables. 
For more information on data caching, refer to the [cache section](/guides/self-hosted/references/node-config/query-config#cache-section) in the Query Configurations reference. + +## Writing to Iceberg Tables + +Databend supports writing data to Iceberg tables using `INSERT INTO`. You can create Iceberg tables directly with the `ENGINE = ICEBERG` clause and optionally define partition columns using `PARTITION BY`. + +### Creating Iceberg Tables + +#### Syntax + +```sql +CREATE TABLE ( + +) ENGINE = ICEBERG +[PARTITION BY ([, , ...])]; +``` + +- `ENGINE = ICEBERG`: Specifies that the table is stored in Iceberg format. +- `PARTITION BY`: Optional. Defines one or more columns for partitioning the table data. + +#### Supported Data Types + +The following Databend data types are supported for writing to Iceberg tables: + +| Databend Type | Iceberg Type | +|---------------|-------------| +| BOOLEAN | Boolean | +| INT | Int | +| BIGINT | Long | +| FLOAT | Float | +| DOUBLE | Double | +| STRING | String | +| DATE | Date | +| TIMESTAMP | Timestamp | + +### Inserting Data + +Use standard `INSERT INTO` statements to write data into Iceberg tables: + +```sql +INSERT INTO VALUES (...), (...); +``` + +Both partitioned and non-partitioned tables support single-row and multi-row inserts. For partitioned tables, Databend automatically routes rows to the correct partitions. Null values in partition columns are also supported. + +### Examples + +#### Non-Partitioned Table + +```sql +CREATE TABLE t_scores(id INT, name STRING, score DOUBLE) ENGINE = ICEBERG; + +INSERT INTO t_scores VALUES (1, 'alice', 85.5); +INSERT INTO t_scores VALUES (2, 'bob', 90.0), (3, 'charlie', 75.5); + +SELECT * FROM t_scores; + +┌──────────────────────────────────────────┐ +│ id │ name │ score │ +├────────┼──────────┼─────────────────────┤ +│ 1 │ alice │ 85.5 │ +│ 2 │ bob │ 90.0 │ +│ 3 │ charlie │ 75.5 │ +└──────────────────────────────────────────┘ +``` + +#### Single-Field Partitioned Table + +```sql +CREATE TABLE t_partitioned(id INT, category STRING, amount DOUBLE) +ENGINE = ICEBERG +PARTITION BY (category); + +INSERT INTO t_partitioned VALUES (1, 'A', 100.5); +INSERT INTO t_partitioned VALUES (2, 'B', 200.0), (3, 'A', 150.5), (4, 'C', 400.0); + +SELECT * FROM t_partitioned; + +┌──────────────────────────────────────────────┐ +│ id │ category │ amount │ +├────────┼────────────┼────────────────────────┤ +│ 1 │ A │ 100.5 │ +│ 3 │ A │ 150.5 │ +│ 2 │ B │ 200.0 │ +│ 4 │ C │ 400.0 │ +└──────────────────────────────────────────────┘ +``` + +#### Multi-Field Partitioned Table + +```sql +CREATE TABLE t_multi_part(id INT, region STRING, year INT, amount DOUBLE) +ENGINE = ICEBERG +PARTITION BY (region, year); + +INSERT INTO t_multi_part VALUES + (1, 'US', 2023, 100.5), + (2, 'EU', 2023, 200.5), + (3, 'US', 2024, 300.5), + (4, 'EU', 2024, 400.5); + +-- Insert into existing partitions +INSERT INTO t_multi_part VALUES + (5, 'US', 2023, 500.5); + +-- Null values in partition columns are supported +INSERT INTO t_multi_part VALUES + (6, NULL, 2023, 600.5), + (7, 'US', NULL, 700.5); + +SELECT * FROM t_multi_part; + +┌──────────────────────────────────────────────────────┐ +│ id │ region │ year │ amount │ +├────────┼──────────┼──────────┼───────────────────────┤ +│ 1 │ US │ 2023 │ 100.5 │ +│ 5 │ US │ 2023 │ 500.5 │ +│ 3 │ US │ 2024 │ 300.5 │ +│ 2 │ EU │ 2023 │ 200.5 │ +│ 4 │ EU │ 2024 │ 400.5 │ +│ 6 │ NULL │ 2023 │ 600.5 │ +│ 7 │ US │ NULL │ 700.5 │ +└──────────────────────────────────────────────────────┘ +``` + +## Apache Iceberg™ Table Functions 
+ +Databend provides the following table functions for querying Iceberg metadata, allowing users to inspect snapshots and manifests efficiently: + +- [ICEBERG_MANIFEST](/tidb-cloud-lake/sql/iceberg-manifest.md) +- [ICEBERG_SNAPSHOT](/tidb-cloud-lake/sql/iceberg-snapshot.md) + +## Usage Examples + +This example shows how to create an Iceberg catalog using a REST-based connection, specifying the service address, warehouse location (S3), and optional parameters like AWS region and custom endpoint: + +```sql +CREATE CATALOG ctl +TYPE=ICEBERG +CONNECTION=( + TYPE='rest' + ADDRESS='http://127.0.0.1:8181' + WAREHOUSE='s3://iceberg-tpch' + "s3.region"='us-east-1' + "s3.endpoint"='http://127.0.0.1:9000' +); +``` diff --git a/tidb-cloud-lake/sql/approx-count-distinct.md b/tidb-cloud-lake/sql/approx-count-distinct.md new file mode 100644 index 0000000000000..293f837c8b5ba --- /dev/null +++ b/tidb-cloud-lake/sql/approx-count-distinct.md @@ -0,0 +1,52 @@ +--- +title: APPROX_COUNT_DISTINCT +--- + +Estimates the number of distinct values in a data set with the [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) algorithm. + +The HyperLogLog algorithm provides an approximation of the number of unique elements using little memory and time. Consider using this function when dealing with large data sets where an estimated result can be accepted. In exchange for some accuracy, this is a fast and efficient method of returning distinct counts. + +To get an accurate result, use [COUNT_DISTINCT](/tidb-cloud-lake/sql/count-distinct.md). See [Examples](#examples) for more explanations. + +## Syntax + +```sql +APPROX_COUNT_DISTINCT() +``` + +## Return Type + +Integer. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE user_events ( + id INT, + user_id INT, + event_name VARCHAR +); + +INSERT INTO user_events (id, user_id, event_name) +VALUES (1, 1, 'Login'), + (2, 2, 'Login'), + (3, 3, 'Login'), + (4, 1, 'Logout'), + (5, 2, 'Logout'), + (6, 4, 'Login'), + (7, 1, 'Login'); +``` + +**Query Demo: Estimate the Number of Distinct User IDs** +```sql +SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_distinct_user_count +FROM user_events; +``` + +**Result** +```sql +| approx_distinct_user_count | +|----------------------------| +| 4 | +``` diff --git a/tidb-cloud-lake/sql/arg-max.md b/tidb-cloud-lake/sql/arg-max.md new file mode 100644 index 0000000000000..e90318d4aea70 --- /dev/null +++ b/tidb-cloud-lake/sql/arg-max.md @@ -0,0 +1,59 @@ +--- +title: ARG_MAX +--- + +Calculates the `arg` value for a maximum `val` value. If there are several values of `arg` for maximum values of `val`, returns the first of these values encountered. + +## Syntax + +```sql +ARG_MAX(, ) +``` + +## Arguments + +| Arguments | Description | +|-----------|---------------------------------------------------------------------------------------------------| +| `` | Argument of [any data type that Databend supports](/tidb-cloud-lake/sql/data-types.md) | +| `` | Value of [any data type that Databend supports](/tidb-cloud-lake/sql/data-types.md) | + +## Return Type + +`arg` value that corresponds to maximum `val` value. + + matches `arg` type. 
+ +## Example + +**Creating a Table and Inserting Sample Data** + +Let's create a table named "sales" and insert some sample data: +```sql +CREATE TABLE sales ( + id INTEGER, + product VARCHAR(50), + price FLOAT +); + +INSERT INTO sales (id, product, price) +VALUES (1, 'Product A', 10.5), + (2, 'Product B', 20.75), + (3, 'Product C', 30.0), + (4, 'Product D', 15.25), + (5, 'Product E', 25.5); +``` + +**Query: Using ARG_MAX() Function** + +Now, let's use the ARG_MAX() function to find the product that has the maximum price: +```sql +SELECT ARG_MAX(product, price) AS max_price_product +FROM sales; +``` + +The result should look like this: +```sql +| max_price_product | +| ----------------- | +| Product C | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/arg-min.md b/tidb-cloud-lake/sql/arg-min.md new file mode 100644 index 0000000000000..7f0e7c26eca22 --- /dev/null +++ b/tidb-cloud-lake/sql/arg-min.md @@ -0,0 +1,57 @@ +--- +title: ARG_MIN +--- + +Calculates the `arg` value for a minimum `val` value. If there are several different values of `arg` for minimum values of `val`, returns the first of these values encountered. + +## Syntax + +```sql +ARG_MIN(, ) +``` + +## Arguments + +| Arguments | Description | +| --------- | ------------------------------------------------------------------------------------------------- | +| `` | Argument of [any data type that Databend supports](/tidb-cloud-lake/sql/data-types.md) | +| `` | Value of [any data type that Databend supports](/tidb-cloud-lake/sql/data-types.md) | + +## Return Type + +`arg` value that corresponds to minimum `val` value. + +matches `arg` type. + +## Example + +Let's create a table students with columns id, name, and score, and insert some data: + +```sql +CREATE TABLE students ( + id INT, + name VARCHAR, + score INT +); + +INSERT INTO students (id, name, score) VALUES + (1, 'Alice', 80), + (2, 'Bob', 75), + (3, 'Charlie', 90), + (4, 'Dave', 80); +``` + +Now, we can use ARG_MIN to find the name of the student with the lowest score: + +```sql +SELECT ARG_MIN(name, score) AS student_name +FROM students; +``` + +Result: + +```sql +| student_name | +|--------------| +| Bob | +``` diff --git a/tidb-cloud-lake/sql/arithmetic-operators.md b/tidb-cloud-lake/sql/arithmetic-operators.md new file mode 100644 index 0000000000000..73085e0edb255 --- /dev/null +++ b/tidb-cloud-lake/sql/arithmetic-operators.md @@ -0,0 +1,29 @@ +--- +title: Arithmetic Operators +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +| Operator | Description | Example | Result | +| --------------------- | --------------------------------------------------------- | -------------------------- | --------- | +| **+ (unary)** | Returns `a` | **+5** | 5 | +| **+** | Adds two numeric expressions | **4 + 1** | 5 | +| **- (unary)** | Negates the numeric expression | **-5** | -5 | +| **-** | Subtract two numeric expressions | **4 - 1** | 3 | +| **\*** | Multiplies two numeric expressions | **4 \* 1** | 4 | +| **/** | Divides one numeric expression (`a`) by another (`b`) | **4 / 2** | 2 | +| **//** | Computes the integer division of numeric expression | **4 // 3** | 1 | +| **%** | Computes the modulo of numeric expression | **4 % 2** | 0 | +| **^** | Computes the exponentiation of numeric expression | **4 ^ 2** | 16 | +| **|/** | Computes the square root of numeric expression | **|/ 25.0** | 5 | +| **||/** | Computes the cube root of numeric expression | **||/ 27.0** | 3 | +| **@** | Computes the abs of numeric expression | **@ 
-5.0** | 5 | +| **&** | Computes the bitwise and of numeric expression | **91 & 15** | 11 | +| **|** | Computes the bitwise or of numeric expression | **32 | 3** | 35 | +| **#** | Computes the bitwise xor of numeric expression | **17 # 5** | 20 | +| **~** | Computes the bitwise not of numeric expression | **~ 1** | ~2 | +| **`<<`** | Computes the bitwise shift left of numeric expression | **1 `<<` 4** | 16 | +| **>>** | Computes the bitwise shift right of numeric expression | **8 >> 2** | 2 | +| **`<->`** | Computes the Euclidean distance (L2 norm) between vectors | **[1, 2] `<->` [2, 3]** | 1.4142135 | diff --git a/tidb-cloud-lake/sql/array-agg.md b/tidb-cloud-lake/sql/array-agg.md new file mode 100644 index 0000000000000..3920308862575 --- /dev/null +++ b/tidb-cloud-lake/sql/array-agg.md @@ -0,0 +1,71 @@ +--- +title: ARRAY_AGG +title_includes: LIST +--- + +The ARRAY_AGG function (also known by its alias LIST) transforms all the values, excluding NULL, of a specific column in a query result into an array. + +## Syntax + +```sql +ARRAY_AGG() [ WITHIN GROUP ( ) ] + +LIST() +``` + +## Arguments + +| Arguments | Description | +|-----------| -------------- | +| `` | Any expression | + +## Optional + +| Optional | Description | +|-------------------------------------|-------------------------------------------------------| +| WITHIN GROUP [<orderby_clause>](https://docs.databend.com/sql/sql-commands/query-syntax/query-select#order-by-clause) | Defines the order of values in ordered set aggregates | + +## Return Type + +Returns an [Array](/tidb-cloud-lake/sql/array.md) with elements that are of the same type as the original data. + +## Examples + +This example demonstrates how the ARRAY_AGG function can be used to aggregate and present data in a convenient array format: + +```sql +-- Create a table and insert sample data +CREATE TABLE movie_ratings ( + id INT, + movie_title VARCHAR, + user_id INT, + rating INT +); + +INSERT INTO movie_ratings (id, movie_title, user_id, rating) +VALUES (1, 'Inception', 1, 5), + (2, 'Inception', 2, 4), + (3, 'Inception', 3, 5), + (4, 'Interstellar', 1, 4), + (5, 'Interstellar', 2, 3); + +-- List all ratings for Inception in an array +SELECT movie_title, ARRAY_AGG(rating) AS ratings +FROM movie_ratings +WHERE movie_title = 'Inception' +GROUP BY movie_title; + +| movie_title | ratings | +|-------------|------------| +| Inception | [5, 4, 5] | + +-- List all ratings for Inception in an array Using `WITHIN GROUP` +SELECT movie_title, ARRAY_AGG(rating) WITHIN GROUP ( ORDER BY rating DESC ) AS ratings +FROM movie_ratings +WHERE movie_title = 'Inception' +GROUP BY movie_title; + +| movie_title | ratings | +|-------------|------------| +| Inception | [5, 5, 4] | +``` diff --git a/tidb-cloud-lake/sql/array-aggregate.md b/tidb-cloud-lake/sql/array-aggregate.md new file mode 100644 index 0000000000000..05ffd5e06c4d7 --- /dev/null +++ b/tidb-cloud-lake/sql/array-aggregate.md @@ -0,0 +1,27 @@ +--- +title: ARRAY_AGGREGATE +--- + +Aggregates elements in the array with an aggregate function. + +## Syntax + +```sql +ARRAY_AGGREGATE( , '' ) +``` + +- Supported aggregate functions include `avg`, `count`, `max`, `min`, `sum`, `any`, `stddev_samp`, `stddev_pop`, `stddev`, `std`, `median`, `approx_count_distinct`, `kurtosis`, and `skewness`. + +- The syntax can be rewritten as `ARRAY_( )`. For example, `ARRAY_AVG( )`. 
+ +## Examples + +```sql +SELECT ARRAY_AGGREGATE([1, 2, 3, 4], 'SUM'), ARRAY_SUM([1, 2, 3, 4]); + +┌────────────────────────────────────────────────────────────────┐ +│ array_aggregate([1, 2, 3, 4], 'sum') │ array_sum([1, 2, 3, 4]) │ +├──────────────────────────────────────┼─────────────────────────┤ +│ 10 │ 10 │ +└────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-any.md b/tidb-cloud-lake/sql/array-any.md new file mode 100644 index 0000000000000..e68f1ba813670 --- /dev/null +++ b/tidb-cloud-lake/sql/array-any.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_ANY +--- + +Returns the first non-`NULL` element from an array. Equivalent to `ARRAY_AGGREGATE(, 'ANY')`. + +## Syntax + +```sql +ARRAY_ANY() +``` + +## Return Type + +Same as the array element type. + +## Examples + +```sql +SELECT ARRAY_ANY(['a', 'b', 'c']) AS first_item; + +┌────────────┐ +│ first_item │ +├────────────┤ +│ a │ +└────────────┘ +``` + +```sql +SELECT ARRAY_ANY([NULL, 'x', 'y']) AS first_non_null; + +┌────────────────┐ +│ first_non_null │ +├────────────────┤ +│ x │ +└────────────────┘ +``` + +```sql +SELECT ARRAY_ANY([NULL, 10, 20]) AS first_number; + +┌──────────────┐ +│ first_number │ +├──────────────┤ +│ 10 │ +└──────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-append.md b/tidb-cloud-lake/sql/array-append.md new file mode 100644 index 0000000000000..bf0e79805ba84 --- /dev/null +++ b/tidb-cloud-lake/sql/array-append.md @@ -0,0 +1,72 @@ +--- +title: ARRAY_APPEND +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Appends an element to the end of an array. + +## Syntax + +```sql +ARRAY_APPEND(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The source array to which the element will be appended. | +| element | The element to append to the array. | + +## Return Type + +Array with the appended element. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Appending to a Standard Array + +```sql +SELECT ARRAY_APPEND([1, 2, 3], 4); +``` + +Result: + +``` +[1, 2, 3, 4] +``` + +### Example 2: Appending to a Variant Array + +```sql +SELECT ARRAY_APPEND(PARSE_JSON('[1, 2, 3]'), 4); +``` + +Result: + +``` +[1, 2, 3, 4] +``` + +### Example 3: Appending Different Data Types + +```sql +SELECT ARRAY_APPEND(['a', 'b'], 'c'); +``` + +Result: + +``` +["a", "b", "c"] +``` + +## Related Functions + +- [ARRAY_PREPEND](/tidb-cloud-lake/sql/array-prepend.md): Prepends an element to the beginning of an array +- [ARRAY_CONCAT](/tidb-cloud-lake/sql/array-concat.md): Concatenates two arrays diff --git a/tidb-cloud-lake/sql/array-approx-count-distinct.md b/tidb-cloud-lake/sql/array-approx-count-distinct.md new file mode 100644 index 0000000000000..842fd16674e51 --- /dev/null +++ b/tidb-cloud-lake/sql/array-approx-count-distinct.md @@ -0,0 +1,37 @@ +--- +title: ARRAY_APPROX_COUNT_DISTINCT +--- + +Returns an approximate count of distinct elements in an array, ignoring `NULL` values. This uses the same HyperLogLog-based estimator as [`APPROX_COUNT_DISTINCT`](/tidb-cloud-lake/sql/approx-count-distinct.md). 
+ +## Syntax + +```sql +ARRAY_APPROX_COUNT_DISTINCT() +``` + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT ARRAY_APPROX_COUNT_DISTINCT([1, 1, 2, 3, 3, 3]) AS approx_cnt; + +┌────────────┐ +│ approx_cnt │ +├────────────┤ +│ 3 │ +└────────────┘ +``` + +```sql +SELECT ARRAY_APPROX_COUNT_DISTINCT([NULL, 'a', 'a', 'b']) AS approx_cnt_text; + +┌──────────────────┐ +│ approx_cnt_text │ +├──────────────────┤ +│ 2 │ +└──────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-avg.md b/tidb-cloud-lake/sql/array-avg.md new file mode 100644 index 0000000000000..a380a97d4c594 --- /dev/null +++ b/tidb-cloud-lake/sql/array-avg.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_AVG +--- + +Returns the average of the numeric items in an array. `NULL` elements are ignored; non-numeric values raise an error. + +## Syntax + +```sql +ARRAY_AVG() +``` + +## Return Type + +Numeric (uses the smallest numeric type that can represent the result). + +## Examples + +```sql +SELECT ARRAY_AVG([1, 2, 3, 4]) AS avg_int; + +┌─────────┐ +│ avg_int │ +├─────────┤ +│ 2.5 │ +└─────────┘ +``` + +```sql +SELECT ARRAY_AVG([1.5, 2.5, 3.5]) AS avg_decimal; + +┌──────────────┐ +│ avg_decimal │ +├──────────────┤ +│ 2.5000 │ +└──────────────┘ +``` + +```sql +SELECT ARRAY_AVG([10, NULL, 4]) AS avg_with_null; + +┌──────────────┐ +│ avg_with_null│ +├──────────────┤ +│ 7.0 │ +└──────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-compact.md b/tidb-cloud-lake/sql/array-compact.md new file mode 100644 index 0000000000000..b1b73ec0ff84c --- /dev/null +++ b/tidb-cloud-lake/sql/array-compact.md @@ -0,0 +1,66 @@ +--- +title: ARRAY_COMPACT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes all NULL values from an array. + +## Syntax + +```sql +ARRAY_COMPACT(array) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The array from which to remove NULL values. | + +## Return Type + +Array without NULL values. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Removing NULLs from a Standard Array + +```sql +SELECT ARRAY_COMPACT([1, NULL, 2, NULL, 3]); +``` + +Result: + +``` +[1, 2, 3] +``` + +### Example 2: Removing NULLs from a Variant Array + +```sql +SELECT ARRAY_COMPACT(PARSE_JSON('["apple", null, "banana", null, "orange"]')); +``` + +Result: + +``` +["apple", "banana", "orange"] +``` + +### Example 3: Array with No NULLs + +```sql +SELECT ARRAY_COMPACT([1, 2, 3]); +``` + +Result: + +``` +[1, 2, 3] +``` diff --git a/tidb-cloud-lake/sql/array-concat.md b/tidb-cloud-lake/sql/array-concat.md new file mode 100644 index 0000000000000..ad5b6963d44b6 --- /dev/null +++ b/tidb-cloud-lake/sql/array-concat.md @@ -0,0 +1,23 @@ +--- +title: ARRAY_CONCAT +--- + +Concats two arrays. + +## Syntax + +```sql +ARRAY_CONCAT( , ) +``` + +## Examples + +```sql +SELECT ARRAY_CONCAT([1, 2], [3, 4]); + +┌──────────────────────────────┐ +│ array_concat([1, 2], [3, 4]) │ +├──────────────────────────────┤ +│ [1,2,3,4] │ +└──────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-construct.md b/tidb-cloud-lake/sql/array-construct.md new file mode 100644 index 0000000000000..feabd25106467 --- /dev/null +++ b/tidb-cloud-lake/sql/array-construct.md @@ -0,0 +1,64 @@ +--- +title: ARRAY_CONSTRUCT +title_includes: JSON_ARRAY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a JSON array with specified values. 
+ +## Aliases + +- `JSON_ARRAY` + +## Syntax + +```sql +ARRAY_CONSTRUCT(value1[, value2[, ...]]) +``` + +## Return Type + +JSON array. + +## Examples + +### Example 1: Creating JSON Array with Constant Values or Expressions + +```sql +SELECT ARRAY_CONSTRUCT('Databend', 3.14, NOW(), TRUE, NULL); + +array_construct('databend', 3.14, now(), true, null) | +--------------------------------------------------------+ +["Databend",3.14,"2023-09-06 07:23:55.399070",true,null]| + +SELECT ARRAY_CONSTRUCT('fruits', ARRAY_CONSTRUCT('apple', 'banana', 'orange'), OBJECT_CONSTRUCT('price', 1.2, 'quantity', 3)); + +array_construct('fruits', array_construct('apple', 'banana', 'orange'), object_construct('price', 1.2, 'quantity', 3))| +-------------------------------------------------------------------------------------------------------+ +["fruits",["apple","banana","orange"],{"price":1.2,"quantity":3}] | +``` + +### Example 2: Creating JSON Array from Table Data + +```sql +CREATE TABLE products ( + ProductName VARCHAR(255), + Price DECIMAL(10, 2) +); + +INSERT INTO products (ProductName, Price) +VALUES + ('Apple', 1.2), + ('Banana', 0.5), + ('Orange', 0.8); + +SELECT ARRAY_CONSTRUCT(ProductName, Price) FROM products; + +array_construct(productname, price)| +------------------------------+ +["Apple",1.2] | +["Banana",0.5] | +["Orange",0.8] | +``` diff --git a/tidb-cloud-lake/sql/array-contains.md b/tidb-cloud-lake/sql/array-contains.md new file mode 100644 index 0000000000000..2f6f47e17180a --- /dev/null +++ b/tidb-cloud-lake/sql/array-contains.md @@ -0,0 +1,67 @@ +--- +title: ARRAY_CONTAINS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns true if the array contains the specified element. + +## Syntax + +```sql +ARRAY_CONTAINS(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The array to search within. | +| element | The element to search for. | + +## Return Type + +BOOLEAN + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Checking a Standard Array + +```sql +SELECT ARRAY_CONTAINS([1, 2, 3], 2); +``` + +Result: + +``` +true +``` + +### Example 2: Checking a Variant Array + +```sql +SELECT ARRAY_CONTAINS(PARSE_JSON('["apple", "banana", "orange"]'), 'banana'); +``` + +Result: + +``` +true +``` + +### Example 3: Element Not Found + +```sql +SELECT ARRAY_CONTAINS([1, 2, 3], 4); +``` + +Result: + +``` +false +``` diff --git a/tidb-cloud-lake/sql/array-count.md b/tidb-cloud-lake/sql/array-count.md new file mode 100644 index 0000000000000..939b576f2d02e --- /dev/null +++ b/tidb-cloud-lake/sql/array-count.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_COUNT +--- + +Counts the non-`NULL` elements in an array. 
+ +## Syntax + +```sql +ARRAY_COUNT() +``` + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT ARRAY_COUNT([1, 2, 3]) AS cnt; + +┌─────┐ +│ cnt │ +├─────┤ +│ 3 │ +└─────┘ +``` + +```sql +SELECT ARRAY_COUNT([1, NULL, 3]) AS cnt_with_null; + +┌──────────────┐ +│ cnt_with_null│ +├──────────────┤ +│ 2 │ +└──────────────┘ +``` + +```sql +SELECT ARRAY_COUNT(['a', 'b', NULL]) AS cnt_text; + +┌─────────┐ +│ cnt_text│ +├─────────┤ +│ 2 │ +└─────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-distinct.md b/tidb-cloud-lake/sql/array-distinct.md new file mode 100644 index 0000000000000..c530ea447da4b --- /dev/null +++ b/tidb-cloud-lake/sql/array-distinct.md @@ -0,0 +1,32 @@ +--- +title: ARRAY_DISTINCT +title_includes: JSON_ARRAY_DISTINCT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes duplicate elements from a JSON array and returns an array with only distinct elements. + +## Aliases + +- `JSON_ARRAY_DISTINCT` + +## Syntax + +```sql +ARRAY_DISTINCT() +``` + +## Return Type + +JSON array. + +## Examples + +```sql +SELECT ARRAY_DISTINCT('["apple", "banana", "apple", "orange", "banana"]'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_distinct('["apple", "banana", "apple", "orange", "banana"]'::VARIANT): ["apple","banana","orange"] +``` diff --git a/tidb-cloud-lake/sql/array-except.md b/tidb-cloud-lake/sql/array-except.md new file mode 100644 index 0000000000000..3ebf949e64b47 --- /dev/null +++ b/tidb-cloud-lake/sql/array-except.md @@ -0,0 +1,41 @@ +--- +title: ARRAY_EXCEPT +title_includes: JSON_ARRAY_EXCEPT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a new JSON array containing the elements from the first JSON array that are not present in the second JSON array. + +## Aliases + +- `JSON_ARRAY_EXCEPT` + +## Syntax + +```sql +ARRAY_EXCEPT(, ) +``` + +## Return Type + +JSON array. + +## Examples + +```sql +SELECT ARRAY_EXCEPT( + '["apple", "banana", "orange"]'::VARIANT, + '["banana", "grapes"]'::VARIANT +); + +-[ RECORD 1 ]----------------------------------- +array_except('["apple", "banana", "orange"]'::VARIANT, '["banana", "grapes"]'::VARIANT): ["apple","orange"] + +-- Return an empty array because all elements in the first array are present in the second array. +SELECT ARRAY_EXCEPT('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT) + +-[ RECORD 1 ]----------------------------------- +array_except('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT): [] +``` diff --git a/tidb-cloud-lake/sql/array-filter.md b/tidb-cloud-lake/sql/array-filter.md new file mode 100644 index 0000000000000..30460c77808f4 --- /dev/null +++ b/tidb-cloud-lake/sql/array-filter.md @@ -0,0 +1,33 @@ +--- +title: ARRAY_FILTER +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Filters elements from a JSON array based on a specified Lambda expression, returning only the elements that satisfy the condition. For more information about Lambda expression, see [Lambda Expressions](/sql/stored-procedure-scripting/#lambda-expressions). + +## Syntax + +```sql +ARRAY_FILTER(, ) +``` + +## Return Type + +JSON array. 
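+
+The predicate is not limited to strings; the Lambda expression can also compare numeric elements. A minimal sketch mirroring the entry in the [Array Functions](/tidb-cloud-lake/sql/array-functions.md) overview (expected result: `[3,4]`):
+
+```sql
+SELECT ARRAY_FILTER([1, 2, 3, 4], x -> x > 2);
+```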
+ +## Examples + +This example filters the array to return only the strings that start with the letter `a`, resulting in `["apple", "avocado"]`: + +```sql +SELECT ARRAY_FILTER( + ['apple', 'banana', 'avocado', 'grape'], + d -> d::String LIKE 'a%' +); + +-[ RECORD 1 ]----------------------------------- +array_filter(['apple', 'banana', 'avocado', 'grape'], d -> d::STRING LIKE 'a%'): ["apple","avocado"] +``` diff --git a/tidb-cloud-lake/sql/array-flatten.md b/tidb-cloud-lake/sql/array-flatten.md new file mode 100644 index 0000000000000..d5f166d405034 --- /dev/null +++ b/tidb-cloud-lake/sql/array-flatten.md @@ -0,0 +1,54 @@ +--- +title: ARRAY_FLATTEN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Flattens a nested array into a single-dimensional array. + +## Syntax + +```sql +ARRAY_FLATTEN(array) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The nested array to flatten. | + +## Return Type + +Array (flattened). + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Flattening a Nested Array + +```sql +SELECT ARRAY_FLATTEN([[1, 2], [3, 4]]); +``` + +Result: + +``` +[1, 2, 3, 4] +``` + +### Example 2: Flattening a Variant Array + +```sql +SELECT ARRAY_FLATTEN(PARSE_JSON('[["a", "b"], ["c", "d"]]')); +``` + +Result: + +``` +["a", "b", "c", "d"] +``` diff --git a/tidb-cloud-lake/sql/array-functions.md b/tidb-cloud-lake/sql/array-functions.md new file mode 100644 index 0000000000000..b32a4fc453f1e --- /dev/null +++ b/tidb-cloud-lake/sql/array-functions.md @@ -0,0 +1,97 @@ +--- +title: Array Functions +--- + +This section provides reference information for array functions in Databend. Array functions enable creation, manipulation, searching, and transformation of array data structures. 
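+
+As a quick orientation before the reference tables, the sketch below strings a few of these functions together in a single query. The calls and expected results are taken from the tables and detail pages that follow, so only the output formatting may differ:
+
+```sql
+SELECT
+    ARRAY_CONTAINS([1, 2, 3], 2)     AS has_two,  -- true
+    ARRAY_SUM([1, 2, 3])             AS total,    -- 6
+    ARRAY_TO_STRING(['a', 'b'], ',') AS joined;   -- 'a,b'
+```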
+ +## Array Creation & Construction + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY](/tidb-cloud-lake/sql/array.md) | Builds an array from expressions | `ARRAY(1, 2, 3)` → `[1,2,3]` | +| [ARRAY_CONSTRUCT](/tidb-cloud-lake/sql/array-construct.md) | Creates an array from individual values | `ARRAY_CONSTRUCT(1, 2, 3)` → `[1,2,3]` | +| [RANGE](/tidb-cloud-lake/sql/range.md) | Generates an array of sequential numbers | `RANGE(1, 5)` → `[1,2,3,4]` | +| [ARRAY_GENERATE_RANGE](/tidb-cloud-lake/sql/array-generate-range.md) | Generates a sequence with optional step | `ARRAY_GENERATE_RANGE(0, 6, 2)` → `[0,2,4]` | + +## Array Access & Information + +| Function | Description | Example | +|----------|-------------|---------| +| [GET](/tidb-cloud-lake/sql/get.md) | Gets an element from an array by index | `GET([1,2,3], 1)` → `1` | +| [ARRAY_GET](/tidb-cloud-lake/sql/array-get.md) | Alias for GET function | `ARRAY_GET([1,2,3], 1)` → `1` | +| [CONTAINS](/tidb-cloud-lake/sql/contains.md) | Checks if an array contains a specific value | `CONTAINS([1,2,3], 2)` → `true` | +| [ARRAY_CONTAINS](/tidb-cloud-lake/sql/array-contains.md) | Checks if an array contains a specific value | `ARRAY_CONTAINS([1,2,3], 2)` → `true` | +| [ARRAY_SIZE](/tidb-cloud-lake/sql/array-size.md) | Returns array length (alias: `ARRAY_LENGTH`) | `ARRAY_SIZE([1,2,3])` → `3` | +| [ARRAY_COUNT](/tidb-cloud-lake/sql/array-count.md) | Counts non-`NULL` entries | `ARRAY_COUNT([1,NULL,2])` → `2` | +| [ARRAY_ANY](/tidb-cloud-lake/sql/array-any.md) | Returns the first non-`NULL` value | `ARRAY_ANY([NULL,'a','b'])` → `'a'` | + +## Array Modification + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_APPEND](/tidb-cloud-lake/sql/array-append.md) | Appends an element to the end of an array | `ARRAY_APPEND([1,2], 3)` → `[1,2,3]` | +| [ARRAY_PREPEND](/tidb-cloud-lake/sql/array-prepend.md) | Prepends an element to the beginning of an array | `ARRAY_PREPEND(0, [1,2])` → `[0,1,2]` | +| [ARRAY_INSERT](/tidb-cloud-lake/sql/array-insert.md) | Inserts an element at a specific position | `ARRAY_INSERT([1,3], 1, 2)` → `[1,2,3]` | +| [ARRAY_REMOVE](/tidb-cloud-lake/sql/array-remove.md) | Removes all occurrences of a specified element | `ARRAY_REMOVE([1,2,2,3], 2)` → `[1,3]` | +| [ARRAY_REMOVE_FIRST](/tidb-cloud-lake/sql/array-remove-first.md) | Removes the first element from an array | `ARRAY_REMOVE_FIRST([1,2,3])` → `[2,3]` | +| [ARRAY_REMOVE_LAST](/tidb-cloud-lake/sql/array-remove-last.md) | Removes the last element from an array | `ARRAY_REMOVE_LAST([1,2,3])` → `[1,2]` | + +## Array Combination & Manipulation + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_CONCAT](/tidb-cloud-lake/sql/array-concat.md) | Concatenates multiple arrays | `ARRAY_CONCAT([1,2], [3,4])` → `[1,2,3,4]` | +| [ARRAY_SLICE](/tidb-cloud-lake/sql/array-slice.md) | Extracts a portion of an array | `ARRAY_SLICE([1,2,3,4], 1, 2)` → `[1,2]` | +| [SLICE](/tidb-cloud-lake/sql/slice.md) | Alias for ARRAY_SLICE function | `SLICE([1,2,3,4], 1, 2)` → `[1,2]` | +| [ARRAYS_ZIP](/tidb-cloud-lake/sql/arrays-zip.md) | Combines multiple arrays element-wise | `ARRAYS_ZIP([1,2], ['a','b'])` → `[(1,'a'),(2,'b')]` | +| [ARRAY_SORT](/tidb-cloud-lake/sql/array-sort.md) | Sorts values; variants control order/nulls | `ARRAY_SORT([3,1,2])` → `[1,2,3]` | + +## Array Set Operations + +| Function | Description | Example | +|----------|-------------|---------| +| 
[ARRAY_DISTINCT](/tidb-cloud-lake/sql/array-distinct.md) | Returns unique elements from an array | `ARRAY_DISTINCT([1,2,2,3])` → `[1,2,3]` | +| [ARRAY_UNIQUE](/tidb-cloud-lake/sql/array-unique.md) | Alias for ARRAY_DISTINCT function | `ARRAY_UNIQUE([1,2,2,3])` → `[1,2,3]` | +| [ARRAY_INTERSECTION](/tidb-cloud-lake/sql/array-intersection.md) | Returns common elements between arrays | `ARRAY_INTERSECTION([1,2,3], [2,3,4])` → `[2,3]` | +| [ARRAY_EXCEPT](/tidb-cloud-lake/sql/array-except.md) | Returns elements in first array but not in second | `ARRAY_EXCEPT([1,2,3], [2,4])` → `[1,3]` | +| [ARRAY_OVERLAP](/tidb-cloud-lake/sql/array-overlap.md) | Checks if arrays have common elements | `ARRAY_OVERLAP([1,2,3], [3,4,5])` → `true` | + +## Array Processing & Transformation + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_TRANSFORM](/tidb-cloud-lake/sql/json-array-transform.md) | Applies a function to each array element | `ARRAY_TRANSFORM([1,2,3], x -> x * 2)` → `[2,4,6]` | +| [ARRAY_FILTER](/tidb-cloud-lake/sql/array-filter.md) | Filters array elements based on a condition | `ARRAY_FILTER([1,2,3,4], x -> x > 2)` → `[3,4]` | +| [ARRAY_REDUCE](/tidb-cloud-lake/sql/array-reduce.md) | Reduces array to a single value using aggregation | `ARRAY_REDUCE([1,2,3], 0, (acc,x) -> acc + x)` → `6` | +| [ARRAY_AGGREGATE](/tidb-cloud-lake/sql/array-aggregate.md) | Aggregates array elements using a function | `ARRAY_AGGREGATE([1,2,3], 'sum')` → `6` | + +## Array Aggregations & Statistics + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_SUM](/tidb-cloud-lake/sql/array-sum.md) | Sum of numeric values | `ARRAY_SUM([1,2,3])` → `6` | +| [ARRAY_AVG](/tidb-cloud-lake/sql/array-avg.md) | Average of numeric values | `ARRAY_AVG([1,2,3])` → `2` | +| [ARRAY_MEDIAN](/tidb-cloud-lake/sql/array-median.md) | Median of numeric values | `ARRAY_MEDIAN([1,3,2])` → `2` | +| [ARRAY_MIN](/tidb-cloud-lake/sql/array-min.md) | Minimum value | `ARRAY_MIN([3,1,2])` → `1` | +| [ARRAY_MAX](/tidb-cloud-lake/sql/array-max.md) | Maximum value | `ARRAY_MAX([3,1,2])` → `3` | +| [ARRAY_STDDEV_POP](/tidb-cloud-lake/sql/array-stddev-pop.md) | Population standard deviation (alias: `ARRAY_STD`) | `ARRAY_STDDEV_POP([1,2,3])` | +| [ARRAY_STDDEV_SAMP](/tidb-cloud-lake/sql/array-stddev-samp.md) | Sample standard deviation (alias: `ARRAY_STDDEV`) | `ARRAY_STDDEV_SAMP([1,2,3])` | +| [ARRAY_KURTOSIS](/tidb-cloud-lake/sql/array-kurtosis.md) | Excess kurtosis of values | `ARRAY_KURTOSIS([1,2,3,4])` | +| [ARRAY_SKEWNESS](/tidb-cloud-lake/sql/array-skewness.md) | Skewness of values | `ARRAY_SKEWNESS([1,2,3,4])` | +| [ARRAY_APPROX_COUNT_DISTINCT](/tidb-cloud-lake/sql/array-approx-count-distinct.md) | Approximate distinct count | `ARRAY_APPROX_COUNT_DISTINCT([1,1,2])` → `2` | + +## Array Formatting + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_TO_STRING](/tidb-cloud-lake/sql/array-to-string.md) | Joins array elements into a string | `ARRAY_TO_STRING(['a','b'], ',')` → `'a,b'` | + +## Array Utility Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [ARRAY_COMPACT](/tidb-cloud-lake/sql/array-compact.md) | Removes null values from an array | `ARRAY_COMPACT([1,null,2,null,3])` → `[1,2,3]` | +| [ARRAY_FLATTEN](/tidb-cloud-lake/sql/array-flatten.md) | Flattens nested arrays into a single array | `ARRAY_FLATTEN([[1,2],[3,4]])` → `[1,2,3,4]` | +| [ARRAY_REVERSE](/tidb-cloud-lake/sql/array-reverse.md) | Reverses the 
order of array elements | `ARRAY_REVERSE([1,2,3])` → `[3,2,1]` | +| [ARRAY_INDEXOF](/tidb-cloud-lake/sql/array-indexof.md) | Returns the index of first occurrence of an element | `ARRAY_INDEXOF([1,2,3,2], 2)` → `1` | +| [UNNEST](/tidb-cloud-lake/sql/unnest.md) | Expands an array into individual rows | `UNNEST([1,2,3])` → `1, 2, 3` (as separate rows) | diff --git a/tidb-cloud-lake/sql/array-generate-range.md b/tidb-cloud-lake/sql/array-generate-range.md new file mode 100644 index 0000000000000..8bcbfcc5f847e --- /dev/null +++ b/tidb-cloud-lake/sql/array-generate-range.md @@ -0,0 +1,51 @@ +--- +title: ARRAY_GENERATE_RANGE +--- + +Builds an array of evenly spaced integers between a start and end value. The `end` bound is exclusive. + +## Syntax + +```sql +ARRAY_GENERATE_RANGE(, [, ]) +``` + +- ``: First value to include. +- ``: Exclusive upper (or lower) bound. +- ``: Optional increment (default `1`). Negative steps produce descending sequences. + +## Return Type + +`ARRAY` + +## Examples + +```sql +SELECT ARRAY_GENERATE_RANGE(1, 5) AS seq; + +┌──────────┐ +│ seq │ +├──────────┤ +│ [1,2,3,4]│ +└──────────┘ +``` + +```sql +SELECT ARRAY_GENERATE_RANGE(0, 6, 2) AS seq_step; + +┌────────────┐ +│ seq_step │ +├────────────┤ +│ [0,2,4] │ +└────────────┘ +``` + +```sql +SELECT ARRAY_GENERATE_RANGE(5, 0, -2) AS seq_down; + +┌────────────┐ +│ seq_down │ +├────────────┤ +│ [5,3,1] │ +└────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-get.md b/tidb-cloud-lake/sql/array-get.md new file mode 100644 index 0000000000000..582bf631bc49f --- /dev/null +++ b/tidb-cloud-lake/sql/array-get.md @@ -0,0 +1,5 @@ +--- +title: ARRAY_GET +--- + +Alias for [GET](/tidb-cloud-lake/sql/get.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/array-indexof.md b/tidb-cloud-lake/sql/array-indexof.md new file mode 100644 index 0000000000000..2528aeb760b65 --- /dev/null +++ b/tidb-cloud-lake/sql/array-indexof.md @@ -0,0 +1,68 @@ +--- +title: ARRAY_INDEXOF +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the index of the first occurrence of an element in an array. + +## Syntax + +```sql +ARRAY_INDEXOF(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The array to search within. | +| element | The element to search for. | + +## Return Type + +INTEGER + +## Important Note on Indexing + +- For standard array types: Indexing is **1-based** (first element is at position 1). +- For variant array types: Indexing is **0-based** (first element is at position 0), for compatibility with Snowflake. 
+ +## Examples + +### Example 1: Finding an Element in a Standard Array (1-based indexing) + +```sql +SELECT ARRAY_INDEXOF([10, 20, 30, 20], 20); +``` + +Result: + +``` +2 +``` + +### Example 2: Finding an Element in a Variant Array (0-based indexing) + +```sql +SELECT ARRAY_INDEXOF(PARSE_JSON('["apple", "banana", "orange"]'), 'banana'); +``` + +Result: + +``` +1 +``` + +### Example 3: Element Not Found + +```sql +SELECT ARRAY_INDEXOF([1, 2, 3], 4); +``` + +Result: + +``` +0 +``` diff --git a/tidb-cloud-lake/sql/array-insert.md b/tidb-cloud-lake/sql/array-insert.md new file mode 100644 index 0000000000000..3605a6ab63e08 --- /dev/null +++ b/tidb-cloud-lake/sql/array-insert.md @@ -0,0 +1,69 @@ +--- +title: ARRAY_INSERT +title_includes: JSON_ARRAY_INSERT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Inserts a value into a JSON array at the specified index and returns the updated JSON array. + +## Aliases + +- `JSON_ARRAY_INSERT` + +## Syntax + +```sql +ARRAY_INSERT(, , ) +``` + +| Parameter | Description | +|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | The JSON array to modify. | +| `` | The position at which the value will be inserted. Positive indices insert at the specified position or append if out of range; negative indices insert from the end or at the beginning if out of range. | +| `` | The JSON value to insert into the array. | + +## Return Type + +JSON array. + +## Examples + +When the `` is a non-negative integer, the new element is inserted at the specified position, and existing elements are shifted to the right. + +```sql +-- The new element is inserted at position 0 (the beginning of the array), shifting all original elements to the right +SELECT ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_insert('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"] + +-- The new element is inserted at position 1, between task1 and task2 +SELECT ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_insert('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT): ["task1","new_task","task2","task3"] + +-- If the index exceeds the length of the array, the new element is appended at the end of the array +SELECT ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_insert('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT): ["task1","task2","task3","new_task"] +``` + +A negative `` counts from the end of the array, with `-1` representing the position before the last element, `-2` before the second last, and so on. 
+ +```sql +-- The new element is inserted just before the last element (task3) +SELECT ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -1, '"new_task"'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_insert('["task1", "task2", "task3"]'::VARIANT, - 1, '"new_task"'::VARIANT): ["task1","task2","new_task","task3"] + +-- Since the negative index exceeds the array’s length, the new element is inserted at the beginning +SELECT ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -6, '"new_task"'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +array_insert('["task1", "task2", "task3"]'::VARIANT, - 6, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"] +``` diff --git a/tidb-cloud-lake/sql/array-intersection.md b/tidb-cloud-lake/sql/array-intersection.md new file mode 100644 index 0000000000000..40894fa1adf8c --- /dev/null +++ b/tidb-cloud-lake/sql/array-intersection.md @@ -0,0 +1,42 @@ +--- +title: ARRAY_INTERSECTION +title_includes: JSON_ARRAY_INTERSECTION +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the common elements between two JSON arrays. + +## Aliases + +- `JSON_ARRAY_INTERSECTION` + +## Syntax + +```sql +ARRAY_INTERSECTION(, ) +``` + +## Return Type + +JSON array. + +## Examples + +```sql +-- Find the intersection of two JSON arrays +SELECT ARRAY_INTERSECTION('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON); + +-[ RECORD 1 ]----------------------------------- +array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT): ["Electronics","Books"] + +-- Find the intersection of the result from the first query with a third JSON array using an iterative approach +SELECT ARRAY_INTERSECTION( + ARRAY_INTERSECTION('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON), + '["Electronics", "Books", "Clothing"]'::JSON +); + +-[ RECORD 1 ]----------------------------------- +array_intersection(array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT), '["Electronics", "Books", "Clothing"]'::VARIANT): ["Electronics","Books"] +``` diff --git a/tidb-cloud-lake/sql/array-kurtosis.md b/tidb-cloud-lake/sql/array-kurtosis.md new file mode 100644 index 0000000000000..51b797076c714 --- /dev/null +++ b/tidb-cloud-lake/sql/array-kurtosis.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_KURTOSIS +--- + +Returns the excess kurtosis of the numeric values in an array. `NULL` elements are ignored; non-numeric elements raise an error. + +## Syntax + +```sql +ARRAY_KURTOSIS() +``` + +## Return Type + +Floating-point. + +## Examples + +```sql +SELECT ARRAY_KURTOSIS([1, 2, 3, 4]) AS kurt; + +┌────────────────────────┐ +│ kurt │ +├────────────────────────┤ +│ -1.200000000000001 │ +└────────────────────────┘ +``` + +```sql +SELECT ARRAY_KURTOSIS([1.5, 2.5, 3.5, 4.5]) AS kurt_decimal; + +┌────────────────────────┐ +│ kurt_decimal │ +├────────────────────────┤ +│ -1.200000000000001 │ +└────────────────────────┘ +``` + +```sql +SELECT ARRAY_KURTOSIS([NULL, 2, 3, 4]) AS kurt_null; + +┌────────────────┐ +│ kurt_null │ +├────────────────┤ +│ 0 │ +└────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-max.md b/tidb-cloud-lake/sql/array-max.md new file mode 100644 index 0000000000000..332447c7cd062 --- /dev/null +++ b/tidb-cloud-lake/sql/array-max.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_MAX +--- + +Returns the largest numeric value in an array. 
`NULL` elements are skipped; non-numeric values cause an error. + +## Syntax + +```sql +ARRAY_MAX() +``` + +## Return Type + +Same numeric type as the array elements. + +## Examples + +```sql +SELECT ARRAY_MAX([5, 2, 9, -1]) AS max_int; + +┌─────────┐ +│ max_int │ +├─────────┤ +│ 9 │ +└─────────┘ +``` + +```sql +SELECT ARRAY_MAX([1.5, -2.25, 3.0]) AS max_decimal; + +┌─────────────┐ +│ max_decimal │ +├─────────────┤ +│ 3.00 │ +└─────────────┘ +``` + +```sql +SELECT ARRAY_MAX([NULL, 10, 4]) AS max_with_null; + +┌───────────────┐ +│ max_with_null │ +├───────────────┤ +│ 10 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-median.md b/tidb-cloud-lake/sql/array-median.md new file mode 100644 index 0000000000000..fab2d9600f8b3 --- /dev/null +++ b/tidb-cloud-lake/sql/array-median.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_MEDIAN +--- + +Returns the median of the numeric values in an array. `NULL` elements are ignored. + +## Syntax + +```sql +ARRAY_MEDIAN() +``` + +## Return Type + +Numeric. For even-length inputs the result is the average of the two middle values. + +## Examples + +```sql +SELECT ARRAY_MEDIAN([1, 3, 2, 4]) AS med_even; + +┌────────┐ +│ med_even │ +├────────┤ +│ 2.5 │ +└────────┘ +``` + +```sql +SELECT ARRAY_MEDIAN([1, 3, 5]) AS med_odd; + +┌────────┐ +│ med_odd│ +├────────┤ +│ 3.0 │ +└────────┘ +``` + +```sql +SELECT ARRAY_MEDIAN([NULL, 10, 20, 30]) AS med_null; + +┌────────┐ +│ med_null│ +├────────┤ +│ 20.0 │ +└────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-min.md b/tidb-cloud-lake/sql/array-min.md new file mode 100644 index 0000000000000..e1524dc49e22b --- /dev/null +++ b/tidb-cloud-lake/sql/array-min.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_MIN +--- + +Returns the smallest numeric value in an array. `NULL` elements are skipped; non-numeric values cause an error. + +## Syntax + +```sql +ARRAY_MIN() +``` + +## Return Type + +Same numeric type as the array elements. + +## Examples + +```sql +SELECT ARRAY_MIN([5, 2, 9, -1]) AS min_int; + +┌─────────┐ +│ min_int │ +├─────────┤ +│ -1 │ +└─────────┘ +``` + +```sql +SELECT ARRAY_MIN([1.5, -2.25, 3.0]) AS min_decimal; + +┌──────────────┐ +│ min_decimal │ +├──────────────┤ +│ -2.25 │ +└──────────────┘ +``` + +```sql +SELECT ARRAY_MIN([NULL, 10, 4]) AS min_with_null; + +┌──────────────┐ +│ min_with_null│ +├──────────────┤ +│ 4 │ +└──────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-overlap.md b/tidb-cloud-lake/sql/array-overlap.md new file mode 100644 index 0000000000000..9c56d9999feb8 --- /dev/null +++ b/tidb-cloud-lake/sql/array-overlap.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_OVERLAP +title_includes: JSON_ARRAY_OVERLAP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if there is any overlap between two JSON arrays and returns `true` if there are common elements; otherwise, it returns `false`. + +## Aliases + +- `JSON_ARRAY_OVERLAP` + +## Syntax + +```sql +ARRAY_OVERLAP(, ) +``` + +## Return Type + +The function returns a boolean value: + +- `true` if there is at least one common element between the two JSON arrays, +- `false` if there are no common elements. 
+ +## Examples + +```sql +SELECT ARRAY_OVERLAP( + '["apple", "banana", "cherry"]'::JSON, + '["banana", "kiwi", "mango"]'::JSON +); + +-[ RECORD 1 ]----------------------------------- +array_overlap('["apple", "banana", "cherry"]'::VARIANT, '["banana", "kiwi", "mango"]'::VARIANT): true + + +SELECT ARRAY_OVERLAP( + '["grape", "orange"]'::JSON, + '["apple", "kiwi"]'::JSON +); + +-[ RECORD 1 ]----------------------------------- +array_overlap('["grape", "orange"]'::VARIANT, '["apple", "kiwi"]'::VARIANT): false +``` diff --git a/tidb-cloud-lake/sql/array-prepend.md b/tidb-cloud-lake/sql/array-prepend.md new file mode 100644 index 0000000000000..8685860ba5d41 --- /dev/null +++ b/tidb-cloud-lake/sql/array-prepend.md @@ -0,0 +1,67 @@ +--- +title: ARRAY_PREPEND +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Prepends an element to the beginning of an array. + +## Syntax + +```sql +ARRAY_PREPEND(element, array) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| element | The element to prepend to the array. | +| array | The source array to which the element will be prepended. | + +## Return Type + +Array with the prepended element. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Prepending to a Standard Array + +```sql +SELECT ARRAY_PREPEND(0, [1, 2, 3]); +``` + +Result: + +``` +[0, 1, 2, 3] +``` + +### Example 2: Prepending to a Variant Array + +```sql +SELECT ARRAY_PREPEND('apple', PARSE_JSON('["banana", "orange"]')); +``` + +Result: + +``` +["apple", "banana", "orange"] +``` + +### Example 3: Prepending a Complex Element + +```sql +SELECT ARRAY_PREPEND(PARSE_JSON('{"value": 0}'), [1, 2, 3]); +``` + +Result: + +``` +[{"value": 0}, 1, 2, 3] +``` diff --git a/tidb-cloud-lake/sql/array-reduce.md b/tidb-cloud-lake/sql/array-reduce.md new file mode 100644 index 0000000000000..450b391aaccaa --- /dev/null +++ b/tidb-cloud-lake/sql/array-reduce.md @@ -0,0 +1,29 @@ +--- +title: ARRAY_REDUCE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Reduces a JSON array to a single value by applying a specified Lambda expression. For more information about Lambda expression, see [Lambda Expressions](/sql/stored-procedure-scripting/#lambda-expressions). + +## Syntax + +```sql +ARRAY_REDUCE(, ) +``` + +## Examples + +This example multiplies all the elements in the array (2 _ 3 _ 4): + +```sql +SELECT ARRAY_REDUCE( + [2, 3, 4], + (acc, d) -> acc::Int * d::Int +); + +-[ RECORD 1 ]----------------------------------- +array_reduce([2, 3, 4], (acc, d) -> acc::Int32 * d::Int32): 24 +``` diff --git a/tidb-cloud-lake/sql/array-remove-first.md b/tidb-cloud-lake/sql/array-remove-first.md new file mode 100644 index 0000000000000..6749cfa2d3898 --- /dev/null +++ b/tidb-cloud-lake/sql/array-remove-first.md @@ -0,0 +1,67 @@ +--- +title: ARRAY_REMOVE_FIRST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes the first occurrence of an element from an array. + +## Syntax + +```sql +ARRAY_REMOVE_FIRST(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The source array from which to remove the element. | +| element | The element to remove from the array. | + +## Return Type + +Array with the first occurrence of the specified element removed. + +## Notes + +This function works with both standard array types and variant array types. 
+ +## Examples + +### Example 1: Removing from a Standard Array + +```sql +SELECT ARRAY_REMOVE_FIRST([1, 2, 2, 3], 2); +``` + +Result: + +``` +[1, 2, 3] +``` + +### Example 2: Removing from a Variant Array + +```sql +SELECT ARRAY_REMOVE_FIRST(PARSE_JSON('["apple", "banana", "apple", "orange"]'), 'apple'); +``` + +Result: + +``` +["banana", "apple", "orange"] +``` + +### Example 3: Element Not Found + +```sql +SELECT ARRAY_REMOVE_FIRST([1, 2, 3], 4); +``` + +Result: + +``` +[1, 2, 3] +``` diff --git a/tidb-cloud-lake/sql/array-remove-last.md b/tidb-cloud-lake/sql/array-remove-last.md new file mode 100644 index 0000000000000..d9e4a7bc6ed0c --- /dev/null +++ b/tidb-cloud-lake/sql/array-remove-last.md @@ -0,0 +1,67 @@ +--- +title: ARRAY_REMOVE_LAST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes the last occurrence of an element from an array. + +## Syntax + +```sql +ARRAY_REMOVE_LAST(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The source array from which to remove the element. | +| element | The element to remove from the array. | + +## Return Type + +Array with the last occurrence of the specified element removed. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Removing from a Standard Array + +```sql +SELECT ARRAY_REMOVE_LAST([1, 2, 2, 3], 2); +``` + +Result: + +``` +[1, 2, 3] +``` + +### Example 2: Removing from a Variant Array + +```sql +SELECT ARRAY_REMOVE_LAST(PARSE_JSON('["apple", "banana", "apple", "orange"]'), 'apple'); +``` + +Result: + +``` +["apple", "banana", "orange"] +``` + +### Example 3: Element Not Found + +```sql +SELECT ARRAY_REMOVE_LAST([1, 2, 3], 4); +``` + +Result: + +``` +[1, 2, 3] +``` diff --git a/tidb-cloud-lake/sql/array-remove.md b/tidb-cloud-lake/sql/array-remove.md new file mode 100644 index 0000000000000..4ec4d672c9768 --- /dev/null +++ b/tidb-cloud-lake/sql/array-remove.md @@ -0,0 +1,67 @@ +--- +title: ARRAY_REMOVE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes all occurrences of an element from an array. + +## Syntax + +```sql +ARRAY_REMOVE(array, element) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The source array from which to remove elements. | +| element | The element to remove from the array. | + +## Return Type + +Array with all occurrences of the specified element removed. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Removing from a Standard Array + +```sql +SELECT ARRAY_REMOVE([1, 2, 2, 3, 2], 2); +``` + +Result: + +``` +[1, 3] +``` + +### Example 2: Removing from a Variant Array + +```sql +SELECT ARRAY_REMOVE(PARSE_JSON('["apple", "banana", "apple", "orange"]'), 'apple'); +``` + +Result: + +``` +["banana", "orange"] +``` + +### Example 3: Element Not Found + +```sql +SELECT ARRAY_REMOVE([1, 2, 3], 4); +``` + +Result: + +``` +[1, 2, 3] +``` diff --git a/tidb-cloud-lake/sql/array-reverse.md b/tidb-cloud-lake/sql/array-reverse.md new file mode 100644 index 0000000000000..6b182a154aeb5 --- /dev/null +++ b/tidb-cloud-lake/sql/array-reverse.md @@ -0,0 +1,66 @@ +--- +title: ARRAY_REVERSE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Reverses the order of elements in an array. 
+ +## Syntax + +```sql +ARRAY_REVERSE(array) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The array to reverse. | + +## Return Type + +Array with elements in reversed order. + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Reversing a Standard Array + +```sql +SELECT ARRAY_REVERSE([1, 2, 3, 4, 5]); +``` + +Result: + +``` +[5, 4, 3, 2, 1] +``` + +### Example 2: Reversing a Variant Array + +```sql +SELECT ARRAY_REVERSE(PARSE_JSON('["apple", "banana", "orange"]')); +``` + +Result: + +``` +["orange", "banana", "apple"] +``` + +### Example 3: Reversing an Empty Array + +```sql +SELECT ARRAY_REVERSE([]); +``` + +Result: + +``` +[] +``` diff --git a/tidb-cloud-lake/sql/array-size.md b/tidb-cloud-lake/sql/array-size.md new file mode 100644 index 0000000000000..d0100cc3bf8c0 --- /dev/null +++ b/tidb-cloud-lake/sql/array-size.md @@ -0,0 +1,40 @@ +--- +title: ARRAY_SIZE +title_includes: ARRAY_LENGTH +--- + +Returns the length of an array, counting `NULL` elements. + +Alias: `ARRAY_LENGTH` + +## Syntax + +```sql +ARRAY_SIZE() +``` + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT ARRAY_SIZE([1, 2, 3]) AS size_plain; + +┌──────────┐ +│ size_plain │ +├──────────┤ +│ 3 │ +└──────────┘ +``` + +```sql +SELECT ARRAY_SIZE([1, NULL, 3]) AS size_with_null; + +┌──────────────┐ +│ size_with_null│ +├──────────────┤ +│ 3 │ +└──────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-skewness.md b/tidb-cloud-lake/sql/array-skewness.md new file mode 100644 index 0000000000000..e48e6ac615d1e --- /dev/null +++ b/tidb-cloud-lake/sql/array-skewness.md @@ -0,0 +1,47 @@ +--- +title: ARRAY_SKEWNESS +--- + +Returns the skewness of numeric values in an array. `NULL` items are ignored; non-numeric items raise an error. + +## Syntax + +```sql +ARRAY_SKEWNESS() +``` + +## Return Type + +Floating-point. + +## Examples + +```sql +SELECT ARRAY_SKEWNESS([1, 2, 3, 4]) AS skew; + +┌──────┐ +│ skew │ +├──────┤ +│ 0 │ +└──────┘ +``` + +```sql +SELECT ARRAY_SKEWNESS([1.5, 2.5, 3.5, 4.5]) AS skew_decimal; + +┌────────────┐ +│ skew_decimal│ +├────────────┤ +│ 0 │ +└────────────┘ +``` + +```sql +SELECT ARRAY_SKEWNESS([NULL, 2, 3, 10]) AS skew_null; + +┌────────────────────┐ +│ skew_null │ +├────────────────────┤ +│ 1.6300591617118865 │ +└────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-slice.md b/tidb-cloud-lake/sql/array-slice.md new file mode 100644 index 0000000000000..f3a5eb7eb60f8 --- /dev/null +++ b/tidb-cloud-lake/sql/array-slice.md @@ -0,0 +1,69 @@ +--- +title: ARRAY_SLICE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Extracts a sub-array using slice between start and end arguments. + +## Syntax + +```sql +ARRAY_SLICE(array, start, end) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The source array from which to extract a slice. | +| start | The starting position of the slice (inclusive). | +| end | The ending position of the slice (exclusive). | + +## Return Type + +Array (slice of the original array). + +## Important Note on Indexing + +- For standard array types: Indexing is **1-based** (first element is at position 1). +- For variant array types: Indexing is **0-based** (first element is at position 0), for compatibility with Snowflake. 
+ +## Examples + +### Example 1: Slicing a Standard Array (1-based indexing) + +```sql +SELECT ARRAY_SLICE([10, 20, 30, 40, 50], 2, 4); +``` + +Result: + +``` +[20, 30] +``` + +### Example 2: Slicing a Variant Array (0-based indexing) + +```sql +SELECT ARRAY_SLICE(PARSE_JSON('["apple", "banana", "orange", "grape", "kiwi"]'), 1, 3); +``` + +Result: + +``` +["banana", "orange"] +``` + +### Example 3: Out of Bounds Slice + +```sql +SELECT ARRAY_SLICE([1, 2, 3], 4, 6); +``` + +Result: + +``` +[] +``` diff --git a/tidb-cloud-lake/sql/array-sort.md b/tidb-cloud-lake/sql/array-sort.md new file mode 100644 index 0000000000000..8324aa4850aac --- /dev/null +++ b/tidb-cloud-lake/sql/array-sort.md @@ -0,0 +1,70 @@ +--- +title: ARRAY_SORT +title_includes: ARRAY_SORT_ASC_NULL_FIRST, ARRAY_SORT_ASC_NULL_LAST, ARRAY_SORT_DESC_NULL_FIRST, ARRAY_SORT_DESC_NULL_LAST +--- + +Sorts the elements of an array. By default, `ARRAY_SORT` orders ascending and places `NULL` values last. Use the explicit variants to control order and `NULL` placement. + +## Syntax + +```sql +ARRAY_SORT() +ARRAY_SORT_ASC_NULL_FIRST() +ARRAY_SORT_ASC_NULL_LAST() +ARRAY_SORT_DESC_NULL_FIRST() +ARRAY_SORT_DESC_NULL_LAST() +``` + +## Return Type + +`ARRAY` + +## Examples + +```sql +SELECT ARRAY_SORT([3, 1, 2]) AS sort_default; + +┌──────────────┐ +│ sort_default │ +├──────────────┤ +│ [1,2,3] │ +└──────────────┘ +``` + +```sql +SELECT ARRAY_SORT([NULL, 2, 1]) AS sort_with_nulls; + +┌────────────────┐ +│ sort_with_nulls│ +├────────────────┤ +│ [1,2,NULL] │ +└────────────────┘ +``` + +```sql +SELECT ARRAY_SORT_ASC_NULL_FIRST([NULL, 2, 1]) AS asc_null_first; + +┌────────────────┐ +│ asc_null_first │ +├────────────────┤ +│ [NULL,1,2] │ +└────────────────┘ +``` + +```sql +SELECT ARRAY_SORT_DESC_NULL_LAST([NULL, 2, 1]) AS desc_null_last; + +┌────────────────┐ +│ desc_null_last │ +├────────────────┤ +│ [2,1,NULL] │ +└────────────────┘ + +SELECT ARRAY_SORT_DESC_NULL_FIRST([NULL, 2, 1]) AS desc_null_first; + +┌─────────────────┐ +│ desc_null_first │ +├─────────────────┤ +│ [NULL,2,1] │ +└─────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-sql.md b/tidb-cloud-lake/sql/array-sql.md new file mode 100644 index 0000000000000..feaee48f3c325 --- /dev/null +++ b/tidb-cloud-lake/sql/array-sql.md @@ -0,0 +1,47 @@ +--- +title: ARRAY +--- + +Builds an array literal from the supplied expressions. Each argument is evaluated and stored in order. All elements must be castable to a common type. + +## Syntax + +```sql +ARRAY(, , ... ) +``` + +## Return Type + +`ARRAY` + +## Examples + +```sql +SELECT ARRAY(1, 2, 3) AS arr_int; + +┌─────────┐ +│ arr_int │ +├─────────┤ +│ [1,2,3] │ +└─────────┘ +``` + +```sql +SELECT ARRAY('alpha', UPPER('beta')) AS arr_text; + +┌───────────┐ +│ arr_text │ +├───────────┤ +│ ["alpha","BETA"] │ +└───────────┘ +``` + +```sql +SELECT ARRAY(1, NULL, 3) AS arr_with_null; + +┌────────────────┐ +│ arr_with_null │ +├────────────────┤ +│ [1,NULL,3] │ +└────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-stddev-pop.md b/tidb-cloud-lake/sql/array-stddev-pop.md new file mode 100644 index 0000000000000..f58fcf8f55d41 --- /dev/null +++ b/tidb-cloud-lake/sql/array-stddev-pop.md @@ -0,0 +1,38 @@ +--- +title: ARRAY_STDDEV_POP +title_includes: ARRAY_STD +--- + +Computes the population standard deviation of numeric array values. `NULL` entries are ignored; non-numeric entries raise an error. + +## Syntax + +```sql +ARRAY_STDDEV_POP() +``` + +## Return Type + +Floating-point. 
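+
+`ARRAY_STD` is listed as an alias of `ARRAY_STDDEV_POP` in the [Array Functions](/tidb-cloud-lake/sql/array-functions.md) overview. A minimal check, assuming the alias resolves to the same implementation (the expected result is 2, matching the first example below):
+
+```sql
+SELECT ARRAY_STD([2, 4, 4, 4, 5, 5, 7, 9]) AS std_alias;
+```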
+ +## Examples + +```sql +SELECT ARRAY_STDDEV_POP([2, 4, 4, 4, 5, 5, 7, 9]) AS stddev_pop; + +┌────────────┐ +│ stddev_pop │ +├────────────┤ +│ 2 │ +└────────────┘ +``` + +```sql +SELECT ARRAY_STDDEV_POP([1.5, 2.5, NULL, 3.5]) AS stddev_pop_null; + +┌─────────────────┐ +│ stddev_pop_null │ +├─────────────────┤ +│ 0.816496580927726 │ +└─────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-stddev-samp.md b/tidb-cloud-lake/sql/array-stddev-samp.md new file mode 100644 index 0000000000000..06daaa3df3560 --- /dev/null +++ b/tidb-cloud-lake/sql/array-stddev-samp.md @@ -0,0 +1,38 @@ +--- +title: ARRAY_STDDEV_SAMP +title_includes: ARRAY_STDDEV +--- + +Computes the sample standard deviation of numeric array values. `NULL` items are ignored; non-numeric entries raise an error. + +## Syntax + +```sql +ARRAY_STDDEV_SAMP() +``` + +## Return Type + +Floating-point. + +## Examples + +```sql +SELECT ARRAY_STDDEV_SAMP([2, 4, 4, 4, 5, 5, 7, 9]) AS stddev_samp; + +┌─────────────┐ +│ stddev_samp │ +├─────────────┤ +│ 2.138089935299395 │ +└─────────────┘ +``` + +```sql +SELECT ARRAY_STDDEV_SAMP([1.5, 2.5, NULL, 3.5]) AS stddev_samp_null; + +┌─────────────────┐ +│ stddev_samp_null │ +├─────────────────┤ +│ 1 │ +└─────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-sum.md b/tidb-cloud-lake/sql/array-sum.md new file mode 100644 index 0000000000000..0357366a661ad --- /dev/null +++ b/tidb-cloud-lake/sql/array-sum.md @@ -0,0 +1,51 @@ +--- +title: ARRAY_SUM +--- + +Sums the numeric elements in an array. `NULL` items are skipped, and non-numeric values raise an error. + +## Syntax + +```sql +ARRAY_SUM() +``` + +## Return Type + +Numeric (matches the widest numeric type in the array). + +## Examples + +```sql +SELECT ARRAY_SUM([1, 2, 3, 4]) AS total; + +┌───────┐ +│ total │ +├───────┤ +│ 10 │ +└───────┘ +``` + +```sql +SELECT ARRAY_SUM([1.5, 2.25, 3.0]) AS total; + +┌────────┐ +│ total │ +├────────┤ +│ 6.75 │ +└────────┘ +``` + +```sql +SELECT ARRAY_SUM([10, NULL, -3]) AS total; + +┌───────┐ +│ total │ +├───────┤ +│ 7 │ +└───────┘ +``` + +## Related + +- [ARRAY_AGGREGATE](/tidb-cloud-lake/sql/array-aggregate.md) diff --git a/tidb-cloud-lake/sql/array-to-string.md b/tidb-cloud-lake/sql/array-to-string.md new file mode 100644 index 0000000000000..6efc959c0fea0 --- /dev/null +++ b/tidb-cloud-lake/sql/array-to-string.md @@ -0,0 +1,37 @@ +--- +title: ARRAY_TO_STRING +--- + +Concatenates the string elements of an array into a single string, separated by a delimiter. `NULL` elements are skipped. + +## Syntax + +```sql +ARRAY_TO_STRING(, ) +``` + +## Return Type + +`STRING` + +## Examples + +```sql +SELECT ARRAY_TO_STRING(['a', 'b', 'c'], ',') AS joined; + +┌────────┐ +│ joined │ +├────────┤ +│ a,b,c │ +└────────┘ +``` + +```sql +SELECT ARRAY_TO_STRING([NULL, 'x', 'y'], '-') AS joined_no_nulls; + +┌──────────────────┐ +│ joined_no_nulls │ +├──────────────────┤ +│ x-y │ +└──────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/array-unique.md b/tidb-cloud-lake/sql/array-unique.md new file mode 100644 index 0000000000000..270a2ce3ce882 --- /dev/null +++ b/tidb-cloud-lake/sql/array-unique.md @@ -0,0 +1,66 @@ +--- +title: ARRAY_UNIQUE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the number of unique elements in the array. + +## Syntax + +```sql +ARRAY_UNIQUE(array) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| array | The array to analyze for unique elements. 
| + +## Return Type + +INTEGER + +## Notes + +This function works with both standard array types and variant array types. + +## Examples + +### Example 1: Counting Unique Elements in a Standard Array + +```sql +SELECT ARRAY_UNIQUE([1, 2, 2, 3, 3, 3]); +``` + +Result: + +``` +3 +``` + +### Example 2: Counting Unique Elements in a Variant Array + +```sql +SELECT ARRAY_UNIQUE(PARSE_JSON('["apple", "banana", "apple", "orange", "banana"]')); +``` + +Result: + +``` +3 +``` + +### Example 3: Empty Array + +```sql +SELECT ARRAY_UNIQUE([]); +``` + +Result: + +``` +0 +``` diff --git a/tidb-cloud-lake/sql/array.md b/tidb-cloud-lake/sql/array.md new file mode 100644 index 0000000000000..08db75ea1e821 --- /dev/null +++ b/tidb-cloud-lake/sql/array.md @@ -0,0 +1,52 @@ +--- +title: Array +description: Array of defined data type. +sidebar_position: 8 +--- + +## Overview + +`ARRAY(T)` stores variable-length collections whose elements all share the type `T`. Define the element type when creating a table and use array functions to read or transform the values. + +:::note +Databend arrays are 1-based. `arr[1]` returns the first element and `arr[n]` the last. +::: + +## Examples + +```sql +CREATE TABLE array_samples (arr ARRAY(INT64)); + +INSERT INTO array_samples VALUES ([1, 2, 3]), ([10, 20]); + +SELECT + arr, + arr[1] AS first_elem, + arr[2] AS second_elem +FROM array_samples; +``` + +Result: +``` +┌────────────┬────────────┬──────────────┐ +│ arr │ first_elem │ second_elem │ +├────────────┼────────────┼──────────────┤ +│ [1,2,3] │ 1 │ 2 │ +│ [10,20] │ 10 │ 20 │ +└────────────┴────────────┴──────────────┘ +``` + +```sql +-- Index 0 always returns NULL because arrays are 1-based. +SELECT arr[0] AS zeroth_elem FROM array_samples; +``` + +Result: +``` +┌─────────────┐ +│ zeroth_elem │ +├─────────────┤ +│ NULL │ +│ NULL │ +└─────────────┘ +``` diff --git a/tidb-cloud-lake/sql/arrays-zip.md b/tidb-cloud-lake/sql/arrays-zip.md new file mode 100644 index 0000000000000..28e5a6683f2e7 --- /dev/null +++ b/tidb-cloud-lake/sql/arrays-zip.md @@ -0,0 +1,39 @@ +--- +title: ARRAYS_ZIP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Merges multiple arrays into a single array tuple. + +## Syntax + +```sql +ARRAYS_ZIP( [, ...] ) +``` + +## Arguments + +| Arguments | Description | +|------------|-------------------| +| `` | The input ARRAYs. | + +:::note +- The length of each array must be the same. +::: + +## Return Type + +Array(Tuple). + +## Examples + +```sql +SELECT ARRAYS_ZIP([1, 2, 3], ['a', 'b', 'c']); +┌────────────────────────────────────────┐ +│ arrays_zip([1, 2, 3], ['a', 'b', 'c']) │ +├────────────────────────────────────────┤ +│ [(1,'a'),(2,'b'),(3,'c')] │ +└────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/as-array.md b/tidb-cloud-lake/sql/as-array.md new file mode 100644 index 0000000000000..ba896855cc16c --- /dev/null +++ b/tidb-cloud-lake/sql/as-array.md @@ -0,0 +1,49 @@ +--- +title: AS_ARRAY +--- + +Strict casting `VARIANT` values to ARRAY data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. 
+ +## Syntax + +```sql +AS_ARRAY( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +Variant contains Array + +## Examples + +```sql +SELECT as_array(parse_json('[1,2,3]')); ++---------------------------------+ +| as_array(parse_json('[1,2,3]')) | ++---------------------------------+ +| [1,2,3] | ++---------------------------------+ + +SELECT as_array(parse_json('["a","b","c"]')); ++---------------------------------------+ +| as_array(parse_json('["a","b","c"]')) | ++---------------------------------------+ +| ["a","b","c"] | ++---------------------------------------+ + +-- Returns NULL for non-array values +SELECT as_array(parse_json('{"key":"value"}')); ++-----------------------------------------+ +| as_array(parse_json('{"key":"value"}')) | ++-----------------------------------------+ +| NULL | ++-----------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-binary.md b/tidb-cloud-lake/sql/as-binary.md new file mode 100644 index 0000000000000..14e2409f0520e --- /dev/null +++ b/tidb-cloud-lake/sql/as-binary.md @@ -0,0 +1,49 @@ +--- +title: AS_BINARY +--- + +Strict casting `VARIANT` values to BINARY data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. + +## Syntax + +```sql +AS_BINARY( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +BINARY + +## Examples + +```sql +SELECT as_binary(to_binary('abcd')::variant); ++---------------------------------------+ +| as_binary(to_binary('abcd')::variant) | ++---------------------------------------+ +| 61626364 | ++---------------------------------------+ + +SELECT as_binary(to_binary('hello')::variant); ++-----------------------------------------+ +| as_binary(to_binary('hello')::variant) | ++-----------------------------------------+ +| 68656C6C6F | ++-----------------------------------------+ + +-- Returns NULL for non-binary values +SELECT as_binary(parse_json('"text"')); ++---------------------------------+ +| as_binary(parse_json('"text"')) | ++---------------------------------+ +| NULL | ++---------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-boolean.md b/tidb-cloud-lake/sql/as-boolean.md new file mode 100644 index 0000000000000..ef1b919106003 --- /dev/null +++ b/tidb-cloud-lake/sql/as-boolean.md @@ -0,0 +1,49 @@ +--- +title: AS_BOOLEAN +--- + +Strict casting `VARIANT` values to BOOLEAN data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. 
+ +## Syntax + +```sql +AS_BOOLEAN( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +BOOLEAN + +## Examples + +```sql +SELECT as_boolean(parse_json('true')); ++--------------------------------+ +| as_boolean(parse_json('true')) | ++--------------------------------+ +| 1 | ++--------------------------------+ + +SELECT as_boolean(parse_json('false')); ++---------------------------------+ +| as_boolean(parse_json('false')) | ++---------------------------------+ +| 0 | ++---------------------------------+ + +-- Returns NULL for non-boolean values +SELECT as_boolean(parse_json('123')); ++-------------------------------+ +| as_boolean(parse_json('123')) | ++-------------------------------+ +| NULL | ++-------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-date.md b/tidb-cloud-lake/sql/as-date.md new file mode 100644 index 0000000000000..332cede5ab501 --- /dev/null +++ b/tidb-cloud-lake/sql/as-date.md @@ -0,0 +1,49 @@ +--- +title: AS_DATE +--- + +Strict casting `VARIANT` values to DATE data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. + +## Syntax + +```sql +AS_DATE( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +DATE + +## Examples + +```sql +SELECT as_date(to_date('2025-10-11')::variant); ++-----------------------------------------+ +| as_date(to_date('2025-10-11')::variant) | ++-----------------------------------------+ +| 2025-10-11 | ++-----------------------------------------+ + +SELECT as_date(parse_json('"2024-12-25"')::variant); ++-----------------------------------------------+ +| as_date(parse_json('"2024-12-25"')::variant) | ++-----------------------------------------------+ +| 2024-12-25 | ++-----------------------------------------------+ + +-- Returns NULL for non-date values +SELECT as_date(parse_json('123')); ++----------------------------+ +| as_date(parse_json('123')) | ++----------------------------+ +| NULL | ++----------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-decimal.md b/tidb-cloud-lake/sql/as-decimal.md new file mode 100644 index 0000000000000..19143dbb6d849 --- /dev/null +++ b/tidb-cloud-lake/sql/as-decimal.md @@ -0,0 +1,49 @@ +--- +title: AS_DECIMAL +--- + +Strict casting `VARIANT` values to DECIMAL data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. 
+ +## Syntax + +```sql +AS_DECIMAL( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +DECIMAL + +## Examples + +```sql +SELECT as_decimal(parse_json('12.34')); ++---------------------------------+ +| as_decimal(parse_json('12.34')) | ++---------------------------------+ +| 12.34 | ++---------------------------------+ + +SELECT as_decimal(parse_json('123.456789')); ++--------------------------------------+ +| as_decimal(parse_json('123.456789')) | ++--------------------------------------+ +| 123.456789 | ++--------------------------------------+ + +-- Returns NULL for non-decimal values +SELECT as_decimal(parse_json('"abc"')); ++---------------------------------+ +| as_decimal(parse_json('"abc"')) | ++---------------------------------+ +| NULL | ++---------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-float.md b/tidb-cloud-lake/sql/as-float.md new file mode 100644 index 0000000000000..0ae41bcfbf312 --- /dev/null +++ b/tidb-cloud-lake/sql/as-float.md @@ -0,0 +1,49 @@ +--- +title: AS_FLOAT +--- + +Strict casting `VARIANT` values to DOUBLE data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. + +## Syntax + +```sql +AS_FLOAT( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +DOUBLE + +## Examples + +```sql +SELECT as_float(parse_json('12.34')); ++-------------------------------+ +| as_float(parse_json('12.34')) | ++-------------------------------+ +| 12.34 | ++-------------------------------+ + +SELECT as_float(parse_json('123')); ++-----------------------------+ +| as_float(parse_json('123')) | ++-----------------------------+ +| 123.0 | ++-----------------------------+ + +-- Returns NULL for non-numeric values +SELECT as_float(parse_json('"abc"')); ++-------------------------------+ +| as_float(parse_json('"abc"')) | ++-------------------------------+ +| NULL | ++-------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-integer.md b/tidb-cloud-lake/sql/as-integer.md new file mode 100644 index 0000000000000..e71f6f763a5e1 --- /dev/null +++ b/tidb-cloud-lake/sql/as-integer.md @@ -0,0 +1,49 @@ +--- +title: AS_INTEGER +--- + +Strict casting `VARIANT` values to BIGINT data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. 
+ +## Syntax + +```sql +AS_INTEGER( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +BIGINT + +## Examples + +```sql +SELECT as_integer(parse_json('123')); ++-------------------------------+ +| as_integer(parse_json('123')) | ++-------------------------------+ +| 123 | ++-------------------------------+ + +SELECT as_integer(parse_json('-456')); ++--------------------------------+ +| as_integer(parse_json('-456')) | ++--------------------------------+ +| -456 | ++--------------------------------+ + +-- Returns NULL for non-integer values +SELECT as_integer(parse_json('12.34')); ++---------------------------------+ +| as_integer(parse_json('12.34')) | ++---------------------------------+ +| NULL | ++---------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-object.md b/tidb-cloud-lake/sql/as-object.md new file mode 100644 index 0000000000000..184048b8cd569 --- /dev/null +++ b/tidb-cloud-lake/sql/as-object.md @@ -0,0 +1,49 @@ +--- +title: AS_OBJECT +--- + +Strict casting `VARIANT` values to OBJECT data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. + +## Syntax + +```sql +AS_OBJECT( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +Variant contains Object + +## Examples + +```sql +SELECT as_object(parse_json('{"k":"v","a":"b"}')); ++--------------------------------------------+ +| as_object(parse_json('{"k":"v","a":"b"}')) | ++--------------------------------------------+ +| {"k":"v","a":"b"} | ++--------------------------------------------+ + +SELECT as_object(parse_json('{"name":"John","age":30}')); ++-----------------------------------------------+ +| as_object(parse_json('{"name":"John","age":30}')) | ++-----------------------------------------------+ +| {"name":"John","age":30} | ++-----------------------------------------------+ + +-- Returns NULL for non-object values +SELECT as_object(parse_json('[1,2,3]')); ++----------------------------------+ +| as_object(parse_json('[1,2,3]')) | ++----------------------------------+ +| NULL | ++----------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/as-string.md b/tidb-cloud-lake/sql/as-string.md new file mode 100644 index 0000000000000..d1605064e8f0f --- /dev/null +++ b/tidb-cloud-lake/sql/as-string.md @@ -0,0 +1,49 @@ +--- +title: AS_STRING +--- + +Strict casting `VARIANT` values to VARCHAR data type. +If the input data type is not `VARIANT`, the output is `NULL`. +If the type of value in the `VARIANT` does not match the output value, the output is `NULL`. 
+ +## Syntax + +```sql +AS_STRING( ) +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------| +| `` | The VARIANT value | + +## Return Type + +VARCHAR + +## Examples + +```sql +SELECT as_string(parse_json('"abc"')); ++--------------------------------+ +| as_string(parse_json('"abc"')) | ++--------------------------------+ +| abc | ++--------------------------------+ + +SELECT as_string(parse_json('"hello world"')); ++----------------------------------------+ +| as_string(parse_json('"hello world"')) | ++----------------------------------------+ +| hello world | ++----------------------------------------+ + +-- Returns NULL for non-string values +SELECT as_string(parse_json('123')); ++------------------------------+ +| as_string(parse_json('123')) | ++------------------------------+ +| NULL | ++------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/ascii.md b/tidb-cloud-lake/sql/ascii.md new file mode 100644 index 0000000000000..f50eb963e0779 --- /dev/null +++ b/tidb-cloud-lake/sql/ascii.md @@ -0,0 +1,32 @@ +--- +title: ASCII +--- + +Returns the numeric value of the leftmost character of the string str. + +## Syntax + +```sql +ASCII() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | The string. | + +## Return Type + +`TINYINT` + +## Examples + +```sql +SELECT ASCII('2'); ++------------+ +| ASCII('2') | ++------------+ +| 50 | ++------------+ +``` diff --git a/tidb-cloud-lake/sql/asin.md b/tidb-cloud-lake/sql/asin.md new file mode 100644 index 0000000000000..4f7b235003d8d --- /dev/null +++ b/tidb-cloud-lake/sql/asin.md @@ -0,0 +1,23 @@ +--- +title: ASIN +--- + +Returns the arc sine of `x`, that is, the value whose sine is `x`. Returns NULL if `x` is not in the range -1 to 1. + +## Syntax + +```sql +ASIN( ) +``` + +## Examples + +```sql +SELECT ASIN(0.2); + +┌────────────────────┐ +│ asin(0.2) │ +├────────────────────┤ +│ 0.2013579207903308 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/assume-not-null.md b/tidb-cloud-lake/sql/assume-not-null.md new file mode 100644 index 0000000000000..b6719467c22bb --- /dev/null +++ b/tidb-cloud-lake/sql/assume-not-null.md @@ -0,0 +1,36 @@ +--- +title: ASSUME_NOT_NULL +--- + +Results in an equivalent non-`Nullable` value for a Nullable type. In case the original value is `NULL` the result is undetermined. + +## Syntax + +```sql +ASSUME_NOT_NULL() +``` + +## Aliases + +- [REMOVE_NULLABLE](/tidb-cloud-lake/sql/remove-nullable.md) + +## Return Type + +Returns the original datatype from the non-`Nullable` type; Returns the embedded non-`Nullable` datatype for `Nullable` type. + +## Examples + +```sql +CREATE TABLE default.t_null ( x int, y int null); + +INSERT INTO default.t_null values (1, null), (2, 3); + +SELECT ASSUME_NOT_NULL(y), REMOVE_NULLABLE(y) FROM t_null; + +┌─────────────────────────────────────────┐ +│ assume_not_null(y) │ remove_nullable(y) │ +├────────────────────┼────────────────────┤ +│ 0 │ 0 │ +│ 3 │ 3 │ +└─────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/at.md b/tidb-cloud-lake/sql/at.md new file mode 100644 index 0000000000000..2039133a4a577 --- /dev/null +++ b/tidb-cloud-lake/sql/at.md @@ -0,0 +1,95 @@ +--- +title: AT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The AT clause enables you to retrieve previous versions of your data by specifying a snapshot ID, timestamp, stream name, or a time interval. 
+ +Databend automatically creates snapshots when data updates occur, so a snapshot can be considered as a view of your data at a time point in the past. You can access a snapshot by the snapshot ID or the timestamp at which the snapshot was created. For how to obtain the snapshot ID and timestamp, see [Obtaining Snapshot ID and Timestamp](#obtaining-snapshot-id-and-timestamp). + +This is part of the Databend's Time Travel feature that allows you to query, back up, and restore from a previous version of your data within the retention period (24 hours by default). + +## Syntax + +```sql +SELECT ... +FROM ... +AT ( + SNAPSHOT => '' | + TIMESTAMP => | + STREAM => | + OFFSET => + ) +``` + +| Parameter | Description | +|-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| SNAPSHOT | Specifies a specific snapshot ID to query previous data from. | +| TIMESTAMP | Specifies a particular timestamp to retrieve data from. | +| STREAM | Indicates querying the data at the time the specified stream was created. | +| OFFSET | Specifies the number of seconds to go back from the current time. It should be in the form of a negative integer, where the absolute value represents the time difference in seconds. For example, `-3600` represents traveling back in time by 1 hour (3,600 seconds). | + +## Obtaining Snapshot ID and Timestamp + +To return the snapshot IDs and timestamps of all the snapshots of a table, use the [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) function: + +```sql +SELECT snapshot_id, + timestamp +FROM FUSE_SNAPSHOT('', ''); +``` + +## Examples + +This example demonstrates the AT clause, allowing retrieval of previous data versions based on a snapshot ID, timestamp, and stream: + +1. Create a table named `t` with a single column `a`, and insert two rows with values 1 and 2 into the table. + +```sql +CREATE TABLE t(a INT); + +INSERT INTO t VALUES(1); +INSERT INTO t VALUES(2); +``` + +2. Create a stream named `s` on the table `t`, and add an additional row with value 3 into the table. + +```sql +CREATE STREAM s ON TABLE t; + +INSERT INTO t VALUES(3); +``` + +3. Run time travel queries to retrieve previous data versions. 
+ +```sql +-- Return snapshot IDs and corresponding timestamps for table 't' +SELECT snapshot_id, timestamp FROM FUSE_SNAPSHOT('default', 't'); +┌───────────────────────────────────────────────────────────────┐ +│ snapshot_id │ timestamp │ +├──────────────────────────────────┼────────────────────────────┤ +│ 296349da841d4fa8820bbf8e228d75f3 │ 2024-04-02 15:25:21.456574 │ +│ aaa4857c5935401790db2c9f0f2818be │ 2024-04-02 15:19:02.484304 │ +│ e66ad2bc3f21416e87903dc9cd0388a3 │ 2024-04-02 15:18:40.766361 │ +└───────────────────────────────────────────────────────────────┘ + +-- These queries retrieve the same data but using different methods: +-- by snapshot_id: +SELECT * FROM t AT (SNAPSHOT => 'aaa4857c5935401790db2c9f0f2818be'); +-- by timestamp: +SELECT * FROM t AT (TIMESTAMP => '2024-04-02 15:19:02.484304'::TIMESTAMP); +-- by stream: +SELECT * FROM t AT (STREAM => s); + +┌─────────────────┐ +│ a │ +├─────────────────┤ +│ 1 │ +│ 2 │ +└─────────────────┘ + +-- Retrieve all columns from table 't' with data from 60 seconds ago +SELECT * FROM t AT (OFFSET => -60); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/atan-sql.md b/tidb-cloud-lake/sql/atan-sql.md new file mode 100644 index 0000000000000..0b1ca765cfa75 --- /dev/null +++ b/tidb-cloud-lake/sql/atan-sql.md @@ -0,0 +1,23 @@ +--- +title: ATAN2 +--- + +Returns the arc tangent of the two variables `x` and `y`. It is similar to calculating the arc tangent of `y` / `x`, except that the signs of both arguments are used to determine the quadrant of the result. `ATAN(y, x)` is a synonym for `ATAN2(y, x)`. + +## Syntax + +```sql +ATAN2( ) +``` + +## Examples + +```sql +SELECT ATAN2(-2, 2); + +┌─────────────────────┐ +│ atan2((- 2), 2) │ +├─────────────────────┤ +│ -0.7853981633974483 │ +└─────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/atan.md b/tidb-cloud-lake/sql/atan.md new file mode 100644 index 0000000000000..459fe53bb7d35 --- /dev/null +++ b/tidb-cloud-lake/sql/atan.md @@ -0,0 +1,23 @@ +--- +title: ATAN +--- + +Returns the arc tangent of `x`, that is, the value whose tangent is `x`. + +## Syntax + +```sql +ATAN( ) +``` + +## Examples + +```sql +SELECT ATAN(-2); + +┌─────────────────────┐ +│ atan((- 2)) │ +├─────────────────────┤ +│ -1.1071487177940906 │ +└─────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/attach-table.md b/tidb-cloud-lake/sql/attach-table.md new file mode 100644 index 0000000000000..70db2e63386e4 --- /dev/null +++ b/tidb-cloud-lake/sql/attach-table.md @@ -0,0 +1,143 @@ +--- +title: ATTACH TABLE +sidebar_position: 6 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +ATTACH TABLE creates a read-only link to existing table data without copying it. This command is ideal for data sharing across environments, especially when migrating from a private Databend deployment to [Databend Cloud](https://www.databend.com). 
+ +## Key Features + +- **Zero-Copy Data Access**: Links to source data without physical data movement +- **Real-Time Updates**: Changes in the source table are instantly visible in attached tables +- **Read-Only Mode**: Only supports SELECT queries (no INSERT, UPDATE, or DELETE operations) +- **Column-Level Access**: Optionally include only specific columns for security and performance + +## Syntax + +```sql +ATTACH TABLE [ ( ) ] '' +CONNECTION = ( CONNECTION_NAME = '' ) +``` + +### Parameters + +- **``**: Name of the new attached table to create + +- **``**: Optional list of columns to include from the source table + - When omitted, all columns are included + - Provides column-level security and access control + - Example: `(customer_id, product, amount)` + +- **``**: Path to the source table data in object storage + - Format: `s3://///` + - Example: `s3://databend-toronto/1/23351/` + +- **`CONNECTION_NAME`**: References a connection created with [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) + +### Finding the Source Table Path + +Use the [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) function to get the database and table IDs: + +```sql +SELECT snapshot_location FROM FUSE_SNAPSHOT('default', 'employees'); +-- Result contains: 1/23351/_ss/... → Path is s3://your-bucket/1/23351/ +``` + +## Data Sharing Benefits + +### How It Works + +``` + Object Storage (S3, MinIO, Azure, etc.) + ┌─────────────┐ + │ Source Data │ + └──────┬──────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Marketing │ │ Finance │ │ Sales │ +│ Team View │ │ Team View │ │ Team View │ +└─────────────┘ └─────────────┘ └─────────────┘ +``` + +### Key Advantages + +| Traditional Approach | Databend ATTACH TABLE | +|---------------------|----------------------| +| Multiple data copies | Single copy shared by all | +| ETL delays, sync issues | Real-time, always current | +| Complex maintenance | Zero maintenance | +| More copies = more security risk | Fine-grained column access | +| Slower due to data movement | Full optimization on original data | + +### Security and Performance + +- **Column-Level Security**: Teams see only the columns they need +- **Real-Time Updates**: Source changes instantly visible in all attached tables +- **Strong Consistency**: Always see complete data snapshots, never partial updates +- **Full Performance**: Inherit all source table indexes and optimizations + +## Examples + +### Basic Usage + +```sql +-- Step 1: Create a connection to your storage +CREATE CONNECTION my_s3_connection + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Step 2: Attach a table with all columns +ATTACH TABLE population_all_columns 's3://databend-doc/1/16/' + CONNECTION = (CONNECTION_NAME = 'my_s3_connection'); +``` + +### Column Selection for Security + +```sql +-- Attach only specific columns for data security +ATTACH TABLE population_selected (city, population) 's3://databend-doc/1/16/' + CONNECTION = (CONNECTION_NAME = 'my_s3_connection'); +``` + +### Using IAM Role Authentication + +```sql +-- Create a connection using IAM role (more secure than access keys) +CREATE CONNECTION s3_role_connection + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::123456789012:role/databend-role'; + +-- Attach table using the IAM role connection +ATTACH TABLE population_all_columns 's3://databend-doc/1/16/' + CONNECTION = (CONNECTION_NAME = 's3_role_connection'); +``` + +### Team-Specific Views + +```sql +-- 
Marketing: Customer behavior analysis +ATTACH TABLE marketing_view (customer_id, product, amount, order_date) +'s3://your-bucket/1/23351/' +CONNECTION = (CONNECTION_NAME = 'my_s3_connection'); + +-- Finance: Revenue tracking (different columns) +ATTACH TABLE finance_view (order_id, amount, profit, order_date) +'s3://your-bucket/1/23351/' +CONNECTION = (CONNECTION_NAME = 'my_s3_connection'); +``` + +## Learn More + +- [Linking Tables with ATTACH TABLE](/tidb-cloud-lake/tutorials/data-sharing-via-attach-table.md) diff --git a/tidb-cloud-lake/sql/avg-if.md b/tidb-cloud-lake/sql/avg-if.md new file mode 100644 index 0000000000000..d06d5b39deb35 --- /dev/null +++ b/tidb-cloud-lake/sql/avg-if.md @@ -0,0 +1,44 @@ +--- +title: AVG_IF +--- + + +## AVG_IF + +The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. + +```sql +AVG_IF(, ) +``` + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE employees ( + id INT, + salary INT, + department VARCHAR +); + +INSERT INTO employees (id, salary, department) +VALUES (1, 50000, 'HR'), + (2, 60000, 'IT'), + (3, 55000, 'HR'), + (4, 70000, 'IT'), + (5, 65000, 'IT'); +``` + +**Query Demo: Calculate Average Salary for IT Department** + +```sql +SELECT AVG_IF(salary, department = 'IT') AS avg_salary_it +FROM employees; +``` + +**Result** +```sql +| avg_salary_it | +|-----------------| +| 65000.0 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/avg.md b/tidb-cloud-lake/sql/avg.md new file mode 100644 index 0000000000000..6f3350d11d1cd --- /dev/null +++ b/tidb-cloud-lake/sql/avg.md @@ -0,0 +1,60 @@ +--- +title: AVG +--- + +Aggregate function. + +The AVG() function returns the average value of an expression. + +**Note:** NULL values are not counted. + +## Syntax + +```sql +AVG() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------| +| `` | Any numerical expression | + +## Return Type + +double + +## Examples + +**Creating a Table and Inserting Sample Data** + +Let's create a table named "sales" and insert some sample data: +```sql +CREATE TABLE sales ( + id INTEGER, + product VARCHAR(50), + price FLOAT +); + +INSERT INTO sales (id, product, price) +VALUES (1, 'Product A', 10.5), + (2, 'Product B', 20.75), + (3, 'Product C', 30.0), + (4, 'Product D', 15.25), + (5, 'Product E', 25.5); +``` + +**Query: Using AVG() Function** + +Now, let's use the AVG() function to find the average price of all products in the "sales" table: +```sql +SELECT AVG(price) AS avg_price +FROM sales; +``` + +The result should look like this: +```sql +| avg_price | +| --------- | +| 20.4 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/begin.md b/tidb-cloud-lake/sql/begin.md new file mode 100644 index 0000000000000..1ad5862d8e7f0 --- /dev/null +++ b/tidb-cloud-lake/sql/begin.md @@ -0,0 +1,187 @@ +--- +title: BEGIN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Starts a new transaction. BEGIN and [COMMIT](/tidb-cloud-lake/sql/commit.md)/[ROLLBACK](/tidb-cloud-lake/sql/rollback.md) must be used together to start and then either save or undo a transaction. + +- Databend does *not* support nested transactions, so unmatched transaction statements will be ignored. + + ```sql title="Example:" + BEGIN; -- Start a transaction + + MERGE INTO ... 
-- This statement belongs to the transaction + + BEGIN; -- Executing BEGIN within a transaction is ignored, no new transaction is started, no error is raised + + INSERT INTO ... -- This statement also belongs to the transaction + + COMMIT; -- End the transaction + + INSERT INTO ... -- This statement belongs to a single-statement transaction + + COMMIT; -- Executing COMMIT outside of a multi-statement transaction is ignored, no commit operation is performed, no error is raised + + BEGIN; -- Start another transaction + ... + ``` + +- When a DDL statement is executed within a multi-statement transaction, it will commit the current multi-statement transaction and execute subsequent statements as single-statement transactions until another BEGIN is issued. + + ```sql title="Example:" + BEGIN; -- Start a multi-statement transaction + + -- DML statements here are part of the current transaction + INSERT INTO table_name (column1, column2) VALUES (value1, value2); + + -- Executing a DDL statement within the transaction + CREATE TABLE new_table (column1 data_type, column2 data_type); + -- This will commit the current transaction + + -- Subsequent statements are executed as single-statement transactions + UPDATE table_name SET column1 = value WHERE condition; + + BEGIN; -- Start a new multi-statement transaction + + -- New DML statements here are part of the new transaction + DELETE FROM table_name WHERE condition; + + COMMIT; -- End the new transaction + ``` + + +## Syntax + +```sql +BEGIN [ TRANSACTION ] +``` + +## Transaction IDs & Statuses + +Databend automatically generates a transaction ID for each transaction. This ID allows users to identify which statements belong to the same transaction, facilitating issue troubleshooting. + +If you're on Databend Cloud, you can find the transaction IDs on **Monitor** > **SQL History**: + +![alt text](../../../../../../static/img/documents/sql/transaction-id.png) + +In the **Transaction** column, you can also see the transaction status of SQL statements during execution: + +| Transaction Status | Description | +|--------------------|-----------------------------------------------------------------------------------------------------------------------------| +| AutoCommit | The statement is not part of a multi-statement transaction. | +| Active | The statement is part of a multi-statement transaction, and all statements preceding it within the transaction succeeded. | +| Fail | The statement is part of a multi-statement transaction, and at least one preceding statement within the transaction failed. | + +## Examples + +In this example, all three statements (INSERT, UPDATE, DELETE) are part of the same multi-statement transaction. They are executed as a single unit, and changes are committed together when COMMIT is issued. 
+ +```sql +-- Start by creating a table +CREATE TABLE employees ( + id INT, + name VARCHAR(50), + department VARCHAR(50) +); + +-- Start a multi-statement transaction +BEGIN; + +-- First statement in the transaction: Insert a new employee +INSERT INTO employees (id, name, department) VALUES (1, 'Alice', 'HR'); + +-- Second statement in the transaction: Insert another new employee +INSERT INTO employees (id, name, department) VALUES (2, 'Bob', 'Engineering'); + +-- Third statement in the transaction: Update the department of the first employee +UPDATE employees SET department = 'Finance' WHERE id = 1; + +-- Commit all the changes +COMMIT; + +-- Verify that the data in the table +SELECT * FROM employees; + +┌───────────────────────────────────────────────────────┐ +│ id │ name │ department │ +├─────────────────┼──────────────────┼──────────────────┤ +│ 1 │ Alice │ Finance │ +│ 2 │ Bob │ Engineering │ +└───────────────────────────────────────────────────────┘ +``` + +In this example, the ROLLBACK statement undoes all changes made during the transaction. As a result, the SELECT query at the end should show an empty employees table, confirming that no changes were committed. + +```sql +-- Start by creating a table +CREATE TABLE employees ( + id INT, + name VARCHAR(50), + department VARCHAR(50) +); + +-- Start a multi-statement transaction +BEGIN; + +-- First statement in the transaction: Insert a new employee +INSERT INTO employees (id, name, department) VALUES (1, 'Alice', 'HR'); + +-- Second statement in the transaction: Insert another new employee +INSERT INTO employees (id, name, department) VALUES (2, 'Bob', 'Engineering'); + +-- Third statement in the transaction: Update the department of the first employee +UPDATE employees SET department = 'Finance' WHERE id = 1; + +-- Rollback the transaction +ROLLBACK; + +-- Verify that the table is empty +SELECT * FROM employees; +``` + +This example sets up a stream and a task to consume the stream, inserting data into two target tables using a transactional block (BEGIN; COMMIT). + +```sql +CREATE DATABASE my_db; +USE my_db; + +CREATE TABLE source_table ( + id INT, + source_flag VARCHAR(50),value VARCHAR(50) +); + +CREATE TABLE target_table_1 ( + id INT,value VARCHAR(50) +); + +CREATE TABLE target_table_2 ( + id INT,value VARCHAR(50) +); + +CREATE STREAM source_stream ON TABLE source_table; + +INSERT INTO source_table VALUES +(1, 'source1', 'value1'), +(2, 'source2', 'value2'), +(3, 'source3', 'value3'), +(4, 'source4', 'value4'); + +CREATE TASK insert_task +WAREHOUSE = 'system' +SCHEDULE = 1 SECOND AS +BEGIN + BEGIN; + INSERT INTO my_db.target_table_1 + SELECT id, value + FROM my_db.source_stream; + + INSERT INTO my_db.target_table_2 + SELECT id, value + FROM my_db.source_stream; + COMMIT; +END; + +EXECUTE TASK insert_task; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bin.md b/tidb-cloud-lake/sql/bin.md new file mode 100644 index 0000000000000..cb7f2812925a9 --- /dev/null +++ b/tidb-cloud-lake/sql/bin.md @@ -0,0 +1,32 @@ +--- +title: BIN +--- + +Returns a string representation of the binary value of N. + +## Syntax + +```sql +BIN() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | The number. 
| + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT BIN(12); ++---------+ +| BIN(12) | ++---------+ +| 1100 | ++---------+ +``` diff --git a/tidb-cloud-lake/sql/binary.md b/tidb-cloud-lake/sql/binary.md new file mode 100644 index 0000000000000..3b195af4ef76f --- /dev/null +++ b/tidb-cloud-lake/sql/binary.md @@ -0,0 +1,67 @@ +--- +title: Binary +description: Variable-length sequences of raw bytes. +sidebar_position: 2 +--- + +## Overview + +`BINARY` (alias `VARBINARY`) stores variable-length byte sequences. Unlike `STRING`, the value is not interpreted as UTF-8 text, making it suitable for payloads such as digests, compressed data, or serialized objects. Use conversion functions like [UNHEX](/tidb-cloud-lake/sql/unhex.md), [FROM_BASE64](/tidb-cloud-lake/sql/from-base64.md), and [TO_HEX](/tidb-cloud-lake/sql/hex.md) to encode or decode values when reading or writing the data. + +## Examples + +### Insert Raw Bytes + +```sql +CREATE TABLE binary_samples ( + id INT, + raw BINARY +); + +INSERT INTO binary_samples VALUES + (1, UNHEX('68656c6c6f')), -- "hello" + (2, FROM_BASE64('ZGF0YWJlbmQ=')); -- "databend" +``` + +```sql +SELECT + id, + HEX(raw) AS hex_value, + LENGTH(raw) AS byte_len +FROM binary_samples +ORDER BY id; +``` + +Result: +``` +┌────┬──────────────┬──────────┐ +│ id │ hex_value │ byte_len │ +├────┼──────────────┼──────────┤ +│ 1 │ 68656c6c6f │ 5 │ +│ 2 │ 6461746162656e64 │ 8 │ +└────┴──────────────┴──────────┘ +``` + +### Convert Back to Text + +Binary values can be converted to strings when needed: + +```sql +SELECT + id, + TO_VARCHAR(raw) AS text_value +FROM binary_samples +ORDER BY id; +``` + +Result: +``` +┌────┬─────────────┐ +│ id │ text_value │ +├────┼─────────────┤ +│ 1 │ hello │ +│ 2 │ databend │ +└────┴─────────────┘ +``` + +Binary columns accept NULL values and can also be nested inside ARRAY, MAP, or TUPLE structures when you need to store byte payloads alongside other data. diff --git a/tidb-cloud-lake/sql/bit-length.md b/tidb-cloud-lake/sql/bit-length.md new file mode 100644 index 0000000000000..b9e9027cae21e --- /dev/null +++ b/tidb-cloud-lake/sql/bit-length.md @@ -0,0 +1,33 @@ +--- +id: string-bit_length +title: BIT_LENGTH +--- + +Return the length of a string in bits. + +## Syntax + +```sql +BIT_LENGTH() +``` + +## Arguments + +| Arguments | Description | +|-----------| ----------- | +| `` | The string. | + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT BIT_LENGTH('Word'); ++----------------------------+ +| SELECT BIT_LENGTH('Word'); | ++----------------------------+ +| 32 | ++----------------------------+ +``` diff --git a/tidb-cloud-lake/sql/bitmap-array.md b/tidb-cloud-lake/sql/bitmap-array.md new file mode 100644 index 0000000000000..7fa30106bb149 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-array.md @@ -0,0 +1,27 @@ +--- +title: BITMAP_TO_ARRAY +--- + +Converts a Bitmap into an Array. 
+ +## Syntax + +```sql +BITMAP_TO_ARRAY( ) +``` + +## Return Type + +`Array (UInt64)` + +## Examples + +```sql +SELECT BITMAP_TO_ARRAY(TO_BITMAP('1, 3, 5')); + +╭───────────────────────────────────────╮ +│ bitmap_to_array(to_bitmap('1, 3, 5')) │ +├───────────────────────────────────────┤ +│ [1,3,5] │ +╰───────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/bitmap-cardinality.md b/tidb-cloud-lake/sql/bitmap-cardinality.md new file mode 100644 index 0000000000000..f7d6b44965f4d --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-cardinality.md @@ -0,0 +1,5 @@ +--- +title: BITMAP_CARDINALITY +--- + +Alias for [BITMAP_COUNT](/tidb-cloud-lake/sql/bitmap-count.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-contains.md b/tidb-cloud-lake/sql/bitmap-contains.md new file mode 100644 index 0000000000000..780d3e8c49852 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-contains.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_CONTAINS +--- + +Checks if the bitmap contains a specific value. + +## Syntax + +```sql +BITMAP_CONTAINS( , ) +``` + +## Examples + +```sql +SELECT BITMAP_CONTAINS(BUILD_BITMAP([1,4,5]), 1); + +┌─────────────────────────────────────────────┐ +│ bitmap_contains(build_bitmap([1, 4, 5]), 1) │ +├─────────────────────────────────────────────┤ +│ true │ +└─────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-count-sql.md b/tidb-cloud-lake/sql/bitmap-count-sql.md new file mode 100644 index 0000000000000..d850ff93e56f1 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-count-sql.md @@ -0,0 +1,27 @@ +--- +title: BITMAP_COUNT +--- + +Counts the number of bits set to 1 in the bitmap. + +## Syntax + +```sql +BITMAP_COUNT( ) +``` + +## Aliases + +- [BITMAP_CARDINALITY](/tidb-cloud-lake/sql/bitmap-cardinality.md) + +## Examples + +```sql +SELECT BITMAP_COUNT(BUILD_BITMAP([1,4,5])), BITMAP_CARDINALITY(BUILD_BITMAP([1,4,5])); + +┌─────────────────────────────────────────────────────────────────────────────────────┐ +│ bitmap_count(build_bitmap([1, 4, 5])) │ bitmap_cardinality(build_bitmap([1, 4, 5])) │ +├───────────────────────────────────────┼─────────────────────────────────────────────┤ +│ 3 │ 3 │ +└─────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-count.md b/tidb-cloud-lake/sql/bitmap-count.md new file mode 100644 index 0000000000000..c2317033df852 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-count.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_AND_COUNT +--- + +Counts the number of bits set to 1 in the bitmap by performing a logical AND operation. + +## Syntax + +```sql +BITMAP_AND_COUNT( ) +``` + +## Examples + +```sql +SELECT BITMAP_AND_COUNT(TO_BITMAP('1, 3, 5')); + +┌────────────────────────────────────────┐ +│ bitmap_and_count(to_bitmap('1, 3, 5')) │ +├────────────────────────────────────────┤ +│ 3 │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-functions.md b/tidb-cloud-lake/sql/bitmap-functions.md new file mode 100644 index 0000000000000..2a62567882ec6 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-functions.md @@ -0,0 +1,47 @@ +--- +title: Bitmap Functions +--- + +This page provides a comprehensive overview of Bitmap functions in Databend, organized by functionality for easy reference. 
+ +## Bitmap Operations + +| Function | Description | Example | +|----------|-------------|---------| +| [BITMAP_AND](bitmap-and.md) | Performs a bitwise AND operation on two bitmaps | `BITMAP_AND(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([4,5]))` → `{4,5}` | +| [BITMAP_OR](bitmap-or.md) | Performs a bitwise OR operation on two bitmaps | `BITMAP_OR(BUILD_BITMAP([1,2]), BUILD_BITMAP([2,3]))` → `{1,2,3}` | +| [BITMAP_XOR](/tidb-cloud-lake/sql/bitmap-xor.md) | Performs a bitwise XOR operation on two bitmaps | `BITMAP_XOR(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3,4]))` → `{1,4}` | +| [BITMAP_NOT](/tidb-cloud-lake/sql/bitmap-not.md) | Performs a bitwise NOT operation on a bitmap | `BITMAP_NOT(BUILD_BITMAP([1,2,3]), 5)` → `{0,4}` | +| [BITMAP_AND_NOT](bitmap-and-not.md) | Returns elements in the first bitmap but not in the second | `BITMAP_AND_NOT(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3]))` → `{1}` | +| [BITMAP_UNION](/tidb-cloud-lake/sql/bitmap-union.md) | Combines multiple bitmaps into one | `BITMAP_UNION([BUILD_BITMAP([1,2]), BUILD_BITMAP([2,3])])` → `{1,2,3}` | +| [BITMAP_INTERSECT](/tidb-cloud-lake/sql/bitmap-intersect.md) | Returns the intersection of multiple bitmaps | `BITMAP_INTERSECT([BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3,4])])` → `{2,3}` | + +## Bitmap Information + +| Function | Description | Example | +|----------|-------------|---------| +| [BITMAP_COUNT](/tidb-cloud-lake/sql/bitmap-count.md) | Returns the number of elements in a bitmap | `BITMAP_COUNT(BUILD_BITMAP([1,2,3]))` → `3` | +| [BITMAP_CONTAINS](/tidb-cloud-lake/sql/bitmap-contains.md) | Checks if a bitmap contains a specific element | `BITMAP_CONTAINS(BUILD_BITMAP([1,2,3]), 2)` → `true` | +| [BITMAP_HAS_ANY](/tidb-cloud-lake/sql/bitmap-has-any.md) | Checks if a bitmap contains any element from another bitmap | `BITMAP_HAS_ANY(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([3,4]))` → `true` | +| [BITMAP_HAS_ALL](/tidb-cloud-lake/sql/bitmap-has-all.md) | Checks if a bitmap contains all elements from another bitmap | `BITMAP_HAS_ALL(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3]))` → `true` | +| [BITMAP_MIN](/tidb-cloud-lake/sql/bitmap-min.md) | Returns the minimum element in a bitmap | `BITMAP_MIN(BUILD_BITMAP([1,2,3]))` → `1` | +| [BITMAP_MAX](/tidb-cloud-lake/sql/bitmap-max.md) | Returns the maximum element in a bitmap | `BITMAP_MAX(BUILD_BITMAP([1,2,3]))` → `3` | +| [BITMAP_CARDINALITY](/tidb-cloud-lake/sql/bitmap-cardinality.md) | Returns the number of elements in a bitmap | `BITMAP_CARDINALITY(BUILD_BITMAP([1,2,3]))` → `3` | + +## Bitmap Count Operations + +| Function | Description | Example | +|----------|-------------|---------| +| [BITMAP_AND_COUNT](bitmap-and-count.md) | Returns the count of elements in the bitwise AND of two bitmaps | `BITMAP_AND_COUNT(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3,4]))` → `2` | +| [BITMAP_OR_COUNT](bitmap-or-count.md) | Returns the count of elements in the bitwise OR of two bitmaps | `BITMAP_OR_COUNT(BUILD_BITMAP([1,2]), BUILD_BITMAP([2,3]))` → `3` | +| [BITMAP_XOR_COUNT](/tidb-cloud-lake/sql/bitmap-xor-count.md) | Returns the count of elements in the bitwise XOR of two bitmaps | `BITMAP_XOR_COUNT(BUILD_BITMAP([1,2,3]), BUILD_BITMAP([2,3,4]))` → `2` | +| [BITMAP_NOT_COUNT](/tidb-cloud-lake/sql/bitmap-not-count.md) | Returns the count of elements in the bitwise NOT of a bitmap | `BITMAP_NOT_COUNT(BUILD_BITMAP([1,2,3]), 5)` → `2` | +| [INTERSECT_COUNT](/tidb-cloud-lake/sql/intersect-count.md) | Returns the count of elements in the intersection of multiple bitmaps | `INTERSECT_COUNT([BUILD_BITMAP([1,2,3]), 
BUILD_BITMAP([2,3,4])])` → `2` | + +## Bitmap Subset Operations + +| Function | Description | Example | +|----------|-------------|---------| +| [SUB_BITMAP](/tidb-cloud-lake/sql/sub-bitmap.md) | Extracts a subset of a bitmap | `SUB_BITMAP(BUILD_BITMAP([1,2,3,4,5]), 1, 3)` → `{2,3,4}` | +| [BITMAP_SUBSET_IN_RANGE](bitmap-subset-in-range.md) | Returns a subset of a bitmap within a range | `BITMAP_SUBSET_IN_RANGE(BUILD_BITMAP([1,2,3,4,5]), 2, 4)` → `{2,3}` | +| [BITMAP_SUBSET_LIMIT](/tidb-cloud-lake/sql/bitmap-subset-limit.md) | Returns a subset of a bitmap with a limit | `BITMAP_SUBSET_LIMIT(BUILD_BITMAP([1,2,3,4,5]), 2, 2)` → `{3,4}` | diff --git a/tidb-cloud-lake/sql/bitmap-has-all.md b/tidb-cloud-lake/sql/bitmap-has-all.md new file mode 100644 index 0000000000000..58f508af333b0 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-has-all.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_HAS_ALL +--- + +Checks if the first bitmap contains all the bits in the second bitmap. + +## Syntax + +```sql +BITMAP_HAS_ALL( , ) +``` + +## Examples + +```sql +SELECT BITMAP_HAS_ALL(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([1,2])); + +┌───────────────────────────────────────────────────────────────┐ +│ bitmap_has_all(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) │ +├───────────────────────────────────────────────────────────────┤ +│ false │ +└───────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-has-any.md b/tidb-cloud-lake/sql/bitmap-has-any.md new file mode 100644 index 0000000000000..56b0b8c7a9800 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-has-any.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_HAS_ANY +--- + +Checks if the first bitmap has any bit matching the bits in the second bitmap. + +## Syntax + +```sql +BITMAP_HAS_ANY( , ) +``` + +## Examples + +```sql +SELECT BITMAP_HAS_ANY(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([1,2])); + +┌───────────────────────────────────────────────────────────────┐ +│ bitmap_has_any(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) │ +├───────────────────────────────────────────────────────────────┤ +│ true │ +└───────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-intersect.md b/tidb-cloud-lake/sql/bitmap-intersect.md new file mode 100644 index 0000000000000..c86418bd9b485 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-intersect.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_INTERSECT +--- + +Counts the number of bits set to 1 in the bitmap by performing a logical INTERSECT operation. + +## Syntax + +```sql +BITMAP_INTERSECT( ) +``` + +## Examples + +```sql +SELECT BITMAP_INTERSECT(TO_BITMAP('1, 3, 5'))::String; + +┌────────────────────────────────────────────────┐ +│ bitmap_intersect(to_bitmap('1, 3, 5'))::string │ +├────────────────────────────────────────────────┤ +│ 1,3,5 │ +└────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-max.md b/tidb-cloud-lake/sql/bitmap-max.md new file mode 100644 index 0000000000000..adafe21be4194 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-max.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_MAX +--- + +Gets the maximum value in the bitmap. 
+ +## Syntax + +```sql +BITMAP_MAX( ) +``` + +## Examples + +```sql +SELECT BITMAP_MAX(BUILD_BITMAP([1,4,5])); + +┌─────────────────────────────────────┐ +│ bitmap_max(build_bitmap([1, 4, 5])) │ +├─────────────────────────────────────┤ +│ 5 │ +└─────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-min.md b/tidb-cloud-lake/sql/bitmap-min.md new file mode 100644 index 0000000000000..a074007cf21e2 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-min.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_MIN +--- + +Gets the minimum value in the bitmap. + +## Syntax + +```sql +BITMAP_MIN( ) +``` + +## Examples + +```sql +SELECT BITMAP_MIN(BUILD_BITMAP([1,4,5])); + +┌─────────────────────────────────────┐ +│ bitmap_min(build_bitmap([1, 4, 5])) │ +├─────────────────────────────────────┤ +│ 1 │ +└─────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-not-count.md b/tidb-cloud-lake/sql/bitmap-not-count.md new file mode 100644 index 0000000000000..dfdfcac314460 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-not-count.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_NOT_COUNT +--- + +Counts the number of bits set to 0 in the bitmap by performing a logical NOT operation. + +## Syntax + +```sql +BITMAP_NOT_COUNT( ) +``` + +## Examples + +```sql +SELECT BITMAP_NOT_COUNT(TO_BITMAP('1, 3, 5')); + +┌────────────────────────────────────────┐ +│ bitmap_not_count(to_bitmap('1, 3, 5')) │ +├────────────────────────────────────────┤ +│ 3 │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-not-sql.md b/tidb-cloud-lake/sql/bitmap-not-sql.md new file mode 100644 index 0000000000000..eb8b3712791c5 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-not-sql.md @@ -0,0 +1,35 @@ +--- +title: BITMAP_NOT +--- + +Generates a new bitmap with elements from the first bitmap that are not in the second one. + +## Syntax + +```sql +BITMAP_NOT( , ) +``` + +## Aliases + +- [BITMAP_AND_NOT](bitmap-and-not.md) + +## Examples + +```sql +SELECT BITMAP_NOT(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String; + +┌──────────────────────────────────────────────────────────────────────┐ +│ bitmap_not(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │ +├──────────────────────────────────────────────────────────────────────┤ +│ 1,4 │ +└──────────────────────────────────────────────────────────────────────┘ + +SELECT BITMAP_AND_NOT(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String; + +┌──────────────────────────────────────────────────────────────────────────┐ +│ bitmap_and_not(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │ +├──────────────────────────────────────────────────────────────────────────┤ +│ 1,4 │ +└──────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-not.md b/tidb-cloud-lake/sql/bitmap-not.md new file mode 100644 index 0000000000000..e692d7fd1df61 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-not.md @@ -0,0 +1,5 @@ +--- +title: BITMAP_AND_NOT +--- + +Alias for [BITMAP_NOT](/tidb-cloud-lake/sql/bitmap-not.md). 
\ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-or-count.md b/tidb-cloud-lake/sql/bitmap-or-count.md new file mode 100644 index 0000000000000..d771f5f8deb15 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-or-count.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_OR_COUNT +--- + +Counts the number of bits set to 1 in the bitmap by performing a logical OR operation. + +## Syntax + +```sql +BITMAP_OR_COUNT( ) +``` + +## Examples + +```sql +SELECT BITMAP_OR_COUNT(TO_BITMAP('1, 3, 5')); + +┌───────────────────────────────────────┐ +│ bitmap_or_count(to_bitmap('1, 3, 5')) │ +├───────────────────────────────────────┤ +│ 3 │ +└───────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-or.md b/tidb-cloud-lake/sql/bitmap-or.md new file mode 100644 index 0000000000000..2e16464fff21e --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-or.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_OR +--- + +Performs a bitwise OR operation on the two bitmaps. + +## Syntax + +```sql +BITMAP_OR( , ) +``` + +## Examples + +```sql +SELECT BITMAP_OR(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([6,7]))::String; + +┌──────────────────────────────────────────────────────────────────┐ +│ bitmap_or(build_bitmap([1, 4, 5]), build_bitmap([6, 7]))::string │ +├──────────────────────────────────────────────────────────────────┤ +│ 1,4,5,6,7 │ +└──────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-sql.md b/tidb-cloud-lake/sql/bitmap-sql.md new file mode 100644 index 0000000000000..dcf44df7b22dd --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-sql.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_AND +--- + +Performs a bitwise AND operation on the two bitmaps. + +## Syntax + +```sql +BITMAP_AND( , ) +``` + +## Examples + +```sql +SELECT BITMAP_AND(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([4,5]))::String; + +┌───────────────────────────────────────────────────────────────────┐ +│ bitmap_and(build_bitmap([1, 4, 5]), build_bitmap([4, 5]))::string │ +├───────────────────────────────────────────────────────────────────┤ +│ 4,5 │ +└───────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-subset-limit.md b/tidb-cloud-lake/sql/bitmap-subset-limit.md new file mode 100644 index 0000000000000..eb06cc5b4f3fc --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-subset-limit.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_SUBSET_LIMIT +--- + +Generates a sub-bitmap of the source bitmap, beginning with a range from the start value, with a size limit. + +## Syntax + +```sql +BITMAP_SUBSET_LIMIT( , , ) +``` + +## Examples + +```sql +SELECT BITMAP_SUBSET_LIMIT(BUILD_BITMAP([1,4,5]), 2, 2)::String; + +┌────────────────────────────────────────────────────────────┐ +│ bitmap_subset_limit(build_bitmap([1, 4, 5]), 2, 2)::string │ +├────────────────────────────────────────────────────────────┤ +│ 4,5 │ +└────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-subset-range.md b/tidb-cloud-lake/sql/bitmap-subset-range.md new file mode 100644 index 0000000000000..cf38aaa0c9639 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-subset-range.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_SUBSET_IN_RANGE +--- + +Generates a sub-bitmap of the source bitmap within a specified range. 
+ +## Syntax + +```sql +BITMAP_SUBSET_IN_RANGE( , , ) +``` + +## Examples + +```sql +SELECT BITMAP_SUBSET_IN_RANGE(BUILD_BITMAP([5,7,9]), 6, 9)::String; + +┌───────────────────────────────────────────────────────────────┐ +│ bitmap_subset_in_range(build_bitmap([5, 7, 9]), 6, 9)::string │ +├───────────────────────────────────────────────────────────────┤ +│ 7 │ +└───────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-union.md b/tidb-cloud-lake/sql/bitmap-union.md new file mode 100644 index 0000000000000..1b58252ebffc4 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-union.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_UNION +--- + +Counts the number of bits set to 1 in the bitmap by performing a logical UNION operation. + +## Syntax + +```sql +BITMAP_UNION( ) +``` + +## Examples + +```sql +SELECT BITMAP_UNION(TO_BITMAP('1, 3, 5'))::String; + +┌────────────────────────────────────────────┐ +│ bitmap_union(to_bitmap('1, 3, 5'))::string │ +├────────────────────────────────────────────┤ +│ 1,3,5 │ +└────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-xor-count.md b/tidb-cloud-lake/sql/bitmap-xor-count.md new file mode 100644 index 0000000000000..58d5ba4e62aea --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-xor-count.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_XOR_COUNT +--- + +Counts the number of bits set to 1 in the bitmap by performing a logical XOR (exclusive OR) operation. + +## Syntax + +```sql +BITMAP_XOR_COUNT( ) +``` + +## Examples + +```sql +SELECT BITMAP_XOR_COUNT(TO_BITMAP('1, 3, 5')); + +┌────────────────────────────────────────┐ +│ bitmap_xor_count(to_bitmap('1, 3, 5')) │ +├────────────────────────────────────────┤ +│ 3 │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap-xor.md b/tidb-cloud-lake/sql/bitmap-xor.md new file mode 100644 index 0000000000000..439c37f0d4cb3 --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap-xor.md @@ -0,0 +1,23 @@ +--- +title: BITMAP_XOR +--- + +Performs a bitwise XOR (exclusive OR) operation on the two bitmaps. + +## Syntax + +```sql +BITMAP_XOR( , ) +``` + +## Examples + +```sql +SELECT BITMAP_XOR(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String; + +┌──────────────────────────────────────────────────────────────────────┐ +│ bitmap_xor(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │ +├──────────────────────────────────────────────────────────────────────┤ +│ 1,4,6,7 │ +└──────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bitmap.md b/tidb-cloud-lake/sql/bitmap.md new file mode 100644 index 0000000000000..f73cfd292851c --- /dev/null +++ b/tidb-cloud-lake/sql/bitmap.md @@ -0,0 +1,66 @@ +--- +title: Bitmap +sidebar_position: 12 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +## Overview + +`BITMAP` stores membership information for unsigned 64-bit integers and supports fast set operations (count, union, intersection, etc.). SELECT statements show a binary blob, so use [Bitmap Functions](/tidb-cloud-lake/sql/bitmap-functions.md) to interpret the values. + +## Examples + +### Build Bitmaps + +`TO_BITMAP` accepts either a comma-separated string or a `UINT64` value (treated as a single element). `TO_STRING` serializes the bitmap back to readable text. 
+ +```sql +SELECT + TO_BITMAP('1,2,3') AS str_input, + TO_STRING(TO_BITMAP('1,2,3')) AS round_tripped, + TO_STRING(TO_BITMAP(123)) AS from_uint64; +``` + +Result: +``` +┌────────────────────────────────┬──────────────────────────────────┬────────────────┐ +│ str_input │ round_tripped │ from_uint64 │ +├────────────────────────────────┼──────────────────────────────────┼────────────────┤ +│ │ 1,2,3 │ 123 │ +└────────────────────────────────┴──────────────────────────────────┴────────────────┘ +``` + +### Persist Bitmaps + +Use `BUILD_BITMAP` to turn an array into a bitmap before inserting it into a table. Aggregate functions such as `BITMAP_COUNT` can then read the stored values quickly. + +```sql +CREATE TABLE user_visits ( + user_id INT, + page_visits BITMAP +); + +INSERT INTO user_visits VALUES + (1, BUILD_BITMAP([2, 5, 8, 10])), + (2, BUILD_BITMAP([3, 7, 9])), + (3, BUILD_BITMAP([1, 4, 6, 10])); + +SELECT + user_id, + BITMAP_COUNT(page_visits) AS distinct_pages, + BITMAP_HAS_ALL(page_visits, BUILD_BITMAP([10])) AS saw_page_10 +FROM user_visits; +``` + +Result: +``` +┌────────┬────────────────┬─────────────┐ +│ user_id │ distinct_pages │ saw_page_10 │ +├────────┼────────────────┼─────────────┤ +│ 1 │ 4 │ true │ +│ 2 │ 3 │ false │ +│ 3 │ 4 │ true │ +└────────┴────────────────┴─────────────┘ +``` diff --git a/tidb-cloud-lake/sql/blake.md b/tidb-cloud-lake/sql/blake.md new file mode 100644 index 0000000000000..8bb66b6e0e4c0 --- /dev/null +++ b/tidb-cloud-lake/sql/blake.md @@ -0,0 +1,23 @@ +--- +title: BLAKE3 +--- + +Calculates a BLAKE3 256-bit checksum for a string. The value is returned as a string of 64 hexadecimal digits or NULL if the argument was NULL. + +## Syntax + +```sql +BLAKE3() +``` + +## Examples + +```sql +SELECT BLAKE3('1234567890'); + +┌──────────────────────────────────────────────────────────────────┐ +│ blake3('1234567890') │ +├──────────────────────────────────────────────────────────────────┤ +│ d12e417e04494572b561ba2c12c3d7f9e5107c4747e27b9a8a54f8480c63e841 │ +└──────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bool-sql.md b/tidb-cloud-lake/sql/bool-sql.md new file mode 100644 index 0000000000000..0ca6d2ef6af79 --- /dev/null +++ b/tidb-cloud-lake/sql/bool-sql.md @@ -0,0 +1,50 @@ +--- +title: bool_or +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns true if at least one input value is true, otherwise false + +- NULL values are ignored. +- If all input values are null, the result is null. +- Supports for boolean types + +## Syntax + +```sql +bool_or() +``` + +## Return Type + +Same as the input type. 
+ +## Examples + +```sql +select bool_or(t) from (values (true), (true), (null)) a(t); +╭───────────────────╮ +│ bool_or(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ true │ +╰───────────────────╯ + +select bool_or(t) from (values (true), (true), (false)) a(t); +╭───────────────────╮ +│ bool_or(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ true │ +╰───────────────────╯ + +select bool_or(t) from (values (false), (false), (false)) a(t); +╭───────────────────╮ +│ bool_or(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ false │ +╰───────────────────╯ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/bool.md b/tidb-cloud-lake/sql/bool.md new file mode 100644 index 0000000000000..db34c43c42e9a --- /dev/null +++ b/tidb-cloud-lake/sql/bool.md @@ -0,0 +1,51 @@ +--- +title: bool_and +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns true if all input values are true, otherwise false. + +- NULL values are ignored. +- If all input values are null, the result is null. +- Supports for boolean types + +## Syntax + +```sql +bool_and() +``` + +## Return Type + +Same as the input type. + +## Examples + +```sql +select bool_and(t) from (values (true), (true), (null)) a(t); +╭───────────────────╮ +│ bool_and(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ true │ +╰───────────────────╯ + +select bool_and(t) from (values (true), (true), (true)) a(t); + +╭───────────────────╮ +│ bool_and(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ true │ +╰───────────────────╯ + +select bool_and(t) from (values (true), (true), (false)) a(t); +╭───────────────────╮ +│ bool_and(t) │ +│ Nullable(Boolean) │ +├───────────────────┤ +│ false │ +╰───────────────────╯ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/boolean.md b/tidb-cloud-lake/sql/boolean.md new file mode 100644 index 0000000000000..bf5e34fdd6633 --- /dev/null +++ b/tidb-cloud-lake/sql/boolean.md @@ -0,0 +1,43 @@ +--- +title: Boolean +description: Basic logical data type. +sidebar_position: 1 +--- + +## Overview + +`BOOLEAN` (alias `BOOL`) represents `TRUE` or `FALSE` and always uses one byte of storage. Numeric and string inputs automatically coerce to boolean values when possible. + +| Input Type | Converts to TRUE | Converts to FALSE | Notes | +|------------|-----------------|-------------------|-------| +| Numeric | Any non-zero | 0 | Negative numbers convert to TRUE. | +| String | `TRUE` | `FALSE` | Case-insensitive; other text fails to cast. | + +## Examples + +```sql +SELECT + 0::BOOLEAN AS zero_is_false, + 42::BOOLEAN AS nonzero_is_true, + 'True'::BOOLEAN AS string_true, + 'false'::BOOLEAN AS string_false; +``` + +Result: +``` +┌───────────────┬──────────────────┬───────────────┬────────────────┐ +│ zero_is_false │ nonzero_is_true │ string_true │ string_false │ +├───────────────┼──────────────────┼───────────────┼────────────────┤ +│ false │ true │ true │ false │ +└───────────────┴──────────────────┴───────────────┴────────────────┘ +``` + +```sql +-- Casting unsupported text raises an error. 
+SELECT 'yes'::BOOLEAN; +``` + +Result: +``` +ERROR 1105 (HY000): QueryFailed: [1006]cannot parse to type `BOOLEAN` while evaluating function `to_boolean('yes')` in expr `CAST('yes' AS Boolean)` +``` diff --git a/tidb-cloud-lake/sql/build-bitmap.md b/tidb-cloud-lake/sql/build-bitmap.md new file mode 100644 index 0000000000000..876fa26d38799 --- /dev/null +++ b/tidb-cloud-lake/sql/build-bitmap.md @@ -0,0 +1,23 @@ +--- +title: BUILD_BITMAP +--- + +Converts an array of positive integers to a BITMAP value. + +## Syntax + +```sql +BUILD_BITMAP( ) +``` + +## Examples + +```sql +SELECT BUILD_BITMAP([1,4,5])::String; + +┌─────────────────────────────────┐ +│ build_bitmap([1, 4, 5])::string │ +├─────────────────────────────────┤ +│ 1,4,5 │ +└─────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/call-procedure.md b/tidb-cloud-lake/sql/call-procedure.md new file mode 100644 index 0000000000000..701a6c7c35d94 --- /dev/null +++ b/tidb-cloud-lake/sql/call-procedure.md @@ -0,0 +1,38 @@ +--- +title: CALL PROCEDURE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Executes a stored procedure by calling its name, optionally passing arguments if the procedure requires them. + +## Syntax + +```sql +CALL PROCEDURE ([, , ...]) +``` + +## Examples + +This example demonstrates how to create and call a stored procedure that converts a weight from kilograms (kg) to pounds (lb): + +```sql +CREATE PROCEDURE convert_kg_to_lb(kg DECIMAL(4, 2)) +RETURNS DECIMAL(10, 2) +LANGUAGE SQL +COMMENT = 'Converts kilograms to pounds' +AS $$ +BEGIN + RETURN kg * 2.20462; +END; +$$; + +CALL PROCEDURE convert_kg_to_lb(10.00); + +┌────────────┐ +│ Result │ +├────────────┤ +│ 22.0462000 │ +└────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/case.md b/tidb-cloud-lake/sql/case.md new file mode 100644 index 0000000000000..e8ad1c2305877 --- /dev/null +++ b/tidb-cloud-lake/sql/case.md @@ -0,0 +1,61 @@ +--- +title: CASE +--- + +Handles IF/THEN logic. It is structured with at least one pair of `WHEN` and `THEN` statements. Every `CASE` statement must be concluded with the `END` keyword. The `ELSE` statement is optional, providing a way to capture values not explicitly specified in the `WHEN` and `THEN` statements. + +## Syntax + +```sql +CASE + WHEN THEN + [ WHEN THEN ] + [ ... 
] + [ ELSE ] +END AS +``` + +## Examples + +This example categorizes employee salaries using a CASE statement, presenting details with a dynamically assigned column named "SalaryCategory": + +```sql +-- Create a sample table +CREATE TABLE Employee ( + EmployeeID INT, + FirstName VARCHAR(50), + LastName VARCHAR(50), + Salary INT +); + +-- Insert some sample data +INSERT INTO Employee VALUES (1, 'John', 'Doe', 50000); +INSERT INTO Employee VALUES (2, 'Jane', 'Smith', 60000); +INSERT INTO Employee VALUES (3, 'Bob', 'Johnson', 75000); +INSERT INTO Employee VALUES (4, 'Alice', 'Williams', 90000); + +-- Add a new column 'SalaryCategory' using CASE statement +-- Categorize employees based on their salary +SELECT + EmployeeID, + FirstName, + LastName, + Salary, + CASE + WHEN Salary < 60000 THEN 'Low' + WHEN Salary >= 60000 AND Salary < 80000 THEN 'Medium' + WHEN Salary >= 80000 THEN 'High' + ELSE 'Unknown' + END AS SalaryCategory +FROM + Employee; + +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ employeeid │ firstname │ lastname │ salary │ salarycategory │ +├─────────────────┼──────────────────┼──────────────────┼─────────────────┼────────────────┤ +│ 1 │ John │ Doe │ 50000 │ Low │ +│ 2 │ Jane │ Smith │ 60000 │ Medium │ +│ 4 │ Alice │ Williams │ 90000 │ High │ +│ 3 │ Bob │ Johnson │ 75000 │ Medium │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/cast.md b/tidb-cloud-lake/sql/cast.md new file mode 100644 index 0000000000000..26c8a314547f5 --- /dev/null +++ b/tidb-cloud-lake/sql/cast.md @@ -0,0 +1,39 @@ +--- +title: "CAST::" +--- + +Converts a value from one data type to another. `::` is an alias for CAST. + +See also: [TRY_CAST](/tidb-cloud-lake/sql/try-cast.md) + +## Syntax + +```sql +CAST( AS ) + +:: +``` + +## Examples + +```sql +SELECT CAST(1 AS VARCHAR), 1::VARCHAR; + +┌───────────────────────────────┐ +│ cast(1 as string) │ 1::string │ +├───────────────────┼───────────┤ +│ 1 │ 1 │ +└───────────────────────────────┘ +``` + + +Cast String to +Variant and Cast Variant to `Map` +```sql +select '{"k1":"v1","k2":"v2"}'::Variant a, a::Map(String, String) b, b::Variant = a; +┌──────────────────────┬──────────────────────┬────────────────┐ +│ a │ b │ b::VARIANT = a │ +├──────────────────────┼──────────────────────┼────────────────┤ +│ {"k1":"v1","k2":"v2"}│ {'k1':'v1','k2':'v2'}│ 1 │ +└──────────────────────┴──────────────────────┴────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/cbrt.md b/tidb-cloud-lake/sql/cbrt.md new file mode 100644 index 0000000000000..8cb525d400a48 --- /dev/null +++ b/tidb-cloud-lake/sql/cbrt.md @@ -0,0 +1,23 @@ +--- +title: CBRT +--- + +Returns the cube root of a nonnegative number `x`. + +## Syntax + +```sql +CBRT( ) +``` + +## Examples + +```sql +SELECT CBRT(27); + +┌──────────┐ +│ cbrt(27) │ +├──────────┤ +│ 3 │ +└──────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ceil.md b/tidb-cloud-lake/sql/ceil.md new file mode 100644 index 0000000000000..a32f1407b1e9f --- /dev/null +++ b/tidb-cloud-lake/sql/ceil.md @@ -0,0 +1,27 @@ +--- +title: CEIL +--- + +Rounds the number up. 
+ +## Syntax + +```sql +CEIL( ) +``` + +## Aliases + +- [CEILING](/tidb-cloud-lake/sql/ceiling.md) + +## Examples + +```sql +SELECT CEILING(-1.23), CEIL(-1.23); + +┌────────────────────────────────────┐ +│ ceiling((- 1.23)) │ ceil((- 1.23)) │ +├───────────────────┼────────────────┤ +│ -1 │ -1 │ +└────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ceiling.md b/tidb-cloud-lake/sql/ceiling.md new file mode 100644 index 0000000000000..ac24c36e94faa --- /dev/null +++ b/tidb-cloud-lake/sql/ceiling.md @@ -0,0 +1,5 @@ +--- +title: CEILING +--- + +Alias for [CEIL](/tidb-cloud-lake/sql/ceil.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/changes.md b/tidb-cloud-lake/sql/changes.md new file mode 100644 index 0000000000000..544d176c1b6aa --- /dev/null +++ b/tidb-cloud-lake/sql/changes.md @@ -0,0 +1,135 @@ +--- +title: CHANGES +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The CHANGES clause allows querying the change tracking metadata for a table within a defined time interval. Please note that the time interval must fall within the data retention period (defaulted to 24 hours). To define a time interval, use the `AT` keyword to specify a time point as the start of the interval, with the current time being applied as the default end of the interval. If you wish to specify a past time as the end of the interval, use the `END` keyword in conjunction with the `AT` keyword to set the interval. + +![alt text](/img/sql/changes.png) + +## Syntax + +```sql +SELECT ... +FROM ... + CHANGES ( INFORMATION => { DEFAULT | APPEND_ONLY } ) + AT ( { TIMESTAMP => | + OFFSET => | + SNAPSHOT => '' | + STREAM => } ) + + [ END ( { TIMESTAMP => | + OFFSET => | + SNAPSHOT => '' } ) ] +``` + +| Parameter | Description | +| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| INFORMATION | Specifies the type of change tracking metadata to be retrieved. Can be set to either `DEFAULT` or `APPEND_ONLY`. `DEFAULT` returns all DML changes, including inserts, updates, and deletes. When set to `APPEND_ONLY`, only appended rows are returned. | +| AT | Specifies the starting point of the time interval for querying change tracking metadata. | +| END | Optional parameter specifying the end point of the time interval for querying change tracking metadata. If not provided, the current time is used as the default end point. | +| TIMESTAMP | Specifies a specific timestamp as the reference point for querying change tracking metadata. | +| OFFSET | Specifies a time interval in seconds relative to the current time as the reference point for querying change tracking metadata. It should be in the form of a negative integer, where the absolute value represents the time difference in seconds. For example, `-3600` represents traveling back in time by 1 hour (3,600 seconds). | +| SNAPSHOT | Specifies a snapshot ID as the reference point for querying change tracking metadata. | +| STREAM | Specifies a stream name as the reference point for querying change tracking metadata. | + +## Enabling Change Tracking + +The CHANGES clause requires that the Fuse engine option `change_tracking` must be set to `true` on the table. 
For more information about the `change_tracking` option, see [Fuse Engine Options](/tidb-cloud-lake/sql/table-engines.md#options). + +```sql title='Example:' +-- Enable change tracking for table 't' +ALTER TABLE t SET OPTIONS(change_tracking = true); +``` + +## Examples + +This example demonstrates the use of the CHANGES clause, allowing for the tracking and querying of changes made to a table: + +1. Create a table to store user profile information and enable change tracking. + +```sql +CREATE TABLE user_profiles ( + user_id INT, + username VARCHAR(255), + bio TEXT +) change_tracking = true; + + +INSERT INTO user_profiles VALUES (1, 'john_doe', 'Software Engineer'); +INSERT INTO user_profiles VALUES (2, 'jane_smith', 'Marketing Specialist'); +``` + +2. Create a stream to capture profile updates, then update an exiting profile and insert a new one. + +```sql +CREATE STREAM profile_updates ON TABLE user_profiles APPEND_ONLY = TRUE; + + +UPDATE user_profiles SET bio = 'Data Scientist' WHERE user_id = 1; +INSERT INTO user_profiles VALUES (3, 'alex_wong', 'Data Analyst'); +``` + +3. Query changes in user profiles by the stream. + +```sql +-- Return all changes in user profiles captured in the stream +SELECT * +FROM user_profiles +CHANGES (INFORMATION => DEFAULT) +AT (STREAM => profile_updates); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ user_id │ username │ bio │ change$action │ change$row_id │ change$is_update │ +├─────────────────┼──────────────────┼───────────────────┼──────────────────┼────────────────────────────────────────┼──────────────────┤ +│ 1 │ john_doe │ Data Scientist │ INSERT │ 69cffb02264144c384d56f7b6cedee41000000 │ true │ +│ 3 │ alex_wong │ Data Analyst │ INSERT │ 59f315c8655c49eab35ba1959e269430000000 │ false │ +│ 1 │ john_doe │ Software Engineer │ DELETE │ 69cffb02264144c384d56f7b6cedee41000000 │ true │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Return appended rows in user profiles captured in the stream +SELECT * +FROM user_profiles +CHANGES (INFORMATION => APPEND_ONLY) +AT (STREAM => profile_updates); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ user_id │ username │ bio │ change$action │ change$is_update │ change$row_id │ +├─────────────────┼──────────────────┼──────────────────┼───────────────┼──────────────────┼────────────────────────────────────────┤ +│ 3 │ alex_wong │ Data Analyst │ INSERT │ false │ 59f315c8655c49eab35ba1959e269430000000 │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +4. Query changes between a snapshot and a timestamp with both the `AT` and `END` keywords. + +```sql +-- Step 6: Take a snapshot of the user profile data. 
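+-- FUSE_SNAPSHOT('<database>', '<table>') lists the table's snapshots,
+-- including the snapshot IDs and timestamps used by the AT / END clauses below.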
+SELECT snapshot_id, timestamp +FROM FUSE_SNAPSHOT('default', 'user_profiles'); + +┌───────────────────────────────────────────────────────────────┐ +│ snapshot_id │ timestamp │ +├──────────────────────────────────┼────────────────────────────┤ +│ 6a11c94433714970895edd38577ac8b0 │ 2024-04-10 02:51:39.422832 │ +│ 53dc4750af92423da91c50dcee547cfb │ 2024-04-10 02:51:39.399568 │ +│ 910af7424f764891b0c6fa60aa99fc3a │ 2024-04-10 02:50:14.522416 │ +│ 1225000916f44819a0d23178b2d0d1af │ 2024-04-10 02:50:14.500417 │ +└───────────────────────────────────────────────────────────────┘ + +SELECT * +FROM user_profiles +CHANGES (INFORMATION => DEFAULT) +AT (SNAPSHOT => '1225000916f44819a0d23178b2d0d1af') +END (TIMESTAMP => '2024-04-10 02:51:39.399568'::TIMESTAMP); + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ user_id │ username │ bio │ change$action │ change$row_id │ change$is_update │ +├─────────────────┼──────────────────┼──────────────────────┼──────────────────┼────────────────────────────────────────┼──────────────────┤ +│ 1 │ john_doe │ Data Scientist │ INSERT │ 69cffb02264144c384d56f7b6cedee41000000 │ true │ +│ 1 │ john_doe │ Software Engineer │ DELETE │ 69cffb02264144c384d56f7b6cedee41000000 │ true │ +│ 2 │ jane_smith │ Marketing Specialist │ INSERT │ 3db484ac18174223851dc9de22f6bfec000000 │ false │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/char-length.md b/tidb-cloud-lake/sql/char-length.md new file mode 100644 index 0000000000000..838498cbefca8 --- /dev/null +++ b/tidb-cloud-lake/sql/char-length.md @@ -0,0 +1,5 @@ +--- +title: CHAR_LENGTH +--- + +Alias for [LENGTH](/tidb-cloud-lake/sql/length.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/char.md b/tidb-cloud-lake/sql/char.md new file mode 100644 index 0000000000000..37233460a81a2 --- /dev/null +++ b/tidb-cloud-lake/sql/char.md @@ -0,0 +1,84 @@ +--- +id: string-char +title: CHAR +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + + +Returns the character(s) for each integer passed. The function converts each integer to its corresponding Unicode character. + +## Syntax + +```sql +CHAR(N, ...) +CHR(N) +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------------------------------------------| +| N | Integer value(s) representing Unicode code points (0 to 2^32-1) | + +## Return Type + +`STRING` + +## Remarks + +- Accepts any integer type (auto-casts to Int64). +- Returns empty string ('') and logs an error for invalid code points. +- `chr` is an alias for `char`. +- NULL inputs result in NULL output. 
+ +## Examples + +```sql +-- Basic usage +SELECT CHAR(65, 66, 67); +┌───────┐ +│ char │ +│ String│ +├───────┤ +│ ABC │ +└───────┘ + +-- Using the CHR alias +SELECT CHR(68); +┌───────┐ +│ chr │ +│ String│ +├───────┤ +│ D │ +└───────┘ + +-- Creating a string from multiple code points +SELECT CHAR(77,121,83,81,76); +┌───────┐ +│ char │ +│ String│ +├───────┤ +│ MySQL │ +└───────┘ + +-- Auto-casting from different integer types +SELECT CHAR(CAST(65 AS UInt16)); +┌───────┐ +│ char │ +│ String│ +├───────┤ +│ A │ +└───────┘ + +-- NULL handling +SELECT CHAR(NULL); +┌───────┐ +│ char │ +│ String│ +├───────┤ +│ NULL │ +└───────┘ +``` diff --git a/tidb-cloud-lake/sql/character-length.md b/tidb-cloud-lake/sql/character-length.md new file mode 100644 index 0000000000000..54adf4e75e085 --- /dev/null +++ b/tidb-cloud-lake/sql/character-length.md @@ -0,0 +1,5 @@ +--- +title: CHARACTER_LENGTH +--- + +Alias for [LENGTH](/tidb-cloud-lake/sql/length.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/check-json.md b/tidb-cloud-lake/sql/check-json.md new file mode 100644 index 0000000000000..659d1009d0089 --- /dev/null +++ b/tidb-cloud-lake/sql/check-json.md @@ -0,0 +1,48 @@ +--- +title: CHECK_JSON +--- + +Checks the validity of a JSON document. +If the input string is a valid JSON document or a `NULL`, the output is `NULL`. +If the input cannot be translated to a valid JSON value, the output string contains the error message. + +## Syntax + +```sql +CHECK_JSON( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|------------------------------| +| `` | An expression of string type | + +## Return Type + +String + +## Examples + +```sql +SELECT check_json('[1,2,3]'); ++-----------------------+ +| check_json('[1,2,3]') | ++-----------------------+ +| NULL | ++-----------------------+ + +SELECT check_json('{"key":"val"}'); ++-----------------------------+ +| check_json('{"key":"val"}') | ++-----------------------------+ +| NULL | ++-----------------------------+ + +SELECT check_json('{"key":'); ++----------------------------------------------+ +| check_json('{"key":') | ++----------------------------------------------+ +| EOF while parsing a value at line 1 column 7 | ++----------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/city-withseed.md b/tidb-cloud-lake/sql/city-withseed.md new file mode 100644 index 0000000000000..bd27dfaa596b4 --- /dev/null +++ b/tidb-cloud-lake/sql/city-withseed.md @@ -0,0 +1,23 @@ +--- +title: CITY64WITHSEED +--- + +Calculates a City64WithSeed 64-bit hash for a string. + +## Syntax + +```sql +CITY64WITHSEED(, ) +``` + +## Examples + +```sql +SELECT CITY64WITHSEED('1234567890', 12); + +┌──────────────────────────────────┐ +│ city64withseed('1234567890', 12) │ +├──────────────────────────────────┤ +│ 10660895976650300430 │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/clause.md b/tidb-cloud-lake/sql/clause.md new file mode 100644 index 0000000000000..46dd1d5582ca4 --- /dev/null +++ b/tidb-cloud-lake/sql/clause.md @@ -0,0 +1,104 @@ +--- +title: WITH Clause +--- + +The WITH clause is an optional clause that precedes the body of the SELECT statement, and defines one or more CTEs (common table expressions) that can be referenced later in the statement. + + +## Syntax + +### Basic CTE + +```sql +[ WITH + cte_name1 [ ( cte_column_list ) ] AS ( SELECT ... ) + [ , cte_name2 [ ( cte_column_list ) ] AS ( SELECT ... ) ] + [ , cte_nameN [ ( cte_column_list ) ] AS ( SELECT ... 
) ] +] +SELECT ... +``` + +### Recursive CTE + +```sql +[ WITH [ RECURSIVE ] + cte_name1 ( cte_column_list ) AS ( anchorClause UNION ALL recursiveClause ) + [ , cte_name2 ( cte_column_list ) AS ( anchorClause UNION ALL recursiveClause ) ] + [ , cte_nameN ( cte_column_list ) AS ( anchorClause UNION ALL recursiveClause ) ] +] +SELECT ... +``` + +Where: +- `anchorClause`: `SELECT anchor_column_list FROM ...` +- `recursiveClause`: `SELECT recursive_column_list FROM ... [ JOIN ... ]` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| `cte_name` | The CTE name must follow standard identifier rules | +| `cte_column_list` | The names of the columns in the CTE | +| `anchor_column_list` | The columns used in the anchor clause for the recursive CTE | +| `recursive_column_list` | The columns used in the recursive clause for the recursive CTE | + +## Examples + +### Basic CTE + +```sql +WITH high_value_customers AS ( + SELECT customer_id, customer_name, total_spent + FROM customers + WHERE total_spent > 10000 +) +SELECT c.customer_name, o.order_date, o.order_amount +FROM high_value_customers c +JOIN orders o ON c.customer_id = o.customer_id +ORDER BY o.order_date DESC; +``` + +### Multiple CTEs + +```sql +WITH + regional_sales AS ( + SELECT region, SUM(sales_amount) as total_sales + FROM sales_data + GROUP BY region + ), + top_regions AS ( + SELECT region, total_sales + FROM regional_sales + WHERE total_sales > 1000000 + ) +SELECT r.region, r.total_sales +FROM top_regions r +ORDER BY r.total_sales DESC; +``` + +### Recursive CTE + +```sql +WITH RECURSIVE countdown AS ( + -- Anchor clause: starting point + SELECT 10 as num + + UNION ALL + + -- Recursive clause: repeat until condition + SELECT num - 1 + FROM countdown + WHERE num > 1 -- Stop condition +) +SELECT num FROM countdown +ORDER BY num DESC; +``` + +## Usage Notes + +- CTEs are temporary named result sets that exist only for the duration of the query +- CTE names must be unique within the same WITH clause +- A CTE can reference previously defined CTEs in the same WITH clause +- Recursive CTEs require both an anchor clause and a recursive clause connected by UNION ALL +- The RECURSIVE keyword is required when using recursive CTEs \ No newline at end of file diff --git a/tidb-cloud-lake/sql/cluster-key.md b/tidb-cloud-lake/sql/cluster-key.md new file mode 100644 index 0000000000000..8bf0754eb837c --- /dev/null +++ b/tidb-cloud-lake/sql/cluster-key.md @@ -0,0 +1,22 @@ +--- +title: Cluster Key +--- + +This page provides a comprehensive overview of cluster key operations in Databend, organized by functionality for easy reference. + +## Cluster Key Management + +| Command | Description | +|---------|-------------| +| [SET CLUSTER KEY](/tidb-cloud-lake/sql/set-cluster-key.md) | Creates or replaces a cluster key for a table | +| [ALTER CLUSTER KEY](/tidb-cloud-lake/sql/alter-cluster-key.md) | Modifies an existing cluster key | +| [DROP CLUSTER KEY](/tidb-cloud-lake/sql/drop-cluster-key.md) | Removes a cluster key from a table | +| [RECLUSTER TABLE](/tidb-cloud-lake/sql/recluster-table.md) | Reorganizes table data based on the cluster key | + +## Related Topics + +- [Cluster Key](/tidb-cloud-lake/guides/cluster-key-performance.md) + +:::note +Cluster keys in Databend are used to physically organize data in tables to improve query performance by co-locating related data. 
+::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/clustering-information.md b/tidb-cloud-lake/sql/clustering-information.md new file mode 100644 index 0000000000000..86035b412eecc --- /dev/null +++ b/tidb-cloud-lake/sql/clustering-information.md @@ -0,0 +1,41 @@ +--- +title: CLUSTERING_INFORMATION +--- + +Returns clustering information of a table. + +## Syntax + +```sql +CLUSTERING_INFORMATION('', '') +``` + +## Examples + +```sql +CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1); + +INSERT INTO mytable VALUES(1,1),(3,3); +INSERT INTO mytable VALUES(2,2),(5,5); +INSERT INTO mytable VALUES(4,4); + +SELECT * FROM CLUSTERING_INFORMATION('default','mytable')\G +*************************** 1. row *************************** + cluster_key: ((a + 1)) + total_block_count: 3 + constant_block_count: 1 +unclustered_block_count: 0 + average_overlaps: 1.3333 + average_depth: 2.0 + block_depth_histogram: {"00002":3} +``` + +| Parameter | Description | +|------------------------- |------------------------------------------------------------------------------------------------------------------------ | +| cluster_key | The defined cluster key. | +| total_block_count | The current count of blocks. | +| constant_block_count | The count of blocks where min/max values are equal, meaning each block contains only one (group of) cluster_key value. | +| unclustered_block_count | The count of blocks that have not yet been clustered. | +| average_overlaps | The average ratio of overlapping blocks within a given range. | +| average_depth | The average depth of overlapping partitions for the cluster key. | +| block_depth_histogram | The number of partitions at each depth level. A higher concentration of partitions at lower depths indicates more effective table clustering. | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/coalesce.md b/tidb-cloud-lake/sql/coalesce.md new file mode 100644 index 0000000000000..7b3e2f637257d --- /dev/null +++ b/tidb-cloud-lake/sql/coalesce.md @@ -0,0 +1,31 @@ +--- +title: COALESCE +--- + +Returns the first non-NULL expression within its arguments; if all arguments are NULL, it returns NULL. + +## Syntax + +```sql +COALESCE([, ...]) +``` + +## Examples + +```sql +SELECT COALESCE(1), COALESCE(1, NULL), COALESCE(NULL, 1, 2); + +┌────────────────────────────────────────────────────────┐ +│ coalesce(1) │ coalesce(1, null) │ coalesce(null, 1, 2) │ +├─────────────┼───────────────────┼──────────────────────┤ +│ 1 │ 1 │ 1 │ +└────────────────────────────────────────────────────────┘ + +SELECT COALESCE('a'), COALESCE('a', NULL), COALESCE(NULL, 'a', 'b'); + +┌────────────────────────────────────────────────────────────────┐ +│ coalesce('a') │ coalesce('a', null) │ coalesce(null, 'a', 'b') │ +├───────────────┼─────────────────────┼──────────────────────────┤ +│ a │ a │ a │ +└────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/commit.md b/tidb-cloud-lake/sql/commit.md new file mode 100644 index 0000000000000..b4e24022d2762 --- /dev/null +++ b/tidb-cloud-lake/sql/commit.md @@ -0,0 +1,18 @@ +--- +title: COMMIT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Saves all changes made during a transaction. [BEGIN](/tidb-cloud-lake/sql/begin.md) and COMMIT/[ROLLBACK](/tidb-cloud-lake/sql/rollback.md) must be used together to start and then either save or undo a transaction. 
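+
+As a minimal sketch of the pattern (the table `t` below is illustrative and assumed to already exist):
+
+```sql
+BEGIN;
+
+INSERT INTO t VALUES (1), (2);
+
+-- Persist the inserted rows and end the transaction
+COMMIT;
+```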
+
+## Syntax
+
+```sql
+COMMIT
+```
+
+## Examples
+
+See [Examples](/tidb-cloud-lake/sql/begin.md#examples).
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/comparison-operators.md b/tidb-cloud-lake/sql/comparison-operators.md
new file mode 100644
index 0000000000000..8f1779b9d8e1f
--- /dev/null
+++ b/tidb-cloud-lake/sql/comparison-operators.md
@@ -0,0 +1,15 @@
+---
+title: Comparison Operators
+---
+
+| Operator      | Description                                 | Example                   | Result |
+| ------------- | ------------------------------------------- | ------------------------- | ------ |
+| `=`           | a is equal to b                             | `2 = 2`                   | TRUE   |
+| `!=`          | a is not equal to b                         | `2 != 3`                  | TRUE   |
+| `<>`          | a is not equal to b                         | `2 <> 2`                  | FALSE  |
+| `>`           | a is greater than b                         | `2 > 3`                   | FALSE  |
+| `>=`          | a is greater than or equal to b             | `4 >= NULL`               | NULL   |
+| `<`           | a is less than b                            | `2 < 3`                   | TRUE   |
+| `<=`          | a is less than or equal to b                | `2 <= 3`                  | TRUE   |
+| `IS NULL`     | TRUE if expression is NULL, FALSE otherwise | `(4 >= NULL) IS NULL`     | TRUE   |
+| `IS NOT NULL` | FALSE if expression is NULL, TRUE otherwise | `(4 >= NULL) IS NOT NULL` | FALSE  |
diff --git a/tidb-cloud-lake/sql/concat-ws.md b/tidb-cloud-lake/sql/concat-ws.md
new file mode 100644
index 0000000000000..204092ba99951
--- /dev/null
+++ b/tidb-cloud-lake/sql/concat-ws.md
@@ -0,0 +1,65 @@
+---
+title: CONCAT_WS
+---
+
+CONCAT_WS() stands for Concatenate With Separator and is a special form of CONCAT(). The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL.
+
+CONCAT_WS() does not skip empty strings. However, it does skip any NULL values after the separator argument.
+
+## Syntax
+
+```sql
+CONCAT_WS(<separator>, <expr1>[, <expr2>, ...])
+```
+
+## Arguments
+
+| Arguments     | Description                                              |
+|---------------|----------------------------------------------------------|
+| `<separator>` | The separator string placed between the remaining values |
+| `<exprN>`     | The string columns or expressions to be concatenated     |
+
+## Return Type
+
+A `VARCHAR` value, or `NULL` if the separator argument is `NULL`.
+ +## Examples + +```sql +SELECT CONCAT_WS(',', 'data', 'fuse', 'labs', '2021'); ++------------------------------------------------+ +| CONCAT_WS(',', 'data', 'fuse', 'labs', '2021') | ++------------------------------------------------+ +| data,fuse,labs,2021 | ++------------------------------------------------+ + +SELECT CONCAT_WS(',', 'data', NULL, 'bend'); ++--------------------------------------+ +| CONCAT_WS(',', 'data', NULL, 'bend') | ++--------------------------------------+ +| data,bend | ++--------------------------------------+ + + +SELECT CONCAT_WS(',', 'data', NULL, NULL, 'bend'); ++--------------------------------------------+ +| CONCAT_WS(',', 'data', NULL, NULL, 'bend') | ++--------------------------------------------+ +| data,bend | ++--------------------------------------------+ + + +SELECT CONCAT_WS(NULL, 'data', 'fuse', 'labs'); ++-----------------------------------------+ +| CONCAT_WS(NULL, 'data', 'fuse', 'labs') | ++-----------------------------------------+ +| NULL | ++-----------------------------------------+ + +SELECT CONCAT_WS(',', NULL); ++----------------------+ +| CONCAT_WS(',', NULL) | ++----------------------+ +| | ++----------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/concat.md b/tidb-cloud-lake/sql/concat.md new file mode 100644 index 0000000000000..9dbd2334faf3b --- /dev/null +++ b/tidb-cloud-lake/sql/concat.md @@ -0,0 +1,46 @@ +--- +title: CONCAT +--- + +Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent nonbinary string form. + +## Syntax + +```sql +CONCAT(, ...) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | string | + +## Return Type + +A `VARCHAR` data type value Or `NULL` data type. + +## Examples + +```sql +SELECT CONCAT('data', 'bend'); ++------------------------+ +| concat('data', 'bend') | ++------------------------+ +| databend | ++------------------------+ + +SELECT CONCAT('data', NULL, 'bend'); ++------------------------------+ +| CONCAT('data', NULL, 'bend') | ++------------------------------+ +| NULL | ++------------------------------+ + +SELECT CONCAT('14.3'); ++----------------+ +| concat('14.3') | ++----------------+ +| 14.3 | ++----------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/conditional-functions.md b/tidb-cloud-lake/sql/conditional-functions.md new file mode 100644 index 0000000000000..d6d94e9dd7bd7 --- /dev/null +++ b/tidb-cloud-lake/sql/conditional-functions.md @@ -0,0 +1,40 @@ +--- +title: 'Conditional Functions' +--- + +This page provides a comprehensive overview of Conditional functions in Databend, organized by functionality for easy reference. 
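+
+As a quick combined illustration of how these functions work together (a sketch over inline values only, not tied to any existing table):
+
+```sql
+SELECT
+    IF(score >= 60, 'pass', 'fail')       AS if_result,
+    COALESCE(nickname, name, 'anonymous') AS display_name,
+    NULLIF(score, 0)                      AS score_or_null
+FROM (VALUES (75, 'Ada', NULL), (0, 'Bob', 'bobby')) AS t(score, name, nickname);
+```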
+ +## Basic Conditional Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [IF](/tidb-cloud-lake/sql/if.md) / [IFF](/tidb-cloud-lake/sql/iff.md) | Returns a value based on a condition | `IF(1 > 0, 'yes', 'no')` → `'yes'` | +| [CASE](/tidb-cloud-lake/sql/case.md) | Evaluates conditions and returns a matching result | `CASE WHEN 1 > 0 THEN 'yes' ELSE 'no' END` → `'yes'` | +| [DECODE](/tidb-cloud-lake/sql/decode.md) | Compares expression to search values and returns result | `DECODE(2, 1, 'one', 2, 'two', 'other')` → `'two'` | +| [COALESCE](/tidb-cloud-lake/sql/coalesce.md) | Returns the first non-NULL expression | `COALESCE(NULL, 'hello', 'world')` → `'hello'` | +| [NULLIF](/tidb-cloud-lake/sql/nullif.md) | Returns NULL if two expressions are equal, otherwise the first expression | `NULLIF(5, 5)` → `NULL` | +| [IFNULL](/tidb-cloud-lake/sql/ifnull.md) | Returns the first expression if not NULL, otherwise the second | `IFNULL(NULL, 'default')` → `'default'` | +| [NVL](/tidb-cloud-lake/sql/nvl.md) | Returns the first non-NULL expression | `NVL(NULL, 'default')` → `'default'` | +| [NVL2](/tidb-cloud-lake/sql/nvl2.md) | Returns expr2 if expr1 is not NULL, otherwise expr3 | `NVL2('value', 'not null', 'is null')` → `'not null'` | + +## Comparison Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [GREATEST](/tidb-cloud-lake/sql/greatest.md) | Returns the largest value from a list | `GREATEST(1, 5, 3)` → `5` | +| [LEAST](/tidb-cloud-lake/sql/least.md) | Returns the smallest value from a list | `LEAST(1, 5, 3)` → `1` | +| [GREATEST_IGNORE_NULLS](/tidb-cloud-lake/sql/greatest-ignore-nulls.md) | Returns the largest non-NULL value | `GREATEST_IGNORE_NULLS(NULL, 5, 3)` → `5` | +| [LEAST_IGNORE_NULLS](/tidb-cloud-lake/sql/least-ignore-nulls.md) | Returns the smallest non-NULL value | `LEAST_IGNORE_NULLS(NULL, 5, 3)` → `3` | +| [BETWEEN](between.md) | Checks if a value is within a range | `5 BETWEEN 1 AND 10` → `true` | +| [IN](in.md) | Checks if a value matches any value in a list | `5 IN (1, 5, 10)` → `true` | + +## NULL and Error Handling Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [IS_NULL](/tidb-cloud-lake/sql/is-null.md) | Checks if a value is NULL | `IS_NULL(NULL)` → `true` | +| [IS_NOT_NULL](/tidb-cloud-lake/sql/is-not-null.md) | Checks if a value is not NULL | `IS_NOT_NULL('value')` → `true` | +| [IS_DISTINCT_FROM](is-distinct-from.md) | Checks if two values are different, treating NULLs as equal | `NULL IS DISTINCT FROM 0` → `true` | +| [IS_ERROR](/tidb-cloud-lake/sql/is-error.md) | Checks if an expression evaluation resulted in an error | `IS_ERROR(1/0)` → `true` | +| [IS_NOT_ERROR](/tidb-cloud-lake/sql/is-not-error.md) | Checks if an expression evaluation did not result in an error | `IS_NOT_ERROR(1/1)` → `true` | +| [ERROR_OR](error-or.md) | Returns the first expression if it's not an error, otherwise the second | `ERROR_OR(1/0, 0)` → `0` | diff --git a/tidb-cloud-lake/sql/connection-id.md b/tidb-cloud-lake/sql/connection-id.md new file mode 100644 index 0000000000000..38aeb52552823 --- /dev/null +++ b/tidb-cloud-lake/sql/connection-id.md @@ -0,0 +1,23 @@ +--- +title: CONNECTION_ID +--- + +Returns the connection ID for the current connection. 
+ +## Syntax + +```sql +CONNECTION_ID() +``` + +## Examples + +```sql +SELECT CONNECTION_ID(); + +┌──────────────────────────────────────┐ +│ connection_id() │ +├──────────────────────────────────────┤ +│ 23cb06ec-583e-4eba-b790-7c8cf72a53f8 │ +└──────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/connection-parameters.md b/tidb-cloud-lake/sql/connection-parameters.md new file mode 100644 index 0000000000000..dbb694c196bb6 --- /dev/null +++ b/tidb-cloud-lake/sql/connection-parameters.md @@ -0,0 +1,219 @@ +--- +title: Connection Parameters +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Connection parameters are key-value pairs you supply when creating reusable connections with `CREATE CONNECTION`. After a connection is created, reference it from stages, COPY commands, and other SQL features by using `CONNECTION = (CONNECTION_NAME = '')`. For full syntax and usage, see [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md). + +For storage-specific connection details, see the tables below. + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +The following table lists connection parameters for accessing an Amazon S3-like storage service: + +| Parameter | Required? | Description | +|--------------------------- |----------- |-------------------------------------------------------------- | +| endpoint_url | Yes | Endpoint URL for Amazon S3-like storage service. | +| access_key_id | Yes | Access key ID for identifying the requester. | +| secret_access_key | Yes | Secret access key for authentication. | +| enable_virtual_host_style | No | Whether to use virtual host-style URLs. Defaults to *false*. | +| master_key | No | Optional master key for advanced data encryption. | +| region | No | AWS region where the bucket is located. | +| security_token | No | Security token for temporary credentials. | + +:::note +- If the **endpoint_url** parameter is not specified in the command, Databend will create the stage on Amazon S3 by default. Therefore, when you create an external stage on an S3-compatible object storage or other object storage solutions, be sure to include the **endpoint_url** parameter. + +- The **region** parameter is not required because Databend can automatically detect the region information. You typically don't need to manually specify a value for this parameter. In case automatic detection fails, Databend will default to using 'us-east-1' as the region. When deploying Databend with MinIO and not configuring the region information, it will automatically default to using 'us-east-1', and this will work correctly. However, if you receive error messages such as "region is missing" or "The bucket you are trying to access requires a specific endpoint. Please direct all future requests to this particular endpoint", you need to determine your region name and explicitly assign it to the **region** parameter. 
+::: + +```sql title='Examples' +-- Create a reusable connection for Amazon S3 +CREATE CONNECTION my_s3_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Use the connection when creating a stage +CREATE STAGE my_s3_stage + URL = 's3://my-bucket' + CONNECTION = (CONNECTION_NAME = 'my_s3_conn'); + +-- Create a reusable connection for an S3-compatible service such as MinIO +CREATE CONNECTION my_minio_conn + STORAGE_TYPE = 's3' + ENDPOINT_URL = 'http://localhost:9000' + ACCESS_KEY_ID = 'ROOTUSER' + SECRET_ACCESS_KEY = 'CHANGEME123'; + +CREATE STAGE my_minio_stage + URL = 's3://databend' + CONNECTION = (CONNECTION_NAME = 'my_minio_conn'); +``` + +To access your Amazon S3 buckets, you can also specify an AWS IAM role and external ID for authentication. By specifying an AWS IAM role and external ID, you can provide more granular control over which S3 buckets a user can access. This means that if the IAM role has been granted permissions to access only specific S3 buckets, then the user will only be able to access those buckets. An external ID can further enhance security by providing an additional layer of verification. For more information, see https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-role.html + +The following table lists connection parameters for accessing Amazon S3 storage service using AWS IAM role authentication: + +| Parameter | Required? | Description | +|-------------- |----------- |------------------------------------------------------- | +| endpoint_url | No | Endpoint URL for Amazon S3. | +| role_arn | Yes | ARN of the AWS IAM role for authorization to S3. | +| external_id | No | External ID for enhanced security in role assumption. | + +```sql title='Examples' +-- Create the connection using IAM role authentication +CREATE CONNECTION my_iam_conn + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::123456789012:role/my-role' + EXTERNAL_ID = 'my-external-id'; + +-- Reference the connection when creating a stage +CREATE STAGE my_iam_stage + URL = 's3://my-bucket' + CONNECTION = (CONNECTION_NAME = 'my_iam_conn'); +``` + + + + + +The following table lists connection parameters for accessing Azure Blob Storage: + +| Parameter | Required? | Description | +|----------------|-------------|-------------------------------------------------------| +| endpoint_url | Yes | Endpoint URL for Azure Blob Storage. | +| account_key | Yes | Azure Blob Storage account key for authentication. | +| account_name | Yes | Azure Blob Storage account name for identification. | + +```sql title='Examples' +-- Create a connection for Azure Blob Storage +CREATE CONNECTION my_azure_conn + STORAGE_TYPE = 'azblob' + ACCOUNT_NAME = 'myaccount' + ACCOUNT_KEY = 'myaccountkey' + ENDPOINT_URL = 'https://.blob.core.windows.net'; + +-- Create a stage that uses the connection +CREATE STAGE my_azure_stage + URL = 'azblob://my-container' + CONNECTION = (CONNECTION_NAME = 'my_azure_conn'); +``` + + + + + +The following table lists connection parameters for accessing Google Cloud Storage: + +| Parameter | Required? | Description | +|----------------|-------------|-------------------------------------------------------| +| credential | Yes | Google Cloud Storage credential for authentication. | + +To get the `credential`, you could follow the topic [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) +from the Google documentation to create and download a service account key file. 
After downloading the service account key file, you could +convert it into a base64 string via the following command: + +``` +base64 -i -o ~/Desktop/base64-encoded-key.txt +``` + +```sql title='Examples' +-- Create the connection with the base64-encoded credential +CREATE CONNECTION my_gcs_conn + STORAGE_TYPE = 'gcs' + CREDENTIAL = ''; + +-- Use the connection when creating a stage +CREATE STAGE my_gcs_stage + URL = 'gcs://my-bucket' + CONNECTION = (CONNECTION_NAME = 'my_gcs_conn'); +``` + + + + + +The following table lists connection parameters for accessing Alibaba Cloud OSS: + +| Parameter | Required? | Description | +|---------------------- |----------- |--------------------------------------------------------- | +| access_key_id | Yes | Alibaba Cloud OSS access key ID for authentication. | +| access_key_secret | Yes | Alibaba Cloud OSS access key secret for authentication. | +| endpoint_url | Yes | Endpoint URL for Alibaba Cloud OSS. | +| presign_endpoint_url | No | Endpoint URL for presigning Alibaba Cloud OSS URLs. | + +```sql title='Examples' +-- Create a connection for Alibaba Cloud OSS +CREATE CONNECTION my_oss_conn + STORAGE_TYPE = 'oss' + ACCESS_KEY_ID = '' + ACCESS_KEY_SECRET = '' + ENDPOINT_URL = 'https://.[-internal].aliyuncs.com'; + +-- Create a stage using the connection +CREATE STAGE my_oss_stage + URL = 'oss://my-bucket' + CONNECTION = (CONNECTION_NAME = 'my_oss_conn'); +``` + + + + + +The following table lists connection parameters for accessing Tencent Cloud Object Storage (COS): + +| Parameter | Required? | Description | +|-------------- |----------- |------------------------------------------------------------- | +| endpoint_url | Yes | Endpoint URL for Tencent Cloud Object Storage. | +| secret_id | Yes | Tencent Cloud Object Storage secret ID for authentication. | +| secret_key | Yes | Tencent Cloud Object Storage secret key for authentication. | + +```sql title='Examples' +-- Create a connection for Tencent COS +CREATE CONNECTION my_cos_conn + STORAGE_TYPE = 'cos' + SECRET_ID = '' + SECRET_KEY = '' + ENDPOINT_URL = ''; + +-- Create a stage that uses the connection +CREATE STAGE my_cos_stage + URL = 'cos://my-bucket' + CONNECTION = (CONNECTION_NAME = 'my_cos_conn'); +``` + + + + + +The following table lists connection parameters for accessing Hugging Face: + +| Parameter | Required? | Description | +|-----------|-----------------------|-----------------------------------------------------------------------------------------------------------------| +| repo_type | No (default: dataset) | The type of the Hugging Face repository. Can be `dataset` or `model`. | +| revision | No (default: main) | The revision for the Hugging Face URI. Could be a branch, tag, or commit of the repository. | +| token | No | The API token from Hugging Face, which may be required for accessing private repositories or certain resources. | + +```sql title='Examples' +-- Create a connection for Hugging Face +CREATE CONNECTION my_hf_conn + STORAGE_TYPE = 'hf' + REPO_TYPE = 'dataset' + REVISION = 'main'; + +-- Create a stage that uses the connection +CREATE STAGE my_huggingface_stage + URL = 'hf://opendal/huggingface-testdata/' + CONNECTION = (CONNECTION_NAME = 'my_hf_conn'); +``` + + + + diff --git a/tidb-cloud-lake/sql/connection.md b/tidb-cloud-lake/sql/connection.md new file mode 100644 index 0000000000000..33e7a56aca689 --- /dev/null +++ b/tidb-cloud-lake/sql/connection.md @@ -0,0 +1,92 @@ +--- +title: Connection +--- + +## What is Connection? 
+ +A connection in Databend refers to a designated configuration that encapsulates the details required to interact with an external storage service. It serves as a centralized and reusable set of parameters, such as access credentials, endpoint URLs, and storage types, facilitating the integration of Databend with various storage services. + +Connection can be utilized for creating external stages, external tables, and attaching tables, offering a streamlined and modular approach to managing and accessing data stored in external storage services through Databend. + +## Connection Management + +| Command | Description | +|---------|-------------| +| [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) | Creates a new connection to an external storage service | +| [DROP CONNECTION](/tidb-cloud-lake/sql/drop-connection.md) | Removes an existing connection | + +## Connection Information + +| Command | Description | +|---------|-------------| +| [DESCRIBE CONNECTION](/tidb-cloud-lake/sql/desc-connection.md) | Shows details of a specific connection | +| [SHOW CONNECTIONS](/tidb-cloud-lake/sql/show-connections.md) | Lists all connections in the current database | + +### Usage Examples + +The examples in this section initially create a connection with the credentials necessary for connecting to Amazon S3. Subsequently, they utilize this established connection to create an external stage and attach an existing table. + +This statement initiates a connection to Amazon S3, specifying essential connection parameters: + +```sql +CREATE CONNECTION toronto + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +``` + +#### Example 1: Creating External Stage with Connection + +The following example creates an external stage using the previously defined connection named 'toronto': + +```sql +CREATE STAGE my_s3_stage + URL = 's3://databend-toronto' + CONNECTION = (CONNECTION_NAME = 'toronto'); + + +-- Equivalent to the following statement without using a connection: + +CREATE STAGE my_s3_stage + URL = 's3://databend-toronto' + CONNECTION = ( + ACCESS_KEY_ID = '', + SECRET_ACCESS_KEY = '' + ); + +``` + +#### Example 2: Attaching Table with Connection + +The [ATTACH TABLE](/tidb-cloud-lake/sql/attach-table.md) page offers [Examples](/tidb-cloud-lake/sql/attach-table.md#examples) demonstrating how to connect a new table in Databend Cloud with an existing table in Databend, where data is stored within an Amazon S3 bucket named "databend-toronto". 
In each example, Step 3 can be streamlined using the previously defined connection named 'toronto': + +```sql title='Databend Cloud:' +ATTACH TABLE employees_backup + 's3://databend-toronto/1/216/' + CONNECTION = (CONNECTION_NAME = 'toronto'); + +``` + +```sql title='Databend Cloud:' +ATTACH TABLE population_readonly + 's3://databend-toronto/1/556/' + CONNECTION = (CONNECTION_NAME = 'toronto') + READ_ONLY; + +``` + +#### Example 3: Creating External Table with Connection + +This example demonstrates the creation of an external table named 'BOOKS' using the previously defined connection named 'toronto': + +```sql +CREATE TABLE BOOKS ( + id BIGINT UNSIGNED, + title VARCHAR, + genre VARCHAR DEFAULT 'General' +) +'s3://databend-toronto' +CONNECTION = (CONNECTION_NAME = 'toronto'); + +``` diff --git a/tidb-cloud-lake/sql/contains.md b/tidb-cloud-lake/sql/contains.md new file mode 100644 index 0000000000000..0e53d3acd3779 --- /dev/null +++ b/tidb-cloud-lake/sql/contains.md @@ -0,0 +1,27 @@ +--- +title: CONTAINS +--- + +Checks if the array contains a specific element. + +## Syntax + +```sql +CONTAINS( , ) +``` + +## Aliases + +- [ARRAY_CONTAINS](/tidb-cloud-lake/sql/array-contains.md) + +## Examples + +```sql +SELECT ARRAY_CONTAINS([1, 2], 1), CONTAINS([1, 2], 1); + +┌─────────────────────────────────────────────────┐ +│ array_contains([1, 2], 1) │ contains([1, 2], 1) │ +├───────────────────────────┼─────────────────────┤ +│ true │ true │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/context-functions.md b/tidb-cloud-lake/sql/context-functions.md new file mode 100644 index 0000000000000..67c38fa254e38 --- /dev/null +++ b/tidb-cloud-lake/sql/context-functions.md @@ -0,0 +1,26 @@ +--- +title: Context Functions +--- + +This page provides reference information for the context-related functions in Databend. These functions return information about the current session, database, or system context. 
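+
+For example, the session and database context can be inspected in a single query (the values returned will differ per deployment):
+
+```sql
+SELECT
+    CURRENT_USER()  AS connected_user,
+    DATABASE()      AS current_db,
+    VERSION()       AS server_version;
+```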
+ +## Session Information Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [CONNECTION_ID](/tidb-cloud-lake/sql/connection-id.md) | Returns the connection ID for the current connection | `CONNECTION_ID()` → `42` | +| [CURRENT_USER](/tidb-cloud-lake/sql/current-user.md) | Returns the user name and host for the current connection | `CURRENT_USER()` → `'root'@'%'` | +| [LAST_QUERY_ID](/tidb-cloud-lake/sql/last-query-id.md) | Returns the query ID of the last executed query | `LAST_QUERY_ID()` → `'01890a5d-ac96-7cc6-8128-01d71ab8b93e'` | + +## Database Context Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [CURRENT_CATALOG](/tidb-cloud-lake/sql/current-catalog.md) | Returns the name of the current catalog | `CURRENT_CATALOG()` → `'default'` | +| [DATABASE](database.md) | Returns the name of the current database | `DATABASE()` → `'default'` | + +## System Information Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [VERSION](/tidb-cloud-lake/sql/version.md) | Returns the current version of Databend | `VERSION()` → `'DatabendQuery v1.2.252-nightly-193ed56304'` | diff --git a/tidb-cloud-lake/sql/conversion-functions.md b/tidb-cloud-lake/sql/conversion-functions.md new file mode 100644 index 0000000000000..15c9aaaf9dfdc --- /dev/null +++ b/tidb-cloud-lake/sql/conversion-functions.md @@ -0,0 +1,80 @@ +--- +title: 'Conversion Functions' +--- + +This page provides a comprehensive overview of Conversion functions in Databend, organized by functionality for easy reference. + +## Type Conversion Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [CAST](/tidb-cloud-lake/sql/cast.md) | Converts a value to a specified data type | `CAST('123' AS INT)` → `123` | +| [TRY_CAST](/tidb-cloud-lake/sql/try-cast.md) | Safely converts a value to a specified data type, returning NULL on failure | `TRY_CAST('abc' AS INT)` → `NULL` | +| [TO_BOOLEAN](/tidb-cloud-lake/sql/to-boolean.md) | Converts a value to BOOLEAN type | `TO_BOOLEAN('true')` → `true` | +| [TO_STRING](/tidb-cloud-lake/sql/to-string.md) | Converts a value to STRING type | `TO_STRING(123)` → `'123'` | +| [TO_VARCHAR](/tidb-cloud-lake/sql/to-varchar.md) | Converts a value to VARCHAR type | `TO_VARCHAR(123)` → `'123'` | +| [TO_TEXT](/tidb-cloud-lake/sql/to-text.md) | Converts a value to TEXT type | `TO_TEXT(123)` → `'123'` | + +## Numeric Conversion Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [TO_INT8](/tidb-cloud-lake/sql/to-int8.md) | Converts a value to INT8 type | `TO_INT8('123')` → `123` | +| [TO_INT16](/tidb-cloud-lake/sql/to-int16.md) | Converts a value to INT16 type | `TO_INT16('123')` → `123` | +| [TO_INT32](/tidb-cloud-lake/sql/to-int32.md) | Converts a value to INT32 type | `TO_INT32('123')` → `123` | +| [TO_INT64](/tidb-cloud-lake/sql/to-int64.md) | Converts a value to INT64 type | `TO_INT64('123')` → `123` | +| [TO_UINT8](/tidb-cloud-lake/sql/to-uint8.md) | Converts a value to UINT8 type | `TO_UINT8('123')` → `123` | +| [TO_UINT16](/tidb-cloud-lake/sql/to-uint16.md) | Converts a value to UINT16 type | `TO_UINT16('123')` → `123` | +| [TO_UINT32](/tidb-cloud-lake/sql/to-uint32.md) | Converts a value to UINT32 type | `TO_UINT32('123')` → `123` | +| [TO_UINT64](/tidb-cloud-lake/sql/to-uint64.md) | Converts a value to UINT64 type | `TO_UINT64('123')` → `123` | +| [TO_FLOAT32](/tidb-cloud-lake/sql/to-float32.md) | Converts a value to FLOAT32 
type | `TO_FLOAT32('123.45')` → `123.45` | +| [TO_FLOAT64](/tidb-cloud-lake/sql/to-float64.md) | Converts a value to FLOAT64 type | `TO_FLOAT64('123.45')` → `123.45` | + +## Binary and Specialized Conversion Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [TO_BINARY](/tidb-cloud-lake/sql/to-binary.md) | Converts a value to BINARY type | `TO_BINARY('abc')` → `binary value` | +| [TRY_TO_BINARY](/tidb-cloud-lake/sql//tidb-cloud-lake/sql/try-to-binary.md) | Safely converts a value to BINARY type, returning NULL on failure | `TRY_TO_BINARY('abc')` → `binary value` | +| [TO_HEX](/tidb-cloud-lake/sql/to-hex.md) | Converts a value to hexadecimal string | `TO_HEX(255)` → `'FF'` | +| [TO_VARIANT](/tidb-cloud-lake/sql/to-variant.md) | Converts a value to VARIANT type | `TO_VARIANT('{"a": 1}')` → `{"a": 1}` | +| [BUILD_BITMAP](/tidb-cloud-lake/sql/build-bitmap.md) | Builds a bitmap from an array of integers | `BUILD_BITMAP([1,2,3])` → `bitmap value` | +| [TO_BITMAP](/tidb-cloud-lake/sql/to-bitmap.md) | Converts a value to BITMAP type | `TO_BITMAP([1,2,3])` → `bitmap value` | + +Please note the following when converting a value from one type to another: + +- When converting from floating-point, decimal numbers, or strings to integers or decimal numbers with fractional parts, Databend rounds the values to the nearest integer. This is determined by the setting `numeric_cast_option` (defaults to 'rounding') which controls the behavior of numeric casting operations. When `numeric_cast_option` is explicitly set to 'truncating', Databend will truncate the decimal part, discarding any fractional values. + + ```sql title='Example:' + SELECT CAST('0.6' AS DECIMAL(10, 0)), CAST(0.6 AS DECIMAL(10, 0)), CAST(1.5 AS INT); + + ┌──────────────────────────────────────────────────────────────────────────────────┐ + │ cast('0.6' as decimal(10, 0)) │ cast(0.6 as decimal(10, 0)) │ cast(1.5 as int32) │ + ├───────────────────────────────┼─────────────────────────────┼────────────────────┤ + │ 1 │ 1 │ 2 │ + └──────────────────────────────────────────────────────────────────────────────────┘ + + SET numeric_cast_option = 'truncating'; + + SELECT CAST('0.6' AS DECIMAL(10, 0)), CAST(0.6 AS DECIMAL(10, 0)), CAST(1.5 AS INT); + + ┌──────────────────────────────────────────────────────────────────────────────────┐ + │ cast('0.6' as decimal(10, 0)) │ cast(0.6 as decimal(10, 0)) │ cast(1.5 as int32) │ + ├───────────────────────────────┼─────────────────────────────┼────────────────────┤ + │ 0 │ 0 │ 1 │ + └──────────────────────────────────────────────────────────────────────────────────┘ + ``` + + The table below presents a summary of numeric casting operations, highlighting the casting possibilities between different source and target numeric data types. Please note that, it specifies the requirement for String to Integer casting, where the source string must contain an integer value. + + | Source Type | Target Type | + |----------------|-------------| + | String | Decimal | + | Float | Decimal | + | Decimal | Decimal | + | Float | Int | + | Decimal | Int | + | String (Int) | Int | + + +- Databend also offers a variety of functions for converting expressions into different date and time formats. For more information, see [Date & Time Functions](/tidb-cloud-lake/sql/date-time-functions.md). 
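+
+As a final sketch tying the notes above together, `TRY_CAST` can be used where a plain `CAST` would fail on malformed input (consistent with the function table above):
+
+```sql
+SELECT
+    CAST('123' AS INT)     AS strict_ok,   -- 123
+    TRY_CAST('abc' AS INT) AS lenient_bad; -- NULL instead of an error
+```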
\ No newline at end of file diff --git a/tidb-cloud-lake/sql/convert-timezone.md b/tidb-cloud-lake/sql/convert-timezone.md new file mode 100644 index 0000000000000..c716ce34457e5 --- /dev/null +++ b/tidb-cloud-lake/sql/convert-timezone.md @@ -0,0 +1,93 @@ +--- +title: CONVERT_TIMEZONE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +`CONVERT_TIMEZONE()` converts a timestamp from the current session timezone (default `UTC`) to the timezone supplied in the first argument. The destination timezone must be a valid [IANA timezone name](https://docs.rs/chrono-tz/latest/chrono_tz/enum.Tz.html). + +## Syntax + +```sql +CONVERT_TIMEZONE(, ) +``` + +| Parameter | Description | +|----------------------|-----------------------------------------------------------------------------| +| `` | Case-sensitive timezone name such as `'America/Los_Angeles'` or `'UTC'`. | +| `` | TIMESTAMP expression (or a value castable to TIMESTAMP). Interpreted using the current session timezone. | + +## Return Type + +Returns a TIMESTAMP value that represents the same instant in the target timezone. + +## Behavior + +- The source timezone always equals the current session timezone (default `UTC`). Configure the session or connection to match the data you are converting. +- Invalid timezone names raise an error. If either argument is `NULL`, the result is `NULL`. +- Daylight-saving gaps can make some timestamps invalid. Turn on `enable_dst_hour_fix = 1` (session or tenant level) so Databend adjusts such values automatically. + +## Examples + +### Convert a single timestamp (default UTC session) + +```sql +SELECT CONVERT_TIMEZONE('America/Los_Angeles', '2024-11-01 11:36:10'); +``` + +``` +┌──────────────────────────────────────────────────────┐ +│ convert_timezone('America/Los_Angeles', '2024-11-01… │ +├──────────────────────────────────────────────────────┤ +│ 2024-11-01 04:36:10.000000 │ +└──────────────────────────────────────────────────────┘ +``` + +### Convert rows using each user's timezone + +```sql +SELECT + user_tz, + event_time, + CONVERT_TIMEZONE(user_tz, event_time) AS local_time +FROM ( + VALUES + ('America/Los_Angeles', '2024-10-31 22:21:15'::TIMESTAMP), + ('Asia/Shanghai', '2024-10-31 22:21:15'::TIMESTAMP), + (NULL, '2024-10-31 22:21:15'::TIMESTAMP) +) AS v(user_tz, event_time) +ORDER BY user_tz NULLS LAST; +``` + +``` +┌──────────────────────┬──────────────────────────────┬──────────────────────────────┐ +│ user_tz │ event_time │ local_time │ +├──────────────────────┼──────────────────────────────┼──────────────────────────────┤ +│ America/Los_Angeles │ 2024-10-31 22:21:15.000000 │ 2024-10-31 15:21:15.000000 │ +│ Asia/Shanghai │ 2024-10-31 22:21:15.000000 │ 2024-11-01 06:21:15.000000 │ +│ NULL │ 2024-10-31 22:21:15.000000 │ NULL │ +└──────────────────────┴──────────────────────────────┴──────────────────────────────┘ +``` + +### Handle timestamps inside DST gaps + +In this session the timezone is configured as Asia/Shanghai and `enable_dst_hour_fix = 1`. The timestamp `1947-04-15 00:00:00` never existed there because clocks jumped forward, so Databend adjusts it before returning the UTC value. 
+ +```sql +SELECT CONVERT_TIMEZONE('UTC', '1947-04-15 00:00:00'); +``` + +``` +┌──────────────────────────────────────────────┐ +│ convert_timezone('UTC', '1947-04-15 00:00:00')│ +├──────────────────────────────────────────────┤ +│ 1947-04-14 15:00:00.000000 │ +└──────────────────────────────────────────────┘ +``` + +## See Also + +- [TIMEZONE](/tidb-cloud-lake/sql/timezone.md) +- [TO_TIMESTAMP_TZ](/tidb-cloud-lake/sql/timestamp-tz.md) +- [TO_TIMESTAMP](/tidb-cloud-lake/sql/timestamp.md) diff --git a/tidb-cloud-lake/sql/copy-into-location.md b/tidb-cloud-lake/sql/copy-into-location.md new file mode 100644 index 0000000000000..2d60dfddcdac8 --- /dev/null +++ b/tidb-cloud-lake/sql/copy-into-location.md @@ -0,0 +1,351 @@ +--- +title: "COPY INTO " +sidebar_label: "COPY INTO " +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +COPY INTO allows you to unload data from a table or query into one or more files in one of the following locations: + +- User / Internal / External stages: See [What is Stage?](/tidb-cloud-lake/guides/what-is-stage.md) to learn about stages in Databend. +- Buckets or containers created in a storage service. + +See also: [`COPY INTO
`](/tidb-cloud-lake/sql/copy-into-table.md) + +## Syntax + +```sql +COPY INTO { internalStage | externalStage | externalLocation } +FROM { [.] | ( ) } +[ PARTITION BY ( ) ] +[ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET } [ formatTypeOptions ] + ) ] +[ copyOptions ] +[ VALIDATION_MODE = RETURN_ROWS ] +[ DETAILED_OUTPUT = true | false ] +``` + +### internalStage + +```sql +internalStage ::= @[/] +``` + +### externalStage + +```sql +externalStage ::= @[/] +``` + +### externalLocation + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + + +```sql +externalLocation ::= + 's3://[]' + CONNECTION = ( + + ) +``` + +For the connection parameters available for accessing Amazon S3-like storage services, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + + + + +```sql +externalLocation ::= + 'azblob://[]' + CONNECTION = ( + + ) +``` + +For the connection parameters available for accessing Azure Blob Storage, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + + + + +```sql +externalLocation ::= + 'gcs://[]' + CONNECTION = ( + + ) +``` + +For the connection parameters available for accessing Google Cloud Storage, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + + + + +```sql +externalLocation ::= + 'oss://[]' + CONNECTION = ( + + ) +``` + +For the connection parameters available for accessing Alibaba Cloud OSS, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + + + + +```sql +externalLocation ::= + 'cos://[]' + CONNECTION = ( + + ) +``` + +For the connection parameters available for accessing Tencent Cloud Object Storage, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + + + + +### FILE_FORMAT + +See [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md) for details. + +### PARTITION BY + +Specifies an expression used to partition the unloaded data into separate folders. The expression must evaluate to a `STRING` type. Each distinct value produced by the expression creates a subfolder in the destination path, and the corresponding rows are written into files under that subfolder. + +- If the expression evaluates to `NULL`, the rows are placed in a special `_NULL_` folder. +- The expression can reference any columns from the source table or query. +- Path traversal (`..`) is not allowed in partition values. + +The following options are incompatible with `PARTITION BY` and will cause an error if set: + +| Option | Restriction | +| ------------------- | ------------------------------------------------ | +| SINGLE | Cannot be `TRUE` when using `PARTITION BY`. | +| OVERWRITE | Cannot be `TRUE` when using `PARTITION BY`. | +| INCLUDE_QUERY_ID | Cannot be `FALSE` when using `PARTITION BY`. | + +### copyOptions + +```sql +copyOptions ::= + [ SINGLE = true | false ] + [ MAX_FILE_SIZE = ] + [ OVERWRITE = true | false ] + [ INCLUDE_QUERY_ID = true | false ] + [ USE_RAW_PATH = true | false ] +``` + +| Parameter | Default | Description | +| ---------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| SINGLE | false | When `true`, the command unloads data into one single file. | +| MAX_FILE_SIZE | 67108864 bytes (64 MB) | The maximum size (in bytes) of each file to be created. Effective when `SINGLE` is false. 
| +| OVERWRITE | false | When `true`, existing files with the same name at the target path will be overwritten. Note: `OVERWRITE = true` requires `USE_RAW_PATH = true` and `INCLUDE_QUERY_ID = false`. | +| INCLUDE_QUERY_ID | true | When `true`, a unique UUID will be included in the exported file names. | +| USE_RAW_PATH | false | When `true`, the exact user-provided path (including the full file name) will be used for exporting the data. If set to `false`, the user must provide a directory path. | + +### DETAILED_OUTPUT + +Determines whether a detailed result of the data unloading should be returned, with the default value set to `false`. For more information, see [Output](#output). + +## Output + +COPY INTO provides a summary of the data unloading results with these columns: + +| Column | Description | +| ------------- | --------------------------------------------------------------------------------------------- | +| rows_unloaded | The number of rows successfully unloaded to the destination. | +| input_bytes | The total size, in bytes, of the data read from the source table during the unload operation. | +| output_bytes | The total size, in bytes, of the data written to the destination. | + +When `DETAILED_OUTPUT` is set to `true`, COPY INTO provides results with the following columns. This assists in locating the unloaded files, especially when using `MAX_FILE_SIZE` to separate the unloaded data into multiple files. + +| Column | Description | +| --------- | -------------------------------------------------- | +| file_name | The name of the unloaded file. | +| file_size | The size of the unloaded file in bytes. | +| row_count | The number of rows contained in the unloaded file. | + +## Examples + +In this section, the provided examples make use of the following table and data: + +```sql +-- Create sample table +CREATE TABLE canadian_city_population ( + city_name VARCHAR(50), + population INT +); + +-- Insert sample data +INSERT INTO canadian_city_population (city_name, population) +VALUES +('Toronto', 2731571), +('Montreal', 1704694), +('Vancouver', 631486), +('Calgary', 1237656), +('Ottawa', 934243), +('Edmonton', 972223), +('Quebec City', 542298), +('Winnipeg', 705244), +('Hamilton', 536917), +('Halifax', 403390); +``` + +### Example 1: Unloading to Internal Stage + +This example unloads data to an internal stage: + +```sql +-- Create an internal stage +CREATE STAGE my_internal_stage; + +-- Unload data from the table to the stage using the PARQUET file format +COPY INTO @my_internal_stage + FROM canadian_city_population + FILE_FORMAT = (TYPE = PARQUET); + +┌────────────────────────────────────────────┐ +│ rows_unloaded │ input_bytes │ output_bytes │ +├───────────────┼─────────────┼──────────────┤ +│ 10 │ 211 │ 572 │ +└────────────────────────────────────────────┘ + +LIST @my_internal_stage; + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├─────────────────────────────────────────────────────────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤ +│ data_abe520a3-ee88-488c-9221-b07c562c9a30_0000_00000000.parquet │ 572 │ NULL │ 2024-01-18 16:20:48.979 +0000 │ NULL │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Example 2: Unloading to Compressed File + +This example unloads data into a 
compressed file: + +```sql +-- Create an internal stage +CREATE STAGE my_internal_stage; + +-- Unload data from the table to the stage using the CSV file format with gzip compression +COPY INTO @my_internal_stage + FROM canadian_city_population + FILE_FORMAT = (TYPE = CSV COMPRESSION = gzip); + +┌────────────────────────────────────────────┐ +│ rows_unloaded │ input_bytes │ output_bytes │ +├───────────────┼─────────────┼──────────────┤ +│ 10 │ 182 │ 168 │ +└────────────────────────────────────────────┘ + +LIST @my_internal_stage; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├────────────────────────────────────────────────────────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤ +│ data_7970afa5-32e3-4e7d-b793-e42a2a82a8e6_0000_00000000.csv.gz │ 168 │ NULL │ 2024-01-18 16:27:01.663 +0000 │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- COPY INTO also works with custom file formats. See below: +-- Create a custom file format named my_csv_gzip with CSV format and gzip compression +CREATE FILE FORMAT my_csv_gzip TYPE = CSV COMPRESSION = gzip; + +-- Unload data from the table to the stage using the custom file format my_csv_gzip +COPY INTO @my_internal_stage + FROM canadian_city_population + FILE_FORMAT = (FORMAT_NAME = 'my_csv_gzip'); + +┌────────────────────────────────────────────┐ +│ rows_unloaded │ input_bytes │ output_bytes │ +├───────────────┼─────────────┼──────────────┤ +│ 10 │ 182 │ 168 │ +└────────────────────────────────────────────┘ + +LIST @my_internal_stage; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├────────────────────────────────────────────────────────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤ +│ data_d006ba1c-0609-46d7-a67b-75c7078d86ff_0000_00000000.csv.gz │ 168 │ NULL │ 2024-01-18 16:29:29.721 +0000 │ NULL │ +│ data_7970afa5-32e3-4e7d-b793-e42a2a82a8e6_0000_00000000.csv.gz │ 168 │ NULL │ 2024-01-18 16:27:01.663 +0000 │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Example 3: Unloading to Bucket + +This example unloads data into a bucket on MinIO: + +```sql +-- Unload data from the table to a bucket named 'databend' on MinIO using the PARQUET file format +COPY INTO 's3://databend' + CONNECTION = ( + ENDPOINT_URL = 'http://localhost:9000/', + ACCESS_KEY_ID = 'ROOTUSER', + SECRET_ACCESS_KEY = 'CHANGEME123', + region = 'us-west-2' + ) + FROM canadian_city_population + FILE_FORMAT = (TYPE = PARQUET); + +┌────────────────────────────────────────────┐ +│ rows_unloaded │ input_bytes │ output_bytes │ +├───────────────┼─────────────┼──────────────┤ +│ 10 │ 211 │ 572 │ +└────────────────────────────────────────────┘ +``` + +![Alt text](/img/sql/copy-into-bucket.png) + +### Example 4: Unloading with PARTITION BY + +This example unloads data into partitioned folders based on a derived expression: + +```sql +-- Create a sample table +CREATE TABLE sales_data ( + sale_date DATE, + region VARCHAR, + amount INT +); + +INSERT INTO sales_data VALUES + ('2025-01-15', 'east', 
100), + ('2025-01-20', 'west', 200), + ('2025-02-10', 'east', 150), + (NULL, 'west', 50); + +-- Create an internal stage +CREATE STAGE partitioned_stage; + +-- Unload data partitioned by year-month derived from sale_date +-- When sale_date is NULL, to_varchar() returns NULL, so the entire +-- concatenation evaluates to NULL and the row lands in the _NULL_ folder. +COPY INTO @partitioned_stage + FROM sales_data + PARTITION BY ('month=' || to_varchar(sale_date, 'YYYY-MM')) + FILE_FORMAT = (TYPE = PARQUET); + +-- Verify the partitioned folder layout +SELECT name FROM list_stage(location => '@partitioned_stage') ORDER BY name; + +┌──────────────────────────────────────────────────────────────────┐ +│ name │ +├──────────────────────────────────────────────────────────────────┤ +│ _NULL_/data__0000_00000000.parquet │ +│ month=2025-01/data__0000_00000000.parquet │ +│ month=2025-02/data__0000_00000000.parquet │ +└──────────────────────────────────────────────────────────────────┘ +``` + +When the partition expression evaluates to `NULL`, the data is placed in a `_NULL_` folder. Each unique partition value creates its own subfolder containing the corresponding data files. diff --git a/tidb-cloud-lake/sql/copy-into-table.md b/tidb-cloud-lake/sql/copy-into-table.md new file mode 100644 index 0000000000000..6231bed5ab1e5 --- /dev/null +++ b/tidb-cloud-lake/sql/copy-into-table.md @@ -0,0 +1,704 @@ +--- +title: "COPY INTO
" +sidebar_label: "COPY INTO
" +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +COPY INTO allows you to load data from files located in one of the following locations: + +- User / Internal / External stages: See [What is Stage?](/tidb-cloud-lake/guides/what-is-stage.md) to learn about stages in Databend. +- Buckets or containers created in a storage service. +- Remote servers from where you can access the files by their URL (starting with "https://..."). +- [IPFS](https://ipfs.tech) and Hugging Face repositories. + +See also: [`COPY INTO `](/tidb-cloud-lake/sql/copy-into-location.md) + +## Syntax + +```sql +/* Standard data load */ +COPY INTO [.] [ ( [ , ... ] ) ] + FROM { userStage | internalStage | externalStage | externalLocation } +[ FILES = ( '' [ , '' ] [ , ... ] ) ] +[ PATTERN = '' ] +[ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] + ) ] +[ copyOptions ] + +/* Data load with transformation */ +COPY INTO [.] [ ( [ , ... ] ) ] + FROM ( + SELECT { + [.] [, [.] ...] -- Query columns by name + | [.]$ [, [.]$ ...] -- Query columns by position + | [.]$1[:] [, [.]$1[:] ...] -- Query rows as Variants + } ] + FROM {@[/] | ''} + ) +[ FILES = ( '' [ , '' ] [ , ... ] ) ] +[ PATTERN = '' ] +[ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] + ) ] +[ copyOptions ] +``` + +Where: + +```sql +userStage ::= @~[/] + +internalStage ::= @[/] + +externalStage ::= @[/] + +externalLocation ::= + /* Amazon S3-like Storage */ + 's3://[/]' + CONNECTION = ( + [ CONNECTION_NAME = '' ] + | [ ENDPOINT_URL = '' ] + [ ACCESS_KEY_ID = '' ] + [ SECRET_ACCESS_KEY = '' ] + [ ENABLE_VIRTUAL_HOST_STYLE = TRUE | FALSE ] + [ MASTER_KEY = '' ] + [ REGION = '' ] + [ SECURITY_TOKEN = '' ] + [ ROLE_ARN = '' ] + [ EXTERNAL_ID = '' ] + ) + + /* Azure Blob Storage */ + | 'azblob://[/]' + CONNECTION = ( + [ CONNECTION_NAME = '' ] + | ENDPOINT_URL = '' + ACCOUNT_NAME = '' + ACCOUNT_KEY = '' + ) + + /* Google Cloud Storage */ + | 'gcs://[/]' + CONNECTION = ( + [ CONNECTION_NAME = '' ] + | CREDENTIAL = '' + ) + + /* Alibaba Cloud OSS */ + | 'oss://[/]' + CONNECTION = ( + [ CONNECTION_NAME = '' ] + | ACCESS_KEY_ID = '' + ACCESS_KEY_SECRET = '' + ENDPOINT_URL = '' + [ PRESIGN_ENDPOINT_URL = '' ] + ) + + /* Tencent Cloud Object Storage */ + | 'cos://[/]' + CONNECTION = ( + [ CONNECTION_NAME = '' ] + | SECRET_ID = '' + SECRET_KEY = '' + ENDPOINT_URL = '' + ) + + /* Remote Files */ + | 'https://' + + /* IPFS */ + | 'ipfs://' + CONNECTION = (ENDPOINT_URL = 'https://') + + /* Hugging Face */ + | 'hf://[/]' + CONNECTION = ( + [ REPO_TYPE = 'dataset' | 'model' ] + [ REVISION = '' ] + [ TOKEN = '' ] + ) + +formatTypeOptions ::= + /* Common options for all formats */ + [ COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE ] + + /* CSV specific options */ + [ RECORD_DELIMITER = '' ] + [ FIELD_DELIMITER = '' ] + [ SKIP_HEADER = ] + [ QUOTE = '' ] + [ ESCAPE = '' ] + [ NAN_DISPLAY = '' ] + [ NULL_DISPLAY = '' ] + [ ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE ] + [ EMPTY_FIELD_AS = null | string | field_default ] + [ BINARY_FORMAT = HEX | BASE64 ] + + /* TSV specific options */ + [ RECORD_DELIMITER = '' ] + [ FIELD_DELIMITER = '' ] + + /* NDJSON specific options */ + [ NULL_FIELD_AS = NULL | FIELD_DEFAULT ] + [ MISSING_FIELD_AS = ERROR | NULL | FIELD_DEFAULT ] + [ ALLOW_DUPLICATE_KEYS = TRUE | FALSE ] 
+ + /* PARQUET specific options */ + [ MISSING_FIELD_AS = ERROR | FIELD_DEFAULT ] + + /* ORC specific options */ + [ MISSING_FIELD_AS = ERROR | FIELD_DEFAULT ] + + /* AVRO specific options */ + [ MISSING_FIELD_AS = ERROR | FIELD_DEFAULT ] + +copyOptions ::= + [ SIZE_LIMIT = ] + [ PURGE = ] + [ FORCE = ] + [ DISABLE_VARIANT_CHECK = ] + [ ON_ERROR = { continue | abort | abort_N } ] + [ MAX_FILES = ] + [ RETURN_FAILED_ONLY = ] + [ COLUMN_MATCH_MODE = { case-sensitive | case-insensitive } ] + +``` + +:::note +For remote files, you can use glob patterns to specify multiple files. For example: +- `ontime_200{6,7,8}.csv` represents `ontime_2006.csv`, `ontime_2007.csv`, `ontime_2008.csv` +- `ontime_200[6-8].csv` represents the same files +::: + +## Key Parameters + +- **FILES**: Specifies one or more file names (separated by commas) to be loaded. + +- **PATTERN**: A [PCRE2](https://www.pcre.org/current/doc/html/)-based regular expression pattern string that specifies file names to match. See [Example 4: Filtering Files with Pattern](#example-4-filtering-files-with-pattern). + +## Format Type Options + +The `FILE_FORMAT` parameter supports different file types, each with specific formatting options. Below are the available options for each supported file format: + + + + +These options are available for all file formats: + +| Option | Description | Values | Default | +|--------|-------------|--------|--------| +| COMPRESSION | Compression algorithm for data files | AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, XZ, NONE | AUTO | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| RECORD_DELIMITER | Character(s) separating records | newline | +| FIELD_DELIMITER | Character(s) separating fields | comma (,) | +| SKIP_HEADER | Number of header lines to skip | 0 | +| QUOTE | Character used to quote fields | double-quote (") | +| ESCAPE | Escape character for enclosed fields | NONE | +| NAN_DISPLAY | String representing NaN values | NaN | +| NULL_DISPLAY | String representing NULL values | \N | +| ERROR_ON_COLUMN_COUNT_MISMATCH | Error if column count doesn't match | TRUE | +| EMPTY_FIELD_AS | How to handle empty fields | null | +| BINARY_FORMAT | Encoding format(HEX or BASE64) for binary data | HEX | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| RECORD_DELIMITER | Character(s) separating records | newline | +| FIELD_DELIMITER | Character(s) separating fields | tab (\t) | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| NULL_FIELD_AS | How to handle null fields | NULL | +| MISSING_FIELD_AS | How to handle missing fields | ERROR | +| ALLOW_DUPLICATE_KEYS | Allow duplicate object keys | FALSE | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| MISSING_FIELD_AS | How to handle missing fields | ERROR | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| MISSING_FIELD_AS | How to handle missing fields | ERROR | + + + + + +| Option | Description | Default | +|--------|-------------|--------| +| MISSING_FIELD_AS | How to handle missing fields | ERROR | + + + + +## Copy Options + +| Parameter | Description | Default | +|-----------|-------------|----------| +| SIZE_LIMIT | Maximum rows of data to load | `0` (no limit) | +| PURGE | Purges files after successful load | `false` | +| FORCE | Allows reloading of duplicate files | `false` (skips duplicates) | +| DISABLE_VARIANT_CHECK | Replaces invalid JSON with null | `false` (fails on invalid 
JSON) | +| ON_ERROR | How to handle errors: `continue`, `abort`, or `abort_N` | `abort` | +| MAX_FILES | Maximum number of files to load (up to 15,000) | - | +| RETURN_FAILED_ONLY | Only returns failed files in output | `false` | +| COLUMN_MATCH_MODE | For Parquet: column name matching mode | `case-insensitive` | + +:::tip +When importing large volumes of data, such as logs, it is recommended to set both `PURGE` and `FORCE` to `true`. This ensures efficient data import without the need for interaction with the Meta server (updating the copied-files set). However, it is important to be aware that this may lead to duplicate data imports. +::: + +## Output + +COPY INTO provides a summary of the data loading results with these columns: + +| Column | Type | Nullable | Description | +| ---------------- | ------- | -------- | ----------------------------------------------- | +| FILE | VARCHAR | NO | The relative path to the source file. | +| ROWS_LOADED | INT | NO | The number of rows loaded from the source file. | +| ERRORS_SEEN | INT | NO | Number of error rows in the source file | +| FIRST_ERROR | VARCHAR | YES | The first error found in the source file. | +| FIRST_ERROR_LINE | INT | YES | Line number of the first error. | + +If `RETURN_FAILED_ONLY` is set to `true`, the output will only contain the files that failed to load. + +## Examples + +:::tip Best Practice +For external storage sources, it's recommended to use pre-created connections with the `CONNECTION_NAME` parameter instead of specifying credentials directly in the COPY statement. This approach provides better security, maintainability, and reusability. See [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) for details on creating connections. +::: + +### Example 1: Loading from Stages + +These examples showcase data loading into Databend from various types of stages: + + + + +```sql +COPY INTO mytable + FROM @~ + PATTERN = '.*[.]parquet' + FILE_FORMAT = (TYPE = PARQUET); +``` + + + + +```sql +COPY INTO mytable + FROM @my_internal_stage + PATTERN = '.*[.]parquet' + FILE_FORMAT = (TYPE = PARQUET); +``` + + + + +```sql +COPY INTO mytable + FROM @my_external_stage + PATTERN = '.*[.]parquet' + FILE_FORMAT = (TYPE = PARQUET); +``` + + + + +### Example 2: Loading from External Locations + +These examples showcase data loading into Databend from various types of external sources: + + + + +This example uses a pre-created connection to load data from Amazon S3: + +```sql +-- First create a connection (you only need to do this once) +CREATE CONNECTION my_s3_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Use the connection to load data +COPY INTO mytable + FROM 's3://mybucket/data.csv' + CONNECTION = (CONNECTION_NAME = 'my_s3_conn') + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ) + SIZE_LIMIT = 10; +``` + +**Using IAM Role (Recommended for Production)** + +```sql +-- Create connection using IAM role (more secure, recommended for production) +CREATE CONNECTION my_iam_conn + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::123456789012:role/my_iam_role'; + +-- Load CSV files using the IAM role connection +COPY INTO mytable + FROM 's3://mybucket/' + CONNECTION = (CONNECTION_NAME = 'my_iam_conn') + PATTERN = '.*[.]csv' + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ); +``` + + + + + +This example connects to Azure Blob Storage and loads data from 'data.csv' into Databend: + +```sql +-- 
Create connection for Azure Blob Storage +CREATE CONNECTION my_azure_conn + STORAGE_TYPE = 'azblob' + ENDPOINT_URL = 'https://.blob.core.windows.net' + ACCOUNT_NAME = '' + ACCOUNT_KEY = ''; + +-- Use the connection to load data +COPY INTO mytable + FROM 'azblob://mybucket/data.csv' + CONNECTION = (CONNECTION_NAME = 'my_azure_conn') + FILE_FORMAT = (type = CSV); +``` + + + + + +This example connects to Google Cloud Storage and loads data: + +```sql +-- Create connection for Google Cloud Storage +CREATE CONNECTION my_gcs_conn + STORAGE_TYPE = 'gcs' + CREDENTIAL = ''; + +-- Use the connection to load data +COPY INTO mytable + FROM 'gcs://mybucket/data.csv' + CONNECTION = (CONNECTION_NAME = 'my_gcs_conn') + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ); +``` + + + + + +This example loads data from three remote CSV files and skips a file in case of errors. + +```sql +COPY INTO mytable + FROM 'https://ci.databend.org/dataset/stateful/ontime_200{6,7,8}_200.csv' + FILE_FORMAT = (type = CSV) + ON_ERROR = continue; +``` + + + + + +This example loads data from a CSV file on IPFS: + +```sql +COPY INTO mytable + FROM 'ipfs://' + CONNECTION = ( + ENDPOINT_URL = 'https://' + ) + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ); +``` + + + + +### Example 3: Loading Compressed Data + +This example loads a GZIP-compressed CSV file on Amazon S3 into Databend: + +```sql +-- Create connection for compressed data loading +CREATE CONNECTION compressed_s3_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Load GZIP-compressed CSV file using the connection +COPY INTO mytable + FROM 's3://mybucket/data.csv.gz' + CONNECTION = (CONNECTION_NAME = 'compressed_s3_conn') + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1, + COMPRESSION = AUTO + ); +``` + +### Example 4: Filtering Files with Pattern + +This example demonstrates how to load CSV files from Amazon S3 using pattern matching with the PATTERN parameter. It filters files with 'sales' in their names and '.csv' extensions: + +```sql +-- Create connection for pattern-based file loading +CREATE CONNECTION pattern_s3_conn + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Load CSV files with 'sales' in their names using pattern matching +COPY INTO mytable + FROM 's3://mybucket/' + CONNECTION = (CONNECTION_NAME = 'pattern_s3_conn') + PATTERN = '.*sales.*[.]csv' + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ); +``` + +Where `.*` is interpreted as zero or more occurrences of any character. The square brackets escape the period character `.` that precedes a file extension. + +To load from all the CSV files using a connection: + +```sql +COPY INTO mytable + FROM 's3://mybucket/' + CONNECTION = (CONNECTION_NAME = 'pattern_s3_conn') + PATTERN = '.*[.]csv' + FILE_FORMAT = ( + TYPE = CSV, + FIELD_DELIMITER = ',', + RECORD_DELIMITER = '\n', + SKIP_HEADER = 1 + ); +``` + +When specifying the pattern for a file path including multiple folders, consider your matching criteria: + +- If you want to match a specific subpath following a prefix, include the prefix in the pattern (e.g., 'multi_page/') and then specify the pattern you want to match within that subpath (e.g., '\_page_1'). + +```sql +-- File path: parquet/multi_page/multi_page_1.parquet +COPY INTO ... 
FROM @data/parquet/ PATTERN = 'multi_page/.*_page_1.*') ... +``` + +- If you want to match any part of the file path that contains the desired pattern, use '.*' before and after the pattern (e.g., '.*multi_page_1.\*') to match any occurrences of 'multi_page_1' within the path. + +```sql +-- File path: parquet/multi_page/multi_page_1.parquet +COPY INTO ... FROM @data/parquet/ PATTERN ='.*multi_page_1.*') ... +``` + +### Example 5: Loading to Table with Extra Columns + +This section demonstrates data loading into a table with extra columns, using the sample file [books.csv](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv): + +```text title='books.csv' +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +![Alt text](/img/load/load-extra.png) + +By default, COPY INTO loads data into a table by matching the order of fields in the file to the corresponding columns in the table. It's essential to ensure that the data aligns correctly between the file and the table. For example, + +```sql +CREATE TABLE books +( + title VARCHAR, + author VARCHAR, + date VARCHAR +); + +COPY INTO books + FROM 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv' + FILE_FORMAT = (TYPE = CSV); +``` + +If your table has more columns than the file, you can specify the columns into which you want to load data. For example, + +```sql +CREATE TABLE books_with_language +( + title VARCHAR, + language VARCHAR, + author VARCHAR, + date VARCHAR +); + +COPY INTO books_with_language (title, author, date) + FROM 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv' + FILE_FORMAT = (TYPE = CSV); +``` + +If your table has more columns than the file, and the additional columns are at the end of the table, you can load data using the [FILE_FORMAT](#file_format) option `ERROR_ON_COLUMN_COUNT_MISMATCH`. This allows you to load data without specifying each column individually. Please note that ERROR_ON_COLUMN_COUNT_MISMATCH currently works for the CSV file format. + +```sql +CREATE TABLE books_with_extra_columns +( + title VARCHAR, + author VARCHAR, + date VARCHAR, + language VARCHAR, + region VARCHAR +); + +COPY INTO books_with_extra_columns + FROM 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.csv' + FILE_FORMAT = (TYPE = CSV, ERROR_ON_COLUMN_COUNT_MISMATCH = false); +``` + +:::note +Extra columns in a table can have default values specified by [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md) or [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#column-operations). If a default value is not explicitly set for an extra column, the default value associated with its data type will be applied. For instance, an integer-type column will default to 0 if no other value is specified. +::: + +### Example 6: Loading JSON with Custom Format + +This example loads data from a CSV file "data.csv" with the following content: + +```json +1,"U00010","{\"carPriceList\":[{\"carTypeId":10,\"distance":5860},{\"carTypeId":11,\"distance\":5861}]}" +2,"U00011","{\"carPriceList\":[{\"carTypeId":12,\"distance":5862},{\"carTypeId":13,\"distance\":5863}]}" +``` + +Each line contains three columns of data, with the third column being a string containing JSON data. To load CSV data correctly with JSON fields, we need to set the correct escape character. This example uses the backslash \ as the escape character, as the JSON data contains double quotes ". + +#### Step 1: Create custom file format. 
+ +```sql +-- Define a custom CSV file format with the escape character set to backslash \ +CREATE FILE FORMAT my_csv_format + TYPE = CSV + ESCAPE = '\\'; +``` + +#### Step 2: Create target table. + +```sql +CREATE TABLE t + ( + id INT, + seq VARCHAR, + p_detail VARCHAR + ); +``` + +#### Step 3: Load with custom file format. + +```sql +COPY INTO t FROM @t_stage FILES=('data.csv') +FILE_FORMAT=(FORMAT_NAME='my_csv_format'); +``` + +### Example 7: Loading Invalid JSON + +When loading data into a Variant column, Databend automatically checks the data's validity and throws an error in case of any invalid data. For example, if you have a Parquet file named `invalid_json_string.parquet` in the user stage that contains invalid JSON data, like this: + +```sql +SELECT * +FROM @~/invalid_json_string.parquet; + +┌────────────────────────────────────┐ +│ a │ b │ +├─────────────────┼──────────────────┤ +│ 5 │ {"k":"v"} │ +│ 6 │ [1, │ +└────────────────────────────────────┘ + +DESC t2; + +┌──────────────────────────────────────────────┐ +│ Field │ Type │ Null │ Default │ Extra │ +├────────┼─────────┼────────┼─────────┼────────┤ +│ a │ VARCHAR │ YES │ NULL │ │ +│ b │ VARIANT │ YES │ NULL │ │ +└──────────────────────────────────────────────┘ +``` + +An error would occur when attempting to load the data into a table: + +```sql +COPY INTO t2 FROM @~/invalid_json_string.parquet FILE_FORMAT = (TYPE = PARQUET) ON_ERROR = CONTINUE; +error: APIError: ResponseError with 1006: EOF while parsing a value, pos 3 while evaluating function `parse_json('[1,')` +``` + +To load without checking the JSON validity, set the option `DISABLE_VARIANT_CHECK` to `true` in the COPY INTO statement: + +```sql +COPY INTO t2 FROM @~/invalid_json_string.parquet +FILE_FORMAT = (TYPE = PARQUET) +DISABLE_VARIANT_CHECK = true +ON_ERROR = CONTINUE; + +┌───────────────────────────────────────────────────────────────────────────────────────────────┐ +│ File │ Rows_loaded │ Errors_seen │ First_error │ First_error_line │ +├─────────────────────────────┼─────────────┼─────────────┼──────────────────┼──────────────────┤ +│ invalid_json_string.parquet │ 2 │ 0 │ NULL │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT * FROM t2; +-- Invalid JSON is stored as null in the Variant column. +┌──────────────────────────────────────┐ +│ a │ b │ +├──────────────────┼───────────────────┤ +│ 5 │ {"k":"v"} │ +│ 6 │ null │ +└──────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/cos.md b/tidb-cloud-lake/sql/cos.md new file mode 100644 index 0000000000000..8292ac8854e6d --- /dev/null +++ b/tidb-cloud-lake/sql/cos.md @@ -0,0 +1,23 @@ +--- +title: COS +--- + +Returns the cosine of `x`, where `x` is given in radians. + +## Syntax + +```sql +COS( ) +``` + +## Examples + +```sql +SELECT COS(PI()); + +┌───────────┐ +│ cos(pi()) │ +├───────────┤ +│ -1 │ +└───────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/cosine-distance.md b/tidb-cloud-lake/sql/cosine-distance.md new file mode 100644 index 0000000000000..51f5c6778bc1a --- /dev/null +++ b/tidb-cloud-lake/sql/cosine-distance.md @@ -0,0 +1,100 @@ +--- +title: 'COSINE_DISTANCE' +description: 'Measuring similarity using the cosine_distance function in Databend' +--- + +Calculates the cosine distance between two vectors, measuring how dissimilar they are. 
+ +## Syntax + +```sql +COSINE_DISTANCE(vector1, vector2) +``` + +## Arguments + +- `vector1`: First vector (VECTOR Data Type) +- `vector2`: Second vector (VECTOR Data Type) + +## Returns + +Returns a FLOAT value between 0 and 1: +- 0: Identical vectors (completely similar) +- 1: Orthogonal vectors (completely dissimilar) + +## Description + +The cosine distance measures the dissimilarity between two vectors based on the angle between them, regardless of their magnitude. The function: + +1. Verifies that both input vectors have the same length +2. Computes the sum of element-wise products (dot product) of the two vectors +3. Calculates the square root of the sum of squares for each vector (vector magnitudes) +4. Returns `1 - (dot_product / (magnitude1 * magnitude2))` + +The mathematical formula implemented is: + +``` +cosine_distance(v1, v2) = 1 - (Σ(v1ᵢ * v2ᵢ) / (√Σ(v1ᵢ²) * √Σ(v2ᵢ²))) +``` + +Where v1ᵢ and v2ᵢ are the elements of the input vectors. + +:::info +This function performs vector computations within Databend and does not rely on external APIs. +::: + + +## Examples + +### Basic Usage + +```sql +-- Calculate cosine distance between two vectors +SELECT COSINE_DISTANCE([1.0, 2.0, 3.0]::vector(3), [4.0, 5.0, 6.0]::vector(3)) AS distance; +``` + +Result: +``` +╭─────────────╮ +│ distance │ +├─────────────┤ +│ 0.025368214 │ +╰─────────────╯ +``` + +Create a table with vector data: + +```sql +CREATE OR REPLACE TABLE vectors ( + id INT, + vec VECTOR(3) +); + +INSERT INTO vectors VALUES + (1, [1.0000, 2.0000, 3.0000]), + (2, [1.0000, 2.2000, 3.0000]), + (3, [4.0000, 5.0000, 6.0000]); +``` + +Find the vector most similar to [1, 2, 3]: + +```sql +SELECT + id, + vec, + COSINE_DISTANCE(vec, [1.0000, 2.0000, 3.0000]::VECTOR(3)) AS distance +FROM + vectors +ORDER BY + distance ASC; +``` + +``` +╭────────────────────────────────────╮ +│ id │ vec │ distance │ +├────┼───────────┼───────────────────┤ +│ 1 │ [1,2,3] │ 0.000000059604645 │ +│ 2 │ [1,2.2,3] │ 0.00096315145 │ +│ 3 │ [4,5,6] │ 0.025368214 │ +╰────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/cot.md b/tidb-cloud-lake/sql/cot.md new file mode 100644 index 0000000000000..7fa18f80b8e3d --- /dev/null +++ b/tidb-cloud-lake/sql/cot.md @@ -0,0 +1,23 @@ +--- +title: COT +--- + +Returns the cotangent of `x`, where `x` is given in radians. + +## Syntax + +```sql +COT( ) +``` + +## Examples + +```sql +SELECT COT(12); + +┌─────────────────────┐ +│ cot(12) │ +├─────────────────────┤ +│ -1.5726734063976895 │ +└─────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/count-distinct.md b/tidb-cloud-lake/sql/count-distinct.md new file mode 100644 index 0000000000000..39740b09e1c3e --- /dev/null +++ b/tidb-cloud-lake/sql/count-distinct.md @@ -0,0 +1,64 @@ +--- +title: COUNT_DISTINCT +title_includes: uniq +--- + +Aggregate function. + +The count(distinct ...) function calculates the uniq value of a set of values. + +To obtain an estimated result from large data sets with little memory and time, consider using [APPROX_COUNT_DISTINCT](/tidb-cloud-lake/sql/approx-count-distinct.md). + +:::caution + NULL values are not counted. +::: + +## Syntax + +```sql +COUNT(distinct ...) 
+UNIQ() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------------------------------| +| `` | Any expression, size of the arguments is [1, 32] | + +## Return Type + +UInt64 + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE products ( + id INT, + name VARCHAR, + category VARCHAR, + price FLOAT +); + +INSERT INTO products (id, name, category, price) +VALUES (1, 'Laptop', 'Electronics', 1000), + (2, 'Smartphone', 'Electronics', 800), + (3, 'Tablet', 'Electronics', 600), + (4, 'Chair', 'Furniture', 150), + (5, 'Table', 'Furniture', 300); +``` + +**Query Demo: Count Distinct Categories** + +```sql +SELECT COUNT(DISTINCT category) AS unique_categories +FROM products; +``` + +**Result** +```sql +| unique_categories | +|-------------------| +| 2 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/count-if.md b/tidb-cloud-lake/sql/count-if.md new file mode 100644 index 0000000000000..7d5ffefdc7574 --- /dev/null +++ b/tidb-cloud-lake/sql/count-if.md @@ -0,0 +1,45 @@ +--- +title: COUNT_IF +--- + + +## COUNT_IF + +The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. + +```sql +COUNT_IF(, ) +``` + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE orders ( + id INT, + customer_id INT, + status VARCHAR, + total FLOAT +); + +INSERT INTO orders (id, customer_id, status, total) +VALUES (1, 1, 'completed', 100), + (2, 2, 'completed', 200), + (3, 1, 'pending', 150), + (4, 3, 'completed', 250), + (5, 2, 'pending', 300); +``` + +**Query Demo: Count Completed Orders** +```sql +SELECT COUNT_IF(status, status = 'completed') AS completed_orders +FROM orders; +``` + +**Result** +```sql +| completed_orders | +|------------------| +| 3 | +``` + diff --git a/tidb-cloud-lake/sql/count.md b/tidb-cloud-lake/sql/count.md new file mode 100644 index 0000000000000..432210079d1aa --- /dev/null +++ b/tidb-cloud-lake/sql/count.md @@ -0,0 +1,58 @@ +--- +title: COUNT +--- + +The COUNT() function returns the number of records returned by a SELECT query. + +:::caution +NULL values are not counted. +::: + +## Syntax + +```sql +COUNT() +``` + +## Arguments + +| Arguments | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | Any expression.
This may be a column name, the result of another function, or a math operation.
`*` is also allowed, to indicate pure row counting. | + +## Return Type + +An integer. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE students ( + id INT, + name VARCHAR, + age INT, + grade FLOAT NULL +); + +INSERT INTO students (id, name, age, grade) +VALUES (1, 'John', 21, 85), + (2, 'Emma', 22, NULL), + (3, 'Alice', 23, 90), + (4, 'Michael', 21, 88), + (5, 'Sophie', 22, 92); + +``` + +**Query Demo: Count Students with Valid Grades** +```sql +SELECT COUNT(grade) AS count_valid_grades +FROM students; +``` + +**Result** +```sql +| count_valid_grades | +|--------------------| +| 4 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/covar-pop.md b/tidb-cloud-lake/sql/covar-pop.md new file mode 100644 index 0000000000000..d5dc67fd28def --- /dev/null +++ b/tidb-cloud-lake/sql/covar-pop.md @@ -0,0 +1,63 @@ +--- +title: COVAR_POP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the population covariance of a set of number pairs. + +## Syntax + +```sql +COVAR_POP(, ) +``` + +## Arguments + +| Arguments | Description | +|-----------| ------------------------ | +| `` | Any numerical expression | +| `` | Any numerical expression | + +## Aliases + +- [VAR_POP](/tidb-cloud-lake/sql/var-pop.md) +- [VARIANCE_POP](/tidb-cloud-lake/sql/variance-pop.md) + +## Return Type + +float64 + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE product_sales ( + id INT, + product_id INT, + units_sold INT, + revenue FLOAT +); + +INSERT INTO product_sales (id, product_id, units_sold, revenue) +VALUES (1, 1, 10, 1000), + (2, 2, 20, 2000), + (3, 3, 30, 3000), + (4, 4, 40, 4000), + (5, 5, 50, 5000); +``` + +**Query Demo: Calculate Population Covariance between Units Sold and Revenue** + +```sql +SELECT COVAR_POP(units_sold, revenue) AS covar_pop_units_revenue +FROM product_sales; +``` + +**Result** +```sql +| covar_pop_units_revenue | +|-------------------------| +| 20000.0 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/covar-samp.md b/tidb-cloud-lake/sql/covar-samp.md new file mode 100644 index 0000000000000..057e569df1b9b --- /dev/null +++ b/tidb-cloud-lake/sql/covar-samp.md @@ -0,0 +1,70 @@ +--- +title: COVAR_SAMP +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the sample covariance (Σ((x - x̅)(y - y̅)) / (n - 1)) of two data columns. + +:::caution +NULL values are not counted. +::: + +## Syntax + +```sql +COVAR_SAMP(, ) +``` + +## Arguments + +| Arguments | Description | +| --------- | ------------------------ | +| `` | Any numerical expression | +| `` | Any numerical expression | + +## Aliases + +- [VAR_SAMP](/tidb-cloud-lake/sql/var-samp.md) +- [VARIANCE_SAMP](/tidb-cloud-lake/sql/variance-samp.md) + +## Return Type + +float64, when `n <= 1`, returns +∞. 
+ +## Example + +**Create a Table and Insert Sample Data** + +```sql +CREATE TABLE store_sales ( + id INT, + store_id INT, + items_sold INT, + profit FLOAT +); + +INSERT INTO store_sales (id, store_id, items_sold, profit) +VALUES (1, 1, 100, 1000), + (2, 2, 200, 2000), + (3, 3, 300, 3000), + (4, 4, 400, 4000), + (5, 5, 500, 5000); +``` + +**Query Demo: Calculate Sample Covariance between Items Sold and Profit** + +```sql +SELECT COVAR_SAMP(items_sold, profit) AS covar_samp_items_profit +FROM store_sales; +``` + +**Result** + +```sql +| covar_samp_items_profit | +|-------------------------| +| 250000.0 | +``` diff --git a/tidb-cloud-lake/sql/crc.md b/tidb-cloud-lake/sql/crc.md new file mode 100644 index 0000000000000..648910d4c85de --- /dev/null +++ b/tidb-cloud-lake/sql/crc.md @@ -0,0 +1,23 @@ +--- +title: CRC32 +--- + +Returns the CRC32 checksum of `x`, where 'x' is expected to be a string and (if possible) is treated as one if it is not. + +## Syntax + +```sql +CRC32( '' ) +``` + +## Examples + +```sql +SELECT CRC32('databend'); + +┌───────────────────┐ +│ crc32('databend') │ +├───────────────────┤ +│ 1177678456 │ +└───────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-aggregate-function.md b/tidb-cloud-lake/sql/create-aggregate-function.md new file mode 100644 index 0000000000000..e1862a6e1d038 --- /dev/null +++ b/tidb-cloud-lake/sql/create-aggregate-function.md @@ -0,0 +1,151 @@ +--- +title: CREATE AGGREGATE FUNCTION +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a user-defined aggregate function (UDAF) that runs inside Databend's JavaScript or Python runtime. + +### Supported Languages + +- `javascript` +- `python` + +## Syntax + +```sql +CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] + ( [ ] ) + STATE { } + RETURNS + LANGUAGE + [ IMPORTS = () ] + [ PACKAGES = () ] +AS $$ + +$$ +[ DESC='' ] +``` + +| Parameter | Description | +| --- | --- | +| `` | Name of the aggregate function. | +| `` | Optional comma-separated list of input parameters and types (for example `value DOUBLE`). | +| `STATE { }` | Struct definition that Databend stores between partial/final aggregation steps (for example `STATE { sum DOUBLE, count DOUBLE }`). | +| `` | Data type returned by the aggregate (`DOUBLE`, `INT`, etc.). | +| `LANGUAGE` | Runtime used to execute the script. Supported values: `javascript`, `python`. | +| `IMPORTS` / `PACKAGES` | Optional lists for shipping extra files (imports) or PyPI packages (Python only). | +| `` | Script body that must expose `create_state`, `accumulate`, `merge`, and `finish` entry points. | +| `DESC` | Optional description. | + +The script must implement these functions: + +- `create_state()` – allocate and return an initial state object. +- `accumulate(state, *args)` – update the state for each input row. +- `merge(state1, state2)` – merge two partial states. +- `finish(state)` – produce the final result (return `None` for SQL `NULL`). + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------|:--------------|:---------------| +| SUPER | Global, Table | Operates a UDF | + +To create a user-defined function, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the SUPER [privilege](/tidb-cloud-lake/guides/privileges.md). 
+ +## Examples + +### Python average UDAF + +The following Python aggregate computes the average of a column: + +```sql +CREATE OR REPLACE FUNCTION py_avg (value DOUBLE) + STATE { sum DOUBLE, count DOUBLE } + RETURNS DOUBLE + LANGUAGE python +AS $$ +class State: + def __init__(self): + self.sum = 0.0 + self.count = 0.0 + +def create_state(): + return State() + +def accumulate(state, value): + if value is not None: + state.sum += value + state.count += 1 + return state + +def merge(state1, state2): + state1.sum += state2.sum + state1.count += state2.count + return state1 + +def finish(state): + if state.count == 0: + return None + return state.sum / state.count +$$; + +SELECT py_avg(number) AS avg_val FROM numbers(5); +``` + +``` ++---------+ +| avg_val | ++---------+ +| 2 | ++---------+ +``` + +### JavaScript average UDAF + +The next example shows the same calculation implemented in JavaScript: + +```sql +CREATE OR REPLACE FUNCTION js_avg (value DOUBLE) + STATE { sum DOUBLE, count DOUBLE } + RETURNS DOUBLE + LANGUAGE javascript +AS $$ +export function create_state() { + return { sum: 0, count: 0 }; +} + +export function accumulate(state, value) { + if (value !== null) { + state.sum += value; + state.count += 1; + } + return state; +} + +export function merge(state1, state2) { + state1.sum += state2.sum; + state1.count += state2.count; + return state1; +} + +export function finish(state) { + if (state.count === 0) { + return null; + } + return state.sum / state.count; +} +$$; + +SELECT js_avg(number) AS avg_val FROM numbers(5); +``` + +``` ++---------+ +| avg_val | ++---------+ +| 2 | ++---------+ +``` diff --git a/tidb-cloud-lake/sql/create-aggregating-index.md b/tidb-cloud-lake/sql/create-aggregating-index.md new file mode 100644 index 0000000000000..c58c0e4befe58 --- /dev/null +++ b/tidb-cloud-lake/sql/create-aggregating-index.md @@ -0,0 +1,35 @@ +--- +title: CREATE AGGREGATING INDEX +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Create a new aggregating index in Databend. + +## Syntax + +```sql +CREATE [ OR REPLACE ] AGGREGATING INDEX AS SELECT ... +``` + +- When creating aggregating indexes, limit their usage to standard [Aggregate Functions](/tidb-cloud-lake/_index.md) (e.g., AVG, SUM, MIN, MAX, COUNT and GROUP BY), while keeping in mind that GROUPING SETS, [Window Functions](/tidb-cloud-lake/_index.md), [LIMIT](/tidb-cloud-lake/sql/select.md#limit-clause), and [ORDER BY](/tidb-cloud-lake/sql/select.md#order-by-clause) are not accepted, or you will get an error: `Currently create aggregating index just support simple query, like: SELECT ... FROM ... WHERE ... GROUP BY ...`. + +- The query filter scope defined when creating aggregating indexes should either match or encompass the scope of your actual queries. + +- To confirm if an aggregating index works for a query, use the [EXPLAIN](/tidb-cloud-lake/sql/explain.md) command to analyze the query. 
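+
+For instance, after creating the aggregating index shown in the example below, you can run the same query through EXPLAIN to check whether the optimizer picks it up. This is a minimal sketch that assumes the `agg` table and `my_agg_index` index from the example below already exist; the exact plan output depends on your Databend version.
+
+```sql
+-- Check whether the aggregating index is used for this query
+EXPLAIN SELECT MIN(a), MAX(c) FROM agg;
+-- If the index is applied, the returned plan refers to the aggregating index
+-- instead of recomputing MIN/MAX from the base table data.
+```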
+ +## Examples + +This example creates an aggregating index named *my_agg_index* for the query "SELECT MIN(a), MAX(c) FROM agg": + +```sql +-- Prepare data +CREATE TABLE agg(a int, b int, c int); +INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5); + +-- Create an aggregating index +CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg; +``` diff --git a/tidb-cloud-lake/sql/create-connection.md b/tidb-cloud-lake/sql/create-connection.md new file mode 100644 index 0000000000000..4b1cbf7e25ea7 --- /dev/null +++ b/tidb-cloud-lake/sql/create-connection.md @@ -0,0 +1,189 @@ +--- +title: CREATE CONNECTION +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +Creates a connection to external storage. + +:::warning +IMPORTANT: When objects (stages, tables, etc.) use a connection, they copy and store the connection's parameters permanently. If you later modify the connection using CREATE OR REPLACE CONNECTION, existing objects will continue using the old parameters. To update objects with new connection parameters, you must drop and recreate those objects. +::: + +## Syntax + +```sql +CREATE [ OR REPLACE ] CONNECTION [ IF NOT EXISTS ] + STORAGE_TYPE = '' + [ ] +``` + +| Parameter | Description | +|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------| +| STORAGE_TYPE | Type of storage service. Possible values include: `s3`, `azblob`, `gcs`, `oss`, and `cos`. | +| storage_params | Vary based on storage type and authentication method. See [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md) for the complete list. | + +## Connection Parameters + +Connections encapsulate the credentials and configuration for a specific storage backend. Choose the appropriate `STORAGE_TYPE` and provide the required parameters when creating the connection. The table highlights common options: + +| STORAGE_TYPE | Typical parameters | Description | +|--------------|-------------------|-------------| +| `s3` | `ACCESS_KEY_ID`/`SECRET_ACCESS_KEY`, or `ROLE_ARN`/`EXTERNAL_ID`, optional `ENDPOINT_URL`, `REGION` | Amazon S3 and S3-compatible services (MinIO, Cloudflare R2, etc.). | +| `azblob` | `ACCOUNT_NAME`, `ACCOUNT_KEY`, `ENDPOINT_URL` | Azure Blob Storage. | +| `gcs` | `CREDENTIAL` (base64-encoded service account key) | Google Cloud Storage. | +| `oss` | `ACCESS_KEY_ID`, `ACCESS_KEY_SECRET`, `ENDPOINT_URL` | Alibaba Cloud Object Storage Service. | +| `cos` | `SECRET_ID`, `SECRET_KEY`, `ENDPOINT_URL` | Tencent Cloud Object Storage. | +| `hf` | `REPO_TYPE`, `REVISION`, optional `TOKEN` | Hugging Face Hub datasets and models. | + +For parameter meanings, optional flags, and additional storage types, refer to [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). Expand the tabs below to see storage-specific examples: + + + + +Choose an authentication method for Amazon S3 and S3-compatible services: + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; +``` + +| Parameter | Description | +|-----------|-------------| +| ACCESS_KEY_ID | Your AWS access key ID. | +| SECRET_ACCESS_KEY | Your AWS secret access key. 
| + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 's3' + ROLE_ARN = ''; +``` + +| Parameter | Description | +|-----------|-------------| +| ROLE_ARN | The Amazon Resource Name (ARN) of the IAM role that Databend will assume to access your S3 resources. | + + + + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 'azblob' + ACCOUNT_NAME = '' + ACCOUNT_KEY = '' + ENDPOINT_URL = 'https://.blob.core.windows.net'; +``` + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 'gcs' + CREDENTIAL = ''; +``` + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 'oss' + ACCESS_KEY_ID = '' + ACCESS_KEY_SECRET = '' + ENDPOINT_URL = 'https://.[-internal].aliyuncs.com'; +``` + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 'cos' + SECRET_ID = '' + SECRET_KEY = '' + ENDPOINT_URL = ''; +``` + + + + +```sql +CREATE CONNECTION + STORAGE_TYPE = 'hf' + REPO_TYPE = 'dataset' + REVISION = 'main' + TOKEN = ''; +``` + +Omit `TOKEN` for public repositories; include it for private or rate-limited assets. + + + + + +## Access control requirements + +| Privilege | Object Type | Description | +|:------------------|:------------|:----------------------| +| CREATE CONNECTION | Global | Creates a connection. | + + +To create a connection, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the CREATE CONNECTION [privilege](/tidb-cloud-lake/guides/privileges.md). + +## Update Table Connections + +To switch an existing table to a new connection, use [`ALTER TABLE ... CONNECTION`](/tidb-cloud-lake/sql/alter-table.md#external-table-connection). This command rebinds external tables to a different connection without recreating the table. + +## Examples + +### Using Access Keys + +This example creates a connection to Amazon S3 named 'toronto' and establishes an external stage named 'my_s3_stage' linked to the 's3://databend-toronto' URL, using the 'toronto' connection. For more practical examples about connection, see [Usage Examples](/tidb-cloud-lake/sql/connection.md#usage-examples). + +```sql +CREATE CONNECTION toronto + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +CREATE STAGE my_s3_stage + URL = 's3://databend-toronto' + CONNECTION = (CONNECTION_NAME = 'toronto'); +``` + +### Using AWS IAM Role + +This example creates a connection to Amazon S3 using an IAM role and then creates a stage that uses this connection. This approach is more secure as it doesn't require storing access keys in Databend. + +```sql +CREATE CONNECTION databend_test + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::987654321987:role/databend-test'; + +CREATE STAGE databend_test + URL = 's3://test-bucket-123' + CONNECTION = (CONNECTION_NAME = 'databend_test'); + +-- You can now query data from your S3 bucket +SELECT * FROM @databend_test/test.parquet LIMIT 1; +``` + +:::info +To use IAM roles with Databend Cloud, you need to set up a trust relationship between your AWS account and Databend Cloud. See [Authenticate with AWS IAM Role](/guides/cloud/security/iam-role) for detailed instructions. +::: diff --git a/tidb-cloud-lake/sql/create-database.md b/tidb-cloud-lake/sql/create-database.md new file mode 100644 index 0000000000000..2d1ff3ddb8466 --- /dev/null +++ b/tidb-cloud-lake/sql/create-database.md @@ -0,0 +1,33 @@ +--- +title: CREATE DATABASE +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Create a database. 
+ +## Syntax + +```sql +CREATE [ OR REPLACE ] DATABASE [ IF NOT EXISTS ] +``` + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------------|:------------|:--------------------| +| CREATE DATABASE | Global | Creates a database. | + + +To create a database, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the CREATE DATABASE [privilege](/tidb-cloud-lake/guides/privileges.md). + +## Examples + +The following example creates a database named `test`: + +```sql +CREATE DATABASE test; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-external-table.md b/tidb-cloud-lake/sql/create-external-table.md new file mode 100644 index 0000000000000..9966dcf75b310 --- /dev/null +++ b/tidb-cloud-lake/sql/create-external-table.md @@ -0,0 +1,131 @@ +--- +title: CREATE EXTERNAL TABLE +sidebar_position: 2 +--- + +The `CREATE TABLE ... CONNECTION = (...)` statement creates a table and specifies an S3-compatible storage bucket for data storage instead of using the default local storage. + +Then the fuse table engine table will be stored in the specified S3-compatible bucket. + +## Benefits + +- You can determine the storage location of the table data. +- Leverage high-performance storage like [Amazon S3 Express One Zone](https://aws.amazon.com/s3/storage-classes/express-one-zone/), to improve performance. + +## Syntax + +```sql +CREATE TABLE [IF NOT EXISTS] [db.]table_name ( + [NOT NULL | NULL] [{ DEFAULT }], + [NOT NULL | NULL] [{ DEFAULT }], + ... +) +'s3:///[]' +CONNECTION = ( + ENDPOINT_URL = 'https://' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = '' + ENABLE_VIRTUAL_HOST_STYLE = 'true' | 'false' +) +| +CONNECTION = ( + CONNECTION_NAME = '' +); +``` + +Connection parameters: + +| Parameter | Description | Required | +|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| `s3:///[]` | Files are in the specified external location (S3-like bucket) | YES | +| ENDPOINT_URL | The bucket endpoint URL starting with "https://". To use a URL starting with "http://", set `allow_insecure` to `true` in the [storage] block of the file `databend-query-node.toml`. | Optional | +| ACCESS_KEY_ID | Your access key ID for connecting the AWS S3 compatible object storage. If not provided, Databend will access the bucket anonymously. | Optional | +| SECRET_ACCESS_KEY | Your secret access key for connecting the AWS S3 compatible object storage. | Optional | +| ENABLE_VIRTUAL_HOST_STYLE | If you use virtual hosting to address the bucket, set it to "true". | Optional | + +For more information on `CONNECTION_NAME`, see [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) + +## S3-compatible Bucket Policy Requirements + +The external location S3 bucket must have the following permissions granted through an S3 bucket policy: + +**Read-only Access:** +- `s3:GetObject`: Allows reading objects from the bucket. +- `s3:ListBucket`: Allows listing objects in the bucket. +- `s3:ListBucketVersions`: Allows listing object versions in the bucket. +- `s3:GetObjectVersion`: Allows retrieving a specific version of an object. + +**Writable Access:** +- `s3:PutObject`: Allows writing objects to the bucket. +- `s3:DeleteObject`: Allows deleting objects from the bucket. 
+- `s3:AbortMultipartUpload`: Allows aborting multipart uploads. +- `s3:DeleteObjectVersion`: Allows deleting a specific version of an object. +::: + +## Examples + +:::info + +Before using the `SHOW CREATE TABLE` command, you need to set the `hide_options_in_show_create_table` variable to `0`. +```sql +SET GLOBAL hide_options_in_show_create_table = 0; +``` +::: + +### Create a Table with External Location + +Create a table with data stored on an external location, such as Amazon S3: + +```sql +-- Create a table named `mytable` and specify the location `s3://testbucket/admin/data/` for the data storage +CREATE TABLE mytable ( + a INT +) +'s3://testbucket/admin/data/' +CONNECTION = ( + ACCESS_KEY_ID = '', + SECRET_ACCESS_KEY = '', + ENDPOINT_URL = 'https://s3.amazonaws.com' +); + +-- Show the table schema +SHOW CREATE TABLE mytable; + +CREATE TABLE mytable ( + a INT NULL +) +ENGINE = FUSE +COMPRESSION = 'zstd' +STORAGE_FORMAT = 'parquet' +LOCATION = 's3 | bucket=testbucket,root=/admin/data/,endpoint=https://s3.amazonaws.com'; +``` + +### Create a Table Using a Connection + +Or you can create a connection and use it to create a table: +```sql +-- Create a connection named `s3_connection` for the S3 credentials +CREATE CONNECTION s3_connection + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +CREATE TABLE mytable ( + a INT +) +'s3://testbucket/admin/data/' +CONNECTION = ( + CONNECTION_NAME = 's3_connection' +); + +-- Show the table schema +SHOW CREATE TABLE mytable; + +CREATE TABLE mytable ( + a INT NULL +) +ENGINE = FUSE +COMPRESSION = 'zstd' +STORAGE_FORMAT = 'parquet' +LOCATION = 's3 | bucket=testbucket,root=/admin/data/,endpoint=https://s3.amazonaws.com'; +``` diff --git a/tidb-cloud-lake/sql/create-file-format.md b/tidb-cloud-lake/sql/create-file-format.md new file mode 100644 index 0000000000000..d35c52b443da6 --- /dev/null +++ b/tidb-cloud-lake/sql/create-file-format.md @@ -0,0 +1,52 @@ +--- +title: CREATE FILE FORMAT +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Create a named file format. + +## Syntax + +```sql +CREATE [ OR REPLACE ] FILE FORMAT [ IF NOT EXISTS ] FileFormatOptions +``` + +For details about `FileFormatOptions`, see [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md). + +## Use the file format + +Create once, then reuse the format for both querying and loading: + +```sql +-- 1) Create a reusable format +CREATE OR REPLACE FILE FORMAT my_custom_csv TYPE = CSV FIELD_DELIMITER = '\t'; + +-- 2) Query staged files (stage table function syntax uses =>) +SELECT * FROM @mystage/data.csv (FILE_FORMAT => 'my_custom_csv') LIMIT 10; + +-- 3) Load staged files with COPY INTO (copy options use =) +COPY INTO my_table +FROM @mystage/data.csv +FILE_FORMAT = (FORMAT_NAME = 'my_custom_csv'); +``` + +Why the different operators? Stage table functions take key/value parameters written with `=>`, while `COPY INTO` options use standard assignments with `=`. 
+ +**Quick workflow: create, query, and load with the same format** + +```sql +-- Create a reusable format +CREATE FILE FORMAT my_parquet TYPE = PARQUET; + +-- Query staged files with the format (stage table function syntax uses =>) +SELECT * FROM @sales_stage/2024/order.parquet (FILE_FORMAT => 'my_parquet') LIMIT 10; + +-- Load staged files with COPY INTO (copy options use =) +COPY INTO analytics.orders +FROM @sales_stage/2024/order.parquet +FILE_FORMAT = (FORMAT_NAME = 'my_parquet'); +``` diff --git a/tidb-cloud-lake/sql/create-function.md b/tidb-cloud-lake/sql/create-function.md new file mode 100644 index 0000000000000..e61e16af2e609 --- /dev/null +++ b/tidb-cloud-lake/sql/create-function.md @@ -0,0 +1,46 @@ +--- +title: CREATE FUNCTION +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates an external function that calls a remote handler over Flight (typically Python or other services). + +### Supported Languages + +- Determined by the remote server (commonly Python, but any language can be used as long as it implements the Flight endpoint) + +## Syntax + +```sql +CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] + AS ( ) RETURNS LANGUAGE + HANDLER = '' ADDRESS = '' + [DESC=''] +``` + +| Parameter | Description | +|-----------------------|---------------------------------------------------------------------------------------------------| +| `` | The name of the function. | +| `` | The lambda expression or code snippet defining the function's behavior. | +| `DESC=''` | Description of the UDF.| +| `<`| A list of input parameter names. Separated by comma.| +| `<`| A list of input parameter types. Separated by comma.| +| `` | The return type of the function. | +| `LANGUAGE` | Specifies the language used to write the function. Available values: `python`. | +| `HANDLER = ''` | Specifies the name of the function's handler. | +| `ADDRESS = ''` | Specifies the address of the UDF server. | + +## Examples + +This example creates an external function that calculates the greatest common divisor (GCD) of two integers: + +```sql +CREATE FUNCTION gcd AS (INT, INT) + RETURNS INT + LANGUAGE python + HANDLER = 'gcd' + ADDRESS = 'http://localhost:8815'; +``` diff --git a/tidb-cloud-lake/sql/create-inverted-index.md b/tidb-cloud-lake/sql/create-inverted-index.md new file mode 100644 index 0000000000000..b355a3b99dd2f --- /dev/null +++ b/tidb-cloud-lake/sql/create-inverted-index.md @@ -0,0 +1,82 @@ +--- +title: CREATE INVERTED INDEX +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new inverted index in Databend. + +## Syntax + +```sql +CREATE [ OR REPLACE ] INVERTED INDEX [IF NOT EXISTS] + ON [.]
( [, ...] ) + [ ] +``` + +| Parameter | Description | +|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------| +| `[ OR REPLACE ]` | Optional parameter indicating that if the index already exists, it will be replaced. | +| `[ IF NOT EXISTS ]` | Optional parameter indicating that the index will only be created if it does not already exist. | +| `` | The name of the inverted index to be created. | +| `[.]
` | The name of the database and table containing the columns for which the index will be created. | +| `` | The name of the column(s) to be included in the index. Multiple indexes can be created for the same table, but each column must be unique across indexes. | +| `` | Optional index options specifying how the inverted index is built. | + +### IndexOptions + +```sql +IndexOptions ::= + TOKENIZER = 'english' | 'chinese' + FILTERS = 'english_stop' | 'english_stemmer' | 'chinese_stop' + INDEX_RECORD = 'position' | 'basic' | 'freq' +``` + +- `TOKENIZER` specifies how text is segmented for indexing. It supports `english` (default) and `chinese` tokenizers. + +- `FILTERS` defines rules for term filtering: + + - Multiple filters can be specified, separated by commas, e.g., `FILTERS = 'english_stop,english_stemmer'`. + - A lower case filter is added by default to convert words to lowercase letters. + +| FILTERS | Description | +|-------------------|-------------------------------------------------------------------------------------------------------------------------| +| `english_stop` | Removes English stop words like "a", "an", "and" etc. | +| `english_stemmer` | Maps different forms of the same word to one common word. For example, "walking" and "walked" will be mapped to "walk". | +| `chinese_stop` | Removes Chinese stop words, currently only supports removal of Chinese punctuation marks. | + +- `INDEX_RECORD` determines what is to be stored for the index data: + +| INDEX_RECORD | Default? | Description | +|--------------|----------|-------------------------------------------------------------------------------------------------------------------------| +| `position` | Yes | Stores DocId, term frequency, and positions, occupies the most space, offers better scoring, and supports phrase terms. | +| `basic` | No | Stores only the DocId, occupies minimal space, but doesn't support phrase searches like "brown fox". | +| `freq` | No | Stores DocId and term frequency, occupies medium space, doesn't support phrase terms, but may provide better scores. | + +## Examples + +```sql +-- Create an inverted index for the 'comment_text' column in the table 'user_comments' +CREATE INVERTED INDEX user_comments_idx ON user_comments(comment_text); + +-- Create an inverted index with a Chinese tokenizer +-- If no tokenizer is specified, the default is English +-- Filters are `english_stop`, `english_stemmer` and `chinese_stop` +-- Index_record in `basic`. 
+CREATE INVERTED INDEX product_reviews_idx ON product_reviews(review_text) TOKENIZER = 'chinese' FILTERS = 'english_stop,english_stemmer,chinese_stop' INDEX_RECORD='basic'; + +-- Create an inverted index for the 'comment_title' and 'comment_body' columns in the table 'user_comments' +-- The output of SHOW CREATE TABLE includes information about the created inverted index +CREATE INVERTED INDEX customer_feedback_idx ON customer_feedback(comment_title, comment_body); + +SHOW CREATE TABLE customer_feedback; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Table │ Create Table │ +├───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ customer_feedback │ CREATE TABLE customer_feedback (\n comment_title VARCHAR NULL,\n comment_body VARCHAR NULL,\n SYNC INVERTED INDEX customer_feedback_idx (comment_title, comment_body)\n) ENGINE=FUSE │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/create-masking-policy.md b/tidb-cloud-lake/sql/create-masking-policy.md new file mode 100644 index 0000000000000..4ca5df805cb5e --- /dev/null +++ b/tidb-cloud-lake/sql/create-masking-policy.md @@ -0,0 +1,103 @@ +--- +title: CREATE MASKING POLICY +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Creates a new masking policy in Databend. + +## Syntax + +```sql +CREATE [ OR REPLACE ] MASKING POLICY [ IF NOT EXISTS ] AS + ( [ , ... ] ) + RETURNS -> + [ COMMENT = '' ] +``` + +| Parameter | Description | +|------------------------|-------------| +| `policy_name` | Name of the masking policy to be created. | +| `arg_name_to_mask` | Parameter that represents the column being masked. This argument must appear first and automatically binds to the column referenced in `SET MASKING POLICY`. | +| `arg_type_to_mask` | Data type of the masked column. It must match the data type of the column where the policy is applied. | +| `arg_1 ... arg_n` | Optional extra parameters for additional columns that the policy logic depends on. Provide these columns through the `USING` clause when you attach the policy. | +| `arg_type_1 ... arg_type_n` | Data types for each optional parameter. They must match the columns listed in the `USING` clause. | +| `expression_on_arg_name` | Expression that determines how the input columns should be treated to generate the masked data. | +| `comment` | Optional comment that stores notes about the masking policy. | + +:::note +Ensure that *arg_type_to_mask* matches the data type of the column where the masking policy will be applied. When your policy defines multiple parameters, list each referenced column in the same order within the `USING` clause of `ALTER TABLE ... SET MASKING POLICY`. +::: + +## Access Control Requirements + +| Privilege | Description | +|:----------|:------------| +| CREATE MASKING POLICY | Required to create or replace a masking policy. Typically granted on `*.*`. 
| + +Databend automatically grants OWNERSHIP on the new masking policy to the current role so that it can manage the policy with others. + +## Examples + +This example illustrates the process of setting up a masking policy to selectively reveal or mask sensitive data based on user roles. + +```sql +-- Create a table and insert sample data +CREATE TABLE user_info ( + user_id INT, + phone VARCHAR, + email VARCHAR +); + +INSERT INTO user_info (user_id, phone, email) VALUES (1, '91234567', 'sue@example.com'); +INSERT INTO user_info (user_id, phone, email) VALUES (2, '81234567', 'eric@example.com'); + +-- Create a role +CREATE ROLE 'MANAGERS'; +GRANT ALL ON *.* TO ROLE 'MANAGERS'; + +-- Create a user and grant the role to the user +CREATE USER manager_user IDENTIFIED BY 'databend'; +GRANT ROLE 'MANAGERS' TO 'manager_user'; + +-- Create a masking policy that expects an extra column +CREATE MASKING POLICY contact_mask +AS + (contact_val nullable(string), phone_ref nullable(string)) + RETURNS nullable(string) -> + CASE + WHEN current_role() IN ('MANAGERS') THEN + contact_val + WHEN phone_ref LIKE '91%' + THEN + contact_val + ELSE + '*********' + END + COMMENT = 'mask contact data with phone check'; + +-- Associate the masking policy with the 'email' column +ALTER TABLE user_info +MODIFY COLUMN email SET MASKING POLICY contact_mask USING (email, phone); + +-- Associate the masking policy with the 'phone' column +ALTER TABLE user_info +MODIFY COLUMN phone SET MASKING POLICY contact_mask USING (phone, phone); + +-- Query with the Root user +SELECT user_id, phone, email FROM user_info ORDER BY user_id; + + user_id │ phone │ email │ + Nullable(Int32) │ Nullable(String) │ Nullable(String) │ +─────────────────┼──────────────────┼──────────────────┤ + 1 │ 91234567 │ sue@example.com │ + 2 │ ********* │ ********* │ + +``` diff --git a/tidb-cloud-lake/sql/create-network-policy.md b/tidb-cloud-lake/sql/create-network-policy.md new file mode 100644 index 0000000000000..19fadaab740a5 --- /dev/null +++ b/tidb-cloud-lake/sql/create-network-policy.md @@ -0,0 +1,50 @@ +--- +title: CREATE NETWORK POLICY +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new network policy in Databend. + +## Syntax + +```sql +CREATE [ OR REPLACE ] NETWORK POLICY [ IF NOT EXISTS ] + ALLOWED_IP_LIST = ( 'allowed_ip1', 'allowed_ip2', ... ) + [ BLOCKED_IP_LIST = ( 'blocked_ip1', 'blocked_ip2', ...) ] + [ COMMENT = 'comment' ] +``` + +| Parameter | Description | +|----------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| policy_name | Specifies the name of the network policy to be created. | +| ALLOWED_IP_LIST | Specifies a comma-separated list of allowed IP address ranges for the policy. Users associated with this policy can access the network using the specified IP ranges. | +| BLOCKED_IP_LIST | Specifies a comma-separated list of blocked IP address ranges for the policy. Users associated with this policy can still access the network from ALLOWED_IP_LIST, except for the IPs specified in BLOCKED_IP_LIST, which will be restricted from access. | +| COMMENT | An optional parameter used to add a description or comment for the network policy. 
| + +## Examples + +This example demonstrates creating a network policy with specified allowed and blocked IP addresses, and then associating this policy with a user to control network access. The network policy allows all IP addresses ranging from 192.168.1.0 to 192.168.1.255, except for the specific IP address 192.168.1.99. + +```sql +-- Create a network policy +CREATE NETWORK POLICY sample_policy + ALLOWED_IP_LIST=('192.168.1.0/24') + BLOCKED_IP_LIST=('192.168.1.99') + COMMENT='Sample'; + +SHOW NETWORK POLICIES; + +Name |Allowed Ip List |Blocked Ip List|Comment | +-------------+-------------------------+---------------+-----------+ +sample_policy|192.168.1.0/24 |192.168.1.99 |Sample | + +-- Create a user +CREATE USER sample_user IDENTIFIED BY 'databend'; + +-- Associate the network policy with the user +ALTER USER sample_user WITH SET NETWORK POLICY='sample_policy'; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-ngram-index.md b/tidb-cloud-lake/sql/create-ngram-index.md new file mode 100644 index 0000000000000..504cbe1c2f14d --- /dev/null +++ b/tidb-cloud-lake/sql/create-ngram-index.md @@ -0,0 +1,117 @@ +--- +title: CREATE NGRAM INDEX +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates an Ngram index on a column for a table. + +## Syntax + +```sql +-- Create an Ngram index on an existing table +CREATE [OR REPLACE] NGRAM INDEX [IF NOT EXISTS] +ON [.]() +[gram_size = ] [bloom_size = ] + +-- Create an Ngram index when creating a table +CREATE [OR REPLACE] TABLE ( + , + NGRAM INDEX () + [gram_size = ] [bloom_size = ] +)... +``` + +- `gram_size` (defaults to 3) specifies the length of each character-based substring (n-gram) when the column text is indexed. For example, with `gram_size = 3`, the text "hello world" would be split into overlapping substrings like: + + ```text + "hel", "ell", "llo", "lo ", "o w", " wo", "wor", "orl", "rld" + ``` + +- `bloom_size` specifies the size in bytes of the Bloom filter bitmap used to accelerate string matching within each block of data. It controls the trade-off between index accuracy and memory usage: + + - A larger `bloom_size` reduces false positives in string lookups, improving query precision at the cost of more memory. + - A smaller `bloom_size` saves memory but may increase false positives. + - If not explicitly set, the default is 1,048,576 bytes (1m) per indexed column per block. The valid range is from 512 bytes to 10,485,760 bytes (10m). 
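+
+As a sketch, an index that sets both options explicitly might look like this (the table, column, and option values below are illustrative, not recommended defaults):
+
+```sql
+-- Hypothetical log table: index 4-character grams and use a smaller per-block Bloom filter
+CREATE NGRAM INDEX idx_logs_message
+ON logs(message)
+gram_size = 4 bloom_size = 524288;
+```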
+ +## Examples + +### Creating a Table with NGRAM Index + +```sql +CREATE TABLE articles ( + id INT, + title VARCHAR, + content STRING, + NGRAM INDEX idx_content (content) +); +``` + +### Creating an NGRAM Index on an Existing Table + +```sql +CREATE TABLE products ( + id INT, + name VARCHAR, + description STRING +); + +CREATE NGRAM INDEX idx_description +ON products(description); +``` + +### Viewing Indexes + +```sql +SHOW INDEXES; +``` + +Result: +``` +┌─────────────────┬───────┬──────────┬─────────────────────────┬──────────────────────────┐ +│ name │ type │ original │ definition │ created_on │ +├─────────────────┼───────┼──────────┼─────────────────────────┼──────────────────────────┤ +│ idx_content │ NGRAM │ │ articles(content) │ 2025-05-13 01:22:34.123 │ +│ idx_description │ NGRAM │ │ products(description) │ 2025-05-13 01:23:45.678 │ +└─────────────────┴───────┴──────────┴─────────────────────────┴──────────────────────────┘ +``` + +### Using NGRAM Index + +```sql +-- Create a table with NGRAM index +CREATE TABLE phrases ( + id INT, + text STRING, + NGRAM INDEX idx_text (text) +); + +-- Insert sample data +INSERT INTO phrases VALUES +(1, 'apple banana cherry'), +(2, 'banana date fig'), +(3, 'cherry elderberry fig'), +(4, 'date grape kiwi'); + +-- Query using fuzzy matching with the NGRAM index +SELECT * FROM phrases WHERE text LIKE '%banana%'; +``` + +Result: +``` +┌────┬─────────────────────┐ +│ id │ text │ +├────┼─────────────────────┤ +│ 1 │ apple banana cherry │ +│ 2 │ banana date fig │ +└────┴─────────────────────┘ +``` + +### Dropping an NGRAM Index + +```sql +DROP NGRAM INDEX idx_text ON phrases; +``` diff --git a/tidb-cloud-lake/sql/create-notification-integration.md b/tidb-cloud-lake/sql/create-notification-integration.md new file mode 100644 index 0000000000000..e07959a5c17e4 --- /dev/null +++ b/tidb-cloud-lake/sql/create-notification-integration.md @@ -0,0 +1,45 @@ +--- +title: CREATE NOTIFICATION INTEGRATION +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a named notification integration that can be used to send notifications to external messaging services. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax +### Webhook Notification + +```sql +CREATE NOTIFICATION INTEGRATION [ IF NOT EXISTS ] +TYPE = +ENABLED = +[ WEBHOOK = ( url = , method = , authorization_header = ) ] +[ COMMENT = '' ] +``` + +| Required Parameters | Description | +|---------------------|-------------| +| name | The name of the notification integration. This is a mandatory field. | +| type | The type of the notification integration. Currently, only `webhook` is supported. | +| enabled | Whether the notification integration is enabled. | + +| Optional Parameters [(Webhook)](#webhook-notification) | Description | +|---------------------|-------------| +| url | The URL of the webhook. | +| method | The HTTP method to use when sending the webhook. default is `GET`| +| authorization_header| The authorization header to use when sending the webhook. 
| + +## Examples + +### Webhook Notification + +```sql +CREATE NOTIFICATION INTEGRATION IF NOT EXISTS SampleNotification type = webhook enabled = true webhook = (url = 'https://example.com', method = 'GET', authorization_header = 'bearer auth') +``` + +This example creates a notification integration named `SampleNotification` of type `webhook` that is enabled and sends notifications to the `https://example.com` URL using the `GET` method and the `bearer auth` authorization header. + diff --git a/tidb-cloud-lake/sql/create-password-policy.md b/tidb-cloud-lake/sql/create-password-policy.md new file mode 100644 index 0000000000000..27a03de6c69ca --- /dev/null +++ b/tidb-cloud-lake/sql/create-password-policy.md @@ -0,0 +1,54 @@ +--- +title: CREATE PASSWORD POLICY +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new password policy in Databend. + +## Syntax + +```sql +CREATE [ OR REPLACE ] PASSWORD POLICY [ IF NOT EXISTS ] + [ PASSWORD_MIN_LENGTH = ] + [ PASSWORD_MAX_LENGTH = ] + [ PASSWORD_MIN_UPPER_CASE_CHARS = ] + [ PASSWORD_MIN_LOWER_CASE_CHARS = ] + [ PASSWORD_MIN_NUMERIC_CHARS = ] + [ PASSWORD_MIN_SPECIAL_CHARS = ] + [ PASSWORD_MIN_AGE_DAYS = ] + [ PASSWORD_MAX_AGE_DAYS = ] + [ PASSWORD_MAX_RETRIES = ] + [ PASSWORD_LOCKOUT_TIME_MINS = ] + [ PASSWORD_HISTORY = ] + [ COMMENT = '' ] +``` + +### Password Policy Attributes + +This table summarizes essential parameters for a password policy, covering aspects like length, character requirements, age restrictions, retry limits, lockout duration, and password history: + +| Attribute | Min | Max | Default | Description | +|-------------------------------|-----|-----|---------|--------------------------------------------------------------------------------------| +| PASSWORD_MIN_LENGTH | 8 | 256 | 8 | Minimum length of the password | +| PASSWORD_MAX_LENGTH | 8 | 256 | 256 | Maximum length of the password | +| PASSWORD_MIN_UPPER_CASE_CHARS | 0 | 256 | 1 | Minimum number of uppercase characters in the password | +| PASSWORD_MIN_LOWER_CASE_CHARS | 0 | 256 | 1 | Minimum number of lowercase characters in the password | +| PASSWORD_MIN_NUMERIC_CHARS | 0 | 256 | 1 | Minimum number of numeric characters in the password | +| PASSWORD_MIN_SPECIAL_CHARS | 0 | 256 | 0 | Minimum number of special characters in the password | +| PASSWORD_MIN_AGE_DAYS | 0 | 999 | 0 | Minimum number of days before password can be modified (0 indicates no restriction) | +| PASSWORD_MAX_AGE_DAYS | 0 | 999 | 90 | Maximum number of days before password must be modified (0 indicates no restriction) | +| PASSWORD_MAX_RETRIES | 1 | 10 | 5 | Maximum number of password retries before lockout | +| PASSWORD_LOCKOUT_TIME_MINS | 1 | 999 | 15 | Duration of lockout in minutes after exceeding retries | +| PASSWORD_HISTORY | 0 | 24 | 0 | Number of recent passwords to check for duplication (0 indicates no restriction) | + +## Examples + +This example creates a password policy named 'SecureLogin' with a minimum password length requirement set to 10 characters: + +```sql +CREATE PASSWORD POLICY SecureLogin + PASSWORD_MIN_LENGTH = 10; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-procedure.md b/tidb-cloud-lake/sql/create-procedure.md new file mode 100644 index 0000000000000..1ee24507de678 --- /dev/null +++ b/tidb-cloud-lake/sql/create-procedure.md @@ -0,0 +1,98 @@ +--- +title: CREATE PROCEDURE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Defines a stored 
procedure that executes SQL operations and returns a result. + +## Syntax + +```sql +CREATE PROCEDURE ( , ...) +RETURNS [NOT NULL] +LANGUAGE +[ COMMENT '' ] +AS $$ +BEGIN + + RETURN ; -- Use to return a single value + -- OR + RETURN TABLE(); -- Use to return a table +END; +$$; +``` + +| Parameter | Description | +|-----------------------------------------|---------------------------------------------------------------------------------------------------------------------------| +| `` | Name of the procedure. | +| ` ` | Input parameters (optional), each with a specified data type. Multiple parameters can be defined and separated by commas. | +| `RETURNS [NOT NULL]` | Specifies the data type of the return value. `NOT NULL` ensures the returned value cannot be NULL. | +| `LANGUAGE` | Specifies the language in which the procedure body is written. Currently, only `SQL` is supported. | +| `COMMENT` | Optional text describing the procedure. | +| `AS ...` | Encloses the procedure body, which contains SQL statements, variable declarations, loops, and a RETURN statement. | + +## Access control requirements + +| Privilege | Object Type | Description | +|:-----------------|:------------|:---------------------| +| CREATE PROCEDURE | Global | Creates a procedure. | + + +To create a procedure, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the CREATE PROCEDURE [privilege](/tidb-cloud-lake/guides/privileges.md). + + +## Examples + +This example defines a stored procedure that converts weight from kilograms (kg) to pounds (lb): + +```sql +CREATE PROCEDURE convert_kg_to_lb(kg DECIMAL(4, 2)) +RETURNS DECIMAL(10, 2) +LANGUAGE SQL +COMMENT = 'Converts kilograms to pounds' +AS $$ +BEGIN + RETURN kg * 2.20462; +END; +$$; +``` + +You can also define a stored procedure that works with loops, conditions, and dynamic variables. + +```sql + +CREATE OR REPLACE PROCEDURE loop_test() +RETURNS INT +LANGUAGE SQL +COMMENT = 'loop test' +AS $$ +BEGIN + LET x RESULTSET := select number n from numbers(10); + LET sum := 0; + FOR x IN x DO + FOR batch in 0 TO x.n DO + IF batch % 2 = 0 THEN + sum := sum + batch; + ELSE + sum := sum - batch; + END IF; + END FOR; + END FOR; + RETURN sum; +END; +$$; + +-- Grant ACCESS PROCEDURE Privilege TO role test +GRANT ACCESS PROCEDURE ON PROCEDURE loop_test() to role test; + +``` + +```sql +CALL PROCEDURE loop_test(); + +┌─Result─┐ +│ -5 │ +└────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-role.md b/tidb-cloud-lake/sql/create-role.md new file mode 100644 index 0000000000000..bfd0a543c6716 --- /dev/null +++ b/tidb-cloud-lake/sql/create-role.md @@ -0,0 +1,85 @@ +--- +title: CREATE ROLE +sidebar_position: 5 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new role for access control. Roles are used to group privileges and can be assigned to users or other roles, providing a flexible way to manage permissions in Databend. 
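+
+Because roles can be granted to other roles, simple hierarchies are possible. A sketch (the role names are illustrative):
+
+```sql
+-- The privileges of 'reporting_base' become available to anyone holding 'reporting_admin'
+CREATE ROLE reporting_base;
+CREATE ROLE reporting_admin;
+GRANT ROLE reporting_base TO ROLE reporting_admin;
+```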
+ +## Syntax + +```sql +CREATE ROLE [ IF NOT EXISTS ] +``` + +**Parameters:** + +- `IF NOT EXISTS`: Create the role only if it doesn't exist (recommended to avoid errors) +- ``: Role name (cannot contain single quotes, double quotes, backspace, or form feed characters) + +## Examples + +```sql +-- Create a basic role +CREATE ROLE analyst; + +-- Create role only if it doesn't exist (recommended) +CREATE ROLE IF NOT EXISTS data_viewer; +``` + +## Common Usage Patterns + +### Read-Only Analyst Role + +Create a role for data analysts who need read access to sales data: + +```sql +-- Create the analyst role +CREATE ROLE sales_analyst; + +-- Grant read permissions +GRANT SELECT ON sales_db.* TO ROLE sales_analyst; + +-- Assign to users +GRANT ROLE sales_analyst TO 'alice'; +GRANT ROLE sales_analyst TO 'bob'; +``` + +### Database Administrator Role + +Create a role for administrators who need full control: + +```sql +-- Create the admin role +CREATE ROLE sales_admin; + +-- Grant full permissions on the database +GRANT ALL ON sales_db.* TO ROLE sales_admin; + +-- Grant user management permissions +GRANT CREATE USER, CREATE ROLE ON *.* TO ROLE sales_admin; + +-- Assign to admin users +GRANT ROLE sales_admin TO 'admin_user'; +``` + +### Verification + +```sql +-- Check what each role can do +SHOW GRANTS FOR ROLE sales_analyst; +SHOW GRANTS FOR ROLE sales_admin; + +-- Check user permissions +SHOW GRANTS FOR 'alice'; +SHOW GRANTS FOR 'admin_user'; +``` + + +## See Also + +- [GRANT](/tidb-cloud-lake/sql/grant.md) - Grant privileges and roles +- [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md) - View granted privileges +- [DROP ROLE](/tidb-cloud-lake/sql/drop-role.md) - Drop roles diff --git a/tidb-cloud-lake/sql/create-scalar-function.md b/tidb-cloud-lake/sql/create-scalar-function.md new file mode 100644 index 0000000000000..5ec38aabe59ab --- /dev/null +++ b/tidb-cloud-lake/sql/create-scalar-function.md @@ -0,0 +1,189 @@ +--- +title: CREATE SCALAR FUNCTION +sidebar_position: 0 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a scalar user-defined function (Scalar UDF). The same `CREATE FUNCTION` statement supports two implementation styles: + +- **SQL expression**: Logic expressed purely in SQL; no external runtime is required. +- **Python / JavaScript**: Write code and specify the entry point with `HANDLER`. + +If you need to call external systems (HTTP/services), see External Function commands. + +## Syntax + +### SQL (expression) + +```sql +CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] + ( [] ) + RETURNS + AS $$ $$ + [ DESC='' ] +``` + +### Python / JavaScript + +```sql +CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] + ( [] ) + RETURNS + LANGUAGE + [IMPORTS = ('', ...)] + [PACKAGES = ('', ...)] + HANDLER = '' + AS $$ $$ + [ DESC='' ] +``` + +## Parameters + +- ``: Optional comma-separated list of parameters with their types (e.g., `x INT, y FLOAT`) +- ``: The data type of the function's return value +- ``: `python`, `javascript` +- ``: Stage files to import (e.g., `@s_udf/your_file.zip`) +- ``: Packages to install from PyPI (Python only; e.g. 
`numpy`) +- ``: Name of the function in the code to call +- ``: Implementation code in the specified language + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------|:--------------|:---------------| +| SUPER | Global, Table | Operates a UDF | + +To create a user-defined function, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the SUPER [privilege](/tidb-cloud-lake/guides/privileges.md). + +## SQL + +```sql +-- Create a function to calculate area of a circle +CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT) +RETURNS FLOAT +AS $$ + pi() * radius * radius +$$; + +-- Create a function to calculate age in years +CREATE OR REPLACE FUNCTION calculate_age(birth_date DATE) +RETURNS INT +AS $$ + date_diff('year', birth_date, now()) +$$; + +-- Create a function with multiple parameters +CREATE OR REPLACE FUNCTION calculate_bmi(weight_kg FLOAT, height_m FLOAT) +RETURNS FLOAT +AS $$ + weight_kg / (height_m * height_m) +$$; + +-- Use the functions +SELECT area_of_circle(5.0) AS circle_area; +SELECT calculate_age(to_date('1990-05-15')) AS age; +SELECT calculate_bmi(70.0, 1.75) AS bmi; +``` + +## Python + +Python runtime requires Databend Enterprise. You can install PyPI packages via `PACKAGES` and import stage files via `IMPORTS`. + +### Data type mappings (Python) + +| Databend Type | Python Type | +|--------------|-------------| +| NULL | None | +| BOOLEAN | bool | +| INT | int | +| FLOAT/DOUBLE | float | +| DECIMAL | decimal.Decimal | +| VARCHAR | str | +| BINARY | bytes | +| LIST | list | +| MAP | dict | +| STRUCT | object | +| JSON | dict/list | + +### Examples + +```sql +CREATE OR REPLACE FUNCTION calculate_age_py(VARCHAR) +RETURNS INT +LANGUAGE python +HANDLER = 'calculate_age' +AS $$ +from datetime import datetime + +def calculate_age(birth_date_str): + birth_date = datetime.strptime(birth_date_str, '%Y-%m-%d') + today = datetime.now() + age = today.year - birth_date.year + if (today.month, today.day) < (birth_date.month, birth_date.day): + age -= 1 + return age +$$; + +SELECT calculate_age_py('1990-05-15') AS age; +``` + +```sql +CREATE OR REPLACE FUNCTION numpy_sqrt(FLOAT) +RETURNS FLOAT +LANGUAGE python +PACKAGES = ('numpy') +HANDLER = 'numpy_sqrt' +AS $$ +import numpy as np + +def numpy_sqrt(x): + return float(np.sqrt(x)) +$$; + +SELECT numpy_sqrt(9.0) AS sqrt_val; +``` + +## JavaScript + +### Data type mappings (JavaScript) + +| Databend Type | JavaScript Type | +|--------------|----------------| +| NULL | null | +| BOOLEAN | Boolean | +| INT | Number | +| FLOAT/DOUBLE | Number | +| DECIMAL | BigDecimal | +| VARCHAR | String | +| BINARY | Uint8Array | +| DATE/TIMESTAMP | Date | +| ARRAY | Array | +| MAP | Object | +| STRUCT | Object | +| JSON | Object/Array | + +### Example + +```sql +CREATE OR REPLACE FUNCTION calculate_age_js(VARCHAR) +RETURNS INT +LANGUAGE javascript +HANDLER = 'calculateAge' +AS $$ +export function calculateAge(birthDateStr) { + const birthDate = new Date(birthDateStr); + const today = new Date(); + + let age = today.getFullYear() - birthDate.getFullYear(); + const monthDiff = today.getMonth() - birthDate.getMonth(); + + if (monthDiff < 0 || (monthDiff === 0 && today.getDate() < birthDate.getDate())) { + age--; + } + + return age; +} +$$; +``` diff --git a/tidb-cloud-lake/sql/create-sequence.md b/tidb-cloud-lake/sql/create-sequence.md new file mode 100644 index 0000000000000..b6026a62e88d3 --- /dev/null +++ b/tidb-cloud-lake/sql/create-sequence.md @@ -0,0 +1,104 @@ +--- 
+title: CREATE SEQUENCE +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new sequence in Databend. + +A sequence is an object that automatically generates unique numeric identifiers, commonly used for assigning distinct values to table rows (e.g., user IDs). While sequences guarantee unique values, they **do not** ensure contiguity (i.e., gaps may occur). + +## Syntax + +```sql +CREATE [ OR REPLACE ] SEQUENCE [ IF NOT EXISTS ] + [ START [ = ] ] + [ INCREMENT [ = ] ] +``` + +| Parameter | Description | Default | +|---------------------|-------------------------------------------------------|---------| +| `` | The name of the sequence to be created. | - | +| `START` | The initial value of the sequence. | 1 | +| `INCREMENT` | The increment value for each call to NEXTVAL. | 1 | + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------------|:------------|:----------------------| +| CREATE SEQUENCE | Global | Creates a sequence. | + + +To create a sequence, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the CREATE SEQUENCE [privilege](/tidb-cloud-lake/guides/privileges.md). + +:::note + +The enable_experimental_sequence_rbac_check settings governs sequence-level access control. It is disabled by default. +sequence creation solely requires the user to possess superuser privileges, bypassing detailed RBAC checks. +When enabled, granular permission verification is enforced during sequence establishment. + +This is an experimental feature and may be enabled by default in the future. + +::: + +## Examples + +### Basic Sequence + +Create a sequence with default settings (starts at 1, increments by 1): + +```sql +CREATE SEQUENCE staff_id_seq; + +CREATE TABLE staff ( + staff_id INT, + name VARCHAR(50), + department VARCHAR(50) +); + +INSERT INTO staff (staff_id, name, department) +VALUES (NEXTVAL(staff_id_seq), 'John Doe', 'HR'); + +INSERT INTO staff (staff_id, name, department) +VALUES (NEXTVAL(staff_id_seq), 'Jane Smith', 'Finance'); + +SELECT * FROM staff; + +┌───────────────────────────────────────────────────────┐ +│ staff_id │ name │ department │ +├─────────────────┼──────────────────┼──────────────────┤ +│ 2 │ Jane Smith │ Finance │ +│ 1 │ John Doe │ HR │ +└───────────────────────────────────────────────────────┘ +``` + +### Custom Start and Increment + +Create a sequence starting at 1000 with increment of 10: + +```sql +CREATE SEQUENCE order_id_seq START = 1000 INCREMENT = 10; + +CREATE TABLE orders ( + order_id BIGINT, + order_name VARCHAR(100) +); + +INSERT INTO orders (order_id, order_name) +VALUES (NEXTVAL(order_id_seq), 'Order A'); + +INSERT INTO orders (order_id, order_name) +VALUES (NEXTVAL(order_id_seq), 'Order B'); + +SELECT * FROM orders; + +┌──────────────────────────────────┐ +│ order_id │ order_name │ +├────────────────┼─────────────────┤ +│ 1000 │ Order A │ +│ 1010 │ Order B │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-stage.md b/tidb-cloud-lake/sql/create-stage.md new file mode 100644 index 0000000000000..c1b9f1b0cf908 --- /dev/null +++ b/tidb-cloud-lake/sql/create-stage.md @@ -0,0 +1,227 @@ +--- +title: CREATE STAGE +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates an internal or external stage. 
+ +## Syntax + +```sql +-- Internal stage +CREATE [ OR REPLACE ] STAGE [ IF NOT EXISTS ] + [ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET | ORC } [ formatTypeOptions ] + ) ] + [ COPY_OPTIONS = ( copyOptions ) ] + [ COMMENT = '' ] + +-- External stage +CREATE STAGE [ IF NOT EXISTS ] + externalStageParams + [ FILE_FORMAT = ( + FORMAT_NAME = '' + | TYPE = { CSV | TSV | NDJSON | PARQUET | ORC } [ formatTypeOptions ] + ) ] + [ COPY_OPTIONS = ( copyOptions ) ] + [ COMMENT = '' ] +``` + +### externalStageParams + +:::tip +For external stages, it is recommended to use the `CONNECTION` parameter to reference pre-configured connection objects instead of inline credentials. This approach provides better security and maintainability. +::: + +```sql +externalStageParams ::= + '://' + CONNECTION = ( + + ) +| + CONNECTION = ( + CONNECTION_NAME = '' + ); +``` + +For the connection parameters available for different storage services, see [Connection Parameters](/tidb-cloud-lake/sql/connection-parameters.md). + +For more information on `CONNECTION_NAME`, see [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md). + +### FILE_FORMAT + +See [Input & Output File Formats](/tidb-cloud-lake/sql/input-output-file-formats.md) for details. + +### copyOptions + +```sql +copyOptions ::= + [ SIZE_LIMIT = ] + [ PURGE = ] +``` + +| Parameters | Description | Required | +|----------------------|-------------------------------------------------------------------------------------------------------------------------------|----------| +| `SIZE_LIMIT = ` | Number (> 0) that specifies the maximum rows of data to be loaded for a given COPY statement. Default `0` | Optional | +| `PURGE = ` | True specifies that the command will purge the files in the stage if they are loaded successfully into table. Default `false` | Optional | + + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------|:--------------|:--------------------------------------------------------------------------| +| SUPER | Global, Table | Operates a stage(Lists stages. Creates, Drops a stage), catalog or share. | + +To create a stage, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the SUPER [privilege](/tidb-cloud-lake/guides/privileges.md). 
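+
+If the current role lacks that privilege, an administrator can grant it first (a sketch, with an illustrative role name):
+
+```sql
+-- Grant the global SUPER privilege to a role that needs to manage stages
+GRANT SUPER ON *.* TO ROLE data_engineer;
+```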
+ +## Examples + +### Example 1: Create Internal Stage + +This example creates an internal stage named *my_internal_stage*: + +```sql +CREATE STAGE my_internal_stage; + +DESC STAGE my_internal_stage; + +name |stage_type|stage_params |copy_options |file_format_options |number_of_files|creator |comment| +-----------------+----------+--------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+---------------+------------------+-------+ +my_internal_stage|Internal |StageParams { storage: Fs(StorageFsConfig { root: "_data" }) }|CopyOptions { on_error: AbortNum(1), size_limit: 0, max_files: 0, split_size: 0, purge: false, single: false, max_file_size: 0, disable_variant_check: false }|Parquet(ParquetFileFormatParams)| 0|'root'@'127.0.0.1'| | + +``` + +### Example 2: Create External Stage with Connection + +This example creates an external stage named *my_s3_stage* on Amazon S3 using a connection: + +```sql +-- First create a connection +CREATE CONNECTION my_s3_connection + STORAGE_TYPE = 's3' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Create stage using the connection +CREATE STAGE my_s3_stage + URL='s3://load/files/' + CONNECTION = (CONNECTION_NAME = 'my_s3_connection'); + +DESC STAGE my_s3_stage; ++-------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +| name | stage_type | stage_params | copy_options | file_format_options | comment | ++-------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +| my_s3_stage | External | StageParams { storage: S3(StageS3Storage { bucket: "load", path: "/files/", credentials_aws_key_id: "", credentials_aws_secret_key: "", encryption_master_key: "" }) } | CopyOptions { on_error: None, size_limit: 0 } | FileFormatOptions { format: Csv, skip_header: 0, field_delimiter: ",", record_delimiter: "\n", compression: None } | | ++-------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +``` + +### Example 3: Create External Stage with AWS IAM User + +This example creates an external stage named *iam_external_stage* on Amazon S3 with an AWS Identity and Access Management (IAM) user. + +#### Step 1: Create Access Policy for S3 Bucket + +The procedure below creates an access policy named *databend-access* for the bucket *databend-toronto* on Amazon S3: + +1. Log into the AWS Management Console, then select **Services** > **Security, Identity, & Compliance** > **IAM**. +2. 
Select **Account settings** in the left navigation pane, and go to the **Security Token Service (STS)** section on the right page. Make sure the status of AWS region where your account belongs is **Active**. +3. Select **Policies** in the left navigation pane, then select **Create policy** on the right page. +4. Click the **JSON** tab, copy and paste the following code to the editor, then save the policy as *databend_access*. + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "AllObjectActions", + "Effect": "Allow", + "Action": [ + "s3:*Object" + ], + "Resource": "arn:aws:s3:::databend-toronto/*" + }, + { + "Sid": "ListObjectsInBucket", + "Effect": "Allow", + "Action": [ + "s3:ListBucket" + ], + "Resource": "arn:aws:s3:::databend-toronto" + } + ] +} +``` + +#### Step 2: Create IAM User + +The procedure below creates an IAM user named *databend* and attach the access policy *databend-access* to the user. + +1. Select **Users** in the left navigation pane, then select **Add users** on the right page. +2. Configure the user: + - Set the user name to *databend*. + - When setting permissions for the user, click **Attach policies directly**, then search for and select the access policy *databend-access*. +3. After the user is created, click the user name to open the details page and select the **Security credentials** tab. +4. In the **Access keys** section, click **Create access key**. +5. Select **Third-party service** for the use case, and tick the checkbox below to confirm creation of the access key. +6. Copy and save the generated access key and secret access key to a safe place. + +#### Step 3: Create External Stage + +Use the IAM role to create an external stage with better security. + +```sql +-- First create a connection using IAM role +CREATE CONNECTION iam_s3_connection + STORAGE_TYPE = 's3' + ROLE_ARN = 'arn:aws:iam::123456789012:role/databend-access' + EXTERNAL_ID = 'my-external-id-123'; + +-- Create stage using the connection +CREATE STAGE iam_external_stage + URL = 's3://databend-toronto' + CONNECTION = (CONNECTION_NAME = 'iam_s3_connection'); +``` + +### Example 4: Create External Stage on Cloudflare R2 + +[Cloudflare R2](https://www.cloudflare.com/en-ca/products/r2/) is an object storage service introduced by Cloudflare that is fully compatible with Amazon's AWS S3 service. This example creates an external stage named *r2_stage* on Cloudflare R2. + +#### Step 1: Create Bucket + +The procedure below creates a bucket named *databend* on Cloudflare R2. + +1. Log into the Cloudflare dashboard, and select **R2** in the left navigation pane. +2. Click **Create bucket** to create a bucket, and set the bucket name to *databend*. Once the bucket is successfully created, you can find the bucket endpoint right below the bucket name when you view the bucket details page. + +#### Step 2: Create R2 API Token + +The procedure below creates an R2 API token that includes an Access Key ID and a Secret Access Key. + +1. Click **Manage R2 API Tokens** on **R2** > **Overview**. +2. Click **Create API token** to create an API token. +3. When configuring the API token, select the necessary permission and set the **TTL** as needed. +4. Click **Create API Token** to obtain the Access Key ID and Secret Access Key. Copy and save them to a safe place. + +#### Step 3: Create External Stage + +Use the created Access Key ID and Secret Access Key to create an external stage named *r2_stage*. 
+ +```sql +-- First create a connection +CREATE CONNECTION r2_connection + STORAGE_TYPE = 's3' + REGION = 'auto' + ENDPOINT_URL = '' + ACCESS_KEY_ID = '' + SECRET_ACCESS_KEY = ''; + +-- Create stage using the connection +CREATE STAGE r2_stage + URL='s3://databend/' + CONNECTION = (CONNECTION_NAME = 'r2_connection'); +``` diff --git a/tidb-cloud-lake/sql/create-stream.md b/tidb-cloud-lake/sql/create-stream.md new file mode 100644 index 0000000000000..2d54214bf87bf --- /dev/null +++ b/tidb-cloud-lake/sql/create-stream.md @@ -0,0 +1,117 @@ +--- +title: CREATE STREAM +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Creates a stream. + +## Syntax + +```sql +CREATE [ OR REPLACE ] STREAM [ IF NOT EXISTS ] [ . ] + ON TABLE [ . ] + [ AT ( { TIMESTAMP => | SNAPSHOT => '' | STREAM => } ) ] + [ APPEND_ONLY = true | false ] + [ COMMENT = '' ] +``` + +| Parameter | Description | +|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `< database_name >` | A stream is treated as an object belonging to a specific database, similar to a table or a view. CREATE STREAM allows for different databases between the stream and the associated table. If a database is not explicitly specified, the current database is applied as the database for the stream you create. | +| AT | When using `AT` followed by `TIMESTAMP =>`or `SNAPSHOT =>` , you can create a stream containing data changes after a specific historical point by the timestamp or snapshot ID; When `AT` is followed by `STREAM =>` , it allows for the creation of a new stream identical to an existing one, preserving the same captured data changes. | +| APPEND_ONLY | When set to `true`, the stream operates in `Append-Only` mode; when set to `false`, it operates in `Standard` mode. Defaults to `true`. For additional details on stream operation modes, see [How Stream Works](/tidb-cloud-lake/sql/stream.md#how-stream-works). 
| + +## Examples + +This example demonstrates creating a stream named 'order_changes' to monitor changes within the 'orders' table: + +```sql +-- Create a table named 'orders' +CREATE TABLE orders ( + order_id INT, + product_name VARCHAR, + quantity INT, + order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +-- Create a stream named 'order_changes' for the table 'orders' +CREATE STREAM order_changes ON TABLE orders; + +-- Insert order 1001 to the table 'orders' +INSERT INTO orders (order_id, product_name, quantity) VALUES (1001, 'Product A', 10); + +-- Insert order 1002 to the table 'orders' +INSERT INTO orders (order_id, product_name, quantity) VALUES (1002, 'Product B', 20); + +-- Retrieve all records from the 'order_changes' stream +SELECT * FROM order_changes; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ order_id │ product_name │ quantity │ order_date │ change$action │ change$is_update │ change$row_id │ +├─────────────────┼──────────────────┼─────────────────┼────────────────────────────┼───────────────┼──────────────────┼────────────────────────────────────────┤ +│ 1002 │ Product B │ 20 │ 2024-03-28 03:24:16.629135 │ INSERT │ false │ acb58bd6bb4243a4bf0832bf570b38c2000000 │ +│ 1001 │ Product A │ 10 │ 2024-03-28 03:24:16.539178 │ INSERT │ false │ b93a15e694db4134ab5a23afa8c92b20000000 │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +The following example creates a new stream named 'order_changes_copy' with the `AT` parameter, containing the same data changes as 'order_changes': + +```sql +-- Create a stream 'order_changes_copy' on the 'orders' table, copying data changes from 'order_changes' +CREATE STREAM order_changes_copy ON TABLE orders AT (STREAM => order_changes); + +-- Retrieve all records from the 'order_changes_copy' stream +SELECT * FROM order_changes_copy; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ order_id │ product_name │ quantity │ order_date │ change$action │ change$is_update │ change$row_id │ +├─────────────────┼──────────────────┼─────────────────┼────────────────────────────┼───────────────┼──────────────────┼────────────────────────────────────────┤ +│ 1002 │ Product B │ 20 │ 2024-03-28 03:24:16.629135 │ INSERT │ false │ acb58bd6bb4243a4bf0832bf570b38c2000000 │ +│ 1001 │ Product A │ 10 │ 2024-03-28 03:24:16.539178 │ INSERT │ false │ b93a15e694db4134ab5a23afa8c92b20000000 │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +This example creates two streams on the 'orders' table. Each stream utilizes the `AT` parameter to obtain data changes after a specific snapshot ID or timestamp, respectively. 
+ +```sql +-- Retrieve snapshot and timestamp information from the 'orders' table +SELECT snapshot_id, timestamp from FUSE_SNAPSHOT('default','orders'); + +┌───────────────────────────────────────────────────────────────┐ +│ snapshot_id │ timestamp │ +├──────────────────────────────────┼────────────────────────────┤ +│ f7f57c7d07f445a68e4aa53fa2578bbb │ 2024-03-28 03:24:16.633721 │ +│ 11b9d81eabc94c7da648908f0ba313a1 │ 2024-03-28 03:24:16.611835 │ +└───────────────────────────────────────────────────────────────┘ + +-- Create a stream 'order_changes_after_snapshot' on the 'orders' table, capturing data changes after a specific snapshot +CREATE STREAM order_changes_after_snapshot ON TABLE orders AT (SNAPSHOT => '11b9d81eabc94c7da648908f0ba313a1'); + +-- Query the 'order_changes_after_snapshot' stream to view data changes captured after the specified snapshot +SELECT * FROM order_changes_after_snapshot; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ order_id │ product_name │ quantity │ order_date │ change$action │ change$is_update │ change$row_id │ +├─────────────────┼──────────────────┼─────────────────┼────────────────────────────┼───────────────┼──────────────────┼────────────────────────────────────────┤ +│ 1002 │ Product B │ 20 │ 2024-03-28 03:24:16.629135 │ INSERT │ false │ acb58bd6bb4243a4bf0832bf570b38c2000000 │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Create a stream 'order_changes_after_timestamp' on the 'orders' table, capturing data changes after a specific timestamp +CREATE STREAM order_changes_after_timestamp ON TABLE orders AT (TIMESTAMP => '2024-03-28 03:24:16.611835'::TIMESTAMP); + +-- Query the 'order_changes_after_timestamp' stream to view data changes captured after the specified timestamp +SELECT * FROM order_changes_after_timestamp; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ order_id │ product_name │ quantity │ order_date │ change$action │ change$is_update │ change$row_id │ +├─────────────────┼──────────────────┼─────────────────┼────────────────────────────┼───────────────┼──────────────────┼────────────────────────────────────────┤ +│ 1002 │ Product B │ 20 │ 2024-03-28 03:24:16.629135 │ INSERT │ false │ acb58bd6bb4243a4bf0832bf570b38c2000000 │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/create-table-function.md b/tidb-cloud-lake/sql/create-table-function.md new file mode 100644 index 0000000000000..ea807da16954f --- /dev/null +++ b/tidb-cloud-lake/sql/create-table-function.md @@ -0,0 +1,90 @@ +--- +title: CREATE TABLE FUNCTION +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a tabular SQL UDF (UDTF) that encapsulates SQL queries as a table function. Table functions are written in SQL; no external languages are involved. 
+ +### Supported Languages + +- SQL queries only (no external runtimes) + +## Syntax + +```sql +CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] + ( [] ) + RETURNS TABLE ( ) + AS $$ $$ +``` + +Where: +- ``: Optional comma-separated list of input parameters with their types (e.g., `x INT, name VARCHAR`) +- ``: Comma-separated list of column names and their types that the function returns +- ``: The SQL query that defines the function logic + +## Unified Function Syntax + +Databend uses a unified `$$` syntax for both scalar and table functions: + +| Function Type | Returns | Usage | +|---------------|---------|-------| +| **Scalar Function** | Single value | `RETURNS ` + `AS $$ $$` | +| **Table Function** | Result set | `RETURNS TABLE(...)` + `AS $$ $$` | + +This consistency makes it easy to understand and switch between function types. + +## Examples + +### Basic Table Function + +```sql +-- Create a sample table +CREATE OR REPLACE TABLE employees ( + id INT, + name VARCHAR(100), + department VARCHAR(100), + salary DECIMAL(10,2) +); + +INSERT INTO employees VALUES + (1, 'John', 'Engineering', 75000), + (2, 'Jane', 'Marketing', 65000), + (3, 'Bob', 'Engineering', 80000), + (4, 'Alice', 'Marketing', 70000); + +-- Create a simple table function to get all employees +CREATE OR REPLACE FUNCTION get_all_employees() +RETURNS TABLE (id INT, name VARCHAR(100), department VARCHAR(100), salary DECIMAL(10,2)) +AS $$ SELECT id, name, department, salary FROM employees $$; + +-- Test the function +SELECT * FROM get_all_employees(); +``` + +### Parameterized Table Function + +```sql +-- Create a table function that filters employees by department +CREATE OR REPLACE FUNCTION get_employees_by_dept(dept_name VARCHAR) +RETURNS TABLE (id INT, name VARCHAR(100), department VARCHAR(100), salary DECIMAL(10,2)) +AS $$ SELECT id, name, department, salary FROM employees WHERE department = dept_name $$; + +-- Use the parameterized table function +SELECT * FROM get_employees_by_dept('Engineering'); +``` + +### Complex Table Function + +```sql +-- Create a table function that aggregates data +CREATE OR REPLACE FUNCTION get_department_stats() +RETURNS TABLE (department VARCHAR(100), employee_count INT, avg_salary DECIMAL(10,2)) +AS $$ SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary FROM employees GROUP BY department $$; + +-- Use the complex table function +SELECT * FROM get_department_stats(); +``` diff --git a/tidb-cloud-lake/sql/create-table.md b/tidb-cloud-lake/sql/create-table.md new file mode 100644 index 0000000000000..2bccd56cc51c3 --- /dev/null +++ b/tidb-cloud-lake/sql/create-table.md @@ -0,0 +1,399 @@ +--- +title: CREATE TABLE +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Creating tables is one of the most complicated operations for many databases because you might need to: + +- Manually specify the engine +- Manually specify the indexes +- And even specify the data partitions or data shard + +Databend aims to be easy to use by design and does NOT require any of those operations when you create a table. Moreover, the CREATE TABLE statement provides these options to make it much easier for you to create tables in various scenarios: + +- [CREATE TABLE](#create-table): Creates a table from scratch. +- [CREATE TABLE ... LIKE](#create-table--like): Creates a table with the same column definitions as an existing one. +- [CREATE TABLE ... 
AS](#create-table--as): Creates a table and inserts data with the results of a SELECT query. + +See also: + +- [CREATE TEMP TABLE](/tidb-cloud-lake/sql/create-temp-table.md) +- [CREATE TRANSIENT TABLE](/tidb-cloud-lake/sql/create-transient-table.md) +- [CREATE EXTERNAL TABLE](10-ddl-create-table-external-location.md) + +## CREATE TABLE + +```sql +CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] [ . ] +( + [ NOT NULL | NULL ] + [ { DEFAULT + | { AUTOINCREMENT | IDENTITY } + [ { ( , ) + | START INCREMENT } ] + [ { ORDER | NOORDER } ] + } ] + [ AS () STORED | VIRTUAL ] + [ COMMENT '' ], + ... + ... +) +``` + +:::note + +- For available data types in Databend, see [Data Types](/tidb-cloud-lake/sql/data-types.md). + +- Databend suggests avoiding special characters as much as possible when naming columns. However, if special characters are necessary in some cases, the alias should be enclosed in backticks, like this: CREATE TABLE price(\`$CA\` int); + +- Databend will automatically convert column names into lowercase. For example, if you name a column as _Total_, it will appear as _total_ in the result. + ::: + +## CREATE TABLE ... LIKE + +Creates a table with the same column definitions as an existing table. Column names, data types, and their non-NUll constraints of the existing will be copied to the new table. + +Syntax: + +```sql +CREATE TABLE [IF NOT EXISTS] [db.]table_name +LIKE [db.]origin_table_name +``` + +This command does not include any data or attributes (such as `CLUSTER BY`, `TRANSIENT`, and `COMPRESSION`) from the original table, and instead creates a new table using the default system settings. + +:::note WORKAROUND + +- `TRANSIENT` and `COMPRESSION` can be explicitly specified when you create a new table with this command. For example, + +```sql +create transient table t_new like t_old; + +create table t_new compression='lz4' like t_old; +``` + +::: + +## CREATE TABLE ... AS + +Creates a table and fills it with data computed by a SELECT command. + +Syntax: + +```sql +CREATE TABLE [IF NOT EXISTS] [db.]table_name +AS SELECT query +``` + +This command does not include any attributes (such as CLUSTER BY, TRANSIENT, and COMPRESSION) from the original table, and instead creates a new table using the default system settings. + +:::note WORKAROUND + +- `TRANSIENT` and `COMPRESSION` can be explicitly specified when you create a new table with this command. For example, + +```sql +create transient table t_new as select * from t_old; + +create table t_new compression='lz4' as select * from t_old; +``` + +::: + +## Column Nullable + +By default, **all columns are nullable(NULL)** in Databend. If you need a column that does not allow NULL values, use the NOT NULL constraint. For more information, see [NULL Values and NOT NULL Constraint](/tidb-cloud-lake/sql/data-types.md). + +## Column Default Values + +`DEFAULT ` sets a default value for the column when no explicit expression is provided. The default expression can be: + +- A fixed constant, such as `Marketing` for the `department` column in the example below. +- An expression with no input arguments and returns a scalar value, such as `1 + 1`, `NOW()` or `UUID()`. +- A dynamically generated value from a sequence, such as `NEXTVAL(staff_id_seq)` for the `staff_id` column in the example below. + - NEXTVAL must be used as a standalone default value; expressions like `NEXTVAL(seq1) + 1` are not supported. 
+ - Users must adhere to their granted permissions for sequence utilization, including operations such as [NEXTVAL](/tidb-cloud-lake/sql/nextval.md#access-control-requirements) + +## Auto-Increment Columns + + + +`AUTOINCREMENT` or `IDENTITY` can be used to create auto-incrementing columns that automatically generate sequential numeric values. This is particularly useful for creating unique identifiers. + +**Syntax:** + +```sql +{ AUTOINCREMENT | IDENTITY } + [ { ( , ) + | START INCREMENT } ] + [ { ORDER | NOORDER } ] +``` + +**Parameters:** + +- `start_num`: The initial value for the auto-increment sequence (default: 1) +- `step_num`: The increment value for each new row (default: 1) +- `ORDER`: Guarantees monotonically increasing values (with potential gaps) +- `NOORDER`: Does not guarantee order (default) + +**Key Points:** + +- Auto-increment columns are internally backed by a sequence +- When a column with AUTOINCREMENT/IDENTITY is dropped, its associated sequence is also dropped +- If no explicit value is provided during insertion, the next value is automatically generated +- Both `AUTOINCREMENT` and `IDENTITY` are synonyms and behave identically + +**Example:** + +```sql +-- Create a table with auto-increment columns +CREATE TABLE users ( + user_id BIGINT AUTOINCREMENT, + order_id BIGINT AUTOINCREMENT START 100 INCREMENT 10, + username VARCHAR +); + +-- Insert data without specifying auto-increment columns +INSERT INTO users (username) VALUES ('alice'), ('bob'), ('charlie'); + +-- Query the table to see auto-generated values +SELECT * FROM users; + ++----------+----------+----------+ +| user_id | order_id | username | ++----------+----------+----------+ +| 0 | 100 | alice | +| 1 | 110 | bob | +| 2 | 120 | charlie | ++----------+----------+----------+ +``` + +## Computed Columns + +Computed columns are generated from other columns using scalar expressions. Databend supports two types: + +- **STORED**: Values are physically stored and automatically updated when dependent columns change +- **VIRTUAL**: Values are calculated on-the-fly during queries, saving storage space + +**Syntax:** + +```sql + [ NOT NULL | NULL ] AS () { STORED | VIRTUAL } + [ NOT NULL | NULL ] GENERATED ALWAYS AS () { STORED | VIRTUAL } +``` + +**Examples:** + +```sql +-- Stored: physically stored, updates immediately +CREATE TABLE products ( + id INT, + price FLOAT64, + quantity INT, + total_price FLOAT64 AS (price * quantity) STORED +); + +-- Virtual: computed on query, no storage overhead +CREATE TABLE employees ( + id INT, + first_name VARCHAR, + last_name VARCHAR, + full_name VARCHAR AS (CONCAT(first_name, ' ', last_name)) VIRTUAL +); +``` + +:::tip +Choose **STORED** for frequently queried columns where performance matters. Choose **VIRTUAL** to save storage space when computation cost is acceptable. +::: + +## MySQL Compatibility + +Databend's syntax is difference from MySQL mainly in the data type and some specific index hints. + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------|:--------------|:-----------------------| +| CREATE | Global, Table | Creates a table. | + + +To create a table, the user performing the operation or the [current_role](/tidb-cloud-lake/guides/roles.md) must have the CREATE [privilege](/tidb-cloud-lake/guides/privileges.md#table-privileges). 
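+
+For example, here is a minimal sketch of granting that privilege through a role before its members create tables (the role, database, and user names are illustrative):
+
+```sql
+-- Allow members of the 'writer' role to create tables in the 'default' database
+CREATE ROLE writer;
+GRANT CREATE ON default.* TO ROLE writer;
+
+-- Assign the role to an existing user
+GRANT ROLE writer TO eric;
+```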
+ + +## Examples + +### Create Table + +Create a table with a default value for a column (in this case, the `genre` column has 'General' as the default value): + +```sql +CREATE TABLE books ( + id BIGINT UNSIGNED, + title VARCHAR, + genre VARCHAR DEFAULT 'General' +); +``` + +Describe the table to confirm the structure and the default value for the `genre` column: + +```sql +DESC books; ++-------+-----------------+------+---------+-------+ +| Field | Type | Null | Default | Extra | ++-------+-----------------+------+---------+-------+ +| id | BIGINT UNSIGNED | YES | 0 | | +| title | VARCHAR | YES | "" | | +| genre | VARCHAR | YES | 'General'| | ++-------+-----------------+------+---------+-------+ +``` + +Insert a row without specifying the `genre`: + +```sql +INSERT INTO books(id, title) VALUES(1, 'Invisible Stars'); +``` + +Query the table and notice that the default value 'General' has been set for the `genre` column: + +```sql +SELECT * FROM books; ++----+----------------+---------+ +| id | title | genre | ++----+----------------+---------+ +| 1 | Invisible Stars| General | ++----+----------------+---------+ +``` + +### Create Table ... Like + +Create a new table (`books_copy`) with the same structure as an existing table (`books`): + +```sql +CREATE TABLE books_copy LIKE books; +``` + +Check the structure of the new table: + +```sql +DESC books_copy; ++-------+-----------------+------+---------+-------+ +| Field | Type | Null | Default | Extra | ++-------+-----------------+------+---------+-------+ +| id | BIGINT UNSIGNED | YES | 0 | | +| title | VARCHAR | YES | "" | | +| genre | VARCHAR | YES | 'General'| | ++-------+-----------------+------+---------+-------+ +``` + +Insert a row into the new table and notice that the default value for the `genre` column has been copied: + +```sql +INSERT INTO books_copy(id, title) VALUES(1, 'Invisible Stars'); + +SELECT * FROM books_copy; ++----+----------------+---------+ +| id | title | genre | ++----+----------------+---------+ +| 1 | Invisible Stars| General | ++----+----------------+---------+ +``` + +### Create Table ... As + +Create a new table (`books_backup`) that includes data from an existing table (`books`): + +```sql +CREATE TABLE books_backup AS SELECT * FROM books; +``` + +Describe the new table and notice that the default value for the `genre` column has NOT been copied: + +```sql +DESC books_backup; ++-------+-----------------+------+---------+-------+ +| Field | Type | Null | Default | Extra | ++-------+-----------------+------+---------+-------+ +| id | BIGINT UNSIGNED | NO | 0 | | +| title | VARCHAR | NO | "" | | +| genre | VARCHAR | NO | NULL | | ++-------+-----------------+------+---------+-------+ +``` + +Query the new table and notice that the data from the original table has been copied: + +```sql +SELECT * FROM books_backup; ++----+----------------+---------+ +| id | title | genre | ++----+----------------+---------+ +| 1 | Invisible Stars| General | ++----+----------------+---------+ +``` + +### Create Table ... 
Column As STORED | VIRTUAL + +The following example demonstrates a table with a stored computed column that automatically recalculates based on updates to the "price" or "quantity" columns: + +```sql +-- Create the table with a stored computed column +CREATE TABLE IF NOT EXISTS products ( + id INT, + price FLOAT64, + quantity INT, + total_price FLOAT64 AS (price * quantity) STORED +); + +-- Insert data into the table +INSERT INTO products (id, price, quantity) +VALUES (1, 10.5, 3), + (2, 15.2, 5), + (3, 8.7, 2); + +-- Query the table to see the computed column +SELECT id, price, quantity, total_price +FROM products; + +--- ++------+-------+----------+-------------+ +| id | price | quantity | total_price | ++------+-------+----------+-------------+ +| 1 | 10.5 | 3 | 31.5 | +| 2 | 15.2 | 5 | 76.0 | +| 3 | 8.7 | 2 | 17.4 | ++------+-------+----------+-------------+ +``` + +In this example, we create a table called student*profiles with a Variant type column named profile to store JSON data. We also add a virtual computed column named \_age* that extracts the age property from the profile column and casts it to an integer. + +```sql +-- Create the table with a virtual computed column +CREATE TABLE student_profiles ( + id STRING, + profile VARIANT, + age INT NULL AS (profile['age']::INT) VIRTUAL +); + +-- Insert data into the table +INSERT INTO student_profiles (id, profile) VALUES + ('d78236', '{"id": "d78236", "name": "Arthur Read", "age": "16", "school": "PVPHS", "credits": 120, "sports": "none"}'), + ('f98112', '{"name": "Buster Bunny", "age": "15", "id": "f98112", "school": "TEO", "credits": 67, "clubs": "MUN"}'), + ('t63512', '{"name": "Ernie Narayan", "school" : "Brooklyn Tech", "id": "t63512", "sports": "Track and Field", "clubs": "Chess"}'); + +-- Query the table to see the computed column +SELECT * FROM student_profiles; + ++--------+------------------------------------------------------------------------------------------------------------+------+ +| id | profile | age | ++--------+------------------------------------------------------------------------------------------------------------+------+ +| d78236 | `{"age":"16","credits":120,"id":"d78236","name":"Arthur Read","school":"PVPHS","sports":"none"}` | 16 | +| f98112 | `{"age":"15","clubs":"MUN","credits":67,"id":"f98112","name":"Buster Bunny","school":"TEO"}` | 15 | +| t63512 | `{"clubs":"Chess","id":"t63512","name":"Ernie Narayan","school":"Brooklyn Tech","sports":"Track and Field"}` | NULL | ++--------+------------------------------------------------------------------------------------------------------------+------+ +``` diff --git a/tidb-cloud-lake/sql/create-task.md b/tidb-cloud-lake/sql/create-task.md new file mode 100644 index 0000000000000..a7db35be08c4b --- /dev/null +++ b/tidb-cloud-lake/sql/create-task.md @@ -0,0 +1,272 @@ +--- +title: CREATE TASK +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The CREATE TASK statement is used to define a new task that executes a specified SQL statement on a scheduled basis or dag based task graph. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax + +```sql +CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] + WAREHOUSE = + SCHEDULE = { MINUTE | SECOND | USING CRON } + [ AFTER + [ WHEN ] + [ SUSPEND_TASK_AFTER_NUM_FAILURES = ] + [ ERROR_INTEGRATION = ] + [ COMMENT = '' ] + [ = [ , = ... ] ] +AS +{ +| BEGIN + ; + [ ; ... ] + END; +} +``` + +Wrap multiple SQL statements in a `BEGIN ... 
END;` block so the task executes them sequentially as a script.
+
+| Parameter | Description |
+| ------------------------------------------------ | ----------- |
+| IF NOT EXISTS | Optional. If specified, the task will only be created if a task of the same name does not already exist. |
+| name | The name of the task. This is a mandatory field. |
+| WAREHOUSE | Required. Specifies the virtual warehouse to use for the task. |
+| SCHEDULE | Required. Defines the schedule on which the task will run. Can be specified in minutes or using a CRON expression along with a time zone. |
+| SUSPEND_TASK_AFTER_NUM_FAILURES | Optional. The number of consecutive failures after which the task will be automatically suspended. |
+| AFTER | Optional. Lists the tasks that must complete before this task starts. |
+| WHEN boolean_expr | A condition that must be true for the task to run. |
+| [ERROR_INTEGRATION](/tidb-cloud-lake/sql/notification.md) | Optional. The name of the notification integration to use for task error notifications, with the specific [task error payload](/tidb-cloud-lake/sql/task-error-notification-payload.md) applied. |
+| COMMENT | Optional. A string literal that serves as a comment or description for the task. |
+| session_parameter | Optional. Specifies session parameters to use for the task during the task run. Note that session parameters must be placed after all other task parameters in the CREATE TASK statement. |
+| sql | The SQL statement that the task will execute. It can be a single statement or a script wrapped in `BEGIN ... END;`. This is a mandatory field. |
+
+### Usage Notes
+
+- A schedule must be defined for a standalone task or the root task in a DAG of tasks; otherwise, the task only runs if manually executed using EXECUTE TASK.
+- A schedule cannot be specified for child tasks in a DAG.
+- After creating a task, you must execute ALTER TASK … RESUME before the task will run based on the parameters specified in the task definition.
+- The WHEN condition supports only a subset of boolean expressions. The following are supported in a task WHEN clause:
+
+  - [STREAM_STATUS](/tidb-cloud-lake/sql/stream-status.md) is supported for evaluation in the SQL expression. This function indicates whether a specified stream contains change tracking data. You can use it to check whether the specified stream contains change data before starting the current run. If the result is FALSE, the task does not run.
+  - Boolean operators such as AND, OR, NOT, and others.
+  - Casts between numeric, string, and boolean types.
+  - Comparison operators such as equal, not equal, greater than, less than, and others.
+
+  :::note
+  Warning: When using STREAM_STATUS in tasks, you must include the database name when referencing the stream (e.g., `STREAM_STATUS('mydb.stream_name')`).
+  :::
+
+- Multiple tasks that consume change data from a single table stream retrieve different deltas. When a task consumes the change data in a stream using a DML statement, the stream advances the offset, and the change data is no longer available for the next task to consume. Currently, we recommend that only a single task consumes the change data from a stream. Multiple streams can be created for the same table and consumed by different tasks.
+- Tasks are not retried, and each execution runs serially: the SQL statements in a script are executed one by one, with no parallel execution.
This ensures that the sequence and dependencies of task execution are maintained. +- Interval-based tasks follow a fixed interval spot in a tight way. This means that if the current task execution time exceeds the interval unit, the next task will execute immediately. Otherwise, the next task will wait until the next interval unit is triggered. For example, if a task is defined with a 1-second interval and one task execution takes 1.5 seconds, the next task will execute immediately. If one task execution takes 0.5 seconds, the next task will wait until the next 1-second interval tick starts. +- While session parameters can be specified during task creation, you can also modify them later using the ALTER TASK statement. For example: + ```sql + ALTER TASK simple_task SET + enable_query_result_cache = 1, + query_result_cache_min_execute_secs = 5; + ``` + +### Important Notes on Cron Expressions + +- The cron expression used in the `SCHEDULE` parameter must contain **exactly 6 fields**. +- The fields represent the following: + 1. **Second** (0-59) + 2. **Minute** (0-59) + 3. **Hour** (0-23) + 4. **Day of the Month** (1-31) + 5. **Month** (1-12 or JAN-DEC) + 6. **Day of the Week** (0-6, where 0 is Sunday, or SUN-SAT) + + #### Example Cron Expressions: + +- **Daily at 9:00:00 AM Pacific Time:** + - `USING CRON '0 0 9 * * *' 'America/Los_Angeles'` + +- **Every minute:** + - `USING CRON '0 * * * * *' 'UTC'` + - This runs the task every minute at the start of the minute. + +- **Every hour at the 15th minute:** + - `USING CRON '0 15 * * * *' 'UTC'` + - This runs the task every hour at 15 minutes past the hour. + +- **Every Monday at 12:00:00 PM:** + - `USING CRON '0 0 12 * * 1' 'UTC'` + - This runs the task every Monday at noon. + +- **On the first day of every month at midnight:** + - `USING CRON '0 0 0 1 * *' 'UTC'` + - This runs the task at midnight on the first day of every month. + +- **Every weekday at 8:30:00 AM:** + - `USING CRON '0 30 8 * * 1-5' 'UTC'` + - This runs the task every weekday (Monday to Friday) at 8:30 AM. + +## Usage Examples + +### CRON Schedule + +```sql +CREATE TASK my_daily_task + WAREHOUSE = 'compute_wh' + SCHEDULE = USING CRON '0 0 9 * * *' 'America/Los_Angeles' + COMMENT = 'Daily summary task' +AS + INSERT INTO summary_table SELECT * FROM source_table; +``` + +In this example, a task named `my_daily_task` is created. It uses the **compute_wh** warehouse to run a SQL statement that inserts data into summary_table from source_table. The task is scheduled to run using a **CRON expression** that executes **daily at 9 AM Pacific Time**. + +### Multiple Statements + +```sql +CREATE TASK IF NOT EXISTS nightly_refresh + WAREHOUSE = 'etl' + SCHEDULE = USING CRON '0 0 2 * * *' 'UTC' +AS +BEGIN + DELETE FROM staging.events WHERE event_time < DATEADD(DAY, -1, CURRENT_TIMESTAMP()); + INSERT INTO mart.events SELECT * FROM staging.events; +END; +``` + +This example creates a task named `nightly_refresh` that executes a script containing multiple statements. The script is wrapped in `BEGIN ... END;` so the DELETE runs before the INSERT each time the task executes. 
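+
+As noted in the usage notes above, a newly created task does not run on its schedule until it is resumed. A minimal sketch using the `nightly_refresh` task defined above:
+
+```sql
+-- Resume the task so the scheduler starts running it
+ALTER TASK nightly_refresh RESUME;
+
+-- Or trigger a one-off run without waiting for the schedule
+EXECUTE TASK nightly_refresh;
+```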
+
+### Dynamic SQL (EXECUTE IMMEDIATE)
+
+```sql
+CREATE OR REPLACE TASK log_ingestion
+    WAREHOUSE = 'default'
+    SCHEDULE = 1 MINUTE
+AS
+EXECUTE IMMEDIATE $$
+BEGIN
+    LET path := CONCAT('@mylog/', DATE_FORMAT(CURRENT_DATE - INTERVAL 3 DAY, '%m/%d/'));
+
+    LET sql := CONCAT(
+        'COPY INTO logs.web_logs FROM ', path,
+        ' PATTERN = ''.*[.]gz'' FILE_FORMAT = (type = NDJSON compression = AUTO) MAX_FILES = 10000'
+    );
+
+    EXECUTE IMMEDIATE :sql;
+END;
+$$;
+```
+
+This example creates a task that runs every minute. It dynamically computes the stage path for **3 days ago** (for example, `@mylog/12/15/`), builds a `COPY INTO` statement, and executes it via `EXECUTE IMMEDIATE`.
+
+### Automatic Suspension
+
+```sql
+CREATE TASK IF NOT EXISTS mytask
+    WAREHOUSE = 'system'
+    SCHEDULE = 2 MINUTE
+    SUSPEND_TASK_AFTER_NUM_FAILURES = 3
+AS
+    INSERT INTO compaction_test.test VALUES((1));
+```
+
+This example creates a task named `mytask`, if it doesn't already exist. The task is assigned to the **system** warehouse and is scheduled to run **every 2 minutes**. It will be **automatically suspended** if it **fails three times consecutively**. The task performs an INSERT operation into the compaction_test.test table.
+
+### Second-Level Scheduling
+
+```sql
+CREATE TASK IF NOT EXISTS daily_sales_summary
+    WAREHOUSE = 'analytics'
+    SCHEDULE = 30 SECOND
+AS
+    SELECT sales_date, SUM(amount) AS daily_total
+    FROM sales_data
+    GROUP BY sales_date;
+```
+
+In this example, a task named `daily_sales_summary` is created with **second-level scheduling**. It is scheduled to run **every 30 seconds**. The task uses the **analytics** warehouse and calculates the daily sales summary by aggregating data from the sales_data table.
+
+### Task Dependencies
+
+```sql
+CREATE TASK IF NOT EXISTS process_orders
+    WAREHOUSE = 'etl'
+    AFTER task1
+AS
+    INSERT INTO data_warehouse.orders SELECT * FROM staging.orders;
+```
+
+In this example, a task named `process_orders` is created, and it is defined to run **after the successful completion** of **task1**. This is useful for creating **dependencies** in a **Directed Acyclic Graph (DAG)** of tasks. The task uses the **etl** warehouse and transfers data from the staging area to the data warehouse.
+
+> Tip: Using the AFTER parameter does not require setting the SCHEDULE parameter.
+
+### Conditional Execution
+
+```sql
+CREATE TASK IF NOT EXISTS hourly_data_cleanup
+    WAREHOUSE = 'maintenance'
+    SCHEDULE = USING CRON '0 0 * * * *' 'America/Los_Angeles'
+    WHEN STREAM_STATUS('db1.change_stream') = TRUE
+AS
+    DELETE FROM archived_data
+    WHERE archived_date < DATEADD(HOUR, -24, CURRENT_TIMESTAMP());
+```
+
+In this example, a task named `hourly_data_cleanup` is created. It uses the **maintenance** warehouse and is scheduled to run **every hour** on the hour. The task deletes data from the archived_data table that is older than 24 hours. The task only runs **if the condition is met**, using the **STREAM_STATUS** function to check whether `db1.change_stream` contains change data.
+
+### Error Integration
+
+```sql
+CREATE TASK IF NOT EXISTS mytask
+    WAREHOUSE = 'mywh'
+    SCHEDULE = 30 SECOND
+    ERROR_INTEGRATION = 'myerror'
+AS
+BEGIN
+    BEGIN;
+    INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);
+    DELETE FROM mytable WHERE ts < DATEADD(MINUTE, -5, CURRENT_TIMESTAMP());
+    COMMIT;
+END;
+```
+
+In this example, a task named `mytask` is created. It uses the **mywh** warehouse and is scheduled to run **every 30 seconds**.
The task executes a **BEGIN block** that contains an INSERT statement and a DELETE statement. The task commits the transaction after both statements are executed. When the task fails, it will trigger the **error integration** named **myerror**. + +### Session Parameters + +```sql +CREATE TASK IF NOT EXISTS cache_enabled_task + WAREHOUSE = 'analytics' + SCHEDULE = 5 MINUTE + COMMENT = 'Task with query result cache enabled' + enable_query_result_cache = 1, + query_result_cache_min_execute_secs = 5 +AS + SELECT SUM(amount) AS total_sales + FROM sales_data + WHERE transaction_date >= DATEADD(DAY, -7, CURRENT_DATE()) + GROUP BY product_category; +``` + +In this example, a task named `cache_enabled_task` is created with **session parameters** that enable query result caching. The task is scheduled to run **every 5 minutes** and uses the **analytics** warehouse. The session parameters **`enable_query_result_cache = 1`** and **`query_result_cache_min_execute_secs = 5`** are specified **after all other task parameters**, enabling the query result cache for queries that take at least 5 seconds to execute. This can **improve performance** for subsequent executions of the same task if the underlying data hasn't changed. + +### View Task Run History + +Use the `TASK_HISTORY()` table function to see when and how tasks ran: + +```sql +SELECT * +FROM TASK_HISTORY( + TASK_NAME => 'daily_sales_summary', + RESULT_LIMIT => 20 +) +ORDER BY scheduled_time DESC; +``` + +See [TASK HISTORY](/tidb-cloud-lake/sql/table-functions.md) for all options, including filtering by time range or root task ID in a DAG. diff --git a/tidb-cloud-lake/sql/create-temp-table.md b/tidb-cloud-lake/sql/create-temp-table.md new file mode 100644 index 0000000000000..0dcfc74a80ab0 --- /dev/null +++ b/tidb-cloud-lake/sql/create-temp-table.md @@ -0,0 +1,86 @@ +--- +title: CREATE TEMP TABLE +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a temporary table that is automatically dropped at the end of the session. + +- A temporary table is visible only within the session that created it and is automatically dropped, with all data vacuumed, when the session ends. + - In cases where automatic cleanup of temporary tables fails—for example, due to a query node crash—you can use the [FUSE_VACUUM_TEMPORARY_TABLE](/tidb-cloud-lake/sql/fuse-vacuum-temporary-table.md) function to manually clean up leftover files from temporary tables. +- To show the existing temporary tables in the session, query the [system.temporary_tables](/tidb-cloud-lake/sql/system-tables.md) system table. See [Example-1](#example-1). +- A temporary table with the same name as a normal table takes precedence, hiding the normal table until dropped. See [Example-2](#example-2). +- No privileges are required to create or operate on a temporary table. +- Databend supports creating temporary tables with the [Fuse Engine](/tidb-cloud-lake/sql/table-engines.md). +- To create temporary tables using BendSQL, ensure you are using the latest version of BendSQL. + +## Syntax + +```sql +CREATE [ OR REPLACE ] { TEMPORARY | TEMP } TABLE + [ IF NOT EXISTS ] + [ . ] + ... +``` + +The omitted parts follow the syntax of [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md). 
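+
+Because the remaining clauses mirror CREATE TABLE, forms such as `CREATE TABLE ... AS` should also work for temporary tables. A minimal sketch, assuming an existing `orders` table (both the table and the filter are illustrative):
+
+```sql
+-- Stage intermediate results that only the current session can see
+CREATE TEMP TABLE recent_orders AS
+SELECT * FROM orders WHERE order_date >= DATEADD(DAY, -7, CURRENT_DATE());
+
+SELECT COUNT(*) FROM recent_orders;
+-- The table and its data are dropped automatically when the session ends
+```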
+ +## Examples + +### Example-1 + +This example demonstrates how to create a temporary table and verify its existence by querying the [system.temporary_tables](/tidb-cloud-lake/sql/system-tables.md) system table: + +```sql +CREATE TEMP TABLE my_table (id INT, description STRING); + +SELECT * FROM system.temporary_tables; + +┌────────────────────────────────────────────────────┐ +│ database │ name │ table_id │ engine │ +├──────────┼──────────┼─────────────────────┼────────┤ +│ default │ my_table │ 4611686018427407904 │ FUSE │ +└────────────────────────────────────────────────────┘ +``` + +### Example-2 + +This example demonstrates how a temporary table with the same name as a normal table takes precedence. When both tables exist, operations target the temporary table, effectively hiding the normal table. Once the temporary table is dropped, the normal table becomes accessible again: + +```sql +-- Create a normal table +CREATE TABLE my_table (id INT, name STRING); + +-- Insert data into the normal table +INSERT INTO my_table VALUES (1, 'Alice'), (2, 'Bob'); + +-- Create a temporary table with the same name +CREATE TEMP TABLE my_table (id INT, description STRING); + +-- Insert data into the temporary table +INSERT INTO my_table VALUES (1, 'Temp Data'); + +-- Query the table: This will access the temporary table, hiding the normal table +SELECT * FROM my_table; + +┌────────────────────────────────────┐ +│ id │ description │ +├─────────────────┼──────────────────┤ +│ 1 │ Temp Data │ +└────────────────────────────────────┘ + +-- Drop the temporary table +DROP TABLE my_table; + +-- Query the table again: Now the normal table is accessible +SELECT * FROM my_table; + +┌────────────────────────────────────┐ +│ id │ name │ +├─────────────────┼──────────────────┤ +│ 1 │ Alice │ +│ 2 │ Bob │ +└────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-transient-table.md b/tidb-cloud-lake/sql/create-transient-table.md new file mode 100644 index 0000000000000..7e0947452776f --- /dev/null +++ b/tidb-cloud-lake/sql/create-transient-table.md @@ -0,0 +1,33 @@ +--- +title: CREATE TRANSIENT TABLE +sidebar_position: 1 +--- + +Creates a table without storing its historical data for Time Travel. + +Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Dataebend does not hold historical data for a transient table so you will not be able to query from a previous version of the transient table with the Time Travel feature, for example, the [AT](/tidb-cloud-lake/sql/at.md) clause in the SELECT statement will not work for transient tables. Please note that you can still [drop](/tidb-cloud-lake/sql/drop-table.md) and [undrop](/tidb-cloud-lake/sql/undrop-table.md) a transient table. + +:::caution +Concurrent modifications (including write operations) on transient tables may cause data corruption, making the data unreadable. This defect is being addressed. Until fixed, please avoid concurrent modifications on transient tables. +::: + +## Syntax + +```sql +CREATE [ OR REPLACE ] TRANSIENT TABLE + [ IF NOT EXISTS ] + [ . ] + ... +``` + +The omitted parts follow the syntax of [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md). 
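+
+The practical effect of skipping snapshot history is that time-travel reads fail on transient tables. A hedged sketch (the table is illustrative, and the exact error message may differ):
+
+```sql
+CREATE TRANSIENT TABLE events (id INT);
+INSERT INTO events VALUES (1);
+
+-- Works: reads the current data
+SELECT * FROM events;
+
+-- Expected to fail: transient tables keep no historical snapshots to read from
+SELECT * FROM events AT (TIMESTAMP => DATEADD(MINUTE, -5, NOW()));
+```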
+ +## Examples + +This examples creates a transient table named `visits`: + +```sql +CREATE TRANSIENT TABLE visits ( + visitor_id BIGINT +); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/create-user.md b/tidb-cloud-lake/sql/create-user.md new file mode 100644 index 0000000000000..3eb9ca0f3a4b9 --- /dev/null +++ b/tidb-cloud-lake/sql/create-user.md @@ -0,0 +1,107 @@ +--- +title: CREATE USER +sidebar_position: 1 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a SQL user for connecting to Databend. Users must be granted appropriate privileges to access databases and perform operations. + +See also: +- [GRANT](/tidb-cloud-lake/sql/grant.md) +- [ALTER USER](/tidb-cloud-lake/sql/alter-user.md) +- [DROP USER](/tidb-cloud-lake/sql/drop-user.md) + +## Syntax + +```sql +CREATE [ OR REPLACE ] USER IDENTIFIED [ WITH ] BY '' +[ WITH MUST_CHANGE_PASSWORD = true | false ] +[ WITH SET PASSWORD POLICY = '' ] +[ WITH SET NETWORK POLICY = '' ] +[ WITH DEFAULT_ROLE = '' ] +[ WITH DISABLED = true | false ] +``` + +**Parameters:** +- ``: Username (cannot contain single quotes, double quotes, backspace, or form feed characters) +- ``: Authentication type - `double_sha1_password` (default), `sha256_password`, or `no_password` +- `MUST_CHANGE_PASSWORD`: When `true`, user must change password at first login +- `DEFAULT_ROLE`: Sets default role (role must be explicitly granted to take effect) +- `DISABLED`: When `true`, user is created in disabled state and cannot log in + +## Examples + +### Example 1: Create User and Grant Database Privileges + +Create a role, grant database privileges, and assign the role to a user: + +```sql +-- Create a role and grant database privileges +CREATE ROLE data_analyst_role; +GRANT SELECT, INSERT ON default.* TO ROLE data_analyst_role; + +-- Create a new user and assign the role +CREATE USER data_analyst IDENTIFIED BY 'secure_password123' WITH DEFAULT_ROLE = 'data_analyst_role'; +GRANT ROLE data_analyst_role TO data_analyst; +``` + +Verify the role and permissions: +```sql +SHOW GRANTS FOR ROLE data_analyst_role; ++-----------------------------------------------------------------+ +| Grants | ++-----------------------------------------------------------------+ +| GRANT SELECT,INSERT ON 'default'.* TO ROLE 'data_analyst_role' | ++-----------------------------------------------------------------+ +``` + +### Example 2: Create User and Grant Role + +Create a user and assign a role with specific privileges: + +```sql +-- Create a role with specific privileges +CREATE ROLE analyst_role; +GRANT SELECT ON *.* TO ROLE analyst_role; +GRANT INSERT ON default.* TO ROLE analyst_role; + +-- Create user and grant the role +CREATE USER john_analyst IDENTIFIED BY 'secure_pass456'; +GRANT ROLE analyst_role TO john_analyst; +``` + +Verify the role assignment: +```sql +SHOW GRANTS FOR john_analyst; ++------------------------------------------+ +| Grants | ++------------------------------------------+ +| GRANT ROLE analyst_role TO 'john_analyst'@'%' | ++------------------------------------------+ +``` + +### Example 3: Create Users with Different Authentication Types + +```sql +-- Create user with default authentication +CREATE USER user1 IDENTIFIED BY 'abc123'; + +-- Create user with SHA256 authentication +CREATE USER user2 IDENTIFIED WITH sha256_password BY 'abc123'; +``` + +### Example 4: Create Users with Special Configurations + +```sql +-- Create user with password change requirement +CREATE USER new_employee IDENTIFIED BY 
'temp123' WITH MUST_CHANGE_PASSWORD = true; + +-- Create user in disabled state +CREATE USER temp_user IDENTIFIED BY 'abc123' WITH DISABLED = true; + +-- Create user with default role (role must be granted separately) +CREATE USER manager IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'admin'; +GRANT ROLE admin TO manager; +``` diff --git a/tidb-cloud-lake/sql/create-vector-index.md b/tidb-cloud-lake/sql/create-vector-index.md new file mode 100644 index 0000000000000..bcd8367b0c745 --- /dev/null +++ b/tidb-cloud-lake/sql/create-vector-index.md @@ -0,0 +1,197 @@ +--- +title: CREATE VECTOR INDEX +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a Vector index on a [VECTOR](/tidb-cloud-lake/sql/vector.md) column for a table to enable efficient similarity search using the HNSW (Hierarchical Navigable Small World) algorithm. + +## Syntax + +```sql +-- Create a Vector index on an existing table +CREATE [OR REPLACE] VECTOR INDEX [IF NOT EXISTS] +ON [.]() +distance = '' [m = ] [ef_construct = ] + +-- Create a Vector index when creating a table +CREATE [OR REPLACE] TABLE ( + , + VECTOR INDEX () + distance = '' [m = ] [ef_construct = ] +)... +``` + +### Parameters + +- **`distance`** (required) - Specifies the distance metric(s) to use for similarity search. Multiple metrics can be combined with commas: + - `'cosine'` - Cosine distance (best for semantic similarity, text embeddings) + - `'l1'` - L1 distance / Manhattan distance (good for feature comparison, sparse data) + - `'l2'` - L2 distance / Euclidean distance (best for geometric similarity, image features) + - Example: `distance = 'cosine,l1,l2'` supports all three metrics + +- **`m`** (optional, default: 16) - Controls the number of bidirectional connections each node has in the HNSW graph: + - Higher values increase memory usage but can improve search accuracy + - Must be greater than 0 + - Typical range: 8-64 + +- **`ef_construct`** (optional, default: 100) - Controls the size of the dynamic candidate list during index construction: + - Higher values improve index quality but increase construction time and memory + - Must be >= 40 + - Typical range: 40-500 + +## How Vector Index Works + +Vector indexes in Databend use the HNSW algorithm to build a multi-layered graph structure: + +1. **Graph Structure**: Each vector is a node with connections to its nearest neighbors +2. **Search Process**: Queries navigate through graph layers, from coarse to fine, to find approximate nearest neighbors quickly +3. **Quantization**: Raw vectors are quantized to reduce storage and improve query performance (with negligible accuracy loss) +4. **Automatic Building**: The index is automatically built as data is written. 
Every INSERT, COPY, or data load operation automatically generates the index for new rows - no manual maintenance required + +## Examples + +### Creating a Table with Vector Index + +```sql +-- Simple vector index for embeddings +CREATE TABLE documents ( + id INT, + title VARCHAR, + content TEXT, + embedding VECTOR(1024), + VECTOR INDEX idx_embedding(embedding) distance = 'cosine' +); +``` + +### Creating a Vector Index with Custom Parameters + +```sql +-- Vector index with multiple distance metrics and tuned parameters +CREATE TABLE images ( + id INT, + filename VARCHAR, + feature_vector VECTOR(512), + VECTOR INDEX idx_features(feature_vector) + distance = 'cosine,l2' + m = 32 + ef_construct = 200 +); +``` + +### Creating a Vector Index on an Existing Table + +```sql +CREATE TABLE products ( + id INT, + name VARCHAR, + description TEXT, + embedding VECTOR(768) +); + +-- Add vector index after table creation +CREATE VECTOR INDEX idx_product_embedding +ON products(embedding) +distance = 'cosine,l1,l2' +m = 20 +ef_construct = 150; +``` + +### Multiple Vector Indexes on Different Columns + +```sql +CREATE TABLE multimodal_data ( + id INT, + text_embedding VECTOR(384), + image_embedding VECTOR(512), + VECTOR INDEX idx_text(text_embedding) distance = 'cosine', + VECTOR INDEX idx_image(image_embedding) distance = 'l2' +); +``` + +### Viewing Indexes + +Use [SHOW INDEXES](/tidb-cloud-lake/sql/show-indexes.md) to view all indexes: + +```sql +SHOW INDEXES; +``` + +Result: +``` +┌──────────────────────┬────────┬──────────┬────────────────────────────┬──────────────────────────┐ +│ name │ type │ original │ definition │ created_on │ +├──────────────────────┼────────┼──────────┼────────────────────────────┼──────────────────────────┤ +│ idx_embedding │ VECTOR │ │ documents(embedding) │ 2025-05-13 01:22:34.123 │ +│ idx_product_embedding│ VECTOR │ │ products(embedding) │ 2025-05-13 01:23:45.678 │ +└──────────────────────┴────────┴──────────┴────────────────────────────┴──────────────────────────┘ +``` + +### Using Vector Index for Similarity Search + +```sql +-- Create a table with vector index +CREATE TABLE wiki_articles ( + id INT, + title VARCHAR, + embedding VECTOR(8), + VECTOR INDEX idx_embedding(embedding) distance = 'cosine' +); + +-- Insert sample data (8-dimensional vectors for demonstration) +INSERT INTO wiki_articles VALUES +(1, 'Machine Learning', [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]), +(2, 'Deep Learning', [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]), +(3, 'Natural Language Processing', [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), +(4, 'Computer Vision', [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]); + +-- Find the 2 most similar articles to a query vector using cosine distance +SELECT id, title, cosine_distance(embedding, [0.12, 0.22, 0.32, 0.42, 0.52, 0.62, 0.72, 0.82]) AS distance +FROM wiki_articles +ORDER BY distance ASC +LIMIT 2; +``` + +Result: +``` +┌────┬─────────────────┬──────────────┐ +│ id │ title │ distance │ +├────┼─────────────────┼──────────────┤ +│ 1 │ Machine Learning│ 0.00012345 │ +│ 2 │ Deep Learning │ 0.00023456 │ +└────┴─────────────────┴──────────────┘ +``` + +## Performance Tuning + +### Choosing Distance Metrics + +Choose the appropriate distance metric based on your use case. See [Vector Functions](/tidb-cloud-lake/sql/vector-functions.md) for querying with distance functions. 
+ +- **Cosine distance**: Best for text embeddings from models like BERT, GPT, where vector magnitude doesn't matter +- **L2 (Euclidean) distance**: Best for image features, spatial data where absolute differences matter +- **L1 (Manhattan) distance**: Good for sparse vectors and when you want to emphasize individual dimension differences + +### Tuning HNSW Parameters + +| Parameter | Lower Value | Higher Value | +|----------------|--------------------------------------|--------------------------------------| +| `m` | Less memory, faster construction | Better accuracy, more memory | +| `ef_construct` | Faster construction, lower quality | Better quality, slower construction | + +**Recommended configurations:** + +- **Small datasets (< 100K vectors)**: Default settings (`m=16`, `ef_construct=100`) +- **Medium datasets (100K - 1M vectors)**: `m=24`, `ef_construct=150` +- **Large datasets (> 1M vectors)**: `m=32`, `ef_construct=200` +- **High accuracy requirements**: `m=48`, `ef_construct=300` + +## Limitations + +- Vector indexes only support columns with [VECTOR](/tidb-cloud-lake/sql/vector.md) data type +- The `distance` parameter is required; indexes without it will be ignored +- Quantization may introduce negligible errors in distance calculations (typically < 0.01%) +- Index size increases with higher `m` values (approximately `m * vector_dimension * 4 bytes` per vector) diff --git a/tidb-cloud-lake/sql/create-view.md b/tidb-cloud-lake/sql/create-view.md new file mode 100644 index 0000000000000..b717e296ce4b4 --- /dev/null +++ b/tidb-cloud-lake/sql/create-view.md @@ -0,0 +1,53 @@ +--- +title: CREATE VIEW +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new view based on a query; the Logical View does not store any physical data, when we access a logical view, it will convert the sql into the subquery format to finish it. + +For example, if you create a Logical View like: + +```sql +CREATE VIEW view_t1 AS SELECT a, b FROM t1; +``` +And do a query like: +```sql +SELECT a FROM view_t1; +``` +the result equals the below query +```sql +SELECT a FROM (SELECT a, b FROM t1); +``` + +So, if you delete the table which the view depends on, it occurs an error that the original table does not exist. And you may need to drop the old view and recreate the new view you need. + +## Syntax + +```sql +CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] [ db. ]view_name [ (, ...) ] AS SELECT query +``` + +## Access control requirements + +To access a view, users only require the SELECT privilege on the view itself. + +Separate permissions are not required on the view’s underlying tables. This mechanism simplifies access control and enhances data security. + +## Examples + +```sql +CREATE VIEW tmp_view(c1, c2) AS SELECT number % 3 AS a, avg(number) FROM numbers(1000) GROUP BY a ORDER BY a; + +SELECT * FROM tmp_view; ++------+-------+ +| c1 | c2 | ++------+-------+ +| 0 | 499.5 | +| 1 | 499.0 | +| 2 | 500.0 | ++------+-------+ +``` diff --git a/tidb-cloud-lake/sql/create-warehouse.md b/tidb-cloud-lake/sql/create-warehouse.md new file mode 100644 index 0000000000000..26e9869a96639 --- /dev/null +++ b/tidb-cloud-lake/sql/create-warehouse.md @@ -0,0 +1,68 @@ +--- +title: CREATE WAREHOUSE +sidebar_position: 1 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a new warehouse for compute resources. 
+ +## Syntax + +```sql +CREATE WAREHOUSE [ IF NOT EXISTS ] + [ WITH ] warehouse_size = + [ WITH ] auto_suspend = + [ WITH ] initially_suspended = + [ WITH ] auto_resume = + [ WITH ] max_cluster_count = + [ WITH ] min_cluster_count = + [ WITH ] comment = '' + [ WITH ] TAG ( = '' [ , = '' , ... ] ) +``` + +| Parameter | Description | +| --------------- | --------------------------------------------------------------------------------------------- | +| `IF NOT EXISTS` | Optional. If specified, the command succeeds without changes if the warehouse already exists. | +| warehouse_name | 3–63 characters, containing only `A-Z`, `a-z`, `0-9`, and `-`. | + +## Options + +| Option | Type / Values | Default | Description | +| --------------------- | -------------------------------------------------------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | +| `WAREHOUSE_SIZE` | `XSmall`, `Small`, `Medium`, `Large`, `XLarge`, `2XLarge`–`6XLarge` (case-insensitive) | `Small` | Controls compute size. | +| `AUTO_SUSPEND` | `NULL`, `0`, or ≥300 seconds | `600` seconds | Idle timeout before automatic suspend. `0`/`NULL` means never suspend; values below 300 are rejected. | +| `INITIALLY_SUSPENDED` | Boolean | `FALSE` | If `TRUE`, the warehouse remains suspended after creation until explicitly resumed. | +| `AUTO_RESUME` | Boolean | `TRUE` | Controls whether incoming queries wake the warehouse automatically. | +| `MAX_CLUSTER_COUNT` | `NULL` or non-negative integer | `0` | Upper bound for auto-scaling clusters. `0` disables auto-scale. | +| `MIN_CLUSTER_COUNT` | `NULL` or non-negative integer | `0` | Lower bound for auto-scaling clusters; should be ≤ `MAX_CLUSTER_COUNT`. | +| `COMMENT` | String | Empty | Free-form text surfaced by `SHOW WAREHOUSES`. | +| `TAG` | Key-value pairs: `TAG ( key1 = 'value1', key2 = 'value2' )` | None | Resource tags for categorization and organization (similar to AWS tags). Used for cost allocation, environment identification, or team ownership. | + +- Options may appear in any order and may repeat (the later value wins). +- `AUTO_SUSPEND`, `MAX_CLUSTER_COUNT`, and `MIN_CLUSTER_COUNT` accept `= NULL` to reset to `0`. + +## Examples + +This example creates an XLarge warehouse with auto-scaling and custom settings: + +```sql +CREATE WAREHOUSE IF NOT EXISTS etl_wh + WITH warehouse_size = XLarge + auto_suspend = 600 + initially_suspended = TRUE + auto_resume = FALSE + max_cluster_count = 4 + min_cluster_count = 2 + comment = 'Nightly ETL warehouse' + TAG (environment = 'production', team = 'data-engineering', cost_center = 'analytics'); +``` + +This example creates a basic Small warehouse: + +```sql +CREATE WAREHOUSE my_warehouse + WITH warehouse_size = Small; +``` diff --git a/tidb-cloud-lake/sql/create-workload-group.md b/tidb-cloud-lake/sql/create-workload-group.md new file mode 100644 index 0000000000000..f4d4155cb354e --- /dev/null +++ b/tidb-cloud-lake/sql/create-workload-group.md @@ -0,0 +1,93 @@ +--- +title: CREATE WORKLOAD GROUP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a workload group with specified quota settings. Workload groups control resource allocation and query concurrency by binding to users. When a user submits queries, the workload group limits are applied based on the user's assigned group. 
+ +## Syntax + +```sql +CREATE WORKLOAD GROUP [IF NOT EXISTS] +[WITH cpu_quota = '', query_timeout = ''] +``` + +## Parameters + +| Parameter | Type | Required | Default | Description | +|------------------------|----------|----------|--------------|-----------------------------------------------------------------------------| +| `cpu_quota` | string | No | (unlimited) | CPU resource quota as percentage string (e.g. `"20%"`) | +| `query_timeout` | duration | No | (unlimited) | Query timeout duration (units: `s`/`sec`=seconds, `m`/`min`=minutes, `h`/`hour`=hours, `d`/`day`=days, `ms`=milliseconds, unitless=seconds) | +| `memory_quota` | string or integer | No | (unlimited) | Maximum memory usage limit for workload group (percentage or absolute value) | +| `max_concurrency` | integer | No | (unlimited) | Maximum concurrency number for workload group | +| `query_queued_timeout` | duration | No | (unlimited) | Maximum queuing wait time when workload group exceeds max concurrency (units: `s`/`sec`=seconds, `m`/`min`=minutes, `h`/`hour`=hours, `d`/`day`=days, `ms`=milliseconds, unitless=seconds) | + +## Examples + +### Basic Example + +```sql +-- Create workload groups +CREATE WORKLOAD GROUP IF NOT EXISTS interactive_queries +WITH cpu_quota = '30%', memory_quota = '20%', max_concurrency = 2; + +CREATE WORKLOAD GROUP IF NOT EXISTS batch_processing +WITH cpu_quota = '70%', memory_quota = '80%', max_concurrency = 10; +``` + +### User Assignment + +Users must be assigned to workload groups to enable resource limiting. When users execute queries, the system applies the workload group's restrictions automatically. + +```sql +-- Create role and grant permissions +CREATE ROLE analytics_role; +GRANT ALL ON *.* TO ROLE analytics_role; +CREATE USER analytics_user IDENTIFIED BY 'password123' WITH DEFAULT_ROLE = 'analytics_role'; +GRANT ROLE analytics_role TO analytics_user; + +-- Assign user to workload group +ALTER USER analytics_user WITH SET WORKLOAD GROUP = 'interactive_queries'; + +-- Reassign to different workload group +ALTER USER analytics_user WITH SET WORKLOAD GROUP = 'batch_processing'; + +-- Remove from workload group (user will use default unlimited resources) +ALTER USER analytics_user WITH UNSET WORKLOAD GROUP; + +-- Check user's workload group +DESC USER analytics_user; +``` + +## Resource Quota Normalization + +### Quota Limits +- Each workload group's `cpu_quota` and `memory_quota` can be set up to `100%` (1.0) +- The total sum of all quotas across workload groups can exceed 100% +- Actual resource allocation is **normalized** based on relative proportions + +### How Quota Normalization Works + +Resources are allocated proportionally based on each group's quota relative to the total: + +``` +Actual Allocation = (Group Quota) / (Sum of All Group Quotas) × 100% +``` + +**Example 1: Total quotas = 100%** +- Group A: 30% quota → Gets 30% of resources (30/100) +- Group B: 70% quota → Gets 70% of resources (70/100) + +**Example 2: Total quotas > 100%** +- Group A: 60% quota → Gets 40% of resources (60/150) +- Group B: 90% quota → Gets 60% of resources (90/150) +- Total quotas: 150% + +**Example 3: Total quotas < 100%** +- Group A: 20% quota → Gets 67% of resources (20/30) +- Group B: 10% quota → Gets 33% of resources (10/30) +- Total quotas: 30% + +**Special Case:** When only one workload group exists, it gets 100% of warehouse resources regardless of its configured quota. 
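+
+The normalization arithmetic from Example 2 can be checked directly in SQL (the numbers are the same illustrative quotas as above):
+
+```sql
+-- Two groups with quotas of 60% and 90% (total 150%)
+SELECT
+    0.60 / (0.60 + 0.90) AS group_a_share,  -- 0.4, i.e. 40% of resources
+    0.90 / (0.60 + 0.90) AS group_b_share;  -- 0.6, i.e. 60% of resources
+```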
diff --git a/tidb-cloud-lake/sql/cume-dist.md b/tidb-cloud-lake/sql/cume-dist.md new file mode 100644 index 0000000000000..4ce063216fd13 --- /dev/null +++ b/tidb-cloud-lake/sql/cume-dist.md @@ -0,0 +1,68 @@ +--- +title: CUME_DIST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the cumulative distribution of each row's value. Returns the fraction of rows with values less than or equal to the current row's value. + +See also: [PERCENT_RANK](/tidb-cloud-lake/sql/percent-rank.md) + +## Syntax + +```sql +CUME_DIST() +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the distribution order +- `ASC | DESC`: Optional. Sort direction (default: ASC) + +**Notes:** +- Returns values between 0 and 1 (exclusive of 0, inclusive of 1) +- Formula: (number of rows ≤ current value) / (total rows) +- Always returns 1.0 for the highest value(s) +- Useful for calculating percentiles and cumulative percentages + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 95), + ('Bob', 87), + ('Charlie', 87), + ('David', 82), + ('Eve', 78); +``` + +**Calculate cumulative distribution (showing what percentage of students scored at or below each score):** + +```sql +SELECT student, score, + CUME_DIST() OVER (ORDER BY score) AS cume_dist, + ROUND(CUME_DIST() OVER (ORDER BY score) * 100) AS cumulative_percent +FROM scores +ORDER BY score; +``` + +Result: +``` +student | score | cume_dist | cumulative_percent +--------+-------+-----------+------------------- +Eve | 78 | 0.2 | 20 +David | 82 | 0.4 | 40 +Bob | 87 | 0.8 | 80 +Charlie | 87 | 0.8 | 80 +Alice | 95 | 1.0 | 100 \ No newline at end of file diff --git a/tidb-cloud-lake/sql/current-catalog.md b/tidb-cloud-lake/sql/current-catalog.md new file mode 100644 index 0000000000000..8641dd92cebaa --- /dev/null +++ b/tidb-cloud-lake/sql/current-catalog.md @@ -0,0 +1,26 @@ +--- +title: CURRENT_CATALOG +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the name of the catalog currently in use for the session. + +## Syntax + +```sql +CURRENT_CATALOG() +``` + +## Examples + +```sql +SELECT CURRENT_CATALOG(); + +┌───────────────────┐ +│ current_catalog() │ +├───────────────────┤ +│ default │ +└───────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/current-timestamp.md b/tidb-cloud-lake/sql/current-timestamp.md new file mode 100644 index 0000000000000..f91006440fcbe --- /dev/null +++ b/tidb-cloud-lake/sql/current-timestamp.md @@ -0,0 +1,5 @@ +--- +title: CURRENT_TIMESTAMP +--- + +Alias for [NOW](/tidb-cloud-lake/sql/now.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/current-user.md b/tidb-cloud-lake/sql/current-user.md new file mode 100644 index 0000000000000..263ea9e88c262 --- /dev/null +++ b/tidb-cloud-lake/sql/current-user.md @@ -0,0 +1,23 @@ +--- +title: CURRENT_USER +--- + +Returns the user name and host name combination for the account that the server used to authenticate the current client. This account determines your access privileges. The return value is a string in the utf8 character set. 
+ +## Syntax + +```sql +CURRENT_USER() +``` + +## Examples + +```sql +SELECT CURRENT_USER(); + +┌────────────────┐ +│ current_user() │ +├────────────────┤ +│ 'root'@'%' │ +└────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/data-anonymization-functions.md b/tidb-cloud-lake/sql/data-anonymization-functions.md new file mode 100644 index 0000000000000..b12eb66ce0c43 --- /dev/null +++ b/tidb-cloud-lake/sql/data-anonymization-functions.md @@ -0,0 +1,27 @@ +--- +title: Data Anonymization Functions +--- + +Data anonymization is the process of altering or removing personally identifiable information (PII) from data sets to protect individual privacy. Its goal is to transform data so it cannot be linked back to specific individuals, while preserving the data's utility for analysis, research, and testing. + +### Common Data Categories for Anonymization + +Effective anonymization strategies typically target specific categories of sensitive data: + +* **Direct Identifiers (PII)**: Information that explicitly identifies a person, such as full names, email addresses, phone numbers, and government IDs. +* **Indirect Identifiers (Quasi-Identifiers)**: Attributes that can identify individuals when combined with other data sources, such as dates of birth, gender, zip codes, or job titles. +* **Sensitive Business Data**: Confidential information like financial transactions, salary details, or proprietary internal records that need protection in non-production environments. + +### Databend Anonymization Techniques + +Databend provides a set of functions to implement various anonymization techniques, including data masking, pseudonymization, and synthetic data generation: + +- **Data Masking**: Use the [`OBFUSCATE` table function](/tidb-cloud-lake/sql/obfuscate.md) to automatically apply masking rules to columns, replacing original values with artificial ones that appear genuine. +- **Pseudonymization**: Use [FEISTEL_OBFUSCATE](/tidb-cloud-lake/sql/feistel-obfuscate.md) to replace identifiers with deterministic substitutes. This preserves data integrity and cardinality, making it suitable for maintaining join keys. +- **Synthetic Data**: Use [MARKOV_TRAIN](/tidb-cloud-lake/sql/markov-train.md) and [MARKOV_GENERATE](/tidb-cloud-lake/sql/markov-generate.md) to produce machine-generated data that statistically resembles the original dataset but has no direct connection to real records. + +| Function | Description | +|----------|-------------| +| [MARKOV_GENERATE](/tidb-cloud-lake/sql/markov-generate.md) | Generate anonymized strings based on a Markov model | +| [FEISTEL_OBFUSCATE](/tidb-cloud-lake/sql/feistel-obfuscate.md) | Obfuscate numbers using a Feistel cipher | +| [OBFUSCATE](/tidb-cloud-lake/sql/obfuscate.md) | Table-level masking using built-in rules | diff --git a/tidb-cloud-lake/sql/data-types.md b/tidb-cloud-lake/sql/data-types.md new file mode 100644 index 0000000000000..eb122ead3b0cf --- /dev/null +++ b/tidb-cloud-lake/sql/data-types.md @@ -0,0 +1,104 @@ +--- +title: Data Types +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend stores data in strongly typed columns. This page summarizes the supported data types, how automatic/explicit conversions work, and what happens with NULL or default values. 
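+
+As a quick preview of the conversion behavior detailed below (the literals are arbitrary):
+
+```sql
+SELECT CAST('42' AS INT);        -- 42
+SELECT '42'::INT;                -- 42, PostgreSQL-style shorthand
+SELECT TRY_CAST('abc' AS INT);   -- NULL instead of a conversion error
+```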
+ +## Foundational Types + +| Data Type | Alias | Storage / Resolution | Min Value | Max Value | +|--------------------------------------------|------------|-----------------------------------|--------------------------|--------------------------------| +| [BOOLEAN](/tidb-cloud-lake/sql/boolean.md) | BOOL | 1 byte | – | – | +| [BINARY](/tidb-cloud-lake/sql/binary.md) | VARBINARY | variable | – | – | +| [VARCHAR](/tidb-cloud-lake/sql/string.md) | STRING | variable | – | – | +| [TINYINT](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | INT8 | 1 byte | -128 | 127 | +| [SMALLINT](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | INT16 | 2 bytes | -32768 | 32767 | +| [INT](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | INT32 | 4 bytes | -2147483648 | 2147483647 | +| [BIGINT](/tidb-cloud-lake/sql/numeric.md#integer-data-types) | INT64 | 8 bytes | -9223372036854775808 | 9223372036854775807 | +| [FLOAT](/tidb-cloud-lake/sql/numeric.md#floating-point-data-types) | – | 4 bytes (Float32) | -3.40e38 | 3.40e38 | +| [DOUBLE](/tidb-cloud-lake/sql/numeric.md#floating-point-data-types) | – | 8 bytes (Float64) | -1.79e308 | 1.79e308 | +| [DECIMAL](/tidb-cloud-lake/sql/decimal.md) | – | 16/32 bytes (precision ≤38/76) | `-(10^P-1)/10^S` | `(10^P-1)/10^S` | + +## Date & Time Types + +| Data Type | Alias | Resolution / Notes | +|---------------------------|-----------|--------------------------------------| +| [DATE](/tidb-cloud-lake/sql/datetime.md) | – | Day precision | +| [TIMESTAMP](/tidb-cloud-lake/sql/datetime.md) | DATETIME | Microsecond, session timezone output | +| [TIMESTAMP_TZ](/tidb-cloud-lake/sql/datetime.md) | – | Microsecond + stored offset | +| [INTERVAL](/tidb-cloud-lake/sql/interval.md) | – | Microseconds, supports negative span | + +## Structured & Semi-Structured Types + +| Data Type | Sample | Description | +|-----------------------|----------------------------------------|-------------| +| [ARRAY](/tidb-cloud-lake/sql/array.md) | `[1, 2, 3]` | Ordered list of values with the same inner type. | +| [TUPLE](/tidb-cloud-lake/sql/tuple.md) | `('2023-02-14','Valentine's Day')` | Fixed-length ordered list with declared element types. | +| [MAP](/tidb-cloud-lake/sql/map.md) | `{'a': 1, 'b': 2}` | Key-value collection (internally tuples of key and value types). | +| [VARIANT](/tidb-cloud-lake/sql/variant.md) | `[1, {"name":"databend"}]` | JSON-like container that can mix primitives, arrays, and objects. | +| [BITMAP](/tidb-cloud-lake/sql/bitmap.md) | `` | Compressed bitmap optimized for membership and set operations. | + +## Domain-Specific Types + +| Data Type | Description | +|------------------------------------|-------------| +| [VECTOR](/tidb-cloud-lake/sql/vector.md) | Float32 embeddings for similarity search / ML workloads. | +| [GEOMETRY](/tidb-cloud-lake/sql/geospatial.md) / GEOGRAPHY | Spatial objects stored in WKB/EWKB format. | + +## Casting and Conversion + +### Explicit Casting + +- `CAST(expr AS TYPE)` uses ANSI syntax and fails when conversion is invalid. +- `expr::TYPE` is the PostgreSQL-style shorthand. +- `TRY_CAST(expr AS TYPE)` returns NULL instead of raising an error when conversion fails. + +### Implicit Casting (Coercion) + +Databend performs automatic conversions in well-defined situations: + +1. Integers upcast to `INT64`. Example: `UInt8 -> INT64`. +2. Numeric values upcast to `FLOAT64` when necessary. +3. Any type `T` can become `Nullable(T)` if a NULL appears in an expression. +4. All types can upcast to `VARIANT`. +5. 
Complex types coerce element-wise (`Array<T> -> Array<U>` when `T -> U`; the same applies to tuples and maps).
+
+If your data may contain NULLs, handle them explicitly before writing into a `NOT NULL` column, for example by casting to a `Nullable` type or by using `TRY_CAST`.
+
+```sql
+SELECT CONCAT('1', col); -- safe (strings)
+SELECT CONCAT(1, col);   -- may fail if `col` can't coerce to a number
+```
+
+## NULL Handling and Defaults
+
+Columns allow NULL values unless declared `NOT NULL`. When a `NOT NULL` column is omitted during INSERT, Databend writes a type-specific default value:
+
+| Type Category             | Default |
+|---------------------------|---------|
+| Integer                   | `0` |
+| Floating-point            | `0.0` |
+| String / Binary           | empty string / empty binary |
+| Date                      | `1970-01-01` |
+| Timestamp                 | `1970-01-01 00:00:00` |
+| Boolean                   | `FALSE` |
+
+Example:
+
+```sql
+CREATE TABLE test (
+    id INT64,
+    name STRING NOT NULL,
+    age INT32
+);
+
+INSERT INTO test (id, name, age) VALUES (2, 'Alice', NULL); -- allowed
+INSERT INTO test (id, name) VALUES (1, 'John');             -- age becomes NULL
+INSERT INTO test (id, age) VALUES (3, 45);                  -- name uses default ''
+```
+
+Use `DESC test` or `SHOW CREATE TABLE test` to inspect column defaults and nullability at any time.
diff --git a/tidb-cloud-lake/sql/database-sql.md b/tidb-cloud-lake/sql/database-sql.md
new file mode 100644
index 0000000000000..24d5c279f17ee
--- /dev/null
+++ b/tidb-cloud-lake/sql/database-sql.md
@@ -0,0 +1,23 @@
+---
+title: DATABASE
+---
+
+Returns the name of the currently selected database. If no database is selected, the function returns `default`.
+
+## Syntax
+
+```sql
+DATABASE()
+```
+
+## Examples
+
+```sql
+SELECT DATABASE();
+
+┌────────────┐
+│ database() │
+├────────────┤
+│ default    │
+└────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/date-add.md b/tidb-cloud-lake/sql/date-add.md
new file mode 100644
index 0000000000000..01e5d2628a23c
--- /dev/null
+++ b/tidb-cloud-lake/sql/date-add.md
@@ -0,0 +1,97 @@
+---
+title: DATE_ADD
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Adds a specified time interval to a DATE or TIMESTAMP value.
+
+## Syntax
+
+```sql
+DATE_ADD(<unit>, <value>, <date_or_time>)
+```
+
+| Parameter         | Description                                                                                          |
+|-------------------|------------------------------------------------------------------------------------------------------|
+| `<unit>`          | Specifies the time unit: `YEAR`, `QUARTER`, `MONTH`, `WEEK`, `DAY`, `HOUR`, `MINUTE`, or `SECOND`.   |
+| `<value>`         | The interval to add, e.g., 2 for 2 days if the unit is `DAY`.                                        |
+| `<date_or_time>`  | A value of `DATE` or `TIMESTAMP` type.                                                               |
+
+## Return Type
+
+DATE or TIMESTAMP (depending on the type of `<date_or_time>`).
+
+## Examples
+
+This example adds different time intervals (year, quarter, month, week, and day) to the current date:
+
+```sql
+SELECT
+    TODAY(),
+    DATE_ADD(YEAR, 1, TODAY()),
+    DATE_ADD(QUARTER, 1, TODAY()),
+    DATE_ADD(MONTH, 1, TODAY()),
+    DATE_ADD(WEEK, 1, TODAY()),
+    DATE_ADD(DAY, 1, TODAY());
+
+-[ RECORD 1 ]-----------------------------------
+                      today(): 2024-10-10
+   DATE_ADD(YEAR, 1, today()): 2025-10-10
+DATE_ADD(QUARTER, 1, today()): 2025-01-10
+  DATE_ADD(MONTH, 1, today()): 2024-11-10
+   DATE_ADD(WEEK, 1, today()): 2024-10-17
+    DATE_ADD(DAY, 1, today()): 2024-10-11
+```
+
+This example adds different time intervals (hour, minute, and second) to the current timestamp:
+
+```sql
+SELECT
+    NOW(),
+    DATE_ADD(HOUR, 1, NOW()),
+    DATE_ADD(MINUTE, 1, NOW()),
+    DATE_ADD(SECOND, 1, NOW());
+
+-[ RECORD 1 ]-----------------------------------
+                     now(): 2024-10-10 01:35:33.601312
+  DATE_ADD(HOUR, 1, now()): 2024-10-10 02:35:33.601312
+DATE_ADD(MINUTE, 1, now()): 2024-10-10 01:36:33.601312
+DATE_ADD(SECOND, 1, now()): 2024-10-10 01:35:34.601312
+```
+
+:::note
+When the unit is `MONTH`, if the date is the last day of the month, or if the resulting month has fewer days than the day component of the date, the result is the last day of the resulting month. Otherwise, the result keeps the same day component as the date.
+
+When adding a month to a date that would result in an invalid date (e.g., January 31 → February 31), it returns the last valid day of the resulting month:
+
+```sql
+SELECT DATE_ADD(month, 1, '2023-01-31'::DATE);
+╭────────────────────────────────────────╮
+│ DATE_ADD(MONTH, 1, '2023-01-31'::DATE) │
+│                  Date                  │
+├────────────────────────────────────────┤
+│ 2023-02-28                             │
+╰────────────────────────────────────────╯
+```
+
+When adding a month to a date where the resulting month has sufficient days, it performs simple month arithmetic:
+
+```sql
+SELECT DATE_ADD(month, 1, '2023-02-28'::DATE);
+╭────────────────────────────────────────╮
+│ DATE_ADD(MONTH, 1, '2023-02-28'::DATE) │
+│                  Date                  │
+├────────────────────────────────────────┤
+│ 2023-03-28                             │
+╰────────────────────────────────────────╯
+```
+:::
+
+## See Also
+
+- [ADD_MONTHS](/tidb-cloud-lake/sql/add-months.md): Adds months to a date while preserving end-of-month days
+- [DATE_SUB](/tidb-cloud-lake/sql/date-sub.md): Subtracts time intervals from a date
diff --git a/tidb-cloud-lake/sql/date-between.md b/tidb-cloud-lake/sql/date-between.md
new file mode 100644
index 0000000000000..06f3c7af7a57a
--- /dev/null
+++ b/tidb-cloud-lake/sql/date-between.md
@@ -0,0 +1,72 @@
+---
+title: DATE_BETWEEN
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Calculates the time interval between two dates or timestamps and returns the difference as an integer in the specified unit. Positive values indicate that the first time is earlier than the second; negative values indicate the opposite.
+
+See also: [DATE_DIFF](/tidb-cloud-lake/sql/date-diff.md)
+
+## Syntax
+
+```sql
+DATE_BETWEEN(
+  YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND |
+  DOW | DOY | EPOCH | ISODOW | YEARWEEK | MILLENNIUM,
+  <start_date>,
+  <end_date>
+)
+```
+
+| Keyword      | Description                                                               |
+|--------------|---------------------------------------------------------------------------|
+| `DOW`        | Day of the Week. Sunday (0) through Saturday (6).                         |
+| `DOY`        | Day of the Year. 1 through 366.                                           |
+| `EPOCH`      | The number of seconds since 1970-01-01 00:00:00.                          |
+| `ISODOW`     | ISO Day of the Week. Monday (1) through Sunday (7).                       
|
+| `YEARWEEK`   | The year and week number combined, following ISO 8601 (e.g., 202415).    |
+| `MILLENNIUM` | The millennium of the date (1 for years 1–1000, 2 for 1001–2000, etc.).  |
+
+## DATE_DIFF vs. DATE_BETWEEN
+
+The `DATE_DIFF` function counts how many boundaries of a user-specified unit (such as day, month, or year) are crossed between two dates, while `DATE_BETWEEN` counts how many complete units fall strictly between them. For example:
+
+```sql
+SELECT
+  DATE_DIFF(month, '2025-07-31', '2025-10-01'),    -- returns 3
+  DATE_BETWEEN(month, '2025-07-31', '2025-10-01'); -- returns 2
+```
+
+In this example, `DATE_DIFF` returns `3` because the range crosses three month boundaries (July → August → September → October), while `DATE_BETWEEN` returns `2` because there are two full months between the dates: August and September.
+
+## Examples
+
+This example calculates the difference between a fixed timestamp (`2020-01-01 00:00:00`) and the current timestamp (`NOW()`), across various units such as year, ISO weekday, year-week, and millennium:
+
+```sql
+SELECT
+    DATE_BETWEEN(YEAR, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_year,
+    DATE_BETWEEN(QUARTER, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_quarter,
+    DATE_BETWEEN(MONTH, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_month,
+    DATE_BETWEEN(WEEK, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_week,
+    DATE_BETWEEN(DAY, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_day,
+    DATE_BETWEEN(HOUR, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_hour,
+    DATE_BETWEEN(MINUTE, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_minute,
+    DATE_BETWEEN(SECOND, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_second,
+    DATE_BETWEEN(DOW, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_dow,
+    DATE_BETWEEN(DOY, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_doy,
+    DATE_BETWEEN(EPOCH, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_epoch,
+    DATE_BETWEEN(ISODOW, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_isodow,
+    DATE_BETWEEN(YEARWEEK, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_yearweek,
+    DATE_BETWEEN(MILLENNIUM, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_millennium;
+```
+
+```sql
+┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ diff_year │ diff_quarter │ diff_month │ diff_week │ diff_day │ diff_hour │ diff_minute │ diff_second │ diff_dow │ diff_doy │ diff_epoch │ diff_isodow │ diff_yearweek │ diff_millennium │
+├───────────┼──────────────┼────────────┼───────────┼──────────┼───────────┼─────────────┼─────────────┼──────────┼──────────┼────────────┼─────────────┼───────────────┼─────────────────┤
+│         5 │           21 │         63 │       276 │     1933 │     46414 │     2784887 │   167093234 │     1933 │     1933 │  167093234 │        1933 │           276 │               0 │
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/date-diff.md b/tidb-cloud-lake/sql/date-diff.md
new file mode 100644
index 0000000000000..6b5e089dd5756
--- /dev/null
+++ b/tidb-cloud-lake/sql/date-diff.md
@@ -0,0 +1,72 @@
+---
+title: DATE_DIFF
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Calculates the difference between two dates or timestamps based on a specified time unit. The result is positive if the second date or timestamp is after the first, and negative if it is before.
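+
+For example (a minimal sketch of the sign convention; the dates are arbitrary), swapping the argument order flips the sign:
+
+```sql
+SELECT
+    DATE_DIFF(DAY, '2024-06-01', '2024-06-04'),  -- 3: the second date is later
+    DATE_DIFF(DAY, '2024-06-04', '2024-06-01');  -- -3: the second date is earlier
+```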
+ +See also: [DATE_BETWEEN](/tidb-cloud-lake/sql/date-between.md) + +## Syntax + +```sql +DATE_DIFF( + YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | + DOW | DOY | EPOCH | ISODOW | YEARWEEK | MILLENNIUM, + , + +) +``` + +| Keyword | Description | +|--------------|-------------------------------------------------------------------------| +| `DOW` | Day of the Week. Sunday (0) through Saturday (6). | +| `DOY` | Day of the Year. 1 through 366. | +| `EPOCH` | The number of seconds since 1970-01-01 00:00:00. | +| `ISODOW` | ISO Day of the Week. Monday (1) through Sunday (7). | +| `YEARWEEK` | The year and week number combined, following ISO 8601 (e.g., 202415). | +| `MILLENNIUM` | The millennium of the date (1 for years 1–1000, 2 for 1001–2000, etc.). | + +## DATE_DIFF vs. DATE_BETWEEN + +The `DATE_DIFF` function counts how many boundaries of a user-specified unit (such as day, month, or year) are crossed between two dates, while `DATE_BETWEEN` counts how many complete units fall strictly between them. For example: + +```sql +SELECT + DATE_DIFF(month, '2025-07-31', '2025-10-01'), -- returns 3 + DATE_BETWEEN(month, '2025-07-31', '2025-10-01'); -- returns 2 +``` + +In this example, `DATE_DIFF` returns `3` because the range crosses three month boundaries (July → August → September → October), while `DATE_BETWEEN` returns `2` because there are two full months between the dates: August and September. + +## Examples + +This example calculates the difference between a fixed timestamp (`2020-01-01 00:00:00`) and the current timestamp (`NOW()`), across various units such as year, ISO weekday, year-week, and millennium: + +```sql +SELECT + DATE_DIFF(YEAR, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_year, + DATE_DIFF(QUARTER, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_quarter, + DATE_DIFF(MONTH, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_month, + DATE_DIFF(WEEK, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_week, + DATE_DIFF(DAY, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_day, + DATE_DIFF(HOUR, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_hour, + DATE_DIFF(MINUTE, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_minute, + DATE_DIFF(SECOND, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_second, + DATE_DIFF(DOW, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_dow, + DATE_DIFF(DOY, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_doy, + DATE_DIFF(EPOCH, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_epoch, + DATE_DIFF(ISODOW, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_isodow, + DATE_DIFF(YEARWEEK, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_yearweek, + DATE_DIFF(MILLENNIUM, TIMESTAMP '2020-01-01 00:00:00', NOW()) AS diff_millennium; +``` + +```sql +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ diff_year │ diff_quarter │ diff_month │ diff_week │ diff_day │ diff_hour │ diff_minute │ diff_second │ diff_dow │ diff_doy │ diff_epoch │ diff_isodow │ diff_yearweek │ diff_millennium │ +├───────────┼──────────────┼────────────┼───────────┼──────────┼───────────┼─────────────┼─────────────┼──────────┼──────────┼────────────┼─────────────┼───────────────┼─────────────────┤ +│ 5 │ 21 │ 63 │ 276 │ 1932 │ 46386 │ 2783184 │ 166991069 │ 1932 │ 1932 │ 166991069 │ 1932 │ 515 │ 0 │ 
+└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/date-format.md b/tidb-cloud-lake/sql/date-format.md new file mode 100644 index 0000000000000..fee581e7ecc31 --- /dev/null +++ b/tidb-cloud-lake/sql/date-format.md @@ -0,0 +1,5 @@ +--- +title: DATE_FORMAT +--- + +Alias for [TO_STRING](/tidb-cloud-lake/sql/to-string.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/date-part.md b/tidb-cloud-lake/sql/date-part.md new file mode 100644 index 0000000000000..6f66b73ca55f0 --- /dev/null +++ b/tidb-cloud-lake/sql/date-part.md @@ -0,0 +1,64 @@ +--- +title: DATE_PART +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Retrieves the designated portion of a date or timestamp. + +See also: [EXTRACT](/tidb-cloud-lake/sql/extract.md) + +## Syntax + +```sql +DATE_PART( + YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | + DOW | DOY | EPOCH | ISODOW | YEARWEEK | MILLENNIUM, + +) +``` + +| Keyword | Description | +|--------------|-------------------------------------------------------------------------| +| `DOW` | Day of the Week. Sunday (0) through Saturday (6). | +| `DOY` | Day of the Year. 1 through 366. | +| `EPOCH` | The number of seconds since 1970-01-01 00:00:00. | +| `ISODOW` | ISO Day of the Week. Monday (1) through Sunday (7). | +| `YEARWEEK` | The year and week number combined, following ISO 8601 (e.g., 202415). | +| `MILLENNIUM` | The millennium of the date (1 for years 1–1000, 2 for 1001–2000, etc.). | + +## Return Type + +Integer. + +## Examples + +This example demonstrates how to use DATE_PART to extract various components—such as year, month, ISO week day, year-week combination, and millennium—from the current timestamp: + +```sql +SELECT + DATE_PART(YEAR, NOW()) AS year_part, + DATE_PART(QUARTER, NOW()) AS quarter_part, + DATE_PART(MONTH, NOW()) AS month_part, + DATE_PART(WEEK, NOW()) AS week_part, + DATE_PART(DAY, NOW()) AS day_part, + DATE_PART(HOUR, NOW()) AS hour_part, + DATE_PART(MINUTE, NOW()) AS minute_part, + DATE_PART(SECOND, NOW()) AS second_part, + DATE_PART(DOW, NOW()) AS dow_part, + DATE_PART(DOY, NOW()) AS doy_part, + DATE_PART(EPOCH, NOW()) AS epoch_part, + DATE_PART(ISODOW, NOW()) AS isodow_part, + DATE_PART(YEARWEEK, NOW()) AS yearweek_part, + DATE_PART(MILLENNIUM, NOW()) AS millennium_part; +``` + +```sql +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ year_part │ quarter_part │ month_part │ week_part │ day_part │ hour_part │ minute_part │ second_part │ dow_part │ doy_part │ epoch_part │ isodow_part │ yearweek_part │ millennium_part │ +├───────────┼──────────────┼────────────┼───────────┼──────────┼───────────┼─────────────┼─────────────┼──────────┼──────────┼───────────────────┼─────────────┼───────────────┼─────────────────┤ +│ 2025 │ 2 │ 4 │ 16 │ 16 │ 18 │ 10 │ 10 │ 3 │ 106 │ 1744827010.257671 │ 3 │ 202516 │ 3 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/date-sub.md b/tidb-cloud-lake/sql/date-sub.md new file mode 100644 index 
0000000000000..d192613ce0c8a
--- /dev/null
+++ b/tidb-cloud-lake/sql/date-sub.md
@@ -0,0 +1,34 @@
+---
+title: DATE_SUB
+---
+
+Subtracts a specified time interval from a DATE or TIMESTAMP value.
+
+## Syntax
+
+```sql
+DATE_SUB(<unit>, <value>, <date_or_time>)
+```
+
+## Arguments
+
+| Arguments         | Description                                                                                                         |
+|-------------------|---------------------------------------------------------------------------------------------------------------------|
+| `<unit>`          | Must be one of the following values: `YEAR`, `QUARTER`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, or `SECOND`              |
+| `<value>`         | The number of units of time to subtract. For example, to subtract 2 days, set this to 2.                           |
+| `<date_or_time>`  | A value of `DATE` or `TIMESTAMP` type                                                                               |
+
+## Return Type
+
+The function returns a value of the same type as the `<date_or_time>` argument.
+
+## Examples
+
+```sql
+SELECT date_sub(YEAR, 1, to_date('2018-01-02'));
+
+┌──────────────────────────────────────────┐
+│ date_sub(year, 1, to_date('2018-01-02')) │
+├──────────────────────────────────────────┤
+│ 2017-01-02                               │
+└──────────────────────────────────────────┘
+```
diff --git a/tidb-cloud-lake/sql/date-time-functions.md b/tidb-cloud-lake/sql/date-time-functions.md
new file mode 100644
index 0000000000000..bb9f7768a0257
--- /dev/null
+++ b/tidb-cloud-lake/sql/date-time-functions.md
@@ -0,0 +1,89 @@
+---
+title: Date & Time Functions
+---
+
+This page provides a comprehensive overview of Date & Time functions in Databend, organized by functionality for easy reference.
+
+## Current Date & Time Functions
+
+| Function | Description | Example |
+|-------------------------------------------|-----------------------------------|------------------------------------------------------|
+| [NOW](/tidb-cloud-lake/sql/now.md) | Returns the current date and time | `NOW()` → `2024-06-04 17:42:31.123456` |
+| [CURRENT_TIMESTAMP](/tidb-cloud-lake/sql/current-timestamp.md) | Returns the current date and time | `CURRENT_TIMESTAMP()` → `2024-06-04 17:42:31.123456` |
+| [TODAY](/tidb-cloud-lake/sql/today.md) | Returns the current date | `TODAY()` → `2024-06-04` |
+| [TOMORROW](/tidb-cloud-lake/sql/tomorrow.md) | Returns tomorrow's date | `TOMORROW()` → `2024-06-05` |
+| [YESTERDAY](/tidb-cloud-lake/sql/yesterday.md) | Returns yesterday's date | `YESTERDAY()` → `2024-06-03` |
+
+## Date & Time Extraction Functions
+
+| Function | Description | Example |
+|-----------------------------------------------|--------------------------------------|------------------------------------------|
+| [YEAR](/tidb-cloud-lake/sql/year.md) | Extracts the year from a date | `YEAR('2024-06-04')` → `2024` |
+| [MONTH](/tidb-cloud-lake/sql/month.md) | Extracts the month from a date | `MONTH('2024-06-04')` → `6` |
+| [DAY](/tidb-cloud-lake/sql/day.md) | Extracts the day from a date | `DAY('2024-06-04')` → `4` |
+| [QUARTER](/tidb-cloud-lake/sql/quarter.md) | Extracts the quarter from a date | `QUARTER('2024-06-04')` → `2` |
+| [WEEK](/tidb-cloud-lake/sql/week.md) / [WEEKOFYEAR](/tidb-cloud-lake/sql/weekofyear.md) | Extracts the week number from a date | `WEEK('2024-06-04')` → `23` |
+| [EXTRACT](/tidb-cloud-lake/sql/extract.md) | Extracts a part from a date | `EXTRACT(MONTH FROM '2024-06-04')` → `6` |
+| [DATE_PART](/tidb-cloud-lake/sql/date-part.md) | Extracts a part from a date | `DATE_PART('month', '2024-06-04')` → `6` |
+| [YEARWEEK](/tidb-cloud-lake/sql/yearweek.md) | Returns year and week number | `YEARWEEK('2024-06-04')` → `202423` |
+| [MILLENNIUM](/tidb-cloud-lake/sql/millennium.md) | 
Returns the millennium from a date | `MILLENNIUM('2024-06-04')` → `3` | + +## Date & Time Conversion Functions + +| Function | Description | Example | +|-------------------------------------------|---------------------------------------------|---------------------------------------------------------------| +| [DATE](/tidb-cloud-lake/sql/date.md) | Converts a value to DATE type | `DATE('2024-06-04')` → `2024-06-04` | +| [TO_DATE](/tidb-cloud-lake/sql/date.md) | Converts a string to DATE type | `TO_DATE('2024-06-04')` → `2024-06-04` | +| [TO_DATETIME](/tidb-cloud-lake/sql/datetime.md) | Converts a string to DATETIME type | `TO_DATETIME('2024-06-04 12:30:45')` → `2024-06-04 12:30:45` | +| [TO_TIMESTAMP](/tidb-cloud-lake/sql/timestamp.md) | Converts a string to TIMESTAMP type | `TO_TIMESTAMP('2024-06-04 12:30:45')` → `2024-06-04 12:30:45` | +| [TO_UNIX_TIMESTAMP](/tidb-cloud-lake/sql/unix-timestamp.md) | Converts a date to Unix timestamp | `TO_UNIX_TIMESTAMP('2024-06-04')` → `1717516800` | +| [TO_YYYYMM](/tidb-cloud-lake/sql/yyyymm.md) | Formats date as YYYYMM | `TO_YYYYMM('2024-06-04')` → `202406` | +| [TO_YYYYMMDD](/tidb-cloud-lake/sql/yyyymmdd.md) | Formats date as YYYYMMDD | `TO_YYYYMMDD('2024-06-04')` → `20240604` | +| [TO_YYYYMMDDHH](/tidb-cloud-lake/sql/yyyymmddhh.md) | Formats date as YYYYMMDDHH | `TO_YYYYMMDDHH('2024-06-04 12:30:45')` → `2024060412` | +| [TO_YYYYMMDDHHMMSS](/tidb-cloud-lake/sql/yyyymmddhhmmss.md) | Formats date as YYYYMMDDHHMMSS | `TO_YYYYMMDDHHMMSS('2024-06-04 12:30:45')` → `20240604123045` | +| [DATE_FORMAT](/tidb-cloud-lake/sql/date-format.md) | Formats a date according to a format string | `DATE_FORMAT('2024-06-04', '%Y-%m-%d')` → `'2024-06-04'` | +| [CONVERT_TIMEZONE](/tidb-cloud-lake/sql/convert-timezone.md) | Converts a timestamp to the target timezone | `CONVERT_TIMEZONE('America/Los_Angeles', '2024-11-01 11:36:10')` → `2024-10-31 20:36:10` | + +## Date & Time Arithmetic Functions + +| Function | Description | Example | +|------------------------------------------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------| +| [DATE_ADD](/tidb-cloud-lake/sql/date-add.md) | Adds a time interval to a date | `DATE_ADD(DAY, 7, '2024-06-04')` → `2024-06-11` | +| [DATE_SUB](/tidb-cloud-lake/sql/date-sub.md) | Subtracts a time interval from a date | `DATE_SUB(MONTH, 1, '2024-06-04')` → `2024-05-04` | +| [ADD INTERVAL](addinterval.md) | Adds an interval to a date | `'2024-06-04' + INTERVAL 1 DAY` → `2024-06-05` | +| [SUBTRACT INTERVAL](subtractinterval.md) | Subtracts an interval from a date | `'2024-06-04' - INTERVAL 1 MONTH` → `2024-05-04` | +| [DATE_DIFF](/tidb-cloud-lake/sql/date-diff.md) | Returns the difference between two dates | `DATE_DIFF(DAY, '2024-06-01', '2024-06-04')` → `3` | +| [TIMESTAMP_DIFF](/tidb-cloud-lake/sql/timestamp-diff.md) | Returns the difference between two timestamps | `TIMESTAMP_DIFF(HOUR, '2024-06-04 10:00:00', '2024-06-04 15:00:00')` → `5` | +| [MONTHS_BETWEEN](/tidb-cloud-lake/sql/months-between.md) | Returns the number of months between two dates | `MONTHS_BETWEEN('2024-06-04', '2024-01-04')` → `5` | +| [DATE_BETWEEN](/tidb-cloud-lake/sql/date-between.md) | Checks if a date is between two other dates | `DATE_BETWEEN('2024-06-04', '2024-06-01', '2024-06-10')` → `true` | +| [AGE](/tidb-cloud-lake/sql/age.md) | Calculate the difference between timestamps or between a timestamp and the current date/time | 
`AGE('2000-01-01'::TIMESTAMP, '1990-05-15'::TIMESTAMP)` → `9 years 7 months 17 days` | +| [ADD_MONTHS](/tidb-cloud-lake/sql/add-months.md) | Adds months to a date while preserving end-of-month days. | `ADD_MONTHS('2025-04-30',1)` → `2025-05-31` | + +## Date & Time Truncation Functions + +| Function | Description | Example | +|-----------------------------------------------|------------------------------------------------------------------|---------------------------------------------------------------------| +| [DATE_TRUNC](/tidb-cloud-lake/sql/date-trunc.md) | Truncates a timestamp to a specified precision | `DATE_TRUNC('month', '2024-06-04')` → `2024-06-01` | +| [TIME_SLICE](/tidb-cloud-lake/sql/time-slice.md) | Map a single date/timestamp value to a calendar-aligned interval | `TIME_SLICE('2024-06-04', 4, 'MONTH', 'START')` → `2024-05-01` | +| [TO_START_OF_DAY](to-start-of-day.md) | Returns the start of the day | `TO_START_OF_DAY('2024-06-04 12:30:45')` → `2024-06-04 00:00:00` | +| [TO_START_OF_HOUR](to-start-of-hour.md) | Returns the start of the hour | `TO_START_OF_HOUR('2024-06-04 12:30:45')` → `2024-06-04 12:00:00` | +| [TO_START_OF_MINUTE](to-start-of-minute.md) | Returns the start of the minute | `TO_START_OF_MINUTE('2024-06-04 12:30:45')` → `2024-06-04 12:30:00` | +| [TO_START_OF_MONTH](to-start-of-month.md) | Returns the start of the month | `TO_START_OF_MONTH('2024-06-04')` → `2024-06-01` | +| [TO_START_OF_QUARTER](to-start-of-quarter.md) | Returns the start of the quarter | `TO_START_OF_QUARTER('2024-06-04')` → `2024-04-01` | +| [TO_START_OF_YEAR](to-start-of-year.md) | Returns the start of the year | `TO_START_OF_YEAR('2024-06-04')` → `2024-01-01` | +| [TO_START_OF_WEEK](to-start-of-week.md) | Returns the start of the week | `TO_START_OF_WEEK('2024-06-04')` → `2024-06-03` | + +## Date & Time Navigation Functions + +| Function | Description | Example | +|---------------------------------|--------------------------------------------------------|-------------------------------------------------------| +| [LAST_DAY](/tidb-cloud-lake/sql/last-day.md) | Returns the last day of the month | `LAST_DAY('2024-06-04')` → `2024-06-30` | +| [NEXT_DAY](/tidb-cloud-lake/sql/next-day.md) | Returns the date of the next specified day of week | `NEXT_DAY('2024-06-04', 'SUNDAY')` → `2024-06-09` | +| [PREVIOUS_DAY](/tidb-cloud-lake/sql/previous-day.md) | Returns the date of the previous specified day of week | `PREVIOUS_DAY('2024-06-04', 'MONDAY')` → `2024-06-03` | + +## Other Date & Time Functions + +| Function | Description | Example | +|---------------------------|------------------------------|--------------------------------------------------------------------------| +| [TIMEZONE](/tidb-cloud-lake/sql/timezone.md) | Returns the current timezone | `TIMEZONE()` → `'UTC'` | +| [TIME_SLOT](/tidb-cloud-lake/sql/time-slot.md) | Returns time slots | `TIME_SLOT('2024-06-04 12:30:45', 15, 'MINUTE')` → `2024-06-04 12:30:00` | diff --git a/tidb-cloud-lake/sql/date-time.md b/tidb-cloud-lake/sql/date-time.md new file mode 100644 index 0000000000000..41ebf84f01c19 --- /dev/null +++ b/tidb-cloud-lake/sql/date-time.md @@ -0,0 +1,385 @@ +--- +title: Date & Time +description: Databend's Date and Time data type supports standardization and compatibility with various SQL standards, making it easier for users migrating from other database systems. 
+sidebar_position: 6 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +## Overview + +| Name | Aliases | Storage Size | Resolution | Min Value | Max Value | Format | +|--------------|---------------------------|--------------|-------------|----------------------------|--------------------------------|--------------------------------------------------------------------------------| +| DATE | | 4 bytes | Day | 0001-01-01 | 9999-12-31 | `YYYY-MM-DD` | +| TIMESTAMP | DATETIME | 8 bytes | Microsecond | 0001-01-01 00:00:00.000000 | 9999-12-31 23:59:59.999999 UTC | `YYYY-MM-DD hh:mm:ss[.fraction]`, uses session timezone for display | +| TIMESTAMP_TZ | TIMESTAMP WITH TIME ZONE | 8 bytes | Microsecond | 0001-01-01 00:00:00.000000 | 9999-12-31 23:59:59.999999 UTC | `YYYY-MM-DD hh:mm:ss[.fraction]±hh:mm`, stores UTC value plus offset | + +`DATE` keeps only calendar values, `TIMESTAMP` stores UTC internally but renders through the current session timezone, and `TIMESTAMP_TZ` preserves the original offset for auditing or replication scenarios. + +## Examples + +### DATE + +```sql +CREATE TABLE events (event_date DATE); +INSERT INTO events VALUES ('2024-01-15'), ('2024-12-31'); +SELECT * FROM events; +``` + +Result: +``` +┌────────────┐ +│ event_date │ +├────────────┤ +│ 2024-01-15 │ +│ 2024-12-31 │ +└────────────┘ +``` + +### TIMESTAMP + +```sql +CREATE TABLE meetings ( + meeting_id INT, + meeting_time TIMESTAMP +); + +INSERT INTO meetings VALUES (1, '2024-01-15 14:00:00+08:00'); + +SETTINGS (timezone = 'UTC') +SELECT meeting_id, meeting_time FROM meetings; + +SETTINGS (timezone = 'America/New_York') +SELECT meeting_id, meeting_time FROM meetings; +``` + +Result (timezone = 'UTC'): +``` +┌────────────┬──────────────────────┐ +│ meeting_id │ meeting_time │ +├────────────┼──────────────────────┤ +│ 1 │ 2024-01-15T06:00:00 │ +└────────────┴──────────────────────┘ +``` + +Result (timezone = 'America/New_York'): +``` +┌────────────┬──────────────────────┐ +│ meeting_id │ meeting_time │ +├────────────┼──────────────────────┤ +│ 1 │ 2024-01-15T01:00:00 │ +└────────────┴──────────────────────┘ +``` + +### TIMESTAMP_TZ + +```sql +CREATE TABLE system_logs ( + log_id INT, + log_time TIMESTAMP_TZ +); + +INSERT INTO system_logs VALUES + (1, '2024-01-15 14:00:00+08:00'), + (2, '2024-01-15 06:00:00+00:00'), + (3, '2024-01-15 01:00:00-05:00'); + +SETTINGS (timezone = 'UTC') +SELECT log_id, TO_STRING(log_time) AS log_time FROM system_logs; + +SETTINGS (timezone = 'Asia/Shanghai') +SELECT log_id, TO_STRING(log_time) AS log_time FROM system_logs; +``` + +Result (timezone = 'UTC'): +``` +┌────────┬────────────────────────────────────────────┐ +│ log_id │ log_time │ +├────────┼────────────────────────────────────────────┤ +│ 1 │ 2024-01-15 14:00:00.000000 +0800 │ +│ 2 │ 2024-01-15 06:00:00.000000 +0000 │ +│ 3 │ 2024-01-15 01:00:00.000000 -0500 │ +└────────┴────────────────────────────────────────────┘ +``` + +Result (timezone = 'Asia/Shanghai'): +``` +┌────────┬────────────────────────────────────────────┐ +│ log_id │ log_time │ +├────────┼────────────────────────────────────────────┤ +│ 1 │ 2024-01-15 14:00:00.000000 +0800 │ +│ 2 │ 2024-01-15 06:00:00.000000 +0000 │ +│ 3 │ 2024-01-15 01:00:00.000000 -0500 │ +└────────┴────────────────────────────────────────────┘ +``` + +The offset is part of the stored value, so the display never changes. + +## Choosing the Right Type + +- Use `DATE` for calendar values without time of day. 
+- Use `TIMESTAMP` when different sessions should display the same moment in their local timezone. +- Use `TIMESTAMP_TZ` when you must keep the input offset for compliance or debugging. + +## Daylight Saving Time Adjustments + +Enable `enable_dst_hour_fix` to make Databend automatically roll missing hours forward when daylight saving time skips part of the day. + +```sql +SET enable_dst_hour_fix = 1; + +SETTINGS (timezone = 'America/Toronto') +SELECT to_datetime('2024-03-10 02:01:00'); +``` + +Result: +``` +┌────────────────────────────────────┐ +│ to_datetime('2024-03-10 02:01:00') │ +├────────────────────────────────────┤ +│ 2024-03-10T03:01:00 │ +└────────────────────────────────────┘ +``` + +Use `SET enable_dst_hour_fix = 0` to return to the default behavior if you would rather raise errors for missing hours. + +## Handling Invalid Values + +Dates outside the supported range automatically clamp to their minimum values. + +```sql +SELECT + ADD_DAYS(TO_DATE('9999-12-31'), 1) AS overflow_date, + SUBTRACT_MINUTES(TO_DATE('1000-01-01'), 1) AS underflow_timestamp; +``` + +Result: +``` +┌───────────────┬──────────────────────────┐ +│ overflow_date │ underflow_timestamp │ +├───────────────┼──────────────────────────┤ +│ 0001-01-01 │ 0999-12-31T18:41:28 │ +└───────────────┴──────────────────────────┘ +``` +The values wrap to the minimum representable date or timestamp instead of raising an error. +## Formatting Date and Time + +Functions such as [TO_DATE](/tidb-cloud-lake/sql/date.md) and [TO_TIMESTAMP](/tidb-cloud-lake/sql/timestamp.md) accept explicit format strings. Control how they parse or render values by adjusting `date_format_style` and `week_start`. + +### Date Format Styles + +Use `date_format_style` to switch between two format vocabularies: + +- **MySQL** (default) uses specifiers like `%Y`, `%m`, `%d`. +- **Oracle** uses specifiers like `YYYY`, `MM`, `DD` to match ANSI-style masks. + +```sql +-- Oracle-style mask +SETTINGS (date_format_style = 'Oracle') +SELECT to_string('2024-04-05'::DATE, 'YYYY-MM-DD'); +``` + +Result (Oracle): +``` +┌──────────────────────────────────────┐ +│ to_string('2024-04-05'::DATE, 'YYYY-MM-DD') │ +├──────────────────────────────────────┤ +│ 2024-04-05 │ +└──────────────────────────────────────┘ +``` + +```sql +-- Back to MySQL-style mask +SETTINGS (date_format_style = 'MySQL') +SELECT to_string('2024-04-05'::DATE, '%Y-%m-%d'); +``` + +Result (MySQL): +``` +┌──────────────────────────────────────┐ +│ to_string('2024-04-05'::DATE, '%Y-%m-%d') │ +├──────────────────────────────────────┤ +│ 2024-04-05 │ +└──────────────────────────────────────┘ +``` + +### Week Start Configuration + +`week_start` defines which day begins the week for functions such as `DATE_TRUNC` or `TRUNC` when using `WEEK` precision. 
+ +```sql +SETTINGS (week_start = 0) SELECT DATE_TRUNC(WEEK, to_date('2024-04-05')); -- Sunday +SETTINGS (week_start = 1) SELECT DATE_TRUNC(WEEK, to_date('2024-04-05')); -- Monday +``` + +Result (week_start = 0): +``` +┌────────────────────────────────┐ +│ DATE_TRUNC(WEEK, TO_DATE('2024-04-05')) │ +├────────────────────────────────┤ +│ 2024-03-31 │ +└────────────────────────────────┘ +``` + +Result (week_start = 1): +``` +┌────────────────────────────────┐ +│ DATE_TRUNC(WEEK, TO_DATE('2024-04-05')) │ +├────────────────────────────────┤ +│ 2024-04-01 │ +└────────────────────────────────┘ +``` + +### MySQL Format Specifiers + +To handle date and time formatting, Databend makes use of the chrono::format::strftime module, which is a standard module provided by the chrono library in Rust. This module enables precise control over the formatting of dates and times. The following content is excerpted from [https://docs.rs/chrono/latest/chrono/format/strftime/index.html](https://docs.rs/chrono/latest/chrono/format/strftime/index.html): + +| Spec. | Example | Description | +| ----- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| | | DATE SPECIFIERS: | +| %Y | 2001 | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE, require an initial sign (+/-). | +| %C | 20 | The proleptic Gregorian year divided by 100, zero-padded to 2 digits. | +| %y | 01 | The proleptic Gregorian year modulo 100, zero-padded to 2 digits. | +| %m | 07 | Month number (01–12), zero-padded to 2 digits. | +| %b | Jul | Abbreviated month name. Always 3 letters. | +| %B | July | Full month name. Also accepts corresponding abbreviation in parsing. | +| %h | Jul | Same as %b. | +| %d | 08 | Day number (01–31), zero-padded to 2 digits. | +| %e | 8 | Same as %d but space-padded. Same as %\_d. | +| %a | Sun | Abbreviated weekday name. Always 3 letters. | +| %A | Sunday | Full weekday name. Also accepts corresponding abbreviation in parsing. | +| %w | 0 | Sunday = 0, Monday = 1, …, Saturday = 6. | +| %u | 7 | Monday = 1, Tuesday = 2, …, Sunday = 7. (ISO 8601) | +| %U | 28 | Week number starting with Sunday (00–53), zero-padded to 2 digits. | +| %W | 27 | Same as %U, but week 1 starts with the first Monday in that year instead. | +| %G | 2001 | Same as %Y but uses the year number in ISO 8601 week date. | +| %g | 01 | Same as %y but uses the year number in ISO 8601 week date. | +| %V | 27 | Same as %U but uses the week number in ISO 8601 week date (01–53). | +| %j | 189 | Day of the year (001–366), zero-padded to 3 digits. | +| %D | 07/08/01 | Month-day-year format. Same as %m/%d/%y. | +| %x | 07/08/01 | Locale’s date representation (e.g., 12/31/99). | +| %F | 2001-07-08 | Year-month-day format (ISO 8601). Same as %Y-%m-%d. | +| %v | 8-Jul-2001 | Day-month-year format. Same as %e-%b-%Y. | +| | | TIME SPECIFIERS: | +| %H | 00 | Hour number (00–23), zero-padded to 2 digits. | +| %k | 0 | Same as %H but space-padded. Same as %\_H. | +| %I | 12 | Hour number in 12-hour clocks (01–12), zero-padded to 2 digits. | +| %l | 12 | Same as %I but space-padded. Same as %\_I. | +| %P | am | am or pm in 12-hour clocks. 
| +| %p | AM | AM or PM in 12-hour clocks. | +| %M | 34 | Minute number (00–59), zero-padded to 2 digits. | +| %S | 60 | Second number (00–60), zero-padded to 2 digits. | +| %f | 026490000 | The fractional seconds (in nanoseconds) since last whole second. Databend recommends converting the Integer string into an Integer first, other than using this specifier. See [Converting Integer to Timestamp](/tidb-cloud-lake/sql/timestamp.md#example-2-converting-integer-to-timestamp) for an example. | +| %.f | .026490 | Similar to .%f but left-aligned. These all consume the leading dot. | +| %.3f | .026 | Similar to .%f but left-aligned but fixed to a length of 3. | +| %.6f | .026490 | Similar to .%f but left-aligned but fixed to a length of 6. | +| %.9f | .026490000 | Similar to .%f but left-aligned but fixed to a length of 9. | +| %3f | 026 | Similar to %.3f but without the leading dot. | +| %6f | 026490 | Similar to %.6f but without the leading dot. | +| %9f | 026490000 | Similar to %.9f but without the leading dot. | +| %R | 00:34 | Hour-minute format. Same as %H:%M. | +| %T | 00:34:60 | Hour-minute-second format. Same as %H:%M:%S. | +| %X | 00:34:60 | Locale’s time representation (e.g., 23:13:48). | +| %r | 12:34:60 AM | Hour-minute-second format in 12-hour clocks. Same as %I:%M:%S %p. | +| | | TIME ZONE SPECIFIERS: | +| %Z | ACST | Local time zone name. Skips all non-whitespace characters during parsing. | +| %z | +0930 | Offset from the local time to UTC (with UTC being +0000). | +| %:z | +09:30 | Same as %z but with a colon. | +| %::z | +09:30:00 | Offset from the local time to UTC with seconds. | +| %:::z | +09 | Offset from the local time to UTC without minutes. | +| %#z | +09 | Parsing only: Same as %z but allows minutes to be missing or present. | +| | | DATE & TIME SPECIFIERS: | +| %c | Sun Jul 8 00:34:60 2001 | Locale’s date and time (e.g., Thu Mar 3 23:05:25 2005). | +| %+ | 2001-07-08T00:34:60.026490+09:30 | ISO 8601 / RFC 3339 date & time format. | +| %s | 994518299 | UNIX timestamp, the number of seconds since 1970-01-01 00:00 UTC. Databend recommends converting the Integer string into an Integer first, other than using this specifier. See [Converting Integer to Timestamp](/tidb-cloud-lake/sql/timestamp.md#example-2-converting-integer-to-timestamp) for an example. | +| | | SPECIAL SPECIFIERS: | +| %t | | Literal tab (\t). | +| %n | | Literal newline (\n). | +| %% | | Literal percent sign. | + +It is possible to override the default padding behavior of numeric specifiers %?. This is not allowed for other specifiers and will result in the BAD_FORMAT error. + +| Modifier | Description | +| -------- | ----------------------------------------------------------------------------- | +| %-? | Suppresses any padding including spaces and zeroes. (e.g. %j = 012, %-j = 12) | +| %\_? | Uses spaces as a padding. (e.g. %j = 012, %\_j = 12) | +| %0? | Uses zeroes as a padding. (e.g. %e = 9, %0e = 09) | + +- %C, %y: This is floor division, so 100 BCE (year number -99) will print -1 and 99 respectively. + +- %U: Week 1 starts with the first Sunday in that year. It is possible to have week 0 for days before the first Sunday. + +- %G, %g, %V: Week 1 is the first week with at least 4 days in that year. Week 0 does not exist, so this should be used with %G or %g. + +- %S: It accounts for leap seconds, so 60 is possible. 
+ +- %f, %.f, %.3f, %.6f, %.9f, %3f, %6f, %9f: + The default %f is right-aligned and always zero-padded to 9 digits for the compatibility with glibc and others, so it always counts the number of nanoseconds since the last whole second. E.g. 7ms after the last second will print 007000000, and parsing 7000000 will yield the same. + + The variant %.f is left-aligned and print 0, 3, 6 or 9 fractional digits according to the precision. E.g. 70ms after the last second under %.f will print .070 (note: not .07), and parsing .07, .070000 etc. will yield the same. Note that they can print or read nothing if the fractional part is zero or the next character is not .. + + The variant %.3f, %.6f and %.9f are left-aligned and print 3, 6 or 9 fractional digits according to the number preceding f. E.g. 70ms after the last second under %.3f will print .070 (note: not .07), and parsing .07, .070000 etc. will yield the same. Note that they can read nothing if the fractional part is zero or the next character is not . however will print with the specified length. + + The variant %3f, %6f and %9f are left-aligned and print 3, 6 or 9 fractional digits according to the number preceding f, but without the leading dot. E.g. 70ms after the last second under %3f will print 070 (note: not 07), and parsing 07, 070000 etc. will yield the same. Note that they can read nothing if the fractional part is zero. + +- %Z: Offset will not be populated from the parsed data, nor will it be validated. Timezone is completely ignored. Similar to the glibc strptime treatment of this format code. + + It is not possible to reliably convert from an abbreviation to an offset, for example CDT can mean either Central Daylight Time (North America) or China Daylight Time. + +- %+: Same as %Y-%m-%dT%H:%M:%S%.f%:z, i.e. 0, 3, 6 or 9 fractional digits for seconds and colons in the time zone offset. + + This format also supports having a Z or UTC in place of %:z. They are equivalent to +00:00. + + Note that all T, Z, and UTC are parsed case-insensitively. + + The typical strftime implementations have different (and locale-dependent) formats for this specifier. While Chrono's format for %+ is far more stable, it is best to avoid this specifier if you want to control the exact output. + +- %s: This is not padded and can be negative. For the purpose of Chrono, it only accounts for non-leap seconds so it slightly differs from ISO C strftime behavior. 
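+
+As a brief, hedged illustration of the MySQL-style specifiers above (the literal output depends on the input value and may differ slightly across versions), a single mask can combine date, time, and fractional-second specifiers:
+
+```sql
+-- %Y-%m-%d for the date part, %H:%M:%S for the time part, %.3f for milliseconds
+SELECT to_string('2024-06-04 12:30:45.123456'::TIMESTAMP, '%Y-%m-%d %H:%M:%S%.3f');
+-- 2024-06-04 12:30:45.123
+```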
+ +### Oracle Format Specifiers + +When `date_format_style` is set to 'Oracle', the following format specifiers are supported: + +| Oracle Format | Description | Example Output (for '2024-04-05 14:30:45.123456') | +|---------------|----------------------------------------------|---------------------------------------------------| +| YYYY | 4-digit year | 2024 | +| YY | 2-digit year | 24 | +| MMMM | Full month name | April | +| MON | Abbreviated month name | Apr | +| MM | Month number (01-12) | 04 | +| DD | Day of month (01-31) | 05 | +| DY | Abbreviated day name | Fri | +| HH24 | Hour of day (00-23) | 14 | +| HH12 | Hour of day (01-12) | 02 | +| AM/PM | Meridian indicator | PM | +| MI | Minute (00-59) | 30 | +| SS | Second (00-59) | 45 | +| FF | Fractional seconds | 123456 | +| UUUU | ISO week-numbering year | 2024 | +| TZH:TZM | Time zone hour and minute with colon | +08:00 | +| TZH | Time zone hour | +08 | + +Examples comparing MySQL and Oracle format styles with the same data: + +```sql +-- MySQL format style (default) +SELECT to_string('2022-12-25'::DATE, '%m/%d/%Y'); + +┌────────────────────────────────┐ +│ to_string('2022-12-25', '%m/%d/%Y') │ +├────────────────────────────────┤ +│ 12/25/2022 │ +└────────────────────────────────┘ + +-- Oracle format style (same data as MySQL example above) +SETTINGS (date_format_style = 'Oracle') +SELECT to_string('2022-12-25'::DATE, 'MM/DD/YYYY'); + +┌────────────────────────────────┐ +│ to_string('2022-12-25', 'MM/DD/YYYY') │ +├────────────────────────────────┤ +│ 12/25/2022 │ +└────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/date-trunc.md b/tidb-cloud-lake/sql/date-trunc.md new file mode 100644 index 0000000000000..3450093fc87e9 --- /dev/null +++ b/tidb-cloud-lake/sql/date-trunc.md @@ -0,0 +1,71 @@ +--- +title: DATE_TRUNC +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Truncates a date or timestamp to a specified precision, providing a standardized way to manipulate dates and timestamps. This function is designed to be compatible with various database systems, making it easier for users to migrate and work with different databases. + +## Syntax + +```sql +DATE_TRUNC(, ) +``` + +| Parameter | Description | +|-----------------------|------------------------------------------------------------------------------------------------------------| +| `` | Must be of the following values: `YEAR`, `QUARTER`, `MONTH`, `WEEK`, `DAY`, `HOUR`, `MINUTE` and `SECOND`. | +| `` | A value of `DATE` or `TIMESTAMP` type. | + +## Week Start Configuration + +When using `WEEK` as the precision parameter, the result depends on the `week_start` setting, which defines the first day of the week: + +- `week_start = 1` (default): Monday is considered the first day of the week +- `week_start = 0`: Sunday is considered the first day of the week + +You can use the `SETTINGS` clause to change this setting for a specific query: + +```sql +-- Set Sunday as the first day of the week +SETTINGS (week_start = 0) SELECT DATE_TRUNC(WEEK, to_date('2024-04-05')); + +-- Set Monday as the first day of the week (default) +SETTINGS (week_start = 1) SELECT DATE_TRUNC(WEEK, to_date('2024-04-05')); +``` + +## Return Type + +Same as ``. 
+ +## Examples + +```sql +SELECT + DATE_TRUNC(MONTH, to_date('2022-07-07')), + DATE_TRUNC(WEEK, to_date('2022-07-07')); + +┌────────────────────────────────────────────────────────────────────────────────────┐ +│ DATE_TRUNC(MONTH, to_date('2022-07-07')) │ DATE_TRUNC(WEEK, to_date('2022-07-07')) │ +├──────────────────────────────────────────┼─────────────────────────────────────────┤ +│ 2022-07-01 │ 2022-07-04 │ +└────────────────────────────────────────────────────────────────────────────────────┘ +``` + +```sql +SELECT + DATE_TRUNC(HOUR, to_timestamp('2022-07-07 01:01:01.123456')), + DATE_TRUNC(SECOND, to_timestamp('2022-07-07 01:01:01.123456')); + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ DATE_TRUNC(HOUR, to_timestamp('2022-07-07 01:01:01.123456')) │ DATE_TRUNC(SECOND, to_timestamp('2022-07-07 01:01:01.123456')) │ +├─────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────┤ +│ 2022-07-07 01:00:00.000000 │ 2022-07-07 01:01:01.000000 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +## See Also + +- [TRUNC](/tidb-cloud-lake/sql/trunc.md): Provides similar functionality with a different syntax for better SQL standard compatibility. +- [TIME_SLICE](/tidb-cloud-lake/sql/time-slice.md): Map a single date/timestamp value to a calendar-aligned interval. diff --git a/tidb-cloud-lake/sql/date.md b/tidb-cloud-lake/sql/date.md new file mode 100644 index 0000000000000..e831ed94aef28 --- /dev/null +++ b/tidb-cloud-lake/sql/date.md @@ -0,0 +1,8 @@ +--- +title: DATE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_DATE](/tidb-cloud-lake/sql/to-date.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/datetime.md b/tidb-cloud-lake/sql/datetime.md new file mode 100644 index 0000000000000..d3343f36899d1 --- /dev/null +++ b/tidb-cloud-lake/sql/datetime.md @@ -0,0 +1,5 @@ +--- +title: TO_DATETIME +--- + +Alias for [TO_TIMESTAMP](/tidb-cloud-lake/sql/to-timestamp.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/day-month.md b/tidb-cloud-lake/sql/day-month.md new file mode 100644 index 0000000000000..1314b2866401d --- /dev/null +++ b/tidb-cloud-lake/sql/day-month.md @@ -0,0 +1,37 @@ +--- +title: TO_DAY_OF_MONTH +--- + +Convert a date or date with time (timestamp/datetime) to a UInt8 number containing the number of the day of the month (1-31). 
+ +## Syntax + +```sql +TO_DAY_OF_MONTH() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Aliases + +- [DAY](/tidb-cloud-lake/sql/day.md) + +## Return Type + +`TINYINT` + +## Examples + +```sql +SELECT NOW(), TO_DAY_OF_MONTH(NOW()), DAY(NOW()); + +┌──────────────────────────────────────────────────────────────────┐ +│ now() │ to_day_of_month(now()) │ day(now()) │ +├────────────────────────────┼────────────────────────┼────────────┤ +│ 2024-03-14 23:35:41.947962 │ 14 │ 14 │ +└──────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/day-week.md b/tidb-cloud-lake/sql/day-week.md new file mode 100644 index 0000000000000..7e757dce3207e --- /dev/null +++ b/tidb-cloud-lake/sql/day-week.md @@ -0,0 +1,35 @@ +--- +title: TO_DAY_OF_WEEK +--- + +Converts a date or date with time (timestamp/datetime) to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7). + +## Syntax + +```sql +TO_DAY_OF_WEEK() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Return Type + +`TINYINT` + +## Examples + +```sql + +SELECT + to_day_of_week('2023-11-12 09:38:18.165575'); + +┌──────────────────────────────────────────────┐ +│ to_day_of_week('2023-11-12 09:38:18.165575') │ +├──────────────────────────────────────────────┤ +│ 7 │ +└──────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/day-year.md b/tidb-cloud-lake/sql/day-year.md new file mode 100644 index 0000000000000..08d93f6f0542c --- /dev/null +++ b/tidb-cloud-lake/sql/day-year.md @@ -0,0 +1,34 @@ +--- +title: TO_DAY_OF_YEAR +--- + +Convert a date or date with time (timestamp/datetime) to a UInt16 number containing the number of the day of the year (1-366). + +## Syntax + +```sql +TO_DAY_OF_YEAR() +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| `` | date/timestamp | + +## Return Type + +`SMALLINT` + +## Examples + +```sql +SELECT + to_day_of_year('2023-11-12 09:38:18.165575'); + +┌──────────────────────────────────────────────┐ +│ to_day_of_year('2023-11-12 09:38:18.165575') │ +├──────────────────────────────────────────────┤ +│ 316 │ +└──────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/day.md b/tidb-cloud-lake/sql/day.md new file mode 100644 index 0000000000000..903baa3b22efa --- /dev/null +++ b/tidb-cloud-lake/sql/day.md @@ -0,0 +1,8 @@ +--- +title: DAY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_DAY_OF_MONTH](to-day-of-month.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/days.md b/tidb-cloud-lake/sql/days.md new file mode 100644 index 0000000000000..b8747526289fe --- /dev/null +++ b/tidb-cloud-lake/sql/days.md @@ -0,0 +1,32 @@ +--- +title: TO_DAYS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of days into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_DAYS() +``` + +## Return Type + +Interval (represented in days). 
+ +## Examples + +```sql +SELECT TO_DAYS(2), TO_DAYS(0), TO_DAYS(-2); + +┌────────────────────────────────────────┐ +│ to_days(2) │ to_days(0) │ to_days(- 2) │ +├────────────┼────────────┼──────────────┤ +│ 2 days │ 00:00:00 │ -2 days │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ddl-database-overview.md b/tidb-cloud-lake/sql/ddl-database-overview.md new file mode 100644 index 0000000000000..8ac8c4bd66576 --- /dev/null +++ b/tidb-cloud-lake/sql/ddl-database-overview.md @@ -0,0 +1,27 @@ +--- +title: Database +--- + +This page provides a comprehensive overview of database operations in Databend, organized by functionality for easy reference. + +## Database Creation & Management + +| Command | Description | +|---------|-------------| +| [CREATE DATABASE](/tidb-cloud-lake/sql/create-database.md) | Creates a new database | +| [ALTER DATABASE](ddl-alter-database.md) | Modifies a database | +| [DROP DATABASE](/tidb-cloud-lake/sql/drop-database.md) | Removes a database | +| [USE DATABASE](/tidb-cloud-lake/sql/use-database.md) | Sets the current working database | +| [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) | Recovers a dropped database | + +## Database Information + +| Command | Description | +|---------|-------------| +| [SHOW DATABASES](/tidb-cloud-lake/sql/show-databases.md) | Lists all databases | +| [SHOW CREATE DATABASE](/tidb-cloud-lake/sql/show-create-database.md) | Shows the CREATE DATABASE statement for a database | +| [SHOW DROP DATABASES](/tidb-cloud-lake/sql/show-drop-databases.md) | Lists dropped databases that can be recovered | + +:::note +Database operations are foundational for organizing your data in Databend. Make sure you have appropriate privileges before executing these commands. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ddl-table-overview.md b/tidb-cloud-lake/sql/ddl-table-overview.md new file mode 100644 index 0000000000000..d1b20260660a8 --- /dev/null +++ b/tidb-cloud-lake/sql/ddl-table-overview.md @@ -0,0 +1,54 @@ +--- +title: Table +--- + +This page provides a comprehensive overview of table operations in Databend, organized by functionality for easy reference. + +## Table Creation + +| Command | Description | +|---------|-------------| +| [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md) | Creates a new table with specified columns and options | +| [CREATE TABLE ... LIKE](/tidb-cloud-lake/sql/create-table.md#create-table--like) | Creates a table with the same column definitions as an existing one | +| [CREATE TABLE ... 
AS](/tidb-cloud-lake/sql/create-table.md#create-table--as) | Creates a table and inserts data based on the results of a SELECT query | +| [CREATE TRANSIENT TABLE](/tidb-cloud-lake/sql/create-transient-table.md) | Creates a table without Time Travel support | +| [CREATE EXTERNAL TABLE](10-ddl-create-table-external-location.md) | Creates a table with data stored in a specified external location | +| [ATTACH TABLE](/tidb-cloud-lake/sql/attach-table.md) | Creates a table by associating it with an existing table | + +## Table Modification + +| Command | Description | +|---------|-------------| +| [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md) | Modifies table columns, comments, Fuse options, external connections, or swaps metadata with another table | +| [RENAME TABLE](/tidb-cloud-lake/sql/rename-table.md) | Changes the name of a table | + +## Table Information + +| Command | Description | +|---------|-------------| +| [DESCRIBE TABLE](/tidb-cloud-lake/sql/describe-table.md) / [SHOW FIELDS](/tidb-cloud-lake/sql/show-fields.md) | Shows information about the columns in a given table | +| [SHOW FULL COLUMNS](show-full-columns.md) | Retrieves comprehensive details about the columns in a given table | +| [SHOW CREATE TABLE](/tidb-cloud-lake/sql/show-create-table.md) | Shows the CREATE TABLE statement that creates the named table | +| [SHOW TABLES](/tidb-cloud-lake/sql/show-tables.md) | Lists the tables in the current or a specified database | +| [SHOW TABLE STATUS](/tidb-cloud-lake/sql/show-table-status.md) | Shows the status of the tables in a database | +| [SHOW DROP TABLES](/tidb-cloud-lake/sql/show-drop-tables.md) | Lists the dropped tables in the current or a specified database | + +## Table Deletion & Recovery + +| Command | Description | Recovery Option | +|---------|-------------|----------------| +| [TRUNCATE TABLE](/tidb-cloud-lake/sql/truncate-table.md) | Removes all data from a table while preserving the table's schema | [FLASHBACK TABLE](/tidb-cloud-lake/sql/flashback-table.md) | +| [DROP TABLE](/tidb-cloud-lake/sql/drop-table.md) | Deletes a table | [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md) | +| [VACUUM TABLE](/tidb-cloud-lake/sql/vacuum-table.md) | Permanently removes historical data files of a table (Enterprise Edition) | Not recoverable | +| [VACUUM DROP TABLE](/tidb-cloud-lake/sql/vacuum-drop-table.md) | Permanently removes data files of dropped tables (Enterprise Edition) | Not recoverable | + +## Table Optimization + +| Command | Description | +|---------|-------------| +| [OPTIMIZE TABLE](/tidb-cloud-lake/sql/optimize-table.md) | Compacts or purges historical data to save storage space and enhance query performance | +| [SET CLUSTER KEY](/tidb-cloud-lake/sql/set-cluster-key.md) | Configures a cluster key to enhance query performance for large tables | + +:::note +Table optimization is an advanced operation. Please carefully read the documentation before proceeding to avoid potential data loss. +::: diff --git a/tidb-cloud-lake/sql/ddl-view-overview.md b/tidb-cloud-lake/sql/ddl-view-overview.md new file mode 100644 index 0000000000000..60e83031c8749 --- /dev/null +++ b/tidb-cloud-lake/sql/ddl-view-overview.md @@ -0,0 +1,24 @@ +--- +title: View +--- + +This page provides a comprehensive overview of view operations in Databend, organized by functionality for easy reference. 
+ +## View Management + +| Command | Description | +|---------|-------------| +| [CREATE VIEW](/tidb-cloud-lake/sql/create-view.md) | Creates a new view based on a query | +| [ALTER VIEW](/tidb-cloud-lake/sql/alter-view.md) | Modifies an existing view | +| [DROP VIEW](/tidb-cloud-lake/sql/drop-view.md) | Removes a view | + +## View Information + +| Command | Description | +|---------|-------------| +| [DESC VIEW](/tidb-cloud-lake/sql/desc-view.md) | Shows detailed information about a view | +| [SHOW VIEWS](/tidb-cloud-lake/sql/show-views.md) | Lists all views in the current or specified database | + +:::note +Views in Databend are named queries stored in the database that can be referenced like tables. They provide a way to simplify complex queries and control access to underlying data. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ddl.md b/tidb-cloud-lake/sql/ddl.md new file mode 100644 index 0000000000000..ef131d5ca9668 --- /dev/null +++ b/tidb-cloud-lake/sql/ddl.md @@ -0,0 +1,61 @@ +--- +title: DDL (Data Definition Language) Commands +--- + +These topics provide reference information for the DDL (Data Definition Language) commands in Databend. + +## Database & Table Management + +| Component | Description | +|-----------|-------------| +| **[Database](/tidb-cloud-lake/sql/ddl-database-overview.md)** | Create, alter, and drop databases | +| **[Table](/tidb-cloud-lake/sql/ddl-table-overview.md)** | Create, alter, and manage tables | +| **[View](/tidb-cloud-lake/sql/ddl-view-overview.md)** | Create and manage virtual tables based on queries | + +## Performance & Indexing + +| Component | Description | +|-----------|-------------| +| **[Cluster Key](/tidb-cloud-lake/sql/cluster-key.md)** | Define data clustering for query optimization | +| **[Aggregating Index](/tidb-cloud-lake/sql/aggregating-index.md)** | Pre-compute aggregations for faster queries | +| **[Inverted Index](/tidb-cloud-lake/sql/inverted-index.md)** | Full-text search index for text columns | +| **[Ngram Index](/tidb-cloud-lake/sql/ngram-index.md)** | Substring search index for LIKE patterns | +| **[Virtual Column](/tidb-cloud-lake/sql/virtual-column.md)** | Extract and index JSON fields as virtual columns | + +## Security & Access Control + +| Component | Description | +|-----------|-------------| +| **[User](/tidb-cloud-lake/sql/user-role.md)** | Create and manage database users | +| **[Network Policy](/tidb-cloud-lake/sql/network-policy.md)** | Control network access to databases | +| **[Mask Policy](/tidb-cloud-lake/sql/masking-policy.md)** | Apply data masking for sensitive information | +| **[Password Policy](/tidb-cloud-lake/sql/password-policy.md)** | Enforce password requirements and rotation | + +## Data Integration & Processing + +| Component | Description | +|-----------|-------------| +| **[Stage](/tidb-cloud-lake/sql/stage.md)** | Define storage locations for data loading | +| **[Stream](/tidb-cloud-lake/sql/stream.md)** | Capture and process data changes | +| **[Task](/tidb-cloud-lake/sql/task.md)** | Schedule and automate SQL operations | +| **[Sequence](/tidb-cloud-lake/sql/sequence.md)** | Generate unique sequential numbers | +| **[Connection](/tidb-cloud-lake/sql/connection.md)** | Configure external data source connections | +| **[File Format](/tidb-cloud-lake/sql/file-format.md)** | Define formats for data import/export | + +## Functions & Procedures + +| Component | Description | +|-----------|-------------| +| **[UDF](/tidb-cloud-lake/sql/user-defined-function.md)** | Create custom 
functions in Python or JavaScript | +| **[External Function](/tidb-cloud-lake/sql/external-function.md)** | Integrate external APIs as SQL functions | +| **[Procedure](/tidb-cloud-lake/sql/stored-procedure.md)** | Create stored procedures for complex logic | +| **[Notification](/tidb-cloud-lake/sql/notification.md)** | Set up event notifications and webhooks | + +## Resource Management + +| Component | Description | +|-----------|-------------| +| **[Warehouse](/tidb-cloud-lake/sql/warehouse.md)** | Manage compute resources for query execution | +| **[Workload Group](/tidb-cloud-lake/sql/workload-group.md)** | Control resource allocation and priorities | +| **[Transaction](/tidb-cloud-lake/sql/transaction.md)** | Manage database transactions | +| **[Variable](/tidb-cloud-lake/sql/sql-variables.md)** | Set and use session/global variables | diff --git a/tidb-cloud-lake/sql/decades.md b/tidb-cloud-lake/sql/decades.md new file mode 100644 index 0000000000000..6f26095d3638c --- /dev/null +++ b/tidb-cloud-lake/sql/decades.md @@ -0,0 +1,32 @@ +--- +title: TO_DECADES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of decades into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_DECADES() +``` + +## Return Type + +Interval (represented in years). + +## Examples + +```sql +SELECT TO_DECADES(2), TO_DECADES(0), TO_DECADES((- 2)); + +┌─────────────────────────────────────────────────┐ +│ to_decades(2) │ to_decades(0) │ to_decades(- 2) │ +├───────────────┼───────────────┼─────────────────┤ +│ 20 years │ 00:00:00 │ -20 years │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/decimal.md b/tidb-cloud-lake/sql/decimal.md new file mode 100644 index 0000000000000..dced4025e3a6d --- /dev/null +++ b/tidb-cloud-lake/sql/decimal.md @@ -0,0 +1,63 @@ +--- +title: Decimal +description: Decimal types are high-precision numeric values to be stored and manipulated. +sidebar_position: 5 +--- + +## Overview + +`DECIMAL(P, S)` stores exact numeric values with precision `P` (total digits, 1–76) and scale `S` (digits to the right of the decimal point, 0–P). Values must sit within ±`(10^P - 1) / 10^S`. Precisions up to 38 use `DECIMAL128`, and larger values use `DECIMAL256`. + +## Examples + +```sql +CREATE TABLE invoices ( + description STRING, + amount DECIMAL(10, 2), + tax_rate DECIMAL(5, 4) +); + +INSERT INTO invoices VALUES + ('Laptop', 1299.99, 0.1300), + ('Monitor', 399.50, 0.0750); + +SELECT + description, + amount, + tax_rate, + amount * tax_rate AS tax_value, + amount + amount * tax_rate AS total_due +FROM invoices; +``` + +Result: +``` +┌─────────────┬──────────┬──────────┬────────────┬────────────┐ +│ description │ amount │ tax_rate │ tax_value │ total_due │ +├─────────────┼──────────┼──────────┼────────────┼────────────┤ +│ Laptop │ 1299.99 │ 0.1300 │ 168.998700 │ 1468.988700 │ +│ Monitor │ 399.50 │ 0.0750 │ 29.962500 │ 429.462500 │ +└─────────────┴──────────┴──────────┴────────────┴────────────┘ +``` + +Arithmetic preserves precision automatically: additions keep the widest integer and fractional parts, multiplication adds precisions, and division keeps the left operand's scale. Use explicit casts if you need a specific result shape. 
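+ +For instance, a cast can pin the widened result of a multiplication back to a fixed shape. The following sketch runs against the `invoices` table above; the default result type follows the widening rules just described: + +```sql +SELECT + amount * tax_rate AS tax_widened, -- precision and scale widened automatically + CAST(amount * tax_rate AS DECIMAL(10, 2)) AS tax_fixed -- explicitly pinned to DECIMAL(10, 2) +FROM invoices; +``` + +The same idea applies to aggregates such as `SUM` and `AVG`, as shown below: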
+ +```sql +SELECT + SUM(amount) AS sum_default, + CAST(SUM(amount) AS DECIMAL(12, 2)) AS sum_cast, + AVG(amount) AS avg_default, + CAST(AVG(amount) AS DECIMAL(12, 4)) AS avg_cast +FROM invoices; +``` + +Result: +``` +┌─────────────┬───────────┬────────────────┬──────────┐ +│ sum_default │ sum_cast │ avg_default │ avg_cast │ +├─────────────┼───────────┼────────────────┼──────────┤ +│ 1699.49 │ 1699.49 │ 849.74500000 │ 849.7450 │ +└─────────────┴───────────┴────────────────┴──────────┘ +``` + +If an operation would overflow the integer part, Databend raises an error; extra fractional digits are truncated rather than rounded. Adjust `P`/`S` or cast the result to control both behaviors. diff --git a/tidb-cloud-lake/sql/decode.md b/tidb-cloud-lake/sql/decode.md new file mode 100644 index 0000000000000..3ada4801d9bc7 --- /dev/null +++ b/tidb-cloud-lake/sql/decode.md @@ -0,0 +1,56 @@ +--- +title: DECODE +--- + +The DECODE function compares the select expression to each search expression in order. As soon as a search expression matches the select expression, the corresponding result expression is returned. If no match is found and a default value is provided, the default value is returned. + +## Syntax + +```sql +DECODE( <expr>, <search1>, <result1> [, <search2>, <result2>, ... ] [, <default> ] ) +``` + +## Arguments + +- `expr`: The "select expression" that is compared against each search expression. This is typically a column, but can be a subquery, literal, or other expression. +- `searchN`: The search expressions to compare against the select expression. If a match is found, the corresponding result is returned. +- `resultN`: The values that will be returned if the corresponding search expression matches the select expression. +- `default`: Optional. If provided and no search expression matches, this default value is returned. + +## Usage Notes + +- Unlike `CASE`, a NULL value in the select expression matches a NULL value in the search expressions. +- If multiple search expressions would match, only the first match's result is returned. + +## Examples + +```sql +CREATE TABLE t (a VARCHAR); +INSERT INTO t (a) VALUES + ('1'), + ('2'), + (NULL), + ('4'); +``` + +Example with a default value 'other' (note that NULL equals NULL): + +```sql +SELECT a, decode(a, + 1, 'one', + 2, 'two', + NULL, '-NULL-', + 'other' + ) AS decode_result + FROM t; +``` + +Result: +``` +┌─a─┬─decode_result─┐ +│ 1 │ one │ +│ 2 │ two │ +│ │ -NULL- │ +│ 4 │ other │ +└───┴───────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/degrees.md b/tidb-cloud-lake/sql/degrees.md new file mode 100644 index 0000000000000..262dd769ca3ac --- /dev/null +++ b/tidb-cloud-lake/sql/degrees.md @@ -0,0 +1,23 @@ +--- +title: DEGREES +--- + +Converts the argument `x` from radians to degrees. + +## Syntax + +```sql +DEGREES( <x> ) +``` + +## Examples + +```sql +SELECT DEGREES(PI()); + +┌───────────────┐ +│ degrees(pi()) │ +├───────────────┤ +│ 180 │ +└───────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/delete.md b/tidb-cloud-lake/sql/delete.md new file mode 100644 index 0000000000000..3f5620ef71906 --- /dev/null +++ b/tidb-cloud-lake/sql/delete.md @@ -0,0 +1,152 @@ +--- +title: DELETE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes one or more rows from a table. + +:::tip Atomic Operations +Databend ensures data integrity with atomic operations. Inserts, updates, replaces, and deletes either succeed completely or fail entirely.
+ +::: + +## Syntax + +```sql +DELETE FROM <table_name> [AS <table_alias>] +[WHERE <condition>] +``` +- `AS <table_alias>`: Allows you to set an alias for a table, making it easier to reference the table within a query. This helps simplify and shorten the SQL code, especially when dealing with complex queries involving multiple tables. See an example in [Deleting with subquery using EXISTS / NOT EXISTS clause](#deleting-with-subquery-using-exists--not-exists-clause). + +- DELETE does not support the USING clause yet. If you need to use a subquery to identify the rows to be removed, include it within the WHERE clause directly. See examples in [Subquery-Based Deletions](#subquery-based-deletions). + +## Examples + +### Example 1: Direct Row Deletion + +This example illustrates the use of the DELETE command to directly remove a book record with an ID of 103 from a "bookstore" table. + +```sql +-- Create a table and insert 5 book records +CREATE TABLE bookstore ( + book_id INT, + book_name VARCHAR +); + +INSERT INTO bookstore VALUES (101, 'After the death of Don Juan'); +INSERT INTO bookstore VALUES (102, 'Grown ups'); +INSERT INTO bookstore VALUES (103, 'The long answer'); +INSERT INTO bookstore VALUES (104, 'Wartime friends'); +INSERT INTO bookstore VALUES (105, 'Deconstructed'); + +-- Delete a book (Id: 103) +DELETE FROM bookstore WHERE book_id = 103; + +-- Show all records after deletion +SELECT * FROM bookstore; + +101|After the death of Don Juan +102|Grown ups +104|Wartime friends +105|Deconstructed +``` + +### Example 2: Subquery-Based Deletions + +When using a subquery to identify the rows to be deleted, [Subquery Operators](/tidb-cloud-lake/sql/query-operators.md) and [Comparison Operators](/tidb-cloud-lake/sql/query-operators.md) can be utilized to achieve the desired deletion. + +The examples in this section are based on the following two tables: + +```sql +-- Create the 'employees' table +CREATE TABLE employees ( + id INT, + name VARCHAR, + department VARCHAR +); + +-- Insert values into the 'employees' table +INSERT INTO employees VALUES (1, 'John', 'HR'); +INSERT INTO employees VALUES (2, 'Mary', 'Sales'); +INSERT INTO employees VALUES (3, 'David', 'IT'); +INSERT INTO employees VALUES (4, 'Jessica', 'Finance'); + +-- Create the 'departments' table +CREATE TABLE departments ( + id INT, + department VARCHAR +); + +-- Insert values into the 'departments' table +INSERT INTO departments VALUES (1, 'Sales'); +INSERT INTO departments VALUES (2, 'IT'); +``` + +#### Deleting with subquery using IN / NOT IN clause + +```sql +DELETE FROM EMPLOYEES +WHERE DEPARTMENT IN ( + SELECT DEPARTMENT + FROM DEPARTMENTS +); +``` +This deletes employees whose department matches any department in the departments table. It would delete employees with IDs 2 and 3. + +#### Deleting with subquery using EXISTS / NOT EXISTS clause + +```sql +DELETE FROM EMPLOYEES +WHERE EXISTS ( + SELECT * + FROM DEPARTMENTS + WHERE EMPLOYEES.DEPARTMENT = DEPARTMENTS.DEPARTMENT +); + +-- Alternatively, you can delete employees using the alias 'e' for the 'EMPLOYEES' table and 'd' for the 'DEPARTMENTS' table when their department matches. +DELETE FROM EMPLOYEES AS e +WHERE EXISTS ( + SELECT * + FROM DEPARTMENTS AS d + WHERE e.DEPARTMENT = d.DEPARTMENT +); +``` +This deletes employees who belong to a department that exists in the departments table. In this case, it would delete employees with IDs 2 and 3.
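+ +To invert the check, `NOT EXISTS` keeps only the rows without a match. The following sketch uses the same sample tables and removes employees whose department does not appear in the departments table (IDs 1 and 4 in this data set): + +```sql +-- Delete employees whose department has no match in the departments table +DELETE FROM EMPLOYEES AS e +WHERE NOT EXISTS ( + SELECT * + FROM DEPARTMENTS AS d + WHERE e.DEPARTMENT = d.DEPARTMENT +); +```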
+ +#### Deleting with subquery using ALL clause + +```sql +DELETE FROM EMPLOYEES +WHERE DEPARTMENT = ALL ( + SELECT DEPARTMENT + FROM DEPARTMENTS +); +``` +This deletes employees whose department matches all departments in the departments table. In this case, no employees would be deleted. + +#### Deleting with subquery using ANY clause + +```sql +DELETE FROM EMPLOYEES +WHERE DEPARTMENT = ANY ( + SELECT DEPARTMENT + FROM DEPARTMENTS +); +``` +This deletes employees whose department matches any department in the departments table. In this case, it would delete employees with IDs 2 and 3. + +#### Deleting with subquery combining multiple conditions + +```sql +DELETE FROM EMPLOYEES +WHERE DEPARTMENT = ANY ( + SELECT DEPARTMENT + FROM DEPARTMENTS + WHERE EMPLOYEES.DEPARTMENT = DEPARTMENTS.DEPARTMENT +) + OR ID > 2; +``` + +This deletes employees from the employees table if the value of the department column matches any value in the department column of the departments table or if the value of the id column is greater than 2. In this case, it would delete the rows with id 2, 3, and 4 since Mary's department is "Sales," which exists in the departments table, and the IDs 3 and 4 are greater than 2. \ No newline at end of file diff --git a/tidb-cloud-lake/sql/delta-lake-engine.md b/tidb-cloud-lake/sql/delta-lake-engine.md new file mode 100644 index 0000000000000..b7b60d2a85df9 --- /dev/null +++ b/tidb-cloud-lake/sql/delta-lake-engine.md @@ -0,0 +1,42 @@ +--- +id: delta +title: Delta Lake Engine +sidebar_label: Delta Lake Engine +slug: /sql-reference/table-engines/delta +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend's [Delta Lake](https://delta.io/) engine allows you to seamlessly query and analyze data in Delta Lake tables stored in your object storage. When you create a table with the Delta Lake engine in Databend, you specify a location where the data files of a Delta Lake table are stored. This setup allows you to gain direct access to the table and perform queries directly from within Databend. + +- Databend's Delta Lake engine currently supports read-only operations. This means that querying data from your Delta Lake tables is supported, while writing to the tables is not. +- The schema for a table created with the Delta Lake engine is set at the time of its creation. Any modifications to the schema of the original Delta Lake table require the recreation of the corresponding table in Databend to ensure synchronization. +- The Delta Lake engine in Databend is built upon the official [delta-rs](https://github.com/delta-io/delta-rs) library. It is important to note that certain features defined in the Delta protocol, including Deletion Vector, Change Data Feed, Generated Columns, and Identity Columns, are NOT currently supported by this engine. + +## Syntax + +```sql +CREATE TABLE <table_name> +ENGINE = Delta +LOCATION = 's3://<bucket>/<path>' +CONNECTION_NAME = '<connection_name>' +``` + +Before creating a table with the Delta Lake engine, you need to create a connection object used to establish a connection with your S3 storage. To create a connection in Databend, use the [CREATE CONNECTION](/tidb-cloud-lake/sql/create-connection.md) command.
+ +## Examples + +```sql +--Set up connection +CREATE CONNECTION my_s3_conn +STORAGE_TYPE = 's3' +ACCESS_KEY_ID ='your-ak' SECRET_ACCESS_KEY ='your-sk'; + +-- Create table with Delta Lake engine +CREATE TABLE test_delta +ENGINE = Delta +LOCATION = 's3://testbucket/admin/data/delta/delta-table/' +CONNECTION_NAME = 'my_s3_conn'; +``` diff --git a/tidb-cloud-lake/sql/dense-rank.md b/tidb-cloud-lake/sql/dense-rank.md new file mode 100644 index 0000000000000..ac8c90677210d --- /dev/null +++ b/tidb-cloud-lake/sql/dense-rank.md @@ -0,0 +1,97 @@ +--- +title: DENSE_RANK +--- + +Assigns a rank to each row within a partition. Rows with equal values receive the same rank, with no gaps in subsequent rankings. + +## Syntax + +```sql +DENSE_RANK() +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the ranking order +- `ASC | DESC`: Optional. Sort direction (default: ASC) + +**Notes:** +- Ranks start from 1 +- Equal values get the same rank +- No gaps in ranking sequence after ties +- Example: 1, 2, 2, 3, 4 (not 1, 2, 2, 4, 5 like RANK) + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + subject VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 'Math', 95), + ('Alice', 'English', 87), + ('Alice', 'Science', 92), + ('Bob', 'Math', 85), + ('Bob', 'English', 85), + ('Bob', 'Science', 80), + ('Charlie', 'Math', 88), + ('Charlie', 'English', 85), + ('Charlie', 'Science', 85); +``` + +**Dense rank all scores (showing no gaps after ties):** + +```sql +SELECT student, subject, score, + DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank +FROM scores +ORDER BY score DESC, student, subject; +``` + +Result: +``` +student | subject | score | dense_rank +--------+---------+-------+----------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Charlie | Math | 88 | 3 +Alice | English | 87 | 4 +Bob | English | 85 | 5 +Bob | Math | 85 | 5 +Charlie | English | 85 | 5 +Charlie | Science | 85 | 5 +Bob | Science | 80 | 6 +``` + +**Dense rank scores within each student:** + +```sql +SELECT student, subject, score, + DENSE_RANK() OVER (PARTITION BY student ORDER BY score DESC) AS subject_dense_rank +FROM scores +ORDER BY student, score DESC, subject; +``` + +Result: +``` +student | subject | score | subject_dense_rank +--------+---------+-------+------------------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Alice | English | 87 | 3 +Bob | English | 85 | 1 +Bob | Math | 85 | 1 +Bob | Science | 80 | 2 +Charlie | Math | 88 | 1 +Charlie | English | 85 | 2 +Charlie | Science | 85 | 2 +``` + diff --git a/tidb-cloud-lake/sql/desc-connection.md b/tidb-cloud-lake/sql/desc-connection.md new file mode 100644 index 0000000000000..92f6bf06e8ab6 --- /dev/null +++ b/tidb-cloud-lake/sql/desc-connection.md @@ -0,0 +1,27 @@ +--- +title: DESC CONNECTION +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Describes the details of a specific connection, providing information about its type and configuration. 
+ +## Syntax + +```sql +DESC CONNECTION +``` + +## Examples + +```sql +DESC CONNECTION toronto; + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ storage_type │ storage_params │ +├─────────┼──────────────┼───────────────────────────────────────────────────────────────────────────────────┤ +│ toronto │ s3 │ access_key_id= secret_access_key= │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-masking-policy.md b/tidb-cloud-lake/sql/desc-masking-policy.md new file mode 100644 index 0000000000000..283e0ce027d44 --- /dev/null +++ b/tidb-cloud-lake/sql/desc-masking-policy.md @@ -0,0 +1,55 @@ +--- +title: DESC MASKING POLICY +sidebar_position: 2 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Displays detailed information about a specific masking policy in Databend. + +## Syntax + +```sql +DESC MASKING POLICY +``` + +## Access Control Requirements + +| Privilege | Description | +|:----------|:------------| +| APPLY MASKING POLICY | Required to describe a masking policy unless you own that policy. | + +Either the global `APPLY MASKING POLICY` privilege or APPLY/OWNERSHIP on the specific masking policy satisfies this requirement. + +## Examples + +```sql +CREATE MASKING POLICY email_mask +AS + (val string) + RETURNS string -> + CASE + WHEN current_role() IN ('MANAGERS') THEN + val + ELSE + '*********' + END + COMMENT = 'hide_email'; + +DESC MASKING POLICY email_mask; + +Name |Value | +-----------+---------------------------------------------------------------------+ +Name |email_mask | +Created On |2023-08-09 02:29:16.177898 UTC | +Signature |(val STRING) | +Return Type|STRING | +Body |CASE WHEN current_role() IN('MANAGERS') THEN VAL ELSE '*********' END| +Comment |hide_email | +``` diff --git a/tidb-cloud-lake/sql/desc-network-policy.md b/tidb-cloud-lake/sql/desc-network-policy.md new file mode 100644 index 0000000000000..0349f33576f6e --- /dev/null +++ b/tidb-cloud-lake/sql/desc-network-policy.md @@ -0,0 +1,26 @@ +--- +title: DESC NETWORK POLICY +sidebar_position: 2 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays detailed information about a specific network policy in Databend. It provides information about the allowed and blocked IP address lists associated with the policy and the comment, if any, that describes the purpose or function of the policy. + +## Syntax + +```sql +DESC NETWORK POLICY +``` + +## Examples + +```sql +DESC NETWORK POLICY test_policy; + +Name |Allowed Ip List |Blocked Ip List|Comment | +-----------+-------------------------+---------------+-----------+ +test_policy|192.168.10.0,192.168.20.0| |new comment| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-password-policy.md b/tidb-cloud-lake/sql/desc-password-policy.md new file mode 100644 index 0000000000000..bca48b3f0129c --- /dev/null +++ b/tidb-cloud-lake/sql/desc-password-policy.md @@ -0,0 +1,42 @@ +--- +title: DESC PASSWORD POLICY +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays detailed information about a specific password policy in Databend. 
For detailed descriptions of the password policy attributes, see [Password Policy Attributes](/tidb-cloud-lake/sql/create-password-policy.md#password-policy-attributes). + +## Syntax + +```sql +DESC PASSWORD POLICY +``` + +## Examples + +```sql +CREATE PASSWORD POLICY SecureLogin + PASSWORD_MIN_LENGTH = 10; + +DESC PASSWORD POLICY SecureLogin; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Property │ Value │ Default │ Description │ +├───────────────────────────────┼─────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ NAME │ SecureLogin │ NULL │ Name of password policy. │ +│ COMMENT │ │ NULL │ Comment of password policy. │ +│ PASSWORD_MIN_LENGTH │ 10 │ 8 │ Minimum length of new password. │ +│ PASSWORD_MAX_LENGTH │ 256 │ 256 │ Maximum length of new password. │ +│ PASSWORD_MIN_UPPER_CASE_CHARS │ 1 │ 1 │ Minimum number of uppercase characters in new password. │ +│ PASSWORD_MIN_LOWER_CASE_CHARS │ 1 │ 1 │ Minimum number of lowercase characters in new password. │ +│ PASSWORD_MIN_NUMERIC_CHARS │ 1 │ 1 │ Minimum number of numeric characters in new password. │ +│ PASSWORD_MIN_SPECIAL_CHARS │ 0 │ 0 │ Minimum number of special characters in new password. │ +│ PASSWORD_MIN_AGE_DAYS │ 0 │ 0 │ Period after a password is changed during which a password cannot be changed again, in days. │ +│ PASSWORD_MAX_AGE_DAYS │ 90 │ 90 │ Period after which password must be changed, in days. │ +│ PASSWORD_MAX_RETRIES │ 5 │ 5 │ Number of attempts users have to enter the correct password before their account is locked. │ +│ PASSWORD_LOCKOUT_TIME_MINS │ 15 │ 15 │ Period of time for which users will be locked after entering their password incorrectly many times (specified by MAX_RETRIES), in minutes. │ +│ PASSWORD_HISTORY │ 0 │ 0 │ Number of most recent passwords that may not be repeated by the user. │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-procedure.md b/tidb-cloud-lake/sql/desc-procedure.md new file mode 100644 index 0000000000000..cc9f1accc1ab3 --- /dev/null +++ b/tidb-cloud-lake/sql/desc-procedure.md @@ -0,0 +1,51 @@ +--- +title: DESC PROCEDURE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays detailed information about a specific stored procedure. + +## Syntax + +```sql +DESC | DESCRIBE PROCEDURE ([, , ...]) +``` + +- If a procedure has no parameters, use empty parentheses: `DESC PROCEDURE ()`; +- For procedures with parameters, specify the exact types to avoid errors. + +## Examples + +This example creates and then displays a stored procedure named `sum_even_numbers`. 
+ +```sql +CREATE PROCEDURE sum_even_numbers(start_val UInt8, end_val UInt8) +RETURNS UInt8 NOT NULL +LANGUAGE SQL +COMMENT='Calculate the sum of all even numbers' +AS $$ +BEGIN + LET sum := 0; + FOR i IN start_val TO end_val DO + IF i % 2 = 0 THEN + sum := sum + i; + END IF; + END FOR; + + RETURN sum; +END; +$$; + +DESC PROCEDURE sum_even_numbers(Uint8, Uint8); + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Property │ Value │ +├───────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ signature │ (start_val,end_val) │ +│ returns │ (UInt8) │ +│ language │ SQL │ +│ body │ BEGIN\n LET sum := 0;\n FOR i IN start_val TO end_val DO\n IF i % 2 = 0 THEN\n sum := sum + i;\n END IF;\n END FOR;\n \n RETURN sum;\nEND; │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-sequence.md b/tidb-cloud-lake/sql/desc-sequence.md new file mode 100644 index 0000000000000..c00ac29a458ed --- /dev/null +++ b/tidb-cloud-lake/sql/desc-sequence.md @@ -0,0 +1,39 @@ +--- +title: DESC SEQUENCE +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Describes the properties of a sequence. + +## Syntax + +```sql +DESC SEQUENCE +``` + +| Parameter | Description | +|----------------|-----------------------------------------------------------------------------------------------------------------------------| +| sequence_name | The name of the sequence to describe. This will display all properties of the sequence including start value, interval, current value, creation timestamp, last update timestamp, and any comment. | + +## Examples + +```sql +-- Create a sequence +CREATE SEQUENCE seq; + +-- Use the sequence in an INSERT statement +CREATE TABLE tmp(a int, b uint64, c int); +INSERT INTO tmp select 10,nextval(seq),20 from numbers(3); + +-- Describe the sequence +DESC SEQUENCE seq; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ name │ start │ interval │ current │ created_on │ updated_on │ comment │ +├────────┼────────┼──────────┼─────────┼────────────────────────────┼────────────────────────────┼──────────────────┤ +│ seq │ 1 │ 1 │ 4 │ 2025-05-20 02:48:49.749338 │ 2025-05-20 02:49:14.302917 │ NULL │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-stage.md b/tidb-cloud-lake/sql/desc-stage.md new file mode 100644 index 0000000000000..572c39dd2434f --- /dev/null +++ b/tidb-cloud-lake/sql/desc-stage.md @@ -0,0 +1,27 @@ +--- +title: DESC STAGE +sidebar_position: 2 +--- + +Describes the properties of a stage. 
+ +## Syntax + +```sql +DESC STAGE +``` + +## Examples + +```sql +CREATE STAGE my_int_stage; +``` + +```sql +DESC STAGE my_int_stage; ++--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +| name | stage_type | stage_params | copy_options | file_format_options | comment | ++--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +| my_int_stage | Internal | StageParams { storage: S3(StageS3Storage { bucket: "", path: "", credentials_aws_key_id: "", credentials_aws_secret_key: "", encryption_master_key: "" }) } | CopyOptions { on_error: None, size_limit: 0 } | FileFormatOptions { format: Parquet, skip_header: 0, field_delimiter: ",", record_delimiter: "\n", compression: None } | | ++--------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------+---------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-stream.md b/tidb-cloud-lake/sql/desc-stream.md new file mode 100644 index 0000000000000..2aedea5e37a3d --- /dev/null +++ b/tidb-cloud-lake/sql/desc-stream.md @@ -0,0 +1,31 @@ +--- +title: DESC STREAM +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Describes the details of a specific stream. + +## Syntax + +```sql +DESC|DESCRIBE STREAM [ . ] +``` + +## Examples + +```sql +DESC STREAM books_stream_2023; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ created_on │ name │ database │ catalog │ table_on │ owner │ comment │ mode │ invalid_reason │ +├────────────────────────────┼───────────────────┼──────────┼─────────┼─────────────────────┼──────────────────┼─────────┼─────────────┼────────────────┤ +│ 2023-11-29 02:38:29.588518 │ books_stream_2023 │ default │ default │ default.books_total │ NULL │ │ append_only │ │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-user.md b/tidb-cloud-lake/sql/desc-user.md new file mode 100644 index 0000000000000..01836197700d4 --- /dev/null +++ b/tidb-cloud-lake/sql/desc-user.md @@ -0,0 +1,44 @@ +--- +title: DESC USER +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays detailed information about a specific SQL user, including authentication type, roles, network policy, password policy, and other user-related settings. 
+ +## Syntax + +```sql +DESC[RIBE] USER +``` + +## Examples + +```sql +CREATE NETWORK POLICY my_network_policy ALLOWED_IP_LIST=('192.168.100.0/24'); + +CREATE PASSWORD POLICY my_password_policy + PASSWORD_MIN_LENGTH = 12 + PASSWORD_MAX_LENGTH = 24 + PASSWORD_MIN_UPPER_CASE_CHARS = 2 + PASSWORD_MIN_LOWER_CASE_CHARS = 2 + PASSWORD_MIN_NUMERIC_CHARS = 2 + PASSWORD_MIN_SPECIAL_CHARS = 2 + PASSWORD_MIN_AGE_DAYS = 1 + PASSWORD_MAX_AGE_DAYS = 30 + PASSWORD_MAX_RETRIES = 3 + PASSWORD_LOCKOUT_TIME_MINS = 30 + PASSWORD_HISTORY = 5 + COMMENT = 'test comment'; + +CREATE USER eric IDENTIFIED BY '123ABCabc$$123' WITH SET PASSWORD POLICY='my_password_policy', SET NETWORK POLICY='my_network_policy'; + +DESC USER eric; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ default_role │ roles │ disabled │ network_policy │ password_policy │ must_change_password │ +├────────┼──────────┼──────────────────────┼──────────────┼────────┼──────────┼───────────────────┼────────────────────┼──────────────────────┤ +│ eric │ % │ double_sha1_password │ │ │ false │ my_network_policy │ my_password_policy │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/desc-view.md b/tidb-cloud-lake/sql/desc-view.md new file mode 100644 index 0000000000000..6103df355345d --- /dev/null +++ b/tidb-cloud-lake/sql/desc-view.md @@ -0,0 +1,66 @@ +--- +title: DESC VIEW +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the list of columns for a view. + +## Syntax + +```sql +DESC[RIBE] VIEW [.] +``` + +## Output + +The command outputs a table with the following columns: + +| Column | Description | +|---------|-------------------------------------------------------------------------------------------------------------------------| +| Field | The name of the column in the view. | +| Type | The data type of the column. | +| Null | Indicates whether the column allows NULL values (YES for allowing NULL, NO for not allowing NULL). | +| Default | Specifies the default value for the column. | +| Extra | Provides additional information about the column, such as whether it is a computed column, or other special attributes. 
| + +## Examples + +```sql +-- Create the employees table +CREATE TABLE employees ( + employee_id INT, + first_name VARCHAR(50), + last_name VARCHAR(50), + email VARCHAR(100), + hire_date DATE, + department_id INT +); + +-- Insert data into the employees table +INSERT INTO employees (employee_id, first_name, last_name, email, hire_date, department_id) +VALUES +(1, 'John', 'Doe', 'john@example.com', '2020-01-01', 101), +(2, 'Jane', 'Smith', 'jane@example.com', '2020-02-01', 102), +(3, 'Alice', 'Johnson', 'alice@example.com', '2020-03-01', 103); + +-- Create the employee_info view +CREATE VIEW employee_info AS +SELECT employee_id, CONCAT(first_name, ' ', last_name) AS full_name, email, hire_date, department_id +FROM employees; + +-- Describe the structure of the employee_info view +DESC employee_info; + +┌─────────────────────────────────────────────────────┐ +│ Field │ Type │ Null │ Default │ Extra │ +├───────────────┼─────────┼────────┼─────────┼────────┤ +│ employee_id │ INT │ YES │ NULL │ │ +│ full_name │ VARCHAR │ YES │ NULL │ │ +│ email │ VARCHAR │ YES │ NULL │ │ +│ hire_date │ DATE │ YES │ NULL │ │ +│ department_id │ INT │ YES │ NULL │ │ +└─────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/describe-table.md b/tidb-cloud-lake/sql/describe-table.md new file mode 100644 index 0000000000000..891afcae90dbd --- /dev/null +++ b/tidb-cloud-lake/sql/describe-table.md @@ -0,0 +1,35 @@ +--- +title: DESCRIBE TABLE +sidebar_position: 2 +--- + +Shows information about the columns in a given table. Equivalent to [SHOW FIELDS](/tidb-cloud-lake/sql/show-fields.md). + +:::tip +[SHOW COLUMNS](show-full-columns.md) provides similar but more information about the columns of a table. +::: + +## Syntax + +```sql +DESC|DESCRIBE [TABLE] [ . ] +``` + +## Examples + +```sql +CREATE TABLE books + ( + price FLOAT Default 0.00, + pub_time DATETIME Default '1900-01-01', + author VARCHAR + ); + +DESC books; + +Field |Type |Null|Default |Extra| +--------+---------+----+----------------------------+-----+ +price |FLOAT |YES |0 | | +pub_time|TIMESTAMP|YES |'1900-01-01 00:00:00.000000'| | +author |VARCHAR |YES |NULL | | +``` diff --git a/tidb-cloud-lake/sql/div.md b/tidb-cloud-lake/sql/div.md new file mode 100644 index 0000000000000..afe4ff9b02bbf --- /dev/null +++ b/tidb-cloud-lake/sql/div.md @@ -0,0 +1,45 @@ +--- +title: DIV +--- + +Returns the quotient by dividing the first number by the second one, rounding down to the closest smaller integer. Equivalent to the division operator `//`. 
+ +See also: + +- [DIV0](/tidb-cloud-lake/sql/div0.md) +- [DIVNULL](/tidb-cloud-lake/sql/divnull.md) + +## Syntax + +```sql + DIV +``` + +## Aliases + +- [INTDIV](/tidb-cloud-lake/sql/intdiv.md) + +## Examples + +```sql +-- Equivalent to the division operator "//" +SELECT 6.1 DIV 2, 6.1//2; + +┌──────────────────────────┐ +│ (6.1 div 2) │ (6.1 // 2) │ +├─────────────┼────────────┤ +│ 3 │ 3 │ +└──────────────────────────┘ + +SELECT 6.1 DIV 2, INTDIV(6.1, 2), 6.1 DIV NULL; + +┌───────────────────────────────────────────────┐ +│ (6.1 div 2) │ intdiv(6.1, 2) │ (6.1 div null) │ +├─────────────┼────────────────┼────────────────┤ +│ 3 │ 3 │ NULL │ +└───────────────────────────────────────────────┘ + +-- Error when divided by 0 +root@localhost:8000/default> SELECT 6.1 DIV 0; +error: APIError: ResponseError with 1006: divided by zero while evaluating function `div(6.1, 0)` +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/div0.md b/tidb-cloud-lake/sql/div0.md new file mode 100644 index 0000000000000..bd234b6c94b23 --- /dev/null +++ b/tidb-cloud-lake/sql/div0.md @@ -0,0 +1,34 @@ +--- +title: DIV0 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the quotient by dividing the first number by the second one. Returns 0 if the second number is 0. + +See also: + +- [DIV](/tidb-cloud-lake/sql/div.md) +- [DIVNULL](/tidb-cloud-lake/sql/divnull.md) + +## Syntax + +```sql +DIV0(, ) +``` + +## Examples + +```sql +SELECT + DIV0(20, 6), + DIV0(20, 0), + DIV0(20, NULL); + +┌───────────────────────────────────────────────────┐ +│ div0(20, 6) │ div0(20, 0) │ div0(20, null) │ +├────────────────────┼─────────────┼────────────────┤ +│ 3.3333333333333335 │ 0 │ NULL │ +└───────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/divnull.md b/tidb-cloud-lake/sql/divnull.md new file mode 100644 index 0000000000000..c140393826d13 --- /dev/null +++ b/tidb-cloud-lake/sql/divnull.md @@ -0,0 +1,34 @@ +--- +title: DIVNULL +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the quotient by dividing the first number by the second one. Returns NULL if the second number is 0 or NULL. + +See also: + +- [DIV](/tidb-cloud-lake/sql/div.md) +- [DIV0](/tidb-cloud-lake/sql/div0.md) + +## Syntax + +```sql +DIVNULL(, ) +``` + +## Examples + +```sql +SELECT + DIVNULL(20, 6), + DIVNULL(20, 0), + DIVNULL(20, NULL); + +┌─────────────────────────────────────────────────────────┐ +│ divnull(20, 6) │ divnull(20, 0) │ divnull(20, null) │ +├────────────────────┼────────────────┼───────────────────┤ +│ 3.3333333333333335 │ NULL │ NULL │ +└─────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/dml.md b/tidb-cloud-lake/sql/dml.md new file mode 100644 index 0000000000000..cdaf2d3208b66 --- /dev/null +++ b/tidb-cloud-lake/sql/dml.md @@ -0,0 +1,23 @@ +--- +title: DML (Data Manipulation Language) Commands +--- + +This page provides reference information for the DML (Data Manipulation Language) commands in Databend. 
+ +## Data Modification + +| Command | Description | +|---------|-------------| +| **[INSERT](/tidb-cloud-lake/sql/insert.md)** | Add new rows to a table | +| **[INSERT MULTI](/tidb-cloud-lake/sql/insert-multi-table.md)** | Insert data into multiple tables in one statement | +| **[UPDATE](/tidb-cloud-lake/sql/update.md)** | Modify existing rows in a table | +| **[DELETE](/tidb-cloud-lake/sql/delete.md)** | Remove rows from a table | +| **[REPLACE](/tidb-cloud-lake/sql/replace.md)** | Insert new rows or update existing ones | +| **[MERGE](/tidb-cloud-lake/sql/merge.md)** | Perform upsert operations based on conditions | + +## Data Loading & Export + +| Command | Description | +|---------|-------------| +| **[COPY INTO Table](/tidb-cloud-lake/sql/copy-into-table.md)** | Load data from files into tables | +| **[COPY INTO Location](/tidb-cloud-lake/sql/copy-into-location.md)** | Export table data to files | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-aggregating-index.md b/tidb-cloud-lake/sql/drop-aggregating-index.md new file mode 100644 index 0000000000000..26c8811cf76aa --- /dev/null +++ b/tidb-cloud-lake/sql/drop-aggregating-index.md @@ -0,0 +1,24 @@ +--- +title: DROP AGGREGATING INDEX +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing aggregating index. Please note that deleting an aggregating index does NOT remove the associated storage blocks. To delete the blocks as well, use the [VACUUM TABLE](/tidb-cloud-lake/sql/vacuum-table.md) command. To disable the aggregating indexing feature, set `enable_aggregating_index_scan` to 0. + +## Syntax + +```sql +DROP AGGREGATING INDEX +``` + +## Examples + +This example deleted an aggregating index named *my_agg_index*: + +```sql +DROP AGGREGATING INDEX my_agg_index; +``` diff --git a/tidb-cloud-lake/sql/drop-cluster-key.md b/tidb-cloud-lake/sql/drop-cluster-key.md new file mode 100644 index 0000000000000..61a4a2690eb6d --- /dev/null +++ b/tidb-cloud-lake/sql/drop-cluster-key.md @@ -0,0 +1,23 @@ +--- +title: DROP CLUSTER KEY +sidebar_position: 4 +--- + +Deletes the cluster key for a table. + +See also: +[ALTER CLUSTER KEY](/tidb-cloud-lake/sql/alter-cluster-key.md) + +## Syntax + +```sql +ALTER TABLE [ IF EXISTS ] DROP CLUSTER KEY +``` + +## Examples + +This command drops the cluster key for table *test*: + +```sql +ALTER TABLE test DROP CLUSTER KEY +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-connection.md b/tidb-cloud-lake/sql/drop-connection.md new file mode 100644 index 0000000000000..4a9ac01bf4fff --- /dev/null +++ b/tidb-cloud-lake/sql/drop-connection.md @@ -0,0 +1,21 @@ +--- +title: DROP CONNECTION +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing connection. + +## Syntax + +```sql +DROP CONNECTION [ IF EXISTS ] +``` + +## Examples + +```sql +DROP CONNECTION toronto; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-database.md b/tidb-cloud-lake/sql/drop-database.md new file mode 100644 index 0000000000000..2809bfb70b3c3 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-database.md @@ -0,0 +1,33 @@ +--- +title: DROP DATABASE +--- + +Drops a database. + +See also: [UNDROP DATABASE](/tidb-cloud-lake/sql/undrop-database.md) + +## Syntax + +```sql +DROP { DATABASE | SCHEMA } [ IF EXISTS ] +``` + +`DROP SCHEMA` is a synonym for `DROP DATABASE`. 
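+ +For example, either form removes the same database; the database name below is only illustrative: + +```sql +DROP DATABASE IF EXISTS sales_archive; +-- Equivalent, using the SCHEMA synonym +DROP SCHEMA IF EXISTS sales_archive; +```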
+ +## Examples + +This example creates and then drops a database named "orders_2024": + +```sql +root@localhost:8000/default> CREATE DATABASE orders_2024; + +CREATE DATABASE orders_2024 + +0 row written in 0.014 sec. Processed 0 row, 0 B (0 row/s, 0 B/s) + +root@localhost:8000/default> DROP DATABASE orders_2024; + +DROP DATABASE orders_2024 + +0 row written in 0.012 sec. Processed 0 row, 0 B (0 row/s, 0 B/s) +``` diff --git a/tidb-cloud-lake/sql/drop-file-format.md b/tidb-cloud-lake/sql/drop-file-format.md new file mode 100644 index 0000000000000..07c9b1b2a060a --- /dev/null +++ b/tidb-cloud-lake/sql/drop-file-format.md @@ -0,0 +1,18 @@ +--- +title: DROP FILE FORMAT +sidebar_position: 3 +--- + +Removes a file format. + +## Syntax + +```sql +DROP FILE FORMAT [ IF EXISTS ] ; +``` + +## Examples + +```sql +DROP FILE FORMAT IF EXISTS my_custom_csv; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-function-sql.md b/tidb-cloud-lake/sql/drop-function-sql.md new file mode 100644 index 0000000000000..c3cab10da7246 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-function-sql.md @@ -0,0 +1,24 @@ +--- +title: DROP FUNCTION +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Drops an external function. + +## Syntax + +```sql +DROP FUNCTION [ IF EXISTS ] +``` + +## Examples + +```sql +DROP FUNCTION a_plus_3; + +SELECT a_plus_3(2); +ERROR 1105 (HY000): Code: 2602, Text = Unknown Function a_plus_3 (while in analyze select projection). +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-function.md b/tidb-cloud-lake/sql/drop-function.md new file mode 100644 index 0000000000000..d6137cd958dcf --- /dev/null +++ b/tidb-cloud-lake/sql/drop-function.md @@ -0,0 +1,64 @@ +--- +title: DROP FUNCTION +sidebar_position: 6 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Drops a user-defined function. Works with all function types: Scalar SQL, Tabular SQL, and Embedded functions. 
+ +## Syntax + +```sql +DROP FUNCTION [ IF EXISTS ] +``` + +## Examples + +### Dropping Scalar SQL Function +```sql +-- Create a scalar function +CREATE FUNCTION calculate_bmi(weight FLOAT, height FLOAT) +RETURNS FLOAT +AS $$ weight / (height * height) $$; + +-- Drop the function +DROP FUNCTION calculate_bmi; +``` + +### Dropping Tabular SQL Function +```sql +-- Create a table function +CREATE FUNCTION get_employees_by_dept(dept_name VARCHAR) +RETURNS TABLE (id INT, name VARCHAR, department VARCHAR) +AS $$ SELECT id, name, department FROM employees WHERE department = dept_name $$; + +-- Drop the function +DROP FUNCTION get_employees_by_dept; +``` + +### Dropping Embedded Function +```sql +-- Create a Python function +CREATE FUNCTION custom_hash(input_str VARCHAR) +RETURNS VARCHAR +LANGUAGE python +HANDLER = 'hash_func' +AS $$ +import hashlib +def hash_func(s): + return hashlib.md5(s.encode()).hexdigest() +$$; + +-- Drop the function +DROP FUNCTION custom_hash; +``` + +### Using IF EXISTS +```sql +-- Safe drop - won't error if function doesn't exist +DROP FUNCTION IF EXISTS non_existent_function; + +-- This will succeed without error even if the function doesn't exist +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-inverted-index.md b/tidb-cloud-lake/sql/drop-inverted-index.md new file mode 100644 index 0000000000000..6a107597b81b2 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-inverted-index.md @@ -0,0 +1,22 @@ +--- +title: DROP INVERTED INDEX +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes an inverted index in Databend. + +## Syntax + +```sql +DROP INVERTED INDEX [IF EXISTS] ON [.]
+``` + +## Examples + +```sql +-- Drop the inverted index 'customer_feedback_idx' on the 'customer_feedback' table +DROP INVERTED INDEX customer_feedback_idx ON customer_feedback; +``` diff --git a/tidb-cloud-lake/sql/drop-masking-policy.md b/tidb-cloud-lake/sql/drop-masking-policy.md new file mode 100644 index 0000000000000..df406a91a0591 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-masking-policy.md @@ -0,0 +1,46 @@ +--- +title: DROP MASKING POLICY +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Deletes an existing masking policy from Databend. When you drop a masking policy, it is removed from Databend, and its associated masking rules are no longer in effect. Please note that, before dropping a masking policy, ensure that this policy is not associated with any columns. + +## Syntax + +```sql +DROP MASKING POLICY [ IF EXISTS ] +``` + +## Access Control Requirements + +| Privilege | Description | +|:----------|:------------| +| APPLY MASKING POLICY | Required to drop a masking policy unless you own that policy. | + +You must have the global `APPLY MASKING POLICY` privilege or APPLY/OWNERSHIP on the target policy. Databend automatically revokes OWNERSHIP from the creator role after the policy is dropped. + +## Examples + +```sql +CREATE MASKING POLICY email_mask +AS + (val string) + RETURNS string -> + CASE + WHEN current_role() IN ('MANAGERS') THEN + val + ELSE + '*********' + END + COMMENT = 'hide_email'; + +DROP MASKING POLICY email_mask; +``` diff --git a/tidb-cloud-lake/sql/drop-network-policy.md b/tidb-cloud-lake/sql/drop-network-policy.md new file mode 100644 index 0000000000000..1c88c9d5f5e96 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-network-policy.md @@ -0,0 +1,22 @@ +--- +title: DROP NETWORK POLICY +sidebar_position: 5 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing network policy from Databend. When you drop a network policy, it is removed from Databend, and its associated rules for allowed and blocked IP address lists are no longer in effect. Please note that, before dropping a network policy, ensure that this policy is not associated with any users. + +## Syntax + +```sql +DROP NETWORK POLICY [ IF EXISTS ] +``` + +## Examples + +```sql +DROP NETWORK POLICY test_policy +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-ngram-index.md b/tidb-cloud-lake/sql/drop-ngram-index.md new file mode 100644 index 0000000000000..3eb9e13b89088 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-ngram-index.md @@ -0,0 +1,25 @@ +--- +title: DROP NGRAM INDEX +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Drops an existing NGRAM index from a table. 
+ +## Syntax + +```sql +DROP NGRAM INDEX [IF EXISTS] +ON [.]; +``` + +## Examples + +The following example drops the `idx1` index from the `amazon_reviews_ngram` table: + +```sql +DROP NGRAM INDEX idx1 ON amazon_reviews_ngram; +``` diff --git a/tidb-cloud-lake/sql/drop-notification-integration.md b/tidb-cloud-lake/sql/drop-notification-integration.md new file mode 100644 index 0000000000000..e9891852e8d35 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-notification-integration.md @@ -0,0 +1,31 @@ +--- +title: DROP NOTIFICATION INTEGRATION +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The DROP NOTIFICATION INTEGRATION statement is used to delete an existing notification. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax + +```sql +DROP NOTIFICATION INTEGRATION [ IF EXISTS ] +``` + +| Parameter | Description | +|----------------------------------|------------------------------------------------------------------------------------------------------| +| IF EXISTS | Optional. If specified, the notification will only be dropped if a notification of the same name already exists. | +| name | The name of the notification. This is a mandatory field. | + + +## Usage Examples + +```sql +DROP NOTIFICATION INTEGRATION IF EXISTS error_notification; +``` + +This command deletes the notification integration named `error_notification` if it exists. diff --git a/tidb-cloud-lake/sql/drop-password-policy.md b/tidb-cloud-lake/sql/drop-password-policy.md new file mode 100644 index 0000000000000..9662cb5b77132 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-password-policy.md @@ -0,0 +1,23 @@ +--- +title: DROP PASSWORD POLICY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing password policy from Databend. Please note that, before dropping a password policy, ensure that this policy is not associated with any users. + +## Syntax + +```sql +DROP PASSWORD POLICY [ IF EXISTS ] +``` + +## Examples + +```sql +CREATE PASSWORD POLICY SecureLogin + PASSWORD_MIN_LENGTH = 10; + +DROP PASSWORD POLICY SecureLogin; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-procedure.md b/tidb-cloud-lake/sql/drop-procedure.md new file mode 100644 index 0000000000000..02b24ae772212 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-procedure.md @@ -0,0 +1,35 @@ +--- +title: DROP PROCEDURE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing stored procedure. + +## Syntax + +```sql +DROP PROCEDURE ([, , ...]) +``` + +- If a procedure has no parameters, use empty parentheses: `DROP PROCEDURE ()`; +- For procedures with parameters, specify the exact types to avoid errors. + +## Examples + +This example creates and then drops a stored procedure: + +```sql +CREATE PROCEDURE convert_kg_to_lb(kg DECIMAL(4, 2)) +RETURNS DECIMAL(10, 2) +LANGUAGE SQL +COMMENT = 'Converts kilograms to pounds' +AS $$ +BEGIN + RETURN kg * 2.20462; +END; +$$; + +DROP PROCEDURE convert_kg_to_lb(Decimal(4, 2)); +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-role.md b/tidb-cloud-lake/sql/drop-role.md new file mode 100644 index 0000000000000..d3347749f8640 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-role.md @@ -0,0 +1,21 @@ +--- +title: DROP ROLE +sidebar_position: 8 +--- + +Removes the specified role from the system. 
+ +## Syntax + +```sql +DROP ROLE [ IF EXISTS ] +``` + +## Usage Notes +* If a role is a grant to users, Databend can't drop the grants from the role automatically. + +## Examples + +```sql +DROP ROLE role1; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-sequence.md b/tidb-cloud-lake/sql/drop-sequence.md new file mode 100644 index 0000000000000..fe6bc75b76a8a --- /dev/null +++ b/tidb-cloud-lake/sql/drop-sequence.md @@ -0,0 +1,26 @@ +--- +title: DROP SEQUENCE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes an existing sequence from Databend. + +## Syntax + +```sql +DROP SEQUENCE [IF EXISTS] +``` + +| Parameter | Description | +|--------------|-----------------------------------------| +| `` | The name of the sequence to be deleted. | + +## Examples + +```sql +-- Delete a sequence named staff_id_seq +DROP SEQUENCE staff_id_seq; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-stage.md b/tidb-cloud-lake/sql/drop-stage.md new file mode 100644 index 0000000000000..83bd651ab4cbf --- /dev/null +++ b/tidb-cloud-lake/sql/drop-stage.md @@ -0,0 +1,18 @@ +--- +title: DROP STAGE +sidebar_position: 7 +--- + +Removes a stage. + +## Syntax + +```sql +DROP STAGE [ IF EXISTS ] ; +``` + +## Examples + +```sql +DROP STAGE IF EXISTS test_stage; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-stream.md b/tidb-cloud-lake/sql/drop-stream.md new file mode 100644 index 0000000000000..d0b34888aad13 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-stream.md @@ -0,0 +1,25 @@ +--- +title: DROP STREAM +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Deletes an existing stream. + +## Syntax + +```sql +DROP STREAM [ IF EXISTS ] [ . ] +``` + +## Examples + +```sql +DROP STREAM books_stream_2023; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-table.md b/tidb-cloud-lake/sql/drop-table.md new file mode 100644 index 0000000000000..578b873169676 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-table.md @@ -0,0 +1,59 @@ +--- +title: DROP TABLE +sidebar_position: 19 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes a table. + +**See also:** + +- [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md) +- [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md) +- [TRUNCATE TABLE](/tidb-cloud-lake/sql/truncate-table.md) + +## Syntax + +```sql +DROP TABLE [ IF EXISTS ] [ . ] +``` + +This command only marks the table schema as deleted in the metadata service, ensuring that the actual data remains intact. If you need to recover the deleted table schema, you can use the [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md) command. + +For completely removing a table along with its data files, consider using the [VACUUM DROP TABLE](/tidb-cloud-lake/sql/vacuum-drop-table.md) command. + + +## Examples + +### Deleting a Table + +This example highlights the use of the DROP TABLE command to delete the "test" table. After dropping the table, any attempt to SELECT from it results in an "Unknown table" error. It also demonstrates how to recover the dropped "test" table using the UNDROP TABLE command, allowing you to SELECT data from it again. 
+ +```sql +CREATE TABLE test(a INT, b VARCHAR); +INSERT INTO test (a, b) VALUES (1, 'example'); +SELECT * FROM test; + +a|b | +-+-------+ +1|example| + +-- Delete the table +DROP TABLE test; +SELECT * FROM test; +>> SQL Error [1105] [HY000]: UnknownTable. Code: 1025, Text = error: + --> SQL:1:80 + | +1 | /* ApplicationName=DBeaver 23.2.0 - SQLEditor */ SELECT * FROM test + | ^^^^ Unknown table `default`.`test` in catalog 'default' + +-- Recover the table +UNDROP TABLE test; +SELECT * FROM test; + +a|b | +-+-------+ +1|example| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-task.md b/tidb-cloud-lake/sql/drop-task.md new file mode 100644 index 0000000000000..97bab69cee9cc --- /dev/null +++ b/tidb-cloud-lake/sql/drop-task.md @@ -0,0 +1,35 @@ +--- +title: DROP TASK +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The DROP TASK statement is used to delete an existing task. + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax + +```sql +DROP TASK [ IF EXISTS ] +``` + +| Parameter | Description | +|----------------------------------|------------------------------------------------------------------------------------------------------| +| IF EXISTS | Optional. If specified, the task will only be dropped if a task of the same name already exists. | +| name | The name of the task. This is a mandatory field. | + +## Usage Notes: + +- If a predecessor task in a DAG is dropped, then all former child tasks that identified this task as the predecessor become either standalone tasks or root tasks, depending on whether other tasks identify these former child tasks as their predecessor. These former child tasks are suspended by default and must be resumed manually. +- Root Task must be suspended before DROP + +## Usage Examples + +```sql +DROP TASK IF EXISTS mytask; +``` + +This command deletes the task named mytask if it exists. \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-user.md b/tidb-cloud-lake/sql/drop-user.md new file mode 100644 index 0000000000000..25eba619e0b44 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-user.md @@ -0,0 +1,18 @@ +--- +title: DROP USER +sidebar_position: 4 +--- + +Drop the specified user from the system. + +## Syntax + +```sql +DROP USER [ IF EXISTS ] +``` + +## Examples + +```sql +DROP USER user1; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-vector-index.md b/tidb-cloud-lake/sql/drop-vector-index.md new file mode 100644 index 0000000000000..8f2091b5781ff --- /dev/null +++ b/tidb-cloud-lake/sql/drop-vector-index.md @@ -0,0 +1,34 @@ +--- +title: DROP VECTOR INDEX +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes a Vector index from a table. + +## Syntax + +```sql +DROP VECTOR INDEX [IF EXISTS] ON [.] 
+``` + +## Examples + +```sql +-- Create a table with a vector index +CREATE TABLE articles ( + id INT, + title VARCHAR, + embedding VECTOR(768), + VECTOR INDEX idx_embedding(embedding) distance = 'cosine' +); + +-- Drop the vector index +DROP VECTOR INDEX idx_embedding ON articles; + +-- Drop with IF EXISTS to avoid errors if index doesn't exist +DROP VECTOR INDEX IF EXISTS idx_embedding ON articles; +``` diff --git a/tidb-cloud-lake/sql/drop-view.md b/tidb-cloud-lake/sql/drop-view.md new file mode 100644 index 0000000000000..990e358da6e37 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-view.md @@ -0,0 +1,21 @@ +--- +title: DROP VIEW +sidebar_position: 5 +--- + +Drop the view. + +## Syntax + +```sql +DROP VIEW [ IF EXISTS ] [ . ]view_name +``` + +## Examples + +```sql +DROP VIEW IF EXISTS tmp_view; + +SELECT * FROM tmp_view; +ERROR 1105 (HY000): Code: 1025, Text = Unknown table 'tmp_view'. +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/drop-warehouse.md b/tidb-cloud-lake/sql/drop-warehouse.md new file mode 100644 index 0000000000000..60849da192f48 --- /dev/null +++ b/tidb-cloud-lake/sql/drop-warehouse.md @@ -0,0 +1,35 @@ +--- +title: DROP WAREHOUSE +sidebar_position: 5 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes a warehouse and frees up the resources associated with it. + +## Syntax + +```sql +DROP WAREHOUSE [ IF EXISTS ] +``` + +| Parameter | Description | +| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | +| `IF EXISTS` | Optional. If specified, the command succeeds silently when the warehouse does not exist. Without it, the command fails if the warehouse is absent. | +| warehouse_name | The name of the warehouse to remove. | + +## Examples + +Drop a warehouse: + +```sql +DROP WAREHOUSE my_warehouse; +``` + +Drop a warehouse only if it exists: + +```sql +DROP WAREHOUSE IF EXISTS my_warehouse; +``` diff --git a/tidb-cloud-lake/sql/drop-workload-group.md b/tidb-cloud-lake/sql/drop-workload-group.md new file mode 100644 index 0000000000000..c4ac1ccc350bb --- /dev/null +++ b/tidb-cloud-lake/sql/drop-workload-group.md @@ -0,0 +1,22 @@ +--- +title: DROP WORKLOAD GROUP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes the specified workload group. + +## Syntax + +```sql +DROP WORKLOAD GROUP [IF EXISTS] +``` + +## Examples + +This example removes the `test_workload_group` workload group: + +```sql +DROP WORKLOAD GROUP test_workload_group; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/epoch.md b/tidb-cloud-lake/sql/epoch.md new file mode 100644 index 0000000000000..92e971345d5e6 --- /dev/null +++ b/tidb-cloud-lake/sql/epoch.md @@ -0,0 +1,8 @@ +--- +title: EPOCH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_SECONDS](/tidb-cloud-lake/sql/seconds.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/error.md b/tidb-cloud-lake/sql/error.md new file mode 100644 index 0000000000000..009e0ad8ecbc4 --- /dev/null +++ b/tidb-cloud-lake/sql/error.md @@ -0,0 +1,37 @@ +--- +title: ERROR_OR +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the first non-error expression among its inputs. If all expressions result in errors, it returns NULL. + +## Syntax + +```sql +ERROR_OR(expr1, expr2, ...) 
+``` + +## Examples + +```sql +-- Returns the valid date if no errors occur +-- Returns the current date if the conversion results in an error +SELECT NOW(), ERROR_OR('2024-12-25'::DATE, NOW()::DATE); + +┌────────────────────────────────────────────────────────────────────────┐ +│ now() │ error_or('2024-12-25'::date, now()::date) │ +├────────────────────────────┼───────────────────────────────────────────┤ +│ 2024-03-18 01:22:39.460320 │ 2024-12-25 │ +└────────────────────────────────────────────────────────────────────────┘ + +-- Returns NULL because the conversion results in an error +SELECT ERROR_OR('2024-1234'::DATE); + +┌─────────────────────────────┐ +│ error_or('2024-1234'::date) │ +├─────────────────────────────┤ +│ NULL │ +└─────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/execute-immediate.md b/tidb-cloud-lake/sql/execute-immediate.md new file mode 100644 index 0000000000000..85f7d53c4e804 --- /dev/null +++ b/tidb-cloud-lake/sql/execute-immediate.md @@ -0,0 +1,68 @@ +--- +title: EXECUTE IMMEDIATE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Executes a SQL script. For how to write SQL scripts for Databend, see [Stored Procedure & SQL Scripting](/sql/stored-procedure-scripting). + +## Syntax + +```sql +EXECUTE IMMEDIATE $$ +BEGIN + + RETURN ; -- Use to return a single value + -- OR + RETURN TABLE(); -- Use to return a table +END; +$$; +``` + +## Examples + +This example uses a loop to increment sum by iterating from -1 to 2, and the result is the sum (2): + +```sql +EXECUTE IMMEDIATE $$ +BEGIN + LET x := -1; + LET sum := 0; + FOR x IN x TO x + 3 DO + sum := sum + x; + END FOR; + RETURN sum; +END; +$$; + +┌────────┐ +│ Result │ +│ String │ +├────────┤ +│ 2 │ +└────────┘ +``` + +The following example returns a table with a column `1 + 1` and the value 2: + +```sql +EXECUTE IMMEDIATE $$ +BEGIN + LET x := 1; + RETURN TABLE(SELECT :x + 1); +END; +$$; + +┌───────────┐ +│ Result │ +│ String │ +├───────────┤ +│ ┌───────┐ │ +│ │ 1 + 1 │ │ +│ │ UInt8 │ │ +│ ├───────┤ │ +│ │ 2 │ │ +│ └───────┘ │ +└───────────┘ +``` diff --git a/tidb-cloud-lake/sql/execute-task.md b/tidb-cloud-lake/sql/execute-task.md new file mode 100644 index 0000000000000..6f4d3692b56e1 --- /dev/null +++ b/tidb-cloud-lake/sql/execute-task.md @@ -0,0 +1,32 @@ +--- +title: EXECUTE TASK +sidebar_position: 4 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The EXECUTE TASK statement is used to execute an existing task manually + +**NOTICE:** this functionality works out of the box only in Databend Cloud. + +## Syntax + +```sql +EXECUTE TASK +``` + +| Parameter | Description | +|----------------------------------|------------------------------------------------------------------------------------------------------| +| name | The name of the task. This is a mandatory field. | + +## Usage Notes: +- The SQL command can only execute a standalone task or the root task in a DAG. If a child task is input, the command returns a user error. + +## Usage Examples + +```sql +EXECUTE TASK mytask; +``` + +This command executes the task named mytask. diff --git a/tidb-cloud-lake/sql/exists.md b/tidb-cloud-lake/sql/exists.md new file mode 100644 index 0000000000000..ac3f037cc1bad --- /dev/null +++ b/tidb-cloud-lake/sql/exists.md @@ -0,0 +1,25 @@ +--- +title: EXISTS +--- + +The exists condition is used in combination with a subquery and is considered "to be met" if the subquery returns at least one row. 
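+
+In practice, the subquery is usually correlated with the outer query. The following sketch assumes two hypothetical tables, `customers` and `orders`, and keeps only customers that have placed at least one order:
+
+```sql
+-- Correlated EXISTS: the subquery references the outer row
+SELECT c.id, c.name
+FROM customers c
+WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
+```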
+ +## Syntax + +```sql +WHERE EXISTS ( ); +``` + +## Examples +```sql +SELECT number FROM numbers(5) AS A WHERE exists (SELECT * FROM numbers(3) WHERE number=1); ++--------+ +| number | ++--------+ +| 0 | +| 1 | +| 2 | +| 3 | +| 4 | ++--------+ +``` diff --git a/tidb-cloud-lake/sql/exp.md b/tidb-cloud-lake/sql/exp.md new file mode 100644 index 0000000000000..31fefd05769cc --- /dev/null +++ b/tidb-cloud-lake/sql/exp.md @@ -0,0 +1,23 @@ +--- +title: EXP +--- + +Returns the value of e (the base of natural logarithms) raised to the power of `x`. + +## Syntax + +```sql +EXP( ) +``` + +## Examples + +```sql +SELECT EXP(2); + +┌──────────────────┐ +│ exp(2) │ +├──────────────────┤ +│ 7.38905609893065 │ +└──────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-analyze-graphical.md b/tidb-cloud-lake/sql/explain-analyze-graphical.md new file mode 100644 index 0000000000000..3a7d3b466982d --- /dev/null +++ b/tidb-cloud-lake/sql/explain-analyze-graphical.md @@ -0,0 +1,43 @@ +--- +title: EXPLAIN ANALYZE GRAPHICAL +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Analyzes query performance with an interactive visual representation in your browser. Available exclusively in BendSQL v0.22.2+. + +## Syntax + +```sql +EXPLAIN ANALYZE GRAPHICAL +``` + +## Configuration + +Add to your BendSQL config file `~/.config/bendsql/config.toml`: + +```toml +[server] +bind_address = "127.0.0.1" +auto_open_browser = true +``` + +## Example + +```sql +EXPLAIN ANALYZE GRAPHICAL SELECT l_returnflag, COUNT(*) +FROM lineitem +WHERE l_shipdate <= '1998-09-01' +GROUP BY l_returnflag; +``` + +Output: +```bash +View graphical online: http://127.0.0.1:8080?perf_id=1 +``` + +Opens an interactive view showing execution plan, operator runtimes, and data flow. + +![Graphical Analysis](@site/static/img/documents/sql/explain-graphical.png) \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-analyze.md b/tidb-cloud-lake/sql/explain-analyze.md new file mode 100644 index 0000000000000..dbcf683873a56 --- /dev/null +++ b/tidb-cloud-lake/sql/explain-analyze.md @@ -0,0 +1,181 @@ +--- +title: EXPLAIN ANALYZE +--- + +`EXPLAIN ANALYZE` used to display a query execution plan along with actual run-time performance statistics. + +This is useful for analyzing query performance and identifying bottlenecks in a query. 
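+
+To get a feel for the output before reading a large plan, you can run it on a small query first; each operator is printed together with runtime statistics such as `total process time`:
+
+```sql
+-- A minimal run; the TPC-H example below shows a full-sized plan
+EXPLAIN ANALYZE SELECT COUNT(*) FROM numbers(1000000);
+```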
+ +## Syntax + +```sql +EXPLAIN ANALYZE +``` + +## Examples + +TPC-H Q21: +```sql +EXPLAIN ANALYZE SELECT s_name, + -> Count(*) AS numwait + -> FROM supplier, + -> lineitem l1, + -> orders, + -> nation + -> WHERE s_suppkey = l1.l_suppkey + -> AND o_orderkey = l1.l_orderkey + -> AND o_orderstatus = 'F' + -> AND l1.l_receiptdate > l1.l_commitdate + -> AND EXISTS (SELECT * + -> FROM lineitem l2 + -> WHERE l2.l_orderkey = l1.l_orderkey + -> AND l2.l_suppkey <> l1.l_suppkey) + -> AND NOT EXISTS (SELECT * + -> FROM lineitem l3 + -> WHERE l3.l_orderkey = l1.l_orderkey + -> AND l3.l_suppkey <> l1.l_suppkey + -> AND l3.l_receiptdate > l3.l_commitdate) + -> AND s_nationkey = n_nationkey + -> AND n_name = 'EGYPT' + -> GROUP BY s_name + -> ORDER BY numwait DESC, + -> s_name + -> LIMIT 100; ++------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| explain | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Limit | +| ├── limit: 100 | +| ├── offset: 0 | +| ├── estimated rows: 100.00 | +| ├── total process time: 0ms | +| └── Sort | +| ├── sort keys: [numwait DESC NULLS LAST, s_name ASC NULLS LAST] | +| ├── estimated rows: 11000.00 | +| ├── total process time: 0ms | +| └── EvalScalar | +| ├── expressions: [COUNT(*) (#70)] | +| ├── estimated rows: 11000.00 | +| ├── total process time: 0ms | +| └── AggregateFinal | +| ├── group by: [s_name] | +| ├── aggregate functions: [count()] | +| ├── estimated rows: 11000.00 | +| └── AggregatePartial | +| ├── group by: [s_name] | +| ├── aggregate functions: [count()] | +| ├── estimated rows: 11000.00 | +| ├── total process time: 1ms | +| └── HashJoin | +| ├── join type: LEFT ANTI | +| ├── build keys: [l3.l_orderkey (#52)] | +| ├── probe keys: [l1.l_orderkey (#7)] | +| ├── filters: [noteq(l3.l_suppkey (#54), l1.l_suppkey (#9))] | +| ├── estimated rows: 1633696.00 | +| ├── total process time: 788ms | +| ├── Filter(Build) | +| │ ├── filters: [gt(l3.l_receiptdate (#64), l3.l_commitdate (#63))] | +| │ ├── estimated rows: 2400786.33 | +| │ ├── total process time: 85ms | +| │ └── TableScan | +| │ ├── table: default.tpch.lineitem | +| │ ├── read rows: 7202359 | +| │ ├── read bytes: 42731029 | +| │ ├── partitions total: 9 | +| │ ├── partitions scanned: 9 | +| │ ├── pruning stats: [segments: , blocks: ] | +| │ ├── push downs: [filters: [gt(l3.l_receiptdate (#64), l3.l_commitdate (#63))], limit: NONE] | +| │ ├── output columns: [l_orderkey, l_suppkey, l_commitdate, l_receiptdate] | +| │ └── estimated rows: 7202359.00 | +| └── HashJoin(Probe) | +| ├── join type: LEFT SEMI | +| ├── build keys: [l2.l_orderkey (#36)] | +| ├── probe keys: [l1.l_orderkey (#7)] | +| ├── filters: [noteq(l2.l_suppkey (#38), l1.l_suppkey (#9))] | +| ├── estimated rows: 1633696.00 | +| ├── total process time: 905ms | +| ├── TableScan(Build) | +| │ ├── table: default.tpch.lineitem | +| │ ├── read rows: 7202359 | +| │ ├── read bytes: 17507468 | +| │ ├── partitions total: 9 | +| │ ├── partitions scanned: 9 | +| │ ├── pruning stats: [segments: , blocks: ] | +| │ ├── push downs: [filters: [], limit: NONE] | +| │ ├── output columns: [l_orderkey, l_suppkey] | +| │ └── estimated rows: 7202359.00 | +| └── HashJoin(Probe) | +| ├── join type: INNER | +| ├── build keys: [orders.o_orderkey (#23)] | +| ├── probe keys: [l1.l_orderkey (#7)] | +| ├── filters: [] | +| ├── estimated rows: 1633696.00 
| +| ├── total process time: 338ms | +| ├── Filter(Build) | +| │ ├── filters: [eq(orders.o_orderstatus (#25), "F")] | +| │ ├── estimated rows: 550000.00 | +| │ ├── total process time: 42ms | +| │ └── TableScan | +| │ ├── table: default.tpch.orders | +| │ ├── read rows: 1650000 | +| │ ├── read bytes: 5173599 | +| │ ├── partitions total: 3 | +| │ ├── partitions scanned: 3 | +| │ ├── pruning stats: [segments: , blocks: ] | +| │ ├── push downs: [filters: [eq(orders.o_orderstatus (#25), "F")], limit: NONE] | +| │ ├── output columns: [o_orderkey, o_orderstatus] | +| │ └── estimated rows: 1650000.00 | +| └── HashJoin(Probe) | +| ├── join type: INNER | +| ├── build keys: [nation.n_nationkey (#32)] | +| ├── probe keys: [supplier.s_nationkey (#3)] | +| ├── filters: [] | +| ├── estimated rows: 184766.67 | +| ├── total process time: 93ms | +| ├── Filter(Build) | +| │ ├── filters: [eq(nation.n_name (#33), "EGYPT")] | +| │ ├── estimated rows: 16.67 | +| │ ├── total process time: 0ms | +| │ └── TableScan | +| │ ├── table: default.tpch.nation | +| │ ├── read rows: 50 | +| │ ├── read bytes: 566 | +| │ ├── partitions total: 2 | +| │ ├── partitions scanned: 2 | +| │ ├── pruning stats: [segments: , blocks: ] | +| │ ├── push downs: [filters: [eq(nation.n_name (#33), "EGYPT")], limit: NONE] | +| │ ├── output columns: [n_nationkey, n_name] | +| │ └── estimated rows: 50.00 | +| └── HashJoin(Probe) | +| ├── join type: INNER | +| ├── build keys: [supplier.s_suppkey (#0)] | +| ├── probe keys: [l1.l_suppkey (#9)] | +| ├── filters: [] | +| ├── estimated rows: 11086.00 | +| ├── total process time: 447ms | +| ├── TableScan(Build) | +| │ ├── table: default.tpch.supplier | +| │ ├── read rows: 11000 | +| │ ├── read bytes: 42015 | +| │ ├── partitions total: 2 | +| │ ├── partitions scanned: 2 | +| │ ├── pruning stats: [segments: , blocks: ] | +| │ ├── push downs: [filters: [], limit: NONE] | +| │ ├── output columns: [s_suppkey, s_name, s_nationkey] | +| │ └── estimated rows: 11000.00 | +| └── Filter(Probe) | +| ├── filters: [gt(l1.l_receiptdate (#19), l1.l_commitdate (#18))] | +| ├── estimated rows: 2400786.33 | +| ├── total process time: 59ms | +| └── TableScan | +| ├── table: default.tpch.lineitem | +| ├── read rows: 7202359 | +| ├── read bytes: 42731029 | +| ├── partitions total: 9 | +| ├── partitions scanned: 9 | +| ├── pruning stats: [segments: , blocks: ] | +| ├── push downs: [filters: [gt(l1.l_receiptdate (#19), l1.l_commitdate (#18))], limit: NONE] | +| ├── output columns: [l_orderkey, l_suppkey, l_commitdate, l_receiptdate] | +| └── estimated rows: 7202359.00 | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-ast.md b/tidb-cloud-lake/sql/explain-ast.md new file mode 100644 index 0000000000000..9a8a73dc3cc79 --- /dev/null +++ b/tidb-cloud-lake/sql/explain-ast.md @@ -0,0 +1,64 @@ +--- +title: EXPLAIN AST +--- + +Returns the abstract syntax tree (AST) of an SQL statement. The command breaks an SQL statement into syntactic parts and represents them in a hierarchical structure. 
+ +## Syntax + +```sql +EXPLAIN AST +``` + +## Examples + +```sql +EXPLAIN AST create user 'test'@'localhost' identified with sha256_password by 'new_password'; + + ---- + CreateUser (children 3) + ├── User 'test'@'localhost' + ├── AuthType sha256_password + └── Password "new_password" + ``` + + ```sql +EXPLAIN AST insert into t1 (a, b) values (1, 2),(3, 4); + + ---- + Insert (children 3) + ├── TableIdentifier t1 + ├── Columns (children 2) + │ ├── Identifier a + │ └── Identifier b + └── Source (children 1) + └── ValueSource +``` + +```sql +EXPLAIN AST select * from t1 inner join t2 on t1.a = t2.a and t1.b = t2.b and t1.a > 2; + + ---- + Query (children 1) + └── QueryBody (children 1) + └── SelectQuery (children 2) + ├── SelectList (children 1) + │ └── Target * + └── TableList (children 1) + └── TableJoin (children 1) + └── Join (children 3) + ├── TableIdentifier t1 + ├── TableIdentifier t2 + └── ConditionOn (children 1) + └── Function AND (children 2) + ├── Function AND (children 2) + │ ├── Function = (children 2) + │ │ ├── ColumnIdentifier t1.a + │ │ └── ColumnIdentifier t2.a + │ └── Function = (children 2) + │ ├── ColumnIdentifier t1.b + │ └── ColumnIdentifier t2.b + └── Function > (children 2) + ├── ColumnIdentifier t1.a + └── Literal Integer(2) +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-commands.md b/tidb-cloud-lake/sql/explain-commands.md new file mode 100644 index 0000000000000..a2e0bb5ee1c48 --- /dev/null +++ b/tidb-cloud-lake/sql/explain-commands.md @@ -0,0 +1,17 @@ +--- +title: Explain Commands +--- + +This page provides reference information for the explain-related commands in Databend. + +## Commands Overview + +| Command | Use Case | +|---------|----------| +| [`EXPLAIN`](/tidb-cloud-lake/sql/explain.md) | Understanding query structure and optimization | +| [`EXPLAIN ANALYZE`](/tidb-cloud-lake/sql/explain-analyze.md) | Performance analysis with runtime statistics | +| [`EXPLAIN ANALYZE GRAPHICAL`](/tidb-cloud-lake/sql/explain-analyze-graphical.md) | Visual performance analysis (BendSQL only) | +| [`EXPLAIN AST`](/tidb-cloud-lake/sql/explain-ast.md) | SQL parsing and syntax analysis | +| [`EXPLAIN PERF`](/tidb-cloud-lake/sql/explain-perf.md) | Query performance profiling (BendSQL only) | +| [`EXPLAIN RAW`](/tidb-cloud-lake/sql/explain-raw.md) | Internal query processing analysis | +| [`EXPLAIN SYNTAX`](/tidb-cloud-lake/sql/explain-syntax.md) | SQL code formatting and standardization | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-perf.md b/tidb-cloud-lake/sql/explain-perf.md new file mode 100644 index 0000000000000..7622f62e9c320 --- /dev/null +++ b/tidb-cloud-lake/sql/explain-perf.md @@ -0,0 +1,31 @@ +--- +title: EXPLAIN PERF +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + +`EXPLAIN PERF` captures stack traces to perform CPU profiling. This command returns an HTML file containing flame graphs generated from data collected from all nodes in the current cluster. You can directly open this HTML file in your browser. + +It is helpful to analyze query performance and help identify bottlenecks. 
+ +## Syntax + +```sql +EXPLAIN PERF +``` + +## Examples + +```shell +bendsql --quote-style never --query="EXPLAIN PERF SELECT avg(number) FROM numbers(10000000)" > demo.html +``` + +Then, you can open the `demo.html` file in your browser to view the flame graphs: + +graphs + +If the query finishes very quickly, it may not collect enough data, resulting in an empty flame graph. diff --git a/tidb-cloud-lake/sql/explain-raw.md b/tidb-cloud-lake/sql/explain-raw.md new file mode 100644 index 0000000000000..1fede6a2a33cf --- /dev/null +++ b/tidb-cloud-lake/sql/explain-raw.md @@ -0,0 +1,33 @@ +--- +title: EXPLAIN RAW +--- + +Shows the logical execution plan of an SQL statement that you can use to analyze, troubleshoot, and improve the efficiency of your queries. + +## Syntax + +```sql +EXPLAIN RAW +``` + +## Examples + +```sql +explain raw select * from t1, t2 where (t1.a = t2.a and t1.a > 3) or (t1.a = t2.a); + +Project: [a (#0),b (#1),a (#2),b (#3)] + └── EvalScalar: [t1.a (#0), t1.b (#1), t2.a (#2), t2.b (#3)] + └── Filter: [((t1.a (#0) = t2.a (#2)) AND (t1.a (#0) > 3)) OR (t1.a (#0) = t2.a (#2))] + └── LogicalJoin: equi-conditions: [], non-equi-conditions: [] + ├── LogicalGet: default.default.t1 + └── LogicalGet: default.default.t2 + +explain raw select * from t1 inner join t2 on t1.a = t2.a and t1.b = t2.b and t1.a > 2; + + ---- + Project: [a (#0),b (#1),a (#2),b (#3)] + └── EvalScalar: [t1.a (#0), t1.b (#1), t2.a (#2), t2.b (#3)] + └── LogicalJoin: equi-conditions: [(t1.a (#0) = t2.a (#2)) AND (t1.b (#1) = t2.b (#3))], non-equi-conditions: [t1.a (#0) > 2] + ├── LogicalGet: default.default.t1 + └── LogicalGet: default.default.t2 +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain-syntax.md b/tidb-cloud-lake/sql/explain-syntax.md new file mode 100644 index 0000000000000..c8890cffb8547 --- /dev/null +++ b/tidb-cloud-lake/sql/explain-syntax.md @@ -0,0 +1,46 @@ +--- +title: EXPLAIN SYNTAX +--- + +Outputs formatted SQL code. This command works as an SQL formatter that makes your code easy to read. + +## Syntax + +```sql +EXPLAIN SYNTAX +``` + +## Examples + +```sql +EXPLAIN SYNTAX select a, sum(b) as sum from t1 where a in (1, 2) and b > 0 and b < 100 group by a order by a; + + ---- + SELECT + a, + sum(b) AS sum + FROM + t1 + WHERE + a IN (1, 2) + AND b > 0 + AND b < 100 + GROUP BY a + ORDER BY a +``` + +```sql +EXPLAIN SYNTAX copy into 's3://mybucket/data.csv' from t1 file_format = ( type = CSV field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10; + + ---- + COPY + INTO 's3://mybucket/data.csv' + FROM t1 + FILE_FORMAT = ( + field_delimiter = ",", + record_delimiter = "\n", + skip_header = "1", + type = "CSV" + ) + SIZE_LIMIT = 10 +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/explain.md b/tidb-cloud-lake/sql/explain.md new file mode 100644 index 0000000000000..f988102b2d025 --- /dev/null +++ b/tidb-cloud-lake/sql/explain.md @@ -0,0 +1,56 @@ +--- +title: EXPLAIN +--- + +Shows the execution plan of a SQL statement. An execution plan is shown as a tree consisting of different operators where you can see how Databend will execute the SQL statement. An operator usually includes one or more fields describing the actions Databend will perform or the objects related to the query. + +For example, the following execution plan returned by the EXPLAIN command includes an operator named *TableScan* with several fields. 
+ +```sql +EXPLAIN SELECT * FROM allemployees; + +--- +TableScan +├── table: default.default.allemployees +├── read rows: 5 +├── read bytes: 592 +├── partitions total: 5 +├── partitions scanned: 5 +└── push downs: [filters: [], limit: NONE] +``` + +If you are using Databend Cloud, you can utilize the Query Profile feature to visualize the execution plan of your SQL statements. + +## Syntax + +```sql +EXPLAIN +``` + +## Examples + +```sql +EXPLAIN select t.number from numbers(1) as t, numbers(1) as t1 where t.number = t1.number; +---- +Project +├── columns: [number (#0)] +└── HashJoin + ├── join type: INNER + ├── build keys: [numbers.number (#1)] + ├── probe keys: [numbers.number (#0)] + ├── filters: [] + ├── TableScan(Build) + │ ├── table: default.system.numbers + │ ├── read rows: 1 + │ ├── read bytes: 8 + │ ├── partitions total: 1 + │ ├── partitions scanned: 1 + │ └── push downs: [filters: [], limit: NONE] + └── TableScan(Probe) + ├── table: default.system.numbers + ├── read rows: 1 + ├── read bytes: 8 + ├── partitions total: 1 + ├── partitions scanned: 1 + └── push downs: [filters: [], limit: NONE] +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/external-function.md b/tidb-cloud-lake/sql/external-function.md new file mode 100644 index 0000000000000..c0508c8c5b483 --- /dev/null +++ b/tidb-cloud-lake/sql/external-function.md @@ -0,0 +1,17 @@ +--- +title: External Function +--- + +This page provides a comprehensive overview of External Function operations in Databend, organized by functionality for easy reference. + +## External Function Management + +| Command | Description | +|---------|-------------| +| [CREATE EXTERNAL FUNCTION](/tidb-cloud-lake/sql/create-function.md) | Creates a new external function | +| [ALTER EXTERNAL FUNCTION](/tidb-cloud-lake/sql/alter-function.md) | Modifies an existing external function | +| [DROP EXTERNAL FUNCTION](/tidb-cloud-lake/sql/drop-function.md) | Removes an external function | + +:::note +External Functions in Databend allow you to extend functionality by integrating with external services through HTTP/HTTPS endpoints, enabling you to leverage external processing capabilities. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/extract.md b/tidb-cloud-lake/sql/extract.md new file mode 100644 index 0000000000000..e53c920fc93b9 --- /dev/null +++ b/tidb-cloud-lake/sql/extract.md @@ -0,0 +1,84 @@ +--- +title: EXTRACT +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Retrieves the designated portion of a date, timestamp, or interval. + +See also: [DATE_PART](/tidb-cloud-lake/sql/date-part.md) + +## Syntax + +```sql +-- Extract from a date or timestamp +EXTRACT( + YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | + DOW | DOY | EPOCH | ISODOW | YEARWEEK | MILLENNIUM + FROM +) + +-- Extract from an interval +EXTRACT( YEAR | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | MICROSECOND | EPOCH FROM ) +``` + +| Keyword | Description | +|--------------|-------------------------------------------------------------------------| +| `DOW` | Day of the Week. Sunday (0) through Saturday (6). | +| `DOY` | Day of the Year. 1 through 366. | +| `EPOCH` | The number of seconds since 1970-01-01 00:00:00. | +| `ISODOW` | ISO Day of the Week. Monday (1) through Sunday (7). | +| `YEARWEEK` | The year and week number combined, following ISO 8601 (e.g., 202415). | +| `MILLENNIUM` | The millennium of the date (1 for years 1–1000, 2 for 1001–2000, etc.). 
| + +## Return Type + +The return type depends on the field being extracted: + +- Returns Integer: When extracting discrete date or time components (e.g., YEAR, MONTH, DAY, DOY, HOUR, MINUTE, SECOND), the function returns an Integer. + + ```sql + SELECT EXTRACT(DAY FROM now()); -- Returns Integer + SELECT EXTRACT(DOY FROM now()); -- Returns Integer + ``` + +- Returns Float: When extracting EPOCH (the number of seconds since 1970-01-01 00:00:00 UTC), the function returns a Float, as it may include fractional seconds. + + ```sql + SELECT EXTRACT(EPOCH FROM now()); -- Returns Float + ``` + +## Examples + +This example extracts various fields from the current timestamp: + +```sql +SELECT + NOW(), + EXTRACT(DAY FROM NOW()), + EXTRACT(DOY FROM NOW()), + EXTRACT(EPOCH FROM NOW()), + EXTRACT(ISODOW FROM NOW()), + EXTRACT(YEARWEEK FROM NOW()), + EXTRACT(MILLENNIUM FROM NOW()); + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ now() │ EXTRACT(DAY FROM now()) │ EXTRACT(DOY FROM now()) │ EXTRACT(EPOCH FROM now()) │ EXTRACT(ISODOW FROM now()) │ EXTRACT(YEARWEEK FROM now()) │ EXTRACT(MILLENNIUM FROM now()) │ +├────────────────────────────┼─────────────────────────┼─────────────────────────┼───────────────────────────┼────────────────────────────┼──────────────────────────────┼────────────────────────────────┤ +│ 2025-04-16 18:04:22.773888 │ 16 │ 106 │ 1744826662.773888 │ 3 │ 202516 │ 3 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +This example extracts the number of days from an interval: + +```sql +SELECT EXTRACT(DAY FROM '1 day 2 hours 3 minutes 4 seconds'::INTERVAL); + +┌─────────────────────────────────────────────────────────────────┐ +│ EXTRACT(DAY FROM '1 day 2 hours 3 minutes 4 seconds'::INTERVAL) │ +├─────────────────────────────────────────────────────────────────┤ +│ 1 │ +└─────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/factorial.md b/tidb-cloud-lake/sql/factorial.md new file mode 100644 index 0000000000000..b7a4d5e7dc56f --- /dev/null +++ b/tidb-cloud-lake/sql/factorial.md @@ -0,0 +1,23 @@ +--- +title: FACTORIAL +--- + +Returns the factorial logarithm of `x`. If `x` is less than or equal to 0, the function returns 0. + +## Syntax + +```sql +FACTORIAL( ) +``` + +## Examples + +```sql +SELECT FACTORIAL(5); + +┌──────────────┐ +│ factorial(5) │ +├──────────────┤ +│ 120 │ +└──────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/feistel-obfuscate.md b/tidb-cloud-lake/sql/feistel-obfuscate.md new file mode 100644 index 0000000000000..0babcd0743247 --- /dev/null +++ b/tidb-cloud-lake/sql/feistel-obfuscate.md @@ -0,0 +1,62 @@ +--- +title: FEISTEL_OBFUSCATE +--- + +Deterministically obfuscate integers (e.g. IDs or phone numbers) while preserving bit length and value cardinality so joins still work. 
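+
+Because the same value and seed always produce the same output, keys obfuscated in different tables with the same seed still match. A minimal sketch of this property:
+
+```sql
+-- Both sides obfuscate 0..99 with seed 4242, so every key finds its match
+SELECT COUNT(*)
+FROM (SELECT feistel_obfuscate(number, 4242) AS k FROM numbers(100)) a
+JOIN (SELECT feistel_obfuscate(number, 4242) AS k FROM numbers(100)) b
+  ON a.k = b.k;
+```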
+ +## Syntax + +```sql +FEISTEL_OBFUSCATE( , ) +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| `number` | Input | +| `seed` | The data for corresponding non-text columns for different tables will be transformed in the same way, so the data for different tables can be JOINed after obfuscation | + +## Return Type + +Same as input + +## Examples + +```sql +SELECT feistel_obfuscate(10000,1561819567875); ++------------------------------------------+ +| feistel_obfuscate(10000, 1561819567875) | ++------------------------------------------+ +| 15669 | ++------------------------------------------+ +``` + +feistel_obfuscate preserves the number of bits in the original input. If mapping to a larger range is required, an offset can be added to the original input, e.g. feistel_obfuscate(n+10000,50) +```sql +SELECT feistel_obfuscate(10,1561819567875); ++------------------------------------------+ +| feistel_obfuscate(10, 1561819567875) | ++------------------------------------------+ +| 13 | ++------------------------------------------+ +``` + +Phone-number style example (seed = 4242): + +```sql +SELECT 13000000000 + number AS phone, + feistel_obfuscate(13000000000 + number, 4242) AS masked_phone +FROM numbers(5); + +-- Sample output ++-------------+--------------+ +| phone | masked_phone | ++-------------+--------------+ +| 13000000000 | 12221668677 | +| 13000000001 | 10245458699 | +| 13000000002 | 15398657780 | +| 13000000003 | 9910824758 | +| 13000000004 | 13299971128 | ++-------------+--------------+ +``` diff --git a/tidb-cloud-lake/sql/file-format.md b/tidb-cloud-lake/sql/file-format.md new file mode 100644 index 0000000000000..a73ce120d2891 --- /dev/null +++ b/tidb-cloud-lake/sql/file-format.md @@ -0,0 +1,22 @@ +--- +title: File Format +--- + +This page provides a comprehensive overview of File Format operations in Databend, organized by functionality for easy reference. + +## File Format Management + +| Command | Description | +|---------|-------------| +| [CREATE FILE FORMAT](/tidb-cloud-lake/sql/create-file-format.md) | Creates a named file format object for use in data loading and unloading | +| [DROP FILE FORMAT](/tidb-cloud-lake/sql/drop-file-format.md) | Removes a file format object | + +## File Format Information + +| Command | Description | +|---------|-------------| +| [SHOW FILE FORMATS](/tidb-cloud-lake/sql/show-file-formats.md) | Lists all file formats in the current database | + +:::note +File formats in Databend define how data files should be parsed during data loading operations or formatted during data unloading operations. They provide a reusable way to specify file type, field delimiters, compression, and other formatting options. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/first-value.md b/tidb-cloud-lake/sql/first-value.md new file mode 100644 index 0000000000000..ab29deb9f4104 --- /dev/null +++ b/tidb-cloud-lake/sql/first-value.md @@ -0,0 +1,147 @@ +--- +title: FIRST_VALUE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the first value in the window frame. + +See also: + +- [LAST_VALUE](/tidb-cloud-lake/sql/last-value.md) +- [NTH_VALUE](/tidb-cloud-lake/sql/nth-value.md) + +## Syntax + +```sql +FIRST_VALUE(expression) [ { RESPECT | IGNORE } NULLS ] +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] + [ window_frame ] +) +``` + +**Arguments:** +- `expression`: Required. The column or expression to return the first value from. 
+- `PARTITION BY`: Optional. Divides rows into partitions. +- `ORDER BY`: Required. Determines the ordering within the window. +- `window_frame`: Optional. Defines the window frame. The default is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. + +**Notes:** +- Returns the first value in the ordered window frame. +- Supports `IGNORE NULLS` to skip null values and `RESPECT NULLS` to keep the default behaviour. +- Specify an explicit window frame (for example, `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`) when you need row-based semantics instead of the default range frame. +- Useful for finding the earliest or lowest value in each group or time window. + +## Examples + +```sql +-- Sample order data +CREATE OR REPLACE TABLE orders_window_demo ( + customer VARCHAR, + order_id INT, + order_time TIMESTAMP, + amount INT, + sales_rep VARCHAR +); + +INSERT INTO orders_window_demo VALUES + ('Alice', 1001, to_timestamp('2024-05-01 09:00:00'), 120, 'Erin'), + ('Alice', 1002, to_timestamp('2024-05-01 11:00:00'), 135, NULL), + ('Alice', 1003, to_timestamp('2024-05-02 14:30:00'), 125, 'Glen'), + ('Bob', 1004, to_timestamp('2024-05-01 08:30:00'), 90, NULL), + ('Bob', 1005, to_timestamp('2024-05-01 20:15:00'), 105, 'Kai'), + ('Bob', 1006, to_timestamp('2024-05-03 10:00:00'), 95, NULL), + ('Carol', 1007, to_timestamp('2024-05-04 09:45:00'), 80, 'Lily'); +``` + +**Example 1. First purchase per customer** + +```sql +SELECT customer, + order_id, + order_time, + amount, + FIRST_VALUE(amount) OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS first_order_amount +FROM orders_window_demo +ORDER BY customer, order_time; +``` + +Result: +``` +customer | order_id | order_time | amount | first_order_amount +---------+----------+----------------------+--------+-------------------- +Alice | 1001 | 2024-05-01 09:00:00 | 120 | 120 +Alice | 1002 | 2024-05-01 11:00:00 | 135 | 120 +Alice | 1003 | 2024-05-02 14:30:00 | 125 | 120 +Bob | 1004 | 2024-05-01 08:30:00 | 90 | 90 +Bob | 1005 | 2024-05-01 20:15:00 | 105 | 90 +Bob | 1006 | 2024-05-03 10:00:00 | 95 | 90 +Carol | 1007 | 2024-05-04 09:45:00 | 80 | 80 +``` + +**Example 2. First order in the trailing 24 hours** + +```sql +SELECT customer, + order_id, + order_time, + FIRST_VALUE(order_id) OVER ( + PARTITION BY customer + ORDER BY order_time + RANGE BETWEEN INTERVAL 1 DAY PRECEDING AND CURRENT ROW + ) AS first_order_in_24h +FROM orders_window_demo +ORDER BY customer, order_time; +``` + +Result: +``` +customer | order_id | order_time | first_order_in_24h +---------+----------+----------------------+-------------------- +Alice | 1001 | 2024-05-01 09:00:00 | 1001 +Alice | 1002 | 2024-05-01 11:00:00 | 1001 +Alice | 1003 | 2024-05-02 14:30:00 | 1003 +Bob | 1004 | 2024-05-01 08:30:00 | 1004 +Bob | 1005 | 2024-05-01 20:15:00 | 1004 +Bob | 1006 | 2024-05-03 10:00:00 | 1006 +Carol | 1007 | 2024-05-04 09:45:00 | 1007 +``` + +**Example 3. 
Skip nulls to find the first named sales rep** + +```sql +SELECT customer, + order_id, + sales_rep, + FIRST_VALUE(sales_rep) RESPECT NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ) AS first_rep_respect, + FIRST_VALUE(sales_rep) IGNORE NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ) AS first_rep_ignore +FROM orders_window_demo +ORDER BY customer, order_id; +``` + +Result: +``` +customer | order_id | sales_rep | first_rep_respect | first_rep_ignore +---------+----------+-----------+-------------------+------------------ +Alice | 1001 | Erin | Erin | Erin +Alice | 1002 | NULL | Erin | Erin +Alice | 1003 | Glen | Erin | Erin +Bob | 1004 | NULL | NULL | NULL +Bob | 1005 | Kai | NULL | Kai +Bob | 1006 | NULL | NULL | Kai +Carol | 1007 | Lily | Lily | Lily +``` diff --git a/tidb-cloud-lake/sql/first.md b/tidb-cloud-lake/sql/first.md new file mode 100644 index 0000000000000..f8cf23a505bce --- /dev/null +++ b/tidb-cloud-lake/sql/first.md @@ -0,0 +1,9 @@ +--- +title: FIRST +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [FIRST_VALUE](/tidb-cloud-lake/sql/first-value.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/flashback-table.md b/tidb-cloud-lake/sql/flashback-table.md new file mode 100644 index 0000000000000..a5f7b8cf06027 --- /dev/null +++ b/tidb-cloud-lake/sql/flashback-table.md @@ -0,0 +1,155 @@ +--- +title: FLASHBACK TABLE +sidebar_position: 9 +--- + +Flashback a table to an earlier version with a snapshot ID or timestamp, only involving metadata operations, making it a fast process. + +By the snapshot ID or timestamp you specify in the command, Databend flashback the table to a prior state where the snapshot was created. To retrieve snapshot IDs and timestamps of a table, use [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md). + +The capability to flash back a table is subject to these conditions: + +- The command only existing tables to their prior states. To recover a dropped table, use [UNDROP TABLE](/tidb-cloud-lake/sql/undrop-table.md). + +- Flashback a table is part of Databend's time travel feature. Before using the command, make sure the table you want to flashback is eligible for time travel. For example, the command doesn't work for transient tables because Databend does not create or store snapshots for such tables. + +- You cannot roll back after flashback a table to a prior state, but you can flash back the table again to an earlier state. + +- Databend recommends this command for emergency recovery only. To query the history data of a table, use the [AT](/tidb-cloud-lake/sql/at.md) clause. + +## Syntax + +```sql +-- Restore with a snapshot ID +ALTER TABLE
FLASHBACK TO (SNAPSHOT => ''); + +-- Restore with a snapshot timestamp +ALTER TABLE
FLASHBACK TO (TIMESTAMP => ''::TIMESTAMP); +``` + +## Example + +### Step 1: Create a sample users table and insert data +```sql +-- Create a sample users table +CREATE TABLE users ( + id INT, + first_name VARCHAR, + last_name VARCHAR, + email VARCHAR, + registration_date TIMESTAMP +); + +-- Insert sample data +INSERT INTO users (id, first_name, last_name, email, registration_date) +VALUES (1, 'John', 'Doe', 'john.doe@example.com', '2023-01-01 00:00:00'), + (2, 'Jane', 'Doe', 'jane.doe@example.com', '2023-01-02 00:00:00'); +``` + +Data: +```sql +SELECT * FROM users; ++------+------------+-----------+----------------------+----------------------------+ +| id | first_name | last_name | email | registration_date | ++------+------------+-----------+----------------------+----------------------------+ +| 1 | John | Doe | john.doe@example.com | 2023-01-01 00:00:00.000000 | +| 2 | Jane | Doe | jane.doe@example.com | 2023-01-02 00:00:00.000000 | ++------+------------+-----------+----------------------+----------------------------+ +``` + +Snapshots: +```sql +SELECT * FROM Fuse_snapshot('default', 'users')\G; +*************************** 1. row *************************** + snapshot_id: c5c538d6b8bc42f483eefbddd000af7d + snapshot_location: 29356/44446/_ss/c5c538d6b8bc42f483eefbddd000af7d_v2.json + format_version: 2 +previous_snapshot_id: NULL + segment_count: 1 + block_count: 1 + row_count: 2 + bytes_uncompressed: 150 + bytes_compressed: 829 + index_size: 1028 + timestamp: 2023-04-19 04:20:25.062854 +``` + +### Step 2: Simulate an accidental delete operation + +```sql +-- Simulate an accidental delete operation +DELETE FROM users WHERE id = 1; +``` + +Data: +```sql ++------+------------+-----------+----------------------+----------------------------+ +| id | first_name | last_name | email | registration_date | ++------+------------+-----------+----------------------+----------------------------+ +| 2 | Jane | Doe | jane.doe@example.com | 2023-01-02 00:00:00.000000 | ++------+------------+-----------+----------------------+----------------------------+ +``` + +Snapshots: +```sql +SELECT * FROM Fuse_snapshot('default', 'users')\G; +*************************** 1. row *************************** + snapshot_id: 7193af51a4c9423ebd6ddbb04327b280 + snapshot_location: 29356/44446/_ss/7193af51a4c9423ebd6ddbb04327b280_v2.json + format_version: 2 +previous_snapshot_id: c5c538d6b8bc42f483eefbddd000af7d + segment_count: 1 + block_count: 1 + row_count: 1 + bytes_uncompressed: 87 + bytes_compressed: 778 + index_size: 1028 + timestamp: 2023-04-19 04:22:20.390430 +*************************** 2. 
row *************************** + snapshot_id: c5c538d6b8bc42f483eefbddd000af7d + snapshot_location: 29356/44446/_ss/c5c538d6b8bc42f483eefbddd000af7d_v2.json + format_version: 2 +previous_snapshot_id: NULL + segment_count: 1 + block_count: 1 + row_count: 2 + bytes_uncompressed: 150 + bytes_compressed: 829 + index_size: 1028 + timestamp: 2023-04-19 04:20:25.062854 +``` + +### Step 3: Find the snapshot ID before the delete operation +```sql +-- Assume the snapshot_id from the previous query is 'xxxxxx' +-- Restore the table to the snapshot before the delete operation +ALTER TABLE users FLASHBACK TO (SNAPSHOT => 'c5c538d6b8bc42f483eefbddd000af7d'); +``` + +Data: +```sql +SELECT * FROM users; ++------+------------+-----------+----------------------+----------------------------+ +| id | first_name | last_name | email | registration_date | ++------+------------+-----------+----------------------+----------------------------+ +| 1 | John | Doe | john.doe@example.com | 2023-01-01 00:00:00.000000 | +| 2 | Jane | Doe | jane.doe@example.com | 2023-01-02 00:00:00.000000 | ++------+------------+-----------+----------------------+----------------------------+ +``` + +Snapshot: +```sql +SELECT * FROM Fuse_snapshot('default', 'users')\G; +*************************** 1. row *************************** + snapshot_id: c5c538d6b8bc42f483eefbddd000af7d + snapshot_location: 29356/44446/_ss/c5c538d6b8bc42f483eefbddd000af7d_v2.json + format_version: 2 +previous_snapshot_id: NULL + segment_count: 1 + block_count: 1 + row_count: 2 + bytes_uncompressed: 150 + bytes_compressed: 829 + index_size: 1028 + timestamp: 2023-04-19 04:20:25.062854 +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/flatten.md b/tidb-cloud-lake/sql/flatten.md new file mode 100644 index 0000000000000..bbab7de7070eb --- /dev/null +++ b/tidb-cloud-lake/sql/flatten.md @@ -0,0 +1,137 @@ +--- +title: FLATTEN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Transforms nested JSON or array data into a tabular format, where each element or field is represented as a separate row. + +## Syntax + +```sql +[LATERAL] FLATTEN ( + INPUT => + [, PATH => ] + [, OUTER => TRUE | FALSE] + [, RECURSIVE => TRUE | FALSE] + [, MODE => 'OBJECT' | 'ARRAY' | 'BOTH'] +) +``` + +## Parameters + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `INPUT` | JSON or array data to flatten | Required | +| `PATH` | Path to the array/object to flatten | None | +| `OUTER` | Include rows with zero results (with NULL values) | `FALSE` | +| `RECURSIVE` | Flatten nested elements | `FALSE` | +| `MODE` | Flatten objects, arrays, or both | `'BOTH'` | +| `LATERAL` | Enable cross-referencing with preceding table expressions | Optional | + +## Output Columns + +| Column | Description | +|--------|-------------| +| `SEQ` | Sequence number for the input | +| `KEY` | Key of the expanded value (NULL if none) | +| `PATH` | Path to the flattened element | +| `INDEX` | Array index (NULL for objects) | +| `VALUE` | Value of the flattened element | +| `THIS` | Element being flattened | + +**Note:** When using LATERAL, output columns may vary due to dynamic cross-referencing. 
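+
+**Note:** With the default `OUTER => FALSE`, inputs that flatten to zero rows (such as an empty array) disappear from the result; `OUTER => TRUE` keeps them as a row with NULL values for the flattened columns. A minimal sketch of the assumed behaviour:
+
+```sql
+-- An empty array yields no rows by default, but one row (NULL value) with OUTER => TRUE
+SELECT * FROM FLATTEN(INPUT => PARSE_JSON('[]'), OUTER => TRUE);
+```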
+ +## Examples + +### Basic Flattening + +```sql +-- Flatten a JSON object with nested structures +SELECT * FROM FLATTEN( + INPUT => PARSE_JSON( + '{"name": "John", "languages": ["English", "Spanish"], "address": {"city": "New York"}}' + ) +); +``` + +Results in top-level keys being flattened: + +```text +| seq | key | path | index | value | this | +|-----|-----------|-----------|-------|----------------------|----------------------| +| 1 | name | name | NULL | "John" | {original JSON} | +| 1 | languages | languages | NULL | ["English","Spanish"]| {original JSON} | +| 1 | address | address | NULL | {"city":"New York"} | {original JSON} | +``` + +### Using PATH Parameter + +```sql +-- Flatten only the languages array by specifying the PATH +SELECT * FROM FLATTEN( + INPUT => PARSE_JSON( + '{"name": "John", "languages": ["English", "Spanish"]}' + ), + PATH => 'languages' +); +``` + +Results in array elements being flattened: + +```text +| seq | key | path | index | value | this | +|-----|------|--------------|-------|-----------|-------------------| +| 1 | NULL | languages[0] | 0 | "English" | ["English","Spanish"] | +| 1 | NULL | languages[1] | 1 | "Spanish" | ["English","Spanish"] | +``` + +### Recursive Flattening + +```sql +-- Recursively flatten nested objects and arrays +SELECT * FROM FLATTEN( + INPUT => PARSE_JSON( + '{"name": "John", "address": {"city": "New York", "zip": 10001}}' + ), + RECURSIVE => TRUE +); +``` + +Results in nested objects being flattened: + +```text +| seq | key | path | index | value | this | +|-----|---------|--------------|-------|-------------|-----------------| +| 1 | name | name | NULL | "John" | {original JSON} | +| 1 | address | address | NULL | {"city":...}| {original JSON} | +| 1 | city | address.city | NULL | "New York" | {"city":...} | +| 1 | zip | address.zip | NULL | 10001 | {"city":...} | +``` + +### Using LATERAL FLATTEN + +```sql +-- Use LATERAL FLATTEN to transform a JSON array into rows +-- This allows direct access to array elements without a table +SELECT + f.value:item::STRING AS item_name, + f.value:price::FLOAT AS price +FROM + LATERAL FLATTEN( + INPUT => PARSE_JSON('[ + {"item":"coffee", "price":2.50}, + {"item":"donut", "price":1.20} + ]') + ) f; +``` + +Results: + +```text +| item_name | price | +|-----------|-------| +| coffee | 2.5 | +| donut | 1.2 | +``` diff --git a/tidb-cloud-lake/sql/floor.md b/tidb-cloud-lake/sql/floor.md new file mode 100644 index 0000000000000..d9ecb37445b56 --- /dev/null +++ b/tidb-cloud-lake/sql/floor.md @@ -0,0 +1,23 @@ +--- +title: FLOOR +--- + +Rounds the number down. + +## Syntax + +```sql +FLOOR( ) +``` + +## Examples + +```sql +SELECT FLOOR(1.23); + +┌─────────────┐ +│ floor(1.23) │ +├─────────────┤ +│ 1 │ +└─────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/from-base64.md b/tidb-cloud-lake/sql/from-base64.md new file mode 100644 index 0000000000000..f755b8de76989 --- /dev/null +++ b/tidb-cloud-lake/sql/from-base64.md @@ -0,0 +1,34 @@ +--- +title: FROM_BASE64 +--- + +Takes a string encoded with the base-64 encoded rules and returns the decoded result as a binary. +The result is NULL if the argument is NULL or not a valid base-64 string. + +## Syntax + +```sql +FROM_BASE64() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------------| +| `` | The string value. 
| + +## Return Type + +`BINARY` + +## Examples + +```sql +SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc')) as b, b::String; +┌───────────────────────────────────────┐ +│ to_base64('abc') │ b │ b::string │ +│ String │ Binary │ String │ +├──────────────────┼────────┼───────────┤ +│ YWJj │ 616263 │ abc │ +└───────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/from-hex.md b/tidb-cloud-lake/sql/from-hex.md new file mode 100644 index 0000000000000..2dbfb1c63ea1c --- /dev/null +++ b/tidb-cloud-lake/sql/from-hex.md @@ -0,0 +1,5 @@ +--- +title: FROM_HEX +--- + +Alias for [UNHEX](/tidb-cloud-lake/sql/unhex.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/full-text-search-functions.md b/tidb-cloud-lake/sql/full-text-search-functions.md new file mode 100644 index 0000000000000..b94bf94bb53f1 --- /dev/null +++ b/tidb-cloud-lake/sql/full-text-search-functions.md @@ -0,0 +1,92 @@ +--- +title: Full-Text Search Functions +--- + +Databend's full-text search functions deliver search-engine-style filtering for semi-structured `VARIANT` data and plain text columns that are indexed with an inverted index. They are ideal for AI-generated metadata—such as perception results from autonomous-driving video frames—stored alongside your assets. + +:::info +Databend's search functions are inspired by [Elasticsearch Full-Text Search Functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html). +::: + +Include an inverted index in the table definition for the columns you plan to search: + +```sql +CREATE OR REPLACE TABLE frames ( + id INT, + meta VARIANT, + INVERTED INDEX idx_meta (meta) +); +``` + +## Search Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [MATCH](/tidb-cloud-lake/sql/match.md) | Performs a relevance-ranked search across the listed columns. | `MATCH('summary, tags', 'traffic light red')` | +| [QUERY](/tidb-cloud-lake/sql/query.md) | Evaluates a Lucene-style query expression, including nested `VARIANT` fields. | `QUERY('meta.signals.traffic_light:red')` | +| [SCORE](/tidb-cloud-lake/sql/score.md) | Returns the relevance score for the current row when used with `MATCH` or `QUERY`. 
| `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light red')` | + +## Query Syntax Examples + +### Example: Single Keyword + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.label:pedestrian') +LIMIT 100; +``` + +### Example: Boolean AND + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center') +LIMIT 100; +``` + +### Example: Boolean OR + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike') +LIMIT 100; +``` + +### Example: IN List + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.tags:IN [stop urban]') +LIMIT 100; +``` + +### Example: Inclusive Range + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]') +LIMIT 100; +``` + +### Example: Exclusive Range + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}') +LIMIT 100; +``` + +### Example: Boosted Fields + +```sql +SELECT id, meta['frame']['timestamp'] AS ts, SCORE() +FROM frames +WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0') +LIMIT 100; +``` diff --git a/tidb-cloud-lake/sql/fuse-block.md b/tidb-cloud-lake/sql/fuse-block.md new file mode 100644 index 0000000000000..b448b148dd354 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-block.md @@ -0,0 +1,36 @@ +--- +title: FUSE_BLOCK +--- + +Returns the block information of the latest or specified snapshot of a table. For more information about what is block in Databend, see [What are Snapshot, Segment, and Block?](/tidb-cloud-lake/sql/optimize-table.md#what-are-snapshot-segment-and-block). + +The command returns the location information of each parquet file referenced by a snapshot. This enables downstream applications to access and consume the data stored in the files. 
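+
+For example, you can aggregate the reported sizes to estimate how much Parquet data the latest snapshot references; a minimal sketch using the `mytable` table created in the example below:
+
+```sql
+-- Number of blocks and total block size referenced by the latest snapshot
+SELECT COUNT(*) AS block_count, SUM(block_size) AS total_block_size
+FROM FUSE_BLOCK('default', 'mytable');
+```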
+ +See Also: + +- [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) +- [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) + +## Syntax + +```sql +FUSE_BLOCK('', ''[, '']) +``` + +## Examples + +```sql +CREATE TABLE mytable(c int); +INSERT INTO mytable values(1); +INSERT INTO mytable values(2); + +SELECT * FROM FUSE_BLOCK('default', 'mytable'); + +--- ++----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+ +| snapshot_id | timestamp | block_location | block_size | bloom_filter_location | bloom_filter_size | ++----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+ +| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v0.parquet | 4 | 1/7/_i/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v1.parquet | 221 | +| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/d0ee9688c4d24d6da86acd8b0d6f4fad_v0.parquet | 4 | 1/7/_i/d0ee9688c4d24d6da86acd8b0d6f4fad_v1.parquet | 219 | ++----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/fuse-column.md b/tidb-cloud-lake/sql/fuse-column.md new file mode 100644 index 0000000000000..c29b1ee44195c --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-column.md @@ -0,0 +1,36 @@ +--- +title: FUSE_COLUMN +--- + +Returns the column information of the latest or specified snapshot of a table. For more information about what is block in Databend, see [What are Snapshot, Segment, and Block?](/tidb-cloud-lake/sql/optimize-table.md#what-are-snapshot-segment-and-block). 
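+
+For example, grouping the output by column name gives a quick per-column storage breakdown; a minimal sketch using the `mytable` table created in the example below:
+
+```sql
+-- Compressed bytes stored per column across the latest snapshot's blocks
+SELECT column_name, SUM(bytes_compressed) AS total_bytes_compressed
+FROM FUSE_COLUMN('default', 'mytable')
+GROUP BY column_name;
+```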
+ + +See Also: + +- [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) +- [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) +- [FUSE_BLOCK](/tidb-cloud-lake/sql/fuse-block.md) + +## Syntax + +```sql +FUSE_COLUMN('', ''[, '']) +``` + +## Examples + +```sql +CREATE TABLE mytable(c int); +INSERT INTO mytable values(1); +INSERT INTO mytable values(2); + +SELECT * FROM FUSE_COLUMN('default', 'mytable'); + +--- ++----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+ +| snapshot_id | timestamp | block_location | block_size | file_size | row_count | column_name | column_type | column_id | block_offset | bytes_compressed | ++----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+ +| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/44df130c207745cb858928135d39c1c0_v2.parquet | 4 | 196 | 1 | c | Int32 | 0 | 8 | 14 | +| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/b6f8496d7e3f4f62a89c09572840cf70_v2.parquet | 4 | 196 | 1 | c | Int32 | 0 | 8 | 14 | ++----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/fuse-encoding.md b/tidb-cloud-lake/sql/fuse-encoding.md new file mode 100644 index 0000000000000..35351ff115f80 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-encoding.md @@ -0,0 +1,57 @@ +--- +title: FUSE_ENCODING +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the encoding types applied to a specific column within a table. It helps you understand how data is compressed and stored in a native format within the table. + +## Syntax + +```sql +FUSE_ENCODING('', '', '') +``` + +The function returns a result set with the following columns: + +| Column | Data Type | Description | +|-------------------|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| VALIDITY_SIZE | Nullable(UInt32) | The size of a bitmap value that indicates whether each row in the column has a non-null value. This bitmap is used to track the presence or absence of null values in the column's data. | +| COMPRESSED_SIZE | UInt32 | The size of the column data after compression. | +| UNCOMPRESSED_SIZE | UInt32 | The size of the column data before applying encoding. | +| LEVEL_ONE | String | The primary or initial encoding applied to the column. | +| LEVEL_TWO | Nullable(String) | A secondary or recursive encoding method applied to the column after the initial encoding. | + +## Examples + +```sql +-- Create a table with an integer column 'c' and apply 'Lz4' compression +CREATE TABLE t(c INT) STORAGE_FORMAT = 'native' COMPRESSION = 'lz4'; + +-- Insert data into the table. 
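+-- numbers(2048) yields the integers 0 through 2047, so column 'c' holds a strictly
+-- increasing sequence, the kind of data that delta-style encodings handle well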
+INSERT INTO t SELECT number FROM numbers(2048);
+
+-- Analyze the encoding for column 'c' in table 't'
+SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*)
+FROM FUSE_ENCODING('default', 't', 'c')
+GROUP BY LEVEL_ONE, LEVEL_TWO;
+
+level_one   |level_two|count(*)|
+------------+---------+--------+
+DeltaBitpack|         |       1|
+
+-- Insert 2,048 rows with the value 1 into the table 't'
+INSERT INTO t (c)
+SELECT 1
+FROM numbers(2048);
+
+SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*)
+FROM FUSE_ENCODING('default', 't', 'c')
+GROUP BY LEVEL_ONE, LEVEL_TWO;
+
+level_one   |level_two|count(*)|
+------------+---------+--------+
+OneValue    |         |       1|
+DeltaBitpack|         |       1|
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/fuse-engine-tables.md b/tidb-cloud-lake/sql/fuse-engine-tables.md
new file mode 100644
index 0000000000000..e35cb7e4ede1a
--- /dev/null
+++ b/tidb-cloud-lake/sql/fuse-engine-tables.md
@@ -0,0 +1,203 @@
+---
+title: Fuse Engine Tables
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+## Overview
+
+Databend uses the Fuse Engine as its default storage engine, providing a Git-like data management system with:
+
+- **Snapshot-based Architecture**: Query and restore data at any point in time, with a history of data changes for recovery
+- **High Performance**: Optimized for analytical workloads with automatic indexing and bloom filters
+- **Efficient Storage**: Uses the Parquet format with high compression for optimal storage efficiency
+- **Flexible Configuration**: Customizable compression, indexing, and storage options
+- **Data Maintenance**: Automatic data retention, snapshot management, and change tracking capabilities
+
+## When to Use Fuse Engine
+
+Ideal for:
+- **Analytics**: OLAP queries with columnar storage
+- **Data Warehousing**: Large volumes of historical data
+- **Time-Travel**: Access to historical data versions
+- **Cloud Storage**: Optimized for object storage
+
+## Syntax
+
+```sql
+CREATE TABLE <table_name> (
+  <column_definitions>
+) [ENGINE = FUSE]
+[CLUSTER BY ( <expr> [, <expr>, ...] )]
+[<fuse_engine_options>];
+```
+
+For more details about the `CREATE TABLE` syntax, see [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md).
+
+## Parameters
+
+Below are the main parameters for creating a Fuse Engine table:
+
+#### `ENGINE`
+- **Description:**
+  If no engine is explicitly specified, Databend defaults to the Fuse Engine, which is equivalent to `ENGINE = FUSE`.
+
+---
+
+#### `CLUSTER BY`
+- **Description:**
+  Specifies how data is sorted, using one or more expressions. For more information, see [Cluster Key](/tidb-cloud-lake/guides/cluster-key-performance.md).
+
+---
+
+#### `<fuse_engine_options>`
+- **Description:**
+  The Fuse Engine offers various options (case-insensitive) that allow you to customize the table's properties.
+  - See [Fuse Engine Options](#fuse-engine-options) for details.
+  - Separate multiple options with a space.
+  - Use [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#fuse-engine-options) to modify a table's options.
+  - Use [SHOW CREATE TABLE](/tidb-cloud-lake/sql/show-create-table.md) to show a table's options.
+
+---
+
+## Fuse Engine Options
+
+Below are the available Fuse Engine options, grouped by their purpose:
+
+---
+
+### `compression`
+- **Syntax:**
+  `compression = '<compression_method>'`
+- **Description:**
+  Specifies the compression method for the engine. Compression options include lz4, zstd, snappy, or none. The compression method defaults to zstd in object storage and lz4 in file system (fs) storage.
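+
+  **Example:**
+
+  A minimal sketch; the table and column names are illustrative:
+
+  ```sql
+  -- Create a Fuse table whose column data is compressed with lz4 instead of the default
+  CREATE TABLE t_lz4 (id INT, payload STRING) COMPRESSION = 'lz4';
+  ```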
+
+---
+
+### `snapshot_loc`
+- **Syntax:**
+  `snapshot_loc = '<location>'`
+- **Description:**
+  Specifies a location parameter in string format, allowing a table to be shared without copying its data.
+
+---
+
+### `block_size_threshold`
+- **Syntax:**
+  `block_size_threshold = <n>`
+- **Description:**
+  Specifies the maximum block size in bytes. Defaults to 104,857,600 bytes.
+
+---
+
+### `block_per_segment`
+- **Syntax:**
+  `block_per_segment = <n>`
+- **Description:**
+  Specifies the maximum number of blocks in a segment. Defaults to 1,000.
+
+---
+
+### `row_per_block`
+- **Syntax:**
+  `row_per_block = <n>`
+- **Description:**
+  Specifies the maximum number of rows in a file. Defaults to 1,000,000.
+
+---
+
+### `bloom_index_columns`
+- **Syntax:**
+  `bloom_index_columns = '<column> [, <column> ...]'`
+- **Description:**
+  Specifies the columns to be used for the bloom index. The data type of these columns can be Map, Number, String, Date, or Timestamp. If no specific columns are specified, the bloom index is created by default on all supported columns. `bloom_index_columns=''` disables bloom indexing.
+
+---
+
+### `change_tracking`
+- **Syntax:**
+  `change_tracking = True / False`
+- **Description:**
+  Setting this option to `True` enables change tracking for a table in the Fuse Engine. Creating a stream for a table automatically sets `change_tracking` to `True` and introduces additional hidden columns to the table as change tracking metadata. For more information, see [How Stream Works](/tidb-cloud-lake/sql/stream.md#how-stream-works).
+
+---
+
+### `data_retention_period_in_hours`
+- **Syntax:**
+  `data_retention_period_in_hours = <n>`
+- **Description:**
+  Specifies the number of hours to retain table data. The minimum value is 1 hour. The maximum value is defined by the `data_retention_time_in_days_max` setting in the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file, or defaults to 2,160 hours (90 days x 24 hours) if not specified.
+
+---
+
+### `enable_auto_vacuum`
+- **Syntax:**
+  `enable_auto_vacuum = 0 / 1`
+- **Description:**
+  Controls whether a table automatically triggers vacuum operations during mutations. This can be set globally as a setting for all tables or configured at the table level. The table-level option has a higher priority than the session/global setting of the same name. When enabled (set to 1), vacuum operations are automatically triggered after mutations such as INSERT or ALTER TABLE, cleaning up table data according to the configured retention policy.
+ + **Examples:** + ```sql + -- Set enable_auto_vacuum globally for all tables across all sessions + SET GLOBAL enable_auto_vacuum = 1; + + -- Create a table with auto vacuum disabled (overrides global setting) + CREATE OR REPLACE TABLE t1 (id INT) ENABLE_AUTO_VACUUM = 0; + INSERT INTO t1 VALUES(1); -- Won't trigger vacuum despite global setting + + -- Create another table that inherits the global setting + CREATE OR REPLACE TABLE t2 (id INT); + INSERT INTO t2 VALUES(1); -- Will trigger vacuum due to global setting + + -- Enable auto vacuum for an existing table + ALTER TABLE t1 SET OPTIONS(ENABLE_AUTO_VACUUM = 1); + INSERT INTO t1 VALUES(2); -- Now will trigger vacuum + + -- Table option takes precedence over global settings + SET GLOBAL enable_auto_vacuum = 0; -- Turn off globally + -- t1 will still vacuum because table setting overrides global + INSERT INTO t1 VALUES(3); -- Will still trigger vacuum + INSERT INTO t2 VALUES(2); -- Won't trigger vacuum anymore + ``` + +--- + +### `data_retention_num_snapshots_to_keep` +- **Syntax:** + `data_retention_num_snapshots_to_keep = ` +- **Description:** + Specifies the number of snapshots to retain during vacuum operations. This can be set globally as a setting for all tables or configured at the table level. The table-level option has a higher priority than the session/global setting of the same name. When set, only the specified number of most recent snapshots will be kept after vacuum operations. Overrides the `data_retention_time_in_days` setting. If set to 0, this setting will be ignored. This option works in conjunction with the `enable_auto_vacuum` setting to provide granular control over snapshot retention policies. + + **Examples:** + ```sql + -- Set global retention to 10 snapshots for all tables across all sessions + SET GLOBAL data_retention_num_snapshots_to_keep = 10; + + -- Create a table with custom snapshot retention (overrides global setting) + CREATE OR REPLACE TABLE t1 (id INT) + enable_auto_vacuum = 1 + data_retention_num_snapshots_to_keep = 5; + + -- Create another table that inherits the global setting + CREATE OR REPLACE TABLE t2 (id INT) enable_auto_vacuum = 1; + + -- When vacuum is triggered: + -- t1 will keep 5 snapshots (table setting) + -- t2 will keep 10 snapshots (global setting) + + -- Change global setting + SET GLOBAL data_retention_num_snapshots_to_keep = 20; + + -- Table options still take precedence: + -- t1 will still keep only 5 snapshots + -- t2 will now keep 20 snapshots + + -- Modify snapshot retention for an existing table + ALTER TABLE t1 SET OPTIONS(data_retention_num_snapshots_to_keep = 3); + -- Now t1 will keep 3 snapshots when vacuum is triggered + ``` + +--- diff --git a/tidb-cloud-lake/sql/fuse-segment.md b/tidb-cloud-lake/sql/fuse-segment.md new file mode 100644 index 0000000000000..f7313ec5690a7 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-segment.md @@ -0,0 +1,44 @@ +--- +title: FUSE_SEGMENT +--- + +Returns the segment information of a specified table snapshot. For more information about what is segment in Databend, see [What are Snapshot, Segment, and Block?](/tidb-cloud-lake/sql/optimize-table.md#what-are-snapshot-segment-and-block). 
+ +See Also: + +- [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) +- [FUSE_BLOCK](/tidb-cloud-lake/sql/fuse-block.md) + +## Syntax + +```sql +FUSE_SEGMENT('', '','') +``` + +## Examples + +```sql +CREATE TABLE mytable(c int); +INSERT INTO mytable values(1); +INSERT INTO mytable values(2); + +-- Obtain a snapshot ID +SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'mytable') limit 1; + +--- ++----------------------------------+ +| snapshot_id | ++----------------------------------+ +| 82c572947efa476892bd7c0635158ba2 | ++----------------------------------+ + +SELECT * FROM FUSE_SEGMENT('default', 'mytable', '82c572947efa476892bd7c0635158ba2'); + +--- ++----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+ +| file_location | format_version | block_count | row_count | bytes_uncompressed | bytes_compressed | ++----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+ +| 1/319/_sg/d35fe7bf99584301b22e8f6a8a9c97f9_v1.json | 1 | 1 | 1 | 4 | 184 | +| 1/319/_sg/c261059d47c840e1b749222dabb4b2bb_v1.json | 1 | 1 | 1 | 4 | 184 | ++----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/fuse-snapshot.md b/tidb-cloud-lake/sql/fuse-snapshot.md new file mode 100644 index 0000000000000..168ea1839d67d --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-snapshot.md @@ -0,0 +1,35 @@ +--- +title: FUSE_SNAPSHOT +--- + +Returns the snapshot information of a table. For more information about what is snapshot in Databend, see [What are Snapshot, Segment, and Block?](/tidb-cloud-lake/sql/optimize-table.md#what-are-snapshot-segment-and-block). 
+ +See Also: + +- [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) +- [FUSE_BLOCK](/tidb-cloud-lake/sql/fuse-block.md) + +## Syntax + +```sql +FUSE_SNAPSHOT('', '') +``` + +## Examples + +```sql +CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1); + +INSERT INTO mytable VALUES(1,1),(3,3); +INSERT INTO mytable VALUES(2,2),(5,5); +INSERT INTO mytable VALUES(4,4); + +SELECT * FROM FUSE_SNAPSHOT('default','mytable'); + +--- +| snapshot_id | snapshot_location | format_version | previous_snapshot_id | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp | +|----------------------------------|------------------------------------------------------------|----------------|----------------------------------|---------------|-------------|-----------|--------------------|------------------|------------|----------------------------| +| a13d211b7421432898a3786848b8ced3 | 670655/783287/_ss/a13d211b7421432898a3786848b8ced3_v1.json | 1 | \N | 1 | 1 | 2 | 16 | 290 | 363 | 2022-09-19 14:51:52.860425 | +| cf08e6af6c134642aeb76bc81e6e7580 | 670655/783287/_ss/cf08e6af6c134642aeb76bc81e6e7580_v1.json | 1 | a13d211b7421432898a3786848b8ced3 | 2 | 2 | 4 | 32 | 580 | 726 | 2022-09-19 14:52:15.282943 | +| 1bd4f68b831a402e8c42084476461aa1 | 670655/783287/_ss/1bd4f68b831a402e8c42084476461aa1_v1.json | 1 | cf08e6af6c134642aeb76bc81e6e7580 | 3 | 3 | 5 | 40 | 862 | 1085 | 2022-09-19 14:52:20.284347 | +``` diff --git a/tidb-cloud-lake/sql/fuse-statistic.md b/tidb-cloud-lake/sql/fuse-statistic.md new file mode 100644 index 0000000000000..183bd752696ec --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-statistic.md @@ -0,0 +1,10 @@ +--- +title: FUSE_STATISTIC +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +:::note +This function is deprecated. Use [SHOW STATISTICS](/tidb-cloud-lake/sql/show-statistics.md) instead to view table statistics. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/fuse-time-travel-size.md b/tidb-cloud-lake/sql/fuse-time-travel-size.md new file mode 100644 index 0000000000000..2c88f76273ec4 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-time-travel-size.md @@ -0,0 +1,52 @@ +--- +title: FUSE_TIME_TRAVEL_SIZE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the storage size of historical data (for Time Travel) for tables. + +## Syntax + +```sql +-- Calculate historical data size for all tables in all databases +SELECT ... +FROM fuse_time_travel_size(); + +-- Calculate historical data size for all tables in a specified database +SELECT ... +FROM fuse_time_travel_size(''); + +-- Calculate historical data size for a specified table in a specified database +SELECT ... +FROM fuse_time_travel_size('', ''); +``` + +## Output + +The function returns a result set with the following columns: + +| Column | Description | +|----------------------------------|-------------------------------------------------------------------------------------------------------| +| `database_name` | The name of the database where the table is located. | +| `table_name` | The name of the table. | +| `is_dropped` | Indicates whether the table has been dropped (`true` for dropped tables, `false` otherwise). | +| `time_travel_size` | The total storage size of historical data (for Time Travel) for the table, in bytes. | +| `latest_snapshot_size` | The storage size of the latest snapshot of the table, in bytes. 
| +| `data_retention_period_in_hours` | The retention period for Time Travel data in hours (`NULL` means using the default retention policy). | +| `error` | Any error encountered while retrieving the storage size (`NULL` if no errors occurred). | + +## Examples + +This example calculates the historical data for all tables in the `default` database: + +```sql +SELECT * FROM fuse_time_travel_size('default') + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ database_name │ table_name │ is_dropped │ time_travel_size │ latest_snapshot_size │ data_retention_period_in_hours │ error │ +├───────────────┼────────────┼────────────┼──────────────────┼──────────────────────┼────────────────────────────────┼──────────────────┤ +│ default │ books │ true │ 2810 │ 1490 │ NULL │ NULL │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/fuse-vacuum-temporary-table.md b/tidb-cloud-lake/sql/fuse-vacuum-temporary-table.md new file mode 100644 index 0000000000000..91ccc2bc11f1f --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-vacuum-temporary-table.md @@ -0,0 +1,42 @@ +--- +title: FUSE_VACUUM_TEMPORARY_TABLE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +## Overview + +Temporary tables are typically cleaned up automatically at session end (details in [CREATE TEMP TABLE](/tidb-cloud-lake/sql/create-temp-table.md)). However, this process can fail due to events like query node crashes or abnormal session terminations, leaving orphaned temporary files. + +`FUSE_VACUUM_TEMPORARY_TABLE()` manually removes these leftover files to reclaim storage. + +**When to use this function:** +- After known system failures or abnormal session terminations. +- If you suspect orphaned temporary data is consuming storage. +- As a periodic maintenance task in environments prone to such issues. + +## Operational Safety + +The `FUSE_VACUUM_TEMPORARY_TABLE()` function is designed to be a safe and reliable operation. +- **Targets Only Temporary Data:** It specifically identifies and removes only orphaned data and metadata files that belong to temporary tables. +- **No Impact on Regular Tables:** The function will not affect any regular, persistent tables or their data. Its scope is strictly limited to the cleanup of unreferenced temporary table remnants. + +## Syntax + +```sql +FUSE_VACUUM_TEMPORARY_TABLE(); +``` + + +## Examples + +```sql +SELECT * FROM FUSE_VACUUM_TEMPORARY_TABLE(); + +┌────────┐ +│ result │ +├────────┤ +│ Ok │ +└────────┘ +``` diff --git a/tidb-cloud-lake/sql/fuse-virtual-column.md b/tidb-cloud-lake/sql/fuse-virtual-column.md new file mode 100644 index 0000000000000..62d2bbc77cb48 --- /dev/null +++ b/tidb-cloud-lake/sql/fuse-virtual-column.md @@ -0,0 +1,48 @@ +--- +title: FUSE_VIRTUAL_COLUMN +--- + +Returns the virtual column information of the latest or specified snapshot of a table. For details, see [Virtual Column](/tidb-cloud-lake/guides/virtual-column.md). 
+ +## Syntax + +```sql +FUSE_VIRTUAL_COLUMN('', ''[, '']) +``` + +## Examples + +```sql +CREATE TABLE test(id int, val variant); + +INSERT INTO + test +VALUES + ( + 1, + '{"id":1,"name":"databend"}' + ), + ( + 2, + '{"id":2,"name":"databricks"}' + ); + +SELECT * FROM FUSE_VIRTUAL_COLUMN('default', 'test'); + +╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ snapshot_id │ timestamp │ virtual_block_ │ virtual_block_ │ row_count │ column_name │ column_type │ column_id │ block_offset │ bytes_compress │ +│ String │ Timestamp │ location │ size │ UInt64 │ String │ String │ UInt32 │ UInt64 │ ed │ +│ │ │ String │ UInt64 │ │ │ │ │ │ UInt64 │ +├────────────────┼────────────────┼────────────────┼────────────────┼───────────┼─────────────┼─────────────┼────────────┼──────────────┼────────────────┤ +│ 0196c3aa7cc97f │ 2025-05-12 08: │ 1/385366/_vb/h │ 632 │ 2 │ val['id'] │ UInt64 NULL │ 3000000000 │ 4 │ 48 │ +│ e69995765add1b │ 44:12.361000 │ 0196c8d0d8c976 │ │ │ │ │ │ │ │ +│ a3bd │ │ d19de8bfdd32a7 │ │ │ │ │ │ │ │ +│ │ │ 0a01_v2.parque │ │ │ │ │ │ │ │ +│ │ │ t │ │ │ │ │ │ │ │ +│ 0196c3aa7cc97f │ 2025-05-12 08: │ 1/385366/_vb/h │ 632 │ 2 │ val['name'] │ String NULL │ 3000000001 │ 52 │ 58 │ +│ e69995765add1b │ 44:12.361000 │ 0196c8d0d8c976 │ │ │ │ │ │ │ │ +│ a3bd │ │ d19de8bfdd32a7 │ │ │ │ │ │ │ │ +│ │ │ 0a01_v2.parque │ │ │ │ │ │ │ │ +│ │ │ t │ │ │ │ │ │ │ │ +╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/gen-random-uuid.md b/tidb-cloud-lake/sql/gen-random-uuid.md new file mode 100644 index 0000000000000..67ecbdf8f49d7 --- /dev/null +++ b/tidb-cloud-lake/sql/gen-random-uuid.md @@ -0,0 +1,50 @@ +--- +title: GEN_RANDOM_UUID +--- + +Generates a random UUID based on version 7, starting from version 1.2.658. Previously, this function generated UUIDs based on version 4. + +## Syntax + +```sql +GEN_RANDOM_UUID() +``` + +## Aliases + +- [UUID](/tidb-cloud-lake/sql/uuid.md) + +## Why Use UUID v7? + +- **Time-Based Ordering**: UUID v7 includes a timestamp, allowing events or records to be ordered chronologically by the time they were created. This is especially useful when you need to track the sequence of actions. + +- **Chronological Sorting**: UUID v7 ensures that UUIDs are sorted by creation time, which is ideal for scenarios where sorting events by time is necessary, such as event logging or maintaining audit trails. + +## Version Information + +- Version 1.2.658 and later: UUID version upgraded from v4 to v7. +- Version prior to 1.2.658: UUID generation was based on v4. + +## Examples + +In an application where events are logged, maintaining the correct sequence of actions is essential. UUID v7 ensures that each event is time-ordered, making it easy to track actions chronologically. 
+ +```sql +-- Log a user logging in +SELECT GEN_RANDOM_UUID(), 'User logged in' AS event, CURRENT_TIMESTAMP AS event_time; + +-- Log a user making a purchase +SELECT GEN_RANDOM_UUID(), 'User made a purchase' AS event, CURRENT_TIMESTAMP AS event_time; +``` + +The results from these queries might look like this: +```sql +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ gen_random_uuid() │ event │ event_time │ +├──────────────────────────────────────┼──────────────────────┼────────────────────────────┤ +│ 019329e6-26a2-7b01-b9f5-1c3c02600578 │ User logged in │ 2024-11-14 08:59:29.313906 │ +│ 019329e6-329e-73c3-b0a8-a413ce298607 │ User made a purchase │ 2024-11-14 08:59:32.381497 │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +Notice that the `gen_random_uuid()` values are generated in the order that the events occurred, making it easy to maintain chronological order. diff --git a/tidb-cloud-lake/sql/generate-series.md b/tidb-cloud-lake/sql/generate-series.md new file mode 100644 index 0000000000000..0b7cc395fabb5 --- /dev/null +++ b/tidb-cloud-lake/sql/generate-series.md @@ -0,0 +1,122 @@ +--- +title: GENERATE_SERIES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + +Generates a dataset starting from a specified point, ending at another specified point, and optionally with an incrementing value. The GENERATE_SERIES function works with the following data types: + +- Integer +- Date +- Timestamp + +## Syntax + +```sql +GENERATE_SERIES(, [, ]) +``` + +## Arguments + +| Argument | Description | +|--------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| start | The starting value, representing the first number, date, or timestamp in the sequence. | +| stop | The ending value, representing the last number, date, or timestamp in the sequence. | +| step_interval | The step interval, determining the difference between adjacent values in the sequence. For integer sequences, the default value is 1. For date sequences, the default step interval is 1 day. For timestamp sequences, the default step interval is 1 microsecond. | + + +:::note +When dealing with functions like GENERATE_SERIES and RANGE, a key distinction lies in their boundary traits. GENERATE_SERIES is bound by both the left and right sides, while RANGE is bound on the left side only. For example, utilizing RANGE(1, 11) is equivalent to GENERATE_SERIES(1, 10). +::: + +## Return Type + +Returns a list containing a continuous sequence of numeric values, dates, or timestamps from *start* to *stop*. 
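+
+As a quick illustration of the boundary note above (assuming the RANGE table function mentioned there is available in your build), the following two queries return the same ten rows:
+
+```sql
+SELECT * FROM RANGE(1, 11);           -- right bound excluded: 1 through 10
+SELECT * FROM GENERATE_SERIES(1, 10); -- both bounds included: 1 through 10
+```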
+ +## Examples + +### Example 1: Generating Numeric, Date, and Timestamp Data + +```sql +SELECT * FROM GENERATE_SERIES(1, 10, 2); + +generate_series| +---------------+ + 1| + 3| + 5| + 7| + 9| + +SELECT * FROM GENERATE_SERIES('2023-03-20'::date, '2023-03-27'::date); + +generate_series| +---------------+ + 2023-03-20| + 2023-03-21| + 2023-03-22| + 2023-03-23| + 2023-03-24| + 2023-03-25| + 2023-03-26| + 2023-03-27| + +SELECT * FROM GENERATE_SERIES('2023-03-26 00:00'::timestamp, '2023-03-27 12:00'::timestamp, 86400000000); + +generate_series | +-------------------+ +2023-03-26 00:00:00| +2023-03-27 00:00:00| +``` + +### Example 2: Filling Query Result Gaps + +This example uses the GENERATE_SERIES function and left join operator to handle gaps in query results caused by missing information in specific ranges. + +```sql +CREATE TABLE t_metrics ( + date Date, + value INT +); + +INSERT INTO t_metrics VALUES + ('2020-01-01', 200), + ('2020-01-01', 300), + ('2020-01-04', 300), + ('2020-01-04', 300), + ('2020-01-05', 400), + ('2020-01-10', 700); + +SELECT date, SUM(value), COUNT() FROM t_metrics GROUP BY date ORDER BY date; + +date |sum(value)|count()| +----------+----------+-------+ +2020-01-01| 500| 2| +2020-01-04| 600| 2| +2020-01-05| 400| 1| +2020-01-10| 700| 1| +``` + +To close the gaps between January 1st and January 10th, 2020, use the following query: + +```sql +SELECT t.date, COALESCE(SUM(t_metrics.value), 0), COUNT(t_metrics.value) +FROM generate_series( + '2020-01-01'::Date, + '2020-01-10'::Date +) AS t(date) +LEFT JOIN t_metrics ON t_metrics.date = t.date +GROUP BY t.date ORDER BY t.date; + +date |coalesce(sum(t_metrics.value), 0)|count(t_metrics.value)| +----------+---------------------------------+----------------------+ +2020-01-01| 500| 2| +2020-01-02| 0| 0| +2020-01-03| 0| 0| +2020-01-04| 600| 2| +2020-01-05| 400| 1| +2020-01-06| 0| 0| +2020-01-07| 0| 0| +2020-01-08| 0| 0| +2020-01-09| 0| 0| +2020-01-10| 700| 1| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/geo-to-h3.md b/tidb-cloud-lake/sql/geo-to-h3.md new file mode 100644 index 0000000000000..b10109458b8a6 --- /dev/null +++ b/tidb-cloud-lake/sql/geo-to-h3.md @@ -0,0 +1,23 @@ +--- +title: GEO_TO_H3 +--- + +Returns the [H3](https://eng.uber.com/h3/) index of the hexagon cell where the given location resides. Returning 0 means an error occurred. + +## Syntax + +```sql +GEO_TO_H3(lon, lat, res) +``` + +## Examples + +```sql +SELECT GEO_TO_H3(37.79506683, 55.71290588, 15); + +┌─────────────────────────────────────────┐ +│ geo_to_h3(37.79506683, 55.71290588, 15) │ +├─────────────────────────────────────────┤ +│ 644325524701193974 │ +└─────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/geohash-decode.md b/tidb-cloud-lake/sql/geohash-decode.md new file mode 100644 index 0000000000000..32ae41613f51a --- /dev/null +++ b/tidb-cloud-lake/sql/geohash-decode.md @@ -0,0 +1,23 @@ +--- +title: GEOHASH_DECODE +--- + +Converts a [Geohash](https://en.wikipedia.org/wiki/Geohash)-encoded string into latitude/longitude coordinates. 
+
+## Syntax
+
+```sql
+GEOHASH_DECODE('<geohash>')
+```
+
+## Examples
+
+```sql
+SELECT GEOHASH_DECODE('ezs42');
+
+┌─────────────────────────────────┐
+│ geohash_decode('ezs42')         │
+├─────────────────────────────────┤
+│ (-5.60302734375,42.60498046875) │
+└─────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/geohash-encode.md b/tidb-cloud-lake/sql/geohash-encode.md
new file mode 100644
index 0000000000000..911cd4ab7b0a6
--- /dev/null
+++ b/tidb-cloud-lake/sql/geohash-encode.md
@@ -0,0 +1,23 @@
+---
+title: GEOHASH_ENCODE
+---
+
+Converts a pair of latitude and longitude coordinates into a [Geohash](https://en.wikipedia.org/wiki/Geohash)-encoded string.
+
+## Syntax
+
+```sql
+GEOHASH_ENCODE(lon, lat)
+```
+
+## Examples
+
+```sql
+SELECT GEOHASH_ENCODE(-5.60302734375, 42.593994140625);
+
+┌────────────────────────────────────────────────────┐
+│ geohash_encode((- 5.60302734375), 42.593994140625) │
+├────────────────────────────────────────────────────┤
+│ ezs42d000000                                       │
+└────────────────────────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/geometry.md b/tidb-cloud-lake/sql/geometry.md
new file mode 100644
index 0000000000000..680bf67664241
--- /dev/null
+++ b/tidb-cloud-lake/sql/geometry.md
@@ -0,0 +1,94 @@
+---
+title: TO_GEOMETRY
+title_includes: TRY_TO_GEOMETRY
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Parses an input and returns a value of type GEOMETRY.
+
+`TRY_TO_GEOMETRY` returns a NULL value if an error occurs during parsing.
+
+## Syntax
+
+```sql
+TO_GEOMETRY( <string_expression> [, <srid>] )
+TO_GEOMETRY( <binary_expression> [, <srid>] )
+TO_GEOMETRY( <variant_expression> [, <srid>] )
+TRY_TO_GEOMETRY( <string_expression> [, <srid>] )
+TRY_TO_GEOMETRY( <binary_expression> [, <srid>] )
+TRY_TO_GEOMETRY( <variant_expression> [, <srid>] )
+```
+
+## Arguments
+
+| Arguments              | Description                                                                                    |
+|------------------------|------------------------------------------------------------------------------------------------|
+| `<string_expression>`  | The argument must be a string expression in WKT, EWKT, WKB or EWKB (hexadecimal), or GeoJSON format. |
+| `<binary_expression>`  | The argument must be a binary expression in WKB or EWKB format.                                  |
+| `<variant_expression>` | The argument must be a JSON OBJECT in GeoJSON format.                                            |
+| `<srid>`               | The integer value of the SRID to use.                                                            |
+
+## Return Type
+
+Geometry.
+ +## Examples + +```sql +SELECT + TO_GEOMETRY( + 'POINT(1820.12 890.56)' + ) AS pipeline_geometry; + +┌───────────────────────┐ +│ pipeline_geometry │ +├───────────────────────┤ +│ POINT(1820.12 890.56) │ +└───────────────────────┘ + +SELECT + TO_GEOMETRY( + '0101000020797f000066666666a9cb17411f85ebc19e325641', 4326 + ) AS pipeline_geometry; + +┌───────────────────────────────────────┐ +│ pipeline_geometry │ +├───────────────────────────────────────┤ +│ SRID=4326;POINT(389866.35 5819003.03) │ +└───────────────────────────────────────┘ + +SELECT + TO_GEOMETRY( + FROM_HEX('0101000020797f000066666666a9cb17411f85ebc19e325641'), 4326 + ) AS pipeline_geometry; + +┌───────────────────────────────────────┐ +│ pipeline_geometry │ +├───────────────────────────────────────┤ +│ SRID=4326;POINT(389866.35 5819003.03) │ +└───────────────────────────────────────┘ + +SELECT + TO_GEOMETRY( + '{"coordinates":[[389866,5819003],[390000,5830000]],"type":"LineString"}' + ) AS pipeline_geometry; + +┌───────────────────────────────────────────┐ +│ pipeline_geometry │ +├───────────────────────────────────────────┤ +│ LINESTRING(389866 5819003,390000 5830000) │ +└───────────────────────────────────────────┘ + +SELECT + TO_GEOMETRY( + PARSE_JSON('{"coordinates":[[389866,5819003],[390000,5830000]],"type":"LineString"}') + ) AS pipeline_geometry; + +┌───────────────────────────────────────────┐ +│ pipeline_geometry │ +├───────────────────────────────────────────┤ +│ LINESTRING(389866 5819003,390000 5830000) │ +└───────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/geospatial-functions.md b/tidb-cloud-lake/sql/geospatial-functions.md new file mode 100644 index 0000000000000..9e3105ecd4b98 --- /dev/null +++ b/tidb-cloud-lake/sql/geospatial-functions.md @@ -0,0 +1,140 @@ +--- +title: Geospatial Functions +--- + +Databend ships with two complementary sets of geospatial capabilities: PostGIS-style geometry functions for building and analysing shapes, and H3 utilities for global hexagonal indexing. The tables below group the functions by task so you can quickly locate the right tool, similar to the layout used in the Snowflake documentation. 
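+
+As a quick taste of how the two families combine, the sketch below builds a point geometry and indexes the same coordinates with H3 (the coordinates are arbitrary):
+
+```sql
+SELECT
+  ST_ASTEXT(ST_MAKEGEOMPOINT(-122.35, 37.55)) AS wkt_point, -- 'POINT(-122.35 37.55)'
+  GEO_TO_H3(-122.35, 37.55, 9) AS h3_cell;                  -- resolution-9 H3 index
+```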
+ +## Geometry Constructors + +| Function | Description | Example | +|----------|-------------|---------| +| [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md) / [ST_GEOM_POINT](/tidb-cloud-lake/sql/st-geom-point.md) | Construct a Point geometry | `ST_MAKEGEOMPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | +| [ST_MAKEPOINT](/tidb-cloud-lake/sql/st-makepoint.md) / [ST_POINT](/tidb-cloud-lake/sql/st-point.md) | Construct a Point geography | `ST_MAKEPOINT(-122.35, 37.55)` → `POINT(-122.35 37.55)` | +| [ST_MAKELINE](/tidb-cloud-lake/sql/st-makeline.md) / [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) | Create a LineString from points | `ST_MAKELINE(ST_MAKEGEOMPOINT(-122.35, 37.55), ST_MAKEGEOMPOINT(-122.40, 37.60))` → `LINESTRING(-122.35 37.55, -122.40 37.60)` | +| [ST_MAKEPOLYGON](/tidb-cloud-lake/sql/st-makepolygon.md) | Create a Polygon from a closed LineString | `ST_MAKEPOLYGON(ST_MAKELINE(...))` → `POLYGON(...)` | +| [ST_POLYGON](/tidb-cloud-lake/sql/st-polygon.md) | Create a Polygon from coordinate rings | `ST_POLYGON(...)` → `POLYGON(...)` | + +## Geometry Conversion + +| Function | Description | Example | +|----------|-------------|---------| +| [ST_GEOMETRYFROMTEXT](/tidb-cloud-lake/sql/st-geometryfromtext.md) / [ST_GEOMFROMTEXT](/tidb-cloud-lake/sql/st-geomfromtext.md) | Convert WKT to geometry | `ST_GEOMETRYFROMTEXT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [ST_GEOMETRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md) / [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md) | Convert WKB to geometry | `ST_GEOMETRYFROMWKB(...)` → `POINT(...)` | +| [ST_GEOMETRYFROMEWKT](/tidb-cloud-lake/sql/st-geometryfromewkt.md) / [ST_GEOMFROMEWKT](/tidb-cloud-lake/sql/st-geomfromewkt.md) | Convert EWKT to geometry | `ST_GEOMETRYFROMEWKT('SRID=4326;POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [ST_GEOMETRYFROMEWKB](/tidb-cloud-lake/sql/st-geometryfromewkb.md) / [ST_GEOMFROMEWKB](/tidb-cloud-lake/sql/st-geomfromewkb.md) | Convert EWKB to geometry | `ST_GEOMETRYFROMEWKB(...)` → `POINT(...)` | +| [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md) / [ST_GEOGFROMWKT](/tidb-cloud-lake/sql/st-geogfromwkt.md) | Convert WKT/EWKT to geography | `ST_GEOGRAPHYFROMWKT('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md) / [ST_GEOGFROMWKB](/tidb-cloud-lake/sql/st-geogfromwkb.md) | Convert WKB/EWKB to geography | `ST_GEOGRAPHYFROMWKB(...)` → `POINT(...)` | +| [ST_GEOMFROMGEOHASH](/tidb-cloud-lake/sql/st-geomfromgeohash.md) | Convert GeoHash to geometry | `ST_GEOMFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | +| [ST_GEOMPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geompointfromgeohash.md) | Convert GeoHash to Point geometry | `ST_GEOMPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | +| [ST_GEOGFROMGEOHASH](/tidb-cloud-lake/sql/st-geogfromgeohash.md) | Convert GeoHash to geography polygon | `ST_GEOGFROMGEOHASH('9q8yyk8')` → `POLYGON(...)` | +| [ST_GEOGPOINTFROMGEOHASH](/tidb-cloud-lake/sql/st-geogpointfromgeohash.md) | Convert GeoHash to geography point | `ST_GEOGPOINTFROMGEOHASH('9q8yyk8')` → `POINT(...)` | +| [TO_GEOMETRY](/tidb-cloud-lake/sql/geometry.md) | Parse various formats into geometry | `TO_GEOMETRY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | +| [TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) / [TRY_TO_GEOGRAPHY](/tidb-cloud-lake/sql/to-geography.md) | Parse various formats into geography | `TO_GEOGRAPHY('POINT(-122.35 37.55)')` → `POINT(-122.35 37.55)` | + +## Geometry Output 
+ +| Function | Description | Example | +|----------|-------------|---------| +| [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) | Convert geometry to WKT | `ST_ASTEXT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | +| [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md) | Convert geometry to WKT | `ST_ASWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | +| [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) / [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md) | Convert geometry to WKB | `ST_ASBINARY(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `WKB representation` | +| [ST_ASEWKT](/tidb-cloud-lake/sql/st-asewkt.md) | Convert geometry to EWKT | `ST_ASEWKT(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'SRID=4326;POINT(-122.35 37.55)'` | +| [ST_ASEWKB](/tidb-cloud-lake/sql/st-asewkb.md) | Convert geometry to EWKB | `ST_ASEWKB(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `EWKB representation` | +| [ST_ASGEOJSON](/tidb-cloud-lake/sql/st-asgeojson.md) | Convert geometry to GeoJSON | `ST_ASGEOJSON(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'{"type":"Point","coordinates":[-122.35,37.55]}'` | +| [ST_GEOHASH](/tidb-cloud-lake/sql/st-geohash.md) | Convert geometry to GeoHash | `ST_GEOHASH(ST_MAKEGEOMPOINT(-122.35, 37.55), 7)` → `'9q8yyk8'` | +| [TO_STRING](/tidb-cloud-lake/sql/string.md) | Convert geometry to string | `TO_STRING(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `'POINT(-122.35 37.55)'` | + +## Geometry Accessors & Properties + +| Function | Description | Example | +|----------|-------------|---------| +| [ST_DIMENSION](/tidb-cloud-lake/sql/st-dimension.md) | Return the topological dimension | `ST_DIMENSION(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `0` | +| [ST_SRID](/tidb-cloud-lake/sql/st-srid.md) | Return the SRID of a geometry | `ST_SRID(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `4326` | +| [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) | Assign an SRID to a geometry | `ST_SETSRID(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-122.35 37.55)` | +| [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) | Transform geometry to a new SRID | `ST_TRANSFORM(ST_MAKEGEOMPOINT(-122.35, 37.55), 3857)` → `POINT(-13618288.8 4552395.0)` | +| [ST_NPOINTS](/tidb-cloud-lake/sql/st-npoints.md) / [ST_NUMPOINTS](/tidb-cloud-lake/sql/st-numpoints.md) | Count points in a geometry | `ST_NPOINTS(ST_MAKELINE(...))` → `2` | +| [ST_POINTN](/tidb-cloud-lake/sql/st-pointn.md) | Return a specific point from a LineString | `ST_POINTN(ST_MAKELINE(...), 1)` → `POINT(-122.35 37.55)` | +| [ST_STARTPOINT](/tidb-cloud-lake/sql/st-startpoint.md) | Return the first point in a LineString | `ST_STARTPOINT(ST_MAKELINE(...))` → `POINT(-122.35 37.55)` | +| [ST_ENDPOINT](/tidb-cloud-lake/sql/st-endpoint.md) | Return the last point in a LineString | `ST_ENDPOINT(ST_MAKELINE(...))` → `POINT(-122.40 37.60)` | +| [ST_LENGTH](/tidb-cloud-lake/sql/st-length.md) | Measure the length of a LineString | `ST_LENGTH(ST_MAKELINE(...))` → `5.57` | +| [ST_X](/tidb-cloud-lake/sql/st-x.md) / [ST_Y](/tidb-cloud-lake/sql/st-y.md) | Return the X or Y coordinate of a Point | `ST_X(ST_MAKEGEOMPOINT(-122.35, 37.55))` → `-122.35` | +| [ST_XMIN](/tidb-cloud-lake/sql/st-xmin.md) / [ST_XMAX](/tidb-cloud-lake/sql/st-xmax.md) | Return the min/max X coordinate | `ST_XMIN(ST_MAKELINE(...))` → `-122.40` | +| [ST_YMIN](/tidb-cloud-lake/sql/st-ymin.md) / [ST_YMAX](/tidb-cloud-lake/sql/st-ymax.md) | Return the min/max Y coordinate | `ST_YMAX(ST_MAKELINE(...))` → `37.60` | + +## Spatial Relationships + +| Function | Description | Example | +|----------|-------------|---------| +| 
[ST_CONTAINS](/tidb-cloud-lake/sql/st-contains.md) | Test whether one geometry contains another | `ST_CONTAINS(ST_MAKEPOLYGON(...), ST_MAKEGEOMPOINT(...))` → `TRUE` | +| [POINT_IN_POLYGON](point-in-polygon.md) | Check if a point lies inside a polygon | `POINT_IN_POLYGON([lon, lat], [[p1_lon, p1_lat], ...])` → `TRUE` | + +## Distance & Measurements + +| Function | Description | Example | +|----------|-------------|---------| +| [ST_DISTANCE](/tidb-cloud-lake/sql/st-distance.md) | Measure the distance between geometries | `ST_DISTANCE(ST_MAKEGEOMPOINT(-122.35, 37.55), ST_MAKEGEOMPOINT(-122.40, 37.60))` → `5.57` | +| [HAVERSINE](/tidb-cloud-lake/sql/haversine.md) | Compute great-circle distance between coordinates | `HAVERSINE(37.55, -122.35, 37.60, -122.40)` → `6.12` | + +## H3 Indexing & Conversion + +| Function | Description | Example | +|----------|-------------|---------| +| [GEO_TO_H3](/tidb-cloud-lake/sql/geo-to-h3.md) | Convert longitude/latitude to an H3 index | `GEO_TO_H3(37.7950, 55.7129, 15)` → `644325524701193974` | +| [H3_TO_GEO](/tidb-cloud-lake/sql/h3-to-geo.md) | Convert an H3 index to longitude/latitude | `H3_TO_GEO(644325524701193974)` → `[37.7950, 55.7129]` | +| [H3_TO_STRING](/tidb-cloud-lake/sql/h3-to-string.md) | Convert an H3 index to its string form | `H3_TO_STRING(644325524701193974)` → `'8f2830828052d25'` | +| [STRING_TO_H3](/tidb-cloud-lake/sql/string-to-h3.md) | Convert an H3 string to an index | `STRING_TO_H3('8f2830828052d25')` → `644325524701193974` | +| [GEOHASH_ENCODE](/tidb-cloud-lake/sql/geohash-encode.md) | Encode longitude/latitude to GeoHash | `GEOHASH_ENCODE(37.7950, 55.7129, 12)` → `'ucfv0nzpt3s7'` | +| [GEOHASH_DECODE](/tidb-cloud-lake/sql/geohash-decode.md) | Decode a GeoHash to longitude/latitude | `GEOHASH_DECODE('ucfv0nzpt3s7')` → `[37.7950, 55.7129]` | + +## H3 Cell Properties + +| Function | Description | Example | +|----------|-------------|---------| +| [H3_GET_RESOLUTION](/tidb-cloud-lake/sql/h3-get-resolution.md) | Return the resolution of an H3 index | `H3_GET_RESOLUTION(644325524701193974)` → `15` | +| [H3_GET_BASE_CELL](/tidb-cloud-lake/sql/h3-get-base-cell.md) | Return the base cell number | `H3_GET_BASE_CELL(644325524701193974)` → `14` | +| [H3_IS_VALID](/tidb-cloud-lake/sql/h3-is-valid.md) | Check whether an H3 index is valid | `H3_IS_VALID(644325524701193974)` → `TRUE` | +| [H3_IS_PENTAGON](/tidb-cloud-lake/sql/h3-is-pentagon.md) | Check whether an H3 index is a pentagon | `H3_IS_PENTAGON(644325524701193974)` → `FALSE` | +| [H3_IS_RES_CLASS_III](/tidb-cloud-lake/sql/h3-is-res-class-iii.md) | Check whether an H3 index is class III | `H3_IS_RES_CLASS_III(644325524701193974)` → `FALSE` | +| [H3_GET_FACES](/tidb-cloud-lake/sql/h3-get-faces.md) | Return intersecting icosahedron faces | `H3_GET_FACES(644325524701193974)` → `[7]` | +| [H3_TO_PARENT](/tidb-cloud-lake/sql/h3-to-parent.md) | Return the parent index at a lower resolution | `H3_TO_PARENT(644325524701193974, 10)` → `622236721289822207` | +| [H3_TO_CHILDREN](/tidb-cloud-lake/sql/h3-to-children.md) | Return child indexes at a higher resolution | `H3_TO_CHILDREN(622236721289822207, 11)` → `[...]` | +| [H3_TO_CENTER_CHILD](/tidb-cloud-lake/sql/h3-to-center-child.md) | Return the center child for a resolution | `H3_TO_CENTER_CHILD(622236721289822207, 11)` → `625561602857582591` | +| [H3_CELL_AREA_M2](/tidb-cloud-lake/sql/h3-cell-area-m2.md) | Return the area of a cell in square meters | `H3_CELL_AREA_M2(644325524701193974)` → `0.8953` | +| 
[H3_CELL_AREA_RADS2](/tidb-cloud-lake/sql/h3-cell-area-rads2.md) | Return the area of a cell in square radians | `H3_CELL_AREA_RADS2(644325524701193974)` → `2.2e-14` | +| [H3_HEX_AREA_KM2](/tidb-cloud-lake/sql/h3-hex-area-km2.md) | Return the average hexagon area in km² | `H3_HEX_AREA_KM2(10)` → `0.0152` | +| [H3_HEX_AREA_M2](/tidb-cloud-lake/sql/h3-hex-area-m2.md) | Return the average hexagon area in m² | `H3_HEX_AREA_M2(10)` → `15200` | +| [H3_TO_GEO_BOUNDARY](/tidb-cloud-lake/sql/h3-to-geo-boundary.md) | Return the boundary of a cell | `H3_TO_GEO_BOUNDARY(644325524701193974)` → `[[lon1,lat1], ...]` | +| [H3_NUM_HEXAGONS](/tidb-cloud-lake/sql/h3-num-hexagons.md) | Return the number of hexagons at a resolution | `H3_NUM_HEXAGONS(2)` → `5882` | + +## H3 Neighborhoods + +| Function | Description | Example | +|----------|-------------|---------| +| [H3_DISTANCE](/tidb-cloud-lake/sql/h3-distance.md) | Return the grid distance between two indexes | `H3_DISTANCE(599119489002373119, 599119491149856767)` → `1` | +| [H3_INDEXES_ARE_NEIGHBORS](/tidb-cloud-lake/sql/h3-indexes-are-neighbors.md) | Test whether two indexes are neighbors | `H3_INDEXES_ARE_NEIGHBORS(599119489002373119, 599119491149856767)` → `TRUE` | +| [H3_K_RING](/tidb-cloud-lake/sql/h3-k-ring.md) | Return all indexes within k distance | `H3_K_RING(599119489002373119, 1)` → `[599119489002373119, ...]` | +| [H3_HEX_RING](/tidb-cloud-lake/sql/h3-hex-ring.md) | Return indexes exactly k steps away | `H3_HEX_RING(599119489002373119, 1)` → `[599119491149856767, ...]` | +| [H3_LINE](/tidb-cloud-lake/sql/h3-line.md) | Return indexes along a path | `H3_LINE(from_h3, to_h3)` → `[from_h3, ..., to_h3]` | + +## H3 Edge Operations + +| Function | Description | Example | +|----------|-------------|---------| +| [H3_GET_UNIDIRECTIONAL_EDGE](/tidb-cloud-lake/sql/h3-get-unidirectional-edge.md) | Return the edge between two adjacent cells | `H3_GET_UNIDIRECTIONAL_EDGE(from_h3, to_h3)` → `edge_index` | +| [H3_UNIDIRECTIONAL_EDGE_IS_VALID](/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md) | Check whether an edge index is valid | `H3_UNIDIRECTIONAL_EDGE_IS_VALID(edge_index)` → `TRUE` | +| [H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE](h3-get-origin-index-from-unidirectional-edge.md) | Return the origin cell from an edge | `H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(edge_index)` → `from_h3` | +| [H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE](h3-get-destination-index-from-unidirectional-edge.md) | Return the destination cell from an edge | `H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(edge_index)` → `to_h3` | +| [H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE](h3-get-indexes-from-unidirectional-edge.md) | Return both cells for an edge | `H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(edge_index)` → `[from_h3, to_h3]` | +| [H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON](h3-get-unidirectional-edges-from-hexagon.md) | List edges originating from a cell | `H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3_index)` → `[edge1, edge2, ...]` | +| [H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY](/tidb-cloud-lake/sql/h3-get-unidirectional-edge-boundary.md) | Return the boundary of an edge | `H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(edge_index)` → `[[lon1,lat1], [lon2,lat2]]` | + +## H3 Measurements & Angles + +| Function | Description | Example | +|----------|-------------|---------| +| [H3_EDGE_LENGTH_KM](/tidb-cloud-lake/sql/h3-edge-length-km.md) | Return the average edge length in kilometres | `H3_EDGE_LENGTH_KM(10)` → `0.065` | +| 
[H3_EDGE_LENGTH_M](/tidb-cloud-lake/sql/h3-edge-length-m.md) | Return the average edge length in metres | `H3_EDGE_LENGTH_M(10)` → `65.91` | +| [H3_EXACT_EDGE_LENGTH_KM](/tidb-cloud-lake/sql/h3-exact-edge-length-km.md) | Return the exact edge length in kilometres | `H3_EXACT_EDGE_LENGTH_KM(edge_index)` → `0.066` | +| [H3_EXACT_EDGE_LENGTH_M](/tidb-cloud-lake/sql/h3-exact-edge-length-m.md) | Return the exact edge length in metres | `H3_EXACT_EDGE_LENGTH_M(edge_index)` → `66.12` | +| [H3_EXACT_EDGE_LENGTH_RADS](/tidb-cloud-lake/sql/h3-exact-edge-length-rads.md) | Return the exact edge length in radians | `H3_EXACT_EDGE_LENGTH_RADS(edge_index)` → `0.00001` | +| [H3_EDGE_ANGLE](/tidb-cloud-lake/sql/h3-edge-angle.md) | Return the angle in radians between two edges | `H3_EDGE_ANGLE(edge1, edge2)` → `1.047` | diff --git a/tidb-cloud-lake/sql/geospatial.md b/tidb-cloud-lake/sql/geospatial.md new file mode 100644 index 0000000000000..ca7e75cef5316 --- /dev/null +++ b/tidb-cloud-lake/sql/geospatial.md @@ -0,0 +1,212 @@ +--- +title: Geospatial +sidebar_position: 14 +--- + +Databend stores spatial data through two data types: + +- `GEOMETRY` is planar (default SRID 0, or any SRID you assign) and suits local/projected workloads. +- `GEOGRAPHY` is spherical (WGS 84, SRID 4326) with latitude/longitude validation for global workloads. + +Both types persist coordinates as IEEE 754 `Float64` values in EWKB, cover every common geometry (Point through GeometryCollection), emit WKT/WKB/GeoJSON, and can be reprojected with functions such as `ST_TRANSFORM`. + +## Data Types + +### GEOMETRY + +- Uses Cartesian coordinates and is ideal for campus-, city-, or province-scale data where planar math is sufficient. +- Default SRID is 0; you can set another SRID when creating the column or writing data. +- Works with most spatial operators and can be reprojected with `ST_TRANSFORM` for downstream consumers. + +### GEOGRAPHY + +- Stores longitude/latitude pairs on WGS 84 (SRID 4326); values outside [-180°, 180°] / [-90°, 90°] are rejected. +- Recommended for continental or global distance/area calculations that need ellipsoidal formulas. +- Can be converted to GEOMETRY when a planar algorithm is required. + +| Feature | GEOMETRY | GEOGRAPHY | +| :--- | :--- | :--- | +| **Coordinate System** | Cartesian (Planar) | Ellipsoidal (Spherical) | +| **SRID** | 0 (default) or Custom | 4326 (WGS 84) only | +| **X / Y Interpretation** | X, Y on a flat plane | Longitude, Latitude on a sphere | +| **Edge Interpretation** | Straight line on a plane | Great circle arc (shortest path on sphere) | +| **Primary Use Case** | Local / Projected data (e.g. city, building) | Global data (e.g. GPS tracks, shipping routes) | + +## Precision and Coordinate Control + +- **Double precision everywhere**: functions such as `ST_MAKEPOINT` and `ST_GEOMETRYFROMEWKT` ingest `Float64` values and persist them in EWKB, so coordinates keep their original digits. +- **SRID behavior**: GEOMETRY keeps whatever SRID you assign (default 0), while GEOGRAPHY is fixed at SRID 4326 and rejects other SRIDs. +- **Coordinate safety**: GEOGRAPHY inputs run through `check_point`, ensuring longitude/latitude stay within [-180°, 180°] / [-90°, 90°]. +- **Projection**: `ST_TRANSFORM` swaps GEOMETRY SRIDs (for example, 4326 → 3857) or converts GEOGRAPHY data to a planar system for downstream processing. + +## Supported Object Types + +| Object Type | Description & Example | Precision Notes | +| --- | --- | --- | +| Point | Single coordinate, e.g. 
`POINT(113.98765432109876 23.456789012345678)` | Each coordinate is stored as a `Float64` and keeps ~15–16 digits of precision. | +| LineString | Connected path, e.g. `LINESTRING(10 20, 30 40, 50 60)` | Every vertex uses the same double precision, so derived lengths rely on the original values. | +| Polygon | Closed area, e.g. `POLYGON((10 20, 30 40, 50 60, 10 20))` | All rings share the `Float64` vertices, preserving polygon edges for area/containment tests. | +| MultiPoint | Multiple points, e.g. `MULTIPOINT((10 20), (30 40))` | Each member point inherits the same double-precision storage as a standalone point. | +| MultiLineString | Multiple paths, e.g. `MULTILINESTRING((10 20, 30 40), (50 60, 70 80))` | Precision is maintained per vertex, ensuring accurate length or intersection calculations. | +| MultiPolygon | Multiple areas, e.g. `MULTIPOLYGON(((10 20, 30 40, 50 60, 10 20)), ((15 25, 25 35, 35 45, 15 25)))` | Each polygon’s coordinates remain `Float64`, so combined areas/overlaps retain full precision. | +| GeometryCollection | Mixed objects, e.g. `GEOMETRYCOLLECTION(POINT(10 20), LINESTRING(10 20, 30 40))` | Members keep their native double-precision coordinates regardless of geometry type. | + +## Output Formats + +Databend persists spatial values as EWKB but exposes several output formats. Set the `geometry_output_format` session setting (default: `WKT`) or call explicit conversion functions: + +- **WKT / EWKT** – Text representation; EWKT prefixes an SRID (for example, `SRID=4326;POINT(-44.3 60.1)`). +- **WKB / EWKB** – Compact binary, useful for interop with other GIS runtimes. +- **GeoJSON** – JSON representation for web maps and APIs. + +```sql +SET geometry_output_format = 'GeoJSON'; +SELECT ST_ASWKB(geo), ST_ASEWKT(geo), ST_ASGEOJSON(geo) FROM ...; +``` + +## Functions + +Browse the catalogued list of spatial functions here: +- [Geospatial Functions](/tidb-cloud-lake/sql/geospatial-functions.md) + +## Examples + +Each example below highlights one object type, the scenario it solves, the SQL to produce it, and a sample result table. `CAST('…' AS GEOMETRY)` parses the inline WKT literal so you can experiment without creating tables. + +### Point — pinpoint a single sensor + +*Scenario*: store the exact latitude/longitude produced by an IoT device and expose both GeoJSON and numeric coordinates. + +```sql +SELECT + ST_ASGEOJSON(pt) AS sensor_geojson, + ST_X(pt) AS lon, + ST_Y(pt) AS lat +FROM (SELECT CAST('POINT(113.98765432109876 23.456789012345678)' AS GEOMETRY) AS pt); +``` + +``` +┌──────────────────────────────────────────────────────────────────────────────┬──────────────────────┬──────────────────────┐ +│ sensor_geojson │ lon │ lat │ +├──────────────────────────────────────────────────────────────────────────────┼──────────────────────┼──────────────────────┤ +│ {"type":"Point","coordinates":[113.98765432109876,23.456789012345677]} │ 113.98765432109876 │ 23.456789012345677 │ +└──────────────────────────────────────────────────────────────────────────────┴──────────────────────┴──────────────────────┘ +``` + +### LineString — describe a route + +*Scenario*: record a simple driving route and measure its length in coordinate units. 
+ +```sql +SELECT + ST_ASWKT(route) AS road_segment, + ST_LENGTH(route) AS segment_length +FROM (SELECT CAST('LINESTRING(10 20, 30 40, 50 60)' AS GEOMETRY) AS route); +``` + +``` +┌──────────────────────────────────────────────────────────────┬────────────────────┐ +│ road_segment │ segment_length │ +├──────────────────────────────────────────────────────────────┼────────────────────┤ +│ LINESTRING(10 20,30 40,50 60) │ 56.568542495 │ +└──────────────────────────────────────────────────────────────┴────────────────────┘ +``` + +### Polygon — capture an area or geofence + +*Scenario*: define a rectangular geofence for a facility, read it back with SRID info, and compute its area. + +```sql +SELECT + ST_ASEWKT(area) AS ewkt_polygon, + ST_AREA(area) AS area_units +FROM (SELECT CAST('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))' AS GEOMETRY) AS area); +``` + +``` +┌──────────────────────────────────────────────────────────────┬──────────────┐ +│ ewkt_polygon │ area_units │ +├──────────────────────────────────────────────────────────────┼──────────────┤ +│ POLYGON((0 0,0 10,10 10,10 0,0 0)) │ 100 │ +└──────────────────────────────────────────────────────────────┴──────────────┘ +``` + +### MultiPoint — tag multiple sites together + +*Scenario*: keep the coordinates of three kiosks together and report both the GeoJSON payload and the total count. + +```sql +SELECT + ST_ASGEOJSON(places) AS places_geojson, + ST_NUMPOINTS(places) AS total_sites +FROM (SELECT CAST('MULTIPOINT((10 20), (30 40), (50 60))' AS GEOMETRY) AS places); +``` + +``` +┌──────────────────────────────────────────────────────────────┬──────────────┐ +│ places_geojson │ total_sites │ +├──────────────────────────────────────────────────────────────┼──────────────┤ +│ {"type":"MultiPoint","coordinates":[[10,20],[30,40],[50,60]]} │ 3 │ +└──────────────────────────────────────────────────────────────┴──────────────┘ +``` + +### MultiLineString — represent parallel lines + +*Scenario*: group two parallel road segments, read them back as WKT, and count the total vertices with `ST_NUMPOINTS`. + +```sql +SELECT + ST_ASWKT(lines) AS multiline_wkt, + ST_NUMPOINTS(lines) AS vertex_count +FROM (SELECT CAST('MULTILINESTRING((10 20, 30 40), (50 60, 70 80))' AS GEOMETRY) AS lines); +``` + +``` +┌──────────────────────────────────────────────────────────────┬──────────────┐ +│ multiline_wkt │ vertex_count │ +├──────────────────────────────────────────────────────────────┼──────────────┤ +│ MULTILINESTRING((10 20,30 40),(50 60,70 80)) │ 4 │ +└──────────────────────────────────────────────────────────────┴──────────────┘ +``` + +### MultiPolygon — cover disjoint districts + +*Scenario*: represent two disjoint service zones and calculate the combined area. 
+ +```sql +SELECT + ST_ASGEOJSON(zones) AS zones_geojson, + ST_AREA(zones) AS total_area +FROM ( + SELECT CAST('MULTIPOLYGON(((0 0, 0 10, 10 10, 10 0, 0 0)), ((20 0, 20 10, 30 10, 30 0, 20 0)))' AS GEOMETRY) AS zones +); +``` + +``` +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────┐ +│ zones_geojson │ total_area │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤ +│ {"type":"MultiPolygon","coordinates":[[[[0,0],[0,10],[10,10],[10,0],[0,0]]],[[[20,0],[20,10],[30,10],[30,0],[20,0]]]]} │ 200 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┘ +``` + +### GeometryCollection — mix heterogenous shapes + +*Scenario*: keep a landmark marker and its connecting path together, exposing the mixed GeoJSON and the maximum dimension. + +```sql +SELECT + ST_ASGEOJSON(feature) AS feature_geojson, + ST_DIMENSION(feature) AS max_dimension +FROM ( + SELECT CAST('GEOMETRYCOLLECTION(POINT(10 20), LINESTRING(10 20, 30 40))' AS GEOMETRY) AS feature +); +``` + +``` +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────┐ +│ feature_geojson │ max_dimension │ +├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────┤ +│ {"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[10,20]},{"type":"LineString","coordinates":[[10,20],[30,40]]}]} │ 1 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/get-by-keypath.md b/tidb-cloud-lake/sql/get-by-keypath.md new file mode 100644 index 0000000000000..351418557ca36 --- /dev/null +++ b/tidb-cloud-lake/sql/get-by-keypath.md @@ -0,0 +1,64 @@ +--- +title: GET_BY_KEYPATH +title_includes: GET_BY_KEYPATH_STRING +--- + +Extracts a nested value from a `VARIANT` using a **key path** string. `GET_BY_KEYPATH` returns the result as `VARIANT`, while `GET_BY_KEYPATH_STRING` returns a `STRING`. + +Key paths follow the Postgres-style braces syntax: each segment is wrapped in `{}` and segments are separated by commas, for example `'{user,profile,name}'`. Array indexes can be specified as numbers, e.g. `'{items,0}'`. 
+ +## Syntax + +```sql +GET_BY_KEYPATH(, ) +GET_BY_KEYPATH_STRING(, ) +``` + +## Return Type + +- `GET_BY_KEYPATH`: `VARIANT` +- `GET_BY_KEYPATH_STRING`: `STRING` + +## Examples + +```sql +SELECT GET_BY_KEYPATH(PARSE_JSON('{"user":{"name":"Ada","tags":["a","b"]}}'), '{user,name}') AS profile_name; + +┌──────────────┐ +│ profile_name │ +├──────────────┤ +│ "Ada" │ +└──────────────┘ +``` + +```sql +SELECT GET_BY_KEYPATH(PARSE_JSON('[10, {"a":{"k1":[1,2,3]}}]'), '{1,a,k1}') AS inner_array; + +┌─────────────┐ +│ inner_array │ +├─────────────┤ +│ [1,2,3] │ +└─────────────┘ +``` + +```sql +SELECT GET_BY_KEYPATH_STRING(PARSE_JSON('{"user":{"name":"Ada"}}'), '{user,name}') AS name_text; + +┌──────────┐ +│ name_text│ +├──────────┤ +│ Ada │ +└──────────┘ +``` + +```sql +SELECT GET_BY_KEYPATH_STRING(PARSE_JSON('[10, {"scores":[100,98]}]'), '{1,scores,0}') AS first_score; + +┌──────────────┐ +│ first_score │ +├──────────────┤ +│ 100 │ +└──────────────┘ +``` + +If the key path cannot be resolved, both functions return `NULL`. diff --git a/tidb-cloud-lake/sql/get-ignore-case.md b/tidb-cloud-lake/sql/get-ignore-case.md new file mode 100644 index 0000000000000..9dd082ed6a390 --- /dev/null +++ b/tidb-cloud-lake/sql/get-ignore-case.md @@ -0,0 +1,37 @@ +--- +title: GET_IGNORE_CASE +--- + +Extracts value from a `VARIANT` that contains `OBJECT` by the field_name. +The value is returned as a `Variant` or `NULL` if either of the arguments is `NULL`. + +`GET_IGNORE_CASE` is similar to `GET` but applies case-insensitive matching to field names. +First match the exact same field name, if not found, match the case-insensitive field name alphabetically. + +## Syntax + +```sql +GET_IGNORE_CASE( , ) +``` + +## Arguments + +| Arguments | Description | +|----------------|------------------------------------------------------------------| +| `` | The VARIANT value that contains either an ARRAY or an OBJECT | +| `` | The String value specifies the key in a key-value pair of OBJECT | + +## Return Type + +VARIANT + +## Examples + +```sql +SELECT get_ignore_case(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA'); ++---------------------------------------------------------------+ +| get_ignore_case(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA') | ++---------------------------------------------------------------+ +| 3 | ++---------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/get-path.md b/tidb-cloud-lake/sql/get-path.md new file mode 100644 index 0000000000000..f650cbdc3ca9a --- /dev/null +++ b/tidb-cloud-lake/sql/get-path.md @@ -0,0 +1,57 @@ +--- +title: GET_PATH +--- + +Extracts value from a `VARIANT` by `path_name`. +The value is returned as a `Variant` or `NULL` if either of the arguments is `NULL`. + +`GET_PATH` is equivalent to a chain of `GET` functions, `path_name` consists of a concatenation of field names preceded by periods (.), colons (:) or index operators (`[index]`). The first field name does not require the leading identifier to be specified. 
+ +## Syntax + +```sql +GET_PATH( , ) +``` + +## Arguments + +| Arguments | Description | +|---------------|------------------------------------------------------------------| +| `` | The VARIANT value that contains either an ARRAY or an OBJECT | +| `` | The String value that consists of a concatenation of field names | + +## Return Type + +VARIANT + +## Examples + +```sql +SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k1[0]'); ++-----------------------------------------------------------------------+ +| get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k1[0]') | ++-----------------------------------------------------------------------+ +| 0 | ++-----------------------------------------------------------------------+ + +SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2:k3'); ++-----------------------------------------------------------------------+ +| get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2:k3') | ++-----------------------------------------------------------------------+ +| 3 | ++-----------------------------------------------------------------------+ + +SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4'); ++-----------------------------------------------------------------------+ +| get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4') | ++-----------------------------------------------------------------------+ +| 4 | ++-----------------------------------------------------------------------+ + +SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5'); ++-----------------------------------------------------------------------+ +| get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5') | ++-----------------------------------------------------------------------+ +| NULL | ++-----------------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/get-sql.md b/tidb-cloud-lake/sql/get-sql.md new file mode 100644 index 0000000000000..944544c776ed7 --- /dev/null +++ b/tidb-cloud-lake/sql/get-sql.md @@ -0,0 +1,27 @@ +--- +title: GET +--- + +Returns an element from an array by index (1-based). + +## Syntax + +```sql +GET( , ) +``` + +## Aliases + +- [ARRAY_GET](/tidb-cloud-lake/sql/array-get.md) + +## Examples + +```sql +SELECT GET([1, 2], 2), ARRAY_GET([1, 2], 2); + +┌───────────────────────────────────────┐ +│ get([1, 2], 2) │ array_get([1, 2], 2) │ +├────────────────┼──────────────────────┤ +│ 2 │ 2 │ +└───────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/get.md b/tidb-cloud-lake/sql/get.md new file mode 100644 index 0000000000000..88a12ac60fb57 --- /dev/null +++ b/tidb-cloud-lake/sql/get.md @@ -0,0 +1,54 @@ +--- +title: GET +title_includes: GET_STRING +--- + +Extracts value from a `Variant` that contains `ARRAY` by `index`, or a `Variant` that contains `OBJECT` by `field_name`. +The value is returned as a `Variant` or `NULL` if either of the arguments is `NULL`. + +`GET` applies case-sensitive matching to `field_name`. For case-insensitive matching, use `GET_IGNORE_CASE`. 
+
+## Syntax
+
+```sql
+GET( <variant>, <index> )
+
+GET( <variant>, <field_name> )
+```
+
+## Arguments
+
+| Arguments      | Description                                                            |
+|----------------|------------------------------------------------------------------------|
+| `<variant>`    | The VARIANT value that contains either an ARRAY or an OBJECT           |
+| `<index>`      | The UInt32 value that specifies the position of the value in ARRAY     |
+| `<field_name>` | The String value that specifies the key in a key-value pair of OBJECT  |
+
+## Return Type
+
+VARIANT
+
+## Examples
+
+```sql
+SELECT get(parse_json('[2.71, 3.14]'), 0);
++------------------------------------+
+| get(parse_json('[2.71, 3.14]'), 0) |
++------------------------------------+
+| 2.71                               |
++------------------------------------+
+
+SELECT get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa');
++---------------------------------------------------+
+| get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa') |
++---------------------------------------------------+
+| 1                                                 |
++---------------------------------------------------+
+
+SELECT get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA');
++---------------------------------------------------+
+| get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA') |
++---------------------------------------------------+
+| NULL                                              |
++---------------------------------------------------+
+```
diff --git a/tidb-cloud-lake/sql/glob.md b/tidb-cloud-lake/sql/glob.md
new file mode 100644
index 0000000000000..8d8a70f75682f
--- /dev/null
+++ b/tidb-cloud-lake/sql/glob.md
@@ -0,0 +1,40 @@
+---
+title: GLOB
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Performs case-sensitive pattern matching using wildcard characters:
+
+- `?` matches any single character.
+- `*` matches zero or more characters.
+
+## Syntax
+
+```sql
+GLOB(<string>, <pattern>)
+```
+
+## Return Type
+
+Returns BOOLEAN: `true` if the input string matches the pattern, `false` otherwise.
+
+## Examples
+
+```sql
+SELECT
+  GLOB('abc', 'a?c'),
+  GLOB('abc', 'a*d'),
+  GLOB('abc', 'abc'),
+  GLOB('abc', 'abcd'),
+  GLOB('abcdef', 'a?c*'),
+  GLOB('hello', 'h*l');
+
+┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ glob('abc', 'a?c') │ glob('abc', 'a*d') │ glob('abc', 'abc') │ glob('abc', 'abcd') │ glob('abcdef', 'a?c*') │ glob('hello', 'h*l') │
+├────────────────────┼────────────────────┼────────────────────┼─────────────────────┼────────────────────────┼──────────────────────┤
+│ true               │ false              │ true               │ false               │ true                   │ false                │
+└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/grant.md b/tidb-cloud-lake/sql/grant.md
new file mode 100644
index 0000000000000..ca6b40ae2a9f3
--- /dev/null
+++ b/tidb-cloud-lake/sql/grant.md
@@ -0,0 +1,319 @@
+---
+title: GRANT
+sidebar_position: 9
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Grants privileges, roles, and ownership for a specific database object. This includes:
+
+- Granting privileges to roles.
+- Assigning roles to users or other roles.
+- Transferring ownership to a role.
+
+See also:
+
+- [REVOKE](/tidb-cloud-lake/sql/revoke.md)
+- [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md)
+
+> After changing privileges or roles with `GRANT`, run [SYSTEM FLUSH PRIVILEGES](/tidb-cloud-lake/guides/privileges.md) to broadcast the updates to every query node immediately.
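+
+For example, a typical sequence looks like this (the role, database, and privilege names here are illustrative):
+
+```sql
+-- Grant a privilege to a role, then flush so every query node sees the change right away
+GRANT SELECT ON mydb.* TO ROLE analyst;
+SYSTEM FLUSH PRIVILEGES;
+```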
+
+## Syntax
+
+### Granting Privileges
+
+To understand what a privilege is and how it works, see [Privileges](/tidb-cloud-lake/guides/privileges.md).
+
+:::note Important
+CREATE-like privileges that create ownership objects cannot be granted directly to a user. These privileges must be granted to a role first, and then the role can be assigned to users. This includes:
+- CREATE
+- CREATE DATABASE
+- CREATE WAREHOUSE
+- CREATE CONNECTION
+- CREATE SEQUENCE
+- CREATE PROCEDURE
+- CREATE MASKING POLICY
+- CREATE ROW ACCESS POLICY
+
+Since `ALL` includes these CREATE privileges, `GRANT ALL ... TO USER` will also fail. For example, `GRANT ALL ON *.* TO USER u1` or `GRANT CREATE DATABASE ON *.* TO USER u1` will fail. Instead, use:
+```sql
+GRANT ALL ON *.* TO ROLE r1;
+GRANT ROLE r1 TO USER u1;
+```
+:::
+
+```sql
+GRANT {
+        schemaObjectPrivileges | ALL [ PRIVILEGES ] ON <privileges_level>
+      }
+TO ROLE <role_name>
+```
+
+Where:
+
+```sql
+schemaObjectPrivileges ::=
+-- For TABLE
+  { SELECT | INSERT }
+
+-- For SCHEMA
+  { CREATE | DROP | ALTER }
+
+-- For USER
+  { CREATE USER }
+
+-- For ROLE
+  { CREATE ROLE }
+
+-- For STAGE
+  { READ, WRITE }
+
+-- For UDF
+  { USAGE }
+
+-- For MASKING POLICY (account-level privileges)
+  { CREATE MASKING POLICY | APPLY MASKING POLICY }
+
+-- For ROW ACCESS POLICY (account-level privileges)
+  { CREATE ROW ACCESS POLICY | APPLY ROW ACCESS POLICY }
+```
+
+```sql
+privileges_level ::=
+    *.*
+  | db_name.*
+  | db_name.tbl_name
+  | STAGE <stage_name>
+  | UDF <udf_name>
+  | MASKING POLICY <policy_name>
+  | ROW ACCESS POLICY <policy_name>
+```
+
+### Granting Masking Policy Privileges
+
+Use the following forms to manage access to individual masking policies:
+
+```sql
+GRANT APPLY ON MASKING POLICY <policy_name> TO ROLE <role_name>
+GRANT ALL [ PRIVILEGES ] ON MASKING POLICY <policy_name> TO ROLE <role_name>
+GRANT OWNERSHIP ON MASKING POLICY <policy_name> TO ROLE '<role_name>'
+```
+
+- `CREATE MASKING POLICY` allows a role to create new masking policies.
+- `APPLY MASKING POLICY` lets grantees attach, detach, describe, or drop any masking policy when combined with the appropriate `ALTER TABLE` or policy commands.
+- `GRANT APPLY ON MASKING POLICY ...` authorizes the grantee to manage a specific masking policy without granting global access.
+- OWNERSHIP provides full control over the masking policy; Databend automatically grants OWNERSHIP on a new policy to the creator role and revokes it when the policy is dropped.
+
+### Granting Row Access Policy Privileges
+
+Use these forms to manage access to individual row access policies:
+
+```sql
+GRANT APPLY ON ROW ACCESS POLICY <policy_name> TO ROLE <role_name>
+GRANT ALL [ PRIVILEGES ] ON ROW ACCESS POLICY <policy_name> TO ROLE <role_name>
+GRANT OWNERSHIP ON ROW ACCESS POLICY <policy_name> TO ROLE '<role_name>'
+```
+
+- `CREATE ROW ACCESS POLICY` allows a role to create new row access policies.
+- `APPLY ROW ACCESS POLICY` authorizes attaching or detaching any row access policy from tables, along with DESCRIBE/DROP commands.
+- `GRANT APPLY ON ROW ACCESS POLICY ...` limits access to a specific row access policy.
+- OWNERSHIP delivers full control over the row access policy; the creator role receives OWNERSHIP automatically and loses it when the policy is dropped.
+
+### Granting Role
+
+To understand what a role is and how it works, see [Roles](/tidb-cloud-lake/guides/roles.md).
+
+```sql
+-- Grant a role to a user
+GRANT ROLE <role_name> TO <user_name>
+
+-- Grant a role to a role
+GRANT ROLE <role_name> TO ROLE <role_name>
+```
+
+### Granting Ownership
+
+To understand what ownership is and how it works, see [Ownership](/tidb-cloud-lake/guides/ownership.md).
+
+```sql
+-- Grant ownership of a specific table within a database to a role
+GRANT OWNERSHIP ON <database_name>.<table_name> 
TO ROLE '' + +-- Grant ownership of a stage to a role +GRANT OWNERSHIP ON STAGE TO ROLE '' + +-- Grant ownership of a user-defined function (UDF) to a role +GRANT OWNERSHIP ON UDF TO ROLE '' +``` + +## Examples + +### Example 1: Granting Privileges to a Role + +Create a role: +```sql +CREATE ROLE user1_role; +``` + +Grant the `ALL` privilege on all existing tables in the `default` database to the role `user1_role`: + +```sql +GRANT ALL ON default.* TO ROLE user1_role; +``` + +```sql +SHOW GRANTS FOR ROLE user1_role; ++--------------------------------------------------+ +| Grants | ++--------------------------------------------------+ +| GRANT ALL ON 'default'.* TO ROLE 'user1_role' | ++--------------------------------------------------+ +``` + +Grant the `ALL` privilege on all databases to the role `user1_role`: + +```sql +GRANT ALL ON *.* TO ROLE user1_role; +``` +```sql +SHOW GRANTS FOR ROLE user1_role; ++--------------------------------------------------+ +| Grants | ++--------------------------------------------------+ +| GRANT ALL ON 'default'.* TO ROLE 'user1_role' | +| GRANT ALL ON *.* TO ROLE 'user1_role' | ++--------------------------------------------------+ +``` + +Grant the `ALL` privilege on the stage named `s1` to the role `user1_role`: + +```sql +GRANT ALL ON STAGE s1 TO ROLE user1_role; +``` +```sql +SHOW GRANTS FOR ROLE user1_role; ++--------------------------------------------------+ +| Grants | ++--------------------------------------------------+ +| GRANT ALL ON STAGE s1 TO ROLE 'user1_role' | ++--------------------------------------------------+ +``` + +Grant the `ALL` privilege on the UDF named `f1` to the role `user1_role`: + +```sql +GRANT ALL ON UDF f1 TO ROLE user1_role; +``` +```sql +SHOW GRANTS FOR ROLE user1_role; ++--------------------------------------------------+ +| Grants | ++--------------------------------------------------+ +| GRANT ALL ON UDF f1 TO ROLE 'user1_role' | ++--------------------------------------------------+ +``` + +### Example 2: Granting Specific Privileges to a Role + +Grant the `SELECT` privilege on all existing tables in the `mydb` database to the role `role1`: + +Create role: +```sql +CREATE ROLE role1; +``` + +Grant privileges to the role: +```sql +GRANT SELECT ON mydb.* TO ROLE role1; +``` + +Show the grants for the role: +```sql +SHOW GRANTS FOR ROLE role1; ++-------------------------------------+ +| Grants | ++-------------------------------------+ +| GRANT SELECT ON 'mydb'.* TO 'role1' | ++-------------------------------------+ +``` + +### Example 3: Granting a Role to a User + +Create a user: +```sql +CREATE USER user1 IDENTIFIED BY 'abc123' WITH DEFAULT_ROLE = 'role1'; +``` + +Role `role1` grants are: +```sql +SHOW GRANTS FOR ROLE role1; ++-------------------------------------+ +| Grants | ++-------------------------------------+ +| GRANT SELECT ON 'mydb'.* TO 'role1' | ++-------------------------------------+ +``` + +Grant role `role1` to user `user1`: +```sql + GRANT ROLE role1 TO user1; +``` + +Now, user `user1` grants are: +```sql +SHOW GRANTS FOR user1; ++-------------------------------------+ +| Grants | ++-------------------------------------+ +| GRANT ROLE role1 TO 'user1'@'%' | ++-------------------------------------+ +``` + +### Example 4: Granting Ownership to a Role + +```sql +-- Grant ownership of all tables in the 'finance_data' database to the role 'data_owner' +GRANT OWNERSHIP ON finance_data.* TO ROLE 'data_owner'; + +-- Grant ownership of the table 'transactions' in the 'finance_data' schema to the role 
'data_owner' +GRANT OWNERSHIP ON finance_data.transactions TO ROLE 'data_owner'; + +-- Grant ownership of the stage 'ingestion_stage' to the role 'data_owner' +GRANT OWNERSHIP ON STAGE ingestion_stage TO ROLE 'data_owner'; + +-- Grant ownership of the user-defined function 'calculate_profit' to the role 'data_owner' +GRANT OWNERSHIP ON UDF calculate_profit TO ROLE 'data_owner'; +``` + +### Example 5: Granting Masking Policy Privileges + +```sql +-- Allow the current user to create masking policies +GRANT CREATE MASKING POLICY ON *.* TO ROLE security_admin; + +-- Create a masking policy while assuming the security_admin role +CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING -> '***'; + +-- Grant a role the ability to apply the policy when altering tables +GRANT APPLY ON MASKING POLICY email_mask TO ROLE pii_readers; + +-- Review the masking policy privileges +SHOW GRANTS ON MASKING POLICY email_mask; +``` + +### Example 6: Granting Row Access Policy Privileges + +```sql +-- Allow the current role to create row access policies +GRANT CREATE ROW ACCESS POLICY ON *.* TO ROLE row_policy_admin; + +-- Define a row access policy while assuming the row_policy_admin role +CREATE ROW ACCESS POLICY rap_region AS (region STRING) RETURNS BOOLEAN -> region = 'APAC'; + +-- Allow a role to apply the policy when altering tables +GRANT APPLY ON ROW ACCESS POLICY rap_region TO ROLE apac_only; + +-- Review the row access policy privileges +SHOW GRANTS ON ROW ACCESS POLICY rap_region; +``` diff --git a/tidb-cloud-lake/sql/greatest-ignore-nulls.md b/tidb-cloud-lake/sql/greatest-ignore-nulls.md new file mode 100644 index 0000000000000..415fd686361bd --- /dev/null +++ b/tidb-cloud-lake/sql/greatest-ignore-nulls.md @@ -0,0 +1,30 @@ +--- +title: GREATEST_IGNORE_NULLS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the maximum value from a set of values, ignoring any NULL values. + +See also: [GREATEST](/tidb-cloud-lake/sql/greatest.md) + +## Syntax + +```sql +GREATEST_IGNORE_NULLS(, ...) +``` + +## Examples + +```sql +SELECT GREATEST_IGNORE_NULLS(5, 9, 4), GREATEST_IGNORE_NULLS(5, 9, null); +``` + +```sql +┌────────────────────────────────────────────────────────────────────┐ +│ greatest_ignore_nulls(5, 9, 4) │ greatest_ignore_nulls(5, 9, NULL) │ +├────────────────────────────────┼───────────────────────────────────┤ +│ 9 │ 9 │ +└────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/greatest.md b/tidb-cloud-lake/sql/greatest.md new file mode 100644 index 0000000000000..f4c3cc048d956 --- /dev/null +++ b/tidb-cloud-lake/sql/greatest.md @@ -0,0 +1,30 @@ +--- +title: GREATEST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the maximum value from a set of values. If any value in the set is `NULL`, the function returns `NULL`. + +See also: [GREATEST_IGNORE_NULLS](/tidb-cloud-lake/sql/greatest-ignore-nulls.md) + +## Syntax + +```sql +GREATEST(, ...) 
+``` + +## Examples + +```sql +SELECT GREATEST(5, 9, 4), GREATEST(5, 9, null); +``` + +```sql +┌──────────────────────────────────────────┐ +│ greatest(5, 9, 4) │ greatest(5, 9, NULL) │ +├───────────────────┼──────────────────────┤ +│ 9 │ NULL │ +└──────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/group-array-moving-avg.md b/tidb-cloud-lake/sql/group-array-moving-avg.md new file mode 100644 index 0000000000000..340d9e7854092 --- /dev/null +++ b/tidb-cloud-lake/sql/group-array-moving-avg.md @@ -0,0 +1,55 @@ +--- +title: GROUP_ARRAY_MOVING_AVG +--- + +The GROUP_ARRAY_MOVING_AVG function calculates the moving average of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values. + +## Syntax + +```sql +GROUP_ARRAY_MOVING_AVG() + +GROUP_ARRAY_MOVING_AVG()() +``` + +## Arguments + +| Arguments | Description | +|------------------| ------------------------ | +| `` | Any numerical expression | +| `` | Any numerical expression | + +## Return Type + +Returns an [Array](/tidb-cloud-lake/sql/array.md) with elements of double or decimal depending on the source data type. + +## Examples + +```sql +-- Create a table and insert sample data +CREATE TABLE hits ( + user_id INT, + request_num INT +); + +INSERT INTO hits (user_id, request_num) +VALUES (1, 10), + (2, 15), + (3, 20), + (1, 13), + (2, 21), + (3, 25), + (1, 30), + (2, 41), + (3, 45); + +SELECT user_id, GROUP_ARRAY_MOVING_AVG(2)(request_num) AS avg_request_num +FROM hits +GROUP BY user_id; + +| user_id | avg_request_num | +|---------|------------------| +| 1 | [5.0,11.5,21.5] | +| 3 | [10.0,22.5,35.0] | +| 2 | [7.5,18.0,31.0] | +``` diff --git a/tidb-cloud-lake/sql/group-array-moving-sum.md b/tidb-cloud-lake/sql/group-array-moving-sum.md new file mode 100644 index 0000000000000..64b41e1872ec9 --- /dev/null +++ b/tidb-cloud-lake/sql/group-array-moving-sum.md @@ -0,0 +1,55 @@ +--- +title: GROUP_ARRAY_MOVING_SUM +--- + +The GROUP_ARRAY_MOVING_SUM function calculates the moving sum of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values. + +## Syntax + +```sql +GROUP_ARRAY_MOVING_SUM() + +GROUP_ARRAY_MOVING_SUM()() +``` + +## Arguments + +| Arguments | Description | +|------------------| ------------------------ | +| `` | Any numerical expression | +| `` | Any numerical expression | + +## Return Type + +Returns an [Array](/tidb-cloud-lake/sql/array.md) with elements that are of the same type as the original data. + +## Examples + +```sql +-- Create a table and insert sample data +CREATE TABLE hits ( + user_id INT, + request_num INT +); + +INSERT INTO hits (user_id, request_num) +VALUES (1, 10), + (2, 15), + (3, 20), + (1, 13), + (2, 21), + (3, 25), + (1, 30), + (2, 41), + (3, 45); + +SELECT user_id, GROUP_ARRAY_MOVING_SUM(2)(request_num) AS request_num +FROM hits +GROUP BY user_id; + +| user_id | request_num | +|---------|-------------| +| 1 | [10,23,43] | +| 2 | [20,45,70] | +| 3 | [15,36,62] | +``` diff --git a/tidb-cloud-lake/sql/group-by.md b/tidb-cloud-lake/sql/group-by.md new file mode 100644 index 0000000000000..1e95d1ec69fc3 --- /dev/null +++ b/tidb-cloud-lake/sql/group-by.md @@ -0,0 +1,5 @@ +--- +title: GROUP BY +--- + +Databend supports GROUP BY with a variety of extensions. 
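+
+A minimal sketch of what those extensions look like in practice (the `sales` table and its columns are illustrative; see the pages in this section, such as [GROUPING](/tidb-cloud-lake/sql/grouping.md), for details):
+
+```sql
+-- Plain GROUP BY
+SELECT region, product, SUM(amount) AS total
+FROM sales
+GROUP BY region, product;
+
+-- GROUPING SETS: aggregate by the exact combinations you list
+SELECT region, product, SUM(amount) AS total
+FROM sales
+GROUP BY GROUPING SETS ((region, product), (region), ());
+
+-- ROLLUP: adds subtotal and grand-total rows for the listed columns
+SELECT region, product, SUM(amount) AS total
+FROM sales
+GROUP BY ROLLUP (region, product);
+```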
\ No newline at end of file diff --git a/tidb-cloud-lake/sql/group-concat.md b/tidb-cloud-lake/sql/group-concat.md new file mode 100644 index 0000000000000..8322601c93487 --- /dev/null +++ b/tidb-cloud-lake/sql/group-concat.md @@ -0,0 +1,5 @@ +--- +title: GROUP_CONCAT +--- + +Alias for [LISTAGG](/tidb-cloud-lake/sql/listagg.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/grouping.md b/tidb-cloud-lake/sql/grouping.md new file mode 100644 index 0000000000000..951a455baef6d --- /dev/null +++ b/tidb-cloud-lake/sql/grouping.md @@ -0,0 +1,42 @@ +--- +title: GROUPING +--- + +Returns a bit mask indicating which `GROUP BY` expressions are not included in the current grouping set. Bits are assigned with the rightmost argument corresponding to the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the current result row, and 1 if it is not included. + +## Syntax + +```sql +GROUPING ( expr [, expr, ...] ) +``` + +:::note +`GROUPING` can only be used with `GROUPING SETS`, `ROLLUP`, or `CUBE`, and its arguments must be in the grouping sets list. +::: + +## Arguments + +Grouping sets items. + +## Return Type + +UInt32. + +## Examples + +```sql +select a, b, grouping(a), grouping(b), grouping(a,b), grouping(b,a) from t group by grouping sets ((a,b),(a),(b), ()) ; ++------+------+-------------+-------------+----------------+----------------+ +| a | b | grouping(a) | grouping(b) | grouping(a, b) | grouping(b, a) | ++------+------+-------------+-------------+----------------+----------------+ +| NULL | A | 1 | 0 | 2 | 1 | +| a | NULL | 0 | 1 | 1 | 2 | +| b | A | 0 | 0 | 0 | 0 | +| NULL | NULL | 1 | 1 | 3 | 3 | +| a | A | 0 | 0 | 0 | 0 | +| b | B | 0 | 0 | 0 | 0 | +| b | NULL | 0 | 1 | 1 | 2 | +| a | B | 0 | 0 | 0 | 0 | +| NULL | B | 1 | 0 | 2 | 1 | ++------+------+-------------+-------------+----------------+----------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-cell-area-m2.md b/tidb-cloud-lake/sql/h3-cell-area-m2.md new file mode 100644 index 0000000000000..4935e5fc14d07 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-cell-area-m2.md @@ -0,0 +1,23 @@ +--- +title: H3_CELL_AREA_M2 +--- + +Returns the exact area of specific cell in square meters. + +## Syntax + +```sql +H3_CELL_AREA_M2(h3) +``` + +## Examples + +```sql +SELECT H3_CELL_AREA_M2(599119489002373119); + +┌─────────────────────────────────────┐ +│ h3_cell_area_m2(599119489002373119) │ +├─────────────────────────────────────┤ +│ 127785582.60809991 │ +└─────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-cell-area-rads2.md b/tidb-cloud-lake/sql/h3-cell-area-rads2.md new file mode 100644 index 0000000000000..590208dc19ed8 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-cell-area-rads2.md @@ -0,0 +1,23 @@ +--- +title: H3_CELL_AREA_RADS2 +--- + +Returns the exact area of specific cell in square radians. 
+ +## Syntax + +```sql +H3_CELL_AREA_RADS2(h3) +``` + +## Examples + +```sql +SELECT H3_CELL_AREA_RADS2(599119489002373119); + +┌────────────────────────────────────────┐ +│ h3_cell_area_rads2(599119489002373119) │ +├────────────────────────────────────────┤ +│ 0.000003148224310427697 │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-distance.md b/tidb-cloud-lake/sql/h3-distance.md new file mode 100644 index 0000000000000..acd730c90cacc --- /dev/null +++ b/tidb-cloud-lake/sql/h3-distance.md @@ -0,0 +1,23 @@ +--- +title: H3_DISTANCE +--- + +Returns the grid distance between the the given two [H3](https://eng.uber.com/h3/) indexes. + +## Syntax + +```sql +H3_DISTANCE(h3, a_h3) +``` + +## Examples + +```sql +SELECT H3_DISTANCE(599119489002373119, 599119491149856767); + +┌─────────────────────────────────────────────────────┐ +│ h3_distance(599119489002373119, 599119491149856767) │ +├─────────────────────────────────────────────────────┤ +│ 1 │ +└─────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-edge-angle.md b/tidb-cloud-lake/sql/h3-edge-angle.md new file mode 100644 index 0000000000000..197b6c3bc5d42 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-edge-angle.md @@ -0,0 +1,23 @@ +--- +title: H3_EDGE_ANGLE +--- + +Returns the average length of the H3 hexagon edge in grades. + +## Syntax + +```sql +H3_EDGE_ANGLE(res) +``` + +## Examples + +```sql +SELECT H3_EDGE_ANGLE(10); + +┌───────────────────────┐ +│ h3_edge_angle(10) │ +├───────────────────────┤ +│ 0.0006822586214153981 │ +└───────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-edge-length-km.md b/tidb-cloud-lake/sql/h3-edge-length-km.md new file mode 100644 index 0000000000000..33fe4d5a2e6d3 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-edge-length-km.md @@ -0,0 +1,23 @@ +--- +title: H3_EDGE_LENGTH_KM +--- + +Returns the average hexagon edge length in kilometers at the given resolution. Excludes pentagons. + +## Syntax + +```sql +H3_EDGE_LENGTH_KM(res) +``` + +## Examples + +```sql +SELECT H3_EDGE_LENGTH_KM(1); + +┌──────────────────────┐ +│ h3_edge_length_km(1) │ +├──────────────────────┤ +│ 483.0568390711111 │ +└──────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-edge-length-m.md b/tidb-cloud-lake/sql/h3-edge-length-m.md new file mode 100644 index 0000000000000..da5a8b8fc0063 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-edge-length-m.md @@ -0,0 +1,21 @@ +--- +title: H3_EDGE_LENGTH_M +--- + +Returns the average hexagon edge length in meters at the given resolution. Excludes pentagons. + +## Syntax + +```sql +H3_EDGE_LENGTH_M(1) +``` + +## Examples + +```sql +┌─────────────────────┐ +│ h3_edge_length_m(1) │ +├─────────────────────┤ +│ 483056.8390711111 │ +└─────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-exact-edge-length-km.md b/tidb-cloud-lake/sql/h3-exact-edge-length-km.md new file mode 100644 index 0000000000000..9de9b42cef6c2 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-exact-edge-length-km.md @@ -0,0 +1,23 @@ +--- +title: H3_EXACT_EDGE_LENGTH_KM +--- + +Computes the length of this directed edge, in kilometers. 
+ +## Syntax + +```sql +H3_EXACT_EDGE_LENGTH_KM(h3) +``` + +## Examples + +```sql +SELECT H3_EXACT_EDGE_LENGTH_KM(1319695429381652479); + +┌──────────────────────────────────────────────┐ +│ h3_exact_edge_length_km(1319695429381652479) │ +├──────────────────────────────────────────────┤ +│ 8.267326832647143 │ +└──────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-exact-edge-length-m.md b/tidb-cloud-lake/sql/h3-exact-edge-length-m.md new file mode 100644 index 0000000000000..0624a225138a7 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-exact-edge-length-m.md @@ -0,0 +1,23 @@ +--- +title: H3_EXACT_EDGE_LENGTH_M +--- + +Computes the length of this directed edge, in meters. + +## Syntax + +```sql +H3_EXACT_EDGE_LENGTH_M(h3) +``` + +## Examples + +```sql +SELECT H3_EXACT_EDGE_LENGTH_M(1319695429381652479); + +┌─────────────────────────────────────────────┐ +│ h3_exact_edge_length_m(1319695429381652479) │ +├─────────────────────────────────────────────┤ +│ 8267.326832647143 │ +└─────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-exact-edge-length-rads.md b/tidb-cloud-lake/sql/h3-exact-edge-length-rads.md new file mode 100644 index 0000000000000..221338f20a4ba --- /dev/null +++ b/tidb-cloud-lake/sql/h3-exact-edge-length-rads.md @@ -0,0 +1,23 @@ +--- +title: H3_EXACT_EDGE_LENGTH_RADS +--- + +Computes the length of this directed edge, in radians. + +## Syntax + +```sql +H3_EXACT_EDGE_LENGTH_RADS(h3) +``` + +## Examples + +```sql +SELECT H3_EXACT_EDGE_LENGTH_KM(1319695429381652479); + +┌──────────────────────────────────────────────┐ +│ h3_exact_edge_length_km(1319695429381652479) │ +├──────────────────────────────────────────────┤ +│ 8.267326832647143 │ +└──────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-base-cell.md b/tidb-cloud-lake/sql/h3-get-base-cell.md new file mode 100644 index 0000000000000..59c35d82eb472 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-base-cell.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_BASE_CELL +--- + +Returns the base cell number of the given [H3](https://eng.uber.com/h3/) index. + +## Syntax + +```sql +H3_GET_BASE_CELL(h3) +``` + +## Examples + +```sql +SELECT H3_GET_BASE_CELL(644325524701193974); + +┌──────────────────────────────────────┐ +│ h3_get_base_cell(644325524701193974) │ +├──────────────────────────────────────┤ +│ 8 │ +└──────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-destination-index-unidirectional-edge.md b/tidb-cloud-lake/sql/h3-get-destination-index-unidirectional-edge.md new file mode 100644 index 0000000000000..337afd088df8d --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-destination-index-unidirectional-edge.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE +--- + +Returns the destination hexagon index from the unidirectional edge H3Index. 
+ +## Syntax + +```sql +H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) +``` + +## Examples + +```sql +SELECT H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); + +┌────────────────────────────────────────────────────────────────────────┐ +│ h3_get_destination_index_from_unidirectional_edge(1248204388774707199) │ +├────────────────────────────────────────────────────────────────────────┤ +│ 599686043507097599 │ +└────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-faces.md b/tidb-cloud-lake/sql/h3-get-faces.md new file mode 100644 index 0000000000000..7751eae781db3 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-faces.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_FACES +--- + +Finds all icosahedron faces intersected by the given [H3](https://eng.uber.com/h3/) index. Faces are represented as integers from 0-19. + +## Syntax + +```sql +H3_GET_FACES(h3) +``` + +## Examples + +```sql +SELECT H3_GET_FACES(599119489002373119); + +┌──────────────────────────────────┐ +│ h3_get_faces(599119489002373119) │ +├──────────────────────────────────┤ +│ [0,1,2,3,4] │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-indexes-unidirectional-edge.md b/tidb-cloud-lake/sql/h3-get-indexes-unidirectional-edge.md new file mode 100644 index 0000000000000..30985f6cf9fcd --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-indexes-unidirectional-edge.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE +--- + +Returns the origin and destination hexagon indexes from the given unidirectional edge H3Index. + +## Syntax + +```sql +H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3) +``` + +## Examples + +```sql +SELECT H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); + +┌──────────────────────────────────────────────────────────────┐ +│ h3_get_indexes_from_unidirectional_edge(1248204388774707199) │ +├──────────────────────────────────────────────────────────────┤ +│ (599686042433355775,599686043507097599) │ +└──────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-origin-index-unidirectional-edge.md b/tidb-cloud-lake/sql/h3-get-origin-index-unidirectional-edge.md new file mode 100644 index 0000000000000..b4b1210524346 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-origin-index-unidirectional-edge.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE +--- + +Returns the origin hexagon index from the unidirectional edge H3Index. + +## Syntax + +```sql +H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) +``` + +## Examples + +```sql +SELECT H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); + +┌───────────────────────────────────────────────────────────────────┐ +│ h3_get_origin_index_from_unidirectional_edge(1248204388774707199) │ +├───────────────────────────────────────────────────────────────────┤ +│ 599686042433355775 │ +└───────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-resolution.md b/tidb-cloud-lake/sql/h3-get-resolution.md new file mode 100644 index 0000000000000..1428ad494210f --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-resolution.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_RESOLUTION +--- + +Returns the resolution of the given [H3](https://eng.uber.com/h3/) index. 
+ +## Syntax + +```sql +H3_GET_RESOLUTION(h3) +``` + +## Examples + +```sql +SELECT H3_GET_RESOLUTION(644325524701193974); + +┌───────────────────────────────────────┐ +│ h3_get_resolution(644325524701193974) │ +├───────────────────────────────────────┤ +│ 15 │ +└───────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-unidirectional-edge-boundary.md b/tidb-cloud-lake/sql/h3-get-unidirectional-edge-boundary.md new file mode 100644 index 0000000000000..06c9cdef4f139 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-unidirectional-edge-boundary.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY +--- + +Returns the coordinates defining the unidirectional edge. + +## Syntax + +```sql +H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3) +``` + +## Examples + +```sql +SELECT H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(1248204388774707199); + +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ h3_get_unidirectional_edge_boundary(1248204388774707199) │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ [(37.42012867767778,-122.03773496427027),(37.33755608435298,-122.090428929044)] │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-unidirectional-edge.md b/tidb-cloud-lake/sql/h3-get-unidirectional-edge.md new file mode 100644 index 0000000000000..24f61a66d5c6b --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-unidirectional-edge.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_UNIDIRECTIONAL_EDGE +--- + +Returns the edge between the given two [H3](https://eng.uber.com/h3/) indexes. + +## Syntax + +```sql +H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3) +``` + +## Examples + +```sql +SELECT H3_GET_UNIDIRECTIONAL_EDGE(644325524701193897, 644325524701193754); + +┌────────────────────────────────────────────────────────────────────┐ +│ h3_get_unidirectional_edge(644325524701193897, 644325524701193754) │ +├────────────────────────────────────────────────────────────────────┤ +│ 1581074247194257065 │ +└────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-get-unidirectional-edges-hexagon.md b/tidb-cloud-lake/sql/h3-get-unidirectional-edges-hexagon.md new file mode 100644 index 0000000000000..fa257541e2908 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-get-unidirectional-edges-hexagon.md @@ -0,0 +1,23 @@ +--- +title: H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON +--- + +Returns all of the unidirectional edges from the provided H3Index. 
+ +## Syntax + +```sql +H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3) +``` + +## Examples + +```sql +SELECT H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(644325524701193754); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ h3_get_unidirectional_edges_from_hexagon(644325524701193754) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [1292843871042545178,1364901465080473114,1436959059118401050,1509016653156328986,1581074247194256922,1653131841232184858] │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-hex-area-km2.md b/tidb-cloud-lake/sql/h3-hex-area-km2.md new file mode 100644 index 0000000000000..9a28511111b0a --- /dev/null +++ b/tidb-cloud-lake/sql/h3-hex-area-km2.md @@ -0,0 +1,23 @@ +--- +title: H3_HEX_AREA_KM2 +--- + +Returns the average hexagon area in square kilometers at the given resolution. Excludes pentagons. + +## Syntax + +```sql +H3_HEX_AREA_KM2(res) +``` + +## Examples + +```sql +SELECT H3_HEX_AREA_KM2(1); + +┌────────────────────┐ +│ h3_hex_area_km2(1) │ +├────────────────────┤ +│ 609788.4417941332 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-hex-area-m2.md b/tidb-cloud-lake/sql/h3-hex-area-m2.md new file mode 100644 index 0000000000000..f02f58d256e09 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-hex-area-m2.md @@ -0,0 +1,23 @@ +--- +title: H3_HEX_AREA_M2 +--- + +Returns the average hexagon area in square meters at the given resolution. Excludes pentagons. + +## Syntax + +```sql +H3_HEX_AREA_M2(res) +``` + +## Examples + +```sql +SELECT H3_HEX_AREA_M2(1); + +┌───────────────────┐ +│ h3_hex_area_m2(1) │ +├───────────────────┤ +│ 609788441794.1339 │ +└───────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-hex-ring.md b/tidb-cloud-lake/sql/h3-hex-ring.md new file mode 100644 index 0000000000000..048545cf4a06e --- /dev/null +++ b/tidb-cloud-lake/sql/h3-hex-ring.md @@ -0,0 +1,23 @@ +--- +title: H3_HEX_RING +--- + +Returns the "hollow" ring of hexagons at exactly grid distance `k` from the given [H3](https://eng.uber.com/h3/) index. 
+ +## Syntax + +```sql +H3_HEX_RING(h3, k) +``` + +## Examples + +```sql +SELECT H3_HEX_RING(599686042433355775, 2); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ h3_hex_ring(599686042433355775, 2) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [599686018811035647,599686034917163007,599686029548453887,599686032769679359,599686198125920255,599686040285872127,599686041359613951,599686039212130303,599686023106002943,599686027400970239,599686013442326527,599686012368584703] │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-indexes-are-neighbors.md b/tidb-cloud-lake/sql/h3-indexes-are-neighbors.md new file mode 100644 index 0000000000000..1b12966294057 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-indexes-are-neighbors.md @@ -0,0 +1,23 @@ +--- +title: H3_INDEXES_ARE_NEIGHBORS +--- + +Returns whether or not the provided [H3](https://eng.uber.com/h3/) indexes are neighbors. + +## Syntax + +```sql +H3_INDEXES_ARE_NEIGHBORS(h3, a_h3) +``` + +## Examples + +```sql +SELECT H3_INDEXES_ARE_NEIGHBORS(644325524701193974, 644325524701193897); + +┌──────────────────────────────────────────────────────────────────┐ +│ h3_indexes_are_neighbors(644325524701193974, 644325524701193897) │ +├──────────────────────────────────────────────────────────────────┤ +│ true │ +└──────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-is-pentagon.md b/tidb-cloud-lake/sql/h3-is-pentagon.md new file mode 100644 index 0000000000000..58ff9c4c50051 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-is-pentagon.md @@ -0,0 +1,23 @@ +--- +title: H3_IS_PENTAGON +--- + +Checks if the given [H3](https://eng.uber.com/h3/) index represents a pentagonal cell. + +## Syntax + +```sql +H3_IS_PENTAGON(h3) +``` + +## Examples + +```sql +SELECT H3_IS_PENTAGON(599119489002373119); + +┌────────────────────────────────────┐ +│ h3_is_pentagon(599119489002373119) │ +├────────────────────────────────────┤ +│ true │ +└────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-is-res-class-iii.md b/tidb-cloud-lake/sql/h3-is-res-class-iii.md new file mode 100644 index 0000000000000..87031ee6978df --- /dev/null +++ b/tidb-cloud-lake/sql/h3-is-res-class-iii.md @@ -0,0 +1,23 @@ +--- +title: H3_IS_RES_CLASS_III +--- + +Checks if the given [H3](https://eng.uber.com/h3/) index has a resolution with Class III orientation. 
+ +## Syntax + +```sql +H3_IS_RES_CLASS_III(h3) +``` + +## Examples + +```sql +SELECT H3_IS_RES_CLASS_III(635318325446452991); + +┌─────────────────────────────────────────┐ +│ h3_is_res_class_iii(635318325446452991) │ +├─────────────────────────────────────────┤ +│ true │ +└─────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-is-valid.md b/tidb-cloud-lake/sql/h3-is-valid.md new file mode 100644 index 0000000000000..daabc1caa6dd9 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-is-valid.md @@ -0,0 +1,23 @@ +--- +title: H3_IS_VALID +--- + +Checks if the given [H3](https://eng.uber.com/h3/) index is valid. + +## Syntax + +```sql +H3_IS_VALID(h3) +``` + +## Examples + +```sql +SELECT H3_IS_VALID(644325524701193974); + +┌─────────────────────────────────┐ +│ h3_is_valid(644325524701193974) │ +├─────────────────────────────────┤ +│ true │ +└─────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-k-ring.md b/tidb-cloud-lake/sql/h3-k-ring.md new file mode 100644 index 0000000000000..aad22e0ce1953 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-k-ring.md @@ -0,0 +1,23 @@ +--- +title: H3_K_RING +--- + +Returns an array containing the [H3](https://eng.uber.com/h3/) indexes of the k-ring hexagons surrounding the input H3 index. Each element in this array is an H3 index. + +## Syntax + +```sql +H3_K_RING(h3, k) +``` + +## Examples + +```sql +SELECT H3_K_RING(644325524701193974, 1); + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ h3_k_ring(644325524701193974, 1) │ +├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [644325524701193974,644325524701193899,644325524701193869,644325524701193970,644325524701193968,644325524701193972,644325524701193897] │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-line.md b/tidb-cloud-lake/sql/h3-line.md new file mode 100644 index 0000000000000..975c89fd6b5f8 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-line.md @@ -0,0 +1,23 @@ +--- +title: H3_LINE +--- + +Returns the line of indexes between the given two [H3](https://eng.uber.com/h3/) indexes. + +## Syntax + +```sql +H3_LINE(h3, a_h3) +``` + +## Examples + +```sql +SELECT H3_LINE(599119489002373119, 599119491149856767); + +┌─────────────────────────────────────────────────┐ +│ h3_line(599119489002373119, 599119491149856767) │ +├─────────────────────────────────────────────────┤ +│ [599119489002373119,599119491149856767] │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-num-hexagons.md b/tidb-cloud-lake/sql/h3-num-hexagons.md new file mode 100644 index 0000000000000..e60747906afc8 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-num-hexagons.md @@ -0,0 +1,23 @@ +--- +title: H3_NUM_HEXAGONS +--- + +Returns the number of unique [H3](https://eng.uber.com/h3/) indexes at the given resolution. 
+ +## Syntax + +```sql +H3_NUM_HEXAGONS(res) +``` + +## Examples + +```sql +SELECT H3_NUM_HEXAGONS(10); + +┌─────────────────────┐ +│ h3_num_hexagons(10) │ +├─────────────────────┤ +│ 33897029882 │ +└─────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-center-child.md b/tidb-cloud-lake/sql/h3-to-center-child.md new file mode 100644 index 0000000000000..8be917d86282d --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-center-child.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_CENTER_CHILD +--- + +Returns the center child index at the specified resolution. + +## Syntax + +```sql +H3_TO_CENTER_CHILD(h3, res) +``` + +## Examples + +```sql +SELECT H3_TO_CENTER_CHILD(599119489002373119, 15); + +┌────────────────────────────────────────────┐ +│ h3_to_center_child(599119489002373119, 15) │ +├────────────────────────────────────────────┤ +│ 644155484202336256 │ +└────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-children.md b/tidb-cloud-lake/sql/h3-to-children.md new file mode 100644 index 0000000000000..df72f6a9df082 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-children.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_CHILDREN +--- + +Returns the indexes contained by `h3` at resolution `child_res`. + +## Syntax + +```sql +H3_TO_CHILDREN(h3, child_res) +``` + +## Examples + +```sql +SELECT H3_TO_CHILDREN(635318325446452991, 14); + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ h3_to_children(635318325446452991, 14) │ +├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [639821925073823431,639821925073823439,639821925073823447,639821925073823455,639821925073823463,639821925073823471,639821925073823479] │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-geo-boundary.md b/tidb-cloud-lake/sql/h3-to-geo-boundary.md new file mode 100644 index 0000000000000..9eb613949e8a1 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-geo-boundary.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_GEO_BOUNDARY +--- + +Returns an array containing the longitude and latitude coordinates of the vertices of the hexagon corresponding to the [H3](https://eng.uber.com/h3/) index. 
+ +## Syntax + +```sql +H3_TO_GEO_BOUNDARY(h3) +``` + +## Examples + +```sql +SELECT H3_TO_GEO_BOUNDARY(644325524701193974); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ h3_to_geo_boundary(644325524701193974) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [(37.79505811173477,55.712900225355526),(37.79506506997187,55.71289713485417),(37.795073126539855,55.71289934095484),(37.795074224871684,55.71290463755745),(37.79506726663349,55.71290772805916),(37.79505921006456,55.712905521957914)] │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-geo.md b/tidb-cloud-lake/sql/h3-to-geo.md new file mode 100644 index 0000000000000..e92ed19da800c --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-geo.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_GEO +--- + +Returns the longitude and latitude corresponding to the given [H3](https://eng.uber.com/h3/) index. + +## Syntax + +```sql +H3_TO_GEO(h3) +``` + +## Examples + +```sql +SELECT H3_TO_GEO(644325524701193974); + +┌────────────────────────────────────────┐ +│ h3_to_geo(644325524701193974) │ +├────────────────────────────────────────┤ +│ (37.79506616830255,55.712902431456676) │ +└────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-parent.md b/tidb-cloud-lake/sql/h3-to-parent.md new file mode 100644 index 0000000000000..57dffdde30678 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-parent.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_PARENT +--- + +Returns the parent index containing the `h3` at resolution `parent_res`. Returning 0 means an error occurred. + +## Syntax + +```sql +H3_TO_PARENT(h3, parent_res) +``` + +## Examples + +```sql +SELECT H3_TO_PARENT(635318325446452991, 12); + +┌──────────────────────────────────────┐ +│ h3_to_parent(635318325446452991, 12) │ +├──────────────────────────────────────┤ +│ 630814725819082751 │ +└──────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-to-string.md b/tidb-cloud-lake/sql/h3-to-string.md new file mode 100644 index 0000000000000..9862d4b57ef1e --- /dev/null +++ b/tidb-cloud-lake/sql/h3-to-string.md @@ -0,0 +1,23 @@ +--- +title: H3_TO_STRING +--- + +Converts the representation of the given [H3](https://eng.uber.com/h3/) index to the string representation. 
+ +## Syntax + +```sql +H3_TO_STRING(h3) +``` + +## Examples + +```sql +SELECT H3_TO_STRING(635318325446452991); + +┌──────────────────────────────────┐ +│ h3_to_string(635318325446452991) │ +├──────────────────────────────────┤ +│ 8d11aa6a38826ff │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md b/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md new file mode 100644 index 0000000000000..fb782d6efde28 --- /dev/null +++ b/tidb-cloud-lake/sql/h3-unidirectional-edge-is-valid.md @@ -0,0 +1,23 @@ +--- +title: H3_UNIDIRECTIONAL_EDGE_IS_VALID +--- + +Determines if the provided H3Index is a valid unidirectional edge index. Returns 1 if it's a unidirectional edge and 0 otherwise. + +## Syntax + +```sql +H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3) +``` + +## Examples + +```sql +SELECT H3_UNIDIRECTIONAL_EDGE_IS_VALID(1248204388774707199); + +┌──────────────────────────────────────────────────────┐ +│ h3_unidirectional_edge_is_valid(1248204388774707199) │ +├──────────────────────────────────────────────────────┤ +│ true │ +└──────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/hash-functions.md b/tidb-cloud-lake/sql/hash-functions.md new file mode 100644 index 0000000000000..a54e753d68b96 --- /dev/null +++ b/tidb-cloud-lake/sql/hash-functions.md @@ -0,0 +1,58 @@ +--- +title: Hash Functions +--- + +This page provides a comprehensive overview of Hash functions in Databend, organized by functionality for easy reference. + +## Cryptographic Hash Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [MD5](/tidb-cloud-lake/sql/md.md) | Calculates an MD5 128-bit checksum | `MD5('1234567890')` → `'e807f1fcf82d132f9bb018ca6738a19f'` | +| [SHA1](/tidb-cloud-lake/sql/sha.md) / [SHA](/tidb-cloud-lake/sql/sha.md) | Calculates an SHA-1 160-bit checksum | `SHA1('1234567890')` → `'01b307acba4f54f55aafc33bb06bbbf6ca803e9a'` | +| [SHA2](/tidb-cloud-lake/sql/sha.md) | Calculates SHA-2 family hash (SHA-224, SHA-256, SHA-384, SHA-512) | `SHA2('1234567890', 256)` → `'c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646'` | +| [BLAKE3](/tidb-cloud-lake/sql/blake.md) | Calculates a BLAKE3 hash | `BLAKE3('1234567890')` → `'e2cf6ae2a7e65c7b9e089da1ad582100a0d732551a6a07abb07f7a4a119ecc51'` | + +## Non-Cryptographic Hash Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [XXHASH32](/tidb-cloud-lake/sql/xxhash.md) | Calculates an xxHash32 32-bit hash value | `XXHASH32('1234567890')` → `3768853052` | +| [XXHASH64](/tidb-cloud-lake/sql/xxhash.md) | Calculates an xxHash64 64-bit hash value | `XXHASH64('1234567890')` → `12237639266330420150` | +| [SIPHASH64](/tidb-cloud-lake/sql/siphash.md) / [SIPHASH](/tidb-cloud-lake/sql/siphash.md) | Calculates a SipHash-2-4 64-bit hash value | `SIPHASH64('1234567890')` → `2917646445633666330` | +| [CITY64WITHSEED](/tidb-cloud-lake/sql/city-withseed.md) | Calculates a CityHash64 hash with a seed value | `CITY64WITHSEED('1234567890', 42)` → `5210846883572933352` | + +## Usage Examples + +### Data Integrity Verification + +```sql +-- Calculate MD5 hash for file content verification +SELECT + filename, + MD5(file_content) AS content_hash +FROM files +ORDER BY filename; +``` + +### Data Anonymization + +```sql +-- Hash sensitive data before storing or processing +SELECT + user_id, + SHA2(email, 256) AS hashed_email, + SHA2(phone_number, 256) AS hashed_phone 
+FROM users; +``` + +### Hash-Based Partitioning + +```sql +-- Use hash functions for data distribution +SELECT + XXHASH64(customer_id) % 10 AS partition_id, + COUNT(*) AS records_count +FROM orders +GROUP BY partition_id; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/haversine.md b/tidb-cloud-lake/sql/haversine.md new file mode 100644 index 0000000000000..158aa26cde30d --- /dev/null +++ b/tidb-cloud-lake/sql/haversine.md @@ -0,0 +1,40 @@ +--- +title: HAVERSINE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the great circle distance in kilometers between two points on the Earth’s surface, using the [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula). The two points are specified by their latitude and longitude in degrees. + +## Syntax + +```sql +HAVERSINE(, , , ) +``` + +## Arguments + +| Arguments | Description | +|-----------|------------------------------------| +| `` | The latitude of the first point. | +| `` | The longitude of the first point. | +| `` | The latitude of the second point. | +| `` | The longitude of the second point. | + +## Return Type + +Double. + +## Examples + +```sql +SELECT + HAVERSINE(40.7127, -74.0059, 34.0500, -118.2500) AS distance + +┌────────────────┐ +│ distance │ +├────────────────┤ +│ 3936.390533556 │ +└────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/hex-functions.md b/tidb-cloud-lake/sql/hex-functions.md new file mode 100644 index 0000000000000..81cfb05ef5020 --- /dev/null +++ b/tidb-cloud-lake/sql/hex-functions.md @@ -0,0 +1,5 @@ +--- +title: HEX +--- + +Alias for [TO_HEX](/tidb-cloud-lake/sql/hex.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/histogram.md b/tidb-cloud-lake/sql/histogram.md new file mode 100644 index 0000000000000..8924247e9cbbc --- /dev/null +++ b/tidb-cloud-lake/sql/histogram.md @@ -0,0 +1,133 @@ +--- +title: HISTOGRAM +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Generates a data distribution histogram using an "equal height" bucketing strategy. + +## Syntax + +```sql +HISTOGRAM() + +-- The following two forms are equivalent: +HISTOGRAM()() +HISTOGRAM( [, ]) +``` + +| Parameter | Description | +|-------------------|-------------------------------------------------------------------------------------| +| `expr` | The data type of `expr` should be sortable. | +| `max_num_buckets` | Optional positive integer specifying the maximum number of buckets. Default is 128. | + +## Return Type + +Returns either an empty string or a JSON object with the following structure: + +- **buckets**: List of buckets with detailed information: + - **lower**: Lower bound of the bucket. + - **upper**: Upper bound of the bucket. + - **count**: Number of elements in the bucket. + - **pre_sum**: Cumulative count of elements up to the current bucket. + - **ndv**: Number of distinct values in the bucket. 
+ +## Examples + +This example shows how the HISTOGRAM function analyzes the distribution of `c_int` values in the `histagg` table, returning bucket boundaries, distinct value counts, element counts, and cumulative counts: + +```sql +CREATE TABLE histagg ( + c_id INT, + c_tinyint TINYINT, + c_smallint SMALLINT, + c_int INT +); + +INSERT INTO histagg VALUES + (1, 10, 20, 30), + (1, 11, 21, 33), + (1, 11, 12, 13), + (2, 21, 22, 23), + (2, 31, 32, 33), + (2, 10, 20, 30); + +SELECT HISTOGRAM(c_int) FROM histagg; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ histogram(c_int) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +The result is returned as a JSON array: + +```json +[ + { + "lower": "13", + "upper": "13", + "ndv": 1, + "count": 1, + "pre_sum": 0 + }, + { + "lower": "23", + "upper": "23", + "ndv": 1, + "count": 1, + "pre_sum": 1 + }, + { + "lower": "30", + "upper": "30", + "ndv": 1, + "count": 2, + "pre_sum": 2 + }, + { + "lower": "33", + "upper": "33", + "ndv": 1, + "count": 2, + "pre_sum": 4 + } +] +``` + +This example shows how `HISTOGRAM(2)` groups c_int values into two buckets: + +```sql +SELECT HISTOGRAM(2)(c_int) FROM histagg; +-- Or +SELECT HISTOGRAM(c_int, 2) FROM histagg; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ histogram(2)(c_int) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ [{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +The result is returned as a JSON array: + +```json +[ + { + "lower": "13", + "upper": "30", + "ndv": 3, + "count": 4, + "pre_sum": 0 + }, + { + "lower": "33", + "upper": "33", + "ndv": 1, + "count": 2, + "pre_sum": 4 + } +] +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/hour.md b/tidb-cloud-lake/sql/hour.md new file mode 100644 index 0000000000000..8563bc84fe5ab --- /dev/null +++ b/tidb-cloud-lake/sql/hour.md @@ -0,0 +1,35 @@ +--- +title: TO_HOUR +--- + +Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the hour in 24-hour time (0-23). +This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true – even in Moscow the clocks were twice changed at a different time). 
+ +## Syntax + +```sql +TO_HOUR() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TINYINT` + +## Examples + +```sql +SELECT + to_hour('2023-11-12 09:38:18.165575'); + +┌───────────────────────────────────────┐ +│ to_hour('2023-11-12 09:38:18.165575') │ +├───────────────────────────────────────┤ +│ 9 │ +└───────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/hours.md b/tidb-cloud-lake/sql/hours.md new file mode 100644 index 0000000000000..d0dda1a5dc282 --- /dev/null +++ b/tidb-cloud-lake/sql/hours.md @@ -0,0 +1,32 @@ +--- +title: TO_HOURS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of hours into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_HOURS() +``` + +## Return Type + +Interval (in the format `hh:mm:ss`). + +## Examples + +```sql +SELECT TO_HOURS(2), TO_HOURS(0), TO_HOURS((- 2)); + +┌───────────────────────────────────────────┐ +│ to_hours(2) │ to_hours(0) │ to_hours(- 2) │ +├─────────────┼─────────────┼───────────────┤ +│ 2:00:00 │ 00:00:00 │ -2:00:00 │ +└───────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/humanize-number.md b/tidb-cloud-lake/sql/humanize-number.md new file mode 100644 index 0000000000000..52d2151ad8173 --- /dev/null +++ b/tidb-cloud-lake/sql/humanize-number.md @@ -0,0 +1,33 @@ +--- +title: HUMANIZE_NUMBER +--- + +Returns a readable number. + +## Syntax + +```sql +HUMANIZE_NUMBER(x); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------| +| x | The numerical size. | + + +## Return Type + +String. + +## Examples + +```sql +SELECT HUMANIZE_NUMBER(1000 * 1000) ++-------------------------+ +| HUMANIZE_NUMBER((1000 * 1000)) | ++-------------------------+ +| 1 million | ++-------------------------+ +``` diff --git a/tidb-cloud-lake/sql/humanize-size.md b/tidb-cloud-lake/sql/humanize-size.md new file mode 100644 index 0000000000000..0347799863542 --- /dev/null +++ b/tidb-cloud-lake/sql/humanize-size.md @@ -0,0 +1,33 @@ +--- +title: HUMANIZE_SIZE +--- + +Returns the readable size with a suffix(KiB, MiB, etc). + +## Syntax + +```sql +HUMANIZE_SIZE(x); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------| +| x | The numerical size. | + + +## Return Type + +String. + +## Examples + +```sql +SELECT HUMANIZE_SIZE(1024 * 1024) ++-------------------------+ +| HUMANIZE_SIZE((1024 * 1024)) | ++-------------------------+ +| 1 MiB | ++-------------------------+ +``` diff --git a/tidb-cloud-lake/sql/iceberg-manifest.md b/tidb-cloud-lake/sql/iceberg-manifest.md new file mode 100644 index 0000000000000..f9cf1f8685074 --- /dev/null +++ b/tidb-cloud-lake/sql/iceberg-manifest.md @@ -0,0 +1,53 @@ +--- +title: ICEBERG_MANIFEST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns metadata about manifest files of an Iceberg table, including file paths, partitioning details, and snapshot associations. + +## Syntax + +```sql +ICEBERG_MANIFEST('', ''); +``` + +## Output + +The function returns a table with the following columns: + +- `content` (`INT`): The content type (0 for data files, 1 for delete files). +- `path` (`STRING`): The file path of the data or delete file. +- `length` (`BIGINT`): The file size in bytes. 
+- `partition_spec_id` (`INT`): The partition specification ID associated with the file. +- `added_snapshot_id` (`BIGINT`): The snapshot ID that added this file. +- `added_data_files_count` (`INT`): The number of new data files added. +- `existing_data_files_count` (`INT`): The number of existing data files referenced. +- `deleted_data_files_count` (`INT`): The number of data files deleted. +- `added_delete_files_count` (`INT`): The number of delete files added. +- `partition_summaries` (`MAP`): Summary of partition values related to the file. + +## Examples + +```sql +SELECT * FROM ICEBERG_MANIFEST('tpcds', 'catalog_returns'); + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ content │ path │ length │ partition_spec │ added_snapshot │ added_data_fil │ existing_data_ │ deleted_data_ │ added_delete_ │ existing_dele │ deleted_delet │ partition_sum │ +│ Int32 │ String │ Int64 │ _id │ _id │ es_count │ files_count │ files_count │ files_count │ te_files_coun │ e_files_count │ maries │ +│ │ │ │ Int32 │ Nullable(Int64 │ Nullable(Int32 │ Nullable(Int32 │ Nullable(Int3 │ Nullable(Int3 │ t │ Nullable(Int3 │ Array(Nullabl │ +│ │ │ │ │ ) │ ) │ ) │ 2) │ 2) │ Nullable(Int3 │ 2) │ e(Tuple(Nulla │ +│ │ │ │ │ │ │ │ │ │ 2) │ │ ble(Boolean), │ +│ │ │ │ │ │ │ │ │ │ │ │ Nullable(Bool │ +│ │ │ │ │ │ │ │ │ │ │ │ ean), String, │ +│ │ │ │ │ │ │ │ │ │ │ │ String))) │ +├─────────┼────────────────┼────────┼────────────────┼────────────────┼────────────────┼────────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤ +│ 0 │ s3://warehouse │ 9241 │ 0 │ 75657674165904 │ 2 │ 0 │ 0 │ 2 │ 0 │ 0 │ [] │ +│ │ /catalog_retur │ │ │ 11866 │ │ │ │ │ │ │ │ +│ │ ns/metadata/fa │ │ │ │ │ │ │ │ │ │ │ +│ │ 1ea4d5-a382-49 │ │ │ │ │ │ │ │ │ │ │ +│ │ 7a-9f22-1acb9a │ │ │ │ │ │ │ │ │ │ │ +│ │ 74a346-m0.avro │ │ │ │ │ │ │ │ │ │ │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/iceberg-snapshot.md b/tidb-cloud-lake/sql/iceberg-snapshot.md new file mode 100644 index 0000000000000..9256a1cba8ee1 --- /dev/null +++ b/tidb-cloud-lake/sql/iceberg-snapshot.md @@ -0,0 +1,48 @@ +--- +title: ICEBERG_SNAPSHOT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns metadata about snapshots of an Iceberg table, including information about data changes, operations, and summary statistics. + +## Syntax + +```sql +ICEBERG_SNAPSHOT('', ''); +``` + +## Output + +The function returns a table with the following columns: + +- `committed_at` (`TIMESTAMP`): The timestamp when the snapshot was committed. +- `snapshot_id` (`BIGINT`): The unique identifier of the snapshot. +- `parent_id` (`BIGINT`): The parent snapshot ID, if applicable. +- `operation` (`STRING`): The type of operation performed (e.g., append, overwrite, delete). +- `manifest_list` (`STRING`): The file path of the manifest list associated with the snapshot. +- `summary` (`MAP`): A JSON-like structure containing additional metadata, such as: + - `added-data-files`: Number of newly added data files. + - `added-records`: Number of new records added. + - `total-records`: Total number of records in the snapshot. + - `total-files-size`: Total size of all data files (in bytes). 
+ - `total-data-files`: Total number of data files in the snapshot. + - `total-delete-files`: Total number of delete files in the snapshot. + +## Examples + +```sql +SELECT * FROM ICEBERG_SNAPSHOT('tpcds', 'catalog_returns'); + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ committed_at │ snapshot_id │ parent_id │ operation │ manifest_list │ summary │ +├────────────────────────────┼─────────────────────┼───────────┼───────────┼──────────────────────────────────────────────────────┼─────────────────────────────────────────────────────┤ +│ 2025-03-12 23:18:26.626000 │ 7565767416590411866 │ 0 │ append │ s3://warehouse/catalog_returns/metadata/snap-7565767 │ {'spark.app.id':'local-1741821433430','added-data-f │ +│ │ │ │ │ 416590411866-1-fa1ea4d5-a382-497a-9f22-1acb9a74a346. │ iles':'2','added-records':'144067','total-equality- │ +│ │ │ │ │ avro │ deletes':'0','changed-partition-count':'1','total-r │ +│ │ │ │ │ │ ecords':'144067','total-files-size':'7679811','tota │ +│ │ │ │ │ │ l-data-files':'2','added-files-size':'7679811','tot │ +│ │ │ │ │ │ al-delete-files':'0','total-position-deletes':'0'} │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/if.md b/tidb-cloud-lake/sql/if.md new file mode 100644 index 0000000000000..ed846cf799412 --- /dev/null +++ b/tidb-cloud-lake/sql/if.md @@ -0,0 +1,30 @@ +--- +title: IF +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +If `` is TRUE, it returns ``. Otherwise if `` is TRUE, it returns ``, and so on. + +## Syntax + +```sql +IF(, , [, ...], ) +``` + +## Aliases + +- [IFF](/tidb-cloud-lake/sql/iff.md) + +## Examples + +```sql +SELECT IF(1 > 2, 3, 4 < 5, 6, 7); + +┌───────────────────────────────┐ +│ if((1 > 2), 3, (4 < 5), 6, 7) │ +├───────────────────────────────┤ +│ 6 │ +└───────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/iff.md b/tidb-cloud-lake/sql/iff.md new file mode 100644 index 0000000000000..605b681a1ef9c --- /dev/null +++ b/tidb-cloud-lake/sql/iff.md @@ -0,0 +1,8 @@ +--- +title: IFF +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [IF](/tidb-cloud-lake/sql/if.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ifnull.md b/tidb-cloud-lake/sql/ifnull.md new file mode 100644 index 0000000000000..42f0d77795059 --- /dev/null +++ b/tidb-cloud-lake/sql/ifnull.md @@ -0,0 +1,35 @@ +--- +title: IFNULL +--- + +If `` is NULL, returns ``, otherwise returns ``. 
+ +## Syntax + +```sql +IFNULL(, ) +``` + +## Aliases + +- [NVL](/tidb-cloud-lake/sql/nvl.md) + +## Examples + +```sql +SELECT IFNULL(NULL, 'b'), IFNULL('a', 'b'); + +┌──────────────────────────────────────┐ +│ ifnull(null, 'b') │ ifnull('a', 'b') │ +├───────────────────┼──────────────────┤ +│ b │ a │ +└──────────────────────────────────────┘ + +SELECT IFNULL(NULL, 2), IFNULL(1, 2); + +┌────────────────────────────────┐ +│ ifnull(null, 2) │ ifnull(1, 2) │ +├─────────────────┼──────────────┤ +│ 2 │ 1 │ +└────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/inet-aton.md b/tidb-cloud-lake/sql/inet-aton.md new file mode 100644 index 0000000000000..3a78932d27fbf --- /dev/null +++ b/tidb-cloud-lake/sql/inet-aton.md @@ -0,0 +1,31 @@ +--- +title: INET_ATON +--- + +Converts an IPv4 address to a 32-bit integer. + +## Syntax + +```sql +INET_ATON( '' ) +``` + +## Aliases + +- [IPV4_STRING_TO_NUM](ipv4-string-to-num.md) + +## Return Type + +Integer. + +## Examples + +```sql +SELECT IPV4_STRING_TO_NUM('1.2.3.4'), INET_ATON('1.2.3.4'); + +┌──────────────────────────────────────────────────────┐ +│ ipv4_string_to_num('1.2.3.4') │ inet_aton('1.2.3.4') │ +├───────────────────────────────┼──────────────────────┤ +│ 16909060 │ 16909060 │ +└──────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/inet-ntoa.md b/tidb-cloud-lake/sql/inet-ntoa.md new file mode 100644 index 0000000000000..13ee2de3d9bf2 --- /dev/null +++ b/tidb-cloud-lake/sql/inet-ntoa.md @@ -0,0 +1,31 @@ +--- +title: INET_NTOA +--- + +Converts a 32-bit integer to an IPv4 address. + +## Syntax + +```sql +INET_NOTA( ) +``` + +## Aliases + +- [IPV4_NUM_TO_STRING](ipv4-num-to-string.md) + +## Return Type + +String. + +## Examples + +```sql +SELECT IPV4_NUM_TO_STRING(16909060), INET_NTOA(16909060); + +┌────────────────────────────────────────────────────┐ +│ ipv4_num_to_string(16909060) │ inet_ntoa(16909060) │ +├──────────────────────────────┼─────────────────────┤ +│ 1.2.3.4 │ 1.2.3.4 │ +└────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/infer-schema.md b/tidb-cloud-lake/sql/infer-schema.md new file mode 100644 index 0000000000000..820e05b020117 --- /dev/null +++ b/tidb-cloud-lake/sql/infer-schema.md @@ -0,0 +1,253 @@ +--- +title: INFER_SCHEMA +--- + +Automatically detects the file metadata schema and retrieves the column definitions. + +`infer_schema` currently supports the following file formats: +- **Parquet** - Native support for schema inference +- **CSV** - With customizable delimiters and header detection +- **NDJSON** - Newline-delimited JSON files + +**Compression Support**: All formats also support compressed files with extensions `.zip`, `.xz`, `.zst`. + +:::info File Size Limit +Each individual file has a maximum size limit of **100MB** for schema inference. +::: + +:::info Schema Merging +When processing multiple files, `infer_schema` automatically merges different schemas: + +- **Compatible types** are promoted (e.g., INT8 + INT16 → INT16) +- **Incompatible types** fall back to **VARCHAR** (e.g., INT + FLOAT → VARCHAR) +- **Missing columns** in some files are marked as **nullable** +- **New columns** from later files are added to the final schema + +This ensures all files can be read using the unified schema. 
+::: + +## Syntax + +```sql +INFER_SCHEMA( + LOCATION => '{ internalStage | externalStage }' + [ PATTERN => ''] + [ FILE_FORMAT => '' ] + [ MAX_RECORDS_PRE_FILE => ] + [ MAX_FILE_COUNT => ] +) +``` + +## Parameters + +| Parameter | Description | Default | Example | +|-----------|-------------|---------|---------| +| `LOCATION` | Stage location: `@[/]` | Required | `'@my_stage/data/'` | +| `PATTERN` | File name pattern to match | All files | `'*.csv'`, `'*.parquet'` | +| `FILE_FORMAT` | File format name for parsing | Stage's format | `'csv_format'`, `'NDJSON'` | +| `MAX_RECORDS_PRE_FILE` | Max records to sample per file | All records | `100`, `1000` | +| `MAX_FILE_COUNT` | Max number of files to process | All files | `5`, `10` | + +## Examples + +### Parquet Files + +```sql +-- Create stage and export data +CREATE STAGE test_parquet; +COPY INTO @test_parquet FROM (SELECT number FROM numbers(10)) FILE_FORMAT = (TYPE = 'PARQUET'); + +-- Infer schema from parquet files using pattern +SELECT * FROM INFER_SCHEMA( + location => '@test_parquet', + pattern => '*.parquet' +); +``` + +Result: +``` ++-------------+-----------------+----------+----------+----------+ +| column_name | type | nullable | filenames| order_id | ++-------------+-----------------+----------+----------+----------+ +| number | BIGINT UNSIGNED | false | data_... | 0 | ++-------------+-----------------+----------+----------+----------+ +``` + +### CSV Files + +```sql +-- Create stage and export CSV data +CREATE STAGE test_csv; +COPY INTO @test_csv FROM (SELECT number FROM numbers(10)) FILE_FORMAT = (TYPE = 'CSV'); + +-- Create a CSV file format +CREATE FILE FORMAT csv_format TYPE = 'CSV'; + +-- Infer schema using pattern and file format +SELECT * FROM INFER_SCHEMA( + location => '@test_csv', + pattern => '*.csv', + file_format => 'csv_format' +); +``` + +Result: +``` ++-------------+---------+----------+----------+----------+ +| column_name | type | nullable | filenames| order_id | ++-------------+---------+----------+----------+----------+ +| column_1 | BIGINT | true | data_... | 0 | ++-------------+---------+----------+----------+----------+ +``` + +For CSV files with headers: + +```sql +-- Create CSV file format with header support +CREATE FILE FORMAT csv_headers_format +TYPE = 'CSV' +field_delimiter = ',' +skip_header = 1; + +-- Export data with headers +CREATE STAGE test_csv_headers; +COPY INTO @test_csv_headers FROM ( + SELECT number as user_id, 'user_' || number::string as user_name + FROM numbers(5) +) FILE_FORMAT = (TYPE = 'CSV', output_header = true); + +-- Infer schema with headers +SELECT * FROM INFER_SCHEMA( + location => '@test_csv_headers', + file_format => 'csv_headers_format' +); +``` + +Limit records for faster inference: + +```sql +-- Sample only first 5 records for schema inference +SELECT * FROM INFER_SCHEMA( + location => '@test_csv', + pattern => '*.csv', + file_format => 'csv_format', + max_records_pre_file => 5 +); +``` + +### NDJSON Files + +```sql +-- Create stage and export NDJSON data +CREATE STAGE test_ndjson; +COPY INTO @test_ndjson FROM (SELECT number FROM numbers(10)) FILE_FORMAT = (TYPE = 'NDJSON'); + +-- Infer schema using pattern and NDJSON format +SELECT * FROM INFER_SCHEMA( + location => '@test_ndjson', + pattern => '*.ndjson', + file_format => 'NDJSON' +); +``` + +Result: +``` ++-------------+---------+----------+----------+----------+ +| column_name | type | nullable | filenames| order_id | ++-------------+---------+----------+----------+----------+ +| number | BIGINT | true | data_... 
| 0 | ++-------------+---------+----------+----------+----------+ +``` + +Limit records for faster inference: + +```sql +-- Sample only first 5 records for schema inference +SELECT * FROM INFER_SCHEMA( + location => '@test_ndjson', + pattern => '*.ndjson', + file_format => 'NDJSON', + max_records_pre_file => 5 +); +``` + +### Schema Merging with Multiple Files + +When files have different schemas, `infer_schema` merges them intelligently: + +```sql +-- Suppose you have multiple CSV files with different schemas: +-- file1.csv: id(INT), name(VARCHAR) +-- file2.csv: id(INT), name(VARCHAR), age(INT) +-- file3.csv: id(FLOAT), name(VARCHAR), age(INT) + +SELECT * FROM INFER_SCHEMA( + location => '@my_stage/', + pattern => '*.csv', + file_format => 'csv_format' +); +``` + +Result shows merged schema: +``` ++-------------+---------+----------+-----------+----------+ +| column_name | type | nullable | filenames | order_id | ++-------------+---------+----------+-----------+----------+ +| id | VARCHAR | true | file1,... | 0 | -- INT+FLOAT→VARCHAR +| name | VARCHAR | true | file1,... | 1 | +| age | BIGINT | true | file1,... | 2 | -- Missing in file1→nullable ++-------------+---------+----------+-----------+----------+ +``` + +### Pattern Matching and File Limits + +Use pattern matching to infer schema from multiple files: + +```sql +-- Infer schema from all CSV files in the directory +SELECT * FROM INFER_SCHEMA( + location => '@my_stage/', + pattern => '*.csv' +); +``` + +Limit the number of files processed to improve performance: + +```sql +-- Process only the first 5 matching files +SELECT * FROM INFER_SCHEMA( + location => '@my_stage/', + pattern => '*.csv', + max_file_count => 5 +); +``` + +### Compressed Files + +`infer_schema` automatically handles compressed files: + +```sql +-- Works with compressed CSV files +SELECT * FROM INFER_SCHEMA(location => '@my_stage/data.csv.zip'); + +-- Works with compressed NDJSON files +SELECT * FROM INFER_SCHEMA( + location => '@my_stage/data.ndjson.xz', + file_format => 'NDJSON', + max_records_pre_file => 50 +); +``` + +### Create Table from Inferred Schema + +The `infer_schema` function displays the schema but doesn't create tables. To create a table from the inferred schema: + +```sql +-- Create table structure from file schema +CREATE TABLE my_table AS +SELECT * FROM @my_stage/ (pattern=>'*.parquet') +LIMIT 0; + +-- Verify the table structure +DESC my_table; +``` diff --git a/tidb-cloud-lake/sql/information-schema-columns.md b/tidb-cloud-lake/sql/information-schema-columns.md new file mode 100644 index 0000000000000..d2b5144693dee --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-columns.md @@ -0,0 +1,46 @@ +--- +title: information_schema.columns +--- + +Contains information about columns of tables. 
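
A common use is to look up the columns of a single table. The following sketch assumes a table named `my_table` in the `default` database; both names are placeholders to adjust for your own schema, and it only uses fields from the view definition shown below:

```sql
-- List the columns of one table in definition order
SELECT column_name,
       data_type,
       is_nullable,
       column_comment
FROM information_schema.columns
WHERE table_schema = 'default'
  AND table_name = 'my_table'
ORDER BY ordinal_position;
```

The full set of fields exposed by the view is shown below.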
+ +```sql +desc information_schema.columns + +╭─────────────────────────────────────────────────────────────────────────╮ +│ Field │ Type │ Null │ Default │ Extra │ +│ String │ String │ String │ String │ String │ +├──────────────────────────┼──────────────────┼────────┼─────────┼────────┤ +│ table_catalog │ VARCHAR │ NO │ '' │ │ +│ table_schema │ VARCHAR │ NO │ '' │ │ +│ table_name │ VARCHAR │ NO │ '' │ │ +│ column_name │ VARCHAR │ NO │ '' │ │ +│ ordinal_position │ TINYINT UNSIGNED │ NO │ 0 │ │ +│ column_default │ NULL │ NO │ NULL │ │ +│ column_comment │ VARCHAR │ NO │ '' │ │ +│ column_key │ NULL │ NO │ NULL │ │ +│ nullable │ TINYINT UNSIGNED │ YES │ NULL │ │ +│ is_nullable │ VARCHAR │ NO │ '' │ │ +│ data_type │ VARCHAR │ NO │ '' │ │ +│ column_type │ VARCHAR │ NO │ '' │ │ +│ character_maximum_length │ NULL │ NO │ NULL │ │ +│ character_octet_length │ NULL │ NO │ NULL │ │ +│ numeric_precision │ NULL │ NO │ NULL │ │ +│ numeric_precision_radix │ NULL │ NO │ NULL │ │ +│ numeric_scale │ NULL │ NO │ NULL │ │ +│ datetime_precision │ NULL │ NO │ NULL │ │ +│ character_set_catalog │ NULL │ NO │ NULL │ │ +│ character_set_schema │ NULL │ NO │ NULL │ │ +│ character_set_name │ NULL │ NO │ NULL │ │ +│ collation_catalog │ NULL │ NO │ NULL │ │ +│ collation_schema │ NULL │ NO │ NULL │ │ +│ collation_name │ NULL │ NO │ NULL │ │ +│ domain_catalog │ NULL │ NO │ NULL │ │ +│ domain_schema │ NULL │ NO │ NULL │ │ +│ domain_name │ NULL │ NO │ NULL │ │ +│ privileges │ NULL │ NO │ NULL │ │ +│ default │ VARCHAR │ NO │ '' │ │ +│ extra │ NULL │ NO │ NULL │ │ +╰─────────────────────────────────────────────────────────────────────────╯ + +``` diff --git a/tidb-cloud-lake/sql/information-schema-keywords.md b/tidb-cloud-lake/sql/information-schema-keywords.md new file mode 100644 index 0000000000000..c0a551c521b7c --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-keywords.md @@ -0,0 +1,17 @@ +--- +title: information_schema.keywords +--- + +The `information_schema.keywords` system table is a view that provides all keywords in Databend + +```sql +DESCRIBE information_schema.keywords + +╭─────────────────────────────────────────────────────────╮ +│ Field │ Type │ Null │ Default │ Extra │ +│ String │ String │ String │ String │ String │ +├──────────┼──────────────────┼────────┼─────────┼────────┤ +│ keywords │ VARCHAR │ NO │ '' │ │ +│ reserved │ TINYINT UNSIGNED │ NO │ 0 │ │ +╰─────────────────────────────────────────────────────────╯ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/information-schema-schemata.md b/tidb-cloud-lake/sql/information-schema-schemata.md new file mode 100644 index 0000000000000..ef7caf8fbba67 --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-schemata.md @@ -0,0 +1,23 @@ +--- +title: information_schema.schemata +--- + +Provides metadata about all databases in the system. 
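
For example, the following sketch lists the databases visible to the current session along with their owners; it only uses fields from the view definition shown below:

```sql
-- List databases with their catalog and owner
SELECT catalog_name,
       schema_name,
       schema_owner
FROM information_schema.schemata
ORDER BY schema_name;
```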
+ +```sql +desc information_schema.schemata + +╭─────────────────────────────────────────────────────────────────────╮ +│ Field │ Type │ Null │ Default │ Extra │ +│ String │ String │ String │ String │ String │ +├───────────────────────────────┼─────────┼────────┼─────────┼────────┤ +│ catalog_name │ VARCHAR │ NO │ '' │ │ +│ schema_name │ VARCHAR │ NO │ '' │ │ +│ schema_owner │ VARCHAR │ NO │ '' │ │ +│ default_character_set_catalog │ NULL │ NO │ NULL │ │ +│ default_character_set_schema │ NULL │ NO │ NULL │ │ +│ default_character_set_name │ NULL │ NO │ NULL │ │ +│ default_collation_name │ NULL │ NO │ NULL │ │ +│ sql_path │ NULL │ NO │ NULL │ │ +╰─────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/information-schema-tables-sql.md b/tidb-cloud-lake/sql/information-schema-tables-sql.md new file mode 100644 index 0000000000000..682e2fe62b8c6 --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-tables-sql.md @@ -0,0 +1,29 @@ +--- +title: information_schema.tables +--- + +The `information_schema.tables` system table is a view that provides metadata about all tables across all databases, including their schema, type, engine, and creation details. It also includes storage metrics such as data length, index length, and row count, offering insights into table structure and usage. + + +```sql +DESCRIBE information_schema.tables; + +┌────────────────────────────────────────────────────────────────────────────────────┐ +│ Field │ Type │ Null │ Default │ Extra │ +├─────────────────┼─────────────────┼────────┼──────────────────────────────┼────────┤ +│ table_catalog │ VARCHAR │ NO │ '' │ │ +│ table_schema │ VARCHAR │ NO │ '' │ │ +│ table_name │ VARCHAR │ NO │ '' │ │ +│ table_type │ VARCHAR │ NO │ '' │ │ +│ engine │ VARCHAR │ NO │ '' │ │ +│ create_time │ TIMESTAMP │ NO │ '1970-01-01 00:00:00.000000' │ │ +│ drop_time │ TIMESTAMP │ YES │ NULL │ │ +│ data_length │ BIGINT UNSIGNED │ YES │ NULL │ │ +│ index_length │ BIGINT UNSIGNED │ YES │ NULL │ │ +│ table_rows │ BIGINT UNSIGNED │ YES │ NULL │ │ +│ auto_increment │ NULL │ NO │ NULL │ │ +│ table_collation │ NULL │ NO │ NULL │ │ +│ data_free │ NULL │ NO │ NULL │ │ +│ table_comment │ VARCHAR │ NO │ '' │ │ +└────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/information-schema-tables.md b/tidb-cloud-lake/sql/information-schema-tables.md new file mode 100644 index 0000000000000..73db83dfce2ca --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-tables.md @@ -0,0 +1,31 @@ +--- +title: Information_Schema Tables +--- + +## Information Schema + +| Table | Description | +|----------------------------------------------|------------------------------------------------| +| [tables](/tidb-cloud-lake/sql/information-schema-tables.md) | ANSI SQL standard metadata view for tables. | +| [schemata](/tidb-cloud-lake/sql/information-schema-schemata.md) | ANSI SQL standard metadata view for databases. | +| [views](/tidb-cloud-lake/sql/information-schema-views.md) | ANSI SQL standard metadata view for views. | +| [keywords](/tidb-cloud-lake/sql/information-schema-keywords.md) | ANSI SQL standard metadata view for keywords. | +| [columns](/tidb-cloud-lake/sql/information-schema-columns.md) | ANSI SQL standard metadata view for columns. 
| + + +```sql +SHOW VIEWS FROM INFORMATION_SCHEMA; +╭─────────────────────────────╮ +│ Views_in_information_schema │ +│ String │ +├─────────────────────────────┤ +│ columns │ +│ key_column_usage │ +│ keywords │ +│ schemata │ +│ statistics │ +│ tables │ +│ views │ +╰─────────────────────────────╯ + +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/information-schema-views.md b/tidb-cloud-lake/sql/information-schema-views.md new file mode 100644 index 0000000000000..94ac6c2ca98ca --- /dev/null +++ b/tidb-cloud-lake/sql/information-schema-views.md @@ -0,0 +1,29 @@ +--- +title: information_schema.views +--- + +Provides metadata information for all views. + +See also: + +- [SHOW VIEWS](/tidb-cloud-lake/sql/show-views.md) + +```sql +DESCRIBE information_schema.views; + +╭───────────────────────────────────────────────────────────────────────────╮ +│ Field │ Type │ Null │ Default │ Extra │ +│ String │ String │ String │ String │ String │ +├────────────────────────────┼──────────────────┼────────┼─────────┼────────┤ +│ table_catalog │ VARCHAR │ NO │ '' │ │ +│ table_schema │ VARCHAR │ NO │ '' │ │ +│ table_name │ VARCHAR │ NO │ '' │ │ +│ view_definition │ VARCHAR │ NO │ '' │ │ +│ check_option │ VARCHAR │ NO │ '' │ │ +│ is_updatable │ TINYINT UNSIGNED │ NO │ 0 │ │ +│ is_insertable_into │ BOOLEAN │ NO │ false │ │ +│ is_trigger_updatable │ TINYINT UNSIGNED │ NO │ 0 │ │ +│ is_trigger_deletable │ TINYINT UNSIGNED │ NO │ 0 │ │ +│ is_trigger_insertable_into │ TINYINT UNSIGNED │ NO │ 0 │ │ +╰───────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/inner-product.md b/tidb-cloud-lake/sql/inner-product.md new file mode 100644 index 0000000000000..597b4da1ee340 --- /dev/null +++ b/tidb-cloud-lake/sql/inner-product.md @@ -0,0 +1,125 @@ +--- +title: 'INNER_PRODUCT' +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the inner product (dot product) of two vectors, which measures the similarity and projection between vectors. + +## Syntax + +```sql +INNER_PRODUCT(vector1, vector2) +``` + +## Arguments + +- `vector1`: First vector (VECTOR Data Type) +- `vector2`: Second vector (VECTOR Data Type) + +## Returns + +Returns a FLOAT value representing the inner product of the two vectors. + +## Description + +The inner product (also known as dot product) calculates the sum of the products of corresponding elements in two vectors. The function: + +1. Verifies that both input vectors have the same length +2. Multiplies corresponding elements from each vector +3. Sums all the products to produce a single scalar value + +The mathematical formula implemented is: + +``` +inner_product(v1, v2) = Σ(v1ᵢ * v2ᵢ) +``` + +Where v1ᵢ and v2ᵢ are the elements of the input vectors. + +The inner product is fundamental in: +- Measuring vector similarity (higher values indicate more similar directions) +- Computing projections of one vector onto another +- Machine learning algorithms (neural networks, SVM, etc.) +- Physics calculations involving work and energy + +:::info +This function performs vector computations within Databend and does not rely on external APIs. 
+::: + +## Examples + +### Basic Usage + +```sql +SELECT INNER_PRODUCT([1,2,3]::VECTOR(3), [4,5,6]::VECTOR(3)) AS inner_product; +``` + +Result: +``` +┌───────────────┐ +│ inner_product │ +├───────────────┤ +│ 32.0 │ +└───────────────┘ +``` + +### Working with Table Data + +Create a table with vector data: + +```sql +CREATE TABLE vector_examples ( + id INT, + vector_a VECTOR(3), + vector_b VECTOR(3) +); + +INSERT INTO vector_examples VALUES + (1, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), + (2, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), + (3, [2.0, 3.0, 1.0], [1.0, 2.0, 3.0]); +``` + +Calculate inner products: + +```sql +SELECT + id, + vector_a, + vector_b, + INNER_PRODUCT(vector_a, vector_b) AS inner_product +FROM vector_examples; +``` + +Result: +``` +┌────┬───────────────┬───────────────┬───────────────┐ +│ id │ vector_a │ vector_b │ inner_product │ +├────┼───────────────┼───────────────┼───────────────┤ +│ 1 │ [1.0,2.0,3.0] │ [4.0,5.0,6.0] │ 32.0 │ +│ 2 │ [1.0,0.0,0.0] │ [0.0,1.0,0.0] │ 0.0 │ +│ 3 │ [2.0,3.0,1.0] │ [1.0,2.0,3.0] │ 11.0 │ +└────┴───────────────┴───────────────┴───────────────┘ +``` + +### Vector Similarity Analysis + +```sql +-- Calculate inner products to measure vector similarity +SELECT + INNER_PRODUCT([1,0,0]::VECTOR(3), [1,0,0]::VECTOR(3)) AS same_direction, + INNER_PRODUCT([1,0,0]::VECTOR(3), [0,1,0]::VECTOR(3)) AS orthogonal, + INNER_PRODUCT([1,0,0]::VECTOR(3), [-1,0,0]::VECTOR(3)) AS opposite; +``` + +Result: +``` +┌────────────────┬─────────────┬──────────┐ +│ same_direction │ orthogonal │ opposite │ +├────────────────┼─────────────┼──────────┤ +│ 1.0 │ 0.0 │ -1.0 │ +└────────────────┴─────────────┴──────────┘ +``` diff --git a/tidb-cloud-lake/sql/input-output-file-formats.md b/tidb-cloud-lake/sql/input-output-file-formats.md new file mode 100644 index 0000000000000..7c8cfda904fa9 --- /dev/null +++ b/tidb-cloud-lake/sql/input-output-file-formats.md @@ -0,0 +1,309 @@ +--- +title: Input & Output File Formats +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend accepts a variety of file formats both as a source and as a target for data loading or unloading. This page explains the supported file formats and their available options. + +## Syntax + +To specify a file format in a statement, use the following syntax: + +```sql +-- Specify a standard file format +... FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] ) + +-- Specify a custom file format +... FILE_FORMAT = ( FORMAT_NAME = '' ) +``` + +Databend determines the file format used by a COPY or Select statement in the following order of priority: +1. First, it checks if a FILE_FORMAT is explicitly specified within the statement. +2. If no FILE_FORMAT is specified in the operation, it uses the file format initially defined for the stage at the time of stage creation. +3. If no file format was defined for the stage during its creation, Databend defaults to using the PARQUET format. + +:::note +- Databend currently supports ORC and AVRO as a source ONLY. Unloading data into an ORC or AVRO file is not supported yet. +- For managing custom file formats in Databend, see [File Format](/tidb-cloud-lake/sql/file-format.md). +::: + +### formatTypeOptions + +`formatTypeOptions` includes one or more options to describe other format details about the file. The options vary depending on the file format. See the sections below to find out the available options for each supported file format. 
+ +```sql +formatTypeOptions ::= + RECORD_DELIMITER = '' + FIELD_DELIMITER = '' + SKIP_HEADER = + QUOTE = '' + ESCAPE = '' + NAN_DISPLAY = '' + ROW_TAG = '' + COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE +``` + +## CSV Options + +Databend CSV is compliant with [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180) and is subject to the following conditions: + +- A string must be quoted if it contains the character of a [QUOTE](#quote), [ESCAPE](#escape), [RECORD_DELIMITER](#record_delimiter), or [FIELD_DELIMITER](#field_delimiter). +- No character will be escaped in a quoted string except [QUOTE](#quote). +- No space should be left between a [FIELD_DELIMITER](#field_delimiter) and a [QUOTE](#quote). + +### RECORD_DELIMITER + +Delimiter character(s) to separates records in a file. + +**Available Values**: + +- `\r\n` +- A one-byte, non-alphanumeric character, such as `#` and `|`. +- A character with the escape char: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\xHH` + +**Default**: `\n` + +### FIELD_DELIMITER + +Delimiter character to separates fields in a record. + +**Available Values**: + +- A one-byte, non-alphanumeric character, such as `#` and `|`. +- A character with the escape char: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\xHH` + +**Default**: `,` (comma) + +### QUOTE (Load Only) + +Character used to quote values. + +For data loading, the quote is not necessary unless a string contains the character of a [QUOTE](#quote-load-only), [ESCAPE](#escape), [RECORD_DELIMITER](#record_delimiter), or [FIELD_DELIMITER](#field_delimiter). + +**Available Values**: `'\''`, `'"'`, or ``'`'``(backtick) + +**Default**: `'"'` + +### ESCAPE + +Character used to escape the quote character within quoted values, in addition to [QUOTE](#quote-load-only) itself. + +In some variants of CSV, quotes are escaped using a special escape character like `\`, instead of escaping quotes by doubling quoting. + +**Available Values**: `'\\'` or `''` (emtpy, means only use double quoting) + +**Default**: `''` + +### SKIP_HEADER (Load Only) + +Number of lines to be skipped from the beginning of the file. + +**Default**: `0` + +### OUTPUT_HEADER (Unload Only) + +Include a header row with column names. + +**Default**: `false` + +### NAN_DISPLAY + +String that represent a "NaN" (Not-a-Number). + +**Available Values**: Must be literal `'nan'` or `'null'` (case-insensitive) + +**Default**: `'NaN'` + +### NULL_DISPLAY + +String that represent a NULL value. + +When loading data, unquoted matches always become NULL, quoted matches convert to NULL only when `ALLOW_QUOTED_NULLS=true`. + +**Default**: `'\N'` + +### ALLOW_QUOTED_NULLS (Load Only) + +Allow the conversion of quoted strings to NULL values. + +Quoted strings that match `NULL_DISPLAY` become NULL only when this flag is true. Unquoted matches become NULL regardless of this option. + +**Default**: `false` + +### ERROR_ON_COLUMN_COUNT_MISMATCH (Load Only) + +Return error if the number of columns in the data file doesn't match the number of columns in the destination table. + +**Default**: `true` + +### EMPTY_FIELD_AS (Load Only) + +The value that unquoted empty fields(i.e `,,`) is converted to. + +| Available Values | Convert to | +|------------------|----------------------------------------------------------------------------------| +| `NULL` | `NULL`. Error if column is not nullable. | +| `STRING` | For String columns:`''`.
For other columns: `NULL`. Error if not nullable. | +| `FIELD_DEFAULT` | The column's default value. | + +**Default**: `NULL` + +### QUOTED_EMPTY_FIELD_AS (Load Only) + +The value that quoted empty fields(i.e `,"",`) is converted to. + +**Available Values**: same as [EMPTY_FIELD_AS](#empty_field_as-load-only) + +**Default**: `STRING` + +### BINARY_FORMAT + +Encoding format for `Binary` column. + +**Available Values**: `HEX` or `BASE64` + +**Default**: `HEX` + +### GEOMETRY_FORMAT + +Encoding format for `Geometry` column. + +**Available Values**: `EWKT`, `WKB`, `WKB`, `EWKB`, `GEOJSON` + +**Default**: `EWKT` + +### COMPRESSION + +The compression algorithm. + +| Available Values | Description | +|------------------|-----------------------------------------------------------------| +| `NONE` | Indicates that the files are not compressed. | +| `AUTO` | Auto detect compression via file extensions | +| `GZIP` | | +| `BZ2` | | +| `BROTLI` | Must be specified if loading/unloading Brotli-compressed files. | +| `ZSTD` | Zstandard v0.8 (and higher) is supported. | +| `DEFLATE` | Deflate-compressed files (with zlib header, RFC1950). | +| `RAW_DEFLATE` | Deflate-compressed files (without any header, RFC1951). | +| `XZ` | | + +**Default**: `NONE` + +## TSV Options + +Databend TSV is subject to the following conditions: + +- [RECORD_DELIMITER](#record_delimiter-1), [FIELD_DELIMITER](#field_delimiter-1) are escaped by `\` to resolve [delimter collision](https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision) +- In addition to delimters, these characters in are also escaped: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\\`, `\'`. +- [QUOTE](#quote-load-only) is NOT part of the format. +- NULL is represent as `\N`. + +:::note +1. In Databend, the main difference between TSV and CSV is NOT using a tab instead of a comma as a field delemiter (which can be changed by options), but using escaping instead of quoting for +[delimter collision](https://en.wikipedia.org/wiki/Delimiter#Delimiter_collision) +2. We recommend CSV over TSV as a storage format since it has a formal standard. +3. TSV can be used to load files generated by + 1. [Clickhouse TSV](https://clickhouse.com/docs/integrations/data-formats/csv-tsv#tsv-tab-separated-files) + 2. [MySQL TabSeperated](https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html) MySQL `mysqldump --tab`. If `--fields-enclosed-by` or `--fields-optinally-enclosed-by`, use CSV instead. + 3. [Postgresql TEXT](https://www.postgresql.org/docs/current/sql-copy.html). + 4. [Snowflake CSV](https://docs.snowflake.com/en/sql-reference/sql/create-file-format#type-csv) with default options. If `ESCAPE_UNENCLOSED_FIELD` is specified, use CSV instead. + 5. Hive Textfile. +::: + +### RECORD_DELIMITER + +Delimiter character(s) to separates records in a file. + +**Available Values**: + +- `\r\n` +- An arbitrary character, such as `#` and `|`. +- A character with the escape char: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\xHH` + +**Default**: `\n` + +### FIELD_DELIMITER + +Delimiter character to separates fields in a record. + +**Available Values**: + +- A non-alphanumeric character, such as `#` and `|`. +- A character with the escape char: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\xHH` + +**Default**: `\t` (TAB) + +### COMPRESSION + +Same as [the COMPRESSION option for CSV](#compression). + +## NDJSON Options + +### NULL_FIELD_AS (Load Only) + +The value that `null` is converted to. 
+ +| Available Values | Convert to | +|-------------------------|----------------------------------------------------------| +| `NULL` (Default) | NULL for nullable fields. Error for non-nullable fields. | +| `FIELD_DEFAULT` | The default value of the field. | + +### MISSING_FIELD_AS (Load Only) + +The value that missing field is converted to. + +| Available Values | Convert to | +|------------------|----------------------------------------------------------| +| `ERROR` (Default)| Error. | +| `NULL` | NULL for nullable fields. Error for non-nullable fields. | +| `FIELD_DEFAULT` | The default value of the field. | + +### COMPRESSION + +Same as [the COMPRESSION option for CSV](#compression). + +## PARQUET Options + +### MISSING_FIELD_AS (Load Only) + +The value that missing field is converted to. + +| Available Values | Convert to | +|------------------|----------------------------------------------------------| +| `ERROR` (Default)| Error. | +| `FIELD_DEFAULT` | The default value of the field. | + +### COMPRESSION (Unload Only) + +Compression algorithm for internal blocks of parquet file. + +| Available Values | Description | +|------------------|-----------------------------------------------------------------------------| +| `ZSTD` (default) | Zstandard v0.8 (and higher) is supported. | +| `SNAPPY` | Snappy is a popular and fast compression algorithm often used with Parquet. | + +## ORC Options + +### MISSING_FIELD_AS (Load Only) + +The value that missing field is converted to. + +| Available Values | Convert to | +|------------------|----------------------------------------------------------| +| `ERROR` (Default)| Error. | +| `FIELD_DEFAULT` | The default value of the field. | + + +## AVRO Options + +### MISSING_FIELD_AS (Load Only) + +The value that missing field is converted to. + +| Available Values | Convert to | +|------------------|----------------------------------------------------------| +| `ERROR` (Default)| Error. | +| `FIELD_DEFAULT` | The default value of the field. | diff --git a/tidb-cloud-lake/sql/insert-multi-table.md b/tidb-cloud-lake/sql/insert-multi-table.md new file mode 100644 index 0000000000000..c338035e6b787 --- /dev/null +++ b/tidb-cloud-lake/sql/insert-multi-table.md @@ -0,0 +1,286 @@ +--- +title: INSERT (multi-table) +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Inserts rows into multiple tables in a single transaction, with the option for the insertion to be dependent on certain conditions (conditionally) or to occur regardless of any conditions (unconditionally). + +:::tip atomic operations +Databend ensures data integrity with atomic operations. Inserts, updates, replaces, and deletes either succeed completely or fail entirely. +::: + +See also: [INSERT](/tidb-cloud-lake/sql/insert.md) + +## Syntax + +```sql +-- Unconditional INSERT ALL: Inserts each row into multiple tables without any conditions or restrictions. +INSERT [ OVERWRITE ] ALL + INTO [ ( [ , ... ] ) ] [ VALUES ( [ , ... ] ) ] + ... +SELECT ... + + +-- Conditional INSERT ALL: Inserts each row into multiple tables, but only if certain conditions are met. +INSERT [ OVERWRITE ] ALL + WHEN THEN + INTO [ ( [ , ... ] ) ] [ VALUES ( [ , ... ] ) ] + [ INTO ... ] + + [ WHEN ... ] + + [ ELSE INTO ... ] +SELECT ... + + +-- Conditional INSERT FIRST: Inserts each row into multiple tables, but stops after the first successful insertion. +INSERT [ OVERWRITE ] FIRST + WHEN THEN + INTO [ ( [ , ... ] ) ] [ VALUES ( [ , ... ] ) ] + [ INTO ... ] + + [ WHEN ... 
] + + [ ELSE INTO ... ] +SELECT ... +``` + +| Parameter | Description | +| ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `OVERWRITE` | Indicates whether existing data should be truncated before insertion. | +| `( [ , ... ] )` | Specifies the column names in the target table where data will be inserted.
- If omitted, data will be inserted into all columns in the target table. | +| `VALUES ( <source_col_name> [ , ... ] )` | Specifies the source column names from which data will be inserted into the target table.
- If omitted, all columns returned by the subquery will be inserted into the target table.
- The data types of the columns listed in `<source_col_name>` must match or be compatible with those specified in `<target_col_name>`. | +| `SELECT ...` | A subquery that provides the data to be inserted into the target table(s).
- You have the option to explicitly assign aliases to columns within the subquery. This allows you to reference the columns by their aliases within WHEN clauses and VALUES clauses. | +| `WHEN` | Conditional statement to determine when to insert data into specific target tables.
- A conditional multi-table insert requires at least one WHEN clause.
- A WHEN clause can include multiple INTO clauses, and these INTO clauses can target the same table.
- To unconditionally execute a WHEN clause, you can use `WHEN 1 THEN ...`. | +| `ELSE` | Specifies the action to take if none of the conditions specified in the WHEN clauses are met. | +## Important Notes + +- Aggregate functions, external UDFs, and window functions are not allowed in the `VALUES(...)` expressions. + +## Examples + +### Example-1: Unconditional INSERT ALL + +This example demonstrates an Unconditional INSERT ALL operation, inserting each row from the `employee_data_source` table into both the `employees` and `employee_history` tables. + +1. Create tables for managing employee data, including employee details and their employment history, then populate a source table with sample employee information. + +```sql +-- Create the employees table +CREATE TABLE employees ( + employee_id INT, + employee_name VARCHAR(100), + hire_date DATE +); + +-- Create the employee_history table +CREATE TABLE employee_history ( + employee_id INT, + hire_date DATE, + termination_date DATE +); + +-- Create the employee_data_source table +CREATE TABLE employee_data_source ( + employee_id INT, + employee_name VARCHAR(100), + hire_date DATE +); + +-- Insert data into the employee_data_source table +INSERT INTO employee_data_source (employee_id, employee_name, hire_date) +VALUES + (1, 'Alice', '2023-01-15'), + (2, 'Bob', '2023-02-20'), + (3, 'Charlie', '2023-03-25'); +``` + +2. Transfer data from the `employee_data_source` table into both the `employees` and `employee_history` tables with an unconditional INSERT ALL operation. + +```sql +-- Unconditional INSERT ALL: Insert data into the employees and employee_history tables +INSERT ALL + INTO employees (employee_id, employee_name, hire_date) VALUES (employee_id, employee_name, hire_date) + INTO employee_history (employee_id, hire_date) VALUES (employee_id, hire_date) +SELECT employee_id, employee_name, hire_date FROM employee_data_source; + +-- Query the employees table +SELECT * FROM employees; + +┌─────────────────────────────────────────────────────┐ +│ employee_id │ employee_name │ hire_date │ +├─────────────────┼──────────────────┼────────────────┤ +│ 1 │ Alice │ 2023-01-15 │ +│ 2 │ Bob │ 2023-02-20 │ +│ 3 │ Charlie │ 2023-03-25 │ +└─────────────────────────────────────────────────────┘ + +-- Query the employee_history table +SELECT * FROM employee_history; + +┌─────────────────────────────────────────────────────┐ +│ employee_id │ hire_date │ termination_date │ +├─────────────────┼────────────────┼──────────────────┤ +│ 1 │ 2023-01-15 │ NULL │ +│ 2 │ 2023-02-20 │ NULL │ +│ 3 │ 2023-03-25 │ NULL │ +└─────────────────────────────────────────────────────┘ +``` + +### Example-2: Conditional INSERT ALL & FIRST + +This example demonstrates conditional INSERT ALL, inserting sales data into separate tables based on specific conditions, where records satisfying multiple conditions are inserted into all corresponding tables. + +1. Create three tables: products, `high_quantity_sales`, `high_price_sales`, and `sales_data_source`. Then, insert three sales records into the `sales_data_source` table. 
+ +```sql +-- Create the high_quantity_sales table +CREATE TABLE high_quantity_sales ( + sale_id INT, + product_id INT, + sale_date DATE, + quantity INT, + total_price DECIMAL(10, 2) +); + +-- Create the high_price_sales table +CREATE TABLE high_price_sales ( + sale_id INT, + product_id INT, + sale_date DATE, + quantity INT, + total_price DECIMAL(10, 2) +); + +-- Create the sales_data_source table +CREATE TABLE sales_data_source ( + sale_id INT, + product_id INT, + sale_date DATE, + quantity INT, + total_price DECIMAL(10, 2) +); + +-- Insert data into the sales_data_source table +INSERT INTO sales_data_source (sale_id, product_id, sale_date, quantity, total_price) +VALUES + (1, 101, '2023-01-15', 5, 100.00), + (2, 102, '2023-02-20', 3, 75.00), + (3, 103, '2023-03-25', 10, 200.00); +``` + +2. Insert rows into multiple tables based on specific conditions using conditional INSERT ALL. Records with a quantity greater than 4 are inserted into the `high_quantity_sales` table, and those with a total price exceeding 50 are inserted into the `high_price_sales` table. + +```sql +-- Conditional INSERT ALL: Inserts each row into multiple tables, but only if certain conditions are met. +INSERT ALL + WHEN quantity > 4 THEN INTO high_quantity_sales + WHEN total_price > 50 THEN INTO high_price_sales +SELECT * FROM sales_data_source; + +SELECT * FROM high_quantity_sales; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ sale_id │ product_id │ sale_date │ quantity │ total_price │ +├─────────────────┼─────────────────┼────────────────┼─────────────────┼──────────────────────────┤ +│ 1 │ 101 │ 2023-01-15 │ 5 │ 100.00 │ +│ 3 │ 103 │ 2023-03-25 │ 10 │ 200.00 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT * FROM high_price_sales; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ sale_id │ product_id │ sale_date │ quantity │ total_price │ +├─────────────────┼─────────────────┼────────────────┼─────────────────┼──────────────────────────┤ +│ 1 │ 101 │ 2023-01-15 │ 5 │ 100.00 │ +│ 2 │ 102 │ 2023-02-20 │ 3 │ 75.00 │ +│ 3 │ 103 │ 2023-03-25 │ 10 │ 200.00 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +3. Empty the data from the high_quantity_sales and high_price_sales tables. + +```sql +TRUNCATE TABLE high_quantity_sales; + +TRUNCATE TABLE high_price_sales; +``` + +4. Insert rows into multiple tables based on specific conditions using conditional INSERT FIRST. For each row, it stops after the first successful insertion, therefore, the sales records with IDs 1 and 3 are inserted into the `high_quantity_sales` table only, compared to the conditional INSERT ALL results in Step 2. + +```sql +-- Conditional INSERT FIRST: Inserts each row into multiple tables, but stops after the first successful insertion. 
+INSERT FIRST + WHEN quantity > 4 THEN INTO high_quantity_sales + WHEN total_price > 50 THEN INTO high_price_sales +SELECT * FROM sales_data_source; + + +SELECT * FROM high_quantity_sales; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ sale_id │ product_id │ sale_date │ quantity │ total_price │ +├─────────────────┼─────────────────┼────────────────┼─────────────────┼──────────────────────────┤ +│ 1 │ 101 │ 2023-01-15 │ 5 │ 100.00 │ +│ 3 │ 103 │ 2023-03-25 │ 10 │ 200.00 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT * FROM high_price_sales; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ sale_id │ product_id │ sale_date │ quantity │ total_price │ +├─────────────────┼─────────────────┼────────────────┼─────────────────┼──────────────────────────┤ +│ 2 │ 102 │ 2023-02-20 │ 3 │ 75.00 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Example-3: Insert with Explicit Alias + +This example demonstrates using alias in VALUES clause to conditionally insert rows from the employees table into the `employee_history` table based on the hire date being after '2023-02-01'. + +1. Create two tables, `employees` and `employee_history`, and insert sample employee data into the `employees` table. + +```sql +-- Create tables +CREATE TABLE employees ( + employee_id INT, + first_name VARCHAR(50), + last_name VARCHAR(50), + hire_date DATE +); + +CREATE TABLE employee_history ( + employee_id INT, + full_name VARCHAR(100), + hire_date DATE +); + +INSERT INTO employees (employee_id, first_name, last_name, hire_date) +VALUES + (1, 'John', 'Doe', '2023-01-01'), + (2, 'Jane', 'Smith', '2023-02-01'), + (3, 'Michael', 'Johnson', '2023-03-01'); +``` + +2. Utilize conditional insertion with an alias to transfer records from the employees table to the `employee_history` table, filtering for hire dates after '2023-02-01'. + +```sql +INSERT ALL + WHEN hire_date >= '2023-02-01' THEN INTO employee_history + VALUES (employee_id, full_name, hire_date) -- Insert with the alias 'full_name' +SELECT employee_id, CONCAT(first_name, ' ', last_name) AS full_name, hire_date -- Alias the concatenated full name as 'full_name' +FROM employees; + +SELECT * FROM employee_history; + +┌─────────────────────────────────────────────────────┐ +│ employee_id │ full_name │ hire_date │ +│ Nullable(Int32) │ Nullable(String) │ Nullable(Date) │ +├─────────────────┼──────────────────┼────────────────┤ +│ 2 │ Jane Smith │ 2023-02-01 │ +│ 3 │ Michael Johnson │ 2023-03-01 │ +└─────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/insert-sql.md b/tidb-cloud-lake/sql/insert-sql.md new file mode 100644 index 0000000000000..5a66e055b7fbf --- /dev/null +++ b/tidb-cloud-lake/sql/insert-sql.md @@ -0,0 +1,67 @@ +--- +title: INSERT +--- + +Returns the string str, with the substring beginning at position pos and len characters long replaced by the string newstr. Returns the original string if pos is not within the length of the string. Replaces the rest of the string from position pos if len is not within the length of the rest of the string. Returns NULL if any argument is NULL. + +## Syntax + +```sql +INSERT(, , , ) +``` + +## Arguments + +| Arguments | Description | +|------------|-----------------| +| `` | The string. | +| `` | The position. | +| `` | The length. 
| +| `` | The new string. | + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT INSERT('Quadratic', 3, 4, 'What'); ++-----------------------------------+ +| INSERT('Quadratic', 3, 4, 'What') | ++-----------------------------------+ +| QuWhattic | ++-----------------------------------+ + +SELECT INSERT('Quadratic', -1, 4, 'What'); ++---------------------------------------+ +| INSERT('Quadratic', (- 1), 4, 'What') | ++---------------------------------------+ +| Quadratic | ++---------------------------------------+ + +SELECT INSERT('Quadratic', 3, 100, 'What'); ++-------------------------------------+ +| INSERT('Quadratic', 3, 100, 'What') | ++-------------------------------------+ +| QuWhat | ++-------------------------------------+ + ++--------------------------------------------+--------+ +| INSERT('123456789', number, number, 'aaa') | number | ++--------------------------------------------+--------+ +| 123456789 | 0 | +| aaa23456789 | 1 | +| 1aaa456789 | 2 | +| 12aaa6789 | 3 | +| 123aaa89 | 4 | +| 1234aaa | 5 | +| 12345aaa | 6 | +| 123456aaa | 7 | +| 1234567aaa | 8 | +| 12345678aaa | 9 | +| 123456789 | 10 | +| 123456789 | 11 | +| 123456789 | 12 | ++--------------------------------------------+--------+ +``` diff --git a/tidb-cloud-lake/sql/insert.md b/tidb-cloud-lake/sql/insert.md new file mode 100644 index 0000000000000..2d3c8e0f92818 --- /dev/null +++ b/tidb-cloud-lake/sql/insert.md @@ -0,0 +1,259 @@ +--- +title: INSERT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Inserts one or more rows into a table. + +:::tip atomic operations +Databend ensures data integrity with atomic operations. Inserts, updates, replaces, and deletes either succeed completely or fail entirely. +::: + +See also: [INSERT (multi-table)](dml-insert-multi.md) + +## Syntax + +```sql +INSERT { OVERWRITE [ INTO ] | INTO }
+ -- Optionally specify the columns to insert into + ( [ , ... ] ) + -- Insertion options: + { + -- Directly insert values or default values + VALUES ( | DEFAULT ) [ , ... ] | + -- Insert the result of a query + SELECT ... + } +``` + +| Parameter | Description | +|--------------------|----------------------------------------------------------------------------------| +| `OVERWRITE [INTO]` | Indicates whether existing data should be truncated before insertion. | +| `VALUES` | Allows direct insertion of specific values or the default values of the columns. | + +## Important Notes + +- Aggregate functions, external UDFs, and window functions are not allowed in the `VALUES(...)` expressions. + +## Examples + +### Example-1: Insert Values with OVERWRITE + +In this example, the INSERT OVERWRITE statement is utilized to truncate the employee table and insert new data, replacing all existing records with the values provided for an employee with ID 100. + +```sql +CREATE TABLE employee ( + employee_id INT, + employee_name VARCHAR(50) +); + +-- Inserting initial data into the employee table +INSERT INTO employee(employee_id, employee_name) VALUES + (101, 'John Doe'), + (102, 'Jane Smith'); + +-- Inserting new data with OVERWRITE +INSERT OVERWRITE employee VALUES (100, 'John Johnson'); + +-- Displaying the contents of the employee table +SELECT * FROM employee; + +┌────────────────────────────────────┐ +│ employee_id │ employee_name │ +├─────────────────┼──────────────────┤ +│ 100 │ John Johnson │ +└────────────────────────────────────┘ +``` + +### Example-2: Insert Query Results + +When inserting the results of a SELECT statement, the mapping of columns follows their positions in the SELECT clause. Therefore, the number of columns in the SELECT statement must be equal to or greater than the number of columns in the INSERT table. In cases where the data types of the columns in the SELECT statement and the INSERT table differ, type casting will be performed as needed. 
+ +```sql +-- Creating a table named 'employee_info' with three columns: 'employee_id', 'employee_name', and 'department' +CREATE TABLE employee_info ( + employee_id INT, + employee_name VARCHAR(50), + department VARCHAR(50) +); + +-- Inserting a record into the 'employee_info' table +INSERT INTO employee_info VALUES ('101', 'John Doe', 'Marketing'); + +-- Creating a table named 'employee_data' with three columns: 'ID', 'Name', and 'Dept' +CREATE TABLE employee_data ( + ID INT, + Name VARCHAR(50), + Dept VARCHAR(50) +); + +-- Inserting data from 'employee_info' into 'employee_data' +INSERT INTO employee_data SELECT * FROM employee_info; + +-- Displaying the contents of the 'employee_data' table +SELECT * FROM employee_data; + +┌───────────────────────────────────────────────────────┐ +│ id │ name │ dept │ +├─────────────────┼──────────────────┼──────────────────┤ +│ 101 │ John Doe │ Marketing │ +└───────────────────────────────────────────────────────┘ +``` + +This example demonstrates creating a summary table named "sales_summary" to store aggregated sales data such as total quantity sold and revenue for each product by aggregating information from the sales table: + +```sql +-- Creating a table for sales data +CREATE TABLE sales ( + product_id INT, + quantity_sold INT, + revenue DECIMAL(10, 2) +); + +-- Inserting some sample sales data +INSERT INTO sales (product_id, quantity_sold, revenue) VALUES + (1, 100, 500.00), + (2, 150, 750.00), + (1, 200, 1000.00), + (3, 50, 250.00); + +-- Creating a summary table to store aggregated sales data +CREATE TABLE sales_summary ( + product_id INT, + total_quantity_sold INT, + total_revenue DECIMAL(10, 2) +); + +-- Inserting aggregated sales data into the summary table +INSERT INTO sales_summary (product_id, total_quantity_sold, total_revenue) +SELECT + product_id, + SUM(quantity_sold) AS total_quantity_sold, + SUM(revenue) AS total_revenue +FROM + sales +GROUP BY + product_id; + +-- Displaying the contents of the sales_summary table +SELECT * FROM sales_summary; + +┌──────────────────────────────────────────────────────────────────┐ +│ product_id │ total_quantity_sold │ total_revenue │ +├─────────────────┼─────────────────────┼──────────────────────────┤ +│ 1 │ 300 │ 1500.00 │ +│ 3 │ 50 │ 250.00 │ +│ 2 │ 150 │ 750.00 │ +└──────────────────────────────────────────────────────────────────┘ +``` + +### Example-3: Insert Default Values + +This example illustrates creating a table called "staff_records", with default values set for columns such as department and status. Data is then inserted, showcasing default value usage. 
+
+```sql
+-- Creating a table 'staff_records' with columns 'employee_id', 'department', 'salary', and 'status' with default values
+CREATE TABLE staff_records (
+    employee_id INT NULL,
+    department VARCHAR(50) DEFAULT 'HR',
+    salary FLOAT,
+    status VARCHAR(10) DEFAULT 'Active'
+);
+
+-- Inserting data into 'staff_records' with default values
+INSERT INTO staff_records
+VALUES
+    (DEFAULT, DEFAULT, DEFAULT, DEFAULT),
+    (101, DEFAULT, 50000.00, DEFAULT),
+    (102, 'Finance', 60000.00, 'Inactive'),
+    (103, 'Marketing', 70000.00, 'Active');
+
+-- Displaying the contents of the 'staff_records' table
+SELECT * FROM staff_records;
+
+┌───────────────────────────────────────────────────────────────────────────┐
+│   employee_id   │    department    │       salary      │      status      │
+├─────────────────┼──────────────────┼───────────────────┼──────────────────┤
+│            NULL │ HR               │              NULL │ Active           │
+│             101 │ HR               │             50000 │ Active           │
+│             102 │ Finance          │             60000 │ Inactive         │
+│             103 │ Marketing        │             70000 │ Active           │
+└───────────────────────────────────────────────────────────────────────────┘
+```
+
+### Example-4: Insert with Staged Files
+
+Databend enables you to insert data into a table from staged files with the INSERT INTO statement. This is achieved through Databend's capacity to [Query Staged Files](/tidb-cloud-lake/sql/stage.md) and subsequently incorporate the query result into the table.
+
+1. Create a table called `sample`:
+
+```sql
+CREATE TABLE sample
+(
+    id INT,
+    city VARCHAR,
+    score INT,
+    country VARCHAR DEFAULT 'China'
+);
+```
+
+2. Set up an internal stage with sample data:
+
+We'll establish an internal stage named `mystage` and then populate it with sample data.
+
+```sql
+CREATE STAGE mystage;
+
+COPY INTO @mystage
+FROM
+(
+    SELECT *
+    FROM
+    (
+        VALUES
+        (1, 'Chengdu', 80),
+        (3, 'Chongqing', 90),
+        (6, 'Hangzhou', 92),
+        (9, 'Hong Kong', 88)
+    )
+)
+FILE_FORMAT = (TYPE = PARQUET);
+```
+
+3. Insert data from the staged Parquet file with `INSERT INTO`:
+
+:::tip
+You can specify the file format and various copy-related settings with the FILE_FORMAT and COPY_OPTIONS available in the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command. When `purge` is set to `true`, the original file will only be deleted if the data update is successful.
+:::
+
+```sql
+INSERT INTO sample
+    (id, city, score)
+SELECT
+    $1, $2, $3
+FROM
+    @mystage
+    (FILE_FORMAT => 'parquet');
+```
+
+4. 
Verify the data insert + +```sql +SELECT * FROM sample; +``` + +The results should be: +```sql +┌─────────────────────────────────────────────────────────────────────────┐ +│ id │ city │ score │ country │ +├─────────────────┼──────────────────┼─────────────────┼──────────────────┤ +│ 1 │ Chengdu │ 80 │ China │ +│ 3 │ Chongqing │ 90 │ China │ +│ 6 │ Hangzhou │ 92 │ China │ +│ 9 │ Hong Kong │ 88 │ China │ +└─────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/inspect-parquet.md b/tidb-cloud-lake/sql/inspect-parquet.md new file mode 100644 index 0000000000000..4fcec78bbbf54 --- /dev/null +++ b/tidb-cloud-lake/sql/inspect-parquet.md @@ -0,0 +1,53 @@ +--- +title: INSPECT_PARQUET +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Retrieves a table of comprehensive metadata from a staged Parquet file, including the following columns: + +| Column | Description | +|----------------------------------|----------------------------------------------------------------| +| created_by | The entity or source responsible for creating the Parquet file | +| num_columns | The number of columns in the Parquet file | +| num_rows | The total number of rows or records in the Parquet file | +| num_row_groups | The count of row groups within the Parquet file | +| serialized_size | The size of the Parquet file on disk (compressed) | +| max_row_groups_size_compressed | The size of the largest row group (compressed) | +| max_row_groups_size_uncompressed | The size of the largest row group (uncompressed) | + +## Syntax + +```sql +INSPECT_PARQUET('@') +``` + +## Examples + +This example retrieves the metadata from a staged sample Parquet file named [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet). 
The file contains two records: + +```text title='books.parquet' +Transaction Processing,Jim Gray,1992 +Readings in Database Systems,Michael Stonebraker,2004 +``` + +```sql +-- Show the staged file +LIST @my_internal_stage; + +┌──────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ size │ md5 │ last_modified │ creator │ +├───────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤ +│ books.parquet │ 998 │ NULL │ 2023-04-19 19:34:51.303 +0000 │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Retrieve metadata from the staged file +SELECT * FROM INSPECT_PARQUET('@my_internal_stage/books.parquet'); + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ created_by │ num_columns │ num_rows │ num_row_groups │ serialized_size │ max_row_groups_size_compressed │ max_row_groups_size_uncompressed │ +├────────────────────────────────────┼─────────────┼──────────┼────────────────┼─────────────────┼────────────────────────────────┼──────────────────────────────────┤ +│ parquet-cpp version 1.5.1-SNAPSHOT │ 3 │ 2 │ 1 │ 998 │ 332 │ 320 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/instr.md b/tidb-cloud-lake/sql/instr.md new file mode 100644 index 0000000000000..ff720e26b31fd --- /dev/null +++ b/tidb-cloud-lake/sql/instr.md @@ -0,0 +1,41 @@ +--- +title: INSTR +--- + +Returns the position of the first occurrence of substring substr in string str. +This is the same as the two-argument form of LOCATE(), except that the order of the arguments is reversed. + +## Syntax + +```sql +INSTR(, ) +``` + +## Arguments + +| Arguments | Description | +|------------|----------------| +| `` | The string. | +| `` | The substring. | + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT INSTR('foobarbar', 'bar'); ++---------------------------+ +| INSTR('foobarbar', 'bar') | ++---------------------------+ +| 4 | ++---------------------------+ + +SELECT INSTR('xbar', 'foobar'); ++-------------------------+ +| INSTR('xbar', 'foobar') | ++-------------------------+ +| 0 | ++-------------------------+ +``` diff --git a/tidb-cloud-lake/sql/intdiv.md b/tidb-cloud-lake/sql/intdiv.md new file mode 100644 index 0000000000000..183b89f37d0e4 --- /dev/null +++ b/tidb-cloud-lake/sql/intdiv.md @@ -0,0 +1,5 @@ +--- +title: INTDIV +--- + +Alias for [DIV](/tidb-cloud-lake/sql/div.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/intersect-count.md b/tidb-cloud-lake/sql/intersect-count.md new file mode 100644 index 0000000000000..6188ab4c20237 --- /dev/null +++ b/tidb-cloud-lake/sql/intersect-count.md @@ -0,0 +1,35 @@ +--- +title: INTERSECT_COUNT +--- + +Counts the number of intersecting bits between two bitmap columns. 
+
+## Syntax
+
+```sql
+INTERSECT_COUNT( '<tag1>', '<tag2>' )( <bitmap_column>, <tag_column> )
+```
+
+## Examples
+
+```sql
+CREATE TABLE agg_bitmap_test(id Int, tag String, v Bitmap);
+
+INSERT INTO
+  agg_bitmap_test(id, tag, v)
+VALUES
+  (1, 'a', to_bitmap('0, 1')),
+  (2, 'b', to_bitmap('0, 1, 2')),
+  (3, 'c', to_bitmap('1, 3, 4'));
+
+SELECT id, INTERSECT_COUNT('b', 'c')(v, tag)
+FROM agg_bitmap_test GROUP BY id;
+
+┌─────────────────────────────────────────────────────┐
+│        id       │ intersect_count('b', 'c')(v, tag) │
+├─────────────────┼───────────────────────────────────┤
+│               1 │                                 0 │
+│               3 │                                 3 │
+│               2 │                                 3 │
+└─────────────────────────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/interval-functions.md b/tidb-cloud-lake/sql/interval-functions.md
new file mode 100644
index 0000000000000..118162ba50ebe
--- /dev/null
+++ b/tidb-cloud-lake/sql/interval-functions.md
@@ -0,0 +1,40 @@
+---
+title: Interval Functions
+---
+
+This section provides reference information for the interval functions in Databend. Interval functions allow you to create interval values of various time units for use in date and time calculations.
+
+## Time Unit Conversion Functions
+
+### Day-based Intervals
+
+| Function | Description | Example |
+|----------|-------------|--------|
+| [TO_DAYS](/tidb-cloud-lake/sql/days.md) | Converts a number to an interval of days | `TO_DAYS(2)` → `2 days` |
+| [TO_WEEKS](/tidb-cloud-lake/sql/weeks.md) | Converts a number to an interval of weeks | `TO_WEEKS(3)` → `21 days` |
+| [TO_MONTHS](/tidb-cloud-lake/sql/months.md) | Converts a number to an interval of months | `TO_MONTHS(2)` → `2 months` |
+| [TO_YEARS](/tidb-cloud-lake/sql/years.md) | Converts a number to an interval of years | `TO_YEARS(1)` → `1 year` |
+
+### Hour-based Intervals
+
+| Function | Description | Example |
+|----------|-------------|--------|
+| [TO_HOURS](/tidb-cloud-lake/sql/hours.md) | Converts a number to an interval of hours | `TO_HOURS(5)` → `5:00:00` |
+| [TO_MINUTES](/tidb-cloud-lake/sql/minutes.md) | Converts a number to an interval of minutes | `TO_MINUTES(90)` → `1:30:00` |
+| [TO_SECONDS](/tidb-cloud-lake/sql/seconds.md) | Converts a number to an interval of seconds | `TO_SECONDS(3600)` → `1:00:00` |
+| [EPOCH](/tidb-cloud-lake/sql/epoch.md) | Alias for TO_SECONDS | `EPOCH(60)` → `00:01:00` |
+
+### Smaller Time Units
+
+| Function | Description | Example |
+|----------|-------------|--------|
+| [TO_MILLISECONDS](/tidb-cloud-lake/sql/milliseconds.md) | Converts a number to an interval of milliseconds | `TO_MILLISECONDS(2000)` → `00:00:02` |
+| [TO_MICROSECONDS](/tidb-cloud-lake/sql/microseconds.md) | Converts a number to an interval of microseconds | `TO_MICROSECONDS(2000000)` → `00:00:02` |
+
+### Larger Time Units
+
+| Function | Description | Example |
+|----------|-------------|--------|
+| [TO_DECADES](/tidb-cloud-lake/sql/decades.md) | Converts a number to an interval of decades | `TO_DECADES(2)` → `20 years` |
+| [TO_CENTURIES](/tidb-cloud-lake/sql/to-centuries.md) | Converts a number to an interval of centuries | `TO_CENTURIES(1)` → `100 years` |
+| [TO_MILLENNIA](/tidb-cloud-lake/sql/millennia.md) | Converts a number to an interval of millennia | `TO_MILLENNIA(1)` → `1000 years` |
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/interval.md b/tidb-cloud-lake/sql/interval.md
new file mode 100644
index 0000000000000..ea111e962bde0
--- /dev/null
+++ b/tidb-cloud-lake/sql/interval.md
@@ -0,0 +1,97 @@
+---
+title: Interval
+sidebar_position: 7
+---
+
+import FunctionDescription 
from '@site/src/components/FunctionDescription'; + + + +## Overview + +`INTERVAL` represents a duration that can be written in natural-language text (`'1 year 2 months'`, `'3 days ago'`) or as an integer number of microseconds. Databend supports units from millennia down to microseconds and allows arithmetic on intervals, dates, and timestamps. + +:::note +Fractional parts are discarded when parsing numeric intervals. `'1.6 seconds'` becomes a 1-second interval. +::: + +## Examples + +### Literals and Numeric Values + +```sql +CREATE OR REPLACE TABLE intervals (duration INTERVAL); + +INSERT INTO intervals VALUES + ('1 year 2 months'), -- positive natural language + ('1 year 2 months ago'), -- negative because of "ago" + ('1000000'), -- 1 second in microseconds + ('-1000000'); -- -1 second + +SELECT TO_STRING(duration) AS duration_text FROM intervals; +``` + +Result: +``` +┌──────────────────────┐ +│ duration_text │ +├──────────────────────┤ +│ 1 year 2 months │ +│ -1 year -2 months │ +│ 0:00:01 │ +│ -0:00:01 │ +└──────────────────────┘ +``` + +```sql +SELECT + TO_STRING(TO_INTERVAL('1 seconds')) AS whole, + TO_STRING(TO_INTERVAL('1.6 seconds')) AS fractional; +``` + +Result: +``` +┌────────┬────────────┐ +│ whole │ fractional │ +├────────┼────────────┤ +│ 0:00:01 │ 0:00:01 │ +└────────┴────────────┘ +``` + +### Interval Arithmetic + +```sql +SELECT + TO_STRING(TO_DAYS(3) + TO_DAYS(1)) AS add_interval, + TO_STRING(TO_DAYS(3) - TO_DAYS(1)) AS subtract_interval; +``` + +Result: +``` +┌──────────────┬──────────────────┐ +│ add_interval │ subtract_interval │ +├──────────────┼──────────────────┤ +│ 4 days │ 2 days │ +└──────────────┴──────────────────┘ +``` + +### Apply to DATE and TIMESTAMP + +```sql +SELECT + DATE '2024-12-20' + TO_DAYS(2) AS add_days, + DATE '2024-12-20' - TO_DAYS(2) AS subtract_days, + TIMESTAMP '2024-12-20 10:00:00' + TO_HOURS(36) AS add_hours, + TIMESTAMP '2024-12-20 10:00:00' - TO_HOURS(36) AS subtract_hours; +``` + +Result: +``` +┌────────────────────┬────────────────────┬────────────────────┬────────────────────┐ +│ add_days │ subtract_days │ add_hours │ subtract_hours │ +├────────────────────┼────────────────────┼────────────────────┼────────────────────┤ +│ 2024-12-22T00:00:00 │ 2024-12-18T00:00:00 │ 2024-12-21T22:00:00 │ 2024-12-18T22:00:00 │ +└────────────────────┴────────────────────┴────────────────────┴────────────────────┘ +``` + +Intervals are added or subtracted just like numbers, making it easy to slide windows or compute offsets with precise control down to microseconds. diff --git a/tidb-cloud-lake/sql/inverted-index.md b/tidb-cloud-lake/sql/inverted-index.md new file mode 100644 index 0000000000000..9cdfd7d7889bb --- /dev/null +++ b/tidb-cloud-lake/sql/inverted-index.md @@ -0,0 +1,20 @@ +--- +title: Inverted Index +--- +This page provides a comprehensive overview of inverted index operations in Databend, organized by functionality for easy reference. 
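+
+As a quick orientation, a typical workflow looks roughly like the sketch below (the table, column, and index names are illustrative, and the keyword search relies on the `MATCH` predicate covered in the Full-Text Index guide):
+
+```sql
+-- Illustrative only: a table with a text column to search
+CREATE TABLE customer_feedback (id INT, comment STRING);
+
+-- Build an inverted index on the text column so keyword searches can use it
+CREATE INVERTED INDEX feedback_idx ON customer_feedback(comment);
+
+INSERT INTO customer_feedback VALUES
+    (1, 'fast query performance'),
+    (2, 'great support team');
+
+-- Keyword search served by the inverted index
+SELECT id, comment FROM customer_feedback WHERE MATCH('comment', 'performance');
+```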
+ +## Inverted Index Management + +| Command | Description | +|---------|-------------| +| [CREATE INVERTED INDEX](/tidb-cloud-lake/sql/create-inverted-index.md) | Creates a new inverted index for full-text search | +| [DROP INVERTED INDEX](/tidb-cloud-lake/sql/drop-inverted-index.md) | Removes an inverted index | +| [REFRESH INVERTED INDEX](/tidb-cloud-lake/sql/refresh-inverted-index.md) | Updates an inverted index with the latest data | + +## Related Topics + +- [Full-Text Index](/tidb-cloud-lake/guides/full-text-index.md) + +:::note +Inverted indexes in Databend enable efficient full-text search capabilities for text data, allowing for fast keyword searches across large text columns. +::: diff --git a/tidb-cloud-lake/sql/ip-address-functions.md b/tidb-cloud-lake/sql/ip-address-functions.md new file mode 100644 index 0000000000000..db1fb7a203160 --- /dev/null +++ b/tidb-cloud-lake/sql/ip-address-functions.md @@ -0,0 +1,21 @@ +--- +title: IP Address Functions +--- + +This page provides reference information for the IP address-related functions in Databend. These functions help convert between string and numeric representations of IP addresses. + +## IP Address Conversion Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [INET_ATON](/tidb-cloud-lake/sql/inet-aton.md) / [IPV4_STRING_TO_NUM](ipv4-string-to-num.md) | Converts an IPv4 address string to a 32-bit integer | `INET_ATON('192.168.1.1')` → `3232235777` | +| [INET_NTOA](/tidb-cloud-lake/sql/inet-ntoa.md) / [IPV4_NUM_TO_STRING](ipv4-num-to-string.md) | Converts a 32-bit integer to an IPv4 address string | `INET_NTOA(3232235777)` → `'192.168.1.1'` | + +## Safe IP Address Conversion Functions + +These functions handle invalid inputs gracefully by returning NULL instead of raising an error. + +| Function | Description | Example | +|----------|-------------|--------| +| [TRY_INET_ATON](/tidb-cloud-lake/sql/try-inet-aton.md) / [TRY_IPV4_STRING_TO_NUM](try-ipv4-string-to-num.md) | Safely converts an IPv4 address string to a 32-bit integer | `TRY_INET_ATON('invalid')` → `NULL` | +| [TRY_INET_NTOA](/tidb-cloud-lake/sql/try-inet-ntoa.md) / [TRY_IPV4_NUM_TO_STRING](try-ipv4-num-to-string.md) | Safely converts a 32-bit integer to an IPv4 address string | `TRY_INET_NTOA(-1)` → `NULL` | diff --git a/tidb-cloud-lake/sql/ipv-num-string.md b/tidb-cloud-lake/sql/ipv-num-string.md new file mode 100644 index 0000000000000..9834e1592152a --- /dev/null +++ b/tidb-cloud-lake/sql/ipv-num-string.md @@ -0,0 +1,5 @@ +--- +title: IPV4_NUM_TO_STRING +--- + +Alias for [INET_NTOA](/tidb-cloud-lake/sql/inet-ntoa.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ipv-string-num.md b/tidb-cloud-lake/sql/ipv-string-num.md new file mode 100644 index 0000000000000..1c63db40955b7 --- /dev/null +++ b/tidb-cloud-lake/sql/ipv-string-num.md @@ -0,0 +1,5 @@ +--- +title: IPV4_STRING_TO_NUM +--- + +Alias for [INET_ATON](/tidb-cloud-lake/sql/inet-aton.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-array.md b/tidb-cloud-lake/sql/is-array.md new file mode 100644 index 0000000000000..8f6f5cbf094a0 --- /dev/null +++ b/tidb-cloud-lake/sql/is-array.md @@ -0,0 +1,43 @@ +--- +title: IS_ARRAY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input value is a JSON array. Please note that a JSON array is not the same as the [ARRAY](/tidb-cloud-lake/sql/array.md) data type. 
A JSON array is a data structure commonly used in JSON, representing an ordered collection of values enclosed within square brackets `[ ]`. It is a flexible format for organizing and exchanging various data types, including strings, numbers, booleans, objects, and nulls. + +```json title='JSON Array Example:' +[ + "Apple", + 42, + true, + {"name": "John", "age": 30, "isStudent": false}, + [1, 2, 3], + null +] +``` + +## Syntax + +```sql +IS_ARRAY( ) +``` + +## Return Type + +Returns `true` if the input value is a JSON array, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_ARRAY(PARSE_JSON('true')), + IS_ARRAY(PARSE_JSON('[1,2,3]')); + +┌────────────────────────────────────────────────────────────────┐ +│ is_array(parse_json('true')) │ is_array(parse_json('[1,2,3]')) │ +├──────────────────────────────┼─────────────────────────────────┤ +│ false │ true │ +└────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-boolean.md b/tidb-cloud-lake/sql/is-boolean.md new file mode 100644 index 0000000000000..5ba1493b4e02e --- /dev/null +++ b/tidb-cloud-lake/sql/is-boolean.md @@ -0,0 +1,32 @@ +--- +title: IS_BOOLEAN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input JSON value is a boolean. + +## Syntax + +```sql +IS_BOOLEAN( ) +``` + +## Return Type + +Returns `true` if the input JSON value is a boolean, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_BOOLEAN(PARSE_JSON('true')), + IS_BOOLEAN(PARSE_JSON('[1,2,3]')); + +┌────────────────────────────────────────────────────────────────────┐ +│ is_boolean(parse_json('true')) │ is_boolean(parse_json('[1,2,3]')) │ +├────────────────────────────────┼───────────────────────────────────┤ +│ true │ false │ +└────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-error.md b/tidb-cloud-lake/sql/is-error.md new file mode 100644 index 0000000000000..1c5e3c647af3c --- /dev/null +++ b/tidb-cloud-lake/sql/is-error.md @@ -0,0 +1,42 @@ +--- +title: IS_ERROR +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a Boolean value indicating whether an expression is an error value. + +See also: [IS_NOT_ERROR](/tidb-cloud-lake/sql/is-not-error.md) + +## Syntax + +```sql +IS_ERROR( ) +``` + +## Return Type + +Returns `true` if the expression is an error, otherwise `false`. 
+ +## Examples + +```sql +-- Indicates division by zero, hence an error +SELECT IS_ERROR(1/0), IS_NOT_ERROR(1/0); + +┌───────────────────────────────────────────┐ +│ is_error((1 / 0)) │ is_not_error((1 / 0)) │ +├───────────────────┼───────────────────────┤ +│ true │ false │ +└───────────────────────────────────────────┘ + +-- The conversion to DATE is successful, hence not an error +SELECT IS_ERROR('2024-03-17'::DATE), IS_NOT_ERROR('2024-03-17'::DATE); + +┌─────────────────────────────────────────────────────────────────┐ +│ is_error('2024-03-17'::date) │ is_not_error('2024-03-17'::date) │ +├──────────────────────────────┼──────────────────────────────────┤ +│ false │ true │ +└─────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-float.md b/tidb-cloud-lake/sql/is-float.md new file mode 100644 index 0000000000000..79ee74aacc123 --- /dev/null +++ b/tidb-cloud-lake/sql/is-float.md @@ -0,0 +1,32 @@ +--- +title: IS_FLOAT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input JSON value is a float. + +## Syntax + +```sql +IS_FLOAT( ) +``` + +## Return Type + +Returns `true` if the input JSON value is a float, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_FLOAT(PARSE_JSON('1.23')), + IS_FLOAT(PARSE_JSON('[1,2,3]')); + +┌────────────────────────────────────────────────────────────────┐ +│ is_float(parse_json('1.23')) │ is_float(parse_json('[1,2,3]')) │ +├──────────────────────────────┼─────────────────────────────────┤ +│ true │ false │ +└────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-integer.md b/tidb-cloud-lake/sql/is-integer.md new file mode 100644 index 0000000000000..4533b3ce479eb --- /dev/null +++ b/tidb-cloud-lake/sql/is-integer.md @@ -0,0 +1,32 @@ +--- +title: IS_INTEGER +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input JSON value is an integer. + +## Syntax + +```sql +IS_INTEGER( ) +``` + +## Return Type + +Returns `true` if the input JSON value is an integer, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_INTEGER(PARSE_JSON('123')), + IS_INTEGER(PARSE_JSON('[1,2,3]')); + +┌───────────────────────────────────────────────────────────────────┐ +│ is_integer(parse_json('123')) │ is_integer(parse_json('[1,2,3]')) │ +├───────────────────────────────┼───────────────────────────────────┤ +│ true │ false │ +└───────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-not-distinct.md b/tidb-cloud-lake/sql/is-not-distinct.md new file mode 100644 index 0000000000000..af6c6eac67256 --- /dev/null +++ b/tidb-cloud-lake/sql/is-not-distinct.md @@ -0,0 +1,23 @@ +--- +title: "IS [ NOT ] DISTINCT FROM" +--- + +Compares whether two expressions are equal (or not equal) with awareness of nullability, meaning it treats NULLs as known values for comparing equality. 
+ +## Syntax + +```sql + IS [ NOT ] DISTINCT FROM +``` + +## Examples + +```sql +SELECT NULL IS DISTINCT FROM NULL; + +┌────────────────────────────┐ +│ null is distinct from null │ +├────────────────────────────┤ +│ false │ +└────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-not-error.md b/tidb-cloud-lake/sql/is-not-error.md new file mode 100644 index 0000000000000..a4df171f7e8c5 --- /dev/null +++ b/tidb-cloud-lake/sql/is-not-error.md @@ -0,0 +1,42 @@ +--- +title: IS_NOT_ERROR +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a Boolean value indicating whether an expression is an error value. + +See also: [IS_ERROR](/tidb-cloud-lake/sql/is-error.md) + +## Syntax + +```sql +IS_NOT_ERROR( ) +``` + +## Return Type + +Returns `true` if the expression is not an error, otherwise `false`. + +## Examples + +```sql +-- Indicates division by zero, hence an error +SELECT IS_ERROR(1/0), IS_NOT_ERROR(1/0); + +┌───────────────────────────────────────────┐ +│ is_error((1 / 0)) │ is_not_error((1 / 0)) │ +├───────────────────┼───────────────────────┤ +│ true │ false │ +└───────────────────────────────────────────┘ + +-- The conversion to DATE is successful, hence not an error +SELECT IS_ERROR('2024-03-17'::DATE), IS_NOT_ERROR('2024-03-17'::DATE); + +┌─────────────────────────────────────────────────────────────────┐ +│ is_error('2024-03-17'::date) │ is_not_error('2024-03-17'::date) │ +├──────────────────────────────┼──────────────────────────────────┤ +│ false │ true │ +└─────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-not-null.md b/tidb-cloud-lake/sql/is-not-null.md new file mode 100644 index 0000000000000..6085e86a863bc --- /dev/null +++ b/tidb-cloud-lake/sql/is-not-null.md @@ -0,0 +1,23 @@ +--- +title: IS_NOT_NULL +--- + +Checks whether a value is not NULL. + +## Syntax + +```sql +IS_NOT_NULL() +``` + +## Examples + +```sql +SELECT IS_NOT_NULL(1); + +┌────────────────┐ +│ is_not_null(1) │ +├────────────────┤ +│ true │ +└────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-null-value.md b/tidb-cloud-lake/sql/is-null-value.md new file mode 100644 index 0000000000000..263cbd4edcacb --- /dev/null +++ b/tidb-cloud-lake/sql/is-null-value.md @@ -0,0 +1,39 @@ +--- +title: IS_NULL_VALUE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks whether the input value is a JSON `null`. Please note that this function examines JSON `null`, not SQL NULL. To check if a value is SQL NULL, use [IS_NULL](/tidb-cloud-lake/sql/is-null.md). + +```json title='JSON null Example:' +{ + "name": "John", + "age": null +} +``` + +## Syntax + +```sql +IS_NULL_VALUE( ) +``` + +## Return Type + +Returns `true` if the input value is a JSON `null`, and `false` otherwise. 
+ +## Examples + +```sql +SELECT + IS_NULL_VALUE(PARSE_JSON('{"name":"John", "age":null}') :age), --JSON null + IS_NULL(NULL); --SQL NULL + +┌──────────────────────────────────────────────────────────────────────────────┐ +│ is_null_value(parse_json('{"name":"john", "age":null}'):age) │ is_null(null) │ +├──────────────────────────────────────────────────────────────┼───────────────┤ +│ true │ true │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-null.md b/tidb-cloud-lake/sql/is-null.md new file mode 100644 index 0000000000000..ecac0f5a44cba --- /dev/null +++ b/tidb-cloud-lake/sql/is-null.md @@ -0,0 +1,23 @@ +--- +title: IS_NULL +--- + +Checks whether a value is NULL. + +## Syntax + +```sql +IS_NULL() +``` + +## Examples + +```sql +SELECT IS_NULL(1); + +┌────────────┐ +│ is_null(1) │ +├────────────┤ +│ false │ +└────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-object.md b/tidb-cloud-lake/sql/is-object.md new file mode 100644 index 0000000000000..672849cbe2b5f --- /dev/null +++ b/tidb-cloud-lake/sql/is-object.md @@ -0,0 +1,32 @@ +--- +title: IS_OBJECT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input value is a JSON object. + +## Syntax + +```sql +IS_OBJECT( ) +``` + +## Return Type + +Returns `true` if the input JSON value is a JSON object, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_OBJECT(PARSE_JSON('{"a":"b"}')), -- JSON Object + IS_OBJECT(PARSE_JSON('["a","b","c"]')); --JSON Array + +┌─────────────────────────────────────────────────────────────────────────────┐ +│ is_object(parse_json('{"a":"b"}')) │ is_object(parse_json('["a","b","c"]')) │ +├────────────────────────────────────┼────────────────────────────────────────┤ +│ true │ false │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/is-string.md b/tidb-cloud-lake/sql/is-string.md new file mode 100644 index 0000000000000..16f92886825ec --- /dev/null +++ b/tidb-cloud-lake/sql/is-string.md @@ -0,0 +1,32 @@ +--- +title: IS_STRING +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks if the input JSON value is a string. + +## Syntax + +```sql +IS_STRING( ) +``` + +## Return Type + +Returns `true` if the input JSON value is a string, and `false` otherwise. + +## Examples + +```sql +SELECT + IS_STRING(PARSE_JSON('"abc"')), + IS_STRING(PARSE_JSON('123')); + +┌───────────────────────────────────────────────────────────────┐ +│ is_string(parse_json('"abc"')) │ is_string(parse_json('123')) │ +├────────────────────────────────┼──────────────────────────────┤ +│ true │ false │ +└───────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/jaro-winkler.md b/tidb-cloud-lake/sql/jaro-winkler.md new file mode 100644 index 0000000000000..d46c41303b1dd --- /dev/null +++ b/tidb-cloud-lake/sql/jaro-winkler.md @@ -0,0 +1,73 @@ +--- +title: JARO_WINKLER +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the [Jaro-Winkler distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) between two strings. It is commonly used for measuring the similarity between strings, with values ranging from 0.0 (completely dissimilar) to 1.0 (identical strings). 
+ +## Syntax + +```sql +JARO_WINKLER(, ) +``` + +## Return Type + +The JARO_WINKLER function returns a FLOAT64 value representing the similarity between the two input strings. The return value follows these rules: + +- Similarity Range: The result ranges from 0.0 (completely dissimilar) to 1.0 (identical). + + ```sql title='Examples:' + SELECT JARO_WINKLER('databend', 'Databend') AS similarity; + + ┌────────────────────┐ + │ similarity │ + ├────────────────────┤ + │ 0.9166666666666666 │ + └────────────────────┘ + + SELECT JARO_WINKLER('databend', 'database') AS similarity; + + ┌────────────┐ + │ similarity │ + ├────────────┤ + │ 0.9 │ + └────────────┘ + ``` +- NULL Handling: If either string1 or string2 is NULL, the result is NULL. + + ```sql title='Examples:' + SELECT JARO_WINKLER('databend', NULL) AS similarity; + + ┌────────────┐ + │ similarity │ + ├────────────┤ + │ NULL │ + └────────────┘ + ``` +- Empty Strings: + - Comparing two empty strings returns 1.0. + + ```sql title='Examples:' + SELECT JARO_WINKLER('', '') AS similarity; + + ┌────────────┐ + │ similarity │ + ├────────────┤ + │ 1 │ + └────────────┘ + ``` + - Comparing an empty string with a non-empty string returns 0.0. + + ```sql title='Examples:' + SELECT JARO_WINKLER('databend', '') AS similarity; + + ┌────────────┐ + │ similarity │ + ├────────────┤ + │ 0 │ + └────────────┘ + ``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/join.md b/tidb-cloud-lake/sql/join.md new file mode 100644 index 0000000000000..b64133cabe961 --- /dev/null +++ b/tidb-cloud-lake/sql/join.md @@ -0,0 +1,920 @@ +--- +title: JOIN +--- + +## Overview + +Joins combine columns from two or more tables into one result set. Databend implements both ANSI SQL joins and Databend-specific extensions, allowing you to work with dimensional data, slowly changing facts, and time-series streams using the same syntax. 
+ +## Supported Join Types + +* [Inner Join](#inner-join) +* [Natural Join](#natural-join) +* [Cross Join](#cross-join) +* [Left Join](#left-join) +* [Right Join](#right-join) +* [Full Outer Join](#full-outer-join) +* [Left / Right Semi Join](#left--right-semi-join) +* [Left / Right Anti Join](#left--right-anti-join) +* [Asof Join](#asof-join) + +## Sample Data + +### Prepare the Tables + +Run the following SQL once to create and populate the tables used throughout this page: + +```sql +-- VIP profile tables +CREATE OR REPLACE TABLE vip_info (client_id INT, region VARCHAR); +INSERT INTO vip_info VALUES + (101, 'Toronto'), + (102, 'Quebec'), + (103, 'Vancouver'); + +CREATE OR REPLACE TABLE purchase_records (client_id INT, item VARCHAR, qty INT); +INSERT INTO purchase_records VALUES + (100, 'Croissant', 2000), + (102, 'Donut', 3000), + (103, 'Coffee', 6000), + (106, 'Soda', 4000); + +CREATE OR REPLACE TABLE gift (gift VARCHAR); +INSERT INTO gift VALUES + ('Croissant'), ('Donut'), ('Coffee'), ('Soda'); + +-- IoT-style readings for ASOF examples +CREATE OR REPLACE TABLE sensor_readings ( + room VARCHAR, + reading_time TIMESTAMP, + temperature DOUBLE +); +INSERT INTO sensor_readings VALUES + ('LivingRoom', '2024-01-01 09:55:00', 22.8), + ('LivingRoom', '2024-01-01 10:00:00', 23.1), + ('LivingRoom', '2024-01-01 10:05:00', 23.3), + ('LivingRoom', '2024-01-01 10:10:00', 23.8), + ('LivingRoom', '2024-01-01 10:15:00', 24.0); + +CREATE OR REPLACE TABLE hvac_mode ( + room VARCHAR, + mode_time TIMESTAMP, + mode VARCHAR +); +INSERT INTO hvac_mode VALUES + ('LivingRoom', '2024-01-01 09:58:00', 'Cooling'), + ('LivingRoom', '2024-01-01 10:06:00', 'Fan'), + ('LivingRoom', '2024-01-01 10:30:00', 'Heating'); +``` + +### Preview the Data + +Unless stated otherwise, the examples below reuse the same tables so that you can compare the effect of each join type directly. + +```text +vip_info ++-----------+-----------+ +| client_id | region | ++-----------+-----------+ +| 101 | Toronto | +| 102 | Quebec | +| 103 | Vancouver | ++-----------+-----------+ + +purchase_records ++-----------+-----------+------+ +| client_id | item | qty | ++-----------+-----------+------+ +| 100 | Croissant | 2000 | +| 102 | Donut | 3000 | +| 103 | Coffee | 6000 | +| 106 | Soda | 4000 | ++-----------+-----------+------+ + +gift ++-----------+ +| gift | ++-----------+ +| Croissant | +| Donut | +| Coffee | +| Soda | ++-----------+ +``` + +sensor_readings ++-----------+---------------------+-------------+ +| room | reading_time | temperature | ++-----------+---------------------+-------------+ +| LivingRoom| 2024-01-01 09:55:00 | 22.8 | +| LivingRoom| 2024-01-01 10:00:00 | 23.1 | +| LivingRoom| 2024-01-01 10:05:00 | 23.3 | +| LivingRoom| 2024-01-01 10:10:00 | 23.8 | +| LivingRoom| 2024-01-01 10:15:00 | 24.0 | ++-----------+---------------------+-------------+ + +hvac_mode ++-----------+---------------------+----------+ +| room | mode_time | mode | ++-----------+---------------------+----------+ +| LivingRoom| 2024-01-01 09:58:00 | Cooling | +| LivingRoom| 2024-01-01 10:06:00 | Fan | +| LivingRoom| 2024-01-01 10:30:00 | Heating | ++-----------+---------------------+----------+ +``` + +## Inner Join + +An inner join returns rows that satisfy all join predicates. 
+ +### Visual + +```text +┌──────────────────────────────┐ +│ vip_info (left) │ +├──────────────────────────────┤ +│ client_id | region │ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ client_id = client_id + ▼ +┌──────────────────────────────┐ +│ purchase_records (right) │ +├──────────────────────────────┤ +│ client_id | item | qty │ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ keep matches only + ▼ +┌──────────────────────────────┐ +│ INNER JOIN RESULT │ +├──────────────────────────────┤ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a + [INNER] JOIN table_b + ON join_condition +``` + +:::tip +`INNER` is optional. When the join columns share the same name, `USING(column_name)` can replace `ON table_a.column = table_b.column`. +::: + +### Example + +```sql +SELECT p.client_id, p.item, p.qty +FROM vip_info AS v +INNER JOIN purchase_records AS p + ON v.client_id = p.client_id; +``` + +Result: + +```text ++-----------+--------+------+ +| client_id | item | qty | ++-----------+--------+------+ +| 102 | Donut | 3000 | +| 103 | Coffee | 6000 | ++-----------+--------+------+ +``` + +## Natural Join + +A natural join automatically matches columns that have the same name in both tables. Only one copy of each matched column appears in the result. + +### Visual + +```text +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ client_id | region │ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ auto-match shared column names + ▼ +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ client_id | item | qty │ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ emit shared columns once + ▼ +┌──────────────────────────────┐ +│ NATURAL JOIN RESULT │ +├──────────────────────────────┤ +│ 102: Quebec + Donut + 3000 │ +│ 103: Vanc. + Coffee + 6000 │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a +NATURAL JOIN table_b; +``` + +### Example + +```sql +SELECT client_id, item, qty +FROM vip_info +NATURAL JOIN purchase_records; +``` + +Result: + +```text ++-----------+--------+------+ +| client_id | item | qty | ++-----------+--------+------+ +| 102 | Donut | 3000 | +| 103 | Coffee | 6000 | ++-----------+--------+------+ +``` + +## Cross Join + +A cross join (Cartesian product) returns every combination of rows from the participating tables. + +### Visual + +```text +┌──────────────────────────────┐ +│ vip_info (3 rows) │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ pair with every gift + ▼ +┌──────────────────────────────┐ +│ gift (4 rows) │ +├──────────────────────────────┤ +│ Croissant │ +│ Donut │ +│ Coffee │ +│ Soda │ +└──────────────────────────────┘ + │ 3 × 4 combinations + ▼ +┌──────────────────────────────┐ +│ CROSS JOIN RESULT (snippet) │ +├──────────────────────────────┤ +│ 101 | Toronto | Croissant │ +│ 101 | Toronto | Donut │ +│ 101 | Toronto | Coffee │ +│ ... | ... | ... 
│ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a +CROSS JOIN table_b; +``` + +### Example + +```sql +SELECT v.client_id, v.region, g.gift +FROM vip_info AS v +CROSS JOIN gift AS g; +``` + +Result (first few rows): + +```text ++-----------+----------+-----------+ +| client_id | region | gift | ++-----------+----------+-----------+ +| 101 | Toronto | Croissant | +| 101 | Toronto | Donut | +| 101 | Toronto | Coffee | +| 101 | Toronto | Soda | +| ... | ... | ... | ++-----------+----------+-----------+ +``` + +## Left Join + +A left join returns every row from the left table and the matching rows from the right table. When no match exists, the right-side columns are `NULL`. + +### Visual + +```text +┌──────────────────────────────┐ +│ vip_info (left preserved) │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ join on client_id + ▼ +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ unmatched right rows -> NULLs + ▼ +┌──────────────────────────────┐ +│ LEFT JOIN RESULT │ +├──────────────────────────────┤ +│ 101 | Toronto | NULL | NULL │ +│ 102 | Quebec | Donut | 3000 │ +│ 103 | Vanc. | Coffee | 6000│ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a +LEFT [OUTER] JOIN table_b + ON join_condition; +``` + +:::tip +`OUTER` is optional. +::: + +### Example + +```sql +SELECT v.client_id, p.item, p.qty +FROM vip_info AS v +LEFT JOIN purchase_records AS p + ON v.client_id = p.client_id; +``` + +Result: + +```text ++-----------+--------+------+ +| client_id | item | qty | ++-----------+--------+------+ +| 101 | NULL | NULL | +| 102 | Donut | 3000 | +| 103 | Coffee | 6000 | ++-----------+--------+------+ +``` + +## Right Join + +A right join mirrors the left join: all rows from the right table appear, and unmatched rows from the left table produce `NULL`s. + +### Visual + +```text +┌──────────────────────────────┐ +│ purchase_records (right) │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + ▲ right table preserved + │ join on client_id +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + ▼ fill missing VIP data with NULL +┌──────────────────────────────┐ +│ RIGHT JOIN RESULT │ +├──────────────────────────────┤ +│ 100 | Croissant | vip=NULL │ +│ 102 | Donut | region=Quebec │ +│ 103 | Coffee | region=Vanc. │ +│ 106 | Soda | vip=NULL │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a +RIGHT [OUTER] JOIN table_b + ON join_condition; +``` + +### Example + +```sql +SELECT v.client_id, v.region +FROM vip_info AS v +RIGHT JOIN purchase_records AS p + ON v.client_id = p.client_id; +``` + +Result: + +```text ++-----------+-----------+ +| client_id | region | ++-----------+-----------+ +| NULL | NULL | +| 102 | Quebec | +| 103 | Vancouver | +| NULL | NULL | ++-----------+-----------+ +``` + +## Full Outer Join + +A full outer join returns the union of left and right joins: every row from both tables, with `NULL`s where no match exists. 
+ +### Visual + +```text +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ combine matches + left-only + right-only + ▼ +┌──────────────────────────────┐ +│ FULL OUTER JOIN RESULT │ +├──────────────────────────────┤ +│ Toronto | NULL │ +│ Quebec | Donut │ +│ Vanc. | Coffee │ +│ NULL | Croissant │ +│ NULL | Soda │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +SELECT select_list +FROM table_a +FULL [OUTER] JOIN table_b + ON join_condition; +``` + +### Example + +```sql +SELECT v.region, p.item +FROM vip_info AS v +FULL OUTER JOIN purchase_records AS p + ON v.client_id = p.client_id; +``` + +Result: + +```text ++-----------+-----------+ +| region | item | ++-----------+-----------+ +| Toronto | NULL | +| Quebec | Donut | +| Vancouver | Coffee | +| NULL | Croissant | +| NULL | Soda | ++-----------+-----------+ +``` + +## Left / Right Semi Join + +Semi joins filter the left (or right) table to rows that have at least one match in the opposite table. Unlike inner joins, only columns from the preserved side are returned. + +### Visual + +```text +LEFT SEMI JOIN +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ keep rows that find matches + ▼ +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + ▼ +┌──────────────────────────────┐ +│ LEFT SEMI RESULT │ +├──────────────────────────────┤ +│ 102 | Quebec │ +│ 103 | Vanc. 
│ +└──────────────────────────────┘ + +RIGHT SEMI JOIN +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ keep rows with VIP matches + ▼ +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + ▼ +┌──────────────────────────────┐ +│ RIGHT SEMI RESULT │ +├──────────────────────────────┤ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +-- Left Semi Join +SELECT select_list +FROM table_a +LEFT SEMI JOIN table_b + ON join_condition; + +-- Right Semi Join +SELECT select_list +FROM table_a +RIGHT SEMI JOIN table_b + ON join_condition; +``` + +### Examples + +Left semi join—return VIP clients with purchases: + +```sql +SELECT * +FROM vip_info +LEFT SEMI JOIN purchase_records + ON vip_info.client_id = purchase_records.client_id; +``` + +Result: + +```text ++-----------+-----------+ +| client_id | region | ++-----------+-----------+ +| 102 | Quebec | +| 103 | Vancouver | ++-----------+-----------+ +``` + +Right semi join—return purchase rows that belong to VIP clients: + +```sql +SELECT * +FROM vip_info +RIGHT SEMI JOIN purchase_records + ON vip_info.client_id = purchase_records.client_id; +``` + +Result: + +```text ++-----------+--------+------+ +| client_id | item | qty | ++-----------+--------+------+ +| 102 | Donut | 3000 | +| 103 | Coffee | 6000 | ++-----------+--------+------+ +``` + +## Left / Right Anti Join + +Anti joins return rows that do **not** have a matching row on the other side, making them ideal for existence checks. 
+ +### Visual + +```text +LEFT ANTI JOIN +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + │ remove rows with matches + ▼ +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + ▼ +┌──────────────────────────────┐ +│ LEFT ANTI RESULT │ +├──────────────────────────────┤ +│ 101 | Toronto │ +└──────────────────────────────┘ + +RIGHT ANTI JOIN +┌──────────────────────────────┐ +│ purchase_records │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 102 | Donut | 3000 │ +│ 103 | Coffee | 6000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ + │ remove rows with VIP matches + ▼ +┌──────────────────────────────┐ +│ vip_info │ +├──────────────────────────────┤ +│ 101 | Toronto │ +│ 102 | Quebec │ +│ 103 | Vancouver │ +└──────────────────────────────┘ + ▼ +┌──────────────────────────────┐ +│ RIGHT ANTI RESULT │ +├──────────────────────────────┤ +│ 100 | Croissant | 2000 │ +│ 106 | Soda | 4000 │ +└──────────────────────────────┘ +``` + +### Syntax + +```sql +-- Left Anti Join +SELECT select_list +FROM table_a +LEFT ANTI JOIN table_b + ON join_condition; + +-- Right Anti Join +SELECT select_list +FROM table_a +RIGHT ANTI JOIN table_b + ON join_condition; +``` + +### Examples + +Left anti join—VIP clients with no purchases: + +```sql +SELECT * +FROM vip_info +LEFT ANTI JOIN purchase_records + ON vip_info.client_id = purchase_records.client_id; +``` + +Result: + +```text ++-----------+---------+ +| client_id | region | ++-----------+---------+ +| 101 | Toronto | ++-----------+---------+ +``` + +Right anti join—purchase records that do not belong to a VIP client: + +```sql +SELECT * +FROM vip_info +RIGHT ANTI JOIN purchase_records + ON vip_info.client_id = purchase_records.client_id; +``` + +Result: + +```text ++-----------+-----------+------+ +| client_id | item | qty | ++-----------+-----------+------+ +| 100 | Croissant | 2000 | +| 106 | Soda | 4000 | ++-----------+-----------+------+ +``` + +## Asof Join + +An ASOF (Approximate Sort-Merge) join matches each row in a left-ordered stream to the most recent row on the right whose timestamp is **less than or equal to** the left timestamp. Optional equality predicates (for keys such as `symbol`) can further constrain the match. ASOF joins power analytics like attaching the latest quote to each trade. + +Think of ASOF as "give me the latest contextual row that happened **before or at** this event." + +### Matching Rules + +1. Partition both tables by the equality keys (for example, `symbol`). +2. Within each partition, ensure both tables are sorted by the inequality column (for example, `time`). +3. When visiting a left row, attach the latest right row whose timestamp is `<=` the left timestamp; if none exists, the right columns are `NULL`. 
+
+### Quick Example (Room Temperature vs HVAC Mode)
+
+```text
+┌──────────────────────────────┐
+│ sensor_readings (left table) │
+├──────────────────────────────┤
+│ room | time  | temperature   │
+│ LR   | 09:55 | 22.8C         │
+│ LR   | 10:00 | 23.1C         │
+│ LR   | 10:05 | 23.3C         │
+│ LR   | 10:10 | 23.8C         │
+│ LR   | 10:15 | 24.0C         │
+└──────────────────────────────┘
+
+┌──────────────────────────────┐
+│ hvac_mode (right table)      │
+├──────────────────────────────┤
+│ room | time  | mode          │
+│ LR   | 09:58 | Cooling       │
+│ LR   | 10:06 | Fan           │
+│ LR   | 10:30 | Heating       │
+└──────────────────────────────┘
+
+┌────────────────────────────────────────────────────────────┐
+│ Result of ASOF JOIN ON r.room = m.room                     │
+│        AND r.reading_time >= m.mode_time                   │
+├────────────────────────────────────────────────────────────┤
+│ 10:00 reading -> matches 09:58 mode (latest <= 10:00)      │
+│ 10:05 reading -> still matches 09:58 (no newer mode yet)   │
+│ 10:10 reading -> matches 10:06 mode                        │
+│ 10:15 reading -> matches 10:06 mode                        │
+│ 09:55 reading -> no row (ASOF behaves like INNER JOIN)     │
+└────────────────────────────────────────────────────────────┘
+```
+
+In LEFT ASOF joins every sensor reading stays (for example, the 09:55 reading keeps `NULL` because no HVAC mode has started yet). In RIGHT ASOF joins you keep all HVAC changes (even if no reading has happened yet to reference them).
+
+### Syntax
+
+```sql
+SELECT select_list
+FROM table_a
+ASOF [LEFT | RIGHT] JOIN table_b
+    ON table_a.time >= table_b.time
+    [AND table_a.key = table_b.key];
+```
+
+### Example Tables
+
+Run the following once to reproduce the HVAC scenario shown below:
+
+```sql
+CREATE OR REPLACE TABLE sensor_readings (
+    room VARCHAR,
+    reading_time TIMESTAMP,
+    temperature DOUBLE
+);
+INSERT INTO sensor_readings VALUES
+    ('LR', '2024-01-01 09:55:00', 22.8),
+    ('LR', '2024-01-01 10:00:00', 23.1),
+    ('LR', '2024-01-01 10:05:00', 23.3),
+    ('LR', '2024-01-01 10:10:00', 23.8),
+    ('LR', '2024-01-01 10:15:00', 24.0);
+
+CREATE OR REPLACE TABLE hvac_mode (
+    room VARCHAR,
+    mode_time TIMESTAMP,
+    mode VARCHAR
+);
+INSERT INTO hvac_mode VALUES
+    ('LR', '2024-01-01 09:58:00', 'Cooling'),
+    ('LR', '2024-01-01 10:06:00', 'Fan'),
+    ('LR', '2024-01-01 10:30:00', 'Heating');
+```
+
+### Examples
+
+Match each temperature reading with the latest HVAC mode that started before it:
+
+```sql
+SELECT r.reading_time, r.temperature, m.mode
+FROM sensor_readings AS r
+ASOF JOIN hvac_mode AS m
+    ON r.room = m.room
+    AND r.reading_time >= m.mode_time
+ORDER BY r.reading_time;
+```
+
+Result:
+
+```text
+┌─────────────────────┬─────────────┬────────────┐
+│ reading_time        │ temperature │ mode       │
+├─────────────────────┼─────────────┼────────────┤
+│ 2024-01-01 10:00:00 │ 23.1C       │ Cooling    │
+│ 2024-01-01 10:05:00 │ 23.3C       │ Cooling    │
+│ 2024-01-01 10:10:00 │ 23.8C       │ Fan        │
+│ 2024-01-01 10:15:00 │ 24.0C       │ Fan        │
+└─────────────────────┴─────────────┴────────────┘
+```
+
+ASOF left join—keep all sensor readings even if no HVAC mode was active yet:
+
+```sql
+SELECT r.reading_time, r.temperature, m.mode
+FROM sensor_readings AS r
+ASOF LEFT JOIN hvac_mode AS m
+    ON r.room = m.room
+    AND r.reading_time >= m.mode_time
+ORDER BY r.reading_time;
+```
+
+Result:
+
+```text
+┌─────────────────────┬─────────────┬────────────┐
+│ reading_time        │ temperature │ mode       │
+├─────────────────────┼─────────────┼────────────┤
+│ 2024-01-01 09:55:00 │ 22.8C       │ NULL       │  ← before first HVAC mode
+│ 2024-01-01 10:00:00 │ 23.1C       │ Cooling    │
+│ 2024-01-01 10:05:00 │ 23.3C       │ Cooling    │
+│ 2024-01-01 10:10:00 │ 23.8C       │ Fan        │
+│ 2024-01-01 10:15:00 │ 24.0C       │ Fan        │
+└─────────────────────┴─────────────┴────────────┘
+```
+
+ASOF right 
join—keep all HVAC mode changes even if no later sensor reading references them:
+
+```sql
+SELECT r.reading_time, r.temperature, m.mode_time, m.mode
+FROM sensor_readings AS r
+ASOF RIGHT JOIN hvac_mode AS m
+    ON r.room = m.room
+    AND r.reading_time >= m.mode_time
+ORDER BY m.mode_time, r.reading_time;
+```
+
+Result:
+
+```text
+┌─────────────────────┬─────────────┬─────────────────────┬────────────┐
+│ reading_time        │ temperature │ mode_time           │ mode       │
+├─────────────────────┼─────────────┼─────────────────────┼────────────┤
+│ 2024-01-01 10:00:00 │ 23.1C       │ 2024-01-01 09:58:00 │ Cooling    │
+│ 2024-01-01 10:05:00 │ 23.3C       │ 2024-01-01 09:58:00 │ Cooling    │
+│ 2024-01-01 10:10:00 │ 23.8C       │ 2024-01-01 10:06:00 │ Fan        │
+│ 2024-01-01 10:15:00 │ 24.0C       │ 2024-01-01 10:06:00 │ Fan        │
+│ NULL                │ NULL        │ 2024-01-01 10:30:00 │ Heating    │  ← waiting for reading
+└─────────────────────┴─────────────┴─────────────────────┴────────────┘
+```
+
+Multiple readings can land in the same HVAC interval, so a RIGHT ASOF join can emit more than one row per mode; the final `NULL` row shows the newly scheduled `Heating` mode that has not yet matched a reading.
diff --git a/tidb-cloud-lake/sql/jq.md b/tidb-cloud-lake/sql/jq.md
new file mode 100644
index 0000000000000..1bc946710ee1c
--- /dev/null
+++ b/tidb-cloud-lake/sql/jq.md
@@ -0,0 +1,93 @@
+---
+title: JQ
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+The JQ function is a set-returning SQL function that allows you to apply [jq](https://jqlang.github.io/jq/) filters to JSON data stored in VARIANT columns. With this function, you can process JSON data by applying a specified jq filter, returning the results as a set of rows.
+
+## Syntax
+
+```sql
+JQ(<jq_expression>, <json_data>)
+```
+
+| Parameter | Description |
+|-----------|-------------|
+| `jq_expression` | A `jq` filter expression that defines how to process and transform JSON data using the `jq` syntax. This expression can specify how to select, modify, and manipulate data within JSON objects and arrays. For information on the syntax, filters, and functions supported by jq, please refer to the [jq Manual](https://jqlang.github.io/jq/manual/#basic-filters). |
+| `json_data` | The JSON-formatted input that you want to process or transform using the `jq` filter expression. It can be a JSON object, array, or any valid JSON data structure. |
+
+## Return Type
+
+The JQ function returns a set of JSON values, where each value corresponds to an element of the transformed or extracted result based on the `<jq_expression>`.
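+
+Because JQ is set-returning, a filter that yields several values produces several output rows. A minimal illustration (a standalone sketch, not tied to the table used in the examples below):
+
+```sql
+-- The '.tags[]' filter emits one value per array element,
+-- so this query should return one row per tag: "a", "b", "c"
+SELECT jq('.tags[]', PARSE_JSON('{"tags": ["a", "b", "c"]}'));
+```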
+ +## Examples + +To start, we create a table named `customer_data` with columns for `id` and `profile`, where `profile` is a JSON type to store user information: + +```sql +CREATE TABLE customer_data ( + id INT, + profile JSON +); + +INSERT INTO customer_data VALUES + (1, '{"name": "Alice", "age": 30, "city": "New York"}'), + (2, '{"name": "Bob", "age": 25, "city": "Los Angeles"}'), + (3, '{"name": "Charlie", "age": 35, "city": "Chicago"}'); +``` + +This example extracts specific fields from the JSON data: + +```sql +SELECT + id, + jq('.name', profile) AS customer_name +FROM + customer_data; + +┌─────────────────────────────────────┐ +│ id │ customer_name │ +├─────────────────┼───────────────────┤ +│ 1 │ "Alice" │ +│ 2 │ "Bob" │ +│ 3 │ "Charlie" │ +└─────────────────────────────────────┘ +``` + +This example selects the user ID and the age incremented by 1 for each user: + +```sql +SELECT + id, + jq('.age + 1', profile) AS updated_age +FROM + customer_data; + +┌─────────────────────────────────────┐ +│ id │ updated_age │ +├─────────────────┼───────────────────┤ +│ 1 │ 31 │ +│ 2 │ 26 │ +│ 3 │ 36 │ +└─────────────────────────────────────┘ +``` + +This example converts city names to uppercase: + +```sql +SELECT + id, + jq('.city | ascii_upcase', profile) AS city_uppercase +FROM + customer_data; + +┌─────────────────────────────────────┐ +│ id │ city_uppercase │ +├─────────────────┼───────────────────┤ +│ 1 │ "NEW YORK" │ +│ 2 │ "LOS ANGELES" │ +│ 3 │ "CHICAGO" │ +└─────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/json-array-agg.md b/tidb-cloud-lake/sql/json-array-agg.md new file mode 100644 index 0000000000000..ca363aac0df59 --- /dev/null +++ b/tidb-cloud-lake/sql/json-array-agg.md @@ -0,0 +1,57 @@ +--- +title: JSON_ARRAY_AGG +title_includes: JSON_AGG +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts values into a JSON array while skipping NULLs. + +See also: [JSON_OBJECT_AGG](/tidb-cloud-lake/sql/json-object-agg.md) + +## Syntax + +```sql +JSON_ARRAY_AGG() +``` + +## Return Type + +JSON array. + +## Examples + +This example demonstrates how JSON_ARRAY_AGG aggregates values from each column into JSON arrays: + +```sql +CREATE TABLE d ( + a DECIMAL(10, 2), + b STRING, + c INT, + d VARIANT, + e ARRAY(STRING) +); + +INSERT INTO d VALUES + (20, 'abc', NULL, '{"k":"v"}', ['a','b']), + (10, 'de', 100, 'null', []), + (4.23, NULL, 200, '"uvw"', ['x','y']), + (5.99, 'xyz', 300, '[1,2,3]', ['z']); + +SELECT + json_array_agg(a) AS aggregated_a, + json_array_agg(b) AS aggregated_b, + json_array_agg(c) AS aggregated_c, + json_array_agg(d) AS aggregated_d, + json_array_agg(e) AS aggregated_e +FROM d; + +-[ RECORD 1 ]----------------------------------- +aggregated_a: [20.0,10.0,4.23,5.99] +aggregated_b: ["abc","de","xyz"] +aggregated_c: [100,200,300] +aggregated_d: [{"k":"v"},null,"uvw",[1,2,3]] +aggregated_e: [["a","b"],[],["x","y"],["z"]] +``` diff --git a/tidb-cloud-lake/sql/json-array-elements.md b/tidb-cloud-lake/sql/json-array-elements.md new file mode 100644 index 0000000000000..81027dc97a1fd --- /dev/null +++ b/tidb-cloud-lake/sql/json-array-elements.md @@ -0,0 +1,66 @@ +--- +title: JSON_ARRAY_ELEMENTS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Extracts the elements from a JSON array, returning them as individual rows in the result set. JSON_ARRAY_ELEMENTS does not recursively expand nested arrays; it treats them as single elements. 
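+
+For example, a nested array stays a single element rather than being flattened (a small sketch of the behavior described above):
+
+```sql
+-- Returns three rows: 1, [2,3], 4 — the inner array is not expanded
+SELECT JSON_ARRAY_ELEMENTS(PARSE_JSON('[1, [2, 3], 4]'));
+```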
+ + + +## Syntax + +```sql +JSON_ARRAY_ELEMENTS() +``` + +## Return Type + +JSON_ARRAY_ELEMENTS returns a set of VARIANT values, each representing an element extracted from the input JSON array. + +## Examples + +```sql +-- Extract individual elements from a JSON array containing product information +SELECT + JSON_ARRAY_ELEMENTS( + PARSE_JSON ( + '[ + {"product": "Laptop", "brand": "Apple", "price": 1500}, + {"product": "Smartphone", "brand": "Samsung", "price": 800}, + {"product": "Headphones", "brand": "Sony", "price": 150} +]' + ) + ); + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ json_array_elements(parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]')) │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ {"brand":"Apple","price":1500,"product":"Laptop"} │ +│ {"brand":"Samsung","price":800,"product":"Smartphone"} │ +│ {"brand":"Sony","price":150,"product":"Headphones"} │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Display data types of the extracted elements +SELECT + TYPEOF ( + JSON_ARRAY_ELEMENTS( + PARSE_JSON ( + '[ + {"product": "Laptop", "brand": "Apple", "price": 1500}, + {"product": "Smartphone", "brand": "Samsung", "price": 800}, + {"product": "Headphones", "brand": "Sony", "price": 150} +]' + ) + ) + ); + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ typeof(json_array_elements(parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]'))) │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ VARIANT NULL │ +│ VARIANT NULL │ +│ VARIANT NULL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-array-transform.md b/tidb-cloud-lake/sql/json-array-transform.md new file mode 100644 index 0000000000000..be14b53568711 --- /dev/null +++ b/tidb-cloud-lake/sql/json-array-transform.md @@ -0,0 +1,33 @@ +--- +title: JSON_ARRAY_TRANSFORM +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Transforms each element of a JSON array using a specified transformation Lambda expression. 
For more information about Lambda expression, see [Lambda Expressions](/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md#lambda-expressions). + +## Syntax + +```sql +ARRAY_TRANSFORM(, ) +``` + +## Return Type + +JSON array. + +## Examples + +In this example, each numeric element in the array is multiplied by 10, transforming the original array into `[10, 20, 30, 40]`: + +```sql +SELECT ARRAY_TRANSFORM( + [1, 2, 3, 4], + data -> (data::Int * 10) +); + +-[ RECORD 1 ]----------------------------------- +array_transform([1, 2, 3, 4], data -> data::Int32 * 10): [10,20,30,40] +``` diff --git a/tidb-cloud-lake/sql/json-contains-left.md b/tidb-cloud-lake/sql/json-contains-left.md new file mode 100644 index 0000000000000..310d71cd2657f --- /dev/null +++ b/tidb-cloud-lake/sql/json-contains-left.md @@ -0,0 +1,68 @@ +--- +title: JSON_CONTAINS_IN_LEFT +title_includes: JSON_CONTAINS_IN_RIGHT +--- + +Tests containment relationships between two `VARIANT` values: + +- `JSON_CONTAINS_IN_LEFT(left, right)` returns `TRUE` when *left* contains *right* (i.e., *left* is a superset). +- `JSON_CONTAINS_IN_RIGHT(left, right)` returns `TRUE` when *right* contains *left*. + +Containment works for both JSON objects and arrays. + +## Syntax + +```sql +JSON_CONTAINS_IN_LEFT(, ) +JSON_CONTAINS_IN_RIGHT(, ) +``` + +## Return Type + +`BOOLEAN` + +## Examples + +```sql +SELECT JSON_CONTAINS_IN_LEFT(PARSE_JSON('{"a":1,"b":{"c":2}}'), + PARSE_JSON('{"b":{"c":2}}')) AS left_contains; + +┌──────────────┐ +│ left_contains│ +├──────────────┤ +│ true │ +└──────────────┘ +``` + +```sql +SELECT JSON_CONTAINS_IN_LEFT(PARSE_JSON('[1,2,3]'), + PARSE_JSON('[2,3]')) AS left_contains; + +┌──────────────┐ +│ left_contains│ +├──────────────┤ +│ true │ +└──────────────┘ +``` + +```sql +SELECT JSON_CONTAINS_IN_LEFT(PARSE_JSON('[1,2]'), + PARSE_JSON('[2,4]')) AS left_contains; + +┌──────────────┐ +│ left_contains│ +├──────────────┤ +│ false │ +└──────────────┘ +``` + +```sql +SELECT JSON_CONTAINS_IN_RIGHT(PARSE_JSON('{"a":1}'), + PARSE_JSON('{"a":1,"b":2}')) AS right_contains; + +┌───────────────┐ +│ right_contains│ +├───────────────┤ +│ true │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/json-each.md b/tidb-cloud-lake/sql/json-each.md new file mode 100644 index 0000000000000..7107cb47f6835 --- /dev/null +++ b/tidb-cloud-lake/sql/json-each.md @@ -0,0 +1,59 @@ +--- +title: JSON_EACH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Extracts key-value pairs from a JSON object, breaking down the structure into individual rows in the result set. Each row represents a distinct key-value pair derived from the input JSON expression. + +## Syntax + +```sql +JSON_EACH() +``` + +## Return Type + +JSON_EACH returns a set of tuples, each consisting of a STRING key and a corresponding VARIANT value. 
+ +## Examples + +```sql +-- Extract key-value pairs from a JSON object representing information about a person +SELECT + JSON_EACH( + PARSE_JSON ( + '{"name": "John", "age": 25, "isStudent": false, "grades": [90, 85, 92]}' + ) + ); + + +┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ json_each(parse_json('{"name": "john", "age": 25, "isstudent": false, "grades": [90, 85, 92]}')) │ +├──────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ ('age','25') │ +│ ('grades','[90,85,92]') │ +│ ('isStudent','false') │ +│ ('name','"John"') │ +└──────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Display data types of the extracted values +SELECT + TYPEOF ( + JSON_EACH( + PARSE_JSON ( + '{"name": "John", "age": 25, "isStudent": false, "grades": [90, 85, 92]}' + ) + ) + ); + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ typeof(json_each(parse_json('{"name": "john", "age": 25, "isstudent": false, "grades": [90, 85, 92]}'))) │ +├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ TUPLE(STRING, VARIANT) NULL │ +│ TUPLE(STRING, VARIANT) NULL │ +│ TUPLE(STRING, VARIANT) NULL │ +│ TUPLE(STRING, VARIANT) NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-exists-key.md b/tidb-cloud-lake/sql/json-exists-key.md new file mode 100644 index 0000000000000..4f6e79fe564e8 --- /dev/null +++ b/tidb-cloud-lake/sql/json-exists-key.md @@ -0,0 +1,64 @@ +--- +title: JSON_EXISTS_KEY +title_includes: JSON_EXISTS_ANY_KEYS, JSON_EXISTS_ALL_KEYS +--- + +Checks whether a JSON object contains one or more keys. + +- `JSON_EXISTS_KEY` tests a single key. +- `JSON_EXISTS_ANY_KEYS` accepts an array of keys and returns `TRUE` when at least one key exists. +- `JSON_EXISTS_ALL_KEYS` returns `TRUE` only when every key in the array exists. + +## Syntax + +```sql +JSON_EXISTS_KEY(, ) +JSON_EXISTS_ANY_KEYS(, ) +JSON_EXISTS_ALL_KEYS(, ) +``` + +## Return Type + +`BOOLEAN` + +## Examples + +```sql +SELECT JSON_EXISTS_KEY(PARSE_JSON('{"a":1,"b":2}'), 'b') AS has_b; + +┌──────┐ +│ has_b│ +├──────┤ +│ true │ +└──────┘ +``` + +```sql +SELECT JSON_EXISTS_ANY_KEYS(PARSE_JSON('{"a":1,"b":2}'), ['x','b']) AS any_key; + +┌────────┐ +│ any_key│ +├────────┤ +│ true │ +└────────┘ +``` + +```sql +SELECT JSON_EXISTS_ALL_KEYS(PARSE_JSON('{"a":1,"b":2}'), ['a','b','c']) AS all_keys; + +┌────────┐ +│ all_keys│ +├────────┤ +│ false │ +└────────┘ +``` + +```sql +SELECT JSON_EXISTS_ALL_KEYS(PARSE_JSON('{"a":1,"b":2}'), ['a','b']) AS all_keys; + +┌────────┐ +│ all_keys│ +├────────┤ +│ true │ +└────────┘ +``` diff --git a/tidb-cloud-lake/sql/json-extract-path-text.md b/tidb-cloud-lake/sql/json-extract-path-text.md new file mode 100644 index 0000000000000..76f20905c884b --- /dev/null +++ b/tidb-cloud-lake/sql/json-extract-path-text.md @@ -0,0 +1,56 @@ +--- +title: JSON_EXTRACT_PATH_TEXT +--- + +Extracts value from a Json string by `path_name`. +The value is returned as a `String` or `NULL` if either of the arguments is `NULL`. +This function is equivalent to `to_varchar(GET_PATH(PARSE_JSON(JSON), PATH_NAME))`. 
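+
+A quick way to see the equivalence stated above (a sketch; both statements should return the string `v`):
+
+```sql
+-- Direct path extraction as text
+SELECT JSON_EXTRACT_PATH_TEXT('{"k": "v"}', 'k');
+
+-- The equivalent composition spelled out
+SELECT TO_VARCHAR(GET_PATH(PARSE_JSON('{"k": "v"}'), 'k'));
+```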
+ +## Syntax + +```sql +JSON_EXTRACT_PATH_TEXT( , ) +``` + +## Arguments + +| Arguments | Description | +|---------------|------------------------------------------------------------------| +| `` | The Json String value | +| `` | The String value that consists of a concatenation of field names | + +## Return Type + +String + +## Examples + +```sql +SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k1[0]'); ++-------------------------------------------------------------------------+ +| json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k1[0]') | ++-------------------------------------------------------------------------+ +| 0 | ++-------------------------------------------------------------------------+ + +SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2:k3'); ++-------------------------------------------------------------------------+ +| json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2:k3') | ++-------------------------------------------------------------------------+ +| 3 | ++-------------------------------------------------------------------------+ + +SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4'); ++-------------------------------------------------------------------------+ +| json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4') | ++-------------------------------------------------------------------------+ +| 4 | ++-------------------------------------------------------------------------+ + +SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k5'); ++-------------------------------------------------------------------------+ +| json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k5') | ++-------------------------------------------------------------------------+ +| NULL | ++-------------------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/json-functions.md b/tidb-cloud-lake/sql/json-functions.md new file mode 100644 index 0000000000000..de8b0204e78b1 --- /dev/null +++ b/tidb-cloud-lake/sql/json-functions.md @@ -0,0 +1,63 @@ +--- +title: JSON Functions +--- + +This section provides reference information for JSON functions in Databend. JSON functions enable parsing, validation, querying, and manipulation of JSON data structures. 
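+
+Most of the functions below compose naturally. A small, illustrative query that combines a few of them (each function is documented on its own page):
+
+```sql
+SELECT
+    JSON_TYPEOF(PARSE_JSON('{"name":"John","age":30}'))               AS value_type, -- 'object'
+    JSON_EXTRACT_PATH_TEXT('{"name":"John","age":30}', 'name')        AS name,       -- 'John'
+    JSON_PATH_EXISTS(PARSE_JSON('{"name":"John","age":30}'), '$.age') AS has_age;    -- true
+```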
+
+## JSON Parsing & Validation
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [PARSE_JSON](/tidb-cloud-lake/sql/parse-json.md) | Parses a JSON string into a variant value | `PARSE_JSON('{"name":"John","age":30}')` → `{"name":"John","age":30}` |
+| [CHECK_JSON](/tidb-cloud-lake/sql/check-json.md) | Validates if a string is valid JSON | `CHECK_JSON('{"valid": true}')` → `true` |
+
+## JSON Type Information
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [JSON_TYPEOF](/tidb-cloud-lake/sql/json-typeof.md) | Returns the type of a JSON value | `JSON_TYPEOF('{"key": "value"}')` → `'object'` |
+
+## JSON Conversion
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [JSON_TO_STRING](/tidb-cloud-lake/sql/json-to-string.md) | Converts a JSON value to a string | `JSON_TO_STRING({"name":"John"})` → `'{"name":"John"}'` |
+
+## JSON Path Operations
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [JSON_PATH_EXISTS](/tidb-cloud-lake/sql/json-path-exists.md) | Checks if a JSON path exists | `JSON_PATH_EXISTS('{"a":1}', '$.a')` → `true` |
+| [JSON_PATH_MATCH](/tidb-cloud-lake/sql/json-path-match.md) | Checks whether a path predicate holds within the JSON data | `JSON_PATH_MATCH('{"a":1,"b":2}', '$.a == 1')` → `true` |
+| [JSON_PATH_QUERY](/tidb-cloud-lake/sql/json-path-query.md) | Queries JSON data using JSONPath | `JSON_PATH_QUERY('{"a":1,"b":2}', '$.a')` → `1` |
+| [JSON_PATH_QUERY_ARRAY](/tidb-cloud-lake/sql/json-path-query-array.md) | Queries JSON data and returns results as an array | `JSON_PATH_QUERY_ARRAY('[1,2,3]', '$[*]')` → `[1,2,3]` |
+| [JSON_PATH_QUERY_FIRST](/tidb-cloud-lake/sql/json-path-query-first.md) | Returns the first result from a JSON path query | `JSON_PATH_QUERY_FIRST('[1,2,3]', '$[*]')` → `1` |
+
+## JSON Data Extraction
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [GET](/tidb-cloud-lake/sql/get.md) | Extracts value from JSON by index or field name | `GET('{"name":"John"}', 'name')` → `"John"` |
+| [GET_IGNORE_CASE](/tidb-cloud-lake/sql/get-ignore-case.md) | Extracts value with case-insensitive field matching | `GET_IGNORE_CASE('{"Name":"John"}', 'name')` → `"John"` |
+| [GET_BY_KEYPATH](/tidb-cloud-lake/sql/get-by-keypath.md) | Extracts nested value using brace key paths | `GET_BY_KEYPATH('{"user":{"name":"Ada"}}', '{user,name}')` → `"Ada"` |
+| [GET_PATH](/tidb-cloud-lake/sql/get-path.md) | Extracts value using path notation | `GET_PATH('{"user":{"name":"John"}}', 'user.name')` → `"John"` |
+| [JSON_EXTRACT_PATH_TEXT](/tidb-cloud-lake/sql/json-extract-path-text.md) | Extracts text value from JSON using path | `JSON_EXTRACT_PATH_TEXT('{"name":"John"}', 'name')` → `'John'` |
+| [JSON_EACH](/tidb-cloud-lake/sql/json-each.md) | Expands JSON object into key-value pairs | `JSON_EACH('{"a":1,"b":2}')` → `[("a",1),("b",2)]` |
+| [JSON_ARRAY_ELEMENTS](/tidb-cloud-lake/sql/json-array-elements.md) | Expands JSON array into individual elements | `JSON_ARRAY_ELEMENTS('[1,2,3]')` → `1, 2, 3` |
+
+## JSON Formatting & Processing
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [JSON_PRETTY](/tidb-cloud-lake/sql/json-pretty.md) | Formats JSON with proper indentation | `JSON_PRETTY('{"a":1}')` → Formatted JSON string |
+| [STRIP_NULL_VALUE](/tidb-cloud-lake/sql/strip-null-value.md) | Removes null values from JSON | `STRIP_NULL_VALUE('{"a":1,"b":null}')` → `{"a":1}` |
+| [JQ](/tidb-cloud-lake/sql/jq.md) | Processes JSON using jq-style queries | `JQ('.name', '{"name":"John"}')` → `"John"` |
+
+## JSON Containment & Existence
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| [JSON_CONTAINS_IN_LEFT](/tidb-cloud-lake/sql/json-contains-left.md) | Tests whether the left JSON contains the right JSON | `JSON_CONTAINS_IN_LEFT('{"a":1,"b":2}', '{"b":2}')` → `true` |
+| [JSON_EXISTS_KEY](/tidb-cloud-lake/sql/json-exists-key.md) | Checks whether specific keys exist | `JSON_EXISTS_KEY('{"a":1}', 'a')` → `true` |
+| [JSON_EXISTS_ANY_KEYS](/tidb-cloud-lake/sql/json-exists-key.md) | Returns `true` if any key in the list exists | `JSON_EXISTS_ANY_KEYS('{"a":1}', ['x','a'])` → `true` |
+| [JSON_EXISTS_ALL_KEYS](/tidb-cloud-lake/sql/json-exists-key.md) | Returns `true` only if all keys exist | `JSON_EXISTS_ALL_KEYS('{"a":1,"b":2}', ['a','b'])` → `true` |
diff --git a/tidb-cloud-lake/sql/json-object-agg.md b/tidb-cloud-lake/sql/json-object-agg.md
new file mode 100644
index 0000000000000..4e133feebcb7c
--- /dev/null
+++ b/tidb-cloud-lake/sql/json-object-agg.md
@@ -0,0 +1,60 @@
+---
+title: JSON_OBJECT_AGG
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Converts key-value pairs into a JSON object. For each row in the input, it generates a key-value pair where the key is derived from the `<key_expression>` and the value is derived from the `<value_expression>`. These key-value pairs are then combined into a single JSON object.
+
+See also: [JSON_ARRAY_AGG](/tidb-cloud-lake/sql/json-array-agg.md)
+
+## Syntax
+
+```sql
+JSON_OBJECT_AGG(<key_expression>, <value_expression>)
+```
+
+| Parameter | Description |
+|-----------|-------------|
+| key_expression | Specifies the key in the JSON object. **Only supports string** expressions. If the `key_expression` evaluates to NULL, the key-value pair is skipped. |
+| value_expression | Specifies the value in the JSON object. It can be any supported data type. If the `value_expression` evaluates to NULL, the key-value pair is skipped. |
+
+## Return Type
+
+JSON object.
+
+## Examples
+
+This example demonstrates how JSON_OBJECT_AGG can be used to aggregate different types of data—such as decimals, integers, JSON variants, and arrays—into JSON objects, with the column b as the key for each JSON object:
+
+```sql
+CREATE TABLE d (
+    a DECIMAL(10, 2),
+    b STRING,
+    c INT,
+    d VARIANT,
+    e ARRAY(STRING)
+);
+
+INSERT INTO d VALUES
+    (20, 'abc', NULL, '{"k":"v"}', ['a','b']),
+    (10, 'de', 100, 'null', []),
+    (4.23, NULL, 200, '"uvw"', ['x','y']),
+    (5.99, 'xyz', 300, '[1,2,3]', ['z']);
+
+SELECT
+    json_object_agg(b, a) AS json_a,
+    json_object_agg(b, c) AS json_c,
+    json_object_agg(b, d) AS json_d,
+    json_object_agg(b, e) AS json_e
+FROM
+    d;
+
+-[ RECORD 1 ]-----------------------------------
+json_a: {"abc":20.0,"de":10.0,"xyz":5.99}
+json_c: {"de":100,"xyz":300}
+json_d: {"abc":{"k":"v"},"de":null,"xyz":[1,2,3]}
+json_e: {"abc":["a","b"],"de":[],"xyz":["z"]}
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/json-operators.md b/tidb-cloud-lake/sql/json-operators.md
new file mode 100644
index 0000000000000..d7c226ceccb12
--- /dev/null
+++ b/tidb-cloud-lake/sql/json-operators.md
@@ -0,0 +1,25 @@
+---
+title: JSON Operators
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+| Operator | Description | Example | Result |
+|----------|-------------|---------|--------|
+| `->` | Retrieves a JSON array or object using an index or key, returning a JSON object. | - **Using a key**:<br/>`SELECT '{"Databend": "Cloud Native Warehouse"}'::JSON -> 'Databend'`<br/>- **Using an index**:<br/>`SELECT '["Databend", "Cloud Native Warehouse"]'::JSON -> 1` | `"Cloud Native Warehouse"` |
+| `->>` | Retrieves a JSON array or object using an index or key, returning a string. | - **Using a key**:<br/>`SELECT '{"Databend": "Cloud Native Warehouse"}'::JSON ->> 'Databend'`<br/>- **Using an index**:<br/>`SELECT '["Databend", "Cloud Native Warehouse"]'::JSON ->> 1` | `Cloud Native Warehouse` |
+| `#>` | Retrieves a JSON array or object by specifying a key path, returning a JSON object. | `SELECT '{"example": {"Databend": "Cloud Native Warehouse"}}'::JSON #> '{example, Databend}'` | `"Cloud Native Warehouse"` |
+| `#>>` | Retrieves a JSON array or object by specifying a key path, returning a string. | `SELECT '{"example": {"Databend": "Cloud Native Warehouse"}}'::JSON #>> '{example, Databend}'` | `Cloud Native Warehouse` |
+| `?` | Checks if the given string exists in a JSON object as a key or array, returning 1 for true and 0 for false. | `SELECT '{"a":1,"b":2,"c":3}'::JSON ? 'b'` | `true` |
+| `?\|` | Checks if any string in the given array exists as a key or array element, returning 1 for true and 0 for false. | `SELECT '{"a":1,"b":2,"c":3}'::JSON ?\| ['b','e']` | `true` |
+| `?&` | Checks if each string in the given array exists as a key or array element, returning 1 for true and 0 for false. | `SELECT '{"a":1,"b":2,"c":3}'::JSON ?& ['b','e']` | `false` |
+| `@>` | Checks if the left JSON expression contains all key-value pairs of the right JSON expression, returning 1 for true and 0 for false. | `SELECT '{"name":"Alice","age":30}'::JSON @> '{"name":"Alice"}'::JSON` | `true` |
+| `<@` | Checks if the left JSON expression is a subset of the right JSON expression, returning 1 for true and 0 for false. | `SELECT '{"name":"Alice"}'::JSON <@ '{"name":"Bob"}'::JSON` | `false` |
+| `@@` | Checks whether a specified JSON path expression matches certain conditions within a JSON data, returning 1 for true and 0 for false. | `SELECT '{"a":1,"b":[1,2,3]}'::JSON @@ '$.a == 1'` | `true` |
+| `@?` | Checks whether any item is returned by the JSON path expression for the specified JSON value, returning 1 for true and 0 for false. | `SELECT '{"a":1,"b":[1,2,3]}'::JSON @? '$.b[3]'` | `false` |
+| `- '<key>'` | Deletes a key-value pair from a JSON object. | `SELECT '{"a":1,"b":2}'::JSON - 'a'` | `{"b":2}` |
+| `- <index>` | Deletes an element at the specified index (negative integers counting from the end) from an array. | `SELECT '[1,2,3]'::JSON - 2` | `[1,2]` |
+| `#-` | Deletes a key-value pair or an array element by key and/or index. | `SELECT '{"a":1,"b":[1,2,3]}'::JSON #- '{b,2}'` | `{"a":1,"b":[1,2]}` |
+| `\|\|` | Combines multiple JSON objects into one. | `SELECT '{"a": 1}'::JSON \|\| '{"B": 1}'::JSON` | `{"B":1,"a":1}` |
diff --git a/tidb-cloud-lake/sql/json-path-exists.md b/tidb-cloud-lake/sql/json-path-exists.md
new file mode 100644
index 0000000000000..faca203f1e15f
--- /dev/null
+++ b/tidb-cloud-lake/sql/json-path-exists.md
@@ -0,0 +1,51 @@
+---
+title: JSON_PATH_EXISTS
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Checks whether a specified path exists in JSON data.
+
+## Syntax
+
+```sql
+JSON_PATH_EXISTS(<json_data>, <json_path_expression>)
+```
+
+- json_data: Specifies the JSON data you want to search within. It can be a JSON object or an array.
+
+- json_path_expression: Specifies the path, starting from the root of the JSON data represented by `$`, that you want to check within the JSON data. You can also include conditions within the expression, using `@` to refer to the current node or element being evaluated, to filter the results.
+
+## Return Type
+
+The function returns:
+
+- `true` if the specified JSON path (and conditions if any) exists within the JSON data.
+- `false` if the specified JSON path (and conditions if any) does not exist within the JSON data.
+- NULL if either the json_data or json_path_expression is NULL or invalid. + +## Examples + +```sql +SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), '$.a ? (@ == 1)'); + +---- +true + + +SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), '$.a ? (@ > 1)'); + +---- +false + +SELECT JSON_PATH_EXISTS(NULL, '$.a'); + +---- +NULL + +SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), NULL); + +---- +NULL +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-path-match.md b/tidb-cloud-lake/sql/json-path-match.md new file mode 100644 index 0000000000000..cd2f0043c2bf8 --- /dev/null +++ b/tidb-cloud-lake/sql/json-path-match.md @@ -0,0 +1,74 @@ +--- +title: JSON_PATH_MATCH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Checks whether a specified JSON path expression matches certain conditions within a JSON data. Please note that the `@@` operator is synonymous with this function. For more information, see [JSON Operators](/tidb-cloud-lake/sql/json-operators.md). + +## Syntax + +```sql +JSON_PATH_MATCH(, ) +``` + +- `json_data`: Specifies the JSON data you want to examine. It can be a JSON object or an array. +- `json_path_expression`: Specifies the conditions to be checked within the JSON data. This expression describes the specific path or criteria to be matched, such as verifying whether specific field values in the JSON structure meet certain conditions. The `$` symbol represents the root of the JSON data. It is used to start the path expression and indicates the top-level object in the JSON structure. + +## Return Type + +The function returns: + +- `true` if the specified JSON path expression matches the conditions within the JSON data. +- `false` if the specified JSON path expression does not match the conditions within the JSON data. +- NULL if either `json_data` or `json_path_expression` is NULL or invalid. + +## Examples + +```sql +-- Check if the value at JSON path $.a is equal to 1 +SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.a == 1'); + +┌────────────────────────────────────────────────────────────────┐ +│ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), '$.a == 1') │ +├────────────────────────────────────────────────────────────────┤ +│ true │ +└────────────────────────────────────────────────────────────────┘ + +-- Check if the first element in the array at JSON path $.b is greater than 1 +SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1'); + +┌──────────────────────────────────────────────────────────────────┐ +│ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1') │ +├──────────────────────────────────────────────────────────────────┤ +│ false │ +└──────────────────────────────────────────────────────────────────┘ + +-- Check if any element in the array at JSON path $.b +-- from the second one to the last are greater than or equal to 2 +SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[1 to last] >= 2'); + +┌───────────────────────────────────────────────────────────────────────────┐ +│ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[1 to last] >= 2') │ +├───────────────────────────────────────────────────────────────────────────┤ +│ true │ +└───────────────────────────────────────────────────────────────────────────┘ + +-- NULL is returned if either the json_data or json_path_expression is NULL or invalid. 
+SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), NULL); + +┌──────────────────────────────────────────────────────────┐ +│ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), null) │ +├──────────────────────────────────────────────────────────┤ +│ NULL │ +└──────────────────────────────────────────────────────────┘ + +SELECT JSON_PATH_MATCH(NULL, '$.a == 1'); + +┌───────────────────────────────────┐ +│ json_path_match(null, '$.a == 1') │ +├───────────────────────────────────┤ +│ NULL │ +└───────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-path-query-array.md b/tidb-cloud-lake/sql/json-path-query-array.md new file mode 100644 index 0000000000000..db52f6c992cfa --- /dev/null +++ b/tidb-cloud-lake/sql/json-path-query-array.md @@ -0,0 +1,52 @@ +--- +title: JSON_PATH_QUERY_ARRAY +--- + +Get all JSON items returned by JSON path for the specified JSON value and wrap a result into an array. + +## Syntax + +```sql +JSON_PATH_QUERY_ARRAY(, '') +``` + + +## Return Type + +VARIANT + +## Example + +**Create a Table and Insert Sample Data** + +```sql +CREATE TABLE products ( + name VARCHAR, + details VARIANT +); + +INSERT INTO products (name, details) +VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", "storage": "512GB"}}'), + ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": "4GB", "storage": "128GB"}}'), + ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}'); +``` + +**Query Demo: Extracting All Features from Product Details as an Array** + +```sql +SELECT + name, + JSON_PATH_QUERY_ARRAY(details, '$.features.*') AS all_features +FROM + products; +``` + +**Result** + +``` + name | all_features +-----------+----------------------- + Laptop | ["16GB", "512GB"] + Smartphone | ["4GB", "128GB"] + Headphones | ["20h", "5.0"] +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-path-query-first.md b/tidb-cloud-lake/sql/json-path-query-first.md new file mode 100644 index 0000000000000..db420c107d46c --- /dev/null +++ b/tidb-cloud-lake/sql/json-path-query-first.md @@ -0,0 +1,58 @@ +--- +title: JSON_PATH_QUERY_FIRST +--- + +Get the first JSON item returned by JSON path for the specified JSON value. 
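+
+For a quick sense of the behavior before the table-based example below, a minimal sketch:
+
+```sql
+-- Expected: 85 (the first item matched by the path)
+SELECT JSON_PATH_QUERY_FIRST(PARSE_JSON('{"scores": [85, 90, 88]}'), '$.scores[*]');
+```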
+ +## Syntax + +```sql +JSON_PATH_QUERY_FIRST(, '') +``` + + +## Return Type + +VARIANT + +## Example + +**Create a Table and Insert Sample Data** + +```sql +CREATE TABLE products ( + name VARCHAR, + details VARIANT +); + +INSERT INTO products (name, details) +VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", "storage": "512GB"}}'), + ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": "4GB", "storage": "128GB"}}'), + ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}'); +``` + +**Query Demo: Extracting the First Feature from Product Details** + +```sql +SELECT + name, + JSON_PATH_QUERY(details, '$.features.*') AS all_features, + JSON_PATH_QUERY_FIRST(details, '$.features.*') AS first_feature +FROM + products; +``` + +**Result** + +```sql ++------------+--------------+---------------+ +| name | all_features | first_feature | ++------------+--------------+---------------+ +| Laptop | "16GB" | "16GB" | +| Laptop | "512GB" | "16GB" | +| Smartphone | "4GB" | "4GB" | +| Smartphone | "128GB" | "4GB" | +| Headphones | "20h" | "20h" | +| Headphones | "5.0" | "20h" | ++------------+--------------+---------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-path-query.md b/tidb-cloud-lake/sql/json-path-query.md new file mode 100644 index 0000000000000..3370832027856 --- /dev/null +++ b/tidb-cloud-lake/sql/json-path-query.md @@ -0,0 +1,57 @@ +--- +title: JSON_PATH_QUERY +--- + +Get all JSON items returned by JSON path for the specified JSON value. + +## Syntax + +```sql +JSON_PATH_QUERY(, '') +``` + + +## Return Type + +VARIANT + +## Example + +**Create a Table and Insert Sample Data** + +```sql +CREATE TABLE products ( + name VARCHAR, + details VARIANT +); + +INSERT INTO products (name, details) +VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", "storage": "512GB"}}'), + ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": "4GB", "storage": "128GB"}}'), + ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}'); +``` + +**Query Demo: Extracting All Features from Product Details** + +```sql +SELECT + name, + JSON_PATH_QUERY(details, '$.features.*') AS all_features +FROM + products; +``` + +**Result** + +```sql ++------------+--------------+ +| name | all_features | ++------------+--------------+ +| Laptop | "16GB" | +| Laptop | "512GB" | +| Smartphone | "4GB" | +| Smartphone | "128GB" | +| Headphones | "20h" | +| Headphones | "5.0" | ++------------+--------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-pretty.md b/tidb-cloud-lake/sql/json-pretty.md new file mode 100644 index 0000000000000..05a8e8161dd4e --- /dev/null +++ b/tidb-cloud-lake/sql/json-pretty.md @@ -0,0 +1,51 @@ +--- +title: JSON_PRETTY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Formats JSON data, making it more readable and presentable. It automatically adds indentation, line breaks, and other formatting to the JSON data for better visual representation. + +## Syntax + +```sql +JSON_PRETTY() +``` + +## Return Type + +String. 
+ +## Examples + +```sql +SELECT JSON_PRETTY(PARSE_JSON('{"name":"Alice","age":30}')); + +--- +┌──────────────────────────────────────────────────────┐ +│ json_pretty(parse_json('{"name":"alice","age":30}')) │ +│ String │ +├──────────────────────────────────────────────────────┤ +│ { │ +│ "age": 30, │ +│ "name": "Alice" │ +│ } │ +└──────────────────────────────────────────────────────┘ + +SELECT JSON_PRETTY(PARSE_JSON('{"person": {"name": "Bob", "age": 25}, "location": "City"}')); + +--- +┌───────────────────────────────────────────────────────────────────────────────────────┐ +│ json_pretty(parse_json('{"person": {"name": "bob", "age": 25}, "location": "city"}')) │ +│ String │ +├───────────────────────────────────────────────────────────────────────────────────────┤ +│ { │ +│ "location": "City", │ +│ "person": { │ +│ "age": 25, │ +│ "name": "Bob" │ +│ } │ +│ } │ +└───────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-to-string.md b/tidb-cloud-lake/sql/json-to-string.md new file mode 100644 index 0000000000000..afcd0514142d5 --- /dev/null +++ b/tidb-cloud-lake/sql/json-to-string.md @@ -0,0 +1,8 @@ +--- +title: JSON_TO_STRING +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_STRING](/tidb-cloud-lake/sql/to-string.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/json-typeof.md b/tidb-cloud-lake/sql/json-typeof.md new file mode 100644 index 0000000000000..c1e93a69e08fc --- /dev/null +++ b/tidb-cloud-lake/sql/json-typeof.md @@ -0,0 +1,74 @@ +--- +title: JSON_TYPEOF +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the type of the main-level of a JSON structure. + +## Syntax + +```sql +JSON_TYPEOF() +``` + +## Return Type + +The return type of the json_typeof function (or similar) is a string that indicates the data type of the parsed JSON value. The possible return values are: 'null', 'boolean', 'string', 'number', 'array', and 'object'. + +## Examples + +```sql +-- Parsing a JSON value that is NULL +SELECT JSON_TYPEOF(PARSE_JSON(NULL)); + +-- +json_typeof(parse_json(null))| +-----------------------------+ + | + +-- Parsing a JSON value that is the string 'null' +SELECT JSON_TYPEOF(PARSE_JSON('null')); + +-- +json_typeof(parse_json('null'))| +-------------------------------+ +null | + +SELECT JSON_TYPEOF(PARSE_JSON('true')); + +-- +json_typeof(parse_json('true'))| +-------------------------------+ +boolean | + +SELECT JSON_TYPEOF(PARSE_JSON('"Databend"')); + +-- +json_typeof(parse_json('"databend"'))| +-------------------------------------+ +string | + + +SELECT JSON_TYPEOF(PARSE_JSON('-1.23')); + +-- +json_typeof(parse_json('-1.23'))| +--------------------------------+ +number | + +SELECT JSON_TYPEOF(PARSE_JSON('[1,2,3]')); + +-- +json_typeof(parse_json('[1,2,3]'))| +----------------------------------+ +array | + +SELECT JSON_TYPEOF(PARSE_JSON('{"name": "Alice", "age": 30}')); + +-- +json_typeof(parse_json('{"name": "alice", "age": 30}'))| +-------------------------------------------------------+ +object | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/kill.md b/tidb-cloud-lake/sql/kill.md new file mode 100644 index 0000000000000..4ae17e497e09f --- /dev/null +++ b/tidb-cloud-lake/sql/kill.md @@ -0,0 +1,27 @@ +--- +title: KILL +--- + +Forcibly terminates the currently running queries. 
+ +See also: [SHOW PROCESSLIST](/tidb-cloud-lake/sql/show-processlist.md) + +## Syntax + +```sql +KILL QUERY|CONNECTION +``` + +## Examples + +```sql +SHOW PROCESSLIST; ++--------------------------------------+-------+-----------------+------+-------+----------+--------------------------------------------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+ +| id | type | host | user | state | database | extra_info | memory_usage | dal_metrics_read_bytes | dal_metrics_write_bytes | scan_progress_read_rows | scan_progress_read_bytes | ++--------------------------------------+-------+-----------------+------+-------+----------+--------------------------------------------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+ +| e04dd121-88f4-4290-85be-2b45c6e3b011 | MySQL | 127.0.0.1:65291 | root | Query | default | SELECT sum(number) from numbers_mt(10000000000) group by number%3, number%4,number%5 | 0 | 0 | 0 | 2391200000 | 19129600000 | +| 179c99d5-1894-4d4c-a89e-4b293d404c88 | MySQL | 127.0.0.1:64597 | root | Query | default | show processlist | 0 | 0 | 0 | 0 | 0 | ++--------------------------------------+-------+-----------------+------+-------+----------+--------------------------------------------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+ + +KILL QUERY 'e04dd121-88f4-4290-85be-2b45c6e3b011'; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/kurtosis.md b/tidb-cloud-lake/sql/kurtosis.md new file mode 100644 index 0000000000000..8afe36e943125 --- /dev/null +++ b/tidb-cloud-lake/sql/kurtosis.md @@ -0,0 +1,57 @@ +--- +title: KURTOSIS +--- + +Aggregate function. + +The `KURTOSIS()` function returns the excess kurtosis of all input values. + +## Syntax + +```sql +KURTOSIS() +``` + +## Arguments + +| Arguments | Description | +|-----------| ----------- | +| `` | Any numerical expression | + +## Return Type + +Nullable Float64. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE stock_prices ( + id INT, + stock_symbol VARCHAR, + price FLOAT +); + +INSERT INTO stock_prices (id, stock_symbol, price) +VALUES (1, 'AAPL', 150), + (2, 'AAPL', 152), + (3, 'AAPL', 148), + (4, 'AAPL', 160), + (5, 'AAPL', 155); +``` + +**Query Demo: Calculate Excess Kurtosis for Apple Stock Prices** + +```sql +SELECT KURTOSIS(price) AS excess_kurtosis +FROM stock_prices +WHERE stock_symbol = 'AAPL'; +``` + +**Result** + +```sql +| excess_kurtosis | +|-------------------------| +| 0.06818181325581445 | +``` diff --git a/tidb-cloud-lake/sql/l1-distance.md b/tidb-cloud-lake/sql/l1-distance.md new file mode 100644 index 0000000000000..6138029468006 --- /dev/null +++ b/tidb-cloud-lake/sql/l1-distance.md @@ -0,0 +1,88 @@ +--- +title: 'L1_DISTANCE' +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the Manhattan (L1) distance between two vectors, measuring the sum of absolute differences between corresponding elements. + +## Syntax + +```sql +L1_DISTANCE(vector1, vector2) +``` + +## Arguments + +- `vector1`: First vector (VECTOR Data Type) +- `vector2`: Second vector (VECTOR Data Type) + +## Returns + +Returns a FLOAT value representing the Manhattan (L1) distance between the two vectors. 
The value is always non-negative: +- 0: Identical vectors +- Larger values: Vectors that are farther apart + +## Description + +The L1 distance, also known as Manhattan distance or taxicab distance, calculates the sum of absolute differences between corresponding elements of two vectors. It's useful for feature comparison and sparse data analysis. + +Formula: `L1_DISTANCE(a, b) = |a1 - b1| + |a2 - b2| + ... + |an - bn|` + +## Examples + +### Basic Usage + +```sql +-- Calculate L1 distance between two vectors +SELECT L1_DISTANCE([1.0, 2.0, 3.0]::vector(3), [4.0, 5.0, 6.0]::vector(3)) AS distance; +``` + +Result: +``` +╭──────────╮ +│ distance │ +├──────────┤ +│ 9 │ +╰──────────╯ +``` + +Create a table with vector data: + +```sql +CREATE OR REPLACE TABLE vectors ( + id INT, + vec VECTOR(3) +); + +INSERT INTO vectors VALUES + (1, [1.0000, 2.0000, 3.0000]), + (2, [1.0000, 2.2000, 3.0000]), + (3, [4.0000, 5.0000, 6.0000]); +``` + +Find the vector closest to [1, 2, 3] using L1 distance: + +```sql +SELECT + id, + vec, + L1_DISTANCE(vec, [1.0000, 2.0000, 3.0000]::VECTOR(3)) AS distance +FROM + vectors +ORDER BY + distance ASC; +``` + +``` +╭─────────────────────────────╮ +│ id │ vec │ distance │ +├────┼───────────┼────────────┤ +│ 1 │ [1,2,3] │ 0 │ +│ 2 │ [1,2.2,3] │ 0.20000005 │ +│ 3 │ [4,5,6] │ 9 │ +╰─────────────────────────────╯ +``` + diff --git a/tidb-cloud-lake/sql/l2-distance.md b/tidb-cloud-lake/sql/l2-distance.md new file mode 100644 index 0000000000000..755189fed1ec8 --- /dev/null +++ b/tidb-cloud-lake/sql/l2-distance.md @@ -0,0 +1,104 @@ +--- +title: 'L2_DISTANCE' +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the Euclidean (L2) distance between two vectors, measuring the straight-line distance between them in vector space. + +## Syntax + +```sql +L2_DISTANCE(vector1, vector2) +``` + +## Arguments + +- `vector1`: First vector (VECTOR Data Type) +- `vector2`: Second vector (VECTOR Data Type) + +## Returns + +Returns a FLOAT value representing the Euclidean (L2) distance between the two vectors. The value is always non-negative: +- 0: Identical vectors +- Larger values: Vectors that are farther apart + +## Description + +The L2 distance, also known as Euclidean distance, measures the straight-line distance between two points in Euclidean space. It is one of the most common metrics used in vector similarity search and machine learning applications. + +The function: + +1. Verifies that both input vectors have the same length +2. Computes the sum of squared differences between corresponding elements +3. Returns the square root of this sum + +The mathematical formula implemented is: + +``` +L2_distance(v1, v2) = √(Σ(v1ᵢ - v2ᵢ)²) +``` + +Where v1ᵢ and v2ᵢ are the elements of the input vectors. + +:::info +- This function performs vector computations within Databend and does not rely on external APIs. 
+::: + +## Examples + +### Basic Usage + +```sql +-- Calculate L2 distance between two vectors +SELECT L2_DISTANCE([1.0, 2.0, 3.0]::vector(3), [4.0, 5.0, 6.0]::vector(3)) AS distance; +``` + +Result: +``` +╭──────────╮ +│ distance │ +├──────────┤ +│ 5.196152 │ +╰──────────╯ +``` + +Create a table with vector data: + +```sql +CREATE OR REPLACE TABLE vectors ( + id INT, + vec VECTOR(3) +); + +INSERT INTO vectors VALUES + (1, [1.0000, 2.0000, 3.0000]), + (2, [1.0000, 2.2000, 3.0000]), + (3, [4.0000, 5.0000, 6.0000]); +``` + +Find the vector closest to [1, 2, 3] using L2 distance: + +```sql +SELECT + id, + vec, + L2_DISTANCE(vec, [1.0000, 2.0000, 3.0000]::VECTOR(3)) AS distance +FROM + vectors +ORDER BY + distance ASC; +``` + +``` +╭─────────────────────────────╮ +│ id │ vec │ distance │ +├────┼───────────┼────────────┤ +│ 1 │ [1,2,3] │ 0 │ +│ 2 │ [1,2.2,3] │ 0.20000005 │ +│ 3 │ [4,5,6] │ 5.196152 │ +╰─────────────────────────────╯ +``` + diff --git a/tidb-cloud-lake/sql/lag.md b/tidb-cloud-lake/sql/lag.md new file mode 100644 index 0000000000000..23bccb8eb3148 --- /dev/null +++ b/tidb-cloud-lake/sql/lag.md @@ -0,0 +1,95 @@ +--- +title: LAG +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the value from a previous row in the result set. + +See also: [LEAD](/tidb-cloud-lake/sql/lead.md) + +## Syntax + +```sql +LAG( + expression + [, offset ] + [, default ] +) +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression +) +``` + +**Arguments:** +- `expression`: The column or expression to evaluate +- `offset`: Number of rows before the current row (default: 1) +- `default`: Value to return when no previous row exists (default: NULL) + +**Notes:** +- Negative offset values work like LEAD function +- Returns NULL if the offset goes beyond partition boundaries + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + test_date DATE, + score INT +); + +INSERT INTO scores VALUES + ('Alice', '2024-01-01', 85), + ('Alice', '2024-02-01', 90), + ('Alice', '2024-03-01', 88), + ('Bob', '2024-01-01', 78), + ('Bob', '2024-02-01', 82), + ('Bob', '2024-03-01', 85); +``` + +**Get previous test score for each student:** + +```sql +SELECT student, test_date, score, + LAG(score) OVER (PARTITION BY student ORDER BY test_date) AS previous_score +FROM scores +ORDER BY student, test_date; +``` + +Result: +``` +student | test_date | score | previous_score +--------+------------+-------+--------------- +Alice | 2024-01-01 | 85 | NULL +Alice | 2024-02-01 | 90 | 85 +Alice | 2024-03-01 | 88 | 90 +Bob | 2024-01-01 | 78 | NULL +Bob | 2024-02-01 | 82 | 78 +Bob | 2024-03-01 | 85 | 82 +``` + +**Get score from 2 tests ago:** + +```sql +SELECT student, test_date, score, + LAG(score, 2, 0) OVER (PARTITION BY student ORDER BY test_date) AS score_2_tests_ago +FROM scores +ORDER BY student, test_date; +``` + +Result: +``` +student | test_date | score | score_2_tests_ago +--------+------------+-------+------------------ +Alice | 2024-01-01 | 85 | 0 +Alice | 2024-02-01 | 90 | 0 +Alice | 2024-03-01 | 88 | 85 +Bob | 2024-01-01 | 78 | 0 +Bob | 2024-02-01 | 82 | 0 +Bob | 2024-03-01 | 85 | 78 +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/last-day.md b/tidb-cloud-lake/sql/last-day.md new file mode 100644 index 0000000000000..40566a741aba6 --- /dev/null +++ b/tidb-cloud-lake/sql/last-day.md @@ -0,0 +1,38 @@ +--- +title: LAST_DAY +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + 
+Returns the last day of the specified interval (week, month, quarter, or year) based on the provided date or timestamp. + +## Syntax + +```sql +LAST_DAY(, ) +``` + +| Parameter | Description | +|---------------------|---------------------------------------------------------------------------------------------------------------| +| `` | A DATE or TIMESTAMP value to calculate the last day of the specified interval. | +| `` | The date_part for which to find the last day. Accepted values are `week`, `month`, `quarter`, and `year`. | + +## Return Type + +Date. + +## Examples + +Let's say you want to determine the billing date, which is always the last day of the month, based on an arbitrary date of a transaction (e.g., 2024-11-13): + +```sql +SELECT LAST_DAY(to_date('2024-11-13'), month) AS billing_date; + +┌──────────────┐ +│ billing_date │ +├──────────────┤ +│ 2024-11-30 │ +└──────────────┘ +``` diff --git a/tidb-cloud-lake/sql/last-query-id.md b/tidb-cloud-lake/sql/last-query-id.md new file mode 100644 index 0000000000000..c42e3589043c5 --- /dev/null +++ b/tidb-cloud-lake/sql/last-query-id.md @@ -0,0 +1,140 @@ +--- +title: LAST_QUERY_ID +--- + +Returns the ID of a query in the current session based on its order. + +:::note +This function is currently supported only through the MySQL protocol, meaning you must connect to Databend using a MySQL protocol-compatible client for it to work. +::: + +## Syntax + +```sql +LAST_QUERY_ID() +``` +`index` specifies the query order in the current session, accepting positive and negative numbers, with a default value of `-1`. +- Positive indexes (starting from `1`) retrieve the nth query from the session start. +- Negative indexes retrieve the nth query backward from the current query. + - When `index` is `-1`, it returns the query ID of the current query. + - To retrieve the previous query, set `index` to `-2`. +- NULL is returned if an index exceeds the query history. + +## Examples + +This example runs three simple queries in a new session, then uses both positive and negative indexes to retrieve the query ID of `SELECT 3`: + +| | Positive | Negative | +|----------------------------------------------|----------|----------| +| `SELECT 1` | 1 | -4 | +| `SELECT 2` | 2 | -3 | +| `SELECT 3` | 3 | -2 | +| `SELECT LAST_QUERY_ID(-2), LAST_QUERY_ID(3)` | 4 | -1 | + +```bash +MacBook-Air:~ eric$ mysql -u root -h 127.0.0.1 -P 3307 +Welcome to the MySQL monitor. Commands end with ; or \g. +Your MySQL connection id is 9 +Server version: 8.0.90-v1.2.720-nightly-2280cc9480(rust-1.85.0-nightly-2025-04-08T04:40:36.379825500Z) 0 + +Copyright (c) 2000, 2025, Oracle and/or its affiliates. + +Oracle is a registered trademark of Oracle Corporation and/or its +affiliates. Other names may be trademarks of their respective +owners. + +Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. + +mysql> select 1; ++------+ +| 1 | ++------+ +| 1 | ++------+ +1 row in set (0.02 sec) +Read 1 rows, 1.00 B in 0.004 sec., 264.46 rows/sec., 264.46 B/sec. + +mysql> select 2; ++------+ +| 2 | ++------+ +| 2 | ++------+ +1 row in set (0.01 sec) +Read 1 rows, 1.00 B in 0.003 sec., 366.94 rows/sec., 366.94 B/sec. + +mysql> select 3; ++------+ +| 3 | ++------+ +| 3 | ++------+ +1 row in set (0.01 sec) +Read 1 rows, 1.00 B in 0.003 sec., 373.16 rows/sec., 373.16 B/sec. 
+ +mysql> SELECT LAST_QUERY_ID(-2), LAST_QUERY_ID(3); ++--------------------------------------+--------------------------------------+ +| last_query_id(- 2) | last_query_id(3) | ++--------------------------------------+--------------------------------------+ +| 74dd6dca-f9b0-44cd-99f4-ac7d11d47fee | 74dd6dca-f9b0-44cd-99f4-ac7d11d47fee | ++--------------------------------------+--------------------------------------+ +1 row in set (0.02 sec) +Read 1 rows, 1.00 B in 0.006 sec., 167.95 rows/sec., 167.95 B/sec. +``` + +This example demonstrates that the function returns the query ID of the current query when `` is `-1`: + +```bash +MacBook-Air:~ eric$ mysql -u root -h 127.0.0.1 -P 3307 +Welcome to the MySQL monitor. Commands end with ; or \g. +Your MySQL connection id is 10 +Server version: 8.0.90-v1.2.720-nightly-2280cc9480(rust-1.85.0-nightly-2025-04-08T04:40:36.379825500Z) 0 + +Copyright (c) 2000, 2025, Oracle and/or its affiliates. + +Oracle is a registered trademark of Oracle Corporation and/or its +affiliates. Other names may be trademarks of their respective +owners. + +Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. + +mysql> SELECT LAST_QUERY_ID(-1), LAST_QUERY_ID(); ++--------------------------------------+--------------------------------------+ +| last_query_id(- 1) | last_query_id() | ++--------------------------------------+--------------------------------------+ +| 5a1afbc2-dc16-4b69-a0e6-615e0b970cb1 | 5a1afbc2-dc16-4b69-a0e6-615e0b970cb1 | ++--------------------------------------+--------------------------------------+ +1 row in set (0.01 sec) +Read 1 rows, 1.00 B in 0.003 sec., 393.68 rows/sec., 393.68 B/sec. + +mysql> SELECT LAST_QUERY_ID(-2); ++--------------------------------------+ +| last_query_id(- 2) | ++--------------------------------------+ +| 5a1afbc2-dc16-4b69-a0e6-615e0b970cb1 | ++--------------------------------------+ +1 row in set (0.01 sec) +Read 1 rows, 1.00 B in 0.003 sec., 381.61 rows/sec., 381.61 B/sec. + +mysql> SELECT LAST_QUERY_ID(1); ++--------------------------------------+ +| last_query_id(1) | ++--------------------------------------+ +| 5a1afbc2-dc16-4b69-a0e6-615e0b970cb1 | ++--------------------------------------+ +1 row in set (0.01 sec) +Read 1 rows, 1.00 B in 0.003 sec., 353.63 rows/sec., 353.63 B/sec. +``` + +When the `index` exceeds the query history, NULL is returned. + +```bash +mysql> SELECT LAST_QUERY_ID(-100), LAST_QUERY_ID(100); ++----------------------+--------------------+ +| last_query_id(- 100) | last_query_id(100) | ++----------------------+--------------------+ +| | | ++----------------------+--------------------+ +1 row in set (0.02 sec) +Read 1 rows, 1.00 B in 0.008 sec., 128.69 rows/sec., 128.69 B/sec. +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/last-value.md b/tidb-cloud-lake/sql/last-value.md new file mode 100644 index 0000000000000..a164e3bb05a42 --- /dev/null +++ b/tidb-cloud-lake/sql/last-value.md @@ -0,0 +1,149 @@ +--- +title: LAST_VALUE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the last value in the window frame. + +See also: + +- [FIRST_VALUE](/tidb-cloud-lake/sql/first-value.md) +- [NTH_VALUE](/tidb-cloud-lake/sql/nth-value.md) + +## Syntax + +```sql +LAST_VALUE(expression) [ { RESPECT | IGNORE } NULLS ] +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] + [ window_frame ] +) +``` + +**Arguments:** +- `expression`: Required. 
The column or expression to return the last value from. +- `PARTITION BY`: Optional. Divides rows into partitions. +- `ORDER BY`: Required. Determines the ordering within the window. +- `window_frame`: Optional. Defines the window frame. The default is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. + +**Notes:** +- Returns the last value in the ordered window frame. +- Supports `IGNORE NULLS` to skip null values and `RESPECT NULLS` to keep the default behaviour. +- Use a frame that ends after the current row (for example, `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING`) when you need the true last row of a partition. +- Useful for finding the latest value in each group, or the most recent value inside a look-ahead window. + +## Examples + +```sql +-- Sample order data +CREATE OR REPLACE TABLE orders_window_demo ( + customer VARCHAR, + order_id INT, + order_time TIMESTAMP, + amount INT, + sales_rep VARCHAR +); + +INSERT INTO orders_window_demo VALUES + ('Alice', 1001, to_timestamp('2024-05-01 09:00:00'), 120, 'Erin'), + ('Alice', 1002, to_timestamp('2024-05-01 11:00:00'), 135, NULL), + ('Alice', 1003, to_timestamp('2024-05-02 14:30:00'), 125, 'Glen'), + ('Bob', 1004, to_timestamp('2024-05-01 08:30:00'), 90, NULL), + ('Bob', 1005, to_timestamp('2024-05-01 20:15:00'), 105, 'Kai'), + ('Bob', 1006, to_timestamp('2024-05-03 10:00:00'), 95, NULL), + ('Carol', 1007, to_timestamp('2024-05-04 09:45:00'), 80, 'Lily'); +``` + +**Example 1. Latest order in each customer partition** + +```sql +SELECT customer, + order_id, + order_time, + LAST_VALUE(order_id) OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + ) AS last_order_for_customer +FROM orders_window_demo +ORDER BY customer, order_time; +``` + +Result: +``` +customer | order_id | order_time | last_order_for_customer +---------+----------+----------------------+------------------------- +Alice | 1001 | 2024-05-01 09:00:00 | 1003 +Alice | 1002 | 2024-05-01 11:00:00 | 1003 +Alice | 1003 | 2024-05-02 14:30:00 | 1003 +Bob | 1004 | 2024-05-01 08:30:00 | 1006 +Bob | 1005 | 2024-05-01 20:15:00 | 1006 +Bob | 1006 | 2024-05-03 10:00:00 | 1006 +Carol | 1007 | 2024-05-04 09:45:00 | 1007 +``` + +**Example 2. Peek 12 hours ahead within each customer** + +```sql +SELECT customer, + order_id, + order_time, + amount, + LAST_VALUE(amount) OVER ( + PARTITION BY customer + ORDER BY order_time + RANGE BETWEEN CURRENT ROW AND INTERVAL 12 HOUR FOLLOWING + ) AS last_amount_next_12h +FROM orders_window_demo +ORDER BY customer, order_time; +``` + +Result: +``` +customer | order_id | order_time | amount | last_amount_next_12h +---------+----------+----------------------+--------+---------------------- +Alice | 1001 | 2024-05-01 09:00:00 | 120 | 135 +Alice | 1002 | 2024-05-01 11:00:00 | 135 | 135 +Alice | 1003 | 2024-05-02 14:30:00 | 125 | 125 +Bob | 1004 | 2024-05-01 08:30:00 | 90 | 105 +Bob | 1005 | 2024-05-01 20:15:00 | 105 | 105 +Bob | 1006 | 2024-05-03 10:00:00 | 95 | 95 +Carol | 1007 | 2024-05-04 09:45:00 | 80 | 80 +``` + +**Example 3. 
Skip nulls when scanning forward for the last sales rep** + +```sql +SELECT customer, + order_id, + sales_rep, + LAST_VALUE(sales_rep) RESPECT NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + ) AS last_rep_respect, + LAST_VALUE(sales_rep) IGNORE NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + ) AS last_rep_ignore +FROM orders_window_demo +ORDER BY customer, order_id; +``` + +Result: +``` +customer | order_id | sales_rep | last_rep_respect | last_rep_ignore +---------+----------+-----------+------------------+----------------- +Alice | 1001 | Erin | Glen | Glen +Alice | 1002 | NULL | Glen | Glen +Alice | 1003 | Glen | Glen | Glen +Bob | 1004 | NULL | NULL | Kai +Bob | 1005 | Kai | NULL | Kai +Bob | 1006 | NULL | NULL | Kai +Carol | 1007 | Lily | Lily | Lily +``` diff --git a/tidb-cloud-lake/sql/last.md b/tidb-cloud-lake/sql/last.md new file mode 100644 index 0000000000000..6b8f1a8d25ebb --- /dev/null +++ b/tidb-cloud-lake/sql/last.md @@ -0,0 +1,9 @@ +--- +title: LAST +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [LAST_VALUE](/tidb-cloud-lake/sql/last-value.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/lcase.md b/tidb-cloud-lake/sql/lcase.md new file mode 100644 index 0000000000000..b224c2a7e778a --- /dev/null +++ b/tidb-cloud-lake/sql/lcase.md @@ -0,0 +1,5 @@ +--- +title: LCASE +--- + +Alias for [LOWER](/tidb-cloud-lake/sql/lower.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/lead.md b/tidb-cloud-lake/sql/lead.md new file mode 100644 index 0000000000000..9ec4393ce12ea --- /dev/null +++ b/tidb-cloud-lake/sql/lead.md @@ -0,0 +1,95 @@ +--- +title: LEAD +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the value from a subsequent row in the result set. 
+
+See also: [LAG](/tidb-cloud-lake/sql/lag.md)
+
+## Syntax
+
+```sql
+LEAD(
+    expression
+    [, offset ]
+    [, default ]
+)
+OVER (
+    [ PARTITION BY partition_expression ]
+    ORDER BY sort_expression
+)
+```
+
+**Arguments:**
+- `expression`: The column or expression to evaluate
+- `offset`: Number of rows after the current row (default: 1)
+- `default`: Value to return when no next row exists (default: NULL)
+
+**Notes:**
+- A negative offset works like the LAG function
+- Returns the `default` value (NULL if none is given) when the offset goes beyond partition boundaries
+
+## Examples
+
+```sql
+-- Create sample data
+CREATE TABLE scores (
+    student VARCHAR(20),
+    test_date DATE,
+    score INT
+);
+
+INSERT INTO scores VALUES
+    ('Alice', '2024-01-01', 85),
+    ('Alice', '2024-02-01', 90),
+    ('Alice', '2024-03-01', 88),
+    ('Bob', '2024-01-01', 78),
+    ('Bob', '2024-02-01', 82),
+    ('Bob', '2024-03-01', 85);
+```
+
+**Get next test score for each student:**
+
+```sql
+SELECT student, test_date, score,
+       LEAD(score) OVER (PARTITION BY student ORDER BY test_date) AS next_score
+FROM scores
+ORDER BY student, test_date;
+```
+
+Result:
+```
+student | test_date  | score | next_score
+--------+------------+-------+-----------
+Alice   | 2024-01-01 | 85    | 90
+Alice   | 2024-02-01 | 90    | 88
+Alice   | 2024-03-01 | 88    | NULL
+Bob     | 2024-01-01 | 78    | 82
+Bob     | 2024-02-01 | 82    | 85
+Bob     | 2024-03-01 | 85    | NULL
+```
+
+**Get score from 2 tests later:**
+
+```sql
+SELECT student, test_date, score,
+       LEAD(score, 2, 0) OVER (PARTITION BY student ORDER BY test_date) AS score_2_tests_later
+FROM scores
+ORDER BY student, test_date;
+```
+
+Result:
+```
+student | test_date  | score | score_2_tests_later
+--------+------------+-------+--------------------
+Alice   | 2024-01-01 | 85    | 88
+Alice   | 2024-02-01 | 90    | 0
+Alice   | 2024-03-01 | 88    | 0
+Bob     | 2024-01-01 | 78    | 85
+Bob     | 2024-02-01 | 82    | 0
+Bob     | 2024-03-01 | 85    | 0
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/least-ignore-nulls.md b/tidb-cloud-lake/sql/least-ignore-nulls.md
new file mode 100644
index 0000000000000..68cb1513eb064
--- /dev/null
+++ b/tidb-cloud-lake/sql/least-ignore-nulls.md
@@ -0,0 +1,30 @@
+---
+title: LEAST_IGNORE_NULLS
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+Returns the minimum value from a set of values, ignoring any NULL values.
+
+See also: [LEAST](/tidb-cloud-lake/sql/least.md)
+
+## Syntax
+
+```sql
+LEAST_IGNORE_NULLS(<value1>, <value2> ...)
+```
+
+## Examples
+
+```sql
+SELECT LEAST_IGNORE_NULLS(5, 9, 4), LEAST_IGNORE_NULLS(5, 9, null);
+```
+
+```sql
+┌──────────────────────────────────────────────────────────────┐
+│ least_ignore_nulls(5, 9, 4) │ least_ignore_nulls(5, 9, NULL) │
+├─────────────────────────────┼────────────────────────────────┤
+│                           4 │                              5 │
+└──────────────────────────────────────────────────────────────┘
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/least.md b/tidb-cloud-lake/sql/least.md
new file mode 100644
index 0000000000000..69c925797dae1
--- /dev/null
+++ b/tidb-cloud-lake/sql/least.md
@@ -0,0 +1,30 @@
+---
+title: LEAST
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+Returns the minimum value from a set of values. If any value in the set is `NULL`, the function returns `NULL`.
+
+See also: [LEAST_IGNORE_NULLS](/tidb-cloud-lake/sql/least-ignore-nulls.md)
+
+## Syntax
+
+```sql
+LEAST(<value1>, <value2> ...)
+``` + +## Examples + +```sql +SELECT LEAST(5, 9, 4), LEAST(5, 9, null); +``` + +``` +┌────────────────────────────────────┐ +│ least(5, 9, 4) │ least(5, 9, NULL) │ +├────────────────┼───────────────────┤ +│ 4 │ NULL │ +└────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/left.md b/tidb-cloud-lake/sql/left.md new file mode 100644 index 0000000000000..2f279ecb155b8 --- /dev/null +++ b/tidb-cloud-lake/sql/left.md @@ -0,0 +1,34 @@ +--- +title: LEFT +--- + +Returns the leftmost `len` characters from the string `str`, or NULL if any argument is NULL. If `len` is greater than the length of `str`, the entire `str` is returned. + +## Syntax + +```sql +LEFT(, ); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------------------------------------| +| `` | The main string from where the character to be extracted | +| `` | The count of characters | + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT LEFT('foobarbar', 5), LEFT('foobarbar', 10); + +┌──────────────────────────────────────────────┐ +│ left('foobarbar', 5) │ left('foobarbar', 10) │ +├──────────────────────┼───────────────────────┤ +│ fooba │ foobarbar │ +└──────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/length-utf8.md b/tidb-cloud-lake/sql/length-utf8.md new file mode 100644 index 0000000000000..e21be0de2bd56 --- /dev/null +++ b/tidb-cloud-lake/sql/length-utf8.md @@ -0,0 +1,5 @@ +--- +title: LENGTH_UTF8 +--- + +Alias for [LENGTH](/tidb-cloud-lake/sql/length.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/length.md b/tidb-cloud-lake/sql/length.md new file mode 100644 index 0000000000000..dd4acf8cf9220 --- /dev/null +++ b/tidb-cloud-lake/sql/length.md @@ -0,0 +1,33 @@ +--- +title: LENGTH +--- + +Returns the length of a given input string or binary value. In the case of strings, the length represents the count of characters, with each UTF-8 character considered as a single character. For binary data, the length corresponds to the number of bytes. + +## Syntax + +```sql +LENGTH() +``` + +## Aliases + +- [CHAR_LENGTH](/tidb-cloud-lake/sql/char-length.md) +- [CHARACTER_LENGTH](/tidb-cloud-lake/sql/character-length.md) +- [LENGTH_UTF8](/tidb-cloud-lake/sql/length-utf8.md) + +## Return Type + +BIGINT + +## Examples + +```sql +SELECT LENGTH('Hello'), LENGTH_UTF8('Hello'), CHAR_LENGTH('Hello'), CHARACTER_LENGTH('Hello'); + +┌───────────────────────────────────────────────────────────────────────────────────────────┐ +│ length('hello') │ length_utf8('hello') │ char_length('hello') │ character_length('hello') │ +├─────────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ +│ 5 │ 5 │ 5 │ 5 │ +└───────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/like.md b/tidb-cloud-lake/sql/like.md new file mode 100644 index 0000000000000..e15c6a577fbcb --- /dev/null +++ b/tidb-cloud-lake/sql/like.md @@ -0,0 +1,25 @@ +--- +title: LIKE +--- + +Pattern matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. 
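+
+In an SQL pattern, `%` matches any sequence of characters (including an empty one) and `_` matches exactly one character. The quick sketch below checks both wildcards against the same string:
+
+```sql
+-- Both expressions return 1 (TRUE): '%' spans "bend" and '_' stands in for the single "a"
+SELECT 'databend' LIKE 'data%', 'databend' LIKE 'd_tabend';
+```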
+ +## Syntax + +```sql + LIKE +``` + +## Examples + +```sql +SELECT name, category FROM system.functions WHERE name like 'tou%' ORDER BY name; ++----------+------------+ +| name | category | ++----------+------------+ +| touint16 | conversion | +| touint32 | conversion | +| touint64 | conversion | +| touint8 | conversion | ++----------+------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/list-stage-files.md b/tidb-cloud-lake/sql/list-stage-files.md new file mode 100644 index 0000000000000..071d9a649316c --- /dev/null +++ b/tidb-cloud-lake/sql/list-stage-files.md @@ -0,0 +1,71 @@ +--- +title: LIST STAGE FILES +sidebar_position: 4 +--- + +Lists files in a stage. + +See also: + +- [LIST_STAGE](/tidb-cloud-lake/sql/list-stage.md): This function lists files in a stage and allows you to filter files in a stage based on their extensions and obtain comprehensive details about each file. +- [PRESIGN](/tidb-cloud-lake/sql/presign.md): Databend recommends using the Presigned URL method to upload files to the stage. +- [REMOVE STAGE FILES](/tidb-cloud-lake/sql/remove-stage-files.md): Removes files from a stage. + +## Syntax + +```sql +LIST { userStage | internalStage | externalStage } [ PATTERN = '' ] +``` + +## Examples + +The stage below contains a file named **books.parquet** and a folder named **2023**. + +![Alt text](/img/sql/list-stage.png) + +And the folder **2023** contains the following files: + +![Alt text](/img/sql/list-stage-2.png) + +The LIST command lists all the files in a stage by default: + +```sql +LIST @my_internal_stage; ++-----------------+------+------------------------------------+-------------------------------+---------+ +| name | size | md5 | last_modified | creator | ++-----------------+------+------------------------------------+-------------------------------+---------+ +| 2023/meta.log | 475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/readme.txt | 1193 | "8c0fbbebfedf26f93324541f97f5ac14" | 2023-04-19 20:23:24.000 +0000 | NULL | +| books.parquet | 998 | "88432bf90aadb79073682988b39d461c" | 2023-04-19 20:08:42.000 +0000 | NULL | ++-----------------+------+------------------------------------+-------------------------------+---------+ +``` + +To list the files in the folder **2023**, run the following command: + +:::note +It is necessary to add a slash "/" at the end of the path in the command, otherwise, the command may not work as expected and may result in an error. 
+::: + +```sql +LIST @my_internal_stage/2023/; ++-----------------+------+------------------------------------+-------------------------------+---------+ +| name | size | md5 | last_modified | creator | ++-----------------+------+------------------------------------+-------------------------------+---------+ +| 2023/meta.log | 475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/readme.txt | 1193 | "8c0fbbebfedf26f93324541f97f5ac14" | 2023-04-19 20:23:24.000 +0000 | NULL | ++-----------------+------+------------------------------------+-------------------------------+---------+ +``` + +To list all the files with the extension *.log in the stage, run the following command: + +```sql +LIST @my_internal_stage PATTERN = '.log'; ++----------------+------+------------------------------------+-------------------------------+---------+ +| name | size | md5 | last_modified | creator | ++----------------+------+------------------------------------+-------------------------------+---------+ +| 2023/meta.log | 475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL | ++----------------+------+------------------------------------+-------------------------------+---------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/list-stage.md b/tidb-cloud-lake/sql/list-stage.md new file mode 100644 index 0000000000000..590f36cd11d4e --- /dev/null +++ b/tidb-cloud-lake/sql/list-stage.md @@ -0,0 +1,57 @@ +--- +title: LIST_STAGE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists files in a stage. This allows you to filter files in a stage based on their extensions and obtain comprehensive details about each file. The function is similar to the DDL command [LIST STAGE FILES](/tidb-cloud-lake/sql/list-stage.md), but provides you the flexibility to retrieve specific file information with the SELECT statement, such as file name, size, MD5 hash, last modified timestamp, and creator, rather than all file information. + +## Syntax + +```sql +LIST_STAGE( + LOCATION => '{ internalStage | externalStage | userStage }' + [ PATTERN => ''] +) +``` + +Where: + +### internalStage + +```sql +internalStage ::= @[/] +``` + +### externalStage + +```sql +externalStage ::= @[/] +``` + +### userStage + +```sql +userStage ::= @~[/] +``` + +### PATTERN + +See [COPY INTO table](/tidb-cloud-lake/sql/copy-into-table.md). 
+ + +## Examples + +```sql +SELECT * FROM list_stage(location => '@my_stage/', pattern => '.*[.]log'); ++----------------+------+------------------------------------+-------------------------------+---------+ +| name | size | md5 | last_modified | creator | ++----------------+------+------------------------------------+-------------------------------+---------+ +| 2023/meta.log | 475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL | +| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL | ++----------------+------+------------------------------------+-------------------------------+---------+ + +-- Equivalent to the following statement: +LIST @my_stage PATTERN = '.log'; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/listagg.md b/tidb-cloud-lake/sql/listagg.md new file mode 100644 index 0000000000000..713c00df30fdd --- /dev/null +++ b/tidb-cloud-lake/sql/listagg.md @@ -0,0 +1,100 @@ +--- +title: LISTAGG +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Concatenates values from multiple rows into a single string, separated by a specified delimiter. This operation can be performed using two different function types: +- Aggregate Function: The concatenation happens across all rows in the entire result set. +- Window Function: The concatenation happens within each partition of the result set, as defined by the `PARTITION BY` clause. + +## Syntax + +```sql +-- Aggregate Function +LISTAGG([DISTINCT] [, ]) + [WITHIN GROUP (ORDER BY )] + +-- Window Function +LISTAGG([DISTINCT] [, ]) + [WITHIN GROUP (ORDER BY )] + OVER ([PARTITION BY ]) +``` + +| Parameter | Description | +|---------------------------------|---------------------------------------------------------------------------------------------------| +| `DISTINCT` | Optional. Removes duplicate values before concatenation. | +| `` | The expression to concatenate (typically a column or an expression). | +| `` | Optional. The string to separate each concatenated value. Defaults to an empty string if omitted. | +| `ORDER BY ` | Defines the order in which the values are concatenated. | +| `PARTITION BY ` | Divides rows into partitions to perform aggregation separately within each group. | + +## Aliases + +- [STRING_AGG](/tidb-cloud-lake/sql/string-agg.md) +- [GROUP_CONCAT](/tidb-cloud-lake/sql/group-concat.md) + +## Return Type + +String. + +## Examples + +In this example, we have a table of customer orders. Each order belongs to a customer, and we want to create a list of all products each customer has purchased. 
+ +```sql +CREATE TABLE orders ( + customer_id INT, + product_name VARCHAR +); + +INSERT INTO orders (customer_id, product_name) VALUES +(1, 'Laptop'), +(1, 'Mouse'), +(1, 'Laptop'), +(2, 'Phone'), +(2, 'Headphones'); +``` + +The following uses `LISTAGG` as an aggregate function with GROUP BY to concatenate all products purchased by each customer into a single string: + +```sql +SELECT + customer_id, + LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name) AS product_list +FROM orders +GROUP BY customer_id; +``` + +```sql +┌─────────────────────────────────────────┐ +│ customer_id │ product_list │ +├─────────────────┼───────────────────────┤ +│ 2 │ Headphones, Phone │ +│ 1 │ Laptop, Laptop, Mouse │ +└─────────────────────────────────────────┘ +``` + +The following uses `LISTAGG` as a window function, so each row keeps its original details but also displays the full product list for the customer's group: + +```sql +SELECT + customer_id, + product_name, + LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name) + OVER (PARTITION BY customer_id) AS product_list +FROM orders; +``` + +```sql +┌────────────────────────────────────────────────────────────┐ +│ customer_id │ product_name │ product_list │ +├─────────────────┼──────────────────┼───────────────────────┤ +│ 2 │ Phone │ Headphones, Phone │ +│ 2 │ Headphones │ Headphones, Phone │ +│ 1 │ Laptop │ Laptop, Laptop, Mouse │ +│ 1 │ Mouse │ Laptop, Laptop, Mouse │ +│ 1 │ Laptop │ Laptop, Laptop, Mouse │ +└────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ln.md b/tidb-cloud-lake/sql/ln.md new file mode 100644 index 0000000000000..c64ba244554fc --- /dev/null +++ b/tidb-cloud-lake/sql/ln.md @@ -0,0 +1,23 @@ +--- +title: LN +--- + +Returns the natural logarithm of `x`; that is, the base-e logarithm of `x`. If x is less than or equal to 0.0E0, the function returns NULL. + +## Syntax + +```sql +LN( ) +``` + +## Examples + +```sql +SELECT LN(2); + +┌────────────────────┐ +│ ln(2) │ +├────────────────────┤ +│ 0.6931471805599453 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/locate.md b/tidb-cloud-lake/sql/locate.md new file mode 100644 index 0000000000000..620c62960e16b --- /dev/null +++ b/tidb-cloud-lake/sql/locate.md @@ -0,0 +1,51 @@ +--- +title: LOCATE +--- + +The first syntax returns the position of the first occurrence of substring substr in string str. +The second syntax returns the position of the first occurrence of substring substr in string str, starting at position pos. +Returns 0 if substr is not in str. Returns NULL if any argument is NULL. + +## Syntax + +```sql +LOCATE(, ) +LOCATE(, , ) +``` + +## Arguments + +| Arguments | Description | +|------------|----------------| +| `` | The substring. | +| `` | The string. | +| `` | The position. 
| + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT LOCATE('bar', 'foobarbar') ++----------------------------+ +| LOCATE('bar', 'foobarbar') | ++----------------------------+ +| 4 | ++----------------------------+ + +SELECT LOCATE('xbar', 'foobar') ++--------------------------+ +| LOCATE('xbar', 'foobar') | ++--------------------------+ +| 0 | ++--------------------------+ + +SELECT LOCATE('bar', 'foobarbar', 5) ++-------------------------------+ +| LOCATE('bar', 'foobarbar', 5) | ++-------------------------------+ +| 7 | ++-------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/log-b-x.md b/tidb-cloud-lake/sql/log-b-x.md new file mode 100644 index 0000000000000..0fc2b3e5d58da --- /dev/null +++ b/tidb-cloud-lake/sql/log-b-x.md @@ -0,0 +1,23 @@ +--- +title: "LOG(b, x)" +--- + +Returns the base-b logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. + +## Syntax + +```sql +LOG( ) +``` + +## Examples + +```sql +SELECT LOG(2, 65536); + +┌───────────────┐ +│ log(2, 65536) │ +├───────────────┤ +│ 16 │ +└───────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/log-sql.md b/tidb-cloud-lake/sql/log-sql.md new file mode 100644 index 0000000000000..cb46b6769cb57 --- /dev/null +++ b/tidb-cloud-lake/sql/log-sql.md @@ -0,0 +1,23 @@ +--- +title: LOG2 +--- + +Returns the base-2 logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. + +## Syntax + +```sql +LOG2( ) +``` + +## Examples + +```sql +SELECT LOG2(65536); + +┌─────────────┐ +│ log2(65536) │ +├─────────────┤ +│ 16 │ +└─────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/log-x.md b/tidb-cloud-lake/sql/log-x.md new file mode 100644 index 0000000000000..949f4ec432be5 --- /dev/null +++ b/tidb-cloud-lake/sql/log-x.md @@ -0,0 +1,23 @@ +--- +title: "LOG(x)" +--- + +Returns the natural logarithm of `x`. If x is less than or equal to 0.0E0, the function returns NULL. + +## Syntax + +```sql +LOG( ) +``` + +## Examples + +```sql +SELECT LOG(2); + +┌────────────────────┐ +│ log(2) │ +├────────────────────┤ +│ 0.6931471805599453 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/log.md b/tidb-cloud-lake/sql/log.md new file mode 100644 index 0000000000000..87cb6af3e5670 --- /dev/null +++ b/tidb-cloud-lake/sql/log.md @@ -0,0 +1,23 @@ +--- +title: LOG10 +--- + +Returns the base-10 logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. 
+ +## Syntax + +```sql +LOG10( ) +``` + +## Examples + +```sql +SELECT LOG10(100); + +┌────────────┐ +│ log10(100) │ +├────────────┤ +│ 2 │ +└────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/logical-operators.md b/tidb-cloud-lake/sql/logical-operators.md new file mode 100644 index 0000000000000..6769038bc2b78 --- /dev/null +++ b/tidb-cloud-lake/sql/logical-operators.md @@ -0,0 +1,9 @@ +--- +title: Logical Operators +--- + +| Operator | Description | Example | Result | +|----------|----------------------------------------|---------------|--------| +| **AND** | Matches both expressions (`a` and `b`) | **1 AND 1** | TRUE | +| **NOT** | Does not match the expression | **NOT 1** | FALSE | +| **OR** | Matches either expression | **1 OR 0** | TRUE | diff --git a/tidb-cloud-lake/sql/lower.md b/tidb-cloud-lake/sql/lower.md new file mode 100644 index 0000000000000..8ac53f7d91470 --- /dev/null +++ b/tidb-cloud-lake/sql/lower.md @@ -0,0 +1,31 @@ +--- +title: LOWER +--- + +Returns a string with all characters changed to lowercase. + +## Syntax + +```sql +LOWER() +``` + +## Aliases + +- [LCASE](/tidb-cloud-lake/sql/lcase.md) + +## Return Type + +VARCHAR + +## Examples + +```sql +SELECT LOWER('Hello, Databend!'), LCASE('Hello, Databend!'); + +┌───────────────────────────────────────────────────────┐ +│ lower('hello, databend!') │ lcase('hello, databend!') │ +├───────────────────────────┼───────────────────────────┤ +│ hello, databend! │ hello, databend! │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/lpad.md b/tidb-cloud-lake/sql/lpad.md new file mode 100644 index 0000000000000..73500a34d2bd1 --- /dev/null +++ b/tidb-cloud-lake/sql/lpad.md @@ -0,0 +1,42 @@ +--- +title: LPAD +--- + +Returns the string str, left-padded with the string padstr to a length of len characters. +If str is longer than len, the return value is shortened to len characters. + +## Syntax + +```sql +LPAD(, , ) +``` + +## Arguments + +| Arguments | Description | +|------------|-----------------| +| `` | The string. | +| `` | The length. | +| `` | The pad string. | + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT LPAD('hi',4,'??'); ++---------------------+ +| LPAD('hi', 4, '??') | ++---------------------+ +| ??hi | ++---------------------+ + +SELECT LPAD('hi',1,'??'); ++---------------------+ +| LPAD('hi', 1, '??') | ++---------------------+ +| h | ++---------------------+ +``` diff --git a/tidb-cloud-lake/sql/ltrim.md b/tidb-cloud-lake/sql/ltrim.md new file mode 100644 index 0000000000000..1a9bcb85c21ab --- /dev/null +++ b/tidb-cloud-lake/sql/ltrim.md @@ -0,0 +1,31 @@ +--- +title: LTRIM +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes all occurrences of any character present in the specified trim string from the left side of the string. 
+ +See also: + +- [TRIM_LEADING](/tidb-cloud-lake/sql/trim-leading.md) +- [RTRIM](/tidb-cloud-lake/sql/rtrim.md) + +## Syntax + +```sql +LTRIM(, ) +``` + +## Examples + +```sql +SELECT LTRIM('xxdatabend', 'xx'), LTRIM('xxdatabend', 'xy'); + +┌───────────────────────────────────────────────────────┐ +│ ltrim('xxdatabend', 'xx') │ ltrim('xxdatabend', 'xy') │ +├───────────────────────────┼───────────────────────────┤ +│ databend │ databend │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/map-cat.md b/tidb-cloud-lake/sql/map-cat.md new file mode 100644 index 0000000000000..a9ea266aca55c --- /dev/null +++ b/tidb-cloud-lake/sql/map-cat.md @@ -0,0 +1,41 @@ +--- +title: MAP_CAT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the concatenatation of two MAPs. + +## Syntax + +```sql +MAP_CAT( , ) +``` + +## Arguments + +| Arguments | Description | +|-----------|---------------------------------| +| `` | The source MAP. | +| `` | The MAP to be appended to map1. | + +:::note +- If both map1 and map2 have a value with the same key, then the output map contains the value from map2. +- If either argument is NULL, the function returns NULL without reporting any error. +::: + +## Return Type + +Map. + +## Examples + +```sql +SELECT MAP_CAT({'a':1,'b':2,'c':3}, {'c':5,'d':6}); +┌─────────────────────────────────────────────┐ +│ map_cat({'a':1,'b':2,'c':3}, {'c':5,'d':6}) │ +├─────────────────────────────────────────────┤ +│ {'a':1,'b':2,'c':5,'d':6} │ +└─────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-contains-key.md b/tidb-cloud-lake/sql/map-contains-key.md new file mode 100644 index 0000000000000..b17915f6f0c19 --- /dev/null +++ b/tidb-cloud-lake/sql/map-contains-key.md @@ -0,0 +1,43 @@ +--- +title: MAP_CONTAINS_KEY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Determines whether the specified MAP contains the specified key. + +## Syntax + +```sql +MAP_CONTAINS_KEY( , ) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------------------| +| `` | The map to be searched. | +| `` | The key to find. | + +## Return Type + +Boolean. + +## Examples + +```sql +SELECT MAP_CONTAINS_KEY({'a':1,'b':2,'c':3}, 'c'); +┌────────────────────────────────────────────┐ +│ map_contains_key({'a':1,'b':2,'c':3}, 'c') │ +├────────────────────────────────────────────┤ +│ true │ +└────────────────────────────────────────────┘ + +SELECT MAP_CONTAINS_KEY({'a':1,'b':2,'c':3}, 'x'); +┌────────────────────────────────────────────┐ +│ map_contains_key({'a':1,'b':2,'c':3}, 'x') │ +├────────────────────────────────────────────┤ +│ false │ +└────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-delete.md b/tidb-cloud-lake/sql/map-delete.md new file mode 100644 index 0000000000000..cfa736f9669fb --- /dev/null +++ b/tidb-cloud-lake/sql/map-delete.md @@ -0,0 +1,50 @@ +--- +title: MAP_DELETE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns an existing MAP with one or more keys removed. + +## Syntax + +```sql +MAP_DELETE( , [, , ... ] ) +MAP_DELETE( , ) +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------------------------------------| +| `` | The MAP that contains the KEY to remove. | +| `` | The KEYs to be omitted from the returned MAP. 
| +| `` | The Array of KEYs to be omitted from the returned MAP. | + +:::note +- The types of the key expressions and the keys in the map must be the same. +- Key values not found in the map will be ignored. +::: + +## Return Type + +Map. + +## Examples + +```sql +SELECT MAP_DELETE({'a':1,'b':2,'c':3}, 'a', 'c'); +┌───────────────────────────────────────────┐ +│ map_delete({'a':1,'b':2,'c':3}, 'a', 'c') │ +├───────────────────────────────────────────┤ +│ {'b':2} │ +└───────────────────────────────────────────┘ + +SELECT MAP_DELETE({'a':1,'b':2,'c':3}, ['a', 'b']); +┌─────────────────────────────────────────────┐ +│ map_delete({'a':1,'b':2,'c':3}, ['a', 'b']) │ +├─────────────────────────────────────────────┤ +│ {'c':3} │ +└─────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-filter.md b/tidb-cloud-lake/sql/map-filter.md new file mode 100644 index 0000000000000..727c2cee02026 --- /dev/null +++ b/tidb-cloud-lake/sql/map-filter.md @@ -0,0 +1,32 @@ +--- +title: MAP_FILTER +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Filters key-value pairs in a JSON object based on a specified condition, defined using a [lambda expression](/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md#lambda-expressions). + +## Syntax + +```sql +MAP_FILTER(, (, ) -> ) +``` + +## Return Type + +Returns a JSON object with only the key-value pairs that satisfy the specified condition. + +## Examples + +This example extracts only the `"status": "active"` key-value pair from the JSON object, filtering out the other fields: + +```sql +SELECT MAP_FILTER('{"status":"active", "user":"admin", "time":"2024-11-01"}'::VARIANT, (k, v) -> k = 'status') AS filtered_metadata; + +┌─────────────────────┐ +│ filtered_metadata │ +├─────────────────────┤ +│ {"status":"active"} │ +└─────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-functions.md b/tidb-cloud-lake/sql/map-functions.md new file mode 100644 index 0000000000000..494ff124b8e18 --- /dev/null +++ b/tidb-cloud-lake/sql/map-functions.md @@ -0,0 +1,41 @@ +--- +title: Map Functions +--- + +This section provides reference information for the map functions in Databend. Map functions allow you to create, manipulate, and extract information from map data structures (key-value pairs). 
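+
+For instance, the following minimal sketch (built from the per-function examples listed below) creates, inspects, and checks a map in a single query:
+
+```sql
+SELECT
+    MAP_CAT({'a':1}, {'b':2})            AS combined, -- {'a':1,'b':2}
+    MAP_KEYS({'a':1,'b':2})              AS key_list, -- ['a','b']
+    MAP_CONTAINS_KEY({'a':1,'b':2}, 'a') AS has_a;    -- true
+```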
+ +## Map Creation and Combination + +| Function | Description | Example | +|----------|-------------|--------| +| [MAP_CAT](/tidb-cloud-lake/sql/map-cat.md) | Combines multiple maps into a single map | `MAP_CAT({'a':1}, {'b':2})` → `{'a':1,'b':2}` | + +## Map Access and Information + +| Function | Description | Example | +|----------|-------------|--------| +| [MAP_KEYS](/tidb-cloud-lake/sql/map-keys.md) | Returns all keys from a map as an array | `MAP_KEYS({'a':1,'b':2})` → `['a','b']` | +| [MAP_VALUES](/tidb-cloud-lake/sql/map-values.md) | Returns all values from a map as an array | `MAP_VALUES({'a':1,'b':2})` → `[1,2]` | +| [MAP_SIZE](/tidb-cloud-lake/sql/map-size.md) | Returns the number of key-value pairs in a map | `MAP_SIZE({'a':1,'b':2,'c':3})` → `3` | +| [MAP_CONTAINS_KEY](/tidb-cloud-lake/sql/map-contains-key.md) | Checks if a map contains a specific key | `MAP_CONTAINS_KEY({'a':1,'b':2}, 'a')` → `TRUE` | + +## Map Modification + +| Function | Description | Example | +|----------|-------------|--------| +| [MAP_INSERT](/tidb-cloud-lake/sql/map-insert.md) | Inserts a key-value pair into a map | `MAP_INSERT({'a':1,'b':2}, 'c', 3)` → `{'a':1,'b':2,'c':3}` | +| [MAP_DELETE](/tidb-cloud-lake/sql/map-delete.md) | Removes a key-value pair from a map | `MAP_DELETE({'a':1,'b':2,'c':3}, 'b')` → `{'a':1,'c':3}` | + +## Map Transformation + +| Function | Description | Example | +|----------|-------------|--------| +| [MAP_TRANSFORM_KEYS](/tidb-cloud-lake/sql/map-transform-keys.md) | Applies a function to each key in a map | `MAP_TRANSFORM_KEYS({'a':1,'b':2}, x -> UPPER(x))` → `{'A':1,'B':2}` | +| [MAP_TRANSFORM_VALUES](/tidb-cloud-lake/sql/map-transform-values.md) | Applies a function to each value in a map | `MAP_TRANSFORM_VALUES({'a':1,'b':2}, x -> x * 10)` → `{'a':10,'b':20}` | + +## Map Filtering and Selection + +| Function | Description | Example | +|----------|-------------|--------| +| [MAP_FILTER](/tidb-cloud-lake/sql/map-filter.md) | Filters key-value pairs based on a predicate | `MAP_FILTER({'a':1,'b':2,'c':3}, (k,v) -> v > 1)` → `{'b':2,'c':3}` | +| [MAP_PICK](/tidb-cloud-lake/sql/map-pick.md) | Creates a new map with only specified keys | `MAP_PICK({'a':1,'b':2,'c':3}, ['a','c'])` → `{'a':1,'c':3}` | diff --git a/tidb-cloud-lake/sql/map-insert.md b/tidb-cloud-lake/sql/map-insert.md new file mode 100644 index 0000000000000..40133dfcbfc88 --- /dev/null +++ b/tidb-cloud-lake/sql/map-insert.md @@ -0,0 +1,45 @@ +--- +title: MAP_INSERT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a new MAP consisting of the input MAP with a new key-value pair inserted (an existing key updated with a new value). + +## Syntax + +```sql +MAP_INSERT( , , [, ] ) +``` + +## Arguments + +| Arguments | Description | +|----------------|----------------------------------------------------------------------------------------------| +| `` | The input MAP. | +| `` | The new key to insert into the MAP. | +| `` | The new value to insert into the MAP. | +| `` | The boolean flag indicates whether an existing key can be overwritten. The default is FALSE. | + +## Return Type + +Map. 
+ +## Examples + +```sql +SELECT MAP_INSERT({'a':1,'b':2,'c':3}, 'd', 4); +┌─────────────────────────────────────────┐ +│ map_insert({'a':1,'b':2,'c':3}, 'd', 4) │ +├─────────────────────────────────────────┤ +│ {'a':1,'b':2,'c':3,'d':4} │ +└─────────────────────────────────────────┘ + +SELECT MAP_INSERT({'a':1,'b':2,'c':3}, 'a', 5, true); +┌───────────────────────────────────────────────┐ +│ map_insert({'a':1,'b':2,'c':3}, 'a', 5, TRUE) │ +├───────────────────────────────────────────────┤ +│ {'a':5,'b':2,'c':3} │ +└───────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-keys.md b/tidb-cloud-lake/sql/map-keys.md new file mode 100644 index 0000000000000..18ab4589cea70 --- /dev/null +++ b/tidb-cloud-lake/sql/map-keys.md @@ -0,0 +1,36 @@ +--- +title: MAP_KEYS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the keys in a map. + +## Syntax + +```sql +MAP_KEYS( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | The input map. | + +## Return Type + +Array. + +## Examples + +```sql +SELECT MAP_KEYS({'a':1,'b':2,'c':3}); + +┌───────────────────────────────┐ +│ map_keys({'a':1,'b':2,'c':3}) │ +├───────────────────────────────┤ +│ ['a','b','c'] │ +└───────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-pick.md b/tidb-cloud-lake/sql/map-pick.md new file mode 100644 index 0000000000000..69cd50cca2751 --- /dev/null +++ b/tidb-cloud-lake/sql/map-pick.md @@ -0,0 +1,50 @@ +--- +title: MAP_PICK +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a new MAP containing the specified key-value pairs from an existing MAP. + +## Syntax + +```sql +MAP_PICK( , [, , ... ] ) +MAP_PICK( , ) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------------------------------------------------- | +| `` | The input MAP. | +| `` | The KEYs to be included from the returned MAP. | +| `` | The Array of KEYs to be included from the returned MAP. | + +:::note +- The types of the key expressions and the keys in the map must be the same. +- Key values not found in the map will be ignored. +::: + +## Return Type + +Map. + +## Examples + +```sql +SELECT MAP_PICK({'a':1,'b':2,'c':3}, 'a', 'c'); +┌─────────────────────────────────────────┐ +│ map_pick({'a':1,'b':2,'c':3}, 'a', 'c') │ +├─────────────────────────────────────────┤ +│ {'a':1,'c':3} │ +└─────────────────────────────────────────┘ + +SELECT MAP_PICK({'a':1,'b':2,'c':3}, ['a', 'b']); +┌───────────────────────────────────────────┐ +│ map_pick({'a':1,'b':2,'c':3}, ['a', 'b']) │ +├───────────────────────────────────────────┤ +│ {'a':1,'b':2} │ +└───────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-size.md b/tidb-cloud-lake/sql/map-size.md new file mode 100644 index 0000000000000..f3dff5a3b31c4 --- /dev/null +++ b/tidb-cloud-lake/sql/map-size.md @@ -0,0 +1,36 @@ +--- +title: MAP_SIZE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the size of a MAP. + +## Syntax + +```sql +MAP_SIZE( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | The input map. | + +## Return Type + +UInt64. 
+ +## Examples + +```sql +SELECT MAP_SIZE({'a':1,'b':2,'c':3}); + +┌───────────────────────────────┐ +│ map_size({'a':1,'b':2,'c':3}) │ +├───────────────────────────────┤ +│ 3 │ +└───────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-transform-keys.md b/tidb-cloud-lake/sql/map-transform-keys.md new file mode 100644 index 0000000000000..b6bdebf1c463a --- /dev/null +++ b/tidb-cloud-lake/sql/map-transform-keys.md @@ -0,0 +1,32 @@ +--- +title: MAP_TRANSFORM_KEYS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Applies a transformation to each key in a JSON object using a [lambda expression](/sql/stored-procedure-scripting/#lambda-expressions). + +## Syntax + +```sql +MAP_TRANSFORM_KEYS(, (, ) -> ) +``` + +## Return Type + +Returns a JSON object with the same values as the input JSON object, but with keys modified according to the specified lambda transformation. + +## Examples + +This example appends "_v1" to each key, creating a new JSON object with modified keys: + +```sql +SELECT MAP_TRANSFORM_KEYS('{"name":"John", "role":"admin"}'::VARIANT, (k, v) -> CONCAT(k, '_v1')) AS versioned_metadata; + +┌──────────────────────────────────────┐ +│ versioned_metadata │ +├──────────────────────────────────────┤ +│ {"name_v1":"John","role_v1":"admin"} │ +└──────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-transform-values.md b/tidb-cloud-lake/sql/map-transform-values.md new file mode 100644 index 0000000000000..15cd035ec1322 --- /dev/null +++ b/tidb-cloud-lake/sql/map-transform-values.md @@ -0,0 +1,32 @@ +--- +title: MAP_TRANSFORM_VALUES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Applies a transformation to each value in a JSON object using a [lambda expression](/sql/stored-procedure-scripting/#lambda-expressions). + +## Syntax + +```sql +MAP_TRANSFORM_VALUES(, (, ) -> ) +``` + +## Return Type + +Returns a JSON object with the same keys as the input JSON object, but with values modified according to the specified lambda transformation. + +## Examples + +This example multiplies each numeric value by 10, transforming the original object into `{"a":10,"b":20}`: + +```sql +SELECT MAP_TRANSFORM_VALUES('{"a":1,"b":2}'::VARIANT, (k, v) -> v * 10) AS transformed_values; + +┌────────────────────┐ +│ transformed_values │ +├────────────────────┤ +│ {"a":10,"b":20} │ +└────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map-values.md b/tidb-cloud-lake/sql/map-values.md new file mode 100644 index 0000000000000..a005c1023389d --- /dev/null +++ b/tidb-cloud-lake/sql/map-values.md @@ -0,0 +1,36 @@ +--- +title: MAP_VALUES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the values in a map. + +## Syntax + +```sql +MAP_VALUES( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | The input map. | + +## Return Type + +Array. + +## Examples + +```sql +SELECT MAP_VALUES({'a':1,'b':2,'c':3}); + +┌─────────────────────────────────┐ +│ map_values({'a':1,'b':2,'c':3}) │ +├─────────────────────────────────┤ +│ [1,2,3] │ +└─────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/map.md b/tidb-cloud-lake/sql/map.md new file mode 100644 index 0000000000000..0674311276185 --- /dev/null +++ b/tidb-cloud-lake/sql/map.md @@ -0,0 +1,119 @@ +--- +title: Map +sidebar_position: 10 +--- + +## Overview + +`MAP(K, V)` stores key-value pairs internally as `ARRAY(TUPLE(key, value))`. 
Define the key type `K` up front (Boolean, numeric, decimal, string, date, or timestamp). Keys must be non-null and unique; values can be any type, including nested structures. Use map literals (`{key: value}`) or the `MAP(keys, values)` function to build a map expression. + +```sql +SELECT + {'k1': 1, 'k2': 2} AS literal_map, + MAP(['x', 'y'], [10, 20]) AS from_arrays; +``` + +Result: +``` +┌───────────────────────┬──────────────────┐ +│ literal_map │ from_arrays │ +├───────────────────────┼──────────────────┤ +│ {'k1':1,'k2':2} │ {'x':10,'y':20} │ +└───────────────────────┴──────────────────┘ +``` + +## Examples + +### Create and Query + +```sql +CREATE TABLE web_traffic_data ( + id INT64, + traffic_info MAP(STRING, STRING) +); + +INSERT INTO web_traffic_data VALUES + (1, {'ip': '192.168.1.1', 'url': 'example.com/home'}), + (2, {'ip': '192.168.1.2', 'url': 'example.com/about'}), + (3, {'ip': '192.168.1.1', 'url': 'example.com/contact'}); + +SELECT + id, + traffic_info['ip'] AS ip_address, + traffic_info['url'] AS url +FROM web_traffic_data; +``` + +Result: +``` +┌────┬─────────────┬───────────────────────┐ +│ id │ ip_address │ url │ +├────┼─────────────┼───────────────────────┤ +│ 1 │ 192.168.1.1 │ example.com/home │ +│ 2 │ 192.168.1.2 │ example.com/about │ +│ 3 │ 192.168.1.1 │ example.com/contact │ +└────┴─────────────┴───────────────────────┘ +``` + +```sql +SELECT + traffic_info['ip'] AS ip_address, + COUNT(*) AS visits +FROM web_traffic_data +GROUP BY traffic_info['ip'] +ORDER BY visits DESC; +``` + +Result: +``` +┌─────────────┬────────┐ +│ ip_address │ visits │ +├─────────────┼────────┤ +│ 192.168.1.1 │ 2 │ +│ 192.168.1.2 │ 1 │ +└─────────────┴────────┘ +``` + +### Bloom Filter Index + +Map columns automatically maintain a bloom filter for supported value types (numeric, string, timestamp, date). Filtering on `map['key']` skips blocks quickly when the value is absent. + +```sql +CREATE TABLE nginx_log ( + id INT, + log MAP(STRING, STRING) +); + +INSERT INTO nginx_log VALUES + (1, {'ip': '205.91.162.148', 'url': 'test-1'}), + (2, {'ip': '205.91.162.141', 'url': 'test-2'}); +``` + +```sql +SELECT * +FROM nginx_log +WHERE log['ip'] = '205.91.162.148'; +``` + +Result: +``` +┌────┬─────────────────────────────────────────┐ +│ id │ log │ +├────┼─────────────────────────────────────────┤ +│ 1 │ {'ip':'205.91.162.148','url':'test-1'} │ +└────┴─────────────────────────────────────────┘ +``` + +```sql +SELECT * +FROM nginx_log +WHERE log['ip'] = '205.91.162.200'; +``` + +Result: +``` +┌────┬────┐ +│ id │ log │ +├────┼────┤ +└────┴────┘ +``` diff --git a/tidb-cloud-lake/sql/markov-generate.md b/tidb-cloud-lake/sql/markov-generate.md new file mode 100644 index 0000000000000..3577afaf40d38 --- /dev/null +++ b/tidb-cloud-lake/sql/markov-generate.md @@ -0,0 +1,67 @@ +--- +title: MARKOV_GENERATE +--- + +Using the model trained by [MARKOV_TRAIN](/tidb-cloud-lake/sql/markov-train.md) to anonymize the dataset. + +## Syntax + +```sql +MARKOV_GENERATE( , , , ) +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| `model` | The return model of markov_train | +| `params`| Json string: `{"order": 5, "sliding_window_size": 8}`
order: order of the markov model used to generate strings; sliding_window_size:
size of a sliding window in a source string - its hash is used as a seed for RNG in markov model | +| `seed` | seed | +| `determinator`| Source string | + +## Return Type + +String. + +## Examples + +Generate multiple PII-like columns (name + email) from small seed sets: + +```sql +-- 1) Train separate models on names and emails (PII text) +CREATE TABLE markov_name_model AS +SELECT markov_train(name) AS model +FROM ( + VALUES ('Alice Johnson'),('Bob Smith'),('Carol Davis'),('David Miller'),('Emma Wilson'), + ('Frank Brown'),('Grace Lee'),('Henry Clark'),('Irene Torres'),('Jack White') +) AS t(name); + +CREATE TABLE markov_email_model AS +SELECT markov_train(email) AS model +FROM ( + VALUES ('alice.johnson@gmail.com'),('bob.smith@yahoo.com'),('carol.davis@outlook.com'), + ('david.miller@example.com'),('emma.wilson@example.com'),('frank.brown@gmail.com'), + ('grace.lee@example.com'),('henry.clark@example.com'),('irene.torres@example.com'), + ('jack.white@example.com') +) AS t(email); + +-- 2) Generate synthetic name + email pairs; seed keeps it reproducible +SELECT + markov_generate(n.model, '{"order":3,"sliding_window_size":12}', 3030, CONCAT('orig_', number)) AS fake_name, + markov_generate(e.model, '{"order":3,"sliding_window_size":12}', 3030, CONCAT('orig_', number, '@example.com')) AS fake_email +FROM numbers(6) +JOIN markov_name_model n +JOIN markov_email_model e +LIMIT 6; +-- Sample output ++----------------+-------------------------+ +| fake_name | fake_email | ++----------------+-------------------------+ +| Frank Brown | henry.clark@example | +| Grace Johnso | quinn.foster@example | +| Rachel | paul.adams@example | +| Carol David | olivia.baker@example | +| Jack White | frank.brown@gmail.com | +| Noah Harris | race.johnson@example | ++----------------+-------------------------+ +``` diff --git a/tidb-cloud-lake/sql/markov-train.md b/tidb-cloud-lake/sql/markov-train.md new file mode 100644 index 0000000000000..1781e55ef14fc --- /dev/null +++ b/tidb-cloud-lake/sql/markov-train.md @@ -0,0 +1,49 @@ +--- +title: MARKOV_TRAIN +--- + +Extracting patterns from datasets using Markov models + +## Syntax + +```sql +MARKOV_TRAIN() + +MARKOV_TRAIN()() + +MARKOV_TRAIN(, , , , ) () +``` + +## Arguments + +| Arguments | Description | +|------------------| ------------------ | +| `string` | Input | +| `order` | Order of markov model to generate strings | +| `frequency-cutoff` | Frequency cutoff for markov model: remove all buckets with count less than specified | +| `num-buckets-cutoff` | Cutoff for number of different possible continuations for a context: remove all histograms with less than specified number of buckets | +| `frequency-add` | Add a constant to every count to lower probability distribution skew | +| `frequency-desaturate` | 0..1 - move every frequency towards average to lower probability distribution skew | + +## Return Type + +Depending on the implementation, it is only used as a argument for [MARKOV_GENERATE](/tidb-cloud-lake/sql/markov-generate.md). 
+ +## Examples + +```sql +create table model as +select markov_train(concat('bar', number::string)) as bar from numbers(100); + +select markov_generate(bar,'{"order":5,"sliding_window_size":8}', 151, (number+100000)::string) as generate +from numbers(5), model; ++-----------+ +| generate | ++-----------+ +│ bar95 │ +│ bar64 │ +│ bar85 │ +│ bar56 │ +│ bar95 │ ++-----------+ +``` diff --git a/tidb-cloud-lake/sql/masking-policy-sql.md b/tidb-cloud-lake/sql/masking-policy-sql.md new file mode 100644 index 0000000000000..321bbba108d91 --- /dev/null +++ b/tidb-cloud-lake/sql/masking-policy-sql.md @@ -0,0 +1,24 @@ +--- +title: Masking Policy +--- +import EEFeature from '@site/src/components/EEFeature'; + + + +This page provides a comprehensive overview of Masking Policy operations in Databend, organized by functionality for easy reference. + +## Masking Policy Management + +| Command | Description | +|---------|-------------| +| [CREATE MASKING POLICY](/tidb-cloud-lake/sql/create-masking-policy.md) | Creates a new masking policy for data obfuscation | +| [DESCRIBE MASKING POLICY](/tidb-cloud-lake/sql/desc-masking-policy.md) | Shows details of a specific masking policy | +| [DROP MASKING POLICY](/tidb-cloud-lake/sql/drop-masking-policy.md) | Removes a masking policy | + +## Related Topics + +- [Masking Policy](/tidb-cloud-lake/guides/masking-policy.md) + +:::note +Masking policies in Databend allow you to protect sensitive data by dynamically transforming or obfuscating it when queried by users without proper privileges. +::: diff --git a/tidb-cloud-lake/sql/match.md b/tidb-cloud-lake/sql/match.md new file mode 100644 index 0000000000000..63a9ac9bd22b3 --- /dev/null +++ b/tidb-cloud-lake/sql/match.md @@ -0,0 +1,75 @@ +--- +title: MATCH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +`MATCH` searches for rows that contain the supplied keywords within the listed columns. The function can only appear in a `WHERE` clause. + +:::info +Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match). +::: + +## Syntax + +```sql +MATCH('', ''[, '']) +``` + +- ``: A comma-separated list of columns to search. Append `^` to weight a column higher than the others. +- ``: The terms to search for. Append `*` for suffix matching, for example `rust*`. +- ``: An optional semicolon-separated list of `key=value` pairs fine-tuning the search. + +## Options + +| Option | Values | Description | Example | +|--------|--------|-------------|---------| +| `fuzziness` | `1` or `2` | Matches keywords within the specified Levenshtein distance. | `MATCH('summary, tags', 'pedestrain', 'fuzziness=1')` matches rows that contain the correctly spelled `pedestrian`. | +| `operator` | `OR` (default) or `AND` | Controls how multiple keywords are combined when no boolean operator is specified. | `MATCH('summary, tags', 'traffic light red', 'operator=AND')` requires both words. | +| `lenient` | `true` or `false` | When `true`, suppresses parsing errors and returns an empty result set. | `MATCH('summary, tags', '()', 'lenient=true')` returns no rows instead of an error. | + +## Examples + +In many AI pipelines you may capture structured metadata in a `VARIANT` column while also materializing human-readable summaries for search. The following example stores dashcam frame summaries and tags that were extracted from the JSON payload. 
+ +### Example: Build Searchable Summaries + +```sql +CREATE OR REPLACE TABLE frame_notes ( + id INT, + camera STRING, + summary STRING, + tags STRING, + INVERTED INDEX idx_notes (summary, tags) +); + +INSERT INTO frame_notes VALUES + (1, 'dashcam_front', + 'Green light at Market & 5th with pedestrian entering the crosswalk', + 'downtown commute green-light pedestrian'), + (2, 'dashcam_front', + 'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead', + 'stop urban red-light cyclist'), + (3, 'dashcam_front', + 'School zone caution sign in SOMA with pedestrian waiting near crosswalk', + 'school-zone caution pedestrian'); +``` + +### Example: Boolean AND + +```sql +SELECT id, summary +FROM frame_notes +WHERE MATCH('summary, tags', 'traffic light red', 'operator=AND'); +-- Returns id 2 +``` + +### Example: Fuzzy Matching + +```sql +SELECT id, summary +FROM frame_notes +WHERE MATCH('summary^2, tags', 'pedestrain', 'fuzziness=1'); +-- Returns ids 1 and 3 +``` diff --git a/tidb-cloud-lake/sql/max-if.md b/tidb-cloud-lake/sql/max-if.md new file mode 100644 index 0000000000000..f6bc106215e4b --- /dev/null +++ b/tidb-cloud-lake/sql/max-if.md @@ -0,0 +1,44 @@ +--- +title: MAX_IF +--- + +## MAX_IF + +The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. + +```sql +MAX_IF(, ) +``` + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE sales ( + id INT, + salesperson_id INT, + product_id INT, + revenue FLOAT +); + +INSERT INTO sales (id, salesperson_id, product_id, revenue) +VALUES (1, 1, 1, 1000), + (2, 1, 2, 2000), + (3, 1, 3, 3000), + (4, 2, 1, 1500), + (5, 2, 2, 2500); +``` + +**Query Demo: Find Maximum Revenue for Salesperson with ID 1** + +```sql +SELECT MAX_IF(revenue, salesperson_id = 1) AS max_revenue_salesperson_1 +FROM sales; +``` + +**Result** +```sql +| max_revenue_salesperson_1 | +|---------------------------| +| 3000 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/max.md b/tidb-cloud-lake/sql/max.md new file mode 100644 index 0000000000000..0f3fa3b69d870 --- /dev/null +++ b/tidb-cloud-lake/sql/max.md @@ -0,0 +1,57 @@ +--- +title: MAX +--- + +Aggregate function. + +The MAX() function returns the maximum value in a set of values. + +## Syntax + +``` +MAX() +``` + +## Arguments + +| Arguments | Description | +|-----------| ----------- | +| `` | Any expression | + +## Return Type + +The maximum value, in the type of the value. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE temperatures ( + id INT, + city VARCHAR, + temperature FLOAT +); + +INSERT INTO temperatures (id, city, temperature) +VALUES (1, 'New York', 30), + (2, 'New York', 28), + (3, 'New York', 32), + (4, 'Los Angeles', 25), + (5, 'Los Angeles', 27); +``` + +**Query Demo: Find Maximum Temperature for New York City** + +```sql +SELECT city, MAX(temperature) AS max_temperature +FROM temperatures +WHERE city = 'New York' +GROUP BY city; +``` + +**Result** +```sql +| city | max_temperature | +|------------|-----------------| +| New York | 32 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/md.md b/tidb-cloud-lake/sql/md.md new file mode 100644 index 0000000000000..a4caac3f6fd72 --- /dev/null +++ b/tidb-cloud-lake/sql/md.md @@ -0,0 +1,23 @@ +--- +title: MD5 +--- + +Calculates an MD5 128-bit checksum for a string. The value is returned as a string of 32 hexadecimal digits or NULL if the argument was NULL. 
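Beyond hashing a single literal, MD5 is often used to fingerprint several columns at once for change detection or deduplication. A minimal sketch (the column values are placeholders; `CONCAT_WS` joins them with a delimiter before hashing):

```sql
-- One deterministic 32-character fingerprint per logical row
SELECT MD5(CONCAT_WS('|', 'Alice', 'alice@example.com')) AS row_fingerprint;
```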
+ +## Syntax + +```sql +MD5() +``` + +## Examples + +```sql +SELECT MD5('1234567890'); + +┌──────────────────────────────────┐ +│ md5('1234567890') │ +├──────────────────────────────────┤ +│ e807f1fcf82d132f9bb018ca6738a19f │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/median-tdigest.md b/tidb-cloud-lake/sql/median-tdigest.md new file mode 100644 index 0000000000000..0571907d9e441 --- /dev/null +++ b/tidb-cloud-lake/sql/median-tdigest.md @@ -0,0 +1,54 @@ +--- +title: MEDIAN_TDIGEST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Computes the median of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. + +:::note +NULL values are not included in the calculation. +::: + +## Syntax + +```sql +MEDIAN_TDIGEST() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------| +| `` | Any numerical expression | + +## Return Type + +Returns a value of the same data type as the input values. + +## Examples + +```sql +-- Create a table and insert sample data +CREATE TABLE exam_scores ( + id INT, + student_id INT, + score INT +); + +INSERT INTO exam_scores (id, student_id, score) +VALUES (1, 1, 80), + (2, 2, 90), + (3, 3, 75), + (4, 4, 95), + (5, 5, 85); + +-- Calculate median exam score +SELECT MEDIAN_TDIGEST(score) AS median_score +FROM exam_scores; + +| median_score | +|----------------| +| 85.0 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/median.md b/tidb-cloud-lake/sql/median.md new file mode 100644 index 0000000000000..a153aeda6c4c7 --- /dev/null +++ b/tidb-cloud-lake/sql/median.md @@ -0,0 +1,58 @@ +--- +title: MEDIAN +--- + +Aggregate function. + +The MEDIAN() function computes the median of a numeric data sequence. + +:::caution +NULL values are not counted. +::: + +## Syntax + +```sql +MEDIAN() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------| +| `` | Any numerical expression | + +## Return Type + +the type of the value. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE exam_scores ( + id INT, + student_id INT, + score INT +); + +INSERT INTO exam_scores (id, student_id, score) +VALUES (1, 1, 80), + (2, 2, 90), + (3, 3, 75), + (4, 4, 95), + (5, 5, 85); +``` + +**Query Demo: Calculate Median Exam Score** +```sql +SELECT MEDIAN(score) AS median_score +FROM exam_scores; +``` + +**Result** +```sql +| median_score | +|----------------| +| 85.0 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/merge.md b/tidb-cloud-lake/sql/merge.md new file mode 100644 index 0000000000000..a30f5cca49379 --- /dev/null +++ b/tidb-cloud-lake/sql/merge.md @@ -0,0 +1,239 @@ +--- +title: MERGE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Performs **INSERT**, **UPDATE**, or **DELETE** operations on rows within a target table, all in accordance with conditions and matching criteria specified within the statement, using data from a specified source. + +The data source, which can be a subquery, is linked to the target data via a JOIN expression. This expression assesses whether each row in the source can find a match in the target table and then determines which type of clause (MATCHED or NOT MATCHED) it should move to in the next execution step. 
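In skeleton form, a typical statement joins a source to the target on a key and supplies one action per branch. The table, key, and column names below are placeholders, not part of the syntax reference:

```sql
MERGE INTO target_orders AS t
USING (SELECT * FROM staged_orders) AS s
    ON t.order_id = s.order_id
WHEN MATCHED THEN
    UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN
    INSERT (order_id, status) VALUES (s.order_id, s.status);
```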
+ +![Alt text](/img/sql/merge-into-single-clause.jpeg) + +A MERGE statement usually contains a MATCHED and / or a NOT MATCHED clause, instructing Databend on how to handle matched and unmatched scenarios. For a MATCHED clause, you have the option to choose between performing an **UPDATE** or **DELETE** operation on the target table. Conversely, in the case of a NOT MATCHED clause, the available choice is **INSERT**. + +## Multiple MATCHED & NOT MATCHED Clauses + +A MERGE statement can include multiple MATCHED and / or NOT MATCHED clauses, giving you the flexibility to specify different actions to be taken based on the conditions met during the MERGE operation. + +![Alt text](/img/sql/merge-into-multi-clause.jpeg) + +If a MERGE statement includes multiple MATCHED clauses, a condition needs to be specified for each clause EXCEPT the last one. These conditions determine the criteria under which the associated operations are executed. Databend evaluates the conditions in the specified order. Once a condition is met, it triggers the specified operation, skips any remaining MATCHED clauses, then moves on to the next row in the source. If the MERGE statement also includes multiple NOT MATCHED clauses, Databend handles them in a similar way. + +## Syntax + +```sql +MERGE INTO + USING (SELECT ... ) [AS] ON { matchedClause | notMatchedClause } [ ... ] + +matchedClause ::= + WHEN MATCHED [ AND ] THEN + { + UPDATE SET = [ , = ... ] | + UPDATE * | + DELETE /* Removes matched rows from the target table */ + } + +notMatchedClause ::= + WHEN NOT MATCHED [ AND ] THEN + { INSERT ( [ , ... ] ) VALUES ( [ , ... ] ) | INSERT * } +``` + +| Parameter | Description | +| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| UPDATE \* | Updates all columns of the matched row in the target table with values from the corresponding row in the source. This requires the column names between the source and target are consistent (though their order can be different) because during the update process, matching is done based on column names. | +| INSERT \* | Inserts a new row into the target table with values from the source row. | +| DELETE | Removes the matched row from the target table. This is a powerful operation that can be used for data cleanup, removing obsolete records, or implementing conditional deletion logic based on source data. | + +## Output + +MERGE provides a summary of the data merge results with these columns: + +| Column | Description | +| ----------------------- | ---------------------------------------------------- | +| number of rows inserted | Count of new rows added to the target table. | +| number of rows updated | Count of existing rows modified in the target table. | +| number of rows deleted | Count of rows deleted from the target table. | + +## Examples + +### Example 1: Merge with Multiple Matched Clauses + +This example uses MERGE to synchronize employee data from 'employees' into 'salaries', allowing for inserting and updating salary information based on specified criteria. 
+ +```sql +-- Create the 'employees' table as the source for merging +CREATE TABLE employees ( + employee_id INT, + employee_name VARCHAR(255), + department VARCHAR(255) +); + +-- Create the 'salaries' table as the target for merging +CREATE TABLE salaries ( + employee_id INT, + salary DECIMAL(10, 2) +); + +-- Insert initial employee data +INSERT INTO employees VALUES + (1, 'Alice', 'HR'), + (2, 'Bob', 'IT'), + (3, 'Charlie', 'Finance'), + (4, 'David', 'HR'); + +-- Insert initial salary data +INSERT INTO salaries VALUES + (1, 50000.00), + (2, 60000.00); + +-- Enable MERGE INTO + +-- Merge data into 'salaries' based on employee details from 'employees' +MERGE INTO salaries + USING (SELECT * FROM employees) AS employees + ON salaries.employee_id = employees.employee_id + WHEN MATCHED AND employees.department = 'HR' THEN + UPDATE SET + salaries.salary = salaries.salary + 1000.00 + WHEN MATCHED THEN + UPDATE SET + salaries.salary = salaries.salary + 500.00 + WHEN NOT MATCHED THEN + INSERT (employee_id, salary) + VALUES (employees.employee_id, 55000.00); + +┌──────────────────────────────────────────────────┐ +│ number of rows inserted │ number of rows updated │ +├─────────────────────────┼────────────────────────┤ +│ 2 │ 2 │ +└──────────────────────────────────────────────────┘ + +-- Retrieve all records from the 'salaries' table after merging +SELECT * FROM salaries; + +┌────────────────────────────────────────────┐ +│ employee_id │ salary │ +├─────────────────┼──────────────────────────┤ +│ 3 │ 55000.00 │ +│ 4 │ 55000.00 │ +│ 1 │ 51000.00 │ +│ 2 │ 60500.00 │ +└────────────────────────────────────────────┘ +``` + +### Example 2: Merge with UPDATE \* & INSERT \* + +This example uses MERGE to synchronize data between the target_table and source_table, updating matching rows with values from the source and inserting non-matching rows. 
+ +```sql +-- Create the target table target_table +CREATE TABLE target_table ( + ID INT, + Name VARCHAR(50), + Age INT, + City VARCHAR(50) +); + +-- Insert initial data into target_table +INSERT INTO target_table (ID, Name, Age, City) +VALUES + (1, 'Alice', 25, 'Toronto'), + (2, 'Bob', 30, 'Vancouver'), + (3, 'Carol', 28, 'Montreal'); + +-- Create the source table source_table +CREATE TABLE source_table ( + ID INT, + Name VARCHAR(50), + Age INT, + City VARCHAR(50) +); + +-- Insert initial data into source_table +INSERT INTO source_table (ID, Name, Age, City) +VALUES + (1, 'David', 27, 'Calgary'), + (2, 'Emma', 29, 'Ottawa'), + (4, 'Frank', 32, 'Edmonton'); + +-- Enable MERGE INTO + +-- Merge data from source_table into target_table +MERGE INTO target_table AS T + USING (SELECT * FROM source_table) AS S + ON T.ID = S.ID + WHEN MATCHED THEN + UPDATE * + WHEN NOT MATCHED THEN + INSERT *; + +┌──────────────────────────────────────────────────┐ +│ number of rows inserted │ number of rows updated │ +├─────────────────────────┼────────────────────────┤ +│ 1 │ 2 │ +└──────────────────────────────────────────────────┘ + +-- Retrieve all records from the 'target_table' after merging +SELECT * FROM target_table order by ID; + +┌─────────────────────────────────────────────────────────────────────────┐ +│ id │ name │ age │ city │ +├─────────────────┼──────────────────┼─────────────────┼──────────────────┤ +│ 1 │ David │ 27 │ Calgary │ +│ 2 │ Emma │ 29 │ Ottawa │ +│ 3 │ Carol │ 28 │ Montreal │ +│ 4 │ Frank │ 32 │ Edmonton │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### Example 3: Merge with DELETE Operation + +This example demonstrates how to use MERGE to delete records from the target table based on specific conditions from the source table. + +```sql +-- Create the customers table (target) +CREATE TABLE customers ( + customer_id INT, + customer_name VARCHAR(50), + status VARCHAR(20), + last_purchase_date DATE +); + +-- Insert initial customer data +INSERT INTO customers VALUES + (101, 'John Smith', 'Active', '2023-01-15'), + (102, 'Emma Johnson', 'Active', '2023-02-20'), + (103, 'Michael Brown', 'Inactive', '2022-11-05'), + (104, 'Sarah Wilson', 'Active', '2023-03-10'), + (105, 'David Lee', 'Inactive', '2022-09-30'); + +-- Create the removals table (source with customers to be removed) +CREATE TABLE removals ( + customer_id INT, + removal_reason VARCHAR(50), + removal_date DATE +); + +-- Insert data for customers to be removed +INSERT INTO removals VALUES + (103, 'Account Closed', '2023-04-01'), + (105, 'Customer Request', '2023-04-05'); + +-- Enable MERGE INTO + +-- Use MERGE to delete inactive customers that appear in the removals table +MERGE INTO customers AS c + USING removals AS r + ON c.customer_id = r.customer_id + WHEN MATCHED AND c.status = 'Inactive' THEN + DELETE; + +┌────────────────────────┐ +│ number of rows deleted │ +├────────────────────────┤ +│ 2 │ +└────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/microseconds.md b/tidb-cloud-lake/sql/microseconds.md new file mode 100644 index 0000000000000..c0fc841e88b0d --- /dev/null +++ b/tidb-cloud-lake/sql/microseconds.md @@ -0,0 +1,32 @@ +--- +title: TO_MICROSECONDS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of microseconds into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. 
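The resulting interval is typically used in date arithmetic. A minimal sketch, assuming interval addition with TIMESTAMP values is available in your version:

```sql
-- Shift a timestamp forward by 250 microseconds
SELECT '2025-01-01 00:00:00'::TIMESTAMP + TO_MICROSECONDS(250) AS shifted;
```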
+ +## Syntax + +```sql +TO_MICROSECONDS() +``` + +## Return Type + +Interval (in the format `hh:mm:ss.sssssss`). + +## Examples + +```sql +SELECT TO_MICROSECONDS(2), TO_MICROSECONDS(0), TO_MICROSECONDS((- 2)); + +┌────────────────────────────────────────────────────────────────┐ +│ to_microseconds(2) │ to_microseconds(0) │ to_microseconds(- 2) │ +├────────────────────┼────────────────────┼──────────────────────┤ +│ 0:00:00.000002 │ 00:00:00 │ -0:00:00.000002 │ +└────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/mid.md b/tidb-cloud-lake/sql/mid.md new file mode 100644 index 0000000000000..9ce598fc709d5 --- /dev/null +++ b/tidb-cloud-lake/sql/mid.md @@ -0,0 +1,5 @@ +--- +title: MID +--- + +Alias for [SUBSTR](/tidb-cloud-lake/sql/substr.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/millennia.md b/tidb-cloud-lake/sql/millennia.md new file mode 100644 index 0000000000000..733f1f0cfe9ae --- /dev/null +++ b/tidb-cloud-lake/sql/millennia.md @@ -0,0 +1,32 @@ +--- +title: TO_MILLENNIA +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of millennia into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_MILLENNIA() +``` + +## Return Type + +Interval (represented in years). + +## Examples + +```sql +SELECT TO_MILLENNIA(2), TO_MILLENNIA(0), TO_MILLENNIA((- 2)); + +┌───────────────────────────────────────────────────────┐ +│ to_millennia(2) │ to_millennia(0) │ to_millennia(- 2) │ +├─────────────────┼─────────────────┼───────────────────┤ +│ 2000 years │ 00:00:00 │ -2000 years │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/millennium.md b/tidb-cloud-lake/sql/millennium.md new file mode 100644 index 0000000000000..daebbf59c3fcd --- /dev/null +++ b/tidb-cloud-lake/sql/millennium.md @@ -0,0 +1,31 @@ +--- +title: MILLENNIUM +--- + +Returns the millennium of a given date or timestamp. The 1st millennium spans years 0001–1000, the 2nd spans 1001–2000, the 3rd spans 2001–3000, and so on. + +## Syntax + +```sql +MILLENNIUM() +``` + +## Return Type + +UInt8 — the millennium number starting from 1. + +## Examples + +```sql +SELECT + MILLENNIUM('1992-02-15') AS millennium_1992, + MILLENNIUM('2025-04-16 12:34:56') AS millennium_2025; +``` + +```sql +┌───────────────────────────────────┐ +│ millennium_1992 │ millennium_2025 │ +├─────────────────┼─────────────────┤ +│ 2 │ 3 │ +└───────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/milliseconds.md b/tidb-cloud-lake/sql/milliseconds.md new file mode 100644 index 0000000000000..eb6930e51b1aa --- /dev/null +++ b/tidb-cloud-lake/sql/milliseconds.md @@ -0,0 +1,32 @@ +--- +title: TO_MILLISECONDS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of milliseconds into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_MILLISECONDS() +``` + +## Return Type + +Interval (in the format `hh:mm:ss.sss`). 
+ +## Examples + +```sql +SELECT TO_MILLISECONDS(2), TO_MILLISECONDS(0), TO_MILLISECONDS((- 2)); + +┌────────────────────────────────────────────────────────────────┐ +│ to_milliseconds(2) │ to_milliseconds(0) │ to_milliseconds(- 2) │ +├────────────────────┼────────────────────┼──────────────────────┤ +│ 0:00:00.002 │ 00:00:00 │ -0:00:00.002 │ +└────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/min-if.md b/tidb-cloud-lake/sql/min-if.md new file mode 100644 index 0000000000000..b0715ef8adc0f --- /dev/null +++ b/tidb-cloud-lake/sql/min-if.md @@ -0,0 +1,45 @@ +--- +title: MIN_IF +--- + + +## MIN_IF + +The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. + +``` +MIN_IF(, ) +``` + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE project_budgets ( + id INT, + project_id INT, + department VARCHAR, + budget FLOAT +); + +INSERT INTO project_budgets (id, project_id, department, budget) +VALUES (1, 1, 'HR', 1000), + (2, 1, 'IT', 2000), + (3, 1, 'Marketing', 3000), + (4, 2, 'HR', 1500), + (5, 2, 'IT', 2500); +``` + +**Query Demo: Find Minimum Budget for IT Department** + +```sql +SELECT MIN_IF(budget, department = 'IT') AS min_it_budget +FROM project_budgets; +``` + +**Result** +```sql +| min_it_budget | +|---------------| +| 2000 | +``` diff --git a/tidb-cloud-lake/sql/min.md b/tidb-cloud-lake/sql/min.md new file mode 100644 index 0000000000000..156750796876b --- /dev/null +++ b/tidb-cloud-lake/sql/min.md @@ -0,0 +1,82 @@ +--- +title: MIN +--- + +Aggregate function. + +The MIN() function returns the minimum value in a set of values. + +## Syntax + +``` +MIN() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | Any expression | + +## Return Type + +The minimum value, in the type of the value. + +## Example + +--- +title: MIN +--- + +Aggregate function. + +The MIN() function returns the minimum value in a set of values. + +## Syntax + +``` +MIN(expression) +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| expression | Any expression | + +## Return Type + +The minimum value, in the type of the value. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE gas_prices ( + id INT, + station_id INT, + price FLOAT +); + +INSERT INTO gas_prices (id, station_id, price) +VALUES (1, 1, 3.50), + (2, 1, 3.45), + (3, 1, 3.55), + (4, 2, 3.40), + (5, 2, 3.35); +``` + +**Query Demo: Find Minimum Gas Price for Station 1** +```sql +SELECT station_id, MIN(price) AS min_price +FROM gas_prices +WHERE station_id = 1 +GROUP BY station_id; +``` + +**Result** +```sql +| station_id | min_price | +|------------|-----------| +| 1 | 3.45 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/minus.md b/tidb-cloud-lake/sql/minus.md new file mode 100644 index 0000000000000..1e85c712139e0 --- /dev/null +++ b/tidb-cloud-lake/sql/minus.md @@ -0,0 +1,29 @@ +--- +title: MINUS +--- + +Negates a numeric value. 
+ +## Syntax + +```sql +MINUS( ) +``` + +## Aliases + +- [NEG](/tidb-cloud-lake/sql/neg.md) +- [NEGATE](/tidb-cloud-lake/sql/negate.md) +- [SUBTRACT](/tidb-cloud-lake/sql/subtract.md) + +## Examples + +```sql +SELECT MINUS(PI()), NEG(PI()), NEGATE(PI()), SUBTRACT(PI()); + +┌───────────────────────────────────────────────────────────────────────────────────┐ +│ minus(pi()) │ neg(pi()) │ negate(pi()) │ subtract(pi()) │ +├────────────────────┼────────────────────┼────────────────────┼────────────────────┤ +│ -3.141592653589793 │ -3.141592653589793 │ -3.141592653589793 │ -3.141592653589793 │ +└───────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/minute.md b/tidb-cloud-lake/sql/minute.md new file mode 100644 index 0000000000000..b56a8786a7b83 --- /dev/null +++ b/tidb-cloud-lake/sql/minute.md @@ -0,0 +1,34 @@ +--- +title: TO_MINUTE +--- + +Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the minute of the hour (0-59). + +## Syntax + +```sql +TO_MINUTE() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + + `TINYINT` + +## Examples + +```sql +SELECT + to_minute('2023-11-12 09:38:18.165575'); + +┌─────────────────────────────────────────┐ +│ to_minute('2023-11-12 09:38:18.165575') │ +├─────────────────────────────────────────┤ +│ 38 │ +└─────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/minutes.md b/tidb-cloud-lake/sql/minutes.md new file mode 100644 index 0000000000000..b1aeece059f8b --- /dev/null +++ b/tidb-cloud-lake/sql/minutes.md @@ -0,0 +1,32 @@ +--- +title: TO_MINUTES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of minutes into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_MINUTES() +``` + +## Return Type + +Interval (in the format `hh:mm:ss`). + +## Examples + +```sql +SELECT TO_MINUTES(2), TO_MINUTES(0), TO_MINUTES((- 2)); + +┌─────────────────────────────────────────────────┐ +│ to_minutes(2) │ to_minutes(0) │ to_minutes(- 2) │ +├───────────────┼───────────────┼─────────────────┤ +│ 0:02:00 │ 00:00:00 │ -0:02:00 │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/mod.md b/tidb-cloud-lake/sql/mod.md new file mode 100644 index 0000000000000..ba34ee3755830 --- /dev/null +++ b/tidb-cloud-lake/sql/mod.md @@ -0,0 +1,5 @@ +--- +title: MOD +--- + +Alias for [MODULO](/tidb-cloud-lake/sql/modulo.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/mode.md b/tidb-cloud-lake/sql/mode.md new file mode 100644 index 0000000000000..8cad6250a7a27 --- /dev/null +++ b/tidb-cloud-lake/sql/mode.md @@ -0,0 +1,48 @@ +--- +title: MODE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the value that appears most frequently in a group of values. 
+ +## Syntax + +```sql +MODE() +``` + +## Examples + +This example identifies the best-selling product for each month from the sales data: + +```sql +CREATE OR REPLACE TABLE sales ( + sale_date DATE, + product_id INT, + quantity INT +); + +INSERT INTO sales (sale_date, product_id, quantity) VALUES + ('2024-01-01', 101, 10), + ('2024-01-02', 102, 15), + ('2024-01-02', 101, 10), + ('2024-01-03', 103, 8), + ('2024-01-03', 101, 10), + ('2024-02-01', 101, 20), + ('2024-02-02', 102, 15), + ('2024-02-03', 102, 15); + +SELECT MONTH(sale_date) AS month, MODE(product_id) AS most_sold_product +FROM sales +GROUP BY month +ORDER BY month; + +┌─────────────────────────────────────┐ +│ month │ most_sold_product │ +├─────────────────┼───────────────────┤ +│ 1 │ 101 │ +│ 2 │ 102 │ +└─────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/modulo.md b/tidb-cloud-lake/sql/modulo.md new file mode 100644 index 0000000000000..b75cc6e087a5b --- /dev/null +++ b/tidb-cloud-lake/sql/modulo.md @@ -0,0 +1,27 @@ +--- +title: MODULO +--- + +Returns the remainder of `x` divided by `y`. If `y` is 0, it returns an error. + +## Syntax + +```sql +MODULO( , ) +``` + +## Aliases + +- [MOD](/tidb-cloud-lake/sql/mod.md) + +## Examples + +```sql +SELECT MOD(9, 2), MODULO(9, 2); + +┌──────────────────────────┐ +│ mod(9, 2) │ modulo(9, 2) │ +├───────────┼──────────────┤ +│ 1 │ 1 │ +└──────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/monday.md b/tidb-cloud-lake/sql/monday.md new file mode 100644 index 0000000000000..f87f821b1ae15 --- /dev/null +++ b/tidb-cloud-lake/sql/monday.md @@ -0,0 +1,35 @@ +--- +title: TO_MONDAY +--- + +Round down a date or date with time (timestamp/datetime) to the nearest Monday. +Returns the date. + +## Syntax + +```sql +TO_MONDAY() +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| `` | date/timestamp | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. + +## Examples + +```sql +SELECT + to_monday('2023-11-12 09:38:18.165575'); + +┌─────────────────────────────────────────┐ +│ to_monday('2023-11-12 09:38:18.165575') │ +├─────────────────────────────────────────┤ +│ 2023-11-06 │ +└─────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/month.md b/tidb-cloud-lake/sql/month.md new file mode 100644 index 0000000000000..feb550b6a9853 --- /dev/null +++ b/tidb-cloud-lake/sql/month.md @@ -0,0 +1,8 @@ +--- +title: MONTH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_MONTH](/tidb-cloud-lake/sql/to-month.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/months-between.md b/tidb-cloud-lake/sql/months-between.md new file mode 100644 index 0000000000000..0cf36fb58a644 --- /dev/null +++ b/tidb-cloud-lake/sql/months-between.md @@ -0,0 +1,71 @@ +--- +title: MONTHS_BETWEEN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the number of months between *date1* and *date2*. + +## Syntax + +```sql +MONTHS_BETWEEN( , ) +``` + +## Arguments + +*date1* and *date2* can be of DATE type, TIMESTAMP type, or a mix of both. + +## Return Type + +The function returns a FLOAT value based on the following rules: + +- If *date1* is earlier than *date2*, the function returns a negative value; otherwise, it returns a positive value. 
+ + ```sql title='Example:' + SELECT + MONTHS_BETWEEN('2024-03-15'::DATE, + '2024-02-15'::DATE), + MONTHS_BETWEEN('2024-02-15'::DATE, + '2024-03-15'::DATE); + + -[ RECORD 1 ]----------------------------------- + months_between('2024-03-15'::date, '2024-02-15'::date): 1 + months_between('2024-02-15'::date, '2024-03-15'::date): -1 + ``` + +- If *date1* and *date2* fall on the same day of their respective months or both are the last day of their respective months, the result is an integer. Otherwise, the function calculates the fractional portion of the result based on a 31-day month. + + ```sql title='Example:' + SELECT + MONTHS_BETWEEN('2024-02-29'::DATE, + '2024-01-29'::DATE), + MONTHS_BETWEEN('2024-02-29'::DATE, + '2024-01-31'::DATE); + + -[ RECORD 1 ]----------------------------------- + months_between('2024-02-29'::date, '2024-01-29'::date): 1 + months_between('2024-02-29'::date, '2024-01-31'::date): 1 + + SELECT + MONTHS_BETWEEN('2024-08-05'::DATE, + '2024-01-01'::DATE); + + -[ RECORD 1 ]----------------------------------- + months_between('2024-08-05'::date, '2024-01-01'::date): 7.129032258064516 + ``` + +- If *date1* and *date2* are the same date, the function ignores any time components and returns 0. + + ```sql title='Example:' + SELECT + MONTHS_BETWEEN('2024-08-05'::DATE, + '2024-08-05'::DATE), + MONTHS_BETWEEN('2024-08-05 02:00:00'::TIMESTAMP, + '2024-08-05 01:00:00'::TIMESTAMP); + + -[ RECORD 1 ]----------------------------------- + months_between('2024-08-05'::date, '2024-08-05'::date): 0 + months_between('2024-08-05 02:00:00'::timestamp, '2024-08-05 01:00:00'::timestamp): 0 + ``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/months.md b/tidb-cloud-lake/sql/months.md new file mode 100644 index 0000000000000..dfdb28c366d01 --- /dev/null +++ b/tidb-cloud-lake/sql/months.md @@ -0,0 +1,32 @@ +--- +title: TO_MONTHS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of months into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_MONTHS() +``` + +## Return Type + +Interval (represented in months). + +## Examples + +```sql +SELECT TO_MONTHS(2), TO_MONTHS(0), TO_MONTHS((- 2)); + +┌──────────────────────────────────────────────┐ +│ to_months(2) │ to_months(0) │ to_months(- 2) │ +├──────────────┼──────────────┼────────────────┤ +│ 2 months │ 00:00:00 │ -2 months │ +└──────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/multiply.md b/tidb-cloud-lake/sql/multiply.md new file mode 100644 index 0000000000000..7cfa05a1c4e3a --- /dev/null +++ b/tidb-cloud-lake/sql/multiply.md @@ -0,0 +1,70 @@ +--- +title: MULTIPLY +--- + +The MULTIPLY function performs multiplication between two numbers. It is equivalent to using the `*` operator. + +## Syntax + +```sql +MULTIPLY(x, y) +-- Or using the operator syntax +x * y +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| x, y | Numeric expressions to multiply together. | + +## Return Type + +Returns a numeric value with the appropriate data type based on the input arguments. 
+ +## Examples + +```sql +-- Using the function syntax +SELECT MULTIPLY(5, 3); ++----------------+ +| MULTIPLY(5, 3) | ++----------------+ +| 15 | ++----------------+ + +-- Using the operator syntax +SELECT 5 * 3; ++-------+ +| 5 * 3 | ++-------+ +| 15 | ++-------+ + +-- With decimal numbers +SELECT MULTIPLY(2.5, 4); ++------------------+ +| MULTIPLY(2.5, 4) | ++------------------+ +| 10.0 | ++------------------+ + +-- With column references +SELECT number, MULTIPLY(number, 10) AS multiplied +FROM numbers(5); ++--------+------------+ +| number | multiplied | ++--------+------------+ +| 0 | 0 | +| 1 | 10 | +| 2 | 20 | +| 3 | 30 | +| 4 | 40 | ++--------+------------+ +``` + +## See Also + +- [PLUS](/tidb-cloud-lake/sql/plus.md) / [ADD](/tidb-cloud-lake/sql/add.md) +- [MINUS](/tidb-cloud-lake/sql/minus.md) / [SUBTRACT](/tidb-cloud-lake/sql/subtract.md) +- [DIV](/tidb-cloud-lake/sql/div.md) diff --git a/tidb-cloud-lake/sql/neg.md b/tidb-cloud-lake/sql/neg.md new file mode 100644 index 0000000000000..6374dfd29b996 --- /dev/null +++ b/tidb-cloud-lake/sql/neg.md @@ -0,0 +1,5 @@ +--- +title: NEG +--- + +Alias for [MINUS](/tidb-cloud-lake/sql/minus.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/negate.md b/tidb-cloud-lake/sql/negate.md new file mode 100644 index 0000000000000..a51a7967429bf --- /dev/null +++ b/tidb-cloud-lake/sql/negate.md @@ -0,0 +1,5 @@ +--- +title: NEGATE +--- + +Alias for [MINUS](/tidb-cloud-lake/sql/minus.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/network-policy.md b/tidb-cloud-lake/sql/network-policy.md new file mode 100644 index 0000000000000..ce50e2cb42b43 --- /dev/null +++ b/tidb-cloud-lake/sql/network-policy.md @@ -0,0 +1,28 @@ +--- +title: Network Policy +--- + +This page provides a comprehensive overview of Network Policy operations in Databend, organized by functionality for easy reference. + +## Network Policy Management + +| Command | Description | +|---------|-------------| +| [CREATE NETWORK POLICY](ddl-create-policy.md) | Creates a new network policy to control access based on IP addresses | +| [ALTER NETWORK POLICY](ddl-alter-policy.md) | Modifies an existing network policy | +| [DROP NETWORK POLICY](ddl-drop-policy.md) | Removes a network policy | + +## Network Policy Information + +| Command | Description | +|---------|-------------| +| [DESCRIBE NETWORK POLICY](ddl-desc-policy.md) | Shows details of a specific network policy | +| [SHOW NETWORK POLICIES](ddl-show-policy.md) | Lists all network policies | + +## Related Topics + +- [Network Policy](/tidb-cloud-lake/guides/network-policy.md) + +:::note +Network policies in Databend allow you to control access to your database by specifying allowed or blocked IP addresses and ranges. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/next-day.md b/tidb-cloud-lake/sql/next-day.md new file mode 100644 index 0000000000000..2e2cdac1e66a7 --- /dev/null +++ b/tidb-cloud-lake/sql/next-day.md @@ -0,0 +1,38 @@ +--- +title: NEXT_DAY +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the date of the upcoming specified day of the week after the given date or timestamp. + +## Syntax + +```sql +NEXT_DAY(, ) +``` + +| Parameter | Description | +|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | A `DATE` or `TIMESTAMP` value to calculate the next occurrence of the specified day. 
| +| `` | The target day of the week to find the next occurrence of. Accepted values include `monday`, `tuesday`, `wednesday`, `thursday`, `friday`, `saturday`, and `sunday`. | + +## Return Type + +Date. + +## Examples + +To find the next Monday after a specific date, such as 2024-11-13: + +```sql +SELECT NEXT_DAY(to_date('2024-11-13'), monday) AS next_monday; + +┌─────────────┐ +│ next_monday │ +├─────────────┤ +│ 2024-11-18 │ +└─────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/nextval.md b/tidb-cloud-lake/sql/nextval.md new file mode 100644 index 0000000000000..8c83a4ec8fb22 --- /dev/null +++ b/tidb-cloud-lake/sql/nextval.md @@ -0,0 +1,88 @@ +--- +title: NEXTVAL +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Retrieves the next value from a sequence. + +## Syntax + +```sql +NEXTVAL() +``` + +## Return Type + +Integer. + +## Access control requirements + +| Privilege | Object Type | Description | +|:----------------|:------------|:-------------------| +| ACCESS SEQUENCE | SEQUENCE | Access a sequence. | + + +To access a sequence, the user performing the operation or the roles must have the ACCESS SEQUENCE [privilege](/tidb-cloud-lake/guides/privileges.md). + +:::note + +The enable_experimental_sequence_rbac_check settings governs sequence-level access control. It is disabled by default. +sequence creation solely requires the user to possess superuser privileges, bypassing detailed RBAC checks. +When enabled, granular permission verification is enforced during sequence establishment. + +This is an experimental feature and may be enabled by default in the future. + +::: + + +## Examples + +This example demonstrates how the NEXTVAL function works with a sequence: + +```sql +CREATE SEQUENCE my_seq; + +SELECT + NEXTVAL(my_seq), + NEXTVAL(my_seq), + NEXTVAL(my_seq); + +┌─────────────────────────────────────────────────────┐ +│ nextval(my_seq) │ nextval(my_seq) │ nextval(my_seq) │ +├─────────────────┼─────────────────┼─────────────────┤ +│ 1 │ 2 │ 3 │ +└─────────────────────────────────────────────────────┘ +``` + +This example showcases how sequences and the NEXTVAL function are employed to automatically generate and assign unique identifiers to rows in a table. + +```sql +-- Create a new sequence named staff_id_seq +CREATE SEQUENCE staff_id_seq; + +-- Create a new table named staff with an auto-generated staff_id +CREATE TABLE staff ( + staff_id INT DEFAULT NEXTVAL(staff_id_seq), + name VARCHAR(50), + department VARCHAR(50) +); + +-- Insert a new staff member with an auto-generated staff_id into the staff table +INSERT INTO staff (name, department) +VALUES ('John Doe', 'HR'); + +-- Insert another row +INSERT INTO staff (name, department) +VALUES ('Jane Smith', 'Finance'); + +SELECT * FROM staff; + +┌───────────────────────────────────────────────────────┐ +│ staff_id │ name │ department │ +├─────────────────┼──────────────────┼──────────────────┤ +│ 3 │ Jane Smith │ Finance │ +│ 2 │ John Doe │ HR │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/ngram-index.md b/tidb-cloud-lake/sql/ngram-index.md new file mode 100644 index 0000000000000..ecaf803e76f8e --- /dev/null +++ b/tidb-cloud-lake/sql/ngram-index.md @@ -0,0 +1,16 @@ +--- +title: Ngram Index +--- +This page provides a comprehensive overview of Ngram index operations in Databend, organized by functionality for easy reference. 
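As a quick illustration of the workflow (a sketch only; the table is hypothetical and the minimal index form is assumed; see the commands below and the linked pages for the full syntax and options):

```sql
-- Hypothetical table with free-text comments
CREATE OR REPLACE TABLE customer_feedback (id INT, comment STRING);

-- Create an Ngram index on the text column (minimal form)
CREATE NGRAM INDEX idx_feedback_comment ON customer_feedback(comment);

-- Substring searches like this can then be served by the index
SELECT id, comment
FROM customer_feedback
WHERE comment LIKE '%refund%';
```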
+ +## Ngram Index Management + +| Command | Description | +|-----------------------------------------------|----------------------------------------------------------| +| [CREATE NGRAM INDEX](/tidb-cloud-lake/sql/create-ngram-index.md) | Creates a new Ngram index for efficient substring search | +| [REFRESH NGRAM INDEX](/tidb-cloud-lake/sql/refresh-ngram-index.md) | Refreshes an Ngram index | +| [DROP NGRAM INDEX](/tidb-cloud-lake/sql/drop-ngram-index.md) | Removes an Ngram index | + +:::note +Ngram indexes in Databend enable efficient substring and pattern matching searches within text data, improving performance for LIKE and similar operations. +::: diff --git a/tidb-cloud-lake/sql/not-between.md b/tidb-cloud-lake/sql/not-between.md new file mode 100644 index 0000000000000..cf425ba5435e3 --- /dev/null +++ b/tidb-cloud-lake/sql/not-between.md @@ -0,0 +1,31 @@ +--- +title: "[ NOT ] BETWEEN" +--- + +Returns `true` if the given numeric or string ` ` falls inside the defined lower and upper limits. + +## Syntax + +```sql + [ NOT ] BETWEEN AND +``` + +## Examples + +```sql +SELECT 'true' WHERE 5 BETWEEN 0 AND 5; + +┌────────┐ +│ 'true' │ +├────────┤ +│ true │ +└────────┘ + +SELECT 'true' WHERE 'data' BETWEEN 'data' AND 'databendcloud'; + +┌────────┐ +│ 'true' │ +├────────┤ +│ true │ +└────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/not-like.md b/tidb-cloud-lake/sql/not-like.md new file mode 100644 index 0000000000000..d2cf1d92141c0 --- /dev/null +++ b/tidb-cloud-lake/sql/not-like.md @@ -0,0 +1,24 @@ +--- +title: NOT LIKE +--- + +Pattern not matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. + +## Syntax + +```sql + NOT LIKE +``` + +## Examples + +```sql +SELECT name, category FROM system.functions WHERE name like 'tou%' AND name not like '%64' ORDER BY name; ++----------+------------+ +| name | category | ++----------+------------+ +| touint16 | conversion | +| touint32 | conversion | +| touint8 | conversion | ++----------+------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/not-regexp.md b/tidb-cloud-lake/sql/not-regexp.md new file mode 100644 index 0000000000000..91c5389761e77 --- /dev/null +++ b/tidb-cloud-lake/sql/not-regexp.md @@ -0,0 +1,22 @@ +--- +title: NOT REGEXP +--- + +Returns 1 if the string expr doesn't match the regular expression specified by the pattern pat, 0 otherwise. + +## Syntax + +```sql + NOT REGEXP +``` + +## Examples + +```sql +SELECT 'databend' NOT REGEXP 'd*'; ++------------------------------+ +| ('databend' not regexp 'd*') | ++------------------------------+ +| 0 | ++------------------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/not-rlike.md b/tidb-cloud-lake/sql/not-rlike.md new file mode 100644 index 0000000000000..509ab71293cbd --- /dev/null +++ b/tidb-cloud-lake/sql/not-rlike.md @@ -0,0 +1,22 @@ +--- +title: NOT RLIKE +--- + +Returns 1 if the string expr doesn't match the regular expression specified by the pattern pat, 0 otherwise. 
+ +## Syntax + +```sql + NOT RLIKE +``` + +## Examples + +```sql +SELECT 'databend' not rlike 'd*'; ++-----------------------------+ +| ('databend' not rlike 'd*') | ++-----------------------------+ +| 0 | ++-----------------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/not.md b/tidb-cloud-lake/sql/not.md new file mode 100644 index 0000000000000..82b87fadda315 --- /dev/null +++ b/tidb-cloud-lake/sql/not.md @@ -0,0 +1,23 @@ +--- +title: "[ NOT ] IN" +--- + +Checks whether a value is (or is not) in an explicit list. + +## Syntax + +```sql + [ NOT ] IN (, ...) +``` + +## Examples + +```sql +SELECT 1 NOT IN (2, 3); + +┌────────────────┐ +│ 1 not in(2, 3) │ +├────────────────┤ +│ true │ +└────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/notification.md b/tidb-cloud-lake/sql/notification.md new file mode 100644 index 0000000000000..f7d1bc1fbc79e --- /dev/null +++ b/tidb-cloud-lake/sql/notification.md @@ -0,0 +1,17 @@ +--- +title: Notification +--- + +This page provides a comprehensive overview of Notification operations in Databend Cloud, organized by functionality for easy reference. + +## Notification Management + +| Command | Description | +|---------|-------------| +| [CREATE NOTIFICATION](01-ddl-create-notification.md) | Creates a new notification integration for event alerts | +| [ALTER NOTIFICATION](02-ddl-alter-notification.md) | Modifies an existing notification integration | +| [DROP NOTIFICATION](03-ddl-drop-notification.md) | Removes a notification integration | + +:::note +Notifications in Databend Cloud allow you to configure integrations with external services like email or Slack to receive alerts about database events and operations. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/now.md b/tidb-cloud-lake/sql/now.md new file mode 100644 index 0000000000000..d24da93281cd4 --- /dev/null +++ b/tidb-cloud-lake/sql/now.md @@ -0,0 +1,33 @@ +--- +title: NOW +--- + +Returns the current date and time. + +## Syntax + +```sql +NOW() +``` + +## Return Type + +TIMESTAMP + +## Aliases + +- [CURRENT_TIMESTAMP](/tidb-cloud-lake/sql/current-timestamp.md) + +## Examples + +This example returns the current date and time: + +```sql +SELECT CURRENT_TIMESTAMP(), NOW(); + +┌─────────────────────────────────────────────────────────┐ +│ current_timestamp() │ now() │ +├────────────────────────────┼────────────────────────────┤ +│ 2024-01-29 04:38:12.584359 │ 2024-01-29 04:38:12.584417 │ +└─────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/nth-value.md b/tidb-cloud-lake/sql/nth-value.md new file mode 100644 index 0000000000000..c780ebbeab73f --- /dev/null +++ b/tidb-cloud-lake/sql/nth-value.md @@ -0,0 +1,102 @@ +--- +title: NTH_VALUE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the value at the specified position (N) within the window frame. + +See also: + +- [FIRST_VALUE](/tidb-cloud-lake/sql/first-value.md) +- [LAST_VALUE](/tidb-cloud-lake/sql/last-value.md) + +## Syntax + +```sql +NTH_VALUE( + expression, + n +) +[ { RESPECT | IGNORE } NULLS ] +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY order_expression + [ window_frame ] +) +``` + +**Arguments:** +- `expression`: Required. The column or expression to evaluate. +- `n`: Required. Position number (1-based) of the value to return. +- `IGNORE NULLS`: Optional. Skips null values when counting positions. +- `RESPECT NULLS`: Optional. 
Keeps null values when counting positions (default). +- `window_frame`: Optional. Defines the window frame. The default is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. + +**Notes:** +- `n` must be a positive integer; `n = 1` is equivalent to `FIRST_VALUE`. +- Returns `NULL` if the specified position does not exist in the frame. +- Combine with `ROWS BETWEEN ...` to control whether the position is evaluated over the whole partition or the rows seen so far. +- For window frame syntax, see [Window Frame Syntax](/tidb-cloud-lake/sql/window-functions.md#basic-syntax). + +## Examples + +```sql +-- Sample order data +CREATE OR REPLACE TABLE orders_window_demo ( + customer VARCHAR, + order_id INT, + order_time TIMESTAMP, + amount INT, + sales_rep VARCHAR +); + +INSERT INTO orders_window_demo VALUES + ('Alice', 1001, to_timestamp('2024-05-01 09:00:00'), 120, 'Erin'), + ('Alice', 1002, to_timestamp('2024-05-01 11:00:00'), 135, NULL), + ('Alice', 1003, to_timestamp('2024-05-02 14:30:00'), 125, 'Glen'), + ('Bob', 1004, to_timestamp('2024-05-01 08:30:00'), 90, NULL), + ('Bob', 1005, to_timestamp('2024-05-01 20:15:00'), 105, 'Kai'), + ('Bob', 1006, to_timestamp('2024-05-03 10:00:00'), 95, NULL), + ('Carol', 1007, to_timestamp('2024-05-04 09:45:00'), 80, 'Lily'); +``` + +**Find the second order and illustrate null-handling for the second sales rep:** + +```sql +SELECT customer, + order_id, + order_time, + NTH_VALUE(order_id, 2) OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS second_order_so_far, + NTH_VALUE(sales_rep, 2) RESPECT NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS second_rep_respect, + NTH_VALUE(sales_rep, 2) IGNORE NULLS OVER ( + PARTITION BY customer + ORDER BY order_time + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS second_rep_ignore +FROM orders_window_demo +ORDER BY customer, order_time; +``` + +Result: +``` +customer | order_id | order_time | second_order_so_far | second_rep_respect | second_rep_ignore +---------+----------+----------------------+---------------------+--------------------+------------------- +Alice | 1001 | 2024-05-01 09:00:00 | NULL | NULL | NULL +Alice | 1002 | 2024-05-01 11:00:00 | 1002 | NULL | NULL +Alice | 1003 | 2024-05-02 14:30:00 | 1002 | NULL | Glen +Bob | 1004 | 2024-05-01 08:30:00 | NULL | NULL | NULL +Bob | 1005 | 2024-05-01 20:15:00 | 1005 | Kai | Kai +Bob | 1006 | 2024-05-03 10:00:00 | 1005 | Kai | Kai +Carol | 1007 | 2024-05-04 09:45:00 | NULL | NULL | NULL +``` diff --git a/tidb-cloud-lake/sql/ntile.md b/tidb-cloud-lake/sql/ntile.md new file mode 100644 index 0000000000000..6e2944184f32f --- /dev/null +++ b/tidb-cloud-lake/sql/ntile.md @@ -0,0 +1,99 @@ +--- +title: NTILE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Divides rows into a specified number of buckets and assigns a bucket number to each row. Rows are distributed as evenly as possible across buckets. + +## Syntax + +```sql +NTILE(bucket_count) +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `bucket_count`: Required. Number of buckets to create (must be positive integer) +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the distribution order +- `ASC | DESC`: Optional. 
Sort direction (default: ASC) + +**Notes:** +- Bucket numbers range from 1 to `bucket_count` +- Rows are distributed as evenly as possible +- If rows don't divide evenly, earlier buckets get one extra row +- Useful for creating percentiles and equal-sized groups + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + subject VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 'Math', 95), + ('Alice', 'English', 87), + ('Alice', 'Science', 92), + ('Bob', 'Math', 85), + ('Bob', 'English', 85), + ('Bob', 'Science', 80), + ('Charlie', 'Math', 88), + ('Charlie', 'English', 85), + ('Charlie', 'Science', 85); +``` + +**Divide all scores into 3 buckets (tertiles):** + +```sql +SELECT student, subject, score, + NTILE(3) OVER (ORDER BY score DESC) AS score_bucket +FROM scores +ORDER BY score DESC, student, subject; +``` + +Result: +``` +student | subject | score | score_bucket +--------+---------+-------+------------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 1 +Charlie | Math | 88 | 1 +Alice | English | 87 | 2 +Bob | English | 85 | 2 +Bob | Math | 85 | 2 +Charlie | English | 85 | 3 +Charlie | Science | 85 | 3 +Bob | Science | 80 | 3 +``` + +**Divide scores into quartiles within each student:** + +```sql +SELECT student, subject, score, + NTILE(2) OVER (PARTITION BY student ORDER BY score DESC) AS performance_half +FROM scores +ORDER BY student, score DESC, subject; +``` + +Result: +``` +student | subject | score | performance_half +--------+---------+-------+----------------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 1 +Alice | English | 87 | 2 +Bob | English | 85 | 1 +Bob | Math | 85 | 1 +Bob | Science | 80 | 2 +Charlie | Math | 88 | 1 +Charlie | English | 85 | 2 +Charlie | Science | 85 | 2 \ No newline at end of file diff --git a/tidb-cloud-lake/sql/nullable.md b/tidb-cloud-lake/sql/nullable.md new file mode 100644 index 0000000000000..370f3cf62383b --- /dev/null +++ b/tidb-cloud-lake/sql/nullable.md @@ -0,0 +1,37 @@ +--- +title: TO_NULLABLE +--- + +Converts a value to its nullable equivalent. + +When you apply this function to a value, it checks if the value is already able to hold NULL values or not. If the value is already able to hold NULL values, the function will return the value without making any changes. + +However, if the value is not able to hold NULL values, the TO_NULLABLE function will modify the value to make it able to hold NULL values. It does this by wrapping the value in a structure that can hold NULL values, which means the value can now hold NULL values in the future. + +## Syntax + +```sql +TO_NULLABLE(x); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------| +| x | The original value. | + + +## Return Type + +Returns a value of the same data type as the input value, but wrapped in a nullable container if the input value is not already nullable. + +## Examples + +```sql +SELECT typeof(3), TO_NULLABLE(3), typeof(TO_NULLABLE(3)); + +typeof(3) |to_nullable(3)|typeof(to_nullable(3))| +----------------+--------------+----------------------+ +TINYINT UNSIGNED| 3|TINYINT UNSIGNED NULL | + +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/nullif.md b/tidb-cloud-lake/sql/nullif.md new file mode 100644 index 0000000000000..ba263bdc7f1a8 --- /dev/null +++ b/tidb-cloud-lake/sql/nullif.md @@ -0,0 +1,23 @@ +--- +title: NULLIF +--- + +Returns NULL if two expressions are equal. Otherwise return expr1. They must have the same data type. 
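A common use of NULLIF is guarding against division-by-zero: turning a zero divisor into NULL makes the division return NULL instead of raising an error. A minimal sketch:

```sql
-- 10 / 0 would fail; NULLIF(0, 0) yields NULL, so the result is NULL
SELECT 10 / NULLIF(0, 0) AS safe_ratio;
```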
+ +## Syntax + +```sql +NULLIF(, ) +``` + +## Examples + +```sql +SELECT NULLIF(0, NULL); + +┌─────────────────┐ +│ nullif(0, null) │ +├─────────────────┤ +│ 0 │ +└─────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/numeric-functions.md b/tidb-cloud-lake/sql/numeric-functions.md new file mode 100644 index 0000000000000..53e0d057683cf --- /dev/null +++ b/tidb-cloud-lake/sql/numeric-functions.md @@ -0,0 +1,70 @@ +--- +title: Numeric Functions +--- + +This page provides a comprehensive overview of Numeric functions in Databend, organized by functionality for easy reference. + +## Basic Arithmetic Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [PLUS](/tidb-cloud-lake/sql/plus.md) / [ADD](/tidb-cloud-lake/sql/add.md) | Addition operator | `5 + 3` → `8` | +| [MINUS](/tidb-cloud-lake/sql/minus.md) / [SUBTRACT](/tidb-cloud-lake/sql/subtract.md) | Subtraction operator | `5 - 3` → `2` | +| [MULTIPLY](/tidb-cloud-lake/sql/multiply.md) | Multiplication operator | `5 * 3` → `15` | +| [DIV](/tidb-cloud-lake/sql/div.md) | Division operator | `10 / 2` → `5.0` | +| [DIV0](/tidb-cloud-lake/sql/div.md) | Division that returns 0 instead of error for division by zero | `DIV0(10, 0)` → `0` | +| [DIVNULL](/tidb-cloud-lake/sql/divnull.md) | Division that returns NULL instead of error for division by zero | `DIVNULL(10, 0)` → `NULL` | +| [INTDIV](/tidb-cloud-lake/sql/intdiv.md) | Integer division | `10 DIV 3` → `3` | +| [MOD](/tidb-cloud-lake/sql/mod.md) / [MODULO](/tidb-cloud-lake/sql/modulo.md) | Modulo operation (remainder) | `10 % 3` → `1` | +| [NEG](/tidb-cloud-lake/sql/neg.md) / [NEGATE](/tidb-cloud-lake/sql/negate.md) | Negation | `-5` → `-5` | + +## Rounding and Truncation Functions + +| Function | Description | Example | +|-----------------------------------------|-----------------------------------------------------------|----------------------------------| +| [ROUND](/tidb-cloud-lake/sql/round.md) | Rounds a number to specified decimal places | `ROUND(123.456, 2)` → `123.46` | +| [FLOOR](/tidb-cloud-lake/sql/floor.md) | Returns the largest integer not greater than the argument | `FLOOR(123.456)` → `123` | +| [CEIL](/tidb-cloud-lake/sql/ceil.md) / [CEILING](/tidb-cloud-lake/sql/ceiling.md) | Returns the smallest integer not less than the argument | `CEIL(123.456)` → `124` | +| [TRUNCATE](/tidb-cloud-lake/sql/truncate.md) | Truncates a number to specified decimal places | `TRUNCATE(123.456, 1)` → `123.4` | +| [TRUNC](/tidb-cloud-lake/sql/trunc.md) | Truncates a number to specified decimal places | `TRUNC(123.456, 1)` → `123.4` | + +## Exponential and Logarithmic Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [EXP](/tidb-cloud-lake/sql/exp.md) | Returns e raised to the power of x | `EXP(1)` → `2.718281828459045` | +| [POW](/tidb-cloud-lake/sql/pow.md) / [POWER](/tidb-cloud-lake/sql/power.md) | Returns x raised to the power of y | `POW(2, 3)` → `8` | +| [SQRT](/tidb-cloud-lake/sql/sqrt.md) | Returns the square root of x | `SQRT(16)` → `4` | +| [CBRT](/tidb-cloud-lake/sql/cbrt.md) | Returns the cube root of x | `CBRT(27)` → `3` | +| [LN](/tidb-cloud-lake/sql/ln.md) | Returns the natural logarithm of x | `LN(2.718281828459045)` → `1` | +| [LOG10](/tidb-cloud-lake/sql/log.md) | Returns the base-10 logarithm of x | `LOG10(100)` → `2` | +| [LOG2](/tidb-cloud-lake/sql/log.md) | Returns the base-2 logarithm of x | `LOG2(8)` → `3` | +| [LOGX](/tidb-cloud-lake/sql/log-x.md) | Returns the logarithm 
of y to base x | `LOGX(2, 8)` → `3` | +| [LOGBX](/tidb-cloud-lake/sql/log-b-x.md) | Returns the logarithm of x to base b | `LOGBX(8, 2)` → `3` | + +## Trigonometric Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [SIN](/tidb-cloud-lake/sql/sin.md) | Returns the sine of x | `SIN(0)` → `0` | +| [COS](/tidb-cloud-lake/sql/cos.md) | Returns the cosine of x | `COS(0)` → `1` | +| [TAN](/tidb-cloud-lake/sql/tan.md) | Returns the tangent of x | `TAN(0)` → `0` | +| [COT](/tidb-cloud-lake/sql/cot.md) | Returns the cotangent of x | `COT(1)` → `0.6420926159343306` | +| [ASIN](/tidb-cloud-lake/sql/asin.md) | Returns the arc sine of x | `ASIN(1)` → `1.5707963267948966` | +| [ACOS](/tidb-cloud-lake/sql/acos.md) | Returns the arc cosine of x | `ACOS(1)` → `0` | +| [ATAN](/tidb-cloud-lake/sql/atan.md) | Returns the arc tangent of x | `ATAN(1)` → `0.7853981633974483` | +| [ATAN2](/tidb-cloud-lake/sql/atan.md) | Returns the arc tangent of y/x | `ATAN2(1, 1)` → `0.7853981633974483` | +| [DEGREES](/tidb-cloud-lake/sql/degrees.md) | Converts radians to degrees | `DEGREES(PI())` → `180` | +| [RADIANS](/tidb-cloud-lake/sql/radians.md) | Converts degrees to radians | `RADIANS(180)` → `3.141592653589793` | +| [PI](/tidb-cloud-lake/sql/pi.md) | Returns the value of π | `PI()` → `3.141592653589793` | + +## Other Numeric Functions + +| Function | Description | Example | +|----------|-------------|---------| +| [ABS](/tidb-cloud-lake/sql/abs.md) | Returns the absolute value of x | `ABS(-5)` → `5` | +| [SIGN](/tidb-cloud-lake/sql/sign.md) | Returns the sign of x | `SIGN(-5)` → `-1` | +| [FACTORIAL](/tidb-cloud-lake/sql/factorial.md) | Returns the factorial of x | `FACTORIAL(5)` → `120` | +| [RAND](/tidb-cloud-lake/sql/rand.md) | Returns a random number between 0 and 1 | `RAND()` → `0.123...` (random) | +| [RANDN](/tidb-cloud-lake/sql/rand-n.md) | Returns a random number from standard normal distribution | `RANDN()` → `-0.123...` (random) | +| [CRC32](/tidb-cloud-lake/sql/crc.md) | Returns the CRC32 checksum of a string | `CRC32('Databend')` → `3899655467` | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/numeric.md b/tidb-cloud-lake/sql/numeric.md new file mode 100644 index 0000000000000..5a9fed06ad497 --- /dev/null +++ b/tidb-cloud-lake/sql/numeric.md @@ -0,0 +1,73 @@ +--- +title: Numeric +description: Basic Numeric data type. +sidebar_position: 4 +--- + +## Integer Data Types + +| Name | Alias | Size | Min Value | Max Value | +|----------|-------|---------|----------------------|---------------------| +| TINYINT | INT8 | 1 byte | -128 | 127 | +| SMALLINT | INT16 | 2 bytes | -32768 | 32767 | +| INT | INT32 | 4 bytes | -2147483648 | 2147483647 | +| BIGINT | INT64 | 8 bytes | -9223372036854775808 | 9223372036854775807 | + +:::tip +If you want unsigned integer, please use `UNSIGNED` constraint, this is compatible with MySQL, for example: + +```sql +CREATE TABLE test_numeric(tiny TINYINT, tiny_unsigned TINYINT UNSIGNED) +``` +::: + +## Floating-Point Data Types + +| Name | Size | Min Value | Max Value | +|--------|---------|--------------------------|-------------------------| +| FLOAT | 4 bytes | -3.40282347e+38 | 3.40282347e+38 | +| DOUBLE | 8 bytes | -1.7976931348623157E+308 | 1.7976931348623157E+308 | + +## Functions + +See [Numeric Functions](/tidb-cloud-lake/sql/numeric-functions.md). 
+ +## Examples + +```sql +CREATE TABLE test_numeric +( + tiny TINYINT, + tiny_unsigned TINYINT UNSIGNED, + smallint SMALLINT, + smallint_unsigned SMALLINT UNSIGNED, + int INT, + int_unsigned INT UNSIGNED, + bigint BIGINT, + bigint_unsigned BIGINT UNSIGNED, + float FLOAT, + double DOUBLE +); +``` + +```sql +DESC test_numeric; +``` + +Result: +``` +┌───────────────────────────────────────────────────────────────────┐ +│ Field │ Type │ Null │ Default │ Extra │ +├───────────────────┼───────────────────┼────────┼─────────┼────────┤ +│ tiny │ TINYINT │ YES │ NULL │ │ +│ tiny_unsigned │ TINYINT UNSIGNED │ YES │ NULL │ │ +│ smallint │ SMALLINT │ YES │ NULL │ │ +│ smallint_unsigned │ SMALLINT UNSIGNED │ YES │ NULL │ │ +│ int │ INT │ YES │ NULL │ │ +│ int_unsigned │ INT UNSIGNED │ YES │ NULL │ │ +│ bigint │ BIGINT │ YES │ NULL │ │ +│ bigint_unsigned │ BIGINT UNSIGNED │ YES │ NULL │ │ +│ float │ FLOAT │ YES │ NULL │ │ +│ double │ DOUBLE │ YES │ NULL │ │ +└───────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/nvl.md b/tidb-cloud-lake/sql/nvl.md new file mode 100644 index 0000000000000..683b0a9e13ab2 --- /dev/null +++ b/tidb-cloud-lake/sql/nvl.md @@ -0,0 +1,38 @@ +--- +title: NVL +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +If `` is NULL, returns ``, otherwise returns ``. + +## Syntax + +```sql +NVL(, ) +``` + +## Aliases + +- [IFNULL](/tidb-cloud-lake/sql/ifnull.md) + +## Examples + +```sql +SELECT NVL(NULL, 'b'), NVL('a', 'b'); + +┌────────────────────────────────┐ +│ nvl(null, 'b') │ nvl('a', 'b') │ +├────────────────┼───────────────┤ +│ b │ a │ +└────────────────────────────────┘ + +SELECT NVL(NULL, 2), NVL(1, 2); + +┌──────────────────────────┐ +│ nvl(null, 2) │ nvl(1, 2) │ +├──────────────┼───────────┤ +│ 2 │ 1 │ +└──────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/nvl2.md b/tidb-cloud-lake/sql/nvl2.md new file mode 100644 index 0000000000000..c3d9041828d39 --- /dev/null +++ b/tidb-cloud-lake/sql/nvl2.md @@ -0,0 +1,34 @@ +--- +title: NVL2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns `` if `` is not NULL; otherwise, it returns ``. + +## Syntax + +```sql +NVL2( , , ) +``` + +## Examples + +```sql +SELECT NVL2('a', 'b', 'c'), NVL2(NULL, 'b', 'c'); + +┌────────────────────────────────────────────┐ +│ nvl2('a', 'b', 'c') │ nvl2(null, 'b', 'c') │ +├─────────────────────┼──────────────────────┤ +│ b │ c │ +└────────────────────────────────────────────┘ + +SELECT NVL2(1, 2, 3), NVL2(NULL, 2, 3); + +┌──────────────────────────────────┐ +│ nvl2(1, 2, 3) │ nvl2(null, 2, 3) │ +├───────────────┼──────────────────┤ +│ 2 │ 3 │ +└──────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/obfuscate.md b/tidb-cloud-lake/sql/obfuscate.md new file mode 100644 index 0000000000000..3af2e5a91eb8d --- /dev/null +++ b/tidb-cloud-lake/sql/obfuscate.md @@ -0,0 +1,45 @@ +--- +title: OBFUSCATE +--- + +Dataset anonymization. This is a quick tool, and for more complex scenarios, it is recommended to directly use the underlying function [MARKOV_TRAIN](/tidb-cloud-lake/sql/markov-train.md), [MARKOV_GENERATE](/tidb-cloud-lake/sql/markov-generate.md), [FEISTEL_OBFUSCATE](/tidb-cloud-lake/sql/feistel-obfuscate.md). + +## Syntax + +```sql +OBFUSCATE('
'[, seed => ]) +``` + +## Examples + +```sql +CREATE OR REPLACE TABLE demo_customers AS +SELECT * +FROM ( + VALUES + (1,'Alice Johnson','alice.johnson@gmail.com','555-123-0001','123 Maple St, Springfield, IL'), + (2,'Bob Smith','bob.smith@yahoo.com','555-123-0002','456 Oak Ave, Dayton, OH'), + (3,'Carol Davis','carol.davis@outlook.com','555-123-0003','789 Pine Rd, Austin, TX'), + (4,'David Miller','david.miller@example.com','555-123-0004','321 Birch Blvd, Denver, CO'), + (5,'Emma Wilson','emma.wilson@example.com','555-123-0005','654 Cedar Ln, Seattle, WA'), + (6,'Frank Brown','frank.brown@gmail.com','555-123-0006','987 Walnut Dr, Portland, OR'), + (7,'Grace Lee','grace.lee@example.com','555-123-0007','159 Ash Ct, Boston, MA'), + (8,'Henry Clark','henry.clark@example.com','555-123-0008','753 Elm St, Phoenix, AZ') +) AS t(id, full_name, email, phone, address); + +-- One-call table masking; seed keeps it reproducible +SELECT * FROM obfuscate(demo_customers, seed=>2025) +ORDER BY id; + +-- Sample output +┌────id┬───────────────┬────────────────────────────────┬──────────────┬────────────────────────────────────┐ +│ 1 │ Alice Johnson │ emma.wilson@example.com │ 555-123-0002 │ 123 Maple St, Phoenix, AZ │ +│ 2 │ Alice Johnson │ grace.lee@example.com │ 555-123-0007 │ 753 Elm St, Phoenix, AZ │ +│ 3 │ David Miller │ frank.brown@gmail.com │ 555-123-0001 │ 321 Birch Blvd, Denver, │ +│ 4 │ Alice Johnson │ emma.wilson@example.com │ 555-123-0001 │ 654 Cedar Ln, Seattle, WA │ +│ 5 │ Grace Lee │ carol.david.miller@example │ 555-123-0003 │ 123 Maple St, Phoenix, AZ │ +│ 6 │ Carol David │ emma.wilson@example.com │ 555-123-0003 │ 654 Cedar Ln, Seattle, │ +│ 7 │ Emma Wilson │ bob.smith@yahoo.com │ 555-123-0004 │ 456 Oak Ave, Dayton, MA │ +│ 9 │ Carol David │ frank.brown@gmail.com │ 555-123-0006 │ 456 Oak Ave, Dayton, MA │ +└──────┴───────────────┴────────────────────────────────┴──────────────┴────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/object-construct-keep-null.md b/tidb-cloud-lake/sql/object-construct-keep-null.md new file mode 100644 index 0000000000000..71a834d567c35 --- /dev/null +++ b/tidb-cloud-lake/sql/object-construct-keep-null.md @@ -0,0 +1,71 @@ +--- +title: OBJECT_CONSTRUCT_KEEP_NULL +title_includes: TRY_OBJECT_CONSTRUCT_KEEP_NULL, JSON_OBJECT_KEEP_NULL, TRY_JSON_OBJECT_KEEP_NULL +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a JSON object with keys and values. + +- The arguments are zero or more key-value pairs(where keys are strings, and values are of any type). +- If a key is NULL, the key-value pair is omitted from the resulting object. However, if a value is NULL, the key-value pair will be kept. +- The keys must be distinct from each other, and their order in the resulting JSON might be different from the order you specify. +- `TRY_OBJECT_CONSTRUCT_KEEP_NULL` returns a NULL value if an error occurs when building the object. + +## Aliases + +- `JSON_OBJECT_KEEP_NULL` +- `TRY_JSON_OBJECT_KEEP_NULL` + +See also: [OBJECT_CONSTRUCT](/tidb-cloud-lake/sql/object-construct.md) + +## Syntax + +```sql +OBJECT_CONSTRUCT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) + +TRY_OBJECT_CONSTRUCT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) +``` + +## Return Type + +JSON object. 
+ +## Examples + +```sql +SELECT OBJECT_CONSTRUCT_KEEP_NULL(); +┌──────────────────────────────┐ +│ object_construct_keep_null() │ +├──────────────────────────────┤ +│ {} │ +└──────────────────────────────┘ + +SELECT OBJECT_CONSTRUCT_KEEP_NULL('a', 3.14, 'b', 'xx', 'c', NULL); +┌───────────────────────────────────────────────────────────┐ +│ object_construct_keep_null('a', 3.14, 'b', 'xx', 'c', null) │ +├───────────────────────────────────────────────────────────┤ +│ {"a":3.14,"b":"xx","c":null} │ +└───────────────────────────────────────────────────────────┘ + +SELECT OBJECT_CONSTRUCT_KEEP_NULL('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']); +┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ object_construct_keep_null('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']) │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ {"fruits":["apple","banana","orange"],"vegetables":["carrot","celery"]} │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT OBJECT_CONSTRUCT_KEEP_NULL('key'); + | +1 | SELECT OBJECT_CONSTRUCT_KEEP_NULL('key') + | ^^^^^^^^^^^^^^^^^^ The number of keys and values must be equal while evaluating function `object_construct_keep_null('key')` + + +SELECT TRY_OBJECT_CONSTRUCT_KEEP_NULL('key'); +┌─────────────────────────────────────┐ +│ try_object_construct_keep_null('key') │ +├─────────────────────────────────────┤ +│ NULL │ +└─────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/object-construct.md b/tidb-cloud-lake/sql/object-construct.md new file mode 100644 index 0000000000000..0d3e756db1316 --- /dev/null +++ b/tidb-cloud-lake/sql/object-construct.md @@ -0,0 +1,71 @@ +--- +title: OBJECT_CONSTRUCT +title_includes: TRY_OBJECT_CONSTRUCT, JSON_OBJECT, TRY_JSON_OBJECT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Creates a JSON object with keys and values. + +- The arguments are zero or more key-value pairs(where keys are strings, and values are of any type). +- If a key or value is NULL, the key-value pair is ommitted from the resulting object. +- The keys must be distinct from each other, and their order in the resulting JSON might be different from the order you specify. +- `TRY_OBJECT_CONSTRUCT` returns a NULL value if an error occurs when building the object. + +## Aliases + +- `JSON_OBJECT` +- `TRY_JSON_OBJECT` + +See also: [OBJECT_CONSTRUCT_KEEP_NULL](/tidb-cloud-lake/sql/object-construct-keep-null.md) + +## Syntax + +```sql +OBJECT_CONSTRUCT(key1, value1[, key2, value2[, ...]]) + +TRY_OBJECT_CONSTRUCT(key1, value1[, key2, value2[, ...]]) +``` + +## Return Type + +JSON object. 
+ +## Examples + +```sql +SELECT OBJECT_CONSTRUCT(); +┌────────────────┐ +│ object_construct() │ +├────────────────┤ +│ {} │ +└────────────────┘ + +SELECT OBJECT_CONSTRUCT('a', 3.14, 'b', 'xx', 'c', NULL); +┌──────────────────────────────────────────────┐ +│ object_construct('a', 3.14, 'b', 'xx', 'c', null) │ +├──────────────────────────────────────────────┤ +│ {"a":3.14,"b":"xx"} │ +└──────────────────────────────────────────────┘ + +SELECT OBJECT_CONSTRUCT('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']); +┌──────────────────────────────────────────────────────────────────────────────────────────┐ +│ object_construct('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']) │ +├──────────────────────────────────────────────────────────────────────────────────────────┤ +│ {"fruits":["apple","banana","orange"],"vegetables":["carrot","celery"]} │ +└──────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT OBJECT_CONSTRUCT('key'); + | +1 | SELECT OBJECT_CONSTRUCT('key') + | ^^^^^^^^^^^^^^^^^^ The number of keys and values must be equal while evaluating function `object_construct('key')` + + +SELECT TRY_OBJECT_CONSTRUCT('key'); +┌───────────────────────────┐ +│ try_object_construct('key') │ +├───────────────────────────┤ +│ NULL │ +└───────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/object-delete.md b/tidb-cloud-lake/sql/object-delete.md new file mode 100644 index 0000000000000..fac9c3a0f06fa --- /dev/null +++ b/tidb-cloud-lake/sql/object-delete.md @@ -0,0 +1,50 @@ +--- +title: OBJECT_DELETE +title_includes: JSON_OBJECT_DELETE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Deletes specified keys from a JSON object and returns the modified object. If a specified key doesn't exist in the object, it is ignored. + +## Aliases + +- `JSON_OBJECT_DELETE` + +## Syntax + +```sql +OBJECT_DELETE(, [, , ...]) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| json_object | A JSON object (VARIANT type) from which to delete keys. | +| key1, key2, ... | One or more string literals representing the keys to be deleted from the object. | + +## Return Type + +Returns a VARIANT containing the modified JSON object with specified keys removed. + +## Examples + +Delete a single key: +```sql +SELECT OBJECT_DELETE('{"a":1,"b":2,"c":3}'::VARIANT, 'a'); +-- Result: {"b":2,"c":3} +``` + +Delete multiple keys: +```sql +SELECT OBJECT_DELETE('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c'); +-- Result: {"b":2,"d":4} +``` + +Delete a non-existent key (key is ignored): +```sql +SELECT OBJECT_DELETE('{"a":1,"b":2}'::VARIANT, 'x'); +-- Result: {"a":1,"b":2} +``` diff --git a/tidb-cloud-lake/sql/object-functions.md b/tidb-cloud-lake/sql/object-functions.md new file mode 100644 index 0000000000000..7e7b9288fb2a1 --- /dev/null +++ b/tidb-cloud-lake/sql/object-functions.md @@ -0,0 +1,31 @@ +--- +title: Object Functions +--- + +This section provides reference information for object functions in Databend. Object functions enable creation, manipulation, and extraction of information from JSON object data structures. 
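+
+As a quick taste of how these functions compose (each category below describes the individual functions), here is a minimal sketch that builds an object, inserts an extra key, and then keeps only the selected keys. The values are illustrative, and the key order in the returned object may differ:
+
+```sql
+-- Construct an object, add a key, then pick a subset of keys
+SELECT OBJECT_PICK(
+         OBJECT_INSERT(
+           OBJECT_CONSTRUCT('name', 'John', 'age', 30),
+           'city', 'Berlin'
+         ),
+         'name', 'city'
+       );
+-- Expected result: {"city":"Berlin","name":"John"}
+```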
+ +## Object Construction + +| Function | Description | Example | +|----------|-------------|---------| +| [OBJECT_CONSTRUCT](/tidb-cloud-lake/sql/object-construct.md) | Creates a JSON object from key-value pairs | `OBJECT_CONSTRUCT('name', 'John', 'age', 30)` → `{"name":"John","age":30}` | +| [OBJECT_CONSTRUCT_KEEP_NULL](/tidb-cloud-lake/sql/object-construct-keep-null.md) | Creates a JSON object keeping null values | `OBJECT_CONSTRUCT_KEEP_NULL('a', 1, 'b', null)` → `{"a":1,"b":null}` | + +## Object Information + +| Function | Description | Example | +|----------|-------------|---------| +| [OBJECT_KEYS](/tidb-cloud-lake/sql/object-keys.md) | Returns all keys from a JSON object as an array | `OBJECT_KEYS({"name":"John","age":30})` → `["name","age"]` | + +## Object Modification + +| Function | Description | Example | +|----------|-------------|---------| +| [OBJECT_INSERT](/tidb-cloud-lake/sql/object-insert.md) | Inserts or updates a key-value pair in a JSON object | `OBJECT_INSERT({"name":"John"}, "age", 30)` → `{"name":"John","age":30}` | +| [OBJECT_DELETE](/tidb-cloud-lake/sql/object-delete.md) | Removes a key-value pair from a JSON object | `OBJECT_DELETE({"name":"John","age":30}, "age")` → `{"name":"John"}` | + +## Object Selection + +| Function | Description | Example | +|----------|-------------|---------| +| [OBJECT_PICK](/tidb-cloud-lake/sql/object-pick.md) | Creates a new object with only specified keys | `OBJECT_PICK({"a":1,"b":2,"c":3}, ["a","c"])` → `{"a":1,"c":3}` | diff --git a/tidb-cloud-lake/sql/object-insert.md b/tidb-cloud-lake/sql/object-insert.md new file mode 100644 index 0000000000000..a57c55f268633 --- /dev/null +++ b/tidb-cloud-lake/sql/object-insert.md @@ -0,0 +1,64 @@ +--- +title: OBJECT_INSERT +title_includes: JSON_OBJECT_INSERT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Inserts or updates a key-value pair in a JSON object. + +## Aliases + +- `JSON_OBJECT_INSERT` + +## Syntax + +```sql +OBJECT_INSERT(, , [, ]) +``` + +| Parameter | Description | | +|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---| +| `` | The input JSON object. | | +| `` | The key to be inserted or updated. | | +| `` | The value to assign to the key. | | +| `` | A boolean flag that controls whether to replace the value if the specified key already exists in the JSON object. If `true`, the function replaces the value if the key already exists. If `false` (or omitted), an error occurs if the key exists. | | + +## Return Type + +Returns the updated JSON object. 
+ +## Examples + +This example demonstrates how to insert a new key 'c' with the value 3 into the existing JSON object: + +```sql +SELECT OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'c', 3); + +┌────────────────────────────────────────────────────────────┐ +│ object_insert('{"a":1,"b":2,"d":4}'::VARIANT, 'c', 3) │ +├────────────────────────────────────────────────────────────┤ +│ {"a":1,"b":2,"c":3,"d":4} │ +└────────────────────────────────────────────────────────────┘ +``` + +This example shows how to update the value of an existing key 'a' from 1 to 10 using the update flag set to `true`, allowing the key's value to be replaced: + +```sql +SELECT OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'a', 10, true); + +┌───────────────────────────────────────────────────────────────────┐ +│ object_insert('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 10, TRUE) │ +├───────────────────────────────────────────────────────────────────┤ +│ {"a":10,"b":2,"d":4} │ +└───────────────────────────────────────────────────────────────────┘ +``` + +This example demonstrates an error that occurs when trying to insert a value for an existing key 'a' without specifying the update flag set to `true`: + +```sql +SELECT OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'a', 10); + +error: APIError: ResponseError with 1006: ObjectDuplicateKey while evaluating function `object_insert('{"a":1,"b":2,"d":4}', 'a', 10)` in expr `object_insert('{"a":1,"b":2,"d":4}', 'a', 10)` +``` diff --git a/tidb-cloud-lake/sql/object-keys.md b/tidb-cloud-lake/sql/object-keys.md new file mode 100644 index 0000000000000..8c81ea8b5f5be --- /dev/null +++ b/tidb-cloud-lake/sql/object-keys.md @@ -0,0 +1,45 @@ +--- +title: OBJECT_KEYS +title_includes: JSON_OBJECT_KEYS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the keys of the outermost JSON object as an array of strings. + +## Aliases + +- `JSON_OBJECT_KEYS` + +## Syntax + +```sql +OBJECT_KEYS() +``` + +## Return Type + +ARRAY of STRING. + +## Examples + +```sql +SELECT OBJECT_KEYS('{"a":1, "b":2, "c": {"d":3}}'::VARIANT); + +-[ RECORD 1 ]----------------------------------- +object_keys('{"a":1, "b":2, "c": {"d":3}}'::VARIANT): ["a","b","c"] + +-- Example with a table +CREATE TABLE t (var VARIANT); +INSERT INTO t VALUES ('{"a":1, "b":2}'), ('{"x":10, "y":20}'); + +SELECT id, object_keys(var), json_object_keys(var) FROM t; + +┌───────────┬──────────────────┬───────────────────────┐ +│ id │ object_keys(var) │ json_object_keys(var) │ +├───────────┼──────────────────┼───────────────────────┤ +│ 1 │ ["a","b"] │ ["a","b"] │ +│ 2 │ ["x","y"] │ ["x","y"] │ +└───────────┴──────────────────┴───────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/object-pick.md b/tidb-cloud-lake/sql/object-pick.md new file mode 100644 index 0000000000000..cfb147326e2cf --- /dev/null +++ b/tidb-cloud-lake/sql/object-pick.md @@ -0,0 +1,51 @@ +--- +title: OBJECT_PICK +title_includes: JSON_OBJECT_PICK +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + + +Creates a new JSON object containing only the specified keys from the input JSON object. If a specified key doesn't exist in the input object, it is omitted from the result. + +## Aliases + +- `JSON_OBJECT_PICK` + +## Syntax + +```sql +OBJECT_PICK(, [, , ...]) +``` + +## Parameters + +| Parameter | Description | +|-----------|-------------| +| json_object | A JSON object (VARIANT type) from which to pick keys. | +| key1, key2, ... 
| One or more string literals representing the keys to be included in the result object. | + +## Return Type + +Returns a VARIANT containing a new JSON object with only the specified keys and their corresponding values. + +## Examples + +Pick a single key: +```sql +SELECT OBJECT_PICK('{"a":1,"b":2,"c":3}'::VARIANT, 'a'); +-- Result: {"a":1} +``` + +Pick multiple keys: +```sql +SELECT OBJECT_PICK('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'b'); +-- Result: {"a":1,"b":2} +``` + +Pick with non-existent key (non-existent keys are ignored): +```sql +SELECT OBJECT_PICK('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c'); +-- Result: {"a":1} +``` diff --git a/tidb-cloud-lake/sql/oct.md b/tidb-cloud-lake/sql/oct.md new file mode 100644 index 0000000000000..e1dbcf4add6e7 --- /dev/null +++ b/tidb-cloud-lake/sql/oct.md @@ -0,0 +1,24 @@ +--- +title: OCT +--- + +Returns a string representation of the octal value of N. + +## Syntax + +```sql +OCT() +``` + +## Examples + +```sql +SELECT OCT(12); ++---------+ +| OCT(12) | ++---------+ +| 014 | ++---------+ +``` + + diff --git a/tidb-cloud-lake/sql/octet-length.md b/tidb-cloud-lake/sql/octet-length.md new file mode 100644 index 0000000000000..53fcff9465df7 --- /dev/null +++ b/tidb-cloud-lake/sql/octet-length.md @@ -0,0 +1,24 @@ +--- +title: OCTET_LENGTH +--- + +OCTET_LENGTH() is a synonym for LENGTH(). + +## Syntax + +```sql +OCTET_LENGTH() +``` + +## Examples + +```sql +SELECT OCTET_LENGTH('databend'); ++--------------------------+ +| OCTET_LENGTH('databend') | ++--------------------------+ +| 8 | ++--------------------------+ +``` + + diff --git a/tidb-cloud-lake/sql/optimize-table.md b/tidb-cloud-lake/sql/optimize-table.md new file mode 100644 index 0000000000000..eb0cb9ba0ad3f --- /dev/null +++ b/tidb-cloud-lake/sql/optimize-table.md @@ -0,0 +1,173 @@ +--- +title: OPTIMIZE TABLE +sidebar_position: 8 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import DetailsWrap from '@site/src/components/DetailsWrap'; + +Optimizing a table in Databend involves compacting or purging historical data to save storage space and enhance query performance. + + + +
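+
+As a quick-reference sketch before the details (identifiers such as `my_db.my_table` are placeholders; `COMPACT SEGMENT` and `COMPACT` are explained in the sections below, and `PURGE` is the standard way to release the historical data they leave behind):
+
+```sql
+-- Merge small segments into larger ones (metadata-level compaction)
+OPTIMIZE TABLE my_db.my_table COMPACT SEGMENT;
+
+-- Merge small blocks and their segments into larger ones
+OPTIMIZE TABLE my_db.my_table COMPACT;
+
+-- Purge historical snapshots, segments, and blocks to free storage
+OPTIMIZE TABLE my_db.my_table PURGE;
+```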
+<details>
+  <summary>Why Optimize?</summary>
+
+  Databend stores data in tables using the Parquet format, which is organized into blocks. Additionally, Databend supports time travel functionality, where each operation that modifies a table generates a Parquet file that captures and reflects the changes made to the table.
+
+  As a table accumulates more Parquet files over time, it can lead to performance issues and increased storage requirements. To optimize the table's performance, historical Parquet files can be deleted when they are no longer needed. This optimization can help to improve query performance and reduce the amount of storage space used by the table.
+
+</details>
+ +## Databend Data Storage: Snapshot, Segment, and Block + +Snapshot, segment, and block are the concepts Databend uses for data storage. Databend uses them to construct a hierarchical structure for storing table data. + +![](/img/sql/storage-structure.PNG) + +Databend automatically creates table snapshots upon data updates. A snapshot represents a version of the table's segment metadata. + +When working with Databend, you're most likely to access a snapshot with the snapshot ID when you retrieve and query a previous version of the table's data with the [AT](/tidb-cloud-lake/sql/at.md) clause. + +A snapshot is a JSON file that does not save the table's data but indicate the segments the snapshot links to. If you run [FUSE_SNAPSHOT](/tidb-cloud-lake/sql/fuse-snapshot.md) against a table, you can find the saved snapshots for the table. + +A segment is a JSON file that organizes the storage blocks (at least 1, at most 1,000) where the data is stored. If you run [FUSE_SEGMENT](/tidb-cloud-lake/sql/fuse-segment.md) against a snapshot with the snapshot ID, you can find which segments are referenced by the snapshot. + +Databends saves actual table data in parquet files and considers each parquet file as a block. If you run [FUSE_BLOCK](/tidb-cloud-lake/sql/fuse-block.md) against a snapshot with the snapshot ID, you can find which blocks are referenced by the snapshot. + +Databend creates a unique ID for each database and table for storing the snapshot, segment, and block files and saves them to your object storage in the path `////`. Each snapshot, segment, and block file is named with a UUID (32-character lowercase hexadecimal string). + +| File | Format | Filename | Storage Folder | +|----------|---------|---------------------------------|-----------------------------------------------------| +| Snapshot | JSON | `<32bitUUID>_.json` | `////_ss/` | +| Segment | JSON | `<32bitUUID>_.json` | `////_sg/` | +| Block | parquet | `<32bitUUID>_.parquet` | `////_b/` | + +## Table Optimizations + +In Databend, it's advisable to aim for an ideal block size of either 100MB (uncompressed) or 1,000,000 rows, with each segment consisting of 1,000 blocks. To maximize table optimization, it's crucial to gain a clear understanding of when and how to apply various optimization techniques, such as [Segment Compaction](#segment-compaction) and [Block Compaction](#block-compaction). +- When using the COPY INTO or REPLACE INTO command to write data into a table that includes a cluster key, Databend will automatically initiate a re-clustering process, as well as a segment and block compact process. + +- Segment & block compactions support distributed execution in cluster environments. You can enable them by setting ENABLE_DISTRIBUTED_COMPACT to 1. This helps enhance data query performance and scalability in cluster environments. + + ```sql + SET enable_distributed_compact = 1; + ``` + +### Segment Compaction + +Compact segments when a table has too many small segments (less than `100 blocks` per segment). +```sql +SELECT + block_count, + segment_count, + IF( + block_count / segment_count < 100, + 'The table needs segment compact now', + 'The table does not need segment compact now' + ) AS advice +FROM + fuse_snapshot('your-database', 'your-table') + LIMIT 1; +``` + +**Syntax** + +```sql +OPTIMIZE TABLE [database.]table_name COMPACT SEGMENT [LIMIT ] +``` + +Compacts the table data by merging small segments into larger ones. + +- The option LIMIT sets the maximum number of segments to be compacted. 
In this case, Databend will select and compact the latest segments. + +**Example** + +```sql +-- Check whether need segment compact +SELECT + block_count, + segment_count, + IF( + block_count / segment_count < 100, + 'The table needs segment compact now', + 'The table does not need segment compact now' + ) AS advice +FROM + fuse_snapshot('hits', 'hits'); + ++-------------+---------------+-------------------------------------+ +| block_count | segment_count | advice | ++-------------+---------------+-------------------------------------+ +| 751 | 32 | The table needs segment compact now | ++-------------+---------------+-------------------------------------+ + +-- Compact segment +OPTIMIZE TABLE hits COMPACT SEGMENT; + +-- Check again +SELECT + block_count, + segment_count, + IF( + block_count / segment_count < 100, + 'The table needs segment compact now', + 'The table does not need segment compact now' + ) AS advice +FROM + fuse_snapshot('hits', 'hits') + LIMIT 1; + ++-------------+---------------+---------------------------------------------+ +| block_count | segment_count | advice | ++-------------+---------------+---------------------------------------------+ +| 751 | 1 | The table does not need segment compact now | ++-------------+---------------+---------------------------------------------+ +``` + +### Block Compaction + +Compact blocks when a table has a large number of small blocks or when the table has a high percentage of inserted, deleted, or updated rows. + +You can check it with if the uncompressed size of each block is close to the perfect size of `100MB`. + +If the size is less than `50MB`, we suggest doing block compaction, as it indicates too many small blocks: + +```sql +SELECT + block_count, + humanize_size(bytes_uncompressed / block_count) AS per_block_uncompressed_size, + IF( + bytes_uncompressed / block_count / 1024 / 1024 < 50, + 'The table needs block compact now', + 'The table does not need block compact now' + ) AS advice +FROM + fuse_snapshot('your-database', 'your-table') + LIMIT 1; +``` + +:::info +We recommend performing segment compaction first, followed by block compaction. +::: + +**Syntax** +```sql +OPTIMIZE TABLE [database.]table_name COMPACT [LIMIT ] +``` +Compacts the table data by merging small blocks and segments into larger ones. + +- This command creates a new snapshot (along with compacted segments and blocks) of the most recent table data without affecting the existing storage files, so the storage space won't be released until you purge the historical data. + +- Depending on the size of the given table, it may take quite a while to complete the execution. + +- The option LIMIT sets the maximum number of segments to be compacted. In this case, Databend will select and compact the latest segments. + +- Databend will automatically re-cluster a clustered table after the compacting process. + +**Example** +```sql +OPTIMIZE TABLE my_database.my_table COMPACT LIMIT 50; +``` diff --git a/tidb-cloud-lake/sql/ord.md b/tidb-cloud-lake/sql/ord.md new file mode 100644 index 0000000000000..9c9394ed8163e --- /dev/null +++ b/tidb-cloud-lake/sql/ord.md @@ -0,0 +1,41 @@ +--- +title: ORD +--- + +If the leftmost character is not a multibyte character, ORD() returns the same value as the ASCII() function. 
+ +If the leftmost character of the string str is a multibyte character, returns the code for that character, +calculated from the numeric values of its constituent bytes using this formula: + +```sql + (1st byte code) ++ (2nd byte code * 256) ++ (3rd byte code * 256^2) ... +``` + +## Syntax + +```sql +ORD() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | The string. | + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT ORD('2') ++--------+ +| ORD(2) | ++--------+ +| 50 | ++--------+ +``` diff --git a/tidb-cloud-lake/sql/other-functions.md b/tidb-cloud-lake/sql/other-functions.md new file mode 100644 index 0000000000000..90b21345a9831 --- /dev/null +++ b/tidb-cloud-lake/sql/other-functions.md @@ -0,0 +1,18 @@ +--- +title: Other Functions +--- + +This section collects assorted utilities that do not fit into the major functional groups. + +| Function | Description | +|----------|-------------| +| [ASSUME_NOT_NULL](/tidb-cloud-lake/sql/assume-not-null.md) | Hint that values in a nullable column are never NULL | +| [EXISTS](/tidb-cloud-lake/sql/exists.md) | Return TRUE if a subquery produces any rows | +| [GROUPING](/tidb-cloud-lake/sql/grouping.md) | Reveal whether an output column was aggregated in GROUPING SETS | +| [HUMANIZE_NUMBER](/tidb-cloud-lake/sql/humanize-number.md) | Format large numbers with unit suffixes | +| [HUMANIZE_SIZE](/tidb-cloud-lake/sql/humanize-size.md) | Format byte counts into readable units | +| [REMOVE_NULLABLE](/tidb-cloud-lake/sql/remove-nullable.md) | Strip NULLability from a column value | +| [TO_NULLABLE](/tidb-cloud-lake/sql/nullable.md) | Convert a value to a nullable type | +| [TYPEOF](/tidb-cloud-lake/sql/typeof.md) | Return the name of a value’s data type | + + diff --git a/tidb-cloud-lake/sql/parse-json.md b/tidb-cloud-lake/sql/parse-json.md new file mode 100644 index 0000000000000..8797471ef9895 --- /dev/null +++ b/tidb-cloud-lake/sql/parse-json.md @@ -0,0 +1,45 @@ +--- +title: PARSE_JSON +description: + Interprets input JSON string, producing a VARIANT value +title_includes: TRY_PARSE_JSON +--- + +`parse_json` and `try_parse_json` interprets an input string as a JSON document, producing a VARIANT value. + +`try_parse_json` returns a NULL value if an error occurs during parsing. + +## Syntax + +```sql +PARSE_JSON() +TRY_PARSE_JSON() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------------------------------------------------------------| +| `` | An expression of string type (e.g. VARCHAR) that holds valid JSON information. 
| + +## Return Type + +VARIANT + +## Examples + +```sql +SELECT parse_json('[-1, 12, 289, 2188, false]'); ++------------------------------------------+ +| parse_json('[-1, 12, 289, 2188, false]') | ++------------------------------------------+ +| [-1,12,289,2188,false] | ++------------------------------------------+ + +SELECT try_parse_json('{ "x" : "abc", "y" : false, "z": 10} '); ++---------------------------------------------------------+ +| try_parse_json('{ "x" : "abc", "y" : false, "z": 10} ') | ++---------------------------------------------------------+ +| {"x":"abc","y":false,"z":10} | ++---------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/password-policy.md b/tidb-cloud-lake/sql/password-policy.md new file mode 100644 index 0000000000000..3a92aa626b1c9 --- /dev/null +++ b/tidb-cloud-lake/sql/password-policy.md @@ -0,0 +1,28 @@ +--- +title: Password Policy +--- + +This page provides a comprehensive overview of Password Policy operations in Databend, organized by functionality for easy reference. + +## Password Policy Management + +| Command | Description | +|---------|-------------| +| [CREATE PASSWORD POLICY](/tidb-cloud-lake/sql/create-password-policy.md) | Creates a new password policy with specific requirements | +| [ALTER PASSWORD POLICY](/tidb-cloud-lake/sql/alter-password-policy.md) | Modifies an existing password policy | +| [DROP PASSWORD POLICY](/tidb-cloud-lake/sql/drop-password-policy.md) | Removes a password policy | + +## Password Policy Information + +| Command | Description | +|---------|-------------| +| [DESCRIBE PASSWORD POLICY](/tidb-cloud-lake/sql/desc-password-policy.md) | Shows details of a specific password policy | +| [SHOW PASSWORD POLICIES](/tidb-cloud-lake/sql/show-password-policies.md) | Lists all password policies | + +## Related Topics + +- [Password Policy](/tidb-cloud-lake/guides/password-policy.md) + +:::note +Password policies in Databend allow you to enforce security requirements for user passwords, such as minimum length, complexity, and expiration rules. +::: diff --git a/tidb-cloud-lake/sql/percent-rank.md b/tidb-cloud-lake/sql/percent-rank.md new file mode 100644 index 0000000000000..a2c0357facce6 --- /dev/null +++ b/tidb-cloud-lake/sql/percent-rank.md @@ -0,0 +1,70 @@ +--- +title: PERCENT_RANK +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Calculates the relative rank of each row as a percentage. Returns values between 0 and 1, where 0 represents the lowest rank and 1 represents the highest rank. + +See also: [CUME_DIST](/tidb-cloud-lake/sql/cume-dist.md) + +## Syntax + +```sql +PERCENT_RANK() +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the ranking order +- `ASC | DESC`: Optional. 
Sort direction (default: ASC) + +**Notes:** +- Returns values between 0 and 1 (inclusive) +- First row always has PERCENT_RANK of 0 +- Last row always has PERCENT_RANK of 1 +- Formula: (rank - 1) / (total_rows - 1) +- Multiply by 100 to get percentile values + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 95), + ('Bob', 87), + ('Charlie', 87), + ('David', 82), + ('Eve', 78); +``` + +**Calculate percent rank (showing percentile position):** + +```sql +SELECT student, score, + PERCENT_RANK() OVER (ORDER BY score DESC) AS percent_rank, + ROUND(PERCENT_RANK() OVER (ORDER BY score DESC) * 100) AS percentile +FROM scores +ORDER BY score DESC, student; +``` + +Result: +``` +student | score | percent_rank | percentile +--------+-------+--------------+----------- +Alice | 95 | 0.0 | 0 +Bob | 87 | 0.25 | 25 +Charlie | 87 | 0.25 | 25 +David | 82 | 0.75 | 75 +Eve | 78 | 1.0 | 100 +``` diff --git a/tidb-cloud-lake/sql/pi.md b/tidb-cloud-lake/sql/pi.md new file mode 100644 index 0000000000000..fa28bf9008647 --- /dev/null +++ b/tidb-cloud-lake/sql/pi.md @@ -0,0 +1,23 @@ +--- +title: PI +--- + +Returns the value of π as a floating-point value. + +## Syntax + +```sql +PI() +``` + +## Examples + +```sql +SELECT PI(); + +┌───────────────────┐ +│ pi() │ +├───────────────────┤ +│ 3.141592653589793 │ +└───────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/pivot.md b/tidb-cloud-lake/sql/pivot.md new file mode 100644 index 0000000000000..7086b467cd4a3 --- /dev/null +++ b/tidb-cloud-lake/sql/pivot.md @@ -0,0 +1,85 @@ +--- +title: PIVOT +--- + +The `PIVOT` operation in Databend allows you to transform a table by rotating it and aggregating results based on specified columns. + +It is a useful operation for summarizing and analyzing large amounts of data in a more readable format. In this document, we will explain the syntax and provide an example of how to use the `PIVOT` operation. + +**See also:** +[UNPIVOT](/tidb-cloud-lake/sql/unpivot.md) + + +## Syntax + +```sql +SELECT ... +FROM ... + PIVOT ( ( ) + FOR IN ( [ , ... ] ) ) + +[ ... ] +``` + +Where: +* ``: The aggregate function for combining the grouped values from `pivot_column`. +* ``: The column that will be aggregated using the specified ``. +* ``: The column whose unique values will become new columns in the pivoted result set. +* ``: A unique value from the `` that will become a new column in the pivoted result set. + + +## Examples + +Let's say we have a table called monthly_sales that contains sales data for different employees across different months. We can use the `PIVOT` operation to summarize the data and calculate the total sales for each employee in each month. + +### Creating and Inserting Data + + +```sql +-- Create the monthly_sales table +CREATE TABLE monthly_sales( + empid INT, + amount INT, + month VARCHAR +); + +-- Insert sales data +INSERT INTO monthly_sales VALUES + (1, 10000, 'JAN'), + (1, 400, 'JAN'), + (2, 4500, 'JAN'), + (2, 35000, 'JAN'), + (1, 5000, 'FEB'), + (1, 3000, 'FEB'), + (2, 200, 'FEB'), + (2, 90500, 'FEB'), + (1, 6000, 'MAR'), + (1, 5000, 'MAR'), + (2, 2500, 'MAR'), + (2, 9500, 'MAR'), + (1, 8000, 'APR'), + (1, 10000, 'APR'), + (2, 800, 'APR'), + (2, 4500, 'APR'); +``` + +### Using PIVOT + +Now, we can use the `PIVOT` operation to calculate the total sales for each employee in each month. 
We will use the `SUM` aggregate function to calculate the total sales, and the MONTH column will be pivoted to create new columns for each month. + +```sql +SELECT * +FROM monthly_sales +PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR')) +ORDER BY EMPID; +``` + +Output: +```sql ++-------+-------+-------+-------+-------+ +| empid | jan | feb | mar | apr | ++-------+-------+-------+-------+-------+ +| 1 | 10400 | 8000 | 11000 | 18000 | +| 2 | 39500 | 90700 | 12000 | 5300 | ++-------+-------+-------+-------+-------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/plus.md b/tidb-cloud-lake/sql/plus.md new file mode 100644 index 0000000000000..2f2cd9e6c7025 --- /dev/null +++ b/tidb-cloud-lake/sql/plus.md @@ -0,0 +1,27 @@ +--- +title: PLUS +--- + +Calculates the sum of two numeric or decimal values. + +## Syntax + +```sql +PLUS(, ) +``` + +## Aliases + +- [ADD](/tidb-cloud-lake/sql/add.md) + +## Examples + +```sql +SELECT ADD(1, 2.3), PLUS(1, 2.3); + +┌───────────────────────────────┐ +│ add(1, 2.3) │ plus(1, 2.3) │ +├───────────────┼───────────────┤ +│ 3.3 │ 3.3 │ +└───────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/point-polygon.md b/tidb-cloud-lake/sql/point-polygon.md new file mode 100644 index 0000000000000..887d4806586db --- /dev/null +++ b/tidb-cloud-lake/sql/point-polygon.md @@ -0,0 +1,23 @@ +--- +title: POINT_IN_POLYGON +--- + +Calculates whether a given point falls within the polygon formed by joining multiple points. A polygon is a closed shape connected by coordinate pairs in the order they appear. Changing the order of coordinate pairs can result in a different shape. + +## Syntax + +```sql +POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ]) +``` + +## Examples + +```sql +SELECT POINT_IN_POLYGON((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]); + +┌────────────────────────────────────────────────────────────┐ +│ point_in_polygon((3, 3), [(6, 0), (8, 4), (5, 8), (0, 2)]) │ +├────────────────────────────────────────────────────────────┤ +│ 1 │ +└────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/policy-references.md b/tidb-cloud-lake/sql/policy-references.md new file mode 100644 index 0000000000000..ebdcc3acbfe53 --- /dev/null +++ b/tidb-cloud-lake/sql/policy-references.md @@ -0,0 +1,115 @@ +--- +title: POLICY_REFERENCES +--- + +Returns the associations between security policies (Masking Policy or Row Access Policy) and tables/views. You can query by policy name to find all tables using it, or by table name to find all policies applied to it. 
+ +See also: + +- [MASKING POLICY](/tidb-cloud-lake/guides/masking-policy.md) + +## Syntax + +```sql +-- Find all tables/views using a specific policy +POLICY_REFERENCES(POLICY_NAME => '') + +-- Find all policies applied to a specific table/view +POLICY_REFERENCES( + REF_ENTITY_NAME => '[.]', + REF_ENTITY_DOMAIN => 'TABLE' | 'VIEW' +) +``` + +## Output Columns + +| Column | Description | +|----------------------|--------------------------------------------------------------------| +| policy_name | Name of the policy | +| policy_kind | Type of policy: `MASKING POLICY` or `ROW ACCESS POLICY` | +| ref_database_name | Database containing the referenced table/view | +| ref_entity_name | Name of the referenced table or view | +| ref_entity_domain | `TABLE` or `VIEW` | +| ref_column_name | Column the policy is applied to (for masking policies) | +| ref_arg_column_names | Argument columns used by the policy | +| policy_status | Policy status, typically `ACTIVE` | + +## Examples + +### Find Tables Using a Row Access Policy + +```sql +-- Create a row access policy +CREATE ROW ACCESS POLICY rap_employees AS (department STRING) RETURNS BOOLEAN -> + CASE + WHEN current_role() = 'admin' THEN true + WHEN department = 'Engineering' THEN true + ELSE false + END; + +-- Apply the policy to a table +CREATE TABLE employees(id INT, name STRING, department STRING); +ALTER TABLE employees ADD ROW ACCESS POLICY rap_employees ON (department); + +-- Find all tables using this policy +SELECT * FROM policy_references(POLICY_NAME => 'rap_employees'); + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ policy_name │ policy_kind │ ref_database_name │ ref_entity_name │ ref_entity_domain │ ref_column_name │ ref_arg_column_names │ policy_status │ +├─────────────────┼───────────────────┼───────────────────┼─────────────────┼───────────────────┼─────────────────┼──────────────────────┼───────────────┤ +│ rap_employees │ ROW ACCESS POLICY │ default │ employees │ TABLE │ NULL │ department │ ACTIVE │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Find All Policies Applied to a Table + +```sql +-- Create a masking policy +CREATE MASKING POLICY mask_salary AS (val INT) RETURNS INT -> + CASE WHEN current_role() = 'admin' THEN val ELSE 0 END; + +-- Apply both policies to the table +ALTER TABLE employees ADD COLUMN salary INT; +ALTER TABLE employees MODIFY COLUMN salary SET MASKING POLICY mask_salary; + +-- Find all policies on this table +SELECT * FROM policy_references( + REF_ENTITY_NAME => 'default.employees', + REF_ENTITY_DOMAIN => 'TABLE' +); + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ policy_name │ policy_kind │ ref_database_name │ ref_entity_name │ ref_entity_domain │ ref_column_name │ ref_arg_column_names │ policy_status │ +├─────────────────┼───────────────────┼───────────────────┼─────────────────┼───────────────────┼─────────────────┼──────────────────────┼───────────────┤ +│ mask_salary │ MASKING POLICY │ default │ employees │ TABLE │ salary │ NULL │ ACTIVE │ +│ rap_employees │ ROW ACCESS POLICY │ default │ employees │ TABLE │ NULL │ department │ ACTIVE │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Find Tables Using a Masking Policy with Multiple Arguments + +```sql +-- Create a masking policy 
with conditional arguments +CREATE MASKING POLICY mask_ssn AS (val STRING, user_role STRING) RETURNS STRING -> + CASE + WHEN user_role = current_role() THEN val + ELSE '***-**-****' + END; + +-- Apply to multiple tables +CREATE TABLE employees1(id INT, ssn STRING, role STRING); +CREATE TABLE employees2(id INT, ssn STRING, role STRING); + +ALTER TABLE employees1 MODIFY COLUMN ssn SET MASKING POLICY mask_ssn USING (ssn, role); +ALTER TABLE employees2 MODIFY COLUMN ssn SET MASKING POLICY mask_ssn USING (ssn, role); + +-- Find all tables using this policy +SELECT * FROM policy_references(POLICY_NAME => 'mask_ssn') ORDER BY ref_entity_name; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ policy_name │ policy_kind │ ref_database_name │ ref_entity_name │ ref_entity_domain │ ref_column_name │ ref_arg_column_names │ policy_status │ +├─────────────┼────────────────┼───────────────────┼─────────────────┼───────────────────┼─────────────────┼──────────────────────┼───────────────┤ +│ mask_ssn │ MASKING POLICY │ default │ employees1 │ TABLE │ ssn │ role │ ACTIVE │ +│ mask_ssn │ MASKING POLICY │ default │ employees2 │ TABLE │ ssn │ role │ ACTIVE │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/position.md b/tidb-cloud-lake/sql/position.md new file mode 100644 index 0000000000000..32f54fd83603f --- /dev/null +++ b/tidb-cloud-lake/sql/position.md @@ -0,0 +1,42 @@ +--- +title: POSITION +--- + +POSITION(substr IN str) is a synonym for LOCATE(substr,str). +Returns the position of the first occurrence of substring substr in string str. +Returns 0 if substr is not in str. Returns NULL if any argument is NULL. + +## Syntax + +```sql +POSITION( IN ) +``` + +## Arguments + +| Arguments | Description | +|------------|----------------| +| `` | The substring. | +| `` | The string. | + +## Return Type + +`BIGINT` + +## Examples + +```sql +SELECT POSITION('bar' IN 'foobarbar') ++----------------------------+ +| POSITION('bar' IN 'foobarbar') | ++----------------------------+ +| 4 | ++----------------------------+ + +SELECT POSITION('xbar' IN 'foobar') ++--------------------------+ +| POSITION('xbar' IN 'foobar') | ++--------------------------+ +| 0 | ++--------------------------+ +``` diff --git a/tidb-cloud-lake/sql/pow.md b/tidb-cloud-lake/sql/pow.md new file mode 100644 index 0000000000000..12cdf547f25de --- /dev/null +++ b/tidb-cloud-lake/sql/pow.md @@ -0,0 +1,27 @@ +--- +title: POW +--- + +Returns the value of `x` to the power of `y`. + +## Syntax + +```sql +POW( ) +``` + +## Aliases + +- [POWER](/tidb-cloud-lake/sql/power.md) + +## Examples + +```sql +SELECT POW(-2, 2), POWER(-2, 2); + +┌─────────────────────────────────┐ +│ pow((- 2), 2) │ power((- 2), 2) │ +├───────────────┼─────────────────┤ +│ 4 │ 4 │ +└─────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/power.md b/tidb-cloud-lake/sql/power.md new file mode 100644 index 0000000000000..c25370d22341c --- /dev/null +++ b/tidb-cloud-lake/sql/power.md @@ -0,0 +1,5 @@ +--- +title: POWER +--- + +Alias for [POW](/tidb-cloud-lake/sql/pow.md). 
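+
+A minimal usage sketch (POWER takes the same arguments as POW):
+
+```sql
+SELECT POWER(2, 10);
+
+-- Expected result: 1024
+```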
\ No newline at end of file diff --git a/tidb-cloud-lake/sql/presign.md b/tidb-cloud-lake/sql/presign.md new file mode 100644 index 0000000000000..e464caac5a1aa --- /dev/null +++ b/tidb-cloud-lake/sql/presign.md @@ -0,0 +1,79 @@ +--- +title: PRESIGN +sidebar_position: 3 +--- + +Generates the pre-signed URL for a staged file by the stage name and file path you provide. The pre-signed URL enables you to access the file through a web browser or an API request. + +:::tip +When using cURL to interact with non-S3-like storage, remember to include the headers generated by the PRESIGN command for secure file uploads or downloads. For example, + +```bash +curl -H "" -o books.csv + +curl -X PUT -T books.csv -H "" +``` +::: + +See also: + +- [LIST STAGE FILES](/tidb-cloud-lake/sql/list-stage-files.md): Lists files in a stage. +- [REMOVE STAGE FILES](05-ddl-remove-stage.md): Removes files from a stage. + +## Syntax + +```sql +PRESIGN [ { DOWNLOAD | UPLOAD }] @/.../ [ EXPIRE = ] +``` +Where: + +`[ { DOWNLOAD | UPLOAD }]`: Specifies that the pre-signed URL is used for download or upload. The default value is `DOWNLOAD`. + +`[ EXPIRE = ]`: Specifies the length of time (in seconds) after which the pre-signed URL expires. The default value is 3,600 seconds. + +## Examples + +### Generating and Using Pre-signed URLs for Download + +This example generates the pre-signed URL for downloading the file `books.csv` on the stage `my-stage`: + +```sql +PRESIGN @my_stage/books.csv ++--------+---------+---------------------------------------------------------------------------------+ +| method | headers | url | ++--------+---------+---------------------------------------------------------------------------------+ +| GET | {} | https://example.s3.amazonaws.com/books.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&... | ++--------+---------+---------------------------------------------------------------------------------+ +``` + +This example functions in the same way as the preceding one: + +```sql +PRESIGN DOWNLOAD @my_stage/books.csv +``` + +To download the file with the pre-signed URL and save it as `books.csv`, execute the following command: + +```bash +curl -o books.csv +``` + +This example generates the pre-signed URL that expires in 7,200 seconds (2 hours): + +```sql +PRESIGN @my_stage/books.csv EXPIRE = 7200 +``` + +### Generating and Using Pre-signed URLs for Upload + +This example generates the pre-signed URL for uploading a file as `books.csv` to the stage `my_stage`: + +```sql +PRESIGN UPLOAD @my_stage/books.csv +``` + +To upload the file `books.csv` with the pre-signed URL, execute the following command: + +```bash +curl -X PUT -T books.csv +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/previous-day.md b/tidb-cloud-lake/sql/previous-day.md new file mode 100644 index 0000000000000..8af77023bad42 --- /dev/null +++ b/tidb-cloud-lake/sql/previous-day.md @@ -0,0 +1,38 @@ +--- +title: PREVIOUS_DAY +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the date of the most recent specified day of the week before the given date or timestamp. + +## Syntax + +```sql +PREVIOUS_DAY(, ) +``` + +| Parameter | Description | +|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | A `DATE` or `TIMESTAMP` value to calculate the previous occurrence of the specified day. 
| +| `` | The target day of the week to find the previous occurrence of. Accepted values include `monday`, `tuesday`, `wednesday`, `thursday`, `friday`, `saturday`, and `sunday`. | + +## Return Type + +Date. + +## Examples + +If you need to find the previous Friday before a given date, such as 2024-11-13: + +```sql +SELECT PREVIOUS_DAY(to_date('2024-11-13'), friday) AS last_friday; + +┌─────────────┐ +│ last_friday │ +├─────────────┤ +│ 2024-11-08 │ +└─────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/qualify.md b/tidb-cloud-lake/sql/qualify.md new file mode 100644 index 0000000000000..913f7af0dd20b --- /dev/null +++ b/tidb-cloud-lake/sql/qualify.md @@ -0,0 +1,108 @@ +--- +title: QUALIFY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +QUALIFY is a clause used to filter the results of a window function. Therefore, to successfully utilize the QUALIFY clause, there must be at least one window function in the SELECT list or the QUALIFY clause (See [Examples](#examples) for each case). In other words, QUALIFY is evaluated after window functions are computed. Here’s the typical order of execution for a query with a QUALIFY statement clause: + +1. FROM +2. WHERE +3. GROUP BY +4. HAVING +5. WINDOW FUNCTION +6. QUALIFY +7. DISTINCT +8. ORDER BY +9. LIMIT + +## Syntax + +```sql +QUALIFY +``` + +## Examples + +This example demonstrates the use of ROW_NUMBER() to assign sequential numbers to employees within their departments, ordered by descending salary. Leveraging the QUALIFY clause, we filter the results to display only the top earner in each department. + +```sql +-- Prepare the data +CREATE TABLE employees ( + employee_id INT, + first_name VARCHAR, + last_name VARCHAR, + department VARCHAR, + salary INT +); + +INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES + (1, 'John', 'Doe', 'IT', 90000), + (2, 'Jane', 'Smith', 'HR', 85000), + (3, 'Mike', 'Johnson', 'IT', 82000), + (4, 'Sara', 'Williams', 'Sales', 77000), + (5, 'Tom', 'Brown', 'HR', 75000); + +-- Select employee details along with the row number partitioned by department and ordered by salary in descending order. +SELECT + employee_id, + first_name, + last_name, + department, + salary, + ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num +FROM + employees; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ employee_id │ first_name │ last_name │ department │ salary │ row_num │ +├─────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┼─────────┤ +│ 2 │ Jane │ Smith │ HR │ 85000 │ 1 │ +│ 5 │ Tom │ Brown │ HR │ 75000 │ 2 │ +│ 1 │ John │ Doe │ IT │ 90000 │ 1 │ +│ 3 │ Mike │ Johnson │ IT │ 82000 │ 2 │ +│ 4 │ Sara │ Williams │ Sales │ 77000 │ 1 │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Select employee details along with the row number partitioned by department and ordered by salary in descending order. +-- Add a filter to only include rows where the row number is 1, selecting the employee with the highest salary in each department. 
+SELECT + employee_id, + first_name, + last_name, + department, + salary, + ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num +FROM + employees +QUALIFY row_num = 1; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ employee_id │ first_name │ last_name │ department │ salary │ row_num │ +├─────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┼─────────┤ +│ 2 │ Jane │ Smith │ HR │ 85000 │ 1 │ +│ 1 │ John │ Doe │ IT │ 90000 │ 1 │ +│ 4 │ Sara │ Williams │ Sales │ 77000 │ 1 │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Databend allows the direct use of window functions in the QUALIFY clause without requiring them to be explicitly named in the SELECT list. + +SELECT + employee_id, + first_name, + last_name, + department, + salary +FROM + employees +QUALIFY ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) = 1; + +┌────────────────────────────────────────────────────────────────────────────────────────────┐ +│ employee_id │ first_name │ last_name │ department │ salary │ +├─────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┤ +│ 2 │ Jane │ Smith │ HR │ 85000 │ +│ 1 │ John │ Doe │ IT │ 90000 │ +│ 4 │ Sara │ Williams │ Sales │ 77000 │ +└────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/quantile-cont.md b/tidb-cloud-lake/sql/quantile-cont.md new file mode 100644 index 0000000000000..26354f9bea7a3 --- /dev/null +++ b/tidb-cloud-lake/sql/quantile-cont.md @@ -0,0 +1,62 @@ +--- +title: QUANTILE_CONT +--- + +Aggregate function. + +The QUANTILE_CONT() function computes the interpolated quantile number of a numeric data sequence. + +:::caution +NULL values are not counted. +::: + +## Syntax + +```sql +QUANTILE_CONT()() + +QUANTILE_CONT(level1, level2, ...)() +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | Any numerical expression | + +## Return Type + +Float64 or float64 array based on level number. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE sales_data ( + id INT, + sales_person_id INT, + sales_amount FLOAT +); + +INSERT INTO sales_data (id, sales_person_id, sales_amount) +VALUES (1, 1, 5000), + (2, 2, 5500), + (3, 3, 6000), + (4, 4, 6500), + (5, 5, 7000); +``` + +**Query Demo: Calculate 50th Percentile (Median) of Sales Amount using Interpolation** +```sql +SELECT QUANTILE_CONT(0.5)(sales_amount) AS median_sales_amount +FROM sales_data; +``` + +**Result** +```sql +| median_sales_amount | +|-----------------------| +| 6000.0 | +``` + diff --git a/tidb-cloud-lake/sql/quantile-disc.md b/tidb-cloud-lake/sql/quantile-disc.md new file mode 100644 index 0000000000000..af640f139f96a --- /dev/null +++ b/tidb-cloud-lake/sql/quantile-disc.md @@ -0,0 +1,62 @@ +--- +title: QUANTILE_DISC +--- + +Aggregate function. + +The `QUANTILE_DISC()` function computes the exact quantile number of a numeric data sequence. +The `QUANTILE` alias to `QUANTILE_DISC` + +:::caution +NULL values are not counted. 
+:::
+
+## Syntax
+
+```sql
+QUANTILE_DISC(<level>)(<expr>)
+
+QUANTILE_DISC(<level1>, <level2>, ...)(<expr>)
+```
+
+## Arguments
+
+| Arguments  | Description                                                                                                                                          |
+|------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `level(s)` | Level(s) of the quantile. Each level is a constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]  |
+| `<expr>`   | Any numerical expression                                                                                                                             |
+
+## Return Type
+
+InputType, or an array of InputType, depending on the number of quantile levels specified.
+
+## Example
+
+**Create a Table and Insert Sample Data**
+```sql
+CREATE TABLE salary_data (
+  id INT,
+  employee_id INT,
+  salary FLOAT
+);
+
+INSERT INTO salary_data (id, employee_id, salary)
+VALUES (1, 1, 50000),
+       (2, 2, 55000),
+       (3, 3, 60000),
+       (4, 4, 65000),
+       (5, 5, 70000);
+```
+
+**Query Demo: Calculate 25th and 75th Percentile of Salaries**
+```sql
+SELECT QUANTILE_DISC(0.25, 0.75)(salary) AS salary_quantiles
+FROM salary_data;
+```
+
+**Result**
+```sql
+| salary_quantiles    |
+|---------------------|
+| [55000.0, 65000.0]  |
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/quantile-tdigest-weighted.md b/tidb-cloud-lake/sql/quantile-tdigest-weighted.md
new file mode 100644
index 0000000000000..11951d5d9a5aa
--- /dev/null
+++ b/tidb-cloud-lake/sql/quantile-tdigest-weighted.md
@@ -0,0 +1,63 @@
+---
+title: QUANTILE_TDIGEST_WEIGHTED
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+Computes an approximate quantile of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm.
+This function takes into account the weight of each sequence member. Memory consumption is **log(n)**, where **n** is the number of values.
+
+:::caution
+NULL values are not included in the calculation.
+:::
+
+## Syntax
+
+```sql
+QUANTILE_TDIGEST_WEIGHTED(<level>[, <level2>, ...])(<expr>, <weight>)
+```
+
+## Arguments
+
+| Arguments  | Description                                                                                                                                   |
+|------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
+| `<level>`  | A quantile level, a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of [0.01, 0.99].  |
+| `<expr>`   | Any numerical expression                                                                                                                      |
+| `<weight>` | Any unsigned integer expression. The weight is the number of value occurrences.                                                              |
+
+## Return Type
+
+Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified.
+ +## Example + +```sql +-- Create a table and insert sample data +CREATE TABLE sales_data ( + id INT, + sales_person_id INT, + sales_amount FLOAT +); + +INSERT INTO sales_data (id, sales_person_id, sales_amount) +VALUES (1, 1, 5000), + (2, 2, 5500), + (3, 3, 6000), + (4, 4, 6500), + (5, 5, 7000); + +SELECT QUANTILE_TDIGEST_WEIGHTED(0.5)(sales_amount, 1) AS median_sales_amount +FROM sales_data; + +median_sales_amount| +-------------------+ + 6000.0| + +SELECT QUANTILE_TDIGEST_WEIGHTED(0.5, 0.8)(sales_amount, 1) +FROM sales_data; + +quantile_tdigest_weighted(0.5, 0.8)(sales_amount)| +-------------------------------------------------+ +[6000.0,7000.0] | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/quantile-tdigest.md b/tidb-cloud-lake/sql/quantile-tdigest.md new file mode 100644 index 0000000000000..1e0ff7799a12d --- /dev/null +++ b/tidb-cloud-lake/sql/quantile-tdigest.md @@ -0,0 +1,61 @@ +--- +title: QUANTILE_TDIGEST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Computes an approximate quantile of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. + +:::caution +NULL values are not included in the calculation. +::: + +## Syntax + +```sql +QUANTILE_TDIGEST([, , ...])() +``` + +## Arguments + +| Arguments | Description | +|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | A level of quantile represents a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of [0.01, 0.99]. | +| `` | Any numerical expression | + +## Return Type + +Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified. + +## Example + +```sql +-- Create a table and insert sample data +CREATE TABLE sales_data ( + id INT, + sales_person_id INT, + sales_amount FLOAT +); + +INSERT INTO sales_data (id, sales_person_id, sales_amount) +VALUES (1, 1, 5000), + (2, 2, 5500), + (3, 3, 6000), + (4, 4, 6500), + (5, 5, 7000); + +SELECT QUANTILE_TDIGEST(0.5)(sales_amount) AS median_sales_amount +FROM sales_data; + +median_sales_amount| +-------------------+ + 6000.0| + +SELECT QUANTILE_TDIGEST(0.5, 0.8)(sales_amount) +FROM sales_data; + +quantile_tdigest(0.5, 0.8)(sales_amount)| +----------------------------------------+ +[6000.0,7000.0] | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/quarter.md b/tidb-cloud-lake/sql/quarter.md new file mode 100644 index 0000000000000..ad21931fbe91d --- /dev/null +++ b/tidb-cloud-lake/sql/quarter.md @@ -0,0 +1,8 @@ +--- +title: QUARTER +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Alias for [TO_QUARTER](/tidb-cloud-lake/sql/to-quarter.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/query-history.md b/tidb-cloud-lake/sql/query-history.md new file mode 100644 index 0000000000000..5092821563075 --- /dev/null +++ b/tidb-cloud-lake/sql/query-history.md @@ -0,0 +1,62 @@ +--- +title: QUERY_HISTORY +sidebar_position: 7 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Retrieves query execution logs for analysis and monitoring purposes. 
+ +## Syntax + +```sql +QUERY_HISTORY + [ BY WAREHOUSE ] + [ FROM '' ] + [ TO '' ] + [ LIMIT ] +``` + +| Parameter | Description | +| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | +| `BY WAREHOUSE` | Optional. Filters logs to a specific warehouse. Empty names raise an error. | +| `FROM` | Optional. Start timestamp for the query range. Format: `YYYY-MM-DD HH:MM:SS` (UTC or explicit timezone). Defaults to 1 hour before `TO`. | +| `TO` | Optional. End timestamp for the query range. Format: `YYYY-MM-DD HH:MM:SS` (UTC or explicit timezone). Defaults to current time. | +| `LIMIT` | Optional. Maximum number of records to return. Defaults to `10`. Must be a positive integer. | + +## Output Columns + +The result includes columns such as: + +| Column | Description | +| ------------ | ------------------------------------- | +| `query_id` | Unique identifier for the query | +| `query_text` | The SQL statement executed | +| `scan_bytes` | Amount of data scanned | +| ... | Additional query metrics and metadata | + +## Examples + +Get recent query history for a specific warehouse: + +```sql +QUERY_HISTORY + BY WAREHOUSE etl_wh + FROM '2023-08-20 00:00:00' + TO '2023-08-20 06:00:00' + LIMIT 200; +``` + +Get the last 10 queries across all warehouses: + +```sql +QUERY_HISTORY; +``` + +Get query history with a custom limit: + +```sql +QUERY_HISTORY LIMIT 50; +``` diff --git a/tidb-cloud-lake/sql/query-operators.md b/tidb-cloud-lake/sql/query-operators.md new file mode 100644 index 0000000000000..e9678219fdd31 --- /dev/null +++ b/tidb-cloud-lake/sql/query-operators.md @@ -0,0 +1,16 @@ +--- +title: Query Operators +--- + +This page provides reference information for the query operators in Databend. + +## Operator Types + +| Operator Type | Description | +|--------------|-------------| +| **[Arithmetic](/tidb-cloud-lake/sql/arithmetic-operators.md)** | Mathematical operations (+, -, *, /, %, DIV) | +| **[Comparison](/tidb-cloud-lake/sql/comparison-operators.md)** | Value comparisons (=, !=, <, >, <=, >=, BETWEEN, IN) | +| **[Logical](/tidb-cloud-lake/sql/logical-operators.md)** | Boolean logic (AND, OR, NOT, XOR) | +| **[JSON](/tidb-cloud-lake/sql/json-operators.md)** | JSON data operations (::, ->, ->>, @>, <@) | +| **[Set](/tidb-cloud-lake/sql/set.md)** | Combine query results (UNION, INTERSECT, EXCEPT) | +| **[Subquery](/tidb-cloud-lake/sql/subquery-operators.md)** | Nested queries (EXISTS, IN, ANY, ALL, SOME) | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/query-syntax.md b/tidb-cloud-lake/sql/query-syntax.md new file mode 100644 index 0000000000000..5f2a82d72675f --- /dev/null +++ b/tidb-cloud-lake/sql/query-syntax.md @@ -0,0 +1,59 @@ +--- +title: Query Syntax +--- + +This page provides reference information for the query syntax in Databend. Each component can be used individually or combined to build powerful queries. + +## Core Query Components + +| Component | Description | +|-----------|-------------| +| **[SELECT](/tidb-cloud-lake/sql/select.md)** | Retrieve data from tables - the foundation of all queries | +| **[FROM / JOIN](/tidb-cloud-lake/sql/join.md)** | Specify data sources and combine multiple tables | +| **[WHERE](/tidb-cloud-lake/sql/select.md#where-clause)** | Filter rows based on conditions | +| **[GROUP BY](/tidb-cloud-lake/sql/group-by.md)** | Group rows and perform aggregations (SUM, COUNT, AVG, etc.) 
| +| **[HAVING](/tidb-cloud-lake/sql/group-by.md#having-clause)** | Filter grouped results | +| **[ORDER BY](/tidb-cloud-lake/sql/select.md#order-by-clause)** | Sort query results | +| **[LIMIT / TOP](/tidb-cloud-lake/sql/top.md)** | Restrict the number of rows returned | + +## Advanced Features + +| Component | Description | +|-----------|-------------| +| **[WITH (CTE)](/tidb-cloud-lake/sql/clause.md)** | Define reusable query blocks for complex logic | +| **[PIVOT](/tidb-cloud-lake/sql/pivot.md)** | Convert rows to columns (wide format) | +| **[UNPIVOT](/tidb-cloud-lake/sql/unpivot.md)** | Convert columns to rows (long format) | +| **[QUALIFY](/tidb-cloud-lake/sql/qualify.md)** | Filter rows after window function calculations | +| **[VALUES](/tidb-cloud-lake/sql/values.md)** | Create inline temporary data sets | + +## Time Travel & Streaming + +| Component | Description | +|-----------|-------------| +| **[AT](/tidb-cloud-lake/sql/at.md)** | Query data at a specific point in time | +| **[CHANGES](/tidb-cloud-lake/sql/changes.md)** | Track insertions, updates, and deletions | +| **[WITH CONSUME](/tidb-cloud-lake/sql/consume.md)** | Process streaming data with offset management | +| **[WITH STREAM HINTS](/tidb-cloud-lake/sql/stream-hints.md)** | Optimize stream processing behavior | + +## Query Execution + +| Component | Description | +|-----------|-------------| +| **[Settings](/tidb-cloud-lake/sql/settings-clause.md)** | Configure query optimization and execution parameters | + +## Query Structure + +A typical Databend query follows this structure: + +```sql +[WITH cte_expressions] +SELECT [TOP n] columns +FROM table +[JOIN other_tables] +[WHERE conditions] +[GROUP BY columns] +[HAVING group_conditions] +[QUALIFY window_conditions] +[ORDER BY columns] +[LIMIT n] +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/query.md b/tidb-cloud-lake/sql/query.md new file mode 100644 index 0000000000000..3b9c4ddf1d1d7 --- /dev/null +++ b/tidb-cloud-lake/sql/query.md @@ -0,0 +1,182 @@ +--- +title: QUERY +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +`QUERY` filters rows by matching a Lucene-style query expression against columns that have an inverted index. Use dot notation to navigate nested fields inside `VARIANT` columns. The function is valid only in a `WHERE` clause. + +:::info +Databend's QUERY function is inspired by Elasticsearch's [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query). +::: + +## Syntax + +```sql +QUERY(''[, '']) +``` + +`` is an optional semicolon-separated list of `key=value` pairs that adjusts how the search works. + +## Building Query Expressions + +| Expression | Purpose | Example | +|------------|---------|---------| +| `column:keyword` | Matches rows where `column` contains the keyword. Append `*` for suffix matching. | `QUERY('meta.detections.label:pedestrian')` | +| `column:"exact phrase"` | Matches rows that contain the exact phrase. | `QUERY('meta.scene.summary:"vehicle stopped at red traffic light"')` | +| `column:+required -excluded` | Requires or excludes terms in the same column. | `QUERY('meta.tags:+commute -cyclist')` | +| `column:term1 AND term2` / `column:term1 OR term2` | Combines multiple terms with boolean operators. `AND` has higher precedence than `OR`. | `QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')` | +| `column:IN [value1 value2 ...]` | Matches any value from the list. 
| `QUERY('meta.tags:IN [stop urban]')` | +| `column:[min TO max]` | Performs inclusive range search. Use `*` to leave one side open. | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` | +| `column:{min TO max}` | Performs exclusive range search that omits the boundary values. | `QUERY('meta.vehicle.speed_kmh:{0 TO 10}')` | +| `column:term^boost` | Increases the weight of matches in a specific column. | `QUERY('meta.signals.traffic_light:red^1.0 meta.tags:urban^2.0')` | + +### Nested `VARIANT` Fields + +Use dot notation to address inner fields inside a `VARIANT` column. Databend evaluates the path across objects and arrays. + +| Pattern | Description | Example | +|---------|-------------|---------| +| `variant_col.field:value` | Matches an inner field. | `QUERY('meta.signals.traffic_light:red')` | +| `variant_col.field:IN [ ... ]` | Matches any value inside arrays. | `QUERY('meta.detections.label:IN [pedestrian cyclist]')` | +| `variant_col.field:[min TO max]` | Applies range search to numeric inner fields. | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` | + +## Options + +| Option | Values | Description | Example | +|--------|--------|-------------|---------| +| `fuzziness` | `1` or `2` | Matches terms within the specified Levenshtein distance. | `SELECT id FROM frames WHERE QUERY('meta.detections.label:pedestrain', 'fuzziness=1');` | +| `operator` | `OR` (default) or `AND` | Controls how multiple terms are combined when no explicit boolean operator is supplied. | `SELECT id FROM frames WHERE QUERY('meta.scene.weather:rain fog', 'operator=AND');` | +| `lenient` | `true` or `false` | Suppresses parsing errors and returns an empty result set when `true`. | `SELECT id FROM frames WHERE QUERY('meta.detections.label:()', 'lenient=true');` | + +## Examples + +### Set Up a Smart-Driving Dataset + +```sql +CREATE OR REPLACE TABLE frames ( + id INT, + meta VARIANT, + INVERTED INDEX idx_meta (meta) +); + +INSERT INTO frames VALUES + (1, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:05Z","location":{"city":"San Francisco","intersection":"Market & 5th","gps":[37.7825,-122.4072]}}, + "vehicle":{"speed_kmh":48,"acceleration":0.8,"lane":"center"}, + "signals":{"traffic_light":"green","distance_m":55,"speed_limit_kmh":50}, + "detections":[ + {"label":"car","confidence":0.96,"distance_m":15,"relative_speed_kmh":2}, + {"label":"pedestrian","confidence":0.88,"distance_m":12,"intent":"crossing"} + ], + "scene":{"weather":"clear","time_of_day":"day","visibility":"good"}, + "tags":["downtown","commute","green-light"], + "model":"perception-net-v5" + }'), + (2, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:06Z","location":{"city":"San Francisco","intersection":"Mission & 6th","gps":[37.7829,-122.4079]}}, + "vehicle":{"speed_kmh":9,"acceleration":-1.1,"lane":"center"}, + "signals":{"traffic_light":"red","distance_m":18,"speed_limit_kmh":40}, + "detections":[ + {"label":"traffic_light","state":"red","confidence":0.99,"distance_m":18}, + {"label":"bike","confidence":0.82,"distance_m":9,"relative_speed_kmh":3} + ], + "scene":{"weather":"clear","time_of_day":"day","visibility":"good"}, + "tags":["stop","cyclist","urban"], + "model":"perception-net-v5" + }'), + (3, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:07Z","location":{"city":"San Francisco","intersection":"SOMA School Zone","gps":[37.7808,-122.4016]}}, + "vehicle":{"speed_kmh":28,"acceleration":0.2,"lane":"right"}, + "signals":{"traffic_light":"yellow","distance_m":32,"speed_limit_kmh":25}, + 
"detections":[ + {"label":"traffic_sign","text":"SCHOOL","confidence":0.91,"distance_m":25}, + {"label":"pedestrian","confidence":0.76,"distance_m":8,"intent":"waiting"} + ], + "scene":{"weather":"overcast","time_of_day":"day","visibility":"moderate"}, + "tags":["school-zone","caution"], + "model":"perception-net-v5" + }'); +``` + +### Example: Boolean AND + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.speed_kmh:[0 TO 10]'); +-- Returns id 2 +``` + +### Example: Boolean OR + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike'); +-- Returns id 2 +``` + +### Example: IN List Matching + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.tags:IN [stop urban]'); +-- Returns id 2 +``` + +### Example: Inclusive Range + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]'); +-- Returns id 2 +``` + +### Example: Exclusive Range + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}'); +-- Returns id 2 +``` + +### Example: Boost Across Fields + +```sql +SELECT id, meta['frame']['timestamp'] AS ts, SCORE() +FROM frames +WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0'); +-- Returns id 2 with higher relevance +``` + +### Example: Detect High-Confidence Pedestrians + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.label:IN [pedestrian cyclist] AND meta.detections.confidence:[0.8 TO *]'); +-- Returns ids 1 and 3 +``` + +### Example: Filter by Phrase + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.scene.summary:"vehicle stopped at red traffic light"'); +-- Returns id 2 +``` + +### Example: School-Zone Filter + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.text:SCHOOL AND meta.scene.time_of_day:day'); +-- Returns id 3 +``` diff --git a/tidb-cloud-lake/sql/quote.md b/tidb-cloud-lake/sql/quote.md new file mode 100644 index 0000000000000..68b82e4755d6a --- /dev/null +++ b/tidb-cloud-lake/sql/quote.md @@ -0,0 +1,31 @@ +--- +title: QUOTE +--- + +Quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. + +## Syntax + +```sql +QUOTE() +``` + +## Examples + +```sql +SELECT QUOTE('Don\'t!'); ++-----------------+ +| QUOTE('Don't!') | ++-----------------+ +| Don\'t! | ++-----------------+ + +SELECT QUOTE(NULL); ++-------------+ +| QUOTE(NULL) | ++-------------+ +| NULL | ++-------------+ +``` + + diff --git a/tidb-cloud-lake/sql/radians.md b/tidb-cloud-lake/sql/radians.md new file mode 100644 index 0000000000000..4abde6d15efae --- /dev/null +++ b/tidb-cloud-lake/sql/radians.md @@ -0,0 +1,23 @@ +--- +title: RADIANS +--- + +Returns the argument `x`, converted from degrees to radians. 
+ +## Syntax + +```sql +RADIANS( ) +``` + +## Examples + +```sql +SELECT RADIANS(90); + +┌────────────────────┐ +│ radians(90) │ +├────────────────────┤ +│ 1.5707963267948966 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/rand-n.md b/tidb-cloud-lake/sql/rand-n.md new file mode 100644 index 0000000000000..5142d69841bce --- /dev/null +++ b/tidb-cloud-lake/sql/rand-n.md @@ -0,0 +1,23 @@ +--- +title: RAND(n) +--- + +Returns a random floating-point value v in the range `0 <= v < 1.0`. To obtain a random integer R in the range `i <= R < j`, use the expression `FLOOR(i + RAND() * (j − i))`. Argument `n` is used as the seed value. For equal argument values, RAND(n) returns the same value each time , and thus produces a repeatable sequence of column values. + +## Syntax + +```sql +RAND( ) +``` + +## Examples + +```sql +SELECT RAND(1); + +┌────────────────────┐ +│ rand(1) │ +├────────────────────┤ +│ 0.7133693869548766 │ +└────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/rand.md b/tidb-cloud-lake/sql/rand.md new file mode 100644 index 0000000000000..2ddf9a0b34d2d --- /dev/null +++ b/tidb-cloud-lake/sql/rand.md @@ -0,0 +1,23 @@ +--- +title: RAND() +--- + +Returns a random floating-point value v in the range `0 <= v < 1.0`. To obtain a random integer R in the range `i <= R < j`, use the expression `FLOOR(i + RAND() * (j − i))`. + +## Syntax + +```sql +RAND() +``` + +## Examples + +```sql +SELECT RAND(); + +┌────────────────────┐ +│ rand() │ +├────────────────────┤ +│ 0.5191511074382174 │ +└────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/range-between.md b/tidb-cloud-lake/sql/range-between.md new file mode 100644 index 0000000000000..3acd56c544422 --- /dev/null +++ b/tidb-cloud-lake/sql/range-between.md @@ -0,0 +1,269 @@ +--- +title: RANGE BETWEEN +--- + +Defines a window frame using value-based boundaries for window functions. + +## Overview + +The `RANGE BETWEEN` clause specifies which rows to include in the window frame based on logical value ranges rather than physical row counts. It's particularly useful for time-based windows, value-based groupings, and handling duplicate values. + +## Syntax + +```sql +FUNCTION() OVER ( + [ PARTITION BY partition_expression ] + [ ORDER BY sort_expression ] + RANGE BETWEEN frame_start AND frame_end +) +``` + +### Frame Boundaries + +| Boundary | Description | Example | +|----------|-------------|---------| +| `UNBOUNDED PRECEDING` | Start of partition | `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` | +| `value PRECEDING` | Value range before current row | `RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW` | +| `CURRENT ROW` | Current row value | `RANGE BETWEEN CURRENT ROW AND CURRENT ROW` | +| `value FOLLOWING` | Value range after current row | `RANGE BETWEEN CURRENT ROW AND INTERVAL '7' DAY FOLLOWING` | +| `UNBOUNDED FOLLOWING` | End of partition | `RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` | + +## RANGE vs ROWS + +| Aspect | RANGE | ROWS | +|--------|-------|------| +| **Definition** | Logical value range | Physical row count | +| **Boundaries** | Value-based positions | Row positions | +| **Ties** | Tied values share same frame | Each row independent | +| **Performance** | May be slower with duplicates | Generally faster | +| **Use Case** | Time-based windows, percentile calculations | Moving averages, running totals | + +## Value Types for RANGE + +### 1. 
Numeric Values +```sql +-- Include rows within ±10 units +RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING + +-- Include rows with values up to 50 less than current +RANGE BETWEEN 50 PRECEDING AND CURRENT ROW +``` + +### 2. Interval Values (for DATE/TIMESTAMP) +```sql +-- 7-day window +RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW + +-- 1-hour window +RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW + +-- 30-minute centered window +RANGE BETWEEN INTERVAL '15' MINUTE PRECEDING AND INTERVAL '15' MINUTE FOLLOWING +``` + +### 3. No Value Specified (Default) +When no value is specified with `PRECEDING` or `FOLLOWING`, it defaults to `CURRENT ROW`: +```sql +RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Default behavior +``` + +## Examples + +### Sample Data + +```sql +CREATE TABLE temperature_readings ( + reading_time TIMESTAMP, + sensor_id VARCHAR(10), + temperature DECIMAL(5,2) +); + +INSERT INTO temperature_readings VALUES + ('2024-01-01 00:00:00', 'S1', 20.5), + ('2024-01-01 01:00:00', 'S1', 21.0), + ('2024-01-01 02:00:00', 'S1', 20.8), + ('2024-01-01 03:00:00', 'S1', 22.1), + ('2024-01-01 04:00:00', 'S1', 21.5), + ('2024-01-01 00:00:00', 'S2', 19.8), + ('2024-01-01 01:00:00', 'S2', 20.2), + ('2024-01-01 02:00:00', 'S2', 19.9), + ('2024-01-01 03:00:00', 'S2', 21.0), + ('2024-01-01 04:00:00', 'S2', 20.5); +``` + +### 1. 24-Hour Rolling Average + +```sql +SELECT reading_time, sensor_id, temperature, + AVG(temperature) OVER ( + PARTITION BY sensor_id + ORDER BY reading_time + RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW + ) AS avg_24h +FROM temperature_readings +ORDER BY sensor_id, reading_time; +``` + +### 2. Value-Based Window (Within ±0.5 degrees) + +```sql +SELECT reading_time, sensor_id, temperature, + COUNT(*) OVER ( + PARTITION BY sensor_id + ORDER BY temperature + RANGE BETWEEN 0.5 PRECEDING AND 0.5 FOLLOWING + ) AS similar_readings_count +FROM temperature_readings +ORDER BY sensor_id, temperature; +``` + +### 3. Handling Duplicate Values + +```sql +CREATE TABLE sales_duplicates ( + sale_date DATE, + amount DECIMAL(10,2) +); + +INSERT INTO sales_duplicates VALUES + ('2024-01-01', 100.00), + ('2024-01-01', 100.00), -- Duplicate date + ('2024-01-02', 150.00), + ('2024-01-03', 200.00), + ('2024-01-03', 200.00); -- Duplicate date + +-- RANGE treats duplicate dates as the same "row" for window calculations +SELECT sale_date, amount, + SUM(amount) OVER ( + ORDER BY sale_date + RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS running_total_range, + SUM(amount) OVER ( + ORDER BY sale_date + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS running_total_rows +FROM sales_duplicates +ORDER BY sale_date; +``` + +**Result comparison:** +``` +sale_date | amount | running_total_range | running_total_rows +------------+--------+---------------------+-------------------- +2024-01-01 | 100.00 | 200.00 | 100.00 +2024-01-01 | 100.00 | 200.00 | 200.00 -- ROWS: different +2024-01-02 | 150.00 | 350.00 | 350.00 +2024-01-03 | 200.00 | 750.00 | 550.00 +2024-01-03 | 200.00 | 750.00 | 750.00 -- ROWS: different +``` + +### 4. 
Time-Based Centered Window + +```sql +SELECT reading_time, sensor_id, temperature, + AVG(temperature) OVER ( + PARTITION BY sensor_id + ORDER BY reading_time + RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING + AND INTERVAL '30' MINUTE FOLLOWING + ) AS avg_hour_centered +FROM temperature_readings +ORDER BY sensor_id, reading_time; +``` + +## Common Patterns + +### Time-Based Windows +**Syntax examples:** +```sql +-- 7-day rolling window +RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW + +-- 1-hour centered window +RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND INTERVAL '30' MINUTE FOLLOWING + +-- Month-to-date (when ORDER BY is date) +RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW +``` + +**Complete example:** +```sql +-- 7-day rolling average +SELECT sale_date, amount, + AVG(amount) OVER ( + ORDER BY sale_date + RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW + ) AS avg_7day +FROM sales +ORDER BY sale_date; +``` + +### Value-Based Windows +**Syntax examples:** +```sql +-- Within ±10 units +RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING + +-- Values up to 100 less than current +RANGE BETWEEN 100 PRECEDING AND CURRENT ROW + +-- Note: Complex expressions like (current * 0.05) may not be supported +-- Use fixed values or simple expressions +``` + +**Complete example:** +```sql +-- Include rows within ±0.5 units +SELECT temperature, reading_time, + COUNT(*) OVER ( + ORDER BY temperature + RANGE BETWEEN 0.5 PRECEDING AND 0.5 FOLLOWING + ) AS similar_readings +FROM temperature_readings +ORDER BY temperature; +``` + +### Handling Duplicates +**Syntax examples:** +```sql +-- Include all duplicate values in same window +RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + +-- Value-based grouping (groups identical values) +RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING +``` + +**Complete example:** +```sql +-- RANGE treats duplicate dates as same window +SELECT sale_date, amount, + SUM(amount) OVER ( + ORDER BY sale_date + RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS running_total_range +FROM sales_duplicates +ORDER BY sale_date; +``` + +## Best Practices + +1. **Use RANGE for value-based windows** - When you care about logical value ranges rather than row counts +2. **Use with DATE/TIMESTAMP** - Perfect for time-based calculations +3. **Handle duplicates intentionally** - RANGE groups duplicate ORDER BY values +4. **Consider performance** - RANGE can be slower than ROWS with many duplicates +5. **Specify intervals clearly** - Use explicit INTERVAL syntax for date/time windows + +## Limitations + +1. **ORDER BY must be numeric or temporal** - RANGE requires sortable values +2. **Only one ORDER BY column** - RANGE works with single column ordering +3. **Value expressions limited** - Simple numeric/interval values, not complex expressions +4. **Performance considerations** - May be slower than ROWS with many duplicate values +5. 
**Frame boundaries must be compatible** - Same unit type for PRECEDING/FOLLOWING + +## See Also + +- [Window Functions Overview](/tidb-cloud-lake/sql/window-functions.md) +- [ROWS BETWEEN](/tidb-cloud-lake/sql/rows-between.md) - Row-based window frames +- [Aggregate Functions](/tidb-cloud-lake/sql/aggregate-functions.md) - Functions that can use window frames +- [Date and Time Functions](/tidb-cloud-lake/sql/date-time-functions.md) - Useful with RANGE intervals \ No newline at end of file diff --git a/tidb-cloud-lake/sql/range.md b/tidb-cloud-lake/sql/range.md new file mode 100644 index 0000000000000..2afc69c0f1ecb --- /dev/null +++ b/tidb-cloud-lake/sql/range.md @@ -0,0 +1,23 @@ +--- +title: RANGE +--- + +Returns an array collected by [start, end). + +## Syntax + +```sql +RANGE( , ) +``` + +## Examples + +```sql +SELECT RANGE(1, 5); + +┌───────────────┐ +│ range(1, 5) │ +├───────────────┤ +│ [1,2,3,4] │ +└───────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/rank.md b/tidb-cloud-lake/sql/rank.md new file mode 100644 index 0000000000000..a277c18bbf039 --- /dev/null +++ b/tidb-cloud-lake/sql/rank.md @@ -0,0 +1,96 @@ +--- +title: RANK +--- + +Assigns a rank to each row within a partition. Rows with equal values receive the same rank, with gaps in subsequent rankings. + +## Syntax + +```sql +RANK() +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the ranking order +- `ASC | DESC`: Optional. Sort direction (default: ASC) + +**Notes:** +- Ranks start from 1 +- Equal values get the same rank +- Creates gaps in ranking sequence after ties +- Example: 1, 2, 2, 4, 5 (not 1, 2, 2, 3, 4) + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + subject VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 'Math', 95), + ('Alice', 'English', 87), + ('Alice', 'Science', 92), + ('Bob', 'Math', 85), + ('Bob', 'English', 85), + ('Bob', 'Science', 80), + ('Charlie', 'Math', 88), + ('Charlie', 'English', 85), + ('Charlie', 'Science', 85); +``` + +**Rank all scores (showing tie handling with gaps):** + +```sql +SELECT student, subject, score, + RANK() OVER (ORDER BY score DESC) AS score_rank +FROM scores +ORDER BY score DESC, student, subject; +``` + +Result: +``` +student | subject | score | score_rank +--------+---------+-------+----------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Charlie | Math | 88 | 3 +Alice | English | 87 | 4 +Bob | English | 85 | 5 +Bob | Math | 85 | 5 +Charlie | English | 85 | 5 +Charlie | Science | 85 | 5 +Bob | Science | 80 | 9 +``` + +**Rank scores within each student (showing ties within partitions):** + +```sql +SELECT student, subject, score, + RANK() OVER (PARTITION BY student ORDER BY score DESC) AS subject_rank +FROM scores +ORDER BY student, score DESC, subject; +``` + +Result: +``` +student | subject | score | subject_rank +--------+---------+-------+------------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Alice | English | 87 | 3 +Bob | English | 85 | 1 +Bob | Math | 85 | 1 +Bob | Science | 80 | 3 +Charlie | Math | 88 | 1 +Charlie | English | 85 | 2 +Charlie | Science | 85 | 2 +``` diff --git a/tidb-cloud-lake/sql/read-file.md b/tidb-cloud-lake/sql/read-file.md new file mode 100644 index 0000000000000..565aac142b0a5 --- /dev/null +++ b/tidb-cloud-lake/sql/read-file.md @@ -0,0 +1,77 @@ +--- +title: READ_FILE +--- 
+import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Reads the content of a file from a stage and returns it as a `BINARY` value. This is useful for loading raw file content (images, PDFs, binary data, etc.) directly into a table column. + +## Syntax + +```sql +-- Single argument: combined stage and file path +READ_FILE('') + +-- Two arguments: separate stage and relative file path +READ_FILE('', '') +``` + +## Parameters + +| Parameter | Description | +| ------------- | ------------------------------------------------------------------------------------------------------------- | +| `stage_path` | A combined stage and file path starting with `@`, e.g., `'@my_stage/path/to/file.png'`. | +| `stage` | The stage name starting with `@`, e.g., `'@my_stage'`. The stage is validated at bind time when it is a constant. | +| `file_path` | The relative file path within the stage, e.g., `'path/to/file.png'`. | + +## Return Type + +`BINARY`. Returns `NULL` if any argument is `NULL`. + +## Examples + +### Reading a single file + +```sql +-- Read a file using a combined stage path +SELECT to_hex(read_file('@my_stage/data/file.csv')); + +-- Read a file using separate stage and path arguments +SELECT to_hex(read_file('@my_stage', 'data/file.csv')); +``` + +### Reading files from a table column + +```sql +-- Create a table with file paths +CREATE TABLE file_paths(path STRING); +INSERT INTO file_paths VALUES + ('@my_stage/images/01.png'), + ('@my_stage/images/02.png'), + (NULL); + +-- Read all files referenced in the table +SELECT path, to_hex(read_file(path)) AS content_hex FROM file_paths; + +┌──────────────────────────────────────────────────┐ +│ path │ content_hex │ +├──────────────────────────┼────────────────────────┤ +│ @my_stage/images/01.png │ 89504e47... │ +│ @my_stage/images/02.png │ 89504e47... │ +│ NULL │ NULL │ +└──────────────────────────────────────────────────┘ +``` + +### Using two-argument form with relative paths + +```sql +-- Create a table with relative file paths +CREATE TABLE rel_paths(path STRING); +INSERT INTO rel_paths VALUES + ('data/file1.csv'), + ('data/file2.csv'); + +-- Read files using a fixed stage and relative paths from the table +SELECT path, to_hex(read_file('@my_stage', path)) AS content_hex FROM rel_paths; +``` diff --git a/tidb-cloud-lake/sql/recluster-table.md b/tidb-cloud-lake/sql/recluster-table.md new file mode 100644 index 0000000000000..1f2f2a3c8b91a --- /dev/null +++ b/tidb-cloud-lake/sql/recluster-table.md @@ -0,0 +1,63 @@ +--- +title: RECLUSTER TABLE +sidebar_position: 2 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Re-clusters a table. For why and when to re-cluster a table, see [Re-clustering Table](/tidb-cloud-lake/sql/cluster-key.md#cluster-key-management). + +### Syntax + +```sql +ALTER TABLE [ IF EXISTS ] RECLUSTER [ FINAL ] [ WHERE condition ] [ LIMIT ] +``` + +The command has a limitation on the number of segments it can process, with the default value being "max_thread * 4". You can modify this limit by using the **LIMIT** option. Alternatively, you have two options to cluster your data in the table further: + +- Run the command multiple times against the table. +- Use the **FINAL** option to continuously optimize the table until it is fully clustered. + +:::note + +Re-clustering a table consumes time (even longer if you include the **FINAL** option) and credits (when you are in Databend Cloud). During the optimizing process, do NOT perform DML actions to the table. 
+::: + +The command does not cluster the table from the ground up. Instead, it selects and reorganizes the most chaotic existing storage blocks from the latest **LIMIT** segments using a clustering algorithm. + +### Examples + +```sql +-- create table +create table t(a int, b int) cluster by(a+1); + +-- insert some data to t +insert into t values(1,1),(3,3); +insert into t values(2,2),(5,5); +insert into t values(4,4); + +select * from clustering_information('default','t')\G +*************************** 1. row *************************** + cluster_key: ((a + 1)) + total_block_count: 3 + constant_block_count: 1 +unclustered_block_count: 0 + average_overlaps: 1.3333 + average_depth: 2.0 + block_depth_histogram: {"00002":3} + +-- alter table recluster +ALTER TABLE t RECLUSTER FINAL WHERE a != 4; + +select * from clustering_information('default','t')\G +*************************** 1. row *************************** + cluster_key: ((a + 1)) + total_block_count: 2 + constant_block_count: 1 +unclustered_block_count: 0 + average_overlaps: 1.0 + average_depth: 2.0 + block_depth_histogram: {"00002":2} +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/refresh-aggregating-index.md b/tidb-cloud-lake/sql/refresh-aggregating-index.md new file mode 100644 index 0000000000000..59e1e03fa5d4c --- /dev/null +++ b/tidb-cloud-lake/sql/refresh-aggregating-index.md @@ -0,0 +1,36 @@ +--- +title: REFRESH AGGREGATING INDEX +sidebar_position: 2 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend automatically maintains aggregating indexes in `SYNC` mode as new data is ingested. Run `REFRESH AGGREGATING INDEX` when you introduce an index on a table that already contains data so earlier rows are backfilled. + +## Syntax + +```sql +REFRESH AGGREGATING INDEX +``` + +## Examples + +This example creates an aggregating index on a table that already contains data, then runs `REFRESH` once to backfill those rows: + +```sql +-- Prepare a table and load data before the index exists +CREATE TABLE agg(a int, b int, c int); +INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4); + +-- Declare the aggregating index (existing rows are not indexed yet) +CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg; + +-- Backfill previously inserted rows +REFRESH AGGREGATING INDEX my_agg_index; + +-- Insert new data after the index exists (no manual refresh needed) +INSERT INTO agg VALUES (2,2,5); +-- SYNC mode keeps the index current automatically +``` diff --git a/tidb-cloud-lake/sql/refresh-inverted-index.md b/tidb-cloud-lake/sql/refresh-inverted-index.md new file mode 100644 index 0000000000000..291a1de503378 --- /dev/null +++ b/tidb-cloud-lake/sql/refresh-inverted-index.md @@ -0,0 +1,38 @@ +--- +title: REFRESH INVERTED INDEX +sidebar_position: 2 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend automatically refreshes inverted indexes in `SYNC` mode whenever new data is written. Use `REFRESH INVERTED INDEX` primarily to backfill rows that existed before the index was declared. + +## Syntax + +```sql +REFRESH INVERTED INDEX ON [.]
[LIMIT <limit>]
+
+```
+
+| Parameter | Description                                                                                                                        |
+|-----------|--------------------------------------------------------------------------------------------------------------------------------------|
+| `<limit>` | Specifies the maximum number of rows to process during index refresh. If not specified, all rows in the table will be processed.  |
+
+## Examples
+
+```sql
+-- Existing table with data loaded before the index was declared
+CREATE TABLE IF NOT EXISTS customer_feedback(id INT, body STRING);
+INSERT INTO customer_feedback VALUES
+  (1, 'Great coffee beans'),
+  (2, 'Needs fresh roasting');
+
+-- Create the inverted index afterward
+CREATE INVERTED INDEX customer_feedback_idx ON customer_feedback(body);
+
+-- Backfill historical rows so the index covers earlier inserts
+REFRESH INVERTED INDEX customer_feedback_idx ON customer_feedback;
+
+-- Future inserts refresh automatically in SYNC mode
+```
diff --git a/tidb-cloud-lake/sql/refresh-ngram-index.md b/tidb-cloud-lake/sql/refresh-ngram-index.md
new file mode 100644
index 0000000000000..bc4e2a6e9f8fc
--- /dev/null
+++ b/tidb-cloud-lake/sql/refresh-ngram-index.md
@@ -0,0 +1,35 @@
+---
+title: REFRESH NGRAM INDEX
+sidebar_position: 2
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+Databend automatically refreshes NGRAM indexes when data is ingested. Use `REFRESH NGRAM INDEX` when you need to backfill data that existed before the index was defined.
+
+## Syntax
+
+```sql
+REFRESH NGRAM INDEX [IF EXISTS] <index>
+ON [<database>.]<table>;
+```
+
+## Examples
+
+```sql
+-- Table already populated before the NGRAM index exists
+CREATE TABLE IF NOT EXISTS amazon_reviews_ngram(review_id INT, review STRING);
+INSERT INTO amazon_reviews_ngram VALUES
+  (1, 'coffee beans from Colombia'),
+  (2, 'best roasting kit');
+
+-- Declare the NGRAM index afterward
+CREATE NGRAM INDEX idx1 ON amazon_reviews_ngram(review) WITH (ngram_size = 3);
+
+-- Refresh so the pre-existing rows are indexed
+REFRESH NGRAM INDEX idx1 ON amazon_reviews_ngram;
+
+-- Subsequent inserts refresh automatically in SYNC mode
+```
diff --git a/tidb-cloud-lake/sql/refresh-vector-index.md b/tidb-cloud-lake/sql/refresh-vector-index.md
new file mode 100644
index 0000000000000..0dcd15ece8d42
--- /dev/null
+++ b/tidb-cloud-lake/sql/refresh-vector-index.md
@@ -0,0 +1,50 @@
+---
+title: REFRESH VECTOR INDEX
+sidebar_position: 2
+---
+
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+Builds a Vector index for existing data that was inserted before the index was created.
+
+## Syntax
+
+```sql
+REFRESH VECTOR INDEX <index> ON [<database>.]<table>
+```
+
+## When to Use REFRESH
+
+`REFRESH VECTOR INDEX` is **only needed in one specific scenario**: when you create a Vector index on a table that **already contains data**.
+
+The existing rows (written before the index was created) will not be automatically indexed. You must run `REFRESH VECTOR INDEX` to build the index for this pre-existing data. After the refresh completes, all subsequent data writes will automatically generate the index.
+ +## Examples + +### Example: Index Existing Data + +```sql +-- Step 1: Create a table without an index +CREATE TABLE products ( + id INT, + name VARCHAR, + embedding VECTOR(4) +) ENGINE = FUSE; + +-- Step 2: Insert data (without index) +INSERT INTO products VALUES + (1, 'Product A', [0.1, 0.2, 0.3, 0.4]), + (2, 'Product B', [0.5, 0.6, 0.7, 0.8]), + (3, 'Product C', [0.9, 1.0, 1.1, 1.2]); + +-- Step 3: Create vector index on existing data +CREATE VECTOR INDEX idx_embedding ON products(embedding) distance='cosine'; + +-- Step 4: Refresh to build index for the 3 existing rows +REFRESH VECTOR INDEX idx_embedding ON products; + +-- Step 5: New insertions are automatically indexed (no refresh needed) +INSERT INTO products VALUES (4, 'Product D', [1.3, 1.4, 1.5, 1.6]); +``` diff --git a/tidb-cloud-lake/sql/refresh-virtual-column.md b/tidb-cloud-lake/sql/refresh-virtual-column.md new file mode 100644 index 0000000000000..27404f3ee3c8b --- /dev/null +++ b/tidb-cloud-lake/sql/refresh-virtual-column.md @@ -0,0 +1,57 @@ +--- +title: REFRESH VIRTUAL COLUMN +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +The `REFRESH VIRTUAL COLUMN` command in Databend is used to explicitly trigger the creation of virtual columns for existing tables. While Databend automatically manages virtual columns for new data, there are specific scenarios where manual refreshing is necessary to take full advantage of this feature. + +Virtual columns are enabled by default starting from v1.2.832. + +## When to Use `REFRESH VIRTUAL COLUMN` + +- **Existing Tables Before Feature Enablement:** If you have tables containing `VARIANT` data that were created *before* the virtual column feature was enabled (or before upgrading to a version with automatic virtual column creation), you need to refresh the virtual columns to enable query acceleration. Databend will not automatically create virtual columns for data that already exists in these tables. + +## Syntax + +```sql +REFRESH VIRTUAL COLUMN FOR
+``` + +## Examples + +This example refreshes virtual columns for a table named 'test': + +```sql +CREATE TABLE test(id int, val variant); + +INSERT INTO + test +VALUES + ( + 1, + '{"id":1,"name":"databend"}' + ), + ( + 2, + '{"id":2,"name":"databricks"}' + ); + +REFRESH VIRTUAL COLUMN FOR test; + +SHOW VIRTUAL COLUMNS WHERE table = 'test' AND database = 'default'; +╭───────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ database │ table │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │ +│ String │ String │ String │ UInt32 │ String │ String │ +├──────────┼────────┼───────────────┼───────────────────┼─────────────────────┼─────────────────────┤ +│ default │ test │ val │ 3000000000 │ ['id'] │ UInt64 │ +│ default │ test │ val │ 3000000001 │ ['name'] │ String │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/regexp-instr.md b/tidb-cloud-lake/sql/regexp-instr.md new file mode 100644 index 0000000000000..ed847b7d28d12 --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-instr.md @@ -0,0 +1,58 @@ +--- +title: REGEXP_INSTR +--- + +Returns the starting index of the substring of the string `expr` that matches the regular expression specified by the pattern `pat`, `0` if there is no match. If `expr` or `pat` is NULL, the return value is NULL. Character indexes begin at `1`. + +## Syntax + +```sql +REGEXP_INSTR(, ) +``` + +## Arguments + +| Arguments | Description | +|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| expr | The string expr that to be matched | +| pat | The regular expression | +| pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | +| occurrence | Optional. Which occurrence of a match to search for. If omitted, the default is 1. | +| return_option | Optional. Which type of position to return. If this value is 0, REGEXP_INSTR() returns the position of the matched substring's first character. If this value is 1, REGEXP_INSTR() returns the position following the matched substring. If omitted, the default is 0. | +| match_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP_LIKE(). | + +## Return Type + +A number data type value. 
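+
+For instance, the optional `pos`, `occurrence`, and `return_option` arguments can be combined as follows (a minimal sketch based on the argument descriptions above):
+
+```sql
+-- Start at position 1, find the 2nd occurrence of 'dog', and return the position following the match
+SELECT REGEXP_INSTR('dog cat dog', 'dog', 1, 2, 1);
+-- Expected: 12 (the second 'dog' starts at position 9 and is 3 characters long)
+```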
+ +## Examples + +```sql +SELECT REGEXP_INSTR('dog cat dog', 'dog'); ++------------------------------------+ +| REGEXP_INSTR('dog cat dog', 'dog') | ++------------------------------------+ +| 1 | ++------------------------------------+ + +SELECT REGEXP_INSTR('dog cat dog', 'dog', 2); ++---------------------------------------+ +| REGEXP_INSTR('dog cat dog', 'dog', 2) | ++---------------------------------------+ +| 9 | ++---------------------------------------+ + +SELECT REGEXP_INSTR('aa aaa aaaa', 'a{2}'); ++-------------------------------------+ +| REGEXP_INSTR('aa aaa aaaa', 'a{2}') | ++-------------------------------------+ +| 1 | ++-------------------------------------+ + +SELECT REGEXP_INSTR('aa aaa aaaa', 'a{4}'); ++-------------------------------------+ +| REGEXP_INSTR('aa aaa aaaa', 'a{4}') | ++-------------------------------------+ +| 8 | ++-------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/regexp-like.md b/tidb-cloud-lake/sql/regexp-like.md new file mode 100644 index 0000000000000..b9a7350536870 --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-like.md @@ -0,0 +1,73 @@ +--- +title: REGEXP_LIKE +--- + +REGEXP_LIKE function is used to check that whether the string matches regular expression. + +## Syntax + +```sql +REGEXP_LIKE(, ) +``` + +## Arguments + +| Arguments | Description | +|----------------|-----------------------------------------------------------------------------------| +| `` | The string expr that to be matched | +| `` | The regular expression | +| `[match_type]` | Optional. match_type argument is a string that specifying how to perform matching | + +`match_type` may contain any or all the following characters: + +* `c`: Case-sensitive matching. +* `i`: Case-insensitive matching. +* `m`: Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression. +* `n`: The `.` character matches line terminators. The default is for `.` matching to stop at the end of a line. +* `u`: Unix-only line endings. Not be supported now. + +## Return Type + +`BIGINT` +Returns `1` if the string expr matches the regular expression specified by the pattern pat, `0` otherwise. If expr or pat is NULL, the return value is NULL. 
+ +## Examples + +```sql +SELECT REGEXP_LIKE('a', '^[a-d]'); ++----------------------------+ +| REGEXP_LIKE('a', '^[a-d]') | ++----------------------------+ +| 1 | ++----------------------------+ + +SELECT REGEXP_LIKE('abc', 'ABC'); ++---------------------------+ +| REGEXP_LIKE('abc', 'ABC') | ++---------------------------+ +| 1 | ++---------------------------+ + +SELECT REGEXP_LIKE('abc', 'ABC', 'c'); ++--------------------------------+ +| REGEXP_LIKE('abc', 'ABC', 'c') | ++--------------------------------+ +| 0 | ++--------------------------------+ + +SELECT REGEXP_LIKE('new*\n*line', 'new\\*.\\*line'); ++-------------------------------------------+ +| REGEXP_LIKE('new* +*line', 'new\*.\*line') | ++-------------------------------------------+ +| 0 | ++-------------------------------------------+ + +SELECT REGEXP_LIKE('new*\n*line', 'new\\*.\\*line', 'n'); ++------------------------------------------------+ +| REGEXP_LIKE('new* +*line', 'new\*.\*line', 'n') | ++------------------------------------------------+ +| 1 | ++------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/regexp-replace.md b/tidb-cloud-lake/sql/regexp-replace.md new file mode 100644 index 0000000000000..079c4b4a1eec4 --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-replace.md @@ -0,0 +1,51 @@ +--- +title: REGEXP_REPLACE +--- + +Replaces occurrences in the string `expr` that match the regular expression specified by the pattern `pat` with the replacement string `repl`, and returns the resulting string. If `expr`, `pat`, or `repl` is NULL, the return value is NULL. + +## Syntax + +```sql +REGEXP_REPLACE(, , ) +``` + +## Arguments + +| Arguments | Description | +|------------|-------------------------------------------------------------------------------------------------------------------------| +| expr | The string expr that to be matched | +| pat | The regular expression | +| repl | The replacement string | +| pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | +| occurrence | Optional. Which occurrence of a match to replace. If omitted, the default is 0 (which means "replace all occurrences"). | +| match_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP_LIKE(). 
| + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT REGEXP_REPLACE('a b c', 'b', 'X'); ++-----------------------------------+ +| REGEXP_REPLACE('a b c', 'b', 'X') | ++-----------------------------------+ +| a X c | ++-----------------------------------+ + +SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3); ++----------------------------------------------------+ +| REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3) | ++----------------------------------------------------+ +| abc def X | ++----------------------------------------------------+ + +SELECT REGEXP_REPLACE('周 周周 周周周', '周+', 'X', 3, 2); ++-----------------------------------------------------------+ +| REGEXP_REPLACE('周 周周 周周周', '周+', 'X', 3, 2) | ++-----------------------------------------------------------+ +| 周 周周 X | ++-----------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/regexp-split-array.md b/tidb-cloud-lake/sql/regexp-split-array.md new file mode 100644 index 0000000000000..09b4cd8dc006c --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-split-array.md @@ -0,0 +1,74 @@ +--- +title: REGEXP_SPLIT_TO_ARRAY +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Splits a string using a regular expression pattern and returns the segments as an array. + +## Syntax + +```sql +REGEXP_SPLIT_TO_ARRAY(string, pattern [, flags text]) +``` + +| Parameter | Description | +|--------------|----------------------------------------------------------------| +| `string` | The input string to split (VARCHAR type) | +| `pattern` | Regular expression pattern used for splitting (VARCHAR type) | +| `flags text` | A string of flags to modify the regular expression's behavior. | + +**Supported `flags` Parameter:** +Provides flexible regular expression configuration options, controlling matching behavior by combining the following characters: +* `i` (case-insensitive): Pattern matching ignores case. +* `c` (case-sensitive): Pattern matching is case-sensitive (default behavior). +* `n` or `m` (multi-line): Enables multi-line mode. In this mode, `^` and `$` match the beginning and end of the string, respectively, as well as the beginning and end of each line; the dot `.` does not match newline characters. +* `s` (single-line): Enables single-line mode (also known as dot-matches-newline). In this mode, the dot `.` matches any character, including newline characters. +* `x` (ignore-whitespace): Ignores whitespace characters in the pattern (improves pattern readability). +* `q` (literal): Treats the `pattern` as a literal string rather than a regular expression. 
+ +## Examples + +### Basic Splitting +```sql +SELECT REGEXP_SPLIT_TO_ARRAY('apple,orange,banana', ','); +┌───────────────────────────────────────────┐ +│ ["apple","orange","banana"] │ +└───────────────────────────────────────────┘ +``` + +### Complex Delimiters +```sql +SELECT REGEXP_SPLIT_TO_ARRAY('2023-01-01T14:30:00', '[-T:]'); +┌───────────────────────────────────────────────────────┐ +│ ["2023","01","01","14","30","00"] │ +└───────────────────────────────────────────────────────┘ +``` + +### Handling Empty Elements +```sql +SELECT REGEXP_SPLIT_TO_ARRAY('a,,b,,,c', ',+'); +┌───────────────────────────────────┐ +│ ["a","b","c"] │ +└───────────────────────────────────┘ +``` + +### With flag text + +```sql +SELECT regexp_split_to_array('One_Two_Three', '[_-]', 'i') + +╭─────────────────────────────────────────────────────╮ +│ ['One','Two','Three'] │ +╰─────────────────────────────────────────────────────╯ + +``` + + +## See Also + +- [SPLIT](/tidb-cloud-lake/sql/split.md): For simple string splitting +- [REGEXP_SPLIT_TO_TABLE](/tidb-cloud-lake/sql/regexp-split-table.md): split string to table + diff --git a/tidb-cloud-lake/sql/regexp-split-table.md b/tidb-cloud-lake/sql/regexp-split-table.md new file mode 100644 index 0000000000000..90eceb8244214 --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-split-table.md @@ -0,0 +1,86 @@ +--- +title: REGEXP_SPLIT_TO_TABLE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Splits a string using a regular expression pattern and returns each segment as a table. + +## Syntax + +```sql +REGEXP_SPLIT_TO_TABLE(string, pattern [, flags text]) +``` + +| Parameter | Description | +|--------------|----------------------------------------------------------------| +| `string` | The input string to split (VARCHAR type) | +| `pattern` | Regular expression pattern used for splitting (VARCHAR type) | +| `flags text` | A string of flags to modify the regular expression's behavior. | + + +**Supported `flags` Parameter:** +Provides flexible regular expression configuration options, controlling matching behavior by combining the following characters: +* `i` (case-insensitive): Pattern matching ignores case. +* `c` (case-sensitive): Pattern matching is case-sensitive (default behavior). +* `n` or `m` (multi-line): Enables multi-line mode. In this mode, `^` and `$` match the beginning and end of the string, respectively, as well as the beginning and end of each line; the dot `.` does not match newline characters. +* `s` (single-line): Enables single-line mode (also known as dot-matches-newline). In this mode, the dot `.` matches any character, including newline characters. +* `x` (ignore-whitespace): Ignores whitespace characters in the pattern (improves pattern readability). +* `q` (literal): Treats the `pattern` as a literal string rather than a regular expression. 
+ +## Examples + +### Basic Row Generation +```sql +SELECT REGEXP_SPLIT_TO_TABLE('one,two,three', ','); +┌─────────┐ +│ one │ +│ two │ +│ three │ +└─────────┘ +``` + +### Log Parsing +```sql +SELECT REGEXP_SPLIT_TO_TABLE('ERR:404:File Not Found', ':'); +┌──────────────────┐ +│ ERR │ +│ 404 │ +│ File Not Found │ +└──────────────────┘ +``` + +### With flag text + +```sql +SELECT regexp_split_to_table('One_Two_Three', '[_-]', 'i') + +╭────────╮ +│ One │ +│ Two │ +│ Three │ +╰────────╯ + +``` + +### Nested Usage + +```sql +WITH data AS ( + SELECT 'id=123,name=John' AS kv_pairs +) +SELECT + REGEXP_SPLIT_TO_TABLE(kv_pairs, ',') AS pair +FROM data; +┌──────────────┐ +│ id=123 │ +│ name=John │ +└──────────────┘ +``` + +## See Also + +- [SPLIT](/tidb-cloud-lake/sql/split.md): For simple string splitting +- [REGEXP_SPLIT_TO_ARRAY](/tidb-cloud-lake/sql/regexp-split-array.md): split string to array diff --git a/tidb-cloud-lake/sql/regexp-substr.md b/tidb-cloud-lake/sql/regexp-substr.md new file mode 100644 index 0000000000000..42d3f610cce51 --- /dev/null +++ b/tidb-cloud-lake/sql/regexp-substr.md @@ -0,0 +1,79 @@ +--- +title: REGEXP_SUBSTR +--- + +Returns the substring of the string `expr` that matches the regular expression specified by the pattern `pat`, NULL if there is no match. If expr or pat is NULL, the return value is NULL. + +- REGEXP_SUBSTR does not support extracting capture groups (subpatterns defined by parentheses `()`). It returns the entire matched substring instead of specific captured groups. + +```sql +SELECT REGEXP_SUBSTR('abc123', '(\w+)(\d+)'); +-- Returns 'abc123' (the entire match), not 'abc' or '123'. + +-- Alternative Solution: Use string functions like SUBSTRING and REGEXP_INSTR to manually extract the desired portion of the string: +SELECT SUBSTRING('abc123', 1, REGEXP_INSTR('abc123', '\d+') - 1); +-- Returns 'abc' (extracts the part before the digits). +SELECT SUBSTRING('abc123', REGEXP_INSTR('abc123', '\d+')); +-- Returns '123' (extracts the digits). +``` + +- REGEXP_SUBSTR does not support the `e` parameter (used in Snowflake to extract capture groups) or the `group_num` parameter for specifying which capture group to return. + +```sql +SELECT REGEXP_SUBSTR('abc123', '(\w+)(\d+)', 1, 1, 'e', 1); +-- Error: Databend does not support the 'e' parameter or capture group extraction. + +-- Alternative Solution: Use string functions like SUBSTRING and LOCATE to manually extract the desired substring, or preprocess the data with external tools (e.g., Python) to extract capture groups before querying. +SELECT SUBSTRING( + REGEXP_SUBSTR('letters:abc,numbers:123', 'letters:[a-z]+,numbers:[0-9]+'), + LOCATE('letters:', 'letters:abc,numbers:123') + 8, + LOCATE(',', 'letters:abc,numbers:123') - (LOCATE('letters:', 'letters:abc,numbers:123') + 8) +); +-- Returns 'abc' +``` + +## Syntax + +```sql +REGEXP_SUBSTR(, ) +``` + +## Arguments + +| Arguments | Description | +|------------|-----------------------------------------------------------------------------------------------------------| +| expr | The string expr that to be matched | +| pat | The regular expression | +| pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | +| occurrence | Optional. Which occurrence of a match to search for. If omitted, the default is 1. | +| match_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP_LIKE(). 
| + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+'); ++----------------------------------------+ +| REGEXP_SUBSTR('abc def ghi', '[a-z]+') | ++----------------------------------------+ +| abc | ++----------------------------------------+ + +SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3); ++----------------------------------------------+ +| REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3) | ++----------------------------------------------+ +| ghi | ++----------------------------------------------+ + +SELECT REGEXP_SUBSTR('周 周周 周周周 周周周周', '周+', 2, 3); ++------------------------------------------------------------------+ +| REGEXP_SUBSTR('周 周周 周周周 周周周周', '周+', 2, 3) | ++------------------------------------------------------------------+ +| 周周周周 | ++------------------------------------------------------------------+ + +``` diff --git a/tidb-cloud-lake/sql/regexp.md b/tidb-cloud-lake/sql/regexp.md new file mode 100644 index 0000000000000..ae0e7ec6fe7ef --- /dev/null +++ b/tidb-cloud-lake/sql/regexp.md @@ -0,0 +1,27 @@ +--- +title: REGEXP +--- + +Returns `true` if the string `` matches the regular expression specified by the ``, `false` otherwise. + +## Syntax + +```sql + REGEXP +``` + +## Aliases + +- [RLIKE](/tidb-cloud-lake/sql/rlike.md) + +## Examples + +```sql +SELECT 'databend' REGEXP 'd*', 'databend' RLIKE 'd*'; + +┌────────────────────────────────────────────────────┐ +│ ('databend' regexp 'd*') │ ('databend' rlike 'd*') │ +├──────────────────────────┼─────────────────────────┤ +│ true │ true │ +└────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/remove-nullable.md b/tidb-cloud-lake/sql/remove-nullable.md new file mode 100644 index 0000000000000..f79aa2c8f4386 --- /dev/null +++ b/tidb-cloud-lake/sql/remove-nullable.md @@ -0,0 +1,5 @@ +--- +title: REMOVE_NULLABLE +--- + +Alias for [ASSUME_NOT_NULL](/tidb-cloud-lake/sql/assume-not-null.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/remove-stage-files.md b/tidb-cloud-lake/sql/remove-stage-files.md new file mode 100644 index 0000000000000..6b561c68b4125 --- /dev/null +++ b/tidb-cloud-lake/sql/remove-stage-files.md @@ -0,0 +1,42 @@ +--- +title: REMOVE STAGE FILES +sidebar_position: 5 +--- + +Removes files from a stage. + +See also: + +- [LIST STAGE FILES](/tidb-cloud-lake/sql/list-stage-files.md): Lists files in a stage. +- [PRESIGN](/tidb-cloud-lake/sql/presign.md): Databend recommends using the Presigned URL method to upload files to the stage. + +## Syntax + +```sql +REMOVE { userStage | internalStage | externalStage } [ PATTERN = '' ] +``` +Where: + +### internalStage + +```sql +internalStage ::= @[/] +``` + +### externalStage + +```sql +externalStage ::= @[/] +``` + +### PATTERN = 'regex_pattern' + +A regular expression pattern string, enclosed in single quotes, filters files to remove by their filename. + +## Examples + +This command removes all the files with a name matching the pattern *'ontime.*'* from the stage named *playground*: + +```sql +REMOVE @playground PATTERN = 'ontime.*' +``` diff --git a/tidb-cloud-lake/sql/rename-database.md b/tidb-cloud-lake/sql/rename-database.md new file mode 100644 index 0000000000000..f9c9cb3cd8501 --- /dev/null +++ b/tidb-cloud-lake/sql/rename-database.md @@ -0,0 +1,46 @@ +--- +title: RENAME DATABASE +sidebar_position: 4 +--- + +Changes the name of a database. 
+ +## Syntax + +```sql +ALTER DATABASE [ IF EXISTS ] RENAME TO +``` + +## Examples + +```sql +CREATE DATABASE DATABEND; +``` + +```sql +SHOW DATABASES; ++--------------------+ +| Database | ++--------------------+ +| DATABEND | +| information_schema | +| default | +| system | ++--------------------+ +``` + +```sql +ALTER DATABASE `DATABEND` RENAME TO `NEW_DATABEND`; +``` + +```sql +SHOW DATABASES; ++--------------------+ +| Database | ++--------------------+ +| information_schema | +| NEW_DATABEND | +| default | +| system | ++--------------------+ +``` diff --git a/tidb-cloud-lake/sql/rename-table.md b/tidb-cloud-lake/sql/rename-table.md new file mode 100644 index 0000000000000..53f9bcd48f1fa --- /dev/null +++ b/tidb-cloud-lake/sql/rename-table.md @@ -0,0 +1,40 @@ +--- +title: RENAME TABLE +sidebar_position: 3 +--- + +Changes the name of a table. + +## Syntax + +```sql +ALTER TABLE [ IF EXISTS ] RENAME TO +``` + +## Examples + +```sql +CREATE TABLE test(a INT); +``` + +```sql +SHOW TABLES; ++------+ +| name | ++------+ +| test | ++------+ +``` + +```sql +ALTER TABLE `test` RENAME TO `new_test`; +``` + +```sql +SHOW TABLES; ++----------+ +| name | ++----------+ +| new_test | ++----------+ +``` diff --git a/tidb-cloud-lake/sql/rename-workload-group.md b/tidb-cloud-lake/sql/rename-workload-group.md new file mode 100644 index 0000000000000..b3d342b38fbc9 --- /dev/null +++ b/tidb-cloud-lake/sql/rename-workload-group.md @@ -0,0 +1,23 @@ +--- +title: RENAME WORKLOAD GROUP +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Renames an existing workload group to a new name. + +## Syntax + +```sql +RENAME WORKLOAD GROUP TO +``` + +## Examples + +This example renames `test_workload_group_1` to `test_workload_group`: + +```sql +RENAME WORKLOAD GROUP test_workload_group_1 TO test_workload_group; +``` + diff --git a/tidb-cloud-lake/sql/repeat.md b/tidb-cloud-lake/sql/repeat.md new file mode 100644 index 0000000000000..2549b26a32280 --- /dev/null +++ b/tidb-cloud-lake/sql/repeat.md @@ -0,0 +1,45 @@ +--- +title: REPEAT +--- + +Returns a string consisting of the string str repeated count times. If count is less than 1, returns an empty string. Returns NULL if str or count are NULL. + +## Syntax + +```sql +REPEAT(, ) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | The string. | +| `` | The number. | + +## Examples + +```sql +SELECT REPEAT('databend', 3); ++--------------------------+ +| REPEAT('databend', 3) | ++--------------------------+ +| databenddatabenddatabend | ++--------------------------+ + +SELECT REPEAT('databend', 0); ++-----------------------+ +| REPEAT('databend', 0) | ++-----------------------+ +| | ++-----------------------+ + +SELECT REPEAT('databend', NULL); ++--------------------------+ +| REPEAT('databend', NULL) | ++--------------------------+ +| NULL | ++--------------------------+ +``` + + diff --git a/tidb-cloud-lake/sql/replace-sql.md b/tidb-cloud-lake/sql/replace-sql.md new file mode 100644 index 0000000000000..8803e3a7e7bc7 --- /dev/null +++ b/tidb-cloud-lake/sql/replace-sql.md @@ -0,0 +1,34 @@ +--- +title: REPLACE +--- + +Returns the string str with all occurrences of the string from_str replaced by the string to_str. + +## Syntax + +```sql +REPLACE(, , ) +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------| +| `` | The string. | +| `` | The from string. | +| `` | The to string. 
| + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT REPLACE('www.mysql.com', 'w', 'Ww'); ++-------------------------------------+ +| REPLACE('www.mysql.com', 'w', 'Ww') | ++-------------------------------------+ +| WwWwWw.mysql.com | ++-------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/replace.md b/tidb-cloud-lake/sql/replace.md new file mode 100644 index 0000000000000..c51730d91a9e8 --- /dev/null +++ b/tidb-cloud-lake/sql/replace.md @@ -0,0 +1,178 @@ +--- +title: REPLACE +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +REPLACE INTO can either insert multiple new rows into a table or update existing rows if those rows already exist, using the following sources of data: + +- Direct values + +- Query results + +- Staged files: Databend enables you to replace data into a table from staged files with the REPLACE INTO statement. This is achieved through Databend's capacity to [Query Staged Files](/tidb-cloud-lake/sql/stage.md) and subsequently incorporate the query result into the table. + +:::tip atomic operations +Databend ensures data integrity with atomic operations. Inserts, updates, replaces, and deletes either succeed completely or fail entirely. +::: + +## Syntax + +```sql +REPLACE INTO [ ( [ , ... ] ) ] + ON () ... +``` + +REPLACE INTO updates existing rows when the specified conflict key is found in the table and inserts new rows if the conflict key is not present. The conflict key is a column or combination of columns in a table that uniquely identifies a row and is used to determine whether to insert a new row or update an existing row in the table using the REPLACE INTO statement. See an example below: + +```sql +CREATE TABLE employees ( + employee_id INT, + employee_name VARCHAR(100), + employee_salary DECIMAL(10, 2), + employee_email VARCHAR(255) +); + +-- This REPLACE INTO inserts a new row +REPLACE INTO employees (employee_id, employee_name, employee_salary, employee_email) +ON (employee_email) +VALUES (123, 'John Doe', 50000, 'john.doe@example.com'); + +-- This REPLACE INTO updates the inserted row +REPLACE INTO employees (employee_id, employee_name, employee_salary, employee_email) +ON (employee_email) +VALUES (123, 'John Doe', 60000, 'john.doe@example.com'); +``` + +## Distributed REPLACE INTO + +`REPLACE INTO` support distributed execution in cluster environments. You can enable distributed REPLACE INTO by setting ENABLE_DISTRIBUTED_REPLACE_INTO to 1. This helps enhance data loading performance and scalability in cluster environments. 
+ +```sql +SET enable_distributed_replace_into = 1; +``` + +## Examples + +### Example 1: Replace with Direct Values + +This example replaces data with direct values: + +```sql +CREATE TABLE employees(id INT, name VARCHAR, salary INT); + +REPLACE INTO employees (id, name, salary) ON (id) +VALUES (1, 'John Doe', 50000); + +SELECT * FROM Employees; ++------+----------+--------+ +| id | name | salary | ++------+----------+--------+ +| 1 | John Doe | 50000 | ++------+----------+--------+ +``` + +### Example 2: Replace with Query Results + +This example replaces data with a query result: + +```sql +CREATE TABLE employees(id INT, name VARCHAR, salary INT); + +CREATE TABLE temp_employees(id INT, name VARCHAR, salary INT); + +INSERT INTO temp_employees (id, name, salary) VALUES (1, 'John Doe', 60000); + +REPLACE INTO employees (id, name, salary) ON (id) +SELECT id, name, salary FROM temp_employees WHERE id = 1; + +SELECT * FROM Employees; ++------+----------+--------+ +| id | name | salary | ++------+----------+--------+ +| 1 | John Doe | 60000 | ++------+----------+--------+ +``` + +### Example 3: Replace with Staged Files + +This example demonstrates how to replace existing data in a table with data from a staged file. + +1. Create a table called `sample` + +```sql +CREATE TABLE sample +( + id INT, + city VARCHAR, + score INT, + country VARCHAR DEFAULT 'China' +); + +INSERT INTO sample + (id, city, score) +VALUES + (1, 'Chengdu', 66); +``` + +2. Set up an internal stage with sample data + +Firstly, we create a stage named `mystage`. Then, we load sample data into this stage. +```sql +CREATE STAGE mystage; + +COPY INTO @mystage +FROM +( + SELECT * + FROM + ( + VALUES + (1, 'Chengdu', 80), + (3, 'Chongqing', 90), + (6, 'Hangzhou', 92), + (9, 'Hong Kong', 88) + ) +) +FILE_FORMAT = (TYPE = PARQUET); +``` + +3. Replace existing data using the staged Parquet file with `REPLACE INTO` + +:::tip +You can specify the file format and various copy-related settings with the FILE_FORMAT and COPY_OPTIONS available in the [COPY INTO](/tidb-cloud-lake/sql/copy-into-table.md) command. +::: + +```sql +REPLACE INTO sample + (id, city, score) +ON + (Id) +SELECT + $1, $2, $3 +FROM + @mystage + (FILE_FORMAT => 'parquet'); +``` + +4. Verify the data replacement + +Now, we can query the sample table to see the changes: +```sql +SELECT * FROM sample; +``` + +The results should be: +```sql +┌─────────────────────────────────────────────────────────────────────────┐ +│ id │ city │ score │ country │ +│ Nullable(Int32) │ Nullable(String) │ Nullable(Int32) │ Nullable(String) │ +├─────────────────┼──────────────────┼─────────────────┼──────────────────┤ +│ 1 │ Chengdu │ 80 │ China │ +│ 3 │ Chongqing │ 90 │ China │ +│ 6 │ Hangzhou │ 92 │ China │ +│ 9 │ Hong Kong │ 88 │ China │ +└─────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/result-scan.md b/tidb-cloud-lake/sql/result-scan.md new file mode 100644 index 0000000000000..891817f664896 --- /dev/null +++ b/tidb-cloud-lake/sql/result-scan.md @@ -0,0 +1,55 @@ +--- +title: RESULT_SCAN +--- + +Retrieves the cached result of a previous query by its query ID. 
+ +See also: [system.query_cache](/tidb-cloud-lake/sql/system-query-cache.md) + +## Syntax + +```sql +RESULT_SCAN('' | LAST_QUERY_ID()) +``` + +## Examples + +This example shows how to enable the query result cache and run a query whose result will be cached: + +```bash +# Enable the query result cache feature +mysql> SET enable_query_result_cache = 1; +Query OK, 0 rows affected (0.01 sec) + +# Cache all queries regardless of how fast they execute +mysql> SET query_result_cache_min_execute_secs = 0; +Query OK, 0 rows affected (0.01 sec) + +# Execute a query and cache its result +mysql> SELECT * FROM t1 ORDER BY a; ++------+ +| a | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.02 sec) +Read 0 rows, 0.00 B in 0.006 sec., 0 rows/sec., 0.00 B/sec. +``` + +Once the result is cached, you can use `RESULT_SCAN` to retrieve it without re-running the query: + +```bash +# Retrieve the cached result of the previous query using its query ID +mysql> SELECT * FROM RESULT_SCAN(LAST_QUERY_ID()) ORDER BY a; ++------+ +| a | ++------+ +| 1 | +| 2 | +| 3 | ++------+ +3 rows in set (0.02 sec) +Read 3 rows, 13.00 B in 0.006 sec., 464.06 rows/sec., 1.96 KiB/sec. +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/retention.md b/tidb-cloud-lake/sql/retention.md new file mode 100644 index 0000000000000..60b7c7c6e2ac6 --- /dev/null +++ b/tidb-cloud-lake/sql/retention.md @@ -0,0 +1,65 @@ +--- +title: RETENTION +--- + +Aggregate function + +The RETENTION() function takes as arguments a set of conditions from 1 to 32 arguments of type UInt8 that indicate whether a certain condition was met for the event. + +Any condition can be specified as an argument (as in WHERE). + +The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third if the first and third are true, etc. + +## Syntax + +```sql +RETENTION( , , ..., ); +``` + +## Arguments + +| Arguments | Description | +|-----------|---------------------------------------------| +| `` | An expression that returns a Boolean result | + +## Return Type + +The array of 1 or 0. + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE user_events ( + id INT, + user_id INT, + event_date DATE, + event_type VARCHAR +); + +INSERT INTO user_events (id, user_id, event_date, event_type) +VALUES (1, 1, '2022-01-01', 'signup'), + (2, 1, '2022-01-02', 'login'), + (3, 2, '2022-01-01', 'signup'), + (4, 2, '2022-01-03', 'purchase'), + (5, 3, '2022-01-01', 'signup'), + (6, 3, '2022-01-02', 'login'); +``` + +**Query Demo: Calculate User Retention Based on Signup, Login, and Purchase Events** +```sql +SELECT + user_id, + RETENTION(event_type = 'signup', event_type = 'login', event_type = 'purchase') AS retention +FROM user_events +GROUP BY user_id; +``` + +**Result** +```sql +| user_id | retention | +|---------|-----------| +| 1 | [1, 1, 0] | +| 2 | [1, 0, 1] | +| 3 | [1, 1, 0] | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/reverse.md b/tidb-cloud-lake/sql/reverse.md new file mode 100644 index 0000000000000..0f9e8074a0682 --- /dev/null +++ b/tidb-cloud-lake/sql/reverse.md @@ -0,0 +1,32 @@ +--- +title: REVERSE +--- + +Returns the string str with the order of the characters reversed. + +## Syntax + +```sql +REVERSE() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------------| +| `` | The string value. 
| + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT REVERSE('abc'); ++----------------+ +| REVERSE('abc') | ++----------------+ +| cba | ++----------------+ +``` diff --git a/tidb-cloud-lake/sql/revoke.md b/tidb-cloud-lake/sql/revoke.md new file mode 100644 index 0000000000000..478749a7f8cbb --- /dev/null +++ b/tidb-cloud-lake/sql/revoke.md @@ -0,0 +1,203 @@ +--- +title: REVOKE +sidebar_position: 11 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Revokes privileges, roles, and ownership of a specific database object. This includes: + +- Revoking privileges from roles. +- Removing roles from users or other roles. + +See also: + +- [GRANT](/tidb-cloud-lake/sql/grant.md) +- [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md) + +## Syntax + +### Revoking Privileges + +```sql +REVOKE { + schemaObjectPrivileges | ALL [ PRIVILEGES ] ON + } +FROM ROLE +``` + +Where: + +```sql +schemaObjectPrivileges ::= +-- For TABLE + { SELECT | INSERT } + +-- For SCHEMA + { CREATE | DROP | ALTER } + +-- For USER + { CREATE USER } + +-- For ROLE + { CREATE ROLE} + +-- For STAGE + { READ, WRITE } + +-- For UDF + { USAGE } + +-- For MASKING POLICY (account-level privileges) + { CREATE MASKING POLICY | APPLY MASKING POLICY } + +-- For ROW ACCESS POLICY (account-level privileges) + { CREATE ROW ACCESS POLICY | APPLY ROW ACCESS POLICY } +``` + +```sql +privileges_level ::= + *.* + | db_name.* + | db_name.tbl_name + | STAGE + | UDF + | MASKING POLICY + | ROW ACCESS POLICY +``` + +### Revoking Masking Policy Privileges + +```sql +REVOKE APPLY ON MASKING POLICY FROM ROLE +REVOKE ALL [ PRIVILEGES ] ON MASKING POLICY FROM ROLE +REVOKE OWNERSHIP ON MASKING POLICY FROM ROLE '' +``` + +Use these forms to remove access to individual masking policies. Global `CREATE MASKING POLICY` and `APPLY MASKING POLICY` privileges are revoked using the standard syntax with `ON *.*`. + +### Revoking Row Access Policy Privileges + +```sql +REVOKE APPLY ON ROW ACCESS POLICY FROM ROLE +REVOKE ALL [ PRIVILEGES ] ON ROW ACCESS POLICY FROM ROLE +REVOKE OWNERSHIP ON ROW ACCESS POLICY FROM ROLE '' +``` + +Use these forms to revoke access to specific row access policies. Revoke global `CREATE ROW ACCESS POLICY` and `APPLY ROW ACCESS POLICY` privileges with the standard syntax against `ON *.*`. 
+ +### Revoking Role + +```sql +-- Revoke a role from a user +REVOKE ROLE FROM + +-- Revoke a role from a role +REVOKE ROLE FROM ROLE +``` + +## Examples + +### Example 1: Revoking Privileges from a Role + +Create a role: +```sql +CREATE ROLE user1_role; +``` + +Grant the `SELECT,INSERT` privilege on all existing tables in the `default` database to the role `user1_role`: + +```sql +GRANT SELECT,INSERT ON default.* TO ROLE user1_role; +``` +```sql +SHOW GRANTS FOR ROLE user1_role; ++---------------------------------------------------------+ +| Grants | ++---------------------------------------------------------+ +| GRANT SELECT,INSERT ON 'default'.* TO ROLE 'user1_role' | ++---------------------------------------------------------+ +``` + +Revoke `INSERT` privilege from role `user1_role`: +```sql +REVOKE INSERT ON default.* FROM ROLE user1_role; +``` + +```sql +SHOW GRANTS FOR ROLE user1_role; ++---------------------------------------------------+ +| Grants | ++---------------------------------------------------+ +| GRANT SELECT ON 'default'.* TO 'user1_role' | ++---------------------------------------------------+ +``` + +### Example 2: Revoking Privileges from Another Role + +Grant the `SELECT,INSERT` privilege on all existing tables in the `mydb` database to the role `role1`: + +Create role: +```sql +CREATE ROLE role1; +``` + +Grant privileges to the role: +```sql +GRANT SELECT,INSERT ON mydb.* TO ROLE role1; +``` + +Show the grants for the role: +```sql +SHOW GRANTS FOR ROLE role1; ++--------------------------------------------+ +| Grants | ++--------------------------------------------+ +| GRANT SELECT,INSERT ON 'mydb'.* TO 'role1' | ++--------------------------------------------+ +``` + +Revoke `INSERT` privilege from role `role1`: +```sql +REVOKE INSERT ON mydb.* FROM ROLE role1; +``` + +```sql +SHOW GRANTS FOR ROLE role1; ++-------------------------------------+ +| Grants | ++-------------------------------------+ +| GRANT SELECT ON 'mydb'.* TO 'role1' | ++-------------------------------------+ +``` + +### Example 3: Revoking a Role from a User + +```sql +REVOKE ROLE role1 FROM USER user1; +``` + +```sql +SHOW GRANTS FOR user1; +``` + +### Example 4: Revoking Masking Policy Privileges + +```sql +-- Remove per-policy access from a role +REVOKE APPLY ON MASKING POLICY email_mask FROM ROLE pii_readers; + +-- Revoke the ability to create masking policies at the account level +REVOKE CREATE MASKING POLICY ON *.* FROM ROLE security_admin; +``` + +### Example 5: Revoking Row Access Policy Privileges + +```sql +-- Remove per-policy access from a role +REVOKE APPLY ON ROW ACCESS POLICY rap_region FROM ROLE apac_only; + +-- Revoke the ability to create row access policies globally +REVOKE CREATE ROW ACCESS POLICY ON *.* FROM ROLE row_policy_admin; +``` diff --git a/tidb-cloud-lake/sql/right.md b/tidb-cloud-lake/sql/right.md new file mode 100644 index 0000000000000..e79d74ca1552c --- /dev/null +++ b/tidb-cloud-lake/sql/right.md @@ -0,0 +1,33 @@ +--- +title: RIGHT +--- + +Returns the rightmost len characters from the string str, or NULL if any argument is NULL. 
+ +## Syntax + +```sql +RIGHT(, ); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------------------------------------------| +| `` | The main string from where the character to be extracted | +| `` | The count of characters | + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT RIGHT('foobarbar', 4); ++-----------------------+ +| RIGHT('foobarbar', 4) | ++-----------------------+ +| rbar | ++-----------------------+ +``` diff --git a/tidb-cloud-lake/sql/rlike.md b/tidb-cloud-lake/sql/rlike.md new file mode 100644 index 0000000000000..39fa88d7ee158 --- /dev/null +++ b/tidb-cloud-lake/sql/rlike.md @@ -0,0 +1,5 @@ +--- +title: RLIKE +--- + +Alias for [REGEXP](/tidb-cloud-lake/sql/regexp.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/rollback.md b/tidb-cloud-lake/sql/rollback.md new file mode 100644 index 0000000000000..cc01a0b363edb --- /dev/null +++ b/tidb-cloud-lake/sql/rollback.md @@ -0,0 +1,18 @@ +--- +title: ROLLBACK +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Undoes all changes made during a transaction. [BEGIN](/tidb-cloud-lake/sql/begin.md) and [COMMIT](/tidb-cloud-lake/sql/commit.md)/ROLLBACK must be used together to start and then either save or undo a transaction. + +## Syntax + +```sql +ROLLBACK +``` + +## Examples + +See [Examples](/tidb-cloud-lake/sql/begin.md#examples). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/round.md b/tidb-cloud-lake/sql/round.md new file mode 100644 index 0000000000000..191706a4ef697 --- /dev/null +++ b/tidb-cloud-lake/sql/round.md @@ -0,0 +1,30 @@ +--- +title: ROUND +--- + +Rounds the argument x to d decimal places. The rounding algorithm depends on the data type of x. d defaults to 0 if not specified. d can be negative to cause d digits left of the decimal point of the value x to become zero. The maximum absolute value for d is 30; any digits in excess of 30 (or -30) are truncated. + +When using this function's result in calculations, be aware of potential precision issues due to its return data type being DOUBLE, which may affect final accuracy: + +```sql +SELECT ROUND(4/7, 4) - ROUND(3/7, 4); -- Result: 0.14280000000000004 +SELECT ROUND(4/7, 4)::DECIMAL(8,4) - ROUND(3/7, 4)::DECIMAL(8,4); -- Result: 0.1428 +``` + +## Syntax + +```sql +ROUND( ) +``` + +## Examples + +```sql +SELECT ROUND(0.123, 2); + +┌─────────────────┐ +│ round(0.123, 2) │ +├─────────────────┤ +│ 0.12 │ +└─────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/row-number.md b/tidb-cloud-lake/sql/row-number.md new file mode 100644 index 0000000000000..dedb49618022a --- /dev/null +++ b/tidb-cloud-lake/sql/row-number.md @@ -0,0 +1,94 @@ +--- +title: ROW_NUMBER +--- + +Assigns a sequential number to each row within a partition, starting from 1. + +## Syntax + +```sql +ROW_NUMBER() +OVER ( + [ PARTITION BY partition_expression ] + ORDER BY sort_expression [ ASC | DESC ] +) +``` + +**Arguments:** +- `PARTITION BY`: Optional. Divides rows into partitions +- `ORDER BY`: Required. Determines the row numbering order +- `ASC | DESC`: Optional. 
Sort direction (default: ASC) + +**Notes:** +- Returns sequential integers starting from 1 +- Each partition restarts numbering from 1 +- Commonly used for ranking and pagination + +## Examples + +```sql +-- Create sample data +CREATE TABLE scores ( + student VARCHAR(20), + subject VARCHAR(20), + score INT +); + +INSERT INTO scores VALUES + ('Alice', 'Math', 95), + ('Alice', 'English', 87), + ('Alice', 'Science', 92), + ('Bob', 'Math', 78), + ('Bob', 'English', 85), + ('Bob', 'Science', 80), + ('Charlie', 'Math', 88), + ('Charlie', 'English', 90), + ('Charlie', 'Science', 85); +``` + +**Number all rows sequentially (even with tied scores):** + +```sql +SELECT student, subject, score, + ROW_NUMBER() OVER (ORDER BY score DESC, student, subject) AS row_num +FROM scores +ORDER BY score DESC, student, subject; +``` + +Result: +``` +student | subject | score | row_num +--------+---------+-------+-------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Charlie | English | 90 | 3 +Charlie | Math | 88 | 4 +Alice | English | 87 | 5 +Bob | English | 85 | 6 +Charlie | Science | 85 | 7 +Bob | Science | 80 | 8 +Bob | Math | 78 | 9 +``` + +**Number rows within each student (for pagination/top-N):** + +```sql +SELECT student, subject, score, + ROW_NUMBER() OVER (PARTITION BY student ORDER BY score DESC) AS subject_rank +FROM scores +ORDER BY student, score DESC; +``` + +Result: +``` +student | subject | score | subject_rank +--------+---------+-------+------------- +Alice | Math | 95 | 1 +Alice | Science | 92 | 2 +Alice | English | 87 | 3 +Bob | English | 85 | 1 +Bob | Science | 80 | 2 +Bob | Math | 78 | 3 +Charlie | English | 90 | 1 +Charlie | Math | 88 | 2 +Charlie | Science | 85 | 3 \ No newline at end of file diff --git a/tidb-cloud-lake/sql/rows-between.md b/tidb-cloud-lake/sql/rows-between.md new file mode 100644 index 0000000000000..71260d207f878 --- /dev/null +++ b/tidb-cloud-lake/sql/rows-between.md @@ -0,0 +1,315 @@ +--- +title: ROWS BETWEEN +--- + +Defines a window frame using row-based boundaries for window functions. + +## Overview + +The `ROWS BETWEEN` clause specifies which rows to include in the window frame for window function calculations. It allows you to define sliding windows, cumulative calculations, and other row-based aggregations. 
+ +## Syntax + +```sql +FUNCTION() OVER ( + [ PARTITION BY partition_expression ] + [ ORDER BY sort_expression ] + ROWS BETWEEN frame_start AND frame_end +) +``` + +### Frame Boundaries + +| Boundary | Description | Example | +|----------|-------------|---------| +| `UNBOUNDED PRECEDING` | Start of partition | `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` | +| `n PRECEDING` | n rows before current row | `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` | +| `CURRENT ROW` | Current row | `ROWS BETWEEN CURRENT ROW AND CURRENT ROW` | +| `n FOLLOWING` | n rows after current row | `ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING` | +| `UNBOUNDED FOLLOWING` | End of partition | `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING` | + +## ROWS vs RANGE + +| Aspect | ROWS | RANGE | +|--------|------|-------| +| **Definition** | Physical row count | Logical value range | +| **Boundaries** | Row positions | Value-based positions | +| **Ties** | Each row independent | Tied values share same frame | +| **Performance** | Generally faster | May be slower with duplicates | +| **Use Case** | Moving averages, running totals | Value-based windows, percentile calculations | + +## Examples + +### Sample Data + +```sql +CREATE OR REPLACE TABLE sales ( + sale_date DATE, + product VARCHAR(20), + amount DECIMAL(10,2) +); + +INSERT INTO sales VALUES + ('2024-01-01', 'A', 100.00), + ('2024-01-02', 'A', 150.00), + ('2024-01-03', 'A', 200.00), + ('2024-01-04', 'A', 250.00), + ('2024-01-05', 'A', 300.00), + ('2024-01-01', 'B', 50.00), + ('2024-01-02', 'B', 75.00), + ('2024-01-03', 'B', 100.00), + ('2024-01-04', 'B', 125.00), + ('2024-01-05', 'B', 150.00); +``` + +### 1. Running Total (Cumulative Sum) + +```sql +SELECT sale_date, product, amount, + SUM(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS running_total +FROM sales +ORDER BY product, sale_date; +``` + +Result: +``` +sale_date | product | amount | running_total +------------+---------+--------+-------------- +2024-01-01 | A | 100.00 | 100.00 +2024-01-02 | A | 150.00 | 250.00 +2024-01-03 | A | 200.00 | 450.00 +2024-01-04 | A | 250.00 | 700.00 +2024-01-05 | A | 300.00 | 1000.00 +2024-01-01 | B | 50.00 | 50.00 +2024-01-02 | B | 75.00 | 125.00 +2024-01-03 | B | 100.00 | 225.00 +2024-01-04 | B | 125.00 | 350.00 +2024-01-05 | B | 150.00 | 500.00 +``` + +### 2. Moving Average (3-Day Window) + +```sql +SELECT sale_date, product, amount, + AVG(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN 2 PRECEDING AND CURRENT ROW + ) AS moving_avg_3day +FROM sales +ORDER BY product, sale_date; +``` + +Result: +``` +sale_date | product | amount | moving_avg_3day +------------+---------+--------+---------------- +2024-01-01 | A | 100.00 | 100.00 +2024-01-02 | A | 150.00 | 125.00 -- (100+150)/2 +2024-01-03 | A | 200.00 | 150.00 -- (100+150+200)/3 +2024-01-04 | A | 250.00 | 200.00 -- (150+200+250)/3 +2024-01-05 | A | 300.00 | 250.00 -- (200+250+300)/3 +``` + +### 3. 
Centered Window (Current + 1 Before + 1 After) + +```sql +SELECT sale_date, product, amount, + SUM(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING + ) AS centered_sum +FROM sales +ORDER BY product, sale_date; +``` + +Result: +``` +sale_date | product | amount | centered_sum +------------+---------+--------+------------- +2024-01-01 | A | 100.00 | 250.00 -- (100+150) +2024-01-02 | A | 150.00 | 450.00 -- (100+150+200) +2024-01-03 | A | 200.00 | 600.00 -- (150+200+250) +2024-01-04 | A | 250.00 | 750.00 -- (200+250+300) +2024-01-05 | A | 300.00 | 550.00 -- (250+300) +``` + +### 4. Future Looking Window + +```sql +SELECT sale_date, product, amount, + MIN(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING + ) AS min_next_3days +FROM sales +ORDER BY product, sale_date; +``` + +Result: +``` +sale_date | product | amount | min_next_3days +------------+---------+--------+--------------- +2024-01-01 | A | 100.00 | 100.00 -- min(100,150,200) +2024-01-02 | A | 150.00 | 150.00 -- min(150,200,250) +2024-01-03 | A | 200.00 | 200.00 -- min(200,250,300) +2024-01-04 | A | 250.00 | 250.00 -- min(250,300) +2024-01-05 | A | 300.00 | 300.00 -- min(300) +``` + +### 5. Full Partition Window + +```sql +SELECT sale_date, product, amount, + MAX(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + ) AS max_in_partition, + MIN(amount) OVER ( + PARTITION BY product + ORDER BY sale_date + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + ) AS min_in_partition +FROM sales +ORDER BY product, sale_date; +``` + +Result: +``` +sale_date | product | amount | max_in_partition | min_in_partition +------------+---------+--------+------------------+----------------- +2024-01-01 | A | 100.00 | 300.00 | 100.00 +2024-01-02 | A | 150.00 | 300.00 | 100.00 +2024-01-03 | A | 200.00 | 300.00 | 100.00 +2024-01-04 | A | 250.00 | 300.00 | 100.00 +2024-01-05 | A | 300.00 | 300.00 | 100.00 +``` + +## Common Patterns + +### Running Calculations +**Syntax examples (not complete statements):** +```sql +-- Running total +SUM(column) OVER (ORDER BY sort_col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) + +-- Running average +AVG(column) OVER (ORDER BY sort_col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) + +-- Running count +COUNT(*) OVER (ORDER BY sort_col ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) +``` + +**Complete example:** +```sql +-- Running total with actual table +SELECT sale_date, product, amount, + SUM(amount) OVER ( + ORDER BY sale_date + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW + ) AS running_total +FROM sales +ORDER BY sale_date; +``` + +### Moving Windows +**Syntax examples:** +```sql +-- 3-period moving average +AVG(column) OVER (ORDER BY sort_col ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) + +-- 5-period moving sum +SUM(column) OVER (ORDER BY sort_col ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) + +-- Centered 3-period window +AVG(column) OVER (ORDER BY sort_col ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) +``` + +**Complete example:** +```sql +-- 3-day moving average +SELECT sale_date, amount, + AVG(amount) OVER ( + ORDER BY sale_date + ROWS BETWEEN 2 PRECEDING AND CURRENT ROW + ) AS moving_avg_3day +FROM sales +ORDER BY sale_date; +``` + +### Bounded Windows +**Syntax examples:** +```sql +-- First 3 rows of partition +SUM(column) OVER (ORDER BY sort_col ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING) + +-- Last 3 rows of partition 
+SUM(column) OVER (ORDER BY sort_col ROWS BETWEEN 2 PRECEDING AND UNBOUNDED FOLLOWING) + +-- Fixed window of 5 rows +AVG(column) OVER (ORDER BY sort_col ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) +``` + +**Complete example:** +```sql +-- Fixed 5-row window average +SELECT sale_date, amount, + AVG(amount) OVER ( + ORDER BY sale_date + ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING + ) AS avg_5row_window +FROM sales +ORDER BY sale_date; +``` + +## Best Practices + +1. **Use ROWS for physical row counts** when you need exact row-based windows +2. **Always include ORDER BY** when using ROWS BETWEEN (except for UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) +3. **Consider performance** with large windows - smaller windows are more efficient +4. **Handle edge cases** - windows may be smaller at partition boundaries +5. **Combine with PARTITION BY** for per-group calculations +6. **Understand boundary behavior** - windows shrink at partition edges + +### Boundary Behavior Examples + +**Centered window at partition edges:** +```sql +-- For row 1: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING +-- Actual window: CURRENT ROW AND 1 FOLLOWING (no preceding row exists) + +-- For last row: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING +-- Actual window: 1 PRECEDING AND CURRENT ROW (no following row exists) +``` + +**Moving average at start:** +```sql +-- For row 1: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW +-- Actual window: CURRENT ROW only (no preceding rows) + +-- For row 2: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW +-- Actual window: 1 PRECEDING AND CURRENT ROW (only 1 preceding row exists) +``` + +This is normal behavior - the window frame adapts to available rows at partition boundaries. + +## Limitations + +1. **n must be non-negative integer** - cannot use negative values or expressions +2. **ORDER BY required** for most window frames (except full partition) +3. **Frame boundaries must be ordered** - start_bound <= end_bound +4. **Cannot mix PRECEDING and FOLLOWING arbitrarily** - must form valid window + +## See Also + +- [Window Functions Overview](/tidb-cloud-lake/sql/window-functions.md) +- [RANGE BETWEEN](/tidb-cloud-lake/sql/range-between.md) - Value-based window frames +- [Aggregate Functions](/tidb-cloud-lake/sql/aggregate-functions.md) - Functions that can use window frames +- [FIRST_VALUE](/tidb-cloud-lake/sql/first-value.md) - Window function examples with frames \ No newline at end of file diff --git a/tidb-cloud-lake/sql/rpad.md b/tidb-cloud-lake/sql/rpad.md new file mode 100644 index 0000000000000..9f53a6b633e55 --- /dev/null +++ b/tidb-cloud-lake/sql/rpad.md @@ -0,0 +1,42 @@ +--- +title: RPAD +--- + +Returns the string str, right-padded with the string padstr to a length of len characters. +If str is longer than len, the return value is shortened to len characters. + +## Syntax + +```sql +RPAD(, , ) +``` + +## Arguments + +| Arguments | Description | +|------------|-----------------| +| `` | The string. | +| `` | The length. | +| `` | The pad string. | + +## Return Type + +`VARCHAR` + +## Examples + +```sql +SELECT RPAD('hi',5,'?'); ++--------------------+ +| RPAD('hi', 5, '?') | ++--------------------+ +| hi??? 
| ++--------------------+ + +SELECT RPAD('hi',1,'?'); ++--------------------+ +| RPAD('hi', 1, '?') | ++--------------------+ +| h | ++--------------------+ +``` diff --git a/tidb-cloud-lake/sql/rtrim.md b/tidb-cloud-lake/sql/rtrim.md new file mode 100644 index 0000000000000..f5596ab2ff17b --- /dev/null +++ b/tidb-cloud-lake/sql/rtrim.md @@ -0,0 +1,31 @@ +--- +title: RTRIM +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Removes all occurrences of any character present in the specified trim string from the right side of the string. + +See also: + +- [TRIM_TRAILING](/tidb-cloud-lake/sql/trim-trailing.md) +- [LTRIM](/tidb-cloud-lake/sql/ltrim.md) + +## Syntax + +```sql +RTRIM(, ) +``` + +## Examples + +```sql +SELECT RTRIM('databendxx', 'x'), RTRIM('databendxx', 'xy'); + +┌──────────────────────────────────────────────────────┐ +│ rtrim('databendxx', 'x') │ rtrim('databendxx', 'xy') │ +├──────────────────────────┼───────────────────────────┤ +│ databend │ databend │ +└──────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/score.md b/tidb-cloud-lake/sql/score.md new file mode 100644 index 0000000000000..232285a4a5ca6 --- /dev/null +++ b/tidb-cloud-lake/sql/score.md @@ -0,0 +1,62 @@ +--- +title: SCORE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +`SCORE()` returns the relevance score assigned to the current row by the inverted index search. Use it together with [MATCH](/tidb-cloud-lake/sql/match.md) or [QUERY](/tidb-cloud-lake/sql/query.md) in a `WHERE` clause. + +:::info +Databend's SCORE function is inspired by Elasticsearch's [SCORE](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-score). +::: + +## Syntax + +```sql +SCORE() +``` + +## Examples + +### Example: Prepare Text Notes for MATCH + +```sql +CREATE OR REPLACE TABLE frame_notes ( + id INT, + camera STRING, + summary STRING, + tags STRING, + INVERTED INDEX idx_notes (summary, tags) +); + +INSERT INTO frame_notes VALUES + (1, 'dashcam_front', + 'Green light at Market & 5th with pedestrian entering the crosswalk', + 'downtown commute green-light pedestrian'), + (2, 'dashcam_front', + 'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead', + 'stop urban red-light cyclist'), + (3, 'dashcam_front', + 'School zone caution sign in SOMA with pedestrian waiting near crosswalk', + 'school-zone caution pedestrian'); +``` + +### Example: Score MATCH Results + +```sql +SELECT summary, SCORE() +FROM frame_notes +WHERE MATCH('summary^2, tags', 'traffic light red', 'operator=AND') +ORDER BY SCORE() DESC; +``` + +### Example: Score QUERY Results + +Reusing the `frames` table from the [QUERY](/tidb-cloud-lake/sql/query.md) examples: + +```sql +SELECT id, SCORE() +FROM frames +WHERE QUERY('meta.detections.label:pedestrian^3 AND meta.scene.time_of_day:day'); +``` diff --git a/tidb-cloud-lake/sql/second.md b/tidb-cloud-lake/sql/second.md new file mode 100644 index 0000000000000..9ecb49eb78f26 --- /dev/null +++ b/tidb-cloud-lake/sql/second.md @@ -0,0 +1,34 @@ +--- +title: TO_SECOND +--- + +Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the second in the minute (0-59). 
+ +## Syntax + +```sql +TO_SECOND() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TINYINT` + +## Examples + +```sql +SELECT + to_second('2023-11-12 09:38:18.165575'); + +┌─────────────────────────────────────────┐ +│ to_second('2023-11-12 09:38:18.165575') │ +├─────────────────────────────────────────┤ +│ 18 │ +└─────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/seconds.md b/tidb-cloud-lake/sql/seconds.md new file mode 100644 index 0000000000000..847a86d3fc7b8 --- /dev/null +++ b/tidb-cloud-lake/sql/seconds.md @@ -0,0 +1,36 @@ +--- +title: TO_SECONDS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a specified number of seconds into an Interval type. + +- Accepts positive integers, zero, and negative integers as input. + +## Syntax + +```sql +TO_SECONDS() +``` + +## Aliases + +- [EPOCH](/tidb-cloud-lake/sql/epoch.md) + +## Return Type + +Interval (in the format `hh:mm:ss`). + +## Examples + +```sql +SELECT TO_SECONDS(2), TO_SECONDS(0), TO_SECONDS((- 2)); + +┌─────────────────────────────────────────────────┐ +│ to_seconds(2) │ to_seconds(0) │ to_seconds(- 2) │ +├───────────────┼───────────────┼─────────────────┤ +│ 0:00:02 │ 00:00:00 │ -0:00:02 │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/select.md b/tidb-cloud-lake/sql/select.md new file mode 100644 index 0000000000000..e05475ab63d61 --- /dev/null +++ b/tidb-cloud-lake/sql/select.md @@ -0,0 +1,533 @@ +--- +title: SELECT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import DetailsWrap from '@site/src/components/DetailsWrap'; + +Retrieves data from a table. + +## Syntax + +```sql +[WITH] +SELECT + [ALL | DISTINCT] + [ TOP ] + | [[AS] ] | $ [, ...] | * + COLUMNS + [EXCLUDE ( [, , , ...] ) ] + [FROM table_references] + [AT ...] + [WHERE ] + [GROUP BY {{ | | | }, + ... | }] + [HAVING ] + [ORDER BY { | | | } [ASC | DESC], + [ NULLS { FIRST | LAST }] + [LIMIT ] + [OFFSET ] + [IGNORE_RESULT] +``` +- The SELECT statement also allows you to query staged files directly. For syntax and examples, see [Efficient Data Transformation with Databend](/tidb-cloud-lake/sql/stage.md). + +- In the examples on this page, the table `numbers(N)` is used for testing, with a single UInt64 column (named `number`) that contains integers from 0 to N-1. + +## SELECT Clause + +### AS Keyword + +In Databend, you can use the AS keyword to assign an alias to a column. This allows you to provide a more descriptive and easily understandable name for the column in both the SQL statement and the query result: + +- Databend suggests avoiding special characters as much as possible when creating column aliases. However, if special characters are necessary in some cases, the alias should be enclosed in backticks, like this: SELECT price AS \`$CA\` FROM ... + +- Databend will automatically convert aliases into lowercase. For example, if you alias a column as *Total*, it will appear as *total* in the result. If the capitalization matters to you, enclose the alias in backticks: \`Total\`. 
+ +```sql +SELECT number AS Total FROM numbers(3); ++--------+ +| total | ++--------+ +| 0 | +| 1 | +| 2 | ++--------+ + +SELECT number AS `Total` FROM numbers(3); ++--------+ +| Total | ++--------+ +| 0 | +| 1 | +| 2 | ++--------+ +``` + +If you alias a column in the SELECT clause, you can refer to the alias in the WHERE, GROUP BY, and HAVING clauses, as well as in the SELECT clause itself after the alias is defined. + +```sql +SELECT number * 2 AS a, a * 2 AS double FROM numbers(3) WHERE (a + 1) % 3 = 0; ++---+--------+ +| a | double | ++---+--------+ +| 2 | 4 | ++---+--------+ + +SELECT MAX(number) AS b, number % 3 AS c FROM numbers(100) GROUP BY c HAVING b > 8; ++----+---+ +| b | c | ++----+---+ +| 99 | 0 | +| 97 | 1 | +| 98 | 2 | ++----+---+ +``` + +If you assign an alias to a column and the alias name is the same as the column name, the WHERE and GROUP BY clauses will recognize the alias as the column name. However, the HAVING clause will recognize the alias as the alias itself. + +```sql +SELECT number * 2 AS number FROM numbers(3) +WHERE (number + 1) % 3 = 0 +GROUP BY number +HAVING number > 5; + ++--------+ +| number | ++--------+ +| 10 | +| 16 | ++--------+ +``` + +### EXCLUDE Keyword + +Excludes one or more columns by their names from the result. The keyword is usually used in conjunction with `SELECT * ...` to exclude a few columns from the result instead of retrieving them all. + +```sql +SELECT * FROM allemployees ORDER BY id; + +--- +| id | firstname | lastname | gender | +|----|-----------|----------|--------| +| 1 | Ryan | Tory | M | +| 2 | Oliver | Green | M | +| 3 | Noah | Shuster | M | +| 4 | Lily | McMeant | F | +| 5 | Macy | Lee | F | + +-- Exclude the column "id" from the result +SELECT * EXCLUDE id FROM allemployees; + +--- +| firstname | lastname | gender | +|-----------|----------|--------| +| Noah | Shuster | M | +| Ryan | Tory | M | +| Oliver | Green | M | +| Lily | McMeant | F | +| Macy | Lee | F | + +-- Exclude the columns "id" and "lastname" from the result +SELECT * EXCLUDE (id,lastname) FROM allemployees; + +--- +| firstname | gender | +|-----------|--------| +| Oliver | M | +| Ryan | M | +| Lily | F | +| Noah | M | +| Macy | F | +``` + +### COLUMNS Keyword + +The COLUMNS keyword provides a flexible mechanism for column selection based on literal regular expression patterns and lambda expressions. + +```sql +CREATE TABLE employee ( + employee_id INT, + employee_name VARCHAR(255), + department VARCHAR(50), + salary DECIMAL(10, 2) +); + +INSERT INTO employee VALUES +(1, 'Alice', 'HR', 60000.00), +(2, 'Bob', 'IT', 75000.00), +(3, 'Charlie', 'Marketing', 50000.00), +(4, 'David', 'Finance', 80000.00); + + +-- Select columns with names starting with 'employee' +SELECT COLUMNS('employee.*') FROM employee; + +┌────────────────────────────────────┐ +│ employee_id │ employee_name │ +├─────────────────┼──────────────────┤ +│ 1 │ Alice │ +│ 2 │ Bob │ +│ 3 │ Charlie │ +│ 4 │ David │ +└────────────────────────────────────┘ + +-- Select columns where the name contains the substring 'name' +SELECT COLUMNS(x -> x LIKE '%name%') FROM employee; + +┌──────────────────┐ +│ employee_name │ +├──────────────────┤ +│ Alice │ +│ Bob │ +│ Charlie │ +│ David │ +└──────────────────┘ +``` + +The COLUMNS keyword can also be utilized with EXCLUDE to explicitly exclude specific columns from the query result. 
+ +```sql +-- Select all columns excluding 'salary' from the 'employee' table +SELECT COLUMNS(* EXCLUDE salary) FROM employee; + +┌───────────────────────────────────────────────────────┐ +│ employee_id │ employee_name │ department │ +├─────────────────┼──────────────────┼──────────────────┤ +│ 1 │ Alice │ HR │ +│ 2 │ Bob │ IT │ +│ 3 │ Charlie │ Marketing │ +│ 4 │ David │ Finance │ +└───────────────────────────────────────────────────────┘ +``` + +### Column Position + +By using $N, you can represent a column within the SELECT clause. For example, $2 represents the second column: + +```sql +CREATE TABLE IF NOT EXISTS t1(a int, b varchar); +INSERT INTO t1 VALUES (1, 'a'), (2, 'b'); +SELECT a, $2 FROM t1; + ++---+-------+ +| a | $2 | ++---+-------+ +| 1 | a | +| 2 | b | ++---+-------+ +``` + +### Retrieving All Columns + +The `SELECT *` statement is used to retrieve all columns from a table or query result. It is a convenient way to fetch complete data sets without specifying individual column names. + +This example returns all columns from my_table: + +```sql +SELECT * FROM my_table; +``` + +Databend extends SQL syntax by allowing queries to start with `FROM
` without explicitly using `SELECT *`: + +```sql +FROM my_table; +``` + +This is equivalent to: + +```sql +SELECT * FROM my_table; +``` + +## FROM Clause + +The FROM clause in a SELECT statement specifies the source table or tables from which data will be queried. You can also improve code readability by placing the FROM clause before the SELECT clause, especially when managing a lengthy SELECT list or aiming to quickly identify the origins of selected columns. + +```sql +-- The following two statements are equivalent: + +-- Statement 1: Using SELECT clause with FROM clause +SELECT number FROM numbers(3); + +-- Statement 2: Equivalent representation with FROM clause preceding SELECT clause +FROM numbers(3) SELECT number; + ++--------+ +| number | ++--------+ +| 0 | +| 1 | +| 2 | ++--------+ +``` + +The FROM clause can also specify a location, enabling direct querying of data from various sources and eliminating the need to first load it into a table. For more information, see [Querying Staged Files](/tidb-cloud-lake/sql/stage.md). + +## AT Clause + +The AT clause enables you to query previous versions of your data. For more information, see [AT](/tidb-cloud-lake/sql/at.md). + +## WHERE Clause + +```sql +SELECT number FROM numbers(3) WHERE number > 1; ++--------+ +| number | ++--------+ +| 2 | ++--------+ +``` + +## GROUP BY Clause + +```sql +--Group the rows of the result set by column alias +SELECT number%2 as c1, number%3 as c2, MAX(number) FROM numbers(10000) GROUP BY c1, c2; ++------+------+-------------+ +| c1 | c2 | MAX(number) | ++------+------+-------------+ +| 1 | 2 | 9995 | +| 1 | 1 | 9997 | +| 0 | 2 | 9998 | +| 0 | 1 | 9994 | +| 0 | 0 | 9996 | +| 1 | 0 | 9999 | ++------+------+-------------+ + +--Group the rows of the result set by column position in the SELECT list +SELECT number%2 as c1, number%3 as c2, MAX(number) FROM numbers(10000) GROUP BY 1, 2; ++------+------+-------------+ +| c1 | c2 | MAX(number) | ++------+------+-------------+ +| 1 | 2 | 9995 | +| 1 | 1 | 9997 | +| 0 | 2 | 9998 | +| 0 | 1 | 9994 | +| 0 | 0 | 9996 | +| 1 | 0 | 9999 | ++------+------+-------------+ + +``` + +## HAVING Clause + +```sql +SELECT + number % 2 as c1, + number % 3 as c2, + MAX(number) as max +FROM + numbers(10000) +GROUP BY + c1, c2 +HAVING + max > 9996; + ++------+------+------+ +| c1 | c2 | max | ++------+------+------+ +| 1 | 0 | 9999 | +| 1 | 1 | 9997 | +| 0 | 2 | 9998 | ++------+------+------+ +``` + +## ORDER BY Clause + +```sql +--Sort by column name in ascending order. +SELECT number FROM numbers(5) ORDER BY number ASC; ++--------+ +| number | ++--------+ +| 0 | +| 1 | +| 2 | +| 3 | +| 4 | ++--------+ + +--Sort by column name in descending order. +SELECT number FROM numbers(5) ORDER BY number DESC; ++--------+ +| number | ++--------+ +| 4 | +| 3 | +| 2 | +| 1 | +| 0 | ++--------+ + +--Sort by column alias. +SELECT number%2 AS c1, number%3 AS c2 FROM numbers(5) ORDER BY c1 ASC, c2 DESC; ++------+------+ +| c1 | c2 | ++------+------+ +| 0 | 2 | +| 0 | 1 | +| 0 | 0 | +| 1 | 1 | +| 1 | 0 | ++------+------+ + +--Sort by column position in the SELECT list +SELECT * FROM t1 ORDER BY 2 DESC; ++------+------+ +| a | b | ++------+------+ +| 2 | 3 | +| 1 | 2 | ++------+------+ + +SELECT a FROM t1 ORDER BY 1 DESC; ++------+ +| a | ++------+ +| 2 | +| 1 | ++------+ + +--Sort with the NULLS FIRST or LAST option. 
+ +CREATE TABLE t_null ( + number INTEGER +); + +INSERT INTO t_null VALUES (1); +INSERT INTO t_null VALUES (2); +INSERT INTO t_null VALUES (3); +INSERT INTO t_null VALUES (NULL); +INSERT INTO t_null VALUES (NULL); + +--Databend considers NULL values larger than any non-NULL values. +--The NULL values appear last in the following example that sorts the results in ascending order: + +SELECT number FROM t_null order by number ASC; ++--------+ +| number | ++--------+ +| 1 | +| 2 | +| 3 | +| NULL | +| NULL | ++--------+ + +-- To make the NULL values appear first in the preceding example, use the NULLS FIRST option: + +SELECT number FROM t_null order by number ASC nulls first; ++--------+ +| number | ++--------+ +| NULL | +| NULL | +| 1 | +| 2 | +| 3 | ++--------+ + +-- Use the NULLS LAST option to make the NULL values appear last in descending order: + +SELECT number FROM t_null order by number DESC nulls last; ++--------+ +| number | ++--------+ +| 3 | +| 2 | +| 1 | +| NULL | +| NULL | ++--------+ +``` + +## LIMIT Clause + +```sql +SELECT number FROM numbers(1000000000) LIMIT 1; ++--------+ +| number | ++--------+ +| 0 | ++--------+ + +SELECT number FROM numbers(100000) ORDER BY number LIMIT 2 OFFSET 10; ++--------+ +| number | ++--------+ +| 10 | +| 11 | ++--------+ +``` + +For optimizing query performance with large result sets, Databend has enabled the lazy_read_threshold option by default with a default value of 1,000. This option is specifically designed for queries that involve a LIMIT clause. When the lazy_read_threshold is enabled, the optimization is activated for queries where the specified LIMIT number is smaller than or equal to the threshold value you set. To disable the option, set it to 0. + + + +
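A minimal sketch of working with this setting in a session (assuming the `SHOW SETTINGS` command is available; the threshold value shown is only an example):

```sql
-- Check the current value of the setting
SHOW SETTINGS LIKE 'lazy_read_threshold';

-- Raise the threshold so that queries with larger LIMIT values also qualify
SET lazy_read_threshold = 10000;
```

Setting it back to 0 disables the optimization, as the example further below shows.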
**How it works**

The optimization improves performance for queries with an ORDER BY clause and a LIMIT clause. When enabled and the LIMIT number in the query is smaller than the specified threshold, only the columns involved in the ORDER BY clause are retrieved and sorted, instead of the entire result set.

After the system retrieves and sorts the columns involved in the ORDER BY clause, it applies the LIMIT constraint to select the desired number of rows from the sorted result set. The system then returns the limited set of rows as the query result. This approach reduces resource usage by fetching and sorting only the necessary columns, and it further optimizes query execution by limiting the processed rows to the required subset.
+ +```sql +SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10 ignore_result; +Empty set (0.300 sec) + +set lazy_read_threshold=0; +Query OK, 0 rows affected (0.004 sec) + +SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10 ignore_result; +Empty set (0.897 sec) +``` + +## OFFSET Clause + +```sql +SELECT number FROM numbers(5) ORDER BY number OFFSET 2; ++--------+ +| number | ++--------+ +| 2 | +| 3 | +| 4 | ++--------+ +``` + +## IGNORE_RESULT + +Do not output the result set. + +```sql +SELECT number FROM numbers(2); ++--------+ +| number | ++--------+ +| 0 | +| 1 | ++--------+ + +SELECT number FROM numbers(2) IGNORE_RESULT; +-- Empty set +``` + +## Nested Sub-Selects + +SELECT statements can be nested in queries. + +``` +SELECT ... [SELECT ...[SELECT [...]]] +``` + +```sql +SELECT MIN(number) FROM (SELECT number%3 AS number FROM numbers(10)) GROUP BY number%2; ++-------------+ +| min(number) | ++-------------+ +| 1 | +| 0 | ++-------------+ +``` diff --git a/tidb-cloud-lake/sql/sequence-functions.md b/tidb-cloud-lake/sql/sequence-functions.md new file mode 100644 index 0000000000000..594730d00769a --- /dev/null +++ b/tidb-cloud-lake/sql/sequence-functions.md @@ -0,0 +1,11 @@ +--- +title: Sequence Functions +--- + +This section provides reference information for the sequence functions in Databend. Sequence functions allow you to work with sequence objects, which generate unique, auto-incrementing numeric values. + +## Available Sequence Functions + +| Function | Description | Example | +|----------|-------------|--------| +| [NEXTVAL](/tidb-cloud-lake/sql/nextval.md) | Retrieves the next value from a sequence | `NEXTVAL(my_sequence)` | diff --git a/tidb-cloud-lake/sql/sequence.md b/tidb-cloud-lake/sql/sequence.md new file mode 100644 index 0000000000000..d8c78071c712f --- /dev/null +++ b/tidb-cloud-lake/sql/sequence.md @@ -0,0 +1,23 @@ +--- +title: Sequence +--- + +This page provides a comprehensive overview of sequence operations in Databend, organized by functionality for easy reference. + +## Sequence Management + +| Command | Description | +|---------|-------------| +| [CREATE SEQUENCE](/tidb-cloud-lake/sql/create-sequence.md) | Creates a new sequence generator | +| [DROP SEQUENCE](/tidb-cloud-lake/sql/drop-sequence.md) | Removes a sequence generator | + +## Sequence Information + +| Command | Description | +|---------|-------------| +| [DESC SEQUENCE](/tidb-cloud-lake/sql/desc-sequence.md) | Shows detailed information about a sequence | +| [SHOW SEQUENCES](/tidb-cloud-lake/sql/show-sequences.md) | Lists all sequences in the current or specified database | + +:::note +Sequences in Databend are used to generate unique numeric values in a sequential order, commonly used for primary keys or other unique identifiers. +::: \ No newline at end of file diff --git a/tidb-cloud-lake/sql/set-cluster-key.md b/tidb-cloud-lake/sql/set-cluster-key.md new file mode 100644 index 0000000000000..1084ee19ebd93 --- /dev/null +++ b/tidb-cloud-lake/sql/set-cluster-key.md @@ -0,0 +1,31 @@ +--- +title: SET CLUSTER KEY +sidebar_position: 1 +--- + +Set a cluster key when creating a table. + +Cluster key is intended to improve query performance by physically clustering data together. For example, when you set a column as your cluster key for a table, the table data will be physically sorted by the column you set. This will maximize the query performance if your most queries are filtered by the column. 
+
+> **Note:** For a string column, the cluster statistics use only the first 8 bytes. You can use a substring to provide sufficient cardinality.
+
+See also:
+
+* [ALTER CLUSTER KEY](/tidb-cloud-lake/sql/alter-cluster-key.md)
+* [DROP CLUSTER KEY](/tidb-cloud-lake/sql/drop-cluster-key.md)
+
+## Syntax
+
+```sql
+CREATE TABLE ... CLUSTER BY ( <expr1> [ , <expr2> ... ] )
+```
+
+## Examples
+
+This command creates a table clustered by columns:
+
+```sql
+CREATE TABLE t1(a int, b int) CLUSTER BY(b,a);
+
+CREATE TABLE t2(a int, b string) CLUSTER BY(SUBSTRING(b, 5, 6));
+```
\ No newline at end of file
diff --git a/tidb-cloud-lake/sql/set-operators.md b/tidb-cloud-lake/sql/set-operators.md
new file mode 100644
index 0000000000000..6a8b2ac90a69a
--- /dev/null
+++ b/tidb-cloud-lake/sql/set-operators.md
@@ -0,0 +1,162 @@
+---
+title: Set Operators
+description:
+  Set operators combine the results of two queries into a single result.
+---
+
+Set operators combine the results of two queries into a single result. Databend supports the following set operators:
+
+- [INTERSECT](#intersect)
+- [EXCEPT](#except)
+- [UNION [ALL]](#union-all)
+
+## INTERSECT
+
+Returns all distinct rows selected by both queries.
+
+### Syntax
+
+```sql
+SELECT column1 , column2 ....
+FROM table_names
+WHERE condition
+
+INTERSECT
+
+SELECT column1 , column2 ....
+FROM table_names
+WHERE condition
+```
+
+### Example
+
+```sql
+create table t1(a int, b int);
+create table t2(c int, d int);
+
+insert into t1 values(1, 2), (2, 3), (3 ,4), (2, 3);
+insert into t2 values(2,2), (3, 5), (7 ,8), (2, 3), (3, 4);
+
+select * from t1 intersect select * from t2;
+```
+
+Output:
+
+```sql
+2|3
+3|4
+```
+
+## EXCEPT
+
+Returns all distinct rows selected by the first query but not the second.
+
+### Syntax
+
+```sql
+SELECT column1 , column2 ....
+FROM table_names
+WHERE condition
+
+EXCEPT
+
+SELECT column1 , column2 ....
+FROM table_names
+WHERE condition
+```
+
+### Example
+
+```sql
+create table t1(a int, b int);
+create table t2(c int, d int);
+
+insert into t1 values(1, 2), (2, 3), (3 ,4), (2, 3);
+insert into t2 values(2,2), (3, 5), (7 ,8), (2, 3), (3, 4);
+
+select * from t1 except select * from t2;
+```
+
+Output:
+
+```sql
+1|2
+```
+
+## UNION [ALL]
+
+Combines rows from two or more result sets. Each result set must return the same number of columns, and the corresponding columns must have the same or compatible data types.
+
+The command removes duplicate rows by default when combining result sets. To include duplicate rows, use **UNION ALL**.
+
+### Syntax
+
+```sql
+SELECT column1 , column2 ...
+FROM table_names
+WHERE condition
+
+UNION [ALL]
+
+SELECT column1 , column2 ...
+FROM table_names
+WHERE condition
+
+[UNION [ALL]
+
+SELECT column1 , column2 ...
+FROM table_names
+WHERE condition]...
+
+[ORDER BY ...]
+``` + +### Example + +```sql +CREATE TABLE support_team + ( + NAME STRING, + salary UINT32 + ); + +CREATE TABLE hr_team + ( + NAME STRING, + salary UINT32 + ); + +INSERT INTO support_team +VALUES ('Alice', + 1000), + ('Bob', + 3000), + ('Carol', + 5000); + +INSERT INTO hr_team +VALUES ('Davis', + 1000), + ('Eva', + 4000); + +-- The following code returns the employees in both teams who are paid less than 2,000 dollars: + +SELECT NAME AS SelectedEmployee, + salary +FROM support_team +WHERE salary < 2000 +UNION +SELECT NAME AS SelectedEmployee, + salary +FROM hr_team +WHERE salary < 2000 +ORDER BY selectedemployee DESC; +``` + +Output: + +```sql +Davis|1000 +Alice|1000 +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/set-role.md b/tidb-cloud-lake/sql/set-role.md new file mode 100644 index 0000000000000..fcacb6dab092a --- /dev/null +++ b/tidb-cloud-lake/sql/set-role.md @@ -0,0 +1,40 @@ +--- +title: SET ROLE +sidebar_position: 5 +--- + +Switches the active role for a session, and the currently active role can be viewed using the [SHOW ROLES](/tidb-cloud-lake/sql/show-roles.md) command, with the `is_current` field indicating the active role. For more information about the active role and secondary roles, see [Active Role & Secondary Roles](/tidb-cloud-lake/guides/roles.md#active-role--secondary-roles). + +See also: [SET SECONDARY ROLES](04-user-set-2nd-roles.md) + +## Syntax + +```sql +SET ROLE +``` + +## Examples + +```sql +SHOW ROLES; + +┌───────────────────────────────────────────────────────┐ +│ name │ inherited_roles │ is_current │ is_default │ +├───────────┼─────────────────┼────────────┼────────────┤ +│ developer │ 0 │ false │ false │ +│ public │ 0 │ false │ false │ +│ writer │ 0 │ true │ true │ +└───────────────────────────────────────────────────────┘ + +SET ROLE developer; + +SHOW ROLES; + +┌───────────────────────────────────────────────────────┐ +│ name │ inherited_roles │ is_current │ is_default │ +├───────────┼─────────────────┼────────────┼────────────┤ +│ developer │ 0 │ true │ false │ +│ public │ 0 │ false │ false │ +│ writer │ 0 │ false │ true │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/set-secondary-roles.md b/tidb-cloud-lake/sql/set-secondary-roles.md new file mode 100644 index 0000000000000..b7e8624ef72c2 --- /dev/null +++ b/tidb-cloud-lake/sql/set-secondary-roles.md @@ -0,0 +1,86 @@ +--- +title: SET SECONDARY ROLES +sidebar_position: 6 +--- + +Activates all secondary roles for the current session. This means that all secondary roles granted to the user will be active, extending the user's privileges. For more information about the active role and secondary roles, see [Active Role & Secondary Roles](/tidb-cloud-lake/guides/roles.md#active-role--secondary-roles). + +See also: [SET ROLE](/tidb-cloud-lake/sql/set-role.md) + +## Syntax + +```sql +SET SECONDARY ROLES { ALL | NONE } +``` + +| Parameter | Default | Description | +|-----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ALL | Yes | Activates all secondary roles granted to the user for the current session, in addition to the active role. This enables the user to utilize the privileges associated with all secondary roles. | +| NONE | No | Deactivates all secondary roles for the current session, meaning only the active role's privileges are active. 
This restricts the user's privileges to those granted by the active role alone. | + +## Examples + +This example shows how secondary roles work and how to active/deactivate them. + +1. Creating roles as user root. + +First, let's create two roles, `admin` and `analyst`: + +```sql +CREATE ROLE admin; + +CREATE ROLE analyst; +``` + +2. Granting privileges. + +Next, let's grant some privileges to each role. For example, we'll grant the `admin` role the ability to create databases, and the `analyst` role the ability to select from tables: + +```sql +GRANT CREATE DATABASE ON *.* TO ROLE admin; + +GRANT SELECT ON *.* TO ROLE analyst; +``` + +3. Creating a user. + +Now, let's create a user: + +```sql +CREATE USER 'user1' IDENTIFIED BY 'password'; +``` + +4. Assigning roles. + +Assign both roles to the user: + +```sql +GRANT ROLE admin TO 'user1'; + +GRANT ROLE analyst TO 'user1'; +``` + +5. Setting active role. + +Now, let's log in to Databend as `user1`, the set the active role to `analyst`. + +```sql +SET ROLE analyst; +``` + +All secondary roles are activated by default, so we can create a new database: + +```sql +CREATE DATABASE my_db; +``` + +6. Deactivate secondary roles. + +The active role `analyst` does not have the CREATE DATABASE privilege. When all secondary roles are deactivated, creating a new database will fail. + +```sql +SET SECONDARY ROLES NONE; + +CREATE DATABASE my_db2; +error: APIError: ResponseError with 1063: Permission denied: privilege [CreateDatabase] is required on *.* for user 'user1'@'%' with roles [analyst,public] +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/set-var.md b/tidb-cloud-lake/sql/set-var.md new file mode 100644 index 0000000000000..b4f01f8cd58a1 --- /dev/null +++ b/tidb-cloud-lake/sql/set-var.md @@ -0,0 +1,122 @@ +--- +title: SET_VAR +--- + +SET_VAR is used to specify optimizer hints within a single SQL statement, allowing for finer control over the execution plan of that specific statement. This includes: + +:::note +SET_VAR will be deprecated in an upcoming release. Consider using the [SETTINGS Clause](/tidb-cloud-lake/sql/settings-clause.md) instead. +::: + +- Configure settings temporarily, affecting only the duration of the SQL statement execution. It's important to note that the settings specified with SET_VAR will solely impact the result of the current statement being executed and will not have any lasting effects on the overall database configuration. For a list of available settings that can be configured using SET_VAR, see [SHOW SETTINGS](/tidb-cloud-lake/sql/show-settings.md). To understand how it works, see these examples: + + - [Example 1. Temporarily Set Timezone](#example-1-temporarily-set-timezone) + - [Example 2: Control Parallel Processing for COPY INTO](#example-2-control-parallel-processing-for-copy-into) + +- Control the deduplication behavior on [INSERT](/tidb-cloud-lake/sql/insert.md), [UPDATE](/tidb-cloud-lake/sql/update.md), or [REPLACE](/tidb-cloud-lake/sql/replace.md) operations with the label *deduplicate_label*. For those operations with a deduplicate_label in the SQL statements, Databend executes only the first statement, and subsequent statements with the same deduplicate_label value are ignored, regardless of their intended data modifications. Please note that once you set a deduplicate_label, it will remain in effect for a period of 24 hours. To understand how the deduplicate_label assists in deduplication, see [Example 3: Set Deduplicate Label](#example-3-set-deduplicate-label). 
+ +See also: +- [SETTINGS Clause](/tidb-cloud-lake/sql/settings-clause.md) +- [SET](02-set-global.md) + +## Syntax + +```sql +/*+ SET_VAR(key=value) SET_VAR(key=value) ... */ +``` + +- The hint must immediately follow an [SELECT](/tidb-cloud-lake/sql/select.md), [INSERT](/tidb-cloud-lake/sql/insert.md), [UPDATE](/tidb-cloud-lake/sql/update.md), [REPLACE](/tidb-cloud-lake/sql/replace.md), [MERGE](/tidb-cloud-lake/sql/merge.md),[DELETE](/tidb-cloud-lake/sql/dml.md), or [COPY](/tidb-cloud-lake/sql/copy-into-table.md) (INTO) keyword that begins the SQL statement. +- A SET_VAR can include only one Key=Value pair, which means you can configure only one setting with one SET_VAR. However, you can use multiple SET_VAR hints to configure multiple settings. + - If multiple SET_VAR hints containing a same key, the first Key=Value pair will be applied. + - If a key fails to parse or bind, all hints will be ignored. + +## Examples + +### Example 1: Temporarily Set Timezone + +```sql +root@localhost> SELECT TIMEZONE(); + +SELECT + TIMEZONE(); + +┌────────────┐ +│ timezone() │ +│ String │ +├────────────┤ +│ UTC │ +└────────────┘ + +1 row in 0.011 sec. Processed 1 rows, 1B (91.23 rows/s, 91B/s) + +root@localhost> SELECT /*+SET_VAR(timezone='America/Toronto') */ TIMEZONE(); + +SELECT + /*+SET_VAR(timezone='America/Toronto') */ + TIMEZONE(); + +┌─────────────────┐ +│ timezone() │ +│ String │ +├─────────────────┤ +│ America/Toronto │ +└─────────────────┘ + +1 row in 0.023 sec. Processed 1 rows, 1B (43.99 rows/s, 43B/s) + +root@localhost> SELECT TIMEZONE(); + +SELECT + TIMEZONE(); + +┌────────────┐ +│ timezone() │ +│ String │ +├────────────┤ +│ UTC │ +└────────────┘ + +1 row in 0.010 sec. Processed 1 rows, 1B (104.34 rows/s, 104B/s) +``` +### Example 2: Control Parallel Processing for COPY INTO + +In Databend, the *max_threads* setting specifies the maximum number of threads that can be utilized to execute a request. By default, this value is typically set to match the number of CPU cores available on the machine. + +When loading data into Databend with COPY INTO, you can control the parallel processing capabilities by injecting hints into the COPY INTO command and setting the *max_threads* parameter. For example: + +```sql +COPY /*+ set_var(max_threads=6) */ INTO mytable FROM @mystage/ pattern='.*[.]parq' FILE_FORMAT=(TYPE=parquet); +``` + +### Example 3: Set Deduplicate Label + +```sql +CREATE TABLE t1(a Int, b bool); +INSERT /*+ SET_VAR(deduplicate_label='databend') */ INTO t1 (a, b) VALUES(1, false); +SELECT * FROM t1; + +a|b| +-+-+ +1|0| + +UPDATE /*+ SET_VAR(deduplicate_label='databend') */ t1 SET a = 20 WHERE b = false; +SELECT * FROM t1; + +a|b| +-+-+ +1|0| + +REPLACE /*+ SET_VAR(deduplicate_label='databend') */ INTO t1 on(a,b) VALUES(40, false); +SELECT * FROM t1; + +a|b| +-+-+ +1|0| + +MERGE /*+ SET_VAR(deduplicate_label='databend') */ INTO t1 using t2 on t1.a = t2.a when matched then update *; +SELECT * FROM t1; + +a|b| +-+-+ +1|0| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/set-variable.md b/tidb-cloud-lake/sql/set-variable.md new file mode 100644 index 0000000000000..fa4fe1f2776b2 --- /dev/null +++ b/tidb-cloud-lake/sql/set-variable.md @@ -0,0 +1,100 @@ +--- +title: SET VARIABLE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Sets the value of one or more SQL variables within a session. The values can be simple constants, expressions, query results, or database objects. 
Variables persist for the duration of your session and can be used in subsequent queries. + +## Syntax + +```sql +-- Set one variable +SET VARIABLE = + +-- Set more than one variable +SET VARIABLE (, , ...) = (, , ...) + +-- Set multiple variables from a query result +SET VARIABLE (, , ...) = +``` + +## Accessing Variables + +Variables can be accessed using the dollar sign syntax: `$variable_name` + +## Examples + +### Setting a Single Variable + +```sql +-- Sets variable a to the string 'databend' +SET VARIABLE a = 'databend'; + +-- Access the variable +SELECT $a; +┌─────────┐ +│ $a │ +├─────────┤ +│ databend│ +└─────────┘ +``` + +### Setting Multiple Variables + +```sql +-- Sets variable x to 'xx' and y to 'yy' +SET VARIABLE (x, y) = ('xx', 'yy'); + +-- Access multiple variables +SELECT $x, $y; +┌────┬────┐ +│ $x │ $y │ +├────┼────┤ +│ xx │ yy │ +└────┴────┘ +``` + +### Setting Variables from Query Results + +```sql +-- Sets variable a to 3 and b to 55 +SET VARIABLE (a, b) = (SELECT 3, 55); + +-- Access the variables +SELECT $a, $b; +┌────┬────┐ +│ $a │ $b │ +├────┼────┤ +│ 3 │ 55 │ +└────┴────┘ +``` + +### Dynamic Table References + +Variables can be used with the `IDENTIFIER()` function to dynamically reference database objects: + +```sql +-- Create a sample table +CREATE OR REPLACE TABLE monthly_sales(empid INT, amount INT, month TEXT) AS SELECT 1, 2, '3'; + +-- Set a variable 't' to the name of the table 'monthly_sales' +SET VARIABLE t = 'monthly_sales'; + +-- Access the variable directly +SELECT $t; +┌──────────────┐ +│ $t │ +├──────────────┤ +│ monthly_sales│ +└──────────────┘ + +-- Use IDENTIFIER to dynamically reference the table name stored in the variable 't' +SELECT * FROM IDENTIFIER($t); +┌───────┬────────┬───────┐ +│ empid │ amount │ month │ +├───────┼────────┼───────┤ +│ 1 │ 2 │ 3 │ +└───────┴────────┴───────┘ +``` diff --git a/tidb-cloud-lake/sql/set.md b/tidb-cloud-lake/sql/set.md new file mode 100644 index 0000000000000..da7692e0c5d6e --- /dev/null +++ b/tidb-cloud-lake/sql/set.md @@ -0,0 +1,41 @@ +--- +title: SET +--- + +Changes the value of a system setting for the current session. To show all the current settings, use [SHOW SETTINGS](/tidb-cloud-lake/sql/show-settings.md). + +See also: +- [SETTINGS Clause](/tidb-cloud-lake/sql/settings-clause.md) +- [SET_VAR](/tidb-cloud-lake/sql/set-var.md) +- [UNSET](/tidb-cloud-lake/sql/unset.md) + +## Syntax + +```sql +SET [ SESSION | GLOBAL ] = +``` + +| Parameter | Description | +|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| SESSION | Applies the setting change at the session level. If omitted, it applies to the session level by default. | +| GLOBAL | Applies the setting change at the global level, rather than just the current session. For more information about the setting levels, see [Setting Levels](/tidb-cloud-lake/sql/show-settings.md#setting-levels). 
| + +## Examples + +The following example sets the `max_memory_usage` setting to `4 GB`: + +```sql +SET max_memory_usage = 1024*1024*1024*4; +``` + +The following example sets the `max_threads` setting to `4`: + +```sql +SET max_threads = 4; +``` + +The following example sets the `max_threads` setting to `4` and changes it to be a global-level setting: + +```sql +SET GLOBAL max_threads = 4; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/settings-clause.md b/tidb-cloud-lake/sql/settings-clause.md new file mode 100644 index 0000000000000..9bdd480450ba0 --- /dev/null +++ b/tidb-cloud-lake/sql/settings-clause.md @@ -0,0 +1,115 @@ +--- +title: SETTINGS Clause +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The SETTINGS clause configures specific settings that influence the execution behavior of the SQL statement it precedes. To view the available settings in Databend and their values, use [SHOW SETTINGS](/tidb-cloud-lake/sql/show-settings.md). + +See also: [SET](../50-administration-cmds/02-set-global.md) + +## Syntax + +```sql +SETTINGS ( = [, = , ...] ) +``` + +## Supported Statements + +The SETTINGS clause can be used with the following SQL statements: + +- [SELECT](/tidb-cloud-lake/sql/select.md) +- [INSERT](/tidb-cloud-lake/sql/insert.md) +- [INSERT (multi-table)](/tidb-cloud-lake/sql/insert-multi-table.md) +- [MERGE](/tidb-cloud-lake/sql/merge.md) +- [`COPY INTO
`](/tidb-cloud-lake/sql/copy-into-table.md) +- [`COPY INTO `](/tidb-cloud-lake/sql/copy-into-location.md) +- [UPDATE](/tidb-cloud-lake/sql/update.md) +- [DELETE](/tidb-cloud-lake/sql/delete.md) +- [CREATE TABLE](/tidb-cloud-lake/sql/create-table.md) +- [EXPLAIN](/tidb-cloud-lake/sql/explain.md) + +## Examples + +This example demonstrates how the SETTINGS clause can be used to adjust the timezone parameter in a SELECT query, impacting the displayed result of `now()`: + +```sql +-- When no timezone is set, Databend defaults to UTC, so now() returns the current UTC timestamp +SELECT timezone(), now(); + +┌─────────────────────────────────────────┐ +│ timezone() │ now() │ +│ String │ Timestamp │ +├────────────┼────────────────────────────┤ +│ UTC │ 2024-11-04 19:42:28.424925 │ +└─────────────────────────────────────────┘ + +-- By setting the timezone to Asia/Shanghai, the now() function returns the local time in Shanghai, which is 8 hours ahead of UTC. +SETTINGS (timezone = 'Asia/Shanghai') SELECT timezone(), now(); + +┌────────────────────────────────────────────┐ +│ timezone() │ now() │ +├───────────────┼────────────────────────────┤ +│ Asia/Shanghai │ 2024-11-05 03:42:42.209404 │ +└────────────────────────────────────────────┘ + +-- Setting the timezone to America/Toronto adjusts the now() output to the local time in Toronto, reflecting the Eastern Time Zone (UTC-5 or UTC-4 during daylight saving time). +SETTINGS (timezone = 'America/Toronto') SELECT timezone(), now(); + +┌──────────────────────────────────────────────┐ +│ timezone() │ now() │ +│ String │ Timestamp │ +├─────────────────┼────────────────────────────┤ +│ America/Toronto │ 2024-11-04 14:42:48.353577 │ +└──────────────────────────────────────────────┘ +``` + +This example demonstrates how to use the date_format_style setting to switch between MySQL and Oracle date formatting styles: + +```sql +-- Default MySQL style date formatting +SELECT to_string('2024-04-05'::DATE, '%b'); + +┌────────────────────────────────┐ +│ to_string('2024-04-05', '%b') │ +├────────────────────────────────┤ +│ Apr │ +└────────────────────────────────┘ + +-- Oracle style date formatting +SETTINGS (date_format_style = 'Oracle') SELECT to_string('2024-04-05'::DATE, 'MON'); + +┌────────────────────────────────┐ +│ to_string('2024-04-05', 'MON') │ +├────────────────────────────────┤ +│ Apr │ +└────────────────────────────────┘ +``` + +This example shows how the week_start setting affects week-related date functions: + +```sql +-- Default week_start = 1 (Monday as first day of week) +SELECT date_trunc(WEEK, to_date('2024-04-03')); -- Wednesday + +┌────────────────────────────────────────┐ +│ date_trunc(WEEK, to_date('2024-04-03')) │ +├────────────────────────────────────────┤ +│ 2024-04-01 │ +└────────────────────────────────────────┘ + +-- Setting week_start = 0 (Sunday as first day of week) +SETTINGS (week_start = 0) SELECT date_trunc(WEEK, to_date('2024-04-03')); -- Wednesday + +┌────────────────────────────────────────┐ +│ date_trunc(WEEK, to_date('2024-04-03')) │ +├────────────────────────────────────────┤ +│ 2024-03-31 │ +└────────────────────────────────────────┘ +``` + +This example allows the COPY INTO operation to utilize up to 100 threads for parallel processing: + +```sql +SETTINGS (max_threads = 100) COPY INTO ... 
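+
+-- The statement above is abbreviated. For illustration only (the table and stage names
+-- here are hypothetical), a complete statement follows the same pattern:
+-- SETTINGS (max_threads = 100) COPY INTO my_table FROM @my_stage FILE_FORMAT = (TYPE = PARQUET);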
\ No newline at end of file diff --git a/tidb-cloud-lake/sql/sha-functions.md b/tidb-cloud-lake/sql/sha-functions.md new file mode 100644 index 0000000000000..f9cd704d8bca6 --- /dev/null +++ b/tidb-cloud-lake/sql/sha-functions.md @@ -0,0 +1,23 @@ +--- +title: SHA2 +--- + +Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). If the hash length is not one of the permitted values, the return value is NULL. Otherwise, the function result is a hash value containing the desired number of bits as a string of hexadecimal digits. + +## Syntax + +```sql +SHA2(, ) +``` + +## Examples + +```sql +SELECT SHA2('1234567890', 0); + +┌──────────────────────────────────────────────────────────────────┐ +│ sha2('1234567890', 0) │ +├──────────────────────────────────────────────────────────────────┤ +│ c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646 │ +└──────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sha-sql.md b/tidb-cloud-lake/sql/sha-sql.md new file mode 100644 index 0000000000000..9bb7de8f3e3a6 --- /dev/null +++ b/tidb-cloud-lake/sql/sha-sql.md @@ -0,0 +1,5 @@ +--- +title: SHA1 +--- + +Alias for [SHA](/tidb-cloud-lake/sql/sha.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sha.md b/tidb-cloud-lake/sql/sha.md new file mode 100644 index 0000000000000..e0d708577acd6 --- /dev/null +++ b/tidb-cloud-lake/sql/sha.md @@ -0,0 +1,27 @@ +--- +title: SHA +--- + +Calculates an SHA-1 160-bit checksum for the string, as described in RFC 3174 (Secure Hash Algorithm). The value is returned as a string of 40 hexadecimal digits or NULL if the argument was NULL. + +## Syntax + +```sql +SHA() +``` + +## Aliases + +- [SHA1](/tidb-cloud-lake/sql/sha.md) + +## Examples + +```sql +SELECT SHA('1234567890'), SHA1('1234567890'); + +┌─────────────────────────────────────────────────────────────────────────────────────┐ +│ sha('1234567890') │ sha1('1234567890') │ +├──────────────────────────────────────────┼──────────────────────────────────────────┤ +│ 01b307acba4f54f55aafc33bb06bbbf6ca803e9a │ 01b307acba4f54f55aafc33bb06bbbf6ca803e9a │ +└─────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-columns.md b/tidb-cloud-lake/sql/show-columns.md new file mode 100644 index 0000000000000..d11ae0801275b --- /dev/null +++ b/tidb-cloud-lake/sql/show-columns.md @@ -0,0 +1,54 @@ +--- +title: SHOW COLUMNS +sidebar_position: 13 +--- + +Shows information about the columns in a given table. + +:::tip +[DESCRIBE TABLE](/tidb-cloud-lake/sql/describe-table.md) provides similar but less information about the columns of a table. +::: + +## Syntax + +```sql +SHOW [ FULL ] COLUMNS + {FROM | IN} tbl_name + [ {FROM | IN} db_name ] + [ LIKE '' | WHERE ] +``` + +When the optional keyword FULL is included, Databend will add the collation, privileges, and comment information for each column in the table to the result. 
+ +## Examples + +```sql +CREATE TABLE books + ( + price FLOAT Default 0.00, + pub_time DATETIME Default '1900-01-01', + author VARCHAR + ); + +SHOW COLUMNS FROM books FROM default; + +Field |Type |Null|Default |Extra|Key| +--------+---------+----+------------+-----+---+ +author |VARCHAR |NO | | | | +price |FLOAT |NO |0.00 | | | +pub_time|TIMESTAMP|NO |'1900-01-01'| | | + +SHOW FULL COLUMNS FROM books; + +Field |Type |Null|Default |Extra|Key|Collation|Privileges|Comment| +--------+---------+----+------------+-----+---+---------+----------+-------+ +author |VARCHAR |NO | | | | | | | +price |FLOAT |NO |0.00 | | | | | | +pub_time|TIMESTAMP|NO |'1900-01-01'| | | | | | + +SHOW FULL COLUMNS FROM books LIKE 'a%' + +Field |Type |Null|Default|Extra|Key|Collation|Privileges|Comment| +------+-------+----+-------+-----+---+---------+----------+-------+ +author|VARCHAR|NO | | | | | | | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-connections.md b/tidb-cloud-lake/sql/show-connections.md new file mode 100644 index 0000000000000..a61110479ec05 --- /dev/null +++ b/tidb-cloud-lake/sql/show-connections.md @@ -0,0 +1,27 @@ +--- +title: SHOW CONNECTIONS +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays a list of all available connections. + +## Syntax + +```sql +SHOW CONNECTIONS +``` + +## Examples + +```sql +SHOW CONNECTIONS; + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ storage_type │ storage_params │ +├─────────┼──────────────┼───────────────────────────────────────────────────────────────────────────────────┤ +│ toronto │ s3 │ access_key_id= secret_access_key= │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-create-database.md b/tidb-cloud-lake/sql/show-create-database.md new file mode 100644 index 0000000000000..fceed18f614ec --- /dev/null +++ b/tidb-cloud-lake/sql/show-create-database.md @@ -0,0 +1,23 @@ +--- +title: SHOW CREATE DATABASE +sidebar_position: 2 +--- + +Shows the CREATE DATABASE statement that creates the named database. + +## Syntax + +```sql +SHOW CREATE DATABASE database_name +``` + +## Examples + +```sql +SHOW CREATE DATABASE default; ++----------+---------------------------+ +| Database | Create Database | ++----------+---------------------------+ +| default | CREATE DATABASE `default` | ++----------+---------------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-create-table.md b/tidb-cloud-lake/sql/show-create-table.md new file mode 100644 index 0000000000000..367e413c2959f --- /dev/null +++ b/tidb-cloud-lake/sql/show-create-table.md @@ -0,0 +1,38 @@ +--- +title: SHOW CREATE TABLE +sidebar_position: 10 +--- + +Shows the CREATE TABLE statement for the specified table. To include the Fuse Engine options in the result, set `hide_options_in_show_create_table` to `0`. + +## Syntax + +```sql +SHOW CREATE TABLE [ . 
] +``` + +## Examples + +This example shows how to display the full CREATE TABLE statement, including the Fuse Engine options, by setting `hide_options_in_show_create_table` to `0`: + +```sql +CREATE TABLE fuse_table (a int); + +SHOW CREATE TABLE fuse_table; + +-[ RECORD 1 ]----------------------------------- + Table: fuse_table +Create Table: CREATE TABLE fuse_table ( + a INT NULL +) ENGINE=FUSE + +SET hide_options_in_show_create_table=0; + +SHOW CREATE TABLE fuse_table; + +-[ RECORD 1 ]----------------------------------- + Table: fuse_table +Create Table: CREATE TABLE fuse_table ( + a INT NULL +) ENGINE=FUSE COMPRESSION='lz4' DATA_RETENTION_PERIOD_IN_HOURS='240' STORAGE_FORMAT='native' +``` diff --git a/tidb-cloud-lake/sql/show-databases.md b/tidb-cloud-lake/sql/show-databases.md new file mode 100644 index 0000000000000..2c022ff1055ce --- /dev/null +++ b/tidb-cloud-lake/sql/show-databases.md @@ -0,0 +1,54 @@ +--- +title: SHOW DATABASES +sidebar_position: 5 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Shows the list of databases that exist on the instance. + +See also: [system.databases](/tidb-cloud-lake/sql/system-databases.md) + +## Syntax + +```sql +SHOW [ FULL ] DATABASES + [ LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| FULL | Lists the results with additional information. See [Examples](#examples) for more details. | +| LIKE | Filters the results by their names using case-sensitive pattern matching. | +| WHERE | Filters the results using an expression in the WHERE clause. | + +## Examples + +```sql +SHOW DATABASES; + +┌──────────────────────┐ +│ databases_in_default │ +├──────────────────────┤ +│ canada │ +│ china │ +│ default │ +│ information_schema │ +│ system │ +│ test │ +└──────────────────────┘ + +SHOW FULL DATABASES; + +┌───────────────────────────────────────────────────┐ +│ catalog │ owner │ databases_in_default │ +├─────────┼──────────────────┼──────────────────────┤ +│ default │ account_admin │ canada │ +│ default │ account_admin │ china │ +│ default │ NULL │ default │ +│ default │ NULL │ information_schema │ +│ default │ NULL │ system │ +│ default │ account_admin │ test │ +└───────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-drop-databases.md b/tidb-cloud-lake/sql/show-drop-databases.md new file mode 100644 index 0000000000000..e279ee7f86e6a --- /dev/null +++ b/tidb-cloud-lake/sql/show-drop-databases.md @@ -0,0 +1,44 @@ +--- +title: SHOW DROP DATABASES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists all databases along with their deletion timestamps if they have been dropped, allowing users to review deleted databases and their details. + +- Dropped databases can only be retrieved if they are within the data retention period. +- It is recommended to use an admin user, such as `root`. If you are using Databend Cloud, use a user with the `account_admin` role to query dropped databases. 
+ +See also: [system.databases_with_history](/tidb-cloud-lake/sql/system-databases-with-history.md) + +## Syntax + +```sql +SHOW DROP DATABASES + [ FROM ] + [ LIKE '' | WHERE ] +``` + +## Examples + +```sql +-- Create a new database named my_db +CREATE DATABASE my_db; + +-- Drop the database my_db +DROP DATABASE my_db; + +-- If a database has been dropped, dropped_on shows the deletion time; +-- If it is still active, dropped_on is NULL. +SHOW DROP DATABASES; + +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ catalog │ name │ database_id │ dropped_on │ +├─────────┼────────────────────┼─────────────────────┼────────────────────────────┤ +│ default │ default │ 1 │ NULL │ +│ default │ information_schema │ 4611686018427387906 │ NULL │ +│ default │ my_db │ 114 │ 2024-11-15 02:44:46.207120 │ +│ default │ system │ 4611686018427387905 │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-drop-tables.md b/tidb-cloud-lake/sql/show-drop-tables.md new file mode 100644 index 0000000000000..e2a49454316c4 --- /dev/null +++ b/tidb-cloud-lake/sql/show-drop-tables.md @@ -0,0 +1,43 @@ +--- +title: SHOW DROP TABLES +sidebar_position: 11 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists the dropped tables in the current or a specified database. + +See also: [system.tables_with_history](/tidb-cloud-lake/sql/system-tables-with-history.md) + +## Syntax + +```sql +SHOW DROP TABLES [ FROM ] [ LIKE '' | WHERE ] +``` + +## Examples + +```sql +USE database1; + +-- List dropped tables in the current database +SHOW DROP TABLES; + +-- List dropped tables in the "default" database +SHOW DROP TABLES FROM default; + +Name |Value | +--------------------+-----------------------------+ +tables |t1 | +table_type |BASE TABLE | +database |default | +catalog |default | +engine |FUSE | +create_time |2023-06-13 08:43:36.556 +0000| +drop_time |2023-07-19 04:39:18.536 +0000| +num_rows |2 | +data_size |34 | +data_compressed_size|330 | +index_size |464 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-fields.md b/tidb-cloud-lake/sql/show-fields.md new file mode 100644 index 0000000000000..e21f652d84e0e --- /dev/null +++ b/tidb-cloud-lake/sql/show-fields.md @@ -0,0 +1,35 @@ +--- +title: SHOW FIELDS +sidebar_position: 12 +--- + +Shows information about the columns in a given table. Equivalent to [DESCRIBE TABLE](/tidb-cloud-lake/sql/describe-table.md). + +:::tip +[SHOW COLUMNS](show-full-columns.md) provides similar but more information about the columns of a table. +::: + +## Syntax + +```sql +SHOW FIELDS FROM [ . ] +``` + +## Examples + +```sql +CREATE TABLE books + ( + price FLOAT Default 0.00, + pub_time DATETIME Default '1900-01-01', + author VARCHAR + ); + +SHOW FIELDS FROM books; + +Field |Type |Null|Default |Extra| +--------+---------+----+----------------------------+-----+ +price |FLOAT |YES |0 | | +pub_time|TIMESTAMP|YES |'1900-01-01 00:00:00.000000'| | +author |VARCHAR |YES |NULL | | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-file-formats.md b/tidb-cloud-lake/sql/show-file-formats.md new file mode 100644 index 0000000000000..9c8d58b532991 --- /dev/null +++ b/tidb-cloud-lake/sql/show-file-formats.md @@ -0,0 +1,24 @@ +--- +title: SHOW FILE FORMATS +sidebar_position: 2 +--- + +Returns a list of created file formats. 
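+
+File formats appear in this list only after they have been created. As a rough sketch (assuming the `CREATE FILE FORMAT` command and the `my_custom_csv` name used in the example below), a named format might be created like this:
+
+```sql
+CREATE FILE FORMAT my_custom_csv TYPE = CSV FIELD_DELIMITER = '\t';
+```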
+ +## Syntax + +```sql +SHOW FILE FORMATS; +``` + +## Examples + +```sql +SHOW FILE FORMATS; + ++---------------+------------------------------------------------------------------------------------------------------------------------+ +| name | format_options | ++---------------+------------------------------------------------------------------------------------------------------------------------+ +| my_custom_csv | TYPE = CSV FIELD_DELIMITER = '\t' RECORD_DELIMITER = '\n' QUOTE = '\"' ESCAPE = '' SKIP_HEADER = 0 NAN_DISPLAY = 'NaN' | ++---------------+------------------------------------------------------------------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/show-functions.md b/tidb-cloud-lake/sql/show-functions.md new file mode 100644 index 0000000000000..80ff1f9e98680 --- /dev/null +++ b/tidb-cloud-lake/sql/show-functions.md @@ -0,0 +1,68 @@ +--- +title: SHOW FUNCTIONS +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists the currently supported built-in scalar and aggregate functions. + +See also: [system.functions](/tidb-cloud-lake/sql/system-functions.md) + +## Syntax + +```sql +SHOW FUNCTIONS [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Example + +```sql +SHOW FUNCTIONS; + ++-------------------------+--------------+---------------------------+ +| name | is_aggregate | description | ++-------------------------+--------------+---------------------------+ +| != | 0 | | +| % | 0 | | +| * | 0 | | +| + | 0 | | +| - | 0 | | +| / | 0 | | +| < | 0 | | +| <= | 0 | | +| <> | 0 | | +| = | 0 | | ++-------------------------+--------------+---------------------------+ +``` + +Showing the functions begin with `"today"`: + +```sql +SHOW FUNCTIONS LIKE 'today%'; + ++--------------+--------------+-------------+ +| name | is_aggregate | description | ++--------------+--------------+-------------+ +| today | 0 | | +| todayofmonth | 0 | | +| todayofweek | 0 | | +| todayofyear | 0 | | ++--------------+--------------+-------------+ +``` + +Showing the functions begin with `"today"` with `WHERE`: + +```sql +SHOW FUNCTIONS WHERE name LIKE 'today%'; + ++--------------+--------------+-------------+ +| name | is_aggregate | description | ++--------------+--------------+-------------+ +| today | 0 | | +| todayofmonth | 0 | | +| todayofweek | 0 | | +| todayofyear | 0 | | ++--------------+--------------+-------------+ +``` diff --git a/tidb-cloud-lake/sql/show-grants-sql.md b/tidb-cloud-lake/sql/show-grants-sql.md new file mode 100644 index 0000000000000..c7cb786584b6e --- /dev/null +++ b/tidb-cloud-lake/sql/show-grants-sql.md @@ -0,0 +1,138 @@ +--- +title: SHOW_GRANTS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists privileges granted to roles, role assignments for users, or privileges on a specific object. + +See also: [SHOW GRANTS](/tidb-cloud-lake/sql/show-grants.md) + +## Syntax + +```sql +SHOW_GRANTS('role', '') +SHOW_GRANTS('user', '') +SHOW_GRANTS('stage', '') +SHOW_GRANTS('udf', '') +SHOW_GRANTS('table', '', '', '') +SHOW_GRANTS('database', '', '') +``` + +## Configuring `enable_expand_roles` Setting + +The `enable_expand_roles` setting controls whether the SHOW_GRANTS function expands role inheritance when displaying privileges. + +- `enable_expand_roles=1` (default): + + - SHOW_GRANTS recursively expands inherited privileges, meaning that if a role has been granted another role, it will display all the inherited privileges. 
+ - Users will also see all privileges granted through their assigned roles. + +- `enable_expand_roles=0`: + + - SHOW_GRANTS only displays privileges that are directly assigned to the specified role or user. + - However, the result will still include GRANT ROLE statements to indicate role inheritance. + +For example, role `a` has the `SELECT` privilege on `t1`, and role `b` has the `SELECT` privilege on `t2`: + +```sql +SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; + +┌──────────────────────────────────────────────────────┐ +│ grants │ +├──────────────────────────────────────────────────────┤ +│ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ +└──────────────────────────────────────────────────────┘ + +SELECT grants FROM show_grants('role', 'b') ORDER BY object_id; + +┌──────────────────────────────────────────────────────┐ +│ grants │ +├──────────────────────────────────────────────────────┤ +│ GRANT SELECT ON 'default'.'default'.'t2' TO ROLE `b` │ +└──────────────────────────────────────────────────────┘ +``` + +If you grant role `b` to role `a` and check the grants on role `a` again, you can see than the `SELECT` privilege on `t2` is now included in role `a`: + +```sql +GRANT ROLE b TO ROLE a; +``` + +```sql +SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; + +┌──────────────────────────────────────────────────────┐ +│ grants │ +├──────────────────────────────────────────────────────┤ +│ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ +│ GRANT SELECT ON 'default'.'default'.'t2' TO ROLE `a` │ +└──────────────────────────────────────────────────────┘ +``` + +If you set `enable_expand_roles` to `0` and check the grants on role `a` again, the result will show the `GRANT ROLE` statement instead of listing the specific privileges inherited from role `b`: + +```sql +SET enable_expand_roles=0; +``` + +```sql +SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; + +┌──────────────────────────────────────────────────────┐ +│ grants │ +├──────────────────────────────────────────────────────┤ +│ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ +│ GRANT ROLE b to ROLE `a` │ +│ GRANT ROLE public to ROLE `a` │ +└──────────────────────────────────────────────────────┘ +``` + +## Examples + +This example illustrates how to list grants for a user, privileges granted to a role, and privileges on a specific object. 
+ +```sql +-- Create a new user +CREATE USER 'user1' IDENTIFIED BY 'password'; + +-- Create a new role +CREATE ROLE analyst; + +-- Grant the analyst role to the user +GRANT ROLE analyst TO 'user1'; + +-- Create a stage +CREATE STAGE my_stage; + +-- Grant privileges on the stage to the role +GRANT READ ON STAGE my_stage TO ROLE analyst; + +-- List grants for the user +SELECT * FROM SHOW_GRANTS('user', 'user1'); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants │ +├────────────┼─────────────┼──────────────────┼──────────┼────────┼─────────────────────────────────────────────┤ +│ Read │ my_stage │ NULL │ USER │ user1 │ GRANT Read ON STAGE my_stage TO 'user1'@'%' │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- List privileges granted to the role +SELECT * FROM SHOW_GRANTS('role', 'analyst'); + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants │ +├────────────┼─────────────┼──────────────────┼──────────┼─────────┼────────────────────────────────────────────────┤ +│ Read │ my_stage │ NULL │ ROLE │ analyst │ GRANT Read ON STAGE my_stage TO ROLE `analyst` │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- List privileges granted on the stage +SELECT * FROM SHOW_GRANTS('stage', 'my_stage'); + +┌─────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants │ +├────────────┼─────────────┼──────────────────┼──────────┼─────────┼──────────────────┤ +│ Read │ my_stage │ NULL │ ROLE │ analyst │ │ +└─────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/show-grants.md b/tidb-cloud-lake/sql/show-grants.md new file mode 100644 index 0000000000000..d2b3c6f092559 --- /dev/null +++ b/tidb-cloud-lake/sql/show-grants.md @@ -0,0 +1,97 @@ +--- +title: SHOW GRANTS +sidebar_position: 10 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists privileges granted to roles, role assignments for users, or privileges on a specific object. + +See also: + +- [SHOW_GRANTS](/tidb-cloud-lake/sql/show-grants.md) +- [GRANT](/tidb-cloud-lake/sql/grant.md) +- [REVOKE](/tidb-cloud-lake/sql/revoke.md) + +## Syntax + +```sql +-- List grants for a user +SHOW GRANTS FOR [ LIKE '' | WHERE | LIMIT ] + +-- List privileges granted to a role +SHOW GRANTS FOR ROLE [ LIKE '' | WHERE | LIMIT ] + +-- List privileges granted on an object +SHOW GRANTS ON { STAGE | TABLE | DATABASE | UDF | MASKING POLICY | ROW ACCESS POLICY } [ LIKE '' | WHERE | LIMIT ] + +-- Lists all users and roles that have been directly granted role_name. +SHOW GRANTS OF ROLE + +``` + +## Examples + +This example illustrates how to list grants for a user, privileges granted to a role, and privileges on a specific object. 
+ +```sql +-- Create a new user +CREATE USER 'user1' IDENTIFIED BY 'password'; + +-- Create a new role +CREATE ROLE analyst; + +-- Grant the analyst role to the user +GRANT ROLE analyst TO 'user1'; + +-- Create a database +CREATE DATABASE my_db; + +-- Grant privileges on the database to the role +GRANT OWNERSHIP ON my_db.* TO ROLE analyst; + +-- List privileges granted to the user +SHOW GRANTS FOR user1; + + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants │ +├────────────┼─────────────┼──────────────────┼──────────┼────────┼──────────────────────────────────────────────────────┤ +│ ROLE │ NULL │ NULL │ USER │ user1 │ GRANT ROLE analyst TO 'user1'@'%' │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- List privileges granted to the role +SHOW GRANTS FOR ROLE analyst; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants │ +├────────────┼─────────────┼──────────────────┼──────────┼─────────┼──────────────────────────────────────────────────────────┤ +│ OWNERSHIP │ my_db │ 16 │ ROLE │ analyst │ GRANT OWNERSHIP ON 'default'.'my_db'.* TO ROLE `analyst` │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +-- List privileges granted on the database +SHOW GRANTS ON DATABASE my_db; + +┌─────────────────────────────────────────────────────────────────────────────────────┐ +│ privileges │ object_name │ object_id │ grant_to │ name │ grants ├────────────┼─────────────┼──────────────────┼──────────┼─────────┼──────────────────┤ +│ OWNERSHIP │ my_db │ 16 │ ROLE │ analyst │ │ +└─────────────────────────────────────────────────────────────────────────────────────┘ + +-- Lists all users and roles that have been directly granted role_name. +-- This command displays only the direct grantees of role_name. +-- This means it lists users and roles that have explicitly received the role through a GRANT ROLE role_name TO statement. +-- It does not show users or roles that acquire role_name indirectly via role hierarchies or inheritance. +SHOW GRANTS OF ROLE analyst + +╭─────────────────────────────────────╮ +│ role │ granted_to │ grantee_name │ +│ String │ String │ String │ +├─────────┼────────────┼──────────────┤ +│ analyst │ USER │ user1 │ +╰─────────────────────────────────────╯ + +SHOW GRANTS ON MASKING POLICY email_mask; + +-- Inspect row access policy privileges +SHOW GRANTS ON ROW ACCESS POLICY rap_region; +``` diff --git a/tidb-cloud-lake/sql/show-indexes.md b/tidb-cloud-lake/sql/show-indexes.md new file mode 100644 index 0000000000000..3eddabdc73ad1 --- /dev/null +++ b/tidb-cloud-lake/sql/show-indexes.md @@ -0,0 +1,34 @@ +--- +title: SHOW INDEXES +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Shows the created indexes. Equivalent to `SELECT * FROM system.indexes`. 
+ +See also: [system.indexes](/tidb-cloud-lake/sql/system-indexes.md) + +## Syntax + +```sql +SHOW INDEXES [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Example + +```sql +CREATE TABLE t1(a int,b int); + +CREATE AGGREGATING INDEX agg_idx AS SELECT avg(a), abs(sum(b)), abs(b) AS bs FROM t1 GROUP BY bs; + +SHOW INDEXES; + + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ type │ original │ definition │ created_on │ updated_on │ +├─────────┼─────────────┼──────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────┼─────────────────────┤ +│ agg_idx │ AGGREGATING │ SELECT avg(a), abs(sum(b)), abs(b) AS bs FROM default.t1 GROUP BY bs │ SELECT abs(b) AS bs, COUNT(), COUNT(a), SUM(a), SUM(b) FROM default.t1 GROUP BY bs │ 2024-01-29 07:15:34.856234 │ NULL │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-locks.md b/tidb-cloud-lake/sql/show-locks.md new file mode 100644 index 0000000000000..fd97fc4701d76 --- /dev/null +++ b/tidb-cloud-lake/sql/show-locks.md @@ -0,0 +1,64 @@ +--- +title: SHOW LOCKS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Provides a list of active transactions currently holding locks on tables, either for the current user across all their sessions or for all users within the Databend system. A lock is a synchronization mechanism that restricts access to shared resources, such as tables, ensuring orderly and controlled interactions among processes or threads within the Databend system to maintain data consistency and prevent conflicts. + +The operations, such as [UPDATE](/tidb-cloud-lake/sql/update.md), [DELETE](/tidb-cloud-lake/sql/delete.md), [OPTIMIZE TABLE](/tidb-cloud-lake/sql/optimize-table.md), [RECLUSTER TABLE](/tidb-cloud-lake/sql/recluster-table.md), and [ALTER TABLE](/tidb-cloud-lake/sql/alter-table.md#column-operations), can result in table locks in the system. The table lock feature is enabled by default. In case of resource conflicts, you can examine specific details using the command. To disable this feature, execute `set enable_table_lock=0;`. + +## Syntax + +```sql +SHOW LOCKS [IN ACCOUNT] [WHERE ] +``` + +| Parameter | Description | +|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------| +| IN ACCOUNT | Displays lock information for all users within the Databend system. If omitted, the command returns locks for the current user across all sessions. | +| WHERE | Filters locks based on the status; valid values include `HOLDING` and `WAITING`. | + +## Output + +The command returns the lock information in a table with these columns: + +| Column | Description | +|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| table_id | Internal ID for the table associated with the lock. 
| +| revision | Revision number indicating the version of the transaction that initiated the lock. Commencing at 0, this number increases with each subsequent transaction, establishing a comprehensive order across all transactions. | +| type | The type of lock, such as `TABLE`. | +| status | The status of the lock, such as `HOLDING` or `WAITING`. | +| user | The user associated with the lock. | +| node | The identifier of query node where the lock is held. | +| query_id | The query session ID related to the lock. Use it to [KILL](/tidb-cloud-lake/sql/kill.md) a query in case of dead locks or excessively prolonged lock holdings. | +| created_on | Timestamp when the transaction that initiated the lock was created. | +| acquired_on | Timestamp when the lock was acquired. | +| extra_info | Additional information related to the lock, if any. | + +## Examples + +```sql +SHOW LOCKS IN ACCOUNT; ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| table_id | revision | type | status | user | node | query_id | created_on | acquired_on | extra_info | ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| 57 | 4517 | TABLE | HOLDING | root | xzi6pRbLUYasuA9QFB36m6 | d7989971-d5ec-4764-8e37-afe38ebc13e2 | 2023-12-13 09:56:47.295684 | 2023-12-13 09:56:47.310805 | | ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ + +SHOW LOCKS; ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| table_id | revision | type | status | user | node | query_id | created_on | acquired_on | extra_info | ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| 57 | 4517 | TABLE | HOLDING | root | xzi6pRbLUYasuA9QFB36m6 | d7989971-d5ec-4764-8e37-afe38ebc13e2 | 2023-12-13 09:56:47.295684 | 2023-12-13 09:56:47.310805 | | +| 57 | 4521 | TABLE | WAITING | zzq | xzi6pRbLUYasuA9QFB36m6 | 4bc78044-d4fc-4fe1-a5c5-ff6ab1e3e372 | 2023-12-13 09:56:48.419774 | NULL | | ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ + +SHOW LOCKS WHERE STATUS = 'HOLDING'; ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| table_id | revision | type | status | user | node | query_id | created_on | acquired_on | extra_info | ++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +| 57 | 4517 | TABLE | HOLDING | root | xzi6pRbLUYasuA9QFB36m6 | d7989971-d5ec-4764-8e37-afe38ebc13e2 | 2023-12-13 09:56:47.295684 | 2023-12-13 09:56:47.310805 | | 
++----------+----------+-------+---------+------+------------------------+--------------------------------------+----------------------------+----------------------------+------------+ +``` diff --git a/tidb-cloud-lake/sql/show-metrics.md b/tidb-cloud-lake/sql/show-metrics.md new file mode 100644 index 0000000000000..88d37bb8abf1f --- /dev/null +++ b/tidb-cloud-lake/sql/show-metrics.md @@ -0,0 +1,31 @@ +--- +title: SHOW METRICS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Shows the list of [system metrics](/tidb-cloud-lake/sql/system-metrics.md). + +## Syntax + +```sql +SHOW METRICS [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Examples + +```sql +SHOW METRICS; ++-----------------------------------+---------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| metric | kind | labels | value | ++-----------------------------------+---------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| session_connect_numbers | counter | {} | 1.0 | +| optimizer_optimize_usedtime_sum | untyped | {} | 0.000438079 | +| optimizer_optimize_usedtime_count | untyped | {} | 1.0 | +| parser_parse_usedtime_sum | untyped | {} | 0.000254307 | +| parser_parse_usedtime_count | untyped | {} | 2.0 | +| optimizer_optimize_usedtime | summary | {} | [{"quantile":0.0,"count":0.000438079},{"quantile":0.5,"count":0.000438079},{"quantile":0.9,"count":0.000438079},{"quantile":0.95,"count":0.000438079},{"quantile":0.99,"count":0.000438079},{"quantile":0.999,"count":0.000438079},{"quantile":1.0,"count":0.000438079}] | +| parser_parse_usedtime | summary | {} | [{"quantile":0.0,"count":0.000107972},{"quantile":0.5,"count":0.000107972},{"quantile":0.9,"count":0.000107972},{"quantile":0.95,"count":0.000107972},{"quantile":0.99,"count":0.000107972},{"quantile":0.999,"count":0.000107972},{"quantile":1.0,"count":0.000107972}] | ++-----------------------------------+---------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` diff --git a/tidb-cloud-lake/sql/show-network-policies.md b/tidb-cloud-lake/sql/show-network-policies.md new file mode 100644 index 0000000000000..89f996c20ad2c --- /dev/null +++ b/tidb-cloud-lake/sql/show-network-policies.md @@ -0,0 +1,27 @@ +--- +title: SHOW NETWORK POLICIES +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays a list of all existing network policies in Databend. It provides information about the available network policies, including their names and whether they have any allowed or blocked IP address lists configured. 
+ +## Syntax + +```sql +SHOW NETWORK POLICIES +``` + +## Examples + +```sql +SHOW NETWORK POLICIES; + +Name |Allowed Ip List |Blocked Ip List|Comment | +------------+----------------+---------------+------------+ +test_policy |192.168.1.0/24 |192.168.1.99 |test comment| +test_policy1|192.168.100.0/24| | | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-password-policies.md b/tidb-cloud-lake/sql/show-password-policies.md new file mode 100644 index 0000000000000..c89215d472eae --- /dev/null +++ b/tidb-cloud-lake/sql/show-password-policies.md @@ -0,0 +1,30 @@ +--- +title: SHOW PASSWORD POLICIES +sidebar_position: 4 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays a list of all existing password policies in Databend. + +## Syntax + +```sql +SHOW PASSWORD POLICIES [ LIKE '' ] +``` + +## Examples + +```sql +CREATE PASSWORD POLICY SecureLogin + PASSWORD_MIN_LENGTH = 10; + +SHOW PASSWORD POLICIES; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ comment │ options │ +├─────────────┼─────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ SecureLogin │ │ MIN_LENGTH=10, MAX_LENGTH=256, MIN_UPPER_CASE_CHARS=1, MIN_LOWER_CASE_CHARS=1, MIN_NUMERIC_CHARS=1, MIN_SPECIAL_CHARS=0, MIN_AGE_DAYS=0, MAX_AGE_DAYS=90, MAX_RETRIES=5, LOCKOUT_TIME_MINS=15, HISTORY=0 │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-procedures.md b/tidb-cloud-lake/sql/show-procedures.md new file mode 100644 index 0000000000000..2083aeb22c98d --- /dev/null +++ b/tidb-cloud-lake/sql/show-procedures.md @@ -0,0 +1,26 @@ +--- +title: SHOW PROCEDURES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a list of all stored procedures in the system. 
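+
+A procedure appears in this list once it has been defined with `CREATE PROCEDURE`. The sketch below shows roughly how the `convert_kg_to_lb` procedure from [Examples](#examples) might be created; treat it as an approximation and refer to the CREATE PROCEDURE reference for the exact scripting syntax:
+
+```sql
+-- Illustrative only: a SQL-scripted procedure converting kilograms to pounds
+CREATE PROCEDURE convert_kg_to_lb(kg Decimal(4, 2))
+RETURNS Decimal(10, 2)
+LANGUAGE SQL
+COMMENT = 'Converts kilograms to pounds'
+AS $$
+BEGIN
+    RETURN kg * 2.20462;
+END;
+$$;
+```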
+ +## Syntax + +```sql +SHOW PROCEDURES +``` + +## Examples + +```sql +SHOW PROCEDURES; + +┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ procedure_id │ arguments │ comment │ description │ created_on │ +├──────────────────┼──────────────┼─────────────────────────────────────────────────────────┼──────────────────────────────┼────────────────────────┼────────────────────────────┤ +│ convert_kg_to_lb │ 2104 │ convert_kg_to_lb(Decimal(4, 2)) RETURN (Decimal(10, 2)) │ Converts kilograms to pounds │ user-defined procedure │ 2024-11-07 04:12:25.243143 │ +└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/show-processlist.md b/tidb-cloud-lake/sql/show-processlist.md new file mode 100644 index 0000000000000..23a4cfc7d347f --- /dev/null +++ b/tidb-cloud-lake/sql/show-processlist.md @@ -0,0 +1,28 @@ +--- +title: SHOW PROCESSLIST +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +The Databend process list indicates the operations currently being performed by the set of threads executed within the server. + +See also: [KILL](/tidb-cloud-lake/sql/kill.md) + +## Syntax + +```sql +SHOW PROCESSLIST [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Examples + +```sql +SHOW PROCESSLIST; ++--------------------------------------+-------+-----------------+------+-------+----------+-------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+---------------------+------+ +| id | type | host | user | state | database | extra_info | memory_usage | dal_metrics_read_bytes | dal_metrics_write_bytes | scan_progress_read_rows | scan_progress_read_bytes | mysql_connection_id | time | ++--------------------------------------+-------+-----------------+------+-------+----------+-------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+---------------------+------+ +| c1152483-de11-4375-bfe3-a35ad2ae9311 | MySQL | 127.0.0.1:57636 | root | Query | default | select sum(number) from numbers(10000000000000) | 0 | 0 | 0 | 816930000 | 6535440000 | 9 | 4 | +| ed21393e-6b6b-4efe-b333-1643f531e8ac | MySQL | 127.0.0.1:57637 | root | Query | system | show processlist | 0 | 0 | 0 | 0 | 0 | 10 | 0 | ++--------------------------------------+-------+-----------------+------+-------+----------+-------------------------------------------------+--------------+------------------------+-------------------------+-------------------------+--------------------------+---------------------+------+ +``` diff --git a/tidb-cloud-lake/sql/show-roles.md b/tidb-cloud-lake/sql/show-roles.md new file mode 100644 index 0000000000000..da8a446cef0f5 --- /dev/null +++ b/tidb-cloud-lake/sql/show-roles.md @@ -0,0 +1,37 @@ +--- +title: SHOW ROLES +sidebar_position: 6 +--- + +Lists all the roles assigned to the current user. + +## Syntax + +```sql +SHOW ROLES +``` + +## Output + +The command returns the results in a table with these columns: + +| Column | Description | +|-----------------|-------------------------------------------------------------| +| name | The role name. 
| +| inherited_roles | Number of roles inherited by the current role. | +| is_current | Indicates whether the role is currently active. | +| is_default | Indicates whether the role is the default role of the user. | + +## Examples + +```sql +SHOW ROLES; + +┌───────────────────────────────────────────────────────┐ +│ name │ inherited_roles │ is_current │ is_default │ +├───────────┼─────────────────┼────────────┼────────────┤ +│ developer │ 0 │ false │ false │ +│ public │ 0 │ false │ false │ +│ writer │ 0 │ true │ true │ +└───────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-sequences.md b/tidb-cloud-lake/sql/show-sequences.md new file mode 100644 index 0000000000000..d9a163bd3235d --- /dev/null +++ b/tidb-cloud-lake/sql/show-sequences.md @@ -0,0 +1,67 @@ +--- +title: SHOW SEQUENCES +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a list of the created sequences. + +## Syntax + +```sql +SHOW SEQUENCES [ LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| LIKE | Filters the results by their names using case-sensitive pattern matching. | +| WHERE | Filters the results using an expression in the WHERE clause. You can filter based on any column in the result set, such as `name`, `start`, `interval`, `current`, `created_on`, `updated_on`, or `comment`. For example: `WHERE start > 0` or `WHERE name LIKE 's%'`. | + +## Examples + +```sql +-- Create a sequence +CREATE SEQUENCE seq; + +-- Show all sequences +SHOW SEQUENCES; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ name │ start │ interval │ current │ created_on │ updated_on │ comment │ +├────────┼────────┼──────────┼─────────┼────────────────────────────┼────────────────────────────┼──────────────────┤ +│ seq │ 1 │ 1 │ 1 │ 2025-05-20 02:48:49.749338 │ 2025-05-20 02:48:49.749338 │ NULL │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ + +-- Use the sequence in an INSERT statement +CREATE TABLE tmp(a int, b uint64, c int); +INSERT INTO tmp select 10,nextval(seq),20 from numbers(3); + +-- Show sequences after usage +SHOW SEQUENCES; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ name │ start │ interval │ current │ created_on │ updated_on │ comment │ +├────────┼────────┼──────────┼─────────┼────────────────────────────┼────────────────────────────┼──────────────────┤ +│ seq │ 1 │ 1 │ 4 │ 2025-05-20 02:48:49.749338 │ 2025-05-20 02:49:14.302917 │ NULL │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ + +-- Filter sequences using WHERE clause +SHOW SEQUENCES WHERE start > 0; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ name │ start │ interval │ current │ created_on │ updated_on │ comment │ +├────────┼────────┼──────────┼─────────┼────────────────────────────┼────────────────────────────┼──────────────────┤ +│ seq │ 1 │ 1 │ 4 │ 2025-05-20 02:48:49.749338 │ 2025-05-20 02:49:14.302917 │ NULL │ 
+╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ + +-- Filter sequences by name pattern +SHOW SEQUENCES LIKE 's%'; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ name │ start │ interval │ current │ created_on │ updated_on │ comment │ +├────────┼────────┼──────────┼─────────┼────────────────────────────┼────────────────────────────┼──────────────────┤ +│ seq │ 1 │ 1 │ 4 │ 2025-05-20 02:48:49.749338 │ 2025-05-20 02:49:14.302917 │ NULL │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-settings.md b/tidb-cloud-lake/sql/show-settings.md new file mode 100644 index 0000000000000..c612265dec2f0 --- /dev/null +++ b/tidb-cloud-lake/sql/show-settings.md @@ -0,0 +1,49 @@ +--- +title: SHOW SETTINGS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Databend provides a variety of system settings that enable you to control how Databend works. This command displays the current and default values, as well as the [Setting Levels](#setting-levels), of available system settings. To update a setting, use the [SET](02-set-global.md) or [UNSET](/tidb-cloud-lake/sql/unset.md) command. + +- Some Databend behaviors cannot be changed through the system settings; you must take them into consideration while working with Databend. For example, + - Databend encodes strings to the UTF-8 charset. + - Databend uses a 1-based numbering convention for arrays. +- Databend stores the system settings in the system table [system.settings](/tidb-cloud-lake/sql/system-settings.md). + +## Syntax + +```sql +SHOW SETTINGS [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Setting Levels + +Each Databend setting comes with a level that can be Global, Default, or Session. This table illustrates the distinctions between each level: + +| Level | Description | +|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Global | Settings with this level are written to the meta service and affect all clusters in the same tenant. Changes at this level have a global impact and apply to the entire database environment shared by multiple clusters. | +| Default | Settings with this level are configured through the `databend-query.toml` configuration file. Changes at this level only affect a single query instance and are specific to the configuration file. This level provides a default setting for individual query instances. | +| Session | Settings with this level are restricted to a single request or session. They have the narrowest scope and apply only to the specific session or request in progress, providing a way to customize settings on a per-session basis. | + +## Examples + +:::note +As Databend updates the system settings every now and then, this example may not show the most recent results. To view the latest system settings in Databend, please execute `SHOW SETTINGS;` within your Databend instance. 
+::: + +```sql +SHOW SETTINGS LIMIT 5; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ value │ default │ range │ level │ description │ type │ +├─────────────────────────────────────────────┼────────┼─────────┼──────────┼─────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┤ +│ acquire_lock_timeout │ 15 │ 15 │ None │ DEFAULT │ Sets the maximum timeout in seconds for acquire a lock. │ UInt64 │ +│ aggregate_spilling_bytes_threshold_per_proc │ 0 │ 0 │ None │ DEFAULT │ Sets the maximum amount of memory in bytes that an aggregator can use before spilling data to storage during query execution. │ UInt64 │ +│ aggregate_spilling_memory_ratio │ 0 │ 0 │ [0, 100] │ DEFAULT │ Sets the maximum memory ratio in bytes that an aggregator can use before spilling data to storage during query execution. │ UInt64 │ +│ auto_compaction_imperfect_blocks_threshold │ 50 │ 50 │ None │ DEFAULT │ Threshold for triggering auto compaction. This occurs when the number of imperfect blocks in a snapshot exceeds this value after write operations. │ UInt64 │ +│ collation │ utf8 │ utf8 │ ["utf8"] │ DEFAULT │ Sets the character collation. Available values include "utf8". │ String │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-stages.md b/tidb-cloud-lake/sql/show-stages.md new file mode 100644 index 0000000000000..4eb1b5627a3ea --- /dev/null +++ b/tidb-cloud-lake/sql/show-stages.md @@ -0,0 +1,23 @@ +--- +title: SHOW STAGES +sidebar_position: 6 +--- + +Returns a list of the created stages. The output list does not include the user stage. + +## Syntax + +```sql +SHOW STAGES; +``` + +## Examples + +```sql +SHOW STAGES; + +--- +name|stage_type|number_of_files|creator |comment| +----+----------+---------------+----------+-------+ +eric|Internal | 0|'root'@'%'| | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-statistics.md b/tidb-cloud-lake/sql/show-statistics.md new file mode 100644 index 0000000000000..47157a3a96f63 --- /dev/null +++ b/tidb-cloud-lake/sql/show-statistics.md @@ -0,0 +1,88 @@ +--- +title: SHOW STATISTICS +sidebar_position: 15 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays statistical information about tables and their columns. Statistics help the query optimizer make better decisions about query execution plans by providing information about data distribution, row counts, and distinct values. + +Databend automatically generates statistics during data insertion. You can use this command to inspect the statistics and compare them with actual data to identify any discrepancies that might affect query performance. + +## Syntax + +```sql +SHOW STATISTICS [ FROM DATABASE | FROM TABLE . ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| FROM DATABASE | Shows statistics for all tables in the specified database. 
| +| FROM TABLE | Shows statistics for the specified table only. | + +If no parameter is specified, the command returns statistics for all tables in the current database. + +## Output Columns + +The command returns the following columns for each column in each table: + +| Column | Description | +|--------|-----------------------------------------------------------------------------------------------------------------------------| +| database | The database name. | +| table | The table name. | +| column_name | The column name. | +| stats_row_count | The accumulated number of rows considered in statistics. Since stats are updated on inserts but not decremented on deletes, this number can be **greater than** actual_row_count. | +| actual_row_count | The actual number of rows in the table under the current snapshot. | +| distinct_count | Estimated number of distinct values (NDV), computed from HyperLogLog. | +| null_count | Number of NULL values in the column. | +| avg_size | Average size in bytes of each value in the column. | + +## Examples + +### Show Statistics for Current Database + +```sql +CREATE DATABASE test_db; +USE test_db; + +CREATE TABLE t1 (id INT, name VARCHAR(50)); +INSERT INTO t1 VALUES (1, 'Alice'), (2, 'Bob'); + +SHOW STATISTICS; +``` + +Output: +``` +database table column_name stats_row_count actual_row_count distinct_count null_count avg_size +test_db t1 id 2 2 2 0 4 +test_db t1 name 2 2 2 0 16 +``` + +### Show Statistics for a Specific Table + +```sql +CREATE TABLE t2 (age INT, city VARCHAR(50)); +INSERT INTO t2 VALUES (25, 'New York'), (30, 'London'); + +SHOW STATISTICS FROM TABLE test_db.t2; +``` + +Output: +``` +database table column_name stats_row_count actual_row_count distinct_count null_count avg_size +test_db t2 age 2 2 2 0 4 +test_db t2 city 2 2 2 0 19 +``` + +### Show Statistics for All Tables in a Database + +```sql +SHOW STATISTICS FROM DATABASE test_db; +``` + +This will show statistics for all tables (`t1` and `t2`) in the `test_db` database. + +## Related Commands + +- [SHOW TABLE STATUS](/tidb-cloud-lake/sql/show-table-status.md): Shows status information about tables diff --git a/tidb-cloud-lake/sql/show-streams.md b/tidb-cloud-lake/sql/show-streams.md new file mode 100644 index 0000000000000..a548dff415137 --- /dev/null +++ b/tidb-cloud-lake/sql/show-streams.md @@ -0,0 +1,58 @@ +--- +title: SHOW STREAMS +sidebar_position: 2 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Lists the streams associated with a specific database. + +## Syntax + +```sql +SHOW [ FULL ] STREAMS + [ { FROM | IN } ] + [ LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|----------------------------------------------------------------------------------------------| +| FULL | Lists the results with additional information. See [Examples](#examples) for more details. | +| FROM / IN | Specifies a database. If omitted, the command returns the results from the current database. | +| LIKE | Filters the stream names using case-sensitive pattern matching with the `%` wildcard. | +| WHERE | Filters the stream names using an expression in the WHERE clause. 
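+
+The database qualifier and the name filters can be combined. For example, the following statement (the database and stream names are assumed for illustration and match the sample data in [Examples](#examples)) lists only the streams in `default` whose names start with `order`:
+
+```sql
+SHOW STREAMS FROM default LIKE 'order%';
+```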
| + +## Examples + +This example shows streams belonging to the current database: + +```sql +SHOW STREAMS; + +┌──────────────────────────────────────────────────────────┐ +│ Streams_in_default │ table_on │ mode │ +├────────────────────┼───────────────────────┼─────────────┤ +│ order_changes │ default.orders │ append_only │ +│ s_append_only │ default.t_append_only │ append_only │ +│ s_standard │ default.t_standard │ standard │ +└──────────────────────────────────────────────────────────┘ +``` + +This example shows detailed information about streams in the current database: + +```sql +SHOW FULL STREAMS; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ created_on │ name │ database │ catalog │ table_on │ owner │ comment │ mode │ invalid_reason │ +├────────────────────────────┼───────────────┼──────────┼─────────┼───────────────────────┼──────────────────┼─────────┼─────────────┼────────────────┤ +│ 2024-05-12 14:28:33.886271 │ order_changes │ default │ default │ default.orders │ NULL │ │ append_only │ │ +│ 2024-05-12 14:35:05.992050 │ s_append_only │ default │ default │ default.t_append_only │ NULL │ │ append_only │ │ +│ 2024-05-12 14:35:05.981121 │ s_standard │ default │ default │ default.t_standard │ NULL │ │ standard │ │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-table-functions.md b/tidb-cloud-lake/sql/show-table-functions.md new file mode 100644 index 0000000000000..8aed04e922057 --- /dev/null +++ b/tidb-cloud-lake/sql/show-table-functions.md @@ -0,0 +1,59 @@ +--- +title: SHOW TABLE FUNCTIONS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Shows the list of supported table functions currently. + +## Syntax + +```sql +SHOW TABLE_FUNCTIONS [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Example + +```sql +SHOW TABLE_FUNCTIONS; ++------------------------+ +| name | ++------------------------+ +| numbers | +| numbers_mt | +| numbers_local | +| fuse_snapshot | +| fuse_segment | +| fuse_block | +| fuse_statistic | +| clustering_information | +| sync_crash_me | +| async_crash_me | +| infer_schema | ++------------------------+ +``` + +Showing the table functions begin with `"number"`: +```sql +SHOW TABLE_FUNCTIONS LIKE 'number%'; ++---------------+ +| name | ++---------------+ +| numbers | +| numbers_mt | +| numbers_local | ++---------------+ +``` + +Showing the table functions begin with `"number"` with `WHERE`: +```sql +SHOW TABLE_FUNCTIONS WHERE name LIKE 'number%'; ++---------------+ +| name | ++---------------+ +| numbers | +| numbers_mt | +| numbers_local | ++---------------+ +``` diff --git a/tidb-cloud-lake/sql/show-table-status.md b/tidb-cloud-lake/sql/show-table-status.md new file mode 100644 index 0000000000000..b01206033f7bd --- /dev/null +++ b/tidb-cloud-lake/sql/show-table-status.md @@ -0,0 +1,61 @@ +--- +title: SHOW TABLE STATUS +sidebar_position: 14 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Shows the status of the tables in a database. The status information includes various physical sizes and timestamps about a table, see [Examples](#examples) for details. 
+ +## Syntax + +```sql +SHOW TABLE STATUS + [ {FROM | IN} ] + [ LIKE 'pattern' | WHERE expr ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| FROM / IN | Specifies a database. If omitted, the command returns the results from the current database. | +| LIKE | Filters the results by the table names using case-sensitive pattern matching. | +| WHERE | Filters the results using an expression in the WHERE clause. | + +## Examples + +The following example displays the status of tables in the current database, providing details such as name, engine, rows, and other relevant information: + +```sql +SHOW TABLE STATUS; + +name |engine|version|row_format|rows|avg_row_length|data_length|max_data_length|index_length|data_free|auto_increment|create_time |update_time|check_time|collation|checksum|comment|cluster_by| +-------+------+-------+----------+----+--------------+-----------+---------------+------------+---------+--------------+-----------------------------+-----------+----------+---------+--------+-------+----------+ +books |FUSE | 0| | 2| | 160| | 713| | |2023-09-25 06:40:47.237 +0000| | | | | | | +mytable|FUSE | 0| | 5| | 40| | 1665| | |2023-08-28 07:53:05.455 +0000| | | | | |((a + 1)) | +ontime |FUSE | 0| | 199| | 147981| | 22961| | |2023-09-19 07:04:06.414 +0000| | | | | | | +``` + +The following example displays the status of tables in the current database where the names start with 'my': + +```sql +SHOW TABLE STATUS LIKE 'my%'; + +name |engine|version|row_format|rows|avg_row_length|data_length|max_data_length|index_length|data_free|auto_increment|create_time |update_time|check_time|collation|checksum|comment|cluster_by| +-------+------+-------+----------+----+--------------+-----------+---------------+------------+---------+--------------+-----------------------------+-----------+----------+---------+--------+-------+----------+ +mytable|FUSE | 0| | 5| | 40| | 1665| | |2023-08-28 07:53:05.455 +0000| | | | | |((a + 1)) | +``` + +The following example displays the status of tables in the current database where the number of rows is greater than 100: + +:::note +When using the SHOW TABLE STATUS query, be aware that some column names, such as "rows," may be interpreted as SQL keywords, potentially leading to errors. To avoid this issue, always enclose column names with backticks, as shown in this example. This ensures that column names are treated as identifiers rather than keywords in the SQL query. 
+::: + +```sql +SHOW TABLE STATUS WHERE `rows` > 100; + +name |engine|version|row_format|rows|avg_row_length|data_length|max_data_length|index_length|data_free|auto_increment|create_time |update_time|check_time|collation|checksum|comment|cluster_by| +------+------+-------+----------+----+--------------+-----------+---------------+------------+---------+--------------+-----------------------------+-----------+----------+---------+--------+-------+----------+ +ontime|FUSE | 0| | 199| | 147981| | 22961| | |2023-09-19 07:04:06.414 +0000| | | | | | | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-tables.md b/tidb-cloud-lake/sql/show-tables.md new file mode 100644 index 0000000000000..28dc4210580de --- /dev/null +++ b/tidb-cloud-lake/sql/show-tables.md @@ -0,0 +1,119 @@ +--- +title: SHOW TABLES +sidebar_position: 15 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists the tables in the current or a specified database. + +:::note +Starting from version 1.2.415, the SHOW TABLES command no longer includes views in its results. To display views, use [SHOW VIEWS](/tidb-cloud-lake/sql/show-views.md) instead. +::: + +See also: [system.tables](/tidb-cloud-lake/sql/system-tables.md) + +## Syntax + +```sql +SHOW [ FULL ] TABLES + [ {FROM | IN} ] + [ HISTORY ] + [ LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| FULL | Lists the results with additional information. See [Examples](#examples) for more details. | +| FROM / IN | Specifies a database. If omitted, the command returns the results from the current database. | +| HISTORY | Displays the timestamps of table deletions within the retention period (24 hours by default). If a table has not been deleted yet, the value for `drop_time` is NULL. | +| LIKE | Filters the results by their names using case-sensitive pattern matching. | +| WHERE | Filters the results using an expression in the WHERE clause. 
| + +## Examples + +The following example lists the names of all tables in the current database (default): + +```sql +SHOW TABLES; + +┌───────────────────┐ +│ Tables_in_default │ +├───────────────────┤ +│ books │ +│ mytable │ +│ ontime │ +│ products │ +└───────────────────┘ +``` + +The following example lists all the tables with additional information: + +```sql +SHOW FULL TABLES; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ tables │ table_type │ database │ catalog │ owner │ engine │ cluster_by │ create_time │ num_rows │ data_size │ data_compressed_size │ index_size │ +├──────────┼────────────┼──────────┼─────────┼──────────────────┼────────┼────────────┼────────────────────────────┼──────────────────┼──────────────────┼──────────────────────┼──────────────────┤ +│ books │ BASE TABLE │ default │ default │ account_admin │ FUSE │ │ 2024-01-16 03:53:15.354132 │ 0 │ 0 │ 0 │ 0 │ +│ mytable │ BASE TABLE │ default │ default │ account_admin │ FUSE │ │ 2024-01-16 03:53:27.968505 │ 0 │ 0 │ 0 │ 0 │ +│ ontime │ BASE TABLE │ default │ default │ account_admin │ FUSE │ │ 2024-01-16 03:53:42.052399 │ 0 │ 0 │ 0 │ 0 │ +│ products │ BASE TABLE │ default │ default │ account_admin │ FUSE │ │ 2024-01-16 03:54:00.883985 │ 0 │ 0 │ 0 │ 0 │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` + +The following example demonstrates that the results will include dropped tables when the optional parameter HISTORY is present: + +```sql +DROP TABLE products; + +SHOW TABLES; + +┌───────────────────┐ +│ Tables_in_default │ +├───────────────────┤ +│ books │ +│ mytable │ +│ ontime │ +└───────────────────┘ + +SHOW TABLES HISTORY; + +┌────────────────────────────────────────────────┐ +│ Tables_in_default │ drop_time │ +├───────────────────┼────────────────────────────┤ +│ books │ NULL │ +│ mytable │ NULL │ +│ ontime │ NULL │ +│ products │ 2024-01-16 03:55:47.900362 │ +└────────────────────────────────────────────────┘ +``` + +The following example lists the tables containing the string "time" at the end of their name: + +```sql +SHOW TABLES LIKE '%time'; + +┌───────────────────┐ +│ Tables_in_default │ +├───────────────────┤ +│ ontime │ +└───────────────────┘ + +-- CASE-SENSITIVE pattern matching. +-- No results will be returned if you code the previous statement like this: +SHOW TABLES LIKE '%TIME'; +``` + +The following example lists tables where the data size is greater than 1,000 bytes: + +```sql +SHOW TABLES WHERE data_size > 1000 ; + +┌───────────────────┐ +│ Tables_in_default │ +├───────────────────┤ +│ ontime │ +└───────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-tasks.md b/tidb-cloud-lake/sql/show-tasks.md new file mode 100644 index 0000000000000..a83c044c67bf6 --- /dev/null +++ b/tidb-cloud-lake/sql/show-tasks.md @@ -0,0 +1,64 @@ +--- +title: SHOW TASKS +sidebar_position: 5 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists the tasks that are visible to the current role. + +**NOTICE:** This command works out of the box only in Databend Cloud. For self-hosted deployments, configure Cloud Control to query tasks. 
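+
+Tasks listed by this command are defined with `CREATE TASK`. The sketch below suggests roughly how a task like `ingest_sales` from [Examples](#examples) could be defined; the scheduling and option syntax here is an approximation only, so consult the CREATE TASK reference for the exact form:
+
+```sql
+-- Illustrative only: a cron-scheduled COPY INTO task on the etl_wh warehouse
+CREATE TASK ingest_sales
+    WAREHOUSE = 'etl_wh'
+    SCHEDULE = USING CRON '0 5 * * * *' 'UTC'
+    SUSPEND_TASK_AFTER_NUM_FAILURES = 3
+AS
+    COPY INTO sales FROM @stage PATTERN = '.*';
+```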
+ +## Syntax + +```sql +SHOW TASKS [LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|-------------| +| LIKE | Filters task names using case-sensitive pattern matching with the `%` wildcard. | +| WHERE | Filters the result set using an expression on the output columns. | + +### Output + +`SHOW TASKS` returns the following columns: + +- `created_on`: Timestamp when the task was created. +- `name`: Task name. +- `id`: Internal task identifier. +- `owner`: Role that owns the task. +- `comment`: Optional comment. +- `warehouse`: Warehouse assigned to the task. +- `schedule`: Interval or CRON schedule, when present. +- `state`: Current status (`Started` or `Suspended`). +- `definition`: SQL the task runs. +- `condition_text`: WHEN condition for the task. +- `after`: Comma-separated list of upstream tasks in a DAG. +- `suspend_task_after_num_failures`: Number of consecutive failures before suspension. +- `error_integration`: Notification integration for failures. +- `next_schedule_time`: Timestamp of the next scheduled run. +- `last_committed_on`: Timestamp when the task definition was last updated. +- `last_suspended_on`: Timestamp when the task was last suspended, if any. +- `session_parameters`: Session parameters applied when the task runs. + +## Examples + +List all tasks available to the current role: + +```sql +SHOW TASKS; ++----------------------------+---------------+------+---------------+---------+-----------+---------------------------------+----------+-------------------------------------------+------------------------+---------+-------------------------------------+-------------------+----------------------------+----------------------------+----------------------------+---------------------------------------------------+ +| created_on | name | id | owner | comment | warehouse | schedule | state | definition | condition_text | after | suspend_task_after_num_failures | error_integration | next_schedule_time | last_committed_on | last_suspended_on | session_parameters | ++----------------------------+---------------+------+---------------+---------+-----------+---------------------------------+----------+-------------------------------------------+------------------------+---------+-------------------------------------+-------------------+----------------------------+----------------------------+----------------------------+---------------------------------------------------+ +| 2024-07-01 08:00:00.000000 | ingest_sales | 101 | ACCOUNTADMIN | NULL | etl_wh | CRON 0 5 * * * * TIMEZONE UTC | Started | COPY INTO sales FROM @stage PATTERN '.*' | STREAM_STATUS('s1') | | 3 | slack_errors | 2024-07-01 08:05:00.000000 | 2024-07-01 08:00:00.000000 | NULL | {"enable_query_result_cache":"1"} | +| 2024-07-01 09:00:00.000000 | hourly_checks | 102 | SYSADMIN | health | etl_wh | INTERVAL 3600 SECOND | Suspended | CALL run_health_check() | | ingest_sales | NULL | NULL | 2024-07-01 10:00:00.000000 | 2024-07-01 09:05:00.000000 | 2024-07-01 09:10:00.000000 | {"query_result_cache_min_execute_secs":"5"} | ++----------------------------+---------------+------+---------------+---------+-----------+---------------------------------+----------+-------------------------------------------+------------------------+---------+-------------------------------------+-------------------+----------------------------+----------------------------+----------------------------+---------------------------------------------------+ +``` + +Show only tasks whose names start with `ingest_`: + +```sql +SHOW 
TASKS LIKE 'ingest_%'; +``` diff --git a/tidb-cloud-lake/sql/show-user-functions-sql.md b/tidb-cloud-lake/sql/show-user-functions-sql.md new file mode 100644 index 0000000000000..5e73ca1376968 --- /dev/null +++ b/tidb-cloud-lake/sql/show-user-functions-sql.md @@ -0,0 +1,31 @@ +--- +title: SHOW USER FUNCTIONS +sidebar_position: 4 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists the existing user-defined functions and external functions in the system. Equivalent to `SELECT name, is_aggregate, description, arguments, language FROM system.user_functions ...`. + +See also: [system.user_functions](/tidb-cloud-lake/sql/system-user-functions.md) + +## Syntax + +```sql +SHOW USER FUNCTIONS [LIKE '' | WHERE ] | [LIMIT ] +``` + +## Example + +```sql +SHOW USER FUNCTIONS; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ is_aggregate │ description │ arguments │ language │ +├────────────────┼───────────────────┼─────────────┼───────────────────────────────────────────────────────────┼──────────┤ +│ binary_reverse │ NULL │ │ {"arg_types":["Binary NULL"],"return_type":"Binary NULL"} │ python │ +│ echo │ NULL │ │ {"arg_types":["String NULL"],"return_type":"String NULL"} │ python │ +│ isnotempty │ NULL │ │ {"parameters":["p"]} │ SQL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-user-functions.md b/tidb-cloud-lake/sql/show-user-functions.md new file mode 100644 index 0000000000000..43693b5932810 --- /dev/null +++ b/tidb-cloud-lake/sql/show-user-functions.md @@ -0,0 +1,40 @@ +--- +title: SHOW USER FUNCTIONS +sidebar_position: 4 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists all user-defined functions including scalar functions, table functions, embedded functions, and external functions. 
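+
+Functions appear in this list after they are created with `CREATE FUNCTION`. As a rough sketch, the `get_v1` SQL function shown in [Examples](#examples) could have been defined as a lambda-style UDF like the one below (the function body is assumed for illustration; see the CREATE FUNCTION reference for the full syntax):
+
+```sql
+-- Illustrative only: a SQL UDF extracting the "v1" field from a JSON value
+CREATE FUNCTION get_v1 AS (input_json) -> input_json['v1'];
+```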
+ +## Syntax + +```sql +SHOW USER FUNCTIONS +``` + +## Output Columns + +| Column | Description | +|--------|-------------| +| `name` | Function name | +| `is_aggregate` | Whether it's an aggregate function (NULL for UDFs) | +| `description` | Function description if provided | +| `arguments` | Function parameters in JSON format | +| `language` | Programming language: SQL, python, javascript, wasm, or external | +| `created_on` | Function creation timestamp | + +## Examples + +```sql +SHOW USER FUNCTIONS; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ is_aggregate │ description │ arguments │ language │ created_on │ +│ String │ Nullable(Boolean) │ String │ Variant │ String │ Timestamp │ +├────────┼───────────────────┼─────────────┼───────────────────────────────┼──────────┼────────────────────────────┤ +│ get_v1 │ NULL │ │ {"parameters":["input_json"]} │ SQL │ 2024-11-18 23:20:28.432842 │ +│ get_v2 │ NULL │ │ {"parameters":["input_json"]} │ SQL │ 2024-11-18 23:21:46.838744 │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-users.md b/tidb-cloud-lake/sql/show-users.md new file mode 100644 index 0000000000000..3ed4b16c7adbf --- /dev/null +++ b/tidb-cloud-lake/sql/show-users.md @@ -0,0 +1,46 @@ +--- +title: SHOW USERS +sidebar_position: 3 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists all SQL users in the system. If you're using Databend Cloud, this command also shows the user accounts (email addresses) within your organization that are used to log in to Databend Cloud. + +## Syntax + +```sql +SHOW USERS +``` + +## Examples + +```sql +CREATE NETWORK POLICY my_network_policy ALLOWED_IP_LIST=('192.168.100.0/24'); + +CREATE PASSWORD POLICY my_password_policy + PASSWORD_MIN_LENGTH = 12 + PASSWORD_MAX_LENGTH = 24 + PASSWORD_MIN_UPPER_CASE_CHARS = 2 + PASSWORD_MIN_LOWER_CASE_CHARS = 2 + PASSWORD_MIN_NUMERIC_CHARS = 2 + PASSWORD_MIN_SPECIAL_CHARS = 2 + PASSWORD_MIN_AGE_DAYS = 1 + PASSWORD_MAX_AGE_DAYS = 30 + PASSWORD_MAX_RETRIES = 3 + PASSWORD_LOCKOUT_TIME_MINS = 30 + PASSWORD_HISTORY = 5 + COMMENT = 'test comment'; + +CREATE USER eric IDENTIFIED BY '123ABCabc$$123' WITH SET PASSWORD POLICY='my_password_policy', SET NETWORK POLICY='my_network_policy'; + +SHOW USERS; + +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ hostname │ auth_type │ is_configured │ default_role │ roles │ disabled │ network_policy │ password_policy │ must_change_password │ +├────────┼──────────┼──────────────────────┼───────────────┼───────────────┼───────────────┼──────────┼───────────────────┼────────────────────┼──────────────────────┤ +│ eric │ % │ double_sha1_password │ NO │ │ │ false │ my_network_policy │ my_password_policy │ NULL │ +│ root │ % │ no_password │ YES │ account_admin │ account_admin │ false │ NULL │ NULL │ NULL │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-variables-sql.md b/tidb-cloud-lake/sql/show-variables-sql.md new file mode 100644 index 0000000000000..3fb7a0e8acbf0 --- /dev/null +++ 
b/tidb-cloud-lake/sql/show-variables-sql.md @@ -0,0 +1,31 @@ +--- +title: SHOW_VARIABLES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays all session variables and their details, such as names, values, and types. + +See also: [SHOW VARIABLES](/tidb-cloud-lake/sql/show-variables.md) + +## Syntax + +```sql +SHOW_VARIABLES() +``` + +## Examples + +```sql +SELECT name, value, type FROM SHOW_VARIABLES(); + +┌──────────────────────────┐ +│ name │ value │ type │ +├────────┼────────┼────────┤ +│ y │ 'yy' │ String │ +│ b │ 55 │ UInt8 │ +│ x │ 'xx' │ String │ +│ a │ 3 │ UInt8 │ +└──────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-variables.md b/tidb-cloud-lake/sql/show-variables.md new file mode 100644 index 0000000000000..a28afb459cd15 --- /dev/null +++ b/tidb-cloud-lake/sql/show-variables.md @@ -0,0 +1,41 @@ +--- +title: SHOW VARIABLES +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays all session variables and their details, such as names, values, and types. + +See also: [SHOW_VARIABLES](/tidb-cloud-lake/sql/show-variables.md) + +## Syntax + +```sql +SHOW VARIABLES [ LIKE '' | WHERE ] +``` + +## Examples + +The following example lists all session variables with their values and types: + +```sql +SHOW VARIABLES; + +┌──────────────────────────┐ +│ name │ value │ type │ +├────────┼────────┼────────┤ +│ a │ 3 │ UInt8 │ +│ b │ 55 │ UInt8 │ +│ x │ 'xx' │ String │ +│ y │ 'yy' │ String │ +└──────────────────────────┘ +``` + +To filter and return only the variable named `a`, use one of the following queries: + +```sql +SHOW VARIABLES LIKE 'a'; + +SHOW VARIABLES WHERE name = 'a'; +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-views.md b/tidb-cloud-lake/sql/show-views.md new file mode 100644 index 0000000000000..da9ad95870129 --- /dev/null +++ b/tidb-cloud-lake/sql/show-views.md @@ -0,0 +1,61 @@ +--- +title: SHOW VIEWS +sidebar_position: 4 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a list of view names within the specified database, or within the current database if no database name is provided. + +## Syntax + +```sql +SHOW [ FULL ] VIEWS + [ { FROM | IN } ] + [ HISTORY ] + [ LIKE '' | WHERE ] +``` + +| Parameter | Description | +|-----------|----------------------------------------------------------------------------------------------| +| FULL | Lists the results with additional information. See [Examples](#examples) for more details. | +| FROM / IN | Specifies a database. If omitted, the command returns the results from the current database. | +| HISTORY | Displays the timestamps of view deletions within the retention period (24 hours by default). If a view has not been deleted yet, the value for `drop_time` is NULL. | +| LIKE | Filters the view names using case-sensitive pattern matching with the `%` wildcard. | +| WHERE | Filters the view names using an expression in the WHERE clause. 
| + +## Examples + +```sql +SHOW VIEWS; + +┌───────────────────────────────────────────────────────────────────┐ +│ Views_in_default │ view_query │ +├──────────────────┼────────────────────────────────────────────────┤ +│ books_view │ SELECT id, title, genre FROM default.books │ +│ users_view │ SELECT username, email, age FROM default.users │ +└───────────────────────────────────────────────────────────────────┘ + +SHOW FULL VIEWS; + +┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ views │ database │ catalog │ owner │ engine │ create_time │ view_query │ +├────────────┼──────────┼─────────┼──────────────────┼────────┼────────────────────────────┼────────────────────────────────────────────────┤ +│ books_view │ default │ default │ NULL │ VIEW │ 2024-04-14 23:29:52.916989 │ SELECT id, title, genre FROM default.books │ +│ users_view │ default │ default │ NULL │ VIEW │ 2024-04-14 23:31:02.918994 │ SELECT username, email, age FROM default.users │ +└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + +-- Delete the view 'books_view' +DROP VIEW books_view; + +SHOW VIEWS HISTORY; + +┌────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ Views_in_default │ view_query │ drop_time │ +├──────────────────┼────────────────────────────────────────────────┼────────────────────────────┤ +│ books_view │ SELECT id, title, genre FROM default.books │ 2024-04-15 02:29:56.051081 │ +│ users_view │ SELECT username, email, age FROM default.users │ NULL │ +└────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/show-virtual-columns.md b/tidb-cloud-lake/sql/show-virtual-columns.md new file mode 100644 index 0000000000000..45f0607ac2a5c --- /dev/null +++ b/tidb-cloud-lake/sql/show-virtual-columns.md @@ -0,0 +1,52 @@ +--- +title: SHOW VIRTUAL COLUMNS +sidebar_position: 4 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +import EEFeature from '@site/src/components/EEFeature'; + + + +Shows the created virtual columns in the system. Equivalent to `SELECT * FROM system.virtual_columns`. + +Virtual columns are enabled by default starting from v1.2.832. 
+ +See also: [system.virtual_columns](/tidb-cloud-lake/sql/system-virtual-columns.md) + +## Preferred Syntax + +Use the command in its simplest, most useful form to inspect a specific table or list all virtual columns: + +```sql +SHOW VIRTUAL COLUMNS [WHERE table = '' AND database = ''] +``` + +## Example + +```sql +CREATE TABLE test(id int, val variant); + +INSERT INTO + test +VALUES + ( + 1, + '{"id":1,"name":"databend"}' + ), + ( + 2, + '{"id":2,"name":"databricks"}' + ); + +SHOW VIRTUAL COLUMNS WHERE table = 'test' AND database = 'default'; +╭───────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ database │ table │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │ +│ String │ String │ String │ UInt32 │ String │ String │ +├──────────┼────────┼───────────────┼───────────────────┼─────────────────────┼─────────────────────┤ +│ default │ test │ val │ 3000000000 │ ['id'] │ UInt64 │ +│ default │ test │ val │ 3000000001 │ ['name'] │ String │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/show-warehouses.md b/tidb-cloud-lake/sql/show-warehouses.md new file mode 100644 index 0000000000000..094fe8db69d62 --- /dev/null +++ b/tidb-cloud-lake/sql/show-warehouses.md @@ -0,0 +1,58 @@ +--- +title: SHOW WAREHOUSES +sidebar_position: 3 +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Lists all warehouses visible to the current tenant. + +## Syntax + +```sql +SHOW WAREHOUSES [ LIKE '' ] [ ] +``` + +| Parameter | Description | +| ------------------------ | -------------------------------------------------------------------------------------------------------------------------- | +| `LIKE ''` | Optional. Filters warehouse names using SQL `LIKE` semantics (`%` matches any sequence, `_` matches any single character). | +| `` | Optional. When `LIKE` is omitted but a literal follows, it is treated as `LIKE ''`. | + +## Output Columns + +| Column | Description | +| ------------------- | ----------------------------------------- | +| `name` | Warehouse name | +| `state` | Current state (e.g., Running, Suspended) | +| `size` | Warehouse size | +| `auto_suspend` | Auto-suspend timeout in seconds | +| `auto_resume` | Whether auto-resume is enabled | +| `min_cluster_count` | Minimum cluster count for auto-scaling | +| `max_cluster_count` | Maximum cluster count for auto-scaling | +| `role` | Warehouse role | +| `comment` | User-defined comment | +| `tags` | Warehouse tags as a JSON-formatted string | +| `created_by` | Creator | +| `created_on` | Creation timestamp | + +## Examples + +List all warehouses: + +```sql +SHOW WAREHOUSES; +``` + +List warehouses matching a pattern: + +```sql +SHOW WAREHOUSES LIKE '%prod%'; +``` + +Use a literal without `LIKE`: + +```sql +SHOW WAREHOUSES nightly_etl; +``` diff --git a/tidb-cloud-lake/sql/show-workload-groups.md b/tidb-cloud-lake/sql/show-workload-groups.md new file mode 100644 index 0000000000000..08b48d0f02523 --- /dev/null +++ b/tidb-cloud-lake/sql/show-workload-groups.md @@ -0,0 +1,27 @@ +--- +title: SHOW WORKLOAD GROUPS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a list of all existing workload groups along with their quotas. 
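+
+Workload groups shown by this command are created with `CREATE WORKLOAD GROUP`. As a rough, assumption-laden sketch, a group like the `test` group in [Examples](#examples) might be created as follows (the `WITH` option syntax is an approximation; check the CREATE WORKLOAD GROUP reference for the exact quota names and formats):
+
+```sql
+-- Illustrative only: a group with a CPU quota and a query timeout
+CREATE WORKLOAD GROUP test WITH cpu_quota = '30%', query_timeout = '15s';
+```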
+ +## Syntax + +```sql +SHOW WORKLOAD GROUPS +``` + +## Examples + +```sql +SHOW WORKLOAD GROUPS + +┌────────────────────────────────────────────────────────────────────────────────────────────┐ +│ name │ cpu_quota │ memory_quota │ query_timeout │ max_concurrency │ query_queued_timeout │ +│ String │ String │ String │ String │ String │ String │ +├────────┼───────────┼──────────────┼───────────────┼─────────────────┼──────────────────────┤ +│ test │ 30% │ │ 15s │ │ │ +└────────────────────────────────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sign.md b/tidb-cloud-lake/sql/sign.md new file mode 100644 index 0000000000000..79a9872f55ccd --- /dev/null +++ b/tidb-cloud-lake/sql/sign.md @@ -0,0 +1,23 @@ +--- +title: SIGN +--- + +Returns the sign of the argument as -1, 0, or 1, depending on whether `x` is negative, zero, or positive or NULL if the argument was NULL. + +## Syntax + +```sql +SIGN( ) +``` + +## Examples + +```sql +SELECT SIGN(0); + +┌─────────┐ +│ sign(0) │ +├─────────┤ +│ 0 │ +└─────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sin.md b/tidb-cloud-lake/sql/sin.md new file mode 100644 index 0000000000000..bfe13b4f571fe --- /dev/null +++ b/tidb-cloud-lake/sql/sin.md @@ -0,0 +1,23 @@ +--- +title: SIN +--- + +Returns the sine of `x`, where `x` is given in radians. + +## Syntax + +```sql +SIN( ) +``` + +## Examples + +```sql +SELECT SIN(90); + +┌────────────────────┐ +│ sin(90) │ +├────────────────────┤ +│ 0.8939966636005579 │ +└────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/siphash-sql.md b/tidb-cloud-lake/sql/siphash-sql.md new file mode 100644 index 0000000000000..1227f1fd08399 --- /dev/null +++ b/tidb-cloud-lake/sql/siphash-sql.md @@ -0,0 +1,27 @@ +--- +title: SIPHASH64 +--- + +Produces a 64-bit [SipHash](https://en.wikipedia.org/wiki/SipHash) hash value. + +## Syntax + +```sql +SIPHASH64() +``` + +## Aliases + +- [SIPHASH](/tidb-cloud-lake/sql/siphash.md) + +## Examples + +```sql +SELECT SIPHASH('1234567890'), SIPHASH64('1234567890'); + +┌─────────────────────────────────────────────────┐ +│ siphash('1234567890') │ siphash64('1234567890') │ +├───────────────────────┼─────────────────────────┤ +│ 18110648197875983073 │ 18110648197875983073 │ +└─────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/siphash.md b/tidb-cloud-lake/sql/siphash.md new file mode 100644 index 0000000000000..8d669b6e7d299 --- /dev/null +++ b/tidb-cloud-lake/sql/siphash.md @@ -0,0 +1,5 @@ +--- +title: SIPHASH +--- + +Alias for [SIPHASH64](/tidb-cloud-lake/sql/siphash.md). \ No newline at end of file diff --git a/tidb-cloud-lake/sql/skewness.md b/tidb-cloud-lake/sql/skewness.md new file mode 100644 index 0000000000000..46b223a1d8ed1 --- /dev/null +++ b/tidb-cloud-lake/sql/skewness.md @@ -0,0 +1,58 @@ +--- +title: SKEWNESS +--- + +Aggregate function. + +The `SKEWNESS()` function returns the skewness of all input values. + +## Syntax + +```sql +SKEWNESS() +``` + +## Arguments + +| Arguments | Description | +|-----------| ----------- | +| `` | Any numerical expression | + +## Return Type + +Nullable Float64. 
+ +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE temperature_data ( + id INT, + city_id INT, + temperature FLOAT +); + +INSERT INTO temperature_data (id, city_id, temperature) +VALUES (1, 1, 60), + (2, 1, 65), + (3, 1, 62), + (4, 2, 70), + (5, 2, 75); +``` + +**Query Demo: Calculate Skewness of Temperature Data** + +```sql +SELECT SKEWNESS(temperature) AS temperature_skewness +FROM temperature_data; +``` + +**Result** +```sql +| temperature_skewness | +|----------------------| +| 0.68 | +``` + + + diff --git a/tidb-cloud-lake/sql/sleep.md b/tidb-cloud-lake/sql/sleep.md new file mode 100644 index 0000000000000..4dc756f5a909d --- /dev/null +++ b/tidb-cloud-lake/sql/sleep.md @@ -0,0 +1,36 @@ +--- +title: SLEEP +--- + +Sleeps `seconds` seconds on each data block. + +:::caution +Only used for testing where sleep is required. +::: + +## Syntax + +```sql +SLEEP(seconds) +``` + +## Arguments + +| Arguments | Description | +| ----------- | ----------- | +| seconds | Must be a constant column of any nonnegative number or float.| + +## Return Type + +UInt8 + +## Examples + +```sql +SELECT sleep(2); ++----------+ +| sleep(2) | ++----------+ +| 0 | ++----------+ +``` diff --git a/tidb-cloud-lake/sql/slice.md b/tidb-cloud-lake/sql/slice.md new file mode 100644 index 0000000000000..1b27e26d13a39 --- /dev/null +++ b/tidb-cloud-lake/sql/slice.md @@ -0,0 +1,27 @@ +--- +title: SLICE +--- + +Extracts a slice from the array by index (1-based). + +## Syntax + +```sql +SLICE( , [, ] ) +``` + +## Aliases + +- [ARRAY_SLICE](/tidb-cloud-lake/sql/array-slice.md) + +## Examples + +```sql +SELECT ARRAY_SLICE([1, 21, 32, 4], 2, 3), SLICE([1, 21, 32, 4], 2, 3); + +┌─────────────────────────────────────────────────────────────────┐ +│ array_slice([1, 21, 32, 4], 2, 3) │ slice([1, 21, 32, 4], 2, 3) │ +├───────────────────────────────────┼─────────────────────────────┤ +│ [21,32] │ [21,32] │ +└─────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/soundex.md b/tidb-cloud-lake/sql/soundex.md new file mode 100644 index 0000000000000..9f48bc1ff9a2f --- /dev/null +++ b/tidb-cloud-lake/sql/soundex.md @@ -0,0 +1,73 @@ +--- +id: string-soundex +title: SOUNDEX +--- + +Generates the Soundex code for a string. + +- A Soundex code consists of a letter followed by three numerical digits. Databend's implementation returns more than 4 digits, but you can [SUBSTR](/tidb-cloud-lake/sql/substr.md) the result to get a standard Soundex code. +- All non-alphabetic characters in the string are ignored. +- All international alphabetic characters outside the A-Z range are ignored unless they're the first letter. + + +:::tip What is Soundex? +Soundex converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken in English. For more information, see https://en.wikipedia.org/wiki/Soundex +::: + +See also: [SOUNDS LIKE](/tidb-cloud-lake/sql/sounds-like.md) + +## Syntax + +```sql +SOUNDEX() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| str | The string. | + +## Return Type + +Returns a code of type VARCHAR or a NULL value. + +## Examples + +```sql +SELECT SOUNDEX('Databend'); + +--- +D153 + +-- All non-alphabetic characters in the string are ignored. +SELECT SOUNDEX('Databend!');; + +--- +D153 + +-- All international alphabetic characters outside the A-Z range are ignored unless they're the first letter. 
+SELECT SOUNDEX('Databend,你好'); + +--- +D153 + +SELECT SOUNDEX('你好,Databend'); + +--- +你3153 + +-- SUBSTR the result to get a standard Soundex code. +SELECT SOUNDEX('Databend Cloud'),SUBSTR(SOUNDEX('Databend Cloud'),1,4); + +soundex('databend cloud')|substring(soundex('databend cloud') from 1 for 4)| +-------------------------+-------------------------------------------------+ +D153243 |D153 | + +SELECT SOUNDEX(NULL); ++-------------------------------------+ +| `SOUNDEX(NULL)` | ++-------------------------------------+ +| | ++-------------------------------------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sounds-like.md b/tidb-cloud-lake/sql/sounds-like.md new file mode 100644 index 0000000000000..f4f34d22b0e2e --- /dev/null +++ b/tidb-cloud-lake/sql/sounds-like.md @@ -0,0 +1,68 @@ +--- +title: SOUNDS LIKE +--- + +Compares the pronunciation of two strings by their Soundex codes. Soundex is a phonetic algorithm that produces a code representing the pronunciation of a string, allowing for approximate matching of strings based on their pronunciation rather than their spelling. Databend offers the [SOUNDEX](/tidb-cloud-lake/sql/soundex.md) function that allows you to get the Soundex code from a string. + +SOUNDS LIKE is frequently employed in the WHERE clause of SQL queries to narrow down rows using fuzzy string matching, such as for names and addresses, see [Filtering Rows](#filtering-rows) in [Examples](#examples). + +:::note +While the function can be useful for approximate string matching, it is important to note that it is not always accurate. The Soundex algorithm is based on English pronunciation rules and may not work well for strings from other languages or dialects. +::: + +## Syntax + +```sql + SOUNDS LIKE +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| str1, 2 | The strings you compare. | + +## Return Type + +Return a Boolean value of 1 if the Soundex codes for the two strings are the same (which means they sound alike) and 0 otherwise. + +## Examples + +### Comparing Strings + +```sql +SELECT 'two' SOUNDS LIKE 'too' +---- +1 + +SELECT CONCAT('A', 'B') SOUNDS LIKE 'AB'; +---- +1 + +SELECT 'Monday' SOUNDS LIKE 'Sunday'; +---- +0 +``` + +### Filtering Rows + +```sql +SELECT * FROM employees; + +id|first_name|last_name|age| +--+----------+---------+---+ + 0|John |Smith | 35| + 0|Mark |Smythe | 28| + 0|Johann |Schmidt | 51| + 0|Eric |Doe | 30| + 0|Sue |Johnson | 45| + + +SELECT * FROM employees +WHERE first_name SOUNDS LIKE 'John'; + +id|first_name|last_name|age| +--+----------+---------+---+ + 0|John |Smith | 35| + 0|Johann |Schmidt | 51| +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/space.md b/tidb-cloud-lake/sql/space.md new file mode 100644 index 0000000000000..c80c4344deadd --- /dev/null +++ b/tidb-cloud-lake/sql/space.md @@ -0,0 +1,32 @@ +--- +title: SPACE +--- + +Returns a string consisting of N blank space characters. + +## Syntax + +```sql +SPACE(); +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------------| +| `` | The number of spaces | + +## Return Type + +String data type value. 
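+
+Because the result consists only of blank characters, it can be hard to inspect directly in query output; wrapping the call in `LENGTH` is a simple way to confirm how many characters were returned (an additional illustration, not part of the original example):
+
+```sql
+SELECT LENGTH(SPACE(20));
+
+-- 20
+```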
+ +## Examples + +```sql +SELECT SPACE(20) ++----------------------+ +| SPACE(20) | ++----------------------+ +| | ++----------------------+ +``` diff --git a/tidb-cloud-lake/sql/split-part.md b/tidb-cloud-lake/sql/split-part.md new file mode 100644 index 0000000000000..13cae3619ff7a --- /dev/null +++ b/tidb-cloud-lake/sql/split-part.md @@ -0,0 +1,67 @@ +--- +title: SPLIT_PART +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Splits a string using a specified delimiter and returns the specified part. + +See also: [SPLIT](/tidb-cloud-lake/sql/split.md) + +## Syntax + +```sql +SPLIT_PART('', '', '') +``` + +The *position* argument specifies which part to return. It uses a 1-based index but can also accept positive, negative, or zero values: + +- If *position* is a positive number, it returns the part at the position from the left to the right, or NULL if it doesn't exist. +- If *position* is a negative number, it returns the part at the position from the right to the left, or NULL if it doesn't exist. +- If *position* is 0, it is treated as 1, effectively returning the first part of the string. + +## Return Type + +String. SPLIT_PART returns NULL when either the input string, the delimiter, or the position is NULL. + +## Examples + +```sql +-- Use a space as the delimiter +-- SPLIT_PART returns a specific part. +SELECT SPLIT_PART('Databend Cloud', ' ', 1); + +split_part('databend cloud', ' ', 1)| +------------------------------------+ +Databend | + +-- Use an empty string as the delimiter or a delimiter that does not exist in the input string +-- SPLIT_PART returns the entire input string. +SELECT SPLIT_PART('Databend Cloud', '', 1); + +split_part('databend cloud', '', 1)| +-----------------------------------+ +Databend Cloud | + +SELECT SPLIT_PART('Databend Cloud', ',', 1); + +split_part('databend cloud', ',', 1)| +------------------------------------+ +Databend Cloud | + +-- Use ' ' (tab) as the delimiter +-- SPLIT_PART returns individual fields. +SELECT SPLIT_PART('2023-10-19 15:30:45 INFO Log message goes here', ' ', 3); + +split_part('2023-10-19 15:30:45 info log message goes here', ' ', 3)| +--------------------------------------------------------------------------+ +Log message goes here | + +-- SPLIT_PART returns an empty string as the specified part does not exist at all. +SELECT SPLIT_PART('2023-10-19 15:30:45 INFO Log message goes here', ' ', 4); + +split_part('2023-10-19 15:30:45 info log message goes here', ' ', 4)| +--------------------------------------------------------------------------+ + | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/split.md b/tidb-cloud-lake/sql/split.md new file mode 100644 index 0000000000000..e3df046cdb008 --- /dev/null +++ b/tidb-cloud-lake/sql/split.md @@ -0,0 +1,55 @@ +--- +title: SPLIT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Splits a string using a specified delimiter and returns the resulting parts as an array. + +See also: [SPLIT_PART](/tidb-cloud-lake/sql/split-part.md) + +## Syntax + +```sql +SPLIT('', '') +``` + +## Return Type + +Array of strings. SPLIT returns NULL when either the input string or the delimiter is NULL. + +## Examples + +```sql +-- Use a space as the delimiter +-- SPLIT returns an array with two parts. 
+SELECT SPLIT('Databend Cloud', ' '); + +split('databend cloud', ' ')| +----------------------------+ +['Databend','Cloud'] | + +-- Use an empty string as the delimiter or a delimiter that does not exist in the input string +-- SPLIT returns an array containing the entire input string as a single part. +SELECT SPLIT('Databend Cloud', ''); + +split('databend cloud', '')| +---------------------------+ +['Databend Cloud'] | + +SELECT SPLIT('Databend Cloud', ','); + +split('databend cloud', ',')| +----------------------------+ +['Databend Cloud'] | + +-- Use ' ' (tab) as the delimiter +-- SPLIT returns an array with timestamp, log level, and message. + +SELECT SPLIT('2023-10-19 15:30:45 INFO Log message goes here', ' '); + +split('2023-10-19 15:30:45\tinfo\tlog message goes here', '\t')| +---------------------------------------------------------------+ +['2023-10-19 15:30:45','INFO','Log message goes here'] | +``` diff --git a/tidb-cloud-lake/sql/sql-dialects-conformance.md b/tidb-cloud-lake/sql/sql-dialects-conformance.md new file mode 100644 index 0000000000000..5ae394de64a2c --- /dev/null +++ b/tidb-cloud-lake/sql/sql-dialects-conformance.md @@ -0,0 +1,198 @@ +--- +title: SQL Dialects & Conformance +--- + +This page provides details on the SQL dialects supported by Databend, along with its conformity to the SQL standard, particularly focusing on SQL:2011 features and their support status within Databend. + +## Supported SQL Dialects + +A SQL dialect refers to a particular variation or flavor of the Structured Query Language. Databend supports the `PostgreSQL` dialect by default and offers the flexibility to switch to other supported dialects. Please refer to the table below for details on the supported dialects and their respective brief descriptions: + +| Dialect | Introduction | Learn More | +|---------------|-------------------------------------------------------------------------------------------------|------------------------------| +| `PostgreSQL` | Default supported dialect commonly used in enterprises | https://www.postgresql.org/ | +| `MySQL` | Open-source database management system | https://www.mysql.com/ | +| `Hive` | Data warehouse for big data processing | https://hive.apache.org/ | +| `Prql` | PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement | https://github.com/PRQL/prql | +| `Experimental`| Experimental dialect for testing and research | N/A | + +To switch between the supported SQL dialects or display the current one, use the `sql_dialect` setting: + +```sql title='Examples:' +-- Set SQL dialect to PRQL +SET sql_dialect = 'Prql'; + +-- Display current dialect +SHOW SETTINGS LIKE 'sql_dialect'; +``` + +## SQL Conformance Summary + +Databend aims to conform to the SQL standard, with particular support for ISO/IEC 9075:2011, also known as SQL:2011. While not an exhaustive statement of conformance, Databend incorporates many features required by the SQL standard, often with slight differences in syntax or function. This page outlines the level of conformity of Databend to the SQL:2011 standard. + +| Feature ID | Feature Name | Supported? 
| Note | +|:----------: |:------------------------------------------------------------------------------------------------------------------------: |:----------: |:------------------------------------------------------------------------------------------------------------: | +| **E011** | **Numeric data types** | Yes | | +| E011-01 | INTEGER and SMALLINT data types | Yes | | +| E011-02 | REAL, DOUBLE PRECISION and FLOAT data types | Yes | | +| E011-03 | DECIMAL and NUMERIC data types | Yes | | +| E011-04 | Arithmetic operators | Yes | | +| E011-05 | Numeric comparison | Yes | | +| E011-06 | Implicit casting among the numeric data types | Yes | | +| **E021** | **Character string types** | Partial | | +| E021-01 | CHARACTER data type | No | Fixed-length string type not supported | +| E021-02 | CHARACTER VARYING data type | Yes | | +| E021-03 | Character literals | Yes | | +| E021-04 | CHARACTER_LENGTH function | Yes | | +| E021-05 | OCTET_LENGTH function | Yes | | +| E021-06 | SUBSTRING | Yes | | +| E021-07 | Character concatenation | Yes | | +| E021-08 | UPPER and LOWER functions | Yes | | +| E021-09 | TRIM function | Yes | | +| E021-10 | Implicit casting among the fixed-length and variable-length character string types | No | Fixed-length string type not supported | +| E021-11 | POSITION function | Yes | | +| E021-12 | Character comparison | Yes | | +| **E031** | **Identifiers** | Yes | | +| E031-01 | Delimited identifiers | Yes | | +| E031-02 | Lower case identifiers | Yes | | +| E031-03 | Trailing underscore | Yes | | +| **E051** | **Basic query specification** | Partial | | +| E051-01 | SELECT DISTINCT | Yes | | +| E051-02 | GROUP BY clause | Yes | | +| E051-04 | GROUP BY can contain columns not in SELECT list | Yes | | +| E051-05 | Select items can be renamed | Yes | | +| E051-06 | HAVING clause | Yes | | +| E051-07 | Qualified * in select list | No | | +| E051-08 | Correlation name in the FROM clause | Yes | | +| E051-09 | Rename columns in the FROM clause | No | | +| **E061** | **Basic predicates and search conditions** | Partial | | +| E061-01 | Comparison predicate | Yes | | +| E061-02 | BETWEEN predicate | Yes | | +| E061-03 | IN predicate with list of values | Yes | | +| E061-04 | LIKE predicate | Yes | | +| E061-05 | LIKE predicate: ESCAPE clause | No | | +| E061-06 | NULL predicate | Yes | | +| E061-07 | Quantified comparison predicate | Yes | | +| E061-08 | EXISTS predicate | Yes | | +| E061-09 | Subqueries in comparison predicate | Yes | | +| E061-11 | Subqueries in IN predicate | Yes | | +| E061-12 | Subqueries in quantified comparison predicate | Yes | | +| E061-13 | Correlated subqueries | Yes | | +| E061-14 | Search condition | Yes | | +| **E071** | **Basic query expressions** | Partial | | +| E071-01 | UNION DISTINCT table operator | Yes | | +| E071-02 | UNION ALL table operator | Yes | | +| E071-03 | EXCEPT DISTINCT table operator | Yes | | +| E071-05 | Columns combined via table operators need not have exactly the same data type | Partial | Only columns with data types that can be implicitly coerced are allowed to be combined with table operators. 
| +| E071-06 | Table operators in subqueries | Yes | | +| **E081** | **Basic privileges** | Partial | | +| E081-01 | SELECT privilege at the table level | Yes | | +| E081-02 | DELETE privilege | Yes | | +| E081-03 | INSERT privilege at the table level | Yes | | +| E081-04 | UPDATE privilege at the table level | Yes | | +| E081-05 | UPDATE privilege at the column level | No | | +| E081-06 | REFERENCES privilege at the table level | No | | +| E081-07 | REFERENCES privilege at the column level | No | | +| E081-08 | WITH GRANT OPTION | No | | +| E081-09 | USAGE privilege | No | | +| E081-10 | EXECUTE privilege | No | | +| **E091** | **Set functions** | Yes | | +| E091-01 | AVG | Yes | | +| E091-02 | COUNT | Yes | | +| E091-03 | MAX | Yes | | +| E091-04 | MIN | Yes | | +| E091-05 | SUM | Yes | | +| E091-06 | ALL quantifier | Yes | | +| E091-07 | DISTINCT quantifier | Partial | Currently, Databend supports COUNT(DISTINCT ...) and SELECT DISTINCT ... queries. | +| **E101** | **Basic data manipulation** | Partial | | +| E101-01 | INSERT statement | Yes | | +| E101-03 | Searched UPDATE statement | Yes | | +| E101-04 | Searched DELETE statement | Yes | | +| **E111** | **Single row SELECT statement** | Yes | | +| **E121** | **Basic cursor support** | Partial | | +| E121-01 | DECLARE CURSOR | No | | +| E121-02 | ORDER BY columns need not be in select list | Yes | | +| E121-03 | Value expressions in ORDER BY clause | Yes | | +| E121-04 | OPEN statement | No | | +| E121-06 | Positioned UPDATE statement | No | | +| E121-07 | Positioned DELETE statement | No | | +| E121-08 | CLOSE statement | No | | +| E121-10 | FETCH statement: implicit NEXT | No | | +| E121-17 | WITH HOLD cursors | No | | +| **E131** | **Null value support (nulls in lieu of values)** | Yes | | +| **E141** | **Basic integrity constraints** | No | | +| E141-01 | NOT NULL constraints | Yes | Default in Databend: All columns are nullable. | +| E141-02 | UNIQUE constraint of NOT NULL columns | No | | +| E141-03 | PRIMARY KEY constraints | No | | +| E141-04 | Basic FOREIGN KEY constraint with the NO ACTION default for both referential delete action and referential update action | No | | +| E141-06 | CHECK constraint | No | | +| E141-07 | Column defaults | Yes | | +| E141-08 | NOT NULL inferred on PRIMARY KEY | No | | +| E141-10 | Names in a foreign key can be specified in any order | No | | +| **E151** | **Transaction support** | Partial | | +| E151-01 | COMMIT statement | Partial | Databend only supports implicit transactions for every individual DML statement. 
| +| E151-02 | ROLLBACK statement | No | | +| **E152** | **Basic SET TRANSACTION statement** | No | | +| E152-01 | SET TRANSACTION statement: ISOLATION LEVEL SERIALIZABLE clause | No | | +| E152-02 | SET TRANSACTION statement: READ ONLY and READ WRITE clauses | No | | +| **E153** | **Updatable queries with subqueries** | Yes | | +| **E161** | **SQL comments using leading double minus** | Yes | | +| **E171** | **SQLSTATE support** | No | | +| **E182** | **Host language binding** | No | | +| **F031** | **Basic schema manipulation** | Yes | | +| F031-01 | CREATE TABLE statement to create persistent base tables | Yes | | +| F031-02 | CREATE VIEW statement | Yes | | +| F031-03 | GRANT statement | Partial | | +| F031-04 | ALTER TABLE statement: ADD COLUMN clause | Yes | | +| F031-13 | DROP TABLE statement: RESTRICT clause | Partial | | +| F031-16 | DROP VIEW statement: RESTRICT clause | Partial | | +| F031-19 | REVOKE statement: RESTRICT clause | Partial | | +| **F041** | **Basic joined table** | Yes | | +| F041-01 | Inner join (but not necessarily the INNER keyword) | Yes | | +| F041-02 | INNER keyword | Yes | | +| F041-03 | LEFT OUTER JOIN | Yes | | +| F041-04 | RIGHT OUTER JOIN | Yes | | +| F041-05 | Outer joins can be nested | Yes | | +| F041-07 | The inner table in a left or right outer join can also be used in an inner join | Yes | | +| F041-08 | All comparison operators are supported (rather than just =) | Yes | | +| **F051** | **Basic date and time** | Partial | | +| F051-01 | DATE data type (including support of DATE literal) | Yes | | +| F051-02 | TIME data type (including support of TIME literal) with fractional seconds precision of at least 0 | No | | +| F051-03 | TIMESTAMP data type (including support of TIMESTAMP literal) with fractional seconds precision of at least 0 and 6 | Yes | | +| F051-04 | Comparison predicate on DATE, TIME, and TIMESTAMP data types | Yes | | +| F051-05 | Explicit CAST between datetime types and character string types | Yes | | +| F051-06 | CURRENT_DATE | Yes | | +| F051-07 | LOCALTIME | Yes | | +| F051-08 | LOCALTIMESTAMP | Yes | | +| **F081** | **UNION and EXCEPT in views** | Yes | | +| **F131** | **Grouped operations** | Yes | | +| F131-01 | WHERE, GROUP BY, and HAVING clauses supported in queries with grouped views | Yes | | +| F131-02 | Multiple tables supported in queries with grouped views | Yes | | +| F131-03 | Set functions supported in queries with grouped views | Yes | | +| F131-04 | Subqueries with GROUP BY and HAVING clauses and grouped views | Yes | | +| F131-05 | Single row SELECT with GROUP BY and HAVING clauses and grouped views | Yes | | +| **F181** | **Multiple module support** | No | | +| **F201** | **CAST function** | Yes | | +| **F221** | **Explicit defaults** | No | | +| **F261** | **CASE expression** | Yes | | +| F261-01 | Simple CASE | Yes | | +| F261-02 | Searched CASE | Yes | | +| F261-03 | NULLIF | Yes | | +| F261-04 | COALESCE | Yes | | +| **F311** | **Schema definition statement** | Partial | | +| F311-01 | CREATE SCHEMA | Yes | | +| F311-02 | CREATE TABLE for persistent base tables | Yes | | +| F311-03 | CREATE VIEW | Yes | | +| F311-04 | CREATE VIEW: WITH CHECK OPTION | No | | +| F311-05 | GRANT statement | Partial | | +| **F471** | **Scalar subquery values** | Yes | | +| **F481** | **Expanded NULL predicate** | Yes | | +| **F812** | **Basic flagging** | No | | +| **S011** | **Distinct data types** | No | | +| **T321** | **Basic SQL-invoked routines** | No | | +| T321-01 | User-defined functions with no overloading | Yes | | 
+| T321-02 | User-defined stored procedures with no overloading | No | | +| T321-03 | Function invocation | Yes | | +| T321-04 | CALL statement | No | | +| T321-05 | RETURN statement | No | | +| **T631** | **IN predicate with one list element** | Yes | | \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sql-function-reference.md b/tidb-cloud-lake/sql/sql-function-reference.md new file mode 100644 index 0000000000000..d0bd54aae427a --- /dev/null +++ b/tidb-cloud-lake/sql/sql-function-reference.md @@ -0,0 +1,73 @@ +--- +title: SQL Function Reference +--- + +Databend provides comprehensive SQL functions for all types of data processing. Functions are organized by importance and usage frequency. + +## Core Data Functions + +| Category | Description | +|----------|-------------| +| [Numeric Functions](/tidb-cloud-lake/sql/numeric-functions.md) | Mathematical operations and calculations | +| [String Functions](/tidb-cloud-lake/sql/string-functions.md) | Text manipulation and string processing | +| [Date & Time Functions](/tidb-cloud-lake/sql/date-time-functions.md) | Date, time, and temporal operations | +| [Conversion Functions](/tidb-cloud-lake/sql/conversion-functions.md) | Type casting and data format conversions | +| [Conditional Functions](/tidb-cloud-lake/sql/conditional-functions.md) | Logic and control flow operations | + +## Analytics Functions + +| Category | Description | +|----------|-------------| +| [Aggregate Functions](/tidb-cloud-lake/sql/aggregate-functions.md) | Statistical calculations across multiple rows | +| [Window Functions](/tidb-cloud-lake/sql/window-functions.md) | Advanced analytics with window operations | + +## Structured & Semi-Structured Data + +| Category | Description | +|----------|-------------| +| [Structured & Semi-Structured Functions](/tidb-cloud-lake/sql/structured-semi-structured-functions.md) | JSON, arrays, objects, and nested data processing | + +## Search Functions + +| Category | Description | +|----------|-------------| +| [Full-Text Search Functions](/tidb-cloud-lake/sql/full-text-search-functions.md) | Full-text search and relevance scoring | + +## Vector Functions + +| Category | Description | +|----------|-------------| +| [Vector Functions](/tidb-cloud-lake/sql/vector-functions.md) | Vector similarity and distance calculations | + +## Geospatial Functions + +| Category | Description | +|----------|-------------| +| [Geospatial Functions](/tidb-cloud-lake/sql/geospatial-functions.md) | Geometry, GeoHash, and H3 spatial operations | + +## Data Management + +| Category | Description | +|----------|-------------| +| [Table Functions](/tidb-cloud-lake/sql/table-functions.md) | File inspection, data generation, and system information | +| [System Functions](/tidb-cloud-lake/sql/system-functions.md) | System information and management operations | +| [Context Functions](/tidb-cloud-lake/sql/context-functions.md) | Current session, user, and database information | + +## Security & Integrity + +| Category | Description | +|----------|-------------| +| [Hash Functions](/tidb-cloud-lake/sql/hash-functions.md) | Data hashing and integrity verification | +| [Bitmap Functions](/tidb-cloud-lake/sql/bitmap-functions.md) | High-performance bitmap operations and analytics | +| [UUID Functions](/tidb-cloud-lake/sql/uuid-functions.md) | Universally unique identifier generation | +| [IP Address Functions](/tidb-cloud-lake/sql/ip-address-functions.md) | Network address manipulation and validation | + +## Utility Functions + +| Category | Description | 
+|----------|-------------| +| [Interval Functions](/tidb-cloud-lake/sql/interval-functions.md) | Time unit conversion and interval creation | +| [Sequence Functions](/tidb-cloud-lake/sql/sequence-functions.md) | Auto-incrementing sequence value generation | +| [Data Anonymization Functions](/tidb-cloud-lake/sql/data-anonymization-functions.md) | Data masking and anonymization utilities | +| [Test Functions](/tidb-cloud-lake/sql/test-functions.md) | Testing and debugging utilities | +| [Other Functions](/tidb-cloud-lake/sql/other-functions.md) | Miscellaneous helpers and utilities | diff --git a/tidb-cloud-lake/sql/sql-identifiers.md b/tidb-cloud-lake/sql/sql-identifiers.md new file mode 100644 index 0000000000000..aae2b1a3d1a84 --- /dev/null +++ b/tidb-cloud-lake/sql/sql-identifiers.md @@ -0,0 +1,153 @@ +--- +title: SQL Identifiers +sidebar_label: SQL Identifiers +--- + +SQL identifiers are names used for different elements within Databend, such as tables, views, and databases. + +## Unquoted & Double-quoted Identifiers + +Unquoted identifiers begin with a letter (A-Z, a-z) or underscore (“_”) and may consist of letters, underscores, numbers (0-9), or dollar signs (“$”). + +```text title='Examples:' +mydatabend +MyDatabend1 +My$databend +_my_databend +``` + +Double-quoted identifiers can include a wide range of characters, such as numbers (0-9), special characters (like period (.), single quote ('), exclamation mark (!), at symbol (@), number sign (#), dollar sign ($), percent sign (%), caret (^), and ampersand (&)), extended ASCII and non-ASCII characters, as well as blank spaces. + +```text title='Examples:' +"MyDatabend" +"my.databend" +"my databend" +"My 'Databend'" +"1_databend" +"$Databend" +``` + +Note that using double backticks (``) or double quotes (") is equivalent: + +```text title='Examples:' +`MyDatabend` +`my.databend` +`my databend` +`My 'Databend'` +`1_databend` +`$Databend` +``` + +## Identifier Casing Rules + +Databend stores unquoted identifiers by default in lowercase and double-quoted identifiers as they are entered. In other words, Databend handles object names, such as databases, tables, and columns, as case-insensitive. If you want Databend to handle them as case-sensitive, double-quote them. + +:::note +Databend allows you to have control over the casing sensitivity of identifiers. Two key settings are available: + +- unquoted_ident_case_sensitive: When set to 1, this option preserves the case of characters for unquoted identifiers, ensuring they are case-sensitive. If left at the default value of 0, unquoted identifiers remain case-insensitive, converting to lowercase. + +- quoted_ident_case_sensitive: By setting this option to 0, you can indicate that double-quoted identifiers should not preserve the case of characters, making them case-insensitive. +::: + +This example demonstrates how Databend treats the casing of identifiers when creating and listing databases: + +```sql +-- Create a database named "databend" +CREATE DATABASE databend; + +-- Attempt to create a database named "Databend" +CREATE DATABASE Databend; + +>> SQL Error [1105] [HY000]: DatabaseAlreadyExists. Code: 2301, Text = Database 'databend' already exists. 
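+-- The error occurs because the unquoted identifier Databend is folded to lowercase,
+-- so it collides with the existing database named "databend".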
+ +-- Create a database named "Databend" +CREATE DATABASE "Databend"; + +-- List all databases +SHOW DATABASES; + +databases_in_default| +--------------------+ +Databend | +databend | +default | +information_schema | +system | +``` + +This example demonstrates how Databend handles identifier casing for table and column names, highlighting its case-sensitivity by default and the use of double quotes to differentiate between identifiers with varying casing: + +```sql +-- Create a table named "databend" +CREATE TABLE databend (a INT); +DESC databend; + +Field|Type|Null|Default|Extra| +-----+----+----+-------+-----+ +a |INT |YES |NULL | | + +-- Attempt to create a table named "Databend" +CREATE TABLE Databend (a INT); + +>> SQL Error [1105] [HY000]: TableAlreadyExists. Code: 2302, Text = Table 'databend' already exists. + +-- Attempt to create a table with one column named "a" and the other one named "A" +CREATE TABLE "Databend" (a INT, A INT); + +>> SQL Error [1105] [HY000]: BadArguments. Code: 1006, Text = Duplicated column name: a. + +-- Double quote the column names +CREATE TABLE "Databend" ("a" INT, "A" INT); +DESC "Databend"; + +Field|Type|Null|Default|Extra| +-----+----+----+-------+-----+ +a |INT |YES |NULL | | +A |INT |YES |NULL | | +``` + +## String Identifiers + +In Databend, when managing string items like text and dates, it is essential to enclose them within single quotes (') as a standard practice. + +```sql +INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'); + +SELECT 'Databend'; + +'databend'| +----------+ +Databend | + +SELECT "Databend"; + +>> SQL Error [1105] [HY000]: SemanticError. Code: 1065, Text = error: + --> SQL:1:73 + | +1 | /* ApplicationName=DBeaver 23.2.0 - SQLEditor */ SELECT "Databend" + | ^^^^^^^^^^ column Databend doesn't exist, do you mean 'Databend'? +``` + +By default, Databend SQL dialect is `PostgreSQL`: + +```sql +SHOW SETTINGS LIKE '%sql_dialect%'; + +name |value |default |level |description |type | +-----------+----------+----------+-------+---------------------------------------------------------------------------------+------+ +sql_dialect|PostgreSQL|PostgreSQL|SESSION|Sets the SQL dialect. Available values include "PostgreSQL", "MySQL", and "Hive".|String| +``` + +You can change it to `MySQL` to enable double quotes (`"`): + +```sql +SET sql_dialect='MySQL'; + +SELECT "demo"; ++--------+ +| 'demo' | ++--------+ +| demo | ++--------+ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/sql-reference.md b/tidb-cloud-lake/sql/sql-reference.md new file mode 100644 index 0000000000000..5362db9849297 --- /dev/null +++ b/tidb-cloud-lake/sql/sql-reference.md @@ -0,0 +1,14 @@ +--- +title: SQL Reference +slug: '/' +--- + +Welcome to SQL Reference – your swift-access guide for Databend essentials! + +- **General Reference:** Provides insights into foundational elements like Data Types, System Tables, and Table Engines, helping you build a solid understanding of Databend's structure. + +- **SQL Commands:** Detailed information, syntax, and practical examples for executing commands, empowering confident data management in Databend. + +- **SQL Functions:** A concise guide to Databend's functions, providing insights into their diverse functionalities for effective data management and analysis. + +- **Stored Procedure & Scripting:** Covers the SQL scripting language, including variables, control flow, result handling, and dynamic execution within stored procedures. 
diff --git a/tidb-cloud-lake/sql/sql-statements-reference.md b/tidb-cloud-lake/sql/sql-statements-reference.md new file mode 100644 index 0000000000000..467eecf661423 --- /dev/null +++ b/tidb-cloud-lake/sql/sql-statements-reference.md @@ -0,0 +1,16 @@ +--- +title: SQL Commands Reference +--- + +These topics provide reference information for various SQL commands in Databend. + +## Command Categories + +| Category | Description | +|----------|-------------| +| **[DDL Commands](/tidb-cloud-lake/sql/ddl.md)** | Data Definition Language - Create, alter, and drop database objects | +| **[DML Commands](/tidb-cloud-lake/sql/dml.md)** | Data Manipulation Language - Insert, update, delete, and copy data | +| **[Query Syntax](/tidb-cloud-lake/sql/query-syntax.md)** | SELECT statement components - FROM, WHERE, GROUP BY, JOIN, etc. | +| **[Query Operators](/tidb-cloud-lake/sql/query-operators.md)** | Arithmetic, comparison, logical, and other operators | +| **[EXPLAIN Commands](/tidb-cloud-lake/sql/explain-commands.md)** | Query analysis and optimization tools | +| **[Administration Commands](/tidb-cloud-lake/sql/administration-commands.md)** | System monitoring, configuration, and maintenance | diff --git a/tidb-cloud-lake/sql/sql-variables.md b/tidb-cloud-lake/sql/sql-variables.md new file mode 100644 index 0000000000000..8b19566cd07d3 --- /dev/null +++ b/tidb-cloud-lake/sql/sql-variables.md @@ -0,0 +1,57 @@ +--- +title: SQL Variables +sidebar_label: SQL Variables +--- + +SQL variables allow you to store and manage temporary data within a session, making scripts more dynamic and reusable. + +## Variable Commands + +| Command | Description | +|---------|-------------| +| [SET VARIABLE](/tidb-cloud-lake/sql/set-variable.md) | Creates or modifies a session or user variable. | +| [UNSET VARIABLE](/tidb-cloud-lake/sql/unset-variable.md) | Removes a user-defined variable. | +| [SHOW VARIABLES](/tidb-cloud-lake/sql/show-variables.md) | Displays current values of system and user variables. | + +The SHOW VARIABLES command also has a table function counterpart, [`SHOW_VARIABLES`](/tidb-cloud-lake/sql/show-variables.md), which returns the same information in a tabular format for richer filtering and querying. + +## Querying with Variables + +You can reference variables in statements for dynamic value substitution or to build object names at runtime. + +### Accessing Variables with `$` and `getvariable()` + +Use the `$` symbol or the `getvariable()` function to embed variable values directly in a query. + +```sql title='Example:' +-- Set a variable to use as a filter value +SET VARIABLE threshold = 100; + +-- Use the variable in a query with $ +SELECT * FROM sales WHERE amount > $threshold; + +-- Alternatively, use the getvariable() function +SELECT * FROM sales WHERE amount > getvariable('threshold'); +``` + +### Accessing Objects with `IDENTIFIER` + +The `IDENTIFIER` keyword lets you reference database objects whose names are stored in variables, enabling flexible query construction. (Note: BendSQL does not yet support `IDENTIFIER`.) 
+ +```sql title='Example:' +-- Create a table with sales data +CREATE TABLE sales_data (region TEXT, sales_amount INT, month TEXT) AS +SELECT 'North', 5000, 'January' UNION ALL +SELECT 'South', 3000, 'January'; + +select * from sales_data; + +-- Set variables for the table name and column name +SET VARIABLE table_name = 'sales_data'; +SET VARIABLE column_name = 'sales_amount'; + +-- Use IDENTIFIER to dynamically reference the table and column in the query +SELECT region, IDENTIFIER($column_name) +FROM IDENTIFIER($table_name) +WHERE IDENTIFIER($column_name) > 4000; +``` diff --git a/tidb-cloud-lake/sql/sqrt.md b/tidb-cloud-lake/sql/sqrt.md new file mode 100644 index 0000000000000..4b6362deda091 --- /dev/null +++ b/tidb-cloud-lake/sql/sqrt.md @@ -0,0 +1,23 @@ +--- +title: SQRT +--- + +Returns the square root of a nonnegative number `x`. Returns Nan for negative input. + +## Syntax + +```sql +SQRT( ) +``` + +## Examples + +```sql +SELECT SQRT(4); + +┌─────────┐ +│ sqrt(4) │ +├─────────┤ +│ 2 │ +└─────────┘ +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/st-area.md b/tidb-cloud-lake/sql/st-area.md new file mode 100644 index 0000000000000..5cf2034930377 --- /dev/null +++ b/tidb-cloud-lake/sql/st-area.md @@ -0,0 +1,56 @@ +--- +title: ST_AREA +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the area of a GEOMETRY or GEOGRAPHY object. For GEOMETRY inputs, the function uses planar area based on the [shoelace formula](https://en.wikipedia.org/wiki/Shoelace_formula). For GEOGRAPHY inputs, the function measures geodesic area on an ellipsoidal model of the earth using the method described in [Karney (2013)](https://arxiv.org/pdf/1109.4448.pdf). + +## Syntax + +```sql +ST_AREA() +``` + +## Arguments + +| Arguments | Description | +|---------------------------|-----------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_AREA( + TO_GEOMETRY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') + ) AS area + +┌──────┐ +│ area │ +├──────┤ +│ 1.0 │ +└──────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_AREA( + TO_GEOGRAPHY('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') + ) AS area + +╭────────────────────╮ +│ area │ +├────────────────────┤ +│ 12308778361.469452 │ +╰────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-asbinary.md b/tidb-cloud-lake/sql/st-asbinary.md new file mode 100644 index 0000000000000..1efcffadf3e46 --- /dev/null +++ b/tidb-cloud-lake/sql/st-asbinary.md @@ -0,0 +1,5 @@ +--- +title: ST_ASBINARY +--- + +Alias for [ST_ASWKB](/tidb-cloud-lake/sql/st-aswkb.md). diff --git a/tidb-cloud-lake/sql/st-asewkb.md b/tidb-cloud-lake/sql/st-asewkb.md new file mode 100644 index 0000000000000..e4ce441d812c9 --- /dev/null +++ b/tidb-cloud-lake/sql/st-asewkb.md @@ -0,0 +1,73 @@ +--- +title: ST_ASEWKB +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY or GEOGRAPHY object into a [EWKB(extended well-known-binary)](https://postgis.net/docs/ST_GeomFromEWKB.html) format representation. + +## Syntax + +```sql +ST_ASEWKB() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Binary. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ASEWKB( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) AS pipeline_ewkb; + +┌────────────────────────────────────────────────────────────────────────────────────────────┐ +│ pipeline_ewkb │ +├────────────────────────────────────────────────────────────────────────────────────────────┤ +│ 0102000020E61000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │ +└────────────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT + ST_ASEWKB( + ST_GEOMETRYFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_ewkb; + +┌────────────────────────────────────────────────────┐ +│ pipeline_ewkb │ +├────────────────────────────────────────────────────┤ +│ 0101000020E61000006666666666965EC06666666666C64240 │ +└────────────────────────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ASEWKB( + ST_GEOGFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_ewkb; + +╭────────────────────────────────────────────────────╮ +│ pipeline_ewkb │ +├────────────────────────────────────────────────────┤ +│ 0101000020E61000006666666666965EC06666666666C64240 │ +╰────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-asewkt.md b/tidb-cloud-lake/sql/st-asewkt.md new file mode 100644 index 0000000000000..d39df8b99d553 --- /dev/null +++ b/tidb-cloud-lake/sql/st-asewkt.md @@ -0,0 +1,73 @@ +--- +title: ST_ASEWKT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY or GEOGRAPHY object into a [EWKT(extended well-known-text)](https://postgis.net/docs/ST_GeomFromEWKT.html) format representation. + +## Syntax + +```sql +ST_ASEWKT() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +String. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ASEWKT( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) AS pipeline_ewkt; + +┌─────────────────────────────────────────────────────┐ +│ pipeline_ewkt │ +├─────────────────────────────────────────────────────┤ +│ SRID=4326;LINESTRING(400000 6000000,401000 6010000) │ +└─────────────────────────────────────────────────────┘ + +SELECT + ST_ASEWKT( + ST_GEOMETRYFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_ewkt; + +┌────────────────────────────────┐ +│ pipeline_ewkt │ +├────────────────────────────────┤ +│ SRID=4326;POINT(-122.35 37.55) │ +└────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ASEWKT( + ST_GEOGFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_ewkt; + +╭────────────────────────────────╮ +│ pipeline_ewkt │ +├────────────────────────────────┤ +│ SRID=4326;POINT(-122.35 37.55) │ +╰────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-asgeojson.md b/tidb-cloud-lake/sql/st-asgeojson.md new file mode 100644 index 0000000000000..0d0b6751e8436 --- /dev/null +++ b/tidb-cloud-lake/sql/st-asgeojson.md @@ -0,0 +1,59 @@ +--- +title: ST_ASGEOJSON +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY or GEOGRAPHY object into a [GeoJSON](https://geojson.org/) representation. 
+ +## Syntax + +```sql +ST_ASGEOJSON() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Variant. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ASGEOJSON( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) AS pipeline_geojson; + +┌─────────────────────────────────────────────────────────────────────────┐ +│ pipeline_geojson │ +├─────────────────────────────────────────────────────────────────────────┤ +│ {"coordinates":[[400000,6000000],[401000,6010000]],"type":"LineString"} │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ASGEOJSON( + ST_GEOGFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_geojson; + +╭────────────────────────────────────────────────╮ +│ pipeline_geojson │ +├────────────────────────────────────────────────┤ +│ {"coordinates":[-122.35,37.55],"type":"Point"} │ +╰────────────────────────────────────────────────╯``` diff --git a/tidb-cloud-lake/sql/st-astext.md b/tidb-cloud-lake/sql/st-astext.md new file mode 100644 index 0000000000000..9d0ce235f4a52 --- /dev/null +++ b/tidb-cloud-lake/sql/st-astext.md @@ -0,0 +1,5 @@ +--- +title: ST_ASTEXT +--- + +Alias for [ST_ASWKT](/tidb-cloud-lake/sql/st-aswkt.md). diff --git a/tidb-cloud-lake/sql/st-aswkb.md b/tidb-cloud-lake/sql/st-aswkb.md new file mode 100644 index 0000000000000..cce2da9d6e0f1 --- /dev/null +++ b/tidb-cloud-lake/sql/st-aswkb.md @@ -0,0 +1,77 @@ +--- +title: ST_ASWKB +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY or GEOGRAPHY object into a [WKB(well-known-binary)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) format representation. + +## Syntax + +```sql +ST_ASWKB() +``` + +## Aliases + +- [ST_ASBINARY](/tidb-cloud-lake/sql/st-asbinary.md) + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Binary. 
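+
+Unlike [ST_ASEWKB](/tidb-cloud-lake/sql/st-asewkb.md), the plain WKB form does not embed the SRID. A minimal sketch comparing the two encodings of the same point (the expected values in the comments are taken from the examples on this page and on the ST_ASEWKB page):
+
+```sql
+SELECT
+  ST_ASWKB(ST_GEOMETRYFROMWKT('SRID=4326;POINT(-122.35 37.55)'))  AS wkb,   -- 01010000006666666666965EC06666666666C64240
+  ST_ASEWKB(ST_GEOMETRYFROMWKT('SRID=4326;POINT(-122.35 37.55)')) AS ewkb;  -- 0101000020E61000006666666666965EC06666666666C64240
+```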
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ASWKB( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) AS pipeline_wkb; + +┌────────────────────────────────────────────────────────────────────────────────────┐ +│ pipeline_wkb │ +├────────────────────────────────────────────────────────────────────────────────────┤ +│ 01020000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │ +└────────────────────────────────────────────────────────────────────────────────────┘ + +SELECT + ST_ASBINARY( + ST_GEOMETRYFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_wkb; + +┌────────────────────────────────────────────┐ +│ pipeline_wkb │ +├────────────────────────────────────────────┤ +│ 01010000006666666666965EC06666666666C64240 │ +└────────────────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ASWKB( + ST_GEOGFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_wkb; + +╭────────────────────────────────────────────╮ +│ pipeline_wkb │ +├────────────────────────────────────────────┤ +│ 01010000006666666666965EC06666666666C64240 │ +╰────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-aswkt.md b/tidb-cloud-lake/sql/st-aswkt.md new file mode 100644 index 0000000000000..2e3d55a682359 --- /dev/null +++ b/tidb-cloud-lake/sql/st-aswkt.md @@ -0,0 +1,77 @@ +--- +title: ST_ASWKT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY or GEOGRAPHY object into a [WKT(well-known-text)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) format representation. + +## Syntax + +```sql +ST_ASWKT() +``` + +## Aliases + +- [ST_ASTEXT](/tidb-cloud-lake/sql/st-astext.md) + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +String. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ASWKT( + ST_GEOMETRYFROMWKT( + 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' + ) + ) AS pipeline_wkt; + +┌───────────────────────────────────────────┐ +│ pipeline_wkt │ +├───────────────────────────────────────────┤ +│ LINESTRING(400000 6000000,401000 6010000) │ +└───────────────────────────────────────────┘ + +SELECT + ST_ASTEXT( + ST_GEOMETRYFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_wkt; + +┌──────────────────────┐ +│ pipeline_wkt │ +├──────────────────────┤ +│ POINT(-122.35 37.55) │ +└──────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ASWKT( + ST_GEOGFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ) + ) AS pipeline_wkt; + +╭──────────────────────╮ +│ pipeline_wkt │ +├──────────────────────┤ +│ POINT(-122.35 37.55) │ +╰──────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-contains.md b/tidb-cloud-lake/sql/st-contains.md new file mode 100644 index 0000000000000..c0e75ca24e0e3 --- /dev/null +++ b/tidb-cloud-lake/sql/st-contains.md @@ -0,0 +1,58 @@ +--- +title: ST_CONTAINS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns TRUE if the second GEOMETRY object is completely inside the first GEOMETRY object. 
+ +## Syntax + +```sql +ST_CONTAINS(, ) +``` + +## Arguments + +| Arguments | Description | +|---------------|----------------------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY object that is not a GeometryCollection. | +| `` | The argument must be an expression of type GEOMETRY object that is not a GeometryCollection. | + +:::note +- The function reports an error if the two input GEOMETRY objects have different SRIDs. +::: + +## Return Type + +Boolean. + +## Examples + +```sql +SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('POLYGON((-1 0, 0 1, 1 0, -1 0))')) AS contains + +┌──────────┐ +│ contains │ +├──────────┤ +│ true │ +└──────────┘ + +SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('LINESTRING(-1 1, 0 2, 1 1)')) AS contains + +┌──────────┐ +│ contains │ +├──────────┤ +│ false │ +└──────────┘ + +SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('LINESTRING(-2 0, 0 0, 0 1)')) AS contains + +┌──────────┐ +│ contains │ +├──────────┤ +│ true │ +└──────────┘ + +``` diff --git a/tidb-cloud-lake/sql/st-dimension.md b/tidb-cloud-lake/sql/st-dimension.md new file mode 100644 index 0000000000000..bba01a29300cb --- /dev/null +++ b/tidb-cloud-lake/sql/st-dimension.md @@ -0,0 +1,92 @@ +--- +title: ST_DIMENSION +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Return the dimension for a geometry object. The dimension of a GEOMETRY or GEOGRAPHY object is: + +| Geospatial Object Type | Dimension | +|------------------------------|------------| +| Point / MultiPoint | 0 | +| LineString / MultiLineString | 1 | +| Polygon / MultiPolygon | 2 | + +## Syntax + +```sql +ST_DIMENSION() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +UInt8. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_DIMENSION( + ST_GEOMETRYFROMWKT( + 'POINT(-122.306100 37.554162)' + ) + ) AS pipeline_dimension; + +┌────────────────────┐ +│ pipeline_dimension │ +├────────────────────┤ +│ 0 │ +└────────────────────┘ + +SELECT + ST_DIMENSION( + ST_GEOMETRYFROMWKT( + 'LINESTRING(-124.20 42.00, -120.01 41.99)' + ) + ) AS pipeline_dimension; + +┌────────────────────┐ +│ pipeline_dimension │ +├────────────────────┤ +│ 1 │ +└────────────────────┘ + +SELECT + ST_DIMENSION( + ST_GEOMETRYFROMWKT( + 'POLYGON((-124.20 42.00, -120.01 41.99, -121.1 42.01, -124.20 42.00))' + ) + ) AS pipeline_dimension; + +┌────────────────────┐ +│ pipeline_dimension │ +├────────────────────┤ +│ 2 │ +└────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_DIMENSION( + ST_GEOGFROMWKT( + 'LINESTRING(-124.20 42.00, -120.01 41.99)' + ) + ) AS pipeline_dimension; + +╭────────────────────╮ +│ pipeline_dimension │ +├────────────────────┤ +│ 1 │ +╰────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-distance.md b/tidb-cloud-lake/sql/st-distance.md new file mode 100644 index 0000000000000..e1818ac2bbfcc --- /dev/null +++ b/tidb-cloud-lake/sql/st-distance.md @@ -0,0 +1,64 @@ +--- +title: ST_DISTANCE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the minimum distance between two objects. For GEOMETRY inputs, the function uses [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance). 
For GEOGRAPHY inputs, the function uses [haversine distance](https://en.wikipedia.org/wiki/Haversine_formula). + +## Syntax + +```sql +ST_DISTANCE(, ) +``` + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY and must contain a Point. | +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY and must contain a Point. | + +:::note +- Returns NULL if one or more input points are NULL. +- The function reports an error if the two input GEOMETRY or GEOGRAPHY objects have different SRIDs. +::: + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_DISTANCE( + TO_GEOMETRY('POINT(0 0)'), + TO_GEOMETRY('POINT(1 1)') + ) AS distance + +┌─────────────┐ +│ distance │ +├─────────────┤ +│ 1.414213562 │ +└─────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_DISTANCE( + ST_GEOGFROMWKT('POINT(0 0)'), + ST_GEOGFROMWKT('POINT(1 0)') + ) AS distance + +╭──────────────────╮ +│ distance │ +├──────────────────┤ +│ 111195.080233533 │ +╰──────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-endpoint.md b/tidb-cloud-lake/sql/st-endpoint.md new file mode 100644 index 0000000000000..e997c6f36f5d7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-endpoint.md @@ -0,0 +1,60 @@ +--- +title: ST_ENDPOINT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the last Point in a LineString. + +## Syntax + +```sql +ST_ENDPOINT() +``` + +## Arguments + +| Arguments | Description | +|--------------|-----------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY that represents a LineString. | + +## Return Type + +Geometry. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_ENDPOINT( + ST_GEOMETRYFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ) + ) AS pipeline_endpoint; + +┌───────────────────┐ +│ pipeline_endpoint │ +├───────────────────┤ +│ POINT(4 4) │ +└───────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_ENDPOINT( + ST_GEOGFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ) + ) AS pipeline_endpoint; + +┌───────────────────┐ +│ pipeline_endpoint │ +├───────────────────┤ +│ POINT(4 4) │ +└───────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geogetryfromwkb.md b/tidb-cloud-lake/sql/st-geogetryfromwkb.md new file mode 100644 index 0000000000000..d1a9a51c3a399 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogetryfromwkb.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGETRYFROMWKB +--- + +Alias for [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md). diff --git a/tidb-cloud-lake/sql/st-geogfromewkb.md b/tidb-cloud-lake/sql/st-geogfromewkb.md new file mode 100644 index 0000000000000..91591abe0c28d --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogfromewkb.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGFROMEWKB +--- + +Alias for [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md). 
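+
+Since this is only an alias, it accepts the same input and returns the same result as [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md). A minimal sketch reusing the EWKB input from the ST_GEOGRAPHYFROMWKB example (the expected output is the one documented there):
+
+```sql
+SELECT
+  ST_ASWKT(
+    ST_GEOGFROMEWKB('0101000020E6100000000000000000F03F0000000000000040')
+  ) AS pipeline_geography;
+-- Expected result: POINT(1 2)
+```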
diff --git a/tidb-cloud-lake/sql/st-geogfromgeohash.md b/tidb-cloud-lake/sql/st-geogfromgeohash.md new file mode 100644 index 0000000000000..b1261dec0cc04 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogfromgeohash.md @@ -0,0 +1,42 @@ +--- +title: ST_GEOGFROMGEOHASH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a GEOGRAPHY object for the polygon that represents the boundaries of a [geohash](https://en.wikipedia.org/wiki/Geohash). + +## Syntax + +```sql +ST_GEOGFROMGEOHASH() +``` + +## Arguments + +| Arguments | Description | +|-------------|---------------------------------| +| `` | The argument must be a geohash. | + +## Return Type + +Geography. + +## Examples + +```sql +SELECT + ST_ASWKT( + ST_GEOGFROMGEOHASH( + '9q60y60rhs' + ) + ) AS pipeline_geography; + +╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ pipeline_geography │ +│ String │ +├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ POLYGON((-120.66229462623596 35.30029535293579,-120.66229462623596 35.30030071735382,-120.66230535507202 35.30030071735382,-120.66230535507202 35.30029535293579… │ +╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-geogfromtext.md b/tidb-cloud-lake/sql/st-geogfromtext.md new file mode 100644 index 0000000000000..c31a1fa6acb63 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogfromtext.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGFROMTEXT +--- + +Alias for [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geogfromwkb.md b/tidb-cloud-lake/sql/st-geogfromwkb.md new file mode 100644 index 0000000000000..fc9f2137c423a --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogfromwkb.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGFROMWKB +--- + +Alias for [ST_GEOGRAPHYFROMWKB](/tidb-cloud-lake/sql/st-geographyfromwkb.md). diff --git a/tidb-cloud-lake/sql/st-geogfromwkt.md b/tidb-cloud-lake/sql/st-geogfromwkt.md new file mode 100644 index 0000000000000..57958adf7f919 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogfromwkt.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGFROMWKT +--- + +Alias for [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geogpointfromgeohash.md b/tidb-cloud-lake/sql/st-geogpointfromgeohash.md new file mode 100644 index 0000000000000..29da3dc7f850f --- /dev/null +++ b/tidb-cloud-lake/sql/st-geogpointfromgeohash.md @@ -0,0 +1,42 @@ +--- +title: ST_GEOGPOINTFROMGEOHASH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a GEOGRAPHY object for the point that represents center of a [geohash](https://en.wikipedia.org/wiki/Geohash). + +## Syntax + +```sql +ST_GEOGPOINTFROMGEOHASH() +``` + +## Arguments + +| Arguments | Description | +|-------------|---------------------------------| +| `` | The argument must be a geohash. | + +## Return Type + +Geography. 
+ +## Examples + +```sql +SELECT + ST_ASWKT( + ST_GEOGPOINTFROMGEOHASH( + 's02equ0' + ) + ) AS pipeline_geography; + +╭──────────────────────────────────────────────╮ +│ pipeline_geography │ +│ String │ +├──────────────────────────────────────────────┤ +│ POINT(1.0004425048828125 2.0001983642578125) │ +╰──────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-geographyfromewkt.md b/tidb-cloud-lake/sql/st-geographyfromewkt.md new file mode 100644 index 0000000000000..d2d76356367e1 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geographyfromewkt.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGRAPHYFROMEWKT +--- + +Alias for [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geographyfromtext.md b/tidb-cloud-lake/sql/st-geographyfromtext.md new file mode 100644 index 0000000000000..56804c3e03faf --- /dev/null +++ b/tidb-cloud-lake/sql/st-geographyfromtext.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOGRAPHYFROMTEXT +--- + +Alias for [ST_GEOGRAPHYFROMWKT](/tidb-cloud-lake/sql/st-geographyfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geographyfromwkb.md b/tidb-cloud-lake/sql/st-geographyfromwkb.md new file mode 100644 index 0000000000000..c50860394b95e --- /dev/null +++ b/tidb-cloud-lake/sql/st-geographyfromwkb.md @@ -0,0 +1,66 @@ +--- +title: ST_GEOGRAPHYFROMWKB +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Parses a [WKB(well-known-binary)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) or [EWKB(extended well-known-binary)](https://postgis.net/docs/ST_GeomFromEWKB.html) input and returns a value of type GEOGRAPHY. + +## Syntax + +```sql +ST_GEOGRAPHYFROMWKB() +ST_GEOGRAPHYFROMWKB() +``` + +## Aliases + +- [ST_GEOGFROMWKB](/tidb-cloud-lake/sql/st-geogfromwkb.md) +- [ST_GEOGETRYFROMWKB](/tidb-cloud-lake/sql/st-geogetryfromwkb.md) +- [ST_GEOGFROMEWKB](/tidb-cloud-lake/sql/st-geogfromewkb.md) + +## Arguments + +| Arguments | Description | +|-------------|--------------------------------------------------------------------------------| +| `` | The argument must be a string expression in WKB or EWKB in hexadecimal format. | +| `` | The argument must be a binary expression in WKB or EWKB format. | + +:::note +Only SRID 4326 is supported for GEOGRAPHY inputs. +::: + +## Return Type + +Geography. + +## Examples + +```sql +SELECT + ST_ASWKT( + ST_GEOGRAPHYFROMWKB( + '0101000020E6100000000000000000F03F0000000000000040' + ) + ) AS pipeline_geography; + +┌────────────────────┐ +│ pipeline_geography │ +├────────────────────┤ +│ POINT(1 2) │ +└────────────────────┘ + +SELECT + ST_ASWKT( + ST_GEOGRAPHYFROMWKB( + FROM_HEX('0101000000000000000000F03F0000000000000040') + ) + ) AS pipeline_geography; + +┌────────────────────┐ +│ pipeline_geography │ +├────────────────────┤ +│ POINT(1 2) │ +└────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geographyfromwkt.md b/tidb-cloud-lake/sql/st-geographyfromwkt.md new file mode 100644 index 0000000000000..3205f7136b15d --- /dev/null +++ b/tidb-cloud-lake/sql/st-geographyfromwkt.md @@ -0,0 +1,65 @@ +--- +title: ST_GEOGRAPHYFROMWKT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Parses a [WKT(well-known-text)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) or [EWKT(extended well-known-text)](https://postgis.net/docs/ST_GeomFromEWKT.html) input and returns a value of type GEOGRAPHY. 
+ +## Syntax + +```sql +ST_GEOGRAPHYFROMWKT() +``` + +## Aliases + +- [ST_GEOGFROMWKT](/tidb-cloud-lake/sql/st-geogfromwkt.md) +- [ST_GEOGRAPHYFROMEWKT](/tidb-cloud-lake/sql/st-geographyfromewkt.md) +- [ST_GEOGRAPHYFROMTEXT](/tidb-cloud-lake/sql/st-geographyfromtext.md) +- [ST_GEOGFROMTEXT](/tidb-cloud-lake/sql/st-geogfromtext.md) + +## Arguments + +| Arguments | Description | +|-------------|-----------------------------------------------------------------| +| `` | The argument must be a string expression in WKT or EWKT format. | + +:::note +Only SRID 4326 is supported for GEOGRAPHY inputs. +::: + +## Return Type + +Geography. + +## Examples + +```sql +SELECT + ST_ASWKT( + ST_GEOGRAPHYFROMWKT( + 'POINT(1 2)' + ) + ) AS pipeline_geography; + +┌────────────────────┐ +│ pipeline_geography │ +├────────────────────┤ +│ POINT(1 2) │ +└────────────────────┘ + +SELECT + ST_ASEWKT( + ST_GEOGRAPHYFROMWKT( + 'SRID=4326;POINT(1 2)' + ) + ) AS pipeline_geography; + +┌──────────────────────┐ +│ pipeline_geography │ +├──────────────────────┤ +│ SRID=4326;POINT(1 2) │ +└──────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geohash.md b/tidb-cloud-lake/sql/st-geohash.md new file mode 100644 index 0000000000000..a50824b1d1945 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geohash.md @@ -0,0 +1,75 @@ +--- +title: ST_GEOHASH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Return the [geohash](https://en.wikipedia.org/wiki/Geohash) for a GEOMETRY or GEOGRAPHY object. A geohash is a short base32 string that identifies a geodesic rectangle containing a location in the world. The optional precision argument specifies the `precision` of the returned geohash. For example, passing 5 for `precision returns a shorter geohash (5 characters long) that is less precise. + +## Syntax + +```sql +ST_GEOHASH( [, ]) +``` + +## Arguments + +| Arguments | Description | +|-----------------|---------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | +| `[precision]` | Optional. specifies the precision of the returned geohash, default is 12. | + +## Return Type + +String. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_GEOHASH( + ST_GEOMETRYFROMWKT( + 'POINT(-122.306100 37.554162)' + ) + ) AS pipeline_geohash; + +┌──────────────────┐ +│ pipeline_geohash │ +├──────────────────┤ +│ 9q9j8ue2v71y │ +└──────────────────┘ + +SELECT + ST_GEOHASH( + ST_GEOMETRYFROMWKT( + 'SRID=4326;POINT(-122.35 37.55)' + ), + 5 + ) AS pipeline_geohash; + +┌──────────────────┐ +│ pipeline_geohash │ +├──────────────────┤ +│ 9q8vx │ +└──────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_GEOHASH( + ST_GEOGFROMWKT( + 'POINT(-122.306100 37.554162)' + ) + ) AS pipeline_geohash; + +┌──────────────────┐ +│ pipeline_geohash │ +├──────────────────┤ +│ 9q9j8ue2v71y │ +└──────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geom-point.md b/tidb-cloud-lake/sql/st-geom-point.md new file mode 100644 index 0000000000000..1c02199b4ceb3 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geom-point.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOM_POINT +--- + +Alias for [ST_MAKEGEOMPOINT](/tidb-cloud-lake/sql/st-makegeompoint.md). 
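+
+A minimal sketch of the alias in use (this assumes, as with ST_MAKEGEOMPOINT, that the two arguments are interpreted as longitude and latitude and that a Point GEOMETRY is returned):
+
+```sql
+SELECT ST_GEOM_POINT(-122.35, 37.55) AS pipeline_geometry;
+-- Expected result: POINT(-122.35 37.55)
+```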
diff --git a/tidb-cloud-lake/sql/st-geometryfromewkb.md b/tidb-cloud-lake/sql/st-geometryfromewkb.md
new file mode 100644
index 0000000000000..0d072724d9da1
--- /dev/null
+++ b/tidb-cloud-lake/sql/st-geometryfromewkb.md
@@ -0,0 +1,5 @@
+---
+title: ST_GEOMETRYFROMEWKB
+---
+
+Alias for [ST_GEOMETRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md).
diff --git a/tidb-cloud-lake/sql/st-geometryfromewkt.md b/tidb-cloud-lake/sql/st-geometryfromewkt.md
new file mode 100644
index 0000000000000..2e4d9fd7fc001
--- /dev/null
+++ b/tidb-cloud-lake/sql/st-geometryfromewkt.md
@@ -0,0 +1,5 @@
+---
+title: ST_GEOMETRYFROMEWKT
+---
+
+Alias for [ST_GEOMETRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md).
diff --git a/tidb-cloud-lake/sql/st-geometryfromtext.md b/tidb-cloud-lake/sql/st-geometryfromtext.md
new file mode 100644
index 0000000000000..2fed89adeac00
--- /dev/null
+++ b/tidb-cloud-lake/sql/st-geometryfromtext.md
@@ -0,0 +1,5 @@
+---
+title: ST_GEOMETRYFROMTEXT
+---
+
+Alias for [ST_GEOMETRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md).
diff --git a/tidb-cloud-lake/sql/st-geometryfromwkb.md b/tidb-cloud-lake/sql/st-geometryfromwkb.md
new file mode 100644
index 0000000000000..6167c5a6f6f87
--- /dev/null
+++ b/tidb-cloud-lake/sql/st-geometryfromwkb.md
@@ -0,0 +1,59 @@
+---
+title: ST_GEOMETRYFROMWKB
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+Parses a [WKB (well-known binary)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) or [EWKB (extended well-known binary)](https://postgis.net/docs/ST_GeomFromEWKB.html) input and returns a value of type GEOMETRY.
+
+## Syntax
+
+```sql
+ST_GEOMETRYFROMWKB(<string>, [<srid>])
+ST_GEOMETRYFROMWKB(<binary>, [<srid>])
+```
+
+## Aliases
+
+- [ST_GEOMFROMWKB](/tidb-cloud-lake/sql/st-geomfromwkb.md)
+- [ST_GEOMETRYFROMEWKB](/tidb-cloud-lake/sql/st-geometryfromewkb.md)
+- [ST_GEOMFROMEWKB](/tidb-cloud-lake/sql/st-geomfromewkb.md)
+
+## Arguments
+
+| Arguments | Description |
+|--------------|----------------------------------------------------------------------------------|
+| `<string>` | The argument must be a string expression in WKB or EWKB format, in hexadecimal. |
+| `<binary>` | The argument must be a binary expression in WKB or EWKB format. |
+| `<srid>` | The integer value of the SRID to use. |
+
+## Return Type
+
+Geometry.
+ +## Examples + +```sql +SELECT + ST_GEOMETRYFROMWKB( + '0101000020797f000066666666a9cb17411f85ebc19e325641' + ) AS pipeline_geometry; + +┌────────────────────────────────────────┐ +│ pipeline_geometry │ +├────────────────────────────────────────┤ +│ SRID=32633;POINT(389866.35 5819003.03) │ +└────────────────────────────────────────┘ + +SELECT + ST_GEOMETRYFROMWKB( + FROM_HEX('0101000020797f000066666666a9cb17411f85ebc19e325641'), 4326 + ) AS pipeline_geometry; + +┌───────────────────────────────────────┐ +│ pipeline_geometry │ +├───────────────────────────────────────┤ +│ SRID=4326;POINT(389866.35 5819003.03) │ +└───────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geometryfromwkt.md b/tidb-cloud-lake/sql/st-geometryfromwkt.md new file mode 100644 index 0000000000000..8b6203b12b27e --- /dev/null +++ b/tidb-cloud-lake/sql/st-geometryfromwkt.md @@ -0,0 +1,60 @@ +--- +title: ST_GEOMETRYFROMWKT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Parses a [WKT(well-known-text)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) or [EWKT(extended well-known-text)](https://postgis.net/docs/ST_GeomFromEWKT.html) input and returns a value of type GEOMETRY. + +## Syntax + +```sql +ST_GEOMETRYFROMWKT(, []) +``` + +## Aliases + +- [ST_GEOMFROMWKT](/tidb-cloud-lake/sql/st-geomfromwkt.md) +- [ST_GEOMETRYFROMEWKT](/tidb-cloud-lake/sql/st-geometryfromewkt.md) +- [ST_GEOMFROMEWKT](/tidb-cloud-lake/sql/st-geomfromewkt.md) +- [ST_GEOMFROMTEXT](/tidb-cloud-lake/sql/st-geomfromtext.md) +- [ST_GEOMETRYFROMTEXT](/tidb-cloud-lake/sql/st-geometryfromtext.md) + +## Arguments + +| Arguments | Description | +|-------------|-----------------------------------------------------------------| +| `` | The argument must be a string expression in WKT or EWKT format. | +| `` | The integer value of the SRID to use. | + +## Return Type + +Geometry. + +## Examples + +```sql +SELECT + ST_GEOMETRYFROMWKT( + 'POINT(1820.12 890.56)' + ) AS pipeline_geometry; + +┌───────────────────────┐ +│ pipeline_geometry │ +├───────────────────────┤ +│ POINT(1820.12 890.56) │ +└───────────────────────┘ + +SELECT + ST_GEOMETRYFROMWKT( + 'POINT(1820.12 890.56)', 4326 + ) AS pipeline_geometry; + +┌─────────────────────────────────┐ +│ pipeline_geometry │ +│ Geometry │ +├─────────────────────────────────┤ +│ SRID=4326;POINT(1820.12 890.56) │ +└─────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geomfromewkb.md b/tidb-cloud-lake/sql/st-geomfromewkb.md new file mode 100644 index 0000000000000..2ac50f251d1e7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromewkb.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOMFROMEWKB +--- + +Alias for [ST_GEOMTRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md). diff --git a/tidb-cloud-lake/sql/st-geomfromewkt.md b/tidb-cloud-lake/sql/st-geomfromewkt.md new file mode 100644 index 0000000000000..872c3521ec74a --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromewkt.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOMFROMEWKT +--- + +Alias for [ST_GEOMTRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md). 
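+
+As a quick illustration, this alias accepts the same WKT or EWKT input as `ST_GEOMETRYFROMWKT`. The following minimal sketch assumes identical behavior to the full function name:
+
+```sql
+SELECT
+  ST_GEOMFROMEWKT(
+    'SRID=4326;POINT(1820.12 890.56)'
+  ) AS pipeline_geometry;
+```
+
+This should produce a GEOMETRY value carrying SRID 4326, matching the second example on the [ST_GEOMETRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md) page.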
diff --git a/tidb-cloud-lake/sql/st-geomfromgeohash.md b/tidb-cloud-lake/sql/st-geomfromgeohash.md new file mode 100644 index 0000000000000..c30858e614ed7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromgeohash.md @@ -0,0 +1,39 @@ +--- +title: ST_GEOMFROMGEOHASH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a GEOMETRY object for the polygon that represents the boundaries of a [geohash](https://en.wikipedia.org/wiki/Geohash). + +## Syntax + +```sql +ST_GEOMFROMGEOHASH() +``` + +## Arguments + +| Arguments | Description | +|-------------|---------------------------------| +| `` | The argument must be a geohash. | + +## Return Type + +Geometry. + +## Examples + +```sql +SELECT + ST_GEOMFROMGEOHASH( + '9q60y60rhs' + ) AS pipeline_geometry; + +┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ st_geomfromgeohash('9q60y60rhs') │ +├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ POLYGON((-120.66230535507202 35.30029535293579,-120.66230535507202 35.30030071735382,-120.66229462623596 35.30030071735382,-120.66229462623596 35.30029535293579,-120.66230535507202 35.30029535293579)) │ +└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-geomfromtext.md b/tidb-cloud-lake/sql/st-geomfromtext.md new file mode 100644 index 0000000000000..2e6c9cddd9795 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromtext.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOMFROMTEXT +--- + +Alias for [ST_GEOMTRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geomfromwkb.md b/tidb-cloud-lake/sql/st-geomfromwkb.md new file mode 100644 index 0000000000000..f855c81398775 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromwkb.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOMFROMWKB +--- + +Alias for [ST_GEOMTRYFROMWKB](/tidb-cloud-lake/sql/st-geometryfromwkb.md). diff --git a/tidb-cloud-lake/sql/st-geomfromwkt.md b/tidb-cloud-lake/sql/st-geomfromwkt.md new file mode 100644 index 0000000000000..8c96b91e73aee --- /dev/null +++ b/tidb-cloud-lake/sql/st-geomfromwkt.md @@ -0,0 +1,5 @@ +--- +title: ST_GEOMFROMWKT +--- + +Alias for [ST_GEOMTRYFROMWKT](/tidb-cloud-lake/sql/st-geometryfromwkt.md). diff --git a/tidb-cloud-lake/sql/st-geompointfromgeohash.md b/tidb-cloud-lake/sql/st-geompointfromgeohash.md new file mode 100644 index 0000000000000..af186f86bda92 --- /dev/null +++ b/tidb-cloud-lake/sql/st-geompointfromgeohash.md @@ -0,0 +1,40 @@ +--- +title: ST_GEOMPOINTFROMGEOHASH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a GEOMETRY object for the point that represents center of a [geohash](https://en.wikipedia.org/wiki/Geohash). + +## Syntax + +```sql +ST_GEOMPOINTFROMGEOHASH() +``` + +## Arguments + +| Arguments | Description | +|-------------|---------------------------------| +| `` | The argument must be a geohash. | + +## Return Type + +Geometry. 
+ +## Examples + +```sql +SELECT + ST_GEOMPOINTFROMGEOHASH( + 's02equ0' + ) AS pipeline_geometry; + +┌──────────────────────────────────────────────┐ +│ pipeline_geometry │ +│ Geometry │ +├──────────────────────────────────────────────┤ +│ POINT(1.0004425048828125 2.0001983642578125) │ +└──────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-length.md b/tidb-cloud-lake/sql/st-length.md new file mode 100644 index 0000000000000..ad0b27eb98561 --- /dev/null +++ b/tidb-cloud-lake/sql/st-length.md @@ -0,0 +1,81 @@ +--- +title: ST_LENGTH +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the Euclidean length of the LineString(s) in a GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_LENGTH() +``` + +## Arguments + +| Arguments | Description | +|--------------|-----------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY containing linestrings. | + +:::note +- If `` is not a `LineString`, `MultiLineString`, or `GeometryCollection` containing linestrings, returns 0. +- If `` is a `GeometryCollection`, returns the sum of the lengths of the linestrings in the collection. +::: + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_LENGTH(TO_GEOMETRY('POINT(1 1)')) AS length + +┌─────────┐ +│ length │ +├─────────┤ +│ 0 │ +└─────────┘ + +SELECT + ST_LENGTH(TO_GEOMETRY('LINESTRING(0 0, 1 1)')) AS length + +┌─────────────┐ +│ length │ +├─────────────┤ +│ 1.414213562 │ +└─────────────┘ + +SELECT + ST_LENGTH( + TO_GEOMETRY('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))') + ) AS length + +┌─────────┐ +│ length │ +├─────────┤ +│ 0 │ +└─────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_LENGTH( + ST_GEOGFROMWKT( + 'LINESTRING(0 0, 1 0)' + ) + ) AS length + +╭──────────────────╮ +│ length │ +├──────────────────┤ +│ 111319.490793274 │ +╰──────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-make-line.md b/tidb-cloud-lake/sql/st-make-line.md new file mode 100644 index 0000000000000..2f77a897916cd --- /dev/null +++ b/tidb-cloud-lake/sql/st-make-line.md @@ -0,0 +1,5 @@ +--- +title: ST_MAKE_LINE +--- + +Alias for [ST_MAKELINE](/tidb-cloud-lake/sql/st-makeline.md). diff --git a/tidb-cloud-lake/sql/st-makegeompoint.md b/tidb-cloud-lake/sql/st-makegeompoint.md new file mode 100644 index 0000000000000..d077cb76d260c --- /dev/null +++ b/tidb-cloud-lake/sql/st-makegeompoint.md @@ -0,0 +1,55 @@ +--- +title: ST_MAKEGEOMPOINT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Constructs a GEOMETRY object that represents a Point with the specified longitude and latitude. + +## Syntax + +```sql +ST_MAKEGEOMPOINT(, ) +``` + +## Aliases + +- [ST_GEOM_POINT](/tidb-cloud-lake/sql/st-geom-point.md) + +## Arguments + +| Arguments | Description | +|---------------|-----------------------------------------------| +| `` | A Double value that represents the longitude. | +| `` | A Double value that represents the latitude. | + +## Return Type + +Geometry. 
+ +## Examples + +```sql +SELECT + ST_MAKEGEOMPOINT( + 7.0, 8.0 + ) AS pipeline_point; + +┌────────────────┐ +│ pipeline_point │ +├────────────────┤ +│ POINT(7 8) │ +└────────────────┘ + +SELECT + ST_MAKEGEOMPOINT( + -122.3061, 37.554162 + ) AS pipeline_point; + +┌────────────────────────────┐ +│ pipeline_point │ +├────────────────────────────┤ +│ POINT(-122.3061 37.554162) │ +└────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-makeline.md b/tidb-cloud-lake/sql/st-makeline.md new file mode 100644 index 0000000000000..b7fdd0cd6bcce --- /dev/null +++ b/tidb-cloud-lake/sql/st-makeline.md @@ -0,0 +1,71 @@ +--- +title: ST_MAKELINE +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Constructs a GEOMETRY or GEOGRAPHY object that represents a line connecting the points in the input two GEOMETRY or GEOGRAPHY objects. + +## Syntax + +```sql +ST_MAKELINE(, ) +``` + +## Aliases + +- [ST_MAKE_LINE](/tidb-cloud-lake/sql/st-make-line.md) + +## Arguments + +| Arguments | Description | +|---------------|-------------------------------------------------------------------------------------------------------------| +| `` | A GEOMETRY or GEOGRAPHY object containing the points to connect. This object must be a Point, MultiPoint, or LineString. | +| `` | A GEOMETRY or GEOGRAPHY object containing the points to connect. This object must be a Point, MultiPoint, or LineString. | + +## Return Type + +Geometry. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_MAKELINE( + ST_GEOMETRYFROMWKT( + 'POINT(-122.306100 37.554162)' + ), + ST_GEOMETRYFROMWKT( + 'POINT(-104.874173 56.714538)' + ) + ) AS pipeline_line; + +┌───────────────────────────────────────────────────────┐ +│ pipeline_line │ +├───────────────────────────────────────────────────────┤ +│ LINESTRING(-122.3061 37.554162,-104.874173 56.714538) │ +└───────────────────────────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_MAKELINE( + ST_GEOGFROMWKT( + 'POINT(-122.306100 37.554162)' + ), + ST_GEOGFROMWKT( + 'POINT(-104.874173 56.714538)' + ) + ) AS pipeline_line; + +╭───────────────────────────────────────────────────────╮ +│ pipeline_line │ +├───────────────────────────────────────────────────────┤ +│ LINESTRING(-122.3061 37.554162,-104.874173 56.714538) │ +╰───────────────────────────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-makepoint.md b/tidb-cloud-lake/sql/st-makepoint.md new file mode 100644 index 0000000000000..c5ebe2367cfb5 --- /dev/null +++ b/tidb-cloud-lake/sql/st-makepoint.md @@ -0,0 +1,59 @@ +--- +title: ST_MAKEPOINT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Constructs a GEOGRAPHY object that represents a Point with the specified longitude and latitude. + +## Syntax + +```sql +ST_MAKEPOINT(, ) +``` + +## Aliases + +- [ST_POINT](/tidb-cloud-lake/sql/st-point.md) + +## Arguments + +| Arguments | Description | +|---------------|-----------------------------------------------| +| `` | A Double value that represents the longitude. | +| `` | A Double value that represents the latitude. | + +## Return Type + +Geography. 
+ +## Examples + +```sql +SELECT + ST_ASWKT( + ST_MAKEPOINT( + 7.0, 8.0 + ) + ) AS pipeline_point; + +┌────────────────┐ +│ pipeline_point │ +├────────────────┤ +│ POINT(7 8) │ +└────────────────┘ + +SELECT + ST_ASWKT( + ST_MAKEPOINT( + -122.3061, 37.554162 + ) + ) AS pipeline_point; + +╭────────────────────────────╮ +│ pipeline_point │ +├────────────────────────────┤ +│ POINT(-122.3061 37.554162) │ +╰────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-makepolygon.md b/tidb-cloud-lake/sql/st-makepolygon.md new file mode 100644 index 0000000000000..1a893342f336e --- /dev/null +++ b/tidb-cloud-lake/sql/st-makepolygon.md @@ -0,0 +1,64 @@ +--- +title: ST_MAKEPOLYGON +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Constructs a GEOMETRY or GEOGRAPHY object that represents a Polygon without holes. The function uses the specified LineString as the outer loop. + +## Syntax + +```sql +ST_MAKEPOLYGON() +``` + +## Aliases + +- [ST_POLYGON](/tidb-cloud-lake/sql/st-polygon.md) + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Geometry. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_MAKEPOLYGON( + ST_GEOMETRYFROMWKT( + 'LINESTRING(0.0 0.0, 1.0 0.0, 1.0 2.0, 0.0 2.0, 0.0 0.0)' + ) + ) AS pipeline_polygon; + +┌────────────────────────────────┐ +│ pipeline_polygon │ +├────────────────────────────────┤ +│ POLYGON((0 0,1 0,1 2,0 2,0 0)) │ +└────────────────────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_MAKEPOLYGON( + ST_GEOGFROMWKT( + 'LINESTRING(0.0 0.0, 1.0 0.0, 1.0 2.0, 0.0 2.0, 0.0 0.0)' + ) + ) AS pipeline_polygon; + +╭────────────────────────────────╮ +│ pipeline_polygon │ +├────────────────────────────────┤ +│ POLYGON((0 0,1 0,1 2,0 2,0 0)) │ +╰────────────────────────────────╯ +``` diff --git a/tidb-cloud-lake/sql/st-npoints.md b/tidb-cloud-lake/sql/st-npoints.md new file mode 100644 index 0000000000000..efa675aa9b068 --- /dev/null +++ b/tidb-cloud-lake/sql/st-npoints.md @@ -0,0 +1,86 @@ +--- +title: ST_NPOINTS +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the number of points in a GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_NPOINTS() +``` + +## Aliases + +- [ST_NUMPOINTS](/tidb-cloud-lake/sql/st-numpoints.md) + +## Arguments + +| Arguments | Description | +|--------------|-------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY object. | + +## Return Type + +UInt8. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT ST_NPOINTS(TO_GEOMETRY('POINT(66 12)')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 1 │ +└─────────┘ + +SELECT ST_NPOINTS(TO_GEOMETRY('MULTIPOINT((45 21),(12 54))')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 2 │ +└─────────┘ + +SELECT ST_NPOINTS(TO_GEOMETRY('LINESTRING(40 60,50 50,60 40)')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 3 │ +└─────────┘ + +SELECT ST_NPOINTS(TO_GEOMETRY('MULTILINESTRING((1 1,32 17),(33 12,73 49,87.1 6.1))')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 5 │ +└─────────┘ + +SELECT ST_NPOINTS(TO_GEOMETRY('GEOMETRYCOLLECTION(POLYGON((-10 0,0 10,10 0,-10 0)),LINESTRING(40 60,50 50,60 40),POINT(99 11))')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 8 │ +└─────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT ST_NPOINTS(ST_GEOGFROMWKT('LINESTRING(40 60,50 50,60 40)')) AS npoints + +┌─────────┐ +│ npoints │ +├─────────┤ +│ 3 │ +└─────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-numpoints.md b/tidb-cloud-lake/sql/st-numpoints.md new file mode 100644 index 0000000000000..53c3cb8cf5f8c --- /dev/null +++ b/tidb-cloud-lake/sql/st-numpoints.md @@ -0,0 +1,5 @@ +--- +title: ST_NUMPOINTS +--- + +Alias for [ST_NPOINTS](/tidb-cloud-lake/sql/st-npoints.md). diff --git a/tidb-cloud-lake/sql/st-point.md b/tidb-cloud-lake/sql/st-point.md new file mode 100644 index 0000000000000..25a7ee85f07a6 --- /dev/null +++ b/tidb-cloud-lake/sql/st-point.md @@ -0,0 +1,5 @@ +--- +title: ST_POINT +--- + +Alias for [ST_MAKEPOINT](/tidb-cloud-lake/sql/st-makepoint.md). diff --git a/tidb-cloud-lake/sql/st-pointn.md b/tidb-cloud-lake/sql/st-pointn.md new file mode 100644 index 0000000000000..e5699034defe3 --- /dev/null +++ b/tidb-cloud-lake/sql/st-pointn.md @@ -0,0 +1,81 @@ +--- +title: ST_POINTN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a Point at a specified index in a LineString. + +## Syntax + +```sql +ST_POINTN(, ) +``` + +## Arguments + +| Arguments | Description | +|--------------|-----------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY that represents a LineString. | +| `` | The index of the Point to return. | + +:::note +The index is 1-based, and a negative index is uesed as the offset from the end of LineString. If index is out of bounds, the function returns an error. +::: + +## Return Type + +Geometry. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_POINTN( + ST_GEOMETRYFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ), + 1 + ) AS pipeline_pointn; + +┌─────────────────┐ +│ pipeline_pointn │ +├─────────────────┤ +│ POINT(1 1) │ +└─────────────────┘ + +SELECT + ST_POINTN( + ST_GEOMETRYFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ), + -2 + ) AS pipeline_pointn; + +┌─────────────────┐ +│ pipeline_pointn │ +├─────────────────┤ +│ POINT(3 3) │ +└─────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_POINTN( + ST_GEOGFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ), + 2 + ) AS pipeline_pointn; + +┌─────────────────┐ +│ pipeline_pointn │ +├─────────────────┤ +│ POINT(2 2) │ +└─────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-polygon.md b/tidb-cloud-lake/sql/st-polygon.md new file mode 100644 index 0000000000000..c22bcb8bea5b2 --- /dev/null +++ b/tidb-cloud-lake/sql/st-polygon.md @@ -0,0 +1,5 @@ +--- +title: ST_POLYGON +--- + +Alias for [ST_MAKEPOLYGON](/tidb-cloud-lake/sql/st-makepolygon.md). diff --git a/tidb-cloud-lake/sql/st-setsrid.md b/tidb-cloud-lake/sql/st-setsrid.md new file mode 100644 index 0000000000000..598a02a888b70 --- /dev/null +++ b/tidb-cloud-lake/sql/st-setsrid.md @@ -0,0 +1,40 @@ +--- +title: ST_SETSRID +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns a GEOMETRY object that has its [SRID (spatial reference system identifier)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier) set to the specified value. This Function only change the SRID without affecting the coordinates of the object. If you also need to change the coordinates to match the new SRS (spatial reference system), use [ST_TRANSFORM](/tidb-cloud-lake/sql/st-transform.md) instead. + +## Syntax + +```sql +ST_SETSRID(, ) +``` + +## Arguments + +| Arguments | Description | +|--------------|-------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY object. | +| `` | The SRID integer to set in the returned GEOMETRY object. | + +## Return Type + +Geometry. + +## Examples + +```sql +SET GEOMETRY_OUTPUT_FORMAT = 'EWKT' + +SELECT ST_SETSRID(TO_GEOMETRY('POINT(13 51)'), 4326) AS geometry + +┌────────────────────────┐ +│ geometry │ +├────────────────────────┤ +│ SRID=4326;POINT(13 51) │ +└────────────────────────┘ + +``` diff --git a/tidb-cloud-lake/sql/st-srid.md b/tidb-cloud-lake/sql/st-srid.md new file mode 100644 index 0000000000000..b27ff197d32b9 --- /dev/null +++ b/tidb-cloud-lake/sql/st-srid.md @@ -0,0 +1,78 @@ +--- +title: ST_SRID +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the SRID (spatial reference system identifier) of a GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_SRID() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +INT32. + +:::note +If the Geometry don't have a SRID, a default value 4326 will be returned. 
+::: + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_SRID( + TO_GEOMETRY( + 'POINT(-122.306100 37.554162)', + 1234 + ) + ) AS pipeline_srid; + +┌───────────────┐ +│ pipeline_srid │ +├───────────────┤ +│ 1234 │ +└───────────────┘ + +SELECT + ST_SRID( + ST_MAKEGEOMPOINT( + 37.5, 45.5 + ) + ) AS pipeline_srid; + +┌───────────────┐ +│ pipeline_srid │ +├───────────────┤ +│ 4326 │ +└───────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_SRID( + ST_GEOGFROMWKT( + 'POINT(1 2)' + ) + ) AS pipeline_srid; + +┌───────────────┐ +│ pipeline_srid │ +├───────────────┤ +│ 4326 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-startpoint.md b/tidb-cloud-lake/sql/st-startpoint.md new file mode 100644 index 0000000000000..7a3443ed9c297 --- /dev/null +++ b/tidb-cloud-lake/sql/st-startpoint.md @@ -0,0 +1,60 @@ +--- +title: ST_STARTPOINT +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the first Point in a LineString. + +## Syntax + +```sql +ST_STARTPOINT() +``` + +## Arguments + +| Arguments | Description | +|--------------|-----------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY that represents a LineString. | + +## Return Type + +Geometry. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_STARTPOINT( + ST_GEOMETRYFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ) + ) AS pipeline_endpoint; + +┌───────────────────┐ +│ pipeline_endpoint │ +├───────────────────┤ +│ POINT(1 1) │ +└───────────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_STARTPOINT( + ST_GEOGFROMWKT( + 'LINESTRING(1 1, 2 2, 3 3, 4 4)' + ) + ) AS pipeline_startpoint; + +┌─────────────────────┐ +│ pipeline_startpoint │ +├─────────────────────┤ +│ POINT(1 1) │ +└─────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-transform.md b/tidb-cloud-lake/sql/st-transform.md new file mode 100644 index 0000000000000..fb1bb2ec2728a --- /dev/null +++ b/tidb-cloud-lake/sql/st-transform.md @@ -0,0 +1,49 @@ +--- +title: ST_TRANSFORM +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Converts a GEOMETRY object from one [spatial reference system (SRS)](https://en.wikipedia.org/wiki/Spatial_reference_system) to another. If you just need to change the SRID without changing the coordinates (e.g. if the SRID was incorrect), use [ST_SETSRID](/tidb-cloud-lake/sql/st-setsrid.md) instead. + +## Syntax + +```sql +ST_TRANSFORM( [, ], ) +``` + +## Arguments + +| Arguments | Description | +|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY object. | +| `` | Optional SRID identifying the current SRS of the input GEOMETRY object, if this argument is omitted, use the SRID specified in the input GEOMETRY object. | +| `` | The SRID that identifies the SRS to use, transforms the input GEOMETRY object to a new object that uses this SRS. | + +## Return Type + +Geometry. 
+ +## Examples + +```sql +SET GEOMETRY_OUTPUT_FORMAT = 'EWKT' + +SELECT ST_TRANSFORM(ST_GEOMFROMWKT('POINT(389866.35 5819003.03)', 32633), 3857) AS transformed_geom + +┌───────────────────────────────────────────────┐ +│ transformed_geom │ +├───────────────────────────────────────────────┤ +│ SRID=3857;POINT(1489140.093766 6892872.19868) │ +└───────────────────────────────────────────────┘ + +SELECT ST_TRANSFORM(ST_GEOMFROMWKT('POINT(4.500212 52.161170)'), 4326, 28992) AS transformed_geom + +┌──────────────────────────────────────────────┐ +│ transformed_geom │ +├──────────────────────────────────────────────┤ +│ SRID=28992;POINT(94308.670475 464038.168827) │ +└──────────────────────────────────────────────┘ + +``` diff --git a/tidb-cloud-lake/sql/st-x.md b/tidb-cloud-lake/sql/st-x.md new file mode 100644 index 0000000000000..5523d815efe4c --- /dev/null +++ b/tidb-cloud-lake/sql/st-x.md @@ -0,0 +1,60 @@ +--- +title: ST_X +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the longitude (X coordinate) of a Point represented by a GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_X() +``` + +## Arguments + +| Arguments | Description | +|--------------|-------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY and must contain a Point. | + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_X( + ST_MAKEGEOMPOINT( + 37.5, 45.5 + ) + ) AS pipeline_x; + +┌────────────┐ +│ pipeline_x │ +├────────────┤ +│ 37.5 │ +└────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_X( + ST_GEOGFROMWKT( + 'POINT(37.5 45.5)' + ) + ) AS pipeline_x; + +┌────────────┐ +│ pipeline_x │ +├────────────┤ +│ 37.5 │ +└────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-xmax.md b/tidb-cloud-lake/sql/st-xmax.md new file mode 100644 index 0000000000000..947d135f6ea2f --- /dev/null +++ b/tidb-cloud-lake/sql/st-xmax.md @@ -0,0 +1,73 @@ +--- +title: ST_XMAX +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the maximum longitude (X coordinate) of all points contained in the specified GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_XMAX() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Double. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_XMAX( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POINT EMPTY)' + ) + ) AS pipeline_xmax; + +┌───────────────┐ +│ pipeline_xmax │ +├───────────────┤ +│ 40 │ +└───────────────┘ + +SELECT + ST_XMAX( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' + ) + ) AS pipeline_xmax; + +┌───────────────┐ +│ pipeline_xmax │ +├───────────────┤ +│ 45 │ +└───────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_XMAX( + ST_GEOGFROMWKT( + 'LINESTRING(-179 0, 179 0)' + ) + ) AS pipeline_xmax; + +┌───────────────┐ +│ pipeline_xmax │ +├───────────────┤ +│ 179 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-xmin.md b/tidb-cloud-lake/sql/st-xmin.md new file mode 100644 index 0000000000000..bde03828ff5c9 --- /dev/null +++ b/tidb-cloud-lake/sql/st-xmin.md @@ -0,0 +1,73 @@ +--- +title: ST_XMIN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the minimum longitude (X coordinate) of all points contained in the specified GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_XMIN() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_XMIN( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(180 10),LINESTRING(20 10,30 20,40 40),POINT EMPTY)' + ) + ) AS pipeline_xmin; + +┌───────────────┐ +│ pipeline_xmin │ +├───────────────┤ +│ 20 │ +└───────────────┘ + +SELECT + ST_XMIN( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(20 10,30 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' + ) + ) AS pipeline_xmin; + +┌───────────────┐ +│ pipeline_xmin │ +├───────────────┤ +│ 10 │ +└───────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_XMIN( + ST_GEOGFROMWKT( + 'LINESTRING(-179 0, 179 0)' + ) + ) AS pipeline_xmin; + +┌───────────────┐ +│ pipeline_xmin │ +├───────────────┤ +│ -179 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-y.md b/tidb-cloud-lake/sql/st-y.md new file mode 100644 index 0000000000000..298e5fae0001f --- /dev/null +++ b/tidb-cloud-lake/sql/st-y.md @@ -0,0 +1,60 @@ +--- +title: ST_Y +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the latitude (Y coordinate) of a Point represented by a GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_Y() +``` + +## Arguments + +| Arguments | Description | +|--------------|-------------------------------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY and must contain a Point. | + +## Return Type + +Double. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_Y( + ST_MAKEGEOMPOINT( + 37.5, 45.5 + ) + ) AS pipeline_y; + +┌────────────┐ +│ pipeline_y │ +├────────────┤ +│ 45.5 │ +└────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_Y( + ST_GEOGFROMWKT( + 'POINT(37.5 45.5)' + ) + ) AS pipeline_y; + +┌────────────┐ +│ pipeline_y │ +├────────────┤ +│ 45.5 │ +└────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-ymax.md b/tidb-cloud-lake/sql/st-ymax.md new file mode 100644 index 0000000000000..c7255b53ab59a --- /dev/null +++ b/tidb-cloud-lake/sql/st-ymax.md @@ -0,0 +1,73 @@ +--- +title: ST_YMAX +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the maximum latitude (Y coordinate) of all points contained in the specified GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_YMAX() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Double. + +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_YMAX( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(180 50),LINESTRING(10 10,20 20,10 40),POINT EMPTY)' + ) + ) AS pipeline_ymax; + +┌───────────────┐ +│ pipeline_ymax │ +├───────────────┤ +│ 50 │ +└───────────────┘ + +SELECT + ST_YMAX( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' + ) + ) AS pipeline_ymax; + +┌───────────────┐ +│ pipeline_ymax │ +├───────────────┤ +│ 45 │ +└───────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_YMAX( + ST_GEOGFROMWKT( + 'LINESTRING(-179 10, 179 22)' + ) + ) AS pipeline_ymax; + +┌───────────────┐ +│ pipeline_ymax │ +├───────────────┤ +│ 22 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/st-ymin.md b/tidb-cloud-lake/sql/st-ymin.md new file mode 100644 index 0000000000000..eb155ebe3e5f7 --- /dev/null +++ b/tidb-cloud-lake/sql/st-ymin.md @@ -0,0 +1,73 @@ +--- +title: ST_YMIN +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Returns the minimum latitude (Y coordinate) of all points contained in the specified GEOMETRY or GEOGRAPHY object. + +## Syntax + +```sql +ST_YMIN() +``` + +## Arguments + +| Arguments | Description | +|--------------|------------------------------------------------------| +| `` | The argument must be an expression of type GEOMETRY or GEOGRAPHY. | + +## Return Type + +Double. 
+ +## Examples + +### GEOMETRY examples + +```sql +SELECT + ST_YMIN( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(-180 -10),LINESTRING(-179 0, 179 30),POINT EMPTY)' + ) + ) AS pipeline_ymin; + +┌───────────────┐ +│ pipeline_ymin │ +├───────────────┤ +│ -10 │ +└───────────────┘ + +SELECT + ST_YMIN( + TO_GEOMETRY( + 'GEOMETRYCOLLECTION(POINT(180 0),LINESTRING(-60 -30, 60 30),POLYGON((40 40,20 45,45 30,40 40)))' + ) + ) AS pipeline_ymin; + +┌───────────────┐ +│ pipeline_ymin │ +├───────────────┤ +│ -30 │ +└───────────────┘ +``` + +### GEOGRAPHY examples + +```sql +SELECT + ST_YMIN( + ST_GEOGFROMWKT( + 'LINESTRING(-179 10, 179 22)' + ) + ) AS pipeline_ymin; + +┌───────────────┐ +│ pipeline_ymin │ +├───────────────┤ +│ 10 │ +└───────────────┘ +``` diff --git a/tidb-cloud-lake/sql/stage.md b/tidb-cloud-lake/sql/stage.md new file mode 100644 index 0000000000000..b820e54cb06e5 --- /dev/null +++ b/tidb-cloud-lake/sql/stage.md @@ -0,0 +1,37 @@ +--- +title: Stage +--- + +This page provides a comprehensive overview of stage operations in Databend, organized by functionality for easy reference. + +## Stage Management + +| Command | Description | +|---------|-------------| +| [CREATE STAGE](/tidb-cloud-lake/sql/create-stage.md) | Creates a new stage for storing files | +| [DROP STAGE](/tidb-cloud-lake/sql/drop-stage.md) | Removes a stage | +| [PRESIGN](/tidb-cloud-lake/sql/presign.md) | Generates a pre-signed URL for stage access | + +## Stage Operations + +| Command | Description | +|---------|-------------| +| [LIST STAGE](/tidb-cloud-lake/sql/list-stage-files.md) | Lists files in a stage | +| [REMOVE STAGE](05-ddl-remove-stage.md) | Removes files from a stage | + +## Stage Information + +| Command | Description | +|---------|-------------| +| [DESC STAGE](/tidb-cloud-lake/sql/desc-stage.md) | Shows detailed information about a stage | +| [SHOW STAGES](/tidb-cloud-lake/sql/show-stages.md) | Lists all stages in the current or specified database | + +## Related Topics + +- [Load from Stage](/tidb-cloud-lake/guides/load-from-stage.md) +- [Query & Transform](/tidb-cloud-lake/guides/query-stage.md) +- [File Format (DDL)](/tidb-cloud-lake/sql/file-format.md) + +:::note +Stages in Databend are used as temporary storage locations for data files that you want to load into tables or unload from tables. +::: diff --git a/tidb-cloud-lake/sql/start-day.md b/tidb-cloud-lake/sql/start-day.md new file mode 100644 index 0000000000000..33be6ff656a93 --- /dev/null +++ b/tidb-cloud-lake/sql/start-day.md @@ -0,0 +1,33 @@ +--- +title: TO_START_OF_DAY +--- + +Rounds down a date with time (timestamp/datetime) to the start of the day. +## Syntax + +```sql +TO_START_OF_DAY( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. 
+ +## Examples + +```sql +SELECT + to_start_of_day('2023-11-12 09:38:18.165575'); + +┌───────────────────────────────────────────────┐ +│ to_start_of_day('2023-11-12 09:38:18.165575') │ +├───────────────────────────────────────────────┤ +│ 2023-11-12 00:00:00 │ +└───────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-fifteen-minutes.md b/tidb-cloud-lake/sql/start-fifteen-minutes.md new file mode 100644 index 0000000000000..db61d19b0b497 --- /dev/null +++ b/tidb-cloud-lake/sql/start-fifteen-minutes.md @@ -0,0 +1,33 @@ +--- +title: TO_START_OF_FIFTEEN_MINUTES +--- + +Rounds down the date with time (timestamp/datetime) to the start of the fifteen-minute interval. +## Syntax + +```sql +TO_START_OF_FIFTEEN_MINUTES() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575'); + +┌───────────────────────────────────────────────────────────┐ +│ to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575') │ +├───────────────────────────────────────────────────────────┤ +│ 2023-11-12 09:30:00 │ +└───────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-five-minutes.md b/tidb-cloud-lake/sql/start-five-minutes.md new file mode 100644 index 0000000000000..781828a1db500 --- /dev/null +++ b/tidb-cloud-lake/sql/start-five-minutes.md @@ -0,0 +1,33 @@ +--- +title: TO_START_OF_FIVE_MINUTES +--- + +Rounds down a date with time (timestamp/datetime) to the start of the five-minute interval. +## Syntax + +```sql +TO_START_OF_FIVE_MINUTES() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_five_minutes('2023-11-12 09:38:18.165575') + +┌────────────────────────────────────────────────────────┐ +│ to_start_of_five_minutes('2023-11-12 09:38:18.165575') │ +├────────────────────────────────────────────────────────┤ +│ 2023-11-12 09:35:00 │ +└────────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-hour.md b/tidb-cloud-lake/sql/start-hour.md new file mode 100644 index 0000000000000..fcdf8fbb5cef6 --- /dev/null +++ b/tidb-cloud-lake/sql/start-hour.md @@ -0,0 +1,33 @@ +--- +title: TO_START_OF_HOUR +--- + +Rounds down a date with time (timestamp/datetime) to the start of the hour. +## Syntax + +```sql +TO_START_OF_HOUR() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_hour('2023-11-12 09:38:18.165575'); + +┌────────────────────────────────────────────────┐ +│ to_start_of_hour('2023-11-12 09:38:18.165575') │ +├────────────────────────────────────────────────┤ +│ 2023-11-12 09:00:00 │ +└────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-iso-year.md b/tidb-cloud-lake/sql/start-iso-year.md new file mode 100644 index 0000000000000..61a81b1628ac9 --- /dev/null +++ b/tidb-cloud-lake/sql/start-iso-year.md @@ -0,0 +1,34 @@ +--- +title: TO_START_OF_ISO_YEAR +--- + +Returns the first day of the ISO year for a date or a date with time (timestamp/datetime). 
+ +## Syntax + +```sql +TO_START_OF_ISO_YEAR() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. + +## Examples + +```sql +SELECT + to_start_of_iso_year('2023-11-12 09:38:18.165575'); + +┌────────────────────────────────────────────────────┐ +│ to_start_of_iso_year('2023-11-12 09:38:18.165575') │ +├────────────────────────────────────────────────────┤ +│ 2023-01-02 │ +└────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-minute.md b/tidb-cloud-lake/sql/start-minute.md new file mode 100644 index 0000000000000..f18fec8a91c42 --- /dev/null +++ b/tidb-cloud-lake/sql/start-minute.md @@ -0,0 +1,34 @@ +--- +title: TO_START_OF_MINUTE +--- + +Rounds down a date with time (timestamp/datetime) to the start of the minute. + +## Syntax + +```sql +TO_START_OF_MINUTE( ) +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_minute('2023-11-12 09:38:18.165575'); + +┌──────────────────────────────────────────────────┐ +│ to_start_of_minute('2023-11-12 09:38:18.165575') │ +├──────────────────────────────────────────────────┤ +│ 2023-11-12 09:38:00 │ +└──────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-month.md b/tidb-cloud-lake/sql/start-month.md new file mode 100644 index 0000000000000..b6aa4393349f0 --- /dev/null +++ b/tidb-cloud-lake/sql/start-month.md @@ -0,0 +1,35 @@ +--- +title: TO_START_OF_MONTH +--- + +Rounds down a date or date with time (timestamp/datetime) to the first day of the month. +Returns the date. + +## Syntax + +```sql +TO_START_OF_MONTH() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. + +## Examples + +```sql +SELECT + to_start_of_month('2023-11-12 09:38:18.165575'); + +┌─────────────────────────────────────────────────┐ +│ to_start_of_month('2023-11-12 09:38:18.165575') │ +├─────────────────────────────────────────────────┤ +│ 2023-11-01 │ +└─────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-quarter.md b/tidb-cloud-lake/sql/start-quarter.md new file mode 100644 index 0000000000000..0a43f858b412c --- /dev/null +++ b/tidb-cloud-lake/sql/start-quarter.md @@ -0,0 +1,36 @@ +--- +title: TO_START_OF_QUARTER +--- + +Rounds down a date or date with time (timestamp/datetime) to the first day of the quarter. +The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. +Returns the date. + +## Syntax + +```sql +TO_START_OF_QUARTER() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. 
+ +## Examples + +```sql +SELECT + to_start_of_quarter('2023-11-12 09:38:18.165575'); + +┌───────────────────────────────────────────────────┐ +│ to_start_of_quarter('2023-11-12 09:38:18.165575') │ +├───────────────────────────────────────────────────┤ +│ 2023-10-01 │ +└───────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-second.md b/tidb-cloud-lake/sql/start-second.md new file mode 100644 index 0000000000000..273a0b82d45ca --- /dev/null +++ b/tidb-cloud-lake/sql/start-second.md @@ -0,0 +1,34 @@ +--- +title: TO_START_OF_SECOND +--- + +Rounds down a date with time (timestamp/datetime) to the start of the second. + +## Syntax + +```sql +TO_START_OF_SECOND() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_second('2023-11-12 09:38:18.165575'); + +┌──────────────────────────────────────────────────┐ +│ to_start_of_second('2023-11-12 09:38:18.165575') │ +├──────────────────────────────────────────────────┤ +│ 2023-11-12 09:38:18 │ +└──────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-ten-minutes.md b/tidb-cloud-lake/sql/start-ten-minutes.md new file mode 100644 index 0000000000000..a5add89ee667f --- /dev/null +++ b/tidb-cloud-lake/sql/start-ten-minutes.md @@ -0,0 +1,34 @@ +--- +title: TO_START_OF_TEN_MINUTES +--- + +Rounds down a date with time (timestamp/datetime) to the start of the ten-minute interval. + +## Syntax + +```sql +TO_START_OF_TEN_MINUTES() +``` + +## Arguments + +| Arguments | Description | +|-----------|-------------| +| `` | timestamp | + +## Return Type + +`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. + +## Examples + +```sql +SELECT + to_start_of_ten_minutes('2023-11-12 09:38:18.165575'); + +┌───────────────────────────────────────────────────────┐ +│ to_start_of_ten_minutes('2023-11-12 09:38:18.165575') │ +├───────────────────────────────────────────────────────┤ +│ 2023-11-12 09:30:00 │ +└───────────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-week.md b/tidb-cloud-lake/sql/start-week.md new file mode 100644 index 0000000000000..46d598f96336a --- /dev/null +++ b/tidb-cloud-lake/sql/start-week.md @@ -0,0 +1,36 @@ +--- +title: TO_START_OF_WEEK +--- + +Returns the first day of the week for a date or a date with time (timestamp/datetime). +The first day of a week can be Sunday or Monday, which is specified by the argument `mode`. + +## Syntax + +```sql +TO_START_OF_WEEK( [, mode]) +``` + +## Arguments + +| Arguments | Description | +|-----------|-----------------------------------------------------------------------------------------------------| +| `` | date/timestamp | +| `[mode]` | Optional. If it is 0, the result is Sunday, otherwise, the result is Monday. The default value is 0 | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. 
+ +## Examples + +```sql +SELECT + to_start_of_week('2023-11-12 09:38:18.165575'); + +┌────────────────────────────────────────────────┐ +│ to_start_of_week('2023-11-12 09:38:18.165575') │ +├────────────────────────────────────────────────┤ +│ 2023-11-12 │ +└────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/start-year.md b/tidb-cloud-lake/sql/start-year.md new file mode 100644 index 0000000000000..488ec70871029 --- /dev/null +++ b/tidb-cloud-lake/sql/start-year.md @@ -0,0 +1,35 @@ +--- +title: TO_START_OF_YEAR +--- + +Returns the first day of the year for a date or a date with time (timestamp/datetime). + +## Syntax + +```sql +TO_START_OF_YEAR() +``` + +## Arguments + +| Arguments | Description | +|-----------|----------------| +| `` | date/timestamp | + +## Return Type + +`DATE`, returns date in “YYYY-MM-DD” format. + +## Examples + +```sql +SELECT + to_start_of_year('2023-11-12 09:38:18.165575'); + +┌────────────────────────────────────────────────┐ +│ to_start_of_year('2023-11-12 09:38:18.165575') │ +│ Date │ +├────────────────────────────────────────────────┤ +│ 2023-01-01 │ +└────────────────────────────────────────────────┘ +``` diff --git a/tidb-cloud-lake/sql/stddev-pop.md b/tidb-cloud-lake/sql/stddev-pop.md new file mode 100644 index 0000000000000..6fe0414db5b14 --- /dev/null +++ b/tidb-cloud-lake/sql/stddev-pop.md @@ -0,0 +1,65 @@ +--- +title: STDDEV_POP +title_includes: STD, STDDEV +--- + +Aggregate function. + +The STDDEV_POP() function returns the population standard deviation(the square root of VAR_POP()) of an expression. + +:::tip +STD() or STDDEV() can also be used, which are equivalent but not standard SQL. +::: + +:::caution +NULL values are not counted. +::: + +## Syntax + +```sql +STDDEV_POP() +STDDEV() +STD() +``` + +## Arguments + +| Arguments | Description | +|-----------|--------------------------| +| `` | Any numerical expression | + +## Return Type + +double + +## Example + +**Create a Table and Insert Sample Data** +```sql +CREATE TABLE test_scores ( + id INT, + student_id INT, + score FLOAT +); + +INSERT INTO test_scores (id, student_id, score) +VALUES (1, 1, 80), + (2, 2, 85), + (3, 3, 90), + (4, 4, 95), + (5, 5, 100); +``` + +**Query Demo: Calculate Population Standard Deviation of Test Scores** +```sql +SELECT STDDEV_POP(score) AS test_score_stddev_pop +FROM test_scores; +``` + +**Result** +```sql +| test_score_stddev_pop | +|-----------------------| +| 7.07107 | +``` \ No newline at end of file diff --git a/tidb-cloud-lake/sql/stddev-samp.md b/tidb-cloud-lake/sql/stddev-samp.md new file mode 100644 index 0000000000000..37bd6fe3b5ef6 --- /dev/null +++ b/tidb-cloud-lake/sql/stddev-samp.md @@ -0,0 +1,58 @@ +--- +title: STDDEV_SAMP +--- + +Returns the sample standard deviation (the square root of VAR_SAMP()) of an expression. + +- NULL values are ignored. +- STDDEV_SAMP() returns `NULL` instead of `0` when there is only one input record. + +## Syntax + +```sql +STDDEV_SAMP() +``` + +## Arguments + +| Arguments | Description | +| --------- | ------------------------ | +| `` | Any numerical expression | + +## Return Type + +Double. 
+ +## Example + +**Create a Table and Insert Sample Data** + +```sql +CREATE TABLE height_data ( + id INT, + person_id INT, + height FLOAT +); + +INSERT INTO height_data (id, person_id, height) +VALUES (1, 1, 5.8), + (2, 2, 6.1), + (3, 3, 5.9), + (4, 4, 5.7), + (5, 5, 6.3); +``` + +**Query Demo: Calculate Sample Standard Deviation of Heights** + +```sql +SELECT STDDEV_SAMP(height) AS height_stddev_samp +FROM height_data; +``` + +**Result** + +```sql +| height_stddev_samp | +|--------------------| +| 0.240 | +``` diff --git a/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md b/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md new file mode 100644 index 0000000000000..91bc5de663ad4 --- /dev/null +++ b/tidb-cloud-lake/sql/stored-procedure-sql-scripting.md @@ -0,0 +1,641 @@ +--- +title: Stored Procedure & SQL Scripting +slug: /stored-procedure-scripting/ +--- + +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Stored procedures in Databend let you package SQL logic that runs on the server with access to control flow, variables, cursors, and dynamic statements. This page explains how to create procedures and write the inline scripting that powers them. + +## Defining a Procedure + +```sql +CREATE [OR REPLACE] PROCEDURE ( , ...) +RETURNS [NOT NULL] +LANGUAGE SQL +[COMMENT = ''] +AS $$ +BEGIN + -- Declarations and statements + RETURN ; + -- Or return a query result + -- RETURN TABLE(); +END; +$$; +``` + +| Component | Description | +|-----------|-------------| +| `` | Identifier for the procedure. Schema qualification is optional. | +| ` ` | Input parameters typed with Databend scalar types. Parameters are passed by value. | +| `RETURNS [NOT NULL]` | Declares the logical return type. `NOT NULL` enforces a non-nullable response. | +| `LANGUAGE SQL` | Databend currently accepts `SQL` only. | +| `RETURN` / `RETURN TABLE` | Ends execution and provides either a scalar or tabular result. | + +Use [`CREATE PROCEDURE`](/tidb-cloud-lake/sql/create-procedure.md) to persist the definition, [`CALL`](/tidb-cloud-lake/sql/call-procedure.md) to run it, and [`DROP PROCEDURE`](/tidb-cloud-lake/sql/drop-procedure.md) to remove it. + +### Minimal Example + +```sql +CREATE OR REPLACE PROCEDURE convert_kg_to_lb(kg DOUBLE) +RETURNS DOUBLE +LANGUAGE SQL +COMMENT = 'Converts kilograms to pounds' +AS $$ +BEGIN + RETURN kg * 2.20462; +END; +$$; + +CALL PROCEDURE convert_kg_to_lb(10); +``` + +## Language Basics Inside Procedures + +### Declare Section + +Stored procedures can start with an optional `DECLARE` block to initialize variables before the executable section. Each entry in the block follows the same syntax as `LET`: `name [] [:= | DEFAULT ]`. When you omit the initializer, the variable must be assigned before it is read; referencing it too early raises error 3129. + +```sql +CREATE OR REPLACE PROCEDURE sp_with_declare() +RETURNS INT +LANGUAGE SQL +AS $$ +DECLARE + counter INT DEFAULT 0; +BEGIN + counter := counter + 5; + RETURN counter; +END; +$$; + +CALL PROCEDURE sp_with_declare(); +``` + +The `DECLARE` section accepts the same definitions as `LET`, including optional data types, `RESULTSET`, and `CURSOR` declarations. Use a semicolon after each item. + +### Variables and Assignment + +Use `LET` to declare variables or constants. You can optionally provide a type annotation and an initializer with either `:=` or the `DEFAULT` keyword. Without an initializer, the variable must be assigned before it is read; referencing it beforehand raises error 3129. 
Reassign by omitting `LET`. + +```sql +CREATE OR REPLACE PROCEDURE sp_demo_variables() +RETURNS FLOAT +LANGUAGE SQL +AS $$ +BEGIN + LET total DECIMAL(10, 2) DEFAULT 100; + LET rate FLOAT := 0.07; + LET surcharge FLOAT := NULL; -- Explicitly initialize before use + LET tax FLOAT DEFAULT rate; -- DEFAULT can reference initialized variables + + total := total * rate; -- Multiply by the rate + total := total + COALESCE(surcharge, 5); -- Reassign without LET + total := total + tax; + + RETURN total; +END; +$$; + +CALL PROCEDURE sp_demo_variables(); +``` + +Referencing an uninitialized variable anywhere in a procedure raises error 3129. + +### Variable Scope + +Variables are scoped to the enclosing block. Inner blocks can shadow outer bindings, and the outer value is restored when the block exits. + +```sql +CREATE OR REPLACE PROCEDURE sp_demo_scope() +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + LET threshold := 10; + LET summary := 'outer=' || threshold; + + IF threshold > 0 THEN + LET threshold := 5; -- Shadows the outer value + summary := summary || ', inner=' || threshold; + END IF; + + summary := summary || ', after=' || threshold; + RETURN summary; +END; +$$; + +CALL PROCEDURE sp_demo_scope(); +``` + +### Comments + +Procedures support single-line (`-- text`) and multi-line (`/* text */`) comments. + +```sql +CREATE OR REPLACE PROCEDURE sp_demo_comments() +RETURNS FLOAT +LANGUAGE SQL +AS $$ +BEGIN + -- Calculate price with tax + LET price := 15; + LET tax_rate := 0.08; + + /* + Multi-line comments are useful for documenting complex logic. + The following line returns the tax-inclusive price. + */ + RETURN price * (1 + tax_rate); +END; +$$; + +CALL PROCEDURE sp_demo_comments(); +``` + +### Lambda Expressions + +Lambda expressions define inline logic that can be passed to array functions or invoked within queries. They follow the ` -> ` form (wrap parameters in parentheses when more than one is provided). The expression can include casts, conditional logic, and even references to procedure variables. + +- Use `:variable_name` to reference procedure variables inside the lambda when it runs within a SQL statement. +- Functions such as `ARRAY_TRANSFORM` and `ARRAY_FILTER` evaluate the lambda for each element in the input array. + +```sql +CREATE OR REPLACE PROCEDURE sp_demo_lambda_array() +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + RETURN TABLE( + SELECT ARRAY_TRANSFORM([1, 2, 3, 4], item -> (item::Int + 1)) AS incremented + ); +END; +$$; + +CALL PROCEDURE sp_demo_lambda_array(); +``` + +Lambdas can also appear inside queries executed by the procedure. + +```sql +CREATE OR REPLACE PROCEDURE sp_demo_lambda_query() +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + RETURN TABLE( + SELECT + number, + ARRAY_TRANSFORM([number, number + 1], val -> (val::Int + 1)) AS next_values + FROM numbers(3) + ); +END; +$$; + +CALL PROCEDURE sp_demo_lambda_query(); +``` + +Capture procedure variables inside the lambda by prefixing them with `:` when the lambda runs in a SQL statement context. + +```sql +CREATE OR REPLACE PROCEDURE sp_lambda_filter() +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + LET threshold := 2; + RETURN TABLE( + SELECT ARRAY_FILTER([1, 2, 3, 4], element -> (element::Int > :threshold)) AS filtered + ); +END; +$$; + +CALL PROCEDURE sp_lambda_filter(); +``` + +You can also place complex expressions, such as `CASE` logic, inside the lambda body. 
+ +```sql +CREATE OR REPLACE PROCEDURE sp_lambda_case() +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + RETURN TABLE( + SELECT + number, + ARRAY_TRANSFORM( + [number - 1, number, number + 1], + val -> (CASE WHEN val % 2 = 0 THEN 'even' ELSE 'odd' END) + ) AS parity_window + FROM numbers(3) + ); +END; +$$; + +CALL PROCEDURE sp_lambda_case(); +``` + +## Control Flow + +### IF Statements + +Use `IF ... ELSEIF ... ELSE ... END IF;` to branch inside a procedure. + +```sql +CREATE OR REPLACE PROCEDURE sp_evaluate_score(score INT) +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + IF score >= 90 THEN + RETURN 'Excellent'; + ELSEIF score >= 70 THEN + RETURN 'Good'; + ELSE + RETURN 'Review'; + END IF; +END; +$$; + +CALL PROCEDURE sp_evaluate_score(82); +``` + +### CASE Expressions + +`CASE` expressions provide an alternative to nested `IF` statements. + +```sql +CREATE OR REPLACE PROCEDURE sp_membership_discount(level STRING) +RETURNS FLOAT +LANGUAGE SQL +AS $$ +BEGIN + RETURN CASE + WHEN level = 'gold' THEN 0.2 + WHEN level = 'silver' THEN 0.1 + ELSE 0 + END; +END; +$$; + +CALL PROCEDURE sp_membership_discount('silver'); +``` + +### Range `FOR` + +Range-based loops iterate from a lower bound to an upper bound (inclusive). Use the optional `REVERSE` keyword to walk the range backwards. + +```sql +CREATE OR REPLACE PROCEDURE sp_sum_range(start_val INT, end_val INT) +RETURNS INT +LANGUAGE SQL +AS $$ +BEGIN + LET total := 0; + FOR i IN start_val TO end_val DO + total := total + i; + END FOR; + RETURN total; +END; +$$; + +CALL PROCEDURE sp_sum_range(1, 5); +``` + +Range loops require the lower bound to be less than or equal to the upper bound when stepping forward. + +```sql +CREATE OR REPLACE PROCEDURE sp_reverse_count(start_val INT, end_val INT) +RETURNS STRING +LANGUAGE SQL +AS $$ +BEGIN + LET output := ''; + FOR i IN REVERSE start_val TO end_val DO + output := output || i || ' '; + END FOR; + RETURN TRIM(output); +END; +$$; + +CALL PROCEDURE sp_reverse_count(1, 5); +``` + +#### `FOR ... IN` Queries + +Iterate directly over the result of a query. The loop variable exposes columns as fields. + +```sql +CREATE OR REPLACE PROCEDURE sp_sum_query(limit_rows INT) +RETURNS BIGINT +LANGUAGE SQL +AS $$ +BEGIN + LET total := 0; + FOR rec IN SELECT number FROM numbers(:limit_rows) DO + total := total + rec.number; + END FOR; + RETURN total; +END; +$$; + +CALL PROCEDURE sp_sum_query(5); +``` + +`FOR` can also iterate over previously declared result-set variables or cursors (see [Working with Query Results](#working-with-query-results)). 
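+
+For example, here is a minimal sketch of looping over a result-set variable, assuming `RESULTSET` variables are declared with the `LET <name> RESULTSET := <query>` form covered in that section:
+
+```sql
+CREATE OR REPLACE PROCEDURE sp_sum_resultset()
+RETURNS BIGINT
+LANGUAGE SQL
+AS $$
+BEGIN
+    -- Declare a result set, then iterate over its rows with FOR
+    LET rs RESULTSET := SELECT number FROM numbers(5);
+    LET total := 0;
+    FOR rec IN rs DO
+        total := total + rec.number;
+    END FOR;
+    RETURN total;
+END;
+$$;
+
+CALL PROCEDURE sp_sum_resultset();
+```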
+ +### `WHILE` + +```sql +CREATE OR REPLACE PROCEDURE sp_factorial(n INT) +RETURNS INT +LANGUAGE SQL +AS $$ +BEGIN + LET result := 1; + WHILE n > 0 DO + result := result * n; + n := n - 1; + END WHILE; + RETURN result; +END; +$$; + +CALL PROCEDURE sp_factorial(5); +``` + +### `REPEAT` + +```sql +CREATE OR REPLACE PROCEDURE sp_repeat_sum(limit_val INT) +RETURNS INT +LANGUAGE SQL +AS $$ +BEGIN + LET counter := 0; + LET total := 0; + + REPEAT + counter := counter + 1; + total := total + counter; + UNTIL counter >= limit_val END REPEAT; + + RETURN total; +END; +$$; + +CALL PROCEDURE sp_repeat_sum(3); +``` + +### `LOOP` + +```sql +CREATE OR REPLACE PROCEDURE sp_retry_counter(max_attempts INT) +RETURNS INT +LANGUAGE SQL +AS $$ +BEGIN + LET retries := 0; + LOOP + retries := retries + 1; + IF retries >= max_attempts THEN + BREAK; + END IF; + END LOOP; + + RETURN retries; +END; +$$; + +CALL PROCEDURE sp_retry_counter(5); +``` + +### Break and Continue + +Use `BREAK` to exit a loop early and `CONTINUE` to skip to the next iteration. + +```sql +CREATE OR REPLACE PROCEDURE sp_break_example(limit_val INT) +RETURNS INT +LANGUAGE SQL +AS $$ +BEGIN + LET counter := 0; + LET total := 0; + + WHILE TRUE DO + counter := counter + 1; + IF counter > limit_val THEN + BREAK; + END IF; + IF counter % 2 = 0 THEN + CONTINUE; + END IF; + total := total + counter; + END WHILE; + + RETURN total; +END; +$$; + +CALL PROCEDURE sp_break_example(5); +``` + +Use `BREAK