Commit d2066c5

feat: bootstrap MyDuck Server with DuckDB (apecloud#154)

Authored by TianyuZhang1214 and fanyang01

Co-authored-by: Fan Yang <[email protected]>

1 parent 3e9e0e0 · commit d2066c5

File tree

9 files changed: +117 −6 lines


README.md

Lines changed: 8 additions & 0 deletions

```diff
@@ -123,6 +123,14 @@ MyDuck Server supports setting up replicas from common cloud-based MySQL offerin
 
 With MyDuck's powerful analytics capabilities, you can create an HTAP (Hybrid Transactional/Analytical Processing) system where high-frequency data writes are directed to a standard MySQL instance, while analytical queries are handled by a MyDuck Server instance. Follow our HTAP setup instructions based on [ProxySQL](docs/tutorial/htap-proxysql-setup.md) or [MariaDB MaxScale](docs/tutorial/htap-maxscale-setup.md) to easily set up an HTAP demonstration.
 
+### Query & Load Parquet Files
+
+Looking to load Parquet files into MyDuck Server and start querying? Follow our [Parquet file loading guide](docs/tutorial/load-parquet-files.md) for easy setup.
+
+### Already Using DuckDB?
+
+Already have a DuckDB file? You can seamlessly bootstrap MyDuck Server with it. See our [DuckDB file bootstrapping guide](docs/tutorial/bootstrap.md) for more details.
+
 ## 💡 Contributing
 
 Let’s make MySQL analytics fast and powerful—together!
```

catalog/database.go

Lines changed: 8 additions & 2 deletions

```diff
@@ -217,9 +217,9 @@ func (d *Database) RenameTable(ctx *sql.Context, oldName string, newName string)
 // extractViewDefinitions is a helper function to extract view definitions from DuckDB
 func (d *Database) extractViewDefinitions(ctx *sql.Context, schemaName string, viewName string) ([]sql.ViewDefinition, error) {
 	query := `
-		SELECT view_name, sql
+		SELECT DISTINCT view_name, sql
 		FROM duckdb_views()
-		WHERE schema_name = ?
+		WHERE schema_name = ? AND NOT internal
 	`
 	args := []interface{}{schemaName}
@@ -240,6 +240,12 @@ func (d *Database) extractViewDefinitions(ctx *sql.Context, schemaName string, v
 		if err := rows.Scan(&name, &createViewStmt); err != nil {
 			return nil, ErrDuckDB.New(err)
 		}
+
+		// Skip system views directly
+		if IsSystemView(name) {
+			continue
+		}
+
 		views = append(views, sql.ViewDefinition{
 			Name:                name,
 			CreateViewStatement: createViewStmt,
```

catalog/internal_tables.go

Lines changed: 3 additions & 3 deletions

```diff
@@ -81,21 +81,21 @@ var InternalTables = struct {
 	GlobalStatus       InternalTable
 }{
 	PersistentVariable: InternalTable{
-		Schema:       "main",
+		Schema:       "__sys__",
 		Name:         "persistent_variable",
 		KeyColumns:   []string{"name"},
 		ValueColumns: []string{"value", "vtype"},
 		DDL:          "name TEXT PRIMARY KEY, value TEXT, vtype TEXT",
 	},
 	BinlogPosition: InternalTable{
-		Schema:       "main",
+		Schema:       "__sys__",
 		Name:         "binlog_position",
 		KeyColumns:   []string{"channel"},
 		ValueColumns: []string{"position"},
 		DDL:          "channel TEXT PRIMARY KEY, position TEXT",
 	},
 	PgReplicationLSN: InternalTable{
-		Schema:       "main",
+		Schema:       "__sys__",
 		Name:         "pg_replication_lsn",
 		KeyColumns:   []string{"slot_name"},
 		ValueColumns: []string{"lsn"},
```
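This change moves MyDuck's internal bookkeeping tables out of DuckDB's default `main` schema into a dedicated `__sys__` schema, so they no longer collide with user objects. A minimal standalone sketch of how such a descriptor could expand into schema-qualified DDL (the struct fields come from the diff; the `createStmt` helper is hypothetical, not the project's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// InternalTable mirrors the fields visible in the diff.
type InternalTable struct {
	Schema       string
	Name         string
	KeyColumns   []string
	ValueColumns []string
	DDL          string
}

// createStmt is a hypothetical helper: it qualifies the table with its
// schema so internal state lands in "__sys__" rather than "main".
func (t InternalTable) createStmt() string {
	var b strings.Builder
	fmt.Fprintf(&b, "CREATE TABLE IF NOT EXISTS %s.%s (%s)", t.Schema, t.Name, t.DDL)
	return b.String()
}

func main() {
	pv := InternalTable{
		Schema:       "__sys__",
		Name:         "persistent_variable",
		KeyColumns:   []string{"name"},
		ValueColumns: []string{"value", "vtype"},
		DDL:          "name TEXT PRIMARY KEY, value TEXT, vtype TEXT",
	}
	// Prints the schema-qualified CREATE TABLE statement.
	fmt.Println(pv.createStmt())
}
```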

catalog/provider.go

Lines changed: 1 addition & 1 deletion

```diff
@@ -155,7 +155,7 @@ func (prov *DatabaseProvider) AllDatabases(ctx *sql.Context) []sql.Database {
 		}
 
 		switch schemaName {
-		case "information_schema", "main", "pg_catalog":
+		case "information_schema", "pg_catalog", "__sys__":
 			continue
 		}
```

catalog/system_views.go

Lines changed: 22 additions & 0 deletions (new file)

```diff
@@ -0,0 +1,22 @@
+package catalog
+
+var SystemViews = map[string]struct{}{
+	"duckdb_columns":       {},
+	"duckdb_constraints":   {},
+	"duckdb_databases":     {},
+	"duckdb_indexes":       {},
+	"duckdb_schemas":       {},
+	"duckdb_tables":        {},
+	"duckdb_types":         {},
+	"duckdb_views":         {},
+	"pragma_database_list": {},
+	"sqlite_master":        {},
+	"sqlite_schema":        {},
+	"sqlite_temp_master":   {},
+	"sqlite_temp_schema":   {},
+}
+
+func IsSystemView(viewName string) bool {
+	_, ok := SystemViews[viewName]
+	return ok
+}
```
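The new helper is a plain set-membership check over DuckDB's and SQLite's built-in view names. A self-contained sketch of how it behaves (the map and function are copied from the new file; the filtering loop in `main` is illustrative, not the actual `extractViewDefinitions` code):

```go
package main

import "fmt"

// SystemViews mirrors catalog/system_views.go: DuckDB/SQLite built-in
// views that should never surface as user-defined views.
var SystemViews = map[string]struct{}{
	"duckdb_columns": {}, "duckdb_constraints": {}, "duckdb_databases": {},
	"duckdb_indexes": {}, "duckdb_schemas": {}, "duckdb_tables": {},
	"duckdb_types": {}, "duckdb_views": {}, "pragma_database_list": {},
	"sqlite_master": {}, "sqlite_schema": {},
	"sqlite_temp_master": {}, "sqlite_temp_schema": {},
}

// IsSystemView reports whether viewName is one of the built-in system views.
func IsSystemView(viewName string) bool {
	_, ok := SystemViews[viewName]
	return ok
}

func main() {
	// Filter a mixed list the way the patched scan loop does:
	// system views are skipped, user views are kept.
	names := []string{"duckdb_tables", "my_report_view", "sqlite_master"}
	var userViews []string
	for _, name := range names {
		if IsSystemView(name) {
			continue
		}
		userViews = append(userViews, name)
	}
	fmt.Println(userViews) // [my_report_view]
}
```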

docs/data/example.db

524 KB
Binary file not shown.

docs/data/example.parquet

352 Bytes
Binary file not shown.

docs/tutorial/bootstrap.md

Lines changed: 34 additions & 0 deletions (new file)

# Bootstrapping from an Existing DuckDB File

Given an existing DuckDB file, you can bootstrap MyDuck Server with it and serve its data directly. Here’s how to work with the `example.db` file located in `docs/data/`.

### Steps

1. **Prepare the data directory:**
   ```bash
   mkdir example_data
   cp /path/to/example.db example_data/mysql.db  # IMPORTANT: The attached file must be named `mysql.db`
   ```

2. **Launch MyDuck Server and attach the data directory:**
   ```bash
   docker run \
     -p 13306:3306 \
     -p 15432:5432 \
     --volume=/path/to/example_data:/home/admin/data \
     apecloud/myduckserver:main
   ```

3. **Connect to MyDuck Server and query:**
   ```bash
   # Using psql client & DuckDB syntax
   psql -h 127.0.0.1 -p 15432 -U mysql <<EOF
   SELECT * FROM "test_data";
   EOF

   # Or using MySQL client & syntax (quote the heredoc delimiter so the
   # backticks are not treated as shell command substitution)
   mysql -h 127.0.0.1 -uroot -P13306 main <<'EOF'
   SELECT * FROM `test_data`;
   EOF
   ```
docs/tutorial/load-parquet-files.md

Lines changed: 41 additions & 0 deletions (new file)

# Query & Load Parquet Files

Imagine you have a large dataset stored in Parquet files and want to share it with your team so they can query it using SQL. The files are too large to store locally and too slow to download from cloud storage every time. Instead, you can put them on a server that is accessible to your team and run a MyDuck Server instance on it. Your team can then query the dataset easily with either a Postgres or a MySQL client.

Below, we’ll show you how to query and load the `example.parquet` file from the `docs/data/` directory by mounting it into a MyDuck Server container.

## Steps

1. **Run MyDuck Server:**
   ```bash
   docker run -p 13306:3306 -p 15432:5432 \
     -v /path/to/example.parquet:/home/admin/data/example.parquet \
     apecloud/myduckserver:main
   ```

2. **Connect to MyDuck Server using `psql`:**
   ```bash
   psql -h 127.0.0.1 -p 15432 -U mysql
   ```

3. **Query the Parquet file directly:**
   ```sql
   SELECT * FROM '/home/admin/data/example.parquet' LIMIT 10;
   ```

4. **Load the Parquet file into a DuckDB table:**
   ```sql
   CREATE TABLE test_data AS SELECT * FROM '/home/admin/data/example.parquet';
   SELECT * FROM test_data LIMIT 10;
   ```

5. **Query the data with a MySQL client & syntax:**
   ```bash
   mysql -h 127.0.0.1 -uroot -P13306 main
   ```
   ```sql
   SELECT * FROM `test_data`;
   ```
