Skip to content

mcp: expose query tool on developer endpoint#36949

Open
bobbyiliev wants to merge 4 commits into
MaterializeInc:mainfrom
bobbyiliev:mcp-developer-query-tool
Open

mcp: expose query tool on developer endpoint#36949
bobbyiliev wants to merge 4 commits into
MaterializeInc:mainfrom
bobbyiliev:mcp-developer-query-tool

Conversation

@bobbyiliev

@bobbyiliev bobbyiliev commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Adds a query tool to the developer MCP endpoint, mirroring the agent's query shape (cluster + sql_query, read-only SELECT/SHOW/EXPLAIN) so EXPLAIN ANALYZE and queries against user objects work, query_system_catalog stays as the no-cluster catalog-only ergonomic path. Gated by the new enable_mcp_developer_query_tool dyncfg, default on, so the rollout matches the agent pattern.

Fixes https://linear.app/materializeinc/issue/DEX-25/

@bobbyiliev bobbyiliev requested a review from a team as a code owner June 9, 2026 20:30
@bobbyiliev bobbyiliev marked this pull request as draft June 9, 2026 20:31

@mtabebe mtabebe left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments... exciting stuff 👍

Comment thread src/environmentd/src/http/mcp.rs Outdated
"You are connected to the Materialize developer MCP server for troubleshooting and observability.\n\n",
"Tools:\n",
"- query_system_catalog: read-only SELECT/SHOW/EXPLAIN restricted to system catalog tables (mz_*, pg_catalog, information_schema). No cluster argument; prefer this for most catalog lookups.\n",
"- query: read-only SELECT/SHOW/EXPLAIN that can also reach user objects on a named cluster. Use this for EXPLAIN ANALYZE and for inspecting user objects directly. The tool may be hidden if the operator has disabled it.\n\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a little awkward that we advertize it if the feature is enabled ... IMO we should not add it to the instructions if it is gated off.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

Comment thread src/environmentd/tests/server.rs Outdated
/// Tests the MCP developer endpoint.
/// Tests the MCP developer endpoint with the query tool explicitly disabled.
#[mz_ore::test]
fn test_mcp_developer() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to with test_mcp_developer_with_query_tool_disabled?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to test_mcp_developer_with_query_tool_disabled in the last commit!

@def- def- left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running bin/cargo-test -p mz-environmentd --test server test_mcp_developer_query_tool fails:

    tests/testdata/mcp/developer_query_tool:91:
    {"jsonrpc":"2.0","id":14,"method":"tools/call","params":{"name":"query","arguments":{"cluster":"quickstart","sql_query":"EXPLAIN ANALYZE MEMORY FOR INDEX t_idx AS SQL"}}}

    expected:
    200 OK
    {"jsonrpc":"2.0","id":14,"result":{"content":[{"type":"text","text":"[\n  [\n    \"WITH\\n    summary_memory AS\\n    (\\n        SELECT\\n            mlm.global_id AS global_id,\\n            mlm.lir_id AS lir_id,\\n            sum(mas.size) AS total_memory,\\n            sum(mas.records) AS total_records,\\n            CASE\\n                    WHEN count(DISTINCT mas.worker_id) <> 0\\n                            THEN sum(mas.size) / count(DISTINCT mas.worker_id)\\n                        ELSE NULL\\n                END\\n                AS avg_memory,\\n            CASE\\n                    WHEN count(DISTINCT mas.worker_id) <> 0\\n                            THEN\\n                                sum(mas.records) / count(DISTINCT mas.worker_id)\\n                        ELSE NULL\\n                END\\n                AS avg_records\\n        FROM\\n            mz_introspection.mz_lir_mapping AS mlm\\n                 CROSS JOIN generate_series((mlm.operator_id_start)::int8, (mlm.operator_id_end - 1)::int8) AS valid_id\\n                JOIN\\n                    mz_introspection.mz_arrangement_sizes_per_worker AS mas\\n                    ON (mas.operator_id = valid_id)\\n        GROUP BY mlm.global_id, mlm.lir_id\\n    )\\nSELECT\\n    repeat(' ', nesting * 2) || operator AS operator,\\n    pg_size_pretty(sm.total_memory) AS total_memory,\\n    sm.total_records AS total_records\\nFROM\\n    mz_introspection.mz_lir_mapping AS mlm\\n        LEFT JOIN summary_memory AS sm USING(global_id, lir_id)\\n        JOIN\\n            mz_introspection.mz_mappable_objects AS mo\\n            ON (mlm.global_id = mo.global_id)\\nWHERE mo.name = 'materialize.public.t_idx'\\nORDER BY mlm.lir_id DESC;\"\n  ]\n]"}],"isError":false}}

    actual:
    200 OK
    {"jsonrpc":"2.0","id":14,"error":{"code":-32602,"message":"Query validation failed: Only SELECT, SHOW, and EXPLAIN statements are allowed","data":{"error_type":"ValidationError"}}}

With this diff:

diff --git a/src/environmentd/tests/server.rs b/src/environmentd/tests/server.rs
index d5737cbe12..ca064177e6 100644
--- a/src/environmentd/tests/server.rs
+++ b/src/environmentd/tests/server.rs
@@ -5083,6 +5083,25 @@ fn run_mcp_datadriven(testdata_path: &str, harness: test_util::TestHarness) {
             let url = match tc.directive.as_str() {
                 "mcp-agent" => &agents_url,
                 "mcp-developer" => &developer_url,
+                // Setup SQL, executed over pgwire as the default HTTP user
+                // (the role the MCP endpoints run as), e.g. to create user
+                // objects for the `query` tool to target.
+                "sql" => {
+                    let mut client = server
+                        .pg_config()
+                        .user(&HTTP_DEFAULT_USER.name)
+                        .connect(postgres::NoTls)
+                        .unwrap();
+                    // One statement at a time: DDL is not allowed in the
+                    // implicit transaction block of a multi-statement query.
+                    for stmt in tc.input.split(';') {
+                        let stmt = stmt.trim();
+                        if !stmt.is_empty() {
+                            client.batch_execute(stmt).unwrap();
+                        }
+                    }
+                    return "ok\n".to_string();
+                }
                 other => panic!("unknown directive: {}", other),
             };

diff --git a/src/environmentd/tests/testdata/mcp/developer_query_tool b/src/environmentd/tests/testdata/mcp/developer_query_tool
index eb9f6ab59c..9c056bb027 100644
--- a/src/environmentd/tests/testdata/mcp/developer_query_tool
+++ b/src/environmentd/tests/testdata/mcp/developer_query_tool
@@ -64,6 +64,43 @@ mcp-developer
 200 OK
 {"jsonrpc":"2.0","id":13,"result":{"content":[{"type":"text","text":"[\n  [\n    \"Explained Query (fast path):\\n  →Constant (1 rows)\\n\\nTarget cluster: mz_catalog_server\\n\"\n  ]\n]"}],"isError":false}}

+# =============================================================================
+# query - EXPLAIN ANALYZE
+# =============================================================================
+
+# EXPLAIN ANALYZE is the headline use case for the dev query tool (it needs a
+# cluster, which query_system_catalog cannot provide). It does NOT parse to
+# Statement::ExplainPlan like plain EXPLAIN does: EXPLAIN ANALYZE ... FOR
+# INDEX/MATERIALIZED VIEW parses to Statement::ExplainAnalyzeObject and
+# EXPLAIN ANALYZE CLUSTER parses to Statement::ExplainAnalyzeCluster, so the
+# read-only validator must allow those variants explicitly.
+#
+# The AS SQL forms are used here because they exercise the exact same
+# statement variants through validation, planning, and execution, but return
+# the generated introspection SQL instead of executing it, which keeps the
+# output deterministic (the executed forms read mz_introspection, whose
+# contents depend on dataflow hydration timing).
+
+sql
+CREATE TABLE t (a int);
+CREATE INDEX t_idx ON t (a)
+----
+ok
+
+# EXPLAIN ANALYZE ... FOR INDEX (Statement::ExplainAnalyzeObject).
+mcp-developer
+{"jsonrpc":"2.0","id":14,"method":"tools/call","params":{"name":"query","arguments":{"cluster":"quickstart","sql_query":"EXPLAIN ANALYZE MEMORY FOR INDEX t_idx AS SQL"}}}
+----
+200 OK
+{"jsonrpc":"2.0","id":14,"result":{"content":[{"type":"text","text":"[\n  [\n    \"WITH\\n    summary_memory AS\\n    (\\n        SELECT\\n            mlm.global_id AS global_id,\\n            mlm.lir_id AS lir_id,\\n            sum(mas.size) AS total_memory,\\n            sum(mas.records) AS total_records,\\n            CASE\\n                    WHEN count(DISTINCT mas.worker_id) <> 0\\n                            THEN sum(mas.size) / count(DISTINCT mas.worker_id)\\n                        ELSE NULL\\n                END\\n                AS avg_memory,\\n            CASE\\n                    WHEN count(DISTINCT mas.worker_id) <> 0\\n                            THEN\\n                                sum(mas.records) / count(DISTINCT mas.worker_id)\\n                        ELSE NULL\\n                END\\n                AS avg_records\\n        FROM\\n            mz_introspection.mz_lir_mapping AS mlm\\n                 CROSS JOIN generate_series((mlm.operator_id_start)::int8, (mlm.operator_id_end - 1)::int8) AS valid_id\\n                JOIN\\n                    mz_introspection.mz_arrangement_sizes_per_worker AS mas\\n                    ON (mas.operator_id = valid_id)\\n        GROUP BY mlm.global_id, mlm.lir_id\\n    )\\nSELECT\\n    repeat(' ', nesting * 2) || operator AS operator,\\n    pg_size_pretty(sm.total_memory) AS total_memory,\\n    sm.total_records AS total_records\\nFROM\\n    mz_introspection.mz_lir_mapping AS mlm\\n        LEFT JOIN summary_memory AS sm USING(global_id, lir_id)\\n        JOIN\\n            mz_introspection.mz_mappable_objects AS mo\\n            ON (mlm.global_id = mo.global_id)\\nWHERE mo.name = 'materialize.public.t_idx'\\nORDER BY mlm.lir_id DESC;\"\n  ]\n]"}],"isError":false}}
+
+# EXPLAIN ANALYZE CLUSTER (Statement::ExplainAnalyzeCluster).
+mcp-developer
+{"jsonrpc":"2.0","id":15,"method":"tools/call","params":{"name":"query","arguments":{"cluster":"quickstart","sql_query":"EXPLAIN ANALYZE CLUSTER MEMORY AS SQL"}}}
+----
+200 OK
+{"jsonrpc":"2.0","id":15,"result":{"content":[{"type":"text","text":"[\n  [\n    \"WITH\\n    per_operator_memory_totals AS\\n    (\\n        SELECT\\n            mlm.global_id AS global_id,\\n            mlm.lir_id AS lir_id,\\n            sum(mas.size) AS total_memory,\\n            sum(mas.records) AS total_records\\n        FROM\\n            mz_introspection.mz_lir_mapping AS mlm\\n                 CROSS JOIN generate_series((mlm.operator_id_start)::int8, (mlm.operator_id_end - 1)::int8) AS valid_id\\n                JOIN\\n                    mz_introspection.mz_arrangement_sizes_per_worker AS mas\\n                    ON (mas.operator_id = valid_id)\\n        GROUP BY mlm.global_id, mlm.lir_id\\n    ),\\n    object_memory_totals AS\\n    (\\n        SELECT\\n            pomt.global_id AS global_id,\\n            sum(pomt.total_memory) AS total_memory,\\n            sum(pomt.total_records) AS total_records\\n        FROM per_operator_memory_totals AS pomt\\n        GROUP BY pomt.global_id\\n    )\\nSELECT\\n    mo.name AS object,\\n    mo.global_id AS global_id,\\n    pg_size_pretty(omt.total_memory) AS total_memory,\\n    omt.total_records AS total_records\\nFROM\\n    mz_introspection.mz_mappable_objects AS mo\\n        LEFT JOIN object_memory_totals AS omt USING(global_id)\\nORDER BY\\n    omt.total_memory DESC NULLS LAST,\\n    total_records DESC NULLS LAST,\\n    mo.name DESC;\"\n  ]\n]"}],"isError":false}}
+
 # =============================================================================
 # query - writes and DDL rejected
 # =============================================================================

So it seems like you can't actually do EXPLAIN ANALYZE

@bobbyiliev

Copy link
Copy Markdown
Contributor Author

Running bin/cargo-test -p mz-environmentd --test server test_mcp_developer_query_tool fails:

@def- Good catch, the validate_readonly_query only allowed Statement::ExplainPlan, but EXPLAIN ANALYZE parses to ExplainAnalyzeObject / ExplainAnalyzeCluster. Extended the allowlist to cover all read-only EXPLAIN variants with unit and integration tests.

@bobbyiliev bobbyiliev requested a review from def- June 10, 2026 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants