Commit 5f75cc1

EH: Add MCP server for Open Cluster Scheduler (#42)
1 parent c8902c2 commit 5f75cc1

12 files changed, with 1032 additions and 17 deletions.

Makefile

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ simulate:
 	rm -rf ./installation
 	@echo "Creating new subdirectory for installation..."
 	mkdir -p ./installation
-	docker run --platform=linux/amd64 --rm -it -h master --privileged --cap-add SYS_ADMIN -p 9464:9464 --name $(CONTAINER_NAME) -v ./installation:/opt/cs-install -v ./:/root/go/src/github.com/hpc-gridware/go-clusterscheduler $(IMAGE_NAME):$(IMAGE_TAG) /bin/bash -c "cd /root/go/src/github.com/hpc-gridware/go-clusterscheduler/cmd/simulator && go build . && ./simulator run ../../cluster.json && /bin/bash"
+	docker run --platform=linux/amd64 --rm -it -h master --privileged --cap-add SYS_ADMIN -p 9464:9464 -p 8888:8888 --name $(CONTAINER_NAME) -v ./installation:/opt/cs-install -v ./:/root/go/src/github.com/hpc-gridware/go-clusterscheduler $(IMAGE_NAME):$(IMAGE_TAG) /bin/bash -c "cd /root/go/src/github.com/hpc-gridware/go-clusterscheduler/cmd/simulator && go build . && ./simulator run ../../cluster.json && /bin/bash"
 
 #.PHONY: simulate
 #simulate:

cmd/describe-mcp/README.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
# Open Cluster Scheduler & Gridware Cluster Scheduler MCP Server Integration
2+
3+
This repository contains a sample Model Context Protocol (MCP) server integration for both the Open Cluster Scheduler (OCS) and Gridware Cluster Scheduler (GCS). This README provides detailed instructions on building, running, and using the example integration. It also includes guidance on how you can integrate this setup with tools such as Claude and Cursor for cluster configuration, job status analysis, and more.
4+
5+
This is primarily intended for research and as a starting point for your own customization. It
6+
includes a tool for changing the cluster configuration; see below.
7+
8+
## Overview
9+
10+
This example MCP server (“describe-mcp”) demonstrates how to integrate with the Open Cluster Scheduler (OCS) and Gridware Cluster Scheduler (GCS). By leveraging MCP, you can easily fetch configuration data, view job status, retrieve job accounting details, and perform write functions such as job submission or cluster configuration updates.
11+
12+
When properly configured, tools like Claude, Cursor, or other MCP clients can seamlessly query the cluster configuration, run commands like `qstat` or `qacct`, and view or modify cluster objects through this MCP integration.
13+
14+
## Build and Installation
15+
16+
1. Clone or download this repository.
17+
2. Open a terminal in the repository directory.
18+
3. Build the binary:
19+
```bash
20+
go build
21+
```
22+
23+
4. Run the binary with all tools enabled:
24+
```bash
25+
./describe-mcp
26+
```
27+
28+
To run in a restricted (read-only) mode that disables any “write” functionality (such as job submission or config changes):
29+
```bash
30+
export READ_ONLY="true"
31+
./describe-mcp
32+
```
33+
34+
## Available MCP Tools
35+
36+
Within this repository, you will find multiple MCP tools that can be called by your MCP clients (like Claude or Cursor). Each tool communicates with `describe-mcp` over SSE (Server-Sent Events) and invokes the respective OCS/GCS commands:
37+
38+
1. **get_cluster_configuration**
39+
• Fetches the complete cluster configuration in JSON format, including hosts, queues, users, projects, and resource settings.
40+
• Useful for quickly retrieving or backing up the entire configuration.
41+
42+
2. **job_details**
43+
• Retrieves detailed accounting information about finished jobs in a structured format.
44+
• Specify job IDs, or leave the list empty to fetch data about all finished jobs.
45+
46+
3. **qacct**
47+
• Queries historical job accounting data, including resource usage, execution details, and job outcomes.
48+
49+
4. **qstat**
50+
• Fetches real-time information about running and pending jobs, as well as queue states and scheduling details.
51+
• You can pass options such as `-j <job_id>` to get detailed information about a single job.
52+
53+
5. **qsub_help**
54+
• Retrieves a thorough reference for the `qsub` command, listing parameters, examples, and usage notes.
55+
• Helpful for crafting precise job submission calls.
56+
57+
6. **set_cluster_configuration**
58+
• Applies a new cluster configuration to the system, supplied as JSON.
59+
• **Dangerous:** intended only for container-based test clusters. Disable it in code if it is not needed, or at runtime via the `READ_ONLY="true"` environment variable.
60+
61+
7. **submit_job**
62+
• Submits a job to the cluster using SGE-compatible command line parameters.
63+
• Allows direct control over resource requests, scheduling policies, environment settings, and job array usage.
64+
65+
## Example Usage
66+
67+
Below are some illustrative queries you might pose to your MCP-based tools (e.g., Claude) to interact with the cluster:
68+
69+
1. **Show me a summary of all running jobs as a table.**
70+
• Internally, your client might call the `qstat` tool and parse the results into a table.
71+
72+
2. **Provide a high-level overview of the cluster configuration. How many jobs can run concurrently in the cluster?**
73+
• Calls `get_cluster_configuration` and aggregates relevant capacity or slot data to inform concurrency limits.
74+
75+
3. **Submit a job array with 100 tasks executing “sleep 100”.**
76+
• Leverages `submit_job` with an SGE array argument like `-t 1-100 -b y sleep 100`.
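
The third query above maps to the argument list `-t 1-100 -b y sleep 100`. A minimal Go sketch of how a client-side helper might assemble such a list is shown here. The name `arrayJobArgs` is hypothetical and not part of `describe-mcp`, and how `submit_job` actually expects its parameters is an assumption, since its schema is not shown in this README.

```go
package main

import (
	"errors"
	"fmt"
)

// arrayJobArgs builds SGE-style arguments for a binary (-b y) array job
// with task IDs 1..tasks. Hypothetical helper for illustration only.
func arrayJobArgs(tasks int, command ...string) ([]string, error) {
	if tasks < 1 {
		return nil, errors.New("tasks must be at least 1")
	}
	if len(command) == 0 {
		return nil, errors.New("command must not be empty")
	}
	// "-t 1-N" requests N array tasks; "-b y" treats the command as a binary.
	args := []string{"-t", fmt.Sprintf("1-%d", tasks), "-b", "y"}
	return append(args, command...), nil
}
```

Calling `arrayJobArgs(100, "sleep", "100")` yields the six arguments from the example, which could then be passed to the `submit_job` tool.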
77+
78+
## MCP Integration Configuration
79+
80+
This repository shows how you might configure Claude (or a similar service) to connect to the MCP server. Below is a sample JSON snippet that uses `npx mcp-remote`; adapt it to your environment:
81+
82+
```json
83+
{
84+
"mcpServers": {
85+
"gridware": {
86+
"command": "npx",
87+
"args": [
88+
"mcp-remote",
89+
"http://localhost:8888/sse"
90+
]
91+
}
92+
}
93+
}
94+
```
95+
96+
When configured correctly, Claude (or another client) will be able to send queries to the `gridware` MCP server (i.e., this `describe-mcp` process) via the SSE endpoint. Make sure the container or process is exposing the relevant port (e.g., `8888`) and that it has the required privileges to run OCS/GCS commands (`qconf`, `qstat`, `qacct`, etc.).
97+
98+
## Security and Privileges
99+
100+
Be mindful of the following:
101+
• The MCP server requires sufficient privileges to run the cluster management commands. Test it only in a temporary test installation,
102+
for example by running `make simulate` or `make run` in the go-clusterscheduler project
103+
to create a local test cluster that runs in a container.
104+
• You may restrict write capabilities for safer operation via the `READ_ONLY="true"` environment variable.
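
To illustrate the read-only switch, here is a minimal Go sketch of how a server could decide which tools to register based on `READ_ONLY`. It is not the code from this commit: the function names are hypothetical, and treating only the value `true` (case-insensitive) as enabling read-only mode is an assumption. The tool names and their split into read and write tools follow the list in "Available MCP Tools".

```go
package main

import (
	"os"
	"strings"
)

// readOnlyFromEnv reports whether write tools should be disabled.
// Assumption: only the value "true" (any case, surrounding whitespace
// ignored) enables read-only mode; the real parsing may differ.
func readOnlyFromEnv(lookup func(string) (string, bool)) bool {
	v, ok := lookup("READ_ONLY")
	if !ok {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(v), "true")
}

// readOnlyMode applies readOnlyFromEnv to the process environment.
func readOnlyMode() bool {
	return readOnlyFromEnv(os.LookupEnv)
}

// enabledTools returns the tool names to register for the given mode.
func enabledTools(readOnly bool) []string {
	tools := []string{
		"get_cluster_configuration", "job_details", "qacct", "qstat", "qsub_help",
	}
	if !readOnly {
		// Write tools are only registered when read-only mode is off.
		tools = append(tools, "set_cluster_configuration", "submit_job")
	}
	return tools
}
```

Passing the lookup function as a parameter keeps the decision testable without modifying the process environment.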
105+
106+
---
107+
108+
## Contributing
109+
110+
We welcome PRs and contributions for improvements or new features. Feel free to open an issue for questions, feedback, or discussion.
111+
112+
---
113+
114+
## License
115+
116+
You can use or modify this integration for your unique setup. Check the repository’s LICENSE file (if included) for specific terms.
117+
118+
---
119+
120+
## Contact
121+
122+
If you have any questions or require further assistance, reach out via the issues section in this repository. We’re happy to help you get started or troubleshoot any issues along the way.
123+
124+
---
125+
126+
**Thank you for exploring this MCP integration for OCS & GCS.**

cmd/describe-mcp/accounting_tools.go

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
1+
/*___INFO__MARK_BEGIN__*/
2+
/*************************************************************************
3+
* Copyright 2025 HPC-Gridware GmbH
4+
*
5+
* Licensed under the Apache License, Version 2.0 (the "License");
6+
* you may not use this file except in compliance with the License.
7+
* You may obtain a copy of the License at
8+
*
9+
* http://www.apache.org/licenses/LICENSE-2.0
10+
*
11+
* Unless required by applicable law or agreed to in writing, software
12+
* distributed under the License is distributed on an "AS IS" BASIS,
13+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
* See the License for the specific language governing permissions and
15+
* limitations under the License.
16+
*
17+
************************************************************************/
18+
/*___INFO__MARK_END__*/
19+
20+
package main
21+
22+
import (
23+
"context"
24+
"encoding/json"
25+
"fmt"
26+
"log"
27+
"strconv"
28+
29+
qacct "github.com/hpc-gridware/go-clusterscheduler/pkg/qacct/v9.0"
30+
"github.com/mark3labs/mcp-go/mcp"
31+
)
32+
33+
// registerAccountingTools registers all job accounting related tools
34+
func registerAccountingTools(s *SchedulerServer, config SchedulerServerConfig) error {
35+
// Add qacct tool
36+
s.server.AddTool(mcp.NewTool(
37+
"qacct",
38+
mcp.WithDescription("Retrieves accounting information about finished jobs in the Gridware Cluster Scheduler. This tool allows querying job history, resource usage, and execution details for completed jobs. Use with various options or specify job IDs to get detailed information about specific jobs."),
39+
mcp.WithArray("arguments",
40+
mcp.Description("Command line arguments for qacct (e.g., '-j 123' for job information, '-u username' for user jobs, '-help' for help documentation)."),
41+
),
42+
), func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
43+
log.Printf("Executing qacct command")
44+
45+
// Get optional arguments
46+
var arguments []string
47+
if args, ok := req.Params.Arguments["arguments"].([]interface{}); ok {
48+
for _, arg := range args {
49+
if strArg, ok := arg.(string); ok {
50+
arguments = append(arguments, strArg)
51+
}
52+
}
53+
}
54+
55+
// Execute qacct command
56+
output, err := getAccountingInfo(ctx, arguments)
57+
if err != nil {
58+
log.Printf("Failed to execute qacct: %v", err)
59+
return &mcp.CallToolResult{
60+
Content: []mcp.Content{
61+
mcp.TextContent{
62+
Type: "text",
63+
Text: fmt.Sprintf("Failed to execute qacct command: %v", err),
64+
},
65+
},
66+
IsError: true,
67+
}, nil
68+
}
69+
70+
log.Printf("Successfully executed qacct command")
71+
72+
return &mcp.CallToolResult{
73+
Content: []mcp.Content{
74+
mcp.TextContent{
75+
Type: "text",
76+
Text: output,
77+
},
78+
},
79+
}, nil
80+
})
81+
82+
// Add job_details tool
83+
s.server.AddTool(mcp.NewTool(
84+
"job_details",
85+
mcp.WithDescription("Retrieves detailed accounting information about finished jobs in a structured format. This tool returns comprehensive data about job execution, including resource usage, submission parameters, and execution timelines. Specify job IDs to get information about specific jobs or leave empty to get details for all finished jobs."),
86+
mcp.WithArray("job_ids",
87+
mcp.Description("List of job IDs to retrieve details for. If omitted, details for all finished jobs will be returned."),
88+
),
89+
), func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
90+
log.Printf("Retrieving job details")
91+
92+
// Parse job IDs if provided
93+
var jobIDs []int64
94+
if ids, ok := req.Params.Arguments["job_ids"].([]interface{}); ok && len(ids) > 0 {
95+
for _, id := range ids {
96+
// Handle different possible formats (string, number)
97+
switch v := id.(type) {
98+
case string:
99+
if jobID, err := strconv.ParseInt(v, 10, 64); err == nil {
100+
jobIDs = append(jobIDs, jobID)
101+
} else {
102+
log.Printf("Invalid job ID format: %s", v)
103+
}
104+
case float64:
105+
jobIDs = append(jobIDs, int64(v))
106+
}
107+
}
108+
}
109+
110+
// Get job details
111+
output, err := getStructuredJobDetails(ctx, jobIDs)
112+
if err != nil {
113+
log.Printf("Failed to get job details: %v", err)
114+
return &mcp.CallToolResult{
115+
Content: []mcp.Content{
116+
mcp.TextContent{
117+
Type: "text",
118+
Text: fmt.Sprintf("Failed to retrieve job details: %v", err),
119+
},
120+
},
121+
IsError: true,
122+
}, nil
123+
}
124+
125+
log.Printf("Successfully retrieved job details")
126+
127+
return &mcp.CallToolResult{
128+
Content: []mcp.Content{
129+
mcp.TextContent{
130+
Type: "text",
131+
Text: output,
132+
},
133+
},
134+
}, nil
135+
})
136+
137+
return nil
138+
}
139+
140+
// Helper functions for job accounting
141+
142+
// getAccountingInfo executes qacct with the given arguments
143+
func getAccountingInfo(ctx context.Context, args []string) (string, error) {
144+
qa, err := qacct.NewCommandLineQAcct(qacct.CommandLineQAcctConfig{
145+
Executable: "qacct",
146+
})
147+
if err != nil {
148+
return "", fmt.Errorf("internal error: failed to initialize qacct command line tool: %v", err)
149+
}
150+
151+
output, err := qa.NativeSpecification(args)
152+
if err != nil {
153+
return "", fmt.Errorf("internal error: failed to execute qacct command: %v", err)
154+
}
155+
156+
return output, nil
157+
}
158+
159+
// getStructuredJobDetails retrieves job details using qacct
160+
func getStructuredJobDetails(ctx context.Context, jobIDs []int64) (string, error) {
161+
qa, err := qacct.NewCommandLineQAcct(qacct.CommandLineQAcctConfig{
162+
Executable: "qacct",
163+
})
164+
if err != nil {
165+
return "", fmt.Errorf("internal error: failed to initialize qacct command line tool: %v", err)
166+
}
167+
168+
jobDetails, err := qa.ShowJobDetails(jobIDs)
169+
if err != nil {
170+
return "", fmt.Errorf("internal error: failed to show job details: %v", err)
171+
}
172+
173+
// Convert job details to JSON for structured output
174+
data, err := json.MarshalIndent(jobDetails, "", " ")
175+
if err != nil {
176+
return "", fmt.Errorf("internal error: failed to format job details: %v", err)
177+
}
178+
179+
return string(data), nil
180+
}
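
The `job_details` handler above accepts job IDs as strings or JSON numbers, logs and skips malformed strings, silently ignores other types, and truncates fractional numbers through `int64(v)`. A stricter variant could reject such input instead. The sketch below is one possible approach, not part of the commit: `parseJobIDs` is a hypothetical helper, and requiring job IDs to be positive integers is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// maxExactFloat is the largest magnitude up to which float64 represents
// every integer exactly (2^53).
const maxExactFloat = float64(1 << 53)

// parseJobIDs converts decoded JSON array elements into job IDs.
// Hypothetical helper: it returns an error for fractional, non-positive,
// or unsupported values instead of skipping or truncating them.
func parseJobIDs(raw []interface{}) ([]int64, error) {
	ids := make([]int64, 0, len(raw))
	for i, item := range raw {
		switch v := item.(type) {
		case string:
			id, err := strconv.ParseInt(v, 10, 64)
			if err != nil || id < 1 {
				return nil, fmt.Errorf("job_ids[%d]: invalid job ID %q", i, v)
			}
			ids = append(ids, id)
		case float64:
			// encoding/json decodes JSON numbers into float64.
			if v != math.Trunc(v) || v < 1 || v > maxExactFloat {
				return nil, fmt.Errorf("job_ids[%d]: invalid job ID %v", i, v)
			}
			ids = append(ids, int64(v))
		default:
			return nil, fmt.Errorf("job_ids[%d]: unsupported type %T", i, item)
		}
	}
	return ids, nil
}
```

A handler could call this on the `job_ids` argument and return the error text to the client with `IsError: true`, in the same way the existing handlers report failures.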
