WIP: Introduce llama.cpp API Remoting #1
base: main-upstream
Conversation
Reviewer's Guide

This PR introduces a new API-remoting layer into llama.cpp by splitting the ggml backend into a "remoting frontend" (runs in the client process) and a "remoting backend" (runs in a separate host process). It uses virtio-gpu execbuffers and shared memory to marshal ggml tensors and graphs across the ABI boundary. The build system is extended to conditionally compile two new libraries (ggml-remotingfrontend and ggml-remotingbackend), and dozens of new source files implement the command dispatch, serialization of ggml_tensor/ggml_cgraph, remote buffer and device wrappers, and the virtgpu-based transport. Timing hooks were also added to measure Metal kernels and run.cpp token loops.

Class diagram for Remoting Frontend/Backend and Device Wrappers

classDiagram
class GGMLRemotingFrontend {
+send_tensor()
+send_graph()
+receive_result()
+marshal_data()
}
class GGMLRemotingBackend {
+receive_tensor()
+receive_graph()
+execute_graph()
+return_result()
}
class VirtioGPUTransport {
+send_buffer()
+receive_buffer()
+setup_shared_memory()
}
class GGMLDeviceWrapper {
+wrap_device()
+unwrap_device()
+marshal_device_info()
}
    GGMLRemotingFrontend ..> VirtioGPUTransport : uses
    GGMLRemotingBackend ..> VirtioGPUTransport : uses
    GGMLRemotingFrontend ..> GGMLDeviceWrapper : wraps
    GGMLRemotingBackend ..> GGMLDeviceWrapper : wraps
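To make the encode/submit/decode round trip concrete, here is a minimal C++ sketch of a frontend forwarding call, modeled on the REMOTE_CALL_* macros and vn_* helpers quoted later in this review. The function name and the command constant are illustrative assumptions, not necessarily the PR's actual API.

// Illustrative sketch only: APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP and this
// exact signature are assumptions based on the snippets quoted in this review.
bool apir_device_supports_op(struct virtgpu *gpu, const ggml_tensor *op) {
    struct vn_cs_encoder *encoder;
    struct vn_cs_decoder *decoder;

    // bind an encoder to an execbuffer slot for this command type
    REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP);

    // serialize the request into the shared-memory buffer
    vn_encode_ggml_tensor(encoder, op);

    // submit the execbuffer to the host and wait for the reply
    REMOTE_CALL(gpu, encoder, decoder);

    // deserialize the reply written by the backend dispatcher
    bool supported;
    vn_decode_bool_t(decoder, &supported);

    REMOTE_CALL_FINISH(gpu, encoder, decoder);
    return supported;
}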
Important: Review skipped

Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the repository configuration file. You can disable this status message through the configuration as well.

Walkthrough

This update introduces a comprehensive remoting backend and frontend for the GGML framework, including new CMake options, build scripts, and extensive C++ implementations for remote device, buffer, and graph management. Numerous serialization, deserialization, and communication utilities are added, along with shell scripts for build and run automation. Many GitHub Actions workflow and template files are removed, and logging is silenced across several model and vocab info functions.
Sequence Diagram(s)

sequenceDiagram
participant User
participant ShellScript
participant CMake
participant Backend
participant Frontend
participant VirtGPU
participant RemoteGPU
User->>ShellScript: Run build or inference script
ShellScript->>CMake: Configure build (remoting/vulkan/backend)
CMake->>Backend: Build remoting backend/frontend
User->>Frontend: Start inference (e.g., llama-run)
Frontend->>VirtGPU: Prepare/encode computation graph
VirtGPU->>RemoteGPU: Send serialized graph via shared memory
RemoteGPU->>VirtGPU: Compute and return status/result
VirtGPU->>Frontend: Decode and process result
Frontend->>User: Output inference result and timing
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 91
🔭 Outside diff range comments (5)
run.sh (1)

1-2: Add a proper shebang (and fail-early flags) so the script can be executed directly

Without a shebang the kernel cannot pick an interpreter, so ./run.sh will exit with "Exec format error". Add a portable shebang and fail-fast flags:

+#!/usr/bin/env bash
+set -euo pipefail
+
 ./build/bin/llama-run --ngl 999 --verbose ~/models/llama3.2 "say nothing"

prepare.sh (1)

1-2: Missing shebang prevents direct execution

Same issue as run.sh: the script is not runnable with ./prepare.sh. Recommend:

+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . -B ./build -DGGML_VULKAN=ON -DGGML_NATIVE=OFF -DGGML_METAL=OFF #-DCMAKE_BUILD_TYPE=Debug #-DGGML_VULKAN_DEBUG=1

prepare.vulkan.sh (1)

1-7: Add shebang & safety flags for consistency with other helper scripts

+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . \
   -B ../build.vulkan \
   -DGGML_VULKAN=ON \
   -DGGML_NATIVE=OFF \
   -DGGML_METAL=OFF \
   -DCMAKE_BUILD_TYPE=Debug

src/llama-kv-cache-unified.cpp (1)

74-86: dev_name is now unused — will trigger -Wunused-variable and break -Werror builds

With the debug log line commented out, dev_name is never read. Many toolchains compile ggml/llama.cpp with -Wall -Werror; the unused variable will therefore fail the build.

-    const char * dev_name = "CPU";
+    const char * dev_name = "CPU";
@@
-    //LLAMA_LOG_DEBUG("%s: layer %3d: dev = %s\n", __func__, il, dev_name);
+#if defined(LLAMA_LOG_DEBUG) && !defined(NDEBUG)
+    LLAMA_LOG_DEBUG("%s: layer %3d: dev = %s\n", __func__, il, dev_name);
+#else
+    (void)dev_name; // silence unused-var in release builds
+#endif

This keeps the diagnostic available for debug builds while guaranteeing a clean release compile.
podman_compile.sh (1)
9-40
: Validate podman availability before execution.The script doesn't check if podman is installed and available.
Add validation at the beginning of the script:
if ! command -v podman &> /dev/null; then
    echo "Error: podman is not installed or not in PATH"
    exit 1
fi
🧹 Nitpick comments (43)
run.vulkan.sh (2)
1-1
: Tighten the shebang
#! /bin/bash has an extra space and hard-codes /bin/bash. Prefer env lookup:

-#! /bin/bash
+#!/usr/bin/env bash
2-8
: Fail fast and quote prefix variable

Consider aborting on any error and treating unset vars as fatal:

+set -euo pipefail
+
 if [[ ${1:-} == "strace" ]]; then
   prefix="strace"
 elif [[ ${1:-} == "gdb" ]]; then
   prefix="gdb --args"
 else
   prefix=""
 fi

ggml/CMakeLists.txt (1)
276-276
: Header installed unconditionally, regardless of option state

ggml-remoting-frontend.h is exported even when GGML_REMOTING_FRONTEND=OFF. Downstream consumers will include a header whose implementation support is absent, leading to link-time errors.

Guard the public header list behind the same option, or always build a thin stub implementation so that symbol availability matches the install surface.
src/llama-model-loader.cpp (3)
679-697
: Redundant work after disabling KV-dump logging

The for (int i = 0; i < n_kv; i++) loop still does string formatting (gguf_kv_to_str, replace_all, etc.) even though every LLAMA_LOG_INFO call inside it is commented out. For large models this can be a noticeable cost at load time while providing zero benefit.

-    for (int i = 0; i < n_kv; i++) {
-        ...
-        //LLAMA_LOG_INFO(...);
-    }
+    /* KV dump disabled */
+    if (trace > 1) { // or a new `LLAMA_VERBOSE_META` flag
+        for (int i = 0; i < n_kv; i++) {
+            ...
+            LLAMA_LOG_INFO(...);
+        }
+    }

Guarding (or #ifdef-ing) the whole loop avoids useless allocations and string ops when logging is off.
700-706
: Same issue for type-count loop

The loop that walks n_type now only computes and discards its body. Wrap it behind the same flag or remove it entirely to avoid wasted cycles.
1156-1164
: Early return leaves dead, compiled-in code

print_info() now immediately returns, leaving ~8 lines of unreachable logging code. Most compilers will warn; it also confuses readers.

Options:
- Delete the unreachable block.
- #if 0 it.
- Add a runtime flag (if (!verbose) return;).

Example:

-void llama_model_loader::print_info() const {
-    return;
+#ifndef LLAMA_QUIET_LOAD
+void llama_model_loader::print_info() const {
+#endif

Pick whichever aligns with the new logging policy, but avoid shipping dead code.
src/llama-model.cpp (1)
1563-1568
: Consider using conditional logging instead of commenting out debug statements.

While reducing noise during model loading aligns with the PR objectives, completely commenting out debug logs can hinder troubleshooting. The device assignment and SWA status information could be valuable for debugging tensor loading issues.
Consider implementing a more flexible logging approach:
- Use environment variables or build-time flags to control debug logging
- Implement log level filtering at runtime
- Or remove the commented code entirely if it's no longer needed
-    //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);
+    LLAMA_LOG_TRACE("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);

-    //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);
+    LLAMA_LOG_TRACE("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);

prepare.backend.sh (2)
7-7
: Avoid hard-coded macOS-specific Vulkan paths

/opt/homebrew/... is Apple Silicon-specific and will fail on Intel macOS, Linux, and CI runners. Prefer passing these via environment variables (e.g. VULKAN_SDK) or CMake cache options so callers can override.
10-11
: Consider exposing the Debug build flag through an option instead of leaving it commented

Leaving a commented flag invites drift. Offer a boolean CLI switch (e.g. --debug) or document usage in README to toggle -DCMAKE_BUILD_TYPE=Debug.

ggml/include/ggml-metal.h (1)
64-67
: Good API addition for device capability querying.

The function signature is well-designed and follows existing conventions. The boolean output parameters clearly indicate the device capabilities being queried.
Consider adding documentation comments to explain what each capability flag represents:
+// Query Metal device capabilities
+// has_simdgroup_mm: whether device supports SIMD group matrix multiplication
+// has_simdgroup_reduction: whether device supports SIMD group reduction operations
+// use_bfloat: whether device should use bfloat16 precision
 GGML_BACKEND_API void ggml_backend_metal_get_device_context(ggml_backend_dev_t dev, bool *has_simdgroup_mm, bool *has_simdgroup_reduction, bool *use_bfloat);

ggml/src/ggml-remotingfrontend/CMakeLists.txt (1)
28-28
: Add error handling for missing drm library dependency.

Consider adding a check to ensure the drm library is available before linking:
 # dnf install -y libdrm-devel
+find_library(DRM_LIBRARY drm REQUIRED)
+if(NOT DRM_LIBRARY)
+  message(FATAL_ERROR "libdrm not found. Install libdrm-devel package.")
+endif()
 target_link_libraries(ggml-remotingfrontend PUBLIC drm)

ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1)
3-19
: Consider more meaningful return value and input validation.

The function always returns true regardless of the remote call outcome. Consider returning the actual success status and adding input validation:

 bool apir_metal_get_device_context(struct virtgpu *gpu, struct ggml_backend_metal_device_context *metal_dev_ctx) {
+    if (!gpu || !metal_dev_ctx) {
+        return false;
+    }
+
     struct vn_cs_encoder *encoder;
     struct vn_cs_decoder *decoder;

     REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT);
     REMOTE_CALL(gpu, encoder, decoder);

     vn_decode_bool_t(decoder, &metal_dev_ctx->has_simdgroup_mm);
     vn_decode_bool_t(decoder, &metal_dev_ctx->has_simdgroup_reduction);
     vn_decode_bool_t(decoder, &metal_dev_ctx->use_bfloat);

     REMOTE_CALL_FINISH(gpu, encoder, decoder);

     return true;
 }

Note: The macros appear to handle errors via FATAL, so the current approach may be intentional.

ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (2)
7-8
: Remove unused CACHED macro and fix comment typo.

The CACHED macro appears to be unused, and there's a typo in the comment:

-#define CACHED
-// printf("INFO: ### found response in the cache %s\n", __func__)

Consider removing unused code to keep the header clean.
15-16
: Consider defensive check for encoder variable name.

The macro uses encoder in the conditional check but takes encoder_name as parameter. While this works with current usage, consider making it more robust:

-    if (!encoder) { \
+    if (!encoder_name) { \

This makes the macro more self-contained and less dependent on variable naming conventions.
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1)
7-16
: Add documentation for public API.

Consider adding documentation comments to explain the purpose and usage of the struct and functions.

+/**
+ * Metal device context containing device capabilities
+ */
 struct ggml_backend_metal_device_context {
     bool has_simdgroup_mm;
     bool has_simdgroup_reduction;
     bool use_bfloat;
 };

+/**
+ * Get the Metal device context for a given backend device
+ * @param dev The backend device
+ * @return Pointer to the Metal device context, or NULL if not available
+ */
 const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev);

+/**
+ * Check if a Metal device context supports a specific tensor operation
+ * @param ctx_dev The Metal device context
+ * @param op The tensor operation to check
+ * @return true if the operation is supported, false otherwise
+ */
 bool ggml_metal_supports_op(const struct ggml_backend_metal_device_context * ctx_dev, const struct ggml_tensor * op);

ggml/src/ggml-remotingbackend/backend-dispatched.cpp (2)
11-17
: Consider encapsulating global state.

Global variables should be encapsulated to prevent unintended access and improve maintainability. Consider wrapping these globals in a namespace or singleton class:

-ggml_backend_reg_t reg = NULL;
-ggml_backend_dev_t dev = NULL;
-ggml_backend_t bck = NULL;
-
-long long timer_start = 0;
-long long timer_total = 0;
-long long timer_count = 0;
+namespace backend_state {
+    ggml_backend_reg_t reg = NULL;
+    ggml_backend_dev_t dev = NULL;
+    ggml_backend_t bck = NULL;
+
+    long long timer_start = 0;
+    long long timer_total = 0;
+    long long timer_count = 0;
+}
44-44
: Use INFO level for informational messages.

Free memory information is not a warning condition.

-    WARNING("%s: free memory: %ld MB\n", __func__, (size_t) free/1024/1024);
+    INFO("%s: free memory: %ld MB\n", __func__, (size_t) free/1024/1024);

ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (1)
82-82
: Define magic number as a constant.

The magic number 16384 should be defined as a named constant for clarity. Add at the top of the file:

+// Minimum alignment for shared memory allocations
+#define VIRTGPU_SHMEM_MIN_ALIGN 16384
+
 static uint32_t virtgpu_ioctl_resource_create_blob(struct virtgpu *gpu,

Then use it:

-    size = align64(size, 16384);
+    size = align64(size, VIRTGPU_SHMEM_MIN_ALIGN);

ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (2)
3-7
: Remove unused function or fix its implementation.

The function current_time_ms is not used anywhere and has issues:

- Returns nanoseconds despite the _ms suffix
- Comment suggests using CLOCK_MONOTONIC but uses CLOCK_REALTIME

Either remove the function entirely or fix it if it will be used:

-static long long current_time_ms() {
-    struct timespec ts;
-    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
-    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
-}
42-44
: Consider using a more appropriate default status.

GGML_STATUS_ABORTED might not be the best default if decoding fails. Consider using a status that indicates a communication/decoding failure.

-    ggml_status status = GGML_STATUS_ABORTED;
+    ggml_status status = GGML_STATUS_FAILED; // or another appropriate error status

build.backend.sh (1)
30-36
: Use more idiomatic error handling.

Instead of checking $?, use the command directly in the if statement.

-cmake --build ../build.remoting-backend$FLAVOR --parallel 8 --target $TARGETS "$@"
-
-if [[ $? == 0 ]]; then
+if cmake --build ../build.remoting-backend$FLAVOR --parallel 8 --target $TARGETS "$@"; then
     touch READY_backend
 else
     touch FAILED_backend
 fi

run.remoting.sh (1)
42-42
: Fix grammar in the prompt.

The prompt has a grammatical error.

-    PROMPT="tell what's Apple metal API"
+    PROMPT="tell me what Apple Metal API is"

ggml/src/ggml-metal/ggml-metal.m (1)
11-12
: Consider using a compile-time flag to control debug logging.

Instead of unconditionally disabling debug logging, consider using a compile-time flag to allow users to enable debug logs when needed for troubleshooting.

+#ifdef GGML_METAL_DISABLE_DEBUG_LOGS
 #undef GGML_LOG_DEBUG
 #define GGML_LOG_DEBUG(...)
+#endif

tools/run/run.cpp (1)
991-996
: Consider making timer functions static instead of inline

These inline functions are only used within this translation unit. Making them static instead of inline would be more appropriate and let the compiler decide on inlining.

-static inline void start_timer(void) {
+static void start_timer(void) {
     struct timespec ts;
-    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+    clock_gettime(CLOCK_MONOTONIC, &ts);
     timer_start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
 }

-static inline void stop_timer(void) {
+static void stop_timer(void) {
     struct timespec ts;
-    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+    clock_gettime(CLOCK_MONOTONIC, &ts);

Also applies to: 997-1005
ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1)
13-57
: Consider returning meaningful error codes

The function always returns 0 regardless of success or failure paths. Consider using the return value to indicate different error conditions, as sketched below.
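One possible shape for this suggestion, as a self-contained sketch (the enum and names are illustrative, not the PR's actual API):

#include <cstdint>

// Illustrative status codes: reserve 0 for success and distinct codes per
// failure mode, so the frontend can tell decode errors from compute errors.
enum class DispatchStatus : uint32_t {
    Ok            = 0,
    DecodeFailed  = 1,
    ComputeFailed = 2,
};

DispatchStatus dispatch_graph_compute(bool decoded_ok, bool compute_ok) {
    if (!decoded_ok) return DispatchStatus::DecodeFailed;
    if (!compute_ok) return DispatchStatus::ComputeFailed;
    return DispatchStatus::Ok;
}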
ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (2)
44-54
: Remove commented debug code

Please remove the commented-out debug code to keep the codebase clean.

 struct virtgpu *gpu = BUFFER_TO_GPU(buffer);
-#if 0
-  INFO("%s: data=%p, offset=%lu, size=%lu\n", __func__, data, offset, size);
-#endif
-#if 0
-  void **addr = (void **)(uintptr_t)data;
-  for (int i = 0; i <= 10; i++) {
-    INFO("%s: %p | %llx", __func__, addr, *addr);
-    addr++;
-  }
-  INFO("\n");
-#endif
 struct ggml_backend_remoting_buffer_context *context = BUFFER_TO_GGML_CONTEXT(buffer);
15-23
: Add null check for context->base assignment

The function assigns to context->base without checking if the assignment was successful.

 if (context->base) {
     return context->base;
 }

 context->base = apir_buffer_get_base(BUFFER_TO_GPU(buffer), BUFFER_TO_APIR_CONTEXT(buffer));
+if (!context->base) {
+    ERROR("Failed to get buffer base address");
+}

 return context->base;

ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1)
74-81
: Device description buffer is not properly used.

The desc buffer is created but only contains static text. It should be populated with device-specific information. Consider populating the description with device details:

-    char desc[256] = "API Remoting device";
+    char desc[256];
+    snprintf(desc, sizeof(desc), "API Remoting device %d", i);

ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (3)
19-20
: Define HAVE___BUILTIN_CLZLL for compiler optimization.

The macro HAVE___BUILTIN_CLZLL is not defined anywhere, causing the function to always use the slower fallback implementation instead of the built-in __builtin_clzll. Consider using compiler detection:

-#if defined(HAVE___BUILTIN_CLZLL)
+#if defined(__GNUC__) || defined(__clang__)
     return ((sizeof(uint64_t) * 8 - 1) - __builtin_clzll(n | 1));
65-79
: Consider iterative cleanup to avoid stack overflow.

The recursive cleanup in _util_sparse_array_node_finish could cause stack overflow for deeply nested sparse arrays. Consider using an iterative approach with a stack/queue for large data structures; a generic sketch follows.
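A generic sketch of the iterative approach (a made-up node type, not the PR's util_sparse_array): an explicit stack replaces recursion, so teardown depth no longer depends on the call stack.

#include <stack>
#include <vector>

struct Node {
    std::vector<Node*> children;
};

void destroy_tree(Node *root) {
    if (!root) return;
    std::stack<Node*> pending;
    pending.push(root);
    while (!pending.empty()) {
        Node *n = pending.top();
        pending.pop();
        // queue the children before freeing the parent
        for (Node *child : n->children) {
            if (child) pending.push(child);
        }
        delete n; // each node is freed exactly once
    }
}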
188-195
: Fix typo and consider removing debug code.

The function contains a typo ("happilly" should be "happily") and uses a global variable hack to avoid compiler warnings.

-    INFO("thks bye, stopping early and happilly :)");
+    INFO("thks bye, stopping early and happily :)");

Consider removing these debug functions or wrapping them in debug-only compilation flags.
ggml/src/ggml-remotingfrontend/virtgpu.cpp (1)
81-82
: Make shared memory sizes configurable.

The shared memory sizes are hardcoded, which may not be suitable for all use cases. Consider making these sizes configurable through environment variables or initialization parameters:

-    gpu->reply_shmem = virtgpu_shmem_create(gpu, 0x4000);
-    gpu->data_shmem = virtgpu_shmem_create(gpu, 0x1830000); // 24MiB
+    size_t reply_size = getenv("VIRTGPU_REPLY_SIZE") ?
+        atol(getenv("VIRTGPU_REPLY_SIZE")) : 0x4000;
+    size_t data_size = getenv("VIRTGPU_DATA_SIZE") ?
+        atol(getenv("VIRTGPU_DATA_SIZE")) : 0x1830000;
+    gpu->reply_shmem = virtgpu_shmem_create(gpu, reply_size);
+    gpu->data_shmem = virtgpu_shmem_create(gpu, data_size);

ggml/src/ggml-remotingfrontend/virtgpu.h (1)
1-1
: Use traditional include guards instead of #pragma once.

While #pragma once is widely supported, traditional include guards are more portable and compatible with all compilers.

-#pragma once
+#ifndef GGML_VIRTGPU_H
+#define GGML_VIRTGPU_H

And add at the end of the file:

+#endif // GGML_VIRTGPU_H
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (1)
44-44
: Consider making INFO logging configurable.

Multiple functions contain INFO logging that may be too verbose for production use. Consider using a debug flag or log level to control these messages.

-    INFO("%s: Forward BUFT ALIGNMENT --> %zu ", __func__, alignment);
+    // INFO("%s: Forward BUFT ALIGNMENT --> %zu ", __func__, alignment);

Also applies to: 65-65, 86-86, 99-99
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)
48-69
: Consider moving inline function implementations to a source file.

The WARNING and FATAL inline functions in the header may cause code bloat if included in many translation units. Consider moving the implementations to a source file, as in the sketch below.
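A minimal sketch of that split, with assumed function names (the real WARNING/FATAL signatures live in virtgpu-utils.h): the header keeps only declarations, and the definitions are compiled once in a .cpp file.

// virtgpu-utils.cpp (illustrative): declarations stay in the header,
// and these single definitions replace the per-includer inline bodies.
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

void virtgpu_warning(const char *format, ...) {
    va_list args;
    va_start(args, format);
    std::vfprintf(stderr, format, args);
    va_end(args);
    std::fputc('\n', stderr);
}

[[noreturn]] void virtgpu_fatal(const char *format, ...) {
    va_list args;
    va_start(args, format);
    std::vfprintf(stderr, format, args);
    va_end(args);
    std::fputc('\n', stderr);
    std::abort();
}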
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (1)

62-62
: Remove unnecessary return statement.

The explicit return; at the end of a void function is unnecessary.

-    return;
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1)
54-74
: Consider caching device description for consistency.

The device description function doesn't cache its result, unlike apir_device_get_name and apir_device_get_type. This inconsistency could lead to unnecessary repeated remote calls.

 const char * apir_device_get_description(struct virtgpu *gpu) {
+    static char *string = nullptr;
+    if (string) {
+        CACHED;
+        return string;
+    }
     struct vn_cs_encoder *encoder;
     struct vn_cs_decoder *decoder;

     REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION);
     REMOTE_CALL(gpu, encoder, decoder);

     const size_t string_size = vn_decode_array_size_unchecked(decoder);
-    char *string = (char *) vn_cs_decoder_alloc_array(decoder, sizeof(char), string_size);
+    string = (char *) vn_cs_decoder_alloc_array(decoder, sizeof(char), string_size);
     if (!string) {
         FATAL("%s: Could not allocate the device description buffer", __func__);
     }
     vn_decode_char_array(decoder, string, string_size);

     INFO("%s: Forward DEV DESCR --> %s", __func__, string);

     REMOTE_CALL_FINISH(gpu, encoder, decoder);

     return string;
 }

ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (2)
34-36
: Simplify tensor pointer casting.

The double cast through uintptr_t is unnecessary, and the comment about removing const is confusing since the result is assigned to a non-const pointer.

 ggml_tensor *tensor;
-// safe to remove the const qualifier here
-tensor = (ggml_tensor *) (uintptr_t) vn_decode_ggml_tensor(dec);
+tensor = const_cast<ggml_tensor *>(vn_decode_ggml_tensor(dec));
98-98
: Fix inconsistent indentation.

 void *shmem_data = ctx->iface.get_shmem_ptr(ctx->virgl_ctx, shmem_res_id);
-  if (!shmem_data) {
+ if (!shmem_data) {
     FATAL("Couldn't get the shmem addr from virgl :/");
 }

ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)
124-124
: Fix error message formatting.

The error message has a typo at the end.

-    return "Unknown APIR_BACKEND_INITIALIZE error:/";
+    return "Unknown APIR_BACKEND_INITIALIZE error";

ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1)
6-22
: Consider reordering struct fields for natural alignment

The struct uses explicit padding, but you could minimize padding requirements by grouping fields by size. Consider ordering fields from largest to smallest alignment requirements (uint64_t fields first, then uint32_t/int32_t, then chars).

 struct rpc_tensor {
+    // 8-byte aligned fields
     uint64_t id;
-    uint32_t type;
     uint64_t buffer;
-    uint32_t ne[GGML_MAX_DIMS];
-    uint32_t nb[GGML_MAX_DIMS];
-    uint32_t op;
-    int32_t  op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];
-    int32_t  flags;
     uint64_t src[GGML_MAX_SRC];
     uint64_t view_src;
     uint64_t view_offs;
     uint64_t data;
+
+    // 4-byte aligned fields
+    uint32_t type;
+    uint32_t ne[GGML_MAX_DIMS];
+    uint32_t nb[GGML_MAX_DIMS];
+    uint32_t op;
+    int32_t  op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];
+    int32_t  flags;
+
+    // 1-byte aligned fields
     char name[GGML_MAX_NAME];
-
-    char padding[4];
 };
40-82
: Consider using a proper logging frameworkThe debug/logging macros use
printf
directly. Consider using a logging framework or at least routing through a centralized logging function for better control over log levels and destinations.Example approach:
// Define log levels enum LogLevel { LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR }; // Centralized logging function void ggml_remoting_log(LogLevel level, const char* fmt, ...); // Update macros to use centralized logging #define NOT_IMPLEMENTED \ ggml_remoting_log(LOG_WARN, "reached unimplemented function %s", __func__)src/llama-kv-cache.cpp (1)
89-92
: Consider using conditional logging instead of commented code.The debug logging is commented out. Consider using a debug flag or log level to control this output.
- /* - LLAMA_LOG_DEBUG("%s: layer %3d: n_embd_k_gqa = %d, n_embd_v_gqa = %d, dev = %s\n", __func__, - i, n_embd_k_gqa, n_embd_v_gqa, dev_name); - */ + if (LLAMA_LOG_LEVEL >= LLAMA_LOG_LEVEL_DEBUG) { + LLAMA_LOG_DEBUG("%s: layer %3d: n_embd_k_gqa = %d, n_embd_v_gqa = %d, dev = %s\n", __func__, + i, n_embd_k_gqa, n_embd_v_gqa, dev_name); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (98)
.github/ISSUE_TEMPLATE/010-bug-compilation.yml (0 hunks)
.github/ISSUE_TEMPLATE/011-bug-results.yml (0 hunks)
.github/ISSUE_TEMPLATE/019-bug-misc.yml (0 hunks)
.github/ISSUE_TEMPLATE/020-enhancement.yml (0 hunks)
.github/ISSUE_TEMPLATE/030-research.yml (0 hunks)
.github/ISSUE_TEMPLATE/040-refactor.yml (0 hunks)
.github/ISSUE_TEMPLATE/config.yml (0 hunks)
.github/actions/get-tag-name/action.yml (0 hunks)
.github/actions/windows-setup-cuda/action.yml (0 hunks)
.github/actions/windows-setup-curl/action.yml (0 hunks)
.github/labeler.yml (0 hunks)
.github/pull_request_template.md (0 hunks)
.github/workflows/bench.yml.disabled (0 hunks)
.github/workflows/build-linux-cross.yml (0 hunks)
.github/workflows/build.yml (0 hunks)
.github/workflows/close-issue.yml (0 hunks)
.github/workflows/docker.yml (0 hunks)
.github/workflows/editorconfig.yml (0 hunks)
.github/workflows/gguf-publish.yml (0 hunks)
.github/workflows/labeler.yml (0 hunks)
.github/workflows/python-check-requirements.yml (0 hunks)
.github/workflows/python-lint.yml (0 hunks)
.github/workflows/python-type-check.yml (0 hunks)
.github/workflows/release.yml (0 hunks)
.github/workflows/server.yml (0 hunks)
.github/workflows/winget.yml (0 hunks)
CMakePresets.json (1 hunks)
Makefile (2 hunks)
build-xcframework.sh (0 hunks)
build.backend.sh (1 hunks)
ggml/CMakeLists.txt (2 hunks)
ggml/include/ggml-metal.h (1 hunks)
ggml/include/ggml-remoting-frontend.h (1 hunks)
ggml/src/CMakeLists.txt (1 hunks)
ggml/src/ggml-backend-reg.cpp (3 hunks)
ggml/src/ggml-metal/ggml-metal.m (4 hunks)
ggml/src/ggml-remotingbackend/CMakeLists.txt (1 hunks)
ggml/src/ggml-remotingbackend/backend-convert.h (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched.cpp (1 hunks)
ggml/src/ggml-remotingbackend/backend-dispatched.h (1 hunks)
ggml/src/ggml-remotingbackend/backend-internal.h (1 hunks)
ggml/src/ggml-remotingbackend/backend-utils.h (1 hunks)
ggml/src/ggml-remotingbackend/backend.cpp (1 hunks)
ggml/src/ggml-remotingbackend/shared/api_remoting.h (1 hunks)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (1 hunks)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (1 hunks)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (1 hunks)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1 hunks)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1 hunks)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/CMakeLists.txt (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend-buffer-type.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-backend.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-remoting-frontend.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/ggml-remoting.h (1 hunks)
ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1 hunks)
ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1 hunks)
ggml/src/ggml-remotingfrontend/include/venus_hw.h (1 hunks)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu.cpp (1 hunks)
ggml/src/ggml-remotingfrontend/virtgpu.h (1 hunks)
podman_compile.sh (1 hunks)
prepare.backend.sh (1 hunks)
prepare.remoting.sh (1 hunks)
prepare.sh (1 hunks)
prepare.vulkan.sh (1 hunks)
run.remoting.sh (1 hunks)
run.sh (1 hunks)
run.vulkan.sh (1 hunks)
src/llama-context.cpp (2 hunks)
src/llama-kv-cache-unified.cpp (1 hunks)
src/llama-kv-cache.cpp (1 hunks)
src/llama-model-loader.cpp (4 hunks)
src/llama-model.cpp (2 hunks)
src/llama-vocab.cpp (3 hunks)
tools/run/run.cpp (3 hunks)
💤 Files with no reviewable changes (27)
- .github/ISSUE_TEMPLATE/config.yml
- .github/pull_request_template.md
- .github/ISSUE_TEMPLATE/011-bug-results.yml
- .github/workflows/editorconfig.yml
- .github/ISSUE_TEMPLATE/010-bug-compilation.yml
- .github/ISSUE_TEMPLATE/019-bug-misc.yml
- .github/workflows/python-lint.yml
- .github/workflows/labeler.yml
- .github/ISSUE_TEMPLATE/040-refactor.yml
- .github/workflows/python-check-requirements.yml
- .github/ISSUE_TEMPLATE/020-enhancement.yml
- .github/workflows/winget.yml
- .github/workflows/python-type-check.yml
- .github/actions/get-tag-name/action.yml
- .github/workflows/gguf-publish.yml
- .github/workflows/close-issue.yml
- .github/actions/windows-setup-curl/action.yml
- .github/labeler.yml
- .github/ISSUE_TEMPLATE/030-research.yml
- .github/workflows/build-linux-cross.yml
- .github/actions/windows-setup-cuda/action.yml
- .github/workflows/bench.yml.disabled
- .github/workflows/docker.yml
- .github/workflows/server.yml
- .github/workflows/build.yml
- build-xcframework.sh
- .github/workflows/release.yml
🧰 Additional context used
🧬 Code Graph Analysis (16)

ggml/include/ggml-remoting-frontend.h (1)
- ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  - ggml_backend_remoting_frontend_reg (126-153)
  - ggml_backend_remoting_frontend_reg (126-126)

ggml/src/ggml-backend-reg.cpp (2)
- ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  - ggml_backend_remoting_frontend_reg (126-153)
  - ggml_backend_remoting_frontend_reg (126-126)
- ggml/include/ggml-remoting-frontend.h (1)
  - ggml_backend_remoting_frontend_reg (12-12)

src/llama-model.cpp (3)
- src/llama-hparams.h (8)
  - il (171-171), il (173-173), il (175-175), il (177-177), il (180-180), il (183-183), il (193-193), il (195-195)
- src/llama-model.h (2)
  - il (396-396), il (399-399)
- ggml/src/ggml-backend-impl.h (15)
  - dev (139-139), dev (142-142), dev (145-145), dev (148-148), dev (151-151), dev (154-154), dev (157-157), dev (160-160), dev (163-163), dev (166-166), dev (169-169), dev (173-173), dev (176-176), dev (177-177), dev (178-178)

ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (3)
- ggml/src/ggml-remotingbackend/backend-internal.h (1)
  - ggml_backend_metal_get_device_context_fct (31-34)
- ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  - enc (16-16)
- ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)
  - ctx (67-67)

ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (3)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (8)
  - vn_serialize_ggml_cgraph (140-145), vn_serialize_ggml_cgraph (141-141), vn_encode_virtgpu_shmem_res_id (128-131), vn_encode_virtgpu_shmem_res_id (129-129), vn_encode_cgraph_data (147-152), vn_encode_cgraph_data (148-148), vn_decode_ggml_status (121-124), vn_decode_ggml_status (122-122)
- ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  - WARNING (48-57), WARNING (49-49), FATAL (59-69), FATAL (60-60)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (4)
  - vn_encode_size_t (387-392), vn_encode_size_t (388-388), vn_cs_new_encoder (37-46), vn_cs_new_encoder (38-38)

ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (4)
- ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  - enc (16-16)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  - vn_decode_ggml_buffer_type (75-82), vn_decode_ggml_buffer_type (76-76), vn_encode_ggml_buffer (98-102), vn_encode_ggml_buffer (99-99)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (10)
  - vn_encode_array_size (274-278), vn_encode_array_size (275-275), vn_encode_char_array (459-464), vn_encode_char_array (460-460), vn_encode_size_t (387-392), vn_encode_size_t (388-388), vn_encode_bool_t (502-506), vn_encode_bool_t (503-503), vn_decode_size_t (394-400), vn_decode_size_t (395-395)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (2)
  - track_backend_buffer (12-15), track_backend_buffer (13-13)

ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (5)
- ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  - enc (16-16)
- ggml/src/ggml-remotingbackend/shared/apir_backend.h (5)
  - ctx (67-67), start_timer (87-91), start_timer (87-87), stop_timer (93-100), stop_timer (93-93)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (6)
  - vn_decode_virtgpu_shmem_res_id (133-136), vn_decode_virtgpu_shmem_res_id (134-134), vn_decode_ggml_cgraph (154-167), vn_decode_ggml_cgraph (155-155), vn_encode_ggml_status (116-119), vn_encode_ggml_status (117-117)
- ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  - FATAL (59-69), FATAL (60-60)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (4)
  - vn_decode_size_t (394-400), vn_decode_size_t (395-395), vn_cs_new_decoder (27-35), vn_cs_new_decoder (28-28)

ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (2)
- ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (2)
  - apir_metal_get_device_context (3-19), apir_metal_get_device_context (4-4)
- ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)
  - apir_metal_get_device_context (50-50)

ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (4)
- ggml/src/ggml-remotingfrontend/virtgpu-utils.h (8)
  - WARNING (48-57), WARNING (49-49), FATAL (59-69), FATAL (60-60), INFO (35-42), INFO (36-36), INFO (44-45), INFO (45-45)
- ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (2)
  - virtgpu_shmem_destroy (71-77), virtgpu_shmem_destroy (72-73)
- ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)
  - apir_device_buffer_from_ptr (22-24)
- ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
  - apir_device_buffer_from_ptr (195-227), apir_device_buffer_from_ptr (196-198)

tools/run/run.cpp (2)
- include/llama.h (1)
  - llama_batch_get_one (913-915)
- src/llama-batch.cpp (2)
  - llama_batch_get_one (576-588), llama_batch_get_one (576-578)

ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (2)
- ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  - FATAL (59-69), FATAL (60-60)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (4)
  - add_tensor (80-94), add_tensor (81-81), serialize_tensor (17-46), serialize_tensor (18-18)

ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1)
- ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  - INFO (35-42), INFO (36-36), INFO (44-45), INFO (45-45)

ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (20)
  - vn_decode_int32_t (235-239), vn_decode_int32_t (236-236), vn_decode_array_size_unchecked (293-299), vn_decode_array_size_unchecked (294-294), vn_cs_decoder_alloc_array (493-498), vn_cs_decoder_alloc_array (494-494), vn_decode_char_array (466-476), vn_decode_char_array (467-467), vn_decode_uint32_t (348-352), vn_decode_uint32_t (349-349), vn_decode_size_t (394-400), vn_decode_size_t (395-395), vn_decode_bool_t (508-512), vn_decode_bool_t (509-509), vn_decode_apir_buffer_type_host_handle_t (522-526), vn_decode_apir_buffer_type_host_handle_t (523-523), vn_encode_size_t (387-392), vn_encode_size_t (388-388), vn_decode_apir_buffer_host_handle_t (536-540), vn_decode_apir_buffer_host_handle_t (537-537)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  - vn_encode_ggml_tensor_inline (174-210), vn_encode_ggml_tensor_inline (175-175), vn_decode_apir_buffer_type_host_handle (84-91), vn_decode_apir_buffer_type_host_handle (85-85)

ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (2)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (2)
  - vn_encode_ggml_buffer_type (69-73), vn_encode_ggml_buffer_type (70-70)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (14)
  - vn_decode_array_size_unchecked (293-299), vn_decode_array_size_unchecked (294-294), vn_cs_decoder_alloc_array (493-498), vn_cs_decoder_alloc_array (494-494), vn_decode_char_array (466-476), vn_decode_char_array (467-467), vn_decode_size_t (394-400), vn_decode_size_t (395-395), vn_decode_bool_t (508-512), vn_decode_bool_t (509-509), vn_encode_size_t (387-392), vn_encode_size_t (388-388), vn_decode_apir_buffer_host_handle_t (536-540), vn_decode_apir_buffer_host_handle_t (537-537)

ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (4)
- ggml/src/ggml-remotingbackend/shared/venus_cs.h (8)
  - vn_encode_apir_buffer_host_handle_t (530-534), vn_encode_apir_buffer_host_handle_t (531-531), vn_decode_uintptr_t (550-554), vn_decode_uintptr_t (551-551), vn_encode_size_t (387-392), vn_encode_size_t (388-388), vn_encode_uint8_t (150-154), vn_encode_uint8_t (151-151)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  - vn_encode_ggml_tensor (39-44), vn_encode_ggml_tensor (40-40), vn_encode_virtgpu_shmem_res_id (128-131), vn_encode_virtgpu_shmem_res_id (129-129)
- ggml/src/ggml-remotingfrontend/virtgpu-shm.h (2)
  - virtgpu_shmem_create (32-32), virtgpu_shmem_destroy (33-33)
- ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (4)
  - virtgpu_shmem_create (79-111), virtgpu_shmem_create (80-80), virtgpu_shmem_destroy (71-77), virtgpu_shmem_destroy (72-73)

ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (3)
- ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (4)
  - add_tensor (50-64), add_tensor (51-51), serialize_tensor (12-48), serialize_tensor (13-13)
- ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (4)
  - add_tensor (36-36), serialize_tensor (26-26), deserialize_tensor (38-38), create_node (40-43)
- ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (4)
  - deserialize_tensor (33-68), deserialize_tensor (34-34), create_node (70-93), create_node (71-74)
🪛 Shellcheck (0.10.0)
prepare.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
run.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
prepare.remoting.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
prepare.vulkan.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
prepare.backend.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
build.backend.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
[warning] 13-13: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (44)
run.vulkan.sh (1)
21-24
: Prefix not quoted – word-splitting risk

$prefix ../build.vulkan/bin/llama-run … will misbehave if $prefix contains spaces (gdb case). Wrap in eval or an array, e.g.:

if [[ -n "${prefix}" ]]; then
  eval "${prefix} ../build.vulkan/bin/llama-run --verbose ~/models/llama3.2 \"say nothing\""
else
  ../build.vulkan/bin/llama-run --verbose ~/models/llama3.2 "say nothing"
fi

ggml/src/CMakeLists.txt (1)
357-358
: Verify CMake option names match the new back-ends

ggml_add_backend(RemotingFrontend) expects an option GGML_REMOTINGFRONTEND, and likewise for RemotingBackend. Ensure these cache variables are declared (e.g. via option(GGML_REMOTINGFRONTEND …) in a parent CMake file) or the backend will silently be skipped.

ggml/CMakeLists.txt (1)
183-184
: Options are declared but never consumed

GGML_REMOTING_FRONTEND / GGML_REMOTING_BACKEND are introduced here but nothing in the root CMake lists or src/ conditionally adds sources, include-dirs, or libs based on them. Result: toggling the option has zero effect and silently misleads packagers.

Consider:
- Wrapping the add_subdirectory(src/ggml-remotingfrontend …) / backend equivalents with these flags.
- Failing fast if the option is ON but required dependencies (libdrm, virtgpu headers, etc.) are missing.
src/llama-context.cpp (1)
107-117
: LGTM! Logging reduction aligns with PR objectives.

The commented logging block reduces noise during model initialization as intended by the PR. The informational output about context parameters (n_ctx, n_batch, etc.) can be verbose during startup, and this change improves the user experience by reducing clutter.
CMakePresets.json (1)
33-34
: LGTM! New remoting presets follow established conventions.

The new remoting_frontend and remoting_backend presets are well-structured and consistent with existing hidden presets in the file. They properly enable the corresponding CMake cache variables (GGML_REMOTING_FRONTEND and GGML_REMOTING_BACKEND) for the new VirtGPU-based remoting functionality mentioned in the PR.

src/llama-vocab.cpp (1)
2001-2002
: LGTM - Debug logging appropriately suppressed.

The commented debug log aligns with the PR's objective to reduce noise during model loading. This debug information about control tokens not marked as EOG is non-critical for normal operation.
ggml/src/ggml-remotingfrontend/ggml-remoting-frontend.cpp (1)
1-27
: No changes needed for the WIP remoting frontend skeleton

This file is intentionally a placeholder in this "WIP" PR. It defines only the necessary includes, an extern declaration, and a mutex-bearing struct so that you can layer in the real remoting logic later.

Before merging, please:
- Add stub functions or TODO comments in ggml-remoting-frontend.cpp to indicate where the core remoting API will be implemented.
- Ensure that you flesh out the extern'd ggml_backend_remoting_get_device_count() and any other callbacks referenced in ggml-remoting.h.

ggml/src/ggml-remotingbackend/CMakeLists.txt (2)
1-3
: CMake version and policy settings are appropriate.

The minimum version 3.19 and CMP0114 policy are reasonable for a modern CMake configuration supporting the backend library functionality.
6-19
: All referenced source and header files are present

Verified that every .cpp and .h file listed in ggml/src/ggml-remotingbackend/CMakeLists.txt (including those under shared/) exists in the repository. No missing files detected.

ggml/include/ggml-remoting-frontend.h (1)
1-16
: Well-designed header interface.

The header follows GGML conventions with proper C++ compatibility, clean API design, and appropriate naming. The single registration function approach is consistent with other backend implementations.
ggml/src/ggml-backend-reg.cpp (3)
48-50
: LGTM! Backend registration follows established patterns.

The conditional inclusion and registration of the remoting frontend backend is consistent with how other backends (CUDA, Metal, Vulkan, etc.) are integrated into the system.
184-186
: LGTM! Backend registration in constructor follows the established pattern.

The registration call follows the same pattern as other backends and is properly guarded by the conditional compilation flag.
591-591
: LGTM! Dynamic loading integration is correctly placed.

The addition of "remoting_frontend" to the dynamic loading function is in the appropriate location and follows the naming convention of other backends.
ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1)
11-34
: Well-designed macros with proper error handling.

The remote call macros follow good practices:
- Use do-while(0) pattern for safe expansion
- Consistent error checking with descriptive messages
- Clear separation of remote call lifecycle phases
The implementation provides a clean abstraction for remote procedure calls.
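For readers unfamiliar with the idiom, a self-contained illustration of why the do-while(0) wrapper matters (generic helper names, not the PR's macros):

void call_prepare(int x);
void call_submit(int x);

// Without the wrapper, a multi-statement macro breaks inside an unbraced if.
#define REMOTE_CALL_UNSAFE(x) call_prepare(x); call_submit(x)

// The do-while(0) wrapper makes the expansion a single statement.
#define REMOTE_CALL_SAFE(x) \
    do {                    \
        call_prepare(x);    \
        call_submit(x);     \
    } while (0)

void example(bool ready, int gpu) {
    // REMOTE_CALL_UNSAFE(gpu) here would leave call_submit() outside the if,
    // and an `else` after it would fail to compile.
    if (ready)
        REMOTE_CALL_SAFE(gpu); // expands to one statement; else binds correctly
    else
        call_submit(0);
}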
Makefile (2)
719-727
: LGTM! Makefile changes follow established patterns.

The conditional compilation support for the remoting frontend and backend follows the same pattern used for other optional components like CUDA, Metal, and Vulkan. The preprocessor flags and object file additions are correctly implemented.
768-772
: LGTM! Compilation rules are properly defined.

The explicit compilation rules for the remoting frontend and backend object files are correctly specified and use the standard CXXFLAGS, consistent with other C++ compilation targets in the Makefile.
podman_compile.sh (2)
22-24
: Verify podman machine environment before using SSH.

The script assumes it's running in a podman machine environment on macOS without verification. This could fail on Linux or when podman machine isn't running. Consider adding a check for the podman machine environment:

 POD_NAME=mac_ai_compiling
-podman machine ssh podman rm $POD_NAME --force
+if podman machine list --format "{{.Name}}" 2>/dev/null | grep -q .; then
+    podman machine ssh podman rm $POD_NAME --force 2>/dev/null || true
+else
+    podman rm $POD_NAME --force 2>/dev/null || true
+fi
1-1
: Add shebang to specify the shell interpreter.

The script is missing a shebang line, which can cause portability issues. Apply this diff to add the shebang:

+#!/bin/bash
 #! /bin/bash
Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1)
24-24
: list_head definition is present

The struct list_head is defined in virtgpu-utils.h, which is already included by virtgpu-shm.h. No additional includes are required.

ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1)
36-41
: Remove incorrect undefined variable warning

The identifiers dev and bck are declared as extern globals in backend-internal.h and brought into scope by your includes. Compilation won't fail due to undefined symbols here.

- In ggml/src/ggml-remotingbackend/backend-internal.h:

  extern ggml_backend_dev_t dev;
  extern ggml_backend_t bck;

- These headers are included at the top of backend-dispatched-backend.cpp, so dev and bck are defined.

You can ignore the "undefined variables" suggestion; no change is needed.
Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (1)
1-81
: Well-structured buffer type dispatch implementation

The functions follow a consistent pattern and properly handle encoding/decoding. The null check for the allocated buffer at lines 76-78 is particularly good practice.
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (1)
20-198
: Well-structured Metal operation support checking

The ggml_metal_supports_op function provides comprehensive coverage of GGML operations with appropriate device capability checks. The TODO comments clearly indicate areas for future optimization.

ggml/src/ggml-remotingfrontend/ggml-backend.cpp (3)
56-70
: Incorrect assumption: function defined & default new throws

The function ggml_backend_remoting_frontend_reg() is declared in ggml/include/ggml-remoting-frontend.h and defined in ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp. Also, the default C++ new operator throws std::bad_alloc on failure rather than returning nullptr, so a null check after new ggml_backend{…} will never fire. If you need non-throwing allocation, use new (std::nothrow) and check for nullptr, or catch std::bad_alloc at the call site.

Likely an incorrect or invalid review comment.
11-15
: Add nullptr check before delete operation.

The function directly deletes the backend pointer without checking if it's null, which could lead to undefined behavior. Apply this fix:

 static void ggml_backend_remoting_free(ggml_backend_t backend) {
     IMPLEMENTED;

+    if (backend == nullptr) {
+        return;
+    }
     delete backend;
 }

Likely an incorrect or invalid review comment.
19-31
: All symbols are defined and included correctly.

- DEV_TO_GPU is defined in ggml-remoting.h.
- start_timer / stop_timer come from apir_backend.h (pulled in via virtgpu-forward.h → virtgpu.h).
- apir_backend_graph_compute is declared in virtgpu-forward.h and implemented in virtgpu-forward-backend.cpp.

No missing includes or dependencies; no changes needed.

Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingfrontend/include/venus_hw.h (1)
1-75
: Well-structured header file with clear documentation.

The Venus capability structure is properly defined with comprehensive comments explaining each field's purpose.
ggml/src/ggml-remotingbackend/backend.cpp (1)
22-44
: Incorrect: dev and timer variables are defined via extern in included headers.

The deinit function includes:

- backend-internal.h (which declares extern ggml_backend_dev_t dev;)
- shared/apir_backend.h (which declares extern struct timer_data graph_compute_timer;, get_tensor_timer;, and set_tensor_timer;)

Therefore, dev and the timer objects are properly declared and there's no need to redeclare or pass them as parameters.

Likely an incorrect or invalid review comment.
22-29
: Ignore missing forward declaration ofvirgl_apir_context
In C++, the use ofstruct virgl_apir_context
in the function parameter list acts as an implicit forward declaration (per the elaborated‐type‐specifier rules), and any translation unit that needs the full definition already includesshared/apir_backend.h
. No changes are required here.Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1)
88-90
: Function get_metal_dev_context is correctly defined

The symbol is declared in ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (line 14) and implemented in ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp. The original comment can be dismissed.

Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (1)
37-37
: No changes needed: correct usage of virtgpu_shmem_destroy

The call passes the inner struct virtgpu_shmem* via shmem->shmem, which matches the signature. Other callers (in virtgpu-forward-backend.cpp and virtgpu-forward-buffer.cpp) use the same pattern.

ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1)
10-12
: Remove duplicate UNUSED macro.

The ctx parameter is marked as UNUSED twice.

-    UNUSED(ctx);
     UNUSED(ctx);
Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (1)
14-15
: Variables reg and dev are properly defined as globals

Our search confirms:

- ggml_backend_reg_t reg and ggml_backend_dev_t dev are defined in backend-dispatched.cpp
- Both are declared extern in backend-internal.h and thus visible in backend-dispatched-device.cpp

No changes needed.
ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (1)
163-185
: No memory leak — buffer context is freed on destruction

The ggml_backend_remoting_buffer_from_ptr_interface sets .free_buffer = ggml_backend_remoting_buffer_free_buffer, and that implementation calls free(context) and nulls out buffer->context. The allocated context is properly deallocated when the buffer is destroyed.

ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
90-121
: Make dispatch table const for safety

The dispatch table should be const to prevent accidental modification at runtime.

-static const backend_dispatch_t apir_backend_dispatch_table[APIR_BACKEND_DISPATCH_TABLE_COUNT] = {
+static const backend_dispatch_t const apir_backend_dispatch_table[APIR_BACKEND_DISPATCH_TABLE_COUNT] = {

Likely an incorrect or invalid review comment.
ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1)
36-1409
: LGTM!

The DRM UAPI definitions and structures are correctly implemented following the standard kernel interface.
ggml/src/ggml-remotingbackend/shared/venus_cs.h (1)
9-555
: Consider the serialization framework design and implementation.

The serialization implementation is well-structured with proper bounds checking and error handling. The use of FATAL for unrecoverable errors is appropriate for a low-level serialization library.
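As a rough illustration of the bounds-checked style this refers to (names are invented; the real helpers are the vn_encode_*/vn_decode_* family in venus_cs.h):

#include <cstddef>
#include <cstdint>
#include <cstring>

struct decoder {
    const uint8_t *cur;
    const uint8_t *end;
};

// Every read checks the remaining span before copying out of the stream.
bool decode_bytes(decoder *dec, void *out, size_t size) {
    if ((size_t)(dec->end - dec->cur) < size) {
        return false; // venus_cs.h reportedly aborts via FATAL() instead
    }
    std::memcpy(out, dec->cur, size);
    dec->cur += size;
    return true;
}

bool decode_uint32(decoder *dec, uint32_t *val) {
    return decode_bytes(dec, val, sizeof(*val));
}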
ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)
1-51
: LGTM!

The virtual GPU interface declarations are well-organized and follow consistent naming conventions. The function signatures appropriately handle device management, buffer operations, and backend functionality.
src/llama-kv-cache.cpp (5)
126-450
: Cache operations are well-implemented.

The sequence management functions correctly handle both recurrent and transformer-like models with appropriate validation and state updates.
504-731
: Slot finding algorithm is robust.

The implementation correctly handles different model types with proper validation and debug checks. The recurrent model handling with contiguous slot requirements is well-implemented.
770-905
: Defragmentation implementation is sound.

The algorithm correctly identifies and fills holes in the cache with proper move tracking and bounds checking. The assertion at line 839 helps ensure algorithm correctness.
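For intuition, a generic sketch of the hole-filling idea in isolation (this is not the llama.cpp code; it only plans moves that pull used cells from the tail into free slots):

#include <cstdint>
#include <vector>

struct Move { uint32_t src, dst; };

std::vector<Move> plan_defrag(const std::vector<bool> &used) {
    std::vector<Move> moves;
    std::vector<bool> occupied = used;
    uint32_t head = 0;
    uint32_t tail = (uint32_t)used.size();
    while (true) {
        while (head < tail && occupied[head]) head++;      // next hole
        while (tail > head && !occupied[tail - 1]) tail--; // last used cell
        if (head + 1 >= tail) break;
        moves.push_back({tail - 1, head});                 // fill hole from tail
        occupied[head] = true;
        occupied[tail - 1] = false;
    }
    return moves;
}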
907-1281
: Serialization implementation is comprehensive.

The state persistence functions correctly handle metadata and tensor data with proper validation during restoration. The transposed value tensor handling adds complexity but is implemented correctly.
1287-1382
: KV cache view implementation provides good introspection.

The view functions correctly create snapshots of the cache state with proper memory management and type checking.
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1)
1-237
: GGML serialization utilities are well-designed.

The serialization approach with intermediate rpc_tensor representation and proper validation of tensor constraints provides a robust foundation for the remoting backend.
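A reduced illustration of the intermediate-representation approach (field names invented; the PR's actual rpc_tensor is quoted elsewhere in this review): pointers are flattened into 64-bit handles so the struct can be copied byte-for-byte across the guest/host boundary.

#include <cstdint>
#include <cstring>

struct wire_tensor {
    uint64_t id;      // opaque handle standing in for a pointer
    uint32_t type;
    uint64_t ne[4];   // shape
    uint64_t data;    // offset/handle, never a raw guest pointer
};

struct live_tensor {
    const void *data;
    uint32_t type;
    uint64_t ne[4];
};

wire_tensor flatten(const live_tensor &t) {
    wire_tensor w{};
    w.id   = (uint64_t)(uintptr_t)&t;  // host side never dereferences this
    w.type = t.type;
    std::memcpy(w.ne, t.ne, sizeof(w.ne));
    w.data = (uint64_t)(uintptr_t)t.data;
    return w;
}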
ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1)
1-276
: VirtGPU DRM header is up-to-date

The local copy of virtgpu_drm.h matches the latest Linux v6.16-rc1 UAPI header (June 12, 2025): all newly added IOCTLs — RESOURCE_INFO, TRANSFER_FROM_HOST, TRANSFER_TO_HOST, WAIT, GET_CAPS, RESOURCE_CREATE_BLOB, and CONTEXT_INIT — are present. No further updates are needed.
ICD_DIR=/Users/kevinpouget/.local/share/vulkan/icd.d

USE_WORK_MESA=1
if [[ "$USE_WORK_MESA" == 1 ]]; then
    export VK_ICD_FILENAMES=$ICD_DIR/virtio_icd.aarch64.json
else
    export VK_ICD_FILENAMES=$ICD_DIR/virtio_icd.good.aarch64.json
fi
🛠️ Refactor suggestion
Hard-coded absolute user path reduces portability

ICD_DIR=/Users/kevinpouget/... will break on every other machine. Expose this as an env var with a sensible default:

ICD_DIR=${ICD_DIR:-"$HOME/.local/share/vulkan/icd.d"}
🤖 Prompt for AI Agents
In run.vulkan.sh around lines 12 to 19, the ICD_DIR variable is set to a
hard-coded absolute user path, which reduces portability. Replace the fixed path
with a dynamic assignment that uses the existing ICD_DIR environment variable if
set, or defaults to "$HOME/.local/share/vulkan/icd.d" otherwise. This can be
done by setting ICD_DIR=${ICD_DIR:-"$HOME/.local/share/vulkan/icd.d"} at the
start of the script.
run.vulkan.sh
Outdated
prefix="" | ||
fi | ||
|
||
rm -f /usr/lib64/libvulkan_virtio.so |
rm -f /usr/lib64/libvulkan_virtio.so is dangerous
Blindly deleting a system library requires sudo and can break other Vulkan apps. Either guard it with a prompt, check privileges, or document why it is safe.
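For instance, a guarded variant could look like this (a sketch only; the confirmation prompt and sudo fallback are illustrative, not part of the PR):

lib=/usr/lib64/libvulkan_virtio.so
if [[ -e "$lib" ]]; then
    # ask before touching a system-wide library
    read -r -p "Remove $lib? [y/N] " answer
    if [[ "$answer" =~ ^[yY]$ ]]; then
        # fall back to sudo when the current user cannot write the file
        if [[ -w "$lib" ]]; then rm -f "$lib"; else sudo rm -f "$lib"; fi
    fi
fi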
🤖 Prompt for AI Agents
In run.vulkan.sh at line 10, the command `rm -f /usr/lib64/libvulkan_virtio.so`
is risky because it deletes a system library without confirmation or privilege
checks. Modify the script to either prompt the user for confirmation before
deletion, check if the script is running with sudo privileges before attempting
removal, or add comments explaining why this deletion is safe and necessary to
prevent unintended system issues.
cmake -S . -B ../build.remoting-frontend \
    -DGGML_REMOTINGFRONTEND=ON \
    -DGGML_CPU_ARM_ARCH=native \
    -DGGML_NATIVE=OFF \
    -DGGML_OPENMP=OFF \
    -DLLAMA_CURL=OFF \
    -DCMAKE_BUILD_TYPE=Debug \
    "$@"
🛠️ Refactor suggestion
Add shebang & fix CMake option typo / portability
- ShellCheck warns because the script lacks a shebang.
- The CMake option appears to be GGML_REMOTING_FRONTEND (with underscore) elsewhere; the current GGML_REMOTINGFRONTEND will be ignored.
- Hard-coding -DGGML_CPU_ARM_ARCH=native breaks x86 hosts.
+#!/usr/bin/env bash
+set -euo pipefail
+
cmake -S . -B ../build.remoting-frontend \
- -DGGML_REMOTINGFRONTEND=ON \
+ -DGGML_REMOTING_FRONTEND=ON \
+ ${CMAKE_ARM_OPT:--DGGML_CPU_ARM_ARCH=native} \
-DGGML_NATIVE=OFF \
-DGGML_OPENMP=OFF \
-DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Debug \
"$@"
Consider auto-detecting the host arch (uname -m) to decide whether to set the ARM flag.
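A minimal detection sketch (bash assumed; the CMAKE_ARM_OPT variable name is illustrative):

arch="$(uname -m)"
if [[ "$arch" == "aarch64" || "$arch" == "arm64" ]]; then
    CMAKE_ARM_OPT="-DGGML_CPU_ARM_ARCH=native"   # only meaningful on ARM hosts
else
    CMAKE_ARM_OPT=""                             # leave x86 hosts untouched
fi
cmake -S . -B ../build.remoting-frontend ${CMAKE_ARM_OPT} "$@"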
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Shellcheck (0.10.0)
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
🤖 Prompt for AI Agents
In prepare.remoting.sh lines 1 to 8, add a shebang line (e.g., #!/bin/sh) at the
top to specify the shell interpreter. Correct the CMake option from
GGML_REMOTINGFRONTEND to GGML_REMOTING_FRONTEND to match the expected option
name. Replace the hard-coded -DGGML_CPU_ARM_ARCH=native flag with logic that
detects the host architecture using uname -m and sets this flag only if the host
is an ARM architecture, ensuring portability across different CPU types.
src/llama-model.cpp
Outdated
}

void llama_model::print_info() const {
    return;
🛠️ Refactor suggestion
Inconsistent approach to suppressing output - consider a more flexible solution.
Adding an early return completely disables the print_info method, which may break existing functionality that depends on this output. This approach is also inconsistent with the commented-out debug logs in the previous hunk.
Consider implementing a more flexible approach:
- Add a parameter to control whether to print info
- Use a global flag or environment variable to control output
- Implement log level filtering instead of completely disabling the method
-void llama_model::print_info() const {
- return;
+void llama_model::print_info(bool verbose = true) const {
+ if (!verbose) return;
Or with environment variable control:
-void llama_model::print_info() const {
- return;
+void llama_model::print_info() const {
+ if (std::getenv("LLAMA_QUIET")) return;
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/llama-model.cpp at line 4448, the early return disables the print_info
method entirely, which can break functionality relying on its output. Instead of
an unconditional return, modify the method to accept a parameter that controls
whether to print info, or check a global flag or environment variable before
printing. Implement log level filtering to selectively enable or disable output
without fully disabling the method.
prepare.backend.sh
Outdated
@@ -0,0 +1,11 @@
cmake -S . -B ../build.remoting-backend \
🛠️ Refactor suggestion
Add a proper shebang & fail-fast guards
The script is missing a shebang, which breaks editor/tooling detection and can cause an "Exec format error" if executed directly. Adding set -euo pipefail (or at least set -e) prevents silent build misconfigurations.
+#!/usr/bin/env bash
+set -euo pipefail
+
cmake -S . -B ../build.remoting-backend \
🧰 Tools
🪛 Shellcheck (0.10.0)
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
🤖 Prompt for AI Agents
In prepare.backend.sh at line 1, add a proper shebang line such as #!/bin/bash
at the very top of the script to ensure correct interpreter detection and
execution. Also, add fail-fast guards by including set -euo pipefail after the
shebang to make the script exit immediately on errors, unset variables, or
pipeline failures, preventing silent build misconfigurations.
    double ms = timer_total/1000000;
    double itl = ms/timer_count;
    double speed = 1/itl * 1000;

    printf("METAL compute_graph: [%9.0f] ms for %lld invokations | ITL %.2f ms | throughput = %.2f t/s\n", ms, timer_count, itl, speed);

    timer_start = 0;
    timer_total = 1; // to avoid re-registering
    timer_count = 0;
}
Fix potential division by zero and improve timer state management.
The function has a division by zero risk and uses a hacky approach to prevent re-registration.
static void show_timer(void) {
+ if (timer_count == 0) {
+ printf("METAL compute_graph: No invocations recorded\n");
+ return;
+ }
+
double ms = timer_total/1000000;
double itl = ms/timer_count;
double speed = 1/itl * 1000;
printf("METAL compute_graph: [%9.0f] ms for %lld invokations | ITL %.2f ms | throughput = %.2f t/s\n",ms, timer_count, itl, speed);
- timer_start = 0;
- timer_total = 1; // to avoid re-registering
- timer_count = 0;
+ // Reset timer state
+ timer_start = 0;
+ timer_total = 0;
+ timer_count = 0;
}
🤖 Prompt for AI Agents
In ggml/src/ggml-metal/ggml-metal.m around lines 5163 to 5172, there is a risk
of division by zero when calculating 'itl' and 'speed' if 'timer_count' is zero,
and the timer state reset uses a hacky value. Fix this by adding a check to
ensure 'timer_count' is greater than zero before performing divisions, and reset
'timer_total' and 'timer_count' to zero instead of using arbitrary values to
properly manage the timer state.
GGML_BACKEND_API void
ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
                                      bool *has_simdgroup_mm,
                                      bool *has_simdgroup_reduction,
                                      bool *use_bfloat) {
    struct ggml_backend_metal_device_context *dev_ctx = dev->context ;

    *use_bfloat = dev_ctx->use_bfloat;
    *has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
    *has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
}
Add null pointer validation for safety.
The function should validate input pointers before dereferencing them to prevent potential crashes.
GGML_BACKEND_API void
ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
bool *has_simdgroup_mm,
bool *has_simdgroup_reduction,
bool *use_bfloat) {
+ GGML_ASSERT(dev != NULL);
+ GGML_ASSERT(has_simdgroup_mm != NULL);
+ GGML_ASSERT(has_simdgroup_reduction != NULL);
+ GGML_ASSERT(use_bfloat != NULL);
+
struct ggml_backend_metal_device_context *dev_ctx = dev->context ;
*use_bfloat = dev_ctx->use_bfloat;
*has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
*has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
GGML_BACKEND_API void
ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
                                      bool *has_simdgroup_mm,
                                      bool *has_simdgroup_reduction,
                                      bool *use_bfloat) {
    struct ggml_backend_metal_device_context *dev_ctx = dev->context ;

    *use_bfloat = dev_ctx->use_bfloat;
    *has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
    *has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
}

GGML_BACKEND_API void
ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
                                      bool *has_simdgroup_mm,
                                      bool *has_simdgroup_reduction,
                                      bool *use_bfloat) {
    GGML_ASSERT(dev != NULL);
    GGML_ASSERT(has_simdgroup_mm != NULL);
    GGML_ASSERT(has_simdgroup_reduction != NULL);
    GGML_ASSERT(use_bfloat != NULL);

    struct ggml_backend_metal_device_context *dev_ctx = dev->context ;

    *use_bfloat = dev_ctx->use_bfloat;
    *has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
    *has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
}
🤖 Prompt for AI Agents
In ggml/src/ggml-metal/ggml-metal.m around lines 6073 to 6083, the function
ggml_backend_metal_get_device_context dereferences the pointers
has_simdgroup_mm, has_simdgroup_reduction, and use_bfloat without checking if
they are null. To fix this, add null pointer checks for each of these pointers
before assigning values to them to prevent potential crashes due to null
dereferencing.
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
    long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;

    timer_total += (timer_end - timer_start);
    timer_count += 1;
}
Fix CLOCK_MONOTONIC usage in stop_timer.
static inline void stop_timer(void) {
struct timespec ts;
- clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+ clock_gettime(CLOCK_MONOTONIC, &ts);
long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
timer_total += (timer_end - timer_start);
timer_count += 1;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
    long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;

    timer_total += (timer_end - timer_start);
    timer_count += 1;
}

static inline void stop_timer(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;

    timer_total += (timer_end - timer_start);
    timer_count += 1;
}
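As a standalone illustration of why the monotonic clock matters for interval timing (a sketch, not PR code):

#include <stdio.h>
#include <time.h>

// CLOCK_MONOTONIC never jumps when the wall clock is adjusted (NTP, manual
// date changes), so the difference of two readings is a reliable elapsed time.
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    long long t0 = now_ns();
    for (volatile int i = 0; i < 1000000; i++) { } // stand-in workload
    printf("elapsed: %lld ns\n", now_ns() - t0);
    return 0;
}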
🤖 Prompt for AI Agents
In ggml/src/ggml-metal/ggml-metal.m around lines 5154 to 5160, the code uses
CLOCK_REALTIME for timing, which can be affected by system clock changes. Change
the clock_gettime call to use CLOCK_MONOTONIC instead to ensure consistent
elapsed time measurement. Replace CLOCK_REALTIME with CLOCK_MONOTONIC in the
clock_gettime function call.
static inline void start_timer(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
    timer_start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
Use CLOCK_MONOTONIC instead of CLOCK_REALTIME for elapsed time measurement
The comments indicate these functions should use CLOCK_MONOTONIC for elapsed time, but the code uses CLOCK_REALTIME. CLOCK_REALTIME can be adjusted by system time changes, making it unsuitable for measuring elapsed time.
Apply this diff to fix the clock type:
- clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+ clock_gettime(CLOCK_MONOTONIC, &ts);
Also applies to: 998-1001
🤖 Prompt for AI Agents
In tools/run/run.cpp around lines 991 to 995 and also lines 998 to 1001, the
code uses CLOCK_REALTIME for timing, but it should use CLOCK_MONOTONIC to
measure elapsed time reliably. Change the clock_gettime calls to use
CLOCK_MONOTONIC instead of CLOCK_REALTIME in both start_timer and the
corresponding stop_timer functions to prevent issues from system time changes.
static long long timer_start = 0;
static long long timer_total = 0;
static long long timer_count = 0;
💡 Verification agent
🧩 Analysis chain
Add missing include for time.h
The code uses clock_gettime and struct timespec but doesn't include the required header.
Add this include at the top of the file:
#include <signal.h>
+#include <time.h>
#include <climits>
Let me verify if time.h is already included elsewhere:
Also applies to: 991-1005, 1006-1013
🏁 Script executed:
#!/bin/bash
# Check if time.h is included in the file or its includes
rg -A 5 -B 5 "include.*time\.h" tools/run/run.cpp
# Check system headers that might include time.h
rg -A 5 -B 5 "include.*<sys/time\.h>" tools/run/run.cpp
Length of output: 109
I want to confirm that clock_gettime and struct timespec are actually used and whether <ctime> is already included. Let's check for these in tools/run/run.cpp:
🏁 Script executed:
#!/bin/bash
# Check for existing ctime include
rg -n "#include.*<ctime>" tools/run/run.cpp
# Check usage of clock_gettime
rg -n "clock_gettime" tools/run/run.cpp
# Check usage of struct timespec
rg -n "timespec" tools/run/run.cpp
Length of output: 338
Add missing <time.h> include for clock_gettime and timespec
To fix the build errors, insert the header in tools/run/run.cpp right after the existing <signal.h> include:
#include <signal.h>
+#include <time.h>
#include <climits>
This ensures clock_gettime and struct timespec are properly declared.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
static long long timer_start = 0;
static long long timer_total = 0;
static long long timer_count = 0;

// tools/run/run.cpp
#include <signal.h>
#include <time.h>
#include <climits>
🤖 Prompt for AI Agents
In tools/run/run.cpp around lines 987 to 990, the code uses clock_gettime and
struct timespec but is missing the necessary #include <time.h> directive. To fix
the build errors, add #include <time.h> immediately after the existing #include
<signal.h> line near the top of the file so that these functions and types are
properly declared.
Review continued from previous batch...
if (!shmem_data) {
    FATAL("Couldn't get the shmem addr from virgl :/");
}
🛠️ Refactor suggestion
Consider graceful error handling instead of FATAL
Using FATAL will abort the entire program. Consider returning an error status that can be handled by the caller.
const void *shmem_data = ctx->iface.get_shmem_ptr(ctx->virgl_ctx, shmem_res_id);
if (!shmem_data) {
- FATAL("Couldn't get the shmem addr from virgl :/");
+ ERROR("Couldn't get the shmem addr from virgl");
+ ggml_status status = GGML_STATUS_ABORTED;
+ vn_encode_ggml_status(enc, &status);
+ stop_timer(&graph_compute_timer);
+ return 1; // or appropriate error code
}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp around lines 24
to 26, replace the FATAL call that aborts the program when shmem_data is null
with a graceful error handling approach. Modify the function to return an error
status or code indicating failure to get the shared memory address, allowing the
caller to handle the error appropriately instead of terminating the program
abruptly.
static void ggml_backend_remoting_buffer_memset_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, uint8_t value, size_t offset, size_t size) {
    NOT_IMPLEMENTED;

    STOP_HERE;

    UNUSED(buffer);
    UNUSED(tensor);
    UNUSED(value);
    UNUSED(offset);
    UNUSED(size);
}
Unimplemented functions will crash the program
Functions marked with NOT_IMPLEMENTED followed by STOP_HERE will halt execution. This is problematic for production code.
Consider returning an error instead:
static void ggml_backend_remoting_buffer_memset_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, uint8_t value, size_t offset, size_t size) {
NOT_IMPLEMENTED;
-
- STOP_HERE;
-
- UNUSED(buffer);
- UNUSED(tensor);
- UNUSED(value);
- UNUSED(offset);
- UNUSED(size);
+ // Log error and handle gracefully
+ ERROR("memset_tensor not implemented for remoting buffer");
+ // Could potentially fall back to get_tensor/set_tensor implementation
}
Would you like me to provide a complete implementation or open an issue to track this?
Also applies to: 109-119
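If a fallback is preferred over halting, one option (a sketch only; it assumes the buffer's set_tensor callback is implemented, as it is for remoting buffers elsewhere in this PR) is to synthesize the fill on the caller side:

#include <vector>

static void ggml_backend_remoting_buffer_memset_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, uint8_t value, size_t offset, size_t size) {
    // emulate memset by uploading a staging buffer pre-filled with `value`
    std::vector<uint8_t> fill(size, value);
    buffer->iface.set_tensor(buffer, tensor, fill.data(), offset, size);
}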
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp at lines 26 to 36, the
function ggml_backend_remoting_buffer_memset_tensor is currently unimplemented
and uses NOT_IMPLEMENTED and STOP_HERE macros that cause the program to crash.
Replace these macros with a proper error handling mechanism, such as returning
an error code or status indicating the function is not implemented, to avoid
halting execution in production. Apply the same fix to the similar unimplemented
function at lines 109 to 119.
    STOP_HERE;

    return true;

    UNUSED(buffer);
    UNUSED(src);
    UNUSED(dst);
}
Remove unreachable code after STOP_HERE
The return statement and UNUSED macros after STOP_HERE are unreachable.
static bool ggml_backend_remoting_buffer_cpy_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * src, ggml_tensor * dst) {
NOT_IMPLEMENTED;
STOP_HERE;
-
- return true;
-
- UNUSED(buffer);
- UNUSED(src);
- UNUSED(dst);
}
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp around lines 112 to
119, the return statement and UNUSED macros after the STOP_HERE macro are
unreachable and should be removed. Delete the lines containing 'return true;'
and the UNUSED(buffer), UNUSED(src), and UNUSED(dst) macros following STOP_HERE
to clean up the code.
const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
    static struct ggml_backend_metal_device_context metal_dev_ctx;
    static bool has_metal_dev_ctx = false;

    if (has_metal_dev_ctx) {
        return &metal_dev_ctx;
    }

    struct virtgpu *gpu = DEV_TO_GPU(dev);

    apir_metal_get_device_context(gpu, &metal_dev_ctx);

    return &metal_dev_ctx;
}
Critical bug: Metal device context is never cached
The static variable has_metal_dev_ctx is never set to true, causing the remote call to be made every time instead of using the cached value.
const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
static struct ggml_backend_metal_device_context metal_dev_ctx;
static bool has_metal_dev_ctx = false;
if (has_metal_dev_ctx) {
return &metal_dev_ctx;
}
struct virtgpu *gpu = DEV_TO_GPU(dev);
apir_metal_get_device_context(gpu, &metal_dev_ctx);
+ has_metal_dev_ctx = true;
return &metal_dev_ctx;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
    static struct ggml_backend_metal_device_context metal_dev_ctx;
    static bool has_metal_dev_ctx = false;

    if (has_metal_dev_ctx) {
        return &metal_dev_ctx;
    }

    struct virtgpu *gpu = DEV_TO_GPU(dev);

    apir_metal_get_device_context(gpu, &metal_dev_ctx);

    return &metal_dev_ctx;
}

const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
    static struct ggml_backend_metal_device_context metal_dev_ctx;
    static bool has_metal_dev_ctx = false;

    if (has_metal_dev_ctx) {
        return &metal_dev_ctx;
    }

    struct virtgpu *gpu = DEV_TO_GPU(dev);

    apir_metal_get_device_context(gpu, &metal_dev_ctx);
    has_metal_dev_ctx = true;

    return &metal_dev_ctx;
}
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp around lines 4 to 17,
the static boolean has_metal_dev_ctx is never set to true after initializing
metal_dev_ctx, causing the function to always fetch the device context remotely
instead of caching it. Fix this by setting has_metal_dev_ctx to true immediately
after the call to apir_metal_get_device_context to ensure the cached context is
reused on subsequent calls.
if (cmd_type > APIR_BACKEND_DISPATCH_TABLE_COUNT) {
    ERROR("Received an invalid dispatch index (%d > %d)\n",
          cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
    return APIR_BACKEND_FORWARD_INDEX_INVALID;
}
Add bounds checks for cmd_type.
The current check is off by one: cmd_type == APIR_BACKEND_DISPATCH_TABLE_COUNT passes the > test yet indexes one past the end of the table. If cmd_type is a signed type, negative values are not rejected either, which could lead to an out-of-bounds read.
Apply this fix:
- if (cmd_type > APIR_BACKEND_DISPATCH_TABLE_COUNT) {
+ if (cmd_type >= APIR_BACKEND_DISPATCH_TABLE_COUNT) {
ERROR("Received an invalid dispatch index (%d > %d)\n",
cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
return APIR_BACKEND_FORWARD_INDEX_INVALID;
}
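A combined check could look like this (a sketch; the lower-bound clause only matters if cmd_type is a signed type):

if (cmd_type < 0 || cmd_type >= APIR_BACKEND_DISPATCH_TABLE_COUNT) {
    ERROR("Received an invalid dispatch index (%d, table size %d)\n",
          cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
    return APIR_BACKEND_FORWARD_INDEX_INVALID;
}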
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/backend.cpp around lines 125 to 129, the code
checks if cmd_type exceeds the upper bound APIR_BACKEND_DISPATCH_TABLE_COUNT but
does not verify if cmd_type is negative. Add a lower bound check to ensure
cmd_type is not less than zero before proceeding. If cmd_type is out of bounds
(either negative or greater than the upper limit), log an error and return
APIR_BACKEND_FORWARD_INDEX_INVALID to prevent buffer underflow.
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
    timer->start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
Use CLOCK_MONOTONIC for elapsed time measurement.
The comments indicate CLOCK_MONOTONIC should be used, but the code uses CLOCK_REALTIME. CLOCK_MONOTONIC is preferred for measuring elapsed time as it's not affected by system time adjustments.
static inline void start_timer(struct timer_data *timer) {
struct timespec ts;
- clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+ clock_gettime(CLOCK_MONOTONIC, &ts);
timer->start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
static inline void stop_timer(struct timer_data *timer) {
struct timespec ts;
- clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
+ clock_gettime(CLOCK_MONOTONIC, &ts);
long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
Also applies to: 94-96
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/apir_backend.h around lines 88 to 91 and
94 to 96, replace CLOCK_REALTIME with CLOCK_MONOTONIC in the clock_gettime calls
to ensure elapsed time measurement is not affected by system time changes. This
involves changing the clock_gettime argument from CLOCK_REALTIME to
CLOCK_MONOTONIC in both places.
    if (code == code_name) return #code_name; \
    } while (0) \

    APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESSS);
Update macro reference to use corrected constant name.
This needs to be updated to match the corrected macro name.
- APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESSS);
+ APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESS);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESSS);

APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESS);
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/apir_backend.h at line 117, the macro
reference uses APIR_BACKEND_INITIALIZE_SUCCESSS which appears to be a typo.
Update this to the corrected constant name, likely
APIR_BACKEND_INITIALIZE_SUCCESS, to ensure the macro reference is accurate and
consistent.
@@ -0,0 +1,127 @@
#pragma once

#define APIR_BACKEND_INITIALIZE_SUCCESSS 0
Fix typo in macro name.
The macro name has an extra 'S' at the end.
-#define APIR_BACKEND_INITIALIZE_SUCCESSS 0
+#define APIR_BACKEND_INITIALIZE_SUCCESS 0
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#define APIR_BACKEND_INITIALIZE_SUCCESSS 0

#define APIR_BACKEND_INITIALIZE_SUCCESS 0
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/apir_backend.h at line 3, the macro name
APIR_BACKEND_INITIALIZE_SUCCESSS contains a typo with an extra 'S' at the end.
Rename the macro to APIR_BACKEND_INITIALIZE_SUCCESS by removing the extra 'S' to
correct the spelling.
static bool
ggml_backend_remoting_device_supports_op(ggml_backend_dev_t dev, const ggml_tensor * op) {
#if USE_ALWAYS_TRUE_SUPPORTS_OP == 1
    /* ggml-rpc cheats it like this */
    /* with the current implementation of serialize_tensor, the src/view aren't properly passed */
    UNUSED(dev);
    UNUSED(op);

    return true;
#elif USE_METAL_GUEST_SUPPORTS_OP == 1
    UNUSED(dev);

    struct ggml_backend_remoting_device_context *device_ctx = GET_DEVICE_CONTEXT();

    return ggml_metal_supports_op(device_ctx->metal_dev_ctx, op);
#else
    struct virtgpu *gpu = DEV_TO_GPU(dev);

    return apir_device_supports_op(gpu, op);
#endif
}
Address incomplete tensor serialization
The comment indicates that serialize_tensor doesn't properly pass src/view information, which could lead to incorrect operation-support queries. This is currently worked around by compile-time flags, but the underlying issue should be fixed.
Would you like me to open an issue to track fixing the tensor serialization to properly include src/view information?
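One possible shape for the fix (a sketch with hypothetical field names; the PR's actual rpc_tensor layout may differ) is to carry the src/view topology as stable handles next to each serialized tensor:

// hypothetical companion record for each serialized tensor:
// encode linked tensors as handles rather than raw guest pointers
struct rpc_tensor_links {
    uint64_t src[GGML_MAX_SRC];  // per-source tensor handle, 0 when absent
    uint64_t view_src;           // handle of the viewed tensor, 0 when absent
    uint64_t view_offs;          // byte offset of the view into view_src
};

The decoder would then resolve these handles back to host-side ggml_tensor pointers before answering supports_op.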
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp between lines 45 and
65, the function ggml_backend_remoting_device_supports_op currently relies on
compile-time flags to bypass proper handling of tensor src/view serialization,
which is incomplete. To fix this, update the serialize_tensor implementation to
correctly include src/view information so that operation support queries can be
accurate without workarounds. This involves modifying the serialization logic to
fully capture tensor source and view details and then adjusting
ggml_backend_remoting_device_supports_op to use this corrected serialization
data.
ggml_backend_buffer_type_t
ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev) {
    IMPLEMENTED_ONCE;

    struct virtgpu *gpu = DEV_TO_GPU(dev);

    apir_buffer_type_host_handle_t ctx = apir_device_get_buffer_type(gpu);

    static struct ggml_backend_buffer_type buft {
        /* .iface = */ ggml_backend_remoting_buffer_type_interface,
        /* .device = */ dev,
        /* .context = */ (void *) ctx,
    };

    return &buft;
}
💡 Verification agent
🧩 Analysis chain
Review static buffer type usage for thread safety
Multiple functions use static buffer type structs. This could be problematic if:
- Multiple devices need different buffer types
- The context needs to be device-specific
If these buffer types are meant to be singletons, consider documenting this design decision. Otherwise, allocate them dynamically per device.
Also applies to: 146-161, 187-198
🏁 Script executed:
#!/bin/bash
# Description: Check if multiple devices are supported and how buffer types are used
# Search for device enumeration or multiple device support
rg -A 5 'device.*count|get.*device.*count|multiple.*device'
# Check how buffer types are used across the codebase
ast-grep --pattern 'ggml_backend_buffer_type'
Length of output: 89479
Separate buffer types per remoting device
The local static buft in ggml-remotingfrontend/ggml-backend-device.cpp only initializes once (on first call), binding the buffer type to the first dev/ctx. All subsequent devices will erroneously share that same buft. To support multiple remoting devices, you must create one buffer-type struct per device:
• File: ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp, function ggml_backend_remoting_device_get_buffer_type (lines 129–144)
Replace:

static struct ggml_backend_buffer_type buft {
    .iface   = ggml_backend_remoting_buffer_type_interface,
    .device  = dev,
    .context = (void*)ctx,
};
return &buft;

With either:
- A static array sized by apir_device_get_count(gpu), indexed by dev.
- Dynamically allocating a new ggml_backend_buffer_type per dev (e.g. new ggml_backend_buffer_type{…}).

Ensure each remoting device has its own ggml_backend_buffer_type instance so that .device and .context reflect the correct device; a per-device sketch is shown below.
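A dynamic-allocation sketch (illustrative only; caching in a map keyed by the device keeps repeated calls returning the same pointer, and a mutex would be needed if this can be called concurrently):

#include <unordered_map>

ggml_backend_buffer_type_t
ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev) {
    struct virtgpu *gpu = DEV_TO_GPU(dev);
    apir_buffer_type_host_handle_t ctx = apir_device_get_buffer_type(gpu);

    // one buffer type per device; unordered_map never relocates its elements,
    // so the returned pointers stay valid for the life of the process
    static std::unordered_map<ggml_backend_dev_t, ggml_backend_buffer_type> bufts;
    auto it = bufts.find(dev);
    if (it == bufts.end()) {
        it = bufts.emplace(dev, ggml_backend_buffer_type {
            /* .iface   = */ ggml_backend_remoting_buffer_type_interface,
            /* .device  = */ dev,
            /* .context = */ (void *) ctx,
        }).first;
    }
    return &it->second;
}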
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp around lines 129 to
144, the static local variable buft is shared across all calls, causing all
devices to share the same buffer type instance with incorrect device and
context. To fix this, replace the static buft with either a static array of
ggml_backend_buffer_type sized by the number of devices and indexed by dev, or
dynamically allocate a new ggml_backend_buffer_type for each device on each
call. This ensures each remoting device has its own distinct buffer type
instance with the correct device and context values.
Review continued from previous batch...
static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
    IMPLEMENTED_ONCE;
    struct virtgpu *gpu = DEV_TO_GPU(dev);

    static enum ggml_backend_dev_type type;
    static bool has_type = false;
    if (!has_type) {
        has_type = true;
        type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
    }

    return type;
}
Fix race condition in device type caching
The static variables type and has_type are accessed without synchronization, creating a race condition if multiple threads call this function simultaneously.
Consider using std::once_flag with std::call_once for thread-safe initialization:
static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
IMPLEMENTED_ONCE;
struct virtgpu *gpu = DEV_TO_GPU(dev);
static enum ggml_backend_dev_type type;
- static bool has_type = false;
- if (!has_type) {
- has_type = true;
+ static std::once_flag type_flag;
+ std::call_once(type_flag, [&gpu, &type]() {
type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
- }
+ });
return type;
}
Don't forget to include <mutex> at the top of the file.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
    IMPLEMENTED_ONCE;
    struct virtgpu *gpu = DEV_TO_GPU(dev);

    static enum ggml_backend_dev_type type;
    static bool has_type = false;
    if (!has_type) {
        has_type = true;
        type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
    }

    return type;
}

static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
    IMPLEMENTED_ONCE;
    struct virtgpu *gpu = DEV_TO_GPU(dev);

    static enum ggml_backend_dev_type type;
    static std::once_flag type_flag;
    std::call_once(type_flag, [&gpu, &type]() {
        type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
    });

    return type;
}
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp around lines 21 to 34,
the static variables used for caching device type are accessed without
synchronization, causing a race condition in multithreaded scenarios. Replace
the static bool and type variables with a std::once_flag and use std::call_once
to initialize the device type exactly once in a thread-safe manner. Also,
include the <mutex> header at the top of the file to use these synchronization
primitives.
static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
    switch (type) {
    /* device */
    case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
    case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
    case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
    case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
    case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
    case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
    case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";

    /* buffer-type */
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";

    /* buffer */
    case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
    case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
    case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
    case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
    case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";

    /* backend */
    case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";

    /* metal */
    case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";

    default: return "unknown";
    }
}
Fix function name mismatches in command name mapping
Several command names don't match the actual function names, which could cause confusion during debugging.
static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
switch (type) {
/* device */
- case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
- case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
- case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
+ case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_reg_get_device_count";
+ case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_device_get_name";
+ case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_device_get_description";
case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
- case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
+ case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_device_get_memory";
case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
- case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
- case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
- case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";
+ case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_device_get_buffer_type";
+ case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_device_get_props";
+ case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_device_buffer_from_ptr";
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
    switch (type) {
    /* device */
    case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
    case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
    case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
    case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
    case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
    case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
    case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";

    /* buffer-type */
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";

    /* buffer */
    case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
    case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
    case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
    case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
    case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";

    /* backend */
    case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";

    /* metal */
    case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";

    default: return "unknown";
    }
}

static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
    switch (type) {
    /* device */
    case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_reg_get_device_count";
    case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_device_get_name";
    case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_device_get_description";
    case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_device_get_memory";
    case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
    case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_device_get_buffer_type";
    case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_device_get_props";
    case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_device_buffer_from_ptr";

    /* buffer-type */
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
    case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";

    /* buffer */
    case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
    case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
    case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
    case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
    case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";

    /* backend */
    case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";

    /* metal */
    case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";

    default: return "unknown";
    }
}
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/backend-dispatched.h between lines 52 and 88,
some returned command name strings do not match their actual function names,
causing potential confusion. Review each case in the switch statement and update
the returned string to exactly match the corresponding function name for that
command type, ensuring consistency and clarity in debugging output.
ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
Remove duplicate function declaration
The function ggml_backend_remoting_device_init is declared twice.
ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
-ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);

ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-remoting.h around lines 116 to 120, the
function ggml_backend_remoting_device_init is declared twice. Remove the
duplicate declaration so that the function is declared only once to avoid
redundancy and potential confusion.
#define STOP_HERE \
    thks_bye()

#define BREAKPOINT \
    breakpoint()
Fix undefined function references in debug macros
The macros STOP_HERE and BREAKPOINT reference undefined functions that will cause link errors.
Either:
- Define these functions or include the appropriate header
- Replace with standard debugging approaches:
#define STOP_HERE \
- thks_bye()
+ __builtin_trap() // or assert(false)
#define BREAKPOINT \
- breakpoint()
+ __builtin_debugtrap() // or raise(SIGTRAP) on POSIX
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#define STOP_HERE \
    thks_bye()

#define BREAKPOINT \
    breakpoint()

#define STOP_HERE \
    __builtin_trap() // or assert(false)

#define BREAKPOINT \
    __builtin_debugtrap() // or raise(SIGTRAP) on POSIX
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-remoting.h around lines 56 to 61, the
macros STOP_HERE and BREAKPOINT reference undefined functions thks_bye() and
breakpoint(), causing link errors. To fix this, either define these functions or
include the headers where they are defined. Alternatively, replace these macros
with standard debugging methods such as calling abort() for STOP_HERE and using
compiler-specific breakpoint intrinsics or inline assembly for BREAKPOINT.
/*
 * Header for the Direct Rendering Manager
 *
 * Author: Rickard E. (Rik) Faith <[email protected]>
 *
 * Acknowledgments:
 * Dec 1999, Richard Henderson <[email protected]>, move to generic cmpxchg.
 */

/*
 * Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
 * Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California.
 * All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */

#ifndef _DRM_H_
💡 Verification agent
🧩 Analysis chain
Verify the source and version of this DRM UAPI header.
This appears to be a copy of the Linux kernel's DRM UAPI header. Consider documenting the source kernel version for future maintenance and updates.
🏁 Script executed:
#!/bin/bash
# Description: Check if this is a verbatim copy from the Linux kernel
# Search for similar files in the repository
fd -e h drm.h
# Check if there are any modifications from the standard kernel header
# by searching for common DRM defines
rg -A 5 "DRM_NAME.*drm.*Name in kernel"
Length of output: 820
Document upstream Linux kernel version for DRM UAPI header
The file ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h appears to be a direct copy of the Linux kernel's DRM UAPI header (e.g. it defines DRM_NAME, DRM_MIN_ORDER, etc.). To simplify future syncs and maintenance, please add a top-of-file comment indicating the exact kernel version (or commit/patch level) this was taken from, along with a link or reference to the corresponding Linux kernel source location.
Suggested update:
- File: ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h
- At the very top, insert something like:

/*
 * Upstream source: Linux kernel v5.15-rc2 (commit abcdef1234567890)
 * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/drm/drm.h
 */
- Verify that any local changes are documented or reconsidered in future merges.
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h at the very top before
any existing comments, add a new comment block specifying the exact Linux kernel
version or commit hash from which this DRM UAPI header was copied, along with a
URL to the corresponding upstream source in the Linux kernel repository. This
will help future maintainers track the origin and simplify syncing updates.
Also, review and document any local modifications compared to the upstream
version.
// needs UNUSED to be defined
// needs FATAL to be defined
🛠️ Refactor suggestion
Document the required macro dependencies.
The file requires UNUSED and FATAL macros to be defined but doesn't include a header that provides them. Consider either including the required header or documenting where these should come from.
// needs UNUSED to be defined
// needs FATAL to be defined
+// Example: These macros should be defined in a common header like "common.h" or "utils.h"
+// #define UNUSED(x) ((void)(x))
+// #define FATAL(...) do { fprintf(stderr, __VA_ARGS__); abort(); } while(0)
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs.h around lines 6 to 8, the file
uses the macros UNUSED and FATAL without defining them or including a header
that does. Fix this by either including the appropriate header file that defines
these macros or adding comments to document that these macros must be defined
before including this file.
/* apir_buffer_type_host_handle_t */

static inline void
vn_encode_apir_buffer_type_host_handle_t(struct vn_cs_encoder *enc, const apir_buffer_type_host_handle_t *val)
{
    vn_encode(enc, sizeof(apir_buffer_type_host_handle_t), val, sizeof(apir_buffer_type_host_handle_t));
}

static inline void
vn_decode_apir_buffer_type_host_handle_t(struct vn_cs_decoder *dec, apir_buffer_type_host_handle_t *val)
{
    vn_decode(dec, sizeof(apir_buffer_type_host_handle_t), val, sizeof(apir_buffer_type_host_handle_t));
}
Missing type definitions for apir_buffer types.
The functions use apir_buffer_type_host_handle_t and apir_buffer_host_handle_t types that are not defined in this header.
Include the header that defines these types or add forward declarations:
+// Forward declarations or include the header that defines these
+// #include "apir_backend.h"
+typedef struct apir_buffer_type_host_handle_t apir_buffer_type_host_handle_t;
+typedef struct apir_buffer_host_handle_t apir_buffer_host_handle_t;
+
/* apir_buffer_type_host_handle_t */
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs.h around lines 514 to 526, the
types apir_buffer_type_host_handle_t and apir_buffer_host_handle_t used in the
encode and decode functions are not defined in this header. To fix this, include
the appropriate header file where these types are defined or add forward
declarations for these types at the top of this header file to ensure they are
recognized during compilation.
static inline void
vn_encode_ggml_buffer(struct vn_cs_encoder *enc, const ggml_backend_buffer_t buffer) {
    apir_buffer_host_handle_t handle = BUFFER_TO_HOST_HANDLE(buffer);
Missing definition for BUFFER_TO_HOST_HANDLE macro.
The macro BUFFER_TO_HOST_HANDLE is used but not defined in this header.
Add the macro definition or include the header that defines it:
+// Include the header that defines BUFFER_TO_HOST_HANDLE or define it here
+// #define BUFFER_TO_HOST_HANDLE(buffer) ggml_buffer_to_apir_handle(buffer)
+
static inline void
vn_encode_ggml_buffer(struct vn_cs_encoder *enc, const ggml_backend_buffer_t buffer) {
apir_buffer_host_handle_t handle = BUFFER_TO_HOST_HANDLE(buffer);
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h at line 100, the macro
BUFFER_TO_HOST_HANDLE is used but not defined or included. To fix this, either
add the definition of the BUFFER_TO_HOST_HANDLE macro in this header or include
the appropriate header file where it is defined, ensuring the macro is available
before its usage.
// it safe to remove the `const` qualifier here, we *do* want to
// modify the shared memory data to fix the `src` pointers.
ggml_tensor *tensor = (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));
🛠️ Refactor suggestion

Document the safety of the const cast.

The `const` qualifier is removed to modify shared memory data. While this appears intentional for fixing the `src` pointers, it should be documented:

```diff
-// it safe to remove the `const` qualifier here, we *do* want to
+// it is safe to remove the `const` qualifier here, we *do* want to
 // modify the shared memory data to fix the `src` pointers.
+// This is required because the deserialized tensor pointers need to be
+// updated to point to the correct locations in the shared memory.
 ggml_tensor *tensor = (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h at line 217, the code
removes the const qualifier by casting to a non-const pointer to modify shared
memory data. Add a comment explaining that this const cast is intentional and
safe because the modification is required to fix src pointers in shared memory,
clarifying the rationale to future readers and maintainers.
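To make the rationale concrete, here is a hedged sketch of the kind of fix-up the cast enables: after an in-place decode from shared memory, the tensor's `src[]` entries still hold guest-side addresses and must be rewritten before the host can walk the graph. `lookup_decoded_tensor` is a hypothetical helper, not an API from the PR:

```c
// Sketch only: rewrite guest-side src pointers of an in-place decoded
// tensor to their host-side counterparts. Assumes ggml.h is included.
static ggml_tensor *
decode_tensor_and_fix_srcs(struct vn_cs_decoder *dec)
{
    // The backing store is shared memory we own, so dropping `const` is safe.
    ggml_tensor *tensor =
        (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));

    for (int i = 0; i < GGML_MAX_SRC; i++) {
        if (tensor->src[i] != NULL) {
            // hypothetical mapping from serialized guest pointer to host tensor
            tensor->src[i] = lookup_decoded_tensor(dec, tensor->src[i]);
        }
    }
    return tensor;
}
```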
```c
#ifndef VIRTGPU_DRM_H
#define VIRTGPU_DRM_H

#include "drm.h"
```
💡 Verification agent

🧩 Analysis chain

Verify the include path for drm.h

The include uses quotes, which suggests a local header. Since this file is in `drm-uapi/`, the include path should likely be relative or use the full path from the include directory.

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if drm.h exists in the expected location and how it's included elsewhere

# Find drm.h in the repository
fd -t f "drm.h$"

# Check how other files include drm headers
rg -A 2 '#include.*drm\.h'
```

Length of output: 760
Fix the include path in virtgpu_drm.h

The header currently uses a local include (`"drm.h"`), but to match the project's include-directory layout and how other headers reference drm-uapi files, it should use the full relative path.

• File: ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h

Change:

```diff
-#include "drm.h"
+#include "drm-uapi/drm.h"
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```c
#include "drm-uapi/drm.h"
```
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h at line 27,
change the include directive from a local include "drm.h" to use the full
relative path that matches the project's include-directory layout and other
drm-uapi headers. Replace #include "drm.h" with the appropriate relative path
include, such as #include "drm-uapi/drm.h".
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (no approvers yet). The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (crc-org#7)
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (crc-org#1)
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax; remove unnecessary return
* ggml : add fused swiglu_oai op (crc-org#11)
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp (Co-authored-by: Georgi Gerganov <[email protected]>)
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines

---------
Co-authored-by: Georgi Gerganov <[email protected]>
---------
Co-authored-by: slaren <[email protected]>

* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
  ggml : use e8m0 conversion instead of powf (Co-authored-by: Diego Devesa <[email protected]>)
  change kvalues_mxfp4 table to match e2m1 (crc-org#6)
  metal : remove quantization for now (not used)
  cuda : fix disabled CUDA graphs due to ffn moe bias
  vulkan : add support for mxfp4
  cont : add cm2 dequant
* ggml : add ggml_add_id (crc-org#13)
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error (ggml-ci)
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw (ggml-ci)
* cleanup (ggml-ci)
* sycl : fix supports_op for MXFP4 (ggml-ci)
* fix Unknown reasoning format
* ggml-cpu : fix AVX build (ggml-ci)
* fix hip build (ggml-ci)
* cuda : add mxfp4 dequantization support for cuBLAS (ggml-ci)
* ggml-cpu : fix mxfp4 fallback definitions for some architectures (ggml-ci)
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: slaren <[email protected]>
…e_context function for API Remoting
Summary by Sourcery
Introduce a new API remoting layer for llama.cpp that forwards graph computations and buffer operations to a remote execution context over VirtGPU (virgl renderer). Add remoting frontend and backend libraries with encoder/decoder dispatch logic, integrate virtgpu-shared memory for data exchange, and wire it into the build system under new GGML_REMOTING_FRONTEND/BACKEND flags.
New Features:
Enhancements:
Build:
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Refactor
Chores
Documentation