
Conversation

@kpouget (Collaborator) commented Jul 4, 2025

Summary by Sourcery

Introduce a new API remoting layer for llama.cpp that forwards graph computations and buffer operations to a remote execution context over VirtGPU (virgl renderer). Add remoting frontend and backend libraries with encoder/decoder dispatch logic, integrate VirtGPU shared memory for data exchange, and wire it into the build system under new GGML_REMOTING_FRONTEND/BACKEND flags.

New Features:

  • Add VirtGPU-based API remoting frontend and backend libraries to forward ggml graph compute and buffer operations to a remote process via virgl renderer.
  • Introduce new CMake/Makefile options (GGML_REMOTING_FRONTEND and GGML_REMOTING_BACKEND) and build rules to conditionally compile the remoting modules.
  • Implement remote call infrastructure with custom encoder/decoder, shared memory management, and command dispatch over DRM ioctls.
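
As a rough illustration of that encoder/decoder idea, here is a toy C++ sketch (the cs_buffer type and the cs_encode/cs_decode helpers are hypothetical, not the PR's venus_cs API): fixed-width values are appended to a byte stream on encode and read back in the same order on decode; the real implementation streams into VirtGPU shared memory rather than a local array.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy command stream in the spirit of the venus_cs helpers (vn_encode_* /
// vn_decode_*); names and layout here are illustrative only.
struct cs_buffer {
    uint8_t data[256];
    size_t  offset = 0;
};

template <typename T>
void cs_encode(cs_buffer &cs, const T &v) {
    assert(cs.offset + sizeof(T) <= sizeof(cs.data)); // no overflow
    std::memcpy(cs.data + cs.offset, &v, sizeof(T));  // fixed-width copy into the stream
    cs.offset += sizeof(T);
}

template <typename T>
void cs_decode(cs_buffer &cs, T *v) {
    assert(cs.offset + sizeof(T) <= sizeof(cs.data));
    std::memcpy(v, cs.data + cs.offset, sizeof(T));   // read back in the same order
    cs.offset += sizeof(T);
}

int main() {
    cs_buffer cs;
    cs_encode(cs, uint32_t{7});     // e.g. a command type
    cs_encode(cs, size_t{0x1000});  // e.g. a buffer size
    cs.offset = 0;                  // rewind; a reply would use a second stream
    uint32_t cmd; size_t sz;
    cs_decode(cs, &cmd);
    cs_decode(cs, &sz);
    return (cmd == 7 && sz == 0x1000) ? 0 : 1;
}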

Enhancements:

  • Comment out most LLAMA_LOG_DEBUG/INFO calls to reduce internal logging noise during model loading and tensor operations.
  • Expose Metal device context remotely via ggml_backend_metal_get_device_context to propagate device capabilities across the remoting boundary.
  • Integrate timer instrumentation around Metal compute_graph and llama token generation to collect and report performance metrics triggered at exit or via USR1 signal.
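
A minimal sketch of that instrumentation pattern (the symbol names below are illustrative, not the PR's actual code): time the measured region with CLOCK_MONOTONIC and report the accumulated totals from an atexit handler or when SIGUSR1 arrives.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Accumulated wall-clock time; the PR keeps similar counters around the
// Metal compute path and the token generation loop.
static long long timer_start, timer_total, timer_count;

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void start_timer(void) { timer_start = now_ns(); }
static void stop_timer(void)  { timer_total += now_ns() - timer_start; timer_count++; }

static void report_timers(void) {
    if (timer_count) {
        fprintf(stderr, "timed %lld calls, %.2f ms total\n",
                timer_count, timer_total / 1e6);
    }
}

// fprintf in a signal handler is not strictly async-signal-safe, but is
// common practice for ad-hoc profiling hooks like this one.
static void on_usr1(int sig) { (void)sig; report_timers(); }

int main(void) {
    atexit(report_timers);
    signal(SIGUSR1, on_usr1);
    start_timer();
    // ... the measured work (graph compute / token generation) goes here ...
    stop_timer();
    return 0;
}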

Build:

  • Update Makefile and CMakeLists to register and compile remoting frontend and backend modules and link against DRM for VirtGPU support.

Summary by CodeRabbit

  • New Features

    • Introduced remoting backend and frontend support, enabling remote execution and resource sharing across devices.
    • Added extensive shell scripts for preparing, building, and running remoting and Vulkan environments, including containerized compilation and benchmarking.
    • Implemented advanced key-value cache management for improved model performance and memory handling.
    • Added timing instrumentation for performance measurement in Metal backend and token generation.
  • Enhancements

    • Added new CMake and Makefile options for remoting backends.
    • Extended backend device and buffer interfaces to support remote and Metal-specific capabilities.
  • Bug Fixes

    • Improved error handling and logging in backend and buffer operations.
  • Refactor

    • Modularized backend and buffer logic, separating device, buffer, and Metal-specific code for clarity and maintainability.
  • Chores

    • Removed numerous GitHub Actions workflows, issue templates, and CI/CD configuration files to streamline repository maintenance.
    • Disabled or silenced non-essential logging and informational output for a quieter runtime experience.
  • Documentation

    • Added comprehensive header files for DRM, VirtGPU, and backend interfaces to clarify API usage and system integration.

sourcery-ai bot commented Jul 4, 2025

Reviewer's Guide

This PR introduces a new API remoting layer into llama.cpp by splitting the ggml backend into a “remoting frontend” (runs in the client process) and a “remoting backend” (runs in a separate host process). It uses virtio-gpu execbuffers and shared memory to marshal ggml tensors and graphs across the ABI boundary. The build system is extended to conditionally compile two new libraries (ggml-remotingfrontend and ggml-remotingbackend), and dozens of new source files implement the command dispatch, serialization of ggml_tensor/ggml_cgraph, remote buffer and device wrappers, and the virtgpu-based transport. Timing hooks were also added to measure Metal kernels and run.cpp token loops.

Class diagram for Remoting Frontend/Backend and Device Wrappers

classDiagram
    class GGMLRemotingFrontend {
        +send_tensor()
        +send_graph()
        +receive_result()
        +marshal_data()
    }
    class GGMLRemotingBackend {
        +receive_tensor()
        +receive_graph()
        +execute_graph()
        +return_result()
    }
    class VirtioGPUTransport {
        +send_buffer()
        +receive_buffer()
        +setup_shared_memory()
    }
    class GGMLDeviceWrapper {
        +wrap_device()
        +unwrap_device()
        +marshal_device_info()
    }
    GGMLRemotingFrontend ..> VirtioGPUTransport : uses
    GGMLRemotingBackend ..> VirtioGPUTransport : uses
    GGMLRemotingFrontend --> GGMLDeviceWrapper : wraps
    GGMLRemotingBackend --> GGMLDeviceWrapper : wraps

File-Level Changes

Added profiling timers to Metal backend and CLI runner
  • Inserted start/stop timers in ggml-metal encode/compute paths
  • Added atexit and signal handlers to print accumulated timings
  • Instrumented run.cpp generate loop with timing calls
Files: ggml/src/ggml-metal/ggml-metal.m, tools/run/run.cpp, ggml-remotingfrontend/shared/apir_backend.h
Extended build files to conditionally include remoting support
  • Added GGML_REMOTING_FRONTEND/BACKEND flags to Makefile and CMakeLists
  • Appended remoting source objects in build targets when enabled
  • Exposed public headers for remoting frontend
Files: Makefile, ggml/CMakeLists.txt, ggml/src/CMakeLists.txt
Added ggml-remotingfrontend library
  • Implements ggml_backend_device and buffer interfaces over virtgpu RPC
  • Defines remote marshalling of ggml_tensor and ggml_cgraph
  • Wraps Metal device context calls and buffer operations
  • Registers frontend via ggml_backend_reg
Files: ggml/src/ggml-remotingfrontend, ggml/include/ggml-remoting-frontend.h
Added ggml-remotingbackend library
  • Loads host ggml backend as a shared library
  • Dispatches RPC commands in virtgpu execbuffer handler
  • Deserializes ggml_cgraph and tensors into host context
  • Calls underlying backend’s graph_compute and metal device queries
Files: ggml/src/ggml-remotingbackend, ggml/src/ggml-remotingbackend/shared, ggml/src/ggml-remotingbackend/backend-dispatched.cpp
Implemented a custom virtgpu-based transport
  • Created a virtgpu helper to open the DRM device and shared memory blobs
  • Used venus command streams to encode/decode requests
  • Polled for responses via eventfd in shared memory
  • Managed blob allocation and mapping logic
Files: ggml/src/ggml-remotingfrontend/virtgpu.cpp, ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp, ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp


coderabbitai bot commented Jul 4, 2025

Important: Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbit review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces a comprehensive remoting backend and frontend for the GGML framework, including new CMake options, build scripts, and extensive C++ implementations for remote device, buffer, and graph management. Numerous serialization, deserialization, and communication utilities are added, along with shell scripts for build and run automation. Many GitHub Actions workflow and template files are removed, and logging is silenced across several model and vocab info functions.

Changes

  • .github/ISSUE_TEMPLATE/*.yml, .github/labeler.yml, .github/pull_request_template.md, .github/workflows/*, .github/actions/*: Deleted all GitHub Actions workflows, custom actions, issue templates, labeler config, and the PR template.
  • CMakePresets.json, Makefile, ggml/CMakeLists.txt, ggml/src/CMakeLists.txt: Added new remoting backend/frontend build options and presets; updated build rules and header installation.
  • build.backend.sh, prepare.backend.sh, prepare.remoting.sh, prepare.sh, prepare.vulkan.sh, podman_compile.sh, run.remoting.sh, run.sh, run.vulkan.sh: Added new shell scripts for configuring, building, and running remoting and Vulkan backends/frontends.
  • ggml/include/ggml-metal.h: Added device context query API for the Metal backend.
  • ggml/include/ggml-remoting-frontend.h: New header: remoting frontend backend registration API and macro.
  • ggml/src/ggml-backend-reg.cpp: Integrated conditional registration and loading of the remoting frontend backend.
  • ggml/src/ggml-metal/ggml-metal.m: Added timing instrumentation and a public API for device context querying.
  • ggml/src/ggml-remotingbackend/ (multiple files): New backend: CMake config, device/buffer/graph dispatchers, serialization, logging, error codes, and utility headers.
  • ggml/src/ggml-remotingfrontend/ (multiple files, including include/): New frontend: device, buffer, host buffer, Metal support, remoting backend, serialization, virtgpu interface, shared memory, and utility code.
  • src/llama-context.cpp, src/llama-kv-cache-unified.cpp, src/llama-model-loader.cpp, src/llama-model.cpp, src/llama-vocab.cpp: Disabled or commented out logging and info printouts in constructors and info methods.
  • src/llama-kv-cache.cpp: Added unified key-value cache class for LLaMA, including serialization, defragmentation, and view support.
  • tools/run/run.cpp: Added timing instrumentation to the token decoding loop and reporting at program exit.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ShellScript
    participant CMake
    participant Backend
    participant Frontend
    participant VirtGPU
    participant RemoteGPU

    User->>ShellScript: Run build or inference script
    ShellScript->>CMake: Configure build (remoting/vulkan/backend)
    CMake->>Backend: Build remoting backend/frontend
    User->>Frontend: Start inference (e.g., llama-run)
    Frontend->>VirtGPU: Prepare/encode computation graph
    VirtGPU->>RemoteGPU: Send serialized graph via shared memory
    RemoteGPU->>VirtGPU: Compute and return status/result
    VirtGPU->>Frontend: Decode and process result
    Frontend->>User: Output inference result and timing

Poem

🐇
A flurry of scripts, a backend anew,
Remoting and buffers, Metal shines through.
Serialization dances, graphs leap and bound,
VirtGPU whispers, in bytes they are found.
Logging grows silent, but timing is bright—
The rabbit builds onward, from morning to night!
🛠️🌈


@kpouget (Collaborator, Author) commented Jul 4, 2025

@coderabbitai review

coderabbitai bot commented Jul 4, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 91

🔭 Outside diff range comments (5)
run.sh (1)

1-2: Add a proper shebang (and fail-early flags) so the script can be executed directly

Without a shebang the kernel cannot pick an interpreter, so ./run.sh will exit with “Exec format error”.
Add a portable shebang and fail-fast flags:

+#!/usr/bin/env bash
+set -euo pipefail
+
 ./build/bin/llama-run --ngl 999 --verbose ~/models/llama3.2 "say nothing"
prepare.sh (1)

1-2: Missing shebang prevents direct execution

Same issue as run.sh: the script is not runnable with ./prepare.sh.
Recommend:

+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . -B ./build -DGGML_VULKAN=ON -DGGML_NATIVE=OFF -DGGML_METAL=OFF #-DCMAKE_BUILD_TYPE=Debug #-DGGML_VULKAN_DEBUG=1
prepare.vulkan.sh (1)

1-7: Add shebang & safety flags for consistency with other helper scripts

+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . \
       -B ../build.vulkan \
       -DGGML_VULKAN=ON \
       -DGGML_NATIVE=OFF \
       -DGGML_METAL=OFF \
       -DCMAKE_BUILD_TYPE=Debug
src/llama-kv-cache-unified.cpp (1)

74-86: dev_name is now unused — will trigger -Wunused-variable and break -Werror builds
With the debug log line commented-out, dev_name is never read. Many toolchains compile ggml/llama.cpp with -Wall -Werror; the unused variable will therefore fail the build.

         const char * dev_name = "CPU";
@@
-        //LLAMA_LOG_DEBUG("%s: layer %3d: dev = %s\n", __func__, il, dev_name);
+#if defined(LLAMA_LOG_DEBUG) && !defined(NDEBUG)
+        LLAMA_LOG_DEBUG("%s: layer %3d: dev = %s\n", __func__, il, dev_name);
+#else
+        (void)dev_name;               // silence unused-var in release builds
+#endif

This keeps the diagnostic available for debug builds while guaranteeing a clean release compile.

podman_compile.sh (1)

9-40: Validate podman availability before execution.

The script doesn't check if podman is installed and available.

Add validation at the beginning of the script:

if ! command -v podman &> /dev/null; then
    echo "Error: podman is not installed or not in PATH"
    exit 1
fi
🧹 Nitpick comments (43)
run.vulkan.sh (2)

1-1: Tighten the shebang

#! /bin/bash has an extra space and hard-codes /bin/bash. Prefer env lookup:

-#! /bin/bash
+#!/usr/bin/env bash

2-8: Fail fast and quote prefix variable

Consider aborting on any error and treating unset vars as fatal:

+set -euo pipefail
+
 if [[ ${1:-} == "strace" ]]; then
     prefix="strace"
 elif [[ ${1:-} == "gdb" ]]; then
     prefix="gdb --args"
 else
     prefix=""
 fi
ggml/CMakeLists.txt (1)

276-276: Header installed unconditionally, regardless of option state
ggml-remoting-frontend.h is exported even when GGML_REMOTING_FRONTEND=OFF.
Downstream consumers will include a header whose implementing symbols are absent, leading to link-time errors.

Guard the public header list behind the same option or always build the thin stub implementation so symbol availability matches install surface.

src/llama-model-loader.cpp (3)

679-697: Redundant work after disabling KV-dump logging

The for (int i = 0; i < n_kv; i++) loop still does string formatting (gguf_kv_to_str, replace_all, etc.) even though every LLAMA_LOG_INFO call inside it is commented out. For large models this can be a noticeable cost at load time while providing zero benefit.

-        for (int i = 0; i < n_kv; i++) {
-            ...
-            //LLAMA_LOG_INFO(...);
-        }
+        /* KV dump disabled */
+        if (trace > 1) {        // or a new `LLAMA_VERBOSE_META` flag
+            for (int i = 0; i < n_kv; i++) {
+                ...
+                LLAMA_LOG_INFO(...);
+            }
+        }

Guarding (or #ifdef-ing) the whole loop avoids useless allocations and string ops when logging is off.


700-706: Same issue for type-count loop

The loop that walks n_type now only computes and discards its body. Wrap it behind the same flag or remove it entirely to avoid wasted cycles.


1156-1164: Early return leaves dead, compiled-in code

print_info() now immediately returns, leaving ~8 lines of unreachable logging code. Most compilers will warn; it also confuses readers.

Options:

  1. Delete the unreachable block.
  2. #if 0 it.
  3. Add a runtime flag (if (!verbose) return;).

Example:

 void llama_model_loader::print_info() const {
-    return;
+#ifdef LLAMA_QUIET_LOAD   // hypothetical flag: skip the info dump when quiet loading is requested
+    return;
+#endif

Pick whichever aligns with the new logging policy, but avoid shipping dead code.

src/llama-model.cpp (1)

1563-1568: Consider using conditional logging instead of commenting out debug statements.

While reducing noise during model loading aligns with the PR objectives, completely commenting out debug logs can hinder troubleshooting. The device assignment and SWA status information could be valuable for debugging tensor loading issues.

Consider implementing a more flexible logging approach:

  • Use environment variables or build-time flags to control debug logging
  • Implement log level filtering at runtime
  • Or remove the commented code entirely if it's no longer needed
-            //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);
+            LLAMA_LOG_TRACE("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(cpu_dev), is_swa);
-        //LLAMA_LOG_DEBUG("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);
+        LLAMA_LOG_TRACE("load_tensors: layer %3d assigned to device %s, is_swa = %d\n", il, ggml_backend_dev_name(dev), is_swa);
prepare.backend.sh (2)

7-7: Avoid hard-coded macOS-specific Vulkan paths

/opt/homebrew/... is Apple Silicon-specific and will fail on Intel macOS, Linux and CI runners.
Prefer passing these via environment variables (e.g. VULKAN_SDK) or CMake cache options so callers can override.


10-11: Consider exposing the Debug build flag through an option instead of leaving it commented

Leaving a commented flag invites drift. Offer a boolean CLI switch (e.g. --debug) or document usage in README to toggle -DCMAKE_BUILD_TYPE=Debug.

ggml/include/ggml-metal.h (1)

64-67: Good API addition for device capability querying.

The function signature is well-designed and follows existing conventions. The boolean output parameters clearly indicate the device capabilities being queried.

Consider adding documentation comments to explain what each capability flag represents:

+// Query Metal device capabilities
+// has_simdgroup_mm: whether device supports SIMD group matrix multiplication
+// has_simdgroup_reduction: whether device supports SIMD group reduction operations
+// use_bfloat: whether device should use bfloat16 precision
 GGML_BACKEND_API void ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
                                                             bool *has_simdgroup_mm,
                                                             bool *has_simdgroup_reduction,
                                                             bool *use_bfloat);
ggml/src/ggml-remotingfrontend/CMakeLists.txt (1)

28-28: Add error handling for missing drm library dependency.

Consider adding a check to ensure the drm library is available before linking:

# dnf install -y libdrm-devel
+find_library(DRM_LIBRARY drm)
+if(NOT DRM_LIBRARY)
+    message(FATAL_ERROR "libdrm not found. Install libdrm-devel package.")
+endif()
target_link_libraries(ggml-remotingfrontend PUBLIC drm)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1)

3-19: Consider more meaningful return value and input validation.

The function always returns true regardless of the remote call outcome. Consider returning the actual success status and adding input validation:

 bool
 apir_metal_get_device_context(struct virtgpu *gpu, struct ggml_backend_metal_device_context *metal_dev_ctx) {
+  if (!gpu || !metal_dev_ctx) {
+    return false;
+  }
+
   struct vn_cs_encoder *encoder;
   struct vn_cs_decoder *decoder;

   REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT);

   REMOTE_CALL(gpu, encoder, decoder);

   vn_decode_bool_t(decoder, &metal_dev_ctx->has_simdgroup_mm);
   vn_decode_bool_t(decoder, &metal_dev_ctx->has_simdgroup_reduction);
   vn_decode_bool_t(decoder, &metal_dev_ctx->use_bfloat);

   REMOTE_CALL_FINISH(gpu, encoder, decoder);

   return true;
 }

Note: The macros appear to handle errors via FATAL, so the current approach may be intentional.

ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (2)

7-8: Remove unused CACHED macro and fix comment typo.

The CACHED macro appears to be unused, and there's a typo in the comment:

-#define CACHED
-//  printf("INFO: ### found response in the cache %s\n", __func__)o

Consider removing unused code to keep the header clean.


15-16: Consider defensive check for encoder variable name.

The macro uses encoder in the conditional check but takes encoder_name as parameter. While this works with current usage, consider making it more robust:

-    if (!encoder) {							\
+    if (!encoder_name) {						\

This makes the macro more self-contained and less dependent on variable naming conventions.

ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1)

7-16: Add documentation for public API.

Consider adding documentation comments to explain the purpose and usage of the struct and functions.

+/**
+ * Metal device context containing device capabilities
+ */
 struct ggml_backend_metal_device_context {
   bool has_simdgroup_mm;
   bool has_simdgroup_reduction;
   bool use_bfloat;
 };


+/**
+ * Get the Metal device context for a given backend device
+ * @param dev The backend device
+ * @return Pointer to the Metal device context, or NULL if not available
+ */
 const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev);

+/**
+ * Check if a Metal device context supports a specific tensor operation
+ * @param ctx_dev The Metal device context
+ * @param op The tensor operation to check
+ * @return true if the operation is supported, false otherwise
+ */
 bool ggml_metal_supports_op(const struct ggml_backend_metal_device_context * ctx_dev, const struct ggml_tensor * op);
ggml/src/ggml-remotingbackend/backend-dispatched.cpp (2)

11-17: Consider encapsulating global state.

Global variables should be encapsulated to prevent unintended access and improve maintainability.

Consider wrapping these globals in a namespace or singleton class:

-ggml_backend_reg_t reg = NULL;
-ggml_backend_dev_t dev = NULL;
-ggml_backend_t bck = NULL;
-
-long long timer_start = 0;
-long long timer_total = 0;
-long long timer_count = 0;
+namespace backend_state {
+    ggml_backend_reg_t reg = NULL;
+    ggml_backend_dev_t dev = NULL;
+    ggml_backend_t bck = NULL;
+    
+    long long timer_start = 0;
+    long long timer_total = 0;
+    long long timer_count = 0;
+}

44-44: Use INFO level for informational messages.

Free memory information is not a warning condition.

-  WARNING("%s: free memory: %ld MB\n", __func__, (size_t) free/1024/1024);
+  INFO("%s: free memory: %ld MB\n", __func__, (size_t) free/1024/1024);
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (1)

82-82: Define magic number as a constant.

The magic number 16384 should be defined as a named constant for clarity.

Add at the top of the file:

+// Minimum alignment for shared memory allocations
+#define VIRTGPU_SHMEM_MIN_ALIGN 16384
+
 static uint32_t
 virtgpu_ioctl_resource_create_blob(struct virtgpu *gpu,

Then use it:

-   size = align64(size, 16384);
+   size = align64(size, VIRTGPU_SHMEM_MIN_ALIGN);
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (2)

3-7: Remove unused function or fix its implementation.

The function current_time_ms is not used anywhere and has issues:

  1. Returns nanoseconds despite the _ms suffix
  2. Comment suggests using CLOCK_MONOTONIC but uses CLOCK_REALTIME

Either remove the function entirely or fix it if it will be used:

-static long long current_time_ms() {
-  struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
-  return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
-}

42-44: Consider using a more appropriate default status.

GGML_STATUS_ABORTED might not be the best default if decoding fails. Consider using a status that indicates a communication/decoding failure.

-  ggml_status status = GGML_STATUS_ABORTED;
+  ggml_status status = GGML_STATUS_FAILED; // or another appropriate error status
build.backend.sh (1)

30-36: Use more idiomatic error handling.

Instead of checking $?, use the command directly in the if statement.

-cmake --build ../build.remoting-backend$FLAVOR --parallel 8 --target $TARGETS "$@"
-
-if [[ $? == 0 ]]; then
+if cmake --build ../build.remoting-backend$FLAVOR --parallel 8 --target $TARGETS "$@"; then
     touch READY_backend
 else
     touch FAILED_backend
 fi
run.remoting.sh (1)

42-42: Fix grammar in the prompt.

The prompt has a grammatical error.

-    PROMPT="tell what's Apple metal API"
+    PROMPT="tell me what Apple Metal API is"
ggml/src/ggml-metal/ggml-metal.m (1)

11-12: Consider using a compile-time flag to control debug logging.

Instead of unconditionally disabling debug logging, consider using a compile-time flag to allow users to enable debug logs when needed for troubleshooting.

+#ifdef GGML_METAL_DISABLE_DEBUG_LOGS
 #undef GGML_LOG_DEBUG
 #define GGML_LOG_DEBUG(...)
+#endif
tools/run/run.cpp (1)

991-996: Consider making timer functions static instead of inline

These inline functions are only used within this translation unit. Making them static instead of inline would be more appropriate and let the compiler decide on inlining.

-static inline void start_timer(void) {
+static void start_timer(void) {
   struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);
   timer_start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
 }

-static inline void stop_timer(void) {
+static void stop_timer(void) {
   struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);

Also applies to: 997-1005

ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1)

13-57: Consider returning meaningful error codes

The function always returns 0 regardless of success or failure paths. Consider using the return value to indicate different error conditions.
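
A minimal sketch of what distinct return codes could look like (the enum and function names here are hypothetical, not the PR's API):

#include <cstdint>

// Hypothetical status codes for the dispatch path; names are illustrative.
enum backend_dispatch_status : uint32_t {
    DISPATCH_OK             = 0,
    DISPATCH_DECODE_FAILED  = 1,
    DISPATCH_COMPUTE_FAILED = 2,
};

// Sketch: propagate each failure path instead of returning 0 unconditionally.
static uint32_t dispatch_graph_compute(bool decoded_ok, bool compute_ok) {
    if (!decoded_ok) {
        return DISPATCH_DECODE_FAILED;   // deserialization error
    }
    if (!compute_ok) {
        return DISPATCH_COMPUTE_FAILED;  // backend graph_compute error
    }
    return DISPATCH_OK;
}

int main() {
    return (int) dispatch_graph_compute(true, true); // 0 on success
}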

ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (2)

44-54: Remove commented debug code

Please remove the commented-out debug code to keep the codebase clean.

   struct virtgpu *gpu = BUFFER_TO_GPU(buffer);
-#if 0
-  INFO("%s: data=%p, offset=%lu, size=%lu\n", __func__, data, offset, size);
-#endif
-#if 0
-  void **addr = (void **)(uintptr_t)data;
-  for (int i = 0; i <= 10; i++) {
-    INFO("%s: %p | %llx", __func__, addr, *addr);
-    addr++;
-  }
-  INFO("\n");
-#endif
   struct ggml_backend_remoting_buffer_context *context = BUFFER_TO_GGML_CONTEXT(buffer);

15-23: Add null check for context->base assignment

The function assigns to context->base without checking if the assignment was successful.

   if (context->base) {
     return context->base;
   }
 
   context->base = apir_buffer_get_base(BUFFER_TO_GPU(buffer),
 				       BUFFER_TO_APIR_CONTEXT(buffer));
+  if (!context->base) {
+    ERROR("Failed to get buffer base address");
+  }
 
   return context->base;
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1)

74-81: Device description buffer is not properly used.

The desc buffer is created but only contains static text. It should be populated with device-specific information.

Consider populating the description with device details:

-  char desc[256] = "API Remoting device";
+  char desc[256];
+  snprintf(desc, sizeof(desc), "API Remoting device %d", i);
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (3)

19-20: Define HAVE___BUILTIN_CLZLL for compiler optimization.

The macro HAVE___BUILTIN_CLZLL is not defined anywhere, causing the function to always use the slower fallback implementation instead of the built-in __builtin_clzll.

Consider using compiler detection:

-#if defined(HAVE___BUILTIN_CLZLL)
+#if defined(__GNUC__) || defined(__clang__)
    return ((sizeof(uint64_t) * 8 - 1) - __builtin_clzll(n | 1));

65-79: Consider iterative cleanup to avoid stack overflow.

The recursive cleanup in _util_sparse_array_node_finish could cause stack overflow for deeply nested sparse arrays.

Consider using an iterative approach with a stack/queue for large data structures to avoid potential stack overflow issues.
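
A minimal sketch of the iterative pattern, assuming a simplified node type (the real sparse-array nodes differ):

#include <vector>

// Hypothetical node type standing in for the sparse array's internal nodes;
// this only illustrates the iterative cleanup pattern.
struct node {
    std::vector<node*> children;
};

// Replacement for recursive cleanup: an explicit work list bounds memory by
// the number of live nodes instead of the call-stack depth.
static void free_tree(node *root) {
    std::vector<node*> work;
    if (root) work.push_back(root);
    while (!work.empty()) {
        node *n = work.back();
        work.pop_back();
        for (node *c : n->children) {
            if (c) work.push_back(c);
        }
        delete n; // children are already queued, so freeing n now is safe
    }
}

int main() {
    node *root = new node;
    root->children.push_back(new node);
    free_tree(root);
    return 0;
}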


188-195: Fix typo and consider removing debug code.

The function contains a typo ("happilly" should be "happily") and uses a global variable hack to avoid compiler warnings.

-  INFO("thks bye, stopping early and happilly :)");
+  INFO("thks bye, stopping early and happily :)");

Consider removing these debug functions or wrapping them in debug-only compilation flags.

ggml/src/ggml-remotingfrontend/virtgpu.cpp (1)

81-82: Make shared memory sizes configurable.

The shared memory sizes are hardcoded, which may not be suitable for all use cases.

Consider making these sizes configurable through environment variables or initialization parameters:

-  gpu->reply_shmem = virtgpu_shmem_create(gpu, 0x4000);
-  gpu->data_shmem = virtgpu_shmem_create(gpu, 0x1830000); // 24MiB
+  size_t reply_size = getenv("VIRTGPU_REPLY_SIZE") ? 
+                      atol(getenv("VIRTGPU_REPLY_SIZE")) : 0x4000;
+  size_t data_size = getenv("VIRTGPU_DATA_SIZE") ? 
+                     atol(getenv("VIRTGPU_DATA_SIZE")) : 0x1830000;
+  gpu->reply_shmem = virtgpu_shmem_create(gpu, reply_size);
+  gpu->data_shmem = virtgpu_shmem_create(gpu, data_size);
ggml/src/ggml-remotingfrontend/virtgpu.h (1)

1-1: Use traditional include guards instead of #pragma once.

While #pragma once is widely supported, traditional include guards are more portable and compatible with all compilers.

-#pragma once
+#ifndef GGML_VIRTGPU_H
+#define GGML_VIRTGPU_H

And add at the end of the file:

#endif // GGML_VIRTGPU_H
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (1)

44-44: Consider making INFO logging configurable.

Multiple functions contain INFO logging that may be too verbose for production use. Consider using a debug flag or log level to control these messages.

-  INFO("%s: Forward BUFT ALIGNMENT --> %zu ", __func__, alignment);
+  // INFO("%s: Forward BUFT ALIGNMENT --> %zu ", __func__, alignment);

Also applies to: 65-65, 86-86, 99-99

ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1)

48-69: Consider moving inline function implementations to a source file.

The WARNING and FATAL inline functions in the header may cause code bloat if included in many translation units. Consider moving the implementations to a source file.
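
A minimal sketch of the split, with a hypothetical virtgpu_warning function standing in for the header's WARNING/FATAL helpers (shown as one file for brevity; in practice the declaration lives in the header and the definition in the .cpp):

// virtgpu-utils.h (sketch): declare only, so every translation unit shares
// one out-of-line definition instead of inlining its own copy.
void virtgpu_warning(const char *fmt, ...);

// virtgpu-utils.cpp (sketch): the single definition.
#include <cstdarg>
#include <cstdio>

void virtgpu_warning(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::fprintf(stderr, "WARNING: ");
    std::vfprintf(stderr, fmt, args);
    std::fprintf(stderr, "\n");
    va_end(args);
}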

ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (1)

62-62: Remove unnecessary return statement.

The explicit return; at the end of a void function is unnecessary.

-  return;
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1)

54-74: Consider caching device description for consistency.

The device description function doesn't cache its result, unlike apir_device_get_name and apir_device_get_type. This inconsistency could lead to unnecessary repeated remote calls.

 const char *
 apir_device_get_description(struct virtgpu *gpu) {
+  static char *string = nullptr;
+  if (string) {
+    CACHED;
+    return string;
+  }
   struct vn_cs_encoder *encoder;
   struct vn_cs_decoder *decoder;

   REMOTE_CALL_PREPARE(gpu, encoder, APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION);

   REMOTE_CALL(gpu, encoder, decoder);

   const size_t string_size = vn_decode_array_size_unchecked(decoder);
-  char *string = (char *) vn_cs_decoder_alloc_array(decoder, sizeof(char), string_size);
+  string = (char *) vn_cs_decoder_alloc_array(decoder, sizeof(char), string_size);
   if (!string) {
     FATAL("%s: Could not allocate the device description buffer", __func__);
   }
   vn_decode_char_array(decoder, string, string_size);

   INFO("%s: Forward DEV DESCR --> %s", __func__, string);

   REMOTE_CALL_FINISH(gpu, encoder, decoder);

   return string;
 }
ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (2)

34-36: Simplify tensor pointer casting.

The double cast through uintptr_t is unnecessary and the comment about removing const is confusing since the result is assigned to a non-const pointer.

   ggml_tensor *tensor;
-  // safe to remove the const qualifier here
-  tensor = (ggml_tensor *) (uintptr_t) vn_decode_ggml_tensor(dec);
+  tensor = const_cast<ggml_tensor *>(vn_decode_ggml_tensor(dec));

98-98: Fix inconsistent indentation.

   void *shmem_data = ctx->iface.get_shmem_ptr(ctx->virgl_ctx, shmem_res_id);
-    if (!shmem_data) {
+  if (!shmem_data) {
     FATAL("Couldn't get the shmem addr from virgl :/");
   }
ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)

124-124: Fix error message formatting.

The error message has a typo at the end.

-  return "Unknown APIR_BACKEND_INITIALIZE error:/";
+  return "Unknown APIR_BACKEND_INITIALIZE error";
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1)

6-22: Consider reordering struct fields for natural alignment

The struct uses explicit padding, but you could minimize padding requirements by grouping fields by size. Consider ordering fields from largest to smallest alignment requirements (uint64_t fields first, then uint32_t/int32_t, then chars).

 struct rpc_tensor {
+  // 8-byte aligned fields
   uint64_t id;
-  uint32_t type;
   uint64_t buffer;
-  uint32_t ne[GGML_MAX_DIMS];
-  uint32_t nb[GGML_MAX_DIMS];
-  uint32_t op;
-  int32_t  op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];
-  int32_t  flags;
   uint64_t src[GGML_MAX_SRC];
   uint64_t view_src;
   uint64_t view_offs;
   uint64_t data;
+  
+  // 4-byte aligned fields
+  uint32_t type;
+  uint32_t ne[GGML_MAX_DIMS];
+  uint32_t nb[GGML_MAX_DIMS];
+  uint32_t op;
+  int32_t  op_params[GGML_MAX_OP_PARAMS / sizeof(int32_t)];
+  int32_t  flags;
+  
+  // 1-byte aligned fields
   char name[GGML_MAX_NAME];
-
-  char padding[4];
 };
ggml/src/ggml-remotingfrontend/ggml-remoting.h (1)

40-82: Consider using a proper logging framework

The debug/logging macros use printf directly. Consider using a logging framework or at least routing through a centralized logging function for better control over log levels and destinations.

Example approach:

// Define log levels
enum LogLevel { LOG_DEBUG, LOG_INFO, LOG_WARN, LOG_ERROR };

// Centralized logging function
void ggml_remoting_log(LogLevel level, const char* fmt, ...);

// Update macros to use centralized logging
#define NOT_IMPLEMENTED \
  ggml_remoting_log(LOG_WARN, "reached unimplemented function %s", __func__)
src/llama-kv-cache.cpp (1)

89-92: Consider using conditional logging instead of commented code.

The debug logging is commented out. Consider using a debug flag or log level to control this output.

-	/*
-        LLAMA_LOG_DEBUG("%s: layer %3d: n_embd_k_gqa = %d, n_embd_v_gqa = %d, dev = %s\n", __func__,
-                i, n_embd_k_gqa, n_embd_v_gqa, dev_name);
-	*/
+        if (LLAMA_LOG_LEVEL >= LLAMA_LOG_LEVEL_DEBUG) {
+            LLAMA_LOG_DEBUG("%s: layer %3d: n_embd_k_gqa = %d, n_embd_v_gqa = %d, dev = %s\n", __func__,
+                    i, n_embd_k_gqa, n_embd_v_gqa, dev_name);
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d67341d and 7748ae1.

📒 Files selected for processing (98)
  • .github/ISSUE_TEMPLATE/010-bug-compilation.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/011-bug-results.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/019-bug-misc.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/020-enhancement.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/030-research.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/040-refactor.yml (0 hunks)
  • .github/ISSUE_TEMPLATE/config.yml (0 hunks)
  • .github/actions/get-tag-name/action.yml (0 hunks)
  • .github/actions/windows-setup-cuda/action.yml (0 hunks)
  • .github/actions/windows-setup-curl/action.yml (0 hunks)
  • .github/labeler.yml (0 hunks)
  • .github/pull_request_template.md (0 hunks)
  • .github/workflows/bench.yml.disabled (0 hunks)
  • .github/workflows/build-linux-cross.yml (0 hunks)
  • .github/workflows/build.yml (0 hunks)
  • .github/workflows/close-issue.yml (0 hunks)
  • .github/workflows/docker.yml (0 hunks)
  • .github/workflows/editorconfig.yml (0 hunks)
  • .github/workflows/gguf-publish.yml (0 hunks)
  • .github/workflows/labeler.yml (0 hunks)
  • .github/workflows/python-check-requirements.yml (0 hunks)
  • .github/workflows/python-lint.yml (0 hunks)
  • .github/workflows/python-type-check.yml (0 hunks)
  • .github/workflows/release.yml (0 hunks)
  • .github/workflows/server.yml (0 hunks)
  • .github/workflows/winget.yml (0 hunks)
  • CMakePresets.json (1 hunks)
  • Makefile (2 hunks)
  • build-xcframework.sh (0 hunks)
  • build.backend.sh (1 hunks)
  • ggml/CMakeLists.txt (2 hunks)
  • ggml/include/ggml-metal.h (1 hunks)
  • ggml/include/ggml-remoting-frontend.h (1 hunks)
  • ggml/src/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-backend-reg.cpp (3 hunks)
  • ggml/src/ggml-metal/ggml-metal.m (4 hunks)
  • ggml/src/ggml-remotingbackend/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-convert.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-dispatched.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-internal.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend-utils.h (1 hunks)
  • ggml/src/ggml-remotingbackend/backend.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/api_remoting.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/apir_backend.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (1 hunks)
  • ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1 hunks)
  • ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/CMakeLists.txt (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-remoting-frontend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/ggml-remoting.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/include/venus_hw.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu-utils.h (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu.cpp (1 hunks)
  • ggml/src/ggml-remotingfrontend/virtgpu.h (1 hunks)
  • podman_compile.sh (1 hunks)
  • prepare.backend.sh (1 hunks)
  • prepare.remoting.sh (1 hunks)
  • prepare.sh (1 hunks)
  • prepare.vulkan.sh (1 hunks)
  • run.remoting.sh (1 hunks)
  • run.sh (1 hunks)
  • run.vulkan.sh (1 hunks)
  • src/llama-context.cpp (2 hunks)
  • src/llama-kv-cache-unified.cpp (1 hunks)
  • src/llama-kv-cache.cpp (1 hunks)
  • src/llama-model-loader.cpp (4 hunks)
  • src/llama-model.cpp (2 hunks)
  • src/llama-vocab.cpp (3 hunks)
  • tools/run/run.cpp (3 hunks)
💤 Files with no reviewable changes (27)
  • .github/ISSUE_TEMPLATE/config.yml
  • .github/pull_request_template.md
  • .github/ISSUE_TEMPLATE/011-bug-results.yml
  • .github/workflows/editorconfig.yml
  • .github/ISSUE_TEMPLATE/010-bug-compilation.yml
  • .github/ISSUE_TEMPLATE/019-bug-misc.yml
  • .github/workflows/python-lint.yml
  • .github/workflows/labeler.yml
  • .github/ISSUE_TEMPLATE/040-refactor.yml
  • .github/workflows/python-check-requirements.yml
  • .github/ISSUE_TEMPLATE/020-enhancement.yml
  • .github/workflows/winget.yml
  • .github/workflows/python-type-check.yml
  • .github/actions/get-tag-name/action.yml
  • .github/workflows/gguf-publish.yml
  • .github/workflows/close-issue.yml
  • .github/actions/windows-setup-curl/action.yml
  • .github/labeler.yml
  • .github/ISSUE_TEMPLATE/030-research.yml
  • .github/workflows/build-linux-cross.yml
  • .github/actions/windows-setup-cuda/action.yml
  • .github/workflows/bench.yml.disabled
  • .github/workflows/docker.yml
  • .github/workflows/server.yml
  • .github/workflows/build.yml
  • build-xcframework.sh
  • .github/workflows/release.yml
🧰 Additional context used
🧬 Code Graph Analysis (16)
ggml/include/ggml-remoting-frontend.h (1)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_frontend_reg (126-153)
  • ggml_backend_remoting_frontend_reg (126-126)
ggml/src/ggml-backend-reg.cpp (2)
ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (2)
  • ggml_backend_remoting_frontend_reg (126-153)
  • ggml_backend_remoting_frontend_reg (126-126)
ggml/include/ggml-remoting-frontend.h (1)
  • ggml_backend_remoting_frontend_reg (12-12)
src/llama-model.cpp (3)
src/llama-hparams.h (8)
  • il (171-171)
  • il (173-173)
  • il (175-175)
  • il (177-177)
  • il (180-180)
  • il (183-183)
  • il (193-193)
  • il (195-195)
src/llama-model.h (2)
  • il (396-396)
  • il (399-399)
ggml/src/ggml-backend-impl.h (15)
  • dev (139-139)
  • dev (142-142)
  • dev (145-145)
  • dev (148-148)
  • dev (151-151)
  • dev (154-154)
  • dev (157-157)
  • dev (160-160)
  • dev (163-163)
  • dev (166-166)
  • dev (169-169)
  • dev (173-173)
  • dev (176-176)
  • dev (177-177)
  • dev (178-178)
ggml/src/ggml-remotingbackend/backend-dispatched-metal.cpp (3)
ggml/src/ggml-remotingbackend/backend-internal.h (1)
  • ggml_backend_metal_get_device_context_fct (31-34)
ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  • enc (16-16)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (1)
  • ctx (67-67)
ggml/src/ggml-remotingfrontend/virtgpu-forward-backend.cpp (3)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (8)
  • vn_serialize_ggml_cgraph (140-145)
  • vn_serialize_ggml_cgraph (141-141)
  • vn_encode_virtgpu_shmem_res_id (128-131)
  • vn_encode_virtgpu_shmem_res_id (129-129)
  • vn_encode_cgraph_data (147-152)
  • vn_encode_cgraph_data (148-148)
  • vn_decode_ggml_status (121-124)
  • vn_decode_ggml_status (122-122)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  • WARNING (48-57)
  • WARNING (49-49)
  • FATAL (59-69)
  • FATAL (60-60)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (4)
  • vn_encode_size_t (387-392)
  • vn_encode_size_t (388-388)
  • vn_cs_new_encoder (37-46)
  • vn_cs_new_encoder (38-38)
ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (4)
ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  • enc (16-16)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  • vn_decode_ggml_buffer_type (75-82)
  • vn_decode_ggml_buffer_type (76-76)
  • vn_encode_ggml_buffer (98-102)
  • vn_encode_ggml_buffer (99-99)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (10)
  • vn_encode_array_size (274-278)
  • vn_encode_array_size (275-275)
  • vn_encode_char_array (459-464)
  • vn_encode_char_array (460-460)
  • vn_encode_size_t (387-392)
  • vn_encode_size_t (388-388)
  • vn_encode_bool_t (502-506)
  • vn_encode_bool_t (503-503)
  • vn_decode_size_t (394-400)
  • vn_decode_size_t (395-395)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (2)
  • track_backend_buffer (12-15)
  • track_backend_buffer (13-13)
ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (5)
ggml/src/ggml-remotingbackend/backend-dispatched.h (1)
  • enc (16-16)
ggml/src/ggml-remotingbackend/shared/apir_backend.h (5)
  • ctx (67-67)
  • start_timer (87-91)
  • start_timer (87-87)
  • stop_timer (93-100)
  • stop_timer (93-93)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (6)
  • vn_decode_virtgpu_shmem_res_id (133-136)
  • vn_decode_virtgpu_shmem_res_id (134-134)
  • vn_decode_ggml_cgraph (154-167)
  • vn_decode_ggml_cgraph (155-155)
  • vn_encode_ggml_status (116-119)
  • vn_encode_ggml_status (117-117)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • FATAL (59-69)
  • FATAL (60-60)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (4)
  • vn_decode_size_t (394-400)
  • vn_decode_size_t (395-395)
  • vn_cs_new_decoder (27-35)
  • vn_cs_new_decoder (28-28)
ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (2)
ggml/src/ggml-remotingfrontend/virtgpu-forward-metal.cpp (2)
  • apir_metal_get_device_context (3-19)
  • apir_metal_get_device_context (4-4)
ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)
  • apir_metal_get_device_context (50-50)
ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (4)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (8)
  • WARNING (48-57)
  • WARNING (49-49)
  • FATAL (59-69)
  • FATAL (60-60)
  • INFO (35-42)
  • INFO (36-36)
  • INFO (44-45)
  • INFO (45-45)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (2)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)
  • apir_device_buffer_from_ptr (22-24)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
  • apir_device_buffer_from_ptr (195-227)
  • apir_device_buffer_from_ptr (196-198)
tools/run/run.cpp (2)
include/llama.h (1)
  • llama_batch_get_one (913-915)
src/llama-batch.cpp (2)
  • llama_batch_get_one (576-588)
  • llama_batch_get_one (576-578)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (2)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (2)
  • FATAL (59-69)
  • FATAL (60-60)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (4)
  • add_tensor (80-94)
  • add_tensor (81-81)
  • serialize_tensor (17-46)
  • serialize_tensor (18-18)
ggml/src/ggml-remotingfrontend/virtgpu-utils.cpp (1)
ggml/src/ggml-remotingfrontend/virtgpu-utils.h (4)
  • INFO (35-42)
  • INFO (36-36)
  • INFO (44-45)
  • INFO (45-45)
ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (2)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (20)
  • vn_decode_int32_t (235-239)
  • vn_decode_int32_t (236-236)
  • vn_decode_array_size_unchecked (293-299)
  • vn_decode_array_size_unchecked (294-294)
  • vn_cs_decoder_alloc_array (493-498)
  • vn_cs_decoder_alloc_array (494-494)
  • vn_decode_char_array (466-476)
  • vn_decode_char_array (467-467)
  • vn_decode_uint32_t (348-352)
  • vn_decode_uint32_t (349-349)
  • vn_decode_size_t (394-400)
  • vn_decode_size_t (395-395)
  • vn_decode_bool_t (508-512)
  • vn_decode_bool_t (509-509)
  • vn_decode_apir_buffer_type_host_handle_t (522-526)
  • vn_decode_apir_buffer_type_host_handle_t (523-523)
  • vn_encode_size_t (387-392)
  • vn_encode_size_t (388-388)
  • vn_decode_apir_buffer_host_handle_t (536-540)
  • vn_decode_apir_buffer_host_handle_t (537-537)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  • vn_encode_ggml_tensor_inline (174-210)
  • vn_encode_ggml_tensor_inline (175-175)
  • vn_decode_apir_buffer_type_host_handle (84-91)
  • vn_decode_apir_buffer_type_host_handle (85-85)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer-type.cpp (2)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (2)
  • vn_encode_ggml_buffer_type (69-73)
  • vn_encode_ggml_buffer_type (70-70)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (14)
  • vn_decode_array_size_unchecked (293-299)
  • vn_decode_array_size_unchecked (294-294)
  • vn_cs_decoder_alloc_array (493-498)
  • vn_cs_decoder_alloc_array (494-494)
  • vn_decode_char_array (466-476)
  • vn_decode_char_array (467-467)
  • vn_decode_size_t (394-400)
  • vn_decode_size_t (395-395)
  • vn_decode_bool_t (508-512)
  • vn_decode_bool_t (509-509)
  • vn_encode_size_t (387-392)
  • vn_encode_size_t (388-388)
  • vn_decode_apir_buffer_host_handle_t (536-540)
  • vn_decode_apir_buffer_host_handle_t (537-537)
ggml/src/ggml-remotingfrontend/virtgpu-forward-buffer.cpp (4)
ggml/src/ggml-remotingbackend/shared/venus_cs.h (8)
  • vn_encode_apir_buffer_host_handle_t (530-534)
  • vn_encode_apir_buffer_host_handle_t (531-531)
  • vn_decode_uintptr_t (550-554)
  • vn_decode_uintptr_t (551-551)
  • vn_encode_size_t (387-392)
  • vn_encode_size_t (388-388)
  • vn_encode_uint8_t (150-154)
  • vn_encode_uint8_t (151-151)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (4)
  • vn_encode_ggml_tensor (39-44)
  • vn_encode_ggml_tensor (40-40)
  • vn_encode_virtgpu_shmem_res_id (128-131)
  • vn_encode_virtgpu_shmem_res_id (129-129)
ggml/src/ggml-remotingfrontend/virtgpu-shm.h (2)
  • virtgpu_shmem_create (32-32)
  • virtgpu_shmem_destroy (33-33)
ggml/src/ggml-remotingfrontend/virtgpu-shm.cpp (4)
  • virtgpu_shmem_create (79-111)
  • virtgpu_shmem_create (80-80)
  • virtgpu_shmem_destroy (71-77)
  • virtgpu_shmem_destroy (72-73)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.cpp (3)
ggml/src/ggml-remotingfrontend/venus_cs_ggml-rpc-front.cpp (4)
  • add_tensor (50-64)
  • add_tensor (51-51)
  • serialize_tensor (12-48)
  • serialize_tensor (13-13)
ggml/src/ggml-remotingbackend/shared/venus_cs_ggml-rpc.h (4)
  • add_tensor (36-36)
  • serialize_tensor (26-26)
  • deserialize_tensor (38-38)
  • create_node (40-43)
ggml/src/ggml-remotingbackend/venus_cs_ggml-rpc-back.cpp (4)
  • deserialize_tensor (33-68)
  • deserialize_tensor (34-34)
  • create_node (70-93)
  • create_node (71-74)
🪛 Shellcheck (0.10.0)
prepare.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

run.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

prepare.remoting.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

prepare.vulkan.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

prepare.backend.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

build.backend.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


[warning] 13-13: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (44)
run.vulkan.sh (1)

21-24: Prefix not quoted – word-splitting risk

$prefix ../build.vulkan/bin/llama-run … will misbehave if $prefix contains spaces (gdb case).
Wrap in eval or an array, e.g.:

if [[ -n "${prefix}" ]]; then
  eval "${prefix} ../build.vulkan/bin/llama-run --verbose ~/models/llama3.2 \"say nothing\""
else
  ../build.vulkan/bin/llama-run --verbose ~/models/llama3.2 "say nothing"
fi
ggml/src/CMakeLists.txt (1)

357-358: Verify CMake option names match the new back-ends

ggml_add_backend(RemotingFrontend) expects an option GGML_REMOTINGFRONTEND, and likewise for RemotingBackend.
Ensure these cache variables are declared (e.g. via option(GGML_REMOTINGFRONTEND …) in a parent CMake file) or the backend will silently be skipped.

ggml/CMakeLists.txt (1)

183-184: Options are declared but never consumed
GGML_REMOTING_FRONTEND / GGML_REMOTING_BACKEND are introduced here but nothing in the root CMake lists or src/ conditionally adds sources, include-dirs, or libs based on them.
Result: toggling the option has zero effect and silently misleads packagers.

Consider:

  1. Wrapping the add_subdirectory(src/ggml-remotingfrontend …) / backend equivalents with these flags.
  2. Failing fast if the option is ON but required dependencies (libdrm, virtgpu headers, etc.) are missing.
src/llama-context.cpp (1)

107-117: LGTM! Logging reduction aligns with PR objectives.

The commented logging block reduces noise during model initialization as intended by the PR. The informational output about context parameters (n_ctx, n_batch, etc.) can be verbose during startup, and this change improves the user experience by reducing clutter.

CMakePresets.json (1)

33-34: LGTM! New remoting presets follow established conventions.

The new remoting_frontend and remoting_backend presets are well-structured and consistent with existing hidden presets in the file. They properly enable the corresponding CMake cache variables (GGML_REMOTING_FRONTEND and GGML_REMOTING_BACKEND) for the new VirtGPU-based remoting functionality mentioned in the PR.

src/llama-vocab.cpp (1)

2001-2002: LGTM - Debug logging appropriately suppressed.

The commented debug log aligns with the PR's objective to reduce noise during model loading. This debug information about control tokens not marked as EOG is non-critical for normal operation.

ggml/src/ggml-remotingfrontend/ggml-remoting-frontend.cpp (1)

1-27: No changes needed for the WIP remoting frontend skeleton

This file is intentionally a placeholder in this “WIP” PR. It defines only the necessary includes, an extern declaration, and a mutex-bearing struct so that you can layer in the real remoting logic later.

• Before merging, please:
– Add stub functions or TODO comments in ggml-remoting-frontend.cpp to indicate where the core remoting API will be implemented.
– Ensure that you flesh out the extern’d ggml_backend_remoting_get_device_count() and any other callbacks referenced in ggml-remoting.h.

ggml/src/ggml-remotingbackend/CMakeLists.txt (2)

1-3: CMake version and policy settings are appropriate.

The minimum version 3.19 and CMP0114 policy are reasonable for a modern CMake configuration supporting the backend library functionality.


6-19: All referenced source and header files are present
Verified that every .cpp and .h file listed in ggml/src/ggml-remotingbackend/CMakeLists.txt (including those under shared/) exists in the repository. No missing files detected.

ggml/include/ggml-remoting-frontend.h (1)

1-16: Well-designed header interface.

The header follows GGML conventions with proper C++ compatibility, clean API design, and appropriate naming. The single registration function approach is consistent with other backend implementations.

ggml/src/ggml-backend-reg.cpp (3)

48-50: LGTM! Backend registration follows established patterns.

The conditional inclusion and registration of the remoting frontend backend is consistent with how other backends (CUDA, Metal, Vulkan, etc.) are integrated into the system.


184-186: LGTM! Backend registration in constructor follows the established pattern.

The registration call follows the same pattern as other backends and is properly guarded by the conditional compilation flag.


591-591: LGTM! Dynamic loading integration is correctly placed.

The addition of "remoting_frontend" to the dynamic loading function is in the appropriate location and follows the naming convention of other backends.

ggml/src/ggml-remotingfrontend/virtgpu-forward-impl.h (1)

11-34: Well-designed macros with proper error handling.

The remote call macros follow good practices:

  • Use do-while(0) pattern for safe expansion
  • Consistent error checking with descriptive messages
  • Clear separation of remote call lifecycle phases

The implementation provides a clean abstraction for remote procedure calls.
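
For readers unfamiliar with the idiom, here is a minimal, self-contained sketch of a do-while(0) call-checking macro in the same spirit (the macro and helper names are hypothetical, not the actual virtgpu-forward-impl.h definitions):

#include <cstdio>

// The do { ... } while (0) wrapper turns the multi-statement expansion into a
// single statement, so the macro nests safely inside un-braced if/else;
// #expr stringifies the call for a descriptive error message.
#define REMOTE_CALL_CHECK(expr)                                   \
  do {                                                            \
    int rc_ = (expr);                                             \
    if (rc_ != 0) {                                               \
      std::fprintf(stderr, "remote call failed: %s (rc=%d)\n",    \
                   #expr, rc_);                                   \
      return rc_;                                                 \
    }                                                             \
  } while (0)

static int fake_remote_call(int rc) { return rc; }  // stand-in for a real call

int run_calls(bool first) {
  if (first)
    REMOTE_CALL_CHECK(fake_remote_call(0));  // expands as one statement
  else
    REMOTE_CALL_CHECK(fake_remote_call(1));
  return 0;
}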

Makefile (2)

719-727: LGTM! Makefile changes follow established patterns.

The conditional compilation support for the remoting frontend and backend follows the same pattern used for other optional components like CUDA, Metal, and Vulkan. The preprocessor flags and object file additions are correctly implemented.


768-772: LGTM! Compilation rules are properly defined.

The explicit compilation rules for the remoting frontend and backend object files are correctly specified and use the standard CXXFLAGS, consistent with other C++ compilation targets in the Makefile.

podman_compile.sh (2)

22-24: Verify podman machine environment before using SSH.

The script assumes it's running in a podman machine environment on macOS without verification. This could fail on Linux or when podman machine isn't running.

Consider adding a check for the podman machine environment:

 POD_NAME=mac_ai_compiling
-podman machine ssh podman rm $POD_NAME --force
+if podman machine list --format "{{.Name}}" 2>/dev/null | grep -q .; then
+    podman machine ssh podman rm $POD_NAME --force 2>/dev/null || true
+else
+    podman rm $POD_NAME --force 2>/dev/null || true
+fi

1-1: Add shebang to specify the shell interpreter.

The script is missing a shebang line, which can cause portability issues.

Apply this diff to add the shebang:

+#!/bin/bash
 #! /bin/bash

Likely an incorrect or invalid review comment — the script already starts with #! /bin/bash (whitespace after #! is accepted by the kernel), so the suggestion would only duplicate the existing shebang.

ggml/src/ggml-remotingfrontend/virtgpu-shm.h (1)

24-24: list_head definition is present

The struct list_head is defined in virtgpu-utils.h, which is already included by virtgpu-shm.h. No additional includes are required.

ggml/src/ggml-remotingbackend/backend-dispatched-backend.cpp (1)

36-41: Remove incorrect undefined variable warning

The identifiers dev and bck are declared as extern globals in backend-internal.h and brought into scope by your includes. Compilation won’t fail due to undefined symbols here.

• In ggml/src/ggml-remotingbackend/backend-internal.h:

extern ggml_backend_dev_t dev;
extern ggml_backend_t     bck;

• These headers are included at the top of backend-dispatched-backend.cpp, so dev and bck are defined.

You can ignore the “undefined variables” suggestion—no change is needed.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingbackend/backend-dispatched-buffer-type.cpp (1)

1-81: Well-structured buffer type dispatch implementation

The functions follow a consistent pattern and properly handle encoding/decoding. The null check for the allocated buffer at line 76-78 is particularly good practice.

ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp (1)

20-198: Well-structured Metal operation support checking

The ggml_metal_supports_op function provides comprehensive coverage of GGML operations with appropriate device capability checks. The TODO comments clearly indicate areas for future optimization.

ggml/src/ggml-remotingfrontend/ggml-backend.cpp (3)

56-70: Incorrect assumption: function defined & default new throws

The function ggml_backend_remoting_frontend_reg() is declared in ggml/include/ggml-remoting-frontend.h and defined in ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp. Also, the default C++ new operator throws std::bad_alloc on failure rather than returning nullptr, so a null check after new ggml_backend{…} will never fire. If you need non-throwing allocation, use new (std::nothrow) and check for nullptr, or catch std::bad_alloc at the call site.

Likely an incorrect or invalid review comment.
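
For reference, a minimal sketch of the non-throwing allocation pattern mentioned above (backend_stub is a placeholder type, not the actual ggml_backend struct):

#include <cstdio>
#include <new>  // std::nothrow

struct backend_stub { int id; };  // placeholder for the real struct

backend_stub * make_backend() {
  // plain new throws std::bad_alloc on failure, so a null check after it
  // never fires; the nothrow form returns nullptr instead
  backend_stub * b = new (std::nothrow) backend_stub{42};
  if (b == nullptr) {
    std::fprintf(stderr, "allocation failed\n");
  }
  return b;
}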


11-15: Add nullptr check before delete operation.

The function directly deletes the backend pointer without checking if it's null, which could lead to undefined behavior.

Apply this fix:

 static void ggml_backend_remoting_free(ggml_backend_t backend) {
   IMPLEMENTED;

+  if (backend == nullptr) {
+    return;
+  }
   delete backend;
 }

Likely an incorrect or invalid review comment — delete on a null pointer is a well-defined no-op in C++, so the added check would be redundant.


19-31: All symbols are defined and included correctly.

  • DEV_TO_GPU is defined in ggml-remoting.h.
  • start_timer/stop_timer come from apir_backend.h (pulled in via virtgpu-forward.h → virtgpu.h).
  • apir_backend_graph_compute is declared in virtgpu-forward.h and implemented in virtgpu-forward-backend.cpp.

No missing includes or dependencies—no changes needed.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingfrontend/include/venus_hw.h (1)

1-75: Well-structured header file with clear documentation.

The Venus capability structure is properly defined with comprehensive comments explaining each field's purpose.

ggml/src/ggml-remotingbackend/backend.cpp (1)

22-44: Incorrect: dev and timer variables are defined via extern in included headers.

The deinit function includes

  • backend-internal.h (which declares extern ggml_backend_dev_t dev;)
  • shared/apir_backend.h (which declares extern struct timer_data graph_compute_timer;, get_tensor_timer;, and set_tensor_timer;)

Therefore, dev and the timer objects are properly declared and there’s no need to redeclare or pass them as parameters.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingbackend/backend-internal.h (1)

22-29: Ignore missing forward declaration of virgl_apir_context
In C++, the use of struct virgl_apir_context in the function parameter list acts as an implicit forward declaration (per the elaborated‐type‐specifier rules), and any translation unit that needs the full definition already includes shared/apir_backend.h. No changes are required here.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingfrontend/ggml-backend-reg.cpp (1)

88-90: Function get_metal_dev_context is correctly defined
The symbol is declared in ggml/src/ggml-remotingfrontend/ggml-metal-remoting.h (line 14) and implemented in ggml/src/ggml-remotingfrontend/ggml-metal-remoting.cpp. The original comment can be dismissed.

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingfrontend/ggml-backend-host-buffer-type.cpp (1)

37-37: No changes needed: correct usage of virtgpu_shmem_destroy
The call passes the inner struct virtgpu_shmem* via shmem->shmem, which matches the signature. Other callers (in virtgpu-forward-backend.cpp and virtgpu-forward-buffer.cpp) use the same pattern.

ggml/src/ggml-remotingfrontend/virtgpu-forward-device.cpp (1)

10-12: Remove duplicate UNUSED macro.

The ctx parameter is marked as UNUSED twice.

-  UNUSED(ctx);
   UNUSED(ctx);

Likely an incorrect or invalid review comment.

ggml/src/ggml-remotingbackend/backend-dispatched-device.cpp (1)

14-15: Variables reg and dev are properly defined as globals
Our search confirms:

  • ggml_backend_reg_t reg and ggml_backend_dev_t dev are defined in backend-dispatched.cpp
  • Both are declared extern in backend-internal.h and thus visible in backend-dispatched-device.cpp

No changes needed.

ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp (1)

163-185: No memory leak—buffer context is freed on destruction

The ggml_backend_remoting_buffer_from_ptr_interface sets .free_buffer = ggml_backend_remoting_buffer_free_buffer, and that implementation calls free(context) and nulls out buffer->context. The allocated context is properly deallocated when the buffer is destroyed.

ggml/src/ggml-remotingbackend/backend-dispatched.h (1)

90-121: Make dispatch table const for safety

The dispatch table should be const to prevent accidental modification at runtime.

-static const backend_dispatch_t apir_backend_dispatch_table[APIR_BACKEND_DISPATCH_TABLE_COUNT] = {
+static const backend_dispatch_t const apir_backend_dispatch_table[APIR_BACKEND_DISPATCH_TABLE_COUNT] = {

Likely an incorrect or invalid review comment — the table is already declared const, and the suggested diff adds a duplicate const that compilers reject.

ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h (1)

36-1409: LGTM!

The DRM UAPI definitions and structures are correctly implemented following the standard kernel interface.

ggml/src/ggml-remotingbackend/shared/venus_cs.h (1)

9-555: Consider the serialization framework design and implementation.

The serialization implementation is well-structured with proper bounds checking and error handling. The use of FATAL for unrecoverable errors is appropriate for a low-level serialization library.

ggml/src/ggml-remotingfrontend/virtgpu-forward.h (1)

1-51: LGTM!

The virtual GPU interface declarations are well-organized and follow consistent naming conventions. The function signatures appropriately handle device management, buffer operations, and backend functionality.

src/llama-kv-cache.cpp (5)

126-450: Cache operations are well-implemented.

The sequence management functions correctly handle both recurrent and transformer-like models with appropriate validation and state updates.


504-731: Slot finding algorithm is robust.

The implementation correctly handles different model types with proper validation and debug checks. The recurrent model handling with contiguous slot requirements is well-implemented.


770-905: Defragmentation implementation is sound.

The algorithm correctly identifies and fills holes in the cache with proper move tracking and bounds checking. The assertion at line 839 helps ensure algorithm correctness.


907-1281: Serialization implementation is comprehensive.

The state persistence functions correctly handle metadata and tensor data with proper validation during restoration. The transposed value tensor handling adds complexity but is implemented correctly.


1287-1382: KV cache view implementation provides good introspection.

The view functions correctly create snapshots of the cache state with proper memory management and type checking.

ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h (1)

1-237: GGML serialization utilities are well-designed.

The serialization approach with intermediate rpc_tensor representation and proper validation of tensor constraints provides a robust foundation for the remoting backend.
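
To show the intermediate-representation idea in isolation, a minimal sketch (the field names and layout are hypothetical, not the actual rpc_tensor):

#include <cstdint>
#include <cstring>

// A flat POD mirror of the tensor metadata the remote side needs: plain
// scalars only, so it can be copied byte-for-byte across the guest/host
// boundary without pointer fixups.
struct rpc_tensor_example {
    uint64_t id;        // stable handle instead of a raw pointer
    uint32_t type;      // ggml type enum, sent as a plain integer
    uint64_t ne[4];     // dimensions
    uint64_t nb[4];     // strides
    uint64_t data_off;  // offset into a shared buffer, not an address
};

// Bounds-checked serialization into a caller-provided buffer; returns the
// number of bytes written, or 0 if the buffer is too small.
size_t serialize_example(const rpc_tensor_example & t, void * buf, size_t cap) {
    if (cap < sizeof(t)) return 0;
    std::memcpy(buf, &t, sizeof(t));
    return sizeof(t);
}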

ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h (1)

1-276: VirtGPU DRM header is up-to-date

The local copy of virtgpu_drm.h matches the latest Linux v6.16-rc1 UAPI header (June 12, 2025): all newly added IOCTLs—RESOURCE_INFO, TRANSFER_FROM_HOST, TRANSFER_TO_HOST, WAIT, GET_CAPS, RESOURCE_CREATE_BLOB, and CONTEXT_INIT—are present. No further updates are needed.

Comment on lines 12 to 26
ICD_DIR=/Users/kevinpouget/.local/share/vulkan/icd.d

USE_WORK_MESA=1
if [[ "$USE_WORK_MESA" == 1 ]]; then
export VK_ICD_FILENAMES=$ICD_DIR/virtio_icd.aarch64.json
else
export VK_ICD_FILENAMES=$ICD_DIR/virtio_icd.good.aarch64.json
fi

🛠️ Refactor suggestion

Hard-coded absolute user path reduces portability

ICD_DIR=/Users/kevinpouget/... will break on every other machine.
Expose this as an env var with a sensible default:

ICD_DIR=${ICD_DIR:-"$HOME/.local/share/vulkan/icd.d"}

run.vulkan.sh Outdated
prefix=""
fi

rm -f /usr/lib64/libvulkan_virtio.so

⚠️ Potential issue

rm -f /usr/lib64/libvulkan_virtio.so is dangerous

Blindly deleting a system library requires sudo and can break other Vulkan apps.
Either guard it with a prompt, check privileges, or document why it is safe.


Comment on lines +1 to +8
cmake -S . -B ../build.remoting-frontend \
-DGGML_REMOTINGFRONTEND=ON \
-DGGML_CPU_ARM_ARCH=native \
-DGGML_NATIVE=OFF \
-DGGML_OPENMP=OFF \
-DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Debug \
"$@"

🛠️ Refactor suggestion

⚠️ Potential issue

Add shebang & fix CMake option typo / portability

  1. ShellCheck warns because the script lacks a shebang.
  2. CMake option appears to be GGML_REMOTING_FRONTEND (with underscore) elsewhere; current GGML_REMOTINGFRONTEND will be ignored.
  3. Hard-coding -DGGML_CPU_ARM_ARCH=native breaks x86 hosts.
+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . -B ../build.remoting-frontend \
-      -DGGML_REMOTINGFRONTEND=ON \
+      -DGGML_REMOTING_FRONTEND=ON \
+      ${CMAKE_ARM_OPT:--DGGML_CPU_ARM_ARCH=native} \
       -DGGML_NATIVE=OFF \
       -DGGML_OPENMP=OFF \
       -DLLAMA_CURL=OFF \
       -DCMAKE_BUILD_TYPE=Debug \
       "$@"

Consider auto-detecting host arch (uname -m) to decide whether to set the ARM flag.


}

void llama_model::print_info() const {
return;

🛠️ Refactor suggestion

Inconsistent approach to suppressing output - consider a more flexible solution.

Adding an early return completely disables the print_info method, which may break existing functionality that depends on this output. This approach is also inconsistent with the commented-out debug logs in the previous hunk.

Consider implementing a more flexible approach:

  • Add a parameter to control whether to print info
  • Use a global flag or environment variable to control output
  • Implement log level filtering instead of completely disabling the method
-void llama_model::print_info() const {
-    return;
+void llama_model::print_info(bool verbose = true) const {
+    if (!verbose) return;

Or with environment variable control:

-void llama_model::print_info() const {
-    return;
+void llama_model::print_info() const {
+    if (std::getenv("LLAMA_QUIET")) return;


@@ -0,0 +1,11 @@
cmake -S . -B ../build.remoting-backend \

🛠️ Refactor suggestion

Add a proper shebang and fail-fast guards

The script is missing a shebang, which breaks editor/tooling detection and can cause an “Exec format error” if executed directly.
Adding set -euo pipefail (or at least set -e) prevents silent build misconfigurations.

+#!/usr/bin/env bash
+set -euo pipefail
+
 cmake -S . -B ../build.remoting-backend \

Comment on lines +5163 to +6021
double ms = timer_total/1000000;
double itl = ms/timer_count;
double speed = 1/itl * 1000;

printf("METAL compute_graph: [%9.0f] ms for %lld invokations | ITL %.2f ms | throughput = %.2f t/s\n",ms, timer_count, itl, speed);

timer_start = 0;
timer_total = 1; // to avoid re-registering
timer_count = 0;
}

⚠️ Potential issue

Fix potential division by zero and improve timer state management.

The function has a division by zero risk and uses a hacky approach to prevent re-registration.

 static void show_timer(void) {
+  if (timer_count == 0) {
+    printf("METAL compute_graph: No invocations recorded\n");
+    return;
+  }
+
   double ms = timer_total/1000000;
   double itl = ms/timer_count;
   double speed = 1/itl * 1000;
 
   printf("METAL compute_graph: [%9.0f] ms for %lld invokations | ITL %.2f ms | throughput = %.2f t/s\n",ms, timer_count, itl, speed);
 
-  timer_start = 0;
-  timer_total = 1; // to avoid re-registering
-  timer_count = 0;
+  // Reset timer state
+  timer_start = 0;
+  timer_total = 0;
+  timer_count = 0;
 }

Comment on lines +6073 to +6950
GGML_BACKEND_API void
ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
bool *has_simdgroup_mm,
bool *has_simdgroup_reduction,
bool *use_bfloat) {
struct ggml_backend_metal_device_context *dev_ctx = dev->context ;

*use_bfloat = dev_ctx->use_bfloat;
*has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
*has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
}

⚠️ Potential issue

Add null pointer validation for safety.

The function should validate input pointers before dereferencing them to prevent potential crashes.

 GGML_BACKEND_API void
 ggml_backend_metal_get_device_context(ggml_backend_dev_t dev,
 				      bool *has_simdgroup_mm,
 				      bool *has_simdgroup_reduction,
 				      bool *use_bfloat) {
+  GGML_ASSERT(dev != NULL);
+  GGML_ASSERT(has_simdgroup_mm != NULL);
+  GGML_ASSERT(has_simdgroup_reduction != NULL);
+  GGML_ASSERT(use_bfloat != NULL);
+
   struct ggml_backend_metal_device_context *dev_ctx = dev->context ;
 
   *use_bfloat = dev_ctx->use_bfloat;
   *has_simdgroup_reduction = dev_ctx->has_simdgroup_reduction;
   *has_simdgroup_mm = dev_ctx->has_simdgroup_mm;
 }

Comment on lines +5154 to +6009
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;

timer_total += (timer_end - timer_start);
timer_count += 1;
}

⚠️ Potential issue

Fix CLOCK_MONOTONIC usage in stop_timer.

 static inline void stop_timer(void) {
   struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);
   long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
 
   timer_total += (timer_end - timer_start);
   timer_count += 1;
 }

Comment on lines +991 to +1011
static inline void start_timer(void) {
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
timer_start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

⚠️ Potential issue

Use CLOCK_MONOTONIC instead of CLOCK_REALTIME for elapsed time measurement

The comments indicate these functions should use CLOCK_MONOTONIC for elapsed time, but the code uses CLOCK_REALTIME. CLOCK_REALTIME can be adjusted by system time changes, making it unsuitable for measuring elapsed time.

Apply this diff to fix the clock type:

-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);

Also applies to: 998-1001


Comment on lines +987 to +1006
static long long timer_start = 0;
static long long timer_total = 0;
static long long timer_count = 0;


💡 Verification agent

🧩 Analysis chain

Add missing include for time.h

The code uses clock_gettime and struct timespec but doesn't include the required header.

Add this include at the top of the file:

 #include <signal.h>
+#include <time.h>
 
 #include <climits>

Let me verify if time.h is already included elsewhere:

Also applies to: 991-1005, 1006-1013


🏁 Script executed:

#!/bin/bash
# Check if time.h is included in the file or its includes
rg -A 5 -B 5 "include.*time\.h" tools/run/run.cpp

# Check system headers that might include time.h
rg -A 5 -B 5 "include.*<sys/time\.h>" tools/run/run.cpp

Length of output: 109


I want to confirm that clock_gettime and struct timespec are actually used and whether <ctime> is already included. Let’s check for these in tools/run/run.cpp:


🏁 Script executed:

#!/bin/bash
# Check for existing ctime include
rg -n "#include.*<ctime>" tools/run/run.cpp

# Check usage of clock_gettime
rg -n "clock_gettime" tools/run/run.cpp

# Check usage of struct timespec
rg -n "timespec" tools/run/run.cpp

Length of output: 338


Add missing <time.h> include for clock_gettime and timespec

To fix the build errors, insert the header in tools/run/run.cpp right after the existing <signal.h> include:

 #include <signal.h>
+#include <time.h>
 #include <climits>

This ensures clock_gettime and struct timespec are properly declared.


@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +24 to +26
if (!shmem_data) {
FATAL("Couldn't get the shmem addr from virgl :/");
}

🛠️ Refactor suggestion

Consider graceful error handling instead of FATAL

Using FATAL will abort the entire program. Consider returning an error status that can be handled by the caller.

   const void *shmem_data = ctx->iface.get_shmem_ptr(ctx->virgl_ctx, shmem_res_id);
   if (!shmem_data) {
-    FATAL("Couldn't get the shmem addr from virgl :/");
+    ERROR("Couldn't get the shmem addr from virgl");
+    ggml_status status = GGML_STATUS_ABORTED;
+    vn_encode_ggml_status(enc, &status);
+    stop_timer(&graph_compute_timer);
+    return 1; // or appropriate error code
   }


Comment on lines +26 to +36
static void ggml_backend_remoting_buffer_memset_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, uint8_t value, size_t offset, size_t size) {
NOT_IMPLEMENTED;

STOP_HERE;

UNUSED(buffer);
UNUSED(tensor);
UNUSED(value);
UNUSED(offset);
UNUSED(size);
}

⚠️ Potential issue

Unimplemented functions will crash the program

Functions marked with NOT_IMPLEMENTED followed by STOP_HERE will halt execution. This is problematic for production code.

Consider returning an error instead:

 static void ggml_backend_remoting_buffer_memset_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, uint8_t value, size_t offset, size_t size) {
   NOT_IMPLEMENTED;
-
-  STOP_HERE;
-
-  UNUSED(buffer);
-  UNUSED(tensor);
-  UNUSED(value);
-  UNUSED(offset);
-  UNUSED(size);
+  // Log error and handle gracefully
+  ERROR("memset_tensor not implemented for remoting buffer");
+  // Could potentially fall back to get_tensor/set_tensor implementation
 }

Would you like me to provide a complete implementation or open an issue to track this?

Also applies to: 109-119


Comment on lines +112 to +119
STOP_HERE;

return true;

UNUSED(buffer);
UNUSED(src);
UNUSED(dst);
}

⚠️ Potential issue

Remove unreachable code after STOP_HERE

The return statement and UNUSED macros after STOP_HERE are unreachable.

 static bool ggml_backend_remoting_buffer_cpy_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * src, ggml_tensor * dst) {
   NOT_IMPLEMENTED;
 
   STOP_HERE;
-
-  return true;
-
-  UNUSED(buffer);
-  UNUSED(src);
-  UNUSED(dst);
 }

Comment on lines 4 to 18
const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
static struct ggml_backend_metal_device_context metal_dev_ctx;
static bool has_metal_dev_ctx = false;

if (has_metal_dev_ctx) {
return &metal_dev_ctx;
}

struct virtgpu *gpu = DEV_TO_GPU(dev);

apir_metal_get_device_context(gpu, &metal_dev_ctx);

return &metal_dev_ctx;
}

⚠️ Potential issue

Critical bug: Metal device context is never cached

The static variable has_metal_dev_ctx is never set to true, causing the remote call to be made every time instead of using the cached value.

 const struct ggml_backend_metal_device_context *get_metal_dev_context(const ggml_backend_dev_t dev) {
   static struct ggml_backend_metal_device_context metal_dev_ctx;
   static bool has_metal_dev_ctx = false;
 
   if (has_metal_dev_ctx) {
     return &metal_dev_ctx;
   }
 
   struct virtgpu *gpu = DEV_TO_GPU(dev);
 
   apir_metal_get_device_context(gpu, &metal_dev_ctx);
+  has_metal_dev_ctx = true;
 
   return &metal_dev_ctx;
 }

Comment on lines 125 to 137
if (cmd_type > APIR_BACKEND_DISPATCH_TABLE_COUNT) {
ERROR("Received an invalid dispatch index (%d > %d)\n",
cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
return APIR_BACKEND_FORWARD_INDEX_INVALID;
}

⚠️ Potential issue

Fix the bounds check on cmd_type.

The comparison is off by one: an index equal to APIR_BACKEND_DISPATCH_TABLE_COUNT passes the > test yet is one past the end of the table. If cmd_type is a signed type, negative values should be rejected as well.

Apply this fix:

-  if (cmd_type > APIR_BACKEND_DISPATCH_TABLE_COUNT) {
+  if (cmd_type >= APIR_BACKEND_DISPATCH_TABLE_COUNT) {
     ERROR("Received an invalid dispatch index (%d > %d)\n",
           cmd_type, APIR_BACKEND_DISPATCH_TABLE_COUNT);
     return APIR_BACKEND_FORWARD_INDEX_INVALID;
   }


Comment on lines 88 to 95
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts); // Use CLOCK_MONOTONIC for elapsed time
timer->start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

⚠️ Potential issue

Use CLOCK_MONOTONIC for elapsed time measurement.

The comments indicate CLOCK_MONOTONIC should be used, but the code uses CLOCK_REALTIME. CLOCK_MONOTONIC is preferred for measuring elapsed time as it's not affected by system time adjustments.

 static inline void start_timer(struct timer_data *timer) {
   struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);
   timer->start = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
 }

 static inline void stop_timer(struct timer_data *timer) {
   struct timespec ts;
-  clock_gettime(CLOCK_REALTIME, &ts);  // Use CLOCK_MONOTONIC for elapsed time
+  clock_gettime(CLOCK_MONOTONIC, &ts);
   long long timer_end = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;

Also applies to: 94-96
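
In C++ call sites, std::chrono::steady_clock gives the same monotonic guarantee with less ceremony; a minimal sketch:

#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;  // monotonic by specification
    const auto t0 = clock::now();
    // ... workload under measurement ...
    const auto t1 = clock::now();
    const long long ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::printf("elapsed: %lld ns\n", ns);
    return 0;
}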


if (code == code_name) return #code_name; \
} while (0) \

APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESSS);

⚠️ Potential issue

Update macro reference to use corrected constant name.

This needs to be updated to match the corrected macro name.

-  APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESSS);
+  APIR_BACKEND_INITIALIZE_ERROR(APIR_BACKEND_INITIALIZE_SUCCESS);

@@ -0,0 +1,127 @@
#pragma once

#define APIR_BACKEND_INITIALIZE_SUCCESSS 0

⚠️ Potential issue

Fix typo in macro name.

The macro name has an extra 'S' at the end.

-#define APIR_BACKEND_INITIALIZE_SUCCESSS 0
+#define APIR_BACKEND_INITIALIZE_SUCCESS 0

Comment on lines +45 to +65
static bool
ggml_backend_remoting_device_supports_op(ggml_backend_dev_t dev, const ggml_tensor * op) {
#if USE_ALWAYS_TRUE_SUPPORTS_OP == 1
/* ggml-rpc cheats it like this */
/* with the current implementation of serialize_tensor, the src/view aren't properly passed */
UNUSED(dev);
UNUSED(op);

return true;
#elif USE_METAL_GUEST_SUPPORTS_OP == 1
UNUSED(dev);

struct ggml_backend_remoting_device_context *device_ctx = GET_DEVICE_CONTEXT();

return ggml_metal_supports_op(device_ctx->metal_dev_ctx, op);
#else
struct virtgpu *gpu = DEV_TO_GPU(dev);

return apir_device_supports_op(gpu, op);
#endif
}

⚠️ Potential issue

Address incomplete tensor serialization

The comment indicates that serialize_tensor doesn't properly pass src/view information, which could lead to incorrect operation support queries. This is currently worked around by compile-time flags, but the underlying issue should be fixed.

Would you like me to open an issue to track fixing the tensor serialization to properly include src/view information?


Comment on lines +129 to +144
ggml_backend_buffer_type_t
ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev) {
IMPLEMENTED_ONCE;

struct virtgpu *gpu = DEV_TO_GPU(dev);

apir_buffer_type_host_handle_t ctx = apir_device_get_buffer_type(gpu);

static struct ggml_backend_buffer_type buft {
/* .iface = */ ggml_backend_remoting_buffer_type_interface,
/* .device = */ dev,
/* .context = */ (void *) ctx,
};

return &buft;
}

💡 Verification agent

🧩 Analysis chain

Review static buffer type usage for thread safety

Multiple functions use static buffer type structs. This could be problematic if:

  1. Multiple devices need different buffer types
  2. The context needs to be device-specific

If these buffer types are meant to be singletons, consider documenting this design decision. Otherwise, allocate them dynamically per device.

Also applies to: 146-161, 187-198


🏁 Script executed:

#!/bin/bash
# Description: Check if multiple devices are supported and how buffer types are used

# Search for device enumeration or multiple device support
rg -A 5 'device.*count|get.*device.*count|multiple.*device'

# Check how buffer types are used across the codebase
ast-grep --pattern 'ggml_backend_buffer_type'

Length of output: 89479


Separate buffer types per remoting device

The local static buft in ggml-remotingfrontend/ggml-backend-device.cpp only initializes once (on first call), binding the buffer type to the first dev/ctx. All subsequent devices will erroneously share that same buft. To support multiple remoting devices, you must create one buffer-type struct per device:

• File: ggml/src/ggml-remotingfrontend/ggml-backend-device.cpp, function ggml_backend_remoting_device_get_buffer_type (lines 129–144)
Replace:

  static struct ggml_backend_buffer_type buft {
      .iface   = ggml_backend_remoting_buffer_type_interface,
      .device  = dev,
      .context = (void*)ctx,
  };
  return &buft;
With either:

  • A static array sized by apir_device_get_count(gpu), indexed by dev.
  • Dynamically allocating a new ggml_backend_buffer_type per dev (e.g. new ggml_backend_buffer_type{…}).

Ensure each remoting device has its own ggml_backend_buffer_type instance so that .device and .context reflect the correct device.
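
A minimal sketch of the dynamic variant, assuming the struct ggml_backend_buffer_type layout from ggml-backend-impl.h (the cache and helper are illustrative, not existing code):

#include <mutex>
#include <unordered_map>

#include "ggml-backend-impl.h"

// One lazily created buffer-type object per device, so .device and .context
// always describe the caller's device rather than the first one seen.
static ggml_backend_buffer_type_t
remoting_buffer_type_for_dev(ggml_backend_dev_t dev, void * ctx) {
    static std::unordered_map<ggml_backend_dev_t, ggml_backend_buffer_type> cache;
    static std::mutex mtx;

    std::lock_guard<std::mutex> lock(mtx);
    auto it = cache.find(dev);
    if (it == cache.end()) {
        it = cache.emplace(dev, ggml_backend_buffer_type {
            /* .iface   = */ ggml_backend_remoting_buffer_type_interface,
            /* .device  = */ dev,
            /* .context = */ ctx,
        }).first;
    }
    return &it->second;  // node-based container: the address stays stable
}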


@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +21 to +34
static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
IMPLEMENTED_ONCE;
struct virtgpu *gpu = DEV_TO_GPU(dev);

static enum ggml_backend_dev_type type;
static bool has_type = false;
if (!has_type) {
has_type = true;
type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
}

return type;
}

⚠️ Potential issue

Fix race condition in device type caching

The static variables type and has_type are accessed without synchronization, creating a race condition if multiple threads call this function simultaneously.

Consider using std::once_flag with std::call_once for thread-safe initialization:

 static enum ggml_backend_dev_type
 ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
   IMPLEMENTED_ONCE;
   struct virtgpu *gpu = DEV_TO_GPU(dev);
 
   static enum ggml_backend_dev_type type;
-  static bool has_type = false;
-  if (!has_type) {
-    has_type = true;
+  static std::once_flag type_flag;
+  std::call_once(type_flag, [&gpu, &type]() {
     type = (enum ggml_backend_dev_type) apir_device_get_type(gpu);
-  }
+  });
 
   return type;
 }

Don't forget to include <mutex> at the top of the file.
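
An equivalent, shorter fix relies on C++11 “magic statics”: initialization of a function-local static is guaranteed to run exactly once, even under concurrent calls. A sketch, with the same caveat as the original code that the device type is assumed identical across devices:

static enum ggml_backend_dev_type
ggml_backend_remoting_device_get_type(ggml_backend_dev_t dev) {
  IMPLEMENTED_ONCE;
  struct virtgpu *gpu = DEV_TO_GPU(dev);

  // C++11 guarantees thread-safe one-time initialization of local statics,
  // so the remote query runs exactly once, with no explicit flag or mutex.
  static const enum ggml_backend_dev_type type =
      (enum ggml_backend_dev_type) apir_device_get_type(gpu);

  return type;
}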


Comment on lines +52 to +88
static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
switch (type) {
/* device */
case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";

/* buffer-type */
case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";

/* buffer */
case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";

/* backend */
case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";

/* metal */
case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";

default: return "unknown";
}
}

⚠️ Potential issue

Fix function name mismatches in command name mapping

Several command names don't match the actual function names, which could cause confusion during debugging.

 static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
 {
   switch (type) {
   /* device */
-  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
-  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
-  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
+  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_reg_get_device_count";
+  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_device_get_name";
+  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_device_get_description";
   case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
-  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
+  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_device_get_memory";
   case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
-  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
-  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
-  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";
+  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_device_get_buffer_type";
+  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_device_get_props";
+  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_device_buffer_from_ptr";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:

static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
  switch (type) {
  /* device */
  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_get_device_count";
  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_get_device_name";
  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_get_device_description";
  case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_get_device_memory";
  case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_get_buffer_type";
  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_get_props";
  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_buffer_from_ptr";
  /* buffer-type */
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";
  /* buffer */
  case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
  case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
  case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
  case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
  case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";
  /* backend */
  case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";
  /* metal */
  case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";
  default: return "unknown";
  }
}

Suggested:

static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
  switch (type) {
  /* device */
  case APIR_COMMAND_TYPE_DEVICE_GET_COUNT: return "backend_reg_get_device_count";
  case APIR_COMMAND_TYPE_DEVICE_GET_NAME: return "backend_device_get_name";
  case APIR_COMMAND_TYPE_DEVICE_GET_DESCRIPTION: return "backend_device_get_description";
  case APIR_COMMAND_TYPE_DEVICE_GET_TYPE: return "backend_device_get_type";
  case APIR_COMMAND_TYPE_DEVICE_GET_MEMORY: return "backend_device_get_memory";
  case APIR_COMMAND_TYPE_DEVICE_SUPPORTS_OP: return "backend_device_supports_op";
  case APIR_COMMAND_TYPE_DEVICE_GET_BUFFER_TYPE: return "backend_device_get_buffer_type";
  case APIR_COMMAND_TYPE_DEVICE_GET_PROPS: return "backend_device_get_props";
  case APIR_COMMAND_TYPE_DEVICE_BUFFER_FROM_PTR: return "backend_device_buffer_from_ptr";
  /* buffer-type */
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_NAME: return "backend_buffer_type_get_name";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_ALIGNMENT: return "backend_buffer_type_get_alignment";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_GET_MAX_SIZE: return "backend_buffer_type_get_max_size";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_IS_HOST: return "backend_buffer_type_is_host";
  case APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER: return "backend_buffer_type_alloc_buffer";
  /* buffer */
  case APIR_COMMAND_TYPE_BUFFER_GET_BASE: return "backend_buffer_get_base";
  case APIR_COMMAND_TYPE_BUFFER_SET_TENSOR: return "backend_buffer_set_tensor";
  case APIR_COMMAND_TYPE_BUFFER_GET_TENSOR: return "backend_buffer_get_tensor";
  case APIR_COMMAND_TYPE_BUFFER_CLEAR: return "backend_buffer_clear";
  case APIR_COMMAND_TYPE_BUFFER_FREE_BUFFER: return "backend_buffer_free_buffer";
  /* backend */
  case APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE: return "backend_graph_compute";
  /* metal */
  case APIR_COMMAND_TYPE_METAL_GET_DEVICE_CONTEXT: return "metal_get_device_context";
  default: return "unknown";
  }
}
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/backend-dispatched.h between lines 52 and 88,
some returned command name strings do not match their actual function names,
causing potential confusion. Review each case in the switch statement and update
the returned string to exactly match the corresponding function name for that
command type, ensuring consistency and clarity in debugging output.
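
A common way to prevent this kind of drift is to generate both the case labels and the debug strings from a single X-macro table, so a renamed handler and its label can never diverge. A minimal sketch, assuming the ApirBackendCommandType enum from backend-dispatched.h; the APIR_COMMAND_NAME_TABLE macro and the entries shown are illustrative, not part of the PR:

// Sketch only: one table drives every case label and its debug name.
#define APIR_COMMAND_NAME_TABLE(X) \
  X(APIR_COMMAND_TYPE_DEVICE_GET_COUNT, "backend_reg_get_device_count") \
  X(APIR_COMMAND_TYPE_DEVICE_GET_NAME, "backend_device_get_name") \
  X(APIR_COMMAND_TYPE_BUFFER_GET_BASE, "backend_buffer_get_base") \
  X(APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE, "backend_graph_compute")
  /* ...one X(...) entry per command... */

static inline const char *backend_dispatch_command_name(ApirBackendCommandType type)
{
  switch (type) {
#define APIR_NAME_CASE(cmd, name) case cmd: return name;
  APIR_COMMAND_NAME_TABLE(APIR_NAME_CASE)
#undef APIR_NAME_CASE
  default: return "unknown";
  }
}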

Comment on lines 116 to 120
ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);

⚠️ Potential issue

Remove duplicate function declaration

The function ggml_backend_remoting_device_init is declared twice.

 ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
 ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
 ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
 ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
-ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
📝 Committable suggestion


Suggested change

Original:

ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);

Suggested:

ggml_backend_dev_t ggml_backend_remoting_get_device(size_t device);
ggml_backend_buffer_type_t ggml_backend_remoting_host_buffer_type();
ggml_backend_t ggml_backend_remoting_device_init(ggml_backend_dev_t dev, const char * params);
ggml_backend_buffer_type_t ggml_backend_remoting_device_get_buffer_type(ggml_backend_dev_t dev);
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-remoting.h around lines 116 to 120, the
function ggml_backend_remoting_device_init is declared twice. Remove the
duplicate declaration so that the function is declared only once to avoid
redundancy and potential confusion.

Comment on lines +56 to +61
#define STOP_HERE \
thks_bye()

#define BREAKPOINT \
breakpoint()


⚠️ Potential issue

Fix undefined function references in debug macros

The macros STOP_HERE and BREAKPOINT reference undefined functions that will cause link errors.

Either:

  1. Define these functions or include the appropriate header
  2. Replace with standard debugging approaches:
 #define STOP_HERE \
-  thks_bye()
+  __builtin_trap()  // or assert(false)

 #define BREAKPOINT \
-  breakpoint()
+  __builtin_debugtrap()  // or raise(SIGTRAP) on POSIX
📝 Committable suggestion


Suggested change

Original:

#define STOP_HERE \
  thks_bye()

#define BREAKPOINT \
  breakpoint()

Suggested:

#define STOP_HERE \
  __builtin_trap()  // or assert(false)

#define BREAKPOINT \
  __builtin_debugtrap()  // or raise(SIGTRAP) on POSIX
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/ggml-remoting.h around lines 56 to 61, the
macros STOP_HERE and BREAKPOINT reference undefined functions thks_bye() and
breakpoint(), causing link errors. To fix this, either define these functions or
include the headers where they are defined. Alternatively, replace these macros
with standard debugging methods such as calling abort() for STOP_HERE and using
compiler-specific breakpoint intrinsics or inline assembly for BREAKPOINT.
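
As a concrete illustration, a minimal sketch of portable definitions along those lines; the clang check and the SIGTRAP fallback are assumptions about the target toolchains, not what the PR ships:

#include <csignal>  // raise, SIGTRAP
#include <cstdlib>  // abort

// Sketch: halt unconditionally (abort() produces a core dump where enabled).
#define STOP_HERE \
  abort()

// Sketch: drop into an attached debugger without tearing the process down.
#if defined(__clang__)
#define BREAKPOINT __builtin_debugtrap()  // clang intrinsic
#else
#define BREAKPOINT raise(SIGTRAP)         // POSIX fallback for gcc and others
#endif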

Comment on lines +1 to +35
/*
* Header for the Direct Rendering Manager
*
* Author: Rickard E. (Rik) Faith <[email protected]>
*
* Acknowledgments:
* Dec 1999, Richard Henderson <[email protected]>, move to generic cmpxchg.
*/

/*
* Copyright 1999 Precision Insight, Inc., Cedar Park, Texas.
* Copyright 2000 VA Linux Systems, Inc., Sunnyvale, California.
* All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice (including the next
* paragraph) shall be included in all copies or substantial portions of the
* Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/

#ifndef _DRM_H_

💡 Verification agent

🧩 Analysis chain

Verify the source and version of this DRM UAPI header.

This appears to be a copy of the Linux kernel's DRM UAPI header. Consider documenting the source kernel version for future maintenance and updates.


🏁 Script executed:

#!/bin/bash
# Description: Check if this is a verbatim copy from the Linux kernel

# Search for similar files in the repository
fd -e h drm.h

# Check if there are any modifications from the standard kernel header
# by searching for common DRM defines
rg -A 5 "DRM_NAME.*drm.*Name in kernel"

Length of output: 820


Document upstream Linux kernel version for DRM UAPI header

The file ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h appears to be a direct copy of the Linux kernel’s DRM UAPI header (e.g. it defines DRM_NAME, DRM_MIN_ORDER, etc.). To simplify future syncs and maintenance, please add a top‐of‐file comment indicating the exact kernel version (or commit/patch level) this was taken from, along with a link or reference to the corresponding Linux kernel source location.

Suggested update:

  • File: ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h
  • At the very top, insert something like:
    /*
     * Upstream source: Linux kernel v5.15-rc2 (commit abcdef1234567890)
     * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/drm/drm.h
     */
  • Verify that any local changes are documented or reconsidered in future merges.
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/include/drm-uapi/drm.h at the very top before
any existing comments, add a new comment block specifying the exact Linux kernel
version or commit hash from which this DRM UAPI header was copied, along with a
URL to the corresponding upstream source in the Linux kernel repository. This
will help future maintainers track the origin and simplify syncing updates.
Also, review and document any local modifications compared to the upstream
version.

Comment on lines +6 to +8
// needs UNUSED to be defined
// needs FATAL to be defined


🛠️ Refactor suggestion

Document the required macro dependencies.

The file requires UNUSED and FATAL macros to be defined but doesn't include a header that provides them. Consider either including the required header or documenting where these should come from.

 
 // needs UNUSED to be defined
 // needs FATAL to be defined
+// Example: These macros should be defined in a common header like "common.h" or "utils.h"
+// #define UNUSED(x) ((void)(x))
+// #define FATAL(...) do { fprintf(stderr, __VA_ARGS__); abort(); } while(0)
 
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs.h around lines 6 to 8, the file
uses the macros UNUSED and FATAL without defining them or including a header
that does. Fix this by either including the appropriate header file that defines
these macros or adding comments to document that these macros must be defined
before including this file.
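
For illustration, a minimal sketch of the two macros such a prerequisite header could define; the exact definitions used elsewhere in the PR are an assumption:

#include <cstdio>   // fprintf
#include <cstdlib>  // abort

// Sketch: silence unused-parameter warnings without compiler attributes.
#define UNUSED(x) ((void)(x))

// Sketch: print a message and terminate; ##__VA_ARGS__ (a GNU extension
// accepted by gcc and clang) lets FATAL("msg") work with no extra arguments.
#define FATAL(fmt, ...) \
  do { \
    fprintf(stderr, "FATAL: " fmt "\n", ##__VA_ARGS__); \
    abort(); \
  } while (0)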

Comment on lines +514 to +526
/* apir_buffer_type_host_handle_t */

static inline void
vn_encode_apir_buffer_type_host_handle_t(struct vn_cs_encoder *enc, const apir_buffer_type_host_handle_t *val)
{
vn_encode(enc, sizeof(apir_buffer_type_host_handle_t), val, sizeof(apir_buffer_type_host_handle_t));
}

static inline void
vn_decode_apir_buffer_type_host_handle_t(struct vn_cs_decoder *dec, apir_buffer_type_host_handle_t *val)
{
vn_decode(dec, sizeof(apir_buffer_type_host_handle_t), val, sizeof(apir_buffer_type_host_handle_t));
}

⚠️ Potential issue

Missing type definitions for apir_buffer types.

The functions use the types apir_buffer_type_host_handle_t and apir_buffer_host_handle_t, which are not defined in this header.

Include the header that defines these types or add forward declarations:

+// Forward declarations or include the header that defines these
+// #include "apir_backend.h"
+typedef struct apir_buffer_type_host_handle_t apir_buffer_type_host_handle_t;
+typedef struct apir_buffer_host_handle_t apir_buffer_host_handle_t;
+
 /* apir_buffer_type_host_handle_t */
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs.h around lines 514 to 526, the
types apir_buffer_type_host_handle_t and apir_buffer_host_handle_t used in the
encode and decode functions are not defined in this header. To fix this, include
the appropriate header file where these types are defined or add forward
declarations for these types at the top of this header file to ensure they are
recognized during compilation.
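
Note that the struct forward declarations suggested above would not compile here: vn_encode and vn_decode take sizeof(...) of the handle type, which requires a complete type. If the handles are opaque integer values, as the byte-wise copy implies, plain integer aliases are what is needed; a sketch assuming uintptr_t-sized handles (the authoritative definitions live in the PR's shared apir headers):

#include <cstdint>

// Sketch: complete integer types make sizeof(...) in the encode/decode
// helpers well-defined; an incomplete struct typedef would not.
typedef uintptr_t apir_buffer_type_host_handle_t;  // host-side pointer value
typedef uintptr_t apir_buffer_host_handle_t;       // host-side pointer value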


Comment on line 100

static inline void
vn_encode_ggml_buffer(struct vn_cs_encoder *enc, const ggml_backend_buffer_t buffer) {
apir_buffer_host_handle_t handle = BUFFER_TO_HOST_HANDLE(buffer);

⚠️ Potential issue

Missing definition for BUFFER_TO_HOST_HANDLE macro.

The macro BUFFER_TO_HOST_HANDLE is used but not defined in this header.

Add the macro definition or include the header that defines it:

+// Include the header that defines BUFFER_TO_HOST_HANDLE or define it here
+// #define BUFFER_TO_HOST_HANDLE(buffer) ggml_buffer_to_apir_handle(buffer)
+
 static inline void
 vn_encode_ggml_buffer(struct vn_cs_encoder *enc, const ggml_backend_buffer_t buffer) {
   apir_buffer_host_handle_t handle = BUFFER_TO_HOST_HANDLE(buffer);
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h at line 100, the macro
BUFFER_TO_HOST_HANDLE is used but not defined or included. To fix this, either
add the definition of the BUFFER_TO_HOST_HANDLE macro in this header or include
the appropriate header file where it is defined, ensuring the macro is available
before its usage.
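
For context, one plausible shape for that macro, assuming the frontend keeps the host-side handle in the buffer's context field; the context struct and field name here are hypothetical:

// Sketch: the remoting frontend stores the host handle it received from the
// backend in the ggml buffer's context pointer; the macro fetches it back.
struct ggml_backend_remoting_buffer_context {
  apir_buffer_host_handle_t host_handle;  // hypothetical field name
};

#define BUFFER_TO_HOST_HANDLE(buffer) \
  (((struct ggml_backend_remoting_buffer_context *)(buffer)->context)->host_handle)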


Comment on line 217

// it safe to remove the `const` qualifier here, we *do* want to
// modify the shared memory data to fix the `src` pointers.
ggml_tensor *tensor = (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));

🛠️ Refactor suggestion

Document the safety of the const cast.

The const qualifier is removed to modify shared memory data. While this appears intentional for fixing src pointers, it should be documented.

 
-  // it safe to remove the `const` qualifier here, we *do* want to
+  // it is safe to remove the `const` qualifier here, we *do* want to
   // modify the shared memory data to fix the `src` pointers.
+  // This is required because the deserialized tensor pointers need to be
+  // updated to point to the correct locations in the shared memory.
   ggml_tensor *tensor = (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In ggml/src/ggml-remotingbackend/shared/venus_cs_ggml.h at line 217, the code
removes the const qualifier by casting to a non-const pointer to modify shared
memory data. Add a comment explaining that this const cast is intentional and
safe because the modification is required to fix src pointers in shared memory,
clarifying the rationale to future readers and maintainers.
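
To make the pattern concrete, a sketch of the zero-copy decode this comment is about: the tensor bytes stay in the shared-memory stream, and only the embedded src pointers, which are meaningless in the receiver's address space, get rewritten. vn_cs_decoder_use_inplace matches the PR; lookup_decoded_tensor is a hypothetical helper standing in for however the PR maps sender-side pointers to decoded tensors:

// Sketch: decode a ggml_tensor in place inside the shared-memory window.
static ggml_tensor *decode_tensor_inplace(struct vn_cs_decoder *dec) {
  // Dropping const is deliberate: the shared page is writable, and the
  // serialized src pointers must be patched to receiver-side addresses.
  ggml_tensor *tensor =
    (ggml_tensor *)(uintptr_t) vn_cs_decoder_use_inplace(dec, sizeof(ggml_tensor));

  for (int i = 0; i < GGML_MAX_SRC; i++) {
    if (tensor->src[i] == NULL) {
      continue;
    }
    tensor->src[i] = lookup_decoded_tensor(tensor->src[i]);  // hypothetical
  }
  return tensor;
}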

Comment on line 27

#ifndef VIRTGPU_DRM_H
#define VIRTGPU_DRM_H

#include "drm.h"

💡 Verification agent

🧩 Analysis chain

Verify the include path for drm.h

The include uses quotes, which suggests a local header. Since this file lives in drm-uapi/, the include should either be relative or use the full path from the include directory.


🏁 Script executed:

#!/bin/bash
# Description: Check if drm.h exists in the expected location and how it's included elsewhere

# Find drm.h in the repository
fd -t f "drm.h$"

# Check how other files include drm headers
rg -A 2 '#include.*drm\.h'

Length of output: 760


Fix the include path in virtgpu_drm.h

The header currently uses a local include ("drm.h"), but to match the project’s include-directory layout and how other headers reference drm-uapi files, it should use the full relative path.

• File: ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h
Change:

-#include "drm.h"
+#include "drm-uapi/drm.h"
📝 Committable suggestion


Suggested change

Original:

#include "drm.h"

Suggested:

#include "drm-uapi/drm.h"
🤖 Prompt for AI Agents
In ggml/src/ggml-remotingfrontend/include/drm-uapi/virtgpu_drm.h at line 27,
change the include directive from a local include "drm.h" to use the full
relative path that matches the project's include-directory layout and other
drm-uapi headers. Replace #include "drm.h" with the appropriate relative path
include, such as #include "drm-uapi/drm.h".


openshift-ci bot commented Jul 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kpouget pushed a commit to kpouget/crc-llama.cpp that referenced this pull request Aug 27, 2025
* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias

* rm redundant is_swa_all

* convert interleaved gate_up

* graph : fix activation function to match reference (crc-org#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (crc-org#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return

* ggml : add fused swiglu_oai op (crc-org#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines

---------

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <[email protected]>

change kvalues_mxfp4 table to match e2m1 (crc-org#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant

* ggml : add ggml_add_id (crc-org#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix Unknown reasoning format

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: slaren <[email protected]>