
[BUG]: The native library cannot be correctly loaded on Mac M1 #824

Closed
@kuojianlu

Description

Project file (.csproj):

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="LlamaSharp" Version="0.13.0" />
    <PackageReference Include="LlamaSharp.Backend.Cpu" Version="0.13.0" />
  </ItemGroup>

</Project>
Program.cs:

using LLama.Common;
using LLama;

string modelPath = @"/Users/m1test/Downloads/Meta-Llama-3-8B-Instruct-IQ3_M.gguf"; 

var parameters = new ModelParams(modelPath)
{
    ContextSize = 1024, // Maximum context length (in tokens) kept as chat memory.
    GpuLayerCount = 5 // Number of layers to offload to the GPU; adjust to your GPU memory.
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Seed the chat history as a prompt that tells the AI how to act.
var chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");

ChatSession session = new(executor, chatHistory);

InferenceParams inferenceParams = new InferenceParams()
{
    MaxTokens = 256, // Cap the answer at 256 tokens; remove if the antiprompt alone gives enough control.
    AntiPrompts = new List<string> { "User:" } // Stop generation once an antiprompt appears.
};

Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write("The chat session has started.\nUser: ");
Console.ForegroundColor = ConsoleColor.Green;
string userInput = Console.ReadLine() ?? "";

while (userInput != "exit")
{
    // Stream the response as it is generated.
    await foreach (var text in session.ChatAsync(
        new ChatHistory.Message(AuthorRole.User, userInput),
        inferenceParams))
    {
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write(text);
    }
    Console.ForegroundColor = ConsoleColor.Green;
    userInput = Console.ReadLine() ?? "";
}

The following error is thrown:
Unhandled exception. System.TypeInitializationException: The type initializer for 'LLama.Native.NativeApi' threw an exception.
---> LLama.Exceptions.RuntimeError: The native library cannot be correctly loaded. It could be one of the following reasons:

  1. No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.

  2. You are using a device with only CPU but installed cuda backend. Please install cpu backend instead.

  3. One of the dependency of the native library is missed. Please use ldd on linux, dumpbin on windows and otool on mac to check if all the dependency of the native library is satisfied. Generally you could find the libraries under your output folder.

  4. Try to compile llama.cpp yourself to generate a libllama library, then use LLama.Native.NativeLibraryConfig.WithLibrary to specify it at the very beginning of your code. For more information about compilation, please refer to LLamaSharp repo on github.

    at LLama.Native.NativeApi..cctor()
    --- End of inner exception stack trace ---
    at LLama.Native.NativeApi.llama_max_devices()
    at LLama.Abstractions.TensorSplitsCollection..ctor()
    at LLama.Common.ModelParams..ctor(String modelPath)
    at Program.<Main>$(String[] args) in /Users/m1test/Downloads/LlamaSharpTest/Program.cs:line 12
    at Program.<Main>(String[] args)
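
For reference, suggestion 4 in the error message maps to code along these lines. This is a hedged sketch, not a verified fix: the exact NativeLibraryConfig entry point has changed across LLamaSharp releases, and the .dylib path below is a placeholder for a libllama built from llama.cpp for arm64 macOS. Note from the stack trace that even the ModelParams constructor touches the native API, so this call has to come first.

using LLama.Native;

// Must run before any other LLamaSharp call; per the stack trace,
// even new ModelParams(...) already triggers native library loading.
// The entry point varies by version: some releases expose
// NativeLibraryConfig.Instance.WithLibrary(...), newer ones
// NativeLibraryConfig.All.WithLibrary(...).
// "/path/to/libllama.dylib" is a placeholder, not a real path.
NativeLibraryConfig.All.WithLibrary("/path/to/libllama.dylib");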

Reproduction Steps

Run the code above.

Environment & Configuration

  • Operating system: macOS on a MacBook Air (M1, 2020), Apple M1 chip
  • .NET runtime version: 8.0.302
  • LLamaSharp version: LlamaSharp 0.13.0, LlamaSharp.Backend.Cpu 0.13.0
  • CUDA version (if you are using cuda backend): NA
  • CPU & GPU device: CPU

Known Workarounds

NA
