Application crashes on exit with v0.5.0 #1056

Open
f2bo opened this issue Nov 9, 2024 · 7 comments
Comments

f2bo commented Nov 9, 2024

The latest release (v0.5.0) introduces additional leak-detection checks that will crash an application on exit unless explicit cleanup changes are made. This can easily be reproduced with the sample code included in the README file of the Microsoft.ML.OnnxRuntimeGenAI package.

Executing this code as-is results in a crash when the application exits, after the following message is displayed:

Error: Shutdown must be called before process exit, please check the documentation for the proper API to call to ensure clean shutdown.

Proper disposal of the OgaHandle instance fixes the issue. Replace this:

OgaHandle ogaHandle = new OgaHandle();

with this:

using OgaHandle ogaHandle = new OgaHandle();
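
For context, here is the README-style sample with every native resource disposed (a sketch based on my reading of the 0.5.x C# API; the model path and prompt are placeholders, not taken from the package README):

using Microsoft.ML.OnnxRuntimeGenAI;

// Disposing the handle runs the library shutdown that v0.5.0 now requires.
using OgaHandle ogaHandle = new OgaHandle();

using Model model = new Model("path/to/model");  // placeholder path
using Tokenizer tokenizer = new Tokenizer(model);
using TokenizerStream tokenizerStream = tokenizer.CreateStream();

using Sequences sequences = tokenizer.Encode("<|user|>Hello<|end|><|assistant|>");

using GeneratorParams generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 200);
generatorParams.SetInputSequences(sequences);

using Generator generator = new Generator(model, generatorParams);
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();

    // Decode and print only the newest token.
    var sequence = generator.GetSequence(0);
    Console.Write(tokenizerStream.Decode(sequence[sequence.Length - 1]));
}

Because using declarations dispose in reverse order at scope exit, the handle declared first is disposed last, after the model, tokenizer, and generator have already been released.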

I couldn't find any documentation for OgaHandle, nor many examples that use it, which is surprising given how essential it now seems to be. This impacts applications using the ONNX Connector in Semantic Kernel, which is currently not aware of this new requirement (see microsoft/semantic-kernel#9628).

While making it easier to diagnose resource leaks is always welcome, crashing the application seems a bit heavy-handed. Maybe just keep the error message but remove the forced shutdown?

skyline75489 (Contributor) commented
I remember having a discussion with @RyanUnderhill about whether we should make this call std::abort. Maybe we can move that discussion here instead of Teams?

skyline75489 (Contributor) commented
In #799, proper release of resources became a hard requirement. This especially impacts C# and Java scenarios, where users need to release resources manually (a using statement in C#, try-with-resources in Java). We should probably at least add some documentation about it.
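
For cases where a using declaration doesn't fit (for example, the handle outlives a single method), the manual equivalent in C# is a try/finally — a sketch:

var ogaHandle = new OgaHandle();
try
{
    // ... create and use Model, Tokenizer, Generator, etc.
    //     (each of those must be disposed as well) ...
}
finally
{
    ogaHandle.Dispose();  // runs the required shutdown even if an exception escapes
}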

sjpritchard commented Nov 13, 2024

The same occurs in C++. I can't find any reference to this in the C++ examples.

[screenshot attachment]

skyline75489 (Contributor) commented
@sjpritchard Hi! Could you please post the C++ code you're using? The Adapters API is quite new and I don't think it should trigger the leaked-resource check.

sjpritchard commented Nov 15, 2024

@skyline75489 Here it is:

// RagLanguageModel.h
#pragma once

#include "ort_genai.h"

#include <QDebug>
#include <QObject>
#include <QString>
#include <QStringList>

#include <memory>  // std::unique_ptr
#include <string>  // std::string

class RagLanguageModel : public QObject {
    Q_OBJECT
public:
    explicit RagLanguageModel(QObject* parent = nullptr);
    void Generate(QString query, QStringList context);

signals:
    void Generated(QString text);

private:
    std::unique_ptr<OgaModel>     model_{nullptr};
    std::unique_ptr<OgaTokenizer> tokenizer_{nullptr};
};

// RagLanguageModel.cpp
#include "RagLanguageModel.h"

RagLanguageModel::RagLanguageModel(QObject* parent) : QObject{parent} {
    model_ = OgaModel::Create("phi-3.5-mini-instruct"); 
    tokenizer_        = OgaTokenizer::Create(*model_);
}

void RagLanguageModel::Generate(QString query, QStringList context) {
    QString joined_context = context.join(" ");

    auto prompt = QString("<|system|>\n"
                          "You are a helpful question answering bot. "
                          "<|end|>\n"
                          "<|user|>\n"
                          "Context: %1\n"
                          "Question: %2."
                          "<|end|>\n"
                          "<|assistant|>")
                      .arg(joined_context)
                      .arg(query);

    auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer_);
    auto sequences        = OgaSequences::Create();
    tokenizer_->Encode(prompt.toUtf8().constData(), *sequences);

    auto params = OgaGeneratorParams::Create(*model_);
    params->SetInputSequences(*sequences);
    params->SetSearchOption("max_length", 1024);

    try {
        auto        generator = OgaGenerator::Create(*model_, *params);
        while (!generator->IsDone()) {
            generator->ComputeLogits();
            generator->GenerateNextToken();
            size_t         sequence_index  = 0;
            size_t         sequence_length = generator->GetSequenceCount(sequence_index);
            const int32_t* sequence_data   = generator->GetSequenceData(sequence_index);
            int32_t        latest_token    = sequence_data[sequence_length - 1];
            const char*    decoded_chunk   = tokenizer_stream->Decode(latest_token);
            auto           text            = std::string(decoded_chunk);
            emit Generated(QString::fromStdString(text));
        }
    } catch (const std::exception& e) {
        qDebug() << "Error:" << e.what();
    }
}
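
In C++ the fix should be analogous: make sure the library's shutdown runs after all Oga* objects are destroyed but before the process exits. A minimal sketch, assuming OgaShutdown() from the C API (ort_genai_c.h) is the C++-side counterpart of disposing OgaHandle in C#:

#include "RagLanguageModel.h"
#include "ort_genai_c.h"  // declares OgaShutdown() (assumption: also reachable via ort_genai.h)

int main() {
    {
        // Keep every Oga* object in a scope that closes before shutdown runs,
        // so the unique_ptr members release the model and tokenizer first.
        RagLanguageModel model;
        // ... run the event loop / generate as needed ...
    }
    OgaShutdown();  // v0.5.0 requires this before process exit
    return 0;
}

The scoping matters: if shutdown runs while a model or tokenizer is still alive, the leak check would presumably still fire.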

luomaojiang2016 commented
I encountered the same issue: onnxruntime-genai.dll crashes when the program closes. It works fine in version 0.4 but has problems in version 0.5. I also ran an experiment and found that the application crashes on closing whenever it calls the OgaModelCreate function. This is likely a bug in onnxruntime-genai.dll, and I hope it can be fixed as soon as possible. Thank you.

skyline75489 (Contributor) commented
@sjpritchard Are you using DML? Or just CPU?

@RyanUnderhill This also looks like an adapter-related crash, similar to the DML one.
