AccessViolationException occurs during multiple calls #23713

Closed
RockNHawk opened this issue Feb 15, 2025 · 2 comments
Comments


RockNHawk commented Feb 15, 2025

AccessViolationException occurs during multiple calls (the crash happens after Profiler::EndProfiling while writing profiler data to a file).

The issue is not caused by EnableProfiling itself; I only enabled EnableProfiling because the crashes were already happening and I wanted logs to inspect.

When I call InferenceSession.Run in a multi-threaded environment, this function crashes after running several times.

If I create a new InferenceSession instance every time, it does not crash and works well.

I tried adding a lock to prevent concurrent calls, but that did not solve the issue. My understanding is that InferenceSession.Run is thread-safe.
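For context, the shared-session pattern relied on above (one long-lived instance whose Run method is called from many threads) can be sketched with a self-contained stand-in. FakeSession below is hypothetical and not part of ONNX Runtime; with a real InferenceSession the shape is the same: create the session once and issue Run calls concurrently, without an external lock, as long as nothing disposes the session while calls are in flight.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a shared InferenceSession: its Run-style method
// is safe to call from many threads at once while the instance stays alive.
sealed class FakeSession
{
    public long Run(long input)
    {
        Thread.Sleep(5);        // pretend native inference work
        return input * 2;
    }
}

class Program
{
    static void Main()
    {
        var session = new FakeSession();   // created once, shared by all threads
        var results = new long[8];
        Parallel.For(0, 8, i => results[i] = session.Run(i));
        Console.WriteLine(results[3]);     // prints 6
    }
}
```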

Here are the logs.

Crash log1:

lfOwnBufferHelper] For ort_value with index: 266, block in memory pattern size is: 194560 but the actual size is: 27904, fall back to default allocation behavior
......
2025-02-15 17:57:52.0556390 [V:onnxruntime:, execution_frame.cc:563 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For ort_value with index: 339, block in memory pattern size is: 276480 but the actual size is: 104448, fall back to default allocation behavior
2025-02-15 17:57:52.0618346 [V:onnxruntime:, execution_frame.cc:563 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For ort_value with index: 340, block in memory pattern size is: 276480 but the actual size is: 104448, fall back to default allocation behavior

2025-02-15 17:57:52.1096720 [I:onnxruntime:, profiler.cc:115 onnxruntime::profiling::Profiler::EndProfiling] Writing profiler data to file onnxruntime_profile__2025-02-15_17-55-37.json

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

Crash log2:

2025-02-15 18:06:59.7739997 [V:onnxruntime:, execution_frame.cc:563 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For ort_value with index: 339, block in memory pattern size is: 122880 but the actual size is: 251904, fall back to default allocation behavior
2025-02-15 18:06:59.7802401 [V:onnxruntime:, execution_frame.cc:563 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For ort_value with index: 340, block in memory pattern size is: 122880 but the actual size is: 251904, fall back to default allocation behavior

2025-02-15 18:07:00.4918686 [I:onnxruntime:, profiler.cc:115 onnxruntime::profiling::Profiler::EndProfiling] Writing profiler data to file onnxruntime_profile__2025-02-15_18-04-46.json

2025-02-15 18:07:00.8677663 [V:onnxruntime:, inference_session.cc:2904 onnxruntime::InferenceSession::EndProfiling] Profiler is disabled.

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

Code:


public SentenceEncoder(Tokenizer tokenizer, TextModelCommonInfo modelInfo, SessionOptions? sessionOptions = null)
{
    _sessionOptions = sessionOptions ?? NewSessionOptions(null);
    _sessionOptions.LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE;
    _sessionOptions.EnableProfiling = true;
    _session = new InferenceSession(modelInfo.ModelPath, _sessionOptions);
    _tokenizer = tokenizer;
    _modelInfo = modelInfo;
    _outputNames = _session.OutputMetadata.Keys.ToArray();
    _inputNames = new[] { "input_ids", "attention_mask" };
}


public static SessionOptions NewSessionOptions(int? gpuDeviceId)
{
    var sessionOptions = new SessionOptions()
    {
        LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING
    };

    Exception? gpuException = null;
    if (gpuDeviceId != null)
    {
        try
        {
            sessionOptions.AppendExecutionProvider_DML(gpuDeviceId.Value);
        }
        catch (Exception ex)
        {
            gpuException = ex;
        }
    }

    sessionOptions.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
    return sessionOptions;
}


async Task<PooledList<TextEmbeddingResult>> EncodeCore(IEnumerable<TextEmbeddingItem> items)
{
    // inputs
    using var batchInput = EmbeddingGeneratorUtility.GetBatchInput(items, _tokenizer);
    // equals numSentences
    var batchCount = batchInput.BatchCount;
    var maxTokenLength = batchInput.MaxTokenLength;
    var numSentences = batchCount;
    var tokenCount = maxTokenLength;
    Assertion.Assert(tokenCount == maxTokenLength);
    var allInputLength = maxTokenLength * batchCount;
    long[] flattenIDs = new long[allInputLength];
    long[] flattenAttentionMask = new long[allInputLength];
    long[]? flattenTokenTypeIds = ModelType == ModelTypes.MiniLM ? new long[allInputLength] : null;


    var inputList = batchInput.Inputs.ToList();
    Assertion.Debug(inputList.Count == batchCount);
    Assertion.Debug(inputList.Count * maxTokenLength == allInputLength);
    for (int i = 0; i < inputList.Count; i++)
    {
        var offset = i * maxTokenLength;
        var item = inputList[i];
        Assertion.Debug(item.InputIds.Length == maxTokenLength);
        // this operation is required
        item.InputIds.CopyTo(flattenIDs, offset);
        item.AttentionMask.CopyTo(flattenAttentionMask, offset);
    }

    using var runOptions = new RunOptions();
    using var inputIdsOrtValue = OrtValue.CreateTensorValueFromMemory(flattenIDs, [batchCount, maxTokenLength]);
    using var attentionMaskOrtValue = OrtValue.CreateTensorValueFromMemory(flattenAttentionMask, [batchCount, maxTokenLength]);

    var inputs = new List<OrtValue>
    {
        inputIdsOrtValue,
        attentionMaskOrtValue,
    };

    using var outputValue = OrtValue.CreateAllocatedTensorValue(OrtAllocator.DefaultInstance, TensorElementType.Float, [batchCount, maxTokenLength, Dim]);
    await _session.RunAsync(runOptions, _inputNames, inputs, _outputNames, [outputValue]);
    // Assertion.Debug(outputTensor.Dimensions[2] == Dim);
    var outputVectors = outputValue.GetTensorDataAsSpan<float>();
    Assertion.Assert(outputVectors.Length == batchCount * maxTokenLength * Dim);

    ReadOnlySpan<int> optDims = stackalloc int[]
    {
        numSentences, tokenCount, Dim
    };
    DenseTensor<float> outputTensor = new DenseTensor<float>(outputVectors.ToArray(), optDims);

    var output_pooled = MeanPooling(outputTensor, inputList);
    var output_pooled_normalized = Normalize(output_pooled);
    Assertion.Debug(output_pooled.Length == output_pooled_normalized.Length);
    Assertion.Debug(output_pooled.Length == batchCount * Dim);

    var results = PolledCollectionUtility.List<TextEmbeddingResult>(batchCount);
    for (int s = 0; s < batchCount; s++)
    {
        var emb = new float[Dim];
        for (int i = 0; i < Dim; i++)
        {
            emb[i] = output_pooled_normalized[s, i];
        }

        results.Add(new(inputList[s].InputSource, emb));
    }

    return results;
}
@RockNHawk RockNHawk changed the title AccessViolationException occurs during multiple calls (the crash happens after Profiler::EndProfiling while writing profiler data to a file). AccessViolationException occurs during multiple calls Feb 15, 2025

RockNHawk commented Feb 15, 2025

I also tried setting EnableCpuMemArena and EnableMemoryPattern to false, but that did not help. Then I added an _sessionOptions.IsInvalid check and re-created the InferenceSession as a workaround; it still crashes, but the application can run more times before it does.

public SentenceEncoder(Tokenizer tokenizer, TextModelCommonInfo modelInfo, SessionOptions? sessionOptions = null)
{
    _sessionOptions = sessionOptions ?? NewSessionOptions(null);
    ConfigureSessionOptions(_sessionOptions);
    _session = new InferenceSession(modelInfo.ModelPath, _sessionOptions);
     .....
}


private void ConfigureSessionOptions(SessionOptions sessionOptions)
{
    sessionOptions.EnableCpuMemArena = false; 
    sessionOptions.EnableMemoryPattern = false; 
}


public async Task<PooledList<TextEmbeddingResult>> EncodeCore(IEnumerable<TextEmbeddingItem> items)
{
    if (_sessionOptions.IsInvalid)
    {
        Console.WriteLine("[Embedding] Invalid session");
        Console.Error.WriteLine("[Embedding] Invalid session");

        // the session is broken, do not access it even not call `.Dispose()`, it will make AccessViolationException:
        // _session.Dispose();

        _sessionOptions = NewSessionOptions(null);
        ConfigureSessionOptions(_sessionOptions);
        _session = new InferenceSession(_modelInfo.ModelPath, _sessionOptions);
    }

    return await EncodeCoreImpl(items);
}
Crash log:

[Embedding] Invalid session
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

@RockNHawk

Sorry, this was my fault: I was disposing the session after each batch of work.
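For anyone hitting the same crash: disposing a session that other threads may still be calling is exactly the kind of use-after-free that surfaces as an AccessViolationException. One way to guard against it is a wrapper whose Dispose waits for all in-flight calls to drain before tearing anything down. This is a minimal sketch, not ONNX Runtime API; GuardedSession is a hypothetical name, and the Func<T> stands in for the real session.Run call.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: Dispose blocks until all in-flight Run calls finish,
// so a disposed native session is never touched by a late caller.
sealed class GuardedSession : IDisposable
{
    private readonly ReaderWriterLockSlim _gate = new ReaderWriterLockSlim();
    private bool _disposed;

    // Stand-in for InferenceSession.Run; many callers may hold the read lock at once.
    public T Run<T>(Func<T> inference)
    {
        _gate.EnterReadLock();
        try
        {
            if (_disposed) throw new ObjectDisposedException(nameof(GuardedSession));
            return inference();
        }
        finally { _gate.ExitReadLock(); }
    }

    public void Dispose()
    {
        _gate.EnterWriteLock();   // waits until no Run call is in flight
        try { _disposed = true; } // real code would dispose the InferenceSession here
        finally { _gate.ExitWriteLock(); }
    }
}

class Program
{
    static void Main()
    {
        var guarded = new GuardedSession();
        Console.WriteLine(guarded.Run(() => 41 + 1));   // prints 42
        guarded.Dispose();
        try { guarded.Run(() => 0); }
        catch (ObjectDisposedException) { Console.WriteLine("disposed"); }
    }
}
```

After Dispose, late callers get a clean ObjectDisposedException instead of corrupting native memory.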
