Skip to content

Conversation

Copy link

Copilot AI commented Sep 28, 2025

Problem

Multiple document converters were using Task.Run to wrap synchronous operations in async methods, which is an anti-pattern that wastes resources by unnecessarily switching threads without any benefit. This pattern only makes sense for GUI applications where you need to keep the UI thread responsive, but not for library code.

Examples of the problematic pattern:

// Before - unnecessary thread switching
public async Task<string> ExtractTextAsync(Stream stream, CancellationToken cancellationToken)
{
    var result = new StringBuilder();
    
    await Task.Run(() =>
    {
        // Synchronous work here
        using var document = SomeDocument.Open(stream, false);
        // ... process document
    }, cancellationToken);
    
    return result.ToString().Trim();
}

// After - direct execution with proper async signature
public Task<string> ExtractTextAsync(Stream stream, CancellationToken cancellationToken)
{
    var result = new StringBuilder();
    
    // Direct synchronous execution
    using var document = SomeDocument.Open(stream, false);
    // ... process document
    
    return Task.FromResult(result.ToString().Trim());
}

Changes Made

Fixed 5 instances of unnecessary Task.Run usage across 4 converter files:

  • PdfConverter.cs (2 fixes):

    • PdfPigTextExtractor.ExtractTextAsync: PDF text extraction
    • RenderOnSupportedPlatformsAsync: PDF image rendering
  • DocxConverter.cs: ExtractTextFromDocxAsync - Word document processing

  • XlsxConverter.cs: ExtractDataFromXlsxAsync - Excel spreadsheet processing

  • PptxConverter.cs: ExtractContentFromPptxAsync - PowerPoint presentation processing

Benefits

  • Performance improvement: Eliminates unnecessary thread context switches and reduces overhead
  • Resource efficiency: Reduces thread pool pressure and memory allocations
  • Maintained compatibility: Preserves the same async API surface for consumers
  • No breaking changes: All existing functionality continues to work (246/246 tests pass)

The fix follows .NET async best practices by avoiding Task.Run in library code where it provides no benefit, while maintaining proper cancellation token support and async method signatures.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • ccsca2021.crl.certum.pl
    • Triggering command: dotnet build MarkItDown.slnx (dns block)
  • ccsca2021.ocsp-certum.com
    • Triggering command: dotnet build MarkItDown.slnx (dns block)
  • crl.certum.pl
    • Triggering command: dotnet build MarkItDown.slnx (dns block)
  • subca.crl.certum.pl
    • Triggering command: dotnet build MarkItDown.slnx (dns block)
  • subca.ocsp-certum.com
    • Triggering command: dotnet build MarkItDown.slnx (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details on the original issue you should resolve

<issue_title>Fix await</issue_title>
<issue_description>Looking at you PdfConverter.cs, line: 274.

The Task,Run does not make sense here. There is no point on freeing up current thread with await/async and offloading work to another thread. Work still has to be done, does not matter on which thread you do it. It's like taking $2 from left pocket and putting it into right pocket. You still have same amount of money. But you wasted effort on actually doing it (aka switching threads).

It only makes sense when you want to make main thready responsive. Aka GUI thread. So person does not get a feeling that app is frozen. But this is not the case here. There is no GUI.

so looks liek we have to rewniew all asyn-await usage</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #16

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits September 28, 2025 13:54
Copilot AI changed the title [WIP] Fix await Fix async/await: Remove unnecessary Task.Run usage in document converters Sep 28, 2025
Copilot AI requested a review from KSemenenko September 28, 2025 13:58
@KSemenenko KSemenenko marked this pull request as ready for review September 28, 2025 14:54
Copilot AI review requested due to automatic review settings September 28, 2025 14:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a performance anti-pattern by removing unnecessary Task.Run usage in document converters. The changes eliminate wasteful thread context switches in library code where synchronous work was being unnecessarily wrapped in async execution.

  • Removed Task.Run wrappers from 5 methods across 4 converter classes
  • Converted methods to return Task.FromResult for direct execution
  • Maintained async API compatibility while improving performance

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/MarkItDown/Converters/XlsxConverter.cs Removed Task.Run wrapper from Excel spreadsheet processing
src/MarkItDown/Converters/PptxConverter.cs Removed Task.Run wrapper from PowerPoint presentation processing
src/MarkItDown/Converters/PdfConverter.cs Removed Task.Run wrappers from PDF text extraction and image rendering
src/MarkItDown/Converters/DocxConverter.cs Removed Task.Run wrapper from Word document processing

@KSemenenko KSemenenko merged commit 97c842b into main Sep 28, 2025
2 checks passed
@KSemenenko KSemenenko deleted the copilot/fix-a5739cef-4fa8-47cd-abad-93850f97a530 branch September 28, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix await

2 participants