knitr::spin() result not cached, called 3 times per .R file during index build #14228

@cderv

Description

markdownFromKnitrSpinScript spawns a fresh Rscript.exe process each time it is called, and readBaseInputIndex triggers it 3 times for each .R spin script:

  1. renderFormats → resolveFullMarkdownForFile → markdownForFile → spin
  2. projectFileMetadata → markdownForFile → spin
  3. engine.partitionedMarkdown → spin

export async function markdownFromKnitrSpinScript(file: string) {
  // run spin to get .qmd and get markdown from .qmd
  // TODO: implement a caching system because spin is slow and it seems we call this twice for each run
  // 1. First as part of the target() call
  // 2. Second as part of renderProject() call to get `partitioned` information to get `resourcesFrom` with `resourceFilesFromRenderedFile()`
  // we need a temp dir for `CallR` to work but we don't have access to usual options.tempDir.
  const tempDir = quarto.system.tempContext().createDir();
  const result = await callR<string>(
    "spin",
    { input: file },
    tempDir,
    undefined,
    true,
  );
  return result;
}

There is already a TODO in the code:

// TODO: implement a caching system because spin is slow and it seems we call this twice for each run
// 1. First as part of the target() call

The spin result (markdown output) does not change between calls within the same render, so caching it per file would eliminate 2 of the 3 R process spawns. In a project with many .R spin scripts, the extra spawns add up significantly (the reporter in #14225 measured 40s vs 3s).
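
A minimal sketch of what per-file caching could look like. This is not existing Quarto code: `SpinFn` and `cachedSpin` are hypothetical names, and the sketch assumes the cache key is the file path exactly as passed by the caller and that something clears the cache when the render finishes. It stores the in-flight promise rather than the resolved string, so callers that overlap in time would share a single R process.

```typescript
// Hypothetical sketch: memoize an async per-file computation for one render.
// `SpinFn` stands in for a function like markdownFromKnitrSpinScript.
type SpinFn = (file: string) => Promise<string>;

function cachedSpin(spin: SpinFn): { get: SpinFn; clear: () => void } {
  // Keyed by the path string as given (assumption: callers pass a consistent path).
  const cache = new Map<string, Promise<string>>();
  return {
    get(file: string): Promise<string> {
      let pending = cache.get(file);
      if (pending === undefined) {
        // Cache the promise itself so concurrent callers reuse the same run.
        pending = spin(file).catch((err) => {
          // Do not keep failed runs; a later call may retry.
          cache.delete(file);
          throw err;
        });
        cache.set(file, pending);
      }
      return pending;
    },
    // Assumption: called once the render is done, so edits to the script
    // are picked up by the next render.
    clear: () => cache.clear(),
  };
}
```

With a wrapper like this around the spin call, the three call paths listed above would trigger one R process per .R file instead of three, as long as they go through the same cache instance.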

Repro

https://github.com/byzheng/quarto-metadata-files-test

Context

Surfaced while investigating #14225. The spin caching issue is independent of the cache invalidation bug also reported from that investigation.
