Indexing bugs #1948
Logs attached

**Bug Report – Copilot Plus & Smart Connections (Obsidian plugins)**

**1. Summary**

When indexing a large Obsidian vault with Copilot Plus (and, to a lesser extent, Smart Connections), the application frequently hangs or crashes. The problem appears tied to:
Smart Connections is comparatively stable, but both plugins suffer from index corruption after a freeze.

**2. Environment**
**3. Steps to Reproduce**

**4. Expected Behaviour**

**5. Actual Behaviour**

**6. Workarounds Attempted (by the user)**
These mitigations improve the success probability but do not eliminate the underlying issue.

**7. Logs & Attachments**

The user has already sent the complete log files (see the attached obsidian.md-1760869775478.log).

**8. Suggested Improvements for the Developers**
**9. Conclusion**

The current implementation of Copilot Plus (and, indirectly, Smart Connections) is highly sensitive to vault size, file-system layout, batch configuration, and interaction with the Obsidian developer console. The user's extensive manual testing demonstrates reproducible failure modes that lead to UI freezes and index corruption. Implementing the suggested safeguards would greatly increase robustness for power users who need to index large knowledge bases.

Attachment: obsidian.md-1760869775478.log
I think it's JSON's string size limit issue. cc @logancyang
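If so, the failure mode would look roughly like the sketch below; the document counts and sizes here are made up for illustration, and the exact V8 limit varies by version:

```ts
// Sketch: serializing a large in-memory index into a single JSON string can
// exceed V8's maximum string length (roughly 2^29 UTF-16 code units on 64-bit
// builds), which surfaces as "RangeError: Invalid string length".
const doc = {
  content: "x".repeat(2_000), // ~2 KB of note text
  embedding: Array.from({ length: 1536 }, () => 0.12345), // typical vector size
};
const index: Record<string, typeof doc> = {};
for (let i = 0; i < 100_000; i++) index[`doc-${i}`] = doc; // a very large vault

try {
  const blob = JSON.stringify(index); // one giant string for the whole index
  console.log(`serialized ${blob.length.toLocaleString()} chars`);
} catch (err) {
  console.error(err); // with enough documents: a RangeError, not an OOM crash
}
```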
@logancyang At this point I have found two ways to index a large number of files with your system.

The first variant: instead of running global indexing for the entire vault, I open each note separately and press the button at the top right that indexes only that one note. So far this has caused no problems — with 1,500 notes indexed this way, nothing has frozen even once.

The second variant: if the files are very small, I was able to index 18,000 of them. Even so, there are occasional failures that appear to be triggered not by any single file; rather, some part of the system — perhaps access control, logging, or buffer overflows — causes these failures and freezes. My assumption is that queues for user-action control, resource allocation, etc. cause the errors and hangs, but your system can definitely index many files. I cannot say whether this will hold up for weeks or months; indexing one file at a time as files appear might hit another failure tomorrow or tonight, and everything could collapse again. For now it works.

If I were in your place, I would analyze the difference between indexing a large number of small files versus big ones, and between indexing large files one by one versus in batches of 50–100 at once. Perhaps it would be worth running a small experiment: build a version of the indexing system that emulates the loop triggered when each file is indexed individually, and simply feed files into it one by one. Possibly, if the architecture of mass indexing is changed this way, these errors will disappear. Naturally, the settings would need to hardcode the batch size to exactly one file per batch. Call this mode something like “BigVault Index Mode” or “Safety Mode”.
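A minimal sketch of that experiment — driving the existing single-note path in a loop so the effective batch size is one. `reindexFile` appears in the plugin code cited later in this thread; the rest is assumed glue:

```ts
import { App, TFile } from "obsidian";

// Hypothetical "Safety Mode": reuse the per-note indexing path instead of
// batch indexing. Awaiting each call serializes the work, making the
// effective batch size exactly one file.
async function indexVaultOneByOne(
  app: App,
  indexOps: { reindexFile(f: TFile): Promise<void> } // shape assumed
): Promise<void> {
  for (const file of app.vault.getMarkdownFiles()) {
    await indexOps.reindexFile(file); // one file per "batch"
  }
}
```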
I manually indexed 1,700 files. I think this method saves the index to RAM? I don't see any file modification. But the index isn't saved after the app crashes.
No, the index doesn't save to RAM — it saves to disk in batches, with checkpoints. What file are you looking at? Use the "list all indexed files" command for more details.
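For reference, a trimmed sketch of that checkpoint behavior, simplified from the plugin code quoted later in this thread (the modulo check is a simplification of the actual checkpoint-crossing logic, and the batch size 32 is just an assumed example):

```ts
// Simplified from src/search/indexOperations.ts (cited later in this thread):
// the index lives in an in-memory Orama DB while indexing, but is flushed to
// disk every checkpointInterval = 8 * embeddingBatchSize documents, so a
// crash loses at most one checkpoint window of work.
async function maybeCheckpoint(
  indexedCount: number,
  embeddingBatchSize: number,
  saveDB: () => Promise<void> // dbOps.saveDB in the real plugin
): Promise<void> {
  const checkpointInterval = 8 * embeddingBatchSize;
  if (indexedCount > 0 && indexedCount % checkpointInterval === 0) {
    await saveDB(); // persists all partitions via ChunkedStorage
  }
}

// Example: with an assumed batch size of 32, this saves every 256 documents.
maybeCheckpoint(256, 32, async () => console.log("checkpoint save"));
```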
We have tested with several other users with 100 GB+ vaults. They are using a similar approach to the one you described — index a subset, then add more one by one with the button. It's not worth the engineering hours, since this approach has clear limitations. We are moving to real vector DBs in self-host mode as the proper solution.
I've been researching better RAG systems. My current preference is https://platform.cognee.ai/auth — Graph+Vector "human-like" memory, with agentic graph generation inside that memory. Can you integrate this? Maybe in MCP mode, or via an API call from your agent? Relational store — tracks your documents, their chunks, and provenance (i.e. where each piece of data came from and how it's linked to the source).
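Purely as an illustration of the "API call from your agent" idea — the endpoint URL, payload shape, and auth scheme below are invented placeholders, not cognee's actual API:

```ts
// Hypothetical integration sketch: the plugin's agent forwards a query to an
// external Graph+Vector memory service over HTTP. Endpoint, request body, and
// response shape are assumptions for illustration only.
async function queryExternalMemory(query: string, apiKey: string): Promise<unknown> {
  const res = await fetch("https://memory.example.com/v1/search", { // placeholder URL
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ query, top_k: 10 }), // assumed parameters
  });
  if (!res.ok) throw new Error(`memory service returned HTTP ${res.status}`);
  return res.json();
}
```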
@logancyang Based on my analysis of the codebase, I need to provide you with some important context: the Obsidian Copilot plugin's indexing pipeline has limited resource-management capabilities and does not implement storage-type-aware I/O optimizations or inter-plugin coordination mechanisms. The persistent

## Resource Management Implementation

The plugin implements basic resource-control mechanisms:

- Rate Limiting: A simple time-based rate limiter controls embedding API requests 1
- Batch Processing: Documents are processed in configurable batches, but this only affects embedding API calls, not I/O operations 2
- Checkpointing: Database saves occur at intervals calculated as 8 × the embedding batch size 3

## I/O Operations and Storage Limitations

Critical Issue with Partitioning: While the plugin detects the configured number of partitions, the chunked storage system uses hash-based document distribution 5 , but partition 0 receives additional global metadata that includes the entire index structure 6 . This means:
No Storage-Type Awareness: The plugin uses Obsidian's abstract vault adapter without any special handling for SSD vs. HDD or I/O prioritization 8

## Concurrency Control and Conflict Prevention

Internal Concurrency: The plugin uses a mutex for atomic upsert operations 9 , but this only prevents internal race conditions within the plugin itself.

No Inter-Plugin Coordination: There is no mechanism to prevent conflicts with other plugins that might be accessing the same files or performing I/O operations simultaneously. The only coordination is a 5-second debounce for reindexing 10

Background Operations: The plugin uses a setTimeout-delayed background task (100 ms) for the final save and integrity check

## Why Your Issue Persists

The persistent
## Notes

The plugin's error handling shows awareness of the string length issue and recommends increasing partitions 14 , but this solution has inherent limitations for very large vaults. The codebase doesn't implement:
For vaults exceeding the architectural limits, the current implementation cannot reliably prevent these errors regardless of configuration adjustments.

## Citations

File: src/rateLimiter.ts (L1-28)

```ts
export class RateLimiter {
private lastRequestTime = 0;
private requestsPerMin: number;
constructor(requestsPerMin: number) {
this.requestsPerMin = requestsPerMin;
}
setRequestsPerMin(requestsPerMin: number) {
this.requestsPerMin = requestsPerMin;
}
getRequestsPerMin(): number {
return this.requestsPerMin;
}
async wait(): Promise<void> {
const now = Date.now();
const timeSinceLastRequest = now - this.lastRequestTime;
const timeToWait = Math.max(0, 60000 / this.requestsPerMin - timeSinceLastRequest);
if (timeToWait > 0) {
await new Promise((resolve) => setTimeout(resolve, timeToWait));
}
this.lastRequestTime = Date.now();
}
}
```

File: src/search/indexOperations.ts (L42-58)

```ts
constructor(
private app: App,
private dbOps: DBOperations,
private embeddingsManager: EmbeddingsManager
) {
const settings = getSettings();
this.rateLimiter = new RateLimiter(settings.embeddingRequestsPerMin);
this.embeddingBatchSize = settings.embeddingBatchSize;
this.checkpointInterval = 8 * this.embeddingBatchSize;
// Subscribe to settings changes
subscribeToSettingsChange(async () => {
const settings = getSettings();
this.rateLimiter = new RateLimiter(settings.embeddingRequestsPerMin);
this.embeddingBatchSize = settings.embeddingBatchSize;
this.checkpointInterval = 8 * this.embeddingBatchSize;
});
```

File: src/search/indexOperations.ts (L107-183)

```ts
for (let i = 0; i < allChunks.length; i += this.embeddingBatchSize) {
if (this.state.isIndexingCancelled) break;
await this.handlePause();
const batch = allChunks.slice(i, i + this.embeddingBatchSize);
try {
await this.rateLimiter.wait();
const embeddings = await embeddingInstance.embedDocuments(
batch.map((chunk) => chunk.content)
);
// Validate embeddings
if (!embeddings || embeddings.length !== batch.length) {
throw new Error(
`Embedding model returned ${embeddings?.length ?? 0} embeddings for ${batch.length} documents`
);
}
// Save batch to database
for (let j = 0; j < batch.length; j++) {
const chunk = batch[j];
const embedding = embeddings[j];
// Skip documents with invalid embeddings
if (!embedding || !Array.isArray(embedding) || embedding.length === 0) {
logError(`Invalid embedding for document ${chunk.fileInfo.path}: ${embedding}`);
this.dbOps.markFileMissingEmbeddings(chunk.fileInfo.path);
continue;
}
try {
await this.dbOps.upsert({
...chunk.fileInfo,
id: this.getDocHash(chunk.content),
content: chunk.content,
embedding,
created_at: Date.now(),
nchars: chunk.content.length,
});
// Mark success for this file
this.state.processedFiles.add(chunk.fileInfo.path);
} catch (err) {
// Log error but continue processing other documents in batch
this.handleError(err, {
filePath: chunk.fileInfo.path,
errors,
});
this.dbOps.markFileMissingEmbeddings(chunk.fileInfo.path);
continue;
}
}
// Update progress after the batch
this.state.indexedCount = this.state.processedFiles.size;
this.updateIndexingNoticeMessage();
// Calculate if we've crossed a checkpoint threshold
const previousCheckpoint = Math.floor(
(this.state.indexedCount - batch.length) / this.checkpointInterval
);
const currentCheckpoint = Math.floor(this.state.indexedCount / this.checkpointInterval);
if (currentCheckpoint > previousCheckpoint) {
await this.dbOps.saveDB();
console.log("Copilot index checkpoint save completed.");
}
} catch (err) {
this.handleError(err, {
filePath: batch?.[0]?.fileInfo?.path,
errors,
batch,
});
if (this.isRateLimitError(err)) {
break;
}
}
}
```

File: src/search/indexOperations.ts (L189-201)

```ts
setTimeout(() => {
this.dbOps
.saveDB()
.then(() => {
logInfo("Copilot index final save completed.");
this.dbOps.checkIndexIntegrity().catch((err) => {
logError("Background integrity check failed:", err);
});
})
.catch((err) => {
logError("Background save failed:", err);
});
}, 100); // 100ms delay
```

File: src/search/indexOperations.ts (L437-449)

```ts
private isStringLengthError(error: any): boolean {
if (!error) return false;
// Check if it's a direct RangeError
if (error instanceof RangeError && error.message.toLowerCase().includes("string length")) {
return true;
}
// Check the error message at any depth
const message = error.message || error.toString();
const lowerMessage = message.toLowerCase();
return lowerMessage.includes("string length") || lowerMessage.includes("rangeerror");
}
```

File: src/search/indexOperations.ts (L492-498)

```ts
if (this.isStringLengthError(error)) {
new Notice(
"Vault is too large for 1 partition, please increase the number of partitions in your Copilot QA settings!",
10000 // Show for 10 seconds
);
return;
}
```

File: src/search/chunkedStorage.ts (L36-52)

```ts
public assignDocumentToPartition(docId: string, totalPartitions: number): number {
// 1. Convert string to array of characters
const chars = Array.from(docId); // e.g., "abc" -> ['a', 'b', 'c']
// 2. Create a hash using the djb2 algorithm
const hash = chars.reduce((acc, char) => {
// For each character:
// a. Left shift acc by 5 (multiply by 32): acc << 5
// b. Subtract original acc: (acc << 5) - acc
// This is equivalent to: acc * 31
// c. Add character code: + char.charCodeAt(0)
return (acc << 5) - acc + char.charCodeAt(0);
}, 0);
// 3. Take absolute value and modulo to get partition number
return Math.abs(hash) % totalPartitions;
}
```

File: src/search/chunkedStorage.ts (L99-104)

```ts
private async ensureDirectoryExists(filePath: string): Promise<void> {
const dir = filePath.substring(0, filePath.lastIndexOf("/"));
if (!(await this.app.vault.adapter.exists(dir))) {
await this.app.vault.adapter.mkdir(dir);
}
}
```

File: src/search/chunkedStorage.ts (L164-172)

```ts
// Create global data object (excluding partitioned fields)
const globalData = {
...rawData,
docs: { docs: {}, count: 0 },
index: {
...(rawData as any).index,
vectorIndexes: undefined,
},
};
```

File: src/search/chunkedStorage.ts (L196-207)

```ts
// For first partition, include global data
const finalPartitionData =
partitionIndex === 0
? {
...globalData,
docs: partitionData.docs,
index: {
...globalData.index,
vectorIndexes: partitionData.index.vectorIndexes,
},
}
: partitionData;
```

File: src/search/chunkedStorage.ts (L209-216)

```ts
const chunkPath = this.getChunkPath(partitionIndex);
await this.ensureDirectoryExists(chunkPath);
await this.app.vault.adapter.write(chunkPath, JSON.stringify(finalPartitionData));
if (getSettings().debug) {
console.log(`Saved partition ${partitionIndex + 1}/${numPartitions}`);
}
}
```

File: src/search/dbOperations.ts (L116-147)

```ts
async saveDB() {
if (Platform.isMobile && getSettings().disableIndexOnMobile) {
return;
}
if (!this.oramaDb || !this.chunkedStorage) {
// Instead of throwing immediately, try to initialize.
// Crucial for new user onboarding.
try {
await this.initializeDB(await EmbeddingsManager.getInstance().getEmbeddingsAPI());
// If still not initialized after attempt, then throw
if (!this.oramaDb || !this.chunkedStorage) {
throw new CustomError("Orama database not found.");
}
} catch (error) {
logError("Failed to initialize database during save:", error);
throw new CustomError("Failed to initialize and save database.");
}
}
try {
await this.chunkedStorage.saveDatabase(this.oramaDb);
this.hasUnsavedChanges = false;
if (getSettings().debug) {
logInfo("Orama database saved successfully at:", this.dbPath);
}
} catch (error) {
logError(`Error saving Orama database:`, error);
throw error;
}
}
```

File: src/search/dbOperations.ts (L362-416)

```ts
async upsert(docToSave: any): Promise<any | undefined> {
if (!this.oramaDb) throw new Error("DB not initialized");
const db = this.oramaDb;
// Use mutex to make the operation atomic
return await this.upsertMutex.runExclusive(async () => {
try {
// Calculate partition first
const partition = this.chunkedStorage?.assignDocumentToPartition(
docToSave.id,
getSettings().numPartitions
);
// Check if document exists
const existingDoc = await search(db, {
term: docToSave.id,
properties: ["id"],
limit: 1,
});
if (existingDoc.hits.length > 0) {
await remove(db, existingDoc.hits[0].id);
}
// Insert into the assigned partition
try {
await insert(db, docToSave);
logInfo(
`${existingDoc.hits.length > 0 ? "Updated" : "Inserted"} document ${docToSave.id} in partition ${partition}`
);
this.markUnsavedChanges();
return docToSave;
} catch (insertErr) {
logError(
`Failed to ${existingDoc.hits.length > 0 ? "update" : "insert"} document ${docToSave.id}:`,
insertErr
);
// If we removed an existing document but failed to insert the new one,
// we should try to restore the old document
if (existingDoc.hits.length > 0) {
try {
await insert(db, existingDoc.hits[0].document);
} catch (restoreErr) {
logError("Failed to restore previous document version:", restoreErr);
}
}
return undefined;
}
} catch (err) {
logError(`Error upserting document ${docToSave.id}:`, err);
return undefined;
}
});
}
```

File: src/search/indexEventHandler.ts (L9-87)

```ts
const DEBOUNCE_DELAY = 5000; // 5 seconds
export class IndexEventHandler {
private debounceTimer: number | null = null;
private lastActiveFile: TFile | null = null;
private lastActiveFileMtime: number | null = null;
constructor(
private app: App,
private indexOps: IndexOperations,
private dbOps: DBOperations
) {
this.initializeEventListeners();
}
private initializeEventListeners() {
if (getSettings().debug) {
console.log("Copilot Plus: Initializing event listeners");
}
this.app.workspace.on("active-leaf-change", this.handleActiveLeafChange);
this.app.vault.on("delete", this.handleFileDelete);
}
private handleActiveLeafChange = async (leaf: any) => {
if (Platform.isMobile && getSettings().disableIndexOnMobile) {
return;
}
const currentChainType = getChainType();
if (currentChainType !== ChainType.COPILOT_PLUS_CHAIN) {
return;
}
// Get the previously active file that we need to check
const fileToCheck = this.lastActiveFile;
const previousMtime = this.lastActiveFileMtime;
// Update tracking for the new active file
const currentView = leaf?.view;
this.lastActiveFile = currentView instanceof MarkdownView ? currentView.file : null;
this.lastActiveFileMtime = this.lastActiveFile?.stat?.mtime ?? null;
// If there was no previous file or it's the same as current, do nothing
if (!fileToCheck || fileToCheck === this.lastActiveFile) {
return;
}
// Safety check for file stats and mtime
if (!fileToCheck?.stat?.mtime || previousMtime === null) {
return;
}
// Only process markdown files that match inclusion/exclusion patterns
if (fileToCheck.extension === "md") {
const { inclusions, exclusions } = getMatchingPatterns();
const shouldProcess = shouldIndexFile(fileToCheck, inclusions, exclusions);
// Check if file was modified while it was active
const wasModified = previousMtime !== null && fileToCheck.stat.mtime > previousMtime;
if (shouldProcess && wasModified) {
this.debouncedReindexFile(fileToCheck);
}
}
};
private debouncedReindexFile = (file: TFile) => {
if (this.debounceTimer !== null) {
window.clearTimeout(this.debounceTimer);
}
this.debounceTimer = window.setTimeout(() => {
if (getSettings().debug) {
console.log("Copilot Plus: Triggering reindex for file ", file.path);
}
this.indexOps.reindexFile(file);
this.debounceTimer = null;
}, DEBOUNCE_DELAY);
};
```
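To make the partitioning behavior cited above concrete, here is a standalone restatement of the hash (the document IDs below are invented; the real plugin uses content hashes as IDs):

```ts
// djb2-style hash mod N, as in chunkedStorage.assignDocumentToPartition:
// document IDs spread roughly uniformly across partitions, but partition 0
// always also carries the global metadata, so it stays the largest file.
function assignToPartition(docId: string, totalPartitions: number): number {
  const hash = Array.from(docId).reduce(
    (acc, ch) => (acc << 5) - acc + ch.charCodeAt(0), // acc * 31 + char code
    0
  );
  return Math.abs(hash) % totalPartitions;
}

for (const id of ["notes/a.md#0", "notes/b.md#0", "daily/2024-01-01.md#3"]) {
  console.log(id, "->", assignToPartition(id, 8)); // some value in 0..7
}
```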
@logancyang

# Single-File vs Batch Indexing Differences

The Obsidian Copilot plugin's single-file and batch indexing processes differ primarily in how and when index data is persisted to disk, not in where it's stored. Both use the same underlying Orama database and ChunkedStorage system.

## Storage Location

Both processes store data in the same location — the Orama database managed by DBOperations.

## Key Difference: Deferred vs Immediate Persistence

Single-File Indexing (
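In terms of the code cited elsewhere in this thread, that deferred-vs-immediate contrast reduces to markUnsavedChanges versus saveDB — a minimal sketch, with the interface narrowed for illustration:

```ts
// From the code cited in this thread: single-file reindexing only marks the
// in-memory Orama DB dirty (dbOps.markUnsavedChanges()), deferring the disk
// write, while batch indexing awaits dbOps.saveDB() at every checkpoint,
// which rewrites the partitions via ChunkedStorage.
interface DBOps {
  markUnsavedChanges(): void;
  saveDB(): Promise<void>;
}

function afterSingleFileUpsert(dbOps: DBOps): void {
  dbOps.markUnsavedChanges(); // RAM only; persisted later (e.g. final save)
}

async function afterBatchCheckpoint(dbOps: DBOps): Promise<void> {
  await dbOps.saveDB(); // immediate full write of every partition
}
```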
@logancyang To modify the indexing process so the index stays in RAM during a run:

## Key Modification: Remove Checkpoint Saves

The primary change is to remove the checkpoint save logic that currently saves the database multiple times during the indexing process. Currently, the database is saved whenever the indexed count crosses a checkpoint threshold (see the first citation below). This checkpoint logic should be removed or commented out. The function will then work similarly to the single-file indexing path.

## How It Works

The system already operates primarily in RAM:
## Recommended Implementation

Remove this checkpoint code block: 6

Keep the final save at the end: 7

## Notes
## Citations

File: src/search/indexOperations.ts (L163-172)

```ts
// Calculate if we've crossed a checkpoint threshold
const previousCheckpoint = Math.floor(
(this.state.indexedCount - batch.length) / this.checkpointInterval
);
const currentCheckpoint = Math.floor(this.state.indexedCount / this.checkpointInterval);
if (currentCheckpoint > previousCheckpoint) {
await this.dbOps.saveDB();
console.log("Copilot index checkpoint save completed.");
}
```

File: src/search/indexOperations.ts (L189-201)

```ts
setTimeout(() => {
this.dbOps
.saveDB()
.then(() => {
logInfo("Copilot index final save completed.");
this.dbOps.checkIndexIntegrity().catch((err) => {
logError("Background integrity check failed:", err);
});
})
.catch((err) => {
logError("Background save failed:", err);
});
}, 100); // 100ms delay
```

File: src/search/indexOperations.ts (L571-571)

```ts
this.dbOps.markUnsavedChanges();
```

File: src/search/dbOperations.ts (L116-147)

```ts
async saveDB() {
if (Platform.isMobile && getSettings().disableIndexOnMobile) {
return;
}
if (!this.oramaDb || !this.chunkedStorage) {
// Instead of throwing immediately, try to initialize.
// Crucial for new user onboarding.
try {
await this.initializeDB(await EmbeddingsManager.getInstance().getEmbeddingsAPI());
// If still not initialized after attempt, then throw
if (!this.oramaDb || !this.chunkedStorage) {
throw new CustomError("Orama database not found.");
}
} catch (error) {
logError("Failed to initialize database during save:", error);
throw new CustomError("Failed to initialize and save database.");
}
}
try {
await this.chunkedStorage.saveDatabase(this.oramaDb);
this.hasUnsavedChanges = false;
if (getSettings().debug) {
logInfo("Orama database saved successfully at:", this.dbPath);
}
} catch (error) {
logError(`Error saving Orama database:`, error);
throw error;
}
}
```

File: src/search/dbOperations.ts (L362-416)

```ts
async upsert(docToSave: any): Promise<any | undefined> {
if (!this.oramaDb) throw new Error("DB not initialized");
const db = this.oramaDb;
// Use mutex to make the operation atomic
return await this.upsertMutex.runExclusive(async () => {
try {
// Calculate partition first
const partition = this.chunkedStorage?.assignDocumentToPartition(
docToSave.id,
getSettings().numPartitions
);
// Check if document exists
const existingDoc = await search(db, {
term: docToSave.id,
properties: ["id"],
limit: 1,
});
if (existingDoc.hits.length > 0) {
await remove(db, existingDoc.hits[0].id);
}
// Insert into the assigned partition
try {
await insert(db, docToSave);
logInfo(
`${existingDoc.hits.length > 0 ? "Updated" : "Inserted"} document ${docToSave.id} in partition ${partition}`
);
this.markUnsavedChanges();
return docToSave;
} catch (insertErr) {
logError(
`Failed to ${existingDoc.hits.length > 0 ? "update" : "insert"} document ${docToSave.id}:`,
insertErr
);
// If we removed an existing document but failed to insert the new one,
// we should try to restore the old document
if (existingDoc.hits.length > 0) {
try {
await insert(db, existingDoc.hits[0].document);
} catch (restoreErr) {
logError("Failed to restore previous document version:", restoreErr);
}
}
return undefined;
}
} catch (err) {
logError(`Error upserting document ${docToSave.id}:`, err);
return undefined;
}
});
}
```

File: src/search/chunkedStorage.ts (L106-224)

```ts
async saveDatabase(db: Orama<any>): Promise<void> {
try {
const rawData: RawData = await save(db);
const numPartitions = getSettings().numPartitions;
if (numPartitions === 1) {
const legacyPath = this.getLegacyPath();
await this.ensureDirectoryExists(legacyPath);
await this.app.vault.adapter.write(
legacyPath,
JSON.stringify({
...rawData,
schema: db.schema,
})
);
return;
}
// NOTE: Orama RawData docs can be either an array or an object
const docsData = (rawData as any).docs?.docs;
const rawDocs = Array.isArray(docsData) ? docsData : Object.values(docsData || {});
if (getSettings().debug) {
console.log(`Starting save with ${rawDocs.length ?? 0} total documents`);
}
if (!rawDocs || rawDocs.length === 0) {
const metadata: ChunkMetadata = {
numPartitions,
vectorLength: db.schema.embedding.match(/\d+/)[0],
schema: db.schema,
lastModified: Date.now(),
documentPartitions: {},
};
const metadataPath = this.getMetadataPath();
await this.ensureDirectoryExists(metadataPath);
await this.app.vault.adapter.write(metadataPath, JSON.stringify(metadata));
if (getSettings().debug) {
console.log("Saved empty database state");
}
return;
}
const partitions = this.distributeDocumentsToPartitions(rawDocs, numPartitions);
const metadata: ChunkMetadata = {
numPartitions,
vectorLength: db.schema.embedding.match(/\d+/)[0],
schema: db.schema,
lastModified: Date.now(),
documentPartitions: Object.fromEntries(
rawDocs.map((doc: any) => [doc.id, this.assignDocumentToPartition(doc.id, numPartitions)])
),
};
await this.saveMetadata(metadata);
// Create global data object (excluding partitioned fields)
const globalData = {
...rawData,
docs: { docs: {}, count: 0 },
index: {
...(rawData as any).index,
vectorIndexes: undefined,
},
};
// Save partitions
for (const [partitionIndex, docs] of partitions.entries()) {
// Create partition-specific data
const partitionData = {
index: {
vectorIndexes: {
embedding: {
size: (rawData as any).index.vectorIndexes.embedding.size,
vectors: Object.fromEntries(
Object.entries((rawData as any).index.vectorIndexes.embedding.vectors).filter(
([id]) => docs.some((doc) => doc.id === id)
)
),
},
},
},
docs: {
docs: Object.fromEntries(docs.map((doc, index) => [(index + 1).toString(), doc])),
count: docs.length,
},
};
// For first partition, include global data
const finalPartitionData =
partitionIndex === 0
? {
...globalData,
docs: partitionData.docs,
index: {
...globalData.index,
vectorIndexes: partitionData.index.vectorIndexes,
},
}
: partitionData;
const chunkPath = this.getChunkPath(partitionIndex);
await this.ensureDirectoryExists(chunkPath);
await this.app.vault.adapter.write(chunkPath, JSON.stringify(finalPartitionData));
if (getSettings().debug) {
console.log(`Saved partition ${partitionIndex + 1}/${numPartitions}`);
}
}
if (getSettings().debug) {
console.log("Saved all partitions");
}
} catch (error) {
console.error(`Error saving database:`, error);
throw new CustomError(`Failed to save database: ${error.message}`);
}
}
```
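If crash safety still matters, a gentler variant of this recommendation is to gate the checkpoint save behind a setting rather than delete it. A sketch of a drop-in replacement for the cited checkpoint block, where ramOnlyIndexing is a hypothetical flag that does not exist in the plugin today:

```ts
// Sketch: skip checkpoint saves only when a (hypothetical) ramOnlyIndexing
// setting is enabled, instead of removing the block outright. The final save
// and integrity check at the end of indexing stay untouched.
if (!getSettings().ramOnlyIndexing) { // hypothetical flag, not in the plugin
  const previousCheckpoint = Math.floor(
    (this.state.indexedCount - batch.length) / this.checkpointInterval
  );
  const currentCheckpoint = Math.floor(this.state.indexedCount / this.checkpointInterval);
  if (currentCheckpoint > previousCheckpoint) {
    await this.dbOps.saveDB();
    console.log("Copilot index checkpoint save completed.");
  }
}
```

This keeps crash recovery as an explicit user trade-off rather than silently removing it.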
We have moved over to index-free search in v3. Our "semantic search" option is a backward-compatible mode. Having the vector DB inside the Electron/browser environment has its limitations. The right way for larger vaults is to use agentic search and/or a dedicated standalone local vector DB.
With the upcoming "self-host" mode we will have a much more scalable local vector db that is cross-platform but separate from the Obsidian environment.
FYI this is on our website.