Dynamic indexing initialized #112

HARISHSENTHIL · 2025-06-25T07:13:05Z

This PR refactors the indexing system to support dynamic index loading.

- Removed the dependency on the AvailableInfiniGramIndexId enum for index initialization.
- Introduced a new dynamic InfiniGramProcessor constructor that accepts a configuration dictionary directly.
- New indexes can be added at runtime via configuration, without modifying code or enums.
- Updated the service layers to use dynamic index initialization.
- Preserved backward compatibility with enum-based initialization for existing flows.
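For illustration, here is a minimal sketch of what a config-driven constructor with an enum-based fallback could look like. The PR's actual code is not shown here, so the class body, the `from_enum` classmethod, the sample enum member and the paths are all hypothetical; only the `tokenizer`, `index_dir` and `index_dir_diff` keys come from the commented-out constructor in the diff.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Mapping


class AvailableInfiniGramIndexId(Enum):
    # Hypothetical member; the real enum's values are not shown in this PR.
    PILEVAL_LLAMA = "pileval-llama"


# Hypothetical enum-keyed configuration, mirroring the keys used in the
# commented-out constructor ("tokenizer", "index_dir", "index_dir_diff").
index_mappings: dict[str, dict[str, Any]] = {
    "pileval-llama": {
        "tokenizer": "llama-2",
        "index_dir": "/data/index/pileval-llama",
        "index_dir_diff": [],
    },
}

REQUIRED_KEYS = ("tokenizer", "index_dir")


class InfiniGramProcessor:
    def __init__(self, index_id: str, config: Mapping[str, Any]) -> None:
        # Dynamic path: all settings come from the config dictionary, so a new
        # index needs only a new config entry, not a new enum member.
        missing = [key for key in REQUIRED_KEYS if key not in config]
        if missing:
            raise ValueError(f"config for {index_id!r} is missing {missing}")
        self.index = index_id
        self.tokenizer = config["tokenizer"]
        self.index_dir = config["index_dir"]
        self.index_dir_diff = list(config.get("index_dir_diff", []))
        # A real processor would open the on-disk index here (omitted).

    @classmethod
    def from_enum(cls, index: AvailableInfiniGramIndexId) -> InfiniGramProcessor:
        # Backward-compatible path for existing enum-based callers.
        return cls(index.value, index_mappings[index.value])
```

A new index could then be created from its config alone, e.g. `InfiniGramProcessor("my-new-index", {"tokenizer": "llama-3", "index_dir": "/data/index/my-new-index"})` (hypothetical values), while existing callers keep using `InfiniGramProcessor.from_enum(...)`. Validating required keys up front means a bad config fails at load time rather than on the first query.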

mtblanton · 2025-06-25T17:05:09Z

This is fantastic, thank you! I've been thinking about how to get this to work.

A few questions:

IIRC there's some initialization the processor has to do on startup. Are we saving or otherwise caching the processors after their first initialization so we only get that perf hit once?
We'll likely need to use other tokenizers in the future. How do you see that working with what you have now?

HARISHSENTHIL · 2025-07-13T10:52:57Z

Thanks for the thoughtful questions @mtblanton!

I'm actively working on both aspects (caching and tokenizer flexibility) and will follow up with a clean solution for each. The goal is to keep first-load overhead minimal and make it easy to plug in a different tokenizer per index. Will update you soon with a refined version!

HARISHSENTHIL · 2025-07-25T14:25:16Z

Added Processor Caching
- Implemented _processor_cache to avoid expensive re-initialization on every request
- Processors are now cached after first load, so the initialization cost is paid only once
- Maintains the original enum-based system's performance while supporting dynamic indexing

Enhanced Tokenizer Flexibility
- Extended the tokenizer factory with get_tokenizer_for_index() for per-index tokenizer mapping
- Added support for multiple tokenizer types (LLaMA-2, LLaMA-3, extensible to future models)
- The framework is ready to accommodate future tokenizers without code changes
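A rough sketch of how a per-index tokenizer lookup along these lines might work. The PR names `get_tokenizer_for_index()` but does not show its signature, so the parameters, the `register_tokenizer` helper, the registry and `StubTokenizer` below are all hypothetical; a real implementation would load actual LLaMA tokenizers instead of the stub.

```python
from typing import Callable, Dict


class StubTokenizer:
    # Hypothetical placeholder for a real tokenizer object.
    def __init__(self, name: str) -> None:
        self.name = name


# Registry of tokenizer builders, keyed by the name used in index configs.
_TOKENIZER_BUILDERS: Dict[str, Callable[[], StubTokenizer]] = {}
# Built tokenizers are shared by every index that uses the same one.
_tokenizer_cache: Dict[str, StubTokenizer] = {}


def register_tokenizer(name: str, builder: Callable[[], StubTokenizer]) -> None:
    _TOKENIZER_BUILDERS[name] = builder


register_tokenizer("llama-2", lambda: StubTokenizer("llama-2"))
register_tokenizer("llama-3", lambda: StubTokenizer("llama-3"))


def get_tokenizer_for_index(index_id: str, index_configs: Dict[str, dict]) -> StubTokenizer:
    # The index config names its tokenizer; the registry knows how to build it.
    name = index_configs[index_id]["tokenizer"]
    if name not in _TOKENIZER_BUILDERS:
        raise KeyError(f"no tokenizer registered as {name!r} (index {index_id!r})")
    if name not in _tokenizer_cache:
        _tokenizer_cache[name] = _TOKENIZER_BUILDERS[name]()
    return _tokenizer_cache[name]
```

Under this layout, supporting another model would take one `register_tokenizer` call plus a config entry naming it; indexes that share a tokenizer also share a single loaded instance.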

mtblanton · 2025-09-03T22:54:42Z

api/src/infinigram/infini_gram_dependency.py

+# def InfiniGramProcessorFactoryPathParam(
+#     index: AvailableInfiniGramIndexId,
+# ) -> InfiniGramProcessor:
+#     return indexes[index]


Suggested change

-# def InfiniGramProcessorFactoryPathParam(
-#     index: AvailableInfiniGramIndexId,
-# ) -> InfiniGramProcessor:
-#     return indexes[index]
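One thing the enum-typed `index` parameter gave for free: if `InfiniGramProcessorFactoryPathParam` is used as a FastAPI path-parameter dependency (as its name suggests, though the diff doesn't confirm it), FastAPI rejects values outside an `Enum`-typed path parameter automatically. A string-keyed replacement has to validate the id itself. A minimal framework-agnostic sketch, where `INDEX_CONFIGS`, `processor_for_index` and the stub processor are hypothetical names:

```python
from typing import Any, Dict


class InfiniGramProcessor:
    # Hypothetical stub for the real processor.
    def __init__(self, index_id: str, config: Dict[str, Any]) -> None:
        self.index = index_id
        self.config = config


# Hypothetical runtime configuration; new indexes are added here, not in an enum.
INDEX_CONFIGS: Dict[str, Dict[str, Any]] = {
    "pileval-llama": {"tokenizer": "llama-2", "index_dir": "/data/index/pileval-llama"},
}
_processor_cache: Dict[str, InfiniGramProcessor] = {}


def processor_for_index(index: str) -> InfiniGramProcessor:
    # Without the enum, unknown ids must be rejected explicitly.
    if index not in INDEX_CONFIGS:
        raise LookupError(f"unknown index {index!r}; known: {sorted(INDEX_CONFIGS)}")
    # Build once, then reuse (single-threaded sketch; no lock).
    if index not in _processor_cache:
        _processor_cache[index] = InfiniGramProcessor(index, INDEX_CONFIGS[index])
    return _processor_cache[index]
```

In an HTTP handler, the `LookupError` would typically be translated into a 404 response rather than surfacing as a 500.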

mtblanton · 2025-09-03T22:55:11Z

attribution_worker/worker.py

            otel_span.set_attribute(SpanAttributes.MESSAGING_CLIENT_ID, worker.id)

-        infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]
+        #infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]


Suggested change

-        #infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]

mtblanton · 2025-09-03T22:55:36Z

packages/infini-gram-processor/src/infini_gram_processor/processor.py

+    # def __init__(self, index: AvailableInfiniGramIndexId):
+    #     self.index = index.value
+    #     index_mapping = index_mappings[index.value]
+
+    #     self.tokenizer = index_mapping["tokenizer"]


Suggested change

-    # def __init__(self, index: AvailableInfiniGramIndexId):
-    #     self.index = index.value
-    #     index_mapping = index_mappings[index.value]
-    #     self.tokenizer = index_mapping["tokenizer"]

mtblanton · 2025-09-03T22:56:13Z

packages/infini-gram-processor/src/infini_gram_processor/processor.py

+            # index_dir=index_mapping["index_dir"],
+            # index_dir_diff=index_mapping["index_dir_diff"],


Suggested change

-            # index_dir=index_mapping["index_dir"],
-            # index_dir_diff=index_mapping["index_dir_diff"],

mtblanton · 2025-09-03T22:56:23Z

packages/infini-gram-processor/src/infini_gram_processor/processor.py



-indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}
+# indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}


Suggested change

-# indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}

mtblanton

Apologies for the delay in the review! I need to make sure my GitHub notifications are set up correctly.

I still need to test this myself, but this is looking great! The suggested changes just remove commented-out code.

Dynamic indexing initialized (9c0aa90)

Flexible tokenizer and cache were added (f3f1489)

HARISHSENTHIL closed this Jul 25, 2025

HARISHSENTHIL reopened this Jul 25, 2025

mtblanton reviewed Sep 3, 2025


mtblanton closed this Nov 18, 2025
