Skip to content

Conversation

@HARISHSENTHIL
Copy link

This PR refactors the indexing system to support dynamic index loading.

  • Removed dependency on AvailableInfiniGramIndexId enum for index initialization.
  • Introduced a new dynamic InfiniGramProcessor constructor which accepts a direct configuration dictionary.
  • Allows new indexes to be added at runtime via configuration without modifying code or enums.
  • Updated service layers to utilize dynamic index initialization.
  • Preserved backward compatibility for enum-based initialization for existing flows.

@mtblanton
Copy link
Contributor

This is fantastic, thank you! I've been thinking about how to get this to work.

A few questions:

  • IIRC there's some initialization the processor has to do on startup. Are we saving or otherwise caching the processors after their first initialization so we only get that perf hit once?
  • We'll likely need to use other tokenizers in the future. How do you see that working with what you have now?

@HARISHSENTHIL
Copy link
Author

Thanks for the thoughtful questions @mtblanton!

I'm actively working on both aspects — caching and tokenizer flexibility — and will bring up a clean solution for them. The goal is to ensure minimal overhead on first load and make it easy to plug in different tokenizers dynamically per index. Will update you soon with a refined version!

@HARISHSENTHIL
Copy link
Author

Added Processor Caching
-Implemented _processor_cache to avoid expensive re-initialization on every request

  • Processors now cached after first load, eliminating performance hit
  • Maintains original enum-based system performance while supporting dynamic indexing

Enhanced Tokenizer Flexibility

  • Extended tokenizer factory with get_tokenizer_for_index() for per-index tokenizer mapping
  • Added support for multiple tokenizer types (LLaMA-2, LLaMA-3, extensible for future models)
  • Framework ready for any future tokenizers without code changes

Comment on lines +12 to +15
# def InfiniGramProcessorFactoryPathParam(
# index: AvailableInfiniGramIndexId,
# ) -> InfiniGramProcessor:
# return indexes[index]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# def InfiniGramProcessorFactoryPathParam(
# index: AvailableInfiniGramIndexId,
# ) -> InfiniGramProcessor:
# return indexes[index]

otel_span.set_attribute(SpanAttributes.MESSAGING_CLIENT_ID, worker.id)

infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]
#infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#infini_gram_index = indexes[AvailableInfiniGramIndexId(index)]

Comment on lines +45 to +49
# def __init__(self, index: AvailableInfiniGramIndexId):
# self.index = index.value
# index_mapping = index_mappings[index.value]

# self.tokenizer = index_mapping["tokenizer"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# def __init__(self, index: AvailableInfiniGramIndexId):
# self.index = index.value
# index_mapping = index_mappings[index.value]
# self.tokenizer = index_mapping["tokenizer"]

Comment on lines +67 to +68
# index_dir=index_mapping["index_dir"],
# index_dir_diff=index_mapping["index_dir_diff"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# index_dir=index_mapping["index_dir"],
# index_dir_diff=index_mapping["index_dir_diff"],



indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}
# indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# indexes = {index: InfiniGramProcessor(index) for index in AvailableInfiniGramIndexId}

Copy link
Contributor

@mtblanton mtblanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay in the review! I need to make sure my Github notifs are set up correctly.

I need to test this myself but this is looking great! Suggested changes are just to get rid of commented code.

@mtblanton mtblanton closed this Nov 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants