Allow parametrizing components without subclassing #450

pmeier · 2024-07-12T12:13:38Z

As of now, the __init__ of the user definable components contains potentially expensive setup code. To still have access to the requirements and the name of the components, we have defined class methods

ragna/ragna/core/_utils.py

Lines 73 to 75 in 55f7fc5

    
           @classmethod 
        
           def requirements(cls) -> list[Requirement]: 
        
               return []

ragna/ragna/core/_components.py

Lines 33 to 39 in 55f7fc5

    
               @classmethod 
        
               def display_name(cls) -> str: 
        
                   """ 
        
                   Returns: 
        
                       Component name. 
        
                   """ 
        
                   return cls.__name__

This works well for simple cases. For realistic use cases however, the methods have to be parametrized. This is currently solved by subclassing a baseclass and putting class constants on the subclass, e.g.

ragna/ragna/assistants/_anthropic.py

Lines 8 to 19 in 55f7fc5

    
           class AnthropicAssistant(HttpApiAssistant): 
        
               _API_KEY_ENV_VAR = "ANTHROPIC_API_KEY" 
        
               _STREAMING_PROTOCOL = HttpStreamingProtocol.SSE 
        
               _MODEL: str 
        
               @classmethod 
        
               def _extra_requirements(cls) -> list[Requirement]: 
        
                   return [PackageRequirement("httpx_sse")] 
        
               @classmethod 
        
               def display_name(cls) -> str: 
        
                   return f"Anthropic/{cls._MODEL}"

ragna/ragna/assistants/_anthropic.py

Lines 74 to 86 in 55f7fc5

    
           class ClaudeOpus(AnthropicAssistant): 
        
               """[Claude 3 Opus](https://docs.anthropic.com/claude/docs/models-overview) 
        
               !!! info "Required environment variables" 
        
                   - `ANTHROPIC_API_KEY` 
        
               !!! info "Required packages" 
        
                   - `httpx_sse` 
        
               """ 
        
               _MODEL = "claude-3-opus-20240229"

This approach is not really scalable. For assistant base classes with a lot of options for the parameters, e.g. OllamaAssistant, we have to provide as many subclasses. In the case of Ollama, we just added some of the available models, but certainly not all. Furthermore, the field of available LLMs is ever expanding and thus if a new model comes out, users will have to wait for another Ragna release to be able to use it or are forced to write the assistant themselves.

It would be much preferable to instantiate the components like any other Python object and allow parameters being passed to it. For example,

assistant = ragna.assistants.OllamaGemma2B()

could turn into

assistant = ragna.assistants.OllamaAssistant(model="gemma:2b")

This would eliminate the need for any subclasses of the OllamAssistant provided by Ragna while at the same time automatically supports future models assuming the API stays the same.

If we want to keep the behavior of the name and the requirements being accessible before running expensive setup code, we would need to keep the classmethods, but now have to pass the same parameters to them that we would pass to __init__. That is kinda awkward:

OllamaAssistant.display_name(model="gemma:2b")

Another solution, and the one that I'm advocating here, is to explicitly put expensive setup code in a dedicated .setup() method. Meaning, the __init__ is cheap by design and thus the requirements and name functions can become regular methods. For example

class MyLocalAssistant(ragna.core.Assistant):
    def __init__(self, model_name: str) -> None:
        self._model_name = model_name
        self._model: Callable
        
    def requirements():
        return ["torch"]
    
    def display_name():
        return f"MyLocalAssistant/{self._model_name}"
        
    def setup():
        import torch
        self._model = torch.load_model()

In the regular Ragna workflow, this means no overhead for the user. We can just call .setup() at the place where we are currently call the constructor. Only if one uses the class outside of ragna.Rag() there is some new overhead with the necessity of calling .setup() manually. But I'm ok with that, given that we aren't optimizing for this UX anyway.

Assuming my proposal above is accepted, we also need to look on how the extra parameters to the components can be passed through the configuration. Right now this is a non-issue, because we can just provide an import string for the specific subclass that we want, e.g.

assistants = [
    "ragna.assistants.RagnaDemoAssistant",
    "ragna.assistants.OllamaGemma2B",
]

With the changes detailed above, this will need to change to

assistants = [
    "ragna.assistants.RagnaDemoAssistant",
    {class="ragna.assistants.OllamaAssistant", model="gemma:2b"},
]

With that we can pass the parameters to the class rather simply. But the way back, i.e. from a Python object to an entry in the configuration file, is problematic. Assuming we have an instant of MyUserDefinedAssistant, we can only determine the class correctly. We have no idea, which parameters were used to create the class. If the original instance was created from a config file, we could store the parameters under a hidden attribute, e.g. __ragna_init_kwargs__, and read that on the way out. But this is rather brittle.

As stated, this is only an issue if we want to ever serialize a config. Right now we only require this in one place: using the interactive wizard to write a configuration file

ragna/ragna/deploy/_cli/core.py

Line 68 in 55f7fc5

config.to_file(output_path, force=force)

#449 is an RFD that would make the config wizard mostly obsolete. Assuming that is accepted, we might not even have an issue here.

The text was updated successfully, but these errors were encountered:

pmeier added area: user-experience 🙋🏻‍♀️ type: RFD ⚖️ Decision making labels Jul 12, 2024

pmeier mentioned this issue Jul 12, 2024

Naming for Gemma2B and Others #384

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow parametrizing components without subclassing #450

Allow parametrizing components without subclassing #450

pmeier commented Jul 12, 2024 •

edited

Loading

Allow parametrizing components without subclassing #450

Allow parametrizing components without subclassing #450

Comments

pmeier commented Jul 12, 2024 • edited Loading

pmeier commented Jul 12, 2024 •

edited

Loading