Generalize Custom search() Method #1826

Open
sam-hey opened this issue Jan 17, 2025 · 7 comments

@sam-hey (Contributor) commented Jan 17, 2025

Currently, only BM25 uses a custom implementation of the search() method, achieved by checking whether the model name is bm25s. This approach is neither scalable nor practical for future implementations that require custom search methods, such as ColBERT with an index. A more flexible and modular solution is needed to accommodate diverse search strategies.

        elif (
            hasattr(self.retriever.model.model, "mteb_model_meta")
            and self.retriever.model.model.mteb_model_meta.name == "bm25s"
        ):
            return self.retriever.model.model.search(
                corpus,
                queries,
                self.top_k,
                task_name=self.task_name,  # type: ignore
            )

https://github.com/embeddings-benchmark/mteb/blob/main/mteb/evaluation/evaluators/RetrievalEvaluator.py#L472:L475
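
For illustration, one could instead dispatch on capability rather than on the model name; this is only a minimal sketch of the idea, not working MTEB code:

    # Sketch: check whether the wrapped model brings its own `search`
    # implementation, instead of hard-coding model names like "bm25s".
    model = self.retriever.model.model
    if hasattr(model, "search"):
        # Model-specific retrieval (BM25 today; ColBERT with an index later).
        return model.search(corpus, queries, self.top_k, task_name=self.task_name)
    # Otherwise fall back to the default dense-retrieval path.
    return self.retriever.search(
        corpus, queries, self.top_k, task_name=self.task_name
    )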

@KennethEnevoldsen (Contributor)

Completely agree. It would be nice to just specify an interface for this, such that any model could implement it. (cc @orionw)
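
Something like a small search protocol, say (names and signatures below are only illustrative, not an agreed-upon API):

    from typing import Protocol, runtime_checkable


    @runtime_checkable
    class SearchProtocol(Protocol):
        """Illustrative interface any model could implement for custom search."""

        def search(
            self,
            corpus: dict[str, dict[str, str]],
            queries: dict[str, str],
            top_k: int,
            task_name: str,
            **kwargs,
        ) -> dict[str, dict[str, float]]:
            """Return {query_id: {doc_id: score}} for the top_k hits per query."""
            ...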

@orionw (Contributor) commented Jan 17, 2025

Thanks for raising this, @sam-hey!

I can definitely see the benefit! On the other hand, having it standardized means each model class exposes the same function, which makes things more reliable.

I can see both sides, but personally I would prefer to keep the core search functions in MTEB, so users can see them there and assume each model searches the same way within its own “class” (e.g., that all dense retrievers use the same base functionality). I think it’d be great if we made BM25 a first-class MTEB model so we didn’t have to rely on that check (and could also add other sparse non-neural versions like Pyserini).

On the other hand, there are probably three-ish other model “classes” or types that would involve different search functionality: multi-vector (like ColBERT, as you say), and then perhaps neural sparse retrieval (like SPLADE) and generative retrieval.

So we should definitely make it possible for each of these to be added, which, as @KennethEnevoldsen says, likely involves a change to the interface. But since there are fewer than 10 model “classes”, it seems like we could do that with an if statement, roughly as sketched below. Perhaps it’s too early in the morning and I’m missing something!
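
Roughly what I have in mind, purely as a sketch (the marker classes below are hypothetical, not existing MTEB types):

    class CrossEncoderModel: ...  # hypothetical marker classes,
    class MultiVectorModel: ...   # one per model "class"
    class SparseModel: ...


    def dispatch_search(retriever, corpus, queries, top_k, task_name):
        # One explicit branch per model "class", kept inside MTEB.
        model = retriever.model.model
        if isinstance(model, CrossEncoderModel):  # rerankers
            return retriever.search_cross_encoder(corpus, queries, top_k)
        if isinstance(model, (MultiVectorModel, SparseModel)):  # ColBERT, BM25/SPLADE
            return model.search(corpus, queries, top_k, task_name=task_name)
        # Default: dense retrieval.
        return retriever.search(corpus, queries, top_k, task_name=task_name)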

@sam-hey (Contributor, Author) commented Feb 1, 2025

I would like to propose an improvement to MTEB by introducing an Adapter Class between Model Classes and Tasks. Currently, the logic in RetrievalEvaluator is growing, making it less scalable and harder to maintain. This structure tightly couples retrieval methods with the evaluator, requiring modifications to the core logic whenever a new retrieval method is added.

    if self.is_cross_encoder:
        return self.retriever.search_cross_encoder(
            corpus, queries, self.top_k, instructions=instructions, **kwargs
        )
    elif (
        hasattr(self.retriever.model.model, "mteb_model_meta")
        and self.retriever.model.model.mteb_model_meta.name == "bm25s"
    ):
        return self.retriever.model.model.search(
            corpus,
            queries,
            self.top_k,
            task_name=self.task_name,  # type: ignore
            instructions=instructions,
            score_function="bm25",
            **kwargs,
        )
    else:
        return self.retriever.search(
            corpus,
            queries,
            self.top_k,
            instructions=instructions,
            request_qid=qid,
            task_name=self.task_name,
            **kwargs,
        )

To improve scalability and separation of concerns, I suggest introducing an Adapter pattern, where each model type (e.g., Cross-Encoder, Bi-Encoder) implements only the tasks it supports.

    from abc import ABC, abstractmethod
    from typing import Any


    class ModelTaskAdapter(ABC):
        @abstractmethod
        def searchRetrieval(
            self, corpus, queries, top_k: int, **kwargs: Any
        ) -> dict[str, dict[str, float]]:
            """Search retrieval task implementation."""

        @abstractmethod
        def pairClassification(self, *args: Any, **kwargs: Any):
            """Pair classification task implementation."""


    class CrossEncoderModelTaskAdapter(ModelTaskAdapter):
        def searchRetrieval(
            self, corpus, queries, top_k: int, **kwargs: Any
        ) -> dict[str, dict[str, float]]:
            # some implementation
            return {}

        def pairClassification(self, *args: Any, **kwargs: Any):
            raise NotImplementedError("Cross-Encoder does not support pair classification")


    class BiEncoderModelTaskAdapter(ModelTaskAdapter):
        def searchRetrieval(
            self, corpus, queries, top_k: int, **kwargs: Any
        ) -> dict[str, dict[str, float]]:
            # some implementation
            return {}

        def pairClassification(self, *args: Any, **kwargs: Any):
            # some implementation
            return {}

cc @Samoed @isaac-chung

@Samoed (Collaborator) commented Feb 1, 2025

This is a very interesting approach! I see what you're aiming for, but I don’t see many issues with the current evaluators. Your approach would require significant refactoring if we want to make changes, and it might also make it harder to replicate model results. I think we can just add shared functions that are used in many places, like encode and similarity.
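
Concretely, something like this (a minimal sketch; `BaseWrapper` is a hypothetical name, not an existing MTEB class):

    import numpy as np


    class BaseWrapper:
        """Hypothetical base class holding the handful of shared methods."""

        def encode(self, sentences: list[str], **kwargs) -> np.ndarray:
            raise NotImplementedError  # each model wrapper provides its own

        def similarity(self, embeddings1: np.ndarray, embeddings2: np.ndarray) -> np.ndarray:
            # Shared default: cosine similarity; individual models can override.
            a = embeddings1 / np.linalg.norm(embeddings1, axis=1, keepdims=True)
            b = embeddings2 / np.linalg.norm(embeddings2, axis=1, keepdims=True)
            return a @ b.T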

@KennethEnevoldsen (Contributor)

@sam-hey I do agree with your concern, and I agree that we should allow for as much flexibility as possible on the model end. I also agree that the retrieval evaluation is growing quite complicated.

This suggestion, however, would be a large departure from the current approach, where we specify a quite simple interface, and I believe we can resolve the current issues with fewer breaking changes. (Backward compatibility is a concern that has been raised.)

@sam-hey (Contributor, Author) commented Feb 1, 2025

I'm not sure if I'm missing something, but in my opinion it should be possible to simply inherit from one of the adapter classes in the model wrapper classes (SentenceTransformerWrapper, etc.). This wouldn't require too many changes and would make it much cleaner to call methods like searchRetrieval() directly.
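
For illustration, something along these lines (a rough sketch building on the adapter classes above; the constructor details are hypothetical):

    # Hypothetical sketch: the wrapper inherits the adapter for its model
    # class and thereby gains exactly the task methods it supports.
    from sentence_transformers import SentenceTransformer


    class SentenceTransformerWrapper(BiEncoderModelTaskAdapter):
        def __init__(self, model_name: str):
            self.model = SentenceTransformer(model_name)

        def searchRetrieval(self, corpus, queries, top_k, **kwargs):
            # Encode queries and corpus with self.model, score, and return
            # {query_id: {doc_id: score}}; details omitted here.
            return {}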

Even though the UML looks quite messy, this approach would significantly reduce complexity while also making it clear which operations are implemented for each type of model class.

@isaac-chung (Collaborator)

@sam-hey thanks for your suggestion. As mentioned by @orionw and @KennethEnevoldsen previously, we do not currently plan on making large interface changes. Most of the feedback we got recently was on the frequency and amount of interface changes that made the library harder to use. As such, we will not pursue this avenue at this time. Thanks again for your efforts so far.

That said, there are many other open issues that could use your help. In addition to adding models, datasets, and benchmarks, we would love contributions (in the shorter term) on the v2 issues (see the pinned issue), any leaderboard v2 issue, and issues that help improve the test suite or make the library more lightweight. Let me know if you'd like some pointers!
