-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v2] Refactor to Use Enums Instead of Literals #1784
Comments
So will just outline the counter argument. Since you have outlined the reason to use Enum quite well I will here focus more on Literals. vs code does autocomplete literals: You can also see here that it in fact detects the error that "" is not valid. Literal are not magic constants in the same way that string are. To take a few pointers from the blog:
literals in MTEB does have this feature e.g. all domains (which is type hinted) is contained in the list of DOMAINS. The same is the case for "cosine" which is in DISTANCE_METRICS.
I agree with this point. No way to change across the repo rename "cosine" to "Cosine Distance".
This doesn't really apply, it would just be import ...
domains=[DOMAINS.PROGRAMMING] which is more verbose than domains = ["programming"] The pros/cons of using literal (just to get them stated):
In a way a literal can be seen as a compromise between the simple (but unruly) string and the strict and verbose Enum. A few questions:
My general opinion is that for specialized codebased (a few long time maintainers) Enums would be the correct solution. I am unsure if it is the correct solution here with the main concerns being loss of backward compatibility without enough upside and loss of simplicity making it harder for new users to contribute. Would love to hear the opinion of other maintainers: |
The enum is converted to a string during serialization, which is why the mapping for SentenceTransformers functions correctly.
As far as I can tell, it is fully compatible with the previous implementation. However, potential issues could arise if someone has overridden the available scoring functions or if ModelMeta is passed, as this introduces a minor change. Nonetheless, such cases are likely uncommon. |
Although it was argued that Enums are not used, this is actually not the case.
|
Maybe I can also address this point, as I am quite new to the project:
I found it significantly more difficult to add a new model with the Literals compared to using just enums |
That is quite nice to know. for other readers, here is just a small overview of Enums with pydantic objects: from pydantic import BaseModel
from enum import Enum
class MyEnum(str, Enum):
option1 = "option 1"
option2 = "option 2"
class Tester(BaseModel):
var: MyEnum
tester = Tester(var="option 1")
# works which is nice
# the print is a bit less readable:
# Tester(var=<MyEnum.option1: 'option 1'>)
# though the value needs a bit of handling on the user-side
tester.var.value
# converting to json:
tester.model_dump_json()
# '{"var":"option 1"}' An important note: we currently have two major merges coming up (v2.0.0 and MIEB), this will likely cause notably merge conflicts so it might be worth waiting with this change until post MIEB -> v2.0.0 and MAIN -> v2.0.0. (cc @gowitheflow-1998) |
Closing this as there seems to be no interest in transitioning to Enums |
Closed it due to the following:
Good to know there is support for this change. |
Description:
Currently, the codebase relies on literals to describe scoring functions and other constants (magic literals), which can make the code harder to understand and maintain. Using literals for values like scoring functions instead of Enums makes the codebase less clear and reduces the benefits of modern IDE features such as autocompletion and error detection.
In line with the refactoring principle outlined in Refactoring Guru's "Replace Magic Number with Symbolic Constant", I propose refactoring the code to replace these magic literals with Enums. This change will enhance code clarity, facilitate easier development, and prevent potential errors caused by the incorrect use of raw literals.
For example, using literals like
cosine
ordot
as scoring function identifiers makes the code harder to inspect and add new functions to, especially as the project grows. The same applies to other literals used in the project, such asISO_LANGUAGE_SCRIPT
.VSCode does not provide autocompletion for Literal types by default when using the Python extension.
Proposed Solution:
Benefits:
Please provide any feedback or thoughts on this proposed change.
Thx a lot!
Start of the Discussion in: #1759
The text was updated successfully, but these errors were encountered: