[TLM] Add custom types to docstrings #293

Merged
merged 7 commits on Aug 31, 2024
37 changes: 28 additions & 9 deletions cleanlab_studio/studio/trustworthy_language_model.py
@@ -4,6 +4,14 @@
**This module is not meant to be imported and used directly.** Instead, use [`Studio.TLM()`](/reference/python/studio/#method-tlm) to instantiate a [TLM](#class-tlm) object, and then you can use the methods like [`prompt()`](#method-prompt) and [`get_trustworthiness_score()`](#method-get_trustworthiness_score) documented on this page.

The [Trustworthy Language Model tutorial](/tutorials/tlm/) further explains TLM and its use cases.

### Type Aliases

Type aliases for the response types returned by TLM methods.

- `TLMScoreResponse = Union[float, TLMScore]`: a single TLM response, either a float representing the trustworthiness score, or a [TLMScore](#class-tlmscore) object containing both the trustworthiness score and log dictionary keys.
- `TLMBatchScoreResponse = Union[List[float], List[TLMScore]]`: a batch TLM response, either a list of floats or a list of [TLMScore](#class-tlmscore) objects, each containing both the trustworthiness score and log dictionary keys. The list has the same length as the input list of prompt-response pairs.
- `TLMOptionalBatchScoreResponse = Union[List[Optional[float]], List[Optional[TLMScore]]]`: a batch TLM response, either a list of floats or a list of [TLMScore](#class-tlmscore) objects containing both the trustworthiness score and log dictionary keys, where an entry is None if the TLM call for that pair failed. The list has the same length as the input list of prompt-response pairs.
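
For illustration only, a minimal sketch of collapsing these union types to a plain score; the helper name `to_score` is hypothetical and not part of this module:

```python
from typing import Optional

def to_score(result) -> Optional[float]:
    # Collapse a TLMScoreResponse (a float or a TLMScore dict) into a bare score.
    if isinstance(result, dict):
        # TLMScore is a TypedDict, i.e. a plain dict at runtime; the score can be
        # None when TLM was run with the "base" quality preset.
        return result.get("trustworthiness_score")
    return result  # already a plain float
```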
"""

from __future__ import annotations
@@ -22,9 +30,9 @@
from cleanlab_studio.errors import ValidationError
from cleanlab_studio.internal.api import api
from cleanlab_studio.internal.constants import (
_TLM_DEFAULT_MODEL,
_TLM_MAX_RETRIES,
_VALID_TLM_QUALITY_PRESETS,
_TLM_DEFAULT_MODEL,
)
from cleanlab_studio.internal.tlm.concurrency import TlmRateHandler
from cleanlab_studio.internal.tlm.validation import (
@@ -192,7 +200,7 @@ async def _batch_get_trustworthiness_score(
capture_exceptions (bool): whether to return None in place of the response for any errors or timeouts encountered while processing some inputs

Returns:
Union[TLMBatchScoreResponse, TLMOptionalBatchScoreResponse]: TLM trustworthiness score for each prompt (in supplied order)
Union[TLMBatchScoreResponse, TLMOptionalBatchScoreResponse]: TLM trustworthiness score for each prompt (in supplied order).
"""
if capture_exceptions:
per_query_timeout, per_batch_timeout = self._timeout, None
@@ -437,8 +445,10 @@ def get_trustworthiness_score(
response (str | Sequence[str]): existing response (or list of responses) associated with the input prompts.
These can be from any LLM or human-written responses.
Returns:
float | List[float]: float or list of floats (if multiple prompt-responses were provided) corresponding
to the TLM's trustworthiness score.
TLMScoreResponse | TLMBatchScoreResponse: If a single prompt/response pair was passed in, this method returns either a float (the trustworthiness score) or a TLMScore object containing both the trustworthiness score and log dictionary keys. See the documentation for [TLMScoreResponse](#type-aliases) for more details.

If a list of prompt/response pairs was passed in, this method returns either a list of floats (the trustworthiness scores) or a list of TLMScore objects, one per prompt-response pair, each containing both the trustworthiness score and log dictionary keys. See the documentation for [TLMBatchScoreResponse](#type-aliases) for more details.

The score quantifies how confident TLM is that the given response is good for the given prompt.
If running on many prompt-response pairs simultaneously:
this method will raise an exception if any TLM errors or timeouts occur.
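
A rough usage sketch (the API key placeholder and example prompts are illustrative, not taken from this diff):

```python
from cleanlab_studio import Studio

studio = Studio("<YOUR_API_KEY>")  # authenticated Studio client
tlm = studio.TLM()                 # see Studio.TLM() for quality presets and TLMOptions

# Single prompt/response pair -> TLMScoreResponse
# (a float, or a TLMScore dict if logging was requested via TLMOptions)
single_score = tlm.get_trustworthiness_score(
    prompt="What is the capital of France?",
    response="Paris",
)

# List of pairs -> TLMBatchScoreResponse (same length and order as the inputs)
batch_scores = tlm.get_trustworthiness_score(
    prompt=["What is 2 + 2?", "What is the capital of Japan?"],
    response=["4", "Kyoto"],
)
```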
@@ -493,7 +503,10 @@ def try_get_trustworthiness_score(
prompt (Sequence[str]): list of prompts for the TLM to evaluate
response (Sequence[str]): list of existing responses corresponding to the input prompts (from any LLM or human-written)
Returns:
List[float]: list of floats corresponding to the TLM's trustworthiness score.
TLMOptionalBatchScoreResponse: a list of floats (the trustworthiness scores) or a list of TLMScore objects, one per prompt-response pair, each containing both the trustworthiness score and log dictionary keys. For any TLM call that failed, the corresponding entry in the returned list is None instead. See the documentation for [TLMOptionalBatchScoreResponse](#type-aliases) for more details.

The score quantifies how confident TLM is that the given response is good for the given prompt.
The returned list will always have the same length as the input list.
In case of TLM error or timeout on any prompt-response pair,
@@ -524,7 +537,7 @@ async def get_trustworthiness_score_async(
prompt: Union[str, Sequence[str]],
response: Union[str, Sequence[str]],
**kwargs: Any,
) -> Union[TLMScoreResponse, List[float], List[TLMScore]]:
) -> Union[TLMBatchScoreResponse, TLMScoreResponse]:
"""Asynchronously gets trustworthiness score for prompt-response pairs.
This method is similar to the [`get_trustworthiness_score()`](#method-get_trustworthiness_score) method but operates asynchronously,
allowing for non-blocking concurrent operations.
@@ -537,8 +550,9 @@ async def get_trustworthiness_score_async(
prompt (str | Sequence[str]): prompt (or list of prompts) for the TLM to evaluate
response (str | Sequence[str]): response (or list of responses) corresponding to the input prompts
Returns:
float | List[float]: float or list of floats (if multiple prompt-responses were provided) corresponding
to the TLM's trustworthiness score.
TLMScoreResponse | TLMBatchScoreResponse: If a single prompt/response pair was passed in, this method returns either a float (the trustworthiness score) or a TLMScore object containing both the trustworthiness score and log dictionary keys. See the documentation for [TLMScoreResponse](#type-aliases) for more details.

If a list of prompt/response pairs was passed in, this method returns either a list of floats (the trustworthiness scores) or a list of TLMScore objects, one per prompt-response pair, each containing both the trustworthiness score and log dictionary keys. See the documentation for [TLMBatchScoreResponse](#type-aliases) for more details.
The score quantifies how confident TLM is that the given response is good for the given prompt.
This method will raise an exception if any errors occur or if you hit a timeout (given a timeout is specified).
"""
@@ -647,7 +661,12 @@ class TLMResponse(TypedDict):
class TLMScore(TypedDict):
"""A typed dict containing the trustworthiness score and additional logs from the Trustworthy Language Model.

This dictionary is similar to TLMResponse, except it does not contain the response key.
Attributes:
trustworthiness_score (float, optional): score between 0-1 corresponding to the trustworthiness of the response.
A higher score indicates a higher confidence that the response is correct/trustworthy. The trustworthiness score
is omitted if TLM is run with quality preset "base".

log (dict, optional): additional logs and metadata from the LLM call, included only if the `log` key was specified in TLMOptions.
"""

trustworthiness_score: Optional[float]
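
A hedged sketch of consuming TLMScore entries from a batch result; `pairs` is a placeholder and `batch_scores` refers to the earlier usage example, neither is part of this diff:

```python
# Iterate a TLMBatchScoreResponse whose entries are TLMScore dicts
# (the case when logs were requested via TLMOptions).
pairs = [("What is 2 + 2?", "4"), ("What is the capital of Japan?", "Kyoto")]
for (prompt, response), entry in zip(pairs, batch_scores):
    if isinstance(entry, dict):
        score = entry.get("trustworthiness_score")  # may be None under the "base" preset
        log = entry.get("log", {})                  # only present if "log" was set in TLMOptions
    else:
        score, log = entry, {}                      # entries were plain floats
    print(prompt, "->", score, log)
```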
2 changes: 1 addition & 1 deletion cleanlab_studio/version.py
@@ -1,7 +1,7 @@
# Note to developers:
# Consider if backend's MIN_CLI_VERSION needs updating when pushing any changes to this file.

__version__ = "2.4.1"
__version__ = "2.4.2"

SCHEMA_VERSION = "0.2.0"
MIN_SCHEMA_VERSION = "0.1.0"