[TLM] Add TLMCalibrated class #326

Merged
merged 7 commits into from
Sep 25, 2024

huiwengoh commented Sep 24, 2024

Builds on the custom eval criteria added in #325.

Sample workflow:

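# `studio` is assumed to be an already-initialized Studio client.
# Prompt/response pairs, each with a human rating of the response.
prompts = ["what is 1+1", "what is 1+2", "what is 1+3"]
responses = ["2", "hm let me think, i think the answer is 3? am i right?", "5"]
ratings = [5, 3, 1]

# Score responses on a custom criterion in addition to trustworthiness.
custom_eval_criteria_option = {"custom_eval_criteria": [
    {"name": "Conciseness", "criteria": "Determine if the output is concise."}
]}

# Get uncalibrated scores from the regular TLM.
tlm = studio.TLM(options=custom_eval_criteria_option)
scores = tlm.get_trustworthiness_score(prompts, responses)

# Fit the calibrated TLM on those scores and their matching ratings.
tlm_calibrated = studio.TLMCalibrated(options=custom_eval_criteria_option)
tlm_calibrated.fit(scores, ratings)

# Score a new prompt/response pair; the result includes a 'calibrated_score'.
tlm_calibrated.get_trustworthiness_score("what is 2+2", "5")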
>> {'trustworthiness_score': 0.050770278656628066,
 'log': {'custom_eval_criteria': [{'name': 'Conciseness', 'score': 0.8}]},
 'calibrated_score': 0.365}

tlm_calibrated.get_trustworthiness_score(["what is 2+2", "what is 2+3"], ["i think it is 4? maybe 3", "5"])
>> [{'trustworthiness_score': 0.6060140420912912,
  'log': {'custom_eval_criteria': [{'name': 'Conciseness',
     'score': 0.2641642614722914}]},
  'calibrated_score': 0.51},
 {'trustworthiness_score': 0.9967050946426079,
  'log': {'custom_eval_criteria': [{'name': 'Conciseness',
     'score': 0.9999999999999998}]},
  'calibrated_score': 0.79}]
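
For readers unfamiliar with the idea, the fit-then-score pattern above can be illustrated with a small standalone sketch. This is not the `TLMCalibrated` implementation, whose internals are not described in this PR. It assumes a deliberately simple technique: ratings are min-max scaled to [0, 1], and a one-dimensional least-squares line maps `trustworthiness_score` to the scaled rating. The class name `LinearScoreCalibrator`, the helper `normalize_ratings`, and the training data are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence


def normalize_ratings(ratings: Sequence[float]) -> List[float]:
    """Min-max scale ratings (for example 1 to 5) into [0, 1]."""
    lo, hi = min(ratings), max(ratings)
    if hi == lo:
        # All ratings identical: no spread to scale, use the midpoint.
        return [0.5 for _ in ratings]
    return [(r - lo) / (hi - lo) for r in ratings]


class LinearScoreCalibrator:
    """Hypothetical stand-in: fits a least-squares line from raw score to rating."""

    def __init__(self) -> None:
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, results: Sequence[Dict], ratings: Sequence[float]) -> "LinearScoreCalibrator":
        xs = [r["trustworthiness_score"] for r in results]
        ys = normalize_ratings(ratings)
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # With no variance in the scores, fall back to predicting the mean rating.
        self.slope = cov_xy / var_x if var_x > 0 else 0.0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, result: Dict) -> float:
        raw = self.slope * result["trustworthiness_score"] + self.intercept
        # Clip so the calibrated score stays in [0, 1].
        return min(1.0, max(0.0, raw))


# Made-up training data shaped like the result dicts shown above.
train_results = [
    {"trustworthiness_score": 0.9},
    {"trustworthiness_score": 0.5},
    {"trustworthiness_score": 0.1},
]
train_ratings = [5, 3, 1]

calibrator = LinearScoreCalibrator().fit(train_results, train_ratings)
calibrated = calibrator.predict({"trustworthiness_score": 0.5})
```

A real calibrator could also use the custom eval criteria scores under `log` as extra inputs; the sketch ignores them to keep the arithmetic easy to follow.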

Ensure final changes made to the TLM code are tested before merging. To run the TLM tests, comment /test-tlm on this PR. To re-run failed property tests, comment /rerun-failed-test-tlm instead.

jas2600 commented Sep 25, 2024

/test-tlm
Starting TLM tests...
If you want to run all the TLM tests again (because TLM code is ready for review), comment '/test-tlm' on this PR.
If you want to re-run only the failed tests (you are still developing), comment '/rerun-failed-test-tlm' on this PR.
View full GitHub Actions run log
Tests completed!
TLM Tests Results: ✅✅✅✅✅
TLM Property Tests Results: ✅✅✅✅✅
Click the GitHub Actions run log for more information.

jas2600 merged commit 90fb5cf into main Sep 25, 2024
7 checks passed
jas2600 deleted the tlm-calibrate branch September 25, 2024 18:42
3 participants