Cleanlab Trustworthy Language Model (TLM) - Reliability and explainability added to every LLM output
In one line of code, Cleanlab TLM adds real-time evaluation of every response in GenAI, RAG, LLM, and Agent systems.
This tutorial requires a TLM API key. Get one here.
export CLEANLAB_TLM_API_KEY=<YOUR_API_KEY_HERE>
Install the package:
pip install cleanlab-tlm
To get started, copy the code below to try your own prompt, or use TLM to score existing prompt/response pairs.
from cleanlab_tlm import TLM
tlm = TLM(options={"log": ["explanation"], "model": "gpt-4o-mini"}) # GPT, Claude, etc.
out = tlm.prompt("What's the third month of the year alphabetically?")
print(out)
TLM returns a dictionary containing `response`, `trustworthiness_score`, and any requested optional fields such as `explanation`.
{
"response": "March.",
"trustworthiness_score": 0.4590804375945598,
"explanation": "Found an alternate response: December"
}
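If you already have responses (from any LLM), you can score them directly instead of generating new ones. A minimal sketch using the package's `get_trustworthiness_score` method, assuming `cleanlab-tlm` is installed and `CLEANLAB_TLM_API_KEY` is set (the example prompt/response pair here is illustrative):

```python
import os

# Existing prompt/response pair to evaluate (illustrative values).
prompt = "What's the third month of the year alphabetically?"
response = "March"

# Only call the API when a key is configured.
if os.environ.get("CLEANLAB_TLM_API_KEY"):
    from cleanlab_tlm import TLM

    tlm = TLM()
    # Scores a response you already have, without generating a new one.
    out = tlm.get_trustworthiness_score(prompt, response)
    print(out["trustworthiness_score"])
```

A low score flags the pair for human review; a high score lets it pass through automatically.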
- Trustworthiness Scores: Each response comes with a trustworthiness score, helping you reliably gauge the likelihood of hallucinations.
- Higher accuracy: Rigorous benchmarks show TLM consistently produces more accurate results than other LLMs such as o3/o1, GPT-4o, and Claude.
- Scalable API: Designed to handle large datasets, TLM is suitable for most enterprise applications, including data extraction, tagging/labeling, Q&A (RAG), and more.
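For dataset-scale use, `TLM.prompt` also accepts a list of prompts and returns one result per input, so a whole batch can be processed in a single call. A hedged sketch, assuming `cleanlab-tlm` is installed and `CLEANLAB_TLM_API_KEY` is set (the tagging prompts are illustrative):

```python
import os

# A small batch of data-tagging prompts (illustrative).
prompts = [
    "Classify the sentiment of: 'The delivery was late.'",
    "Classify the sentiment of: 'Great service, thank you!'",
]

# Only call the API when a key is configured.
if os.environ.get("CLEANLAB_TLM_API_KEY"):
    from cleanlab_tlm import TLM

    tlm = TLM()
    # Passing a list returns a list of result dicts, one per prompt.
    results = tlm.prompt(prompts)
    for r in results:
        print(r["response"], r["trustworthiness_score"])
```

Sorting the batch by `trustworthiness_score` ascending surfaces the outputs most likely to need review first.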
Comprehensive documentation along with tutorials and examples can be found here.
cleanlab-tlm
is distributed under the terms of the MIT license.