Improving benchmarking token counting accuracy #726

nwangfw · 2025-02-21T21:57:34Z

🐛 Describe the bug

The token count computed on the Python client side does not match the token count reported by vLLM. In the example below, the input length differs by about 300 tokens (1026 on the client versus 1316 reported by vLLM). This is worth investigating.

"Prompt": "You are an AI programming assistant and your task is to generate a SQL query based on the input database schema and user questions.\n### Task Description:\nGiven the following database schema, please write a SQL query to answer the given question.\n\n### Schema:\nThe database contains 4 tables: ['atom', 'bond', 'connected', 'molecule'].\n\n- Table: atom\n\t- Description: The table atom has 3 columns: ['atom_id', 'molecule_id', 'element'].\n\t- Primary Key: atom_id\n\t- Foreign Keys: atom.molecule_id = molecule.molecule_id\n\t- Column: atom_id\n\t\t- Type: TEXT\n\t\t- Description: the unique id of atoms\n\t\t- Sampled Values: TR000_1, TR000_2, TR000_3, TR000_4, TR000_5\n\t- Column: molecule_id\n\t\t- Type: TEXT\n\t\t- Description: identifying the molecule to which the atom belongs\n\t\t- Sampled Values: TR000, TR001, TR002, TR003, TR004\n\t- Column: element\n\t\t- Type: TEXT\n\t\t- Description: the element of the toxicology\n\t\t- Sampled Values: b, br, c, ca, cl\n\n- Table: bond\n\t- Description: The table bond has 3 columns: ['bond_id', 'molecule_id', 'bond_type'].\n\t- Primary Key: bond_id\n\t- Foreign Keys: bond.molecule_id = molecule.molecule_id\n\t- Column: bond_id\n\t\t- Type: TEXT\n\t\t- Description: unique id representing bonds\n\t\t- Sampled Values: TR000_1_2, TR000_2_3, TR000_2_4, TR000_2_5, TR001_10_11\n\t- Column: molecule_id\n\t\t- Type: TEXT\n\t\t- Description: identifying the molecule in which the bond appears\n\t\t- Sampled Values: TR000, TR001, TR002, TR003, TR004\n\t- Column: bond_type\n\t\t- Type: TEXT\n\t\t- Description: type of the bond\n\t\t- Sampled Values: None, #, -, =\n\n- Table: connected\n\t- Description: The table connected has 3 columns: ['atom_id', 'atom_id2', 'bond_id'].\n\t- Primary Key: ['atom_id', 'atom_id2']\n\t- Foreign Keys: connected.atom_id = atom.atom_id, connected.atom_id2 = atom.atom_id, connected.bond_id = bond.bond_id\n\t- Column: atom_id\n\t\t- Type: TEXT\n\t\t- Description: id of the first atom\n\t\t- 
Sampled Values: TR000_1, TR000_2, TR000_3, TR000_4, TR000_5\n\t- Column: atom_id2\n\t\t- Type: TEXT\n\t\t- Description: id of the second atom\n\t\t- Sampled Values: TR000_1, TR000_2, TR000_3, TR000_4, TR000_5\n\t- Column: bond_id\n\t\t- Type: TEXT\n\t\t- Description: bond id representing bond between two atoms\n\t\t- Sampled Values: TR000_1_2, TR000_2_3, TR000_2_4, TR000_2_5, TR001_10_11\n\n- Table: molecule\n\t- Description: The table molecule has 2 columns: ['molecule_id', 'label'].\n\t- Primary Key: molecule_id\n\t- Foreign Keys: \n\t- Column: molecule_id\n\t\t- Type: TEXT\n\t\t- Description: unique id of molecule\n\t\t- Sampled Values: TR000, TR001, TR002, TR004, TR006\n\t- Column: label\n\t\t- Type: TEXT\n\t\t- Description: whether this molecule is carcinogenic or not\n\t\t- Sampled Values: +, -\n\n### Prior Knowledge:\n- label = '+' mean molecules are carcinogenic;\n\n### Requirements:\n* Please only return the SQL query to answer the question with no explanation.\n* Please format your answer into a Markdown code block as sql\n<YOUR SQL QUERY>\n.\n* Please do NOT select extra columns that are not explicitly requested in the query.\n* Ensure that the table and column names in the generated query exactly match those in the schema. Do NOT include any columns or tables that are not present in the provided schema.\n* Please ensure that the SQL query remains concise and avoids unnecessary joins with unrelated tables.\n\n### Question:\nHow many of the molecules are carcinogenic?\n\n### Output:\n",
"Prompt Length": 1026,
"Output Length": 20,
"Metadata": {
"model_response": {
"id": "chat-b2f6004c0c844c8d94b0e1ef95339882",
"object": "chat.completion",
"created": 1740116628,
"model": "deepseek-coder-33b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "sql\nSELECT COUNT(*) FROM molecule WHERE label = '+'\n",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 1316,
"total_tokens": 1336,
"completion_tokens": 20
},
"prompt_logprobs": null
},
"temperature": 0.0
}
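To make the size of the mismatch explicit, here is a minimal sketch that reads the two counts from a record shaped like the one above and computes the gap. The function name `token_count_gap` is hypothetical, and the record is trimmed to the fields the calculation needs.

```python
def token_count_gap(record: dict) -> dict:
    """Compare the client-side prompt length with the server-reported one."""
    client = record["Prompt Length"]
    server = record["Metadata"]["model_response"]["usage"]["prompt_tokens"]
    gap = server - client
    return {
        "client": client,
        "server": server,
        "gap": gap,
        # Share of the server-side count that the client did not account for.
        "relative": gap / server,
    }


# Values copied from the record in this issue; other fields omitted.
record = {
    "Prompt Length": 1026,
    "Output Length": 20,
    "Metadata": {
        "model_response": {
            "usage": {
                "prompt_tokens": 1316,
                "total_tokens": 1336,
                "completion_tokens": 20,
            }
        }
    },
}

result = token_count_gap(record)
print(f"gap = {result['gap']} tokens ({result['relative']:.1%} of server count)")
```

For this record the gap is 290 tokens, roughly 22% of the count vLLM reports. The output length (20) agrees with `completion_tokens`, so only the prompt side is off.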

Steps to Reproduce

Send a request to vLLM with curl; the response reports the prompt token count in `usage.prompt_tokens`. Compare this number with the count produced by the local tokenizer.
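The comparison described above could be scripted roughly as follows. This is a sketch under several assumptions that the issue does not confirm: the server URL and the Hugging Face tokenizer name are placeholders, the prompt is sent as a single user message to the OpenAI-compatible `/v1/chat/completions` endpoint, and the local count uses the `transformers` tokenizer. It also counts the prompt after applying the tokenizer's chat template, since one possible (unverified) explanation for the gap is that the chat endpoint tokenizes the templated prompt while the client tokenizes the raw text.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a chat-completions payload with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": 1,  # only usage.prompt_tokens matters here
    }


def server_prompt_tokens(response: dict) -> int:
    """Read the prompt token count that the server reports."""
    return response["usage"]["prompt_tokens"]


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def local_token_counts(tokenizer_name: str, prompt: str) -> dict:
    """Count tokens locally, with and without the chat template."""
    from transformers import AutoTokenizer  # third-party, imported lazily

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    raw = len(tokenizer(prompt)["input_ids"])
    templated_text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )
    # The templated text already carries its special tokens as text.
    templated = len(
        tokenizer(templated_text, add_special_tokens=False)["input_ids"]
    )
    return {"raw": raw, "templated": templated}


def main() -> None:
    # Placeholder values: adjust to the actual deployment.
    url = "http://localhost:8000/v1/chat/completions"
    model = "deepseek-coder-33b-instruct"
    tokenizer_name = "deepseek-ai/deepseek-coder-33b-instruct"
    prompt = "How many of the molecules are carcinogenic?"

    response = post_json(url, build_chat_request(model, prompt))
    print("server:", server_prompt_tokens(response))
    print("local: ", local_token_counts(tokenizer_name, prompt))
```

Calling `main()` against a running server prints the three counts side by side. If the templated local count matches the server while the raw count does not, the benchmark client would need to apply the same template before counting.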

Expected behavior

The two token counts should be the same.

Environment

LLM Model used: deepseek-33b


nwangfw self-assigned this Feb 21, 2025

nwangfw added the area/heterogeneous label Feb 21, 2025

nwangfw changed the title ~~Improving benchmarking accuracy~~ Improving benchmarking token counting accuracy Feb 21, 2025
