LLM Benchmarks on company data

We've spent thousands of dollars evaluating LLM performance on company data, so you can skip straight to the results. Watch this repo to get notified as soon as a new benchmark or model is added.

Since we do a lot of LLM-based company analysis at apistemic, we decided to have one central place to keep track of all the benchmarks. This repo thus covers many business/company-related LLM benchmarks.

How well do LLMs understand companies?

Firstly, we want to evaluate how much inherent knowledge LLMs have about companies and markets. To do this, we just use company names in all benchmarks without any further context provided.

Benchmark: Measuring company knowledge inherent in embeddings

To measure the breadth and depth of the LLMs' company knowledge, this benchmark embeds company names. The assumption is that the more inherent knowledge an LLM has about companies, the more information its embeddings contain.

Methodology: To measure inherent company knowledge, we pass each company's name to the model to obtain an embedding. These embeddings are then used as the only inputs for a complex regression task, namely scoring the competitiveness of two companies via an SVM. This task requires a broad and deep understanding of markets, individual companies, business models, and more.
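As a rough illustration of the step between "embed two names" and "fit a regressor", the sketch below combines two embeddings into a single feature vector for a company pair. How the benchmark actually combines the embeddings is not stated here, so the `pair_features` helper and its choice of symmetric features (absolute difference and element-wise product) are assumptions, and the toy vectors are made-up values standing in for real embeddings.

```python
from typing import Sequence


def pair_features(a: Sequence[float], b: Sequence[float]) -> list[float]:
    """Combine two company-name embeddings into one order-independent vector.

    Hypothetical helper: the feature construction is an assumption,
    not the benchmark's documented method.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    # Absolute difference and product are both symmetric in (a, b),
    # so swapping the two companies yields the same features.
    abs_diff = [abs(x - y) for x, y in zip(a, b)]
    product = [x * y for x, y in zip(a, b)]
    return abs_diff + product


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
emb_a = [0.5, -1.0, 2.0]
emb_b = [1.5, 1.0, 0.0]
features = pair_features(emb_a, emb_b)
```

Vectors like `features`, paired with the human competitiveness ratings as targets, could then be passed to a support vector regressor. Symmetric features are one way to ensure that the pair (A, B) and the pair (B, A) get the same prediction.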

Dataset: See Competitive Positioning Dataset from Apistemic Markets.

Results:

Benchmark: Measuring inherent company knowledge by rating competitiveness

As a second benchmark to measure company knowledge, we use the same task as before but prompt the LLMs directly this time. We provide each LLM with the same instructions the human raters received and ask it to rate the competitiveness of two companies. Our assumption is that the more knowledge (and understanding) an LLM has, in both breadth and depth, the better it can perform a competitiveness evaluation.
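A minimal sketch of what prompting and reading back a rating could look like is shown below. The prompt wording in `build_prompt` is purely hypothetical (the actual rater instructions are not reproduced here), and `parse_rating` is an assumed helper for extracting a 1-5 rating from a free-text reply; no model API is called.

```python
import re
from typing import Optional


def build_prompt(company_a: str, company_b: str) -> str:
    # Hypothetical wording; only the company names are given as context.
    return (
        f"Rate the competitiveness of {company_a} and {company_b} "
        "on a scale from 1 to 5. Answer with a single digit."
    )


def parse_rating(response: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the reply, or None."""
    # Lookarounds reject digits that are part of a longer number such as 10.
    match = re.search(r"(?<!\d)[1-5](?!\d)", response)
    return int(match.group()) if match else None


prompt = build_prompt("Company A", "Company B")
rating = parse_rating("Rating: 4")
```

Returning `None` for unparseable replies keeps those cases visible, so they can be counted or retried instead of silently becoming a default rating.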

Methodology: This benchmark prompts LLMs to rate the competitiveness of company pairs on a 1-5 scale. Human raters previously did the same using the same prompts. The LLMs receive only company names and must use their internal knowledge to assess competitive relationships. Performance is then measured using R² scores and Spearman correlations against expert human evaluations. R² measures how closely the LLM's ratings match the human ratings overall, while the Spearman correlation between human and LLM ratings indicates directional correctness, i.e. whether the LLM ranks pairs in a similar order and thus has a sense of competitiveness more generally.
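To make the difference between the two metrics concrete, here is a small standard-library sketch of both. It is not the benchmark's own evaluation code, and the example ratings are made up. R² is computed as 1 - SS_res / SS_tot, and Spearman is computed as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ratings: the LLM orders pairs like the humans but rates too high.
human = [1, 2, 3, 4, 5]
llm = [3, 4, 4, 5, 5]
r2 = r2_score(human, llm)
rho = spearman(human, llm)
```

In this toy case R² is 0.0 while the Spearman correlation is about 0.95: the ordering is nearly right, but the absolute ratings are systematically too high. Reporting both metrics separates these two kinds of error.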

Dataset: See Competitive Positioning Dataset from Apistemic Markets.

Results:

Datasets

Our benchmarks are based on proprietary datasets. This section describes each dataset used.

Competitive Positioning Dataset from Apistemic Markets

Source: apistemic markets

Description: Expert evaluations of competitive positioning between company pairs, where industry professionals assessed relative competitiveness using a standardized five-point scale. These assessments span diverse sectors, encompassing companies of varying sizes and geographic locations to ensure comprehensive coverage across different market contexts.

Used in:

Benchmark: Measuring company knowledge inherent in embeddings
Benchmark: Measuring inherent company knowledge by rating competitiveness
