Eval dataset suggestion #6442

Description

@connerlambden

Helium Market Resolution Benchmark: 300 frozen prompts built from live option chains (NVDA, SPY, TSLA, AAPL, QQQ, AMZN). It tests implied volatility (IV), delta, arbitrage, and rich-vs-average strike questions. Ground truth comes from the chain itself. No model scores above 50%.

Dataset: https://huggingface.co/datasets/HeliumTrades/helium-market-resolution-benchmark
Overview: https://heliumtrades.com/benchmarks/
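
For readers less familiar with the IV and delta quantities the prompts ask about, here is a minimal sketch of how both can be recovered from a single call quote. It assumes a plain Black-Scholes model for a European call with no dividends, which is a simplification I am choosing for illustration; the benchmark's own ground-truth method is not described here and may differ (the listed tickers trade American-style options). All function names and input numbers are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1(spot: float, strike: float, t: float, rate: float, vol: float) -> float:
    """Black-Scholes d1 term (assumes t > 0 and vol > 0)."""
    return (math.log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * math.sqrt(t))


def bs_call_price(spot: float, strike: float, t: float, rate: float, vol: float) -> float:
    """Black-Scholes price of a European call, no dividends (assumed model)."""
    d1 = _d1(spot, strike, t, rate, vol)
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def bs_call_delta(spot: float, strike: float, t: float, rate: float, vol: float) -> float:
    """Call delta under the same assumptions: N(d1)."""
    return norm_cdf(_d1(spot, strike, t, rate, vol))


def implied_vol(price: float, spot: float, strike: float, t: float, rate: float,
                lo: float = 1e-4, hi: float = 5.0, tol: float = 1e-8) -> float:
    """Invert the call price for volatility by bisection.

    Relies on the call price increasing monotonically in vol. Does not
    check that `price` lies within no-arbitrage bounds; a real pipeline should.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, t, rate, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical quote: spot 100, strike 105, three months to expiry, 4% rate.
quote = bs_call_price(100.0, 105.0, 0.25, 0.04, 0.30)
iv = implied_vol(quote, 100.0, 105.0, 0.25, 0.04)
delta = bs_call_delta(100.0, 105.0, 0.25, 0.04, iv)
```

The round trip (price a call at a known volatility, then invert) is only a sanity check on the solver; on a real chain the input would be a quoted mid price.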
