Explore the MLCommons Benchmarks #1154

Open
chakravarthik27 opened this issue Dec 17, 2024 · 0 comments

Background:
MLCommons is a global AI engineering consortium focused on improving the accuracy, safety, speed, and efficiency of AI systems through open collaboration and standardized benchmarks. Its mission includes democratizing AI with open datasets and evaluation tools that set industry standards for performance and quality.

Objective:
This issue aims to explore and analyze MLCommons Benchmarks to identify opportunities for integrating relevant benchmarks, metrics, and datasets into LangTest. This exploration will help LangTest align with industry standards for model evaluation, ensuring accuracy, robustness, and efficiency.


Tasks:

  1. Research MLCommons Benchmarks:

    • Review the MLCommons benchmark suites (e.g., MLPerf Training and MLPerf Inference) and the datasets that accompany them.
    • Identify benchmarks that align with LangTest's objectives (e.g., accuracy, fairness, and performance metrics).
  2. Dataset Exploration:

    • Analyze MLCommons datasets (e.g., large-scale open datasets) for relevance to language testing.
    • Investigate potential integrations or extensions to evaluate model robustness, bias, and real-world scenarios.
  3. Metric Mapping:

    • Compare MLCommons metrics with LangTest's current evaluation framework.
    • Propose any new metrics for accuracy, safety, and efficiency improvements.
  4. Documentation and Recommendations:

    • Summarize findings and recommendations for incorporating MLCommons benchmarks or datasets into LangTest.
    • Outline steps for future integration or experimental validation.
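As a starting point for the metric-mapping task (step 3), the comparison could be sketched as a simple lookup from MLCommons-style metric names to LangTest test categories, with unmatched metrics surfaced as candidates for new tests. Note that all metric and category names below are illustrative assumptions for discussion, not the actual identifiers used by either MLCommons or LangTest:

```python
# Hypothetical sketch of the metric-mapping step. The metric names and
# category names are placeholders, not real MLCommons/LangTest identifiers.

# Assumed MLCommons-style metrics we might want to cover.
MLCOMMONS_METRICS = ["accuracy", "latency_p99", "toxicity_rate", "energy_per_query"]

# Assumed mapping onto LangTest-style test categories; entries missing
# here represent gaps in the current evaluation framework.
LANGTEST_CATEGORIES = {
    "accuracy": "accuracy",
    "toxicity_rate": "safety",
}


def map_metrics(metrics):
    """Split metrics into those with an existing LangTest category and gaps."""
    mapped, unmapped = {}, []
    for metric in metrics:
        if metric in LANGTEST_CATEGORIES:
            mapped[metric] = LANGTEST_CATEGORIES[metric]
        else:
            unmapped.append(metric)
    return mapped, unmapped


mapped, gaps = map_metrics(MLCOMMONS_METRICS)
print(mapped)  # metrics already covered by an assumed LangTest category
print(gaps)    # candidates for new metrics (e.g., efficiency-focused ones)
```

The `gaps` list would feed directly into the "propose any new metrics" bullet above, turning the comparison into a concrete checklist rather than a purely manual review.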
