Math-Eval: Mathematical Equation Dataset Generator

A comprehensive toolkit for generating mathematical equation datasets with visual representations, designed for AI model evaluation and training.

Overview

This repository contains tools to generate various types of mathematical equation datasets:

  1. Text-based equations: Systems of linear equations with 2-3 variables
  2. Visual representations: Character-only, icon-only, and partial icon equations
  3. Counting questions: Visual counting problems using icons
  4. Verification tools: Automated verification of generated equations

Features

  • ✅ Generate systems of linear equations with guaranteed integer solutions
  • ✅ Create visual equation representations using icons or characters
  • ✅ Generate counting problems with visual elements
  • ✅ Automated verification of mathematical correctness
  • ✅ Comprehensive metadata tracking
  • ✅ Configurable parameters for dataset customization
  • ✅ Production-ready scripts with proper error handling

Quick Start

Installation

  1. Clone the repository:
git clone <repository-url>
cd math-eval
  2. Install dependencies:
pip install -r requirements.txt
  3. For inference with gated models (optional - only if using LLaMA models):

⚠️ Note: This step is only required if you plan to use LLaMA models for inference. You can skip this for:

  • Dataset generation only
  • API models (OpenAI, Gemini)
  • Other open-source models (Molmo, Qwen2-VL)

Set up your Hugging Face token for accessing gated repositories using one of these secure methods:

# Option 1: Environment variable (recommended)
export HUGGINGFACE_TOKEN="your_hf_token_here"
# or
export HF_TOKEN="your_hf_token_here"

# Option 2: Config file (any of these locations)
mkdir -p ~/.huggingface
echo "your_hf_token_here" > ~/.huggingface/token

# Option 3: Local token file in project directory
echo "your_hf_token_here" > hf_token.txt

Get your token from: https://huggingface.co/settings/tokens

🔒 Security: Never commit token files to version control. All token files are automatically ignored by git. (A minimal token-lookup sketch appears after these installation steps.)

  4. Verify installation:
python verifier.py --file two-vars.txt
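
For reference, a minimal sketch of the token lookup order described in step 3 (environment variables first, then ~/.huggingface/token, then a local hf_token.txt). The helper name is hypothetical and not part of this repository:

import os
from pathlib import Path

def resolve_hf_token():
    # Hypothetical helper mirroring the lookup locations described above.
    token = os.environ.get("HUGGINGFACE_TOKEN") or os.environ.get("HF_TOKEN")
    if token:
        return token.strip()
    for path in (Path.home() / ".huggingface" / "token", Path("hf_token.txt")):
        if path.is_file():
            return path.read_text().strip()
    return None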

Generate a Complete Dataset

ℹ️ Note: Dataset generation does not require Hugging Face tokens. HF tokens are only needed for inference with LLaMA models.

Use the master pipeline script to generate all dataset types:

# Generate 100 equations with all visual representations
python run_pipeline.py --num_equations 100 --num_vars 2 --task all

# Generate only equations
python run_pipeline.py --num_equations 50 --num_vars 3 --task equations

# Generate only visual representations (requires equations file)
python run_pipeline.py --task visual

# Generate only counting questions
python run_pipeline.py --task counting

Dataset Types

1. Text Equations

Generate systems of linear equations:

python equation_generator.py --output_file my_equations.txt --num 100 --vars 2

Example output:

7 a + 3 b = 33 , 1 a + 10 b = 43 <sep> a = 3 , b = 4
8 a + 2 b = 62 , 6 a + 4 b = 54 <sep> a = 7 , b = 3
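
One simple way to guarantee integer solutions is to fix the solution first and then derive each constant term. A minimal sketch of that idea (illustrative only, not necessarily the exact logic of equation_generator.py):

import random

def make_system(num_vars=2, coef_max=10, sol_max=10):
    # Fix an integer solution first, then derive the right-hand sides,
    # so the stated solution satisfies every equation by construction.
    # (Uniqueness of the solution is not checked in this sketch.)
    names = ["a", "b", "c"][:num_vars]
    solution = {n: random.randint(1, sol_max) for n in names}
    equations = []
    for _ in range(num_vars):
        coefs = {n: random.randint(1, coef_max) for n in names}
        rhs = sum(coefs[n] * solution[n] for n in names)
        lhs = " + ".join(f"{coefs[n]} {n}" for n in names)
        equations.append(f"{lhs} = {rhs}")
    answer = " , ".join(f"{n} = {solution[n]}" for n in names)
    return " , ".join(equations) + " <sep> " + answer

print(make_system(num_vars=2))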

2. Visual Equations

Character-only equations

python generate_ocr_custom.py --equations_file equations.txt --output_dir output/char_only

Icon-only equations

python generate_visual_questions.py --equations_file equations.txt --icon_dir colored_icons_final --output_dir output/icon_only

Partial icon equations

python generate_partial_visual_questions.py --equations_file equations.txt --icon_dir colored_icons_final --output_dir output/icon_partial

3. Counting Questions

Generate visual counting problems:

python generate_counting_questions.py --equations_file equations.txt --icon_folder colored_icons_final --output_dir output/counting
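
The script above handles icon selection, layout, and metadata. Purely to illustrate the idea of composing a counting image from icon files, a minimal PIL sketch (the icon path and row layout are illustrative, not the script's actual behavior):

from PIL import Image

def make_counting_image(icon_path, count, cell=64, out_path="counting_example.png"):
    # Paste `count` copies of one icon in a row on a white canvas.
    icon = Image.open(icon_path).convert("RGBA").resize((cell, cell))
    canvas = Image.new("RGB", (cell * count, cell), "white")
    for i in range(count):
        canvas.paste(icon, (i * cell, 0), icon)
    canvas.save(out_path)

# Example (path is hypothetical):
# make_counting_image("colored_icons_final/apple/apple.png", count=5)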

File Structure

math-eval/
├── README.md                                 # This file
├── requirements.txt                          # Python dependencies
├── run_pipeline.py                           # Master pipeline script
├── config_example.json                       # Example configuration
│
├── Core Scripts/
│   ├── equation_generator.py                 # Generate base equations
│   ├── verifier.py                           # Verify equation correctness
│   ├── generate_ocr_custom.py                # Character-only visuals
│   ├── generate_visual_questions.py          # Icon-only visuals
│   ├── generate_partial_visual_questions.py  # Partial icon visuals
│   └── generate_counting_questions.py        # Counting problems
│
├── Icons/
│   └── colored_icons_final/                  # Icon assets
│       ├── apple/
│       ├── banana/
│       └── ...
│
├── Inference/                                # Model inference scripts
│   ├── visual_equation_solving/
│   │   ├── icon_only/
│   │   │   ├── open-source/                  # HuggingFace models
│   │   │   │   ├── inference_icon.py         # Direct inference
│   │   │   │   ├── inference_icon_cot.py     # Chain-of-thought
│   │   │   │   └── inference_icon_two_step.py  # Two-step reasoning
│   │   │   └── api-models/                   # API models
│   │   │       ├── inference_icon.py         # Direct inference
│   │   │       └── inference_icon_two_step.py  # Two-step reasoning
│   │   └── char_only/
│   │       ├── open-source/                  # HuggingFace models
│   │       │   ├── inference_ocr_only.py     # Direct inference
│   │       │   ├── inference_ocr_only_cot.py # Chain-of-thought
│   │       │   └── inference_ocr_only_two_step.py  # Two-step reasoning
│   │       └── api-models/                   # API models
│   │           ├── inference_ocr_only.py     # Direct inference
│   │           ├── inference_ocr_only_cot.py # Chain-of-thought
│   │           └── inference_ocr_only_two_step.py  # Two-step reasoning
│   └── counting/
│       ├── open-source/                      # HuggingFace models
│       │   ├── inference_direct.py           # Direct counting
│       │   └── inference_two_step.py         # Two-step counting
│       └── api-models/                       # API models
│           ├── inference_direct.py           # Direct counting
│           └── inference_two_step.py         # Two-step counting
│
├── Sample Data/
│   ├── two-vars.txt                          # 2-variable equations
│   └── three-vars.txt                        # 3-variable equations
│
└── outputs/                                  # Generated datasets
    ├── equations/
    ├── visual/
    │   ├── char_only/
    │   ├── icon_only/
    │   └── icon_partial/
    ├── counting/
    └── logs/

Configuration

Create a configuration file to customize dataset generation:

{
  "num_equations": 1000,
  "num_vars": 2,
  "task": "all",
  "output_dir": "my_dataset",
  "icon_dir": "colored_icons_final",
  "skip_verification": false
}

Use with:

python run_pipeline.py --config config.json

API Reference

Core Functions

equation_generator.py

  • Purpose: Generate systems of linear equations
  • Parameters:
    • --output_file: Output file path
    • --num: Number of equations (default: 10)
    • --vars: Number of variables, 2 or 3 (default: 2)
    • --const_max: Maximum constant value (default: 100)

verifier.py

  • Purpose: Verify mathematical correctness of equations
  • Parameters:
    • --file: Path to equations file
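
The check itself is straightforward: split each line on <sep>, read the variable assignments, and substitute them back into every equation. A minimal sketch of this kind of check, assuming the file format shown in the examples above (not verifier.py's exact code):

def check_line(line):
    # e.g. "7 a + 3 b = 33 , 1 a + 10 b = 43 <sep> a = 3 , b = 4"
    system, answer = line.split("<sep>")
    values = {}
    for assignment in answer.split(","):
        name, value = assignment.split("=")
        values[name.strip()] = int(value)
    for equation in system.split(","):
        lhs, rhs = equation.split("=")
        total = 0
        for term in lhs.split("+"):
            coef, name = term.split()
            total += int(coef) * values[name]
        if total != int(rhs):
            return False
    return True

print(check_line("7 a + 3 b = 33 , 1 a + 10 b = 43 <sep> a = 3 , b = 4"))  # True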

Visual Generation Scripts

All visual scripts share similar parameters:

  • --equations_file: Input equations file
  • --output_dir: Output directory
  • --icon_dir: Icon directory (for icon-based scripts)

Output Formats

Equations File Format

equation1 , equation2 , ... <sep> variable_assignments

Metadata CSV Format

Each visual generation script produces a metadata CSV with:

  • filename: Generated image filename
  • Variable-specific columns (icon types, counts, etc.)
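
A quick way to inspect a generated metadata file (the path is illustrative):

import pandas as pd

metadata = pd.read_csv("outputs/visual/icon_only/metadata.csv")
print(metadata.columns.tolist())  # filename plus script-specific columns
print(metadata.head())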

Testing and Validation

Verify Generated Equations

python verifier.py --file outputs/equations/2_vars_equations.txt

Test Pipeline Components

# Test small dataset generation
python run_pipeline.py --num_equations 5 --task all

# Verify specific component
python equation_generator.py --output_file test.txt --num 5 --vars 2
python verifier.py --file test.txt

Troubleshooting

Common Issues

  1. Missing Dependencies

    pip install -r requirements.txt
  2. Icon Directory Not Found

    • Ensure colored_icons_final/ directory exists
    • Check icon structure matches expected format
  3. Permission Errors

    • Ensure write permissions in output directory
    • Create output directories manually if needed
  4. Memory Issues with Large Datasets

    • Generate datasets in smaller batches
    • Use --num_equations with smaller values

Error Messages

  • "Icon directory not found": Check --icon_dir parameter
  • "Equations file not found": Run equation generation first
  • "Verification failed": Check equation format in source files

Performance Considerations

  • Small datasets (< 100 equations): ~1 minute
  • Medium datasets (100-1000 equations): ~5-10 minutes
  • Large datasets (1000+ equations): ~30+ minutes

Factors affecting performance:

  • Number of equations
  • Image resolution and complexity
  • Available system memory
  • Icon loading and processing

Running Inference

The repository includes comprehensive inference capabilities for evaluating AI models on the generated datasets.

Inference Setup

  1. Install inference dependencies:
# For API models only (OpenAI, Gemini)
python setup_inference.py --model_type api

# For open-source models only (LLaMA, Molmo, Qwen2-VL)
python setup_inference.py --model_type opensource

# For all models
python setup_inference.py --model_type all
  2. Configure API keys (if using API models):

For secure API key management, use environment variables (recommended):

# For OpenAI models
export OPENAI_API_KEY="your_openai_api_key_here"

# For Google Gemini models
export GOOGLE_API_KEY="your_google_api_key_here"
# or
export GEMINI_API_KEY="your_gemini_api_key_here"

Alternatively, you can pass API keys directly via command line arguments:

python run_inference.py --api_key your_api_key_here ...

🔒 Security: Using environment variables is more secure than hardcoding keys in scripts.

Supported Models

API Models

  • OpenAI GPT-4o: State-of-the-art vision-language model
  • Google Gemini: Advanced multimodal AI model

Open-Source Models

  • LLaMA Vision: Meta's vision-language model ⚠️ Requires HF token (gated repository)
  • Molmo: Allen AI's multimodal model
  • Qwen2-VL: Alibaba's vision-language model

Inference Types

  1. Direct: Single-step inference
  2. Two-step: Multi-step reasoning approach
  3. Chain-of-Thought (CoT): Explicit reasoning chains

Inference Commands

Basic Usage

# API model inference (API key from environment variable)
python run_inference.py \
  --task visual_equation_solving \
  --dataset icon_only \
  --model_type api \
  --api_model openai \
  --inference_type direct

# API model inference (API key via argument)
python run_inference.py \
  --task visual_equation_solving \
  --dataset icon_only \
  --model_type api \
  --api_model openai \
  --api_key your-api-key \
  --inference_type direct

# Open-source model inference
python run_inference.py \
  --task visual_equation_solving \
  --dataset icon_only \
  --model_type open_source \
  --os_model llama_vision \
  --inference_type direct

Task Types

  1. Counting Tasks:
# Using environment variable for API key
python run_inference.py \
  --task counting \
  --model_type api \
  --api_model openai \
  --inference_type direct

# Or specify API key directly
python run_inference.py \
  --task counting \
  --model_type api \
  --api_model openai \
  --api_key your-key \
  --inference_type direct
  2. Visual Equation Solving:
# Character-only equations (API key from environment variable)
python run_inference.py \
  --task visual_equation_solving \
  --dataset char_only \
  --model_type api \
  --api_model gemini \
  --inference_type cot

# Icon-only equations
python run_inference.py \
  --task visual_equation_solving \
  --dataset icon_only \
  --model_type open_source \
  --os_model qwen2_vl \
  --inference_type two_step

# Partial visual equations
python run_inference.py \
  --task visual_equation_solving \
  --dataset icon_partial \
  --model_type open_source \
  --os_model molmo \
  --inference_type direct

Batch Processing

# Process multiple configurations (API key from environment)
for dataset in char_only icon_only icon_partial; do
  for inference_type in direct two_step cot; do
    python run_inference.py \
      --task visual_equation_solving \
      --dataset $dataset \
      --model_type api \
      --api_model openai \
      --inference_type $inference_type
  done
done

Configuration Options

Edit inference_config.json to customize:

{
  "api_models": {
    "openai": {
      "api_key": "your-openai-key",
      "model": "gpt-4o",
      "max_tokens": 200,
      "temperature": 0.1
    }
  },
  "datasets": {
    "visual_equation_solving": {
      "icon_only": {
        "image_dir": "three-vars/icon_only",
        "metadata_file": "three-vars/icon_only/metadata.csv"
      }
    }
  },
  "output_dir": "inference_results"
}
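
If you keep a placeholder key in the file, a small sketch of loading the config and preferring the environment variable instead (field names as shown above):

import json
import os

with open("inference_config.json") as f:
    config = json.load(f)

# Prefer the environment variable over any key stored in the file.
openai_cfg = config["api_models"]["openai"]
openai_cfg["api_key"] = os.environ.get("OPENAI_API_KEY", openai_cfg.get("api_key"))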

Output Analysis

Results are saved as CSV files in the configured output directory:

inference_results/
├── counting_api_direct_results.csv
├── visual_equation_solving_icon_only_api_cot_results.csv
└── visual_equation_solving_char_only_open_source_two_step_results.csv

Each results file contains:

  • image_path: Path to the input image
  • model_response: Raw model output
  • extracted_variables: Parsed variable assignments
  • ground_truth: Expected answers from metadata
  • correct: Boolean indicating if prediction was correct

Performance Analysis

import pandas as pd

# Load results
df = pd.read_csv('inference_results/visual_equation_solving_icon_only_api_direct_results.csv')

# Calculate accuracy
accuracy = df['correct'].mean() * 100
print(f"Accuracy: {accuracy:.2f}%")

# Analyze by equation complexity
correct_by_vars = df.groupby('num_variables')['correct'].mean()
print("Accuracy by number of variables:")
print(correct_by_vars)
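
To compare several runs at once (for example, those produced by the batch loop above), a small aggregation sketch over the results directory, assuming each file contains the correct column listed under Output Analysis:

import glob
import os
import pandas as pd

rows = []
for path in glob.glob("inference_results/*_results.csv"):
    df = pd.read_csv(path)
    rows.append({"run": os.path.basename(path), "accuracy": df["correct"].mean() * 100})

summary = pd.DataFrame(rows).sort_values("accuracy", ascending=False)
print(summary.to_string(index=False))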

Troubleshooting Inference

  1. API Rate Limits: Add delays between requests (see the sketch after this list)
  2. GPU Memory Issues: Reduce batch size for open-source models
  3. Model Loading Errors: Ensure sufficient disk space and memory
  4. Accuracy Issues: Try different inference types (cot, two_step)
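
For the rate-limit issue above, a minimal delay-and-retry wrapper (call_model is a hypothetical stand-in for whatever request function you use):

import time

def call_with_retry(call_model, *args, retries=3, delay=2.0, **kwargs):
    # Retry a request with a pause between attempts, doubling the pause
    # after each failure to back off from rate limits.
    for attempt in range(retries):
        try:
            return call_model(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
            delay *= 2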

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/new-feature)
  3. Commit changes (git commit -am 'Add new feature')
  4. Push to branch (git push origin feature/new-feature)
  5. Create Pull Request

License

[Add your license information here]

Citation

If you use this dataset generator in your research, please cite:

@misc{matheval2024,
  title={Math-Eval: Mathematical Equation Dataset Generator},
  author={[Your Name]},
  year={2024},
  url={[Repository URL]}
}

Support

For issues and questions:

  • Create an issue on GitHub
  • Check existing documentation
  • Review troubleshooting section

Last Updated: September 2025
