A comprehensive toolkit for generating mathematical equation datasets with visual representations, designed for AI model evaluation and training.
This repository contains tools to generate various types of mathematical equation datasets:
- Text-based equations: Systems of linear equations with 2-3 variables
- Visual representations: Character-only, icon-only, and partial icon equations
- Counting questions: Visual counting problems using icons
- Verification tools: Automated verification of generated equations
- ✅ Generate systems of linear equations with guaranteed integer solutions
- ✅ Create visual equation representations using icons or characters
- ✅ Generate counting problems with visual elements
- ✅ Automated verification of mathematical correctness
- ✅ Comprehensive metadata tracking
- ✅ Configurable parameters for dataset customization
- ✅ Production-ready scripts with proper error handling
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd math-eval
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up access for gated models (optional - only needed for inference with LLaMA models). You can skip this step if you only use:
  - Dataset generation
  - API models (OpenAI, Gemini)
  - Other open-source models (Molmo, Qwen2-VL)
Set up your Hugging Face token for accessing gated repositories using one of these secure methods:
```bash
# Option 1: Environment variable (recommended)
export HUGGINGFACE_TOKEN="your_hf_token_here"
# or
export HF_TOKEN="your_hf_token_here"

# Option 2: Config file (any of these locations)
mkdir -p ~/.huggingface
echo "your_hf_token_here" > ~/.huggingface/token

# Option 3: Local token file in project directory
echo "your_hf_token_here" > hf_token.txt
```

Get your token from: https://huggingface.co/settings/tokens
🔒 Security: Never commit token files to version control. All token files are automatically ignored by git.
- Verify installation:

  ```bash
  python verifier.py --file two-vars.txt
  ```

ℹ️ Note: Dataset generation does not require Hugging Face tokens. HF tokens are only needed for inference with LLaMA models.
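For reference, here is a minimal sketch of how a script might resolve the token from the three locations above; the helper name and search order are illustrative assumptions, not a transcript of the repository's actual code:

```python
import os
from pathlib import Path

def resolve_hf_token() -> str | None:
    """Hypothetical helper mirroring the three options above."""
    for var in ("HUGGINGFACE_TOKEN", "HF_TOKEN"):          # Option 1: env vars
        token = os.environ.get(var)
        if token:
            return token.strip()
    candidates = (Path.home() / ".huggingface" / "token",  # Option 2: config file
                  Path("hf_token.txt"))                    # Option 3: local file
    for path in candidates:
        if path.is_file():
            return path.read_text().strip()
    return None
```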
Usage: use the master pipeline script to generate all types of datasets:
```bash
# Generate 100 equations with all visual representations
python run_pipeline.py --num_equations 100 --num_vars 2 --task all

# Generate only equations
python run_pipeline.py --num_equations 50 --num_vars 3 --task equations

# Generate only visual representations (requires equations file)
python run_pipeline.py --task visual

# Generate only counting questions
python run_pipeline.py --task counting
```

Generate systems of linear equations:
```bash
python equation_generator.py --output_file my_equations.txt --num 100 --vars 2
```

Example output:
```
7 a + 3 b = 33 , 1 a + 10 b = 43 <sep> a = 3 , b = 4
8 a + 2 b = 62 , 6 a + 4 b = 54 <sep> a = 7 , b = 3
```
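As a quick sanity check of the first line: a = 3, b = 4 gives 7·3 + 3·4 = 21 + 12 = 33 and 1·3 + 10·4 = 43, so both equations hold.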
Generate character-only visuals:

```bash
python generate_ocr_custom.py --equations_file equations.txt --output_dir output/char_only
```

Generate icon-only visuals:

```bash
python generate_visual_questions.py --equations_file equations.txt --icon_dir colored_icons_final --output_dir output/icon_only
```

Generate partial icon visuals:

```bash
python generate_partial_visual_questions.py --equations_file equations.txt --icon_dir colored_icons_final --output_dir output/icon_partial
```

Generate visual counting problems:
```bash
python generate_counting_questions.py --equations_file equations.txt --icon_folder colored_icons_final --output_dir output/counting
```

Repository layout:

```
math-eval/
├── README.md                                 # This file
├── requirements.txt                          # Python dependencies
├── run_pipeline.py                           # Master pipeline script
├── config_example.json                       # Example configuration
│
├── Core Scripts/
│   ├── equation_generator.py                 # Generate base equations
│   ├── verifier.py                           # Verify equation correctness
│   ├── generate_ocr_custom.py                # Character-only visuals
│   ├── generate_visual_questions.py          # Icon-only visuals
│   ├── generate_partial_visual_questions.py  # Partial icon visuals
│   └── generate_counting_questions.py        # Counting problems
│
├── Icons/
│   └── colored_icons_final/                  # Icon assets
│       ├── apple/
│       ├── banana/
│       └── ...
│
├── Inference/                                # Model inference scripts
│   ├── visual_equation_solving/
│   │   ├── icon_only/
│   │   │   ├── open-source/                  # HuggingFace models
│   │   │   │   ├── inference_icon.py         # Direct inference
│   │   │   │   ├── inference_icon_cot.py     # Chain-of-thought
│   │   │   │   └── inference_icon_two_step.py  # Two-step reasoning
│   │   │   └── api-models/                   # API models
│   │   │       ├── inference_icon.py         # Direct inference
│   │   │       └── inference_icon_two_step.py  # Two-step reasoning
│   │   └── char_only/
│   │       ├── open-source/                  # HuggingFace models
│   │       │   ├── inference_ocr_only.py     # Direct inference
│   │       │   ├── inference_ocr_only_cot.py # Chain-of-thought
│   │       │   └── inference_ocr_only_two_step.py  # Two-step reasoning
│   │       └── api-models/                   # API models
│   │           ├── inference_ocr_only.py     # Direct inference
│   │           ├── inference_ocr_only_cot.py # Chain-of-thought
│   │           └── inference_ocr_only_two_step.py  # Two-step reasoning
│   └── counting/
│       ├── open-source/                      # HuggingFace models
│       │   ├── inference_direct.py           # Direct counting
│       │   └── inference_two_step.py         # Two-step counting
│       └── api-models/                       # API models
│           ├── inference_direct.py           # Direct counting
│           └── inference_two_step.py         # Two-step counting
│
├── Sample Data/
│   ├── two-vars.txt                          # 2-variable equations
│   └── three-vars.txt                        # 3-variable equations
│
└── outputs/                                  # Generated datasets
    ├── equations/
    ├── visual/
    │   ├── char_only/
    │   ├── icon_only/
    │   └── icon_partial/
    ├── counting/
    └── logs/
```
Create a configuration file to customize dataset generation:
```json
{
  "num_equations": 1000,
  "num_vars": 2,
  "task": "all",
  "output_dir": "my_dataset",
  "icon_dir": "colored_icons_final",
  "skip_verification": false
}
```

Use with:

```bash
python run_pipeline.py --config config.json
```

equation_generator.py:

- Purpose: Generate systems of linear equations (the sketch after the parameter list illustrates the approach)
- Parameters:
  - `--output_file`: Output file path
  - `--num`: Number of equations (default: 10)
  - `--vars`: Number of variables, 2 or 3 (default: 2)
  - `--const_max`: Maximum constant value (default: 100)
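A minimal sketch of how the integer-solution guarantee typically works (an assumption about the general approach, not the actual code of equation_generator.py): sample an integer assignment first, then derive each equation's constant from it, so the solution is exact by construction.

```python
import random

def generate_system(num_vars: int = 2, coeff_max: int = 10, sol_max: int = 10) -> str:
    """Illustrative sketch: sample an integer solution, then build equations around it."""
    names = ["a", "b", "c"][:num_vars]
    solution = {v: random.randint(1, sol_max) for v in names}
    equations = []
    for _ in range(num_vars):  # one equation per variable
        coeffs = {v: random.randint(1, coeff_max) for v in names}
        # The constant is computed from the chosen solution, so it is exact by construction.
        const = sum(coeffs[v] * solution[v] for v in names)
        lhs = " + ".join(f"{coeffs[v]} {v}" for v in names)
        equations.append(f"{lhs} = {const}")
    assignments = " , ".join(f"{v} = {solution[v]}" for v in names)
    return " , ".join(equations) + " <sep> " + assignments

print(generate_system(2))
```

A real generator would also need to reject coefficient choices that make the system singular or otherwise non-unique; the sketch omits that check.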
verifier.py:

- Purpose: Verify the mathematical correctness of generated equations
- Parameters:
  - `--file`: Path to equations file
All visual scripts share similar parameters:
- `--equations_file`: Input equations file
- `--output_dir`: Output directory
- `--icon_dir`: Icon directory (for icon-based scripts)
Generated equation files use one line per system, in the format:

```
equation1 , equation2 , ... <sep> variable_assignments
```
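A minimal sketch of parsing this format (the helper name is an illustrative assumption, not part of the toolkit):

```python
def parse_line(line: str) -> tuple[list[str], dict[str, int]]:
    """Split one dataset line into its equations and its solution assignments."""
    equations_part, _, solution_part = line.partition("<sep>")
    equations = [eq.strip() for eq in equations_part.split(",")]
    solution = {}
    for assignment in solution_part.split(","):
        var, _, value = assignment.partition("=")
        solution[var.strip()] = int(value)
    return equations, solution

eqs, sol = parse_line("7 a + 3 b = 33 , 1 a + 10 b = 43 <sep> a = 3 , b = 4")
print(eqs)  # ['7 a + 3 b = 33', '1 a + 10 b = 43']
print(sol)  # {'a': 3, 'b': 4}
```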
Each visual generation script produces a metadata CSV with:
- `filename`: Generated image filename
- Variable-specific columns (icon types, counts, etc.)
Verify generated equations:

```bash
python verifier.py --file outputs/equations/2_vars_equations.txt
```

```bash
# Test small dataset generation
python run_pipeline.py --num_equations 5 --task all

# Verify specific component
python equation_generator.py --output_file test.txt --num 5 --vars 2
python verifier.py --file test.txt
```

Common issues:

- Missing Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Icon Directory Not Found
  - Ensure the `colored_icons_final/` directory exists
  - Check that the icon structure matches the expected format
- Permission Errors
  - Ensure write permissions in the output directory
  - Create output directories manually if needed
- Memory Issues with Large Datasets
  - Generate datasets in smaller batches (see the sketch below)
  - Use `--num_equations` with smaller values
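If memory is tight, one option is to drive the pipeline in batches from a small script, routing each batch to its own output directory via the documented config keys. The batch size, total, and directory names here are illustrative assumptions:

```python
import json
import subprocess

TOTAL = 2000      # hypothetical target dataset size
BATCH_SIZE = 500  # smaller batches keep peak memory bounded

for i in range(TOTAL // BATCH_SIZE):
    # Per-batch config using the documented keys; each batch gets its own output dir.
    config = {
        "num_equations": BATCH_SIZE,
        "num_vars": 2,
        "task": "all",
        "output_dir": f"my_dataset/batch_{i:03d}",
        "icon_dir": "colored_icons_final",
        "skip_verification": False,
    }
    with open("batch_config.json", "w") as f:
        json.dump(config, f)
    subprocess.run(["python", "run_pipeline.py", "--config", "batch_config.json"], check=True)
```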
- "Icon directory not found": Check
--icon_dirparameter - "Equations file not found": Run equation generation first
- "Verification failed": Check equation format in source files
- Small datasets (< 100 equations): ~1 minute
- Medium datasets (100-1000 equations): ~5-10 minutes
- Large datasets (1000+ equations): ~30+ minutes
Factors affecting performance:
- Number of equations
- Image resolution and complexity
- Available system memory
- Icon loading and processing
The repository includes comprehensive inference capabilities for evaluating AI models on the generated datasets.
- Install inference dependencies:

  ```bash
  # For API models only (OpenAI, Gemini)
  python setup_inference.py --model_type api

  # For open-source models only (LLaMA, Molmo, Qwen2-VL)
  python setup_inference.py --model_type opensource

  # For all models
  python setup_inference.py --model_type all
  ```

- Configure API keys (if using API models):
For secure API key management, use environment variables (recommended):
```bash
# For OpenAI models
export OPENAI_API_KEY="your_openai_api_key_here"

# For Google Gemini models
export GOOGLE_API_KEY="your_google_api_key_here"
# or
export GEMINI_API_KEY="your_gemini_api_key_here"
```

Alternatively, you can pass API keys directly via command line arguments:

```bash
python run_inference.py --api_key your_api_key_here ...
```

🔒 Security: Using environment variables is more secure than hardcoding keys in scripts.
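For reference, a minimal sketch of how an inference script might resolve the key; the precedence order and helper name are illustrative assumptions:

```python
import os

def resolve_api_key(cli_key: str | None = None) -> str:
    """Hypothetical helper: prefer an explicit --api_key argument, then env vars."""
    key = (cli_key
           or os.environ.get("OPENAI_API_KEY")
           or os.environ.get("GOOGLE_API_KEY")
           or os.environ.get("GEMINI_API_KEY"))
    if not key:
        raise RuntimeError("No API key found: set an environment variable or pass --api_key")
    return key
```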
- OpenAI GPT-4o: State-of-the-art vision-language model
- Google Gemini: Advanced multimodal AI model
- LLaMA Vision: Meta's vision-language model ⚠️ Requires HF token (gated repository)
- Molmo: Allen AI's multimodal model
- Qwen2-VL: Alibaba's vision-language model
- Direct: Single-step inference
- Two-step: Multi-step reasoning approach
- Chain-of-Thought (CoT): Explicit reasoning chains
```bash
# API model inference (API key from environment variable)
python run_inference.py \
    --task visual_equation_solving \
    --dataset icon_only \
    --model_type api \
    --api_model openai \
    --inference_type direct

# API model inference (API key via argument)
python run_inference.py \
    --task visual_equation_solving \
    --dataset icon_only \
    --model_type api \
    --api_model openai \
    --api_key your-api-key \
    --inference_type direct

# Open-source model inference
python run_inference.py \
    --task visual_equation_solving \
    --dataset icon_only \
    --model_type open_source \
    --os_model llama_vision \
    --inference_type direct
```

- Counting Tasks:
```bash
# Using environment variable for API key
python run_inference.py \
    --task counting \
    --model_type api \
    --api_model openai \
    --inference_type direct

# Or specify API key directly
python run_inference.py \
    --task counting \
    --model_type api \
    --api_model openai \
    --api_key your-key \
    --inference_type direct
```

- Visual Equation Solving:
```bash
# Character-only equations (API key from environment variable)
python run_inference.py \
    --task visual_equation_solving \
    --dataset char_only \
    --model_type api \
    --api_model gemini \
    --inference_type cot

# Icon-only equations
python run_inference.py \
    --task visual_equation_solving \
    --dataset icon_only \
    --model_type open_source \
    --os_model qwen2_vl \
    --inference_type two_step

# Partial visual equations
python run_inference.py \
    --task visual_equation_solving \
    --dataset icon_partial \
    --model_type open_source \
    --os_model molmo \
    --inference_type direct
```

```bash
# Process multiple configurations (API key from environment)
for dataset in char_only icon_only icon_partial; do
for inference_type in direct two_step cot; do
python run_inference.py \
--task visual_equation_solving \
--dataset $dataset \
--model_type api \
--api_model openai \
--inference_type $inference_type
done
done
```

Edit `inference_config.json` to customize:
```json
{
  "api_models": {
    "openai": {
      "api_key": "your-openai-key",
      "model": "gpt-4o",
      "max_tokens": 200,
      "temperature": 0.1
    }
  },
  "datasets": {
    "visual_equation_solving": {
      "icon_only": {
        "image_dir": "three-vars/icon_only",
        "metadata_file": "three-vars/icon_only/metadata.csv"
      }
    }
  },
  "output_dir": "inference_results"
}
```

Results are saved as CSV files in the configured output directory:
```
inference_results/
├── counting_api_direct_results.csv
├── visual_equation_solving_icon_only_api_cot_results.csv
└── visual_equation_solving_char_only_open_source_two_step_results.csv
```

Each results file contains:

- `image_path`: Path to the input image
- `model_response`: Raw model output
- `extracted_variables`: Parsed variable assignments
- `ground_truth`: Expected answers from metadata
- `correct`: Boolean indicating whether the prediction was correct
```python
import pandas as pd

# Load results
df = pd.read_csv('inference_results/visual_equation_solving_icon_only_api_direct_results.csv')

# Calculate accuracy
accuracy = df['correct'].mean() * 100
print(f"Accuracy: {accuracy:.2f}%")

# Analyze by equation complexity
correct_by_vars = df.groupby('num_variables')['correct'].mean()
print("Accuracy by number of variables:")
print(correct_by_vars)
```

- API Rate Limits: Use delays between requests (see the sketch below)
- GPU Memory Issues: Reduce batch size for open-source models
- Model Loading Errors: Ensure sufficient disk space and memory
- Accuracy Issues: Try different inference types (cot, two_step)
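For rate limits, a minimal sketch of pacing requests; the delay constant and the `call_model` callable are illustrative assumptions, not part of run_inference.py:

```python
import time
from typing import Callable, Iterable

REQUEST_DELAY_S = 1.0  # illustrative pause; tune to your provider's limits

def run_with_delay(call_model: Callable[[str], str], prompts: Iterable[str]) -> list[str]:
    """Hypothetical wrapper: issue API requests sequentially with a fixed pause
    between them to stay under rate limits."""
    responses = []
    for prompt in prompts:
        responses.append(call_model(prompt))
        time.sleep(REQUEST_DELAY_S)  # back off before the next request
    return responses
```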
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-feature`)
- Commit changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/new-feature`)
- Create a Pull Request
[Add your license information here]
If you use this dataset generator in your research, please cite:
```bibtex
@misc{matheval2024,
  title={Math-Eval: Mathematical Equation Dataset Generator},
  author={[Your Name]},
  year={2024},
  url={[Repository URL]}
}
```

For issues and questions:
- Create an issue on GitHub
- Check existing documentation
- Review troubleshooting section
Last Updated: September 2025