A simple synthetic dataset and baseline model for visual counting. The task is to count the number of even digits given a 100x100 image, each with up to 5 randomly chosen MNIST digits. We use rejection sampling to ensure digits are separated by at least 28 pixels. Reproduced with details from Learning to count with deep object features.
NOTE: This is not a dataset to beat, but a simple place to start for validating ideas in counting models.
- Generate TFRecords:
python -m counting_mnist.create_dataset
- Train baseline:
python -m counting_mnist.main
Model | Accuracy |
---|---|
Always Predict Zero Even Digits | 33% |
Uniform Count Predictions | 12% |
Baseline Model | 85% |