This repository provides simple tasks based on board puzzles:
- N-Puzzle
- N-Puzzle with modified image (blurring, contours, etc.)
- N-Tile Jigsaw
- N-Tile Jigsaw with modified image (blurring, contours, etc.)
- Rush-Hour (uses boards by rush)

These tasks are set up as Gymnasium environments, which allows for versatile evaluation; the environments can also be used to evaluate models other than VLMs.

Install the package with pip:

    pip install "visual_puzzle @ git+https://github.com/Sahaj09/Visual-Puzzles.git@main"

A basic interaction loop with a random policy:

    import gymnasium as gym
    import visual_puzzle  # imported so the puzzle environments get registered with gymnasium

    env = gym.make("n_Puzzle-v0", render_mode="human")
    # env = gym.make("jigsaw-v0", render_mode="human")
    # env = gym.make("RushHour-v0", render_mode="human")

    observation, info = env.reset(seed=42)
    for _ in range(1000):
        action = env.action_space.sample()  # this is where you would insert your policy
        observation, reward, terminated, truncated, info = env.step(action)

        # start a new episode once the current one ends
        if terminated or truncated:
            observation, info = env.reset()

    env.close()

For the N-Puzzle, the observation is an RGB image. The actions are up, down, left, and right, each of which moves the blank tile in that direction. The goal of the puzzle is to rearrange the tiles until they match the goal state.
Start State | Goal State |
---|---|
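
The blank-tile moves described above can be illustrated with a small standalone model. This is only a sketch and does not use the environment: the board is a plain list of lists with `0` as the blank, and the direction names in `MOVES` are hypothetical labels, since the environment's actual action encoding is not shown here.

```python
from typing import List, Tuple

Board = List[List[int]]  # 0 marks the blank tile

# Hypothetical direction labels; the environment's real action encoding may differ.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find_blank(board: Board) -> Tuple[int, int]:
    """Return the (row, column) position of the blank tile."""
    for r, row in enumerate(board):
        for c, value in enumerate(row):
            if value == 0:
                return r, c
    raise ValueError("board has no blank tile")


def move_blank(board: Board, direction: str) -> Board:
    """Return a new board with the blank moved one cell; off-board moves change nothing."""
    dr, dc = MOVES[direction]
    r, c = find_blank(board)
    nr, nc = r + dr, c + dc
    new_board = [row[:] for row in board]
    if 0 <= nr < len(board) and 0 <= nc < len(board[0]):
        # Swap the blank with the neighbouring tile in the chosen direction.
        new_board[r][c], new_board[nr][nc] = new_board[nr][nc], new_board[r][c]
    return new_board


start = [[1, 2], [0, 3]]
goal = [[1, 2], [3, 0]]
after = move_blank(start, "right")
print(after, after == goal)
```

In this toy 4-tile case a single "right" move reaches the goal state; a move that would push the blank off the board leaves the board unchanged.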
For the N-Tile Jigsaw, the observation is an RGB image. Each action takes the form [[a, b], [c, d]].
The goal of the puzzle is to rearrange the tiles until they match the goal state.
Start State | Goal State |
---|---|
Various transformations can be applied to the images in both the N-Puzzle and Jigsaw tasks to make the tasks perceptually more challenging for agents. The currently supported filters from the PIL library are "BLUR", "CONTOUR", "EDGE_ENHANCE", "DETAIL", "EDGE_ENHANCE_MORE", "EMBOSS", "FIND_EDGES", "SHARPEN", "SMOOTH", and "SMOOTH_MORE".
Blur Image | Contour Image |
---|---|
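
To get a feel for what these filters do, the sketch below applies two of them to a synthetic image using PIL's `ImageFilter` module directly. It shows the PIL side only; how the environments accept a filter name is not shown in this README, so the helper `apply_named_filter` is a hypothetical name introduced for illustration.

```python
from PIL import Image, ImageFilter

# The filter names listed above; each is a predefined filter in PIL's ImageFilter module.
FILTER_NAMES = [
    "BLUR", "CONTOUR", "EDGE_ENHANCE", "DETAIL", "EDGE_ENHANCE_MORE",
    "EMBOSS", "FIND_EDGES", "SHARPEN", "SMOOTH", "SMOOTH_MORE",
]


def apply_named_filter(image: Image.Image, name: str) -> Image.Image:
    """Hypothetical helper: look up a predefined PIL filter by name and apply it."""
    if name not in FILTER_NAMES:
        raise ValueError(f"unsupported filter: {name}")
    return image.filter(getattr(ImageFilter, name))


# Synthetic test image: a red square on a white background.
image = Image.new("RGB", (64, 64), "white")
image.paste((200, 30, 30), (16, 16, 48, 48))

blurred = apply_named_filter(image, "BLUR")
contour = apply_named_filter(image, "CONTOUR")
print(blurred.size, contour.mode)
```

The filtered images keep the size and mode of the input, so they can stand in for the original observation without changing its shape.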
In addition, the number of pieces in the puzzle can be changed to increase or decrease the difficulty of the task.
Puzzle size | Puzzle |
---|---|
4-tiles | |
36-tiles | |
64-tiles | |
144-tiles | |
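
The tile counts in the table (4, 36, 64, 144) are all perfect squares, which suggests square grids of 2x2, 6x6, 8x8, and 12x12. The sketch below works out the grid side and the pixel box of each tile under that assumption. The helper names `grid_side` and `tile_boxes` are hypothetical and are not part of this package.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def grid_side(n_tiles: int) -> int:
    """Return the side length of a square grid holding n_tiles tiles."""
    side = math.isqrt(n_tiles)
    if side * side != n_tiles:
        raise ValueError(f"{n_tiles} is not a perfect square")
    return side


def tile_boxes(width: int, height: int, n_tiles: int) -> List[Box]:
    """List the pixel box of every tile, row by row (assumes a square grid)."""
    side = grid_side(n_tiles)
    tile_w, tile_h = width // side, height // side
    return [
        (col * tile_w, row * tile_h, (col + 1) * tile_w, (row + 1) * tile_h)
        for row in range(side)
        for col in range(side)
    ]


for n in (4, 36, 64, 144):
    print(n, "tiles ->", grid_side(n), "per side")

print(tile_boxes(240, 240, 4))
```

For a 240x240 image, going from 4 to 144 tiles shrinks each tile from 120x120 to 20x20 pixels, which is one way to see why larger tile counts are harder.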
The Rush-Hour game is a sliding puzzle where the goal is to have the red tile (in our case, also indexed as 0) reach the right end of the board. More about this game can be read here and in this amazing blog.
The observation is an RGB image. Each action is a pair [a, b], where a is the index of the tile and b is the direction (up, down, left, right).
Example State |
---|
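
The [a, b] action format can be illustrated with a small standalone model of the game. This is a sketch under stated assumptions and not the environment's implementation: the grid is a list of lists of tile indices with `-1` for empty cells, directions are written as strings (the environment may encode them differently), and a tile only slides along its own axis, as in the standard Rush-Hour rules.

```python
from typing import List, Sequence, Tuple

EMPTY = -1
# Hypothetical direction labels; the environment's real encoding of b may differ.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
Grid = List[List[int]]


def piece_cells(grid: Grid, piece: int) -> List[Tuple[int, int]]:
    """Return all (row, column) cells occupied by the given tile index."""
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == piece]


def apply_action(grid: Grid, action: Sequence) -> Grid:
    """Apply [tile_index, direction]; illegal moves return an unchanged copy."""
    piece, direction = action
    dr, dc = DIRECTIONS[direction]
    cells = piece_cells(grid, piece)
    if not cells:
        raise ValueError(f"no tile with index {piece}")
    new_grid = [row[:] for row in grid]
    horizontal = len({r for r, _ in cells}) == 1
    # A tile slides only along its own axis.
    if (horizontal and dr != 0) or (not horizontal and dc != 0):
        return new_grid
    targets = [(r + dr, c + dc) for r, c in cells]
    for r, c in targets:
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        if not inside or grid[r][c] not in (EMPTY, piece):
            return new_grid  # blocked by the board edge or another tile
    for r, c in cells:
        new_grid[r][c] = EMPTY
    for r, c in targets:
        new_grid[r][c] = piece
    return new_grid


def is_solved(grid: Grid) -> bool:
    """The red tile (index 0) has reached the right end of the board."""
    return any(row[-1] == 0 for row in grid)


grid = [
    [1, -1, -1, -1],
    [1, 0, 0, -1],
    [-1, -1, -1, -1],
]
after = apply_action(grid, [0, "right"])
print(after[1], is_solved(after))
```

Here tile 1 is vertical, so `[1, "right"]` has no effect, while `[0, "left"]` is blocked by tile 1; only sliding the red tile right solves this toy board.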
This project uses the rush.txt file from rush, which is distributed under the MIT License.
If you use this project or find it inspirational for your work, please cite it as follows:
@misc{Maini2024,
author = {Maini, Sahaj Singh},
title = {{Visual-Puzzles: Simple Puzzle(ing) Benchmark For Visual Reasoning in Vision Language Models}},
year = {2024},
url = {https://github.com/Sahaj09/Visual-Puzzles}
}