Images are generated by using Blender to invoke the script `render_images.py` like this:

```bash
blender --background --python render_images.py -- [args]
```

Any arguments following the `--` will be captured by `render_images.py`. This command should be run from the `image_generation` directory, since by default the script will load resources from the `data` directory.
When rendering on cluster machines without audio drivers installed, you may need to add the `-noaudio` flag to the Blender invocation like this:

```bash
blender --background -noaudio --python render_images.py -- [args]
```

You can also run `render_images.py` as a standalone script to view help on all command-line flags like this:

```bash
python render_images.py --help
```
You will need to download and install Blender; the code has been developed and tested using Blender version 2.78c, but other versions may work as well.

Blender ships with its own version of Python 3.5, and it uses its bundled Python to execute scripts. You'll need to add this directory to the Python path of Blender's bundled Python with a command like this:

```bash
echo $PWD >> $BLENDER/$VERSION/python/lib/python3.5/site-packages/clevr.pth
```

where `$BLENDER` is the directory where Blender is installed and `$VERSION` is your Blender version; for example on OSX you might run:

```bash
echo $PWD >> /Applications/blender/blender.app/Contents/Resources/2.78/python/lib/python3.5/site-packages/clevr.pth
```
The file `data/base_scene.blend` contains a Blender scene used as the basis for all CLEVR images. This scene contains a ground plane, a camera, and several light sources. After loading the base scene, the positions of the camera and lights are randomly jittered (controlled with the `--key_light_jitter`, `--fill_light_jitter`, `--back_light_jitter`, and `--camera_jitter` flags).
After the base scene has been loaded, objects are placed one by one into the scene. The number of objects for each scene is a random integer between `--min_objects` (default 3) and `--max_objects` (default 10), and each object has a random shape, size, color, and material.
After placing all objects, we ensure that no objects are fully occluded; in particular, each object must occupy at least 100 pixels in the rendered image (customizable using `--min_pixels_per_object`). To accomplish this, we assign each object a unique color and render a version of the scene with lighting and shading disabled, writing it to a temporary file; we can then count the number of pixels of each color in this pre-render to check the number of visible pixels for each object.
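The pixel-counting step above can be sketched as follows. Reading the actual temporary render is omitted, and `visible_pixel_counts` and its inputs are hypothetical stand-ins for illustration, not the names used in `render_images.py`:

```python
from collections import Counter

def visible_pixel_counts(pixels, object_colors):
    """Count how many pixels each object occupies in a flat-shaded pre-render.

    pixels: iterable of (R, G, B) tuples from the temporary render.
    object_colors: dict mapping object index -> the unique (R, G, B) color
    assigned to that object before the pre-render.
    """
    color_counts = Counter(pixels)
    return {obj: color_counts.get(color, 0)
            for obj, color in object_colors.items()}

def all_objects_visible(pixels, object_colors, min_pixels=100):
    """Mirror the --min_pixels_per_object check on the counted pixels."""
    counts = visible_pixel_counts(pixels, object_colors)
    return all(n >= min_pixels for n in counts.values())
```

Because every object is rendered in a distinct flat color, a single pass over the pre-rendered pixels is enough to recover per-object visibility.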
Each invocation of `render_images.py` will render `--num_images` images, numbered starting at `--start_idx` (default 0). Using non-default values for `--start_idx` allows you to distribute rendering across many workers and recombine their results later without filename conflicts.
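As a sketch of that workflow, the snippet below computes disjoint index ranges for several workers; the worker count and per-worker image count are illustrative choices, not defaults of the script:

```python
# Split a 1000-image job across 4 workers by giving each worker a
# disjoint index range via --start_idx.
num_workers = 4
images_per_worker = 250

commands = []
for w in range(num_workers):
    start = w * images_per_worker
    commands.append(
        "blender --background --python render_images.py -- "
        f"--num_images {images_per_worker} --start_idx {start}"
    )

for cmd in commands:
    print(cmd)
```

Each command can then be submitted as a separate cluster job, and the resulting image files will have non-overlapping indices.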
Each object is positioned randomly, but before actually adding the object to the scene we ensure that its center is at least `--min_dist` units away from the centers of all other objects. We also ensure that between each pair of objects, the left/right and front/back distances along the ground plane are at least `--margin` units; this helps to minimize ambiguous spatial relationships. If after `--max_retries` attempts we are unable to find a suitable position for an object, then all objects are deleted and placed again from scratch.
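The retry logic can be sketched as a simple rejection-sampling loop. This is a simplified, axis-aligned 2D version: the real script works with camera-relative directions and Blender scene coordinates, and the thresholds and sampling range below are illustrative, not the script's defaults:

```python
import random

def position_ok(x, y, placed, min_dist=0.25, margin=0.4):
    """Check a candidate center (x, y) against all previously placed objects."""
    for px, py in placed:
        # centers must be at least min_dist apart (cf. --min_dist)
        if ((x - px) ** 2 + (y - py) ** 2) ** 0.5 < min_dist:
            return False
        # require clear left/right and front/back separation (cf. --margin)
        if abs(x - px) < margin or abs(y - py) < margin:
            return False
    return True

def place_objects(num_objects, max_retries=50, rng=random):
    placed = []
    while len(placed) < num_objects:
        for _ in range(max_retries):
            x, y = rng.uniform(-3, 3), rng.uniform(-3, 3)
            if position_ok(x, y, placed):
                placed.append((x, y))
                break
        else:
            # no valid spot after max_retries: delete everything and start over
            placed = []
    return placed
```

Restarting from scratch rather than backtracking keeps the sampler simple, at the cost of occasionally throwing away partially placed scenes.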
By default images are rendered at `320x240`, but the resolution can be customized using the `--height` and `--width` flags.
Rendering uses the CPU by default, but if you have an NVIDIA GPU with CUDA installed then you can use the GPU to accelerate rendering by adding the flag `--use_gpu 1`. Blender also supports acceleration using OpenCL, which allows the use of non-NVIDIA GPUs; however this is not currently supported by `render_images.py`.
You can control the quality of rendering with the `--render_num_samples` flag; using fewer samples will run more quickly but will result in grainy images. I've found that 64 samples is a good number to use for development; all released CLEVR images were rendered using 512 samples. The `--render_min_bounces` and `--render_max_bounces` flags control the number of light bounces for transparent objects; I've found the default of 8 to work well for these options.
When rendering, Blender breaks the output image into tiles and renders them sequentially; the `--render_tile_size` flag controls the size of these tiles. This should not affect the output image, but it may affect rendering speed. For CPU rendering smaller tile sizes may be optimal, while for GPU rendering larger tiles may be faster.
With default settings, rendering a 320x240 image takes about 4 seconds on a Pascal Titan X. It's very likely that these rendering times could be drastically reduced by someone more familiar with Blender, but this rendering speed was acceptable for our purposes.
You can save a Blender `.blend` file for each rendered image by adding the flag `--save_blendfiles 1`. These files can be more than 5 MB each, so they are not saved by default.
Rendered images are stored in the `--output_image_dir` directory, which is created if it does not exist. The filename of each rendered image is constructed from the `--filename_prefix`, the `--split`, and the image index.
A JSON file for each scene containing ground-truth object positions and attributes is saved in the `--output_scene_dir` directory, which is created if it does not exist. After all images are rendered, the JSON files for the individual scenes are combined into a single JSON file and written to `--output_scene_file`. This single file also stores the `--split`, `--version` (default 1.0), `--license` (default CC-BY 4.0), and `--date` (default today).
When rendering large numbers of images, I have sometimes experienced random Blender crashes; saving JSON files for each scene as they are rendered ensures that you do not lose information for scenes already rendered in the event of a crash.
If you are saving Blender scene files for each image (`--save_blendfiles 1`), they are stored in the `--output_blend_dir` directory, which is created if it does not exist.
The `--properties_json` file (default `data/properties.json`) defines the allowed shapes, sizes, colors, and materials used for objects, making it easy to extend CLEVR with new object properties.
Each shape (cube, sphere, cylinder) is stored in its own `.blend` file in the `--shape_dir` (default `data/shapes`); the file `X.blend` contains a single object named `X` centered at the origin with unit size. The `shapes` field of the JSON properties file maps human-readable shape names to `.blend` files in the `--shape_dir`.
The `colors` field of the JSON properties file maps human-readable color names to RGB values between 0 and 255; most of our colors are adapted from Wad's Optimum 16 Color Palette.

The `sizes` field of the JSON properties file maps human-readable size names to scaling factors used to scale the object models from the `--shape_dir`.
Each material is stored in its own `.blend` file in the `--material_dir` (default `data/materials`). The file `X.blend` should contain a single NodeTree item named `X`, and this NodeTree item must have a single `Color` input that accepts an RGBA value so that each material can be used with any color. The `materials` field of the JSON properties file maps human-readable material names to `.blend` files in the `--material_dir`.
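For reference, a properties file with all four fields might look like the following; every name and value here is illustrative rather than copied from the shipped `data/properties.json`:

```json
{
  "shapes": {
    "cube": "MyCube",
    "sphere": "MySphere"
  },
  "colors": {
    "red": [173, 35, 35],
    "blue": [42, 75, 215]
  },
  "sizes": {
    "large": 0.7,
    "small": 0.35
  },
  "materials": {
    "rubber": "MyRubber",
    "metal": "MyMetal"
  }
}
```

Here the values under `shapes` and `materials` would name the object and NodeTree items inside the corresponding `.blend` files.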
The optional `--shape_color_combos_json` flag can be used to restrict the colors of each shape. If provided, it should give a path to a JSON file mapping shape names to lists of allowed color names. This option can be used to render CLEVR-CoGenT images using the files `data/CoGenT_A.json` and `data/CoGenT_B.json`.