This repository is an implementation of the paper titled above.
https://arxiv.org/abs/2407.19787
Download the dataset from https://huggingface.co/datasets/omron-sinicx/scipostlayout_v2. Then place the dataset directory at ./scipostlayout.
We run all models on Python 3.10 and CUDA 12.1. Run the following commands to pull the Docker image and start a container.
sh run_docker.sh
We treat layout analysis as an object detection problem and use LayoutLMv3 and DiT as baselines.
Hyperparameter details (for both models):
- Model size: Base
- Backbone: Cascade R-CNN
- Epoch: 100
- Warm-up steps: 1000
- Weight decay: 0.05
- Batch size: 4
- Learning rate: $lr \in \{2 \times 10^{-5}, 5 \times 10^{-5}, 2 \times 10^{-4}\}$
The checkpoint with the best performance on the dev set was used for evaluation.
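For example, the learning-rate sweep can be scripted as a loop (a sketch; it reuses the LayoutLMv3 train_net.py invocation shown later in this README, with the remaining options unchanged):
for LR in 2e-5 5e-5 2e-4; do
    python3 train_net.py --config-file cascade_layoutlmv3.yaml --num-gpus 4 \
        MODEL.WEIGHTS ./layoutlmv3-base/pytorch_model.bin \
        SOLVER.BASE_LR $LR \
        OUTPUT_DIR ./lr_${LR}
done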
- Run the following commands to install dependencies. (We downgrade torch to avoid errors after installing detectron2)
cd /scipostlayout/code/layoutlmv3/object_detection
apt update
apt upgrade -y
apt install -y gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
python3 -m venv layoutlm-venv
source layoutlm-venv/bin/activate
pip3 install -r requirements.txt
pip3 install 'git+https://github.com/facebookresearch/detectron2.git'
pip3 install torch==2.0.1 torchvision==0.15.2 "numpy<2"
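A quick import check (a minimal sanity check, not part of the original setup) confirms the torch downgrade did not break the detectron2 build:
python3 -c "import torch, torchvision, detectron2; print(torch.__version__, torchvision.__version__, detectron2.__version__)"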
- Download the image dataset (see the instructions above). The directory layout should look like the following:
$ ls scipostlayout/poster/png
train/
dev/
test/
train.json
dev.json
test.json
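To verify the splits are in place, a quick count and annotation check can help (a sketch; it assumes the COCO-style "images"/"annotations" keys that detectron2's register_coco_instances expects):
for split in train dev test; do
    echo "$split: $(ls scipostlayout/poster/png/$split | wc -l) images"
    python3 -c "import json; d = json.load(open('scipostlayout/poster/png/$split.json')); print(len(d['images']), 'image entries,', len(d['annotations']), 'annotations')"
done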
- Download the pre-trained checkpoint.
git clone https://huggingface.co/microsoft/layoutlmv3-base
Run the following commands to train and run inference with LayoutLMv3. [script]
config.json is not generated by the training process but is necessary for inference, so please copy it from the original pre-trained checkpoint directory:
cp ./layoutlmv3-base/config.json ./lr_0.0002_max_iter_22500/
You need to specify the paths of the pre-trained checkpoint and the dataset.
Please refer to cascade_layoutlmv3.yaml for hyperparameter details.
cd /scipostlayout/code/layoutlmv3/object_detection
MODEL_PATH=./layoutlmv3-base/pytorch_model.bin
OUT_PATH=.
LR=0.0002
MAX_ITER=22500
python3 train_net.py --config-file cascade_layoutlmv3.yaml --num-gpus 4 \
MODEL.WEIGHTS $MODEL_PATH \
PUBLAYNET_DATA_DIR_TRAIN PATH_TO/scipostlayout/poster/png/train \
PUBLAYNET_DATA_DIR_TEST PATH_TO/scipostlayout/poster/png/dev \
SOLVER.GRADIENT_ACCUMULATION_STEPS 1 \
SOLVER.IMS_PER_BATCH 4 \
SOLVER.BASE_LR $LR \
SOLVER.WARMUP_ITERS 1000 \
SOLVER.MAX_ITER $MAX_ITER \
SOLVER.CHECKPOINT_PERIOD 2250 \
TEST.EVAL_PERIOD 2250 \
OUTPUT_DIR $OUT_PATH/lr_${LR}_max_iter_${MAX_ITER}
python3 train_net.py --config-file cascade_layoutlmv3.yaml --eval-only --num-gpus 4 \
MODEL.WEIGHTS $OUT_PATH/lr_0.0002_max_iter_22500/model_final.pth \
PUBLAYNET_DATA_DIR_TEST PATH_TO/scipostlayout/poster/png/test \
OUTPUT_DIR $OUT_PATH/lr_0.0002_max_iter_22500
- Install dependencies. The virtualenv layoutlm-venv created in the LayoutLMv3 section can be reused for DiT.
source /scipostlayout/code/layoutlmv3/object_detection/layoutlm-venv/bin/activate
- Download the image dataset (same as for LayoutLMv3).
- Download the pre-trained checkpoint and rename it to dit-base-224-p16-500k.pth.
- You need to specify the path of the dataset in /scipostlayout/code/dit/object_detection/train_net.py (a sed shortcut is shown after the snippet):
register_coco_instances(
"scipostlayout_train",
{},
"PATH_TO/scipostlayout/poster/png/train.json",
"PATH_TO/scipostlayout/poster/png/train"
)
register_coco_instances(
"scipostlayout_dev",
{},
"PATH_TO/scipostlayout/poster/png/dev.json",
"PATH_TO/scipostlayout/poster/png/dev"
)
register_coco_instances(
"scipostlayout_test",
{},
"PATH_TO/scipostlayout/poster/png/test.json",
"PATH_TO/scipostlayout/poster/png/test"
)
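If the dataset directory was placed at ./scipostlayout under the repository root, as described at the top of this README, the PATH_TO placeholders can be filled in with a sed one-liner (a sketch; adjust the path to your actual layout):
sed -i 's|PATH_TO|/scipostlayout|g' /scipostlayout/code/dit/object_detection/train_net.py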
- Run the following commands to train and run inference with DiT. You need to specify the path of the pre-trained checkpoint. [script]
Please refer to scipostlayout_configs for hyperparameter details.
cd /scipostlayout/code/dit/object_detection
MODEL_PATH=./checkpoints/dit-base-224-p16-500k.pth
OUT_PATH=.
LR=0.00002
MAX_ITER=22500
python3 train_net.py \
--config-file scipostlayout_configs/cascade/cascade_dit_base.yaml \
--num-gpus 4 \
MODEL.WEIGHTS $MODEL_PATH \
SOLVER.IMS_PER_BATCH 4 \
SOLVER.BASE_LR $LR \
SOLVER.WARMUP_ITERS 1000 \
SOLVER.MAX_ITER $MAX_ITER \
SOLVER.CHECKPOINT_PERIOD 2250 \
TEST.EVAL_PERIOD 2250 \
OUTPUT_DIR $OUT_PATH/lr_${LR}_max_iter_${MAX_ITER}
python3 train_net.py --config-file scipostlayout_configs/cascade/cascade_dit_base.yaml --eval-only --num-gpus 1 \
MODEL.WEIGHTS lr_0.00002_max_iter_22500/model_0022499.pth \
OUTPUT_DIR $OUT_PATH/results/lr_${LR}_max_iter_${MAX_ITER}
- Make a virtualenv and install Poetry inside it.
cd /scipostlayout/code/layout-dm
python3 -m venv layoutdm-venv
source layoutdm-venv/bin/activate
curl -sSL https://install.python-poetry.org | python3 -
echo 'export PATH="/root/.local/bin:$PATH"' >> ./layoutdm-venv/bin/activate
- Install dependencies.
poetry install
Please refer to the official README for more details.
To evaluate layout generation models, one has to train a FID model first.
Create the directory /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw and copy all files under /scipostlayout/poster/png into it:
cp -r /scipostlayout/scipostlayout/poster/png/* /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw/
Rename dev.json to val.json in the raw directory.
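For example:
mv /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw/dev.json /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw/val.json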
Then run the following command to train the FID model. The training process should take a few days using an A100 GPU. [script]
Hyperparameter details:
- Training steps: 2e5
- Batch size: 64
- Learning rate: 3e-4
poetry run python3 src/trainer/trainer/fid/train.py \
src/trainer/trainer/config/dataset/scipostlayout.yaml \
--out_dir download/fid_weights/FIDNetV3/scipostlayout-max50
The checkpoints will be saved in /scipostlayout/code/layout-dm/download/fid_weights. model_best.pth.tar is used in the evaluation processes of all models.
Create the directory /scipostlayout/code/layout-dm/download/clustering_weights, then conduct clustering before training. [script]
poetry run python3 bin/clustering_coordinates.py src/trainer/trainer/config/dataset/scipostlayout.yaml kmeans --result_dir download/clustering_weights
Run the following command to train LayoutDM. [script]
bash bin/train.sh scipostlayout layoutdm
Run the following command to run inference. [script]
Update JOB_DIR to change the target results.
CONDS=(c cwh partial refinement relation)
JOB_DIR=/scipostlayout/code/layout-dm/tmp/jobs/scipostlayout/layoutdm_xxxxxxxx
RESULT_DIR=/scipostlayout/code/layout-dm/result_dir
for cond in ${CONDS[@]}; do
poetry run python3 -m src.trainer.trainer.test \
cond=$cond \
job_dir=$JOB_DIR \
result_dir=${RESULT_DIR}/${cond} \
is_validation=true
done
- is_validation=true: evaluates the generation performance on the validation set instead of the test set. This must be used when tuning the hyperparameters.
We use the same evaluation code for LayoutDM and the other models to ensure consistent results (Gen_T as an example). The visualization images will be saved under the result directory.
poetry run python3 eval.py /scipostlayout/code/layout-dm/result_dir/c/c_temperature_1.0_name_random_num_timesteps_100_validation
Run the following commands to install dependencies.
cd /scipostlayout/code/LayoutFormer++
python3 -m venv layoutformer-venv
source layoutformer-venv/bin/activate
pip3 install -r requirements.txt
Copy the FID checkpoint trained in the LayoutDM part to /scipostlayout/code/LayoutFormer++/src/net and rename it to fid_scipostlayout.pth.tar.
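For example, with the paths from the FID training step above:
cp /scipostlayout/code/layout-dm/download/fid_weights/FIDNetV3/scipostlayout-max50/model_best.pth.tar /scipostlayout/code/LayoutFormer++/src/net/fid_scipostlayout.pth.tar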
If the following error occurs, please change from torch._six import inf to from torch import inf in the library (a sed one-liner is shown after the traceback).
Traceback (most recent call last):
File "/scipostlayout/code/LayoutFormer++/src/main.py", line 5, in <module>
from deepspeed.runtime.lr_schedules import WarmupLR
File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/__init__.py", line 16, in <module>
from .runtime.engine import DeepSpeedEngine, DeepSpeedOptimizerCallable, DeepSpeedSchedulerCallable
File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 24, in <module>
from deepspeed.runtime.utils import see_memory_usage, get_ma_status, DummyOptim
File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 18, in <module>
from torch._six import inf
ModuleNotFoundError: No module named 'torch._six'
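A one-line patch (a sketch; it assumes the layoutformer-venv path created in the install step above):
sed -i 's/from torch\._six import inf/from torch import inf/' /scipostlayout/code/LayoutFormer++/layoutformer-venv/lib/python3.10/site-packages/deepspeed/runtime/utils.py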
Please refer to the official README for more details.
Create the directory /scipostlayout/code/LayoutFormer++/datasets/scipostlayout/raw/scipostlayout and copy the dataset under /scipostlayout/poster/png into it to start training. Rename dev.json to val.json in the raw/scipostlayout directory (see the commands below). When training starts, the dataset will be preprocessed automatically. We set max_num_elements to 50, and there are 9 categories in the dataset.
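For example (a sketch, assuming the dataset root /scipostlayout/scipostlayout used in the LayoutDM section):
DST=/scipostlayout/code/LayoutFormer++/datasets/scipostlayout/raw/scipostlayout
mkdir -p $DST
cp -r /scipostlayout/scipostlayout/poster/png/* $DST/
mv $DST/dev.json $DST/val.json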
Run the following commands to train the models.
cd /scipostlayout/code/LayoutFormer++/src
[script]
If you want to change the parameters, please refer to the scripts in /scipostlayout/code/LayoutFormer++/src/scripts.
The training process should take 1-5 hours using an A100 GPU.
./scripts/scipostlayout_gen_t.sh train ../datasets ../results/gen_t basic 1 none
./scripts/scipostlayout_gen_ts.sh train ../datasets ../results/gen_ts basic 1 none
./scripts/scipostlayout_gen_r.sh train ../datasets ../results/gen_r basic 1 none
./scripts/scipostlayout_completion.sh train ../datasets ../results/completion basic 1 none
./scripts/scipostlayout_refinement.sh train ../datasets ../results/refinement basic 1 none
Run the following commands to run inference on the test set. [script]
By default, we train the models for 200 epochs and use the final checkpoint for evaluation.
Note: the programs output evaluation results in this step, but in order to use the same evaluation settings as LayoutPrompter, we evaluate the prediction files independently.
The visualization images will be saved under the result directory, for example /scipostlayout/code/LayoutFormer++/results/completion/completion/epoch_199/pics.
./scripts/scipostlayout_gen_t.sh test ../datasets ../results/gen_t basic 1 epoch_xxx
./scripts/scipostlayout_gen_ts.sh test ../datasets ../results/gen_ts basic 1 epoch_xxx
./scripts/scipostlayout_gen_r.sh test ../datasets ../results/gen_r basic 1 epoch_xxx
./scripts/scipostlayout_completion.sh test ../datasets ../results/completion basic 1 epoch_xxx
./scripts/scipostlayout_refinement.sh test ../datasets ../results/refinement basic 1 epoch_xxx
We save prediction and gold label files during inference (gen_t as an example).
/scipostlayout/code/LayoutFormer++/results/gen_t/gold_labels.pth
/scipostlayout/code/LayoutFormer++/results/gen_t/predictions.pth
Run /scipostlayout/code/LayoutPrompter/src/eval_layoutformer.py to conduct the evaluation. You need to specify the FID model's path (which was trained in the LayoutDM part) and the paths of the prediction and gold label files in the program. You also need to set up the environment for LayoutPrompter as described in the section below.
cd /scipostlayout/code/LayoutPrompter
source layoutprompter-venv/bin/activate
cd src
python eval_layoutformer.py
Update result_path in eval_layoutformer.py to change the target results:
result_path = "/scipostlayout/code/LayoutFormer++/results/gen_t"
Run the following commands to install dependencies.
cd /scipostlayout/code/LayoutPrompter
python3 -m venv layoutprompter-venv
source layoutprompter-venv/bin/activate
pip3 install -r requirements.txt
Please refer to the official README for more details.
LayoutPrompter needs the dataset preprocessed by LayoutFormer++. Copy the .pt files in /scipostlayout/code/LayoutFormer++/datasets/scipostlayout/pre_processed_50_9 to /scipostlayout/code/LayoutPrompter/datasets/scipostlayout-max50/raw before using LayoutPrompter.
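For example:
mkdir -p /scipostlayout/code/LayoutPrompter/datasets/scipostlayout-max50/raw
cp /scipostlayout/code/LayoutFormer++/datasets/scipostlayout/pre_processed_50_9/*.pt /scipostlayout/code/LayoutPrompter/datasets/scipostlayout-max50/raw/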
To run inference with LayoutPrompter, you need an OpenAI API key. We use gpt-4-1106-preview instead of text-davinci-003 for its greater context length.
Run the following commands to run inference with LayoutPrompter (gent as an example). You need to specify the OPENAI_API_KEY, the OPENAI_ORGANIZATION, and the FID model's path (which was trained in the LayoutDM part). [script]
Evaluation will be automatically conducted after inference. We calculate metrics on the top-1 layouts of the layout ranker. Please refer to the issue "LayoutPrompter evaluation code?" for details. The visualization images will be saved under the result directory.
cd /scipostlayout/code/LayoutPrompter
python3 src/constraint_explicit.py \
--task gent \
--base_dir . \
--fid_model_path $FID_MODEL_PATH
We use Nougat to parse the papers' PDFs.
cd /scipostlayout/code/Paper-to-Layout
pip3 install nougat-ocr
nougat ../../dataset/paper/dev -o mmd/dev -m 0.1.0-base --recompute --no-skipping --batchsize 8
nougat ../../dataset/paper/test -o mmd/test -m 0.1.0-base --recompute --no-skipping --batchsize 8
The parsed mmd files are included in scipostlayout/paper/mmd. Copy scipostlayout/paper/mmd to /scipostlayout/code/Paper-to-Layout/.
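For example, assuming the dataset root /scipostlayout/scipostlayout used elsewhere in this README:
cp -r /scipostlayout/scipostlayout/paper/mmd /scipostlayout/code/Paper-to-Layout/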
We use GPT-4 to extract constraints for layout generation from papers. Run the following commands to start inference. We provide three different prompts (base/rule/rule_react).
pip3 install --upgrade pip
pip3 install openai tqdm
export OPENAI_API_KEY='YOUR_API_KEY'
export OPENAI_ORGANIZATION='YOUR_ORGANIZATION'
python3 extract_constraints_gent.py \
--data_path ../../scipostlayout/poster/png/test.json \
--mmd_path mmd/test \
--prompt_path prompt/prompt_base.txt \
--model gpt-4-1106-preview
python3 extract_constraints_gent.py \
--data_path ../../scipostlayout/poster/png/test.json \
--mmd_path mmd/test \
--prompt_path prompt/prompt_rule.txt \
--model gpt-4-1106-preview
python3 extract_constraints_gent.py \
--data_path ../../scipostlayout/poster/png/test.json \
--mmd_path mmd/test \
--prompt_path prompt/prompt_rule_react.txt \
--model gpt-4-1106-preview
Generate layouts based on extracted constraints. Run the scripts inside each model directory.
[script]
cond=c
JOB_DIR=/scipostlayout/code/layout-dm/tmp/jobs/scipostlayout/layoutdm_xxxxxxxx
RESULT_DIR=/scipostlayout/code/layout-dm/result_dir
poetry run python3 -m src.trainer.trainer.test \
cond=$cond \
job_dir=$JOB_DIR \
result_dir=${RESULT_DIR}/${cond} \
gen_const_path="/scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json"
# is_validation=true
Run evaluation.
source layoutdm-venv/bin/activate
poetry run python3 eval.py /scipostlayout/code/layout-dm/result_dir/c_const_rule/c_temperature_1.0_name_random_num_timesteps_100_test
[script]
./scripts/scipostlayout_gen_t.sh test ../datasets ../results/gen_t basic 1 epoch_199 /scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json
Run /scipostlayout/code/LayoutPrompter/src/eval_layoutformer.py
to conduct evaluation.
cd /scipostlayout/code/LayoutPrompter
source layoutprompter-venv/bin/activate
cd src
python eval_layoutformer.py
[script]
python3 src/constraint_explicit.py \
--task gent \
--base_dir /scipostlayout/code/LayoutPrompter \
--fid_model_path /scipostlayout/code/layout-dm/download/fid_weights/FIDNetV3/scipostlayout-max50/model_best.pth.tar \
--gen_const_path /scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json \
--use_saved_response
Generate layouts from summarized papers. Run the scripts inside each model directory.
[script]
python3 src/constraint_explicit.py \
--task genp \
--base_dir /scipostlayout/code/LayoutPrompter \
--fid_model_path /scipostlayout/code/layout-dm/download/fid_weights/FIDNetV3/scipostlayout-max50/model_best.pth.tar \
--mmd_dir /scipostlayout/code/Paper-to-Layout/mmd \
--use_saved_response
If you find this code useful for your research, please cite our paper and the above repositories:
@misc{tanaka2024scipostlayoutdatasetlayoutanalysis,
title={SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters},
author={Shohei Tanaka and Hao Wang and Yoshitaka Ushiku},
year={2024},
eprint={2407.19787},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.19787},
}