Commit f15e22b

Train job keeps 3 checkpoints at a time
This may be useful for recovering from NaN problems: if the latest checkpoint already contains NaN values, an earlier one can be restored instead.
1 parent 2466daf commit f15e22b

1 file changed: +1 -1 lines changed

run_summarization.py

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ def setup_training(model, batcher):
     convert_to_coverage_model()
   if FLAGS.restore_best_model:
     restore_best_model()
-  saver = tf.train.Saver(max_to_keep=1) # only keep 1 checkpoint at a time
+  saver = tf.train.Saver(max_to_keep=3) # keep 3 checkpoints at a time
 
   sv = tf.train.Supervisor(logdir=train_dir,
                      is_chief=True,
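
To illustrate why raising `max_to_keep` from 1 to 3 helps with NaN recovery, here is a minimal pure-Python sketch of a "keep the newest N checkpoints" retention policy. It is a toy model, not TensorFlow's actual `Saver` implementation: the class `CheckpointRotation`, the helper `simulate`, and the `model.ckpt-<step>` file names are all hypothetical and exist only for this example.

```python
from collections import deque


class CheckpointRotation:
    """Toy model of a 'keep the newest N checkpoints' retention policy.

    Hypothetical illustration only. It does not model every option of the
    real tf.train.Saver (for example, a setting that keeps all checkpoints).
    """

    def __init__(self, max_to_keep):
        if max_to_keep < 1:
            raise ValueError("max_to_keep must be at least 1 in this toy model")
        self.max_to_keep = max_to_keep
        self.kept = deque()   # oldest on the left, newest on the right
        self.deleted = []     # checkpoints that were rotated out

    def save(self, path):
        """Record a new checkpoint and drop the oldest ones beyond the limit."""
        self.kept.append(path)
        while len(self.kept) > self.max_to_keep:
            self.deleted.append(self.kept.popleft())

    def fallback_candidates(self):
        """Checkpoints to try, newest first, if the latest one is unusable."""
        return list(reversed(self.kept))


def simulate(max_to_keep, steps):
    """Save one checkpoint per step and return what is left on disk."""
    rotation = CheckpointRotation(max_to_keep)
    for step in steps:
        rotation.save("model.ckpt-%d" % step)
    return rotation.fallback_candidates()


if __name__ == "__main__":
    steps = [100, 200, 300, 400]
    print(simulate(1, steps))  # old behaviour: only the newest checkpoint
    print(simulate(3, steps))  # new behaviour: the newest three checkpoints
```

With a limit of 1, a checkpoint written after the weights turned to NaN overwrites the only good copy. With a limit of 3, the two older checkpoints remain available as fallbacks, provided the NaN values appeared within the last two save intervals.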
